Hi Richard,

I think the only way to know for sure what's going on is to turn on
tracing.  Since you don't want to get flooded with irrelevant trace
data, I'd suggest running it once to print out the cycle where the
interesting activity starts, then running again using --Trace.start=<n>
to enable tracing just before that point.  If you set the InstExec, Bus,
and Pipeline trace flags, plus the one for whatever device you're using,
you should get a pretty complete picture of what's going on.
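
One way to find that starting point is to scan the driver's console output for the first interrupt whose per-access I/O time collapses. The following is only a minimal sketch in C, assuming the printk line format shown in the quoted console log below ("io read count:N, used M cycles"). The struct, the helper names, and the threshold are hypothetical and are not part of M5 or the driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* One parsed "io read/write count:N, used M cycles" console line. */
struct io_sample {
    char kind;              /* 'r' for read, 'w' for write */
    unsigned long count;    /* accesses in this interrupt */
    unsigned long cycles;   /* total cycles for those accesses */
};

/* Returns 1 and fills *out if the line matches the assumed format, else 0. */
static int parse_io_line(const char *line, struct io_sample *out)
{
    char kind[8];

    assert(line != NULL && out != NULL);
    /* The leading space in the format skips the log's indentation. */
    if (sscanf(line, " io %7s count:%lu, used %lu cycles",
               kind, &out->count, &out->cycles) != 3)
        return 0;
    if (strcmp(kind, "read") == 0)
        out->kind = 'r';
    else if (strcmp(kind, "write") == 0)
        out->kind = 'w';
    else
        return 0;
    return 1;
}

/* Average cycles per access; 0.0 when the interrupt had no accesses. */
static double avg_cycles(const struct io_sample *s)
{
    return s->count ? (double)s->cycles / (double)s->count : 0.0;
}

/* Index of the first line whose per-access average is below threshold,
 * or -1 if there is none. Lines with count 0 or another format are skipped. */
static int first_fast_access(const char *const lines[], size_t n,
                             double threshold)
{
    size_t i;
    struct io_sample s;

    for (i = 0; i < n; i++) {
        if (parse_io_line(lines[i], &s) && s.count > 0 &&
            avg_cycles(&s) < threshold)
            return (int)i;
    }
    return -1;
}
```

Applied to the two DetailedCPU interrupts in the log with a threshold of 100 cycles, this picks the "io read count:1, used 9 cycles" line of irq 19. The tick at that point still has to be printed by the simulator run itself before it can be passed to --Trace.start.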

Anyone else have any other tips?

Steve

Richard R. Zhang wrote:
> Hi Ali, Lisa, and Steve,
> Thanks a lot for your help.
> Following Ali's advice, I have tried two methods of measuring time in the
> guest system. Both still give the same results that I described in my
> first mail: the I/O access time in CacheCPU is still much larger than in
> DetailedCPU.
> Here are my methods.
> Method 1. As before, I added an M5 pseudo-instruction to the simulator. The
> following code fragments come from the files isa_desc and pseudo_inst.cc in
> my M5 code.
> In isa_desc:
> ...
>             0x24: gettick({{
>                 Ra = AlphaPseudo::gettick(xc->xcBase()) + (Rb & 0);
>             }}, No_OpClass, IsNonSpeculative);
> ...
> In pseudo_inst.cc:
> ...
>     uint64_t
>     gettick (ExecContext *xc)
>     {
>         return curTick;
>     }
> ...
> And I replaced the readl and writel functions with the following functions
> in the Linux driver.
> static void __stat_writel(u32 v, volatile void __iomem *addr)
> {
>     int64_t before, after;
>     after = 0;
>     before = gettick(after);
>     writel (v, addr);
>     after = gettick ((int64_t)addr);
>     
>     if (enable_stat){
>         iow_cycles += (after - before);
>         iow_count ++; 
>     }
> }
> static u32  __stat_readl(const volatile void __iomem *addr)
> {
>     int64_t before, after;
>     u32 ret;
>     after = 0;
>     before = gettick(after);
>     ret = readl (addr);
>     after = gettick (ret);
>    
>     if (enable_stat){ 
>         ior_cycles += (after - before);
>         ior_count ++;
>     }
>     
>     return ret;
> }
> Method 2. I use the rpcc instruction to get the tick value instead of my
> pseudo instruction. The only difference in the driver code is that the
> gettick function is replaced by the __rpcc function, which is a wrapper
> around the rpcc instruction.
> static __inline int64_t _rpcc(int64_t dep)
> {
>     int64_t res;
>     asm volatile ("rpcc %0, %1" : "=r"(res) : "r"(dep) : "memory");
>     return res;
> }
> 
> I'm confused by this problem. I think that an I/O register access cannot
> be as fast as a cache access. It seems that the data came from the
> cache, not the device. Because the instruction dependency has been added to
> my time measurement code, the out-of-order model cannot be the real reason.
> And I have found a strange thing. In the DetailedCPU model, the first I/O
> access time is similar to that in the CacheCPU model. The following text
> comes from the console log. The strings in it are printed by printk in the
> Linux driver.
> 
> XXXXXXXisrXXXXXXX
> irq count:16, used 3973 cycles
>       In this irq:
>       io read count:1, used 1556 cycles
>       io write count:1, used 1604 cycles
> XXXXXXXisrXXXXXXX
> irq count:17, used 4736 cycles
>       In this irq:
>       io read count:2, used 3240 cycles
>       io write count:0, used 0 cycles
>       
> ...from here, the simulator entered the detailed mode...      
> XXXXXXXisrXXXXXXX
> irq count:18, used 4169 cycles
>       In this irq:
>       io read count:2, used 1623 cycles
>       io write count:0, used 0 cycles
> XXXXXXXisrXXXXXXX
> irq count:19, used 3818 cycles
>       In this irq:
>       io read count:1, used 9 cycles
>       io write count:1, used 11 cycles
> 
> Could anyone give me some more advice? Thanks a lot!
> 
> Richard R. Zhang
> 2006-05-12
> 
> 
> 
> From: Ali Saidi
> Sent: 2006-04-27 12:14:54
> To: Steve Reinhardt
> Cc: Richard R. Zhang; Lisa Hsu; m5sim-users
> Subject: Re: [m5sim-users] Is there something wrong with the io access latency?
> 
> I believe Steve is exactly correct, the out-of-order model is not
> enforcing a dependency between your two instructions. The way to fix
> it is to force a dependency to a register (for example the result of
> the load). You need to do this both in the decoder and in the code
> that executes the instruction.
>
> For example for the rpcc instruction (this code may be a little bit
> newer than yours, but same idea):
>    /* Rb is a fake dependency so here is a fun way to get the parser
>       to understand that. */
>    Ra = xc->readMiscRegWithEffect(AlphaISA::IPR_CC, fault) + (Rb & 0);
>
> and in some code:
>
> inline uint32_t cycleCounter(uint32_t dep)
> {
>     uint32_t res;
>     asm volatile ("rpcc %0, %1" : "=r"(res) : "r" (dep) : "memory");
>     return res;
> }
>
>    t1 = cycleCounter(trash);
>    for (x = 0; x < count; x++) {
>        trash = readl(addr);
>        t2 = cycleCounter(trash);
>    }
> 
> 
> Ali
> 
> On Apr 26, 2006, at 9:59 PM, Steve Reinhardt wrote:
>
>> My guess would be that it has to do with the out-of-order
>> scheduling in the detailed CPU. If the instruction that reads
>> curTick has no dependence on the read or write instructions, then
>> it will get executed out-of-order while the read or write is still
>> stalled.
>>
>> I remember that we ran into this problem ourselves but I don't
>> remember the details of how we solved it... Ali or Nate, can you
>> help here?
>>
>> Steve
>>
>> Richard R. Zhang wrote:
>>> Hi Lisa and all M5 users,
>>> I find something strange with the io access latency. Could you
>>> give me a hint about it?
>>> I have added a new instruction to the alpha isa. This instruction
>>> can get the curTick in M5. It seems to work correctly. So, I plan
>>> to use it to measure the time in the guest OS. Then, I added some
>>> statements to the ns83820 driver. These statements compute the
>>> time used by the driver irq routine and the io accesses (just the
>>> time of writel and readl). But the results below puzzled me, and I
>>> can't explain them. These results come from a netperf maerts test
>>> under Sampler mode, and the memory configuration is STE.
>>> ---------------------------------------------------------
>>> |                    | CacheCPU mode | DetailedCPU mode |
>>> ---------------------------------------------------------
>>> | avg. io read time  | 1581 cycles   | 40 cycles        |
>>> ---------------------------------------------------------
>>> | avg. io write time | 1561 cycles   | 9 cycles         |
>>> ---------------------------------------------------------
>>> I don't know why the io access time in CacheCPU is much bigger
>>> than in DetailedCPU. I think that the time in CacheCPU mode
>>> should be less than, or at least equal to, the time in DetailedCPU
>>> mode. This is strange to me. Could anybody explain it?
>>> Thanks a lot.
>>> Best wishes,
>>> Richard R. Zhang
>>> 2006-04-26
>>> -------------------------------------------------------
>>> Using Tomcat but need to do more? Need to support web services, security?
>>> Get stuff done quickly with pre-integrated technology to make your job easier
>>> Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo
>>> http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642
>>> _______________________________________________
>>> m5sim-users mailing list
>>> [email protected]
>>> https://lists.sourceforge.net/lists/listinfo/m5sim-users
>>
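
The measurement pattern discussed in the thread above (read the counter, do the access, then read the counter again with the loaded value as a fake operand) can be sketched in portable C. This is only an illustration under stated assumptions: the tick source and the register read are injected as function pointers, and fake_tick, fake_readl, and fake_now are hypothetical stand-ins for rpcc/gettick and readl so that the sketch can run outside the simulator. In plain C a compiler is free to drop the `dep & 0` term; in the driver the ordering comes from the extra input operand of the asm statement.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Injected operations (hypothetical): a tick source that takes a fake
 * dependency operand, and a 32-bit register read. */
typedef uint64_t (*tick_fn)(uint64_t dep);
typedef uint32_t (*read_fn)(const volatile void *addr);

/* Accumulated totals, like ior_cycles / ior_count in the driver code. */
struct io_stats {
    uint64_t cycles;
    uint64_t count;
};

/* Time one read: the second tick takes the loaded value as its operand,
 * mirroring "after = gettick(ret)" in the quoted driver code. */
static uint32_t timed_read(tick_fn tick, read_fn rd,
                           const volatile void *addr, struct io_stats *st)
{
    uint64_t before, after;
    uint32_t value;

    assert(st != NULL);
    before = tick(0);
    value = rd(addr);
    after = tick(value);

    st->cycles += after - before;
    st->count++;
    return value;
}

/* Stand-ins for testing: a fake clock that only the read advances. */
static uint64_t fake_now;

static uint64_t fake_tick(uint64_t dep)
{
    /* (dep & 0) is always 0; it only names dep as an input. */
    return fake_now + (dep & 0);
}

static uint32_t fake_readl(const volatile void *addr)
{
    fake_now += 1500;   /* assumed device latency for the illustration */
    return *(const volatile uint32_t *)addr;
}
```

With these stand-ins, each call to timed_read(fake_tick, fake_readl, &reg, &st) adds 1500 to st.cycles and 1 to st.count, so the average per access is st.cycles / st.count.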

