Richard,

If you set pc_sample_profile=<num cpu cycles per sample> on a detailed CPU, you'll get a file called m5prof.<cpuname> that contains the profile information. You can then use the profile-top program in the util directory to group it into broad categories.
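To illustrate what grouping PC samples into categories involves, here is a minimal sketch that counts samples falling into labeled address ranges. The struct, the function name bucket_pc_samples, and the ranges in the usage are hypothetical. They do not describe the m5prof file format or how profile-top actually works.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical category table entry: a PC address range with a label.
 * Not taken from m5prof.<cpuname> or profile-top. */
struct pc_category {
    uint64_t start;   /* first PC in the range (inclusive) */
    uint64_t end;     /* end of the range (exclusive) */
    const char *name; /* label such as "driver" or "kernel" */
};

/* Count how many PC samples fall into each category. counts[i] receives the
 * number of samples inside cats[i]; the first matching range wins. Samples
 * that match no range are returned as the "other" count. */
static size_t bucket_pc_samples(const uint64_t *pcs, size_t npcs,
                                const struct pc_category *cats, size_t ncats,
                                size_t *counts)
{
    size_t other = 0;

    assert(counts != NULL || ncats == 0);
    for (size_t i = 0; i < ncats; i++)
        counts[i] = 0;

    for (size_t s = 0; s < npcs; s++) {
        size_t c;
        for (c = 0; c < ncats; c++) {
            if (pcs[s] >= cats[c].start && pcs[s] < cats[c].end) {
                counts[c]++;
                break;
            }
        }
        if (c == ncats)
            other++;
    }
    return other;
}
```

Dividing each count by the total number of samples gives the fraction of time spent in that category.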


Ali

On May 14, 2006, at 10:20 PM, Richard R. Zhang wrote:

Hi Steve,
Thanks a lot for your reply. I'll follow your suggestion and trace what is going on in that situation. When I have results, I'll report them to the mailing list.
I have another question. I have read your paper, CSE-TR-505-04. Some figures in it show the CPU utilization percentage. How did you obtain those numbers? Could you give me some advice on this? And, if possible, please send me the underlying data.
Many thanks!

Best wishes,

Richard R. Zhang
2006-05-15



From: Steve Reinhardt
Sent: 2006-05-14 10:13:31
To: Richard R. Zhang
Cc: Ali Saidi; Lisa Hsu; m5sim-users
Subject: Re: [m5sim-users] Is there anything wrong with the io access latency?


Hi Richard,

I think the only way to know for sure what's going on is to turn on tracing. Since you don't want to get flooded with irrelevant trace data, I'd suggest running it once to print out the cycle where the interesting activity starts, then running again with --Trace.start=<n> to enable tracing just before that point. If you set the InstExec, Bus, and Pipeline trace flags, plus the one for whatever device you're using, you should get a pretty complete picture of what's going on.
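The two-run approach boils down to picking a start tick a little before the first interesting event. A minimal sketch, assuming the first, untraced run printed that event's tick. The helper name trace_start_before and the margin are hypothetical; the result is the kind of value you would pass to Trace.start.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: return a start tick `margin` ticks before the first
 * interesting event, so the trace also covers the lead-up to it.
 * Saturates at 0 instead of wrapping around for very early events. */
static uint64_t trace_start_before(uint64_t first_event_tick, uint64_t margin)
{
    return first_event_tick > margin ? first_event_tick - margin : 0;
}
```

A generous margin costs some extra trace output but makes it less likely that the cause of the behavior falls just outside the traced window.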

Anyone else have any other tips?

Steve

Richard R. Zhang wrote:
Hi Ali, Lisa, and Steve,
Thanks a lot for your help.
Following Ali's advice, I tried two methods of measuring time in the guest system, but both gave the same results I described in my first mail: the I/O access time in CacheCPU is still much larger than in DetailedCPU.
Here are my methods.
Method 1. As before, I added an M5 instruction to the simulator. The following code fragments come from isa_desc and pseudo_inst.cc in my M5 code.
In isa_desc:
...
        0x24: gettick({{
            Ra = AlphaPseudo::gettick(xc->xcBase()) + (Rb & 0);
        }}, No_OpClass, IsNonSpeculative);
...
In pseudo_inst.cc:
...
        uint64_t
        gettick(ExecContext *xc)
        {
                return curTick;
        }
...
I also replaced readl and writel in the Linux driver with the following functions:

static void __stat_writel(u32 v, volatile void __iomem *addr)
{
        int64_t before, after;
        after = 0;
        /* first timestamp; "after" (0) is only a dummy source operand */
        before = gettick(after);
        writel(v, addr);
        /* second timestamp; its only input is addr */
        after = gettick((int64_t)addr);

        if (enable_stat) {
                iow_cycles += (after - before);
                iow_count++;
        }
}

static u32 __stat_readl(const volatile void __iomem *addr)
{
        int64_t before, after;
        u32 ret;
        after = 0;
        before = gettick(after);
        ret = readl(addr);
        /* second timestamp; its input is the value returned by readl */
        after = gettick(ret);

        if (enable_stat) {
                ior_cycles += (after - before);
                ior_count++;
        }

        return ret;
}
Method 2. I used the rpcc instruction to get the tick value instead of my pseudo-instruction. The only difference in the driver code is that gettick is replaced by _rpcc, a wrapper around the rpcc instruction:
static __inline int64_t _rpcc(int64_t dep)
{
        int64_t res;
        asm volatile ("rpcc %0, %1" : "=r"(res) : "r"(dep) : "memory");
        return res;
}

I'm confused by this problem. I don't think an I/O register access can be as fast as a cache access; it looks as though the data came from the cache rather than from the device. Since I added an instruction dependency to my timing code, I don't think the out-of-order model can be the real reason. I have also found something strange: in the DetailedCPU model, the first I/O access time is similar to that in the CacheCPU model. The following text comes from the console log; the strings in it are printed by printk in the Linux driver.

XXXXXXXisrXXXXXXX
irq count:16, used 3973 cycles
 In this irq:
 io read count:1, used 1556 cycles
 io write count:1, used 1604 cycles
XXXXXXXisrXXXXXXX
irq count:17, used 4736 cycles
 In this irq:
 io read count:2, used 3240 cycles
 io write count:0, used 0 cycles

...from here, the simulator entered the detailed mode...
XXXXXXXisrXXXXXXX
irq count:18, used 4169 cycles
 In this irq:
 io read count:2, used 1623 cycles
 io write count:0, used 0 cycles
XXXXXXXisrXXXXXXX
irq count:19, used 3818 cycles
 In this irq:
 io read count:1, used 9 cycles
 io write count:1, used 11 cycles
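Each per-IRQ line reports an access count and a total, so the cost of a single access is the total divided by the count. For example, irq 17's two reads took 3240 cycles in total (1620 cycles each), irq 18's two reads took 1623 cycles (811.5 each), and irq 19's single read took 9 cycles. A small helper for that division; the name avg_cycles is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Average cycles per access, given the total cycles and the number of
 * accesses printed for one IRQ. Returns 0 when no access was recorded,
 * instead of dividing by zero. */
static double avg_cycles(uint64_t total_cycles, uint64_t count)
{
    return count ? (double)total_cycles / (double)count : 0.0;
}
```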

Could anyone give me some more advice? Thanks a lot!

Richard R. Zhang
2006-05-12



From: Ali Saidi
Sent: 2006-04-27 12:14:54
To: Steve Reinhardt
Cc: Richard R. Zhang; Lisa Hsu; m5sim-users
Subject: Re: [m5sim-users] Is there something wrong with the io access latency?

I believe Steve is exactly right: the out-of-order model is not enforcing a dependency between your two instructions. The way to fix it is to force a dependency through a register (for example, the result of the load). You need to do this both in the decoder and in the code that executes the instruction.

For example, for the rpcc instruction (this code may be a little newer than yours, but it's the same idea):

/* Rb is a fake dependency, so here is a fun way to get the parser
   to understand that. */
Ra = xc->readMiscRegWithEffect(AlphaISA::IPR_CC, fault) + (Rb & 0);

and in the measurement code:

inline uint32_t cycleCounter(uint32_t dep)
{
        uint32_t res;
        asm volatile ("rpcc %0, %1" : "=r"(res) : "r"(dep) : "memory");
        return res;
}

        t1 = cycleCounter(trash);
        for (x = 0; x < count; x++) {
                trash = readl(addr);
                t2 = cycleCounter(trash);
        }
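Because cycleCounter returns a uint32_t, the elapsed time is best computed in unsigned 32-bit arithmetic, which also stays correct if the counter wraps around once; dividing by count then gives cycles per readl. A sketch with hypothetical helper names, assuming t1 holds the reading taken before the loop and t2 the reading taken after its last iteration.

```c
#include <assert.h>
#include <stdint.h>

/* Cycles between two 32-bit counter readings. Unsigned subtraction is done
 * modulo 2^32, so the result is correct even if the counter wrapped around
 * once between the two reads. */
static uint32_t elapsed32(uint32_t start, uint32_t end)
{
    return (uint32_t)(end - start);
}

/* Average cycles per readl() over the loop: t1 read before the loop,
 * t2 read after the last iteration. */
static double cycles_per_read(uint32_t t1, uint32_t t2, unsigned count)
{
    assert(count > 0);
    return (double)elapsed32(t1, t2) / count;
}
```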


 Ali

On Apr 26, 2006, at 9:59 PM, Steve Reinhardt wrote:

My guess would be that it has to do with the out-of-order scheduling in the detailed CPU. If the instruction that reads curTick has no dependence on the read or write instruction, it will be executed out of order while the read or write is still stalled.
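To make the effect concrete, here is a deliberately simplified timing model. It is not M5's pipeline, and the function name and tick values are illustrative assumptions. With a 1500-tick load, a timestamp that depends on the load measures roughly the full latency, while an independent one can execute right away and measures almost nothing.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model (not M5's pipeline): a first timestamp is taken at tick t0, a
 * load issues on the next tick and takes load_latency ticks, then a second
 * timestamp follows. If the second timestamp depends on the load's result,
 * it cannot execute until the load completes; otherwise an out-of-order core
 * may execute it one tick after the load issues. Returns the measured
 * difference between the two timestamps. */
static uint64_t measured_ticks(uint64_t t0, uint64_t load_latency,
                               int depends_on_load)
{
    assert(load_latency >= 1);
    uint64_t load_issue = t0 + 1;
    uint64_t load_done = load_issue + load_latency;
    uint64_t t_end = depends_on_load ? load_done : load_issue + 1;
    return t_end - t0;
}
```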

I remember that we ran into this problem ourselves, but I don't remember the details of how we solved it... Ali or Nate, can you help here?

Steve

Richard R. Zhang wrote:
Hi Lisa and all M5 users,
I've found something strange with the I/O access latency. Could you give me a hint?
I have added a new instruction to the Alpha ISA that reads curTick in M5. It seems to work correctly, so I plan to use it to measure time in the guest OS. I then added some statements to the ns83820 driver that measure the time spent in the driver's IRQ routine and in I/O accesses (only the writel and readl calls). But the results below puzzle me, and I can't explain them. They come from a netperf maerts test in Sampler mode, with the STE memory configuration.
-------------------------------------------------------------
|                        | CacheCPU mode | DetailedCPU mode |
-------------------------------------------------------------
| Average I/O read time  | 1581 cycles   | 40 cycles        |
| Average I/O write time | 1561 cycles   | 9 cycles         |
-------------------------------------------------------------
I don't know why the I/O access time in CacheCPU mode is so much larger than in DetailedCPU mode. I would expect the time in CacheCPU mode to be less than, or at most equal to, the time in DetailedCPU mode. This is strange to me. Could anybody explain it?
Thanks a lot.
Best wishes,
Richard R. Zhang
2006-04-26
-------------------------------------------------------
Using Tomcat but need to do more? Need to support web services, security?
Get stuff done quickly with pre-integrated technology to make your job easier
Download IBM WebSphere Application Server v. 1.0.1 based on Apache Geronimo
http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642
_______________________________________________
m5sim-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/m5sim-users

