Re: [m5sim-users] Is there anything wrong with the io access latency?

Ali Saidi Tue, 16 May 2006 07:29:48 -0700

Richard,

If you set pc_sample_profile=<num cpu cycles per sample> on a detailed CPU you'll get a file called m5prof.<cpuname> which has the profile information. You can then use the profile-top program in the util directory to convert this into broad categories.



Ali

On May 14, 2006, at 10:20 PM, Richard R. Zhang wrote:

Hi Steve,
Thanks a lot for your reply. I'll follow your suggestion and trace what is going on in that situation. When the results come out, I'll report them to the mailing list.
I have another question. I have read your paper, CSE-TR-505-04. Some figures in it show the CPU utilization percentage. How did you obtain those figures? Could you give me a suggestion about it? And if possible, please send me the detailed data behind them.
Many thanks to you!

Best wishes,

Richard R. Zhang
2006-05-15



From: Steve Reinhardt
Sent: 2006-05-14 10:13:31
To: Richard R. Zhang
Cc: Ali Saidi; Lisa Hsu; m5sim-users
Subject: Re: [m5sim-users] Is there anything wrong with the io access latency?

Hi Richard,
I think the only way to know for sure what's going on is to turn on tracing. Since you don't want to get flooded with irrelevant trace data, I'd suggest running it once to print out the cycle where the interesting activity starts, then running again using --Trace.start=<n> to enable tracing just before that point. If you set the InstExec, Bus, and Pipeline trace flags, plus the one for whatever device you're using, you should get a pretty complete picture of what's going on.

Anyone else have any other tips?

Steve

Richard R. Zhang wrote:
Hi Ali, Lisa, and Steve,
Thanks a lot for your help.
Following Ali's advice, I have tried two methods of measuring time in the guest system. Both still give the same result I described in my first mail: the I/O access time in CacheCPU is still much larger than in DetailedCPU.
Here are my methods.
Method 1. As before, I added an m5 instruction to the simulator. The following code fragments come from the files isa_desc and pseudo_inst.cc in my m5 code.
In isa_desc:
...
        0x24: gettick({{
            Ra = AlphaPseudo::gettick(xc->xcBase()) + (Rb & 0);
        }}, No_OpClass, IsNonSpeculative);
...
In pseudo_inst.cc:
...
        uint64_t
        gettick(ExecContext *xc)
        {
                return curTick;
        }
...
And I replaced the readl and writel functions in the Linux driver with the following functions.
static void __stat_writel(u32 v, volatile void __iomem *addr)
{
        int64_t before, after;

        after = 0;
        before = gettick(after);
        writel(v, addr);
        /* addr is passed as the dummy operand of gettick */
        after = gettick((int64_t)addr);

        if (enable_stat) {
                iow_cycles += (after - before);
                iow_count++;
        }
}

static u32 __stat_readl(const volatile void __iomem *addr)
{
        int64_t before, after;
        u32 ret;

        after = 0;
        before = gettick(after);
        ret = readl(addr);
        /* ret is passed as the dummy operand, so this read uses the load result */
        after = gettick(ret);

        if (enable_stat) {
                ior_cycles += (after - before);
                ior_count++;
        }

        return ret;
}
Method 2. I use the rpcc instruction to get the tick value instead of my pseudo-instruction. The only difference in the driver code is that the gettick function is replaced by the __rpcc function, which is a wrapper around the rpcc instruction.
static __inline int64_t _rpcc(int64_t dep)
{
        int64_t res;
        asm volatile ("rpcc %0, %1" : "=r"(res) : "r"(dep) : "memory");
        return res;
}
I'm confused by this problem. I think an I/O register access cannot be as fast as a cache access. It seems that the data came from the cache, not the device. Because the instruction dependency has been added to my timing code, the out-of-order model cannot be the real reason. I have also found a strange thing: in the DetailedCPU model, the first I/O access time is similar to that in the CacheCPU model. The following text comes from the console log. The strings in it are printed by printk in the Linux driver.
XXXXXXXisrXXXXXXX
irq count:16, used 3973 cycles
 In this irq:
 io read count:1, used 1556 cycles
 io write count:1, used 1604 cycles
XXXXXXXisrXXXXXXX
irq count:17, used 4736 cycles
 In this irq:
 io read count:2, used 3240 cycles
 io write count:0, used 0 cycles

...from here, the simulator entered the detailed mode...
XXXXXXXisrXXXXXXX
irq count:18, used 4169 cycles
 In this irq:
 io read count:2, used 1623 cycles
 io write count:0, used 0 cycles
XXXXXXXisrXXXXXXX
irq count:19, used 3818 cycles
 In this irq:
 io read count:1, used 9 cycles
 io write count:1, used 11 cycles
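The log above reports a total per interrupt, so the cost of one access has to be divided out before the interrupts can be compared. A minimal sketch of that arithmetic follows; the struct and function names are hypothetical and are not part of the driver.

```c
#include <stdint.h>

/* One "In this irq" record from the console log (hypothetical type). */
struct io_sample {
        uint64_t count;   /* number of accesses in this interrupt */
        uint64_t cycles;  /* total cycles attributed to those accesses */
};

/* Average cycles per access; returns 0.0 when no access was recorded. */
double avg_cycles(struct io_sample s)
{
        return s.count ? (double)s.cycles / (double)s.count : 0.0;
}
```

Applied to the read counters in the log, this gives 1556 and 1620 cycles per read for irq 16 and 17, 811.5 for irq 18 (the first interrupt in detailed mode), and 9 for irq 19.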

Could anyone give me some more advice? Thanks a lot!

Richard R. Zhang
2006-05-12



From: Ali Saidi
Sent: 2006-04-27 12:14:54
To: Steve Reinhardt
Cc: Richard R. Zhang; Lisa Hsu; m5sim-users
Subject: Re: [m5sim-users] Is there something wrong with the io access latency?

I believe Steve is exactly correct: the out-of-order model is not enforcing a dependency between your two instructions. The way to fix it is to force a dependency on a register (for example the result of the load). You need to do this both in the decoder and in the code that executes the instruction.
For example for the rpcc instruction (this code may be a little bit newer than yours, but same idea):
/* Rb is a fake dependency so here is a fun way to get the parser
   to understand that. */
Ra = xc->readMiscRegWithEffect(AlphaISA::IPR_CC, fault) + (Rb & 0);
and in some code:

inline uint32_t cycleCounter(uint32_t dep)
{
        uint32_t res;
        asm volatile ("rpcc %0, %1" : "=r"(res) : "r"(dep) : "memory");
        return res;
}

t1 = cycleCounter(trash);
for (x = 0; x < count; x++) {
        trash = readl(addr);
        t2 = cycleCounter(trash);
}
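The same measurement pattern can be tried on any host with a simulated counter in place of rpcc. This is only a sketch under stated assumptions: fake_cycles, fake_readl and time_reads are hypothetical names, and the `(dep & 0)` term merely mirrors the decoder trick above. A host compiler is free to optimize it away, whereas on Alpha it is the "r"(dep) input operand of the asm statement that carries the dependency.

```c
#include <stdint.h>

/* Hypothetical stand-in for the hardware cycle counter. */
static uint64_t fake_cycles = 0;

/* Mirrors the rpcc wrapper: dep contributes nothing to the value,
 * it is only named so that the read uses the previous result. */
static uint64_t cycle_counter(uint64_t dep)
{
        return fake_cycles + (dep & 0);   /* (dep & 0) is always 0 */
}

/* Hypothetical device read that costs a fixed number of simulated cycles. */
static uint32_t fake_readl(uint64_t cost)
{
        fake_cycles += cost;
        return 0xdeadbeefu;
}

/* Time `count` reads; every counter read consumes the preceding load result. */
uint64_t time_reads(unsigned count, uint64_t cost_per_read)
{
        uint32_t trash = 0;
        uint64_t t1 = cycle_counter(trash);
        uint64_t t2 = t1;

        for (unsigned x = 0; x < count; x++) {
                trash = fake_readl(cost_per_read);
                t2 = cycle_counter(trash);
        }
        return t2 - t1;
}
```

With this stub, time_reads(4, 100) returns 400, the total simulated cost of the four reads.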


 Ali
On Apr 26, 2006, at 9:59 PM, Steve Reinhardt wrote:
My guess would be that it has to do with the out-of-order scheduling in the detailed CPU. If the instruction that reads curTick has no dependence on the read or write instructions, then it will get executed out-of-order while the read or write is still stalled.
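The effect described here can be illustrated with a toy scheduling model. It is an assumption made purely for illustration and not M5's actual pipeline: an instruction issues at the later of its program-order slot and the cycle its source operand becomes ready, and a counter read returns the cycle at which it issues. The function name is hypothetical.

```c
#include <stdint.h>

static uint64_t max_u64(uint64_t a, uint64_t b)
{
        return a > b ? a : b;
}

/*
 * Cycles measured between two counter reads placed around a load.
 * chained != 0 means the second counter read depends on the load result.
 */
uint64_t measured_delta(uint64_t load_latency, int chained)
{
        uint64_t first_read  = 0;                          /* slot 0, no sources */
        uint64_t load_issue  = 1;                          /* slot 1 */
        uint64_t load_done   = load_issue + load_latency;  /* result available */
        uint64_t src_ready   = chained ? load_done : 0;    /* wait for the load? */
        uint64_t second_read = max_u64(2, src_ready);      /* slot 2 */

        return second_read - first_read;
}
```

In this model a 1500-cycle load is measured as 1501 cycles when the second read is chained to it, but as only 2 cycles when it is not, which is the kind of gap the unchained measurement would show.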
I remember that we ran into this problem ourselves but I don't remember the details of how we solved it... Ali or Nate, can you help here?

   Steve

Richard R. Zhang wrote:
Hi Lisa and all M5 users,
I find something strange with the io access latency. Could you give me a hint about it?
I have added a new instruction to the Alpha ISA. This instruction can get the curTick in M5. It seems to work correctly, so I plan to use it to measure time in the guest OS. I then added some statements to the ns83820 driver. These statements compute the time used by the driver irq routine and by io accesses (just the time of writel and readl). But the results below puzzle me, and I can't explain them. These results come from a netperf maerts test under Sampler mode, and the memory configuration is STE.

---------------------------------------------------------
|                    | CacheCPU mode | DetailedCPU mode |
---------------------------------------------------------
| avg. io read time  | 1581 cycles   | 40 cycles        |
---------------------------------------------------------
| avg. io write time | 1561 cycles   | 9 cycles         |
---------------------------------------------------------

I don't know why the io access time in CacheCPU is so much larger than in DetailedCPU. I think the time in CacheCPU mode should be less than, or at least equal to, that in DetailedCPU mode. This is strange to me. Could anybody give me an explanation?
Thanks a lot.
Best wishes,
Richard R. Zhang
2006-04-26
-------------------------------------------------------
Using Tomcat but need to do more? Need to support web services, security?
Get stuff done quickly with pre-integrated technology to make your job easier
Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo
http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642
_______________________________________________
m5sim-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/m5sim-users



