Hi Steve,
Thanks a lot for your reply. I'll follow your suggestion and trace what is
going on in that situation. When I have the results, I'll report them to the
mailing list.
I have another question. I have read your paper, CSE-TR-505-04. Some figures
in it show the CPU utilization percentage. How did you obtain those figures?
Could you give me some suggestions? If possible, please also send me the
detailed data behind them.
Many thanks!

Best wishes,

Richard R. Zhang
2006-05-15



From: Steve Reinhardt
Sent: 2006-05-14 10:13:31
To: Richard R. Zhang
Cc: Ali Saidi; Lisa Hsu; m5sim-users
Subject: Re: [m5sim-users] Is there anything wrong with the io access latency?


Hi Richard,

I think the only way to know for sure what's going on is to turn on
tracing. Since you don't want to get flooded with irrelevant trace
data, I'd suggest running it once to print out the cycle where the
interesting activity starts, then running again using --Trace.start=<n>
to enable tracing just before that point. If you set the InstExec, Bus,
and Pipeline trace flags, plus the one for whatever device you're using,
you should get a pretty complete picture of what's going on.

Anyone else have any other tips?

Steve

Richard R. Zhang wrote:
> Hi Ali, Lisa, and Steve,
> Thanks a lot for your help.
> Following Ali's advice, I have tried two methods of implementing the time
> measurement in the guest system. Both still give the same results that I
> mentioned in my first mail: the I/O access time in CacheCPU mode is still
> much larger than in DetailedCPU mode.
> Here are my methods.
> Method 1. As I did before, I added an m5 pseudo instruction to the
> simulator. The following code fragments come from the files isa_desc and
> pseudo_inst.cc in my m5 code.
> In isa_desc:
> ...
>     0x24: gettick({{
>         Ra = AlphaPseudo::gettick(xc->xcBase()) + (Rb & 0);
>     }}, No_OpClass, IsNonSpeculative);
> ...
> In pseudo_inst.cc:
> ...
>     uint64_t
>     gettick(ExecContext *xc)
>     {
>         return curTick;
>     }
> ...
> And I replaced the readl and writel functions in the Linux driver with
> the following functions.
> static void __stat_writel(u32 v, volatile void __iomem *addr)
> {
>     int64_t before, after;
>     after = 0;
>     before = gettick(after);
>     writel(v, addr);
>     after = gettick((int64_t)addr);
>
>     if (enable_stat) {
>         iow_cycles += (after - before);
>         iow_count++;
>     }
> }
> static u32 __stat_readl(const volatile void __iomem *addr)
> {
>     int64_t before, after;
>     u32 ret;
>     after = 0;
>     before = gettick(after);
>     ret = readl(addr);
>     after = gettick(ret);
>
>     if (enable_stat) {
>         ior_cycles += (after - before);
>         ior_count++;
>     }
>
>     return ret;
> }
> Method 2. I use the rpcc instruction to get the tick value instead of my
> own pseudo instruction. The only difference in the driver code is that the
> gettick function is replaced by the _rpcc function, which is a wrapper
> around the rpcc instruction.
> static __inline int64_t _rpcc(int64_t dep)
> {
>     int64_t res;
>     asm volatile ("rpcc %0, %1" : "=r"(res) : "r"(dep) : "memory");
>     return res;
> }
>  
> I'm confused by this problem. I would not expect an I/O register access to
> be as fast as a cache access. It seems as if the data came from the cache
> rather than from the device. Because I added the instruction dependency to
> my time measurement code, the out-of-order model should not be the real
> reason. I have also found a strange thing: in the DetailedCPU model, the
> first I/O access time is similar to that of the CacheCPU model. The
> following text comes from the console log. The strings in it are printed
> by printk in the Linux driver.
>  
> XXXXXXXisrXXXXXXX
> irq count:16, used 3973 cycles
>  In this irq:
>  io read count:1, used 1556 cycles
>  io write count:1, used 1604 cycles
> XXXXXXXisrXXXXXXX
> irq count:17, used 4736 cycles
>  In this irq:
>  io read count:2, used 3240 cycles
>  io write count:0, used 0 cycles
>
> ...from here, the simulator entered the detailed mode...
> XXXXXXXisrXXXXXXX
> irq count:18, used 4169 cycles
>  In this irq:
>  io read count:2, used 1623 cycles
>  io write count:0, used 0 cycles
> XXXXXXXisrXXXXXXX
> irq count:19, used 3818 cycles
>  In this irq:
>  io read count:1, used 9 cycles
>  io write count:1, used 11 cycles
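Dividing each logged cycle total by its access count gives the cost per
access. A minimal C sketch of that arithmetic follows; the struct and
function names are hypothetical and are not part of the driver.

```c
#include <stdint.h>

/* Hypothetical record of one ISR's I/O counters, mirroring the log fields. */
struct io_sample {
    uint64_t count;   /* number of accesses in this irq */
    uint64_t cycles;  /* total cycles spent in those accesses */
};

/* Average cycles per access; returns 0.0 when no access was recorded. */
static double avg_cycles(struct io_sample s)
{
    return s.count ? (double)s.cycles / (double)s.count : 0.0;
}
```

For the reads in the log above this gives 3240 / 2 = 1620 cycles for irq 17,
1623 / 2 = 811.5 cycles for irq 18, and 9 / 1 = 9 cycles for irq 19.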
>  
> Could anyone give me some more advice? Thanks a lot!
>
> Richard R. Zhang
> 2006-05-12
>  
>  
>  
> From: Ali Saidi
> Sent: 2006-04-27 12:14:54
> To: Steve Reinhardt
> Cc: Richard R. Zhang; Lisa Hsu; m5sim-users
> Subject: Re: [m5sim-users] Is there something wrong with the io access latency?
>  
> I believe Steve is exactly correct: the out-of-order model is not
> enforcing a dependency between your two instructions. The way to fix
> it is to force a dependency on a register (for example, the result of
> the load). You need to do this both in the decoder and in the code
> that executes the instruction.
>
> For example, for the rpcc instruction (this code may be a little bit
> newer than yours, but same idea):
>     /* Rb is a fake dependency so here is a fun way to get the parser
>        to understand that. */
>     Ra = xc->readMiscRegWithEffect(AlphaISA::IPR_CC, fault) + (Rb & 0);
>  
> and in some code:
>
> inline uint32_t cycleCounter(uint32_t dep)
> {
>     uint32_t res;
>     asm volatile ("rpcc %0, %1" : "=r"(res) : "r" (dep) : "memory");
>     return res;
> }
>
>     t1 = cycleCounter(trash);
>     for (x = 0; x < count; x++) {
>         trash = readl(addr);
>         t2 = cycleCounter(trash);
>     }
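
The chaining idea in the fragment above can be tried on any host with a
simulated counter. This is only a sketch under stated assumptions: sim_tick,
read_counter, device_read, and the 10 and 100 tick costs are all hypothetical
stand-ins, since rpcc and readl are only available on the Alpha guest. It
shows the data flow (before -> value -> after), not real out-of-order timing.

```c
#include <stdint.h>

/* Hypothetical stand-in for the simulated tick counter (curTick / rpcc). */
static uint64_t sim_tick = 0;

/* Read the counter. `dep` is a fake dependency: (dep & 0) is always 0, so
 * the returned value is unchanged. In the real asm wrapper the dependency
 * is carried by the "r"(dep) input operand; a C compiler is free to fold
 * this expression away. */
static uint64_t read_counter(uint64_t dep)
{
    sim_tick += 10;               /* assume a counter read costs 10 ticks */
    return sim_tick + (dep & 0);
}

/* Hypothetical stand-in for readl(): assume a device read costs 100 ticks. */
static uint32_t device_read(void)
{
    sim_tick += 100;
    return 0xdeadbeefu;
}

/* Time one device read with the two counter reads chained through data. */
static uint64_t timed_read_cycles(void)
{
    uint64_t before = read_counter(0);
    uint32_t value = device_read();
    uint64_t after = read_counter(value);  /* consumes the loaded value */
    return after - before;
}
```

Because the second counter read takes the loaded value as its input, an
out-of-order core cannot issue it until the load has returned, which is the
property the (Rb & 0) trick in the decoder is meant to preserve.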
>  
>  
>  Ali
>  
> On Apr 26, 2006, at 9:59 PM, Steve Reinhardt wrote:
>  
> > My guess would be that it has to do with the out-of-order
> > scheduling in the detailed CPU. If the instruction that reads
> > curTick has no dependence on the read or write instructions, then
> > it will get executed out-of-order while the read or write is still
> > stalled.
> >
> > I remember that we ran into this problem ourselves but I don't
> > remember the details of how we solved it... Ali or Nate, can you
> > help here?
> >
> > Steve
> >
> > Richard R. Zhang wrote:
> > > Hi Lisa and all M5 users,
> > > I have found something strange with the I/O access latency. Could you
> > > give me a hint about it?
> > > I have added a new instruction to the Alpha ISA. This instruction
> > > returns curTick in M5, and it seems to work correctly, so I plan
> > > to use it to measure time in the guest OS. I then added some
> > > statements to the ns83820 driver. These statements compute the
> > > time used by the driver irq routine and by the I/O accesses (only
> > > the time of writel and readl). But the results below puzzle me, and I
> > > can't explain them. They come from a netperf maerts test
> > > under Sampler mode, and the memory configuration is STE.
> > > ---------------------------------------------------------
> > > |                    | CacheCPU mode | DetailedCPU mode |
> > > ---------------------------------------------------------
> > > | avg. io read time  | 1581 cycles   | 40 cycles        |
> > > ---------------------------------------------------------
> > > | avg. io write time | 1561 cycles   | 9 cycles         |
> > > ---------------------------------------------------------
> > > I don't know why the I/O access time in CacheCPU mode is much larger
> > > than in DetailedCPU mode. I think the time in CacheCPU mode
> > > should be less than that in DetailedCPU mode, or at least equal to it.
> > > This is strange to me. Could anybody explain it?
> > > Thanks a lot.
> > > Best wishes,
> > > Richard R. Zhang
> > > 2006-04-26
> > > _______________________________________________
> > > m5sim-users mailing list
> > > [email protected]
> > > https://lists.sourceforge.net/lists/listinfo/m5sim-users
> >
> >