Hi Richard,
I think the only way to know for sure what's going on is to turn on
tracing. Since you don't want to get flooded with irrelevant trace
data, I'd suggest running it once to print out the cycle where the
interesting activity starts, then running again using --Trace.start=<n>
to enable tracing just before that point. If you set the InstExec, Bus,
and Pipeline trace flags, plus the one for whatever device you're using,
you should get a pretty complete picture of what's going on.
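A minimal sketch of one way to pick the <n> for --Trace.start: take the tick printed by the first run and back off by a safety margin so that tracing begins slightly before the interesting activity. The helper name trace_start_tick and the margin are hypothetical and not part of M5; only the --Trace.start option itself comes from the advice above.

```c
#include <stdint.h>

/* Hypothetical helper (not part of M5): choose a value for
 * --Trace.start given the tick observed on the first run.
 * Backs off by `margin` ticks, clamping at 0 so the unsigned
 * subtraction cannot wrap around. */
static uint64_t trace_start_tick(uint64_t observed_tick, uint64_t margin)
{
    return observed_tick > margin ? observed_tick - margin : 0;
}
```

For instance, if the first run reported tick 1000000 and you want 5000 ticks of lead-in, this gives 995000.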
Anyone else have any other tips?
Steve
Richard R. Zhang wrote:
> Hi Ali, Lisa, and Steve,
> Thanks a lot for your help.
> Following Ali's advice, I have tried two methods of implementing the time
> measurement in the guest system. Both still give the same results as I
> reported in my first mail: the I/O access time in CacheCPU is still much
> larger than in DetailedCPU.
> Here are my methods.
> Method 1. As before, I added an m5 instruction to the simulator. The
> following code fragments come from the files isa_desc and pseudo_inst.cc
> in my m5 code.
> In isa_desc:
> ...
> 0x24: gettick({{
>     Ra = AlphaPseudo::gettick(xc->xcBase()) + (Rb & 0);
> }}, No_OpClass, IsNonSpeculative);
> ...
> In pseudo_inst.cc:
> ...
> uint64_t
> gettick(ExecContext *xc)
> {
>     return curTick;
> }
> ...
> I also replaced the readl and writel calls in the Linux driver with the
> following wrapper functions.
> static void __stat_writel(u32 v, volatile void __iomem *addr)
> {
>     int64_t before, after;
>
>     after = 0;
>     before = gettick(after);
>     writel(v, addr);
>     after = gettick((int64_t)addr);
>
>     if (enable_stat) {
>         iow_cycles += (after - before);
>         iow_count++;
>     }
> }
>
> static u32 __stat_readl(const volatile void __iomem *addr)
> {
>     int64_t before, after;
>     u32 ret;
>
>     after = 0;
>     before = gettick(after);
>     ret = readl(addr);
>     after = gettick(ret);
>
>     if (enable_stat) {
>         ior_cycles += (after - before);
>         ior_count++;
>     }
>
>     return ret;
> }
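For reference, a minimal sketch of how per-access averages can be derived from cycle/count pairs like ior_cycles/ior_count and iow_cycles/iow_count. The struct and function names here are hypothetical and not taken from the driver; the zero-count guard is there because an interval can record no accesses at all.

```c
#include <stdint.h>

/* Hypothetical accumulator mirroring the ior_cycles/ior_count and
 * iow_cycles/iow_count pairs used in the driver wrappers. */
struct io_stat {
    uint64_t cycles; /* total cycles spent in the recorded accesses */
    uint64_t count;  /* number of accesses recorded */
};

/* Record one access that took `elapsed` cycles. */
static void io_stat_add(struct io_stat *s, uint64_t elapsed)
{
    s->cycles += elapsed;
    s->count++;
}

/* Average cycles per access, using integer division. Returns 0 when
 * nothing was recorded, so an empty interval does not divide by zero. */
static uint64_t io_stat_avg(const struct io_stat *s)
{
    return s->count ? s->cycles / s->count : 0;
}
```

Keeping the total and the count separate, and dividing only when reporting, avoids accumulating rounding error from per-access averages.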
> Method 2. I use the rpcc instruction to get the tick value instead of my
> pseudo instruction. The only difference in the driver code is that the
> gettick function is replaced by the _rpcc function, a wrapper around the
> rpcc instruction.
> static __inline int64_t _rpcc(int64_t dep)
> {
>     int64_t res;
>     asm volatile ("rpcc %0, %1" : "=r"(res) : "r"(dep) : "memory");
>     return res;
> }
>
> I'm confused by this problem. I don't think an I/O register access can
> be as fast as a cache access. It looks as if the data came from the
> cache, not the device. Since my timing code now includes the instruction
> dependency, the out-of-order model cannot be the real reason. I have also
> found a strange thing: in the DetailedCPU model, the first I/O access time
> is similar to that in the CacheCPU model. The following text comes from
> the console log. The strings in it are printed by printk in the Linux
> driver.
>
> XXXXXXXisrXXXXXXX
> irq count:16, used 3973 cycles
> In this irq:
> io read count:1, used 1556 cycles
> io write count:1, used 1604 cycles
> XXXXXXXisrXXXXXXX
> irq count:17, used 4736 cycles
> In this irq:
> io read count:2, used 3240 cycles
> io write count:0, used 0 cycles
>
> ...from here, the simulator entered the detailed mode...
> XXXXXXXisrXXXXXXX
> irq count:18, used 4169 cycles
> In this irq:
> io read count:2, used 1623 cycles
> io write count:0, used 0 cycles
> XXXXXXXisrXXXXXXX
> irq count:19, used 3818 cycles
> In this irq:
> io read count:1, used 9 cycles
> io write count:1, used 11 cycles
>
> Could anyone give me some more advice? Thanks a lot!
>
> Richard R. Zhang
> 2006-05-12
>
>
>
> From: Ali Saidi
> Sent: 2006-04-27 12:14:54
> To: Steve Reinhardt
> Cc: Richard R. Zhang; Lisa Hsu; m5sim-users
> Subject: Re: [m5sim-users] Is there something wrong with the io access latency?
>
> I believe Steve is exactly correct: the out-of-order model is not
> enforcing a dependency between your two instructions. The way to fix
> it is to force a dependency on a register (for example, the result of
> the load). You need to do this both in the decoder and in the code
> that executes the instruction.
>
> For example, for the rpcc instruction (this code may be a little bit
> newer than yours, but same idea):
> /* Rb is a fake dependency so here is a fun way to get the parser
>    to understand that. */
> Ra = xc->readMiscRegWithEffect(AlphaISA::IPR_CC, fault) + (Rb & 0);
>
> and in some code:
>
> inline uint32_t cycleCounter(uint32_t dep)
> {
>     uint32_t res;
>     asm volatile ("rpcc %0, %1" : "=r"(res) : "r"(dep) : "memory");
>     return res;
> }
>
> t1 = cycleCounter(trash);
> for (x = 0; x < count; x++) {
>     trash = readl(addr);
>     t2 = cycleCounter(trash);
> }
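A portable sketch of the same dependency-chaining pattern, for readers without an Alpha toolchain. Everything here is a stand-in: fake_tick, fake_readl, read_counter, and timed_read are hypothetical names, the clock is a plain variable rather than curTick or rpcc, and the latency is passed in by hand. It only illustrates the shape of the measurement (each counter read consumes a value produced by the previous step); it does not reproduce out-of-order scheduling, and a C compiler is free to fold `dep & 0` away, whereas in the ISA description the extra operand is what the decoder sees as a source register.

```c
#include <stdint.h>

/* Hypothetical stand-in for the simulated clock. */
static uint64_t fake_tick = 0;

/* Hypothetical stand-in for a device register read: advances the
 * clock by `latency` ticks and returns the register value. */
static uint32_t fake_readl(uint32_t reg_value, uint64_t latency)
{
    fake_tick += latency;
    return reg_value;
}

/* Counter read that takes a dummy operand. (dep & 0) is always 0, so
 * the returned value is unchanged; the operand exists only to tie the
 * counter read to whatever produced `dep`. */
static uint64_t read_counter(uint64_t dep)
{
    return fake_tick + (dep & 0);
}

/* Time one access. The second counter read takes the loaded value as
 * its dummy operand, so it is ordered after the load by data flow. */
static uint64_t timed_read(uint32_t reg_value, uint64_t latency,
                           uint32_t *out)
{
    uint64_t before = read_counter(0);
    uint32_t v = fake_readl(reg_value, latency);
    uint64_t after = read_counter(v);

    *out = v;
    return after - before;
}
```

The key design point is that the value fed to the second counter read is an output of the access being timed, not merely one of its inputs.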
>
>
> Ali
>
> On Apr 26, 2006, at 9:59 PM, Steve Reinhardt wrote:
>
>> My guess would be that it has to do with the out-of-order
>> scheduling in the detailed CPU. If the instruction that reads
>> curTick has no dependence on the read or write instructions, then
>> it will get executed out-of-order while the read or write is still
>> stalled.
>>
>> I remember that we ran into this problem ourselves but I don't
>> remember the details of how we solved it... Ali or Nate, can you
>> help here?
>>
>> Steve
>>
>> Richard R. Zhang wrote:
>>> Hi Lisa and all M5 users,
>>> I have found something strange with the I/O access latency. Could you
>>> give me a hint about it?
>>> I have added a new instruction to the Alpha ISA that returns curTick
>>> in M5. It seems to work correctly, so I plan to use it to measure time
>>> in the guest OS. I then added some statements to the ns83820 driver
>>> that compute the time used by the driver's irq routine and by I/O
>>> accesses (only the time of writel and readl). But the results below
>>> puzzle me, and I can't explain them. They come from a netperf maerts
>>> test under Sampler mode, and the memory configuration is STE.
>>> ---------------------------------------------------------
>>> |                    | CacheCPU mode | DetailedCPU mode |
>>> ---------------------------------------------------------
>>> | avg. io read time  | 1581 cycles   | 40 cycles        |
>>> ---------------------------------------------------------
>>> | avg. io write time | 1561 cycles   | 9 cycles         |
>>> ---------------------------------------------------------
>>> I don't know why the I/O access time in CacheCPU mode is much larger
>>> than in DetailedCPU mode. I think the time in CacheCPU mode should be
>>> less than, or at least equal to, the time in DetailedCPU mode.
>>> This is strange to me. Could anybody explain it?
>>> Thanks a lot.
>>> Best wishes,
>>> Richard R. Zhang
>>> 2006-04-26
>>> _______________________________________________
>>> m5sim-users mailing list
>>> [email protected]
>>> https://lists.sourceforge.net/lists/listinfo/m5sim-users