Hi Ali, Lisa, and Steve,
Thanks a lot for your help.
Following Ali's advice, I have tried two methods of measuring time in the
guest system. Both still give the same results as I mentioned in my first
mail: the I/O access time in CacheCPU mode is still much larger than in
DetailedCPU mode.
Here are my methods.
Method 1. As before, I added an m5 pseudo instruction to the simulator. The
following code fragments come from the files isa_desc and pseudo_inst.cc in
my m5 code.
In isa_desc:
...
0x24: gettick({{
    Ra = AlphaPseudo::gettick(xc->xcBase()) + (Rb & 0);
}}, No_OpClass, IsNonSpeculative);
...
In pseudo_inst.cc:
...
uint64_t
gettick(ExecContext *xc)
{
    return curTick;
}
...
I then replaced the readl and writel calls in the Linux driver with the
following wrapper functions.
static void __stat_writel(u32 v, volatile void __iomem *addr)
{
    int64_t before, after;
    after = 0;
    before = gettick(after);          /* timestamp before the write */
    writel(v, addr);
    after = gettick((int64_t)addr);   /* timestamp after; operand is addr */
    if (enable_stat) {
        iow_cycles += (after - before);
        iow_count++;
    }
}
static u32 __stat_readl(const volatile void __iomem *addr)
{
    int64_t before, after;
    u32 ret;
    after = 0;
    before = gettick(after);          /* timestamp before the read */
    ret = readl(addr);
    after = gettick(ret);             /* operand is the loaded value */
    if (enable_stat) {
        ior_cycles += (after - before);
        ior_count++;
    }
    return ret;
}
Method 2. I use the rpcc instruction to get the tick value instead of my own
pseudo instruction. The only difference in the driver code is that the gettick
function is replaced by the _rpcc function, which is a wrapper around the rpcc
instruction.
static __inline int64_t _rpcc(int64_t dep)
{
    int64_t res;
    asm volatile ("rpcc %0, %1" : "=r"(res) : "r"(dep) : "memory");
    return res;
}
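A side note on Method 2: as far as I understand the Alpha architecture, only
the low 32 bits of the rpcc result are the cycle counter, and the upper 32
bits are an OS-dependent offset. Whether M5 models this in exactly the same
way is an assumption on my part. If it does, the difference of two raw values
is safer when taken on the low 32 bits only. A minimal sketch, where cc_delta
is a hypothetical helper name that is not in my driver:

```c
#include <stdint.h>

/* Hypothetical helper (not part of the driver code above): elapsed cycles
 * between two raw rpcc results. Assumes only the low 32 bits hold the
 * cycle counter and that the counter wraps at 2^32. */
static inline uint32_t cc_delta(int64_t before, int64_t after)
{
    uint32_t b = (uint32_t)before;  /* drop the upper 32 bits */
    uint32_t a = (uint32_t)after;
    return a - b;                   /* modulo 2^32, valid across one wrap */
}
```

With this, the accumulation would read iow_cycles += cc_delta(before, after);
instead of using the plain 64-bit difference.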
I'm confused by this problem. I don't think an I/O register access can be as
fast as a cache access, yet it looks as if the data came from the cache
rather than from the device. Since my timing code now includes the
instruction dependency, the out-of-order model should not be the real reason.
I have also found a strange thing: in the DetailedCPU model, the first I/O
access time is similar to that in the CacheCPU model. The following text
comes from the console log. The strings in it are printed by printk in the
Linux driver.
XXXXXXXisrXXXXXXX
irq count:16, used 3973 cycles
In this irq:
io read count:1, used 1556 cycles
io write count:1, used 1604 cycles
XXXXXXXisrXXXXXXX
irq count:17, used 4736 cycles
In this irq:
io read count:2, used 3240 cycles
io write count:0, used 0 cycles
...from here, the simulator entered the detailed mode...
XXXXXXXisrXXXXXXX
irq count:18, used 4169 cycles
In this irq:
io read count:2, used 1623 cycles
io write count:0, used 0 cycles
XXXXXXXisrXXXXXXX
irq count:19, used 3818 cycles
In this irq:
io read count:1, used 9 cycles
io write count:1, used 11 cycles
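The log above only reports totals per interrupt, so it cannot show whether
the 1623 cycles for the two reads in irq 18 were split evenly or were one
slow access followed by one fast access. One way to check this would be to
record the minimum and maximum latency as well as the sum. A minimal sketch,
in which struct io_stat and io_stat_add are hypothetical names, not existing
driver code:

```c
#include <stdint.h>

/* Hypothetical per-access statistics; all names are illustrative only.
 * The struct is expected to start out zero-initialized. */
struct io_stat {
    uint64_t count;   /* number of accesses recorded */
    uint64_t total;   /* sum of all latencies, in cycles */
    uint64_t min;     /* smallest latency seen (valid when count > 0) */
    uint64_t max;     /* largest latency seen */
};

/* Record one access latency and update the running min, max and sum. */
static void io_stat_add(struct io_stat *s, uint64_t cycles)
{
    if (s->count == 0 || cycles < s->min)
        s->min = cycles;
    if (cycles > s->max)
        s->max = cycles;
    s->total += cycles;
    s->count++;
}
```

The wrappers would then call io_stat_add(&read_stat, after - before) in place
of the two separate counters, and printk could report min and max next to
the total.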
Could anyone give me some more advice? Thanks a lot!
Richard R. Zhang
2006-05-12
From: Ali Saidi
Sent: 2006-04-27 12:14:54
To: Steve Reinhardt
Cc: Richard R. Zhang; Lisa Hsu; m5sim-users
Subject: Re: [m5sim-users] Is there something wrong with the io access latency?
I believe Steve is exactly correct, the out-of-order model is not
enforcing a dependency between your two instructions. The way to fix
it is to force a dependency on a register (for example the result of
the load). You need to do this both in the decoder and in the code
that executes the instruction.
For example for the rpcc instruction (this code may be a little bit
newer than yours, but same idea):
/* Rb is a fake dependency so here is a fun way to get the parser
   to understand that. */
Ra = xc->readMiscRegWithEffect(AlphaISA::IPR_CC, fault) + (Rb & 0);
and in some code:
inline uint32_t cycleCounter(uint32_t dep)
{
    uint32_t res;
    asm volatile ("rpcc %0, %1" : "=r"(res) : "r" (dep) : "memory");
    return res;
}
t1 = cycleCounter(trash);
for (x = 0; x < count; x++) {
    trash = readl(addr);
    t2 = cycleCounter(trash);
}
Ali
On Apr 26, 2006, at 9:59 PM, Steve Reinhardt wrote:
>
> My guess would be that it has to do with the out-of-order
> scheduling in the detailed CPU. If the instruction that reads
> curTick has no dependence on the read or write instructions, then
> it will get executed out-of-order while the read or write is still
> stalled.
>
> I remember that we ran into this problem ourselves but I don't
> remember the details of how we solved it... Ali or Nate, can you
> help here?
>
> Steve
>
> Richard R. Zhang wrote:
> > Hi Lisa and all M5 users,
> > I have found something strange with the I/O access latency. Could you
> > give me a hint about it?
> > I have added a new instruction to the Alpha ISA. This instruction
> > returns curTick in M5, and it seems to work correctly, so I plan
> > to use it to measure time in the guest OS. I then added some
> > statements to the ns83820 driver. These statements compute the
> > time used by the driver IRQ routine and by the I/O accesses (just
> > the time of writel and readl). But the results below puzzle me, and
> > I can't explain them. These results come from a netperf maerts test
> > under Sampler mode, and the memory configuration is STE.
> > -----------------------------------------------------------
> > |                     | CacheCPU mode | DetailedCPU mode |
> > -----------------------------------------------------------
> > | avg. I/O read time  | 1581 cycles   | 40 cycles        |
> > -----------------------------------------------------------
> > | avg. I/O write time | 1561 cycles   | 9 cycles         |
> > -----------------------------------------------------------
> > I don't know why the I/O access time in CacheCPU mode is much larger
> > than in DetailedCPU mode. I think that the time in CacheCPU mode
> > should be less than, or at least equal to, that in DetailedCPU mode.
> > This is strange to me. Could anybody explain it?
> > Thanks a lot.
> > Best wishes,
> > Richard R. Zhang
> > 2006-04-26
> > _______________________________________________
> > m5sim-users mailing list
> > [email protected]
> > https://lists.sourceforge.net/lists/listinfo/m5sim-users