Hi Ali, Lisa, and Steve,
Thanks a lot for your help.
Following Ali's advice, I tried two methods of measuring time in the
guest system. Both still give the same results as in my first mail: the
I/O access time in CacheCPU mode is still much larger than in
DetailedCPU mode.
Here are my methods.
Method 1. As before, I added an m5 instruction to the simulator. The
following code fragments come from the files isa_desc and
pseudo_inst.cc in my m5 code.
In isa_desc:
...
0x24: gettick({{
    Ra = AlphaPseudo::gettick(xc->xcBase()) + (Rb & 0);
}}, No_OpClass, IsNonSpeculative);
...
In pseudo_inst.cc:
...
uint64_t
gettick (ExecContext *xc)
{
return curTick;
}
...
I then replaced the readl and writel calls in the Linux driver with the
following wrapper functions.
static void __stat_writel(u32 v, volatile void __iomem *addr)
{
    int64_t before, after;

    after = 0;
    before = gettick(after);
    writel(v, addr);
    after = gettick((int64_t)addr);
    if (enable_stat) {
        iow_cycles += (after - before);
        iow_count++;
    }
}

static u32 __stat_readl(const volatile void __iomem *addr)
{
    int64_t before, after;
    u32 ret;

    after = 0;
    before = gettick(after);
    ret = readl(addr);
    after = gettick(ret);
    if (enable_stat) {
        ior_cycles += (after - before);
        ior_count++;
    }
    return ret;
}
Method 2. I use the rpcc instruction to get the tick value instead of
my own pseudo instruction. The only difference in the driver code is
that the gettick function is replaced by the _rpcc function, which is a
wrapper around the rpcc instruction.
static __inline int64_t _rpcc(int64_t dep)
{
    int64_t res;

    asm volatile ("rpcc %0, %1" : "=r"(res) : "r"(dep) : "memory");
    return res;
}
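The measurement pattern shared by both methods can be sketched on an
ordinary host, outside the simulator. This is only an illustration
under assumptions: fake_ticks, fake_readl, and timed_readl are
hypothetical stand-ins (not M5 or Linux functions), the tick source is
a plain counter, and the 1500-tick device latency is an arbitrary
value picked for the example.

```c
#include <stdint.h>

/* Hypothetical tick counter standing in for curTick / rpcc. */
uint64_t fake_now;

/* Read the counter. 'dep' is a fake data dependency in the style of
 * (Rb & 0): it is always masked to 0, so the value is unchanged, but
 * the read consumes whatever produced 'dep'. */
uint64_t fake_ticks(uint64_t dep)
{
    return fake_now + (dep & 0);
}

/* Hypothetical device register read costing 1500 ticks (assumed). */
uint32_t fake_readl(void)
{
    fake_now += 1500;
    return 0xabcd1234u;
}

uint64_t ior_cycles;  /* accumulated read latency */
uint64_t ior_count;   /* number of timed reads */

/* Same shape as __stat_readl: timestamp, access, then a second
 * timestamp that takes the loaded value as its dependency. */
uint32_t timed_readl(void)
{
    uint64_t before = fake_ticks(0);
    uint32_t ret = fake_readl();
    uint64_t after = fake_ticks(ret);

    ior_cycles += after - before;
    ior_count++;
    return ret;
}
```

With these assumed numbers, each call to timed_readl adds 1500 to
ior_cycles and 1 to ior_count, which is the behaviour I expect from the
real driver wrappers when the dependency is honoured.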
I'm confused by this problem. I don't think an I/O register access can
be as fast as a cache access. It seems as if the data came from the
cache rather than from the device. Because my timing code now includes
the instruction dependency, the out-of-order model cannot be the real
reason. I have also found something strange: in the DetailedCPU model,
the first I/O access time is similar to that of the CacheCPU model. The
following text comes from the console log; the strings in it are
printed by printk in the Linux driver.
XXXXXXXisrXXXXXXX
irq count:16, used 3973 cycles
In this irq:
io read count:1, used 1556 cycles
io write count:1, used 1604 cycles
XXXXXXXisrXXXXXXX
irq count:17, used 4736 cycles
In this irq:
io read count:2, used 3240 cycles
io write count:0, used 0 cycles
...from here, the simulator entered the detailed mode...
XXXXXXXisrXXXXXXX
irq count:18, used 4169 cycles
In this irq:
io read count:2, used 1623 cycles
io write count:0, used 0 cycles
XXXXXXXisrXXXXXXX
irq count:19, used 3818 cycles
In this irq:
io read count:1, used 9 cycles
io write count:1, used 11 cycles
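For reference, dividing each logged total by its access count gives the
per-access average. The helper below is only a convenience sketch;
avg_cycles is a hypothetical name and is not part of the driver.

```c
#include <stdint.h>

/* Average cycles per access. Returns 0.0 when nothing was counted,
 * as in "io write count:0, used 0 cycles". */
double avg_cycles(uint64_t total_cycles, uint64_t count)
{
    if (count == 0)
        return 0.0;
    return (double)total_cycles / (double)count;
}
```

Applied to the log above, avg_cycles(3240, 2) is 1620 cycles per read
for irq 17, avg_cycles(1623, 2) is 811.5 for irq 18, and
avg_cycles(9, 1) is 9 for irq 19. The log does not separate the two
reads in irq 18, so the 811.5 figure is only an average.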
Could anyone give me some more advice? Thanks a lot!
Richard R. Zhang
2006-05-12
From: Ali Saidi
Sent: 2006-04-27 12:14:54
To: Steve Reinhardt
Cc: Richard R. Zhang; Lisa Hsu; m5sim-users
Subject: Re: [m5sim-users] Is there something wrong with the io access latency?
I believe Steve is exactly correct: the out-of-order model is not
enforcing a dependency between your two instructions. The way to fix
it is to force a dependency on a register (for example, the result of
the load). You need to do this both in the decoder and in the code
that executes the instruction.
For example, for the rpcc instruction (this code may be a little bit
newer than yours, but same idea):
/* Rb is a fake dependency so here is a fun way to get the parser
   to understand that. */
Ra = xc->readMiscRegWithEffect(AlphaISA::IPR_CC, fault) + (Rb & 0);
and in some code:
inline uint32_t cycleCounter(uint32_t dep)
{
    uint32_t res;
    asm volatile ("rpcc %0, %1" : "=r"(res) : "r" (dep) : "memory");
    return res;
}
t1 = cycleCounter(trash);
for (x = 0; x < count; x++) {
    trash = readl(addr);
    t2 = cycleCounter(trash);
}
Ali
On Apr 26, 2006, at 9:59 PM, Steve Reinhardt wrote:
My guess would be that it has to do with the out-of-order scheduling
in the detailed CPU. If the instruction that reads curTick has no
dependence on the read or write instructions, then it will get
executed out-of-order while the read or write is still stalled.
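The effect can be illustrated with a toy dataflow timing model. This is
a sketch under stated assumptions, not how the detailed CPU in M5 is
implemented: measured_delta and max_u64 are hypothetical names, and the
latencies (1 tick for a timer read, 1500 ticks for an I/O read) are
made-up values.

```c
#include <stdint.h>

/* Assumed latencies for the toy model only. */
enum { TIMER_LATENCY = 1, IO_LATENCY = 1500 };

uint64_t max_u64(uint64_t a, uint64_t b)
{
    return a > b ? a : b;
}

/* Returns the measured (t2 - t1) around one I/O read.
 * with_dep == 0: the second timer read has no input from the load.
 * with_dep != 0: the second timer read consumes the load's result. */
uint64_t measured_delta(int with_dep)
{
    uint64_t t1 = 0;                           /* first timer read samples tick 0 */
    uint64_t load_issue = t1 + TIMER_LATENCY;  /* load issues right after it */
    uint64_t load_done = load_issue + IO_LATENCY;

    /* Without a dependency the second timer read may issue one tick
     * after the load issues; with one it waits for the load's result. */
    uint64_t t2 = load_issue + 1;
    if (with_dep)
        t2 = max_u64(t2, load_done);

    return t2 - t1;
}
```

In this toy model measured_delta(0) is 2 ticks while measured_delta(1)
is 1501 ticks, which is the kind of gap an unenforced dependency would
produce.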
I remember that we ran into this problem ourselves but I don't
remember the details of how we solved it... Ali or Nate, can you
help here?
Steve
Richard R. Zhang wrote:
Hi Lisa and all M5 users,
I have found something strange about the I/O access latency. Could you
give me a hint?
I have added a new instruction to the Alpha ISA. This instruction
returns curTick in M5, and it seems to work correctly, so I plan to
use it to measure time in the guest OS. I then added some statements
to the ns83820 driver. These statements compute the time used by the
driver's IRQ routine and by I/O accesses (just the time of writel and
readl). But the results below puzzle me, and I can't explain them.
They come from a netperf maerts test under Sampler mode, and the
memory configuration is STE.
---------------------------------------------------------
|                    | CacheCPU mode | DetailedCPU mode |
---------------------------------------------------------
| avg. io read time  | 1581 cycles   | 40 cycles        |
---------------------------------------------------------
| avg. io write time | 1561 cycles   | 9 cycles         |
---------------------------------------------------------
I don't know why the I/O access time in CacheCPU mode is so much
larger than in DetailedCPU mode. I would expect the time in CacheCPU
mode to be less than, or at most equal to, the time in DetailedCPU
mode. This seems strange to me. Could anybody explain it?
Thanks a lot.
Best wishes,
Richard R. Zhang
2006-04-26
_______________________________________________
m5sim-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/m5sim-users