Hi Kevin,
I'm glad to see your reply. I ran into this problem a few days ago and solved
it with a method similar to the one you describe. I detail my method below in
the hope that it will be useful.
I added two pseudo instructions, timerstart and timerstop. Their opcodes are
0x25 and 0x26 in M5's special 0x01 opcode space. For timerstart, Rb holds the
timer type and the instruction returns a timerID in Ra. For timerstop, Ra holds
the timerID that timerstart returned and Rb is the dependency register.
I added two flags, IsTimerStart and IsTimerStop, at cpu/static_inst.hh:122 to
mark the pseudo instructions. The two functions listed below were also added at
cpu/static_inst.hh:211.
bool isTimerStart() const { return flags[IsTimerStart];}
bool isTimerStop() const { return flags[IsTimerStop];}
Two similar functions, listed below, are added to
encumbered/cpu/full/dyn_inst.hh:180.
bool isTimerStart() const { return staticInst->isTimerStart();}
bool isTimerStop() const { return staticInst->isTimerStop();}
These functions make it easier to detect the pseudo instructions.
Then I added two virtual methods, addStatsTimer and stopStatsTimer, to class
BaseCPU at cpu/base.hh:89, and implemented them in cpu/base.cc to provide the
base timer function. These methods are overridden in class FullCPU, in
encumbered/cpu/full/cpu.hh.
FullCPU::addStatsTimer calls BaseCPU::addStatsTimer directly, while
FullCPU::stopStatsTimer only records the timerID and its PC in a map. Another
method, FullCPU::fullcpuStopStatsTimer, performs the actual timer stop. The
following code fragment, added at encumbered/cpu/full/writeback.cc:595, detects
the timerstop instruction and completes the timer stop operation.
if (rob_entry->inst->isTimerStop()) {
    DynInst *di = rob_entry->inst;
    FullCPU *cpu = di->cpu;
    uint32_t timerid;
    std::map<uint64_t, uint32_t>::iterator it;
    it = cpu->TimerPCmap.find(di->PC);
    if (it != cpu->TimerPCmap.end()) {
        timerid = it->second;
        cpu->fullcpuStopStatsTimer(timerid, di->PC);
    }
}
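To make the two-phase stop easier to follow, here is a minimal standalone
sketch of the idea. It is not the M5 code: only TimerPCmap, stopStatsTimer and
fullcpuStopStatsTimer are names from my change, while TimerCpuSketch, stopped
and onWriteback are invented for this illustration, and the real
fullcpuStopStatsTimer does more than append to a vector.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical model of the deferred timer stop (illustration only).
struct TimerCpuSketch {
    std::map<uint64_t, uint32_t> TimerPCmap;  // PC of timerstop -> timer ID
    std::vector<uint32_t> stopped;            // IDs actually stopped, in order

    // Phase 1: called when the pseudo instruction is executed at the front
    // of the pipeline. It only remembers which timer this PC wants to stop.
    void stopStatsTimer(uint32_t timerid, uint64_t pc) {
        TimerPCmap[pc] = timerid;
    }

    // Phase 2: called from writeback. Here the stop really takes effect;
    // the sketch just records the ID.
    void fullcpuStopStatsTimer(uint32_t timerid, uint64_t /*pc*/) {
        stopped.push_back(timerid);
    }

    // What the writeback hook does for an instruction flagged IsTimerStop.
    // Returns true if a pending stop was recorded for this PC.
    bool onWriteback(uint64_t pc) {
        std::map<uint64_t, uint32_t>::iterator it = TimerPCmap.find(pc);
        if (it == TimerPCmap.end())
            return false;
        fullcpuStopStatsTimer(it->second, pc);
        return true;
    }
};
```

The point of the split is that the timer is not stopped when the instruction is
executed at fetch, but only when the same PC reaches writeback.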
Finally, I added the following statements to arch/alpha/isa_desc:2713.
0x25: timerstart({{
#if FULL_SYSTEM
    Ra = AlphaPseudo::timerstart(xc->xcBase(), Rb);
#else
    Ra = 0;
#endif
}}, IsTimerStart);
0x26: timerstop({{
#if FULL_SYSTEM
    // (Rb & 0) is always 0; it only makes the instruction read Rb,
    // the dependency register.
    Rb = AlphaPseudo::timerstop(xc->xcBase(), Ra) + (Rb & 0);
#else
    ;
#endif
}}, IsTimerStop);
Two functions, timerstart and timerstop, are added to arch/alpha/pseudo_inst.cc.
Here is their implementation.
uint32_t timerstart(ExecContext *xc, uint64_t type)
{
    BaseCPU *cpu = xc->cpu;
    uint64_t pc = xc->readPC();
    uint32_t id = cpu->addStatsTimer(type, pc);
    return id;
}
uint32_t timerstop(ExecContext *xc, uint64_t id)
{
    BaseCPU *cpu = xc->cpu;
    uint64_t pc = xc->readPC();
    uint32_t timerid = (uint32_t)id;
    cpu->stopStatsTimer(timerid, pc);
    return 0;
}
Because the pseudo instructions have no assembler mnemonics, I use the
following macros, which I have added to my Linux source code.
#define starttimer(save, id, type) \
{\
    asm volatile ("mov $8, %0" : "=r"(save)::"memory");\
    asm volatile ("lda $8, "#type);\
    asm volatile (".long (((0x01) << 26) | ((0) << 21) | ((8) << 16) | (0x25))");\
    asm volatile ("sextl $0, %0" : "=r"(id)::"memory");\
    asm volatile ("mov %0, $8" ::"r"(save):"memory");\
}
#define stoptimer(save, save1, id, dep) \
{\
    asm volatile ("mov $8, %0" : "=r"(save)::"memory");\
    asm volatile ("mov $7, %0" : "=r"(save1)::"memory");\
    asm volatile ("mov %0, $8" :: "r"(id):"memory");\
    asm volatile ("mov %0, $7" :: "r"(dep):"memory");\
    asm volatile (".long (((0x01) << 26) | ((8) << 21) | ((7) << 16) | (0x26))");\
    asm volatile ("mov %0, $7" :: "r"(save1):"memory");\
    asm volatile ("mov %0, $8" :: "r"(save):"memory");\
}
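As a quick check of the two .long words, the small helper below recomputes
them at compile time. It is only a sketch: encodePseudo is a name made up for
this mail, and the field layout (opcode << 26, Ra << 21, Rb << 16, function
code in the low bits) is simply read off the expressions in the macros above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: builds the instruction word for a pseudo instruction
// in the 0x01 opcode space, using the same shifts as the .long expressions.
constexpr uint32_t encodePseudo(uint32_t ra, uint32_t rb, uint32_t func) {
    return (0x01u << 26) | ((ra & 0x1fu) << 21) | ((rb & 0x1fu) << 16) | func;
}

// timerstart: Ra = $0 (returned timerID), Rb = $8 (timer type)
static_assert(encodePseudo(0, 8, 0x25) == 0x04080025u, "timerstart word");
// timerstop: Ra = $8 (timerID), Rb = $7 (dependency register)
static_assert(encodePseudo(8, 7, 0x26) == 0x05070026u, "timerstop word");
```

If the registers used in the macros are changed, the same helper gives the
matching instruction word.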
Richard R. Zhang
2006-06-01
From: Kevin Lim
Sent: 2006-06-01 03:10:14
To: Richard R. Zhang
Cc: m5sim-users
Subject: Re: [m5sim-users] Is there anything wrong with the io access latency?
Hi Richard,
Sorry for not responding earlier. What you're encountering is one of the
deficiencies of the encumbered FullCPU. Like SimpleScalar, it executes
instructions at the front of the pipeline instead of at the execute stage.
The call to execute the instruction happens in fetch.cc:880. Thus your
pseudo instruction executes and returns the curTick around the time it is
fetched, instead of when it writes back. We have an out-of-order model that
executes at execute that will be included in our 2.0 release.
If you can't wait until then, I suggest trying to make the front end halt
and not execute until the back end drains whenever it encounters your
pseudo instruction. As a start you could add an extra flag to your pseudo
instruction to mark it as such (add to StaticInst, and in isa_desc), and
have the code in fetch.cc detect your pseudo instruction.
I'm not familiar enough with the rest of the CPU code to help you too much
beyond that, unfortunately. This method might be somewhat complex, and
might be inaccurate by a few cycles because it waits for the pipeline to
drain, but it should be closer to the real I/O latency than is currently
reported.
Kevin
Richard R. Zhang wrote:
>Hi Ali,
>Thanks for your reply. I'll try it later. I think I have found the reason
>for the too-small I/O access latency. Both rpcc and my own pseudo
>instruction have the same problem: the value returned is the cycle at which
>the instruction is dispatched, not the cycle at which it is written back.
>So the fake dependency register does not take effect. The table shows the
>cycle at which each pipeline stage handles the following instruction
>sequence.
> rpcc r9,r9
> ...
> ldl r0,0(r16)
> ...
> rpcc r1,r1
> ...
>_______________________________________________________
>|           | rpcc r9,r9 | ldl r0,0(r16) | rpcc r1,r1 |
>-------------------------------------------------------
>| fetch     | 80386      | 80390         | 80395      |
>-------------------------------------------------------
>| dispatch  | 80402      | 80406         | 80411      |
>-------------------------------------------------------
>| writeback | 80407      | 81970         | 81976      |
>-------------------------------------------------------
>| commit    | 80441      | 81971         | 81982      |
>-------------------------------------------------------
>The values in the table are the lowest 5 digits of the cycle count; the
>remaining digits are the same. The values come from a trace file.
>
>Although I know the reason, I still have no idea how to implement rpcc
>correctly. How can I make rpcc read the cycle count only once the
>dependency register is ready? Can you give me a suggestion?
>
>Thanks!
>
>Richard R. Zhang
>2006-05-17