Re: [m5-users] running multi core- instruction count in all cores is zero except 1... why?

prannav shrestha Mon, 23 Jun 2008 13:56:33 -0700

Yes, both CPU0 and CPU1 get the fault at PC 0.

This is what is in the debug file:


0: system.cpu1.fetch: [tid:0]: fault (itbmiss) detected @ PC 0x000000
 0: system.cpu1.decode: Processing [tid:0]
0: system.cpu0.fetch: [tid:0]: fault (itbmiss) detected @ PC 0x000000
 0: system.cpu0.decode: Processing [tid:0]
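
To check this over the whole trace rather than by eye, the fetch faults can be tallied per CPU and PC. This is only a minimal sketch, assuming the trace lines look exactly like the ones above; the function name `tally_fetch_faults` and the regular expression are my own and not part of M5, and lines that were wrapped or fused by a mail client would have to be rejoined first.

```python
import re
from collections import Counter

# Matches fetch-stage fault lines in the format shown above, e.g.
#   0: system.cpu1.fetch: [tid:0]: fault (itbmiss) detected @ PC 0x000000
# (pattern is an assumption based on those lines, not an M5 definition)
FAULT_RE = re.compile(
    r"^\s*(\d+): system\.(cpu\d+)\.fetch: \[tid:\d+\]: "
    r"fault \((\w+)\) detected @ PC (0x[0-9a-fA-F]+)"
)

def tally_fetch_faults(lines):
    """Count fetch faults keyed by (cpu, fault name, PC as int)."""
    counts = Counter()
    for line in lines:
        m = FAULT_RE.match(line)
        if m:
            _tick, cpu, fault, pc = m.groups()
            counts[(cpu, fault, int(pc, 16))] += 1
    return counts

# The four lines quoted above, used as sample input.
sample = [
    "0: system.cpu1.fetch: [tid:0]: fault (itbmiss) detected @ PC 0x000000",
    " 0: system.cpu1.decode: Processing [tid:0]",
    "0: system.cpu0.fetch: [tid:0]: fault (itbmiss) detected @ PC 0x000000",
    " 0: system.cpu0.decode: Processing [tid:0]",
]

for (cpu, fault, pc), n in sorted(tally_fetch_faults(sample).items()):
    print(f"{cpu}: {fault} at PC {pc:#x}, seen {n} time(s)")
```

A CPU that keeps reporting the same fault at the same PC with a growing count is the one stuck in the fault loop.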

So, what's the story then?

Prannav

On Mon, Jun 23, 2008 at 3:24 PM, Korey Sewell <[EMAIL PROTECTED]> wrote:

> If starting workloads at PC 0 is what we intend to do  (when did that
> happen?), that would mean that the real starting PC would need to be
> loaded in the correct system register.
>
> For SE mode, are we making sure we load the system registers correctly
> so that the fault handler can pick it up and start at the right
> address?
>
> Last time I looked at the code (admittedly a while back), the AlphaTLB
> fault handling code for an ITLB miss loaded from a system register
> correctly. The problem I encountered was that in SE mode the system
> state isn't loaded correctly, so even though the fault is taken
> correctly, the registers needed to handle it weren't set up at init
> time (probably in src/arch/alpha/process.cc) for SE, and thus the
> trap handler didn't go through right.
>
> Again, if the code has been updated since then, I could be wrong. I
> don't even remember us intentionally starting workloads at PC 0 to get
> the ITLB miss fault, so that's why I think that's a problem.
>
> Prannav,
> is CPU0 getting the same PC 0 fault to start off with, or is it
> just CPU1? The answer to that question probably tells the story.
>
> On 6/23/08, Gabe Black <[EMAIL PROTECTED]> wrote:
> > Yes, it starts with PC 0, and yes there's an ITLB miss, but later
> > there are actual instructions in the rest of the pipeline, which wouldn't
> > happen if the process was bogus. There would also not be any
> > instructions to squash. Every time o3 starts up, it has to get itself
> > initialized and causes some bogus faults and such for the first few
> > ticks. After that, it gets its act straightened out and goes to the
> > right place. I think he mentioned trying to initialize the workload
> > outside of the loop, so I'm pretty sure a bad process object isn't
> > the culprit.
> >
> > To answer your question, yes there is a TLB object in SE mode. The TLB
> > itself doesn't handle the faults and is instead loaded by the faults, at
> > which point execution continues like it normally would. There is still
> > an abstracted page table structure the fault uses to look up the
> > official mapping of a virtual address to load the TLB, so if the address
> > is completely invalid, i.e. not mapped in any way by the process, there
> > would still be a fail or panic just like before. I do think the TLB
> > could be to blame, though, since it's possible the fault is loading the
> > TLB in such a way as the TLB never matches with the entry.
> >
> > Prannav, if you could find the answers to those questions I mentioned,
> > it would really help clarify what's actually going on. If you run a gdb
> > targeted at Alpha on the benchmark binary and disassemble the
> > instruction at 0x12000dde8 that's what the instruction should be. In the
> > O3 output, I think it prints out what the instruction disassembles to
> > when it's fetched, so that will tell you what M5 is trying to execute.
> >
> > Gabe
> >
> > Korey Sewell wrote:
> > > but why would there be a TLB miss?
> > >
> > > It's because you are trying to execute PC "0x0" which is obviously not valid.
> > >
> > > I'm pretty sure that's the culprit. That's happened to me a bunch of
> > > times in the past and it's always initializing the process wrong in
> > > some way.
> > >
> > > But did we not add TLB code for SE mode and the like? So now, instead
> > > of an "unmapped" failure and die (like old M5), we are probably just
> > > repeatedly trying to handle a trap that we shouldn't be handling.
> > >
> > >
> > > On 6/23/08, Gabe Black <[EMAIL PROTECTED]> wrote:
> > >
> > >> I don't think that's actually the problem since later the cpu goes
> > >> through a lot of the motions of executing instructions. The line
> > >>
> > >>  13000: system.cpu1.commit: Inst [sn:3] PC 0x12000dde8 has a fault
> > >>
> > >>
> > >> suggests that the instructions are executing, they're just faulting over
> > >> and over. What would be helpful is if you can figure out:
> > >>
> > >> 1. What the instruction actually is.
> > >> 2. What the fault it's throwing is.
> > >> 3. Why it's throwing that fault.
> > >> 4. Why it never successfully fixes that condition.
> > >>
> > >> What I'd guess is that there's some sort of data TLB miss that's
> > >> happening which is never successfully being fixed. Usually in glibc, one
> > >> of if not the first instruction a process executes sets the frame
> > >> pointer to 0, so I'm not sure what fault this could be throwing. It's
> > >> also possible the instruction address is being mistranslated and you're
> > >> executing the wrong memory.
> > >>
> > >> Gabe
> > >>
> > >> Korey Sewell wrote:
> > >>
> > >>> You need to look just a bit closer at this... The line(s) of interest are:
> > >>> " 0: system.cpu1.fetch: [tid:0]: Attempting to translate and read
> > >>> instruction, starting at PC 0x000000."
> > >>>
> > >>> Thus, if CPU1 is starting at address 0x0, that probably means it is
> > >>> starting with no workload, and eventually experienced a trap because
> > >>> there is no code at that address.
> > >>>
> > >>> It probably would be the best thing to have some kind of check
> > >>> somewhere to WARN a user that a CPU has no valid process to start from
> > >>> and then sleep the CPU rather than waste sim. cycles on that.
> > >>>
> > >>> Anyway,
> > >>> you have to figure out how to get the 2nd CPU to get a valid process.
> > >>> On a first cut, I would just hardcode the CPU process bindings
> > >>> instead of using the loop like in your example. If you can get that to
> > >>> work, then you know something is going on with how the loop is
> > >>> setting up your system.
> > >>>
> > >>> I'm guessing that something like this would work (check syntax though):
> > >>> "
> > >>> process1 = Benchmarks.SPECGCCEIO()
> > >>> process2 = Benchmarks.SPECGCCEIO()
> > >>> system.cpu[0].workload = process1
> > >>> system.cpu[1].workload = process2
> > >>> "
> > >>>
> > >>> I've done similar when I've needed something quick to work. I've
> > >>> noticed that you're using EIO, so if you aren't able to hardcode it,
> > >>> then the EIO functionality could be the culprit as well.
> > >>>
> > >>> On Mon, Jun 23, 2008 at 12:15 AM, prannav shrestha <[EMAIL PROTECTED]> wrote:
> > >>>
> > >>>
> > >>>> Hi Sewell!!
> > >>>> I ran the simulation with the O3CPU flags, and looking at the information
> > >>>> provided, I found that in my case the PC value for CPU1 remains the same
> > >>>> forever, whereas CPU0 is executing the workload. Also, the instruction
> > >>>> fetched by CPU1, which is always the same, is squashed every time. Part
> > >>>> of the debug output is below:
> > >>>>
> > >>>> Global frequency set at 1000000000000 ticks per second
> > >>>>       0: global: BTB: Creating BTB object.
> > >>>>       0: system.cpu0.iew.lsq: LSQ sharing policy set to Partitioned: 32 entries per LQ | 32 entries per SQ
> > >>>>       0: system.cpu0.commit: Commit Policy set to Round Robin.
> > >>>>       0: system.cpu0.rob: ROB sharing policy set to Partitioned
> > >>>>       0: global: Creating AlphaO3CPU object.
> > >>>>
> > >>>> number of threads: 1
> > >>>>       0: global: Workload[0] process is 0
> > >>>>       0: global: BTB: Creating BTB object.
> > >>>>       0: system.cpu1.iew.lsq: LSQ sharing policy set to Partitioned: 32 entries per LQ | 32 entries per SQ
> > >>>>       0: system.cpu1.commit: Commit Policy set to Round Robin.
> > >>>>       0: system.cpu1.rob: ROB sharing policy set to Partitioned
> > >>>>       0: global: Creating AlphaO3CPU object.
> > >>>>
> > >>>> number of threads: 1
> > >>>>       0: global: Workload[0] process is 0
> > >>>>       0: global: Calling activate on Thread Context 0
> > >>>>       0: system.cpu0: [tid:0]: Calling activate thread.
> > >>>>       0: system.cpu0: [tid:0]: Adding to active threads list
> > >>>>       0: system.cpu0.fetch: Waking up from quiesce
> > >>>>       0: system.cpu0.commit: Generating TC squash event for [tid:0]
> > >>>>       0: global: Calling activate on Thread Context 0
> > >>>>       0: system.cpu1: [tid:0]: Calling activate thread.
> > >>>>       0: system.cpu1: [tid:0]: Adding to active threads list
> > >>>>       0: system.cpu1.fetch: Waking up from quiesce
> > >>>>       0: system.cpu1.commit: Generating TC squash event for [tid:0]
> > >>>>       0: system.cpu1:
> > >>>>
> > >>>> FullO3CPU: Ticking main, FullO3CPU.
> > >>>>       0: system.cpu1.fetch: Running stage.
> > >>>>       0: system.cpu1.fetch: Attempting to fetch from [tid:0]
> > >>>>       0: system.cpu1.fetch: [tid:0]: Attempting to translate and read instruction, starting at PC 0x000000.
> > >>>>       0: system.cpu1.fetch: [tid:0]: Blocked, need to handle the trap.
> > >>>>       0: system.cpu1.fetch: [tid:0]: fault (itbmiss) detected @ PC 0x000000
> > >>>>       0: system.cpu1.decode: Processing [tid:0]
> > >>>>       0: system.cpu1.decode: [tid:0]: Not blocked, so attempting to run stage.
> > >>>>       0: system.cpu1.decode: [tid:0] Nothing to do, breaking out early.
> > >>>> ......................
> > >>>> ........
> > >>>> ....
> > >>>> FullO3CPU: Ticking main, FullO3CPU.
> > >>>>    6500: system.cpu0.fetch: Running stage.
> > >>>>    6500: system.cpu0.fetch: There are no more threads available to fetch from.
> > >>>>    6500: system.cpu0.decode: Processing [tid:0]
> > >>>>    6500: system.cpu0.decode: [tid:0]: Not blocked, so attempting to run stage.
> > >>>>    6500: system.cpu0.decode: [tid:0] Nothing to do, breaking out early.
> > >>>>    6500: system.cpu0.commit: Getting instructions from Rename stage.
> > >>>>    6500: system.cpu0.commit: Trying to commit instructions in the ROB.
> > >>>>    6500: system.cpu0.commit: [tid:0]: Instruction [sn:2] PC 0x120140ce8 is head of ROB and ready to commit
> > >>>>    6500: system.cpu0.commit: [tid:0]: ROB has 1 insts & 191 free entries.
> > >>>>    6500: system.cpu0: Scheduling next tick!
> > >>>>    6500: system.cpu1:
> > >>>>
> > >>>> FullO3CPU: Ticking main, FullO3CPU.
> > >>>>    6500: system.cpu1.fetch: Running stage.
> > >>>>    6500: system.cpu1.fetch: There are no more threads available to fetch from.
> > >>>>    6500: system.cpu1.decode: Processing [tid:0]
> > >>>>    6500: system.cpu1.decode: [tid:0]: Not blocked, so attempting to run stage.
> > >>>>    6500: system.cpu1.decode: [tid:0] Nothing to do, breaking out early.
> > >>>>    6500: system.cpu1.commit: Getting instructions from Rename stage.
> > >>>>    6500: system.cpu1.commit: Trying to commit instructions in the ROB.
> > >>>>    6500: system.cpu1.commit: [tid:0]: Instruction [sn:2] PC 0x12000dde8 is head of ROB and ready to commit
> > >>>>    6500: system.cpu1.commit: [tid:0]: ROB has 1 insts & 191 free entries.
> > >>>>    6500: system.cpu1: Scheduling next tick!
> > >>>>    7000: system.cpu1:
> > >>>>
> > >>>> .............
> > >>>> ..........
> > >>>> ..........
> > >>>>
> > >>>> FullO3CPU: Ticking main, FullO3CPU.
> > >>>>   13000: system.cpu1.fetch: Running stage.
> > >>>>   13000: system.cpu1.fetch: There are no more threads available to fetch from.
> > >>>>   13000: system.cpu1.decode: Processing [tid:0]
> > >>>>   13000: system.cpu1.decode: [tid:0]: Not blocked, so attempting to run stage.
> > >>>>   13000: system.cpu1.decode: [tid:0] Nothing to do, breaking out early.
> > >>>>   13000: system.cpu1.commit: Getting instructions from Rename stage.
> > >>>>   13000: system.cpu1.commit: Trying to commit instructions in the ROB.
> > >>>>   13000: system.cpu1.commit: Trying to commit head instruction, [sn:3] [tid:0]
> > >>>>   13000: system.cpu1.commit: Inst [sn:3] PC 0x12000dde8 has a fault
> > >>>>   13000: system.cpu1.commit: Generating trap event for [tid:0]
> > >>>>   13000: system.cpu1.commit: Unable to commit head instruction PC:0x12000dde8 [tid:0] [sn:3].
> > >>>>   13000: system.cpu1.commit: [tid:0]: Instruction [sn:3] PC 0x12000dde8 is head of ROB and ready to commit
> > >>>>   13000: system.cpu1.commit: [tid:0]: ROB has 1 insts & 191 free entries.
> > >>>>   13000: system.cpu1: Scheduling next tick!
> > >>>>   13000: system.cpu0:
> > >>>>
> > >>>> .....
> > >>>> ....
> > >>>> FullO3CPU: Ticking main, FullO3CPU.
> > >>>>   32500: system.cpu0.fetch: Running stage.
> > >>>>   32500: system.cpu0.fetch: Attempting to fetch from [tid:0]
> > >>>>   32500: system.cpu0.fetch: [tid:0]: Attempting to translate and read instruction, starting at PC 0x120140cf0.
> > >>>>   32500: system.cpu0.fetch: Fetch: Doing instruction read.
> > >>>>   32500: system.cpu0.fetch: [tid:0]: Doing cache access.
> > >>>>   32500: system.cpu0.decode: Processing [tid:0]
> > >>>>   32500: system.cpu0.decode: [tid:0]: Not blocked, so attempting to run stage.
> > >>>>   32500: system.cpu0.decode: [tid:0]: Sending instruction to rename.
> > >>>>   32500: system.cpu0.decode: [tid:0]: Processing instruction [sn:3] with PC 0x120140ce8
> > >>>>   32500: system.cpu0.decode: [tid:0]: Processing instruction [sn:4] with PC 0x120140cec
> > >>>>   32500: system.cpu0.commit: Getting instructions from Rename stage.
> > >>>>   32500: system.cpu0.commit: Trying to commit instructions in the ROB.
> > >>>>   32500: system.cpu0.commit: [tid:0]: ROB has 0 insts & 192 free entries.
> > >>>>   32500: system.cpu0: Scheduling next tick!
> > >>>>   33000: system.cpu0:
> > >>>>
> > >>>> ..............
> > >>>> .........
> > >>>> FullO3CPU: Ticking main, FullO3CPU.
> > >>>>   39500: system.cpu0.fetch: Running stage.
> > >>>>   39500: system.cpu0.fetch: Attempting to fetch from [tid:0]
> > >>>>   39500: system.cpu0.fetch: [tid:0]: Attempting to translate and read instruction, starting at PC 0x120140d00.
> > >>>>   39500: system.cpu0.fetch: Fetch: Doing instruction read.
> > >>>>   39500: system.cpu0.fetch: [tid:0]: Doing cache access.
> > >>>>   39500: system.cpu0.decode: Processing [tid:0]
> > >>>>   39500: system.cpu0.decode: [tid:0]: Not blocked, so attempting to run stage.
> > >>>>   39500: system.cpu0.decode: [tid:0]: Sending instruction to rename.
> > >>>>   39500: system.cpu0.decode: [tid:0]: Processing instruction [sn:5] with PC 0x120140cf0
> > >>>>   39500: system.cpu0.decode: [tid:0]: Processing instruction [sn:6] with PC 0x120140cf4
> > >>>>   39500: system.cpu0.decode: [tid:0]: Processing instruction [sn:7] with PC 0x120140cf8
> > >>>>   39500: system.cpu0.decode: [tid:0]: Processing instruction [sn:8] with PC 0x120140cfc
> > >>>>   39500: system.cpu0.commit: Getting instructions from Rename stage.
> > >>>>   39500: system.cpu0.commit: Trying to commit instructions in the ROB.
> > >>>>   39500: system.cpu0.commit: [tid:0]: ROB has 0 insts & 192 free entries.
> > >>>>   39500: system.cpu0: Scheduling next tick!
> > >>>>   40000: system.cpu0:
> > >>>> FullO3CPU: Ticking main, FullO3CPU.
> > >>>>   40000: system.cpu1.fetch: Running stage.
> > >>>>   40000: system.cpu1.fetch: There are no more threads available to fetch from.
> > >>>>   40000: system.cpu1.decode: Processing [tid:0]
> > >>>>   40000: system.cpu1.decode: [tid:0]: Not blocked, so attempting to run stage.
> > >>>>   40000: system.cpu1.decode: [tid:0] Nothing to do, breaking out early.
> > >>>>   40000: system.cpu1.commit: Squashing from trap, restarting at PC 0x12000dde8
> > >>>>   40000: system.cpu1.commit: [tid:0]: Instruction [sn:5] PC 0x12000dde8 is head of ROB and ready to commit
> > >>>>   40000: system.cpu1.commit: [tid:0]: ROB has 1 insts & 191 free entries.
> > >>>>   40000: system.cpu1: Scheduling next tick!
> > >>>>   40500: system.cpu1.fetch-iport: Received timing
> > >>>>   40500: system.cpu1:
> > >>>> ....................
> > >>>> .................
> > >>>> FullO3CPU: Ticking main, FullO3CPU.
> > >>>> 1243500: system.cpu0.fetch: Running stage.
> > >>>> 1243500: system.cpu0.fetch: There are no more threads available to fetch from.
> > >>>> 1243500: system.cpu0.decode: Processing [tid:0]
> > >>>> 1243500: system.cpu0.decode: [tid:0]: Not blocked, so attempting to run stage.
> > >>>> 1243500: system.cpu0.decode: [tid:0] Nothing to do, breaking out early.
> > >>>> 1243500: system.cpu0.commit: Getting instructions from Rename stage.
> > >>>> 1243500: system.cpu0.commit: Trying to commit instructions in the ROB.
> > >>>> 1243500: system.cpu0.commit: [tid:0]: Instruction [sn:369] PC 0x120106160 is head of ROB and ready to commit
> > >>>> 1243500: system.cpu0.commit: [tid:0]: ROB has 1 insts & 191 free entries.
> > >>>> 1243500: system.cpu0: Scheduling next tick!
> > >>>> 1244000: system.cpu0:
> > >>>>
> > >>>> .........
> > >>>> ........
> > >>>> FullO3CPU: Ticking main, FullO3CPU.
> > >>>> 4001000: system.cpu1.fetch: [tid:0]: Done squashing, switching to running.
> > >>>> 4001000: system.cpu1.fetch: Running stage.
> > >>>> 4001000: system.cpu1.fetch: Attempting to fetch from [tid:0]
> > >>>> 4001000: system.cpu1.fetch: [tid:0]: Attempting to translate and read instruction, starting at PC 0x12000dde8.
> > >>>> 4001000: system.cpu1.fetch: [tid:0]: Blocked, need to handle the trap.
> > >>>> 4001000: system.cpu1.fetch: [tid:0]: fault (itbmiss) detected @ PC 0x12000dde8
> > >>>> 4001000: system.cpu1.decode: Processing [tid:0]
> > >>>> 4001000: system.cpu1.decode: [tid:0]: Done squashing, switching to running.
> > >>>> 4001000: system.cpu1.decode: [tid:0]: Not blocked, so attempting to run stage.
> > >>>> 4001000: system.cpu1.decode: [tid:0] Nothing to do, breaking out early.
> > >>>> 4001000: system.cpu1.commit: Getting instructions from Rename stage.
> > >>>> 4001000: system.cpu1.commit: Trying to commit instructions in the ROB.
> > >>>> 4001000: system.cpu1.commit: [tid:0]: ROB has 0 insts & 192 free entries.
> > >>>> 4001000: system.cpu1: Scheduling next tick!
> > >>>> 4001500: system.cpu1:
> > >>>> ......
> > >>>>
> > >>>> Exiting @ cycle 4002500 because all threads reached the max instruction count
> > >>>>
> > >>>> I am running two different benchmarks on those two cores. I have included
> > >>>> the long list of debug output. The PC of CPU1 is always the same. Why is
> > >>>> the instruction at CPU1 always being squashed?
> > >>>>
> > >>>> regards,
> > >>>> prannav
> > >>>>
> > >>>>
> > >>>>
> > >>>> On Sun, Jun 22, 2008 at 10:49 PM, Korey Sewell <[EMAIL PROTECTED]> wrote:
> > >>>>
> > >>>>
> > >>>>> Turn on the trace flags for O3CPU and see what it says.... Inside the
> > >>>>> "o3" folder there should be a *.py file that lists the trace flags.
> > >>>>>
> > >>>>> If you run the simulation with all the O3 trace flags on, you should be
> > >>>>> able to see what's happening. I suggest only running for maybe 100
> > >>>>> ticks so that you don't get overloaded with text.
> > >>>>>
> > >>>>> From the debug output, you should see each CPU get initialized, fetch,
> > >>>>> and all that. I'm guessing what happens is that CPU1 starts up, sees
> > >>>>> no process to latch onto, and then sleeps, but the debug output should
> > >>>>> verify that for you.
> > >>>>>
> > >>>>> Simply looking at the instruction commits (Exec) won't help you in this
> > >>>>> case since you're saying that no instructions commit. You'll
> > >>>>> at least need to turn on "Fetch Decode Exec" and all that so you can
> > >>>>> get a detailed view.
> > >>>>>
> > >>>>>
> > >>>>> --
> > >>>>> ----------
> > >>>>> Korey L Sewell
> > >>>>> Graduate Student - PhD Candidate
> > >>>>> Computer Science & Engineering
> > >>>>> University of Michigan
> > >>>>> _______________________________________________
> > >>>>> m5-users mailing list
> > >>>>> [email protected]
> > >>>>> http://m5sim.org/cgi-bin/mailman/listinfo/m5-users
> > >>>>>
> > >>>>>
> > >>>>
> > >>>>
> > >>>>
> > >>>
> > >>>
> > >>>
> > >>
> > >>
> > >
> > >
> > >
> >
> >
>
>
> --
> ----------
> Korey L Sewell
> Graduate Student - PhD Candidate
> Computer Science & Engineering
> University of Michigan
>

