> On June 24, 2015, 7:30 a.m., Andreas Hansson wrote:
> > src/mem/ruby/system/RubyPort.hh, line 191
> > <http://reviews.gem5.org/r/2911/diff/1/?file=46843#file46843line191>
> >
> >     A port is not allowed to send a new request until it gets a retry, so 
> > the assert is appropriate in this case.
> 
> Jason Power wrote:
>     OK. Either I'm missing something glaringly obvious, or there is a bug in 
> the O3 model. Let me show you what I'm seeing.
>     
>     Here is a very simple test that produces the problem:
>     build/X86_MESI_Two_Level/gem5.debug --debug-flags=LSQUnit,RubyPort 
> configs/example/se.py --cpu-type=detailed --ruby -c 
> tests/test-progs/hello/bin/x86/linux/hello
>     
>     Here is a snippet of the debug output:
>          1> 480000: system.cpu.iew.lsq.thread0: Executing load PC 
> (0x415b57=>0x415b5b).(0=>1), [sn:171]
>          2> 480000: system.cpu.iew.lsq.thread0: Read called, load idx: 9, 
> store idx: 18, storeHead: 12 addr: 0x94e78
>          3> 480000: system.cpu.iew.lsq.thread0: Doing memory access for inst 
> [sn:171] PC (0x415b57=>0x415b5b).(0=>1)
>          4> 480000: system.ruby.l1_cntrl0.sequencer.slave1: Timing request 
> for address 0x94e78 on port 1
>          5> 480000: system.ruby.l1_cntrl0.sequencer.slave1: Request for 
> address 0x94e78 did not issued because Aliased
>          6> 480000: system.cpu.iew.lsq.thread0: LQ size: 33, #loads occupied: 
> 7
>          7> 480000: system.cpu.iew.lsq.thread0: SQ size: 33, #stores 
> occupied: 7
>          8> 480500: system.cpu.iew.lsq.thread0: Inserting store PC 
> (0x415b77=>0x415b7e).(1=>2), idx:19 [sn:188]
>          9> 480500: system.cpu.iew.lsq.thread0: LQ size: 33, #loads occupied: 
> 7
>          10> 480500: system.cpu.iew.lsq.thread0: SQ size: 33, #stores 
> occupied: 8
>          11> 481000: system.cpu.iew.lsq.thread0: Executing load PC 
> (0x415b65=>0x415b68).(0=>1), [sn:176]
>          12> 481000: system.cpu.iew.lsq.thread0: Read called, load idx: 10, 
> store idx: 19, storeHead: 12 addr: 0x94e80
>          13> 481000: system.cpu.iew.lsq.thread0: Doing memory access for inst 
> [sn:176] PC (0x415b65=>0x415b68).(0=>1)
>          14> 481000: system.ruby.l1_cntrl0.sequencer.slave1: Timing request 
> for address 0x94e80 on port 1
>          15> 481000: system.ruby.l1_cntrl0.sequencer.slave1: Request ReadReq 
> 0x94e80 issued
>          16> 481000: system.cpu.iew.lsq.thread0: Executing load PC 
> (0x415b6f=>0x415b73).(0=>1), [sn:184]
>          17> 481000: system.cpu.iew.lsq.thread0: Read called, load idx: 11, 
> store idx: 19, storeHead: 12 addr: 0x94e88
>          18> 481000: system.cpu.iew.lsq.thread0: Doing memory access for inst 
> [sn:184] PC (0x415b6f=>0x415b73).(0=>1)
>          19> 481000: system.ruby.l1_cntrl0.sequencer.slave1: Timing request 
> for address 0x94e88 on port 1
>         20> gem5.debug: 
> build/X86_MESI_Two_Level/mem/ruby/system/RubyPort.hh:192: void 
> RubyPort::addToRetryList(RubyPort::MemSlavePort*): Assertion 
> `std::find(retryList.begin(), retryList.end(), port) == retryList.end()' 
> failed.
>         21> Program aborted at cycle 481000
>     
>     Annotations:
>     3> LSQ issues a load
>     5> Ruby finds that the address is aliased and returns false for 
> sendTimingRequest (line 811: src/cpu/o3/lsq_unit.hh)
>     13> LSQ does another memory access (same function call) and it succeeds.
>     18> LSQ does another memory access (same function call) and it fails. 
> This causes Ruby to try to put the port on the retry list a second time and 
> the assert is triggered.
>     
>     It seems to me that the O3CPU does not follow the invariant that "A port 
> is not allowed to send a new request until it gets a retry". Am I missing 
> something here?
> 
> Andreas Hansson wrote:
>     My bad. The CPUs are indeed breaking this convention, and in the classic 
> memory system we have a check in the caches "if cache already committed to 
> send a retry, return false and do nothing else".
> 
> Emilio Castillo wrote:
>     Hello,
>     
>     I have recently been working on this issue and have identified the 
> commit "changeset 10333 6be8945d226b" as a source of problems with RubyPort 
> retries and hangs. For me, simply reverting this changeset did the trick.

Emilio, are you suggesting that we revert 10333 instead of applying this 
patch? It seems to me that reverting 10333 would be a large change. From 
reading the log, it fixes a performance problem with the O3CPU. Do you have a 
concrete reason to believe that changeset 10333 is broken?
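
For reference, here is a minimal standalone sketch of the behavior this patch 
proposes: a second add of a port that is already waiting is ignored instead 
of tripping the assert. It is an illustration only, not the actual RubyPort 
code. The names Port, RetryList, add, and replayRejectedLoads are 
hypothetical, and the real retry list may be stored differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for RubyPort::MemSlavePort.
struct Port {};

// Hypothetical stand-in for the retry list that RubyPort maintains.
class RetryList
{
  public:
    // Queue a port for a later retry. Returns true if the port was newly
    // added, false if it was already waiting (the case the assert rejected).
    bool add(Port *port)
    {
        if (std::find(ports.begin(), ports.end(), port) != ports.end())
            return false;
        ports.push_back(port);
        return true;
    }

    std::size_t size() const { return ports.size(); }

  private:
    std::vector<Port *> ports;
};

// Replays the two rejected loads from the trace (0x94e78 and 0x94e88), both
// arriving on the same port before any retry has been sent.
std::size_t replayRejectedLoads()
{
    Port lsqPort;
    RetryList retryList;
    const bool first = retryList.add(&lsqPort);   // 0x94e78: newly queued
    const bool second = retryList.add(&lsqPort);  // 0x94e88: already queued
    assert(first && !second);
    return retryList.size();
}
```

In this sketch the port ends up on the list exactly once, so it would receive 
a single retry even though two of its requests were rejected. Whether the O3 
LSQ then reissues both loads correctly is a separate question that the sketch 
does not answer.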


- Jason


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://reviews.gem5.org/r/2911/#review6580
-----------------------------------------------------------


On June 23, 2015, 9:52 p.m., Jason Power wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://reviews.gem5.org/r/2911/
> -----------------------------------------------------------
> 
> (Updated June 23, 2015, 9:52 p.m.)
> 
> 
> Review request for Default.
> 
> 
> Repository: gem5
> 
> 
> Description
> -------
> 
> Changeset 10875:e15eea749253
> ---------------------------
> Ruby: Remove assert in RubyPort retry list logic
> 
> Remove the assert when adding a port to the RubyPort retry list.
> Instead of asserting, just ignore the added port, since it's
> already on the list.
> Without this patch, Ruby+detailed fails for even the simplest tests.
> 
> 
> Diffs
> -----
> 
>   src/mem/ruby/system/RubyPort.hh e4f63f1d502d 
> 
> Diff: http://reviews.gem5.org/r/2911/diff/
> 
> 
> Testing
> -------
> 
> build/X86_MESI_Two_Level/gem5.debug configs/example/se.py --cpu-type=detailed 
> --ruby -c tests/test-progs/hello/bin/x86/linux/hello now runs to completion. 
> 
> Note: this patch is compatible with (and also required alongside) 
> http://reviews.gem5.org/r/2787/.
> 
> 
> Thanks,
> 
> Jason Power
> 
>

_______________________________________________
gem5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/gem5-dev
