Re: [gem5-dev] Review Request 2776: ruby: cleaner ruby tester support

Brad Beckmann Tue, 19 May 2015 08:37:16 -0700


> On May 12, 2015, 5:45 a.m., Andreas Hansson wrote:
> > src/mem/packet_queue.cc, line 117
> > <http://reviews.gem5.org/r/2776/diff/1/?file=45138#file45138line117>
> >
> >     I strongly disagree with this change. This buffer should not exist, and 
> > even 100 packets is a stretch. Any module hitting this needs to recondiser 
> > how it deals with flow control
> 
> Brad Beckmann wrote:
>     I would agree with you that the buffer should not exist, but removing the 
> buffer goes well beyond this patch.  The worst part about the buffer is the 
> size is not visible to the sender.  It is a really bad design, but for now we 
> need to get around the immediate problem.  Later we can discuss how and who 
> will fix this right.
>     
>     We've discussed the proposal in the past to increase the size, but 
> unfortunately we have not came to a resolution.  We absolutely need to 
> resolve this now.  We cannot use the ruby tester with our upcoming GPU model 
> without this change.
>     
>     The important thing to keep in mind is that the RubyTester is designed to 
> stress the protocol logic and it creates contention scenarios that would not 
> exist in a real system.  The RubyTester finds bugs in the matter of seconds 
> that may not be encountered in hours of real workload simulation.  It does 
> this by sending large bursts of racey requests.  Bottomline: the RubyTester 
> needs this large value.
> 
> Andreas Hansson wrote:
>     I would argue that if you need flow control, you should not use the 
> QueuedPorts, and rather use a "bare-metal" MasterPort, and deal with the flow 
> control. There is no point in adding flow control (or buffer visibility) to 
> the QueuePort in my opinion.
>     
>     I'd suggest to switch to a MasterPort either as part of this patch, or a 
> patch before it. That way you have no implicit buffer, and no need to create 
> kilobytes of invisible buffering in the system.
> 
> Brad Beckmann wrote:
>     The RubyPort and the Ruby Tester predate MasterPorts and QueuedPorts.  
> Your suggestion goes far beyond this patch.  Reimplementing RubyPort's m5 
> ports to inherit from a different base port is a very signficant change with 
> many, many implications beyond the Ruby Tester.  That is a pretty 
> unreasonable request.
>     
>     Please keep in mind that I was not the one who decided RubyPort to use 
> QueuedPorts. That decision was made back in 2012 with patches from your group 
> 8914:8c3bd7bea667, and 8922:17f037ad8918.
> 
> Jason Power wrote:
>     Is the queued port really supposed to be a high-fidelity model of 
> anything in the "real" world? I was under the impression that it was a 
> courtesy wrapper around the port so that we don't have 10 different 
> implementations of the same general thing.
>     
>     IMO, if you are trying to model an on-chip network at high-fidelity you 
> need something more than what is provided by this model anyway. It seems to 
> me that in this case, the queued port is a gem5 interface decision, not a 
> hardware system design decision. Therefore, infinite buffering is not a major 
> problem.
>     
>     What if this was turned into a "warn" or a "warn_once"? It seems like a 
> good idea to let the user know that the queue is growing wildly as that might 
> be an indication of a deadlock.
>     
>     (Note: IIRC with our high-bandwidth GPU model we've run into this 100 
> packet limit as well.)
> 
> Andreas Hansson wrote:
>     I think:
>     
>     1) The 100 packet sanity check should definitely not be changed as part 
> of this patch. If we do want to change it, it should be a patch on its own.
>     
>     2) If there are somehow 10000 responses queued up, then something is 
> truly _fundamentally_ broken, and having a panic telling the user it is 
> broken is helpful imho. Do you not agree?
>     
>     The QueuedPort is an evolution of the SimpleTimingPort that was used 
> earlier, and I believe the RubyPort has always had this issue (it borrowed a 
> lot of not-so-great design decisions from the Bus for example). The 
> QueuedPort should really only be used as a "lazy" way of implementing 
> non-performance critical devices. I've been meaning to remove it from the 
> classic Caches as well, just not gotten around to it.
>     
>     In the case of the RubyPort, I do not see the big issue of moving away 
> from QueuedPorts. We have already done so in the Bridge, SimpleMemory etc. 
> The only implication is that we need to deal with the port flow control 
> (retry), and pass that downstream/upstream. Since the RubyPort really does 
> not correspond to any real physical object, with any buffers, I think the 
> case is clear: It should not do any buffering behind the user's back. Do you 
> not think this is reasonable?
> 
> Brad Beckmann wrote:
>     Yes, if there is 10000 responses queued up, then there might be something 
> broken.  However, 100 responses can be queued up rather easily when running 
> with the Ruby Tester or with a GPU.  That is why this patch changes the 
> sanity check from 100 to 10000.
>     
>     RubyPort is used by a lot of users, with many different 
> scenarios/protocols.  Moving it away from QueuedPorts will fundamentally 
> change its behavior.  That should be done carefully and I don't want to delay 
> this bug fix with a project that large.
>     
>     Would you be happy if we introduced a patch that modifies the 
> QueuedSlavePort, RespPacketQueue, and PacketQueue constructors so that the 
> RubyPort can set the transmitList threshold to 10000 and the default can 
> still be set to 100?
> 
> Andreas Hansson wrote:
>     Can we simply leave this change out of this patch? I think the right 
> solution here is to fix the dodgy assumptions on implicit buffering, and I'm 
> happy to have a go myself (given that I've done something dimilar to a range 
> of other modules).
> 
> Brad Beckmann wrote:
>     The RubyTester and our soon-to-be-posed HSA-compatible GPU will not work 
> without this change.  I believe many Ruby users, especially Ruby users 
> evaluating GPUs, have had this type of change in their personal patch queues 
> for a very long time.  We need to get this problem out of our personal patch 
> queues and into the main line.
>     
>     I appreciate your offer to redesign the RubyPort yourself, but even if 
> that were to happen, I would prefer to get this change happen first.  A 
> redesign of the RubyPort will require a lot of downstream work to verify 
> things still work as expected.
>     
>     Why are you so against changing this somewhat arbritrary value, or at 
> least allowing someone to change it?  As Jason pointed out, the queue port is 
> an infinite buffer and doesn't represent a real hardware structure.  Why so 
> much resistance?  This small change really benefits Ruby users and if we make 
> it configurable, it will have no impact on Classic users.
> 
> Andreas Hansson wrote:
>     I fundamentally object to the change since it effectively adds an 
> implicit infinite buffer (in quite some places), thus changing the very 
> behaviour of the system and potentially masking problems.
>     
>     The RubyPort should really just be a glue-logic adaptor. Agreed? In a 
> system architecture, there is no such thing as a RubyPort, and there should 
> not be kilobytes of "magic" buffering imposed on the user. If this buffer is 
> needed, then we should instantiate a buffer, cache, etc. I do not understand 
> the statement that this is needed to make things work. If you need an 
> infinite buffer to make things work, then the solution is broken. Am I 
> missing something?
>     
>     I would propose to simply remove the QueuedPort, make it inherit from the 
> Master/SlavePort directly, remove the implicit buffer altogether, and simply 
> relay the retry to/from Ruby. Surely Ruby understands flow control...does it 
> not?
> 
> Steve Reinhardt wrote:
>     Getting rid of implicit infinite buffering is a noble and appropriate 
> long-term goal, I agree.  However, in the near term, we already have implicit 
> infinite buffering, and the only thing we're debating here is at what 
> arbitrary point do we warn the user that the use of that infinite buffer is 
> potentially getting abusive.  To me it seems obvious that the fundamental 
> near-term problem is that we have this totally arbitrary constant embedded in 
> the code and no way to adjust it based on different circumstances in which 
> different values would be appropriate.  I think Brad's suggestion that we 
> make this threshold configurable and then find some way to propagate the 
> value downward seems like the right way to go for now; we can leave the 
> threshold at 100 for CPU-based systems, and crank it up for GPU-based 
> systems, with maybe a third even higher thershold specifically for when we 
> run RubyTester.
> 
> Andreas Hansson wrote:
>     I'd rather have one arbitrary constant than three :-), and the lower the 
> better.
>     
>     Can we just proceed with this patch without this change? At least then I 
> can perhaps get an appreciation for the issue (if you can point me to an 
> example that fails). Right now I really do not understand how this could ever 
> be a problem in any sensible setup.
> 
> Joel Hestness wrote:
>     I agree with Andreas on this one. I don't feel that an "arbitrary" 
> constant should be changed to another "arbitrary" constant in a patch that is 
> trying to fix up the RubyTester. There must be configurations of the 
> RubyTester that will work without increasing the threshold to 10000, so the 
> RubyTester changes will still work for some configurations. These are largely 
> orthogonal issues.
>     
>     Split the threshold change into a separate patch where we can focus on 
> debating the merits of changes between so-called "arbitrary" buffering limits 
> (e.g. gem5-gpu doesn't require any changes to the RubyPort or QueuedPort 
> buffering thresholds).
> 
> Steve Reinhardt wrote:
>     We can certainly split this change into a separate patch.  It's true that 
> 10,000 is just another arbitrary value; it may make more sense to just add a 
> parameter that lets us turn off this threshold check altogether rather than 
> create another arbitrary constant.  Note that the RubyTester is not trying to 
> be a "sensible setup", it's a stress tester that is stressing the protocol 
> implementation beyond the limits of what a "sensible" configuration might be 
> able to do.
> 
> Brad Beckmann wrote:
>     I have no problem moving this change to a separate patch, but we still 
> need to provide flexibility on either setting this arbitrary value or turning 
> it off.  Leaving it as is will not work for us.  We have important 
> configurations of the RubyTester and other internal testers that reach the 
> 100 packet threshold.
>     
>     I don't want to create a separate patch only to spawn off another week of 
> high-level resistance.  Can you confirm that if we create a patch that allows 
> an object to configure the threshold or turn it off, that you will approve 
> it?  Do you have a preference between those two options?
> 
> Joel Hestness wrote:
>     Sure.
>     
>     The RubyTester (or other testers) is certainly an exceptional case for 
> coherence and consistency incantations aimed at finding bugs. From what I've 
> read about the incoming GPU memory hierarchies in these reviews, however, I'm 
> unsure whether I can be convinced that a GPU core's sequencer should be 
> allowed to queue more than order(100) packets. Hitting this limit with real 
> system components seems like a major red flag that your simulated system is 
> very unbalanced and needs to be reconfigured. We definitely need to see the 
> GPU code to understand that more deeply.
>     
>     As a separate review request, I would prefer that we add an option to 
> allow very large but finite buffering in all PacketQueue instances of a 
> single simulation. Allowing infinite buffering is likely a bad idea for 
> debugging (imagine a Ruby livelock that just fills a RubyPort response buffer 
> that never unblocks). I'd also prefer comments on that option that strongly 
> suggest it should only be used for exceptional cases, such as testers. Such 
> an option should *not* be exposed on the command line due to obvious 
> potential misuse by a naive user, but instead should be set by the RubyTester 
> simulation config file. I would likely oppose a per-instance PacketQueue 
> buffering limit, because that would just encourage strange, improper use of 
> fake buffering. Until we can actually see your GPU code, I feel that the 
> option should only be allowed to be used with the RubyTester in mainline gem5 
> code.
> 
> Brad Beckmann wrote:
>     Thanks Joel for voicing your preference for the finite buffering case.  
> Once Andreas approves as well, I'll implement and post that patch.
>     
>     As far as your concern with an unbalanced GPU system, please keep in mind 
> this buffer is for responses.  Also in our implementation each work-item 
> issues a separate packet request that is coalesced in the RubyPort.  All it 
> takes is 2 wavefronts (2x64 work-items) to be waiting on the same few cache 
> lines.  When those cache lines arrive at the GPU L1 cache, 128 packets will 
> be sent back to the GPU cache thus the threshold will be exceeded.  I do not 
> think this is a result of an unbalanced system.
> 
> Andreas Hansson wrote:
>     I would like to understand the architecture a bit better before simply 
> increasing the limit. Again, I propose we get this patch pushed without this 
> change, since I envision that the discussion has not quite converged.
>     
>     From the brief description it sounds like the issue arises due to a 
> rather unfortunate use of packets. Perhaps I am missing something, but why 
> not simply add masking or do the coalescing before the "real" port? Are the 
> GPU L1 caches not taking care of this already (or are the L1's in Ruby?).
>     
>     I had a look at the RubyPort, and it seems all cases of QueuedPort can 
> easily be removed and replaced simply by forwarding retries from one side to 
> the other. The one exception I cannot figure out is the "hit_callback". How 
> can we tell Ruby that a response needs to wait?


Yes this discussion has not quite converge, but I think that is mostly due to 
your concerns.  You say "rather unfortunate use of packets".  Perhaps we used 
packets differently than what you would have expected, but do not believe that 
deems the use improper or unfortunate.  Please be open to uses of gem5 beyond 
the ones you have previously considered.

The GPU L1 caches are modeled in Ruby.  Memory requests are sequenced and 
coalesced in the RubyPort.  This is no different than the other Ruby protocols.

The "hit_callback" is pretty fundamental to Ruby.  It has been designed to 
always "sink" responses because the protocols are set up to reserve the 
resources ahead of time.  I believe that is fairly representative of real 
hardware and please note resources like register ports are outside of the realm 
of Ruby.  Thus responses will stall inside the RubyPort.

Please let's not stall this necessary change on a fundamental redesign of Ruby.


- Brad


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://reviews.gem5.org/r/2776/#review6152
-----------------------------------------------------------


On May 11, 2015, 10:28 p.m., Tony Gutierrez wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://reviews.gem5.org/r/2776/
> -----------------------------------------------------------
> 
> (Updated May 11, 2015, 10:28 p.m.)
> 
> 
> Review request for Default.
> 
> 
> Repository: gem5
> 
> 
> Description
> -------
> 
> Changeset 10833:e624796bae17
> ---------------------------
> ruby: cleaner ruby tester support
> 
> This patch allows the ruby random tester to use ruby ports that may only
> support instr or data requests.  This patch is similar to a previous changeset
> (8932:1b2c17565ac8) that was unfortunately broken by subsequent changesets.
> This current patch implements the support in a more straight-forward way.
> The patch also includes better DPRINTFs and generalizes the retry behavior
> needed by the ruby tester so that other testers/cpu models can use it as well.
> 
> 
> Diffs
> -----
> 
>   configs/example/ruby_random_test.py 
> fbdaa08aaa426b9f4660c366f934ccb670d954ec 
>   configs/ruby/MESI_Three_Level.py fbdaa08aaa426b9f4660c366f934ccb670d954ec 
>   configs/ruby/MESI_Two_Level.py fbdaa08aaa426b9f4660c366f934ccb670d954ec 
>   configs/ruby/MI_example.py fbdaa08aaa426b9f4660c366f934ccb670d954ec 
>   configs/ruby/MOESI_CMP_directory.py 
> fbdaa08aaa426b9f4660c366f934ccb670d954ec 
>   configs/ruby/MOESI_CMP_token.py fbdaa08aaa426b9f4660c366f934ccb670d954ec 
>   configs/ruby/MOESI_hammer.py fbdaa08aaa426b9f4660c366f934ccb670d954ec 
>   src/cpu/testers/rubytest/Check.cc fbdaa08aaa426b9f4660c366f934ccb670d954ec 
>   src/cpu/testers/rubytest/CheckTable.cc 
> fbdaa08aaa426b9f4660c366f934ccb670d954ec 
>   src/cpu/testers/rubytest/RubyTester.hh 
> fbdaa08aaa426b9f4660c366f934ccb670d954ec 
>   src/cpu/testers/rubytest/RubyTester.cc 
> fbdaa08aaa426b9f4660c366f934ccb670d954ec 
>   src/cpu/testers/rubytest/RubyTester.py 
> fbdaa08aaa426b9f4660c366f934ccb670d954ec 
>   src/mem/packet_queue.cc fbdaa08aaa426b9f4660c366f934ccb670d954ec 
>   src/mem/ruby/system/RubyPort.hh fbdaa08aaa426b9f4660c366f934ccb670d954ec 
>   src/mem/ruby/system/RubyPort.cc fbdaa08aaa426b9f4660c366f934ccb670d954ec 
>   src/mem/ruby/system/Sequencer.py fbdaa08aaa426b9f4660c366f934ccb670d954ec 
> 
> Diff: http://reviews.gem5.org/r/2776/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Tony Gutierrez
> 
>

_______________________________________________
gem5-dev mailing list
gem5-dev@gem5.org
http://m5sim.org/mailman/listinfo/gem5-dev

Re: [gem5-dev] Review Request 2776: ruby: cleaner ruby tester support

Reply via email to