> On May 12, 2015, 5:45 a.m., Andreas Hansson wrote: > > src/mem/packet_queue.cc, line 117 > > <http://reviews.gem5.org/r/2776/diff/1/?file=45138#file45138line117> > > > > I strongly disagree with this change. This buffer should not exist, and > > even 100 packets is a stretch. Any module hitting this needs to recondiser > > how it deals with flow control > > Brad Beckmann wrote: > I would agree with you that the buffer should not exist, but removing the > buffer goes well beyond this patch. The worst part about the buffer is the > size is not visible to the sender. It is a really bad design, but for now we > need to get around the immediate problem. Later we can discuss how and who > will fix this right. > > We've discussed the proposal in the past to increase the size, but > unfortunately we have not came to a resolution. We absolutely need to > resolve this now. We cannot use the ruby tester with our upcoming GPU model > without this change. > > The important thing to keep in mind is that the RubyTester is designed to > stress the protocol logic and it creates contention scenarios that would not > exist in a real system. The RubyTester finds bugs in the matter of seconds > that may not be encountered in hours of real workload simulation. It does > this by sending large bursts of racey requests. Bottomline: the RubyTester > needs this large value. > > Andreas Hansson wrote: > I would argue that if you need flow control, you should not use the > QueuedPorts, and rather use a "bare-metal" MasterPort, and deal with the flow > control. There is no point in adding flow control (or buffer visibility) to > the QueuePort in my opinion. > > I'd suggest to switch to a MasterPort either as part of this patch, or a > patch before it. That way you have no implicit buffer, and no need to create > kilobytes of invisible buffering in the system. > > Brad Beckmann wrote: > The RubyPort and the Ruby Tester predate MasterPorts and QueuedPorts. > Your suggestion goes far beyond this patch. Reimplementing RubyPort's m5 > ports to inherit from a different base port is a very signficant change with > many, many implications beyond the Ruby Tester. That is a pretty > unreasonable request. > > Please keep in mind that I was not the one who decided RubyPort to use > QueuedPorts. That decision was made back in 2012 with patches from your group > 8914:8c3bd7bea667, and 8922:17f037ad8918. > > Jason Power wrote: > Is the queued port really supposed to be a high-fidelity model of > anything in the "real" world? I was under the impression that it was a > courtesy wrapper around the port so that we don't have 10 different > implementations of the same general thing. > > IMO, if you are trying to model an on-chip network at high-fidelity you > need something more than what is provided by this model anyway. It seems to > me that in this case, the queued port is a gem5 interface decision, not a > hardware system design decision. Therefore, infinite buffering is not a major > problem. > > What if this was turned into a "warn" or a "warn_once"? It seems like a > good idea to let the user know that the queue is growing wildly as that might > be an indication of a deadlock. > > (Note: IIRC with our high-bandwidth GPU model we've run into this 100 > packet limit as well.) > > Andreas Hansson wrote: > I think: > > 1) The 100 packet sanity check should definitely not be changed as part > of this patch. If we do want to change it, it should be a patch on its own. > > 2) If there are somehow 10000 responses queued up, then something is > truly _fundamentally_ broken, and having a panic telling the user it is > broken is helpful imho. Do you not agree? > > The QueuedPort is an evolution of the SimpleTimingPort that was used > earlier, and I believe the RubyPort has always had this issue (it borrowed a > lot of not-so-great design decisions from the Bus for example). The > QueuedPort should really only be used as a "lazy" way of implementing > non-performance critical devices. I've been meaning to remove it from the > classic Caches as well, just not gotten around to it. > > In the case of the RubyPort, I do not see the big issue of moving away > from QueuedPorts. We have already done so in the Bridge, SimpleMemory etc. > The only implication is that we need to deal with the port flow control > (retry), and pass that downstream/upstream. Since the RubyPort really does > not correspond to any real physical object, with any buffers, I think the > case is clear: It should not do any buffering behind the user's back. Do you > not think this is reasonable? > > Brad Beckmann wrote: > Yes, if there is 10000 responses queued up, then there might be something > broken. However, 100 responses can be queued up rather easily when running > with the Ruby Tester or with a GPU. That is why this patch changes the > sanity check from 100 to 10000. > > RubyPort is used by a lot of users, with many different > scenarios/protocols. Moving it away from QueuedPorts will fundamentally > change its behavior. That should be done carefully and I don't want to delay > this bug fix with a project that large. > > Would you be happy if we introduced a patch that modifies the > QueuedSlavePort, RespPacketQueue, and PacketQueue constructors so that the > RubyPort can set the transmitList threshold to 10000 and the default can > still be set to 100? > > Andreas Hansson wrote: > Can we simply leave this change out of this patch? I think the right > solution here is to fix the dodgy assumptions on implicit buffering, and I'm > happy to have a go myself (given that I've done something dimilar to a range > of other modules). > > Brad Beckmann wrote: > The RubyTester and our soon-to-be-posed HSA-compatible GPU will not work > without this change. I believe many Ruby users, especially Ruby users > evaluating GPUs, have had this type of change in their personal patch queues > for a very long time. We need to get this problem out of our personal patch > queues and into the main line. > > I appreciate your offer to redesign the RubyPort yourself, but even if > that were to happen, I would prefer to get this change happen first. A > redesign of the RubyPort will require a lot of downstream work to verify > things still work as expected. > > Why are you so against changing this somewhat arbritrary value, or at > least allowing someone to change it? As Jason pointed out, the queue port is > an infinite buffer and doesn't represent a real hardware structure. Why so > much resistance? This small change really benefits Ruby users and if we make > it configurable, it will have no impact on Classic users. > > Andreas Hansson wrote: > I fundamentally object to the change since it effectively adds an > implicit infinite buffer (in quite some places), thus changing the very > behaviour of the system and potentially masking problems. > > The RubyPort should really just be a glue-logic adaptor. Agreed? In a > system architecture, there is no such thing as a RubyPort, and there should > not be kilobytes of "magic" buffering imposed on the user. If this buffer is > needed, then we should instantiate a buffer, cache, etc. I do not understand > the statement that this is needed to make things work. If you need an > infinite buffer to make things work, then the solution is broken. Am I > missing something? > > I would propose to simply remove the QueuedPort, make it inherit from the > Master/SlavePort directly, remove the implicit buffer altogether, and simply > relay the retry to/from Ruby. Surely Ruby understands flow control...does it > not? > > Steve Reinhardt wrote: > Getting rid of implicit infinite buffering is a noble and appropriate > long-term goal, I agree. However, in the near term, we already have implicit > infinite buffering, and the only thing we're debating here is at what > arbitrary point do we warn the user that the use of that infinite buffer is > potentially getting abusive. To me it seems obvious that the fundamental > near-term problem is that we have this totally arbitrary constant embedded in > the code and no way to adjust it based on different circumstances in which > different values would be appropriate. I think Brad's suggestion that we > make this threshold configurable and then find some way to propagate the > value downward seems like the right way to go for now; we can leave the > threshold at 100 for CPU-based systems, and crank it up for GPU-based > systems, with maybe a third even higher thershold specifically for when we > run RubyTester. > > Andreas Hansson wrote: > I'd rather have one arbitrary constant than three :-), and the lower the > better. > > Can we just proceed with this patch without this change? At least then I > can perhaps get an appreciation for the issue (if you can point me to an > example that fails). Right now I really do not understand how this could ever > be a problem in any sensible setup. > > Joel Hestness wrote: > I agree with Andreas on this one. I don't feel that an "arbitrary" > constant should be changed to another "arbitrary" constant in a patch that is > trying to fix up the RubyTester. There must be configurations of the > RubyTester that will work without increasing the threshold to 10000, so the > RubyTester changes will still work for some configurations. These are largely > orthogonal issues. > > Split the threshold change into a separate patch where we can focus on > debating the merits of changes between so-called "arbitrary" buffering limits > (e.g. gem5-gpu doesn't require any changes to the RubyPort or QueuedPort > buffering thresholds). > > Steve Reinhardt wrote: > We can certainly split this change into a separate patch. It's true that > 10,000 is just another arbitrary value; it may make more sense to just add a > parameter that lets us turn off this threshold check altogether rather than > create another arbitrary constant. Note that the RubyTester is not trying to > be a "sensible setup", it's a stress tester that is stressing the protocol > implementation beyond the limits of what a "sensible" configuration might be > able to do.
I have no problem moving this change to a separate patch, but we still need to provide flexibility on either setting this arbitrary value or turning it off. Leaving it as is will not work for us. We have important configurations of the RubyTester and other internal testers that reach the 100 packet threshold. I don't want to create a separate patch only to spawn off another week of high-level resistance. Can you confirm that if we create a patch that allows an object to configure the threshold or turn it off, that you will approve it? Do you have a preference between those two options? - Brad ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: http://reviews.gem5.org/r/2776/#review6152 ----------------------------------------------------------- On May 11, 2015, 10:28 p.m., Tony Gutierrez wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > http://reviews.gem5.org/r/2776/ > ----------------------------------------------------------- > > (Updated May 11, 2015, 10:28 p.m.) > > > Review request for Default. > > > Repository: gem5 > > > Description > ------- > > Changeset 10833:e624796bae17 > --------------------------- > ruby: cleaner ruby tester support > > This patch allows the ruby random tester to use ruby ports that may only > support instr or data requests. This patch is similar to a previous changeset > (8932:1b2c17565ac8) that was unfortunately broken by subsequent changesets. > This current patch implements the support in a more straight-forward way. > The patch also includes better DPRINTFs and generalizes the retry behavior > needed by the ruby tester so that other testers/cpu models can use it as well. > > > Diffs > ----- > > configs/example/ruby_random_test.py > fbdaa08aaa426b9f4660c366f934ccb670d954ec > configs/ruby/MESI_Three_Level.py fbdaa08aaa426b9f4660c366f934ccb670d954ec > configs/ruby/MESI_Two_Level.py fbdaa08aaa426b9f4660c366f934ccb670d954ec > configs/ruby/MI_example.py fbdaa08aaa426b9f4660c366f934ccb670d954ec > configs/ruby/MOESI_CMP_directory.py > fbdaa08aaa426b9f4660c366f934ccb670d954ec > configs/ruby/MOESI_CMP_token.py fbdaa08aaa426b9f4660c366f934ccb670d954ec > configs/ruby/MOESI_hammer.py fbdaa08aaa426b9f4660c366f934ccb670d954ec > src/cpu/testers/rubytest/Check.cc fbdaa08aaa426b9f4660c366f934ccb670d954ec > src/cpu/testers/rubytest/CheckTable.cc > fbdaa08aaa426b9f4660c366f934ccb670d954ec > src/cpu/testers/rubytest/RubyTester.hh > fbdaa08aaa426b9f4660c366f934ccb670d954ec > src/cpu/testers/rubytest/RubyTester.cc > fbdaa08aaa426b9f4660c366f934ccb670d954ec > src/cpu/testers/rubytest/RubyTester.py > fbdaa08aaa426b9f4660c366f934ccb670d954ec > src/mem/packet_queue.cc fbdaa08aaa426b9f4660c366f934ccb670d954ec > src/mem/ruby/system/RubyPort.hh fbdaa08aaa426b9f4660c366f934ccb670d954ec > src/mem/ruby/system/RubyPort.cc fbdaa08aaa426b9f4660c366f934ccb670d954ec > src/mem/ruby/system/Sequencer.py fbdaa08aaa426b9f4660c366f934ccb670d954ec > > Diff: http://reviews.gem5.org/r/2776/diff/ > > > Testing > ------- > > > Thanks, > > Tony Gutierrez > > _______________________________________________ gem5-dev mailing list [email protected] http://m5sim.org/mailman/listinfo/gem5-dev
