On Tue, Apr 9, 2013 at 11:44 AM, Nilay Vaish <[email protected]> wrote:
>
> On Tue, 9 Apr 2013, Joel Hestness wrote:
>>
>>  On April 8, 2013, 8:11 a.m., Andreas Hansson wrote:
>>>
>>>> src/mem/ruby/system/RubyPort.cc, line 500
>>>> <http://reviews.gem5.org/r/1813/diff/1/?file=34762#file34762line500>
>>>>
>>>>
>>>>     Could you add a comment that the request does not need a response
>>>> and is thus deleted with the packet.
>>>>
>>>>     Also, it would be good to highlight that the same packet
>>>> (potentially altered) is now sent to all the ports and this is making
>>>> assumptions about what they do (or don't do) to it.
>>>>
>>>>     Thanks
>>>>
>>>
>> Added these comments to the patch for checkin.
>>
>>
>>
> Why not move the line of code back inside the loop?
>
>
That's a good question.  First, note that ALL existing recvTimingSnoopReq()
implementations either "drop" the packets they receive (e.g. the timing CPUs
and page table walkers) or "pass them through" (e.g. the classic caches and
buses).  In other words, no current implementation of recvTimingSnoopReq()
deletes the packet it receives.
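
For concreteness, a "drop" style receiver looks roughly like the sketch
below.  The class, port, and helper names are made up for illustration;
only recvTimingSnoopReq(), PacketPtr, and the Packet accessors are real
interfaces here:

  // Hypothetical "drop" style snoop receiver, in the spirit of the
  // timing CPU and walker ports: inspect the packet and react, but
  // never take ownership of it or delete it.
  void
  MyObject::MyCpuPort::recvTimingSnoopReq(PacketPtr pkt)
  {
      if (pkt->isInvalidate()) {
          // e.g. squash any locally buffered state for this address
          owner->handleInvalidation(pkt->getAddr());
      }
      // The packet is neither modified nor deleted here; the sender
      // still owns it after this call returns.
  }
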
With that in mind, here are the options I considered for handling this:

 1) Allocate a single packet and request per invalidation, as I have in this
review request (see the sketch after this list).  This requires the fewest
memory allocations, and it is correct because the only components that
currently observe these requests are the LSQs (O3 and inorder), neither of
which modifies the packet.  (If we go this route, perhaps we should declare
the recvTimingSnoopReq() packet parameter const?)

 2) Stack allocate a packet and heap allocate a request inside the
conditional in the loop, which would result in a request heap allocation for
each snooping port (there are usually 2-3 snoopers per invalidation: icache,
dcache, and the page table walker on architectures that have one).  This
results in 2-3X more allocations than my review request and would be correct.

 3) Heap allocate both the packet AND the request inside the conditional in
the loop, and delete each packet immediately after it is sent.  This results
in 4-6X more allocations than my review request and would be correct.

 4) Heap allocate both the packet AND request within the conditional in the
loop, and enforce that the destination of the packet delete it.  This would
result in 4-6X more allocations than my request and would be correct.  This
would require fixing the ~8 places that implement recvTimingSnoopReq() to
delete the packets.
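
To make the comparison concrete, option (1) would look roughly like the
following.  The identifiers, constructor arguments, and the exact MemCmd
name are approximations rather than text copied from the patch;
sendTimingSnoopReq() and isSnooping() are the real port interfaces:

  // Rough sketch of option (1): one Request and one Packet, allocated
  // once per invalidation, are handed to every snooping port.
  // addr, block_size, and master_id stand in for the values RubyPort
  // actually uses.
  Request *req = new Request(addr, block_size, 0, master_id);
  Packet pkt(req, MemCmd::InvalidateReq);  // command name approximate

  for (size_t i = 0; i < slave_ports.size(); ++i) {
      if (slave_ports[i]->isSnooping()) {
          // Every snooper sees the *same* packet, so this relies on
          // none of them modifying or deleting it.
          slave_ports[i]->sendTimingSnoopReq(&pkt);
      }
  }
  // No response is expected; per the review comment above, the request
  // is deleted along with the packet when the stack-allocated packet
  // goes out of scope.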

Some other relevant notes from this investigation:
 - These invalidation snoop requests are handled inconsistently with all
other requests that do not require a response.  In the other cases (e.g. a
FlushReq received by a Ruby Sequencer), the destination of the request
deletes the packet, and the request as well if it is not reused (see the
sketch after these notes).
 - Most sendTimingSnoopResp() code paths appear to be stubbed out, and the
packet queue is currently the only component that calls
sendTimingSnoopResp().  Even there, no master ports use the deferred-packet
send_as_snoop functionality, so it's not clear to me that this code path is
actually exercised.
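
For contrast, a receiver-deletes convention (what the FlushReq path does
today, and what option (4) would require of snoop receivers) would look
roughly like this; the class, port, and helper names are again made up for
illustration:

  // Hypothetical receiver-deletes variant: each snooping port gets its
  // own heap-allocated packet and takes ownership of it.
  void
  MyObject::MyCpuPort::recvTimingSnoopReq(PacketPtr pkt)
  {
      owner->handleInvalidation(pkt->getAddr());
      // The receiver cleans up.  Since no response is needed, the
      // request is deleted along with the packet, per the ownership
      // rule mentioned in the review comments above.
      delete pkt;
  }
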


There is an obvious simplicity argument to choosing (1), so that's what I
submitted.  It may not be the preferred solution though, so I'm willing to
reimplement an alternative.  Let me know if you think we should take
another route.

  Thanks,
  Joel


-- 
  Joel Hestness
  PhD Student, Computer Architecture
  Dept. of Computer Science, University of Wisconsin - Madison
  http://www.cs.utexas.edu/~hestness
