-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://reviews.gem5.org/r/1422/#review3890
-----------------------------------------------------------


Let me start by saying that I really support the effort and think we should 
enable modelling of this resource constraint. That said, there are a number of 
issues that I see with how it is approached at the moment. I hope we can find a 
solution to all of them. The most important concerns are the changes to the 
PacketQueue and the notion of the "incoming cycle". See the code comments for 
more details.

Thanks again for all the hard work.


src/cpu/o3/iew_impl.hh
<http://reviews.gem5.org/r/1422/#comment3740>

    Is this needed for the patch and if so, could you be more descriptive?



src/cpu/o3/inst_queue.hh
<http://reviews.gem5.org/r/1422/#comment3741>

    The name does not really distinguish it from the existing 
getDeferredMemInstToExecute. Could we either use the same name or somehow make 
the TLB vs port distinction more clear?



src/cpu/o3/inst_queue.hh
<http://reviews.gem5.org/r/1422/#comment3742>

    Similar to previous remark



src/cpu/o3/inst_queue.hh
<http://reviews.gem5.org/r/1422/#comment3743>

    Once again the name is a bit odd compared to deferredMemInsts



src/cpu/o3/inst_queue_impl.hh
<http://reviews.gem5.org/r/1422/#comment3744>

    I would prefer to keep the assignment outside the condition



src/cpu/o3/inst_queue_impl.hh
<http://reviews.gem5.org/r/1422/#comment3745>

    It looks like what you want is a deque and not a list?
    
    front and pop_front would be more descriptive



src/cpu/o3/lsq.hh
<http://reviews.gem5.org/r/1422/#comment3746>

    please use an unsigned type



src/cpu/o3/lsq.hh
<http://reviews.gem5.org/r/1422/#comment3747>

    const?



src/cpu/o3/lsq.hh
<http://reviews.gem5.org/r/1422/#comment3748>

    const?



src/cpu/o3/lsq.hh
<http://reviews.gem5.org/r/1422/#comment3749>

    const?



src/cpu/o3/lsq.hh
<http://reviews.gem5.org/r/1422/#comment3750>

    const



src/cpu/o3/lsq.hh
<http://reviews.gem5.org/r/1422/#comment3751>

    const



src/cpu/o3/lsq_impl.hh
<http://reviews.gem5.org/r/1422/#comment3752>

    When is a port "used"?
    
    To me it seems like a port that tried and has not yet succeeded is somehow 
still "used". Perhaps the confusion is only on my side.



src/cpu/o3/lsq_impl.hh
<http://reviews.gem5.org/r/1422/#comment3753>

    Is the check really needed?



src/cpu/o3/lsq_impl.hh
<http://reviews.gem5.org/r/1422/#comment3754>

    && ?



src/cpu/o3/lsq_impl.hh
<http://reviews.gem5.org/r/1422/#comment3755>

    && ?



src/cpu/o3/lsq_impl.hh
<http://reviews.gem5.org/r/1422/#comment3756>

    Not really related to this patch



src/cpu/o3/lsq_impl.hh
<http://reviews.gem5.org/r/1422/#comment3757>

    Useful, but not really related to this patch



src/cpu/o3/lsq_impl.hh
<http://reviews.gem5.org/r/1422/#comment3758>

    Whitespace



src/cpu/o3/lsq_impl.hh
<http://reviews.gem5.org/r/1422/#comment3759>

    Useful, but preferably in a patch on its own



src/cpu/o3/lsq_unit_impl.hh
<http://reviews.gem5.org/r/1422/#comment3760>

    I'd suggest making the debug improvements a separate patch



src/cpu/o3/lsq_unit_impl.hh
<http://reviews.gem5.org/r/1422/#comment3761>

    Were there not even more fields added to the class?



src/cpu/o3/lsq_unit_impl.hh
<http://reviews.gem5.org/r/1422/#comment3762>

    We failed, and yet we still delete all these bits and pieces. I don't 
follow the logic here.



src/mem/cache/base.cc
<http://reviews.gem5.org/r/1422/#comment3764>

    We should do this as a separate patch



src/mem/packet_queue.hh
<http://reviews.gem5.org/r/1422/#comment3763>

    It is very unfortunate that this all ends up in the PacketQueue, which is 
used in far more places than where this functionality is needed.
    
    Somehow I wish we could contain this in the CPU, and the current way of 
putting the responsibility on the PacketQueue seems odd to me.



src/sim/clocked_object.hh
<http://reviews.gem5.org/r/1422/#comment3765>

    I think this is a really dodgy one. In gem5 we have a general problem in 
that there is no notify/update distinction as in SystemC or any RTL simulator. 
Thus, what happens at time T is visible in zero time, and the order of 
evaluating events at time T can have big repercussions.
    
    In your case, the difference between the CPU going first and the cache 
going first is causing the problem as far as I can tell. I think we should try 
and solve this problem rather than work around it like this. I don't know what 
the general feeling is, but I fear that event priorities might not even be able 
to solve it. I am afraid I don't have any easy solution...


- Andreas Hansson


On Jan. 18, 2013, 11:49 p.m., Amin Farmahini wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://reviews.gem5.org/r/1422/
> -----------------------------------------------------------
> 
> (Updated Jan. 18, 2013, 11:49 p.m.)
> 
> 
> Review request for Default.
> 
> 
> Description
> -------
> 
> I made some changes to O3 to model the bandwidth between O3 and L1. By 
> bandwidth I mean the number of requests and responses sent or received each 
> cycle (not the amount of data transferred). I limit both the number of 
> requests sent by O3 and the number of responses received by O3.
> 
> For REQUESTS:
> I have a separate read requests (loads) counter and a separate write requests 
> (stores) counter and a separate shared requests (read/write) counter.
> LOADS: O3 limits the number of read requests sent each cycle to the number of 
> defined cache read ports.
> STORES: Similarly, O3 limits the number of write requests sent each cycle to 
> the number of defined cache write ports.
> Also, shared ports can be used for read/write requests.
> Note that no matter how many ports are defined, we have only a single actual 
> cache port used for all read and write requests. So just like the current 
> gem5 code, only one dcachePort is defined in the code. However, I limit the 
> number of requests to the number of cache ports defined in parameters.
> 
> For RESPONSES:
> If there are not enough cache ports, response packets are buffered in the 
> port (cache side) and are sent to the processor next cycle.
> 
> I don't believe what I implemented is the best way to model cache ports here, 
> so your feedback would be appreciated.
> 
> 
> Diffs
> -----
> 
>   src/cpu/base_dyn_inst.hh UNKNOWN 
>   src/cpu/o3/O3CPU.py UNKNOWN 
>   src/cpu/o3/iew_impl.hh UNKNOWN 
>   src/cpu/o3/inst_queue.hh UNKNOWN 
>   src/cpu/o3/inst_queue_impl.hh UNKNOWN 
>   src/cpu/o3/lsq.hh UNKNOWN 
>   src/cpu/o3/lsq_impl.hh UNKNOWN 
>   src/cpu/o3/lsq_unit.hh UNKNOWN 
>   src/cpu/o3/lsq_unit_impl.hh UNKNOWN 
>   src/mem/cache/BaseCache.py UNKNOWN 
>   src/mem/cache/base.hh UNKNOWN 
>   src/mem/cache/base.cc UNKNOWN 
>   src/mem/cache/cache.hh UNKNOWN 
>   src/mem/cache/cache_impl.hh UNKNOWN 
>   src/mem/packet_queue.hh UNKNOWN 
>   src/mem/packet_queue.cc UNKNOWN 
>   src/mem/port.hh UNKNOWN 
>   src/mem/port.cc UNKNOWN 
>   src/sim/clocked_object.hh UNKNOWN 
> 
> Diff: http://reviews.gem5.org/r/1422/diff/
> 
> 
> Testing
> -------
> 
> a few small benches done only in SE and classic
> 
> 
> Thanks,
> 
> Amin Farmahini
> 
>

_______________________________________________
gem5-dev mailing list
gem5-dev@gem5.org
http://m5sim.org/mailman/listinfo/gem5-dev
