I think there are two (or more) slightly different but inter-related things going on here. One is the issue Amin has already raised: we don't really model CPU/cache "ports" properly, particularly in terms of how responses get returned to the CPU. I think this is the root problem here. Basically, when a bunch of responses are ready to come back at the same time, they are delivered in sequential ticks, which is really never the right answer: if you really have N ports, you should be able to deliver N responses in the same tick, while if you have more responses than ports, the extra responses should be delivered on later CPU clocks (where the number of clocks later depends on the port bandwidth), not just a single tick later.
A second issue deals with how events are scheduled within a single tick, which comes into play because, if we fix the first problem, we will have multiple responses coming back in the same tick. Andreas is right that static priorities don't form a really general solution, but I think they can be adequate, given that we're not trying to do a full-blown, Verilog-style, sensitivity-list-driven RTL model, but something at a higher level of abstraction. That said, we need to do a better job of defining what our abstractions are, and what rules need to be followed to make them work consistently.

It's late on a Friday afternoon, so I won't venture to put forth any concrete proposals at this point, but I wanted to reply on this thread to clarify that we may be making this problem too hard by not delineating its various semi-separable components.

Steve

On Fri, Feb 15, 2013 at 10:29 AM, Amin Farmahini <[email protected]> wrote:

> Mitch, regarding our conversation last night and your post, you may want
> to take a look at
> http://www.mail-archive.com/[email protected]/msg04107.html
>
> Thanks,
> Amin
>
> On Fri, Feb 15, 2013 at 12:05 PM, Mitch Hayenga <[email protected]> wrote:
>
>> Never mind, O3 has this problem as well...
>>
>> 28231500: system.cpu.iew.lsq.thread0: Doing memory access for inst
>> [sn:105942] PC (0x899a=>0x899c).(0=>1)
>> 28231500: system.cpu.iew.lsq.thread0: Doing memory access for inst
>> [sn:105943] PC (0x899c=>0x899e).(0=>1)
>> 28232500: system.cpu.iew.lsq.thread0: Writeback event [sn:105942].
>> 28232500: system.cpu.iew: Sending instructions to commit, [sn:105942] PC
>> (0x899a=>0x899c).(0=>1).
>> 28232501: system.cpu.iew.lsq.thread0: Writeback event [sn:105943].
>> 28233000: system.cpu.iew: Sending instructions to commit, [sn:105943] PC
>> (0x899c=>0x899e).(0=>1).
>>
>> Two loads were sent to the memory system in the same cycle and both hit
>> in the L1 cache, yet they see different latencies.
>>
>> On Fri, Feb 15, 2013 at 10:36 AM, Mitch Hayenga <[email protected]> wrote:
>>
>>> This is a nicely timed thread. I just hit a related ticking issue while
>>> performance-validating my core model. Here is an example case:
>>>
>>>     ld r1, [sp, #0x16]  // L1 cache hit
>>>     ld r2, [sp, #0x24]  // L1 cache hit
>>>
>>> My core assumes 2 load ports, so both of these loads issue and hit in
>>> the same cycle. But the way the gem5 memory system sends the hits back
>>> results in my core receiving the response packets on different cycles.
>>>
>>> Assuming a 1000 ps clock, load 1 returns on tick 1000 and load 2 returns
>>> on tick 1001. Because gem5 chooses to tick the memory system before my
>>> core, here is the resulting timing:
>>>
>>>     1000: load 1 returns via recvTimingResp
>>>     1000: cpu ticks, seeing load 1
>>>     1001: load 2 returns via recvTimingResp
>>>     2000: cpu ticks, seeing load 2
>>>
>>> I need to look into how O3 handles this (because it seems to). For a
>>> general solution, I wonder if making ports in the memory system aware of
>>> their connections' "delta" cycles would help get around issues like this.
>>> E.g., all responses would be sent within (0, cycle_ticks), not on either
>>> boundary, so there would be no risk of straddling clock cycles.
>>>
>>> I know Amin hit this issue with his CPU core as well and had to hack in
>>> a fix for it.
>>>
>>> On Fri, Feb 15, 2013 at 5:25 AM, Andreas Hansson <[email protected]> wrote:
>>>
>>>> Hi Steve,
>>>>
>>>> This is getting a bit philosophical, so please excuse me for getting
>>>> a bit off topic.
>>>>
>>>> I spent some time thinking about this yesterday, and I am not sure
>>>> how the priorities really help us in the general case, as they rely on a
>>>> loop-free "spanning tree" of event orders. Take for example a cache and a
>>>> bus.
>>>> Should we prioritise "releasing" the bus and forwarding a response to
>>>> the (potentially blocked) cache, or should we prioritise trying to send a
>>>> request from the cache? What if there are buses on both sides of the cache?
>>>> There are many similar situations in gem5. The only way the priorities
>>>> would really help is if, at instantiation time, we managed to sort all
>>>> events by splitting input-consuming and output-generating code, always
>>>> bridging them with events, and then flattening the entire system such that
>>>> there were no loops (which I think is impossible).
>>>>
>>>> I could be missing something here, but my feeling is that the
>>>> priorities are not really useful when it comes to getting the event
>>>> scheduling right for interconnected components.
>>>>
>>>> Andreas
>>>>
>>>> From: Steve Reinhardt <[email protected]>
>>>> Reply-To: gem5 users mailing list <[email protected]>
>>>> Date: Thursday, 14 February 2013 18:10
>>>> To: gem5 users mailing list <[email protected]>
>>>> Subject: Re: [gem5-users] Question about PacketQueue::scheduleSend
>>>>
>>>> For events that are scheduled in the same cycle, we can use the
>>>> event priorities to control relative ordering. Our initial assumption, when
>>>> we saw that the cache ack was occurring after the CPU's check of the
>>>> store_in_flight flag, was that it would just be a matter of changing the
>>>> priorities, but then we found that the ack was coming a full tick later, so
>>>> obviously it was not so simple.
>>>>
>>>> If there's a more general problem and/or a more general solution
>>>> here, we'd be glad to hear about it.
>>>>
>>>> Steve
>>>>
>>>> On Thu, Feb 14, 2013 at 3:39 AM, Andreas Hansson <[email protected]> wrote:
>>>>
>>>>> Hi Binh,
>>>>>
>>>>> Thanks for your question.
>>>>>
>>>>> The reason for the +1 is that gem5 does not have a "proper" delta
>>>>> cycle.
>>>>> Ultimately, everything that happens between curTick and the next
>>>>> "cycle" is at the same time, but gem5 is a bit unclear on this point in
>>>>> that the execution and evaluation order actually matters. If there are
>>>>> clock edges at 500 and 1000, then everything in [500, 1000) is in the
>>>>> same cycle. There was a discussion a few weeks back about how this is
>>>>> solved in the O3 CPU.
>>>>>
>>>>> Even if we remove the +1, there is no guarantee that the packet
>>>>> queue's notion of time T comes before (or after) someone else's notion of
>>>>> time T. Thus, we could remove the +1, and it might solve this specific
>>>>> case, but in general it does not solve the more general problem of
>>>>> concurrent events. The CPU might first see the 500, conclude there is
>>>>> nothing to do, and then get the response from the cache.
>>>>>
>>>>> I'm open to suggestions and keen to know what people think is the
>>>>> best way forward.
>>>>>
>>>>> Andreas
>>>>>
>>>>> From: "Binh Q. Pham" <[email protected]>
>>>>> Reply-To: gem5 users mailing list <[email protected]>
>>>>> Date: Wednesday, 13 February 2013 18:16
>>>>> To: gem5 users mailing list <[email protected]>
>>>>> Subject: [gem5-users] Question about PacketQueue::scheduleSend
>>>>>
>>>>> Hi everyone,
>>>>>
>>>>> In mem/packet_queue.cc, in PacketQueue::scheduleSend(Tick time), we have:
>>>>>
>>>>>     // the next ready time is either determined by the next deferred packet,
>>>>>     // or in the cache through the MSHR ready time
>>>>>     Tick nextReady = std::min(deferredPacketReadyTime(), time);
>>>>>     if (nextReady != MaxTick) {
>>>>>         // if the sendTiming caused someone else to call our
>>>>>         // recvTiming we could already have an event scheduled, check
>>>>>         if (!sendEvent.scheduled()) {
>>>>>             em.schedule(&sendEvent, std::max(nextReady, curTick() + 1));
>>>>>         }
>>>>>     }
>>>>>
>>>>> Why do we do curTick() + 1? If a clock cycle is a multiple of ticks,
>>>>> e.g.
>>>>> 500 ticks, and curTick() is 500 (i.e., cycle 1), then curTick() + 1
>>>>> effectively schedules the event at Tick 501, which is neither cycle 1
>>>>> (Tick 500) nor cycle 2 (Tick 1000).
>>>>>
>>>>> I noticed this while debugging a problem related to back-to-back
>>>>> stores sent to the cache. Basically, I have two stores: store #1 is sent
>>>>> to the cache in cycle 1, and ideally should get the cache ACK in cycle 2
>>>>> (the cache hit latency is 1 cycle). In cycle 2, the CPU checks whether
>>>>> any store is in flight before it can send out store #2, but it cannot,
>>>>> because store #1's ACK comes back at Tick 501!
>>>>>
>>>>> For now, I have changed the code to
>>>>>
>>>>>     em.schedule(&sendEvent, std::max(nextReady, curTick()));
>>>>>
>>>>> but it would be best if someone could give me some insight into this code.
>>>>>
>>>>> I believe Andreas Hansson wrote this code, so if you see this
>>>>> thread, I would really appreciate your comments.
>>>>>
>>>>> Thanks!
>>>>>
>>>>> Binh
>>>>> --
>>>>>
>>>>> -- IMPORTANT NOTICE: The contents of this email and any attachments
>>>>> are confidential and may also be privileged. If you are not the intended
>>>>> recipient, please notify the sender immediately and do not disclose the
>>>>> contents to any other person, use it for any purpose, or store or copy the
>>>>> information in any medium. Thank you.
>>>>>
>>>>> _______________________________________________
>>>>> gem5-users mailing list
>>>>> [email protected]
>>>>> http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users
