Hi Korey,

Thanks for bringing this up.  It's a confusing issue that we need to
think through and resolve: we should agree on, and document, what the
timing expectations are on the Port interface.

I believe the assumption in the SimpleTimingPort case is that the
device on the other side of the port is managing things according to
its internal clock, so even if you send a packet on a funny tick, the
receiver will deal with it appropriately (or close enough).  The
receiver could do this by accepting the packet on
curTick+1 and queuing it internally until it's ready, or it could
return false on sendTiming() and use the retry mechanism to tell the
port when it's ready.  So resending on curTick+1 is basically "send it
as soon as the receiver will take it".  I'm not saying that's elegant
or the best way to do it though.
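
To illustrate the first option, here's a rough sketch of a receiver
that accepts off-cycle packets and queues them until its own next
clock edge.  All the names here (ClockedReceiver, recvTiming, etc.)
are made up for illustration; this is not the actual Port API:

    #include <cstdint>
    #include <queue>

    typedef uint64_t Tick;
    extern Tick curTick;                  // M5's global current tick

    struct Packet;                        // stand-in for M5's Packet

    class ClockedReceiver
    {
        Tick clock;                       // clock period in ticks
        std::queue<Packet *> inputQueue;  // accepted but unprocessed packets

      public:
        ClockedReceiver(Tick _clock) : clock(_clock) {}

        // Round t up to this object's next clock edge.
        Tick nextEdge(Tick t) const { return ((t + clock - 1) / clock) * clock; }

        // Always accept, even on a "funny" tick, and defer the real
        // work to our own next clock edge.
        bool recvTiming(Packet *pkt)
        {
            inputQueue.push(pkt);
            // schedule(processEvent, nextEdge(curTick));
            return true;                  // never NACK in this variant
        }
    };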

This arguably works OK if you're talking to a bus; even though an idle
bus will forward a packet through on an arbitrary tick, it calculates
the next free time according to its internal clock cycle, and so any
subsequent packet that comes before the bus goes idle again will be
stalled until the next clock edge where the bus is free, and queued
packets are always handled on bus clock edges.  So a lightly used bus
will allow packets through on arbitrary ticks, but never closer
together than the bus bandwidth would imply, and a heavily loaded bus
with queuing will process packets on bus clock edges.
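
The "next free time" calculation is essentially just rounding up to
the bus's own clock; something like this (sketch only, reusing the
Tick typedef from the sketch above, not the actual bus code):

    // Round an arbitrary arrival tick up to the next bus clock edge,
    // so back-to-back packets are never spaced closer together than
    // one bus cycle.
    Tick nextBusEdge(Tick when, Tick busClock)
    {
        return ((when + busClock - 1) / busClock) * busClock;
    }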

The cache model (as you see) doesn't have a notion of a clock; it just
has a latency.  Thus it also doesn't have any internal bandwidth
constraint.  The general expectation is that whatever is feeding the
cache (whether a CPU or a bus) is imposing the bandwidth constraint by
not sending packets in at a higher rate than the cache is supposed to
handle.

I assume you've got a cache hooked up directly to your CPU.  The
solutions that fit the current model are for the CPU itself to do what
I described above: either accept Packet B at 340001 but buffer it
internally and process it on the next CPU clock edge, or have your
recvTiming() method return false for Packet B and then issue a retry
at the next CPU clock edge.  The former seems a little cleaner to me,
but I don't have a strong opinion.
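
The retry variant would look roughly like this (again made-up names,
reusing the declarations from the earlier sketch; sendRetry() stands
in for whatever retry hook the port actually provides):

    class CpuSidePort                   // hypothetical
    {
        Tick clock;                     // CPU clock period in ticks
        bool needRetry;                 // remember that we owe a retry

      public:
        CpuSidePort(Tick _clock) : clock(_clock), needRetry(false) {}

        bool onClockEdge(Tick t) const { return t % clock == 0; }

        // NACK packets that arrive off our clock; the sender holds the
        // packet and resends when we issue the retry on our next edge.
        bool recvTiming(Packet *pkt)
        {
            if (!onClockEdge(curTick)) {
                needRetry = true;
                // schedule an event at the next clock edge here; its
                // handler would call sendRetry() on the peer
                return false;
            }
            handlePacket(pkt);          // normal on-edge processing
            return true;
        }

        void handlePacket(Packet *pkt); // body omitted in this sketch
    };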

I'm also open to thoughts about how to handle this better in
general... I agree that scheduling things on "curTick+1" just seems
wrong; it's confusing and a potential (quite possibly actual) source
of timing-model errors, so if there's a clean and easy way to improve
it without scheduling lots of extra events, we should consider it.
Requiring every SimObject to have a clock and
every port to know the clock of its owner is one possibility; it's
rather invasive, but maybe it's a good idea anyway.  Thoughts from the
peanut gallery?
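
To make that concrete, the invasive version might look something like
this (hypothetical sketch, reusing the Tick typedef from above):

    // Every SimObject carries a clock and can round a tick up to its
    // next edge; every Port can then ask its owner for the next valid
    // tick instead of blindly using curTick + 1.
    class SimObject
    {
        Tick clock;
      public:
        SimObject(Tick _clock) : clock(_clock) {}
        Tick nextCycle(Tick t) const { return ((t + clock - 1) / clock) * clock; }
    };

    class Port
    {
        SimObject *owner;
      public:
        Port(SimObject *_owner) : owner(_owner) {}
        // "As soon as possible, but on a tick the owner will see."
        Tick nextValidTick() const { return owner->nextCycle(curTick + 1); }
    };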

Steve

On Sat, Apr 24, 2010 at 7:02 AM, Korey Sewell <ksew...@umich.edu> wrote:
> Hey all,
> I've recently come across a couple of code segments that worry me a bit.
> They are in different spots in the memory system where "curTick + 1" is used
> to schedule something.
>
> If you grep around, you'll see a # of these in the code:
>
>   src/cpu/inorder/cpu.cc:697: mainEventQueue.schedule(&tickEvent, nextCycle(curTick + 1));
>   src/cpu/inorder/cpu.cc:698: DPRINTF(InOrderCPU, "Scheduled CPU for next tick @ %i.\n", nextCycle(curTick + 1));
>   src/cpu/o3/lsq_unit_impl.hh:757: cpu->schedule(wb, curTick + 1);
>   src/dev/ide_disk.cc:556: schedule(dmaTransferEvent, curTick + 1);
>   src/mem/bridge.cc:286: schedule(sendEvent, std::max(buf->ready, curTick + 1));
>   src/mem/cache/cache_impl.hh:1559: schedule(sendEvent, std::max(nextReady, curTick + 1));
>   src/mem/cache/base.cc:127: schedule(ev, curTick + 1);
>   src/mem/ruby/system/RubyPort.cc:303: schedSendTiming(pkt, curTick + 1); // minimum latency, must be > 0
>   src/mem/ruby/system/RubyPort.cc:310: schedSendTiming(pkt, curTick + 1); // minimum latency, must be > 0
>
> as well as:
>
>   src/mem/tport.cc:155: schedule(sendEvent, time <= curTick ? curTick+1 : time);
>
> As far as I can tell, you end up scheduling things on invalid cycles
> when you use "curTick + 1" instead of "curTick + cycles(1)", where
> you account for what the next valid cycle is in an assumed cycles()
> function.
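>
> By cycles() I mean roughly these helpers on a clocked object (just a
> sketch of what I'm assuming, where 'clock' is the object's period):
>
>     Tick cycles(int n) const     { return n * clock; }
>     Tick nextCycle(Tick t) const { return ((t + clock - 1) / clock) * clock; }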
>
> The last example is the one causing buggy results for me: the
> SimpleTimingPort::sendDeferredPacket() function around line 55. I interpret
> the code snippet below as saying that if you have multiple packets to send
> off the transmit list in one cycle, the subsequent packet gets sent on
> "curTick+1":
>     bool success = sendTiming(dp.pkt);
>
>     if (success) {
>         if (!transmitList.empty() && !sendEvent->scheduled()) {
>             Tick time = transmitList.front().tick;
>             schedule(sendEvent, time <= curTick ? curTick+1 : time);
>         }
>
>         if (transmitList.empty() && drainEvent) {
>             drainEvent->process();
>             drainEvent = NULL;
>         }
>     } ...
>
> The problem here is that, say, you have two packets scheduled to come
> back on tick 340000: Packet A comes back on tick 340000, but Packet B
> gets pushed to tick 340001.  Isn't 340001 an invalid tick for data to
> come back on?  Shouldn't it be on whatever clock the
> bus/port/memory_object is running on?  In my sims, the resulting
> effect is a bug: I want to activate a thread when data comes back
> from the cache, but if data comes back on tick 340001 and the CPU
> never sees tick 340001, the system eventually deadlocks.
>
> I've fixed some things on the CPU side to only schedule on valid CPU
> ticks, but I don't think I should have to do bookkeeping to handle
> data coming back too early (there's more to it than just scheduling
> ticks correctly).
>
> So the "BIG" fix for the SimpleTimingPort is to figure out how to change
> "curTick + 1" into "curTick + portOwnerCycles(1)" or something of that
> nature. The problem is there is no obvious way to write that function yet.
>
> In my example, I really just want to add the BaseCache latency
> (SimpleTimingPort->CachePort), so I need a delay of cycles *
> cache->hitLatency.  However, SimpleTimingPort is just a base class of
> CachePort, so we can't assume that a SimpleTimingPort will always
> have a cache to call into to figure out the right timing.  Also, what
> happens if the SimpleTimingPort is connected to a Bus and not a CPU?
> Are there additional ramifications there?
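>
> One shape that might work is a virtual hook that defaults to raw
> ticks and lets subclasses plug in their owner's timing (a totally
> hypothetical sketch, assuming BaseCache keeps exposing hitLatency):
>
>     struct BaseCache { Tick hitLatency; };  // just the field used here
>
>     class SimpleTimingPort
>     {
>       protected:
>         // Default: no known clock, fall back to raw ticks.
>         virtual Tick portOwnerCycles(int n) const { return n; }
>     };
>
>     class CachePort : public SimpleTimingPort
>     {
>         BaseCache *cache;
>       public:
>         CachePort(BaseCache *_cache) : cache(_cache) {}
>       protected:
>         // Scale by the owning cache's latency instead.
>         virtual Tick portOwnerCycles(int n) const
>         { return n * cache->hitLatency; }
>     };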
>
> I always get a bit confused by the port connection handling, but I
> think there is a real potential problem here.  Maybe there is a
> reason for this "+ 1" code that I'm unaware of; if so, please let me
> know.  If not, and anyone has thoughts/ideas on a fix, please advise!
>
> Thanks.
>
> --
> - Korey