So, this is what is happening: CPU -> TLB translation (miss) -> TableWalker -> system.membus transaction -> L1 dcache hit (dcache mem-side port) -> table walk complete -> TLB translation finished -> CPU read -> L1 dcache miss -> dcache mem-side port -> boom!

I think it's still solved by delaying the translation by one cycle.
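For what it's worth, here is a minimal, self-contained sketch of that one-cycle delay idea, using a toy event queue with hypothetical names (this is not the actual m5 table-walker code): instead of invoking the translation-finish callback directly from the walk-completion handler, the handler schedules it for the next tick so the cache port's send path can finish unwinding first.

// Toy event queue: hypothetical names, not the actual m5 code.
#include <cassert>
#include <cstdint>
#include <functional>
#include <iostream>
#include <map>
#include <utility>

using Tick = uint64_t;

struct EventQueue {
    Tick curTick = 0;
    std::multimap<Tick, std::function<void()>> events;

    void schedule(Tick when, std::function<void()> fn) {
        assert(when >= curTick);
        events.emplace(when, std::move(fn));
    }

    void run() {
        while (!events.empty()) {
            auto it = events.begin();
            curTick = it->first;
            auto fn = std::move(it->second);
            events.erase(it);
            fn();   // a callback may schedule further events
        }
    }
};

int main() {
    EventQueue eq;

    // Stand-in for TimingSimpleCPU::finishTranslation(): the call that ends
    // up re-entering the dcache and the bus.
    auto finishTranslation = [&] {
        std::cout << "tick " << eq.curTick
                  << ": translation finished, CPU sends the read\n";
    };

    // Stand-in for the table walker's completion handler. Calling
    // finishTranslation() directly here would run the CPU/dcache path while
    // the cache's mem-side send path is still on the stack; deferring it by
    // one tick lets that path unwind first.
    eq.schedule(100, [&] {
        std::cout << "tick " << eq.curTick << ": table walk response received\n";
        eq.schedule(eq.curTick + 1, finishTranslation);   // the one-cycle delay
    });

    eq.run();
    return 0;
}

Obviously the real change has to happen inside the walker, but the shape is the same: let the current tick's send path complete, then run the dependent callback on a fresh event.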
Ali

On Mon, 27 Sep 2010 11:49:46 -0700, Steve Reinhardt <[email protected]> wrote:

> OK, so I understand that the DMAPort/iodevice.cc part is a red herring. But why does the
> response come out of the cache's MemSidePort? Is the TLB hooked up on the opposite side of
> the cache from the CPU? I'm not 100% sure that's directly related to the problem, but it
> seems odd. Basically you're leaving the back door of the cache, then going around the side
> yard, coming in the front door, and trying to leave the back door again in the same cycle.
> In contrast, if you were leaving through the front door and someone else tried to come in
> the front door at the same time, either the cache would be smart enough to issue a retry or,
> if not, at least the back door would still be free for use.
>
> Steve
>
> On Mon, Sep 27, 2010 at 6:15 AM, Ali Saidi <[email protected]> wrote:
>> So, the table walker seems to be the problem. Looks like the response from the translation
>> needs to be delayed at least one cycle for this to all work out....
>>
>> Ali
>>
>> #0  SimpleTimingPort::schedSendEvent (this=0x10bf61080, when=507847508000) at tport.hh:111
>> #1  0x0000000100a30b07 in BaseCache::CachePort::requestBus (this=0x10bf61080, cause=BaseCache::Request_MSHR, time=507847508000) at base.hh:136
>> #2  0x0000000100a2e54e in Cache<LRU>::timingAccess (this=0x102257800, pkt=0x102277060) at base.hh:469
>> #3  0x0000000100a2ea93 in Cache<LRU>::CpuSidePort::recvTiming (this=0x10bf60fa0, pkt=<value temporarily unavailable, due to optimizations>) at cache_impl.hh:1532
>> #4  0x00000001007cf4fd in TimingSimpleCPU::handleReadPacket (this=0x10bcefbb0, pkt=0x102277060) at build/ARM_FS/cpu/simple/timing.cc:260
>> #5  0x00000001007cfc0c in TimingSimpleCPU::sendData (this=0x10bcefbb0, req=0x102276850, data=0x10c3c1ad0 "", res=0x0, read=true) at build/ARM_FS/cpu/simple/timing.cc:2
>> #6  0x00000001007cfefe in TimingSimpleCPU::finishTranslation (this=0x10bcefbb0, state=0x10c3c2e20) at build/ARM_FS/cpu/simple/timing.cc:683
>> #7  0x00000001007d1679 in DataTranslation<TimingSimpleCPU>::finish (this=0x10c3c25c0, fau...@0x102277060, req=0xbd0600, tc=<value temporarily unavailable, due to optim
>> #8  0x0000000100070ff3 in RefCountingPtr<FaultBase>::del () at build/ARM_FS/arch/arm/tlb.cc:592
>> #9  ~RefCountingPtr [inlined] () at /Users/alisai01/work/m5/build/ARM_FS/base/refcnt.hh:85
>> #10 0x0000000100070ff3 in ArmISA::TLB::translateTiming (this=<value temporarily unavailable, due to optimizations>, req=0x102276850, tc=0x10bceca00, translation=0x10c3
>> #11 0x000000010008f9ca in ArmISA::TableWalker::doL1DescriptorWrapper (this=0x10bced2a0) at build/ARM_FS/arch/arm/table_walker.cc:601
>> #12 0x00000001008d054a in DmaPort::recvTiming (this=0x10bf61b00, pkt=0x102277000) at build/ARM_FS/dev/io_device.cc:146
>> #13 0x00000001009ae96d in Bus::recvTiming (this=0x10bcad7b0, pkt=0x102277000) at build/ARM_FS/mem/bus.cc:262
>> #14 0x00000001009b462b in Bus::BusPort::recvTiming (this=<value temporarily unavailable, due to optimizations>, pkt=<value temporarily unavailable, due to optimization
>> #15 0x0000000100a042ae in Cache<LRU>::MemSidePort::sendPacket (this=0x10bf61080) at port.hh:186
>> #16 0x0000000100a92090 in Flags<short>::isSet () at build/ARM_FS/sim/eventq.cc:203
>> #17 0x0000000100a92090 in Event::isExitEvent () at /Users/alisai01/work/m5/build/ARM_FS/base/flags.hh:296
>> #18 0x0000000100a92090 in EventQueue::serviceOne (this=<value temporarily unavailable, due to optimizations>) at build/ARM_FS/sim/eventq.cc:204
>>
>> On Sep 27, 2010, at 1:10 AM, Steve Reinhardt wrote:
>>
>>> That does seem odd... like the dcache is sending a packet, and in the middle the bus
>>> decides to retry the cache to get it to send a second packet before it completes sending
>>> the first. Can you send a stack backtrace from somewhere in the middle (where the bus has
>>> called back to the cache)?
>>>
>>> Steve
>>>
>>> On Sun, Sep 26, 2010 at 5:04 PM, Ali Saidi <[email protected]> wrote:
>>>> This is what happens (I've added a lot of dprintfs)...
>>>>
>>>> The cache port sends a packet:
>>>> 507847507000: system.cpu.dcache-mem_side_port: sendPacket() called
>>>> 507847507000: system.cpu.dcache-mem_side_port: -- defered Packet Ready:1
>>>> 507847507000: system.cpu.dcache-mem_side_port: --sending packet
>>>>
>>>> This part I don't get. The bus receives the packet and probes the cache that made the
>>>> request? Either way, a sendEvent gets scheduled here for the future.
>>>> 507847507000: system.membus: recvTiming: src 6 dst 8 ReadResp 0x73631e4
>>>> 507847507000: system.cpu.dcache: ReadReq 79d2000 miss
>>>> 507847507000: system.cpu.dcache-mem_side_port: Asserting bus request for cause 0
>>>> 507847507000: system.cpu.dcache-mem_side_port: sendEvent being scheduled @ 507847508000
>>>> 507847507000: system.membus: The bus is now occupied from tick 507847507000 to 507847509000
>>>>
>>>> Now sendTiming returns success, so the cache port says that was done, and waitingOnRetry
>>>> is false (although sendTiming scheduled an event).
>>>> 507847507000: system.cpu.dcache-mem_side_port: --sendTiming returned 1
>>>> 507847507000: system.cpu.dcache-mem_side_port: sendPacket(): WaitingOnRetry:0
>>>>
>>>> Now the cache port attempts to schedule the next packet, since there isn't a retry pending:
>>>> 507847507000: system.cpu.dcache-mem_side_port: --sendEvent.scheduled=1 sendEvent.when=507847508000
>>>> 507847507000: system.cpu.dcache-mem_side_port: --more packets to send @ 507847508000
>>>>
>>>> At this point the sendEvent gets scheduled again, but it has already been scheduled above,
>>>> so we assert.
>>>>
>>>> Any ideas?
>>>>
>>>> Ali
>>>>
>>>> On Sep 24, 2010, at 4:34 PM, Steve Reinhardt wrote:
>>>>
>>>>> Can you provide some more information on where you're seeing this? One simple case where
>>>>> it would happen is if there are actually multiple instances of the cache that are doing
>>>>> this on the same tick...
>>>>>
>>>>> Steve
>>>>>
>>>>> On Wed, Sep 22, 2010 at 4:10 PM, Ali Saidi <[email protected]> wrote:
>>>>>> Anyone ever seen a case where
>>>>>>     schedule(sendEvent, std::max(nextReady, curTick + 1));
>>>>>> is called multiple times in cache_impl.hh?
>>>>>>
>>>>>> I haven't quite been able to track down what is going on. Any ideas how this could
>>>>>> happen, Steve?
>>>>>>
>>>>>> Thanks,
>>>>>> Ali
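To make the failure mode in the dprintf trace concrete, here is a tiny standalone sketch (toy code with hypothetical names, not the actual SimpleTimingPort/cache code) of the same event being scheduled twice in one tick, plus the kind of scheduled() guard that would avoid tripping the assert:

// Toy illustration of the double-schedule from the trace above.
// Hypothetical names; not the real SimpleTimingPort/cache code.
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <iostream>

using Tick = uint64_t;

struct SendEvent {
    bool onQueue = false;
    Tick when = 0;

    bool scheduled() const { return onQueue; }

    void schedule(Tick t) {
        // The assert from the original question: scheduling an event that is
        // already on the queue.
        assert(!onQueue);
        onQueue = true;
        when = t;
    }
};

int main() {
    const Tick curTick = 507847507000;    // ticks taken from the dprintf trace
    const Tick nextReady = 507847508000;
    SendEvent sendEvent;

    // 1) The bus-request path ("Asserting bus request for cause 0") schedules
    //    the send event for the next tick.
    sendEvent.schedule(std::max(nextReady, curTick + 1));

    // 2) Back in sendPacket(), waitingOnRetry is false and there are more
    //    packets to send, so the same event would be scheduled again in the
    //    same tick. Without a guard, that is exactly the assert.
    const Tick want = std::max(nextReady, curTick + 1);
    if (!sendEvent.scheduled()) {
        sendEvent.schedule(want);
    } else {
        std::cout << "sendEvent already scheduled for tick " << sendEvent.when
                  << "; not scheduling it again\n";
    }
    return 0;
}

A guard like this only masks the symptom, though; the resolution discussed at the top of the thread is to delay the translation response by a cycle so the cache port is never re-entered while its send is still in flight.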
_______________________________________________
m5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/m5-dev
