I think I can delay the response one cycle in the table walker, but I
suppose delaying in the CPU is another option. The TimingSimpleCPU has
gotten horribly complex. Do you think that making it event driven would
slow it down that much?
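
For concreteness, here's roughly what I mean by delaying in the walker:
instead of calling doL1DescriptorWrapper() directly out of the port's
completion path, wrap it in an event and schedule that event one cycle
out, so the response unwinds the call stack before the CPU re-enters
the cache. This is a sketch only -- the doL1Event member is
hypothetical, and however we actually get at one cycle may differ:

   // Hypothetical member in TableWalker:
   //   EventWrapper<TableWalker, &TableWalker::doL1DescriptorWrapper>
   //       doL1Event;

   // Where we currently call doL1DescriptorWrapper() from the port's
   // completion path, schedule it for the next cycle instead:
   if (!doL1Event.scheduled())
       schedule(doL1Event, curTick + ticks(1));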

Ali


On Mon, 27 Sep 2010 12:13:18 -0700, Steve Reinhardt <[email protected]>
wrote:
> I see... so the response comes out the mem side because the TLB walker
> bypasses the dcache, but the dcache is responding since it's got an
> owned copy in its cache.
> 
> I agree that delaying the request one cycle after the translation
> completes should fix it, I'm just trying to make sure I really
> understand what's causing the problem.
> 
> We should also make sure we add this cycle in a way that doesn't
> require us to schedule an extra event in the case that we have a TLB
> hit, if that's at all possible.  Or perhaps we should give up on
> trying to make TimingSimpleCPU efficient and make every step event
> driven; that would slow things down but clean up the code immensely by
> only having a single execution path.
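>
> To make that concrete, here's a sketch of what finishTranslation could
> do; the wasDelayed flag and sendDataEvent member are hypothetical,
> standing in for whatever lets us tell a synchronous TLB hit apart from
> a completed table walk:
>
>    if (!wasDelayed) {
>        // TLB hit: the translation finished synchronously, so we can
>        // send right away without paying for an extra event.
>        sendData(req, data, res, read);
>    } else {
>        // Table walk: we're still inside the walker's response path,
>        // so unwind the stack first and send one cycle later.
>        schedule(sendDataEvent, curTick + ticks(1));
>    }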
> 
> Steve
> 
> On Mon, Sep 27, 2010 at 12:00 PM, Ali Saidi <[email protected]> wrote:
>>
>> So, this is what is happening:
>>
>> CPU -> TLB translation (miss) -> TableWalker -> system.membus
>> transaction -> L1 dcache hit (dcache mem-side port) -> table walk
>> complete -> TLB translation finished -> CPU -> read -> L1 dcache miss ->
>> dcache mem-side port -> boom!
>>
>> I think it's still solved by delaying the translation by one cycle.
>>
>> Ali
>>
>>
>> On Mon, 27 Sep 2010 11:49:46 -0700, Steve Reinhardt <[email protected]>
>> wrote:
>>> OK, so I understand that the DmaPort/io_device.cc part is a red
>>> herring.  But why does the response come out of the cache's
>>> MemSidePort?  Is the TLB hooked up on the opposite side of the cache
>>> from the CPU?  I'm not 100% sure that's directly related to the
>>> problem, but it seems odd.  Basically you're leaving through the back
>>> door of the cache, then going around the side yard, coming in the
>>> front door, and trying to leave through the back door again in the
>>> same cycle.  In contrast, if you were leaving through the front door
>>> and someone else tried to come in the front door at the same time,
>>> either the cache would be smart enough to issue a retry, or, if not,
>>> at least the back door would still be free for use.
>>>
>>> Steve
>>>
>>> On Mon, Sep 27, 2010 at 6:15 AM, Ali Saidi <[email protected]> wrote:
>>>> So, the table walker seems to be the problem. Looks like the response from
>>>> the translation needs to be delayed at least one cycle for this to all work
>>>> out.
>>>>
>>>> Ali
>>>>
>>>>
>>>> #0  SimpleTimingPort::schedSendEvent (this=0x10bf61080, when=507847508000) at tport.hh:111
>>>> #1  0x0000000100a30b07 in BaseCache::CachePort::requestBus (this=0x10bf61080, cause=BaseCache::Request_MSHR, time=507847508000) at base.hh:136
>>>> #2  0x0000000100a2e54e in Cache<LRU>::timingAccess (this=0x102257800, pkt=0x102277060) at base.hh:469
>>>> #3  0x0000000100a2ea93 in Cache<LRU>::CpuSidePort::recvTiming (this=0x10bf60fa0, pkt=<value temporarily unavailable, due to optimizations>) at cache_impl.hh:1532
>>>> #4  0x00000001007cf4fd in TimingSimpleCPU::handleReadPacket (this=0x10bcefbb0, pkt=0x102277060) at build/ARM_FS/cpu/simple/timing.cc:260
>>>> #5  0x00000001007cfc0c in TimingSimpleCPU::sendData (this=0x10bcefbb0, req=0x102276850, data=0x10c3c1ad0 "", res=0x0, read=true) at build/ARM_FS/cpu/simple/timing.cc:2
>>>> #6  0x00000001007cfefe in TimingSimpleCPU::finishTranslation (this=0x10bcefbb0, state=0x10c3c2e20) at build/ARM_FS/cpu/simple/timing.cc:683
>>>> #7  0x00000001007d1679 in DataTranslation<TimingSimpleCPU>::finish (this=0x10c3c25c0, fau...@0x102277060, req=0xbd0600, tc=<value temporarily unavailable, due to optim
>>>> #8  0x0000000100070ff3 in RefCountingPtr<FaultBase>::del () at build/ARM_FS/arch/arm/tlb.cc:592
>>>> #9  ~RefCountingPtr [inlined] () at /Users/alisai01/work/m5/build/ARM_FS/base/refcnt.hh:85
>>>> #10 0x0000000100070ff3 in ArmISA::TLB::translateTiming (this=<value temporarily unavailable, due to optimizations>, req=0x102276850, tc=0x10bceca00, translation=0x10c3
>>>> #11 0x000000010008f9ca in ArmISA::TableWalker::doL1DescriptorWrapper (this=0x10bced2a0) at build/ARM_FS/arch/arm/table_walker.cc:601
>>>> #12 0x00000001008d054a in DmaPort::recvTiming (this=0x10bf61b00, pkt=0x102277000) at build/ARM_FS/dev/io_device.cc:146
>>>> #13 0x00000001009ae96d in Bus::recvTiming (this=0x10bcad7b0, pkt=0x102277000) at build/ARM_FS/mem/bus.cc:262
>>>> #14 0x00000001009b462b in Bus::BusPort::recvTiming (this=<value temporarily unavailable, due to optimizations>, pkt=<value temporarily unavailable, due to optimization
>>>> #15 0x0000000100a042ae in Cache<LRU>::MemSidePort::sendPacket (this=0x10bf61080) at port.hh:186
>>>> #16 0x0000000100a92090 in Flags<short>::isSet () at build/ARM_FS/sim/eventq.cc:203
>>>> #17 0x0000000100a92090 in Event::isExitEvent () at /Users/alisai01/work/m5/build/ARM_FS/base/flags.hh:296
>>>> #18 0x0000000100a92090 in EventQueue::serviceOne (this=<value temporarily unavailable, due to optimizations>) at build/ARM_FS/sim/eventq.cc:204
>>>>
>>>>
>>>>
>>>>
>>>> On Sep 27, 2010, at 1:10 AM, Steve Reinhardt wrote:
>>>>
>>>>> That does seem odd... like the dcache is sending a packet, and in the
>>>>> middle the bus decides to retry the cache to get it to send a second
>>>>> packet before it completes sending the first.  Can you send a stack
>>>>> backtrace from somewhere in the middle (where the bus has called back
>>>>> to the cache)?
>>>>>
>>>>> Steve
>>>>>
>>>>> On Sun, Sep 26, 2010 at 5:04 PM, Ali Saidi <[email protected]> wrote:
>>>>>> This is what happens (I've added a lot of dprintfs)...
>>>>>>
>>>>>> The cache port sends a packet:
>>>>>> 507847507000: system.cpu.dcache-mem_side_port: sendPacket() called
>>>>>> 507847507000: system.cpu.dcache-mem_side_port: -- defered Packet Ready:1
>>>>>> 507847507000: system.cpu.dcache-mem_side_port: --sending packet
>>>>>>
>>>>>> This part I don't get: the bus receives the packet and probes the cache
>>>>>> that made the request? Either way, a sendEvent gets scheduled here for
>>>>>> the future.
>>>>>> 507847507000: system.membus: recvTiming: src 6 dst 8 ReadResp 0x73631e4
>>>>>> 507847507000: system.cpu.dcache: ReadReq 79d2000 miss
>>>>>> 507847507000: system.cpu.dcache-mem_side_port: Asserting bus request for 
>>>>>> cause 0
>>>>>> 507847507000: system.cpu.dcache-mem_side_port: sendEvent being scheduled 
>>>>>> @ 507847508000
>>>>>> 507847507000: system.membus: The bus is now occupied from tick 
>>>>>> 507847507000 to 507847509000
>>>>>>
>>>>>> Now sendTiming returns success, so the cache port considers that packet
>>>>>> done, and waitingOnRetry is false (even though the sendTiming call
>>>>>> scheduled an event).
>>>>>> 507847507000: system.cpu.dcache-mem_side_port: --sendTiming returned 1
>>>>>> 507847507000: system.cpu.dcache-mem_side_port: sendPacket(): 
>>>>>> WaitingOnRetry:0
>>>>>>
>>>>>> Now the cache port attempts to schedule the next packet, since there
>>>>>> isn't a retry pending:
>>>>>> 507847507000: system.cpu.dcache-mem_side_port: --sendEvent.scheduled=1 
>>>>>> sendEvent.when=507847508000
>>>>>> 507847507000: system.cpu.dcache-mem_side_port: --more packets to send @ 
>>>>>> 507847508000
>>>>>>
>>>>>> At this point the cache port tries to schedule the sendEvent again, but
>>>>>> it was already scheduled above, so we hit the assert.
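>>>>>>
>>>>>> The obvious band-aid would be to guard the second schedule in the cache
>>>>>> port, something like this (a sketch only -- it may just paper over the
>>>>>> real problem, which is that waitingOnRetry doesn't account for the
>>>>>> already-scheduled event):
>>>>>>
>>>>>>    Tick when = std::max(nextReady, curTick + 1);
>>>>>>    if (!sendEvent.scheduled())
>>>>>>        schedule(sendEvent, when);
>>>>>>    else if (when < sendEvent.when())
>>>>>>        reschedule(sendEvent, when);  // pull the send in if earlier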
>>>>>>
>>>>>> Any ideas?
>>>>>>
>>>>>> Ali
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Sep 24, 2010, at 4:34 PM, Steve Reinhardt wrote:
>>>>>>
>>>>>>> Can you provide some more information on where you're seeing this?
>>>>>>> One simple case where it would happen is if there are actually
>>>>>>> multiple instances of the cache that are doing this on the same
>>>>>>> tick...
>>>>>>>
>>>>>>> Steve
>>>>>>>
>>>>>>> On Wed, Sep 22, 2010 at 4:10 PM, Ali Saidi <[email protected]> wrote:
>>>>>>>> Anyone ever seen a case where
>>>>>>>>            schedule(sendEvent, std::max(nextReady, curTick + 1));
>>>>>>>> is called multiple times in cache_impl.hh?
>>>>>>>>
>>>>>>>> I haven't quite been able to track down what is going on. Any ideas how
>>>>>>>> this could happen, Steve?
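>>>>>>>>
>>>>>>>> (For reference, what actually blows up: the event queue asserts that
>>>>>>>> an event isn't already on the queue, i.e. schedule() does roughly the
>>>>>>>> following -- paraphrasing from memory, not quoting sim/eventq.hh
>>>>>>>> exactly:
>>>>>>>>
>>>>>>>>    void
>>>>>>>>    EventQueue::schedule(Event *event, Tick when)
>>>>>>>>    {
>>>>>>>>        assert(!event->scheduled());  // second schedule() fires this
>>>>>>>>        // ... set when, insert into the queue, mark Scheduled ...
>>>>>>>>    }
>>>>>>>>
>>>>>>>> so calling it twice on the same sendEvent trips the assert.)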
>>>>>>>> Thanks,
>>>>>>>> Ali