The fetch stage seems to have done that forever. I think the fact that we're using RAM as a filesystem for ARM at the moment is one reason it's more prone to pop up in this case, but it seems like it's been an issue for a long time.
With my first suggestion the L2 could still send the data up to the L1s without issue; it's just that the L1s wouldn't assume a cache-line-sized read comes from another cache above them:

diff -r 5e58eaf00b58 src/mem/cache/cache_impl.hh
--- a/src/mem/cache/cache_impl.hh       Sat Feb 19 17:32:43 2011 -0600
+++ b/src/mem/cache/cache_impl.hh       Sun Feb 20 15:39:45 2011 -0600
@@ -193,7 +193,7 @@
             blk->trackLoadLocked(pkt);
         }
         pkt->setDataFromBlock(blk->data, blkSize);
-        if (pkt->getSize() == blkSize) {
+        if (pkt->getSize() == blkSize && !isTopLevel) {
             // special handling for coherent block requests from
             // upper-level caches
             if (pkt->needsExclusive()) {

I suppose I could change !isTopLevel to !pkt->req->isInstFetch() and that would implement your solution below, correct? The more I think about it, though, the more I think we really need to use isTopLevel. The problem doesn't end with instruction fetch; that is just a special case. I/O devices do full-block reads and writes. If, for whatever reason, an I/O device did a write of a block and then read it back while it lived in the I/O cache, the data could be lost there too.

I'm going to add an assert to the fetch stage and to DmaDevice::dmaAction() like:

    assert(pkt->sharedAsserted() || !pkt->memInhibitAsserted());

to catch these situations (that will do it, correct? It seems like memInhibitAsserted() is being overloaded to mean you have the block in the owned state if it's not shared). A standalone toy model of the failure and the check is sketched after the quoted thread below.

Ali

On Feb 20, 2011, at 7:00 PM, Steve Reinhardt wrote:

> Yea, the protocol does assume that a full cache-block request is from another
> coherent entity. Has O3 always fetched full cache blocks at a time? If so,
> then I'm also surprised we haven't seen it (or maybe we have seen it, but not
> recognized it).
>
> Getting rid of this code would solve the problem, but as you say, would
> degrade performance in the case where an L2 could hand ownership to an L1
> dcache and avoid a later upgrade transaction, particularly since (IIRC) an L1
> dcache miss that also misses in the L2 is handled as a pair of mostly
> independent misses; getting rid of this optimization would thus mean that a
> cold read miss in a multilevel cache could never take full advantage of the
> E state. (Which was why I added it to begin with.)
>
> Another question is whether it ever makes sense for an icache to be the owner
> of a dirty block... I'd think not. So a third possible solution would be to
> add a flag (or a distinct request type) to distinguish the situation where a
> read is OK with getting an exclusive/owned copy from one where it isn't, and
> factor that into the condition on this code. Then you could flag icaches to
> only issue the latter type of read, and the icache would never get a dirty
> copy. Of course, you'd want to have the O3 fetch stage use this type of
> request too (for completeness), and you'd still need a parameter indicating
> that this is an icache and should use this different request type, so it's
> just as complicated as your option 1 (maybe a little more so) and may not
> have a really significant impact; but it would be more realistic in that the
> same L2 could perform this optimization for L1 dcaches but not L1 icaches,
> and the icache would never have a dirty block.
>
> Steve
>
> On Sun, Feb 20, 2011 at 9:05 AM, Ali Saidi <sa...@umich.edu> wrote:
> If you look at the attached annotated trace you can see that a cache block
> was written to, was dirty, and then the dirty flag goes away at some point.
> I traced it down to this code in the cache:
>
>     // special considerations if we're owner:
>     if (!deferred_response) {
>         // if we are responding immediately and can
>         // signal that we're transferring ownership
>         // along with exclusivity, do so
>         pkt->assertMemInhibit();
>         blk->status &= ~BlkDirty;
>
> What seems to be happening is that the O3 CPU's fetch stage is grabbing an
> entire block at a time, which makes the L1I cache believe it can provide
> ownership to the cache above it (there isn't one) during the fetch. The
> fetch stage doesn't do anything special when it's provided ownership of the
> block, nor should it ever be provided ownership, so the information gets
> lost. I'm actually very surprised we haven't hit this before. Anyway, the
> question is how to fix it. I can think of two solutions, and I'm going to
> implement (1) if no one has a better suggestion.
>
> 1) Add a parameter indicating that the cache is at the top level and
>    disable this stuff when the parameter is set
> 2) Don't do anything special (remove the optimization above, which makes
>    the protocol less performant)
> 3) ????
>
> Ali
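A minimal standalone sketch of the failure the assert is meant to catch. ToyPacket and ToyBlk are made-up stand-ins for m5's Packet and cache block state, not the real classes, so treat this as an illustration of the invariant rather than patch-ready code:

// toy_ownership_loss.cc -- toy model only; ToyPacket/ToyBlk are
// hypothetical stand-ins for m5's Packet and CacheBlk.
#include <cassert>
#include <cstdio>

struct ToyPacket {
    bool memInhibit = false;   // responder inhibits memory and supplies the data
    bool shared = false;       // responder keeps a shared copy

    void assertMemInhibit() { memInhibit = true; }
    bool memInhibitAsserted() const { return memInhibit; }
    bool sharedAsserted() const { return shared; }
};

struct ToyBlk {
    bool dirty = true;         // written earlier, never written back
};

int main()
{
    ToyBlk blk;
    ToyPacket pkt;

    // The L1I satisfies a block-sized fetch and, assuming the requester
    // is a coherent cache, transfers ownership and clears its dirty bit:
    pkt.assertMemInhibit();
    blk.dirty = false;

    // The fetch stage (or a DMA device) ignores coherence state on its
    // responses, so the dirty data now has no owner. The proposed assert
    // fires right here, at the consumer of the response:
    assert(pkt.sharedAsserted() || !pkt.memInhibitAsserted());

    // Only reached with -DNDEBUG, i.e. when the loss goes unnoticed.
    std::printf("response consumed; ownership silently dropped\n");
    return 0;
}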
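And a sketch of option (1) itself, again as a toy rather than the real cache_impl.hh access path. isTopLevel stands for the proposed cache parameter; everything else (needsExclusive() handling, deferred responses, MSHRs) is elided:

// toy_istoplevel.cc -- toy model of option (1); ToyPacket/ToyCache are
// hypothetical, not m5's classes.
#include <cstdio>

struct ToyPacket {
    unsigned size = 0;
    bool memInhibit = false;        // set when ownership is transferred
    void assertMemInhibit() { memInhibit = true; }
};

struct ToyCache {
    unsigned blkSize = 64;
    bool isTopLevel = false;        // proposed parameter: no coherent cache above us
    bool blkDirty = true;           // the one block we model, written earlier

    void satisfyRead(ToyPacket &pkt)
    {
        // Only treat a block-sized read as a coherent cache-to-cache
        // transfer when a cache can actually sit above this one:
        if (pkt.size == blkSize && !isTopLevel) {
            pkt.assertMemInhibit(); // pass ownership upward
            blkDirty = false;       // the new owner is responsible for the data
        }
        // otherwise: plain read; the dirty bit stays put
    }
};

int main()
{
    ToyCache l1i; l1i.isTopLevel = true;  // fetch sits above the icache, not a cache
    ToyCache l2;                          // real L1s above the L2: keep the optimization

    ToyPacket fetch;  fetch.size = 64;    // block-sized ifetch
    ToyPacket l1fill; l1fill.size = 64;   // block-sized fill for an L1 dcache

    l1i.satisfyRead(fetch);
    l2.satisfyRead(l1fill);

    std::printf("L1I: dirty=%d inhibit=%d (dirty data kept)\n",
                (int)l1i.blkDirty, (int)fetch.memInhibit);
    std::printf("L2 : dirty=%d inhibit=%d (ownership handed to the L1D)\n",
                (int)l2.blkDirty, (int)l1fill.memInhibit);
    return 0;
}

Built with any recent g++ or clang++, the first line of output shows the top-level L1I keeping its dirty bit while the L2 still hands ownership up to an L1 dcache, which is exactly the asymmetry the parameter is meant to provide.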