I agree with Joel that this has been an interesting discussion. While there are questions about how these situations can occur and what the best way to fix them is, no one seems to object to checking in this patch, correct? I think it is very important that we not require protocols to always provide completely functional memory.
Brad

-----Original Message-----
From: gem5-dev [mailto:[email protected]] On Behalf Of Steve Reinhardt via gem5-dev
Sent: Monday, October 27, 2014 10:02 AM
To: Nilay Vaish
Cc: gem5 Developer List
Subject: Re: [gem5-dev] Review Request 2466: ruby: provide a second copy of the memory

On Mon, Oct 27, 2014 at 6:58 AM, Nilay Vaish <[email protected]> wrote:

> On Sun, 26 Oct 2014, Steve Reinhardt wrote:
>
>> On Sun, Oct 26, 2014 at 2:23 PM, Nilay Vaish <[email protected]> wrote:
>>
>>> Marc Orr asked me the same question last year. I am pasting the
>>> examples I gave him:
>>>
>>> a. the data in the message is stale, but the sender does not know
>>> about it. Take a look at the MESI CMP directory protocol. In the
>>> case when an L1 controller (A) sends a PUTX to the L2 controller,
>>> it is possible that the L2 controller has already transferred the
>>> ownership to some L1 controller (B). In this case, it is possible
>>> that there are two message buffers that contain messages from A
>>> and B to the L2 controller, but it is the message from B which has
>>> the 'right' data.
>>
>> Interesting. I can see how this technically could be a problem, but
>> it seems like a pretty unlikely corner case. Have you seen it happen
>> in practice, and if so, what was the functional read for? I suppose
>> I just have a hard time imagining an actual program that has a lot of
>> contention on a block that ends up being used as a parameter to a
>> system call. I guess it could happen with a syscall that's
>> specifically for synchronization, like futex.
>
> About a year ago, I had actually committed a patch that returned the
> first data value it found (after making sure that no controller had
> the block in a stable state). I ran into the case illustrated above
> and I had to roll back the patch.

Do you recall any details of how you ran into this? Was this in the process of executing an emulated syscall? How did you detect the problem?

>>> b. no data is present in the message and the receiver will infer
>>> that the data it has is correct since the message did not have any
>>> data.
>>
>> This seems like it should be pretty easy to fix... if you're querying
>> the message to see if it has relevant data, then if the address
>> matches but there is no data, you should just return false. I'd
>> think there'd be a protocol-independent way to determine that a
>> message has no data. It's similar to the idea that you have to check
>> the valid bit in the cache, you can't just look for a tag match.
>
> I doubt we can do this in a protocol-independent way. I think the
> presence / absence of data is decided by the type of the message, which
> is a protocol-specific enum.

I don't see how this is any harder than having the cache controller know whether a block is valid or writable in a protocol-independent fashion. Unless the entire message format is completely protocol defined, I'd think it would be even easier, since you could directly check whether there's a data or payload section in the message.

Steve
_______________________________________________
gem5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/gem5-dev
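
For illustration only, here is a minimal sketch of the check suggested for case (b): when scanning in-flight messages for a functional read, an address match counts only if the message actually carries a data payload, much like requiring the valid bit in addition to a tag match. The `Message` class, its `data` field, and `functional_read` are hypothetical names invented for this sketch and are not gem5 or Ruby APIs. The sketch does not address case (a), where two buffered messages for the same block carry different data.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Message:
    """Hypothetical in-flight coherence message (not a gem5 class)."""
    addr: int
    data: Optional[bytes] = None  # None models a control-only message


def functional_read(messages: Iterable[Message], addr: int) -> Optional[bytes]:
    """Return the payload of the first message that matches addr AND has data.

    An address match alone is not treated as a hit, in the same way that a
    tag match without the valid bit is not a cache hit.
    """
    for msg in messages:
        if msg.addr != addr:
            continue
        if msg.data is None:
            # Address matches but there is no payload: skip this message.
            continue
        return msg.data
    return None


# Assumed example contents of a message buffer.
buffer = [
    Message(addr=0x40),                    # control-only, same block
    Message(addr=0x80, data=b"\x00\x00"),  # data for a different block
    Message(addr=0x40, data=b"\xAA\xBB"),  # data for the requested block
]
result = functional_read(buffer, 0x40)
```

Here the control-only message for block 0x40 is skipped and the later data-carrying message is used. Whether a real protocol can expose "has a payload" without consulting its protocol-specific message type is exactly the open question in the thread.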
