Hi Tim,
  I'm having a bit of trouble following, so let me try to restate your
progress:
   1) After a checkpoint restore, your simulated benchmark crashes with a
memory bug
   2) You've tracked the problem down to a cache line that may not have been
properly saved into the cache trace
      - During checkpointing, the line had read-only permissions but was
dirty (i.e., it probably differed from the version in memory)
   3) On restore, something is happening with this cache line that is
causing a memory bug. Can you elaborate on this?

  It sounds like the restored cache line may contain dirty data that needs
to be written back to memory, but because the dirty status is not saved
when the cache contents are recorded at checkpoint time, the line gets
evicted without a writeback and the dirty data is lost? If so, this is
surprising, because the RubySystem also flushes cache data back to memory
during serialize(). That flush should ensure that memory holds the correct
version of the data, whether or not the line was dirty.
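  To make that hypothesis concrete, here is a toy C++ model. It is not gem5
code: Perm, CacheLine, TraceEntry, recordContents() and restoreLine() are
hypothetical names. It assumes the trace stores only an address and an
access permission per line (as Tim describes for recordCacheContents()),
and that on restore the line's data comes from memory. Under those
assumptions, a dirty read-only line comes back clean, so its data is only
correct if memory already held the dirty version.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model, not gem5's real classes.
enum class Perm { ReadOnly, ReadWrite };

struct CacheLine {
    uint64_t addr;
    Perm perm;
    bool dirty;     // true if data differs from memory
    uint64_t data;
};

// What gets saved per line (assumption: no dirty bit).
struct TraceEntry {
    uint64_t addr;
    Perm perm;
};

// Record only address and permission; the dirty bit is dropped.
std::vector<TraceEntry> recordContents(const std::vector<CacheLine>& lines) {
    std::vector<TraceEntry> trace;
    for (const CacheLine& l : lines)
        trace.push_back({l.addr, l.perm});
    return trace;
}

// Restore a line from the trace: data is taken from memory and the line
// is marked clean, because the trace carries no dirty status.
CacheLine restoreLine(const TraceEntry& e, uint64_t memoryValue) {
    return CacheLine{e.addr, e.perm, false, memoryValue};
}
```

  For example, a dirty line holding 7 whose memory copy is a stale 3 is
restored as a clean line holding 3; it holds 7 only if the flush updated
memory before the memory checkpoint was written.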

  I can imagine a few possible problems here:
   A) MOESI_hammer has a bug that allows a dirty line to be shared read-only
   B) MOESI_hammer doesn't properly implement the flush to push the dirty
data back to memory
   C) The memory checkpoint is being taken before the RubySystem cache
flush, which would mean that the checkpointed memory contents do not include
the dirty data

  (C) seems most probable, given that changeset 10524 moved the memories
out of Ruby (note in that changeset that memory checkpointing used to occur
in RubySystem::serialize() AFTER the cache flush operation). Can you check
whether the RubySystem or the memories execute serialize() first?
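  A toy model of why the order matters for (C). Again, this is not gem5
code: Memory, Line, flushCaches() and takeCheckpoint() are hypothetical
stand-ins for the memory's serialize() and the RubySystem flush. The only
point is that a memory snapshot taken before the flush misses the dirty
data.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical model of the two serialize() steps; names are illustrative.
using Memory = std::map<uint64_t, uint64_t>;

struct Line {
    uint64_t addr;
    bool dirty;
    uint64_t data;
};

// Stand-in for the RubySystem cache flush: write dirty lines back to memory.
void flushCaches(std::vector<Line>& lines, Memory& mem) {
    for (Line& l : lines) {
        if (l.dirty) {
            mem[l.addr] = l.data;
            l.dirty = false;
        }
    }
}

// Snapshot memory either after the flush (old ordering) or before it
// (possibility C). Arguments are copied so each call starts from the same state.
Memory takeCheckpoint(std::vector<Line> lines, Memory mem, bool flushFirst) {
    Memory snapshot;
    if (flushFirst) {
        flushCaches(lines, mem);
        snapshot = mem;           // checkpoint includes the written-back data
    } else {
        snapshot = mem;           // memory serialized first: stale contents
        flushCaches(lines, mem);  // flush comes too late to reach the checkpoint
    }
    return snapshot;
}
```

  With one dirty line holding 7 over a stale memory value of 3, the
flush-first snapshot holds 7 and the memory-first snapshot holds 3.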

  Joel



On Mon, Jun 22, 2015 at 3:08 PM, Timothy M Jones <[email protected]> wrote:

> Hi Joel,
>
> On 22/06/2015 20:35, Joel Hestness wrote:
>
>>    I'm not sure whether this is really a bug.
>>
>
> No, I'm sure that this isn't the bug.  The problem is that when the line
> is restored, it isn't restored to the same state (my guess so far is that
> it isn't consistent with memory), and because the dirty bit isn't preserved
> in the checkpoint / Ruby trace, the wrong data gets used somewhere later
> down the line.
>
>> Dirty cache lines could be shared in a read-only state among caches.
>> Whether to allow data in caches to differ from memory (i.e., dirty bit
>> set) is a choice made by a protocol's directory. It could write a line
>> back to memory to clean it, or allow shared (read-only) copies that are
>> identical to each other but dirty with respect to memory. I'm not sure
>> whether the MOESI_hammer directory allows this, but it is a possibility.
>>
> OK, thanks for the explanation.
>
>> Are you still testing this with MOESI_hammer?
>>
>
> Yes, I am.
>
>> Also, I'm not sure what code you're referring to that checks the access
>> permissions and dirty bits (during cache warm-up?). Can you point us to
>> that code?
>>
>
> Yeah, the problem is that it checks only the access permissions, not the
> dirty bits.  It's in src/mem/ruby/structures/CacheMemory.cc:326, function
> recordCacheContents().
>
>
> Cheers
> Tim
>
> --
> Timothy M. Jones
> http://www.cl.cam.ac.uk/~tmj32/
> _______________________________________________
> gem5-dev mailing list
> [email protected]
> http://m5sim.org/mailman/listinfo/gem5-dev
>



-- 
  Joel Hestness
  PhD Candidate, Computer Architecture
  Dept. of Computer Science, University of Wisconsin - Madison
  http://pages.cs.wisc.edu/~hestness/