> On Aug. 5, 2016, 7:15 a.m., Andreas Hansson wrote:
> > I see how this works as a stop gap, but ultimately I would like to push for 
> > the removal of the shadow memory as the first option. Is it really that 
> > much effort?
> 
> David Hashe wrote:
>     I'm not personally familiar enough with why the shadow memory is needed 
> to be able to say how much effort it would take to remove, but I believe so.

I'm providing some background, since not everyone may be familiar with the problem.

__The following links are relevant:__
http://reviews.gem5.org/r/2466 (Joel Hestness' response to Andreas Hansson)
http://reviews.gem5.org/r/2627 (Joel Hestness' comment)
http://reviews.gem5.org/r/3580 (Andreas Hansson's comment)
https://groups.google.com/forum/#!msg/gem5-gpu-dev/hjMJs_bAwlY/tE05yRQfJysJ (Joel Hestness' comment)

__Why does Ruby need a shadow copy?__

Ruby needs the shadow copy so that functional accesses can succeed in situations
where they would otherwise fail. Functional accesses are generated by system
calls and by devices, which perform functional loads and stores to hack around
deficiencies in the device model or runtime.

__What is a functional access?__

A functional access is a memory access that resolves immediately in the memory
system. Typically, this involves updating the data value of the memory location
without generating any events that go into the event queue. The result is that
the memory values appear to have been updated magically, without any of the
events that a normal (timing) access would have had to create.

__What's different about functional accesses compared with timing accesses?__

The difference is that functional accesses must complete immediately, before
control returns to the simulation. For example, system calls are executed on an
x86 system when the processor executes either 'int 0x80' or 'syscall'. In SE
mode, the system call and all of the resulting loads and stores must be
completed by the time control returns to the simulated process. That single
'syscall' instruction is supposed to represent an entire sequence of
instructions, many of them necessary loads and stores, that would have executed
if we were running the code on a real system with an actual kernel.

Timing accesses, on the other hand, are sent through the cache hierarchy and 
represent what would happen in a "real" system. For timing accesses, the 
processor creates events that get put into the event queue and are resolved at 
specific ticks according to the memory model associated with the simulation. 
Each memory event can generate subsequent events which may or may not modify 
the cache state and memory state of the simulated system.
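To make the contrast concrete, here is a minimal Python sketch, assuming a toy memory with a single fixed latency. `EventQueue`, `ToyMemory`, and their methods are hypothetical names for illustration, not gem5's classes. The functional write is visible immediately and schedules nothing, while the timing write only takes effect when its event fires.

```python
import heapq
import itertools


class EventQueue:
    """Toy tick-ordered event queue (hypothetical; not gem5's EventQueue)."""

    def __init__(self):
        self.tick = 0
        self._events = []
        self._seq = itertools.count()  # tie-breaker for events at the same tick

    def schedule(self, when, callback):
        heapq.heappush(self._events, (when, next(self._seq), callback))

    def run(self):
        while self._events:
            self.tick, _, callback = heapq.heappop(self._events)
            callback()


class ToyMemory:
    """Hypothetical memory supporting both access types."""

    def __init__(self, eq, latency):
        self.eq = eq
        self.latency = latency
        self.data = {}

    def functional_write(self, addr, value):
        # Resolves on the spot: no event is ever scheduled.
        self.data[addr] = value

    def functional_read(self, addr):
        return self.data.get(addr, 0)

    def timing_write(self, addr, value, on_done):
        # The store only lands when its event fires `latency` ticks later.
        def complete():
            self.data[addr] = value
            on_done(self.eq.tick)

        self.eq.schedule(self.eq.tick + self.latency, complete)


eq = EventQueue()
mem = ToyMemory(eq, latency=100)

mem.functional_write(0x40, 7)           # visible at tick 0
completions = []
mem.timing_write(0x80, 9, completions.append)
before_run = mem.functional_read(0x80)  # still 0: the event has not fired
eq.run()                                # advances to tick 100; the store lands
```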

__Why can't Ruby handle functional accesses without the shadow copy?__

Well, it could handle functional accesses without the shadow copy, but doing so
properly is difficult for most protocols. The shadow copy has been considered an
acceptable crutch that lets protocol writers avoid the complexity of verifying
that their protocol is data-correct.

Consider the following case: a read request arrives at an L1 cache, which must
evict a cache line to the downstream L2 cache to make room. The eviction is
represented by a series of state transitions in Ruby that move the victim data
out of the L1 into the L2 (or possibly into a temporary buffer) before the new
data is copied into the L1 cache. Completing the transition may require several
intermediate states, which are termed transient states. While the cache line's
state machine is in a transient state, data cannot be read from or written to
the cache line. (Ruby has assertions in the code that guard against reads of
lines in these states, because some of the data is "busy".) The assertions were
added because the evicted data likely resides in temporary data structures that
are not easy to access and update (e.g. MSHRs, write buffers, message buffers,
request packets, etc.). That doesn't mean it's conceptually impossible to update
all of this temporary data; it's just difficult to do in most cases.
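A minimal Python sketch of that scenario, assuming a single L1 that parks the victim's data in a write buffer while it waits for the L2. The state names (`M`, `MI_A`, `I`), `ToyL1`, and `BusyLineError` are hypothetical and only loosely echo SLICC naming; the exception stands in for the kind of assertion Ruby uses. The point is that while the line is transient, the up-to-date value is not in the cache array at all.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class State(Enum):
    M = "modified"
    MI_A = "transient: writeback sent, waiting for L2 ack"
    I = "invalid"


class BusyLineError(Exception):
    """Stands in for Ruby's assertion on lines in a transient state."""


@dataclass
class L1Line:
    state: State
    data: Optional[int]


class ToyL1:
    def __init__(self):
        self.lines = {}
        self.write_buffer = {}  # temporary home of evicted data

    def begin_eviction(self, addr):
        line = self.lines[addr]
        # The dirty data leaves the cache array before the L2 has accepted it.
        self.write_buffer[addr] = line.data
        line.data = None
        line.state = State.MI_A

    def finish_eviction(self, addr, l2):
        # The L2 acknowledges; the data now lives downstream.
        l2[addr] = self.write_buffer.pop(addr)
        self.lines[addr].state = State.I

    def functional_read(self, addr):
        line = self.lines.get(addr)
        if line is None or line.state is State.I:
            return None  # not cached here; the caller must look elsewhere
        if line.state is State.MI_A:
            raise BusyLineError(f"line {addr:#x} is busy")
        return line.data


l1, l2 = ToyL1(), {}
l1.lines[0x100] = L1Line(State.M, 42)
l1.begin_eviction(0x100)
try:
    l1.functional_read(0x100)
    read_blocked = False
except BusyLineError:
    read_blocked = True  # the value 42 sits only in l1.write_buffer
l1.finish_eviction(0x100, l2)
```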

__How does the shadow copy solve the problem?__

The "--access-backing-store" option solves the problem by keeping data in a
shadow copy of the system memory. __All functional accesses are sent to this
shadow copy instead of being directed to the normal, default system memory.
Also, during the hit callback on a memory slave port (which belongs to the
sequencer that created the request), data is written into (or read from) the
shadow copy. If I am not mistaken, the hit callback on the memory slave port
corresponds to an L1 hit, meaning that the request has finished traversing the
cache hierarchy. While traversing the cache hierarchy, the request did touch the
default memory through normal behavior, but any data carried back by the packet
is discarded in favor of what's in the shadow copy. If data is read from the
shadow copy, the request packet (which was issued as a timing request) is
updated to reflect the shadow copy's value before the packet is finally handed
back to the sequencer.__ The relevant code can be found by searching for
"access_backing_store" in "RubyPort.cc".
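Here is a minimal Python sketch of that flow, assuming a single port and modeling the whole cache hierarchy plus default memory as one dictionary. `ToyRubyPort`, `Packet`, and their fields are hypothetical stand-ins, not the actual `RubyPort` code. Writes update the shadow copy at the hit callback, reads have their protocol-supplied data replaced by the shadow value, and functional accesses never touch the protocol's state.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Packet:
    addr: int
    is_write: bool
    data: Optional[int] = None


class ToyRubyPort:
    """Hypothetical model of the access-backing-store behavior."""

    def __init__(self):
        self.shadow = {}        # shadow copy of system memory
        self.protocol_mem = {}  # stand-in for caches plus default memory

    def timing_access(self, pkt):
        # Stand-in for traversing the cache hierarchy.
        if pkt.is_write:
            self.protocol_mem[pkt.addr] = pkt.data
        else:
            pkt.data = self.protocol_mem.get(pkt.addr, 0)
        self.hit_callback(pkt)
        return pkt

    def hit_callback(self, pkt):
        if pkt.is_write:
            self.shadow[pkt.addr] = pkt.data
        else:
            # Discard what the protocol returned in favor of the shadow copy.
            pkt.data = self.shadow.get(pkt.addr, 0)

    def functional_access(self, pkt):
        # Functional accesses bypass the protocol and use only the shadow copy.
        if pkt.is_write:
            self.shadow[pkt.addr] = pkt.data
        else:
            pkt.data = self.shadow.get(pkt.addr, 0)
        return pkt


port = ToyRubyPort()
port.timing_access(Packet(0x10, is_write=True, data=5))
port.functional_access(Packet(0x10, is_write=True, data=11))  # e.g. a syscall
reply = port.timing_access(Packet(0x10, is_write=False))
```

In this toy, the protocol's copy stays at 5, yet the timing read still returns 11 because the hit callback substitutes the shadow value.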

System call instructions have an ordering semantic that prevents them from
executing before all preceding instructions have executed. This ordering
semantic keeps the functional accesses made during the system call from
clobbering, or missing, earlier timing accesses. The key thing that protects us
here is that the Ruby sequencer must tell the processor that the instruction
has finished. This cannot happen until the L1's hit callback has returned, which
ensures that the shadow copy has seen the timing accesses. (I still need to
verify this by reading through that code, but I believe it's true from previous
experience.)

If that's true, then other sources of functional accesses need to be careful
about when they issue them, or we may see consistency issues caused by accesses
being reordered in the cache hierarchy. For instance, consider what might happen
if the system call did not have the ordering property. It would be possible for
the system call instruction to issue functional accesses to the shadow copy
before still-active timing accesses had reached it. (There's no way that the
processor could prevent this by checking normal data dependencies, because all
it sees is a single instruction: 'syscall' or 'int 0x80'.) So, I am a bit wary
of seeing functional accesses in unusual places. For instance, I wouldn't embed
a functional access into a normal instruction. (I don't know if anyone has ever
tried that or if it's even possible, but it would be a bad idea. There might be
a magic instruction that does this, or someone might try it in the future.)
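A minimal Python sketch of the hazard, assuming that a timing store becomes visible in the shadow copy only at its hit callback, some number of ticks after it issues. `ToySystem` and its methods are hypothetical. `syscall_read` first drains outstanding stores, mimicking the ordering semantic; `unordered_functional_read` does not, and so can observe a stale value.

```python
import heapq
import itertools


class ToySystem:
    """Hypothetical model: stores reach the shadow copy at their hit callback."""

    def __init__(self):
        self.tick = 0
        self.shadow = {}
        self._pending = []  # (completion tick, seq, addr, value)
        self._seq = itertools.count()

    def timing_store(self, addr, value, latency):
        done_at = self.tick + latency
        heapq.heappush(self._pending, (done_at, next(self._seq), addr, value))

    def _hit_callback(self, addr, value):
        self.shadow[addr] = value

    def drain(self):
        # Retire every outstanding store in completion order.
        while self._pending:
            self.tick, _, addr, value = heapq.heappop(self._pending)
            self._hit_callback(addr, value)

    def syscall_read(self, addr):
        # Ordering semantic: all older accesses complete before the syscall runs.
        self.drain()
        return self.shadow.get(addr, 0)

    def unordered_functional_read(self, addr):
        # No waiting, so in-flight stores are invisible.
        return self.shadow.get(addr, 0)


racy = ToySystem()
racy.timing_store(0x200, 1, latency=50)
stale = racy.unordered_functional_read(0x200)

ordered = ToySystem()
ordered.timing_store(0x200, 1, latency=50)
fresh = ordered.syscall_read(0x200)
```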

__What happens if we do not have a shadow copy?__

The behavior without a shadow copy of memory (i.e. without --access-backing-store)
is kind of interesting. It highlights why we need the shadow copy in the first
place (see RubySystem.cc::functional_read/write). Essentially, functional writes
always succeed because they write to as much of the system's state as possible.
However, functional reads can (and will) fail. It's not completely obvious, but I
am confident that the failures stem from cache lines reporting "busy" states
caused by recent transitions in the cache hierarchy. (This seems to be what
Nilay is referring to in his summary for reviews.gem5.org/r/2466.)
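The asymmetry can be sketched in Python as below, assuming each copy of a block (in a cache, a buffer, or memory) carries a simple permission. `Perm`, `Copy`, `functional_write`, and `functional_read` are hypothetical and deliberately simplified; they do not reproduce the permission checks in RubySystem.cc. The write can always update every copy it finds, but the read has nothing trustworthy to return when every valid copy is busy.

```python
from dataclasses import dataclass
from enum import Enum


class Perm(Enum):
    READ_WRITE = "stable, writable"
    READ_ONLY = "stable, readable"
    BUSY = "transient"
    INVALID = "not present"


@dataclass
class Copy:
    where: str
    perm: Perm
    data: int


def functional_write(copies, value):
    # Overwrite every copy that exists, busy or not; this never fails.
    updated = 0
    for c in copies:
        if c.perm is not Perm.INVALID:
            c.data = value
            updated += 1
    return updated


def functional_read(copies):
    # Only a copy in a stable state is known to hold the current value.
    for c in copies:
        if c.perm in (Perm.READ_WRITE, Perm.READ_ONLY):
            return True, c.data
    return False, None  # every valid copy is busy: the read fails


# Mid-eviction: the L1 line, the write buffer entry, and the L2 line are busy.
copies = [
    Copy("L1", Perm.BUSY, 42),
    Copy("write buffer", Perm.BUSY, 42),
    Copy("L2", Perm.BUSY, 7),
]
read_during = functional_read(copies)
n_written = functional_write(copies, 99)
copies[2].perm = Perm.READ_WRITE  # the transition completes at the L2
read_after = functional_read(copies)
```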


- Brandon


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://reviews.gem5.org/r/3580/#review8579
-----------------------------------------------------------


On Aug. 5, 2016, 9:37 p.m., David Hashe wrote:
> 
> (Updated Aug. 5, 2016, 9:37 p.m.)
> 
> 
> Review request for Default.
> 
> 
> Repository: gem5
> 
> 
> Description
> -------
> 
> Changeset 11562:7375e1f533fa
> ---------------------------
> cpu, mem, sim: Enable KVM support for Ruby
> 
> Only map memories into the KVM guest address space that are
> marked as usable by KVM.
> 
> Remember whether a BackingStoreEntry should be mapped by KVM.
> 
> Fix bug causing incomplete draining of Ruby Sequencer.
> 
> 
> Diffs
> -----
> 
>   src/cpu/kvm/vm.cc 704b0198f747b766b839c577614eb2924fd1dfee 
>   src/mem/AbstractMemory.py 704b0198f747b766b839c577614eb2924fd1dfee 
>   src/mem/abstract_mem.hh 704b0198f747b766b839c577614eb2924fd1dfee 
>   src/mem/abstract_mem.cc 704b0198f747b766b839c577614eb2924fd1dfee 
>   src/mem/physical.hh 704b0198f747b766b839c577614eb2924fd1dfee 
>   src/mem/physical.cc 704b0198f747b766b839c577614eb2924fd1dfee 
> 
> Diff: http://reviews.gem5.org/r/3580/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> David Hashe
> 
>

_______________________________________________
gem5-dev mailing list
http://m5sim.org/mailman/listinfo/gem5-dev
