> On Aug. 5, 2016, 7:15 a.m., Andreas Hansson wrote:
> > I see how this works as a stop gap, but ultimately I would like to push for 
> > the removal of the shadow memory as the first option. Is it really that 
> > much effort?
> 
> David Hashe wrote:
>     I'm not personally familiar enough with why the shadow memory is needed 
> to be able to say how much effort it would take to remove, but I believe so.
> 
> Brandon Potter wrote:
>     Providing background since some might not be familiar with the problem.
>     
>     __The following links are relevant:__
>     http://reviews.gem5.org/r/2466 (Joel Hestness' response to Andreas 
> Hansson)
>     http://reviews.gem5.org/r/2627 (Joel Hestness' comment)
>     http://reviews.gem5.org/r/3580 (Andreas Hansson's comment)
>     
> https://groups.google.com/forum/#!msg/gem5-gpu-dev/hjMJs_bAwlY/tE05yRQfJysJ 
> (Joel Hestness’ comment)
>     
>     __Why does Ruby need a shadow copy?__
>     
>     Ruby needs the shadow copy to allow it to do functional accesses in 
> situations where it would normally fail. Functional accesses are generated by 
> system calls or by devices to do functional loading and storing to hack 
> around deficiencies in the device model or runtime.
>     
>     __What is a functional access?__
>     
>     A functional access is a memory access that immediately resolves in the 
> memory system. Typically, this involves updating the data value of the memory 
> location without generating any events that go into the event queue. The 
> result is that the memory values appear to have been updated magically 
> without ever creating the events that it would have needed to create if it 
> was operating in the normal manner.
>     
>     __What's different about functional accesses compared with timing 
> accesses?__
>     
>     The difference is that functional accesses must complete immediately 
> before returning control back over to the simulation. For example, system 
> calls are executed in an X86 system when the processor executes either 'int 
> 0x80' or 'syscall'. In SE mode, the system call invocation and all of the 
> resulting loads and stores must be completed by the time that we return 
> control back to the simulated process. That single 'syscall' instruction that 
> the processor executes is supposed to represent an entire set of 
> instructions, many of them necessary loads and stores, that would have 
> executed if we were running the code in a real system with an actual kernel.
>     
>     Timing accesses, on the other hand, are sent through the cache hierarchy 
> and represent what would happen in a "real" system. For timing accesses, the 
> processor creates events that get put into the event queue and are resolved 
> at specific ticks according to the memory model associated with the 
> simulation. Each memory event can generate subsequent events which may or may 
> not modify the cache state and memory state of the simulated system.
>     
>     __Why can't Ruby handle functional accesses without the shadow copy?__
>     
>     Well, it could handle functional accesses without the shadow copy, but it's 
> difficult to implement properly for most protocols. The shadow copy has been 
> considered to be an acceptable crutch to allow protocol writers to avoid the 
> complexities associated with verifying that their protocol is data correct.
>     
>     Consider the following case: a read request comes into an L1 cache and is 
> about to evict a cache line to be sent to a downstream L2 cache. The eviction 
> is represented by a series of state transitions in Ruby to handle moving the 
> stale data out of the L1 into the L2 or possibly a temporary buffer before 
> copying the new data into the L1 cache. There may be several intermediate 
> states needed to complete the transition; these are termed transient states. 
> While the cache line’s state machine is in a transient state, data cannot be 
> read from or written to the cache line. (Ruby has an assertion in the code to 
> protect against reads on lines where some of the data is "busy".) The 
> assertion was added because the evicted, old data likely resides in some 
> temporary data structure(s) which are likely not easy to access and update 
> (e.g. MSHR, write buffer, message buffer, request packets). That 
> doesn’t mean that it’s conceptually impossible to update all of this 
> temporary data; it’s just difficult to do in most cases.
>     
>     __How does the shadow copy solve the problem?__
>     
>     The "--access-backing-store" option solves the problem by caching data in 
> a shadow copy of the system memory. __All functional accesses are sent to 
> this shadow copy instead of being directed to the normal, default system 
> memory. Also, hit callbacks from the memory slave ports (which belong to the 
> sequencers that created the request) will write (or read) data into (from) 
> the shadow copy during the hit callback. If I am not mistaken, the hit 
> callback on the memory slave port is equivalent to an L1 hit meaning that the 
> request completed traversing the cache subsystem. In traversing the cache 
> subsystem, the request did touch the default memory through normal behavior, 
> but any returning information carried by the packet will be discarded in 
> favor of what’s in the shadow copy. If data is read from the shadow copy, the 
> request packet (again issued by a timing request) is updated to reflect the 
> shadow copy’s value before the packet is finally handed back to the 
> sequencer.__ The interesting code can be found by searching for 
> "access_backing_store" in "RubyPort.cc".
>     
>     System call instructions have an ordering semantic that prevents them 
> from being executed before all of the preceding instructions have executed. 
> The ordering semantic protects us from clobbering and/or missing timing 
> accesses with subsequent functional accesses during the system call. The key 
> thing which protects us here is that the Ruby sequencer needs to tell the 
> processor that the instruction has finished. This cannot happen until the 
> L1's hit callback has returned ensuring that the shadow copy has seen the 
> timing accesses. (Need to verify this by looking through that code, but 
> believe that’s true from previous experience.)
>     
>     If that’s true, then other functional accesses need to be careful in how 
> they issue instructions or we may see consistency issues caused by value 
> reordering from the cache hierarchy. For instance, consider what might happen 
> if the system call did not have the ordering property. It would be possible 
> for the system call instruction to issue functional accesses to the shadow 
> copy before still-active timing accesses were seen by it. (There's no way 
> that the processor could prevent the accesses from occurring by checking 
> normal data dependencies because all it sees is a single instruction: 
> 'syscall' or 'int 0x80'.) So, I am a bit wary of seeing functional accesses in weird 
> places. For instance, I wouldn’t embed a functional access into a normal 
> instruction. (I don't know if anyone has ever tried that or if it's even 
> possible, but it would be a bad idea. There might be a magic instruction 
> which does this or someone might try to do it in the future.)
>     
>     __What happens if we do not have a shadow copy?__
>     
>     The behavior without a shadow copy of memory (i.e. no 
> --access-backing-store) is kind of interesting. It highlights why we need the 
> shadow copy in the first place (see RubySystem.cc::functional_read/write). 
> Essentially, the functional_writes will always succeed by attempting to write 
> to as much of their state as possible. However, functional_reads can (and 
> will) fail. It’s not completely obvious, but I am confident that the failures 
> stem from the cache lines returning “busy” states caused by recent 
> transitions in the cache hierarchy. (It seems that this is what Nilay is 
> referring to in his summary for reviews.gem5.org/r/2466.)

__Is it possible to remove the shadow copy?__

Yes, it is possible, but it requires a lot of work; more work than most people 
can reasonably be expected to contribute as an unrelated patch. The solution 
requires that the protocols are data correct; this entails making all of the 
functional accesses propagate correctly through temporary variables. Even if it 
is possible to remove it for the existing public protocols, it's likely that the 
protocol developers will want to retain this functionality to help with 
developing new protocols. Even if that's done, I would expect heavy resistance if we 
tried to force other developers with private protocols to ensure that their 
protocols are data correct even in the face of functional accesses. It's my 
understanding that the folks here at AMD aren't the only ones who rely on the 
shadow copy; I think Wisconsin folks use it too.
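
As a rough illustration of what "data correct" means here, the toy Python 
sketch below shows a functional write that updates every copy of a line, 
including one sitting in a temporary buffer during an eviction. This is not 
gem5 or Ruby code; `ToyController`, `in_flight`, and the method names are 
made up for the example, and a real protocol has many more places where data 
can hide.

```python
# Toy model only: none of these names are gem5 or Ruby APIs.
from dataclasses import dataclass, field


@dataclass
class ToyController:
    """One cache controller with a stable array and a temporary buffer."""
    lines: dict = field(default_factory=dict)      # addr -> value (stable copies)
    in_flight: dict = field(default_factory=dict)  # addr -> value (being evicted)

    def start_eviction(self, addr):
        # The line leaves the stable array and sits in a temporary buffer,
        # which is the "transient state" situation described above.
        self.in_flight[addr] = self.lines.pop(addr)

    def functional_write(self, addr, value):
        # Data correctness: update every copy, including in-flight ones.
        hit = False
        for store in (self.lines, self.in_flight):
            if addr in store:
                store[addr] = value
                hit = True
        return hit

    def functional_read(self, addr):
        # A data-correct controller can also answer from the temporary buffer.
        for store in (self.lines, self.in_flight):
            if addr in store:
                return store[addr]
        return None


ctrl = ToyController(lines={0x40: 1})
ctrl.start_eviction(0x40)          # line is now in a transient state
ctrl.functional_write(0x40, 7)     # must reach the buffered copy
print(ctrl.functional_read(0x40))
```

The hard part in a real protocol is that every MSHR, write buffer, and 
in-flight message would need the equivalent of the loop over `store` above, 
which is why the shadow copy is the easier route.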

Generally, we need better random memory testers to exercise the protocols and 
uncover problems. In my opinion, that should be the main priority for Ruby 
developers. I don't have much confidence in running new workloads if the 
simulation relies on Ruby; the protocols just aren't tested well enough. This 
memory tester needs to issue functional accesses as well as timing accesses to 
actually test whether the protocols are always data correct. It's not enough to 
simply have a few benchmarks that we test in the regressions even if the 
benchmarks are long running.
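
To sketch the kind of tester I have in mind, here is a small Python toy that 
mixes timing and functional accesses at random and checks every functional 
read against a reference model. It is only a sketch under heavy assumptions: 
`ToyMemory` stands in for the simulated memory system, timing writes are 
modeled as a simple pending list, and none of the names correspond to an 
existing gem5 tester.

```python
# Toy sketch: ToyMemory stands in for a simulated memory system; not gem5 code.
import random


class ToyMemory:
    """Memory where timing writes land later and functional ones land now."""

    def __init__(self):
        self.data = {}
        self.pending = []  # timing writes not yet applied: (addr, value)

    def timing_write(self, addr, value):
        self.pending.append((addr, value))

    def functional_write(self, addr, value):
        # Must also patch in-flight copies, else a stale timing write wins later.
        self.data[addr] = value
        self.pending = [(a, value if a == addr else v) for a, v in self.pending]

    def functional_read(self, addr):
        # The newest in-flight value takes priority over the stable copy.
        for a, v in reversed(self.pending):
            if a == addr:
                return v
        return self.data.get(addr, 0)

    def drain(self):
        # Apply all pending timing writes in issue order.
        for a, v in self.pending:
            self.data[a] = v
        self.pending = []


def run_tester(mem, steps=1000, seed=0):
    """Issue random mixed accesses; return False on the first data mismatch."""
    rng = random.Random(seed)
    reference = {}  # what a data-correct memory must return for each address
    ops = ["timing_write", "functional_write", "functional_read", "drain"]
    for _ in range(steps):
        addr = rng.randrange(8)
        op = rng.choice(ops)
        if op == "functional_read":
            if mem.functional_read(addr) != reference.get(addr, 0):
                return False
        elif op == "drain":
            mem.drain()
        else:
            value = rng.randrange(256)
            getattr(mem, op)(addr, value)
            reference[addr] = value
    return True


print(run_tester(ToyMemory()))
```

A memory model whose `functional_write` skipped the pending list would 
return a stale value on the next functional read of that address, which is 
the class of bug a timing-only tester never exercises.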


- Brandon


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://reviews.gem5.org/r/3580/#review8579
-----------------------------------------------------------


On Aug. 5, 2016, 9:37 p.m., David Hashe wrote:
> 
> (Updated Aug. 5, 2016, 9:37 p.m.)
> 
> 
> Review request for Default.
> 
> 
> Repository: gem5
> 
> 
> Description
> -------
> 
> Changeset 11562:7375e1f533fa
> ---------------------------
> cpu, mem, sim: Enable KVM support for Ruby
> 
> Only map memories into the KVM guest address space that are
> marked as usable by KVM.
> 
> Remember whether a BackingStoreEntry should be mapped by KVM.
> 
> Fix bug causing incomplete draining of Ruby Sequencer.
> 
> 
> Diffs
> -----
> 
>   src/cpu/kvm/vm.cc 704b0198f747b766b839c577614eb2924fd1dfee 
>   src/mem/AbstractMemory.py 704b0198f747b766b839c577614eb2924fd1dfee 
>   src/mem/abstract_mem.hh 704b0198f747b766b839c577614eb2924fd1dfee 
>   src/mem/abstract_mem.cc 704b0198f747b766b839c577614eb2924fd1dfee 
>   src/mem/physical.hh 704b0198f747b766b839c577614eb2924fd1dfee 
>   src/mem/physical.cc 704b0198f747b766b839c577614eb2924fd1dfee 
> 
> Diff: http://reviews.gem5.org/r/3580/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> David Hashe
> 
>

_______________________________________________
gem5-dev mailing list
gem5-dev@gem5.org
http://m5sim.org/mailman/listinfo/gem5-dev