Re: [gem5-dev] Review Request 3580: cpu, mem, sim: Enable KVM support for Ruby

Andreas Hansson Fri, 12 Aug 2016 07:52:27 -0700


> On Aug. 5, 2016, 7:15 a.m., Andreas Hansson wrote:
> > I see how this works as a stop gap, but ultimately I would like to push for 
> > the removal of the shadow memory as the first option. Is it really that 
> > much effort?
> 
> David Hashe wrote:
>     I'm not personally familiar enough with why the shadow memory is needed 
> to be able to say how much effort it would take to remove, but I believe so.
> 
> Brandon Potter wrote:
>     Providing background since some might not be familiar with the problem.
>     
>     __The following links are relevant:__
>     http://reviews.gem5.org/r/2466 (Joel Hestness' response to Andreas 
> Hansson)
>     http://reviews.gem5.org/r/2627 (Joel Hestness' comment)
>     http://reviews.gem5.org/r/3580 (Andreas Hansson's comment)
>     
> https://groups.google.com/forum/#!msg/gem5-gpu-dev/hjMJs_bAwlY/tE05yRQfJysJ 
> (Joel Hestness’ comment)
>     
>     __Why does Ruby need a shadow copy?__
>     
>     Ruby needs the shadow copy to allow it to do functional accesses in 
> situations where it would normally fail. Functional accesses are generated by 
> system calls or by devices to do functional loading and storing to hack 
> around deficiencies in the device model or runtime.
>     
>     __What is a functional access?__
>     
>     A functional access is a memory access that immediately resolves in the 
> memory system. Typically, this involves updating the data value of the memory 
> location without generating any events that go into the event queue. The 
> result is that the memory values appear to have been updated magically 
> without ever creating the events that it would have needed to create if it 
> was operating in the normal manner.
>     
>     __What's different about functional accesses compared with timing 
> accesses?__
>     
>     The difference is that functional accesses must complete immediately 
> before returning control back over to the simulation. For example, system 
> calls are executed in an X86 system when the processor executes either 'int 
> 0x80' or 'syscall'. In SE mode, the system call invocation and all of the 
> resulting loads and stores must be completed by the time that we return 
> control back to the simulated process. That single 'syscall' instruction that 
> the processor executes is supposed to represent an entire set of 
> instructions, many of them necessary loads and stores, that would have 
> executed if we were running the code in a real system with an actual kernel.
>     
>     Timing accesses, on the other hand, are sent through the cache hierarchy 
> and represent what would happen in a "real" system. For timing accesses, the 
> processor creates events that get put into the event queue and are resolved 
> at specific ticks according to the memory model associated with the 
> simulation. Each memory event can generate subsequent events which may or may 
> not modify the cache state and memory state of the simulated system.
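
A toy sketch may make the contrast above concrete. It is not gem5 code: `ToyMemory`, `functional_write`, `timing_write` and `run_until` are hypothetical names invented for illustration, and the single fixed latency stands in for a whole cache hierarchy. A functional write lands immediately, while a timing write only lands when its event is serviced.

```python
import heapq


class ToyMemory:
    """Toy model only; names and behavior are assumptions, not gem5 APIs."""

    def __init__(self):
        self.data = {}      # addr -> value currently visible in memory
        self.events = []    # min-heap of (tick, seq, addr, value)
        self.now = 0        # current simulated tick
        self._seq = 0       # tie-breaker so equal ticks keep issue order

    def functional_write(self, addr, value):
        # Resolves immediately: no event is created or scheduled.
        self.data[addr] = value

    def timing_write(self, addr, value, latency):
        # Schedules an event; memory is only updated when it is serviced.
        heapq.heappush(self.events, (self.now + latency, self._seq, addr, value))
        self._seq += 1

    def run_until(self, tick):
        # Service every event whose tick is at or before the target tick.
        while self.events and self.events[0][0] <= tick:
            when, _, addr, value = heapq.heappop(self.events)
            self.now = when
            self.data[addr] = value
        self.now = tick


mem = ToyMemory()
mem.timing_write(0x40, "timing", latency=10)   # still in flight
mem.functional_write(0x40, "functional")       # visible right away
visible_before = mem.data[0x40]
mem.run_until(10)                              # the in-flight write lands
visible_after = mem.data[0x40]
```

In this toy, the functional value is visible first and is then overwritten by the older, still in-flight timing write, which is the kind of interleaving hazard the rest of this thread worries about.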
>     
>     __Why can't Ruby handle functional accesses without the shadow copy?__
>     
>     Well it could handle functional accesses without the shadow copy, but it's 
> difficult to implement properly for most protocols. The shadow copy has been 
> considered to be an acceptable crutch to allow protocol writers to avoid the 
> complexities associated with verifying that their protocol is data correct.
>     
>     Consider the following case: a read request comes into an L1 cache and is 
> about to evict a cache line to be sent to a downstream L2 cache. The eviction 
> is represented by a series of state transitions in Ruby to handle moving the 
> stale data out of the L1 into the L2 or possibly a temporary buffer before 
> copying the new data into the L1 cache. There may be several intermediate 
> states needed to complete the transition, which are termed transient states. 
> While the cache line’s state machine is in a transient state, data cannot be 
> read from or written to the cache line. (Ruby has an assertion in the code to 
> protect against reads on lines where some of the data is "busy".) 
> The asserts were added because the evicted, old data likely resides in some 
> temporary data structure(s) which are likely not easy to access and update 
> (e.g. MSHR, write buffer, message buffer, request packets, etc.). That 
> doesn’t mean that it’s conceptually impossible to update all of this 
> temporary data; it’s just difficult to do in most cases.
>     
>     __How does the shadow copy solve the problem?__
>     
>     The "--access-backing-store" option solves the problem by caching data in 
> a shadow copy of the system memory. __All functional accesses are sent to 
> this shadow copy instead of being directed to the normal, default system 
> memory. Also, hit callbacks from the memory slave ports (which belong to the 
> sequencers that created the request) will write (or read) data into (from) 
> the shadow copy during the hit callback. If I am not mistaken, the hit 
> callback on the memory slave port is equivalent to an L1 hit meaning that the 
> request completed traversing the cache subsystem. In traversing the cache 
> subsystem, the request did touch the default memory through normal behavior, 
> but any returning information carried by the packet will be discarded in 
> favor of what’s in the shadow copy. If data is read from the shadow copy, the 
> request packet (again issued by a timing request) is updated to reflect the 
> shadow copy’s value before the packet is finally handed back to the 
> sequencer.__ The interesting code can be found by searching for 
> "access_backing_store" in "RubyPort.cc".
>     
>     System call instructions have an ordering semantic that prevents them 
> from being executed before all of the preceding instructions have executed. 
> The ordering semantic protects us from clobbering and/or missing timing 
> accesses with subsequent functional accesses during the system call. The key 
> thing which protects us here is that the Ruby sequencer needs to tell the 
> processor that the instruction has finished. This cannot happen until the 
> L1's hit callback has returned ensuring that the shadow copy has seen the 
> timing accesses. (Need to verify this by looking through that code, but 
> believe that’s true from previous experience.)
>     
>     If that’s true, then other functional accesses need to be careful in how 
> they issue instructions or we may see consistency issues caused by value 
> reordering from the cache hierarchy. For instance, consider what might happen 
> if the system call did not have the ordering property. It would be possible 
> for the system call instruction to issue functional accesses to the shadow 
> copy before still active timing accesses were seen by it. (There's no way 
> that the processor could prevent the accesses from occurring by checking 
> normal data dependencies because all it sees is a single instruction: syscall 
> or int 0x80.) So, I am a bit wary of seeing functional accesses in weird 
> places. For instance, I wouldn’t embed a functional access into a normal 
> instruction. (I don't know if anyone has ever tried that or if it's even 
> possible, but it would be a bad idea. There might be a magic instruction 
> which does this or someone might try to do it in the future.)
>     
>     __What happens if we do not have a shadow copy?__
>     
>     The behavior without a shadow copy of memory (i.e. no 
> --access-backing-store) is kind of interesting. It highlights why we need the 
> shadow copy in the first place (see RubySystem.cc::functional_read/write). 
> Essentially, the functional_writes will always succeed by attempting to write 
> to as much of their state as possible. However, functional_reads can (and 
> will) fail. It’s not completely obvious, but I am confident that the failures 
> stem from the cache lines returning “busy” states caused by recent 
> transitions in the cache hierarchy. (It seems that this is what Nilay is 
> referring to in his summary for reviews.gem5.org/r/2466.)
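
The failure mode described above, and how a shadow copy sidesteps it, can be sketched with a toy model. This is not the code in RubyPort.cc or RubySystem.cc: `ToyRuby`, `ToyCacheLine`, `hit_callback_write` and `start_eviction` are hypothetical names, and a single "busy" flag stands in for every transient protocol state.

```python
class ToyCacheLine:
    """One cache line; 'busy' stands in for any transient state."""

    def __init__(self, value, busy=False):
        self.value = value
        self.busy = busy


class ToyRuby:
    """Toy model only; names and behavior are assumptions, not gem5 APIs."""

    def __init__(self, access_backing_store):
        self.lines = {}
        # The shadow copy exists only when the option is enabled.
        self.backing_store = {} if access_backing_store else None

    def hit_callback_write(self, addr, value):
        # A timing write has finished traversing the hierarchy.
        self.lines[addr] = ToyCacheLine(value)
        if self.backing_store is not None:
            # The shadow copy is updated during the hit callback.
            self.backing_store[addr] = value

    def start_eviction(self, addr):
        # The line enters a transient state; its data is "in flight".
        self.lines[addr].busy = True

    def functional_read(self, addr):
        # Returns (success, value).
        if self.backing_store is not None:
            # With a shadow copy, functional reads never touch busy lines.
            return True, self.backing_store.get(addr)
        line = self.lines.get(addr)
        if line is None or line.busy:
            return False, None
        return True, line.value


results = {}
for flag in (False, True):
    ruby = ToyRuby(access_backing_store=flag)
    ruby.hit_callback_write(0x80, 7)
    ruby.start_eviction(0x80)
    results[flag] = ruby.functional_read(0x80)
```

Under these assumptions the read fails without the shadow copy because the line is mid-eviction, and succeeds with it because the value was recorded at hit-callback time.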
> 
> Brandon Potter wrote:
>     __Is it possible to remove the shadow copy?__
>     
>     Yes, it is possible, but it requires a lot of work; more work than most 
> people can reasonably be expected to contribute as an unrelated patch. The 
> solution requires that the protocols are data correct; this entails making 
> all of the functional accesses propagate correctly through temporary 
> variables. Even if it is possible to remove for existing public protocols, 
> it's likely that the protocol developers will want to retain this 
> functionality to help with developing new protocols. Even if that's done, I 
> suspect heavy resistance if we tried to force other developers with private 
> protocols to ensure that their protocols are data correct even in the face of 
> functional accesses. It's my understanding that the folks here at AMD aren't 
> the only ones who rely on the shadow copy; I think Wisconsin folks use it too.
>     
>     Generally, we need better random memory testers to exercise the protocols 
> and uncover problems. In my opinion, that should be the main priority for 
> Ruby developers. I don't have much confidence in running new workloads if the 
> simulation relies on Ruby; the protocols just aren't tested well enough. This 
> memory tester needs to issue functional accesses as well as timing accesses 
> to actually test whether the protocols are always data correct. It's not 
> enough to simply have a few benchmarks that we test in the regressions even 
> if the benchmarks are long running.
> 
> Andreas Hansson wrote:
>     Thanks for all the comments Brandon.
>     
>     The reason why I don't like the original patch is that it confuses atomic 
> and functional (we should really rename the latter to debug accesses to align 
> with SystemC TLM), and does so without any sensible rules/assumptions in 
> place. I guess the shadow memory is quite similar though, as it seems near 
> impossible to actually explain what is correct and/or expected behaviour when 
> timing and functional accesses interleave.
>     
>     At the moment we are experiencing actual ordering (and correctness) 
> issues with the interplay between real memory accesses, i.e. timing and 
> atomic, and the debug memory accesses performed using the functional API. In 
> our case the latter are used by a number of wrapped models that are expecting 
> instantaneous updates to memory, and separate function and timing packets. 
> The bottom line is that the use of functional accesses causes big problems 
> when ordering and consistency models matter, and we should try and sort this 
> out before we make it even more complicated. We have the same issue with the 
> SST integration. Ultimately the functional read and write accesses should only 
> be used when the guest system is at rest, but clearly we are not sticking to 
> that at the moment.
>     
>     The MemChecker is a good start in checking that the timing behaviour 
> works as expected with respect to the consistency model, but it is not clear 
> to me how we could add the functional accesses as part of the check. Also, is 
> it at all used with Ruby today? In the interplay between timing/atomic and 
> functional, what is "right"? Technically the functional/debug accesses do not 
> exist in the simulated guest. I am really keen to hear what people think is 
> right.
> 
> Brad Beckmann wrote:
>     This has been a great discussion and I would like to see it continue.  
> However, I think we all should conclude the discussion is somewhat orthogonal 
> to this current patch.  The current patch fixes an immediate problem using 
> KVM with Ruby.  It has gone through 4 revisions and David responded to many 
> comments and suggestions.  Can we please check it in now?
>     
>     Brandon is right on regarding the tester gap.  However, we should be 
> clear on the capabilities of the current testers and we should all agree this 
> gap is not unique to Ruby.  The current memtest randomizes racy accesses, but 
> only really checks single-writer/multiple-reader coherence.  The current 
> rubytest further stresses races, but only checks SC execution.  I'm less 
> familiar with the MemChecker, but I believe it simply monitors an execution 
> and ensures SC execution.  Please correct me if I'm wrong, but I don't 
> believe it has any concept of an acquire/ld fence or a release/st fence.  
> Brandon's frustration with our recent GPU protocols is due to a lack of a 
> relaxed consistency model checker in gem5.  We developed such a tester a few 
> years ago, however that work stopped and I have not been able to find someone 
> to take over the development.  It is hard and complicated work.  There simply 
> aren't many people that can do it.  If anyone is interested, please let me 
> know.
> 
> Andreas Hansson wrote:
>     The MemChecker does what you want. At least that's my understanding. 
> Stephan or Marco can comment further.
>     
>     I think we should make sure we get this right, as it's already quite 
> complex code with the various types of memories. Furthermore, once it's 
> committed we traditionally have limited success in getting design changes 
> done.
> 
> Brad Beckmann wrote:
>     Andreas, even if MemChecker does what you think it does, that is 
> orthogonal to this patch.  This patch fixes a current problem with KVM and 
> Ruby.  We will likely never get rid of the backing store for Ruby, even with 
> a perfect checker. 
>     
>     Do I understand your resistance to checking in this patch correctly?  
> You don't want us to check this in and instead want us to do it "right" by 
> spending 9-12 man-months of effort to remove the Ruby backing store?  Do you 
> fully appreciate the amount of work required?  Do you understand all the 
> benefits of the backing store as Brandon and previously Joel outlined?


I think you've got this all wrong.

The current patch only adds a visibility switch, and all I'm asking is that we 
document how the various switches combine, and what combinations make sense, 
and where they are used.

The other patch which mixes atomic/functional needs further discussion.


- Andreas


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://reviews.gem5.org/r/3580/#review8579
-----------------------------------------------------------


On Aug. 5, 2016, 9:37 p.m., David Hashe wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://reviews.gem5.org/r/3580/
> -----------------------------------------------------------
> 
> (Updated Aug. 5, 2016, 9:37 p.m.)
> 
> 
> Review request for Default.
> 
> 
> Repository: gem5
> 
> 
> Description
> -------
> 
> Changeset 11562:7375e1f533fa
> ---------------------------
> cpu, mem, sim: Enable KVM support for Ruby
> 
> Only map memories into the KVM guest address space that are
> marked as usable by KVM.
> 
> Remember whether a BackingStoreEntry should be mapped by KVM.
> 
> Fix bug causing incomplete draining of Ruby Sequencer.
> 
> 
> Diffs
> -----
> 
>   src/cpu/kvm/vm.cc 704b0198f747b766b839c577614eb2924fd1dfee 
>   src/mem/AbstractMemory.py 704b0198f747b766b839c577614eb2924fd1dfee 
>   src/mem/abstract_mem.hh 704b0198f747b766b839c577614eb2924fd1dfee 
>   src/mem/abstract_mem.cc 704b0198f747b766b839c577614eb2924fd1dfee 
>   src/mem/physical.hh 704b0198f747b766b839c577614eb2924fd1dfee 
>   src/mem/physical.cc 704b0198f747b766b839c577614eb2924fd1dfee 
> 
> Diff: http://reviews.gem5.org/r/3580/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> David Hashe
> 
>

_______________________________________________
gem5-dev mailing list
gem5-dev@gem5.org
http://m5sim.org/mailman/listinfo/gem5-dev
