> On April 6, 2016, 9:46 p.m., Andreas Hansson wrote:
> > For the classic memory system, all caches before the point of coherency 
> > have the same line size, and it is a property of the system. Is this change 
> > departing from this assumption for Ruby? What is the bigger picture here?
> 
> Brandon Potter wrote:
>     __TL;DR: Never use global variables, static class members, or static 
> variable declarations in function bodies. If you do, someone will come 
> knocking on your office door in the future; it might be me.__
>     
>     The bigger picture is the desire to run two or more "systems" inside the 
> same gem5 process; ideally, the systems could be independent of one another, 
> i.e., completely different coherence protocols, topologies, system block 
> sizes, etc. (Without Ruby, it was already possible to run multiple simulated 
> systems in the same process. We only realized that Ruby had this problem when 
> we started trying to enable it in our simulations.) Being able to run two 
> simulated systems together in the same process is important if you want them 
> to communicate with one another through mutually shared objects. I am aware 
> of the dist-gem5 changeset (1640dd68b0a4); it is a nice addition and could 
> supplant this whole effort by allowing us to create two separate gem5 
> processes each with a single system invocation and then communicate through 
> the dist-gem5 interface. (Some folks here at AMD already use dist-gem5 for 
> their research.) However, there's no reason why we shouldn't also be able to 
> achieve a multi-instance run without having to create different processes. 
> With a single process, we could theoretically take advantage of shared memory 
> and multi-threading __(on the host)__, with different real hardware threads 
> servicing gem5 event queues. (Preliminary work on multi-threaded and multiple 
> event queues has already been done and checked in. We have internal patches 
> that we would eventually like to push to the public tree to extend it.) 
> Ideally, dist-gem5 and this work could even be complementary. We might run a 
> multi-threaded gem5 process with multiple simulated systems coexisting 
> together in the process and then scale that out to multiple processes on 
> different host nodes which communicate with one another through dist-gem5; in 
> this way, we could harness a cluster and use the threading within nodes.
>     
>     __Ruby problems:__ The issues are related to global variables and static 
> class members. Since the variables are visible to the entire gem5 process, 
> they're used by each independent simulated system even though that's not 
> really what we want. For example, the deal breaker (which caused initial 
> failed assertions) was how the component mappings were determined after 
> Python configuration. Global counters tracked how many times the controller 
> classes had been instantiated in order to figure out how many controllers 
> (cache, directory, etc.) were being created, and that information was used to 
> route messages between the controllers. With multiple simulated systems being 
> created, the extra instantiations inflated those counters and tripped 
> assertions because the code saw more controller registrations than any one 
> system should have.
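>     
>     (To make the failure mode concrete, here is a minimal, hypothetical 
> sketch of the pattern -- made-up names, not the actual Ruby code:)
>     
>         #include <cassert>
>         #include <vector>
> 
>         // Hypothetical illustration: a process-wide static counter hands out
>         // controller IDs.
>         struct Controller {
>             static int s_num_controllers;  // shared by every system in the process
>             int m_version;
>             Controller() : m_version(s_num_controllers++) {}
>         };
>         int Controller::s_num_controllers = 0;
> 
>         int main()
>         {
>             const int controllers_per_system = 4;
>             std::vector<Controller> system_a(controllers_per_system);
>             // A second, independent system keeps incrementing the same counter,
>             // so its controllers are numbered 4..7 instead of 0..3 and any check
>             // that expects per-system numbering trips.
>             std::vector<Controller> system_b(controllers_per_system);
>             assert(system_b.front().m_version < controllers_per_system);  // fails
>             return 0;
>         }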
>     
>     __Block size:__ I agree with you that the block size should be the 
> same within a system; the alternative does not make sense to me. (I'm going 
> to subject you to a bit of Ruby background to try to explain why the block 
> size changes look the way that they do.) There are System objects and 
> RubySystem objects and there is a 1:1 correlation between the two. 
> Essentially, they both serve as an aggregation point for all system-wide 
> attributes. The only thing that's special about the RubySystem object is that 
> it contains Ruby-specific parameters. So you might ask, "Why not place the 
> block size (or other attributes) in RubySystem and access it (them) 
> indirectly through that object?" I tried that with a set of patches last 
> year; it was the most obvious initial solution. The patches were not well 
> received -- partly because of the code organization/quality, partly because 
> of Ruby-specific eccentricities, and partly because of the following 
> conundrum: the block size is needed by _everything_. I would need to pass it 
> through deeply nested parameter lists into controller objects that are 
> created independently in the Python configuration scripts __and__ into 
> dynamically created entry objects, which exist independently and are owned by 
> the controller objects, __and__ into messages. Should all of those objects be 
> exposed to a centralized object like RubySystem?
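>     
>     (Roughly, the choice for an entry looks like this -- a simplified sketch 
> with made-up names, not the real classes:)
>     
>         #include <cstdint>
>         #include <vector>
> 
>         // Stand-in types for the sketch.
>         using DataBlock = std::vector<std::uint8_t>;
>         struct RubySystemLike { int block_size_bytes; };
> 
>         // Option A: every entry carries a handle back to its system and sizes
>         // its data block through an extra dereference.
>         struct EntryWithHandle {
>             RubySystemLike *m_system;
>             DataBlock m_data;
>             explicit EntryWithHandle(RubySystemLike *s)
>                 : m_system(s), m_data(s->block_size_bytes) {}
>         };
> 
>         // Option B (the approach in these patches): the block size is passed
>         // in directly, so entries, TBEs, and messages never see RubySystem.
>         struct EntryWithSize {
>             DataBlock m_data;
>             explicit EntryWithSize(int block_size_bytes)
>                 : m_data(block_size_bytes) {}
>         };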
>     
>     __Why is it necessary to pass the block size around like this? I still 
> don't get it. Type some more!:__ It is necessary because the controllers call 
> functions that rely on knowing the block size. (For instance, the 
> makeLineAddress function needs to know the block size to figure out the base 
> cache line address; makeLineAddress is used _everywhere_.) Each controller 
> needs either a RubySystem handle to access it indirectly or a block size 
> provided directly as a parameter. You might say, "Well, the indirect access 
> still doesn't seem like a big deal with the controllers!" Correct. RubySystem 
> objects are accessed indirectly through several "top level" objects already; 
> that's the point. The problem is that passing the RubySystem pointer around 
> to the other objects that I mentioned is _nasty_ (and Nilay claims 
> inefficient due to the extra dereferences on so many of the accesses). The 
> entries are an incredible pain to deal with; they are templated types that 
> may or may not have DataBlks or WriteMasks. (The DataBlk class is essentially 
> a vector that holds data inside messages, TBEs (MSHRs), caches, and other 
> assorted _fun_ objects. WriteMasks were introduced with the recent AMD GPU 
> patches as a way of copying only portions of a DataBlk. Both rely on the 
> block size to size themselves correctly. Previously, they both just used the 
> global in their constructors so that they were always sized appropriately; 
> the global went away, so... _fun_.) Not only are the entries 
> templated types, but it was previously not possible to pass the data block in 
> the constructor, so there's a patch in this list that makes that possible. 
> Otherwise, we'd have to create the DataBlk or WriteMask as an empty vector 
> and then subsequently initialize it after the entry was created... which is 
> what we do with messages... _fun_! Messages do not have a constructor 
> available to them directly within the SLICC language. They are created as 
> part of the "enqueue" keyword and their fields are initialized subsequently. 
> (It might be possible to pass in the data block implicitly by modifying 
> "enqueue" but I didn't bother with it.) __So there, that's why.__
>     
>     __Reality__: Currently, we can run two or more simulated systems in the 
> same process with almost identical configurations; pretty much everything 
> stays the same except the workloads/kernels which are provided to each 
> system. _For now, this is fine for us here at AMD, but I can imagine that 
> someone else will want heterogeneous systems._ One problem is writing a 
> configuration script which allows us to specify two completely different 
> systems in a clean way. Right now, we go through the process of making one 
> system and then duplicating it. Another problem is how the PROTOCOL is 
> specified at build time; I have not put much thought into solving this yet, 
> but it seems like a hard problem. Maybe we can make a meta-protocol that 
> encompasses all of the existing ones? Seems crazy. Finally, I don't think 
> that I have completely solved the problem of statics/globals. Just the other 
> day I found that the number of virtual networks is still a global definition 
> somewhere in the code. If the PROTOCOL issue gets solved, it will need to be 
> changed as well. There are also probably other things that I am overlooking.
> 
> Andreas Hansson wrote:
>     Ok. Could you perhaps add a _short_ note in the patch summary explaining 
> that this is to allow multiple systems with different cache line sizes?
> 
> Jason Lowe-Power wrote:
>     If the only benefit of this set of patches is to allow multiple systems 
> with different block sizes in the same gem5 process, then I don't think they 
> are worth the pain they will cause to others. As it is, anyone with a SLICC 
> protocol will have a significant headache updating it for these patches. 
> Additionally, they ask the programmer to write more easy-to-mess-up code 
> (e.g., every out_msg must call DataBlk.alloc(...)). 
>     
>     I agree that static variables are bad... but I don't see how getting rid 
> of them is worth this headache.
>     
>     On another note: How does the classic memory system deal with this? Maybe 
> you could look there to see how the block size is parameterized. However, I 
> have a feeling that a lot of this comes down to Ruby carrying so much baggage 
> that even a seemingly simple change like removing static variables causes a 
> ripple effect.

Jason, I completely understand the reluctance to merge this change.  
However, we all agree this is an improvement.  The static variables have been a 
major issue for a while.  It will be great when we robustly support multi-node 
Ruby simulations within a single gem5 process.  A lot of our research at AMD 
depends on it (our batch system only supports single process simulations) and I 
suspect other researchers will benefit as well.

As Brandon points out, this is the second time he's implemented this change.  
Nilay did not like the first implementation and we all agreed that Nilay's 
proposal was better.  Thus Brandon spent months implementing and debugging this 
new approach.  He's done all the work on the public protocols (and AMD's 
internal ones).  It is not that hard to follow his example.


- Brad


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://reviews.gem5.org/r/3428/#review8153
-----------------------------------------------------------


On April 4, 2016, 11:42 p.m., Brandon Potter wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://reviews.gem5.org/r/3428/
> -----------------------------------------------------------
> 
> (Updated April 4, 2016, 11:42 p.m.)
> 
> 
> Review request for Default.
> 
> 
> Repository: gem5
> 
> 
> Description
> -------
> 
> Changeset 11430:ad5965016b8e
> ---------------------------
> ruby: pass in block size to ENTRY objects with block size
> 
> 
> Diffs
> -----
> 
>   src/mem/protocol/MOESI_CMP_token-L2cache.sm 
> cfad34a15729e1d5e096245f5a80ded6e2c379ca 
>   src/mem/protocol/MOESI_CMP_token-dir.sm 
> cfad34a15729e1d5e096245f5a80ded6e2c379ca 
>   src/mem/protocol/MOESI_hammer-cache.sm 
> cfad34a15729e1d5e096245f5a80ded6e2c379ca 
>   src/mem/protocol/MOESI_hammer-dir.sm 
> cfad34a15729e1d5e096245f5a80ded6e2c379ca 
>   src/mem/ruby/structures/TBETable.hh 
> cfad34a15729e1d5e096245f5a80ded6e2c379ca 
>   src/mem/slicc/symbols/Type.py cfad34a15729e1d5e096245f5a80ded6e2c379ca 
>   src/mem/protocol/MOESI_AMD_Base-L3cache.sm 
> cfad34a15729e1d5e096245f5a80ded6e2c379ca 
>   src/mem/protocol/MOESI_AMD_Base-Region-CorePair.sm 
> cfad34a15729e1d5e096245f5a80ded6e2c379ca 
>   src/mem/protocol/MOESI_AMD_Base-Region-dir.sm 
> cfad34a15729e1d5e096245f5a80ded6e2c379ca 
>   src/mem/protocol/MOESI_AMD_Base-RegionBuffer.sm 
> cfad34a15729e1d5e096245f5a80ded6e2c379ca 
>   src/mem/protocol/MOESI_AMD_Base-RegionDir.sm 
> cfad34a15729e1d5e096245f5a80ded6e2c379ca 
>   src/mem/protocol/MOESI_AMD_Base-dir.sm 
> cfad34a15729e1d5e096245f5a80ded6e2c379ca 
>   src/mem/protocol/MOESI_AMD_Base-probeFilter.sm 
> cfad34a15729e1d5e096245f5a80ded6e2c379ca 
>   src/mem/protocol/MOESI_CMP_directory-L1cache.sm 
> cfad34a15729e1d5e096245f5a80ded6e2c379ca 
>   src/mem/protocol/MOESI_CMP_directory-L2cache.sm 
> cfad34a15729e1d5e096245f5a80ded6e2c379ca 
>   src/mem/protocol/MOESI_CMP_directory-dir.sm 
> cfad34a15729e1d5e096245f5a80ded6e2c379ca 
>   src/mem/protocol/MOESI_CMP_directory-dma.sm 
> cfad34a15729e1d5e096245f5a80ded6e2c379ca 
>   src/mem/protocol/MOESI_CMP_token-L1cache.sm 
> cfad34a15729e1d5e096245f5a80ded6e2c379ca 
>   src/mem/protocol/GPU_VIPER-TCP.sm cfad34a15729e1d5e096245f5a80ded6e2c379ca 
>   src/mem/protocol/GPU_VIPER_Region-TCC.sm 
> cfad34a15729e1d5e096245f5a80ded6e2c379ca 
>   src/mem/protocol/MESI_Three_Level-L0cache.sm 
> cfad34a15729e1d5e096245f5a80ded6e2c379ca 
>   src/mem/protocol/MESI_Three_Level-L1cache.sm 
> cfad34a15729e1d5e096245f5a80ded6e2c379ca 
>   src/mem/protocol/MESI_Two_Level-L1cache.sm 
> cfad34a15729e1d5e096245f5a80ded6e2c379ca 
>   src/mem/protocol/MESI_Two_Level-L2cache.sm 
> cfad34a15729e1d5e096245f5a80ded6e2c379ca 
>   src/mem/protocol/MESI_Two_Level-dir.sm 
> cfad34a15729e1d5e096245f5a80ded6e2c379ca 
>   src/mem/protocol/MI_example-cache.sm 
> cfad34a15729e1d5e096245f5a80ded6e2c379ca 
>   src/mem/protocol/MI_example-dir.sm cfad34a15729e1d5e096245f5a80ded6e2c379ca 
>   src/mem/protocol/MOESI_AMD_Base-CorePair.sm 
> cfad34a15729e1d5e096245f5a80ded6e2c379ca 
>   src/mem/protocol/GPU_RfO-SQC.sm cfad34a15729e1d5e096245f5a80ded6e2c379ca 
>   src/mem/protocol/GPU_RfO-TCC.sm cfad34a15729e1d5e096245f5a80ded6e2c379ca 
>   src/mem/protocol/GPU_RfO-TCCdir.sm cfad34a15729e1d5e096245f5a80ded6e2c379ca 
>   src/mem/protocol/GPU_RfO-TCP.sm cfad34a15729e1d5e096245f5a80ded6e2c379ca 
>   src/mem/protocol/GPU_VIPER-SQC.sm cfad34a15729e1d5e096245f5a80ded6e2c379ca 
>   src/mem/protocol/GPU_VIPER-TCC.sm cfad34a15729e1d5e096245f5a80ded6e2c379ca 
> 
> Diff: http://reviews.gem5.org/r/3428/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Brandon Potter
> 
>

_______________________________________________
gem5-dev mailing list
gem5-dev@gem5.org
http://m5sim.org/mailman/listinfo/gem5-dev
