Hi Andreas,

Also, re: your prompt about deprecating the RubyMemoryControl (here: http://reviews.gem5.org/r/3116/), I have some questions that relate to our discussion in this past thread:
I recall your claim that DRAMCtrls should be able to Pareto dominate the RubyMemoryControl, but I haven't found that to be the case in my tests. GPU applications frequently operate near peak achievable bandwidth, which hovers in the range of 80-81% of theoretical peak for NVIDIA Fermi hardware. I'm generally able to get 81-83% of theoretical peak with the RubyMemoryControl configured like GDDR5. However, I've only been able to get 72-74% out of DRAMCtrls (I tested all the available configs without much luck). Do you have ideas for what I might try?

Thanks!
Joel

On Tue, Oct 14, 2014 at 3:14 PM, Joel Hestness <[email protected]> wrote: > Hi Andreas, > > Some brief clarifications before addressing your questions below: I've > validated most of our gem5-gpu memory hierarchy modeling against NVIDIA > Fermi hardware (GTX580 and Tesla C2070) using some reverse engineering. > While I've also tested newer hardware, it will be easier to validate this > gem5 change if we aim to model something close to these same Fermi > baselines. Also, the RubyMemoryController doesn't model things like > separate data, command, and core frequency, or an open-page policy, so I've > had to do some digging to translate the RubyMemoryController parameters > back to actual parameters - more on this below. > > > I am not sure I grok the latency and queue argument still. Adding a larger >> response queue does not increase the latency unless there is also a bunch >> of transactions queued up. Am I missing something? >> > > I probably should have been referring to these as "delay queues" rather > than buffers. They do not model actual hardware buffers, but rather they > are meant to model the DRAM controller pipeline (and possibly interconnect) > latencies. GPGPU-Sim often decouples functional simulation from timing > simulation, and this is one of those cases. 
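For concreteness, the delay-queue mechanism just described can be sketched as follows. This is purely illustrative Python (the class name and the 3-cycle latency are my own choices, not GPGPU-Sim code):

```python
from collections import deque

class DelayQueue:
    """Fixed-latency delay queue: an access completes a fixed number of
    DRAM controller cycles after it is pushed, modeling pipeline latency
    rather than an actual hardware buffer."""

    def __init__(self, latency_cycles):
        self.latency = latency_cycles
        self.queue = deque()  # entries are (ready_cycle, access)
        self.cycle = 0

    def push(self, access):
        self.queue.append((self.cycle + self.latency, access))

    def step(self):
        """Advance one controller cycle; return accesses whose delay has
        elapsed, in FIFO order."""
        self.cycle += 1
        done = []
        while self.queue and self.queue[0][0] <= self.cycle:
            done.append(self.queue.popleft()[1])
        return done

dq = DelayQueue(latency_cycles=3)
dq.push("read A")
completed = []
for _ in range(5):
    completed += dq.step()  # "read A" emerges on the third step
```

Stepping the queue once per controller cycle is what makes the latency predictable, in contrast to gem5 scheduling a completion event at an arbitrary future tick.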
Unlike gem5's event-driven > simulation which allows scheduling when an access should complete, > GPGPU-Sim puts accesses in these delay queues, which are stepped each DRAM > controller cycle to move the accesses through at predictable latencies. > > > We can easily set the static pipeline latency to a high value, but 100 ns >> sounds much too large. What is that value based on? >> > > This was based on my own empirical testing, and what we've used for > validating gem5-gpu in the past. However, I appreciate the prompt for > review, since I had not thought through how the DRAMCtrl models this > differently than the RubyMemoryController: > > First, the RubyMemoryController models a close-page policy with a static > "bank access latency", which is meant to approximate RAS + CAS + bank > access. However, given that row-buffers are large for 8n prefetch, GPU DRAM > controllers probably use an open-page policy, so an average random access > (what I used for validating) would be an open-page row buffer miss and > probably include precharge latency. For this reason, I had pegged > RubyMemoryController zero-load latency between 88 and 104ns depending on > the system being modeled. A row-buffer hit would bring that zero-load > latency down to maybe 20-36ns. > > Second, it doesn't address your prompt directly, but it is worth noting > that most GPU memory hierarchies aim for very predictable latencies, which > often means seemingly excessive pipelining. This is to help support > very-near-peak bandwidths where necessary and makes GPU-wide scheduling > more predictable. This is likely a second-order reason why GPU DRAM > controllers have higher than expected zero-load latency. > > I'm not sure I can get any more precise with the data I have, but maybe > that is enough to work with? > > > Do you have details on the addressing scheme you mention? 
>> > > The GPGPU-Sim config contains the following for the DRAM pin identifiers > (R = row address, C = column address, B = bank address, 0 = data), and the > field sizes are echoed on page 15, table 6 of the Hynix data sheet: > > -gpgpu_mem_addr_mapping dramid@8 > ;00000000.00000000.00000000.00000000.0000RRRR.RRRRRRRR.BBBCCCCB.CCSSSSSS > > I'm not familiar with the different (sub)cycle interpretations of control > and address signals on particular lines though. Are the pin IDs sufficient, > or are you looking for something more specific here? > > > In GDDR5, the command bus is DDR and the data bus QDR. The part you refer >> to is either running at 5.5 Gbps or 6 Gbps according to the data sheet, so >> I do find the 4 Gbps surprising. Are you sure? >> > > Yes, to start with I think we should use the 4 Gbps target: We've > validated against the GeForce GTX480 and 580, which use low-end data bus > clocks between 3.2 and 4 Gbps/pin (note: in practice, discrete GPUs take > more liberties to set frequencies as desired, since their tighter coupling > with DRAMs offers more freedom than the clocking agreements required > between CPU chips and their memories). More recent cards use the higher > frequencies, but I have limited testing with them. > > > The burst length I am talking about is the DRAM burst length, which for >> GDDR5 is 8 beats (so with a x32 mode that would be 32 bytes per burst. I >> do not see this specified anywhere in the line you sent, and therefore I >> am curious if it is omitted completely (DRAMSim2 for example makes >> dangerous assumptions here). >> > > Yep, I follow but wasn't sure if/how GPU vendors might play with this for > streaming/interleaving purposes. On closer inspection of the Hynix doc, it > says burst length is "8 only" on page 5. I also did a spot check of a few, > more recent GDDR5 data sheets, and they show the same thing. 
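The mapping string and the x32/BL8 figures above reduce to simple arithmetic. A quick sketch (my own illustration, not GPGPU-Sim or gem5 code; the meaning of the S bits is not spelled out in the thread, and the 4 Gbps/pin rate is the low-end figure discussed):

```python
from collections import Counter

# Lower 32 bits of the GPGPU-Sim pin mapping (R = row, C = column,
# B = bank, S = low-order bits as labeled in the config).
mapping = "0000RRRR.RRRRRRRR.BBBCCCCB.CCSSSSSS"
fields = Counter(ch for ch in mapping if ch != ".")
# 12 row bits, 4 bank bits (matching nbk=16 in the timing config),
# 6 column bits, and 6 S bits.

bus_width_bits = 32     # x32 mode
burst_length = 8        # "8 only" per the Hynix data sheet
data_rate_gbps = 4.0    # per pin, GTX480/580-era low end

bytes_per_burst = bus_width_bits * burst_length // 8   # 32 bytes
peak_gb_per_s = data_rate_gbps * bus_width_bits / 8    # 16.0 GB/s/device
```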
> > > Joel > > > > > >> >> On 14/10/2014 00:22, "Joel Hestness via gem5-dev" <[email protected]> >> wrote: >> >> >Hi Andreas, >> > >> > >> >> Thanks. I really do not understand the return queue argument. Why on >> >>earth >> >> would you need such a large return queue? Surely the agent making the >> >> requests (the GPU in this case) should have allocated space for the >> >> response, no? >> >> >> > >> >Good question. I believe that buffer is just to get the appropriate >> >overall >> >minimum latency for memory accesses (i.e. GPGPU-Sim does some level of >> >functional controller modeling, and the buffer is just for timing). If >> >there is a way to add arbitrary latency to the DRAMCtrl, it would be nice to >> >aim for no-load latency of roughly 100ns (from when the controller >> >receives >> >the access to when a response is returned to the Ruby network). Also, if >> I >> >have a chance to test out Nilay's patch, I could tune on this setting. >> > >> >Concerning the configuration, what is the assumed clock speed (tCK), and >> >> is it operated as a x16 or x32 part? Is the burst length implicit in >> >>your >> >> configuration (or is 8 the default)? >> >> >> > >> >The memory modeled in GPGPU-Sim is x32 based on the addressing scheme. >> >That's where I'd start the model. >> > >> >In GDDR5, the core frequency is 1/4 of command frequency (and 1/2 in >> >GDDR3?). A common channel frequency is 4GHz, resulting in effective >> >channel >> >bandwidth of 32GT/s. So, a starting baseline of tCK = 1ns should be >> >sufficient. Does that sound right? >> > >> >I believe burst length is typically 8 and can be much longer. In >> practice, >> >this depends highly on address hashing/interleaving across many >> controllers >> >in various different GPUs, so 8 should be a sufficient baseline. >> > >> > >> > Joel >> > >> > >> > >> >On 10/13/14, 10:28 PM, "Joel Hestness via gem5-dev" <[email protected]> >> >> wrote: >> >> >> >> >Hi Andreas, >> >> > Sure thing. 
We try to closely replicate the parameters used in >> >>GPGPU-Sim >> >> >v3.2.2, which are specified in the Hynix datasheet here: >> >> >http://www.hynix.com/datasheet/pdf/graphics/H5GQ1H24AFR(Rev1.0).pdf. >> >> > >> >> > Here are relevant excerpts from the GPGPU-Sim GTX480 config file >> >> >(attached): >> >> > >> >> ># The DRAM return queue and the scheduler queue together should >> provide >> >> >buffer >> >> ># to sustain the memory level parallelism to tolerate DRAM latency >> >> ># To allow 100% DRAM utility, there should at least be enough buffer >> to >> >> >sustain >> >> ># the minimum DRAM latency (100 core cycles). I.e. >> >> ># Total buffer space required = 100 x 924MHz / 700MHz = 132 >> >> >-gpgpu_frfcfs_dram_sched_queue_size 16 >> >> >-gpgpu_dram_return_queue_size 116 >> >> > >> >> >-dram_data_command_freq_ratio 4 # GDDR5 is QDR >> >> >-gpgpu_dram_timing_opt "nbk=16:CCD=2:RRD=6:RCD=12:RAS=28:RP=12:RC=40: >> >> > >> CL=12:WL=4:CDLR=5:WR=12:nbkgrp=4:CCDL=3:RTPL=2" >> >> > >> >> > If anything needs clarification, I'm happy to help sort it out. Just >> >>let >> >> >me know. >> >> > >> >> > Thanks! >> >> > Joel >> >> > >> >> > >> >> > >> >> >On Mon, Oct 13, 2014 at 4:11 PM, Andreas Hansson via gem5-dev < >> >> >[email protected]> wrote: >> >> > >> >> >> Hi Joel, >> >> >> >> >> >> I am happy to spend the 5 minutes creating a GDDR5 configuration. Do >> >>you >> >> >> have any specific data sheet you would like to capture? >> >> >> >> >> >> Andreas >> >> >> >> >> >> On 10/13/14, 10:09 PM, "Joel Hestness via gem5-dev" >> >><[email protected]> >> >> >> wrote: >> >> >> >> >> >> >Hi guys, >> >> >> > >> >> >> > >> >> >> >> Thanks for the clarification. I believe the RubyMemoryController >> >>is >> >> >> >> completely Pareto dominated by the vanilla DRAMCtrl module, but >> if >> >> >>there >> >> >> >> is any specific feature/setting missing I would be keen to know. 
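The timing string and the queue-sizing comment quoted above can be sanity-checked with a short script. The sketch below assumes the timing fields are command-clock cycles at the tCK = 1 ns baseline discussed earlier in the thread (that interpretation is mine, not stated in the config):

```python
# Parse the GPGPU-Sim timing string (illustrative only, not simulator code).
timing = ("nbk=16:CCD=2:RRD=6:RCD=12:RAS=28:RP=12:RC=40:"
          "CL=12:WL=4:CDLR=5:WR=12:nbkgrp=4:CCDL=3:RTPL=2")
tCK_ns = 1.0
cycles = {k: int(v) for k, v in (f.split("=") for f in timing.split(":"))}

# Array timing for a row-buffer miss: precharge + activate + read latency.
row_miss_ns = (cycles["RP"] + cycles["RCD"] + cycles["CL"]) * tCK_ns  # 36 ns

# Queue sizing from the config comment: 100 core cycles of minimum DRAM
# latency, scaled from the 700 MHz core clock to the 924 MHz DRAM clock.
total_buffer = round(100 * 924 / 700)  # 132 entries
assert total_buffer == 16 + 116        # scheduler queue + return queue
```

The 36 ns of array timing is roughly consistent with the zero-load numbers earlier in the thread once the controller pipeline delays discussed there are added on top.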
>> >> >> >> >> >> >> >> If possible I would like to make sure we use the same controller >> >>as a >> >> >> >> default for all timing simulations (even if the other one would >> be >> >> >> >> maintained as a fallback). >> >> >> > >> >> >> > >> >> >> >I'd like to second the desire to have a simple replacement baseline >> >> >>that >> >> >> >performs at least as well as the RubyMemoryController in most/all >> >> >>cases: >> >> >> >gem5-gpu now has more than 100 users, and as far as I know, we are >> >>all >> >> >> >using Ruby and thus the RubyMemoryController. The >> >>RubyMemoryController >> >> >>is >> >> >> >pretty simple to configure similarly to DDR3 or GDDR5 and to >> >>interpret >> >> >> >results. It performs surprisingly close to some GPU hardware. If >> >>this >> >> >> >controller goes away, I (and I'm sure other gem5-gpu users) would >> >> >>prefer >> >> >> >to >> >> >> >have something that is known to perform as well and is also easy to >> >> >> >configure. >> >> >> > >> >> >> >I think we (the gem5-gpu crew) are fine with the >> >>RubyMemoryController >> >> >> >going >> >> >> >away eventually. However, given that there isn't currently a >> >>GDDR-like >> >> >> >DRAMCtrl configuration in gem5, I'd like to second Nilay and Brad >> >>that >> >> >>we >> >> >> >offer users sufficient time to prepare for RubyMemoryController >> >> >>removal. >> >> >> >We >> >> >> >will need to adapt our heterogeneous Ruby coherence protocols, and >> >> >>other >> >> >> >users have their own protocols they'd need to adapt as well. 
>> >> >> > >> >> >> > >> >> >> > Joel >> >> >> > >> >> >> > >> >> >> > >> >> >> > >> >> >> > >> >> >> >> On 10/13/14, 9:01 PM, "Nilay Vaish via gem5-dev" >> >><[email protected]> >> >> >> >> wrote: >> >> >> >> >> >> >> >> >On Mon, 13 Oct 2014, Andreas Hansson via gem5-dev wrote: >> >> >> >> > >> >> >> >> >> Hi all, >> >> >> >> >> >> >> >> >> >> With Nilay's recent improvements to Ruby I would like to >> >> >>understand >> >> >> >>if >> >> >> >> there is any point in still having the RubyMemoryControl, or >> >>if we >> >> >> >> should just clean things up a bit and remove it. I would think >> >>the >> >> >> >>best >> >> >> >> way forward is to clean up the integration of Ruby and classic >> >>and >> >> >> >> ensure that there is no duplicated functionality beyond what is >> >> >> >> >>strictly >> >> >> >> necessary. >> >> >> >> >> >> >> >> >> >> Nilay, do you think this would make sense? Is there anyone else >> >> >>with >> >> >> >> >>any >> >> >> >> opinions in this matter? >> >> >> >> >> >> >> >> >> > >> >> >> >> > >> >> >> >> >I was in favor of dropping RubyMemoryControl. But I had some >> >> >> >>discussion >> >> >> >> >with Brad Beckmann from AMD. Since AMD has some infrastructure >> >>in >> >> >> >>place >> >> >> >> >already, they would like to retain RubyMemoryControl for the time >> >> >> >>being. >> >> >> >> > >> >> >> >> >I suggest that we retain the memory controller code in ruby for >> >> >>another >> >> >> >> >six months or so, and then we will drop it. In the meantime, >> >>we >> >> >> >> >will update the interface so that ruby protocols can use classic >> >> >>memory >> >> >> >> >controller. The code for this is already on the reviewboard. >> >>Over >> >> >> >>this >> >> >> >> >six month period, I hope, most users would have switched to using >> >> >> >>classic >> >> >> >> >controller. 
>> >> >> >> >
>> >> >> >> >Thanks
>> >> >> >> >Nilay
>> >> >> >> >_______________________________________________
>> >> >> >> >gem5-dev mailing list
>> >> >> >> >[email protected]
>> >> >> >> >http://m5sim.org/mailman/listinfo/gem5-dev
>> >> >> >>
>> >> >> >> -- IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.
>> >> >> >>
>> >> >> >> ARM Limited, Registered office 110 Fulbourn Road, Cambridge CB1 9NJ, Registered in England & Wales, Company No: 2557590
>> >> >> >> ARM Holdings plc, Registered office 110 Fulbourn Road, Cambridge CB1 9NJ, Registered in England & Wales, Company No: 2548782

--
Joel Hestness
PhD Candidate, Computer Architecture
Dept. of Computer Science, University of Wisconsin - Madison
http://pages.cs.wisc.edu/~hestness/
