Hi Andreas,
  Also, RE: your prompt about deprecating the RubyMemoryControl (here:
http://reviews.gem5.org/r/3116/), I have some questions that relate to our
discussion in this past thread:

  I recall your claim that DRAMCtrls should be able to Pareto dominate the
RubyMemoryControl, but I haven't found that to be the case in my tests. GPU
applications frequently operate near peak achievable bandwidth, which
hovers in the range of 80-81% for NVIDIA Fermi hardware. I'm generally able
to get 81-83% of theoretical peak with the RubyMemoryControl configured
like GDDR5. However, for some reason, I've only been able to get 72-74% out
of DRAMCtrls (I tested all the available configs without much luck). Do you
have ideas for what I might try?
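
To put those percentages in absolute terms, here is a minimal sketch. The 384-bit bus width and the 4 Gbps/pin data rate are assumptions chosen for illustration (the rate is the target discussed in the quoted thread below), and the function names are hypothetical, not taken from gem5 or gem5-gpu.

```python
def peak_bandwidth_gbs(data_rate_gbps_per_pin, bus_width_bits):
    """Theoretical peak in GB/s: per-pin data rate times number of data pins, in bytes."""
    return data_rate_gbps_per_pin * bus_width_bits / 8.0


def achieved_bandwidth_gbs(peak_gbs, utilization):
    """Bandwidth actually delivered at a given fraction of the theoretical peak."""
    return peak_gbs * utilization


# Assumed example system: 4 Gbps/pin over a 384-bit interface.
peak = peak_bandwidth_gbs(4.0, 384)

# Utilization ranges quoted in this message.
for name, low, high in [("Fermi hardware", 0.80, 0.81),
                        ("RubyMemoryControl", 0.81, 0.83),
                        ("DRAMCtrl", 0.72, 0.74)]:
    print(name,
          round(achieved_bandwidth_gbs(peak, low), 1),
          round(achieved_bandwidth_gbs(peak, high), 1))
```

The gap between the two controller models is then easy to express as GB/s for whatever bus width and data rate the modeled card actually uses.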

  Thanks!
  Joel


On Tue, Oct 14, 2014 at 3:14 PM, Joel Hestness <[email protected]> wrote:

> Hi Andreas,
>
> Some brief clarifications before addressing your questions below: I've
> validated most of our gem5-gpu memory hierarchy modeling against NVIDIA
> Fermi hardware (GTX580 and Tesla C2070) using some reverse engineering.
> While I've also tested newer hardware, it will be easier to validate this
> gem5 change if we aim to model something close to these same Fermi
> baselines. Also, the RubyMemoryController doesn't model things like
> separate data, command, and core frequency, or an open-page policy, so I've
> had to do some digging to translate the RubyMemoryController parameters
> back to actual parameters - more on this below.
>
>
>> I am not sure I grok the latency and queue argument still. Adding a larger
>> response queue does not increase the latency unless there is also a bunch
>> of transactions queued up. Am I missing something?
>>
>
> I probably should have been referring to these as "delay queues" rather
> than buffers. They do not model actual hardware buffers, but rather they
> are meant to model the DRAM controller pipeline (and possibly interconnect)
> latencies. GPGPU-Sim often decouples functional simulation from timing
> simulation, and this is one of those cases. Unlike gem5's event-driven
> simulation which allows scheduling when an access should complete,
> GPGPU-Sim puts accesses in these delay queues, which are stepped each DRAM
> controller cycle to move the accesses through at predictable latencies.
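
A minimal sketch of the delay-queue idea described above, assuming a single fixed latency per queue. The class and method names are hypothetical and this is not GPGPU-Sim's actual code; it only illustrates stepping a queue once per controller cycle so that each access emerges after a predictable number of cycles.

```python
from collections import deque


class DelayQueue:
    """Fixed-latency pipeline: an item pushed at cycle t is released at cycle t + latency."""

    def __init__(self, latency_cycles):
        self.latency = latency_cycles
        self.cycle = 0
        self.entries = deque()  # (ready_cycle, item) pairs, kept in push order

    def push(self, item):
        # Record when this access is allowed to leave the queue.
        self.entries.append((self.cycle + self.latency, item))

    def step(self):
        """Advance one controller cycle and return the items whose delay has elapsed."""
        self.cycle += 1
        released = []
        while self.entries and self.entries[0][0] <= self.cycle:
            released.append(self.entries.popleft()[1])
        return released


q = DelayQueue(latency_cycles=3)
q.push("read A")
for _ in range(3):
    print(q.cycle + 1, q.step())
```

In an event-driven simulator the same effect would be had by scheduling a completion event `latency` cycles in the future, which is the contrast drawn in the paragraph above.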
>
>
>> We can easily set the static pipeline latency to a high value, but 100 ns
>> sounds much too large. What is that value based on?
>>
>
> This was based on my own empirical testing, and what we've used for
> validating gem5-gpu in the past. However, I appreciate the prompt for
> review, since I had not thought through how the DRAMCtrl models this
> differently than the RubyMemoryController:
>
> First, the RubyMemoryController models a close-page policy with a static
> "bank access latency", which is meant to approximate RAS + CAS + bank
> access. However, given that row-buffers are large for 8n prefetch, GPU DRAM
> controllers probably use an open-page policy, so an average random access
> (what I used for validating) would be an open-page row buffer miss and
> probably include precharge latency. For this reason, I had pegged
> RubyMemoryController zero-load latency between 88 and 104ns depending on
> the system being modeled. A row-buffer hit would bring that zero-load
> latency down to maybe 20-36ns.
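
As a rough illustration of why a row-buffer hit is so much cheaper than a miss under an open-page policy, the sketch below adds up DRAM core timing parameters for the three usual cases. The values RP=12, RCD=12 and CL=12 are taken from the GPGPU-Sim timing string quoted further down this thread and are in DRAM command cycles. The function name is hypothetical, and the sketch deliberately ignores burst transfer time and controller or interconnect pipelining, so it is not expected to reproduce the nanosecond figures above.

```python
def dram_core_latency_cycles(t_rp, t_rcd, t_cl):
    """Core access latency in command cycles for the three open-page cases."""
    return {
        # Requested row is already open: only the column access is needed.
        "row_hit": t_cl,
        # Bank is precharged (no row open): activate, then column access.
        "row_closed": t_rcd + t_cl,
        # A different row is open: precharge, activate, then column access.
        "row_conflict": t_rp + t_rcd + t_cl,
    }


latency = dram_core_latency_cycles(t_rp=12, t_rcd=12, t_cl=12)
for case, cycles in latency.items():
    print(case, cycles)
```

Any remaining zero-load latency would then have to come from the fixed pipeline stages around the DRAM core.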
>
> Second, it doesn't address your prompt directly, but it is worth noting
> that most GPU memory hierarchies aim for very predictable latencies, which
> often means seemingly excessive pipelining. This is to help support
> very-near-peak bandwidths where necessary and makes GPU-wide scheduling
> more predictable. This is likely a second-order reason why GPU DRAM
> controllers have higher than expected zero-load latency.
>
> I'm not sure I can get any more precise with the data I have, but maybe
> that is enough to work with?
>
>
>> Do you have details on the addressing scheme you mention?
>>
>
> The GPGPU-Sim config contains the following for the DRAM pin identifiers
> (R = row address, C = column address, B = bank address, 0 = data), and the
> field sizes are echoed on page 15, table 6 of the Hynix data sheet:
>
> -gpgpu_mem_addr_mapping dramid@8
> ;00000000.00000000.00000000.00000000.0000RRRR.RRRRRRRR.BBBCCCCB.CCSSSSSS
>
> I'm not familiar with the different (sub)cycle interpretations of control
> and address signals on particular lines though. Are the pin IDs sufficient,
> or are you looking for something more specific here?
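
To make the mask above easier to read, here is a minimal sketch that gathers the address bits tagged with each letter into one value per field. It treats every non-zero letter, including S, as an opaque field and does not model the `dramid@8` channel selection; both of those are simplifying assumptions, and the function names are hypothetical.

```python
# Mask copied from the GPGPU-Sim config line above (dots are only separators).
MASK = "00000000.00000000.00000000.00000000.0000RRRR.RRRRRRRR.BBBCCCCB.CCSSSSSS"


def decode_address(addr, mask=MASK):
    """Collect the bits tagged with each letter, most significant bit first."""
    bits = mask.replace(".", "")  # character 0 corresponds to the top address bit
    fields = {}
    for i, tag in enumerate(bits):
        if tag == "0":
            continue  # untagged bit, not part of any field
        bit = (addr >> (len(bits) - 1 - i)) & 1
        fields[tag] = (fields.get(tag, 0) << 1) | bit
    return fields


def field_widths(mask=MASK):
    """Number of address bits assigned to each tagged field."""
    bits = mask.replace(".", "")
    return {tag: bits.count(tag) for tag in set(bits) if tag != "0"}


print(field_widths())
print(decode_address(0x0012A340))
```

Counting the tags gives 4 bank bits, which is consistent with `nbk=16` in the timing options quoted later in the thread.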
>
>
>> In GDDR5, the command bus is DDR and the data bus QDR. The part you refer
>> to is either running at 5.5 Gbps or 6 Gbps according to the data sheet, so
>> I do find the 4 Gbps surprising. Are you sure?
>>
>
> Yes, to start with I think we should use the 4 Gbps target: We've
> validated against the GeForce GTX480 and 580, which use low-end data bus
> clocks between 3.2 and 4 Gbps/pin (note: in practice, discrete GPUs take
> more liberties to set frequencies as desired, since their tighter coupling
> with DRAMs offers more freedom than the clocking agreements required
> between CPU chips and their memories). More recent cards use the higher
> frequencies, but I have limited testing with them.
>
>
>> The burst length I am talking about is the DRAM burst length, which for
>> GDDR5 is 8 beats (so with a x32 mode that would be 32 bytes per burst). I
>> do not see this specified anywhere in the line you sent, and therefore I
>> am curious if it is omitted completely (DRAMSim2 for example makes
>> dangerous assumptions here).
>>
>
> Yep, I follow but wasn't sure if/how GPU vendors might play with this for
> streaming/interleaving purposes. On closer inspection of the Hynix doc, it
> says burst length is "8 only" on page 5. I also did a spot check of a few,
> more recent GDDR5 data sheets, and they show the same thing.
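
The burst arithmetic above can be checked with a short sketch. The burst length of 8 and the x32 width come from this exchange; the 4 Gbps/pin rate is the target discussed earlier in the thread, and the function names are hypothetical.

```python
def burst_bytes(burst_length_beats, device_width_bits):
    """Bytes moved by one DRAM burst: one beat transfers device_width_bits."""
    return burst_length_beats * device_width_bits // 8


def burst_time_ns(burst_length_beats, data_rate_gbps_per_pin):
    """Time the data bus is occupied by one burst (1 Gbps per pin = 1 beat per ns)."""
    return burst_length_beats / data_rate_gbps_per_pin


print(burst_bytes(8, 32))     # x32 mode
print(burst_bytes(8, 16))     # x16 mode, for comparison
print(burst_time_ns(8, 4.0))  # at the 4 Gbps/pin target
```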
>
>
>   Joel
>
>
>
>
>
>>
>> On 14/10/2014 00:22, "Joel Hestness via gem5-dev" <[email protected]>
>> wrote:
>>
>> >Hi Andreas,
>> >
>> >
>> >> Thanks. I really do not understand the return queue argument. Why on earth
>> >> would you need such a large return queue? Surely the agent making the
>> >> requests (the GPU in this case) should have allocated space for the
>> >> response, no?
>> >>
>> >
>> >Good question. I believe that buffer is just to get the appropriate overall
>> >minimum latency for memory accesses (i.e. GPGPU-Sim does some level of
>> >functional controller modeling, and the buffer is just for timing). If
>> >there is a way to add arbitrary latency to the DRAMCtrl, it would be nice to
>> >aim for a no-load latency of roughly 100ns (from when the controller receives
>> >the access to when a response is returned to the Ruby network). Also, if I
>> >have a chance to test out Nilay's patch, I could tune this setting.
>> >
>> >> Concerning the configuration, what is the assumed clock speed (tCK), and
>> >> is it operated as a x16 or x32 part? Is the burst length implicit in your
>> >> configuration (or is 8 the default)?
>> >>
>> >
>> >The memory modeled in GPGPU-Sim is x32 based on the addressing scheme.
>> >That's where I'd start the mode.
>> >
>> >In GDDR5, the core frequency is 1/4 of command frequency (and 1/2 in
>> >GDDR3?). A common channel frequency is 4GHz, resulting in effective channel
>> >bandwidth of 32GT/s. So, a starting baseline of tCK = 1ns should be
>> >sufficient. Does that sound right?
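
One way to read the tCK = 1ns baseline is as the command clock period implied by the per-pin data rate. The sketch below assumes the data bus runs at 4x the command clock, as `-dram_data_command_freq_ratio 4` in the GPGPU-Sim config quoted below suggests, and a 4 Gbps/pin data rate; the function name is hypothetical.

```python
def command_clock_period_ns(data_rate_gbps_per_pin, data_command_ratio=4):
    """Command clock period (tCK) implied by a per-pin data rate.

    Assumes the data rate is data_command_ratio times the command clock.
    """
    command_clock_ghz = data_rate_gbps_per_pin / data_command_ratio
    return 1.0 / command_clock_ghz


print(command_clock_period_ns(4.0))  # 4 Gbps/pin target
print(command_clock_period_ns(3.2))  # a lower data rate, for comparison
```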
>> >
>> >I believe burst length is typically 8 and can be much longer. In practice,
>> >this depends highly on address hashing/interleaving across many controllers
>> >in various different GPUs, so 8 should be a sufficient baseline.
>> >
>> >
>> >  Joel
>> >
>> >
>> >
>> >On 10/13/14, 10:28 PM, "Joel Hestness via gem5-dev" <[email protected]>
>> >> wrote:
>> >>
>> >> >Hi Andreas,
>> >> >  Sure thing. We try to closely replicate the parameters used in GPGPU-Sim
>> >> >v3.2.2, which are specified in the Hynix datasheet here:
>> >> >http://www.hynix.com/datasheet/pdf/graphics/H5GQ1H24AFR(Rev1.0).pdf.
>> >> >
>> >> >  Here are relevant excerpts from the GPGPU-Sim GTX480 config file
>> >> >(attached):
>> >> >
>> >> ># The DRAM return queue and the scheduler queue together should provide buffer
>> >> ># to sustain the memory level parallelism to tolerate DRAM latency
>> >> ># To allow 100% DRAM utility, there should at least be enough buffer to sustain
>> >> ># the minimum DRAM latency (100 core cycles).  I.e.
>> >> >#   Total buffer space required = 100 x 924MHz / 700MHz = 132
>> >> >-gpgpu_frfcfs_dram_sched_queue_size 16
>> >> >-gpgpu_dram_return_queue_size 116
>> >> >
>> >> >-dram_data_command_freq_ratio 4  # GDDR5 is QDR
>> >> >-gpgpu_dram_timing_opt "nbk=16:CCD=2:RRD=6:RCD=12:RAS=28:RP=12:RC=40:CL=12:WL=4:CDLR=5:WR=12:nbkgrp=4:CCDL=3:RTPL=2"
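
The excerpt above can be sanity-checked with a small sketch that parses the timing option string and reproduces the buffer-size arithmetic from the config comment. The helper names are hypothetical; the numbers are copied from the excerpt.

```python
# Timing string copied from the -gpgpu_dram_timing_opt line above.
TIMING_OPT = ("nbk=16:CCD=2:RRD=6:RCD=12:RAS=28:RP=12:RC=40:"
              "CL=12:WL=4:CDLR=5:WR=12:nbkgrp=4:CCDL=3:RTPL=2")


def parse_timing_opt(opt):
    """Turn 'key=value:key=value' into a dict of integers."""
    return {key: int(value)
            for key, value in (field.split("=") for field in opt.split(":"))}


def required_buffer_entries(latency_core_cycles, dram_mhz, core_mhz):
    """Buffer entries needed to cover the minimum DRAM latency, per the config comment."""
    return round(latency_core_cycles * dram_mhz / core_mhz)


timing = parse_timing_opt(TIMING_OPT)
needed = required_buffer_entries(100, dram_mhz=924, core_mhz=700)
configured = 16 + 116  # scheduler queue size + return queue size

print(timing["nbk"], timing["RCD"], timing["CL"], timing["RP"])
print(needed, configured)
```

The two queue sizes in the config add up to the total the comment asks for, and the parsed values satisfy RC = RAS + RP.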
>> >> >
>> >> >  If anything needs clarification, I'm happy to help sort it out. Just let
>> >> >me know.
>> >> >
>> >> >  Thanks!
>> >> >  Joel
>> >> >
>> >> >
>> >> >
>> >> >On Mon, Oct 13, 2014 at 4:11 PM, Andreas Hansson via gem5-dev <
>> >> >[email protected]> wrote:
>> >> >
>> >> >> Hi Joel,
>> >> >>
>> >> >> I am happy to spend the 5 minutes creating a GDDR5 configuration. Do you
>> >> >> have any specific data sheet you would like to capture?
>> >> >>
>> >> >> Andreas
>> >> >>
>> >> >> On 10/13/14, 10:09 PM, "Joel Hestness via gem5-dev" <[email protected]>
>> >> >> wrote:
>> >> >>
>> >> >> >Hi guys,
>> >> >> >
>> >> >> >
>> >> >> >> Thanks for the clarification. I believe the RubyMemoryController is
>> >> >> >> completely Pareto dominated by the vanilla DRAMCtrl module, but if there
>> >> >> >> is any specific feature/setting missing I would be keen to know.
>> >> >> >>
>> >> >> >> If possible I would like to make sure we use the same controller as a
>> >> >> >> default for all timing simulations (even if the other one would be
>> >> >> >> maintained as a fallback).
>> >> >> >
>> >> >> >
>> >> >> >I'd like to second the desire to have a simple replacement baseline that
>> >> >> >performs at least as well as the RubyMemoryController in most/all cases:
>> >> >> >gem5-gpu now has more than 100 users, and as far as I know, we are all
>> >> >> >using Ruby and thus the RubyMemoryController. The RubyMemoryController is
>> >> >> >pretty simple to configure similarly to DDR3 or GDDR5 and to interpret
>> >> >> >results. It performs surprisingly close to some GPU hardware. If this
>> >> >> >controller goes away, I (and I'm sure other gem5-gpu users) would prefer to
>> >> >> >have something that is known to perform as well and is also easy to
>> >> >> >configure.
>> >> >> >
>> >> >> >I think we (the gem5-gpu crew) are fine with the RubyMemoryController going
>> >> >> >away eventually. However, given that there isn't currently a GDDR-like
>> >> >> >DRAMCtrl configuration in gem5, I'd like to second Nilay and Brad that we
>> >> >> >offer users sufficient time to prepare for RubyMemoryController removal. We
>> >> >> >will need to adapt our heterogeneous Ruby coherence protocols, and other
>> >> >> >users have their own protocols they'd need to adapt as well.
>> >> >> >
>> >> >> >
>> >> >> >  Joel
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> >> On 10/13/14, 9:01 PM, "Nilay Vaish via gem5-dev" <[email protected]>
>> >> >> >> wrote:
>> >> >> >>
>> >> >> >> >On Mon, 13 Oct 2014, Andreas Hansson via gem5-dev wrote:
>> >> >> >> >
>> >> >> >> >> Hi all,
>> >> >> >> >>
>> >> >> >> >> With Nilay's recent improvements to Ruby I would like to understand if
>> >> >> >> >> there is any point in still having the RubyMemoryControl, or if we
>> >> >> >> >> should just clean things up a bit and remove it. I would think the best
>> >> >> >> >> way forward is to clean up the integration of Ruby and classic and
>> >> >> >> >> ensure that there is no duplicated functionality beyond what is strictly
>> >> >> >> >> necessary.
>> >> >> >> >>
>> >> >> >> >> Nilay, do you think this would make sense? Is there anyone else with any
>> >> >> >> >> opinions in this matter?
>> >> >> >> >>
>> >> >> >> >
>> >> >> >> >
>> >> >> >> >I was in favor of dropping RubyMemoryControl.  But I had some discussion
>> >> >> >> >with Brad Beckmann from AMD.  Since AMD has some infrastructure in place
>> >> >> >> >already, they would like to retain RubyMemoryControl for the time being.
>> >> >> >> >
>> >> >> >> >I suggest that we retain the memory controller code in ruby for another
>> >> >> >> >six months or so, and then we will drop it.  In the mean time, we
>> >> >> >> >will update the interface so that ruby protocols can use classic memory
>> >> >> >> >controller.  The code for this is already on the reviewboard.  Over this
>> >> >> >> >six month period, I hope, most users would have switched to using classic
>> >> >> >> >controller.
>> >> >> >> >
>> >> >> >> >Thanks
>> >> >> >> >Nilay
>> >> >> >> >_______________________________________________
>> >> >> >> >gem5-dev mailing list
>> >> >> >> >[email protected]
>> >> >> >> >http://m5sim.org/mailman/listinfo/gem5-dev
>> >> >> >>
>> >> >> >>
>> >> >> >> -- IMPORTANT NOTICE: The contents of this email and any attachments are
>> >> >> >> confidential and may also be privileged. If you are not the intended
>> >> >> >> recipient, please notify the sender immediately and do not disclose the
>> >> >> >> contents to any other person, use it for any purpose, or store or copy the
>> >> >> >> information in any medium.  Thank you.
>> >> >> >>
>> >> >> >> ARM Limited, Registered office 110 Fulbourn Road, Cambridge CB1 9NJ,
>> >> >> >> Registered in England & Wales, Company No:  2557590
>> >> >> >> ARM Holdings plc, Registered office 110 Fulbourn Road, Cambridge CB1 9NJ,
>> >> >> >> Registered in England & Wales, Company No:  2548782
>> >> >> >>
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> >--
>> >> >> >  Joel Hestness
>> >> >> >  PhD Student, Computer Architecture
>> >> >> >  Dept. of Computer Science, University of Wisconsin - Madison
>> >> >> >  http://pages.cs.wisc.edu/~hestness/
>> >> >> >
>> >> >>
>> >> >>
>> >> >>
>> >> >
>> >> >
>> >> >
>> >> >
>> >>
>> >>
>> >>
>> >
>> >
>> >
>>
>>
>>
>
>



-- 
  Joel Hestness
  PhD Candidate, Computer Architecture
  Dept. of Computer Science, University of Wisconsin - Madison
  http://pages.cs.wisc.edu/~hestness/
_______________________________________________
gem5-dev mailing list
[email protected]
http://m5sim.org/mailman/listinfo/gem5-dev
