Hi Joel,

I suspect this is down to configuration differences. The read/write
switching is probably a key part here. By default, the DRAMCtrl sets the
min-write-limit to 16 (to avoid introducing excessive read latency). You
can raise this number, and also increase the read/write queue sizes, to
improve utilisation. In general, I would suggest that someone “correlate”
and tune the GDDR5 (and HBM) configurations against an actual controller
implementation.
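A deliberately simplified Python model of why batching more writes per read/write switch raises utilisation: each change of data-bus direction costs a fixed turnaround penalty, so amortising it over more bursts increases the fraction of time spent moving data. This is a sketch under stated assumptions, not the DRAMCtrl scheduler, and the timing values are hypothetical placeholders. In the gem5 DRAMCtrl of this period, the knobs mentioned above are, to our understanding, `min_writes_per_switch`, `write_buffer_size` and `read_buffer_size` (check `DRAMCtrl.py` in your tree before relying on those names).

```python
"""Toy model of data-bus utilisation under read/write switching.

Assumption: the controller alternates between a batch of reads and a batch
of writes, and each change of bus direction costs a fixed turnaround
penalty. All timing values used below are hypothetical.
"""


def bus_utilisation(reads_per_switch: int, writes_per_switch: int,
                    t_burst_ns: float, t_rd_to_wr_ns: float,
                    t_wr_to_rd_ns: float) -> float:
    """Fraction of time the data bus carries data over one read+write round."""
    if reads_per_switch < 0 or writes_per_switch < 0:
        raise ValueError("burst counts must be non-negative")
    # Time spent actually transferring bursts in one round.
    busy_ns = (reads_per_switch + writes_per_switch) * t_burst_ns
    # One read->write and one write->read turnaround per round.
    idle_ns = t_rd_to_wr_ns + t_wr_to_rd_ns
    total_ns = busy_ns + idle_ns
    return busy_ns / total_ns if total_ns > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical numbers: 2 ns per burst, 10 ns per bus turnaround.
    for batch in (8, 16, 32, 64):
        u = bus_utilisation(batch, batch, 2.0, 10.0, 10.0)
        print(f"{batch:3d} reads + {batch:3d} writes per round -> {u:.1%}")
```

The model leaves out the cost on the other side of the trade-off: reads that arrive during a long write batch wait for it to drain, so larger batches buy bus utilisation with read latency.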

Andreas

On 22/09/2015 14:30, "gem5-dev on behalf of Joel Hestness"
<[email protected] on behalf of [email protected]> wrote:

>Hi Andreas,
>  Also, RE: your prompt about deprecating the RubyMemoryControl (here:
>http://reviews.gem5.org/r/3116/), I have some questions that relate to our
>discussion in this past thread:
>
>  I recall your claim that DRAMCtrls should be able to Pareto dominate the
>RubyMemoryControl, but I haven't found that to be the case in my tests.
>GPU applications frequently operate near peak achievable bandwidth, which
>hovers in the range of 80-81% for NVIDIA Fermi hardware. I'm generally able
>to get 81-83% of theoretical peak with the RubyMemoryControl configured
>like GDDR5. However, for some reason, I've only been able to get 72-74% out
>of DRAMCtrls (I tested all the available configs without much luck). Do you
>have ideas for what I might try?
>
>  Thanks!
>  Joel
>
>
>On Tue, Oct 14, 2014 at 3:14 PM, Joel Hestness <[email protected]>
>wrote:
>
>> Hi Andreas,
>>
>> Some brief clarifications before addressing your questions below: I've
>> validated most of our gem5-gpu memory hierarchy modeling against NVIDIA
>> Fermi hardware (GTX580 and Tesla C2070) using some reverse engineering.
>> While I've also tested newer hardware, it will be easier to validate this
>> gem5 change if we aim to model something close to these same Fermi
>> baselines. Also, the RubyMemoryController doesn't model things like
>> separate data, command, and core frequency, or an open-page policy, so I've
>> had to do some digging to translate the RubyMemoryController parameters
>> back to actual parameters - more on this below.
>>
>>
>>> I am not sure I grok the latency and queue argument still. Adding a larger
>>> response queue does not increase the latency unless there is also a bunch
>>> of transactions queued up. Am I missing something?
>>>
>>
>> I probably should have been referring to these as "delay queues" rather
>> than buffers. They do not model actual hardware buffers, but rather they
>> are meant to model the DRAM controller pipeline (and possibly interconnect)
>> latencies. GPGPU-Sim often decouples functional simulation from timing
>> simulation, and this is one of those cases. Unlike gem5's event-driven
>> simulation which allows scheduling when an access should complete,
>> GPGPU-Sim puts accesses in these delay queues, which are stepped each DRAM
>> controller cycle to move the accesses through at predictable latencies.
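The delay-queue style described here can be sketched in a few lines of Python. This is a hypothetical illustration of the general technique (a fixed-latency pipeline advanced once per controller cycle), not GPGPU-Sim's actual code; the class and method names are made up.

```python
from collections import deque
from typing import Any, Deque, List, Optional, Tuple


class DelayQueue:
    """Hypothetical fixed-latency pipeline, stepped once per controller cycle."""

    def __init__(self, latency_cycles: int, capacity: Optional[int] = None):
        if latency_cycles < 0:
            raise ValueError("latency must be non-negative")
        self.latency = latency_cycles
        self.capacity = capacity  # None means unbounded
        self.cycle = 0
        # Each entry is (cycle at which the access may leave, access).
        self._entries: Deque[Tuple[int, Any]] = deque()

    def push(self, access: Any) -> bool:
        """Enqueue an access; return False (caller must stall) when full."""
        if self.capacity is not None and len(self._entries) >= self.capacity:
            return False
        self._entries.append((self.cycle + self.latency, access))
        return True

    def step(self) -> List[Any]:
        """Advance one cycle and return the accesses whose delay has elapsed."""
        self.cycle += 1
        ready = []
        # All entries share one latency and are appended in arrival order,
        # so the head is always the earliest to become ready.
        while self._entries and self._entries[0][0] <= self.cycle:
            ready.append(self._entries.popleft()[1])
        return ready


if __name__ == "__main__":
    q = DelayQueue(latency_cycles=3)
    q.push("read A")
    for _ in range(3):
        print(q.cycle + 1, q.step())
```

Because every access spends exactly `latency_cycles` in the queue, latency is predictable regardless of load, while `capacity` bounds how many accesses can be in flight, which is why such queues must be sized to cover the latency if bandwidth is not to suffer.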
>>
>>
>>> We can easily set the static pipeline latency to a high value, but 100 ns
>>> sounds much too large. What is that value based on?
>>>
>>
>> This was based on my own empirical testing, and what we've used for
>> validating gem5-gpu in the past. However, I appreciate the prompt for
>> review, since I had not thought through how the DRAMCtrl models this
>> differently than the RubyMemoryController:
>>
>> First, the RubyMemoryController models a close-page policy with a static
>> "bank access latency", which is meant to approximate RAS + CAS + bank
>> access. However, given that row-buffers are large for 8n prefetch, GPU DRAM
>> controllers probably use an open-page policy, so an average random access
>> (what I used for validating) would be an open-page row buffer miss and
>> probably include precharge latency. For this reason, I had pegged
>> RubyMemoryController zero-load latency between 88 and 104ns depending on
>> the system being modeled. A row-buffer hit would bring that zero-load
>> latency down to maybe 20-36ns.
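As a rough sanity check, the DRAM-array portion of the zero-load read latency for each row-buffer outcome can be computed from the GPGPU-Sim timing options quoted further down this thread (RCD=12, RP=12, CL=12), which we assume are expressed in DRAM command-clock cycles. The 924MHz clock used in the usage line is taken from that config's buffer-sizing comment and is also an assumption. The textbook decomposition (hit = CL, closed bank = RCD + CL, conflict = RP + RCD + CL) ignores burst transfer time, controller pipelining and interconnect, so it should come out well below the end-to-end figures above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DramTiming:
    """DRAM timing parameters in command-clock cycles (assumed units)."""
    t_rcd: int  # activate to column command
    t_rp: int   # precharge period
    t_cl: int   # CAS (read) latency


def array_latency_ns(timing: DramTiming, clock_mhz: float, outcome: str) -> float:
    """Zero-load DRAM array latency of a read, excluding burst and pipeline time."""
    cycles = {
        "hit": timing.t_cl,                                    # row already open
        "closed": timing.t_rcd + timing.t_cl,                  # bank precharged
        "conflict": timing.t_rp + timing.t_rcd + timing.t_cl,  # other row open
    }[outcome]
    return cycles * 1000.0 / clock_mhz


if __name__ == "__main__":
    gddr5 = DramTiming(t_rcd=12, t_rp=12, t_cl=12)  # values from the GPGPU-Sim config
    for outcome in ("hit", "closed", "conflict"):
        print(f"{outcome:8s}: {array_latency_ns(gddr5, 924.0, outcome):5.1f} ns")
```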
>>
>> Second, it doesn't address your prompt directly, but it is worth noting
>> that most GPU memory hierarchies aim for very predictable latencies, which
>> often means seemingly excessive pipelining. This is to help support
>> very-near-peak bandwidths where necessary and makes GPU-wide scheduling
>> more predictable. This is likely a second-order reason why GPU DRAM
>> controllers have higher than expected zero-load latency.
>>
>> I'm not sure I can get any more precise with the data I have, but maybe
>> that is enough to work with?
>>
>>
>>> Do you have details on the addressing scheme you mention?
>>>
>>
>> The GPGPU-Sim config contains the following for the DRAM pin identifiers
>> (R = row address, C = column address, B = bank address, 0 = data), and the
>> field sizes are echoed on page 15, table 6 of the Hynix data sheet:
>>
>> -gpgpu_mem_addr_mapping dramid@8
>> ;00000000.00000000.00000000.00000000.0000RRRR.RRRRRRRR.BBBCCCCB.CCSSSSSS
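To make the mapping concrete, here is a small Python sketch that decodes a physical address according to the bit string above. Assumptions, repeated in the code: the string is read most-significant bit first, `0` marks an unused bit, the bits of a split field (such as the two `B` groups) are concatenated from least- to most-significant, and the `dramid@8` channel-selection part is ignored. GPGPU-Sim's own decoder may differ in these details. Under these assumptions the four `B` bits give 16 banks, matching `nbk=16` in the timing options quoted later in the thread.

```python
from typing import Dict

# Bit-position string from the GPGPU-Sim config (dots are only separators).
MAPPING = ("00000000.00000000.00000000.00000000."
           "0000RRRR.RRRRRRRR.BBBCCCCB.CCSSSSSS")


def decode(addr: int, mapping: str = MAPPING) -> Dict[str, int]:
    """Split addr into fields named by the letters in mapping.

    Assumption: the leftmost character is the most significant bit, '0'
    marks an unused bit, and each field's bits are packed LSB-first in the
    order they appear from the right, so split fields are concatenated.
    """
    bits = mapping.replace(".", "")
    fields: Dict[str, int] = {}
    widths: Dict[str, int] = {}
    for pos, label in enumerate(reversed(bits)):  # pos 0 is the LSB
        if label == "0":
            continue
        width = widths.get(label, 0)
        bit = (addr >> pos) & 1
        fields[label] = fields.get(label, 0) | (bit << width)
        widths[label] = width + 1
    return fields


if __name__ == "__main__":
    print(decode(0x0ABCDEF0))
```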
>>
>> I'm not familiar with the different (sub)cycle interpretations of control
>> and address signals on particular lines though. Are the pin IDs sufficient,
>> or are you looking for something more specific here?
>>
>>
>>> In GDDR5, the command bus is DDR and the data bus QDR. The part you refer
>>> to is either running at 5.5 Gbps or 6 Gbps according to the data sheet, so
>>> I do find the 4 Gbps surprising. Are you sure?
>>>
>>
>> Yes, to start with I think we should use the 4 Gbps target: We've
>> validated against the GeForce GTX480 and 580, which use low-end data bus
>> clocks between 3.2 and 4 Gbps/pin (note: in practice, discrete GPUs take
>> more liberties to set frequencies as desired, since their tighter coupling
>> with DRAMs offers more freedom than the clocking agreements required
>> between CPU chips and their memories). More recent cards use the higher
>> frequencies, but I have limited testing with them.
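For reference when comparing the 72-74% and 81-83% figures earlier in the thread, the theoretical peak of a GDDR5 device or channel is simply interface width times per-pin data rate. A minimal sketch, assuming the 4 Gbps/pin target above and a single x32 device; the measured bandwidth in the usage line is a made-up placeholder, not a result from this thread.

```python
def peak_bandwidth_gbytes(width_bits: int, gbps_per_pin: float) -> float:
    """Theoretical peak bandwidth in GB/s (10^9 bytes/s) for one interface."""
    return width_bits * gbps_per_pin / 8.0


def fraction_of_peak(achieved_gbytes: float, width_bits: int,
                     gbps_per_pin: float) -> float:
    """Achieved bandwidth as a fraction of the theoretical peak."""
    return achieved_gbytes / peak_bandwidth_gbytes(width_bits, gbps_per_pin)


if __name__ == "__main__":
    peak = peak_bandwidth_gbytes(32, 4.0)  # one x32 device at 4 Gbps/pin
    print(f"peak per x32 device: {peak:.1f} GB/s")
    # Hypothetical measurement, for illustration only.
    print(f"utilisation: {fraction_of_peak(13.0, 32, 4.0):.1%}")
```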
>>
>>
>>> The burst length I am talking about is the DRAM burst length, which for
>>> GDDR5 is 8 beats (so with a x32 mode that would be 32 bytes per burst). I
>>> do not see this specified anywhere in the line you sent, and therefore I
>>> am curious if it is omitted completely (DRAMSim2 for example makes
>>> dangerous assumptions here).
>>>
>>
>> Yep, I follow but wasn't sure if/how GPU vendors might play with this for
>> streaming/interleaving purposes. On closer inspection of the Hynix doc, it
>> says burst length is "8 only" on page 5. I also did a spot check of a few
>> more recent GDDR5 data sheets, and they show the same thing.
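The arithmetic behind "32 bytes per burst" is worth spelling out, since the same numbers give each burst's occupancy of the data bus. A minimal sketch, assuming burst length 8 (as the data sheet states), a x32 device, and the 4 Gbps/pin rate discussed above.

```python
def burst_bytes(device_width_bits: int, burst_length: int) -> int:
    """Bytes delivered by one burst: one device-width beat per transfer."""
    return device_width_bits * burst_length // 8


def burst_time_ns(burst_length: int, gbps_per_pin: float) -> float:
    """Data-bus time for one burst: each beat lasts 1/rate ns on every pin."""
    return burst_length / gbps_per_pin


if __name__ == "__main__":
    print(burst_bytes(32, 8), "bytes per burst")        # x32 device, BL8
    print(burst_time_ns(8, 4.0), "ns on the data bus")  # at 4 Gbps/pin
```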
>>
>>
>>   Joel
>>
>>
>>
>>
>>
>>>
>>> On 14/10/2014 00:22, "Joel Hestness via gem5-dev" <[email protected]>
>>> wrote:
>>>
>>> >Hi Andreas,
>>> >
>>> >
>>> >> Thanks. I really do not understand the return queue argument. Why on
>>> >> earth would you need such a large return queue? Surely the agent making
>>> >> the requests (the GPU in this case) should have allocated space for the
>>> >> response, no?
>>> >>
>>> >
>>> >Good question. I believe that buffer is just to get the appropriate
>>> >overall minimum latency for memory accesses (i.e. GPGPU-Sim does some
>>> >level of functional controller modeling, and the buffer is just for
>>> >timing). If there is a way to add arbitrary latency to the DRAMCtrl, it
>>> >would be nice to aim for no-load latency of roughly 100ns (from when the
>>> >controller receives the access to when a response is returned to the Ruby
>>> >network). Also, if I have a chance to test out Nilay's patch, I could
>>> >tune on this setting.
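One way to reason about the "arbitrary latency" asked for here: pad the controller's fixed pipeline delay so that the zero-load total reaches the ~100ns target. The gem5 DRAMCtrl of this period exposes, to our understanding, `static_frontend_latency` and `static_backend_latency` parameters for this purpose (verify the names in `DRAMCtrl.py`). The helper below is a hypothetical calculation; the 39 ns array latency in the usage line is an illustrative assumption, not a measured value.

```python
from typing import Tuple


def static_padding_ns(target_ns: float, array_ns: float,
                      frontend_share: float = 0.5) -> Tuple[float, float]:
    """Split the latency not covered by the DRAM array across two stages.

    Returns (frontend_ns, backend_ns). The even default split is arbitrary;
    only the total matters for zero-load latency.
    """
    if not 0.0 <= frontend_share <= 1.0:
        raise ValueError("frontend_share must be in [0, 1]")
    padding = max(0.0, target_ns - array_ns)  # never pad negatively
    frontend = padding * frontend_share
    return frontend, padding - frontend


if __name__ == "__main__":
    fe, be = static_padding_ns(target_ns=100.0, array_ns=39.0)
    print(f"frontend {fe:.1f} ns, backend {be:.1f} ns")
```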
>>> >
>>> >> Concerning the configuration, what is the assumed clock speed (tCK), and
>>> >> is it operated as a x16 or x32 part? Is the burst length implicit in your
>>> >> configuration (or is 8 the default)?
>>> >>
>>> >
>>> >The memory modeled in GPGPU-Sim is x32 based on the addressing scheme.
>>> >That's the mode I'd start with.
>>> >
>>> >In GDDR5, the core frequency is 1/4 of command frequency (and 1/2 in
>>> >GDDR3?). A common channel frequency is 4GHz, resulting in effective
>>> >channel bandwidth of 32GT/s. So, a starting baseline of tCK = 1ns should
>>> >be sufficient. Does that sound right?
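The tCK baseline follows from the per-pin data rate and the data-to-command frequency ratio. A minimal sketch, assuming the ratio of 4 from the GPGPU-Sim config (`-dram_data_command_freq_ratio 4`, quoted further down) and the 4 Gbps/pin target discussed in this thread; the function name is hypothetical.

```python
def command_clock_period_ns(gbps_per_pin: float, data_command_ratio: int) -> float:
    """tCK in ns, assuming the command clock runs data_command_ratio times
    slower than the per-pin data rate (4 for a QDR data bus)."""
    command_clock_ghz = gbps_per_pin / data_command_ratio
    return 1.0 / command_clock_ghz


if __name__ == "__main__":
    print(command_clock_period_ns(4.0, 4), "ns")  # 4 Gbps/pin, QDR data bus
```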
>>> >
>>> >I believe burst length is typically 8 and can be much longer. In practice,
>>> >this depends highly on address hashing/interleaving across many
>>> >controllers in various GPUs, so 8 should be a sufficient baseline.
>>> >
>>> >
>>> >  Joel
>>> >
>>> >
>>> >
>>> >> On 10/13/14, 10:28 PM, "Joel Hestness via gem5-dev" <[email protected]>
>>> >> wrote:
>>> >>
>>> >> >Hi Andreas,
>>> >> >  Sure thing. We try to closely replicate the parameters used in GPGPU-Sim
>>> >> >v3.2.2, which are specified in the Hynix datasheet here:
>>> >> >http://www.hynix.com/datasheet/pdf/graphics/H5GQ1H24AFR(Rev1.0).pdf.
>>> >> >
>>> >> >  Here are relevant excerpts from the GPGPU-Sim GTX480 config file
>>> >> >(attached):
>>> >> >
>>> >> ># The DRAM return queue and the scheduler queue together should provide buffer
>>> >> ># to sustain the memory level parallelism to tolerate DRAM latency
>>> >> ># To allow 100% DRAM utility, there should at least be enough buffer to sustain
>>> >> ># the minimum DRAM latency (100 core cycles).  I.e.
>>> >> >#   Total buffer space required = 100 x 924MHz / 700MHz = 132
>>> >> >-gpgpu_frfcfs_dram_sched_queue_size 16
>>> >> >-gpgpu_dram_return_queue_size 116
>>> >> >
>>> >> >-dram_data_command_freq_ratio 4  # GDDR5 is QDR
>>> >> >-gpgpu_dram_timing_opt "nbk=16:CCD=2:RRD=6:RCD=12:RAS=28:RP=12:RC=40:CL=12:WL=4:CDLR=5:WR=12:nbkgrp=4:CCDL=3:RTPL=2"
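When porting these options to another controller model it helps to have them in structured form. The sketch below parses the `-gpgpu_dram_timing_opt` string into a dictionary and re-derives the buffer-sizing comment (100 core cycles at 700MHz expressed in 924MHz DRAM cycles, versus the 16 + 116 queue entries configured). The function names are hypothetical, and treating one queue entry per DRAM cycle of latency is our reading of that comment; only the option string and the numbers come from the config.

```python
from typing import Dict

TIMING_OPT = ("nbk=16:CCD=2:RRD=6:RCD=12:RAS=28:RP=12:RC=40:"
              "CL=12:WL=4:CDLR=5:WR=12:nbkgrp=4:CCDL=3:RTPL=2")


def parse_timing_opt(opt: str) -> Dict[str, int]:
    """Turn 'key=value:key=value' into {key: int(value)}."""
    params: Dict[str, int] = {}
    for item in opt.split(":"):
        key, _, value = item.partition("=")
        params[key] = int(value)
    return params


def required_buffer_entries(latency_core_cycles: int, dram_mhz: float,
                            core_mhz: float) -> float:
    """Latency converted to DRAM cycles, assuming one entry per DRAM cycle."""
    return latency_core_cycles * dram_mhz / core_mhz


if __name__ == "__main__":
    timing = parse_timing_opt(TIMING_OPT)
    print(timing["nbk"], "banks in", timing["nbkgrp"], "bank groups")
    needed = required_buffer_entries(100, 924.0, 700.0)
    provided = 16 + 116  # scheduler queue + return queue
    print(f"needed {needed:.0f}, provided {provided}")
```

As a quick consistency check on the parsed values, RC equals RAS + RP (28 + 12 = 40) in this set.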
>>> >> >
>>> >> >  If anything needs clarification, I'm happy to help sort it out. Just let
>>> >> >me know.
>>> >> >
>>> >> >  Thanks!
>>> >> >  Joel
>>> >> >
>>> >> >
>>> >> >
>>> >> >On Mon, Oct 13, 2014 at 4:11 PM, Andreas Hansson via gem5-dev <
>>> >> >[email protected]> wrote:
>>> >> >
>>> >> >> Hi Joel,
>>> >> >>
>>> >> >> I am happy to spend the 5 minutes creating a GDDR5 configuration. Do you
>>> >> >> have any specific data sheet you would like to capture?
>>> >> >>
>>> >> >> Andreas
>>> >> >>
>>> >> >> On 10/13/14, 10:09 PM, "Joel Hestness via gem5-dev" <[email protected]>
>>> >> >> wrote:
>>> >> >>
>>> >> >> >Hi guys,
>>> >> >> >
>>> >> >> >
>>> >> >> >> Thanks for the clarification. I believe the RubyMemoryController is
>>> >> >> >> completely Pareto dominated by the vanilla DRAMCtrl module, but if
>>> >> >> >> there is any specific feature/setting missing I would be keen to know.
>>> >> >> >>
>>> >> >> >> If possible I would like to make sure we use the same controller as a
>>> >> >> >> default for all timing simulations (even if the other one would be
>>> >> >> >> maintained as a fallback).
>>> >> >> >
>>> >> >> >
>>> >> >> >I'd like to second the desire to have a simple replacement baseline that
>>> >> >> >performs at least as well as the RubyMemoryController in most/all cases:
>>> >> >> >gem5-gpu now has more than 100 users, and as far as I know, we are all
>>> >> >> >using Ruby and thus the RubyMemoryController. The RubyMemoryController is
>>> >> >> >pretty simple to configure similarly to DDR3 or GDDR5 and to interpret
>>> >> >> >results. It performs surprisingly close to some GPU hardware. If this
>>> >> >> >controller goes away, I (and I'm sure other gem5-gpu users) would prefer
>>> >> >> >to have something that is known to perform as well and is also easy to
>>> >> >> >configure.
>>> >> >> >
>>> >> >> >I think we (the gem5-gpu crew) are fine with the RubyMemoryController
>>> >> >> >going away eventually. However, given that there isn't currently a
>>> >> >> >GDDR-like DRAMCtrl configuration in gem5, I'd like to second Nilay and
>>> >> >> >Brad that we offer users sufficient time to prepare for
>>> >> >> >RubyMemoryController removal. We will need to adapt our heterogeneous
>>> >> >> >Ruby coherence protocols, and other users have their own protocols
>>> >> >> >they'd need to adapt as well.
>>> >> >> >
>>> >> >> >
>>> >> >> >  Joel
>>> >> >> >
>>> >> >> >
>>> >> >> >
>>> >> >> >
>>> >> >> >
>>> >> >> >> On 10/13/14, 9:01 PM, "Nilay Vaish via gem5-dev" <[email protected]>
>>> >> >> >> wrote:
>>> >> >> >>
>>> >> >> >> >On Mon, 13 Oct 2014, Andreas Hansson via gem5-dev wrote:
>>> >> >> >> >
>>> >> >> >> >> Hi all,
>>> >> >> >> >>
>>> >> >> >> >> With Nilay's recent improvements to Ruby I would like to understand
>>> >> >> >> >> if there is any point in still having the RubyMemoryControl, or if we
>>> >> >> >> >> should just clean things up a bit and remove it. I would think the
>>> >> >> >> >> best way forward is to clean up the integration of Ruby and classic
>>> >> >> >> >> and ensure that there is no duplicated functionality beyond what is
>>> >> >> >> >> strictly necessary.
>>> >> >> >> >>
>>> >> >> >> >> Nilay, do you think this would make sense? Is there anyone else with
>>> >> >> >> >> any opinions in this matter?
>>> >> >> >> >>
>>> >> >> >> >
>>> >> >> >> >
>>> >> >> >> >I was in favor of dropping RubyMemoryControl.  But I had some
>>> >> >> >> >discussion with Brad Beckmann from AMD.  Since AMD has some
>>> >> >> >> >infrastructure in place already, they would like to retain
>>> >> >> >> >RubyMemoryControl for the time being.
>>> >> >> >> >
>>> >> >> >> >I suggest that we retain the memory controller code in ruby for
>>> >> >> >> >another six months or so, and then we will drop it.  In the
>>> >> >> >> >meantime, we will update the interface so that ruby protocols can
>>> >> >> >> >use the classic memory controller.  The code for this is already on
>>> >> >> >> >the reviewboard.  Over this six month period, I hope, most users
>>> >> >> >> >would have switched to using the classic controller.
>>> >> >> >> >
>>> >> >> >> >Thanks
>>> >> >> >> >Nilay
>>> >> >> >> >_______________________________________________
>>> >> >> >> >gem5-dev mailing list
>>> >> >> >> >[email protected]
>>> >> >> >> >http://m5sim.org/mailman/listinfo/gem5-dev
>>> >> >> >>
>>> >> >> >>
>>> >> >> >> -- IMPORTANT NOTICE: The contents of this email and any attachments are
>>> >> >> >> confidential and may also be privileged. If you are not the intended
>>> >> >> >> recipient, please notify the sender immediately and do not disclose the
>>> >> >> >> contents to any other person, use it for any purpose, or store or copy
>>> >> >> >> the information in any medium.  Thank you.
>>> >> >> >>
>>> >> >> >> ARM Limited, Registered office 110 Fulbourn Road, Cambridge CB1 9NJ,
>>> >> >> >> Registered in England & Wales, Company No:  2557590
>>> >> >> >> ARM Holdings plc, Registered office 110 Fulbourn Road, Cambridge CB1 9NJ,
>>> >> >> >> Registered in England & Wales, Company No:  2548782
>>> >> >> >>
>>> >> >> >
>>> >> >> >
>>> >> >> >
>>> >> >> >--
>>> >> >> >  Joel Hestness
>>> >> >> >  PhD Student, Computer Architecture
>>> >> >> >  Dept. of Computer Science, University of Wisconsin - Madison
>>> >> >> >  http://pages.cs.wisc.edu/~hestness/
>>> >> >> >
>>> >> >>
>>> >> >>
>>> >> >>
>>> >> >>
>>> >> >
>>> >> >
>>> >> >
>>> >> >
>>> >>
>>> >>
>>> >>
>>> >
>>> >
>>> >
>>>
>>>
>>>
>>
>>
>
>
>
>--
>  Joel Hestness
>  PhD Candidate, Computer Architecture
>  Dept. of Computer Science, University of Wisconsin - Madison
>  http://pages.cs.wisc.edu/~hestness/

