Re: [gem5-users] DRAM controller write requests merge

Rizwana Begum via gem5-users Thu, 22 Jan 2015 15:59:34 -0800

Hello Andreas,

I agree completely that low-power modes are the way to get better
energy-performance trade-offs for memory, rather than DFS. However, in the
past I have experimented with DFS for the memory bus only. When changing
the memory frequency I scaled only tBURST linearly with the memory clock,
and I observed a performance vs. frequency trend for memory-intensive
benchmarks under the open-page policy. Under the closed-page policy I saw
no performance improvement with increasing memory frequency, because the
static command-to-command latency of the DRAM dominates. I also saw an
energy vs. frequency trade-off, since background energy scales with memory
frequency.
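
To make the closed-page observation concrete, here is a rough C++ sketch of
what scaling only tBURST does to a single access. It assumes a
double-data-rate bus (two beats per clock). The function names and the
27.5 ns static latency used below are illustrative placeholders, not values
from gem5 or from my experiments.

```cpp
#include <cassert>

// Time one burst occupies a double-data-rate data bus, in nanoseconds.
// burst_length counts beats; two beats move per clock cycle.
double burst_time_ns(double clock_mhz, int burst_length)
{
    assert(clock_mhz > 0.0);
    assert(burst_length > 0 && burst_length % 2 == 0);
    const double tck_ns = 1000.0 / clock_mhz;  // clock period
    return (burst_length / 2) * tck_ns;
}

// Simplified closed-page access: a fixed, frequency-independent latency in
// nanoseconds plus the burst transfer. static_ns is a placeholder that
// lumps together the command-to-command latencies (an assumption).
double closed_page_access_ns(double clock_mhz, int burst_length,
                             double static_ns)
{
    assert(static_ns >= 0.0);
    return static_ns + burst_time_ns(clock_mhz, burst_length);
}
```

With a burst length of 8 and 27.5 ns of static latency, halving the clock
from 800 MHz to 400 MHz doubles the burst time from 5 ns to 10 ns, but the
whole access only moves from 32.5 ns to 37.5 ns.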


All of the above DFS exploration was done on an old gem5 commit (d2404e).
I had a simplified Micron memory power model and the simple frequency
scaling mentioned above implemented on top of that commit. Recently we
moved to a newer gem5 commit (4a411f) that has more detailed power and
performance models. I am trying to put together a quick DFS implementation
there and observe the trends of energy and performance vs. memory
frequency. After that, I think exploring low-power modes will be my next
step.

I am able to express the timing parameters that are specific to the DRAM
module in terms of memory frequency. Some of the DRAM timing parameters are
static latencies and some are functions of tCK (I took the details from the
Micron datasheet). From the discussion in this thread so far, I think the
PHY also runs in sync with the memory frequency, while the MC, I believe,
should either have its own clock domain or work in the L1/L2/core clock
domain. However, given that I don't have a good model for the MC and PHY
latencies, for now I am planning to scale only the DRAM-related parameters
and leave the static PHY and MC latencies as they are.
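
A minimal sketch of how I picture the two kinds of DRAM parameters, written
in C++. `TimingSpec` and `resolve_ns` are hypothetical names of mine, not
gem5 identifiers, and the "at least N cycles and at least M ns" form is an
assumption about how a datasheet constraint is expressed.

```cpp
#include <algorithm>
#include <cassert>

// One DRAM timing constraint: at least `min_cycles` clocks and at least
// `min_ns` nanoseconds. A purely static parameter uses min_cycles = 0;
// a purely cycle-based one uses min_ns = 0.
struct TimingSpec {
    int    min_cycles;
    double min_ns;
};

// Resolve a constraint to nanoseconds for a given clock period tCK.
double resolve_ns(const TimingSpec& spec, double tck_ns)
{
    assert(tck_ns > 0.0);
    assert(spec.min_cycles >= 0 && spec.min_ns >= 0.0);
    return std::max(spec.min_cycles * tck_ns, spec.min_ns);
}
```

For example, a constraint of 4 cycles or 7.5 ns, whichever is longer,
resolves to 7.5 ns at tCK = 1.25 ns and to 10 ns at tCK = 2.5 ns, while a
purely static 13.75 ns value does not move. These numbers are examples only.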

I appreciate your and Tao's input so far. I would be happy to receive any
further ideas you have regarding my DFS implementation approach.

Thank you,
-Rizwana



On Thu, Jan 22, 2015 at 5:10 PM, Andreas Hansson <andreas.hans...@arm.com>
wrote:

> Hi Rizwana,
>
> All objects belong to a clock domain. That said, there is no clock domain
> specifically for the memory controller in the example scripts. At the
> moment all the timings in the controller are absolute, and are not
> expressed in cycles.
>
> In general the best strategy to modulate DRAM performance is to use the
> low power modes (rather than DVFS). The energy spent is far from
> proportional, and thus it is better to be completely off when possible. We
> have some patches that add low-power modes to the controller and they
> should hopefully be on RB soon.
>
> Andreas
>
> -----
> On Jan 22, 2015 at 9:59 PM, Rizwana Begum <rizwana....@gmail.com> wrote:
>
>
> Ah, I see. Thanks for pointing me to the static latencies; I had missed
> that. As the controller latency is modeled as a static latency, am I right
> in saying that, as of the latest commit, the MC is not attached to any
> clock domain in gem5?
>
> Thanks,
> -Rizwana
>
> On Thursday, January 22, 2015, Andreas Hansson via gem5-users <
> gem5-users@gem5.org> wrote:
>
> > Hi Rizwana,
> >
> > The DRAM controller has two parameters to control the static latency in
> > the controller itself, and also the PHY and actual round trip to the
> > device. These parameters are called front end and back end latency, and
> > you can set them to match a given controller architecture and/or PHY
> > implementation. That should be enough for most cases I would think.
> >
> > Andreas
> >
> > -----
> > On Jan 22, 2015 at 9:22 PM, Rizwana Begum via gem5-users <
> > gem5-users@gem5.org> wrote:
> >
> >
> > Great, that was helpful. So, am I right in assuming that the gem5 DRAM
> > controller model doesn't account for signal propagation delay on the
> > command/data bus? I am coming to this conclusion because the read
> > response event from the MC is scheduled to the upper ports tCL+tBURST
> > after the read command is issued. In fact, I had a chance to use
> > DRAMSim2 in the past, and I don't remember signal propagation delay
> > being accounted for there either. Is it small enough to be safely
> > ignored?
> >
> > Thanks,
> > -Rizwana
> >
> > On Thu, Jan 22, 2015 at 2:58 PM, Tao Zhang <tao.zhang.0...@gmail.com>
> > wrote:
> >
> > > The timing of RL (aka tCL) is specific to the DRAM module. It is the
> > > time from the DRAM module receiving the CAS command to the DRAM module
> > > putting the first data on the interface/bus. The MC/PHY side should
> > > account for the signal propagation delay on the command/data bus. In
> > > fact, the "DQS" signal is also used to assist the read data sampling.
> > >
> > > The bus protocol is defined by JEDEC. It is completely different from
> > > AMBA/AHB. The bus has only one master (MC) and may have multiple slaves
> > > (DRAM ranks), so it looks like AHB-Lite. But in general, they are two
> > > different stories.
> > >
> > > -Tao
> > >
> > > On Thu, Jan 22, 2015 at 11:50 AM, Rizwana Begum <rizwana....@gmail.com>
> > > wrote:
> > >
> > >> Thanks Tao for your response. That clarifies a lot of my questions. So
> > >> here is what I understand:
> > >>
> > >> The DRAM module runs at a particular clock frequency. The bus
> > >> connecting the DRAM module and the PHY runs in sync with this clock
> > >> frequency, and the PHY also runs synchronously with the DRAM module
> > >> clock. Now, for a 64-bit bus and a burst length of 8 (64 bytes
> > >> transferred per burst), my understanding of a read operation is that,
> > >> after the read command is issued, the first bit of data is available
> > >> after the read latency. At the first clock edge after the read
> > >> latency, 8 bytes are sampled and transferred over the bus. Then, on
> > >> every consecutive rising and falling clock edge, 8 more bytes are
> > >> sampled and transferred over the bus, for four consecutive clock
> > >> cycles. Thereby, the PHY receives all 64 bytes of data at the end of
> > >> read latency + 4 clock cycles. Is this right?
> > >>
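
The arithmetic in the paragraph above can be checked with a small C++
sketch. It assumes a double-data-rate bus that moves one bus-width of data
per clock edge. `BurstShape` and `ddr_burst_shape` are hypothetical names,
not gem5 identifiers.

```cpp
#include <cassert>

struct BurstShape {
    int bytes_per_beat;  // bytes moved on each clock edge
    int total_bytes;     // bytes moved by the whole burst
    int clock_cycles;    // clock cycles the burst occupies on the data bus
};

// Shape of one burst on a double-data-rate bus (two beats per clock).
BurstShape ddr_burst_shape(int bus_width_bits, int burst_length)
{
    assert(bus_width_bits > 0 && bus_width_bits % 8 == 0);
    assert(burst_length > 0 && burst_length % 2 == 0);
    BurstShape shape;
    shape.bytes_per_beat = bus_width_bits / 8;
    shape.total_bytes = shape.bytes_per_beat * burst_length;
    shape.clock_cycles = burst_length / 2;
    return shape;
}
```

For a 64-bit bus and a burst length of 8 this gives 8 bytes per edge, 64
bytes per burst, and 4 clock cycles on the data bus.
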
> > >> Also, any idea whether this bus (connecting the DRAM and the PHY) is
> > >> the same as the system bus? For example, is it AMBA/AHB on the latest
> > >> ARM SoCs?
> > >>
> > >> Thanks again,
> > >> -Rizwana
> > >>
> > >> On Thu, Jan 22, 2015 at 12:49 PM, Tao Zhang <tao.zhang.0...@gmail.com>
> > >> wrote:
> > >>
> > >>> Hi Rizwana,
> > >>>
> > >>> see my understanding inline. Thanks,
> > >>>
> > >>> -Tao
> > >>>
> > >>> On Thu, Jan 22, 2015 at 8:12 AM, Rizwana Begum via gem5-users <
> > >>> gem5-users@gem5.org> wrote:
> > >>>
> > >>>> Hello Andreas,
> > >>>>
> > >>>> Thanks for the reply. Sure, I will try to get the patch up on the
> > >>>> review board.
> > >>>> I have another question. Though it is related to DDR/MC architecture
> > >>>> and not directly to the gem5 DDR model implementation, I am hoping
> > >>>> you (or anyone else on the list) would have a good enough
> > >>>> understanding to clear up my confusion:
> > >>>>
> > >>>> As far as I understand, 'busBusyUntil' represents the data bus. This
> > >>>> variable is used to keep track of data bus availability:
> > >>>>
> > >>>> 1. Is the data bus the bus used to transfer data from the core DRAM
> > >>>> module to the PHY?
> > >>>>
> > >>>
> > >>>    Yes, you are right. In addition, this is also the bus to transfer
> > >>> data from PHY to DRAM module.
> > >>>
> > >>>
> > >>>> 2. I believe the PHY is the DRAM physical interface IP. Where is it
> > >>>> physically located? Is it located on the core alongside the memory
> > >>>> controller (MC), or on the DIMMs? And what exactly does the physical
> > >>>> bus (the wires connecting the DIMMs to the MC) connect? DRAM and PHY,
> > >>>> or PHY and MC?
> > >>>>
> > >>>
> > >>>     It is on the core/MC side. The physical bus connects the DRAM and
> > >>> the PHY. Logically, you can treat the PHY as a part of the MC that
> > >>> just incurs some extra latency. In this way, the physical bus can be
> > >>> seen as extending between the DRAM and the MC.
> > >>>
> > >>>
> > >>>> 3. My confusion is that the actual physical bus on the SoC connecting
> > >>>> the DRAM module and the MC should be different from the data bus that
> > >>>> 'busBusyUntil' represents. It takes tBURST ns (a function of memory
> > >>>> cycles) to transfer one burst on the data bus, and the actual
> > >>>> physical bus speed shouldn't depend on the memory frequency for
> > >>>> transferring data from the DRAM to the MC. Am I right?
> > >>>>
> > >>>
> > >>>     The "busBusyUntil" is still valid. The actual physical bus speed
> > >>> should be consistent with the spec (e.g., 800MHz, 933MHz, 1600MHz...).
> > >>> Remember, JEDEC DRAM is synchronous DRAM. It means both the PHY and
> > >>> the DRAM module should be in sync with the same clock frequency. As
> > >>> one end of the connection is the DRAM module, the PHY should run at
> > >>> the same frequency as the DRAM module.
> > >>>
> > >>>
> > >>>> I would appreciate it if anyone can provide insight into these
> > >>>> questions.
> > >>>>
> > >>>> Thank you,
> > >>>> -Rizwana
> > >>>>
> > >>>>
> > >>>>
> > >>>>
> > >>>> On Wed, Jan 21, 2015 at 4:45 PM, Andreas Hansson <
> > >>>> andreas.hans...@arm.com> wrote:
> > >>>>
> > >>>>>  Hi Rizwana,
> > >>>>>
> > >>>>>  It could very well be that you’ve hit a bug. I’d suggest posting a
> > >>>>> review on the reviewboard to make it clearer what changes need to
> > >>>>> be made. If you’re not familiar with the process, have a look at
> > >>>>> http://www.gem5.org/Commit_Access. The easiest way is to use the
> > >>>>> reviewboard mercurial plugin.
> > >>>>>
> > >>>>>  I look forward to seeing the patch.
> > >>>>>
> > >>>>>  Thanks,
> > >>>>>
> > >>>>>  Andreas
> > >>>>>
> > >>>>>   From: Rizwana Begum via gem5-users <gem5-users@gem5.org>
> > >>>>> Reply-To: Rizwana Begum <rizwana....@gmail.com>, gem5 users mailing
> > >>>>> list <gem5-users@gem5.org>
> > >>>>> Date: Wednesday, 21 January 2015 16:24
> > >>>>> To: gem5 users mailing list <gem5-users@gem5.org>
> > >>>>> Subject: [gem5-users] DRAM controller write requests merge
> > >>>>>
> > >>>>>  Hello Users,
> > >>>>>
> > >>>>>  I am trying to understand write packet queuing in the DRAM
> > >>>>> controller model. I am looking at the 'addToWriteQueue' function.
> > >>>>> From my understanding so far, it merges write requests across burst
> > >>>>> boundaries. Consider the following if statement:
> > >>>>>
> > >>>>>  if ((addr + size) >= (*w)->addr &&
> > >>>>>      ((*w)->addr + (*w)->size - addr) <= burstSize) {
> > >>>>>      // the new one is just before or partially
> > >>>>>      // overlapping with the existing one, and together
> > >>>>>      // they fit within a burst
> > >>>>>      ....
> > >>>>>  }
> > >>>>>
> > >>>>>  Merging here may make the write request cross a burst boundary.
> > >>>>> The size computation at the beginning of the for loop of this
> > >>>>> function suggests that packets are split at burst boundaries. For
> > >>>>> example, if the packet addr is 16, the burst size is 32 bytes and
> > >>>>> the packet request size is 25 bytes (all in decimal for ease), then
> > >>>>> 2 write bursts should be added to the queue: 16-31 and 32-40.
> > >>>>> However, while merging, let's say a packet covering 32-40 already
> > >>>>> existed in the write queue; then a write from 16-40 is added to the
> > >>>>> queue, which crosses a burst boundary. Is that physically possible?
> > >>>>> Shouldn't there be two write requests in the queue, 16-31 and 32-40,
> > >>>>> instead of one single merged request?
> > >>>>>
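
The 16/25/32 example above can be reproduced with a short C++ sketch. It is
not the gem5 implementation: `Chunk`, `split_at_burst_boundaries`,
`crosses_burst_boundary` and `quoted_merge_condition` are hypothetical
names, and the last function only transcribes the two comparisons from the
quoted if statement.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

struct Chunk {
    std::uint64_t addr;  // first byte of the chunk
    std::uint64_t size;  // number of bytes in the chunk
};

// Split [addr, addr + size) so no chunk crosses a multiple of burst_size.
std::vector<Chunk> split_at_burst_boundaries(std::uint64_t addr,
                                             std::uint64_t size,
                                             std::uint64_t burst_size)
{
    assert(burst_size > 0);
    std::vector<Chunk> chunks;
    while (size > 0) {
        const std::uint64_t next_boundary =
            (addr / burst_size + 1) * burst_size;
        const std::uint64_t chunk_size =
            std::min(size, next_boundary - addr);
        chunks.push_back(Chunk{addr, chunk_size});
        addr += chunk_size;
        size -= chunk_size;
    }
    return chunks;
}

// True if a non-empty chunk starts in one burst and ends in another.
bool crosses_burst_boundary(const Chunk& c, std::uint64_t burst_size)
{
    assert(burst_size > 0 && c.size > 0);
    return c.addr / burst_size != (c.addr + c.size - 1) / burst_size;
}

// Transcription of the quoted condition only (unsigned arithmetic assumed).
bool quoted_merge_condition(std::uint64_t addr, std::uint64_t size,
                            std::uint64_t w_addr, std::uint64_t w_size,
                            std::uint64_t burst_size)
{
    return (addr + size) >= w_addr &&
           (w_addr + w_size - addr) <= burst_size;
}
```

Splitting addr 16, size 25 with a 32-byte burst gives the chunks 16-31 and
32-40. The quoted condition holds for a new chunk 16-31 against an existing
32-40 entry, and the merged range 16-40 does cross the boundary at 32.
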
> > >>>>>  Thank you,
> > >>>>> -Rizwana
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>
> > >>>>
> > >>>> _______________________________________________
> > >>>> gem5-users mailing list
> > >>>> gem5-users@gem5.org
> > >>>> http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users
> > >>>>
> > >>>
> > >>>
> > >>
> > >
> >
>
>

