Hi Rizwana,

If you really want to do DRAM DFS using the ClockDomains of gem5, we would
indeed need a dedicated clock domain for the controller, and a multiplier for
the PHY/interface. It can be done: we would have to express all the timings in
clocks where appropriate, and then implement all the equations for the timings
that are the max of an absolute time and a number of clocks. It is quite a lot
of work though, and it will add a big chunk of complexity to the code.
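
To give a feel for what that involves: several JEDEC timings are specified as
the greater of an absolute time and a number of clocks, so a frequency-aware
controller would have to re-evaluate them per clock period rather than store a
single value. A minimal sketch (illustrative only, not the current gem5 code):

    #include <algorithm>
    #include <cstdint>

    using Tick = std::uint64_t;  // absolute time, as in gem5

    // Hypothetical helper: resolve a "max of an absolute time and n clocks"
    // timing constraint for a given interface clock period.
    Tick resolveTiming(Tick absolute_ps, unsigned num_clocks, Tick tck_ps)
    {
        return std::max<Tick>(absolute_ps, num_clocks * tck_ps);
    }

    // Example: a tWTR-style constraint of max(7.5 ns, 4 nCK) resolves to
    // 7.5 ns at tCK = 1.25 ns (DDR3-1600), but to 10 ns at tCK = 2.5 ns
    // (DDR3-800), so its value changes with the operating point.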

Also, note that the DFS work done by Qingyuan and Rizwana so far uses the
Micron power calculator. This is very dangerous (in my opinion), as the
calculator assumes best-case timings, and once you start using DFS I have
little or no confidence in this model; see the DRAMPower publications on this
issue. This particular issue is solved when using DRAMPower together with the
gem5 DRAM controller model, so going forward you should not have these
problems, Rizwana. In addition, neither of these models (Micron or DRAMPower)
includes the PHY power, which is particularly non-linear. In short, I would be
very careful in drawing any conclusions from these results.

Don’t get me wrong, I’m not trying to discourage you from looking at DRAM DFS. 
Just be aware that it’s not as easy and straightforward as it seems.

Andreas

From: Tao Zhang <tao.zhang.0...@gmail.com>
Date: Friday, 23 January 2015 01:24
To: Rizwana Begum <rizwana....@gmail.com>
Cc: Andreas Hansson <andreas.hans...@arm.com>,
gem5 users mailing list <gem5-users@gem5.org>
Subject: Re: DRAM controller write requests merge

>>>> MC I believe should have either its own clock domain, or might work in
>>>> L1/L2/Core clock domain.

It is more reasonable to assume the MC works at the same frequency as the DRAM
rather than at the high CPU clock frequency. In fact, the MC frequency is
relatively flexible in a real chip. It can even run much slower than the DRAM
frequency, as long as its peak bandwidth >= the DRAM peak bandwidth. Two CDCs
(clock domain crossings) may be needed in the MC: one between the system bus
and the MC, and the other between the MC and the PHY/DRAM bus.
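
As a back-of-the-envelope example (numbers assumed purely for illustration):

    #include <cstdio>

    int main()
    {
        // Assumed DDR3-1600 channel: 800 MHz clock, double data rate,
        // 64-bit (8-byte) data bus.
        const double dram_peak_bw = 800e6 * 2 * 8;   // 12.8 GB/s

        // Assumed MC running at half the DRAM clock, but moving 32 bytes
        // per MC cycle through its internal datapath.
        const double mc_peak_bw = 400e6 * 32;        // 12.8 GB/s

        // The slower MC clock is acceptable as long as the MC peak
        // bandwidth is still >= the DRAM peak bandwidth.
        std::printf("DRAM %.1f GB/s vs MC %.1f GB/s\n",
                    dram_peak_bw / 1e9, mc_peak_bw / 1e9);
        return 0;
    }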

When it comes to the pros/cons of DVFS vs. low-power modes, Qingyuan Deng has
a series of papers on this at various granularities: MemScale, MultiScale,
CoScale (http://paul.rutgers.edu/~qdeng/). I am not going to argue which one is
better, but they should at least give you a straightforward insight into this
technique in the memory subsystem.

-Tao

On Thu, Jan 22, 2015 at 3:59 PM, Rizwana Begum <rizwana....@gmail.com> wrote:
Hello Andreas,

I totally agree with you that low-power modes are the way to go for better
energy-performance trade-offs for memory, rather than DFS. However, in the past
I have experimented with DFS for the memory bus only. When changing the memory
frequency I scaled only tBURST linearly with frequency, and observed the
performance vs. frequency trend for memory-intensive benchmarks under an
open-page policy. For a closed-page policy I saw no performance improvement
with increasing memory frequency, as the static command-to-command latency of
the DRAM dominates. I also saw an energy vs. frequency trade-off, as background
energy scales with memory frequency.
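
For reference, the scaling I used was essentially the following (a simplified
sketch with assumed frequency points, not the exact code from my old patch):

    #include <cstdio>

    int main()
    {
        const unsigned burst_length = 8;               // BL8
        const double bus_mhz[] = {400, 533, 667, 800};

        for (double f : bus_mhz) {
            const double tck_ns = 1e3 / f;             // bus clock period in ns
            // DDR transfers two beats per clock, so a burst takes BL/2 clocks.
            const double tburst_ns = (burst_length / 2.0) * tck_ns;
            std::printf("%.0f MHz bus -> tBURST = %.2f ns\n", f, tburst_ns);
        }
        return 0;
    }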

All of the above DFS exploration was done on an old gem5 commit (d2404e). I
had a simplified Micron memory power model and the simple frequency scaling
mentioned above implemented on top of that old commit. Recently we moved to a
recent gem5 commit (4a411f) that has a more detailed power and performance
model than the old one. I am trying to get a quick DFS implementation working
here and observe the trends of energy and performance vs. memory frequency.
After that, I think exploring low-power modes will be my next step.

I am able to express the timing parameters that are specific to the DRAM
module in terms of the memory frequency. Some of the DRAM-related timing
parameters are static latencies, and some are functions of tCK (I got the
details from the Micron datasheet). From the discussion in this thread so far,
I think the PHY also works in sync with the memory frequency, while the MC, I
believe, should have either its own clock domain or work in the L1/L2/core
clock domain. However, given that I don't have a good model for the MC and PHY
latencies, for now I am planning to only scale the DRAM-related parameters and
leave the PHY/MC static latencies as they are.
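
Roughly, the split I have in mind looks like this (the values are placeholders
in the style of a Micron DDR3 datasheet, purely for illustration):

    // Sketch only: absolute timings stay fixed, tCK-based ones scale with
    // the DRAM/PHY frequency chosen by DFS.
    struct DramTimings
    {
        double tck_ns;            // DRAM/PHY clock period at the operating point

        // Static (absolute) latencies taken directly from the datasheet:
        double trcd_ns = 13.75;   // ACT-to-READ/WRITE delay
        double trp_ns  = 13.75;   // precharge time
        double trfc_ns = 160.0;   // refresh cycle time (density dependent)

        // Parameters defined as multiples of tCK:
        double tburst_ns() const { return 4 * tck_ns; }   // BL8 on a DDR bus
        double tccd_ns()   const { return 4 * tck_ns; }   // CAS-to-CAS delay
        double tcl_ns()    const { return 11 * tck_ns; }  // assuming CL stays at 11 nCK
    };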

I appreciate your and Tao's input so far. I would be happy to receive any
further ideas you have regarding my DFS implementation approach.

Thank you,
-Rizwana



On Thu, Jan 22, 2015 at 5:10 PM, Andreas Hansson <andreas.hans...@arm.com>
wrote:
Hi Rizwana,

All objects belong to a clock domain. That said, there is no clock domain 
specifically for the memory controller in the example scripts. At the moment 
all the timings in the controller are absolute, and are not expressed in cycles.
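
To make the distinction concrete, here is a minimal sketch (the names are
illustrative, not the actual DRAMCtrl members) of an absolute timing versus one
expressed in cycles of a clock domain:

    #include <cstdint>

    using Tick = std::uint64_t;   // absolute simulated time

    // What the controller does today: a timing is a fixed absolute value.
    struct AbsoluteTiming {
        Tick tRCD = 13750;        // 13.75 ns in picosecond ticks, independent of any clock domain
    };

    // What a clock-domain-aware controller would need: a cycle count that is
    // converted to Ticks with the domain's current period, so it follows DVFS.
    struct CycleTiming {
        unsigned tRCD_cycles = 11;
        Tick toTicks(Tick clock_period) const { return tRCD_cycles * clock_period; }
    };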

In general the best strategy to modulate DRAM performance is to use the low 
power modes (rather than DVFS). The energy spent is far from proportional, and 
thus it is better to be completely off when possible. We have some patches that 
add low-power modes to the controller and they should hopefully be on RB soon.

Andreas

-----
On Jan 22, 2015 at 9:59 PM, Rizwana Begum <rizwana....@gmail.com> wrote:


Ah, I see. Thanks for pointing me to the static latencies, I had missed that.
As the controller latency is modeled as a static latency, am I right in saying
that, as of the latest commit, the MC is not attached to any clock domain in
gem5?

Thanks,
-Rizwana

On Thursday, January 22, 2015, Andreas Hansson via gem5-users
<gem5-users@gem5.org> wrote:

> Hi Rizwana,
>
> The DRAM controller has two parameters to control the static latency in
> the controller itself, and also in the PHY and the actual round trip to the
> device. These parameters are called front-end and back-end latency, and you
> can set them to match a given controller architecture and/or PHY
> implementation. That should be enough for most cases, I would think.
>
> Andreas
>
> -----
> On Jan 22, 2015 at 9:22 PM, Rizwana Begum via gem5-users <
> gem5-users@gem5.org> wrote:
>
>
> Great, that was helpful. So, am I right in assuming that the gem5 DRAM
> controller model doesn't account for the signal propagation delay on the
> command/data bus? I am coming to this conclusion because the read response
> event from the MC is scheduled to the upper ports tCL+tBURST after the read
> command is issued. In fact, I had a chance to use DRAMSim2 in the past, and I
> don't remember the signal propagation delay being accounted for there either.
> Is it small enough that it can safely be ignored?
>
> Thanks,
> -Rizwana
>
> On Thu, Jan 22, 2015 at 2:58 PM, Tao Zhang <tao.zhang.0...@gmail.com>
> wrote:
>
> > The timing of RL (aka tCL) is dedicated to the DRAM module. It is the
> > distance from when the DRAM module receives the CAS command to when it puts
> > the first data on the interface/bus. On the MC/PHY side, it should account
> > for the signal propagation delay on the command/data bus. In fact, the
> > "DQS" signal is also used to assist the read data sampling.
> >
> > The bus protocol is defined by JEDEC. It is completely different from
> > AMBA/AHB. The bus has only one master (the MC) and may have multiple slaves
> > (the DRAM ranks), so it looks a bit like AHB-Lite, but in general they are
> > two different stories.
> >
> > -Tao
> >
> > On Thu, Jan 22, 2015 at 11:50 AM, Rizwana Begum <rizwana....@gmail.com>
> > wrote:
> >
> >> Thanks Tao for your response. That clarifies a lot of my questions. So
> >> here is what I understand:
> >>
> >> The DRAM module runs at a particular clock frequency. The bus connecting
> >> the DRAM module and the PHY runs in sync with this clock, and the PHY also
> >> runs synchronously to the DRAM module clock frequency. Now, for a 64-bit
> >> bus and a burst length of 8 (64 bytes transferred per burst), my
> >> understanding of a read operation is that, after the read command is
> >> issued, the first data is available after the read latency. At the first
> >> clock edge after the read latency, 8 bytes are sampled and transferred
> >> over the bus. Then, on every consecutive rising and falling clock edge, 8
> >> more bytes are sampled and transferred over the bus, for four consecutive
> >> clock cycles. Thereby, the PHY receives all 64 bytes of data at the end of
> >> read latency + 4 clock cycles. Is this right?
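> >>
> >> To check my arithmetic, with DDR3-1600-style numbers (assumed purely for
> >> illustration):
> >>
> >>     #include <cstdio>
> >>
> >>     int main()
> >>     {
> >>         const double   tck_ns    = 1.25;  // 800 MHz bus clock
> >>         const unsigned bus_bytes = 8;     // 64-bit data bus
> >>         const unsigned burst_len = 8;     // BL8 -> 8 beats
> >>
> >>         const unsigned bytes  = bus_bytes * burst_len;  // 64 bytes per burst
> >>         const double   clocks = burst_len / 2.0;        // 2 beats per clock (DDR)
> >>         const double   tcl_ns = 11 * tck_ns;            // read latency, assuming CL11
> >>
> >>         std::printf("%u bytes in %.0f clocks (%.2f ns), starting %.2f ns after the read command\n",
> >>                     bytes, clocks, clocks * tck_ns, tcl_ns);
> >>         return 0;
> >>     }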
> >>
> >> Also, any idea whether this bus (connecting the DRAM and the PHY) is the
> >> same as the system bus? For example, is it AMBA/AHB on the latest ARM SoCs?
> >>
> >> Thanks again,
> >> -Rizwana
> >>
> >> On Thu, Jan 22, 2015 at 12:49 PM, Tao Zhang <tao.zhang.0...@gmail.com>
> >> wrote:
> >>
> >>> Hi Rizwana,
> >>>
> >>> see my understanding inline. Thanks,
> >>>
> >>> -Tao
> >>>
> >>> On Thu, Jan 22, 2015 at 8:12 AM, Rizwana Begum via gem5-users <
> >>> gem5-users@gem5.org> wrote:
> >>>
> >>>> Hello Andreas,
> >>>>
> >>>> Thanks for the reply. Sure, I will try to get the patch up on review
> >>>> board.
> >>>> I have another question. Though this is related to the DDR/MC
> >>>> architecture and not directly to the gem5 DDR model implementation, I am
> >>>> hoping you (or anyone else on the list) have a good enough understanding
> >>>> to clarify my confusion:
> >>>>
> >>>> As far as I understand, 'busBusyUntil' represents the data bus. This
> >>>> variable is used to keep track of data bus availability:
> >>>>
> >>>> 1. Is the data bus the bus used to transfer data from the DRAM module to
> >>>> the PHY?
> >>>>
> >>>
> >>>    Yes, you are right. In addition, this is also the bus to transfer
> >>> data from PHY to DRAM module.
> >>>
> >>>
> >>>> 2. I believe the PHY is the DRAM physical interface IP. Where is it
> >>>> physically located? Is it located on the core side, alongside the memory
> >>>> controller (MC), or on the DIMMs? And what exactly does the physical bus
> >>>> (the wires connecting the DIMMs to the MC) connect: the DRAM and the PHY,
> >>>> or the PHY and the MC?
> >>>>
> >>>
> >>>     It is on the core/MC side. The physical bus connects the DRAM and
> >>> the PHY. Logically, you can treat the PHY as part of the MC that just
> >>> incurs some extra latency; in that view, the physical bus extends from the
> >>> DRAM to the MC.
> >>>
> >>>
> >>>> 3. My confusion is that the actual physical bus on the SoC connecting
> >>>> the DRAM module and the MC should be different from the data bus that
> >>>> 'busBusyUntil' is representing. It takes tBURST ns (a function of memory
> >>>> cycles) to transfer one burst on the data bus, and the actual physical
> >>>> bus speed shouldn't depend on the memory frequency for transferring data
> >>>> from the DRAM to the MC. Am I right?
> >>>>
> >>>
> >>>     The "busBusyUntil" is still valid. The actual physical bus speed
> >>> should be consistent with the spec (e.g., 800 MHz, 933 MHz, 1600 MHz...).
> >>> Remember, JEDEC DRAM is synchronous DRAM, which means both the PHY and the
> >>> DRAM module should be in sync with the same clock frequency. As one end of
> >>> the connection is the DRAM module, the PHY should run at the same
> >>> frequency as the DRAM module.
> >>>
> >>>
> >>>> I would appreciate it if anyone could provide insight into these
> >>>> questions.
> >>>>
> >>>> Thank you,
> >>>> -Rizwana
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> On Wed, Jan 21, 2015 at 4:45 PM, Andreas Hansson <
> >>>> andreas.hans...@arm.com> wrote:
> >>>>
> >>>>>  Hi Rizwana,
> >>>>>
> >>>>>  It could very well be that you’ve hit a bug. I’d suggest posting a
> >>>>> review on the ReviewBoard to make it clearer what changes need to be
> >>>>> done. If you’re not familiar with the process, have a look at
> >>>>> http://www.gem5.org/Commit_Access. The easiest is to use the ReviewBoard
> >>>>> mercurial plugin.
> >>>>>
> >>>>>  I look forward to seeing the patch.
> >>>>>
> >>>>>  Thanks,
> >>>>>
> >>>>>  Andreas
> >>>>>
> >>>>> From: Rizwana Begum via gem5-users <gem5-users@gem5.org>
> >>>>> Reply-To: Rizwana Begum <rizwana....@gmail.com>, gem5 users mailing
> >>>>> list <gem5-users@gem5.org>
> >>>>> Date: Wednesday, 21 January 2015 16:24
> >>>>> To: gem5 users mailing list <gem5-users@gem5.org>
> >>>>> Subject: [gem5-users] DRAM controller write requests merge
> >>>>>
> >>>>>  Hello Users,
> >>>>>
> >>>>>  I am trying to understand write packet queuing in the DRAM controller
> >>>>> model. I am looking at the 'addToWriteQueue' function. From my
> >>>>> understanding so far, it merges write requests across burst boundaries.
> >>>>> Looking at the following if statement:
> >>>>>
> >>>>>  if ((addr + size) >= (*w)->addr &&
> >>>>>      ((*w)->addr + (*w)->size - addr) <= burstSize) {
> >>>>>      // the new one is just before or partially
> >>>>>      // overlapping with the existing one, and together
> >>>>>      // they fit within a burst
> >>>>>      ...
> >>>>> }
> >>>>>
> >>>>>  Merging here may make the write request go across a burst boundary.
> >>>>> The size computation at the beginning of the for loop of this function
> >>>>> suggests that packets are split at burst boundaries. For example, if the
> >>>>> packet addr is 16, the burst size is 32 bytes and the packet request
> >>>>> size is 25 bytes (all in decimal for ease), then 2 write bursts should
> >>>>> be added to the queue: 16-31 and 32-40. However, while merging, let's
> >>>>> say there already existed a packet in the write queue from 32-40; then a
> >>>>> write from 16-40 is added to the queue, which crosses a burst boundary.
> >>>>> Is that physically possible? Shouldn't there be two write requests in
> >>>>> the queue, 16-31 and 32-40, instead of one single merged request?
> >>>>>
> >>>>>  Thank you,
> >>>>> -Rizwana
> >>>>>
> >>>>>
> >>>>
> >>>>
> >>>>
> >>>
> >>>
> >>
> >
>



