Hello, Just as a side note: I messed up the commit IDs:
Our *old Gem5 commit* was:

commit ad4564da49b5e9d3f94b8015b7d0e02432bc20b8
Author: Chander Sudanthi <chander.sudan...@arm.com>
Date: Thu Oct 31 13:41:13 2013 -0500

    ARM: add support for TEEHBR access

    Thumb2 ARM kernels may access the TEEHBR via thumbee_notifier in
    arch/arm/kernel/thumbee.c. The Linux kernel code just seems to be saving
    and restoring the register. This patch adds support for the TEEHBR cp14
    register. Note, this may be a special case when restoring from an image
    that was run on a system that supports ThumbEE.

Our *new Gem5 commit* is:

commit 1c9d5d9417b35e002d65c3eb329b372fd2320055
Author: Andreas Hansson <andreas.hans...@arm.com>
Date: Tue Dec 2 06:08:25 2014 -0500

    stats: Bump stats for fixes, mostly TLB and WriteInvalidate

Thank you,
-Rizwana

On Thu, Jan 22, 2015 at 6:59 PM, Rizwana Begum <rizwana....@gmail.com> wrote:

> Hello Andreas,
>
> I totally agree with you that low-power modes are the way to go for better
> energy-performance trade-offs in memory, rather than DFS. However, in the
> past I have experimented with DFS for the memory bus only. With a change in
> memory frequency I scaled only tBURST linearly with frequency, and observed
> the performance vs. frequency trend for memory-intensive benchmarks under
> the open-page policy. For the closed-page policy I saw no performance
> improvement with increasing memory frequency, as the command-to-command
> static latency of the DRAM dominates. I also saw an energy vs. frequency
> trade-off, since background energy scales with memory frequency.
>
> All of the above DFS exploration was done on an old Gem5 commit (commit:
> d2404e). I had a simplified Micron memory power model and the simple
> frequency scaling mentioned above implemented on top of that old commit.
> Recently we moved to a more recent Gem5 commit (commit: 4a411f) that has a
> more detailed power and performance model than the old one. I am trying to
> put together a quick DFS implementation here and observe the trends of
> energy and performance vs. memory frequency. Exploring low-power modes will
> then be my next step.
>
> I am able to express the timing parameters that are specific to the DRAM
> module in terms of the memory frequency. Some of the DRAM-related timing
> parameters are static latencies, and some are functions of tCK (details
> taken from the Micron datasheet). From the discussion in this thread so
> far, I understand that the PHY also works in sync with the memory
> frequency, while the MC should have either its own clock domain or run in
> the L1/L2/core clock domain. However, given that I don't have a good model
> for the MC and PHY latencies, for now I plan to scale only the DRAM-related
> parameters and leave the PHY/MC static latencies as they are.
>
> I appreciate yours and Tao's inputs so far, and would be happy to receive
> any more ideas you have regarding my DFS implementation approach.
>
> Thank you,
> -Rizwana
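As a rough illustration of the scaling described above, a gem5 config-script sketch (not from the thread) might adjust only the clock-derived timings and leave the analog DRAM timings absolute. The class name DDR3_1600_x64, the nominal 800 MHz / 1.25 ns / 5 ns values, and the assumption that only tCK and tBURST scale are illustrative; the exact parameter names, and which timings really are tCK multiples, have to be checked against src/mem/DRAMCtrl.py in the commit in use and the Micron datasheet.

    # Sketch only: scale the clock-derived DRAM timings for a bus-frequency
    # DFS experiment. Assumes gem5's DDR3_1600_x64 timing class; parameter
    # names and defaults should be verified against the commit in use.
    from m5.objects import DDR3_1600_x64

    def scaled_dram(freq_mhz, nominal_mhz=800.0):
        ratio = nominal_mhz / float(freq_mhz)   # > 1 when running slower
        dram = DDR3_1600_x64()
        dram.tCK = '%.3fns' % (1.25 * ratio)    # clock period scales directly
        dram.tBURST = '%.3fns' % (5.0 * ratio)  # BL/2 bus clocks, so it tracks tCK
        # tRCD, tRP, tRAS, tRFC, etc. are analog core timings: either leave
        # them absolute or re-derive them from the datasheet cycle counts at
        # the new tCK.
        return dram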
On Thu, Jan 22, 2015 at 5:10 PM, Andreas Hansson <andreas.hans...@arm.com> wrote:

>> Hi Rizwana,
>>
>> All objects belong to a clock domain. That said, there is no clock domain
>> specifically for the memory controller in the example scripts. At the
>> moment all the timings in the controller are absolute, and are not
>> expressed in cycles.
>>
>> In general the best strategy to modulate DRAM performance is to use the
>> low-power modes (rather than DVFS). The energy spent is far from
>> proportional, and thus it is better to be completely off when possible. We
>> have some patches that add low-power modes to the controller and they
>> should hopefully be on RB soon.
>>
>> Andreas
>>
>> -----
>> On Jan 22, 2015 at 9:59 PM, Rizwana Begum <rizwana....@gmail.com> wrote:
>>
>> Ah, I see. Thanks for pointing me to the static latencies, I missed that.
>> As the controller latency is modeled as a static latency, am I right in
>> saying that, as of the latest commit, the MC is not attached to any clock
>> domain in Gem5?
>>
>> Thanks,
>> -Rizwana
>>
>> On Thursday, January 22, 2015, Andreas Hansson via gem5-users <gem5-users@gem5.org> wrote:
>>
>> > Hi Rizwana,
>> >
>> > The DRAM controller has two parameters to control the static latency in
>> > the controller itself, and also the PHY and the actual round trip to the
>> > device. These parameters are called front-end and back-end latency, and
>> > you can set them to match a given controller architecture and/or PHY
>> > implementation. That should be enough for most cases, I would think.
>> >
>> > Andreas
>> >
>> > -----
>> > On Jan 22, 2015 at 9:22 PM, Rizwana Begum via gem5-users <gem5-users@gem5.org> wrote:
>> >
>> > Great, that was helpful. So am I right in assuming that the Gem5 DRAM
>> > controller model doesn't account for signal propagation delay on the
>> > command/data bus? I am coming to this conclusion because the read
>> > response event from the MC is scheduled to the upper ports tCL + tBURST
>> > after the read command is issued. In fact, I had a chance to use
>> > DRAMSim2 in the past, and I don't remember signal propagation delay
>> > being accounted for there either. Is it too small to matter, so it can
>> > safely be ignored?
>> >
>> > Thanks,
>> > -Rizwana
>> >
>> > On Thu, Jan 22, 2015 at 2:58 PM, Tao Zhang <tao.zhang.0...@gmail.com> wrote:
>> >
>> > > The timing of RL (aka tCL) is dedicated to the DRAM module. It is the
>> > > time from when the DRAM module receives the CAS command to when it
>> > > puts the first data on the interface/bus. On the MC/PHY side, one
>> > > should account for the signal propagation delay on the command/data
>> > > bus. In fact, the DQS signal is also used to assist read data sampling.
>> > >
>> > > The bus protocol is defined by JEDEC. It is completely different from
>> > > AMBA/AHB. The bus has only one master (the MC) and may have multiple
>> > > slaves (DRAM ranks), so it looks a bit like AHB-Lite, but in general
>> > > they are two different stories.
>> > >
>> > > -Tao
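To make the numbers in this exchange concrete, here is a small back-of-the-envelope calculation, assuming DDR3-1600 timings (800 MHz bus clock, CL = 11) and a 64-bit, BL8 interface; other speed grades give different values.

    # Illustration only, with assumed DDR3-1600 / CL11 numbers.
    tCK = 1.25                 # ns, bus clock period at 800 MHz
    CL = 11                    # read latency (RL) in bus clock cycles
    burst_length = 8           # beats per burst (BL8)
    bus_width_bytes = 8        # 64-bit data bus

    tCL = CL * tCK                       # 13.75 ns from CAS to first data beat
    tBURST = burst_length // 2 * tCK     # DDR: 2 beats per clock -> 4 clocks = 5 ns
    bytes_per_burst = burst_length * bus_width_bytes   # 64 bytes per burst

    print(tCL, tBURST, bytes_per_burst)  # 13.75 5.0 64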
>> > > On Thu, Jan 22, 2015 at 11:50 AM, Rizwana Begum <rizwana....@gmail.com> wrote:
>> > >
>> > >> Thanks Tao for your response. That clarifies a lot of my questions.
>> > >> So here is what I understand: the DRAM module runs at a particular
>> > >> clock frequency, the bus connecting the DRAM module and the PHY runs
>> > >> in sync with this clock, and the PHY likewise runs synchronously to
>> > >> the DRAM clock. Now, for a 64-bit bus and a burst length of 8 (64
>> > >> bytes transferred per burst), my understanding of a read operation is
>> > >> that, after the read command is issued, the first data becomes
>> > >> available after the read latency. At the first clock edge after the
>> > >> read latency, 8 bytes are sampled and transferred over the bus; then
>> > >> on every subsequent rising and falling clock edge, 8 more bytes are
>> > >> sampled and transferred, for four consecutive clock cycles. Thereby
>> > >> the PHY has received all 64 bytes by the end of the read latency + 4
>> > >> clock cycles. Is this right?
>> > >>
>> > >> Also, any idea whether this bus (connecting DRAM and PHY) is the same
>> > >> as the system bus? For example, is it AMBA/AHB on the latest ARM SoCs?
>> > >>
>> > >> Thanks again,
>> > >> -Rizwana
>> > >>
>> > >> On Thu, Jan 22, 2015 at 12:49 PM, Tao Zhang <tao.zhang.0...@gmail.com> wrote:
>> > >>
>> > >>> Hi Rizwana,
>> > >>>
>> > >>> See my understanding inline. Thanks,
>> > >>>
>> > >>> -Tao
>> > >>>
>> > >>> On Thu, Jan 22, 2015 at 8:12 AM, Rizwana Begum via gem5-users <gem5-users@gem5.org> wrote:
>> > >>>
>> > >>>> Hello Andreas,
>> > >>>>
>> > >>>> Thanks for the reply. Sure, I will try to get the patch up on the
>> > >>>> review board.
>> > >>>>
>> > >>>> I have another question. Though this is related to DDR/MC
>> > >>>> architecture and not directly to the Gem5 DDR model implementation,
>> > >>>> I am hoping you (or anyone else on the list) has a good enough
>> > >>>> understanding to clarify my confusion.
>> > >>>>
>> > >>>> As far as I understand, 'busBusyUntil' represents the data bus.
>> > >>>> This variable is used to keep track of data bus availability:
>> > >>>>
>> > >>>> 1. Is the data bus the bus used to transfer data from the core DRAM
>> > >>>> module to the PHY?
>> > >>>
>> > >>> Yes, you are right. In addition, this is also the bus used to
>> > >>> transfer data from the PHY to the DRAM module.
>> > >>>
>> > >>>> 2. I believe the PHY is the DRAM physical interface IP. Where is it
>> > >>>> physically located? Is it located on the core alongside the memory
>> > >>>> controller (MC), or on the DIMMs? And what exactly does the
>> > >>>> physical bus (the wires connecting the DIMMs to the MC) connect:
>> > >>>> DRAM and PHY, or PHY and MC?
>> > >>>
>> > >>> It is on the core/MC side. The physical bus connects DRAM and PHY.
>> > >>> Logically, you can treat the PHY as a part of the MC that just
>> > >>> incurs some extra latency. In this view, the physical bus can be
>> > >>> thought of as connecting DRAM and MC.
>> > >>>
>> > >>>> 3. My confusion is that the actual physical bus on the SoC
>> > >>>> connecting the DRAM module and the MC should be different from the
>> > >>>> data bus that 'busBusyUntil' represents. It takes tBURST ns (a
>> > >>>> function of memory cycles) to transfer one burst on the data bus,
>> > >>>> and the actual physical bus speed shouldn't depend on the memory
>> > >>>> frequency for transferring data from DRAM to MC. Am I right?
>> > >>>
>> > >>> "busBusyUntil" is still valid. The actual physical bus speed should
>> > >>> be consistent with the spec (e.g., 800 MHz, 933 MHz, 1600 MHz, ...).
>> > >>> Remember, JEDEC DRAM is a synchronous DRAM: both the PHY and the
>> > >>> DRAM module are in sync with the same clock. As one end of the
>> > >>> connection is the DRAM module, the PHY runs at the same frequency as
>> > >>> the DRAM module.
>> > >>>
>> > >>>> I would appreciate it if anyone could provide insight into these
>> > >>>> questions.
>> > >>>>
>> > >>>> Thank you,
>> > >>>> -Rizwana
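Tying this back to the front-end/back-end latencies mentioned earlier: in the gem5 controller model the MC and PHY overheads are lumped into two static parameters, roughly as sketched below. The parameter names and the 10 ns values are assumed from the DRAMCtrl model of that era and should be checked against src/mem/DRAMCtrl.py in the commit in use.

    # Sketch only: the controller/PHY overheads as two absolute latencies.
    from m5.objects import DDR3_1600_x64

    dram = DDR3_1600_x64()
    dram.static_frontend_latency = '10ns'  # controller pipeline / queuing overhead
    dram.static_backend_latency = '10ns'   # PHY and round trip to the device
    # These are absolute times, not cycle counts, so they stay fixed when the
    # DRAM timings are rescaled for a different memory frequency.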
>> > >>>> On Wed, Jan 21, 2015 at 4:45 PM, Andreas Hansson <andreas.hans...@arm.com> wrote:
>> > >>>>
>> > >>>>> Hi Rizwana,
>> > >>>>>
>> > >>>>> It could very well be that you've hit a bug. I'd suggest posting a
>> > >>>>> review on the reviewboard to make it clearer what changes need to
>> > >>>>> be done. If you're not familiar with the process, have a look at
>> > >>>>> http://www.gem5.org/Commit_Access. The easiest is to use the
>> > >>>>> reviewboard mercurial plugin.
>> > >>>>>
>> > >>>>> I look forward to seeing the patch.
>> > >>>>>
>> > >>>>> Thanks,
>> > >>>>>
>> > >>>>> Andreas
>> > >>>>>
>> > >>>>> From: Rizwana Begum via gem5-users <gem5-users@gem5.org>
>> > >>>>> Reply-To: Rizwana Begum <rizwana....@gmail.com>, gem5 users mailing list <gem5-users@gem5.org>
>> > >>>>> Date: Wednesday, 21 January 2015 16:24
>> > >>>>> To: gem5 users mailing list <gem5-users@gem5.org>
>> > >>>>> Subject: [gem5-users] DRAM controller write requests merge
>> > >>>>>
>> > >>>>> Hello Users,
>> > >>>>>
>> > >>>>> I am trying to understand how write packets are queued in the DRAM
>> > >>>>> controller model, and I am looking at the 'addToWriteQueue'
>> > >>>>> function. From my understanding so far, it merges write requests
>> > >>>>> across burst boundaries. Looking at the following if statement:
>> > >>>>>
>> > >>>>>     if ((addr + size) >= (*w)->addr &&
>> > >>>>>         ((*w)->addr + (*w)->size - addr) <= burstSize) {
>> > >>>>>         // the new one is just before or partially
>> > >>>>>         // overlapping with the existing one, and together
>> > >>>>>         // they fit within a burst
>> > >>>>>         ....
>> > >>>>>         ....
>> > >>>>>         ....
>> > >>>>>     }
>> > >>>>>
>> > >>>>> Merging here may make the write request go across a burst boundary.
>> > >>>>> The size computation at the beginning of the for loop of this
>> > >>>>> function suggests that packets are split at burst boundaries. For
>> > >>>>> example, if the packet addr is 16, the burst size is 32 bytes and
>> > >>>>> the packet request size is 25 bytes (all in decimal for ease),
>> > >>>>> then two write bursts should be added to the queue: 16-31 and
>> > >>>>> 32-40. However, while merging, suppose there already existed a
>> > >>>>> packet in the write queue covering 32-40; then a write covering
>> > >>>>> 16-40 is added to the queue, which crosses a burst boundary. Is
>> > >>>>> that physically possible? Shouldn't there be two write requests in
>> > >>>>> the queue, 16-31 and 32-40, instead of one single merged request?
>> > >>>>>
>> > >>>>> Thank you,
>> > >>>>> -Rizwana
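For reference, the burst-boundary arithmetic from the example above can be sketched as follows (illustration only, not the gem5 code). The question is whether the merge path can leave a single 16-40 entry in the queue where this split would produce two.

    # Illustration of splitting an access at burst boundaries.
    def split_into_bursts(addr, size, burst_size):
        """Return inclusive (start, end) byte ranges, one per burst-aligned chunk."""
        chunks = []
        end = addr + size                  # one past the last byte written
        while addr < end:
            next_boundary = (addr // burst_size + 1) * burst_size
            chunk_end = min(end, next_boundary)
            chunks.append((addr, chunk_end - 1))
            addr = chunk_end
        return chunks

    print(split_into_bursts(16, 25, 32))   # [(16, 31), (32, 40)]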