Hello, Just as a side note: I messed up the commit IDs:
Our *old Gem5 commit* was:

commit ad4564da49b5e9d3f94b8015b7d0e02432bc20b8
Author: Chander Sudanthi <chander.sudan...@arm.com>
Date: Thu Oct 31 13:41:13 2013 -0500

    ARM: add support for TEEHBR access

    Thumb2 ARM kernels may access the TEEHBR via thumbee_notifier in
    arch/arm/kernel/thumbee.c. The Linux kernel code just seems to be saving
    and restoring the register. This patch adds support for the TEEHBR cp14
    register. Note, this may be a special case when restoring from an image
    that was run on a system that supports ThumbEE.

Our *new Gem5 commit* is:

commit 1c9d5d9417b35e002d65c3eb329b372fd2320055
Author: Andreas Hansson <andreas.hans...@arm.com>
Date: Tue Dec 2 06:08:25 2014 -0500

    stats: Bump stats for fixes, mostly TLB and WriteInvalidate

Thank you,
-Rizwana

On Thu, Jan 22, 2015 at 6:59 PM, Rizwana Begum <rizwana....@gmail.com> wrote:

> Hello Andreas,
>
> I totally agree with you that low-power modes are the way to go for better
> energy-performance trade-offs in memory, rather than DFS. However, in the
> past I have experimented with DFS for the memory bus only. With a change in
> memory frequency I scaled only tBURST linearly with frequency, and observed
> the performance vs. frequency trend for memory-intensive benchmarks under
> the open-page policy. For the closed-page policy I saw no performance
> improvement with increasing memory frequency, as the command-to-command
> static latency of the DRAM dominates. I also saw an energy vs. frequency
> trade-off, since background energy scales with memory frequency.
>
> All of the above DFS exploration was done on an old Gem5 commit (commit:
> d2404e). I had a simplified Micron memory power model and the simple
> frequency scaling mentioned above implemented on top of that old commit.
> Recently we moved to a more recent Gem5 commit (commit: 4a411f) that has a
> more detailed power and performance model than the old one. I am trying to
> put together a quick DFS implementation here and observe the trends of
> energy and performance vs. memory frequency. Exploring low-power modes will
> then be my next step.
>
> I am able to express the timing parameters that are specific to the DRAM
> module in terms of the memory frequency. Some of the DRAM-related timing
> parameters are static latencies, and some are functions of tCK (details
> taken from the Micron datasheet). From the discussion in this thread so
> far, I understand that the PHY also works in sync with the memory
> frequency, while the MC should have either its own clock domain or run in
> the L1/L2/core clock domain. However, given that I don't have a good model
> for the MC and PHY latencies, for now I plan to scale only the DRAM-related
> parameters and leave the PHY/MC static latencies as they are.
>
> I appreciate yours and Tao's inputs so far, and would be happy to receive
> any more ideas you have regarding my DFS implementation approach.
>
> Thank you,
> -Rizwana
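As a rough illustration of the scaling described above, a gem5 config-script sketch (not from the thread) might adjust only the clock-derived timings and leave the analog DRAM timings absolute. The class name DDR3_1600_x64, the nominal 800 MHz / 1.25 ns / 5 ns values, and the assumption that only tCK and tBURST scale are illustrative; the exact parameter names, and which timings really are tCK multiples, have to be checked against src/mem/DRAMCtrl.py in the commit in use and the Micron datasheet.

    # Sketch only: scale the clock-derived DRAM timings for a bus-frequency
    # DFS experiment. Assumes gem5's DDR3_1600_x64 timing class; parameter
    # names and defaults should be verified against the commit in use.
    from m5.objects import DDR3_1600_x64

    def scaled_dram(freq_mhz, nominal_mhz=800.0):
        ratio = nominal_mhz / float(freq_mhz)   # > 1 when running slower
        dram = DDR3_1600_x64()
        dram.tCK = '%.3fns' % (1.25 * ratio)    # clock period scales directly
        dram.tBURST = '%.3fns' % (5.0 * ratio)  # BL/2 bus clocks, so it tracks tCK
        # tRCD, tRP, tRAS, tRFC, etc. are analog core timings: either leave
        # them absolute or re-derive them from the datasheet cycle counts at
        # the new tCK.
        return dram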
On Thu, Jan 22, 2015 at 5:10 PM, Andreas Hansson <andreas.hans...@arm.com> wrote:

>> Hi Rizwana,
>>
>> All objects belong to a clock domain. That said, there is no clock domain
>> specifically for the memory controller in the example scripts. At the
>> moment all the timings in the controller are absolute, and are not
>> expressed in cycles.
>>
>> In general the best strategy to modulate DRAM performance is to use the
>> low-power modes (rather than DVFS). The energy spent is far from
>> proportional, and thus it is better to be completely off when possible. We
>> have some patches that add low-power modes to the controller and they
>> should hopefully be on RB soon.
>>
>> Andreas
>>
>> -----
>> On Jan 22, 2015 at 9:59 PM, Rizwana Begum <rizwana....@gmail.com> wrote:
>>
>> Ah, I see. Thanks for pointing me to the static latencies, I missed that.
>> As the controller latency is modeled as a static latency, am I right in
>> saying that, as of the latest commit, the MC is not attached to any clock
>> domain in Gem5?
>>
>> Thanks,
>> -Rizwana
>>
>> On Thursday, January 22, 2015, Andreas Hansson via gem5-users <gem5-users@gem5.org> wrote:
>>
>> > Hi Rizwana,
>> >
>> > The DRAM controller has two parameters to control the static latency in
>> > the controller itself, and also the PHY and the actual round trip to the
>> > device. These parameters are called front-end and back-end latency, and
>> > you can set them to match a given controller architecture and/or PHY
>> > implementation. That should be enough for most cases, I would think.
>> >
>> > Andreas
>> >
>> > -----
>> > On Jan 22, 2015 at 9:22 PM, Rizwana Begum via gem5-users <gem5-users@gem5.org> wrote:
>> >
>> > Great, that was helpful. So am I right in assuming that the Gem5 DRAM
>> > controller model doesn't account for signal propagation delay on the
>> > command/data bus? I am coming to this conclusion because the read
>> > response event from the MC is scheduled to the upper ports tCL + tBURST
>> > after the read command is issued. In fact, I had a chance to use
>> > DRAMSim2 in the past, and I don't remember signal propagation delay
>> > being accounted for there either. Is it too small to matter, so it can
>> > safely be ignored?
>> >
>> > Thanks,
>> > -Rizwana
>> >
>> > On Thu, Jan 22, 2015 at 2:58 PM, Tao Zhang <tao.zhang.0...@gmail.com> wrote:
>> >
>> > > The timing of RL (aka tCL) is dedicated to the DRAM module. It is the
>> > > time from when the DRAM module receives the CAS command to when it
>> > > puts the first data on the interface/bus. On the MC/PHY side, one
>> > > should account for the signal propagation delay on the command/data
>> > > bus. In fact, the DQS signal is also used to assist read data sampling.
>> > >
>> > > The bus protocol is defined by JEDEC. It is completely different from
>> > > AMBA/AHB. The bus has only one master (the MC) and may have multiple
>> > > slaves (DRAM ranks), so it looks a bit like AHB-Lite, but in general
>> > > they are two different stories.
>> > >
>> > > -Tao
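To make the numbers in this exchange concrete, here is a small back-of-the-envelope calculation, assuming DDR3-1600 timings (800 MHz bus clock, CL = 11) and a 64-bit, BL8 interface; other speed grades give different values.

    # Illustration only, with assumed DDR3-1600 / CL11 numbers.
    tCK = 1.25                 # ns, bus clock period at 800 MHz
    CL = 11                    # read latency (RL) in bus clock cycles
    burst_length = 8           # beats per burst (BL8)
    bus_width_bytes = 8        # 64-bit data bus

    tCL = CL * tCK                       # 13.75 ns from CAS to first data beat
    tBURST = burst_length // 2 * tCK     # DDR: 2 beats per clock -> 4 clocks = 5 ns
    bytes_per_burst = burst_length * bus_width_bytes   # 64 bytes per burst

    print(tCL, tBURST, bytes_per_burst)  # 13.75 5.0 64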
>> > > On Thu, Jan 22, 2015 at 11:50 AM, Rizwana Begum <rizwana....@gmail.com> wrote:
>> > >
>> > >> Thanks Tao for your response. That clarifies a lot of my questions.
>> > >> So here is what I understand: the DRAM module runs at a particular
>> > >> clock frequency, the bus connecting the DRAM module and the PHY runs
>> > >> in sync with this clock, and the PHY likewise runs synchronously to
>> > >> the DRAM clock. Now, for a 64-bit bus and a burst length of 8 (64
>> > >> bytes transferred per burst), my understanding of a read operation is
>> > >> that, after the read command is issued, the first data becomes
>> > >> available after the read latency. At the first clock edge after the
>> > >> read latency, 8 bytes are sampled and transferred over the bus; then
>> > >> on every subsequent rising and falling clock edge, 8 more bytes are
>> > >> sampled and transferred, for four consecutive clock cycles. Thereby
>> > >> the PHY has received all 64 bytes by the end of the read latency + 4
>> > >> clock cycles. Is this right?
>> > >>
>> > >> Also, any idea whether this bus (connecting DRAM and PHY) is the same
>> > >> as the system bus? For example, is it AMBA/AHB on the latest ARM SoCs?
>> > >>
>> > >> Thanks again,
>> > >> -Rizwana
>> > >>
>> > >> On Thu, Jan 22, 2015 at 12:49 PM, Tao Zhang <tao.zhang.0...@gmail.com> wrote:
>> > >>
>> > >>> Hi Rizwana,
>> > >>>
>> > >>> See my understanding inline. Thanks,
>> > >>>
>> > >>> -Tao
>> > >>>
>> > >>> On Thu, Jan 22, 2015 at 8:12 AM, Rizwana Begum via gem5-users <gem5-users@gem5.org> wrote:
>> > >>>
>> > >>>> Hello Andreas,
>> > >>>>
>> > >>>> Thanks for the reply. Sure, I will try to get the patch up on the
>> > >>>> review board.
>> > >>>>
>> > >>>> I have another question. Though this is related to DDR/MC
>> > >>>> architecture and not directly to the Gem5 DDR model implementation,
>> > >>>> I am hoping you (or anyone else on the list) has a good enough
>> > >>>> understanding to clarify my confusion.
>> > >>>>
>> > >>>> As far as I understand, 'busBusyUntil' represents the data bus.
>> > >>>> This variable is used to keep track of data bus availability:
>> > >>>>
>> > >>>> 1. Is the data bus the bus used to transfer data from the core DRAM
>> > >>>> module to the PHY?
>> > >>>
>> > >>> Yes, you are right. In addition, this is also the bus used to
>> > >>> transfer data from the PHY to the DRAM module.
>> > >>>
>> > >>>> 2. I believe the PHY is the DRAM physical interface IP. Where is it
>> > >>>> physically located? Is it located on the core alongside the memory
>> > >>>> controller (MC), or on the DIMMs? And what exactly does the
>> > >>>> physical bus (the wires connecting the DIMMs to the MC) connect:
>> > >>>> DRAM and PHY, or PHY and MC?
>> > >>>
>> > >>> It is on the core/MC side. The physical bus connects DRAM and PHY.
>> > >>> Logically, you can treat the PHY as a part of the MC that just
>> > >>> incurs some extra latency. In this view, the physical bus can be
>> > >>> thought of as connecting DRAM and MC.
>> > >>>
>> > >>>> 3. My confusion is that the actual physical bus on the SoC
>> > >>>> connecting the DRAM module and the MC should be different from the
>> > >>>> data bus that 'busBusyUntil' represents. It takes tBURST ns (a
>> > >>>> function of memory cycles) to transfer one burst on the data bus,
>> > >>>> and the actual physical bus speed shouldn't depend on the memory
>> > >>>> frequency for transferring data from DRAM to MC. Am I right?
>> > >>>
>> > >>> "busBusyUntil" is still valid. The actual physical bus speed should
>> > >>> be consistent with the spec (e.g., 800 MHz, 933 MHz, 1600 MHz, ...).
>> > >>> Remember, JEDEC DRAM is a synchronous DRAM: both the PHY and the
>> > >>> DRAM module are in sync with the same clock. As one end of the
>> > >>> connection is the DRAM module, the PHY runs at the same frequency as
>> > >>> the DRAM module.
>> > >>>
>> > >>>> I would appreciate it if anyone could provide insight into these
>> > >>>> questions.
>> > >>>>
>> > >>>> Thank you,
>> > >>>> -Rizwana
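Tying this back to the front-end/back-end latencies mentioned earlier: in the gem5 controller model the MC and PHY overheads are lumped into two static parameters, roughly as sketched below. The parameter names and the 10 ns values are assumed from the DRAMCtrl model of that era and should be checked against src/mem/DRAMCtrl.py in the commit in use.

    # Sketch only: the controller/PHY overheads as two absolute latencies.
    from m5.objects import DDR3_1600_x64

    dram = DDR3_1600_x64()
    dram.static_frontend_latency = '10ns'  # controller pipeline / queuing overhead
    dram.static_backend_latency = '10ns'   # PHY and round trip to the device
    # These are absolute times, not cycle counts, so they stay fixed when the
    # DRAM timings are rescaled for a different memory frequency.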
>> > >>>> On Wed, Jan 21, 2015 at 4:45 PM, Andreas Hansson <andreas.hans...@arm.com> wrote:
>> > >>>>
>> > >>>>> Hi Rizwana,
>> > >>>>>
>> > >>>>> It could very well be that you've hit a bug. I'd suggest posting a
>> > >>>>> review on the reviewboard to make it clearer what changes need to
>> > >>>>> be done. If you're not familiar with the process, have a look at
>> > >>>>> http://www.gem5.org/Commit_Access. The easiest is to use the
>> > >>>>> reviewboard mercurial plugin.
>> > >>>>>
>> > >>>>> I look forward to seeing the patch.
>> > >>>>>
>> > >>>>> Thanks,
>> > >>>>>
>> > >>>>> Andreas
>> > >>>>>
>> > >>>>> From: Rizwana Begum via gem5-users <gem5-users@gem5.org>
>> > >>>>> Reply-To: Rizwana Begum <rizwana....@gmail.com>, gem5 users mailing list <gem5-users@gem5.org>
>> > >>>>> Date: Wednesday, 21 January 2015 16:24
>> > >>>>> To: gem5 users mailing list <gem5-users@gem5.org>
>> > >>>>> Subject: [gem5-users] DRAM controller write requests merge
>> > >>>>>
>> > >>>>> Hello Users,
>> > >>>>>
>> > >>>>> I am trying to understand how write packets are queued in the DRAM
>> > >>>>> controller model, and I am looking at the 'addToWriteQueue'
>> > >>>>> function. From my understanding so far, it merges write requests
>> > >>>>> across burst boundaries. Looking at the following if statement:
>> > >>>>>
>> > >>>>>     if ((addr + size) >= (*w)->addr &&
>> > >>>>>         ((*w)->addr + (*w)->size - addr) <= burstSize) {
>> > >>>>>         // the new one is just before or partially
>> > >>>>>         // overlapping with the existing one, and together
>> > >>>>>         // they fit within a burst
>> > >>>>>         ....
>> > >>>>>         ....
>> > >>>>>         ....
>> > >>>>>     }
>> > >>>>>
>> > >>>>> Merging here may make the write request go across a burst boundary.
>> > >>>>> The size computation at the beginning of the for loop of this
>> > >>>>> function suggests that packets are split at burst boundaries. For
>> > >>>>> example, if the packet addr is 16, the burst size is 32 bytes and
>> > >>>>> the packet request size is 25 bytes (all in decimal for ease),
>> > >>>>> then two write bursts should be added to the queue: 16-31 and
>> > >>>>> 32-40. However, while merging, suppose there already existed a
>> > >>>>> packet in the write queue covering 32-40; then a write covering
>> > >>>>> 16-40 is added to the queue, which crosses a burst boundary. Is
>> > >>>>> that physically possible? Shouldn't there be two write requests in
>> > >>>>> the queue, 16-31 and 32-40, instead of one single merged request?
>> > >>>>>
>> > >>>>> Thank you,
>> > >>>>> -Rizwana
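For reference, the burst-boundary arithmetic from the example above can be sketched as follows (illustration only, not the gem5 code). The question is whether the merge path can leave a single 16-40 entry in the queue where this split would produce two.

    # Illustration of splitting an access at burst boundaries.
    def split_into_bursts(addr, size, burst_size):
        """Return inclusive (start, end) byte ranges, one per burst-aligned chunk."""
        chunks = []
        end = addr + size                  # one past the last byte written
        while addr < end:
            next_boundary = (addr // burst_size + 1) * burst_size
            chunk_end = min(end, next_boundary)
            chunks.append((addr, chunk_end - 1))
            addr = chunk_end
        return chunks

    print(split_into_bursts(16, 25, 32))   # [(16, 31), (32, 40)]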