Re: [Nouveau] EVoC Proposal: REclock - Reverse-engineer and implement NVA3/5/8 Voltage- and Frequency Scaling in Nouveau

Martin Peres Thu, 12 Jun 2014 16:15:07 -0700

On 11/06/2014 13:59, Roy Spliet wrote:

Dear Mr. Dew,
I hereby wish to propose the X.org EVoC project "REclock - Reverse-engineer and implement NVA3/5/8 Voltage- and Frequency Scaling in Nouveau", in which I am willing to participate, and to apply for the associated funding. Full details are below or on http://nouveau.spliet.org/evoc.html . For any further questions, feel free to contact me either on Freenode IRC (rspliet) or by e-mail to this address.

Thank you for your consideration, and I look forward to hearing more from you soon.

Yours,
Roy Spliet


Hello Roy,

Thank you for your proposal. After careful consideration, the board of directors has accepted it. You may start your EVoC on June 16th. Our treasurer will contact you privately to get your banking information and send you an initial payment along with 250€ for buying the hardware you need.

We wish you and your mentor the best of luck on this project! Do not hesitate to contact us if you have any questions.


Martin Peres, on behalf of the board of directors

---


  REclock: Reverse-engineer and implement NVA3/5/8 Voltage- and
  Frequency Scaling in Nouveau
NVIDIA graphics cards often support running at a variety of different performance "levels". This helps reduce the power demand and heat dissipation of the devices when idle, while unleashing their full potential under load. A performance level comprises the clock speed and voltage for several subcomponents in the GPU. The difference between the lowest and highest performance level can be as much as a factor of 10 in clock speed.
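Because a performance level couples clock speeds with a voltage, the order in which the two change matters. The following is a minimal sketch, assuming a simplified level with a single core clock and one voltage (the `perf_level` struct and `plan_switch` helper are hypothetical, not nouveau code), of the common DVFS rule: raise the voltage before raising clocks, and lower it only after the clocks have dropped.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified performance level: one core clock and one
 * voltage. Real levels carry clocks for several subcomponents. */
struct perf_level {
    uint32_t core_khz;
    uint32_t voltage_uv;
};

enum step { STEP_SET_VOLTAGE, STEP_SET_CLOCKS };

/* Fill `out` with the order of operations for switching from `cur` to
 * `next`, so the chip is never clocked faster than its present voltage
 * supports. Returns the number of steps written. */
size_t plan_switch(const struct perf_level *cur,
                   const struct perf_level *next,
                   enum step out[2])
{
    if (next->voltage_uv > cur->voltage_uv) {
        out[0] = STEP_SET_VOLTAGE;   /* going up: voltage first */
        out[1] = STEP_SET_CLOCKS;
    } else {
        out[0] = STEP_SET_CLOCKS;    /* going down: clocks first */
        out[1] = STEP_SET_VOLTAGE;
    }
    return 2;
}
```

The same ordering applies no matter how many subcomponent clocks a level contains; only the decision of when to touch the voltage regulator changes.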
Despite hard work from many developers, reclocking support in Nouveau still has quite a few loose ends: engine reclocking is mostly in place but not always reliable, several routines related to memory reclocking are missing, and in general the actions required to perform voltage and frequency scaling are understood only partially, if at all. Because of this, NVIDIA GPUs driven by nouveau are limited to the boot speed and voltage, severely limiting performance and usability.
For this project, I aim to tie these loose ends together for NVIDIA's NVA3/5/8 GPUs. I intend to fully reverse engineer several subcomponents related to voltage and frequency scaling, try to get a full understanding of the clock tree, and use this knowledge to further improve nouveau's voltage and frequency scaling implementation for these GPUs.
      Personal information
My name is Roy Spliet. I graduated with a master's degree from Delft University of Technology (TU Delft) and plan to continue my academic career as a PhD student in computer architecture. My background includes kernel/driver development (nouveau, LITMUS^RT) and GPGPU programming in OpenCL.
My previous involvement in Nouveau includes successfully reverse-engineering and implementing reclocking support for the memory-less NVIDIA NVAA and NVAC chipsets, alongside many contributions to memory reclocking for pre-NVC0 (Fermi) GPUs. For more details about my personal background, please consult http://roy.spliet.org.
    Background
NVIDIA GPUs feature a complex multi-layer clock tree that allows clock speeds to be altered per subcomponent. The precise clock tree is a network consisting of one or more input clocks, several fixed dividers, and a lot of routing to distribute these clocks to every subcomponent. On the last level there is usually a Phase-Locked Loop (PLL) that can take either the original clock or one of several divided clocks as input, and bring this clock up to the desired frequency for the associated subcomponent. Control registers select the precise input of these PLLs, and can additionally be configured to bypass the PLLs.
The video BIOS (VBIOS) provides two services: it brings the GPU into an initial valid state, and it contains crucial information regarding reclocking. Most importantly, the VBIOS describes the ranges of each PLL in the system. On a higher level, the VBIOS also contains several "performance levels", each consisting of a clock speed for each subcomponent. NVIDIA's driver switches between these performance levels based on the load. For most engines, this switch consists of bypassing the PLL, setting it to a new value, testing the newly set values, and then re-enabling the PLL.
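To make "bring this clock up to the desired frequency" concrete, here is a sketch of choosing PLL coefficients within given limits. It assumes a simplified single-stage model, vco = ref * N / M and out = vco / P with a linear post-divider P; the real NVA3/5/8 PLLs may use a different post-divider encoding or two stages, and the `pll_limits` fields are illustrative rather than nouveau's structures.

```c
#include <stdint.h>

/* Assumed limits for one PLL, roughly what a VBIOS PLL-limits entry
 * describes. Field names are illustrative. m_min and p_min must be >= 1. */
struct pll_limits {
    uint32_t ref_khz;                 /* input (reference) clock */
    uint32_t vco_min_khz, vco_max_khz;
    uint32_t n_min, n_max;
    uint32_t m_min, m_max;
    uint32_t p_min, p_max;            /* linear post-divider in this model */
};

struct pll_coefs { uint32_t n, m, p; };

/* Exhaustively try every M and P, pick the rounded N for each, and keep
 * the setting whose output is closest to target_khz.
 * Returns the achieved frequency in kHz, or 0 if nothing fits. */
uint32_t pll_calc(const struct pll_limits *lim, uint32_t target_khz,
                  struct pll_coefs *best)
{
    uint32_t best_khz = 0;
    uint64_t best_err = UINT64_MAX;

    for (uint32_t m = lim->m_min; m <= lim->m_max; m++) {
        for (uint32_t p = lim->p_min; p <= lim->p_max; p++) {
            uint64_t n = ((uint64_t)target_khz * m * p + lim->ref_khz / 2)
                         / lim->ref_khz;
            if (n < lim->n_min || n > lim->n_max)
                continue;
            uint64_t vco = (uint64_t)lim->ref_khz * n / m;
            if (vco < lim->vco_min_khz || vco > lim->vco_max_khz)
                continue;             /* VCO must stay in its valid range */
            uint64_t out = vco / p;
            uint64_t err = out > target_khz ? out - target_khz
                                            : target_khz - out;
            if (err < best_err) {
                best_err = err;
                best_khz = (uint32_t)out;
                best->n = (uint32_t)n;
                best->m = m;
                best->p = p;
            }
        }
    }
    return best_khz;
}
```

An exhaustive search is cheap here because M and P ranges are small; the VCO check is what makes the VBIOS-provided limits essential.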
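The bypass, set, test, re-enable routine can be sketched against an abstract register interface. The control-register bits, register indices and the mock below are all hypothetical (the real NVA3/5/8 layouts are part of what this project sets out to document); the mock only exists so the sequence can be exercised without hardware.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical control-register bits. */
#define PLL_CTRL_ENABLE (1u << 0)
#define PLL_CTRL_BYPASS (1u << 1)
#define PLL_CTRL_LOCK   (1u << 2)  /* read-only: PLL has locked */

enum { REG_CTRL, REG_COEF };

struct pll_regs {
    uint32_t (*rd)(void *ctx, int reg);
    void (*wr)(void *ctx, int reg, uint32_t val);
    void *ctx;
};

/* Route the engine to the bypass clock, load new coefficients, wait for
 * lock, then switch back to the PLL output. On timeout the engine is
 * left on bypass and false is returned. */
bool pll_reprogram(const struct pll_regs *r, uint32_t coef, int max_polls)
{
    uint32_t ctrl = r->rd(r->ctx, REG_CTRL) & ~PLL_CTRL_LOCK;

    r->wr(r->ctx, REG_CTRL, ctrl | PLL_CTRL_BYPASS);          /* 1. bypass */
    r->wr(r->ctx, REG_COEF, coef);                            /* 2. set */
    r->wr(r->ctx, REG_CTRL, ctrl | PLL_CTRL_BYPASS | PLL_CTRL_ENABLE);

    for (int i = 0; i < max_polls; i++) {                     /* 3. test */
        uint32_t now = r->rd(r->ctx, REG_CTRL);
        if (now & PLL_CTRL_LOCK) {
            r->wr(r->ctx, REG_CTRL, now & ~PLL_CTRL_BYPASS);  /* 4. re-enable */
            return true;
        }
    }
    return false;
}

/* Mock hardware: writing coefficients drops lock; lock returns after
 * `lock_delay` polls while enabled. */
struct mock_pll { uint32_t ctrl, coef; int lock_delay; };

static uint32_t mock_rd(void *ctx, int reg)
{
    struct mock_pll *m = ctx;
    if (reg == REG_COEF)
        return m->coef;
    if ((m->ctrl & PLL_CTRL_ENABLE) && !(m->ctrl & PLL_CTRL_LOCK)) {
        if (m->lock_delay > 0)
            m->lock_delay--;
        else
            m->ctrl |= PLL_CTRL_LOCK;
    }
    return m->ctrl;
}

static void mock_wr(void *ctx, int reg, uint32_t val)
{
    struct mock_pll *m = ctx;
    if (reg == REG_COEF) {
        m->coef = val;
        m->ctrl &= ~PLL_CTRL_LOCK;
    } else {
        /* LOCK is read-only: keep the hardware's value */
        m->ctrl = (val & ~PLL_CTRL_LOCK) | (m->ctrl & PLL_CTRL_LOCK);
    }
}
```

Leaving the engine on the bypass clock after a failed lock is a deliberate choice in this sketch: a slow but stable clock is preferable to an unlocked PLL output.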
      Memory reclocking
Memory reclocking is more difficult than reclocking other engines. Besides an input clock, the memory controller also needs to know a variety of latencies, which are usually programmed in clock ticks but mandated in nanoseconds. These latencies, or timings, are described in the VBIOS.
To keep the memory controller and the engines running in sync, a form of link training is also required. Updating all this information must be done according to strict timing requirements; failing to meet these deadlines results in corrupted memory, with all associated consequences. Although the memory itself is often well documented publicly, NVIDIA's memory controller is not. Reverse engineering it is a difficult challenge, as there is very little feedback beyond either a working system or a complete crash.
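The mismatch between nanosecond requirements and tick-based registers means every timing has to be recomputed for each memory clock. A minimal sketch, assuming the latency is given in picoseconds (the VBIOS may store timings in other units or already in ticks; `ps_to_ticks` is a hypothetical helper):

```c
#include <stdint.h>

/* Convert a minimum latency in picoseconds into memory-clock ticks at
 * mclk_mhz. ps * MHz = 1e-6 ticks, hence the division by 1e6.
 * Rounds up: programming fewer ticks than required violates the timing. */
uint32_t ps_to_ticks(uint32_t t_ps, uint32_t mclk_mhz)
{
    uint64_t num = (uint64_t)t_ps * mclk_mhz;
    return (uint32_t)((num + 999999) / 1000000);
}
```

For example, a 15 ns latency needs 6 ticks at 400 MHz but 12 ticks at 800 MHz, which is why the timing registers must be rewritten as part of every memory reclock.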
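How NVIDIA's training unit works internally is unknown, but the general idea behind link training can be illustrated with a generic delay sweep: try every delay setting, find the longest contiguous run of settings where a test pattern reads back correctly, and pick its centre for maximum margin. This is a common technique, not a description of the NVA3/5/8 hardware; `train_delay` and the mock are hypothetical.

```c
#include <stdbool.h>

/* Sweep delay settings 0..n_delays-1 and return the centre of the longest
 * passing window, or -1 if no setting passed. The extra iteration at
 * d == n_delays closes a window that runs to the last setting. */
int train_delay(bool (*passes)(void *ctx, int delay), void *ctx, int n_delays)
{
    int best_start = -1, best_len = 0;
    int run_start = -1;

    for (int d = 0; d <= n_delays; d++) {
        bool ok = d < n_delays && passes(ctx, d);
        if (ok && run_start < 0)
            run_start = d;
        if (!ok && run_start >= 0) {
            int len = d - run_start;
            if (len > best_len) {
                best_len = len;
                best_start = run_start;
            }
            run_start = -1;
        }
    }
    return best_len ? best_start + best_len / 2 : -1;
}

/* Mock pattern test: passes only for delays inside [win[0], win[1]]. */
static bool window_mock(void *ctx, int delay)
{
    const int *win = ctx;
    return delay >= win[0] && delay <= win[1];
}
```

Picking the centre rather than the first passing setting leaves room for drift with temperature and voltage.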
      Reclocking engine
To facilitate reclocking from within the GPU itself, increasing stability in case of operating system failures, NVIDIA added a subcomponent called PDAEMON. This component has full access to many registers accessible through MMIO, including the registers controlling the clocks, latencies and other power-management related features. PDAEMON is a programmable engine supporting the Falcon, or fμc, ISA. NVIDIA's driver uploads the firmware for this engine, dubbed PMU.
PMU is responsible for many power-management related functions, including monitoring temperature, controlling fan speed and monitoring the load on the GPU. To alter clock speeds, the NVIDIA driver can upload special scripts in a language called "seq" that are interpreted by PMU. These scripts contain sequences of registers that need to be adjusted in order, along with required pause commands and other logic. Full understanding of the seq ISA gives full understanding of the actions executed by NVIDIA's driver on a reclock operation, and of their timing.
Nouveau has its own implementation of the PMU microcode, including a scriptable engine offering many of the capabilities implemented in older hardware. However, its capabilities might be insufficient to perform all the tasks that NVIDIA's driver performs through PMU.
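To illustrate what a script of "registers to adjust in order, with pauses" looks like to an interpreter, here is a toy interpreter. The opcodes and their encoding are entirely hypothetical; the real seq encoding is exactly what the project aims to reverse engineer. Registers are simulated by an array and pauses are only accumulated, not executed.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical opcodes, one 32-bit word each, followed by operands. */
enum seq_op { SEQ_END, SEQ_WR32, SEQ_MASK32, SEQ_WAIT_US };

#define NREGS 16

struct seq_vm {
    uint32_t regs[NREGS];   /* stand-in for MMIO registers */
    uint64_t waited_us;     /* total time requested by pause commands */
};

/* Run: WR32 reg val | MASK32 reg and or | WAIT_US n | END.
 * Returns the number of words consumed, or -1 on a malformed script. */
long seq_run(struct seq_vm *vm, const uint32_t *s, size_t len)
{
    size_t pc = 0;
    while (pc < len) {
        switch (s[pc]) {
        case SEQ_END:
            return (long)pc + 1;
        case SEQ_WR32:
            if (pc + 3 > len || s[pc + 1] >= NREGS)
                return -1;
            vm->regs[s[pc + 1]] = s[pc + 2];
            pc += 3;
            break;
        case SEQ_MASK32:    /* read-modify-write: (reg & and) | or */
            if (pc + 4 > len || s[pc + 1] >= NREGS)
                return -1;
            vm->regs[s[pc + 1]] = (vm->regs[s[pc + 1]] & s[pc + 2]) | s[pc + 3];
            pc += 4;
            break;
        case SEQ_WAIT_US:   /* real firmware would actually delay here */
            if (pc + 2 > len)
                return -1;
            vm->waited_us += s[pc + 1];
            pc += 2;
            break;
        default:
            return -1;      /* unknown opcode */
        }
    }
    return -1;              /* ran off the end without SEQ_END */
}
```

A decoder for the real format, as planned in the schedule, would share this structure but print each operation instead of executing it.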
      Current state
Nouveau has a lot of code in place for engine reclocking. Many of the PLLs have been identified, and some of the control registers have been reverse engineered either partially or completely. Although known to work on some GPUs, engine reclocking does not work reliably, at least on my NVA8.
For memory reclocking, some code exists to determine the latencies that the memory and the memory controller need to know. Still, some other features vital for memory reclocking are ill-understood, unimplemented and/or incorrect. In addition, the order of events is likely wrong. As a result, clocking memory to any performance level higher than the boot clocks likely results in memory corruption. The link training unit found on some GPUs with DDR3 is one important example of a feature Nouveau does not currently handle.
Large parts of the VBIOS are well understood and parsed by both the nouveau kernel driver and the envytools VBIOS parsing tool. Any remaining unknown bits could provide interesting clues about the actions required for reclocking.
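Many VBIOS tables start with a small header giving a version, the header length, the entry length and the entry count, followed by fixed-size entries. The sketch below assumes that generic four-byte layout (real tables order and size these fields differently, and all names are hypothetical). Stepping by the declared entry length, rather than a fixed struct size, lets a parser skip fields it does not understand yet, which is where the unknown bits mentioned above live.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed generic header: version, hdr_len, entry_len, entry_count. */
struct vbios_table {
    uint8_t version, hdr_len, entry_len, entry_count;
    const uint8_t *base;    /* start of the table inside the image */
};

/* VBIOS data is little-endian. */
static uint16_t rd_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | p[1] << 8);
}

/* Parse the header at `off`; returns 0 on success, -1 if the table
 * would extend past the end of the image. */
int vbios_table_parse(const uint8_t *img, size_t size, size_t off,
                      struct vbios_table *t)
{
    if (off + 4 > size)
        return -1;
    t->version = img[off];
    t->hdr_len = img[off + 1];
    t->entry_len = img[off + 2];
    t->entry_count = img[off + 3];
    t->base = img + off;
    if (t->hdr_len < 4 ||
        off + t->hdr_len + (size_t)t->entry_len * t->entry_count > size)
        return -1;
    return 0;
}

/* Pointer to entry i, or NULL if i is out of range. */
const uint8_t *vbios_table_entry(const struct vbios_table *t, unsigned i)
{
    if (i >= t->entry_count)
        return NULL;
    return t->base + t->hdr_len + (size_t)i * t->entry_len;
}
```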
    Project


      Scope
In this project I aim to get a better understanding of the reclocking features of the NVA3/5/8, as used by NVIDIA's official device driver. The eventual goal of this project is complete voltage and frequency scaling for these GPUs in nouveau. The knowledge gained could benefit the implementation for newer generations of cards as well.
I limit myself to the core features and aim for manual control of the voltage and clock frequencies based on the profiles in the VBIOS; dynamic reclocking based on load information is beyond the scope of this project.
Initial code contributions will not make use of Nouveau's PMU engine. If this turns out to be absolutely necessary, the firmware could be extended to support the desired functionality. Until then, reclocking through PDAEMON is considered a nice-to-have feature with low priority.
      Benefits to the community
Users will benefit from the increased performance that nouveau can offer at higher clocks, while retaining the ability to save energy when the processing power is not required. This could lead to prolonged battery life for mobile systems using the open source NVIDIA driver stack.
This work, combined with the GSoC project on performance counters, provides the prerequisites for implementing dynamic frequency scaling in future work, enabling all users of the open source graphics driver stack to profit from these benefits without manual intervention.
      Deliverables
Implementation will be done entirely in the Nouveau kernel module, forked from an upstream kernel. The resulting patches are intended to be merged back into the mainline kernel at the end of the project, but might require some after-care when conflicting maintenance work is done on nouveau. Controls are exposed through sysfs.
Documentation will be added to the "envytools" Git repository where applicable.
      Mentor

Ilia Mirkin


      Schedule
My availability is roughly full time between now and the start of the new academic year in October. Tentative planning:
Description     Deliverable     Timeframe       Required
Reverse engineer seq ISA     Documentation (envytools)     1 week     X
Write seq script decoder     Decoding tool (envytools)     1 week     X
RE clock tree for NVA3/5/8     Documentation (envytools), full graph     1-2 weeks     X
Finish/fix engine reclocking for NVA3/5/8     Kernel code allowing users to successfully select any performance level through sysfs     1 week     X
RE+implement DDR3 link training unit     Documentation (envytools) + kernel code (no directly visible changes)     1 week     X
RE+implement DDR3 memory reclocking     Kernel code, observable performance improvements for highest performance level on affected GPUs     3 weeks     X
RE+implement GDDR3 memory reclocking     Kernel code, observable performance improvements for highest performance level on affected GPUs     3 weeks     X
RE+implement GDDR5 memory reclocking*     Kernel code, observable performance improvements for highest performance level on affected GPUs     ?
RE+implement DDR2 memory reclocking*     Kernel code, observable performance improvements for highest performance level on affected GPUs     ?
* If hardware available


      Risks
There is little risk attached to the tasks resulting in documentation of the clock tree. Patches to the nouveau kernel tree are expected, but there is a chance that the code does not generalise to all cards. Earlier experience makes me confident that engine reclocking can be implemented with low risk. Achievements for memory reclocking are not guaranteed given the complexity of the job, although progress is definitely expected.
      Hardware
I currently possess one NVA8 GPU with DDR3 memory. More NVA3/5/8 hardware is available through Martin Peres and accessible remotely. Possibly missing from our combined collection are NVA3/5/8 graphics cards with DDR2. If budget is available, one could be purchased (approximately €50 new) by either Martin Peres or myself for reverse-engineering purposes.


_______________________________________________
Nouveau mailing list
[email protected]
http://lists.freedesktop.org/mailman/listinfo/nouveau
