Hello,

Some users have been complaining for years about their GPU sounding like a jet engine at take-off. Last year, I finally got my hands on one of these GPUs and have been trying to fix this issue on and off since then.

After failing to find anything in the HW, I figured out that the duty cycle set by NVIDIA's proprietary driver was way under the value I expected. By randomly changing values in the unknown tables of the vbios, I found out that there is a fan calibration table at offset 0x18 in the BIT P table (version 2). In this table, I identified 2 major 16-bit parameters at offsets 0xa and 0xc [2]. I named the first one pwm_max and the second one pwm_offset. As expected, these parameters look like those of a mapping function of the form aX + b. However, after gathering more samples, I found out that the output was not continuous when linearly increasing pwm_offset [1]. Even more funnily, the period of this square function is linear with the frequency used for the fan's PWM.

I tried reverse engineering the formula describing this function, but failed to find a version that works perfectly for all PWM frequencies. This is the closest I have got [3], and I basically stopped there about a year ago because I could not figure it out and got frustrated :s.

I started again on this project 2 weeks ago, with the intent of finding a good-enough solution for nouveau, and modelling the rest of the equation that would allow me to compute what duty I should set for every wanted fan speed (%). I again mostly succeeded... but it would seem that the interpretation of the table depends on the chipset generation (Tesla behaves one way, Fermi+ behaves another way). Also, the proprietary driver is not consistent about what to do when the computed duty value would be lower than 0 (sometimes it is clamped to 0, sometimes it is set to the same value as the divider, and sometimes to a value slightly lower than the divider).

I have been trying to cover all edge cases by generating a randomized set of values for the PWM frequency, pwm_max, and pwm_offset, flashing the vbios, and iterating from 0% to 100% fan speed while dumping the values set by your driver.
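
For illustration only, a naive version of such an aX + b model, together with a check of how many dumped samples it reproduces, could be sketched as below. The scaling of pwm_max per percent, the signed interpretation of pwm_offset, the clamp to [0, divider], and all function names are assumptions made for this sketch; it is not the formula used by the proprietary driver, nor the code in envytools.

```python
def to_signed16(value):
    """Interpret a raw 16-bit table field as a signed integer (an assumption)."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def naive_duty(speed_pct, divider, pwm_max, pwm_offset):
    """Hypothetical aX + b mapping from fan speed (%) to a PWM duty value.

    The scaling and the clamp to [0, divider] are illustrative guesses,
    not the proprietary driver's behaviour.
    """
    duty = (pwm_max * speed_pct) // 100 + to_signed16(pwm_offset)
    return max(0, min(duty, divider))


def pass_rate(samples, model, tolerance=1):
    """Fraction of samples the model reproduces within `tolerance`.

    Each sample is (speed_pct, divider, pwm_max, pwm_offset, observed_duty).
    tolerance=1 ignores off-by-one differences.
    """
    if not samples:
        return 0.0
    hits = sum(
        1
        for speed, divider, pmax, poff, observed in samples
        if abs(model(speed, divider, pmax, poff) - observed) <= tolerance
    )
    return hits / len(samples)
```

Feeding the dumped (fan speed, divider, pwm_max, pwm_offset, duty) tuples to pass_rate() with a candidate model gives the kind of percentage quoted below; the discontinuities seen in [1] are exactly what a purely linear model like this one cannot reproduce.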

Using half a million sample points (which took a week to acquire), my model computes 97% of the values correctly (ignoring off-by-ones), while the remaining 3% are worryingly off (by up to 100%)... It is clear that the code is not trivial and is full of branching, which makes clean-room reverse engineering a chore.

As a final attempt at a somewhat complete solution, I tried this weekend to make a "safe" model that would still make the GPUs quiet. I managed to improve the pass rate from 97% to 99.6%, but the remaining failures conflict with my previous findings, which are also far more prevalent in the samples. In the end, the only completely safe way of driving the fan is the current behaviour of nouveau...

At this point, I am ready to throw in the towel and hardcode parameters in nouveau to address the problem of the loudest GPUs, but this is of course suboptimal. This is why I am asking for your help.

Would you have some documentation about this fan calibration table that could help me here? Code would be even more appreciated.

Thanks a lot in advance,
Martin

PS: here is most of the code you may want to see: http://fs.mupuf.org/nvidia/fan_calib/

[1] http://fs.mupuf.org/nvidia/fan_calib/pwm_offset.png
[2] https://github.com/envytools/envytools/blob/master/nvbios/power.c#L333
[3] https://github.com/envytools/envytools/blob/master/nvbios/power.c#L298
_______________________________________________
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau