On Wed, 3 Jul 2013 15:59:47 +0100
Mans Rullgard <mans.rullg...@linaro.org> wrote:

> On 3 July 2013 14:13, Renato Golin <renato.go...@linaro.org> wrote:
> > Hi Folks,
> >
> > I'm running two buildbots here at home and am getting consistent failures
> > from the Pandas because of overheating. I've set up a monitor that will tell
> > me the current CPU temperature and the allowed maximum, and when the bot
> > passes 90%, it shuts itself off.
> >
> > The problem is that I'm running with heat-sinks and the boards are on top of
> > three fans, so there really isn't much more I can do to solve this problem.
> >
> > I personally think this is a hardware problem, since everything is in the
> > same die, CPU, GPU and RAM, and the physical dimensions of the chip are
> > quite small. I remember when Intel started overheating (around 486DX66) and
> > the die was huge (more heat dissipation), plus RAM and GPU were separate,
> > and it still needed a hefty heat-sink.
> >
> > It's true that gates are far smaller today, but it's not true that a dual
> > core 1.3GHz + GPU + RAM will produce less heat on a small die than a 66MHz
> > CPU on a huge die, so why anyone thinks it's a good idea to release a 1+GHz
> > chip without *any* form of heat dissipation is beyond my comprehension.
> 
> Modern silicon processes are much more power-efficient than those of the 90s.
> For example, an old ~500MHz Alpha machine I have readily consumes 90W even
> when idle.  A quad-core Intel i7 typically has a TDP of 130W at full load.
> That's orders of magnitude more gates clocked at 6x the frequency and still
> using only marginally more power.
> 
> BTW, the RAM is a separate chip mounted on top of the SoC.
> 
> > Manufacturers only got away with it, so far, because people rarely use 100%
> > of the CPU power for extended periods of time, because ARM devices end up as
> > set-top boxes, mobile phones and tablets. However, even those devices will
> > heat up when playing 2 h films or games, and they do have some form of heat
> > sink.
> 
> An OMAP4460 will run at 1.2GHz indefinitely without overheating in reasonable
> ambient temperature.  The higher frequencies are only meant to be used in
> conjunction with (software) thermal management to throttle back if temperature
> rises.
> 
> If you don't have thermal management in the kernel you're running, you need
> to clamp the clock at a safe value.

By the way, power consumption is not constant and depends heavily on
what the CPU is actually doing. 100% CPU load in one application does
not mean the same power draw as 100% CPU load in another application.
With some targeted "optimisations" it is possible to boost power
consumption by a factor of roughly 1.5 compared to most heavy
workloads in real applications. I have a collection of ARM cpuburn
programs, empirically tuned for different microarchitectures (which
means that they can possibly still be "improved"):

    https://github.com/ssvb/cpuburn

It is possible that Cortex-A15 would show a similar ~1.5x factor for
the power consumption boost if somebody were to tune cpuburn for it.
But I'm a bit reluctant to dismantle my ARM Chromebook to hook a
multimeter up to it (developer boards with no batteries and with barrel
power connectors are much easier to deal with).

Some time ago, I tossed my Cortex-A9 cpuburn to the ODROID-X people.
And coincidentally they quickly got the thermal framework properly
integrated into their kernels and also started to offer optional
active coolers to their customers :-)

Now if you also consider that SoCs usually have a lot more than
just the CPU cores, the peak power consumption can be really high.
Designing the cooling system so that it is able to handle the peak
power consumption is overkill: it is going to be expensive and/or
bulky. And if you just restrict the CPU clock frequency so that the
power consumption never exceeds a certain threshold, you are going to
end up clocking the CPU at a really low speed. In my opinion, the right
solution for modern ARM SoCs is just to always ensure proper throttling
support (both in the hardware and in the software). ARM can even call it
"turbo-boost", "turbo-core" or use some other marketing buzzword ;-)

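To make the throttling idea concrete, here is a rough sketch of a
userspace fallback, not a replacement for a proper in-kernel thermal
framework. It assumes the kernel exposes the standard Linux sysfs files
for the thermal zone (millidegrees Celsius) and for the cpufreq limit
(kHz). The frequency table, the 85C/75C thresholds and the function
names are all made up for illustration; only the 1.2GHz entry comes
from the figure quoted above.

```python
import time

# Hypothetical frequency steps in kHz, fastest first. Only the 1.2 GHz
# entry is taken from the thread; the rest are illustrative assumptions.
FREQS_KHZ = [1_500_000, 1_200_000, 920_000, 700_000]

# Standard Linux sysfs locations; whether a given board's kernel
# provides them (and which thermal zone is the CPU) is an assumption.
TEMP_PATH = "/sys/class/thermal/thermal_zone0/temp"
MAX_FREQ_PATH = "/sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq"


def next_freq_index(temp_mc, idx, hot_mc=85_000, cool_mc=75_000):
    """Return the next index into FREQS_KHZ for a temperature reading.

    temp_mc is in millidegrees Celsius. At or above hot_mc we step down
    one frequency; at or below cool_mc we step back up; in between we
    hold the current step (hysteresis), so the limit does not flap.
    """
    if temp_mc >= hot_mc and idx < len(FREQS_KHZ) - 1:
        return idx + 1  # too hot: move to the next slower step
    if temp_mc <= cool_mc and idx > 0:
        return idx - 1  # cooled down: move to the next faster step
    return idx


def read_int(path):
    """Read a single integer from a sysfs-style text file."""
    with open(path) as f:
        return int(f.read().strip())


def throttle_loop(temp_path=TEMP_PATH, max_freq_path=MAX_FREQ_PATH,
                  interval_s=1.0):
    """Poll the temperature forever and cap the CPU clock accordingly.

    Needs root to write the cpufreq limit.
    """
    idx = 0
    while True:
        idx = next_freq_index(read_int(temp_path), idx)
        with open(max_freq_path, "w") as f:
            f.write(str(FREQS_KHZ[idx]))
        time.sleep(interval_s)
```

Running throttle_loop() as root would re-evaluate the limit once per
second. The gap between the two thresholds is what keeps the clock
from bouncing between two speeds on every poll.
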
-- 
Best regards,
Siarhei Siamashka

_______________________________________________
linaro-toolchain mailing list
linaro-toolchain@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-toolchain
