I believe that in the LAVA lab there are a few Pandas with USB keys
that are used for builds, to try to overcome some reliability
problems. I don't know whether it was a temperature problem or
something else. With any luck someone who knows more about that issue
can speak up and share what they found. You could also try running
"stress --cpu 4 --vm 2" and see if any errors show up. I find that on
my desktop running twice as many CPU stress threads as I have CPUs is
about right to eat all available resources. That will stress only RAM
and CPU, not disk I/O, which should help narrow down the problem.
There are plenty of other options:
http://www.hecticgeek.com/2012/11/stress-test-your-ubuntu-computer-with-stress/
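
As a rough sketch of the "2x the number of CPUs" rule of thumb, the
snippet below builds the stress command line for whatever machine it
runs on. The helper name stress_command and the 10 minute timeout are
my own choices for illustration, not anything the lab uses; --cpu, --vm
and --timeout are ordinary stress options.

```python
import os
import shlex


def stress_command(cpus=None, vm_workers=2, timeout_s=None):
    """Build a `stress` command line with two CPU workers per core.

    Hypothetical helper: `cpus` defaults to the number of CPUs the
    interpreter can see, and `vm_workers` mirrors the "--vm 2" above.
    """
    if cpus is None:
        cpus = os.cpu_count() or 1
    args = ["stress", "--cpu", str(2 * cpus), "--vm", str(vm_workers)]
    if timeout_s is not None:
        # stress accepts a unit suffix on --timeout; "s" means seconds.
        args += ["--timeout", f"{timeout_s}s"]
    return args


if __name__ == "__main__":
    # Print the command rather than run it, so nothing is stressed by accident.
    print(shlex.join(stress_command(timeout_s=600)))
```

On a dual-core board, stress_command(cpus=2) gives the same
"stress --cpu 4 --vm 2" as above.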

Is running at 100% of the thermal limit really an issue? Isn't the
point that it is the limit, which itself should have some safety margin
built in? I don't know offhand whether the OMAP 4 SoCs incorporate
hardware frequency limiting or whether it is entirely software, in
which case the kernel frequency governor should (at a guess) be
throttling back.
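
One way to check what the governor is doing is to read the standard
Linux cpufreq files in sysfs while the board is under load. This is
only a sketch: the function names are mine, it assumes the kernel on
the board exposes cpufreq for cpu0 at the usual path, and a current
frequency below the maximum only suggests throttling if the CPU is
fully loaded at the time.

```python
from pathlib import Path

# Usual sysfs location for cpu0; assumed to exist on the board's kernel.
CPUFREQ = Path("/sys/devices/system/cpu/cpu0/cpufreq")


def read_cpufreq(base=CPUFREQ):
    """Return the governor name and current/maximum frequency in kHz."""
    def read(name):
        return (Path(base) / name).read_text().strip()

    return {
        "governor": read("scaling_governor"),
        "cur_khz": int(read("scaling_cur_freq")),
        "max_khz": int(read("cpuinfo_max_freq")),
    }


def is_below_max(info):
    """True if the CPU is currently clocked below its hardware maximum."""
    return info["cur_khz"] < info["max_khz"]


if __name__ == "__main__":
    info = read_cpufreq()
    state = "below max" if is_below_max(info) else "at max"
    print(f"{info['governor']}: {info['cur_khz']} of {info['max_khz']} kHz ({state})")
```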

I did have a Panda give up on me about a year ago. It wasn't being
worked hard, but it refused to get through a boot most of the time (it
did power on and get part way through booting). Those boards aren't
designed for high reliability and it may be that you just need to get
a couple of replacements.

James

On 3 July 2013 14:13, Renato Golin <renato.go...@linaro.org> wrote:
> Hi Folks,
>
> I'm running two buildbots here at home and am getting consistent failures
> from the Pandas because of overheating. I've set up a monitor that will tell
> me the current CPU temperature and the allowed maximum, and when the bot
> passes 90%, it shuts itself off.
>
> The problem is that I'm running with heat-sinks and the boards are on top of
> three fans, so there really isn't much more I can do to solve this problem.
>
> I personally think this is a hardware problem, since everything is in the
> same die, CPU, GPU and RAM, and the physical dimensions of the chip are
> quite small. I remember when Intel started overheating (around 486DX66) and
> the die was huge (more heat dissipation), plus RAM and GPU were separate,
> and it still needed a hefty heat-sink.
>
> It's true that gates are far smaller today, but it's not true that a dual
> core 1.3GHz + GPU + RAM will produce less heat on a small die than a 66MHz
> CPU on a huge die, so why anyone thinks it's a good idea to release a 1+GHz
> chip without *any* form of heat dissipation is beyond my comprehension.
>
> Manufacturers only got away with it, so far, because people rarely use 100%
> of the CPU power for extended periods of time, because ARM devices end up as
> set-top boxes, mobile phones and tablets. However, even those devices will
> heat up when playing 2 h films or games, and they do have some form of heat
> sink.
>
> We, at the toolchain group, make things worse by using 100% CPU, 24 / 7,
> something that Panda boards, or Arndales, were not designed to do. However,
> with ARM moving into the server space, their designs will have to be
> re-thought, and what better place than Linaro for making sure we get it
> right?
>
> For the time being, I believe we *must* have air conditioning in the Lab all
> the time, and we *must* have heat-sinks on every board, and we *must*
> monitor the CPU temperature of the boards, at least until we're comfortable
> that they're not failing all the time.
>
> Can we make a temperature monitor (like the one attached) a default feature
> on Linaro Ubuntu distributions? We could dump that info to the syslog/dmesg
> whenever it crosses the (say) 75% threshold, and report more often when it
> crosses 95%, possibly dumping the process(es) that are consuming the most
> CPU at the time, to enable post-mortem debugging.
>
> cheers,
> --renato
>
> As a side note, the quad-A9 ODroid does ship with a massive heat-sink, which
> also serves as a fancy case. Quite clever, really.
>
> _______________________________________________
> linaro-validation mailing list
> linaro-validat...@lists.linaro.org
> http://lists.linaro.org/mailman/listinfo/linaro-validation
>



-- 
James Tunnicliffe

_______________________________________________
linaro-toolchain mailing list
linaro-toolchain@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-toolchain
