On Tue, 9 Apr 2013 01:20:02 +0200
Alexander Sack <[email protected]> wrote:

> Paul,
> 
> you suggest to use different configurations for doing the "build the
> kernel" 

I'm sure it's just a typo, but "build gcc/binutils" to be exact.

> validation and the benchmarking? Why wouldn't we run in the
> same heat issues while benchmarking?

Well, I'm just trying to brainstorm how we can make CBuild/LAVA
actually useful, not something which takes 10hrs just to finish with
failure in >50% of cases. So, just common sense: benchmarking will
still be affected, but at least builds won't be. Also, a gcc build takes
~10hrs, so it is much more likely to run into thermal issues than a
benchmark (I don't remember exactly, but that should take just
1-2hrs).
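
To put a number on that, here is a rough tally of the seven runs quoted further down in this message. The short labels are shorthand invented for this sketch, not official categories, and which failures are really thermal remains a guess:

```python
from collections import Counter

# Outcomes of the seven runs, transcribed from the quoted message below.
# The label names are illustrative only.
runs = [
    "ok",
    "thermal_fatal",  # FATAL ZONE message, then reboot loop
    "bit_flip",       # 'flisT*' vs 'flist*' in gengtype.c
    "network",        # rsync name resolution failure
    "ok",
    "bash_assert",    # bash "assertion botched" messages
    "reboot_loop",    # reboot during configure
]

counts = Counter(runs)
total = len(runs)
success_rate = counts["ok"] / total
failure_rate = 1 - success_rate
# Failures that the network glitch does not explain
non_network_failures = total - counts["ok"] - counts["network"]

print(f"{counts['ok']}/{total} OK ({success_rate:.0%}), "
      f"{non_network_failures} failures not explained by the network")
```

With only seven samples this is an estimate, not a measured failure rate.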

> Can we somehow put the SoC into a very low power state for 2 minutes
> for cool down and simply wait before starting the build/test?

That's something I have no idea about (well, I'm sure we can, I'm just
not sure the next 2 minutes won't heat it back up).
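
A minimal sketch of such a cool-down step, purely as an illustration: it assumes the kernel exposes the generic Linux thermal sysfs node (the zone index and whether this image has it at all are assumptions), and the threshold and timeout are placeholder values, not tested on Panda:

```python
import time
from typing import Callable

# Assumption: generic Linux thermal sysfs node, reporting millidegrees Celsius.
# The zone index may differ per board and kernel.
THERMAL_NODE = "/sys/class/thermal/thermal_zone0/temp"


def read_temp_mc(path: str = THERMAL_NODE) -> int:
    """Read the current temperature in millidegrees Celsius."""
    with open(path) as f:
        return int(f.read().strip())


def wait_until_cool(read: Callable[[], int] = read_temp_mc,
                    threshold_mc: int = 60000,   # placeholder threshold
                    poll_s: float = 5.0,
                    timeout_s: float = 120.0,    # the "2 minutes" above
                    sleep: Callable[[float], None] = time.sleep) -> bool:
    """Return True once the reading is at or below threshold_mc,
    or False if timeout_s elapses first."""
    waited = 0.0
    while True:
        if read() <= threshold_mc:
            return True
        if waited >= timeout_s:
            return False
        sleep(poll_s)
        waited += poll_s
```

A job would call `wait_until_cool()` before starting the build and could refuse to start on `False`. It does not address the concern that the board heats back up once the build is running.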

> 
> Anyway, I assume that the strict environment requirement where
> toolchain WG needs to sign off on the setup would mostly apply to
> benchmarking only and that we could probably choose any stable image
> for doing the build validation of the toolchain that is rock solid.
> Matt?

It's trickier than that. There has always been a known stable image
available as an alternative during testing, but a gcc built on it cannot
later be run on the official TCWG image due to missing lib dependencies.
I.e., it should be not just any image, but one pretty close to TCWG's.
Also, the current TCWG image was received in ready-made form from
Michael Hope, and I'm personally not sure what's inside, so some effort
would need to be spent on figuring that out.

> 
> If so, it sounds sensible to just pick a recent release with thermal
> enabled for the build job and use the special configuration for the
> benchmarking parts - maybe with a cooling step as above.
> 
> 
> 
> On Mon, Apr 8, 2013 at 7:56 PM, Paul Sokolovsky
> <[email protected]> wrote:
> 
> > Hello,
> >
> > On Sat, 6 Apr 2013 15:40:15 +0300
> > Paul Sokolovsky <[email protected]> wrote:
> >
> > []
> >
> > > Ok, so here're these 2 builds:
> > >
> > > gcc-4.8~svn196132
> > >
> > > panda-es02
> > > https://validation.linaro.org/lava-server/scheduler/job/50993
> > >
> > > panda-es05
> > > https://validation.linaro.org/lava-server/scheduler/job/50994
> > >
> > > #50993 went well.
> > >
> > > #50994 midway in compilation started to get invalid data (grep for
> > > "is not valid in preprocessor expressions"), then got kernel
> > > fault, then got caught in reboot loop, apparently due to:
> > >
> > > [    6.631256] thermal_init_thermal_state: Getting initial temp for cpu domain
> > > [    6.638702] thermal_request_temp
> > > [    6.642150] omap_fatal_zone:FATAL ZONE (hot spot temp: 128490)
> > >
> > > - all these behaviors were seen by me before (actually,
> > > previously, I didn't see such explicit messages from the kernel
> > > that these are thermal faults).
> > >
> > > So, CBuild/LAVA can do (successful) builds, but some builds fail
> > > due to thermal issues. Actually, let me load up all boards with
> > > the same build now, towards assessing thermal failure rate more
> > > scientifically.
> >
> > Well, let's count:
> >
> >
> > https://validation.linaro.org/lava-server/dashboard/streams/anonymous/cbuild/bundles/9dea0c78604ce5e65178ec8d71edac8d8a499c4e/
> > OK
> >
> >
> > https://validation.linaro.org/lava-server/dashboard/streams/anonymous/cbuild/bundles/cf346215160e778be13e031378b36328eeb315c3/
> > Failed, thermal (see above)
> >
> >
> > https://validation.linaro.org/lava-server/dashboard/streams/anonymous/cbuild/bundles/bce0a440e5cece819678e6fd7f276a43c2a44ecd/
> > "../../../gcc-4.8~svn196132/gcc/gengtype.c:4106:39: error: cannot
> > convert 'flisT*' to 'flist*' in assignment" - flipped bit in file (saw
> > that before, including on local Panda), then other random failures.
> >
> >
> > https://validation.linaro.org/lava-server/dashboard/streams/anonymous/cbuild/bundles/6c31d8e21dbc716255c7ec13702b5252434f9ced/
> > "rsync: getaddrinfo: toolchain64 2000: Temporary failure in name
> > resolution" - network flip
> >
> >
> > https://validation.linaro.org/lava-server/dashboard/streams/anonymous/cbuild/bundles/de94c85961e59dbaba41f8dec6c20f4740384241/
> > OK
> >
> >
> > https://validation.linaro.org/lava-server/dashboard/streams/anonymous/cbuild/bundles/c032912d494e80857b931ab8c2dafc4e4f83c762/
> > Lots of "malloc: ../bash/jobs.c:743: assertion botched"
> >
> >
> > https://validation.linaro.org/lava-server/dashboard/streams/anonymous/cbuild/bundles/d36badcd7310aa0db4a1efcd1554d11da06ef6bc/
> > Reboot during configure, then reboot cycle.
> >
> > So, out of 7 builds, only 2 were successful; the rest failed due to
> > thermal issues (apparently, all but the one with network issues). We
> > can try to consider why LAVA-based builds have such a low yield rate
> > compared to native Cbuild builds (one explanation is that LAVA does
> > pretty heavy lifting to install the OS, etc., so when the build
> > starts, the CPU is already pretty hot), but it's clear that using a
> > kernel with voltage/frequency scaling disabled simply doesn't work
> > that well for *builds* (vs benchmarks).
> >
> > So, we can consider preparing an OS image which uses the same basic
> > OS, but a normal kernel, and use that for builds. Would TCWG be able
> > to prepare such an image? If not, we can add it to the list of tasks
> > for the (now combined) LAVA/Infra team, but with all the other tasks
> > we have, it may take some time to get to.
> >
> >
> > --
> > Best Regards,
> > Paul
> >
> > Linaro.org | Open source software for ARM SoCs
> > Follow Linaro: http://www.facebook.com/pages/Linaro
> > http://twitter.com/#!/linaroorg - http://www.linaro.org/linaro-blog
> >
> 
> 
> 
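
To sort such runs without reading every log by hand, the failure messages quoted above could be matched automatically. The sketch below is only an illustration: the signature strings are copied from the quoted logs, while the category names and the `classify` helper are made up for the example:

```python
import re
from typing import List

# Signatures copied from the failure messages quoted above; the category
# names on the left are labels invented for this sketch.
SIGNATURES = [
    ("thermal_fatal", re.compile(r"omap_fatal_zone:FATAL ZONE")),
    ("network", re.compile(r"Temporary failure in name resolution")),
    ("bash_assert", re.compile(r"assertion botched")),
    ("bad_preprocessor_data",
     re.compile(r"is not valid in preprocessor expressions")),
]


def classify(log_text: str) -> List[str]:
    """Return every category whose signature occurs in the log,
    in the order of the SIGNATURES table."""
    return [name for name, pattern in SIGNATURES if pattern.search(log_text)]
```

For example, `classify` on a log containing the `omap_fatal_zone` line returns `["thermal_fatal"]`. A log matching none of the signatures returns an empty list, which covers both successful builds and failures of a kind not listed here.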



-- 
Best Regards,
Paul

