On Tue, Mar 20, 2012 at 12:52:48PM +0100, Zygmunt Krynicki wrote:
> On 20.03.2012 11:48, Alexander Sack wrote:
> >On Tue, Mar 20, 2012 at 11:41:37AM +0100, Zygmunt Krynicki wrote:
> >>Hi
> >>
> >>Experimenting with the dispatcher made me realize that forced reboots
> >>(on timeouts, for example) are an excellent way to damage the master
> >>image. At the very best we are forced to re-check the master image. At
> >>the very worst we may damage the superblock and generally hose the
> >>master.
> >>
> >>Do you think it is feasible to mount the master read-only and only do
> >>r/w work on the test partitions?
> >
> >I like this idea... That combined with always-poweroff-on-reboot feels
> >like a good idea to compensate for potential issues...
>
> Just curious: why would we always poweroff on reboot? Do you mean actual
> power being cut or the equivalent of poweroff(8)?

That's a different concern. The key goal of the automation infrastructure
is to ensure that each individual test runs in a controlled environment
with as close to 100% reproducibility of state as possible. Soft-rebooting
the unit doesn't guarantee that it comes back to a known base state; hence
the requirement to always hard reboot, with a proper interval spent
unpowered in between.
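For illustration, a minimal sketch (in Python) of what such a hard reboot
could look like on the dispatcher side, assuming a network-controllable
PDU. The pdu_power_off/pdu_power_on helpers, the port argument and the
10-second delay are hypothetical placeholders, not an existing LAVA API:

    import time

    UNPOWERED_DELAY = 10  # seconds; long enough to guarantee a cold start


    def pdu_power_off(port):
        """Placeholder: switch off the given PDU port (telnet/SNMP/etc.)."""
        raise NotImplementedError


    def pdu_power_on(port):
        """Placeholder: switch the given PDU port back on."""
        raise NotImplementedError


    def hard_reboot(port):
        """Power-cycle a board so it comes back from a known base state."""
        pdu_power_off(port)
        time.sleep(UNPOWERED_DELAY)  # proper time unpowered in between
        pdu_power_on(port)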
> >Take the approach known from live-cd into account, such as
> >aufs/unionfs, and things should work well ... maybe the master image
> >doesn't even need a partition anymore, but can be just a .img file on
> >the fat boot partition, just like how the ubuntu live-cd etc. works...
>
> I wonder what is the complexity of this approach. I would also like to
> consider the memory requirements. As an alternative we could try to
> mount the master image from NBD. The NBD server already supports
> "reverting to snapshot" and keeping a delta for each connected client
> in a temporary file.

Considering that the master image is nano and boots to the console only, I
don't think the memory requirements would exceed what we target the LAVA
lab at. What I don't like about NBD is that it makes the LAVA
infrastructure more complex and harder to replicate. Every time we add a
new server/service that isn't the image/board itself, we diverge a bit
further from something that can be validated and released efficiently and
effectively.
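To make the live-cd-style approach concrete, here is a rough sketch of the
mount sequence, again in Python. The paths, the image name master.img and
the use of overlayfs (aufs would be the period-appropriate alternative)
are illustrative assumptions, not the actual LAVA layout:

    import os
    import subprocess


    def mount_master_readonly():
        """Loop-mount the master image read-only, divert writes to tmpfs."""
        subprocess.check_call(
            ["mount", "-o", "loop,ro", "/boot/master.img", "/mnt/master-ro"])
        subprocess.check_call(
            ["mount", "-t", "tmpfs", "tmpfs", "/mnt/master-rw"])
        os.makedirs("/mnt/master-rw/upper")
        os.makedirs("/mnt/master-rw/work")
        # A forced reboot can now at worst lose the throw-away tmpfs
        # delta; the master filesystem itself is never written.
        subprocess.check_call(
            ["mount", "-t", "overlay", "overlay",
             "-o", "lowerdir=/mnt/master-ro,upperdir=/mnt/master-rw/upper,"
                   "workdir=/mnt/master-rw/work",
             "/mnt/master"])

With a layout along these lines, a forced power cut discards only the
tmpfs delta, and the master would no longer need re-checking after every
unclean shutdown.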
> >Can anyone think of a reason not to put that into the backlog?
> >
> >If LAVA team decides to investigate that path, please check with
> >DevPlatform team on how they can help...
>
> I think we should seriously consider it as a milestone towards LAVA
> reliability and automation of master image construction.

Do we have a few empirical examples from the gathered list of LAVA
incidents that allow us to identify changes to the master image (not
talking about reproducibility here) as a recurring source of
unreliability?

--
Alexander Sack <[email protected]>
Technical Director, Linaro Platform Teams
http://www.linaro.org | Open source software for ARM SoCs
http://twitter.com/#!/linaroorg - http://www.linaro.org/linaro-blog