On 07/01/2012 07:03 PM, Michael Hudson-Doyle wrote:
Andy Doan <[email protected]> writes:

We've seen a significant reduction in health job failures, but I still
wanted to send out a report on these so people could see how things are
still breaking.

We've had 25 real health failures over the past 2 weeks.

By device type:

   6 snowball
   1 imx53
   1 vexpress
   2 beagle
   6 origen
   9 panda

By failure type:

   2 SD cards died (both on Origen)

Yay!  That's the sort of problem we are _supposed_ to be finding :)

   7 Serial Console Related:
    - 5 connection never established at start of job

I'd dearly love to know what's going on here.  I could implement a kind
of ~exponential back off where we wait 5 seconds, 1 minute, 5 minutes
between attempts to reset the port?

Maybe Dave has some thoughts. I haven't played around enough with that stuff to have a very informed opinion, but that does sound worth trying on the surface.
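
For reference, the back-off idea above could be sketched roughly like this. This is only an illustration, not the dispatcher's actual code: `connect` and `reset_port` are hypothetical callables standing in for whatever opens the serial console and resets the port, and the exact ordering (wait, then reset, then retry) is an assumption. The delays are the 5 seconds, 1 minute, 5 minutes mentioned above.

```python
import time

# Delays suggested in the thread: 5 seconds, 1 minute, 5 minutes.
RESET_DELAYS = (5, 60, 300)


def connect_with_backoff(connect, reset_port, delays=RESET_DELAYS,
                         sleep=time.sleep):
    """Try connect(); after each failure wait, reset the port, and retry.

    connect and reset_port are hypothetical callables (assumptions for
    this sketch). connect() is expected to raise OSError on failure.
    Returns whatever connect() returns, or re-raises the last error
    once every delay has been used up.
    """
    try:
        return connect()
    except OSError as exc:
        last_error = exc

    for delay in delays:
        sleep(delay)      # wait progressively longer before each retry
        reset_port()      # then reset the port and try again
        try:
            return connect()
        except OSError as exc:
            last_error = exc

    raise last_error
```

With the default delays a job would give up after a little over six minutes of waiting, so a port that never recovers still gets reported as a failure rather than hanging the job.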


    - 1 connection dropped during test
    - 1 garbage over serial line

Not sure what we can do about these in general.

Yeah, and if it's that small a number, I'm inclined to wait until they become a higher percentage of our problems.

   10 Network Related:
    - 3 network failed to come up
    - 3 ping unreachable error

    - 3 wget | tar type failure (kind of network-related, we think)

We have a plan around these, at least.

Cheers,
mwh

_______________________________________________
linaro-validation mailing list
[email protected]
http://lists.linaro.org/mailman/listinfo/linaro-validation