Hello,

On Thu, 18 Apr 2013 11:57:43 +0530
Vishal Bhoj <[email protected]> wrote:
> On 16 April 2013 15:31, Paul Sokolovsky <[email protected]> wrote:
>
> > Hello,
> >
> > On Tue, 16 Apr 2013 06:19:51 +0530
> > Vishal Bhoj <[email protected]> wrote:
> >
> > []
> >
> > > > > This error is related to infrastructure. I am not sure if this
> > > > > will be resolved if the publishing is updated.
> > > >
> > > > ChannelClosedException as quoted below means an EC2 instance got
> > > > terminated (or otherwise "lost") behind Jenkins' back. Generally,
> > > > this is a known non-deterministic failure and is bound to happen
> > > > from time to time due to the nature of EC2 (a complex, big system
> > > > with a non-zero stream of errors).
> > >
> > > We really need to find a solution for this. We are running into
> > > this error on a regular basis nowadays.
> >
> > Yes, I see that the 2 builds I tried yesterday didn't succeed either,
> > with the same error. Builds of that job look weird: they are still in
> > the compile phase 3.5 hours after the start - that's too long. At
> > least, the 2 builds were killed at almost the same time, but it's not
> > the build timeout (it's set at 4:45, and a timeout looks different),
> > nor the EC2 monitoring script (not active now, and it never killed
> > running builds, only zombie instances).
> >
> > I'm cc'ing Phillip just in case he knows of anything that might kill
> > EC2 instances in the "old" Linaro EC2 account after 3.5 hours.
> >
> > I still think the likely cause is master overload due to publishing
> > issues, and I would like to keep working on resolving that first. I
> > have good results so far - a "copycat" build on a sandbox finished in
> > less than 2 minutes:
> >
> > https://ec2-107-20-93-222.compute-1.amazonaws.com/jenkins/job/pfalcon_galaxynexus-linaro/9/
> >
> > That's about half of the work needed though; I'm going to deploy the
> > needed parts on production and continue with it.
>
> Is there any update on why we are seeing this failure? We still
> continue to see the same failure:
> https://android-build.linaro.org/jenkins/job/linaro-android_vexpress-linaro-mp/263/console

So, last GMT evening the new publishing was deployed, and the jobs with
previously known publishing failures were confirmed to be OK. Later,
with daily builds kicking in and a small zombie pile-up on
ci.linaro.org, the publishing test turned into an overall stress test:
at 12 concurrent builds the Jenkins master started to choke and lose
track of build slaves. Well, 12 parallel builds is too much anyway - we
never tested with more than 10 previously, and normally don't have more
than 6.

That build was rebuilt without issues a bit later:

https://android-build.linaro.org/jenkins/job/linaro-android_vexpress-linaro-mp/264/

An access-control issue was discovered and fixed too, so currently
there are no known issues on android-build.linaro.org. I keep
monitoring it and am around to resolve any further issues.

[]

--
Best Regards,
Paul

Linaro.org | Open source software for ARM SoCs
Follow Linaro: http://www.facebook.com/pages/Linaro
http://twitter.com/#!/linaroorg - http://www.linaro.org/linaro-blog

_______________________________________________
linaro-validation mailing list
[email protected]
http://lists.linaro.org/mailman/listinfo/linaro-validation
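
For reference, below is a minimal sketch of the kind of cross-check a
zombie-detection script like the EC2 monitoring script mentioned in the
thread could perform: list the nodes Jenkins still tracks via its
standard /computer JSON API and flag running EC2 instances that are no
longer among them. The Jenkins base URL, the region, and the use of a
"Name" tag for slave names are assumptions made purely for
illustration; this is not the actual script used on
android-build.linaro.org.

    # Illustrative sketch only: flag running EC2 instances that Jenkins no
    # longer tracks as slaves (zombie candidates). The Jenkins URL, region
    # and "Name" tag are assumptions, not the real setup.
    import boto3
    import requests

    JENKINS = 'https://android-build.linaro.org/jenkins'  # assumed base URL

    # Node names Jenkins currently knows about (standard /computer JSON API).
    nodes = requests.get(JENKINS + '/computer/api/json').json()
    known = set(c['displayName'] for c in nodes['computer'])

    # Running EC2 instances in the account (assumed region).
    ec2 = boto3.resource('ec2', region_name='us-east-1')
    running = ec2.instances.filter(
        Filters=[{'Name': 'instance-state-name', 'Values': ['running']}])

    for inst in running:
        tags = {t['Key']: t['Value'] for t in (inst.tags or [])}
        name = tags.get('Name', inst.id)
        if name not in known:
            # A running instance Jenkins no longer tracks: zombie candidate.
            print('possible zombie: %s (%s)' % (inst.id, name))

Such a check only reports candidates; actually terminating an instance
would first need to confirm no live build is attached to it, which is
why the monitoring script described above never killed running builds,
only zombie instances.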
