On Fri, Jan 29, 2016 at 12:58 PM, j.nitsc...@ok.de <j.nitsc...@ok.de> wrote:

> On Thu, 28 Jan 2016 16:10:52 -0800 Kay Schenk wrote:
> >
> > On 01/14/2016 09:48 AM, Kay Schenk wrote:
> >> On Thu, Jan 14, 2016 at 4:04 AM, j.nitsc...@ok.de <j.nitsc...@ok.de>
> >> wrote:
> >>
> >>     Hello,
> >>
> >>     some may have noticed our linux-32 buildbot fails quite often. [1]
> >>     Here is an analysis: (tl;dr jump to solutions)
> >>     * always fails in first buildbot step: svn updating
> >>     * failed step takes around 6 minutes, a successful step uses ~37
> >>     minutes to complete
> >>     * the commands in the step take much time and often a timeout
> >>     triggers
> >>
> >>     The commands and their timeouts (seconds) are:
> >>     1) svn --version (1200)
> >>     2) rm -rf
> >>     /home/buildslave20/slave20/openoffice-linux32-nightly/build (120)
> >>     3) chmod -Rf u+rwx
> >>     /home/buildslave20/slave20/openoffice-linux32-nightly/build
> >>     (120) ah, why?
> >>     4) rm -rf
> >>     /home/buildslave20/slave20/openoffice-linux32-nightly/build
> >>     (120) huh, again?
> >>     5) svn info --xml --non-interactive --no-auth-cache (1200)
> >>     6) svn update --non-interactive --no-auth-cache (1200)
> >>     7) cp -R -P -p -v
> >>     /home/buildslave20/slave20/openoffice-linux32-nightly/source
> >>     /home/buildslave20/slave20/openoffice-linux32-nightly/build (120)
> >>     8) svn info --xml (1200)
> >>
> >>     Their results:
> >>     1) Always finishes in ~15 seconds
> >>     2) No output, almost always fails with command timed out: 120
> >>     seconds
> >>     without output, attempting to kill
> >>     3) No output, almost always fails with command timed out: 120
> >>     seconds
> >>     without output, attempting to kill
> >>     4) No output, finishes sometimes.
> >>     *if we fail here the build process is stopped, and this is the
> >>     reason for
> >>     the frequent failures*
> >>     5) Local command completes in a sec.
> >>     6) Can take a while depending on source changes. Gives tons of
> >>     output,
> >>     so timeout never triggers.
> >>     7) Takes *very* long (over 20 minutes) but never triggers timeout
> >>     because with '-v' the output spams the log.
> >>     8) Local command again takes a sec.
> >>
> >>     Conclusions:
> >>     *file operations don't have enough time to finish*
> >>
> >>     Solutions:
> >>     Edit 'svn updating' buildstep
> >>     a) Remove rm and chmod commands and replace cp with
> >>     'rsync -q -t -p -r --delete
> >>     /home/buildslave20/slave20/openoffice-linux32-nightly/source
> >>     /home/buildslave20/slave20/openoffice-linux32-nightly/build'
> >>       This is much faster as very few copies are needed and its delete is
> >>     faster than the rm command. But increase the timeout anyway just in
> >>     case.
> >>     (*preferred* solution but needs rsync on the box)
> >>     b) increase the timeouts and shut up cp by removing '-v'
> >>     c) remove unversioned files when updating and build in this folder
> >>     d) Make rm and chmod verbose by adding '-v' (or '-c' for chmod).
> >>     Spam the
> >>     log even more, but the timeouts won't trigger.
> >>       Who doesn't like 50MB logfiles? Yes, the log for this step of
> >>     every
> >>     successful build is over 50MB currently! Starting with build #127 [1]
> >>     (before
> >>     this build there was only a build folder but no source)
> >>       Not a serious solution!
> >>
> >>     *I suggest we fix this soon because the huge log files will blow
> >>     up a
> >>     server sooner or later.*
> >>
> >>     Regards Jochen
> >>
> >>     [1] https://ci.apache.org/builders/openoffice-linux32-nightly
> >>
> >>     note: on linux64 buildbot the file operations are *much* faster. cp
> >>     takes 90 secs there; it isn't verbose but stays within the 120 sec
> >>     timeout limit.
> >>
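Solution (a) above can be sketched as a small shell function. This is only a sketch under assumptions: `sync_tree` is a hypothetical helper name, not part of the buildbot configuration. It also adds two details that are not in the proposal: a trailing slash on the source path, so that rsync copies the contents of source into build instead of creating build/source, and `-l`, so that symlinks are copied as symlinks (matching `cp -P`). If rsync is not installed, it falls back to a quiet cp.

```shell
#!/bin/sh
# Hypothetical helper (illustrative name): mirror SRC into DST quietly.
sync_tree() {
    src=$1
    dst=$2
    if command -v rsync >/dev/null 2>&1; then
        mkdir -p "$dst" || return 1
        # -q quiet, -r recurse, -l keep symlinks as symlinks (like cp -P),
        # -p keep permissions, -t keep modification times.
        # --delete removes files in DST that no longer exist in SRC.
        # The trailing slash on "$src/" means "the contents of SRC".
        rsync -q -r -l -p -t --delete "$src/" "$dst/"
    else
        # Fallback when rsync is missing: remove DST, then copy without -v.
        rm -rf "$dst" && cp -R -P -p "$src" "$dst"
    fi
}
```

With the paths from this thread, the call would look like `sync_tree /home/buildslave20/slave20/openoffice-linux32-nightly/source /home/buildslave20/slave20/openoffice-linux32-nightly/build`. Because neither branch prints per-file output, the step timeout still has to be long enough to cover the whole copy.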
> >>
> >> Thanks for the suggestions, I will look into this.
> >>
> >>
> > I just wanted to give a short update on this.
> >
> > * our Linux-32 and linux-64 buildbots use the same mechanisms for an
> > svn pull -- a "copy" -- so I left the 32-bit instructions as is
> 'copy' instructions differ in one detail
> Linux-32: cp -R -P -p -v
> /home/buildslave20/slave20/openoffice-linux32-nightly/source
> /home/buildslave20/slave20/openoffice-linux32-nightly/build
> Linux-64: cp -R -P -p
> /home/buildslave19/slave19/openofficeorg-nightly/source
> /home/buildslave19/slave19/openofficeorg-nightly/build
>
> *-v* needs to go to reduce the log size
>

OK, and thank you.
Your eyes are better than mine! :)


> but we have to increase timeout further before we do this or copy will
> always fail
>
>
> https://ci.apache.org/builders/openoffice-linux32-nightly/builds/162/steps/svn/logs/stdio
> :
>

I will try it again. It's these odd remove commands, which we don't seem to
control, that seem to be the problem.


> > cp -R -P -p -v
> > /home/buildslave20/slave20/openoffice-linux32-nightly/source
> > /home/buildslave20/slave20/openoffice-linux32-nightly/build in dir
> > /home/buildslave20/slave20/openoffice-linux32-nightly (timeout 120 secs)
> ... humongous log ...
> > elapsedTime=1370.929525 program finished with exit code 0
> seems 1200 won't be enough, note that the timeout for cp was still 120
>
> On Thu, 28 Jan 2016 16:10:52 -0800 Kay Schenk wrote:
> > * I recently updated the timeout for the svn pull for linux-32 to
> > 1200 secs. To me it looked like this was set to 120 though it IS
> > supposed to default to 1200, but...
> timeouts in 'svn update' of build #162 (Jan 29 02:05) haven't changed
> from older builds
> >
> > * there are some other extra steps -- some removes -- that seem to
> > be tacked onto the svn step that are outside of our config commands
> > that ARE timing out and seem to NOT be governed by the total timeout
> > for this step, yet they time out in successful builds also.
> well, removes get another try after a chmod.
> so the first remove can time out without consequence
>
> when both removes fail the build fails, but succeeds the next day
> because most files are removed already
>

Yes, that is correct. I noticed this also. So we get builds every other
day.
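
The remove, chmod, remove sequence discussed above can be sketched as follows. This is a reading of the behaviour described in this thread, not the actual buildbot code, and `remove_tree` is a hypothetical name. It does not model the 120-second no-output timeout; it only shows why a failed first remove is harmless while a failed second remove ends the step.

```shell
#!/bin/sh
# Hypothetical helper mirroring the rm, chmod, rm sequence seen in the log.
remove_tree() {
    dir=$1
    # First attempt; it can fail when directories are not writable.
    rm -rf "$dir" 2>/dev/null && return 0
    # Give the owner full permissions on everything, then retry once.
    chmod -Rf u+rwx "$dir"
    # The exit status of this second rm is the result of the whole helper.
    rm -rf "$dir"
}
```

If the second `rm` also fails or is killed, whatever it already deleted stays deleted, which fits the observation that the next day's run has less to remove and succeeds.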


> > * there are some buildbot setup instructions that differ for our
> > linux-64 and linux-32 builds.
> maybe our instructions don't reach the buildbots or aren't updated?
> >
> > Detailed in:
> > My INFRA ticket to track Linux-32 buildbot problems:
> >
> > https://issues.apache.org/jira/browse/INFRA-10997
> >
> > So, still a mystery to me at this point.
> checking time frame for other tasks is a good idea
> the difference of the same cp on Linux-32 and Linux-64 looks too big
> Linux-32: elapsedTime=1370.929525
> Linux-64: elapsedTime=117.262038
>
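To put the two elapsedTime values above in proportion, here is a small sketch using awk. The helper names `slowdown` and `fits_timeout` are made up for illustration. It assumes that once `-v` is removed, cp prints nothing, so the no-output timeout effectively limits the total run time of the copy.

```shell
#!/bin/sh
# Hypothetical helper: how many times longer is duration $1 than duration $2?
slowdown() {
    LC_ALL=C awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", a / b }'
}

# Hypothetical helper: exit status 0 if duration $1 (seconds) fits within
# timeout $2 (seconds), non-zero otherwise.
fits_timeout() {
    LC_ALL=C awk -v t="$1" -v limit="$2" 'BEGIN { exit !(t + 0 <= limit + 0) }'
}

# Linux-32 versus Linux-64 cp times quoted in this thread.
slowdown 1370.929525 117.262038   # prints 11.7
```

Under that assumption, `fits_timeout 117.262038 120` succeeds while `fits_timeout 1370.929525 1200` fails, which matches the remark that 1200 seconds would not be enough for a silent cp on Linux-32.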
>
Thanks again for your assistance...more tweaking to come...soon.


-- 
----------------------------------------------------------------------
MzK

"Though no one can go back and make a brand new start,
 anyone can start from now and make a brand new ending."
                                                          -- Carl Bard
