Re: we need a fix: precommit failures correlate to hdfs patches

Allen Wittenauer Mon, 04 May 2015 16:12:04 -0700

FWIW, I’m working on getting the Jenkins race conditions that Sean pointed out 
fixed in HADOOP-11917.



On May 4, 2015, at 2:23 PM, Chris Nauroth <cnaur...@hortonworks.com> wrote:

> If we suspect long run times are a potential root cause, then another
> thing we could try is turning on parallel test execution.  To do that,
> we'd add the -Pparallel-tests argument and possibly tune
> -DtestsThreadCount=N.  (The default for N is 4.)
> 
> https://issues.apache.org/jira/browse/HADOOP-9287
> 
> This has given some of us significant speed-ups while running tests in our
> dev environments.  I haven't tried it in a while though, so we might
> surface some test isolation problems, such as 2 test suites trying to use
> the same data directory.  We cleaned up a lot of issues like that before
> committing the parallel-tests patches, but it's possible new problems have
> crept in.
> 
> --Chris Nauroth
> 
> 
> 
> 
> On 5/3/15, 9:02 PM, "Sean Busbey" <bus...@cloudera.com> wrote:
> 
>> The patch artifact directory in the mainline hadoop jenkins jobs is
>> outside of the workspace. I'm not sure what, if anything, jenkins
>> guarantees about files outside the main workspace.
>> 
>> They all write to ${WORKSPACE}/../patchProcess, which will probably
>> collide if multiple runs happen on the same machine. They also all
>> blindly move that directory at the end of the run.
>> 
>> On Sun, May 3, 2015 at 3:02 PM, Allen Wittenauer <a...@altiscale.com> wrote:
>> 
>>> 
>>>        So, as some may have noticed, I slammed the Jenkins servers over
>>> the weekend to get some recent patch test runs in JIRA for the bug bash
>>> this week.  I've had a suspicion for a while now that either the long
>>> run times of the hadoop-hdfs module unit tests (typically 2+ hours) or
>>> the hdfs tests themselves were related to the patch process directory
>>> getting removed out from underneath test-patch.
>>> 
>>>        To test the hypothesis, I submitted all of the non-HDFS patches so
>>> that they were first in the queue and let them run for a very long time.
>>> Jenkins bounced back and forth between YARN, MR, and HADOOP.  No issues
>>> encountered.  Then I added HDFS patches into the mix.  BOOM.  The dreaded
>>> "The patch artifact directory has been removed!" started to appear here
>>> and there.  This seems to provide some evidence that, yes, hdfs unit
>>> tests are directly or indirectly related to the failures.
>>> 
>>>        IMO, we need to take a serious look at:
>>> 
>>>        * splitting up the hadoop-hdfs module into multiple modules to
>>>          reduce unit test run times
>>>        * checking to see if the precommit hooks in hdfs are different
>>>          from the rest (I do know that the YARN bits are different and
>>>          appear to have some bugs as well)
>>>        * increasing the timeout for Jenkins job runs
>>> 
>>>        FWIW, I've also found some minor things here and there with the
>>> rewritten test-patch.sh.  JIRAs have been filed: one critical, one
>>> major, and a handful of minor ones.
>> 
>> 
>> 
>> 
>> -- 
>> Sean
>
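
Sean's point about every job writing to ${WORKSPACE}/../patchProcess can be illustrated with a small, self-contained shell simulation. This is a sketch under stated assumptions, not the real job configuration or test-patch.sh logic: the two sibling directories are hypothetical stand-ins for two Jenkins workspaces on the same agent, and the `mv` stands in for the end-of-run move of the artifact directory.

```shell
#!/usr/bin/env bash
# Sketch only: hypothetical paths, not the real Jenkins/test-patch.sh code.
# Two jobs on one agent resolve ${WORKSPACE}/../patchProcess to the SAME dir.
set -eu

simulate_shared_dir() {
  local root
  root="$(mktemp -d)"
  # Hypothetical stand-ins for two sibling Jenkins workspaces on one agent.
  mkdir -p "$root/job-a" "$root/job-b"

  # Each job computes the artifact dir relative to its own workspace...
  local dir_a="$root/job-a/../patchProcess"
  local dir_b="$root/job-b/../patchProcess"
  # ...but both paths resolve to $root/patchProcess.

  mkdir -p "$dir_a"
  echo "artifacts from job A" > "$dir_a/a.txt"

  # Job B finishes first and blindly moves "its" directory away.
  mkdir -p "$dir_b"
  mv "$dir_b" "$root/archived-b"

  if [ -d "$dir_a" ]; then
    echo "shared dir: still present for job A"
  else
    echo "shared dir: removed under job A"
  fi

  # Contrast: directories inside each workspace do not alias each other.
  mkdir -p "$root/job-a/patchProcess" "$root/job-b/patchProcess"
  mv "$root/job-b/patchProcess" "$root/archived-b2"
  if [ -d "$root/job-a/patchProcess" ]; then
    echo "in-workspace dir: still present for job A"
  else
    echo "in-workspace dir: removed under job A"
  fi

  rm -rf "$root"
}

simulate_shared_dir
```

The first check reports the shared directory as gone even though job A never touched it, which mirrors the "patch artifact directory has been removed" symptom; the in-workspace variant survives because the paths no longer alias.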
