FWIW, I’m working on getting the Jenkins race conditions that Sean pointed out fixed in HADOOP-11917.
On May 4, 2015, at 2:23 PM, Chris Nauroth <cnaur...@hortonworks.com> wrote: > If we suspect long run times are a potential root cause, then another > thing we could try is turning on parallel test execution. To do that, > we'd add the -Pparallel-tests argument and possibly tune > -DtestsThreadCount=N. (The default for N is 4.) > > https://issues.apache.org/jira/browse/HADOOP-9287 > > This has given some of us significant speed-ups while running tests in our > dev environments. I haven't tried it in a while though, so we might > surface some test isolation problems, such as if 2 test suites tried to > work in the same directory for data. We cleaned up a lot of issues like > that before committing the parallel-tests patches, but it's possible new > problems have crept in. > > --Chris Nauroth > > > > > On 5/3/15, 9:02 PM, "Sean Busbey" <bus...@cloudera.com> wrote: > >> The patch artifact directory in the mainline hadoop jenkins jobs are >> outside of the workspace. I'm not sure what, if anything, jenkins >> guarantees about files out of the main workspace. >> >> They all write to ${WORKSPACE}/../patchProcess, which will probably >> collide >> if multiple runs happen on the same machine. They also all blindly move >> that directory at the end of the run. >> >> On Sun, May 3, 2015 at 3:02 PM, Allen Wittenauer <a...@altiscale.com> wrote: >> >>> >>> So, as some may have noticed, I slammed the Jenkins servers over >>> the weekend to get some recent patch test runs in JIRA for the bug bash >>> this week. I've had a suspicion for a while now that either the long >>> run >>> times of the hadoop-hdfs module unit tests (typically 2+ hours) or the >>> hdfs >>> tests themselves were related to the patch process directory getting >>> removed out from underneath test-patch. >>> >>> To test the hypothesis, I submitted all of the non-HDFS patches >>> so >>> that they were first in the queue. Let them run for a very long time. >>> Jenkins bounced back and forth between YARN, MR, and HADOOP. No issues >>> encounters. Added HDFS patches into the mix. BOOM. The dreaded "The >>> patch >>> artifact directory has been removed! ³ started to appear here and there. >>> This seems to provide some evidence that, yes, hdfs unit tests are >>> directory or indirectly related to the failures. >>> >>> IMO, I think we need to take a serious look at: >>> >>> * splitting up the hadoop-hdfs module into multiple modules to >>> reduce unit test run times >>> * checking to see if the pre commit hooks in hdfs are different >>> than the rest (I do know that the YARN bits are different and appear to >>> have some bugs as well) >>> * increasing the timeout for jenkins job runs >>> >>> FWIW, I¹ve also found some minor things here and there with the >>> rewritten test-patch.sh. JIRAs have been filed. One critical, one >>> major >>> and a handful of minor things. >> >> >> >> >> -- >> Sean >