[ https://issues.apache.org/jira/browse/HADOOP-15711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16610002#comment-16610002 ]

Allen Wittenauer commented on HADOOP-15711:
-------------------------------------------

bq. Isolate a commit which caused stopping sending the emails when the build 
finishes.

Jenkins will send an email if the build finishes.  This job doesn't finish in 
the allotted time. Increasing the allotted time just makes it spin longer.  The 
only "commit" would be to rewrite parts of Yetus to write partial report files.

bq. Alternatively we can just switch to old-fashioned test-patch build in favor 
of yetus. It seems that our local Jenkins with test-patch based build is 
succeeding on branch-2, that is, some tests are failing, but the build does not 
timeout.

Pre-yetus test-patch doesn't know how to do full builds.  So I don't know what 
is running on your local instance. The "full" builds[1] that the ASF used to 
run prior to Yetus (about 3 years ago...) were literally:

{noformat}
cd hadoop-(common|hdfs|mapreduce|yarn)-project
mvn test (options)
{noformat}

This means that if there was ANY failure in a module, the whole mvn process 
died.  As soon as the mvn process dies, the build is dead and the job is closed 
off.  Yetus actually knows to break builds up by module, so one module failing 
will not prevent the other modules from getting tested, regardless of the 
normal maven dependency checks.
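A rough sketch of the difference (the module names are the real top-level Hadoop project directories, but run_module_tests is a hypothetical stand-in for "(cd $module && mvn test)" so the control flow is visible):

```shell
# Sketch only: a single top-level "mvn test" dies at the first failing
# module, while a per-module loop (roughly what Yetus does) still
# exercises the rest. run_module_tests is a stand-in for the real
# "(cd $module && mvn test)" invocation.
run_module_tests() {
  case "$1" in
    hadoop-hdfs-project) return 1 ;;  # pretend the HDFS tests fail
    *) return 0 ;;
  esac
}

failed=""
for module in hadoop-common-project hadoop-hdfs-project \
              hadoop-mapreduce-project hadoop-yarn-project; do
  # record the failure but keep going, instead of aborting the build
  run_module_tests "$module" || failed="$failed $module"
done
echo "failed modules:$failed"
```

With the monolithic form, the failure in hadoop-hdfs-project would have ended the run before the mapreduce and yarn modules were ever tested.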

Additionally, your local Jenkins instance is likely running on something 
significantly bigger/less busy than the ASF node. When I played with this last 
year I could absolutely get branch-2 to exhaust system resources without Yetus 
running on my own gear. In fact, I was only able to write the Yetus barrier 
code because I could duplicate those failures.  If Yetus is removed, then 
someone needs to help Infra babysit the dead nodes that will likely be created.

I seem to recall that the last time I tried these experiments, branch-2.7 was 
OK.  So there is something in 2.8 and up that is messed up.

1 - They weren't actually full builds and large swaths of code went untested, 
including all of hadoop-tools.

> Fix branch-2 builds
> -------------------
>
>                 Key: HADOOP-15711
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15711
>             Project: Hadoop Common
>          Issue Type: Task
>            Reporter: Jonathan Hung
>            Priority: Critical
>
> Branch-2 builds have been disabled for a while: 
> https://builds.apache.org/view/H-L/view/Hadoop/job/hadoop-qbt-branch2-java7-linux-x86/
> A test run here causes hdfs tests to hang: 
> https://builds.apache.org/view/H-L/view/Hadoop/job/hadoop-qbt-branch2-java7-linux-x86-jhung/4/
> Running hadoop-hdfs tests locally reveals some errors such as:
> {noformat}
> [ERROR] testComplexAppend2(org.apache.hadoop.hdfs.TestFileAppend2)  Time elapsed: 0.059 s  <<< ERROR!
> java.lang.OutOfMemoryError: unable to create new native thread
>         at java.lang.Thread.start0(Native Method)
>         at java.lang.Thread.start(Thread.java:714)
>         at org.apache.hadoop.hdfs.server.namenode.FSImage.saveFSImageInAllDirs(FSImage.java:1164)
>         at org.apache.hadoop.hdfs.server.namenode.FSImage.saveFSImageInAllDirs(FSImage.java:1128)
>         at org.apache.hadoop.hdfs.server.namenode.FSImage.format(FSImage.java:174)
>         at org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:1172)
>         at org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:403)
>         at org.apache.hadoop.hdfs.DFSTestUtil.formatNameNode(DFSTestUtil.java:234)
>         at org.apache.hadoop.hdfs.MiniDFSCluster.createNameNodesAndSetConf(MiniDFSCluster.java:1080)
>         at org.apache.hadoop.hdfs.MiniDFSCluster.initMiniDFSCluster(MiniDFSCluster.java:883)
>         at org.apache.hadoop.hdfs.MiniDFSCluster.<init>(MiniDFSCluster.java:514)
>         at org.apache.hadoop.hdfs.MiniDFSCluster$Builder.build(MiniDFSCluster.java:473)
>         at org.apache.hadoop.hdfs.TestFileAppend2.testComplexAppend(TestFileAppend2.java:489)
>         at org.apache.hadoop.hdfs.TestFileAppend2.testComplexAppend2(TestFileAppend2.java:543)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> {noformat}
> I was able to get more tests passing locally by increasing the max user 
> process count on my machine. But the error suggests that there's an issue in 
> the tests themselves. I'm not sure whether the error seen locally is the same 
> reason the Jenkins builds are failing; I wasn't able to confirm, given the 
> Jenkins builds' lack of output.
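For reference, the "unable to create new native thread" OutOfMemoryError in the quoted trace usually means the per-user process/thread limit was exhausted rather than the Java heap. A minimal way to inspect and (for the current shell) raise that limit, where 8192 is only an example value, not a recommendation:

```shell
# Sketch: check the per-user process/thread limit that this OOM usually
# points at, then try to raise the soft limit for this shell session.
current=$(ulimit -u)
echo "max user processes: $current"
# Raising it only works if the hard limit permits; failure is non-fatal here.
ulimit -u 8192 2>/dev/null || echo "could not raise limit (hard limit too low?)"
```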



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
