[ https://issues.apache.org/jira/browse/HADOOP-15711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16623061#comment-16623061 ]

Allen Wittenauer commented on HADOOP-15711:
-------------------------------------------

That's not how any of this works. People need to stop skimming and actually 
_READ_ the damn logs.

bq. Folks are working on patches/features targeting branch-2 that do not touch 
HDFS code, and hence probably the tests won't get executed anyway - at least 
for most YARN patches.

It's a race condition in surefire.  YARN patches ARE impacted.  HDFS is just 
impacted more often than not because there are so many tests in one maven 
module.  Go look at the ENTIRE log that shv posted:

hadoop-hdfs
bkjournal
mapreduce-client-jobclient
hadoop-archives
hadoop-extras
yarn-distributedshell
yarn-client

Hey look, two YARN modules.  

Hell, this revert triggered one in hadoop-common in its very first run.  

bq. @ Ignore offending tests for now

You can't.  There is no way to identify which tests actually trigger the dead 
JVM bug because it's random.  Worse still, the XML data that Surefire, Yetus, 
and Jenkins rely upon isn't written when the JVM crash occurs, so there's no 
clue as to which tests didn't run.  That's why this bug is so deadly.
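Since the surefire XML never gets written for the crashed fork, about the only way to spot the victims is by absence. A rough sketch of that idea (the layout and naming conventions here are assumptions about a typical Maven module, not an official Yetus check):

```shell
# Hypothetical diagnostic: list test classes in a Maven module that produced
# no surefire-reports XML -- candidates for "ran in the JVM that died".
# The src/test/java layout and TEST-*.xml naming are assumptions.
missing_reports() {
  local module="$1"
  local src class
  for src in $(find "$module/src/test/java" -name 'Test*.java' 2>/dev/null); do
    class=$(basename "$src" .java)
    # No XML report mentioning this class => it either never ran or its fork died
    if ! ls "$module/target/surefire-reports/"*"$class"*.xml >/dev/null 2>&1; then
      echo "no report: $class"
    fi
  done
}
# Usage: missing_reports hadoop-hdfs-project/hadoop-hdfs
```

Even this is only a heuristic: a class whose fork died mid-run can still leave a partial or stale XML file behind.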

One can either have a good run (like this revert patch's first run--only one 
JVM crashed!) or a bad one where the whole house of cards comes crumbling down 
(like the second one).

Personally:  I know that reverting surefire upgrades is just going to make 
things worse.  Upgrading surefire helped stabilize things tremendously in 
trunk. It might be that surefire needs to get upgraded to an even higher rev.

If everyone is just worried about getting patch results and not full builds, 
then with the exception of HDFS, everything should be working fairly well so 
long as committers keep branch-2's Dockerfile updated correctly (e.g., 
YARN-8658 generated branch-2 results for a patch).  HDFS has some other issue 
in branch-2.8 and up that causes it to go out of control randomly, especially 
when the unit tests are run in parallel.
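For reference, the fork-related surefire knobs in play look roughly like this in a module pom (a hypothetical fragment: the version and values are illustrative, not a tested fix for branch-2):

```xml
<!-- Sketch of the surefire fork settings being discussed; values illustrative -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-surefire-plugin</artifactId>
  <version>2.21.0</version>  <!-- hypothetical "higher rev" than branch-2's -->
  <configuration>
    <forkCount>1</forkCount>             <!-- avoid the parallel-fork stampede -->
    <reuseForks>false</reuseForks>       <!-- fresh JVM per test class -->
    <forkedProcessTimeoutInSeconds>900</forkedProcessTimeoutInSeconds>
  </configuration>
</plugin>
```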

> Fix branch-2 builds
> -------------------
>
>                 Key: HADOOP-15711
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15711
>             Project: Hadoop Common
>          Issue Type: Task
>            Reporter: Jonathan Hung
>            Priority: Critical
>         Attachments: HADOOP-15711.001.branch-2.patch
>
>
> Branch-2 builds have been disabled for a while: 
> https://builds.apache.org/view/H-L/view/Hadoop/job/hadoop-qbt-branch2-java7-linux-x86/
> A test run here causes hdfs tests to hang: 
> https://builds.apache.org/view/H-L/view/Hadoop/job/hadoop-qbt-branch2-java7-linux-x86-jhung/4/
> Running hadoop-hdfs tests locally reveal some errors such 
> as:{noformat}[ERROR] 
> testComplexAppend2(org.apache.hadoop.hdfs.TestFileAppend2)  Time elapsed: 
> 0.059 s  <<< ERROR!
> java.lang.OutOfMemoryError: unable to create new native thread
>         at java.lang.Thread.start0(Native Method)
>         at java.lang.Thread.start(Thread.java:714)
>         at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.saveFSImageInAllDirs(FSImage.java:1164)
>         at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.saveFSImageInAllDirs(FSImage.java:1128)
>         at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.format(FSImage.java:174)
>         at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:1172)
>         at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:403)
>         at 
> org.apache.hadoop.hdfs.DFSTestUtil.formatNameNode(DFSTestUtil.java:234)
>         at 
> org.apache.hadoop.hdfs.MiniDFSCluster.createNameNodesAndSetConf(MiniDFSCluster.java:1080)
>         at 
> org.apache.hadoop.hdfs.MiniDFSCluster.initMiniDFSCluster(MiniDFSCluster.java:883)
>         at 
> org.apache.hadoop.hdfs.MiniDFSCluster.<init>(MiniDFSCluster.java:514)
>         at 
> org.apache.hadoop.hdfs.MiniDFSCluster$Builder.build(MiniDFSCluster.java:473)
>         at 
> org.apache.hadoop.hdfs.TestFileAppend2.testComplexAppend(TestFileAppend2.java:489)
>         at 
> org.apache.hadoop.hdfs.TestFileAppend2.testComplexAppend2(TestFileAppend2.java:543)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>         at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43){noformat}
> I was able to get more tests passing locally by increasing the max user 
> process count on my machine, but the error suggests there's an issue in the 
> tests themselves. I'm not sure whether the error seen locally has the same 
> cause as the jenkins build failures; I wasn't able to confirm, given the 
> jenkins builds' lack of output.
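Worth noting on the stack trace in the description: "unable to create new native thread" is an OS-level thread/process limit being hit, not Java heap exhaustion. The local workaround described above looks roughly like this (a sketch; the 16384 value is an arbitrary illustration, and raising the limit may require a sufficiently high hard limit):

```shell
# "unable to create new native thread" usually means the per-user process
# limit, not the heap, was exhausted. Check it, then try to raise it for
# this shell session before re-running the tests.
ulimit -u   # show the current cap on user processes/threads
ulimit -u 16384 2>/dev/null || echo "could not raise limit (hard limit too low?)"
```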



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
