[
https://issues.apache.org/jira/browse/SOLR-13060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16720604#comment-16720604
]
Dawid Weiss commented on SOLR-13060:
------------------------------------
So my best guess is this: the suite wasn't terminated by a timeout because it
actually completed (with a failure). There are numerous messages about threads
that leaked out of the suite scope and many attempts to send interrupts to
these threads. Fairly early on, this happens as well:
{code}
[
"APPEND_STDERR",
{
"chunk": "177769 WARN
(org.apache.hadoop.hdfs.server.namenode.FSNamesystem$NameNodeResourceMonitor@6734d615)
[ ] o.a.h.h.s.n.NameNodeResourceChecker Space available on volume
'/dev/sdb1' is 0, which is below the configured reserved amount 104857600%0A"
}
]
{code}
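For context, here's a rough sketch of the kind of check behind that warning
(my own illustration, not the actual Hadoop NameNodeResourceChecker source;
104857600 bytes is the 100 MB default of dfs.namenode.resource.du.reserved,
if I remember correctly):
{code}
import java.io.File;

public class VolumeCheck {
  // Assumed default of dfs.namenode.resource.du.reserved: 100 MB.
  static final long RESERVED_BYTES = 104857600L;

  public static void main(String[] args) {
    File volume = new File(args.length > 0 ? args[0] : "/");
    long available = volume.getUsableSpace();
    if (available < RESERVED_BYTES) {
      // Mirrors the WARN above: with 0 bytes free on the build machine's
      // /dev/sdb1, this condition is permanently true.
      System.err.printf("Space available on volume '%s' is %d, which is below "
          + "the configured reserved amount %d%n", volume, available, RESERVED_BYTES);
    }
  }
}
{code}
So the warning itself is expected on a full disk; the problem is everything
that happens around it.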
The subsequent 31 gigs of logs are endless repetitions of this:
{code}
[
"APPEND_STDERR",
{
"chunk": "49851804 WARN
(org.apache.hadoop.hdfs.server.datanode.DataXceiverServer@2d347651) [ ]
o.a.h.h.s.d.DataNode 127.0.0.1:42148:DataXceiverServer:
%0Ajava.nio.channels.ClosedChannelException: null%0A%09at
sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:235)
~[?:1.8.0_191]%0A%09at
sun.nio.ch.ServerSocketAdaptor.accept(ServerSocketAdaptor.java:100)
~[?:1.8.0_191]%0A%09at
org.apache.hadoop.hdfs.net.TcpPeerServer.accept(TcpPeerServer.java:141)
~[hadoop-hdfs-2.7.4.jar:?]%0A%09at
org.apache.hadoop.hdfs.server.datanode.DataXceiverServer.run(DataXceiverServer.java:135)
[hadoop-hdfs-2.7.4.jar:?]%0A%09at java.lang.Thread.run(Thread.java:748)
[?:1.8.0_191]%0A"
}
]
{code}
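Reading that trace, the repetition looks like a classic accept loop that
treats every IOException as transient: once the server channel is closed but
the loop's run flag is still true, each accept() throws
ClosedChannelException immediately and the loop spins, logging forever. A
minimal sketch of that failure mode (my own illustration, not the actual
DataXceiverServer code):
{code}
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.ServerSocketChannel;

public class AcceptLoopSpin {
  private volatile boolean shouldRun = true;

  void run(ServerSocketChannel server) {
    while (shouldRun) {
      try {
        server.accept(); // throws ClosedChannelException once the channel is closed
      } catch (IOException e) {
        // Treating a closed channel as a transient error turns this loop
        // into an infinite WARN generator -- hence the 31 gigs of logs.
        System.err.println("WARN DataXceiverServer: " + e);
      }
    }
  }

  public static void main(String[] args) throws IOException {
    ServerSocketChannel server = ServerSocketChannel.open();
    server.bind(new InetSocketAddress(0));
    server.close(); // simulate the shutdown path closing the channel early
    new AcceptLoopSpin().run(server); // spins and logs forever; kill with Ctrl+C
  }
}
{code}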
I am not sure why the JVM doesn't terminate eventually, because the forked
process does call halt() after all suites are processed. Without a stack trace
it's impossible to tell where it got stuck or why... The lack of a clean
environment because of those thread leaks doesn't make the diagnosis any
easier. I'll take another look tomorrow, time permitting, but it's such a mess
of boundary conditions (leaked unknown threads crossing test/runner scope
boundaries, zero disk space, timeouts, 40-gig logs...) that any analysis is
very time consuming, if possible at all.
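To be clear about why the hang is surprising: halt(), unlike System.exit(),
skips shutdown hooks entirely, and neither of them waits for non-daemon
threads, so leaked threads alone shouldn't be able to keep the process alive.
A quick standalone demo of that assumption (my sketch, not runner code):
{code}
public class HaltDemo {
  public static void main(String[] args) throws InterruptedException {
    // A stuck non-daemon thread that never exits and ignores interrupts.
    Thread stuck = new Thread(() -> {
      while (true) { /* spin */ }
    }, "stuck-non-daemon");
    stuck.start();
    Thread.sleep(100);
    // The process dies here regardless of the stuck thread; shutdown hooks
    // are not run. With System.exit(0) the hooks would run, but the stuck
    // thread still couldn't block termination.
    Runtime.getRuntime().halt(0);
  }
}
{code}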
I think limiting the amount of sysouts would be a good start; the patch I
posted a few days ago is pretty much ready, and I'll commit it over the
weekend.
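For illustration only (this is a sketch of the idea, not the actual patch): a
byte-budgeted stream wrapper is enough to keep a runaway suite from writing
31 gigs:
{code}
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

public class CappedOutputStream extends FilterOutputStream {
  private long remaining;

  public CappedOutputStream(OutputStream out, long maxBytes) {
    super(out);
    this.remaining = maxBytes;
  }

  // FilterOutputStream routes the array write variants through write(int),
  // so overriding this single method is enough for a sketch.
  @Override
  public void write(int b) throws IOException {
    if (remaining > 0) {
      out.write(b);
      if (--remaining == 0) {
        out.write("\n[output truncated]\n".getBytes(StandardCharsets.UTF_8));
      }
    }
    // Past the budget: drop everything silently.
  }
}
{code}
Wrapping the captured suite stdout/stderr in something like this (with, say,
a few MB of budget per suite) would bound the spill files no matter how noisy
a leaked thread gets.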
> Some Nightly HDFS tests never terminate on ASF Jenkins, triggering whole-job
> timeout, causing Jenkins to kill JVMs, causing dump files to be created that
> fill all disk space, causing failure of all following jobs on the same node
> -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: SOLR-13060
> URL: https://issues.apache.org/jira/browse/SOLR-13060
> Project: Solr
> Issue Type: Bug
> Security Level: Public(Default Security Level. Issues are Public)
> Components: Tests
> Reporter: Steve Rowe
> Priority: Major
> Attachments:
> junit4-J0-20181210_065854_4175881849742830327151.spill.part1.gz
>
>
> The 3 tests that are affected:
> * HdfsAutoAddReplicasIntegrationTest
> * HdfsCollectionsAPIDistributedZkTest
> * MoveReplicaHDFSTest
> Instances from the dev list:
> 12/1:
> https://lists.apache.org/thread.html/e04ad0f9113e15f77393ccc26e3505e3090783b1d61bd1c7ff03d33e@%3Cdev.lucene.apache.org%3E
> 12/5:
> https://lists.apache.org/thread.html/d78c99255abfb5134803c2b77664c1a039d741f92d6e6fcbcc66cd14@%3Cdev.lucene.apache.org%3E
> 12/8:
> https://lists.apache.org/thread.html/92ad03795ae60b1e94859d49c07740ca303f997ae2532e6f079acfb4@%3Cdev.lucene.apache.org%3E
> 12/8:
> https://lists.apache.org/thread.html/26aace512bce0b51c4157e67ac3120f93a99905b40040bee26472097@%3Cdev.lucene.apache.org%3E
> 12/11:
> https://lists.apache.org/thread.html/33558a8dd292fd966a7f476bf345b66905d99f7eb9779a4d17b7ec97@%3Cdev.lucene.apache.org%3E