[
https://issues.apache.org/jira/browse/SOLR-13060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16720256#comment-16720256
]
Steve Rowe commented on SOLR-13060:
-----------------------------------
Same thing happened again, AFAICT *after* my suite timeout commit, on
https://builds.apache.org/job/Lucene-Solr-NightlyTests-master/1721/ - two test
suites continued to print HEARTBEAT messages long after the hour (3600s)
timeout I set on them. [~dweiss] do you understand what's happening?
From .../1721/consoleText:
{noformat}
Checking out Revision ef2f0cd88c6eb4b662aea06eaeb3b933288b23eb (refs/remotes/origin/master)
[...]
[junit4] HEARTBEAT J2 PID(17526@lucene1-us-west): 2018-12-13T05:07:17, stalled for 48000s at: MoveReplicaHDFSTest (suite)
[junit4] HEARTBEAT J0 PID(17515@lucene1-us-west): 2018-12-13T05:07:17, stalled for 49652s at: HdfsCollectionsAPIDistributedZkTest (suite)
{noformat}
From {{git log}}:
{noformat}
commit ef2f0cd88c6eb4b662aea06eaeb3b933288b23eb
Author: Jan Høydahl <[email protected]>
Date: Wed Dec 12 11:33:32 2018 +0100
[...]
commit ec1bd0da2f784a39fbe5dc21d78349c41bfdaec2
Author: Steve Rowe <[email protected]>
Date: Tue Dec 11 18:49:06 2018 -0800
{noformat}
From {{MoveReplicaHDFSTest}} and {{HdfsCollectionsAPIDistributedZkTest}} at
commit {{ef2f0c}}:
{code:java}
@ThreadLeakFilters(defaultFilters = true, filters = {
    BadHdfsThreadsFilter.class, // hdfs currently leaks thread(s)
    MoveReplicaHDFSTest.ForkJoinThreadsFilter.class
})
@Nightly // test is too long for non nightly
@TimeoutSuite(millis = TimeUnits.HOUR)
public class MoveReplicaHDFSTest extends MoveReplicaTest {
{code}
{code:java}
@Slow
@Nightly
@ThreadLeakFilters(defaultFilters = true, filters = {
    BadHdfsThreadsFilter.class // hdfs currently leaks thread(s)
})
@TimeoutSuite(millis = TimeUnits.HOUR)
//commented 23-AUG-2018 @LuceneTestCase.BadApple(bugUrl="https://issues.apache.org/jira/browse/SOLR-12028") // 12-Jun-2018
public class HdfsCollectionsAPIDistributedZkTest extends CollectionsAPIDistributedZkTest {
{code}
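As a rough cross-check of the HEARTBEAT lines quoted above against the 3600s suite timeout, here is a minimal sketch. The class name {{HeartbeatCheck}}, its method names, and the regular expression are hypothetical, inferred only from the two console lines quoted in this comment; none of this is part of junit4 or the Solr test framework.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HeartbeatCheck {
    // Assumed line shape, taken only from the two HEARTBEAT lines quoted above:
    // "... stalled for <seconds>s at: <SuiteName> (suite)"
    private static final Pattern STALLED =
        Pattern.compile("stalled for (\\d+)s at: (\\S+)");

    /** Returns the stall time in seconds, or empty if the line has no stall report. */
    public static Optional<Long> stalledSeconds(String line) {
        Matcher m = STALLED.matcher(line);
        return m.find() ? Optional.of(Long.parseLong(m.group(1))) : Optional.empty();
    }

    /** True if the line reports a stall longer than the given timeout. */
    public static boolean exceedsTimeout(String line, long timeoutSeconds) {
        return stalledSeconds(line).map(s -> s > timeoutSeconds).orElse(false);
    }

    public static void main(String[] args) {
        long timeoutSeconds = 3600L; // the one-hour suite timeout mentioned above
        String[] lines = {
            "[junit4] HEARTBEAT J2 PID(17526@lucene1-us-west): 2018-12-13T05:07:17, "
                + "stalled for 48000s at: MoveReplicaHDFSTest (suite)",
            "[junit4] HEARTBEAT J0 PID(17515@lucene1-us-west): 2018-12-13T05:07:17, "
                + "stalled for 49652s at: HdfsCollectionsAPIDistributedZkTest (suite)"
        };
        for (String line : lines) {
            // Report whether each suite has run past the timeout.
            System.out.println(exceedsTimeout(line, timeoutSeconds) + " <- " + line);
        }
    }
}
```

Both quoted stall times (48000s and 49652s) are more than 13 times the 3600s timeout.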
> Some Nightly HDFS tests never terminate on ASF Jenkins, triggering whole-job
> timeout, causing Jenkins to kill JVMs, causing dump files to be created that
> fill all disk space, causing failure of all following jobs on the same node
> -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: SOLR-13060
> URL: https://issues.apache.org/jira/browse/SOLR-13060
> Project: Solr
> Issue Type: Bug
> Security Level: Public(Default Security Level. Issues are Public)
> Components: Tests
> Reporter: Steve Rowe
> Priority: Major
> Attachments:
> junit4-J0-20181210_065854_4175881849742830327151.spill.part1.gz
>
>
> The 3 tests that are affected:
> * HdfsAutoAddReplicasIntegrationTest
> * HdfsCollectionsAPIDistributedZkTest
> * MoveReplicaHDFSTest
> Instances from the dev list:
> 12/1:
> https://lists.apache.org/thread.html/e04ad0f9113e15f77393ccc26e3505e3090783b1d61bd1c7ff03d33e@%3Cdev.lucene.apache.org%3E
> 12/5:
> https://lists.apache.org/thread.html/d78c99255abfb5134803c2b77664c1a039d741f92d6e6fcbcc66cd14@%3Cdev.lucene.apache.org%3E
> 12/8:
> https://lists.apache.org/thread.html/92ad03795ae60b1e94859d49c07740ca303f997ae2532e6f079acfb4@%3Cdev.lucene.apache.org%3E
> 12/8:
> https://lists.apache.org/thread.html/26aace512bce0b51c4157e67ac3120f93a99905b40040bee26472097@%3Cdev.lucene.apache.org%3E
> 12/11:
> https://lists.apache.org/thread.html/33558a8dd292fd966a7f476bf345b66905d99f7eb9779a4d17b7ec97@%3Cdev.lucene.apache.org%3E