[ https://issues.apache.org/jira/browse/SOLR-13074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16724511#comment-16724511 ]

Dawid Weiss commented on SOLR-13074:
------------------------------------

So the problem here is that an NPE happens in the BeforeClass hook in
MoveReplicaHDFSTest:
{code}
  @BeforeClass
  public static void setupClass() throws Exception {
    System.setProperty("solr.hdfs.blockcache.enabled", "false");
    dfsCluster = HdfsTestUtil.setupClass(createTempDir().toFile().getAbsolutePath());

    // NPE here: zkClient() dereferences 'cluster', which is still null at this point
    ZkConfigManager configManager = new ZkConfigManager(zkClient());
{code}

When zkClient() from SolrCloudTestCase is called, the 'cluster' variable is
still null, causing an NPE. Then things get out of hand: we have already
initialized dfsCluster, but the AfterClass hook fails with an NPE before it can
clean it up:
{code}
  @AfterClass
  public static void teardownClass() throws Exception {
    cluster.shutdown(); // need to close before the MiniDFSCluster
    HdfsTestUtil.teardownClass(dfsCluster);
    dfsCluster = null;
  }
{code}
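
To make the failure mode concrete, here is a self-contained sketch that
reproduces it without Solr or Hadoop. FakeDfsCluster and FakeSolrCluster are
hypothetical stand-ins for MiniDFSCluster and the test's Solr cluster, not real
classes. Setup acquires the DFS resource and then fails because 'cluster' is
null; the naive teardown, like the one above, throws on cluster.shutdown() and
never releases the DFS resource, while a null-safe variant using try/finally
does.
{code:java}
/**
 * Self-contained sketch of the leak. FakeDfsCluster and FakeSolrCluster are
 * hypothetical stand-ins, not Solr or Hadoop classes.
 */
public class TeardownLeakDemo {
  static final class FakeDfsCluster {
    boolean shutDown;
    void shutdown() { shutDown = true; }
  }

  static final class FakeSolrCluster {
    void shutdown() {}
  }

  static FakeDfsCluster dfsCluster;
  static FakeSolrCluster cluster; // stays null, as 'cluster' does in the failing test

  static void setupClass() {
    dfsCluster = new FakeDfsCluster(); // resource acquired first...
    if (cluster == null) {             // ...then the equivalent of zkClient() fails
      throw new NullPointerException("cluster is null");
    }
  }

  /** Mirrors the original teardown: the NPE on 'cluster' skips the DFS shutdown. */
  static void naiveTeardown() {
    cluster.shutdown();
    dfsCluster.shutdown();
    dfsCluster = null;
  }

  /** Null-safe variant: the DFS cluster is released even if 'cluster' was never created. */
  static void safeTeardown() {
    try {
      if (cluster != null) cluster.shutdown(); // still closed before the DFS cluster
    } finally {
      if (dfsCluster != null) dfsCluster.shutdown();
      dfsCluster = null;
    }
  }

  public static void main(String[] args) {
    try { setupClass(); } catch (NullPointerException expected) { }
    FakeDfsCluster dfs = dfsCluster;
    try { naiveTeardown(); } catch (NullPointerException expected) { }
    System.out.println("naive teardown released DFS: " + dfs.shutDown); // false

    safeTeardown();
    System.out.println("safe teardown released DFS: " + dfs.shutDown);  // true
  }
}
{code}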

That's the reason for all those thread leaks from HDFS. Now, I have no idea how
to initialize this cluster properly (I know nothing about the cloud infra). I've
committed some code to master that cleans up this test properly, so it now
reports the actual cause of the problem. The cleanup code begs for some kind of
higher-level "closer" that could close all these objects in order, taking into
account nulls and their specific close methods. I didn't deal with that.
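
A minimal sketch of such a closer, in plain Java with no Solr dependencies.
OrderedCloser, add() and CloseAction are hypothetical names, not an existing
Lucene or Solr utility. Each resource is registered with its own close method;
at close time, resources are closed in reverse registration order, null ones
are skipped, and a failing close does not stop the remaining ones (the first
exception is rethrown with the later ones attached as suppressed).
{code:java}
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Supplier;

/**
 * Hypothetical helper (not an existing Lucene/Solr class): closes registered
 * resources in reverse registration order, skipping nulls, and keeps going
 * when one close fails.
 */
public final class OrderedCloser implements AutoCloseable {

  /** Resource-specific close method, e.g. c -> c.shutdown(). */
  @FunctionalInterface
  public interface CloseAction<T> {
    void close(T resource) throws Exception;
  }

  /** One deferred close step. */
  @FunctionalInterface
  private interface Step {
    void run() throws Exception;
  }

  private final Deque<Step> steps = new ArrayDeque<>();

  /**
   * Registers a resource. The supplier is read at close time, so a static
   * field that was never assigned (still null) is simply skipped.
   */
  public <T> OrderedCloser add(Supplier<? extends T> resource, CloseAction<? super T> action) {
    steps.push(() -> {
      T value = resource.get();
      if (value != null) {
        action.close(value);
      }
    });
    return this;
  }

  /** Closes in LIFO order; rethrows the first failure with later ones suppressed. */
  @Override
  public void close() throws Exception {
    Exception first = null;
    while (!steps.isEmpty()) {
      Step step = steps.pop();
      try {
        step.run();
      } catch (Exception e) {
        if (first == null) {
          first = e;
        } else {
          first.addSuppressed(e);
        }
      }
    }
    if (first != null) {
      throw first;
    }
  }
}
{code}
Applied to teardownClass(), dfsCluster would be registered first and cluster
second, for example closer.add(() -> dfsCluster, HdfsTestUtil::teardownClass)
followed by closer.add(() -> cluster, c -> c.shutdown()), assuming the
signatures implied by the snippets above. The Solr cluster would still shut
down before the MiniDFSCluster, and a null cluster would no longer prevent the
HDFS cleanup.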

[[email protected]] -- would you take a look at how to initialize the 
cluster properly in this test? Or maybe [~ab] would know how to fix it (I see 
you're the original author of this test, Andrzej, hence the question).

> MoveReplicaHDFSTest leaks threads, falls into an endless loop, logging like crazy
> ---------------------------------------------------------------------------------
>
>                 Key: SOLR-13074
>                 URL: https://issues.apache.org/jira/browse/SOLR-13074
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Dawid Weiss
>            Assignee: Dawid Weiss
>            Priority: Major
>
> This reproduces for me, always (Linux box):
> {code}
> ant test -Dtestcase=MoveReplicaHDFSTest -Dtests.seed=DC1CE772C445A55D -Dtests.multiplier=2 -Dtests.nightly=true -Dtests.slow=true -Dtests.locale=fr -Dtests.timezone=Australia/Tasmania -Dtests.asserts=true -Dtests.file.encoding=UTF-8
> {code}
> It's the bug in Hadoop I discussed in SOLR-13060: one of the threads falls
> into an endless loop when terminated (interrupted). Perhaps there is
> something we should close cleanly but don't.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
