[
https://issues.apache.org/jira/browse/SOLR-13074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16724511#comment-16724511
]
Dawid Weiss commented on SOLR-13074:
------------------------------------
So the problem here is that an NPE happens in the BeforeClass hook of
MoveReplicaHDFSTest:
{code}
@BeforeClass
public static void setupClass() throws Exception {
  System.setProperty("solr.hdfs.blockcache.enabled", "false");
  dfsCluster = HdfsTestUtil.setupClass(createTempDir().toFile().getAbsolutePath());
  ZkConfigManager configManager = new ZkConfigManager(zkClient());
  // ... (remainder of setupClass() not quoted)
}
{code}
When SolrCloudTestCase's zkClient() is called, the 'cluster' variable is still
null, causing an NPE. Then things get out of hand: dfsCluster has already been
initialized, but the AfterClass hook fails with its own NPE before it can clean
it up:
{code}
@AfterClass
public static void teardownClass() throws Exception {
  cluster.shutdown(); // need to close before the MiniDFSCluster
  HdfsTestUtil.teardownClass(dfsCluster);
  dfsCluster = null;
}
{code}
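To make the failure sequence above concrete, here is a minimal, self-contained Java sketch. FakeCluster and LeakingHooks are hypothetical stand-ins, not Solr or Hadoop classes; they only mimic the order of calls in the two hooks quoted above (start the DFS cluster, then dereference a null 'cluster', then fail again in teardown before the DFS cluster is closed).

```java
// Hypothetical stand-in for MiniDFSCluster / MiniSolrCloudCluster:
// it only records which instances are still open.
class FakeCluster implements AutoCloseable {
    static final java.util.List<String> OPEN = new java.util.ArrayList<>();
    private final String name;

    FakeCluster(String name) {
        this.name = name;
        OPEN.add(name);
    }

    @Override
    public void close() {
        OPEN.remove(name);
    }
}

// Mirrors the shape of the BeforeClass/AfterClass hooks in the test.
class LeakingHooks {
    static FakeCluster dfsCluster; // started first, like the HDFS mini cluster
    static FakeCluster cluster;    // never assigned, like 'cluster' in the test

    static void setupClass() {
        dfsCluster = new FakeCluster("dfs"); // succeeds: the resource is now live
        cluster.toString();                  // NPE, analogous to zkClient() on a null 'cluster'
    }

    static void teardownClass() {
        cluster.close();    // NPE again, so the two lines below never run
        dfsCluster.close();
        dfsCluster = null;
    }
}
```

Calling setupClass() and then teardownClass() (each throwing an NPE) leaves "dfs" in FakeCluster.OPEN, which corresponds to the HDFS mini cluster, and its threads, never being shut down.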
That's the reason for all those thread leaks from HDFS. Now, I have no idea how
to initialize this cluster properly (I know nothing about the cloud test
infrastructure). I've committed some code to master that cleans up this test
properly: the test now displays the actual cause of the problem. The cleanup
code begs for some kind of higher-level "closer" that could close all these
objects in order, skipping nulls and calling each object's specific close
method. I didn't deal with it.
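One possible shape for such a closer, sketched in Java under the assumption that each object can be wrapped as an AutoCloseable. OrderedCloser and closeAll are hypothetical names used for illustration only; they are not part of the Solr test framework, and this is not the code that was committed.

```java
// Hypothetical helper, not an existing Solr or Lucene class. It closes the
// given resources in the order passed, skips nulls (objects that setup never
// created), and keeps going when one close() throws, so later resources are
// still closed. The first failure is rethrown with later ones suppressed.
final class OrderedCloser {
    private OrderedCloser() {}

    static void closeAll(AutoCloseable... resources) throws Exception {
        Exception first = null;
        for (AutoCloseable resource : resources) {
            if (resource == null) {
                continue; // never initialized, e.g. setup failed before creating it
            }
            try {
                resource.close();
            } catch (Exception e) {
                if (first == null) {
                    first = e;              // remember the first failure...
                } else {
                    first.addSuppressed(e); // ...and attach later ones to it
                }
            }
        }
        if (first != null) {
            throw first;
        }
    }
}
```

Because AutoCloseable is a functional interface, each object's specific close method can be adapted with a lambda, for example `() -> cluster.shutdown()` followed by `() -> HdfsTestUtil.teardownClass(dfsCluster)`. With that ordering, an NPE from a null 'cluster' is recorded and rethrown only after the HDFS teardown has also run. For plain Closeables, Lucene's IOUtils.close(...) follows a similar contract (nulls ignored, first exception rethrown with the rest suppressed).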
[[email protected]] -- would you take a look at how to initialize the
cluster properly in this test? Or maybe [~ab] would know how to fix it (I see
you're the original author of this test, Andrzej, hence the question).
> MoveReplicaHDFSTest leaks threads, falls into an endless loop, logging like crazy
> ---------------------------------------------------------------------------------
>
> Key: SOLR-13074
> URL: https://issues.apache.org/jira/browse/SOLR-13074
> Project: Solr
> Issue Type: Bug
> Security Level: Public(Default Security Level. Issues are Public)
> Reporter: Dawid Weiss
> Assignee: Dawid Weiss
> Priority: Major
>
> This reproduces for me, always (Linux box):
> {code}
> ant test -Dtestcase=MoveReplicaHDFSTest -Dtests.seed=DC1CE772C445A55D \
>   -Dtests.multiplier=2 -Dtests.nightly=true -Dtests.slow=true -Dtests.locale=fr \
>   -Dtests.timezone=Australia/Tasmania -Dtests.asserts=true \
>   -Dtests.file.encoding=UTF-8
> {code}
> It's the bug in Hadoop I discussed in SOLR-13060: one of the threads falls
> into an endless loop when it is terminated (interrupted). Perhaps there is
> something we should close cleanly and don't.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)