Tai Zhou created HDFS-16668:
-------------------------------
Summary: Clean up MoverExecutor after each iteration to avoid
potential thread leak
Key: HDFS-16668
URL: https://issues.apache.org/jira/browse/HDFS-16668
Project: Hadoop HDFS
Issue Type: Improvement
Affects Versions: 3.3.3
Reporter: Tai Zhou
Attachments: screenshot-1.png
Hi,
I have recently been working on an HDFS smart storage management project. It is
based on the Mover in the hadoop-hdfs project. I noticed that most of the code
in Mover is similar to Balancer. However, Mover doesn't clean up MoverExecutor
after each iteration the way Balancer does.
If we have multiple NameSystems for the NameNode connectors, or a large number
of DataNodes, Mover will leak threads, because it may take numerous iterations
to process these namespaces. In our project, we modified some of the source
code so that we can call mover.run() as soon as we find blocks that do not
match the expected storage policies. As a result, our application initializes
NameNode connectors and Movers continually, and we end up with thousands of
threads or thread pools for MoverExecutor.
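To illustrate the pattern, here is a minimal sketch in plain Java. It is not
the actual Mover or Balancer code: the class MoverCleanupSketch and its methods
are hypothetical names, and a plain java.util.concurrent thread pool stands in
for MoverExecutor. It only shows that a pool created per iteration stays alive
unless it is shut down explicitly.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; not the real Hadoop classes.
class MoverCleanupSketch {

    // Stand-in for the per-iteration executor. Daemon threads are used
    // only so that this demo can exit even when pools are leaked on purpose.
    static ExecutorService newMoverExecutor(int threads) {
        return Executors.newFixedThreadPool(threads, r -> {
            Thread t = new Thread(r, "MoverExecutor-sketch");
            t.setDaemon(true);
            return t;
        });
    }

    // Runs the given number of iterations, each with its own pool, and
    // returns how many pools are still alive (not terminated) afterwards.
    static int runIterations(int iterations, boolean cleanUp) {
        int leakedPools = 0;
        for (int i = 0; i < iterations; i++) {
            ExecutorService pool = newMoverExecutor(4);
            try {
                // Placeholder for the block moves done in one iteration.
                pool.submit(() -> { }).get();
            } catch (Exception e) {
                throw new RuntimeException(e);
            } finally {
                if (cleanUp) {
                    shutdownAndWait(pool);
                }
            }
            if (!pool.isTerminated()) {
                leakedPools++;
            }
        }
        return leakedPools;
    }

    // Orderly shutdown: stop accepting tasks, then wait for workers to exit.
    static void shutdownAndWait(ExecutorService pool) {
        pool.shutdown();
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                pool.shutdownNow();
            }
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("alive pools with cleanup:    " + runIterations(5, true));
        System.out.println("alive pools without cleanup: " + runIterations(5, false));
    }
}
```

Without the shutdown in the finally block, every iteration leaves its worker
threads parked waiting for tasks, which is the same kind of accumulation we see
for MoverExecutor.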
Here is what it looks like. We can see 9000+ threads like this in the WAITING
state.
!image-2022-05-11-15-40-39-194.png!
I know most users may not use Mover the way we do; they probably run it from
the CLI. But more and more users are planning to adopt RBF or multiple
NameSystems, or to run large clusters of DataNodes. In those cases the Mover
CLI may hold on to thousands of threads once the command is started.
I have put together a quick fix. If you are interested, please take a look at
it.
Thanks.