[
https://issues.apache.org/jira/browse/HDFS-16631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
fanshilun updated HDFS-16631:
-----------------------------
Description:
In HDFS-16600 (Fix deadlock on DataNode side), we discussed the deadlock
issue, and it was a very meaningful discussion. While reading the log, I found
the following:
{code:java}
2022-05-27 07:39:47,890 [Listener at localhost/36941] WARN
datanode.DataSetLockManager (DataSetLockManager.java:lockLeakCheck(261)) -
not open lock leak check func.{code}
Looking at the code, I found the following parameter:
{code:java}
<property>
<name>dfs.datanode.lockmanager.trace</name>
<value>false</value>
<description>
If this is true, after shut down datanode lock Manager will print all leak
thread that not release by lock Manager. Only used for test or trace dead lock
problem. In produce default set false, because it's have little
performance loss.
</description>
</property> {code}
I think this parameter should be enabled in the test environment, so that if
a DN deadlock occurs, the cause can be located quickly.
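A minimal sketch of what enabling the parameter for tests could look like, assuming the tests load their own hdfs-site.xml override (the file location is an assumption; only the property name comes from the snippet above):

```xml
<!-- Assumed location: an hdfs-site.xml that is only on the test classpath. -->
<property>
  <name>dfs.datanode.lockmanager.trace</name>
  <value>true</value>
</property>
```

The production default stays false, since the description says tracing costs some performance.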
Based on the suggestions received, the following modifications are made:
1. The read and write lock methods of DataSetLockManager take an operation
name that clearly indicates the source of the lock, which is convenient for
general use.
2. Metric monitoring becomes finer-grained, covering the number of locks, the
lock time, and early warnings about locks.
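The two modifications above can be illustrated with a small, self-contained sketch. This is not the real DataSetLockManager: the class NamedLockTracker, its methods, the warning threshold, and the operation names used below are all hypothetical, and "lock time" is assumed here to mean hold time. It only shows the general idea of tagging each acquisition with an operation name, counting acquisitions, timing holds, and listing locks that were never released.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical sketch only; names and behavior are assumptions, not Hadoop code. */
class NamedLockTracker {
  private final ReentrantReadWriteLock rw = new ReentrantReadWriteLock();
  // Every lock that is currently held, mapped to a description of its source.
  private final Map<Held, String> outstanding = new ConcurrentHashMap<>();
  private final AtomicLong acquireCount = new AtomicLong();
  private final AtomicLong totalHeldMs = new AtomicLong();
  private final AtomicLong slowCount = new AtomicLong();
  private final long warnThresholdMs;

  NamedLockTracker(long warnThresholdMs) {
    this.warnThresholdMs = warnThresholdMs;
  }

  /** Handle for one acquisition; closing it releases the lock. */
  final class Held implements AutoCloseable {
    private final Lock lock;
    private final String opName;
    private final long startNanos = System.nanoTime();
    private boolean released;

    Held(Lock lock, String opName) {
      this.lock = lock;
      this.opName = opName;
    }

    @Override
    public void close() {
      if (released) {
        return; // closing twice must not unlock twice
      }
      released = true;
      lock.unlock();
      outstanding.remove(this);
      long heldMs = (System.nanoTime() - startNanos) / 1_000_000;
      totalHeldMs.addAndGet(heldMs);
      if (heldMs > warnThresholdMs) {
        // Early warning: the operation held the lock longer than the threshold.
        slowCount.incrementAndGet();
        System.err.println("Lock held " + heldMs + " ms by " + opName);
      }
    }
  }

  Held readLock(String opName) {
    return acquire(rw.readLock(), opName);
  }

  Held writeLock(String opName) {
    return acquire(rw.writeLock(), opName);
  }

  private Held acquire(Lock lock, String opName) {
    lock.lock();
    Held held = new Held(lock, opName);
    outstanding.put(held, opName + " on thread " + Thread.currentThread().getName());
    acquireCount.incrementAndGet();
    return held;
  }

  long acquireCount() {
    return acquireCount.get();
  }

  long totalHeldMs() {
    return totalHeldMs.get();
  }

  long slowCount() {
    return slowCount.get();
  }

  /** Returns a description of every lock that is still held. */
  List<String> leakCheck() {
    return new ArrayList<>(outstanding.values());
  }
}
```

A caller would wrap each critical section in try-with-resources, for example `try (NamedLockTracker.Held h = tracker.readLock("someOperation")) { ... }`, and call `leakCheck()` at shutdown. Any entry that remains names the operation and thread that did not release its lock.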
was:
In HDFS-16600 (Fix deadlock on DataNode side), we discussed the deadlock
issue, and it was a very meaningful discussion. While reading the log, I found
the following:
{code:java}
2022-05-27 07:39:47,890 [Listener at localhost/36941] WARN
datanode.DataSetLockManager (DataSetLockManager.java:lockLeakCheck(261)) -
not open lock leak check func.{code}
Looking at the code, I found the following parameter:
{code:java}
<property>
<name>dfs.datanode.lockmanager.trace</name>
<value>false</value>
<description>
If this is true, after shut down datanode lock Manager will print all leak
thread that not release by lock Manager. Only used for test or trace dead lock
problem. In produce default set false, because it's have little
performance loss.
</description>
</property> {code}
I think this parameter should be enabled in the test environment, so that if
a DN deadlock occurs, the cause can be located quickly.
> Enable dfs.datanode.lockmanager.trace In Test
> ---------------------------------------------
>
> Key: HDFS-16631
> URL: https://issues.apache.org/jira/browse/HDFS-16631
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: datanode
> Reporter: fanshilun
> Assignee: fanshilun
> Priority: Minor
> Labels: pull-request-available
> Attachments: image-2022-06-18-09-49-28-725.png
>
> Time Spent: 2h 10m
> Remaining Estimate: 0h
>
> In HDFS-16600 (Fix deadlock on DataNode side), we discussed the deadlock
> issue, and it was a very meaningful discussion. While reading the log, I
> found the following:
> {code:java}
> 2022-05-27 07:39:47,890 [Listener at localhost/36941] WARN
> datanode.DataSetLockManager (DataSetLockManager.java:lockLeakCheck(261)) -
> not open lock leak check func.{code}
> Looking at the code, I found the following parameter:
> {code:java}
> <property>
> <name>dfs.datanode.lockmanager.trace</name>
> <value>false</value>
> <description>
> If this is true, after shut down datanode lock Manager will print all leak
> thread that not release by lock Manager. Only used for test or trace dead lock
> problem. In produce default set false, because it's have little
> performance loss.
> </description>
> </property> {code}
> I think this parameter should be enabled in the test environment, so that if
> a DN deadlock occurs, the cause can be located quickly.
> Based on the suggestions received, the following modifications are made:
> 1. The read and write lock methods of DataSetLockManager take an operation
> name that clearly indicates the source of the lock, which is convenient for
> general use.
> 2. Metric monitoring becomes finer-grained, covering the number of locks, the
> lock time, and early warnings about locks.
>