[ https://issues.apache.org/jira/browse/HDFS-16631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

fanshilun updated HDFS-16631:
-----------------------------
    Description: 
In HDFS-16600 (Fix deadlock on DataNode side), we discussed the DataNode deadlock issue, and the discussion was very useful. While reading the log, I found the following:
{code:java}
2022-05-27 07:39:47,890 [Listener at localhost/36941] WARN datanode.DataSetLockManager (DataSetLockManager.java:lockLeakCheck(261)) - not open lock leak check func.
{code}
Looking at the code, I found the following parameter:
{code:java}
<property>
  <name>dfs.datanode.lockmanager.trace</name>
  <value>false</value>
  <description>
    If this is true, after shut down datanode lock Manager will print all leak
    thread that not release by lock Manager. Only used for test or trace dead lock
    problem. In produce default set false, because it's have little performance loss.
  </description>
</property>
{code}
I think this parameter should be enabled in the test environment, so that if a DataNode deadlock occurs, its cause can be located quickly.
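To make the trade-off described in the property concrete, here is a minimal, hypothetical Java sketch of what a lock-trace switch does. It is not the DataSetLockManager implementation: the class TracingLockManager and its methods are illustrative assumptions. With tracing on, every acquisition is counted per thread and lockLeakCheck() reports threads that still hold a lock at shutdown; with tracing off, nothing is tracked and the check only prints the same message seen in the log above.

{code:java}
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/**
 * Hypothetical sketch, NOT the real DataSetLockManager: illustrates the idea
 * behind dfs.datanode.lockmanager.trace by recording which threads hold locks.
 */
class TracingLockManager {
  private final boolean trace;
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  // Thread name -> holds acquired but not yet released.
  // Only maintained when trace is true; this bookkeeping is the extra cost.
  private final Map<String, Integer> outstanding = new ConcurrentHashMap<>();

  TracingLockManager(boolean trace) {
    this.trace = trace;
  }

  void readLock() {
    lock.readLock().lock();
    recordAcquire();
  }

  void readUnlock() {
    recordRelease();
    lock.readLock().unlock();
  }

  void writeLock() {
    lock.writeLock().lock();
    recordAcquire();
  }

  void writeUnlock() {
    recordRelease();
    lock.writeLock().unlock();
  }

  private void recordAcquire() {
    if (trace) {
      outstanding.merge(Thread.currentThread().getName(), 1, Integer::sum);
    }
  }

  private void recordRelease() {
    if (trace) {
      // Decrement the count and drop the entry once it reaches zero.
      outstanding.computeIfPresent(Thread.currentThread().getName(),
          (name, n) -> n == 1 ? null : n - 1);
    }
  }

  /** Called at shutdown; returns the names of threads that never released a lock. */
  List<String> lockLeakCheck() {
    if (!trace) {
      // Same text as the WARN line quoted above from the log.
      System.out.println("not open lock leak check func.");
      return Collections.emptyList();
    }
    List<String> leaks = new ArrayList<>(outstanding.keySet());
    for (String name : leaks) {
      System.out.println("lock leak: thread " + name + " still holds "
          + outstanding.get(name) + " lock(s)");
    }
    return leaks;
  }
}
{code}

The per-acquisition map update is one plausible source of the "little performance loss" the property description mentions, which is why turning it on only in tests is a reasonable default.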

Based on the review suggestions, the following changes are made:

1. Add an operation name to the read and write lock methods of DataSetLockManager, so that the source of each lock is clearly identified; this also makes the methods convenient for general use.
2. Increase the granularity of metrics monitoring, covering the number of locks, lock timing, and early warnings about locks.
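A minimal, hypothetical sketch of the two changes above, assuming a try-with-resources lock handle: each lock method takes an operation name, and per-operation statistics record acquisition counts, total hold time, and holds that exceed a warning threshold. The names InstrumentedLockManager, HeldLock, and OpLockStats, the operation names, and the threshold semantics are illustrative assumptions, not the actual HDFS-16631 patch or DataNode metric names.

{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical lock handle released by try-with-resources; close() throws nothing. */
interface HeldLock extends AutoCloseable {
  @Override
  void close();
}

/** Hypothetical per-operation counters (not real DataNode metric names). */
class OpLockStats {
  final LongAdder acquisitions = new LongAdder();
  final LongAdder totalHeldNanos = new LongAdder();
  final LongAdder slowHolds = new LongAdder();
}

/** Hypothetical sketch: lock calls carry an operation name for attribution. */
class InstrumentedLockManager {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private final Map<String, OpLockStats> stats = new ConcurrentHashMap<>();
  private final long warnThresholdNanos;

  InstrumentedLockManager(long warnThresholdNanos) {
    this.warnThresholdNanos = warnThresholdNanos;
  }

  HeldLock readLock(String opName) {
    return acquire(opName, lock.readLock());
  }

  HeldLock writeLock(String opName) {
    return acquire(opName, lock.writeLock());
  }

  private HeldLock acquire(String opName, Lock l) {
    l.lock();
    long start = System.nanoTime();
    OpLockStats s = stats.computeIfAbsent(opName, k -> new OpLockStats());
    s.acquisitions.increment();
    // The handle measures the hold time, releases the lock, then updates stats.
    // It must be closed exactly once.
    return () -> {
      long heldNanos = System.nanoTime() - start;
      l.unlock();
      s.totalHeldNanos.add(heldNanos);
      if (heldNanos > warnThresholdNanos) {
        s.slowHolds.increment();
        System.out.println("WARN lock for op " + opName + " held "
            + TimeUnit.NANOSECONDS.toMillis(heldNanos) + " ms");
      }
    };
  }

  long acquisitions(String opName) {
    OpLockStats s = stats.get(opName);
    return s == null ? 0 : s.acquisitions.sum();
  }

  long slowHolds(String opName) {
    OpLockStats s = stats.get(opName);
    return s == null ? 0 : s.slowHolds.sum();
  }
}
{code}

Returning a closeable handle keeps the operation name and start time together with the lock, so the release path cannot attribute the hold time to the wrong operation.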

 

  was:
In HDFS-16600 (Fix deadlock on DataNode side), we discussed the DataNode deadlock issue, and the discussion was very useful. While reading the log, I found the following:
{code:java}
2022-05-27 07:39:47,890 [Listener at localhost/36941] WARN datanode.DataSetLockManager (DataSetLockManager.java:lockLeakCheck(261)) - not open lock leak check func.
{code}
Looking at the code, I found the following parameter:
{code:java}
<property>
  <name>dfs.datanode.lockmanager.trace</name>
  <value>false</value>
  <description>
    If this is true, after shut down datanode lock Manager will print all leak
    thread that not release by lock Manager. Only used for test or trace dead lock
    problem. In produce default set false, because it's have little performance loss.
  </description>
</property>
{code}
I think this parameter should be enabled in the test environment, so that if a DataNode deadlock occurs, its cause can be located quickly.

 


> Enable dfs.datanode.lockmanager.trace In Test
> ---------------------------------------------
>
>                 Key: HDFS-16631
>                 URL: https://issues.apache.org/jira/browse/HDFS-16631
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: datanode
>            Reporter: fanshilun
>            Assignee: fanshilun
>            Priority: Minor
>              Labels: pull-request-available
>         Attachments: image-2022-06-18-09-49-28-725.png
>
>          Time Spent: 2h 10m
>  Remaining Estimate: 0h



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
