[ 
https://issues.apache.org/jira/browse/HDFS-14536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Chow updated HDFS-14536:
-----------------------------
    Description: 
I found all my requests blocked every 6 hours, and used jstack to dump the 
threads:
{code:java}
"1490900528@qtp-341796579-35" #132086969 daemon prio=5 os_prio=0 tid=0x00007f7ed01c3800 nid=0x1a1c waiting for monitor entry [0x00007f7e9e42b000]
  java.lang.Thread.State: BLOCKED (on object monitor)
    at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.getDfsUsed(FsVolumeImpl.java:295)
    - waiting to lock <0x00000002c00410f8> (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
    at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getVolumeInfo(FsDatasetImpl.java:2323)

/// lots of BLOCKED for about one minute

"java.util.concurrent.ThreadPoolExecutor$Worker@49d8fff9[State = -1, empty queue]" #77 daemon prio=5 os_prio=0 tid=0x00007f7ed8590800 nid=0x49cc runnable [0x00007f7ea0344000]
  java.lang.Thread.State: RUNNABLE
    at org.apache.hadoop.hdfs.server.datanode.DatanodeUtil.idToBlockDir(DatanodeUtil.java:120)
    at org.apache.hadoop.hdfs.server.datanode.ReplicaInfo.getDir(ReplicaInfo.java:143)
    at org.apache.hadoop.hdfs.server.datanode.ReplicaInfo.<init>(ReplicaInfo.java:95)
    at org.apache.hadoop.hdfs.server.datanode.FinalizedReplica.<init>(FinalizedReplica.java:60)
    at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getFinalizedBlocks(FsDatasetImpl.java:1510)
    - locked <0x00000002c00410f8> (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
    at org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.scan(DirectoryScanner.java:437)
    - locked <0x00000002c00410f8> (a org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
    at org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.reconcile(DirectoryScanner.java:404)
    at org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.run(DirectoryScanner.java:360)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
{code}

This problem is likely related to the large number of blocks on each datanode. 
After reviewing the DirectoryScanner behavior, I think the lock 
{{synchronized(dataset)}} in {{scan()}} can be removed safely.

Since {{dataset.getFinalizedBlocks()}} is a synchronized method, the memReport 
is generated from the volumeMap while holding the lock. After that, the outer 
lock {{synchronized(dataset)}} is needless, yet I've observed the subsequent 
while-loop run for 20 seconds, blocking other threads.

Yes, other blocks could still be added to or deleted from memory or disk in the 
meantime, but that has no effect on computing the differences between them, 
because the later fixing process handles both sides.

 
{code:java}
  void scan() {
    clear();
    Map<String, ScanInfo[]> diskReport = getDiskReport();

    // Hold FSDataset lock to prevent further changes to the block map
    synchronized(dataset) {  // remove synchronized?
      for (Entry<String, ScanInfo[]> entry : diskReport.entrySet()) {
        String bpid = entry.getKey();
        ScanInfo[] blockpoolReport = entry.getValue();

        Stats statsRecord = new Stats(bpid);
        stats.put(bpid, statsRecord);
        LinkedList<ScanInfo> diffRecord = new LinkedList<ScanInfo>();
        diffs.put(bpid, diffRecord);

        statsRecord.totalBlocks = blockpoolReport.length;
        // getFinalizedBlocks is synchronized
        List<FinalizedReplica> bl = dataset.getFinalizedBlocks(bpid);
        FinalizedReplica[] memReport = bl.toArray(new FinalizedReplica[bl.size()]);
        Arrays.sort(memReport); // Sort based on blockId

        int d = 0; // index for blockpoolReport
        int m = 0; // index for memReport
        while (m < memReport.length && d < blockpoolReport.length) {
          // may take 20s to finish this loop
        }
      } // end for
    } // end synchronized
  }{code}
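For context, the elided while-loop above is essentially a linear merge of two arrays sorted by block id, comparing the disk report against the memory snapshot. A minimal standalone sketch of that merge, using plain block ids instead of the real ScanInfo/FinalizedReplica types (so the class and method names here are illustrative, not the HDFS API):
{code:java}
import java.util.ArrayList;
import java.util.List;

public class SortedDiff {
    /** Returns the ids present in exactly one of the two sorted arrays. */
    static List<Long> diff(long[] disk, long[] mem) {
        List<Long> out = new ArrayList<>();
        int d = 0, m = 0;
        while (d < disk.length && m < mem.length) {
            if (disk[d] == mem[m]) { d++; m++; }            // consistent on both sides
            else if (disk[d] < mem[m]) out.add(disk[d++]);  // on disk only
            else out.add(mem[m++]);                          // in memory only
        }
        while (d < disk.length) out.add(disk[d++]);          // trailing disk-only blocks
        while (m < mem.length) out.add(mem[m++]);            // trailing memory-only blocks
        return out;
    }
}
{code}
Since both inputs are immutable snapshots once the loop starts, nothing a concurrent writer does can change the loop's result; at worst a block that changed after the snapshot shows up as a spurious diff entry.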
 

Let's say block A is added and block B is deleted after 
{{dataset.getFinalizedBlocks(bpid)}} is called. That means the on-disk files 
and the in-memory map have changed, but we already hold a snapshot of the 
finalized blocks taken beforehand, and we only care about the consistency of 
the blocks in that snapshot.

If the difference report happens to contain the deleted block B, the later 
{{checkAndUpdate}} will handle it correctly.
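To illustrate why mutations after the snapshot cannot corrupt the comparison: the snapshot is an independent copy, so a concurrent add or delete leaves it untouched. A toy sketch ({{SnapshotDemo}}, the plain-map {{volumeMap}}, and {{snapshot()}} are hypothetical stand-ins, not the real HDFS types):
{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class SnapshotDemo {
    // Stand-in for the datanode's volumeMap (block id -> replica info).
    static final Map<Long, String> volumeMap = new ConcurrentHashMap<>();

    /** Analogous to getFinalizedBlocks(): copies the block ids out into a sorted array. */
    static long[] snapshot() {
        return volumeMap.keySet().stream().mapToLong(Long::longValue).sorted().toArray();
    }
}
{code}
Deleting block B or adding block A in {{volumeMap}} after {{snapshot()}} returns does not alter the array already handed to the scanner, so the diff is computed against a stable view.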


I've compiled this patch and will test it in my environment.



> It's safe to remove synchronized from reconcile scan
> ----------------------------------------------------
>
>                 Key: HDFS-14536
>                 URL: https://issues.apache.org/jira/browse/HDFS-14536
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: datanode
>    Affects Versions: 2.6.0, 2.7.0
>            Reporter: Sean Chow
>            Priority: Major
>         Attachments: HDFS-14536.00.patch
>



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
