[ https://issues.apache.org/jira/browse/HDFS-11047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15613000#comment-15613000 ]

Arpit Agarwal commented on HDFS-11047:
--------------------------------------

The v2 patch looks good.

Nitpick: can you please change _may need to_ to _should_?

{code}
  /**
   * Gets a list of references to the finalized blocks for the given block pool.
   * <p>
   * Callers of this function should call
   * {@link FsDatasetSpi#acquireDatasetLock} to avoid blocks' status being
   * changed during list iteration.
   */
{code}
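
For reference, a caller honoring that contract might look roughly like this
sketch ({{dataset}}, {{bpid}}, and {{process}} are illustrative names, not
from the patch):

{code}
// Sketch only: hold the dataset lock for the entire iteration so that
// replica states cannot change underneath the caller.
try (AutoCloseableLock lock = dataset.acquireDatasetLock()) {
  for (ReplicaInfo replica : dataset.getFinalizedBlocks(bpid)) {
    process(replica); // hypothetical read-only use while the lock is held
  }
}
{code}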

Also, can you please copy the documentation to
{{FsDatasetImpl#getFinalizedBlocks}} so future callers are less likely to miss 
the caveat?
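
Something along these lines on the implementation side (sketch only; body
elided):

{code}
  /**
   * Gets a list of references to the finalized blocks for the given block pool.
   * <p>
   * Callers of this function should call
   * {@link FsDatasetSpi#acquireDatasetLock} to avoid blocks' status being
   * changed during list iteration.
   */
  @Override
  public List<ReplicaInfo> getFinalizedBlocks(String bpid) {
    ... // body as in the current patch
  }
{code}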

> Remove deep copies of FinalizedReplica to alleviate heap consumption on 
> DataNode
> --------------------------------------------------------------------------------
>
>                 Key: HDFS-11047
>                 URL: https://issues.apache.org/jira/browse/HDFS-11047
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: datanode, fs
>            Reporter: Xiaobing Zhou
>            Assignee: Xiaobing Zhou
>         Attachments: HDFS-11047.000.patch, HDFS-11047.001.patch
>
>
> DirectoryScanner performs its scan by deep copying every FinalizedReplica. In 
> a deployment with 500,000+ blocks, we've seen DataNode heap usage climb to 
> high peaks very quickly, and the deep copies will make heap usage even worse 
> if directory scans are scheduled more frequently. This proposes removing the 
> unnecessary deep copies, since DirectoryScanner#scan already holds the 
> dataset lock (see the sketch after the code excerpts below). The sibling 
> work is tracked by AMBARI-18694.
> DirectoryScanner#scan
> {code}
>     try(AutoCloseableLock lock = dataset.acquireDatasetLock()) {
>       for (Entry<String, ScanInfo[]> entry : diskReport.entrySet()) {
>         String bpid = entry.getKey();
>         ScanInfo[] blockpoolReport = entry.getValue();
>         
>         Stats statsRecord = new Stats(bpid);
>         stats.put(bpid, statsRecord);
>         LinkedList<ScanInfo> diffRecord = new LinkedList<ScanInfo>();
>         diffs.put(bpid, diffRecord);
>         
>         statsRecord.totalBlocks = blockpoolReport.length;
>         List<ReplicaInfo> bl = dataset.getFinalizedBlocks(bpid); /* deep copies here */
> {code}
> FsDatasetImpl#getFinalizedBlocks
> {code}
>   public List<ReplicaInfo> getFinalizedBlocks(String bpid) {
>     try (AutoCloseableLock lock = datasetLock.acquire()) {
>       ArrayList<ReplicaInfo> finalized =
>           new ArrayList<ReplicaInfo>(volumeMap.size(bpid));
>       for (ReplicaInfo b : volumeMap.replicas(bpid)) {
>         if (b.getState() == ReplicaState.FINALIZED) {
>           finalized.add(new ReplicaBuilder(ReplicaState.FINALIZED)
>               .from(b).build()); /* deep copies here */
>         }
>       }
>       return finalized;
>     }
>   }
> {code}
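>
> With the copies removed, getFinalizedBlocks would simply collect references, 
> roughly as in the following sketch of the intended change (callers must then 
> hold the dataset lock while iterating, per the javadoc discussed above):
> {code}
>   public List<ReplicaInfo> getFinalizedBlocks(String bpid) {
>     try (AutoCloseableLock lock = datasetLock.acquire()) {
>       ArrayList<ReplicaInfo> finalized =
>           new ArrayList<ReplicaInfo>(volumeMap.size(bpid));
>       for (ReplicaInfo b : volumeMap.replicas(bpid)) {
>         if (b.getState() == ReplicaState.FINALIZED) {
>           finalized.add(b); /* reference only, no deep copy */
>         }
>       }
>       return finalized;
>     }
>   }
> {code}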


