[ 
https://issues.apache.org/jira/browse/HDFS-15621?focusedWorklogId=581824&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-581824
 ]

ASF GitHub Bot logged work on HDFS-15621:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 13/Apr/21 14:52
            Start Date: 13/Apr/21 14:52
    Worklog Time Spent: 10m 
      Work Description: jojochuang commented on a change in pull request #2849:
URL: https://github.com/apache/hadoop/pull/2849#discussion_r612493208



##########
File path: 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/FsVolumeSpi.java
##########
@@ -297,23 +281,18 @@ private static String getSuffix(File f, String prefix) {
      * @param metaFile the path to the block meta-data file

Review comment:
       This needs a @param for basePath. Also note that metaFile stores only 
the suffix.

##########
File path: 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/FsVolumeSpi.java
##########
@@ -227,27 +227,27 @@
    */
   public static class ScanInfo implements Comparable<ScanInfo> {
     private final long blockId;
-
     /**
-     * The block file path, relative to the volume's base directory.
-     * If there was no block file found, this may be null. If 'vol'
-     * is null, then this is the full path of the block file.
+     * The full path to the folder containing the block / meta files.
      */
-    private final String blockSuffix;
-
+    private final File basePath;
     /**
-     * The suffix of the meta file path relative to the block file.
-     * If blockSuffix is null, then this will be the entire path relative
-     * to the volume base directory, or an absolute path if vol is also
-     * null.
+     * The block file name, with no path
      */
-    private final String metaSuffix;
+    private final String blockFile;
+    /**
+     * Holds the meta file name, with no path, only if blockFile is null.
+     * If blockFile is not null, the meta file will be named identically to
+     * the blockFile, but with a suffix like "_1234.meta". If the blockFile
+     * is present, we store only the meta file suffix.
+     */

Review comment:
       it would also make sense to copy this comment to the constructor 
parameters.
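
As an illustration of the field layout described in the Javadoc above, here
is a minimal Java sketch. It is not the code from the pull request: the class
name ScanInfoSketch, the constructor signature, and the getter bodies are
assumptions made for this example. It stores the directory once as a File,
keeps the block file name without a path, and keeps only the meta suffix
when a block file is present.

```java
import java.io.File;

// Hypothetical stand-in for ScanInfo; names and logic are illustrative only.
class ScanInfoSketch {
    /** Full path to the folder containing the block / meta files. */
    private final File basePath;
    /** Block file name with no path; may be null if no block file exists. */
    private final String blockFile;
    /**
     * Full meta file name if blockFile is null; otherwise only the part
     * after the block file name, for example "_1234.meta".
     */
    private final String metaFile;

    ScanInfoSketch(File basePath, String blockFile, String metaFile) {
        this.basePath = basePath;
        this.blockFile = blockFile;
        if (blockFile != null && metaFile != null
                && metaFile.startsWith(blockFile)) {
            // Keep only the suffix to avoid storing the shared prefix twice.
            this.metaFile = metaFile.substring(blockFile.length());
        } else {
            this.metaFile = metaFile;
        }
    }

    /** Rebuilds the block file path on demand, or null if there is none. */
    File getBlockFile() {
        return blockFile == null ? null : new File(basePath, blockFile);
    }

    /** Rebuilds the meta file path on demand, or null if there is none. */
    File getMetaFile() {
        if (metaFile == null) {
            return null;
        }
        String name = blockFile == null ? metaFile : blockFile + metaFile;
        return new File(basePath, name);
    }

    public static void main(String[] args) {
        ScanInfoSketch info = new ScanInfoSketch(
                new File("finalized/subdir28/subdir17"),
                "blk_1180438785", "blk_1180438785_106716708.meta");
        System.out.println(info.getBlockFile());
        System.out.println(info.getMetaFile());
    }
}
```

In this sketch the full paths exist only for as long as a caller holds the
File returned by a getter, so the per-object cost is the two short strings
plus a reference to a basePath that many instances can share.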




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

    Worklog Id:     (was: 581824)
    Time Spent: 50m  (was: 40m)

> Datanode DirectoryScanner uses excessive memory
> -----------------------------------------------
>
>                 Key: HDFS-15621
>                 URL: https://issues.apache.org/jira/browse/HDFS-15621
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode
>    Affects Versions: 3.4.0
>            Reporter: Stephen O'Donnell
>            Assignee: Stephen O'Donnell
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: Screenshot 2020-10-09 at 14.11.36.png, Screenshot 
> 2020-10-09 at 15.20.56.png
>
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> We generally work to a rule of 1GB of heap on a datanode per 1M blocks. For 
> nodes with a lot of blocks, this can mean a lot of heap.
> We recently captured a heapdump of a DN with about 22M blocks and found only 
> about 1.5GB was occupied by the ReplicaMap. Another 9GB of the heap was taken 
> by the DirectoryScanner ScanInfo objects. Most of this memory was allocated to 
> strings.
> Checking the strings in question, we can see two strings per ScanInfo, 
> looking like:
> {code}
> /current/BP-671271071-10.163.205.13-1552020401842/current/finalized/subdir28/subdir17/blk_1180438785
> _106716708.meta
> {code}
> I will upload a screenshot from MAT showing this.
> For the first string especially, the part 
> "/current/BP-671271071-10.163.205.13-1552020401842/current/finalized/" will 
> be the same for every block in the block pool as the scanner is only 
> concerned about finalized blocks.
> We can probably also store just the subdir indexes "28" and "17" rather than 
> "subdir28/subdir17" and then construct the path when it is requested via the 
> getter.
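
The idea in the last paragraph of the description can be sketched in Java as
follows. This is only an illustration under stated assumptions, not code from
the patch: the class name CompactBlockPath, its fields, and the getter are
hypothetical, and it assumes the block file is named "blk_" followed by the
block ID, as in the example string above. The shared "finalized" directory is
held once as a File, and only two integers and the block ID are kept per
block.

```java
import java.io.File;

// Hypothetical illustration: keep subdir indexes as ints, build the path lazily.
final class CompactBlockPath {
    /** Shared by every block in the pool, so it is stored only once. */
    private final File finalizedDir;
    private final int dir1;   // e.g. 28 for "subdir28"
    private final int dir2;   // e.g. 17 for "subdir17"
    private final long blockId;

    CompactBlockPath(File finalizedDir, int dir1, int dir2, long blockId) {
        if (dir1 < 0 || dir2 < 0) {
            throw new IllegalArgumentException("subdir index must be >= 0");
        }
        this.finalizedDir = finalizedDir;
        this.dir1 = dir1;
        this.dir2 = dir2;
        this.blockId = blockId;
    }

    /** Constructs the block file path only when it is requested. */
    File getBlockFile() {
        File dir = new File(new File(finalizedDir, "subdir" + dir1),
                "subdir" + dir2);
        return new File(dir, "blk_" + blockId);
    }

    public static void main(String[] args) {
        CompactBlockPath p = new CompactBlockPath(
                new File("finalized"), 28, 17, 1180438785L);
        System.out.println(p.getBlockFile());
    }
}
```

The trade-off in this sketch is a small amount of string building on each
getter call in exchange for not holding a long path string per block.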



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

