[ 
https://issues.apache.org/jira/browse/HDFS-11881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16056600#comment-16056600
 ] 

Manoj Govindassamy commented on HDFS-11881:
-------------------------------------------

Thanks for the review [~jojochuang]. 

bq. It looks to me that existing tests in TestSnapshotDiffReport cover most 
code paths. Maybe it's easier to update one of the tests there to create 100 more 
files.
TestSnapshotDiffReport exercises the diff report through the HDFS API. But in the 
context of the GC problem we are trying to solve here, we want the diff command 
to be run over the shell. TestSnapshotCommands already tests snapshot commands 
over the shell, so it looks like a better place for the new diff command test.
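
For reference, here is a minimal sketch of what such a test could look like in 
TestSnapshotCommands. It is illustrative only: it assumes that class's existing 
MiniDFSCluster setup (the conf and fs fields), assumes the SnapshotDiff tool can 
be driven through ToolRunner, and uses the usual JUnit static imports; the 
actual test in the patch may differ.
{noformat}
// Illustrative sketch: take two snapshots with many file creations in between,
// then exercise the same code path as "hdfs snapshotDiff /snapdir s1 s2".
@Test
public void testSnapshotDiffWithManyEntries() throws Exception {
  final Path dir = new Path("/snapdir");
  fs.mkdirs(dir);
  fs.allowSnapshot(dir);
  fs.createSnapshot(dir, "s1");
  for (int i = 0; i < 100; i++) {
    DFSTestUtil.createFile(fs, new Path(dir, "file-" + i), 1024L, (short) 1, 0L);
  }
  fs.createSnapshot(dir, "s2");
  // Run the snapshotDiff command-line tool programmatically.
  int ret = ToolRunner.run(conf,
      new org.apache.hadoop.hdfs.tools.snapshot.SnapshotDiff(),
      new String[] { dir.toString(), "s1", "s2" });
  assertEquals(0, ret);
}
{noformat}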

bq. have you done any kind of heap size measurement against a real cluster of 
substantial size before/after this patch?
I was planning to do memory profiling for the broader fix. This particular jira 
is more of a short-term, quick-fix approach that reuses an existing, proven data 
structure. Judging by the code and comments, ChunkedArrayList is far better than 
ArrayList in terms of memory behavior because it avoids large contiguous 
allocations. This quick fix might not be sufficient to solve all the GC problems 
around the snapshot diff report, but it definitely helps mitigate them.
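
To make the contrast concrete, here is a small self-contained sketch (not the 
patch itself) of the kind of substitution involved. It assumes 
org.apache.hadoop.util.ChunkedArrayList, which stores its elements in many 
bounded-size chunks instead of one contiguous array:
{noformat}
import java.util.List;
import org.apache.hadoop.util.ChunkedArrayList;

public class ChunkedListSketch {
  public static void main(String[] args) {
    // With ArrayList, appending millions of entries repeatedly reallocates and
    // copies one ever-larger contiguous array. ChunkedArrayList is a drop-in
    // List replacement that appends into small chunks instead.
    List<byte[]> paths = new ChunkedArrayList<>();
    for (int i = 0; i < 4_000_000; i++) {
      // ~128-byte path per entry, matching the estimate in the issue
      // description; ~512MB of path bytes, so a larger heap may be needed.
      paths.add(new byte[128]);
    }
    System.out.println("entries stored: " + paths.size());
  }
}
{noformat}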

bq. Diff#insert uses ArrayList to store created and deleted inodes. Considering 
that a directory might have millions of created/deleted inodes in a snapshot, 
there is a potential upside to convert these lists to ChunkedArrayList.
That's right, the DirectoryDiff created/deleted lists are still ArrayLists and 
suffer from the same contiguous memory allocation issue. Will convert these to 
ChunkedArrayList as well.
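
The intended change is essentially a field-type swap. A simplified, hypothetical 
illustration (this is not the actual org.apache.hadoop.hdfs.util.Diff source, 
just the shape of the change):
{noformat}
import java.util.List;
import org.apache.hadoop.util.ChunkedArrayList;

// Simplified stand-in for a Diff-like structure that tracks created and
// deleted inodes between two snapshot states. Before, both lists were plain
// ArrayLists: one contiguous backing array each, expensive to grow to
// millions of inodes. After the swap, storage is chunked and no single large
// contiguous allocation is needed.
class DiffListsSketch<E> {
  private final List<E> created = new ChunkedArrayList<>();
  private final List<E> deleted = new ChunkedArrayList<>();

  void recordCreated(E inode) { created.add(inode); }
  void recordDeleted(E inode) { deleted.add(inode); }
}
{noformat}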

> NameNode consumes a lot of memory for snapshot diff report generation
> ---------------------------------------------------------------------
>
>                 Key: HDFS-11881
>                 URL: https://issues.apache.org/jira/browse/HDFS-11881
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs, snapshots
>    Affects Versions: 3.0.0-alpha1
>            Reporter: Manoj Govindassamy
>            Assignee: Manoj Govindassamy
>         Attachments: HDFS-11881.01.patch
>
>
> *Problem:*
> HDFS supports a snapshot diff tool which can generate a [detailed report | 
> https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsSnapshots.html#Get_Snapshots_Difference_Report]
>  of modified, created, deleted and renamed files between any 2 snapshots.
> {noformat}
> hdfs snapshotDiff <path> <fromSnapshot> <toSnapshot>
> {noformat}
> However, if the diff list between 2 snapshots happens to be huge, on the 
> order of millions of entries, then the NameNode can consume a lot of memory 
> while generating the diff report. In a few cases, we have seen the NameNode 
> getting into long GC pauses lasting several minutes to make room for this 
> burst in memory demand during snapshot diff report generation.
> *Root Cause:*
> * The NameNode tries to generate the diff report with all diff entries at 
> once, which puts undue pressure on the NameNode heap.
> * Each diff report entry holds, at a minimum, the diff type (enum), a source 
> path byte array, and a destination path byte array. Take the file deletion 
> use case: a deletion entry carries only one of the two paths. Assuming each 
> deleted file's path takes 128 bytes on average, 4 million file deletions 
> captured in the diff report will thus need roughly 512MB of memory.
> * The snapshot diff report uses a plain java ArrayList, which repeatedly 
> reallocates and copies a single contiguous backing array as it grows. So a 
> 512MB memory requirement might internally translate into an even larger 
> contiguous memory allocation, plus a copy of the old array during each 
> resize.
> *Proposal:*
> * Make the NameNode snapshot diff report service follow the batch model (like 
> the directory listing service; see the sketch below this list). Clients (the 
> hdfs snapshotDiff command) will then receive the diff report in small batches 
> and will need to iterate several times to get the full list.
> * Additionally, the snapshot diff report service in the NameNode can use the 
> ChunkedArrayList data structure instead of the current ArrayList to avoid the 
> large contiguous allocation and fragmentation problems.
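> For reference, the batched client-side pattern already used for directory 
> listing looks roughly like the following; the snapshot diff service would 
> follow the same iteration model. This is illustrative only: a batched 
> snapshot diff API does not exist yet, and process() is a hypothetical 
> per-entry handler.
> {noformat}
> // Existing batched directory listing: each remote call returns one batch,
> // and the iterator keeps fetching until the listing is exhausted.
> RemoteIterator<FileStatus> it = fs.listStatusIterator(new Path("/dir"));
> while (it.hasNext()) {
>   FileStatus status = it.next();   // may trigger the next batched RPC
>   process(status);                 // hypothetical per-entry handler
> }
> {noformat}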


