[jira] [Commented] (HADOOP-17531) DistCp: Reduce memory usage on copying huge directories

Ayush Saxena (Jira) Fri, 05 Mar 2021 02:53:34 -0800


    [ 
https://issues.apache.org/jira/browse/HADOOP-17531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17295971#comment-17295971
 ]


Ayush Saxena commented on HADOOP-17531:
---------------------------------------

I have raised HADOOP-17558 for object stores. I will try to use a fixed-size 
ThreadPoolExecutor (TPE) there instead of this producer-consumer setup.
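
As a rough illustration of the fixed-size TPE idea, here is a minimal sketch using only java.util.concurrent. The class name BoundedPoolDemo, the bounded ArrayBlockingQueue and the CallerRunsPolicy are assumptions made for this example; none of them is taken from DistCp or from HADOOP-17558.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class; not part of DistCp.
final class BoundedPoolDemo {

    // Runs `tasks` trivial jobs on a pool of exactly `threads` workers and
    // returns how many completed. The work queue holds at most `queueCapacity`
    // pending jobs (an assumption of this sketch, to keep memory bounded).
    static int runAll(int threads, int queueCapacity, int tasks) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                threads, threads,                      // core == max: fixed size
                0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueCapacity),
                // When the queue is full, the submitting thread runs the job
                // itself, which throttles submission instead of queueing more.
                new ThreadPoolExecutor.CallerRunsPolicy());

        AtomicInteger done = new AtomicInteger();
        for (int i = 0; i < tasks; i++) {
            pool.execute(done::incrementAndGet);
        }

        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("pool did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return done.get();
    }
}
```

With a bounded queue, the number of pending entries never exceeds queueCapacity, whatever the size of the input.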

Regarding {{listFiles}}, I think it won't include directories. In DistCp we add 
the directories to the sequence file too, so using {{listFiles}} would miss at 
least the empty directories, and preserving attributes (the -p option) on 
directories might not work either.
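
To make the concern concrete, here is a small in-memory sketch. It does not call the Hadoop API: ListingDemo, Entry, filesOnly and allEntries are hypothetical names, and filesOnly only models a recursive files-only listing, which is how I understand {{listFiles}} to behave.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory model of a directory tree; not a Hadoop class.
final class ListingDemo {

    static final class Entry {
        final String path;
        final boolean isDir;
        final List<Entry> children = new ArrayList<>();

        Entry(String path, boolean isDir) {
            this.path = path;
            this.isDir = isDir;
        }

        Entry add(Entry child) {
            children.add(child);
            return this;
        }
    }

    // Recursive listing that returns files only; directories are descended
    // into but never reported.
    static List<String> filesOnly(Entry dir) {
        List<String> out = new ArrayList<>();
        for (Entry e : dir.children) {
            if (e.isDir) {
                out.addAll(filesOnly(e));
            } else {
                out.add(e.path);
            }
        }
        return out;
    }

    // Recursive listing that reports every entry, directories included.
    static List<String> allEntries(Entry dir) {
        List<String> out = new ArrayList<>();
        for (Entry e : dir.children) {
            out.add(e.path);
            if (e.isDir) {
                out.addAll(allEntries(e));
            }
        }
        return out;
    }

    // /src contains a file, an empty directory, and a directory with one file.
    static Entry sample() {
        return new Entry("/src", true)
                .add(new Entry("/src/a.txt", false))
                .add(new Entry("/src/empty", true))
                .add(new Entry("/src/data", true)
                        .add(new Entry("/src/data/b.txt", false)));
    }
}
```

In this model, /src/empty never appears in the files-only result, and /src/data appears only implicitly as a parent of b.txt, so there is no entry of its own on which directory attributes could be preserved.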

I will give that a try as well in HADOOP-17558, and will also try to include 
the IO performance information in the log.

> DistCp: Reduce memory usage on copying huge directories
> -------------------------------------------------------
>
>                 Key: HADOOP-17531
>                 URL: https://issues.apache.org/jira/browse/HADOOP-17531
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Ayush Saxena
>            Priority: Critical
>              Labels: pull-request-available
>         Attachments: MoveToStackIterator.patch, gc-NewD-512M-3.8ML.log
>
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> Presently DistCp uses a producer-consumer setup while building the 
> listing. The input queue and output queue are both unbounded, so the 
> accumulated listStatus results can grow quite large.
> Relevant code:
> https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/SimpleCopyListing.java#L635
> The traversal is breadth-first (it uses a queue instead of the earlier 
> stack), so if the files sit deep in the tree, it opens up the entire tree 
> before it starts processing them.
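
The memory effect described in the quoted text can be seen in a small self-contained sketch. It is only an in-memory model, not the SimpleCopyListing code: TraversalDemo, Node, build and peakPending are hypothetical names, and the tree shape (fanout 10, depth 3) is chosen purely for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical in-memory stand-in for a directory tree; not a Hadoop class.
final class TraversalDemo {

    static final class Node {
        final String name;
        final List<Node> children = new ArrayList<>();

        Node(String name) {
            this.name = name;
        }
    }

    // Builds a full tree: every node has `fanout` children for `depth` levels.
    static Node build(String name, int fanout, int depth) {
        Node n = new Node(name);
        if (depth > 0) {
            for (int i = 0; i < fanout; i++) {
                n.children.add(build(name + "/" + i, fanout, depth - 1));
            }
        }
        return n;
    }

    // Walks the tree and returns the largest number of entries that were
    // pending at the same time. breadthFirst=true takes from the head
    // (queue behaviour); false takes from the tail (stack behaviour).
    static int peakPending(Node root, boolean breadthFirst) {
        Deque<Node> pending = new ArrayDeque<>();
        pending.add(root);
        int peak = 1;
        while (!pending.isEmpty()) {
            Node n = breadthFirst ? pending.pollFirst() : pending.pollLast();
            for (Node c : n.children) {
                pending.addLast(c);
            }
            peak = Math.max(peak, pending.size());
        }
        return peak;
    }

    public static void main(String[] args) {
        Node root = build("root", 10, 3);
        System.out.println("queue peak: " + peakPending(root, true));
        System.out.println("stack peak: " + peakPending(root, false));
    }
}
```

For this tree the queue-based walk holds all 1000 leaves at once, while the stack-based walk never holds more than 28 entries, since its peak grows with depth times fanout rather than with the width of the widest level.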



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

