[
https://issues.apache.org/jira/browse/HADOOP-17531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17295971#comment-17295971
]
Ayush Saxena commented on HADOOP-17531:
---------------------------------------
Have raised HADOOP-17558 for object stores. Will try to use a fixed-size TPE
there instead of this producer-consumer setup.
Regarding {{listFiles}}, I think this won't include directories? In DistCp
we add the directories to the sequence file too. Using listFiles would miss
at least the empty directories, and preserving attributes (the -p option) on
directories may not work either.
I will give that a try as well in HADOOP-17558 and try to include the IO
performance stuff in the LOG as well.
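
The concern about a files-only listing can be illustrated with a small local
sketch. This is only an analogy using java.nio, not Hadoop's FileSystem API and
not DistCp code; the class name FilesOnlyListing and the helpers list and
sampleTree are hypothetical. It builds a tree with one empty directory and
compares a recursive listing that keeps only regular files against one that
keeps directories too.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical demo class; an analogy on the local file system only.
class FilesOnlyListing {

    // Recursively lists entries under root as sorted relative paths.
    // With filesOnly=true, directories are dropped from the result.
    static List<String> list(Path root, boolean filesOnly) {
        try (Stream<Path> walk = Files.walk(root)) {
            return walk
                .filter(p -> !p.equals(root))
                .filter(p -> !filesOnly || Files.isRegularFile(p))
                .map(p -> root.relativize(p).toString().replace('\\', '/'))
                .sorted()
                .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Creates a temp tree: data/part-0 (a file) and empty/ (an empty directory).
    static Path sampleTree() {
        try {
            Path root = Files.createTempDirectory("listing-demo");
            Files.createDirectories(root.resolve("data"));
            Files.createFile(root.resolve("data").resolve("part-0"));
            Files.createDirectories(root.resolve("empty"));
            return root;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path root = sampleTree();
        System.out.println("files only:       " + list(root, true));
        System.out.println("with directories: " + list(root, false));
    }
}
```

In the files-only result the "empty" directory never appears, so a copy driven
by that listing would have nothing telling it to create the directory or to
apply preserved attributes to it.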
> DistCp: Reduce memory usage on copying huge directories
> -------------------------------------------------------
>
> Key: HADOOP-17531
> URL: https://issues.apache.org/jira/browse/HADOOP-17531
> Project: Hadoop Common
> Issue Type: Improvement
> Reporter: Ayush Saxena
> Priority: Critical
> Labels: pull-request-available
> Attachments: MoveToStackIterator.patch, gc-NewD-512M-3.8ML.log
>
> Time Spent: 40m
> Remaining Estimate: 0h
>
> Presently DistCp uses a producer-consumer kind of setup while building the
> listing. The input queue and output queue are both unbounded, thus the
> listStatus grows quite huge.
> Relevant code:
> https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/SimpleCopyListing.java#L635
> This does a breadth-first kind of traversal (it uses a queue instead of the
> earlier stack), so if you have files at lower depth, it will open up the
> entire tree and then start processing....
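
The memory effect described in the issue can be seen in a minimal sketch. This
is not the SimpleCopyListing code; the class TraversalFrontier, the method
peakPending, and the synthetic tree (every directory has the same number of
child directories down to a fixed depth) are assumptions made for illustration.
It tracks how many directories are pending at once when the work list is
consumed as a queue (breadth-first) versus as a stack (depth-first).

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical demo class; models directories only by their depth.
class TraversalFrontier {

    // Returns the peak number of pending directories while traversing a
    // synthetic tree where each directory above maxDepth has fanOut children.
    // breadthFirst=true consumes the work list FIFO, otherwise LIFO.
    static int peakPending(int fanOut, int maxDepth, boolean breadthFirst) {
        Deque<Integer> pending = new ArrayDeque<>();
        pending.addLast(0); // root directory at depth 0
        int peak = 1;
        while (!pending.isEmpty()) {
            int depth = breadthFirst ? pending.pollFirst() : pending.pollLast();
            if (depth < maxDepth) {
                // "List" the directory: enqueue each child directory.
                for (int i = 0; i < fanOut; i++) {
                    pending.addLast(depth + 1);
                }
            }
            peak = Math.max(peak, pending.size());
        }
        return peak;
    }

    public static void main(String[] args) {
        System.out.println("queue peak: " + peakPending(10, 4, true));
        System.out.println("stack peak: " + peakPending(10, 4, false));
    }
}
```

With a fan-out of 10 and a depth of 4, the queue variant peaks at 10000 pending
directories (the whole bottom level), while the stack variant peaks at 37. Real
trees are not this regular, so the numbers only indicate the trend.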