[
https://issues.apache.org/jira/browse/HADOOP-12502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16291918#comment-16291918
]
Aaron Fabbri commented on HADOOP-12502:
---------------------------------------
Hi, thanks for the work on this [~vinayrpet] and [~jojochuang]. Some questions
on the latest patch.
Do we know where the most memory was going? Is it the references to all the
listStatus() arrays held in the full recursion call tree in
fs/shell/Command.java? (Each recursive call passes a reference to that level's
listStatus() array, so the whole tree ends up held in the heap, right?)
How does this patch fix the OOM issue? Is it because we're now holding
RemoteIterators for the whole directory tree in memory, instead of holding the
actual listStatus arrays?
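To make the contrast concrete, here is a toy sketch (my own illustration, not
code from the patch or from Command.java) of why holding a FileStatus[] per
recursion frame pins the whole tree, while holding only an iterator per frame
does not, assuming the iterator itself doesn't buffer the full listing:
{noformat}
import java.io.IOException;

import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;

public class RecursionMemorySketch {

  // Array-based walk: every frame on the call stack keeps a strong reference
  // to its own listStatus() array until the whole subtree below it has been
  // processed, so a deep and wide tree pins a lot of FileStatus objects.
  static void walkWithArrays(FileSystem fs, Path dir) throws IOException {
    FileStatus[] children = fs.listStatus(dir);   // whole level in memory
    for (FileStatus child : children) {
      if (child.isDirectory()) {
        walkWithArrays(fs, child.getPath());      // parent's array still live
      }
    }
  }

  // Iterator-based walk: each frame only holds a RemoteIterator, so already
  // consumed FileStatus entries of ancestor levels become unreachable and can
  // be collected (assuming the iterator does not buffer the full listing).
  static void walkWithIterators(FileSystem fs, Path dir) throws IOException {
    RemoteIterator<FileStatus> children = fs.listStatusIterator(dir);
    while (children.hasNext()) {
      FileStatus child = children.next();
      if (child.isDirectory()) {
        walkWithIterators(fs, child.getPath());
      }
    }
  }
}
{noformat}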
{noformat}
+ public RemoteIterator<FileStatus> listStatusIterator(Path p)
+ throws IOException {
+ // Not-using fs' listStatusIterator since iterator also expect to return
+ // sorted elements.
+ return new RemoteIterator<FileStatus>() {
+ private final FileStatus[] stats = listStatus(p);
{noformat}
Why are we forcing ChecksumFileSystem not to use listStatusIterator(), and
sorting the results here? This could increase memory usage, no? I don't think a
sorted iterator is required by the FS contract.
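For contrast, a lazy variant that filters without materializing the whole level
could look roughly like this (purely illustrative, not a proposal for the
patch; it leans on getRawFileSystem()/isChecksumFile() and gives up the sorted
ordering):
{noformat}
// Illustrative only: wrap the raw filesystem's iterator and filter lazily
// instead of buffering and sorting the whole listStatus() array up front.
public RemoteIterator<FileStatus> listStatusIterator(final Path p)
    throws IOException {
  final RemoteIterator<FileStatus> raw =
      getRawFileSystem().listStatusIterator(p);
  return new RemoteIterator<FileStatus>() {
    private FileStatus pending;   // one-element look-ahead

    @Override
    public boolean hasNext() throws IOException {
      while (pending == null && raw.hasNext()) {
        FileStatus candidate = raw.next();
        if (!isChecksumFile(candidate.getPath())) {
          pending = candidate;    // keep only non-checksum entries
        }
      }
      return pending != null;
    }

    @Override
    public FileStatus next() throws IOException {
      if (!hasNext()) {
        throw new java.util.NoSuchElementException();
      }
      FileStatus result = pending;
      pending = null;
      return result;
    }
  };
}
{noformat}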
The Ls.java changes seem tricky. I wonder if there is a simpler way of doing
this (idea: Command exposes an overridable {{boolean isSorted()}} predicate
that Ls.java can override if it needs sorting, leaving the traversal logic in
Command instead of mucking with it in Ls?)
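Something like this is what I have in mind (only a sketch; the isSorted() hook
and the iterator-returning PathData accessor name are made up here, not taken
from the patch):
{noformat}
// In Command.java: a single overridable hook instead of Ls overriding the
// traversal itself.
/** Subclasses that need a fully materialized, sorted listing return true. */
protected boolean isSorted() {
  return false;
}

protected void recursePath(PathData item) throws IOException {
  try {
    depth++;
    if (isSorted()) {
      // Materialize and sort the level, as today.
      processPaths(item, item.getDirectoryContents());
    } else {
      // Stream the level through the iterator-based overload.
      // (getDirectoryContentsIterator() is a hypothetical accessor here.)
      processPaths(item, item.getDirectoryContentsIterator());
    }
  } finally {
    depth--;
  }
}

// In Ls.java: no traversal logic at all, just the predicate.
@Override
protected boolean isSorted() {
  return isOrderTime() || isOrderSize() || isOrderReverse();
}
{noformat}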
{noformat}
- // TODO: this really should be iterative
{noformat}
Is this comment still true? I'm guessing the intent was "iterative" as in "not
recursive", rather than "iterative" as in "using an iterator".
{noformat}
+ protected void recursePath(PathData item) throws IOException {
+ if (!isRecursive() || isOrderTime() || isOrderSize() || isOrderReverse()) {
+ // use the non-iterative method for listing because explicit sorting is
{noformat}
You mean "non-recursive", right? Or maybe "non-iterator".
{noformat}
+ // required based on time/size/reverse or Total number of entries
+ // required to print summary first when non-recursive.
+ processPaths(item, item.getDirectoryContents());
{noformat}
What about the depth++ / depth-- accounting in Command.recursePath() that you
skip here? Is the logic that Ls does not use getDepth()? That seems brittle.
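For reference, the base Command.recursePath() the override bypasses is roughly
the following (from memory, so treat it as approximate):
{noformat}
// Command.java, approximately: recursePath() keeps getDepth() accurate for
// subclasses by bumping the counter around each level of the walk.
protected void recursePath(PathData item) throws IOException {
  try {
    depth++;
    processPaths(item, item.getDirectoryContents());
  } finally {
    depth--;
  }
}
{noformat}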
{noformat}
+ } else {
+ super.recursePath(item);
+ }
+ }
+
{noformat}
Why does PathData.getDirectoryContents() sort its listing?
{noformat}
@Override
protected void processPaths(PathData parent,
RemoteIterator<PathData> itemsIterator) throws IOException {
if (pathOnly) {
// If there is a need of printing only paths, then iterator can be used
// directly.
super.processPaths(parent, itemsIterator);
return;
}
/*
* LS output should be formatted properly. Grouping 100 items and formatting
* the output to reduce the creation of huge sized arrays. This method will
* be called only when recursive is set.
*/
List<PathData> items = new ArrayList<>();
while (itemsIterator.hasNext()) {
while (items.size() < 100 && itemsIterator.hasNext()) {
items.add(itemsIterator.next());
}
processPaths(parent, items.toArray(new PathData[items.size()]));
items.clear();
{noformat}
I suspect this is where much of the memory savings comes from, and the chunking
into 100 works without changing the depth-first search ordering.
What about the sorting in the existing {{Ls#processPaths()}}? That behavior
changes, because we now only sort within each batch of 100.
I like the idea of chunking the depth-first search (DFS) into blocks of 100 and
releasing references on the way up. Wouldn't we want to do this in Command
instead of Ls? Two reasons: (1) other commands would benefit, and (2) it would
be less brittle in terms of how the recursion logic is wired up between Command
and Ls.
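Roughly what I have in mind (just a sketch; moving the batching into Command
and the batch-size constant are my suggestion, not the patch's):
{noformat}
// Command.java (sketch): generic batching so every recursive command drains
// the iterator in small chunks instead of materializing a whole level.
private static final int LIST_BATCH_SIZE = 100;   // illustrative constant

protected void processPaths(PathData parent,
    RemoteIterator<PathData> itemsIterator) throws IOException {
  List<PathData> batch = new ArrayList<>(LIST_BATCH_SIZE);
  while (itemsIterator.hasNext()) {
    batch.add(itemsIterator.next());
    if (batch.size() >= LIST_BATCH_SIZE) {
      processPaths(parent, batch.toArray(new PathData[0]));
      batch.clear();   // release references to the processed batch
    }
  }
  if (!batch.isEmpty()) {
    processPaths(parent, batch.toArray(new PathData[0]));
  }
}
{noformat}
Ls would then only override the array-based processPaths() for its formatting
and sorting, and every other recursive command would get the bounded-memory
behavior for free.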
> SetReplication OutOfMemoryError
> -------------------------------
>
> Key: HADOOP-12502
> URL: https://issues.apache.org/jira/browse/HADOOP-12502
> Project: Hadoop Common
> Issue Type: Bug
> Affects Versions: 2.3.0
> Reporter: Philipp Schuegerl
> Assignee: Vinayakumar B
> Attachments: HADOOP-12502-01.patch, HADOOP-12502-02.patch,
> HADOOP-12502-03.patch, HADOOP-12502-04.patch, HADOOP-12502-05.patch,
> HADOOP-12502-06.patch, HADOOP-12502-07.patch
>
>
> Setting the replication of a HDFS folder recursively can run out of memory.
> E.g. with a large /var/log directory:
> hdfs dfs -setrep -R -w 1 /var/log
> Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
> at java.util.Arrays.copyOfRange(Arrays.java:2694)
> at java.lang.String.<init>(String.java:203)
> at java.lang.String.substring(String.java:1913)
> at java.net.URI$Parser.substring(URI.java:2850)
> at java.net.URI$Parser.parse(URI.java:3046)
> at java.net.URI.<init>(URI.java:753)
> at org.apache.hadoop.fs.Path.initialize(Path.java:203)
> at org.apache.hadoop.fs.Path.<init>(Path.java:116)
> at org.apache.hadoop.fs.Path.<init>(Path.java:94)
> at org.apache.hadoop.hdfs.protocol.HdfsFileStatus.getFullPath(HdfsFileStatus.java:222)
> at org.apache.hadoop.hdfs.protocol.HdfsFileStatus.makeQualified(HdfsFileStatus.java:246)
> at org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:689)
> at org.apache.hadoop.hdfs.DistributedFileSystem.access$600(DistributedFileSystem.java:102)
> at org.apache.hadoop.hdfs.DistributedFileSystem$14.doCall(DistributedFileSystem.java:712)
> at org.apache.hadoop.hdfs.DistributedFileSystem$14.doCall(DistributedFileSystem.java:708)
> at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:708)
> at org.apache.hadoop.fs.shell.PathData.getDirectoryContents(PathData.java:268)
> at org.apache.hadoop.fs.shell.Command.recursePath(Command.java:347)
> at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:308)
> at org.apache.hadoop.fs.shell.Command.recursePath(Command.java:347)
> at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:308)
> at org.apache.hadoop.fs.shell.Command.recursePath(Command.java:347)
> at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:308)
> at org.apache.hadoop.fs.shell.Command.recursePath(Command.java:347)
> at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:308)
> at org.apache.hadoop.fs.shell.Command.recursePath(Command.java:347)
> at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:308)
> at org.apache.hadoop.fs.shell.Command.processPathArgument(Command.java:278)
> at org.apache.hadoop.fs.shell.Command.processArgument(Command.java:260)
> at org.apache.hadoop.fs.shell.Command.processArguments(Command.java:244)
> at org.apache.hadoop.fs.shell.SetReplication.processArguments(SetReplication.java:76)