[
https://issues.apache.org/jira/browse/HADOOP-12502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16291918#comment-16291918
]
Aaron Fabbri commented on HADOOP-12502:
---------------------------------------
Hi, thanks for the work on this [~vinayrpet] and [~jojochuang]. Some questions
on the latest patch.
Do we know where the most memory was going? Is it the references to all the
listStatus() arrays held in the full recursion call tree in
fs/shell/Command.java? (Each recursive call passes a reference to that level's
listStatus() array, so the whole tree ends up held in the heap, right?)
How does this patch fix the OOM issue? Is it because we're now holding
RemoteIterators for the whole directory tree in memory, instead of holding the
actual listStatus arrays?
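To make the contrast concrete, here is a toy sketch (my own illustration, not
code from the patch or from Command.java) of why holding a FileStatus[] per
recursion frame pins the whole tree, while holding only an iterator per frame
does not, assuming the iterator itself doesn't buffer the full listing:
{noformat}
import java.io.IOException;

import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;

public class RecursionMemorySketch {

  // Array-based walk: every frame on the call stack keeps a strong reference
  // to its own listStatus() array until the whole subtree below it has been
  // processed, so a deep and wide tree pins a lot of FileStatus objects.
  static void walkWithArrays(FileSystem fs, Path dir) throws IOException {
    FileStatus[] children = fs.listStatus(dir);   // whole level in memory
    for (FileStatus child : children) {
      if (child.isDirectory()) {
        walkWithArrays(fs, child.getPath());      // parent's array still live
      }
    }
  }

  // Iterator-based walk: each frame only holds a RemoteIterator, so already
  // consumed FileStatus entries of ancestor levels become unreachable and can
  // be collected (assuming the iterator does not buffer the full listing).
  static void walkWithIterators(FileSystem fs, Path dir) throws IOException {
    RemoteIterator<FileStatus> children = fs.listStatusIterator(dir);
    while (children.hasNext()) {
      FileStatus child = children.next();
      if (child.isDirectory()) {
        walkWithIterators(fs, child.getPath());
      }
    }
  }
}
{noformat}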
{noformat}
+ public RemoteIterator<FileStatus> listStatusIterator(Path p)
+ throws IOException {
+ // Not-using fs' listStatusIterator since iterator also expect to return
+ // sorted elements.
+ return new RemoteIterator<FileStatus>() {
+ private final FileStatus[] stats = listStatus(p);
{noformat}
Why are we forcing ChecksumFileSystem not to use listStatusIterator(), and
sorting the results here? This could increase memory usage, no? I don't think a
sorted iterator is required by the FS contract.
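For contrast, a lazy variant that filters without materializing the whole level
could look roughly like this (purely illustrative, not a proposal for the
patch; it leans on getRawFileSystem()/isChecksumFile() and gives up the sorted
ordering):
{noformat}
// Illustrative only: wrap the raw filesystem's iterator and filter lazily
// instead of buffering and sorting the whole listStatus() array up front.
public RemoteIterator<FileStatus> listStatusIterator(final Path p)
    throws IOException {
  final RemoteIterator<FileStatus> raw =
      getRawFileSystem().listStatusIterator(p);
  return new RemoteIterator<FileStatus>() {
    private FileStatus pending;   // one-element look-ahead

    @Override
    public boolean hasNext() throws IOException {
      while (pending == null && raw.hasNext()) {
        FileStatus candidate = raw.next();
        if (!isChecksumFile(candidate.getPath())) {
          pending = candidate;    // keep only non-checksum entries
        }
      }
      return pending != null;
    }

    @Override
    public FileStatus next() throws IOException {
      if (!hasNext()) {
        throw new java.util.NoSuchElementException();
      }
      FileStatus result = pending;
      pending = null;
      return result;
    }
  };
}
{noformat}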
The Ls.java changes seem tricky. I wonder if there is a simpler way of doing
this (idea: Command exposes an overridable {{boolean isSorted()}} predicate
that Ls.java can override if it needs sorting, leaving the traversal logic in
Command instead of mucking with it in Ls?)
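Something like this is what I have in mind (only a sketch; the isSorted() hook
and the iterator-returning PathData accessor name are made up here, not taken
from the patch):
{noformat}
// In Command.java: a single overridable hook instead of Ls overriding the
// traversal itself.
/** Subclasses that need a fully materialized, sorted listing return true. */
protected boolean isSorted() {
  return false;
}

protected void recursePath(PathData item) throws IOException {
  try {
    depth++;
    if (isSorted()) {
      // Materialize and sort the level, as today.
      processPaths(item, item.getDirectoryContents());
    } else {
      // Stream the level through the iterator-based overload.
      // (getDirectoryContentsIterator() is a hypothetical accessor here.)
      processPaths(item, item.getDirectoryContentsIterator());
    }
  } finally {
    depth--;
  }
}

// In Ls.java: no traversal logic at all, just the predicate.
@Override
protected boolean isSorted() {
  return isOrderTime() || isOrderSize() || isOrderReverse();
}
{noformat}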
{noformat}
- // TODO: this really should be iterative
{noformat}
Is this comment still true? I'm guessing the intent was "iterative" as in "not
recursive", rather than "iterative" as in "using an iterator".
{noformat}
+ protected void recursePath(PathData item) throws IOException {
+ if (!isRecursive() || isOrderTime() || isOrderSize() || isOrderReverse()) {
+ // use the non-iterative method for listing because explicit sorting is
{noformat}
You mean "non-recursive", right? Or maybe "non-iterator".
{noformat}
+ // required based on time/size/reverse or Total number of entries
+ // required to print summary first when non-recursive.
+ processPaths(item, item.getDirectoryContents());
{noformat}
What about the depth++ / depth-- accounting in Command.recursePath() that you
skip here? Is the logic that Ls does not use getDepth()? That seems brittle.
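For reference, the base Command.recursePath() the override bypasses is roughly
the following (from memory, so treat it as approximate):
{noformat}
// Command.java, approximately: recursePath() keeps getDepth() accurate for
// subclasses by bumping the counter around each level of the walk.
protected void recursePath(PathData item) throws IOException {
  try {
    depth++;
    processPaths(item, item.getDirectoryContents());
  } finally {
    depth--;
  }
}
{noformat}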
{noformat}
+ } else {
+ super.recursePath(item);
+ }
+ }
+
{noformat}
Why does PathData.getDirectoryContents() sort its listing?
{noformat}
@Override
protected void processPaths(PathData parent,
RemoteIterator<PathData> itemsIterator) throws IOException {
if (pathOnly) {
// If there is a need of printing only paths, then iterator can be used
// directly.
super.processPaths(parent, itemsIterator);
return;
}
/*
* LS output should be formatted properly. Grouping 100 items and formatting
* the output to reduce the creation of huge sized arrays. This method will
* be called only when recursive is set.
*/
List<PathData> items = new ArrayList<>();
while (itemsIterator.hasNext()) {
while (items.size() < 100 && itemsIterator.hasNext()) {
items.add(itemsIterator.next());
}
processPaths(parent, items.toArray(new PathData[items.size()]));
items.clear();
{noformat}
I suspect this is where much of the memory savings comes from, and the chunking
into 100 works without changing the depth-first search ordering.
What about the sorting in the existing {{Ls#processPaths()}}? That behavior
changes, because we now only sort within each batch of 100.
I like the idea of chunking the depth-first search (DFS) into blocks of 100 and
releasing references on the way up. Wouldn't we want to do this in Command
instead of Ls? Two reasons: (1) other commands would benefit, and (2) it would
be less brittle in terms of how the recursion logic is wired up between Command
and Ls.
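Roughly what I have in mind (just a sketch; moving the batching into Command
and the batch-size constant are my suggestion, not the patch's):
{noformat}
// Command.java (sketch): generic batching so every recursive command drains
// the iterator in small chunks instead of materializing a whole level.
private static final int LIST_BATCH_SIZE = 100;   // illustrative constant

protected void processPaths(PathData parent,
    RemoteIterator<PathData> itemsIterator) throws IOException {
  List<PathData> batch = new ArrayList<>(LIST_BATCH_SIZE);
  while (itemsIterator.hasNext()) {
    batch.add(itemsIterator.next());
    if (batch.size() >= LIST_BATCH_SIZE) {
      processPaths(parent, batch.toArray(new PathData[0]));
      batch.clear();   // release references to the processed batch
    }
  }
  if (!batch.isEmpty()) {
    processPaths(parent, batch.toArray(new PathData[0]));
  }
}
{noformat}
Ls would then only override the array-based processPaths() for its formatting
and sorting, and every other recursive command would get the bounded-memory
behavior for free.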
> SetReplication OutOfMemoryError
> -------------------------------
>
> Key: HADOOP-12502
> URL: https://issues.apache.org/jira/browse/HADOOP-12502
> Project: Hadoop Common
> Issue Type: Bug
> Affects Versions: 2.3.0
> Reporter: Philipp Schuegerl
> Assignee: Vinayakumar B
> Attachments: HADOOP-12502-01.patch, HADOOP-12502-02.patch,
> HADOOP-12502-03.patch, HADOOP-12502-04.patch, HADOOP-12502-05.patch,
> HADOOP-12502-06.patch, HADOOP-12502-07.patch
>
>
> Setting the replication of a HDFS folder recursively can run out of memory.
> E.g. with a large /var/log directory:
> hdfs dfs -setrep -R -w 1 /var/log
> Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
> at java.util.Arrays.copyOfRange(Arrays.java:2694)
> at java.lang.String.<init>(String.java:203)
> at java.lang.String.substring(String.java:1913)
> at java.net.URI$Parser.substring(URI.java:2850)
> at java.net.URI$Parser.parse(URI.java:3046)
> at java.net.URI.<init>(URI.java:753)
> at org.apache.hadoop.fs.Path.initialize(Path.java:203)
> at org.apache.hadoop.fs.Path.<init>(Path.java:116)
> at org.apache.hadoop.fs.Path.<init>(Path.java:94)
> at org.apache.hadoop.hdfs.protocol.HdfsFileStatus.getFullPath(HdfsFileStatus.java:222)
> at org.apache.hadoop.hdfs.protocol.HdfsFileStatus.makeQualified(HdfsFileStatus.java:246)
> at org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:689)
> at org.apache.hadoop.hdfs.DistributedFileSystem.access$600(DistributedFileSystem.java:102)
> at org.apache.hadoop.hdfs.DistributedFileSystem$14.doCall(DistributedFileSystem.java:712)
> at org.apache.hadoop.hdfs.DistributedFileSystem$14.doCall(DistributedFileSystem.java:708)
> at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:708)
> at org.apache.hadoop.fs.shell.PathData.getDirectoryContents(PathData.java:268)
> at org.apache.hadoop.fs.shell.Command.recursePath(Command.java:347)
> at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:308)
> at org.apache.hadoop.fs.shell.Command.recursePath(Command.java:347)
> at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:308)
> at org.apache.hadoop.fs.shell.Command.recursePath(Command.java:347)
> at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:308)
> at org.apache.hadoop.fs.shell.Command.recursePath(Command.java:347)
> at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:308)
> at org.apache.hadoop.fs.shell.Command.recursePath(Command.java:347)
> at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:308)
> at org.apache.hadoop.fs.shell.Command.processPathArgument(Command.java:278)
> at org.apache.hadoop.fs.shell.Command.processArgument(Command.java:260)
> at org.apache.hadoop.fs.shell.Command.processArguments(Command.java:244)
> at org.apache.hadoop.fs.shell.SetReplication.processArguments(SetReplication.java:76)