[
https://issues.apache.org/jira/browse/HADOOP-16570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16932742#comment-16932742
]
Steve Loughran commented on HADOOP-16570:
-----------------------------------------
Even when you set the number of threads to 0, so disabling task commit, the job
commit (for a terasort, BTW) still fails.
The stack trace implies it fails while listing the files to commit:
{code}
main
at java.util.Arrays.copyOfRange([CII)[C (Arrays.java:3664)
at java.lang.String.<init>([CII)V (String.java:207)
at java.lang.String.substring(II)Ljava/lang/String; (String.java:1969)
at java.net.URI$Parser.substring(II)Ljava/lang/String; (URI.java:2869)
at java.net.URI$Parser.parseHierarchical(II)I (URI.java:3106)
at java.net.URI$Parser.parse(Z)V (URI.java:3053)
at java.net.URI.<init>(Ljava/lang/String;)V (URI.java:588)
at org.apache.hadoop.fs.s3a.commit.files.SinglePendingCommit.destinationPath()Lorg/apache/hadoop/fs/Path; (SinglePendingCommit.java:253)
at org.apache.hadoop.fs.s3a.commit.files.SinglePendingCommit.validate()V (SinglePendingCommit.java:195)
at org.apache.hadoop.fs.s3a.commit.files.PendingSet.validate()V (PendingSet.java:146)
at org.apache.hadoop.fs.s3a.commit.files.PendingSet.load(Lorg/apache/hadoop/fs/FileSystem;Lorg/apache/hadoop/fs/Path;)Lorg/apache/hadoop/fs/s3a/commit/files/PendingSet; (PendingSet.java:109)
at org.apache.hadoop.fs.s3a.commit.AbstractS3ACommitter.lambda$loadPendingsetFiles$1(Ljava/util/List;Lorg/apache/hadoop/fs/FileSystem;Lorg/apache/hadoop/fs/FileStatus;)V (AbstractS3ACommitter.java:492)
at org.apache.hadoop.fs.s3a.commit.AbstractS3ACommitter$$Lambda$92.run(Ljava/lang/Object;)V (Unknown Source)
at org.apache.hadoop.fs.s3a.commit.Tasks$Builder.runSingleThreaded(Lorg/apache/hadoop/fs/s3a/commit/Tasks$Task;)Z (Tasks.java:165)
at org.apache.hadoop.fs.s3a.commit.Tasks$Builder.run(Lorg/apache/hadoop/fs/s3a/commit/Tasks$Task;)Z (Tasks.java:150)
at org.apache.hadoop.fs.s3a.commit.AbstractS3ACommitter.loadPendingsetFiles(Lorg/apache/hadoop/mapreduce/JobContext;ZLorg/apache/hadoop/fs/FileSystem;Ljava/lang/Iterable;)Ljava/util/List; (AbstractS3ACommitter.java:490)
at org.apache.hadoop.fs.s3a.commit.staging.StagingCommitter.listPendingUploads(Lorg/apache/hadoop/mapreduce/JobContext;Z)Ljava/util/List; (StagingCommitter.java:502)
at org.apache.hadoop.fs.s3a.commit.staging.StagingCommitter.listPendingUploadsToCommit(Lorg/apache/hadoop/mapreduce/JobContext;)Ljava/util/List; (StagingCommitter.java:472)
at org.apache.hadoop.fs.s3a.commit.AbstractS3ACommitter.commitJob(Lorg/apache/hadoop/mapreduce/JobContext;)V (AbstractS3ACommitter.java:598)
at org.apache.spark.internal.io.HadoopMapReduceCommitProtocol.commitJob(Lorg/apache/hadoop/mapreduce/JobContext;Lscala/collection/Seq;)V (HadoopMapReduceCommitProtocol.scala:166)
at org.apache.spark.internal.io.cloud.PathOutputCommitProtocol.commitJob(Lorg/apache/hadoop/mapreduce/JobContext;Lscala/collection/Seq;)V (PathOutputCommitProtocol.scala:194)
at
{code}
Hypothesis: there are too many files to commit if we enumerate them all first
and only then commit them.
* We need to move to a sequence of load-and-commit or load-and-abort, where both
the load and the commit/abort are done in the same worker thread.
* We should not create a full list of results for the success file, except for
smaller jobs. Maybe we could list the first 100 files and not worry about the
rest, but do add a counter of how many files there really were, if we don't
have one already.
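
The two points above could be sketched roughly as follows. This is only an
illustration under assumptions, not the committer code: the class
StreamingCommitSketch, the method commitAll and the load/action callbacks are
hypothetical names, destinations are plain strings, and the limit of 100 is
taken from the suggestion above. Each worker loads one pendingset file and
immediately commits (or aborts) its entries, so the full list of pending
uploads is never held in memory at once.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;
import java.util.function.Function;

/** Hypothetical sketch; none of these names exist in the S3A committers. */
final class StreamingCommitSketch {

  /** Cap on the filenames retained for the success file. */
  static final int MAX_LISTED_FILES = 100;

  /** Bounded sample of committed files plus the true total. */
  static final class Summary {
    final List<String> sampleFiles;
    final long totalFiles;

    Summary(List<String> sampleFiles, long totalFiles) {
      this.sampleFiles = sampleFiles;
      this.totalFiles = totalFiles;
    }
  }

  /**
   * @param pendingsetFiles names of the pendingset files to process
   * @param load            assumed callback: parses one file into destinations
   * @param action          assumed callback: commits (or aborts) one destination
   * @param threads         worker count; values below 1 are treated as 1
   */
  static Summary commitAll(List<String> pendingsetFiles,
      Function<String, List<String>> load,
      Consumer<String> action,
      int threads) {
    ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, threads));
    List<String> sample = Collections.synchronizedList(new ArrayList<>());
    AtomicLong total = new AtomicLong();
    try {
      List<CompletableFuture<Void>> tasks = new ArrayList<>();
      for (String file : pendingsetFiles) {
        tasks.add(CompletableFuture.runAsync(() -> {
          // Load and act in the same worker: the loaded entries become
          // garbage as soon as this task finishes.
          for (String dest : load.apply(file)) {
            action.accept(dest);
            // Count every file, but retain only the first MAX_LISTED_FILES.
            if (total.incrementAndGet() <= MAX_LISTED_FILES) {
              sample.add(dest);
            }
          }
        }, pool));
      }
      // Waits for every task; rethrows a failure as CompletionException.
      CompletableFuture.allOf(tasks.toArray(new CompletableFuture[0])).join();
    } finally {
      pool.shutdown(); // always release the worker threads
    }
    return new Summary(new ArrayList<>(sample), total.get());
  }
}
```

Passing an abort callback instead of a commit callback as "action" gives the
load-and-abort variant. Which files end up in the sample depends on thread
scheduling; only the size of the sample and the total count are deterministic.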
> S3A committers leak threads on job/task commit
> ----------------------------------------------
>
> Key: HADOOP-16570
> URL: https://issues.apache.org/jira/browse/HADOOP-16570
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Affects Versions: 3.2.0, 3.1.2
> Reporter: Steve Loughran
> Assignee: Steve Loughran
> Priority: Major
>
> The fixed-size ThreadPool created in AbstractS3ACommitter doesn't get cleaned
> up at EOL; as a result you leak the number of threads set in
> "fs.s3a.committer.threads".
> Not visible in MR/distcp jobs, but ultimately causes OOM on Spark.
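
For illustration, one way to avoid this kind of leak is to tie the pool to a
closeable owner so it is always shut down at end of life. This is a minimal
sketch under assumptions: CommitterPoolSketch is a hypothetical class, the
constructor argument stands in for the value of "fs.s3a.committer.threads"
(assumed to be at least 1 here), and the 30 second wait is arbitrary.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical holder; the real committer's fields and lifecycle may differ. */
final class CommitterPoolSketch implements AutoCloseable {
  private final ExecutorService pool;

  CommitterPoolSketch(int threads) {
    // newFixedThreadPool rejects values below 1, so a setting of 0
    // would need separate single-threaded handling.
    this.pool = Executors.newFixedThreadPool(threads);
  }

  ExecutorService pool() {
    return pool;
  }

  @Override
  public void close() {
    pool.shutdown(); // stop accepting work, let queued tasks finish
    try {
      if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
        pool.shutdownNow(); // give up waiting and interrupt the workers
      }
    } catch (InterruptedException e) {
      pool.shutdownNow();
      Thread.currentThread().interrupt(); // preserve the interrupt status
    }
  }
}
```

Used in a try-with-resources block, the threads are released whether the
commit succeeds or throws.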