[ https://issues.apache.org/jira/browse/SPARK-33402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17228788#comment-17228788 ]
Steve Loughran commented on SPARK-33402:
----------------------------------------
The good news: this surfaced in a test of mine, now that HADOOP-17318 lets me
configure the S3A committers to fail fast if they don't get a job ID.
{code}
2020-11-09 10:58:25,827 [Executor task launch worker for task 1] INFO rdd.HadoopRDD (Logging.scala:logInfo(57)) - Input split: s3a://stevel-ireland/cloud-integration/DELAY_LISTING_ME/S3ABasicIOSuite/FileOutput/part-00000:0+3893
- FileOutput
- NewHadoopAPI *** FAILED ***
  org.apache.hadoop.fs.s3a.commit.PathCommitException: `': Job/task context does not contain a unique ID in spark.sql.sources.writeJobUUID
  at org.apache.hadoop.fs.s3a.commit.AbstractS3ACommitter.buildJobUUID(AbstractS3ACommitter.java:1268)
  at org.apache.hadoop.fs.s3a.commit.AbstractS3ACommitter.<init>(AbstractS3ACommitter.java:176)
  at org.apache.hadoop.fs.s3a.commit.staging.StagingCommitter.<init>(StagingCommitter.java:113)
  at org.apache.hadoop.fs.s3a.commit.staging.DirectoryStagingCommitter.<init>(DirectoryStagingCommitter.java:57)
  at org.apache.hadoop.fs.s3a.commit.staging.DirectoryStagingCommitterFactory.createTaskCommitter(DirectoryStagingCommitterFactory.java:45)
  at org.apache.hadoop.fs.s3a.commit.S3ACommitterFactory.createTaskCommitter(S3ACommitterFactory.java:81)
  at org.apache.hadoop.fs.s3a.commit.AbstractS3ACommitterFactory.createOutputCommitter(AbstractS3ACommitterFactory.java:48)
  at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.getOutputCommitter(FileOutputFormat.java:338)
  at org.apache.spark.internal.io.HadoopMapReduceCommitProtocol.setupCommitter(HadoopMapReduceCommitProtocol.scala:100)
  at org.apache.spark.internal.io.HadoopMapReduceCommitProtocol.setupJob(HadoopMapReduceCommitProtocol.scala:161)
  ...
{code}
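For reference, a minimal sketch of the fail-fast setup described above. It assumes the {{fs.s3a.committer.require.uuid}} option added by HADOOP-17318; treat the exact key name as an assumption and check CommitConstants on your Hadoop build.
{code}
import org.apache.spark.sql.SparkSession

// Sketch only: route output through the S3A directory staging committer and
// ask it to fail fast when the job/task context carries no unique job ID.
val spark = SparkSession.builder()
  .appName("s3a-committer-fail-fast")
  // use the directory staging committer for S3A output
  .config("spark.hadoop.fs.s3a.committer.name", "directory")
  // reject jobs whose context lacks spark.sql.sources.writeJobUUID
  // (option name assumed from HADOOP-17318)
  .config("spark.hadoop.fs.s3a.committer.require.uuid", "true")
  .getOrCreate()
{code}
With that set, an RDD save path that never propagates the UUID fails at committer construction, which is exactly the stack trace above.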
> SparkHadoopWriter to pass job UUID down in spark.sql.sources.writeJobUUID
> -------------------------------------------------------------------------
>
> Key: SPARK-33402
> URL: https://issues.apache.org/jira/browse/SPARK-33402
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 2.4.8, 3.1.0
> Reporter: Steve Loughran
> Priority: Major
>
> SPARK-33230 restored setting a unique job ID for Spark SQL jobs writing
> through the Hadoop output formatters, but saves of files from an RDD don't
> get one, because SparkHadoopWriter doesn't insert the UUID.
> Proposed: set the same property there.
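A hedged sketch of that proposal, not the actual patch: mirror what SPARK-33230 does on the SQL path by stamping a unique job UUID into the Hadoop configuration before the output committer is created. The helper name and its placement inside SparkHadoopWriter are illustrative.
{code}
import java.util.UUID
import org.apache.hadoop.conf.Configuration

// Illustrative helper: set the property the S3A committers look up
// (spark.sql.sources.writeJobUUID) so buildJobUUID() finds a unique ID
// even when the write comes from an RDD rather than Spark SQL.
def setWriteJobUUID(conf: Configuration): Unit = {
  conf.set("spark.sql.sources.writeJobUUID", UUID.randomUUID().toString)
}
{code}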