[
https://issues.apache.org/jira/browse/SPARK-44884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17867688#comment-17867688
]
Anika Kelhanka commented on SPARK-44884:
----------------------------------------
*Issue:*
* This issue happens specifically when {{partitionOverwriteMode = dynamic}}
(Insert Overwrite -
[SPARK-20236|https://issues.apache.org/jira/browse/SPARK-20236]).
* The "_SUCCESS" file is created for Spark versions <= 3.0.2, given:
{{"spark.hadoop.mapreduce.fileoutputcommitter.marksuccessfuljobs"="true"}}.
* The "_SUCCESS" file is not created for Spark versions > 3.0.2, even when
{{"spark.hadoop.mapreduce.fileoutputcommitter.marksuccessfuljobs"="true"}}.
----------------------------------------------------------------------------------------------------------------------------------------
*Analysis (RCA):*
* In Spark versions prior to 3.0.2, the SUCCESS marker file is created at
the root path when the Spark job succeeds. This is the expected behavior.
* What changed: after the change for
[SPARK-29302|https://issues.apache.org/jira/browse/SPARK-29302] (dynamic
partition overwrite with speculation enabled) was merged, the SUCCESS marker
file stopped being created at the root location when the Spark job writes in
dynamic partition overwrite mode.
* The change [SPARK-29302|https://issues.apache.org/jira/browse/SPARK-29302]
(dynamic partition overwrite with speculation enabled) sets
{{committerOutputPath=${stagingDir}}}, which previously held the root directory
path, in [this
codeblock|https://github.com/apache/spark/pull/29000/files#diff-15b529afe19e971b138fc604909bcab2e42484babdcea937f41d18cb22d9401dR167-R175].
* The {{committerOutputPath}} parameter is passed on to the Hadoop committer,
which creates the SUCCESS marker file at the path specified in the
{{committerOutputPath}} parameter. Thus, the SUCCESS marker is now created
inside the stagingDir.
* Once the Hadoop committer has finished writing, the Spark commit protocol
logic copies all the data files to the root path, [but NOT the SUCCESS marker],
before deleting the ${stagingDir}.
* The stagingDir is then deleted along with the SUCCESS marker file.
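
The sequence above can be reproduced on a local filesystem with a small simulation. This is only an illustrative sketch, not Spark's commit protocol code: the object name, the staging directory name and the file names are made up for the example, and plain {{java.nio.file}} calls stand in for the Hadoop committer and the Spark rename step.

```scala
import java.io.File
import java.nio.file.{Files, Path}

object StagingDirMarkerLoss {
  // Delete a file or a directory tree (children first, then the entry itself).
  def deleteRecursively(f: File): Unit = {
    Option(f.listFiles).foreach(_.foreach(deleteRecursively))
    f.delete()
  }

  // Returns true if _SUCCESS is present at the table root after the simulated job.
  def run(): Boolean = {
    val tableRoot: Path = Files.createTempDirectory("table")
    // Hypothetical staging directory name, chosen for this example only.
    val stagingDir = Files.createDirectory(tableRoot.resolve(".spark-staging-hypothetical"))

    // 1. Tasks write partition data under the staging directory.
    val stagedPartition = Files.createDirectories(stagingDir.resolve("num=123"))
    Files.write(stagedPartition.resolve("part-00000.orc"), "data".getBytes("UTF-8"))

    // 2. Stand-in for the Hadoop committer: the marker is written at
    //    committerOutputPath, which now points at the staging directory.
    Files.createFile(stagingDir.resolve("_SUCCESS"))

    // 3. Stand-in for the Spark commit step: only partition directories are moved.
    Files.move(stagedPartition, tableRoot.resolve("num=123"))

    // 4. The staging directory is deleted, and the marker goes with it.
    deleteRecursively(stagingDir.toFile)

    val markerAtRoot = Files.exists(tableRoot.resolve("_SUCCESS"))
    deleteRecursively(tableRoot.toFile) // clean up the temporary table directory
    markerAtRoot
  }

  def main(args: Array[String]): Unit =
    println(s"_SUCCESS at table root: ${run()}")
}
```

In this simulation the partition data reaches the table root, but {{run()}} returns false because the marker only ever existed inside the staging directory.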
----------------------------------------------------------------------------------------------------------------------------------------
*Proposed Fix:*
The gap in this logic can be closed by adding a step that also copies the
_SUCCESS file to the final location before the stagingDir is deleted.
Also, ensure that when
{{"spark.hadoop.mapreduce.fileoutputcommitter.marksuccessfuljobs"="false"}},
the Hadoop output committer does not create the _SUCCESS marker file in the
stagingDir in the first place.
I am working on a fix for this.
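
A rough sketch of the proposed behavior, again simulated on a local filesystem, might look like the following. It is not the actual patch: {{commitJob}}, {{simulate}} and the directory names are hypothetical, and {{java.nio.file}} stands in for the Hadoop FileSystem calls a real fix would use. The {{markSuccessfulJobs}} flag mimics the {{marksuccessfuljobs}} setting, on the assumption that the committer writes no marker when it is false.

```scala
import java.io.File
import java.nio.file.{Files, Path, StandardCopyOption}

object CopySuccessMarkerSketch {
  val MarkerName = "_SUCCESS"

  // Delete a file or a directory tree (children first, then the entry itself).
  def deleteRecursively(f: File): Unit = {
    Option(f.listFiles).foreach(_.foreach(deleteRecursively))
    f.delete()
  }

  // Hypothetical stand-in for the job commit step.
  def commitJob(stagingDir: Path, tableRoot: Path): Unit = {
    val entries = Option(stagingDir.toFile.listFiles).getOrElse(Array.empty[File])

    // Existing behavior: move the staged partition directories to the table root.
    entries.filter(_.isDirectory).foreach { dir =>
      Files.move(dir.toPath, tableRoot.resolve(dir.getName))
    }

    // Proposed extra step: carry the marker over if the committer wrote one.
    val marker = stagingDir.resolve(MarkerName)
    if (Files.exists(marker)) {
      Files.copy(marker, tableRoot.resolve(MarkerName), StandardCopyOption.REPLACE_EXISTING)
    }

    // Only now remove the staging directory.
    deleteRecursively(stagingDir.toFile)
  }

  // Returns true if _SUCCESS is present at the table root after the simulated job.
  def simulate(markSuccessfulJobs: Boolean): Boolean = {
    val tableRoot = Files.createTempDirectory("table")
    val stagingDir = Files.createDirectory(tableRoot.resolve(".spark-staging-hypothetical"))
    val part = Files.createDirectories(stagingDir.resolve("num=123"))
    Files.write(part.resolve("part-00000.orc"), "data".getBytes("UTF-8"))

    // Stand-in for the Hadoop committer: write the marker only when enabled.
    if (markSuccessfulJobs) Files.createFile(stagingDir.resolve(MarkerName))

    commitJob(stagingDir, tableRoot)
    val markerAtRoot = Files.exists(tableRoot.resolve(MarkerName))
    deleteRecursively(tableRoot.toFile) // clean up the temporary table directory
    markerAtRoot
  }
}
```

With this ordering, {{simulate(true)}} leaves the marker at the table root, while {{simulate(false)}} leaves none, since nothing was written to the staging directory to begin with.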
> Spark doesn't create SUCCESS file in Spark 3.3.0+ when partitionOverwriteMode
> is dynamic
> ----------------------------------------------------------------------------------------
>
> Key: SPARK-44884
> URL: https://issues.apache.org/jira/browse/SPARK-44884
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 3.3.0
> Reporter: Dipayan Dev
> Priority: Major
> Attachments: image-2023-08-20-18-46-53-342.png,
> image-2023-08-25-13-01-42-137.png
>
>
> The issue is not happening in Spark 2.x (I am using 2.4.0), but only in 3.3.0
> (tested with 3.4.1 as well)
> Code to reproduce the issue
>
> {code:java}
> scala> spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")
> scala> val DF = Seq(("test1", 123)).toDF("name", "num")
> scala> DF.write.option("path",
> "gs://test_bucket/table").mode("overwrite").partitionBy("num").format("orc").saveAsTable("test_schema.test_tb1")
> {code}
>
> The above code succeeds and creates external Hive table, but {*}there is no
> SUCCESS file generated{*}.
> Adding the content of the bucket after table creation
> !image-2023-08-25-13-01-42-137.png|width=500,height=130!
> The same code when running with spark 2.4.0 (with or without external path),
> generates the SUCCESS file.
> {code:java}
> scala>
> DF.write.mode(SaveMode.Overwrite).partitionBy("num").format("orc").saveAsTable("test_schema.test_tb1"){code}
> !image-2023-08-20-18-46-53-342.png|width=465,height=166!
>