[
https://issues.apache.org/jira/browse/SPARK-44884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17759065#comment-17759065
]
Dipayan Dev edited comment on SPARK-44884 at 8/25/23 2:20 PM:
--------------------------------------------------------------
Right, the behaviour is the same in Spark 2 and 3. However, after renaming the
temporary subdir, Spark 2.x writes the _SUCCESS file to the root path, whereas
Spark 3.x does not when that param is passed.
I see that this part of the code ([Hadoop
Committer|https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/output/FileOutputCommitter.java#L433])
has not changed in the latest hadoop-mapreduce, but the
_partitionOverwriteMode_ option is probably broken somewhere when passed from
the latest Spark DataFrameWriter.
was (Author: JIRAUSER301514):
Right, the behaviour is same in Spark 2 and 3. However, in Spark 2.x after
renaming the temporary subdir, it writes the _SUCCESS file on the root path but
not in Spark 3.x when that param is passed.
I see this part of the code ([Hadoop
Committer|https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/output/FileOutputCommitter.java#L433])
is not changed in the latest hadoop-mapreduce, but somewhere
partitionOverwriteMode option is broken when passed latest Spark
Dataframewriter.
> Spark doesn't create SUCCESS file in Spark 3.3.0+ when partitionOverwriteMode
> is dynamic
> ----------------------------------------------------------------------------------------
>
> Key: SPARK-44884
> URL: https://issues.apache.org/jira/browse/SPARK-44884
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 3.3.0
> Reporter: Dipayan Dev
> Priority: Critical
> Attachments: image-2023-08-20-18-46-53-342.png,
> image-2023-08-25-13-01-42-137.png
>
>
> The issue does not happen in Spark 2.x (I am using 2.4.0), only in 3.3.0 and
> later (also tested with 3.4.1).
> Code to reproduce the issue:
>
> {code:java}
> scala> spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")
> scala> val DF = Seq(("test1", 123)).toDF("name", "num")
> scala> DF.write.option("path", "gs://test_bucket/table").mode("overwrite").partitionBy("num").format("orc").saveAsTable("test_schema.test_tb1")
> {code}
>
> The above code succeeds and creates an external Hive table, but {*}no
> _SUCCESS file is generated{*}.
> The content of the bucket after table creation:
> !image-2023-08-25-13-01-42-137.png|width=500,height=130!
> The same code, when run with Spark 2.4.0 (with or without an external path),
> generates the _SUCCESS file.
> {code:java}
> scala> import org.apache.spark.sql.SaveMode
> scala> DF.write.mode(SaveMode.Overwrite).partitionBy("num").format("orc").saveAsTable("test_schema.test_tb1")
> {code}
> !image-2023-08-20-18-46-53-342.png|width=465,height=166!
>
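One way to confirm the behaviour described above is to check for the marker directly after the write. The following is a minimal sketch and not part of the original report: it assumes a spark-shell session (so `spark` is already defined), reuses the example path gs://test_bucket/table from the reproduction, and assumes the marker file is named _SUCCESS as in the comment above.

```scala
import org.apache.hadoop.fs.Path

// Assumption: the table root used in the reproduction above.
val root = new Path("gs://test_bucket/table")

// Resolve the FileSystem for the path's scheme using the session's Hadoop configuration.
val fs = root.getFileSystem(spark.sparkContext.hadoopConfiguration)

// The committer is expected to write the marker directly under the table root.
val hasMarker = fs.exists(new Path(root, "_SUCCESS"))
println(s"_SUCCESS present under $root: $hasMarker")
```

Running this after each write on both Spark versions should make the difference visible without relying on bucket screenshots. It may also be worth confirming that the Hadoop setting mapreduce.fileoutputcommitter.marksuccessfuljobs has not been set to false, since that setting disables the marker regardless of Spark version.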