[
https://issues.apache.org/jira/browse/SPARK-44884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17867743#comment-17867743
]
Steve Loughran commented on SPARK-44884:
----------------------------------------
FWIW, the new manifest committer, written for performance on abfs and
correctness + performance on gcs, generates the exact same JSON file as the s3a
committers, and can be executed against local file:// URLs as well as hdfs. If
the base Hadoop version Spark uses includes this committer (MAPREDUCE-7341),
then you could write a test to verify that the copied file is JSON; the class
org.apache.hadoop.mapreduce.lib.output.committer.manifest.files.ManifestSuccessData
will actually load the manifest and let you access and print its internals.
bq. Also, ensure that when
"spark.hadoop.mapreduce.fileoutputcommitter.marksuccessfuljobs"="false", the
_SUCCESS marker file will not be created by the Hadoop output committers in
stagingDir itself.
That's in the Hadoop MapReduce codebase, so it should all be good there, but
tests are welcome.
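As a minimal, Hadoop-free sketch of such a test (the helper name and directory layout are illustrative, not from any Spark or Hadoop test suite), one could check that the _SUCCESS marker in the output directory is non-empty JSON, which distinguishes the manifest/s3a committers from the classic FileOutputCommitter, whose marker is a zero-byte file:

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}

// Hypothetical helper: returns true if the _SUCCESS marker under `dir`
// exists and looks like the JSON manifest written by the s3a/manifest
// committers. The classic FileOutputCommitter writes a zero-byte marker,
// which this check rejects.
def successMarkerIsJson(dir: Path): Boolean = {
  val marker = dir.resolve("_SUCCESS")
  if (!Files.exists(marker)) return false
  val text =
    new String(Files.readAllBytes(marker), StandardCharsets.UTF_8).trim
  text.startsWith("{") && text.endsWith("}")
}
```

For a full inspection of the manifest's contents, the file would instead be deserialized through ManifestSuccessData as suggested above, but that pulls in a Hadoop dependency; the sketch only verifies the marker is plausibly JSON.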
> Spark doesn't create SUCCESS file in Spark 3.3.0+ when partitionOverwriteMode
> is dynamic
> ----------------------------------------------------------------------------------------
>
> Key: SPARK-44884
> URL: https://issues.apache.org/jira/browse/SPARK-44884
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 3.3.0
> Reporter: Dipayan Dev
> Priority: Major
> Labels: pull-request-available
> Attachments: image-2023-08-20-18-46-53-342.png,
> image-2023-08-25-13-01-42-137.png
>
>
> The issue does not happen in Spark 2.x (I am using 2.4.0), but only in 3.3.0
> (tested with 3.4.1 as well).
> Code to reproduce the issue:
>
> {code:java}
> scala> spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")
> scala> val DF = Seq(("test1", 123)).toDF("name", "num")
> scala> DF.write.option("path",
> "gs://test_bucket/table").mode("overwrite").partitionBy("num").format("orc").saveAsTable("test_schema.test_tb1")
> {code}
>
> The above code succeeds and creates the external Hive table, but {*}no
> SUCCESS file is generated{*}.
> Content of the bucket after table creation:
> !image-2023-08-25-13-01-42-137.png|width=500,height=130!
> The same code, when run with Spark 2.4.0 (with or without an external path),
> generates the SUCCESS file.
> {code:java}
> scala>
> DF.write.mode(SaveMode.Overwrite).partitionBy("num").format("orc").saveAsTable("test_schema.test_tb1"){code}
> !image-2023-08-20-18-46-53-342.png|width=465,height=166!
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]