[ https://issues.apache.org/jira/browse/SPARK-42439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17688615#comment-17688615 ]

Apache Spark commented on SPARK-42439:
--------------------------------------

User 'LorenzoMartini' has created a pull request for this issue:
https://github.com/apache/spark/pull/40017

> Job description in v2 FileWrites can have the wrong committer
> -------------------------------------------------------------
>
>                 Key: SPARK-42439
>                 URL: https://issues.apache.org/jira/browse/SPARK-42439
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 3.3.1
>            Reporter: Lorenzo Martini
>            Priority: Minor
>
> There is a difference in behavior between v1 and v2 file writes in the order 
> in which the file writer and the committer are configured:
> v1:
>  # `writer.prepareWrite()`
>  # `committer.setupJob()`
> v2:
>  # `committer.setupJob()`
>  # `writer.prepareWrite()`
>  
> This is because the `prepareWrite()` call (the one that performs 
> `job.setOutputFormatClass(classOf[ParquetOutputFormat[Row]])`) happens as 
> part of `createWriteJobDescription`, which is assigned to a `lazy val` in 
> `toBatch` and is therefore only evaluated after the `committer.setupJob()` 
> call at the end of `toBatch`.
> This causes issues when setting up the committer, as parts of the job 
> configuration might still be missing; for example, with the aforementioned 
> output format class not yet set, the committer is set up for a generic write 
> instead of a Parquet write.
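> 
> A minimal, self-contained Scala sketch of the evaluation-order pitfall 
> (hypothetical names standing in for the Spark internals, not the actual 
> source):
> 
> ```scala
> object LazyOrderDemo {
>   final class Job {
>     var outputFormatClass: Option[String] = None
>   }
> 
>   // Stands in for writer.prepareWrite(): configures the job as a side
>   // effect and returns the write job description.
>   def prepareWrite(job: Job): String = {
>     job.outputFormatClass = Some("ParquetOutputFormat")
>     "description"
>   }
> 
>   // Stands in for committer.setupJob(): inspects the job configuration.
>   def setupJob(job: Job): Unit =
>     println(s"committer sees outputFormatClass = ${job.outputFormatClass}")
> 
>   def main(args: Array[String]): Unit = {
>     val job = new Job
>     lazy val description = prepareWrite(job) // v2: not evaluated yet
>     setupJob(job)                            // prints None: generic write
>     println(description)                     // prepareWrite only runs here
>   }
> }
> ```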
>  
> The fix is simple: make the `createWriteJobDescription` result a plain 
> (non-lazy) `val`, so it is evaluated before `committer.setupJob()`.
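> 
> In terms of the sketch above, dropping the `lazy` restores the v1 ordering:
> 
> ```scala
> val job = new Job
> val description = prepareWrite(job) // evaluated eagerly, before setupJob
> setupJob(job)                       // now sees Some(ParquetOutputFormat)
> ```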


