[ https://issues.apache.org/jira/browse/SPARK-42439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17688615#comment-17688615 ]
Apache Spark commented on SPARK-42439:
--------------------------------------

User 'LorenzoMartini' has created a pull request for this issue:
https://github.com/apache/spark/pull/40017

> Job description in v2 FileWrites can have the wrong committer
> -------------------------------------------------------------
>
>                 Key: SPARK-42439
>                 URL: https://issues.apache.org/jira/browse/SPARK-42439
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 3.3.1
>            Reporter: Lorenzo Martini
>            Priority: Minor
>
> There is a difference in behavior between v1 and v2 writes in the order of
> events when configuring the file writer and the committer.
> v1:
> # writer.prepareWrite()
> # committer.setupJob()
> v2:
> # committer.setupJob()
> # writer.prepareWrite()
>
> This happens because the `prepareWrite()` call (the one that invokes
> `job.setOutputFormatClass(classOf[ParquetOutputFormat[Row]])`) runs as part
> of `createWriteJobDescription`, which is assigned to a `lazy val` inside
> `toBatch` and is therefore only evaluated after `committer.setupJob()`, at
> the end of `toBatch`.
> This causes issues when the committer is set up, because some job
> configuration may still be missing. For example, the output format class
> mentioned above is not yet set, so the committer is configured for a
> generic write instead of a Parquet write.
>
> The fix is simple: make the `createWriteJobDescription` call non-lazy.
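For illustration, here is a minimal, self-contained Scala sketch of the
evaluation-order problem (all names below are hypothetical stand-ins, not
Spark's actual classes or APIs): a `lazy val` job description is only forced
after `setupJob()` has already run, so the committer sees an incomplete
configuration, while a strict `val` restores the v1 ordering.

```scala
// Minimal sketch of the lazy-val ordering bug. All names are hypothetical
// stand-ins for Spark's v2 FileWrite/committer interaction, not real APIs.
object LazyOrderingSketch {
  // Stand-in for the Hadoop Job configuration the committer inspects.
  var outputFormatClass: Option[String] = None

  def prepareWrite(): Unit = {
    // In v1 this runs BEFORE setupJob(); in the buggy v2 path it ran after.
    outputFormatClass = Some("ParquetOutputFormat")
  }

  def setupJob(): Unit = {
    // The committer decides "Parquet vs generic" from what is set so far.
    println(s"setupJob sees output format = $outputFormatClass")
  }

  def toBatchBuggy(): String = {
    lazy val description = { prepareWrite(); "job description" } // not run yet
    setupJob()   // sees None -> committer set up for a generic write
    description  // the lazy val is forced only here, too late
  }

  def toBatchFixed(): String = {
    val description = { prepareWrite(); "job description" } // evaluated eagerly
    setupJob()   // now sees Some(ParquetOutputFormat)
    description
  }

  def main(args: Array[String]): Unit = {
    println("-- buggy (lazy val) --")
    outputFormatClass = None
    toBatchBuggy()
    println("-- fixed (strict val) --")
    outputFormatClass = None
    toBatchFixed()
  }
}
```

Running the sketch prints `None` for the output format in the buggy variant
and `Some(ParquetOutputFormat)` in the fixed one, which mirrors why the
committer falls back to a generic write when the description stays lazy.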