[ https://issues.apache.org/jira/browse/PIG-5319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17363875#comment-17363875 ]
Koji Noguchi commented on PIG-5319:
-----------------------------------

I do see the OutputFormat created twice (marked *** below), using Spark 2.4.

{code:scala|title=SparkHadoopWriter.scala}
117      committer.setupTask(taskContext)                   ***
118
119      // Initiate the writer.
120      config.initWriter(taskContext, sparkPartitionId)   ***
{code}

Within setupTask and initWriter, each creates a separate OutputFormat instance. Trace for each:

{noformat}
SparkHadoopWriter.scala:117  committer.setupTask(taskContext)
--> HadoopMapReduceCommitProtocol.scala:217  setupCommitter(taskContext)
--> --> HadoopMapReduceCommitProtocol.scala:94  val format = context.getOutputFormatClass.newInstance()
{noformat}

and

{noformat}
SparkHadoopWriter.scala:120  config.initWriter(taskContext, sparkPartitionId)
--> SparkHadoopWriter.scala:343  val taskFormat = getOutputFormat()
--> --> SparkHadoopWriter.scala:384  outputFormat.newInstance()
{noformat}

> Investigate why TestStoreInstances fails with Spark 2.2
> --------------------------------------------------------
>
>                 Key: PIG-5319
>                 URL: https://issues.apache.org/jira/browse/PIG-5319
>             Project: Pig
>          Issue Type: Bug
>          Components: spark
>            Reporter: Nándor Kollár
>            Priority: Major
>
> The TestStoreInstances unit test fails with Spark 2.2.x. It seems the job and
> task commit logic changed a lot since Spark 2.1.x: Spark now appears to use one
> PigOutputFormat instance when writing to files and a different one when getting
> the OutputCommitter.
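For illustration, here is a minimal sketch of why the two newInstance() calls in the traces above could break a test like TestStoreInstances. The StatefulOutputFormat and DoubleInstantiation names are hypothetical (this is not Pig's actual PigOutputFormat); the sketch only assumes that some state lives on the OutputFormat instance itself.

{code:scala|title=DoubleInstantiation.scala}
// Hypothetical stand-in for an OutputFormat that keeps per-instance state,
// analogous to a store-func instance held by the output format.
class StatefulOutputFormat {
  var storeFunc: String = _
}

object DoubleInstantiation {
  def main(args: Array[String]): Unit = {
    val clazz = classOf[StatefulOutputFormat]

    // Roughly what HadoopMapReduceCommitProtocol.setupCommitter does
    // (context.getOutputFormatClass.newInstance()):
    val formatForCommitter = clazz.newInstance()

    // Roughly what SparkHadoopWriter.getOutputFormat does
    // (outputFormat.newInstance()):
    val formatForWriter = clazz.newInstance()

    // State configured on the writer-side instance...
    formatForWriter.storeFunc = "configured by the write path"

    // ...is invisible to the committer-side instance, because the two
    // calls produced two distinct objects.
    println(formatForCommitter eq formatForWriter) // false
    println(formatForCommitter.storeFunc)          // null
  }
}
{code}

If TestStoreInstances expects the instance that writes records and the instance that commits to be the same object (or to share state), the two separate newInstance() calls above would explain the failure.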