I was trying to keep my email short, but the rationale for setting that to 1 by default is that it's safer. With algorithm version 2 you run the risk of ending up with bad data when tasks fail, or even duplicate data if a task fails and then succeeds on a reattempt (I don't know whether this is true for all OutputCommitters that extend FileOutputCommitter).
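To make the "default to v1 unless the user chose something" idea concrete, here is a rough sketch in plain Python. It is only an illustration and not Spark code: the helper names with_v1_default and effective_version are hypothetical, and the Hadoop defaults it encodes (v2 on Hadoop 3.0+, v1 before that) are simply the ones described in this thread. The only real identifier is the property key.

```python
# Illustration only: hypothetical helpers, not part of Spark or Hadoop.
COMMITTER_KEY = "spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version"


def with_v1_default(conf):
    """Return a copy of conf that pins the committer algorithm to v1,
    unless the user has already set a version explicitly."""
    pinned = dict(conf)
    pinned.setdefault(COMMITTER_KEY, "1")
    return pinned


def effective_version(conf, hadoop_major):
    """Version a FileOutputCommitter subclass would end up using.

    Assumes the defaults described in this thread: v2 on Hadoop 3.0+
    (MAPREDUCE-6406) and v1 on older Hadoop releases.
    """
    if COMMITTER_KEY in conf:
        return int(conf[COMMITTER_KEY])
    return 2 if hadoop_major >= 3 else 1


# A user who sets nothing gets whatever the Hadoop version defaults to.
unset_on_hadoop3 = effective_version({}, hadoop_major=3)
# With the proposed fallback applied, the same user would get v1.
pinned_on_hadoop3 = effective_version(with_v1_default({}), hadoop_major=3)
# An explicit user choice is left alone.
explicit_v2 = effective_version(with_v1_default({COMMITTER_KEY: "2"}), hadoop_major=3)
```

In the meantime, anyone who wants v1 regardless of Hadoop version can pass the key explicitly, for example with --conf spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version=1 on spark-submit.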
Imran and Marcelo also discussed this here:
https://issues.apache.org/jira/browse/SPARK-20107?focusedCommentId=15945177&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-15945177

I also discussed this a bit with Steve Loughran, and his opinion was that v2 should just be deprecated altogether. I believe he was going to bring that up with the Hadoop developers.

On Thu, Jun 25, 2020 at 3:56 PM Sean Owen <sro...@gmail.com> wrote:

> I think is a Hadoop property that is just passed through? if the
> default is different in Hadoop 3 we could mention that in the docs. i
> don't know if we want to always set it to 1 as a Spark default, even
> in Hadoop 3 right?
>
> On Thu, Jun 25, 2020 at 2:43 PM Waleed Fateem <waleed.fat...@gmail.com>
> wrote:
> >
> > Hello!
> >
> > I noticed that in the documentation starting with 2.2.0 it states that
> > the parameter spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version
> > is 1 by default:
> > https://issues.apache.org/jira/browse/SPARK-20107
> >
> > I don't actually see this being set anywhere explicitly in the Spark
> > code and so the documentation isn't entirely accurate in case you run on an
> > environment that has MAPREDUCE-6406 implemented (starting with Hadoop 3.0).
> >
> > The default version was explicitly set to 2 in the FileOutputCommitter
> > class, so any output committer that inherits from this class
> > (ParquetOutputCommitter for example) would use v2 in a Hadoop 3.0
> > environment and v1 in the older Hadoop environments.
> >
> > Would it make sense for us to consider setting v1 as the default in code
> > in case the configuration was not set by a user?
> >
> > Regards,
> >
> > Waleed