I think this is a Hadoop property that is just passed through? If the default is different in Hadoop 3, we could mention that in the docs. I don't know that we want to always set it to 1 as a Spark default, even in Hadoop 3, right?
On Thu, Jun 25, 2020 at 2:43 PM Waleed Fateem <waleed.fat...@gmail.com> wrote:
>
> Hello!
>
> I noticed that in the documentation starting with 2.2.0 it states that the
> parameter spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version is 1
> by default:
> https://issues.apache.org/jira/browse/SPARK-20107
>
> I don't actually see this being set anywhere explicitly in the Spark code and
> so the documentation isn't entirely accurate in case you run on an
> environment that has MAPREDUCE-6406 implemented (starting with Hadoop 3.0).
>
> The default version was explicitly set to 2 in the FileOutputCommitter class,
> so any output committer that inherits from this class (ParquetOutputCommitter
> for example) would use v2 in a Hadoop 3.0 environment and v1 in the older
> Hadoop environments.
>
> Would it make sense for us to consider setting v1 as the default in code in
> case the configuration was not set by a user?
>
> Regards,
>
> Waleed
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
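FWIW, since the property is just passed through to Hadoop, a user who wants the v1 semantics regardless of which Hadoop version their cluster ships can already pin it explicitly per application. A minimal sketch (the application jar name here is just a placeholder):

```
# Config fragment: explicitly pin the v1 commit algorithm so behavior does not
# depend on the Hadoop default (Hadoop 3 changed it to v2 via MAPREDUCE-6406).
spark-submit \
  --conf spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version=1 \
  my-app.jar  # placeholder jar
```

The same key can equivalently be set in spark-defaults.conf or on the SparkConf before the session is created.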