I think this is a Hadoop property that is just passed through? If the
default is different in Hadoop 3, we could mention that in the docs. I
don't know that we want to always set it to 1 as a Spark default, even
in Hadoop 3, right?
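For what it's worth, anyone who wants deterministic behavior regardless of the Hadoop version can already pin the property explicitly. A minimal sketch (the property name is from this thread; the jar name is just a placeholder):

```shell
# Pin the v1 commit algorithm explicitly, so the behavior does not
# depend on the Hadoop default (v1 in Hadoop 2.x, v2 after MAPREDUCE-6406).
# spark.hadoop.* properties are passed through to the Hadoop Configuration.
spark-submit \
  --conf spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version=1 \
  your_app.jar
```

The same line can go in spark-defaults.conf (without the `--conf` prefix) to apply it cluster-wide.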

On Thu, Jun 25, 2020 at 2:43 PM Waleed Fateem <waleed.fat...@gmail.com> wrote:
>
> Hello!
>
> I noticed that in the documentation starting with 2.2.0 it states that the 
> parameter spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version is 1 
> by default:
> https://issues.apache.org/jira/browse/SPARK-20107
>
> I don't actually see this being set anywhere explicitly in the Spark code, so 
> the documentation isn't entirely accurate if you run in an environment that 
> has MAPREDUCE-6406 implemented (starting with Hadoop 3.0).
>
> The default version was explicitly set to 2 in the FileOutputCommitter class, 
> so any output committer that inherits from this class (ParquetOutputCommitter 
> for example) would use v2 in a Hadoop 3.0 environment and v1 in the older 
> Hadoop environments.
>
> Would it make sense for us to consider setting v1 as the default in code in 
> case the configuration was not set by a user?
>
> Regards,
>
> Waleed

---------------------------------------------------------------------
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org