Repository: spark
Updated Branches:
  refs/heads/master 471de5db5 -> edc87d76e


[SPARK-20107][DOC] Add spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version option to configuration.md

## What changes were proposed in this pull request?

Add the `spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version` option to 
`configuration.md`.
Setting `spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version=2` can speed up 
[HadoopMapReduceCommitProtocol.commitJob](https://github.com/apache/spark/blob/v2.1.0/core/src/main/scala/org/apache/spark/internal/io/HadoopMapReduceCommitProtocol.scala#L121) 
when a job writes many output files.
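
As a minimal sketch of how the option might be set, assuming a deployment that reads `conf/spark-defaults.conf` (the file location is an assumption, not something this patch changes), the property can be added as a single line. Spark passes properties with the `spark.hadoop.` prefix through to the Hadoop configuration with the prefix removed.

```
# conf/spark-defaults.conf (assumed location)
# Use version 2 of the file output committer algorithm; the documented default is 1.
spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version  2
```

The same value can be supplied per job with `spark-submit --conf spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version=2`. Version 1 may handle failures better in certain situations, so the choice depends on the workload.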

This improvement is supported by Cloudera's Hadoop 2.6.0-cdh5.4.0 and later (see 
https://github.com/cloudera/hadoop-common/commit/1c1236182304d4075276c00c4592358f428bc433 
and 
https://github.com/cloudera/hadoop-common/commit/16b2de27321db7ce2395c08baccfdec5562017f0) 
and by Apache Hadoop 2.7.0 and later.

For more details, see:

1. [MAPREDUCE-4815](https://issues.apache.org/jira/browse/MAPREDUCE-4815): Speed up FileOutputCommitter#commitJob for many output files.
2. [MAPREDUCE-6406](https://issues.apache.org/jira/browse/MAPREDUCE-6406): Update the default version for the property mapreduce.fileoutputcommitter.algorithm.version to 2.

## How was this patch tested?

Manual tests and existing tests.

Author: Yuming Wang <[email protected]>

Closes #17442 from wangyum/SPARK-20107.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/edc87d76
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/edc87d76
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/edc87d76

Branch: refs/heads/master
Commit: edc87d76efea7b4d19d9d0c4ddba274a3ccb8752
Parents: 471de5d
Author: Yuming Wang <[email protected]>
Authored: Thu Mar 30 10:39:57 2017 +0100
Committer: Sean Owen <[email protected]>
Committed: Thu Mar 30 10:39:57 2017 +0100

----------------------------------------------------------------------
 docs/configuration.md | 9 +++++++++
 1 file changed, 9 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/edc87d76/docs/configuration.md
----------------------------------------------------------------------
diff --git a/docs/configuration.md b/docs/configuration.md
index 4729f1b..a975392 100644
--- a/docs/configuration.md
+++ b/docs/configuration.md
@@ -1137,6 +1137,15 @@ Apart from these, the following properties are also available, and may be useful
     mapping has high overhead for blocks close to or below the page size of the operating system.
   </td>
 </tr>
+<tr>
+  <td><code>spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version</code></td>
+  <td>1</td>
+  <td>
+    The file output committer algorithm version, valid algorithm version number: 1 or 2.
+    Version 2 may have better performance, but version 1 may handle failures better in certain situations,
+    as per <a href="https://issues.apache.org/jira/browse/MAPREDUCE-4815">MAPREDUCE-4815</a>.
+  </td>
+</tr>
 </table>
 
 ### Networking

