Distinguishing file missing/corruption for low replication files
----------------------------------------------------------------
Key: HADOOP-6084
URL: https://issues.apache.org/jira/browse/HADOOP-6084
Project: Hadoop Core
Issue Type: Improvement
Components: dfs
Reporter: Koji Noguchi
In PIG-856, there's a discussion about reducing the replication factor for
intermediate files between jobs.
I've seen users do the same in MapReduce jobs and get some speedup. (I
believe their outputs were too small to benefit from pipelining.)
The problem is that when users lower the replication factor to 1 (or 2), ops
starts seeing alerts from fsck and HADOOP-4103 after even a single datanode
failure. There is also the problem of the Namenode not getting out of safemode
when restarted.
My answer so far has been to ask users, "please don't set the replication
factor below 3."
But is this the right approach?
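For context, the replication change discussed above is typically made per job
rather than cluster-wide. A minimal sketch of what such an override looks like,
assuming the standard Hadoop job configuration mechanism (the property name
dfs.replication is the real HDFS setting; the surrounding file placement is
illustrative):

```xml
<!-- Per-job override, e.g. passed via -D dfs.replication=2 on the command
     line or set in the job's configuration. Files written by this job get
     2 replicas instead of the cluster default of 3, which is what triggers
     the fsck / safemode issues described in this report. -->
<property>
  <name>dfs.replication</name>
  <value>2</value>
</property>
```

With a single replica (value 1), the loss of any one datanode makes some blocks
unrecoverable, which is why fsck flags them and the Namenode can stall in
safemode waiting for blocks that will never be reported.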