[
https://issues.apache.org/jira/browse/HADOOP-12017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14621944#comment-14621944
]
Vinayakumar B commented on HADOOP-12017:
----------------------------------------
bq. I understand that you want some way to set the replication of the index
files. But why the source file replication factor and the destination index
file replication factor have to be the same?
In {{jobfs.setReplication(srcFiles, repl);}}, {{repl}} is used to set the
replication of {{srcFiles}}. But {{srcFiles}} is not the actual source data;
it is just an intermediate list of file statuses, written as a sequence file,
which is read to generate the MR job splits immediately after it is created.
First, HDFS will not have any time to replicate it. Second, there is no
benefit in increasing the replication, since the file is read only once, by
the same client, as part of split generation. Also, {{srcFiles}} is deleted
once the job is done.
On the other hand, the actual data files, which the mappers create as part
files, have the default replication. The proposed patch still does not change
this; these need to be changed as well.
So, IMO, user specified 'replication' should be used for the resultant archive
(both content and indexes), not for the intermediate file.
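As a rough illustration of the proposal above (a sketch only, not the actual
patch): the user-specified {{repl}} would be applied to the part files and to
both index files, and never to the intermediate {{srcFiles}} list. The
{{ReplicationTarget}} interface, the class name and the method name below are
hypothetical; the interface merely stands in for a call such as
{{fs.setReplication(path, repl)}} so that the sketch runs without Hadoop on
the classpath.

```java
import java.util.List;

/**
 * Sketch only: models where the user-specified replication would be applied.
 * Nothing here is a Hadoop type; all names are hypothetical.
 */
public class ArchiveReplicationSketch {

    /** Hypothetical stand-in for a call like fs.setReplication(path, repl). */
    public interface ReplicationTarget {
        void setReplication(String path, short repl);
    }

    /**
     * Applies the user-specified replication to the archive outputs only:
     * the part files written by the mappers plus the two index files.
     * The intermediate file list (srcFiles) is deliberately not touched.
     */
    public static void applyArchiveReplication(ReplicationTarget fs,
                                               List<String> partFiles,
                                               String index,
                                               String masterIndex,
                                               short repl) {
        // Content of the archive.
        for (String part : partFiles) {
            fs.setReplication(part, repl);
        }
        // Indexes of the archive, using repl instead of a hard-coded value.
        fs.setReplication(index, repl);
        fs.setReplication(masterIndex, repl);
    }
}
```

In the real code the same value would have to reach both the mapper output
and {{HArchivesReducer#close}}; how it is passed there is left open here.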
Also, since the default replication of 10 is not really used, we can change
the default to 3 and update the docs accordingly.
Any thoughts?
> Hadoop archives command should use configurable replication factor when
> closing
> -------------------------------------------------------------------------------
>
> Key: HADOOP-12017
> URL: https://issues.apache.org/jira/browse/HADOOP-12017
> Project: Hadoop Common
> Issue Type: Bug
> Affects Versions: 2.7.0
> Reporter: Zhe Zhang
> Assignee: Bibin A Chundatt
> Attachments: 0002-HADOOP-12017.patch, 0003-HADOOP-12017.patch,
> 0003-HADOOP-12017.patch, 0004-HADOOP-12017.patch
>
>
> {{HadoopArchives#HArchivesReducer#close}} uses hard-coded replication factor.
> It should use {{repl}} instead, which is parsed from command line parameters.
> {code}
> // try increasing the replication
> fs.setReplication(index, (short) 5);
> fs.setReplication(masterIndex, (short) 5);
> }
> {code}