[ 
https://issues.apache.org/jira/browse/HADOOP-12017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14621944#comment-14621944
 ] 

Vinayakumar B commented on HADOOP-12017:
----------------------------------------

bq. I understand that you want some way to set the replication of the index 
files. But why the source file replication factor and the destination index 
file replication factor have to be the same?
{{jobfs.setReplication(srcFiles, repl);}} uses {{repl}} to set the 
replication of {{srcFiles}}. But {{srcFiles}} is not the actual source data; 
it is just an intermediate list of file statuses, written as a sequence file, 
that is read to generate the MR job splits immediately after it is created. 
First, HDFS will not have any time to replicate it. Second, there is no use in 
increasing the replication, since the file is read only once, in the same 
client, as part of split generation. Also, {{srcFiles}} is deleted once the 
job is done.

On the other hand, the actual data files, which the mappers create as part 
files, get the default replication. The proposed patch still doesn't change 
this. These need to be changed as well.

So, IMO, the user-specified replication should be used for the resultant 
archive (both content and indexes), not for the intermediate file.
Also, since the default replication of 10 is not really used, we can change 
the default to 3 and update the docs accordingly.
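
A minimal sketch of that idea, not the actual patch. It assumes a hypothetical 
{{ReplicationTarget}} interface standing in for 
{{FileSystem#setReplication(Path, short)}}; the helper name 
{{applyArchiveReplication}} and the example paths are illustrative only. The 
intermediate {{srcFiles}} list is never passed in, so its replication is left 
alone.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the single FileSystem call this sketch needs.
// The real method is org.apache.hadoop.fs.FileSystem#setReplication(Path, short).
interface ReplicationTarget {
    boolean setReplication(String path, short replication);
}

class ArchiveReplicationSketch {

    /**
     * Applies the user-specified replication to every file of the finished
     * archive: the part files plus both index files.
     * Returns the paths for which setReplication reported failure.
     */
    static List<String> applyArchiveReplication(ReplicationTarget fs,
                                                List<String> partFiles,
                                                String index,
                                                String masterIndex,
                                                short repl) {
        List<String> targets = new ArrayList<>(partFiles);
        targets.add(index);
        targets.add(masterIndex);

        List<String> failed = new ArrayList<>();
        for (String path : targets) {
            if (!fs.setReplication(path, repl)) {
                failed.add(path);
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        // In-memory fake that only records what was requested.
        Map<String, Short> requested = new LinkedHashMap<>();
        ReplicationTarget fs = (path, r) -> {
            requested.put(path, r);
            return true;
        };

        // Illustrative archive layout; repl would come from the command line.
        applyArchiveReplication(fs,
                Arrays.asList("foo.har/part-0"),
                "foo.har/_index",
                "foo.har/_masterindex",
                (short) 3);

        System.out.println(requested);
    }
}
```

Returning the failed paths rather than throwing keeps the "best effort" 
behaviour of the existing {{// try increasing the replication}} code; whether 
a failure should abort the archive job is a separate decision.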

Any thoughts?

> Hadoop archives command should use configurable replication factor when 
> closing
> -------------------------------------------------------------------------------
>
>                 Key: HADOOP-12017
>                 URL: https://issues.apache.org/jira/browse/HADOOP-12017
>             Project: Hadoop Common
>          Issue Type: Bug
>    Affects Versions: 2.7.0
>            Reporter: Zhe Zhang
>            Assignee: Bibin A Chundatt
>         Attachments: 0002-HADOOP-12017.patch, 0003-HADOOP-12017.patch, 
> 0003-HADOOP-12017.patch, 0004-HADOOP-12017.patch
>
>
> {{HadoopArchives#HArchivesReducer#close}} uses hard-coded replication factor. 
> It should use {{repl}} instead, which is parsed from command line parameters.
> {code}
>       // try increasing the replication 
>       fs.setReplication(index, (short) 5);
>       fs.setReplication(masterIndex, (short) 5);
>     }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
