[ http://issues.apache.org/jira/browse/HADOOP-51?page=comments#action_12373745 ]
Bryan Pendleton commented on HADOOP-51:
---------------------------------------
Great!
A few comments from reading the patch (I haven't tested with it yet):
1) The <description> for dfs.replication.min is wrong
2) This is a wider concern about coding style: the idiom of
conf.getType("config.value", defaultValue) is good for user-defined values, but
shouldn't the in-code default be skipped for things that are defined in
hadoop-default.xml? Passing a default in code takes away the value of
hadoop-default.xml, and it also means that changing a value there might or
might not have the desired system-wide result.
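A minimal sketch of the concern in point 2, in plain Java rather than the Hadoop API (the class, field, and method names here are hypothetical, and two layered Properties objects stand in for hadoop-default.xml and hadoop-site.xml):

```java
import java.util.Properties;

// Sketch: an in-code fallback default can silently shadow the file-level
// default when a key is missing from both config layers, and different call
// sites may each carry a different in-code default for the same key.
public class ConfigDefaults {
    static Properties defaults = new Properties();      // plays hadoop-default.xml
    static Properties site = new Properties(defaults);  // plays hadoop-site.xml

    static int getInt(String key, int codeDefault) {
        String v = site.getProperty(key);               // falls through to defaults
        return (v == null) ? codeDefault : Integer.parseInt(v);
    }

    public static void main(String[] args) {
        defaults.setProperty("dfs.replication.min", "1");
        // The file-level default is found, so the code default is ignored:
        System.out.println(getInt("dfs.replication.min", 3));   // prints 1
        // For a key absent from both files, the call site's own default wins,
        // so editing hadoop-default.xml later may or may not take effect:
        System.out.println(getInt("dfs.replication.max", 512)); // prints 512
    }
}
```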
3) Wouldn't it be better to log at a severe level any replication that is set
below minReplication or above maxReplication, and just clamp the replication
to the nearest bound? Replication is set per file by the application, but min
and max are probably set by the administrator of the Hadoop cluster. Throwing
an IOException causes outright failure where degraded performance would be
preferable.
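The clamp-and-warn behavior suggested in point 3 could look roughly like this (method and key names are hypothetical, not from the attached patch):

```java
import java.util.logging.Logger;

// Sketch: log out-of-range replication requests severely and clamp them to
// the nearest administrator-set bound, instead of throwing IOException.
public class ReplicationBounds {
    private static final Logger LOG = Logger.getLogger("dfs");

    static short clampReplication(short requested, short min, short max) {
        if (requested < min) {
            LOG.severe("Replication " + requested
                       + " is below dfs.replication.min=" + min + "; using " + min);
            return min;
        }
        if (requested > max) {
            LOG.severe("Replication " + requested
                       + " is above dfs.replication.max=" + max + "; using " + max);
            return max;
        }
        return requested;
    }

    public static void main(String[] args) {
        System.out.println(clampReplication((short)1, (short)2, (short)10)); // prints 2
    }
}
```

The application's file creation then degrades gracefully rather than failing, which matters because the per-file request and the cluster-wide bounds are set by different people.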
4) I may be dense, but I didn't see any way to specify that replication be
"full", i.e., one copy per datanode. I got the feeling this was something
desired of this functionality (e.g., for job.jar files, job configs, and
lookup data used widely in a job). Using a short means that if we ever scale
to more than 32k nodes, there'd be no way to manually request this, and just
using Short.MAX_VALUE means getting a lot of errors about not being able to
replicate as fully as desired.
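One hypothetical way to express "full" replication without overloading Short.MAX_VALUE is to reserve a sentinel value that the namenode resolves against the live datanode count (the sentinel, names, and method here are purely illustrative, not part of the patch):

```java
// Sketch: a reserved sentinel (0 here) requests "one copy per datanode",
// resolved at replication time against the current cluster size.
public class FullReplication {
    static final short REPLICATE_ALL = 0; // hypothetical sentinel value

    static int effectiveReplication(short requested, int liveDatanodes) {
        if (requested == REPLICATE_ALL) {
            return liveDatanodes;                      // "full" replication
        }
        return Math.min(requested, liveDatanodes);     // can't exceed cluster size
    }

    public static void main(String[] args) {
        System.out.println(effectiveReplication(REPLICATE_ALL, 40000)); // prints 40000
        System.out.println(effectiveReplication((short)3, 40000));      // prints 3
    }
}
```

This also avoids the spurious under-replication warnings that a literal Short.MAX_VALUE request would generate on any cluster smaller than 32k nodes.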
Otherwise, this looks like a wonderful patch!
> per-file replication counts
> ---------------------------
>
> Key: HADOOP-51
> URL: http://issues.apache.org/jira/browse/HADOOP-51
> Project: Hadoop
> Type: New Feature
> Components: dfs
> Versions: 0.2
> Reporter: Doug Cutting
> Assignee: Konstantin Shvachko
> Fix For: 0.2
> Attachments: Replication.patch
>
> It should be possible to specify different replication counts for different
> files. Perhaps an option when creating a new file should be the desired
> replication count. MapReduce should take advantage of this feature so that
> job.xml and job.jar files, which are frequently accessed by lots of machines,
> are more highly replicated than large data files.
--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
http://www.atlassian.com/software/jira