[ http://issues.apache.org/jira/browse/HADOOP-51?page=comments#action_12373745 ]
Bryan Pendleton commented on HADOOP-51:
---------------------------------------
Great!
A few comments from reading the patch (I haven't tested with it yet):
1) The <description> for dfs.replication.min is wrong
2) This is a wider concern about coding style: the idiom of
conf.getType("config.value", defaultValue) is good for user-defined values, but
shouldn't the in-code default be skipped for things that are defined in
hadoop-default.xml? Passing a default in code takes away the value of
hadoop-default.xml, and it also means that changing a value there might or
might not have the desired system-wide result.
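A minimal sketch of the concern in point 2, in plain Java rather than the Hadoop API (the class, field, and method names here are hypothetical, and two layered Properties objects stand in for hadoop-default.xml and hadoop-site.xml):

```java
import java.util.Properties;

// Sketch: an in-code fallback default can silently shadow the file-level
// default when a key is missing from both config layers, and different call
// sites may each carry a different in-code default for the same key.
public class ConfigDefaults {
    static Properties defaults = new Properties();      // plays hadoop-default.xml
    static Properties site = new Properties(defaults);  // plays hadoop-site.xml

    static int getInt(String key, int codeDefault) {
        String v = site.getProperty(key);               // falls through to defaults
        return (v == null) ? codeDefault : Integer.parseInt(v);
    }

    public static void main(String[] args) {
        defaults.setProperty("dfs.replication.min", "1");
        // The file-level default is found, so the code default is ignored:
        System.out.println(getInt("dfs.replication.min", 3));   // prints 1
        // For a key absent from both files, the call site's own default wins,
        // so editing hadoop-default.xml later may or may not take effect:
        System.out.println(getInt("dfs.replication.max", 512)); // prints 512
    }
}
```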
3) Wouldn't it be better to log at a severe level any replication that is set
below minReplication or above maxReplication, and just clamp the replication
to the nearest bound? Replication is set per file by the application, but min
and max are probably set by the administrator of the Hadoop cluster. Throwing
an IOException causes outright failure where degraded performance would be
preferable.
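The clamp-and-warn behavior suggested in point 3 could look roughly like this (method and key names are hypothetical, not from the attached patch):

```java
import java.util.logging.Logger;

// Sketch: log out-of-range replication requests severely and clamp them to
// the nearest administrator-set bound, instead of throwing IOException.
public class ReplicationBounds {
    private static final Logger LOG = Logger.getLogger("dfs");

    static short clampReplication(short requested, short min, short max) {
        if (requested < min) {
            LOG.severe("Replication " + requested
                       + " is below dfs.replication.min=" + min + "; using " + min);
            return min;
        }
        if (requested > max) {
            LOG.severe("Replication " + requested
                       + " is above dfs.replication.max=" + max + "; using " + max);
            return max;
        }
        return requested;
    }

    public static void main(String[] args) {
        System.out.println(clampReplication((short)1, (short)2, (short)10)); // prints 2
    }
}
```

The application's file creation then degrades gracefully rather than failing, which matters because the per-file request and the cluster-wide bounds are set by different people.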
4) I may be dense, but I didn't see any way to specify that replication be
"full", i.e., one copy per datanode. I got the feeling this was something
desired of this functionality (e.g., for job.jar files, job configs, and
lookup data used widely in a job). Using a short means that if we ever scale
to more than 32k nodes, there'd be no way to manually request this, and just
using Short.MAX_VALUE means getting a lot of errors about not being able to
replicate as fully as desired.
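One hypothetical way to express "full" replication without overloading Short.MAX_VALUE is to reserve a sentinel value that the namenode resolves against the live datanode count (the sentinel, names, and method here are purely illustrative, not part of the patch):

```java
// Sketch: a reserved sentinel (0 here) requests "one copy per datanode",
// resolved at replication time against the current cluster size.
public class FullReplication {
    static final short REPLICATE_ALL = 0; // hypothetical sentinel value

    static int effectiveReplication(short requested, int liveDatanodes) {
        if (requested == REPLICATE_ALL) {
            return liveDatanodes;                      // "full" replication
        }
        return Math.min(requested, liveDatanodes);     // can't exceed cluster size
    }

    public static void main(String[] args) {
        System.out.println(effectiveReplication(REPLICATE_ALL, 40000)); // prints 40000
        System.out.println(effectiveReplication((short)3, 40000));      // prints 3
    }
}
```

This also avoids the spurious under-replication warnings that a literal Short.MAX_VALUE request would generate on any cluster smaller than 32k nodes.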
Otherwise, this looks like a wonderful patch!
> per-file replication counts
> ---------------------------
>
> Key: HADOOP-51
> URL: http://issues.apache.org/jira/browse/HADOOP-51
> Project: Hadoop
> Type: New Feature
> Components: dfs
> Versions: 0.2
> Reporter: Doug Cutting
> Assignee: Konstantin Shvachko
> Fix For: 0.2
> Attachments: Replication.patch
>
> It should be possible to specify different replication counts for different
> files. Perhaps an option when creating a new file should be the desired
> replication count. MapReduce should take advantage of this feature so that
> job.xml and job.jar files, which are frequently accessed by lots of machines,
> are more highly replicated than large data files.
--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
http://www.atlassian.com/software/jira