[ http://issues.apache.org/jira/browse/HADOOP-171?page=comments#action_12376639 ]
eric baldeschwieler commented on HADOOP-171:
--------------------------------------------

highReplicationHint is not the right name, IMO. One goal is to optimize for distribution; call that distributionHint(). Another might be everyRackHint(). Different semantics, different effect.

> need standard API to set dfs replication = high
> -----------------------------------------------
>
>          Key: HADOOP-171
>          URL: http://issues.apache.org/jira/browse/HADOOP-171
>      Project: Hadoop
>         Type: New Feature
>   Components: dfs
>     Versions: 0.2
>     Reporter: Doug Cutting
>     Assignee: Konstantin Shvachko
>
> There should be a standard way to indicate that files should be highly
> replicated, appropriate for files that all nodes will read. This should be
> settable both on file creation and for already-existing files. Perhaps
> specifying a particular replication value, like Short.MAX_VALUE, or zero, can
> be used to signal this. The level should not be constant, but should be
> relative to the cluster size and network topography. If more nodes are added
> or if nodes are deleted, the actual replication count should increase or
> decrease.
> Initially, all that is needed is an API to specify this. It could initially
> be implemented with a constant (e.g., 10) or with something related to the
> number of datanodes (sqrt?), and needn't auto-adjust as the cluster size
> changes. That is only the long-term goal.
> When JobClient copies job files (job.xml & job.jar) into the job's
> filesystem, it should specify this replication level.
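To make the "different semantics, different effect" point concrete, here is a minimal Java sketch. It assumes a hypothetical ReplicationHints helper with a Hint enum and a resolve() method; none of these names exist in Hadoop. DISTRIBUTION combines the two interim policies floated in the issue (a constant of 10, or the square root of the datanode count) by taking the larger. EVERY_RACK only guarantees there are enough replicas for one per rack; actually placing one in each rack would still be up to the namenode's placement logic.

```java
/**
 * Hypothetical sketch for HADOOP-171: these names are illustrative only and
 * are not part of any Hadoop API. The two hints mirror the
 * distributionHint()/everyRackHint() split suggested in the comment.
 */
final class ReplicationHints {

    /** Assumed interim floor for "high" replication, per the issue's e.g. of 10. */
    static final int DEFAULT_HIGH = 10;

    /** Hypothetical hint kinds with deliberately different semantics. */
    enum Hint {
        /** Ordinary file: use the caller's requested replication as-is. */
        NONE,
        /** Spread copies widely so that many readers find a nearby replica. */
        DISTRIBUTION,
        /** Enough copies for one replica in every rack. */
        EVERY_RACK
    }

    private ReplicationHints() {}

    /**
     * Resolves a hint into a concrete replica count for the current cluster.
     * Re-running this when datanodes/racks change gives the auto-adjusting
     * behavior the issue describes as a long-term goal.
     */
    static short resolve(Hint hint, short requested, int datanodes, int racks) {
        if (datanodes <= 0) {
            throw new IllegalArgumentException("datanodes must be positive");
        }
        int target;
        switch (hint) {
            case DISTRIBUTION:
                // Assumed policy: max(requested, 10, ceil(sqrt(datanodes))).
                int bySize = (int) Math.ceil(Math.sqrt(datanodes));
                target = Math.max(Math.max(requested, DEFAULT_HIGH), bySize);
                break;
            case EVERY_RACK:
                // The count is only a precondition; per-rack placement is separate.
                target = Math.max(requested, racks);
                break;
            default:
                target = requested;
        }
        // Never ask for more replicas than there are datanodes to hold them.
        return (short) Math.min(target, datanodes);
    }
}
```

For example, with 400 datanodes a DISTRIBUTION file would get 20 replicas, while on a 4-node cluster it is capped at 4. Keeping the two hints separate lets a small cluster with many racks and a large cluster with few racks behave differently, which a single highReplicationHint could not express.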
