[jira] Commented: (HADOOP-171) need standard API to set dfs replication = high

Doug Cutting (JIRA) Wed, 26 Apr 2006 16:36:25 -0700

    [ http://issues.apache.org/jira/browse/HADOOP-171?page=comments#action_12376605 ]


Doug Cutting commented on HADOOP-171:
-------------------------------------

One alternative to 'fs.copyFromLocalFile(localJobJar, remoteJobJar, Short.MAX_VALUE)'
might be:

fs.copyFromLocalFile(localJobJar, remoteJobJar);
fs.setReplication(remoteJobJar, Short.MAX_VALUE);

In other words, we indicate this after file creation.  Or if folks don't like 
using Short.MAX_VALUE this way, then this could be something like:

fs.create("job.xml"); 
fs.setReplicateHighly("job.xml");
fs.copyFromLocalFile(localJobJar, remoteJobJar);
fs.setReplicateHighly(remoteJobJar);

One issue with this is: what would getReplication() return for these files?  And
would setReplication(f, getReplication(f)) be a no-op?  An advantage of using a
sentinel value like Short.MAX_VALUE is that it doesn't add many special cases
to the existing numeric API.
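To make the sentinel semantics concrete, here is a minimal sketch of a hypothetical in-memory namespace. The class `ReplicationSketch`, the `HIGH` constant, the assumed default of 3, and `effectiveReplication` are illustrative assumptions, not Hadoop's FileSystem API. It stores the requested value as-is, so getReplication() returns Short.MAX_VALUE for a highly replicated file and setReplication(f, getReplication(f)) leaves it unchanged; only the effective replica count is resolved against the current number of datanodes.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory namespace illustrating the Short.MAX_VALUE sentinel idea.
// None of these names are Hadoop APIs; they exist only for this sketch.
class ReplicationSketch {
    /** Sentinel meaning "replicate highly"; resolved against cluster size later. */
    public static final short HIGH = Short.MAX_VALUE;
    private static final short DEFAULT_REPLICATION = 3; // assumed default

    // Path -> requested replication (may be the HIGH sentinel).
    private final Map<String, Short> requested = new HashMap<>();

    public void create(String path) {
        requested.put(path, DEFAULT_REPLICATION);
    }

    public void setReplication(String path, short replication) {
        if (!requested.containsKey(path)) {
            throw new IllegalArgumentException("no such file: " + path);
        }
        if (replication < 1) {
            throw new IllegalArgumentException("replication must be >= 1");
        }
        requested.put(path, replication);
    }

    /** Returns the requested value, which may be the HIGH sentinel. */
    public short getReplication(String path) {
        Short r = requested.get(path);
        if (r == null) {
            throw new IllegalArgumentException("no such file: " + path);
        }
        return r;
    }

    /** Replicas actually targeted: any request is capped by the datanode count. */
    public int effectiveReplication(String path, int numDatanodes) {
        return Math.min(getReplication(path), numDatanodes);
    }

    public static void main(String[] args) {
        ReplicationSketch fs = new ReplicationSketch();
        fs.create("job.jar");
        fs.setReplication("job.jar", HIGH);
        // Round-tripping the requested value does not change it.
        fs.setReplication("job.jar", fs.getReplication("job.jar"));
        System.out.println(fs.getReplication("job.jar") == HIGH);
        System.out.println(fs.effectiveReplication("job.jar", 40));
    }
}
```

Capping at the datanode count is the simplest possible resolution of the sentinel; the issue text below suggests a smaller target, such as a constant or a function of cluster size.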

> need standard API to set dfs replication = high
> -----------------------------------------------
>
>          Key: HADOOP-171
>          URL: http://issues.apache.org/jira/browse/HADOOP-171
>      Project: Hadoop
>         Type: New Feature
>   Components: dfs
>     Versions: 0.2
>     Reporter: Doug Cutting
>     Assignee: Konstantin Shvachko
>
> There should be a standard way to indicate that files should be highly 
> replicated, appropriate for files that all nodes will read.  This should be 
> settable both on file creation and for already-existing files.  Perhaps 
> specifying a particular replication value, like Short.MAX_VALUE, or zero, can 
> be used to signal this.  The level should not be constant, but should be 
> relative to the cluster size and network topology.  If more nodes are added 
> or if nodes are deleted, the actual replication count should increase or 
> decrease.
> Initially, all that is needed is an API to specify this.  It could initially 
> be implemented with a constant (e.g., 10) or with something related to the 
> number of datanodes (sqrt?), and needn't auto-adjust as the cluster size 
> changes.  That is only the long-term goal.
> When JobClient copies job files (job.xml & job.jar) into the job's 
> filesystem, it should specify this replication level.
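The interim policies suggested in the issue, a constant such as 10 or something like the square root of the number of datanodes, could be sketched as below. `HighReplicationPolicy`, its method names, and the `floor` parameter (so that a "high" file is never replicated less than a normal one) are hypothetical. Capping at the datanode count reflects that a block cannot have more distinct replicas than there are datanodes. Recomputing the value from the current datanode count is what would let it track cluster growth or shrinkage later.

```java
// Hypothetical interim policies for resolving a "replicate highly" request
// into a concrete replica count. Not part of any Hadoop API.
final class HighReplicationPolicy {
    private HighReplicationPolicy() {}

    /** Constant policy (e.g. target = 10), capped at the number of datanodes. */
    static int constant(int target, int numDatanodes) {
        requirePositive(numDatanodes);
        return Math.min(target, numDatanodes);
    }

    /** Square-root policy: ceil(sqrt(n)), raised to at least floor, capped at n. */
    static int sqrt(int numDatanodes, int floor) {
        requirePositive(numDatanodes);
        int r = (int) Math.ceil(Math.sqrt(numDatanodes));
        return Math.min(numDatanodes, Math.max(floor, r));
    }

    private static void requirePositive(int numDatanodes) {
        if (numDatanodes < 1) {
            throw new IllegalArgumentException("need at least one datanode");
        }
    }

    public static void main(String[] args) {
        // Evaluating the policies at several cluster sizes shows how the
        // target would move if it were recomputed as nodes join or leave.
        for (int n : new int[] {2, 4, 100, 1000}) {
            System.out.println(n + " datanodes -> constant: " + constant(10, n)
                + ", sqrt: " + sqrt(n, 3));
        }
    }
}
```

With this shape, JobClient would only need to request "high" replication for job.xml and job.jar; the choice between the constant and the square-root policy stays inside the filesystem.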
