[ http://issues.apache.org/jira/browse/HADOOP-170?page=comments#action_12376581 ]

Doug Cutting commented on HADOOP-170:
-------------------------------------

>  This probably has to be called smartReplication(). 

What's the matter with Integer.MAX_VALUE?

This is one of the most important applications for variable replication
counts.  We have them now, but they're not yet easy to use in the most obvious
and most needed application.  That's why I'm asking about this.
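
To make the question concrete, here is a minimal sketch of how a request for
Integer.MAX_VALUE copies could be read as "one copy on every available node".
It is a hypothetical illustration, not code from the patch: the class
ReplicationClamp, the method effectiveReplication(), and the liveDataNodes
parameter are all assumed names.

```java
public class ReplicationClamp {
    /**
     * Hypothetical helper (not part of Hadoop): clamp a requested
     * replication count to the number of data nodes currently available,
     * never going below one copy.
     */
    public static int effectiveReplication(int requested, int liveDataNodes) {
        if (requested < 1) {
            throw new IllegalArgumentException("replication must be >= 1");
        }
        return Math.max(1, Math.min(requested, liveDataNodes));
    }

    public static void main(String[] args) {
        // An ordinary request is left unchanged.
        System.out.println(effectiveReplication(3, 20));
        // Integer.MAX_VALUE collapses to the number of live nodes.
        System.out.println(effectiveReplication(Integer.MAX_VALUE, 20));
    }
}
```

With a clamp like this, a caller would not need a separate method just to ask
for a copy everywhere.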

> we will still not see the desired locality because of the crc files

The .crc files are tiny, much less than 1% of the data.  If all of the data
except the .crc files is read locally, then throughput will be much faster and
switches will not be the bottleneck.


> setReplication and related bug fixes
> ------------------------------------
>
>          Key: HADOOP-170
>          URL: http://issues.apache.org/jira/browse/HADOOP-170
>      Project: Hadoop
>         Type: Improvement

>   Components: fs, dfs
>     Versions: 0.1.1
>     Reporter: Konstantin Shvachko
>     Assignee: Konstantin Shvachko
>  Attachments: setReplication.patch
>
> With variable replication (HADOOP-51) in place, it is natural to be able to
> change the replication of existing files. This patch introduces that
> functionality.
> Here is a detailed list of issues addressed by the patch.
> 1) setReplication() and getReplication() methods are implemented.
> 2) DFSShell prints file replication for any listed file.
> 3) Bug fix. FSDirectory.delete() logs the delete operation even if it is not 
> successful.
> 4) Bug fix. This is a distributed bug.
> Suppose that file replication is 3, and a client reduces it to 1.
> Two data nodes will be chosen to remove their copies, and will do that.
> After a while they will report to the name node that the copies have 
> actually been deleted.
> Until they report, the name node assumes the copies still exist.
> Now the client decides to increase replication back to 3 BEFORE the data nodes
> have reported that the copies are deleted. Then the name node can choose one
> of the data nodes that it thinks still has a block copy as the source for
> replicating the block to new data nodes.
> This scenario is quite unusual, but it is possible even without variable
> replication.
> 5) Logging for name and data nodes is improved in several cases.
> E.g. data nodes never logged that they deleted a block.
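
The race in item 4 can be modelled in a few lines. The sketch below is a toy
model under stated assumptions, not Hadoop code and not necessarily how the
patch fixes the bug: StaleReplicaModel and all of its methods are hypothetical
names. It tracks which data nodes the name node believes hold a block and
which of those have been told to delete it but have not yet reported back.

```java
import java.util.HashSet;
import java.util.Set;

/** Toy model of the name node's view of one block; not Hadoop code. */
public class StaleReplicaModel {
    // Data nodes the name node believes hold a copy of the block.
    private final Set<String> holders = new HashSet<>();
    // Data nodes told to delete their copy that have not reported back yet.
    private final Set<String> pendingDeletes = new HashSet<>();

    public void addHolder(String node) {
        holders.add(node);
    }

    /** Schedule a deletion; the copy stays listed until the report arrives. */
    public void scheduleDelete(String node) {
        if (holders.contains(node)) {
            pendingDeletes.add(node);
        }
    }

    /** The data node reports that the copy has actually been deleted. */
    public void reportDeleted(String node) {
        holders.remove(node);
        pendingDeletes.remove(node);
    }

    /** Naive choice: every listed holder, even those already told to delete. */
    public Set<String> naiveSources() {
        return new HashSet<>(holders);
    }

    /** Guarded choice: skip holders whose deletion is still in flight. */
    public Set<String> safeSources() {
        Set<String> sources = new HashSet<>(holders);
        sources.removeAll(pendingDeletes);
        return sources;
    }
}
```

In this model, reducing replication from 3 to 1 schedules two deletions. Until
both reports arrive, naiveSources() still offers the two doomed nodes as
replication sources, while safeSources() offers only the surviving one.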

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira
