RE: HDFS tool and replication questions...

dhruba Borthakur Mon, 10 Dec 2007 13:12:35 -0800

The replication factor should be chosen to give the level of availability
and performance you need. HDFS attempts to distribute the replicas of a
block so that they reside on multiple racks. HDFS block replication is
*purely* block-based and file-agnostic; i.e., blocks belonging to the
same file are handled in exactly the same way as blocks belonging to
different files.
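
To illustrate why block-based, file-agnostic placement spreads a large file
over many nodes, here is a toy simulation. It is a sketch under stated
assumptions, not HDFS's actual placement policy: the cluster layout
(`build_cluster`), the round-robin-over-racks rule (`place_block`) and the
counting helper (`nodes_touched`) are all hypothetical. Each block picks its
replicas independently of which file it belongs to, spread over several racks.

```python
import random
from collections import defaultdict


def build_cluster(num_nodes, num_racks):
    """Hypothetical topology: node ids 0..num_nodes-1 assigned to racks round-robin."""
    racks = defaultdict(list)
    for node in range(num_nodes):
        racks[node % num_racks].append(node)
    return dict(racks)


def place_block(racks, replication, rng):
    """Toy rack-aware placement (an assumption, not HDFS code): visit racks in
    random order, taking one not-yet-used node per rack per pass, so a block's
    replicas span as many racks as possible."""
    rack_ids = list(racks)
    rng.shuffle(rack_ids)
    # Shuffled copy of each rack's nodes; popping from it avoids duplicates.
    pools = {r: rng.sample(racks[r], len(racks[r])) for r in rack_ids}
    chosen = []
    while len(chosen) < replication:
        progressed = False
        for r in rack_ids:
            if pools[r] and len(chosen) < replication:
                chosen.append(pools[r].pop())
                progressed = True
        if not progressed:
            raise ValueError("replication factor exceeds number of nodes")
    return chosen


def nodes_touched(num_blocks, racks, replication, seed=0):
    """Place each block of one file independently (file-agnostic) and
    return the set of nodes that hold at least one replica."""
    rng = random.Random(seed)
    used = set()
    for _ in range(num_blocks):
        used.update(place_block(racks, replication, rng))
    return used


racks = build_cluster(num_nodes=100, num_racks=5)
print(len(nodes_touched(1, racks, replication=6)))    # one block: 6 nodes
print(len(nodes_touched(200, racks, replication=6)))  # many blocks: most or all nodes
```

In this toy model a one-block file touches exactly 6 of the 100 nodes, while a
file of a few hundred blocks typically ends up with replicas on nearly every
node, because each block's placement is decided on its own.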


Hope this helps,
dhruba

  Also, are there any metrics or best practices for choosing the
  replication factor based on the number of nodes in the grid? Does
  HDFS attempt to involve all nodes in the grid in replication? In
  other words, if I have 100 nodes in my grid and a replication factor
  of 6, will all 100 nodes wind up storing data for a given file,
  assuming the file is large enough?

  Thanks,
  C G
