Hi All:
   
  Is there a tool available that will provide information about how a file is 
replicated within HDFS?  I'm looking for something that will "prove" that a 
file is replicated across multiple nodes, and let me see how many nodes 
participated, etc.  This is of technical interest, but more importantly it 
is a matter of due diligence around data security and integrity 
accountability. 
   
  Also, are there any metrics or best practices around what the replication 
factor should be based on the number of nodes in the grid?  Does HDFS attempt 
to involve all nodes in the grid in replication?  In other words, if I have 100 
nodes in my grid, and a replication factor of 6, will all 100 nodes wind up 
storing data for a given file, assuming the file is large enough?
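
  To make that last question concrete, here is a rough sketch of the arithmetic 
behind it.  It assumes each block's replicas land on distinct nodes chosen 
uniformly at random, independently for every block.  That is purely an 
assumption for illustration, not a description of HDFS's actual placement 
policy, and the function name is hypothetical.

```python
def expected_nodes_touched(num_nodes: int, replication: int, num_blocks: int) -> float:
    """Expected number of distinct nodes holding at least one replica of a file.

    ASSUMPTION (illustrative only, not HDFS's real policy): each block's
    replicas go to `replication` distinct nodes picked uniformly at random,
    independently of every other block.
    """
    if not 0 < replication <= num_nodes:
        raise ValueError("replication must be between 1 and num_nodes")
    if num_blocks < 0:
        raise ValueError("num_blocks must be non-negative")
    # Probability that one particular node receives no replica of one block.
    p_miss_one_block = 1 - replication / num_nodes
    # A node stays empty only if it is missed by every block of the file.
    p_never_used = p_miss_one_block ** num_blocks
    return num_nodes * (1 - p_never_used)


if __name__ == "__main__":
    # 100 nodes, replication factor 6, files of increasing block count.
    for blocks in (1, 10, 50, 100):
        print(blocks, round(expected_nodes_touched(100, 6, blocks), 1))
```

  Under that assumption a one-block file touches only as many nodes as the 
replication factor, and coverage approaches the whole grid only as the block 
count grows.  Whether real HDFS behaves this way is exactly what I'm asking.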
   
  Thanks,
  C G

       