Hi All:
Is there a tool available that will provide information about how a file is
replicated within HDFS? I'm looking for something that will "prove" that a
file is replicated across multiple nodes, and let me see how many nodes
participated, etc. This is technically interesting, but more importantly it
is a matter of due diligence around data security and integrity
accountability.
Also, are there any metrics or best practices around what the replication
factor should be based on the number of nodes in the grid? Does HDFS attempt
to involve all nodes in the grid in replication? In other words, if I have 100
nodes in my grid, and a replication factor of 6, will all 100 nodes wind up
storing data for a given file, assuming the file is large enough?
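To make "large enough" concrete, here is a rough back-of-the-envelope sketch. It assumes a 64 MB block size and that each block's replicas land on distinct nodes; both are assumptions on my part, and the function names are just for illustration. It only gives a lower bound on the file size, since where the replicas actually go is up to HDFS.

```python
import math

def min_blocks_to_touch_all_nodes(num_nodes, replication):
    # Each block contributes at most `replication` distinct nodes,
    # so at least ceil(num_nodes / replication) blocks are needed.
    return math.ceil(num_nodes / replication)

def min_file_size_bytes(num_nodes, replication, block_size_bytes):
    # A file with k blocks has k - 1 full blocks plus at least one byte
    # in the last block, so this is the smallest file with k blocks.
    blocks = min_blocks_to_touch_all_nodes(num_nodes, replication)
    return (blocks - 1) * block_size_bytes + 1

# Assumed values: 100 nodes, replication factor 6, 64 MB blocks.
blocks = min_blocks_to_touch_all_nodes(100, 6)
size = min_file_size_bytes(100, 6, 64 * 1024 * 1024)
print(blocks, size)
```

So under these assumptions the file would need at least 17 blocks (a bit over 1 GB) before it is even possible for all 100 nodes to hold part of it. Whether that actually happens is what I'm asking.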
Thanks,
C G