More to the specific point, yes, all 100 nodes will wind up storing data for
large files because blocks should be assigned pretty much at random.
The exception is files that originate on a datanode. There, the local node
gets one copy of each block. Replica blocks follow the random rule,
however.
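If you want to observe that placement directly, here is a minimal sketch against a reasonably recent FileSystem API that prints, for each block, the datanodes holding a replica; the class name and the path are made up for illustration:

// Sketch: print where each block of a file lives, using the FileSystem API.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockPlacement {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path file = new Path("/user/example/largefile");   // hypothetical path
    FileStatus status = fs.getFileStatus(file);
    // One BlockLocation per block, each listing the datanodes holding a replica.
    BlockLocation[] blocks =
        fs.getFileBlockLocations(status, 0, status.getLen());
    for (BlockLocation b : blocks) {
      System.out.println("offset " + b.getOffset() + ": "
          + java.util.Arrays.toString(b.getHosts()));
    }
  }
}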
The replication factor should be chosen to provide some level of
availability and performance. HDFS attempts to distribute the replicas of a
block so that they reside across multiple racks. HDFS block replication
is *purely* block-based and file-agnostic; i.e. blocks belonging to the
same file are placed independently of one another.
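For reference, a small sketch of how the replication factor shows up in the API (dfs.replication is the cluster-wide default; the path and the value 5 below are just examples):

// Sketch: read and change a file's replication factor programmatically.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplicationFactor {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path file = new Path("/user/example/largefile");   // hypothetical path
    // Cluster-wide default replication (dfs.replication, commonly 3).
    System.out.println("default: " + conf.getInt("dfs.replication", 3));
    // Per-file replication as reported by the namenode.
    System.out.println("current: " + fs.getFileStatus(file).getReplication());
    // Request more replicas; the namenode re-replicates in the background.
    fs.setReplication(file, (short) 5);
  }
}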
The web interface to the namenode will let you drill down to the file
itself. That will tell you where the blocks are (scroll down to the
bottom). You can also use hadoop fsck
For example:
[EMAIL PROTECTED]:~/hadoop-0.15.1$ bin/hadoop fsck /user/rmobin/data/11/30Statu
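If a programmatic check is preferred over the fsck output, here is a minimal sketch along the same lines (the path is hypothetical, and it assumes a reasonably recent FileSystem API) that counts the distinct datanodes holding replicas of the file:

// Sketch: count the distinct datanodes holding any replica of a file,
// i.e. the "how many nodes participated" number.
import java.util.HashSet;
import java.util.Set;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplicaNodes {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path file = new Path("/user/example/largefile");   // hypothetical path
    FileStatus status = fs.getFileStatus(file);
    Set<String> nodes = new HashSet<String>();
    for (BlockLocation b : fs.getFileBlockLocations(status, 0, status.getLen())) {
      for (String host : b.getHosts()) {
        nodes.add(host);
      }
    }
    System.out.println(nodes.size() + " datanodes hold replicas: " + nodes);
  }
}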
Hi All:
Is there a tool available that will provide information about how a file is
replicated within HDFS? I'm looking for something that will "prove" that a
file is replicated across multiple nodes, and let me see how many nodes
participated, etc. This is a point of interest technically.