Re: HDFS tool and replication questions...

2007-12-10 Thread Ted Dunning
More to the specific point: yes, all 100 nodes will wind up storing data for large files, because blocks should be assigned pretty much at random. The exception is files that originate on a datanode. There, the local node gets one copy of each block. Replica blocks follow the random rule, however.

RE: HDFS tool and replication questions...

2007-12-10 Thread dhruba Borthakur
The replication factor should be chosen so that it provides some level of availability and performance. HDFS attempts to distribute the replicas of a block so that they reside across multiple racks. HDFS block replication is *purely* block-based and file-agnostic; i.e. blocks belonging to the same file a…
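Dhruba's point about picking a replication factor can be acted on from the FileSystem shell. A minimal sketch, assuming the `hadoop` script is on the PATH and a cluster is running; the path is illustrative, not from the thread:

```shell
# Set the replication factor of a file to 3.
# -w waits until the namenode confirms the target replication is reached.
# /user/example/data is a hypothetical path for illustration.
bin/hadoop dfs -setrep -w 3 /user/example/data
```

This only changes the target replication for that file; the namenode schedules the extra (or surplus) replica copies in the background.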

Re: HDFS tool and replication questions...

2007-12-10 Thread Ted Dunning
The web interface to the namenode will let you drill down to the file itself. That will tell you where the blocks are (scroll down to the bottom). You can also use hadoop fsck. For example: [EMAIL PROTECTED]:~/hadoop-0.15.1$ bin/hadoop fsck /user/rmobin/data/11/30 Statu…
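The fsck output above is cut off. Hedged sketch of a fuller invocation (these flags exist in Hadoop's fsck, though the exact output format varies by version) that lists each block and the datanodes holding its replicas:

```shell
# -files    : list each file checked
# -blocks   : list the blocks belonging to each file
# -locations: list the datanodes holding each replica
# -racks    : also show the rack of each replica location
bin/hadoop fsck /user/rmobin/data/11/30 -files -blocks -locations -racks
```

The per-block `-locations` listing is what directly "proves" a file's blocks are replicated across multiple nodes.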

HDFS tool and replication questions...

2007-12-10 Thread C G
Hi All: Is there a tool available that will provide information about how a file is replicated within HDFS? I'm looking for something that will "prove" that a file is replicated across multiple nodes, and let me see how many nodes participated, etc. This is a point of interest technically…