The web interface to the namenode will let you drill down to the file
itself.  That will tell you where the blocks are (scroll down to the
bottom).  You can also use hadoop fsck <filename>.
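(If it's not obvious where that web UI lives: with a stock configuration
the namenode serves it on port 50070, i.e. http://<namenode-host>:50070/;
the host there is a placeholder and the port is just the usual default,
so adjust for your cluster's settings.)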

For example:

[EMAIL PROTECTED]:~/hadoop-0.15.1$ bin/hadoop fsck /user/rmobin/data/11/30
........................Status: HEALTHY
 Total size:    13728838080 B
 Total blocks:  216 (avg. block size 63559435 B)
 Total dirs:    0
 Total files:   24
 Over-replicated blocks:        0 (0.0 %)
 Under-replicated blocks:       0 (0.0 %)
 Target replication factor:     2
 Real replication factor:       2.0


The filesystem under path '/user/rmobin/data/11/30' is HEALTHY
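If you want actual proof of where each replica lives, fsck can also dump
the block list and the datanodes holding each block.  A sketch, assuming
the -files, -blocks and -locations options are present in this version's
fsck:

bin/hadoop fsck /user/rmobin/data/11/30 -files -blocks -locations

For each file under the path this prints every block ID along with the
ip:port of each datanode storing a replica of that block, which also
shows how many distinct nodes participated.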




On 12/10/07 11:58 AM, "C G" <[EMAIL PROTECTED]> wrote:

> Hi All:
>    
>   Is there a tool available that will provide information about how a file is
> replicated within HDFS?  I'm looking for something that will "prove" that a
> file is replicated across multiple nodes, and let me see how many nodes
> participated, etc.  This is a point of interest technically, but more
> importantly a point of due diligence around data security and integrity
> accountability. 
>    
>   Also, are there any metrics or best practices around what the replication
> factor should be based on the number of nodes in the grid?  Does HDFS attempt
> to involve all nodes in the grid in replication?  In other words, if I have
> 100 nodes in my grid, and a replication factor of 6, will all 100 nodes wind
> up storing data for a given file, assuming the file is large enough?
>    
>   Thanks,
>   C G
> 
