Re: Unbalanced Datanode and Lots of Blocks Waiting for Deletion

Todd Lipcon Wed, 02 Jun 2010 14:35:28 -0700

Hi Jeff,

This issue is caused by a confluence of factors.


The first is this bug fixed in trunk:
https://issues.apache.org/jira/browse/HADOOP-5124

The second, which causes a lot of unnecessary deletion, especially with
HBase, is this one, not fixed yet:
https://issues.apache.org/jira/browse/HDFS-1172

And lastly, the issue that can cause deletions to hold up heartbeats and
trigger HDFS-1172:
https://issues.apache.org/jira/browse/HDFS-611

We'll likely include HDFS-611 and HADOOP-5124 in the next beta release of
CDH3.
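
To see how unevenly the NameNode is handing out deletion work, the
"BLOCK* ask ... to delete" log lines quoted below can be tallied per
datanode. This is only a rough sketch, not part of Hadoop: it assumes each
message sits on a single line of the NameNode log in the format shown in
the quoted message, and the function name count_delete_requests and the
192.168.0.82 address in the sample are made up for illustration.

```python
import re
from collections import Counter

# Matches the invalidation message format quoted in this thread, e.g.
# "BLOCK* ask 192.168.0.81:50010 to delete  blk_1_2 blk_-3_4".
# Assumption: the whole message is on one log line.
ASK_DELETE = re.compile(r"BLOCK\* ask (\S+) to delete\s+(.*)")
BLOCK_ID = re.compile(r"blk_-?\d+_\d+")


def count_delete_requests(log_lines):
    """Return (requests per datanode, blocks per datanode) as Counters."""
    requests = Counter()
    blocks = Counter()
    for line in log_lines:
        match = ASK_DELETE.search(line)
        if match is None:
            continue  # not a deletion request line
        datanode, block_list = match.groups()
        requests[datanode] += 1
        blocks[datanode] += len(BLOCK_ID.findall(block_list))
    return requests, blocks


if __name__ == "__main__":
    # Sample input: the first address and block IDs come from the quoted
    # message; the second address and its block ID are hypothetical.
    sample = [
        "(FSNamesystem.java:invalidateWorkForOneNode(2717)) - BLOCK* ask "
        "192.168.0.81:50010 to delete  blk_8850139985669106987_2950393 "
        "blk_6677512006515381913_3142239",
        "BLOCK* ask 192.168.0.82:50010 to delete  blk_1_2",
    ]
    requests, blocks = count_delete_requests(sample)
    for node in sorted(requests):
        print(node, "requests:", requests[node], "blocks:", blocks[node])
```

Feeding it the real NameNode log (for example, the lines of the file read
with open()) gives a per-node count that can be compared against the
number of blocks metasave reports as waiting for deletion.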

Thanks
-Todd

On Wed, Jun 2, 2010 at 2:27 PM, jeff whiting <je...@qualtrics.com> wrote:

> I'm running a 3 node hdfs cluster and am having major data distribution
> issues.  Looking at "live nodes" in the web interface I'm seeing the
> following:
>
> Node  Last     Admin       Configured     Used   Non DFS    Remaining  Used   Remaining  Blocks
>       Contact  State       Capacity (TB)  (TB)   Used (TB)  (TB)       (%)    (%)
> ds1   2        In Service  5.37           1.62   0.27       3.48       30.19  64.73      81969
> ds2   0        In Service  5.37           5.1    0.27       0          94.9   0.01       72692
> ds3   0        In Service  5.37           1.77   0.27       3.33       32.91  62.01      84412
>
> (Node names link to
> http://<node>.internal:50075/browseDirectory.jsp?namenodeInfoPort=50070&dir=%2F)
>
> In a non-HTML-formatted way:
>
> node   capacity   %used
> ds1      5.37TB     30%
> ds2      5.37TB     95%
> ds3      5.37TB     32%
>
>
> I ran a dfsadmin metasave and got the following:
>
> Metasave: Blocks waiting for replication: 0
> Metasave: Blocks being replicated: 0
> Metasave: Blocks 692759 waiting deletion from 3 datanodes.
>
> It looks like all of the space being used on ds2 is due to blocks not
> being deleted.  The vast majority of blocks that need to be deleted are
> attributed to ds2 (I didn't include it here because the list is so large).
> Checking the logs I'll see the occasional:
>
> (FSNamesystem.java:invalidateWorkForOneNode(2717)) - BLOCK* ask
> 192.168.0.81:50010 to delete  blk_8850139985669106987_2950393
> blk_6677512006515381913_3142239 blk_-7534196842342813001_2880360
> blk_6575946937866450337_3280570 blk_-3722158283806045364_3118632
> blk_-3490603823691151224_3036593 blk_-897396045616120182_2930553
> blk_-4660390234299740937_3117083 blk_4605672167531794646_3042444
> blk_-2793729264523330063_3046949 blk_-1069835590195826211_2928578
> blk_-3689480462529026793_3284707 blk_-2100166843619194516_3265408
> blk_5162047185501320447_3278539 blk_3664800743566330457_3065400
> blk_3369418146997398320_3111317 blk_5964743871832843148_3031713
> blk_-8218489376644120438_2987780 blk_367071346032512828_3180655
> blk_-442303570139272169_3314076 blk_5419190113922354447_3205121
> blk_-2101734991458420810_3075412 blk_1957248302788390163_2955454
> blk_8699145900031080784_2957098 blk_7385528884584110838_3058451
> blk_4447871550951654682_3039010 blk_1887493293417017989_3223726
> blk_6157668188087364422_2901764 blk_-8576478885691122637_3268999
> blk_1151511910147641335_3222139 blk_8085841381003430120_2901077
> blk_-7657800079806100653_3240574 blk_234746170041166777_3211835
> blk_7314545895906772373_2975491 blk_613366993704120940_2873518
> blk_-7668134916749889355_2904183 blk_64385028396804451_3109940
>
> but it is very infrequent for ds2.  For ds1 and ds3 the requests are much
> more regular.  Any idea what is going on?  Why isn't it sending the delete
> commands?  What do I need to do or check to solve the problem?
>
> Thanks,
> ~Jeff
>
> --
> Jeff Whiting
> Qualtrics Senior Software Engineer
> je...@qualtrics.com


-- 
Todd Lipcon
Software Engineer, Cloudera
