[ https://issues.apache.org/jira/browse/HADOOP-5523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12682916#action_12682916 ]

Igor Bolotin commented on HADOOP-5523:
--------------------------------------

DF and DU sizes on the datanode match the information reported by the
dfsadmin command very closely.
lsof reports about 1,000 open files in the DFS data directories on the
problematic datanode, but the total size of those open files is only about 10 GB.
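The open-file check can also be reproduced without lsof by walking `/proc/<pid>/fd`. The sketch below is Linux-only and assumes you know the datanode's process id and its data directory (both are placeholders, not values from this report). It also flags files that were unlinked but are still held open, since such files keep consuming disk space that `du` no longer sees.

```python
import os


def open_files_under(pid, root):
    """Return (path, size_bytes, deleted) for each fd of `pid` pointing under `root`.

    Linux-only sketch: reads the /proc/<pid>/fd symlinks, roughly what
    `lsof -p <pid>` reports for regular files. Inspecting another user's
    process usually requires root.
    """
    root = os.path.realpath(root)
    fd_dir = f"/proc/{pid}/fd"
    results = []
    for fd in os.listdir(fd_dir):
        link = os.path.join(fd_dir, fd)
        try:
            target = os.readlink(link)
            # stat() through the /proc link still works after the file is unlinked
            size = os.stat(link).st_size
        except OSError:
            continue  # fd was closed between listdir() and readlink()/stat()
        # The kernel appends " (deleted)" to the link target of unlinked files
        deleted = target.endswith(" (deleted)")
        path = target[: -len(" (deleted)")] if deleted else target
        if path == root or path.startswith(root + os.sep):
            results.append((path, size, deleted))
    return results
```

For example, `sum(size for _, size, _ in open_files_under(pid, "/path/to/dfs/data"))` gives a total comparable to the ~10 GB lsof figure; the pid and path there are hypothetical.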

Here is something interesting: before the datanode restart, fsck reports a very
significant number of over-replicated blocks (~10% of blocks are
over-replicated):

Status: HEALTHY
 Total size:    1472758591906 B (Total open files size: 29050588133 B)
 Total dirs:    58431
 Total files:   375703 (Files currently being written: 418)
 Total blocks (validated):      387205 (avg. block size 3803562 B) (Total open file blocks (not validated): 595)
 Minimally replicated blocks:   387205 (100.0 %)
 Over-replicated blocks:        38782 (10.015883 %)
 Under-replicated blocks:       0 (0.0 %)
 Mis-replicated blocks:         0 (0.0 %)
 Default replication factor:    3
 Average block replication:     3.1003888
 Corrupt blocks:                0
 Missing replicas:              0 (0.0 %)
 Number of data-nodes:          7
 Number of racks:               1

After the datanode restart, the over-replicated blocks are practically gone:

Status: HEALTHY
 Total size:    1310669475298 B (Total open files size: 29535016933 B)
 Total dirs:    59431
 Total files:   377177 (Files currently being written: 387)
 Total blocks (validated):      386661 (avg. block size 3389712 B) (Total open file blocks (not validated): 607)
 Minimally replicated blocks:   386661 (100.0 %)
 Over-replicated blocks:        272 (0.070345856 %)
 Under-replicated blocks:       0 (0.0 %)
 Mis-replicated blocks:         0 (0.0 %)
 Default replication factor:    3
 Average block replication:     3.0007036
 Corrupt blocks:                0
 Missing replicas:              0 (0.0 %)
 Number of data-nodes:          7
 Number of racks:               1
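The two fsck summaries can be cross-checked with a little arithmetic. Assuming every file uses the default replication factor of 3 (an assumption; fsck only prints the default), the amount by which the average block replication exceeds 3, multiplied by the block count, estimates the number of surplus replicas. `replication_summary` is a hypothetical helper, not part of Hadoop.

```python
def replication_summary(total_blocks, over_replicated, avg_replication, target=3):
    """Derive two figures from an fsck summary.

    Returns (percent of blocks over-replicated, estimated surplus replicas).
    The surplus estimate assumes every file uses replication factor `target`
    and no block is under-replicated (fsck shows 0 in both runs), so any
    average replication above `target` consists of surplus copies.
    """
    over_pct = 100.0 * over_replicated / total_blocks
    surplus_replicas = (avg_replication - target) * total_blocks
    return over_pct, surplus_replicas


# Figures copied from the two fsck runs in this comment.
before = replication_summary(387205, 38782, 3.1003888)
after = replication_summary(386661, 272, 3.0007036)

for label, (pct, surplus) in (("before restart", before), ("after restart", after)):
    print(f"{label}: {pct:.4f}% over-replicated, ~{surplus:.0f} surplus replicas")
```

Before the restart this gives about 38,871 surplus replicas against 38,782 over-replicated blocks, and after it about 272 against 272. Under the stated assumption, the figures are consistent with nearly every over-replicated block carrying exactly one extra replica. The estimate is approximate because fsck rounds the average replication.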


> Datanode stops cleaning disk space
> ----------------------------------
>
>                 Key: HADOOP-5523
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5523
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.19.0
>         Environment: Linux
>            Reporter: Igor Bolotin
>            Priority: Critical
>
> Here is the situation: a DFS cluster running Hadoop version 0.19.0 on
> multiple servers with practically identical hardware. Everything works
> perfectly well except for one thing: from time to time one of the datanodes
> (a different node each time) starts to consume more and more disk space. If
> we don't intervene, it keeps going until it runs out of space completely
> (ignoring the 20GB reserved space setting). Once restarted, it rapidly cleans
> up disk space and returns to approximately the same utilization as the rest
> of the datanodes in the cluster.

-- 
This message is automatically generated by JIRA.