[ https://issues.apache.org/jira/browse/HDFS-9530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15231505#comment-15231505 ]
Brahma Reddy Battula commented on HDFS-9530:
--------------------------------------------

To make the analysis simple:

1. Reservation happens only when a block is being received through {{BlockReceiver}}. Reservation happens nowhere else, so no release is needed anywhere else either.
2. The {{BlockReceiver}} constructor has a try-catch block that releases all reserved bytes if any exception occurs after reserving.
3. {{BlockReceiver#receiveBlock()}} has a try-catch block that releases all reserved bytes if any exception occurs while the block is being received.
4. As packets are received successfully, {{PacketResponder}} calls {{ReplicaInPipeline#setBytesAcked(..)}}.
5. Once the block is completely received, {{FsDatasetImpl#finalizeReplica(..)}} releases all remaining reserved bytes.

The only place left is {{DataXceiver#writeBlock()}}: an exception can occur after the {{BlockReceiver}} has been created but before {{BlockReceiver#receiveBlock()}} is called, for example when the connection to the mirror nodes fails. Only in this case are the reserved bytes never released. However, a {{ReplicaInfo}} instance will already have been created in the {{ReplicaMap}}, so if the client re-creates the pipeline with the same blockId, the same {{ReplicaInfo}} instance is reused and no extra reservation happens. This can be verified with the same testcase as in the patch, but failing the pipeline for an append: there {{abandonBlock}} is not called and the pipeline is recovered for the same block.

For fresh block creation, however, the block is abandoned and a fresh block with a new pipeline is requested. The old block created at the Datanode will eventually be deleted, BUT the reserved space is never released. That is why you are not seeing many RBW blocks in the RBW directory, while the reserved space accumulates to more than 1 TB.

Of the two approaches I have given, #1 (releasing reserved bytes when {{ReplicaInPipeline}} instances are deleted) will also cover any hidden cases; a simplified sketch of that idea is appended at the end of this message.

Hope this helps.

> huge Non-DFS Used in hadoop 2.6.2 & 2.7.1
> -----------------------------------------
>
>                 Key: HDFS-9530
>                 URL: https://issues.apache.org/jira/browse/HDFS-9530
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.7.1
>            Reporter: Fei Hui
>         Attachments: HDFS-9530-01.patch
>
>
> i think there are bugs in HDFS
> ===============================================================================
> here is config
> <property>
>     <name>dfs.datanode.data.dir</name>
>     <value>file:///mnt/disk4,file:///mnt/disk1,file:///mnt/disk3,file:///mnt/disk2</value>
> </property>
> here is dfsadmin report
> [hadoop@worker-1 ~]$ hadoop dfsadmin -report
> DEPRECATED: Use of this script to execute hdfs command is deprecated.
> Instead use the hdfs command for it.
> Configured Capacity: 240769253376 (224.23 GB)
> Present Capacity: 238604832768 (222.22 GB)
> DFS Remaining: 215772954624 (200.95 GB)
> DFS Used: 22831878144 (21.26 GB)
> DFS Used%: 9.57%
> Under replicated blocks: 4
> Blocks with corrupt replicas: 0
> Missing blocks: 0
> -------------------------------------------------
> Live datanodes (3):
>
> Name: 10.117.60.59:50010 (worker-2)
> Hostname: worker-2
> Decommission Status : Normal
> Configured Capacity: 80256417792 (74.74 GB)
> DFS Used: 7190958080 (6.70 GB)
> Non DFS Used: 721473536 (688.05 MB)
> DFS Remaining: 72343986176 (67.38 GB)
> DFS Used%: 8.96%
> DFS Remaining%: 90.14%
> Configured Cache Capacity: 0 (0 B)
> Cache Used: 0 (0 B)
> Cache Remaining: 0 (0 B)
> Cache Used%: 100.00%
> Cache Remaining%: 0.00%
> Xceivers: 1
> Last contact: Wed Dec 09 15:55:02 CST 2015
>
> Name: 10.168.156.0:50010 (worker-3)
> Hostname: worker-3
> Decommission Status : Normal
> Configured Capacity: 80256417792 (74.74 GB)
> DFS Used: 7219073024 (6.72 GB)
> Non DFS Used: 721473536 (688.05 MB)
> DFS Remaining: 72315871232 (67.35 GB)
> DFS Used%: 9.00%
> DFS Remaining%: 90.11%
> Configured Cache Capacity: 0 (0 B)
> Cache Used: 0 (0 B)
> Cache Remaining: 0 (0 B)
> Cache Used%: 100.00%
> Cache Remaining%: 0.00%
> Xceivers: 1
> Last contact: Wed Dec 09 15:55:03 CST 2015
>
> Name: 10.117.15.38:50010 (worker-1)
> Hostname: worker-1
> Decommission Status : Normal
> Configured Capacity: 80256417792 (74.74 GB)
> DFS Used: 8421847040 (7.84 GB)
> Non DFS Used: 721473536 (688.05 MB)
> DFS Remaining: 71113097216 (66.23 GB)
> DFS Used%: 10.49%
> DFS Remaining%: 88.61%
> Configured Cache Capacity: 0 (0 B)
> Cache Used: 0 (0 B)
> Cache Remaining: 0 (0 B)
> Cache Used%: 100.00%
> Cache Remaining%: 0.00%
> Xceivers: 1
> Last contact: Wed Dec 09 15:55:03 CST 2015
> ================================================================================
> when running hive job, dfsadmin report as follows
> [hadoop@worker-1 ~]$ hadoop dfsadmin -report
> DEPRECATED: Use of this script to execute hdfs command is deprecated.
> Instead use the hdfs command for it.
> Configured Capacity: 240769253376 (224.23 GB)
> Present Capacity: 108266011136 (100.83 GB)
> DFS Remaining: 80078416384 (74.58 GB)
> DFS Used: 28187594752 (26.25 GB)
> DFS Used%: 26.04%
> Under replicated blocks: 7
> Blocks with corrupt replicas: 0
> Missing blocks: 0
> -------------------------------------------------
> Live datanodes (3):
>
> Name: 10.117.60.59:50010 (worker-2)
> Hostname: worker-2
> Decommission Status : Normal
> Configured Capacity: 80256417792 (74.74 GB)
> DFS Used: 9015627776 (8.40 GB)
> Non DFS Used: 44303742464 (41.26 GB)
> DFS Remaining: 26937047552 (25.09 GB)
> DFS Used%: 11.23%
> DFS Remaining%: 33.56%
> Configured Cache Capacity: 0 (0 B)
> Cache Used: 0 (0 B)
> Cache Remaining: 0 (0 B)
> Cache Used%: 100.00%
> Cache Remaining%: 0.00%
> Xceivers: 693
> Last contact: Wed Dec 09 15:37:35 CST 2015
>
> Name: 10.168.156.0:50010 (worker-3)
> Hostname: worker-3
> Decommission Status : Normal
> Configured Capacity: 80256417792 (74.74 GB)
> DFS Used: 9163116544 (8.53 GB)
> Non DFS Used: 47895897600 (44.61 GB)
> DFS Remaining: 23197403648 (21.60 GB)
> DFS Used%: 11.42%
> DFS Remaining%: 28.90%
> Configured Cache Capacity: 0 (0 B)
> Cache Used: 0 (0 B)
> Cache Remaining: 0 (0 B)
> Cache Used%: 100.00%
> Cache Remaining%: 0.00%
> Xceivers: 750
> Last contact: Wed Dec 09 15:37:36 CST 2015
>
> Name: 10.117.15.38:50010 (worker-1)
> Hostname: worker-1
> Decommission Status : Normal
> Configured Capacity: 80256417792 (74.74 GB)
> DFS Used: 10008850432 (9.32 GB)
> Non DFS Used: 40303602176 (37.54 GB)
> DFS Remaining: 29943965184 (27.89 GB)
> DFS Used%: 12.47%
> DFS Remaining%: 37.31%
> Configured Cache Capacity: 0 (0 B)
> Cache Used: 0 (0 B)
> Cache Remaining: 0 (0 B)
> Cache Used%: 100.00%
> Cache Remaining%: 0.00%
> Xceivers: 632
> Last contact: Wed Dec 09 15:37:36 CST 2015
> =========================================================================
> but, df output is as follows on worker-1
> [hadoop@worker-1 ~]$ df
> Filesystem     1K-blocks    Used Available Use% Mounted on
> /dev/xvda1      20641404 4229676  15363204  22% /
> tmpfs            8165456       0   8165456   0% /dev/shm
> /dev/xvdc       20642428 2596920  16996932  14% /mnt/disk3
> /dev/xvdb       20642428 2692228  16901624  14% /mnt/disk4
> /dev/xvdd       20642428 2445852  17148000  13% /mnt/disk2
> /dev/xvde       20642428 2909764  16684088  15% /mnt/disk1
> df output conflicts with dfsadmin report
> any suggestions?
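
For illustration only, here is a minimal, self-contained sketch of the idea behind approach #1: whenever a replica-in-pipeline entry is removed from the replica map, any bytes still reserved for it are returned to the volume. All class and method names in the sketch ({{SpaceReservingVolume}}, {{PipelineReplica}}, {{ReplicaMapSketch}}) are hypothetical stand-ins, not the actual HDFS classes; the real change would live in the DataNode's {{FsDatasetImpl}}/{{ReplicaMap}} code paths discussed above.

{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical volume that tracks bytes reserved for replicas being written (RBW). */
class SpaceReservingVolume {
    private final AtomicLong reservedForRbw = new AtomicLong();

    void reserveSpaceForRbw(long bytes) {
        reservedForRbw.addAndGet(bytes);
    }

    void releaseReservedSpace(long bytes) {
        // Never let the counter go negative, even on a double release.
        reservedForRbw.updateAndGet(cur -> Math.max(0, cur - bytes));
    }

    long getReservedForRbw() {
        return reservedForRbw.get();
    }
}

/** Hypothetical replica-in-pipeline entry: remembers how much space is still reserved for it. */
class PipelineReplica {
    final long blockId;
    final SpaceReservingVolume volume;
    private long bytesReserved;

    PipelineReplica(long blockId, long bytesToReserve, SpaceReservingVolume volume) {
        this.blockId = blockId;
        this.volume = volume;
        this.bytesReserved = bytesToReserve;
        volume.reserveSpaceForRbw(bytesToReserve);  // reservation happens only here (cf. point 1)
    }

    /** Called as packets are acked; shrinks the outstanding reservation (cf. point 4). */
    synchronized void setBytesAcked(long acked) {
        long release = Math.min(acked, bytesReserved);
        bytesReserved -= release;
        volume.releaseReservedSpace(release);
    }

    /** Releases whatever is still reserved; safe to call more than once. */
    synchronized void releaseAllReservedSpace() {
        volume.releaseReservedSpace(bytesReserved);
        bytesReserved = 0;
    }
}

/** Hypothetical replica map: approach #1 hooks the release into removal. */
class ReplicaMapSketch {
    private final Map<Long, PipelineReplica> map = new ConcurrentHashMap<>();

    void add(PipelineReplica r) {
        map.put(r.blockId, r);
    }

    /**
     * Removing a replica-in-pipeline (e.g. when the abandoned block is deleted) always
     * returns its remaining reserved bytes, covering the writeBlock() failure window
     * between BlockReceiver creation and receiveBlock().
     */
    void remove(long blockId) {
        PipelineReplica r = map.remove(blockId);
        if (r != null) {
            r.releaseAllReservedSpace();
        }
    }
}
{code}

With a hook like this, even the {{DataXceiver#writeBlock()}} failure window described above cannot leak reserved space: when the abandoned replica is eventually removed from the map, its leftover reservation is returned along with it.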