[
https://issues.apache.org/jira/browse/HDFS-5042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16082748#comment-16082748
]
Kihwal Lee commented on HDFS-5042:
----------------------------------
We are seeing significant performance degradation in 2.8 with this change.
Whenever the write load increases, multiple datanodes stop heartbeating for
long periods, causing missing blocks. All other Xceiver threads and the actor
threads are left waiting for the dataset impl lock. The writes make progress,
but slowly enough that they are always caught by jstack.
{noformat}
"DataXceiver for client at xxx [Receiving block BP-aaa:blk_xxxx_xxxx]" #343116 daemon prio=5 os_prio=0 tid=0x00007f3b1ef18000 nid=0x19193 runnable [0x00007f3a5c104000]
   java.lang.Thread.State: RUNNABLE
        at sun.nio.ch.FileDispatcherImpl.force0(Native Method)
        at sun.nio.ch.FileDispatcherImpl.force(FileDispatcherImpl.java:76)
        at sun.nio.ch.FileChannelImpl.force(FileChannelImpl.java:388)
        at org.apache.hadoop.io.IOUtils.fsync(IOUtils.java:394)
        at org.apache.hadoop.io.IOUtils.fsync(IOUtils.java:376)
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.fsyncDirectory(FsDatasetImpl.java:899)
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.finalizeReplica(FsDatasetImpl.java:1756)
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.finalizeBlock(FsDatasetImpl.java:1724)
        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:949)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:854)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:166)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:103)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:288)
        at java.lang.Thread.run(Thread.java:745)

   Locked ownable synchronizers:
        - <0x00000000d5ff46f8> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
{noformat}
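The trace shows FsDatasetImpl.finalizeReplica() calling fsyncDirectory() while the dataset impl lock is still held, so every thread that needs the lock ends up queued behind a synchronous fsync(2) of the finalized directory. Below is a minimal sketch of that pattern, plus the obvious mitigation of issuing the fsync after releasing the lock. The class and method bodies are hypothetical stand-ins for illustration, not the actual FsDatasetImpl code.
{noformat}
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical sketch of the locking pattern implied by the stack trace. */
public class FinalizeSketch {
  private final ReentrantLock datasetLock = new ReentrantLock();

  /** fsync(2) on a directory fd so a completed rename in it survives power
   *  loss (works on Linux/POSIX; directories cannot be opened on Windows). */
  static void fsyncDirectory(Path dir) throws IOException {
    try (FileChannel ch = FileChannel.open(dir, StandardOpenOption.READ)) {
      ch.force(true); // blocks until the directory metadata reaches the disk
    }
  }

  /** Problem pattern: disk I/O inside the lock. Heartbeat actors and all
   *  other xceiver threads queue behind the fsync. */
  void finalizeUnderLock(Path rbwFile, Path finalizedDir) throws IOException {
    datasetLock.lock();
    try {
      Files.move(rbwFile, finalizedDir.resolve(rbwFile.getFileName()),
          StandardCopyOption.ATOMIC_MOVE);
      fsyncDirectory(finalizedDir); // synchronous disk write under the lock
    } finally {
      datasetLock.unlock();
    }
  }

  /** One possible mitigation: mutate state under the lock, pay the
   *  durability cost after dropping it. */
  void finalizeFsyncOutsideLock(Path rbwFile, Path finalizedDir)
      throws IOException {
    datasetLock.lock();
    try {
      Files.move(rbwFile, finalizedDir.resolve(rbwFile.getFileName()),
          StandardCopyOption.ATOMIC_MOVE);
    } finally {
      datasetLock.unlock();
    }
    fsyncDirectory(finalizedDir); // other threads no longer wait on the disk
  }
}
{noformat}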
We are still analyzing whether there are other factors in play.
> Completed files lost after power failure
> ----------------------------------------
>
> Key: HDFS-5042
> URL: https://issues.apache.org/jira/browse/HDFS-5042
> Project: Hadoop HDFS
> Issue Type: Bug
> Environment: ext3 on CentOS 5.7 (kernel 2.6.18-274.el5)
> Reporter: Dave Latham
> Assignee: Vinayakumar B
> Priority: Critical
> Fix For: 2.9.0, 2.7.4, 3.0.0-alpha4, 2.8.2
>
> Attachments: HDFS-5042-01.patch, HDFS-5042-02.patch,
> HDFS-5042-03.patch, HDFS-5042-04.patch, HDFS-5042-05-branch-2.patch,
> HDFS-5042-05.patch, HDFS-5042-branch-2-01.patch, HDFS-5042-branch-2-05.patch,
> HDFS-5042-branch-2.7-05.patch, HDFS-5042-branch-2.7-06.patch,
> HDFS-5042-branch-2.8-05.patch, HDFS-5042-branch-2.8-06.patch
>
>
> We suffered a cluster-wide power failure after which HDFS lost data that it
> had acknowledged as closed and complete.
> The client was HBase, which compacted a set of HFiles into a new HFile and,
> after closing the file successfully, deleted the previous versions of the
> file. The cluster then lost power, and when it was brought back up the newly
> created file was marked CORRUPT.
> Based on reading the logs, it looks like the replicas were created by the
> DataNodes in the 'blocksBeingWritten' directory and then moved to the
> 'current' directory when the file was closed. After the power cycle, those
> replicas were back in the blocksBeingWritten directory of the underlying
> file system (ext3). When those DataNodes reported in to the NameNode, it
> deleted those replicas and lost the file.
> Some possible fixes could be having the DataNode fsync the directory (or
> directories) after moving the block from blocksBeingWritten to current, to
> ensure the rename is durable, or having the NameNode accept replicas from
> blocksBeingWritten under certain circumstances.
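> A durable rename of the first kind would fsync the parent directory (or
> directories) after the move. A hypothetical sketch, assuming POSIX/Linux
> semantics and not taken from actual DataNode code:
> {noformat}
> import java.io.IOException;
> import java.nio.channels.FileChannel;
> import java.nio.file.Files;
> import java.nio.file.Path;
> import java.nio.file.StandardCopyOption;
> import java.nio.file.StandardOpenOption;
>
> /** Hypothetical sketch: a rename that is durable across power failure. */
> public final class DurableRename {
>   public static void rename(Path src, Path dst) throws IOException {
>     Files.move(src, dst, StandardCopyOption.ATOMIC_MOVE);
>     // rename(2) alone only updates directory entries in the page cache;
>     // on ext3 a crash can roll it back. fsync both parent directories so
>     // the new entry (and the removal of the old one) reach the disk.
>     fsyncDir(dst.getParent());
>     if (!src.getParent().equals(dst.getParent())) {
>       fsyncDir(src.getParent());
>     }
>   }
>
>   private static void fsyncDir(Path dir) throws IOException {
>     // Opening a directory read-only and forcing the channel issues
>     // fsync(2) on the directory fd (Linux; not supported on Windows).
>     try (FileChannel ch = FileChannel.open(dir, StandardOpenOption.READ)) {
>       ch.force(true);
>     }
>   }
> }
> {noformat}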
> Log snippets from RS (RegionServer), NN (NameNode), DN (DataNode):
> {noformat}
> RS 2013-06-29 11:16:06,812 DEBUG org.apache.hadoop.hbase.util.FSUtils: Creating file=hdfs://hm3:9000/hbase/users-6/b5b0820cde759ae68e333b2f4015bb7e/.tmp/6e0cc30af6e64e56ba5a539fdf159c4c with permission=rwxrwxrwx
> NN 2013-06-29 11:16:06,830 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.allocateBlock: /hbase/users-6/b5b0820cde759ae68e333b2f4015bb7e/.tmp/6e0cc30af6e64e56ba5a539fdf159c4c. blk_1395839728632046111_357084589
> DN 2013-06-29 11:16:06,832 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block blk_1395839728632046111_357084589 src: /10.0.5.237:14327 dest: /10.0.5.237:50010
> NN 2013-06-29 11:16:11,370 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 10.0.6.1:50010 is added to blk_1395839728632046111_357084589 size 25418340
> NN 2013-06-29 11:16:11,370 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 10.0.6.24:50010 is added to blk_1395839728632046111_357084589 size 25418340
> NN 2013-06-29 11:16:11,385 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 10.0.5.237:50010 is added to blk_1395839728632046111_357084589 size 25418340
> DN 2013-06-29 11:16:11,385 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Received block blk_1395839728632046111_357084589 of size 25418340 from /10.0.5.237:14327
> DN 2013-06-29 11:16:11,385 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder 2 for block blk_1395839728632046111_357084589 terminating
> NN 2013-06-29 11:16:11,385 INFO org.apache.hadoop.hdfs.StateChange: Removing lease on file /hbase/users-6/b5b0820cde759ae68e333b2f4015bb7e/.tmp/6e0cc30af6e64e56ba5a539fdf159c4c from client DFSClient_hb_rs_hs745,60020,1372470111932
> NN 2013-06-29 11:16:11,385 INFO org.apache.hadoop.hdfs.StateChange: DIR* NameSystem.completeFile: file /hbase/users-6/b5b0820cde759ae68e333b2f4015bb7e/.tmp/6e0cc30af6e64e56ba5a539fdf159c4c is closed by DFSClient_hb_rs_hs745,60020,1372470111932
> RS 2013-06-29 11:16:11,393 INFO org.apache.hadoop.hbase.regionserver.Store: Renaming compacted file at hdfs://hm3:9000/hbase/users-6/b5b0820cde759ae68e333b2f4015bb7e/.tmp/6e0cc30af6e64e56ba5a539fdf159c4c to hdfs://hm3:9000/hbase/users-6/b5b0820cde759ae68e333b2f4015bb7e/n/6e0cc30af6e64e56ba5a539fdf159c4c
> RS 2013-06-29 11:16:11,505 INFO org.apache.hadoop.hbase.regionserver.Store: Completed major compaction of 7 file(s) in n of users-6,\x12\xBDp\xA3,1359426311784.b5b0820cde759ae68e333b2f4015bb7e. into 6e0cc30af6e64e56ba5a539fdf159c4c, size=24.2m; total size for store is 24.2m
> ------- CRASH, RESTART ---------
> NN 2013-06-29 12:01:19,743 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addStoredBlock: addStoredBlock request received for blk_1395839728632046111_357084589 on 10.0.6.1:50010 size 21978112 but was rejected: Reported as block being written but is a block of closed file.
> NN 2013-06-29 12:01:19,743 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addToInvalidates: blk_1395839728632046111 is added to invalidSet of 10.0.6.1:50010
> NN 2013-06-29 12:01:20,155 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addStoredBlock: addStoredBlock request received for blk_1395839728632046111_357084589 on 10.0.5.237:50010 size 16971264 but was rejected: Reported as block being written but is a block of closed file.
> NN 2013-06-29 12:01:20,155 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addToInvalidates: blk_1395839728632046111 is added to invalidSet of 10.0.5.237:50010
> NN 2013-06-29 12:01:20,175 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addStoredBlock: addStoredBlock request received for blk_1395839728632046111_357084589 on 10.0.6.24:50010 size 21913088 but was rejected: Reported as block being written but is a block of closed file.
> NN 2013-06-29 12:01:20,175 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addToInvalidates: blk_1395839728632046111 is added to invalidSet of 10.0.6.24:50010
> (Note: the clock on the server running the DN is wrong after the restart; I
> believe its timestamps are off by 6 hours.)
> DN 2013-06-29 06:07:22,877 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Scheduling block blk_1395839728632046111_357084589 file /data/hadoop/dfs/data/blocksBeingWritten/blk_1395839728632046111 for deletion
> DN 2013-06-29 06:07:24,952 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Deleted block blk_1395839728632046111_357084589 at file /data/hadoop/dfs/data/blocksBeingWritten/blk_1395839728632046111
> {noformat}
> There was some additional discussion on this thread on the mailing list:
> http://mail-archives.apache.org/mod_mbox/hadoop-hdfs-user/201307.mbox/%3CCA+qbEUPuf19PL_EVeWi1104+scLVrcS0LTFUvBPw=qcuxnz...@mail.gmail.com%3E