[
https://issues.apache.org/jira/browse/HDFS-5225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13904144#comment-13904144
]
Lars Francke commented on HDFS-5225:
------------------------------------
We're running CDH 4.5.0, which is based on Hadoop 2.0. I see that a fix for this
issue is in CDH 4.6, but that's not released yet.
> datanode keeps logging the same 'is no longer in the dataset' message over
> and over again
> -----------------------------------------------------------------------------------------
>
> Key: HDFS-5225
> URL: https://issues.apache.org/jira/browse/HDFS-5225
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode
> Affects Versions: 2.1.1-beta
> Reporter: Roman Shaposhnik
> Assignee: Tsuyoshi OZAWA
> Priority: Blocker
> Attachments: HDFS-5225-reproduce.1.txt, HDFS-5225.1.patch,
> HDFS-5225.2.patch
>
>
> I was running the usual Bigtop testing on 2.1.1-beta RC1 with the following
> configuration: a 4-node, fully distributed cluster with security on.
> All of a sudden my DN ate up all of the space in /var/log by logging the
> following message repeatedly:
> {noformat}
> 2013-09-18 20:51:12,046 INFO
> org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner:
> BP-1884637155-10.224.158.152-1379524544853:blk_1073742189_1369 is no longer
> in the dataset
> {noformat}
> The DN wouldn't respond to jstack, and jstack -F ended up being useless.
> Here's what I was able to find in the NameNode logs regarding this block ID:
> {noformat}
> fgrep -rI 'blk_1073742189' hadoop-hdfs-namenode-ip-10-224-158-152.log
> 2013-09-18 18:03:16,972 INFO org.apache.hadoop.hdfs.StateChange: BLOCK*
> allocateBlock:
> /user/jenkins/testAppendInputWedSep18180222UTC2013/test4.fileWedSep18180222UTC2013._COPYING_.
> BP-1884637155-10.224.158.152-1379524544853
> blk_1073742189_1369{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1,
> replicas=[ReplicaUnderConstruction[10.83.107.80:1004|RBW],
> ReplicaUnderConstruction[10.34.74.206:1004|RBW],
> ReplicaUnderConstruction[10.224.158.152:1004|RBW]]}
> 2013-09-18 18:03:17,222 INFO BlockStateChange: BLOCK* addStoredBlock:
> blockMap updated: 10.224.158.152:1004 is added to
> blk_1073742189_1369{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1,
> replicas=[ReplicaUnderConstruction[10.83.107.80:1004|RBW],
> ReplicaUnderConstruction[10.34.74.206:1004|RBW],
> ReplicaUnderConstruction[10.224.158.152:1004|RBW]]} size 0
> 2013-09-18 18:03:17,222 INFO BlockStateChange: BLOCK* addStoredBlock:
> blockMap updated: 10.34.74.206:1004 is added to
> blk_1073742189_1369{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1,
> replicas=[ReplicaUnderConstruction[10.83.107.80:1004|RBW],
> ReplicaUnderConstruction[10.34.74.206:1004|RBW],
> ReplicaUnderConstruction[10.224.158.152:1004|RBW]]} size 0
> 2013-09-18 18:03:17,224 INFO BlockStateChange: BLOCK* addStoredBlock:
> blockMap updated: 10.83.107.80:1004 is added to
> blk_1073742189_1369{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1,
> replicas=[ReplicaUnderConstruction[10.83.107.80:1004|RBW],
> ReplicaUnderConstruction[10.34.74.206:1004|RBW],
> ReplicaUnderConstruction[10.224.158.152:1004|RBW]]} size 0
> 2013-09-18 18:03:17,899 INFO
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
> updatePipeline(block=BP-1884637155-10.224.158.152-1379524544853:blk_1073742189_1369,
> newGenerationStamp=1370, newLength=1048576, newNodes=[10.83.107.80:1004,
> 10.34.74.206:1004, 10.224.158.152:1004],
> clientName=DFSClient_NONMAPREDUCE_-450304083_1)
> 2013-09-18 18:03:17,904 INFO
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
> updatePipeline(BP-1884637155-10.224.158.152-1379524544853:blk_1073742189_1369)
> successfully to
> BP-1884637155-10.224.158.152-1379524544853:blk_1073742189_1370
> 2013-09-18 18:03:18,540 INFO
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
> updatePipeline(block=BP-1884637155-10.224.158.152-1379524544853:blk_1073742189_1370,
> newGenerationStamp=1371, newLength=2097152, newNodes=[10.83.107.80:1004,
> 10.34.74.206:1004, 10.224.158.152:1004],
> clientName=DFSClient_NONMAPREDUCE_-450304083_1)
> 2013-09-18 18:03:18,548 INFO
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
> updatePipeline(BP-1884637155-10.224.158.152-1379524544853:blk_1073742189_1370)
> successfully to
> BP-1884637155-10.224.158.152-1379524544853:blk_1073742189_1371
> 2013-09-18 18:03:26,150 INFO BlockStateChange: BLOCK* addToInvalidates:
> blk_1073742189_1371 10.83.107.80:1004 10.34.74.206:1004 10.224.158.152:1004
> 2013-09-18 18:03:26,847 INFO org.apache.hadoop.hdfs.StateChange: BLOCK*
> InvalidateBlocks: ask 10.34.74.206:1004 to delete [blk_1073742178_1359,
> blk_1073742183_1362, blk_1073742184_1363, blk_1073742186_1366,
> blk_1073742188_1368, blk_1073742189_1371]
> 2013-09-18 18:03:29,848 INFO org.apache.hadoop.hdfs.StateChange: BLOCK*
> InvalidateBlocks: ask 10.224.158.152:1004 to delete [blk_1073742177_1353,
> blk_1073742178_1359, blk_1073742179_1355, blk_1073742180_1356,
> blk_1073742181_1358, blk_1073742182_1361, blk_1073742185_1364,
> blk_1073742187_1367, blk_1073742188_1368, blk_1073742189_1371]
> 2013-09-18 18:03:29,848 INFO org.apache.hadoop.hdfs.StateChange: BLOCK*
> InvalidateBlocks: ask 10.83.107.80:1004 to delete [blk_1073742177_1353,
> blk_1073742178_1359, blk_1073742179_1355, blk_1073742180_1356,
> blk_1073742181_1358, blk_1073742182_1361, blk_1073742183_1362,
> blk_1073742184_1363, blk_1073742185_1364, blk_1073742186_1366,
> blk_1073742187_1367, blk_1073742188_1368, blk_1073742189_1371]
> {noformat}
> This seems to suggest that the block was successfully deleted, but that the DN
> then got into a death spiral inside of the block scanner.
> I can keep the cluster running for a few days if anybody is willing to take a
> look. Ask me for creds via a personal email.
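The suspected failure mode can be modeled with a small sketch. This is an illustrative model only, not the actual BlockPoolSliceScanner code; all names and the queue structure here are hypothetical. The idea: if the scanner dequeues a block that has already been deleted from the dataset, logs the "is no longer in the dataset" message, but then re-queues the block instead of dropping it, every subsequent scan pass logs the same line again, forever.

```python
from collections import deque

def scan_once(scan_queue, dataset, log):
    """One pass of a simplified, hypothetical scanner loop (illustrative only)."""
    block = scan_queue.popleft()
    if block not in dataset:
        log.append(f"{block} is no longer in the dataset")
        # Suspected bug: the deleted block is re-queued instead of being
        # dropped, so every later pass logs the same message again.
        scan_queue.append(block)
    else:
        # Normal case: verify the block, then re-queue it for the next period.
        scan_queue.append(block)

# blk_1073742189 was invalidated and deleted, so the dataset no longer has it,
# but the scan queue still does.
dataset = set()
scan_queue = deque(["blk_1073742189_1371"])
log = []
for _ in range(3):
    scan_once(scan_queue, dataset, log)
# log now holds the same 'is no longer in the dataset' line three times,
# matching the repeated DN log line above.
```

Under this model, the fix is simply to drop (not re-queue) blocks that are gone from the dataset, which is consistent with the message appearing once per block in a healthy DN.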
--
This message was sent by Atlassian JIRA
(v6.1.5#6160)