[ https://issues.apache.org/jira/browse/HDFS-8245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14536432#comment-14536432 ]

Hudson commented on HDFS-8245:
------------------------------

SUCCESS: Integrated in Hadoop-Yarn-trunk #922 (See 
[https://builds.apache.org/job/Hadoop-Yarn-trunk/922/])
HDFS-8245. Standby namenode doesn't process DELETED_BLOCK if the addblock 
request is in edit log. Contributed by Rushabh S Shah. (kihwal: rev 
2d4ae3d18bc530fa9f81ee616db8af3395705fb9)
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/datanode/TestBlockReplacement.java
* 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
* hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
* 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/ha/TestDNFencing.java


> Standby namenode doesn't process DELETED_BLOCK if the add block request is in 
> edit log.
> ---------------------------------------------------------------------------------------
>
>                 Key: HDFS-8245
>                 URL: https://issues.apache.org/jira/browse/HDFS-8245
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.6.0
>            Reporter: Rushabh S Shah
>            Assignee: Rushabh S Shah
>              Labels: BB2015-05-TBR
>             Fix For: 2.7.1
>
>         Attachments: HDFS-8245-1.patch, HDFS-8245.patch
>
>
> The following series of events happened on the Standby namenode:
> 2015-04-09 07:47:21,735 [Edit log tailer] INFO ha.EditLogTailer: Triggering
> log roll on remote NameNode Active Namenode (ANN)
> 2015-04-09 07:58:01,858 [Edit log tailer] INFO ha.EditLogTailer: Triggering
> log roll on remote NameNode ANN
> The following series of events happened on the Active Namenode:
> 2015-04-09 07:47:21,747 [IPC Server handler 99 on 8020] INFO
> namenode.FSNamesystem: Roll Edit Log from Standby NN (SNN)
> 2015-04-09 07:58:01,868 [IPC Server handler 18 on 8020] INFO
> namenode.FSNamesystem: Roll Edit Log from SNN
> The following series of events happened on the datanode (datanodeA):
> 2015-04-09 07:52:15,817 [DataXceiver for client
> DFSClient_attempt_1428022041757_102831_r_000107_0_1139131345_1 at /xxxx:51078
> [Receiving block
> BP-595383232-xxxx-1360869396230:blk_1570321882_1102029183867]] INFO
> datanode.DataNode: Receiving
> BP-595383232-xxxx-1360869396230:blk_1570321882_1102029183867 src:
> /client:51078 dest: /datanodeA:1004
> 2015-04-09 07:52:15,969 [PacketResponder:
> BP-595383232-xxxx-1360869396230:blk_1570321882_1102029183867,
> type=HAS_DOWNSTREAM_IN_PIPELINE] INFO DataNode.clienttrace: src:
> /client:51078, dest: /datanodeA:1004, bytes: 20, op:
> HDFS_WRITE, cliID:
> DFSClient_attempt_1428022041757_102831_r_000107_0_1139131345_1, offset: 0,
> srvID: 356a8a98-826f-446d-8f4c-ce288c1f0a75, blockid:
> BP-595383232-xxxx-1360869396230:blk_1570321882_1102029183867, duration:
> 148948385
> 2015-04-09 07:52:15,969 [PacketResponder:
> BP-595383232-xxxx-1360869396230:blk_1570321882_1102029183867,
> type=HAS_DOWNSTREAM_IN_PIPELINE] INFO datanode.DataNode: PacketResponder:
> BP-595383232-xxxx-1360869396230:blk_1570321882_1102029183867,
> type=HAS_DOWNSTREAM_IN_PIPELINE terminating
> 2015-04-09 07:52:25,970 [DataXceiver for client /<datanodeB>:52827
> [Copying block
> BP-595383232-xxxx-1360869396230:blk_1570321882_1102029183867]] INFO
> datanode.DataNode: Copied
> BP-595383232-xxxx-1360869396230:blk_1570321882_1102029183867 to
> <datanodeB>:52827
> 2015-04-09 07:52:28,187 [DataNode: heartbeating to ANN:8020] INFO
> impl.FsDatasetAsyncDiskService: Scheduling blk_1570321882_1102029183867 file
> <path>/blk_1570321882 for deletion
> 2015-04-09 07:52:28,188 [Async disk worker #1482 for volume ] INFO
> impl.FsDatasetAsyncDiskService: Deleted BP-595383232-xxxx-1360869396230
> blk_1570321882_1102029183867 file <path>/blk_1570321882
> Then we failed over for an upgrade, and the standby became active.
> When we ran an ls command on this file, we got the following exception:
> 15/04/09 22:07:39 WARN hdfs.BlockReaderFactory: I/O error constructing remote
> block reader.
> java.io.IOException: Got error for OP_READ_BLOCK, self=/client:32947,
> remote=datanodeA:1004, for file <filename>, for pool
> BP-595383232-xxxx-1360869396230 block 1570321882_1102029183867
>         at 
> org.apache.hadoop.hdfs.RemoteBlockReader2.checkSuccess(RemoteBlockReader2.java:445)
>         at 
> org.apache.hadoop.hdfs.RemoteBlockReader2.newBlockReader(RemoteBlockReader2.java:410)
>         at 
> org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReader(BlockReaderFactory.java:815)
>         at 
> org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:693)
>         at 
> org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:351)
>         at 
> org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:576)
>         at 
> org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:800)
>         at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:847)
>         at java.io.DataInputStream.read(DataInputStream.java:100)
>         at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:78)
>         at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:52)
>         at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:112)
>         at 
> org.apache.hadoop.fs.shell.CopyCommands$Merge.processArguments(CopyCommands.java:97)
>         at 
> org.apache.hadoop.fs.shell.Command.processRawArguments(Command.java:190)
>         at org.apache.hadoop.fs.shell.Command.run(Command.java:154)
>         at org.apache.hadoop.fs.FsShell.run(FsShell.java:287)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
>         at org.apache.hadoop.fs.FsShell.main(FsShell.java:340)
> Eventually this request was served by datanodeB on the second attempt:
> 15/04/09 22:07:39 INFO hdfs.DFSClient: Successfully connected to
> datanodeB:1004 for
> BP-595383232-xxxx-1360869396230:blk_1570321882_1102029183867
> On further analysis of this problem, I found that when datanodeA notified
> the namenode after deleting that block, the then-active NN didn't queue the
> delete request in PendingDNMessages.
> So when the then-standby NN read the edits, it didn't know that the block
> on datanodeA had been deleted.
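For illustration of the client-side symptom in the log excerpt above (the read fails against the stale location datanodeA, then succeeds on datanodeB): the client tries each replica location the namenode reported, in order, until one actually serves the block. The sketch below is a minimal self-contained model of that behavior; all names are hypothetical and do not match the real DFSInputStream internals.

    import java.io.IOException;
    import java.util.List;

    public class ReplicaRetrySketch {

      /** Stand-in for a remote block reader bound to one datanode. */
      interface BlockReader {
        byte[] readBlock() throws IOException;
      }

      /** Stand-in for connecting to a datanode; fails if the replica is gone. */
      interface ReaderFactory {
        BlockReader connect(String datanode) throws IOException;
      }

      /**
       * Try each reported location in order until one serves the block.
       * A stale location (a replica deleted on the datanode but never removed
       * from the namenode's block map, as in this issue) shows up here as an
       * IOException on the first attempt.
       */
      static byte[] readWithFailover(List<String> locations, ReaderFactory factory)
          throws IOException {
        IOException last = null;
        for (String dn : locations) {
          try {
            return factory.connect(dn).readBlock();
          } catch (IOException e) {
            last = e; // e.g. "Got error for OP_READ_BLOCK"; try the next replica
          }
        }
        throw last != null ? last : new IOException("block has no locations");
      }
    }

This is why the read eventually succeeded: the retry masks the stale block map entry, at the cost of an extra round trip and the WARN above.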
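The committed patch touches BlockManager; as illustration only, here is a minimal self-contained sketch (hypothetical names, not the actual HDFS code) of the queue-and-replay idea the analysis above points to: when a namenode receives a block report, including a DELETED_BLOCK, for a block it has not yet learned about from the edit log, it should queue the message and replay it once the corresponding edit arrives, rather than dropping it.

    import java.util.ArrayDeque;
    import java.util.HashMap;
    import java.util.HashSet;
    import java.util.Map;
    import java.util.Queue;
    import java.util.Set;

    public class StandbyBlockStateSketch {
      enum ReportedState { RECEIVED_BLOCK, DELETED_BLOCK }

      static final class DnMessage {
        final String datanode;
        final ReportedState state;
        DnMessage(String datanode, ReportedState state) {
          this.datanode = datanode;
          this.state = state;
        }
      }

      // Blocks this namenode has learned about from the edit log so far.
      private final Set<Long> knownBlocks = new HashSet<>();
      // Current replica locations per block.
      private final Map<Long, Set<String>> locations = new HashMap<>();
      // Reports that arrived before the corresponding edit-log entry.
      private final Map<Long, Queue<DnMessage>> pending = new HashMap<>();

      /** Incremental block report from a datanode. */
      void processReport(long blockId, String datanode, ReportedState state) {
        if (!knownBlocks.contains(blockId)) {
          // The bug described above amounts to dropping DELETED_BLOCK at this
          // point instead of queuing it, so the deletion is lost and the
          // standby keeps the stale location after failover.
          pending.computeIfAbsent(blockId, k -> new ArrayDeque<>())
                 .add(new DnMessage(datanode, state));
          return;
        }
        apply(blockId, datanode, state);
      }

      /** Called when the edit-log tailer sees the addBlock op for blockId. */
      void onAddBlockFromEdits(long blockId) {
        knownBlocks.add(blockId);
        Queue<DnMessage> q = pending.remove(blockId);
        if (q != null) {
          for (DnMessage m : q) {
            apply(blockId, m.datanode, m.state); // replay queued reports in order
          }
        }
      }

      private void apply(long blockId, String datanode, ReportedState state) {
        Set<String> locs = locations.computeIfAbsent(blockId, k -> new HashSet<>());
        if (state == ReportedState.RECEIVED_BLOCK) {
          locs.add(datanode);
        } else {
          locs.remove(datanode); // forget the deleted replica
        }
      }
    }

Queuing per block ID and replaying in arrival order means the final replica set comes out the same whether the datanode's report or the edit-log entry is processed first, which is exactly the property the standby was missing here.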


