[ https://issues.apache.org/jira/browse/HDFS-7820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14341087#comment-14341087 ]
Arpit Agarwal commented on HDFS-7820:
-------------------------------------
Thank you for reporting this issue [~andreina]. This looks like an unfortunate
interaction of rollback with sequential block IDs. Your fix will work for most
deployments but I don't like adding a hard-coded 10M increment. Let me think
about this some more.
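
For readers following along, here is a toy model of how I read the interaction. It is not HDFS code: the class names, the `simulate` helper, and the `skip_after_rollback` parameter are all made up for illustration, and the starting ID is simply taken from the block IDs in the report. It assumes the rollback restores the ID counter to its value at "prepare" time while the DataNode still holds one finalized replica that was never invalidated.

```python
class ReplicaAlreadyExists(Exception):
    """Toy stand-in for the DataNode's ReplicaAlreadyExistsException."""


class SequentialIdGenerator:
    """Hypothetical sequential block ID allocator (not the HDFS class)."""

    def __init__(self, last_id):
        self.last_id = last_id

    def next_id(self):
        self.last_id += 1
        return self.last_id


class ToyDataNode:
    """Tracks finalized block IDs only; everything else is ignored."""

    def __init__(self):
        self.finalized = set()

    def write(self, block_id):
        if block_id in self.finalized:
            raise ReplicaAlreadyExists(block_id)
        self.finalized.add(block_id)

    def invalidate(self, block_id):
        self.finalized.discard(block_id)


def simulate(skip_after_rollback=0):
    gen = SequentialIdGenerator(1073741830)
    dn = ToyDataNode()

    # Counter value captured when the rolling upgrade is prepared.
    checkpoint = gen.last_id

    # Three blocks are written during the upgrade window.
    for _ in range(3):
        dn.write(gen.next_id())

    # Rollback: the counter returns to the checkpoint, optionally
    # bumped by a fixed increment (the workaround under discussion).
    gen.last_id = checkpoint + skip_after_rollback

    # Only two of the three replicas get invalidated, as in the report.
    dn.invalidate(1073741831)
    dn.invalidate(1073741833)

    # Two more writes after the rollback.
    results = []
    for _ in range(2):
        block_id = gen.next_id()
        try:
            dn.write(block_id)
            results.append((block_id, "ok"))
        except ReplicaAlreadyExists:
            results.append((block_id, "already exists"))
    return results
```

With no increment, the second write after rollback reuses blk_1073741832 and collides with the surviving replica. With `simulate(10_000_000)` both writes succeed, but only because the fixed jump happens to clear every ID handed out during the upgrade window, which is the part I am not comfortable hard-coding.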
> Client Write fails after rolling upgrade rollback with "<block_id> already
> exist in finalized state"
> ----------------------------------------------------------------------------------------------------
>
> Key: HDFS-7820
> URL: https://issues.apache.org/jira/browse/HDFS-7820
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: J.Andreina
> Assignee: J.Andreina
> Attachments: HDFS-7820.1.patch
>
>
> Steps to Reproduce:
> ===================
> Step 1: Prepare the rolling upgrade using "hdfs dfsadmin -rollingUpgrade prepare".
> Step 2: Shut down the SNN and NN.
> Step 3: Start the NN with the "hdfs namenode -rollingUpgrade started" option.
> Step 4: Execute "hdfs dfsadmin -shutdownDatanode <DATANODE_HOST:IPC_PORT>
> upgrade" and restart the DataNode.
> Step 5: Write 3 files to HDFS (block IDs assigned: blk_1073741831_1007,
> blk_1073741832_1008, blk_1073741833_1009).
> Step 6: Shut down both the NN and DN.
> Step 7: Start the NNs with the "hdfs namenode -rollingUpgrade rollback" option.
> Start the DNs with the "-rollback" option.
> Step 8: Write 2 files to HDFS.
> Issue:
> =======
> The client write failed with the exception below:
> {noformat}
> 2015-02-23 16:00:12,896 INFO org.apache.hadoop.hdfs.server.datanode.DataNode:
> Receiving BP-1837556285-XXXXXXXXXXX-1423130389269:blk_1073741832_1008 src:
> /XXXXXXXXXXX:48545 dest: /XXXXXXXXXXX:50010
> 2015-02-23 16:00:12,897 INFO org.apache.hadoop.hdfs.server.datanode.DataNode:
> opWriteBlock BP-1837556285-XXXXXXXXXXX-1423130389269:blk_1073741832_1008
> received exception
> org.apache.hadoop.hdfs.server.datanode.ReplicaAlreadyExistsException: Block
> BP-1837556285-XXXXXXXXXXX-1423130389269:blk_1073741832_1008 already exists in
> state FINALIZED and thus cannot be created.
> {noformat}
> Observations:
> =============
> 1. On the NameNode side, block invalidation was sent for only 2 blocks.
> {noformat}
> 15/02/23 14:59:56 INFO BlockStateChange: BLOCK* InvalidateBlocks: add
> blk_1073741833_1009 to XXXXXXXXXXX:50010
> 15/02/23 14:59:56 INFO BlockStateChange: BLOCK* InvalidateBlocks: add
> blk_1073741831_1007 to XXXXXXXXXXX:50010
> {noformat}
> 2. The fsck report shows no information for blk_1073741832_1008.
> {noformat}
> FSCK started by Rex (auth:SIMPLE) from /XXXXXXXXXXX for path / at Mon Feb 23
> 16:17:57 CST 2015
> /File1: Under replicated
> BP-1837556285-XXXXXXXXXXX-1423130389269:blk_1073741825_1001. Target Replicas
> is 3 but found 1 replica(s).
> /File11: Under replicated
> BP-1837556285-XXXXXXXXXXX-1423130389269:blk_1073741827_1003. Target Replicas
> is 3 but found 1 replica(s).
> /File2: Under replicated
> BP-1837556285-XXXXXXXXXXX-1423130389269:blk_1073741826_1002. Target Replicas
> is 3 but found 1 replica(s).
> /AfterRollback_2: Under replicated
> BP-1837556285-XXXXXXXXXXX-1423130389269:blk_1073741831_1007. Target Replicas
> is 3 but found 1 replica(s).
> /Test1: Under replicated
> BP-1837556285-XXXXXXXXXXX-1423130389269:blk_1073741828_1004. Target Replicas
> is 3 but found 1 replica(s).
> Status: HEALTHY
> Total size: 31620 B
> Total dirs: 7
> Total files: 6
> Total symlinks: 0
> Total blocks (validated): 5 (avg. block size 6324 B)
> Minimally replicated blocks: 5 (100.0 %)
> Over-replicated blocks: 0 (0.0 %)
> Under-replicated blocks: 5 (100.0 %)
> Mis-replicated blocks: 0 (0.0 %)
> Default replication factor: 3
> Average block replication: 1.0
> Corrupt blocks: 0
> Missing replicas: 10 (66.666664 %)
> Number of data-nodes: 1
> Number of racks: 1
> FSCK ended at Mon Feb 23 16:17:57 CST 2015 in 3 milliseconds
> {noformat}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)