[ 
https://issues.apache.org/jira/browse/HDFS-7820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

J.Andreina updated HDFS-7820:
-----------------------------
    Attachment: HDFS-7820.1.patch

I have attached a patch that increments the block ID value by 10 million 
when RollingUpgradeStartupOption=ROLLBACK. 

This avoids client write failures immediately after rollback, which occur when 
a new block is assigned the same block ID as a block written before the 
rollback that is still in the FINALIZED state on the DataNode (such blocks are 
deleted only after the second block report).
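
To illustrate the idea behind the skip described above, here is a minimal sketch in Python. It is a simplified model, not the patch and not the HDFS code: the class `BlockIdAllocator`, the function `write_block`, and the constant name `ROLLBACK_SKIP` are hypothetical, and the starting ID 1073741830 is assumed from the block IDs quoted in the issue description.

```python
# Simplified model of sequential block ID allocation; NOT the HDFS implementation.
ROLLBACK_SKIP = 10_000_000  # gap applied when starting with the ROLLBACK option


class BlockIdAllocator:
    """Hypothetical stand-in for the NameNode's sequential block ID generator."""

    def __init__(self, last_allocated_id, rollback=False):
        # On rollback, jump ahead so that new IDs cannot match replicas the
        # DataNodes still hold from before the rollback.
        self.last_id = last_allocated_id + (ROLLBACK_SKIP if rollback else 0)

    def next_id(self):
        self.last_id += 1
        return self.last_id


def write_block(finalized_on_datanode, block_id):
    """Mimic a DataNode refusing a block ID it already holds as FINALIZED."""
    if block_id in finalized_on_datanode:
        raise ValueError(f"blk_{block_id} already exists in state FINALIZED")
    finalized_on_datanode.add(block_id)
```

In this model, an allocator built with `BlockIdAllocator(1073741830)` hands out 1073741831 first, which `write_block` rejects if the DataNode set still contains it. With `rollback=True` the first ID is 1083741831, far outside the range of the stale replicas.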

Please review the patch and give your feedback. 


> Client Write fails after rolling upgrade operation with "<block_id> already 
> exist in finalized state"
> -----------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-7820
>                 URL: https://issues.apache.org/jira/browse/HDFS-7820
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: J.Andreina
>            Assignee: J.Andreina
>         Attachments: HDFS-7820.1.patch
>
>
> Steps to Reproduce:
> ===================
> Step 1:  Prepare rolling upgrade using "hdfs dfsadmin -rollingUpgrade prepare"
> Step 2:  Shut down SNN and NN
> Step 3:  Start NN with the "hdfs namenode -rollingUpgrade started" option.
> Step 4:  Execute "hdfs dfsadmin -shutdownDatanode <DATANODE_HOST:IPC_PORT> 
> upgrade" and restart the Datanode
> Step 5:  Write 3 files to HDFS (block IDs assigned: blk_1073741831_1007, 
> blk_1073741832_1008, blk_1073741833_1009)
> Step 6:  Shut down both NN and DN
> Step 7:  Start NNs with the "hdfs namenode -rollingUpgrade rollback" option.
>          Start DNs with the "-rollback" option.
> Step 8:  Write 2 files to HDFS.
> Issue:
> =======
> Client write failed with the exception below:
> {noformat}
> 2015-02-23 16:00:12,896 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Receiving BP-1837556285-XXXXXXXXXXX-1423130389269:blk_1073741832_1008 src: 
> /XXXXXXXXXXX:48545 dest: /XXXXXXXXXXX:50010
> 2015-02-23 16:00:12,897 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> opWriteBlock BP-1837556285-XXXXXXXXXXX-1423130389269:blk_1073741832_1008 
> received exception 
> org.apache.hadoop.hdfs.server.datanode.ReplicaAlreadyExistsException: Block 
> BP-1837556285-XXXXXXXXXXX-1423130389269:blk_1073741832_1008 already exists in 
> state FINALIZED and thus cannot be created.
> {noformat}
> Observations:
> =============
> 1. On the Namenode side, block invalidation is sent for only 2 of the blocks.
> {noformat}
> 15/02/23 14:59:56 INFO BlockStateChange: BLOCK* InvalidateBlocks: add 
> blk_1073741833_1009 to XXXXXXXXXXX:50010
> 15/02/23 14:59:56 INFO BlockStateChange: BLOCK* InvalidateBlocks: add 
> blk_1073741831_1007 to XXXXXXXXXXX:50010
> {noformat}
> 2. fsck report does not show information on blk_1073741832_1008
> {noformat}
> FSCK started by Rex (auth:SIMPLE) from /XXXXXXXXXXX for path / at Mon Feb 23 
> 16:17:57 CST 2015
> /File1:  Under replicated 
> BP-1837556285-XXXXXXXXXXX-1423130389269:blk_1073741825_1001. Target Replicas 
> is 3 but found 1 replica(s).
> /File11:  Under replicated 
> BP-1837556285-XXXXXXXXXXX-1423130389269:blk_1073741827_1003. Target Replicas 
> is 3 but found 1 replica(s).
> /File2:  Under replicated 
> BP-1837556285-XXXXXXXXXXX-1423130389269:blk_1073741826_1002. Target Replicas 
> is 3 but found 1 replica(s).
> /AfterRollback_2:  Under replicated 
> BP-1837556285-XXXXXXXXXXX-1423130389269:blk_1073741831_1007. Target Replicas 
> is 3 but found 1 replica(s).
> /Test1:  Under replicated 
> BP-1837556285-XXXXXXXXXXX-1423130389269:blk_1073741828_1004. Target Replicas 
> is 3 but found 1 replica(s).
> Status: HEALTHY
>  Total size:    31620 B
>  Total dirs:    7
>  Total files:   6
>  Total symlinks:                0
>  Total blocks (validated):      5 (avg. block size 6324 B)
>  Minimally replicated blocks:   5 (100.0 %)
>  Over-replicated blocks:        0 (0.0 %)
>  Under-replicated blocks:       5 (100.0 %)
>  Mis-replicated blocks:         0 (0.0 %)
>  Default replication factor:    3
>  Average block replication:     1.0
>  Corrupt blocks:                0
>  Missing replicas:              10 (66.666664 %)
>  Number of data-nodes:          1
>  Number of racks:               1
> FSCK ended at Mon Feb 23 16:17:57 CST 2015 in 3 milliseconds
> {noformat}
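
Observation 1 in the quoted description can be checked mechanically by comparing the block IDs written in Step 5 with those named in the InvalidateBlocks log lines. The following is a rough sketch only; the helper names `block_ids` and `never_invalidated` are hypothetical, and the regular expression assumes the `blk_<id>_<genstamp>` naming seen in the logs above.

```python
import re

# Matches names such as blk_1073741832_1008 (block ID, then generation stamp).
BLOCK_RE = re.compile(r"blk_(\d+)_(\d+)")


def block_ids(text):
    """Return the set of block IDs (generation stamps dropped) named in text."""
    return {int(match.group(1)) for match in BLOCK_RE.finditer(text)}


def never_invalidated(written_text, invalidate_log):
    """Block IDs that were written but never appear in the invalidate log."""
    return block_ids(written_text) - block_ids(invalidate_log)


written = "blk_1073741831_1007, blk_1073741832_1008,blk_1073741833_1009"
namenode_log = """
BLOCK* InvalidateBlocks: add blk_1073741833_1009 to XXXXXXXXXXX:50010
BLOCK* InvalidateBlocks: add blk_1073741831_1007 to XXXXXXXXXXX:50010
"""
leftover = never_invalidated(written, namenode_log)
```

With the inputs copied from the description, `leftover` contains only 1073741832, the block ID named in the ReplicaAlreadyExistsException.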



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
