[
https://issues.apache.org/jira/browse/HDFS-7939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14364884#comment-14364884
]
J.Andreina commented on HDFS-7939:
----------------------------------
Step 1: NN1 is active, NN2 is standby.
Step 2: Run "hdfs dfsadmin -rollingUpgrade prepare".
Step 3: The active NN1 goes down.
{noformat}
NN1:
-rw-r--r-- 1 Rex users 67 Mar 17 17:35 edits_0000000000000000001-0000000000000000003
-rw-r--r-- 1 Rex users 1048576 Mar 17 17:35 edits_inprogress_0000000000000000004
-rw-r--r-- 1 Rex users 350 Mar 17 17:33 fsimage_0000000000000000000
-rw-r--r-- 1 Rex users 62 Mar 17 17:33 fsimage_0000000000000000000.md5
-rw-r--r-- 1 Rex users 2 Mar 17 17:35 seen_txid
-rw-r--r-- 1 Rex users 206 Mar 17 17:33 VERSION
NN2:
-rw-r--r-- 1 Rex users 1048576 Mar 17 17:38 edits_inprogress_0000000000000000005
-rw-r--r-- 1 Rex users 349 Mar 17 17:37 fsimage_0000000000000000000
-rw-r--r-- 1 Rex users 62 Mar 17 17:37 fsimage_0000000000000000000.md5
-rw-r--r-- 1 Rex users 2 Mar 17 17:37 seen_txid
-rw-r--r-- 1 Rex users 205 Mar 17 17:37 VERSION
{noformat}
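The transaction ids in these listings are encoded in the file names. As a reading aid, here is a minimal Python sketch that splits a name into its kind and txid range. The regular expressions are an assumption inferred from the names shown in this comment, not taken from the HDFS source, and `classify` is a hypothetical helper.

```python
import re

# Name patterns as they appear in the listings in this comment.
# Assumption: inferred from those names, not from the HDFS code.
PATTERNS = [
    ("edits_inprogress", re.compile(r"^edits_inprogress_(\d+)$")),
    ("edits", re.compile(r"^edits_(\d+)-(\d+)$")),
    ("fsimage_rollback", re.compile(r"^fsimage_rollback_(\d+)$")),
    ("fsimage", re.compile(r"^fsimage_(\d+)$")),
]


def classify(name):
    """Return (kind, first_txid, last_txid), or None for any other file."""
    for kind, pattern in PATTERNS:
        match = pattern.match(name)
        if match:
            txids = [int(group) for group in match.groups()]
            # Single-txid names report the same value as first and last.
            return kind, txids[0], txids[-1]
    return None


print(classify("edits_0000000000000000001-0000000000000000003"))
print(classify("edits_inprogress_0000000000000000004"))
```

Files such as VERSION, seen_txid and the .md5 checksums do not match any pattern and yield None.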
Step 4: Restart NN2 with the "-rollingUpgrade started" option. (NN2 created
fsimage_rollback_0000000000000000004, closed txn 5 and became active, but was
not able to upload the image to NN1.)
Step 5: Restart NN1 with the "-rollingUpgrade started" option. (NN1 became standby.)
Issue:
======
NN1 checkpointed one extra txn (id 5) and uploaded a second image,
fsimage_rollback_0000000000000000005, to NN2.
On rollback, NN2 deletes only fsimage_rollback_0000000000000000005 and leaves
fsimage_rollback_0000000000000000004 behind.
{noformat}
NN2:
-rw-r--r-- 1 Rex users 1048576 Mar 17 17:38 edits_0000000000000000005-0000000000000000005
-rw-r--r-- 1 Rex users 1048576 Mar 17 17:39 edits_inprogress_0000000000000000006
-rw-r--r-- 1 Rex users 349 Mar 17 17:37 fsimage_0000000000000000000
-rw-r--r-- 1 Rex users 62 Mar 17 17:37 fsimage_0000000000000000000.md5
-rw-r--r-- 1 Rex users 356 Mar 17 17:39 fsimage_rollback_0000000000000000004
-rw-r--r-- 1 Rex users 71 Mar 17 17:39 fsimage_rollback_0000000000000000004.md5
-rw-r--r-- 1 Rex users 356 Mar 17 17:39 fsimage_rollback_0000000000000000005
-rw-r--r-- 1 Rex users 71 Mar 17 17:39 fsimage_rollback_0000000000000000005.md5
-rw-r--r-- 1 Rex users 2 Mar 17 17:37 seen_txid
-rw-r--r-- 1 Rex users 205 Mar 17 17:39 VERSION
NN1:
-rw-r--r-- 1 Rex users 67 Mar 17 17:38 edits_0000000000000000001-0000000000000000003
-rw-r--r-- 1 Rex users 1048576 Mar 17 17:38 edits_inprogress_0000000000000000004
-rw-r--r-- 1 Rex users 349 Mar 17 17:36 fsimage_0000000000000000000
-rw-r--r-- 1 Rex users 62 Mar 17 17:36 fsimage_0000000000000000000.md5
-rw-r--r-- 1 Rex users 356 Mar 17 17:39 fsimage_rollback_0000000000000000005
-rw-r--r-- 1 Rex users 71 Mar 17 17:39 fsimage_rollback_0000000000000000005.md5
-rw-r--r-- 1 Rex users 2 Mar 17 17:38 seen_txid
-rw-r--r-- 1 Rex users 205 Mar 17 17:39 VERSION
{noformat}
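To make the leftover file easy to spot, the following Python sketch models the behaviour reported in this comment: rollback removes only the rollback image with the highest txid, so any older fsimage_rollback_* image stays behind. This is a model of the observation, not the NameNode's actual cleanup code; `rollback_images` and `leftover_after_rollback` are hypothetical helpers, and the input is the NN2 listing above.

```python
import re

# Matches rollback images only; the .md5 checksum files are skipped.
ROLLBACK_IMAGE = re.compile(r"^fsimage_rollback_(\d+)$")


def rollback_images(names):
    """Map txid -> file name for every fsimage_rollback_* image."""
    found = {}
    for name in names:
        match = ROLLBACK_IMAGE.match(name)
        if match:
            found[int(match.group(1))] = name
    return found


def leftover_after_rollback(names):
    """Assumed model: only the highest-txid rollback image is deleted."""
    images = rollback_images(names)
    if not images:
        return []
    newest = max(images)
    return [images[txid] for txid in sorted(images) if txid != newest]


# File names from the NN2 listing in this comment.
nn2 = [
    "edits_0000000000000000005-0000000000000000005",
    "edits_inprogress_0000000000000000006",
    "fsimage_0000000000000000000",
    "fsimage_0000000000000000000.md5",
    "fsimage_rollback_0000000000000000004",
    "fsimage_rollback_0000000000000000004.md5",
    "fsimage_rollback_0000000000000000005",
    "fsimage_rollback_0000000000000000005.md5",
    "seen_txid",
    "VERSION",
]

print(leftover_after_rollback(nn2))
# prints ['fsimage_rollback_0000000000000000004']
```

A directory with at most one rollback image yields an empty list, which is the outcome one would expect after a clean rollback.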
> Two fsimage_rollback_* files are created which are not deleted after rollback.
> ------------------------------------------------------------------------------
>
> Key: HDFS-7939
> URL: https://issues.apache.org/jira/browse/HDFS-7939
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: J.Andreina
> Assignee: J.Andreina
> Priority: Critical
>
> During a checkpoint, if uploading the image to the remote NameNode fails,
> restarting the NameNode with the "-rollingUpgrade started" option creates two
> fsimage_rollback_* files on the active NameNode.
> On rolling upgrade rollback, the fsimage_rollback_* file that was created
> first is not deleted.