[ https://issues.apache.org/jira/browse/HDFS-7934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14362832#comment-14362832 ]
J.Andreina commented on HDFS-7934:
----------------------------------
Steps to Reproduce:
=================
Step 1: Start NN1 as active and NN2 as standby.
Step 2: Run "hdfs dfsadmin -rollingUpgrade prepare".
Step 3: Start NN2 as active and NN1 as standby with the rolling upgrade "started" option.
Step 4: Restart the DN in upgrade mode as well.
{noformat}
NN2 active:
-rw-r--r-- 1 Rex users 1048576 Mar 13 17:36 edits_inprogress_0000000000000000031
-rw-r--r-- 1 Rex users 350 Mar 13 17:33 fsimage_0000000000000000000
-rw-r--r-- 1 Rex users 62 Mar 13 17:33 fsimage_0000000000000000000.md5
-rw-r--r-- 1 Rex users 622 Mar 13 17:36 fsimage_rollback_0000000000000000029
-rw-r--r-- 1 Rex users 71 Mar 13 17:36 fsimage_rollback_0000000000000000029.md5
-rw-r--r-- 1 Rex users 2 Mar 13 17:33 seen_txid
-rw-r--r-- 1 Rex users 206 Mar 13 17:36 VERSION
{noformat}
Step 5: Shut down NN2 (active).
Step 6: Write files.
{noformat}
NN1 active:
-rw-r--r-- 1 Rex users 1817 Mar 13 17:35 edits_0000000000000000001-0000000000000000026
-rw-r--r-- 1 Rex users 67 Mar 13 17:35 edits_0000000000000000027-0000000000000000029
-rw-r--r-- 1 Rex users 1048576 Mar 13 17:35 edits_0000000000000000030-0000000000000000030
-rw-r--r-- 1 Rex users 1048576 Mar 13 17:39 edits_inprogress_0000000000000000032
-rw-r--r-- 1 Rex users 350 Mar 13 17:32 fsimage_0000000000000000000
-rw-r--r-- 1 Rex users 62 Mar 13 17:32 fsimage_0000000000000000000.md5
-rw-r--r-- 1 Rex users 622 Mar 13 17:36 fsimage_rollback_0000000000000000029
-rw-r--r-- 1 Rex users 71 Mar 13 17:36 fsimage_rollback_0000000000000000029.md5
-rw-r--r-- 1 Rex users 3 Mar 13 17:35 seen_txid
-rw-r--r-- 1 Rex users 206 Mar 13 17:32 VERSION
{noformat}
Step 7: Bring down both NNs.
Step 8: Start NN2 and NN1 with the rolling upgrade rollback option.
Issue:
======
NN2 started successfully as active, but startup of NN1 as standby failed with the
following exception:
{noformat}
15/03/13 17:41:30 ERROR namenode.NameNode: Failed to start namenode.
java.io.IOException: Gap in transactions. Expected to be able to read up until at least txid 31 but unable to find any edit logs containing txid 31
    at org.apache.hadoop.hdfs.server.namenode.FSEditLog.checkForGaps(FSEditLog.java:1617)
    at org.apache.hadoop.hdfs.server.namenode.FSEditLog.selectInputStreams(FSEditLog.java:1575)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:647)
{noformat}
{noformat}
NN2 active:
-rw-r--r-- 1 Rex users 1048576 Mar 13 17:36 edits_0000000000000000031-0000000000000000031.trash
-rw-r--r-- 1 Rex users 1048576 Mar 13 17:40 edits_inprogress_0000000000000000030
-rw-r--r-- 1 Rex users 350 Mar 13 17:33 fsimage_0000000000000000000
-rw-r--r-- 1 Rex users 62 Mar 13 17:33 fsimage_0000000000000000000.md5
-rw-r--r-- 1 Rex users 622 Mar 13 17:36 fsimage_0000000000000000029
-rw-r--r-- 1 Rex users 62 Mar 13 17:40 fsimage_0000000000000000029.md5
-rw-r--r-- 1 Rex users 2 Mar 13 17:33 seen_txid
-rw-r--r-- 1 Rex users 206 Mar 13 17:40 VERSION
{noformat}
{noformat}
NN1 standby:
-rw-r--r-- 1 Rex users 1817 Mar 13 17:35 edits_0000000000000000001-0000000000000000026
-rw-r--r-- 1 Rex users 67 Mar 13 17:35 edits_0000000000000000027-0000000000000000029
-rw-r--r-- 1 Rex users 1048576 Mar 13 17:35 edits_0000000000000000030-0000000000000000030
-rw-r--r-- 1 Rex users 1048576 Mar 13 17:39 edits_0000000000000000032-0000000000000000062
-rw-r--r-- 1 Rex users 350 Mar 13 17:32 fsimage_0000000000000000000
-rw-r--r-- 1 Rex users 62 Mar 13 17:32 fsimage_0000000000000000000.md5
-rw-r--r-- 1 Rex users 622 Mar 13 17:36 fsimage_rollback_0000000000000000029
-rw-r--r-- 1 Rex users 71 Mar 13 17:36 fsimage_rollback_0000000000000000029.md5
-rw-r--r-- 1 Rex users 3 Mar 13 17:35 seen_txid
-rw-r--r-- 1 Rex users 206 Mar 13 17:32 VERSION
{noformat}
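To make the gap visible, here is a minimal sketch that scans the finalized edit segment names from the NN1 standby listing and reports the first txid that no segment covers. It is only an illustration, not the logic of FSEditLog.checkForGaps: the helper names (parse_segments, first_missing_txid) are hypothetical, the starting txid of 30 is an assumption (the rollback image is at txid 29), and the upper bound of 31 is taken from the exception message.

```python
import re

# Finalized edit segments are named edits_<firstTxId>-<lastTxId>.
# Files such as fsimage_* or *.trash deliberately do not match.
SEGMENT_RE = re.compile(r"^edits_(\d+)-(\d+)$")


def parse_segments(names):
    """Return sorted (first, last) txid ranges for finalized edit segments."""
    ranges = []
    for name in names:
        m = SEGMENT_RE.match(name)
        if m:
            ranges.append((int(m.group(1)), int(m.group(2))))
    return sorted(ranges)


def first_missing_txid(ranges, from_txid, to_txid):
    """Return the first txid in [from_txid, to_txid] that no range covers, or None."""
    expected = from_txid
    for first, last in ranges:
        if last < expected:
            continue          # segment lies entirely before the txid we need
        if first > expected:
            break             # segment starts too late: 'expected' is missing
        expected = last + 1   # segment covers 'expected'; advance past it
    return expected if expected <= to_txid else None


# File names copied from the NN1 standby listing above.
nn1_files = [
    "edits_0000000000000000001-0000000000000000026",
    "edits_0000000000000000027-0000000000000000029",
    "edits_0000000000000000030-0000000000000000030",
    "edits_0000000000000000032-0000000000000000062",
    "fsimage_rollback_0000000000000000029",
]

segments = parse_segments(nn1_files)
# Assumption: replay starts at 30 (image txid 29 + 1) and must reach txid 31.
missing = first_missing_txid(segments, from_txid=30, to_txid=31)
print(missing)
```

Under these assumptions the sketch reports txid 31, the same txid named in the exception: NN1 has segments ending at 30 and starting at 32, and the only segment containing 31 is the one NN2 renamed to .trash.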
> During Rolling upgrade rollback, standby namenode startup fails.
> ----------------------------------------------------------------
>
> Key: HDFS-7934
> URL: https://issues.apache.org/jira/browse/HDFS-7934
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: J.Andreina
> Assignee: J.Andreina
> Priority: Critical
>
> During rolling upgrade rollback, standby namenode startup fails while
> loading edits when there is no local copy of the edits created after the
> upgrade (which have already been removed by the Active Namenode from the
> journal manager and from the Active's local storage).