[
https://issues.apache.org/jira/browse/HDFS-15468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17172078#comment-17172078
]
Amithsha commented on HDFS-15468:
---------------------------------
This could be caused by losing the JournalNode (JN) quorum. Try restarting the
JournalNodes in a rolling fashion.
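
As a rough illustration of the quorum arithmetic behind that suggestion, here
is a minimal sketch. It assumes a three-JN ensemble, which is what the "2/3"
in the NameNode log quoted below suggests. The class and method names are
hypothetical and are not Hadoop APIs.

```java
final class JournalQuorum {
    // Smallest number of journal nodes that forms a strict majority.
    static int majority(int totalJournalNodes) {
        return totalJournalNodes / 2 + 1;
    }

    // True when enough journal nodes can still accept a write.
    static boolean hasQuorum(int totalJournalNodes, int writableJournalNodes) {
        return writableJournalNodes >= majority(totalJournalNodes);
    }

    public static void main(String[] args) {
        int total = 3; // assumption: three JNs, as implied by "quorum size 2/3"
        System.out.println("majority of " + total + " = " + majority(total));
        // Two JNs restarted together: only one node can still accept writes.
        System.out.println("two restarted together: " + hasQuorum(total, total - 2));
        // One JN restarted at a time: two nodes can still accept writes.
        System.out.println("one restarted at a time: " + hasQuorum(total, total - 1));
    }
}
```

With three JNs the majority is two, so a write can tolerate one node that
rejects it but not two.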
> Active namenode crashed when no edit recover
> --------------------------------------------
>
> Key: HDFS-15468
> URL: https://issues.apache.org/jira/browse/HDFS-15468
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: ha, journal-node, namenode
> Affects Versions: 3.0.0
> Reporter: Karthik Palanisamy
> Priority: Critical
>
> If the NameNode is in safe mode and two journal nodes are restarted for
> maintenance, the restarted journal nodes do not finalize the last edit
> segment, which is still in progress.
> That segment is finalized or recovered only when the edit log is rolled or
> when the epoch changes because of a NameNode failover.
> In this scenario there is no failover; the NameNode is simply in safe mode.
> If we then leave safe mode, the active NameNode crashes.
> For example, the current open segment is
> edits_inprogress_0000000010356376710, but it is not recovered or finalized
> after the JN2 restart. I think we need to recover the edits after a JN
> restart.
> {code:java}
> Journal node
> 2020-06-20 16:11:53,458 INFO server.Journal
> (Journal.java:scanStorageForLatestEdits(193)) - Latest log is
> EditLogFile(file=/hadoop/hdfs/journal/xxx/current/edits_inprogress_0000000010356376710,first=0000000010356376710,last=0000000010356376710,inProgress=true,hasCorruptHeader=false)
> 2020-06-20 16:19:06,397 INFO ipc.Server (Server.java:logException(2435)) -
> IPC Server handler 3 on 8485, call
> org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocol.journal from
> 10.x.x.x:28444 Call#49083225 Retry#0
> org.apache.hadoop.hdfs.qjournal.protocol.JournalOutOfSyncException: Can't
> write, no segment open
> at
> org.apache.hadoop.hdfs.qjournal.server.Journal.checkSync(Journal.java:484)
> {code}
> {code:java}
> Namenode log:
> org.apache.hadoop.hdfs.qjournal.client.QuorumException: Got too many
> exceptions to achieve quorum size 2/3. 1 successful responses:
> 10.x.x.x:8485: null [success]
> 2 exceptions thrown:
> 10.y.y.y:8485: Can't write, no segment open
> {code}
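
The journal node log above identifies the open segment by its file name. As a
minimal sketch of how such names can be told apart, the code below assumes the
usual HDFS naming scheme, where an in-progress segment is named
edits_inprogress_<firstTxId> and a finalized one edits_<firstTxId>-<lastTxId>.
The class and helper names are hypothetical; this is not the Hadoop
implementation.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class EditSegmentName {
    // Assumed naming scheme: edits_inprogress_<firstTxId>
    private static final Pattern IN_PROGRESS =
            Pattern.compile("edits_inprogress_(\\d+)");
    // Assumed naming scheme: edits_<firstTxId>-<lastTxId>
    private static final Pattern FINALIZED =
            Pattern.compile("edits_(\\d+)-(\\d+)");

    static boolean isInProgress(String fileName) {
        return IN_PROGRESS.matcher(fileName).matches();
    }

    static boolean isFinalized(String fileName) {
        return FINALIZED.matcher(fileName).matches();
    }

    // First transaction id encoded in the name; leading zeros are ignored.
    static long firstTxId(String fileName) {
        Matcher m = IN_PROGRESS.matcher(fileName);
        if (m.matches()) {
            return Long.parseLong(m.group(1));
        }
        m = FINALIZED.matcher(fileName);
        if (m.matches()) {
            return Long.parseLong(m.group(1));
        }
        throw new IllegalArgumentException("not an edit segment name: " + fileName);
    }

    public static void main(String[] args) {
        // Segment name taken from the journal node log in the issue.
        String name = "edits_inprogress_0000000010356376710";
        System.out.println(name + " inProgress=" + isInProgress(name)
                + " firstTxId=" + firstTxId(name));
    }
}
```

A check of this kind on a journal node's current directory would show whether
the last segment was left in progress after a restart.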
--
This message was sent by Atlassian Jira
(v8.3.4#803005)