[
https://issues.apache.org/jira/browse/HDFS-15468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Karthik Palanisamy updated HDFS-15468:
--------------------------------------
Description:
If the NameNode is in safe mode and two JournalNodes are restarted for maintenance, the JournalNodes do not finalize the last edit segment, which is still in progress.

That segment is only finalized or recovered by an edit log roll, or when the epoch changes because of a NameNode failover.

In this scenario there is no failover; the NameNode is simply in safe mode. Once it leaves safe mode, the active NameNode crashes.

For example, the currently open segment is edits_inprogress_0000000010356376710, but it is neither recovered nor finalized after the JN2 restart. I think we need to recover the edits after a JournalNode restart.
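The segment name above follows the HDFS edit log naming convention: an open segment is stored as edits_inprogress_<firstTxId>, and finalizing it renames the file to edits_<firstTxId>-<lastTxId>, with transaction IDs zero-padded to 19 digits. The sketch below only reproduces that naming so "finalize" is concrete; `SegmentNames`, `inProgressName` and `finalizedName` are hypothetical helpers, not Hadoop APIs.

{code:java}
// Hypothetical helpers that reproduce the edit log segment file naming.
public class SegmentNames {
    // Name of an open (in-progress) segment, e.g. edits_inprogress_0000000010356376710.
    static String inProgressName(long firstTxId) {
        return String.format("edits_inprogress_%019d", firstTxId);
    }

    // Name the segment gets once it is finalized and its last txid is known.
    static String finalizedName(long firstTxId, long lastTxId) {
        return String.format("edits_%019d-%019d", firstTxId, lastTxId);
    }

    public static void main(String[] args) {
        long first = 10356376710L;
        System.out.println(inProgressName(first));
        // The last txid here is illustrative only.
        System.out.println(finalizedName(first, first));
    }
}
{code}

As long as no roll or recovery happens, the file keeps its in-progress name.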
JournalNode log:
{code:java}
2020-06-20 16:11:53,458 INFO server.Journal (Journal.java:scanStorageForLatestEdits(193)) - Latest log is EditLogFile(file=/hadoop/hdfs/journal/xxx/current/edits_inprogress_0000000010356376710,first=0000000010356376710,last=0000000010356376710,inProgress=true,hasCorruptHeader=false)
2020-06-20 16:19:06,397 INFO ipc.Server (Server.java:logException(2435)) - IPC Server handler 3 on 8485, call org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocol.journal from 10.x.x.x:28444 Call#49083225 Retry#0
org.apache.hadoop.hdfs.qjournal.protocol.JournalOutOfSyncException: Can't write, no segment open
	at org.apache.hadoop.hdfs.qjournal.server.Journal.checkSync(Journal.java:484)
{code}
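The JournalNode error can be reproduced with a toy model, assuming the open segment is tracked only in memory: after a restart the in-progress file is still on disk, but no segment is open, so the next write fails the sync check the way `Journal.checkSync` does in the log. `ToyJournal` is hypothetical; its method names echo the QJournalProtocol calls (`startLogSegment`, `journal`), but the signatures are simplified and are not Hadoop's.

{code:java}
// Toy model (hypothetical, not Hadoop's Journal class) of a JournalNode's
// open-segment state across a restart.
public class ToyJournal {
    // Raised when a write arrives while no segment is open.
    static class OutOfSyncException extends Exception {
        OutOfSyncException(String msg) {
            super(msg);
        }
    }

    // First txid of the open segment; held only in memory, so a restart loses it.
    private Long curSegmentTxId;
    private long lastWrittenTxId = -1;

    void startLogSegment(long txId) {
        curSegmentTxId = txId;
    }

    void journal(long txId) throws OutOfSyncException {
        // Reject writes when no segment is open.
        if (curSegmentTxId == null) {
            throw new OutOfSyncException("Can't write, no segment open");
        }
        lastWrittenTxId = txId; // record the accepted write
    }

    // Convenience wrapper: true if the write was accepted.
    boolean tryJournal(long txId) {
        try {
            journal(txId);
            return true;
        } catch (OutOfSyncException e) {
            return false;
        }
    }

    // Files on disk survive a restart; the in-memory open segment does not.
    void restart() {
        curSegmentTxId = null;
    }

    public static void main(String[] args) {
        ToyJournal jn = new ToyJournal();
        jn.startLogSegment(10356376710L);
        System.out.println(jn.tryJournal(10356376710L)); // true: segment open
        jn.restart();                                     // maintenance restart
        System.out.println(jn.tryJournal(10356376711L)); // false: no segment open
        jn.startLogSegment(10356376711L);                 // e.g. after an edit log roll
        System.out.println(jn.tryJournal(10356376711L)); // true again
    }
}
{code}

In this model only a new `startLogSegment` call clears the error, which matches the report that the segment is only dealt with on a roll or an epoch change.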
NameNode log:
{code:java}
org.apache.hadoop.hdfs.qjournal.client.QuorumException: Got too many exceptions to achieve quorum size 2/3. 1 successful responses:
10.x.x.x:8485: null [success]
2 exceptions thrown:
10.y.y.y:8485: Can't write, no segment open
{code}
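The NameNode-side failure follows from majority-quorum arithmetic. With three JournalNodes, an edit needs 3 / 2 + 1 = 2 acknowledgements (the "2/3" in the log); with two JournalNodes rejecting writes, only one acknowledges, so quorum cannot be reached. The sketch below reproduces only that arithmetic; `QuorumSketch` and its methods are hypothetical and are not the actual QJM client implementation.

{code:java}
// Hypothetical sketch of majority-quorum arithmetic for a write to N JournalNodes.
public class QuorumSketch {
    // Acknowledgements required out of numJournals, e.g. 2 of 3.
    static int majority(int numJournals) {
        return numJournals / 2 + 1;
    }

    // True if the write collected enough successful responses.
    static boolean reachesQuorum(int successes, int numJournals) {
        return successes >= majority(numJournals);
    }

    // True once so many journals have failed that a majority is no longer possible.
    static boolean quorumImpossible(int failures, int numJournals) {
        return failures > numJournals - majority(numJournals);
    }

    public static void main(String[] args) {
        int journals = 3;
        int successes = 1; // the JournalNode that was not restarted
        int failures = 2;  // the two restarted JournalNodes: "no segment open"
        System.out.println("majority = " + majority(journals) + "/" + journals);
        System.out.println("reachesQuorum = " + reachesQuorum(successes, journals));
        System.out.println("quorumImpossible = " + quorumImpossible(failures, journals));
    }
}
{code}

With the numbers from the log it prints majority = 2/3, reachesQuorum = false and quorumImpossible = true: a single healthy JournalNode is not enough to keep the active NameNode writing.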
> Active namenode crashed when no edit recover
> --------------------------------------------
>
> Key: HDFS-15468
> URL: https://issues.apache.org/jira/browse/HDFS-15468
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Affects Versions: 3.0.0
> Reporter: Karthik Palanisamy
> Priority: Critical
--
This message was sent by Atlassian Jira
(v8.3.4#803005)