[ 
https://issues.apache.org/jira/browse/HDFS-15468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Palanisamy updated HDFS-15468:
--------------------------------------
    Description: 
 If the NameNode is in safe mode and two JournalNodes are restarted for a
maintenance activity, the restarted JournalNodes do not finalize the last edit
segment, which is still in progress.

 The last edit segment is finalized or recovered either when the edit log is
rolled or when the epoch changes because of a NameNode failover.

 In this scenario there is no failover; the NameNode is simply in safe mode.
When it leaves safe mode, the active NameNode crashes.

 For example, the current open segment is edits_inprogress_0000000010356376710,
but it is not recovered or finalized after the JN2 restart. I think we need to
recover the edits after a JN restart.
  
  
  
{code:java}
Journal node log:
2020-06-20 16:11:53,458 INFO  server.Journal (Journal.java:scanStorageForLatestEdits(193)) - Latest log is EditLogFile(file=/hadoop/hdfs/journal/xxx/current/edits_inprogress_0000000010356376710,first=0000000010356376710,last=0000000010356376710,inProgress=true,hasCorruptHeader=false)
2020-06-20 16:19:06,397 INFO  ipc.Server (Server.java:logException(2435)) - IPC Server handler 3 on 8485, call org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocol.journal from 10.x.x.x:28444 Call#49083225 Retry#0
org.apache.hadoop.hdfs.qjournal.protocol.JournalOutOfSyncException: Can't write, no segment open
        at org.apache.hadoop.hdfs.qjournal.server.Journal.checkSync(Journal.java:484)
{code}
{code:java}
Namenode log:
org.apache.hadoop.hdfs.qjournal.client.QuorumException: Got too many exceptions to achieve quorum size 2/3. 1 successful responses:
10.x.x.x:8485: null [success]
2 exceptions thrown:
10.y.y.y:8485: Can't write, no segment open
{code}
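
 To illustrate why the write fails once two of the three JournalNodes reject
it, here is a minimal sketch of the majority arithmetic behind the "quorum size
2/3" message. The class and method names (QuorumCheck, majority,
quorumUnreachable) are hypothetical and are not taken from the Hadoop source;
this only models the counting, not the actual quorum client code.
{code:java}
// Hypothetical sketch: models only the majority arithmetic, not Hadoop's code.
public class QuorumCheck {

    /** Number of successful responses needed among n JournalNodes. */
    static int majority(int n) {
        return n / 2 + 1;
    }

    /**
     * True when so many JournalNodes have failed that the remaining ones
     * can no longer add up to a majority.
     */
    static boolean quorumUnreachable(int total, int failures) {
        return failures > total - majority(total);
    }

    public static void main(String[] args) {
        int total = 3;     // three JournalNodes, as in the log above
        int failures = 2;  // "2 exceptions thrown"

        System.out.println("majority needed: " + majority(total));
        System.out.println("quorum unreachable: "
                + quorumUnreachable(total, failures));
    }
}
{code}
 With three JournalNodes the write needs two successes, so a single restarted
JournalNode with no open segment would be tolerated, while two are not.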
 
  

  was:
 if namenode is under safe mode and let restart two journal node for 
maintenance activity.
 
In this case, the journal node will not finalize the last edit segment which is 
edit in-progress.
 
This last edit segment will be finalized or recovered when edit rolling 
operation else when epoch change due to namenode failover.
 
But the current scenario is no failover, just namenode is under safe mode. If 
we leave the safe mode then active namenode will crash.
 
Ie.
the current open segment is edits_inprogress_0000000010356376710 but it is not 
recovered or finalized post JN2 restart. I think we need to recover the edits 
after JN restart. 
 
 
 
{code:java}
Journal node 
2020-06-20 16:11:53,458 INFO  server.Journal 
(Journal.java:scanStorageForLatestEdits(193)) - Latest log is 
EditLogFile(file=/hadoop/hdfs/journal/PRODNNHA/current/edits_inprogress_0000000010356376710,first=0000000010356376710,last=0000000010356376710,inProgress=true,hasCorruptHeader=false)
2020-06-20 16:19:06,397 INFO  ipc.Server (Server.java:logException(2435)) - IPC 
Server handler 3 on 8485, call 
org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocol.journal from 
10.x.x.x:28444 Call#49083225 Retry#0
org.apache.hadoop.hdfs.qjournal.protocol.JournalOutOfSyncException: Can't 
write, no segment open
        at 
org.apache.hadoop.hdfs.qjournal.server.Journal.checkSync(Journal.java:484)
{code}
{code}
{code:java}
Namenode log:
org.apache.hadoop.hdfs.qjournal.client.QuorumException: Got too many exceptions 
to achieve quorum size 2/3. 1 successful responses:
10.x.x.x:8485: null [success]
2 exceptions thrown:
10.y.y.y:8485: Can't write, no segment open
{code}
 
 


> Active namenode crashed when no edit recover
> --------------------------------------------
>
>                 Key: HDFS-15468
>                 URL: https://issues.apache.org/jira/browse/HDFS-15468
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 3.0.0
>            Reporter: Karthik Palanisamy
>            Priority: Critical



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
