[ 
https://issues.apache.org/jira/browse/HDFS-4238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13508486#comment-13508486
 ] 

Vinay commented on HDFS-4238:
-----------------------------

Hi Aaron,
Yes, it's the standby that purged the edits from shared storage. The sequence 
of events was as follows.

1. NN1 was Active and NN2 was Standby.
2. NN2 was checkpointing every hour. Each time, saving the namespace 
succeeded, but uploading the image to the Active failed due to a security 
issue. This went on for a long time. As part of {{saveNameSpace()}} in NN2, 
edits were purged from shared storage, including edits not yet present in 
NN1's fsimage. Meanwhile the Active kept running with its old fsimage.
3. After some time, NN1 was restarted and NN2 became Active.
4. NN1, now the Standby, has an out-of-date fsimage, and there is a gap 
between its fsimage and the edits in shared storage. So NN1 cannot do 
tailing or checkpointing.
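
To make the gap in step 4 concrete, here is a minimal sketch. It is not HDFS 
code: the class name, the method name and the transaction IDs are all made up 
for illustration. It only assumes that a NameNode whose fsimage ends at some 
txid needs the very next txid to still be present in shared storage.

```java
// Hypothetical illustration only; none of these names exist in Hadoop.
final class EditLogGapCheck {

    /**
     * Returns true if a NameNode cannot resume replaying edits.
     *
     * @param lastImageTxId      last transaction contained in the local fsimage
     * @param earliestSharedTxId earliest transaction still kept in shared storage
     */
    static boolean hasGap(long lastImageTxId, long earliestSharedTxId) {
        // The next edit to replay is lastImageTxId + 1. If shared storage
        // starts later than that, the edits in between are gone.
        return earliestSharedTxId > lastImageTxId + 1;
    }

    public static void main(String[] args) {
        long nn1ImageTxId = 1_000L;          // assumed: NN1 never received a newer image
        long earliestAfterPurge = 900_001L;  // assumed: NN2 purged everything older

        System.out.println(hasGap(nn1ImageTxId, earliestAfterPurge)); // true
        System.out.println(hasGap(nn1ImageTxId, 1_001L));             // false
    }
}
```

With these assumed numbers NN1 would need txid 1001, but the earliest edit 
left in shared storage is 900001, so it has nothing to tail from.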


My point is that the Standby should not make any modifications to shared storage.
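
Roughly, what I have in mind is a guard like the one below. This is only a 
sketch of the idea, not a patch: {{HAState}}, {{SharedEdits}} and 
{{purgeAfterCheckpoint}} are hypothetical stand-ins, not the real NameNode 
classes, and the txids are invented.

```java
// Hypothetical sketch of the proposed behaviour; not actual HDFS code.
final class PurgeGuard {

    enum HAState { ACTIVE, STANDBY }

    /** Stand-in for the shared edits directory; tracks the earliest retained txid. */
    static final class SharedEdits {
        long earliestTxId = 1L;

        void purgeOlderThan(long minTxIdToKeep) {
            // Drop everything before minTxIdToKeep; never move backwards.
            earliestTxId = Math.max(earliestTxId, minTxIdToKeep);
        }
    }

    /**
     * Purges shared edits after a checkpoint only when this node is Active.
     *
     * @return true if a purge was performed, false if it was skipped
     */
    static boolean purgeAfterCheckpoint(HAState state, SharedEdits shared,
                                        long minTxIdToKeep) {
        if (state != HAState.ACTIVE) {
            // Standby: leave shared storage untouched.
            return false;
        }
        shared.purgeOlderThan(minTxIdToKeep);
        return true;
    }

    public static void main(String[] args) {
        SharedEdits shared = new SharedEdits();
        purgeAfterCheckpoint(HAState.STANDBY, shared, 900_001L);
        System.out.println(shared.earliestTxId); // still 1
        purgeAfterCheckpoint(HAState.ACTIVE, shared, 900_001L);
        System.out.println(shared.earliestTxId); // 900001
    }
}
```

The Standby can still purge its own local name dirs after a checkpoint; the 
guard is only meant for the shared edits.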
                
> [HA] Standby namenode should not do purging of shared storage edits.
> --------------------------------------------------------------------
>
>                 Key: HDFS-4238
>                 URL: https://issues.apache.org/jira/browse/HDFS-4238
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: ha
>    Affects Versions: 3.0.0, 2.0.2-alpha
>            Reporter: Vinay
>
> This happened in our cluster:
> >> The Standby NN kept checkpointing every hour, but uploading to the 
> >> Active NN kept failing due to a Kerberos issue. Nobody noticed, since 
> >> the Active was serving requests properly.
> >> The Active NN was up for a long time with an fsimage containing very 
> >> few transactions.
> >> The Standby NN saved the checkpoint in its name dir and purged the 
> >> txns > 1000000 from shared storage (including edits which are not 
> >> present in the Active NN's fsimage).
> >> After some time the Active NN was restarted and the Standby NN 
> >> switched to Active.
> Now the current Standby is not able to load any edits from shared storage, 
> because the edits it expects are no longer there. It just keeps running idle.
> So {{editLog.purgeLogsOlderThan(purgeLogsFrom);}} should always be called 
> from the Active NameNode.
