[ https://issues.apache.org/jira/browse/HDFS-2693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13172886#comment-13172886 ]
Todd Lipcon commented on HDFS-2693:
-----------------------------------
Found a nasty bug in the current patch: in {{getBlockLocationsUpdateTimes}} the
{{checkOperation}} call has to move down inside {{try..catch}} or else calling
this function on an NN in standby state will leak the FSN readlock, killing the
SBN. I'll upload a new patch soon. Still worth reviewing the general patch
since most of it seems to be working correctly.
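
To illustrate the ordering problem described above, here is a minimal sketch, not the actual FSNamesystem code. The class name, the {{standby}} flag and the method bodies are hypothetical stand-ins; only the idea of a {{checkOperation}} call that throws in standby state comes from the comment. It assumes the read lock behaves like a java.util.concurrent {{ReentrantReadWriteLock}}.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for the namesystem; not the real HDFS class.
class ReadLockLeakSketch {
    private final ReentrantReadWriteLock fsLock = new ReentrantReadWriteLock();
    private volatile boolean standby = true; // assumed HA state flag

    // Stand-in for checkOperation: rejects the call while in standby.
    private void checkOperation() {
        if (standby) {
            throw new IllegalStateException("operation not allowed in standby state");
        }
    }

    // Buggy ordering: the check runs after lock() but before the try block,
    // so when it throws, the finally clause never runs and the lock stays held.
    String leakyRead() {
        fsLock.readLock().lock();
        checkOperation();
        try {
            return "block locations";
        } finally {
            fsLock.readLock().unlock();
        }
    }

    // Fixed ordering: the check is inside the try block, so the finally
    // clause releases the read lock even when the check throws.
    String safeRead() {
        fsLock.readLock().lock();
        try {
            checkOperation();
            return "block locations";
        } finally {
            fsLock.readLock().unlock();
        }
    }

    // Number of read locks currently held, used to observe the leak.
    int heldReadLocks() {
        return fsLock.getReadLockCount();
    }
}
```

With a lock of this kind, a read lock that is never released blocks every later attempt to take the write lock, which is one plausible way a single rejected call could stall the standby.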
> Synchronization issues around state transition
> ----------------------------------------------
>
> Key: HDFS-2693
> URL: https://issues.apache.org/jira/browse/HDFS-2693
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: ha, name-node
> Affects Versions: HA branch (HDFS-1623)
> Reporter: Todd Lipcon
> Assignee: Todd Lipcon
> Priority: Critical
> Attachments: hdfs-2693.txt
>
>
> Currently when the NN changes state, it does so without synchronization. In
> particular, the state transition function does:
> (1) leave old state
> (2) change state variable
> (3) enter new state
> This means that the NN is marked as "active" before it has actually
> transitioned to active mode and opened its edit logs. This gives a window
> where write transactions can come in and the {{checkOperation}} allows them,
> but then they fail because the edit log is not yet opened.
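
The window described above can be closed by running all three transition steps under a lock that client operations also take. The following is a rough sketch of that idea under stated assumptions, not the contents of the attached patch: {{StateTransitionSketch}}, {{HAState}}, {{logEdit}} and the in-memory edit list are all hypothetical names introduced for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical model of an NN with an HA state and an edit log.
class StateTransitionSketch {
    enum HAState { STANDBY, ACTIVE }

    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private HAState state = HAState.STANDBY;
    private List<String> editLog = null; // null means "edit log not opened"

    // All three steps run under the write lock, so no operation can observe
    // the intermediate point where the state says ACTIVE but the log is closed.
    void transitionToActive() {
        lock.writeLock().lock();
        try {
            // (1) leave old state: nothing to tear down in this sketch
            // (2) change state variable
            state = HAState.ACTIVE;
            // (3) enter new state: open the edit log
            editLog = new ArrayList<>();
        } finally {
            lock.writeLock().unlock();
        }
    }

    // A write transaction takes the same lock, with the state check inside
    // the try block so the lock is released if the check fails.
    void logEdit(String op) {
        lock.writeLock().lock();
        try {
            if (state != HAState.ACTIVE) {
                throw new IllegalStateException("write not allowed in state " + state);
            }
            editLog.add(op); // safe: the log is open whenever state is ACTIVE
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Number of edits logged so far; 0 while the log is not open.
    int editCount() {
        lock.readLock().lock();
        try {
            return editLog == null ? 0 : editLog.size();
        } finally {
            lock.readLock().unlock();
        }
    }
}
```

In this sketch a writer that arrives during the transition simply waits for the write lock and then sees a fully active NN, instead of passing the state check and failing on an unopened log.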