[ https://issues.apache.org/jira/browse/HDFS-2823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13190279#comment-13190279 ]
Aaron T. Myers commented on HDFS-2823:
--------------------------------------
I see three options:
# Make {{o.a.h.ipc.Client}} not catch {{InterruptedException}}. (Todd mentioned
that this is already filed as some trunk JIRA, but I can't find it right now.)
# Add a check for {{shouldRun}} that breaks out of the loop after the edit log
tailer thread triggers a log roll, but before it tries to acquire the FSNS
lock.
# Move edit log roll triggering to a separate thread.
Thoughts?
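
To make option 2 concrete, here is a rough sketch of the idea. It is not the actual {{EditLogTailer}} code: the class {{TailerSketch}}, the methods {{triggerLogRoll}} and {{stopWhileHoldingLock}}, and the use of a plain {{ReentrantReadWriteLock}} in place of the FSNS lock are all hypothetical stand-ins. Only {{shouldRun}} and the interrupt-then-join shape of stop come from the description of this issue.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class TailerSketch {
    // Stand-in for the FSNS lock (assumption: a plain read/write lock).
    private final ReentrantReadWriteLock fsnLock = new ReentrantReadWriteLock();
    private volatile boolean shouldRun = true;
    private final Thread tailer = new Thread(this::runLoop);

    // Hypothetical stand-in for the log roll RPC. Like the problematic
    // client code, it catches InterruptedException and ignores it.
    private void triggerLogRoll() {
        try {
            Thread.sleep(50);
        } catch (InterruptedException ignored) {
            // Interrupt status is cleared here and never restored.
        }
    }

    private void runLoop() {
        while (shouldRun) {
            triggerLogRoll();
            // Option 2: re-check the flag after the roll, before locking,
            // so a swallowed interrupt cannot leave us blocked on the lock.
            if (!shouldRun) {
                break;
            }
            try {
                fsnLock.writeLock().lockInterruptibly();
            } catch (InterruptedException e) {
                return; // interrupted while waiting for the lock
            }
            try {
                // Edits would be applied here.
            } finally {
                fsnLock.writeLock().unlock();
            }
        }
    }

    public void start() {
        tailer.start();
    }

    // Mimics transition to active: hold the write lock, then stop the tailer.
    // Returns true if the tailer thread exited within the timeout.
    public boolean stopWhileHoldingLock(long timeoutMillis) {
        fsnLock.writeLock().lock();
        try {
            shouldRun = false;   // set the flag before interrupting
            tailer.interrupt();
            tailer.join(timeoutMillis);
            return !tailer.isAlive();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            fsnLock.writeLock().unlock();
        }
    }
}
```

In this sketch the flag is cleared before the interrupt is sent, so an interrupt swallowed inside {{triggerLogRoll}} is always followed by a check that sees {{shouldRun == false}}, and an interrupt that arrives after the check is picked up by {{lockInterruptibly}}. Whether the same ordering holds in the real code would need to be verified.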
> HA: Transition to active can cause NN deadlock
> ----------------------------------------------
>
> Key: HDFS-2823
> URL: https://issues.apache.org/jira/browse/HDFS-2823
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: ha, name-node
> Affects Versions: HA branch (HDFS-1623)
> Reporter: Aaron T. Myers
> Assignee: Aaron T. Myers
> Priority: Blocker
>
> On transition to active, we have to take the FSNS write lock. In
> {{EditLogTailer#stop}}, we interrupt the edit log tailer thread and then join
> on that thread. When tailing edits, the edit log tailer thread acquires the
> FSNS write lock interruptibly, precisely so that we avoid deadlocks on
> transition to active. However, the edit log tailer thread now also triggers
> edit log rolls. Several places in {{ipc.Client}} catch and ignore
> {{InterruptedException}}, and in so doing may cause the {{Thread#interrupt}}
> call to be missed by the edit log tailer thread.
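
The missed interrupt described above can be reproduced outside Hadoop. The following is a minimal, self-contained sketch; {{SwallowedInterruptDemo}}, {{blockingCall}} and {{interruptSurvives}} are hypothetical names, and {{Thread.sleep}} stands in for whatever blocking call the client makes. It relies only on standard JDK behavior: a blocking method that throws {{InterruptedException}} clears the thread's interrupt status, so a caller that catches and ignores the exception leaves no trace of the interrupt.

```java
public class SwallowedInterruptDemo {

    // Hypothetical stand-in for a blocking client call that catches
    // InterruptedException. If restoreInterrupt is false, the interrupt
    // is swallowed; if true, the interrupt status is set again.
    public static void blockingCall(boolean restoreInterrupt) {
        try {
            Thread.sleep(10_000);
        } catch (InterruptedException e) {
            // Throwing InterruptedException cleared the interrupt status.
            if (restoreInterrupt) {
                Thread.currentThread().interrupt();
            }
        }
    }

    // Returns true if the worker thread still observes the interrupt
    // after the blocking call has returned.
    public static boolean interruptSurvives(boolean restoreInterrupt) {
        final boolean[] seen = new boolean[1];
        Thread worker = new Thread(() -> {
            blockingCall(restoreInterrupt);
            seen[0] = Thread.currentThread().isInterrupted();
        });
        worker.start();
        worker.interrupt();
        try {
            worker.join(); // join makes the write to seen[0] visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println("swallowed: " + interruptSurvives(false)); // swallowed: false
        System.out.println("restored:  " + interruptSurvives(true));  // restored:  true
    }
}
```

With the interrupt swallowed, a later {{lockInterruptibly}} call on the same thread would block as if no interrupt had ever been sent, which is the situation that leads to the deadlock on transition to active.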