[jira] [Commented] (HDFS-7097) Allow block reports to be processed during checkpointing on standby name node

Kihwal Lee (JIRA) Fri, 19 Sep 2014 08:17:04 -0700

    [ 
https://issues.apache.org/jira/browse/HDFS-7097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14140717#comment-14140717
 ]


Kihwal Lee commented on HDFS-7097:
----------------------------------

On a standby name node, the persistent state (i.e. things saved to fsimage) is 
updated only by replaying edits once it is started.  The patch therefore 
follows two rules:
- During checkpointing, do not allow any state changes that would be persisted 
(i.e. edit log replay).
- During checkpointing, do not allow any RPC calls that may affect checkpointing 
itself ({{saveNamespace}}, {{restoreFailedStorage}}, etc.).

A new {{ReentrantLock}}, {{nsLock}}, is introduced to coordinate checkpointing 
and other activities.  Anything that requires both {{nsLock}} and {{fsLock}} 
must acquire {{nsLock}} first. Otherwise it can block other RPC calls that 
do not require {{nsLock}}, not to mention cause a deadlock.  These locks are 
all acquired interruptibly, so that the threads can be stopped during an HA 
transition.
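
The lock ordering above can be sketched roughly as follows. This is only an 
illustration, not the patch itself: the class {{LockOrderSketch}}, its method 
names, and the choice of which path takes which lock are assumptions made for 
the example. Only the names {{nsLock}} and {{fsLock}}, the "{{nsLock}} first" 
rule, and the interruptible locking come from the description.

```java
import java.util.concurrent.locks.ReentrantLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical sketch of the lock ordering; not code from HDFS-7097.patch. */
class LockOrderSketch {
    // Coordinates checkpointing with anything that changes persisted state.
    private final ReentrantLock nsLock = new ReentrantLock();
    // Stands in for the name system lock (assumed here to be a read/write lock).
    private final ReentrantReadWriteLock fsLock = new ReentrantReadWriteLock();

    /** Assumed checkpoint path: holds only nsLock while the image is saved. */
    void checkpoint(Runnable saveImage) throws InterruptedException {
        nsLock.lockInterruptibly();
        try {
            saveImage.run();
        } finally {
            nsLock.unlock();
        }
    }

    /** Needs both locks, so nsLock is taken first, then fsLock. */
    void replayEdits(Runnable applyEdits) throws InterruptedException {
        nsLock.lockInterruptibly();
        try {
            fsLock.writeLock().lockInterruptibly();
            try {
                applyEdits.run();
            } finally {
                fsLock.writeLock().unlock();
            }
        } finally {
            nsLock.unlock();
        }
    }

    /** Touches no persisted state, so it never waits on nsLock. */
    void processBlockReport(Runnable applyReport) throws InterruptedException {
        fsLock.writeLock().lockInterruptibly();
        try {
            applyReport.run();
        } finally {
            fsLock.writeLock().unlock();
        }
    }
}
```

In this sketch a thread in {{replayEdits}} waits on {{nsLock}} before it ever 
touches {{fsLock}}, so a long checkpoint does not leave {{fsLock}} held by a 
waiter, and {{processBlockReport}} can proceed.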

The patch also improves cancellation checking in {{FSImageFormat.Saver}} by 
making the check interval counter a class-level variable and following the PB 
format's threshold variable name. This only matters if legacy oiv image saving 
is enabled.
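
A minimal sketch of the idea behind a class-level check interval counter is 
shown below. Everything in it is hypothetical ({{CancellableSaverSketch}}, 
{{checkCancelInterval}}, {{checkCancelCounter}}, {{saveChildren}}) and it does 
not reproduce {{FSImageFormat.Saver}}. It assumes the saver is called many 
times with small batches, in which case a counter local to each call might 
never reach the threshold, while a field keeps counting across calls.

```java
import java.util.concurrent.CancellationException;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical saver that polls a cancel flag every N items written. */
class CancellableSaverSketch {
    private final int checkCancelInterval;   // assumed threshold, set by caller
    private final AtomicBoolean cancelled;
    private int checkCancelCounter = 0;      // field: survives across calls
    private int savedCount = 0;

    CancellableSaverSketch(int checkCancelInterval, AtomicBoolean cancelled) {
        this.checkCancelInterval = checkCancelInterval;
        this.cancelled = cancelled;
    }

    /** Writes one small batch, e.g. the children of a single directory. */
    void saveChildren(int numChildren) {
        for (int i = 0; i < numChildren; i++) {
            savedCount++;                    // stand-in for writing one item
            checkCancelledPeriodically();
        }
    }

    /** Looks at the cancel flag only once per checkCancelInterval items. */
    private void checkCancelledPeriodically() {
        if (++checkCancelCounter >= checkCancelInterval) {
            checkCancelCounter = 0;
            if (cancelled.get()) {
                throw new CancellationException("image saving cancelled");
            }
        }
    }

    int savedCount() {
        return savedCount;
    }
}
```

With an interval of 100 and batches of 30 items, the flag is first checked 
during the fourth batch, even though no single batch reaches the interval.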

> Allow block reports to be processed during checkpointing on standby name node
> -----------------------------------------------------------------------------
>
>                 Key: HDFS-7097
>                 URL: https://issues.apache.org/jira/browse/HDFS-7097
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Kihwal Lee
>            Assignee: Kihwal Lee
>            Priority: Critical
>         Attachments: HDFS-7097.patch
>
>
> On a reasonably busy HDFS cluster, there is a steady stream of creates, 
> causing data nodes to generate incremental block reports.  When a standby 
> name node is checkpointing, RPC handler threads trying to process a full or 
> incremental block report are blocked on the name system's {{fsLock}}, because 
> the checkpointer acquires the read lock on it.  This can create a serious 
> problem if the name space is big and checkpointing takes a long time.
> All available RPC handlers can be tied up very quickly. If you have 100 
> handlers, it only takes 34 file creates.  If a separate service RPC port is 
> not used, HA transition will have to wait in the call queue for minutes. Even 
> if a separate service RPC port is configured, heartbeats from data nodes will 
> be blocked. A standby NN with a big name space can lose all data nodes after 
> checkpointing.  The RPC calls will also be retransmitted by data nodes many 
> times, filling up the call queue and potentially causing listen queue 
> overflow.
> Since block reports are not modifying any state that is being saved to 
> fsimage, I propose letting them through during checkpointing. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
