[ 
https://issues.apache.org/jira/browse/HDFS-7097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14165623#comment-14165623
 ] 

Aaron T. Myers commented on HDFS-7097:
--------------------------------------

The patch looks pretty good, and after thinking about it a fair bit I think it 
won't regress the issue I was trying to address in HDFS-5064, though Kihwal, I 
would appreciate it if you could confirm that as well.

A few small comments:

# Does {{FSNamesystem#rollEditLog}} need to take the nsLock as well? Seems like 
it might, given that tailing edits is no longer taking the normal FSNS rw lock.
# Similarly for {{FSNamesystem#(start|end)Checkpoint}}, though that's less 
obvious to me.
# Seems a little strange to me to be calling this new lock the "nsLock", when 
that's also what we've been calling the main FSNS rw lock all this time. I'd 
suggest renaming this to the "checkpoint lock" or something, to more clearly 
distinguish its purpose.
# I think you can now remove some of the other stuff added as part of 
HDFS-5064, e.g. the entire {{longReadLock}}, which I believe was only actually 
being locked for read during checkpointing.
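
To illustrate the distinction raised in the comments above, here is a minimal 
sketch of how a dedicated checkpoint lock could sit alongside the main FSNS rw 
lock. It is not the patch's code: the class {{CheckpointLockSketch}} and all of 
its fields and methods are hypothetical stand-ins, and it assumes that edit 
tailing takes the checkpoint lock before the rw lock while block report 
processing takes only the rw lock.

```java
import java.util.concurrent.locks.ReentrantLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch only; names do not come from the HDFS-7097 patch.
class CheckpointLockSketch {
    // Stand-in for the main FSNS read/write lock.
    private final ReentrantReadWriteLock fsLock = new ReentrantReadWriteLock(true);
    // Stand-in for the new lock, using the "checkpoint lock" name suggested above.
    private final ReentrantLock checkpointLock = new ReentrantLock();

    private long lastAppliedTxId = 0;      // state that ends up in the saved image
    private int blockReportsProcessed = 0; // state assumed not to be saved

    // Checkpointer: holds only the checkpoint lock while the image is written.
    long saveCheckpoint(Runnable whileSaving) {
        checkpointLock.lock();
        try {
            whileSaving.run(); // placeholder for the long-running image save
            return lastAppliedTxId;
        } finally {
            checkpointLock.unlock();
        }
    }

    // Edit tailing: changes saved state, so it waits for any running checkpoint.
    // Lock order is always checkpoint lock first, then the rw lock.
    void applyEdits(long txId) {
        checkpointLock.lock();
        try {
            fsLock.writeLock().lock();
            try {
                lastAppliedTxId = txId;
            } finally {
                fsLock.writeLock().unlock();
            }
        } finally {
            checkpointLock.unlock();
        }
    }

    // Block report handling: takes the rw lock only, never the checkpoint lock.
    void processBlockReport() {
        fsLock.writeLock().lock();
        try {
            blockReportsProcessed++;
        } finally {
            fsLock.writeLock().unlock();
        }
    }

    int getBlockReportsProcessed() {
        fsLock.readLock().lock();
        try {
            return blockReportsProcessed;
        } finally {
            fsLock.readLock().unlock();
        }
    }

    // Runs a task on a separate thread and waits for it, like an RPC handler.
    static void runInOtherThread(Runnable task) {
        Thread t = new Thread(task);
        t.start();
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        CheckpointLockSketch ns = new CheckpointLockSketch();
        ns.applyEdits(7L);
        // A block report arrives on another thread while the checkpoint runs.
        long saved = ns.saveCheckpoint(
            () -> runInOtherThread(ns::processBlockReport));
        System.out.println("checkpoint saved up to txid " + saved);
        System.out.println("block reports processed during checkpoint: "
            + ns.getBlockReportsProcessed());
    }
}
```

If {{processBlockReport}} in this sketch also had to take the checkpoint lock, 
the {{join}} inside {{main}} would never return, which is the kind of blocking 
this issue describes.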

Thanks a lot for working on this, Kihwal.

> Allow block reports to be processed during checkpointing on standby name node
> -----------------------------------------------------------------------------
>
>                 Key: HDFS-7097
>                 URL: https://issues.apache.org/jira/browse/HDFS-7097
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Kihwal Lee
>            Assignee: Kihwal Lee
>            Priority: Critical
>         Attachments: HDFS-7097.patch, HDFS-7097.patch, HDFS-7097.patch
>
>
> On a reasonably busy HDFS cluster, there is a stream of creates, causing data 
> nodes to generate incremental block reports.  When a standby name node is 
> checkpointing, RPC handler threads trying to process a full or incremental 
> block report are blocked on the name system's {{fsLock}}, because the 
> checkpointer acquires the read lock on it.  This can create a serious problem 
> if the name space is big and checkpointing takes a long time.
> All available RPC handlers can be tied up very quickly. If you have 100 
> handlers, it only takes 34 file creates.  If a separate service RPC port is 
> not used, HA transition will have to wait in the call queue for minutes. Even 
> if a separate service RPC port is configured, heartbeats from datanodes will 
> be blocked. A standby NN with a big name space can lose all data nodes after 
> checkpointing.  The rpc calls will also be retransmitted by data nodes many 
> times, filling up the call queue and potentially causing listen queue 
> overflow.
> Since block reports are not modifying any state that is being saved to 
> fsimage, I propose letting them through during checkpointing. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
