[
https://issues.apache.org/jira/browse/HDFS-7097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Kihwal Lee updated HDFS-7097:
-----------------------------
Status: Patch Available (was: Open)
> Allow block reports to be processed during checkpointing on standby name node
> -----------------------------------------------------------------------------
>
> Key: HDFS-7097
> URL: https://issues.apache.org/jira/browse/HDFS-7097
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: Kihwal Lee
> Priority: Critical
> Attachments: HDFS-7097.patch
>
>
> On a reasonably busy HDFS cluster, there is a steady stream of creates,
> causing data nodes to generate incremental block reports. When a standby
> name node is checkpointing, RPC handler threads trying to process a full or
> incremental block report are blocked on the name system's {{fsLock}}, because
> the checkpointer holds the read lock on it. This can create a serious problem
> if the name space is large and checkpointing takes a long time.
> All available RPC handlers can be tied up very quickly. With 100 handlers,
> it takes only 34 file creates. If a separate service RPC port is not used,
> an HA transition request will have to wait in the call queue for minutes.
> Even if a separate service RPC port is configured, heartbeats from data nodes
> will be blocked. A standby NN with a big name space can lose all data nodes
> after checkpointing. The RPC calls will also be retransmitted by data nodes
> many times, filling up the call queue and potentially causing listen queue
> overflow.
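The figure of 34 creates for 100 handlers is not derived in the description. One plausible reading, stated here as an assumption, is that each create yields one incremental block report per replica and that the default replication factor of 3 applies. Under that assumption the arithmetic is a ceiling division. The class and method names below are hypothetical and are not part of HDFS.

```java
public class HandlerSaturation {
    /**
     * Smallest number of file creates whose incremental block reports occupy
     * every RPC handler. ASSUMPTION: each create produces exactly
     * reportsPerCreate blocked reports (for example, one per replica).
     */
    static int createsToSaturate(int handlers, int reportsPerCreate) {
        // Integer ceiling division: ceil(handlers / reportsPerCreate)
        return (handlers + reportsPerCreate - 1) / reportsPerCreate;
    }

    public static void main(String[] args) {
        // 100 handlers, 3 reports per create (assumed default replication)
        System.out.println(createsToSaturate(100, 3)); // prints 34
    }
}
```

With a different replication factor or handler count the threshold changes accordingly, but it stays small relative to the create rate of a busy cluster.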
> Since block reports are not modifying any state that is being saved to
> fsimage, I propose letting them through during checkpointing.
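The contention described above can be reproduced in isolation with a plain java.util.concurrent read-write lock. This is only a minimal sketch, not the NameNode code: it assumes that block report processing needs the write side of {{fsLock}} while the checkpointer holds the read side, and the class and method names are hypothetical.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class FsLockContention {
    /**
     * Holds the read lock on the calling thread (standing in for the
     * checkpointer) and lets a second thread (standing in for an RPC handler
     * processing a block report) try to take the write lock.
     * Returns true only if the handler obtained the write lock in time.
     */
    static boolean writerGetsLockWhileReaderHolds(long timeoutMs)
            throws InterruptedException {
        ReentrantReadWriteLock fsLock = new ReentrantReadWriteLock(true);
        fsLock.readLock().lock(); // "checkpointer" holds the read lock
        try {
            boolean[] acquired = new boolean[1];
            Thread handler = new Thread(() -> {
                try {
                    // "block report" handler waits for the write lock
                    if (fsLock.writeLock().tryLock(timeoutMs, TimeUnit.MILLISECONDS)) {
                        acquired[0] = true;
                        fsLock.writeLock().unlock();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            handler.start();
            handler.join(); // join makes the handler's write to acquired[0] visible
            return acquired[0];
        } finally {
            fsLock.readLock().unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // The handler times out because the read lock is never released meanwhile.
        System.out.println(writerGetsLockWhileReaderHolds(200)); // prints false
    }
}
```

In the sketch the handler gives up after a timeout. A handler that blocks without a timeout stays occupied for the whole checkpoint, which is why letting block reports through without contending for this lock frees the handlers.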
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)