[
https://issues.apache.org/jira/browse/HDFS-14366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16791085#comment-16791085
]
Wei-Chiu Chuang commented on HDFS-14366:
----------------------------------------
Nice catch.
+1 pending Jenkins. Actually I had spotted the issue before, but didn't follow
up. Quoting my comments at HDFS-14171:
{quote}
Finally, getNumLiveDataNodes() is used in a few other places too. Most notably,
it is used by BlockManager#isSufficientlyReplicated, which is called by
FSDirAppendOp#appendFile. I'm wondering if the same perf issue would occur when
appending files in a large cluster like this.
{quote}
> Improve HDFS append performance
> -------------------------------
>
> Key: HDFS-14366
> URL: https://issues.apache.org/jira/browse/HDFS-14366
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: hdfs
> Affects Versions: 2.8.2
> Reporter: Chao Sun
> Assignee: Chao Sun
> Priority: Major
> Attachments: HDFS-14366.000.patch, append-flamegraph.png
>
>
> In our HDFS cluster we observed that the {{append}} operation can take as much
> as 10X the write lock time of other write operations. By collecting a
> flamegraph on the namenode (see attachment: append-flamegraph.png), we found
> that most of the time in the append call is spent in {{getNumLiveDataNodes()}}:
> {code}
> /** @return the number of live datanodes. */
> public int getNumLiveDataNodes() {
>   int numLive = 0;
>   synchronized (this) {
>     for (DatanodeDescriptor dn : datanodeMap.values()) {
>       if (!isDatanodeDead(dn)) {
>         numLive++;
>       }
>     }
>   }
>   return numLive;
> }
> {code}
> This method synchronizes on the {{DatanodeManager}}, which is particularly
> expensive in large clusters since {{datanodeMap}} is modified in many
> places, such as when processing DN heartbeats.
> For the {{append}} operation, {{getNumLiveDataNodes()}} is invoked in
> {{isSufficientlyReplicated}}:
> {code}
> /**
>  * Check if a block is replicated to at least the minimum replication.
>  */
> public boolean isSufficientlyReplicated(BlockInfo b) {
>   // Compare against the lesser of the minReplication and number of live
>   // DNs.
>   final int replication =
>       Math.min(minReplication, getDatanodeManager().getNumLiveDataNodes());
>   return countNodes(b).liveReplicas() >= replication;
> }
> {code}
> The way {{replication}} is calculated is suboptimal: it calls
> {{getNumLiveDataNodes()}} _every time_, even though {{minReplication}} is
> usually much smaller than the number of live datanodes.
>
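The observation in the description can be sketched as a short-circuit. This is a minimal, standalone illustration and not necessarily what the attached patch does: the class name {{ReplicationCheckSketch}}, both method names, and the {{IntSupplier}} parameter are hypothetical stand-ins for {{BlockManager}} and {{getDatanodeManager().getNumLiveDataNodes()}}. It relies on the fact that {{x >= min(a, b)}} holds exactly when {{x >= a}} or {{x >= b}}, so the expensive count is needed only when the cheap comparison against {{minReplication}} fails.

```java
import java.util.function.IntSupplier;

public class ReplicationCheckSketch {

  /** Same shape as the quoted method: the live-datanode count is always evaluated. */
  static boolean isSufficientlyReplicatedEager(
      int liveReplicas, int minReplication, IntSupplier numLiveDataNodes) {
    final int replication = Math.min(minReplication, numLiveDataNodes.getAsInt());
    return liveReplicas >= replication;
  }

  /**
   * Equivalent check that consults the expensive count only when the cheap
   * comparison against minReplication fails.
   */
  static boolean isSufficientlyReplicatedLazy(
      int liveReplicas, int minReplication, IntSupplier numLiveDataNodes) {
    if (liveReplicas >= minReplication) {
      return true; // common case: no scan over the datanode map is needed
    }
    // Fewer replicas than minReplication: still sufficient if the cluster
    // has no more live datanodes than the block has live replicas.
    return liveReplicas >= numLiveDataNodes.getAsInt();
  }

  public static void main(String[] args) {
    final int[] calls = {0};
    // Hypothetical stand-in for getNumLiveDataNodes(); counts its invocations.
    IntSupplier liveDataNodes = () -> { calls[0]++; return 1000; };

    boolean ok = isSufficientlyReplicatedLazy(3, 1, liveDataNodes);
    System.out.println("sufficient=" + ok + ", expensive calls=" + calls[0]);
  }
}
```

With this shape, the lock on {{DatanodeManager}} would be taken only for under-replicated blocks, which the description suggests is the uncommon case.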