[
https://issues.apache.org/jira/browse/HDFS-14366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Chao Sun updated HDFS-14366:
----------------------------
Description:
In our HDFS cluster we observed that the {{append}} operation can hold the
write lock up to 10x longer than other write operations. By collecting a flame
graph on the NameNode (see attachment: append-flamegraph.png), we found that
most of the time in the append call is spent in {{getNumLiveDataNodes()}}:
{code}
/** @return the number of live datanodes. */
public int getNumLiveDataNodes() {
  int numLive = 0;
  synchronized (this) {
    for (DatanodeDescriptor dn : datanodeMap.values()) {
      if (!isDatanodeDead(dn)) {
        numLive++;
      }
    }
  }
  return numLive;
}
{code}
This method synchronizes on the {{DatanodeManager}}, which is particularly
expensive in large clusters because {{datanodeMap}} is modified in many
places, such as when processing DN heartbeats.
For the {{append}} operation, {{getNumLiveDataNodes()}} is invoked in
{{isSufficientlyReplicated}}:
{code}
/**
 * Check if a block is replicated to at least the minimum replication.
 */
public boolean isSufficientlyReplicated(BlockInfo b) {
  // Compare against the lesser of the minReplication and number of live DNs.
  final int replication =
      Math.min(minReplication, getDatanodeManager().getNumLiveDataNodes());
  return countNodes(b).liveReplicas() >= replication;
}
{code}
This calculation of {{replication}} is suboptimal: it calls
{{getNumLiveDataNodes()}} on every invocation, even though {{minReplication}}
is usually much smaller than the number of live DataNodes.
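One possible way to avoid the expensive call in the common case is to compare
against {{minReplication}} first and fall back to the live DataNode count only
when that cheap comparison fails. The sketch below is a minimal, self-contained
illustration of that idea and is not the actual {{BlockManager}} code: the class
{{ReplicationCheck}} and the {{IntSupplier}} standing in for
{{getDatanodeManager().getNumLiveDataNodes()}} are hypothetical names introduced
here. It relies on the fact that {{x >= min(a, b)}} holds exactly when
{{x >= a}} or {{x >= b}}.

```java
import java.util.function.IntSupplier;

/** Hypothetical stand-in for the replication check; not the real BlockManager. */
final class ReplicationCheck {
  private final int minReplication;
  // Assumption: stands in for getDatanodeManager().getNumLiveDataNodes(),
  // i.e. the call that takes the DatanodeManager lock and scans datanodeMap.
  private final IntSupplier numLiveDataNodes;

  ReplicationCheck(int minReplication, IntSupplier numLiveDataNodes) {
    this.minReplication = minReplication;
    this.numLiveDataNodes = numLiveDataNodes;
  }

  /**
   * Same result as liveReplicas >= Math.min(minReplication, numLive), but the
   * live datanode count is only requested when the cheap comparison fails.
   */
  boolean isSufficientlyReplicated(int liveReplicas) {
    if (liveReplicas >= minReplication) {
      return true; // common case: the expensive supplier is never invoked
    }
    return liveReplicas >= numLiveDataNodes.getAsInt();
  }
}
```

With this ordering, the expensive count is only needed for blocks that have
fewer live replicas than {{minReplication}}, which should be the uncommon case.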
> Improve HDFS append performance
> -------------------------------
>
> Key: HDFS-14366
> URL: https://issues.apache.org/jira/browse/HDFS-14366
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: hdfs
> Reporter: Chao Sun
> Assignee: Chao Sun
> Priority: Major
> Attachments: append-flamegraph.png
>