[ https://issues.apache.org/jira/browse/HDFS-16898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17683978#comment-17683978 ]
ASF GitHub Bot commented on HDFS-16898:
---------------------------------------
virajjasani commented on PR #5330:
URL: https://github.com/apache/hadoop/pull/5330#issuecomment-1416204631
> Hi, @virajjasani. Thanks for your careful review. Indeed, before
> [HDFS-6788](https://issues.apache.org/jira/browse/HDFS-6788), this part was
> covered by a synchronized lock. But in `processCommandFromActive` and
> `processCommandFromStandby`, the `actor` parameter is only used to print log
> info. The lock here only decides whether `actor` is `bpServiceToActive`, and
> therefore whether to execute `processCommandFromActive` or
> `processCommandFromStandby`.
>
> When a switchover occurs between the active and standby NameNodes, the
> DataNodes are marked stale. While they are stale, blocks are not deleted
> directly; instead, they are put into postponedMisreplicatedBlocks. So even if
> we execute a DatanodeCommand from the previous active NameNode (now standby),
> it is safe.
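The safety argument in the quoted reply can be illustrated with a simplified model, assuming that after a failover the NameNode treats a DataNode as stale until it reports in again, and that an over-replicated block with any replica on a stale DataNode is postponed instead of invalidated. `StalePostponeModel`, `Decision`, `markStale`, and `decide` are hypothetical names for illustration; this is a sketch, not the `BlockManager` implementation.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified model of "stale DataNodes => postpone, don't delete".
// None of these names are HDFS APIs.
class StalePostponeModel {
    enum Decision { INVALIDATE, POSTPONE }

    private final Set<String> staleNodes = new HashSet<>();
    private final List<String> postponedMisreplicatedBlocks = new ArrayList<>();

    // Called (in this model) for every DataNode right after a failover.
    void markStale(String node) { staleNodes.add(node); }

    // Called once the DataNode has reported in to the new active NameNode.
    void markFresh(String node) { staleNodes.remove(node); }

    // Decide what to do with an over-replicated block, given the nodes holding it.
    Decision decide(String blockId, List<String> replicaNodes) {
        for (String node : replicaNodes) {
            if (staleNodes.contains(node)) {
                // This node's view cannot be trusted yet: defer instead of deleting.
                postponedMisreplicatedBlocks.add(blockId);
                return Decision.POSTPONE;
            }
        }
        return Decision.INVALIDATE;
    }

    List<String> postponed() { return postponedMisreplicatedBlocks; }

    public static void main(String[] args) {
        StalePostponeModel nn = new StalePostponeModel();
        nn.markStale("dn1"); // failover just happened; dn1 has not reported in yet
        System.out.println(nn.decide("blk_1", List.of("dn1", "dn2", "dn3", "dn4")));
        nn.markFresh("dn1");
        System.out.println(nn.decide("blk_2", List.of("dn1", "dn2", "dn3", "dn4")));
        System.out.println(nn.postponed());
    }
}
```

Running `main` prints `POSTPONE`, then `INVALIDATE`, then `[blk_1]`: the excess replica is only removed once no holder is stale.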
Thank you @hfutatzhanghb.
I was just going to state that we don't need the write lock to verify whether
the current actor is the one connected to the active NameNode; a read lock would
be sufficient. But it looks like you have already made the change.
I took a quick glance, and we haven't hit this log line in our clusters so far,
but this PR has an interesting fix. I will check further for any other
resource contention.
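A minimal sketch of why a read lock suffices for the actor check, assuming the lock is a `java.util.concurrent.locks.ReentrantReadWriteLock`. The class, `setActive`, `isActiveActor`, and `probeWhileReading` are hypothetical stand-ins, not Hadoop code. Comparing `actor == bpServiceToActive` only reads shared state, so concurrent readers need not exclude each other; only a writer that changes `bpServiceToActive` must wait for the check to finish.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical model; not BPOfferService itself.
class ReadLockCheckDemo {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    // Stand-in for bpServiceToActive; written only under the write lock after construction.
    private Object bpServiceToActive;

    ReadLockCheckDemo(Object active) {
        this.bpServiceToActive = active;
    }

    // Writer side: changing the active actor needs exclusive access.
    void setActive(Object actor) {
        lock.writeLock().lock();
        try {
            bpServiceToActive = actor;
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Reader side: the check only reads shared state, so a read lock is enough.
    boolean isActiveActor(Object actor) {
        lock.readLock().lock();
        try {
            return actor == bpServiceToActive;
        } finally {
            lock.readLock().unlock();
        }
    }

    // While this thread holds the read lock, probe from another thread whether a
    // second reader and a writer can acquire the lock. Returns {reader, writer}.
    boolean[] probeWhileReading() {
        boolean[] result = new boolean[2];
        lock.readLock().lock();
        try {
            Thread probe = new Thread(() -> {
                boolean reader = lock.readLock().tryLock();
                if (reader) {
                    lock.readLock().unlock();
                }
                boolean writer = lock.writeLock().tryLock();
                if (writer) {
                    lock.writeLock().unlock();
                }
                result[0] = reader;
                result[1] = writer;
            });
            probe.start();
            probe.join(); // join() makes the probe's writes to result visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            lock.readLock().unlock();
        }
        return result;
    }

    public static void main(String[] args) {
        Object nn1 = new Object();
        Object nn2 = new Object();
        ReadLockCheckDemo demo = new ReadLockCheckDemo(nn1);
        System.out.println(demo.isActiveActor(nn1)); // true
        System.out.println(demo.isActiveActor(nn2)); // false
        demo.setActive(nn2);                         // e.g. after a failover
        System.out.println(demo.isActiveActor(nn2)); // true
        boolean[] p = demo.probeWhileReading();
        System.out.println("second reader: " + p[0] + ", writer: " + p[1]);
    }
}
```

The last line prints `second reader: true, writer: false`: a read lock held for the check does not serialize other readers, but it still keeps a concurrent update of `bpServiceToActive` out.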
> Make write lock fine-grain in processCommandFromActor method
> ------------------------------------------------------------
>
> Key: HDFS-16898
> URL: https://issues.apache.org/jira/browse/HDFS-16898
> Project: Hadoop HDFS
> Issue Type: Improvement
> Affects Versions: 3.3.4
> Reporter: ZhangHB
> Priority: Major
> Labels: pull-request-available
>
> Currently, the processCommandFromActor method contains the following code:
>
> {code:java}
> writeLock();
> try {
>   if (actor == bpServiceToActive) {
>     return processCommandFromActive(cmd, actor);
>   } else {
>     return processCommandFromStandby(cmd, actor);
>   }
> } finally {
>   writeUnlock();
> }
> {code}
> If processCommandFromActive takes a long time, the write lock is held for that
> whole time and is not released.
>
> This may block the updateActorStatesFromHeartbeat call in offerService.
> Furthermore, it can make the DataNode's lastContact very high, and the
> DataNode may even be considered dead once lastContact exceeds 600s.
> {code:java}
> bpos.updateActorStatesFromHeartbeat(
>     this, resp.getNameNodeHaState());
> {code}
> Here we can make the write lock fine-grained in the processCommandFromActor
> method to address this problem.
>
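The contention described in the issue, and the effect of narrowing the lock, can be reproduced with a small self-contained model. It assumes a `ReentrantReadWriteLock`, uses a latch to stand in for a slow `processCommandFromActive`, and uses a timed `tryLock` on the write lock to stand in for `updateActorStatesFromHeartbeat`. All class and method names are hypothetical, and the standby branch is omitted for brevity; this is a sketch of the idea, not the patch in PR #5330.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical model of the locking in processCommandFromActor; not Hadoop code.
class LockScopeDemo {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final Object bpServiceToActive = new Object();

    // Stand-in for a slow processCommandFromActive: signals that it has started,
    // then blocks until `release` is counted down.
    private boolean slowProcessCommand(CountDownLatch started, CountDownLatch release) {
        started.countDown();
        try {
            release.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return true;
    }

    // Coarse-grained (as in the issue): the whole command runs under the write lock.
    boolean processCoarse(Object actor, CountDownLatch started, CountDownLatch release) {
        lock.writeLock().lock();
        try {
            return actor == bpServiceToActive && slowProcessCommand(started, release);
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Fine-grained: only the actor check is locked (read lock); the command runs unlocked.
    boolean processFine(Object actor, CountDownLatch started, CountDownLatch release) {
        boolean isActive;
        lock.readLock().lock();
        try {
            isActive = actor == bpServiceToActive;
        } finally {
            lock.readLock().unlock();
        }
        return isActive && slowProcessCommand(started, release);
    }

    // While a command from the active actor is in progress, try to take the write
    // lock as a heartbeat-state update would. True if acquired within timeoutMs.
    boolean heartbeatUpdateProceeds(boolean fineGrained, long timeoutMs) {
        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        Thread commandThread = new Thread(() -> {
            if (fineGrained) {
                processFine(bpServiceToActive, started, release);
            } else {
                processCoarse(bpServiceToActive, started, release);
            }
        });
        commandThread.start();
        try {
            started.await(); // the slow command is now running
            boolean acquired = lock.writeLock().tryLock(timeoutMs, TimeUnit.MILLISECONDS);
            if (acquired) {
                lock.writeLock().unlock();
            }
            return acquired;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            release.countDown(); // let the slow command finish
            try {
                commandThread.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    public static void main(String[] args) {
        LockScopeDemo demo = new LockScopeDemo();
        System.out.println("coarse: " + demo.heartbeatUpdateProceeds(false, 200));
        System.out.println("fine:   " + demo.heartbeatUpdateProceeds(true, 200));
    }
}
```

With the coarse version the heartbeat-side writer times out (`false`) for as long as the command runs; with the fine-grained version it acquires the lock immediately (`true`), because the lock is released as soon as the actor check is done.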
--
This message was sent by Atlassian Jira
(v8.20.10#820010)