[ https://issues.apache.org/jira/browse/HDFS-12639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16200832#comment-16200832 ]
Hanisha Koneru commented on HDFS-12639:
---------------------------------------
Hi [~daryn], are you working on this Jira? If not, I would like to take it up.
> BPOfferService lock may stall all service actors
> ------------------------------------------------
>
> Key: HDFS-12639
> URL: https://issues.apache.org/jira/browse/HDFS-12639
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode
> Affects Versions: 2.8.0
> Reporter: Daryn Sharp
>
> {{BPOfferService}} manages {{BPServiceActor}} instances for the active and
> standby NameNodes. It uses a read-write lock primarily to protect registration
> information while determining the active/standby from heartbeats.
> Unfortunately the write lock is held during command processing. If an actor
> is experiencing high latency processing commands, the other actor will
> neither be able to register (blocked in createRegistration, setNamespaceInfo,
> verifyAndSetNamespaceInfo) nor process heartbeats (blocked in
> updateActorStatesFromHeartbeat).
> The worst case scenario for processing commands while holding the lock is
> re-registration. The actor will loop, catching and logging exceptions,
> leaving the other actor blocked for a non-deterministic (possibly infinite)
> amount of time.
> The lock must not be held during command processing.
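
One way to read the last point of the description is sketched below: take the lock only long enough to decide whether the command should be handled, release it, and then run the potentially slow command with no lock held. This is a minimal illustration under stated assumptions, not the actual Hadoop code or the eventual patch. The class {{OfferServiceSketch}}, its methods, and the use of plain strings for actors are all hypothetical, and commands from a non-active actor are simply ignored to keep the example short.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for BPOfferService; names and structure are assumptions.
class OfferServiceSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private String activeActor; // guarded by lock

    // Short critical section: only the active/standby bookkeeping is locked.
    void updateActorStateFromHeartbeat(String actor, boolean claimsActive) {
        lock.writeLock().lock();
        try {
            if (claimsActive) {
                activeActor = actor;
            } else if (actor.equals(activeActor)) {
                activeActor = null;
            }
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Decide under the read lock, then release it before doing slow work.
    boolean processCommand(String actor, Runnable command) {
        boolean fromActive;
        lock.readLock().lock();
        try {
            fromActive = actor.equals(activeActor);
        } finally {
            lock.readLock().unlock();
        }
        if (!fromActive) {
            return false; // simplification: non-active commands are ignored
        }
        command.run(); // potentially slow; no lock is held here
        return true;
    }

    // Diagnostic helper for the sketch: true if any thread holds the lock.
    boolean isLocked() {
        return lock.isWriteLocked() || lock.getReadLockCount() > 0;
    }

    public static void main(String[] args) {
        OfferServiceSketch service = new OfferServiceSketch();
        service.updateActorStateFromHeartbeat("nn1", true);
        service.processCommand("nn1", () ->
            System.out.println("lock held during command: " + service.isLocked()));
    }
}
```

With this shape, a slow or looping command (such as the re-registration case described above) delays only the actor that runs it. The other actor can still take the write lock to register or to update state from its heartbeats.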
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)