Daryn Sharp created HDFS-12645:
----------------------------------
Summary: FSDatasetImpl lock will stall BP service actors and may
cause missing blocks
Key: HDFS-12645
URL: https://issues.apache.org/jira/browse/HDFS-12645
Project: Hadoop HDFS
Issue Type: Bug
Components: datanode
Affects Versions: 2.8.0
Reporter: Daryn Sharp
The DN is extremely susceptible to a slow volume due to bad locking practices.
DN operations require the fs dataset lock. IO while holding the dataset lock
should not be permissible, as it leads to severe performance degradation and
possibly (temporarily) missing blocks.
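
To illustrate the point about IO inside the lock, here is a minimal sketch. It
is not the actual FsDatasetImpl code: DatasetSketch, SlowDisk, and all method
names are hypothetical stand-ins. It contrasts doing disk IO while holding the
dataset lock with updating in-memory state under the lock and doing the IO
after releasing it.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for a (possibly slow) volume.
interface SlowDisk {
    void delete(String path);
}

// Hypothetical stand-in for a dataset; not the real FsDatasetImpl.
class DatasetSketch {
    private final ReentrantLock datasetLock = new ReentrantLock();
    private final Map<Long, String> blockPaths = new HashMap<>();

    void addBlock(long blockId, String path) {
        datasetLock.lock();
        try {
            blockPaths.put(blockId, path);
        } finally {
            datasetLock.unlock();
        }
    }

    // Anti-pattern: the disk call runs while the lock is held, so one slow
    // volume stalls every other thread that needs the lock.
    boolean deleteHoldingLock(long blockId, SlowDisk disk) {
        datasetLock.lock();
        try {
            String path = blockPaths.remove(blockId);
            if (path == null) {
                return false;
            }
            disk.delete(path);
            return true;
        } finally {
            datasetLock.unlock();
        }
    }

    // Alternative: only the in-memory update happens under the lock; the
    // disk call runs after the lock has been released.
    boolean deleteOutsideLock(long blockId, SlowDisk disk) {
        String path;
        datasetLock.lock();
        try {
            path = blockPaths.remove(blockId);
        } finally {
            datasetLock.unlock();
        }
        if (path == null) {
            return false;
        }
        disk.delete(path);
        return true;
    }

    // Exposed only so the sketch can show whether IO ran under the lock.
    boolean isLockHeldByCurrentThread() {
        return datasetLock.isHeldByCurrentThread();
    }
}
```

In the second variant a slow disk delays only the thread doing the delete.
Whether the real dataset operations can be split this way depends on their
consistency requirements, which this sketch does not model.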
A slow disk will cause pipelines to experience significant latency and
timeouts, increasing lock/io contention while cleaning up, leading to more
timeouts, etc. Meanwhile, the actor service thread is interleaving multiple
lock acquires/releases with xceivers. If many commands are issued, the node may
be incorrectly declared dead.
HDFS-12639 documents that both actors synchronize on the offer service lock
while processing commands. A backlogged active actor will block the standby
actor and cause it to go dead too.
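
The effect of the shared lock can be sketched as follows. This is only an
illustration, not the BPOfferService code: SharedLockSketch and its methods are
hypothetical, and a ReentrantLock stands in for the shared offer service lock.
While one thread holds the lock to work through a backlog, a second thread
cannot make progress.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical illustration; names do not correspond to real Hadoop classes.
class SharedLockSketch {
    private final ReentrantLock offerServiceLock = new ReentrantLock();
    private final CountDownLatch lockTaken = new CountDownLatch(1);
    private final CountDownLatch backlogDone = new CountDownLatch(1);
    private Thread activeActor;

    // Starts a thread that holds the shared lock until finishBacklog() is
    // called, simulating an active actor stuck processing commands.
    // Returns once that thread owns the lock.
    void startBackloggedActor() {
        activeActor = new Thread(() -> {
            offerServiceLock.lock();
            try {
                lockTaken.countDown();
                backlogDone.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                offerServiceLock.unlock();
            }
        }, "active-actor");
        activeActor.start();
        try {
            lockTaken.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Lets the simulated active actor finish and release the lock.
    void finishBacklog() {
        backlogDone.countDown();
        try {
            activeActor.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Simulates the standby actor: returns true only if it obtained the
    // shared lock within the timeout.
    boolean standbyCanProcess(long timeoutMillis) {
        try {
            if (!offerServiceLock.tryLock(timeoutMillis, TimeUnit.MILLISECONDS)) {
                return false;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        try {
            return true;
        } finally {
            offerServiceLock.unlock();
        }
    }
}
```

Calling standbyCanProcess while the simulated backlog is in progress returns
false; after finishBacklog it returns true.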
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)