[
https://issues.apache.org/jira/browse/IGNITE-22261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Roman Puchkovskiy updated IGNITE-22261:
---------------------------------------
Description:
# NodeImpl#executeApplyingTasks() takes NodeImpl#writeLock and calls
LogManager#appendEntries()
# LogManager tries to enqueue a task to diskQueue which is full, hence it
blocks until a task gets consumed from diskQueue
# diskQueue is consumed by StableClosureEventHandler
# StableClosureEventHandler tries to enqueue a task to
FSMCallerImpl#taskQueue, which is also full, so this also blocks until a task
gets consumed from FSMCallerImpl#taskQueue
# FSMCallerImpl#taskQueue is consumed by ApplyTaskHandler
# ApplyTaskHandler calls NodeImpl#onConfigurationChangeDone(), which tries to
take NodeImpl#writeLock
As a result, there is a deadlock caused by a circular wait:
NodeImpl#writeLock->LogManager#diskQueue->FSMCallerImpl#taskQueue->NodeImpl#writeLock
(disruptors are used as blocking queues in JRaft, so, when full, they act like
locks).
This was caught by ItNodeTest#testNodeTaskOverload(), which uses extremely small
disruptors (2 items max each).
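The circular wait described above can be reproduced in miniature. The following is only a sketch under stated assumptions, not JRaft code: plain ArrayBlockingQueue instances stand in for the disruptors, the class and method names (DisruptorDeadlockSketch, submitWhileHoldingWriteLock, daemon, Step) are hypothetical, and a timeout on the producer's offer() replaces the indefinite blocking of the real code so that the demo terminates.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class DisruptorDeadlockSketch {
    /** Capacity of each stand-in queue; mirrors the 2-item disruptors used by the test. */
    static final int CAPACITY = 2;

    /** A loop body that may be interrupted while blocked. */
    interface Step {
        void run() throws InterruptedException;
    }

    /** Starts a daemon thread that runs the body until it is interrupted. */
    static Thread daemon(Step body) {
        Thread t = new Thread(() -> {
            try {
                body.run();
            } catch (InterruptedException e) {
                // Interrupted at the end of the demo; just exit.
            }
        });
        t.setDaemon(true);
        t.start();
        return t;
    }

    /**
     * Holds the write lock and tries to submit {@code total} entries.
     * Returns how many entries were accepted before an offer timed out.
     */
    static int submitWhileHoldingWriteLock(int total, long offerTimeoutMs) throws InterruptedException {
        Lock writeLock = new ReentrantReadWriteLock().writeLock();
        BlockingQueue<Integer> diskQueue = new ArrayBlockingQueue<>(CAPACITY);
        BlockingQueue<Integer> taskQueue = new ArrayBlockingQueue<>(CAPACITY);

        // Stand-in for StableClosureEventHandler: moves entries from diskQueue to taskQueue.
        Thread diskConsumer = daemon(() -> {
            while (true) {
                taskQueue.put(diskQueue.take()); // blocks while taskQueue is full
            }
        });

        // Stand-in for ApplyTaskHandler: every task needs the write lock,
        // as onConfigurationChangeDone() does in the description.
        Thread applyConsumer = daemon(() -> {
            while (true) {
                taskQueue.take();
                writeLock.lockInterruptibly(); // blocks while the producer holds the lock
                try {
                    // Configuration would be applied here.
                } finally {
                    writeLock.unlock();
                }
            }
        });

        int accepted = 0;
        writeLock.lock(); // stand-in for executeApplyingTasks() taking the write lock
        try {
            for (int i = 0; i < total; i++) {
                // The real code blocks indefinitely; the timeout only keeps the demo finite.
                if (!diskQueue.offer(i, offerTimeoutMs, TimeUnit.MILLISECONDS)) {
                    break; // diskQueue stayed full: the cycle is closed
                }
                accepted++;
            }
        } finally {
            writeLock.unlock();
            diskConsumer.interrupt();
            applyConsumer.interrupt();
        }
        return accepted;
    }

    public static void main(String[] args) throws InterruptedException {
        int accepted = submitWhileHoldingWriteLock(10, 200);
        System.out.println("accepted " + accepted + " of 10 entries before the producer stalled");
    }
}
```

While the producer holds the lock, this sketch can absorb no more than 6 entries (2 per queue plus one held by each consumer), so the producer stalls before all 10 are submitted. A real disruptor differs in detail from ArrayBlockingQueue, but the wait cycle has the same shape.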
> Deadlock on configuration application in NodeImpl when disruptors are full
> --------------------------------------------------------------------------
>
> Key: IGNITE-22261
> URL: https://issues.apache.org/jira/browse/IGNITE-22261
> Project: Ignite
> Issue Type: Bug
> Reporter: Roman Puchkovskiy
> Priority: Major
> Labels: ignite-3