[ 
https://issues.apache.org/jira/browse/IGNITE-19142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17706112#comment-17706112
 ] 

Roman Puchkovskiy commented on IGNITE-19142:
--------------------------------------------

Thanks!

> IncomingSnapshotCopier.cancel() blocks forever if called from multiple threads
> ------------------------------------------------------------------------------
>
>                 Key: IGNITE-19142
>                 URL: https://issues.apache.org/jira/browse/IGNITE-19142
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Roman Puchkovskiy
>            Assignee: Roman Puchkovskiy
>            Priority: Major
>              Labels: ignite-3
>         Attachments: threads_report.txt
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> One of the test runs hung forever. Thread dump and process state analysis
> showed the following:
>  # Some thread A invoked IncomingSnapshotCopier.cancel() (twice) on the same 
> copier, which made its busyLock block any operations
>  # Another thread B, trying to process an InstallSnapshotRequest, found that 
> the leader has changed (probably, due to the cluster being shut down) and 
> triggered 'interrupt download snapshots'
>  # As a result, thread B called IncomingSnapshotCopier.cancel() on the
> same copier, but its busyLock.block() (which internally just takes a write
> lock) blocks this thread forever (the lock has been held by thread A since step 1)
>  # Before trying to cancel the snapshot downloading, thread B took 
> Node.writeLock. As the thread is now blocked forever, it cannot release it, 
> so every operation on the JRaft Node is blocked, including shutdown
>  # So the cluster hangs forever on stop, making the tests hang forever as well
>
> We should make IncomingSnapshotCopier.cancel() idempotent even when called 
> from different threads.
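
For illustration, here is a minimal sketch of one way cancel() could be made idempotent across threads. It is not the actual Ignite code: CopierSketch is a hypothetical stand-in, busyLock.block() is modelled as taking a ReentrantReadWriteLock write lock that is never released (an assumption based on the description above), and the AtomicBoolean guard is just one possible approach.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical stand-in for the copier; not the actual Ignite class. */
class CopierSketch {
    // Assumption: models busyLock, whose block() takes a write lock and keeps it.
    private final ReentrantReadWriteLock busyLock = new ReentrantReadWriteLock();

    // Guard that lets only the first cancel() call reach the lock.
    private final AtomicBoolean cancelled = new AtomicBoolean();

    /** Safe to call repeatedly and from any thread: later calls return at once. */
    void cancel() {
        if (!cancelled.compareAndSet(false, true)) {
            return; // someone already cancelled (or is cancelling) this copier
        }
        busyLock.writeLock().lock(); // intentionally never unlocked, like block()
        // ... the real copier would cancel the in-flight download here ...
    }

    boolean isCancelled() {
        return cancelled.get();
    }

    /**
     * Replays the scenario: thread A cancels twice, then thread B cancels.
     * Returns true if thread B's cancel() returned within the timeout.
     */
    static boolean cancelFromTwoThreads(long timeoutMillis) {
        CopierSketch copier = new CopierSketch();
        copier.cancel(); // thread A, first call
        copier.cancel(); // thread A, second call

        Thread threadB = new Thread(copier::cancel, "thread-B");
        threadB.setDaemon(true); // do not keep the JVM alive if it ever blocks
        threadB.start();
        try {
            threadB.join(timeoutMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !threadB.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("thread B returned: " + cancelFromTwoThreads(2000));
    }
}
```

Without the guard, thread B would park on writeLock().lock() forever, which matches step 3 above. One trade-off of this sketch: a later caller may return before the first caller has finished cancelling, so it only fits if callers do not rely on cancellation being complete when cancel() returns.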



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
