[ https://issues.apache.org/jira/browse/IGNITE-8295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16442297#comment-16442297 ]
Andrew Mashenkov commented on IGNITE-8295:
------------------------------------------
After wrapping partStoreLock in checkpointLock, I got the following stack trace.
It seems we should also truncate the partition file under checkpointLock.
java.lang.AssertionError: FullPageId [pageId=0001005700000003, effectivePageId=0000005700000003, grpId=2141373874]
    at org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl.acquirePage(PageMemoryImpl.java:730)
    at org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl.acquirePage(PageMemoryImpl.java:624)
    at org.apache.ignite.internal.processors.cache.persistence.DataStructure.acquirePage(DataStructure.java:142)
    at org.apache.ignite.internal.processors.cache.persistence.freelist.PagesList.saveMetadata(PagesList.java:301)
    at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager.saveStoreMetadata(GridCacheOffheapManager.java:186)
    at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager.onCheckpointBegin(GridCacheOffheapManager.java:164)
    at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager$Checkpointer.markCheckpointBegin(GridCacheDatabaseSharedManager.java:3155)
    at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager$Checkpointer.doCheckpoint(GridCacheDatabaseSharedManager.java:2909)
    at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager$Checkpointer.body(GridCacheDatabaseSharedManager.java:2808)
    at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
    at java.lang.Thread.run(Thread.java:748)

> Possible deadlock on partition eviction.
> ----------------------------------------
>
> Key: IGNITE-8295
> URL: https://issues.apache.org/jira/browse/IGNITE-8295
> Project: Ignite
> Issue Type: Bug
> Components: persistence
> Reporter: Andrew Mashenkov
> Assignee: Andrew Mashenkov
> Priority: Major
> Fix For: 2.6
>
> Attachments: deadlock.stack
>
>
> GridCacheOffheapManager.recreateCacheDataStore() calls
> updatePartitionCounter() under partStoreLock, and updatePartitionCounter()
> may try to acquire checkpointReadLock.
> recreateCacheDataStore() can be called either with checkpointReadLock held
> (from GridDhtPartitionsExchangeFuture.updatePartitionFullMap)
> or without it (the GridDhtPartitionEvictor thread calls
> evictPartitionAsync),
> so a checkpoint that starts in between can cause a deadlock.
> It seems we should acquire checkpointReadLock before partStoreLock.
>
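
The lock ordering proposed in the issue description above can be sketched roughly as follows. This is only an illustration, not Ignite code: the class LockOrderSketch, its fields and its methods are hypothetical stand-ins, and a plain ReentrantReadWriteLock is assumed in place of Ignite's checkpoint lock. The point is that every path takes the checkpoint read lock first and partStoreLock second, so the inner read-lock acquisition is always reentrant and never waits behind a queued checkpoint writer.

```java
import java.util.concurrent.locks.ReentrantLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch: none of these names come from the Ignite code base.
class LockOrderSketch {
    // Stand-in for the checkpoint lock (assumed to be a plain read/write lock here).
    private final ReentrantReadWriteLock checkpointLock = new ReentrantReadWriteLock();
    // Stand-in for partStoreLock.
    private final ReentrantLock partStoreLock = new ReentrantLock();
    private long partitionCounter;

    // Inner step that needs the checkpoint read lock.
    // When the caller already holds the read lock, this acquisition is reentrant.
    private void updatePartitionCounter(long delta) {
        checkpointLock.readLock().lock();
        try {
            partitionCounter += delta;
        } finally {
            checkpointLock.readLock().unlock();
        }
    }

    // Both the exchange path and the eviction path would go through here,
    // always in the order: checkpoint read lock first, partStoreLock second.
    void recreateStore(long delta) {
        checkpointLock.readLock().lock();
        try {
            partStoreLock.lock();
            try {
                updatePartitionCounter(delta);
            } finally {
                partStoreLock.unlock();
            }
        } finally {
            checkpointLock.readLock().unlock();
        }
    }

    // Checkpointer side: the write lock excludes every holder of the read lock.
    long checkpoint() {
        checkpointLock.writeLock().lock();
        try {
            return partitionCounter;
        } finally {
            checkpointLock.writeLock().unlock();
        }
    }
}
```

With the opposite order (partStoreLock first, read lock second), one thread could hold partStoreLock while waiting for the read lock behind a pending checkpoint, while another thread holds the read lock and waits for partStoreLock.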
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)