Dmitriy Pavlov created IGNITE-7507:
--------------------------------------
Summary: Ignite node performance drop during checkpoint start: store metapage eviction causes long checkpoint lock hold time
Key: IGNITE-7507
URL: https://issues.apache.org/jira/browse/IGNITE-7507
Project: Ignite
Issue Type: Bug
Components: persistence
Reporter: Dmitriy Pavlov
Assignee: Dmitriy Pavlov
Fix For: 2.5
Evicting store metadata pages becomes a very expensive operation during checkpoint start.
Reading these pages back hangs the Ignite node until the metadata is loaded from disk.
The following store (partition) metapages are required during execution of saveStoreMetadata() and markCheckpointBegin():
- Partition Metadata Page
- Freelist Meta Page
- Partition Counters IO
If such a page is not available in memory, it is loaded from disk.
However, these loads are performed while checkpointLock is held (in write mode).
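The contention can be modelled with a plain ReentrantReadWriteLock. This is a minimal sketch under stated assumptions: cache updates are assumed to take the checkpoint lock in read mode while checkpoint start takes it in write mode, and the class name, method names, and the 20 ms per-read latency are hypothetical stand-ins, not Ignite's API. It shows that every metapage miss during checkpoint start adds its full disk-read latency to the time writers are blocked.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/**
 * Hypothetical model of the contention (names are illustrative, not Ignite's API):
 * cache updates take the checkpoint lock in read mode, while checkpoint start takes
 * it in write mode and may have to read evicted store metapages from disk.
 */
public class CheckpointLockDemo {
    private final ReentrantReadWriteLock checkpointLock = new ReentrantReadWriteLock();

    /** Stand-in for a synchronous disk read of an evicted store metapage. */
    private static void loadMetapageFromDisk(long readMillis) throws InterruptedException {
        Thread.sleep(readMillis);
    }

    /** Checkpoint start: every metapage miss extends the write-lock hold time. */
    void markCheckpointBegin(int evictedMetapages, long readMillis, CountDownLatch locked)
        throws InterruptedException {
        checkpointLock.writeLock().lock();
        try {
            locked.countDown(); // signal that the write lock is now held
            for (int i = 0; i < evictedMetapages; i++)
                loadMetapageFromDisk(readMillis);
        } finally {
            checkpointLock.writeLock().unlock();
        }
    }

    /** Cache update: cannot proceed until the checkpointer releases the write lock. */
    void put() {
        checkpointLock.readLock().lock();
        try {
            // the actual page modification would happen here
        } finally {
            checkpointLock.readLock().unlock();
        }
    }

    /** Runs one checkpoint start concurrently with one put(); returns how long put() waited, in ms. */
    static long measureBlockedPut(int evictedMetapages, long readMillis) throws InterruptedException {
        CheckpointLockDemo demo = new CheckpointLockDemo();
        CountDownLatch locked = new CountDownLatch(1);
        Thread checkpointer = new Thread(() -> {
            try {
                demo.markCheckpointBegin(evictedMetapages, readMillis, locked);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        checkpointer.start();
        locked.await(); // start put() only after the write lock has been taken
        long start = System.nanoTime();
        demo.put();
        long waitedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        checkpointer.join();
        return waitedMs;
    }

    public static void main(String[] args) throws InterruptedException {
        // 10 metapage misses at ~20 ms each keep writers out for roughly 200 ms
        System.out.println("put() blocked for ~" + measureBlockedPut(10, 20) + " ms");
    }
}
```

With no evicted metapages the write lock is released almost immediately; with ten misses the put() waits for roughly the sum of the simulated reads, which is the same shape as the checkpointLockHoldTime reported below.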
Example of timing:
- checkpointLockWait=75ms, checkpointLockHoldTime=2653ms, pages=696120
During all this time, worker threads cannot put any data into any cache.
Eviction of such pages should be avoided: they should be evicted with lower priority than dirty pages.
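The proposed priority can be sketched as a victim-selection rule over a sample of candidate pages. This is a hypothetical illustration, not Ignite's PageMemoryImpl code: the sampled least-recently-used scheme, the `PageKind` and `Page` types, and the rank values are assumptions. Clean data pages are evicted first, dirty pages next, and the store metapages listed above only as a last resort, with ties broken by oldest access.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

/**
 * Hypothetical eviction-candidate selection (not Ignite's implementation):
 * clean data pages go first, then dirty pages, and store metapages
 * (partition meta, free list meta, partition counters) last.
 */
public class MetapageAwareEviction {
    enum PageKind { DATA, PARTITION_META, FREELIST_META, PARTITION_COUNTERS }

    static final class Page {
        final long pageId;
        final PageKind kind;
        final boolean dirty;
        final long lastAccessNanos;

        Page(long pageId, PageKind kind, boolean dirty, long lastAccessNanos) {
            this.pageId = pageId;
            this.kind = kind;
            this.dirty = dirty;
            this.lastAccessNanos = lastAccessNanos;
        }

        boolean isStoreMetadata() {
            return kind != PageKind.DATA;
        }
    }

    /** Lower rank means the page is evicted earlier. */
    static int evictionRank(Page p) {
        if (p.isStoreMetadata())
            return 2; // needed under checkpointLock at checkpoint start, keep in memory
        return p.dirty ? 1 : 0;
    }

    /** Picks the lowest-ranked page; among equal ranks, the least recently accessed one. */
    static Optional<Page> chooseVictim(List<Page> sample) {
        return sample.stream()
            .min(Comparator.comparingInt(MetapageAwareEviction::evictionRank)
                .thenComparingLong(p -> p.lastAccessNanos));
    }

    public static void main(String[] args) {
        List<Page> sample = Arrays.asList(
            new Page(1, PageKind.PARTITION_META, false, 10),
            new Page(2, PageKind.DATA, true, 20),
            new Page(3, PageKind.DATA, false, 30));
        System.out.println(chooseVictim(sample).map(p -> p.pageId).orElse(-1L)); // prints 3
    }
}
```

Even though the partition metadata page is the least recently used page in the sample, it is kept and the clean data page is chosen; a metapage is only returned when every sampled page is a metapage.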
Full stack trace of the checkpoint thread:
{noformat}
db-checkpoint-thread-#40%checkpoint.IgniteMassLoadSandboxTest1% Id=63 WAITING
    at sun.misc.Unsafe.park(Native Method)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:304)
    at org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:177)
    at org.apache.ignite.internal.util.future.GridFutureAdapter.getUninterruptibly(GridFutureAdapter.java:145)
    at org.apache.ignite.internal.processors.cache.persistence.file.AsyncFileIO.read(AsyncFileIO.java:95)
    at org.apache.ignite.internal.processors.cache.persistence.file.FilePageStore.read(FilePageStore.java:324)
    at org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.read(FilePageStoreManager.java:306)
    at org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.read(FilePageStoreManager.java:291)
    at org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl.acquirePage(PageMemoryImpl.java:656)
    at org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl.acquirePage(PageMemoryImpl.java:576)
    at org.apache.ignite.internal.processors.cache.persistence.DataStructure.acquirePage(DataStructure.java:130)
    at org.apache.ignite.internal.processors.cache.persistence.freelist.PagesList.saveMetadata(PagesList.java:301)
    at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager.saveStoreMetadata(GridCacheOffheapManager.java:196)
    at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager.onCheckpointBegin(GridCacheOffheapManager.java:168)
    at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager$Checkpointer.markCheckpointBegin(GridCacheDatabaseSharedManager.java:3022)
    at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager$Checkpointer.doCheckpoint(GridCacheDatabaseSharedManager.java:2719)
    at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager$Checkpointer.body(GridCacheDatabaseSharedManager.java:2644)
    at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
    at java.lang.Thread.run(Thread.java:748)
{noformat}