Dmitriy Pavlov created IGNITE-7507:
--------------------------------------

             Summary: Ignite node performance drop during checkpoint start: 
store metapage eviction causes long checkpoint lock hold time
                 Key: IGNITE-7507
                 URL: https://issues.apache.org/jira/browse/IGNITE-7507
             Project: Ignite
          Issue Type: Bug
          Components: persistence
            Reporter: Dmitriy Pavlov
            Assignee: Dmitriy Pavlov
             Fix For: 2.5


Eviction of store metadata pages becomes a very expensive operation during
checkpoint start.

Reading these pages hangs the Ignite node until the metadata is loaded from disk.

The following store (partition) metapages:
- Partition Metadata Page
- Freelist Meta Page
- Partition Counters IO
are required during execution of saveStoreMetadata() and markCheckpointBegin().

If any of these pages is not available in memory, it is loaded from disk.
However, such loads are performed while holding checkpointLock (in write mode).
Example of timing:
- checkpointLockWait=75ms, checkpointLockHoldTime=2653ms, pages=696120

During all this time, worker threads are not able to put any data into any cache.
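
The blocking effect can be reproduced in isolation. The following is a minimal sketch, not Ignite code: it assumes the checkpoint lock behaves like a java.util.concurrent ReentrantReadWriteLock, with the checkpointer taking the write lock and worker threads taking the read lock. The class and method names are hypothetical, and the slow disk read is replaced by a latch wait.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical model of a worker blocked by a long write-lock hold. */
class CheckpointLockSketch {
    /**
     * Returns true if a worker managed to take the read lock while the
     * checkpointer thread was still holding the write lock.
     */
    static boolean workerCanPutDuringCheckpoint() {
        ReentrantReadWriteLock checkpointLock = new ReentrantReadWriteLock();
        CountDownLatch writeLockHeld = new CountDownLatch(1);
        CountDownLatch workerDone = new CountDownLatch(1);

        Thread checkpointer = new Thread(() -> {
            checkpointLock.writeLock().lock();
            try {
                writeLockHeld.countDown();
                // Stand-in for a slow metapage read from disk: keep holding
                // the write lock until the worker has finished its attempt.
                workerDone.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                checkpointLock.writeLock().unlock();
            }
        });
        checkpointer.start();

        try {
            writeLockHeld.await();
            // The worker (this thread) needs the read lock before a cache put.
            boolean acquired =
                checkpointLock.readLock().tryLock(50, TimeUnit.MILLISECONDS);
            if (acquired)
                checkpointLock.readLock().unlock();
            return acquired;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            // Let the checkpointer release the write lock and exit.
            workerDone.countDown();
        }
    }

    public static void main(String[] args) {
        System.out.println("worker acquired read lock during checkpoint: "
            + workerCanPutDuringCheckpoint());
    }
}
```

In this model the worker's attempt times out for as long as the write lock is held, which corresponds to the checkpointLockHoldTime reported above.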

Eviction of such pages should be avoided (they should be evicted with lower
priority than dirty pages).
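
One way to picture the proposed priority is a victim selection that ranks candidate pages by kind before looking at age. This is a sketch under stated assumptions and not the PageMemoryImpl implementation: the PageKind enum, the Candidate class, and pickVictim() are hypothetical names, and the rule that clean pages go before dirty ones is an assumption added for the example.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

/** Hypothetical eviction-victim selection that keeps store metapages longest. */
class EvictionPrioritySketch {
    /** Declaration order is eviction order: earlier kinds are evicted first. */
    enum PageKind { CLEAN_DATA, DIRTY_DATA, STORE_META }

    static final class Candidate {
        final long pageId;
        final PageKind kind;
        final long lastAccessTs;

        Candidate(long pageId, PageKind kind, long lastAccessTs) {
            this.pageId = pageId;
            this.kind = kind;
            this.lastAccessTs = lastAccessTs;
        }
    }

    /**
     * Picks the page to evict from a sample of candidates: the kind decides
     * first, and the oldest access time breaks ties within the same kind.
     */
    static Optional<Candidate> pickVictim(List<Candidate> sample) {
        return sample.stream().min(Comparator
            .comparing((Candidate c) -> c.kind)
            .thenComparingLong(c -> c.lastAccessTs));
    }

    public static void main(String[] args) {
        List<Candidate> sample = Arrays.asList(
            new Candidate(1L, PageKind.STORE_META, 10L),  // oldest, but a metapage
            new Candidate(2L, PageKind.DIRTY_DATA, 50L),
            new Candidate(3L, PageKind.DIRTY_DATA, 40L));

        System.out.println("victim pageId=" + pickVictim(sample).get().pageId);
    }
}
```

With this ordering a store metapage is chosen only when the sample contains nothing else, so it is far less likely to be missing from memory when the checkpoint starts.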

(Full stacktrace)       
{noformat} db-checkpoint-thread-#40%checkpoint.IgniteMassLoadSandboxTest1% Id=63 WAITING
        at sun.misc.Unsafe.park(Native Method)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:304)
        at org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:177)
        at org.apache.ignite.internal.util.future.GridFutureAdapter.getUninterruptibly(GridFutureAdapter.java:145)
        at org.apache.ignite.internal.processors.cache.persistence.file.AsyncFileIO.read(AsyncFileIO.java:95)
        at org.apache.ignite.internal.processors.cache.persistence.file.FilePageStore.read(FilePageStore.java:324)
        at org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.read(FilePageStoreManager.java:306)
        at org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.read(FilePageStoreManager.java:291)
        at org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl.acquirePage(PageMemoryImpl.java:656)
        at org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl.acquirePage(PageMemoryImpl.java:576)
        at org.apache.ignite.internal.processors.cache.persistence.DataStructure.acquirePage(DataStructure.java:130)
        at org.apache.ignite.internal.processors.cache.persistence.freelist.PagesList.saveMetadata(PagesList.java:301)
        at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager.saveStoreMetadata(GridCacheOffheapManager.java:196)
        at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager.onCheckpointBegin(GridCacheOffheapManager.java:168)
        at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager$Checkpointer.markCheckpointBegin(GridCacheDatabaseSharedManager.java:3022)
        at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager$Checkpointer.doCheckpoint(GridCacheDatabaseSharedManager.java:2719)
        at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager$Checkpointer.body(GridCacheDatabaseSharedManager.java:2644)
        at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
        at java.lang.Thread.run(Thread.java:748)
{noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
