[ 
https://issues.apache.org/jira/browse/IGNITE-7019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16334085#comment-16334085
 ] 

Dmitriy Sorokin commented on IGNITE-7019:
-----------------------------------------

We discussed possible solutions with [~mcherkasov] and [~avinogradov] and 
chose the following. First, when an IgniteOutOfMemoryException (IOOME) occurs 
while moving a page from a bucket with a lower index to one with a higher 
index, we leave the page in the old bucket. Second, we add a periodic task 
that looks for such pages (those left in the wrong bucket) and corrects their 
placement when possible (that is, when moving the page no longer causes an 
IOOME).
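
To make the two steps concrete, here is a minimal sketch of the idea. It is 
not the actual PagesList/FreeListImpl code: the class, the Page type, the 
stripeBudget counter that simulates running out of memory, and all method 
names are hypothetical and exist only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified bucketed free list; not Ignite's real implementation. */
class BucketFreeListSketch {
    /** Stand-in for IgniteOutOfMemoryException (assumption for this sketch). */
    static class OutOfMemory extends RuntimeException { }

    /** A page is described only by an id and its current free space. */
    static class Page {
        final int id;
        int freeSpace;

        Page(int id, int freeSpace) { this.id = id; this.freeSpace = freeSpace; }
    }

    private final List<List<Page>> buckets = new ArrayList<>();
    private final int bucketSize; // free-space range covered by one bucket
    int stripeBudget;             // simulated memory: stripe allocations still allowed

    BucketFreeListSketch(int bucketCnt, int bucketSize, int stripeBudget) {
        for (int i = 0; i < bucketCnt; i++)
            buckets.add(new ArrayList<>());
        this.bucketSize = bucketSize;
        this.stripeBudget = stripeBudget;
    }

    /** Bucket the page belongs to according to its free space. */
    int targetBucket(Page p) {
        return Math.min(p.freeSpace / bucketSize, buckets.size() - 1);
    }

    /** Bucket the page is currently stored in, or -1 if it is not tracked. */
    int currentBucket(Page p) {
        for (int i = 0; i < buckets.size(); i++)
            if (buckets.get(i).contains(p))
                return i;
        return -1;
    }

    /** Putting a page into an empty bucket needs a new stripe, which may fail. */
    private void put(Page p, int bucket) {
        List<Page> b = buckets.get(bucket);
        if (b.isEmpty()) {
            if (stripeBudget == 0)
                throw new OutOfMemory();
            stripeBudget--;
        }
        b.add(p);
    }

    void add(Page p) { put(p, targetBucket(p)); }

    /** Step 1: try to move the page; on OOM leave it in the old bucket. */
    boolean tryMove(Page p, int oldBucket) {
        int newBucket = targetBucket(p);
        if (newBucket == oldBucket)
            return true;
        try {
            put(p, newBucket);
        } catch (OutOfMemory e) {
            return false; // page stays in oldBucket, now "misplaced"
        }
        buckets.get(oldBucket).remove(p);
        return true;
    }

    /** Step 2: periodic pass that retries the move for misplaced pages. */
    int fixMisplaced() {
        int moved = 0;
        for (int i = 0; i < buckets.size(); i++)
            for (Page p : new ArrayList<>(buckets.get(i)))
                if (targetBucket(p) != i && tryMove(p, i))
                    moved++;
        return moved;
    }
}
```

In this sketch fixMisplaced() is what the periodic task would call, for 
example from a ScheduledExecutorService with a fixed delay; how the real task 
is scheduled and synchronized with concurrent page operations is left open.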

We also need a reproducer for this bug; I'll create that first.

> Cluster can not survive after IgniteOOM
> ---------------------------------------
>
>                 Key: IGNITE-7019
>                 URL: https://issues.apache.org/jira/browse/IGNITE-7019
>             Project: Ignite
>          Issue Type: Bug
>          Components: cache
>    Affects Versions: 2.3
>            Reporter: Mikhail Cherkasov
>            Assignee: Dmitriy Sorokin
>            Priority: Critical
>              Labels: iep-7
>             Fix For: 2.5
>
>
> Even with full sync mode and a transactional cache, we can't add new nodes 
> if there was an IgniteOOM: after adding new nodes and rebalancing, old nodes 
> can't evict partitions:
> {code}
> [2017-11-17 20:02:24,588][ERROR][sys-#65%DR1%][GridDhtPreloader] Partition eviction failed, this can cause grid hang.
> class org.apache.ignite.internal.mem.IgniteOutOfMemoryException: Not enough memory allocated [policyName=100MB_Region_Eviction, size=104.9 MB]
> Consider increasing memory policy size, enabling evictions, adding more nodes to the cluster, reducing number of backups or reducing model size.
>     at org.apache.ignite.internal.pagemem.impl.PageMemoryNoStoreImpl.allocatePage(PageMemoryNoStoreImpl.java:294)
>     at org.apache.ignite.internal.processors.cache.persistence.DataStructure.allocatePageNoReuse(DataStructure.java:117)
>     at org.apache.ignite.internal.processors.cache.persistence.DataStructure.allocatePage(DataStructure.java:105)
>     at org.apache.ignite.internal.processors.cache.persistence.freelist.PagesList.addStripe(PagesList.java:413)
>     at org.apache.ignite.internal.processors.cache.persistence.freelist.PagesList.getPageForPut(PagesList.java:528)
>     at org.apache.ignite.internal.processors.cache.persistence.freelist.PagesList.put(PagesList.java:617)
>     at org.apache.ignite.internal.processors.cache.persistence.freelist.FreeListImpl.addForRecycle(FreeListImpl.java:582)
>     at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$Remove.reuseFreePages(BPlusTree.java:3847)
>     at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$Remove.releaseAll(BPlusTree.java:4106)
>     at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$Remove.access$6900(BPlusTree.java:3166)
>     at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.doRemove(BPlusTree.java:1782)
>     at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.remove(BPlusTree.java:1567)
>     at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.remove(IgniteCacheOffheapManagerImpl.java:1387)
>     at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl.remove(IgniteCacheOffheapManagerImpl.java:374)
>     at org.apache.ignite.internal.processors.cache.GridCacheMapEntry.removeValue(GridCacheMapEntry.java:3233)
>     at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtCacheEntry.clearInternal(GridDhtCacheEntry.java:588)
>     at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLocalPartition.clearAll(GridDhtLocalPartition.java:892)
>     at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLocalPartition.tryEvict(GridDhtLocalPartition.java:750)
>     at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPreloader$3.call(GridDhtPreloader.java:593)
>     at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPreloader$3.call(GridDhtPreloader.java:580)
>     at org.apache.ignite.internal.util.IgniteUtils.wrapThreadLoader(IgniteUtils.java:6639)
>     at org.apache.ignite.internal.processors.closure.GridClosureProcessor$2.body(GridClosureProcessor.java:967)
>     at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:748)
> {code}
> Discussion on the dev list: 
> http://apache-ignite-developers.2346864.n4.nabble.com/How-properly-handle-IgniteOOM-td25288.html



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
