[ https://issues.apache.org/jira/browse/IGNITE-12096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17015067#comment-17015067 ]
Colin Cassidy edited comment on IGNITE-12096 at 1/14/20 3:11 PM:
-----------------------------------------------------------------

I have been looking at the related Ignite code - in particular, AbstractFreeList and PagesList - and debugging the occurrence of this error. I believe that I have some understanding of the problem and also a simple potential fix, but I would like to verify this.

The problem occurs because fillFactor is calculated in DataRegionMetricsImpl as fillFactor = totalAllocated - freeSpace. When entries are removed from the cache in the supplied example, the value of totalAllocated is not reduced. In Ignite 2.6, freeSpace increased to compensate, but from 2.7 this does not happen. The reason is that the freeSpace calculation appears to deliberately exclude the REUSE_BUCKET, which is the final entry (index 255) in the bucket list:

[https://github.com/gridgain/gridgain/blob/master/modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/freelist/AbstractFreeList.java#L394]

When the test is run in Ignite 2.6, cache entry removal results in a large increment to bucket 251, which brings fillFactor close to zero. From Ignite 2.7, it is the REUSE_BUCKET that is incremented - but this does not contribute to freeSpace.
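To make the arithmetic concrete, below is a minimal, self-contained sketch of the used-memory calculation that the reproducer's memUsed() method performs from the region metrics. The metric values in main() are hypothetical illustrations, not outputs from the test:

```java
public class FillFactorSketch {

    // Mirrors the reproducer's memUsed(): used memory is derived as
    // fillFactor * totalAllocatedPages * pageSize, then expressed as a
    // percentage of the region's max size.
    public static float pctUsed(float fillFactor, long totalAllocatedPages,
                                int pageSize, long maxSize) {
        float usedMem = fillFactor * totalAllocatedPages * pageSize;
        return 100f * usedMem / maxSize;
    }

    public static void main(String[] args) {
        long maxSize = 100L * 1024 * 1024; // matches MEM_SIZE in the reproducer

        // Hypothetical snapshot after cache.clear(): totalAllocatedPages is
        // never reduced, and fillFactor barely moves because freed pages sit
        // in the REUSE_BUCKET, which the freeSpace calculation ignores.
        System.out.println("Memory used: "
            + pctUsed(0.59f, 25_600, 4096, maxSize) + "%");
    }
}
```

Because both totalAllocatedPages and fillFactor stay high after the cache is emptied, the reported percentage cannot fall, which is exactly the symptom in the reproducer.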
The difference in behaviour appears to be caused by the below change ([IGNITE-10669|https://github.com/gridgain/gridgain/commit/47da5df328a18d0d55ba534b1af541b72df76901]), which appears to mark pages for recycling:

[https://github.com/gridgain/gridgain/blame/master/modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/freelist/AbstractFreeList.java#L641]

My proposed fix is to change AbstractFreeList:394 to include the REUSE_BUCKET in the freeSpace calculation, i.e.

_for (int b = BUCKETS - 1; b > 0; b--) {_

instead of

_for (int b = BUCKETS - 2; b > 0; b--) {_

I can confirm that this fixes the metrics reporting for the test - but I would like to understand the reasoning behind excluding the REUSE_BUCKET in the first place. Is there some reason why pages that are marked for recycling cannot be included in the free list? If so, is there a way we can avoid these pages being left in an apparently permanent limbo?

Do you agree with my proposed change, [~sergey-chugunov] and [~jokser]?

FYI - the issue exists in all versions that I have tested, including GitHub master and GridGain commercial versions. As far as I'm aware, IGNITE-11455 and IGNITE-10669 do not fix this issue.

Regards, Colin.
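A minimal sketch of the effect of the proposed loop-bound change. The bucket counts and shift value below are hypothetical, and the per-bucket counters are simplified to a plain long array rather than AbstractFreeList's actual internal structures:

```java
public class ReuseBucketSketch {
    static final int BUCKETS = 256;
    static final int REUSE_BUCKET = BUCKETS - 1; // index 255

    // Free space summed over the buckets, optionally skipping the reuse
    // bucket - loosely mirroring the loop at AbstractFreeList:394, where each
    // bucket index maps to a per-page free-space amount via a shift.
    static long freeSpace(long[] bucketSizes, int shift, boolean includeReuse) {
        long free = 0;
        int start = includeReuse ? BUCKETS - 1 : BUCKETS - 2;
        for (int b = start; b > 0; b--)
            free += bucketSizes[b] * (((long) b) << shift);
        return free;
    }

    public static void main(String[] args) {
        long[] buckets = new long[BUCKETS];
        buckets[REUSE_BUCKET] = 12_000; // pages recycled after cache.clear()
        int shift = 4;                  // hypothetical granularity

        // With BUCKETS - 2 the recycled pages are invisible to the metric;
        // with BUCKETS - 1 they count toward free space.
        System.out.println(freeSpace(buckets, shift, false));
        System.out.println(freeSpace(buckets, shift, true));
    }
}
```

With all freed pages accumulating in the reuse bucket, the unfixed loop reports zero additional free space, while the one-bound change makes the recycled pages contribute, which is the behaviour the fix restores.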
> Used memory metric does not contract when cache is emptied - FreeList REUSE_BUCKET is not accounted for
> -------------------------------------------------------------------------------------------------------
>
>                 Key: IGNITE-12096
>                 URL: https://issues.apache.org/jira/browse/IGNITE-12096
>             Project: Ignite
>          Issue Type: Bug
>          Components: cache
>    Affects Versions: 2.7
>            Reporter: Colin Cassidy
>            Priority: Critical
>
> When using the Ignite metrics API to measure available memory, the usage figures appear to be accurate while memory is being consumed - but when memory is freed, the metrics do not drop. They appear to report that memory has not been freed up, even though it has.
> A reliable estimate of memory consumption is very important for solutions that don't use native persistence, as this is the only controllable way of avoiding a critical OOM condition.
> Reproducer below. This affects Ignite 2.7+.
> {code:java}
> import org.apache.ignite.DataRegionMetrics;
> import org.apache.ignite.Ignite;
> import org.apache.ignite.IgniteCache;
> import org.apache.ignite.Ignition;
> import org.apache.ignite.configuration.CacheConfiguration;
> import org.apache.ignite.configuration.DataRegionConfiguration;
> import org.apache.ignite.configuration.DataStorageConfiguration;
> import org.apache.ignite.configuration.IgniteConfiguration;
> import org.apache.ignite.failure.NoOpFailureHandler;
> import org.junit.Test;
>
> public class MemoryTest2 {
>     private static final String CACHE_NAME = "cache";
>     private static final String DEFAULT_MEMORY_REGION = "Default_Region";
>     private static final long MEM_SIZE = 100L * 1024 * 1024;
>
>     @Test
>     public void testOOM() throws InterruptedException {
>         try (Ignite ignite = startIgnite("IgniteMemoryMonitorTest1")) {
>             fillDataRegion(ignite);
>             CacheConfiguration<Object, Object> cfg = new CacheConfiguration<>(CACHE_NAME);
>             cfg.setStatisticsEnabled(true);
>             IgniteCache<Object, Object> cache = ignite.getOrCreateCache(cfg);
>
>             // Clear all entries from the cache to free up memory
>             memUsed(ignite);
>             cache.clear();
>             cache.removeAll();
>             cache.put("Key", "Value");
>             memUsed(ignite);
>             cache.destroy();
>             Thread.sleep(5000);
>
>             // Should now report close to 0% but reports 59% still
>             memUsed(ignite);
>         }
>     }
>
>     private Ignite startIgnite(String instanceName) {
>         IgniteConfiguration cfg = new IgniteConfiguration();
>         cfg.setIgniteInstanceName(instanceName);
>         cfg.setDataStorageConfiguration(createDataStorageConfiguration());
>         cfg.setFailureHandler(new NoOpFailureHandler());
>         return Ignition.start(cfg);
>     }
>
>     private DataStorageConfiguration createDataStorageConfiguration() {
>         return new DataStorageConfiguration()
>             .setDefaultDataRegionConfiguration(
>                 new DataRegionConfiguration()
>                     .setName(DEFAULT_MEMORY_REGION)
>                     .setInitialSize(MEM_SIZE)
>                     .setMaxSize(MEM_SIZE)
>                     .setMetricsEnabled(true));
>     }
>
>     private void fillDataRegion(Ignite ignite) {
>         byte[] megabyte = new byte[1024 * 1024];
>         IgniteCache<Object, Object> cache = ignite.getOrCreateCache(CACHE_NAME);
>         for (int i = 0; i < 50; i++) {
>             cache.put(i, megabyte);
>             memUsed(ignite);
>         }
>     }
>
>     private void memUsed(Ignite ignite) {
>         DataRegionConfiguration defaultDataRegionCfg = ignite.configuration()
>             .getDataStorageConfiguration()
>             .getDefaultDataRegionConfiguration();
>         String regionName = defaultDataRegionCfg.getName();
>         DataRegionMetrics metrics = ignite.dataRegionMetrics(regionName);
>         float usedMem = metrics.getPagesFillFactor() * metrics.getTotalAllocatedPages() * metrics.getPageSize();
>         float pctUsed = 100 * usedMem / defaultDataRegionCfg.getMaxSize();
>         System.out.println("Memory used: " + pctUsed + "%");
>     }
> }
> {code}

-- This message was sent by Atlassian Jira (v8.3.4#803005)