[ 
https://issues.apache.org/jira/browse/IGNITE-12096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17015067#comment-17015067
 ] 

Colin Cassidy edited comment on IGNITE-12096 at 1/14/20 3:05 PM:
-----------------------------------------------------------------

 

I have been looking at the Ignite code - in particular, AbstractFreeList and 
PagesList and debugging the occurrence of this error. I believe that I have 
some understanding of the problem and also a simple potential fix, but I would 
like to verify this.I have been looking at the Ignite code - in particular, 
AbstractFreeList and PagesList and debugging the occurrence of this error. I 
believe that I have some understanding of the problem and also a simple 
potential fix, but I would like to verify this.

The problem occurs because fillFactor is calculated in DataRegionMetricsImpl as 
fillFactor = totalAllocated - freeSpace. When entries are removed from the 
cache in the supplied example, the value of totalAllocated is not reduced. In 
Ignite 2.6, freeSpace increased to compensate but from 2.7, this does not 
happen.
 The reason is that the freeSpace calculation appears to delibareately exclude 
the REUSE_BUCKET - which is the final entry (index 255) in the bucket list. 
[https://github.com/gridgain/gridgain/blob/master/modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/freelist/AbstractFreeList.java#L394]

When the test is run in Ignite 2.6, cache entry removal results in a large 
increment to bucket 251. This brings fillFactor close to zero. From Ignite 2.7, 
it is the REUSE_BUCKET that is incremented - but this does not contribute to 
freeSpace.

The difference in behaviour appears to be caused by the following change, which 
appears to mark pages for recycling:

[https://github.com/gridgain/gridgain/blame/master/modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/freelist/AbstractFreeList.java#L641|https://github.com/gridgain/gridgain/blame/master/modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/freelist/AbstractFreeList.java#L641https://github.com/gridgain/gridgain/commit/47da5df328a18d0d55ba534b1af541b72df76901]

[https://github.com/gridgain/gridgain/commit/47da5df328a18d0d55ba534b1af541b72df76901|https://github.com/gridgain/gridgain/blame/master/modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/freelist/AbstractFreeList.java#L641https://github.com/gridgain/gridgain/commit/47da5df328a18d0d55ba534b1af541b72df76901]

My proposed fix is to change AbstractFreeList:394 to include the REUSE_BUCKET 
in the freeSpace calculation i.e.

_for (int b = BUCKETS - 1; b > 0; b--) {_

instead of

_for (int b = BUCKETS - 2; b > 0; b--) {_

I can confirm that this fixes the metrics reporting for the test - but would 
like to understand the reasoning behind excluding the REUSE_BUCKET in the first 
place. Is there some reason why pages that are marked for recycling cannot be 
included in the free list? If so, is there a way we can avoid these pages being 
left in an apparently permanent limbo?

Do you agree with my proposed change, [~sergey-chugunov] and [~jokser]?

Regards,

Colin.


was (Author: coliin):
 

I have been looking at the Ignite code - in particular, AbstractFreeList and 
PagesList and debugging the occurrence of this error. I believe that I have 
some understanding of the problem and also a simple potential fix, but I would 
like to verify this.I have been looking at the Ignite code - in particular, 
AbstractFreeList and PagesList and debugging the occurrence of this error. I 
believe that I have some understanding of the problem and also a simple 
potential fix, but I would like to verify this.


The problem occurs because fillFactor is calculated in DataRegionMetricsImpl as 
fillFactor = totalAllocated - freeSpace. When entries are removed from the 
cache in the supplied example, the value of totalAllocated is not reduced. In 
Ignite 2.6, freeSpace increased to compensate but from 2.7, this does not 
happen.
The reason is that the freeSpace calculation appears to delibareately exclude 
the REUSE_BUCKET - which is the final entry (index 255) in the bucket list. 
[https://github.com/gridgain/gridgain/blob/master/modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/freelist/AbstractFreeList.java#L394]


When the test is run in Ignite 2.6, cache entry removal results in a large 
increment to bucket 251. This brings fillFactor close to zero. From Ignite 2.7, 
it is the REUSE_BUCKET that is incremented - but this does not contribute to 
freeSpace.


The difference in behaviour appears to be caused by the following change, which 
appears to mark pages for recycling:

[https://github.com/gridgain/gridgain/blame/master/modules/core/src/main/java/org/apache/ignite/internal/processors/cache/persistence/freelist/AbstractFreeList.java#L641https://github.com/gridgain/gridgain/commit/47da5df328a18d0d55ba534b1af541b72df76901]


My proposed fix is to change AbstractFreeList:394 to include the REUSE_BUCKET 
in the freeSpace calculation i.e.


_for (int b = BUCKETS - 1; b > 0; b--) {_

instead of

_for (int b = BUCKETS - 2; b > 0; b--) {_


I can confirm that this fixes the metrics reporting for the test - but would 
like to understand the reasoning behind excluding the REUSE_BUCKET in the first 
place. Is there some reason why pages that are marked for recycling cannot be 
included in the free list? If so, is there a way we can avoid these pages being 
left in an apparently permanent limbo?


Do you agree with my proposed change, [~sergey-chugunov] and [~jokser]?


Regards,

Colin.

> Used memory metric does not contract when cache is emptied - FreeList 
> REUSE_BUCKET is not accounted for
> -------------------------------------------------------------------------------------------------------
>
>                 Key: IGNITE-12096
>                 URL: https://issues.apache.org/jira/browse/IGNITE-12096
>             Project: Ignite
>          Issue Type: Bug
>          Components: cache
>    Affects Versions: 2.7
>            Reporter: Colin Cassidy
>            Priority: Critical
>
> When using the Ignite metrics API to measure available memory, the usage 
> figures appear to be accurate while memory is being consumed - but when 
> memory is freed the metrics do not drop. They appear to report that memory 
> has not been freed up, even though it has.
> A reliable estimate of memory consumption is very important for solutions 
> that don't use native persistence - as this is the only controllable way of 
> avoiding a critical OOM condition.
> Reproducer below. This affects Ignite 2.7+.
> {{}}{{import org.apache.ignite.failure.NoOpFailureHandler; }}
>  {{import org.junit.Test; }}
> {{public class MemoryTest2 { }}
> {{    private static final String CACHE_NAME = "cache"; }}
>  {{    private static final String DEFAULT_MEMORY_REGION = "Default_Region"; 
> }}
>  {{    private static final long MEM_SIZE = 100L * 1024 * 1024; }}
> {{    @Test }}
>  {{    public void testOOM() throws InterruptedException { }}
>  {{        try (Ignite ignite = startIgnite("IgniteMemoryMonitorTest1")) { }}
>  {{            fillDataRegion(ignite); }}
>  {{            CacheConfiguration<Object, Object> cfg = new }}
>  {{CacheConfiguration<>(CACHE_NAME); }}
>  {{            cfg.setStatisticsEnabled(true); }}
>  {{            IgniteCache<Object, Object> cache = }}
>  {{ignite.getOrCreateCache(cfg); }}
> {{            // Clear all entries from the cache to free up memory }}
>  {{            memUsed(ignite); }}
>  {{            cache.clear(); }}
>  {{            cache.removeAll(); }}
>  {{            cache.put("Key", "Value"); }}
>  {{            memUsed(ignite); }}
>  {{            cache.destroy(); }}
>  {{            Thread.sleep(5000); }}
> {{            // Should now report close to 0% but reports 59% still }}
>  {{            memUsed(ignite); }}
>  {{        } }}
>  {{    } }}
>  {{    }}
>  {{    private Ignite startIgnite(String instanceName) { }}
>  {{        IgniteConfiguration cfg = new IgniteConfiguration(); }}
>  {{        cfg.setIgniteInstanceName(instanceName); }}
>  {{        cfg.setDataStorageConfiguration(createDataStorageConfiguration()); 
> }}
>  {{        cfg.setFailureHandler(new NoOpFailureHandler()); }}
>  {{        return Ignition.start(cfg); }}
>  {{    } }}
> {{    private DataStorageConfiguration createDataStorageConfiguration() { }}
>  {{        return new DataStorageConfiguration() }}
>  {{                .setDefaultDataRegionConfiguration( }}
>  {{                        new DataRegionConfiguration() }}
>  {{                                .setName(DEFAULT_MEMORY_REGION) }}
>  {{                                .setInitialSize(MEM_SIZE) }}
>  {{                                .setMaxSize(MEM_SIZE) }}
>  {{                                .setMetricsEnabled(true)); }}
>  {{    } }}
> {{    private void fillDataRegion(Ignite ignite) { }}
>  {{        byte[] megabyte = new byte[1024 * 1024]; }}
>  {{            IgniteCache<Object, Object> cache = }}
>  {{                    ignite.getOrCreateCache(CACHE_NAME); }}
>  {{            for (int i = 0; i < 50; i++) { }}
>  {{                cache.put(i, megabyte); }}
>  {{                memUsed(ignite); }}
>  {{            } }}
>  {{    } }}
> {{    private void memUsed(Ignite ignite) { }}
>  {{        DataRegionConfiguration defaultDataRegionCfg = }}
>  {{ignite.configuration() }}
>  {{                .getDataStorageConfiguration() }}
>  {{                .getDefaultDataRegionConfiguration(); }}
>  {{        String regionName = defaultDataRegionCfg.getName(); }}
>  {{        DataRegionMetrics metrics = ignite.dataRegionMetrics(regionName); 
> }}
>  {{        float usedMem = metrics.getPagesFillFactor() * }}
>  {{metrics.getTotalAllocatedPages() * metrics.getPageSize(); }}
>  {{        float pctUsed = 100 * usedMem / defaultDataRegionCfg.getMaxSize(); 
> }}
>  {{        System.out.println("Memory used: " + pctUsed + "%"); }}
>  {{    } }}
>  {{} }}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to