This is a known issue in GemFire. I filed a corresponding Geode
ticket: GEODE-1672.

The issue involves recovering large amounts of data when using the heap LRU.
The data is recovered asynchronously, before the heap LRU has a chance to
evict it, so it is possible to run out of heap during recovery. Eviction only
moves an entry's value to disk; the entry key and entry meta-data remain in
memory. Geode will stop recovering values once the resource manager hits the
eviction limit but, since it is not able to evict values that have already
been recovered, it can run out of memory recovering the keys and meta-data
for subsequent entries.
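
For reference, the eviction limit mentioned above is the eviction-heap-percentage
configured on the cache's resource manager (the same threshold you are already
setting through gfsh). A minimal programmatic sketch of those thresholds,
assuming GemFire package names (newer Apache Geode releases use org.apache.geode
instead of com.gemstone.gemfire):

import com.gemstone.gemfire.cache.Cache;
import com.gemstone.gemfire.cache.CacheFactory;
import com.gemstone.gemfire.cache.control.ResourceManager;

public class HeapThresholds {
  public static void main(String[] args) {
    Cache cache = new CacheFactory().create();
    ResourceManager rm = cache.getResourceManager();
    // Heap LRU eviction starts overflowing entry values to disk at this heap usage.
    rm.setEvictionHeapPercentage(81.0f);
    // Above this usage the member refuses operations that add data, to protect the JVM.
    rm.setCriticalHeapPercentage(90.0f);
  }
}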

In your case the only known workaround is to set this system property:
-Dgemfire.disk.recoverValues=false
This will cause values not to be asynchronously recovered from disk. Any time
you ask for a value by doing a Region get, it will need to be faulted in from
disk at that time. For some use cases this might be optimal, since it prevents
recovery from faulting in values that may never be read.
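
Since you are starting servers with the --initial-heap/--max-heap style options,
the property can be passed through to the server JVM with gfsh's --J option; a
sketch, with the member name and heap size as placeholders based on your
description:

start server --name=server1 --initial-heap=16g --max-heap=16g --eviction-heap-percentage=81 --critical-heap-percentage=90 --J=-Dgemfire.disk.recoverValues=false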


On Fri, Jul 15, 2016 at 2:01 PM, Nilkanth Patel <nilkanth.hpa...@gmail.com>
wrote:

> Hello,
>
> I am facing an issue recovering data for persisted regions when a large
> amount of data (more than the heap) is persisted.
>
> A brief description of the scenario:
>
> I create 10 regions, let's call them R1, R2, R3, ... R10, with the following
> config.
> For R1, R2: total # of buckets = 113.
> For R3, R4, R10: # of buckets = 511.
>
> All of the above regions are configured with disk persistence enabled (async)
> and eviction action overflow to disk, like:
>
> RegionFactory<Object, Object> rf =
>     cache.createRegionFactory(RegionShortcut.PARTITION_PERSISTENT_OVERFLOW);
> rf.setDiskSynchronous(false); // for async writes
> rf.setDiskStoreName("myDiskStore");
> PartitionAttributesFactory paf = new PartitionAttributesFactory()
>     .setRedundantCopies(3)
>     .setTotalNumBuckets(511);
> rf.setPartitionAttributes(paf.create());
>
>
> For each server, I set both --initial-heap and --max-heap to the same value,
> i.e. 16gb, with --eviction-heap-percentage=81 --critical-heap-percentage=90.
>
> I keep the system running (puts, gets, deletes) for hours to add data over
> time until I have overflowed tons of data, approaching the heap size or
> more.
> Now I shut down my cluster and then attempt to restart it, but it does not
> come up. It seems that during this early phase of recovery (with a large
> amount of data), Geode surpasses the critical threshold, which kills it
> before a successful startup.
>
> Is this observation correct, and is this a known limitation? If so, is there
> any workaround for it?
>
> Also, considering the above case, do (1) the ForceDisconnect-->Autoconnect
> case and (2) the normal_shutdown-->restart case use the same recovery
> mechanism, or are there any differences?
>
> Thanks in advance,
>
> Nilkanth.
>