[ 
https://issues.apache.org/jira/browse/IGNITE-10959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16967866#comment-16967866
 ] 

Zane Hu commented on IGNITE-10959:
----------------------------------

We have observed two cases of huge memory usage in Ignite Continuous Query, 
both caused by too many pending cache-update events accumulating because an 
earlier event has not yet arrived. We are using Ignite 2.7.0.
 * One is CacheContinuousQueryHandler.rcvs growing to 7.7 GB of retained heap, 
as seen in jmap/Memory Analyzer. We also saw "Pending events reached max of 
buffer size" in the Ignite log. According to 
[https://github.com/apache/ignite/blob/ignite-2.7/modules/core/src/main/java/org/apache/ignite/internal/processors/cache/query/continuous/CacheContinuousQueryPartitionRecovery.java#L196],
 this happens when the size of CacheContinuousQueryPartitionRecovery.pendingEvts 
reaches MAX_BUFF_SIZE (default 10,000). Ignite then flushes and removes 10% of 
the entries in pendingEvts, even though this means some not-yet-arrived early 
events are dropped without notifying the listener. This upper bound of 
MAX_BUFF_SIZE prevents the memory from growing further toward OOM.
 * Another is CacheContinuousQueryEventBuffer.pending growing to 22 GB of 
retained heap, as seen in jmap/Memory Analyzer. According to 
[https://github.com/apache/ignite/blob/ignite-2.7/modules/core/src/main/java/org/apache/ignite/internal/processors/cache/query/continuous/CacheContinuousQueryEventBuffer.java#L168],
 cache-update events are processed in batches of 
CacheContinuousQueryEventBuffer.Batch.entries[BUF_SIZE] (default BUF_SIZE is 
1,000). If an event's update counter falls within the current batch 
(e.updateCounter() <= batch.endCntr), it is processed by batch.processEntry0(); 
otherwise it is put into CacheContinuousQueryEventBuffer.pending. However, 
according to 
[https://github.com/apache/ignite/blob/ignite-2.7/modules/core/src/main/java/org/apache/ignite/internal/processors/cache/query/continuous/CacheContinuousQueryEventBuffer.java#L425],
 if any event within the current batch has not arrived, Ignite will not move on 
to the next batch and process the entries in 
CacheContinuousQueryEventBuffer.pending. Unlike 
CacheContinuousQueryPartitionRecovery.pendingEvts with its MAX_BUFF_SIZE, there 
is NO upper bound on CacheContinuousQueryEventBuffer.pending. This means that 
if an event earlier than those in CacheContinuousQueryEventBuffer.pending never 
arrives for some reason (a high event rate, high concurrency, a timeout, ...), 
CacheContinuousQueryEventBuffer.pending will grow until OOM. To prevent this, I 
think Ignite at least needs to add an upper bound here and flush and remove 10% 
of the events from CacheContinuousQueryEventBuffer.pending, similarly to 
CacheContinuousQueryPartitionRecovery.pendingEvts. In terms of failure 
handling, I think dropping some events is better than OOM.
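The suggested fix could look roughly like the following sketch. This is not 
Ignite code: the class name BoundedPendingBuffer, the cap MAX_PENDING_SIZE, and 
the add() method are all hypothetical illustrations of the idea of capping the 
pending map by update counter and flushing the oldest 10% of entries when the 
cap is exceeded, the way CacheContinuousQueryPartitionRecovery handles 
pendingEvts:

```java
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical sketch of a bounded pending-event buffer. A TreeMap keyed by
 * the event's update counter keeps entries ordered, so the "oldest" events
 * (lowest counters) can be flushed first once the cap is exceeded.
 */
public class BoundedPendingBuffer {
    /** Illustrative cap, analogous to MAX_BUFF_SIZE (default 10,000). */
    static final int MAX_PENDING_SIZE = 10_000;

    /** Out-of-order events awaiting an earlier update counter. */
    private final TreeMap<Long, Object> pending = new TreeMap<>();

    /**
     * Buffers an out-of-order event. If the cap is exceeded, flushes the
     * oldest 10% of entries (delivering or dropping them) instead of letting
     * the map grow without bound. Returns the number of entries flushed.
     */
    public int add(long updateCounter, Object event) {
        pending.put(updateCounter, event);

        int flushed = 0;
        if (pending.size() > MAX_PENDING_SIZE) {
            int toFlush = MAX_PENDING_SIZE / 10; // drop oldest 10%
            while (flushed < toFlush && !pending.isEmpty()) {
                Map.Entry<Long, Object> oldest = pending.pollFirstEntry();
                // In a real fix, notify the listener of the gap here
                // before discarding 'oldest'.
                flushed++;
            }
        }
        return flushed;
    }

    public int size() {
        return pending.size();
    }
}
```

With a 10,000-entry cap, the 10,001st buffered event would trigger a flush of 
the oldest 1,000 entries, bounding the retained heap at the cost of possibly 
skipping events whose predecessor never arrives.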

 

> Memory leaks in continuous query handlers
> -----------------------------------------
>
>                 Key: IGNITE-10959
>                 URL: https://issues.apache.org/jira/browse/IGNITE-10959
>             Project: Ignite
>          Issue Type: Bug
>    Affects Versions: 2.7
>            Reporter: Denis Mekhanikov
>            Priority: Major
>             Fix For: 2.9
>
>         Attachments: CacheContinuousQueryMemoryUsageTest.java, 
> CacheContinuousQueryMemoryUsageTest.result, 
> CacheContinuousQueryMemoryUsageTest2.java, continuousquery_leak_profile.png
>
>
> Continuous query handlers don't clear internal data structures after cache 
> events are processed.
> A test, that reproduces the problem, is attached.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
