[ 
https://issues.apache.org/jira/browse/IGNITE-17735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17607891#comment-17607891
 ] 

Vladimir Steshin edited comment on IGNITE-17735 at 10/25/22 9:34 AM:
---------------------------------------------------------------------

Data streamer with '_allowOverwrite==true_' and an _ATOMIC/PRIMARY_SYNC_
persistent cache may consume excessive heap.

The streamer was created before native persistence existed, and its default
settings are still tuned for in-memory caches. The streamer decides how much
data to send to a node based on the number of CPUs, which is probably not the
best approach for persistent caches.

There is a related setting, 'perNodeParallelOperations()', but its default
could be adjusted for persistence.
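
A simplified model of what a perNodeParallelOperations()-style limit controls:
a per-node window of unresponded batches. The window is modeled here with a
`java.util.concurrent.Semaphore`. `InFlightWindow`, `trySend()` and
`onResponse()` are hypothetical names, not Ignite API, and a real streamer
would block instead of returning false.

```java
import java.util.concurrent.Semaphore;

/** Hypothetical model of a per-node window of unresponded batches (not Ignite code). */
public class InFlightWindow {
    // One permit per batch that may be in flight without a response.
    private final Semaphore permits;

    InFlightWindow(int perNodeParallelOps) {
        this.permits = new Semaphore(perNodeParallelOps);
    }

    /** Tries to send a batch; returns false when the window is full. */
    boolean trySend() {
        return permits.tryAcquire();
    }

    /** Called when the remote node responds to a batch, freeing one slot. */
    void onResponse() {
        permits.release();
    }

    public static void main(String[] args) {
        InFlightWindow window = new InFlightWindow(4);
        int sent = 0;
        for (int i = 0; i < 6; i++) {
            if (window.trySend())
                sent++;
        }
        System.out.println("sent without responses: " + sent); // 4
        window.onResponse();
        System.out.println("can send after one response: " + window.trySend()); // true
    }
}
```

The window only limits batches that are still waiting for a response; it says
nothing about work the remote node keeps doing after it has responded.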

Suggestion: reduce the default maximum number of unresponded streamer batches
for persistent caches. There is no reason to send more than 4, 8 or 16
unresponded batches, because they get stuck on disk writes, WAL writes, page
replacement, WAL rollover, GC pauses and so on.
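
A sketch of the suggested default, assuming the current default is the remote
node's CPU count times a multiplier of 8 (an assumption here, not verified
against the Ignite source) and taking 16, the upper end of the range above, as
the cap for persistent caches. `StreamerDefaults` and `defaultParallelOps()`
are hypothetical names.

```java
/** Hypothetical sketch of a persistence-aware default (not Ignite code). */
public class StreamerDefaults {
    // Assumed CPU multiplier for the in-memory default.
    static final int PARALLEL_OPS_MULTIPLIER = 8;
    // Assumed cap for persistent caches: upper end of the 4-8-16 range.
    static final int PERSISTENT_MAX_PARALLEL_OPS = 16;

    /** Max unresponded batches per node, derived from the remote node's CPU count. */
    static int defaultParallelOps(int remoteCpus, boolean persistent) {
        int cpuBased = Math.max(1, remoteCpus) * PARALLEL_OPS_MULTIPLIER;
        return persistent ? Math.min(cpuBased, PERSISTENT_MAX_PARALLEL_OPS) : cpuBased;
    }

    public static void main(String[] args) {
        System.out.println(defaultParallelOps(32, false)); // 256
        System.out.println(defaultParallelOps(32, true));  // 16
        System.out.println(defaultParallelOps(1, true));   // 8
    }
}
```

Capping rather than replacing the CPU-based value keeps small nodes at their
current default while preventing large nodes from queuing hundreds of batches
behind disk and WAL I/O.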

The problem is that some streamer receivers do not wait for backup updates on
the loading node, so the streamer keeps sending update batches again and again.
The individual receiver uses _cache.put()_, and every put creates additional
backup requests. But the current streamer batch request has already been
responded to, so the next batch of updates is accepted. Nodes start
accumulating record-update structures in the heap. Some JFR screenshots are
attached.
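
A toy, deterministic model (not Ignite code) of why the window of unresponded
batches does not bound heap usage when a batch is answered before its backup
updates finish. Each tick the loading node fills its window; every entry
becomes one put() and thus one pending backup request; the backup side applies
a fixed number of requests per tick. `BackupBacklogModel`, its parameters and
the numbers in `main` are illustrative assumptions.

```java
/** Toy model of pending backup requests piling up (not Ignite code). */
public class BackupBacklogModel {

    /**
     * Returns the peak number of pending backup requests.
     * ackAfterBackups == false: batch answered immediately (PRIMARY_SYNC-like).
     * ackAfterBackups == true: batch answered only after its backups are applied.
     */
    static long peakPendingBackups(int window, int batchSize, int backupsPerTick,
                                   int ticks, boolean ackAfterBackups) {
        long pending = 0;
        long peak = 0;
        for (int t = 0; t < ticks; t++) {
            // Backups are applied FIFO, so the batches still unresponded are those
            // with unapplied backup requests: ceil(pending / batchSize).
            long unresponded = ackAfterBackups ? (pending + batchSize - 1) / batchSize : 0;
            long toSend = window - unresponded;
            pending += toSend * batchSize;                    // one backup request per put()
            peak = Math.max(peak, pending);
            pending = Math.max(0, pending - backupsPerTick); // backup side catches up
        }
        return peak;
    }

    public static void main(String[] args) {
        // 16 batches of 512 entries per tick vs. 4096 backup updates applied per tick.
        System.out.println(peakPendingBackups(16, 512, 4096, 100, true));  // 8192
        System.out.println(peakPendingBackups(16, 512, 4096, 100, false)); // 413696
    }
}
```

When responses wait for backups, pending work never exceeds window * batchSize;
when they do not, it grows by the difference between send and apply rates on
every tick, which is the unbounded heap growth described above.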

See `DataStreamProcessorSelfTest.testAtomicPrimarySyncStability()` and
`JmhStreamerReceiverBenchmark`.


was (Author: vladsz83):
Datastreamer with '_allowOverwrite==false_' and _ATOMIC/PRIMARY_SYNC_ 
persistent cache may consume heap. 


> Datastreamer may consume heap with default settings.
> ----------------------------------------------------
>
>                 Key: IGNITE-17735
>                 URL: https://issues.apache.org/jira/browse/IGNITE-17735
>             Project: Ignite
>          Issue Type: Sub-task
>            Reporter: Vladimir Steshin
>            Assignee: Vladimir Steshin
>            Priority: Major
>              Labels: ise
>         Attachments: DS_heap_no_events_no_wal.png, 
> DS_heap_no_events_no_wal_2.png, HeapConsumptionDataStreamerTest.src
>
>          Time Spent: 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)
