[ https://issues.apache.org/jira/browse/IGNITE-17735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17607891#comment-17607891 ]
Vladimir Steshin edited comment on IGNITE-17735 at 10/25/22 9:34 AM:
---------------------------------------------------------------------

Datastreamer with '_allowOverwrite==false_' and an _ATOMIC/PRIMARY_SYNC_ persistent cache may consume heap.

The streamer was created before persistence existed. Its default settings are still tuned for in-memory caches. The streamer decides how much data to send to a node based on the number of CPUs. This is probably not the best approach for persistent caches. There is a related 'perNodeParallelOperations()' setting, but the defaults might be adjusted for persistence.

Suggestion: reduce the default maximum number of unanswered streamer batches for persistent caches. There is no reason to send more than 4-8-16 unanswered batches because they get stuck on disk writes, WAL writes, page replacements, WAL rolling, GCs and so on.

The problem is that a certain streamer receiver might not wait for backup updates on the loading node and keeps sending update batches again and again. The Individual receiver uses _cache.put()_. Every put creates additional backup requests, but the current streamer batch request has already been responded to, so the next batch of updates is accepted. Nodes start accumulating record-update structures in the heap.

Some JFR screens attached. See `DataStreamProcessorSelfTest.testAtomicPrimarySyncStability()`, `JmhStreamerReceiverBenchmark`.

was (Author: vladsz83):
Datastreamer with the Individual receiver and an ATOMIC/PRIMARY_SYNC persistent cache may consume heap. It has a related 'perNodeParallelOperations()' setting, but that setting doesn't depend on the StreamReceiver. A user can experience heap issues with a trivial case.

The problem is that the streamer doesn't wait for backup updates on the primary node and keeps sending update batches again and again. The Individual receiver uses cache.put(). Every put creates a future for the primary update, plus a future and an update request for the backups. Nodes start accumulating single-update objects in the heap (`processDhtAtomicUpdateRequest()`).
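
To make the scale of the default concrete, here is a rough sketch in plain Java (not Ignite code). It assumes the default per-node limit is the CPU (thread) count multiplied by `IgniteDataStreamer.DFLT_PARALLEL_OPS_MULTIPLIER` (8), as this comment describes. The `cappedParallelOps` helper and its `cap` parameter are hypothetical and only illustrate the idea of a persistence-aware limit.

```java
/** Back-of-the-envelope sketch of the batch limit arithmetic; not Ignite code. */
final class StreamerBatchLimits {
    /** Value of IgniteDataStreamer.DFLT_PARALLEL_OPS_MULTIPLIER quoted in the comment. */
    static final int DFLT_PARALLEL_OPS_MULTIPLIER = 8;

    /** Default limit as described in the comment: CPU (thread) count times the multiplier. */
    static int defaultParallelOps(int cpus) {
        return cpus * DFLT_PARALLEL_OPS_MULTIPLIER;
    }

    /**
     * Hypothetical persistence-aware limit: keep the default for in-memory caches,
     * clamp it to a small cap for persistent ones. The cap value is an assumption.
     */
    static int cappedParallelOps(int cpus, boolean persistent, int cap) {
        int dflt = defaultParallelOps(cpus);
        return persistent ? Math.min(dflt, cap) : dflt;
    }

    public static void main(String[] args) {
        int cpus = 16; // 8 cores, 16 hardware threads

        System.out.println("default limit: " + defaultParallelOps(cpus));                     // 128
        System.out.println("persistent, cap 4: " + cappedParallelOps(cpus, true, 4));         // 4
    }
}
```

Until the defaults change, a value computed this way could be passed explicitly to the existing `perNodeParallelOperations(int)` setter on the streamer.
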
There is no reason to send more than 2-3-4 unanswered batches because they get stuck on disk writes, WAL writes, page replacements, WAL rolling, GCs and so on. Why so many parallel batches by default, especially for persistent caches? IgniteDataStreamer.DFLT_PARALLEL_OPS_MULTIPLIER=8 looks odd to me. With 8 CPUs and 16 threads I get 128 parallel batches.

Proposal: reduce the default maximum number of parallel batches for a node. Make this value depend on persistence.

Some JFR screens attached. See `JmhStreamerReceiverBenchmark.bchIndividual_512_1()`, `DataStreamProcessorSelfTest.testAtomicPrimarySyncStability()`.

> Datastreamer may consume heap with default settings.
> ----------------------------------------------------
>
> Key: IGNITE-17735
> URL: https://issues.apache.org/jira/browse/IGNITE-17735
> Project: Ignite
> Issue Type: Sub-task
> Reporter: Vladimir Steshin
> Assignee: Vladimir Steshin
> Priority: Major
> Labels: ise
> Attachments: DS_heap_no_events_no_wal.png,
>     DS_heap_no_events_no_wal_2.png, HeapConsumptionDataStreamerTest.src
>
> Time Spent: 40m
> Remaining Estimate: 0h
>

--
This message was sent by Atlassian Jira
(v8.20.10#820010)