[
https://issues.apache.org/jira/browse/IGNITE-17735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17607891#comment-17607891
]
Vladimir Steshin edited comment on IGNITE-17735 at 10/18/22 4:33 PM:
---------------------------------------------------------------------
Datastreamer with the Individual receiver and an ATOMIC/PRIMARY_SYNC persistent
cache may consume heap. The streamer has the related 'perNodeParallelOperations()'
setting, but the issue doesn't depend on the StreamReceiver. A user can hit heap
issues in a trivial case.
The problem is that the streamer doesn't wait for backup updates on the primary
node and keeps sending update batches again and again. The Individual receiver
uses cache.put(). Every put creates a future for the primary update, plus a
future and an update request for the backups. Nodes start accumulating objects
related to single updates in the heap (`processDhtAtomicUpdateRequest()`).
There is no reason to send more than 2-4 unresponded batches because they get
stuck at disk writes, WAL writes, page replacements, WAL rolling, GCs and so
on. Why so many parallel batches by default, especially for persistent caches?
IgniteDataStreamer.DFLT_PARALLEL_OPS_MULTIPLIER=8 looks weird to me. With 8 CPUs
and 16 streamer threads I get 128 parallel batches.
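The arithmetic above can be checked with a small sketch; the multiplier value 8 is the `DFLT_PARALLEL_OPS_MULTIPLIER` cited here, and the assumption (from this comment) is that the per-node default is the streamer pool thread count times that multiplier:

```java
// Sketch of how the default per-node parallel operations count is derived,
// assuming parallelOps = streamerPoolThreads * DFLT_PARALLEL_OPS_MULTIPLIER.
public class ParallelOpsDefault {
    // Value cited in this comment for IgniteDataStreamer.DFLT_PARALLEL_OPS_MULTIPLIER.
    static final int DFLT_PARALLEL_OPS_MULTIPLIER = 8;

    static int defaultParallelOps(int streamerPoolThreads) {
        return streamerPoolThreads * DFLT_PARALLEL_OPS_MULTIPLIER;
    }

    public static void main(String[] args) {
        // 16 streamer threads (e.g. 8 CPUs with a 2x pool) -> 128 parallel batches.
        System.out.println(defaultParallelOps(16));
    }
}
```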
Proposal: reduce the default max parallel batches per node. Make this value
depend on persistence.
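Until the default changes, the per-node limit can be lowered explicitly through the public `IgniteDataStreamer.perNodeParallelOperations()` setter. A minimal sketch (the cache name and the limit of 4 are illustrative assumptions, and the cache is assumed to exist; requires a running Ignite node, so this is a configuration sketch rather than a standalone program):

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteDataStreamer;
import org.apache.ignite.Ignition;

public class StreamerTuning {
    public static void main(String[] args) {
        Ignite ignite = Ignition.start();

        // "myCache" and the limit of 4 are illustrative; pick values for your setup.
        try (IgniteDataStreamer<Integer, String> streamer = ignite.dataStreamer("myCache")) {
            // Cap unresponded batches per node instead of the CPU-derived default.
            streamer.perNodeParallelOperations(4);

            for (int i = 0; i < 1_000; i++)
                streamer.addData(i, Integer.toString(i));
        }
    }
}
```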
Some JFR screens attached.
See `JmhStreamerReceiverBenchmark.bchIndividual_512_1()`,
`DataStreamProcessorSelfTest.testAtomicPrimarySyncStability()`.
was (Author: vladsz83):
Datastreamer with the Individual receiver and an ATOMIC/PRIMARY_SYNC persistent
cache may consume heap. The test case is simple: 2 or 3 servers, 2 or 1 backups,
and a Datastreamer from a client loading a significant amount of data. Around 1G
of heap. Tested with 6 (16) CPUs, 6-16 streamer threads.
See `JmhStreamerReceiverBenchmark.bchIndividual_512_1()`,
`DataStreamProcessorSelfTest.testAtomicPrimarySyncStability()`.
The problem is that the streamer doesn't wait for backup updates on the primary
node and keeps sending update batches again and again. The Individual receiver
uses cache.put(). Every put creates a future for the primary update, plus a
future and an update request for the backups. Nodes start accumulating objects
related to single updates in the heap (`processDhtAtomicUpdateRequest()`).
There is no reason to send more than 2-4 unresponded batches because they get
stuck at disk writes, WAL writes, page replacements, WAL rolling, GCs and so
on. Why so many parallel batches by default, especially for persistent caches?
IgniteDataStreamer.DFLT_PARALLEL_OPS_MULTIPLIER=8 looks weird to me. With 8 CPUs
and 16 streamer threads I get 128 parallel batches.
Solution: reduce the default max parallel batches per node. Make this value
depend on persistence.
Some JFR screens attached.
> Datastreamer may consume heap with default settings.
> ----------------------------------------------------
>
> Key: IGNITE-17735
> URL: https://issues.apache.org/jira/browse/IGNITE-17735
> Project: Ignite
> Issue Type: Sub-task
> Reporter: Vladimir Steshin
> Assignee: Vladimir Steshin
> Priority: Major
> Labels: ise
> Attachments: DS_heap_no_events_no_wal.png,
> DS_heap_no_events_no_wal_2.png
>
> Time Spent: 40m
> Remaining Estimate: 0h
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)