[
https://issues.apache.org/jira/browse/IGNITE-17735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17607891#comment-17607891
]
Vladimir Steshin edited comment on IGNITE-17735 at 9/21/22 8:10 PM:
--------------------------------------------------------------------
DataStreamer with the Individual receiver and an ATOMIC/PRIMARY_SYNC persistent
cache may consume excessive heap, around 1G in my runs. The test case is simple:
2 or 3 servers, 1 or 2 backups, and a DataStreamer on a client loading a
significant amount of data. Tested with 6 (16) CPUs and 6-16 streamer threads.
See `JmhStreamerReceiverBenchmark.bchIndividual_512_1()`,
`DataStreamProcessorSelfTest.testAtomicPrimarySyncStability()`.
The problem is that the streamer doesn't wait for backup updates on the primary
node and keeps sending update batches again and again. The Individual receiver
uses cache.put(). Every put creates a future for the primary update, plus a
future and an update request for the backups. Nodes start accumulating objects
related to a single update in the heap (`processDhtAtomicUpdateRequest()`).
There is no reason to send more than 2-4 unacknowledged batches, because they
get stuck on disk writes, WAL writes, page replacements, WAL rolling, GCs and
so on. Why so many parallel batches by default, especially for persistent
caches? IgniteDataStreamer.DFLT_PARALLEL_OPS_MULTIPLIER=8 looks weird to me:
with 8 CPUs and 16 streamer threads I get 128 parallel batches.
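For illustration, here is how that in-flight batch budget works out. This is a
minimal sketch, not Ignite code: the multiplier value and thread count are the
figures from this comment, and the helper method is hypothetical (it just
assumes the budget is the multiplier times the streamer thread count, which
matches the 128 figure above).

```java
// Hypothetical sketch of the default parallel-batch budget described above.
public class ParallelBatchBudget {
    // Default multiplier mentioned in this comment (IgniteDataStreamer.DFLT_PARALLEL_OPS_MULTIPLIER).
    static final int DFLT_PARALLEL_OPS_MULTIPLIER = 8;

    // Assumed relation: max in-flight batches = multiplier * streamer threads.
    static int maxParallelBatches(int streamerThreads) {
        return DFLT_PARALLEL_OPS_MULTIPLIER * streamerThreads;
    }

    public static void main(String[] args) {
        // 16 streamer threads -> 128 parallel batches per node.
        System.out.println(maxParallelBatches(16));
    }
}
```

With persistence enabled, most of those 128 batches just pile up behind disk
and WAL I/O, which is where the heap growth comes from.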
Solution: reduce the default max parallel batches per node, and make this value
depend on whether persistence is enabled.
Some JFR screens attached.
> Datastreamer may consume whole heap.
> ------------------------------------
>
> Key: IGNITE-17735
> URL: https://issues.apache.org/jira/browse/IGNITE-17735
> Project: Ignite
> Issue Type: Sub-task
> Reporter: Vladimir Steshin
> Assignee: Vladimir Steshin
> Priority: Major
> Attachments: DS_heap_no_events_no_wal.png,
> DS_heap_no_events_no_wal_2.png
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)