[
https://issues.apache.org/jira/browse/IGNITE-17735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17607891#comment-17607891
]
Vladimir Steshin edited comment on IGNITE-17735 at 9/21/22 8:10 PM:
--------------------------------------------------------------------
DataStreamer with the Individual receiver and an ATOMIC/PRIMARY_SYNC persistent
cache may consume excessive heap, around 1G in my runs. The test case is simple:
2 or 3 servers, 1 or 2 backups, and a DataStreamer on a client loading a
significant amount of data. Tested with 6 (16) CPUs and 6-16 streamer threads.
See `JmhStreamerReceiverBenchmark.bchIndividual_512_1()`,
`DataStreamProcessorSelfTest.testAtomicPrimarySyncStability()`.
The problem is that the streamer doesn't wait for backup updates on the primary
node and keeps sending update batches again and again. The Individual receiver
uses cache.put(). Every put creates a future for the primary update, plus a
future and an update request for the backups. Nodes start accumulating objects
related to a single update in the heap (`processDhtAtomicUpdateRequest()`).
There is no reason to send more than 2-4 unacknowledged batches, because they
get stuck on disk writes, WAL writes, page replacements, WAL rolling, GCs and
so on. Why so many parallel batches by default, especially for persistent
caches? IgniteDataStreamer.DFLT_PARALLEL_OPS_MULTIPLIER=8 looks weird to me:
with 8 CPUs and 16 streamer threads I get 128 parallel batches.
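For illustration, here is how that in-flight batch budget works out. This is a
minimal sketch, not Ignite code: the multiplier value and thread count are the
figures from this comment, and the helper method is hypothetical (it just
assumes the budget is the multiplier times the streamer thread count, which
matches the 128 figure above).

```java
// Hypothetical sketch of the default parallel-batch budget described above.
public class ParallelBatchBudget {
    // Default multiplier mentioned in this comment (IgniteDataStreamer.DFLT_PARALLEL_OPS_MULTIPLIER).
    static final int DFLT_PARALLEL_OPS_MULTIPLIER = 8;

    // Assumed relation: max in-flight batches = multiplier * streamer threads.
    static int maxParallelBatches(int streamerThreads) {
        return DFLT_PARALLEL_OPS_MULTIPLIER * streamerThreads;
    }

    public static void main(String[] args) {
        // 16 streamer threads -> 128 parallel batches per node.
        System.out.println(maxParallelBatches(16));
    }
}
```

With persistence enabled, most of those 128 batches just pile up behind disk
and WAL I/O, which is where the heap growth comes from.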
Solution: reduce the default max parallel batches per node, and make this value
depend on whether persistence is enabled.
Some JFR screens attached.
> Datastreamer may consume whole heap.
> ------------------------------------
>
> Key: IGNITE-17735
> URL: https://issues.apache.org/jira/browse/IGNITE-17735
> Project: Ignite
> Issue Type: Sub-task
> Reporter: Vladimir Steshin
> Assignee: Vladimir Steshin
> Priority: Major
> Attachments: DS_heap_no_events_no_wal.png,
> DS_heap_no_events_no_wal_2.png
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)