[ 
https://issues.apache.org/jira/browse/IGNITE-28673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksandr Chesnokov updated IGNITE-28673:
-----------------------------------------
    Description: 
Remove the mutable continuous routine start discovery path and keep a single 
immutable-style flow for starting routines. This should simplify 
*{{GridContinuousProcessor}}* by removing duplicated start messages, mutable 
ack handling, and special branching between mutable and immutable discovery 
modes.

During validation, 
*{{CacheContinuousQueryConcurrentPartitionUpdateTest}}#concurrentUpdatesAndQueryStart*
exposed a performance issue. After the discovery ack was replaced with 
*{{ContinuousRoutineStartResultMessage}}*, CQ start completion began 
using the same *{{TOPIC_CONTINUOUS}}* communication path as regular CQ 
notifications.

Under heavy update load, CQ notifications may create a large communication 
backlog and delay *{{ContinuousRoutineStartResultMessage}}*. JFR and heap 
dump analysis showed millions of retained *{{GridContinuousMessage}}* and 
*{{CacheContinuousQueryEntry}}* objects and only one waiting 
*{{ContinuousRoutineStartResultMessage}}*.

As a result, _{{cache.query(qry)}}_ may hang for a long time while waiting for 
*{{ContinuousRoutineStartResultMessage}}*, which now arrives through the 
communication queue instead of discovery.
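
The backlog effect described above can be illustrated with a toy model. The sketch below is not Ignite code: the class, the message kinds, and the queue types are hypothetical stand-ins. It only counts how many messages a single consumer handles before it reaches the one start-result message, first when that message shares a FIFO queue with notifications, then when control messages are served first. The second queue is a contrast case only, not a fix proposed in this ticket.

```java
import java.util.ArrayDeque;
import java.util.PriorityQueue;
import java.util.Queue;

/** Toy model (assumption, not Ignite code): one control message behind a notification backlog. */
final class StartResultBacklogSketch {
    /** Hypothetical message kinds standing in for CQ notifications and the start result. */
    enum Kind { NOTIFICATION, START_RESULT }

    /** Hypothetical message: its kind plus its arrival order. */
    static final class Msg {
        final Kind kind;
        final long seq;

        Msg(Kind kind, long seq) {
            this.kind = kind;
            this.seq = seq;
        }
    }

    /** Drains the queue and returns how many messages were handled up to and including START_RESULT. */
    static int handledUntilStartResult(Queue<Msg> queue) {
        int handled = 0;
        Msg m;
        while ((m = queue.poll()) != null) {
            handled++;
            if (m.kind == Kind.START_RESULT)
                return handled;
        }
        return -1; // No start result was queued.
    }

    /** Enqueues a backlog of notifications followed by a single start result. */
    static Queue<Msg> fill(Queue<Msg> queue, int backlog) {
        for (int i = 0; i < backlog; i++)
            queue.add(new Msg(Kind.NOTIFICATION, i));
        queue.add(new Msg(Kind.START_RESULT, backlog));
        return queue;
    }

    /** Shared path: every message waits its turn in arrival order. */
    static Queue<Msg> sharedFifo() {
        return new ArrayDeque<>();
    }

    /** Contrast case: control messages are served before notifications, then by arrival order. */
    static Queue<Msg> controlFirst() {
        return new PriorityQueue<>((a, b) -> {
            int byKind = Integer.compare(rank(a), rank(b));
            return byKind != 0 ? byKind : Long.compare(a.seq, b.seq);
        });
    }

    private static int rank(Msg m) {
        return m.kind == Kind.START_RESULT ? 0 : 1;
    }

    public static void main(String[] args) {
        int backlog = 1_000_000;
        // prints "shared FIFO: 1000001"
        System.out.println("shared FIFO: " + handledUntilStartResult(fill(sharedFifo(), backlog)));
        // prints "control first: 1"
        System.out.println("control first: " + handledUntilStartResult(fill(controlFirst(), backlog)));
    }
}
```

In the shared FIFO case the wait grows linearly with the backlog, which matches the heap dump picture of millions of queued notifications and a single pending start result.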

UPD: The same problem occurs with ZooKeeper and the immutable path. To reproduce:
 * Download and apply the patch from the attachments
 * Run zkTest#testConcurrentUpdatesAndQueryStartAtomicCacheGroup
 * Observe the OOM

  was:
Remove the mutable continuous routine start discovery path and keep a single 
immutable-style flow for starting routines. This should simplify 
*{{GridContinuousProcessor}}* by removing duplicated start messages, mutable 
ack handling, and special branching between mutable and immutable discovery 
modes.

During validation, 
*{{CacheContinuousQueryConcurrentPartitionUpdateTest}}#concurrentUpdatesAndQueryStart*
exposed a performance issue. After the discovery ack was replaced with 
*{{ContinuousRoutineStartResultMessage}}*, CQ start completion began 
using the same *{{TOPIC_CONTINUOUS}}* communication path as regular CQ 
notifications.

Under heavy update load, CQ notifications may create a large communication 
backlog and delay *{{ContinuousRoutineStartResultMessage}}*. JFR and heap 
dump analysis showed millions of retained *{{GridContinuousMessage}}* and 
*{{CacheContinuousQueryEntry}}* objects and only one waiting 
*{{ContinuousRoutineStartResultMessage}}*.

As a result, _{{cache.query(qry)}}_ may hang for a long time while waiting for 
*{{ContinuousRoutineStartResultMessage}}* from the communication queue instead 
of discovery.


> Remove mutable path from continuous routine start discovery
> -----------------------------------------------------------
>
>                 Key: IGNITE-28673
>                 URL: https://issues.apache.org/jira/browse/IGNITE-28673
>             Project: Ignite
>          Issue Type: Task
>            Reporter: Aleksandr Chesnokov
>            Assignee: Aleksandr Chesnokov
>            Priority: Major
>              Labels: IEP-132, ise
>         Attachments: patch.patch
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> Remove the mutable continuous routine start discovery path and keep a single 
> immutable-style flow for starting routines. This should simplify 
> *{{GridContinuousProcessor}}* by removing duplicated start messages, mutable 
> ack handling, and special branching between mutable and immutable discovery 
> modes.
>
> During validation, 
> *{{CacheContinuousQueryConcurrentPartitionUpdateTest}}#concurrentUpdatesAndQueryStart*
> exposed a performance issue. After the discovery ack was replaced with 
> *{{ContinuousRoutineStartResultMessage}}*, CQ start completion began 
> using the same *{{TOPIC_CONTINUOUS}}* communication path as regular CQ 
> notifications.
>
> Under heavy update load, CQ notifications may create a large communication 
> backlog and delay *{{ContinuousRoutineStartResultMessage}}*. JFR and heap 
> dump analysis showed millions of retained *{{GridContinuousMessage}}* and 
> *{{CacheContinuousQueryEntry}}* objects and only one waiting 
> *{{ContinuousRoutineStartResultMessage}}*.
>
> As a result, _{{cache.query(qry)}}_ may hang for a long time while waiting 
> for *{{ContinuousRoutineStartResultMessage}}*, which now arrives through the 
> communication queue instead of discovery.
>
> UPD: The same problem occurs with ZooKeeper and the immutable path. To reproduce:
>  * Download and apply the patch from the attachments
>  * Run zkTest#testConcurrentUpdatesAndQueryStartAtomicCacheGroup
>  * Observe the OOM



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
