[jira] [Commented] (KAFKA-9846) Race condition can lead to severe lag underestimate for active tasks

Vinoth Chandar (Jira) Tue, 23 Jun 2020 13:24:03 -0700


    [ 
https://issues.apache.org/jira/browse/KAFKA-9846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17143275#comment-17143275
 ]


Vinoth Chandar commented on KAFKA-9846:
---------------------------------------

Sounds good. Close this as "Wont fix" then?

> Race condition can lead to severe lag underestimate for active tasks
> --------------------------------------------------------------------
>
>                 Key: KAFKA-9846
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9846
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>    Affects Versions: 2.5.0
>            Reporter: Sophie Blee-Goldman
>            Assignee: Vinoth Chandar
>            Priority: Critical
>             Fix For: 2.6.0
>
>
> In KIP-535 we added the ability to query still-restoring and standby tasks. 
> To give users control over how out of date the data they fetch can be, we 
> added an API to KafkaStreams that fetches the end offsets for all changelog 
> partitions and computes the lag for each local state store.
> During this lag computation, we check whether an active task is in RESTORING 
> and calculate the actual lag if so. If not, we assume it's in RUNNING and 
> return a lag of zero. However, tasks may be in other states besides running 
> and restoring; notably they first pass through the CREATED state before 
> getting to RESTORING. A CREATED task may happen to be caught-up to the end 
> offset, but in many cases it is likely to be lagging or even completely 
> uninitialized.
> This introduces a race condition where users may be led to believe that a 
> task has zero lag and is "safe" to query even with the strictest correctness 
> guarantees, while the task is actually lagging by some unknown amount.  
> During transfer of ownership of the task between different threads on the 
> same machine, tasks can actually spend a while in CREATED while the new owner 
> waits to acquire the task directory lock. So, this race condition may not be 
> particularly rare in multi-threaded Streams applications



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Commented] (KAFKA-9846) Race condition can lead to severe lag underestimate for active tasks

Reply via email to