[ https://issues.apache.org/jira/browse/KAFKA-9846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
A. Sophie Blee-Goldman resolved KAFKA-9846.
-------------------------------------------
    Resolution: Fixed

Resolving, since this is fixed in 2.6.

> Race condition can lead to severe lag underestimate for active tasks
> --------------------------------------------------------------------
>
>                 Key: KAFKA-9846
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9846
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>    Affects Versions: 2.5.0
>            Reporter: A. Sophie Blee-Goldman
>            Priority: Critical
>             Fix For: 2.6.0
>
>
> In KIP-535 we added the ability to query still-restoring and standby tasks.
> To give users control over how out of date the data they fetch can be, we
> added an API to KafkaStreams that fetches the end offsets for all changelog
> partitions and computes the lag for each local state store.
> During this lag computation, we check whether an active task is in RESTORING
> and, if so, calculate its actual lag. If not, we assume it is in RUNNING and
> return a lag of zero. However, tasks can be in states other than RUNNING and
> RESTORING; notably, they first pass through the CREATED state before
> reaching RESTORING. A CREATED task may happen to be caught up to the end
> offset, but in many cases it is likely to be lagging or even completely
> uninitialized.
> This introduces a race condition in which users may be led to believe that a
> task has zero lag and is "safe" to query even under the strictest correctness
> guarantees, while the task is actually lagging by some unknown amount.
> When ownership of a task is transferred between threads on the same machine,
> the task can spend a significant amount of time in CREATED while the new
> owner waits to acquire the task directory lock. So this race condition may
> not be particularly rare in multi-threaded Streams applications.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
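To make the failure mode in the issue concrete, here is a minimal Python model of the lag check it describes. This is not Kafka Streams code: `TaskState`, `lag_buggy`, and `lag_conservative` are hypothetical names, offsets are plain integers, and a store with no known position is assumed to have restored nothing (position 0, ignoring changelogs whose earliest offset has moved past 0 due to retention). The conservative variant is one possible fix, assuming only a RUNNING task may be reported as caught up; it is not necessarily what the 2.6 patch does.

```python
from enum import Enum
from typing import Optional


class TaskState(Enum):
    # Hypothetical subset of task states; a task passes through CREATED before RESTORING.
    CREATED = "CREATED"
    RESTORING = "RESTORING"
    RUNNING = "RUNNING"


def lag_buggy(state: TaskState, end_offset: int, position: Optional[int]) -> int:
    # Models the flawed assumption: any task that is not RESTORING is treated as RUNNING.
    if state is TaskState.RESTORING:
        restored = position if position is not None else 0
        return end_offset - restored
    return 0  # A CREATED task lands here and is wrongly reported as fully caught up.


def lag_conservative(state: TaskState, end_offset: int, position: Optional[int]) -> int:
    # Only a RUNNING task is assumed to have zero lag.
    if state is TaskState.RUNNING:
        return 0
    # Assumption: an unknown position (e.g. an uninitialized store) means nothing is restored yet.
    restored = position if position is not None else 0
    return max(end_offset - restored, 0)


# A CREATED task with no local state, against a changelog whose end offset is 1000.
print(lag_buggy(TaskState.CREATED, 1000, None))         # 0
print(lag_conservative(TaskState.CREATED, 1000, None))  # 1000
```

The two functions agree for RESTORING and RUNNING tasks and differ only for CREATED, which is exactly the window the issue describes: a task waiting on the directory lock during a thread handoff would be reported as safe to query by the first check but not by the second.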