[
https://issues.apache.org/jira/browse/KAFKA-10091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17298273#comment-17298273
]
Konstantine Karantasis commented on KAFKA-10091:
------------------------------------------------
[~vvcephei] is this improvement now targeting the 3.0.0 release?
That's what I see when I read the table in the KIP page here:
[https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals]
but I want to confirm before we update the "Fix version" here.
> Improve task idling
> -------------------
>
> Key: KAFKA-10091
> URL: https://issues.apache.org/jira/browse/KAFKA-10091
> Project: Kafka
> Issue Type: Task
> Components: streams
> Reporter: John Roesler
> Assignee: John Roesler
> Priority: Major
> Fix For: 2.8.0
>
>
> When Streams is processing a task with multiple inputs, each time it is ready
> to process a record, it has to choose which input to process next. It always
> takes from the input whose next record has the smallest timestamp. The
> result of this is that Streams processes data in timestamp order. However, if
> the buffer for one of the inputs is empty, Streams doesn't know what
> timestamp the next record for that input will have.
> Streams introduced a configuration "max.task.idle.ms" in KIP-353 to address
> this issue.
> [https://cwiki.apache.org/confluence/display/KAFKA/KIP-353%3A+Improve+Kafka+Streams+Timestamp+Synchronization]
> The config allows Streams to wait some amount of time for data to arrive on
> the empty input, so that it can make a timestamp-ordered decision about which
> input to pull from next.
> However, this config can be hard to use reliably and efficiently, since what
> we're really waiting for is the next poll that _would_ return data from the
> empty input's partition, and this guarantee is a function of the poll
> interval, the max poll interval, and the internal logic that governs when
> Streams will poll again.
> Ideally, you would be able to guarantee, at a minimum, that _any_ amount
> of idling results in polling data from the empty partition if there is
> data to fetch.
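
For readers who want the selection rule from the description in concrete form, here is a minimal sketch in plain Java. It is not the Streams implementation: the class `TaskIdlingSketch`, the method `chooseNext`, and the map-of-queues model of the input buffers are all hypothetical, and the idle time is passed in as a number instead of being measured. It only illustrates "take the input whose head record has the smallest timestamp, but if any input is empty, idle for up to max.task.idle.ms first".

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; none of these names exist in Kafka Streams.
class TaskIdlingSketch {

    /**
     * Picks the input to process next.
     *
     * @param buffers       input name -> timestamps of buffered records, oldest first
     * @param idledMs       how long the task has already waited on an empty input
     * @param maxTaskIdleMs stand-in for the "max.task.idle.ms" setting
     * @return the chosen input name, or null if the task should keep idling
     *         (or if every buffer is empty)
     */
    static String chooseNext(Map<String, Deque<Long>> buffers,
                             long idledMs,
                             long maxTaskIdleMs) {
        boolean anyEmpty = false;
        String best = null;
        long bestTimestamp = Long.MAX_VALUE;

        for (Map.Entry<String, Deque<Long>> entry : buffers.entrySet()) {
            Long head = entry.getValue().peekFirst();
            if (head == null) {
                // We cannot know the timestamp of this input's next record.
                anyEmpty = true;
                continue;
            }
            if (best == null || head < bestTimestamp) {
                best = entry.getKey();
                bestTimestamp = head;
            }
        }

        // With an empty input, a timestamp-ordered choice is only a guess,
        // so wait until the idle budget is used up before proceeding.
        if (anyEmpty && idledMs < maxTaskIdleMs) {
            return null;
        }
        return best;
    }

    public static void main(String[] args) {
        Map<String, Deque<Long>> buffers = new LinkedHashMap<>();
        buffers.put("left", new ArrayDeque<>(List.of(10L, 30L)));
        buffers.put("right", new ArrayDeque<>(List.of(20L)));

        // Both inputs have data: the smaller head timestamp wins.
        System.out.println(chooseNext(buffers, 0, 100));

        // "right" runs dry: the task idles until the budget is exhausted.
        buffers.get("right").clear();
        System.out.println(chooseNext(buffers, 50, 100));
        System.out.println(chooseNext(buffers, 100, 100));
    }
}
```

The sketch also shows the weakness the description points out: the idle budget is compared against elapsed time only, so nothing ties it to whether a poll that could have returned data for the empty input actually happened in the meantime.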
--
This message was sent by Atlassian Jira
(v8.3.4#803005)