Re: [jira] [Commented] (KAFKA-4113) Allow KTable bootstrap

Greg Fodor Fri, 21 Oct 2016 12:36:35 -0700

I managed to track down one case where we were seeing missing data when
transitioning to a new node: it turned out to be caused by a retention
policy on the topic. There is an additional case, but I have not been able
to reproduce it so far. We recently fixed a problem where our jobs failed to
shut down gracefully in certain cases, so there's a chance that might be
related. Anyhow, now that I have a better understanding of things, I will be
able to investigate if we see missing keys again in the future. Thanks!


On Oct 20, 2016 2:08 PM, "Greg Fodor (JIRA)" <[email protected]> wrote:


    [ https://issues.apache.org/jira/browse/KAFKA-4113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15593018#comment-15593018 ]

Greg Fodor commented on KAFKA-4113:
-----------------------------------

Oh, so it should be doing exactly what makes sense to me -- I am on 0.10.0.
Let me verify that there isn't something else going on! Thanks for the info.

> Allow KTable bootstrap
> ----------------------
>
>                 Key: KAFKA-4113
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4113
>             Project: Kafka
>          Issue Type: Sub-task
>          Components: streams
>            Reporter: Matthias J. Sax
>            Assignee: Guozhang Wang
>
> On the mailing list, there are multiple requests about the possibility to
> "fully populate" a KTable before actual stream processing starts.
> Even if it is somewhat difficult to define when the initial populating
> phase should end, there are multiple possibilities:
> The main idea is that there is a rarely updated topic that contains the
> data. Only after this topic has been read completely and the KTable is
> ready should the application start processing. This would mean that on
> startup, the current partition sizes must be fetched and stored, and after
> the KTable has been populated up to those offsets, stream processing can
> start.
> Other discussed ideas are:
> 1) an initial fixed time period for populating
> (it might be hard for a user to estimate the correct value)
> 2) an "idle" period, i.e., if no update to a KTable is made for a certain
> time, we consider it as populated
> 3) a timestamp cut-off point, i.e., all records with an older timestamp
> belong to the initial populating phase
> The API change is not decided yet, and the API design is part of this
> JIRA.
> One suggestion (for option (4)) was:
> {noformat}
> // populate the table without reading any other topics until we see one record with timestamp 1000
> KTable table = builder.table("topic", 1000);
> {noformat}
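
For what it's worth, the "main idea" in the description (snapshot the
partition end offsets on startup, then wait until the KTable has been
populated up to them) can be sketched roughly as below. This is only an
illustration of the check, not anything from the Streams code base: the
class BootstrapCheck, the method isBootstrapped, and the plain maps keyed by
partition number are all hypothetical, and it assumes that a "position" is
the offset of the next record to read, so a partition is caught up once its
position reaches the captured end offset.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: none of these names exist in Kafka Streams.
final class BootstrapCheck {

    // Returns true once every partition has been read at least up to the
    // end offset that was captured at startup. Partitions with no recorded
    // position are treated as being at offset 0.
    static boolean isBootstrapped(Map<Integer, Long> endOffsetsAtStartup,
                                  Map<Integer, Long> currentPositions) {
        for (Map.Entry<Integer, Long> entry : endOffsetsAtStartup.entrySet()) {
            long position = currentPositions.getOrDefault(entry.getKey(), 0L);
            if (position < entry.getValue()) {
                return false; // this partition still has bootstrap data to read
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Assumed snapshot taken once at startup: partition -> end offset.
        Map<Integer, Long> endOffsets = new HashMap<>();
        endOffsets.put(0, 120L);
        endOffsets.put(1, 0L); // empty partition, nothing to wait for

        // Positions of the table-populating reader, updated as it progresses.
        Map<Integer, Long> positions = new HashMap<>();
        positions.put(0, 80L);
        System.out.println("bootstrapped: " + isBootstrapped(endOffsets, positions));

        positions.put(0, 120L);
        System.out.println("bootstrapped: " + isBootstrapped(endOffsets, positions));
    }
}
```

Records appended to the topic after the snapshot do not delay startup in this
sketch, since only the offsets captured at startup are compared.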



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
