Great to know! Thanks, Greg. Please keep us posted on any new findings you have.

Guozhang

On Fri, Oct 21, 2016 at 12:35 PM, Greg Fodor <[email protected]> wrote:

> I managed to track down one case where we were seeing issues with missing
> data when transitioning to a new node: it was due to there being a
> retention policy on the topic. There is an additional case, but I have not
> been able to repro it at this time. We recently fixed a problem where we
> were failing to gracefully shut down our jobs in certain cases, so there's
> a chance that might be related. Anyhow, now that I have a better
> understanding of things, I will be able to investigate if we experience
> missing keys in the future, thanks!
>
> On Oct 20, 2016 2:08 PM, "Greg Fodor (JIRA)" <[email protected]> wrote:
>
> > [ https://issues.apache.org/jira/browse/KAFKA-4113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15593018#comment-15593018 ]
> >
> > Greg Fodor commented on KAFKA-4113:
> > -----------------------------------
> >
> > Oh, so it should be doing exactly what makes sense to me -- I am on
> > 0.10.0. Let me verify that there isn't something else going on! Thanks
> > for the info.
> >
> > > Allow KTable bootstrap
> > > ----------------------
> > >
> > >     Key: KAFKA-4113
> > >     URL: https://issues.apache.org/jira/browse/KAFKA-4113
> > >     Project: Kafka
> > >     Issue Type: Sub-task
> > >     Components: streams
> > >     Reporter: Matthias J. Sax
> > >     Assignee: Guozhang Wang
> > >
> > > On the mailing list, there are multiple requests about the possibility
> > > to "fully populate" a KTable before actual stream processing starts.
> > > Even if it is somewhat difficult to define when the initial populating
> > > phase should end, there are multiple possibilities:
> > >
> > > The main idea is that there is a rarely updated topic that contains the
> > > data. Only after this topic has been read completely and the KTable is
> > > ready should the application start processing. This would mean that on
> > > startup, the current partition sizes must be fetched and stored, and
> > > after the KTable has been populated up to those offsets, stream
> > > processing can start.
> > >
> > > Other discussed ideas are:
> > > 1) an initial fixed time period for populating
> > >    (it might be hard for a user to estimate the correct value)
> > > 2) an "idle" period, i.e., if no update to a KTable is done for a
> > >    certain time, we consider it as populated
> > > 3) a timestamp cut-off point, i.e., all records with an older timestamp
> > >    belong to the initial populating phase
> > >
> > > The API change is not decided yet, and the API design is part of this
> > > JIRA.
> > >
> > > One suggestion (for option (4)) was:
> > > {noformat}
> > > KTable table = builder.table("topic", 1000); // populate the table without reading any other topics until we see one record with timestamp 1000.
> > > {noformat}
> >
> > --
> > This message was sent by Atlassian JIRA
> > (v6.3.4#6332)
>

--
-- Guozhang
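
For anyone following the thread, the "main idea" in the quoted JIRA description (snapshot the partition end offsets at startup, then hold back stream processing until the table has caught up to them) could be sketched roughly as below. This is only an illustration under stated assumptions: `BootstrapTracker`, `recordApplied` and `isBootstrapped` are hypothetical names, not part of the Kafka Streams API, and the end offsets are passed in as a plain map rather than fetched from a broker.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical helper (not a Kafka API): decides whether a table has been
 * populated up to the end offsets that were captured at startup.
 */
class BootstrapTracker {
    // partition -> end offset captured at startup (offset of the next record to be written)
    private final Map<Integer, Long> targetEndOffsets;
    // partition -> offset of the next record still to be applied to the table
    private final Map<Integer, Long> positions = new HashMap<>();

    BootstrapTracker(Map<Integer, Long> endOffsetsAtStartup) {
        this.targetEndOffsets = new HashMap<>(endOffsetsAtStartup);
        for (Integer partition : targetEndOffsets.keySet()) {
            positions.put(partition, 0L);
        }
    }

    /** Call after the record at the given offset has been applied to the table. */
    void recordApplied(int partition, long offset) {
        positions.merge(partition, offset + 1, Long::max);
    }

    /** True once every partition has reached its startup end offset. */
    boolean isBootstrapped() {
        for (Map.Entry<Integer, Long> entry : targetEndOffsets.entrySet()) {
            long position = positions.getOrDefault(entry.getKey(), 0L);
            if (position < entry.getValue()) {
                return false;
            }
        }
        return true;
    }
}
```

An application following this approach would feed only the table's topic into the tracker and start consuming the other topics once `isBootstrapped()` returns true. Records written after the snapshot do not delay startup, which is the point of fixing the target offsets up front.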
