[
https://issues.apache.org/jira/browse/KAFKA-4113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16535220#comment-16535220
]
Tommy Becker commented on KAFKA-4113:
-------------------------------------
[~mjsax] has any thought been given to making the strategy for choosing which
topic to process next pluggable? The current timestamp-based behavior is one
such strategy, but for some other use cases I feel that a simple topic-level
prioritization would be sufficient. For example, when the topic backing the
table receives far less traffic than the stream topic, I think it could be
reasonable to always prefer messages from the table topic over the stream
topic. Such a scheme could work for a lot of cases and is considerably easier
to reason about and implement.
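
To make the suggestion concrete, here is a minimal sketch of what a pluggable
chooser with a fixed topic-level priority could look like. The names
TopicChooser and PriorityTopicChooser and the topic names are hypothetical;
nothing here is part of the Kafka Streams API, and the sketch deliberately
avoids any Kafka dependency.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical plug-in point (an assumption, not a Kafka Streams interface):
// given how many records are currently buffered per topic, pick the topic to
// process next.
interface TopicChooser {
    Optional<String> choose(Map<String, Integer> bufferedRecordCounts);
}

// Fixed topic-level priority: the earliest topic in the priority list that
// has buffered records always wins, regardless of record timestamps.
final class PriorityTopicChooser implements TopicChooser {
    private final List<String> topicsByPriority;

    PriorityTopicChooser(List<String> topicsByPriority) {
        this.topicsByPriority = new ArrayList<>(topicsByPriority);
    }

    @Override
    public Optional<String> choose(Map<String, Integer> bufferedRecordCounts) {
        for (String topic : topicsByPriority) {
            if (bufferedRecordCounts.getOrDefault(topic, 0) > 0) {
                return Optional.of(topic);
            }
        }
        // Nothing buffered for any known topic.
        return Optional.empty();
    }
}

final class TopicPriorityDemo {
    public static void main(String[] args) {
        // Hypothetical topic names: the table topic is listed first, so it is
        // preferred whenever it has records waiting.
        TopicChooser chooser =
                new PriorityTopicChooser(List.of("table-topic", "stream-topic"));
        System.out.println(chooser.choose(Map.of("table-topic", 2, "stream-topic", 500)));
        System.out.println(chooser.choose(Map.of("table-topic", 0, "stream-topic", 500)));
    }
}
```

The current timestamp-based selection could, in principle, be expressed as
another implementation of the same interface, which is the sense in which the
strategy would be pluggable.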
> Allow KTable bootstrap
> ----------------------
>
> Key: KAFKA-4113
> URL: https://issues.apache.org/jira/browse/KAFKA-4113
> Project: Kafka
> Issue Type: New Feature
> Components: streams
> Reporter: Matthias J. Sax
> Assignee: Guozhang Wang
> Priority: Major
>
> On the mailing list, there are multiple requests about the possibility to
> "fully populate" a KTable before actual stream processing starts.
> Even if it is somewhat difficult to define when the initial populating phase
> should end, there are multiple possibilities:
> The main idea is that there is a rarely updated topic that contains the
> data. Only after this topic has been read completely and the KTable is ready
> should the application start processing. This implies that, on startup, the
> current end offsets of the partitions must be fetched and stored, and once
> the KTable has been populated up to those offsets, stream processing can
> start.
> Other discussed ideas are:
> 1) an initial fixed time period for populating
> (it might be hard for a user to estimate the correct value)
> 2) an "idle" period, i.e., if a KTable receives no update for a certain
> time, we consider it populated
> 3) a timestamp cut-off point, i.e., all records with an older timestamp
> belong to the initial populating phase
> The API change is not decided yet, and the API design is part of this JIRA.
> One suggestion (for option (4)) was:
> {noformat}
> // Populate the table without reading any other topics until a record
> // with timestamp 1000 is seen.
> KTable table = builder.table("topic", 1000);
> {noformat}
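
The main idea in the description above (capture the end offsets at startup,
then hold back stream processing until the table topic has been read up to
them) can be sketched roughly as follows. BootstrapGate and its methods are
hypothetical names used only for illustration and are not part of Kafka
Streams; the sketch assumes the caller obtains the end offsets and consumer
positions itself, for example via KafkaConsumer#endOffsets and
KafkaConsumer#position.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper (an assumption, not a Kafka Streams class): tracks
// whether the table topic has been read up to the end offsets captured at
// startup.
final class BootstrapGate {
    // partition -> end offset captured at startup, for partitions that still
    // have records left to read
    private final Map<Integer, Long> remainingTargets = new HashMap<>();

    // endOffsetsAtStartup maps each table-topic partition to its end offset,
    // i.e. the offset the next written record would get.
    BootstrapGate(Map<Integer, Long> endOffsetsAtStartup) {
        for (Map.Entry<Integer, Long> e : endOffsetsAtStartup.entrySet()) {
            // Empty partitions (end offset 0) need no bootstrapping.
            if (e.getValue() > 0) {
                remainingTargets.put(e.getKey(), e.getValue());
            }
        }
    }

    // Report the consumer position (next offset to fetch) for a partition
    // after its fetched records have been applied to the table.
    void onPosition(int partition, long position) {
        Long target = remainingTargets.get(partition);
        if (target != null && position >= target) {
            remainingTargets.remove(partition);
        }
    }

    // True once every partition has been read up to its startup end offset;
    // only then would stream processing be allowed to start.
    boolean isPopulated() {
        return remainingTargets.isEmpty();
    }
}
```

Records written to the table topic after startup are not part of the
bootstrap under this scheme; they would be handled by normal processing once
the gate reports the table as populated.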