[
https://issues.apache.org/jira/browse/KAFKA-6542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16964779#comment-16964779
]
Chad Preisler edited comment on KAFKA-6542 at 11/1/19 11:50 AM:
----------------------------------------------------------------
>KTables are not really bootstrapped – compare KAFKA-4113 for a detailed
>discussion.
I've read KAFKA-4113, but we've had missed joins on startup because the
KTable was still building. That issue referenced this one as addressing the
"gaps".
> Correct, but this describes only the failure case scenario.
> a new application will have an empty KTable state when it's started.
It seems reasonable to expect that if a record exists on the right side of the
join condition at application startup, the join should succeed regardless
of the state of the "current" KTable. If we can't rely on that assumption, then
we can't rely on KStream-to-KTable joins in general.
When running applications in a containerized environment (k8s), I would say it
is more normal to have to rebuild a KTable from scratch than it is to have an
existing KTable on startup.
We've had joins fail on startup in the following scenario:
- Exactly-once semantics are turned on.
- The pod crashes.
- The existing KTable state is gone.
- The application restarts.
- The last failed record is sent.
- The join fails because the KTable is still building and the right side of the
join does not exist yet. In this situation the right-side record existed on a
topic before the left-side record was created on the topic.
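The scenario above can be sketched with a small in-memory model. This is only an illustration of the ordering problem, not the Kafka Streams API: the class StartupJoinModel, its methods, and the keys and values are all hypothetical, and the table is just a map that starts empty after the restart.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Simplified in-memory model of a stream-table inner join.
// Hypothetical names throughout; this is NOT the Kafka Streams API.
final class StartupJoinModel {
    // Local table state; empty again once the pod has lost its state.
    private final Map<String, String> table = new HashMap<>();

    // Apply one record read from the table topic to the local state.
    void applyTableRecord(String key, String value) {
        table.put(key, value);
    }

    // Inner join of one stream record against whatever the table holds right now.
    Optional<String> joinStreamRecord(String key, String value) {
        String right = table.get(key);
        return right == null ? Optional.empty() : Optional.of(value + "+" + right);
    }

    // Replays the scenario: the left-side record is processed before the
    // right-side record has been restored into the table.
    static Optional<String> replayAfterCrash() {
        StartupJoinModel restarted = new StartupJoinModel();                 // state is gone
        Optional<String> result = restarted.joinStreamRecord("k1", "order"); // last failed record
        restarted.applyTableRecord("k1", "customer");                        // restored too late
        return result;
    }
}
```

In this model replayAfterCrash() returns an empty result even though the right-side record was written to its topic first, which is the missed join described above.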
I'll assume there are scenarios where this behavior is okay, but
there should at least be an option to "bootstrap" the KTable on startup. I know
GlobalKTable does bootstrap, but KTable is more appealing for large topics
because we can run more instances and reduce startup time.
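As a rough sketch of what such a "bootstrap" option could mean, the model below reads the table topic up to the end offset observed at startup before it processes any stream record. Everything here is an assumption for illustration: BootstrappedJoinModel, its methods, and the list-based "topics" are hypothetical and do not correspond to an existing Kafka Streams setting or API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of "bootstrap the table first, then join".
// Topics are plain lists of {key, value} pairs; NOT the Kafka Streams API.
final class BootstrappedJoinModel {
    // Rebuild table state from every record present in the table topic at startup.
    static Map<String, String> bootstrap(List<String[]> tableTopic) {
        int endOffset = tableTopic.size(); // end offset captured once, at startup
        Map<String, String> table = new HashMap<>();
        for (int offset = 0; offset < endOffset; offset++) {
            String[] record = tableTopic.get(offset);
            table.put(record[0], record[1]); // later records overwrite earlier ones per key
        }
        return table;
    }

    // Process stream records only after the bootstrap has completed.
    static List<String> joinAfterBootstrap(List<String[]> tableTopic, List<String[]> streamTopic) {
        Map<String, String> table = bootstrap(tableTopic);
        List<String> joined = new ArrayList<>();
        for (String[] record : streamTopic) {
            String right = table.get(record[0]);
            if (right != null) {
                joined.add(record[1] + "+" + right); // inner join: drop unmatched records
            }
        }
        return joined;
    }
}
```

With the table loaded first, a left-side record whose right-side record already existed at startup always finds its match. The cost is that stream processing waits for the bootstrap to finish.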
> Tables should trigger joins too, not just streams
> -------------------------------------------------
>
> Key: KAFKA-6542
> URL: https://issues.apache.org/jira/browse/KAFKA-6542
> Project: Kafka
> Issue Type: Improvement
> Components: streams
> Affects Versions: 1.1.0
> Reporter: Antony Stubbs
> Priority: Major
>
> At the moment it's quite possible to have a race condition when joining a
> stream with a table, if the stream event arrives first, before the table
> event, in which case the join will fail.
> This is also related to bootstrapping KTables (which is what a GKTable does).
> Related to: KAFKA-4113 Allow KTable bootstrap