[
https://issues.apache.org/jira/browse/KAFKA-6249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16263424#comment-16263424
]
Matthias J. Sax commented on KAFKA-6249:
----------------------------------------
At the moment, it is not possible to query StandbyTasks -- for this reason, the
metadata is not exposed. KAFKA-6144 is exactly about making StandbyTasks
queryable (the JIRA title is a little bit obscure, but exposing the metadata
implies making StandbyTasks queryable -- otherwise, exposing the metadata would
not make any sense).
So your observation/experiment makes total sense. Can we close this JIRA as a
duplicate of KAFKA-6144 then?
I am still wondering why you see such a long period during which you cannot
query -- in a failure scenario, StandbyTasks should become active tasks and
take over -- thus, downtime for IQ should be very short. (Note: this does not
apply to scale-out scenarios -- newly added instances are not queryable for
some time, as they start with "nothing" and thus need to rebuild state, which
can take some time. This is a known issue and will be addressed in the future,
though.)
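In the meantime, a common client-side mitigation is to retry the store lookup
with a short backoff while the rebalance is in progress. The following is only
a rough sketch under stated assumptions, not an official recommendation: the
names StoreRetry, waitForStore and StoreNotReadyException are hypothetical and
use only the JDK so the example runs standalone. In a real application, the
lookup would presumably wrap KafkaStreams#store(...) and the exception being
retried would be InvalidStateStoreException.

```java
import java.util.function.Supplier;

public class StoreRetry {

    /** Hypothetical stand-in for Kafka Streams' InvalidStateStoreException. */
    static class StoreNotReadyException extends RuntimeException {
        StoreNotReadyException(String message) {
            super(message);
        }
    }

    /**
     * Calls the lookup until it succeeds or maxAttempts is exhausted,
     * sleeping backoffMs between failed attempts.
     */
    static <T> T waitForStore(Supplier<T> lookup, int maxAttempts, long backoffMs) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        StoreNotReadyException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return lookup.get();
            } catch (StoreNotReadyException e) {
                last = e; // store is migrating or restoring; try again
                if (attempt < maxAttempts) {
                    try {
                        Thread.sleep(backoffMs);
                    } catch (InterruptedException ie) {
                        Thread.currentThread().interrupt();
                        throw new IllegalStateException("interrupted while waiting for store", ie);
                    }
                }
            }
        }
        throw last; // every attempt failed; surface the last error
    }

    public static void main(String[] args) {
        // Simulated lookup: fails twice (as during a rebalance), then succeeds.
        int[] calls = {0};
        String store = waitForStore(() -> {
            if (++calls[0] < 3) {
                throw new StoreNotReadyException("store is rebalancing");
            }
            return "store-handle";
        }, 5, 10L);
        System.out.println(store + " after " + calls[0] + " attempts");
        // prints "store-handle after 3 attempts"
    }
}
```

This only hides short unavailability windows from callers; it does not shorten
the rebalance itself, so the attempt count and backoff would need to be tuned
against the observed failover time.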
> Interactive query downtime when node goes down even with standby replicas
> -------------------------------------------------------------------------
>
> Key: KAFKA-6249
> URL: https://issues.apache.org/jira/browse/KAFKA-6249
> Project: Kafka
> Issue Type: Bug
> Components: streams
> Affects Versions: 1.0.0
> Reporter: Charles Crain
>
> In a multi-node Kafka Streams application that uses interactive queries, the
> queryable store will become unavailable (throw InvalidStateStoreException)
> for up to several minutes when a node goes down. This happens regardless of
> how many nodes are in the application as well as how many standby replicas
> are configured.
> My expectation is that if a standby replica is present, the interactive
> query would fail over to the live replica immediately, causing negligible
> downtime for interactive queries. Instead, what appears to happen is that
> the queryable store is down for however long it takes for the nodes to
> completely rebalance (this takes a few minutes for a couple GB of total data
> in the queryable store's backing topic).
> I am filing this as a bug, realizing that it may in fact be a feature
> request. However, until there is a way we can use interactive queries with
> minimal (~zero) downtime on node failure, we are having to entertain other
> strategies for serving queries (e.g. manually materializing the topic to an
> external resilient store such as Cassandra) in order to meet our SLAs.
> If there is a way to minimize the downtime of interactive queries on node
> failure that I am missing, I would like to know what it is.
> Our team is super-enthusiastic about Kafka Streams and we're keen to use it
> for just about everything! This is our only major roadblock.