[ https://issues.apache.org/jira/browse/KAFKA-6249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16263424#comment-16263424 ]
Matthias J. Sax commented on KAFKA-6249:
----------------------------------------

At the moment, it is not possible to query StandbyTasks -- for this reason, the metadata is not exposed. KAFKA-6144 is exactly about making StandbyTasks queryable (the JIRA title is a little obscure, but exposing the metadata implies making StandbyTasks queryable -- otherwise, exposing the metadata would not make any sense). So your observation/experiment makes total sense. Can we close this JIRA as a duplicate of KAFKA-6144 then?

I am still wondering why there is such a long period during which you cannot query -- in a failure scenario, StandbyTasks should become active tasks and take over -- thus, downtime for IQ should be very short. (Note: this does not apply to scale-out scenarios -- newly added instances are not queryable for some time because they start with "nothing" and need to rebuild state, which can take some time. This is a known issue and will be addressed in the future.)

> Interactive query downtime when node goes down even with standby replicas
> -------------------------------------------------------------------------
>
>                 Key: KAFKA-6249
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6249
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>    Affects Versions: 1.0.0
>            Reporter: Charles Crain
>
> In a multi-node Kafka Streams application that uses interactive queries, the
> queryable store will become unavailable (throw InvalidStateStoreException)
> for up to several minutes when a node goes down. This happens regardless of
> how many nodes are in the application as well as how many standby replicas
> are configured.
> My expectation is that if a standby replica is present, the interactive
> query would fail over to the live replica immediately, causing negligible
> downtime for interactive queries.
> Instead, what appears to happen is that
> the queryable store is down for however long it takes for the nodes to
> completely rebalance (this takes a few minutes for a couple GB of total data
> in the queryable store's backing topic).
> I am filing this as a bug, realizing that it may in fact be a feature
> request. However, until there is a way we can use interactive queries with
> minimal (~zero) downtime on node failure, we are having to entertain other
> strategies for serving queries (e.g. manually materializing the topic to an
> external resilient store such as Cassandra) in order to meet our SLAs.
> If there is a way to minimize the downtime of interactive queries on node
> failure that I am missing, I would like to know what it is.
> Our team is super-enthusiastic about Kafka Streams and we're keen to use it
> for just about everything! This is our only major roadblock.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
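
For readers hitting the same InvalidStateStoreException while a rebalance is in progress, one client-side mitigation is to retry the query with a short backoff until the store is available again. This does not remove the downtime described in the issue; it only hides brief outages from callers. The following is a minimal sketch of that idea and is not something proposed in this thread. StoreUnavailableException is a hypothetical stand-in for Kafka Streams' InvalidStateStoreException so that the example runs without the Kafka dependency, and queryWithRetry, maxAttempts, and backoffMs are illustrative names.

```java
import java.util.function.Supplier;

public class QueryRetry {

    /** Hypothetical stand-in for Kafka Streams' InvalidStateStoreException. */
    static class StoreUnavailableException extends RuntimeException {
        StoreUnavailableException(String message) {
            super(message);
        }
    }

    /**
     * Runs the query, retrying with a fixed backoff while the store is
     * unavailable. Rethrows the last failure once maxAttempts is exhausted
     * or if the waiting thread is interrupted.
     */
    static <T> T queryWithRetry(Supplier<T> query, int maxAttempts, long backoffMs) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        StoreUnavailableException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return query.get();
            } catch (StoreUnavailableException e) {
                last = e;
                if (attempt < maxAttempts) {
                    try {
                        Thread.sleep(backoffMs);
                    } catch (InterruptedException ie) {
                        // Preserve the interrupt flag and stop retrying.
                        Thread.currentThread().interrupt();
                        break;
                    }
                }
            }
        }
        throw last;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated store lookup: fails twice (as if rebalancing), then succeeds.
        String value = queryWithRetry(() -> {
            if (++calls[0] < 3) {
                throw new StoreUnavailableException("store is rebalancing");
            }
            return "value-for-key";
        }, 5, 10L);
        System.out.println(value + " after " + calls[0] + " attempts");
    }
}
```

In a real application the supplier would wrap the actual store lookup and the catch clause would target InvalidStateStoreException. The attempt limit and backoff would need tuning against how long a rebalance takes in the deployment at hand; for the multi-minute outages described above, a bounded retry alone would likely not meet a strict SLA.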