[jira] [Commented] (KAFKA-6249) Interactive query downtime when node goes down even with standby replicas

Matthias J. Sax (JIRA) Wed, 22 Nov 2017 14:03:27 -0800

    [ 
https://issues.apache.org/jira/browse/KAFKA-6249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16263424#comment-16263424
 ]


Matthias J. Sax commented on KAFKA-6249:
----------------------------------------

At the moment, it is not possible to query StandbyTasks -- for this reason, 
their metadata is not exposed. KAFKA-6144 is exactly about making StandbyTasks 
queryable (the JIRA title is a little obscure, but exposing the metadata 
implies making StandbyTasks queryable -- otherwise, exposing the metadata 
would not make any sense).

So your observation/experiment makes total sense. Can we close this JIRA as a 
duplicate of KAFKA-6144 then?

I am wondering why there is such a long period during which you cannot query 
-- in a failure scenario, StandbyTasks should become active tasks and take 
over -- thus, downtime for IQ should be very short. (Note: this does not apply 
to scale-out scenarios -- newly added instances are not queryable for some 
time, as they start with "nothing" and thus need to rebuild state, which can 
take some time. This is a known issue and will be addressed in the future, 
though.)
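
For the short failover window described above, a caller can retry the query 
until the store becomes available again. The following is only a minimal 
sketch, assuming a simple retry-with-fixed-backoff approach (this is not 
something prescribed in this thread). `QueryRetry`, `queryWithRetry`, and 
`StoreUnavailableException` are hypothetical names; the last one stands in for 
Kafka Streams' InvalidStateStoreException so that the sketch compiles without 
Kafka on the classpath.

```java
import java.util.function.Supplier;

final class QueryRetry {

    /**
     * Hypothetical stand-in for
     * org.apache.kafka.streams.errors.InvalidStateStoreException.
     */
    static class StoreUnavailableException extends RuntimeException {
        StoreUnavailableException(String message) {
            super(message);
        }
    }

    /**
     * Runs the query, retrying while the store reports itself unavailable.
     * Gives up after maxAttempts tries and rethrows the last failure.
     */
    static <T> T queryWithRetry(Supplier<T> query, int maxAttempts, long backoffMs) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        StoreUnavailableException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return query.get();
            } catch (StoreUnavailableException e) {
                last = e;
                if (attempt == maxAttempts) {
                    break; // no point sleeping after the final attempt
                }
                try {
                    Thread.sleep(backoffMs);
                } catch (InterruptedException ie) {
                    // Preserve the interrupt flag and stop retrying.
                    Thread.currentThread().interrupt();
                    break;
                }
            }
        }
        throw last;
    }
}
```

In a real application, the supplier would presumably wrap the store lookup 
and the read itself (for example `streams.store(...)` followed by a `get`), 
and the catch clause would name InvalidStateStoreException instead of the 
stand-in. The attempt count and backoff would need tuning against how long a 
rebalance takes in the deployment at hand.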

> Interactive query downtime when node goes down even with standby replicas
> -------------------------------------------------------------------------
>
>                 Key: KAFKA-6249
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6249
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>    Affects Versions: 1.0.0
>            Reporter: Charles Crain
>
> In a multi-node Kafka Streams application that uses interactive queries, the 
> queryable store will become unavailable (throw InvalidStateStoreException) 
> for up to several minutes when a node goes down.  This happens regardless of 
> how many nodes are in the application as well as how many standby replicas 
> are configured.
> My expectation is that if a standby replica is present, that the interactive 
> query would fail over to the live replica immediately causing negligible 
> downtime for interactive queries.  Instead, what appears to happen is that 
> the queryable store is down for however long it takes for the nodes to 
> completely rebalance (this takes a few minutes for a couple GB of total data 
> in the queryable store's backing topic).
> I am filing this as a bug, realizing that it may in fact be a feature 
> request.  However, until there is a way we can use interactive queries with 
> minimal (~zero) downtime on node failure, we are having to entertain other 
> strategies for serving queries (e.g. manually materializing the topic to an 
> external resilient store such as Cassandra) in order to meet our SLAs.
> If there is a way to minimize the downtime of interactive queries on node 
> failure that I am missing, I would like to know what it is.
> Our team is super-enthusiastic about Kafka Streams and we're keen to use it 
> for just about everything!  This is our only major roadblock.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
