[ https://issues.apache.org/jira/browse/KAFKA-7209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16565772#comment-16565772 ]

Matthias J. Sax commented on KAFKA-7209:
----------------------------------------

About `offsets.topic.replication.factor`: if you set it to one and the
corresponding broker goes down, you can no longer commit offsets, so you
might want to set it to 3. Also note that the in-sync replicas config
(`min.insync.replicas`) is important: the broker default is used, and you can
only write to the topic if enough in-sync replicas are online. Thus, you should
not set it to 3, but to at most 2, so that the topic survives a single broker failure.
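To make the arithmetic above concrete, here is a small sketch. Kafka Streams uses the application.id as the consumer group id, and all offset commits of a group go to a single partition of the internal `__consumer_offsets` topic (50 partitions by default, `offsets.topic.num.partitions`). The `offsetsPartitionFor` method is an assumption that, to our understanding, mirrors the broker's mapping; `tolerableBrokerFailures` shows why replication factor 3 with `min.insync.replicas` 2 survives one broker failure while 3/3 does not. The group id "my-streams-app" is hypothetical.

```java
// Sketch only: illustrates offsets-topic placement and acks=all fault tolerance.
final class OffsetsTopicFaultTolerance {

    /**
     * Maps a group id (for Kafka Streams, the application.id) to a partition of
     * __consumer_offsets. Assumption: mirrors abs(groupId.hashCode()) % numPartitions,
     * with Integer.MIN_VALUE mapped to 0.
     */
    static int offsetsPartitionFor(String groupId, int offsetsTopicNumPartitions) {
        int h = groupId.hashCode();
        int abs = (h == Integer.MIN_VALUE) ? 0 : Math.abs(h);
        return abs % offsetsTopicNumPartitions;
    }

    /**
     * Number of replicas (brokers) that may be unavailable while acks=all writes
     * still succeed: at least minInsyncReplicas of replicationFactor replicas must
     * be in sync. A negative result means such writes fail even with all brokers up.
     */
    static int tolerableBrokerFailures(int replicationFactor, int minInsyncReplicas) {
        if (replicationFactor < 1 || minInsyncReplicas < 1) {
            throw new IllegalArgumentException("values must be >= 1");
        }
        return replicationFactor - minInsyncReplicas;
    }

    public static void main(String[] args) {
        // Every commit of this group lands in one partition; with replication
        // factor 1 that partition lives on exactly one broker.
        System.out.println("partition = " + offsetsPartitionFor("my-streams-app", 50));
        System.out.println("RF=1, minISR=1 -> " + tolerableBrokerFailures(1, 1)); // 0
        System.out.println("RF=3, minISR=3 -> " + tolerableBrokerFailures(3, 3)); // 0
        System.out.println("RF=3, minISR=2 -> " + tolerableBrokerFailures(3, 2)); // 1
    }
}
```

With replication factor 1, losing the broker that hosts the group's offsets partition is enough to block commits, which matches the failure described above.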

For the `transaction.state.log.XXX` configs: as long as you don't use exactly-once
processing, you can ignore those settings.
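A minimal sketch of that rule, assuming the Streams config is held in a plain `java.util.Properties`: exactly-once is enabled through `processing.guarantee`, whose default is `at_least_once`, so the broker's `transaction.state.log.*` settings only matter when the value is an exactly-once variant. The class and method names are hypothetical.

```java
import java.util.Properties;

// Sketch: do the broker's transaction.state.log.* settings matter for this app?
final class TransactionLogRelevance {

    static boolean usesExactlyOnce(Properties streamsConfig) {
        // Default processing guarantee is at-least-once, which does not use transactions.
        String guarantee = streamsConfig.getProperty("processing.guarantee", "at_least_once");
        // 0.11 only knows "exactly_once"; later releases add variants
        // (e.g. "exactly_once_v2") that also rely on transactions.
        return guarantee.startsWith("exactly_once");
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("application.id", "my-streams-app"); // hypothetical id
        System.out.println(usesExactlyOnce(props)); // false
        props.setProperty("processing.guarantee", "exactly_once");
        System.out.println(usesExactlyOnce(props)); // true
    }
}
```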

For the failure scenarios: can you provide DEBUG logs for the brokers and the
Streams application so we can dig into it? For the first scenario, the state
directories should be created after the rebalance, but we will need Streams
DEBUG logs to verify this. For scenario (2), there should not be any data loss;
we might need both Streams and broker logs to dig into it.

For a clean restart with the same application.id, you should check out the 
application reset tool: 
[https://kafka.apache.org/20/documentation/streams/developer-guide/app-reset-tool.html]
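As a rough sketch of how the reset tool is invoked, the snippet below only assembles the command line. Assumptions: the script path is relative to a Kafka installation, only `--application-id`, `--bootstrap-servers` and `--input-topics` are used, and the topic and host names are hypothetical; flags differ between releases, so check the tool's `--help` for your version. All Streams instances must be stopped before running it, and local state still has to be wiped separately (for example with `KafkaStreams#cleanUp()` before the next `start()`).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch: build the argument list for kafka-streams-application-reset.sh.
final class ResetCommand {

    static List<String> build(String applicationId, String bootstrapServers, List<String> inputTopics) {
        List<String> cmd = new ArrayList<>();
        cmd.add("bin/kafka-streams-application-reset.sh");
        cmd.add("--application-id");
        cmd.add(applicationId);
        cmd.add("--bootstrap-servers");
        cmd.add(bootstrapServers);
        if (!inputTopics.isEmpty()) {
            // Input topics whose committed offsets should be reset to the beginning.
            cmd.add("--input-topics");
            cmd.add(String.join(",", inputTopics));
        }
        return cmd;
    }

    public static void main(String[] args) {
        List<String> cmd = build("my-streams-app", "broker1:9092", Arrays.asList("input-a", "input-b"));
        System.out.println(String.join(" ", cmd));
        // To run it (all Streams instances stopped first):
        // new ProcessBuilder(cmd).inheritIO().start().waitFor();
    }
}
```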

Btw: you report this error for 0.11.0.1, and 0.11.0.3 was released recently. I
would highly recommend upgrading to 0.11.0.3 and checking whether the issue is
still there: it contains many bug fixes, and the issue might already be resolved.

> Kafka stream does not rebalance when one node gets down
> -------------------------------------------------------
>
>                 Key: KAFKA-7209
>                 URL: https://issues.apache.org/jira/browse/KAFKA-7209
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>    Affects Versions: 0.11.0.1
>            Reporter: Yogesh BG
>            Priority: Critical
>
> I use Kafka Streams 0.11.0.1 with session timeout 60000, retries set to
> Integer.MAX_VALUE, and the default backoff time.
>
> I have 3 nodes running a Kafka cluster of 3 brokers,
> and I am running 3 Kafka Streams instances with the same application.id.
> Each node has one broker and one Kafka Streams application.
> Everything works fine during setup.
> I bring down one node, so one Kafka broker and one streaming app are down.
> Now I see exceptions in the other two streaming apps, and they never get
> rebalanced. I waited for hours and they never come back to normal.
> Is there anything I am missing?
> When one broker is down, I also tried calling stream.close(), cleaning up,
> and restarting, but this doesn't help either.
> Can anyone help me?
>
> One thing I observed lately is that topics with one partition get
> reassigned, but I have topics with 16 partitions and replication factor 3,
> and they never settle.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
