[
https://issues.apache.org/jira/browse/STORM-2426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16074058#comment-16074058
]
Stig Rohde Døssing commented on STORM-2426:
-------------------------------------------
[~EitZei] Okay, it isn't exactly that issue then, but it's very similar. When
the worker is killed, the KafkaConsumer doesn't get a chance to disconnect
cleanly from Kafka, so Kafka waits for the full session timeout before it
declares the missing consumer dead and finishes rebalancing. I tried killing
workers with settings similar to yours, and the rebalance took a few minutes.
This should explain the long rebalance. STORM-2542 will definitely solve this:
with that change Kafka is no longer involved in assigning partitions, so
rebalances are local to each spout instance instead of being something the
spouts need to coordinate through Kafka.
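For reference, the wait described above is governed by the consumer group session
timeout. The values below are illustrative defaults, not the reporter's actual
settings:

```properties
# Illustrative Kafka consumer settings (not taken from this topology's config).
# When a worker is killed without closing its KafkaConsumer, the group
# coordinator waits up to session.timeout.ms before evicting the dead member
# and completing the rebalance, so partition assignment stalls for that long.
session.timeout.ms=10000
heartbeat.interval.ms=3000
```

Lowering session.timeout.ms shortens the stall after an unclean worker death, at
the cost of evicting slow-but-alive consumers more aggressively.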
> First tuples fail after worker is respawn
> -----------------------------------------
>
> Key: STORM-2426
> URL: https://issues.apache.org/jira/browse/STORM-2426
> Project: Apache Storm
> Issue Type: Bug
> Components: storm-kafka-client
> Affects Versions: 1.0.2
> Reporter: Antti Järvinen
> Attachments: 2017-03-20-Kafka-spout-issue.txt,
> 2017-03-21-Timeout-ticks.txt
>
>
> Topology with two Kafka spouts (org.apache.storm.kafka.spout.KafkaSpout)
> reading from two different topics with the same consumer group ID.
> 1. Kill the only worker process for the topology
> 2. Storm creates new worker
> 3. Kafka starts rebalancing (log line 15-16)
> 4. Kafka rebalancing done (log line 18-19)
> 5. Kafka topics read and tuples emitted (log line 28-29)
> 6. Tuples immediately fail (log line 30-33)
> The delay between tuples emitted and tuples failing is just some 10 ms. No
> bolts in topology received the tuples.
> What could cause this? The assumption is that there are uncommitted messages
> in the spout when it is killed, and those are retried.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)