[
https://issues.apache.org/jira/browse/STORM-2426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16074058#comment-16074058
]
Stig Rohde Døssing commented on STORM-2426:
-------------------------------------------
[~EitZei] Okay, it isn't exactly that issue then, but it's very similar. When
the worker is killed, the KafkaConsumer doesn't get a chance to disconnect
cleanly from Kafka, so Kafka waits for the full session timeout before it
declares the missing consumer dead and finishes rebalancing. I tried killing
workers with settings similar to yours, and the rebalance took a few minutes.
This should explain the long rebalance. STORM-2542 will definitely solve this:
with that change Kafka is no longer involved in assigning partitions, so
rebalances are local to each spout instance instead of being something the
spouts need to coordinate through Kafka.
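For reference, the wait described above is governed by the consumer group session
timeout. The values below are illustrative defaults, not the reporter's actual
settings:

```properties
# Illustrative Kafka consumer settings (not taken from this topology's config).
# When a worker is killed without closing its KafkaConsumer, the group
# coordinator waits up to session.timeout.ms before evicting the dead member
# and completing the rebalance, so partition assignment stalls for that long.
session.timeout.ms=10000
heartbeat.interval.ms=3000
```

Lowering session.timeout.ms shortens the stall after an unclean worker death, at
the cost of evicting slow-but-alive consumers more aggressively.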
> First tuples fail after worker is respawn
> -----------------------------------------
>
> Key: STORM-2426
> URL: https://issues.apache.org/jira/browse/STORM-2426
> Project: Apache Storm
> Issue Type: Bug
> Components: storm-kafka-client
> Affects Versions: 1.0.2
> Reporter: Antti Järvinen
> Attachments: 2017-03-20-Kafka-spout-issue.txt,
> 2017-03-21-Timeout-ticks.txt
>
>
> Topology with two Kafka spouts (org.apache.storm.kafka.spout.KafkaSpout)
> reading from two different topics with the same consumer group ID.
> 1. Kill the only worker process for the topology
> 2. Storm creates new worker
> 3. Kafka starts rebalancing (log line 15-16)
> 4. Kafka rebalancing done (log line 18-19)
> 5. Kafka topics read and tuples emitted (log line 28-29)
> 6. Tuples immediately fail (log line 30-33)
> The delay between tuples emitted and tuples failing is just some 10 ms. No
> bolts in topology received the tuples.
> What could cause this? The assumption is that there are uncommitted messages
> in the spout when it is killed, and those are retried.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)