[ 
https://issues.apache.org/jira/browse/STORM-2426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16071062#comment-16071062
 ] 

Stig Rohde Døssing commented on STORM-2426:
-------------------------------------------

Yes, I believe so. This looks a lot like how the subscribe API behaves when 
there are multiple KafkaConsumers in a thread, as described here: 
https://issues.apache.org/jira/browse/STORM-2514?focusedCommentId=16014195&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16014195
I'm not really sure why the rebalance takes 3.5 minutes, since I would expect 
it to take 2.5 minutes (the session timeout). In the linked demo code, the long 
rebalance happens because rebalancing can't finish until all the KafkaConsumers 
call (and block in) poll, which can't happen if there are multiple consumers in 
a thread. The rebalance then times out at the session timeout.
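
To illustrate the blocking behaviour described above, here is a minimal toy 
model in plain Java. It does not use the Kafka client at all: a CyclicBarrier 
stands in for the group rebalance, and the class name RebalanceSketch, the 
poll helper and the scaled-down timeouts are all assumptions made up for the 
sketch. It only shows why two members polled in turn on one thread cannot both 
be blocked in poll at once, while two members on separate threads can.

```java
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class RebalanceSketch {

    // Hypothetical stand-in for KafkaConsumer.poll during a rebalance: the
    // "rebalance" completes only when every group member is blocked here at
    // the same time. This is a toy model, not the real Kafka group protocol.
    static boolean poll(CyclicBarrier rebalance, long sessionTimeoutMs) {
        try {
            rebalance.await(sessionTimeoutMs, TimeUnit.MILLISECONDS);
            return true;  // all members joined before the timeout
        } catch (TimeoutException | BrokenBarrierException e) {
            return false; // the rebalance gave up (modelled session timeout)
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    // Two consumers polled one after the other on the same thread: the first
    // poll waits for a second member that cannot arrive until it returns.
    static boolean sameThread(long sessionTimeoutMs) {
        CyclicBarrier rebalance = new CyclicBarrier(2);
        boolean first = poll(rebalance, sessionTimeoutMs);  // waits out the timeout
        boolean second = poll(rebalance, sessionTimeoutMs); // barrier is already broken
        return first && second;
    }

    // One consumer per thread: both can be blocked in poll at the same time.
    static boolean separateThreads(long sessionTimeoutMs) {
        CyclicBarrier rebalance = new CyclicBarrier(2);
        boolean[] results = new boolean[2];
        Thread other = new Thread(() -> results[1] = poll(rebalance, sessionTimeoutMs));
        other.start();
        results[0] = poll(rebalance, sessionTimeoutMs);
        try {
            other.join(); // join makes results[1] visible to this thread
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return results[0] && results[1];
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        boolean sameThreadOk = sameThread(500); // timeout scaled down from minutes
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("same thread: completed=" + sameThreadOk
                + " after about " + elapsedMs + " ms");
        System.out.println("separate threads: completed=" + separateThreads(2000));
    }
}
```

In this model the single-threaded case always waits out the full timeout 
before failing, which mirrors the long rebalance in the linked demo code. It 
says nothing about where the extra minute in the 3.5 minute figure comes from.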

Was the number of executors lower than the number of tasks when you had this 
problem [~EitZei]?

STORM-2542 gets rid of long rebalances, so this should be fixed in any case, 
even if the cause turns out not to be a task count that differs from the 
executor count.

> First tuples fail after worker is respawn
> -----------------------------------------
>
>                 Key: STORM-2426
>                 URL: https://issues.apache.org/jira/browse/STORM-2426
>             Project: Apache Storm
>          Issue Type: Bug
>          Components: storm-kafka-client
>    Affects Versions: 1.0.2
>            Reporter: Antti Järvinen
>         Attachments: 2017-03-20-Kafka-spout-issue.txt, 
> 2017-03-21-Timeout-ticks.txt
>
>
> Topology with two Kafka spouts (org.apache.storm.kafka.spout.KafkaSpout) 
> reading from two different topics with same consumer group ID. 
> 1. Kill the only worker process for topology
> 2. Storm creates new worker
> 3. Kafka starts rebalancing (log line 15-16)
> 4. Kafka rebalancing done (log line 18-19)
> 5. Kafka topics read and tuples emitted (log line 28-29)
> 6. Tuples immediately fail (log line 30-33)
> The delay between tuples emitted and tuples failing is just some 10 ms. No 
> bolts in topology received the tuples.
> What could cause this? The assumption is that there are uncommitted messages 
> in Spout when it is killed and those are retried.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
