[jira] [Comment Edited] (STORM-277) Kafka-Spout does not support parallelism(unpredictable behaviour)

Michael Noll (JIRA) Sat, 05 Apr 2014 10:13:18 -0700

    [ 
https://issues.apache.org/jira/browse/STORM-277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13961169#comment-13961169
 ]


Michael Noll edited comment on STORM-277 at 4/5/14 5:10 PM:
------------------------------------------------------------

Also, make sure you are not running into the Kafka 0.8 "issue" where, when you 
don't provide a partition key in your Kafka message, the Kafka producer will 
"stick" to a given partition for a certain amount of time instead of the more 
intuitive behavior of picking a new partition at random for each new message.  
So depending on your setup your Kafka producer may have decided to send its 
messages to only one of N partitions, and thus the Kafka spout may also not be 
able to benefit from parallelism > 1.

See the related [discussion on wurstmeister's kafka-spout 
repo|https://github.com/wurstmeister/storm-kafka-0.8-plus/commit/2f45866c8e011ac4804c940ff9e1d7c147591761#commitcomment-5861615]
as well as the Kafka FAQ entry [Why is data not evenly distributed among 
partitions when a partitioning key is not 
specified?|https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-Whyisdatanotevenlydistributedamongpartitionswhenapartitioningkeyisnotspecified?]
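
To illustrate the effect described above, here is a minimal sketch in plain Java. It is a simplified model and not the Kafka producer's actual code: the class and method names are hypothetical, the "sticking" period is modelled as a message count (stickMessages) rather than a time interval, and the keyed path simply takes the key's hash modulo the partition count as an assumed stand-in for a hash-based partitioner.

```java
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

// Simplified model of partition selection; NOT the Kafka producer's real code.
// All names here are hypothetical and exist only for illustration.
class PartitionSketch {

    // Unkeyed messages: pick a random partition, then keep reusing it until
    // `stickMessages` messages have been sent (an assumed stand-in for the
    // "certain amount of time" the producer sticks to one partition).
    static Set<Integer> unkeyedPartitions(int numPartitions, int messages,
                                          int stickMessages, long seed) {
        Random random = new Random(seed);
        Set<Integer> used = new HashSet<>();
        int current = -1;
        for (int i = 0; i < messages; i++) {
            if (i % stickMessages == 0) {
                current = random.nextInt(numPartitions); // re-pick only occasionally
            }
            used.add(current);
        }
        return used;
    }

    // Keyed messages: derive the partition from the key's hash, so different
    // keys can land on different partitions.
    static int partitionForKey(String key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    // Send `messages` messages with distinct keys and record the partitions hit.
    static Set<Integer> keyedPartitions(int numPartitions, int messages) {
        Set<Integer> used = new HashSet<>();
        for (int i = 0; i < messages; i++) {
            used.add(partitionForKey("event-" + i, numPartitions));
        }
        return used;
    }

    public static void main(String[] args) {
        int numPartitions = 4;
        int messages = 1000;
        // The sticking period is longer than the run, so only one partition is used.
        System.out.println("unkeyed: "
                + unkeyedPartitions(numPartitions, messages, 10_000, 42L));
        // Distinct keys spread the messages over the partitions.
        System.out.println("keyed:   " + keyedPartitions(numPartitions, messages));
    }
}
```

In this model, a spout reading a topic that was filled by the unkeyed producer would find data in only one partition during the sticking period, so extra spout tasks would have nothing to read. Supplying a partition key with each message is one way to spread the data.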



> Kafka-Spout does not support parallelism(unpredictable behaviour)
> -----------------------------------------------------------------
>
>                 Key: STORM-277
>                 URL: https://issues.apache.org/jira/browse/STORM-277
>             Project: Apache Storm (Incubating)
>          Issue Type: Bug
>            Reporter: Amol Fasale
>              Labels: kafka-0.8
>
> I'm using storm-kafka-0.8-plus to integrate Storm and Kafka. The spout works 
> fine when its parallelism is set to 1. But when I set the kafka-spout 
> parallelism > 1 to support an increased event load, the kafka-spout does not 
> seem to distribute the load across the workers. (Sometimes, e.g. with higher 
> parallelism, the kafka-spout distributes the load across 3/4 workers, but the 
> distribution of tuples is unpredictable.)


