[
https://issues.apache.org/jira/browse/STORM-277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13961169#comment-13961169
]
Michael Noll edited comment on STORM-277 at 4/5/14 5:10 PM:
------------------------------------------------------------
Also, make sure you are not running into the Kafka 0.8 "issue" where, when you
don't provide a partition key in your Kafka message, the Kafka producer will
"stick" to a given partition for a certain amount of time instead of the more
intuitive behavior of picking a new partition at random for each new message.
So, depending on your setup, your Kafka producer may be sending its messages to
only one of N partitions, in which case the Kafka spout cannot benefit from
parallelism > 1 either.
See the related [discussion on wurstmeister's kafka-spout
repo|https://github.com/wurstmeister/storm-kafka-0.8-plus/commit/2f45866c8e011ac4804c940ff9e1d7c147591761#commitcomment-5861615]
as well as the Kafka FAQ entry [Why is data not evenly distributed among
partitions when a partitioning key is not
specified?|https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-Whyisdatanotevenlydistributedamongpartitionswhenapartitioningkeyisnotspecified?]
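
To make the effect concrete, here is a small toy model of the two partition
selection modes. It is only a sketch and does not use the Kafka client at all:
the class and method names are made up for illustration, the hash function is
not Kafka's, and the 600000 ms interval is an assumption based on the roughly
10 minute default that the FAQ entry describes (I believe the producer setting
involved is topic.metadata.refresh.interval.ms, but please check that against
your Kafka version).

```java
import java.util.Arrays;
import java.util.Random;

/** Toy model of producer-side partition selection; NOT the real Kafka client code. */
class PartitionSelectionSketch {

    /** Keyless send: pick a random partition and reuse it until the refresh interval elapses. */
    static int[] keylessCounts(int numPartitions, int numMessages,
                               long refreshIntervalMs, long msBetweenSends, long seed) {
        Random random = new Random(seed);
        int[] counts = new int[numPartitions];
        int current = random.nextInt(numPartitions);
        long lastRefreshMs = 0L;
        for (int i = 0; i < numMessages; i++) {
            long nowMs = i * msBetweenSends; // simulated clock, no real sleeping
            if (nowMs - lastRefreshMs >= refreshIntervalMs) {
                current = random.nextInt(numPartitions); // only now may the partition change
                lastRefreshMs = nowMs;
            }
            counts[current]++;
        }
        return counts;
    }

    /** Keyed send: a hash of the message key chooses the partition for every message. */
    static int[] keyedCounts(int numPartitions, int numMessages) {
        int[] counts = new int[numPartitions];
        for (int i = 0; i < numMessages; i++) {
            String key = "event-" + i; // hypothetical per-message key
            int partition = (key.hashCode() & 0x7fffffff) % numPartitions;
            counts[partition]++;
        }
        return counts;
    }

    /** Number of partitions that received at least one message. */
    static long nonEmpty(int[] counts) {
        return Arrays.stream(counts).filter(c -> c > 0).count();
    }

    public static void main(String[] args) {
        // 1000 messages sent 10 ms apart span 10 s, far below the assumed 10 min interval.
        int[] keyless = keylessCounts(4, 1000, 600_000L, 10L, 42L);
        int[] keyed = keyedCounts(4, 1000);
        System.out.println("keyless: " + Arrays.toString(keyless)
                + " -> partitions with data: " + nonEmpty(keyless));
        System.out.println("keyed:   " + Arrays.toString(keyed)
                + " -> partitions with data: " + nonEmpty(keyed));
    }
}
```

In this model, as long as all sends fall within one refresh interval, every
keyless message lands in the same partition, so only the spout task that owns
that partition has anything to read. Giving each message a key spreads the
data over the partitions, which is what lets spout parallelism > 1 pay off.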
> Kafka-Spout does not support parallelism(unpredictable behaviour)
> -----------------------------------------------------------------
>
> Key: STORM-277
> URL: https://issues.apache.org/jira/browse/STORM-277
> Project: Apache Storm (Incubating)
> Issue Type: Bug
> Reporter: Amol Fasale
> Labels: kafka-0.8
>
> I'm using storm-kafka-0.8-plus to integrate Storm and Kafka. The Kafka spout
> works fine when its parallelism is set to 1. However, when I set the
> parallelism to > 1 to handle an increased event load, the Kafka spout does not
> appear to distribute the load across the workers. (Sometimes, e.g. with higher
> parallelism, the Kafka spout distributes the load across 3/4 workers, but the
> distribution of tuples is unpredictable.)
--
This message was sent by Atlassian JIRA
(v6.2#6252)