Take a look at how the messages are actually distributed across the
partitions. If the message keys have a low cardinality, you might get poor
distribution (i.e. all the messages are actually only in two of the five
partitions, leading to what you see in Spark).

If you take a look at the Kafka data directories, you can probably get an
idea of the distribution by just examining the sizes of each partition.

Jeremy

On Wed, Sep 14, 2016 at 12:33 PM, Rachana Srivastava <
rachana.srivast...@markmonitor.com> wrote:

> Hello all,
>
>
>
> I have created a Kafka topic with 5 partitions.  And I am using
> createStream receiver API like following.   But somehow only one receiver
> is getting the input data. Rest of receivers are not processign anything.
> Can you please help?
>
>
>
> JavaPairDStream<String, String> messages = null;
>
>
>
>             if(sparkStreamCount > 0){
>
>                 // We create an input DStream for each partition of the
> topic, unify those streams, and then repartition the unified stream.
>
>                 List<JavaPairDStream<String, String>> kafkaStreams = new
> ArrayList<JavaPairDStream<String, String>>(sparkStreamCount);
>
>                 for (int i = 0; i < sparkStreamCount; i++) {
>
>                                 kafkaStreams.add(
> KafkaUtils.createStream(jssc, contextVal.getString(KAFKA_ZOOKEEPER),
> contextVal.getString(KAFKA_GROUP_ID), kafkaTopicMap));
>
>                 }
>
>                 messages = jssc.union(kafkaStreams.get(0),
> kafkaStreams.subList(1, kafkaStreams.size()));
>
>             }
>
>             else{
>
>                 messages =  KafkaUtils.createStream(jssc,
> contextVal.getString(KAFKA_ZOOKEEPER), contextVal.getString(KAFKA_GROUP_ID),
> kafkaTopicMap);
>
>             }
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>

Reply via email to