That’s an obvious workaround, yes, thank you Tobias.
However, I’m prototyping a substitute for a real batch process, where I’d have 
to create six streams (and possibly more). It could get a bit messy.
On the other hand, under the hood the KafkaInputDStream created by this 
KafkaUtils call invokes ConsumerConnector.createMessageStreams, which returns a 
Map[String, List[KafkaStream]] keyed by topic. That map, however, is not exposed.
So Kafka does provide the capability to create multiple streams keyed by topic, 
but Spark doesn’t surface it, which is unfortunate.
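For illustration, here is a plain-Scala sketch (not the Spark API; the Record type and the sample values are made up) of the per-topic demultiplexing that would be possible if each message carried its topic, as in the map createMessageStreams returns:

```scala
// Hypothetical sketch: if the topic were exposed alongside each message
// (as in Kafka's Map[String, List[KafkaStream]]), splitting a combined
// stream into per-topic collections would be a simple groupBy.
// Plain Scala, no Spark or Kafka dependency.
case class Record(topic: String, value: String)

val records = Seq(
  Record("retarget", "r1"),
  Record("datapair", "d1"),
  Record("retarget", "r2")
)

// Key the values by their topic, mirroring the map Kafka builds internally.
val byTopic: Map[String, Seq[String]] =
  records.groupBy(_.topic).map { case (t, rs) => t -> rs.map(_.value) }

println(byTopic("retarget"))
```

Each per-topic sequence could then get its own filter/map/reduce pipeline, which is exactly what the unexposed map would enable.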

Sergey

From: Tobias Pfeiffer <t...@preferred.jp>
Reply-To: "user@spark.apache.org" <user@spark.apache.org>
Date: Wednesday, July 2, 2014 at 9:54 PM
To: "user@spark.apache.org" <user@spark.apache.org>
Subject: Re: Kafka - streaming from multiple topics

Sergey,

you might actually consider using two streams, like
      val stream1 = KafkaUtils.createStream(ssc, "localhost:2181", "logs", Map("retarget" -> 2))
      val stream2 = KafkaUtils.createStream(ssc, "localhost:2181", "logs", Map("datapair" -> 2))
to achieve what you want. This has the additional advantage that there are 
two separate connections to Kafka, so data may be received on different 
cluster nodes, increasing parallelism at an early stage of processing.
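This two-stream pattern generalizes to any number of topics by mapping over the topic names. A minimal sketch: the `createStream` below is a stand-in stub for KafkaUtils.createStream(ssc, "localhost:2181", "logs", Map(topic -> threads)) so the snippet runs standalone, and the topic list is just the two from this thread:

```scala
// Sketch: build one stream per topic programmatically instead of writing
// each createStream call by hand. `createStream` here is a stub standing
// in for the real KafkaUtils.createStream call, so no cluster is needed.
def createStream(topic: String, threads: Int): String =
  s"stream($topic, threads=$threads)"

val topics = Seq("retarget", "datapair")  // extend with further topics as needed

// Topic -> stream map; each real stream would hold its own Kafka connection.
val streams: Map[String, String] =
  topics.map(t => t -> createStream(t, 2)).toMap
```

With the real API, each value in `streams` would be a separate DStream, ready for its own per-topic processing pipeline.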

Tobias



On Thu, Jul 3, 2014 at 6:47 AM, Sergey Malov <sma...@collective.com> wrote:
Hi,
I would like to set up streaming from a Kafka cluster, reading multiple topics 
and then processing each of them differently.
So, I’d create a stream

      val stream = KafkaUtils.createStream(ssc, "localhost:2181", "logs", Map("retarget" -> 2, "datapair" -> 2))

And then, based on whether a message comes from the “retarget” topic or the 
“datapair” topic, set up a different filter function, map function, reduce 
function, etc. Is that possible? I’d assume it should be, since 
ConsumerConnector can return a map of KafkaStreams keyed on topic, but I can’t 
find where that would be visible to Spark.

Thank you,

Sergey Malov

