That’s an obvious workaround, yes, thank you Tobias. However, I’m prototyping a replacement for a real batch process, where I’d have to create six streams (and possibly more), which could get a bit messy. On the other hand, under the hood the KafkaInputDStream that this KafkaUtils call creates invokes ConsumerConnector.createMessageStreams, which returns a Map[String, List[KafkaStream]] keyed by topic. That map, however, is not exposed. So Kafka does provide the capability of creating multiple streams keyed by topic, but Spark doesn’t surface it, which is unfortunate.
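As a plain-Scala illustration of the point above (no Kafka dependency; the types and values here are simulated stand-ins, not the real kafka.consumer classes), the topic-keyed map that createMessageStreams conceptually returns looks like this:

```scala
// Plain-Scala sketch of the shape Kafka's high-level consumer exposes:
// a map from topic name to a list of per-topic message streams.
// KafkaStreamSim is a hypothetical stand-in for kafka.consumer.KafkaStream.
object TopicMapSketch {
  type Message = String
  type KafkaStreamSim = List[Message]

  // Conceptually what ConsumerConnector.createMessageStreams yields
  // for Map("retarget" -> 2, "datapair" -> 2): two streams per topic.
  val streamsByTopic: Map[String, List[KafkaStreamSim]] = Map(
    "retarget" -> List(List("r1", "r2"), List("r3")),
    "datapair" -> List(List("d1"), List("d2"))
  )

  // With the topic key exposed, per-topic dispatch is a simple lookup:
  def messagesFor(topic: String): List[Message] =
    streamsByTopic.getOrElse(topic, Nil).flatten
}
```

If Spark exposed this map, the six-stream workaround would reduce to one createStream call plus a lookup per topic.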
Sergey

From: Tobias Pfeiffer <t...@preferred.jp>
Reply-To: "user@spark.apache.org" <user@spark.apache.org>
Date: Wednesday, July 2, 2014 at 9:54 PM
To: "user@spark.apache.org" <user@spark.apache.org>
Subject: Re: Kafka - streaming from multiple topics

Sergey,

you might actually consider using two streams, like

    val stream1 = KafkaUtils.createStream(ssc, "localhost:2181", "logs", Map("retarget" -> 2))
    val stream2 = KafkaUtils.createStream(ssc, "localhost:2181", "logs", Map("datapair" -> 2))

to achieve what you want. This has the additional advantage that there are actually two connections to Kafka and data is possibly received on different cluster nodes, already increasing parallelism at an early stage of processing.

Tobias

On Thu, Jul 3, 2014 at 6:47 AM, Sergey Malov <sma...@collective.com> wrote:

Hi,
I would like to set up streaming from a Kafka cluster, reading multiple topics and then processing each of them differently. So I’d create a stream:

    val stream = KafkaUtils.createStream(ssc, "localhost:2181", "logs", Map("retarget" -> 2, "datapair" -> 2))

and then, based on whether a message is from the "retarget" topic or the "datapair" topic, set up different filter functions, map functions, reduce functions, etc. Is that possible? I’d assume it should be, since ConsumerConnector can return a map of KafkaStreams keyed by topic, but I can’t find that it would be visible to Spark.

Thank you,
Sergey Malov
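The per-topic logic the question asks for, attached to Tobias's two streams, can be sketched in plain Scala (no Spark dependency; the data and functions here are illustrative stand-ins for the DStream transformations one would attach to stream1 and stream2):

```scala
// Sketch of different per-topic processing, assuming one stream per topic.
// In Spark these lists would be micro-batches of a DStream[(String, String)];
// the message formats below are invented for illustration.
object PerTopicSketch {
  // Simulated batches, one from each topic's stream:
  val retargetBatch = List("userA:click", "userB:view", "userA:click")
  val datapairBatch = List("k1=5", "k2=7")

  // Different filter/map/reduce per topic, as the original question asks:
  def countClicks(batch: List[String]): Int =
    batch.count(_.endsWith("click"))

  def sumValues(batch: List[String]): Int =
    batch.map(_.split("=")(1).toInt).sum
}
```

With two separate streams, each pipeline stays independent; the trade-off noted in the thread is that this pattern multiplies quickly when the topic count grows.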