We should be able to support that use case in the direct API. It may be as
simple as allowing users to pass in a function that returns the set of
topic+partitions to read from, that is, a function
(Time) => Set[TopicAndPartition]. It would get called every batch interval,
before the offsets are decided, which would allow users to add topics,
delete topics, and modify partitions on the fly.
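For illustration, a rough sketch of what such a user-supplied hook could
look like (the topic names and the cutover logic below are made up):

    import kafka.common.TopicAndPartition
    import org.apache.spark.streaming.Time

    // Called once per batch interval, before offsets for the batch are
    // decided. Returns the topic+partitions to read for that batch.
    def topicsForBatch(batchTime: Time): Set[TopicAndPartition] = {
      val base = Set(TopicAndPartition("events", 0), TopicAndPartition("events", 1))
      // Hypothetical cutover: start reading a new topic after a fixed time.
      if (batchTime.milliseconds >= 1427915000000L)
        base + TopicAndPartition("events-v2", 0)
      else
        base
    }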

What do you think, Cody?

On Wed, Apr 1, 2015 at 11:57 AM, Neelesh <neele...@gmail.com> wrote:

> Thanks Cody!
>
> On Wed, Apr 1, 2015 at 11:21 AM, Cody Koeninger <c...@koeninger.org>
> wrote:
>
>> If you want to change topics from batch to batch, you can always just
>> create a KafkaRDD repeatedly.
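>>
>> For illustration, a rough sketch of that approach, using the Spark 1.3
>> KafkaUtils.createRDD API (the broker address, topic, and offsets here
>> are made up, and an existing SparkContext sc is assumed):
>>
>>   import kafka.serializer.StringDecoder
>>   import org.apache.spark.streaming.kafka.{KafkaUtils, OffsetRange}
>>
>>   // Broker list for the direct API; no receiver is involved.
>>   val kafkaParams = Map("metadata.broker.list" -> "broker1:9092")
>>
>>   // One RDD per batch: topic "events", partition 0, offsets [0, 100).
>>   // Vary the topics and offset ranges however you like between batches.
>>   val offsetRanges = Array(OffsetRange("events", 0, 0L, 100L))
>>   val rdd = KafkaUtils.createRDD[String, String, StringDecoder, StringDecoder](
>>     sc, kafkaParams, offsetRanges)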
>>
>> The streaming code as it stands assumes a consistent set of topics,
>> though.  The implementation is private, so you can't subclass it without
>> building your own Spark.
>>
>> On Wed, Apr 1, 2015 at 1:09 PM, Neelesh <neele...@gmail.com> wrote:
>>
>>> Thanks Cody, that was really helpful.  I have a much better
>>> understanding now. One last question - Kafka topics are initialized once
>>> in the driver; is there an easy way of adding/removing topics on the fly?
>>> KafkaRDD#getPartitions() seems to be computed only once, with no way of
>>> refreshing the partitions.
>>>
>>> Thanks again!
>>>
>>> On Wed, Apr 1, 2015 at 10:01 AM, Cody Koeninger <c...@koeninger.org>
>>> wrote:
>>>
>>>> https://github.com/koeninger/kafka-exactly-once/blob/master/blogpost.md
>>>>
>>>> The Kafka consumers run in the executors.
>>>>
>>>> On Wed, Apr 1, 2015 at 11:18 AM, Neelesh <neele...@gmail.com> wrote:
>>>>
>>>>> With receivers, it was pretty obvious which code ran where - each
>>>>> receiver occupied a core and ran on the workers. However, with the new
>>>>> Kafka direct input streams, it's hard for me to understand where the
>>>>> code that reads from the Kafka brokers runs. Does it run on the driver
>>>>> (I hope not), or does it run on the workers?
>>>>>
>>>>> Any help appreciated
>>>>> thanks!
>>>>> -neelesh
>>>>>
>>>>
>>>>
>>>
>>
>
