Hi,

Are you using a recent version of Kafka? If so, since 0.9 the auto.offset.reset parameter takes:
- earliest: automatically reset the offset to the earliest offset
- latest: automatically reset the offset to the latest offset
- none: throw an exception to the consumer if no previous offset is found for the consumer's group
- anything else: throw an exception to the consumer

https://kafka.apache.org/documentation.html

Regards,

On Tue, Jul 5, 2016 at 2:15 PM, Bruckwald Tamás <tamas.bruckw...@freemail.hu> wrote:
> Hello,
>
> I'm writing a Spark (v1.6.0) batch job which reads from a Kafka topic.
> For this I can use org.apache.spark.streaming.kafka.KafkaUtils#createRDD;
> however, I need to set the offsets for all the partitions, and I also need
> to store them somewhere (ZK? HDFS?) to know where to start the next batch
> job from.
> What is the right approach to read from Kafka in a batch job?
>
> I'm also thinking about writing a streaming job instead, which reads with
> auto.offset.reset=smallest and saves the checkpoint to HDFS, so that the
> next run starts from there.
> But in that case, how can I fetch just once and stop streaming after the
> first batch?
>
> I posted this question on StackOverflow recently
> (http://stackoverflow.com/q/38026627/4020050) but got no answer there, so
> I'd ask here as well, hoping to get some ideas on how to resolve this
> issue.
>
> Thanks - Bruckwald

--
M'BAREK Med Nihed,
Fedora Ambassador, TUNISIA, Northern Africa
http://www.nihed.com
<http://tn.linkedin.com/in/nihed>
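For reference, the values listed in the reply are set on the new (0.9+) consumer like this; the broker address and group id below are placeholders, not anything from the thread:

```
bootstrap.servers=localhost:9092
group.id=my-batch-group
# one of: earliest, latest, none
auto.offset.reset=earliest
```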
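The batch approach the quoted question describes can be sketched roughly as below. This is only an illustration, assuming Spark 1.6 with the spark-streaming-kafka_2.10 artifact on the classpath; the topic name, broker list, and offset values are placeholders, and persisting the ending offsets (to ZK, HDFS, or a database) is left as a comment because that is exactly the part the question is asking about:

```scala
import kafka.serializer.StringDecoder
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.streaming.kafka.{KafkaUtils, OffsetRange}

object KafkaBatchRead {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("kafka-batch"))
    val kafkaParams = Map("metadata.broker.list" -> "localhost:9092")

    // Offset ranges would be loaded from wherever the previous run stored
    // them; here they are hard-coded placeholders:
    // (topic, partition, fromOffset, untilOffset)
    val offsetRanges = Array(
      OffsetRange("my-topic", 0, 0L, 1000L),
      OffsetRange("my-topic", 1, 0L, 1000L)
    )

    // createRDD reads exactly the requested ranges and then finishes --
    // no streaming context needed for a one-shot batch.
    val rdd = KafkaUtils.createRDD[String, String, StringDecoder, StringDecoder](
      sc, kafkaParams, offsetRanges)

    rdd.foreach { case (_, value) => println(value) }

    // After processing, persist each range's untilOffset somewhere durable
    // so the next batch job can use it as its fromOffset.
    sc.stop()
  }
}
```

Because the ranges are explicit, this sidesteps the "stop streaming after the first batch" problem entirely: the job simply ends when the RDD has been processed.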