Hi,

Are you using a recent version of Kafka? If so, since 0.9 the auto.offset.reset parameter takes:
- earliest: automatically reset the offset to the earliest offset
- latest: automatically reset the offset to the latest offset
- none: throw an exception to the consumer if no previous offset is found for the consumer's group
- anything else: throw an exception to the consumer

https://kafka.apache.org/documentation.html

Regards,

On Tue, Jul 5, 2016 at 2:15 PM, Bruckwald Tamás <tamas.bruckw...@freemail.hu> wrote:
> Hello,
>
> I'm writing a Spark (v1.6.0) batch job which reads from a Kafka topic.
> For this I can use org.apache.spark.streaming.kafka.KafkaUtils#createRDD;
> however, I need to set the offsets for all the partitions, and I also need
> to store them somewhere (ZK? HDFS?) to know where to start the next batch
> job from.
> What is the right approach to read from Kafka in a batch job?
>
> I'm also thinking about writing a streaming job instead, which reads with
> auto.offset.reset=smallest and saves the checkpoint to HDFS, so that the
> next run starts from there.
> But in that case, how can I fetch just once and stop streaming after the
> first batch?
>
> I posted this question on StackOverflow recently
> (http://stackoverflow.com/q/38026627/4020050) but got no answer there, so
> I'd ask here as well, hoping to get some ideas on how to resolve this
> issue.
>
> Thanks - Bruckwald

--
M'BAREK Med Nihed,
Fedora Ambassador, TUNISIA, Northern Africa
http://www.nihed.com
<http://tn.linkedin.com/in/nihed>
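For reference, the values listed in the reply are set on the new (0.9+) consumer like this; the broker address and group id below are placeholders, not anything from the thread:

```
bootstrap.servers=localhost:9092
group.id=my-batch-group
# one of: earliest, latest, none
auto.offset.reset=earliest
```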
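The batch approach the quoted question describes can be sketched roughly as below. This is only an illustration, assuming Spark 1.6 with the spark-streaming-kafka_2.10 artifact on the classpath; the topic name, broker list, and offset values are placeholders, and persisting the ending offsets (to ZK, HDFS, or a database) is left as a comment because that is exactly the part the question is asking about:

```scala
import kafka.serializer.StringDecoder
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.streaming.kafka.{KafkaUtils, OffsetRange}

object KafkaBatchRead {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("kafka-batch"))
    val kafkaParams = Map("metadata.broker.list" -> "localhost:9092")

    // Offset ranges would be loaded from wherever the previous run stored
    // them; here they are hard-coded placeholders:
    // (topic, partition, fromOffset, untilOffset)
    val offsetRanges = Array(
      OffsetRange("my-topic", 0, 0L, 1000L),
      OffsetRange("my-topic", 1, 0L, 1000L)
    )

    // createRDD reads exactly the requested ranges and then finishes --
    // no streaming context needed for a one-shot batch.
    val rdd = KafkaUtils.createRDD[String, String, StringDecoder, StringDecoder](
      sc, kafkaParams, offsetRanges)

    rdd.foreach { case (_, value) => println(value) }

    // After processing, persist each range's untilOffset somewhere durable
    // so the next batch job can use it as its fromOffset.
    sc.stop()
  }
}
```

Because the ranges are explicit, this sidesteps the "stop streaming after the first batch" problem entirely: the job simply ends when the RDD has been processed.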