org.apache.spark.streaming.kafka.KafkaCluster has methods getLatestLeaderOffsets and getEarliestLeaderOffsets
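(KafkaCluster belongs to the old 0.8 integration. With the 0.10 integration the same per-partition bounds can be fetched from a plain KafkaConsumer and fed into KafkaUtils.createRDD. A minimal sketch; the broker address, group id, and topic name are hypothetical, and the code assumes the spark-streaming-kafka-0-10 and kafka-clients dependencies are on the classpath.)

```scala
import java.{util => ju}
import scala.collection.JavaConverters._
import org.apache.kafka.clients.consumer.KafkaConsumer
import org.apache.kafka.common.TopicPartition
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkContext
import org.apache.spark.streaming.kafka010.{KafkaUtils, LocationStrategies, OffsetRange}

def readWholeTopic(sc: SparkContext, topic: String) = {
  val kafkaParams: ju.Map[String, Object] = Map[String, Object](
    "bootstrap.servers" -> "localhost:9092",       // hypothetical broker
    "key.deserializer"  -> classOf[StringDeserializer],
    "value.deserializer" -> classOf[StringDeserializer],
    "group.id"          -> "batch-reader"          // hypothetical group id
  ).asJava

  // Discover the offset bounds with a plain consumer instead of KafkaCluster.
  val consumer = new KafkaConsumer[String, String](kafkaParams)
  val parts = consumer.partitionsFor(topic).asScala
    .map(p => new TopicPartition(topic, p.partition))
  val earliest = consumer.beginningOffsets(parts.asJava).asScala
  val latest   = consumer.endOffsets(parts.asJava).asScala
  consumer.close()

  // One OffsetRange per partition: [earliest, latest) covers all the data.
  val ranges: Array[OffsetRange] = parts.map { tp =>
    OffsetRange(tp.topic, tp.partition, earliest(tp), latest(tp))
  }.toArray

  KafkaUtils.createRDD[String, String](
    sc, kafkaParams, ranges, LocationStrategies.PreferConsistent)
}
```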
On Mon, Aug 7, 2017 at 11:37 PM, shyla deshpande <deshpandesh...@gmail.com> wrote:
> Thanks TD.
>
> On Mon, Aug 7, 2017 at 8:59 PM, Tathagata Das <tathagata.das1...@gmail.com> wrote:
>>
>> I don't think there is any easier way.
>>
>> On Mon, Aug 7, 2017 at 7:32 PM, shyla deshpande <deshpandesh...@gmail.com> wrote:
>>>
>>> Thanks TD for the response. I forgot to mention that I am not using
>>> structured streaming.
>>>
>>> I was looking into KafkaUtils.createRDD, and it looks like I need to get
>>> the earliest and the latest offset for each partition to build the
>>> Array(offsetRange). I wanted to know if there was an easier way.
>>>
>>> One reason we are hesitating to use structured streaming is that I need
>>> to persist the data in a Cassandra database, which I believe is not
>>> production ready.
>>>
>>> On Mon, Aug 7, 2017 at 6:11 PM, Tathagata Das
>>> <tathagata.das1...@gmail.com> wrote:
>>>>
>>>> It's best to use DataFrames. You can read from Kafka as a stream or as
>>>> a batch. More details here:
>>>>
>>>> https://spark.apache.org/docs/latest/structured-streaming-kafka-integration.html#creating-a-kafka-source-for-batch-queries
>>>>
>>>> https://databricks.com/blog/2017/04/26/processing-data-in-apache-kafka-with-structured-streaming-in-apache-spark-2-2.html
>>>>
>>>> On Mon, Aug 7, 2017 at 6:03 PM, shyla deshpande
>>>> <deshpandesh...@gmail.com> wrote:
>>>>>
>>>>> Hi all,
>>>>>
>>>>> What is the easiest way to read all the data from Kafka in a batch
>>>>> program for a given topic?
>>>>> I have 10 Kafka partitions, but there is not much data. I would like to
>>>>> read from the earliest offset across all partitions for the topic.
>>>>>
>>>>> I appreciate any help. Thanks
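(For comparison, the batch read TD recommends via the linked integration guide needs no manual offset handling at all; "earliest" and "latest" are the defaults for batch queries. A minimal sketch; the broker address and topic name "events" are hypothetical, and it assumes the spark-sql-kafka-0-10 package is available.)

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("kafka-batch").getOrCreate()

// Reads the entire topic, across all partitions, as a single DataFrame.
val df = spark.read
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092")
  .option("subscribe", "events")
  .option("startingOffsets", "earliest")   // default for batch queries
  .option("endingOffsets", "latest")       // default for batch queries
  .load()
  .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
```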