org.apache.spark.streaming.kafka.KafkaCluster has the methods
getEarliestLeaderOffsets and getLatestLeaderOffsets, which give you the
per-partition offset bounds you need for building the offset ranges.
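Once those two calls have been unwrapped into plain per-partition offset maps, pairing them into one range per partition is straightforward. A minimal sketch (the TopicAndPartition and OffsetRange case classes below are simplified stand-ins for the real Kafka/Spark classes, and the maps are hand-built rather than fetched from a broker):

```scala
// Stand-ins for kafka.common.TopicAndPartition and
// org.apache.spark.streaming.kafka.OffsetRange, just to show the shape.
case class TopicAndPartition(topic: String, partition: Int)
case class OffsetRange(topic: String, partition: Int, fromOffset: Long, untilOffset: Long)

// Build one OffsetRange per partition from the earliest/latest offset maps,
// i.e. the Array[OffsetRange] that KafkaUtils.createRDD expects.
def buildOffsetRanges(
    earliest: Map[TopicAndPartition, Long],
    latest: Map[TopicAndPartition, Long]): Array[OffsetRange] =
  earliest.keys.toArray.sortBy(_.partition).map { tp =>
    OffsetRange(tp.topic, tp.partition, earliest(tp), latest(tp))
  }

// Example: a 3-partition topic, earliest = 0 everywhere,
// latest = 100/101/102 for partitions 0/1/2.
val tps = (0 until 3).map(p => TopicAndPartition("events", p))
val ranges = buildOffsetRanges(
  tps.map(_ -> 0L).toMap,
  tps.map(tp => tp -> (100L + tp.partition)).toMap)
```

In the real code you would populate the two maps from getEarliestLeaderOffsets and getLatestLeaderOffsets (both return an Either that has to be unwrapped first) and pass the resulting array to KafkaUtils.createRDD.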

On Mon, Aug 7, 2017 at 11:37 PM, shyla deshpande
<deshpandesh...@gmail.com> wrote:
> Thanks TD.
>
> On Mon, Aug 7, 2017 at 8:59 PM, Tathagata Das <tathagata.das1...@gmail.com>
> wrote:
>>
>> I don't think there is an easier way.
>>
>> On Mon, Aug 7, 2017 at 7:32 PM, shyla deshpande <deshpandesh...@gmail.com>
>> wrote:
>>>
>>> Thanks TD for the response. I forgot to mention that I am not using
>>> structured streaming.
>>>
>>> I was looking into KafkaUtils.createRDD, and it looks like I need to get
>>> the earliest and the latest offset for each partition to build the
>>> Array[OffsetRange]. I wanted to know if there was an easier way.
>>>
>>> One reason we are hesitant to use Structured Streaming is that I need to
>>> persist the data to a Cassandra database, and I believe that path is not
>>> yet production-ready.
>>>
>>>
>>> On Mon, Aug 7, 2017 at 6:11 PM, Tathagata Das
>>> <tathagata.das1...@gmail.com> wrote:
>>>>
>>>> It's best to use DataFrames. You can read from Kafka either as a stream
>>>> or as a batch. More details here:
>>>>
>>>>
>>>> https://spark.apache.org/docs/latest/structured-streaming-kafka-integration.html#creating-a-kafka-source-for-batch-queries
>>>>
>>>> https://databricks.com/blog/2017/04/26/processing-data-in-apache-kafka-with-structured-streaming-in-apache-spark-2-2.html
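
The batch read those links describe boils down to something like the following sketch (assuming Spark 2.x with the spark-sql-kafka-0-10 package on the classpath; the broker address and topic name are placeholders):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("kafka-batch-read").getOrCreate()

// With startingOffsets=earliest and endingOffsets=latest (the defaults for
// batch queries), this reads everything currently in the topic, across all
// partitions, as a single DataFrame.
val df = spark.read
  .format("kafka")
  .option("kafka.bootstrap.servers", "host1:9092")
  .option("subscribe", "mytopic")
  .option("startingOffsets", "earliest")
  .option("endingOffsets", "latest")
  .load()
  .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)", "partition", "offset")
```

This needs a reachable Kafka broker to actually run, but it avoids computing any offset ranges by hand.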
>>>>
>>>> On Mon, Aug 7, 2017 at 6:03 PM, shyla deshpande
>>>> <deshpandesh...@gmail.com> wrote:
>>>>>
>>>>> Hi all,
>>>>>
>>>>> What is the easiest way to read all the data from Kafka for a given
>>>>> topic in a batch program?
>>>>> I have 10 Kafka partitions, but the data volume is small. I would like
>>>>> to read from the earliest offset across all partitions of the topic.
>>>>>
>>>>> I appreciate any help. Thanks
>>>>
>>>>
>>>
>>
>
