Re: KafkaUtils.createRDD , How do I read all the data from kafka in a batch program for a given topic?

2017-08-10 Thread shyla deshpande
Thanks Cody.



Re: KafkaUtils.createRDD , How do I read all the data from kafka in a batch program for a given topic?

2017-08-09 Thread Cody Koeninger
org.apache.spark.streaming.kafka.KafkaCluster has methods
getLatestLeaderOffsets and getEarliestLeaderOffsets
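
A minimal sketch of what calling those methods looks like with the 0-8 connector (spark-streaming-kafka-0-8); the broker address and topic name are placeholders, and both calls return an `Either` that should be handled rather than unwrapped with `.right.get` in real code:

```scala
import kafka.common.TopicAndPartition
import org.apache.spark.streaming.kafka.KafkaCluster

val kafkaParams = Map("metadata.broker.list" -> "localhost:9092")
val kc = new KafkaCluster(kafkaParams)

// Look up the partitions for the topic, then ask for the offset bounds.
val partitions: Set[TopicAndPartition] =
  kc.getPartitions(Set("mytopic")).right.get

// Either[Err, Map[TopicAndPartition, LeaderOffset]] for each bound
val earliest = kc.getEarliestLeaderOffsets(partitions).right.get
val latest   = kc.getLatestLeaderOffsets(partitions).right.get
```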




Re: KafkaUtils.createRDD , How do I read all the data from kafka in a batch program for a given topic?

2017-08-07 Thread shyla deshpande
Thanks TD.



Re: KafkaUtils.createRDD , How do I read all the data from kafka in a batch program for a given topic?

2017-08-07 Thread Tathagata Das
I don't think there is any easier way.



Re: KafkaUtils.createRDD , How do I read all the data from kafka in a batch program for a given topic?

2017-08-07 Thread shyla deshpande
Thanks TD for the response. I forgot to mention that I am not using
structured streaming.

I was looking into KafkaUtils.createRDD, and it looks like I need to get the
earliest and the latest offset for each partition to build the
Array of OffsetRanges. I wanted to know if there was an easier way.

One reason we are hesitating to use structured streaming is that I
need to persist the data to a Cassandra database, which I believe is not
yet production ready.
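
Putting the pieces together, the approach described above can be sketched as follows with the 0-8 connector: fetch the earliest and latest offsets per partition via KafkaCluster, build the OffsetRange array, and pass it to KafkaUtils.createRDD. The broker address is a placeholder, string keys/values are assumed, and the `.right.get` unwrapping elides error handling:

```scala
import kafka.serializer.StringDecoder
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.kafka.{KafkaCluster, KafkaUtils, OffsetRange}

// Read every message currently in the topic as one batch RDD.
def readWholeTopic(sc: SparkContext, topic: String): RDD[(String, String)] = {
  val kafkaParams = Map("metadata.broker.list" -> "localhost:9092")
  val kc = new KafkaCluster(kafkaParams)

  val partitions = kc.getPartitions(Set(topic)).right.get
  val earliest   = kc.getEarliestLeaderOffsets(partitions).right.get
  val latest     = kc.getLatestLeaderOffsets(partitions).right.get

  // One OffsetRange per partition, spanning earliest..latest.
  val offsetRanges: Array[OffsetRange] = partitions.toArray.map { tp =>
    OffsetRange(tp.topic, tp.partition, earliest(tp).offset, latest(tp).offset)
  }

  KafkaUtils.createRDD[String, String, StringDecoder, StringDecoder](
    sc, kafkaParams, offsetRanges)
}
```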




Re: KafkaUtils.createRDD , How do I read all the data from kafka in a batch program for a given topic?

2017-08-07 Thread Tathagata Das
It's best to use DataFrames. You can read from Kafka as a stream or as a batch.
More details here:

https://spark.apache.org/docs/latest/structured-streaming-kafka-integration.html#creating-a-kafka-source-for-batch-queries
https://databricks.com/blog/2017/04/26/processing-data-in-apache-kafka-with-structured-streaming-in-apache-spark-2-2.html
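A minimal batch read over a whole topic with the DataFrame API, along the lines of the linked docs; the bootstrap server and topic name are placeholders, and `spark` is an existing SparkSession:

```scala
// Batch query: reads everything between the earliest and latest offsets
// across all partitions of the topic, then returns.
val df = spark.read
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092")
  .option("subscribe", "mytopic")
  .option("startingOffsets", "earliest")
  .option("endingOffsets", "latest")
  .load()

// Kafka keys/values arrive as binary; cast them to strings to inspect.
df.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)").show()
```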

On Mon, Aug 7, 2017 at 6:03 PM, shyla deshpande wrote:

> Hi all,
>
> What is the easiest way to read all the data from Kafka in a batch program
> for a given topic?
> I have 10 Kafka partitions, but the data is not much. I would like to read
> from the earliest offset across all the partitions for a topic.
>
> I appreciate any help. Thanks