Liang Chen wrote
> Hi
>
> Thanks for starting this discussion about adding Spark Streaming support.
> 1. Please try to utilize the current code (structured streaming), not
> adding separate logic code for spark streaming.
[reply] The original idea is to reuse the current code (structured
streaming) to implement the Spark Streaming integration.

Liang Chen wrote
> 2. I suggest that by default is using structured streaming, please
> consider how to make configuration for enabling/switching to spark
> streaming.

[reply] The implementations of Structured Streaming and Spark Streaming are
different, and their usage is different too, so I don't understand what
'consider how to make configuration for enabling/switching to spark
streaming' means here. IMO, we just need to implement a utility that writes
RDD data to the streaming segment in DStream.foreachRDD; the logic of this
utility would be the same as CarbonAppendableStreamSink.addBatch. Right?
(A rough sketch of what I mean is at the end of this mail.)

Liang Chen wrote
> Regards
> Liang
>
>
> xm_zzc wrote
>> Hi dev:
>>   Currently CarbonData 1.3 (will be released soon) only supports
>> integration with Spark Structured Streaming, which requires Kafka's
>> version to be >= 0.10. I think there are still many users integrating
>> Spark Streaming with Kafka 0.8, at least our cluster is, and the cost of
>> upgrading Kafka is too high. So should CarbonData integrate with Spark
>> Streaming too?
>>
>> I think there are two ways to integrate with Spark Streaming, as
>> follows:
>> 1). CarbonData batch data loading + auto compaction
>> Use CarbonSession.createDataFrame to convert the RDD to a DataFrame in
>> InputDStream.foreachRDD, and then save the RDD data into a CarbonData
>> table which supports auto compaction. In this way, it can also support
>> creating pre-aggregate tables on this main table (a streaming table does
>> not support creating pre-aggregate tables on it).
>>
>> I can test this way in our QA env and add an example to CarbonData.
>>
>> 2). The same as the integration with Structured Streaming
>> With this way, Structured Streaming appends every mini-batch of data
>> into the stream segment, which is in row format, and then when the size
>> of the stream segment is greater than
>> 'carbon.streaming.segment.max.size', it will automatically convert the
>> stream segment to a batch segment (column format) at the beginning of
>> each batch and create a new stream segment to append data to.
>> However, I have no idea how to integrate this with Spark Streaming yet,
>> *any suggestion for this*?
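To make the idea above a bit more concrete, here is a minimal sketch of
option 1) (plain batch load per mini-batch + auto compaction). It is not a
final implementation: the CarbonSession `carbon`, the case class `Sale`,
the table name `sale` and the helper `saveToCarbon` are all illustrative
assumptions; the table is assumed to exist already and
carbon.enable.auto.load.merge=true is assumed to be set so the many small
segments get merged automatically.

  import org.apache.spark.sql.{SaveMode, SparkSession}
  import org.apache.spark.streaming.dstream.DStream

  // hypothetical record type matching the target table's schema
  case class Sale(id: Int, name: String)

  def saveToCarbon(carbon: SparkSession, stream: DStream[Sale]): Unit = {
    stream.foreachRDD { rdd =>
      if (!rdd.isEmpty()) {
        import carbon.implicits._
        // Convert the mini-batch RDD to a DataFrame ...
        val df = rdd.toDF()
        // ... and do a normal CarbonData batch load, which appends a new
        // segment; auto compaction merges the small segments later.
        df.write
          .format("carbondata")
          .option("tableName", "sale")
          .mode(SaveMode.Append)
          .save()
      }
    }
  }

The same skeleton should also fit the utility I mentioned in my reply to
point 2: the only difference would be whether each mini-batch goes through
the normal load flow (batch segments, as above) or gets appended to the
stream segment the way CarbonAppendableStreamSink.addBatch does.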