Liang Chen wrote
> Hi
>
> Thanks for starting this discussion about adding Spark Streaming support.
> 1. Please try to utilize the current code (structured streaming), not
> adding separate logic code for spark streaming.
[reply] The original idea is to reuse the current code (structured
streaming) to implement the Spark Streaming integration.

Liang Chen wrote
> 2. I suggest that by default is using structured streaming, please
> consider how to make configuration for enabling/switching to spark
> streaming.

[reply] The implementations of Structured Streaming and Spark Streaming are
different, and their usage is different too, so I don't understand what
'consider how to make configuration for enabling/switching to spark
streaming' means here. IMO, we just need to implement a utility that writes
RDD data to the streaming segment in DStream.foreachRDD; the logic of this
utility would be the same as CarbonAppendableStreamSink.addBatch. Right?
(A rough sketch of what I mean is at the end of this mail.)

Liang Chen wrote
> Regards
> Liang
>
>
> xm_zzc wrote
>> Hi dev:
>>   Currently CarbonData 1.3 (will be released soon) only supports
>> integration with Spark Structured Streaming, which requires Kafka's
>> version to be >= 0.10. I think there are still many users integrating
>> Spark Streaming with Kafka 0.8, at least our cluster is, and the cost of
>> upgrading Kafka is too high. So should CarbonData integrate with Spark
>> Streaming too?
>>
>> I think there are two ways to integrate with Spark Streaming, as
>> follows:
>> 1). CarbonData batch data loading + auto compaction
>> Use CarbonSession.createDataFrame to convert the RDD to a DataFrame in
>> InputDStream.foreachRDD, and then save the RDD data into a CarbonData
>> table which supports auto compaction. In this way, it can also support
>> creating pre-aggregate tables on this main table (a streaming table does
>> not support creating pre-aggregate tables on it).
>>
>> I can test this way in our QA env and add an example to CarbonData.
>>
>> 2). The same as the integration with Structured Streaming
>> With this way, Structured Streaming appends every mini-batch of data
>> into the stream segment, which is in row format, and then when the size
>> of the stream segment is greater than
>> 'carbon.streaming.segment.max.size', it will automatically convert the
>> stream segment to a batch segment (column format) at the beginning of
>> each batch and create a new stream segment to append data to.
>> However, I have no idea how to integrate this with Spark Streaming yet,
>> *any suggestion for this*?
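To make the idea above a bit more concrete, here is a minimal sketch of
option 1) (plain batch load per mini-batch + auto compaction). It is not a
final implementation: the CarbonSession `carbon`, the case class `Sale`,
the table name `sale` and the helper `saveToCarbon` are all illustrative
assumptions; the table is assumed to exist already and
carbon.enable.auto.load.merge=true is assumed to be set so the many small
segments get merged automatically.

  import org.apache.spark.sql.{SaveMode, SparkSession}
  import org.apache.spark.streaming.dstream.DStream

  // hypothetical record type matching the target table's schema
  case class Sale(id: Int, name: String)

  def saveToCarbon(carbon: SparkSession, stream: DStream[Sale]): Unit = {
    stream.foreachRDD { rdd =>
      if (!rdd.isEmpty()) {
        import carbon.implicits._
        // Convert the mini-batch RDD to a DataFrame ...
        val df = rdd.toDF()
        // ... and do a normal CarbonData batch load, which appends a new
        // segment; auto compaction merges the small segments later.
        df.write
          .format("carbondata")
          .option("tableName", "sale")
          .mode(SaveMode.Append)
          .save()
      }
    }
  }

The same skeleton should also fit the utility I mentioned in my reply to
point 2: the only difference would be whether each mini-batch goes through
the normal load flow (batch segments, as above) or gets appended to the
stream segment the way CarbonAppendableStreamSink.addBatch does.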