Hi 

Very excited to see that CarbonData will integrate with Spark 2.x. I look
forward to further performance improvements and enhanced usability.

Regards
Liang


Jacky Li wrote:
> Hi all,
> 
> Currently CarbonData only works with Spark 1.5 and Spark 1.6. As the Apache
> Spark community moves to 2.1, more and more users will deploy Spark 2.x
> in production environments. To make CarbonData even more popular, I think
> now is a good time to start considering Spark 2.x integration with
> CarbonData.
> 
> Moreover, we can take this as a chance to refactor CarbonData to make it
> both easier to use and faster.
> 
> Usability:
> Instead of using CarbonContext, in the Spark 2 integration users should be
> able to:
> 1. use the native SparkSession in a Spark application to create and query
> tables backed by CarbonData files with full feature support, including
> index and late decode optimization.
> 
> 2. use CarbonData's API and tools to accomplish Carbon-specific tasks, like
> compaction, segment deletion, etc.
> 
> Performance:
> 1. deep integration with the Datasource API, leveraging Spark 2's
> whole-stage codegen feature.
> 
> 2. provide an implementation of a vectorized record reader to improve
> scanning performance.
> 
> Since Spark 2 differs a lot from Spark 1.6, it may take some time to
> complete all these features. With the help of contributors and
> committers, I hope we can have the basic features working in the next
> CarbonData release.
> 
> What do you think about this idea? All kinds of contributions and
> suggestions are welcome.
> 
> Regards,
> Jacky Li

--
View this message in context: 
http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/Feature-Proposal-Spark-2-integration-with-CarbonData-tp3236p3238.html
Sent from the Apache CarbonData Mailing List archive mailing list archive at 
Nabble.com.
