Hi,

Very excited to see that CarbonData will integrate with Spark 2.x. I look forward to further performance improvements and better usability.
Regards
Liang

Jacky Li wrote
> Hi all,
>
> Currently CarbonData only works with Spark 1.5 and Spark 1.6. As the Apache
> Spark community is moving to 2.1, more and more users will deploy Spark 2.x
> in production environments. To make CarbonData even more popular,
> I think now is a good time to start considering Spark 2.x integration with
> CarbonData.
>
> Moreover, we can take this as a chance to refactor CarbonData to make it
> both easier to use and higher performing.
>
> Usability:
> Instead of using CarbonContext, in the Spark 2 integration, users should be able to
> 1. use the native SparkSession in a Spark application to create and query
> tables backed by CarbonData files with full feature support, including
> index and late decode optimization.
>
> 2. use CarbonData's API and tools to accomplish Carbon-specific tasks, like
> compaction, delete segment, etc.
>
> Performance:
> 1. deep integration with the Datasource API, leveraging Spark 2's whole
> stage codegen feature.
>
> 2. provide an implementation of a vectorized record reader to improve scanning
> performance.
>
> Since Spark 2 changes a lot compared to Spark 1.6, it may take some time
> to complete all these features. With the help of contributors and
> committers, I hope we can have basic features working in the next CarbonData
> release.
>
> What do you think about this idea? All kinds of contributions and
> suggestions are welcome.
>
> Regards,
> Jacky Li
