Good summary, Luke! :-) We shall see how these fit into the sprint plans. I hope some tasks can start within one or two sprints.

On Fri, May 1, 2015 at 12:59 AM, Luke Han <[email protected]> wrote:

> There have been wide discussions around Spark and Kylin when we talk to
> different organizations, teams and individuals. Leveraging the Spark
> ecosystem has always been on our minds; it could help with the current
> challenges Kylin has:
>
> *High latency when reading data from Hive*
> --Several hours to fetch data when joining big tables
> --Route to SQL-on-Hadoop turned off due to performance issues
>
> *Time-to-Market of data latency*
> --Huge IO & network traffic with MR jobs
>
> *Streaming*
> --Stream processing and cube pre-calculation
>
>
> Leveraging Spark, there are some options:
>
> *Integrating with Spark SQL:*
> --Option I: Read data from SparkSQL instead of Hive
> --Option II: Route unsupported queries to SparkSQL
> --Option III: Kylin to be OLAP source of SparkSQL
>
> *Spark Cube Build Engine*
> --Efficient cube generation engine with Spark
>
> *Spark Streaming*
> --Leverage SparkStreaming for StreamingOLAP (TBD)
>
> *HBase?*
> --Any idea?
>
>
> After great meetings with Huawei, PayPal and others, I think it's time to
> bring this to the design and architecture phase now.
>
> Here are the epic and features for tracking:
> https://issues.apache.org/jira/browse/KYLIN-679
>
> KYLIN-741 <https://issues.apache.org/jira/browse/KYLIN-741> Read data from
> SparkSQL
> KYLIN-742 <https://issues.apache.org/jira/browse/KYLIN-742> Route
> unsupported queries to SparkSQL
> KYLIN-743 <https://issues.apache.org/jira/browse/KYLIN-743> Kylin to be
> OLAP source of SparkSQL
> KYLIN-744 <https://issues.apache.org/jira/browse/KYLIN-744> Spark Cube
> Build Engine
>
> The initial efforts will focus on SparkSQL and then the Spark Cube Engine.
>
> Please keep general discussion here, but please start discussing technical
> details, design and architecture under each JIRA (the mailing list will get
> all updates automatically).
>
> If you have any ideas or are willing to contribute, please feel free to let
> us know or add comments in each ticket.
>
> PS: as previously planned, Spark-related work will be managed under the
> v0.9.x version:
> https://issues.apache.org/jira/browse/KYLIN-577
>
> Thanks.
>
> Best Regards!
> ---------------------
>
> Luke Han
