Re: Adding Spark Support to Apache Kylin

Li Yang Sat, 02 May 2015 14:04:05 -0700

Good summary Luke!  :-)

We shall see how these fit into the sprint plans. I hope some tasks can start
within one or two sprints.


On Fri, May 1, 2015 at 12:59 AM, Luke Han <[email protected]> wrote:

> There have been wide discussions around Spark and Kylin when we talk to
> different organizations, teams and individuals. Leveraging the Spark
> ecosystem has always been on our minds; it could help with the current
> challenges Kylin has:
>
> *High latency when reading data from Hive*
> --Several hours to fetch data when joining big tables
> --Route to SQL-on-Hadoop turned off due to performance issues
>
> *Time-to-Market of data latency*
> --Huge IO & Network traffic with MR jobs
>
> *Streaming*
> --Stream processing and pre-calculating cubes
>
>
> Leveraging Spark, there are some options:
>
> *Integrating with Spark SQL:*
> --Option I: Read data from SparkSQL instead of Hive
> --Option II: Route unsupported queries to SparkSQL
> --Option III: Kylin to be OLAP source of SparkSQL
>
> *Spark Cube Build Engine*
> --Efficient cube generation engine with Spark
>
> *Spark Streaming*
> --Leverage SparkStreaming for StreamingOLAP (TBD)
>
> *HBase?*
> --Any ideas?
>
>
> After great meetings with Huawei, PayPal and others, I think it's time to
> bring this into the design and architecture phase.
>
> Here are the epic and features for tracking:
> https://issues.apache.org/jira/browse/KYLIN-679
>
> KYLIN-741 <https://issues.apache.org/jira/browse/KYLIN-741> Read data from
> SparkSQL
> KYLIN-742 <https://issues.apache.org/jira/browse/KYLIN-742> Route
> unsupported queries to SparkSQL
> KYLIN-743 <https://issues.apache.org/jira/browse/KYLIN-743> Kylin to be
> OLAP source of SparkSQL
> KYLIN-744 <https://issues.apache.org/jira/browse/KYLIN-744> Spark Cube
> Build Engine
>
> The initial efforts will focus on SparkSQL and then the Spark Cube Engine.
>
> Please keep general discussion here, but please start discussing technical
> details, design and architecture under each JIRA (the mailing list will get
> all updates automatically).
>
> If you have any ideas or are willing to contribute, please feel free to let
> us know or add comments in each ticket.
>
> PS. As previously planned, Spark-related work will be managed under the
> v0.9.x version:
> https://issues.apache.org/jira/browse/KYLIN-577
>
> Thanks.
>
> Best Regards!
> ---------------------
>
> Luke Han
>
