There has been wide discussion around Spark and Kylin when we talk to
different organizations, teams and individuals. Leveraging the Spark ecosystem
has always been on our mind; it could help with the current
challenges Kylin has:

*High latency when reading data from Hive*
--Several hours to fetch data when joining big tables
--Routing to SQL-on-Hadoop turned off due to performance issues

*Data latency (time-to-market)*
--Heavy I/O and network traffic with MR jobs

*Streaming*
--Process streaming data and pre-calculate cubes


There are several options for leveraging Spark:

*Integrating with Spark SQL:*
--Option I: Read data from SparkSQL instead of Hive
--Option II: Route unsupported queries to SparkSQL
--Option III: Kylin to be OLAP source of SparkSQL

*Spark Cube Build Engine*
--Efficient cube generation engine with Spark

*Spark Streaming*
--Leverage SparkStreaming for StreamingOLAP (TBD)

*HBase?*
--Any idea?


After productive meetings with Huawei, PayPal and others, I think it's time to
bring this to the design and architecture phase.

Here are the epic and features for tracking:
https://issues.apache.org/jira/browse/KYLIN-679

KYLIN-741 <https://issues.apache.org/jira/browse/KYLIN-741> Read data from
SparkSQL
KYLIN-742 <https://issues.apache.org/jira/browse/KYLIN-742> Route
unsupported queries to SparkSQL
KYLIN-743 <https://issues.apache.org/jira/browse/KYLIN-743> Kylin to be
OLAP source of SparkSQL
KYLIN-744 <https://issues.apache.org/jira/browse/KYLIN-744> Spark Cube
Build Engine

The initial efforts will focus on SparkSQL and then the Spark Cube Engine.

Please keep general discussion here, but start discussing technical
details, design and architecture under each JIRA (the mailing list will get all
updates automatically).

If you have any ideas or are willing to contribute, please feel free to let us
know or add comments in each ticket.

PS. As previously planned, Spark-related work will be managed under the v0.9.x
version:
https://issues.apache.org/jira/browse/KYLIN-577

Thanks.

Best Regards!
---------------------

Luke Han
