[jira] [Updated] (KYLIN-745) Generic Data Reader

Luke Han (JIRA) Wed, 29 Apr 2015 06:15:04 -0700

     [ https://issues.apache.org/jira/browse/KYLIN-745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Luke Han updated KYLIN-745:
---------------------------
    Description: 
When data is stored in an existing data warehouse (DW) such as Oracle, Kylin 
cannot read it directly to build a cube. 
Several teams, such as Candor, have asked for this.

There are two options:
#1, copy the data to Hive and then build the cube through Kylin. Some users 
already run this model, bringing data from the DW into Hive, and use Kylin 
very well. 
#2, rewrite the data read module to pull data from Oracle directly. The first 
step of a cube build already generates a Hive query that reads the data and 
writes one temp table in Hive, so this should not be too complicated (but it 
depends on the network and other factors; otherwise, #1 will be the more 
efficient option). The cube build then proceeds as normal. A generic reader 
that reads data from any SQL RDBMS through JDBC or another protocol would be 
the ideal solution, since the cube could be built without an ETL process.
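
As a rough illustration of the query-generation step mentioned in #2, the 
sketch below builds one SELECT that joins a fact table to its lookup tables. 
It is not Kylin's actual code: the Join class, build_flat_query, and the 
sales/region tables are hypothetical names chosen for the example, and a real 
reader would also need per-database quoting and dialect handling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Join:
    """Hypothetical description of one fact-to-lookup join."""
    table: str       # lookup (dimension) table
    fact_key: str    # join column on the fact table
    lookup_key: str  # join column on the lookup table


def build_flat_query(fact, columns, joins):
    """Return one SELECT that joins the fact table to its lookup tables."""
    if not columns:
        raise ValueError("at least one column is required")
    lines = ["SELECT " + ", ".join(columns), f"FROM {fact}"]
    for j in joins:
        lines.append(
            f"INNER JOIN {j.table} "
            f"ON {fact}.{j.fact_key} = {j.table}.{j.lookup_key}"
        )
    return "\n".join(lines)


# Illustrative star schema: one fact table and one lookup table.
QUERY = build_flat_query(
    "sales",
    ["sales.amount", "region.name"],
    [Join("region", "region_id", "id")],
)
print(QUERY)
```

The same generated text could in principle be sent to Hive or to an RDBMS, 
which is why the read step looks like a natural place for a generic reader.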

Scope:
Only read data directly from an existing RDBMS and store the joined result in 
Hive (temp table) for further processing, with no other transformation. 
By design, Kylin is an OLAP system, not an ETL one.
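
A minimal sketch of that scope, under loud assumptions: Python's DB-API 
stands in for JDBC, an in-memory SQLite database stands in for the source 
RDBMS, and a tab-delimited text buffer stands in for the Hive temp table. 
copy_query_result is a hypothetical helper, not a Kylin API. It runs the join 
on the source side and copies the rows across in batches without transforming 
them.

```python
import csv
import io
import sqlite3


def copy_query_result(conn, query, sink, batch_size=1000):
    """Stream the rows of `query` into `sink` as tab-delimited text."""
    writer = csv.writer(sink, delimiter="\t", lineterminator="\n")
    cursor = conn.cursor()
    cursor.execute(query)
    total = 0
    while True:
        # fetchmany keeps memory bounded for large result sets
        rows = cursor.fetchmany(batch_size)
        if not rows:
            break
        writer.writerows(rows)  # rows are written as read, no transformation
        total += len(rows)
    cursor.close()
    return total


# Stand-in source database (assumption: SQLite instead of Oracle).
source = sqlite3.connect(":memory:")
source.executescript("""
    CREATE TABLE region (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE sales (region_id INTEGER, amount INTEGER);
    INSERT INTO region VALUES (1, 'east'), (2, 'west');
    INSERT INTO sales VALUES (1, 10), (2, 20), (1, 5);
""")

staged = io.StringIO()  # stand-in for the Hive temp table's data file
copied = copy_query_result(
    source,
    "SELECT sales.amount, region.name FROM sales "
    "INNER JOIN region ON sales.region_id = region.id "
    "ORDER BY sales.rowid",
    staged,
    batch_size=2,
)
source.close()
```

Batching is the main design choice here: the reader never holds the full 
result in memory, which matters when the source is a large warehouse table.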

  was:
One requirement from Candor:
"We have large Oracle based OLAP data warehouse (star schema), and I was
wondering if it is possible to connect Kylin to our existing data warehouse
instead of using Hive."

There are two options:
     #1, copy the data to Hive and then build the cube through Kylin. Some 
users already run this model, bringing data from the DW into Hive, and use 
Kylin very well. 
     #2, rewrite the data read module to pull data from Oracle directly. The 
first step of a cube build already generates a Hive query that reads the data 
and writes one temp table in Hive, so this should not be too complicated (but 
it depends on the network and other factors; otherwise, #1 will be the more 
efficient option). The cube build then proceeds as normal.


From Jaya, CTO of 
"We currently have few different data warehouses (Oracle, Postgres, Greenplum) 
and I totally agree with you that a generic jdbc data read module is the right 
way to go."


> Generic Data Reader
> -------------------
>
>                 Key: KYLIN-745
>                 URL: https://issues.apache.org/jira/browse/KYLIN-745
>             Project: Kylin
>          Issue Type: New Feature
>          Components: Job Engine, Spark Engine
>            Reporter: Luke Han
>            Assignee: ZhouQianhao



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
