Hi Kylin Community,

I had a discussion with Shaofeng (@Shaofengshi) on JIRA KYLIN-1351 about
making Kylin support RDBMS as a data source. We would like more input from
the community on how important and urgent this feature is. Please respond
with your suggestions if you need this feature or are interested in
developing it.

Although Kylin today supports pluggable data sources, this RDBMS feature is
not trivial, because we need to take care of the following problems.

1. Independent dictionary, especially for data type mapping.
Hive's data type system differs from that of an RDBMS. As far as I know,
the Kylin dictionary infers column types from the Hive schema today, so we
need to make sure the dictionary is independent of the data source, so that
an RDBMS schema can be stored in the Kylin dictionary.
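
To make the type-mapping concern concrete, here is a rough sketch in Python.
The table RDBMS_TO_HIVE and the helper to_hive_type are hypothetical names
made up for illustration; this is not Kylin code, and the fallback to STRING
for unknown types is only an assumption.

```python
# Hypothetical mapping from common RDBMS column types to Hive types.
# Illustration only; this is not Kylin's actual mapping.
RDBMS_TO_HIVE = {
    "TINYINT": "TINYINT",
    "SMALLINT": "SMALLINT",
    "INT": "INT",
    "INTEGER": "INT",
    "BIGINT": "BIGINT",
    "FLOAT": "FLOAT",
    "DOUBLE": "DOUBLE",
    "DECIMAL": "DECIMAL",
    "NUMERIC": "DECIMAL",
    "CHAR": "STRING",
    "VARCHAR": "STRING",
    "TEXT": "STRING",
    "DATE": "DATE",
    "DATETIME": "TIMESTAMP",
    "TIMESTAMP": "TIMESTAMP",
}


def to_hive_type(rdbms_type):
    """Map an RDBMS type such as 'varchar(255)' to a Hive type name."""
    name = rdbms_type.strip().upper()
    # Split "DECIMAL(10,2)" into the base name and the "10,2)" remainder.
    base, _, params = name.partition("(")
    # Assumption: unknown types fall back to STRING.
    hive = RDBMS_TO_HIVE.get(base.strip(), "STRING")
    # Keep precision and scale for decimals; drop length for other types.
    if hive == "DECIMAL" and params:
        return "DECIMAL(" + params
    return hive
```

For example, to_hive_type("decimal(10,2)") keeps the precision and scale,
while to_hive_type("varchar(255)") drops the length and maps to STRING.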

2. Pipeline
Do we import data from the RDBMS into Hive, or read data directly from the
RDBMS?
If the destination is Hive, we may reuse the current Hive MR cubing job, but
we need to take care of the RDBMS-to-Hive conversion.
If Kylin reads data directly from the RDBMS, we need to write a new MR or
Spark job.

3. Consistency
An RDBMS normally supports inserts, updates and deletes. How does Kylin
handle them?

4. Read continuously
Do we require that the RDBMS fact table always has a timestamp field that
Kylin uses to read records continuously?
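
One possible shape for such continuous reading is a timestamp watermark,
sketched below in Python. An in-memory SQLite database stands in for the
RDBMS, and the table fact_sales, the column updated_at and the function
read_new_rows are hypothetical names chosen for the example, not part of
Kylin.

```python
import sqlite3


def read_new_rows(conn, last_ts):
    """Return rows changed after last_ts, plus the new watermark."""
    cur = conn.execute(
        "SELECT id, amount, updated_at FROM fact_sales "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_ts,),
    )
    rows = cur.fetchall()
    # Advance the watermark to the newest timestamp seen, if any.
    new_ts = rows[-1][2] if rows else last_ts
    return rows, new_ts


# SQLite stands in for the source RDBMS in this sketch.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE fact_sales (id INTEGER, amount REAL, updated_at TEXT)"
)
conn.executemany(
    "INSERT INTO fact_sales VALUES (?, ?, ?)",
    [
        (1, 10.0, "2016-01-01 00:00:00"),
        (2, 20.0, "2016-01-02 00:00:00"),
    ],
)

# First read picks up everything after the initial watermark.
batch1, watermark = read_new_rows(conn, "1970-01-01 00:00:00")

# A later insert is picked up by the next read only.
conn.execute("INSERT INTO fact_sales VALUES (3, 30.0, '2016-01-03 00:00:00')")
batch2, watermark = read_new_rows(conn, watermark)
```

A watermark like this only sees inserted rows, and updated rows if the
source maintains updated_at on every update. Deleted rows are invisible to
it, which ties back to the consistency question in point 3.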

5. Cube modeling
Is the current cube modeling feature independent enough to support RDBMS
modeling?

6. Sharding
An RDBMS can normally handle complicated join queries across multiple
tables, so the likely reason to use Kylin here is that the source table is
sharded into many child tables, and Kylin can query across all the shards at
once after the data is imported into Kylin.

Thanks
Edward
