Hi Kylin Community,

I had a discussion with Shaofeng (@Shaofengshi) on JIRA KYLIN-1351 about making Kylin support RDBMS as a data source, but we would like more input from the community on how important and urgent this feature is. Please respond with your suggestions if you need this feature or are interested in developing it.
Although Kylin already supports pluggable data sources, RDBMS support is not trivial, because we need to take care of the following problems:

1. Independent dictionary, especially for data type mapping. Hive's data type system differs from that of an RDBMS. Today the Kylin dictionary infers column types from the Hive schema, so we need to make sure the dictionary is independent of the data source, so that an RDBMS schema can be stored in the Kylin dictionary.
2. Pipeline. Do we import data from the RDBMS into Hive, or read it directly from the RDBMS? If the destination is Hive, we may be able to reuse the current Hive MR cubing job, but we need to handle the RDBMS-to-Hive conversion. If Kylin reads directly from the RDBMS, we need to write a new MR or Spark job.
3. Consistency. An RDBMS normally supports inserts, updates, and deletes. How does Kylin handle that?
4. Continuous reading. Do we require that the RDBMS fact table always have a timestamp column that Kylin can use to read records continuously?
5. Cube modeling. Is the current cube modeling feature independent enough to support RDBMS modeling?
6. Sharding. An RDBMS can normally handle complicated join queries across multiple tables itself, so the reason to use Kylin here is probably that the source table is sharded into many child tables, and Kylin can query across all the shards once the data has been imported into Kylin.

Thanks,
Edward
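To make point 4 more concrete, here is a minimal sketch of timestamp-based continuous reading, where each cube segment pulls rows from a half-open time window [start, end) so that a row sitting exactly on a boundary is read by exactly one segment. It uses Python's built-in sqlite3 as a stand-in RDBMS; the table name `sales_fact`, the column `updated_at`, and the helper functions are hypothetical illustrations, not part of Kylin.

```python
import sqlite3


def build_incremental_query(table: str, ts_column: str) -> str:
    """Build a parameterized query reading rows in a half-open [start, end) window.

    Identifiers cannot be bound as SQL parameters, so table/column names are
    interpolated directly; only trusted, validated names should be passed in.
    """
    return (
        f"SELECT * FROM {table} "
        f"WHERE {ts_column} >= ? AND {ts_column} < ? "
        f"ORDER BY {ts_column}"
    )


def read_segment(conn: sqlite3.Connection, table: str, ts_column: str,
                 start: str, end: str) -> list:
    """Read the rows for one segment; adjacent windows never overlap."""
    sql = build_incremental_query(table, ts_column)
    return conn.execute(sql, (start, end)).fetchall()


if __name__ == "__main__":
    # In-memory database standing in for the source RDBMS (hypothetical schema).
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales_fact (id INTEGER, amount REAL, updated_at TEXT)")
    conn.executemany(
        "INSERT INTO sales_fact VALUES (?, ?, ?)",
        [(1, 10.0, "2016-01-01 00:00:00"),
         (2, 20.0, "2016-01-01 12:00:00"),
         (3, 30.0, "2016-01-02 00:00:00")],
    )
    day1 = read_segment(conn, "sales_fact", "updated_at",
                        "2016-01-01 00:00:00", "2016-01-02 00:00:00")
    day2 = read_segment(conn, "sales_fact", "updated_at",
                        "2016-01-02 00:00:00", "2016-01-03 00:00:00")
    print([r[0] for r in day1], [r[0] for r in day2])  # [1, 2] [3]
```

Note that a timestamp window like this only picks up inserts (and updates, if the application bumps the timestamp column); deleted rows leave no trace in it, which is exactly the gap raised in point 3.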