A Hive table with a star schema is the protocol between Kylin and the upstream ETL process.
A key success factor of a DW/BI project is keeping the data model as stable as possible. When we prompt users with an "ignore" option, they will eventually ignore the columns and never look back. But they actually need that data as a dimension or measure, and once they notice this, they will either report a defect in Kylin or keep asking for support for such data types.

And, per Kylin's design principle, only Hive metadata is exposed, exactly as it is, to analysts/end users for interactive ANSI SQL queries. If columns are missing, any owner of a Kylin platform, even the Kylin community here, has to "convince" users that such data types are not supported yet.

But if we stop them at the very beginning, there is a chance to "educate" users to accept the protocol we are using today. Such a stop will also lead users to refine their ETL process or create a view before they really start working in Kylin. That is better than letting them refactor all of the ETL, Hive, Kylin cube, and even reports later, after much effort has already been put into them.

Hope this gives a clearer idea of why we should let users do something about the data model rather than do the job "automatically" for them :-)

Thanks.

Best Regards!
---------------------
Luke Han

On Tue, Dec 1, 2015 at 4:53 PM, Xiaoyu Wang <[email protected]> wrote:

> We can prompt the user about the unsupported columns, and let the user
> choose to ignore or cancel.
>
>
> On 2015-12-01 16:16, Han, Luke wrote:
>
>> No, ignore is not the right approach.
>>
>> Stopping the user so that they refine the data model is the right way;
>> otherwise, how can you explain those missing columns to the user?
>>
>> We could show a notification when a user syncs metadata with such
>> data types to Kylin.
>>
>> Thanks.
>>
>> Sent from my iPhone
>>
>> On Dec 1, 2015, at 16:13, Xiaoyu Wang <[email protected]> wrote:
>>>
>>> Yes, agreed. The JIRA: https://issues.apache.org/jira/browse/KYLIN-1111
>>> I will try to do it and submit a patch.
>>>
>>> On 2015-12-01 16:07, Shi, Shaofeng wrote:
>>>> Kylin should automatically skip these complex columns, instead of
>>>> blocking the user from importing the table. What do you think?
>>>>
>>>> On 12/1/15, 3:32 PM, "Xiaoyu Wang" <[email protected]> wrote:
>>>>>
>>>>> Yes. You can create a Hive view to remove the array and map datatype columns.
>>>>>
>>>>> On 2015-12-01 15:26, Yiming Liu wrote:
>>>>>> Thanks Xiaoyu, for the quick response.
>>>>>>
>>>>>>
>>>>>> Currently, there is no way to remove those fields. The error happens
>>>>>> on the first step, "Sync Hive tables", when designing a cube.
>>>>>>
>>>>>>
>>>>>> I will redesign my original tables to fit the datatype requirement.
>>>>>>
>>>>>>
>>>>>> ------------------ Original ------------------
>>>>>> From: "Xiaoyu Wang";<[email protected]>;
>>>>>> Date: Tue, Dec 1, 2015 03:20 PM
>>>>>> To: "dev"<[email protected]>;
>>>>>>
>>>>>> Subject: Re: How to support Avro Complex Type on Kylin
>>>>>>
>>>>>>
>>>>>>
>>>>>> Kylin does not support datatypes like "array" and "map".
>>>>>> An array or map datatype column can't be set as a dimension.
>>>>>> You can remove the array and map columns from the cube design and retry.
>>>>>>
>>>>>> On 2015-12-01 15:05, Yiming Liu wrote:
>>>>>>> Hi Kylin expert,
>>>>>>>
>>>>>>> I have a table with Avro encoding. It has map and array field types. I
>>>>>>> can query the table in Hive.
>>>>>>>
>>>>>>> When I sync the table into Kylin, Kylin says:
>>>>>>> "bad data type -- array<string>, does not match
>>>>>>> (any|char|varchar|boolean|binary|integer|tinyint|smallint|bigint|decimal|numeric|float|real|double|date|time|datetime|timestamp|byte|int|short|long|string|hllc|_literal_type|_derived_type)\s*(?:[(]([\d\s,]+)[)])?"
>>>>>>>
>>>>>>> So it seems Kylin does not support Avro complex types, is that
>>>>>>> right?
>>>>>>> Do you have any suggestions on how to process complex data types?
>>>>>>>
>>>>>>> SerDe Library: org.apache.hadoop.hive.serde2.avro.AvroSerDe
>>>>>>>
>>>>>>> InputFormat: org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat
>>>>>>>
>>>>>>> OutputFormat: org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat
>>>>>>>
>>>>>>> Following is my table schema:
>>>>>>> 0 sessionid string
>>>>>>> 1 userid string
>>>>>>> 2 hosts array<string>
>>>>>>> 3 domain string
>>>>>>> 4 visittimes int
>>>>>>> 5 firsttimestamp bigint
>>>>>>> 6 lasttimestamp bigint
>>>>>>> 7 sessiontimestamp bigint
>>>>>>> 8 useragent map<string,string>
>>>>>>> 9 srcaddrunsignedint bigint
>>>>>>> 10 srcaddrstr string
>>>>>>> 11 srcaddrcity map<string,string>
>>>>>>> 12 srcaddrlocation map<string,string>
>>>>>>> 13 destaddrunsignedint bigint
>>>>>>> 14 destaddrstr string
>>>>>>> 15 destaddrcity map<string,string>
>>>>>>> 16 destaddrlocation map<string,string>
>>>>>>> 17 keywords map<string,array<string>>
>>>>>>> 18 topics map<string,double>
>>>>>>> 19 cookies map<string,string>
>>>>>>> 20 urls array<string>
>>>>>>> 21 year int
>>>>>>> 22 month int
>>>>>>> 23 day int
>>>>>>> 24 hour int
>>>>>>>
>>>>>>
>
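For reference, the datatype check and the Hive view workaround discussed in this thread can be sketched roughly as follows. This is only an illustration, not Kylin's actual code: the pattern is copied from the error message quoted above and the columns come from the posted schema, but matching the whole type string case-insensitively is an assumption, and the helper functions and the `sessions` / `sessions_flat` table and view names are hypothetical.

```python
import re

# Pattern copied from the Kylin error message quoted in this thread.
# Assumption: the whole type string must match, case-insensitively.
SUPPORTED_TYPE = re.compile(
    r"(any|char|varchar|boolean|binary|integer|tinyint|smallint|bigint|decimal"
    r"|numeric|float|real|double|date|time|datetime|timestamp|byte|int|short"
    r"|long|string|hllc|_literal_type|_derived_type)\s*(?:[(]([\d\s,]+)[)])?",
    re.IGNORECASE,
)

# (column name, Hive type) pairs taken from the schema posted in the thread.
SCHEMA = [
    ("sessionid", "string"), ("userid", "string"), ("hosts", "array<string>"),
    ("domain", "string"), ("visittimes", "int"), ("firsttimestamp", "bigint"),
    ("lasttimestamp", "bigint"), ("sessiontimestamp", "bigint"),
    ("useragent", "map<string,string>"), ("srcaddrunsignedint", "bigint"),
    ("srcaddrstr", "string"), ("srcaddrcity", "map<string,string>"),
    ("srcaddrlocation", "map<string,string>"),
    ("destaddrunsignedint", "bigint"), ("destaddrstr", "string"),
    ("destaddrcity", "map<string,string>"),
    ("destaddrlocation", "map<string,string>"),
    ("keywords", "map<string,array<string>>"), ("topics", "map<string,double>"),
    ("cookies", "map<string,string>"), ("urls", "array<string>"),
    ("year", "int"), ("month", "int"), ("day", "int"), ("hour", "int"),
]


def is_supported(hive_type):
    """Return True if the type string matches the pattern in full."""
    return SUPPORTED_TYPE.fullmatch(hive_type.strip()) is not None


def split_columns(schema):
    """Split column names into (supported, unsupported) lists, keeping order."""
    supported = [name for name, hive_type in schema if is_supported(hive_type)]
    unsupported = [name for name, hive_type in schema if not is_supported(hive_type)]
    return supported, unsupported


def build_view_ddl(view_name, table_name, schema):
    """Build a CREATE VIEW statement that selects only the supported columns."""
    supported, _ = split_columns(schema)
    column_list = ",\n  ".join(supported)
    return f"CREATE VIEW {view_name} AS\nSELECT\n  {column_list}\nFROM {table_name}"


if __name__ == "__main__":
    _, unsupported = split_columns(SCHEMA)
    print("Columns to leave out of the view:", ", ".join(unsupported))
    # Hypothetical view and table names, for illustration only.
    print(build_view_ddl("sessions_flat", "sessions", SCHEMA))
```

The generated statement simply leaves the array and map columns out of the view. If the data in them is needed as a dimension or measure, the view or the ETL process would have to flatten those columns into supported types instead, which is the data model refinement argued for above.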
