Re: How to support Avro Complex Type on Kylin

Luke Han Tue, 01 Dec 2015 01:18:08 -0800

A Hive table with a star schema is the protocol between Kylin and the
upstream ETL process.


A key success factor of a DW/BI project is keeping the data model as
stable as possible. If we offer users an "ignore" option, they will
take it and never look back.

But they actually need that data as a dimension or measure, and once
they notice this, they will either report it as a Kylin defect or keep
asking us to support such data types.

Also, as a Kylin design principle, the Hive metadata is exposed exactly
as it is to analysts and end users for interactive ANSI SQL queries. If
columns are missing, every owner of a Kylin platform, and even the Kylin
community here, has to "convince" users that such data types are not
supported yet.
But if we stop them at the very beginning, there is a chance to
"educate" users to accept the protocol we use today.
Such a stop also leads users to refine their ETL process or create a
view before they really start working in Kylin, which is better than
letting them refactor all the ETL, Hive tables, Kylin cubes, and even
reports later, after much effort has already been put into them.

I hope this makes it clearer why we should let users fix the data model
themselves rather than do the job "automatically" for them :-)

Thanks.


Best Regards!
---------------------

Luke Han

On Tue, Dec 1, 2015 at 4:53 PM, Xiaoyu Wang <[email protected]> wrote:

> We can show the user the unsupported columns and let them choose to
> ignore or cancel.
>
>
> On 2015-12-01 16:16, Han, Luke wrote:
>
>> No, ignoring is not the right approach.
>>
>> Stopping the user and having them refine the data model is the right
>> way; otherwise how can you explain those missing columns to the user?
>>
>> We could show a notification when a user syncs metadata with such
>> data types to Kylin.
>>
>> Thanks.
>>
>> Sent from my iPhone
>>
>> On 2015-12-01 at 16:13, Xiaoyu Wang <[email protected]> wrote:
>>>
>>> Yes, agreed. The JIRA: https://issues.apache.org/jira/browse/KYLIN-1111
>>> I will try to do it and submit a patch.
>>>
>>> On 2015-12-01 16:07, Shi, Shaofeng wrote:
>>>> Kylin should automatically skip these complex columns instead of
>>>> blocking the user from importing the table. What do you think?
>>>>
>>>> On 12/1/15, 3:32 PM, "Xiaoyu Wang" <[email protected]> wrote:
>>>>>
>>>>> Yes. You can create a Hive view that leaves out the array and map
>>>>> columns.
>>>>>
>>>>> On 2015-12-01 15:26, Yiming Liu wrote:
>>>>>> Thanks Xiaoyu, for the quick response.
>>>>>>
>>>>>>
>>>>>> Currently, there is no way to remove those fields. The error
>>>>>> happens on the first step, "Sync Hive tables", when designing a cube.
>>>>>>
>>>>>>
>>>>>> I will redesign my original tables to fit the datatype requirement.
>>>>>>
>>>>>>
>>>>>> ------------------ Original ------------------
>>>>>> From:  "Xiaoyu Wang";<[email protected]>;
>>>>>> Date:  Tue, Dec 1, 2015 03:20 PM
>>>>>> To:  "dev"<[email protected]>;
>>>>>>
>>>>>> Subject:  Re: How to support Avro Complex Type on Kylin
>>>>>>
>>>>>>
>>>>>>
>>>>>> Kylin does not support data types like "array" and "map".
>>>>>> Array and map columns can't be set as dimensions.
>>>>>> You can remove the array and map columns from the cube design and retry.
>>>>>>
>>>>>> On 2015-12-01 15:05, Yiming Liu wrote:
>>>>>>> Hi Kylin expert,
>>>>>>>
>>>>>>> I have a table with Avro encoding. It has map and array field
>>>>>>> types. I can query the table in Hive.
>>>>>>>
>>>>>>> When I sync the table into Kylin, Kylin says:
>>>>>>> "bad data type -- array<string>, does not match
>>>>>>> (any|char|varchar|boolean|binary|integer|tinyint|smallint|bigint|decimal|numeric|float|real|double|date|time|datetime|timestamp|byte|int|short|long|string|hllc|_literal_type|_derived_type)\s*(?:[(]([\d\s,]+)[)])?"
>>>>>>>
>>>>>>> So it seems Kylin does not support Avro complex types, is that
>>>>>>> right?
>>>>>>> Do you have any suggestions on how to process complex data types?
>>>>>>>
>>>>>>> SerDe Library:    org.apache.hadoop.hive.serde2.avro.AvroSerDe
>>>>>>>
>>>>>>> InputFormat:    org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat
>>>>>>>
>>>>>>> OutputFormat:    org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat
>>>>>>>
>>>>>>> Following is my table schema:
>>>>>>> 0        sessionid    string
>>>>>>> 1        userid    string
>>>>>>> 2        hosts    array<string>
>>>>>>> 3        domain    string
>>>>>>> 4        visittimes    int
>>>>>>> 5        firsttimestamp    bigint
>>>>>>> 6        lasttimestamp    bigint
>>>>>>> 7        sessiontimestamp    bigint
>>>>>>> 8        useragent    map<string,string>
>>>>>>> 9        srcaddrunsignedint    bigint
>>>>>>> 10        srcaddrstr    string
>>>>>>> 11        srcaddrcity    map<string,string>
>>>>>>> 12        srcaddrlocation    map<string,string>
>>>>>>> 13        destaddrunsignedint    bigint
>>>>>>> 14        destaddrstr    string
>>>>>>> 15        destaddrcity    map<string,string>
>>>>>>> 16        destaddrlocation    map<string,string>
>>>>>>> 17        keywords    map<string,array<string>>
>>>>>>> 18        topics    map<string,double>
>>>>>>> 19        cookies    map<string,string>
>>>>>>> 20        urls    array<string>
>>>>>>> 21        year    int
>>>>>>> 22        month    int
>>>>>>> 23        day    int
>>>>>>> 24        hour    int
>>>>>>>
>>>>>>
>
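
To see ahead of a sync which columns would trigger the "bad data type"
error quoted in the thread, the pattern from that error message can be
applied to the posted schema. This is only a rough sketch: the pattern is
copied from the message, but matching the whole type string in lower case
is an assumption about how Kylin applies it, and the helper names
(is_supported, unsupported_columns) are made up for illustration.

```python
import re

# Pattern copied from the error message quoted in the thread.
TYPE_PATTERN = re.compile(
    r"(any|char|varchar|boolean|binary|integer|tinyint|smallint|bigint|decimal"
    r"|numeric|float|real|double|date|time|datetime|timestamp|byte|int|short"
    r"|long|string|hllc|_literal_type|_derived_type)\s*(?:[(]([\d\s,]+)[)])?"
)

# Table schema as posted in the original question: (column name, Hive type).
SCHEMA = [
    ("sessionid", "string"), ("userid", "string"),
    ("hosts", "array<string>"), ("domain", "string"),
    ("visittimes", "int"), ("firsttimestamp", "bigint"),
    ("lasttimestamp", "bigint"), ("sessiontimestamp", "bigint"),
    ("useragent", "map<string,string>"), ("srcaddrunsignedint", "bigint"),
    ("srcaddrstr", "string"), ("srcaddrcity", "map<string,string>"),
    ("srcaddrlocation", "map<string,string>"),
    ("destaddrunsignedint", "bigint"), ("destaddrstr", "string"),
    ("destaddrcity", "map<string,string>"),
    ("destaddrlocation", "map<string,string>"),
    ("keywords", "map<string,array<string>>"),
    ("topics", "map<string,double>"), ("cookies", "map<string,string>"),
    ("urls", "array<string>"), ("year", "int"), ("month", "int"),
    ("day", "int"), ("hour", "int"),
]


def is_supported(hive_type: str) -> bool:
    """Return True if the type string matches the quoted pattern.

    Assumption: the entire string must match, compared in lower case.
    """
    return TYPE_PATTERN.fullmatch(hive_type.strip().lower()) is not None


def unsupported_columns(schema):
    """Return the (name, type) pairs whose type does not match the pattern."""
    return [(name, t) for name, t in schema if not is_supported(t)]
```

Calling unsupported_columns(SCHEMA) lists the array and map columns that
would have to be dropped or flattened before the table is synced.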

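The Hive view workaround suggested in the thread can be sketched as a small
generator for the CREATE VIEW statement. This is an illustration under
stated assumptions, not something taken from Kylin: the table name
"sessions" and view name "sessions_flat" are hypothetical (the thread never
names the table), and complex columns are detected simply by their Hive
type prefix (array, map, struct, uniontype).

```python
# Hive complex types, recognised here by the start of the type string.
COMPLEX_PREFIXES = ("array<", "map<", "struct<", "uniontype<")


def build_flat_view_sql(table, view, schema):
    """Build a HiveQL CREATE VIEW statement that keeps only primitive columns.

    schema is a list of (column name, Hive type) pairs. Table and view
    names are supplied by the caller; the ones used below are hypothetical.
    """
    keep = [
        name
        for name, hive_type in schema
        if not hive_type.strip().lower().startswith(COMPLEX_PREFIXES)
    ]
    if not keep:
        raise ValueError("no primitive columns left to expose")
    # Backticks quote identifiers in HiveQL, which also guards column
    # names such as year or month that may clash with keywords.
    columns = ",\n  ".join(f"`{name}`" for name in keep)
    return f"CREATE VIEW `{view}` AS\nSELECT\n  {columns}\nFROM `{table}`"


# A few columns from the schema posted in the thread.
EXAMPLE_SCHEMA = [
    ("sessionid", "string"),
    ("hosts", "array<string>"),
    ("visittimes", "int"),
    ("useragent", "map<string,string>"),
    ("year", "int"),
]

EXAMPLE_SQL = build_flat_view_sql("sessions", "sessions_flat", EXAMPLE_SCHEMA)
```

The generated statement would be run in Hive, and the view, rather than the
original table, would then be synced into Kylin. Whether values inside the
dropped columns should instead be flattened into separate primitive columns
is a data-modelling decision left to the table owner, in line with the
discussion above.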