Yes, that is very clear, and I agree with your idea.
This can save a lot of time for everyone!
On 2015-12-01 17:17, Luke Han wrote:
A Hive table with a star schema is the protocol between Kylin and the
upstream ETL process.
A key success factor of a DW/BI project is keeping the data model as
stable as possible. If we prompt users with an "ignore" option, they will
eventually choose it and never look back.
But they may actually need that data as a dimension or measure. Once they
notice this, they will either report a defect against Kylin or keep asking
us to support such data types.
Also, by Kylin's design principle, only the Hive metadata is exposed,
exactly as it is, to analysts and end users for interactive ANSI SQL
queries. If columns are missing, every owner of a Kylin platform, and even
the Kylin community here, has to "convince" users that such data types are
not supported yet.
But if we stop them at the very beginning, there is a chance to "educate"
users to accept the protocol we are using today.
Such a stop will also lead users to refine their ETL process or create a
view before they really start working in Kylin. That is better than
letting them refactor all of the ETL, Hive tables, Kylin cubes, and even
reports later, after much effort has already been put into them.
I hope this makes it clearer why we should let users do something about
the data model rather than doing the job "automatically" for them :-)
Thanks.
Best Regards!
---------------------
Luke Han
On Tue, Dec 1, 2015 at 4:53 PM, Xiaoyu Wang <[email protected]> wrote:
We can show the user the unsupported columns and let the user choose to
ignore them or cancel.
On 2015-12-01 16:16, Han, Luke wrote:
No, ignoring is not the right approach.
Stopping the user and having them refine the data model is the right way;
otherwise, how can you explain those missing columns to the user?
We could show a notification when the user syncs metadata with such data
types into Kylin.
Thanks.
Sent from my iPhone
On 2015-12-01 16:13, Xiaoyu Wang <[email protected]> wrote:
Yes, agreed. The JIRA: https://issues.apache.org/jira/browse/KYLIN-1111
I will try to implement it and submit a patch.
On 2015-12-01 16:07, Shi, Shaofeng wrote:
Kylin should automatically skip these complex columns instead of blocking
the user from importing the table. What do you think?
On 12/1/15, 3:32 PM, "Xiaoyu Wang" <[email protected]> wrote:
Yes. You can create a Hive view that leaves out the array and map columns.
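A minimal sketch of that workaround in Python, which only builds the
CREATE VIEW statement as text. The table name "sessions", the view name
"sessions_flat", and the helper names are hypothetical, and the check for
complex types is a simple prefix test, not Kylin's own logic:

```python
# Hive type prefixes treated as "complex" for this sketch (an assumption).
COMPLEX_PREFIXES = ("array<", "map<", "struct<", "uniontype<")


def is_complex(hive_type):
    """Return True if the Hive type string looks like a complex type."""
    return hive_type.strip().lower().startswith(COMPLEX_PREFIXES)


def build_view_ddl(table, view, columns):
    """Build a CREATE VIEW statement that keeps only non-complex columns.

    columns is a list of (column_name, hive_type) pairs.
    """
    kept = [name for name, hive_type in columns if not is_complex(hive_type)]
    if not kept:
        raise ValueError("no primitive columns left to select")
    select_list = ",\n  ".join("`%s`" % name for name in kept)
    return "CREATE VIEW IF NOT EXISTS `%s` AS\nSELECT\n  %s\nFROM `%s`" % (
        view, select_list, table)


# A few columns taken from the schema in this thread.
sample_columns = [
    ("sessionid", "string"),
    ("hosts", "array<string>"),
    ("visittimes", "int"),
    ("useragent", "map<string,string>"),
]

# "sessions" and "sessions_flat" are hypothetical names.
print(build_view_ddl("sessions", "sessions_flat", sample_columns))
```

The generated statement can be run in Hive, and the view, rather than the
original table, is then the one to sync into Kylin.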
On 2015-12-01 15:26, Yiming Liu wrote:
Thanks, Xiaoyu, for the quick response.
Currently, there is no way to remove those fields. The error happens in
the first step, "Sync Hive tables", when designing a cube.
I will redesign my original tables to fit the data type requirements.
------------------ Original ------------------
From: "Xiaoyu Wang";<[email protected]>;
Date: Tue, Dec 1, 2015 03:20 PM
To: "dev"<[email protected]>;
Subject: Re: How to support Avro Complex Type on Kylin
Kylin does not support data types such as "array" and "map".
Columns of array or map type cannot be set as dimensions.
You can remove the array and map columns from the cube design and retry.
On 2015-12-01 15:05, Yiming Liu wrote:
Hi Kylin experts,
I have a table stored with Avro encoding. It has map and array field
types. I can query the table in Hive.
When I sync the table into Kylin, Kylin reports:
"bad data type -- array<string>, does not match (any|char|varchar|boolean|binary|integer|tinyint|smallint|bigint|decimal|numeric|float|real|double|date|time|datetime|timestamp|byte|int|short|long|string|hllc|_literal_type|_derived_type)\s*(?:[(]([\d\s,]+)[)])?"
So it seems Kylin does not support Avro complex types; is that right?
Do you have any suggestions on how to handle complex data types?
SerDe Library: org.apache.hadoop.hive.serde2.avro.AvroSerDe
InputFormat: org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat
Following is my table schema:
0 sessionid string
1 userid string
2 hosts array<string>
3 domain string
4 visittimes int
5 firsttimestamp bigint
6 lasttimestamp bigint
7 sessiontimestamp bigint
8 useragent map<string,string>
9 srcaddrunsignedint bigint
10 srcaddrstr string
11 srcaddrcity map<string,string>
12 srcaddrlocation map<string,string>
13 destaddrunsignedint bigint
14 destaddrstr string
15 destaddrcity map<string,string>
16 destaddrlocation map<string,string>
17 keywords map<string,array<string>>
18 topics map<string,double>
19 cookies map<string,string>
20 urls array<string>
21 year int
22 month int
23 day int
24 hour int
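The columns that trigger the error can be found before syncing. A minimal
sketch in Python, using the pattern quoted in the error message and
assuming the whole type string has to match it. The helper names and the
sample subset of columns are only for illustration:

```python
import re

# Pattern copied from the Kylin error message quoted in this thread.
KYLIN_TYPE = re.compile(
    r"(any|char|varchar|boolean|binary|integer|tinyint|smallint|bigint|decimal"
    r"|numeric|float|real|double|date|time|datetime|timestamp|byte|int|short"
    r"|long|string|hllc|_literal_type|_derived_type)\s*(?:[(]([\d\s,]+)[)])?"
)


def is_supported(hive_type):
    """Assumption: a type is accepted only if the entire string matches."""
    return KYLIN_TYPE.fullmatch(hive_type.strip().lower()) is not None


def unsupported_columns(columns):
    """Return the (name, type) pairs whose type does not match the pattern."""
    return [(name, t) for name, t in columns if not is_supported(t)]


# A subset of the schema above.
schema = [
    ("sessionid", "string"),
    ("hosts", "array<string>"),
    ("visittimes", "int"),
    ("firsttimestamp", "bigint"),
    ("keywords", "map<string,array<string>>"),
    ("topics", "map<string,double>"),
]

for name, hive_type in unsupported_columns(schema):
    print("%s: %s" % (name, hive_type))
```

For this subset, the sketch flags hosts, keywords, and topics, which are
the array and map columns.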