Re: On creating and querying cubes over joined tables

Shi, Shaofeng Mon, 06 Apr 2015 03:40:31 -0700

Some comments:

[1] This is a known issue, which was just fixed last Friday. Please
download a new build from https://kylin.incubator.apache.org/download/
Besides, the PK-FK columns don't need to be dimensions: Kylin joins the
tables into one big flat table (using the PK-FK info the user entered)
before building the cube, so you only need to include the columns that
would appear in "GROUP BY" as dimensions.


[2] The interface Kylin exposes externally is standard SQL (the cube is
transparent to BI users), so the metadata looks like Hive tables. At run
time, Kylin parses the SQL and then picks a cube to serve the query. Cube
selection considers the queried tables and columns (whether all of them
are covered by the cube) and the join conditions (left or inner). If more
than one cube can fulfill the query, Kylin picks the most efficient one
(e.g., the one with fewer tables and columns).
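
The selection criteria in [2] can be sketched as a simple filter-then-rank step. This is a hypothetical Python illustration, not Kylin's real matching code: the Cube/Query classes are invented, and requiring the query's joins to exactly equal the cube's joins is a simplifying assumption; Kylin's actual rules may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cube:
    """Hypothetical summary of what a cube covers (not Kylin's metadata model)."""
    name: str
    tables: frozenset
    columns: frozenset
    joins: frozenset  # (fact_table, lookup_table, join_type) triples


@dataclass(frozen=True)
class Query:
    tables: frozenset
    columns: frozenset
    joins: frozenset


def can_serve(cube, query):
    """Assumed rule: cube covers every queried table and column, and the
    query's joins (including join type) match the cube's joins exactly."""
    return (
        query.tables <= cube.tables
        and query.columns <= cube.columns
        and query.joins == cube.joins
    )


def select_cube(cubes, query):
    """Among qualifying cubes, prefer fewer tables, then fewer columns."""
    candidates = [c for c in cubes if can_serve(c, query)]
    if not candidates:
        return None
    return min(candidates, key=lambda c: (len(c.tables), len(c.columns)))


fs = frozenset
inner = fs({("FACT1", "DIM", "inner")})
single = Cube("cube_single", fs({"FACT1"}), fs({"PROVINCE_ID", "AMOUNT"}), fs())
joined = Cube("cube_joined", fs({"FACT1", "DIM"}),
              fs({"PROVINCE_NAME", "AMOUNT"}), inner)
wide = Cube("cube_wide", fs({"FACT1", "DIM"}),
            fs({"PROVINCE_NAME", "CITY", "AMOUNT"}), inner)
cubes = [wide, joined, single]

by_province = Query(fs({"FACT1", "DIM"}), fs({"PROVINCE_NAME", "AMOUNT"}), inner)
left_join = Query(fs({"FACT1", "DIM"}), fs({"PROVINCE_NAME", "AMOUNT"}),
                  fs({("FACT1", "DIM", "left")}))
fact_only = Query(fs({"FACT1"}), fs({"AMOUNT"}), fs())

print(select_cube(cubes, by_province).name)  # cube_joined
print(select_cube(cubes, left_join))         # None
print(select_cube(cubes, fact_only).name)    # cube_single
```

Both cube_wide and cube_joined can answer by_province here; the ranking key prefers cube_joined because it covers fewer columns.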

[3] I think 20 minutes is a reasonable time, as the build involves several
steps on Hive, MapReduce, HBase, etc. If you want to reduce the time, you
need to do some analysis to figure out which steps can be improved.
Besides, we're developing a new cubing algorithm
(https://issues.apache.org/jira/browse/KYLIN-607) aimed at reducing the
build time for large cubes (e.g., more than 10 dimensions). So far we
don't have comparison data and can't promise it will help your case;
please watch our mailing list, where we will announce when it is ready.
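
Why cubes with many dimensions are expensive to build can be seen by counting cuboids: a full cube precomputes one aggregate per subset of its dimensions, so the count grows as 2^n. The Python sketch below only counts them; it assumes a full cube with no pruning of cuboids, and the dimension names are hypothetical.

```python
from itertools import combinations


def cuboids(dimensions):
    """Every cuboid of a full cube: one per subset of the dimensions,
    including the empty subset (the grand total)."""
    result = []
    for size in range(len(dimensions) + 1):
        result.extend(combinations(dimensions, size))
    return result


# Hypothetical 3-dimension cube, the same size as in Question 3.
print(len(cuboids(["PROVINCE_NAME", "CITY", "DAY"])))  # 8

# The count doubles with every added dimension (2 ** n).
for n in (3, 10, 15):
    print(n, len(cuboids(list(range(n)))))
# prints: 3 8, then 10 1024, then 15 32768
```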

Thanks for your interest in Kylin.

On 4/5/15, 8:30 AM, "Li Yang" <[email protected]> wrote:

>[Translation]
>
>2015-03-30 19:41 GMT-07:00 wanghaifei <[email protected]>:
>
>Hello,
>      I ran into a few problems while testing cubes.
>
>      Here is an example: a fact table and a dimension table.
>
>       They are joined as follows: [here info missing]
>
>       The cube dimensions are as follows: [here info missing]
>
>
> ---------------------------------------
>   Question 1:
>         An error occurs at the last step, when saving the cube: Cannot find rowkey column PROVINCE_ID in cube.
>          But if I remove the dimension table's ID from the dimensions, the cube is created successfully.
>          But then the created cube won't be able to serve join queries (my personal opinion).
>          I'm not sure whether the problem is in creating the cube or in how the SQL is written.
>
>  Question 2:
>         The built cube is shown with the Hive table name [here info missing], not the cube name (the cube name only shows up at query time). Also, are Chinese column comments not supported?
>         If there is more than one cube using the same fact table under one project, will one overwrite the other? (Cubes under different projects don't have this problem.)
>
>  Question 3:
>         It took 20 minutes to build a cube with 3 GB of input, 3 dimensions and 2 metrics. How can I improve the build speed?
>
>
>
>2015-03-31 14:38 GMT+08:00 Ted Dunning <[email protected]>:
>
>> The standard language of Apache mailing lists is English.  I do know that a
>> lot of contributors speak Chinese, however, so could somebody contribute a
>> translation in addition to answering this question?
>>
>>
>>
>> 2015-03-30 19:41 GMT-07:00 wanghaifei <[email protected]>:
>>
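>> >      Hello,
>> >       I ran into a few problems while testing cubes.
>> >
>> >       Here is an example: there is a fact table fact1 and a lookup table dim, and I join them.
>> >
>> >        I set up the join as follows:
>> >
>> >        The dimensions I defined are as follows:
>> >
>> >
>> >  ---------------------------------------
>> >    Question 1:
>> >          At the last step, when saving the cube, this error occurs: Cannot find rowkey column PROVINCE_ID in cube.
>> >           However, if I remove the lookup table's province ID when defining the dimensions, the cube is created successfully.
>> >           But then the cube cannot do joins at query time; it can only do single-table queries (my personal opinion).
>> >            I'm not sure whether the problem is in how the cube was built or in how the query SQL is written.
>> >
>> >   Question 2:
>> >          After the cube finishes building, the tables shown use the table names from the Hive source tables rather than a specific cube name (the corresponding cube only appears when a query is executed). Also, are Chinese column comments not supported?
>> >          If I create two cubes under the same project and both use the same Hive source tables, will one cube be overwritten by the other? (Different projects don't have this problem.)
>> >
>> >   Question 3:
>> >         With 3 GB of data, 3 dimensions and 2 measures, building the cube on 3 nodes takes nearly 20 minutes. How can I make it run faster?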
>> > ------------------------------
>> >
>>