Some comments:

[1] This is a known issue that was fixed last Friday. Please download a new build from https://kylin.incubator.apache.org/download/

Besides, the PK-FK columns don't need to be dimensions. Before building the cube, Kylin joins the tables into one big flat table, using the PK-FK information the user entered. So you only need to include as dimensions the columns that will appear in "group by".
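To illustrate point [1], here is a minimal sketch using SQLite from the Python standard library. It is not Kylin code and does not reflect how Kylin implements the flat table internally. It only shows that the PK-FK column is needed for the join, while the "group by" column is what acts as the dimension. The table names fact1 and dim come from the original question; the columns province_name and amount, and the sample rows, are assumptions made up for the example.

```python
import sqlite3


def build_flat_table(conn):
    """Join the fact table with the lookup table into one flat table.

    province_id (the PK-FK pair) is used only in the join condition and
    is not carried into the flat table as a dimension column.
    """
    conn.executescript(
        """
        CREATE TABLE fact1 (province_id INTEGER, amount REAL);
        CREATE TABLE dim (province_id INTEGER PRIMARY KEY, province_name TEXT);
        -- Hypothetical sample data
        INSERT INTO dim VALUES (1, 'Zhejiang'), (2, 'Guangdong');
        INSERT INTO fact1 VALUES (1, 10.0), (1, 5.0), (2, 7.0);
        CREATE TABLE flat AS
            SELECT d.province_name AS province_name, f.amount AS amount
            FROM fact1 f
            JOIN dim d ON f.province_id = d.province_id;
        """
    )


def aggregate_by_dimension(conn):
    """Aggregate the measure over the only dimension in the flat table."""
    return conn.execute(
        "SELECT province_name, SUM(amount) FROM flat "
        "GROUP BY province_name ORDER BY province_name"
    ).fetchall()


conn = sqlite3.connect(":memory:")
build_flat_table(conn)
result = aggregate_by_dimension(conn)
flat_columns = [row[1] for row in conn.execute("PRAGMA table_info(flat)")]
print(result)
print(flat_columns)
```

In this sketch the flat table ends up with only province_name and amount, yet a query grouped by province_name can still be answered, because the join was already done when the flat table was built.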
[2] The interface that Kylin exposes externally is standard SQL (the cube is transparent to BI users), so the metadata looks similar to Hive tables. At run time, Kylin parses the SQL and then picks a cube to serve the query. The cube selection considers the query's tables and columns (whether all of them are covered by the cube) and the join conditions (left, inner). If more than one cube can fulfill the query, Kylin picks the most efficient one (e.g., the one with fewer tables and columns).

[3] I think 20 minutes is a reasonable time, since the build involves several steps across Hive, MapReduce, HBase, etc. If you want to reduce the time, you need to do some analysis to figure out which step can be improved. Besides, we're developing a new cubing algorithm (https://issues.apache.org/jira/browse/KYLIN-607), aiming to reduce the build time for large cubes (e.g., more than 10 dimensions). So far we don't have comparison data and can't promise that it will help in your case. Please watch our mailing list; we will announce it when it is ready.

Thanks for your interest in Kylin.

On 4/5/15, 8:30 AM, "Li Yang" wrote:

>[Translation]
>
>2015-03-30 19:41 GMT-07:00 wanghaifei:
>
>Hello,
>  I tested a cube and ran into a few problems.
>
>  Here is an example: a fact table and a dimension table.
>
>  They are joined as below: [here info missing]
>
>  The cube dimensions are as below: [here info missing]
>
>---------------------------------------
>Question 1:
>  An error occurs at the last step, when saving the cube: Cannot find
>  rowkey column PROVINCE_ID in cube.
>  But if I remove the id of the dimension table, the cube is created
>  successfully.
>  But then the created cube won't be able to serve join queries (in my
>  personal opinion).
>  I'm not sure whether the problem is in creating the cube or in how the
>  query SQL is written?
>
>Question 2:
>  After a successful build, what is shown is the Hive table name
>  [here info missing], not the cube name (the cube name only shows up at
>  query time). And are Chinese column comments not supported?
>  If there is more than one cube using the same fact table under one
>  project, will one cube overwrite the other? (This problem does not
>  occur across different projects.)
>
>Question 3:
>  It took nearly 20 minutes to build a cube with 3 GB of input,
>  3 dimensions and 2 metrics on 3 nodes. How can the build speed be
>  improved?
>
>
>2015-03-31 14:38 GMT+08:00 Ted Dunning:
>
>> The standard language of Apache mailing lists is English. I do know that a
>> lot of contributors speak Chinese, however, so could somebody contribute a
>> translation in addition to answering this question?
>>
>> 2015-03-30 19:41 GMT-07:00 wanghaifei:
>>
>> > 你好,
>> >     我在测试cube时遇到几个问题。
>> >
>> >     这里有个例子: 有个事实表 fact1和对照表 dim, 现对他们进行关联。
>> >
>> >     建立了如下关联关系:
>> >
>> >     建立的维度如下:
>> >
>> > ---------------------------------------
>> > 问题1:
>> >     在最后一步保存cube时出现这个错误: Cannot find rowkey column PROVINCE_ID in cube。
>> >     但是,我在建立维度时去掉对照表的 省id, cube就创建成功。
>> >     但,这会造成无法在cube查询时 ,进行关联操作,只能进行单表查询(个人观点)。
>> >     不知道是不是建cube出问题, 还是查询sql 写法的问题?
>> >
>> > 问题2:
>> >     这里成功运行完cube以后出来的表示 用的的hive元表中的表名,而非特定的
>> >     cube名称(只有在执行查询时,才会出现对应的cube), 且列不支持中文注释?
>> >     如果在同一个project下创建俩cube, 且这俩cube用到的hive元表都是一样的,
>> >     这就会出现一个cube会被另一cube覆盖?( 不同的project不存在此类问题)
>> >
>> > 问题3:
>> >     3G数据量 3个维度、2指标,3节点跑cube需要接近20分钟, 如何提高执行效率?
>> >
>> > ------------------------------
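As a rough illustration of the cube selection described in point [2] of the reply at the top, the sketch below filters cubes by table coverage, column coverage and join type, then prefers the candidate with the fewest tables and columns. This is not Kylin's actual code. The Cube and Query classes, the exact-match rule for join types, and the tie-breaking order are all assumptions made for the example, as are the cube, table and column names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cube:
    name: str
    tables: frozenset
    columns: frozenset
    join_type: str  # "inner" or "left"


@dataclass(frozen=True)
class Query:
    tables: frozenset
    columns: frozenset
    join_type: str  # "inner" or "left"


def can_serve(cube, query):
    """A cube qualifies if it covers every table and column of the query
    and (an assumption of this sketch) uses the same join type."""
    return (
        query.tables <= cube.tables
        and query.columns <= cube.columns
        and query.join_type == cube.join_type
    )


def pick_cube(cubes, query):
    """Return the qualifying cube with the fewest tables, then the fewest
    columns, or None if no cube can serve the query."""
    candidates = [c for c in cubes if can_serve(c, query)]
    if not candidates:
        return None
    return min(candidates, key=lambda c: (len(c.tables), len(c.columns)))


# Hypothetical cubes built on the same fact table within one project.
cubes = [
    Cube("wide_cube", frozenset({"fact1", "dim", "dim2"}),
         frozenset({"province_name", "city_name", "amount"}), "inner"),
    Cube("narrow_cube", frozenset({"fact1", "dim"}),
         frozenset({"province_name", "amount"}), "inner"),
    Cube("left_cube", frozenset({"fact1", "dim"}),
         frozenset({"province_name", "amount"}), "left"),
]
query = Query(frozenset({"fact1", "dim"}),
              frozenset({"province_name", "amount"}), "inner")
chosen = pick_cube(cubes, query)
print(chosen.name if chosen else None)
```

Under these assumptions, several cubes on the same fact table coexist and the query is routed to one of them, rather than one cube replacing the other.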
