[ 
https://issues.apache.org/jira/browse/KYLIN-1836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Calaba updated KYLIN-1836:
----------------------------------
    Summary: Kylin 1.5+ New Aggregation Group - UI improvements  (was: Kylin 
1.5+ New Aggregation Group - UI improvement)

> Kylin 1.5+ New Aggregation Group - UI improvements
> --------------------------------------------------
>
>                 Key: KYLIN-1836
>                 URL: https://issues.apache.org/jira/browse/KYLIN-1836
>             Project: Kylin
>          Issue Type: Improvement
>    Affects Versions: v1.5.0, v1.5.1, v1.5.2, v1.5.3, v1.5.2.1
>            Reporter: Richard Calaba
>
> After reading the Tech Blog - 
> https://kylin.apache.org/blog/2016/02/18/new-aggregation-group/ from Hongbin 
> Ma - I got a few ideas, listed below, to help Cube designers understand the 
> impact of their cube design on Build and Query performance.
> BTW: thank you for putting this Blog together, and thank you for 
> referencing it from the Kylin UI (the link in the Aggregation Groups 
> section). It is a very powerful optimization technique.
> Idea 1
> =====
>  It would be great if the Advanced Settings section of the UI could calculate 
> the exact number of Cuboids defined by every Aggregation Group (# of 
> combinations and # of pruned combinations, based on Hierarchy/Joint and 
> Mandatory Dimensions) and then also show the overall total of Cuboids 
> considering ALL the defined Aggregation Groups.
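
To make Idea 1 concrete, here is a minimal sketch of the kind of count the UI could show for one Aggregation Group. It follows the rules as described in the blog post (a hierarchy of n levels allows n+1 choices, a joint group is either fully present or absent, mandatory dimensions are always present). The function name `cuboid_counts` and its arguments are hypothetical, it assumes a cuboid holding only the mandatory dimensions counts as valid, and it is not how Kylin itself computes the number.

```python
from math import prod

def cuboid_counts(normal, hierarchies, joints, mandatory):
    """Return (unpruned, pruned) cuboid counts for one aggregation group.

    normal      -- list of dimensions with no rule attached
    hierarchies -- list of hierarchies, each a list of dimensions
    joints      -- list of joint groups, each a list of dimensions
    mandatory   -- list of dimensions that must appear in every cuboid
    """
    n_dims = (len(normal) + len(mandatory)
              + sum(len(h) for h in hierarchies)
              + sum(len(j) for j in joints))
    # Without any rule: every non-empty subset of the dimensions.
    unpruned = 2 ** n_dims - 1

    # With rules: each normal dimension is in or out, each hierarchy of
    # n levels offers n + 1 prefixes (including "none"), each joint group
    # is in or out as a whole, and mandatory dimensions add no choice.
    pruned = (2 ** len(normal)
              * prod(len(h) + 1 for h in hierarchies)
              * 2 ** len(joints))
    # With no mandatory dimension, the empty selection is not a cuboid.
    if not mandatory:
        pruned -= 1
    return unpruned, pruned

unpruned, pruned = cuboid_counts(
    normal=["a", "b"],
    hierarchies=[["country", "state", "city"]],
    joints=[["x", "y"]],
    mandatory=["dt"],
)
print(unpruned, pruned)
```

Summing the pruned counts of several groups only gives an upper bound on the overall total, because the same cuboid can be defined by more than one group.
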
> Idea 2
> =====
> The Aggregation Group section is about optimizing the # of necessary cuboids, 
> assuming you know the query patterns. This is sometimes easy, but for more 
> complex dashboards where multiple people work on defining the queries it is 
> hard to control and guess. Thus I would suggest adding a new tab in the 
> Monitor section of the Kylin UI, next to Job and Slow Queries: a 
> "Non-satisfied Queries" tab showing the queries which Kylin was not able to 
> evaluate, i.e. queries which end with a "No Realization" exception. 
> Together with the Query SQL (including all the parameters), it would help to 
> show the "missing dimension name" used in the query which was the cause of 
> not finding a proper Cuboid.
> Idea 3
> =====
> Can anyone also document the Rowkeys section in the same part of the UI 
> (Advanced Settings)? It is not really clear what effect it will have if I 
> start playing with the Rowkeys section (adding/removing dimension fields, 
> adding non-dimension fields, ...). All I understand is that the "Rowkeys" 
> section has impact only on the HBase storage of calculated cuboids. Thus it 
> doesn't have much impact on Cube Build time (except that the Trie for the 
> dictionary needs to be built for every rowkey specified on this tab). I 
> understand that the major impact of the Rowkeys section is thus only on HBase 
> size / region splits and thus also on the Query execution time. 
> What I am confused about is whether I can define a high-cardinality dimension 
> in the Cube and remove it from the Rowkeys section. What would happen to the 
> HBase storage and the expected Query time? Would that dimension still be 
> query-enabled?
> The closest explanation I found is this reply from Yu Feng: 
> http://apache-kylin.74782.x6.nabble.com/Relationship-between-rowkey-column-length-and-cube-size-td3174.html
> ==========================================================
> Reply: Cube size determines how to split regions for the table in HBase 
> after generating all cuboid files. For example, if the total size of your 
> cuboid files is 100GB, your cube size is set to "SMALL", and the property 
> for SMALL is 10GB, Kylin will create the HBase table with 10 regions. It 
> will calculate the start rowkey and end rowkey of every region before 
> creating the HTable, then create the table with that split information. 
> Rowkey column length is another thing. You can choose either to use a 
> dictionary or to set a rowkey column length for every dimension. If you use 
> a dictionary, Kylin will build a dictionary for this column (Trie tree), 
> which means every value of the dimension will be encoded as a unique 
> number. Because the dimension value is part of the HBase rowkey, the 
> dictionary reduces the HBase table size. However, Kylin stores the 
> dictionary in memory, so if the dimension cardinality is large, this 
> becomes a problem. If you set the rowkey column length to N for one 
> dimension, Kylin will not build a dictionary for it, and every value will 
> be cut to an N-length string, so there is no dictionary in memory, but the 
> rowkey in the HBase table will be longer. 
> ==========================================================
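
The arithmetic in the quoted reply can be sketched as follows. This is only an illustration of the trade-off it describes, not Kylin code: the helper names `region_count` and `dict_id_bytes` are hypothetical, and the sketch assumes a dictionary id occupies the fewest whole bytes that can hold every distinct value.

```python
import math

def region_count(total_cuboid_gb, region_cut_gb):
    """Number of HBase regions if the table is split every region_cut_gb."""
    return max(1, math.ceil(total_cuboid_gb / region_cut_gb))

def dict_id_bytes(cardinality):
    """Bytes needed for ids 0..cardinality-1 (assumed minimal encoding)."""
    bits = max(cardinality - 1, 1).bit_length()
    return (bits + 7) // 8

# The example from the reply: 100GB of cuboid files, 10GB per region.
print(region_count(100, 10))

# One dimension with a million distinct values: dictionary id width
# versus a fixed rowkey column length of N = 20 bytes.
print(dict_id_bytes(1_000_000), "bytes per row with a dictionary")
print(20, "bytes per row with a fixed length of 20")
```

The shorter rowkey is paid for with the in-memory dictionary, which is why the reply warns against dictionaries for very high-cardinality dimensions.
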



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
