[
https://issues.apache.org/jira/browse/KYLIN-1836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Richard Calaba updated KYLIN-1836:
----------------------------------
Summary: Kylin 1.5+ New Aggregation Group - UI improvements (was: Kylin
1.5+ New Aggregation Group - UI improvement)
> Kylin 1.5+ New Aggregation Group - UI improvements
> --------------------------------------------------
>
> Key: KYLIN-1836
> URL: https://issues.apache.org/jira/browse/KYLIN-1836
> Project: Kylin
> Issue Type: Improvement
> Affects Versions: v1.5.0, v1.5.1, v1.5.2, v1.5.3, v1.5.2.1
> Reporter: Richard Calaba
>
> After reading the Tech Blog post
> https://kylin.apache.org/blog/2016/02/18/new-aggregation-group/ by Hongbin
> Ma, I have a few ideas, listed below, that would help Cube designers understand
> the impact of their cube design on Build and Query performance.
> (BTW: thank you for putting this blog post together, and thank you for
> referencing it from the Kylin UI via the link in the Aggregation Groups
> section. It is a very powerful optimization technique.)
> Idea 1
> =====
> It would be great if the Advanced Settings section of the UI could calculate the
> exact number of Cuboids defined by every Aggregation Group (# of combinations
> and # of pruned combinations, based on Hierarchy/Joint and Mandatory
> Dimensions), and then also show the overall total of Cuboids across ALL the
> defined Aggregation Groups.
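To illustrate what such a counter could compute, here is a minimal Python sketch. It is not Kylin code: the function name and the example dimensions are hypothetical, and it assumes the pruning rules as described in the blog post (mandatory dimensions are always present, a hierarchy only allows parent-first prefixes, a joint group is all-or-nothing). It also assumes the empty cuboid is not counted and does not add any base cuboid.

```python
from itertools import product


def group_cuboids(normal, mandatory=(), hierarchies=(), joints=()):
    """Enumerate the cuboids of one aggregation group as frozensets of names."""
    choices = []
    # A normal dimension is either absent or present: 2 options.
    for dim in normal:
        choices.append([(), (dim,)])
    # A hierarchy (ordered parent -> child) only allows prefixes: n + 1 options.
    for hier in hierarchies:
        choices.append([tuple(hier[:i]) for i in range(len(hier) + 1)])
    # A joint group is taken as a whole or not at all: 2 options.
    for joint in joints:
        choices.append([(), tuple(joint)])

    cuboids = set()
    for combo in product(*choices):
        dims = set(mandatory)  # mandatory dimensions appear in every cuboid
        for part in combo:
            dims.update(part)
        if dims:  # assumption: the empty cuboid is not counted
            cuboids.add(frozenset(dims))
    return cuboids


# Hypothetical groups, for illustration only.
g1 = group_cuboids(
    normal=["A", "B"],
    mandatory=["M"],
    hierarchies=[["H1", "H2", "H3"]],
    joints=[["J1", "J2"]],
)
g2 = group_cuboids(normal=["A", "C"], mandatory=["M"])

unpruned_g1 = 2 ** 8  # every subset of the 8 dimensions in g1, incl. the empty one
print("group 1:", unpruned_g1, "combinations,", len(g1), "after pruning")
print("group 2:", len(g2), "cuboids")
# Cuboids shared between groups are counted once in the overall total.
print("overall:", len(g1 | g2), "distinct cuboids")
```

Using sets rather than multiplying the per-rule option counts makes the overall total straightforward, because a cuboid that appears in several groups is deduplicated by the union.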
> Idea 2
> =====
> The Aggregation Group section is about optimizing the # of necessary cuboids,
> assuming you know the query patterns. This is sometimes easy, but for more
> complex dashboards where multiple people work on defining the queries it is
> hard to control and guess. Thus I would suggest adding a new tab in the
> Monitor section of the Kylin UI, next to Job and Slow Queries: a
> "Non-satisfied Queries" tab showing the queries which Kylin was not able to
> evaluate, i.e. queries which end with a "No Realization" exception.
> Together with the query SQL (including all the parameters), it would help to
> show the "missing dimension name" used in the query which was the cause for
> not finding a proper Cuboid.
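One possible way to derive the "missing dimension name" for such a tab is a plain set difference between the columns a failed query uses and the dimensions the cube defines. The Python sketch below is only an illustration under that assumption: the function names and the sample data are hypothetical, it does not reflect how Kylin selects a realization internally, and extracting the column list from the SQL text is assumed to happen elsewhere.

```python
def missing_dimensions(query_columns, cube_dimensions):
    """Return the query columns that the cube does not define as dimensions."""
    return sorted(set(query_columns) - set(cube_dimensions))


def unsatisfied_report(failed_queries, cube_dimensions):
    """Build one report row per failed query: (sql, missing dimension names)."""
    return [
        (sql, missing_dimensions(columns, cube_dimensions))
        for sql, columns in failed_queries
    ]


# Hypothetical cube and failed queries, for illustration only.
cube_dims = ["COUNTRY", "CITY", "YEAR"]
failed = [
    ("SELECT COUNTRY, SEGMENT, COUNT(*) FROM T GROUP BY COUNTRY, SEGMENT",
     ["COUNTRY", "SEGMENT"]),
    ("SELECT YEAR, CHANNEL, REGION FROM T GROUP BY YEAR, CHANNEL, REGION",
     ["YEAR", "CHANNEL", "REGION"]),
]

report = unsatisfied_report(failed, cube_dims)
for sql, missing in report:
    print(missing, "<-", sql)
```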
> Idea 3
> =====
> Can anyone also document the Rowkeys section in the same part of the UI
> (Advanced Settings)? It is not really clear what the effect will be if I
> start playing with the Rowkeys section (adding/removing dimension fields,
> adding non-dimension fields, ...). All I understand is that the "Rowkeys"
> section has an impact only on the HBase storage of calculated cuboids. Thus it
> doesn't have much impact on Cube Build time (except that the Trie for the
> dictionary needs to be built for every rowkey specified on this tab). I
> understand that the major impact of the Rowkeys section is thus only on HBase
> size / region splits and thus also on the Query execution time.
> What I am confused about is whether I can define a high-cardinality dimension
> in the Cube and remove it from the Rowkeys section. What would happen to the
> HBase storage and the expected Query time? Would that dimension still be
> query-enabled?
> The closest explanation I found is this reply from Yu Feng:
> http://apache-kylin.74782.x6.nabble.com/Relationship-between-rowkey-column-length-and-cube-size-td3174.html
> ==========================================================
> Reply: Cube size determines how to split regions for the table in HBase after
> generating all cuboid files. For example, if the total size of your cuboid
> files is 100GB, your cube size is set to "SMALL", and the property for SMALL
> is 10GB, Kylin will create an HBase table with 10 regions. It will calculate
> the start rowkey and end rowkey of every region before creating the HTable,
> then create the table with that split information.
> Rowkey column length is another thing. You can choose either to use a
> dictionary or to set a rowkey column length for every dimension. If you use a
> dictionary, Kylin will build a dictionary (Trie tree) for this column, which
> means every value of the dimension will be encoded as a unique number.
> Because the dimension value is part of the HBase rowkey, the dictionary
> reduces the HBase table size. However, Kylin stores the dictionary in memory,
> so if the dimension cardinality is large this becomes a problem. If you set
> the rowkey column length to N for one dimension, Kylin will not build a
> dictionary for it, and every value will be cut to an N-length string, so
> there is no dictionary in memory but the rowkey in the HBase table will be
> longer.
> ==========================================================
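The two trade-offs in the quoted reply can be sketched in a few lines of Python. This is a toy model, not Kylin's implementation: the function names are hypothetical, a sorted lookup table stands in for the Trie dictionary, and the zero-byte padding of short values is an assumption that the reply does not state.

```python
import math

GB = 1024 ** 3


def region_count(total_cuboid_bytes, region_cut_bytes):
    """Number of regions when the cuboid files are split at a fixed size."""
    return max(1, math.ceil(total_cuboid_bytes / region_cut_bytes))


def build_dictionary(values):
    """Stand-in for the Trie dictionary: one integer ID per distinct value."""
    return {v: i for i, v in enumerate(sorted(set(values)))}


def dict_encode(value, dictionary):
    """Encode a value as its ID, using just enough bytes for the largest ID."""
    width = max(1, ((len(dictionary) - 1).bit_length() + 7) // 8)
    return dictionary[value].to_bytes(width, "big")


def fixed_length_encode(value, n):
    """Cut the value to n bytes; shorter values are zero-padded (assumption)."""
    return value.encode("utf-8")[:n].ljust(n, b"\x00")


# The example from the reply: 100GB of cuboid files, 10GB per region.
regions = region_count(100 * GB, 10 * GB)
print("regions:", regions)

cities = ["Berlin", "Rome", "San Francisco"]
dictionary = build_dictionary(cities)  # held in memory, grows with cardinality
for city in cities:
    print(city, dict_encode(city, dictionary), fixed_length_encode(city, 8))
```

In this model the dictionary-encoded key part takes 1 byte for three distinct values, while the fixed-length part always takes N bytes and loses anything beyond them, which mirrors the memory versus rowkey-length trade-off described in the reply.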