Richard Calaba created KYLIN-1836:
-------------------------------------
Summary: Kylin 1.5+ New Aggregation Group - UI improvement
Key: KYLIN-1836
URL: https://issues.apache.org/jira/browse/KYLIN-1836
Project: Kylin
Issue Type: Improvement
Affects Versions: v1.5.2, v1.5.1, v1.5.0, v1.5.3, v1.5.2.1
Reporter: Richard Calaba
After reading the Tech Blog -
https://kylin.apache.org/blog/2016/02/18/new-aggregation-group/ from Hongbin Ma
- I got a few ideas, listed below, to help Cube designers understand the
impact of their cube design on Build and Query performance.
(BTW: Thank you for putting this Blog together !!! and thank you for referencing
this blog through the Kylin UI - link in the Aggregation Groups section !! - it is
a very powerful optimization technique.)
Idea 1
=====
It would be great if the Advanced Settings section of the UI could calculate the
exact number of Cuboids defined by every Aggregation Group (# of combinations;
# of pruned combinations, based on Hierarchy/Joint and Mandatory Dimensions) and
then also show the overall total of Cuboids considering ALL the defined
Aggregation Groups.
Idea 2
=====
The Aggregation Group section is about optimizing the # of necessary cuboids,
assuming you know the query patterns. This is sometimes easy, but for more
complex dashboards where multiple people work on defining the queries it is
hard to control and guess. Thus I would suggest adding a new tab in the Monitor
section of the Kylin UI, next to Job and Slow Queries: "Non-satisfied
Queries", showing the queries which Kylin was not able to evaluate, i.e.
queries which end with a "No Realization" exception. Together with the query SQL
(including all the parameters) it would help to show the "missing dimension
name" used in the query which was the cause for not finding a proper Cuboid.
Idea 3
=====
Can anyone also document the Rowkeys section in the same part of the UI
(Advanced Settings)? It is not really clear what effect it will have if I start
playing with the Rowkeys section (adding/removing dimension fields; adding
non-dimension fields, ...). All I understand is that the "Rowkeys" section has
impact only on the HBase storage of calculated cuboids. Thus it doesn't have
much impact on Cube Build time (only that a Trie for the dictionary needs to be
built for every specified rowkey) - the major impact is on HBase size / region
splits and thus also on Query time.
What I am confused about, for example, is whether I can define a
high-cardinality dimension in the Cube and remove it from the Rowkeys section.
What would happen to the HBase storage and the expected Query time?
The closest explanation I found is this one, from Yu Feng's reply:
http://apache-kylin.74782.x6.nabble.com/Relationship-between-rowkey-column-length-and-cube-size-td3174.html
==========================================================
Reply: Cube size
determines how to split regions for the table in HBase after generating
all cuboid files. For example, if all of your cuboid files total 100GB,
your cube size is set to "SMALL", and the property for SMALL is 10GB, Kylin
will create the HBase table with 10 regions. It will calculate the start
rowkey and end rowkey of every region before creating the HTable, then create
the table with that split information.
Rowkey column length is another thing. You can choose either to use a dictionary
or to set a rowkey column length for every dimension. If you use a dictionary,
Kylin will build a dictionary for this column (Trie tree), which means every
value of the dimension will be encoded as a unique number. Because the
dimension value is part of the HBase rowkey, the dictionary reduces the HBase
table size. However, Kylin stores the dictionary in memory, so if the
dimension cardinality is large, this becomes a problem. If you set the rowkey
column length to N for one dimension, Kylin will not build a dictionary for
it, and every value will be cut to an N-length string, so there is no dictionary
in memory, but the rowkey in the HBase table will be longer.
==========================================================
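
To make the arithmetic in the reply above concrete, here is a small Python
sketch. It is only an illustration of my reading of the reply, not Kylin's
implementation: the function names are invented, rounding the region count up
is an assumption, the dictionary ID width is a rough estimate from the
cardinality, and zero-padding short values is an assumption (the reply only
says values are cut to N).

```python
import math

def region_count(total_cuboid_gb, region_cut_gb):
    # Assumption: one region per `region_cut_gb` of cuboid data, rounded up.
    return max(1, math.ceil(total_cuboid_gb / region_cut_gb))

def dict_id_bytes(cardinality):
    # Rough estimate: bytes needed to give every distinct value its own
    # integer ID. The real encoded width in Kylin may differ.
    if cardinality <= 1:
        return 1
    return max(1, math.ceil(math.log2(cardinality) / 8))

def fixed_length_value(value, n):
    # Fixed-length encoding: cut the value to n bytes; padding short values
    # with zero bytes is an assumption made for this sketch.
    raw = value.encode("utf-8")[:n]
    return raw.ljust(n, b"\x00")

# The example from the reply: 100GB of cuboids, SMALL = 10GB per region.
print(region_count(100, 10))

# A dimension with 1,000,000 distinct values: bytes per value in the rowkey
# with a dictionary versus with a fixed length of 20.
print(dict_id_bytes(1_000_000), "vs", len(fixed_length_value("some-long-user-id", 20)))
```

The trade-off described in the reply shows up directly: the dictionary keeps the
rowkey short but has to be held in memory, while the fixed length needs no
dictionary but makes every rowkey longer.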