Let me try to explain it.

Cube size determines how the HBase table is split into regions after all the
cuboid files are generated. For example, if the total size of your cuboid
files is 100GB and your cube size is set to "SMALL", and the property for
SMALL is 10GB per region, Kylin will create the HBase table with 10 regions.
It calculates the start rowkey and end rowkey of every region before creating
the HTable, then creates the table with those split points.

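To make the arithmetic concrete, here is a minimal sketch (not Kylin's actual
code; the class, method names, and the even key-space split are only
illustrative) of how a per-region size cap turns the total cuboid output size
into a region count and split start keys:

    import java.util.ArrayList;
    import java.util.List;

    // Illustrative sketch: pre-split an HBase table given total cuboid
    // output size and a per-region size cap (e.g. "SMALL" => 10GB/region).
    public class RegionSplitSketch {

        static int regionCount(long totalCuboidBytes, long bytesPerRegion) {
            // Round up so a partial last region still gets its own split.
            return (int) Math.max(1,
                    (totalCuboidBytes + bytesPerRegion - 1) / bytesPerRegion);
        }

        // Hypothetical: divide a numeric rowkey space [0, maxKey) evenly,
        // standing in for the start/end rowkeys Kylin derives from the
        // actual cuboid key distribution before creating the HTable.
        static List<Long> splitStartKeys(long maxKey, int regions) {
            List<Long> starts = new ArrayList<>();
            for (int i = 0; i < regions; i++) {
                starts.add(maxKey / regions * i);
            }
            return starts;
        }

        public static void main(String[] args) {
            long total = 100L * 1024 * 1024 * 1024; // 100GB of cuboid files
            long small = 10L * 1024 * 1024 * 1024;  // "SMALL" => 10GB/region
            int regions = regionCount(total, small);
            System.out.println(regions + " regions"); // prints "10 regions"
            System.out.println(splitStartKeys(1_000_000L, regions));
        }
    }
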
Rowkey column length is another thing. For every dimension you can choose
either to use a dictionary or to set a rowkey column length. If you use a
dictionary, Kylin will build a dictionary for that column (a trie), which
means every value of the dimension is encoded as a unique integer. Because
the dimension value is part of the HBase rowkey, the dictionary reduces the
HBase table size. However, Kylin keeps the dictionary in memory, so if the
dimension cardinality is large, that becomes a problem. If instead you set
the rowkey column length to N for a dimension, Kylin will not build a
dictionary for it, and every value will be cut to an N-length string. There
is no dictionary in memory, but the rowkeys in the HBase table will be longer.

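As a rough illustration of that trade-off, here is a sketch of the two
encodings (again only illustrative; this is not Kylin's dictionary
implementation, which is trie-based, and the class names are made up):

    import java.nio.charset.StandardCharsets;
    import java.util.Arrays;
    import java.util.HashMap;
    import java.util.Map;

    // Sketch of the two rowkey encodings discussed above.
    public class RowkeyEncodingSketch {

        // Dictionary encoding: every distinct value gets a compact integer
        // id, but the whole mapping must be held in memory, which hurts for
        // ultra-high-cardinality dimensions.
        static final Map<String, Integer> dict = new HashMap<>();

        static byte[] dictEncode(String value) {
            int id = dict.computeIfAbsent(value, v -> dict.size());
            return new byte[] { (byte) (id >>> 24), (byte) (id >>> 16),
                                (byte) (id >>> 8), (byte) id }; // 4 bytes
        }

        // Fixed-length encoding: no dictionary in memory, but the rowkey
        // carries N raw bytes per value (truncated or zero-padded).
        static byte[] fixedLengthEncode(String value, int n) {
            byte[] raw = value.getBytes(StandardCharsets.UTF_8);
            return Arrays.copyOf(raw, n);
        }

        public static void main(String[] args) {
            String desc = "a fairly long free-text product description";
            System.out.println(dictEncode(desc).length);              // 4
            System.out.println(fixedLengthEncode(desc, 100).length);  // 100
        }
    }
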
Hope this helps.

2016-01-09 13:00 GMT+08:00 Kiriti Sai <[email protected]>:

> Hi,
> When using an UHC dimension, I've disabled the dictionary for that
> dimension in the advanced settings and set the rowkey column length as 100
> since it's something like a text description. The data has around 6.6
> billion rows and I guess the cardinality is nearly 1 billion for this column.
> I know Kylin is not suitable to be used in such scenario, but can someone
> please explain to me the relationship between the cube size and the rowkey
> column length. I'm asking this question just out of curiosity, since I
> haven't found any explanation relating these two.
>
> Thank You.
>
