Let me try to explain it. The cube size setting determines how Kylin pre-splits the HBase table into regions after all the cuboid files have been generated. For example, if the total size of your cuboid files is 100 GB, your cube size is set to "SMALL", and the region size configured for SMALL is 10 GB, Kylin will create the HBase table with 10 regions. Before creating the HTable it calculates the start rowkey and end rowkey of every region, then creates the table with those split points.
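To make the arithmetic concrete, here is a tiny sketch (plain Java, not Kylin's actual code; the per-setting caps and all names are only assumptions for illustration) of how the cube-size setting turns a total cuboid size into a region count:

public class RegionSplitSketch {

    // Hypothetical per-region cap for each cube-size setting; the real
    // values come from Kylin's configuration, these are just examples.
    static long regionCapGB(String cubeSize) {
        switch (cubeSize) {
            case "SMALL":  return 10;
            case "MEDIUM": return 20;
            case "LARGE":  return 100;
            default: throw new IllegalArgumentException(cubeSize);
        }
    }

    public static void main(String[] args) {
        long totalCuboidGB = 100;                        // total size of all cuboid files
        long cap = regionCapGB("SMALL");                 // 10 GB per region for SMALL
        long regions = (totalCuboidGB + cap - 1) / cap;  // ceil(100 / 10) = 10

        System.out.println("pre-split the HTable into " + regions + " regions");
        // Kylin then estimates a start/end rowkey for each of these regions
        // from the cuboid statistics and hands those split keys to HBase
        // when the table is created.
    }
}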
Rowkey column length is a different thing. For every dimension you can choose either to use a dictionary or to set a fixed rowkey column length. If you use a dictionary, Kylin builds a dictionary (a trie) for that column, which means every value of the dimension is encoded as a unique integer. Because the dimension value is part of the HBase rowkey, the dictionary reduces the HBase table size. However, Kylin keeps the dictionary in memory, so if the dimension cardinality is large this becomes a problem. If instead you set the rowkey column length to N for a dimension, Kylin does not build a dictionary for it, and every value is cut (or padded) to an N-byte string: no dictionary in memory, but the rowkeys in the HBase table will be longer.
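To illustrate the difference, here is a minimal sketch (again plain Java, not Kylin code; Kylin's real dictionary is a trie held in memory, while this toy uses a map, and all class and method names are made up) of the two ways a dimension value can end up in the rowkey:

import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

public class RowkeyEncodingSketch {

    // Dictionary encoding: each distinct value gets a small integer id,
    // so the rowkey only stores the id (here: 4 bytes).
    static final Map<String, Integer> DICT = new LinkedHashMap<>();

    static byte[] encodeWithDictionary(String value) {
        int id = DICT.computeIfAbsent(value, v -> DICT.size());
        return new byte[] {
            (byte) (id >>> 24), (byte) (id >>> 16), (byte) (id >>> 8), (byte) id
        };
    }

    // Fixed-length encoding: no dictionary; the raw bytes are cut to N
    // if longer, or zero-padded if shorter.
    static byte[] encodeFixedLength(String value, int n) {
        byte[] raw = value.getBytes(StandardCharsets.UTF_8);
        return Arrays.copyOf(raw, n);
    }

    public static void main(String[] args) {
        String desc = "a long free-text description with very high cardinality ...";
        System.out.println("dictionary-encoded length: "
                + encodeWithDictionary(desc).length);          // 4 bytes
        System.out.println("fixed-length (N=100) length: "
                + encodeFixedLength(desc, 100).length);        // 100 bytes
    }
}

The trade-off in the sketch is the one described above: the dictionary keeps rowkeys short but must fit in memory, while the fixed-length encoding needs no memory-resident dictionary but makes every rowkey N bytes wide.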
Hope this helps.

2016-01-09 13:00 GMT+08:00 Kiriti Sai <[email protected]>:

> Hi,
> When using an UHC dimension, I've disabled the dictionary for that
> dimension in the advanced settings and set the rowkey column length as 100
> since it's something like a text description. The data has around 6.6
> billion rows and I guess the cardinality is nearly 1 billion for this row.
> I know Kylin is not suitable to be used in such scenario, but can someone
> please explain me the relationship between the cube size and the rowkey
> column length. I'm asking this question just out of curiosity, since I
> haven't found any explanation relating these two.
>
> Thank You.