Some of what I know about Kylin:

On 3/17/15, 1:38 PM, "Abhishek Sinha" wrote:

>Hi,
>
>Can anyone explain the two steps in the cube build process?
>
>1. Why do we need to extract the distinct columns from Fact Table or
>calculate the HIVE table cardinality?

Kylin builds a dictionary for each column, so it first needs to fetch the
distinct values of each column. Encoding values through a dictionary
greatly reduces the storage size.
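A minimal Python sketch of why a dictionary saves space. It assumes a simple dictionary that assigns dense integer IDs in sorted value order and stores each ID as a fixed-width big-endian byte string. The helper names (build_dictionary, id_width, encode_column) are hypothetical, and this is not Kylin's actual dictionary code:

```python
from typing import Dict, List, Tuple

def build_dictionary(values: List[str]) -> Dict[str, int]:
    """Map each distinct value to a dense integer ID, assigned in sorted order."""
    return {v: i for i, v in enumerate(sorted(set(values)))}

def id_width(dictionary: Dict[str, int]) -> int:
    """Smallest number of bytes that can hold every ID in the dictionary."""
    max_id = max(len(dictionary) - 1, 0)
    return max(1, (max_id.bit_length() + 7) // 8)

def encode_column(values: List[str]) -> Tuple[Dict[str, int], List[bytes]]:
    """Replace every (possibly long) string value by its fixed-width ID bytes."""
    dictionary = build_dictionary(values)
    width = id_width(dictionary)
    encoded = [dictionary[v].to_bytes(width, "big") for v in values]
    return dictionary, encoded

city = ["Shanghai", "Beijing", "Shanghai", "Shenzhen", "Beijing"]
dictionary, encoded = encode_column(city)
print(dictionary)  # {'Beijing': 0, 'Shanghai': 1, 'Shenzhen': 2}
print(encoded)     # [b'\x01', b'\x00', b'\x01', b'\x02', b'\x00']
```

Each city name of up to 8 characters shrinks to a single byte. Building the dictionary requires the full set of distinct values, which is why that extraction step has to run before the cube is built.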
The cardinality is used to optimize the row key sequence (the order of the
dimension columns in the row key) and thereby to determine the roadmap of
cube building. This helps 1) reduce the cube building time and 2) reduce
the cube scan range, which improves query performance.
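A rough sketch of the cardinality step. It computes the distinct-value count per column, which is what a Hive COUNT(DISTINCT col) would return, and then orders the columns. The ordering rule used here, higher cardinality first with ties broken by name, is an assumed heuristic for illustration only. This thread does not spell out the exact rule Kylin applies, and the function names are hypothetical:

```python
from typing import Dict, List

def cardinalities(rows: List[Dict[str, str]], columns: List[str]) -> Dict[str, int]:
    """Distinct-value count per column (what COUNT(DISTINCT col) would give in Hive)."""
    return {c: len({row[c] for row in rows}) for c in columns}

def rowkey_order(card: Dict[str, int]) -> List[str]:
    """Assumed heuristic: higher-cardinality columns first, ties broken by column name."""
    return sorted(card, key=lambda c: (-card[c], c))

# Toy fact table with dimension columns A, B, C.
fact = [
    {"A": "CN", "B": "2015-03-01", "C": "web"},
    {"A": "CN", "B": "2015-03-02", "C": "app"},
    {"A": "US", "B": "2015-03-03", "C": "wap"},
    {"A": "CN", "B": "2015-03-04", "C": "web"},
]
card = cardinalities(fact, ["A", "B", "C"])
print(card)                # {'A': 2, 'B': 4, 'C': 3}
print(rowkey_order(card))  # ['B', 'C', 'A']
```

Under this ordering, a filter on the leading column B alone maps to one contiguous range of row keys, so the storage scan can skip everything outside that range.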

>
>2. What is the use of RowKey? How is it calculated? How does it help in
>calculating HTable Region splits?

RowKey is the key in Kylin's storage (HBase). It is composed of the
dimensions' values (encoded in bytes). Assume your table has dimension
columns A, B, C with cardinalities n1, n2, n3. The base cuboid will then
have at most n1*n2*n3 rows (one per combination that actually occurs in
the data), and each row's key is A+B+C (the concatenation of the encoded
bytes). When a user sends a query like "select ... from fact where A=XX
and B=YY and C=ZZ group by A, B, C", Kylin uses encode(XX) + encode(YY) +
encode(ZZ) as the key to query HBase and get the pre-aggregated result.
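A simplified Python sketch of the base cuboid and the key lookup described above. It makes several assumptions. A plain dict stands in for the HBase table. Each dictionary ID is stored in exactly 1 byte, which limits each column to 256 distinct values. The only measure is SUM. The real Kylin row key may carry more than the dimension values shown here. All names are hypothetical:

```python
from math import prod
from typing import Dict, List, Sequence, Tuple

Row = Tuple[str, str, str, int]  # (A, B, C, measure)

def build_dicts(rows: List[Row]) -> List[Dict[str, int]]:
    """One sorted-value dictionary per dimension column A, B, C."""
    return [{v: i for i, v in enumerate(sorted({r[d] for r in rows}))} for d in range(3)]

def rowkey(dicts: List[Dict[str, int]], values: Sequence[str]) -> bytes:
    """encode(A) + encode(B) + encode(C), using 1-byte IDs (assumes < 256 distinct values)."""
    return b"".join(d[v].to_bytes(1, "big") for d, v in zip(dicts, values))

def build_base_cuboid(rows: List[Row], dicts: List[Dict[str, int]]) -> Dict[bytes, int]:
    """Pre-aggregate SUM(measure) per distinct (A, B, C); a dict stands in for HBase."""
    cuboid: Dict[bytes, int] = {}
    for a, b, c, m in rows:
        key = rowkey(dicts, (a, b, c))
        cuboid[key] = cuboid.get(key, 0) + m
    return cuboid

rows: List[Row] = [
    ("CN", "web", "2015-03-01", 10),
    ("CN", "web", "2015-03-01", 5),
    ("CN", "app", "2015-03-01", 7),
    ("US", "web", "2015-03-02", 3),
]
dicts = build_dicts(rows)
cuboid = build_base_cuboid(rows, dicts)

# Only combinations that occur are stored, so this never exceeds n1*n2*n3.
print(len(cuboid), prod(len(d) for d in dicts))  # 3 8

# select sum(m) from fact where A='CN' and B='web' and C='2015-03-01' group by A, B, C
print(cuboid.get(rowkey(dicts, ("CN", "web", "2015-03-01"))))  # 15
```

The query needs no aggregation at read time. It encodes the three filter values, concatenates them into a key, and does a single point lookup for the value that was summed at build time.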
>
>
>Is there any documentation available on these? Or any research paper/book
>referred during the project?
Check the docs here, especially "Design Cube in Kylin.pdf":
https://github.com/KylinOLAP/Kylin/tree/master/docs

>
