Hi Team,

This is Kamal Bannuru. I am new to the Kylin community. Please help me with
"*How to estimate the cluster resources and storage required by Kylin*".

Details of the dimension table, the fact table, and the cube design are
given below.

*Dimension Table:* dim_audio_songs_name_mapping
--------------------------------------------------
Column Name      | DataType | Sample Value
--------------------------------------------------
songid           | String   | s001
songname         | String   | yyy
artistname       | String   | XXX
country_code     | String   | IN
--------------------------------------------------
Dimension table size in HDFS : 10 GB
No. of records               : 5 million


*Fact Table:* tb_songs_tranasactions
--------------------------------------------------
Column Name      | DataType | Sample Value
--------------------------------------------------
transactionid    | bigint   | 1001
country_code     | String   | IN
currency         | String   | INR
paid_money       | String   | 1000
songid           | String   | s001
--------------------------------------------------
Fact table size in HDFS : 20 GB
No. of records          : 50 million


Cube engine : MR

*Cube Design details:*
----------------------------------------------------------------------------------------------------
Column Type      | Column Name  | Join Relation / Expression
----------------------------------------------------------------------------------------------------
Dimension Column | country_code | tb_songs_tranasactions.country_code=dim_audio_songs_name_mapping.country_code
Dimension Column | songid       | tb_songs_tranasactions.songid=dim_audio_songs_name_mapping.songid
Measure          | Metric       | Count(transactionid): count(tb_songs_tranasactions.transactionid)
Measure          | Metric       | SUM(paid_money): sum(tb_songs_tranasactions.paid_money)
----------------------------------------------------------------------------------------------------



*Cube size estimation and required computation calculations*
        
*Storage Estimations:*                          

1) How much storage is required, given the dimension columns, their
cardinalities, and the fact data?
2) How much Hive storage is required for the intermediate tables, and how
much HBase storage is required for the cube?
3) Are there any approximate formulas to estimate these sizes?
                                
*Computation Estimations*
*Cube building:*
How many cluster compute resources are required for the intermediate
Hive jobs when using MR as the cube engine?
                                
*Cube Query:*
How many compute resources are required to query the cube from HBase
storage?

Are there any approximate formulas to estimate these requirements?

If these questions have already been answered, please share the links.
Please let me know if any more details are required.

Thanks for the support.

Regards
Kamal Bannuru.




