Hi,
It depends on how you organize the data. For example, where do you store facts?
If you store facts in HBase and build indexes on low-selectivity columns (e.g. 
month or day), then you will have too many Gets on HBase.
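A low-selectivity index is a problem because each index value points at a large share of the table. Resolving it therefore costs one point lookup per matching row. The toy Python model below makes that count explicit. No HBase client is involved: a dict stands in for the fact table, and a counter stands in for HBase Gets. All names and sizes are hypothetical.

```python
from collections import defaultdict

# Toy model only: a dict plays the role of an HBase fact table and every
# call to CountingTable.get() plays the role of one HBase Get.
# All names and sizes here are hypothetical.

def build_facts(days, facts_per_day):
    """Generate fact rows keyed by rowkey; month is derived as day // 30."""
    facts = {}
    for day in range(days):
        for i in range(facts_per_day):
            facts[f"fact-{day:03d}-{i:05d}"] = {"month": day // 30, "day": day, "amount": 1}
    return facts

def build_index(facts, column):
    """Secondary index: column value -> list of matching rowkeys."""
    index = defaultdict(list)
    for rowkey, row in facts.items():
        index[row[column]].append(rowkey)
    return dict(index)

class CountingTable:
    """Wraps the fact rows and counts point lookups."""
    def __init__(self, facts):
        self.facts = facts
        self.gets = 0

    def get(self, rowkey):
        self.gets += 1
        return self.facts[rowkey]

def sum_via_index(table, index, value):
    """Resolve the index, then fetch every matching row with one Get each."""
    return sum(table.get(rk)["amount"] for rk in index.get(value, []))

if __name__ == "__main__":
    facts = build_facts(days=60, facts_per_day=1000)
    table = CountingTable(facts)
    sum_via_index(table, build_index(facts, "day"), 5)
    print(table.gets)  # 1000: one day's worth of rows
    table.gets = 0
    sum_via_index(table, build_index(facts, "month"), 0)
    print(table.gets)  # 30000: a whole month of rows, one Get each
```

The Get count grows with the number of matching rows, not with the size of the answer, which here is a single sum.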
Why not use Spark Streaming and Spark DataFrames? You can save the latest 
data received via Spark Streaming as Parquet and then use a cached RDD to query 
it with SQL. This is very fast: you can join with dimension tables (static, or 
built from the received data), and you can also offer a SQL interface via Thrift.
Then, according to the cube refresh policy, update the cube from the Parquet 
files and remove those latest files as soon as the cube is updated.
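The flow described above can be modeled end to end in a few lines. Recent data sits in a queryable hot store, queries combine it with the cube, and each refresh folds it into the cube and drops it. This is a toy sketch, not Spark or Kylin code. Python's built-in sqlite3 stands in for Spark SQL over cached Parquet. The `recent_facts` table stands in for the Parquet files written by Spark Streaming, and `cube_sales` stands in for the cube. All table and function names are hypothetical.

```python
import sqlite3

# Toy model only: sqlite3 stands in for Spark SQL, recent_facts for the
# Parquet files written by Spark Streaming, cube_sales for the cube.
# All table and function names are hypothetical.

def open_store():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
        CREATE TABLE recent_facts (product_id INTEGER, day TEXT, amount REAL);
        CREATE TABLE cube_sales (category TEXT, day TEXT, amount REAL,
                                 PRIMARY KEY (category, day));
    """)
    return conn

def ingest_batch(conn, batch):
    """Append one micro-batch of raw facts (one streaming interval)."""
    with conn:
        conn.executemany("INSERT INTO recent_facts VALUES (?, ?, ?)", batch)

def total_by_category(conn):
    """Answer from the cube plus recent facts joined with the dimension table."""
    return conn.execute("""
        SELECT category, SUM(amount) FROM (
            SELECT category, amount FROM cube_sales
            UNION ALL
            SELECT d.category, f.amount
            FROM recent_facts f JOIN dim_product d ON f.product_id = d.product_id
        ) GROUP BY category ORDER BY category
    """).fetchall()

def refresh_cube(conn):
    """Fold recent facts into the cube, then drop them so nothing is counted twice."""
    # Inner join: facts without a matching dimension row are dropped here;
    # a real design would have to decide how to handle them.
    rows = conn.execute("""
        SELECT d.category, f.day, SUM(f.amount)
        FROM recent_facts f JOIN dim_product d ON f.product_id = d.product_id
        GROUP BY d.category, f.day
    """).fetchall()
    with conn:  # cube update and cleanup commit (or roll back) together
        for category, day, amount in rows:
            cur = conn.execute(
                "UPDATE cube_sales SET amount = amount + ? WHERE category = ? AND day = ?",
                (amount, category, day))
            if cur.rowcount == 0:
                conn.execute("INSERT INTO cube_sales VALUES (?, ?, ?)",
                             (category, day, amount))
        conn.execute("DELETE FROM recent_facts")

if __name__ == "__main__":
    conn = open_store()
    with conn:
        conn.executemany("INSERT INTO dim_product VALUES (?, ?)",
                         [(1, "books"), (2, "music")])
    ingest_batch(conn, [(1, "2015-09-24", 10.0), (2, "2015-09-24", 5.0)])
    ingest_batch(conn, [(1, "2015-09-24", 2.5)])
    print(total_by_category(conn))  # [('books', 12.5), ('music', 5.0)]
    refresh_cube(conn)
    print(total_by_category(conn))  # same totals, now served from cube_sales
```

Deleting the recent rows in the same transaction as the cube update keeps a query over cube plus recent data from counting a fact twice. With real Parquet files and an external cube, that atomicity is not free and would need its own design.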
We should try to write a design document covering all the proposed solutions, 
listing the pros and cons of each.
Even though I am busy with customer projects, I can help write such a 
document if you want, but someone should start by describing the solutions 
already implemented.
Regards,
-- gas



-------- Original message --------
From: Sarnath <[email protected]> 
Date: 24/09/2015  08:22  (GMT+01:00) 
To: [email protected] 
Subject: Re: Re: Kylin Real time 

Hi,

Can you share some reasons why "Inverted Index" did not work?
I am trying to do exactly the same thing for storing cubes in our own
private implementation, so I'm wondering what problems lie upstream.

Thanks,
Best,
Sarnath
