The final cube size depends on the cardinality of your dimensions, the number of dimensions, the partial cube settings, and other factors.
It's hard to say the exact size in advance. How big is your source Hive table? Kylin tracks a measure called "Expansion Rate" for this. Normally a cube will take only 30~50% of the source table size (not always; some cases grow extremely large). In our internal deployments, we do not allow a cube to exceed 10x the source table size; beyond that it requires optimization and tuning to bring the final size down. For your capacity plan, I think you could start from the source Hive table size and then calculate the HBase storage you will need for the Kylin cubes from there.

Thanks.

Best Regards!
---------------------
Luke Han

2015-06-08 11:52 GMT-07:00 Jakob Stengård <[email protected]>:

> It depends on how many nodes you have and their specs. You could do ROLAP
> with Hive-on-Tez for one billion rows. I personally tried it using the ORC
> format and vectorization, and queries took about 20s with just 3 nodes.
>
> On Mon, Jun 8, 2015 at 6:47 PM, Vineet Mishra <[email protected]>
> wrote:
>
> > Hi,
> >
> > I am looking for the best way to query my aggregated data from Tableau
> > using any connectivity measure, persisting the data in Hadoop (HDFS) and
> > retrieving it through any connector/query engine possible.
> >
> > For a MOLAP cube of a billion records with just 10 or 20 aggregates,
> > what would be a suitable storage and query engine for my use case?
> >
> > Urgent call. Need expert advice!
> >
> > Thanks!
>
> --
> Med vänlig hälsning (Best Regards)
> *Jakob Stengård*
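The capacity-planning calculation suggested above can be sketched as a few lines of arithmetic. This is only an illustration: the expansion rates (30~50% typical, 10x as an upper bound tolerated before optimization) come from the figures in the thread, and the source table size is a hypothetical number.

```python
# Rough HBase capacity estimate for a Kylin cube, based on the
# expansion-rate figures discussed in this thread. All inputs are
# illustrative assumptions, not measurements.

def estimate_cube_size_gb(source_table_gb: float, expansion_rate: float) -> float:
    """Estimated cube size = source Hive table size * expansion rate."""
    return source_table_gb * expansion_rate

source_gb = 500.0  # hypothetical source Hive table size in GB

typical_low = estimate_cube_size_gb(source_gb, 0.3)    # 30% expansion
typical_high = estimate_cube_size_gb(source_gb, 0.5)   # 50% expansion
worst_case = estimate_cube_size_gb(source_gb, 10.0)    # 10x upper bound

print(typical_low, typical_high, worst_case)  # 150.0 250.0 5000.0
```

In practice you would budget HBase storage toward the high end of the typical range and monitor the actual Expansion Rate after the first cube builds, tuning aggregation groups if it drifts toward the upper bound.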
