Re: Choice of HBASE

Sarnath Wed, 27 May 2015 01:45:07 -0700

Thanks for all your answers. I see the curse of dimensionality, which can get
really bad as the number of dimensions increases. What kind of optimizations
did you apply to reduce it? If you could name a few prominent ones, that
would be very useful to know.
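
For illustration, a minimal Python sketch of the 2^N growth discussed in this
thread. It assumes a full cube with one cuboid per subset of dimensions; the
`cuboids` helper and the dimension names are hypothetical, and this is not
Kylin code.

```python
from itertools import combinations


def cuboids(dimensions):
    """Return every subset of the dimension list (one cuboid per subset)."""
    result = []
    for size in range(len(dimensions) + 1):
        result.extend(combinations(dimensions, size))
    return result


if __name__ == "__main__":
    # Hypothetical dimension names, used only to show the growth rate.
    dims = ["A", "B", "C", "D"]
    for n in range(1, len(dims) + 1):
        print(n, "dimensions ->", len(cuboids(dims[:n])), "cuboids")
```

Each extra dimension doubles the number of cuboids, which is why pruning
combinations that are never queried matters so much.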

As for HBASE - are you using the combination of dimensions as the RowKey for
HBASE? e.g. /ProductID=9739/Year=2015/Month=9/WeekOfDay=Monday can be a row
key to show the aggregation for all Mondays in September 2015 for Product
9739.
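
A minimal sketch of the row-key idea asked about above, in Python. The
`build_row_key` helper and the path-like layout are hypothetical and only
mirror the example in the question; they are not Kylin's actual encoding.

```python
def build_row_key(dimensions):
    """Join ordered (name, value) pairs into a path-like row key."""
    return "".join(f"/{name}={value}" for name, value in dimensions)


if __name__ == "__main__":
    key = build_row_key([
        ("ProductID", 9739),
        ("Year", 2015),
        ("Month", 9),
        ("WeekOfDay", "Monday"),
    ])
    # A plain dict stands in for the store; the value is a made-up aggregate.
    store = {key: {"total_sales": 0}}
    print(key)
```

Because the dimension order is fixed, keys that share a prefix such as
/ProductID=9739/Year=2015 would sit next to each other in an ordered store.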

Is that the right way to think about how HBASE is being used? The
columns/column families could possibly represent different cubes.

If the underlying data store supports multi-dimensional maps, I think that
will be useful. Yes, HBASE is a multi-dimensional map, but those dimensions
are imposed by HBASE, i.e. Map<RowKey, ColumnFamily, Column, Time>.
And that's limited. Our cube can have a lot of dimensions.
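
A rough conceptual model of those four fixed levels, using nested Python
dicts. This is only a sketch: the helper names are hypothetical, and a real
HBase table keeps keys sorted and stores raw bytes.

```python
from collections import defaultdict


def make_table():
    """row key -> column family -> column -> timestamp -> value"""
    return defaultdict(lambda: defaultdict(lambda: defaultdict(dict)))


def put(table, row, family, column, timestamp, value):
    """Store one versioned cell."""
    table[row][family][column][timestamp] = value


def get_latest(table, row, family, column):
    """Return the value with the highest timestamp, or None if absent."""
    versions = table[row][family][column]
    return versions[max(versions)] if versions else None


if __name__ == "__main__":
    table = make_table()
    put(table, "/ProductID=9739/Year=2015", "cube1", "sum", 1, 100)
    put(table, "/ProductID=9739/Year=2015", "cube1", "sum", 2, 150)
    print(get_latest(table, "/ProductID=9739/Year=2015", "cube1", "sum"))
```

With only these four levels available, any further cube dimensions would
have to be packed into one of them, for example into the row key.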

I am not an expert. But, from a very cursory look, Cassandra looks to be a
better bet. It has a query language (CQL) (unlike HBASE, which depends on
Hive, which I hear is pretty slow). It looks like it can support maps of maps
of maps (nested tuples), which can come in handy for storing the values of a
cube.

I just want to get a conceptual understanding of how Kylin works. I hope
this discussion will help me get there.

Thanks,
Best,
Sarnath

On Wed, May 27, 2015 at 10:49 AM, 蒋旭 <[email protected]> wrote:

> 1. A data cube is a multi-dimensional array, which is basically a key-value
> data model. HBase is an ordered key-value store, which suits the cube data
> model and query processing.
> 2. Kylin is focused on Hadoop. HBase integrates seamlessly with MR, HDFS,
> and Hive.
> 3. HBase scales out, which makes it suitable for storing large data sets.
> 4. HBase coprocessors provide server-side parallel processing, which is
> suitable for push-down computation and for parallelizing query processing.
>
> Thanks
> JiangXu
> ------------------ Original Message ------------------
> From: hongbin ma <[email protected]>
> Sent: May 27, 2015 12:47
> To: dev <[email protected]>
> Subject: Re: Choice of HBASE
>
>
>
> On Wed, May 27, 2015 at 12:35 PM, Sarnath <[email protected]> wrote:
>
> > Is it because Cube data can grow exponentially (2^N) with increasing
> > dimensions?
> >
>
> This is one of the most important reasons. We applied many optimizations to
> avoid the curse of dimensionality, but the cube size can still grow very
> large, especially when distinct count appears in the metrics.
>
>
>
> --
> Regards,
>
> *Bin Mahone | 马洪宾*
> Apache Kylin: http://kylin.io
> Github: https://github.com/binmahone
>
