Thank you, Andrew. Can you tell me how ordered partitioning is exploited by
Kylin? I want to know how the Cube is exposed via HBase's rowkey and column
families. Can somebody explain that? Thanks much.
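
For instance, is the layout something like the sketch below, where the
dimension values are concatenated into the rowkey so that one slice of the
cube becomes a single contiguous range scan? The table, key, and column
names here are pure guesses on my part, not Kylin's actual encoding:

  // Guess: rowkeys sort lexicographically, so every cell for
  // ProductID=9739, Year=2015 sits in one contiguous key range that a
  // single Scan can cover.
  Table cube = conn.getTable(TableName.valueOf("kylin_cube_sales"));  // conn: an open HBase Connection
  Scan scan = new Scan();
  scan.setStartRow(Bytes.toBytes("9739|2015|"));  // first key of the slice
  scan.setStopRow(Bytes.toBytes("9739|2016|"));   // exclusive upper bound
  for (Result r : cube.getScanner(scan)) {
      byte[] sum = r.getValue(Bytes.toBytes("F1"), Bytes.toBytes("SUM_SALES"));
      // decode and aggregate ...
  }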

On Thu, May 28, 2015 at 3:03 AM, Andrew Purtell <[email protected]> wrote:

> HBase does not depend on Hive.
>
> If you want a CQL equivalent for HBase, you can use Apache Phoenix.
>
> Misunderstandings about HBase capabilities and options with respect to
> Cassandra are common. I suspect this is because of DataStax marketing.
> Cursory looks are often wrong.
>
> Given Kylin's goal to integrate well with Hadoop, an impartial assessment
> is very likely to conclude that use of Cassandra is suboptimal. Some
> reasons that come immediately to mind: Data stored in both HDFS and
> Cassandra's own storage will be redundant many times over due to
> replication in both storage systems. Cassandra lacks ordered partitioning
> by default, which Kylin takes advantage of, and ordered partitioning in
> Cassandra comes with operational headaches.
>
>
> On Wed, May 27, 2015 at 1:44 AM, Sarnath <[email protected]> wrote:
>
> > Thanks for all your answers. I see the curse of dimensionality - which
> > can get really bad as the number of dimensions increases. What kind of
> > optimizations did you apply to reduce that? If you could name a
> > prominent few, it would be very useful knowledge.
> >
> > As far as HBase goes - are you using the combination of dimensions as
> > the rowkey? E.g. /ProductID=9739/Year=2015/Month=9/WeekOfDay=Monday
> > could be a rowkey holding the aggregation for all Mondays in September
> > 2015 for Product 9739.
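> >
> > In code, I picture something like the sketch below - purely my mental
> > model, with invented table and column names, not necessarily how Kylin
> > actually encodes things:
> >
> >   // Hypothetical: one exact combination of dimensions -> one rowkey,
> >   // with the pre-aggregated measure stored under a column family.
> >   byte[] rowKey = Bytes.toBytes("9739|2015|9|Monday");  // ProductID|Year|Month|WeekOfDay
> >   Table cube = conn.getTable(TableName.valueOf("kylin_cube_sales"));  // conn: an open HBase Connection
> >   Result result = cube.get(new Get(rowKey));
> >   byte[] totalSales = result.getValue(Bytes.toBytes("F1"),
> >       Bytes.toBytes("SUM_SALES"));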
> >
> > Is that the right way to think about how HBase is being used? The
> > columns/column families could possibly represent different cubes.
> >
> > If the underlying data store supports multi-dimensional maps, I think
> > that will be useful. Yes, HBase is a multi-dimensional map -- but those
> > dimensions are imposed by HBase, i.e. Map<RowKey, ColumnFamily, Column,
> > Time>, and that's limited. Our cube can have a lot of dimensions.
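> >
> > The client API spells those fixed dimensions out, if I read it right
> > (standard HBase Result, with cube and rowKey as in my sketch above):
> >
> >   // family -> qualifier -> timestamp -> value: all the nesting you get
> >   NavigableMap<byte[], NavigableMap<byte[], NavigableMap<Long, byte[]>>>
> >       nested = cube.get(new Get(rowKey)).getMap();
> >
> > Any further cube dimensions would have to be flattened into the rowkey
> > itself.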
> >
> > I am not an expert, but from a very cursory look, Cassandra looks like a
> > better bet. It has a query language (CQL), unlike HBase, which depends
> > on Hive, which I hear is pretty slow. It looks like it can support maps
> > of maps of maps (nested tuples), which can come in handy for storing the
> > values of a cube.
> >
> > I just want to get a conceptual understanding of how Kylin works. I hope
> > this discussion will help me get there.
> >
> > Thanks,
> > Best,
> > Sarnath
> >
> > On Wed, May 27, 2015 at 10:49 AM, 蒋旭 <[email protected]> wrote:
> >
> > > 1. A data cube is a multi-dimensional array, which is basically a
> > > key-value data model. HBase is ordered key-value storage, which suits
> > > the cube data model and its query processing.
> > > 2. Kylin is focused on Hadoop. HBase integrates seamlessly with MR,
> > > HDFS, and Hive.
> > > 3. HBase scales out, which makes it suitable for storing large data
> > > volumes.
> > > 4. HBase coprocessors provide server-side parallel processing, which
> > > is suitable for pushing computation down and parallelizing query
> > > processing.
> > >
> > > Thanks
> > > JiangXu
> > > ------------------ Original Message ------------------
> > > From: hongbin ma <[email protected]>
> > > Sent: 2015-05-27 12:47
> > > To: dev <[email protected]>
> > > Subject: Re: Choice of HBASE
> > >
> > >
> > >
> > > On Wed, May 27, 2015 at 12:35 PM, Sarnath <[email protected]> wrote:
> > >
> > > > Is it because Cube data can grow exponentially (2^N) with increasing
> > > > dimensions?
> > > >
> > >
> > > This is one of the most important reasons. We applied many
> > > optimizations to avoid the curse of dimensionality, but the cube size
> > > can still grow very large, especially when distinct count appears in
> > > the metrics.
> > >
> > >
> > >
> > > --
> > > Regards,
> > >
> > > *Bin Mahone | 马洪宾*
> > > Apache Kylin: http://kylin.io
> > > Github: https://github.com/binmahone
> > >
> >
>
>
>
> --
> Best regards,
>
>    - Andy
>
> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> (via Tom White)
>
