Re: Choice of HBASE

Sarnath Wed, 27 May 2015 20:34:01 -0700

Also, ordered partitioning helps HBase do row scans, i.e. I can query with a
partial key and start a scan from there. But is that a requirement in Kylin?
Say, slicing, where some dimensions are held constant and the other
dimensions are allowed to vary? That sounds like a good use case, but can
someone confirm?
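
To make the partial-key idea concrete, here is a minimal sketch in Python that imitates an ordered key-value store with a sorted list. It is only an illustration: the "product|year|month|weekday" key layout, the sample rows, and the prefix_scan helper are all made up for this example and are not Kylin's actual row key encoding or the HBase client API.

```python
from bisect import bisect_left

# Hypothetical cube rows: (row key, aggregated measure). The key layout
# "product|year|month|weekday" is an assumption for illustration only.
ROWS = sorted([
    ("9739|2015|08|Mon", 7),
    ("9739|2015|09|Mon", 12),
    ("9739|2015|09|Tue", 5),
    ("9739|2016|01|Mon", 3),
    ("9740|2015|09|Mon", 9),
])
KEYS = [key for key, _ in ROWS]


def prefix_scan(prefix):
    """Yield rows whose key starts with `prefix`, in key order."""
    start = bisect_left(KEYS, prefix)  # seek to the first key >= prefix
    for key, value in ROWS[start:]:
        if not key.startswith(prefix):
            break  # keys are sorted, so all matches are contiguous
        yield key, value


# Slice: product, year and month held constant; weekday free to vary.
slice_rows = list(prefix_scan("9739|2015|09|"))
total = sum(value for _, value in slice_rows)
```

Because the keys are kept in sorted order, the slice is one seek plus a short contiguous read. With hash partitioning the matching keys would be scattered, so the same query would presumably have to touch every partition.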


On Thu, May 28, 2015 at 8:46 AM, Sarnath <[email protected]> wrote:

> Thank you, Andrew. Can you tell me how ordered partitioning is exploited by
> Kylin? I want to know how the cube is exposed via HBase's row key and column
> families. Can somebody explain that? Thanks much.
>
> On Thu, May 28, 2015 at 3:03 AM, Andrew Purtell <[email protected]>
> wrote:
>
>> HBase does not depend on Hive.
>>
>> If you want a CQL equivalent for HBase, you can use Apache Phoenix.
>>
>> Misunderstandings about HBase capabilities and options with respect to
>> Cassandra are common. I suspect this is because of DataStax marketing.
>> Cursory looks are often wrong.
>>
>> Given Kylin's goal to integrate well with Hadoop, an impartial assessment
>> is very likely to conclude that use of Cassandra is suboptimal. Some
>> reasons that come immediately to mind: Data stored in both HDFS and
>> Cassandra's own storage will be redundant many times over due to
>> replication in both storage systems. Cassandra lacks ordered partitioning
>> by default, which Kylin is taking advantage of, and ordered partitioning in
>> Cassandra comes with operational headaches.
>>
>>
>> On Wed, May 27, 2015 at 1:44 AM, Sarnath <[email protected]> wrote:
>>
>> > Thanks for all your answers. I see the curse of dimensionality, which can
>> > get really bad when the number of dimensions increases. What kind of
>> > optimizations did you apply to reduce that? If you could name a prominent
>> > few, it would be very useful knowledge.
>> >
>> > As for HBase: are you using the combination of dimensions as the row key?
>> > E.g. /ProductID=9739/Year=2015/Month=9/DayOfWeek=Monday could be a row key
>> > for the aggregation over all Mondays in September 2015 for product 9739.
>> >
>> > Is that the right way to think about how HBase is being used? The
>> > columns/column families can possibly represent different cubes.
>> >
>> > If the underlying data store supports multi-dimensional maps, I think that
>> > will be useful. Yes, HBase is a multi-dimensional map, but its dimensions
>> > are imposed by HBase, i.e. Map<RowKey, ColumnFamily, Column, Time>,
>> > and that's limited. Our cube can have a lot of dimensions.
>> >
>> > I am not an expert. But, from a very cursory look, Cassandra looks to be a
>> > better bet. It has a query language (CQL), unlike HBase, which depends on
>> > Hive, which I hear is pretty slow. It looks like it can support maps of
>> > maps of maps (nested tuples), which can come in handy for storing the
>> > values of a cube.
>> >
>> > I just want to get a conceptual understanding of how Kylin works. I hope
>> > this discussion will help me get there.
>> >
>> > Thanks,
>> > Best,
>> > Sarnath
>> >
>> > On Wed, May 27, 2015 at 10:49 AM, 蒋旭 <[email protected]> wrote:
>> >
>> > > 1. A data cube is a multi-dimensional array, which is basically a
>> > > key-value data model. HBase is an ordered key-value store, which suits
>> > > the cube data model and query processing.
>> > > 2. Kylin is focused on Hadoop. HBase integrates seamlessly with MR, HDFS,
>> > > and Hive.
>> > > 3. HBase scales out, which makes it suitable for storing large data sets.
>> > > 4. HBase coprocessors provide server-side parallel processing, which is
>> > > suitable for push-down computation and for parallelizing query
>> > > processing.
>> > >
>> > > Thanks
>> > > JiangXu
>> > > ------------------ Original Message ------------------
>> > > From: hongbin ma <[email protected]>
>> > > Sent: May 27, 2015 12:47
>> > > To: dev <[email protected]>
>> > > Subject: Re: Choice of HBASE
>> > >
>> > >
>> > >
>> > > On Wed, May 27, 2015 at 12:35 PM, Sarnath <[email protected]> wrote:
>> > >
>> > > > Is it because Cube data can grow exponentially (2^N) with increasing
>> > > > dimensions?
>> > > >
>> > >
>> > > This is one of the most important reasons. We applied many optimizations
>> > > to avoid the curse of dimensionality, but the cube size can still grow
>> > > very large, especially when distinct count appears in the metrics.
>> > >
>> > >
>> > >
>> > > --
>> > > Regards,
>> > >
>> > > *Bin Mahone | 马洪宾*
>> > > Apache Kylin: http://kylin.io
>> > > Github: https://github.com/binmahone
>> > >
>> >
>>
>>
>>
>> --
>> Best regards,
>>
>>    - Andy
>>
>> Problems worthy of attack prove their worth by hitting back. - Piet Hein
>> (via Tom White)
>>
>
>
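
The 2^N growth mentioned in the quoted thread can be made concrete with a small Python sketch. It assumes a full cube that precomputes one aggregate table (a cuboid) for every subset of the dimensions; the dimension names and the cuboids helper are hypothetical, and the sketch says nothing about which combinations Kylin actually materializes after its optimizations.

```python
from itertools import combinations


def cuboids(dimensions):
    """Return every subset of the dimensions, one per cuboid of a full cube."""
    return [combo
            for size in range(len(dimensions) + 1)
            for combo in combinations(dimensions, size)]


# Hypothetical dimension names, for illustration only.
dims = ["product", "year", "month", "weekday"]
all_cuboids = cuboids(dims)
cuboid_count = len(all_cuboids)  # equals 2 ** len(dims)
```

Each extra dimension doubles the number of cuboids, so 4 dimensions give 16 and 10 dimensions already give 1024, before counting the rows inside each cuboid.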
