Thank you, Andrew. Can you tell me how Kylin exploits ordered partitioning? I want to know how the cube is exposed via HBase's row key and column families. Can somebody explain that? Thanks much.
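
To make the question concrete, here is a minimal sketch of what "exploiting ordered partitioning" could mean. Everything in it is an assumption for illustration, not Kylin's actual row key encoding: the `make_rowkey` helper, the fixed-width field layout, the numeric weekday, and the sample values are all hypothetical, and a sorted Python list stands in for an ordered key-value store such as an HBase table.

```python
from bisect import bisect_left

# Hypothetical illustration only: this is NOT Kylin's actual row key format.
def make_rowkey(product_id, year, month, weekday):
    # Fixed-width, zero-padded fields so that byte-wise ordering of the key
    # follows the dimension order (product, year, month, weekday).
    return f"{product_id:06d}|{year:04d}|{month:02d}|{weekday:d}".encode()

# A sorted list stands in for an ordered key-value store.
# The values are made-up pre-aggregated measures.
rows = sorted([
    (make_rowkey(9739, 2015, 9, 1), 120.0),
    (make_rowkey(9739, 2015, 9, 2), 80.0),
    (make_rowkey(9739, 2015, 10, 1), 95.0),
    (make_rowkey(9740, 2015, 9, 1), 40.0),
])
keys = [k for k, _ in rows]

def prefix_scan(prefix):
    # Because keys are stored in sorted order, all rows sharing a prefix are
    # contiguous: seek to the first match, then read until the prefix changes.
    start = bisect_left(keys, prefix)
    matches = []
    for key, value in rows[start:]:
        if not key.startswith(prefix):
            break
        matches.append((key, value))
    return matches

# All rows for product 9739 in September 2015, read as one contiguous range.
september = prefix_scan(b"009739|2015|09|")
total = sum(value for _, value in september)
```

With hash-based partitioning, rows sharing a key prefix would not be stored next to each other, so the same lookup could not be served by one contiguous range read. Is this roughly the property Kylin relies on?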

On Thu, May 28, 2015 at 3:03 AM, Andrew Purtell <[email protected]> wrote:

> HBase does not depend on Hive.
>
> If you want a CQL equivalent for HBase, you can use Apache Phoenix.
>
> Misunderstandings about HBase capabilities and options with respect to
> Cassandra are common. I suspect this is because of DataStax marketing.
> Cursory looks are often wrong.
>
> Given Kylin's goal to integrate well with Hadoop, an impartial assessment
> is very likely to conclude that use of Cassandra is suboptimal. Some
> reasons that come immediately to mind: Data stored in both HDFS and
> Cassandra's own storage will be redundant many times over due to
> replication in both storage systems. Cassandra lacks ordered partitioning
> as default, which Kylin is taking advantage of, and ordered partitioning in
> Cassandra comes with operational headaches.
>
>
> On Wed, May 27, 2015 at 1:44 AM, Sarnath <[email protected]> wrote:
>
> > Thanks for all your answers. I see the curse of dimensions - which can
> > get really bad when the number of dimensions increases. What kind of
> > optimizations did you apply to reduce that? If you could name a
> > prominent few - it will be very useful knowledge.
> >
> > As far as HBASE - Are you using the combination of dimensions as the
> > RowKey for HBASE? e.g. /ProductID=9739/Year=2015/Month=9/WeekOfDay=Monday
> > can be a Row Key to show the aggregation for all Mondays in September
> > 2015 for Product 9739.
> >
> > Is that a right way to think about how HBASE is being used? The
> > columns/column families can possibly represent different cubes.
> >
> > If the underlying data-store supports multi-dimensional maps - I think
> > that will be useful. Yes, HBASE is a multi-D map -- but those dimensions
> > are imposed by HBASE... i.e. Map<RowKey, ColumnFamily, Column, Time>
> > And that's limited. Our Cube can have a lot of dimensions.
> >
> > I am not an expert. But, from a very cursory look, Cassandra looks to be
> > a better bet. It has a query language (CQL) (unlike HBASE which depends
> > on Hive which I hear is pretty slow). It looks like it can support map of
> > map of maps..... (nested tuples) which can come in handy for storing
> > values of a Cube.
> >
> > I just want to get a conceptual understanding of how Kylin works. I hope
> > this discussion will help me get there.
> >
> > Thanks,
> > Best,
> > Sarnath
> >
> > On Wed, May 27, 2015 at 10:49 AM, 蒋旭 <[email protected]> wrote:
> >
> > > 1. A data cube is a multi-dimensional array, which is basically a
> > > key-value data model. HBase is an ordered key-value store that is
> > > suitable for the cube data model and query processing.
> > > 2. Kylin is focused on Hadoop. HBase integrates seamlessly with MR,
> > > HDFS, HIVE.
> > > 3. HBase scales out, which makes it suitable for storing large volume
> > > data sets.
> > > 4. HBase coprocessors provide server-side parallel processing that is
> > > suitable for push-down computation and for parallelizing the query
> > > processing.
> > >
> > > Thanks
> > > JiangXu
> > >
> > > ------------------ Original Message ------------------
> > > From: hongbin ma <[email protected]>
> > > Sent: May 27, 2015 12:47
> > > To: dev <[email protected]>
> > > Subject: Re: Choice of HBASE
> > >
> > > On Wed, May 27, 2015 at 12:35 PM, Sarnath <[email protected]> wrote:
> > >
> > > > Is it because Cube data can grow exponentially (2^N) with increasing
> > > > dimensions?
> > >
> > > this is one of the most important reasons. We applied many
> > > optimizations to avoid the curse of dimensions, but the cube size can
> > > still grow very large, especially when distinct count appears in
> > > metrics
> > >
> > > --
> > > Regards,
> > >
> > > *Bin Mahone | 马洪宾*
> > > Apache Kylin: http://kylin.io
> > > Github: https://github.com/binmahone
>
> --
> Best regards,
>
>    - Andy
>
> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> (via Tom White)
