Re: Group by + where clause

Adunuthula, Seshu Fri, 11 Dec 2015 09:32:36 -0800

Sarnath,

This is great info and a much more fun discussion to have. As the quote
goes, "If we are going by opinions, let's use mine; otherwise let's look
at the data..."

----

Our work is essentially similar to what Apache Kylin
<http://kylin.apache.org/> does. Kylin uses HBase as their store and it
uses carefully designed Row-Keys for searching data in Cubes. If we
understand right, the row-keys are made up of a bitmask representing the
dimensions that are grouped followed by values of each dimension. The
values corresponding to the row-key are the different metrics calculated
for that combination of dimensions.
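
A rough sketch of the row-key layout described above, in Python. The
dimension names, the "|" separator and the sales numbers are all made up
for illustration; Kylin's real keys are binary-encoded and will differ in
detail.

```python
# Hypothetical cuboid row-key layout: a bitmask of the grouped dimensions,
# followed by the value of each grouped dimension.
DIMENSIONS = ["region", "product", "year"]  # assumed dimension order


def make_row_key(group: dict) -> str:
    """Build a key such as '101|EMEA|2015' for the grouped dimensions."""
    mask = "".join("1" if d in group else "0" for d in DIMENSIONS)
    values = [str(group[d]) for d in DIMENSIONS if d in group]
    return "|".join([mask] + values)


# Metrics stored against each key (invented numbers, for illustration only).
cube = {
    make_row_key({"region": "EMEA", "year": 2015}): {"sales": 120},
    make_row_key({"region": "APAC", "year": 2015}): {"sales": 80},
    make_row_key({"region": "EMEA"}): {"sales": 300},
}
```

Because the bitmask comes first, all rows of one cuboid sort next to each
other, and within a cuboid the rows sort by the first grouped dimension.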

In our opinion, Row-key based search in HBase is essentially a search on
lexicographically ordered data and this can cause unnecessary lags in
OLAP Cube Search (especially when you are slicing and dicing the cube).
For example, let us say we want to search for all words in an English
dictionary where the second letter is 'a'. We still need to go through all
chapters of the "dictionary". Inside each chapter, we still need to "scan"
until we find our results. Our solution uses a search mechanism powered by
an inverted index (courtesy: ElasticSearch). An inverted index does not
require such nearly-full scans and should be able to retrieve data much
faster. In our case, ElasticSearch lifts this burden and additionally we
don't have to worry about

----
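
To make the dictionary analogy in the excerpt concrete for myself, here is
a toy sketch in Python. The word list and the (position, letter) index are
my own assumptions; this is not how ElasticSearch or HBase store data, only
the shape of the argument.

```python
from collections import defaultdict

# A tiny "dictionary", kept in lexicographic order like HBase row-keys.
words = sorted(["apple", "banana", "cat", "dog", "eagle", "table"])


def scan_second_letter(sorted_words, letter):
    """Sorted order only helps prefix lookups; here every word is visited."""
    return [w for w in sorted_words if len(w) > 1 and w[1] == letter]


def build_index(all_words):
    """Toy inverted index: (position, letter) -> set of matching words."""
    index = defaultdict(set)
    for w in all_words:
        for pos, ch in enumerate(w):
            index[(pos, ch)].add(w)
    return index


index = build_index(words)
scanned = scan_second_letter(words, "a")   # touches all six words
looked_up = sorted(index[(1, "a")])        # one lookup, then read the postings
```

Both paths return the same four words; the difference is that the scan
cost grows with the size of the dictionary while the lookup cost grows
with the size of the answer.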

I am still parsing this information, but how well does an inverted index
perform for a range query, e.g. "get me all sales for a region where sales
is < 10M"?
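
For reference, this is the kind of evaluation I have in mind, sketched in
Python. It assumes the index keeps its terms sorted, so a range becomes a
walk over the terms in that range plus a union of their postings. The
documents, field names and numbers are invented, and I am not claiming
ElasticSearch works this way internally.

```python
import bisect

# Hypothetical documents: (doc id, region, sales in millions).
docs = [(0, "EMEA", 4), (1, "EMEA", 12), (2, "APAC", 7), (3, "EMEA", 9)]

region_postings = {}  # region -> set of doc ids
sales_postings = {}   # sales value -> set of doc ids
for doc_id, region, sales in docs:
    region_postings.setdefault(region, set()).add(doc_id)
    sales_postings.setdefault(sales, set()).add(doc_id)

sales_terms = sorted(sales_postings)  # sorted term dictionary


def sales_below(limit):
    """Union the postings of every sales term strictly below the limit."""
    end = bisect.bisect_left(sales_terms, limit)
    result = set()
    for term in sales_terms[:end]:
        result |= sales_postings[term]
    return result


def region_with_sales_below(region, limit):
    """Intersect the region postings with the range result."""
    return sorted(region_postings.get(region, set()) & sales_below(limit))
```

With this layout the cost depends on how many distinct sales values fall
inside the range, which is why I am curious how it holds up on a
high-cardinality metric.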

On 12/11/15, 8:27 AM, "Sarnath" <[email protected]> wrote:

>Here is the Sunday afternoon cuppa tea that I promised. Sorry about the
>delay. I have tried to be as fair as possible and have advised a pinch of
>salt where necessary....
>
>http://www.hcltech.com/blogs/engineering-and-rd-services/olap-cubing-big-data
>
>Thanks,
>Best,
>Sarnath & Big data CoE from HCL
