Sarnath,

This is great info and a much more fun discussion to have. As the quote goes, "If we are going by opinions, let's use mine; otherwise, let's look at the data."
> Our work is essentially similar to what Apache Kylin
> <http://kylin.apache.org/> does. Kylin uses HBase as its store and
> uses carefully designed row-keys for searching data in cubes. If we
> understand right, the row-keys are made up of a bitmask representing
> the dimensions that are grouped, followed by the values of each
> dimension. The values corresponding to the row-key are the different
> metrics calculated for that combination of dimensions. In our opinion,
> row-key based search in HBase is essentially a search on
> lexicographically ordered data, and this can cause unnecessary lags in
> OLAP cube search (especially when you are slicing and dicing the
> cube). For example, let us say we want to search for all words in an
> English dictionary where the second letter is 'a'. We still need to go
> through all chapters of the "dictionary". Inside each chapter, we
> still need to "scan" until we find our results. Our solution uses a
> search mechanism powered by an inverted index (courtesy:
> ElasticSearch). An inverted index does not require such
> nearly-full-scans and should be able to retrieve data much faster. In
> our case, ElasticSearch lifts this burden and additionally we don't
> have to worry about

I am still parsing this information, but how well does an inverted index perform for a range query, e.g. "get me all sales for a region where sales is < 10M"?

On 12/11/15, 8:27 AM, "Sarnath" <[email protected]> wrote:

>Here is the Sunday afternoon cuppa tea that I promised. Sorry about the
>delay. I have tried to be as fair as possible and have advised a pinch of
>salt where necessary....
>
>http://www.hcltech.com/blogs/engineering-and-rd-services/olap-cubing-big-data
>
>Thanks,
>Best,
>Sarnath & Big data CoE from HCL
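
P.S. To make the range-query question above concrete, here is a minimal Python sketch of what I have in mind. Everything in it is an assumption for illustration: the rows and sales figures are made up, the function names are hypothetical, and the sorted list is only a stand-in for whatever numeric structure a real search engine uses. It is not how ElasticSearch or Kylin actually do it. The point is just that an equality filter maps naturally onto an inverted index, while "sales < 10M" needs some ordered structure on top.

```python
from bisect import bisect_left
from collections import defaultdict

# Toy fact rows: (region, product, sales). All values are invented.
ROWS = [
    ("east", "tv", 12_000_000),
    ("east", "radio", 4_000_000),
    ("west", "tv", 9_000_000),
    ("west", "phone", 15_000_000),
    ("east", "phone", 7_500_000),
]


def build_term_index(rows):
    """Inverted index: (field, value) -> set of row ids."""
    index = defaultdict(set)
    for row_id, (region, product, _sales) in enumerate(rows):
        index[("region", region)].add(row_id)
        index[("product", product)].add(row_id)
    return index


def build_sorted_sales(rows):
    """Sorted (sales, row_id) pairs; a stand-in for a numeric range index."""
    return sorted((sales, row_id) for row_id, (_r, _p, sales) in enumerate(rows))


def sales_below(sorted_sales, limit):
    """Row ids with sales < limit, located by binary search, not a full scan."""
    cut = bisect_left(sorted_sales, (limit, -1))
    return {row_id for _sales, row_id in sorted_sales[:cut]}


def query(region, limit, rows=ROWS):
    """All rows for one region whose sales are strictly below the limit."""
    term_index = build_term_index(rows)
    sorted_sales = build_sorted_sales(rows)
    # Equality on region is a posting-list lookup; the range part is a
    # separate ordered lookup; the answer is the intersection of the two.
    hits = term_index[("region", region)] & sales_below(sorted_sales, limit)
    return [rows[row_id] for row_id in sorted(hits)]


if __name__ == "__main__":
    for row in query("east", 10_000_000):
        print(row)
```

The part I am curious about is the cost of that second lookup and of the intersection when the range matches a large share of the rows.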
