Re: Group by + where clause

Sarnath Sun, 15 Nov 2015 08:59:57 -0800

To answer other questions...

>Also, Kylin is premised on cubes not fitting in memory
> while ES is pretty optimistically assuming they will fit in memory


HBase also tries to keep everything in memory internally, and so does ES.
How the tools juggle memory and disk depends entirely on the
infrastructure, so there is not much to discuss on these lines. That said,
does Kylin do optimizations beyond what HBase provides?
Also, I learned from my colleague that Kylin has a limit of 4 million
aggregations. Is that true?

> How do you handle queries that do indirect references to cubes (say by
> asking for a rollup by region when you only cubed by city)?
>

Our engine does not understand/enforce hierarchies. However, it gives an
option to construct drill down aggregations by using whatever dimensions
the user specifies. Due to this sense of detachment, I don't think we will
ever grow up to smartly combine cube information with underlying
hierarchical relationships.
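
To illustrate what I mean by drill-down aggregations over user-specified
dimensions, here is a minimal sketch in Python. It is not our engine's code;
the function name `drill_down`, the field names, and the sample rows are all
hypothetical. It simply sums a measure at each level of whatever dimension
order the user passes in, without knowing that a city belongs to a region.

```python
from collections import defaultdict

def drill_down(rows, dims, measure):
    """Sum `measure` at each drill-down level: dims[:1], dims[:2], ...

    The dimension order comes from the caller; no hierarchy is assumed.
    """
    levels = {}
    for depth in range(1, len(dims) + 1):
        key_dims = tuple(dims[:depth])
        totals = defaultdict(float)
        for row in rows:
            totals[tuple(row[d] for d in key_dims)] += row[measure]
        levels[key_dims] = dict(totals)
    return levels

# Hypothetical sample data; each row already carries both dimensions.
rows = [
    {"region": "West", "city": "SF", "sales": 10.0},
    {"region": "West", "city": "LA", "sales": 5.0},
    {"region": "East", "city": "NYC", "sales": 7.0},
]
result = drill_down(rows, ["region", "city"], "sales")
```

A rollup by region works in this sketch only because every row carries a
region value. If only city had been stored, nothing here could derive the
region, which is the detachment described above.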

Using ES relieves us of many things. We don't have to worry about
compression, a REST API to search cubes, or the actual search process, which
lets us operate at a very high level.

> And do you automatically decide which cubes to use?

I think this is inferred from the GROUP BY clause. I think Kylin uses a
bitmask at the front of the row key to mark the dimensions that are being
aggregated. Similarly, we use a field in each ES document to specify what
type of aggregation it holds.
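
A rough sketch of the idea in Python, under stated assumptions: the dimension
list, the `agg_type` field name, and the helper functions are hypothetical and
are neither Kylin's row-key layout nor our actual ES mapping. One bit is
assigned per dimension, the GROUP BY columns are turned into a mask, and that
mask is stored in each document so a query can pick the matching aggregation.

```python
# Hypothetical, fixed ordering of dimensions; bit i stands for DIMENSIONS[i].
DIMENSIONS = ["region", "city", "product"]

def cuboid_mask(group_by):
    """Return a bitmask with one bit set per dimension in the GROUP BY."""
    mask = 0
    for dim in group_by:
        mask |= 1 << DIMENSIONS.index(dim)
    return mask

def tag_document(group_by, values, measures):
    """Build a document that records which aggregation it holds."""
    return {"agg_type": cuboid_mask(group_by), **values, **measures}

def select_documents(docs, group_by):
    """Keep only the documents pre-aggregated on exactly these dimensions."""
    wanted = cuboid_mask(group_by)
    return [doc for doc in docs if doc["agg_type"] == wanted]

docs = [
    tag_document(["region"], {"region": "West"}, {"sales": 15.0}),
    tag_document(["region", "product"],
                 {"region": "West", "product": "A"}, {"sales": 9.0}),
]
by_region = select_documents(docs, ["region"])
```

The order of columns in the GROUP BY does not matter here, since the mask
depends only on which dimensions are present.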

Btw, does Kylin enable run-length encoding for row keys in HBase? I think
that could save a lot of space on disk (but not in memory, I think).
