To answer the other questions...

> Also, Kylin is premised on cubes not fitting in memory
> while ES is pretty optimistically assuming they will fit in memory

HBase also tries internally to keep everything in memory, and so does ES. How each tool juggles memory and disk depends entirely on the infra, so there isn't much to discuss along these lines. That said, does Kylin do any optimizations beyond what HBase provides? Also, a colleague tells me that Kylin has a limit of 4 million aggregations... Is that true?

> How do you handle queries that do indirect references to cubes (say by
> asking for a rollup by region when you only cubed by city)?

Our engine does not understand or enforce hierarchies. However, it does give the option to construct drill-down aggregations over whatever dimensions the user specifies. Because of this detachment, I don't think we will ever grow to the point of smartly combining cube information with the underlying hierarchical relationships. Using ES relieves us of many things: we don't have to worry about compression, a REST API for searching cubes, or the actual search process, which lets us operate at a very high level.

> And do you automatically decide which cubes to use?

I think this is inferred from the GROUP BY clause. I believe Kylin uses a bitmask at the front of the row key to mark which dimensions are being aggregated. Similarly, we use a field in each ES document to specify what type of aggregation it holds...

Btw, does Kylin enable run-length encoding for row keys in HBase? It could save a lot of space on disk (though probably not in memory).
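For anyone less familiar with the bitmask idea, here is a minimal Python sketch. It assumes a fixed, hypothetical dimension order (`country`, `region`, `city`, `product`), a hypothetical row-key layout (8-byte big-endian bitmask followed by NUL-separated dimension values) and a hypothetical ES field named `agg_type`; none of this is Kylin's actual encoding or our real schema. It also shows one way a GROUP BY could be mapped to a cube: pick the smallest materialized cuboid whose bitmask covers every requested dimension.

```python
# Hypothetical fixed dimension order: bit i of a cuboid id stands for DIMENSIONS[i].
DIMENSIONS = ["country", "region", "city", "product"]


def cuboid_id(group_by):
    """Bitmask with one bit set per dimension in the GROUP BY."""
    mask = 0
    for dim in group_by:
        mask |= 1 << DIMENSIONS.index(dim)
    return mask


def row_key(group_by, values):
    """Hypothetical row key: 8-byte big-endian cuboid id, then NUL-separated values.

    Values are emitted in DIMENSIONS order so keys of one cuboid sort together.
    """
    prefix = cuboid_id(group_by).to_bytes(8, "big")
    parts = [str(values[d]).encode() for d in DIMENSIONS if d in group_by]
    return prefix + b"\x00".join(parts)


def es_doc(group_by, values, metrics):
    """ES-style document tagged with its aggregation type (hypothetical `agg_type` field)."""
    doc = {"agg_type": cuboid_id(group_by)}
    doc.update({d: values[d] for d in group_by})
    doc.update(metrics)
    return doc


def pick_cuboid(group_by, materialized):
    """Smallest materialized cuboid whose bitmask covers every GROUP BY dimension."""
    wanted = cuboid_id(group_by)
    covering = [c for c in materialized if (c & wanted) == wanted]
    if not covering:
        raise LookupError(f"no materialized cuboid covers {group_by!r}")
    # Fewest dimensions is used as a rough proxy for fewest rows to roll up.
    return min(covering, key=lambda c: bin(c).count("1"))


if __name__ == "__main__":
    materialized = [
        cuboid_id(["country", "region", "city"]),  # 0b0111 = 7
        cuboid_id(["city", "product"]),            # 0b1100 = 12
        cuboid_id(DIMENSIONS),                     # 0b1111 = 15
    ]
    print(pick_cuboid(["region"], materialized))   # prints 7
    print(row_key(["region", "city"], {"region": "EU", "city": "Paris"}))
```

Choosing the cuboid with the fewest dimensions is only a heuristic for "fewest rows to roll up at query time"; a real planner would more likely look at actual row counts per cuboid.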

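On the row-key encoding question: as far as I know, HBase's built-in key encodings (the `DATA_BLOCK_ENCODING` column-family options such as PREFIX, DIFF and FAST_DIFF) are prefix/delta schemes rather than run-length encoding. The Python sketch below illustrates only the prefix idea, not HBase's actual block format, using keys with a hypothetical 8-byte cuboid-id prefix: because keys are stored sorted, neighbours share long prefixes, so only the length of the shared part and the differing suffix need to be kept.

```python
def prefix_encode(sorted_keys):
    """Store each key as (bytes shared with the previous key, remaining suffix)."""
    encoded, prev = [], b""
    for key in sorted_keys:
        shared, limit = 0, min(len(prev), len(key))
        while shared < limit and prev[shared] == key[shared]:
            shared += 1
        encoded.append((shared, key[shared:]))
        prev = key
    return encoded


def prefix_decode(encoded):
    """Rebuild the original keys from (shared length, suffix) pairs."""
    keys, prev = [], b""
    for shared, suffix in encoded:
        key = prev[:shared] + suffix
        keys.append(key)
        prev = key
    return keys


if __name__ == "__main__":
    cuboid = (6).to_bytes(8, "big")  # hypothetical cuboid id prefix
    keys = sorted(cuboid + b"EU\x00" + city for city in [b"Paris", b"Rome", b"Riga"])
    enc = prefix_encode(keys)
    raw = sum(len(k) for k in keys)
    stored = sum(len(s) for _, s in enc)  # ignores the small per-key length field
    print(raw, stored)  # prints 46 23
    assert prefix_decode(enc) == keys
```

The saving grows with the number of keys per cuboid, since the 8-byte cuboid prefix and any shared leading dimension values are paid for only once per run of neighbouring keys.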