> > Also, if SQL parsing is CPU intensive, it should not really take 100ms
> > unless some IO is being performed...
>
> It isn't the parsing. It is the combinatoric explosion in the optimizer.
Hmmm... Sorry, that went over my head. The Calcite optimizer? In the query
above, which is a simple group by, what is there to optimize? One simply
needs to scan HBase and return the data. Pardon my ignorance here; I would
really like to understand this part. Can you educate me on query
optimization and what role it plays here?

> Yes. The issue is that you have aggregations across many combinations of
> variables.
>
> That can mean that the number of rows in the cube datastores can be a
> significant fraction of the size of the original data. In fact, you could
> cause it to be much bigger than the original data (not that such a thing
> would make much sense).

Yes, I get this part. The cardinalities of the dimensions get multiplied,
so it is technically possible to end up with a really huge cube (I have
put some rough numbers on this below).

> The number of similar aggregations also tends to increase the complexity of
> the query optimization, although there are good guarantees on how this
> complexity will grow.

Umm... I assume you are talking about the query phase rather than the cube
build phase. I don't understand how query optimization can depend on the
number of aggregations present in the cube. My naive thought process is:
just look at what is being grouped by and which metric is being asked for;
that tells you how to search and fetch the results from HBase (I have
sketched that mental model below too). So I really can't wrap my head
around the optimization.
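To check that I have the cube-size part right, here is a rough
back-of-the-envelope sketch. The dimension names and cardinalities are
made up for illustration, not taken from any real cube: with N dimensions,
a full cube materializes 2^N cuboids, and summing the products of the
dimension cardinalities over all cuboids gives an upper bound on the
number of aggregated rows.

    from itertools import combinations
    from math import prod

    # Hypothetical per-dimension cardinalities, purely for illustration.
    cardinalities = {"country": 50, "city": 2000, "category": 100, "day": 365}

    dims = list(cardinalities)
    total_rows = 0
    for r in range(len(dims) + 1):
        for cuboid in combinations(dims, r):
            # Upper bound on distinct group-by keys for this cuboid;
            # prod() of the empty cuboid is 1 (the single grand-total row).
            total_rows += prod(cardinalities[d] for d in cuboid)

    print(f"{2 ** len(dims)} cuboids, up to {total_rows:,} aggregated rows")
    # 16 cuboids, up to 3,772,417,266 aggregated rows

So even four modest dimensions can, in the worst case, materialize billions
of aggregated rows, far more than the raw data. That part I follow.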
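And here is my naive mental model of answering a query, as a sketch. This
is just how I imagine it, not Kylin's or Calcite's actual code: pick the
smallest materialized cuboid whose dimensions cover the query's GROUP BY
columns, then scan it from HBase.

    def pick_cuboid(group_by, cuboids):
        """cuboids maps each materialized dimension set (a frozenset of
        column names) to its row count; return the cheapest cuboid that
        covers all of the GROUP BY columns."""
        candidates = [dims for dims in cuboids if group_by <= dims]
        return min(candidates, key=cuboids.get)

    # Hypothetical materialized cuboids and their row counts.
    cuboids = {
        frozenset({"country"}): 50,
        frozenset({"country", "city"}): 100_000,
        frozenset({"country", "city", "day"}): 36_500_000,
    }
    print(pick_cuboid({"country"}, cuboids))  # frozenset({'country'})

If it really were this simple, the lookup would be cheap, so I am clearly
missing where the combinatoric explosion in the optimizer comes from.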
Best,
Sarnath