On Tue, Nov 17, 2015 at 12:59 AM, Sarnath <[email protected]> wrote:
> > > Also, if SQL parsing is CPU intensive, it should not really take 100ms
> > > unless some IO is being performed...
> >
> > It isn't the parsing. It is the combinatoric explosion in the optimizer.
>
> Hmmm... Sorry, that went over my head. The Calcite optimizer? In the query
> above, which is a simple group by, what is there to optimize? One simply
> needs to scan HBase and return the data. Pardon my ignorance here. I would
> really like to understand this part. Can you educate me on query
> optimization and what role it plays?

This particular case is quite simple. But if you had specified several
elements to be grouped by, and possibly had some other constraints, then
the optimizer comes into play.

Moreover, because Calcite and similar systems focus on analytical queries,
where an extra 100ms is not a big concern, it was deemed acceptable to have
a significant query startup time. That would be completely unacceptable in,
say, an OLTP application, but it isn't a big deal if the typical query takes
seconds or more.

To summarize, you pay a cost for the Calcite optimizer, even for simple
queries.

...

> > The number of similar aggregations also tends to increase the complexity
> > of the query optimization, although there are good guarantees on how this
> > complexity will grow.
>
> Umm... I assume you are talking about the query phase rather than the cube
> build phase. I don't understand how query optimization can depend on the
> number of aggregations present in the cube... My naive thought process is:
> just look at what is being grouped by and what metric is being asked for.
> That will tell you how to search and fetch results from HBase. So I really
> can't wrap my head around the optimization.

The issue is finding the optimal cuboid(s) to use to compute the query.
This is a greatest lower bound problem on a lattice.
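
To make that concrete, here is a minimal sketch of the selection problem
(this is not Kylin's actual code; the class, the dimension names, and the
bitmask encoding are made up for illustration). With N dimensions a cube has
2^N possible cuboids, each identified by the subset of dimensions it keeps; a
cuboid can answer a query iff its dimension set is a superset of the query's
group-by set, and the best choice is the least such cuboid in the lattice
ordered by set inclusion:

import java.util.List;

// Sketch of cuboid selection. Dimensions are numbered 0..63 and each
// cuboid is encoded as a bitmask over them; cuboid C can answer query Q
// iff C's dimension set is a superset of Q's group-by dimensions.
public class CuboidChooser {

    /**
     * Returns the best materialized cuboid for the query: the one with
     * the fewest dimensions among those whose dimension set covers the
     * query's group-by dimensions, or -1 if none qualifies.
     */
    static long chooseCuboid(List<Long> materialized, long queryDims) {
        long best = -1;
        for (long cuboid : materialized) {
            // Coverage test: every query dimension is present in the cuboid.
            if ((cuboid & queryDims) != queryDims) continue;
            if (best == -1 || Long.bitCount(cuboid) < Long.bitCount(best)) {
                best = cuboid;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Hypothetical cube over dimensions
        // {0: country, 1: city, 2: category, 3: date}.
        // Only some of the 2^4 = 16 cuboids are materialized.
        List<Long> materialized = List.of(
                0b1111L,   // full cuboid
                0b0011L,   // (country, city)
                0b1100L,   // (category, date)
                0b0101L);  // (country, category)

        long queryDims = 0b0001L; // GROUP BY country
        long chosen = chooseCuboid(materialized, queryDims);
        System.out.printf("chosen cuboid: %s%n", Long.toBinaryString(chosen));
        // Prints 11 -> (country, city), the smallest cuboid covering {country}.
    }
}

In a real system the tie-breaking would weigh estimated row counts rather
than just the number of dimensions, and only a subset of the 2^N cuboids is
materialized, which is what makes the search nontrivial.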
