On Tue, Nov 17, 2015 at 12:59 AM, Sarnath <[email protected]> wrote:
> > > Also, if SQL parsing is CPU intensive, it should not really take 100ms
> > > unless some IO is being performed...
> >
> > It isn't the parsing. It is the combinatoric explosion in the optimizer.
>
> Hmmm... Sorry, that went over my head. The Calcite optimizer? In the query
> above, which is a simple group by, what is there to optimize? One simply
> needs to scan HBase and return the data. Pardon my ignorance here. I would
> really like to understand this part. Can you educate me on query
> optimization and what role it plays?

This particular case is quite simple. But if you had specified several
elements to be grouped by, and possibly had some other constraints, then
the optimizer comes into play.

Moreover, because Calcite and similar systems focus on analytical queries,
where an extra 100ms is not a big concern, it was deemed acceptable to have
a significant query startup time. That would be completely unacceptable in,
say, an OLTP application, but it isn't a big deal if the typical query takes
seconds or more.

To summarize, you pay a cost for the Calcite optimizer, even for simple
queries.

...

> > The number of similar aggregations also tends to increase the complexity
> > of the query optimization, although there are good guarantees on how this
> > complexity will grow.
>
> Umm... I assume you are talking about the query phase rather than the cube
> build phase. I don't understand how query optimization can depend on the
> number of aggregations present in the cube... My naive thought process is:
> just look at what is being grouped by and what metric is being asked for.
> That will tell you how to search and fetch results from HBase. So I really
> can't wrap my head around the optimization.

The issue is finding the optimal cuboid(s) to use to compute the query.
This is a greatest lower bound problem on a lattice.
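
To make that concrete, here is a minimal sketch of the selection problem
(this is not Kylin's actual code; the class, the dimension names, and the
bitmask encoding are made up for illustration). With N dimensions a cube has
2^N possible cuboids, each identified by the subset of dimensions it keeps; a
cuboid can answer a query iff its dimension set is a superset of the query's
group-by set, and the best choice is the least such cuboid in the lattice
ordered by set inclusion:

import java.util.List;

// Sketch of cuboid selection. Dimensions are numbered 0..63 and each
// cuboid is encoded as a bitmask over them; cuboid C can answer query Q
// iff C's dimension set is a superset of Q's group-by dimensions.
public class CuboidChooser {

    /**
     * Returns the best materialized cuboid for the query: the one with
     * the fewest dimensions among those whose dimension set covers the
     * query's group-by dimensions, or -1 if none qualifies.
     */
    static long chooseCuboid(List<Long> materialized, long queryDims) {
        long best = -1;
        for (long cuboid : materialized) {
            // Coverage test: every query dimension is present in the cuboid.
            if ((cuboid & queryDims) != queryDims) continue;
            if (best == -1 || Long.bitCount(cuboid) < Long.bitCount(best)) {
                best = cuboid;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Hypothetical cube over dimensions
        // {0: country, 1: city, 2: category, 3: date}.
        // Only some of the 2^4 = 16 cuboids are materialized.
        List<Long> materialized = List.of(
                0b1111L,   // full cuboid
                0b0011L,   // (country, city)
                0b1100L,   // (category, date)
                0b0101L);  // (country, category)

        long queryDims = 0b0001L; // GROUP BY country
        long chosen = chooseCuboid(materialized, queryDims);
        System.out.printf("chosen cuboid: %s%n", Long.toBinaryString(chosen));
        // Prints 11 -> (country, city), the smallest cuboid covering {country}.
    }
}

In a real system the tie-breaking would weigh estimated row counts rather
than just the number of dimensions, and only a subset of the 2^N cuboids is
materialized, which is what makes the search nontrivial.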
