> > Also, if SQL parsing is CPU intensive, it should not really take 100ms
> > unless some IO is being performed...
>
> It isn't the parsing. It is the combinatoric explosion in the optimizer.
Hmmm... Sorry, that went over my head. The Calcite optimizer? In the query
above, which is a simple group by, what is there to optimize? One simply
needs to scan HBase and return the data. Pardon my ignorance here; I would
really like to understand this part. Can you educate me on query
optimization and what role it plays here?

> Yes. The issue is that you have aggregations across many combinations of
> variables.
>
> That can mean that the number of rows in the cube datastores can be a
> significant fraction of the size of the original data. In fact, you could
> cause it to be much bigger than the original data (not that such a thing
> would make much sense).

Yes, I get this part. The cardinalities of the dimensions get multiplied,
so it is technically possible to end up with a really huge cube (I have
put some rough numbers on this below).

> The number of similar aggregations also tends to increase the complexity of
> the query optimization, although there are good guarantees on how this
> complexity will grow.

Umm... I assume you are talking about the query phase rather than the cube
build phase. I don't understand how query optimization can depend on the
number of aggregations present in the cube. My naive thought process is:
just look at what is being grouped by and which metric is being asked for;
that tells you how to search and fetch the results from HBase (I have
sketched that mental model below too). So I really can't wrap my head
around the optimization.
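To check that I have the cube-size part right, here is a rough
back-of-the-envelope sketch. The dimension names and cardinalities are
made up for illustration, not taken from any real cube: with N dimensions,
a full cube materializes 2^N cuboids, and summing the products of the
dimension cardinalities over all cuboids gives an upper bound on the
number of aggregated rows.

    from itertools import combinations
    from math import prod

    # Hypothetical per-dimension cardinalities, purely for illustration.
    cardinalities = {"country": 50, "city": 2000, "category": 100, "day": 365}

    dims = list(cardinalities)
    total_rows = 0
    for r in range(len(dims) + 1):
        for cuboid in combinations(dims, r):
            # Upper bound on distinct group-by keys for this cuboid;
            # prod() of the empty cuboid is 1 (the single grand-total row).
            total_rows += prod(cardinalities[d] for d in cuboid)

    print(f"{2 ** len(dims)} cuboids, up to {total_rows:,} aggregated rows")
    # 16 cuboids, up to 3,772,417,266 aggregated rows

So even four modest dimensions can, in the worst case, materialize billions
of aggregated rows, far more than the raw data. That part I follow.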
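And here is my naive mental model of answering a query, as a sketch. This
is just how I imagine it, not Kylin's or Calcite's actual code: pick the
smallest materialized cuboid whose dimensions cover the query's GROUP BY
columns, then scan it from HBase.

    def pick_cuboid(group_by, cuboids):
        """cuboids maps each materialized dimension set (a frozenset of
        column names) to its row count; return the cheapest cuboid that
        covers all of the GROUP BY columns."""
        candidates = [dims for dims in cuboids if group_by <= dims]
        return min(candidates, key=cuboids.get)

    # Hypothetical materialized cuboids and their row counts.
    cuboids = {
        frozenset({"country"}): 50,
        frozenset({"country", "city"}): 100_000,
        frozenset({"country", "city", "day"}): 36_500_000,
    }
    print(pick_cuboid({"country"}, cuboids))  # frozenset({'country'})

If it really were this simple, the lookup would be cheap, so I am clearly
missing where the combinatoric explosion in the optimizer comes from.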
Best,
Sarnath