Echoing Ted's comments: second or sub-second latency for OLAP is good enough, since it is for humans interacting with the data source. Millisecond-level latency is more of a requirement for OLTP or monitoring systems.
Beyond query latency, would you like to share your cube build time as well? BTW, which company and product are you from? Best Regards!
---------------------
Luke Han

On Tue, Nov 17, 2015 at 12:22 AM, Ted Dunning <[email protected]> wrote:

> On Tue, Nov 17, 2015 at 12:59 AM, Sarnath <[email protected]> wrote:
>
> > > > Also, if SQL parsing is CPU intensive, it should not really take
> > > > 100ms unless some IO is being performed...
> > >
> > > It isn't the parsing. It is the combinatoric explosion in the
> > > optimizer.
> >
> > Hmmm... Sorry, that went over my head. The Calcite optimizer? In the
> > query above, which is a simple group by, what is there to optimize? One
> > simply needs to scan HBase and return the data. Pardon my ignorance
> > here. I would really like to understand this part. Can you educate me
> > on query optimization, and what role does it play?
>
> This particular case is quite simple. But if you had specified several
> elements to be grouped by, and possibly had some other constraints, then
> the optimizer comes into play.
>
> Moreover, because Calcite and similar systems focus on analytical
> queries, where an extra 100ms is not a big concern, it was deemed
> acceptable to have a significant query startup time. This would be
> completely unacceptable in, say, an OLTP application, but isn't a big
> deal if the typical query takes seconds or more.
>
> To summarize, you pay a cost for the Calcite optimizer, even for simple
> queries.
>
> ...
>
> > > The number of similar aggregations also tends to increase the
> > > complexity of the query optimization, although there are good
> > > guarantees on how this complexity will grow.
> >
> > Umm... I assume you are talking not about the cube build phase but
> > rather the query phase.
> > I don't understand how query optimization can depend on the number of
> > aggregations present in the cube... My naive thought process is: just
> > look at what is being grouped by and what metric is being asked for.
> > That will tell you how to search and fetch results from HBase. So I
> > really can't wrap my head around the optimization.
>
> The issue is finding the optimal cuboid(s) to use to compute the query.
> This is a greatest lower bound problem on a lattice.
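Ted's last point can be sketched in a few lines. This is a toy illustration, not Kylin's actual planner: each cuboid is modeled as the set of dimensions it was pre-aggregated on, a cuboid can serve a query only if it contains every dimension the query groups by, and among those candidates the cheapest is (roughly) the one with the fewest dimensions. The function name and cube layout are made up for the example:

```python
def choose_cuboid(cuboids, query_dims):
    """Return the smallest materialized cuboid covering query_dims,
    or None if no materialized cuboid can answer the query."""
    # A cuboid can answer the query only if it retains all grouped dims.
    candidates = [c for c in cuboids if query_dims <= c]
    if not candidates:
        return None  # would have to fall back to the base cuboid / raw scan
    # Fewest dimensions ~ most aggregated ~ least data to scan.
    return min(candidates, key=len)

# A cube on dimensions {date, city, product}, with some cuboids materialized:
cuboids = [
    frozenset({"date", "city", "product"}),  # base cuboid
    frozenset({"date", "city"}),
    frozenset({"date"}),
    frozenset({"city", "product"}),
]

# SELECT ... GROUP BY date  -> served directly from the 1-dim cuboid.
print(choose_cuboid(cuboids, frozenset({"date"})))  # -> frozenset({'date'})

# SELECT ... GROUP BY product -> {date} cuboid is useless; the smallest
# cuboid still containing "product" is {city, product}.
print(choose_cuboid(cuboids, frozenset({"product"})))
```

With many aggregations, the set of candidate cuboids (and the cost comparisons among them) grows, which is one way "the number of similar aggregations tends to increase the complexity of the query optimization" as Ted describes.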
