Re: Group by + where clause

hongbin ma Tue, 17 Nov 2015 20:49:06 -0800

The observed performance gap should have nothing to do with Calcite. I
think we're missing the key factors here.

In my opinion, the key factor is the choice of storage engine. Kylin is
using HBase for now. Even though we're open to other alternatives for
cube storage, we won't easily switch to another one (like ES or Kudu)
unless we're really convinced it is a better choice.

ES might be good, but the benchmark @Sarnath provided is nowhere near a
practical case and thus isn't convincing (AFAIK no practical case has
merely two dimensions and 1M cube entries). ES's high performance on tiny
cubes does not lead to the conclusion that it is a better choice than HBase
for moderate-size cubes. Even though HBase has its burdens and draws its
share of blame, it has many nice features, like coprocessors, that help
with cube scanning.

Still, I want to emphasize that we have an open architecture that lets us
change the underlying cube storage engine whenever necessary.

On Tue, Nov 17, 2015 at 11:49 AM, Ted Dunning <[email protected]> wrote:

> On Tue, Nov 17, 2015 at 5:16 AM, Julian Hyde <[email protected]> wrote:
>
> > And before you demonize Calcite, Ted, let’s make it clear that we don’t
> > know whether Calcite is to blame here. I actually doubt that it is, because
> > AFAIK Kylin does not use Calcite for cuboid selection.
> >
>
> Julian,
>
> I hope my comments were not intended as demonization. I think that for all
> of the current applications of Calcite, there are costs and benefits that
> tip very much toward the benefit side, particularly in the context in
> question.
>
> It is a good idea to measure before attempting to change things. But I
> really don't think that getting the entire parsing and planning time faster
> than about 100 ms needs to be a goal of Calcite. It might be nice, but it
> just isn't needed in the current consumers I know about. So even if
> detailed measurement turns up Calcite taking about that long, I don't think
> that is a priority issue.
>
>
>
>
> >
> > Sarnath’s original remark was
> >
> > > if SQL parsing is CPU intensive, it should not really take 100ms
> > > unless some IO is being performed.
> >
> > SQL parsing absolutely should not take 100ms. It’s sometimes justified for
> > the whole query preparation process to take 100ms, if that’s what it takes
> > to find a “smart” way to answer the query, and the query would take a long
> > time to answer otherwise.
> >
> > But if there’s a performance problem in the query preparation process,
> > let’s log a performance bug, and identify where the time is being spent.
> >
>

-- 
Regards,

*Bin Mahone | 马洪宾*
Apache Kylin: http://kylin.io
Github: https://github.com/binmahone
