Re: Is CBO really useful(you can not estimate cost very well )?

Stamatis Zampetakis Wed, 05 Feb 2020 09:36:20 -0800

Hi Aron,

When you say that the execution plan changes based on the stats of the
previous stage for me that is still cost-based optimization.
As other people mentioned cost-based optimization can be static (the plan
does not change after the initial optimization) or dynamic (the plan can
change at various intervals by receiving additional input).
Although I've not seen it in practice you could even apply the Volcano
approach on sub plans after having executed a portion of the query.


Now regarding the OOM, I think it's only a problem of in-memory processing.
You could have operators that write intermediate results to disk (all
traditional DBMS do this) instead of keeping everything in memory and in
that case you don't have to worry about that.

Speaking about dynamic optimization techniques, there is another recent
paper that comes to my mind [1] where the plan execution switches from
interpretation to compilation. The technique is quite popular in
programming languages (Java is what comes first to my mind) but less common
in DBMS.

Best,
Stamatis

[1] https://db.in.tum.de/~leis/papers/adaptiveexecution.pdf

On Wed, Feb 5, 2020 at 3:41 AM JiaTao Tao <[email protected]> wrote:

> Hi Rui. Julian
> Thanks for your reply
> Volcano/Cascade is greate(Rules do not affect each other). The problem is
> the estimated cost does really help, particularly in big data?
> The estimates may be inaccurate, like this "JOIN-FILTER-TABLSCAN", we
> estimate that the filtering rate is large and chose broadcast join but the
> really filter rate is low can leads OOM.
> We can do specific optimization like "join reorder"(maybe the only one)
> with CBO and do other optimization by RBO, the join/aggregate selection
> that we can do by the runtime. I think rules like "pushing that
> filter/aggregation through that join" are heuristics and can be done by
> RBO.
>
>
> Regards!
>
> Aron Tao
>
>
> Julian Hyde <[email protected]> 于2020年2月5日周三 上午5:50写道：
>
> > Runtime optimization is important too. Examples include the DB’s buffer
> > cache (predicting future disk accesses based on previous accesses using
> > e.g. LRU2) and self-tuning algorithms such as hybrid hash join (which
> > figures out what key to use to partition, and which side of a join to
> > build, based on data it sees while it is running).
> >
> > It’s not either/or. You have to have both planning time and runtime
> > optimization.
> >
> > Stats need to get a lot better. Especially if you are optimizing N-way
> > joins for N > 5. I’m a big fan of the “How good are optimizers, really?”
> > paper and the join-order benchmark. And approaches such as materializing
> > intermediate result sets and re-planning.
> >
> > But even if your stats are not great, CBO is an organizing principle for
> > an optimizer. Is it worth pushing that filter through that join? Cost
> > before and after, and see which is better. Is it still worth it if I push
> > down the aggregation? Again, run the costs. Without an organizing
> > principle, every optimization (transform rule) has to be aware of every
> > other rule, and that leads to a software architecture that is too tightly
> > coupled, and eventually it is not possible to add a new feature without
> > breaking existing functionality.
> >
> > My definition of a “cost-based optimizer” is one that is
> nondeterministic.
> > By which I mean, it can investigate multiple paths, thereby avoiding
> local
> > minima.
> >
> > Therefore I am a fan of simple, crappy stats such as estimated row count.
> >
> > Julian
> >
> > [1] http://www.vldb.org/pvldb/vol9/p204-leis.pdf <
> > http://www.vldb.org/pvldb/vol9/p204-leis.pdf>
> >
> > > On Feb 4, 2020, at 7:30 AM, JiaTao Tao <[email protected]> wrote:
> > >
> > > Under big data, does CBO have such a big effect?
> > > Node like filter/join/aggregate, their cost is estimated.
> > >
> > > There's one case, I call it runtime optimizing, it means optimizing
> > while c
> > > alculating, You adjust your execution plan in real-time based on the
> > > execution statistics of the previous step(like join stragety selection,
> > > hash or broadcast or SMJ), but it's not CBO, like in volcano planner.
> > >
> > > Regards!
> > >
> > > Aron Tao
> >
> >
>

Re: Is CBO really useful(you can not estimate cost very well )?

Reply via email to