From what I have heard of real-world practice (we haven't tried it in
BeamSQL yet), CBO gets worse as plan complexity increases, because in big
data processing the estimates for later stages are hard to keep close to
reality. The runtime optimization that you called out, on the other hand,
does work. The basic idea is to use the previous stage's actual result to
estimate the next stage's stats, then run that stage. The result of that
stage is in turn used to estimate the stage after it, and so on. It's still
CBO, but it never estimates more than one layer ahead.
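
To make the one-layer-at-a-time idea concrete, here is a rough Python sketch.
It is not BeamSQL or Calcite code; the names (Stage, choose_strategy,
execute_adaptively) and the row-count threshold are made up for illustration.
Each stage's strategy is picked from the row count actually observed after
the previous stage, not from an estimate made up front.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical threshold: use a broadcast join when the observed input is small.
BROADCAST_ROW_LIMIT = 1000


@dataclass
class Stage:
    name: str
    # Runs the stage on its input rows with the chosen strategy, returns output rows.
    run: Callable[[List[int], str], List[int]]


def choose_strategy(observed_input_rows: int) -> str:
    """Pick a strategy from the real size of the previous stage's output."""
    if observed_input_rows <= BROADCAST_ROW_LIMIT:
        return "broadcast"
    return "shuffle_hash"


def execute_adaptively(
    stages: List[Stage], source_rows: List[int]
) -> Tuple[List[int], List[Tuple[str, int, str]]]:
    """Run stages in order, planning only one stage ahead each time."""
    rows = source_rows
    decisions = []
    for stage in stages:
        # Stats come from the materialized result of the previous stage.
        strategy = choose_strategy(len(rows))
        decisions.append((stage.name, len(rows), strategy))
        rows = stage.run(rows, strategy)
    return rows, decisions


# Toy pipeline: a selective filter followed by a pass-through "join" stage.
stages = [
    Stage("filter", lambda rows, strategy: [r for r in rows if r % 20 == 0]),
    Stage("join", lambda rows, strategy: rows),
]
result, decisions = execute_adaptively(stages, list(range(10000)))
for name, observed, strategy in decisions:
    print(name, observed, strategy)
```

In this toy run the join stage is planned only after the filter has actually
produced its output, so the strategy choice is based on the real row count
rather than a selectivity guess.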


-Rui

On Tue, Feb 4, 2020 at 7:30 AM JiaTao Tao <[email protected]> wrote:

> With big data, does CBO really have such a big effect?
> For nodes like filter/join/aggregate, the cost is estimated.
>
> There's one case I call runtime optimizing, which means optimizing while
> calculating: you adjust your execution plan in real time based on the
> execution statistics of the previous step (like join strategy selection:
> hash, broadcast, or SMJ). But that's not CBO as in the Volcano planner.
>
> Regards!
>
> Aron Tao
>
