Hi, devs,

I'd like to start a discuss about adding an option called
"table.oprimizer.busy-join-reorder-threshold" for planner rule while we try
to introduce a new busy join reorder rule[1] into Flink.

This join reorder rule is based on dynamic programing[2], which can store
all possible intermediate results, and the cost model can be used to select
the optimal join reorder result. Compare with the existing Lopt join
reorder rule, the new rule can give more possible results and the result
can be more accurate. However, the search space of this rule will become
very large as the number of tables increases. So we should introduce an
option to limit the expansion of search space, if the number of table can
be reordered less than the threshold, the new busy join reorder rule is
used. On the contrary, the Lopt rule is used.

The default threshold intended to be set to 12. One reason is that in the
tpc-ds benchmark test, when the number of tables exceeds 12, the
optimization time will be very long. The other reason is that it refers to
relevant engines, like Spark, whose recommended setting is 12.[3]

Looking forward to your feedback.

[1]  https://issues.apache.org/jira/browse/FLINK-30376
[2]
https://courses.cs.duke.edu/compsci516/cps216/spring03/papers/selinger-etal-1979.pdf
[3]
https://spark.apache.org/docs/3.3.1/configuration.html#runtime-sql-configuration

Best regards,
Yunhong Zheng

Reply via email to