Hi Alexander,

Thanks for looking into this issue!

We did not support CROSS JOIN on purpose because the general case is very
expensive to compute. Also as you noticed we would have to make sure that
inner-joins are preferred over cross joins (if possible). Cost-based
optimizers (such as Calcite's Volcano Planner) use cost estimates for that.
So in principle, the approach of assigning high costs was correct. In the
case of Flink, we have to be careful though, because we do not have good
estimates for the size of our inputs (actually we do not have any at the
moment...). So even tweaked cost functions might bit us.

I would actually prefer to not add support for NOT IN (except for the
simple case of a list of a few literals) at this point.
Adding a CrossJoin translation rule will also add support for arbitrary
cross joins, which I would like to avoid.
With support for cross joins it is possible to write by accident queries
that run for days and produce vast amounts of data filling up all disks.

I also think there are other issues, which are more important to address
and would have a bigger impact than support for NOT IN.

Best, Fabian

2016-11-10 11:26 GMT+01:00 Alexander Shoshin <alexander_shos...@epam.com>:

> Hi,
>
> I am working on FLINK-4541 issue and this is my current changes:
> https://github.com/apache/flink/compare/master...
> AlexanderShoshin:FLINK-4541.
>
> I found that NOT IN does not work with nested queries because of missed
> DataSet planner rule for a cross join. After adding DataSetCrossJoinRule
> several tests from org.apache.flink.api.scala.batch.ExplainTest
> (testJoinWithExtended and testJoinWithoutExtended) that should check
> execution plans became fail because VolcanoPlanner started to build new
> execution plans for them using DataSetCrossJoin. That's why I increased
> DataSetCrossJoin cost (in computeSelfCost(...) function) to avoid its usage
> if it is possible. But it seems to be not a good idea.
>
> Do you have any ideas how to solve a problem with testJoinWithExtended and
> testJoinWithoutExtended tests in another way? Is it correct that these
> tests check an execution plan instead of a query result?
>
> Regards,
> Alexander
>
>

Reply via email to