Hi Fabian,

Should we close this issue then? Or I could just leave a comment explaining why we can't fix NOT IN at the moment, so that no one else does the same research again. Perhaps the Calcite team will change the logical plan for NOT IN and we can return to this issue.
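(An aside for whoever finds this issue later, since the point of the comment would be to save them the research: the cost problem is easy to see if you evaluate NOT IN over a subquery result by hand. The sketch below is plain Java, not Flink or Calcite code; the class and method names are invented for illustration.)

```java
import java.util.Arrays;
import java.util.List;

/**
 * Toy illustration (NOT Flink/Calcite code) of evaluating
 * "value NOT IN (subquery)" under SQL three-valued logic.
 * Note the nested loop: in the worst case every left row is compared
 * with every subquery row, i.e. the cross-product-sized work that the
 * planner represents as a cross join.
 */
public class NotInSemantics {

    /** Returns Boolean.TRUE, Boolean.FALSE, or null (SQL UNKNOWN). */
    static Boolean notIn(Integer value, List<Integer> subqueryResult) {
        if (value == null) {
            return null; // NULL NOT IN (...) is UNKNOWN
        }
        boolean sawNull = false;
        for (Integer item : subqueryResult) {
            if (item == null) {
                sawNull = true;       // comparison with NULL is UNKNOWN
            } else if (item.equals(value)) {
                return Boolean.FALSE; // definite match: NOT IN is FALSE
            }
        }
        // No match: UNKNOWN if the subquery produced a NULL, else TRUE.
        return sawNull ? null : Boolean.TRUE;
    }

    public static void main(String[] args) {
        System.out.println(notIn(3, Arrays.asList(1, 2)));       // true
        System.out.println(notIn(2, Arrays.asList(1, 2)));       // false
        System.out.println(notIn(3, Arrays.asList(1, 2, null))); // null (row filtered out)
    }
}
```

The NULL handling is also why NOT IN cannot always be rewritten as a plain anti-join: one NULL in the subquery result makes the predicate UNKNOWN for every unmatched row.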
Regards,
Alexander

-----Original Message-----
From: Fabian Hueske [mailto:fhue...@gmail.com]
Sent: Friday, November 11, 2016 2:04 AM
To: dev@flink.apache.org
Subject: Re: [FLINK-4541] Support for SQL NOT IN operator

> Hi Alexander,
>
> Thanks for looking into this issue!
>
> We did not support CROSS JOIN on purpose, because the general case is very
> expensive to compute. Also, as you noticed, we would have to make sure that
> inner joins are preferred over cross joins (if possible). Cost-based
> optimizers (such as Calcite's Volcano Planner) use cost estimates for that.
> So in principle, the approach of assigning high costs was correct. In the
> case of Flink we have to be careful though, because we do not have good
> estimates for the sizes of our inputs (actually, we do not have any at the
> moment...). So even tweaked cost functions might bite us.
>
> I would actually prefer not to add support for NOT IN (except for the simple
> case of a list of a few literals) at this point. Adding a CrossJoin
> translation rule would also add support for arbitrary cross joins, which I
> would like to avoid. With support for cross joins it is possible to write,
> by accident, queries that run for days and produce vast amounts of data,
> filling up all disks.
>
> I also think there are other issues which are more important to address and
> would have a bigger impact than support for NOT IN.
>
> Best, Fabian
>
> 2016-11-10 11:26 GMT+01:00 Alexander Shoshin <alexander_shos...@epam.com>:
>
> > Hi,
> >
> > I am working on the FLINK-4541 issue and these are my current changes:
> > https://github.com/apache/flink/compare/master...AlexanderShoshin:FLINK-4541
> >
> > I found that NOT IN does not work with nested queries because of a
> > missing DataSet planner rule for a cross join. After adding
> > DataSetCrossJoinRule, several tests from
> > org.apache.flink.api.scala.batch.ExplainTest (testJoinWithExtended and
> > testJoinWithoutExtended) that check execution plans started to fail,
> > because the VolcanoPlanner began building new execution plans for them
> > using DataSetCrossJoin. That is why I increased the DataSetCrossJoin
> > cost (in its computeSelfCost(...) function) to avoid its usage where
> > possible. But this does not seem to be a good idea.
> >
> > Do you have any ideas how to solve the problem with the
> > testJoinWithExtended and testJoinWithoutExtended tests in another way?
> > Is it correct that these tests check an execution plan instead of a
> > query result?
> >
> > Regards,
> > Alexander
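To make the planner discussion above concrete, here is a toy model of Volcano-style cost-based plan selection. It is not Calcite's actual API (real code would go through computeSelfCost(...) and a RelOptCost object); the class names and cost formulas are invented for illustration only.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/**
 * Toy model (NOT Calcite's Volcano planner) of cost-based plan choice.
 * A cross join is priced by the size of the cross product, so it loses
 * to an equi-join whenever the row-count estimates are sane -- which is
 * exactly the problem: Flink had no reliable input-size estimates, so
 * an inflated cross-join cost could still pick the wrong plan.
 */
public class CostModelSketch {

    static final class Plan {
        final String name;
        final double cost;
        Plan(String name, double cost) { this.name = name; this.cost = cost; }
    }

    /** Picks the cheapest candidate, as a Volcano-style planner would. */
    static Plan cheapest(List<Plan> candidates) {
        return candidates.stream()
                .min(Comparator.comparingDouble(p -> p.cost))
                .get();
    }

    public static void main(String[] args) {
        double leftRows = 1_000, rightRows = 1_000;
        // Hypothetical cost formulas: the cross join materializes the full
        // cross product; a hash equi-join is roughly linear in both inputs.
        Plan cross = new Plan("DataSetCrossJoin", leftRows * rightRows);
        Plan hash  = new Plan("DataSetJoin(hash)", leftRows + rightRows);
        System.out.println(cheapest(Arrays.asList(cross, hash)).name);
        // -> DataSetJoin(hash)
    }
}
```

With made-up row counts like these the equi-join always wins, which is why the ExplainTest plans change as soon as a cross-join alternative with a merely "high" (rather than prohibitive) cost enters the search space.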