As you've observed, we have to manually enable nested loops (which cartesian joins require).
alter session set `planner.enable_nljoin_for_scalar_only` = false; I am guessing this approach was taken because 1. Although ANSI compliant SQL. These type of joins are not typical in most applications. 2. To avoid the possibility of the cost based optimizer selecting a nested loop algorithm (expensive join algorithm) by mistake. If this happens a lot, it would decrease overall performance. 3. Simple enough workaround (manually enabling/disabling) until the logic is developed to automatically enable nested loops when needed. If anyone knows for "sure", please chime in. OLAP Engines like Modrian have already been reported to generate queries that need Cartesian joins. https://issues.apache.org/jira/browse/DRILL-2915 Hopefully this helps... For my use case, it's no big deal to enable it manually. If performance ever becomes an issue, I can toggle it off and on as needed. I asked a similar question.. If you check back in the archives for this list (12/2016) you will see the details. On Sat, May 6, 2017 at 3:05 PM, Muhammad Gelbana <[email protected]> wrote: > > Here it is: > > SELECT * FROM (SELECT 'ABC' `UserID` FROM `dfs`.`path_to_parquet_file` tc > LIMIT 2147483647) `t0` INNER JOIN (SELECT 'ABC' `UserID` FROM > `dfs`.`path_to_parquet_file` tc LIMIT 2147483647) `t1` ON ( > > `t0`.`UserID` IS NOT DISTINCT FROM > > `t1`.`UserID`) LIMIT 2147483647 > > I debugged Drill code and found it decomposes *IS NOT DISTINCT FROM* into > > *`t0`.`UserID` = `t1`.`UserID` OR (`t0`.`UserID` IS NULL && `t1`.`UserID` > IS NULL**)* while checking if the query is a cartesian join, and when the > check returns true, it throws an excetion saying: *This query cannot be > planned possibly due to either a cartesian join or an inequality join* > > > *---------------------* > *Muhammad Gelbana* > http://www.linkedin.com/in/mgelbana > > On Sat, May 6, 2017 at 6:53 PM, Gautam Parai <[email protected]> wrote: > > > Can you please specify the query you are trying to execute? > > > > > > Gautam > > > > ________________________________ > > From: Muhammad Gelbana <[email protected]> > > Sent: Saturday, May 6, 2017 7:34:53 AM > > To: [email protected]; [email protected] > > Subject: Running cartesian joins on Drill > > > > Is there a reason why Drill would intentionally reject cartesian join > > queries even if *planner.enable_nljoin_for_scalar_only* is disabled ? > > > > Any ideas how could a query be rewritten to overcome this restriction ? > > > > *---------------------* > > *Muhammad Gelbana* > > http://www.linkedin.com/in/mgelbana > > >
