@chulbert is correct. You have to disable planner.enable_nljoin_for_scalar_only in order for Drill to use nested loop joins. Otherwise, it only does so when one of the join inputs is a scalar subquery, i.e., a single return row.
Drill requires disabling the parameter because nested loop joins are currently implemented as inefficient Cartesian joins, which can be very expensive. The end user needs to make an explicit choice to use them. -- Zelaine On 5/6/17, 9:17 PM, "clhub...@gmail.com" <clhub...@gmail.com> wrote: As you've observed, we have to manually enable nested loops (which cartesian joins require). alter session set `planner.enable_nljoin_for_scalar_only` = false; I am guessing this approach was taken because 1. Although ANSI compliant SQL. These type of joins are not typical in most applications. 2. To avoid the possibility of the cost based optimizer selecting a nested loop algorithm (expensive join algorithm) by mistake. If this happens a lot, it would decrease overall performance. 3. Simple enough workaround (manually enabling/disabling) until the logic is developed to automatically enable nested loops when needed. If anyone knows for "sure", please chime in. OLAP Engines like Modrian have already been reported to generate queries that need Cartesian joins. https://issues.apache.org/jira/browse/DRILL-2915 Hopefully this helps... For my use case, it's no big deal to enable it manually. If performance ever becomes an issue, I can toggle it off and on as needed. I asked a similar question.. If you check back in the archives for this list (12/2016) you will see the details. On Sat, May 6, 2017 at 3:05 PM, Muhammad Gelbana <m.gelb...@gmail.com> wrote: > > Here it is: > > SELECT * FROM (SELECT 'ABC' `UserID` FROM `dfs`.`path_to_parquet_file` tc > LIMIT 2147483647) `t0` INNER JOIN (SELECT 'ABC' `UserID` FROM > `dfs`.`path_to_parquet_file` tc LIMIT 2147483647) `t1` ON ( > > `t0`.`UserID` IS NOT DISTINCT FROM > > `t1`.`UserID`) LIMIT 2147483647 > > I debugged Drill code and found it decomposes *IS NOT DISTINCT FROM* into > > *`t0`.`UserID` = `t1`.`UserID` OR (`t0`.`UserID` IS NULL && `t1`.`UserID` > IS NULL**)* while checking if the query is a cartesian join, and when the > check returns true, it throws an excetion saying: *This query cannot be > planned possibly due to either a cartesian join or an inequality join* > > > *---------------------* > *Muhammad Gelbana* > http://www.linkedin.com/in/mgelbana > > On Sat, May 6, 2017 at 6:53 PM, Gautam Parai <gpa...@mapr.com> wrote: > > > Can you please specify the query you are trying to execute? > > > > > > Gautam > > > > ________________________________ > > From: Muhammad Gelbana <m.gelb...@gmail.com> > > Sent: Saturday, May 6, 2017 7:34:53 AM > > To: u...@drill.apache.org; dev@drill.apache.org > > Subject: Running cartesian joins on Drill > > > > Is there a reason why Drill would intentionally reject cartesian join > > queries even if *planner.enable_nljoin_for_scalar_only* is disabled ? > > > > Any ideas how could a query be rewritten to overcome this restriction ? > > > > *---------------------* > > *Muhammad Gelbana* > > http://www.linkedin.com/in/mgelbana > > >