I believe clhubert is referring to this discussion <http://drill-user.incubator.apache.narkive.com/TIXWiTY4/cartesian-product-in-apache-drill#post1> .
So why Drill doesn't transform this query into a nested join query ? Simply because there is no Calcite rule to transform it into a nested loop join ? Is it not technically possible to write such Rule or is it feasible so I may take on this challenge ? Also pardon me for repeating my question but I fail to find an answer in your replies, why doesn't Drill just run a cartesian join ? Because it's expensive regarding resources (i.e. CPU\Network\RAM) ? Thanks a lot Shadi for the query, it works for me. *---------------------* *Muhammad Gelbana* http://www.linkedin.com/in/mgelbana On Mon, May 8, 2017 at 6:10 AM, Shadi Khalifa <[email protected]> wrote: > Hi Muhammad, > > I did the following as a workaround to have Cartesian product. The basic > idea is to create a dummy column on the fly that has the value 1 in both > tables and then join on that column leading to having a match of every row > of the first table with every row of the second table, hence do a Cartesian > product. This might not be the most efficient way but it will do the job. > > *Original Query:* > SELECT * FROM > ( SELECT 'ABC' `UserID` FROM `dfs`.`path_to_parquet_file` tc LIMIT > 2147483647) `t0` > INNER JOIN > ( SELECT 'ABC' `UserID` FROM `dfs`.`path_to_parquet_file` tc LIMIT > 2147483647) `t1` > ON (`t0`.`UserID` IS NOT DISTINCT FROM `t1`.`UserID`) > LIMIT 2147483647 > > *Workaround (add columns **d1a381f3g73 and **d1a381f3g74 to tables one > and two, respectively. Names don't really matter, just need to be unique):* > SELECT * FROM > ( SELECT *1 as d1a381f3g73*, 'ABC' `UserID` FROM > `dfs`.`path_to_parquet_file` tc LIMIT 2147483647) `t0` > INNER JOIN > ( SELECT *1 as d1a381f3g74*, 'ABC' `UserID` FROM > `dfs`.`path_to_parquet_file` tc LIMIT 2147483647) `t1` > ON (`t0`.*d1a381f3g73 = *`t1`.*d1a381f3g74*) > WHERE `t0`.`UserID` IS NOT DISTINCT FROM `t1`.`UserID` > LIMIT 2147483647 > > Regards > > > *Shadi Khalifa, PhD* > Postdoctoral Fellow > Cognitive Analytics Development Hub > Centre for Advanced Computing > Queen’s University > (613) 533-6000 x78347 > http://cac.queensu.ca > > I'm just a neuron in the society collective brain > > *Join us for HPCS in June 2017! Register at:* *http://2017.hpcs.ca/ > <http://2017.hpcs.ca/>* > > P Please consider your environmental responsibility before printing this > e-mail > > *01001001 00100000 01101100 01101111 01110110 01100101 00100000 01000101 > 01100111 01111001 01110000 01110100 * > > *The information transmitted is intended only for the person or entity to > which it is addressed and may contain confidential material. Any review or > dissemination of this information by persons other than the intended > recipient is prohibited. If you received this in error, please contact the > sender and delete the material from any computer. Thank you.* > > > > On Saturday, May 6, 2017 6:05 PM, Muhammad Gelbana <[email protected]> > wrote: > > > > Here it is: > > SELECT * FROM (SELECT 'ABC' `UserID` FROM `dfs`.`path_to_parquet_file` tc > LIMIT 2147483647) `t0` INNER JOIN (SELECT 'ABC' `UserID` FROM > `dfs`.`path_to_parquet_file` tc LIMIT 2147483647) `t1` ON ( > > `t0`.`UserID` IS NOT DISTINCT FROM > > `t1`.`UserID`) LIMIT 2147483647 > > I debugged Drill code and found it decomposes *IS NOT DISTINCT FROM* into > > *`t0`.`UserID` = `t1`.`UserID` OR (`t0`.`UserID` IS NULL && `t1`.`UserID` > IS NULL**)* while checking if the query is a cartesian join, and when the > check returns true, it throws an excetion saying: *This query cannot be > planned possibly due to either a cartesian join or an inequality join* > > > *---------------------* > *Muhammad Gelbana* > http://www.linkedin.com/in/mgelbana > > On Sat, May 6, 2017 at 6:53 PM, Gautam Parai <[email protected]> wrote: > > > Can you please specify the query you are trying to execute? > > > > > > Gautam > > > > ________________________________ > > From: Muhammad Gelbana <[email protected]> > > Sent: Saturday, May 6, 2017 7:34:53 AM > > To: [email protected]; [email protected] > > Subject: Running cartesian joins on Drill > > > > Is there a reason why Drill would intentionally reject cartesian join > > queries even if *planner.enable_nljoin_for_scalar_only* is disabled ? > > > > Any ideas how could a query be rewritten to overcome this restriction ? > > > > *---------------------* > > *Muhammad Gelbana* > > http://www.linkedin.com/in/mgelbana > > > > >
