Re: Running cartesian joins on Drill

Shadi Khalifa Mon, 08 May 2017 13:55:00 -0700

Hi Muhammad,
I did the following as a workaround to have Cartesian product. The basic idea 
is to create a dummy column on the fly that has the value 1 in both tables and 
then join on that column leading to having a match of every row of the first 
table with every row of the second table, hence do a Cartesian product. This 
might not be the most efficient way but it will do the job.
Original Query:SELECT * FROM ( SELECT 'ABC' `UserID` FROM 
`dfs`.`path_to_parquet_file` tc LIMIT 2147483647) `t0` INNER JOIN ( SELECT 
'ABC' `UserID` FROM `dfs`.`path_to_parquet_file` tc LIMIT 2147483647) `t1` ON 
(`t0`.`UserID` IS NOT DISTINCT FROM `t1`.`UserID`) LIMIT 2147483647
Workaround (add columns d1a381f3g73 and d1a381f3g74 to tables one and two, 
respectively. Names don't really matter, just need to be unique):SELECT * FROM 
( SELECT 1 as d1a381f3g73, 'ABC' `UserID` FROM `dfs`.`path_to_parquet_file` tc 
LIMIT 2147483647) `t0` INNER JOIN ( SELECT 1 as d1a381f3g74, 'ABC' `UserID` 
FROM `dfs`.`path_to_parquet_file` tc LIMIT 2147483647) `t1` ON 
(`t0`.d1a381f3g73 = `t1`.d1a381f3g74)WHERE `t0`.`UserID` IS NOT DISTINCT FROM 
`t1`.`UserID`LIMIT 2147483647 Regards

 Shadi Khalifa, PhD Postdoctoral Fellow Cognitive Analytics Development Hub 
Centre for Advanced Computing Queen’s University (613) 533-6000 x78347 
http://cac.queensu.ca  I'm just a neuron in thesociety collective brain 
Join us for HPCS in June 2017! Register at:  http://2017.hpcs.ca/
 P Please consider yourenvironmental responsibility before printing this e-mail 
   01001001 00100000 01101100 01101111 0111011001100101 00100000 01000101 
01100111 01111001 01110000 01110100 
  The information transmitted is intended only forthe person or entity to which 
it is addressed and may contain confidential material.Any review or 
dissemination of this information by persons other than theintended recipient 
is prohibited. If you received this in error, please contactthe sender and 
delete the material from any computer. Thank you.

    On Saturday, May 6, 2017 6:05 PM, Muhammad Gelbana <[email protected]> 
wrote:

Here it is:

SELECT * FROM (SELECT 'ABC' `UserID` FROM `dfs`.`path_to_parquet_file` tc
LIMIT 2147483647) `t0` INNER JOIN (SELECT 'ABC' `UserID` FROM
`dfs`.`path_to_parquet_file` tc LIMIT 2147483647) `t1` ON (

`t0`.`UserID` IS NOT DISTINCT FROM

`t1`.`UserID`) LIMIT 2147483647

I debugged Drill code and found it decomposes *IS NOT DISTINCT FROM* into

*`t0`.`UserID` = `t1`.`UserID` OR (`t0`.`UserID` IS NULL && `t1`.`UserID`
IS NULL**)* while checking if the query is a cartesian join, and when the
check returns true, it throws an excetion saying: *This query cannot be
planned possibly due to either a cartesian join or an inequality join*

*---------------------*
*Muhammad Gelbana*
http://www.linkedin.com/in/mgelbana

On Sat, May 6, 2017 at 6:53 PM, Gautam Parai <[email protected]> wrote:

> Can you please specify the query you are trying to execute?
>
>
> Gautam
>
> ________________________________
> From: Muhammad Gelbana <[email protected]>
> Sent: Saturday, May 6, 2017 7:34:53 AM
> To: [email protected]; [email protected]
> Subject: Running cartesian joins on Drill
>
> Is there a reason why Drill would intentionally reject cartesian join
> queries even if *planner.enable_nljoin_for_scalar_only* is disabled ?
>
> Any ideas how could a query be rewritten to overcome this restriction ?
>
> *---------------------*
> *Muhammad Gelbana*
> http://www.linkedin.com/in/mgelbana
>

Re: Running cartesian joins on Drill

Reply via email to