[jira] [Created] (DRILL-5935) Hash Join projects unneeded columns

Boaz Ben-Zvi (JIRA) Mon, 06 Nov 2017 15:28:22 -0800

Boaz Ben-Zvi created DRILL-5935:
-----------------------------------

             Summary: Hash Join projects unneeded columns
                 Key: DRILL-5935
                 URL: https://issues.apache.org/jira/browse/DRILL-5935
             Project: Apache Drill
          Issue Type: Bug
          Components: Execution - Relational Operators
    Affects Versions: 1.11.0
            Reporter: Boaz Ben-Zvi
            Priority: Minor



 The Hash Join operator projects all its input columns, including unneeded 
ones, relying on (multiple) project operators downstream to remove those 
columns. This is significantly wasteful, in both time and space (as each value 
is copied individually). Instead, the Hash Join itself should not project these 
unneeded columns. 

   In the following example, the join-key columns need not be projected. 
However the two hash join operators do project them.  Another problem: The 
join-key columns are copied from BOTH sides (build and probe), which is a 
waste, as both are IDENTICAL.
   Last - the plan in this example places the first join under the _build_ side 
of the second join; and the unneeded column from the first join (the join-key) 
is taken and finally projected by the second join. 

The sample query is:
{code}
select c.c_first_name, c.c_last_name, s.ss_quantity, a.ca_city 
from dfs.`/data/json/s1/customer` c, dfs.`/data/json/s1/store_sales` s, 
dfs.`/data/json/s1/customer_address` a
where c.c_customer_sk = s.ss_customer_sk and c.c_customer_id = a.ca_address_id; 
{code}

The plan first builds on 'customer_address' and probes with 'customer', and the 
output projects all 6 columns (2 from 'a', 4 from 'c'). Then the second join 
builds on all those 6 columns from the first join, and probes from the large 
table 'store_sales', and finally all 8 columns are projected (see below). Then 
3 project operators are used to remove the unneeded columns (see attached 
profile) - hence more waste.

{code}
    public void projectBuildRecord(int buildIndex, int outIndex)
        throws SchemaChangeException
    {
        {
            vv3 .copyFromSafe(((buildIndex)& 65535), (outIndex), vv0 
[((buildIndex)>>> 16)]);
        }
        {
            vv9 .copyFromSafe(((buildIndex)& 65535), (outIndex), vv6 
[((buildIndex)>>> 16)]);
        }
        {
            vv15 .copyFromSafe(((buildIndex)& 65535), (outIndex), vv12 
[((buildIndex)>>> 16)]);
        }
        {
            vv21 .copyFromSafe(((buildIndex)& 65535), (outIndex), vv18 
[((buildIndex)>>> 16)]);
        }
        {
            vv27 .copyFromSafe(((buildIndex)& 65535), (outIndex), vv24 
[((buildIndex)>>> 16)]);
        }
        {
            vv33 .copyFromSafe(((buildIndex)& 65535), (outIndex), vv30 
[((buildIndex)>>> 16)]);
        }
    }

    public void projectProbeRecord(int probeIndex, int outIndex)
        throws SchemaChangeException
    {
        {
            vv39 .copyFromSafe((probeIndex), (outIndex), vv36);
        }
        {
            vv45 .copyFromSafe((probeIndex), (outIndex), vv42);
        }
    }
{code}
 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Created] (DRILL-5935) Hash Join projects unneeded columns

Reply via email to