Ben Becker created DRILL-173:
--------------------------------

             Summary: Join operator should reuse ValueVectors when duplicate keys are present
                 Key: DRILL-173
                 URL: https://issues.apache.org/jira/browse/DRILL-173
             Project: Apache Drill
          Issue Type: Improvement
    Affects Versions: Alpha
            Reporter: Ben Becker


There are cases where joining two record batches can result in redundant work.  
Consider a merge join performed on two tables (*t1* and *t2*) with duplicate 
keys on both sides:

h5. t1
|| key || value ||
| 2 | 'a' |
| 2 | 'b' |

h5. t2
|| key || value ||
| 2 | 'A' |
| 2 | 'B' |
| 2 | 'C' |

The resulting table will contain the cross product of all rows with key 2:

|| key || t1.value || t2.value ||
| 2 | 'a' | 'A' |
| 2 | 'a' | 'B' |
| 2 | 'a' | 'C' |
| 2 | 'b' | 'A' |
| 2 | 'b' | 'B' |
| 2 | 'b' | 'C' |

The current implementation iteratively copies t2.value from the incoming 
vectors.  Ideally, the t2.value vector would be iteratively constructed only 
on the first pass; after that, it can be copied.
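
To make the idea concrete, here is a minimal sketch of the proposed behavior. It is not Drill code: plain Java arrays stand in for ValueVectors, and the class and method names (DuplicateKeyJoinSketch, buildRightColumn, buildLeftColumn) are hypothetical. It assumes a single run of duplicate keys, such as key 2 in the tables above.

```java
import java.util.Arrays;

/** Hypothetical sketch; plain arrays stand in for Drill's ValueVectors. */
final class DuplicateKeyJoinSketch {

    /**
     * Builds the t2.value output column for one run of duplicate keys:
     * leftCount matching rows in t1 joined against every value in rightValues.
     */
    static String[] buildRightColumn(int leftCount, String[] rightValues) {
        int run = rightValues.length;
        String[] out = new String[leftCount * run];
        if (leftCount == 0 || run == 0) {
            return out;
        }
        // First pass: construct the block value by value from the incoming data.
        for (int i = 0; i < run; i++) {
            out[i] = rightValues[i];
        }
        // Later passes: bulk-copy the block that was already built.
        for (int pass = 1; pass < leftCount; pass++) {
            System.arraycopy(out, 0, out, pass * run, run);
        }
        return out;
    }

    /** Builds the t1.value output column: each left value repeated once per right row. */
    static String[] buildLeftColumn(String[] leftValues, int rightCount) {
        String[] out = new String[leftValues.length * rightCount];
        for (int i = 0; i < leftValues.length; i++) {
            Arrays.fill(out, i * rightCount, (i + 1) * rightCount, leftValues[i]);
        }
        return out;
    }

    public static void main(String[] args) {
        String[] t1 = {"a", "b"};
        String[] t2 = {"A", "B", "C"};
        System.out.println(Arrays.toString(buildLeftColumn(t1, t2.length)));
        System.out.println(Arrays.toString(buildRightColumn(t1.length, t2)));
    }
}
```

With the example tables, only the first three t2.value entries are built element by element; the remaining three come from a single block copy. Whether a real ValueVector can be block-copied this cheaply is an assumption that would need to be checked against the vector implementation.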
