Tim Armstrong created IMPALA-9875:
-------------------------------------

             Summary: Deduplicate build in joins with distinct semantics
                 Key: IMPALA-9875
                 URL: https://issues.apache.org/jira/browse/IMPALA-9875
             Project: IMPALA
          Issue Type: Improvement
          Components: Backend
            Reporter: Tim Armstrong


For left semi and anti joins with only equi-join predicates, we don't need to 
store duplicates in the hash table, because a probe row only needs to find one 
matching build row; further build rows with the same key cannot change the 
result. We could rework the build process in PhjBuilder so that it builds the 
hash table on the fly and skips insertion into the BufferedTupleStream when 
the row's key is already in the hash table. I.e. the build process would be 
closer to GroupingAggregator.
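
A minimal sketch of the idea, in Python rather than Impala's C++ and not based 
on the actual PhjBuilder code. The names build_distinct, left_semi_join and 
left_anti_join are hypothetical, a set stands in for the hash table, a list 
stands in for the BufferedTupleStream, and join keys are assumed to be 
non-NULL:

```python
from typing import Callable, Hashable, Iterable, List, Set, Tuple

Row = Tuple  # hypothetical row representation, for illustration only


def build_distinct(
    build_rows: Iterable[Row], key_of: Callable[[Row], Hashable]
) -> Tuple[Set[Hashable], List[Row]]:
    """Build phase that stores at most one row per join key."""
    seen: Set[Hashable] = set()  # stand-in for the hash table
    stored: List[Row] = []       # stand-in for the BufferedTupleStream
    for row in build_rows:
        key = key_of(row)
        if key in seen:
            # Key already present: skip the insert, the row is never stored.
            continue
        seen.add(key)
        stored.append(row)
    return seen, stored


def left_semi_join(
    probe_rows: Iterable[Row], keys: Set[Hashable], key_of: Callable[[Row], Hashable]
) -> List[Row]:
    """Emit each probe row whose key has at least one build match."""
    return [row for row in probe_rows if key_of(row) in keys]


def left_anti_join(
    probe_rows: Iterable[Row], keys: Set[Hashable], key_of: Callable[[Row], Hashable]
) -> List[Row]:
    """Emit each probe row whose key has no build match (keys assumed non-NULL)."""
    return [row for row in probe_rows if key_of(row) not in keys]
```

In this sketch the number of stored build rows is bounded by the number of 
distinct keys rather than by the build input size, which is the saving the 
proposal is after.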

Some other joins like that in IMPALA-1706 also have distinct semantics, so 
maybe this could be applied there too to avoid exploding joins.


