Tim Armstrong created IMPALA-9875:
-------------------------------------
Summary: Deduplicate build in joins with distinct semantics
Key: IMPALA-9875
URL: https://issues.apache.org/jira/browse/IMPALA-9875
Project: IMPALA
Issue Type: Improvement
Components: Backend
Reporter: Tim Armstrong
For left semi and anti joins with only equi-join predicates, we don't need to
store duplicate build rows in the hash table: the join only checks whether a
probe row has at least one match, so the first build row with a given key is
enough. We could rework the build process in PhjBuilder so that it builds the
hash table on the fly and skips insertion into the BufferedTupleStream when
the hash table already contains a matching row. I.e. the build process would
be closer to GroupingAggregator.
Some other joins, such as the one in IMPALA-1706, also have distinct
semantics, so the same approach might apply there too and avoid exploding
joins.
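
To illustrate the idea, here is a minimal sketch in Python rather than Impala's C++ backend. It is a toy model, not the PhjBuilder implementation: the function names (build_distinct_keys, left_semi_join, left_anti_join) are hypothetical, a Python set stands in for the hash table, a list stands in for the BufferedTupleStream, and join keys are assumed to be non-NULL (SQL NULL handling is not modelled).

```python
from typing import Callable, Hashable, Iterable, List, Set, Tuple

Row = Tuple[Hashable, ...]
KeyFn = Callable[[Row], Hashable]


def build_distinct_keys(build_rows: Iterable[Row],
                        key_of: KeyFn) -> Tuple[Set[Hashable], List[Row]]:
    """Build phase: probe the hash table while building and store a row
    only if its join key has not been seen yet."""
    seen: Set[Hashable] = set()   # stand-in for the hash table
    stored: List[Row] = []        # stand-in for the buffered build rows
    for row in build_rows:
        key = key_of(row)
        if key in seen:
            continue              # duplicate key: skip the insertion
        seen.add(key)
        stored.append(row)
    return seen, stored


def left_semi_join(probe_rows: Iterable[Row], build_keys: Set[Hashable],
                   key_of: KeyFn) -> List[Row]:
    """Emit each probe row that has at least one build match."""
    return [row for row in probe_rows if key_of(row) in build_keys]


def left_anti_join(probe_rows: Iterable[Row], build_keys: Set[Hashable],
                   key_of: KeyFn) -> List[Row]:
    """Emit each probe row that has no build match."""
    return [row for row in probe_rows if key_of(row) not in build_keys]


if __name__ == "__main__":
    first_col: KeyFn = lambda row: row[0]
    build = [(1, "a"), (1, "b"), (2, "c"), (1, "d")]
    probe = [(1, "x"), (3, "y"), (2, "z")]
    keys, stored = build_distinct_keys(build, first_col)
    print(len(stored), "of", len(build), "build rows stored")
    print(left_semi_join(probe, keys, first_col))
    print(left_anti_join(probe, keys, first_col))
```

In this model the join output is the same whether or not duplicates are kept, because both join types only test for the existence of a match; the saving is in how many build rows are stored.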