[
https://issues.apache.org/jira/browse/IMPALA-10098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17183482#comment-17183482
]
Shant Hovsepian commented on IMPALA-10098:
------------------------------------------
[~tarmstrong] We have seen the full range of cardinalities. In TPC-DS, a
common pattern is to find all "transactions not returned", which is often an
ANTI JOIN or LEFT JOIN between two fact tables. The return rate is around 1%
in this synthetic case, so at a 30TB scale factor the cardinality is close to
100M. TPC-DS also has cases with item dimensions and NOT IN, where the
cardinality is on the order of hundreds to thousands of unique values.
> Runtime Filters for Set Exclusion or Complement
> -----------------------------------------------
>
> Key: IMPALA-10098
> URL: https://issues.apache.org/jira/browse/IMPALA-10098
> Project: IMPALA
> Issue Type: New Feature
> Reporter: Shant Hovsepian
> Priority: Major
> Labels: runtime-filters
>
> It would be beneficial to extend runtime filters to push set exclusion down
> to scan nodes. This would be used to optimize NOT IN and EXCEPT style queries,
> or more generally ANTI JOINs, as well as OUTER JOINs which filter out non-null
> attributes from the nullable side.
> This is almost the inverse operation of a traditional Bloom filter; other
> data structures might be more efficient.
> This would also complement Impala's left-deep pipelined query planning very
> well for what would otherwise require complex query plans due to reordering
> restrictions with ANTI/OUTER joins.
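
To illustrate why exclusion is "almost the inverse" of a Bloom filter: a Bloom
filter can report false positives, which is harmless for inclusion (a few extra
rows pass the scan and are dropped by the join) but unsafe for exclusion (a
false positive would drop a row that belongs in the result). The sketch below
is not Impala code. It assumes an exact set of build-side keys as one possible
data structure, NOT EXISTS-style anti-join semantics (NULL keys never match),
and hypothetical names throughout (ExactExclusionFilter, may_survive,
scan_with_exclusion, max_keys).

```python
from typing import Hashable, Iterable, Iterator, Optional, Tuple

Row = Tuple[Optional[Hashable], ...]


class ExactExclusionFilter:
    """Hypothetical exclusion filter holding an exact set of build-side keys.

    Unlike a Bloom filter, membership here has no false positives, so a scan
    can safely drop every row whose key is found in the set.
    """

    def __init__(self, build_keys: Iterable[Optional[Hashable]],
                 max_keys: int = 1_000_000) -> None:
        keys = set()
        self.enabled = True
        for key in build_keys:
            if key is None:
                continue  # assumption: NULL keys never match in the anti join
            keys.add(key)
            if len(keys) > max_keys:
                # Too many distinct keys to ship to the scan: disable the
                # filter so the scan passes every row and the join decides.
                keys.clear()
                self.enabled = False
                break
        self.keys = frozenset(keys)

    def may_survive(self, key: Optional[Hashable]) -> bool:
        """Return False only when the row is certain to be excluded."""
        if not self.enabled:
            return True
        return key not in self.keys


def scan_with_exclusion(rows: Iterable[Row], key_index: int,
                        flt: ExactExclusionFilter) -> Iterator[Row]:
    """Yield only the rows that the exclusion filter cannot rule out."""
    for row in rows:
        if flt.may_survive(row[key_index]):
            yield row


# "Transactions not returned": sales rows whose ticket id has no return.
sales = [(1, "a"), (2, "b"), (3, "c"), (None, "d")]
returned_ticket_ids = [2, None]
flt = ExactExclusionFilter(returned_ticket_ids)
not_returned = list(scan_with_exclusion(sales, 0, flt))
```

An exact set fits the NOT IN cases with hundreds to thousands of values; at
cardinalities near 100M the max_keys cutoff in this sketch would simply disable
the filter, which is where a more compact structure would be needed.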