Riza Suminto created IMPALA-12357:
-------------------------------------
Summary: Skip scheduling runtime filter from PK-FK join with full
build scan
Key: IMPALA-12357
URL: https://issues.apache.org/jira/browse/IMPALA-12357
Project: IMPALA
Issue Type: Improvement
Components: Frontend
Reporter: Riza Suminto
Attachments: Screen Shot 2023-08-04 at 3.13.56 PM.png
A PK-FK inner join between a dimension table and a fact table is common in
queries. Often such a join has no predicate on the dimension table. In that
case, the runtime filter values coming from the dimension table scan (PK) are
likely to include all values of the fact table column (FK). Generating this
filter is ineffective because it is unlikely to reject any rows.
The attached screenshot visualizes RF 50, 52, 60, and 62, which target
49:SCAN in TPC-DS Q64. These runtime filters come from full dimension table
scans in PK-FK joins. In theory, they should not reject any probe rows.
The query profile, however, shows that they still reject some probe rows that
have NULL in the target column. Because NULL rows are few relative to non-NULL
rows, 49:SCAN deemed all of these filters ineffective and its scanners
disabled them.
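As a rough illustration of the observation above, the following toy sketch uses an exact set in place of a real runtime filter. The table contents and variable names are made up for illustration and are not taken from TPC-DS Q64 or the query profile.

```python
# Toy data (hypothetical): every non-NULL FK value exists in the dimension PK.
dim_pk = [1, 2, 3, 4, 5]                      # full build scan, no predicate
fact_fk = [1, 3, 3, None, 5, 2, 4, 4, 1, 5]   # None models SQL NULL

# An exact set stands in for the runtime filter built from the build side.
runtime_filter = set(dim_pk)

# NULL never matches in an equi-join, so NULL probe rows are the only rejects.
rejected = [v for v in fact_fk if v is None or v not in runtime_filter]
reject_ratio = len(rejected) / len(fact_fk)

print(rejected)       # [None]
print(reject_ratio)   # 0.1
```

With only one NULL in ten probe rows, the filter rejects a small fraction of the input, which is the kind of ratio a scanner would plausibly treat as ineffective.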
We can skip generating runtime filters that match all of these criteria:
# The build side is a full table scan.
# No runtime filter targets the build scan.
# There is a PK-FK constraint between the runtime filter origin column in the
build side and the target column in the probe side.
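The three criteria can be sketched as a single predicate. This is only an illustration: `FilterCandidate`, its fields, and `should_skip` are hypothetical names, not Impala planner classes, and the real frontend would derive these facts from the plan tree.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FilterCandidate:
    """Hypothetical planning-time summary of one runtime filter."""
    build_is_full_scan: bool        # criterion 1: build scan has no predicate
    build_scan_is_rf_target: bool   # criterion 2 holds only when this is False
    has_pk_fk_constraint: bool      # criterion 3: origin is PK, target is FK


def should_skip(rf: FilterCandidate) -> bool:
    """Return True only when all three criteria hold."""
    return (
        rf.build_is_full_scan
        and not rf.build_scan_is_rf_target
        and rf.has_pk_fk_constraint
    )


# A full, unfiltered dimension scan joined on a declared PK-FK pair.
print(should_skip(FilterCandidate(True, False, True)))   # True
# Another runtime filter prunes the build scan, so the filter may still help.
print(should_skip(FilterCandidate(True, True, True)))    # False
```

Criterion 2 matters because a build scan that is itself pruned by another runtime filter no longer yields every PK value.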
If the PK-FK constraint is not declared in the table schema, which happens
most of the time, criterion 3 can be replaced by checking the runtime filter’s
false positive probability (eliminating filters with a high false positive
probability).
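One way to picture the fallback check is the textbook Bloom filter estimate, fpp = (1 - e^(-k*n/m))^k, for n distinct build values, m bits, and k hash functions. The sketch below assumes that formula, k = 8, and a cutoff of 0.75. All three are illustrative assumptions, and Impala's own estimator and threshold may differ.

```python
import math


def bloom_fpp(ndv: int, num_bits: int, num_hashes: int = 8) -> float:
    """Textbook Bloom filter false positive estimate: (1 - e^(-k*n/m))^k."""
    return (1.0 - math.exp(-num_hashes * ndv / num_bits)) ** num_hashes


def skip_by_fpp(ndv: int, num_bits: int, max_fpp: float = 0.75) -> bool:
    """Hypothetical stand-in for criterion 3: skip filters that are too lossy."""
    return bloom_fpp(ndv, num_bits) > max_fpp


one_mib_bits = 1 << 20
print(skip_by_fpp(1_000, one_mib_bits))      # False: the filter is still selective
print(skip_by_fpp(1_000_000, one_mib_bits))  # True: the filter is nearly saturated
```

A filter fed by a large, unfiltered build side tends to saturate, so a high estimated false positive probability serves as a proxy for "this filter will not reject much".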