Riza Suminto created IMPALA-12357:
-------------------------------------

             Summary: Skip scheduling runtime filter from PK-FK join with full build scan
                 Key: IMPALA-12357
                 URL: https://issues.apache.org/jira/browse/IMPALA-12357
             Project: IMPALA
          Issue Type: Improvement
          Components: Frontend
            Reporter: Riza Suminto
         Attachments: Screen Shot 2023-08-04 at 3.13.56 PM.png

A PK-FK inner join between a dimension table and a fact table is common in 
queries. Such a join often has no filter predicate on the dimension table. The 
runtime filter built from this kind of dimension table scan (PK) is therefore 
likely to include every value of the fact table column (FK). Generating this 
filter is ineffective because it is unlikely to reject any rows.

The attached screenshot visualizes RF 50, 52, 60, and 62 targeting 49:SCAN in 
TPC-DS Q64. These runtime filters come from full dimension table scans in PK-FK 
joins. In theory, these filters should not reject any probe rows. The query 
profile, however, shows that they can still reject some probe rows with NULL 
values in their target column. Unfortunately, because NULL rows are few 
compared to non-NULL rows, the scanners still disabled all of those filters 
because 49:SCAN deemed them ineffective.

We can skip generating runtime filters that match all of these criteria:
 # The build side is a full table scan.
 # No runtime filter targets the build scan.
 # There is a PK-FK constraint between the runtime filter origin column in the 
build side and the target column in the probe side.
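
The three criteria combine into a single conjunction. The following is a minimal sketch of that check, not Impala planner code: the FilterCandidate class, its field names, and should_skip are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FilterCandidate:
    """Hypothetical summary of one runtime filter (not an Impala class)."""
    build_is_full_scan: bool           # criterion 1: build scan has no predicates
    build_scan_is_filter_target: bool  # criterion 2 holds when this is False
    has_pk_fk_constraint: bool         # criterion 3: origin is PK, target is FK


def should_skip(candidate: FilterCandidate) -> bool:
    """Return True only when all three criteria hold."""
    return (
        candidate.build_is_full_scan
        and not candidate.build_scan_is_filter_target
        and candidate.has_pk_fk_constraint
    )
```

Under this sketch, a filter whose build scan is itself narrowed by another runtime filter is kept, since its build side may no longer cover every FK value.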

If the PK-FK constraint is not declared in the table schema, which is the case 
most of the time, criterion 3 can be replaced by checking the runtime filter’s 
false positive probability (eliminate filters with a high false positive 
probability).
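
One way to picture the false positive probability check is with the textbook estimate for a classic Bloom filter, p = (1 - e^(-k*n/m))^k, where n is the number of distinct build-side values, m the filter size in bits, and k the number of hash functions. This is a sketch under assumptions: Impala's actual filter implementation and its estimate may differ, and the function names and the 0.75 threshold below are hypothetical.

```python
import math


def bloom_fpp(ndv: int, num_bits: int, num_hashes: int) -> float:
    """Textbook false positive estimate for a classic Bloom filter."""
    return (1.0 - math.exp(-num_hashes * ndv / num_bits)) ** num_hashes


def fpp_too_high(ndv: int, num_bits: int, num_hashes: int,
                 max_fpp: float = 0.75) -> bool:
    """Stand-in for criterion 3; max_fpp is an assumed threshold."""
    return bloom_fpp(ndv, num_bits, num_hashes) > max_fpp
```

A filter sized for far fewer values than the build side produces saturates and passes almost every probe row, so this check would flag it for elimination.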



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
