Quanlong Huang created IMPALA-15002:
---------------------------------------

             Summary: No way to determine effective runtime filters on 
KuduScanNode based on profile
                 Key: IMPALA-15002
                 URL: https://issues.apache.org/jira/browse/IMPALA-15002
             Project: IMPALA
          Issue Type: Bug
          Components: Backend
            Reporter: Quanlong Huang
            Assignee: Quanlong Huang


Runtime filters on KuduScanNode are pushed down to the Kudu scanner as normal 
predicates:
[https://github.com/apache/impala/blob/e1ca23d627532bb17228e3d455c55a03b3e28f49/be/src/exec/kudu/kudu-scanner.cc#L277-L278]
[https://github.com/apache/impala/blob/e1ca23d627532bb17228e3d455c55a03b3e28f49/be/src/exec/kudu/kudu-scanner.cc#L319-L331]

As a result, Impala is not aware of the effect of an individual filter, and 
the profile counters of runtime filters on KuduScanNode are currently all 0. 
For instance, for the following query
{code:sql}
use functional_kudu;
select STRAIGHT_JOIN count(*) from alltypes p join [BROADCAST] alltypestiny b
on p.month = b.int_col and b.month = 1 and b.string_col = "1";{code}
Profile counters of the runtime filters are all 0:
{noformat}
        KUDU_SCAN_NODE (id=0):
          Filter 0 (1.00 MB):
             - Files processed: 0 (0)
             - Files rejected: 0 (0)
             - Files total: 0 (0)
             - RowGroups processed: 0 (0)
             - RowGroups rejected: 0 (0)
             - RowGroups total: 0 (0)
             - Rows processed: 0 (0)
             - Rows rejected: 0 (0)
             - Rows total: 0 (0)
             - Splits processed: 0 (0)
             - Splits rejected: 0 (0)
             - Splits total: 0 (0)
          Filter 1 (0):
             - Files processed: 0 (0)
             - Files rejected: 0 (0)
             - Files total: 0 (0)
             - RowGroups processed: 0 (0)
             - RowGroups rejected: 0 (0)
             - RowGroups total: 0 (0)
             - Rows processed: 0 (0)
             - Rows rejected: 0 (0)
             - Rows total: 0 (0)
             - Splits processed: 0 (0)
             - Splits rejected: 0 (0)
             - Splits total: 0 (0){noformat}
Running the same query on Parquet tables yields meaningful counters:
{noformat}
        HDFS_SCAN_NODE (id=0):
          Filter 1 (0):
             - Files processed: 8 (8)
             - Files rejected: 6 (6)
             - Files total: 8 (8)
             - RowGroups processed: 0 (0)
             - RowGroups rejected: 0 (0)
             - RowGroups total: 0 (0)
             - Rows processed: 0 (0)
             - Rows rejected: 0 (0)
             - Rows total: 0 (0)
             - Splits processed: 2 (2)
             - Splits rejected: 0 (0)
             - Splits total: 2 (2)
          Filter 0 (1.00 MB):
             - Files processed: 2 (2)
             - Files rejected: 0 (0)
             - Files total: 2 (2)
             - RowGroups processed: 0 (0)
             - RowGroups rejected: 0 (0)
             - RowGroups total: 0 (0)
             - Rows processed: 0 (0)
             - Rows rejected: 0 (0)
             - Rows total: 0 (0)
             - Splits processed: 2 (2)
             - Splits rejected: 0 (0)
             - Splits total: 2 (2){noformat}
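The difference between the two profiles above can be checked mechanically. The following is a minimal sketch, not part of Impala: it assumes the profile text has exactly the layout shown in the snippets above, and the helper names (parse_filter_counters, filters_with_no_signal) are hypothetical. It reports the runtime filters whose counters are all 0.

```python
import re

# Assumed layout, taken from the snippets in this report:
#   Filter <id> (<size>):
#      - <counter name>: <pretty value> (<raw value>)
FILTER_HEADER = re.compile(r"^\s*Filter (\d+) \(.*\):")
COUNTER_LINE = re.compile(r"^\s*- (.+?): .*\((\d+)\)")


def parse_filter_counters(profile_text):
    """Return {filter_id: {counter_name: raw_value}} for one scan node snippet."""
    filters = {}
    current = None
    for line in profile_text.splitlines():
        header = FILTER_HEADER.match(line)
        if header:
            current = filters.setdefault(int(header.group(1)), {})
            continue
        counter = COUNTER_LINE.match(line)
        if counter and current is not None:
            current[counter.group(1)] = int(counter.group(2))
    return filters


def filters_with_no_signal(filters):
    """Return the sorted ids of filters whose counters are all zero."""
    return sorted(fid for fid, counters in filters.items()
                  if not any(counters.values()))


# Shortened copies of the two profiles in this report.
KUDU_SNIPPET = """
        KUDU_SCAN_NODE (id=0):
          Filter 0 (1.00 MB):
             - Files processed: 0 (0)
             - Rows rejected: 0 (0)
          Filter 1 (0):
             - Files processed: 0 (0)
             - Rows rejected: 0 (0)
"""

HDFS_SNIPPET = """
        HDFS_SCAN_NODE (id=0):
          Filter 1 (0):
             - Files processed: 8 (8)
             - Files rejected: 6 (6)
          Filter 0 (1.00 MB):
             - Files processed: 2 (2)
             - Files rejected: 0 (0)
"""

kudu_silent = filters_with_no_signal(parse_filter_counters(KUDU_SNIPPET))
hdfs_silent = filters_with_no_signal(parse_filter_counters(HDFS_SNIPPET))
```

For the Kudu snippet every filter is reported as silent, while for the Parquet snippet none is. Note that all-zero counters only mean that Impala recorded nothing; they do not show that the filter had no effect inside Kudu.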
KUDU-2162 was filed to add metrics for this. However, the metrics it added are 
not enough to determine the effect of the filters, i.e. whether some data has 
been filtered out.

So far we can only check the number of output rows and see whether it is 
smaller than the cardinality of a full scan, e.g. as this test does:
https://github.com/apache/impala/blob/e1ca23d627532bb17228e3d455c55a03b3e28f49/testdata/workloads/functional-query/queries/QueryTest/runtime_filters.test#L29-L30
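
The indirect check described above can be sketched as follows. This is only an illustration under stated assumptions: the helper names are hypothetical, the counter name "RowsReturned" and its "pretty (raw)" formatting are assumed to match the counter layout shown in the profiles above, and the row counts in the sample are made-up values rather than measured results.

```python
import re


def read_counter(profile_text, name):
    """Return the raw (parenthesized) value of the first counter called `name`."""
    pattern = re.compile(r"^\s*- " + re.escape(name) + r": .*\((\d+)\)",
                         re.MULTILINE)
    match = pattern.search(profile_text)
    if match is None:
        raise KeyError("counter not found in profile: " + name)
    return int(match.group(1))


def scan_was_reduced(profile_text, full_scan_cardinality,
                     counter="RowsReturned"):
    """True if the scan node returned fewer rows than a full scan would."""
    return read_counter(profile_text, counter) < full_scan_cardinality


# Hypothetical profile snippet for a single scan node; values are made up.
SAMPLE = """
        KUDU_SCAN_NODE (id=0):
           - RowsReturned: 620 (620)
"""

reduced = scan_was_reduced(SAMPLE, full_scan_cardinality=7300)
```

A result of True only shows that something reduced the scan output. It cannot attribute the reduction to a particular runtime filter, which is the gap this issue describes.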



