Quanlong Huang created IMPALA-15002:
---------------------------------------
Summary: No way to determine effective runtime filters on
KuduScanNode based on profile
Key: IMPALA-15002
URL: https://issues.apache.org/jira/browse/IMPALA-15002
Project: IMPALA
Issue Type: Bug
Components: Backend
Reporter: Quanlong Huang
Assignee: Quanlong Huang
Runtime filters on KuduScanNode are pushed down to the Kudu scanner as normal
predicates:
[https://github.com/apache/impala/blob/e1ca23d627532bb17228e3d455c55a03b3e28f49/be/src/exec/kudu/kudu-scanner.cc#L277-L278]
[https://github.com/apache/impala/blob/e1ca23d627532bb17228e3d455c55a03b3e28f49/be/src/exec/kudu/kudu-scanner.cc#L319-L331]
As a result, Impala is not aware of the effect of any individual filter, and the
profile counters of runtime filters on KuduScanNode are currently all 0. For instance,
for the following query
{code:sql}
use functional_kudu;
select STRAIGHT_JOIN count(*) from alltypes p join [BROADCAST] alltypestiny b
on p.month = b.int_col and b.month = 1 and b.string_col = "1";{code}
Profile counters of the runtime filters are all 0:
{noformat}
KUDU_SCAN_NODE (id=0):
Filter 0 (1.00 MB):
- Files processed: 0 (0)
- Files rejected: 0 (0)
- Files total: 0 (0)
- RowGroups processed: 0 (0)
- RowGroups rejected: 0 (0)
- RowGroups total: 0 (0)
- Rows processed: 0 (0)
- Rows rejected: 0 (0)
- Rows total: 0 (0)
- Splits processed: 0 (0)
- Splits rejected: 0 (0)
- Splits total: 0 (0)
Filter 1 (0):
- Files processed: 0 (0)
- Files rejected: 0 (0)
- Files total: 0 (0)
- RowGroups processed: 0 (0)
- RowGroups rejected: 0 (0)
- RowGroups total: 0 (0)
- Rows processed: 0 (0)
- Rows rejected: 0 (0)
- Rows total: 0 (0)
- Splits processed: 0 (0)
- Splits rejected: 0 (0)
- Splits total: 0 (0){noformat}
Running the same query on Parquet tables yields meaningful counters:
{noformat}
HDFS_SCAN_NODE (id=0):
Filter 1 (0):
- Files processed: 8 (8)
- Files rejected: 6 (6)
- Files total: 8 (8)
- RowGroups processed: 0 (0)
- RowGroups rejected: 0 (0)
- RowGroups total: 0 (0)
- Rows processed: 0 (0)
- Rows rejected: 0 (0)
- Rows total: 0 (0)
- Splits processed: 2 (2)
- Splits rejected: 0 (0)
- Splits total: 2 (2)
Filter 0 (1.00 MB):
- Files processed: 2 (2)
- Files rejected: 0 (0)
- Files total: 2 (2)
- RowGroups processed: 0 (0)
- RowGroups rejected: 0 (0)
- RowGroups total: 0 (0)
- Rows processed: 0 (0)
- Rows rejected: 0 (0)
- Rows total: 0 (0)
- Splits processed: 2 (2)
- Splits rejected: 0 (0)
- Splits total: 2 (2){noformat}
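The difference between the two profile snippets above can be checked mechanically. The following is a minimal Python sketch, not part of Impala, that parses a scan-node snippet in the format shown above and lists the filters that have at least one non-zero counter. The function names parse_filter_counters and filters_with_activity are hypothetical, and the sketch assumes each counter line has the form "- Name: pretty (raw)".

```python
import re

# Matches a filter header such as "Filter 0 (1.00 MB):".
FILTER_RE = re.compile(r"^\s*Filter (\d+) \(.*\):\s*$")
# Matches a counter line such as "- Files rejected: 6 (6)"; the value in
# parentheses is taken as the raw counter value (assumed format).
COUNTER_RE = re.compile(r"^\s*- ([^:]+): .*\((\d+)\)\s*$")


def parse_filter_counters(profile_text):
    """Return {filter_id: {counter_name: raw_value}} for one scan-node snippet."""
    filters = {}
    current = None
    for line in profile_text.splitlines():
        header = FILTER_RE.match(line)
        if header:
            current = filters.setdefault(int(header.group(1)), {})
            continue
        counter = COUNTER_RE.match(line)
        if counter and current is not None:
            current[counter.group(1)] = int(counter.group(2))
    return filters


def filters_with_activity(filters):
    """Return the sorted ids of filters that have any non-zero counter."""
    return sorted(
        fid for fid, counters in filters.items()
        if any(value > 0 for value in counters.values())
    )
```

Applied to the KUDU_SCAN_NODE snippet this should report no active filters, while the HDFS_SCAN_NODE snippet should report both filters, which is the gap this issue describes.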
KUDU-2162 was filed to add metrics for this. However, the metrics it added are
not enough to determine the effect of the filters, i.e. whether any data has
been filtered out.
So far we can only check the number of output rows and see whether it is smaller
than the full-scan cardinality, e.g. as in this test:
https://github.com/apache/impala/blob/e1ca23d627532bb17228e3d455c55a03b3e28f49/testdata/workloads/functional-query/queries/QueryTest/runtime_filters.test#L29-L30
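As a rough illustration of that workaround, here is a minimal Python sketch that reads a row counter from profile text and compares it with the known full-scan row count. The helper names read_counter and scan_was_reduced are hypothetical, and the default counter name "RowsReturned" and the "- Name: pretty (raw)" line format are assumptions; the linked test may match a different counter.

```python
import re


def read_counter(profile_text, counter_name):
    """Return the raw value of the first '- <counter_name>: pretty (raw)' line.

    Returns None if no such line is found. The line format is an assumption.
    """
    pattern = re.compile(r"-\s*" + re.escape(counter_name) + r":\s*.*\((\d+)\)")
    for line in profile_text.splitlines():
        match = pattern.search(line)
        if match:
            return int(match.group(1))
    return None


def scan_was_reduced(profile_text, full_scan_rows, counter_name="RowsReturned"):
    """Heuristic: True if the scan returned fewer rows than a full scan would."""
    rows = read_counter(profile_text, counter_name)
    if rows is None:
        raise ValueError("counter %r not found in profile" % counter_name)
    return rows < full_scan_rows
```

This only shows that something reduced the scan output. It cannot attribute the reduction to a particular runtime filter, and any other predicate pushed down to the scan would have the same effect on the row count.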