[ https://issues.apache.org/jira/browse/IMPALA-14796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18067101#comment-18067101 ]
Quanlong Huang commented on IMPALA-14796:
-----------------------------------------
A runtime filter can have multiple target nodes. For example, in the TPCH-Q5
filters above, filter 2 has two target nodes (0, 3), so a single boolean
"effective" column is not enough. Instead, we can add a column that lists the
ids of the target nodes on which the filter is effective (i.e., rejected some
data).
Uploaded a patch for review: https://gerrit.cloudera.org/c/24123/
The new "Final filter table" for TPCH-Q5:
{noformat}
Final filter table:
ID  Src. Node  Tgt. Node(s)  Eff. Tgt. Node(s)  Target type     Partition filter  Pending (Expected)  First arrived  Completed  Enabled  Bloom Size  Est fpp   Min value  Max value  In-list size
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
10  6          2             2                  LOCAL           false             0 (3)               N/A            N/A        false    1.00 MB     2.06e-06
8   7          1             1                  REMOTE          false             0 (3)               743.792ms      744.190ms  true     1.00 MB     2.06e-06
5   8          2             2                  LOCAL           false             0 (3)               N/A            N/A        false    1.00 MB     3.54e-11
4   8          0             N                  REMOTE          false             0 (3)               733.116ms      733.521ms  true     1.00 MB     7.53e-16
2   9          0, 3          0, 3               REMOTE, REMOTE  false, false      0 (3)               725.755ms      726.151ms  true     1.00 MB     7.53e-16
0   10         4             4                  REMOTE          false             0 (3)               716.720ms      717.109ms  true     1.00 MB     2.79e-17
{noformat}
Note that filter 4 doesn't reject any rows, so its "Eff. Tgt. Node(s)" column
value is "N".
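
As a rough illustration of how such a column value could be derived (a minimal
sketch, not the code in the patch), the function below lists the target nodes
whose "rejected" counters are non-zero and falls back to "N" when none are. The
function name, the input layout, and the counter values in the usage lines are
all hypothetical; only the counter kinds (rows/RowGroups/splits/files) and the
"N" placeholder come from this issue.

```python
from typing import Mapping, Sequence

# Counter kinds named in the issue description (rows/RowGroups/splits/files).
REJECTED_KINDS = ("Rows", "RowGroups", "Splits", "Files")


def effective_targets(
    target_nodes: Sequence[int],
    rejected: Mapping[int, Mapping[str, int]],
) -> str:
    """Build a hypothetical "Eff. Tgt. Node(s)" cell for one filter.

    `rejected` maps a target node id to its "<kind> rejected" counters.
    A node is treated as effective if any of those counters is non-zero.
    """
    effective = [
        node
        for node in target_nodes
        if any(rejected.get(node, {}).get(kind, 0) > 0 for kind in REJECTED_KINDS)
    ]
    # "N" mirrors the placeholder shown for filter 4 in the table above.
    return ", ".join(str(node) for node in effective) if effective else "N"


# Illustrative inputs only; the per-node counts are made up for the example.
filter_4_cell = effective_targets([0], {0: {"Rows": 0, "RowGroups": 0}})
filter_2_cell = effective_targets([0, 3], {0: {"Rows": 5}, 3: {"RowGroups": 1}})
```

Missing counters are treated as zero, so a target node that never reported
anything is simply left out of the list.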
> Add "effective" column in "Final filter table" in query profile
> ---------------------------------------------------------------
>
> Key: IMPALA-14796
> URL: https://issues.apache.org/jira/browse/IMPALA-14796
> Project: IMPALA
> Issue Type: New Feature
> Components: Backend
> Reporter: Quanlong Huang
> Assignee: Quanlong Huang
> Priority: Major
> Labels: ramp-up
>
> In the query profile, there is a section about runtime filters, e.g., for
> TPCH-Q5:
> {noformat}
> Final filter table:
> ID  Src. Node  Tgt. Node(s)  Target type     Partition filter  Pending (Expected)  First arrived  Completed  Enabled  Bloom Size  Est fpp   Min value  Max value  In-list size
> ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> 10  6          2             LOCAL           false             0 (3)               N/A            N/A        true     1.00 MB     2.06e-06
> 8   7          1             REMOTE          false             0 (3)               392.242ms      393.338ms  true     1.00 MB     2.06e-06
> 5   8          2             LOCAL           false             0 (3)               N/A            N/A        true     1.00 MB     3.54e-11
> 4   8          0             REMOTE          false             0 (3)               351.647ms      351.978ms  true     1.00 MB     7.53e-16
> 2   9          0, 3          REMOTE, REMOTE  false, false      0 (3)               347.219ms      347.494ms  true     1.00 MB     7.53e-16
> 0   10         4             REMOTE          false             0 (3)               342.907ms      343.293ms  true     1.00 MB     2.79e-17
> {noformat}
> It'd be helpful to add a boolean column "effective" to show whether the
> filter actually rejects any data (rows/RowGroups/splits/files).
> Currently, we have to check the "rejected" counters of the ScanNodes, e.g.,
> {noformat}
> Filter 2 (1.00 MB):
> - Files processed: 0 (0)
> - Files rejected: 0 (0)
> - Files total: 0 (0)
> - RowGroups processed: 1 (1)
> - RowGroups rejected: 0 (0)
> - RowGroups total: 1 (1)
> - Rows processed: 150.00K (150000)
> - Rows rejected: 119.82K (119817)
> - Rows total: 150.00K (150000)
> - Splits processed: 0 (0)
> - Splits rejected: 0 (0)
> - Splits total: 0 (0)
> Filter 4 (1.00 MB):
> - Files processed: 0 (0)
> - Files rejected: 0 (0)
> - Files total: 0 (0)
> - RowGroups processed: 1 (1)
> - RowGroups rejected: 0 (0)
> - RowGroups total: 1 (1)
> - Rows processed: 16.38K (16384)
> - Rows rejected: 0 (0)
> - Rows total: 30.18K (30183)
> - Splits processed: 0 (0)
> - Splits rejected: 0 (0)
> - Splits total: 0 (0){noformat}
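
Until such a column exists, the manual check described in the issue can be
approximated with a small script. The following is a minimal sketch that assumes
the counters appear in the profile text exactly as in the excerpt above
("Filter N (...)" headers followed by "- <kind> rejected: <pretty> (<raw>)"
lines); the function name, regular expressions, and trimmed sample are
illustrative and not part of Impala.

```python
import re
from typing import Dict, Optional

# Patterns assume the layout shown in the excerpt above.
FILTER_HEADER = re.compile(r"^\s*Filter (\d+) \(")
REJECTED_LINE = re.compile(r"^\s*- (\w+) rejected: .*\((\d+)\)\s*$")


def rejected_by_filter(profile_text: str) -> Dict[int, Dict[str, int]]:
    """Collect the '<kind> rejected' raw counts under each 'Filter N' header."""
    result: Dict[int, Dict[str, int]] = {}
    current: Optional[int] = None
    for line in profile_text.splitlines():
        header = FILTER_HEADER.match(line)
        if header:
            current = int(header.group(1))
            result.setdefault(current, {})
            continue
        counter = REJECTED_LINE.match(line)
        if counter and current is not None:
            # Use the raw value in parentheses, not the pretty-printed one.
            result[current][counter.group(1)] = int(counter.group(2))
    return result


# Trimmed copy of the counters quoted in the issue description.
SAMPLE = """\
Filter 2 (1.00 MB):
 - RowGroups rejected: 0 (0)
 - Rows processed: 150.00K (150000)
 - Rows rejected: 119.82K (119817)
Filter 4 (1.00 MB):
 - RowGroups rejected: 0 (0)
 - Rows processed: 16.38K (16384)
 - Rows rejected: 0 (0)
"""

counters = rejected_by_filter(SAMPLE)
# A filter is considered effective here if any rejected counter is non-zero.
effective_ids = [fid for fid, c in counters.items() if any(v > 0 for v in c.values())]
```

On the trimmed sample, only filter 2 ends up in `effective_ids`, which matches
the "N" shown for filter 4 in the new table.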