[
https://issues.apache.org/jira/browse/KUDU-2162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16222666#comment-16222666
]
Thomas Tauber-Marshall commented on KUDU-2162:
----------------------------------------------
[~wdberkeley] "bytes read" and "elapsed time" are fine, but I don't think they
really get at what I'm interested in here.
When you say "bytes read", I'm guessing that would include bytes from blocks
that are read but where the bytes are filtered by the predicate, but would not
include bytes from blocks that aren't read because the entire block is
filtered. In that case there's not really a direct relationship between "bytes
read" and the selectivity of the filter, which is what Impala is interested in
here.
For example, one use case for this would be for Impala to determine while the
query is running that a particular runtime filter is not very selective and to
stop applying it. Is that possible with the "bytes read"/"elapsed time" stats?
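
To make that use case concrete, here is a rough sketch of the kind of decision
Impala might make if per-filter row counts were available. Everything in it is
hypothetical: `ScanFilterStats`, its fields, `RejectedFraction`,
`ShouldDisableFilter` and the thresholds are illustrative names, not part of
the Kudu client API. It also assumes that entirely skipped blocks can be
reported as a row count rather than only a block count.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical per-filter counters (an assumption, not Kudu client API).
// They mirror the two kinds of filtering discussed in this issue.
struct ScanFilterStats {
  int64_t rows_in_skipped_blocks = 0;  // rows in blocks skipped entirely
  int64_t rows_filtered = 0;           // rows rejected inside blocks that were read
  int64_t rows_returned = 0;           // rows that passed the filter
};

// Total number of rows the filter has been evaluated against so far.
int64_t TotalRows(const ScanFilterStats& s) {
  return s.rows_in_skipped_blocks + s.rows_filtered + s.rows_returned;
}

// Fraction of rows the filter rejected, in [0, 1]. Returns 0 when no rows
// have been seen yet.
double RejectedFraction(const ScanFilterStats& s) {
  assert(s.rows_in_skipped_blocks >= 0 && s.rows_filtered >= 0 &&
         s.rows_returned >= 0);
  const int64_t total = TotalRows(s);
  if (total == 0) return 0.0;
  const int64_t rejected = s.rows_in_skipped_blocks + s.rows_filtered;
  return static_cast<double>(rejected) / static_cast<double>(total);
}

// Decide whether to stop applying a filter: only once at least `min_rows`
// rows have been observed, and only if the filter rejects less than
// `min_rejected_fraction` of them. Both thresholds are illustrative.
bool ShouldDisableFilter(const ScanFilterStats& s, int64_t min_rows,
                         double min_rejected_fraction) {
  if (TotalRows(s) < min_rows) return false;  // not enough evidence yet
  return RejectedFraction(s) < min_rejected_fraction;
}
```

With thresholds of 1000 rows and 0.1, a filter that rejected 50 of 10000 rows
would be dropped, while one that skipped or filtered 9000 of 10000 would be
kept. Neither decision can be made from "bytes read" alone, since skipped
blocks contribute no bytes.
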
> Expose stats about scan filters
> -------------------------------
>
> Key: KUDU-2162
> URL: https://issues.apache.org/jira/browse/KUDU-2162
> Project: Kudu
> Issue Type: Improvement
> Components: client
> Reporter: Thomas Tauber-Marshall
> Assignee: Will Berkeley
>
> Impala is working on implementing runtime filters that get pushed down into
> Kudu using KuduScanner::AddConjunctPredicate().
> It would be useful for perf analysis and debugging to be able to include info
> in Impala's runtime profile about the effectiveness of the filters, e.g. the
> number of rows that are filtered.
> This would probably require at least two counters:
> - # of blocks that are entirely skipped
> - # of rows that are filtered from blocks that aren't entirely skipped