[ https://issues.apache.org/jira/browse/KUDU-3193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andrew Wong updated KUDU-3193:
------------------------------
Description:
Oftentimes, slow queries can be the result of a sub-optimal schema for a given
workload, e.g. if a scan's predicate is not on a prefix of the primary key.
Diagnosing such issues typically takes some understanding of the workloads that
are being run against a given table. It'd be nice if there were something more
quantitative to understand whether a table(t)'s schema is to blame for a slow
scan.
One thought that comes to mind is maintaining a histogram metric per-tablet of
the ratio between the number of rows returned during a given scan and the
number of rows iterated through during that scan. A consistently low value of
this metric would indicate that predicates applied to the given tablet are not
very effective.
was:
Often times slow queries can be the result of a sub-optimal schema for a given
workload, e.g. if a scan's predicate is not on a prefix of the primary key.
Diagnosing such issues typically takes some understanding of the workloads that
are being run against a given table. It'd be nice if there were something more
quantitative to understand whether a table(t)'s schema is to blame for a slow
scan.
One thought that comes to mind is maintaining a histogram metric per-tablet of
the ratio between the number of rows returned during a given and the number of
rows iterated through during that scan. A consistently low value of this metric
would indicate that predicates applied to the given tablet are not very
effective.
> Per-tablet histogram for scan predicate efficiency
> --------------------------------------------------
>
> Key: KUDU-3193
> URL: https://issues.apache.org/jira/browse/KUDU-3193
> Project: Kudu
> Issue Type: Task
> Components: metrics, ops-tooling, perf, tablet
> Reporter: Andrew Wong
> Priority: Major
>
> Oftentimes, slow queries can be the result of a sub-optimal schema for a
> given workload, e.g. if a scan's predicate is not on a prefix of the primary
> key. Diagnosing such issues typically takes some understanding of the
> workloads that are being run against a given table. It'd be nice if there
> were something more quantitative to understand whether a table(t)'s schema is
> to blame for a slow scan.
> One thought that comes to mind is maintaining a histogram metric per-tablet
> of the ratio between the number of rows returned during a given scan and the
> number of rows iterated through during that scan. A consistently low value of
> this metric would indicate that predicates applied to the given tablet are
> not very effective.
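
As a rough illustration of the metric proposed above, the sketch below keeps
one histogram per tablet of the ratio rows returned / rows iterated, with one
sample recorded per scan. This is not Kudu's metrics API: the class name, the
method names, the fixed ten-bucket layout, and the tablet ID are all
hypothetical, chosen only to make the idea concrete.

```python
from collections import defaultdict


class ScanEfficiencyHistogram:
    """Hypothetical histogram of rows_returned / rows_scanned for one tablet."""

    def __init__(self, num_buckets=10):
        # Bucket i covers ratios in [i / num_buckets, (i + 1) / num_buckets);
        # a ratio of exactly 1.0 is placed in the last bucket.
        self.num_buckets = num_buckets
        self.counts = [0] * num_buckets

    def record_scan(self, rows_returned, rows_scanned):
        """Record one scan; scans that iterated over no rows are skipped."""
        if rows_scanned == 0:
            return
        # Integer arithmetic avoids floating-point edge cases at bucket bounds.
        bucket = min(rows_returned * self.num_buckets // rows_scanned,
                     self.num_buckets - 1)
        self.counts[bucket] += 1

    def fraction_below(self, threshold):
        """Fraction of scans in buckets whose upper bound is <= threshold."""
        total = sum(self.counts)
        if total == 0:
            return 0.0
        low = sum(count for i, count in enumerate(self.counts)
                  if (i + 1) / self.num_buckets <= threshold)
        return low / total


# One histogram per tablet, keyed by a (hypothetical) tablet ID.
histograms = defaultdict(ScanEfficiencyHistogram)
histograms["tablet-a"].record_scan(rows_returned=5, rows_scanned=1000)
histograms["tablet-a"].record_scan(rows_returned=20, rows_scanned=1000)
histograms["tablet-a"].record_scan(rows_returned=950, rows_scanned=1000)
```

If most samples for a tablet land in the lowest buckets, as two of the three
scans do here, the scans are iterating over far more rows than they return,
which is the "consistently low value" the description refers to.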