[ https://issues.apache.org/jira/browse/KUDU-3193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andrew Wong updated KUDU-3193:
------------------------------
    Description: 
Slow queries are often the result of a schema that is suboptimal for a given workload, e.g. when a scan's predicate is not on a prefix of the primary key. Diagnosing such issues typically requires some understanding of the workloads being run against the table. It'd be nice to have something more quantitative for judging whether a table's (or tablet's) schema is to blame for a slow scan.

One idea is to maintain a per-tablet histogram metric of the ratio of the number of rows returned by a scan to the number of rows iterated through during that scan. A consistently low value of this metric would indicate that scans on the given tablet are doing a lot of IO reading rows that their predicates then exclude from the result set.

  was:
Slow queries are often the result of a schema that is suboptimal for a given workload, e.g. when a scan's predicate is not on a prefix of the primary key. Diagnosing such issues typically requires some understanding of the workloads being run against the table. It'd be nice to have something more quantitative for judging whether a table's (or tablet's) schema is to blame for a slow scan.

One idea is to maintain a per-tablet histogram metric of the ratio of the number of rows returned by a scan to the number of rows iterated through during that scan. A consistently low value of this metric would indicate that predicates applied to the given tablet are not very effective.


> Per-tablet histogram for scan predicate efficiency
> --------------------------------------------------
>
>                 Key: KUDU-3193
>                 URL: https://issues.apache.org/jira/browse/KUDU-3193
>             Project: Kudu
>          Issue Type: Task
>          Components: metrics, ops-tooling, perf, tablet
>            Reporter: Andrew Wong
>            Priority: Major
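
To make the proposed metric concrete, here is a minimal Python sketch of how a per-tablet histogram of the returned-to-iterated row ratio might be recorded. It is only an illustration under stated assumptions, not Kudu's metrics code: the class name `ScanEfficiencyHistogram`, the method names, the percentage bucket bounds, and the tablet ID are all hypothetical.

```python
from bisect import bisect_left
from collections import defaultdict

# Hypothetical bucket upper bounds for the ratio, expressed in percent.
BUCKET_UPPER_BOUNDS = [1, 5, 10, 25, 50, 75, 100]


class ScanEfficiencyHistogram:
    """Histogram of rows_returned / rows_iterated for one tablet (illustrative)."""

    def __init__(self):
        self.counts = [0] * len(BUCKET_UPPER_BOUNDS)
        self.total = 0

    def record_scan(self, rows_returned, rows_iterated):
        """Record one finished scan; scans that iterated no rows are skipped."""
        if rows_iterated <= 0:
            return  # ratio is undefined when nothing was read
        if not 0 <= rows_returned <= rows_iterated:
            raise ValueError("rows_returned must be in [0, rows_iterated]")
        pct = 100.0 * rows_returned / rows_iterated
        # First bucket whose upper bound is >= pct.
        self.counts[bisect_left(BUCKET_UPPER_BOUNDS, pct)] += 1
        self.total += 1

    def percentile_upper_bound(self, p):
        """Upper bound (in percent) of the bucket holding the p-th percentile."""
        if self.total == 0:
            return None
        target = self.total * p / 100.0
        cumulative = 0
        for bound, count in zip(BUCKET_UPPER_BOUNDS, self.counts):
            cumulative += count
            if cumulative >= target:
                return bound
        return BUCKET_UPPER_BOUNDS[-1]


# One histogram per tablet, keyed by a (hypothetical) tablet ID.
histograms = defaultdict(ScanEfficiencyHistogram)
histograms["tablet-a"].record_scan(rows_returned=10, rows_iterated=10000)
histograms["tablet-a"].record_scan(rows_returned=50, rows_iterated=1000)
histograms["tablet-a"].record_scan(rows_returned=900, rows_iterated=1000)
```

In this sketch, a tablet whose scans mostly land in the lowest buckets is reading many rows that its predicates then discard, which is the signal the issue describes.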