[ 
https://issues.apache.org/jira/browse/KUDU-3193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wong updated KUDU-3193:
------------------------------
    Description: 
Slow queries are often the result of a sub-optimal schema for a given 
workload, e.g. if a scan's predicate is not on a prefix of the primary key. 
Diagnosing such issues typically takes some understanding of the workloads that 
are being run against a given table. It'd be nice to have something more 
quantitative for judging whether a table's schema is to blame for a slow 
scan.

One thought that comes to mind is maintaining a per-tablet histogram metric of 
the ratio between the number of rows returned during a given scan and the 
number of rows iterated through during that scan. A consistently low value of 
this metric would indicate that scans of the given tablet are doing a lot of 
IO reading rows that are not in the result set.
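
As a rough illustration of the idea (not Kudu's actual metrics API), the 
sketch below records one ratio per scan into a per-tablet histogram. The class 
name, method names, and bucket boundaries are all hypothetical and chosen only 
for the example.

```python
from bisect import bisect_left
from collections import defaultdict

# Hypothetical bucket upper bounds for rows_returned / rows_iterated.
# These are illustrative only; they are not taken from Kudu.
BUCKET_BOUNDS = [0.001, 0.01, 0.1, 0.5, 1.0]


class ScanEfficiencyHistogram:
    """Per-tablet histogram of rows_returned / rows_iterated, one sample per scan."""

    def __init__(self):
        # tablet_id -> list of counts, one per entry in BUCKET_BOUNDS
        self._counts = defaultdict(lambda: [0] * len(BUCKET_BOUNDS))

    def record_scan(self, tablet_id, rows_returned, rows_iterated):
        if rows_iterated == 0:
            return  # nothing was read, so there is no ratio to record
        ratio = rows_returned / rows_iterated
        # Index of the first bucket whose upper bound is >= ratio.
        bucket = min(bisect_left(BUCKET_BOUNDS, ratio), len(BUCKET_BOUNDS) - 1)
        self._counts[tablet_id][bucket] += 1

    def low_efficiency_fraction(self, tablet_id, threshold=0.01):
        """Fraction of recorded scans that fell in buckets at or below `threshold`."""
        counts = self._counts.get(tablet_id)
        if not counts or sum(counts) == 0:
            return 0.0
        low = sum(c for bound, c in zip(BUCKET_BOUNDS, counts) if bound <= threshold)
        return low / sum(counts)


hist = ScanEfficiencyHistogram()
hist.record_scan("tablet-1", rows_returned=5, rows_iterated=1000)
hist.record_scan("tablet-1", rows_returned=1, rows_iterated=10000)
hist.record_scan("tablet-1", rows_returned=900, rows_iterated=1000)
print(hist.low_efficiency_fraction("tablet-1"))
```

Under these assumptions, a tablet where most scans land in the lowest buckets 
would be a candidate for a closer look at its schema and predicates.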

  was:
Often times slow queries can be the result of a sub-optimal schema for a given 
workload, e.g. if a scan's predicate is not on a prefix of the primary key. 
Diagnosing such issues typically takes some understanding of the workloads that 
are being run against a given table. It'd be nice if there were something more 
quantitative to understand whether a table(t)'s schema is to blame for a slow 
scan.

One thought that comes to mind is maintaining a histogram metric per-tablet of 
the ratio between the number of rows returned during a given scan and the 
number of rows iterated through during that scan. A consistently low value of 
this metric would indicate that predicates applied to the given tablet are not 
very effective.


> Per-tablet histogram for scan predicate efficiency
> --------------------------------------------------
>
>                 Key: KUDU-3193
>                 URL: https://issues.apache.org/jira/browse/KUDU-3193
>             Project: Kudu
>          Issue Type: Task
>          Components: metrics, ops-tooling, perf, tablet
>            Reporter: Andrew Wong
>            Priority: Major
>
> Slow queries are often the result of a sub-optimal schema for a given 
> workload, e.g. if a scan's predicate is not on a prefix of the primary 
> key. Diagnosing such issues typically takes some understanding of the 
> workloads that are being run against a given table. It'd be nice to have 
> something more quantitative for judging whether a table's schema is to 
> blame for a slow scan.
>
> One thought that comes to mind is maintaining a per-tablet histogram metric 
> of the ratio between the number of rows returned during a given scan and the 
> number of rows iterated through during that scan. A consistently low value of 
> this metric would indicate that scans of the given tablet are doing a lot of 
> IO reading rows that are not in the result set.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
