kudu git commit: Blogpost describing index skip scan optimization.

mpercy Wed, 26 Sep 2018 10:51:49 -0700

Repository: kudu
Updated Branches:
  refs/heads/gh-pages 9f7058cc7 -> 83530755d



Blogpost describing index skip scan optimization.

Link to the version with images:
https://github.com/AnupamaGupta01/kudu/blob/blogpost-2/_posts/2018-09-25-index-skip-scan-optimization-in-kudu.md

Change-Id: I2250652dcba3d1b0a06f1ffb7f23c11bf533d35e
Reviewed-on: http://gerrit.cloudera.org:8080/11263
Reviewed-by: Mike Percy <[email protected]>
Tested-by: Mike Percy <[email protected]>


Project: http://git-wip-us.apache.org/repos/asf/kudu/repo
Commit: http://git-wip-us.apache.org/repos/asf/kudu/commit/83530755
Tree: http://git-wip-us.apache.org/repos/asf/kudu/tree/83530755
Diff: http://git-wip-us.apache.org/repos/asf/kudu/diff/83530755

Branch: refs/heads/gh-pages
Commit: 83530755dbcddd44debe9b3ade286f58b96d5c82
Parents: 9f7058c
Author: anupama <[email protected]>
Authored: Fri Aug 17 14:53:47 2018 -0700
Committer: Mike Percy <[email protected]>
Committed: Wed Sep 26 17:51:16 2018 +0000

----------------------------------------------------------------------
 ...9-26-index-skip-scan-optimization-in-kudu.md | 114 +++++++++++++++++++
 img/index-skip-scan/example-table.png           | Bin 0 -> 73625 bytes
 img/index-skip-scan/skip-scan-example-table.png | Bin 0 -> 48512 bytes
 .../skip-scan-performance-graph.png             | Bin 0 -> 59142 bytes
 4 files changed, 114 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/kudu/blob/83530755/_posts/2018-09-26-index-skip-scan-optimization-in-kudu.md
----------------------------------------------------------------------
diff --git a/_posts/2018-09-26-index-skip-scan-optimization-in-kudu.md 
b/_posts/2018-09-26-index-skip-scan-optimization-in-kudu.md
new file mode 100644
index 0000000..ba91926
--- /dev/null
+++ b/_posts/2018-09-26-index-skip-scan-optimization-in-kudu.md
@@ -0,0 +1,114 @@
+---
+layout: post
+title: "Index Skip Scan Optimization in Kudu"
+author: Anupama Gupta
+---
+
+This summer I got the opportunity to intern with the Apache Kudu team at 
Cloudera.
+My project was to optimize the Kudu scan path by implementing a technique 
called
+index skip scan (a.k.a. scan-to-seek, see section 4.1 in [1]). I wanted to 
share
+my experience and the progress we've made so far on the approach.
+
+<!--more-->
+
+Let's begin with discussing the current query flow in Kudu.
+Consider the following table:
+
+{% highlight SQL %}
+CREATE TABLE metrics (
+    host STRING,
+    tstamp INT,
+    clusterid INT,
+    role STRING,
+    PRIMARY KEY (host, tstamp, clusterid)
+);
+{% endhighlight %}
+
+![png]({{ site.github.url }}/img/index-skip-scan/example-table.png){: 
.img-responsive}
+*Sample rows of table `metrics` (sorted by key columns).*
+
+
+In this case, by default, Kudu internally builds a primary key index 
(implemented as a
+[B-tree](https://en.wikipedia.org/wiki/B-tree)) for the table `metrics`.
+As shown in the table above, the index data is sorted by the composite of all 
key columns.
+When the user query contains the first key column (`host`), Kudu uses the 
index (as the index data is
+primarily sorted on the first key column).
+
+Now, what if the user query does not contain the first key column and instead 
only contains the `tstamp` column?
+In the above case, the `tstamp` column values are sorted with respect to 
`host`,
+but are not globally sorted, and as such, it's non-trivial to use the index to 
filter rows.
+Instead, a full tablet scan is done by default. Other databases may optimize 
such scans by building secondary indexes
+(though it might be redundant to build one on one of the primary keys). 
However, this isn't an option for Kudu,
+given its lack of secondary index support.
+
+The question is, can Kudu do better than a full tablet scan here?
+
+The answer is yes! Let's observe the column preceding the `tstamp` column. We 
will refer to it as the
+"prefix column" and its specific value as the "prefix key". In this example, 
`host` is the prefix column.
+Note that the prefix keys are sorted in the index and that all rows of a given 
prefix key are also sorted by the
+remaining key columns. Therefore, we can use the index to skip to the rows 
that have distinct prefix keys,
+and also satisfy the predicate on the `tstamp` column.
+For example, consider the query:
+{% highlight SQL %}
+SELECT clusterid FROM metrics WHERE tstamp = 100;
+{% endhighlight %}
+
+![png]({{ site.github.url 
}}/img/index-skip-scan/skip-scan-example-table.png){: .img-responsive}
+*Skip scan flow illustration. The rows in green are scanned and the rest are 
skipped.*
+
+The tablet server can use the index to **skip** to the first row with a 
distinct prefix key (`host = helium`) that
+matches the predicate (`tstamp = 100`) and then **scan** through the rows 
until the predicate no longer matches. At that
+point we would know that no more rows with `host = helium` will satisfy the 
predicate, and we can skip to the next
+prefix key. This holds true for all distinct keys of `host`. Hence, this 
method is popularly known as
+**skip scan optimization**[2, 3].
+
+Performance
+==========
+
+This optimization can speed up queries significantly, depending on the 
cardinality (number of distinct values) of the
+prefix column. The lower the prefix column cardinality, the better the skip 
scan performance. In fact, when the
+prefix column cardinality is high, skip scan is not a viable approach. The 
performance graph (obtained using the example
+schema and query pattern mentioned earlier) is shown below.
+
+Based on our experiments, on up to 10 million rows per tablet (as shown 
below), we found that the skip scan performance
+begins to get worse with respect to the full tablet scan performance when the 
prefix column cardinality
+exceeds sqrt(number_of_rows_in_tablet).
+Therefore, in order to use skip scan performance benefits when possible and 
maintain a consistent performance in cases
+of large prefix column cardinality, we have tentatively chosen to dynamically 
disable skip scan when the number of skips for
+distinct prefix keys exceeds sqrt(number_of_rows_in_tablet).
+It will be an interesting project to further explore sophisticated heuristics 
to decide when
+to dynamically disable skip scan.
+
+![png]({{ site.github.url 
}}/img/index-skip-scan/skip-scan-performance-graph.png){: .img-responsive}
+
+Conclusion
+==========
+
+Skip scan optimization in Kudu can lead to huge performance benefits that 
scale with the size of
+data in Kudu tablets. This is a work-in-progress 
[patch](https://gerrit.cloudera.org/#/c/10983/).
+The implementation in the patch works only for equality predicates on the 
non-first primary key
+columns. An important point to note is that although, in the above specific 
example, the number of prefix
+columns is one (`host`), this approach is generalized to work with any number 
of prefix columns.
+
+This work also lays the groundwork to leverage the skip scan approach and 
optimize query processing time in the
+following use cases:
+
+- Range predicates
+- In-list predicates
+
+This was my first time working on an open source project. I thoroughly enjoyed 
working on this challenging problem,
+right from understanding the scan path in Kudu to working on a full-fledged 
implementation of
+the skip scan optimization. I am very grateful to the Kudu team for guiding 
and supporting me throughout the
+internship period.
+
+References
+==========
+
+[[1]](https://storage.googleapis.com/pub-tools-public-publication-data/pdf/42851.pdf):
 Gupta, Ashish, et al. "Mesa:
+Geo-replicated, near real-time, scalable data warehousing." Proceedings of the 
VLDB Endowment 7.12 (2014): 1259-1270.
+
+[[2]](https://oracle-base.com/articles/9i/index-skip-scanning/): Index Skip 
Scanning - Oracle Database
+
+[[3]](https://www.sqlite.org/optoverview.html#skipscan): Skip Scan - SQLite
+
+

http://git-wip-us.apache.org/repos/asf/kudu/blob/83530755/img/index-skip-scan/example-table.png
----------------------------------------------------------------------
diff --git a/img/index-skip-scan/example-table.png 
b/img/index-skip-scan/example-table.png
new file mode 100644
index 0000000..585ae4d
Binary files /dev/null and b/img/index-skip-scan/example-table.png differ

http://git-wip-us.apache.org/repos/asf/kudu/blob/83530755/img/index-skip-scan/skip-scan-example-table.png
----------------------------------------------------------------------
diff --git a/img/index-skip-scan/skip-scan-example-table.png 
b/img/index-skip-scan/skip-scan-example-table.png
new file mode 100644
index 0000000..3b965ac
Binary files /dev/null and b/img/index-skip-scan/skip-scan-example-table.png 
differ

http://git-wip-us.apache.org/repos/asf/kudu/blob/83530755/img/index-skip-scan/skip-scan-performance-graph.png
----------------------------------------------------------------------
diff --git a/img/index-skip-scan/skip-scan-performance-graph.png 
b/img/index-skip-scan/skip-scan-performance-graph.png
new file mode 100644
index 0000000..5004b65
Binary files /dev/null and 
b/img/index-skip-scan/skip-scan-performance-graph.png differ

kudu git commit: Blogpost describing index skip scan optimization.

Reply via email to