Re: Regarding index skip scan in kudu

Mike Percy Wed, 06 Mar 2019 00:53:26 -0800

Hi Shanmuga,
Ok, sounds good. Make sure you are subscribed to dev@ in case you are not
already. I also suggest you join the Slack channel linked from
kudu.apache.org in case you want to discuss details in the chat channel,
although communicating primarily on dev@ is also fine.

The current work-in-progress patch for skip-scan is at
https://gerrit.cloudera.org/c/10983/ -- it's a somewhat large patch that
deals with low-level internals. I just rebased the patch onto the latest
master branch code and re-posted it as revision 21.

To understand the patch, definitely read the commit message and the
comments in the patch, and you will need to learn something about Kudu
tablet internals (on the plus side, there is no distributed component to
this feature). One good high level resource is the Kudu whitepaper
<https://kudu.apache.org/kudu.pdf> from 2015. We also have design docs
<https://github.com/apache/kudu/blob/master/docs/design-docs/README.md> in
the Kudu source tree; the most relevant one for this patch is the CFile
design doc
<https://github.com/apache/kudu/blob/master/docs/design-docs/cfile.md>
(although
it is written as more of a reference).

To get a sense of the remaining work needed to merge the patch with its
current scope, take a look at the review comments on the patch. There are
quite a few suggestions there, most of them important but also nit-picky.
However, they won't give you a lot of insight into the implementation
approach.

Unfortunately there is no a detailed design doc but I will try to outline
the important details here.

The skip-scan concept is a query optimization that calls for the index to
be used in cases where there are query predicates specified on columns that
are in the index and where the query is likely to match multiple rows, but
not all rows. Kudu only has a single index (the primary key index), and we
can make a good example using a compound primary key. Imagine we have a
schema like:

CREATE TABLE views (
  host STRING,
  timestamp INT,
  count INT,
  PRIMARY KEY (host, timestamp)
);

Let's specify a query: SELECT count from views WHERE timestamp = 100;
(note: I'm using SQL syntax for clarity in the explanation)

Currently Kudu will simply do a full table scan for this query. If instead
we could leverage the primary key index to do a skip-scan then we could
potentially avoid reading a lot of irrelevant rows and therefore save on
I/O costs. So the fundamental idea behind skip scan seems reasonable in
this case.

The implementation of this in Kudu gets a little complicated, because while
Kudu maintains a forward index for the primary key (Kudu calls this the
ad-hoc index for historical reasons) that maps from encoded PK to internal
row id, it's not possible in the general case to do a reverse lookup from
row id to primary key because Kudu doesn't maintain such a reverse index
and Kudu also doesn't materialize columns that are not either in the query
projection or the predicate. Because of this, the WIP patch relies on a
combination of crawling through the index and generating
next-possible-values to look up in the ad-hoc index's B-tree to implement
the skip-scan logic.

As a result of repeated B-tree index lookups, there is a definite
performance cost to using skip scan, so it can't simply be used all the
time (a full table scan is definitely faster than a skip scan if all rows
need to be returned, since a full table scan doesn't have to query the
ad-hoc index). The patch implements a heuristic that basically boils down
to attempting skip scan for a while and periodically checking to ensure
that a certain ratio of rows included in the result to rows filtered out
doesn't get too high, and if we are not skipping enough rows it falls back
to a full table scan. There are several ideas that have been proposed for
specific heuristics to use, mostly variations on the same theme. In any
case, one of the primary concerns about merging the patch as-is is that the
current formula for the ratio may not be accurate (or conservative) enough
to avoid turning on skip scan in cases where we should just do a full table
scan or a range scan. However, in order to prove it one way or the other,
someone has to do some reasonable performance testing and make the case
based on data.

One final thing I'll note here is that the implementation in the patch is
currently limited only to equality predicates on the 2nd column in a
compound primary key, however once the core skip-scan implementation is in
place then if so desired it would be natural to extend it to inequality
predicates, predicates on arbitrary columns in the ad-hoc index, as well as
support constructs like IN LIST, BETWEEN, multiple range predicates, etc.
Extending the work to apply in those cases is not trivial but I think
putting in a basic implementation is a good first step toward generalizing
the optimization.

I hope you find this explanation helpful. Feel free to take a look and let
me know when you have questions.

Mike

On Tue, Mar 5, 2019 at 2:02 AM shanmuga chidambaravel <
[email protected]> wrote:

> Hi Mike,
>
> Thanks for the speedy reply.
> I would like to help with this. How do I get started on this?
>
> Regards,
> Shanmuga
>
> On Mon, Mar 4, 2019 at 11:59 AM Mike Percy <[email protected]> wrote:
>
> > Hi Shanmuga,
> > Unfortunately the original author's internship ended last summer and
> > nobody has taken the time to complete the work. It would definitely speed
> > up certain types of queries. There are some concerns that in its current
> > state it could cause a performance regression for some queries though. It
> > could probably benefit from improvements to the heuristics it uses to
> > decide when to enable the skip scan optimization. Let me know if you're
> > interested in working on it.
> >
> > Mike
> >
> > On Mon, Mar 4, 2019 at 2:40 AM shanmuga chidambaravel <
> > [email protected]> wrote:
> >
> >> Hi
> >>
> >> I recently saw this blog post about index skip scan in kudu
> >>
> >>
> https://kudu.apache.org/2018/09/26/index-skip-scan-optimization-in-kudu.html
> >>  .
> >> It seems like this would provide huge performance increase for filtering
> >> in
> >> Kudu.
> >>
> >> Can you please tell me if this is on roadmap for planned release?
> >>
> >> Regards,
> >> Shanmuga
> >>
> >
>

Re: Regarding index skip scan in kudu

Reply via email to