Hi Shanmuga, Ok, sounds good. Make sure you are subscribed to dev@ in case you are not already. I also suggest you join the Slack channel linked from kudu.apache.org in case you want to discuss details in the chat channel, although communicating primarily on dev@ is also fine.
The current work-in-progress patch for skip-scan is at https://gerrit.cloudera.org/c/10983/ -- it's a somewhat large patch that deals with low-level internals. I just rebased the patch onto the latest master branch code and re-posted it as revision 21. To understand the patch, definitely read the commit message and the comments in the patch, and you will need to learn something about Kudu tablet internals (on the plus side, there is no distributed component to this feature). One good high level resource is the Kudu whitepaper <https://kudu.apache.org/kudu.pdf> from 2015. We also have design docs <https://github.com/apache/kudu/blob/master/docs/design-docs/README.md> in the Kudu source tree; the most relevant one for this patch is the CFile design doc <https://github.com/apache/kudu/blob/master/docs/design-docs/cfile.md> (although it is written as more of a reference). To get a sense of the remaining work needed to merge the patch with its current scope, take a look at the review comments on the patch. There are quite a few suggestions there, most of them important but also nit-picky. However, they won't give you a lot of insight into the implementation approach. Unfortunately there is no a detailed design doc but I will try to outline the important details here. The skip-scan concept is a query optimization that calls for the index to be used in cases where there are query predicates specified on columns that are in the index and where the query is likely to match multiple rows, but not all rows. Kudu only has a single index (the primary key index), and we can make a good example using a compound primary key. Imagine we have a schema like: CREATE TABLE views ( host STRING, timestamp INT, count INT, PRIMARY KEY (host, timestamp) ); Let's specify a query: SELECT count from views WHERE timestamp = 100; (note: I'm using SQL syntax for clarity in the explanation) Currently Kudu will simply do a full table scan for this query. If instead we could leverage the primary key index to do a skip-scan then we could potentially avoid reading a lot of irrelevant rows and therefore save on I/O costs. So the fundamental idea behind skip scan seems reasonable in this case. The implementation of this in Kudu gets a little complicated, because while Kudu maintains a forward index for the primary key (Kudu calls this the ad-hoc index for historical reasons) that maps from encoded PK to internal row id, it's not possible in the general case to do a reverse lookup from row id to primary key because Kudu doesn't maintain such a reverse index and Kudu also doesn't materialize columns that are not either in the query projection or the predicate. Because of this, the WIP patch relies on a combination of crawling through the index and generating next-possible-values to look up in the ad-hoc index's B-tree to implement the skip-scan logic. As a result of repeated B-tree index lookups, there is a definite performance cost to using skip scan, so it can't simply be used all the time (a full table scan is definitely faster than a skip scan if all rows need to be returned, since a full table scan doesn't have to query the ad-hoc index). The patch implements a heuristic that basically boils down to attempting skip scan for a while and periodically checking to ensure that a certain ratio of rows included in the result to rows filtered out doesn't get too high, and if we are not skipping enough rows it falls back to a full table scan. There are several ideas that have been proposed for specific heuristics to use, mostly variations on the same theme. In any case, one of the primary concerns about merging the patch as-is is that the current formula for the ratio may not be accurate (or conservative) enough to avoid turning on skip scan in cases where we should just do a full table scan or a range scan. However, in order to prove it one way or the other, someone has to do some reasonable performance testing and make the case based on data. One final thing I'll note here is that the implementation in the patch is currently limited only to equality predicates on the 2nd column in a compound primary key, however once the core skip-scan implementation is in place then if so desired it would be natural to extend it to inequality predicates, predicates on arbitrary columns in the ad-hoc index, as well as support constructs like IN LIST, BETWEEN, multiple range predicates, etc. Extending the work to apply in those cases is not trivial but I think putting in a basic implementation is a good first step toward generalizing the optimization. I hope you find this explanation helpful. Feel free to take a look and let me know when you have questions. Mike On Tue, Mar 5, 2019 at 2:02 AM shanmuga chidambaravel < [email protected]> wrote: > Hi Mike, > > Thanks for the speedy reply. > I would like to help with this. How do I get started on this? > > Regards, > Shanmuga > > On Mon, Mar 4, 2019 at 11:59 AM Mike Percy <[email protected]> wrote: > > > Hi Shanmuga, > > Unfortunately the original author's internship ended last summer and > > nobody has taken the time to complete the work. It would definitely speed > > up certain types of queries. There are some concerns that in its current > > state it could cause a performance regression for some queries though. It > > could probably benefit from improvements to the heuristics it uses to > > decide when to enable the skip scan optimization. Let me know if you're > > interested in working on it. > > > > Mike > > > > On Mon, Mar 4, 2019 at 2:40 AM shanmuga chidambaravel < > > [email protected]> wrote: > > > >> Hi > >> > >> I recently saw this blog post about index skip scan in kudu > >> > >> > https://kudu.apache.org/2018/09/26/index-skip-scan-optimization-in-kudu.html > >> . > >> It seems like this would provide huge performance increase for filtering > >> in > >> Kudu. > >> > >> Can you please tell me if this is on roadmap for planned release? > >> > >> Regards, > >> Shanmuga > >> > > >
