[ https://issues.apache.org/jira/browse/CASSANDRA-12915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15742552#comment-15742552 ]
Alex Petrov commented on CASSANDRA-12915:
-----------------------------------------

bq. But really it would be better if the query optimizer was in the database rather than in our code.

I'm sorry if it sounded like I'm opposed to the change. I'm just saying that the bottleneck is in the way Terms are iterated, so we may want to address that first. Also, I'm sure this approach does not bring in any regressions, but if we improve Term iteration, we may bring performance to a level comparable with what happens in the {{TokenTree}}, which would mean that we don't have to ignore one of the indexes. In my opinion, the join of two indexes is a manifestation of the problem, not its source, so I'd rather fix the cause.

bq. Do you see a quick way of fixing the 'empty iterator' issue ?

Sure, I have a patch in my branch, which pretty much literally returns an empty iterator (one that'd return {{false}} for {{hasNext}} and {{endOfData}} for {{next}}).

bq. Would you agree to change the format to include proper cardinality estimation to the index (so we can build upon later)

It's not my decision. You can post the proposal to the Mailing List, and everyone who's involved with SASI can contribute to the discussion and provide insightful feedback. At this point I'm not sure about:
a) how exactly to do that. Technically we have counts on the token tree level, but it seems that with LIKE the problem is one level higher, on the TERM level, so we have to know how many items would match a non-EQ query. I'm not aware of any way to do this without a linear scan, tree or trie (e.g. estimate). My initial comment was related to EQ.
b) whether we really need to follow this path, or should understand how to optimise {{LIKE}} prefix queries instead. But this may be a very significant chunk of work.

bq. What would be the ETA for a proper query planner ? If O(months), would it make sense to merge something like what I did to bring some performance improvement in the meantime ?

Unfortunately I do not have any estimates on that. In my personal opinion, a Query Planner in SASI is secondary to improvements to RangeIterators, #11990 (and all the opportunities it opens up) and possible optimisations of how LIKE queries are iterated; these are going to both solve this particular problem and give much better results in the "average case".

Hope that helps.

> SASI: Index intersection can be very inefficient
> ------------------------------------------------
>
>                 Key: CASSANDRA-12915
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12915
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: sasi
>            Reporter: Corentin Chary
>             Fix For: 3.x
>
>
> It looks like RangeIntersectionIterator.java can be pretty inefficient in
> some cases. Let's take the following query:
> SELECT data FROM table WHERE index1 = 'foo' AND index2 = 'bar';
> In this case:
> * index1 = 'foo' will match 2 items
> * index2 = 'bar' will match ~300k items
> On my setup, the query will take ~1 sec, most of the time being spent in
> disk.TokenTree.getTokenAt().
> If I patch RangeIntersectionIterator so that it doesn't try to do the
> intersection (and effectively only use 'index1') the query will run in a few
> tenth of milliseconds.
> I see multiple solutions for that:
> * Add a static threshold to avoid the use of the index for the intersection
> when we know it will be slow. Probably when the range size factor is very
> small and the range size is big.
> * CASSANDRA-10765

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
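
For illustration only, the asymmetry described in the issue (2 matches on one index, ~300k on the other) can be sketched outside of SASI with two sorted arrays of tokens. This is a minimal sketch under stated assumptions: the class {{IntersectionSketch}} and both method names are hypothetical, plain {{long}} arrays stand in for on-disk token trees, and nothing here reflects how {{RangeIntersectionIterator}} is actually implemented. It only shows that the result is the same whether the two sides are merged step by step or the large side is probed once per token of the selective side.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration; this is not SASI code. */
public final class IntersectionSketch {

    /** Probes the large sorted array once per token of the small one. */
    public static List<Long> intersectBySeeking(long[] small, long[] large) {
        List<Long> out = new ArrayList<>();
        for (long token : small) {
            // One O(log n) binary search per token of the selective side.
            if (Arrays.binarySearch(large, token) >= 0) {
                out.add(token);
            }
        }
        return out;
    }

    /** Classic linear merge: advances whichever side holds the smaller token. */
    public static List<Long> intersectByMerging(long[] a, long[] b) {
        List<Long> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < a.length && j < b.length) {
            if (a[i] == b[j]) {
                out.add(a[i]);
                i++;
                j++;
            } else if (a[i] < b[j]) {
                i++;
            } else {
                j++;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        long[] index1 = {7L, 250_000L};       // selective side: 2 tokens
        long[] index2 = new long[300_000];    // broad side: 300k tokens
        for (int k = 0; k < index2.length; k++) {
            index2[k] = 2L * k;               // even tokens only, already sorted
        }
        System.out.println(intersectBySeeking(index1, index2));
        System.out.println(intersectByMerging(index1, index2));
    }
}
```

With these inputs the merge walks a large part of the broad side one element at a time, while the seeking variant performs two binary searches. In a real index each step or seek has its own cost (the issue points at {{disk.TokenTree.getTokenAt()}}), so the sketch says nothing about actual timings.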