[
https://issues.apache.org/jira/browse/CASSANDRA-6048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13800089#comment-13800089
]
Alex Liu commented on CASSANDRA-6048:
-------------------------------------
Since we don't know all the index results in advance, it's hard to build a
perfect hash, so it's difficult to use a bitmap here. We can reduce the random
I/O between the indexes and the base CF by increasing the paging page size to a
minimum of 200 (or any large number that won't cause an OOM), so hopefully we
can get all the results of each index within one sequential read of that index.
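
To illustrate the page size point, here is a minimal sketch and not the patch itself. `paged_index_scan` and `count_index_reads` are hypothetical names, and an index is modelled as a plain in-memory sequence of row keys, which is an assumption made only for illustration.

```python
from typing import Iterator, List, Sequence


def paged_index_scan(index_entries: Sequence[str], page_size: int) -> Iterator[List[str]]:
    """Yield the row keys of one index in pages of at most page_size entries.

    Hypothetical helper: a real index read goes to disk; here the index is
    just an in-memory sequence so the paging arithmetic is visible.
    """
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    for start in range(0, len(index_entries), page_size):
        yield list(index_entries[start:start + page_size])


def count_index_reads(index_entries: Sequence[str], page_size: int) -> int:
    """Count how many page reads are needed to drain the whole index."""
    return sum(1 for _ in paged_index_scan(index_entries, page_size))
```

With 150 index entries, a page size of 10 takes 15 reads, while a page size of 200 returns everything in a single read.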
Use a primary index scan + loop if the primary index is significantly more
selective (its mean number of columns is much lower) than the other indexes and
the number of index results is below some threshold. Once above the threshold,
we switch to index merging at run time.
In all other cases, use index merge.
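
The selection rule above could be sketched roughly as follows. This is an illustration under assumptions: `choose_strategy`, `selectivity_ratio` and `result_threshold` are hypothetical names, and the factor of 10 standing in for "significantly more selective" is an arbitrary placeholder, not a tuned value.

```python
from typing import Dict


def choose_strategy(mean_columns: Dict[str, float],
                    primary_result_count: int,
                    result_threshold: int,
                    selectivity_ratio: float = 10.0) -> str:
    """Return "scan_and_loop" or "index_merge" for a multi-index query.

    mean_columns maps an index name to its mean column count; the index
    with the smallest mean is treated as the primary index.
    """
    if len(mean_columns) < 2:
        # With a single index there is nothing to merge.
        return "scan_and_loop"
    ranked = sorted(mean_columns.values())
    primary_mean, runner_up_mean = ranked[0], ranked[1]
    # Assumed reading of "significantly more selective": the primary mean
    # is at least selectivity_ratio times lower than the next best index.
    much_more_selective = primary_mean * selectivity_ratio <= runner_up_mean
    if much_more_selective and primary_result_count < result_threshold:
        return "scan_and_loop"
    return "index_merge"
```

Calling it again as `primary_result_count` grows models the run-time switch: once the count reaches the threshold, the answer becomes `"index_merge"`.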
We can also set an upper bound on the number of indexes to be merged, so we
won't end up with an OOM.
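
A bounded merge might look like the sketch below. It assumes each index result is already available as an in-memory set of row keys, which is a simplification since in practice the results arrive page by page. `merge_indexes` and `max_indexes` are hypothetical names, and result-set size is used as a stand-in for selectivity.

```python
from typing import Dict, List, Set, Tuple


def merge_indexes(index_results: Dict[str, Set[str]],
                  max_indexes: int) -> Tuple[Set[str], List[str]]:
    """Intersect the row keys of at most max_indexes indexes.

    Returns the merged row keys plus the names of the indexes that were
    left out; their predicates must be applied as a filter after the base
    rows are fetched.
    """
    if max_indexes < 1:
        raise ValueError("max_indexes must be at least 1")
    if not index_results:
        return set(), []
    # Smallest result sets first, with the name as a deterministic tie-break.
    ordered = sorted(index_results, key=lambda name: (len(index_results[name]), name))
    merged_names, leftover = ordered[:max_indexes], ordered[max_indexes:]
    row_keys = set(index_results[merged_names[0]])
    for name in merged_names[1:]:
        row_keys &= index_results[name]
    return row_keys, leftover
```

Capping the merge keeps memory use bounded by the `max_indexes` smallest result sets; the remaining predicates still get applied, just as a filter on the fetched rows.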
> Add the ability to use multiple indexes in a single query
> ---------------------------------------------------------
>
> Key: CASSANDRA-6048
> URL: https://issues.apache.org/jira/browse/CASSANDRA-6048
> Project: Cassandra
> Issue Type: New Feature
> Components: Core
> Reporter: Alex Liu
> Assignee: Alex Liu
> Fix For: 2.1
>
> Attachments: 6048-1.2-branch.txt, 6048-trunk.txt
>
>
> Existing data filtering uses the following algorithm
> {code}
> 1. find the most selective predicate based on the smallest mean column count
> 2. fetch rows for the most selective predicate, then filter the
> data based on the remaining predicates.
> {code}
> So potentially we could improve the performance by
> {code}
> 1. joining multiple predicates, then filtering the data on the other
> predicates.
> 2. fine-tuning the best-predicate selection algorithm
> {code}
> For a multiple-predicate join, performance could improve if one predicate
> has many entries and another predicate has very few entries. It means a
> few index CF reads, joining the row keys, fetching the rows, then filtering
> on the other predicates.
> Another approach is to have an index on multiple columns.