[ https://issues.apache.org/jira/browse/CASSANDRA-8931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646225#comment-14646225 ]
Jonathan Ellis commented on CASSANDRA-8931:
-------------------------------------------

Good idea. This will save a lot of memory.

> IndexSummary (and Index) should store the token, and the minimal key to unambiguously direct a query
> ----------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-8931
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8931
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Benedict
>            Assignee: Stefania
>              Labels: performance
>
> Since these files are likely sticking around a little longer, it is probably
> worth optimising them. A relatively simple change to Index and IndexSummary
> could reduce the amount of space required significantly, reduce the CPU
> burden of lookup, and hopefully bound the amount of space needed as key size
> grows.
>
> On writing first we always store the token before the key (if it is
> different to the key); then we simply truncate the whole record to the
> minimum length necessary to answer an inequality search. Since the data file
> contains the key also, we can corroborate we have the right key once we've
> looked up. Since BFs are used to reduce unnecessary lookups, we don't save
> much by ruling the false positives out one step earlier.
>
> An improved follow up version would be to use a trie of shortest length to
> answer inequality lookups, as this would also ensure very long keys with
> common prefixes would not significantly increase the size of the index or
> summary. This would translate to a trie index for the summary keying into a
> static trie page for the index.
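
For illustration only, the truncation described in the ticket could be sketched roughly as below. This is not Cassandra code. The helper names (record, shortest_separator, build_index, lookup) are made up for the example, the token is a stand-in (first 8 bytes of MD5 of the key) rather than a real partitioner, and an in-memory list plays the role of the data file that holds the full keys. Each index entry keeps only the shortest prefix of "token + key" that still sorts strictly after the previous record, and a lookup is confirmed against the full record afterwards.

```python
import bisect
import hashlib


def record(key: bytes) -> bytes:
    # Stand-in token: first 8 bytes of MD5(key). An assumption for this sketch,
    # not Cassandra's partitioner. The token is stored before the key.
    token = hashlib.md5(key).digest()[:8]
    return token + key


def shortest_separator(prev: bytes, cur: bytes) -> bytes:
    # Shortest prefix of cur that compares strictly greater than prev.
    # Requires prev < cur in bytewise order.
    n = 0
    limit = min(len(prev), len(cur))
    while n < limit and prev[n] == cur[n]:
        n += 1
    return cur[: n + 1]


def build_index(keys):
    # 'records' stands in for the data file (full token + key, sorted).
    # 'seps' is the truncated index: one shortened entry per record.
    records = sorted({record(k) for k in keys})
    seps = []
    prev = b""
    for r in records:
        seps.append(shortest_separator(prev, r))
        prev = r
    return records, seps


def lookup(records, seps, key: bytes):
    # Inequality search on the truncated entries: last entry <= target.
    target = record(key)
    i = bisect.bisect_right(seps, target) - 1
    # The truncated entry cannot prove a match, so corroborate against
    # the full record, as the ticket suggests doing with the data file.
    if i >= 0 and records[i] == target:
        return i
    return None
```

Calling build_index on a handful of keys and then lookup on each of them returns the position of the matching record, while a key that was never indexed returns None after the check against the full record. Very long keys with long shared prefixes would still produce long entries here, which is the case the trie follow-up in the ticket is aimed at.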