Benedict resolved CASSANDRA-6709.

    Resolution: Later

Closing as the new sstable format most likely makes this unnecessary by 
eliminating the need for a separate key cache, although we *may* want to 
revisit this at some point afterwards since a separate cache could still be 
beneficial by improving memory occupancy rate, so closing as later instead of 

> Changes to KeyCache
> -------------------
>                 Key: CASSANDRA-6709
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6709
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Benedict
>            Priority: Minor
> It seems to me that KeyCache can be improved in a number of ways, but first 
> let's state the basic goal of KeyCache: to reduce the average query response 
> time by providing an exact seek position in a file for a given key.
> As it stands, KeyCache is both 100% accurate, but requires a lot of overhead 
> per entry.
> I propose to make KeyCache *mostly* accurate (say 99.9999%), by which I means 
> it will always fail accurately, but may rarely return an incorrect address, 
> and code the end users of it to be able to retry to confirm they seeked to 
> the correct position in the file, and to retry without the cache if they did 
> not.
> The advantage of this is that we can both take the cache off-heap easily, and 
> pack a lot more items into the cache. If we permit collisions across files 
> and simply use the (full 128-bit) murmur hash of the key + cfId + file 
> generation, we should get enough uniqueness to rarely have an erroneuous 
> collision, however we will be using only 20 bytes per key, instead of the 
> current 100 + <key length> bytes. This should allow us to answer far more 
> queries from the key cache than before, so the positive improvement to 
> performance should be greater than the negative drain.
> For the structure I propose an associative cache, where a single contiguous 
> address space is broken up into regions of, say, 8 entries, plus one counter. 
> The counter tracks the recency of access of each of the entries, so that on 
> write the least recently accessed/written can be replaced. A linear probe 
> within the region is used to determine if the entry we're looking for is 
> present. This should be very quick, as the entire region should fit into one 
> or two lines of L1.
> Advantage: we may see 5x bump in cache hit-rate, or even more for large keys.

This message was sent by Atlassian JIRA

Reply via email to