[ 
https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14232833#comment-14232833
 ] 

Benedict commented on CASSANDRA-7438:
-------------------------------------

re: hash bits:

There's not really a dramatic benefit to using more than 32 bits. We will 
always use the upper bits for the segment and the lower bits for the bucket, 
for which 4B items is plenty. We don't have proper entropy in all the bits, 
though; we may have only 28 bits of good collision-freeness. So we will want to 
rehash the murmur hash to ensure it is spread evenly; otherwise a grow boundary 
could consistently fail to reduce collisions. 
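
A rough sketch of the indexing scheme described above, for illustration only. 
The class name, the method names and SEGMENT_BITS are hypothetical and not 
taken from the patch; the remix step uses the murmur3 32-bit finalizer as one 
possible rehash, which is an assumption rather than what the patch does:

```java
final class HashBits {
    // Hypothetical segment count (2^4 = 16), chosen only for illustration.
    static final int SEGMENT_BITS = 4;

    /** murmur3's 32-bit finalizer: spreads entropy across all 32 bits. */
    static int remix(int h) {
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    /** Fold a 64-bit hash into 32 bits, then remix it. */
    static int hash32(long hash64) {
        return remix((int) (hash64 ^ (hash64 >>> 32)));
    }

    /** The upper bits select the segment. */
    static int segmentOf(int h) {
        return h >>> (32 - SEGMENT_BITS);
    }

    /** The lower bits select the bucket; tableSize must be a power of two. */
    static int bucketOf(int h, int tableSize) {
        return h & (tableSize - 1);
    }
}
```

When a table doubles, one more low bit joins the bucket mask, so an entry 
either stays in its bucket or moves up by the old table size. If that bit is 
poorly distributed, the grow does little to split up long chains, which is the 
reason for remixing first.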

The one advantage of having some spare hash bits is that we can use them to 
avoid running a potentially expensive comparison on a large key until we have 
high confidence we've found the correct item, and as the number of hash bits 
not consumed by indexing dwindles, the value of this goes up. But the number of 
instances where this helps will be vanishingly small, since the head of the key 
will be on the same cache line, and a hash collision combined with a key prefix 
collision is pretty unlikely. It might be more significant if we were to use 
open-address hashing, as we would have excellent locality and it would reduce 
the number of expected cache misses for a lookup. But this won't be measurable 
above the cache serialization costs. We do already have these hash bits 
calculated in C*, typically. We are also unlikely to notice the storage 
overhead: allocations are likely to have ~16 bytes of overhead and be padded to 
the nearest 8 or 16 bytes, and a row has a lot of bumf to encode. I doubt there 
will be any variation in storage costs from using all 64 bits.
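
To make the spare-bits point concrete, here is a minimal sketch of a chained 
lookup that checks the stored hash before touching the key. HashedEntry, 
ChainLookup and the keyComparisons counter are hypothetical names, and the 
entries are plain on-heap objects purely for illustration; the cache in this 
ticket would keep them off heap:

```java
import java.util.Arrays;

final class HashedEntry {
    final long hash;        // full 64-bit hash stored alongside the entry
    final byte[] key;
    final HashedEntry next; // next entry in the same bucket chain

    HashedEntry(long hash, byte[] key, HashedEntry next) {
        this.hash = hash;
        this.key = key;
        this.next = next;
    }
}

final class ChainLookup {
    // Counts full key comparisons, only to show how many are skipped.
    static int keyComparisons = 0;

    static HashedEntry find(HashedEntry head, long hash, byte[] key) {
        for (HashedEntry e = head; e != null; e = e.next) {
            if (e.hash != hash) {
                continue; // cheap reject: hashes differ, so keys differ
            }
            keyComparisons++;
            if (Arrays.equals(e.key, key)) {
                return e; // hash and key both match
            }
        }
        return null;
    }
}
```

With 64 stored bits, almost every non-matching entry in a chain is rejected by 
the hash check alone, so the full key comparison runs roughly once per 
successful lookup.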

i.e., whatever floats your boat

> Serializing Row cache alternative (Fully off heap)
> --------------------------------------------------
>
>                 Key: CASSANDRA-7438
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7438
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>         Environment: Linux
>            Reporter: Vijay
>            Assignee: Vijay
>              Labels: performance
>             Fix For: 3.0
>
>         Attachments: 0001-CASSANDRA-7438.patch, tests.zip
>
>
> Currently SerializingCache is only partially off heap; keys are still stored 
> in the JVM heap as BB.
> * There are higher GC costs for a reasonably big cache.
> * Some users have used the row cache efficiently in production for better 
> results, but this requires careful tuning.
> * Memory overhead for the cache entries is relatively high.
> So the proposal for this ticket is to move the LRU cache logic completely off 
> heap and use JNI to interact with the cache. We might want to ensure that the 
> new implementation matches the existing APIs (ICache), and the implementation 
> needs to have safe memory access, low memory overhead and fewer memcpy's 
> (as much as possible).
> We might also want to make this cache configurable.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
