[
https://issues.apache.org/jira/browse/CASSANDRA-7438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14232833#comment-14232833
]
Benedict commented on CASSANDRA-7438:
-------------------------------------
re: hash bits:
there's not really a dramatic benefit to using more than 32 bits. We will
always use the upper bits for the segment and the lower bits for the bucket,
and 4B items is plenty for that. We don't have proper entropy across all of
the bits, though; we may have only 28 bits of good collision resistance, so we
will want to rehash the murmur hash to spread the entropy evenly and avoid a
grow boundary that consistently fails to reduce collisions.
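For illustration, a minimal sketch of that split-and-rehash - hypothetical
helper names, not code from the attached patch - using a murmur3-style fmix32
finalizer to spread entropy before taking the segment and bucket bits:

    // Sketch only: hypothetical helpers, not the patch's actual code.
    public final class HashSplit
    {
        // Murmur3-style fmix32 finalizer: spreads entropy across all 32 bits so
        // that index bits newly exposed at a grow boundary actually reduce collisions.
        static int rehash(int h)
        {
            h ^= h >>> 16;
            h *= 0x85ebca6b;
            h ^= h >>> 13;
            h *= 0xc2b2ae35;
            h ^= h >>> 16;
            return h;
        }

        public static void main(String[] args)
        {
            int h = rehash("someRowKey".hashCode());
            int segment = h >>> (32 - 4);       // upper 4 bits -> one of 16 segments
            int bucket  = h & ((1 << 20) - 1);  // lower 20 bits -> bucket within the segment
            System.out.printf("segment=%d bucket=%d%n", segment, bucket);
        }
    }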
The one advantage of having some spare hash bits is that we can use them to
avoid running a potentially expensive comparison on a large key until we have
high confidence we've found the correct item - and as the number of hash bits
left unused by indexing dwindles, the value of this goes up. But the number of
instances where this helps will be vanishingly small, since the head of the
key will be on the same cache line, and a simultaneous hash collision and key
prefix collision is pretty unlikely. It might be more significant if we were
to use open-address hashing, as we would have excellent locality and would
reduce the number of expected cache misses per lookup; but even then it won't
be measurable above the cache serialization costs. We do typically have these
hash bits calculated in C* already. We are also unlikely to notice the storage
overhead - allocations are likely to have ~16 bytes of overhead, be padded to
the nearest 8 or 16 bytes, and a row has a lot of bumf to encode - so I doubt
there will be any measurable variation in storage costs from using all 64 bits.
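A minimal sketch of that hash-prefix check - assuming a hypothetical entry
layout of [int hash][int keyLength][key bytes][value bytes], not the layout
the patch actually uses - where the stored hash word is compared before any
key bytes are touched:

    import java.nio.ByteBuffer;
    import java.util.Arrays;

    // Sketch only: hypothetical entry layout, not the patch's.
    final class EntryProbe
    {
        // Returns true only if the entry holds 'key'. The 4-byte stored hash is
        // checked first, so the byte-wise key comparison runs only on a probable match.
        static boolean matches(ByteBuffer entry, int keyHash, byte[] key)
        {
            if (entry.getInt(0) != keyHash)
                return false;                   // cheap reject, no key bytes read

            int keyLength = entry.getInt(4);
            if (keyLength != key.length)
                return false;

            byte[] stored = new byte[keyLength];
            ByteBuffer dup = entry.duplicate(); // avoid disturbing the caller's position
            dup.position(8);
            dup.get(stored);
            return Arrays.equals(stored, key);
        }
    }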
i.e., whatever floats your boat
> Serializing Row cache alternative (Fully off heap)
> --------------------------------------------------
>
> Key: CASSANDRA-7438
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7438
> Project: Cassandra
> Issue Type: Improvement
> Components: Core
> Environment: Linux
> Reporter: Vijay
> Assignee: Vijay
> Labels: performance
> Fix For: 3.0
>
> Attachments: 0001-CASSANDRA-7438.patch, tests.zip
>
>
> Currently SerializingCache is partially off heap; keys are still stored in
> the JVM heap as ByteBuffers.
> * There are higher GC costs for a reasonably big cache.
> * Some users have used the row cache efficiently in production for better
> results, but this requires careful tuning.
> * Memory overhead for the cache entries is relatively high.
> So the proposal for this ticket is to move the LRU cache logic completely off
> heap and use JNI to interact with the cache. We might want to ensure that the
> new implementation matches the existing APIs (ICache), and the implementation
> needs to have safe memory access, low memory overhead and as few memcpys as
> possible.
> We might also want to make this cache configurable.
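For context, a minimal sketch of the "partially off heap" shape the ticket
describes - serialized values in direct buffers, with keys and LRU bookkeeping
still on the JVM heap. This is a simplified illustration, not Cassandra's
actual SerializingCache or the ICache interface; the proposal is to push the
index and LRU logic off heap as well:

    import java.nio.ByteBuffer;
    import java.util.LinkedHashMap;
    import java.util.Map;

    // Sketch only: simplified illustration of a partially off-heap cache.
    final class DirectValueCache<K>
    {
        private final Map<K, ByteBuffer> index;    // keys + LRU order stay on heap

        DirectValueCache(final int maxEntries)
        {
            // access-ordered LinkedHashMap gives simple LRU eviction of index entries
            this.index = new LinkedHashMap<K, ByteBuffer>(16, 0.75f, true)
            {
                @Override
                protected boolean removeEldestEntry(Map.Entry<K, ByteBuffer> eldest)
                {
                    return size() > maxEntries;
                }
            };
        }

        void put(K key, byte[] serializedValue)
        {
            ByteBuffer direct = ByteBuffer.allocateDirect(serializedValue.length);
            direct.put(serializedValue);
            direct.flip();
            index.put(key, direct);                // value bytes live off heap
        }

        byte[] get(K key)
        {
            ByteBuffer direct = index.get(key);
            if (direct == null)
                return null;
            byte[] copy = new byte[direct.remaining()];
            direct.duplicate().get(copy);          // one copy back on heap per hit
            return copy;
        }
    }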