[
https://issues.apache.org/jira/browse/CASSANDRA-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13972524#comment-13972524
]
Benedict commented on CASSANDRA-6694:
-------------------------------------
So, on the whole I really don't perceive this approach as better: there's a
great deal of code duplication now (set to get worse still when you finish the
refactor for DecoratedKey), between each of the correspondingly named cell
implementations. Personally I think the Impl approach is neater as a result of
avoiding that (this may be more pronounced if we decide to optimise equals() is
you suggested). That said, if this moves us forwards I can live with it, if you
can address point 1 below.
There are a few problems though:
# I am *very* opposed to a public setPeer() method. This is a deal breaker for
me - but it can be avoided with a bit more refactoring.
# Your optimised updateDigest function is actually much slower than the old
implementation for all but the smallest values: an optimised version needs to
batch the contents into an array (stored in a ThreadLocal) and call
updateDigest with the array, unless the total size is very small (there's a
crossover point on my laptop of about 12 bytes, under which it's faster to call
update(byte)).
# AbstractNativeCell.getBytes actually calls setBytes
# excessHeapSize... should be unsharedHeapSize...
# There should be no hashCode method in Buffer\*Cell - I removed these for a
reason. Because we can have a Cell that is a CellName, and vice-versa, using a
Cell as a key for a map is likely dangerous. Since we don't do it anywhere,
it's safe to simply remove the methods.
There may be other minor issues, I'll hold off giving it a formal review until
we decide the direction we're going. To respond to a few of your comments:
bq. CounterUpdateCell interface is missing as well as NativeCounterUpdateCell
implementation to match it.
There shouldn't be one for the time being - we can never construct one.
bq. CounterUpdateCell should be BufferCounterUpdateCell as it extends BufferCell
Same reason - it doesn't exist as either or, so I made a conscious decision to
leave it as a CounterUpdateCell: the fact that it extends BufferCell is kind of
unimportant. It's purpose is somewhat different, and I think it is better left
named CounterUpdateCell, as that is its purpose (to carry a counter update as
far as the memtable, and no further).
bq. Impl classes extends another Impl classes which doesn't make much sense as
all of the methods are static.
This brings in the namespace of the extended class' static methods, which is
useful.
bq. When taken out of context like that it doesn't really make sense but what I
meant, there are situation where we don't really need to get BB from the
CellName but can transfer bytes directly (especially for the native cell
implementations).
Sure, but again: scope of ticket, and care needs to be taken when doing this
(e.g. your updateDigest modifications)
> Slightly More Off-Heap Memtables
> --------------------------------
>
> Key: CASSANDRA-6694
> URL: https://issues.apache.org/jira/browse/CASSANDRA-6694
> Project: Cassandra
> Issue Type: Improvement
> Components: Core
> Reporter: Benedict
> Assignee: Benedict
> Labels: performance
> Fix For: 2.1 beta2
>
>
> The Off Heap memtables introduced in CASSANDRA-6689 don't go far enough, as
> the on-heap overhead is still very large. It should not be tremendously
> difficult to extend these changes so that we allocate entire Cells off-heap,
> instead of multiple BBs per Cell (with all their associated overhead).
> The goal (if possible) is to reach an overhead of 16-bytes per Cell (plus 4-6
> bytes per cell on average for the btree overhead, for a total overhead of
> around 20-22 bytes). This translates to 8-byte object overhead, 4-byte
> address (we will do alignment tricks like the VM to allow us to address a
> reasonably large memory space, although this trick is unlikely to last us
> forever, at which point we will have to bite the bullet and accept a 24-byte
> per cell overhead), and 4-byte object reference for maintaining our internal
> list of allocations, which is unfortunately necessary since we cannot safely
> (and cheaply) walk the object graph we allocate otherwise, which is necessary
> for (allocation-) compaction and pointer rewriting.
> The ugliest thing here is going to be implementing the various CellName
> instances so that they may be backed by native memory OR heap memory.
--
This message was sent by Atlassian JIRA
(v6.2#6252)