[
https://issues.apache.org/jira/browse/CASSANDRA-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13962691#comment-13962691
]
Benedict commented on CASSANDRA-6694:
-------------------------------------
bq. I think Jonathan Ellis mentioned that it might be better to reduce usage of
the column names instead of merging cell with column name (if I remember
correctly)
I don't recall this suggestion. Perhaps you are referring to the suggestion
that we not extract the cell names from the cell as often as we do, for the
purpose of comparison, in order to reduce garbage production?
bq. Regarding placeholders idea, if we allocate contiguous region for the whole
cell we can just have memory object + 1 int (or was it even short?...) field
which marks the end of the column name at that buffer, as column timestamp is a
fixed size long we know exactly where column value ends, that also helps with
spatial locality in most of the
In this case, this suggestion has much more complex problems:
# More (multiple implementation) virtual method invocations (as shown by
CASSANDRA-6993 this can have meaningfully negative performance implications)
# Major refactor of AbstractType hierarchy to prevent bytebuffer allocation on
comparison
# More object allocation in the request threads due to having to re-pack all of
the parameters into a Cell with a single buffer, as opposed to just dropping
them in place
# At which point it would make most sense to refactor (and mostly eliminate)
the entirety of CASSANDRA-5417, as we're almost always pumping the result
straight into a Cell anyway, so extracting the components into separate buffers
and repacking them into a single buffer in the Cell is very wasteful
That said, it is *viable*. It has some advantages too: the comparisons between
Native and Buffer cells are much more easily optimised. Many of these changes
may well need to happen in the natural course of things anyway as we optimise
the native implementation. But it has comparatively wide-ranging implications
for the current on-heap use case that might be a bit too much to bite off right
now.
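For illustration only, the single-buffer layout under discussion (name, then a
fixed-size timestamp, then value, with one int marking the end of the name)
might look roughly like the sketch below. The class PackedCellSketch and all of
its members are hypothetical names, not anything in the Cassandra codebase, and
it uses a heap ByteBuffer purely to stay self-contained. Note that name() and
value() each allocate a new ByteBuffer view, which is the kind of garbage on
comparison that the AbstractType refactor would have to eliminate.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: one contiguous buffer per cell, laid out as
// [name bytes][8-byte timestamp][value bytes].
final class PackedCellSketch {
    private final ByteBuffer data; // whole cell in one region
    private final int nameEnd;     // offset at which the name ends

    PackedCellSketch(byte[] name, long timestamp, byte[] value) {
        data = ByteBuffer.allocate(name.length + Long.BYTES + value.length);
        data.put(name).putLong(timestamp).put(value);
        data.flip();
        nameEnd = name.length;
    }

    // Allocates a view object; a real implementation would want to compare in place.
    ByteBuffer name() {
        ByteBuffer b = data.duplicate();
        b.limit(nameEnd);
        return b.slice();
    }

    // The timestamp is a fixed-size long, so it sits directly after the name.
    long timestamp() {
        return data.getLong(nameEnd);
    }

    // The value runs from the end of the timestamp to the end of the buffer.
    ByteBuffer value() {
        ByteBuffer b = data.duplicate();
        b.position(nameEnd + Long.BYTES);
        return b.slice();
    }

    static String utf8(ByteBuffer b) {
        return StandardCharsets.UTF_8.decode(b).toString();
    }

    public static void main(String[] args) {
        PackedCellSketch cell = new PackedCellSketch(
            "col".getBytes(StandardCharsets.UTF_8), 42L,
            "val".getBytes(StandardCharsets.UTF_8));
        System.out.println(utf8(cell.name()) + " @ " + cell.timestamp()
                           + " = " + utf8(cell.value()));
    }
}
```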
bq. if it's not essential then we can do it at the very last stage once we done
with all more important changes which are plenty
I disagree. It makes the patch more complicated to *not* move it around.
That something is not essential does not mean it is not the better option.
> Slightly More Off-Heap Memtables
> --------------------------------
>
> Key: CASSANDRA-6694
> URL: https://issues.apache.org/jira/browse/CASSANDRA-6694
> Project: Cassandra
> Issue Type: Improvement
> Components: Core
> Reporter: Benedict
> Assignee: Benedict
> Labels: performance
> Fix For: 2.1 beta2
>
>
> The Off Heap memtables introduced in CASSANDRA-6689 don't go far enough, as
> the on-heap overhead is still very large. It should not be tremendously
> difficult to extend these changes so that we allocate entire Cells off-heap,
> instead of multiple BBs per Cell (with all their associated overhead).
> The goal (if possible) is to reach an overhead of 16 bytes per Cell (plus 4-6
> bytes per cell on average for the btree overhead, for a total overhead of
> around 20-22 bytes). This translates to an 8-byte object overhead, a 4-byte
> address, and a 4-byte object reference for maintaining our internal list of
> allocations. For the address, we will do alignment tricks like the VM to
> allow us to address a reasonably large memory space, although this trick is
> unlikely to last us forever, at which point we will have to bite the bullet
> and accept a 24-byte per-cell overhead. The list of allocations is
> unfortunately necessary, since we cannot otherwise safely (and cheaply) walk
> the object graph we allocate, which is necessary for (allocation-) compaction
> and pointer rewriting.
> The ugliest thing here is going to be implementing the various CellName
> instances so that they may be backed by native memory OR heap memory.
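As a worked example of the overhead figures in the description, the arithmetic
and the address-alignment trick can be sketched as follows. All names here are
hypothetical. The 8-byte alignment (a shift of 3, as the JVM's compressed
references use by default) is an assumption, since the description does not
state the alignment; so is the reading that the 24-byte figure comes from
8 + 8 + 4 = 20 bytes being rounded up to the next 8-byte object boundary.

```java
// Hypothetical sketch of the per-cell overhead arithmetic and of packing an
// aligned native offset into a 4-byte field.
final class CellOverheadSketch {
    static final int OBJECT_HEADER = 8;       // from the description
    static final int ALLOCATION_LIST_REF = 4; // from the description
    static final int ALIGNMENT_SHIFT = 3;     // assumption: 8-byte alignment

    // Header + address + allocation-list reference, rounded up to 8 bytes
    // (assumed object alignment).
    static int perCellOverhead(int addressBytes) {
        int raw = OBJECT_HEADER + addressBytes + ALLOCATION_LIST_REF;
        return (raw + 7) & ~7;
    }

    // Pack an 8-byte-aligned offset into 32 bits by dropping the zero low bits.
    static int encode(long offset) {
        if (offset < 0 || (offset & ((1L << ALIGNMENT_SHIFT) - 1)) != 0)
            throw new IllegalArgumentException("offset must be non-negative and aligned");
        long shifted = offset >>> ALIGNMENT_SHIFT;
        if (shifted > 0xFFFFFFFFL)
            throw new IllegalArgumentException("offset beyond addressable range");
        return (int) shifted;
    }

    // Recover the offset, treating the stored int as unsigned.
    static long decode(int compressed) {
        return (compressed & 0xFFFFFFFFL) << ALIGNMENT_SHIFT;
    }

    public static void main(String[] args) {
        System.out.println("4-byte address: " + perCellOverhead(4) + " bytes per cell");
        System.out.println("8-byte address: " + perCellOverhead(8) + " bytes per cell");
        System.out.println("addressable bytes: " + (1L << (32 + ALIGNMENT_SHIFT)));
    }
}
```

Under these assumptions a 4-byte field addresses 2^35 bytes (32 GiB), which is
why the trick is "unlikely to last us forever".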