[
https://issues.apache.org/jira/browse/CASSANDRA-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13964564#comment-13964564
]
Pavel Yaskevich commented on CASSANDRA-6694:
--------------------------------------------
bq. The JVM instruction set is besides the point. The point is what hotspot
will do: with a single implementor or static method of small enough bytecode
representation, it will be inlined. Note I said "multiple implementation"
virtual method. With the option you suggest we will need an extra virtual
invocation cost with every access to the underlying bytes, some extra math to
access the right location, and one extra object field reference to locate the
position we're offsetting from. These costs mount up rapidly.
How is that beside the point when you claim that method calls with multiple
implementations are slower than (and are not inlined, unlike) static method
invocations from multiple classes, which is basically a constant_pool
reimplementation in your code?... What I claim is that it doesn't matter
whether you override a method multiple times or call a static method which
calls another static method, as your patch does for DeletedCell: e.g. \{Native,
Buffer\}DeletedCell.cellDataSize() calls DeletedCell.Impl.cellDataSize(this),
which in turn transfers to Cell.Impl.cellDataSize(this). Just make an example,
disassemble the classes (with javap -c or similar), and see what bytecode was
generated. Also, for the inlining problem, I would like to see proof of why
those methods are not getting inlined (are they even touched by the JIT?) by
enabling logging with -XX:+UnlockDiagnosticVMOptions -XX:+PrintCompilation
-XX:+PrintInlining and sharing the output; otherwise the claim that a "multiple
implementation" virtual method is slow is just empty rhetoric.
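The delegation chain under discussion can be sketched roughly as follows. The class and method names mirror the ones in the ticket, but the bodies are placeholders, not the actual patch:

```java
// Hypothetical mini-reproduction of the pattern being debated: a concrete
// cell's instance method forwards to a static Impl method, which forwards
// again to the shared base Impl, mirroring
// {Native,Buffer}DeletedCell.cellDataSize() -> DeletedCell.Impl -> Cell.Impl.

interface Cell {
    int cellDataSize();

    // Shared static implementation, analogous to Cell.Impl in the patch.
    final class Impl {
        static int cellDataSize(Cell cell) {
            return 16; // placeholder; the real code would inspect the cell's bytes
        }
    }
}

interface DeletedCell extends Cell {
    final class Impl {
        static int cellDataSize(DeletedCell cell) {
            return Cell.Impl.cellDataSize(cell); // transfers to the base Impl
        }
    }
}

class BufferDeletedCell implements DeletedCell {
    @Override
    public int cellDataSize() {
        // Static delegation rather than a chain of overriding methods.
        return DeletedCell.Impl.cellDataSize(this);
    }
}
```

Disassembling `BufferDeletedCell` with `javap -c` would show the delegation compiled to an `invokestatic`, which is the bytecode-level point: whether that chain or a virtual override ends up cheaper is exactly what the PrintInlining output would settle.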
bq. Hmm. No, I now note your "client" implementation: what exactly is this one?
Please clarify, as the thrift cell is going to need to be compared with the
other implementations, and suddenly much of any benefit will disappear. The
best way to make comparisons cheap and easy is to have both sides of the
comparison have at least the same layout. If we have to either virtual invoke
or instanceof check for every comparison, and a different code path for
comparing each type of representation, there will be a performance impact. As
such the only main benefit of this approach is eliminated in my eyes. Also, how
will this "client" implementation achieve its various functions, and define its
type? Seems like you'll need a duplicate hierarchy still.
That was just a suggestion for a temporary container between the client
transport and the memtable. As those buffers are already allocated separately
by thrift, it seems reasonable to have Cell work with them. It would take more
memory for the ByteBuffer containers passed from Thrift, but the cell
comparison logic should not change, because both sides would operate on the
common container type; it's a similar concept to what Netty does with a ByteBuf
gathered from other ByteBuf pieces.
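The idea that a common container type keeps comparison on a single code path can be sketched like this. The class name and layout are purely illustrative, not taken from the patch:

```java
import java.nio.ByteBuffer;

// Illustrative only: if both a thrift-sourced cell and a memtable cell expose
// their value as a ByteBuffer, one comparator handles either, with no
// instanceof checks or per-representation code paths.
final class BufferBackedCell {
    private final ByteBuffer value;

    BufferBackedCell(ByteBuffer value) {
        this.value = value;
    }

    ByteBuffer value() {
        // duplicate() so comparisons don't disturb the original position/limit
        return value.duplicate();
    }

    // Single comparison path regardless of where the backing buffer came
    // from, similar in spirit to Netty presenting one ByteBuf view gathered
    // from other ByteBuf pieces.
    static int compare(BufferBackedCell a, BufferBackedCell b) {
        return a.value().compareTo(b.value());
    }
}
```

The trade-off noted above applies: each wrapper costs extra heap for the ByteBuffer container, but the comparator never needs to know which transport produced the bytes.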
> Slightly More Off-Heap Memtables
> --------------------------------
>
> Key: CASSANDRA-6694
> URL: https://issues.apache.org/jira/browse/CASSANDRA-6694
> Project: Cassandra
> Issue Type: Improvement
> Components: Core
> Reporter: Benedict
> Assignee: Benedict
> Labels: performance
> Fix For: 2.1 beta2
>
>
> The Off Heap memtables introduced in CASSANDRA-6689 don't go far enough, as
> the on-heap overhead is still very large. It should not be tremendously
> difficult to extend these changes so that we allocate entire Cells off-heap,
> instead of multiple BBs per Cell (with all their associated overhead).
> The goal (if possible) is to reach an overhead of 16-bytes per Cell (plus 4-6
> bytes per cell on average for the btree overhead, for a total overhead of
> around 20-22 bytes). This translates to 8-byte object overhead, 4-byte
> address (we will do alignment tricks like the VM to allow us to address a
> reasonably large memory space, although this trick is unlikely to last us
> forever, at which point we will have to bite the bullet and accept a 24-byte
> per cell overhead), and 4-byte object reference for maintaining our internal
> list of allocations, which is unfortunately necessary since we cannot safely
> (and cheaply) walk the object graph we allocate otherwise, which is necessary
> for (allocation-) compaction and pointer rewriting.
> The ugliest thing here is going to be implementing the various CellName
> instances so that they may be backed by native memory OR heap memory.
--
This message was sent by Atlassian JIRA
(v6.2#6252)