[ 
https://issues.apache.org/jira/browse/CASSANDRA-6694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13963907#comment-13963907
 ] 

Pavel Yaskevich commented on CASSANDRA-6694:
--------------------------------------------

bq. I'm saying performance critical code is impacted when you have virtual 
method calls that cannot be optimised by the VM (i.e. those with multiple 
implementations). I meant CASSANDRA-6553 and CASSANDRA-6934

Which means that if we actually optimize AbstractType and its derivatives to work 
directly with the underlying bytes, the whole problem could be resolved? That's why 
I want to understand why we can't have a simple implementation of the cell which 
has one buffer + metadata about component sizes (which could also be encoded), 
instead of having a buffer per component in the name (if composite) + a buffer for 
the value + a long timestamp. Maybe it would be easier to offload all of that work 
to AbstractType instead of trying to optimize at the Cell level?
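
To illustrate what I mean (names here are purely hypothetical, not from the patch 
set) - a single contiguous buffer plus component-size metadata, with components 
sliced out (or, better, compared in place by AbstractType) on demand:

{code}
import java.nio.ByteBuffer;

// Hypothetical sketch, not patch-set code: one contiguous buffer holds all
// name components followed by the value; the only extra metadata is the
// length of each name component (which could itself be encoded in the buffer).
public class FlatCell
{
    private final ByteBuffer data;        // [component_0][component_1]...[value]
    private final int[] componentLengths; // could be vint-encoded into 'data'
    private final long timestamp;

    public FlatCell(ByteBuffer data, int[] componentLengths, long timestamp)
    {
        this.data = data;
        this.componentLengths = componentLengths;
        this.timestamp = timestamp;
    }

    // Slice out name component i without copying; AbstractType could instead
    // be taught to compare directly against (data, offset, length).
    public ByteBuffer component(int i)
    {
        int offset = data.position();
        for (int j = 0; j < i; j++)
            offset += componentLengths[j];
        ByteBuffer slice = data.duplicate();
        slice.position(offset);
        slice.limit(offset + componentLengths[i]);
        return slice;
    }
}
{code}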

I went through the JVM instruction set doc (specifically 
http://docs.oracle.com/javase/specs/jvms/se7/html/jvms-6.html#jvms-6.5.invokespecial
 and 
http://docs.oracle.com/javase/specs/jvms/se7/html/jvms-6.html#jvms-6.5.invokestatic):
 those instructions are not that different, and both have to do a lookup in the 
constant_pool of that class, so I'm wondering whether it's the virtual calls that 
create the problem or something else masked by them...

It also looks like the static Impl scheme (as in the patch set) would execute the 
same number of instructions, because the compiler emits *aload_0* (this) in both 
cases, whether the call is invoke\{special, virtual\} or invokestatic, and more 
instructions in the static Impl form if we pass something other than "this". 
Generally, when callers use methods from a super class or interface (as is the 
case right now for e.g. Cell.dataSize()), the compiler emits *aload_0, 
invokevirtual #offset* directly to the Cell method, whereas with static Impl it 
has to do that twice: *aload_0, invokevirtual #offset* (to the method in 
DeletedCell.dataSize()) and then internally *aload_0, invokestatic #offset* (to 
DeletedCell.Impl.dataSize()), which means a longer constant_pool walk.
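
To make the comparison concrete, here is a simplified, hypothetical pair of call 
shapes (an abstract base class is used for brevity; the real Cell hierarchy 
differs), which can be checked with javap -c instead of taking my word for it:

{code}
// Simplified, hypothetical illustration of the two call shapes discussed
// above; the names echo the discussion, not the actual patch-set classes.
abstract class Cell
{
    abstract int dataSize();
}

// Direct scheme: a caller holding a Cell compiles to
//   aload_0, invokevirtual #offset   (Cell.dataSize)
// and that is the whole call.
class SimpleDeletedCell extends Cell
{
    int dataSize()
    {
        return 4; // e.g. just the local deletion time
    }
}

// Static Impl scheme: the same virtual call still happens first, but the body
// immediately delegates, adding aload_0, invokestatic #offset on top of it.
class DeletedCell extends Cell
{
    int dataSize()
    {
        return Impl.dataSize(this);
    }

    static final class Impl
    {
        static int dataSize(DeletedCell cell)
        {
            return 4;
        }
    }
}
{code}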


bq. Then what exactly do we win? We still have to have two hierarchies and the 
same modularization. Also the potential ease of optimizations for comparison 
disappear, and we still have increased indirection and virtual method call 
costs. If this is the suggestion, I am very -1, as the payoff is very small, 
the work nontrivial and the negatives substantial.

The wins are, primarily, less object overhead (the ultimate goal of all this) and 
maintainability of the code. We basically keep Cell split by type - expired, 
deleted, counter, client (the last one being used mostly by Thrift) - as it is 
right now, so no Buffered* or Native*, plus allocators of 3 types (maybe we 
actually don't need the one which allocates a DirectBuffer and can just go with 
the JNA-backed one) which allocate raw bytes. Cell reconcile, equals, dataSize and 
other methods become straightforward. Also, as we consider a Composite a complete 
entity, storing its components as contiguous blocks would reduce container 
overhead + speed up comparisons by exploiting spatial locality.
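
As a rough sketch of the spatial-locality point (again, hypothetical names, 
assuming components are stored back-to-back in a single allocation):

{code}
import java.nio.ByteBuffer;

// Hypothetical sketch: when all components of a composite live back-to-back
// in one buffer, comparing a component walks a contiguous region instead of
// chasing a separate ByteBuffer reference per component.
final class FlatComposite
{
    final ByteBuffer data; // all components, concatenated
    final int[] offsets;   // start of each component; offsets[size()] == end

    FlatComposite(ByteBuffer data, int[] offsets)
    {
        this.data = data;
        this.offsets = offsets;
    }

    int size()
    {
        return offsets.length - 1;
    }

    // Unsigned byte-order comparison of component i against the same component
    // of 'other'; a real implementation would go through the column's
    // AbstractType instead.
    int compareComponent(int i, FlatComposite other)
    {
        int aStart = offsets[i], aEnd = offsets[i + 1];
        int bStart = other.offsets[i], bEnd = other.offsets[i + 1];
        for (int a = aStart, b = bStart; a < aEnd && b < bEnd; a++, b++)
        {
            int cmp = (data.get(a) & 0xff) - (other.data.get(b) & 0xff);
            if (cmp != 0)
                return cmp;
        }
        return (aEnd - aStart) - (bEnd - bStart);
    }
}
{code}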

[~jbellis] mentioned this: "My preferred solution would be, stop extracting the 
name so often by itself. Spot checking the code, it seems we usually do this 
just to "simplify" a comparison, so this could in principle just be done with 
the Cell object rather than just the name." I think that would further benefit 
the approach that I'm describing.

> Slightly More Off-Heap Memtables
> --------------------------------
>
>                 Key: CASSANDRA-6694
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6694
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Benedict
>            Assignee: Benedict
>              Labels: performance
>             Fix For: 2.1 beta2
>
>
> The Off Heap memtables introduced in CASSANDRA-6689 don't go far enough, as 
> the on-heap overhead is still very large. It should not be tremendously 
> difficult to extend these changes so that we allocate entire Cells off-heap, 
> instead of multiple BBs per Cell (with all their associated overhead).
> The goal (if possible) is to reach an overhead of 16-bytes per Cell (plus 4-6 
> bytes per cell on average for the btree overhead, for a total overhead of 
> around 20-22 bytes). This translates to 8-byte object overhead, 4-byte 
> address (we will do alignment tricks like the VM to allow us to address a 
> reasonably large memory space, although this trick is unlikely to last us 
> forever, at which point we will have to bite the bullet and accept a 24-byte 
> per cell overhead), and 4-byte object reference for maintaining our internal 
> list of allocations, which is unfortunately necessary since we cannot safely 
> (and cheaply) walk the object graph we allocate otherwise, which is necessary 
> for (allocation-) compaction and pointer rewriting.
> The ugliest thing here is going to be implementing the various CellName 
> instances so that they may be backed by native memory OR heap memory.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
