[
https://issues.apache.org/jira/browse/CASSANDRA-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13920720#comment-13920720
]
Benedict commented on CASSANDRA-6689:
-------------------------------------
bq. sort of RCU (i'm looking at you OpOrder)
What do you mean here? If you mean read-copy-update, OpOrder is nothing like
this.
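To illustrate the distinction: RCU lets readers proceed on a snapshot while updaters copy-and-swap, whereas OpOrder groups operations into ordered epochs and lets a reclaimer issue a barrier and wait for all operations in earlier groups to drain before freeing memory. A much-simplified sketch of that barrier idea (hypothetical names and structure, not the actual OpOrder API, and not linearizable under concurrent start/await):

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of OpOrder-style grouping: operations register against
// the current group; a barrier opens a new group and waits for all older
// groups to drain before memory they might have observed is reclaimed.
public final class EpochBarrier
{
    static final class Group
    {
        final long epoch;
        final AtomicInteger running = new AtomicInteger();
        Group(long epoch) { this.epoch = epoch; }
    }

    private volatile Group current = new Group(0);

    /** Called by a reader/writer before touching shared memory. */
    Group start()
    {
        Group g = current;
        g.running.incrementAndGet();
        return g;
    }

    /** Called when the operation finishes. */
    void finish(Group g)
    {
        g.running.decrementAndGet();
    }

    /** Barrier: open a new group, then wait for the old one to drain. */
    void await()
    {
        Group old = current;
        current = new Group(old.epoch + 1);
        while (old.running.get() > 0)
            Thread.onSpinWait(); // real code would park/signal instead of spin
        // here, no operation started before the barrier is still running,
        // so memory those operations could have seen is safe to reclaim
    }
}
```

The point of the sketch is that readers never copy anything and never touch per-item state; the cost sits with the (rare) reclaimer, which is quite unlike RCU's copy-and-swap on the update path.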
bq. I'm not sure what is to retain here if we do that copy when we send to the
wire
Ultimately, doing this copying before sending to the wire is something I would
like to avoid. Using RefAction.allocateOnHeap() on top of this copying sees
wire transfer speeds for thrift drop by about 10% in my fairly rough-and-ready
benchmarks, so copying obviously has a cost. Possibly this cost is due to
unavoidably copying data you don't necessarily want to serialise, but it seems
to be there. If we want to get in-memory read operations to 10x their current
performance, we cannot afford to cut corners like this.
bq. introducing separate gc
I've stated clearly what this introduces as a benefit: overwrite workloads no
longer cause excessive flushes.
bq. things but as we have a fixed number of threads it is going to work out
the same way as for buffering open files in the steady system state
Your next sentence states how this is a large cause of memory consumption, so
surely we should, where possible, be putting that memory to other uses
(returning it to the buffer cache, or using it internally for more caching)?
bq. Temporary memory allocated by readers is exactly what we should be managing
in the first place, because they allocate the most and it is always the biggest
concern for us
I agree we should be moving to managing this as well; however, I disagree about
how we should be managing it. In the medium term we should be bringing the
buffer cache in process, so that we can answer some queries without handing off
to another stage (anything known to be non-blocking and fast should be
answered immediately by the thread that processed the connection). At that
point we will benefit from shared use of the memory pool, concrete control
over how much memory readers are using, and zero-copy reads from the buffer
cache. I hope we may be able to do this for 3.0.
bq. do a simple memcpy test and see how much mb/s can you get from copying from
one pre-allocated pool to another
Are you performing a full object tree copy, and doing this with a running
system to see how it affects the performance of other system components? If
not, it doesn't seem to be a useful comparison. Note that this will still
create a tremendous amount of heap churn, as most of the memory used by objects
right now is on-heap. So copying the records is almost certainly no better for
young gen pressure than what we currently do - in fact, *it probably makes the
situation worse*.
bq. it's not the memtable which creates most of the noise and memory
pressure in the system (even though it uses a big chunk of heap)
It may not be causing the young gen pressure you're seeing, but it certainly
offers some benefit here: by keeping more rows in memory, recent queries are
more likely to be answered with zero allocation, reducing young gen pressure.
It is also a foundation for improving the row cache and introducing a shared
page cache, which could bring us closer to zero-allocation reads.
It's also not clear to me how you would be managing the reclaim of the off-heap
allocations without OpOrder, or do you mean to only use off-heap buffers for
readers, or to ref-count any memory as you're reading it? Not using off-heap
memory for the memtables would negate the main original point of this ticket:
to support larger memtables, thus reducing write amplification. Ref-counting
incurs overhead linear in the size of the result set, much like copying, and is
also fiddly to get right (I'm not convinced it's cleaner or neater), whereas
OpOrder incurs overhead proportional to the number of times you reclaim. So if
you're using OpOrder, all you're really talking about is a new RefAction:
copyToAllocator() or something. So it doesn't notably reduce complexity, it
just reduces the quality of the end result.
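The overhead comparison can be counted directly: ref-counting pins and unpins every row a read touches, so its cost grows with the result set, while an OpOrder-style guard registers once per operation regardless of how many rows are read. A hypothetical sketch (illustrative counters, not Cassandra code):

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of the overhead argument: count the atomic operations
// each reclamation scheme charges to a read of N rows.
public final class ReclaimCost
{
    /** Ref-counting: one pin and one unpin per row read. */
    static int refCountOps(int rowsRead)
    {
        AtomicInteger refs = new AtomicInteger();
        int atomicOps = 0;
        for (int i = 0; i < rowsRead; i++)
        {
            refs.incrementAndGet(); atomicOps++; // pin row
            refs.decrementAndGet(); atomicOps++; // unpin row
        }
        return atomicOps; // linear in rowsRead
    }

    /** OpOrder-style guard: register once at start, once at finish. */
    static int opOrderStyleOps(int rowsRead)
    {
        // reclamation cost is paid by the (rare) barrier, not per row,
        // so the reader's cost is constant regardless of rowsRead
        return 2;
    }
}
```

Reading 1,000 rows costs 2,000 atomic operations under per-row ref-counting but only 2 under the epoch-style guard, which is the sense in which ref-counting's overhead resembles copying's.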
> Partially Off Heap Memtables
> ----------------------------
>
> Key: CASSANDRA-6689
> URL: https://issues.apache.org/jira/browse/CASSANDRA-6689
> Project: Cassandra
> Issue Type: New Feature
> Components: Core
> Reporter: Benedict
> Assignee: Benedict
> Fix For: 2.1 beta2
>
> Attachments: CASSANDRA-6689-small-changes.patch
>
>
> Move the contents of ByteBuffers off-heap for records written to a memtable.
> (See comments for details)
--
This message was sent by Atlassian JIRA
(v6.2#6252)