[
https://issues.apache.org/jira/browse/CASSANDRA-15229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17082202#comment-17082202
]
Benedict Elliott Smith commented on CASSANDRA-15229:
----------------------------------------------------
bq. In networking, most of the time, buffers will be released immediately after
allocation, and with recycleWhenFree=false, fully freed chunks will be reused
instead of being recycled to the global list. Partial recycling is unlikely to
affect networking usage. I am happy to test it.
It is famously difficult to prove a negative, particularly via external
testing. The assumption will not hold in some circumstances, most notably large
message processing (which happens asynchronously). I would need to review the
buffer control flow in messaging to confirm that modifying the behaviour here is
sufficiently low risk, so I would prefer we not modify it in a way that is not
easily verified.
bq. will it create fragmentation in system direct memory?
Fragmentation cannot easily be ruled out completely, but since this data will be
allocated mostly in its own virtual page space (all allocations are much larger
than a normal page), it hopefully shouldn't be an insurmountable problem for
most allocators, given the almost unlimited virtual page space available on
modern systems.
bq. I tested with "ByteBuffer#allocateDirect" and "Unsafe#allocateMemory"; both
latencies were slightly worse than the baseline.
Did you perform the simple optimisation of rounding allocation sizes up to the
next 2KiB boundary (for equivalent behaviour), then re-using any correctly sized
buffer when evicting to make room for a new item? It might well be possible to
make this yet more efficient than {{BufferPool}} by reducing this boundary to
e.g. 1KiB, or perhaps as little as 512B.
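For illustration, a minimal sketch of what that rounding and reuse could look
like. The class and method names are hypothetical (not existing Cassandra APIs),
and the eviction hook is assumed:

{code:java}
import java.nio.ByteBuffer;
import java.util.ArrayDeque;

// Hypothetical sketch: round allocation sizes up to a fixed boundary so that a
// buffer released by evicting a cache entry can be reused for a new entry of a
// similar size, instead of allocating fresh direct memory each time.
final class RoundedBufferAllocator
{
    private static final int BOUNDARY = 2048; // 2KiB; could be reduced to 1KiB or 512B

    // Buffers released by eviction and kept for reuse.
    private final ArrayDeque<ByteBuffer> reusable = new ArrayDeque<>();

    static int roundUp(int size)
    {
        return (size + BOUNDARY - 1) & -BOUNDARY; // next multiple of BOUNDARY
    }

    ByteBuffer allocate(int size)
    {
        int rounded = roundUp(size);
        ByteBuffer candidate = reusable.peek();
        if (candidate != null && candidate.capacity() == rounded)
        {
            reusable.poll();
            candidate.clear();
            return candidate;
        }
        return ByteBuffer.allocateDirect(rounded);
    }

    // Called when evicting a cache entry to make room for a new item.
    void onEvict(ByteBuffer buffer)
    {
        reusable.offer(buffer);
    }
}
{code}

With all sizes rounded to a common boundary, most buffers released by eviction
can be handed straight to the next allocation of the same rounded size.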
So if I were doing this myself, I think I would start at this point and, if
necessary, move towards further reusing the buffers we already have in the
cache, since it is already a pool of them. I would just be looking to smooth out
the random distribution of sizes used, with e.g. a handful of queues, each
containing a single size of buffer and at most a handful of items. This feels
like a simpler solution to me, particularly as it does not affect any other pool
users.
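A rough sketch of those per-size queues, again with hypothetical names and a
single-threaded simplification:

{code:java}
import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a handful of queues, one per (rounded) buffer size, each
// holding at most a handful of reusable buffers, to smooth out the random
// distribution of sizes requested by the cache.
final class SizeBucketedBuffers
{
    private static final int MAX_PER_SIZE = 4;

    private final Map<Integer, ArrayDeque<ByteBuffer>> queues = new HashMap<>();

    ByteBuffer take(int roundedSize)
    {
        ArrayDeque<ByteBuffer> queue = queues.get(roundedSize);
        ByteBuffer buffer = queue == null ? null : queue.poll();
        if (buffer != null)
        {
            buffer.clear();
            return buffer;
        }
        return ByteBuffer.allocateDirect(roundedSize);
    }

    void recycle(ByteBuffer buffer)
    {
        ArrayDeque<ByteBuffer> queue =
            queues.computeIfAbsent(buffer.capacity(), k -> new ArrayDeque<>());
        if (queue.size() < MAX_PER_SIZE)
            queue.offer(buffer);
        // otherwise drop the buffer and let it be reclaimed
    }
}
{code}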
However, I'm not doing the work (nor perhaps reviewing it), so if you are
willing to at least enable the behaviour only for the {{ChunkCache}}, so that
this change cannot have any unintended negative effect on users who are not
expected to benefit, my main concern will be alleviated.
> BufferPool Regression
> ---------------------
>
> Key: CASSANDRA-15229
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15229
> Project: Cassandra
> Issue Type: Bug
> Components: Local/Caching
> Reporter: Benedict Elliott Smith
> Assignee: ZhaoYang
> Priority: Normal
> Fix For: 4.0, 4.0-beta
>
> Attachments: 15229-count.png, 15229-direct.png, 15229-hit-rate.png,
> 15229-recirculate-count.png, 15229-recirculate-hit-rate.png,
> 15229-recirculate-size.png, 15229-recirculate.png, 15229-size.png,
> 15229-unsafe.png
>
>
> The BufferPool was never intended to be used for a {{ChunkCache}}, and we
> need to either change our behaviour to handle uncorrelated lifetimes or use
> something else. This is particularly important with the default chunk size
> for compressed sstables being reduced. If we address the problem, we should
> also utilise the BufferPool for native transport connections like we do for
> internode messaging, and reduce the number of pooling solutions we employ.
> Probably the best thing to do is to improve BufferPool’s behaviour when used
> for things with uncorrelated lifetimes, which essentially boils down to
> tracking those chunks that have not been freed and re-circulating them when
> we run out of completely free blocks. We should probably also permit
> instantiating separate {{BufferPool}}, so that we can insulate internode
> messaging from the {{ChunkCache}}, or at least have separate memory bounds
> for each, and only share fully-freed chunks.
> With these improvements we can also safely increase the {{BufferPool}} chunk
> size to 128KiB or 256KiB, to guarantee we can fit compressed pages and reduce
> the amount of global coordination and per-allocation overhead. We don’t need
> 1KiB granularity for allocations, nor 16 byte granularity for tiny
> allocations.
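For readers of the description above, a minimal, single-threaded sketch of the
re-circulation idea with hypothetical names; the real {{BufferPool}} is
concurrent and considerably more involved:

{code:java}
import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.Queue;

// Illustrative sketch only (not the actual BufferPool): keep completely free and
// partially freed chunks in separate queues, and re-circulate the partially
// freed ones when no completely free chunk is available, instead of growing the
// pool.
final class RecirculatingChunkPool
{
    static final int CHUNK_SIZE = 128 * 1024; // e.g. 128KiB, as proposed above

    static final class Chunk
    {
        final ByteBuffer memory = ByteBuffer.allocateDirect(CHUNK_SIZE);
        int bytesInUse;     // bytes currently handed out from this chunk
        boolean queued;     // whether the chunk is already waiting in a queue
    }

    private final Queue<Chunk> completelyFree = new ArrayDeque<>();
    private final Queue<Chunk> partiallyFreed = new ArrayDeque<>();

    // Prefer a completely free chunk; when none remain, re-circulate a partially
    // freed chunk rather than allocating a new one.
    Chunk chunkForAllocation()
    {
        Chunk chunk = completelyFree.poll();
        if (chunk == null)
            chunk = partiallyFreed.poll();
        if (chunk == null)
            chunk = new Chunk();
        chunk.queued = false;
        return chunk;
    }

    // Called when a buffer carved out of this chunk is released.
    void release(Chunk chunk, int bytes)
    {
        chunk.bytesInUse -= bytes;
        if (chunk.queued)
            return;
        chunk.queued = true;
        if (chunk.bytesInUse == 0)
            completelyFree.offer(chunk);  // fully freed: safe to share across pools
        else
            partiallyFreed.offer(chunk);  // still has live buffers: re-circulate later
    }
}
{code}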