[ 
https://issues.apache.org/jira/browse/CASSANDRA-8897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14521568#comment-14521568
 ] 

Benedict commented on CASSANDRA-8897:
-------------------------------------

[~stefania_alborghetti] I've force pushed a version with a lot of suggestions 
to [the same repository as 
before|https://github.com/belliottsmith/cassandra/tree/8897-suggestions]

There was a reasonably serious bug with concurrent free that I've fixed. We 
should create a stress burn test (see 
[here|https://github.com/belliottsmith/injector/tree/master/src/test/java/bes] 
for some examples, or LongOpOrderTest, or LongSharedExecutorPoolTest) that we 
can leave running for an extended period to be certain of its concurrent 
behaviour.

I also got a bit into the weeds, so I hope you don't feel like I'm treading on 
your toes too much. I started out with the intention of:

* renaming a few things (the terms "Chunk" were a bit overloaded, for instance, 
and methods like "get" weren't sufficiently descriptive IMO), 
* trim a few bits of unnecessary boilerplate:
** the Allocator class we only need a single version of, and only one method 
call
** a lot of trace statements for things that should never occur (guess they may 
have been for debugging?) , or error log messages for things that should 
probably be assertions
* simplifying the slot/reclaim logic

Combined that turned out to be a lot, and in the last of these really I went to 
town. I wanted to just make it a little more efficient in the common case, but 
as I thought about it I realised there is a much simpler approach that gives us 
near optimal behaviour in all cases, and is much simpler to reason about, so 
I've half implemented it to confirm it makes sense and is relatively simple and 
clear, and left the mess for you to clean up. There are a lot of potential 
decisions still to be made, though.

The idea is pretty simple: chunk up any given buffer into 64 units, and have a 
single long as a bitmap for which bits are available, and which are not. To 
allocate we now round up to the nearest *unit*, not the nearest power of 2, so 
we can allocate with finer granularity. To find an allocation, we convert it 
into a bitmask that represents the number of units we need, and search in the 
long for the first occurrence of the bitmask that is sufficiently aligned. We 
then clear those bits. On release we simply set the bits again.

The nice thing about this is it has very little code, has guaranteed bounds on 
running time, and no extra memory allocation or compaction of slots. Allocation 
and free are both constant time. We can probably lower the constant factor for 
fragmented allocations if we care to, but I'm not convinced it warrants it.

What I haven't done: 

* currently this means we slice a 64Kb block into 1Kb units. To allocate 
smaller buffers, we can create a microChunk from an allocation within a chunk 
(at the localPool level), from which we can serve smaller requests (which could 
be served in multiples of 16 bytes, so we get finer granularity again). This 
could also help us avoid the problem of wastage if we were to, say, allocate a 
64/32K buffer when we still had 16K spare in the current chunk, since we could 
convert the remainder into a microChunk for serving any small requests.
* We could safely and cheaply assert the buffer has not already been freed
* We could consider making this fully concurrent, dropping the normalFree and 
atomicFree, and just using the bitmap for determining its current status via a 
CAS. I was generally hoping to avoid introducing extra concurrency on the 
critical path, but we could potentially have two paths, one for concurrent and 
one for non-concurrent access, and introduce a flag so that any concurrent free 
on a non-concurrent path would fail. With or without this, though, I like the 
increased simplicity of only relying on the bitmap, since that means only a 
handful of lines of code to understand the memory management
* We could consider making the chunks available for reallocation before they 
are fully free, since there's no different between a partially or fully free 
chunk now for allocation purposes

What I've also done:
* I've conditionally made the attachment a Ref object, so that in unit tests / 
dtests (or other scenarios) we can debug potential leaks of bytebuffers

> Remove FileCacheService, instead pooling the buffers
> ----------------------------------------------------
>
>                 Key: CASSANDRA-8897
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8897
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Benedict
>            Assignee: Stefania
>             Fix For: 3.x
>
>
> After CASSANDRA-8893, a RAR will be a very lightweight object and will not 
> need caching, so we can eliminate this cache entirely. Instead we should have 
> a pool of buffers that are page-aligned.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to