[ 
https://issues.apache.org/jira/browse/CASSANDRA-8630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14532376#comment-14532376
 ] 

Benedict commented on CASSANDRA-8630:
-------------------------------------

I'm in favour of simplifying this. Focusing on a small number of well-designed 
and optimised read paths is the best route. I think we should also merge 
functionality with "ByteBufferDataInput": if you look at it, you'll see that 
for mmapped files we're already incurring all of the CPU overhead of 
constructing the int/long values. If we can tolerate that, we can equally 
tolerate a check before each read on whether we need to move the buffer, so 
the two can share the same implementation. This would at the same time let us 
eliminate the weirdness with multiple file "segments", by having the mmap 
reader encapsulate that information rather than leaking it into the rest of 
the codebase. If we can merge all of our readers into approximately one 
functional implementation of NIO reading, we're in a _much_ better position 
than we were.
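
A minimal sketch of that shared-implementation idea: every multi-byte read is 
preceded by a cheap check on whether the current buffer window must be 
advanced. For a buffered reader the check refills from the file; an 
mmap-backed reader would instead switch to the next mapped segment. All names 
here (SketchReader, maybeRebuffer) are hypothetical, not Cassandra's actual 
API:

```java
import java.nio.ByteBuffer;

// Hypothetical unified reader: one code path for all reads, with a
// check-before-read instead of separate buffered/mmapped implementations.
// A byte[] stands in for the underlying file; the sketch assumes the file
// always holds the bytes being asked for.
class SketchReader {
    private final byte[] file;    // stands in for the file / mapped region
    private ByteBuffer buffer;    // current window over the file
    private int windowStart;      // file offset of buffer position 0
    private final int windowSize;

    SketchReader(byte[] file, int windowSize) {
        this.file = file;
        this.windowSize = windowSize;
        rebuffer(0);
    }

    // The cheap shared check: do we need to move the buffer before reading?
    private void maybeRebuffer(int needed) {
        if (buffer.remaining() < needed)
            rebuffer(windowStart + buffer.position());
    }

    // Re-window starting at the current absolute file offset.
    private void rebuffer(int fileOffset) {
        int len = Math.min(windowSize, file.length - fileOffset);
        buffer = ByteBuffer.wrap(file, fileOffset, len).slice();
        windowStart = fileOffset;
    }

    long readLong() {
        maybeRebuffer(8);        // one branch, then the bulk read
        return buffer.getLong();
    }
}
```

With this shape, the mmap "segment" bookkeeping lives entirely inside 
rebuffer() and never leaks to callers.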

Obviously the main complexity is a read that spans two buffers. The question 
then becomes what to do: ideally we want to read from the underlying file at 
page boundaries (although right now this is impossible in the common case of 
compression, so perhaps we shouldn't worry too much until CASSANDRA-8896 is 
delivered), but we also want to allocate page-aligned buffers (and 
CASSANDRA-8897 currently won't easily offer buffers "just slightly larger 
than" page-aligned). So: do we introduce a slow path for reads crossing these 
boundaries? I don't like that either, as it will likely slow down the common 
case as well. 
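
For concreteness, the slow path in question might look like this (an 
illustrative sketch, not a proposed implementation): when a value straddles 
two adjacent buffers, assemble it byte by byte, while the common case stays a 
single bulk getLong():

```java
import java.nio.ByteBuffer;

// Illustrative boundary-crossing read: fast path is one absolute getLong();
// slow path stitches the value together from the tail of `current` and the
// head of `next`. This is the extra branch that risks slowing the fast path.
class BoundarySpanningRead {
    // Read a big-endian long starting at `pos` in `current`, spilling into
    // `next` when fewer than 8 bytes remain in `current`.
    static long readLong(ByteBuffer current, ByteBuffer next, int pos) {
        if (current.limit() - pos >= 8)
            return current.getLong(pos);          // fast path: one bulk read
        long v = 0;
        for (int i = 0; i < 8; i++) {             // slow path: byte at a time
            int p = pos + i;
            byte b = p < current.limit() ? current.get(p)
                                         : next.get(p - current.limit());
            v = (v << 8) | (b & 0xFFL);
        }
        return v;
    }
}
```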

I think the best option is to have a buffer of size min(chunk-size + one page, 
2 * chunk-size). This really requires CASSANDRA-8894, and even then it 
probably requires an increase in the size of our buffer pool chunks in 
CASSANDRA-8897, which is quite achievable but may result in a higher watermark 
of memory use. We could make the default chunk size 256K (currently it is 
64K), which would make it allocate _only_ page-aligned units; that would also 
simplify some of its logic, but require that we complicate other bits so that 
we don't discard 64K because we need a 68K allocation (i.e. we would need a 
queue of chunks we're currently able to allocate from). [~stef1927]: thoughts?
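
To make the arithmetic concrete (assuming 4K pages; the constant and helper 
are illustrative only): with a 64K chunk the rule gives min(64K + 4K, 128K) = 
68K, i.e. one spill-over page beyond the chunk; the 2x branch only wins for 
chunks smaller than one page.

```java
// Sketch of the proposed sizing rule: buffer = min(chunkSize + pageSize,
// 2 * chunkSize). PAGE is an assumed 4 KiB page size, not a Cassandra value.
class BufferSizing {
    static final int PAGE = 4 << 10;   // assume 4 KiB pages

    static int bufferSize(int chunkSize) {
        return Math.min(chunkSize + PAGE, 2 * chunkSize);
    }
}
```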

> Faster sequential IO (on compaction, streaming, etc)
> ----------------------------------------------------
>
>                 Key: CASSANDRA-8630
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8630
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core, Tools
>            Reporter: Oleg Anastasyev
>            Assignee: Oleg Anastasyev
>              Labels: performance
>             Fix For: 3.x
>
>         Attachments: 8630-FasterSequencialReadsAndWrites.txt, cpu_load.png
>
>
> When a node is doing a lot of sequential IO (streaming, compacting, etc.) a 
> lot of CPU is lost in calls to RAF's int read() and DataOutputStream's 
> write(int).
> This is because the default implementations of readShort, readLong, etc., as 
> well as their matching write* methods, consist of numerous byte-by-byte 
> reads and writes. 
> This also incurs a lot of syscalls.
> A quick microbenchmark shows that simply reimplementing these methods gives 
> an 8x speed increase.
> The attached patch implements the RandomAccessReader.read<Type> and 
> SequentialWriter.write<Type> methods in a more efficient way.
> I also eliminated some extra byte copies in CompositeType.split and 
> ColumnNameHelper.maxComponents, which were on my profiler's hotspot list 
> during tests.
> Stress tests on my laptop show that this patch makes compaction 25-30% 
> faster on uncompressed sstables and 15% faster on compressed ones.
> A deployment to production shows much less CPU load for compaction. 
> (I attached a CPU load graph from one of our production clusters: orange is 
> niced CPU load, i.e. compaction; yellow is user, i.e. tasks not related to 
> compaction.)
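
The per-byte overhead the report describes can be sketched as follows. The 
CountingSource type is invented for illustration and only models the 
RAF-style pattern (readLong built from eight single-byte read() calls) 
against a single bulk getLong(); it is not Cassandra's actual code:

```java
import java.nio.ByteBuffer;

// Illustration: a long read as eight per-byte read() calls versus one bulk
// getLong(). Both yield the same value; only the call count (and, on a real
// file, the syscall count) differs.
class ByteByByteVsBulk {
    static class CountingSource {
        final ByteBuffer buf;
        int readCalls = 0;
        CountingSource(byte[] data) { buf = ByteBuffer.wrap(data); }

        int read() { readCalls++; return buf.get() & 0xFF; }  // one call per byte

        // RAF-style readLong: eight single-byte reads assembled by hand
        long readLongSlow() {
            long v = 0;
            for (int i = 0; i < 8; i++)
                v = (v << 8) | read();
            return v;
        }
    }

    // Bulk alternative: one read of eight bytes
    static long readLongFast(byte[] data) {
        return ByteBuffer.wrap(data).getLong();
    }
}
```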



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
