[
https://issues.apache.org/jira/browse/CASSANDRA-8630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14702681#comment-14702681
]
Benedict edited comment on CASSANDRA-8630 at 8/19/15 8:10 AM:
--------------------------------------------------------------
bq. RateLimiter is not a final class.
I think Ariel was suggesting a new class that explicitly performs no work.
However, since we use this class more often for reads than we do for
compaction, I would prefer we stick with the more performant option of just
null checking. Certainly using a full-fat RateLimiter is more expensive than
this.
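To make the trade-off concrete, here is a minimal sketch of the two alternatives being discussed. This is illustrative only, with a stand-in `Limiter` class rather than Guava's actual `RateLimiter`; none of these names come from the Cassandra codebase.

```java
// Sketch of the two options: a no-op limiter subclass vs. a null check
// on the hot path. All names here are illustrative stand-ins.
public class LimiterSketch {
    // Stand-in for a real rate limiter (e.g. Guava's RateLimiter)
    static class Limiter {
        private final double permitsPerSec;
        Limiter(double permitsPerSec) { this.permitsPerSec = permitsPerSec; }
        double acquire() { return 1.0 / permitsPerSec; } // pretend wait time
    }

    // Option A: an explicit no-op subclass -- every call still pays a
    // (virtual) method dispatch on the hot read path
    static class NoOpLimiter extends Limiter {
        NoOpLimiter() { super(Double.MAX_VALUE); }
        @Override double acquire() { return 0.0; }
    }

    // Option B: a nullable field with a null check -- when unlimited, the
    // branch is trivially predicted and no call is made at all
    static double readWithLimit(Limiter limiter) {
        return limiter == null ? 0.0 : limiter.acquire();
    }

    public static void main(String[] args) {
        System.out.println(readWithLimit(null));               // unlimited path
        System.out.println(readWithLimit(new Limiter(100.0))); // throttled path
    }
}
```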
bq. Have a look at MmappedSegmentedFile.Builder.addPotentialBoundary() and
createSegments()
I should have written a bit about this before work started: my expectation is
that this can all be completely removed. The reason for it was that we treated
each mmap file segment as completely distinct, so we had to have each partition
end before a 2G boundary (so we could map the segment in its entirety), or else
fall back to a non-mmap segment. That's no longer the case: since we just
rebuffer, we can safely eliminate all of the mess with segment boundaries and
simply map in increments of 2G (or, frankly, whatever size we like; it might be
nice to do it exactly once when we "early open" so that we do not remap the
same regions multiple times). At the same time we can eliminate the idea of
multiple segments; we should always have just one. Given this, we should also
consider renaming them, since they're no longer "segments" - they cover the
whole file.
(Caveat: I haven't reviewed the code directly, I'm just going off the comments)
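A sketch of what "just map in increments of 2G" could look like, using plain `java.nio` rather than anything from the patch; the class and method names are hypothetical. Each `FileChannel.map()` call is limited to `Integer.MAX_VALUE` bytes, which is where the ~2G increment comes from.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

// Illustrative only (not the Cassandra patch): map an entire file as a
// sequence of fixed-size regions, with no per-partition boundary logic.
public class ChunkedMapper {
    // A single mapping cannot exceed Integer.MAX_VALUE bytes (~2G)
    static final long CHUNK = Integer.MAX_VALUE;

    static List<MappedByteBuffer> mapWhole(Path file) throws IOException {
        List<MappedByteBuffer> regions = new ArrayList<>();
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = ch.size();
            for (long pos = 0; pos < size; pos += CHUNK) {
                long len = Math.min(CHUNK, size - pos);
                regions.add(ch.map(FileChannel.MapMode.READ_ONLY, pos, len));
            }
        }
        return regions; // mappings remain valid after the channel is closed
    }
}
```

Doing this once, up front, also makes the "map exactly once at early open" idea straightforward: the region list is computed from the file length alone, so remapping the same ranges is never necessary.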
> Faster sequential IO (on compaction, streaming, etc)
> ----------------------------------------------------
>
> Key: CASSANDRA-8630
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8630
> Project: Cassandra
> Issue Type: Improvement
> Components: Core, Tools
> Reporter: Oleg Anastasyev
> Assignee: Stefania
> Labels: compaction, performance
> Fix For: 3.x
>
> Attachments: 8630-FasterSequencialReadsAndWrites.txt, cpu_load.png,
> flight_recorder_001_files.tar.gz, flight_recorder_002_files.tar.gz,
> mmaped_uncomp_hotspot.png
>
>
> When a node is doing a lot of sequential IO (streaming, compacting, etc), a
> lot of CPU is lost in calls to RAF's int read() and DataOutputStream's
> write(int). This is because the default implementations of readShort,
> readLong, etc, as well as their matching write* methods, are implemented as
> numerous byte-by-byte reads and writes.
> This also makes a lot of syscalls.
> A quick microbenchmark shows that just reimplementing these methods in
> either direction gives an 8x speed increase.
> The attached patch implements the RandomAccessReader.read<Type> and
> SequentialWriter.write<Type> methods in a more efficient way.
> I also eliminated some extra byte copies in CompositeType.split and
> ColumnNameHelper.maxComponents, which were on my profiler's hotspot method
> list during tests.
> Stress tests on my laptop show that this patch makes compaction 25-30%
> faster on uncompressed sstables and 15% faster on compressed ones.
> A deployment to production shows much less CPU load for compaction.
> (I attached a CPU load graph from one of our production clusters; orange is
> niced CPU load, i.e. compaction; yellow is user, i.e. non-compaction tasks.)
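The byte-by-byte decoding described in the issue can be sketched as follows. This is illustrative of the problem, not code from the patch: `DataInputStream.readLong()` issues eight single-byte reads against the underlying stream, whereas a buffered reimplementation assembles the value directly from an in-memory buffer.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

// Illustrative sketch (names are hypothetical, not from the patch):
// decode a big-endian long from a byte[] in one pass, instead of the
// eight single-byte read() calls DataInputStream.readLong() performs.
public class BulkReadSketch {
    static long readLong(byte[] buf, int off) {
        long v = 0;
        for (int i = 0; i < 8; i++)
            v = (v << 8) | (buf[off + i] & 0xFFL); // big-endian assembly
        return v;
    }

    public static void main(String[] args) throws IOException {
        byte[] buf = {0, 0, 0, 0, 0, 0, 0x30, 0x39}; // big-endian 12345
        // Default path: eight read() calls under the hood
        long viaStream =
            new DataInputStream(new ByteArrayInputStream(buf)).readLong();
        // Buffered path: one array traversal, same value
        System.out.println(viaStream == readLong(buf, 0));
    }
}
```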
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)