[jira] [Commented] (CASSANDRA-8630) Faster sequential IO (on compaction, streaming, etc)

Benedict (JIRA) Fri, 21 Aug 2015 03:49:08 -0700

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-8630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14706537#comment-14706537
 ]


Benedict commented on CASSANDRA-8630:
-------------------------------------

A few random comments (not performing review, since Ariel's on that):

For ChecksummedDataInput, we can just update the crc whenever we exhaust the 
buffer, and on calling getCrc() we can update it with whatever we have read so 
far in the current buffer. I would prefer we avoid introducing an extra 
{{forceSlowPath}} property in the superclass that affects every single call.
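
As a rough illustration of that idea (a sketch only: the class, its fields and 
its in-memory source are hypothetical stand-ins, not the actual 
ChecksummedDataInput), the checksum is folded in only when the buffer is 
exhausted or when getCrc() is called, so the per-read path has no extra branch:

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

// Hypothetical sketch, not the real ChecksummedDataInput.
final class LazyCrcInput {
    private final ByteBuffer source;   // stands in for the underlying file/channel
    private final ByteBuffer buffer;   // heap buffer, so array() is available
    private final CRC32 crc = new CRC32();
    private int crcPosition;           // buffer offset already covered by crc

    LazyCrcInput(byte[] data, int bufferSize) {
        source = ByteBuffer.wrap(data);
        buffer = ByteBuffer.allocate(bufferSize);
        buffer.limit(0);               // start with an empty buffer
    }

    byte readByte() {
        if (!buffer.hasRemaining())
            reBuffer();
        return buffer.get();           // no checksum work on the fast path
    }

    // Fold the bytes consumed since the last update into the checksum.
    private void updateCrc() {
        int end = buffer.position();
        if (end > crcPosition) {
            crc.update(buffer.array(), crcPosition, end - crcPosition);
            crcPosition = end;
        }
    }

    // Called only when the buffer is exhausted.
    private void reBuffer() {
        updateCrc();
        int n = Math.min(buffer.capacity(), source.remaining());
        if (n == 0)
            throw new IllegalStateException("end of input");
        buffer.clear();
        source.get(buffer.array(), 0, n);
        buffer.limit(n);
        crcPosition = 0;
    }

    long getCrc() {
        updateCrc();                   // include what was read from the current buffer
        return crc.getValue();
    }
}
```

The trade-off in this sketch is that the checksum lags behind the read position 
until getCrc() is called, which is fine as long as nothing reads the CRC object 
directly.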

We should comment the copying of the State object in MmappedRegions, so it's 
clear this is for thread safety, and that we still logically reference the 
original state.

We should file some follow-ups to:

* Compact the mmap ranges, at least on the _final_ opening of the file
* Use the mmap extension logic for compressed files
* Generally, I think we've gotten close enough to a good state that we should 
consider refactoring the whole remaining collection of classes 
(SequentialWriter, RAR, CompressionMetadata(.Writer), etc.), preferably moving 
them all into their own package and defining their relationships more simply.

> Faster sequential IO (on compaction, streaming, etc)
> ----------------------------------------------------
>
>                 Key: CASSANDRA-8630
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8630
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core, Tools
>            Reporter: Oleg Anastasyev
>            Assignee: Stefania
>              Labels: compaction, performance
>             Fix For: 3.x
>
>         Attachments: 8630-FasterSequencialReadsAndWrites.txt, cpu_load.png, 
> flight_recorder_001_files.tar.gz, flight_recorder_002_files.tar.gz, 
> mmaped_uncomp_hotspot.png
>
>
> When a node is doing a lot of sequential IO (streaming, compacting, etc.), a 
> lot of CPU is lost in calls to RAF's int read() and DataOutputStream's 
> write(int).
> This is because the default implementations of readShort, readLong, etc., as 
> well as their matching write* methods, are implemented with numerous 
> byte-by-byte read and write calls. 
> This makes a lot of syscalls as well.
> A quick microbenchmark shows that simply reimplementing these methods either 
> way gives an 8x speed increase.
> The attached patch implements the RandomAccessReader.read<Type> and 
> SequentialWriter.write<Type> methods in a more efficient way.
> I also eliminated some extra byte copies in CompositeType.split and 
> ColumnNameHelper.maxComponents, which were on my profiler's hotspot method 
> list during tests.
> Stress tests on my laptop show that this patch makes compaction 25-30% 
> faster on uncompressed sstables and 15% faster on compressed ones.
> A deployment to production shows much less CPU load for compaction. 
> (I attached a CPU load graph from one of our production systems; orange is 
> niced CPU load, i.e. compaction; yellow is user, i.e. tasks not related to 
> compaction.)
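
The byte-by-byte pattern described in the issue can be illustrated roughly as 
follows. This is a sketch under stated assumptions, not the attached patch: 
ReadLongSketch and both method names are hypothetical, and the single-byte 
source is modelled as an IntSupplier rather than a real RandomAccessFile.

```java
import java.nio.ByteBuffer;
import java.util.function.IntSupplier;

// Hypothetical illustration; not code from the patch.
final class ReadLongSketch {
    // Shape of the slow path: eight single-byte reads per long. If each read
    // reaches the underlying file, that can mean up to eight syscalls.
    static long readLongByteByByte(IntSupplier readOneByte) {
        long value = 0;
        for (int i = 0; i < 8; i++) {
            int b = readOneByte.getAsInt(); // expected range 0..255, -1 at end
            if (b < 0)
                throw new IllegalStateException("unexpected end of input");
            value = (value << 8) | b;
        }
        return value;
    }

    // Buffered path: the bytes are already in memory, so a single call
    // decodes the long without touching the file again.
    static long readLongBuffered(ByteBuffer buffer) {
        return buffer.getLong(); // ByteBuffer is big-endian by default
    }
}
```

Both methods decode the same big-endian value; the difference is only in how 
many times the underlying source is consulted per primitive.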



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
