[jira] [Commented] (CASSANDRA-8630) Faster sequential IO (on compaction, streaming, etc)

Stefania (JIRA) Wed, 26 Aug 2015 02:31:58 -0700

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-8630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14712805#comment-14712805
 ]


Stefania commented on CASSANDRA-8630:
-------------------------------------

bq. We could consider changing this for compaction readers, or at least for 
throttled readers (which amount to the same thing). There's no reason not to 
read 64Kb at a time for compaction, since we know we'll want all of the data.

Done. If the limiter is not null, the bufferSize is limited to 64k in RAR. For 
compressed RAR, however, we keep using the chunk data length.
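
For readers of the thread, the rule above could be sketched roughly as below. This is only an illustration, not the code in the patch: {{BufferSizeSketch}}, {{chooseBufferSize}} and all parameter names are hypothetical, and it assumes "limited to 64k" means an upper bound on the requested size.

```java
// Hypothetical sketch of the buffer-size rule; not the actual RAR code.
final class BufferSizeSketch {
    // The 64k limit mentioned above, read here as an upper bound (an assumption).
    static final int THROTTLED_MAX_BUFFER_SIZE = 64 * 1024;

    /**
     * @param requestedSize buffer size the caller asked for, in bytes
     * @param throttled     true when a limiter is present (limiter != null)
     * @param compressed    true for a compressed reader
     * @param chunkLength   chunk data length, only used for compressed readers
     */
    static int chooseBufferSize(int requestedSize, boolean throttled,
                                boolean compressed, int chunkLength) {
        if (compressed) {
            // Compressed readers keep using the chunk data length.
            return chunkLength;
        }
        if (throttled) {
            // Throttled (compaction) readers are capped at 64k.
            return Math.min(requestedSize, THROTTLED_MAX_BUFFER_SIZE);
        }
        return requestedSize;
    }
}
```

With these assumptions a throttled, uncompressed reader asking for 1MB would get a 64k buffer, while a compressed reader would get its chunk length regardless of the limiter.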

bq. Right so there needs to be a copy, but you don't need to copy the same 
state every time you read. You can make an immutable copy once on write, and 
then share that indefinitely. I think you are on the right track with the 
isCopy flag, but maybe make it a field that is called immutableCopy or 
something, and shared copy returns the same immutable view of the state every 
time. So if immutableCopy is null then this State object is the immutable copy.

Nice idea, implemented, thanks.
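
A minimal sketch of the suggested pattern, for anyone following along. The class name {{SharedStateSketch}}, its two fields and the method names are made up for illustration and do not mirror the real State class; thread safety is deliberately left out.

```java
// Hypothetical illustration of "copy once on write, share the same copy on read".
final class SharedStateSketch {
    private long position;
    private long length;
    // Snapshot handed out to readers. If null, this object is itself the immutable copy.
    private SharedStateSketch immutableCopy;

    private SharedStateSketch(long position, long length) {
        this.position = position;
        this.length = length;
    }

    /** Creates the mutable owner together with its first immutable snapshot. */
    static SharedStateSketch create(long position, long length) {
        SharedStateSketch owner = new SharedStateSketch(position, length);
        owner.immutableCopy = new SharedStateSketch(position, length);
        return owner;
    }

    /** Write path: mutate the owner and take exactly one new snapshot. */
    void update(long newPosition, long newLength) {
        if (immutableCopy == null) {
            throw new IllegalStateException("cannot update an immutable copy");
        }
        position = newPosition;
        length = newLength;
        immutableCopy = new SharedStateSketch(newPosition, newLength);
    }

    /** Read path: no copying, the same snapshot is returned until the next update. */
    SharedStateSketch sharedCopy() {
        return immutableCopy == null ? this : immutableCopy;
    }

    long position() { return position; }
    long length()   { return length; }
}
```

The point of the design is that the copy cost is paid once per write rather than once per read, and a snapshot taken before an update keeps its old values.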

bq. ChecksummedDataInput test doesn't check for failing checksums. resetCrc(), 
and readBytes() are also not tested.

Added.

bq. BufferedRandomAccessFileTest.testAssertionErrorWhenBytesPastMarkIsNegative 
failed for me.

It works for me both from IntelliJ and from the command line. Have you checked 
that {{-ea}} is set?
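
In case it helps, a quick way to confirm whether assertions are enabled in a given JVM is the usual side-effect idiom below. The class name {{AssertionsCheck}} is just an example and is not part of the test suite.

```java
// Standalone check, not part of the Cassandra tests.
final class AssertionsCheck {
    /** Returns true only when this class runs with assertions enabled (-ea). */
    static boolean assertionsEnabled() {
        boolean enabled = false;
        // The assignment inside the assert is evaluated only when assertions are on.
        assert enabled = true;
        return enabled;
    }

    public static void main(String[] args) {
        System.out.println("assertions enabled: " + assertionsEnabled());
    }
}
```

If this reports false, a test that expects an AssertionError cannot pass, because the assert statements in the code under test are skipped.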

bq. CompressedRandomAccessReader.reBufferMmap() doesn't appear to be tested.

Adapted an existing test, {{testResetAndTruncate}}, to also run using mmap 
segments. 

I also fixed a few warnings in CompressedRandomAccessReaderTest and 
RandomAccessReaderTest.


> Faster sequential IO (on compaction, streaming, etc)
> ----------------------------------------------------
>
>                 Key: CASSANDRA-8630
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8630
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core, Tools
>            Reporter: Oleg Anastasyev
>            Assignee: Stefania
>              Labels: compaction, performance
>             Fix For: 3.x
>
>         Attachments: 8630-FasterSequencialReadsAndWrites.txt, cpu_load.png, 
> flight_recorder_001_files.tar.gz, flight_recorder_002_files.tar.gz, 
> mmaped_uncomp_hotspot.png
>
>
> When a node is doing a lot of sequential IO (streaming, compacting, etc.), a lot 
> of CPU is lost in calls to RAF's int read() and DataOutputStream's write(int).
> This is because the default implementations of readShort, readLong, etc., as well as 
> their matching write* methods, are implemented with numerous byte-by-byte 
> read and write calls. 
> This makes a lot of syscalls as well.
> A quick microbenchmark shows that just reimplementing these methods 
> either way gives an 8x speed increase.
> The attached patch implements the RandomAccessReader.read<Type> and 
> SequentialWriter.write<Type> methods in a more efficient way.
> I also eliminated some extra byte copies in CompositeType.split and 
> ColumnNameHelper.maxComponents, which were on my profiler's hotspot method 
> list during tests.
> Stress tests on my laptop show that this patch makes compaction 25-30% 
> faster on uncompressed sstables and 15% faster on compressed ones.
> A deployment to production shows much less CPU load for compaction. 
> (I attached a CPU load graph from one of our production systems; orange is niced CPU 
> load, i.e. compaction; yellow is user, i.e. tasks not related to compaction.)
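
To illustrate the difference the description refers to, here is a small sketch that contrasts assembling a long from eight single-byte reads with one bulk read from a buffer that has already been filled. It is not the attached patch: {{ReadLongSketch}} and its methods are hypothetical, and the {{IntSupplier}} merely stands in for a source like RAF's int read(), where every call may reach the OS.

```java
import java.io.ByteArrayInputStream;
import java.nio.ByteBuffer;
import java.util.function.IntSupplier;

// Hypothetical comparison; not code from the patch.
final class ReadLongSketch {
    /** Builds a big-endian long from eight separate single-byte reads. */
    static long readLongByteByByte(IntSupplier readByte) {
        long value = 0;
        for (int i = 0; i < 8; i++) {
            int b = readByte.getAsInt(); // one call per byte
            if (b < 0) {
                throw new IllegalStateException("unexpected end of input");
            }
            value = (value << 8) | b;
        }
        return value;
    }

    /** Reads the same value with a single call on an in-memory buffer. */
    static long readLongBuffered(ByteBuffer buffer) {
        return buffer.getLong(); // ByteBuffer is big-endian by default
    }

    public static void main(String[] args) {
        byte[] bytes = ByteBuffer.allocate(8).putLong(0x0102030405060708L).array();
        ByteArrayInputStream in = new ByteArrayInputStream(bytes);
        System.out.println(readLongByteByByte(in::read) == readLongBuffered(ByteBuffer.wrap(bytes)));
    }
}
```

Both methods return the same value; the buffered variant simply replaces eight calls on the underlying source with one refill of the buffer followed by in-memory reads.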



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
