[jira] [Commented] (CASSANDRA-8630) Faster sequential IO (on compaction, streaming, etc)

Stefania (JIRA) Fri, 21 Aug 2015 03:00:04 -0700

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-8630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14706492#comment-14706492
 ]


Stefania commented on CASSANDRA-8630:
-------------------------------------

I completed the move of segments to the builder, the bulk of the implementation 
is {{MmappedRegions}}, which is a {{SharedCloseableImpl}} owned by the builder. 
Each time a {{SegmentedFile}} is called we copy a snapshot and increment the 
reference count. Shall we rename {{SegmentedFile}} and derived classes to 
somethign else? The compressed segmented file still owns the mmapped regions as 
it creates the metadata each time and I wasn't sure how to handle this.

I also fixed a channel ownership problem in the builder, eventually we should 
really consider passing the file path to the builder constructor.

The integration with 6230 was a bit harder than I anticipated and some hinting 
utests were failing on Jenkins. I had to introduce a flag to force the slow 
path in {{RebufferingInputStream}} or else {{ChecksummedDataInput}} would not 
work. My idea of updating the CRC during rebuffering is no good because we must 
actually rely only on what is read by this class, not the underlying reader, 
and we must extract a crc only on what's read, not necessarily buffered.

Please note the new CI links (due to renaming of the branch to 
[8630-3.0|https://github.com/stef1927/cassandra/commits/8630-3.0]:

http://cassci.datastax.com/job/stef1927-8630-3.0-testall/
http://cassci.datastax.com/job/stef1927-8630-3.0-dtest/

cstar is currently having 
[issues|http://cstar.datastax.com/tests/id/e05752e6-47c9-11e5-b4da-42010af0688f],
 as soon as it is available I will launch a comparison.

Here are the remaining CR comments:

bq. MemoryInputStream.available() can wrap the addition between 
buffer.remaining() + Ints.saturatedCast(memRemaining()). Do the addition and 
then the saturating cast.

Fixed, thanks.

bq. Why does RandomAccessReader accept a builder and a parameter for 
initializing the buffer? Seems like we lose the bonus of a builder a builder 
allowing a constant signature.

Good point, fixed.

bq. A nit in initializeBuffer, it does firstSegment.value().slice() which 
implies you want a subset of the buffer? duplicate() makes it obvious there is 
no such concern.

Fixed.

bq. I think there is a place for unit tests stressing the 2 gigabyte 
boundaries. That means testing available()/length()/remaining() style methods 
as well as being able to read and seek with instances of these things that are 
larger than 2g. Doing it with the actual file based ones seems bad, but maybe 
you could intercept those to work with memory so they run fast or ingloriously 
mock their inputs.

I've added {{RandomAccessReaderTest.testVeryLarge}}, it uses a fake file 
channel that just increments the position and the buffers without reading 
anything.

bq. For rate limiting is your current solution to consume buffer size bytes 
from the limiter at a time for both mmap reads and standard? And you accomplish 
this by slicing the buffer then updating the position? I don't see you setting 
the limit before slicing?

I've added the call to {{limit()}} in {{rebufferMmap()}}. I missed that 
initially. The downside is that we are rebuffering more often than needed for 
mmap disk access, however we have more accurate throttling. I've run again the 
same micro-benchmark and noticed no difference in performance.

bq. I thought NIODataInputStream had methods for reading into ByteBuffers, but 
I was wrong. It's kind of thorny to add one to RebufferingInputStream so I 
think you did the right thing putting it in FileSegmentedInputStream even 
though it's an odd concern to have in that class. Unless you have a better idea.

Shall we move {{readBytes}} from {{FileDataInput}} to {{DataInputPlus}}, at 
which point it can be implemented only by {{RebufferingInputStream}}?


bq. In SSTableReader you are adding and removing fields from files. What are 
the cross version compatibility issues with that?

It should be fixed now, see {{Version.hasBoundaries()}}. It's currently set to 
ignore boundaries for >= 3.0.


> Faster sequential IO (on compaction, streaming, etc)
> ----------------------------------------------------
>
>                 Key: CASSANDRA-8630
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8630
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core, Tools
>            Reporter: Oleg Anastasyev
>            Assignee: Stefania
>              Labels: compaction, performance
>             Fix For: 3.x
>
>         Attachments: 8630-FasterSequencialReadsAndWrites.txt, cpu_load.png, 
> flight_recorder_001_files.tar.gz, flight_recorder_002_files.tar.gz, 
> mmaped_uncomp_hotspot.png
>
>
> When node is doing a lot of sequencial IO (streaming, compacting, etc) a lot 
> of CPU is lost in calls to RAF's int read() and DataOutputStream's write(int).
> This is because default implementations of readShort,readLong, etc as well as 
> their matching write* are implemented with numerous calls of byte by byte 
> read and write. 
> This makes a lot of syscalls as well.
> A quick microbench shows than just reimplementation of these methods in 
> either way gives 8x speed increase.
> A patch attached implements RandomAccessReader.read<Type> and 
> SequencialWriter.write<Type> methods in more efficient way.
> I also eliminated some extra byte copies in CompositeType.split and 
> ColumnNameHelper.maxComponents, which were on my profiler's hotspot method 
> list during tests.
> A stress tests on my laptop show that this patch makes compaction 25-30% 
> faster  on uncompressed sstables and 15% faster for compressed ones.
> A deployment to production shows much less CPU load for compaction. 
> (I attached a cpu load graph from one of our production, orange is niced CPU 
> load - i.e. compaction; yellow is user - i.e. not compaction related tasks)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (CASSANDRA-8630) Faster sequential IO (on compaction, streaming, etc)

Reply via email to