[
https://issues.apache.org/jira/browse/CASSANDRA-8630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14706492#comment-14706492
]
Stefania commented on CASSANDRA-8630:
-------------------------------------
I completed the move of segments to the builder, the bulk of the implementation
is {{MmappedRegions}}, which is a {{SharedCloseableImpl}} owned by the builder.
Each time a {{SegmentedFile}} is called we copy a snapshot and increment the
reference count. Shall we rename {{SegmentedFile}} and derived classes to
somethign else? The compressed segmented file still owns the mmapped regions as
it creates the metadata each time and I wasn't sure how to handle this.
I also fixed a channel ownership problem in the builder, eventually we should
really consider passing the file path to the builder constructor.
The integration with 6230 was a bit harder than I anticipated and some hinting
utests were failing on Jenkins. I had to introduce a flag to force the slow
path in {{RebufferingInputStream}} or else {{ChecksummedDataInput}} would not
work. My idea of updating the CRC during rebuffering is no good because we must
actually rely only on what is read by this class, not the underlying reader,
and we must extract a crc only on what's read, not necessarily buffered.
Please note the new CI links (due to renaming of the branch to
[8630-3.0|https://github.com/stef1927/cassandra/commits/8630-3.0]:
http://cassci.datastax.com/job/stef1927-8630-3.0-testall/
http://cassci.datastax.com/job/stef1927-8630-3.0-dtest/
cstar is currently having
[issues|http://cstar.datastax.com/tests/id/e05752e6-47c9-11e5-b4da-42010af0688f],
as soon as it is available I will launch a comparison.
Here are the remaining CR comments:
bq. MemoryInputStream.available() can wrap the addition between
buffer.remaining() + Ints.saturatedCast(memRemaining()). Do the addition and
then the saturating cast.
Fixed, thanks.
bq. Why does RandomAccessReader accept a builder and a parameter for
initializing the buffer? Seems like we lose the bonus of a builder a builder
allowing a constant signature.
Good point, fixed.
bq. A nit in initializeBuffer, it does firstSegment.value().slice() which
implies you want a subset of the buffer? duplicate() makes it obvious there is
no such concern.
Fixed.
bq. I think there is a place for unit tests stressing the 2 gigabyte
boundaries. That means testing available()/length()/remaining() style methods
as well as being able to read and seek with instances of these things that are
larger than 2g. Doing it with the actual file based ones seems bad, but maybe
you could intercept those to work with memory so they run fast or ingloriously
mock their inputs.
I've added {{RandomAccessReaderTest.testVeryLarge}}, it uses a fake file
channel that just increments the position and the buffers without reading
anything.
bq. For rate limiting is your current solution to consume buffer size bytes
from the limiter at a time for both mmap reads and standard? And you accomplish
this by slicing the buffer then updating the position? I don't see you setting
the limit before slicing?
I've added the call to {{limit()}} in {{rebufferMmap()}}. I missed that
initially. The downside is that we are rebuffering more often than needed for
mmap disk access, however we have more accurate throttling. I've run again the
same micro-benchmark and noticed no difference in performance.
bq. I thought NIODataInputStream had methods for reading into ByteBuffers, but
I was wrong. It's kind of thorny to add one to RebufferingInputStream so I
think you did the right thing putting it in FileSegmentedInputStream even
though it's an odd concern to have in that class. Unless you have a better idea.
Shall we move {{readBytes}} from {{FileDataInput}} to {{DataInputPlus}}, at
which point it can be implemented only by {{RebufferingInputStream}}?
bq. In SSTableReader you are adding and removing fields from files. What are
the cross version compatibility issues with that?
It should be fixed now, see {{Version.hasBoundaries()}}. It's currently set to
ignore boundaries for >= 3.0.
> Faster sequential IO (on compaction, streaming, etc)
> ----------------------------------------------------
>
> Key: CASSANDRA-8630
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8630
> Project: Cassandra
> Issue Type: Improvement
> Components: Core, Tools
> Reporter: Oleg Anastasyev
> Assignee: Stefania
> Labels: compaction, performance
> Fix For: 3.x
>
> Attachments: 8630-FasterSequencialReadsAndWrites.txt, cpu_load.png,
> flight_recorder_001_files.tar.gz, flight_recorder_002_files.tar.gz,
> mmaped_uncomp_hotspot.png
>
>
> When node is doing a lot of sequencial IO (streaming, compacting, etc) a lot
> of CPU is lost in calls to RAF's int read() and DataOutputStream's write(int).
> This is because default implementations of readShort,readLong, etc as well as
> their matching write* are implemented with numerous calls of byte by byte
> read and write.
> This makes a lot of syscalls as well.
> A quick microbench shows than just reimplementation of these methods in
> either way gives 8x speed increase.
> A patch attached implements RandomAccessReader.read<Type> and
> SequencialWriter.write<Type> methods in more efficient way.
> I also eliminated some extra byte copies in CompositeType.split and
> ColumnNameHelper.maxComponents, which were on my profiler's hotspot method
> list during tests.
> A stress tests on my laptop show that this patch makes compaction 25-30%
> faster on uncompressed sstables and 15% faster for compressed ones.
> A deployment to production shows much less CPU load for compaction.
> (I attached a cpu load graph from one of our production, orange is niced CPU
> load - i.e. compaction; yellow is user - i.e. not compaction related tasks)
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)