[
https://issues.apache.org/jira/browse/CASSANDRA-8630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14702662#comment-14702662
]
Stefania commented on CASSANDRA-8630:
-------------------------------------
Thank you for the quick review and all the tips on flight recorder!
I think I've addressed most of your comments with the [latest
commit|https://github.com/stef1927/cassandra/commit/4794b0009b363450e03fa85cc8ca2eeb403f3e37]:
bq. In RebufferingInputStream, don't throw assertion error since it's not an
assert nor an Error to read from a closed stream. JDK classes seem to throw
IOException with a message. I say don't throw anything just let it NPE since
that is what the other functions do and we are avoiding the extra branch in
those. Or... check for "closed" all the time and throw IOException. Maybe
Benedict has an opinion on the performance of checking.
Done, it'll just NPE.
bq. RandomAccessReader.reBufferMmap() - how is mmap reading rate limited?
The throttling was kind of work in progress. The issue I was facing with the
performance measurements is that
the limiter was aquiring the entire buffer capacity and this can be big for
mmap segments (the entire file length up to 2GB). So, as Benedict correctly
guessed, multiple tables at once would block when trying to swap in the first
segment in rebuffer. Therefore I changed the RAR constructor to initialize the
buffer to the first segment (and this won't be throttled). Then I moved the
throttling back to reBuffer, so it covers mmap segments too, except I am
careful to throttle on buffer.remaining() rather than buffer.capacity(). So the
first segment won't be throttled, this could be a problem but the existing
behavior doesn't throttle mmap segment at all. Perhaps we shouldn't be
throttling when rebuffering but when reading?
bq. RandomAccessReader.Channel is not a channel. It's more of a wrapper,
descriptor, proxy or something.
This was a ugly hack to support the compressed commitlog replay code, which
expects a FileDataInput that can be created with a BB, a path and an offset
(the old ByteBufferDataInput). I did not have the confidence to change
commitlog code so I ended up with a channel wrapper to support an empty channel
and reuse the RAR. I got rid of it and added a new sub-class of DataInputBuffer
that implements FileDataInput, I called it FileSegmentInputStream.
bq. RateLimiter is not a final class. We could start using a noop rate limiter
instead of null. Constructor is private, maybe a rate limiter with a huge rate?
Replaced null with RateLimiter.create(Double.MAX_VALUE).
bq. RandomAccessReader.bytesRemaining() uses Ints.checkedCast, but we expect
the file to be bigger than an int so it shouldn't throw. The API allows this
and FileInputStream doesn't throw for available. In this case saturated cast is
probably the right one. We should do a pass for Ints.checkedCast and make sure
throwing is the right behavior instead of writing handling for it.
Changed to saturatedCast, thanks for spotting this.
bq. BufferPool.java has an import change that is extra
Fixed, thank you.
bq. I was going to ask for some warnings cleanup, but it's a big patch touching
a lot of files that already had warnings, so whatever you want to do.
I've killed a few, not too many though, if you have any specific files that
bother you particularly let me know.
bq. Thumbs up for logging the random seed in the tests
Thanks.
bq. RandomAccessReader.readBytes could allocate the buffer and then invoke the
superclass read method
I think it's faster as it is? This is a hot-spot according to flight recorder.
bq. MemoryInputStream.reBuffer is allegedly not tested
I've added {{MemoryTest.testInputStream()}}
bq. RandomAccessReader.reBuffer has two cases that aren't tested, if (limit >
fileLength) and in the loop if (n < 0)
fileLength should always be less than the channel size (I've added a check in
the builder when we override it). So {{n <0}} would only happen if the file got
modified which shouldn't happen, therefore I replaced the {{break}} with an
{{FSError}}. As for {{limit > fileLength}} I don't see how this could happen so
I removed it. Since this was existing code, could you double check this?
bq. RandomAccessReader.open(ByteBuffer, String, long) usage of checked cast
seems like it would also limit to 2 gig files?
No longer applicable.
bq. Not sure about the first checked cast in reBufferMmap, if it saturated the
min would still work, and you would be able to do it multiple times to get to
the next entry so no reason to throw an exception?
Yes you're quite right, saturatedCast is the correct choice.
bq. Test coverage looks excellent on the things you worked on.
Thanks.
bq. What's the business with the missing segments? How does that happen and how
often? Just wondering if going to the buffer pool for that makes sense.
Have a look at MmappedSegmentedFile.Builder.addPotentialBoundary() and
createSegments(); only segments that are less than 2GB are added to the map.
The previous implementation would invoke a RAR for missing segments that's why
we switch to rebufferStandard in case of missing segments.
I've also rebased. This is on trunk for now, should it be moved to the 3.0
branch?
CI is still running.
I'll also try to organize a cperf READ test.
> Faster sequential IO (on compaction, streaming, etc)
> ----------------------------------------------------
>
> Key: CASSANDRA-8630
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8630
> Project: Cassandra
> Issue Type: Improvement
> Components: Core, Tools
> Reporter: Oleg Anastasyev
> Assignee: Stefania
> Labels: compaction, performance
> Fix For: 3.x
>
> Attachments: 8630-FasterSequencialReadsAndWrites.txt, cpu_load.png,
> flight_recorder_001_files.tar.gz, flight_recorder_002_files.tar.gz,
> mmaped_uncomp_hotspot.png
>
>
> When node is doing a lot of sequencial IO (streaming, compacting, etc) a lot
> of CPU is lost in calls to RAF's int read() and DataOutputStream's write(int).
> This is because default implementations of readShort,readLong, etc as well as
> their matching write* are implemented with numerous calls of byte by byte
> read and write.
> This makes a lot of syscalls as well.
> A quick microbench shows than just reimplementation of these methods in
> either way gives 8x speed increase.
> A patch attached implements RandomAccessReader.read<Type> and
> SequencialWriter.write<Type> methods in more efficient way.
> I also eliminated some extra byte copies in CompositeType.split and
> ColumnNameHelper.maxComponents, which were on my profiler's hotspot method
> list during tests.
> A stress tests on my laptop show that this patch makes compaction 25-30%
> faster on uncompressed sstables and 15% faster for compressed ones.
> A deployment to production shows much less CPU load for compaction.
> (I attached a cpu load graph from one of our production, orange is niced CPU
> load - i.e. compaction; yellow is user - i.e. not compaction related tasks)
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)