[jira] [Commented] (CASSANDRA-8630) Faster sequential IO (on compaction, streaming, etc)

Stefania (JIRA) Wed, 19 Aug 2015 00:55:09 -0700

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-8630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14702662#comment-14702662
 ]


Stefania commented on CASSANDRA-8630:
-------------------------------------

Thank you for the quick review and all the tips on flight recorder!

I think I've addressed most of your comments with the [latest 
commit|https://github.com/stef1927/cassandra/commit/4794b0009b363450e03fa85cc8ca2eeb403f3e37]:

bq. In RebufferingInputStream, don't throw assertion error since it's not an 
assert nor an Error to read from a closed stream. JDK classes seem to throw 
IOException with a message. I say don't throw anything just let it NPE since 
that is what the other functions do and we are avoiding the extra branch in 
those. Or... check for "closed" all the time and throw IOException. Maybe 
Benedict has an opinion on the performance of checking.

Done, it'll just NPE.

bq. RandomAccessReader.reBufferMmap() - how is mmap reading rate limited?

The throttling was kind of work in progress. The issue I was facing with the 
performance measurements is that
the limiter was aquiring the entire buffer capacity and this can be big for 
mmap segments (the entire file length up to 2GB). So, as Benedict correctly 
guessed, multiple tables at once would block when trying to swap in the first 
segment in rebuffer. Therefore I changed the RAR constructor to initialize the 
buffer to the first segment (and this won't be throttled). Then I moved the 
throttling back to reBuffer, so it covers mmap segments too, except I am 
careful to throttle on buffer.remaining() rather than buffer.capacity(). So the 
first segment won't be throttled, this could be a problem but the existing 
behavior doesn't throttle mmap segment at all. Perhaps we shouldn't be 
throttling when rebuffering but when reading?

bq. RandomAccessReader.Channel is not a channel. It's more of a wrapper, 
descriptor, proxy or something.

This was a ugly hack to support the compressed commitlog replay code, which 
expects a FileDataInput that can be created with a BB, a path and an offset 
(the old ByteBufferDataInput). I did not have the confidence to change 
commitlog code so I ended up with a channel wrapper to support an empty channel 
and reuse the RAR. I got rid of it and added a new sub-class of DataInputBuffer 
that implements FileDataInput, I called it FileSegmentInputStream.

bq. RateLimiter is not a final class. We could start using a noop rate limiter 
instead of null. Constructor is private, maybe a rate limiter with a huge rate?

Replaced null with RateLimiter.create(Double.MAX_VALUE).

bq. RandomAccessReader.bytesRemaining() uses Ints.checkedCast, but we expect 
the file to be bigger than an int so it shouldn't throw. The API allows this 
and FileInputStream doesn't throw for available. In this case saturated cast is 
probably the right one. We should do a pass for Ints.checkedCast and make sure 
throwing is the right behavior instead of writing handling for it.

Changed to saturatedCast, thanks for spotting this.

bq. BufferPool.java has an import change that is extra

Fixed, thank you.

bq. I was going to ask for some warnings cleanup, but it's a big patch touching 
a lot of files that already had warnings, so whatever you want to do.

I've killed a few, not too many though, if you have any specific files that 
bother you particularly let me know.

bq. Thumbs up for logging the random seed in the tests

Thanks.

bq. RandomAccessReader.readBytes could allocate the buffer and then invoke the 
superclass read method

I think it's faster as it is? This is a hot-spot according to flight recorder.

bq. MemoryInputStream.reBuffer is allegedly not tested

I've added {{MemoryTest.testInputStream()}}

bq. RandomAccessReader.reBuffer has two cases that aren't tested, if (limit > 
fileLength) and in the loop if (n < 0)

fileLength should always be less than the channel size (I've added a check in 
the builder when we override it). So {{n <0}} would only happen if the file got 
modified which shouldn't happen, therefore I replaced the {{break}} with an 
{{FSError}}. As for {{limit > fileLength}} I don't see how this could happen so 
I removed it. Since this was existing code, could you double check this?

bq. RandomAccessReader.open(ByteBuffer, String, long) usage of checked cast 
seems like it would also limit to 2 gig files?

No longer applicable.

bq. Not sure about the first checked cast in reBufferMmap, if it saturated the 
min would still work, and you would be able to do it multiple times to get to 
the next entry so no reason to throw an exception?

Yes you're quite right, saturatedCast is the correct choice.

bq. Test coverage looks excellent on the things you worked on.

Thanks.

bq. What's the business with the missing segments? How does that happen and how 
often? Just wondering if going to the buffer pool for that makes sense.

Have a look at MmappedSegmentedFile.Builder.addPotentialBoundary() and 
createSegments(); only segments that are less than 2GB are added to the map. 
The previous implementation would invoke a RAR for missing segments that's why 
we switch to rebufferStandard in case of missing segments.

I've also rebased. This is on trunk for now, should it be moved to the 3.0 
branch?

CI is still running. 

I'll also try to organize a cperf READ test.

> Faster sequential IO (on compaction, streaming, etc)
> ----------------------------------------------------
>
>                 Key: CASSANDRA-8630
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8630
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core, Tools
>            Reporter: Oleg Anastasyev
>            Assignee: Stefania
>              Labels: compaction, performance
>             Fix For: 3.x
>
>         Attachments: 8630-FasterSequencialReadsAndWrites.txt, cpu_load.png, 
> flight_recorder_001_files.tar.gz, flight_recorder_002_files.tar.gz, 
> mmaped_uncomp_hotspot.png
>
>
> When node is doing a lot of sequencial IO (streaming, compacting, etc) a lot 
> of CPU is lost in calls to RAF's int read() and DataOutputStream's write(int).
> This is because default implementations of readShort,readLong, etc as well as 
> their matching write* are implemented with numerous calls of byte by byte 
> read and write. 
> This makes a lot of syscalls as well.
> A quick microbench shows than just reimplementation of these methods in 
> either way gives 8x speed increase.
> A patch attached implements RandomAccessReader.read<Type> and 
> SequencialWriter.write<Type> methods in more efficient way.
> I also eliminated some extra byte copies in CompositeType.split and 
> ColumnNameHelper.maxComponents, which were on my profiler's hotspot method 
> list during tests.
> A stress tests on my laptop show that this patch makes compaction 25-30% 
> faster  on uncompressed sstables and 15% faster for compressed ones.
> A deployment to production shows much less CPU load for compaction. 
> (I attached a cpu load graph from one of our production, orange is niced CPU 
> load - i.e. compaction; yellow is user - i.e. not compaction related tasks)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (CASSANDRA-8630) Faster sequential IO (on compaction, streaming, etc)

Reply via email to