[ 
https://issues.apache.org/jira/browse/CASSANDRA-8630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14703958#comment-14703958
 ] 

Benedict commented on CASSANDRA-8630:
-------------------------------------

bq. I don't see how an array of pairs can be less indirection than a map, or 
result in less boxing unless there are parallel arrays

Right, which is the standard approach for this kind of thing in Java.
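
A minimal sketch of the parallel-array idea, for illustration only. The class 
and method names ({{SegmentIndex}}, {{floorIndex}}) are hypothetical and not 
taken from the patch; the point is that a sorted {{long[]}} of keys plus an 
{{int[]}} of values gives a floor lookup with no boxing and no per-entry 
objects:

```java
import java.util.Arrays;

// Hypothetical illustration; none of these names come from the Cassandra code base.
final class SegmentIndex {
    private final long[] offsets; // segment start offsets, sorted ascending
    private final int[] lengths;  // lengths[i] is the length of the segment at offsets[i]

    SegmentIndex(long[] offsets, int[] lengths) {
        if (offsets.length != lengths.length)
            throw new IllegalArgumentException("parallel arrays must have equal length");
        this.offsets = offsets.clone();
        this.lengths = lengths.clone();
    }

    /** Index of the last segment starting at or before position, or -1 if there is none. */
    int floorIndex(long position) {
        int i = Arrays.binarySearch(offsets, position);
        // On a miss binarySearch returns -(insertionPoint) - 1,
        // and the floor entry sits at insertionPoint - 1, i.e. -i - 2.
        return i >= 0 ? i : -i - 2;
    }

    long offsetAt(int i) { return offsets[i]; }

    int lengthAt(int i) { return lengths[i]; }

    public static void main(String[] args) {
        SegmentIndex index = new SegmentIndex(new long[] {0L, 100L, 250L},
                                              new int[] {100, 150, 50});
        int i = index.floorIndex(120L);
        System.out.println(index.offsetAt(i) + " " + index.lengthAt(i));
    }
}
```

Both arrays hold primitives, so a lookup touches two flat arrays rather than a 
tree of boxed entries.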

bq. There might be something to not remapping entire files every 50 megabytes 
as part of early opening, but it's definitely better as a separate task. It's 
also not clear whether it's going to be faster or just feel better.

We've had a few weird kernel-level memory interactions reported, and I cannot 
shake the feeling they were related to this. We never tracked down the cause, 
but there was also no follow-up, so it's quite possible it was an 
environmental issue. 

However, either way, if we're rewriting it right now, we may as well do it 
correctly. To some extent we have to rewrite it if we're eliminating the 
current ugliness of multiple readers, "potential boundaries" etc. That is 
cleanliness scope creep, I'll admit, but when refactoring a bunch of classes I 
don't think we should miss an opportunity to remove dead and complicating 
concepts, such as the need for Iterators of multiple FDI, which only makes 
sense for MFDI. If it's noticeably more work, then sure, let's leave it. But if 
we're changing the behaviour, I don't think it is worth artificially 
reimplementing it the obviously worse way (regardless of how much worse).

bq. ImmutableSortedMap (or is it navigable?) might split the difference between 
the two approaches.

You would think so. But take a look at its {{floorEntry}} implementation, which 
we would need to make use of. I'm terribly disappointed whenever I look beneath 
the hood of Guava.

bq. In SSTableReader you are adding and removing fields from files. What are 
the cross version compatibility issues with that?

This has been discussed already, I think?

> Faster sequential IO (on compaction, streaming, etc)
> ----------------------------------------------------
>
>                 Key: CASSANDRA-8630
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8630
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core, Tools
>            Reporter: Oleg Anastasyev
>            Assignee: Benedict
>              Labels: compaction, performance
>             Fix For: 3.x
>
>         Attachments: 8630-FasterSequencialReadsAndWrites.txt, cpu_load.png, 
> flight_recorder_001_files.tar.gz, flight_recorder_002_files.tar.gz, 
> mmaped_uncomp_hotspot.png
>
>
> When a node is doing a lot of sequential IO (streaming, compacting, etc.) a 
> lot of CPU is lost in calls to RAF's int read() and DataOutputStream's 
> write(int).
> This is because the default implementations of readShort, readLong, etc., as 
> well as their matching write* methods, are implemented with numerous 
> byte-by-byte read and write calls. 
> This makes a lot of syscalls as well.
> A quick microbenchmark shows that just reimplementing these methods either 
> way gives an 8x speed increase.
> The attached patch implements the RandomAccessReader.read<Type> and 
> SequentialWriter.write<Type> methods in a more efficient way.
> I also eliminated some extra byte copies in CompositeType.split and 
> ColumnNameHelper.maxComponents, which were on my profiler's hotspot method 
> list during tests.
> Stress tests on my laptop show that this patch makes compaction 25-30% 
> faster on uncompressed sstables and 15% faster on compressed ones.
> A deployment to production shows much less CPU load for compaction. 
> (I attached a CPU load graph from one of our production environments; orange 
> is niced CPU load, i.e. compaction; yellow is user, i.e. tasks not related to 
> compaction.)
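
To illustrate the cost described in the issue above, here is a rough sketch 
comparing a byte-by-byte long decode with a bulk read into a scratch buffer. 
It is an assumption-laden illustration, not the attached patch: the class and 
method names are hypothetical, and it uses plain java.io / java.nio rather 
than Cassandra's RandomAccessReader.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;

// Hypothetical illustration; not the code from the attached patch.
final class ReadLongSketch {
    /** Eight separate read() calls per long, as in the default DataInput-style decoding. */
    static long readLongByteByByte(InputStream in) throws IOException {
        long value = 0;
        for (int i = 0; i < 8; i++) {
            int b = in.read(); // one call (and, on an unbuffered file, one syscall) per byte
            if (b < 0)
                throw new EOFException();
            value = (value << 8) | b; // big-endian: most significant byte first
        }
        return value;
    }

    /** Bulk reads into a reusable scratch buffer, then a single big-endian decode. */
    static long readLongBulk(InputStream in, byte[] scratch) throws IOException {
        int filled = 0;
        while (filled < 8) { // read() may return fewer bytes than requested
            int n = in.read(scratch, filled, 8 - filled);
            if (n < 0)
                throw new EOFException();
            filled += n;
        }
        return ByteBuffer.wrap(scratch, 0, 8).getLong(); // ByteBuffer defaults to big-endian
    }
}
```

Both methods decode the same big-endian value; the second simply replaces 
eight single-byte calls with, typically, one bulk call.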



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
