[ https://issues.apache.org/jira/browse/CASSANDRA-9060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14385290#comment-14385290 ]

Benedict commented on CASSANDRA-9060:
-------------------------------------

CASSANDRA-8670 will give the best option for this, but in the meantime I 
personally think this fix should go into 2.1, since it is trivial and likely to 
have a significant impact. It's kind of amazing this oversight has gone 
unnoticed for so long, so thanks for pointing it out.

Looking at it, I'm not at all convinced by the wrapping/unwrapping of the longs 
either, since our DataOutput implementations all just convert each writeLong() 
into a series of write(byte) calls. But the simplest, least invasive solution 
is indeed to pass a BufferedOutputStream into a DataOutputStreamPlus, rather 
than constructing a DataOutputStreamAndChannel. For 2.1 I think we should make 
this tiny change.
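A minimal sketch of why the 2.1 change helps, using plain java.io classes as stand-ins: `CountingSink` plays the role of the underlying channel, and the hypothetical `writeLongBytewise()` mimics a writeLong() that decomposes into eight write(byte) calls. Neither is Cassandra's DataOutputStreamPlus or DataOutputStreamAndChannel API; the sketch only counts how many calls reach the sink with and without a BufferedOutputStream in between.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

final class BufferingDemo {
    /** Stand-in for a file channel: counts how many write calls actually reach it. */
    static final class CountingSink extends OutputStream {
        long writeCalls = 0;
        long bytes = 0;

        @Override public void write(int b) { writeCalls++; bytes++; }

        @Override public void write(byte[] b, int off, int len) { writeCalls++; bytes += len; }
    }

    /** Hypothetical helper: a big-endian writeLong() split into eight single-byte writes. */
    static void writeLongBytewise(OutputStream out, long v) throws IOException {
        for (int shift = 56; shift >= 0; shift -= 8) {
            out.write((int) (v >>> shift) & 0xFF);
        }
    }

    /** Writes numWords longs, optionally through an 8 KiB buffer; returns calls seen by the sink. */
    static long sinkWrites(int numWords, boolean buffered) {
        CountingSink sink = new CountingSink();
        try {
            OutputStream out = buffered ? new BufferedOutputStream(sink, 8192) : sink;
            for (int i = 0; i < numWords; i++) {
                writeLongBytewise(out, 0x0123456789ABCDEFL ^ i);
            }
            out.flush(); // drains whatever is still held in the buffer
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.writeCalls;
    }

    public static void main(String[] args) {
        System.out.println("unbuffered sink writes: " + sinkWrites(4096, false));
        System.out.println("buffered sink writes:   " + sinkWrites(4096, true));
    }
}
```

With 4,096 longs, the unbuffered path makes one sink call per byte (32,768 calls), while the 8 KiB buffer turns the same bytes into four bulk writes. The per-byte work above the buffer is unchanged, which is why the wrapping/unwrapping itself is still questioned for 3.0.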

For 3.0, I think we should wait for CASSANDRA-8670, think through the long 
wrapping/unwrapping business, and see if there is a clearer route. Perhaps we 
could bump the serialization version, so we can simply stream the raw bytes to 
disk without any conversion, since that makes the most sense: there's no reason 
to be flipping bytes whatsoever here, since we always index into the data by 
byte. If we want to maintain the serialization format, we could buffer segments 
of the filter into a ByteBuffer/Memory object, and use Long.reverseBytes() 
prior to flushing that buffered data to disk. On reading, we could populate the 
entire bitset, then iterate through it, reversing the bytes as we go. I would 
prefer to see the on-disk representation match the in-memory one, though.
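A minimal sketch of the format-preserving option, assuming the filter is exposed as a `long[]` of words (Cassandra's off-heap bitset is not, so this is illustrative only) and using a hypothetical segment size of 1,024 words. Storing `Long.reverseBytes(w)` in little-endian order produces exactly the big-endian bytes that `writeLong(w)` would, so the existing on-disk format is kept while each segment goes out in one bulk write.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class SegmentedBitsetIO {
    // Hypothetical segment size in words (8 KiB per segment); not a Cassandra setting.
    static final int SEGMENT_WORDS = 1024;

    /** Buffers one segment at a time, byte-swapping each word, then flushes it in one call. */
    static void serialize(long[] words, OutputStream out) throws IOException {
        ByteBuffer segment = ByteBuffer.allocate(SEGMENT_WORDS * Long.BYTES)
                                       .order(ByteOrder.LITTLE_ENDIAN);
        for (int start = 0; start < words.length; start += SEGMENT_WORDS) {
            int n = Math.min(SEGMENT_WORDS, words.length - start);
            segment.clear();
            for (int j = 0; j < n; j++) {
                // little-endian store of the reversed word == big-endian bytes of the word
                segment.putLong(Long.reverseBytes(words[start + j]));
            }
            out.write(segment.array(), 0, n * Long.BYTES);
        }
    }

    /** Populates the whole bitset with a bulk copy first, then reverses each word in place. */
    static long[] deserialize(byte[] raw) {
        long[] words = new long[raw.length / Long.BYTES];
        ByteBuffer.wrap(raw).order(ByteOrder.LITTLE_ENDIAN).asLongBuffer().get(words);
        for (int i = 0; i < words.length; i++) {
            words[i] = Long.reverseBytes(words[i]);
        }
        return words;
    }

    /** Convenience wrapper that serializes into a byte array. */
    static byte[] toBytes(long[] words) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try {
            serialize(words, bos);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }
}
```

In plain Java a BIG_ENDIAN ByteBuffer would perform the swap implicitly; the explicit `Long.reverseBytes()` calls mirror the copy-then-swap approach described above.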

As to the anticompaction calculation, that isn't my area, but your conclusion 
seems reasonable to me. Looking at the code, it seems we would need to somehow 
correct for the fraction of each sstable we expect to end up on each side of 
the range, which might leave one side with a worse-than-expected false positive 
rate and the other with a better one, whereas right now both sides receive 
significantly better false positive rates. I'm not sure how much better we 
could deal with this (at least without a bit more research effort and thought). 
[~krummas]?
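To make the trade-off concrete, a sketch using the textbook Bloom filter formulas, m = -n ln p / (ln 2)^2, k = (m/n) ln 2 and p ≈ (1 - e^(-kn/m))^k. Cassandra sizes its filters through its own bucket-per-element tables, so the numbers are illustrative only, and the assumption that each side is currently sized for the whole sstable's key count is inferred from the discussion, not checked against the code.

```java
final class BloomSizing {
    /** Textbook bit count for n keys at target false positive rate p. */
    static long bitsFor(long n, double p) {
        return (long) Math.ceil(-n * Math.log(p) / (Math.log(2) * Math.log(2)));
    }

    /** Textbook optimal hash count k = (m / n) ln 2, rounded, at least 1. */
    static int hashesFor(long m, long n) {
        return Math.max(1, (int) Math.round((double) m / n * Math.log(2)));
    }

    /** Approximate false positive rate after inserting n keys into m bits with k hashes. */
    static double fpRate(long m, int k, long n) {
        return Math.pow(1 - Math.exp(-(double) k * n / m), k);
    }

    /** Filter sized for estimatedKeys at target p, then filled with actualKeys. */
    static double fpAfterAnticompaction(long estimatedKeys, long actualKeys, double p) {
        long m = bitsFor(estimatedKeys, p);
        int k = hashesFor(m, estimatedKeys);
        return fpRate(m, k, actualKeys);
    }

    public static void main(String[] args) {
        double target = 0.01;
        // Assumed current behavior: sized for the whole sstable, receives only a fraction of it.
        System.out.println(fpAfterAnticompaction(1_000_000L, 100_000L, target));
        // Ratio-corrected estimate that turns out too low for this side.
        System.out.println(fpAfterAnticompaction(100_000L, 150_000L, target));
    }
}
```

The first case shows why both sides currently come out well below the target rate (at the cost of oversized filters); the second shows the risk of correcting by an estimated ratio: a side that receives more keys than estimated ends up above the target.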

> Anticompaction hangs on bloom filter bitset serialization 
> ----------------------------------------------------------
>
>                 Key: CASSANDRA-9060
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9060
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Gustav Munkby
>            Priority: Minor
>             Fix For: 3.0
>
>         Attachments: trunk-9060.patch
>
>
> I tried running an incremental repair against a 15-node vnode cluster with 
> roughly 500GB of data running on 2.1.3-SNAPSHOT, without performing the 
> suggested migration steps. I manually chose a small range for the repair 
> (using --start/end-token). The actual repair part took almost no time at all, 
> but the anticompactions took a lot of time (not surprisingly).
> Obviously, this might not be the ideal way to run incremental repairs, but I 
> wanted to look into what made the whole process so slow. The results were 
> rather surprising. The majority of the time was spent serializing bloom 
> filters.
> The reason seemed to be twofold. First, the bloom filters generated were 
> huge (probably because the original SSTables were large). With a proper 
> migration to incremental repairs, I'm guessing this would not happen. 
> Secondly, however, the bloom filters were being written to the output one 
> byte at a time (with quite a few type conversions on the way) to transform 
> the little-endian in-memory representation to the big-endian on-disk 
> representation.
> I have implemented a solution where big-endian is used in-memory as well as 
> on-disk, which obviously makes de-/serialization much, much faster. This 
> introduces some slight overhead when checking the bloom filter, but I can't 
> see how that would be problematic. An obvious alternative would be to still 
> perform the serialization/deserialization using a byte array, but perform the 
> byte-order swap there.
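Both the reporter's approach and the stated preference for matching on-disk and in-memory layouts come down to serialization becoming a single bulk copy. A minimal sketch of that idea, using a hypothetical byte-addressed `ByteBitSet` (not the attached trunk-9060.patch and not Cassandra's OffHeapBitSet): because bit i always lives in byte i >>> 3, there is no word byte order to swap. This raw layout differs from the existing big-endian-longs format, so adopting it would need a version bump.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;

/** Hypothetical byte-addressed bitset: bit i lives in byte i >>> 3 at position i & 7. */
final class ByteBitSet {
    final byte[] bytes;

    ByteBitSet(long numBits) {
        bytes = new byte[(int) ((numBits + 7) >>> 3)];
    }

    void set(long index) {
        bytes[(int) (index >>> 3)] |= (byte) (1 << (index & 7));
    }

    boolean get(long index) {
        return (bytes[(int) (index >>> 3)] & (1 << (index & 7))) != 0;
    }

    /** The on-disk layout is the in-memory layout, so writing is one bulk copy. */
    void serialize(OutputStream out) throws IOException {
        out.write(bytes);
    }

    /** Reading is likewise one bulk copy straight into the backing array. */
    static ByteBitSet deserialize(InputStream in, long numBits) throws IOException {
        ByteBitSet bs = new ByteBitSet(numBits);
        new DataInputStream(in).readFully(bs.bytes);
        return bs;
    }

    /** Serializes to memory and reads it back. */
    static ByteBitSet roundTrip(ByteBitSet bs, long numBits) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            bs.serialize(bos);
            return deserialize(new ByteArrayInputStream(bos.toByteArray()), numBits);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The lookup cost is a shift and a mask per probe regardless of platform byte order, which is the sense in which byte indexing makes the flipping unnecessary.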



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
