[ https://issues.apache.org/jira/browse/CASSANDRA-9060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Marcus Eriksson updated CASSANDRA-9060:
---------------------------------------
Attachment: 0001-another-tweak-to-9060.patch
> Anticompaction hangs on bloom filter bitset serialization
> ----------------------------------------------------------
>
> Key: CASSANDRA-9060
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9060
> Project: Cassandra
> Issue Type: Bug
> Reporter: Gustav Munkby
> Assignee: Gustav Munkby
> Priority: Minor
> Fix For: 2.1.4
>
> Attachments: 0001-another-tweak-to-9060.patch, 2.1-9060-simple.patch,
> trunk-9060.patch
>
>
> I tried running an incremental repair against a 15-node vnode cluster with
> roughly 500GB of data running on 2.1.3-SNAPSHOT, without performing the
> suggested migration steps. I manually chose a small range for the repair
> (using --start/end-token). The actual repair part took almost no time at all,
> but the anticompactions took a long time (not surprisingly).
> Obviously, this might not be the ideal way to run incremental repairs, but I
> wanted to look into what made the whole process so slow. The results were
> rather surprising. The majority of the time was spent serializing bloom
> filters.
> The reason seemed to be twofold. First, the bloom filters generated were
> huge (probably because the original SSTables were large). With a proper
> migration to incremental repairs, I'm guessing this would not happen.
> Second, the bloom filters were being written to the output one byte at a
> time (with quite a few type conversions on the way) to transform the
> little-endian in-memory representation to the big-endian on-disk
> representation.
> I have implemented a solution where big-endian is used in-memory as well as
> on-disk, which makes serialization and deserialization much faster. This
> introduces some slight overhead when checking the bloom filter, but I can't
> see how that would be problematic. An obvious alternative would be to still
> perform the serialization/deserialization using a byte array, but perform the
> byte-order swap there.
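
As an illustration of the alternative mentioned in the last paragraph, here is a minimal Java sketch of the two serialization strategies. It is not taken from any of the attached patches or from Cassandra's code: the class and method names are hypothetical, and it assumes the bitset is held as a byte array of 64-bit words stored little-endian. The first method emits every byte with its own write call; the second swaps the byte order of whole words through a ByteBuffer so the result can be handed to the output in a single bulk write.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical sketch; names and data layout are assumptions, not Cassandra's API.
final class BitsetByteOrderSketch {

    // Slow path: one write call per byte, reversing the bytes of each 8-byte word.
    static byte[] serializeByteAtATime(byte[] littleEndianWords) {
        requireWholeWords(littleEndianWords);
        ByteArrayOutputStream out = new ByteArrayOutputStream(littleEndianWords.length);
        for (int base = 0; base < littleEndianWords.length; base += Long.BYTES) {
            for (int i = Long.BYTES - 1; i >= 0; i--) {
                out.write(littleEndianWords[base + i]);
            }
        }
        return out.toByteArray();
    }

    // Bulk path: read each word as a little-endian long, store it big-endian,
    // and return the whole array so the caller can write it in one call.
    static byte[] serializeWithWordSwap(byte[] littleEndianWords) {
        requireWholeWords(littleEndianWords);
        ByteBuffer src = ByteBuffer.wrap(littleEndianWords).order(ByteOrder.LITTLE_ENDIAN);
        ByteBuffer dst = ByteBuffer.allocate(littleEndianWords.length); // big-endian by default
        while (src.remaining() >= Long.BYTES) {
            dst.putLong(src.getLong());
        }
        return dst.array();
    }

    private static void requireWholeWords(byte[] data) {
        if (data.length % Long.BYTES != 0) {
            throw new IllegalArgumentException("length must be a multiple of 8 bytes");
        }
    }
}
```

Both methods produce the same bytes for the same input; the difference is only in how many individual write operations reach the output stream, which is where the description above reports the time being spent.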
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)