[
https://issues.apache.org/jira/browse/CASSANDRA-8747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Brandon Williams updated CASSANDRA-8747:
----------------------------------------
Reviewer: Marcus Eriksson
> Make SSTableWriter.openEarly behaviour more robust
> --------------------------------------------------
>
> Key: CASSANDRA-8747
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8747
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Reporter: Benedict
> Assignee: Benedict
> Fix For: 2.1.4
>
>
> Currently openEarly does some fairly ugly looping back over the summary data
> we've collected, looking for an entry that we think is fully covered by the
> Index and Data files and that has a safe boundary between it and the end of
> an IndexSummary entry, so that scanning across it cannot accidentally read an
> incomplete key. The approach is difficult to reason about and to be confident
> of, and I now realise it is also very subtly broken. Since we're cleaning up
> the behaviour around this code, it seemed worthwhile to improve its clarity
> and make it easier to reason about. The current behaviour can be
> characterised as:
> # Take the current Index file length
> # Find the IndexSummary boundary key (first key in an interval) that starts
> past this position
> # Take the IndexSummary boundary key (first key) for the preceding interval
> as our initial boundary
> # Construct a reader with this boundary
> # Look up our last key in the reader, and if its end position is past the end
> of the data file, take the prior summary boundary. Repeat until we find one
> starting before the end.
> The bug may well be very hard, or even impossible, to exhibit, but it is
> this: if we have a single very large partition followed by 127 very tiny
> partitions (or whatever the IndexSummary interval is configured as), our
> IndexSummary interval buffer may not guarantee that the record we have
> selected as our end is fully readable.
> The new approach is to track in the IndexSummary the safe and optimal
> boundary point (i.e. the last record in each summary interval) and its bounds
> in the index and data files. On flushing either file, we notify the summary
> builder of the new flush points, and it consults its map of these boundaries
> and selects the last one that can safely be read in both files. This is much
> easier to understand, and has no such subtle risk.
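
The new approach described above can be illustrated with a minimal Java sketch. This is not the code from the patch: the class `BoundaryTracker`, its nested `Boundary` type, and every method name are hypothetical, and a simple queue stands in for the map mentioned in the description. The sketch assumes records are appended in order, so the end positions of successive boundaries never decrease in either file.

```java
import java.util.ArrayDeque;

/** Hypothetical sketch of flush-point tracking; not Cassandra's actual API. */
final class BoundaryTracker {

    /** Last record of a summary interval and where that record ends in each file. */
    static final class Boundary {
        final String lastKey;
        final long indexEnd; // offset just past the record's entry in the index file
        final long dataEnd;  // offset just past the record's data in the data file

        Boundary(String lastKey, long indexEnd, long dataEnd) {
            this.lastKey = lastKey;
            this.indexEnd = indexEnd;
            this.dataEnd = dataEnd;
        }
    }

    // Boundaries not yet known to be flushed in both files, oldest first.
    private final ArrayDeque<Boundary> pending = new ArrayDeque<>();
    private long indexFlushedTo = 0;
    private long dataFlushedTo = 0;
    private Boundary lastReadable = null;

    /** Called by the writer each time a summary interval is completed. */
    void addBoundary(String lastKey, long indexEnd, long dataEnd) {
        pending.addLast(new Boundary(lastKey, indexEnd, dataEnd));
    }

    /** Called when the index file has been flushed up to the given position. */
    void markIndexFlushed(long position) {
        indexFlushedTo = Math.max(indexFlushedTo, position);
        advance();
    }

    /** Called when the data file has been flushed up to the given position. */
    void markDataFlushed(long position) {
        dataFlushedTo = Math.max(dataFlushedTo, position);
        advance();
    }

    /** Promote every pending boundary that now lies fully within both flushed regions. */
    private void advance() {
        Boundary next;
        while ((next = pending.peekFirst()) != null
                && next.indexEnd <= indexFlushedTo
                && next.dataEnd <= dataFlushedTo) {
            lastReadable = pending.pollFirst();
        }
    }

    /** The latest boundary that is safe to read in both files, or null if none is yet. */
    Boundary lastReadableBoundary() {
        return lastReadable;
    }
}
```

In this sketch a boundary becomes visible only once both flush points have passed its end positions, so an early-opened reader bounded by `lastReadableBoundary()` would never cover a partially written record, regardless of how partition sizes are distributed within an interval.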