[ https://issues.apache.org/jira/browse/CASSANDRA-8747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Benedict updated CASSANDRA-8747:
--------------------------------
Description:
Currently openEarly does some fairly ugly looping back over the summary data
we've collected, looking for an entry we think should be fully covered in the
Index and Data files, and that should have a safe boundary between it and the
end of an IndexSummary entry, so that when scanning across it we do not
accidentally read an incomplete key. The approach taken is a little difficult
to reason about and to be confident in, and I now realise it is also very
subtly broken. Since we're cleaning up the behaviour around this code, it
seemed worthwhile to improve its clarity and make its behaviour easier to
reason about. The current behaviour can be characterised as:
# Take the current Index file length
# Find the IndexSummary boundary key (first key in an interval) that starts past this position
# Take the IndexSummary boundary key (first key) for the preceding interval as our initial boundary
# Construct a reader with this boundary
# Look up our last key in the reader, and if its end position is past the end of the data file, take the prior summary boundary. Repeat until we find one whose end position is not past the end of the data file.
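The five steps above can be modelled roughly as follows. This is a minimal Java 16+ sketch under stated assumptions, not the actual SSTableWriter code: `SummaryEntry`, `indexStart`, `dataEnd` and `chooseBoundary` are hypothetical names, "preceding interval" is read as the interval immediately before the first one that starts past the index length, and step 4 (constructing a reader) is elided.

```java
import java.util.List;
import java.util.OptionalInt;

/**
 * Hypothetical model of the old openEarly boundary search.
 * SummaryEntry and its fields are illustrative stand-ins, not Cassandra classes.
 */
class OldOpenEarlySketch {
    /** First key of an IndexSummary interval, where its index entry starts, and where its partition ends in the data file. */
    record SummaryEntry(String firstKey, long indexStart, long dataEnd) {}

    /** Returns the position in {@code summary} of the chosen boundary key, or empty if none is safe. */
    static OptionalInt chooseBoundary(List<SummaryEntry> summary, long indexLength, long dataLength) {
        // Steps 1-2: locate the first interval whose first key starts past the index file length.
        // Assumption: if no interval starts past it, treat the end of the list as "past".
        int past = summary.size();
        for (int i = 0; i < summary.size(); i++) {
            if (summary.get(i).indexStart() > indexLength) {
                past = i;
                break;
            }
        }
        // Step 3: the first key of the preceding interval is the initial boundary.
        int candidate = past - 1;
        // Step 5 (step 4, building the reader, is skipped): while the candidate's partition
        // ends past the end of the data file, step back to the prior summary boundary.
        while (candidate >= 0 && summary.get(candidate).dataEnd() > dataLength) {
            candidate--;
        }
        return candidate >= 0 ? OptionalInt.of(candidate) : OptionalInt.empty();
    }

    public static void main(String[] args) {
        List<SummaryEntry> summary = List.of(
                new SummaryEntry("k0", 0, 50_000),
                new SummaryEntry("k128", 4_000, 90_000),
                new SummaryEntry("k256", 8_000, 95_000));
        // Index flushed to 6_000: k256 is the first boundary past it, so k128 is the initial pick.
        // Data flushed to 80_000 does not cover k128's partition, so the search falls back to k0.
        int chosen = chooseBoundary(summary, 6_000, 80_000).getAsInt();
        System.out.println(summary.get(chosen).firstKey());
    }
}
```

Note that the walk-back only ever inspects the first key of each interval against the data file; nothing in this loop reasons about the records that follow it within the interval, which is where the description locates the subtle risk.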
The bug may well be very hard, or even impossible, to exhibit, but it is this:
if we have a single very large partition followed by 127 very tiny partitions
(or whatever the IndexSummary interval is configured as), our IndexSummary
interval buffer may not guarantee that the record we have selected as our end
is fully readable.
The new approach is to track in the IndexSummary the safe and optimal boundary
point (i.e. the last record in each summary interval) and its bounds in the
index and data files. On flushing either file, we notify the summary builder of
the new flush points, and it consults its map of these and selects the last
such boundary that can safely be read in both files. This is much easier to
understand, and carries no such subtle risk.
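The new approach can be sketched as below. This is a minimal Java 16+ illustration under stated assumptions, not the committed implementation: `SafeBoundaryTracker`, `Boundary`, `addBoundary`, `markIndexFlushed`, `markDataFlushed` and `lastReadableBoundary` are hypothetical names, and it assumes both end offsets grow strictly with key order, so each `TreeMap` can be keyed by one of them.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.Optional;
import java.util.TreeMap;

/**
 * Hypothetical sketch of the new approach: for each summary interval, remember the LAST
 * record together with the offsets at which its index entry and its partition end.
 */
class SafeBoundaryTracker {
    /** Last record of a summary interval and where it ends in each file (exclusive offsets). */
    record Boundary(String lastKey, long indexEnd, long dataEnd) {}

    // Assumption: offsets increase strictly with key order, so each map is ordered like the keys.
    private final NavigableMap<Long, Boundary> byIndexEnd = new TreeMap<>();
    private final NavigableMap<Long, Boundary> byDataEnd = new TreeMap<>();
    private long indexFlushed = 0;
    private long dataFlushed = 0;

    /** Called when the writer appends the last record of a summary interval. */
    void addBoundary(String lastKey, long indexEnd, long dataEnd) {
        Boundary b = new Boundary(lastKey, indexEnd, dataEnd);
        byIndexEnd.put(indexEnd, b);
        byDataEnd.put(dataEnd, b);
    }

    /** Notification that the index file is durable up to {@code position}. */
    void markIndexFlushed(long position) { indexFlushed = position; }

    /** Notification that the data file is durable up to {@code position}. */
    void markDataFlushed(long position) { dataFlushed = position; }

    /** The last boundary that is fully readable in BOTH files, if any. */
    Optional<Boundary> lastReadableBoundary() {
        Map.Entry<Long, Boundary> inIndex = byIndexEnd.floorEntry(indexFlushed);
        Map.Entry<Long, Boundary> inData = byDataEnd.floorEntry(dataFlushed);
        if (inIndex == null || inData == null) {
            return Optional.empty();
        }
        // Each entry is the last boundary covered by one file; the earlier of the two
        // is covered by both, and any later boundary is missing from at least one file.
        Boundary a = inIndex.getValue();
        Boundary b = inData.getValue();
        return Optional.of(a.indexEnd() <= b.indexEnd() ? a : b);
    }

    public static void main(String[] args) {
        SafeBoundaryTracker t = new SafeBoundaryTracker();
        t.addBoundary("k127", 4_000, 90_000);
        t.addBoundary("k255", 8_000, 95_000);
        t.markIndexFlushed(8_192);  // index covers both boundaries
        t.markDataFlushed(92_000);  // data covers only the first
        System.out.println(t.lastReadableBoundary().map(Boundary::lastKey).orElse("none"));
    }
}
```

Because each flush notification only moves a pointer and the choice is two `floorEntry` lookups, there is no walk-back loop left to reason about: a boundary is either fully covered by both flush points or it is not selected.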
was:
Currently openEarly does some fairly ugly looping back over the summary data
we've collected looking for one we think should be fully covered in the Index
and Data files, and that should have a safe boundary between it and the end of
an IndexSummary entry so that when scanning across it we should not
accidentally read an incomplete key. The approach taken is a little difficult
to reason about, and to be confident in. Since we're cleaning up the
behaviour around this code, it seemed worthwhile to improve its clarity and
make its behaviour easier to reason about. The current behaviour can be
characterised as:
Find the first summary record
Priority: Major (was: Minor)
Issue Type: Bug (was: Improvement)
Summary: Make SSTableWriter.openEarly behaviour more robust (was: Make
SSTableWriter.openEarly behaviour more obvious)
> Make SSTableWriter.openEarly behaviour more robust
> --------------------------------------------------
>
> Key: CASSANDRA-8747
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8747
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Reporter: Benedict
> Assignee: Benedict
> Fix For: 2.1.4
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)