[ https://issues.apache.org/jira/browse/CASSANDRA-8747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14318178#comment-14318178 ]
Marcus Eriksson commented on CASSANDRA-8747: -------------------------------------------- +1 > Make SSTableWriter.openEarly behaviour more robust > -------------------------------------------------- > > Key: CASSANDRA-8747 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8747 > Project: Cassandra > Issue Type: Bug > Components: Core > Reporter: Benedict > Assignee: Benedict > Fix For: 2.1.4 > > > Currently openEarly does some fairly ugly looping back over the summary data > we've collected looking for one we think should be fully covered in the Index > and Data files, and that should have a safe boundary between it and the end > of an IndexSummary entry so that when scanning across it we should not > accidentally read an incomplete key. The approach taken is a little difficult > to reason about though, and be confident of, and I now realise is also very > subtly broken. Since we're cleaning up the behaviour around this code, it > seemed worthwhile to improve its clarity and make its behaviour easier to > reason about. The current behaviour can be characterised as: > # Take the current Index file length > # Find the IndexSummary boundary key (first key in an interval) that starts > past this position > # Take the IndexSummary boundary key (first key) for the preceding interval > as our initial boundary > # Construct a reader with this boundary > # Lookup our last key in the reader, and if its end position is past the end > of the data file, take the prior summary boundary. Repeat until we find one > starting before the end. > The bug may well be very hard to exhibit, or even impossible, but is that if > we have a single very large partition followed by 127 very tiny partitions > (or whatever the IndexSummary interval is configured as), our IndexSummary > interval buffer may not guarantee the record we have selected as our end is > fully readable. > The new approach is to track in the IndexSummary the safe and optimal > boundary point (i.e. the last record in each summary interval) and its bounds > in the index and data files. On flushing either file, we notify the summary > builder to the new flush points, and it consults its map of these and selects > the last such boundary that can safely be read in both. This is much easier > to understand, and has no such subtle risk. -- This message was sent by Atlassian JIRA (v6.3.4#6332)