[ 
https://issues.apache.org/jira/browse/CASSANDRA-8747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14318178#comment-14318178
 ] 

Marcus Eriksson commented on CASSANDRA-8747:
--------------------------------------------

+1

> Make SSTableWriter.openEarly behaviour more robust
> --------------------------------------------------
>
>                 Key: CASSANDRA-8747
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8747
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Benedict
>            Assignee: Benedict
>             Fix For: 2.1.4
>
>
> Currently openEarly does some fairly ugly looping back over the summary data 
> we've collected, looking for an entry we think should be fully covered in the 
> Index and Data files, and that should have a safe boundary between it and the 
> end of an IndexSummary entry, so that when scanning across it we should not 
> accidentally read an incomplete key. The approach taken is a little difficult 
> to reason about and to be confident in, though, and I now realise it is also 
> very subtly broken. Since we're cleaning up the behaviour around this code, it 
> seemed worthwhile to improve its clarity and make its behaviour easier to 
> reason about. The current behaviour can be characterised as:
> # Take the current Index file length
> # Find the IndexSummary boundary key (first key in an interval) that starts 
> past this position
> # Take the IndexSummary boundary key (first key) for the preceding interval 
> as our initial boundary
> # Construct a reader with this boundary
> # Look up our last key in the reader, and if its end position is past the end 
> of the data file, take the prior summary boundary. Repeat until we find one 
> starting before the end.
> The bug may well be very hard to exhibit, or even impossible, but it is this: 
> if we have a single very large partition followed by 127 very tiny partitions 
> (or whatever the IndexSummary interval is configured as), our IndexSummary 
> interval buffer may not guarantee that the record we have selected as our end 
> is fully readable.
> The new approach is to track in the IndexSummary the safe and optimal 
> boundary point (i.e. the last record in each summary interval) and its bounds 
> in the index and data files. On flushing either file, we notify the summary 
> builder of the new flush points, and it consults its map of these and selects 
> the last such boundary that can safely be read in both. This is much easier 
> to understand, and has no such subtle risk.
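
For readers following along, the new approach described above might be sketched roughly as below. This is only an illustration under stated assumptions, not the code in the patch: the class ReadableBoundaryTracker, its Boundary record and the method names are all hypothetical, keys are plain strings, and boundaries are assumed to be added in increasing file-offset order.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch: names are illustrative and are NOT Cassandra's actual classes.
final class ReadableBoundaryTracker {
    /** Last record of one summary interval, with the offsets at which it ends in each file. */
    static final class Boundary {
        final String lastKey;
        final long indexEnd;
        final long dataEnd;

        Boundary(String lastKey, long indexEnd, long dataEnd) {
            this.lastKey = lastKey;
            this.indexEnd = indexEnd;
            this.dataEnd = dataEnd;
        }
    }

    // Boundaries not yet covered by both flush points, oldest first.
    private final Deque<Boundary> pending = new ArrayDeque<>();
    private Boundary lastReadable; // null until some boundary is fully flushed
    private long indexSynced;
    private long dataSynced;

    /** Record the last entry of a summary interval (assumed to arrive in offset order). */
    void addBoundary(String lastKey, long indexEnd, long dataEnd) {
        pending.addLast(new Boundary(lastKey, indexEnd, dataEnd));
        advance();
    }

    /** Notify the tracker that the index file is flushed up to the given offset. */
    void markIndexSynced(long upTo) {
        indexSynced = Math.max(indexSynced, upTo);
        advance();
    }

    /** Notify the tracker that the data file is flushed up to the given offset. */
    void markDataSynced(long upTo) {
        dataSynced = Math.max(dataSynced, upTo);
        advance();
    }

    /** Last boundary that ends within the flushed region of both files, or null. */
    Boundary lastReadableBoundary() {
        return lastReadable;
    }

    // Promote every pending boundary that both flush points now cover.
    private void advance() {
        while (!pending.isEmpty()
               && pending.peekFirst().indexEnd <= indexSynced
               && pending.peekFirst().dataEnd <= dataSynced) {
            lastReadable = pending.pollFirst();
        }
    }
}
```

Because a boundary is only promoted once both files have been flushed past its end, a very large partition followed by many tiny ones cannot cause an unreadable record to be selected in this sketch; the selection simply lags until the data file catches up.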



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
