[ https://issues.apache.org/jira/browse/CASSANDRA-8458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14246786#comment-14246786 ]
Benedict commented on CASSANDRA-8458: ------------------------------------- There would be a bug if ranges could wrap with the first "continue" potentially filtering files we actually need to visit. However Range.normalize() _claims_ to remove all wrapped ranges, in which case we can actually simplify the second check to remove !isWrapRound(), making it a little clearer. I'm pretty sure this should throw an AssertionError given our new checks on first/last, and again it simplifies the code a little. Also, technically not related to this ticket, but doesn't it strike you as a bug to use Operation.GT, instead of Operation.GE? If so, I'm not sure how we don't have tests to catch it. {code} RowIndexEntry idxLeft = getPosition(leftBound, Operator.GT); long left = idxLeft == null ? -1 : idxLeft.position; if (left == -1) // left is past the end of the file continue; {code} Couldn't we also simplify correcting "right" to a similar approach as you've taken for left, and use the fact that we know the positions we're looking for exist in both cases? I think the following snippet for the guts of the method should work but is (perhaps?) a little clearer: {code} // range is never wrap around, unless right is the minimum token assert !range.isWrapAround() || range.right.isMinimum(); // truncate the range so it at most covers the sstable AbstractBounds<RowPosition> bounds = range.toRowBounds(); RowPosition leftBound = bounds.left.compareTo(first) > 0 ? bounds.left : first.getToken().minKeyBound(); RowPosition rightBound = bounds.right.isMinimum() ? last.getToken().maxKeyBound() : bounds.right; // skip non-overlapping ranges if (leftBound.compareTo(last) > 0 || rightBound.compareTo(first) < 0) continue; // transform into positions known to exist in the file long left = getPosition(leftBound, Operator.GE).position; long right = (rightBound.compareTo(last) >= 0) ? (openReason == OpenReason.EARLY // if opened early, we overlap with the old sstables by one key, so we know that the last // (and further) key(s) will be streamed from these if necessary ? getPosition(last.getToken().maxKeyBound(), Operator.GE).position : uncompressedLength()) : getPosition(rightBound, Operator.GT).position; {code} > Don't give out positions in an sstable beyond its first/last tokens > ------------------------------------------------------------------- > > Key: CASSANDRA-8458 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8458 > Project: Cassandra > Issue Type: Bug > Reporter: Marcus Eriksson > Assignee: Marcus Eriksson > Fix For: 2.1.3 > > Attachments: > 0001-Make-sure-we-don-t-give-out-positions-from-an-sstabl.patch > > > Looks like we include tmplink sstables in streams in 2.1+, and when we do, > sometimes we get this error message on the receiving side: > {{java.io.IOException: Corrupt input data, block did not start with 2 byte > signature ('ZV') followed by type byte, 2-byte length)}}. I've only seen this > happen when a tmplink sstable is included in the stream. > We can not just exclude the tmplink files when starting the stream - we need > to include the original file, which we might miss since we check if the > requested stream range intersects the sstable range. -- This message was sent by Atlassian JIRA (v6.3.4#6332)