[ https://issues.apache.org/jira/browse/CASSANDRA-1042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jonathan Ellis updated CASSANDRA-1042:
--------------------------------------
Attachment: 1042-v2.txt
v2 attached:
- removes wrapped-range handling from CFS.getRangeSlice, since StorageProxy
  always unwraps first
- adds an (initially failing) system test exercising the wrapped-range path
- adds sorting of unwrapped, restricted ranges relative to the original query
  range [this is the bug fix]
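
To illustrate the last item, here is a minimal sketch of ordering unwrapped, restricted ranges relative to the start of the original query range. It is not the code from 1042-v2.txt. The toy ring bounds (RING_MIN, RING_MAX), the helper names (unwrap, restrict, sort_relative_to_query), and the node ranges are all assumptions made for illustration; ranges are (left, right] pairs of integer tokens.

```python
# Hypothetical toy ring: tokens lie in (RING_MIN, RING_MAX].
RING_MIN, RING_MAX = 0, 100


def unwrap(left, right):
    """Split a (left, right] range that wraps around the ring into
    non-wrapping pieces. left == right is treated as the full ring."""
    if left < right:
        return [(left, right)]
    pieces = [(left, RING_MAX), (RING_MIN, right)]
    return [(lo, hi) for lo, hi in pieces if lo < hi]


def restrict(query_pieces, node_ranges):
    """Intersect each query piece with each node-owned range."""
    out = []
    for qlo, qhi in query_pieces:
        for nlo, nhi in node_ranges:
            lo, hi = max(qlo, nlo), min(qhi, nhi)
            if lo < hi:
                out.append((lo, hi))
    return out


def sort_relative_to_query(ranges, query_left):
    """Order ranges as they are met walking the ring from query_left:
    ranges at or after the query start first, wrapped-around ones last."""
    return sorted(ranges, key=lambda r: (r[0] < query_left, r[0]))


# Assumed node ownership on the toy ring.
node_ranges = [(0, 30), (30, 60), (60, 100)]

# A query whose start and end tokens are both 50 covers the whole ring.
pieces = restrict(unwrap(50, 50), node_ranges)

naive = sorted(pieces)                         # starts at the lowest token
ordered = sort_relative_to_query(pieces, 50)   # starts just after token 50
print(naive)
print(ordered)
```

With plain token order the first rows come from (0, 30], which lies before the query start on the ring. A caller that pages by restarting from the last token it saw then walks the low end of the ring a second time.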
> ColumnFamilyRecordReader returns duplicate rows
> -----------------------------------------------
>
> Key: CASSANDRA-1042
> URL: https://issues.apache.org/jira/browse/CASSANDRA-1042
> Project: Cassandra
> Issue Type: Bug
> Components: Hadoop
> Affects Versions: 0.6
> Reporter: Joost Ouwerkerk
> Assignee: Jeremy Hanna
> Fix For: 0.6.4
>
> Attachments: 1042-0_6.txt, 1042-test.txt, 1042-v2.txt,
> Cassandra-1042-0_6-branch.patch.txt, CASSANDRA-1042-trunk.patch.txt,
> cassandra.tar.gz, duplicate_keys.rtf
>
>
> There's a bug in ColumnFamilyRecordReader that appears when processing a
> single split (which happens in most tests that have a small number of rows),
> and potentially in other cases. When the start and end tokens of the split
> are equal, duplicate rows can be returned.
> Example with 5 rows:
> token (start and end) = 53193025635115934196771903670925341736
> Tokens returned by first get_range_slices iteration (all 5 rows):
> 16955237001963240173058271559858726497
> 40670782773005619916245995581909898190
> 99079589977253916124855502156832923443
> 144992942750327304334463589818972416113
> 166860289390734216023086131251507064403
> Tokens returned by next iteration (first token is last token from
> previous, end token is unchanged):
> 16955237001963240173058271559858726497
> 40670782773005619916245995581909898190
> Tokens returned by final iteration (first token is last token from
> previous, end token is unchanged):
> [] (empty)
> In this example, the mapper has processed 7 rows in total, 2 of which
> were duplicates.
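
The iteration described above can be reproduced with a small simulation. This is a sketch under stated assumptions, not the actual ColumnFamilyRecordReader or get_range_slices code: toy_range_slice and read_split are hypothetical helpers, ranges are treated as (start, end] with start == end meaning the full ring, and the batch size of 4096 is arbitrary. The five tokens and the split token are the ones from the example.

```python
# Tokens of the five rows from the example, in ascending order.
TOKENS = sorted([
    16955237001963240173058271559858726497,
    40670782773005619916245995581909898190,
    99079589977253916124855502156832923443,
    144992942750327304334463589818972416113,
    166860289390734216023086131251507064403,
])
SPLIT_TOKEN = 53193025635115934196771903670925341736


def in_range(token, start, end):
    """Membership in (start, end]; start >= end wraps around the ring."""
    if start < end:
        return start < token <= end
    return token > start or token <= end


def toy_range_slice(start, end, count, ring_order):
    """Hypothetical stand-in for one range-slice call.
    ring_order=False returns rows in plain token order;
    ring_order=True returns them in ring order starting after `start`."""
    rows = [t for t in TOKENS if in_range(t, start, end)]
    if ring_order:
        rows.sort(key=lambda t: (t <= start, t))
    else:
        rows.sort()
    return rows[:count]


def read_split(start, end, batch_size, ring_order):
    """Page through a split, restarting each call at the last token seen."""
    seen = []
    while True:
        rows = toy_range_slice(start, end, batch_size, ring_order)
        if not rows:
            break
        seen.extend(rows)
        start = rows[-1]
        if start == end:
            # Assumed guard: a start equal to the end token would
            # otherwise be read as the full ring again.
            break
    return seen


plain = read_split(SPLIT_TOKEN, SPLIT_TOKEN, 4096, ring_order=False)
ring = read_split(SPLIT_TOKEN, SPLIT_TOKEN, 4096, ring_order=True)
print(len(plain), len(set(plain)))  # 7 rows read, 5 distinct
print(len(ring), len(set(ring)))    # 5 rows read, 5 distinct
```

With plain token order the first batch ends at the highest token, so the next call wraps around and returns the two lowest tokens again. With ring order the first batch ends at the last token before the split token, and the next call returns nothing.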