[ https://issues.apache.org/jira/browse/CASSANDRA-1042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12882767#action_12882767 ]
Jeremy Hanna commented on CASSANDRA-1042:
-----------------------------------------
The patch should apply cleanly to 0.7/trunk as well.
> ColumnFamilyRecordReader returns duplicate rows
> -----------------------------------------------
>
> Key: CASSANDRA-1042
> URL: https://issues.apache.org/jira/browse/CASSANDRA-1042
> Project: Cassandra
> Issue Type: Bug
> Components: Hadoop
> Affects Versions: 0.6
> Reporter: Joost Ouwerkerk
> Assignee: Jeremy Hanna
> Fix For: 0.6.4
>
> Attachments: 1042-0_6.txt, Cassandra-1042-0_6-branch.patch.txt,
> CASSANDRA-1042-trunk.patch.txt, cassandra.tar.gz
>
>
> There's a bug in ColumnFamilyRecordReader that appears when processing a
> single split (which happens in most tests that have a small number of rows),
> and potentially in other cases. When the start and end tokens of the split
> are equal, duplicate rows can be returned.
> Example with 5 rows:
> token (start and end) = 53193025635115934196771903670925341736
> Tokens returned by first get_range_slices iteration (all 5 rows):
> 16955237001963240173058271559858726497
> 40670782773005619916245995581909898190
> 99079589977253916124855502156832923443
> 144992942750327304334463589818972416113
> 166860289390734216023086131251507064403
> Tokens returned by next iteration (start token is the last token from the
> previous iteration, end token is unchanged):
> 16955237001963240173058271559858726497
> 40670782773005619916245995581909898190
> Tokens returned by final iteration (start token is the last token from the
> previous iteration, end token is unchanged):
> [] (empty)
> In this example, the mapper has processed 7 rows in total, 2 of which
> were duplicates.
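
The iteration described above can be reproduced with a small toy model. This is only a sketch under stated assumptions, not Cassandra's actual code: `range_slice` and `read_split` are hypothetical names, the range is assumed to be (start, end] with start >= end treated as wrapping around the ring, and results are assumed to come back sorted by token, as in the listing above.

```python
def range_slice(ring, start, end, count):
    """Toy stand-in for one get_range_slices call over (start, end]."""
    if start < end:
        hits = [t for t in ring if start < t <= end]
    else:
        # Assumption: start >= end is treated as a range that wraps the ring,
        # so start == end covers every token.
        hits = [t for t in ring if t > start or t <= end]
    # Assumption: rows come back sorted by token, as in the issue's listing.
    return sorted(hits)[:count]


def read_split(ring, start, end, count=100, max_pages=10):
    """Page through one split, restarting from the last token returned."""
    seen = []
    for _ in range(max_pages):  # bound the loop; this is only a toy
        batch = range_slice(ring, start, end, count)
        if not batch:
            break
        seen.extend(batch)
        start = batch[-1]  # end token is left unchanged
    return seen


RING = [
    16955237001963240173058271559858726497,
    40670782773005619916245995581909898190,
    99079589977253916124855502156832923443,
    144992942750327304334463589818972416113,
    166860289390734216023086131251507064403,
]
SPLIT_TOKEN = 53193025635115934196771903670925341736

rows = read_split(RING, SPLIT_TOKEN, SPLIT_TOKEN)
duplicates = len(rows) - len(set(rows))
print(len(rows), duplicates)  # prints "7 2"
```

Under these assumptions, the second page starts at the largest token while the end token is still the split token, so the range wraps again and picks up the two rows below the split token a second time.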