[ 
https://issues.apache.org/jira/browse/BEAM-14558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17550154#comment-17550154
 ] 

Danny McCormick commented on BEAM-14558:
----------------------------------------

This issue has been migrated to https://github.com/apache/beam/issues/21715

> Data missing when using CassandraIO.Read
> ----------------------------------------
>
>                 Key: BEAM-14558
>                 URL: https://issues.apache.org/jira/browse/BEAM-14558
>             Project: Beam
>          Issue Type: Bug
>          Components: io-java-cassandra
>    Affects Versions: 2.34.0, 2.35.0, 2.36.0, 2.37.0, 2.38.0, 2.39.0
>            Reporter: Christophe ROQUETTE
>            Priority: P1
>
> h2. Bug
> Data at the beginning or end of the token ring is never retrieved, due to a 
> bad TokenRange request.
> This bug was introduced by BEAM-9008, in [this 
> commit|https://github.com/apache/beam/commit/e12fc33e55e23db9f2aee330039d16dace34f9aa]
> A basic reproduction case & workarounds are available here:
> [Github/beam-cassandraio-bug|https://github.com/KriKroff/beam-cassandraio-bug]
> h2. Description
> When using {{CassandraIO}}, a list of token ranges is requested from the C*
> nodes in order to create splits in those ranges.
> Each split is represented as a RingRange, resulting in a request to C* of
> the form
> {{TOKEN(partition_key) >= range_start AND TOKEN(partition_key) < range_end}}
> The token ring goes from Long.MIN_VALUE to Long.MAX_VALUE (so -2xxx to 2xxx).
> A range may contain the "join point" and then be represented by [2xxx, -2xxx].
> In this case (i.e. TokenRange isWrapping), the old implementation used to
> send 2 different requests:
>  * {{TOKEN(partition_key) >= range_start}} (to get results up to the end of
> the ring, i.e. Long.MAX_VALUE)
>  * {{TOKEN(partition_key) < range_end}} (to get results from the beginning
> of the ring, i.e. Long.MIN_VALUE)
> This behavior is no longer implemented, and all token ranges are queried
> the same way, even in the wrapping case.
> This results in a request like:
> {{TOKEN(partition_key) >= 2xxx AND TOKEN(partition_key) < -2xxx}}
> No token can satisfy both conditions, so this returns 0 results and some
> data is never retrieved.
>  
> h2. Workarounds
>  * Downgrade to 2.33.0
>  * Use custom TokenRanges & readAll implementation
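
For illustration only: a minimal Java sketch of the two-request handling that the quoted description attributes to the old implementation. The class and method names are hypothetical and are not taken from CassandraIO, and treating any range whose start is not below its end as the wrapping case is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Beam or CassandraIO.
final class WrappingRangeSketch {

    // Builds the WHERE clause(s) needed to cover the token range [start, end).
    // Assumption: a range with start >= end wraps around the "join point".
    static List<String> whereClauses(String partitionKey, long start, long end) {
        String token = "TOKEN(" + partitionKey + ")";
        List<String> clauses = new ArrayList<>();
        if (start < end) {
            // Regular range: a single bounded request is enough.
            clauses.add(token + " >= " + start + " AND " + token + " < " + end);
        } else {
            // Wrapping range: one request up to the end of the ring,
            // and one request from the beginning of the ring.
            clauses.add(token + " >= " + start);
            clauses.add(token + " < " + end);
        }
        return clauses;
    }
}
```

With this sketch, a wrapping range yields two clauses instead of a single conjunction that no token can match.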



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
