Christophe ROQUETTE created BEAM-14558:
------------------------------------------
Summary: Data missing when using CassandraIO.Read
Key: BEAM-14558
URL: https://issues.apache.org/jira/browse/BEAM-14558
Project: Beam
Issue Type: Bug
Components: io-java-cassandra
Affects Versions: 2.39.0, 2.38.0, 2.37.0, 2.36.0, 2.35.0, 2.34.0
Reporter: Christophe ROQUETTE
h2. Bug
Data at the beginning or end of the token ring is never retrieved, because the
TokenRange that wraps around the ring is turned into an invalid request.
This bug was introduced by BEAM-9008, in [this
commit|https://github.com/apache/beam/commit/e12fc33e55e23db9f2aee330039d16dace34f9aa].
A basic reproduction case & workarounds are available here:
[Github/beam-cassandraio-bug|https://github.com/KriKroff/beam-cassandraio-bug]
h2. Description
When using {{CassandraIO}}, a list of token ranges is requested from the C* nodes
in order to create splits within those ranges.
A split is represented as a RingRange, which results in a request to C* of the
form
{{TOKEN(partition_key) >= range_start AND TOKEN(partition_key) < range_end}}
The token ring goes from Long.MIN_VALUE to Long.MAX_VALUE (so -2xxx to 2xxx). A
range may contain the "join point" and be represented by [2xxx, -2xxx].
In this case (i.e. when TokenRange isWrapping), the old implementation used to
send 2 different requests:
* {{TOKEN(partition_key) >= range_start}} (to get results up to the end of the
ring, i.e. Long.MAX_VALUE)
* {{TOKEN(partition_key) < range_end}} (to get results from the beginning
of the ring, i.e. Long.MIN_VALUE)
Now, this behavior is no longer implemented and all token ranges are queried
the same way, even in the wrapping case.
This results in a request like:
{{TOKEN(partition_key) >= 2xxx AND TOKEN(partition_key) < -2xxx}}
No token can satisfy both conditions at once, so this request gives 0 results
and the data in the wrapping range is never retrieved.
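The difference between the two behaviors can be sketched in plain Java. This is only an illustration under simplifying assumptions, not the actual CassandraIO code: the class {{TokenRangeQueries}} and its methods are hypothetical names, tokens are modeled as {{long}} values, and a range is treated as wrapping whenever its start is greater than its end.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Beam or the Cassandra driver.
public class TokenRangeQueries {

    /** Simplified assumption: a range wraps when its start lies after its end. */
    public static boolean isWrapping(long start, long end) {
        return start > end;
    }

    /**
     * Builds the WHERE clauses for one range. A wrapping range is split into
     * two clauses (old behavior described above); any other range yields one.
     */
    public static List<String> whereClauses(String partitionKey, long start, long end) {
        String token = "TOKEN(" + partitionKey + ")";
        List<String> clauses = new ArrayList<>();
        if (isWrapping(start, end)) {
            clauses.add(token + " >= " + start); // up to the end of the ring
            clauses.add(token + " < " + end);    // from the beginning of the ring
        } else {
            clauses.add(token + " >= " + start + " AND " + token + " < " + end);
        }
        return clauses;
    }

    /** Evaluates the single combined predicate that is sent in the buggy case. */
    public static boolean matchesCombined(long token, long start, long end) {
        return token >= start && token < end;
    }

    public static void main(String[] args) {
        long start = 9_000_000_000_000_000_000L;  // close to Long.MAX_VALUE
        long end = -9_000_000_000_000_000_000L;   // close to Long.MIN_VALUE

        // Two separate requests cover both ends of the ring.
        for (String clause : whereClauses("partition_key", start, end)) {
            System.out.println(clause);
        }

        // The combined predicate rejects tokens at either end of the ring.
        System.out.println(matchesCombined(Long.MAX_VALUE, start, end));
        System.out.println(matchesCombined(Long.MIN_VALUE, start, end));
    }
}
```

With a wrapping range, {{whereClauses}} returns two clauses, while {{matchesCombined}} is false for every token, which mirrors the missing data described above.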
h2. Workarounds
* Downgrade to 2.33.0
* Use custom TokenRanges & readAll implementation