[https://issues.apache.org/jira/browse/CASSANDRA-20796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]
denish updated CASSANDRA-20796:
-------------------------------
Description:
*Environment*
* Cassandra version: *4.1.8* (Instaclustr managed service build)
* Replacement method: {{delete-data}} +
{{-Dcassandra.replace_address_first_boot=<ip>}}
* 64 vnodes per node
* Cluster size: 24 nodes, 4 DCs (AWS VPC)
* Materialised views enabled
*Summary*
We observed a bootstrap failure on a replacement node caused by premature
streaming of a materialised view (MV) SSTable before the node had received or
published any token ownership. This triggered an immediate
{{StreamReceivedOutOfTokenRangeException}} and aborted the bootstrap.
----
*Observed behavior*
As soon as the replacement node started with {{replace_address_first_boot}},
peers contacted it and began streaming MV SSTables. However, the node had not
yet reached gossip convergence or entered the JOINING/NORMAL state, so it
owned zero ranges. Cassandra threw:
{noformat}
StreamReceivedOutOfTokenRangeException:
Received stream for sstable <ks>/<mv> containing key DecoratedKey(...) outside
of owned ranges [(-9223372036854775808 ... 9223372036854775807)]
{noformat}
The streaming session was aborted and the bootstrap was rolled back.
The range list used during validation was {{[(−2⁶³ ... 2⁶³−1)]}}, indicating an
*empty* range set on the receiver.
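To make the failure mode concrete, here is a minimal Java sketch of a receiver-side ownership check, assuming Cassandra's half-open (left, right] range semantics on Murmur3-style long tokens. {{OwnedRangeCheck}}, {{TokenRange}}, {{isOwned}} and {{validate}} are hypothetical names for illustration only, not Cassandra APIs. With an empty owned-range list every token is rejected, which is the zero-ownership state described above.

```java
import java.util.List;

/**
 * Hypothetical sketch (not Cassandra's actual streaming code): a receiver accepts
 * a streamed partition only if its token lies in a range it believes it owns.
 */
final class OwnedRangeCheck {

    /** Half-open token range (left, right]; left >= right wraps around the ring. */
    static final class TokenRange {
        final long left;
        final long right;

        TokenRange(long left, long right) {
            this.left = left;
            this.right = right;
        }

        boolean contains(long token) {
            if (left < right) {
                return token > left && token <= right; // ordinary range
            }
            return token > left || token <= right;     // wrapping range; left == right covers the ring
        }

        @Override
        public String toString() {
            return "(" + left + "," + right + "]";
        }
    }

    private OwnedRangeCheck() {}

    /** True if any owned range covers the token; always false when nothing is owned. */
    static boolean isOwned(List<TokenRange> owned, long token) {
        for (TokenRange r : owned) {
            if (r.contains(token)) {
                return true;
            }
        }
        return false;
    }

    /** Mirrors the reported behaviour: reject (throw) rather than defer. */
    static void validate(List<TokenRange> owned, long token) {
        if (!isOwned(owned, token)) {
            throw new IllegalStateException(
                "key with token " + token + " outside of owned ranges " + owned);
        }
    }
}
```

Calling {{OwnedRangeCheck.validate(List.of(), token)}} throws for any token, so a node that has not yet learned its ranges rejects every streamed key.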
*Expected behavior*
MV SSTables should not be streamed until the joining node has received its
{{TokenMetadata}} and published range ownership. At the very least, MV
validation should be deferred or retried rather than aborting the entire
bootstrap.
*Recommendation / Follow-up*
This behaviour still appears to affect users on 4.1.8. While CASSANDRA-13704
and sub-tasks such as CASSANDRA-13708 address general out-of-range handling,
this specific edge case (MV + pre-ownership streaming) still causes hard
bootstrap failures unless manually mitigated.
It would be good to confirm whether MV SSTables are now covered by those
patches, or whether a future improvement could defer MV validation until
{{TokenMetadata.getPendingRanges}} returns a non-empty result or {{isMember()}}
returns true.
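One possible shape for the proposed deferral, as a hypothetical Java sketch: MV SSTables that arrive before the node has pending ranges or ring membership are queued instead of raising an exception, then validated once ownership is known. {{DeferredMvValidation}}, {{Outcome}}, {{offer}} and {{drain}} are illustrative names, not Cassandra code; the two {{BooleanSupplier}}s are assumed stand-ins for a non-empty {{TokenMetadata.getPendingRanges}} result and {{isMember()}}, and the {{Predicate}} stands in for the existing per-key range check.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.BooleanSupplier;
import java.util.function.Predicate;

/**
 * Hypothetical sketch: hold MV SSTables received before ownership is known and
 * validate them only once the node has pending ranges or is a ring member.
 */
final class DeferredMvValidation {

    enum Outcome { ACCEPTED, REJECTED, DEFERRED }

    private final BooleanSupplier hasPendingRanges; // assumed stand-in for non-empty getPendingRanges
    private final BooleanSupplier isMember;         // assumed stand-in for isMember()
    private final Predicate<String> rangeCheck;     // assumed stand-in for the out-of-range check
    private final Deque<String> deferred = new ArrayDeque<>();

    DeferredMvValidation(BooleanSupplier hasPendingRanges,
                         BooleanSupplier isMember,
                         Predicate<String> rangeCheck) {
        this.hasPendingRanges = hasPendingRanges;
        this.isMember = isMember;
        this.rangeCheck = rangeCheck;
    }

    private boolean ownershipKnown() {
        return hasPendingRanges.getAsBoolean() || isMember.getAsBoolean();
    }

    /** MV SSTables are queued while ownership is unknown; everything else is checked now. */
    Outcome offer(String sstable, boolean isView) {
        if (isView && !ownershipKnown()) {
            deferred.addLast(sstable);
            return Outcome.DEFERRED;
        }
        return rangeCheck.test(sstable) ? Outcome.ACCEPTED : Outcome.REJECTED;
    }

    /** Once ownership is published, re-validates queued SSTables and returns the rejected ones. */
    List<String> drain() {
        List<String> rejected = new ArrayList<>();
        if (!ownershipKnown()) {
            return rejected; // still not ready: keep everything queued
        }
        while (!deferred.isEmpty()) {
            String sstable = deferred.pollFirst();
            if (!rangeCheck.test(sstable)) {
                rejected.add(sstable);
            }
        }
        return rejected;
    }

    int deferredCount() {
        return deferred.size();
    }
}
```

The design choice here is that only genuinely out-of-range data is rejected after ownership is known, so a timing gap during replacement no longer aborts the whole bootstrap; a real fix would also need a bound on how long SSTables may stay queued.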
> Bootstrap fails with StreamReceivedOutOfTokenRangeException when MV streamed
> too early
> --------------------------------------------------------------------------------------
>
> Key: CASSANDRA-20796
> URL: https://issues.apache.org/jira/browse/CASSANDRA-20796
> Project: Apache Cassandra
> Issue Type: Bug
> Components: Cluster/Gossip
> Reporter: denish
> Priority: Normal
--
This message was sent by Atlassian Jira
(v8.20.10#820010)