[ 
https://issues.apache.org/jira/browse/CASSANDRA-20796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

denish updated CASSANDRA-20796:
-------------------------------
    Description: 
*Environment*
 * Cassandra version: *4.1.8* (Instaclustr managed service build)

 * Replacement method: {{delete-data}} + 
{{-Dcassandra.replace_address_first_boot=<ip>}}

 * 64 vnodes per node
 * Cluster size: 24 nodes, 4 DCs (AWS VPC)
 * Materialised views enabled

 

*Summary*
We observed a bootstrap failure on a replacement node, caused by premature 
streaming of a materialised view (MV) SSTable before the node had received or 
published any token ownership. This led to an immediate 
{{StreamReceivedOutOfTokenRangeException}}, which aborted the bootstrap.

 
----
*Observed behavior*
As soon as the replacement node joined with {{replace_address_first_boot}}, 
peers contacted it and streamed MV SSTables to it. However, the node had not 
yet converged gossip or entered the JOINING/NORMAL state, so it had zero owned 
ranges. Cassandra threw:

StreamReceivedOutOfTokenRangeException:
Received stream for sstable <ks>/<mv> containing key DecoratedKey(...) outside 
of owned ranges [(−9223372036854775808 ... 9223372036854775807)]

The streaming session was aborted and the bootstrap was rolled back.

The range list used during validation was {{[(−2⁶³ ... 2⁶³−1)]}}, indicating an 
*empty* range set on the receiver.
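
To illustrate why an empty owned-range set is fatal here, below is a minimal 
sketch of an ownership check. It is not Cassandra's actual validation code: 
{{TokenRange}} and {{OwnedRangeCheck}} are hypothetical names, tokens are 
modelled as plain longs, and ranges are assumed to be (left, right] with 
wrap-around. Under those assumptions, a receiver with no owned ranges rejects 
every incoming key.

```java
import java.util.List;

/** Hypothetical model of a (left, right] token range; not Cassandra's Range class. */
final class TokenRange {
    final long left;
    final long right;

    TokenRange(long left, long right) {
        this.left = left;
        this.right = right;
    }

    /** True if token lies in (left, right]; left >= right is treated as wrap-around. */
    boolean contains(long token) {
        if (left < right) {
            return token > left && token <= right;
        }
        // Wrap-around range (covers the full ring when left == right).
        return token > left || token <= right;
    }
}

/** Hypothetical ownership check used only for illustration. */
final class OwnedRangeCheck {
    /** Returns true only if at least one owned range contains the token. */
    static boolean isOwned(List<TokenRange> owned, long token) {
        for (TokenRange r : owned) {
            if (r.contains(token)) {
                return true;
            }
        }
        // With an empty owned set the loop never runs, so every token is rejected.
        return false;
    }
}
```

With this model, {{OwnedRangeCheck.isOwned(List.of(), anyToken)}} is always 
false, which matches the symptom above: the first streamed MV key fails 
validation regardless of its token.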

 

*Expected behavior*
MV SSTables should not be streamed until the joining node has received its 
{{TokenMetadata}} and published range ownership. At the very least, MV 
validation should be deferred or retried rather than aborting the entire 
bootstrap.

 

*Recommendation / Follow-up*
This behaviour appears to still affect users on 4.1.8. While CASSANDRA-13704 
and sub-tasks like CASSANDRA-13708 address general out-of-range handling, this 
specific edge case (MV + pre-ownership streaming) still causes hard bootstrap 
failures unless manually mitigated.

It would be great to confirm whether MV SSTables are now covered by those 
patches, or whether a future improvement could defer MV validation until 
{{TokenMetadata.getPendingRanges}} returns a non-empty result or {{isMember()}} 
returns true.
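
A minimal sketch of the deferral idea, under stated assumptions: 
{{DeferredValidation}} and {{awaitOwnership}} are hypothetical names, not 
existing Cassandra APIs, and the {{ownershipKnown}} check stands in for 
whatever signal the project would choose (for example, pending ranges being 
non-empty). The point is only that validation could wait for ownership with a 
bounded retry instead of failing on the first key.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical helper; illustrates bounded waiting before validating streamed keys. */
final class DeferredValidation {
    /**
     * Polls ownershipKnown up to maxAttempts times, sleeping sleepMillis between
     * attempts. Returns true as soon as the check passes, and false if it never
     * passes or the waiting thread is interrupted.
     */
    static boolean awaitOwnership(BooleanSupplier ownershipKnown, int maxAttempts, long sleepMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (ownershipKnown.getAsBoolean()) {
                return true;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(sleepMillis);
                } catch (InterruptedException e) {
                    // Preserve the interrupt flag and give up waiting.
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
        }
        return false;
    }
}
```

A caller would run the out-of-range check only once {{awaitOwnership}} returns 
true, and could fail or retry the single stream, rather than the whole 
bootstrap, if it returns false.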

  was:
*Environment*
 * Cassandra version: *4.1.8* (Instaclustr managed service build)

 * Replacement method: {{delete-data}} + 
{{-Dcassandra.replace_address_first_boot=<ip>}}

 * 64 vnodes per node

 * Cluster size: 24 nodes, 4 DCs (AWS VPC)

 * Affected DC node: running in {{EU_WEST_1}}

 * Materialised views enabled

 

*Summary*
We observed bootstrap failure on a replacement node due to premature streaming 
of a materialised view (MV) SSTable before the node had received or published 
any token ownership. This led to an immediate 
{{StreamReceivedOutOfTokenRangeException}} and bootstrap abortion.

 
----
*Observed behavior*
As soon as the replacement node joined with {{replace_address_first_boot}}, 
it was contacted by peers and streamed MV SSTables. However, the node had not 
yet converged gossip or entered the JOINING/NORMAL state, it had zero owned 
ranges. Cassandra threw:

StreamReceivedOutOfTokenRangeException:
Received stream for sstable <ks>/<mv> containing key DecoratedKey(...) outside 
of owned ranges [(−9223372036854775808 ... 9223372036854775807)]

Streaming session was aborted and bootstrap rolled back.

Range list used during validation was {{[(−2⁶³ ... 2⁶³−1)]}}, indicating an 
*empty* range set on receiver.

 

*Expected behavior*
MV SSTables should not be streamed until the joining node has received its 
{{TokenMetadata}} and published range ownership. At the very least, MV 
validation should be deferred or retried rather than aborting the entire 
bootstrap.

 

*Recommendation / Follow-up*
This behaviour appears to still affect users on 4.1.8. While CASSANDRA‑13704 
and sub‑tasks like CASSANDRA‑13708 address general out-of-range handling, this 
specific edge case (MV + pre-ownership streaming) still causes hard bootstrap 
failures unless manually mitigated.

Would be great to confirm whether MV SSTables are now covered in those patches 
or whether a future improvement could defer MV validation until 
{{TokenMetadata.getPendingRanges}} or {{isMember()}} returns non-empty.


> Bootstrap fails with StreamReceivedOutOfTokenRangeException when MV streamed 
> too early
> --------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-20796
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-20796
>             Project: Apache Cassandra
>          Issue Type: Bug
>          Components: Cluster/Gossip
>            Reporter: denish
>            Priority: Normal
>
> *Environment*
>  * Cassandra version: *4.1.8* (Instaclustr managed service build)
>  * Replacement method: {{delete-data}} + 
> {{-Dcassandra.replace_address_first_boot=<ip>}}
>  * 64 vnodes per node
>  * Cluster size: 24 nodes, 4 DCs (AWS VPC)
>  * Materialised views enabled
>  
> *Summary*
> We observed a bootstrap failure on a replacement node, caused by premature 
> streaming of a materialised view (MV) SSTable before the node had received or 
> published any token ownership. This led to an immediate 
> {{StreamReceivedOutOfTokenRangeException}}, which aborted the bootstrap.
>  
> ----
> *Observed behavior*
> As soon as the replacement node joined with 
> {{replace_address_first_boot}}, peers contacted it and streamed MV SSTables 
> to it. However, the node had not yet converged gossip or entered the 
> JOINING/NORMAL state, so it had zero owned ranges. Cassandra threw:
> StreamReceivedOutOfTokenRangeException:
> Received stream for sstable <ks>/<mv> containing key DecoratedKey(...) 
> outside of owned ranges [(−9223372036854775808 ... 9223372036854775807)]
> The streaming session was aborted and the bootstrap was rolled back.
> The range list used during validation was {{[(−2⁶³ ... 2⁶³−1)]}}, indicating 
> an *empty* range set on the receiver.
>  
> *Expected behavior*
> MV SSTables should not be streamed until the joining node has received its 
> {{TokenMetadata}} and published range ownership. At the very least, MV 
> validation should be deferred or retried rather than aborting the entire 
> bootstrap.
>  
> *Recommendation / Follow-up*
> This behaviour appears to still affect users on 4.1.8. While CASSANDRA-13704 
> and sub-tasks like CASSANDRA-13708 address general out-of-range handling, 
> this specific edge case (MV + pre-ownership streaming) still causes hard 
> bootstrap failures unless manually mitigated.
> It would be great to confirm whether MV SSTables are now covered by those 
> patches, or whether a future improvement could defer MV validation until 
> {{TokenMetadata.getPendingRanges}} returns a non-empty result or 
> {{isMember()}} returns true.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
