[jira] [Updated] (CASSANDRA-19592) Expand CREATE TABLE CQL on a coordinating node before submitting to CMS

2024-05-23 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19592:

Reviewers: Sam Tunnicliffe, Stefan Miklosovic  (was: Sam Tunnicliffe)

> Expand CREATE TABLE CQL on a coordinating node before submitting to CMS
> ---
>
> Key: CASSANDRA-19592
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19592
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Schema
>Reporter: Alex Petrov
>Assignee: Alex Petrov
>Priority: Normal
> Fix For: 5.1
>
> Attachments: ci_summary-1.html, ci_summary.html
>
>
> This is done to unblock CASSANDRA-12937 and allow preserving defaults with 
> which the table was created between node bounces and between nodes with 
> different configurations. For now, we are preserving 5.0 behaviour.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19592) Expand CREATE TABLE CQL on a coordinating node before submitting to CMS

2024-05-23 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19592:

  Fix Version/s: 5.1
  Since Version: NA
Source Control Link: 
https://github.com/apache/cassandra/commit/7fe30fc313ac35b1156f5a37d2069e29cded710b
 Resolution: Fixed
 Status: Resolved  (was: Ready to Commit)

Committed, thanks.

> Expand CREATE TABLE CQL on a coordinating node before submitting to CMS
> ---
>
> Key: CASSANDRA-19592
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19592
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Schema
>Reporter: Alex Petrov
>Assignee: Alex Petrov
>Priority: Normal
> Fix For: 5.1
>
> Attachments: ci_summary-1.html, ci_summary.html
>
>
> This is done to unblock CASSANDRA-12937 and allow preserving defaults with 
> which the table was created between node bounces and between nodes with 
> different configurations. For now, we are preserving 5.0 behaviour.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19592) Expand CREATE TABLE CQL on a coordinating node before submitting to CMS

2024-05-23 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19592:

Status: Ready to Commit  (was: Review In Progress)

> Expand CREATE TABLE CQL on a coordinating node before submitting to CMS
> ---
>
> Key: CASSANDRA-19592
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19592
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Schema
>Reporter: Alex Petrov
>Assignee: Alex Petrov
>Priority: Normal
> Attachments: ci_summary-1.html, ci_summary.html
>
>
> This is done to unblock CASSANDRA-12937 and allow preserving defaults with 
> which the table was created between node bounces and between nodes with 
> different configurations. For now, we are preserving 5.0 behaviour.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19592) Expand CREATE TABLE CQL on a coordinating node before submitting to CMS

2024-05-23 Thread Sam Tunnicliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848984#comment-17848984
 ] 

Sam Tunnicliffe commented on CASSANDRA-19592:
-

{quote}Does anything else need to be done except merging?
{quote}
No, I think it just fell between Alex & me. I'll get it rebased & merged.

> Expand CREATE TABLE CQL on a coordinating node before submitting to CMS
> ---
>
> Key: CASSANDRA-19592
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19592
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Schema
>Reporter: Alex Petrov
>Assignee: Alex Petrov
>Priority: Normal
> Attachments: ci_summary-1.html, ci_summary.html
>
>
> This is done to unblock CASSANDRA-12937 and allow preserving defaults with 
> which the table was created between node bounces and between nodes with 
> different configurations. For now, we are preserving 5.0 behaviour.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19593) Transactional Guardrails

2024-05-22 Thread Sam Tunnicliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848624#comment-17848624
 ] 

Sam Tunnicliffe commented on CASSANDRA-19593:
-

{quote}This brings us to more general problem of transactional configuration 
which should be done as well. It is questionable if it is desirable to do it as 
part of this ticket or not, however, I would like to look into how we could do 
that as well.
{quote}
We've been working on some proposals for this, some of which were briefly 
discussed in CASSANDRA-12937. I agree with [~ifesdjeen] that this warrants 
its own CEP. I know he's been working on a document for that; I'll see if we can 
get it ready for circulation.

> Transactional Guardrails
> 
>
> Key: CASSANDRA-19593
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19593
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Feature/Guardrails, Transactional Cluster Metadata
>Reporter: Stefan Miklosovic
>Assignee: Stefan Miklosovic
>Priority: Normal
> Fix For: 5.x
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> I think it is time to start to think about this more seriously. TCM is 
> getting into pretty nice shape and we might start to investigate how to do 
> this.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19556) Add guardrail to block DDL/DCL queries and replace alter_table_enabled guardrail

2024-05-22 Thread Sam Tunnicliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848544#comment-17848544
 ] 

Sam Tunnicliffe commented on CASSANDRA-19556:
-

[~mck] this certainly isn't critical for 5.1/6.0; my comment was just intended 
as a counterpoint to illustrate why it might be useful in a 5.0.x.

To that point, I'd definitely think about adding _something_ to minors in 
branches with upgrade paths to current trunk. Not an actual guardrail, just a 
system property or similar to optionally disable certain operations immediately 
prior to upgrade. If we did go down that route, there is some precedent from 
back in the day for mandating a minimum minor version prior to a major upgrade 
(from {{NEWS.txt}}):
{code:java}
Upgrade to 3.0 is supported from Cassandra 2.1 versions greater or equal to 
2.1.9,
or Cassandra 2.2 versions greater or equal to 2.2.2. 
{code}
but like I said, this isn't critical for upgrading to current trunk and I'm 
definitely not advocating for anything in 5.0-rc
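To illustrate (and only as a hedged sketch; the property name below is hypothetical, not an existing flag or an agreed design), such a pre-upgrade switch could be as small as:
{code:java}
// Hedged sketch: "cassandra.block_ddl_before_upgrade" is a hypothetical property
// name used only to illustrate a minimal pre-upgrade switch, not an actual flag.
public final class UpgradeDdlGate
{
    private static final boolean DDL_BLOCKED =
        Boolean.getBoolean("cassandra.block_ddl_before_upgrade");

    private UpgradeDdlGate() {}

    // Would be called before executing a schema-altering statement.
    public static void checkDdlAllowed()
    {
        if (DDL_BLOCKED)
            throw new IllegalStateException("DDL is disabled while preparing for a major upgrade");
    }
}
{code}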

> Add guardrail to block DDL/DCL queries and replace alter_table_enabled 
> guardrail
> 
>
> Key: CASSANDRA-19556
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19556
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Feature/Guardrails
>Reporter: Yuqi Yan
>Assignee: Yuqi Yan
>Priority: Normal
> Fix For: 5.0-rc, 5.x
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Sometimes we want to block DDL/DCL queries to stop new schemas or roles being 
> created (e.g. when doing a live upgrade).
> The current DDL guardrail implementation won't block a query if it is a no-op 
> (e.g. CREATE TABLE ... IF NOT EXISTS when the table already exists; the 
> guardrail check is added in apply() right after all the existence checks).
> I don't have a preference between blocking every DDL query and checking whether 
> it is a no-op here. It's just that some users always run CREATE ... IF NOT EXISTS 
> at startup, which is a no-op but would be blocked by this guardrail, causing 
> startup to fail.
>  
> 4.1 PR: [https://github.com/apache/cassandra/pull/3248]
> trunk PR: [https://github.com/apache/cassandra/pull/3275]
>  
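As a hedged illustration of the two options described in the ticket (the guardrail flag and schema checks here are hypothetical stand-ins, not the actual guardrail API):
{code:java}
// Hedged sketch of the trade-off between blocking all DDL and allowing no-ops.
final class DdlGuardrailSketch
{
    static void checkCreateTable(boolean ddlBlocked, boolean tableAlreadyExists, boolean ifNotExists)
    {
        // Option 1: block every DDL statement, even no-ops.
        // if (ddlBlocked) throw new IllegalStateException("DDL is disabled");

        // Option 2: let no-op statements (CREATE TABLE ... IF NOT EXISTS on an
        // existing table) through, so clients that create schema at startup still work.
        boolean noOp = ifNotExists && tableAlreadyExists;
        if (ddlBlocked && !noOp)
            throw new IllegalStateException("DDL is disabled");
    }
}
{code}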



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19592) Expand CREATE TABLE CQL on a coordinating node before submitting to CMS

2024-05-16 Thread Sam Tunnicliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17846952#comment-17846952
 ] 

Sam Tunnicliffe commented on CASSANDRA-19592:
-

[~ifesdjeen] I'm +1 on this version, wdyt?

> Expand CREATE TABLE CQL on a coordinating node before submitting to CMS
> ---
>
> Key: CASSANDRA-19592
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19592
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Schema
>Reporter: Alex Petrov
>Assignee: Alex Petrov
>Priority: Normal
> Attachments: ci_summary-1.html, ci_summary.html
>
>
> This is done to unblock CASSANDRA-12937 and allow preserving defaults with 
> which the table was created between node bounces and between nodes with 
> different configurations. For now, we are preserving 5.0 behaviour.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19592) Expand CREATE TABLE CQL on a coordinating node before submitting to CMS

2024-05-16 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19592:

Attachment: ci_summary-1.html

> Expand CREATE TABLE CQL on a coordinating node before submitting to CMS
> ---
>
> Key: CASSANDRA-19592
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19592
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Schema
>Reporter: Alex Petrov
>Assignee: Alex Petrov
>Priority: Normal
> Attachments: ci_summary-1.html, ci_summary.html
>
>
> This is done to unblock CASSANDRA-12937 and allow preserving defaults with 
> which the table was created between node bounces and between nodes with 
> different configurations. For now, we are preserving 5.0 behaviour.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19592) Expand CREATE TABLE CQL on a coordinating node before submitting to CMS

2024-05-16 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19592:

Status: Review In Progress  (was: Changes Suggested)

Discussed with Alex and made a few tweaks. Pushed the latest version and 
attached updated CI summary. The single test failure is unrelated.

> Expand CREATE TABLE CQL on a coordinating node before submitting to CMS
> ---
>
> Key: CASSANDRA-19592
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19592
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Schema
>Reporter: Alex Petrov
>Assignee: Alex Petrov
>Priority: Normal
> Attachments: ci_summary.html
>
>
> This is done to unblock CASSANDRA-12937 and allow preserving defaults with 
> which the table was created between node bounces and between nodes with 
> different configurations. For now, we are preserving 5.0 behaviour.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19599) Remove unused config params for out of range token requests

2024-05-16 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19599:

  Fix Version/s: 5.1
 (was: 5.x)
Source Control Link: 
https://github.com/apache/cassandra/commit/a15b137b7c8c84773453dbe264fcd2d4b76222c0
 Resolution: Fixed
 Status: Resolved  (was: Ready to Commit)

Committed, thanks

> Remove unused config params for out of range token requests
> ---
>
> Key: CASSANDRA-19599
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19599
> Project: Cassandra
>  Issue Type: Task
>  Components: Local/Config
>Reporter: Sam Tunnicliffe
>Assignee: Sam Tunnicliffe
>Priority: Normal
> Fix For: 5.1
>
> Attachments: ci_summary.html
>
>
> The fields {{log_out_of_token_range_requests}} and 
> {{reject_out_of_token_range_requests}} in {{Config.java}} have never actually 
> been used and are just vestiges from early development on CEP-21. 
> We should remove them and the related accessors in {{DatabaseDescriptor}}.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19599) Remove unused config params for out of range token requests

2024-05-16 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19599:

Status: Ready to Commit  (was: Review In Progress)

> Remove unused config params for out of range token requests
> ---
>
> Key: CASSANDRA-19599
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19599
> Project: Cassandra
>  Issue Type: Task
>  Components: Local/Config
>Reporter: Sam Tunnicliffe
>Assignee: Sam Tunnicliffe
>Priority: Normal
> Fix For: 5.x
>
> Attachments: ci_summary.html
>
>
> The fields {{log_out_of_token_range_requests}} and 
> {{reject_out_of_token_range_requests}} in {{Config.java}} have never actually 
> been used and are just vestiges from early development on CEP-21. 
> We should remove them and the related accessors in {{DatabaseDescriptor}}.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19599) Remove unused config params for out of range token requests

2024-05-16 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19599:

Reviewers: Marcus Eriksson
   Status: Review In Progress  (was: Patch Available)

> Remove unused config params for out of range token requests
> ---
>
> Key: CASSANDRA-19599
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19599
> Project: Cassandra
>  Issue Type: Task
>  Components: Local/Config
>Reporter: Sam Tunnicliffe
>Assignee: Sam Tunnicliffe
>Priority: Normal
> Fix For: 5.x
>
> Attachments: ci_summary.html
>
>
> The fields {{log_out_of_token_range_requests}} and 
> {{reject_out_of_token_range_requests}} in {{Config.java}} have never actually 
> been used and are just vestiges from early development on CEP-21. 
> We should remove them and the related accessors in {{DatabaseDescriptor}}.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19599) Remove unused config params for out of range token requests

2024-05-13 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19599:

Attachment: ci_summary.html

> Remove unused config params for out of range token requests
> ---
>
> Key: CASSANDRA-19599
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19599
> Project: Cassandra
>  Issue Type: Task
>  Components: Local/Config
>Reporter: Sam Tunnicliffe
>Assignee: Sam Tunnicliffe
>Priority: Normal
> Fix For: 5.x
>
> Attachments: ci_summary.html
>
>
> The fields {{log_out_of_token_range_requests}} and 
> {{reject_out_of_token_range_requests}} in {{Config.java}} have never actually 
> been used and are just vestiges from early development on CEP-21. 
> We should remove them and the related accessors in {{DatabaseDescriptor}}.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-19556) Add guardrail to block DDL/DCL queries and replace alter_table_enabled guardrail

2024-05-10 Thread Sam Tunnicliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17845345#comment-17845345
 ] 

Sam Tunnicliffe edited comment on CASSANDRA-19556 at 5/10/24 1:15 PM:
--

bq. personally i see no reason this needs to be in any 5.0.x

Not an absolute necessity, but it would be quite convenient to have it or 
something similar as part of the preferred upgrade path to 5.1/6.0, as 
mentioned 
[here|https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-21%3A+Transactional+Cluster+Metadata#CEP21:TransactionalClusterMetadata-MigrationPlan]
 


was (Author: beobal):
bq. personally i see no reason this needs to be in any 5.0.x

Not an absolute necessity, but it would be quite convenient to have it or 
something similar,
 as part of the preferred upgrade path to 5.1/6.0, as mentioned 
[here|https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-21%3A+Transactional+Cluster+Metadata#CEP21:TransactionalClusterMetadata-MigrationPlan]
 

> Add guardrail to block DDL/DCL queries and replace alter_table_enabled 
> guardrail
> 
>
> Key: CASSANDRA-19556
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19556
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Feature/Guardrails
>Reporter: Yuqi Yan
>Assignee: Yuqi Yan
>Priority: Normal
> Fix For: 5.0-rc, 5.x
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Sometimes we want to block DDL/DCL queries to stop new schemas or roles being 
> created (e.g. when doing a live upgrade).
> The current DDL guardrail implementation won't block a query if it is a no-op 
> (e.g. CREATE TABLE ... IF NOT EXISTS when the table already exists; the 
> guardrail check is added in apply() right after all the existence checks).
> I don't have a preference between blocking every DDL query and checking whether 
> it is a no-op here. It's just that some users always run CREATE ... IF NOT EXISTS 
> at startup, which is a no-op but would be blocked by this guardrail, causing 
> startup to fail.
>  
> 4.1 PR: [https://github.com/apache/cassandra/pull/3248]
> trunk PR: [https://github.com/apache/cassandra/pull/3275]
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-19556) Add guardrail to block DDL/DCL queries and replace alter_table_enabled guardrail

2024-05-10 Thread Sam Tunnicliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17845345#comment-17845345
 ] 

Sam Tunnicliffe edited comment on CASSANDRA-19556 at 5/10/24 1:14 PM:
--

bq. personally i see no reason this needs to be in any 5.0.x

Not an absolute necessity, but it would be quite convenient to have it or 
something similar,
 as part of the preferred upgrade path to 5.1/6.0, as mentioned 
[here|https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-21%3A+Transactional+Cluster+Metadata#CEP21:TransactionalClusterMetadata-MigrationPlan]
 


was (Author: beobal):
bq. personally i see no reason this needs to be in any 5.0.x

Not an absolute necessity, but it would be quite convenient to have it or 
something similar, as part of the preferred upgrade path to 5.1/6.0, as 
mentioned 
[here|https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-21%3A+Transactional+Cluster+Metadata#CEP21:TransactionalClusterMetadata-MigrationPlan]
 

> Add guardrail to block DDL/DCL queries and replace alter_table_enabled 
> guardrail
> 
>
> Key: CASSANDRA-19556
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19556
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Feature/Guardrails
>Reporter: Yuqi Yan
>Assignee: Yuqi Yan
>Priority: Normal
> Fix For: 5.0-rc, 5.x
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Sometimes we want to block DDL/DCL queries to stop new schemas or roles being 
> created (e.g. when doing a live upgrade).
> The current DDL guardrail implementation won't block a query if it is a no-op 
> (e.g. CREATE TABLE ... IF NOT EXISTS when the table already exists; the 
> guardrail check is added in apply() right after all the existence checks).
> I don't have a preference between blocking every DDL query and checking whether 
> it is a no-op here. It's just that some users always run CREATE ... IF NOT EXISTS 
> at startup, which is a no-op but would be blocked by this guardrail, causing 
> startup to fail.
>  
> 4.1 PR: [https://github.com/apache/cassandra/pull/3248]
> trunk PR: [https://github.com/apache/cassandra/pull/3275]
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19556) Add guardrail to block DDL/DCL queries and replace alter_table_enabled guardrail

2024-05-10 Thread Sam Tunnicliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17845345#comment-17845345
 ] 

Sam Tunnicliffe commented on CASSANDRA-19556:
-

bq. personally i see no reason this needs to be in any 5.0.x

Not an absolute necessity, but it would be quite convenient to have it or 
something similar, as part of the preferred upgrade path to 5.1/6.0, as 
mentioned 
[here|https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-21%3A+Transactional+Cluster+Metadata#CEP21:TransactionalClusterMetadata-MigrationPlan]
 

> Add guardrail to block DDL/DCL queries and replace alter_table_enabled 
> guardrail
> 
>
> Key: CASSANDRA-19556
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19556
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Feature/Guardrails
>Reporter: Yuqi Yan
>Assignee: Yuqi Yan
>Priority: Normal
> Fix For: 5.0-rc, 5.x
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Sometimes we want to block DDL/DCL queries to stop new schemas or roles being 
> created (e.g. when doing a live upgrade).
> The current DDL guardrail implementation won't block a query if it is a no-op 
> (e.g. CREATE TABLE ... IF NOT EXISTS when the table already exists; the 
> guardrail check is added in apply() right after all the existence checks).
> I don't have a preference between blocking every DDL query and checking whether 
> it is a no-op here. It's just that some users always run CREATE ... IF NOT EXISTS 
> at startup, which is a no-op but would be blocked by this guardrail, causing 
> startup to fail.
>  
> 4.1 PR: [https://github.com/apache/cassandra/pull/3248]
> trunk PR: [https://github.com/apache/cassandra/pull/3275]
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19158) Reuse native transport-driven futures in Debounce

2024-05-08 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19158:

Status: Changes Suggested  (was: Review In Progress)

Thanks, this is definitely an improvement. I've left a few comments on the PR

> Reuse native transport-driven futures in Debounce
> -
>
> Key: CASSANDRA-19158
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19158
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Transactional Cluster Metadata
>Reporter: Alex Petrov
>Assignee: Alex Petrov
>Priority: Normal
> Attachments: ci_summary.html
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Currently, we create a future in Debounce, then create one more future in 
> RemoteProcessor#sendWithCallback. This is further exacerbated by chaining 
> calls, when we first attempt to catch up from peer, and then from CMS.
> First of all, we should only ever use the native transport timeout-driven 
> futures returned from sendWithCallback, since they implement reasonable 
> retries under the hood and are easy to bulk-configure (i.e. you can simply 
> change the timeout in the yaml and have all futures change their behaviour).
> Second, we should _chain_ futures and use map or andThen for fallback 
> operations such as trying to catch up from the CMS after an unsuccessful 
> attempt to catch up from a peer.
> This should significantly simplify the code and number of blocked/waiting 
> threads.
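A hedged sketch of the chained-fallback shape described above, using plain {{CompletableFuture}}; the method names are illustrative stand-ins, not Cassandra's actual APIs:
{code:java}
import java.util.concurrent.CompletableFuture;

// Hedged sketch only: catchUpFromPeer/catchUpFromCMS are hypothetical stand-ins
// for the real remote calls; the point is the non-blocking fallback chain.
class CatchupSketch
{
    CompletableFuture<Long> catchUpFromPeer() { return CompletableFuture.completedFuture(42L); }

    CompletableFuture<Long> catchUpFromCMS() { return CompletableFuture.completedFuture(42L); }

    CompletableFuture<Long> catchUp()
    {
        // Try the peer first; only if that fails, fall back to the CMS.
        // No thread blocks while waiting for either response.
        // (exceptionallyCompose is available from JDK 12 onwards.)
        return catchUpFromPeer().exceptionallyCompose(failure -> catchUpFromCMS());
    }
}
{code}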



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19158) Reuse native transport-driven futures in Debounce

2024-05-08 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19158:

Status: Review In Progress  (was: Patch Available)

> Reuse native transport-driven futures in Debounce
> -
>
> Key: CASSANDRA-19158
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19158
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Transactional Cluster Metadata
>Reporter: Alex Petrov
>Assignee: Alex Petrov
>Priority: Normal
> Attachments: ci_summary.html
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently, we create a future in Debounce, then create one more future in 
> RemoteProcessor#sendWithCallback. This is further exacerbated by chaining 
> calls, when we first attempt to catch up from peer, and then from CMS.
> First of all, we should only ever use the native transport timeout-driven 
> futures returned from sendWithCallback, since they implement reasonable 
> retries under the hood and are easy to bulk-configure (i.e. you can simply 
> change the timeout in the yaml and have all futures change their behaviour).
> Second, we should _chain_ futures and use map or andThen for fallback 
> operations such as trying to catch up from the CMS after an unsuccessful 
> attempt to catch up from a peer.
> This should significantly simplify the code and number of blocked/waiting 
> threads.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19158) Reuse native transport-driven futures in Debounce

2024-05-08 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19158:

Test and Documentation Plan: Refactoring, CI tests
 Status: Patch Available  (was: Open)

> Reuse native transport-driven futures in Debounce
> -
>
> Key: CASSANDRA-19158
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19158
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Transactional Cluster Metadata
>Reporter: Alex Petrov
>Assignee: Alex Petrov
>Priority: Normal
> Attachments: ci_summary.html
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently, we create a future in Debounce, then create one more future in 
> RemoteProcessor#sendWithCallback. This is further exacerbated by chaining 
> calls, when we first attempt to catch up from peer, and then from CMS.
> First of all, we should only ever use the native transport timeout-driven 
> futures returned from sendWithCallback, since they implement reasonable 
> retries under the hood and are easy to bulk-configure (i.e. you can simply 
> change the timeout in the yaml and have all futures change their behaviour).
> Second, we should _chain_ futures and use map or andThen for fallback 
> operations such as trying to catch up from the CMS after an unsuccessful 
> attempt to catch up from a peer.
> This should significantly simplify the code and number of blocked/waiting 
> threads.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19158) Reuse native transport-driven futures in Debounce

2024-05-08 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19158:

Change Category: Code Clarity
 Complexity: Normal
Component/s: Transactional Cluster Metadata
  Reviewers: Sam Tunnicliffe
 Status: Open  (was: Triage Needed)

> Reuse native transport-driven futures in Debounce
> -
>
> Key: CASSANDRA-19158
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19158
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Transactional Cluster Metadata
>Reporter: Alex Petrov
>Assignee: Alex Petrov
>Priority: Normal
> Attachments: ci_summary.html
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently, we create a future in Debounce, then create one more future in 
> RemoteProcessor#sendWithCallback. This is further exacerbated by chaining 
> calls, when we first attempt to catch up from peer, and then from CMS.
> First of all, we should only ever use the native transport timeout-driven 
> futures returned from sendWithCallback, since they implement reasonable 
> retries under the hood and are easy to bulk-configure (i.e. you can simply 
> change the timeout in the yaml and have all futures change their behaviour).
> Second, we should _chain_ futures and use map or andThen for fallback 
> operations such as trying to catch up from the CMS after an unsuccessful 
> attempt to catch up from a peer.
> This should significantly simplify the code and number of blocked/waiting 
> threads.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19593) Transactional Guardrails

2024-05-07 Thread Sam Tunnicliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17844326#comment-17844326
 ] 

Sam Tunnicliffe commented on CASSANDRA-19593:
-

I think this is a great idea, in fact I would suggest that all node 
configuration should be moving away from yaml and into global cluster metadata, 
per CEP-21. However, I don't think we should rush to do this just yet. 

The introduction of TCM has already meant a lot of change to the codebase and I 
can't help feeling that it would benefit from a period of stabilization. 
Understandably and correctly IMO, community focus for the past few months has 
been on getting 5.0 ready for release, so I don't believe there have actually been 
many eyes on trunk, and on TCM in particular. 

While we have confidence in the design and implementation of CEP-21, my 
personal opinion is that it would be best to phase it in as incrementally as 
possible. Like I said, a lot has already changed but this is really the minimum 
that could be done to make it useful. I'd propose holding off on bringing more 
into TCM until 5.0 is out the door at least, but maybe even until 5.1 has had 
some meaningful exposure. 

> Transactional Guardrails
> 
>
> Key: CASSANDRA-19593
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19593
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Feature/Guardrails, Transactional Cluster Metadata
>Reporter: Stefan Miklosovic
>Assignee: Stefan Miklosovic
>Priority: Normal
> Fix For: 5.x
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> I think it is time to start to think about this more seriously. TCM is 
> getting into pretty nice shape and we might start to investigate how to do 
> this.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19615) Merge pre-existing schema with the system defined one during upgrade

2024-05-03 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19615:

Status: Review In Progress  (was: Patch Available)

> Merge pre-existing schema with the system defined one during upgrade
> 
>
> Key: CASSANDRA-19615
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19615
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Transactional Cluster Metadata
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Normal
> Fix For: 5.1-alpha1
>
> Attachments: ci_summary.html
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> When upgrading we should merge the pre-existing schema with the 
> system-defined schema. For example, if a table was defined in 5.0 in 
> system_distributed, but then removed from SystemDistributedKeyspace.java in 
> 5.1 we should still be able to read it (until manually dropped).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19615) Merge pre-existing schema with the system defined one during upgrade

2024-05-03 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19615:

Status: Ready to Commit  (was: Review In Progress)

+1 with one very minor comment on the PR

> Merge pre-existing schema with the system defined one during upgrade
> 
>
> Key: CASSANDRA-19615
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19615
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Transactional Cluster Metadata
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Normal
> Fix For: 5.1-alpha1
>
> Attachments: ci_summary.html
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> When upgrading we should merge the pre-existing schema with the 
> system-defined schema. For example, if a table was defined in 5.0 in 
> system_distributed, but then removed from SystemDistributedKeyspace.java in 
> 5.1 we should still be able to read it (until manually dropped).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19517) Raise priority of TCM internode messages during critical operations

2024-05-03 Thread Sam Tunnicliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17843238#comment-17843238
 ] 

Sam Tunnicliffe commented on CASSANDRA-19517:
-

+1

> Raise priority of TCM internode messages during critical operations
> ---
>
> Key: CASSANDRA-19517
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19517
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Transactional Cluster Metadata
>Reporter: Alex Petrov
>Assignee: Alex Petrov
>Priority: Normal
> Attachments: ci_summary-1.html, ci_summary.html, result_details.tar.gz
>
>
> In a busy cluster, TCM messages may not get propagated throughout the 
> cluster, since they will be ordered together with other P1 messages (for 
> {{TCM_}}-prefixed verbs), and with P2 messages for all Paxos operations.
> To avoid this, and to make sure cluster metadata changes can continue, all 
> {{TCM_}}-prefixed verbs should have {{P0}} priority, just like Gossip 
> messages used to. All Paxos messages that involve the distributed metadata 
> keyspace should now get an {{URGENT}} flag, which will instruct internode 
> messaging to schedule them on the {{URGENT_MESSAGES}} connection.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19581) Add nodetool command to unregister LEFT nodes

2024-05-03 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19581:

Status: Ready to Commit  (was: Changes Suggested)

+1

> Add nodetool command to unregister LEFT nodes
> -
>
> Key: CASSANDRA-19581
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19581
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Transactional Cluster Metadata
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Normal
> Attachments: ci_summary-1.html, ci_summary.html
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> When decommissioning a node it still remains in ClusterMetadata with state = 
> LEFT. We should provide a nodetool command to unregister such nodes 
> completely.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19613) Add ClusterMetadata.metadataIdentifier to GossipDigestSyn messages

2024-05-03 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19613:

  Fix Version/s: 5.1
  Since Version: NA
Source Control Link: 
https://github.com/apache/cassandra/commit/51d048a93a7e7cfb93a544dabba4b6f7aa1bbdd1
 Resolution: Fixed
 Status: Resolved  (was: Ready to Commit)

Committed, thanks!

> Add ClusterMetadata.metadataIdentifier to GossipDigestSyn messages
> --
>
> Key: CASSANDRA-19613
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19613
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Gossip, Transactional Cluster Metadata
>Reporter: Sam Tunnicliffe
>Assignee: Sam Tunnicliffe
>Priority: Normal
> Fix For: 5.1
>
> Attachments: ci_summary.html
>
>
> We should add {{ClusterMetadata.instance().metadataIdentifier}} to 
> {{GossipDigestSyn}} messages and compare with the local one, rejecting 
> anything that has the wrong identifier like we do with cluster name.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19613) Add ClusterMetadata.metadataIdentifier to GossipDigestSyn messages

2024-05-03 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19613:

Attachment: ci_summary.html

> Add ClusterMetadata.metadataIdentifier to GossipDigestSyn messages
> --
>
> Key: CASSANDRA-19613
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19613
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Gossip, Transactional Cluster Metadata
>Reporter: Sam Tunnicliffe
>Assignee: Sam Tunnicliffe
>Priority: Normal
> Attachments: ci_summary.html
>
>
> We should add {{ClusterMetadata.instance().metadataIdentifier}} to 
> {{GossipDigestSyn}} messages and compare with the local one, rejecting 
> anything that has the wrong identifier like we do with cluster name.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19613) Add ClusterMetadata.metadataIdentifier to GossipDigestSyn messages

2024-05-02 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19613:

Reviewers: Marcus Eriksson  (was: Marcus Eriksson, Sam Tunnicliffe)

> Add ClusterMetadata.metadataIdentifier to GossipDigestSyn messages
> --
>
> Key: CASSANDRA-19613
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19613
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Gossip, Transactional Cluster Metadata
>Reporter: Sam Tunnicliffe
>Assignee: Sam Tunnicliffe
>Priority: Normal
>
> We should add {{ClusterMetadata.instance().metadataIdentifier}} to 
> {{GossipDigestSyn}} messages and compare with the local one, rejecting 
> anything that has the wrong identifier like we do with cluster name.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19613) Add ClusterMetadata.metadataIdentifier to GossipDigestSyn messages

2024-05-02 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19613:

Reviewers: Marcus Eriksson, Sam Tunnicliffe
   Status: Review In Progress  (was: Patch Available)

> Add ClusterMetadata.metadataIdentifier to GossipDigestSyn messages
> --
>
> Key: CASSANDRA-19613
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19613
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Gossip, Transactional Cluster Metadata
>Reporter: Sam Tunnicliffe
>Assignee: Sam Tunnicliffe
>Priority: Normal
>
> We should add {{ClusterMetadata.instance().metadataIdentifier}} to 
> {{GossipDigestSyn}} messages and compare with the local one, rejecting 
> anything that has the wrong identifier like we do with cluster name.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19613) Add ClusterMetadata.metadataIdentifier to GossipDigestSyn messages

2024-05-02 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19613:

Test and Documentation Plan: New and existing tests in CI
 Status: Patch Available  (was: Open)

> Add ClusterMetadata.metadataIdentifier to GossipDigestSyn messages
> --
>
> Key: CASSANDRA-19613
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19613
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Gossip, Transactional Cluster Metadata
>Reporter: Sam Tunnicliffe
>Assignee: Sam Tunnicliffe
>Priority: Normal
>
> We should add {{ClusterMetadata.instance().metadataIdentifier}} to 
> {{GossipDigestSyn}} messages and compare with the local one, rejecting 
> anything that has the wrong identifier like we do with cluster name.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19613) Add ClusterMetadata.metadataIdentifier to GossipDigestSyn messages

2024-05-02 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19613:

 Bug Category: Parent values: Correctness(12982), Level 1 values: Test 
Failure(12990)
   Complexity: Normal
Discovered By: Unit Test
 Severity: Normal
   Status: Open  (was: Triage Needed)

This is relatively benign, but it is related to flakiness in 
{{o.a.c.distributed.test.tcm.SplitBrainTest}}.

In that test, the two halves of a split-brain cluster attempt to reestablish 
communication because members of both appear in the seed lists. When this 
prompts one side to try to catch up with metadata changes, it is correctly 
identified as an error, which is what the test asserts. The flakiness arises 
when the gossip attempt is made before the epochs of the two separate clusters 
have had a chance to diverge. In that case, no further communication is 
performed and the error is not triggered.

The proposed patch includes a rewrite of {{SplitBrainTest}} which decouples the 
catchup from gossip, removing the dependency on timing. 

> Add ClusterMetadata.metadataIdentifier to GossipDigestSyn messages
> --
>
> Key: CASSANDRA-19613
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19613
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Gossip, Transactional Cluster Metadata
>Reporter: Sam Tunnicliffe
>Assignee: Sam Tunnicliffe
>Priority: Normal
>
> We should add {{ClusterMetadata.instance().metadataIdentifier}} to 
> {{GossipDigestSyn}} messages and compare with the local one, rejecting 
> anything that has the wrong identifier like we do with cluster name.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Created] (CASSANDRA-19613) Add ClusterMetadata.metadataIdentifier to GossipDigestSyn messages

2024-05-02 Thread Sam Tunnicliffe (Jira)
Sam Tunnicliffe created CASSANDRA-19613:
---

 Summary: Add ClusterMetadata.metadataIdentifier to GossipDigestSyn 
messages
 Key: CASSANDRA-19613
 URL: https://issues.apache.org/jira/browse/CASSANDRA-19613
 Project: Cassandra
  Issue Type: Bug
  Components: Cluster/Gossip, Transactional Cluster Metadata
Reporter: Sam Tunnicliffe
Assignee: Sam Tunnicliffe


We should add {{ClusterMetadata.instance().metadataIdentifier}} to 
{{GossipDigestSyn}} messages and compare with the local one, rejecting 
anything that has the wrong identifier like we do with cluster name.
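A minimal sketch of the intended comparison (class, method and parameter names are illustrative only, not the actual patch):
{code:java}
// Hedged sketch: mirrors the existing cluster-name comparison on incoming SYNs.
final class SynValidationSketch
{
    /** Returns true when the incoming GossipDigestSyn should be processed. */
    static boolean acceptSyn(String localMetadataId, String synMetadataId)
    {
        // A SYN carrying a different metadata identifier originates from a
        // different logical cluster, so it should be dropped rather than merged.
        return localMetadataId.equals(synMetadataId);
    }
}
{code}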



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19581) Add nodetool command to unregister LEFT nodes

2024-05-01 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19581:

Reviewers: Sam Tunnicliffe  (was: Alex Petrov, Sam Tunnicliffe)
   Status: Review In Progress  (was: Patch Available)

> Add nodetool command to unregister LEFT nodes
> -
>
> Key: CASSANDRA-19581
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19581
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Transactional Cluster Metadata
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Normal
> Attachments: ci_summary.html
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> When decommissioning a node it still remains in ClusterMetadata with state = 
> LEFT. We should provide a nodetool command to unregister such nodes 
> completely.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19581) Add nodetool command to unregister LEFT nodes

2024-05-01 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19581:

Status: Changes Suggested  (was: Review In Progress)

Left a couple of comments on the PR

> Add nodetool command to unregister LEFT nodes
> -
>
> Key: CASSANDRA-19581
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19581
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Transactional Cluster Metadata
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Normal
> Attachments: ci_summary.html
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> When decommissioning a node it still remains in ClusterMetadata with state = 
> LEFT. We should provide a nodetool command to unregister such nodes 
> completely.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19581) Add nodetool command to unregister LEFT nodes

2024-05-01 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19581:

Test and Documentation Plan: New and existing tests.
 Status: Patch Available  (was: In Progress)

> Add nodetool command to unregister LEFT nodes
> -
>
> Key: CASSANDRA-19581
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19581
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Transactional Cluster Metadata
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Normal
> Attachments: ci_summary.html
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> When decommissioning a node it still remains in ClusterMetadata with state = 
> LEFT. We should provide a nodetool command to unregister such nodes 
> completely.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19592) Expand CREATE TABLE CQL on a coordinating node before submitting to CMS

2024-05-01 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19592:

Status: Changes Suggested  (was: Review In Progress)

I think the approach is sound, but we need to extend it to 
{{CreateKeyspaceStatement}} and {{CreateIndexStatement}}.

The former may use {{default_keyspace_rf}} which can be set differently per 
node, so we need to be explicit and encode whatever value the coordinator 
resolves here.

Likewise, {{default_secondary_index}} injects the class of the index 
implementation (or an alias for one) if none is specified (and 
{{default_secondary_index_enabled: true}}, which is the default). This may also 
have different values per-node, which causes different indexes to be created 
locally.

I didn't find any other per-node settings that we need to mitigate in this way, 
but I'll try to do another pass.   
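For illustration, a hedged sketch of the keyspace case (the class and method below are hypothetical, not Cassandra's statement code): the coordinator resolves its local {{default_keyspace_rf}} into an explicit replication map before the statement is submitted to the CMS, so nodes with different defaults end up replaying identical CQL.

{code:java}
// Hedged illustration only; names and the SimpleStrategy output are examples.
final class KeyspaceCqlExpander
{
    /**
     * requestedRf is null when the CREATE KEYSPACE statement did not specify a
     * replication factor; defaultKeyspaceRf is the coordinator's local setting.
     */
    static String expand(String keyspace, Integer requestedRf, int defaultKeyspaceRf)
    {
        int rf = requestedRf != null ? requestedRf : defaultKeyspaceRf;
        return String.format(
            "CREATE KEYSPACE %s WITH replication = " +
            "{'class': 'SimpleStrategy', 'replication_factor': %d};",
            keyspace, rf);
    }
}
{code}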

> Expand CREATE TABLE CQL on a coordinating node before submitting to CMS
> ---
>
> Key: CASSANDRA-19592
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19592
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Schema
>Reporter: Alex Petrov
>Assignee: Alex Petrov
>Priority: Normal
> Attachments: ci_summary.html
>
>
> This is done to unblock CASSANDRA-12937 and allow preserving defaults with 
> which the table was created between node bounces and between nodes with 
> different configurations. For now, we are preserving 5.0 behaviour.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19592) Expand CREATE TABLE CQL on a coordinating node before submitting to CMS

2024-05-01 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19592:

Reviewers: Sam Tunnicliffe  (was: Sam Tunnicliffe)
   Status: Review In Progress  (was: Patch Available)

> Expand CREATE TABLE CQL on a coordinating node before submitting to CMS
> ---
>
> Key: CASSANDRA-19592
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19592
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Schema
>Reporter: Alex Petrov
>Assignee: Alex Petrov
>Priority: Normal
> Attachments: ci_summary.html
>
>
> This is done to unblock CASSANDRA-12937 and allow preserving defaults with 
> which the table was created between node bounces and between nodes with 
> different configurations. For now, we are preserving 5.0 behaviour.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-19599) Remove unused config params for out of range token requests

2024-04-30 Thread Sam Tunnicliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17842288#comment-17842288
 ] 

Sam Tunnicliffe edited comment on CASSANDRA-19599 at 4/30/24 8:09 AM:
--

PR for trivial change, CI pending.
https://github.com/apache/cassandra/pull/3276
 


was (Author: beobal):
PR for trivial change, CI pending.

> Remove unused config params for out of range token requests
> ---
>
> Key: CASSANDRA-19599
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19599
> Project: Cassandra
>  Issue Type: Task
>  Components: Local/Config
>Reporter: Sam Tunnicliffe
>Assignee: Sam Tunnicliffe
>Priority: Normal
> Fix For: 5.x
>
>
> The fields {{log_out_of_token_range_requests}} and 
> {{reject_out_of_token_range_requests}} in {{Config.java}} have never actually 
> been used and are just vestiges from early development on CEP-21. 
> We should remove them and the related accessors in {{DatabaseDescriptor}}.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19599) Remove unused config params for out of range token requests

2024-04-30 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19599:

Test and Documentation Plan: Unused code removal, existing CI
 Status: Patch Available  (was: Open)

PR for trivial change, CI pending.

> Remove unused config params for out of range token requests
> ---
>
> Key: CASSANDRA-19599
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19599
> Project: Cassandra
>  Issue Type: Task
>  Components: Local/Config
>Reporter: Sam Tunnicliffe
>Assignee: Sam Tunnicliffe
>Priority: Normal
> Fix For: 5.x
>
>
> The fields {{log_out_of_token_range_requests}} and 
> {{reject_out_of_token_range_requests}} in {{Config.java}} have never actually 
> been used and are just vestiges from early development on CEP-21. 
> We should remove them and the related accessors in {{DatabaseDescriptor}}.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19599) Remove unused config params for out of range token requests

2024-04-30 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19599:

Change Category: Code Clarity
 Complexity: Low Hanging Fruit
  Fix Version/s: 5.x
 Status: Open  (was: Triage Needed)

> Remove unused config params for out of range token requests
> ---
>
> Key: CASSANDRA-19599
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19599
> Project: Cassandra
>  Issue Type: Task
>  Components: Local/Config
>Reporter: Sam Tunnicliffe
>Assignee: Sam Tunnicliffe
>Priority: Normal
> Fix For: 5.x
>
>
> The fields {{log_out_of_token_range_requests}} and 
> {{reject_out_of_token_range_requests}} in {{Config.java}} have never actually 
> been used and are just vestiges from early development on CEP-21. 
> We should remove them and the related accessors in {{DatabaseDescriptor}}.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19221) CMS: Nodes can restart with new ipaddress already defined in the cluster

2024-04-25 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19221:

Status: Ready to Commit  (was: Review In Progress)

+1. Agreed about splitting {{MetadataChangeSimulationTest}}; CASSANDRA-19344 
added another dimension to the tests, so it's not surprising if that 
sometimes pushes run time over the timeout.

> CMS: Nodes can restart with new ipaddress already defined in the cluster
> 
>
> Key: CASSANDRA-19221
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19221
> Project: Cassandra
>  Issue Type: Bug
>  Components: Transactional Cluster Metadata
>Reporter: Paul Chandler
>Assignee: Alex Petrov
>Priority: Normal
> Fix For: 5.1-alpha1
>
> Attachments: ci_summary-1.html, ci_summary.html
>
>
> I am simulating running a cluster in Kubernetes and testing what happens when 
> several pods go down and  ip addresses are swapped between nodes. In 4.0 this 
> is blocked and the node cannot be restarted.
> To simulate this I create a 3 node cluster on a local machine using 3 
> loopback addresses
> {code}
> 127.0.0.1
> 127.0.0.2
> 127.0.0.3
> {code}
> The nodes are created correctly and the first node is assigned as a CMS node 
> as shown:
> {code}
> bin/nodetool -p 7199 describecms
> {code}
> Cluster Metadata Service:
> {code}
> Members: /127.0.0.1:7000
> Is Member: true
> Service State: LOCAL
> {code}
> At this point I bring down the nodes 127.0.0.2 and 127.0.0.3 and swap the ip 
> addresses for the rpc_address and listen_address 
>  
> The nodes come back as normal, but the nodeid has now been swapped against 
> the ip address:
> Before:
> {code}
> Datacenter: datacenter1
> ===
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address    Load       Tokens  Owns (effective)  Host ID                   
>             Rack
> UN  127.0.0.3  75.2 KiB   16      76.0%             
> 6d194555-f6eb-41d0-c000-0003  rack1
> UN  127.0.0.2  86.77 KiB  16      59.3%             
> 6d194555-f6eb-41d0-c000-0002  rack1
> UN  127.0.0.1  80.88 KiB  16      64.7%             
> 6d194555-f6eb-41d0-c000-0001  rack1
> {code}
> After:
> {code}
> Datacenter: datacenter1
> ===
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address    Load        Tokens  Owns (effective)  Host ID                  
>              Rack
> UN  127.0.0.3  149.62 KiB  16      76.0%             
> 6d194555-f6eb-41d0-c000-0003  rack1
> UN  127.0.0.2  155.48 KiB  16      59.3%             
> 6d194555-f6eb-41d0-c000-0002  rack1
> UN  127.0.0.1  75.74 KiB   16      64.7%             
> 6d194555-f6eb-41d0-c000-0001  rack1
> {code}
> On previous tests of this I have created a table with a replication factor of 
> 1, inserted some data before the swap.   After the swap the data on nodes 2 
> and 3 is now missing. 
> One theory I have is that I am using different port numbers for the different 
> nodes, and I am only swapping the ip addresses and not the port numbers, so 
> the ip:port still looks unique
> i.e. 127.0.0.2:9043 becomes 127.0.0.2:9044
> and 127.0.0.3:9044 becomes 127.0.0.3:9043
>  
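
To make the ip:port theory above concrete, here is a small standalone Java check (illustrative only, not Cassandra code; the class name {{EndpointSwapCheck}} is made up): when each node keeps its own port and only the IPs are exchanged, no endpoint ever collides with a previously known ip:port pair.

{code:java}
import java.net.InetSocketAddress;

// Standalone illustration of the theory above: swapping only the IPs while
// each node keeps its own port means every ip:port endpoint stays unique,
// so a duplicate-address check keyed on ip:port would not fire.
public class EndpointSwapCheck
{
    public static void main(String[] args)
    {
        InetSocketAddress node2Before = new InetSocketAddress("127.0.0.2", 9043);
        InetSocketAddress node3Before = new InetSocketAddress("127.0.0.3", 9044);

        // After the swap each node keeps its port but takes the other's IP.
        InetSocketAddress node2After = new InetSocketAddress("127.0.0.3", 9043);
        InetSocketAddress node3After = new InetSocketAddress("127.0.0.2", 9044);

        System.out.println(node2After.equals(node3Before)); // false: ports differ
        System.out.println(node3After.equals(node2Before)); // false: ports differ
    }
}
{code}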



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19587) Remove leftover period column from system.metadata_snapshots

2024-04-25 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19587:

Status: Review In Progress  (was: Patch Available)

> Remove leftover period column from system.metadata_snapshots
> 
>
> Key: CASSANDRA-19587
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19587
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Transactional Cluster Metadata
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Normal
> Attachments: ci_summary.html
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Seems we left a period column in metadata_snapshots in 
> CASSANDRA-19189/CASSANDRA-19482 - it should be removed



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19587) Remove leftover period column from system.metadata_snapshots

2024-04-25 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19587:

Status: Ready to Commit  (was: Review In Progress)

+1

> Remove leftover period column from system.metadata_snapshots
> 
>
> Key: CASSANDRA-19587
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19587
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Transactional Cluster Metadata
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Normal
> Attachments: ci_summary.html
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Seems we left a period column in metadata_snapshots in 
> CASSANDRA-19189/CASSANDRA-19482 - it should be removed



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19191) Optimisations to PlacementForRange, improve lookup on r/w path

2024-04-22 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19191:

Status: Needs Committer  (was: Patch Available)

> Optimisations to PlacementForRange, improve lookup on r/w path
> --
>
> Key: CASSANDRA-19191
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19191
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Transactional Cluster Metadata
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Normal
> Fix For: 5.1-alpha1
>
> Attachments: ci_summary-1.html, ci_summary.html, result_details.tar.gz
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The lookup used when selecting the appropriate replica group for a range or 
> token while performing reads and writes is extremely simplistic and 
> inefficient. There is plenty of scope to improve {{PlacementsForRange}} by 
> replacing the current naive iteration with a more efficient lookup.
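
As a rough sketch of the general technique the description points at (replacing a linear scan with a log-time lookup), not the actual patch: keep the placements in a sorted map keyed by range start and answer token lookups with a single lower-entry lookup, assuming non-overlapping {{(start, end]}} ranges. The token and replica-group types below are simplified stand-ins, not the real {{PlacementForRange}} types.

{code:java}
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Generic sketch of replacing linear iteration with an O(log n) lookup.
// Tokens are modelled as longs and replica groups as strings purely for
// illustration.
public class RangeLookupSketch
{
    // start token (exclusive) -> [end token (inclusive), replica group]
    private final NavigableMap<Long, Map.Entry<Long, String>> byStart = new TreeMap<>();

    void add(long start, long end, String replicaGroup)
    {
        byStart.put(start, Map.entry(end, replicaGroup));
    }

    // Naive version: scan every entry until one covers the token.
    String lookupNaive(long token)
    {
        for (Map.Entry<Long, Map.Entry<Long, String>> e : byStart.entrySet())
            if (token > e.getKey() && token <= e.getValue().getKey())
                return e.getValue().getValue();
        return null;
    }

    // Improved version: one lower-entry lookup, assuming ranges never overlap.
    String lookupSorted(long token)
    {
        Map.Entry<Long, Map.Entry<Long, String>> e = byStart.lowerEntry(token);
        return e != null && token <= e.getValue().getKey() ? e.getValue().getValue() : null;
    }

    public static void main(String[] args)
    {
        RangeLookupSketch placements = new RangeLookupSketch();
        placements.add(0, 100, "replicas-A");
        placements.add(100, 200, "replicas-B");
        System.out.println(placements.lookupNaive(150));  // replicas-B
        System.out.println(placements.lookupSorted(150)); // replicas-B
    }
}
{code}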



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19191) Optimisations to PlacementForRange, improve lookup on r/w path

2024-04-22 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19191:

Status: Review In Progress  (was: Needs Committer)

> Optimisations to PlacementForRange, improve lookup on r/w path
> --
>
> Key: CASSANDRA-19191
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19191
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Transactional Cluster Metadata
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Normal
> Fix For: 5.1-alpha1
>
> Attachments: ci_summary-1.html, ci_summary.html, result_details.tar.gz
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The lookup used when selecting the appropriate replica group for a range or 
> token while performing reads and writes is extremely simplistic and 
> inefficient. There is plenty of scope to improve {{PlacementsForRange}} by 
> replacing the current naive iteration with a more efficient lookup.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19191) Optimisations to PlacementForRange, improve lookup on r/w path

2024-04-22 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19191:

Status: Ready to Commit  (was: Review In Progress)

+1 

> Optimisations to PlacementForRange, improve lookup on r/w path
> --
>
> Key: CASSANDRA-19191
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19191
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Transactional Cluster Metadata
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Normal
> Fix For: 5.1-alpha1
>
> Attachments: ci_summary-1.html, ci_summary.html, result_details.tar.gz
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The lookup used when selecting the appropriate replica group for a range or 
> token while performing reads and writes is extremely simplistic and 
> inefficient. There is plenty of scope to improve {{PlacementsForRange}} by 
> replacing the current naive iteration with a more efficient lookup.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19132) Update use of transition plan in PrepareReplace

2024-04-19 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19132:

Attachment: ci_summary.html

> Update use of transition plan in PrepareReplace
> ---
>
> Key: CASSANDRA-19132
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19132
> Project: Cassandra
>  Issue Type: Task
>  Components: Cluster/Membership
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Normal
> Fix For: 5.1-alpha1
>
> Attachments: ci_summary.html
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> When PlacementTransitionPlan was reworked to make its use more consistent 
> across join and leave operations, PrepareReplace was not updated. This could 
> now be simplified in line with the other operations.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19132) Update use of transition plan in PrepareReplace

2024-04-19 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19132:

Status: Ready to Commit  (was: Review In Progress)

+1, LGTM. I rebased and added a second commit with a slight tweak to 
{{PlacementTransitionPlan}}. CI looks reasonable: 2 previously known failures plus 
1 {{Port already in use}}, which I believe is an infra problem.

> Update use of transition plan in PrepareReplace
> ---
>
> Key: CASSANDRA-19132
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19132
> Project: Cassandra
>  Issue Type: Task
>  Components: Cluster/Membership
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Normal
> Fix For: 5.1-alpha1
>
> Attachments: ci_summary.html
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> When PlacementTransitionPlan was reworked to make its use more consistent 
> across join and leave operations, PrepareReplace was not updated. This could 
> now be simplified in line with the other operations.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19344) Range movements involving transient replicas must safely enact changes to read and write replica sets

2024-04-19 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19344:

  Fix Version/s: 5.1
 (was: 5.x)
  Since Version: NA
Source Control Link: 
https://github.com/apache/cassandra/commit/dabcb175527d3c2daef54c6ce029b3c3054b2a77
 Resolution: Fixed
 Status: Resolved  (was: Ready to Commit)

Committed, thanks!

> Range movements involving transient replicas must safely enact changes to 
> read and write replica sets
> -
>
> Key: CASSANDRA-19344
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19344
> Project: Cassandra
>  Issue Type: Bug
>  Components: CI
>Reporter: Ekaterina Dimitrova
>Assignee: Sam Tunnicliffe
>Priority: Normal
> Fix For: 5.1
>
> Attachments: ci_summary-1.html, ci_summary.html, 
> remove-n4-post-19344.txt, remove-n4-pre-19344.txt, result_details.tar.gz
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> (edit) This was originally opened due to a flaky test 
> {{org.apache.cassandra.distributed.test.TransientRangeMovementTest.testRemoveNode-_jdk17}}
> The test can fail in two different ways:
> {code:java}
> junit.framework.AssertionFailedError: NOT IN CURRENT: 31 -- [(00,20), 
> (31,50)] at 
> org.apache.cassandra.distributed.test.TransientRangeMovementTest.assertAllContained(TransientRangeMovementTest.java:203)
>  at 
> org.apache.cassandra.distributed.test.TransientRangeMovementTest.testRemoveNode(TransientRangeMovementTest.java:183)
>  at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method) at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
>  at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43){code}
> as in here - 
> [https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra/2639/workflows/32b92ce7-5e9d-4efb-8362-d200d2414597/jobs/55139/tests#failed-test-0]
> and
> {code:java}
> junit.framework.AssertionFailedError: nodetool command [removenode, 
> 6d194555-f6eb-41d0-c000-0003, --force] was not successful stdout: 
> stderr: error: Node /127.0.0.4:7012 is alive and owns this ID. Use 
> decommission command to remove it from the ring -- StackTrace -- 
> java.lang.UnsupportedOperationException: Node /127.0.0.4:7012 is alive and 
> owns this ID. Use decommission command to remove it from the ring at 
> org.apache.cassandra.tcm.sequences.SingleNodeSequences.removeNode(SingleNodeSequences.java:110)
>  at 
> org.apache.cassandra.service.StorageService.removeNode(StorageService.java:3682)
>  at org.apache.cassandra.tools.NodeProbe.removeNode(NodeProbe.java:1020) at 
> org.apache.cassandra.tools.nodetool.RemoveNode.execute(RemoveNode.java:51) at 
> org.apache.cassandra.tools.NodeTool$NodeToolCmd.runInternal(NodeTool.java:388)
>  at org.apache.cassandra.tools.NodeTool$NodeToolCmd.run(NodeTool.java:373) at 
> org.apache.cassandra.tools.NodeTool.execute(NodeTool.java:272) at 
> org.apache.cassandra.distributed.impl.Instance$DTestNodeTool.execute(Instance.java:1129)
>  at 
> org.apache.cassandra.distributed.impl.Instance.lambda$nodetoolResult$51(Instance.java:1038)
>  at org.apache.cassandra.concurrent.FutureTask.call(FutureTask.java:61) at 
> org.apache.cassandra.concurrent.FutureTask.run(FutureTask.java:71) at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
>  at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
>  at 
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>  at java.base/java.lang.Thread.run(Thread.java:833) Notifications: Error: 
> java.lang.UnsupportedOperationException: Node /127.0.0.4:7012 is alive and 
> owns this ID. Use decommission command to remove it from the ring at 
> org.apache.cassandra.tcm.sequences.SingleNodeSequences.removeNode(SingleNodeSequences.java:110)
>  at 
> org.apache.cassandra.service.StorageService.removeNode(StorageService.java:3682)
>  at org.apache.cassandra.tools.NodeProbe.removeNode(NodeProbe.java:1020) at 
> org.apache.cassandra.tools.nodetool.RemoveNode.execute(RemoveNode.java:51) at 
> org.apache.cassandra.tools.NodeTool$NodeToolCmd.runInternal(NodeTool.java:388)
>  at org.apache.cassandra.tools.NodeTool$NodeToolCmd.run(NodeTool.java:373) at 
> org.apache.cassandra.tools.NodeTool.execute(NodeTool.java:272) at 
> org.apache.cassandra.distributed.impl.Instance$DTestNodeTool.execute(Instance.java:1129)
>  at 
> org.apache.cassandra.distributed.impl.Instance.lambda$nodetoolResult$51(Instance.java:1038)
>  at 

[jira] [Updated] (CASSANDRA-19344) Range movements involving transient replicas must safely enact changes to read and write replica sets

2024-04-19 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19344:

Status: Ready to Commit  (was: Review In Progress)

> Range movements involving transient replicas must safely enact changes to 
> read and write replica sets
> -
>
> Key: CASSANDRA-19344
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19344
> Project: Cassandra
>  Issue Type: Bug
>  Components: CI
>Reporter: Ekaterina Dimitrova
>Assignee: Sam Tunnicliffe
>Priority: Normal
> Fix For: 5.x
>
> Attachments: ci_summary-1.html, ci_summary.html, 
> remove-n4-post-19344.txt, remove-n4-pre-19344.txt, result_details.tar.gz
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> (edit) This was originally opened due to a flaky test 
> {{org.apache.cassandra.distributed.test.TransientRangeMovementTest.testRemoveNode-_jdk17}}
> The test can fail in two different ways:
> {code:java}
> junit.framework.AssertionFailedError: NOT IN CURRENT: 31 -- [(00,20), 
> (31,50)] at 
> org.apache.cassandra.distributed.test.TransientRangeMovementTest.assertAllContained(TransientRangeMovementTest.java:203)
>  at 
> org.apache.cassandra.distributed.test.TransientRangeMovementTest.testRemoveNode(TransientRangeMovementTest.java:183)
>  at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method) at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
>  at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43){code}
> as in here - 
> [https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra/2639/workflows/32b92ce7-5e9d-4efb-8362-d200d2414597/jobs/55139/tests#failed-test-0]
> and
> {code:java}
> junit.framework.AssertionFailedError: nodetool command [removenode, 
> 6d194555-f6eb-41d0-c000-0003, --force] was not successful stdout: 
> stderr: error: Node /127.0.0.4:7012 is alive and owns this ID. Use 
> decommission command to remove it from the ring -- StackTrace -- 
> java.lang.UnsupportedOperationException: Node /127.0.0.4:7012 is alive and 
> owns this ID. Use decommission command to remove it from the ring at 
> org.apache.cassandra.tcm.sequences.SingleNodeSequences.removeNode(SingleNodeSequences.java:110)
>  at 
> org.apache.cassandra.service.StorageService.removeNode(StorageService.java:3682)
>  at org.apache.cassandra.tools.NodeProbe.removeNode(NodeProbe.java:1020) at 
> org.apache.cassandra.tools.nodetool.RemoveNode.execute(RemoveNode.java:51) at 
> org.apache.cassandra.tools.NodeTool$NodeToolCmd.runInternal(NodeTool.java:388)
>  at org.apache.cassandra.tools.NodeTool$NodeToolCmd.run(NodeTool.java:373) at 
> org.apache.cassandra.tools.NodeTool.execute(NodeTool.java:272) at 
> org.apache.cassandra.distributed.impl.Instance$DTestNodeTool.execute(Instance.java:1129)
>  at 
> org.apache.cassandra.distributed.impl.Instance.lambda$nodetoolResult$51(Instance.java:1038)
>  at org.apache.cassandra.concurrent.FutureTask.call(FutureTask.java:61) at 
> org.apache.cassandra.concurrent.FutureTask.run(FutureTask.java:71) at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
>  at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
>  at 
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>  at java.base/java.lang.Thread.run(Thread.java:833) Notifications: Error: 
> java.lang.UnsupportedOperationException: Node /127.0.0.4:7012 is alive and 
> owns this ID. Use decommission command to remove it from the ring at 
> org.apache.cassandra.tcm.sequences.SingleNodeSequences.removeNode(SingleNodeSequences.java:110)
>  at 
> org.apache.cassandra.service.StorageService.removeNode(StorageService.java:3682)
>  at org.apache.cassandra.tools.NodeProbe.removeNode(NodeProbe.java:1020) at 
> org.apache.cassandra.tools.nodetool.RemoveNode.execute(RemoveNode.java:51) at 
> org.apache.cassandra.tools.NodeTool$NodeToolCmd.runInternal(NodeTool.java:388)
>  at org.apache.cassandra.tools.NodeTool$NodeToolCmd.run(NodeTool.java:373) at 
> org.apache.cassandra.tools.NodeTool.execute(NodeTool.java:272) at 
> org.apache.cassandra.distributed.impl.Instance$DTestNodeTool.execute(Instance.java:1129)
>  at 
> org.apache.cassandra.distributed.impl.Instance.lambda$nodetoolResult$51(Instance.java:1038)
>  at org.apache.cassandra.concurrent.FutureTask.call(FutureTask.java:61) at 
> org.apache.cassandra.concurrent.FutureTask.run(FutureTask.java:71) at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
>  at 
> 

[jira] [Updated] (CASSANDRA-19132) Update use of transition plan in PrepareReplace

2024-04-19 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19132:

Status: Review In Progress  (was: Patch Available)

> Update use of transition plan in PrepareReplace
> ---
>
> Key: CASSANDRA-19132
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19132
> Project: Cassandra
>  Issue Type: Task
>  Components: Cluster/Membership
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Normal
> Fix For: 5.1-alpha1
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> When PlacementTransitionPlan was reworked to make its use more consistent 
> across join and leave operations, PrepareReplace was not updated. This could 
> now be simplified in line with the other operations.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19221) CMS: Nodes can restart with new ipaddress already defined in the cluster

2024-04-18 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19221:

Reviewers: Sam Tunnicliffe  (was: Sam Tunnicliffe)
   Status: Review In Progress  (was: Patch Available)

> CMS: Nodes can restart with new ipaddress already defined in the cluster
> 
>
> Key: CASSANDRA-19221
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19221
> Project: Cassandra
>  Issue Type: Bug
>  Components: Transactional Cluster Metadata
>Reporter: Paul Chandler
>Assignee: Alex Petrov
>Priority: Normal
> Fix For: 5.1-alpha1
>
> Attachments: ci_summary.html
>
>
> I am simulating running a cluster in Kubernetes and testing what happens when 
> several pods go down and  ip addresses are swapped between nodes. In 4.0 this 
> is blocked and the node cannot be restarted.
> To simulate this I create a 3 node cluster on a local machine using 3 
> loopback addresses
> {code}
> 127.0.0.1
> 127.0.0.2
> 127.0.0.3
> {code}
> The nodes are created correctly and the first node is assigned as a CMS node 
> as shown:
> {code}
> bin/nodetool -p 7199 describecms
> {code}
> Cluster Metadata Service:
> {code}
> Members: /127.0.0.1:7000
> Is Member: true
> Service State: LOCAL
> {code}
> At this point I bring down the nodes 127.0.0.2 and 127.0.0.3 and swap the ip 
> addresses for the rpc_address and listen_address 
>  
> The nodes come back as normal, but the nodeid has now been swapped against 
> the ip address:
> Before:
> {code}
> Datacenter: datacenter1
> ===
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address    Load       Tokens  Owns (effective)  Host ID                   
>             Rack
> UN  127.0.0.3  75.2 KiB   16      76.0%             
> 6d194555-f6eb-41d0-c000-0003  rack1
> UN  127.0.0.2  86.77 KiB  16      59.3%             
> 6d194555-f6eb-41d0-c000-0002  rack1
> UN  127.0.0.1  80.88 KiB  16      64.7%             
> 6d194555-f6eb-41d0-c000-0001  rack1
> {code}
> After:
> {code}
> Datacenter: datacenter1
> ===
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address    Load        Tokens  Owns (effective)  Host ID                  
>              Rack
> UN  127.0.0.3  149.62 KiB  16      76.0%             
> 6d194555-f6eb-41d0-c000-0003  rack1
> UN  127.0.0.2  155.48 KiB  16      59.3%             
> 6d194555-f6eb-41d0-c000-0002  rack1
> UN  127.0.0.1  75.74 KiB   16      64.7%             
> 6d194555-f6eb-41d0-c000-0001  rack1
> {code}
> On previous tests of this I have created a table with a replication factor of 
> 1, inserted some data before the swap.   After the swap the data on nodes 2 
> and 3 is now missing. 
> One theory I have is that I am using different port numbers for the different 
> nodes, and I am only swapping the ip addresses and not the port numbers, so 
> the ip:port still looks unique
> i.e. 127.0.0.2:9043 becomes 127.0.0.2:9044
> and 127.0.0.3:9044 becomes 127.0.0.3:9043
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19221) CMS: Nodes can restart with new ipaddress already defined in the cluster

2024-04-18 Thread Sam Tunnicliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17838761#comment-17838761
 ] 

Sam Tunnicliffe commented on CASSANDRA-19221:
-

+1. I left a couple of minor suggestions on the PR; feel free to accept or 
ignore them.

> CMS: Nodes can restart with new ipaddress already defined in the cluster
> 
>
> Key: CASSANDRA-19221
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19221
> Project: Cassandra
>  Issue Type: Bug
>  Components: Transactional Cluster Metadata
>Reporter: Paul Chandler
>Assignee: Alex Petrov
>Priority: Normal
> Fix For: 5.1-alpha1
>
> Attachments: ci_summary.html
>
>
> I am simulating running a cluster in Kubernetes and testing what happens when 
> several pods go down and  ip addresses are swapped between nodes. In 4.0 this 
> is blocked and the node cannot be restarted.
> To simulate this I create a 3 node cluster on a local machine using 3 
> loopback addresses
> {code}
> 127.0.0.1
> 127.0.0.2
> 127.0.0.3
> {code}
> The nodes are created correctly and the first node is assigned as a CMS node 
> as shown:
> {code}
> bin/nodetool -p 7199 describecms
> {code}
> Cluster Metadata Service:
> {code}
> Members: /127.0.0.1:7000
> Is Member: true
> Service State: LOCAL
> {code}
> At this point I bring down the nodes 127.0.0.2 and 127.0.0.3 and swap the ip 
> addresses for the rpc_address and listen_address 
>  
> The nodes come back as normal, but the nodeid has now been swapped against 
> the ip address:
> Before:
> {code}
> Datacenter: datacenter1
> ===
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address    Load       Tokens  Owns (effective)  Host ID                   
>             Rack
> UN  127.0.0.3  75.2 KiB   16      76.0%             
> 6d194555-f6eb-41d0-c000-0003  rack1
> UN  127.0.0.2  86.77 KiB  16      59.3%             
> 6d194555-f6eb-41d0-c000-0002  rack1
> UN  127.0.0.1  80.88 KiB  16      64.7%             
> 6d194555-f6eb-41d0-c000-0001  rack1
> {code}
> After:
> {code}
> Datacenter: datacenter1
> ===
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address    Load        Tokens  Owns (effective)  Host ID                  
>              Rack
> UN  127.0.0.3  149.62 KiB  16      76.0%             
> 6d194555-f6eb-41d0-c000-0003  rack1
> UN  127.0.0.2  155.48 KiB  16      59.3%             
> 6d194555-f6eb-41d0-c000-0002  rack1
> UN  127.0.0.1  75.74 KiB   16      64.7%             
> 6d194555-f6eb-41d0-c000-0001  rack1
> {code}
> On previous tests of this I have created a table with a replication factor of 
> 1, inserted some data before the swap.   After the swap the data on nodes 2 
> and 3 is now missing. 
> One theory I have is that I am using different port numbers for the different 
> nodes, and I am only swapping the ip addresses and not the port numbers, so 
> the ip:port still looks unique
> i.e. 127.0.0.2:9043 becomes 127.0.0.2:9044
> and 127.0.0.3:9044 becomes 127.0.0.3:9043
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19344) Range movements involving transient replicas must safely enact changes to read and write replica sets

2024-04-18 Thread Sam Tunnicliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17838739#comment-17838739
 ] 

Sam Tunnicliffe commented on CASSANDRA-19344:
-

Rebased and attached an updated {{ci_summary-1.html}}

> Range movements involving transient replicas must safely enact changes to 
> read and write replica sets
> -
>
> Key: CASSANDRA-19344
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19344
> Project: Cassandra
>  Issue Type: Bug
>  Components: CI
>Reporter: Ekaterina Dimitrova
>Assignee: Sam Tunnicliffe
>Priority: Normal
> Fix For: 5.x
>
> Attachments: ci_summary-1.html, ci_summary.html, 
> remove-n4-post-19344.txt, remove-n4-pre-19344.txt, result_details.tar.gz
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> (edit) This was originally opened due to a flaky test 
> {{org.apache.cassandra.distributed.test.TransientRangeMovementTest.testRemoveNode-_jdk17}}
> The test can fail in two different ways:
> {code:java}
> junit.framework.AssertionFailedError: NOT IN CURRENT: 31 -- [(00,20), 
> (31,50)] at 
> org.apache.cassandra.distributed.test.TransientRangeMovementTest.assertAllContained(TransientRangeMovementTest.java:203)
>  at 
> org.apache.cassandra.distributed.test.TransientRangeMovementTest.testRemoveNode(TransientRangeMovementTest.java:183)
>  at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method) at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
>  at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43){code}
> as in here - 
> [https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra/2639/workflows/32b92ce7-5e9d-4efb-8362-d200d2414597/jobs/55139/tests#failed-test-0]
> and
> {code:java}
> junit.framework.AssertionFailedError: nodetool command [removenode, 
> 6d194555-f6eb-41d0-c000-0003, --force] was not successful stdout: 
> stderr: error: Node /127.0.0.4:7012 is alive and owns this ID. Use 
> decommission command to remove it from the ring -- StackTrace -- 
> java.lang.UnsupportedOperationException: Node /127.0.0.4:7012 is alive and 
> owns this ID. Use decommission command to remove it from the ring at 
> org.apache.cassandra.tcm.sequences.SingleNodeSequences.removeNode(SingleNodeSequences.java:110)
>  at 
> org.apache.cassandra.service.StorageService.removeNode(StorageService.java:3682)
>  at org.apache.cassandra.tools.NodeProbe.removeNode(NodeProbe.java:1020) at 
> org.apache.cassandra.tools.nodetool.RemoveNode.execute(RemoveNode.java:51) at 
> org.apache.cassandra.tools.NodeTool$NodeToolCmd.runInternal(NodeTool.java:388)
>  at org.apache.cassandra.tools.NodeTool$NodeToolCmd.run(NodeTool.java:373) at 
> org.apache.cassandra.tools.NodeTool.execute(NodeTool.java:272) at 
> org.apache.cassandra.distributed.impl.Instance$DTestNodeTool.execute(Instance.java:1129)
>  at 
> org.apache.cassandra.distributed.impl.Instance.lambda$nodetoolResult$51(Instance.java:1038)
>  at org.apache.cassandra.concurrent.FutureTask.call(FutureTask.java:61) at 
> org.apache.cassandra.concurrent.FutureTask.run(FutureTask.java:71) at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
>  at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
>  at 
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>  at java.base/java.lang.Thread.run(Thread.java:833) Notifications: Error: 
> java.lang.UnsupportedOperationException: Node /127.0.0.4:7012 is alive and 
> owns this ID. Use decommission command to remove it from the ring at 
> org.apache.cassandra.tcm.sequences.SingleNodeSequences.removeNode(SingleNodeSequences.java:110)
>  at 
> org.apache.cassandra.service.StorageService.removeNode(StorageService.java:3682)
>  at org.apache.cassandra.tools.NodeProbe.removeNode(NodeProbe.java:1020) at 
> org.apache.cassandra.tools.nodetool.RemoveNode.execute(RemoveNode.java:51) at 
> org.apache.cassandra.tools.NodeTool$NodeToolCmd.runInternal(NodeTool.java:388)
>  at org.apache.cassandra.tools.NodeTool$NodeToolCmd.run(NodeTool.java:373) at 
> org.apache.cassandra.tools.NodeTool.execute(NodeTool.java:272) at 
> org.apache.cassandra.distributed.impl.Instance$DTestNodeTool.execute(Instance.java:1129)
>  at 
> org.apache.cassandra.distributed.impl.Instance.lambda$nodetoolResult$51(Instance.java:1038)
>  at org.apache.cassandra.concurrent.FutureTask.call(FutureTask.java:61) at 
> org.apache.cassandra.concurrent.FutureTask.run(FutureTask.java:71) at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)

[jira] [Updated] (CASSANDRA-19344) Range movements involving transient replicas must safely enact changes to read and write replica sets

2024-04-18 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19344:

Attachment: ci_summary-1.html

> Range movements involving transient replicas must safely enact changes to 
> read and write replica sets
> -
>
> Key: CASSANDRA-19344
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19344
> Project: Cassandra
>  Issue Type: Bug
>  Components: CI
>Reporter: Ekaterina Dimitrova
>Assignee: Sam Tunnicliffe
>Priority: Normal
> Fix For: 5.x
>
> Attachments: ci_summary-1.html, ci_summary.html, 
> remove-n4-post-19344.txt, remove-n4-pre-19344.txt, result_details.tar.gz
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> (edit) This was originally opened due to a flaky test 
> {{org.apache.cassandra.distributed.test.TransientRangeMovementTest.testRemoveNode-_jdk17}}
> The test can fail in two different ways:
> {code:java}
> junit.framework.AssertionFailedError: NOT IN CURRENT: 31 -- [(00,20), 
> (31,50)] at 
> org.apache.cassandra.distributed.test.TransientRangeMovementTest.assertAllContained(TransientRangeMovementTest.java:203)
>  at 
> org.apache.cassandra.distributed.test.TransientRangeMovementTest.testRemoveNode(TransientRangeMovementTest.java:183)
>  at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method) at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
>  at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43){code}
> as in here - 
> [https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra/2639/workflows/32b92ce7-5e9d-4efb-8362-d200d2414597/jobs/55139/tests#failed-test-0]
> and
> {code:java}
> junit.framework.AssertionFailedError: nodetool command [removenode, 
> 6d194555-f6eb-41d0-c000-0003, --force] was not successful stdout: 
> stderr: error: Node /127.0.0.4:7012 is alive and owns this ID. Use 
> decommission command to remove it from the ring -- StackTrace -- 
> java.lang.UnsupportedOperationException: Node /127.0.0.4:7012 is alive and 
> owns this ID. Use decommission command to remove it from the ring at 
> org.apache.cassandra.tcm.sequences.SingleNodeSequences.removeNode(SingleNodeSequences.java:110)
>  at 
> org.apache.cassandra.service.StorageService.removeNode(StorageService.java:3682)
>  at org.apache.cassandra.tools.NodeProbe.removeNode(NodeProbe.java:1020) at 
> org.apache.cassandra.tools.nodetool.RemoveNode.execute(RemoveNode.java:51) at 
> org.apache.cassandra.tools.NodeTool$NodeToolCmd.runInternal(NodeTool.java:388)
>  at org.apache.cassandra.tools.NodeTool$NodeToolCmd.run(NodeTool.java:373) at 
> org.apache.cassandra.tools.NodeTool.execute(NodeTool.java:272) at 
> org.apache.cassandra.distributed.impl.Instance$DTestNodeTool.execute(Instance.java:1129)
>  at 
> org.apache.cassandra.distributed.impl.Instance.lambda$nodetoolResult$51(Instance.java:1038)
>  at org.apache.cassandra.concurrent.FutureTask.call(FutureTask.java:61) at 
> org.apache.cassandra.concurrent.FutureTask.run(FutureTask.java:71) at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
>  at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
>  at 
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>  at java.base/java.lang.Thread.run(Thread.java:833) Notifications: Error: 
> java.lang.UnsupportedOperationException: Node /127.0.0.4:7012 is alive and 
> owns this ID. Use decommission command to remove it from the ring at 
> org.apache.cassandra.tcm.sequences.SingleNodeSequences.removeNode(SingleNodeSequences.java:110)
>  at 
> org.apache.cassandra.service.StorageService.removeNode(StorageService.java:3682)
>  at org.apache.cassandra.tools.NodeProbe.removeNode(NodeProbe.java:1020) at 
> org.apache.cassandra.tools.nodetool.RemoveNode.execute(RemoveNode.java:51) at 
> org.apache.cassandra.tools.NodeTool$NodeToolCmd.runInternal(NodeTool.java:388)
>  at org.apache.cassandra.tools.NodeTool$NodeToolCmd.run(NodeTool.java:373) at 
> org.apache.cassandra.tools.NodeTool.execute(NodeTool.java:272) at 
> org.apache.cassandra.distributed.impl.Instance$DTestNodeTool.execute(Instance.java:1129)
>  at 
> org.apache.cassandra.distributed.impl.Instance.lambda$nodetoolResult$51(Instance.java:1038)
>  at org.apache.cassandra.concurrent.FutureTask.call(FutureTask.java:61) at 
> org.apache.cassandra.concurrent.FutureTask.run(FutureTask.java:71) at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
>  at 
> 

[jira] [Updated] (CASSANDRA-19514) When jvm-dtest is shutting down an instance TCM retries block the shutdown causing the test to fail

2024-04-18 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19514:

Source Control Link: 
https://github.com/apache/cassandra/commit/cbf4dcb3345c7e2f42f6a897c66b6460b7acc2ca
  (was: 
https://github.com/apache/cassandra/commit/a5b8c06bb925905719261b1f449fffb049f54d1b)
 Resolution: Fixed
 Status: Resolved  (was: Ready to Commit)

Committed, thanks!

> When jvm-dtest is shutting down an instance TCM retries block the shutdown 
> causing the test to fail
> ---
>
> Key: CASSANDRA-19514
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19514
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Membership, Test/dtest/java
>Reporter: David Capwell
>Assignee: Sam Tunnicliffe
>Priority: Normal
> Fix For: 5.1
>
> Attachments: ci_summary.html, result_details.tar.gz
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> org.apache.cassandra.distributed.test.log.RequestCurrentEpochTest#testRequestingPeerWatermarks
> {code}
> java.lang.RuntimeException: java.util.concurrent.TimeoutException
>org.apache.cassandra.utils.Throwables.maybeFail(Throwables.java:79)
>
> org.apache.cassandra.utils.FBUtilities.waitOnFutures(FBUtilities.java:540)
>
> org.apache.cassandra.distributed.impl.AbstractCluster.close(AbstractCluster.java:1098)
>
> org.apache.cassandra.distributed.test.log.RequestCurrentEpochTest.testRequestingPeerWatermarks(RequestCurrentEpochTest.java:77)
>java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method)
>
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  Caused by: java.util.concurrent.TimeoutException
>
> org.apache.cassandra.utils.concurrent.AbstractFuture.get(AbstractFuture.java:253)
>
> org.apache.cassandra.utils.FBUtilities.waitOnFutures(FBUtilities.java:532) 
> Suppressed: java.util.concurrent.TimeoutException
> {code}
> In debugger I found the blocked future and it was 
> src/java/org/apache/cassandra/tcm/EpochAwareDebounce.java waiting on 
> src/java/org/apache/cassandra/tcm/RemoteProcessor.java retries
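
As a generic illustration of the failure mode (not the committed fix, and assuming nothing about the TCM classes): a shutdown path that waits on a future fed by an unbounded retry loop blocks until its own timeout, whereas completing the in-flight future exceptionally when shutdown begins unblocks the waiter immediately. A minimal sketch with plain {{CompletableFuture}}; the class name {{ShutdownRetrySketch}} is made up.

{code:java}
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

// Generic sketch of the failure mode described above: a caller waiting on a
// future that is fed by an unbounded retry loop will block shutdown until its
// own timeout fires. Completing outstanding futures exceptionally when
// shutdown starts lets the waiter return promptly instead of timing out.
public class ShutdownRetrySketch
{
    private static final CompletableFuture<String> inFlight = new CompletableFuture<>();

    public static void main(String[] args) throws Exception
    {
        // Simulated retry loop that never succeeds (e.g. the peer is gone),
        // so the future is never completed normally.
        Thread retrier = new Thread(() -> {
            while (!inFlight.isDone())
            {
                try { Thread.sleep(100); } catch (InterruptedException e) { return; }
            }
        });
        retrier.start();

        // Shutdown path: instead of waiting out the full timeout, fail the
        // outstanding future so anyone blocked on it unblocks immediately.
        inFlight.completeExceptionally(new IllegalStateException("shutting down"));

        try
        {
            inFlight.get(10, TimeUnit.SECONDS);
        }
        catch (Exception e)
        {
            System.out.println("waiter unblocked: " + e.getCause()); // shutting down
        }
        retrier.interrupt();
    }
}
{code}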



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19514) When jvm-dtest is shutting down an instance TCM retries block the shutdown causing the test to fail

2024-04-18 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19514:

Reviewers: Alex Petrov, Blake Eggleston, Marcus Eriksson, Sam Tunnicliffe  
(was: Alex Petrov, Blake Eggleston, Marcus Eriksson)
   Status: Review In Progress  (was: Patch Available)

> When jvm-dtest is shutting down an instance TCM retries block the shutdown 
> causing the test to fail
> ---
>
> Key: CASSANDRA-19514
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19514
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Membership, Test/dtest/java
>Reporter: David Capwell
>Assignee: Sam Tunnicliffe
>Priority: Normal
> Fix For: 5.1
>
> Attachments: ci_summary.html, result_details.tar.gz
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> org.apache.cassandra.distributed.test.log.RequestCurrentEpochTest#testRequestingPeerWatermarks
> {code}
> java.lang.RuntimeException: java.util.concurrent.TimeoutException
>org.apache.cassandra.utils.Throwables.maybeFail(Throwables.java:79)
>
> org.apache.cassandra.utils.FBUtilities.waitOnFutures(FBUtilities.java:540)
>
> org.apache.cassandra.distributed.impl.AbstractCluster.close(AbstractCluster.java:1098)
>
> org.apache.cassandra.distributed.test.log.RequestCurrentEpochTest.testRequestingPeerWatermarks(RequestCurrentEpochTest.java:77)
>java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method)
>
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  Caused by: java.util.concurrent.TimeoutException
>
> org.apache.cassandra.utils.concurrent.AbstractFuture.get(AbstractFuture.java:253)
>
> org.apache.cassandra.utils.FBUtilities.waitOnFutures(FBUtilities.java:532) 
> Suppressed: java.util.concurrent.TimeoutException
> {code}
> In debugger I found the blocked future and it was 
> src/java/org/apache/cassandra/tcm/EpochAwareDebounce.java waiting on 
> src/java/org/apache/cassandra/tcm/RemoteProcessor.java retries



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19514) When jvm-dtest is shutting down an instance TCM retries block the shutdown causing the test to fail

2024-04-18 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19514:

Status: Ready to Commit  (was: Review In Progress)

> When jvm-dtest is shutting down an instance TCM retries block the shutdown 
> causing the test to fail
> ---
>
> Key: CASSANDRA-19514
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19514
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Membership, Test/dtest/java
>Reporter: David Capwell
>Assignee: Sam Tunnicliffe
>Priority: Normal
> Fix For: 5.1
>
> Attachments: ci_summary.html, result_details.tar.gz
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> org.apache.cassandra.distributed.test.log.RequestCurrentEpochTest#testRequestingPeerWatermarks
> {code}
> java.lang.RuntimeException: java.util.concurrent.TimeoutException
>org.apache.cassandra.utils.Throwables.maybeFail(Throwables.java:79)
>
> org.apache.cassandra.utils.FBUtilities.waitOnFutures(FBUtilities.java:540)
>
> org.apache.cassandra.distributed.impl.AbstractCluster.close(AbstractCluster.java:1098)
>
> org.apache.cassandra.distributed.test.log.RequestCurrentEpochTest.testRequestingPeerWatermarks(RequestCurrentEpochTest.java:77)
>java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method)
>
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  Caused by: java.util.concurrent.TimeoutException
>
> org.apache.cassandra.utils.concurrent.AbstractFuture.get(AbstractFuture.java:253)
>
> org.apache.cassandra.utils.FBUtilities.waitOnFutures(FBUtilities.java:532) 
> Suppressed: java.util.concurrent.TimeoutException
> {code}
> In debugger I found the blocked future and it was 
> src/java/org/apache/cassandra/tcm/EpochAwareDebounce.java waiting on 
> src/java/org/apache/cassandra/tcm/RemoteProcessor.java retries



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19538) Test Failure: test_assassinate_valid_node

2024-04-18 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19538:

  Fix Version/s: 5.1
 (was: 5.x)
  Since Version: NA
Source Control Link: 
https://github.com/apache/cassandra/commit/80971709b983566a3f2dbfc189dfa1c5367d69bb
 Resolution: Fixed
 Status: Resolved  (was: Ready to Commit)

Merged to trunk (with ninja followup because *someone* forgot to add 
{{CHANGES.txt}})

> Test Failure: test_assassinate_valid_node
> -
>
> Key: CASSANDRA-19538
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19538
> Project: Cassandra
>  Issue Type: Bug
>  Components: CI, Test/dtest/python
>Reporter: Ekaterina Dimitrova
>Assignee: Marcus Eriksson
>Priority: Normal
> Fix For: 5.1
>
> Attachments: ci_summary-1.html, ci_summary.html
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Failing consistently on trunk:
> {code:java}
> ccmlib.node.TimeoutError: 03 Apr 2024 19:39:32 [node1] after 120.11/120 
> seconds Missing: ['127.0.0.4:7000.* is now UP'] not found in system.log:
>  Head: INFO  [Messaging-EventLoop-3-1] 2024-04-03 19:37:3
>  Tail: ... some nodes were not ready
> INFO  [OptionalTasks:1] 2024-04-03 19:39:30,454 CassandraRoleManager.java:484 
> - Setup task failed with error, rescheduling
> self = 
> def test_assassinate_valid_node(self):
> """
> @jira_ticket CASSANDRA-16588
> Test that after taking two non-seed nodes down and assassinating
> one of them, the other can come back up.
> """
> cluster = self.cluster
> 
> cluster.populate(5).start()
> node1 = cluster.nodelist()[0]
> node3 = cluster.nodelist()[2]
> 
> self.cluster.set_configuration_options({
> 'seed_provider': [{'class_name': 
> 'org.apache.cassandra.locator.SimpleSeedProvider',
>'parameters': [{'seeds': node1.address()}]
>   }]
> })
> 
> non_seed_nodes = cluster.nodelist()[-2:]
> for node in non_seed_nodes:
> node.stop()
> 
> assassination_target = non_seed_nodes[0]
> logger.debug("Assassinating non-seed node 
> {}".format(assassination_target.address()))
> out, err, _ = node1.nodetool("assassinate 
> {}".format(assassination_target.address()))
> assert_stderr_clean(err)
> 
> logger.debug("Starting non-seed nodes")
> for node in non_seed_nodes:
> >   node.start()
> gossip_test.py:78: 
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
> _ 
> ../env3.8/lib/python3.8/site-packages/ccmlib/node.py:915: in start
> node.watch_log_for_alive(self, from_mark=mark)
> ../env3.8/lib/python3.8/site-packages/ccmlib/node.py:684: in 
> watch_log_for_alive
> self.watch_log_for(tofind, from_mark=from_mark, timeout=timeout, 
> filename=filename)
> ../env3.8/lib/python3.8/site-packages/ccmlib/node.py:608: in watch_log_for
> TimeoutError.raise_if_passed(start=start, timeout=timeout, node=self.name,
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
> _ 
> start = 1712173052.8186479, timeout = 120
> msg = "Missing: ['127.0.0.4:7000.* is now UP'] not found in system.log:\n 
> Head: INFO  [Messaging-EventLoop-3-1] 2024-04-03 1...[OptionalTasks:1] 
> 2024-04-03 19:39:30,454 CassandraRoleManager.java:484 - Setup task failed 
> with error, rescheduling\n"
> node = 'node1'
> @staticmethod
> def raise_if_passed(start, timeout, msg, node=None):
> if start + timeout < time.time():
> >   raise TimeoutError.create(start, timeout, msg, node)
> E   ccmlib.node.TimeoutError: 03 Apr 2024 19:39:32 [node1] after 
> 120.11/120 seconds Missing: ['127.0.0.4:7000.* is now UP'] not found in 
> system.log:
> EHead: INFO  [Messaging-EventLoop-3-1] 2024-04-03 19:37:3
> ETail: ... some nodes were not ready
> E   INFO  [OptionalTasks:1] 2024-04-03 19:39:30,454 
> CassandraRoleManager.java:484 - Setup task failed with error, rescheduling
> ../env3.8/lib/python3.8/site-packages/ccmlib/node.py:56: TimeoutError
> {code}
> https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra/2680/workflows/8b1c0d0a-7458-4b43-9bba-ac96b9bfe64f/jobs/58929/tests#failed-test-0
> https://ci-cassandra.apache.org/job/Cassandra-trunk/1859/#showFailuresLink



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19538) Test Failure: test_assassinate_valid_node

2024-04-18 Thread Sam Tunnicliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17838624#comment-17838624
 ] 

Sam Tunnicliffe commented on CASSANDRA-19538:
-

+1 to the followup commit too

> Test Failure: test_assassinate_valid_node
> -
>
> Key: CASSANDRA-19538
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19538
> Project: Cassandra
>  Issue Type: Bug
>  Components: CI, Test/dtest/python
>Reporter: Ekaterina Dimitrova
>Assignee: Marcus Eriksson
>Priority: Normal
> Fix For: 5.x
>
> Attachments: ci_summary.html
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Failing consistently on trunk:
> {code:java}
> ccmlib.node.TimeoutError: 03 Apr 2024 19:39:32 [node1] after 120.11/120 
> seconds Missing: ['127.0.0.4:7000.* is now UP'] not found in system.log:
>  Head: INFO  [Messaging-EventLoop-3-1] 2024-04-03 19:37:3
>  Tail: ... some nodes were not ready
> INFO  [OptionalTasks:1] 2024-04-03 19:39:30,454 CassandraRoleManager.java:484 
> - Setup task failed with error, rescheduling
> self = 
> def test_assassinate_valid_node(self):
> """
> @jira_ticket CASSANDRA-16588
> Test that after taking two non-seed nodes down and assassinating
> one of them, the other can come back up.
> """
> cluster = self.cluster
> 
> cluster.populate(5).start()
> node1 = cluster.nodelist()[0]
> node3 = cluster.nodelist()[2]
> 
> self.cluster.set_configuration_options({
> 'seed_provider': [{'class_name': 
> 'org.apache.cassandra.locator.SimpleSeedProvider',
>'parameters': [{'seeds': node1.address()}]
>   }]
> })
> 
> non_seed_nodes = cluster.nodelist()[-2:]
> for node in non_seed_nodes:
> node.stop()
> 
> assassination_target = non_seed_nodes[0]
> logger.debug("Assassinating non-seed node 
> {}".format(assassination_target.address()))
> out, err, _ = node1.nodetool("assassinate 
> {}".format(assassination_target.address()))
> assert_stderr_clean(err)
> 
> logger.debug("Starting non-seed nodes")
> for node in non_seed_nodes:
> >   node.start()
> gossip_test.py:78: 
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
> _ 
> ../env3.8/lib/python3.8/site-packages/ccmlib/node.py:915: in start
> node.watch_log_for_alive(self, from_mark=mark)
> ../env3.8/lib/python3.8/site-packages/ccmlib/node.py:684: in 
> watch_log_for_alive
> self.watch_log_for(tofind, from_mark=from_mark, timeout=timeout, 
> filename=filename)
> ../env3.8/lib/python3.8/site-packages/ccmlib/node.py:608: in watch_log_for
> TimeoutError.raise_if_passed(start=start, timeout=timeout, node=self.name,
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
> _ 
> start = 1712173052.8186479, timeout = 120
> msg = "Missing: ['127.0.0.4:7000.* is now UP'] not found in system.log:\n 
> Head: INFO  [Messaging-EventLoop-3-1] 2024-04-03 1...[OptionalTasks:1] 
> 2024-04-03 19:39:30,454 CassandraRoleManager.java:484 - Setup task failed 
> with error, rescheduling\n"
> node = 'node1'
> @staticmethod
> def raise_if_passed(start, timeout, msg, node=None):
> if start + timeout < time.time():
> >   raise TimeoutError.create(start, timeout, msg, node)
> E   ccmlib.node.TimeoutError: 03 Apr 2024 19:39:32 [node1] after 
> 120.11/120 seconds Missing: ['127.0.0.4:7000.* is now UP'] not found in 
> system.log:
> EHead: INFO  [Messaging-EventLoop-3-1] 2024-04-03 19:37:3
> ETail: ... some nodes were not ready
> E   INFO  [OptionalTasks:1] 2024-04-03 19:39:30,454 
> CassandraRoleManager.java:484 - Setup task failed with error, rescheduling
> ../env3.8/lib/python3.8/site-packages/ccmlib/node.py:56: TimeoutError
> {code}
> https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra/2680/workflows/8b1c0d0a-7458-4b43-9bba-ac96b9bfe64f/jobs/58929/tests#failed-test-0
> https://ci-cassandra.apache.org/job/Cassandra-trunk/1859/#showFailuresLink



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19567) Minimize the heap consumption when registering metrics

2024-04-18 Thread Sam Tunnicliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17838546#comment-17838546
 ] 

Sam Tunnicliffe commented on CASSANDRA-19567:
-

bq. The problem is only reproducible on the x86 machine, the problem is not 
reproducible on the arm64. 

We've observed the increased heap usage on Apple silicon, so I don't believe 
this is entirely true.

> Minimize the heap consumption when registering metrics
> --
>
> Key: CASSANDRA-19567
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19567
> Project: Cassandra
>  Issue Type: Bug
>  Components: Observability/Metrics
>Reporter: Maxim Muzafarov
>Assignee: Maxim Muzafarov
>Priority: Normal
> Fix For: 5.x
>
>
> The problem is only reproducible on x86 machines; it is not reproducible on 
> arm64. A quick analysis showed a lot of MetricName objects stored in the 
> heap; although the real cause could be related to something else, the 
> MetricName object requires extra attention.
> To reproduce, run the following command locally:
> {code}
> ant test-jvm-dtest-some 
> -Dtest.name=org.apache.cassandra.distributed.test.ReadRepairTest
> {code}
> The error:
> {code:java}
> [junit-timeout] Exception in thread "main" java.lang.OutOfMemoryError: Java 
> heap space
> [junit-timeout]     at 
> java.base/java.lang.StringLatin1.newString(StringLatin1.java:769)
> [junit-timeout]     at 
> java.base/java.lang.StringBuffer.toString(StringBuffer.java:716)
> [junit-timeout]     at 
> org.apache.cassandra.CassandraBriefJUnitResultFormatter.endTestSuite(CassandraBriefJUnitResultFormatter.java:191)
> [junit-timeout]     at 
> org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.fireEndTestSuite(JUnitTestRunner.java:854)
> [junit-timeout]     at 
> org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:578)
> [junit-timeout]     at 
> org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:1197)
> [junit-timeout]     at 
> org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:1042)
> [junit-timeout] Testsuite: 
> org.apache.cassandra.distributed.test.ReadRepairTest-cassandra.testtag_IS_UNDEFINED
> [junit-timeout] Testsuite: 
> org.apache.cassandra.distributed.test.ReadRepairTest-cassandra.testtag_IS_UNDEFINED
>  Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0 sec
> [junit-timeout] 
> [junit-timeout] Testcase: 
> org.apache.cassandra.distributed.test.ReadRepairTest:readRepairRTRangeMovementTest-cassandra.testtag_IS_UNDEFINED:
>     Caused an ERROR
> [junit-timeout] Forked Java VM exited abnormally. Please note the time in the 
> report does not reflect the time until the VM exit.
> [junit-timeout] junit.framework.AssertionFailedError: Forked Java VM exited 
> abnormally. Please note the time in the report does not reflect the time 
> until the VM exit.
> [junit-timeout]     at 
> jdk.internal.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
> [junit-timeout]     at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> [junit-timeout]     at java.base/java.util.Vector.forEach(Vector.java:1365)
> [junit-timeout]     at 
> jdk.internal.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
> [junit-timeout]     at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> [junit-timeout]     at 
> jdk.internal.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
> [junit-timeout]     at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> [junit-timeout]     at java.base/java.util.Vector.forEach(Vector.java:1365)
> [junit-timeout]     at 
> jdk.internal.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
> [junit-timeout]     at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> [junit-timeout]     at 
> jdk.internal.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
> [junit-timeout]     at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> [junit-timeout] 
> [junit-timeout] 
> [junit-timeout] Test org.apache.cassandra.distributed.test.ReadRepairTest 
> FAILED (crashed)BUILD FAILED
>  {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19538) Test Failure: test_assassinate_valid_node

2024-04-17 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19538:

Status: Ready to Commit  (was: Review In Progress)

+1 LGTM. 

Using an incorrect {{lastModified}} was causing updates to gossip state not to 
fire, because it looked to the listener like nothing had changed.
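
For illustration only, the failure mode amounts to a listener short-circuiting 
on a timestamp that never looks newer; the names below are hypothetical, not 
the actual Gossiper code.

{code:java}
// Minimal sketch of the failure mode, with hypothetical names. If
// lastModified is populated incorrectly, the "unchanged" short-circuit
// swallows real updates and the gossip state change never fires.
final class GossipStateListener
{
    private long lastSeenModified = -1;

    void onStateChange(long lastModified, Runnable applyUpdate)
    {
        // With a stale lastModified this always looks like "nothing changed".
        if (lastModified <= lastSeenModified)
            return;

        lastSeenModified = lastModified;
        applyUpdate.run();
    }
}
{code}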

> Test Failure: test_assassinate_valid_node
> -
>
> Key: CASSANDRA-19538
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19538
> Project: Cassandra
>  Issue Type: Bug
>  Components: CI, Test/dtest/python
>Reporter: Ekaterina Dimitrova
>Assignee: Marcus Eriksson
>Priority: Normal
> Fix For: 5.x
>
> Attachments: ci_summary.html
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Failing consistently on trunk:
> {code:java}
> ccmlib.node.TimeoutError: 03 Apr 2024 19:39:32 [node1] after 120.11/120 
> seconds Missing: ['127.0.0.4:7000.* is now UP'] not found in system.log:
>  Head: INFO  [Messaging-EventLoop-3-1] 2024-04-03 19:37:3
>  Tail: ... some nodes were not ready
> INFO  [OptionalTasks:1] 2024-04-03 19:39:30,454 CassandraRoleManager.java:484 
> - Setup task failed with error, rescheduling
> self = 
> def test_assassinate_valid_node(self):
> """
> @jira_ticket CASSANDRA-16588
> Test that after taking two non-seed nodes down and assassinating
> one of them, the other can come back up.
> """
> cluster = self.cluster
> 
> cluster.populate(5).start()
> node1 = cluster.nodelist()[0]
> node3 = cluster.nodelist()[2]
> 
> self.cluster.set_configuration_options({
> 'seed_provider': [{'class_name': 
> 'org.apache.cassandra.locator.SimpleSeedProvider',
>'parameters': [{'seeds': node1.address()}]
>   }]
> })
> 
> non_seed_nodes = cluster.nodelist()[-2:]
> for node in non_seed_nodes:
> node.stop()
> 
> assassination_target = non_seed_nodes[0]
> logger.debug("Assassinating non-seed node 
> {}".format(assassination_target.address()))
> out, err, _ = node1.nodetool("assassinate 
> {}".format(assassination_target.address()))
> assert_stderr_clean(err)
> 
> logger.debug("Starting non-seed nodes")
> for node in non_seed_nodes:
> >   node.start()
> gossip_test.py:78: 
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
> _ 
> ../env3.8/lib/python3.8/site-packages/ccmlib/node.py:915: in start
> node.watch_log_for_alive(self, from_mark=mark)
> ../env3.8/lib/python3.8/site-packages/ccmlib/node.py:684: in 
> watch_log_for_alive
> self.watch_log_for(tofind, from_mark=from_mark, timeout=timeout, 
> filename=filename)
> ../env3.8/lib/python3.8/site-packages/ccmlib/node.py:608: in watch_log_for
> TimeoutError.raise_if_passed(start=start, timeout=timeout, node=self.name,
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
> _ 
> start = 1712173052.8186479, timeout = 120
> msg = "Missing: ['127.0.0.4:7000.* is now UP'] not found in system.log:\n 
> Head: INFO  [Messaging-EventLoop-3-1] 2024-04-03 1...[OptionalTasks:1] 
> 2024-04-03 19:39:30,454 CassandraRoleManager.java:484 - Setup task failed 
> with error, rescheduling\n"
> node = 'node1'
> @staticmethod
> def raise_if_passed(start, timeout, msg, node=None):
> if start + timeout < time.time():
> >   raise TimeoutError.create(start, timeout, msg, node)
> E   ccmlib.node.TimeoutError: 03 Apr 2024 19:39:32 [node1] after 
> 120.11/120 seconds Missing: ['127.0.0.4:7000.* is now UP'] not found in 
> system.log:
> EHead: INFO  [Messaging-EventLoop-3-1] 2024-04-03 19:37:3
> ETail: ... some nodes were not ready
> E   INFO  [OptionalTasks:1] 2024-04-03 19:39:30,454 
> CassandraRoleManager.java:484 - Setup task failed with error, rescheduling
> ../env3.8/lib/python3.8/site-packages/ccmlib/node.py:56: TimeoutError
> {code}
> https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra/2680/workflows/8b1c0d0a-7458-4b43-9bba-ac96b9bfe64f/jobs/58929/tests#failed-test-0
> https://ci-cassandra.apache.org/job/Cassandra-trunk/1859/#showFailuresLink



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19538) Test Failure: test_assassinate_valid_node

2024-04-17 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19538:

Reviewers: Sam Tunnicliffe  (was: Sam Tunnicliffe)
   Status: Review In Progress  (was: Patch Available)

> Test Failure: test_assassinate_valid_node
> -
>
> Key: CASSANDRA-19538
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19538
> Project: Cassandra
>  Issue Type: Bug
>  Components: CI, Test/dtest/python
>Reporter: Ekaterina Dimitrova
>Assignee: Marcus Eriksson
>Priority: Normal
> Fix For: 5.x
>
> Attachments: ci_summary.html
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Failing consistently on trunk:
> {code:java}
> ccmlib.node.TimeoutError: 03 Apr 2024 19:39:32 [node1] after 120.11/120 
> seconds Missing: ['127.0.0.4:7000.* is now UP'] not found in system.log:
>  Head: INFO  [Messaging-EventLoop-3-1] 2024-04-03 19:37:3
>  Tail: ... some nodes were not ready
> INFO  [OptionalTasks:1] 2024-04-03 19:39:30,454 CassandraRoleManager.java:484 
> - Setup task failed with error, rescheduling
> self = 
> def test_assassinate_valid_node(self):
> """
> @jira_ticket CASSANDRA-16588
> Test that after taking two non-seed nodes down and assassinating
> one of them, the other can come back up.
> """
> cluster = self.cluster
> 
> cluster.populate(5).start()
> node1 = cluster.nodelist()[0]
> node3 = cluster.nodelist()[2]
> 
> self.cluster.set_configuration_options({
> 'seed_provider': [{'class_name': 
> 'org.apache.cassandra.locator.SimpleSeedProvider',
>'parameters': [{'seeds': node1.address()}]
>   }]
> })
> 
> non_seed_nodes = cluster.nodelist()[-2:]
> for node in non_seed_nodes:
> node.stop()
> 
> assassination_target = non_seed_nodes[0]
> logger.debug("Assassinating non-seed node 
> {}".format(assassination_target.address()))
> out, err, _ = node1.nodetool("assassinate 
> {}".format(assassination_target.address()))
> assert_stderr_clean(err)
> 
> logger.debug("Starting non-seed nodes")
> for node in non_seed_nodes:
> >   node.start()
> gossip_test.py:78: 
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
> _ 
> ../env3.8/lib/python3.8/site-packages/ccmlib/node.py:915: in start
> node.watch_log_for_alive(self, from_mark=mark)
> ../env3.8/lib/python3.8/site-packages/ccmlib/node.py:684: in 
> watch_log_for_alive
> self.watch_log_for(tofind, from_mark=from_mark, timeout=timeout, 
> filename=filename)
> ../env3.8/lib/python3.8/site-packages/ccmlib/node.py:608: in watch_log_for
> TimeoutError.raise_if_passed(start=start, timeout=timeout, node=self.name,
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
> _ 
> start = 1712173052.8186479, timeout = 120
> msg = "Missing: ['127.0.0.4:7000.* is now UP'] not found in system.log:\n 
> Head: INFO  [Messaging-EventLoop-3-1] 2024-04-03 1...[OptionalTasks:1] 
> 2024-04-03 19:39:30,454 CassandraRoleManager.java:484 - Setup task failed 
> with error, rescheduling\n"
> node = 'node1'
> @staticmethod
> def raise_if_passed(start, timeout, msg, node=None):
> if start + timeout < time.time():
> >   raise TimeoutError.create(start, timeout, msg, node)
> E   ccmlib.node.TimeoutError: 03 Apr 2024 19:39:32 [node1] after 
> 120.11/120 seconds Missing: ['127.0.0.4:7000.* is now UP'] not found in 
> system.log:
> EHead: INFO  [Messaging-EventLoop-3-1] 2024-04-03 19:37:3
> ETail: ... some nodes were not ready
> E   INFO  [OptionalTasks:1] 2024-04-03 19:39:30,454 
> CassandraRoleManager.java:484 - Setup task failed with error, rescheduling
> ../env3.8/lib/python3.8/site-packages/ccmlib/node.py:56: TimeoutError
> {code}
> https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra/2680/workflows/8b1c0d0a-7458-4b43-9bba-ac96b9bfe64f/jobs/58929/tests#failed-test-0
> https://ci-cassandra.apache.org/job/Cassandra-trunk/1859/#showFailuresLink



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19514) When jvm-dtest is shutting down an instance TCM retries block the shutdown causing the test to fail

2024-04-17 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19514:

Reviewers: Alex Petrov, Blake Eggleston, Marcus Eriksson  (was: Blake 
Eggleston)

> When jvm-dtest is shutting down an instance TCM retries block the shutdown 
> causing the test to fail
> ---
>
> Key: CASSANDRA-19514
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19514
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Membership, Test/dtest/java
>Reporter: David Capwell
>Assignee: Sam Tunnicliffe
>Priority: Normal
> Fix For: 5.1
>
> Attachments: ci_summary.html, result_details.tar.gz
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> org.apache.cassandra.distributed.test.log.RequestCurrentEpochTest#testRequestingPeerWatermarks
> {code}
> java.lang.RuntimeException: java.util.concurrent.TimeoutException
>org.apache.cassandra.utils.Throwables.maybeFail(Throwables.java:79)
>
> org.apache.cassandra.utils.FBUtilities.waitOnFutures(FBUtilities.java:540)
>
> org.apache.cassandra.distributed.impl.AbstractCluster.close(AbstractCluster.java:1098)
>
> org.apache.cassandra.distributed.test.log.RequestCurrentEpochTest.testRequestingPeerWatermarks(RequestCurrentEpochTest.java:77)
>java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method)
>
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  Caused by: java.util.concurrent.TimeoutException
>
> org.apache.cassandra.utils.concurrent.AbstractFuture.get(AbstractFuture.java:253)
>
> org.apache.cassandra.utils.FBUtilities.waitOnFutures(FBUtilities.java:532) 
> Suppressed: java.util.concurrent.TimeoutException
> {code}
> In the debugger I found the blocked future; it was 
> src/java/org/apache/cassandra/tcm/EpochAwareDebounce.java waiting on 
> src/java/org/apache/cassandra/tcm/RemoteProcessor.java retries.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19514) When jvm-dtest is shutting down an instance TCM retries block the shutdown causing the test to fail

2024-04-17 Thread Sam Tunnicliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17838035#comment-17838035
 ] 

Sam Tunnicliffe commented on CASSANDRA-19514:
-

Trunk PR and CI results attached. The python dtest failure is CASSANDRA-19538; 
of the 2 python upgrade test failures, one looks like CASSANDRA-19520 and the 
other may be timeout related.

> When jvm-dtest is shutting down an instance TCM retries block the shutdown 
> causing the test to fail
> ---
>
> Key: CASSANDRA-19514
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19514
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Membership, Test/dtest/java
>Reporter: David Capwell
>Assignee: Sam Tunnicliffe
>Priority: Normal
> Fix For: 5.1
>
> Attachments: ci_summary.html, result_details.tar.gz
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> org.apache.cassandra.distributed.test.log.RequestCurrentEpochTest#testRequestingPeerWatermarks
> {code}
> java.lang.RuntimeException: java.util.concurrent.TimeoutException
>org.apache.cassandra.utils.Throwables.maybeFail(Throwables.java:79)
>
> org.apache.cassandra.utils.FBUtilities.waitOnFutures(FBUtilities.java:540)
>
> org.apache.cassandra.distributed.impl.AbstractCluster.close(AbstractCluster.java:1098)
>
> org.apache.cassandra.distributed.test.log.RequestCurrentEpochTest.testRequestingPeerWatermarks(RequestCurrentEpochTest.java:77)
>java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method)
>
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  Caused by: java.util.concurrent.TimeoutException
>
> org.apache.cassandra.utils.concurrent.AbstractFuture.get(AbstractFuture.java:253)
>
> org.apache.cassandra.utils.FBUtilities.waitOnFutures(FBUtilities.java:532) 
> Suppressed: java.util.concurrent.TimeoutException
> {code}
> In the debugger I found the blocked future; it was 
> src/java/org/apache/cassandra/tcm/EpochAwareDebounce.java waiting on 
> src/java/org/apache/cassandra/tcm/RemoteProcessor.java retries.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19514) When jvm-dtest is shutting down an instance TCM retries block the shutdown causing the test to fail

2024-04-17 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19514:

Attachment: ci_summary.html
result_details.tar.gz

> When jvm-dtest is shutting down an instance TCM retries block the shutdown 
> causing the test to fail
> ---
>
> Key: CASSANDRA-19514
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19514
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Membership, Test/dtest/java
>Reporter: David Capwell
>Assignee: Sam Tunnicliffe
>Priority: Normal
> Fix For: 5.1
>
> Attachments: ci_summary.html, result_details.tar.gz
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> org.apache.cassandra.distributed.test.log.RequestCurrentEpochTest#testRequestingPeerWatermarks
> {code}
> java.lang.RuntimeException: java.util.concurrent.TimeoutException
>org.apache.cassandra.utils.Throwables.maybeFail(Throwables.java:79)
>
> org.apache.cassandra.utils.FBUtilities.waitOnFutures(FBUtilities.java:540)
>
> org.apache.cassandra.distributed.impl.AbstractCluster.close(AbstractCluster.java:1098)
>
> org.apache.cassandra.distributed.test.log.RequestCurrentEpochTest.testRequestingPeerWatermarks(RequestCurrentEpochTest.java:77)
>java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method)
>
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  Caused by: java.util.concurrent.TimeoutException
>
> org.apache.cassandra.utils.concurrent.AbstractFuture.get(AbstractFuture.java:253)
>
> org.apache.cassandra.utils.FBUtilities.waitOnFutures(FBUtilities.java:532) 
> Suppressed: java.util.concurrent.TimeoutException
> {code}
> In the debugger I found the blocked future; it was 
> src/java/org/apache/cassandra/tcm/EpochAwareDebounce.java waiting on 
> src/java/org/apache/cassandra/tcm/RemoteProcessor.java retries.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19514) When jvm-dtest is shutting down an instance TCM retries block the shutdown causing the test to fail

2024-04-17 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19514:

Status: Patch Available  (was: In Progress)

> When jvm-dtest is shutting down an instance TCM retries block the shutdown 
> causing the test to fail
> ---
>
> Key: CASSANDRA-19514
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19514
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Membership, Test/dtest/java
>Reporter: David Capwell
>Assignee: Sam Tunnicliffe
>Priority: Normal
> Fix For: 5.1
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> org.apache.cassandra.distributed.test.log.RequestCurrentEpochTest#testRequestingPeerWatermarks
> {code}
> java.lang.RuntimeException: java.util.concurrent.TimeoutException
>org.apache.cassandra.utils.Throwables.maybeFail(Throwables.java:79)
>
> org.apache.cassandra.utils.FBUtilities.waitOnFutures(FBUtilities.java:540)
>
> org.apache.cassandra.distributed.impl.AbstractCluster.close(AbstractCluster.java:1098)
>
> org.apache.cassandra.distributed.test.log.RequestCurrentEpochTest.testRequestingPeerWatermarks(RequestCurrentEpochTest.java:77)
>java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method)
>
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  Caused by: java.util.concurrent.TimeoutException
>
> org.apache.cassandra.utils.concurrent.AbstractFuture.get(AbstractFuture.java:253)
>
> org.apache.cassandra.utils.FBUtilities.waitOnFutures(FBUtilities.java:532) 
> Suppressed: java.util.concurrent.TimeoutException
> {code}
> In the debugger I found the blocked future; it was 
> src/java/org/apache/cassandra/tcm/EpochAwareDebounce.java waiting on 
> src/java/org/apache/cassandra/tcm/RemoteProcessor.java retries.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-12937) Default setting (yaml) for SSTable compression

2024-04-15 Thread Sam Tunnicliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-12937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17837201#comment-17837201
 ] 

Sam Tunnicliffe commented on CASSANDRA-12937:
-

So you're suggesting that we could have different default values in yaml across 
the cluster, but that all nodes actually apply the same value regardless of 
their own configured default? Which specific value makes it into the schema 
would then just depend on which instance acts as the coordinator for a given 
DDL statement? It seems like, if we actually want these to be cluster-wide 
values and not configurable on a per-node basis, the defaults themselves should 
be in TCM, independently of the schema transformations. In the past, we've used 
per-node configuration like this to experiment with new compression algorithms, 
and I can imagine potentially wanting to do the same with things like 
compaction, so I'm not entirely convinced that this assumption is correct. 

As far as the serialization format goes, schema transformations have to be 
round-trippable via CQL for the purposes of recreating a schema from a 
snapshot. So I don't think that using the CQL itself as the format is 
inherently flawed, and it does have a couple of big positives: it's great for 
visibility (for operators or when debugging) and it doesn't invent a new format 
that we would have to version and manage as new flags/features/defaults are 
added. 
   
We would just need to fully resolve and expand a DDL statement before 
serializing it in {{SchemaAlteringStatement}}, which would be entirely 
possible, but I remain unconvinced that just picking the defaults from whatever 
node happens to be coordinating is the right way to go.   
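
To make the "fully resolve and expand before serializing" idea concrete, here 
is a minimal sketch of the coordinator-side expansion step. It is only an 
illustration: {{CoordinatorDefaults}} and {{expand}} are hypothetical names, 
not existing Cassandra APIs, and the real change would live wherever the CREATE 
TABLE statement is turned into a schema transformation.

{code:java}
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: fill in any table options the user did not specify
// with the coordinator's own configured defaults, so the statement that is
// serialized and submitted to the CMS no longer depends on local config.
final class CoordinatorDefaults
{
    private final Map<String, String> compressionDefaults;

    CoordinatorDefaults(Map<String, String> compressionDefaults)
    {
        this.compressionDefaults = compressionDefaults;
    }

    Map<String, String> expand(Map<String, String> requestedOptions)
    {
        Map<String, String> expanded = new HashMap<>(compressionDefaults);
        expanded.putAll(requestedOptions); // explicit options win over defaults
        return expanded;
    }
}
{code}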

> Default setting (yaml) for SSTable compression
> --
>
> Key: CASSANDRA-12937
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12937
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Local/Config
>Reporter: Michael Semb Wever
>Assignee: Stefan Miklosovic
>Priority: Low
>  Labels: AdventCalendar2021
> Fix For: 5.x
>
>  Time Spent: 8h
>  Remaining Estimate: 0h
>
> In many situations the choice of compression for sstables is more relevant to 
> the disks attached than to the schema and data.
> This issue is to add to cassandra.yaml a default value for sstable 
> compression that new tables will inherit (instead of the defaults found in 
> {{CompressionParams.DEFAULT}}).
> Examples where this can be relevant are filesystems that do on-the-fly 
> compression (btrfs, zfs), specific disk configurations, or even specific C* 
> versions (see CASSANDRA-10995).
> +Additional information for newcomers+
> Some new fields need to be added to {{cassandra.yaml}} to allow specifying 
> the values required to define the default compression parameters. In 
> {{DatabaseDescriptor}} a new {{CompressionParams}} field should be added for 
> the default compression. This field should be initialized in 
> {{DatabaseDescriptor.applySimpleConfig()}}. At the different places where 
> {{CompressionParams.DEFAULT}} was used, the code should call 
> {{DatabaseDescriptor#getDefaultCompressionParams}}, which should return a 
> copy of the configured {{CompressionParams}}.
> A unit test using {{OverrideConfigurationLoader}} should be added to verify 
> that the table schema uses the new default when a new table is created (see 
> CreateTest for an example).
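
As a rough sketch of the newcomer notes above: the only names taken from the 
description are {{applySimpleConfig}} and {{getDefaultCompressionParams}}; 
everything else below (class names, fields, yaml-derived parameters) is an 
assumption for illustration, not the committed implementation.

{code:java}
// Sketch only: DatabaseDescriptor holding a configured default compression,
// per the description above. Names other than applySimpleConfig() and
// getDefaultCompressionParams() are hypothetical.
public class DatabaseDescriptorSketch
{
    private static CompressionParamsSketch defaultCompression = CompressionParamsSketch.DEFAULT;

    // Would be called from applySimpleConfig(): build the default from the
    // yaml settings, or keep the hard-coded DEFAULT when nothing is set.
    static void applyCompressionDefaults(String yamlClassName, int chunkLengthKb)
    {
        if (yamlClassName != null)
            defaultCompression = new CompressionParamsSketch(yamlClassName, chunkLengthKb);
    }

    // Call sites that previously used CompressionParams.DEFAULT would use
    // this instead; returning a copy keeps the shared default immutable.
    public static CompressionParamsSketch getDefaultCompressionParams()
    {
        return defaultCompression.copy();
    }
}

class CompressionParamsSketch
{
    static final CompressionParamsSketch DEFAULT = new CompressionParamsSketch("LZ4Compressor", 16);

    final String className;
    final int chunkLengthKb;

    CompressionParamsSketch(String className, int chunkLengthKb)
    {
        this.className = className;
        this.chunkLengthKb = chunkLengthKb;
    }

    CompressionParamsSketch copy()
    {
        return new CompressionParamsSketch(className, chunkLengthKb);
    }
}
{code}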



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-12937) Default setting (yaml) for SSTable compression

2024-04-15 Thread Sam Tunnicliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-12937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17837179#comment-17837179
 ] 

Sam Tunnicliffe edited comment on CASSANDRA-12937 at 4/15/24 11:11 AM:
---

The problem with that is that the defaults may be different on every instance, 
so what exactly should be stored in the TCM log? Ideally we should store the 
value that is actually resolved during initial execution on each node so that 
it can be re-used if/when the transformation is reapplied. That should probably 
be in a parallel local data structure though, not in the node's local log table 
as we don't want to ship those local defaults to peers when providing log 
catchup (because they should use their own defaults).  


was (Author: beobal):
The problem with that is that the defaults may be different on every instance, 
so what exactly should be stored in the TCM log? Ideally we should store the 
value that is actually resolved during initial execution on each node to be 
persisted locally so that it can be re-used if/when the transformation is 
reapplied. That should probably be in a parallel local data structure though, 
not in the node's local log table as we don't want to ship those local defaults 
to peers when providing log catchup (because they should use their own 
defaults).  

> Default setting (yaml) for SSTable compression
> --
>
> Key: CASSANDRA-12937
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12937
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Local/Config
>Reporter: Michael Semb Wever
>Assignee: Stefan Miklosovic
>Priority: Low
>  Labels: AdventCalendar2021
> Fix For: 5.x
>
>  Time Spent: 8h
>  Remaining Estimate: 0h
>
> In many situations the choice of compression for sstables is more relevant to 
> the disks attached than to the schema and data.
> This issue is to add to cassandra.yaml a default value for sstable 
> compression that new tables will inherit (instead of the defaults found in 
> {{CompressionParams.DEFAULT}}).
> Examples where this can be relevant are filesystems that do on-the-fly 
> compression (btrfs, zfs), specific disk configurations, or even specific C* 
> versions (see CASSANDRA-10995).
> +Additional information for newcomers+
> Some new fields need to be added to {{cassandra.yaml}} to allow specifying 
> the values required to define the default compression parameters. In 
> {{DatabaseDescriptor}} a new {{CompressionParams}} field should be added for 
> the default compression. This field should be initialized in 
> {{DatabaseDescriptor.applySimpleConfig()}}. At the different places where 
> {{CompressionParams.DEFAULT}} was used, the code should call 
> {{DatabaseDescriptor#getDefaultCompressionParams}}, which should return a 
> copy of the configured {{CompressionParams}}.
> A unit test using {{OverrideConfigurationLoader}} should be added to verify 
> that the table schema uses the new default when a new table is created (see 
> CreateTest for an example).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-12937) Default setting (yaml) for SSTable compression

2024-04-15 Thread Sam Tunnicliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-12937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17837179#comment-17837179
 ] 

Sam Tunnicliffe commented on CASSANDRA-12937:
-

The problem with that is that the defaults may be different on every instance, 
so what exactly should be stored in the TCM log? Ideally we should store the 
value that is actually resolved during initial execution on each node to be 
persisted locally so that it can be re-used if/when the transformation is 
reapplied. That should probably be in a parallel local data structure though, 
not in the node's local log table as we don't want to ship those local defaults 
to peers when providing log catchup (because they should use their own 
defaults).  
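
A minimal sketch of that "parallel local data structure", purely illustrative 
(the class and method names are assumptions, not existing Cassandra code): the 
locally resolved value is keyed by the epoch of the transformation so that 
replay on this node can reuse it, while log catch-up served to peers never 
includes it.

{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative only: a node-local side table of values resolved from local
// config when a transformation was first executed. It lives outside the
// metadata log, so it is never shipped to peers during log catchup.
final class LocallyResolvedDefaults
{
    private final Map<Long, String> resolvedByEpoch = new ConcurrentHashMap<>();

    void record(long epoch, String resolvedValue)
    {
        resolvedByEpoch.put(epoch, resolvedValue);
    }

    // On replay, reuse what this node resolved originally; peers replaying
    // the same transformation fall back to their own configured defaults.
    String forReplay(long epoch, String localDefault)
    {
        return resolvedByEpoch.getOrDefault(epoch, localDefault);
    }
}
{code}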

> Default setting (yaml) for SSTable compression
> --
>
> Key: CASSANDRA-12937
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12937
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Local/Config
>Reporter: Michael Semb Wever
>Assignee: Stefan Miklosovic
>Priority: Low
>  Labels: AdventCalendar2021
> Fix For: 5.x
>
>  Time Spent: 8h
>  Remaining Estimate: 0h
>
> In many situations the choice of compression for sstables is more relevant to 
> the disks attached than to the schema and data.
> This issue is to add to cassandra.yaml a default value for sstable 
> compression that new tables will inherit (instead of the defaults found in 
> {{CompressionParams.DEFAULT}}).
> Examples where this can be relevant are filesystems that do on-the-fly 
> compression (btrfs, zfs), specific disk configurations, or even specific C* 
> versions (see CASSANDRA-10995).
> +Additional information for newcomers+
> Some new fields need to be added to {{cassandra.yaml}} to allow specifying 
> the values required to define the default compression parameters. In 
> {{DatabaseDescriptor}} a new {{CompressionParams}} field should be added for 
> the default compression. This field should be initialized in 
> {{DatabaseDescriptor.applySimpleConfig()}}. At the different places where 
> {{CompressionParams.DEFAULT}} was used, the code should call 
> {{DatabaseDescriptor#getDefaultCompressionParams}}, which should return a 
> copy of the configured {{CompressionParams}}.
> A unit test using {{OverrideConfigurationLoader}} should be added to verify 
> that the table schema uses the new default when a new table is created (see 
> CreateTest for an example).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-18954) Transformations should be pure so that replaying them results in the same outcome regardless of the node state or configuration

2024-04-15 Thread Sam Tunnicliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17837177#comment-17837177
 ] 

Sam Tunnicliffe commented on CASSANDRA-18954:
-

[~jlewandowski] 
bq.  CASSANDRA-12937 problem is caused by the fact that the transformations are 
not pure. It is not enough that they are side-effects-free, they also cannot 
depend on any external properties other than the current cluster state and the 
store transformation data.

Yes, this is exactly what I mean and we're looking into a proper fix for this 
now.
 

> Transformations should be pure so that replaying them results in the same 
> outcome regardless of the node state or configuration
> ---
>
> Key: CASSANDRA-18954
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18954
> Project: Cassandra
>  Issue Type: Bug
>  Components: Transactional Cluster Metadata
>Reporter: Jacek Lewandowski
>Assignee: Jacek Lewandowski
>Priority: Normal
>
> Discussed on Slack



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Assigned] (CASSANDRA-19514) When jvm-dtest is shutting down an instance TCM retries block the shutdown causing the test to fail

2024-04-12 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe reassigned CASSANDRA-19514:
---

Assignee: Sam Tunnicliffe  (was: David Capwell)

> When jvm-dtest is shutting down an instance TCM retries block the shutdown 
> causing the test to fail
> ---
>
> Key: CASSANDRA-19514
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19514
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Membership, Test/dtest/java
>Reporter: David Capwell
>Assignee: Sam Tunnicliffe
>Priority: Normal
> Fix For: 5.1
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> org.apache.cassandra.distributed.test.log.RequestCurrentEpochTest#testRequestingPeerWatermarks
> {code}
> java.lang.RuntimeException: java.util.concurrent.TimeoutException
>org.apache.cassandra.utils.Throwables.maybeFail(Throwables.java:79)
>
> org.apache.cassandra.utils.FBUtilities.waitOnFutures(FBUtilities.java:540)
>
> org.apache.cassandra.distributed.impl.AbstractCluster.close(AbstractCluster.java:1098)
>
> org.apache.cassandra.distributed.test.log.RequestCurrentEpochTest.testRequestingPeerWatermarks(RequestCurrentEpochTest.java:77)
>java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method)
>
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  Caused by: java.util.concurrent.TimeoutException
>
> org.apache.cassandra.utils.concurrent.AbstractFuture.get(AbstractFuture.java:253)
>
> org.apache.cassandra.utils.FBUtilities.waitOnFutures(FBUtilities.java:532) 
> Suppressed: java.util.concurrent.TimeoutException
> {code}
> In the debugger I found the blocked future; it was 
> src/java/org/apache/cassandra/tcm/EpochAwareDebounce.java waiting on 
> src/java/org/apache/cassandra/tcm/RemoteProcessor.java retries.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19344) Range movements involving transient replicas must safely enact changes to read and write replica sets

2024-04-11 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19344:

Attachment: remove-n4-pre-19344.txt
remove-n4-post-19344.txt

> Range movements involving transient replicas must safely enact changes to 
> read and write replica sets
> -
>
> Key: CASSANDRA-19344
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19344
> Project: Cassandra
>  Issue Type: Bug
>  Components: CI
>Reporter: Ekaterina Dimitrova
>Assignee: Sam Tunnicliffe
>Priority: Normal
> Fix For: 5.x
>
> Attachments: ci_summary.html, remove-n4-post-19344.txt, 
> remove-n4-pre-19344.txt, result_details.tar.gz
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> (edit) This was originally opened due to a flaky test 
> {{org.apache.cassandra.distributed.test.TransientRangeMovementTest.testRemoveNode-_jdk17}}
> The test can fail in two different ways:
> {code:java}
> junit.framework.AssertionFailedError: NOT IN CURRENT: 31 -- [(00,20), 
> (31,50)] at 
> org.apache.cassandra.distributed.test.TransientRangeMovementTest.assertAllContained(TransientRangeMovementTest.java:203)
>  at 
> org.apache.cassandra.distributed.test.TransientRangeMovementTest.testRemoveNode(TransientRangeMovementTest.java:183)
>  at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method) at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
>  at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43){code}
> as in here - 
> [https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra/2639/workflows/32b92ce7-5e9d-4efb-8362-d200d2414597/jobs/55139/tests#failed-test-0]
> and
> {code:java}
> junit.framework.AssertionFailedError: nodetool command [removenode, 
> 6d194555-f6eb-41d0-c000-0003, --force] was not successful stdout: 
> stderr: error: Node /127.0.0.4:7012 is alive and owns this ID. Use 
> decommission command to remove it from the ring -- StackTrace -- 
> java.lang.UnsupportedOperationException: Node /127.0.0.4:7012 is alive and 
> owns this ID. Use decommission command to remove it from the ring at 
> org.apache.cassandra.tcm.sequences.SingleNodeSequences.removeNode(SingleNodeSequences.java:110)
>  at 
> org.apache.cassandra.service.StorageService.removeNode(StorageService.java:3682)
>  at org.apache.cassandra.tools.NodeProbe.removeNode(NodeProbe.java:1020) at 
> org.apache.cassandra.tools.nodetool.RemoveNode.execute(RemoveNode.java:51) at 
> org.apache.cassandra.tools.NodeTool$NodeToolCmd.runInternal(NodeTool.java:388)
>  at org.apache.cassandra.tools.NodeTool$NodeToolCmd.run(NodeTool.java:373) at 
> org.apache.cassandra.tools.NodeTool.execute(NodeTool.java:272) at 
> org.apache.cassandra.distributed.impl.Instance$DTestNodeTool.execute(Instance.java:1129)
>  at 
> org.apache.cassandra.distributed.impl.Instance.lambda$nodetoolResult$51(Instance.java:1038)
>  at org.apache.cassandra.concurrent.FutureTask.call(FutureTask.java:61) at 
> org.apache.cassandra.concurrent.FutureTask.run(FutureTask.java:71) at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
>  at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
>  at 
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>  at java.base/java.lang.Thread.run(Thread.java:833) Notifications: Error: 
> java.lang.UnsupportedOperationException: Node /127.0.0.4:7012 is alive and 
> owns this ID. Use decommission command to remove it from the ring at 
> org.apache.cassandra.tcm.sequences.SingleNodeSequences.removeNode(SingleNodeSequences.java:110)
>  at 
> org.apache.cassandra.service.StorageService.removeNode(StorageService.java:3682)
>  at org.apache.cassandra.tools.NodeProbe.removeNode(NodeProbe.java:1020) at 
> org.apache.cassandra.tools.nodetool.RemoveNode.execute(RemoveNode.java:51) at 
> org.apache.cassandra.tools.NodeTool$NodeToolCmd.runInternal(NodeTool.java:388)
>  at org.apache.cassandra.tools.NodeTool$NodeToolCmd.run(NodeTool.java:373) at 
> org.apache.cassandra.tools.NodeTool.execute(NodeTool.java:272) at 
> org.apache.cassandra.distributed.impl.Instance$DTestNodeTool.execute(Instance.java:1129)
>  at 
> org.apache.cassandra.distributed.impl.Instance.lambda$nodetoolResult$51(Instance.java:1038)
>  at org.apache.cassandra.concurrent.FutureTask.call(FutureTask.java:61) at 
> org.apache.cassandra.concurrent.FutureTask.run(FutureTask.java:71) at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
>  at 
> 

[jira] [Commented] (CASSANDRA-19344) Range movements involving transient replicas must safely enact changes to read and write replica sets

2024-04-11 Thread Sam Tunnicliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17836302#comment-17836302
 ] 

Sam Tunnicliffe commented on CASSANDRA-19344:
-

The actual cause was that the way we construct placement deltas for a 
PlacementTransitionPlan did not properly consider transientness. Multi-step 
operations always follow the pattern:

* add new write replicas
* add new read replicas/remove old read replicas
* remove old write replicas

So when an operation causes a replica to transition from TRANSIENT to FULL for 
the same range (or part of a range), it could become a FULL read replica before 
becoming a FULL write replica.
Consider this simplified example where we remove N4 and the effect on N2:

{code}
RF=3/1
At START
      10    20    30    40
  +---+-----+-----+-----+---+
       N1    N2    N3    N4

N2 replicates:
  (10,20]       - FULL (Primary Range)
  (,10] + (40,] - FULL
  (30,40]       - TRANSIENT

After FINISH

      10    20    30
  +---+-----+-----+---+
       N1    N2    N3

N2 replicates:
  (10,20]       - FULL (Primary Range)
  (,10] + (30,] - FULL
  (20,30]       - TRANSIENT

In removing N4, N2 gains (20,30] TRANSIENT and (30,40] TRANSIENT -> FULL.
Potential problem ->
  for READS  N2 becomes FULL(30,40] after MID_LEAVE
  for WRITES N2 only becomes FULL(30,40] after FINISH_LEAVE
  so between the 2 events, coordinators will not send writes to N2 unless one
  of the other replicas is unresponsive.
  Coordinators will send reads to N2 during this window though.
  If cleanup is run before N2 becomes a FULL replica for (30,40], any data
  for that range (including that which was just streamed to it) will be purged.
{code}

Below is an illustration of the ranges replicated by N2 at each step:

{code}
+-----+------------------------+-----------------------------------------------------------------------+
|EPOCH| STATE                  | RANGES REPLICATED BY N2                                               |
|-----+------------------------+-----------------------------------------------------------------------|
|0    | START STATE            | WRITES -> FULL: [(40,], (,10], (10,20]] TRANSIENT: [(30,40]]          |
|     |                        |  READS -> FULL: [(40,], (,10], (10,20]] TRANSIENT: [(30,40]]          |
|-----+------------------------+-----------------------------------------------------------------------|
|1    | ENACT START_LEAVE(N4)  | WRITES -> FULL: [(40,], (,10], (10,20]] TRANSIENT: [(20,30], (30,40]] |
|     |                        |  READS -> FULL: [(40,], (,10], (10,20]] TRANSIENT: [(30,40]]          |
|-----+------------------------+-----------------------------------------------------------------------|
|2    | ENACT MID_LEAVE(N4)    | WRITES -> FULL: [(40,], (,10], (10,20]] TRANSIENT: [(20,30], (30,40]] |
|     |                        |  READS -> FULL: [(40,], (,10], (10,20], (30,40]] TRANSIENT: [(20,30]] |
|-----+------------------------+-----------------------------------------------------------------------|
|3    | ENACT FINISH_LEAVE(N4) | WRITES -> FULL: [(30,], (,10], (10,20]] TRANSIENT: [(20,30]]          |
|     |                        |  READS -> FULL: [(30,], (,10], (10,20]] TRANSIENT: [(20,30]]          |
+-----+------------------------+-----------------------------------------------------------------------+
{code}

After applying the fix here, the deltas are changed so that {{(30,40]}} 
transitioning from {{TRANSIENT}} to {{FULL}} for writes is part of enacting 
{{START_LEAVE(N4)}} in epoch 1, i.e. before N2 becomes a FULL replica for reads 
of {{(30,40]}} when {{MID_LEAVE(N4)}} is enacted in epoch 2. 

{code}
+-----+------------------------+-----------------------------------------------------------------------+
|EPOCH| STATE                  | RANGES REPLICATED BY N2                                               |
|-----+------------------------+-----------------------------------------------------------------------|
|0    | START STATE            | WRITES -> FULL: [(40,], (,10], (10,20]] TRANSIENT: [(30,40]]          |
|     |                        |  READS -> FULL: [(40,], (,10], (10,20]] TRANSIENT: [(30,40]]          |
|-----+------------------------+-----------------------------------------------------------------------|
|1    | ENACT START_LEAVE(N4)  | WRITES -> FULL: [(40,], (,10], (10,20], (30,40]] TRANSIENT: [(20,30]] |
|     |                        |  READS -> FULL: [(40,], (,10], (10,20]] TRANSIENT: [(30,40]]          |

[jira] [Commented] (CASSANDRA-19516) Use Transformation.Kind.id in local and distributed log tables

2024-04-11 Thread Sam Tunnicliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17836280#comment-17836280
 ] 

Sam Tunnicliffe commented on CASSANDRA-19516:
-

+1

> Use Transformation.Kind.id in local and distributed log tables
> --
>
> Key: CASSANDRA-19516
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19516
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Transactional Cluster Metadata
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Normal
> Attachments: ci_summary.html
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> We should store {{Kind.id}}, added in CASSANDRA-19390, in the local and 
> distributed log tables. The virtual table will still do the id -> string 
> lookup for easier reading.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19516) Use Transformation.Kind.id in local and distributed log tables

2024-04-11 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19516:

Status: Review In Progress  (was: Patch Available)

> Use Transformation.Kind.id in local and distributed log tables
> --
>
> Key: CASSANDRA-19516
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19516
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Transactional Cluster Metadata
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Normal
> Attachments: ci_summary.html
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> We should store {{Kind.id}}, added in CASSANDRA-19390, in the local and 
> distributed log tables. The virtual table will still do the id -> string 
> lookup for easier reading.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19482) Simplify metadata log implementation using custom partitioner

2024-04-11 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19482:

Source Control Link: 
https://github.com/apache/cassandra/commit/728b9ec4c604f6939facf62a261ca795ef6dbf0c
 Resolution: Fixed
 Status: Resolved  (was: Ready to Commit)

> Simplify metadata log implementation using custom partitioner
> -
>
> Key: CASSANDRA-19482
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19482
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Transactional Cluster Metadata
>Reporter: Sam Tunnicliffe
>Assignee: Sam Tunnicliffe
>Priority: Normal
> Fix For: 5.x
>
> Attachments: ci_summary.html, result_details.tar.gz
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> The distributed metadata log table can be simplified by leveraging the fact 
> that replicas are all responsible for the entire token range. Given this 
> assumption, we can then use {{ReversedLongLocalPartitioner}} introduced in 
> CASSANDRA-19391 to make it much easier to append to/read from the tail of the 
> log, effectively removing the need for the {{Period}} construct. This 
> improvement will also apply to the local metadata log used at startup.
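
For intuition only, the ordering trick can be sketched as follows; this is an 
illustration of the idea, not the actual {{ReversedLongLocalPartitioner}} code.

{code:java}
// Hedged sketch: map an ever-increasing epoch to a token that sorts in
// reverse, so the newest log entry is always first in partition order and
// "read the tail" becomes a simple ordered scan with a LIMIT, with no
// Period bookkeeping required.
final class ReversedEpochOrder
{
    // Larger epochs get smaller tokens, so they sort first.
    static long token(long epoch)
    {
        return Long.MAX_VALUE - epoch;
    }
}
{code}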



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19482) Simplify metadata log implementation using custom partitioner

2024-04-11 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19482:

Fix Version/s: 5.1-alpha1
   (was: 5.x)

> Simplify metadata log implementation using custom partitioner
> -
>
> Key: CASSANDRA-19482
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19482
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Transactional Cluster Metadata
>Reporter: Sam Tunnicliffe
>Assignee: Sam Tunnicliffe
>Priority: Normal
> Fix For: 5.1-alpha1
>
> Attachments: ci_summary.html, result_details.tar.gz
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> The distributed metadata log table can be simplified by leveraging the fact 
> that replicas are all responsible for the entire token range. Given this 
> assumption, we can then use {{ReversedLongLocalPartitioner}} introduced in 
> CASSANDRA-19391 to make it much easier to append to/read from the tail of the 
> log, effectively removing the need for the {{Period}} construct. This 
> improvement will also apply to the local metadata log used at startup.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19482) Simplify metadata log implementation using custom partitioner

2024-04-11 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19482:

Status: Ready to Commit  (was: Review In Progress)

+1 from Alex on the PR

> Simplify metadata log implementation using custom partitioner
> -
>
> Key: CASSANDRA-19482
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19482
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Transactional Cluster Metadata
>Reporter: Sam Tunnicliffe
>Assignee: Sam Tunnicliffe
>Priority: Normal
> Fix For: 5.x
>
> Attachments: ci_summary.html, result_details.tar.gz
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> The distributed metadata log table can be simplified by leveraging the fact 
> that replicas are all responsible for the entire token range. Given this 
> assumption, we can then use {{ReversedLongLocalPartitioner}} introduced in 
> CASSANDRA-19391 to make it much easier to append to/read from the tail of the 
> log, effectively removing the need for the {{Period}} construct. This 
> improvement will also apply to the local metadata log used at startup.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19482) Simplify metadata log implementation using custom partitioner

2024-04-11 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19482:

Status: Needs Committer  (was: Patch Available)

> Simplify metadata log implementation using custom partitioner
> -
>
> Key: CASSANDRA-19482
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19482
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Transactional Cluster Metadata
>Reporter: Sam Tunnicliffe
>Assignee: Sam Tunnicliffe
>Priority: Normal
> Fix For: 5.x
>
> Attachments: ci_summary.html, result_details.tar.gz
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> The distributed metadata log table can be simplified by leveraging the fact 
> that replicas are all responsible for the entire token range. Given this 
> assumption, we can then use {{ReversedLongLocalPartitioner}} introduced in 
> CASSANDRA-19391 to make it much easier to append to/read from the tail of the 
> log, effectively removing the need for the {{Period}} construct. This 
> improvement will also apply to the local metadata log used at startup.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19482) Simplify metadata log implementation using custom partitioner

2024-04-11 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19482:

Reviewers: Alex Petrov
   Status: Review In Progress  (was: Needs Committer)

> Simplify metadata log implementation using custom partitioner
> -
>
> Key: CASSANDRA-19482
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19482
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Transactional Cluster Metadata
>Reporter: Sam Tunnicliffe
>Assignee: Sam Tunnicliffe
>Priority: Normal
> Fix For: 5.x
>
> Attachments: ci_summary.html, result_details.tar.gz
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> The distributed metadata log table can be simplified by leveraging the fact 
> that replicas are all responsible for the entire token range. Given this 
> assumption, we can then use {{ReversedLongLocalPartitioner}} introduced in 
> CASSANDRA-19391 to make it much easier to append to/read from the tail of the 
> log, effectively removing the need for the {{Period}} construct. This 
> improvement will also apply to the local metadata log used at startup.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-18954) Transformations should be pure so that replaying them results in the same outcome regardless of the node state or configuration

2024-04-10 Thread Sam Tunnicliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17835805#comment-17835805
 ] 

Sam Tunnicliffe commented on CASSANDRA-18954:
-

Yep, fair enough. I wouldn't expect you to have followed those things; my 
intention was really to give a heads-up that I don't think this is an issue 
anymore and that I'd probably close it soon. That was the thinking when I made 
the original comment, before CASSANDRA-12937 highlighted the issue with 
mutating local config between restarts. I'll leave this alone until we resolve 
that, which we're working towards now.

> Transformations should be pure so that replaying them results in the same 
> outcome regardless of the node state or configuration
> ---
>
> Key: CASSANDRA-18954
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18954
> Project: Cassandra
>  Issue Type: Bug
>  Components: Transactional Cluster Metadata
>Reporter: Jacek Lewandowski
>Assignee: Jacek Lewandowski
>Priority: Normal
>
> Discussed on Slack



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19482) Simplify metadata log implementation using custom partitioner

2024-04-10 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19482:

Attachment: ci_summary.html
result_details.tar.gz

> Simplify metadata log implementation using custom partitioner
> -
>
> Key: CASSANDRA-19482
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19482
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Transactional Cluster Metadata
>Reporter: Sam Tunnicliffe
>Assignee: Sam Tunnicliffe
>Priority: Normal
> Fix For: 5.x
>
> Attachments: ci_summary.html, result_details.tar.gz
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> The distributed metadata log table can be simplified by leveraging the fact 
> that replicas are all responsible for the entire token range. Given this 
> assumption, we can then use {{ReversedLongLocalPartitioner}} introduced in 
> CASSANDRA-19391 to make it much easier to append to/read from the tail of the 
> log, effectively removing the need for the {{Period}} construct. This 
> improvement will also apply to the local metadata log used at startup.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19482) Simplify metadata log implementation using custom partitioner

2024-04-10 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19482:

Attachment: (was: result_details.tar.gz)

> Simplify metadata log implementation using custom partitioner
> -
>
> Key: CASSANDRA-19482
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19482
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Transactional Cluster Metadata
>Reporter: Sam Tunnicliffe
>Assignee: Sam Tunnicliffe
>Priority: Normal
> Fix For: 5.x
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> The distributed metadata log table can be simplified by leveraging the fact 
> that replicas are all responsible for the entire token range. Given this 
> assumption, we can then use {{ReversedLongLocalPartitioner}} introduced in 
> CASSANDRA-19391 to make it much easier to append to/read from the tail of the 
> log, effectively removing the need for the {{Period}} construct. This 
> improvement will also apply to the local metadata log used at startup.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19482) Simplify metadata log implementation using custom partitioner

2024-04-10 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19482:

Attachment: (was: ci_summary.html)

> Simplify metadata log implementation using custom partitioner
> -
>
> Key: CASSANDRA-19482
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19482
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Transactional Cluster Metadata
>Reporter: Sam Tunnicliffe
>Assignee: Sam Tunnicliffe
>Priority: Normal
> Fix For: 5.x
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> The distributed metadata log table can be simplified by leveraging the fact 
> that replicas are all responsible for the entire token range. Given this 
> assumption, we can then use {{ReversedLongLocalPartitioner}} introduced in 
> CASSANDRA-19391 to make it much easier to append to/read from the tail of the 
> log, effectively removing the need for the {{Period}} construct. This 
> improvement will also apply to the local metadata log used at startup.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-18954) Transformations should be pure so that replaying them results in the same outcome regardless of the node state or configuration

2024-04-09 Thread Sam Tunnicliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17835304#comment-17835304
 ] 

Sam Tunnicliffe commented on CASSANDRA-18954:
-

Actually, scratch that just for now - I think we need to address 
CASSANDRA-12937 first.

> Transformations should be pure so that replaying them results in the same 
> outcome regardless of the node state or configuration
> ---
>
> Key: CASSANDRA-18954
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18954
> Project: Cassandra
>  Issue Type: Bug
>  Components: Transactional Cluster Metadata
>Reporter: Jacek Lewandowski
>Assignee: Jacek Lewandowski
>Priority: Normal
>
> Discussed on Slack



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19514) When jvm-dtest is shutting down an instance TCM retries block the shutdown causing the test to fail

2024-04-03 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19514:

Resolution: (was: Fixed)
Status: Open  (was: Resolved)

Reopening this as the problem is entirely relevant to trunk, so we should apply 
the patch there too.

> When jvm-dtest is shutting down an instance TCM retries block the shutdown 
> causing the test to fail
> ---
>
> Key: CASSANDRA-19514
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19514
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Membership, Test/dtest/java
>Reporter: David Capwell
>Assignee: David Capwell
>Priority: Normal
> Fix For: 5.1
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> org.apache.cassandra.distributed.test.log.RequestCurrentEpochTest#testRequestingPeerWatermarks
> {code}
> java.lang.RuntimeException: java.util.concurrent.TimeoutException
>org.apache.cassandra.utils.Throwables.maybeFail(Throwables.java:79)
>
> org.apache.cassandra.utils.FBUtilities.waitOnFutures(FBUtilities.java:540)
>
> org.apache.cassandra.distributed.impl.AbstractCluster.close(AbstractCluster.java:1098)
>
> org.apache.cassandra.distributed.test.log.RequestCurrentEpochTest.testRequestingPeerWatermarks(RequestCurrentEpochTest.java:77)
>java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method)
>
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  Caused by: java.util.concurrent.TimeoutException
>
> org.apache.cassandra.utils.concurrent.AbstractFuture.get(AbstractFuture.java:253)
>
> org.apache.cassandra.utils.FBUtilities.waitOnFutures(FBUtilities.java:532) 
> Suppressed: java.util.concurrent.TimeoutException
> {code}
> In the debugger I found the blocked future; it was 
> src/java/org/apache/cassandra/tcm/EpochAwareDebounce.java waiting on 
> src/java/org/apache/cassandra/tcm/RemoteProcessor.java retries.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-18954) Transformations should be pure so that replaying them results in the same outcome regardless of the node state or configuration

2024-04-03 Thread Sam Tunnicliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-18954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17833530#comment-17833530
 ] 

Sam Tunnicliffe commented on CASSANDRA-18954:
-

[~jlewandowski] I think that this might be obsolete now, following a few 
changes that landed before {{cep-21-tcm}} was merged, plus CASSANDRA-19271 and 
CASSANDRA-19384. Replaying log entries during startup no longer enacts any side 
effects. Would you mind if I closed this?

> Transformations should be pure so that replaying them results in the same 
> outcome regardless of the node state or configuration
> ---
>
> Key: CASSANDRA-18954
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18954
> Project: Cassandra
>  Issue Type: Bug
>  Components: Transactional Cluster Metadata
>Reporter: Jacek Lewandowski
>Assignee: Jacek Lewandowski
>Priority: Normal
>
> Discussed on Slack



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19271) Improve setup and initialisation of LocalLog/LogSpec

2024-04-03 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19271:

Epic Link: CASSANDRA-19055

> Improve setup and initialisation of LocalLog/LogSpec
> 
>
> Key: CASSANDRA-19271
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19271
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Transactional Cluster Metadata
>Reporter: Alex Petrov
>Assignee: Alex Petrov
>Priority: Urgent
> Fix For: 5.1
>
> Attachments: ci_summary.html, result_details.tar.gz
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-13855) Implement Http Seed provider

2024-03-22 Thread Sam Tunnicliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-13855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17829941#comment-17829941
 ] 

Sam Tunnicliffe edited comment on CASSANDRA-13855 at 3/22/24 5:27 PM:
--

Not completely relevant to the discussion here, but slightly adjacent... the 
roles of both snitches and seeds have changed slightly in trunk. See 
CASSANDRA-19488 which I finally remembered to file for more detail on snitches. 
Seeds are now really only necessary as initial contact points for joining 
nodes. Technically, they do still perform the same function regarding gossip 
convergence, but it is way less important/relevant now as we don't rely on 
gossip state for correctness. Of course, this doesn't make the goal of this 
JIRA any less valid, I just thought I should mention it. 



was (Author: beobal):
Not completely relevant to the discussion here, but slightly adjacent... the 
roles of both snitches and seeds have changed slightly in trunk. See 
CASSANDRA-XXX which I finally remembered to file for more detail on snitches. 
Seeds are now really only necessary as initial contact points for joining 
nodes. Technically, they do still perform the same function regarding gossip 
convergence, but it is way less important/relevant now as we don't rely on 
gossip state for correctness. Of course, this doesn't make the goal of this 
JIRA any less valid, I just thought I should mention it. 


> Implement Http Seed provider
> 
>
> Key: CASSANDRA-13855
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13855
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Legacy/Coordination, Legacy/Core
>Reporter: Jon Haddad
>Assignee: Claude Warren
>Priority: Low
>  Labels: lhf
> Fix For: 5.x
>
> Attachments: 0001-Add-URL-Seed-Provider-trunk.txt, signature.asc, 
> signature.asc, signature.asc
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Seems like including a dead simple seed provider that can fetch from a URL, 1 
> line per seed, would be useful.
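For illustration only, here is a minimal sketch of the fetching side, assuming a plain-text resource with one seed address per line; it is not wired into Cassandra's actual {{SeedProvider}} interface and the class name is made up.

{code:java}
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Illustrative only: fetch a plain-text resource and treat each non-empty,
// non-comment line as one seed address.
public final class UrlSeedFetchSketch
{
    public static List<String> fetchSeeds(String url) throws Exception
    {
        List<String> seeds = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(new URL(url).openStream(), StandardCharsets.UTF_8)))
        {
            String line;
            while ((line = in.readLine()) != null)
            {
                line = line.trim();
                if (!line.isEmpty() && !line.startsWith("#"))
                    seeds.add(line);
            }
        }
        return seeds;
    }
}
{code}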



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13855) Implement Http Seed provider

2024-03-22 Thread Sam Tunnicliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-13855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17829941#comment-17829941
 ] 

Sam Tunnicliffe commented on CASSANDRA-13855:
-

Not completely relevant to the discussion here, but slightly adjacent... the 
roles of both snitches and seeds have changed slightly in trunk. See 
CASSANDRA-XXX which I finally remembered to file for more detail on snitches. 
Seeds are now really only necessary as initial contact points for joining 
nodes. Technically, they do still perform the same function regarding gossip 
convergence, but it is way less important/relevant now as we don't rely on 
gossip state for correctness. Of course, this doesn't make the goal of this 
JIRA any less valid, I just thought I should mention it. 


> Implement Http Seed provider
> 
>
> Key: CASSANDRA-13855
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13855
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Legacy/Coordination, Legacy/Core
>Reporter: Jon Haddad
>Assignee: Claude Warren
>Priority: Low
>  Labels: lhf
> Fix For: 5.x
>
> Attachments: 0001-Add-URL-Seed-Provider-trunk.txt, signature.asc, 
> signature.asc, signature.asc
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Seems like including a dead simple seed provider that can fetch from a URL, 1 
> line per seed, would be useful.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19488) Ensure snitches always defer to ClusterMetadata

2024-03-22 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19488:

Change Category: Operability
 Complexity: Normal
  Fix Version/s: 5.x
 Status: Open  (was: Triage Needed)

> Ensure snitches always defer to ClusterMetadata
> ---
>
> Key: CASSANDRA-19488
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19488
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Cluster/Membership, Messaging/Internode, Transactional 
> Cluster Metadata
>Reporter: Sam Tunnicliffe
>Assignee: Sam Tunnicliffe
>Priority: Normal
> Fix For: 5.x
>
>
> Internally, C* always uses {{ClusterMetadata}} as the source of topology 
> information when calculating data placements, replica plans etc and as such 
> the role of the snitch has been somewhat reduced. 
> Sorting and comparison functions as provided by specialisations like 
> {{DynamicEndpointSnitch}} are still used, but the snitch should only be 
> responsible for providing the DC and rack for a new node when it first joins 
> a cluster.
> Aside from initial startup and registration, snitch implementations should 
> always defer to {{ClusterMetadata}} for DC and rack, otherwise there is a 
> risk that the snitch config drifts out of sync with TCM and output from tools 
> like {{nodetool ring}} and {{gossipinfo}} becomes incorrect.
> A complication is that topology is used when opening connections to peers as 
> certain internode connection settings are variable at the DC level, so at the 
> time of connecting we want to check the location of the remote peer. Usually, 
> this is available from {{ClusterMetadata}}, but in the case of a brand 
> new node joining the cluster nothing is known a priori. The current 
> implementation assumes that the snitch will know the location of the new node 
> ahead of time, but in practice this is often not the case (though with 
> variants of {{PropertyFileSnitch}} it _should_ be), and the remote node is 
> temporarily assigned a default DC. This is problematic as it can cause the 
> internode connection settings which depend on DC to be incorrectly set. 
> Internode connections are long lived and any established while the DC is 
> unknown (potentially with incorrect config) will persist indefinitely. This 
> particular issue is not directly related to TCM and is present in earlier 
> versions.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Created] (CASSANDRA-19488) Ensure snitches always defer to ClusterMetadata

2024-03-22 Thread Sam Tunnicliffe (Jira)
Sam Tunnicliffe created CASSANDRA-19488:
---

 Summary: Ensure snitches always defer to ClusterMetadata
 Key: CASSANDRA-19488
 URL: https://issues.apache.org/jira/browse/CASSANDRA-19488
 Project: Cassandra
  Issue Type: Improvement
  Components: Cluster/Membership, Messaging/Internode, Transactional 
Cluster Metadata
Reporter: Sam Tunnicliffe
Assignee: Sam Tunnicliffe


Internally, C* always uses {{ClusterMetadata}} as the source of topology 
information when calculating data placements, replica plans etc and as such the 
role of the snitch has been somewhat reduced. 

Sorting and comparison functions as provided by specialisations like 
{{DynamicEndpointSnitch}} are still used, but the snitch should only be 
responsible for providing the DC and rack for a new node when it first joins a 
cluster.

Aside from initial startup and registration, snitch implementations should 
always defer to {{ClusterMetadata}} for DC and rack, otherwise there is a 
risk that the snitch config drifts out of sync with TCM and output from tools 
like {{nodetool ring}} and {{gossipinfo}} becomes incorrect.

A complication is that topology is used when opening connections to peers as 
certain internode connection settings are variable at the DC level, so at the 
time of connecting we want to check the location of the remote peer. Usually, 
this is available from {{ClusterMetadata}}, but in the case of a brand new 
node joining the cluster nothing is known a priori. The current implementation 
assumes that the snitch will know the location of the new node ahead of time, 
but in practice this is often not the case (though with variants of 
{{PropertyFileSnitch}} it _should_ be), and the remote node is temporarily 
assigned a default DC. This is problematic as it can cause the internode 
connection settings which depend on DC to be incorrectly set. Internode 
connections are long lived and any established while the DC is unknown 
(potentially with incorrect config) will persist indefinitely. This particular 
issue is not directly related to TCM and is present in earlier versions.
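To make the intended deferral concrete, here is a minimal sketch with simplified placeholder types (not the real snitch or {{ClusterMetadata}} APIs): locations come from cluster metadata whenever a peer is registered, and only an unregistered, newly joining peer falls back to a default.

{code:java}
import java.util.Map;
import java.util.Optional;

// Simplified placeholders only; not Cassandra's real snitch/metadata interfaces.
public final class MetadataFirstLocator
{
    public record Location(String datacenter, String rack) {}

    private final Map<String, Location> registeredPeers; // stand-in for ClusterMetadata
    private final Location defaultLocation;              // used only for unregistered peers

    public MetadataFirstLocator(Map<String, Location> registeredPeers, Location defaultLocation)
    {
        this.registeredPeers = registeredPeers;
        this.defaultLocation = defaultLocation;
    }

    public Location locate(String peer)
    {
        // Always prefer cluster metadata; only a brand new, not-yet-registered node
        // ends up on the default, which is exactly the problematic case described above.
        return Optional.ofNullable(registeredPeers.get(peer)).orElse(defaultLocation);
    }
}
{code}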



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19130) Implement transactional table truncation

2024-03-22 Thread Sam Tunnicliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17829898#comment-17829898
 ] 

Sam Tunnicliffe commented on CASSANDRA-19130:
-

The way truncation works is that it writes a timestamp into a system table on 
each node, associated with the table being truncated (and a commitlog 
position). Then, when local reads and writes are done against that table, any 
cell with a timestamp earlier than the truncation is essentially discarded. If 
any node misses that message and so doesn't write the timestamp, it won't do 
this filtering and data can be resurrected. This is strictly a one-time 
operation and there's no way for a node which misses such a message to catch 
up later, which is why {{TRUNCATE}} currently requires all nodes to be up.

With TCM, we can improve this by having an entry in the log which contains the 
truncation timestamp. It can then be distributed to peers the same way as any 
other log entry, allowing them to catch up if they miss it. Replicas and 
coordinators participating in a read already check that they're all up to date 
with each other and attempt to catch up if not.

We shouldn't have to change how truncation works at the local level, just have 
{{TruncateStatement}} work by committing a new transform to the CMS. The 
trickiest bit will be making sure that the {{execute}} method itself is 
side-effect free (i.e. it only produces a new ClusterMetadata). The way to do 
that is with a {{ChangeListener}} which implements a post-commit event to do 
the work of {{CFS::truncateBlocking}}.
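A very rough sketch of that shape, using placeholder types rather than the actual TCM, {{TruncateStatement}} or {{ChangeListener}} APIs: the committed entry only records the truncation timestamp, and the side-effecting local work happens post-commit.

{code:java}
// Placeholder types only; not the actual TCM APIs.
public final class TruncationViaLogSketch
{
    // Pure record of the truncation: committing it only produces new metadata.
    public record TruncationRecord(String keyspace, String table, long truncatedAtMillis) {}

    public interface MetadataLog
    {
        void append(TruncationRecord record); // distributed like any other log entry
    }

    public interface PostCommitListener
    {
        // the place to do the local work, e.g. the equivalent of CFS::truncateBlocking
        void enacted(TruncationRecord record);
    }

    public static void truncate(MetadataLog log, String keyspace, String table)
    {
        // Side-effect free at commit time; replicas that miss the message
        // catch up by replaying the log and then enact it locally via the listener.
        log.append(new TruncationRecord(keyspace, table, System.currentTimeMillis()));
    }
}
{code}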

> Implement transactional table truncation
> 
>
> Key: CASSANDRA-19130
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19130
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Consistency/Coordination
>Reporter: Marcus Eriksson
>Assignee: Stefan Miklosovic
>Priority: Normal
> Fix For: 5.x
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> TRUNCATE table should leverage cluster metadata to ensure consistent 
> truncation timestamps across all replicas. The current implementation depends 
> on all nodes being available, but this could be reimplemented as a 
> {{Transformation}}.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19255) StorageService.getRangeToEndpointMap() MBean operation is running into NPE for LocalStrategy keyspaces

2024-03-22 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19255:

Attachment: 19482_patch_for_19255.diff

> StorageService.getRangeToEndpointMap() MBean operation is running into NPE 
> for LocalStrategy keyspaces
> --
>
> Key: CASSANDRA-19255
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19255
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Membership
>Reporter: n.v.harikrishna
>Assignee: n.v.harikrishna
>Priority: Normal
> Fix For: 5.x
>
> Attachments: 19482_patch_for_19255.diff, ci_summary.html, 
> result_details.tar.gz
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> When the StorageService's MBean operation getRangeToEndpointMap is called for 
> LocalStrategy keyspaces, it runs into an NPE. It works in earlier major 
> versions but fails in trunk. It can be reproduced locally using JConsole or a 
> tool like `jmxterm` (unfortunately these tools do not give the full 
> stacktrace). The same behavior is observed with the 
> getRangeToEndpointWithPortMap operation too.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19255) StorageService.getRangeToEndpointMap() MBean operation is running into NPE for LocalStrategy keyspaces

2024-03-22 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19255:

Status: Needs Committer  (was: Patch Available)

> StorageService.getRangeToEndpointMap() MBean operation is running into NPE 
> for LocalStrategy keyspaces
> --
>
> Key: CASSANDRA-19255
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19255
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Membership
>Reporter: n.v.harikrishna
>Assignee: n.v.harikrishna
>Priority: Normal
> Attachments: ci_summary.html, result_details.tar.gz
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> When the StorageService's MBean operation getRangeToEndpointMap is called for 
> LocalStrategy keyspaces, it runs into an NPE. It works in earlier major 
> versions but fails in trunk. It can be reproduced locally using JConsole or a 
> tool like `jmxterm` (unfortunately these tools do not give the full 
> stacktrace). The same behavior is observed with the 
> getRangeToEndpointWithPortMap operation too.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19255) StorageService.getRangeToEndpointMap() MBean operation is running into NPE for LocalStrategy keyspaces

2024-03-22 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19255:

  Fix Version/s: 5.x
  Since Version: 5.x
Source Control Link: 
https://github.com/apache/cassandra/commit/a69c8657d75de627fb1fe518bfe1d657add11740
 Resolution: Fixed
 Status: Resolved  (was: Ready to Commit)

LGTM too, committed as {{a69c8657d75de627fb1fe518bfe1d657add11740}} (with a 
couple of extremely minor changes).
One thing to note is that this will need slight readjustment if CASSANDRA-19482 
is committed, as it changes the way we handle the {{MetaStrategy}} keyspace. 
The attached diff will fix that if/when necessary.


> StorageService.getRangeToEndpointMap() MBean operation is running into NPE 
> for LocalStrategy keyspaces
> --
>
> Key: CASSANDRA-19255
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19255
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Membership
>Reporter: n.v.harikrishna
>Assignee: n.v.harikrishna
>Priority: Normal
> Fix For: 5.x
>
> Attachments: ci_summary.html, result_details.tar.gz
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> When the StorageService's MBean operation getRangeToEndpointMap is called for 
> LocalStrategy keyspaces, it runs into an NPE. It works in earlier major 
> versions but fails in trunk. It can be reproduced locally using JConsole or a 
> tool like `jmxterm` (unfortunately these tools do not give the full 
> stacktrace). The same behavior is observed with the 
> getRangeToEndpointWithPortMap operation too.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19255) StorageService.getRangeToEndpointMap() MBean operation is running into NPE for LocalStrategy keyspaces

2024-03-22 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19255:

Status: Ready to Commit  (was: Review In Progress)

> StorageService.getRangeToEndpointMap() MBean operation is running into NPE 
> for LocalStrategy keyspaces
> --
>
> Key: CASSANDRA-19255
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19255
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Membership
>Reporter: n.v.harikrishna
>Assignee: n.v.harikrishna
>Priority: Normal
> Attachments: ci_summary.html, result_details.tar.gz
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> When the StorageService's MBean operation getRangeToEndpointMap is called for 
> LocalStrategy keyspaces, it runs into an NPE. It works in earlier major 
> versions but fails in trunk. It can be reproduced locally using JConsole or a 
> tool like `jmxterm` (unfortunately these tools do not give the full 
> stacktrace). The same behavior is observed with the 
> getRangeToEndpointWithPortMap operation too.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-19255) StorageService.getRangeToEndpointMap() MBean operation is running into NPE for LocalStrategy keyspaces

2024-03-22 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-19255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-19255:

Status: Review In Progress  (was: Needs Committer)

> StorageService.getRangeToEndpointMap() MBean operation is running into NPE 
> for LocalStrategy keyspaces
> --
>
> Key: CASSANDRA-19255
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19255
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Membership
>Reporter: n.v.harikrishna
>Assignee: n.v.harikrishna
>Priority: Normal
> Attachments: ci_summary.html, result_details.tar.gz
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> When the StorageService's MBean operation getRangeToEndpointMap is called for 
> LocalStrategy keyspaces, it runs into an NPE. It works in earlier major 
> versions but fails in trunk. It can be reproduced locally using JConsole or a 
> tool like `jmxterm` (unfortunately these tools do not give the full 
> stacktrace). The same behavior is observed with the 
> getRangeToEndpointWithPortMap operation too.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-19344) Range movements involving transient replicas must safely enact changes to read and write replica sets

2024-03-22 Thread Sam Tunnicliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-19344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17829794#comment-17829794
 ] 

Sam Tunnicliffe commented on CASSANDRA-19344:
-

The linked PR modifies a number of existing tests to make the original failure 
mode deterministic. It also adds support for transient replication to 
{{PlacementSimulator}}, {{MetadataChangeSimulationTest}} and the associated 
{{TokenPlacementModel}}. Finally, it modifies the way the 
{{PlacementTransitionPlan}} is prepared for operations involving range 
movements to ensure that any transition from a transient to a full replica 
happens safely.

> Range movements involving transient replicas must safely enact changes to 
> read and write replica sets
> -
>
> Key: CASSANDRA-19344
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19344
> Project: Cassandra
>  Issue Type: Bug
>  Components: CI
>Reporter: Ekaterina Dimitrova
>Assignee: Sam Tunnicliffe
>Priority: Normal
> Fix For: 5.x
>
> Attachments: ci_summary.html, result_details.tar.gz
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> (edit) This was originally opened due to a flaky test 
> {{org.apache.cassandra.distributed.test.TransientRangeMovementTest.testRemoveNode-_jdk17}}
> The test can fail in two different ways:
> {code:java}
> junit.framework.AssertionFailedError: NOT IN CURRENT: 31 -- [(00,20), 
> (31,50)] at 
> org.apache.cassandra.distributed.test.TransientRangeMovementTest.assertAllContained(TransientRangeMovementTest.java:203)
>  at 
> org.apache.cassandra.distributed.test.TransientRangeMovementTest.testRemoveNode(TransientRangeMovementTest.java:183)
>  at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method) at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
>  at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43){code}
> as in here - 
> [https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra/2639/workflows/32b92ce7-5e9d-4efb-8362-d200d2414597/jobs/55139/tests#failed-test-0]
> and
> {code:java}
> junit.framework.AssertionFailedError: nodetool command [removenode, 
> 6d194555-f6eb-41d0-c000-0003, --force] was not successful stdout: 
> stderr: error: Node /127.0.0.4:7012 is alive and owns this ID. Use 
> decommission command to remove it from the ring -- StackTrace -- 
> java.lang.UnsupportedOperationException: Node /127.0.0.4:7012 is alive and 
> owns this ID. Use decommission command to remove it from the ring at 
> org.apache.cassandra.tcm.sequences.SingleNodeSequences.removeNode(SingleNodeSequences.java:110)
>  at 
> org.apache.cassandra.service.StorageService.removeNode(StorageService.java:3682)
>  at org.apache.cassandra.tools.NodeProbe.removeNode(NodeProbe.java:1020) at 
> org.apache.cassandra.tools.nodetool.RemoveNode.execute(RemoveNode.java:51) at 
> org.apache.cassandra.tools.NodeTool$NodeToolCmd.runInternal(NodeTool.java:388)
>  at org.apache.cassandra.tools.NodeTool$NodeToolCmd.run(NodeTool.java:373) at 
> org.apache.cassandra.tools.NodeTool.execute(NodeTool.java:272) at 
> org.apache.cassandra.distributed.impl.Instance$DTestNodeTool.execute(Instance.java:1129)
>  at 
> org.apache.cassandra.distributed.impl.Instance.lambda$nodetoolResult$51(Instance.java:1038)
>  at org.apache.cassandra.concurrent.FutureTask.call(FutureTask.java:61) at 
> org.apache.cassandra.concurrent.FutureTask.run(FutureTask.java:71) at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
>  at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
>  at 
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>  at java.base/java.lang.Thread.run(Thread.java:833) Notifications: Error: 
> java.lang.UnsupportedOperationException: Node /127.0.0.4:7012 is alive and 
> owns this ID. Use decommission command to remove it from the ring at 
> org.apache.cassandra.tcm.sequences.SingleNodeSequences.removeNode(SingleNodeSequences.java:110)
>  at 
> org.apache.cassandra.service.StorageService.removeNode(StorageService.java:3682)
>  at org.apache.cassandra.tools.NodeProbe.removeNode(NodeProbe.java:1020) at 
> org.apache.cassandra.tools.nodetool.RemoveNode.execute(RemoveNode.java:51) at 
> org.apache.cassandra.tools.NodeTool$NodeToolCmd.runInternal(NodeTool.java:388)
>  at org.apache.cassandra.tools.NodeTool$NodeToolCmd.run(NodeTool.java:373) at 
> org.apache.cassandra.tools.NodeTool.execute(NodeTool.java:272) at 
> org.apache.cassandra.distributed.impl.Instance$DTestNodeTool.execute(Instance.java:1129)
>  at 
> 
