[jira] [Updated] (CASSANDRA-19592) Expand CREATE TABLE CQL on a coordinating node before submitting to CMS
[ https://issues.apache.org/jira/browse/CASSANDRA-19592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sam Tunnicliffe updated CASSANDRA-19592:
    Reviewers: Sam Tunnicliffe, Stefan Miklosovic  (was: Sam Tunnicliffe)

> Expand CREATE TABLE CQL on a coordinating node before submitting to CMS
> ------------------------------------------------------------------------
>                 Key: CASSANDRA-19592
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19592
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Cluster/Schema
>            Reporter: Alex Petrov
>            Assignee: Alex Petrov
>            Priority: Normal
>             Fix For: 5.1
>         Attachments: ci_summary-1.html, ci_summary.html
>
> This is done to unblock CASSANDRA-12937 and allow preserving defaults with
> which the table was created between node bounces and between nodes with
> different configurations. For now, we are preserving 5.0 behaviour.

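For context, "expanding" the CQL here means resolving coordinator-local defaults into the statement before it is submitted to the CMS, so that every node applies an identical, fully-specified schema regardless of its own configuration. A minimal sketch of that idea follows; the option names and defaults are illustrative assumptions, not the actual patch:

{code:java}
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch only: expand a CREATE TABLE's options with the
// coordinator's local defaults so the CMS receives fully-specified DDL.
public class CreateTableExpansion
{
    // Hypothetical coordinator-local defaults; in reality these come from
    // cassandra.yaml via DatabaseDescriptor and may differ per node.
    static final Map<String, String> LOCAL_DEFAULTS = Map.of(
        "compaction", "{'class': 'UnifiedCompactionStrategy'}",
        "compression", "{'class': 'LZ4Compressor'}");

    // Any option the user omitted is pinned to the coordinator's value,
    // so the statement no longer depends on per-node configuration.
    static Map<String, String> expand(Map<String, String> userOptions)
    {
        Map<String, String> expanded = new LinkedHashMap<>(LOCAL_DEFAULTS);
        expanded.putAll(userOptions); // user-specified options always win
        return expanded;
    }

    public static void main(String[] args)
    {
        // The user only chose compression; compaction is resolved on the
        // coordinator before the statement is submitted to the CMS.
        System.out.println(expand(Map.of("compression", "{'class': 'ZstdCompressor'}")));
    }
}
{code}
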
[jira] [Updated] (CASSANDRA-19592) Expand CREATE TABLE CQL on a coordinating node before submitting to CMS
[ https://issues.apache.org/jira/browse/CASSANDRA-19592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sam Tunnicliffe updated CASSANDRA-19592:
          Fix Version/s: 5.1
          Since Version: NA
    Source Control Link: https://github.com/apache/cassandra/commit/7fe30fc313ac35b1156f5a37d2069e29cded710b
             Resolution: Fixed
                 Status: Resolved  (was: Ready to Commit)

committed, thanks.

> Expand CREATE TABLE CQL on a coordinating node before submitting to CMS
> ------------------------------------------------------------------------
>                 Key: CASSANDRA-19592
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19592
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Cluster/Schema
>            Reporter: Alex Petrov
>            Assignee: Alex Petrov
>            Priority: Normal
>             Fix For: 5.1
>         Attachments: ci_summary-1.html, ci_summary.html
>
> This is done to unblock CASSANDRA-12937 and allow preserving defaults with
> which the table was created between node bounces and between nodes with
> different configurations. For now, we are preserving 5.0 behaviour.

[jira] [Updated] (CASSANDRA-19592) Expand CREATE TABLE CQL on a coordinating node before submitting to CMS
[ https://issues.apache.org/jira/browse/CASSANDRA-19592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sam Tunnicliffe updated CASSANDRA-19592:
    Status: Ready to Commit  (was: Review In Progress)

> Expand CREATE TABLE CQL on a coordinating node before submitting to CMS
> ------------------------------------------------------------------------
>                 Key: CASSANDRA-19592
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19592
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Cluster/Schema
>            Reporter: Alex Petrov
>            Assignee: Alex Petrov
>            Priority: Normal
>         Attachments: ci_summary-1.html, ci_summary.html
>
> This is done to unblock CASSANDRA-12937 and allow preserving defaults with
> which the table was created between node bounces and between nodes with
> different configurations. For now, we are preserving 5.0 behaviour.

[jira] [Commented] (CASSANDRA-19592) Expand CREATE TABLE CQL on a coordinating node before submitting to CMS
[ https://issues.apache.org/jira/browse/CASSANDRA-19592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17848984#comment-17848984 ]

Sam Tunnicliffe commented on CASSANDRA-19592:
---------------------------------------------

{quote}Does anything else need to be done except merging?
{quote}
No, I think it just fell between Alex & me. I'll get it rebased & merged.

> Expand CREATE TABLE CQL on a coordinating node before submitting to CMS
> ------------------------------------------------------------------------
>                 Key: CASSANDRA-19592
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19592
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Cluster/Schema
>            Reporter: Alex Petrov
>            Assignee: Alex Petrov
>            Priority: Normal
>         Attachments: ci_summary-1.html, ci_summary.html
>
> This is done to unblock CASSANDRA-12937 and allow preserving defaults with
> which the table was created between node bounces and between nodes with
> different configurations. For now, we are preserving 5.0 behaviour.

[jira] [Commented] (CASSANDRA-19593) Transactional Guardrails
[ https://issues.apache.org/jira/browse/CASSANDRA-19593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17848624#comment-17848624 ]

Sam Tunnicliffe commented on CASSANDRA-19593:
---------------------------------------------

{quote}This brings us to more general problem of transactional configuration which should be done as well. It is questionable if it is desirable to do it as part of this ticket or not, however, I would like to look into how we could do that as well.
{quote}
We've been working on some proposals for this, some of which were briefly
discussed in CASSANDRA-12937. I agree with [~ifesdjeen] that this warrants
its own CEP. I know he's been working on a document for that; I'll see if we
can get it ready for circulation.

> Transactional Guardrails
> ------------------------
>                 Key: CASSANDRA-19593
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19593
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Feature/Guardrails, Transactional Cluster Metadata
>            Reporter: Stefan Miklosovic
>            Assignee: Stefan Miklosovic
>            Priority: Normal
>             Fix For: 5.x
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> I think it is time to start to think about this more seriously. TCM is
> getting into pretty nice shape and we might start to investigate how to do
> this.

[jira] [Commented] (CASSANDRA-19556) Add guardrail to block DDL/DCL queries and replace alter_table_enabled guardrail
[ https://issues.apache.org/jira/browse/CASSANDRA-19556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17848544#comment-17848544 ]

Sam Tunnicliffe commented on CASSANDRA-19556:
---------------------------------------------

[~mck] this certainly isn't critical for 5.1/6.0; my comment was just
intended as a counterpoint to illustrate why it might be useful in a 5.0.x.

To that point, I'd definitely think about adding _something_ to minors in
branches with upgrade paths to current trunk. Not an actual guardrail, just
a system property or similar to optionally disable certain operations
immediately prior to upgrade.

If we did go down that route, there is some precedent from back in the day
for mandating a minimum minor version prior to a major upgrade (from
{{NEWS.txt}}):
{code:java}
Upgrade to 3.0 is supported from Cassandra 2.1 versions greater or equal to 2.1.9, or Cassandra 2.2 versions greater or equal to 2.2.2.
{code}
but like I said, this isn't critical for upgrading to current trunk and I'm
definitely not advocating for anything in 5.0-rc.

> Add guardrail to block DDL/DCL queries and replace alter_table_enabled
> guardrail
> -----------------------------------------------------------------------
>                 Key: CASSANDRA-19556
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19556
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Feature/Guardrails
>            Reporter: Yuqi Yan
>            Assignee: Yuqi Yan
>            Priority: Normal
>             Fix For: 5.0-rc, 5.x
>          Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Sometimes we want to block DDL/DCL queries to stop new schemas or roles
> being created (e.g. when doing a live upgrade).
> For the DDL guardrail, the current implementation won't block the query if
> it's a no-op (e.g. CREATE TABLE...IF NOT EXISTS when the table already
> exists, etc. The guardrail check is added in apply() right after all the
> existence checks.)
> I don't have a preference between blocking every DDL query and checking
> whether it's a no-op here. It's just that we have some users who always run
> CREATE..IF NOT EXISTS.. at startup, which is a no-op but will be blocked by
> this guardrail and fail to start.
>
> 4.1 PR: [https://github.com/apache/cassandra/pull/3248]
> trunk PR: [https://github.com/apache/cassandra/pull/3275]

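The "system property or similar" suggested above could be as small as the following sketch; the property name and check location are purely hypothetical, not an existing Cassandra flag:

{code:java}
// Hypothetical sketch of a pre-upgrade DDL gate; the property name
// "cassandra.block_ddl_for_upgrade" is an assumption, not a real flag.
public class DdlGate
{
    static boolean ddlBlocked()
    {
        // Boolean.getBoolean reads a JVM system property, e.g. set via
        // -Dcassandra.block_ddl_for_upgrade=true before the upgrade window.
        return Boolean.getBoolean("cassandra.block_ddl_for_upgrade");
    }

    // Called before applying a schema-altering statement.
    static void checkDdlAllowed(String statement)
    {
        if (ddlBlocked())
            throw new IllegalStateException(
                "DDL is disabled ahead of a major upgrade: " + statement);
    }

    public static void main(String[] args)
    {
        System.setProperty("cassandra.block_ddl_for_upgrade", "true");
        // Throws, demonstrating the gate; with the property unset it is a no-op.
        checkDdlAllowed("CREATE TABLE ks.t (k int PRIMARY KEY)");
    }
}
{code}
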
[jira] [Commented] (CASSANDRA-19592) Expand CREATE TABLE CQL on a coordinating node before submitting to CMS
[ https://issues.apache.org/jira/browse/CASSANDRA-19592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17846952#comment-17846952 ]

Sam Tunnicliffe commented on CASSANDRA-19592:
---------------------------------------------

[~ifesdjeen] I'm +1 on this version, wdyt?

> Expand CREATE TABLE CQL on a coordinating node before submitting to CMS
> ------------------------------------------------------------------------
>                 Key: CASSANDRA-19592
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19592
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Cluster/Schema
>            Reporter: Alex Petrov
>            Assignee: Alex Petrov
>            Priority: Normal
>         Attachments: ci_summary-1.html, ci_summary.html
>
> This is done to unblock CASSANDRA-12937 and allow preserving defaults with
> which the table was created between node bounces and between nodes with
> different configurations. For now, we are preserving 5.0 behaviour.

[jira] [Updated] (CASSANDRA-19592) Expand CREATE TABLE CQL on a coordinating node before submitting to CMS
[ https://issues.apache.org/jira/browse/CASSANDRA-19592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sam Tunnicliffe updated CASSANDRA-19592:
    Attachment: ci_summary-1.html

> Expand CREATE TABLE CQL on a coordinating node before submitting to CMS
> ------------------------------------------------------------------------
>                 Key: CASSANDRA-19592
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19592
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Cluster/Schema
>            Reporter: Alex Petrov
>            Assignee: Alex Petrov
>            Priority: Normal
>         Attachments: ci_summary-1.html, ci_summary.html
>
> This is done to unblock CASSANDRA-12937 and allow preserving defaults with
> which the table was created between node bounces and between nodes with
> different configurations. For now, we are preserving 5.0 behaviour.

[jira] [Updated] (CASSANDRA-19592) Expand CREATE TABLE CQL on a coordinating node before submitting to CMS
[ https://issues.apache.org/jira/browse/CASSANDRA-19592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sam Tunnicliffe updated CASSANDRA-19592:
    Status: Review In Progress  (was: Changes Suggested)

Discussed with Alex and made a few tweaks. Pushed the latest version and
attached updated CI summary. The single test failure is unrelated.

> Expand CREATE TABLE CQL on a coordinating node before submitting to CMS
> ------------------------------------------------------------------------
>                 Key: CASSANDRA-19592
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19592
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Cluster/Schema
>            Reporter: Alex Petrov
>            Assignee: Alex Petrov
>            Priority: Normal
>         Attachments: ci_summary.html
>
> This is done to unblock CASSANDRA-12937 and allow preserving defaults with
> which the table was created between node bounces and between nodes with
> different configurations. For now, we are preserving 5.0 behaviour.

[jira] [Updated] (CASSANDRA-19599) Remove unused config params for out of range token requests
[ https://issues.apache.org/jira/browse/CASSANDRA-19599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sam Tunnicliffe updated CASSANDRA-19599:
          Fix Version/s: 5.1
                         (was: 5.x)
    Source Control Link: https://github.com/apache/cassandra/commit/a15b137b7c8c84773453dbe264fcd2d4b76222c0
             Resolution: Fixed
                 Status: Resolved  (was: Ready to Commit)

Committed, thanks

> Remove unused config params for out of range token requests
> ------------------------------------------------------------
>                 Key: CASSANDRA-19599
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19599
>             Project: Cassandra
>          Issue Type: Task
>          Components: Local/Config
>            Reporter: Sam Tunnicliffe
>            Assignee: Sam Tunnicliffe
>            Priority: Normal
>             Fix For: 5.1
>         Attachments: ci_summary.html
>
> The fields {{log_out_of_token_range_requests}} and
> {{reject_out_of_token_range_requests}} in {{Config.java}} have never
> actually been used and are just vestiges from early development on CEP-21.
> We should remove them and the related accessors in {{DatabaseDescriptor}}.

[jira] [Updated] (CASSANDRA-19599) Remove unused config params for out of range token requests
[ https://issues.apache.org/jira/browse/CASSANDRA-19599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sam Tunnicliffe updated CASSANDRA-19599:
    Status: Ready to Commit  (was: Review In Progress)

> Remove unused config params for out of range token requests
> ------------------------------------------------------------
>                 Key: CASSANDRA-19599
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19599
>             Project: Cassandra
>          Issue Type: Task
>          Components: Local/Config
>            Reporter: Sam Tunnicliffe
>            Assignee: Sam Tunnicliffe
>            Priority: Normal
>             Fix For: 5.x
>         Attachments: ci_summary.html
>
> The fields {{log_out_of_token_range_requests}} and
> {{reject_out_of_token_range_requests}} in {{Config.java}} have never
> actually been used and are just vestiges from early development on CEP-21.
> We should remove them and the related accessors in {{DatabaseDescriptor}}.

[jira] [Updated] (CASSANDRA-19599) Remove unused config params for out of range token requests
[ https://issues.apache.org/jira/browse/CASSANDRA-19599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sam Tunnicliffe updated CASSANDRA-19599:
    Reviewers: Marcus Eriksson
       Status: Review In Progress  (was: Patch Available)

> Remove unused config params for out of range token requests
> ------------------------------------------------------------
>                 Key: CASSANDRA-19599
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19599
>             Project: Cassandra
>          Issue Type: Task
>          Components: Local/Config
>            Reporter: Sam Tunnicliffe
>            Assignee: Sam Tunnicliffe
>            Priority: Normal
>             Fix For: 5.x
>         Attachments: ci_summary.html
>
> The fields {{log_out_of_token_range_requests}} and
> {{reject_out_of_token_range_requests}} in {{Config.java}} have never
> actually been used and are just vestiges from early development on CEP-21.
> We should remove them and the related accessors in {{DatabaseDescriptor}}.

[jira] [Updated] (CASSANDRA-19599) Remove unused config params for out of range token requests
[ https://issues.apache.org/jira/browse/CASSANDRA-19599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sam Tunnicliffe updated CASSANDRA-19599:
    Attachment: ci_summary.html

> Remove unused config params for out of range token requests
> ------------------------------------------------------------
>                 Key: CASSANDRA-19599
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19599
>             Project: Cassandra
>          Issue Type: Task
>          Components: Local/Config
>            Reporter: Sam Tunnicliffe
>            Assignee: Sam Tunnicliffe
>            Priority: Normal
>             Fix For: 5.x
>         Attachments: ci_summary.html
>
> The fields {{log_out_of_token_range_requests}} and
> {{reject_out_of_token_range_requests}} in {{Config.java}} have never
> actually been used and are just vestiges from early development on CEP-21.
> We should remove them and the related accessors in {{DatabaseDescriptor}}.

[jira] [Comment Edited] (CASSANDRA-19556) Add guardrail to block DDL/DCL queries and replace alter_table_enabled guardrail
[ https://issues.apache.org/jira/browse/CASSANDRA-19556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17845345#comment-17845345 ]

Sam Tunnicliffe edited comment on CASSANDRA-19556 at 5/10/24 1:15 PM:
----------------------------------------------------------------------

bq. personally i see no reason this needs to be in any 5.0.x

Not an absolute necessity, but it would be quite convenient to have it or
something similar as part of the preferred upgrade path to 5.1/6.0, as
mentioned [here|https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-21%3A+Transactional+Cluster+Metadata#CEP21:TransactionalClusterMetadata-MigrationPlan]

was (Author: beobal):
bq. personally i see no reason this needs to be in any 5.0.x

Not an absolute necessity, but it would be quite convenient to have it or
something similar, as part of the preferred upgrade path to 5.1/6.0, as
mentioned [here|https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-21%3A+Transactional+Cluster+Metadata#CEP21:TransactionalClusterMetadata-MigrationPlan]

> Add guardrail to block DDL/DCL queries and replace alter_table_enabled
> guardrail
> -----------------------------------------------------------------------
>                 Key: CASSANDRA-19556
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19556
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Feature/Guardrails
>            Reporter: Yuqi Yan
>            Assignee: Yuqi Yan
>            Priority: Normal
>             Fix For: 5.0-rc, 5.x
>          Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Sometimes we want to block DDL/DCL queries to stop new schemas or roles
> being created (e.g. when doing a live upgrade).
> For the DDL guardrail, the current implementation won't block the query if
> it's a no-op (e.g. CREATE TABLE...IF NOT EXISTS when the table already
> exists, etc. The guardrail check is added in apply() right after all the
> existence checks.)
> I don't have a preference between blocking every DDL query and checking
> whether it's a no-op here. It's just that we have some users who always run
> CREATE..IF NOT EXISTS.. at startup, which is a no-op but will be blocked by
> this guardrail and fail to start.
>
> 4.1 PR: [https://github.com/apache/cassandra/pull/3248]
> trunk PR: [https://github.com/apache/cassandra/pull/3275]

[jira] [Comment Edited] (CASSANDRA-19556) Add guardrail to block DDL/DCL queries and replace alter_table_enabled guardrail
[ https://issues.apache.org/jira/browse/CASSANDRA-19556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17845345#comment-17845345 ]

Sam Tunnicliffe edited comment on CASSANDRA-19556 at 5/10/24 1:14 PM:
----------------------------------------------------------------------

bq. personally i see no reason this needs to be in any 5.0.x

Not an absolute necessity, but it would be quite convenient to have it or
something similar, as part of the preferred upgrade path to 5.1/6.0, as
mentioned [here|https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-21%3A+Transactional+Cluster+Metadata#CEP21:TransactionalClusterMetadata-MigrationPlan]

was (Author: beobal):
bq. personally i see no reason this needs to be in any 5.0.x

Not an absolute necessity, but it would be quite convenient to have it or
something similar, as part of the preferred upgrade path to 5.1/6.0, as
mentioned [here|https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-21%3A+Transactional+Cluster+Metadata#CEP21:TransactionalClusterMetadata-MigrationPlan]

> Add guardrail to block DDL/DCL queries and replace alter_table_enabled
> guardrail
> -----------------------------------------------------------------------
>                 Key: CASSANDRA-19556
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19556
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Feature/Guardrails
>            Reporter: Yuqi Yan
>            Assignee: Yuqi Yan
>            Priority: Normal
>             Fix For: 5.0-rc, 5.x
>          Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Sometimes we want to block DDL/DCL queries to stop new schemas or roles
> being created (e.g. when doing a live upgrade).
> For the DDL guardrail, the current implementation won't block the query if
> it's a no-op (e.g. CREATE TABLE...IF NOT EXISTS when the table already
> exists, etc. The guardrail check is added in apply() right after all the
> existence checks.)
> I don't have a preference between blocking every DDL query and checking
> whether it's a no-op here. It's just that we have some users who always run
> CREATE..IF NOT EXISTS.. at startup, which is a no-op but will be blocked by
> this guardrail and fail to start.
>
> 4.1 PR: [https://github.com/apache/cassandra/pull/3248]
> trunk PR: [https://github.com/apache/cassandra/pull/3275]

[jira] [Commented] (CASSANDRA-19556) Add guardrail to block DDL/DCL queries and replace alter_table_enabled guardrail
[ https://issues.apache.org/jira/browse/CASSANDRA-19556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17845345#comment-17845345 ]

Sam Tunnicliffe commented on CASSANDRA-19556:
---------------------------------------------

bq. personally i see no reason this needs to be in any 5.0.x

Not an absolute necessity, but it would be quite convenient to have it or
something similar, as part of the preferred upgrade path to 5.1/6.0, as
mentioned [here|https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-21%3A+Transactional+Cluster+Metadata#CEP21:TransactionalClusterMetadata-MigrationPlan]

> Add guardrail to block DDL/DCL queries and replace alter_table_enabled
> guardrail
> -----------------------------------------------------------------------
>                 Key: CASSANDRA-19556
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19556
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Feature/Guardrails
>            Reporter: Yuqi Yan
>            Assignee: Yuqi Yan
>            Priority: Normal
>             Fix For: 5.0-rc, 5.x
>          Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Sometimes we want to block DDL/DCL queries to stop new schemas or roles
> being created (e.g. when doing a live upgrade).
> For the DDL guardrail, the current implementation won't block the query if
> it's a no-op (e.g. CREATE TABLE...IF NOT EXISTS when the table already
> exists, etc. The guardrail check is added in apply() right after all the
> existence checks.)
> I don't have a preference between blocking every DDL query and checking
> whether it's a no-op here. It's just that we have some users who always run
> CREATE..IF NOT EXISTS.. at startup, which is a no-op but will be blocked by
> this guardrail and fail to start.
>
> 4.1 PR: [https://github.com/apache/cassandra/pull/3248]
> trunk PR: [https://github.com/apache/cassandra/pull/3275]

[jira] [Updated] (CASSANDRA-19158) Reuse native transport-driven futures in Debounce
[ https://issues.apache.org/jira/browse/CASSANDRA-19158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sam Tunnicliffe updated CASSANDRA-19158:
    Status: Changes Suggested  (was: Review In Progress)

Thanks, this is definitely an improvement. I've left a few comments on the PR

> Reuse native transport-driven futures in Debounce
> --------------------------------------------------
>                 Key: CASSANDRA-19158
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19158
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Transactional Cluster Metadata
>            Reporter: Alex Petrov
>            Assignee: Alex Petrov
>            Priority: Normal
>         Attachments: ci_summary.html
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> Currently, we create a future in Debounce, then create one more future in
> RemoteProcessor#sendWithCallback. This is further exacerbated by chaining
> calls, when we first attempt to catch up from a peer, and then from the CMS.
> First of all, we should only use the native transport timeout-driven
> futures returned from sendWithCallback, since they implement reasonable
> retries under the hood, and are easy to bulk-configure (i.e. you can simply
> change the timeout in yaml and have all futures change their behaviour).
> Second, we should _chain_ futures and use map or andThen for fallback
> operations such as trying to catch up from the CMS after an unsuccessful
> attempt to catch up from a peer.
> This should significantly simplify the code and the number of
> blocked/waiting threads.

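The chaining the ticket asks for is essentially "try the peer, fall back to the CMS" expressed as one future pipeline instead of nested blocking waits. Cassandra uses its own Future hierarchy, but the shape is the same as this java.util.concurrent sketch (method names below are illustrative, not the actual code):

{code:java}
import java.util.concurrent.CompletableFuture;

// Sketch of the fallback chaining described in the ticket: one pipeline
// that tries a peer first and falls back to the CMS, rather than blocking
// on an intermediate future before creating the next one.
public class CatchupChain
{
    static CompletableFuture<String> catchUpFromPeer()
    {
        // Simulate a peer that fails to answer in time.
        return CompletableFuture.failedFuture(new RuntimeException("peer timeout"));
    }

    static CompletableFuture<String> catchUpFromCMS()
    {
        return CompletableFuture.completedFuture("epoch 42 from CMS");
    }

    public static void main(String[] args)
    {
        // exceptionallyCompose (Java 12+) chains the fallback without an
        // extra hand-rolled future or a blocked waiting thread.
        String result = catchUpFromPeer()
            .exceptionallyCompose(failure -> catchUpFromCMS())
            .join();
        System.out.println(result); // -> epoch 42 from CMS
    }
}
{code}
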
[jira] [Updated] (CASSANDRA-19158) Reuse native transport-driven futures in Debounce
[ https://issues.apache.org/jira/browse/CASSANDRA-19158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sam Tunnicliffe updated CASSANDRA-19158:
    Status: Review In Progress  (was: Patch Available)

> Reuse native transport-driven futures in Debounce
> --------------------------------------------------
>                 Key: CASSANDRA-19158
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19158
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Transactional Cluster Metadata
>            Reporter: Alex Petrov
>            Assignee: Alex Petrov
>            Priority: Normal
>         Attachments: ci_summary.html
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently, we create a future in Debounce, then create one more future in
> RemoteProcessor#sendWithCallback. This is further exacerbated by chaining
> calls, when we first attempt to catch up from a peer, and then from the CMS.
> First of all, we should only use the native transport timeout-driven
> futures returned from sendWithCallback, since they implement reasonable
> retries under the hood, and are easy to bulk-configure (i.e. you can simply
> change the timeout in yaml and have all futures change their behaviour).
> Second, we should _chain_ futures and use map or andThen for fallback
> operations such as trying to catch up from the CMS after an unsuccessful
> attempt to catch up from a peer.
> This should significantly simplify the code and the number of
> blocked/waiting threads.

[jira] [Updated] (CASSANDRA-19158) Reuse native transport-driven futures in Debounce
[ https://issues.apache.org/jira/browse/CASSANDRA-19158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sam Tunnicliffe updated CASSANDRA-19158:
    Test and Documentation Plan: Refactoring, CI tests
                         Status: Patch Available  (was: Open)

> Reuse native transport-driven futures in Debounce
> --------------------------------------------------
>                 Key: CASSANDRA-19158
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19158
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Transactional Cluster Metadata
>            Reporter: Alex Petrov
>            Assignee: Alex Petrov
>            Priority: Normal
>         Attachments: ci_summary.html
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently, we create a future in Debounce, then create one more future in
> RemoteProcessor#sendWithCallback. This is further exacerbated by chaining
> calls, when we first attempt to catch up from a peer, and then from the CMS.
> First of all, we should only use the native transport timeout-driven
> futures returned from sendWithCallback, since they implement reasonable
> retries under the hood, and are easy to bulk-configure (i.e. you can simply
> change the timeout in yaml and have all futures change their behaviour).
> Second, we should _chain_ futures and use map or andThen for fallback
> operations such as trying to catch up from the CMS after an unsuccessful
> attempt to catch up from a peer.
> This should significantly simplify the code and the number of
> blocked/waiting threads.

[jira] [Updated] (CASSANDRA-19158) Reuse native transport-driven futures in Debounce
[ https://issues.apache.org/jira/browse/CASSANDRA-19158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sam Tunnicliffe updated CASSANDRA-19158:
    Change Category: Code Clarity
         Complexity: Normal
        Component/s: Transactional Cluster Metadata
          Reviewers: Sam Tunnicliffe
             Status: Open  (was: Triage Needed)

> Reuse native transport-driven futures in Debounce
> --------------------------------------------------
>                 Key: CASSANDRA-19158
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19158
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Transactional Cluster Metadata
>            Reporter: Alex Petrov
>            Assignee: Alex Petrov
>            Priority: Normal
>         Attachments: ci_summary.html
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently, we create a future in Debounce, then create one more future in
> RemoteProcessor#sendWithCallback. This is further exacerbated by chaining
> calls, when we first attempt to catch up from a peer, and then from the CMS.
> First of all, we should only use the native transport timeout-driven
> futures returned from sendWithCallback, since they implement reasonable
> retries under the hood, and are easy to bulk-configure (i.e. you can simply
> change the timeout in yaml and have all futures change their behaviour).
> Second, we should _chain_ futures and use map or andThen for fallback
> operations such as trying to catch up from the CMS after an unsuccessful
> attempt to catch up from a peer.
> This should significantly simplify the code and the number of
> blocked/waiting threads.

[jira] [Commented] (CASSANDRA-19593) Transactional Guardrails
[ https://issues.apache.org/jira/browse/CASSANDRA-19593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17844326#comment-17844326 ]

Sam Tunnicliffe commented on CASSANDRA-19593:
---------------------------------------------

I think this is a great idea; in fact, I would suggest that all node
configuration should be moving away from yaml and into global cluster
metadata, per CEP-21. However, I don't think we should rush to do this just
yet. The introduction of TCM has already meant a lot of change to the
codebase, and I can't help feeling that it would benefit from a period of
stabilization. Understandably and correctly IMO, community focus for the
past few months has been on getting 5.0 ready for release, so I don't
believe there have actually been that many eyes on trunk and TCM in
particular. While we have confidence in the design and implementation of
CEP-21, my personal opinion is that it would be best to phase it in as
incrementally as possible. Like I said, a lot has already changed, but this
is really the minimum that could be done to make it useful. I'd propose
holding off on bringing more into TCM until 5.0 is out the door at least,
but maybe even until 5.1 has had some meaningful exposure.

> Transactional Guardrails
> ------------------------
>                 Key: CASSANDRA-19593
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19593
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Feature/Guardrails, Transactional Cluster Metadata
>            Reporter: Stefan Miklosovic
>            Assignee: Stefan Miklosovic
>            Priority: Normal
>             Fix For: 5.x
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> I think it is time to start to think about this more seriously. TCM is
> getting into pretty nice shape and we might start to investigate how to do
> this.

[jira] [Updated] (CASSANDRA-19615) Merge pre-existing schema with the system defined one during upgrade
[ https://issues.apache.org/jira/browse/CASSANDRA-19615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sam Tunnicliffe updated CASSANDRA-19615:
    Status: Review In Progress  (was: Patch Available)

> Merge pre-existing schema with the system defined one during upgrade
> ---------------------------------------------------------------------
>                 Key: CASSANDRA-19615
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19615
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Transactional Cluster Metadata
>            Reporter: Marcus Eriksson
>            Assignee: Marcus Eriksson
>            Priority: Normal
>             Fix For: 5.1-alpha1
>         Attachments: ci_summary.html
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> When upgrading we should merge the pre-existing schema with the
> system-defined schema. For example, if a table was defined in 5.0 in
> system_distributed, but then removed from SystemDistributedKeyspace.java in
> 5.1, we should still be able to read it (until manually dropped).

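A hedged sketch of the merge rule this describes: the system-defined schema wins for tables both sides know about, while tables that exist only in the pre-upgrade schema are carried forward until dropped. Table names and the string-valued "definitions" below are invented for illustration:

{code:java}
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of merging a pre-existing (5.0) schema with the system-defined
// (5.1) one: system definitions take precedence on conflict, but tables
// that only exist in the old schema stay readable until manually dropped.
public class SchemaMerge
{
    static Map<String, String> merge(Map<String, String> preExisting,
                                     Map<String, String> systemDefined)
    {
        Map<String, String> merged = new LinkedHashMap<>(preExisting);
        merged.putAll(systemDefined); // system-defined definitions win
        return merged;
    }

    public static void main(String[] args)
    {
        Map<String, String> pre = Map.of("repair_history", "5.0 definition",
                                         "legacy_table", "5.0 definition");
        Map<String, String> sys = Map.of("repair_history", "5.1 definition");
        // legacy_table survives the upgrade even though 5.1 no longer defines it.
        System.out.println(merge(pre, sys));
    }
}
{code}
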
[jira] [Updated] (CASSANDRA-19615) Merge pre-existing schema with the system defined one during upgrade
[ https://issues.apache.org/jira/browse/CASSANDRA-19615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sam Tunnicliffe updated CASSANDRA-19615:
    Status: Ready to Commit  (was: Review In Progress)

+1 with one very minor comment on the PR

> Merge pre-existing schema with the system defined one during upgrade
> ---------------------------------------------------------------------
>                 Key: CASSANDRA-19615
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19615
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Transactional Cluster Metadata
>            Reporter: Marcus Eriksson
>            Assignee: Marcus Eriksson
>            Priority: Normal
>             Fix For: 5.1-alpha1
>         Attachments: ci_summary.html
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> When upgrading we should merge the pre-existing schema with the
> system-defined schema. For example, if a table was defined in 5.0 in
> system_distributed, but then removed from SystemDistributedKeyspace.java in
> 5.1, we should still be able to read it (until manually dropped).

[jira] [Commented] (CASSANDRA-19517) Raise priority of TCM internode messages during critical operations
[ https://issues.apache.org/jira/browse/CASSANDRA-19517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17843238#comment-17843238 ]

Sam Tunnicliffe commented on CASSANDRA-19517:
---------------------------------------------

+1

> Raise priority of TCM internode messages during critical operations
> --------------------------------------------------------------------
>                 Key: CASSANDRA-19517
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19517
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Transactional Cluster Metadata
>            Reporter: Alex Petrov
>            Assignee: Alex Petrov
>            Priority: Normal
>         Attachments: ci_summary-1.html, ci_summary.html, result_details.tar.gz
>
> In a busy cluster, TCM messages may not get propagated throughout the
> cluster, since they will be ordered together with other P1 messages (for
> {{TCM_}} prefixed verbs), and with P2 for all Paxos operations.
> To avoid this, and to make sure we can continue cluster metadata changes,
> all {{TCM_}}-prefixed verbs should have {{P0}} priority, just like Gossip
> messages used to. All Paxos messages that involve the distributed metadata
> keyspace should now get an {{URGENT}} flag, which will instruct internode
> messaging to schedule them on the {{URGENT_MESSAGES}} connection.

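To make the ordering concrete: the change pins TCM_ verbs at the highest priority so they drain ahead of ordinary traffic. A simplified sketch, assuming an invented enum rather than Cassandra's actual Verb machinery:

{code:java}
import java.util.Arrays;
import java.util.Comparator;

// Simplified sketch of the priority change described above: TCM_ verbs sort
// ahead of ordinary P1/P2 traffic, mirroring how Gossip messages used to
// behave. The enum and verb names are illustrative, not the real Verb code.
public class VerbPriority
{
    enum Priority { P0, P1, P2 } // lower ordinal = more urgent

    record Verb(String name, Priority priority) {}

    public static void main(String[] args)
    {
        Verb[] queue = {
            new Verb("PAXOS_PREPARE", Priority.P2),
            new Verb("TCM_REPLICATION", Priority.P0), // was P1 before the patch
            new Verb("MUTATION", Priority.P1),
        };
        Arrays.sort(queue, Comparator.comparing(Verb::priority));
        // TCM_REPLICATION now drains first, even on a busy cluster.
        for (Verb v : queue)
            System.out.println(v.name());
    }
}
{code}
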
[jira] [Updated] (CASSANDRA-19581) Add nodetool command to unregister LEFT nodes
[ https://issues.apache.org/jira/browse/CASSANDRA-19581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sam Tunnicliffe updated CASSANDRA-19581:
    Status: Ready to Commit  (was: Changes Suggested)

+1

> Add nodetool command to unregister LEFT nodes
> ----------------------------------------------
>                 Key: CASSANDRA-19581
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19581
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Transactional Cluster Metadata
>            Reporter: Marcus Eriksson
>            Assignee: Marcus Eriksson
>            Priority: Normal
>         Attachments: ci_summary-1.html, ci_summary.html
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> When decommissioning a node it still remains in ClusterMetadata with state
> = LEFT. We should provide a nodetool command to unregister such nodes
> completely.

[jira] [Updated] (CASSANDRA-19613) Add ClusterMetadata.metadataIdentifier to GossipDigestSyn messages
[ https://issues.apache.org/jira/browse/CASSANDRA-19613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sam Tunnicliffe updated CASSANDRA-19613:
          Fix Version/s: 5.1
          Since Version: NA
    Source Control Link: https://github.com/apache/cassandra/commit/51d048a93a7e7cfb93a544dabba4b6f7aa1bbdd1
             Resolution: Fixed
                 Status: Resolved  (was: Ready to Commit)

Committed, thanks!

> Add ClusterMetadata.metadataIdentifier to GossipDigestSyn messages
> -------------------------------------------------------------------
>                 Key: CASSANDRA-19613
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19613
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Cluster/Gossip, Transactional Cluster Metadata
>            Reporter: Sam Tunnicliffe
>            Assignee: Sam Tunnicliffe
>            Priority: Normal
>             Fix For: 5.1
>         Attachments: ci_summary.html
>
> We should add {{ClusterMetadata.instance().metadataIdentifier}} to
> {{GossipDigestSyn}} messages and compare with the local one, rejecting
> anything that has the wrong identifier, like we do with cluster name.

[jira] [Updated] (CASSANDRA-19613) Add ClusterMetadata.metadataIdentifier to GossipDigestSyn messages
[ https://issues.apache.org/jira/browse/CASSANDRA-19613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sam Tunnicliffe updated CASSANDRA-19613:
    Attachment: ci_summary.html

> Add ClusterMetadata.metadataIdentifier to GossipDigestSyn messages
> -------------------------------------------------------------------
>                 Key: CASSANDRA-19613
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19613
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Cluster/Gossip, Transactional Cluster Metadata
>            Reporter: Sam Tunnicliffe
>            Assignee: Sam Tunnicliffe
>            Priority: Normal
>         Attachments: ci_summary.html
>
> We should add {{ClusterMetadata.instance().metadataIdentifier}} to
> {{GossipDigestSyn}} messages and compare with the local one, rejecting
> anything that has the wrong identifier, like we do with cluster name.

[jira] [Updated] (CASSANDRA-19613) Add ClusterMetadata.metadataIdentifier to GossipDigestSyn messages
[ https://issues.apache.org/jira/browse/CASSANDRA-19613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sam Tunnicliffe updated CASSANDRA-19613:
    Reviewers: Marcus Eriksson  (was: Marcus Eriksson, Sam Tunnicliffe)

> Add ClusterMetadata.metadataIdentifier to GossipDigestSyn messages
> -------------------------------------------------------------------
>                 Key: CASSANDRA-19613
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19613
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Cluster/Gossip, Transactional Cluster Metadata
>            Reporter: Sam Tunnicliffe
>            Assignee: Sam Tunnicliffe
>            Priority: Normal
>
> We should add {{ClusterMetadata.instance().metadataIdentifier}} to
> {{GossipDigestSyn}} messages and compare with the local one, rejecting
> anything that has the wrong identifier, like we do with cluster name.

[jira] [Updated] (CASSANDRA-19613) Add ClusterMetadata.metadataIdentifier to GossipDigestSyn messages
[ https://issues.apache.org/jira/browse/CASSANDRA-19613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sam Tunnicliffe updated CASSANDRA-19613:
    Reviewers: Marcus Eriksson, Sam Tunnicliffe
       Status: Review In Progress  (was: Patch Available)

> Add ClusterMetadata.metadataIdentifier to GossipDigestSyn messages
> -------------------------------------------------------------------
>                 Key: CASSANDRA-19613
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19613
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Cluster/Gossip, Transactional Cluster Metadata
>            Reporter: Sam Tunnicliffe
>            Assignee: Sam Tunnicliffe
>            Priority: Normal
>
> We should add {{ClusterMetadata.instance().metadataIdentifier}} to
> {{GossipDigestSyn}} messages and compare with the local one, rejecting
> anything that has the wrong identifier, like we do with cluster name.

[jira] [Updated] (CASSANDRA-19613) Add ClusterMetadata.metadataIdentifier to GossipDigestSyn messages
[ https://issues.apache.org/jira/browse/CASSANDRA-19613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sam Tunnicliffe updated CASSANDRA-19613:
    Test and Documentation Plan: New and existing tests in CI
                         Status: Patch Available  (was: Open)

> Add ClusterMetadata.metadataIdentifier to GossipDigestSyn messages
> -------------------------------------------------------------------
>                 Key: CASSANDRA-19613
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19613
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Cluster/Gossip, Transactional Cluster Metadata
>            Reporter: Sam Tunnicliffe
>            Assignee: Sam Tunnicliffe
>            Priority: Normal
>
> We should add {{ClusterMetadata.instance().metadataIdentifier}} to
> {{GossipDigestSyn}} messages and compare with the local one, rejecting
> anything that has the wrong identifier, like we do with cluster name.

[jira] [Updated] (CASSANDRA-19613) Add ClusterMetadata.metadataIdentifier to GossipDigestSyn messages
[ https://issues.apache.org/jira/browse/CASSANDRA-19613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sam Tunnicliffe updated CASSANDRA-19613:
     Bug Category: Parent values: Correctness(12982)Level 1 values: Test Failure(12990)
       Complexity: Normal
    Discovered By: Unit Test
         Severity: Normal
           Status: Open  (was: Triage Needed)

This is relatively benign, but it is related to flakiness in
{{o.a.c.distributed.test.tcm.SplitBrainTest}}. In that test, the two halves
of a split-brain cluster attempt to reestablish communication due to having
members of both in the seed lists. When this prompts one side to try and
catch up with metadata changes, this is correctly identified as an error,
which is what the test asserts. The flakiness arises when the gossip attempt
is made before the epochs of the two separate clusters have had a chance to
diverge. In that case, no further communication is performed and the error
is not triggered. The proposed patch includes a rewrite of
{{SplitBrainTest}} which decouples the catchup from gossip, removing the
dependency on timing.

> Add ClusterMetadata.metadataIdentifier to GossipDigestSyn messages
> -------------------------------------------------------------------
>                 Key: CASSANDRA-19613
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19613
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Cluster/Gossip, Transactional Cluster Metadata
>            Reporter: Sam Tunnicliffe
>            Assignee: Sam Tunnicliffe
>            Priority: Normal
>
> We should add {{ClusterMetadata.instance().metadataIdentifier}} to
> {{GossipDigestSyn}} messages and compare with the local one, rejecting
> anything that has the wrong identifier, like we do with cluster name.

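The check itself is small; a sketch of the rejection logic follows, with field and method names that are assumptions based on the ticket text rather than the committed patch:

{code:java}
// Sketch of the GossipDigestSyn guard described above: reject a SYN whose
// metadata identifier differs from ours, exactly as cluster-name mismatches
// are already rejected. Names and values are illustrative only.
public class SynGuard
{
    record GossipDigestSyn(String clusterName, String metadataId) {}

    static final String LOCAL_CLUSTER = "Test Cluster";
    static final String LOCAL_METADATA_ID = "cmid-1234"; // ClusterMetadata.metadataIdentifier

    static boolean accept(GossipDigestSyn syn)
    {
        if (!LOCAL_CLUSTER.equals(syn.clusterName()))
            return false; // pre-existing cluster name check
        if (!LOCAL_METADATA_ID.equals(syn.metadataId()))
            return false; // new: refuse gossip from a cluster with foreign metadata
        return true;
    }

    public static void main(String[] args)
    {
        System.out.println(accept(new GossipDigestSyn("Test Cluster", "cmid-1234"))); // true
        System.out.println(accept(new GossipDigestSyn("Test Cluster", "cmid-9999"))); // false
    }
}
{code}
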
[jira] [Created] (CASSANDRA-19613) Add ClusterMetadata.metadataIdentifier to GossipDigestSyn messages
Sam Tunnicliffe created CASSANDRA-19613:
-------------------------------------------

             Summary: Add ClusterMetadata.metadataIdentifier to GossipDigestSyn messages
                 Key: CASSANDRA-19613
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19613
             Project: Cassandra
          Issue Type: Bug
          Components: Cluster/Gossip, Transactional Cluster Metadata
            Reporter: Sam Tunnicliffe
            Assignee: Sam Tunnicliffe

We should add {{ClusterMetadata.instance().metadataIdentifier}} to
{{GossipDigestSyn}} messages and compare with the local one, rejecting
anything that has the wrong identifier, like we do with cluster name.

[jira] [Updated] (CASSANDRA-19581) Add nodetool command to unregister LEFT nodes
[ https://issues.apache.org/jira/browse/CASSANDRA-19581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sam Tunnicliffe updated CASSANDRA-19581:
    Reviewers: Sam Tunnicliffe  (was: Alex Petrov, Sam Tunnicliffe)
       Status: Review In Progress  (was: Patch Available)

> Add nodetool command to unregister LEFT nodes
> ----------------------------------------------
>                 Key: CASSANDRA-19581
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19581
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Transactional Cluster Metadata
>            Reporter: Marcus Eriksson
>            Assignee: Marcus Eriksson
>            Priority: Normal
>         Attachments: ci_summary.html
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> When decommissioning a node it still remains in ClusterMetadata with state
> = LEFT. We should provide a nodetool command to unregister such nodes
> completely.

[jira] [Updated] (CASSANDRA-19581) Add nodetool command to unregister LEFT nodes
[ https://issues.apache.org/jira/browse/CASSANDRA-19581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sam Tunnicliffe updated CASSANDRA-19581:
    Status: Changes Suggested  (was: Review In Progress)

Left a couple of comments on the PR

> Add nodetool command to unregister LEFT nodes
> ----------------------------------------------
>                 Key: CASSANDRA-19581
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19581
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Transactional Cluster Metadata
>            Reporter: Marcus Eriksson
>            Assignee: Marcus Eriksson
>            Priority: Normal
>         Attachments: ci_summary.html
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> When decommissioning a node it still remains in ClusterMetadata with state
> = LEFT. We should provide a nodetool command to unregister such nodes
> completely.

[jira] [Updated] (CASSANDRA-19581) Add nodetool command to unregister LEFT nodes
[ https://issues.apache.org/jira/browse/CASSANDRA-19581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sam Tunnicliffe updated CASSANDRA-19581:
    Test and Documentation Plan: New and existing tests.
                         Status: Patch Available  (was: In Progress)

> Add nodetool command to unregister LEFT nodes
> ----------------------------------------------
>                 Key: CASSANDRA-19581
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19581
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Transactional Cluster Metadata
>            Reporter: Marcus Eriksson
>            Assignee: Marcus Eriksson
>            Priority: Normal
>         Attachments: ci_summary.html
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> When decommissioning a node it still remains in ClusterMetadata with state
> = LEFT. We should provide a nodetool command to unregister such nodes
> completely.

[jira] [Updated] (CASSANDRA-19592) Expand CREATE TABLE CQL on a coordinating node before submitting to CMS
[ https://issues.apache.org/jira/browse/CASSANDRA-19592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sam Tunnicliffe updated CASSANDRA-19592:
    Status: Changes Suggested  (was: Review In Progress)

I think the approach is sound but we need to extend it to
{{CreateKeyspaceStatement}} and {{CreateIndexStatement}}.

The former may use {{default_keyspace_rf}}, which can be set differently per
node, so we need to be explicit and encode whatever value the coordinator
resolves here. Likewise, {{default_secondary_index}} injects the class of
the index implementation (or an alias for one) if none is specified (and
{{default_secondary_index_enabled: true}}, which is the default). This may
also have different values per node, which causes different indexes to be
created locally.

I didn't find any other per-node settings that we need to mitigate in this
way, but I'll try to do another pass.

> Expand CREATE TABLE CQL on a coordinating node before submitting to CMS
> ------------------------------------------------------------------------
>                 Key: CASSANDRA-19592
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19592
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Cluster/Schema
>            Reporter: Alex Petrov
>            Assignee: Alex Petrov
>            Priority: Normal
>         Attachments: ci_summary.html
>
> This is done to unblock CASSANDRA-12937 and allow preserving defaults with
> which the table was created between node bounces and between nodes with
> different configurations. For now, we are preserving 5.0 behaviour.

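Concretely, the concern is statements whose final form depends on coordinator-local yaml values. A sketch of the resolution step, with defaults and generated CQL that are illustrative assumptions rather than Cassandra's actual output:

{code:java}
// Sketch of the review comment above: before submitting DDL to the CMS,
// the coordinator pins down values that cassandra.yaml may set differently
// on each node. Both defaults shown here are invented example values.
public class CoordinatorDefaults
{
    static final int DEFAULT_KEYSPACE_RF = 3;   // stand-in for default_keyspace_rf
    static final String DEFAULT_INDEX = "sai";  // stand-in for default_secondary_index

    static String expandCreateKeyspace(String ks, Integer rf)
    {
        int resolved = (rf != null) ? rf : DEFAULT_KEYSPACE_RF;
        return "CREATE KEYSPACE " + ks
             + " WITH replication = {'class': 'SimpleStrategy',"
             + " 'replication_factor': " + resolved + "}";
    }

    static String expandCreateIndex(String idx, String table, String col, String impl)
    {
        String resolved = (impl != null) ? impl : DEFAULT_INDEX;
        return "CREATE CUSTOM INDEX " + idx + " ON " + table + " (" + col + ")"
             + " USING '" + resolved + "'";
    }

    public static void main(String[] args)
    {
        // The user omitted both values; the coordinator's resolution is
        // what every node will see via the CMS.
        System.out.println(expandCreateKeyspace("ks", null));
        System.out.println(expandCreateIndex("idx", "ks.t", "v", null));
    }
}
{code}
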
[jira] [Updated] (CASSANDRA-19592) Expand CREATE TABLE CQL on a coordinating node before submitting to CMS
[ https://issues.apache.org/jira/browse/CASSANDRA-19592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sam Tunnicliffe updated CASSANDRA-19592:
    Reviewers: Sam Tunnicliffe  (was: Sam Tunnicliffe)
       Status: Review In Progress  (was: Patch Available)

> Expand CREATE TABLE CQL on a coordinating node before submitting to CMS
> ------------------------------------------------------------------------
>                 Key: CASSANDRA-19592
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19592
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Cluster/Schema
>            Reporter: Alex Petrov
>            Assignee: Alex Petrov
>            Priority: Normal
>         Attachments: ci_summary.html
>
> This is done to unblock CASSANDRA-12937 and allow preserving defaults with
> which the table was created between node bounces and between nodes with
> different configurations. For now, we are preserving 5.0 behaviour.

[jira] [Comment Edited] (CASSANDRA-19599) Remove unused config params for out of range token requests
[ https://issues.apache.org/jira/browse/CASSANDRA-19599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17842288#comment-17842288 ]

Sam Tunnicliffe edited comment on CASSANDRA-19599 at 4/30/24 8:09 AM:
----------------------------------------------------------------------

PR for trivial change, CI pending.
https://github.com/apache/cassandra/pull/3276

was (Author: beobal):
PR for trivial change, CI pending.

> Remove unused config params for out of range token requests
> ------------------------------------------------------------
>                 Key: CASSANDRA-19599
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19599
>             Project: Cassandra
>          Issue Type: Task
>          Components: Local/Config
>            Reporter: Sam Tunnicliffe
>            Assignee: Sam Tunnicliffe
>            Priority: Normal
>             Fix For: 5.x
>
> The fields {{log_out_of_token_range_requests}} and
> {{reject_out_of_token_range_requests}} in {{Config.java}} have never
> actually been used and are just vestiges from early development on CEP-21.
> We should remove them and the related accessors in {{DatabaseDescriptor}}.

[jira] [Updated] (CASSANDRA-19599) Remove unused config params for out of range token requests
[ https://issues.apache.org/jira/browse/CASSANDRA-19599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sam Tunnicliffe updated CASSANDRA-19599:
    Test and Documentation Plan: Unused code removal, existing CI
                         Status: Patch Available  (was: Open)

PR for trivial change, CI pending.

> Remove unused config params for out of range token requests
> ------------------------------------------------------------
>                 Key: CASSANDRA-19599
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19599
>             Project: Cassandra
>          Issue Type: Task
>          Components: Local/Config
>            Reporter: Sam Tunnicliffe
>            Assignee: Sam Tunnicliffe
>            Priority: Normal
>             Fix For: 5.x
>
> The fields {{log_out_of_token_range_requests}} and
> {{reject_out_of_token_range_requests}} in {{Config.java}} have never
> actually been used and are just vestiges from early development on CEP-21.
> We should remove them and the related accessors in {{DatabaseDescriptor}}.

[jira] [Updated] (CASSANDRA-19599) Remove unused config params for out of range token requests
[ https://issues.apache.org/jira/browse/CASSANDRA-19599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sam Tunnicliffe updated CASSANDRA-19599:
    Change Category: Code Clarity
         Complexity: Low Hanging Fruit
      Fix Version/s: 5.x
             Status: Open  (was: Triage Needed)

> Remove unused config params for out of range token requests
> ------------------------------------------------------------
>                 Key: CASSANDRA-19599
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19599
>             Project: Cassandra
>          Issue Type: Task
>          Components: Local/Config
>            Reporter: Sam Tunnicliffe
>            Assignee: Sam Tunnicliffe
>            Priority: Normal
>             Fix For: 5.x
>
> The fields {{log_out_of_token_range_requests}} and
> {{reject_out_of_token_range_requests}} in {{Config.java}} have never
> actually been used and are just vestiges from early development on CEP-21.
> We should remove them and the related accessors in {{DatabaseDescriptor}}.

[jira] [Updated] (CASSANDRA-19221) CMS: Nodes can restart with new ipaddress already defined in the cluster
[ https://issues.apache.org/jira/browse/CASSANDRA-19221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sam Tunnicliffe updated CASSANDRA-19221:
    Status: Ready to Commit  (was: Review In Progress)

+1. Agreed about the splitting of {{MetadataChangeSimulationTest}};
CASSANDRA-19344 added another dimension to the tests, so it's not super
surprising if that sometimes pushes run time over the timeout.

> CMS: Nodes can restart with new ipaddress already defined in the cluster
> --------------------------------------------------------------------------
>                 Key: CASSANDRA-19221
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19221
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Transactional Cluster Metadata
>            Reporter: Paul Chandler
>            Assignee: Alex Petrov
>            Priority: Normal
>             Fix For: 5.1-alpha1
>         Attachments: ci_summary-1.html, ci_summary.html
>
> I am simulating running a cluster in Kubernetes and testing what happens when
> several pods go down and ip addresses are swapped between nodes. In 4.0 this
> is blocked and the node cannot be restarted.
> To simulate this I create a 3 node cluster on a local machine using 3
> loopback addresses
> {code}
> 127.0.0.1
> 127.0.0.2
> 127.0.0.3
> {code}
> The nodes are created correctly and the first node is assigned as a CMS node
> as shown:
> {code}
> bin/nodetool -p 7199 describecms
> {code}
> Cluster Metadata Service:
> {code}
> Members: /127.0.0.1:7000
> Is Member: true
> Service State: LOCAL
> {code}
> At this point I bring down the nodes 127.0.0.2 and 127.0.0.3 and swap the ip
> addresses for the rpc_address and listen_address
>
> The nodes come back as normal, but the nodeid has now been swapped against
> the ip address:
> Before:
> {code}
> Datacenter: datacenter1
> =======================
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address    Load       Tokens  Owns (effective)  Host ID                       Rack
> UN  127.0.0.3  75.2 KiB   16      76.0%             6d194555-f6eb-41d0-c000-0003  rack1
> UN  127.0.0.2  86.77 KiB  16      59.3%             6d194555-f6eb-41d0-c000-0002  rack1
> UN  127.0.0.1  80.88 KiB  16      64.7%             6d194555-f6eb-41d0-c000-0001  rack1
> {code}
> After:
> {code}
> Datacenter: datacenter1
> =======================
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address    Load        Tokens  Owns (effective)  Host ID                       Rack
> UN  127.0.0.3  149.62 KiB  16      76.0%             6d194555-f6eb-41d0-c000-0003  rack1
> UN  127.0.0.2  155.48 KiB  16      59.3%             6d194555-f6eb-41d0-c000-0002  rack1
> UN  127.0.0.1  75.74 KiB   16      64.7%             6d194555-f6eb-41d0-c000-0001  rack1
> {code}
> On previous tests of this I have created a table with a replication factor
> of 1 and inserted some data before the swap. After the swap the data on
> nodes 2 and 3 is now missing.
> One theory I have is that I am using different port numbers for the
> different nodes, and I am only swapping the ip addresses and not the port
> numbers, so the ip:port still looks unique
> i.e. 127.0.0.2:9043 becomes 127.0.0.2:9044
> and 127.0.0.3:9044 becomes 127.0.0.3:9043

[jira] [Updated] (CASSANDRA-19587) Remove leftover period column from system.metadata_snapshots
[ https://issues.apache.org/jira/browse/CASSANDRA-19587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sam Tunnicliffe updated CASSANDRA-19587: Status: Review In Progress (was: Patch Available) > Remove leftover period column from system.metadata_snapshots > > > Key: CASSANDRA-19587 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19587 > Project: Cassandra > Issue Type: Improvement > Components: Transactional Cluster Metadata >Reporter: Marcus Eriksson >Assignee: Marcus Eriksson >Priority: Normal > Attachments: ci_summary.html > > Time Spent: 10m > Remaining Estimate: 0h > > Seems we left a period column in metadata_snapshots in > CASSANDRA-19189/CASSANDRA-19482 - it should be removed -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-19587) Remove leftover period column from system.metadata_snapshots
[ https://issues.apache.org/jira/browse/CASSANDRA-19587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sam Tunnicliffe updated CASSANDRA-19587: Status: Ready to Commit (was: Review In Progress) +1 > Remove leftover period column from system.metadata_snapshots > > > Key: CASSANDRA-19587 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19587 > Project: Cassandra > Issue Type: Improvement > Components: Transactional Cluster Metadata >Reporter: Marcus Eriksson >Assignee: Marcus Eriksson >Priority: Normal > Attachments: ci_summary.html > > Time Spent: 10m > Remaining Estimate: 0h > > Seems we left a period column in metadata_snapshots in > CASSANDRA-19189/CASSANDRA-19482 - it should be removed -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-19191) Optimisations to PlacementForRange, improve lookup on r/w path
[ https://issues.apache.org/jira/browse/CASSANDRA-19191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sam Tunnicliffe updated CASSANDRA-19191: Status: Needs Committer (was: Patch Available) > Optimisations to PlacementForRange, improve lookup on r/w path > -- > > Key: CASSANDRA-19191 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19191 > Project: Cassandra > Issue Type: Improvement > Components: Transactional Cluster Metadata >Reporter: Marcus Eriksson >Assignee: Marcus Eriksson >Priority: Normal > Fix For: 5.1-alpha1 > > Attachments: ci_summary-1.html, ci_summary.html, result_details.tar.gz > > Time Spent: 10m > Remaining Estimate: 0h > > The lookup used when selecting the appropriate replica group for a range or > token while performing reads and writes is extremely simplistic and > inefficient. There is plenty of scope to improve {{PlacementsForRange}} by > replacing the current naive iteration with a more efficient lookup. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
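[editorial note] The ticket description is terse, so a sketch of the general shape of such an optimisation may help: replacing a linear scan over (range, replica group) entries with a floor lookup on a map sorted by range start. The types below are simplified stand-ins, not the actual {{PlacementForRange}} API:
{code:java}
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Simplified stand-in: token ring lookup in O(log n) via a sorted map,
// instead of iterating every range on each read and write.
final class TokenToGroupIndex<G>
{
    // range start token -> replica group for the range beginning there
    private final NavigableMap<Long, G> byStart = new TreeMap<>();

    void put(long rangeStart, G group)
    {
        byStart.put(rangeStart, group);
    }

    // Assumes a non-empty ring; a token below the smallest start belongs to
    // the wrap-around range that begins at the largest start.
    G forToken(long token)
    {
        Map.Entry<Long, G> e = byStart.floorEntry(token);
        return (e != null ? e : byStart.lastEntry()).getValue();
    }
}
{code}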
[jira] [Updated] (CASSANDRA-19191) Optimisations to PlacementForRange, improve lookup on r/w path
[ https://issues.apache.org/jira/browse/CASSANDRA-19191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sam Tunnicliffe updated CASSANDRA-19191: Status: Review In Progress (was: Needs Committer) > Optimisations to PlacementForRange, improve lookup on r/w path > -- > > Key: CASSANDRA-19191 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19191 > Project: Cassandra > Issue Type: Improvement > Components: Transactional Cluster Metadata >Reporter: Marcus Eriksson >Assignee: Marcus Eriksson >Priority: Normal > Fix For: 5.1-alpha1 > > Attachments: ci_summary-1.html, ci_summary.html, result_details.tar.gz > > Time Spent: 10m > Remaining Estimate: 0h > > The lookup used when selecting the appropriate replica group for a range or > token while performing reads and writes is extremely simplistic and > inefficient. There is plenty of scope to improve {{PlacementsForRange}} by > replacing the current naive iteration with a more efficient lookup. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-19191) Optimisations to PlacementForRange, improve lookup on r/w path
[ https://issues.apache.org/jira/browse/CASSANDRA-19191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sam Tunnicliffe updated CASSANDRA-19191: Status: Ready to Commit (was: Review In Progress) +1 > Optimisations to PlacementForRange, improve lookup on r/w path > -- > > Key: CASSANDRA-19191 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19191 > Project: Cassandra > Issue Type: Improvement > Components: Transactional Cluster Metadata >Reporter: Marcus Eriksson >Assignee: Marcus Eriksson >Priority: Normal > Fix For: 5.1-alpha1 > > Attachments: ci_summary-1.html, ci_summary.html, result_details.tar.gz > > Time Spent: 10m > Remaining Estimate: 0h > > The lookup used when selecting the appropriate replica group for a range or > token while performing reads and writes is extremely simplistic and > inefficient. There is plenty of scope to improve {{PlacementsForRange}} by > replacing the current naive iteration with a more efficient lookup. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-19132) Update use of transition plan in PrepareReplace
[ https://issues.apache.org/jira/browse/CASSANDRA-19132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sam Tunnicliffe updated CASSANDRA-19132: Attachment: ci_summary.html > Update use of transition plan in PrepareReplace > --- > > Key: CASSANDRA-19132 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19132 > Project: Cassandra > Issue Type: Task > Components: Cluster/Membership >Reporter: Marcus Eriksson >Assignee: Marcus Eriksson >Priority: Normal > Fix For: 5.1-alpha1 > > Attachments: ci_summary.html > > Time Spent: 10m > Remaining Estimate: 0h > > When PlacementTransitionPlan was reworked to make its use more consistent > across join and leave operations, PrepareReplace was not updated. This could > now be simplified in line with the other operations. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-19132) Update use of transition plan in PrepareReplace
[ https://issues.apache.org/jira/browse/CASSANDRA-19132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sam Tunnicliffe updated CASSANDRA-19132: Status: Ready to Commit (was: Review In Progress) +1 LGTM. I rebased and added a second commit with a slight tweak to {{PlacementTransitionPlan}}. CI looks reasonable: 2 previously known failures + 1 {{Port already in use}}, which I believe is an infra problem. > Update use of transition plan in PrepareReplace > --- > > Key: CASSANDRA-19132 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19132 > Project: Cassandra > Issue Type: Task > Components: Cluster/Membership >Reporter: Marcus Eriksson >Assignee: Marcus Eriksson >Priority: Normal > Fix For: 5.1-alpha1 > > Attachments: ci_summary.html > > Time Spent: 10m > Remaining Estimate: 0h > > When PlacementTransitionPlan was reworked to make its use more consistent > across join and leave operations, PrepareReplace was not updated. This could > now be simplified in line with the other operations. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-19344) Range movements involving transient replicas must safely enact changes to read and write replica sets
[ https://issues.apache.org/jira/browse/CASSANDRA-19344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sam Tunnicliffe updated CASSANDRA-19344: Fix Version/s: 5.1 (was: 5.x) Since Version: NA Source Control Link: https://github.com/apache/cassandra/commit/dabcb175527d3c2daef54c6ce029b3c3054b2a77 Resolution: Fixed Status: Resolved (was: Ready to Commit) Committed, thanks! > Range movements involving transient replicas must safely enact changes to > read and write replica sets > - > > Key: CASSANDRA-19344 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19344 > Project: Cassandra > Issue Type: Bug > Components: CI >Reporter: Ekaterina Dimitrova >Assignee: Sam Tunnicliffe >Priority: Normal > Fix For: 5.1 > > Attachments: ci_summary-1.html, ci_summary.html, > remove-n4-post-19344.txt, remove-n4-pre-19344.txt, result_details.tar.gz > > Time Spent: 1h 40m > Remaining Estimate: 0h > > (edit) This was originally opened due to a flaky test > {{org.apache.cassandra.distributed.test.TransientRangeMovementTest.testRemoveNode-_jdk17}} > The test can fail in two different ways: > {code:java} > junit.framework.AssertionFailedError: NOT IN CURRENT: 31 -- [(00,20), > (31,50)] at > org.apache.cassandra.distributed.test.TransientRangeMovementTest.assertAllContained(TransientRangeMovementTest.java:203) > at > org.apache.cassandra.distributed.test.TransientRangeMovementTest.testRemoveNode(TransientRangeMovementTest.java:183) > at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native > Method) at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77) > at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43){code} > as in here - > [https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra/2639/workflows/32b92ce7-5e9d-4efb-8362-d200d2414597/jobs/55139/tests#failed-test-0] > and > {code:java} > junit.framework.AssertionFailedError: nodetool command [removenode, > 6d194555-f6eb-41d0-c000-0003, --force] was not successful stdout: > stderr: error: Node /127.0.0.4:7012 is alive and owns this ID. Use > decommission command to remove it from the ring -- StackTrace -- > java.lang.UnsupportedOperationException: Node /127.0.0.4:7012 is alive and > owns this ID. 
Use decommission command to remove it from the ring at > org.apache.cassandra.tcm.sequences.SingleNodeSequences.removeNode(SingleNodeSequences.java:110) > at > org.apache.cassandra.service.StorageService.removeNode(StorageService.java:3682) > at org.apache.cassandra.tools.NodeProbe.removeNode(NodeProbe.java:1020) at > org.apache.cassandra.tools.nodetool.RemoveNode.execute(RemoveNode.java:51) at > org.apache.cassandra.tools.NodeTool$NodeToolCmd.runInternal(NodeTool.java:388) > at org.apache.cassandra.tools.NodeTool$NodeToolCmd.run(NodeTool.java:373) at > org.apache.cassandra.tools.NodeTool.execute(NodeTool.java:272) at > org.apache.cassandra.distributed.impl.Instance$DTestNodeTool.execute(Instance.java:1129) > at > org.apache.cassandra.distributed.impl.Instance.lambda$nodetoolResult$51(Instance.java:1038) > at org.apache.cassandra.concurrent.FutureTask.call(FutureTask.java:61) at > org.apache.cassandra.concurrent.FutureTask.run(FutureTask.java:71) at > java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) > at > java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) > at > io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) > at java.base/java.lang.Thread.run(Thread.java:833) Notifications: Error: > java.lang.UnsupportedOperationException: Node /127.0.0.4:7012 is alive and > owns this ID. Use decommission command to remove it from the ring at > org.apache.cassandra.tcm.sequences.SingleNodeSequences.removeNode(SingleNodeSequences.java:110) > at > org.apache.cassandra.service.StorageService.removeNode(StorageService.java:3682) > at org.apache.cassandra.tools.NodeProbe.removeNode(NodeProbe.java:1020) at > org.apache.cassandra.tools.nodetool.RemoveNode.execute(RemoveNode.java:51) at > org.apache.cassandra.tools.NodeTool$NodeToolCmd.runInternal(NodeTool.java:388) > at org.apache.cassandra.tools.NodeTool$NodeToolCmd.run(NodeTool.java:373) at > org.apache.cassandra.tools.NodeTool.execute(NodeTool.java:272) at > org.apache.cassandra.distributed.impl.Instance$DTestNodeTool.execute(Instance.java:1129) > at > org.apache.cassandra.distributed.impl.Instance.lambda$nodetoolResult$51(Instance.java:1038) > at
[jira] [Updated] (CASSANDRA-19344) Range movements involving transient replicas must safely enact changes to read and write replica sets
[ https://issues.apache.org/jira/browse/CASSANDRA-19344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sam Tunnicliffe updated CASSANDRA-19344: Status: Ready to Commit (was: Review In Progress) > Range movements involving transient replicas must safely enact changes to > read and write replica sets > - > > Key: CASSANDRA-19344 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19344 > Project: Cassandra > Issue Type: Bug > Components: CI >Reporter: Ekaterina Dimitrova >Assignee: Sam Tunnicliffe >Priority: Normal > Fix For: 5.x > > Attachments: ci_summary-1.html, ci_summary.html, > remove-n4-post-19344.txt, remove-n4-pre-19344.txt, result_details.tar.gz > > Time Spent: 1h 40m > Remaining Estimate: 0h > > (edit) This was originally opened due to a flaky test > {{org.apache.cassandra.distributed.test.TransientRangeMovementTest.testRemoveNode-_jdk17}} > The test can fail in two different ways: > {code:java} > junit.framework.AssertionFailedError: NOT IN CURRENT: 31 -- [(00,20), > (31,50)] at > org.apache.cassandra.distributed.test.TransientRangeMovementTest.assertAllContained(TransientRangeMovementTest.java:203) > at > org.apache.cassandra.distributed.test.TransientRangeMovementTest.testRemoveNode(TransientRangeMovementTest.java:183) > at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native > Method) at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77) > at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43){code} > as in here - > [https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra/2639/workflows/32b92ce7-5e9d-4efb-8362-d200d2414597/jobs/55139/tests#failed-test-0] > and > {code:java} > junit.framework.AssertionFailedError: nodetool command [removenode, > 6d194555-f6eb-41d0-c000-0003, --force] was not successful stdout: > stderr: error: Node /127.0.0.4:7012 is alive and owns this ID. Use > decommission command to remove it from the ring -- StackTrace -- > java.lang.UnsupportedOperationException: Node /127.0.0.4:7012 is alive and > owns this ID. 
Use decommission command to remove it from the ring at > org.apache.cassandra.tcm.sequences.SingleNodeSequences.removeNode(SingleNodeSequences.java:110) > at > org.apache.cassandra.service.StorageService.removeNode(StorageService.java:3682) > at org.apache.cassandra.tools.NodeProbe.removeNode(NodeProbe.java:1020) at > org.apache.cassandra.tools.nodetool.RemoveNode.execute(RemoveNode.java:51) at > org.apache.cassandra.tools.NodeTool$NodeToolCmd.runInternal(NodeTool.java:388) > at org.apache.cassandra.tools.NodeTool$NodeToolCmd.run(NodeTool.java:373) at > org.apache.cassandra.tools.NodeTool.execute(NodeTool.java:272) at > org.apache.cassandra.distributed.impl.Instance$DTestNodeTool.execute(Instance.java:1129) > at > org.apache.cassandra.distributed.impl.Instance.lambda$nodetoolResult$51(Instance.java:1038) > at org.apache.cassandra.concurrent.FutureTask.call(FutureTask.java:61) at > org.apache.cassandra.concurrent.FutureTask.run(FutureTask.java:71) at > java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) > at > java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) > at > io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) > at java.base/java.lang.Thread.run(Thread.java:833) Notifications: Error: > java.lang.UnsupportedOperationException: Node /127.0.0.4:7012 is alive and > owns this ID. Use decommission command to remove it from the ring at > org.apache.cassandra.tcm.sequences.SingleNodeSequences.removeNode(SingleNodeSequences.java:110) > at > org.apache.cassandra.service.StorageService.removeNode(StorageService.java:3682) > at org.apache.cassandra.tools.NodeProbe.removeNode(NodeProbe.java:1020) at > org.apache.cassandra.tools.nodetool.RemoveNode.execute(RemoveNode.java:51) at > org.apache.cassandra.tools.NodeTool$NodeToolCmd.runInternal(NodeTool.java:388) > at org.apache.cassandra.tools.NodeTool$NodeToolCmd.run(NodeTool.java:373) at > org.apache.cassandra.tools.NodeTool.execute(NodeTool.java:272) at > org.apache.cassandra.distributed.impl.Instance$DTestNodeTool.execute(Instance.java:1129) > at > org.apache.cassandra.distributed.impl.Instance.lambda$nodetoolResult$51(Instance.java:1038) > at org.apache.cassandra.concurrent.FutureTask.call(FutureTask.java:61) at > org.apache.cassandra.concurrent.FutureTask.run(FutureTask.java:71) at > java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) > at >
[jira] [Updated] (CASSANDRA-19132) Update use of transition plan in PrepareReplace
[ https://issues.apache.org/jira/browse/CASSANDRA-19132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sam Tunnicliffe updated CASSANDRA-19132: Status: Review In Progress (was: Patch Available) > Update use of transition plan in PrepareReplace > --- > > Key: CASSANDRA-19132 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19132 > Project: Cassandra > Issue Type: Task > Components: Cluster/Membership >Reporter: Marcus Eriksson >Assignee: Marcus Eriksson >Priority: Normal > Fix For: 5.1-alpha1 > > Time Spent: 10m > Remaining Estimate: 0h > > When PlacementTransitionPlan was reworked to make its use more consistent > across join and leave operations, PrepareReplace was not updated. This could > now be simplified in line with the other operations. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-19221) CMS: Nodes can restart with new ipaddress already defined in the cluster
[ https://issues.apache.org/jira/browse/CASSANDRA-19221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sam Tunnicliffe updated CASSANDRA-19221: Reviewers: Sam Tunnicliffe (was: Sam Tunnicliffe) Status: Review In Progress (was: Patch Available) > CMS: Nodes can restart with new ipaddress already defined in the cluster > > > Key: CASSANDRA-19221 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19221 > Project: Cassandra > Issue Type: Bug > Components: Transactional Cluster Metadata >Reporter: Paul Chandler >Assignee: Alex Petrov >Priority: Normal > Fix For: 5.1-alpha1 > > Attachments: ci_summary.html > > > I am simulating running a cluster in Kubernetes and testing what happens when > several pods go down and ip addresses are swapped between nodes. In 4.0 this > is blocked and the node cannot be restarted. > To simulate this I create a 3 node cluster on a local machine using 3 > loopback addresses > {code} > 127.0.0.1 > 127.0.0.2 > 127.0.0.3 > {code} > The nodes are created correctly and the first node is assigned as a CMS node > as shown: > {code} > bin/nodetool -p 7199 describecms > {code} > Cluster Metadata Service: > {code} > Members: /127.0.0.1:7000 > Is Member: true > Service State: LOCAL > {code} > At this point I bring down the nodes 127.0.0.2 and 127.0.0.3 and swap the ip > addresses for the rpc_address and listen_address > > The nodes come back as normal, but the nodeid has now been swapped against > the ip address: > Before: > {code} > Datacenter: datacenter1 > === > Status=Up/Down > |/ State=Normal/Leaving/Joining/Moving > -- Address Load Tokens Owns (effective) Host ID > Rack > UN 127.0.0.3 75.2 KiB 16 76.0% > 6d194555-f6eb-41d0-c000-0003 rack1 > UN 127.0.0.2 86.77 KiB 16 59.3% > 6d194555-f6eb-41d0-c000-0002 rack1 > UN 127.0.0.1 80.88 KiB 16 64.7% > 6d194555-f6eb-41d0-c000-0001 rack1 > {code} > After: > {code} > Datacenter: datacenter1 > === > Status=Up/Down > |/ State=Normal/Leaving/Joining/Moving > -- Address Load Tokens Owns (effective) Host ID > Rack > UN 127.0.0.3 149.62 KiB 16 76.0% > 6d194555-f6eb-41d0-c000-0003 rack1 > UN 127.0.0.2 155.48 KiB 16 59.3% > 6d194555-f6eb-41d0-c000-0002 rack1 > UN 127.0.0.1 75.74 KiB 16 64.7% > 6d194555-f6eb-41d0-c000-0001 rack1 > {code} > On previous tests of this I have created a table with a replication factor of > 1, inserted some data before the swap. After the swap the data on nodes 2 > and 3 is now missing. > One theory I have is that I am using different port numbers for the different > nodes, and I am only swapping the ip addresses and not the port numbers, so > the ip:port still looks unique > i.e. 127.0.0.2:9043 becomes 127.0.0.2:9044 > and 127.0.0.3:9044 becomes 127.0.0.3:9043 > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-19221) CMS: Nodes can restart with new ipaddress already defined in the cluster
[ https://issues.apache.org/jira/browse/CASSANDRA-19221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17838761#comment-17838761 ] Sam Tunnicliffe commented on CASSANDRA-19221: - +1. I left a couple of minor suggestions on the PR, feel free to accept or ignore them. > CMS: Nodes can restart with new ipaddress already defined in the cluster > > > Key: CASSANDRA-19221 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19221 > Project: Cassandra > Issue Type: Bug > Components: Transactional Cluster Metadata >Reporter: Paul Chandler >Assignee: Alex Petrov >Priority: Normal > Fix For: 5.1-alpha1 > > Attachments: ci_summary.html > > > I am simulating running a cluster in Kubernetes and testing what happens when > several pods go down and ip addresses are swapped between nodes. In 4.0 this > is blocked and the node cannot be restarted. > To simulate this I create a 3 node cluster on a local machine using 3 > loopback addresses > {code} > 127.0.0.1 > 127.0.0.2 > 127.0.0.3 > {code} > The nodes are created correctly and the first node is assigned as a CMS node > as shown: > {code} > bin/nodetool -p 7199 describecms > {code} > Cluster Metadata Service: > {code} > Members: /127.0.0.1:7000 > Is Member: true > Service State: LOCAL > {code} > At this point I bring down the nodes 127.0.0.2 and 127.0.0.3 and swap the ip > addresses for the rpc_address and listen_address > > The nodes come back as normal, but the nodeid has now been swapped against > the ip address: > Before: > {code} > Datacenter: datacenter1 > === > Status=Up/Down > |/ State=Normal/Leaving/Joining/Moving > -- Address Load Tokens Owns (effective) Host ID > Rack > UN 127.0.0.3 75.2 KiB 16 76.0% > 6d194555-f6eb-41d0-c000-0003 rack1 > UN 127.0.0.2 86.77 KiB 16 59.3% > 6d194555-f6eb-41d0-c000-0002 rack1 > UN 127.0.0.1 80.88 KiB 16 64.7% > 6d194555-f6eb-41d0-c000-0001 rack1 > {code} > After: > {code} > Datacenter: datacenter1 > === > Status=Up/Down > |/ State=Normal/Leaving/Joining/Moving > -- Address Load Tokens Owns (effective) Host ID > Rack > UN 127.0.0.3 149.62 KiB 16 76.0% > 6d194555-f6eb-41d0-c000-0003 rack1 > UN 127.0.0.2 155.48 KiB 16 59.3% > 6d194555-f6eb-41d0-c000-0002 rack1 > UN 127.0.0.1 75.74 KiB 16 64.7% > 6d194555-f6eb-41d0-c000-0001 rack1 > {code} > On previous tests of this I have created a table with a replication factor of > 1, inserted some data before the swap. After the swap the data on nodes 2 > and 3 is now missing. > One theory I have is that I am using different port numbers for the different > nodes, and I am only swapping the ip addresses and not the port numbers, so > the ip:port still looks unique > i.e. 127.0.0.2:9043 becomes 127.0.0.2:9044 > and 127.0.0.3:9044 becomes 127.0.0.3:9043 > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-19344) Range movements involving transient replicas must safely enact changes to read and write replica sets
[ https://issues.apache.org/jira/browse/CASSANDRA-19344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17838739#comment-17838739 ] Sam Tunnicliffe commented on CASSANDRA-19344: - Rebased and attached an updated {{ci_summary-1.html}} > Range movements involving transient replicas must safely enact changes to > read and write replica sets > - > > Key: CASSANDRA-19344 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19344 > Project: Cassandra > Issue Type: Bug > Components: CI >Reporter: Ekaterina Dimitrova >Assignee: Sam Tunnicliffe >Priority: Normal > Fix For: 5.x > > Attachments: ci_summary-1.html, ci_summary.html, > remove-n4-post-19344.txt, remove-n4-pre-19344.txt, result_details.tar.gz > > Time Spent: 1h 40m > Remaining Estimate: 0h > > (edit) This was originally opened due to a flaky test > {{org.apache.cassandra.distributed.test.TransientRangeMovementTest.testRemoveNode-_jdk17}} > The test can fail in two different ways: > {code:java} > junit.framework.AssertionFailedError: NOT IN CURRENT: 31 -- [(00,20), > (31,50)] at > org.apache.cassandra.distributed.test.TransientRangeMovementTest.assertAllContained(TransientRangeMovementTest.java:203) > at > org.apache.cassandra.distributed.test.TransientRangeMovementTest.testRemoveNode(TransientRangeMovementTest.java:183) > at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native > Method) at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77) > at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43){code} > as in here - > [https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra/2639/workflows/32b92ce7-5e9d-4efb-8362-d200d2414597/jobs/55139/tests#failed-test-0] > and > {code:java} > junit.framework.AssertionFailedError: nodetool command [removenode, > 6d194555-f6eb-41d0-c000-0003, --force] was not successful stdout: > stderr: error: Node /127.0.0.4:7012 is alive and owns this ID. Use > decommission command to remove it from the ring -- StackTrace -- > java.lang.UnsupportedOperationException: Node /127.0.0.4:7012 is alive and > owns this ID. 
Use decommission command to remove it from the ring at > org.apache.cassandra.tcm.sequences.SingleNodeSequences.removeNode(SingleNodeSequences.java:110) > at > org.apache.cassandra.service.StorageService.removeNode(StorageService.java:3682) > at org.apache.cassandra.tools.NodeProbe.removeNode(NodeProbe.java:1020) at > org.apache.cassandra.tools.nodetool.RemoveNode.execute(RemoveNode.java:51) at > org.apache.cassandra.tools.NodeTool$NodeToolCmd.runInternal(NodeTool.java:388) > at org.apache.cassandra.tools.NodeTool$NodeToolCmd.run(NodeTool.java:373) at > org.apache.cassandra.tools.NodeTool.execute(NodeTool.java:272) at > org.apache.cassandra.distributed.impl.Instance$DTestNodeTool.execute(Instance.java:1129) > at > org.apache.cassandra.distributed.impl.Instance.lambda$nodetoolResult$51(Instance.java:1038) > at org.apache.cassandra.concurrent.FutureTask.call(FutureTask.java:61) at > org.apache.cassandra.concurrent.FutureTask.run(FutureTask.java:71) at > java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) > at > java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) > at > io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) > at java.base/java.lang.Thread.run(Thread.java:833) Notifications: Error: > java.lang.UnsupportedOperationException: Node /127.0.0.4:7012 is alive and > owns this ID. Use decommission command to remove it from the ring at > org.apache.cassandra.tcm.sequences.SingleNodeSequences.removeNode(SingleNodeSequences.java:110) > at > org.apache.cassandra.service.StorageService.removeNode(StorageService.java:3682) > at org.apache.cassandra.tools.NodeProbe.removeNode(NodeProbe.java:1020) at > org.apache.cassandra.tools.nodetool.RemoveNode.execute(RemoveNode.java:51) at > org.apache.cassandra.tools.NodeTool$NodeToolCmd.runInternal(NodeTool.java:388) > at org.apache.cassandra.tools.NodeTool$NodeToolCmd.run(NodeTool.java:373) at > org.apache.cassandra.tools.NodeTool.execute(NodeTool.java:272) at > org.apache.cassandra.distributed.impl.Instance$DTestNodeTool.execute(Instance.java:1129) > at > org.apache.cassandra.distributed.impl.Instance.lambda$nodetoolResult$51(Instance.java:1038) > at org.apache.cassandra.concurrent.FutureTask.call(FutureTask.java:61) at > org.apache.cassandra.concurrent.FutureTask.run(FutureTask.java:71) at > java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
[jira] [Updated] (CASSANDRA-19344) Range movements involving transient replicas must safely enact changes to read and write replica sets
[ https://issues.apache.org/jira/browse/CASSANDRA-19344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sam Tunnicliffe updated CASSANDRA-19344: Attachment: ci_summary-1.html > Range movements involving transient replicas must safely enact changes to > read and write replica sets > - > > Key: CASSANDRA-19344 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19344 > Project: Cassandra > Issue Type: Bug > Components: CI >Reporter: Ekaterina Dimitrova >Assignee: Sam Tunnicliffe >Priority: Normal > Fix For: 5.x > > Attachments: ci_summary-1.html, ci_summary.html, > remove-n4-post-19344.txt, remove-n4-pre-19344.txt, result_details.tar.gz > > Time Spent: 1h 40m > Remaining Estimate: 0h > > (edit) This was originally opened due to a flaky test > {{org.apache.cassandra.distributed.test.TransientRangeMovementTest.testRemoveNode-_jdk17}} > The test can fail in two different ways: > {code:java} > junit.framework.AssertionFailedError: NOT IN CURRENT: 31 -- [(00,20), > (31,50)] at > org.apache.cassandra.distributed.test.TransientRangeMovementTest.assertAllContained(TransientRangeMovementTest.java:203) > at > org.apache.cassandra.distributed.test.TransientRangeMovementTest.testRemoveNode(TransientRangeMovementTest.java:183) > at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native > Method) at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77) > at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43){code} > as in here - > [https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra/2639/workflows/32b92ce7-5e9d-4efb-8362-d200d2414597/jobs/55139/tests#failed-test-0] > and > {code:java} > junit.framework.AssertionFailedError: nodetool command [removenode, > 6d194555-f6eb-41d0-c000-0003, --force] was not successful stdout: > stderr: error: Node /127.0.0.4:7012 is alive and owns this ID. Use > decommission command to remove it from the ring -- StackTrace -- > java.lang.UnsupportedOperationException: Node /127.0.0.4:7012 is alive and > owns this ID. Use decommission command to remove it from the ring at > org.apache.cassandra.tcm.sequences.SingleNodeSequences.removeNode(SingleNodeSequences.java:110) > at > org.apache.cassandra.service.StorageService.removeNode(StorageService.java:3682) > at org.apache.cassandra.tools.NodeProbe.removeNode(NodeProbe.java:1020) at > org.apache.cassandra.tools.nodetool.RemoveNode.execute(RemoveNode.java:51) at > org.apache.cassandra.tools.NodeTool$NodeToolCmd.runInternal(NodeTool.java:388) > at org.apache.cassandra.tools.NodeTool$NodeToolCmd.run(NodeTool.java:373) at > org.apache.cassandra.tools.NodeTool.execute(NodeTool.java:272) at > org.apache.cassandra.distributed.impl.Instance$DTestNodeTool.execute(Instance.java:1129) > at > org.apache.cassandra.distributed.impl.Instance.lambda$nodetoolResult$51(Instance.java:1038) > at org.apache.cassandra.concurrent.FutureTask.call(FutureTask.java:61) at > org.apache.cassandra.concurrent.FutureTask.run(FutureTask.java:71) at > java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) > at > java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) > at > io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) > at java.base/java.lang.Thread.run(Thread.java:833) Notifications: Error: > java.lang.UnsupportedOperationException: Node /127.0.0.4:7012 is alive and > owns this ID. 
Use decommission command to remove it from the ring at > org.apache.cassandra.tcm.sequences.SingleNodeSequences.removeNode(SingleNodeSequences.java:110) > at > org.apache.cassandra.service.StorageService.removeNode(StorageService.java:3682) > at org.apache.cassandra.tools.NodeProbe.removeNode(NodeProbe.java:1020) at > org.apache.cassandra.tools.nodetool.RemoveNode.execute(RemoveNode.java:51) at > org.apache.cassandra.tools.NodeTool$NodeToolCmd.runInternal(NodeTool.java:388) > at org.apache.cassandra.tools.NodeTool$NodeToolCmd.run(NodeTool.java:373) at > org.apache.cassandra.tools.NodeTool.execute(NodeTool.java:272) at > org.apache.cassandra.distributed.impl.Instance$DTestNodeTool.execute(Instance.java:1129) > at > org.apache.cassandra.distributed.impl.Instance.lambda$nodetoolResult$51(Instance.java:1038) > at org.apache.cassandra.concurrent.FutureTask.call(FutureTask.java:61) at > org.apache.cassandra.concurrent.FutureTask.run(FutureTask.java:71) at > java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) > at >
[jira] [Updated] (CASSANDRA-19514) When jvm-dtest is shutting down an instance TCM retries block the shutdown causing the test to fail
[ https://issues.apache.org/jira/browse/CASSANDRA-19514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sam Tunnicliffe updated CASSANDRA-19514: Source Control Link: https://github.com/apache/cassandra/commit/cbf4dcb3345c7e2f42f6a897c66b6460b7acc2ca (was: https://github.com/apache/cassandra/commit/a5b8c06bb925905719261b1f449fffb049f54d1b) Resolution: Fixed Status: Resolved (was: Ready to Commit) Committed, thanks! > When jvm-dtest is shutting down an instance TCM retries block the shutdown > causing the test to fail > --- > > Key: CASSANDRA-19514 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19514 > Project: Cassandra > Issue Type: Bug > Components: Cluster/Membership, Test/dtest/java >Reporter: David Capwell >Assignee: Sam Tunnicliffe >Priority: Normal > Fix For: 5.1 > > Attachments: ci_summary.html, result_details.tar.gz > > Time Spent: 10m > Remaining Estimate: 0h > > org.apache.cassandra.distributed.test.log.RequestCurrentEpochTest#testRequestingPeerWatermarks > {code} > java.lang.RuntimeException: java.util.concurrent.TimeoutException >org.apache.cassandra.utils.Throwables.maybeFail(Throwables.java:79) > > org.apache.cassandra.utils.FBUtilities.waitOnFutures(FBUtilities.java:540) > > org.apache.cassandra.distributed.impl.AbstractCluster.close(AbstractCluster.java:1098) > > org.apache.cassandra.distributed.test.log.RequestCurrentEpochTest.testRequestingPeerWatermarks(RequestCurrentEpochTest.java:77) >java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native > Method) > > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > Caused by: java.util.concurrent.TimeoutException > > org.apache.cassandra.utils.concurrent.AbstractFuture.get(AbstractFuture.java:253) > > org.apache.cassandra.utils.FBUtilities.waitOnFutures(FBUtilities.java:532) > Suppressed: java.util.concurrent.TimeoutException > {code} > In the debugger I found the blocked future and it was > src/java/org/apache/cassandra/tcm/EpochAwareDebounce.java waiting on > src/java/org/apache/cassandra/tcm/RemoteProcessor.java retries -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
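[editorial note] The shutdown hang described here follows a common shape: a debounced request keeps retrying while the test harness waits for the instance to stop. A hedged sketch of the mitigation pattern, tracking in-flight futures so shutdown can fail them promptly; this is a simplified stand-in, not the actual {{EpochAwareDebounce}} change:
{code:java}
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

// Keep handles on in-flight debounced requests so a shutdown can complete
// them exceptionally instead of letting retries block the stopping instance.
final class ShutdownAwareDebounce<T>
{
    private final Set<CompletableFuture<T>> inFlight = ConcurrentHashMap.newKeySet();

    CompletableFuture<T> track(CompletableFuture<T> future)
    {
        inFlight.add(future);
        future.whenComplete((result, error) -> inFlight.remove(future));
        return future;
    }

    // Called from the shutdown path: unblocks anything still waiting on retries.
    void shutdown()
    {
        for (CompletableFuture<T> f : inFlight)
            f.completeExceptionally(new IllegalStateException("Node is shutting down"));
    }
}
{code}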
[jira] [Updated] (CASSANDRA-19514) When jvm-dtest is shutting down an instance TCM retries block the shutdown causing the test to fail
[ https://issues.apache.org/jira/browse/CASSANDRA-19514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sam Tunnicliffe updated CASSANDRA-19514: Reviewers: Alex Petrov, Blake Eggleston, Marcus Eriksson, Sam Tunnicliffe (was: Alex Petrov, Blake Eggleston, Marcus Eriksson) Status: Review In Progress (was: Patch Available) > When jvm-dtest is shutting down an instance TCM retries block the shutdown > causing the test to fail > --- > > Key: CASSANDRA-19514 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19514 > Project: Cassandra > Issue Type: Bug > Components: Cluster/Membership, Test/dtest/java >Reporter: David Capwell >Assignee: Sam Tunnicliffe >Priority: Normal > Fix For: 5.1 > > Attachments: ci_summary.html, result_details.tar.gz > > Time Spent: 10m > Remaining Estimate: 0h > > org.apache.cassandra.distributed.test.log.RequestCurrentEpochTest#testRequestingPeerWatermarks > {code} > java.lang.RuntimeException: java.util.concurrent.TimeoutException >org.apache.cassandra.utils.Throwables.maybeFail(Throwables.java:79) > > org.apache.cassandra.utils.FBUtilities.waitOnFutures(FBUtilities.java:540) > > org.apache.cassandra.distributed.impl.AbstractCluster.close(AbstractCluster.java:1098) > > org.apache.cassandra.distributed.test.log.RequestCurrentEpochTest.testRequestingPeerWatermarks(RequestCurrentEpochTest.java:77) >java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native > Method) > > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > Caused by: java.util.concurrent.TimeoutException > > org.apache.cassandra.utils.concurrent.AbstractFuture.get(AbstractFuture.java:253) > > org.apache.cassandra.utils.FBUtilities.waitOnFutures(FBUtilities.java:532) > Suppressed: java.util.concurrent.TimeoutException > {code} > In the debugger I found the blocked future and it was > src/java/org/apache/cassandra/tcm/EpochAwareDebounce.java waiting on > src/java/org/apache/cassandra/tcm/RemoteProcessor.java retries -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-19514) When jvm-dtest is shutting down an instance TCM retries block the shutdown causing the test to fail
[ https://issues.apache.org/jira/browse/CASSANDRA-19514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sam Tunnicliffe updated CASSANDRA-19514: Status: Ready to Commit (was: Review In Progress) > When jvm-dtest is shutting down an instance TCM retries block the shutdown > causing the test to fail > --- > > Key: CASSANDRA-19514 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19514 > Project: Cassandra > Issue Type: Bug > Components: Cluster/Membership, Test/dtest/java >Reporter: David Capwell >Assignee: Sam Tunnicliffe >Priority: Normal > Fix For: 5.1 > > Attachments: ci_summary.html, result_details.tar.gz > > Time Spent: 10m > Remaining Estimate: 0h > > org.apache.cassandra.distributed.test.log.RequestCurrentEpochTest#testRequestingPeerWatermarks > {code} > java.lang.RuntimeException: java.util.concurrent.TimeoutException >org.apache.cassandra.utils.Throwables.maybeFail(Throwables.java:79) > > org.apache.cassandra.utils.FBUtilities.waitOnFutures(FBUtilities.java:540) > > org.apache.cassandra.distributed.impl.AbstractCluster.close(AbstractCluster.java:1098) > > org.apache.cassandra.distributed.test.log.RequestCurrentEpochTest.testRequestingPeerWatermarks(RequestCurrentEpochTest.java:77) >java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native > Method) > > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > Caused by: java.util.concurrent.TimeoutException > > org.apache.cassandra.utils.concurrent.AbstractFuture.get(AbstractFuture.java:253) > > org.apache.cassandra.utils.FBUtilities.waitOnFutures(FBUtilities.java:532) > Suppressed: java.util.concurrent.TimeoutException > {code} > In the debugger I found the blocked future and it was > src/java/org/apache/cassandra/tcm/EpochAwareDebounce.java waiting on > src/java/org/apache/cassandra/tcm/RemoteProcessor.java retries -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-19538) Test Failure: test_assassinate_valid_node
[ https://issues.apache.org/jira/browse/CASSANDRA-19538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sam Tunnicliffe updated CASSANDRA-19538: Fix Version/s: 5.1 (was: 5.x) Since Version: NA Source Control Link: https://github.com/apache/cassandra/commit/80971709b983566a3f2dbfc189dfa1c5367d69bb Resolution: Fixed Status: Resolved (was: Ready to Commit) Merged to trunk (with ninja followup because *someone* forgot to add {{CHANGES.txt}}) > Test Failure: test_assassinate_valid_node > - > > Key: CASSANDRA-19538 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19538 > Project: Cassandra > Issue Type: Bug > Components: CI, Test/dtest/python >Reporter: Ekaterina Dimitrova >Assignee: Marcus Eriksson >Priority: Normal > Fix For: 5.1 > > Attachments: ci_summary-1.html, ci_summary.html > > Time Spent: 10m > Remaining Estimate: 0h > > Failing consistently on trunk: > {code:java} > ccmlib.node.TimeoutError: 03 Apr 2024 19:39:32 [node1] after 120.11/120 > seconds Missing: ['127.0.0.4:7000.* is now UP'] not found in system.log: > Head: INFO [Messaging-EventLoop-3-1] 2024-04-03 19:37:3 > Tail: ... some nodes were not ready > INFO [OptionalTasks:1] 2024-04-03 19:39:30,454 CassandraRoleManager.java:484 > - Setup task failed with error, rescheduling > self = > def test_assassinate_valid_node(self): > """ > @jira_ticket CASSANDRA-16588 > Test that after taking two non-seed nodes down and assassinating > one of them, the other can come back up. > """ > cluster = self.cluster > > cluster.populate(5).start() > node1 = cluster.nodelist()[0] > node3 = cluster.nodelist()[2] > > self.cluster.set_configuration_options({ > 'seed_provider': [{'class_name': > 'org.apache.cassandra.locator.SimpleSeedProvider', >'parameters': [{'seeds': node1.address()}] > }] > }) > > non_seed_nodes = cluster.nodelist()[-2:] > for node in non_seed_nodes: > node.stop() > > assassination_target = non_seed_nodes[0] > logger.debug("Assassinating non-seed node > {}".format(assassination_target.address())) > out, err, _ = node1.nodetool("assassinate > {}".format(assassination_target.address())) > assert_stderr_clean(err) > > logger.debug("Starting non-seed nodes") > for node in non_seed_nodes: > > node.start() > gossip_test.py:78: > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > _ > ../env3.8/lib/python3.8/site-packages/ccmlib/node.py:915: in start > node.watch_log_for_alive(self, from_mark=mark) > ../env3.8/lib/python3.8/site-packages/ccmlib/node.py:684: in > watch_log_for_alive > self.watch_log_for(tofind, from_mark=from_mark, timeout=timeout, > filename=filename) > ../env3.8/lib/python3.8/site-packages/ccmlib/node.py:608: in watch_log_for > TimeoutError.raise_if_passed(start=start, timeout=timeout, node=self.name, > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > _ > start = 1712173052.8186479, timeout = 120 > msg = "Missing: ['127.0.0.4:7000.* is now UP'] not found in system.log:\n > Head: INFO [Messaging-EventLoop-3-1] 2024-04-03 1...[OptionalTasks:1] > 2024-04-03 19:39:30,454 CassandraRoleManager.java:484 - Setup task failed > with error, rescheduling\n" > node = 'node1' > @staticmethod > def raise_if_passed(start, timeout, msg, node=None): > if start + timeout < time.time(): > > raise TimeoutError.create(start, timeout, msg, node) > E ccmlib.node.TimeoutError: 03 Apr 2024 19:39:32 [node1] after > 120.11/120 seconds Missing: ['127.0.0.4:7000.* is now UP'] not found in > system.log: > EHead: INFO [Messaging-EventLoop-3-1] 2024-04-03 19:37:3 > 
ETail: ... some nodes were not ready > E INFO [OptionalTasks:1] 2024-04-03 19:39:30,454 > CassandraRoleManager.java:484 - Setup task failed with error, rescheduling > ../env3.8/lib/python3.8/site-packages/ccmlib/node.py:56: TimeoutError > {code} > https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra/2680/workflows/8b1c0d0a-7458-4b43-9bba-ac96b9bfe64f/jobs/58929/tests#failed-test-0 > https://ci-cassandra.apache.org/job/Cassandra-trunk/1859/#showFailuresLink -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-19538) Test Failure: test_assassinate_valid_node
[ https://issues.apache.org/jira/browse/CASSANDRA-19538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17838624#comment-17838624 ] Sam Tunnicliffe commented on CASSANDRA-19538: - +1 to the followup commit too > Test Failure: test_assassinate_valid_node > - > > Key: CASSANDRA-19538 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19538 > Project: Cassandra > Issue Type: Bug > Components: CI, Test/dtest/python >Reporter: Ekaterina Dimitrova >Assignee: Marcus Eriksson >Priority: Normal > Fix For: 5.x > > Attachments: ci_summary.html > > Time Spent: 10m > Remaining Estimate: 0h > > Failing consistently on trunk: > {code:java} > ccmlib.node.TimeoutError: 03 Apr 2024 19:39:32 [node1] after 120.11/120 > seconds Missing: ['127.0.0.4:7000.* is now UP'] not found in system.log: > Head: INFO [Messaging-EventLoop-3-1] 2024-04-03 19:37:3 > Tail: ... some nodes were not ready > INFO [OptionalTasks:1] 2024-04-03 19:39:30,454 CassandraRoleManager.java:484 > - Setup task failed with error, rescheduling > self = > def test_assassinate_valid_node(self): > """ > @jira_ticket CASSANDRA-16588 > Test that after taking two non-seed nodes down and assassinating > one of them, the other can come back up. > """ > cluster = self.cluster > > cluster.populate(5).start() > node1 = cluster.nodelist()[0] > node3 = cluster.nodelist()[2] > > self.cluster.set_configuration_options({ > 'seed_provider': [{'class_name': > 'org.apache.cassandra.locator.SimpleSeedProvider', >'parameters': [{'seeds': node1.address()}] > }] > }) > > non_seed_nodes = cluster.nodelist()[-2:] > for node in non_seed_nodes: > node.stop() > > assassination_target = non_seed_nodes[0] > logger.debug("Assassinating non-seed node > {}".format(assassination_target.address())) > out, err, _ = node1.nodetool("assassinate > {}".format(assassination_target.address())) > assert_stderr_clean(err) > > logger.debug("Starting non-seed nodes") > for node in non_seed_nodes: > > node.start() > gossip_test.py:78: > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > _ > ../env3.8/lib/python3.8/site-packages/ccmlib/node.py:915: in start > node.watch_log_for_alive(self, from_mark=mark) > ../env3.8/lib/python3.8/site-packages/ccmlib/node.py:684: in > watch_log_for_alive > self.watch_log_for(tofind, from_mark=from_mark, timeout=timeout, > filename=filename) > ../env3.8/lib/python3.8/site-packages/ccmlib/node.py:608: in watch_log_for > TimeoutError.raise_if_passed(start=start, timeout=timeout, node=self.name, > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > _ > start = 1712173052.8186479, timeout = 120 > msg = "Missing: ['127.0.0.4:7000.* is now UP'] not found in system.log:\n > Head: INFO [Messaging-EventLoop-3-1] 2024-04-03 1...[OptionalTasks:1] > 2024-04-03 19:39:30,454 CassandraRoleManager.java:484 - Setup task failed > with error, rescheduling\n" > node = 'node1' > @staticmethod > def raise_if_passed(start, timeout, msg, node=None): > if start + timeout < time.time(): > > raise TimeoutError.create(start, timeout, msg, node) > E ccmlib.node.TimeoutError: 03 Apr 2024 19:39:32 [node1] after > 120.11/120 seconds Missing: ['127.0.0.4:7000.* is now UP'] not found in > system.log: > EHead: INFO [Messaging-EventLoop-3-1] 2024-04-03 19:37:3 > ETail: ... 
some nodes were not ready > E INFO [OptionalTasks:1] 2024-04-03 19:39:30,454 > CassandraRoleManager.java:484 - Setup task failed with error, rescheduling > ../env3.8/lib/python3.8/site-packages/ccmlib/node.py:56: TimeoutError > {code} > https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra/2680/workflows/8b1c0d0a-7458-4b43-9bba-ac96b9bfe64f/jobs/58929/tests#failed-test-0 > https://ci-cassandra.apache.org/job/Cassandra-trunk/1859/#showFailuresLink -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-19567) Minimize the heap consumption when registering metrics
[ https://issues.apache.org/jira/browse/CASSANDRA-19567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17838546#comment-17838546 ] Sam Tunnicliffe commented on CASSANDRA-19567: - bq. The problem is only reproducible on the x86 machine; it is not reproducible on arm64. We've observed the increased heap usage on Apple silicon, so I don't believe this is entirely true. > Minimize the heap consumption when registering metrics > -- > > Key: CASSANDRA-19567 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19567 > Project: Cassandra > Issue Type: Bug > Components: Observability/Metrics >Reporter: Maxim Muzafarov >Assignee: Maxim Muzafarov >Priority: Normal > Fix For: 5.x > > > The problem is only reproducible on the x86 machine; it is not > reproducible on arm64. A quick analysis showed a lot of MetricName > objects stored in the heap; although the real cause could be related to > something else, the MetricName object requires extra attention. > To reproduce, run the command locally: > {code} > ant test-jvm-dtest-some > -Dtest.name=org.apache.cassandra.distributed.test.ReadRepairTest > {code} > The error: > {code:java} > [junit-timeout] Exception in thread "main" java.lang.OutOfMemoryError: Java > heap space > [junit-timeout] at > java.base/java.lang.StringLatin1.newString(StringLatin1.java:769) > [junit-timeout] at > java.base/java.lang.StringBuffer.toString(StringBuffer.java:716) > [junit-timeout] at > org.apache.cassandra.CassandraBriefJUnitResultFormatter.endTestSuite(CassandraBriefJUnitResultFormatter.java:191) > [junit-timeout] at > org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.fireEndTestSuite(JUnitTestRunner.java:854) > [junit-timeout] at > org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:578) > [junit-timeout] at > org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:1197) > [junit-timeout] at > org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:1042) > [junit-timeout] Testsuite: > org.apache.cassandra.distributed.test.ReadRepairTest-cassandra.testtag_IS_UNDEFINED > [junit-timeout] Testsuite: > org.apache.cassandra.distributed.test.ReadRepairTest-cassandra.testtag_IS_UNDEFINED > Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0 sec > [junit-timeout] > [junit-timeout] Testcase: > org.apache.cassandra.distributed.test.ReadRepairTest:readRepairRTRangeMovementTest-cassandra.testtag_IS_UNDEFINED: > Caused an ERROR > [junit-timeout] Forked Java VM exited abnormally. Please note the time in the > report does not reflect the time until the VM exit. > [junit-timeout] junit.framework.AssertionFailedError: Forked Java VM exited > abnormally. Please note the time in the report does not reflect the time > until the VM exit. 
> [junit-timeout] at > jdk.internal.reflect.GeneratedMethodAccessor4.invoke(Unknown Source) > [junit-timeout] at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > [junit-timeout] at java.base/java.util.Vector.forEach(Vector.java:1365) > [junit-timeout] at > jdk.internal.reflect.GeneratedMethodAccessor4.invoke(Unknown Source) > [junit-timeout] at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > [junit-timeout] at > jdk.internal.reflect.GeneratedMethodAccessor4.invoke(Unknown Source) > [junit-timeout] at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > [junit-timeout] at java.base/java.util.Vector.forEach(Vector.java:1365) > [junit-timeout] at > jdk.internal.reflect.GeneratedMethodAccessor4.invoke(Unknown Source) > [junit-timeout] at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > [junit-timeout] at > jdk.internal.reflect.GeneratedMethodAccessor4.invoke(Unknown Source) > [junit-timeout] at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > [junit-timeout] > [junit-timeout] > [junit-timeout] Test org.apache.cassandra.distributed.test.ReadRepairTest > FAILED (crashed) BUILD FAILED > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
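[editorial note] Since the ticket is about heap consumed while registering metrics, one standard remedy is to canonicalise the fully qualified name strings so each distinct name is built and retained once. This is a hedged sketch of that idea only; the helper below is hypothetical and is not the patch attached to the ticket:
{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Canonicalise metric name strings: repeated registrations of the same
// group/type/scope/name reuse one retained String instead of allocating anew.
final class MetricNameCache
{
    private static final Map<String, String> CANONICAL = new ConcurrentHashMap<>();

    static String canonicalName(String group, String type, String scope, String name)
    {
        String key = group + '.' + type + '.' + scope + '.' + name;
        return CANONICAL.computeIfAbsent(key, k -> k);
    }
}
{code}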
[jira] [Updated] (CASSANDRA-19538) Test Failure: test_assassinate_valid_node
[ https://issues.apache.org/jira/browse/CASSANDRA-19538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sam Tunnicliffe updated CASSANDRA-19538: Status: Ready to Commit (was: Review In Progress) +1 LGTM. Using an incorrect {{lastModified}} was causing updates to gossip state not to fire, because it looked to the listener like nothing had changed. > Test Failure: test_assassinate_valid_node > - > > Key: CASSANDRA-19538 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19538 > Project: Cassandra > Issue Type: Bug > Components: CI, Test/dtest/python >Reporter: Ekaterina Dimitrova >Assignee: Marcus Eriksson >Priority: Normal > Fix For: 5.x > > Attachments: ci_summary.html > > Time Spent: 10m > Remaining Estimate: 0h > > Failing consistently on trunk: > {code:java} > ccmlib.node.TimeoutError: 03 Apr 2024 19:39:32 [node1] after 120.11/120 > seconds Missing: ['127.0.0.4:7000.* is now UP'] not found in system.log: > Head: INFO [Messaging-EventLoop-3-1] 2024-04-03 19:37:3 > Tail: ... some nodes were not ready > INFO [OptionalTasks:1] 2024-04-03 19:39:30,454 CassandraRoleManager.java:484 > - Setup task failed with error, rescheduling > self = > def test_assassinate_valid_node(self): > """ > @jira_ticket CASSANDRA-16588 > Test that after taking two non-seed nodes down and assassinating > one of them, the other can come back up. > """ > cluster = self.cluster > > cluster.populate(5).start() > node1 = cluster.nodelist()[0] > node3 = cluster.nodelist()[2] > > self.cluster.set_configuration_options({ > 'seed_provider': [{'class_name': > 'org.apache.cassandra.locator.SimpleSeedProvider', >'parameters': [{'seeds': node1.address()}] > }] > }) > > non_seed_nodes = cluster.nodelist()[-2:] > for node in non_seed_nodes: > node.stop() > > assassination_target = non_seed_nodes[0] > logger.debug("Assassinating non-seed node > {}".format(assassination_target.address())) > out, err, _ = node1.nodetool("assassinate > {}".format(assassination_target.address())) > assert_stderr_clean(err) > > logger.debug("Starting non-seed nodes") > for node in non_seed_nodes: > > node.start() > gossip_test.py:78: > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > _ > ../env3.8/lib/python3.8/site-packages/ccmlib/node.py:915: in start > node.watch_log_for_alive(self, from_mark=mark) > ../env3.8/lib/python3.8/site-packages/ccmlib/node.py:684: in > watch_log_for_alive > self.watch_log_for(tofind, from_mark=from_mark, timeout=timeout, > filename=filename) > ../env3.8/lib/python3.8/site-packages/ccmlib/node.py:608: in watch_log_for > TimeoutError.raise_if_passed(start=start, timeout=timeout, node=self.name, > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > _ > start = 1712173052.8186479, timeout = 120 > msg = "Missing: ['127.0.0.4:7000.* is now UP'] not found in system.log:\n > Head: INFO [Messaging-EventLoop-3-1] 2024-04-03 1...[OptionalTasks:1] > 2024-04-03 19:39:30,454 CassandraRoleManager.java:484 - Setup task failed > with error, rescheduling\n" > node = 'node1' > @staticmethod > def raise_if_passed(start, timeout, msg, node=None): > if start + timeout < time.time(): > > raise TimeoutError.create(start, timeout, msg, node) > E ccmlib.node.TimeoutError: 03 Apr 2024 19:39:32 [node1] after > 120.11/120 seconds Missing: ['127.0.0.4:7000.* is now UP'] not found in > system.log: > EHead: INFO [Messaging-EventLoop-3-1] 2024-04-03 19:37:3 > ETail: ... 
some nodes were not ready > E INFO [OptionalTasks:1] 2024-04-03 19:39:30,454 > CassandraRoleManager.java:484 - Setup task failed with error, rescheduling > ../env3.8/lib/python3.8/site-packages/ccmlib/node.py:56: TimeoutError > {code} > https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra/2680/workflows/8b1c0d0a-7458-4b43-9bba-ac96b9bfe64f/jobs/58929/tests#failed-test-0 > https://ci-cassandra.apache.org/job/Cassandra-trunk/1859/#showFailuresLink -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
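The fix referenced in the +1 above hinges on {{lastModified}} comparisons: a listener that decides whether gossip state has changed by comparing timestamps will silently drop an update stamped with a stale value. A minimal sketch of that change-detection pattern, with hypothetical names rather than the actual gossip listener code:

{code:java}
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch, not Cassandra's listener: shows why an update carrying
// an incorrect (old) lastModified looks like a no-op and is never propagated.
final class EndpointStateWatcher
{
    private final AtomicLong lastSeenModified = new AtomicLong(Long.MIN_VALUE);

    /** Returns true if the update was propagated, false if it was ignored. */
    boolean onChange(long lastModified, Runnable notifySubscribers)
    {
        long seen = lastSeenModified.get();
        if (lastModified <= seen)
            return false; // stale timestamp: treated as "nothing changed"
        lastSeenModified.compareAndSet(seen, lastModified);
        notifySubscribers.run();
        return true;
    }
}
{code}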
[jira] [Updated] (CASSANDRA-19538) Test Failure: test_assassinate_valid_node
[ https://issues.apache.org/jira/browse/CASSANDRA-19538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sam Tunnicliffe updated CASSANDRA-19538: Reviewers: Sam Tunnicliffe (was: Sam Tunnicliffe) Status: Review In Progress (was: Patch Available) > Test Failure: test_assassinate_valid_node > - > > Key: CASSANDRA-19538 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19538 > Project: Cassandra > Issue Type: Bug > Components: CI, Test/dtest/python >Reporter: Ekaterina Dimitrova >Assignee: Marcus Eriksson >Priority: Normal > Fix For: 5.x > > Attachments: ci_summary.html > > Time Spent: 10m > Remaining Estimate: 0h > > Failing consistently on trunk: > {code:java} > ccmlib.node.TimeoutError: 03 Apr 2024 19:39:32 [node1] after 120.11/120 > seconds Missing: ['127.0.0.4:7000.* is now UP'] not found in system.log: > Head: INFO [Messaging-EventLoop-3-1] 2024-04-03 19:37:3 > Tail: ... some nodes were not ready > INFO [OptionalTasks:1] 2024-04-03 19:39:30,454 CassandraRoleManager.java:484 > - Setup task failed with error, rescheduling > self = > def test_assassinate_valid_node(self): > """ > @jira_ticket CASSANDRA-16588 > Test that after taking two non-seed nodes down and assassinating > one of them, the other can come back up. > """ > cluster = self.cluster > > cluster.populate(5).start() > node1 = cluster.nodelist()[0] > node3 = cluster.nodelist()[2] > > self.cluster.set_configuration_options({ > 'seed_provider': [{'class_name': > 'org.apache.cassandra.locator.SimpleSeedProvider', >'parameters': [{'seeds': node1.address()}] > }] > }) > > non_seed_nodes = cluster.nodelist()[-2:] > for node in non_seed_nodes: > node.stop() > > assassination_target = non_seed_nodes[0] > logger.debug("Assassinating non-seed node > {}".format(assassination_target.address())) > out, err, _ = node1.nodetool("assassinate > {}".format(assassination_target.address())) > assert_stderr_clean(err) > > logger.debug("Starting non-seed nodes") > for node in non_seed_nodes: > > node.start() > gossip_test.py:78: > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > _ > ../env3.8/lib/python3.8/site-packages/ccmlib/node.py:915: in start > node.watch_log_for_alive(self, from_mark=mark) > ../env3.8/lib/python3.8/site-packages/ccmlib/node.py:684: in > watch_log_for_alive > self.watch_log_for(tofind, from_mark=from_mark, timeout=timeout, > filename=filename) > ../env3.8/lib/python3.8/site-packages/ccmlib/node.py:608: in watch_log_for > TimeoutError.raise_if_passed(start=start, timeout=timeout, node=self.name, > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > _ > start = 1712173052.8186479, timeout = 120 > msg = "Missing: ['127.0.0.4:7000.* is now UP'] not found in system.log:\n > Head: INFO [Messaging-EventLoop-3-1] 2024-04-03 1...[OptionalTasks:1] > 2024-04-03 19:39:30,454 CassandraRoleManager.java:484 - Setup task failed > with error, rescheduling\n" > node = 'node1' > @staticmethod > def raise_if_passed(start, timeout, msg, node=None): > if start + timeout < time.time(): > > raise TimeoutError.create(start, timeout, msg, node) > E ccmlib.node.TimeoutError: 03 Apr 2024 19:39:32 [node1] after > 120.11/120 seconds Missing: ['127.0.0.4:7000.* is now UP'] not found in > system.log: > EHead: INFO [Messaging-EventLoop-3-1] 2024-04-03 19:37:3 > ETail: ... 
some nodes were not ready > E INFO [OptionalTasks:1] 2024-04-03 19:39:30,454 > CassandraRoleManager.java:484 - Setup task failed with error, rescheduling > ../env3.8/lib/python3.8/site-packages/ccmlib/node.py:56: TimeoutError > {code} > https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra/2680/workflows/8b1c0d0a-7458-4b43-9bba-ac96b9bfe64f/jobs/58929/tests#failed-test-0 > https://ci-cassandra.apache.org/job/Cassandra-trunk/1859/#showFailuresLink -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-19514) When jvm-dtest is shutting down an instance TCM retries block the shutdown causing the test to fail
[ https://issues.apache.org/jira/browse/CASSANDRA-19514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sam Tunnicliffe updated CASSANDRA-19514: Reviewers: Alex Petrov, Blake Eggleston, Marcus Eriksson (was: Blake Eggleston) > When jvm-dtest is shutting down an instance TCM retries block the shutdown > causing the test to fail > --- > > Key: CASSANDRA-19514 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19514 > Project: Cassandra > Issue Type: Bug > Components: Cluster/Membership, Test/dtest/java >Reporter: David Capwell >Assignee: Sam Tunnicliffe >Priority: Normal > Fix For: 5.1 > > Attachments: ci_summary.html, result_details.tar.gz > > Time Spent: 10m > Remaining Estimate: 0h > > org.apache.cassandra.distributed.test.log.RequestCurrentEpochTest#testRequestingPeerWatermarks > {code} > java.lang.RuntimeException: java.util.concurrent.TimeoutException >org.apache.cassandra.utils.Throwables.maybeFail(Throwables.java:79) > > org.apache.cassandra.utils.FBUtilities.waitOnFutures(FBUtilities.java:540) > > org.apache.cassandra.distributed.impl.AbstractCluster.close(AbstractCluster.java:1098) > > org.apache.cassandra.distributed.test.log.RequestCurrentEpochTest.testRequestingPeerWatermarks(RequestCurrentEpochTest.java:77) >java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native > Method) > > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > Caused by: java.util.concurrent.TimeoutException > > org.apache.cassandra.utils.concurrent.AbstractFuture.get(AbstractFuture.java:253) > > org.apache.cassandra.utils.FBUtilities.waitOnFutures(FBUtilities.java:532) > Suppressed: java.util.concurrent.TimeoutException > {code} > In debugger I found the blocked future and it was > src/java/org/apache/cassandra/tcm/EpochAwareDebounce.java waiting on > src/java/org/apache/cassandra/tcm/RemoteProcessor.java retries -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
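The shape of the hang described in this ticket, reduced to a sketch: a retry loop re-submits itself indefinitely, so its result future stays pending and a close() that waits on all outstanding futures times out. Names here are illustrative, not the actual EpochAwareDebounce/RemoteProcessor code:

{code:java}
import java.util.concurrent.CancellationException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

final class RetryingFetch
{
    private final ScheduledExecutorService scheduler;
    private final CompletableFuture<String> result = new CompletableFuture<>();
    private volatile boolean shutdown;

    RetryingFetch(ScheduledExecutorService scheduler)
    {
        this.scheduler = scheduler;
    }

    CompletableFuture<String> start()
    {
        attempt();
        return result;
    }

    private void attempt()
    {
        // Without a check like this, the future stays pending forever while
        // the instance shuts down, blocking anything waiting on it.
        if (shutdown)
        {
            result.completeExceptionally(new CancellationException("instance shutting down"));
            return;
        }
        boolean success = false; // stand-in for a remote call that keeps failing
        if (success)
            result.complete("response");
        else
            scheduler.schedule(this::attempt, 100, TimeUnit.MILLISECONDS);
    }

    void shutdown()
    {
        shutdown = true;
    }
}
{code}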
[jira] [Commented] (CASSANDRA-19514) When jvm-dtest is shutting down an instance TCM retries block the shutdown causing the test to fail
[ https://issues.apache.org/jira/browse/CASSANDRA-19514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17838035#comment-17838035 ] Sam Tunnicliffe commented on CASSANDRA-19514: - Trunk PR and CI results attached. The python dtest failure is CASSANDRA-19538; of the 2 python upgrade test failures, one looks like CASSANDRA-19520 and the other may be timeout-related. > When jvm-dtest is shutting down an instance TCM retries block the shutdown > causing the test to fail > --- > > Key: CASSANDRA-19514 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19514 > Project: Cassandra > Issue Type: Bug > Components: Cluster/Membership, Test/dtest/java >Reporter: David Capwell >Assignee: Sam Tunnicliffe >Priority: Normal > Fix For: 5.1 > > Attachments: ci_summary.html, result_details.tar.gz > > Time Spent: 10m > Remaining Estimate: 0h > > org.apache.cassandra.distributed.test.log.RequestCurrentEpochTest#testRequestingPeerWatermarks > {code} > java.lang.RuntimeException: java.util.concurrent.TimeoutException >org.apache.cassandra.utils.Throwables.maybeFail(Throwables.java:79) > > org.apache.cassandra.utils.FBUtilities.waitOnFutures(FBUtilities.java:540) > > org.apache.cassandra.distributed.impl.AbstractCluster.close(AbstractCluster.java:1098) > > org.apache.cassandra.distributed.test.log.RequestCurrentEpochTest.testRequestingPeerWatermarks(RequestCurrentEpochTest.java:77) >java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native > Method) > > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > Caused by: java.util.concurrent.TimeoutException > > org.apache.cassandra.utils.concurrent.AbstractFuture.get(AbstractFuture.java:253) > > org.apache.cassandra.utils.FBUtilities.waitOnFutures(FBUtilities.java:532) > Suppressed: java.util.concurrent.TimeoutException > {code} > In debugger I found the blocked future and it was > src/java/org/apache/cassandra/tcm/EpochAwareDebounce.java waiting on > src/java/org/apache/cassandra/tcm/RemoteProcessor.java retries -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-19514) When jvm-dtest is shutting down an instance TCM retries block the shutdown causing the test to fail
[ https://issues.apache.org/jira/browse/CASSANDRA-19514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sam Tunnicliffe updated CASSANDRA-19514: Attachment: ci_summary.html result_details.tar.gz > When jvm-dtest is shutting down an instance TCM retries block the shutdown > causing the test to fail > --- > > Key: CASSANDRA-19514 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19514 > Project: Cassandra > Issue Type: Bug > Components: Cluster/Membership, Test/dtest/java >Reporter: David Capwell >Assignee: Sam Tunnicliffe >Priority: Normal > Fix For: 5.1 > > Attachments: ci_summary.html, result_details.tar.gz > > Time Spent: 10m > Remaining Estimate: 0h > > org.apache.cassandra.distributed.test.log.RequestCurrentEpochTest#testRequestingPeerWatermarks > {code} > java.lang.RuntimeException: java.util.concurrent.TimeoutException >org.apache.cassandra.utils.Throwables.maybeFail(Throwables.java:79) > > org.apache.cassandra.utils.FBUtilities.waitOnFutures(FBUtilities.java:540) > > org.apache.cassandra.distributed.impl.AbstractCluster.close(AbstractCluster.java:1098) > > org.apache.cassandra.distributed.test.log.RequestCurrentEpochTest.testRequestingPeerWatermarks(RequestCurrentEpochTest.java:77) >java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native > Method) > > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > Caused by: java.util.concurrent.TimeoutException > > org.apache.cassandra.utils.concurrent.AbstractFuture.get(AbstractFuture.java:253) > > org.apache.cassandra.utils.FBUtilities.waitOnFutures(FBUtilities.java:532) > Suppressed: java.util.concurrent.TimeoutException > {code} > In debugger I found the blocked future and it was > src/java/org/apache/cassandra/tcm/EpochAwareDebounce.java waiting on > src/java/org/apache/cassandra/tcm/RemoteProcessor.java retries -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-19514) When jvm-dtest is shutting down an instance TCM retries block the shutdown causing the test to fail
[ https://issues.apache.org/jira/browse/CASSANDRA-19514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sam Tunnicliffe updated CASSANDRA-19514: Status: Patch Available (was: In Progress) > When jvm-dtest is shutting down an instance TCM retries block the shutdown > causing the test to fail > --- > > Key: CASSANDRA-19514 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19514 > Project: Cassandra > Issue Type: Bug > Components: Cluster/Membership, Test/dtest/java >Reporter: David Capwell >Assignee: Sam Tunnicliffe >Priority: Normal > Fix For: 5.1 > > Time Spent: 10m > Remaining Estimate: 0h > > org.apache.cassandra.distributed.test.log.RequestCurrentEpochTest#testRequestingPeerWatermarks > {code} > java.lang.RuntimeException: java.util.concurrent.TimeoutException >org.apache.cassandra.utils.Throwables.maybeFail(Throwables.java:79) > > org.apache.cassandra.utils.FBUtilities.waitOnFutures(FBUtilities.java:540) > > org.apache.cassandra.distributed.impl.AbstractCluster.close(AbstractCluster.java:1098) > > org.apache.cassandra.distributed.test.log.RequestCurrentEpochTest.testRequestingPeerWatermarks(RequestCurrentEpochTest.java:77) >java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native > Method) > > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > Caused by: java.util.concurrent.TimeoutException > > org.apache.cassandra.utils.concurrent.AbstractFuture.get(AbstractFuture.java:253) > > org.apache.cassandra.utils.FBUtilities.waitOnFutures(FBUtilities.java:532) > Suppressed: java.util.concurrent.TimeoutException > {code} > In debugger I found the blocked future and it was > src/java/org/apache/cassandra/tcm/EpochAwareDebounce.java waiting on > src/java/org/apache/cassandra/tcm/RemoteProcessor.java retries -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-12937) Default setting (yaml) for SSTable compression
[ https://issues.apache.org/jira/browse/CASSANDRA-12937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17837201#comment-17837201 ] Sam Tunnicliffe commented on CASSANDRA-12937: - So you're suggesting that we could have different default values in yaml across the cluster, but that all nodes actually apply the same value regardless of their own configured default? Which specific value makes it into schema just depends on which instance acts as the coordinator for a given DDL statement? It seems like if we actually want these to be cluster-wide values and not configurable on a per-node basis, the defaults themselves should be in TCM, independently of the schema transformations. In the past, we've used per-node configuration like this to experiment with new compression algorithms, and I can imagine potentially wanting to do the same with things like compaction, so I'm not entirely convinced that this assumption is correct. As far as the serialization format goes, schema transformations have to be round-trippable via CQL for the purposes of recreating a schema from a snapshot. So I don't think that using the CQL itself as the format is inherently flawed, and it does have a couple of big positives, namely that it's great for visibility (for operators or when debugging) and that it doesn't invent a new format that we would have to version and manage as new flags/features/defaults are added. We should just need to fully resolve and expand a DDL statement before serializing it in {{SchemaAlteringStatement}}, which would be entirely possible, but I remain unconvinced that just picking the defaults from whatever node happens to be coordinating is the right way to go. > Default setting (yaml) for SSTable compression > -- > > Key: CASSANDRA-12937 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12937 > Project: Cassandra > Issue Type: Improvement > Components: Local/Config >Reporter: Michael Semb Wever >Assignee: Stefan Miklosovic >Priority: Low > Labels: AdventCalendar2021 > Fix For: 5.x > > Time Spent: 8h > Remaining Estimate: 0h > > In many situations the choice of compression for sstables is more relevant to > the disks attached than to the schema and data. > This issue is to add to cassandra.yaml a default value for sstable > compression that new tables will inherit (instead of the defaults found in > {{CompressionParams.DEFAULT}}. > Examples where this can be relevant are filesystems that do on-the-fly > compression (btrfs, zfs) or specific disk configurations or even specific C* > versions (see CASSANDRA-10995 ). > +Additional information for newcomers+ > Some new fields need to be added to {{cassandra.yaml}} to allow specifying > the field required for defining the default compression parameters. In > {{DatabaseDescriptor}} a new {{CompressionParams}} field should be added for > the default compression. This field should be initialized in > {{DatabaseDescriptor.applySimpleConfig()}}. At the different places where > {{CompressionParams.DEFAULT}} was used the code should call > {{DatabaseDescriptor#getDefaultCompressionParams}} that should return some > copy of configured {{CompressionParams}}. > Some unit test using {{OverrideConfigurationLoader}} should be used to test > that the table schema use the new default when a new table is created (see > CreateTest for some example). -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
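As background for the implementation route outlined in the ticket description, a minimal sketch of a yaml-driven compression default; the class and method names are illustrative stand-ins for the DatabaseDescriptor plumbing described there, not the committed API:

{code:java}
public final class DefaultCompressionConfig
{
    private static volatile Settings defaultCompression =
        new Settings("LZ4Compressor", 16 * 1024); // fallback when yaml is silent

    // Would be called while applying config at startup, in the spirit of
    // DatabaseDescriptor.applySimpleConfig().
    public static void applyFromConfig(String className, int chunkLengthKiB)
    {
        defaultCompression = new Settings(className, chunkLengthKiB * 1024);
    }

    // Call sites that previously hardcoded a default receive a copy, so
    // per-table mutations cannot leak back into the configured default.
    public static Settings getDefaultCompressionParams()
    {
        Settings d = defaultCompression;
        return new Settings(d.className, d.chunkLength);
    }

    public static final class Settings
    {
        final String className;
        final int chunkLength;

        Settings(String className, int chunkLength)
        {
            this.className = className;
            this.chunkLength = chunkLength;
        }
    }
}
{code}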
[jira] [Comment Edited] (CASSANDRA-12937) Default setting (yaml) for SSTable compression
[ https://issues.apache.org/jira/browse/CASSANDRA-12937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17837179#comment-17837179 ] Sam Tunnicliffe edited comment on CASSANDRA-12937 at 4/15/24 11:11 AM: --- The problem with that is that the defaults may be different on every instance, so what exactly should be stored in the TCM log? Ideally we should store the value that is actually resolved during initial execution on each node so that it can be re-used if/when the transformation is reapplied. That should probably be in a parallel local datastructure though, not in the node's local log table as we don't want to ship those local defaults to peers when providing log catchup (because they should use their own defaults). was (Author: beobal): The problem with that is that the defaults may be different on every instance, so what exactly should be stored in the TCM log? Ideally we should store the value that is actually resolved during initial execution on each node to be persisted locally so that it can be re-used if/when the transformation is reapplied. That should probably be in a parallel local datastructure though, not in the node's local log table as we don't want to ship those local defaults to peers when providing log catchup (because they should use their own defaults). > Default setting (yaml) for SSTable compression > -- > > Key: CASSANDRA-12937 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12937 > Project: Cassandra > Issue Type: Improvement > Components: Local/Config >Reporter: Michael Semb Wever >Assignee: Stefan Miklosovic >Priority: Low > Labels: AdventCalendar2021 > Fix For: 5.x > > Time Spent: 8h > Remaining Estimate: 0h > > In many situations the choice of compression for sstables is more relevant to > the disks attached than to the schema and data. > This issue is to add to cassandra.yaml a default value for sstable > compression that new tables will inherit (instead of the defaults found in > {{CompressionParams.DEFAULT}}. > Examples where this can be relevant are filesystems that do on-the-fly > compression (btrfs, zfs) or specific disk configurations or even specific C* > versions (see CASSANDRA-10995 ). > +Additional information for newcomers+ > Some new fields need to be added to {{cassandra.yaml}} to allow specifying > the field required for defining the default compression parameters. In > {{DatabaseDescriptor}} a new {{CompressionParams}} field should be added for > the default compression. This field should be initialized in > {{DatabaseDescriptor.applySimpleConfig()}}. At the different places where > {{CompressionParams.DEFAULT}} was used the code should call > {{DatabaseDescriptor#getDefaultCompressionParams}} that should return some > copy of configured {{CompressionParams}}. > Some unit test using {{OverrideConfigurationLoader}} should be used to test > that the table schema use the new default when a new table is created (see > CreateTest for some example). -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-12937) Default setting (yaml) for SSTable compression
[ https://issues.apache.org/jira/browse/CASSANDRA-12937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17837179#comment-17837179 ] Sam Tunnicliffe commented on CASSANDRA-12937: - The problem with that is that the defaults may be different on every instance, so what exactly should be stored in the TCM log? Ideally we should store the value that is actually resolved during initial execution on each node to be persisted locally so that it can be re-used if/when the transformation is reapplied. That should probably be in a parallel local datastructure though, not in the node's local log table as we don't want to ship those local defaults to peers when providing log catchup (because they should use their own defaults). > Default setting (yaml) for SSTable compression > -- > > Key: CASSANDRA-12937 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12937 > Project: Cassandra > Issue Type: Improvement > Components: Local/Config >Reporter: Michael Semb Wever >Assignee: Stefan Miklosovic >Priority: Low > Labels: AdventCalendar2021 > Fix For: 5.x > > Time Spent: 8h > Remaining Estimate: 0h > > In many situations the choice of compression for sstables is more relevant to > the disks attached than to the schema and data. > This issue is to add to cassandra.yaml a default value for sstable > compression that new tables will inherit (instead of the defaults found in > {{CompressionParams.DEFAULT}}. > Examples where this can be relevant are filesystems that do on-the-fly > compression (btrfs, zfs) or specific disk configurations or even specific C* > versions (see CASSANDRA-10995 ). > +Additional information for newcomers+ > Some new fields need to be added to {{cassandra.yaml}} to allow specifying > the field required for defining the default compression parameters. In > {{DatabaseDescriptor}} a new {{CompressionParams}} field should be added for > the default compression. This field should be initialized in > {{DatabaseDescriptor.applySimpleConfig()}}. At the different places where > {{CompressionParams.DEFAULT}} was used the code should call > {{DatabaseDescriptor#getDefaultCompressionParams}} that should return some > copy of configured {{CompressionParams}}. > Some unit test using {{OverrideConfigurationLoader}} should be used to test > that the table schema use the new default when a new table is created (see > CreateTest for some example). -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
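A sketch of the "parallel local datastructure" idea from the comment above, with hypothetical names: the values a node actually resolved when it first executed a transformation are kept per-epoch, outside the replicated log, so log catchup ships only the transformation itself and peers resolve their own defaults:

{code:java}
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

final class LocallyResolvedDefaults
{
    // epoch -> option name -> value resolved when the transformation first ran here
    private final Map<Long, Map<String, String>> byEpoch = new ConcurrentHashMap<>();

    void record(long epoch, Map<String, String> resolved)
    {
        byEpoch.putIfAbsent(epoch, Map.copyOf(resolved));
    }

    // On replay, prefer what this node resolved originally, if anything.
    Optional<Map<String, String>> forEpoch(long epoch)
    {
        return Optional.ofNullable(byEpoch.get(epoch));
    }
}
{code}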
[jira] [Commented] (CASSANDRA-18954) Transformations should be pure so that replaying them results in the same outcome regardless of the node state or configuration
[ https://issues.apache.org/jira/browse/CASSANDRA-18954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17837177#comment-17837177 ] Sam Tunnicliffe commented on CASSANDRA-18954: - [~jlewandowski] bq. The CASSANDRA-12937 problem is caused by the fact that the transformations are not pure. It is not enough that they are side-effect-free; they also cannot depend on any external properties other than the current cluster state and the stored transformation data. Yes, this is exactly what I mean, and we're looking into a proper fix for this now. > Transformations should be pure so that replaying them results in the same > outcome regardless of the node state or configuration > --- > > Key: CASSANDRA-18954 > URL: https://issues.apache.org/jira/browse/CASSANDRA-18954 > Project: Cassandra > Issue Type: Bug > Components: Transactional Cluster Metadata >Reporter: Jacek Lewandowski >Assignee: Jacek Lewandowski >Priority: Normal > > Discussed on Slack -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
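A minimal sketch of the purity property under discussion, with hypothetical types (the real TCM {{Transformation}} API differs): execute() may read only the previous cluster state and the transformation's own captured fields, never node-local configuration, so replaying the log is deterministic on every node:

{code:java}
import java.util.HashMap;
import java.util.Map;

interface PureTransformation
{
    ClusterState execute(ClusterState previous);
}

final class ClusterState
{
    final Map<String, String> params;

    ClusterState(Map<String, String> params)
    {
        this.params = Map.copyOf(params);
    }

    ClusterState withParam(String key, String value)
    {
        Map<String, String> next = new HashMap<>(params);
        next.put(key, value);
        return new ClusterState(next);
    }
}

final class SetParam implements PureTransformation
{
    // The value is resolved once, at submission time, and captured here;
    // replay never consults yaml defaults or any other local state.
    private final String key;
    private final String resolvedValue;

    SetParam(String key, String resolvedValue)
    {
        this.key = key;
        this.resolvedValue = resolvedValue;
    }

    @Override
    public ClusterState execute(ClusterState previous)
    {
        return previous.withParam(key, resolvedValue); // pure function of its inputs
    }
}
{code}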
[jira] [Assigned] (CASSANDRA-19514) When jvm-dtest is shutting down an instance TCM retries block the shutdown causing the test to fail
[ https://issues.apache.org/jira/browse/CASSANDRA-19514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sam Tunnicliffe reassigned CASSANDRA-19514: --- Assignee: Sam Tunnicliffe (was: David Capwell) > When jvm-dtest is shutting down an instance TCM retries block the shutdown > causing the test to fail > --- > > Key: CASSANDRA-19514 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19514 > Project: Cassandra > Issue Type: Bug > Components: Cluster/Membership, Test/dtest/java >Reporter: David Capwell >Assignee: Sam Tunnicliffe >Priority: Normal > Fix For: 5.1 > > Time Spent: 10m > Remaining Estimate: 0h > > org.apache.cassandra.distributed.test.log.RequestCurrentEpochTest#testRequestingPeerWatermarks > {code} > java.lang.RuntimeException: java.util.concurrent.TimeoutException >org.apache.cassandra.utils.Throwables.maybeFail(Throwables.java:79) > > org.apache.cassandra.utils.FBUtilities.waitOnFutures(FBUtilities.java:540) > > org.apache.cassandra.distributed.impl.AbstractCluster.close(AbstractCluster.java:1098) > > org.apache.cassandra.distributed.test.log.RequestCurrentEpochTest.testRequestingPeerWatermarks(RequestCurrentEpochTest.java:77) >java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native > Method) > > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > Caused by: java.util.concurrent.TimeoutException > > org.apache.cassandra.utils.concurrent.AbstractFuture.get(AbstractFuture.java:253) > > org.apache.cassandra.utils.FBUtilities.waitOnFutures(FBUtilities.java:532) > Suppressed: java.util.concurrent.TimeoutException > {code} > In debugger I found the blocked future and it was > src/java/org/apache/cassandra/tcm/EpochAwareDebounce.java waiting on > src/java/org/apache/cassandra/tcm/RemoteProcessor.java retries -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-19344) Range movements involving transient replicas must safely enact changes to read and write replica sets
[ https://issues.apache.org/jira/browse/CASSANDRA-19344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sam Tunnicliffe updated CASSANDRA-19344: Attachment: remove-n4-pre-19344.txt remove-n4-post-19344.txt > Range movements involving transient replicas must safely enact changes to > read and write replica sets > - > > Key: CASSANDRA-19344 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19344 > Project: Cassandra > Issue Type: Bug > Components: CI >Reporter: Ekaterina Dimitrova >Assignee: Sam Tunnicliffe >Priority: Normal > Fix For: 5.x > > Attachments: ci_summary.html, remove-n4-post-19344.txt, > remove-n4-pre-19344.txt, result_details.tar.gz > > Time Spent: 1h 40m > Remaining Estimate: 0h > > (edit) This was originally opened due to a flaky test > {{org.apache.cassandra.distributed.test.TransientRangeMovementTest.testRemoveNode-_jdk17}} > The test can fail in two different ways: > {code:java} > junit.framework.AssertionFailedError: NOT IN CURRENT: 31 -- [(00,20), > (31,50)] at > org.apache.cassandra.distributed.test.TransientRangeMovementTest.assertAllContained(TransientRangeMovementTest.java:203) > at > org.apache.cassandra.distributed.test.TransientRangeMovementTest.testRemoveNode(TransientRangeMovementTest.java:183) > at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native > Method) at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77) > at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43){code} > as in here - > [https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra/2639/workflows/32b92ce7-5e9d-4efb-8362-d200d2414597/jobs/55139/tests#failed-test-0] > and > {code:java} > junit.framework.AssertionFailedError: nodetool command [removenode, > 6d194555-f6eb-41d0-c000-0003, --force] was not successful stdout: > stderr: error: Node /127.0.0.4:7012 is alive and owns this ID. Use > decommission command to remove it from the ring -- StackTrace -- > java.lang.UnsupportedOperationException: Node /127.0.0.4:7012 is alive and > owns this ID. Use decommission command to remove it from the ring at > org.apache.cassandra.tcm.sequences.SingleNodeSequences.removeNode(SingleNodeSequences.java:110) > at > org.apache.cassandra.service.StorageService.removeNode(StorageService.java:3682) > at org.apache.cassandra.tools.NodeProbe.removeNode(NodeProbe.java:1020) at > org.apache.cassandra.tools.nodetool.RemoveNode.execute(RemoveNode.java:51) at > org.apache.cassandra.tools.NodeTool$NodeToolCmd.runInternal(NodeTool.java:388) > at org.apache.cassandra.tools.NodeTool$NodeToolCmd.run(NodeTool.java:373) at > org.apache.cassandra.tools.NodeTool.execute(NodeTool.java:272) at > org.apache.cassandra.distributed.impl.Instance$DTestNodeTool.execute(Instance.java:1129) > at > org.apache.cassandra.distributed.impl.Instance.lambda$nodetoolResult$51(Instance.java:1038) > at org.apache.cassandra.concurrent.FutureTask.call(FutureTask.java:61) at > org.apache.cassandra.concurrent.FutureTask.run(FutureTask.java:71) at > java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) > at > java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) > at > io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) > at java.base/java.lang.Thread.run(Thread.java:833) Notifications: Error: > java.lang.UnsupportedOperationException: Node /127.0.0.4:7012 is alive and > owns this ID. 
Use decommission command to remove it from the ring at > org.apache.cassandra.tcm.sequences.SingleNodeSequences.removeNode(SingleNodeSequences.java:110) > at > org.apache.cassandra.service.StorageService.removeNode(StorageService.java:3682) > at org.apache.cassandra.tools.NodeProbe.removeNode(NodeProbe.java:1020) at > org.apache.cassandra.tools.nodetool.RemoveNode.execute(RemoveNode.java:51) at > org.apache.cassandra.tools.NodeTool$NodeToolCmd.runInternal(NodeTool.java:388) > at org.apache.cassandra.tools.NodeTool$NodeToolCmd.run(NodeTool.java:373) at > org.apache.cassandra.tools.NodeTool.execute(NodeTool.java:272) at > org.apache.cassandra.distributed.impl.Instance$DTestNodeTool.execute(Instance.java:1129) > at > org.apache.cassandra.distributed.impl.Instance.lambda$nodetoolResult$51(Instance.java:1038) > at org.apache.cassandra.concurrent.FutureTask.call(FutureTask.java:61) at > org.apache.cassandra.concurrent.FutureTask.run(FutureTask.java:71) at > java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) > at >
[jira] [Commented] (CASSANDRA-19344) Range movements involving transient replicas must safely enact changes to read and write replica sets
[ https://issues.apache.org/jira/browse/CASSANDRA-19344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17836302#comment-17836302 ] Sam Tunnicliffe commented on CASSANDRA-19344: - The actual cause was that the way we construct placement deltas for a PlacementTransitionPlan did not properly consider transientness. Multi-step operations always follow the pattern:
* add new write replicas
* add new read replicas/remove old read replicas
* remove old write replicas
So when an operation causes a replica to transition from TRANSIENT to FULL for the same range (or part of a range), it could become a FULL read replica before becoming a FULL write replica. Consider this simplified example where we remove N4 and the effect on N2:
{code}
RF=3/1

At START

10        20        30        40
+---------+---------+---------+---------+
N1        N2        N3        N4

N2 replicates: (10,20] - FULL (Primary Range)
               (,10] + (40,] - FULL
               (30,40] - TRANSIENT

After FINISH

10        20        30
+---------+---------+---------+
N1        N2        N3

N2 replicates: (10,20] - FULL (Primary Range)
               (,10] + (30,] - FULL
               (20,30] - TRANSIENT

In removing N4, N2 gains (20,30] TRANSIENT and (30,40] TRANSIENT -> FULL

Potential problem ->
  for READS N2 becomes FULL(30,40] after MID_LEAVE
  for WRITES N2 only becomes FULL(30,40] after FINISH_LEAVE

so between the 2 events, coordinators will not send writes to N2 unless one of
the other replicas is unresponsive. Coordinators will send reads to N2 during
this window though. If cleanup is run before N2 becomes a FULL replica for
(30,40], any data for that range (including that which was just streamed to it)
will be purged.
{code}
Below is an illustration of the ranges replicated by N2 at each step:
{code}
+-----+------------------------+------------------------------------------------------------------------+
|EPOCH| STATE                  | RANGES REPLICATED BY N2                                                |
|-----+------------------------+------------------------------------------------------------------------|
|0    | START STATE            | WRITES -> FULL: [(40,], (,10], (10,20]] TRANSIENT: [(30,40]]           |
|     |                        | READS  -> FULL: [(40,], (,10], (10,20]] TRANSIENT: [(30,40]]           |
|-----+------------------------+------------------------------------------------------------------------|
|1    | ENACT START_LEAVE(N4)  | WRITES -> FULL: [(40,], (,10], (10,20]] TRANSIENT: [(20,30], (30,40]]  |
|     |                        | READS  -> FULL: [(40,], (,10], (10,20]] TRANSIENT: [(30,40]]           |
|-----+------------------------+------------------------------------------------------------------------|
|2    | ENACT MID_LEAVE(N4)    | WRITES -> FULL: [(40,], (,10], (10,20]] TRANSIENT: [(20,30], (30,40]]  |
|     |                        | READS  -> FULL: [(40,], (,10], (10,20], (30,40]] TRANSIENT: [(20,30]]  |
|-----+------------------------+------------------------------------------------------------------------|
|3    | ENACT FINISH_LEAVE(N4) | WRITES -> FULL: [(30,], (,10], (10,20]] TRANSIENT: [(20,30]]           |
|     |                        | READS  -> FULL: [(30,], (,10], (10,20]] TRANSIENT: [(20,30]]           |
+-----+------------------------+------------------------------------------------------------------------+
{code}
After applying the fix here, these are changed so that the {{(30,40]}} changing from {{TRANSIENT}} to {{FULL}} for writes is part of enacting the {{START_LEAVE(N4)}} in epoch 1, i.e. before N2 becomes a FULL replica for reads of {{(30,40]}} when {{MID_LEAVE(N4)}} is enacted in epoch 2.
{code}
+-----+------------------------+------------------------------------------------------------------------+
|EPOCH| STATE                  | RANGES REPLICATED BY N2                                                |
|-----+------------------------+------------------------------------------------------------------------|
|0    | START STATE            | WRITES -> FULL: [(40,], (,10], (10,20]] TRANSIENT: [(30,40]]           |
|     |                        | READS  -> FULL: [(40,], (,10], (10,20]] TRANSIENT: [(30,40]]           |
|-----+------------------------+------------------------------------------------------------------------|
|1    | ENACT START_LEAVE(N4)  | WRITES -> FULL: [(40,], (,10], (10,20], (30,40]] TRANSIENT: [(20,30]]  |
|     |                        | READS  -> FULL: [(40,], (,10], (10,20]] TRANSIENT: [(30,40]]           |
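The ordering invariant described in that comment can be condensed into a small sketch (hypothetical names, not the actual PlacementTransitionPlan code): when a replica's ownership of a range is upgraded from TRANSIENT to FULL, the write placement must be promoted in an earlier step than the read placement, so no node serves FULL reads for a range it is not yet receiving FULL writes for:

{code:java}
enum Ownership { TRANSIENT, FULL }

final class RangePromotion
{
    Ownership writes = Ownership.TRANSIENT;
    Ownership reads = Ownership.TRANSIENT;

    // Epoch 1 in the fixed sequence: writes are promoted first.
    void enactStartLeave()
    {
        writes = Ownership.FULL;
    }

    // Epoch 2: reads may only be promoted once writes already are; the
    // pre-fix behaviour effectively performed this step first.
    void enactMidLeave()
    {
        if (writes != Ownership.FULL)
            throw new IllegalStateException("FULL reads before FULL writes");
        reads = Ownership.FULL;
    }
}
{code}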
[jira] [Commented] (CASSANDRA-19516) Use Transformation.Kind.id in local and distributed log tables
[ https://issues.apache.org/jira/browse/CASSANDRA-19516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17836280#comment-17836280 ] Sam Tunnicliffe commented on CASSANDRA-19516: - +1 > Use Transformation.Kind.id in local and distributed log tables > -- > > Key: CASSANDRA-19516 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19516 > Project: Cassandra > Issue Type: Improvement > Components: Transactional Cluster Metadata >Reporter: Marcus Eriksson >Assignee: Marcus Eriksson >Priority: Normal > Attachments: ci_summary.html > > Time Spent: 10m > Remaining Estimate: 0h > > We should store {{Kind.id}} added in CASSANDRA-19390 in the local and > distributed log tables. Virtual table will still do the id -> string lookup > for easier reading -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-19516) Use Transformation.Kind.id in local and distributed log tables
[ https://issues.apache.org/jira/browse/CASSANDRA-19516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sam Tunnicliffe updated CASSANDRA-19516: Status: Review In Progress (was: Patch Available) > Use Transformation.Kind.id in local and distributed log tables > -- > > Key: CASSANDRA-19516 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19516 > Project: Cassandra > Issue Type: Improvement > Components: Transactional Cluster Metadata >Reporter: Marcus Eriksson >Assignee: Marcus Eriksson >Priority: Normal > Attachments: ci_summary.html > > Time Spent: 10m > Remaining Estimate: 0h > > We should store {{Kind.id}} added in CASSANDRA-19390 in the local and > distributed log tables. Virtual table will still do the id -> string lookup > for easier reading -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-19482) Simplify metadata log implementation using custom partitioner
[ https://issues.apache.org/jira/browse/CASSANDRA-19482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sam Tunnicliffe updated CASSANDRA-19482: Source Control Link: https://github.com/apache/cassandra/commit/728b9ec4c604f6939facf62a261ca795ef6dbf0c Resolution: Fixed Status: Resolved (was: Ready to Commit) > Simplify metadata log implementation using custom partitioner > - > > Key: CASSANDRA-19482 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19482 > Project: Cassandra > Issue Type: Improvement > Components: Transactional Cluster Metadata >Reporter: Sam Tunnicliffe >Assignee: Sam Tunnicliffe >Priority: Normal > Fix For: 5.x > > Attachments: ci_summary.html, result_details.tar.gz > > Time Spent: 1.5h > Remaining Estimate: 0h > > The distributed metadata log table can be simplified by leveraging the fact > that replicas are all responsible for the entire token range. Given this > assumption, we can then use {{ReversedLongLocalPartitioner}} introduced in > CASSANDRA-19391 to make it much easier to append to/read from the tail of the > log, effectively removing the need for the {{Period}} construct. This will > also apply to the local metadata log used at startup. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
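A minimal sketch of the idea behind the partitioner-based simplification, assuming the partitioner simply reverses the natural ordering of long keys (the in-tree ReversedLongLocalPartitioner may differ in detail): mapping epoch e to token Long.MAX_VALUE - e makes the newest epoch sort first, so reading the tail of the log becomes an ordinary ascending scan with a LIMIT:

{code:java}
import java.util.Comparator;
import java.util.List;

final class ReversedLongTokens
{
    static long token(long epoch)
    {
        return Long.MAX_VALUE - epoch; // newest epoch -> smallest token
    }

    public static void main(String[] args)
    {
        List<Long> epochs = List.of(1L, 5L, 3L, 4L, 2L);
        // Scanning in token order visits epochs newest-first: 5, 4, 3, 2, 1
        epochs.stream()
              .sorted(Comparator.comparingLong(ReversedLongTokens::token))
              .forEach(e -> System.out.println("epoch " + e + " -> token " + token(e)));
    }
}
{code}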
[jira] [Updated] (CASSANDRA-19482) Simplify metadata log implementation using custom partitioner
[ https://issues.apache.org/jira/browse/CASSANDRA-19482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sam Tunnicliffe updated CASSANDRA-19482: Fix Version/s: 5.1-alpha1 (was: 5.x) > Simplify metadata log implementation using custom partitioner > - > > Key: CASSANDRA-19482 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19482 > Project: Cassandra > Issue Type: Improvement > Components: Transactional Cluster Metadata >Reporter: Sam Tunnicliffe >Assignee: Sam Tunnicliffe >Priority: Normal > Fix For: 5.1-alpha1 > > Attachments: ci_summary.html, result_details.tar.gz > > Time Spent: 1.5h > Remaining Estimate: 0h > > The distributed metadata log table can be simplified by leveraging the fact > that replicas are all responsible for the entire token range. Given this > assumption, we can then use {{ReversedLongLocalPartitioner}} introduced in > CASSANDRA-19391 to make it much easier to append to/read from the tail of the > log, effectively removing the need for the {{Period}} construct. This will > also apply to the local metadata log used at startup. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-19482) Simplify metadata log implementation using custom partitioner
[ https://issues.apache.org/jira/browse/CASSANDRA-19482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sam Tunnicliffe updated CASSANDRA-19482: Status: Ready to Commit (was: Review In Progress) +1 from Alex on the PR > Simplify metadata log implementation using custom partitioner > - > > Key: CASSANDRA-19482 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19482 > Project: Cassandra > Issue Type: Improvement > Components: Transactional Cluster Metadata >Reporter: Sam Tunnicliffe >Assignee: Sam Tunnicliffe >Priority: Normal > Fix For: 5.x > > Attachments: ci_summary.html, result_details.tar.gz > > Time Spent: 1.5h > Remaining Estimate: 0h > > The distributed metadata log table can be simplified by leveraging the fact > that replicas are all responsible for the entire token range. Given this > assumption, we can then use {{ReversedLongLocalPartitioner}} introduced in > CASSANDRA-19391 to make it much easier to append to/read from the tail of the > log, effectively removing the need for the {{Period}} construct. This will > also apply to the local metadata log used at startup. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-19482) Simplify metadata log implementation using custom partitioner
[ https://issues.apache.org/jira/browse/CASSANDRA-19482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sam Tunnicliffe updated CASSANDRA-19482: Status: Needs Committer (was: Patch Available) > Simplify metadata log implementation using custom partitioner > - > > Key: CASSANDRA-19482 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19482 > Project: Cassandra > Issue Type: Improvement > Components: Transactional Cluster Metadata >Reporter: Sam Tunnicliffe >Assignee: Sam Tunnicliffe >Priority: Normal > Fix For: 5.x > > Attachments: ci_summary.html, result_details.tar.gz > > Time Spent: 1.5h > Remaining Estimate: 0h > > The distributed metadata log table can be simplified by leveraging the fact > that replicas are all responsible for the entire token range. Given this > assumption, we can then use {{ReversedLongLocalPartitioner}} introduced in > CASSANDRA-19391 to make it much easier to append to/read from the tail of the > log, effectively removing the need for the {{Period}} construct. This will > also apply to the local metadata log used at startup. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-19482) Simplify metadata log implementation using custom partitioner
[ https://issues.apache.org/jira/browse/CASSANDRA-19482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sam Tunnicliffe updated CASSANDRA-19482: Reviewers: Alex Petrov Status: Review In Progress (was: Needs Committer) > Simplify metadata log implementation using custom partitioner > - > > Key: CASSANDRA-19482 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19482 > Project: Cassandra > Issue Type: Improvement > Components: Transactional Cluster Metadata >Reporter: Sam Tunnicliffe >Assignee: Sam Tunnicliffe >Priority: Normal > Fix For: 5.x > > Attachments: ci_summary.html, result_details.tar.gz > > Time Spent: 1.5h > Remaining Estimate: 0h > > The distributed metadata log table can be simplified by leveraging the fact > that replicas are all responsible for the entire token range. Given this > assumption, we can then use {{ReversedLongLocalPartitioner}} introduced in > CASSANDRA-19391 to make it much easier to append to/read from the tail of the > log, effectively removing the need for the {{Period}} construct. This will > also apply to the local metadata log used at startup. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-18954) Transformations should be pure so that replaying them results in the same outcome regardless of the node state or configuration
[ https://issues.apache.org/jira/browse/CASSANDRA-18954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17835805#comment-17835805 ] Sam Tunnicliffe commented on CASSANDRA-18954: - Yep, fair enough. I wouldn't expect you to have followed those things; my intention was really to give a heads-up that I don't think this is an issue anymore and that I'd probably close it soon. That was the thinking when I made the original comment, before CASSANDRA-12937 highlighted the issue with mutating local config between restarts. I'll leave this alone until we resolve that, which we're working towards now. > Transformations should be pure so that replaying them results in the same > outcome regardless of the node state or configuration > --- > > Key: CASSANDRA-18954 > URL: https://issues.apache.org/jira/browse/CASSANDRA-18954 > Project: Cassandra > Issue Type: Bug > Components: Transactional Cluster Metadata >Reporter: Jacek Lewandowski >Assignee: Jacek Lewandowski >Priority: Normal > > Discussed on Slack -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-19482) Simplify metadata log implementation using custom partitioner
[ https://issues.apache.org/jira/browse/CASSANDRA-19482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sam Tunnicliffe updated CASSANDRA-19482: Attachment: ci_summary.html result_details.tar.gz > Simplify metadata log implementation using custom partitioner > - > > Key: CASSANDRA-19482 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19482 > Project: Cassandra > Issue Type: Improvement > Components: Transactional Cluster Metadata >Reporter: Sam Tunnicliffe >Assignee: Sam Tunnicliffe >Priority: Normal > Fix For: 5.x > > Attachments: ci_summary.html, result_details.tar.gz > > Time Spent: 1.5h > Remaining Estimate: 0h > > The distributed metadata log table can be simplified by leveraging the fact > that replicas are all responsible for the entire token range. Given this > assumption, we can then use {{ReversedLongLocalPartitioner}} introduced in > CASSANDRA-19391 to make it much easier to append to/read from the tail of the > log, effectively removing the need for the {{Period}} construct. This will > also apply to the local metadata log used at startup. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-19482) Simplify metadata log implementation using custom partitioner
[ https://issues.apache.org/jira/browse/CASSANDRA-19482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sam Tunnicliffe updated CASSANDRA-19482: Attachment: (was: result_details.tar.gz) > Simplify metadata log implementation using custom partitioner > - > > Key: CASSANDRA-19482 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19482 > Project: Cassandra > Issue Type: Improvement > Components: Transactional Cluster Metadata >Reporter: Sam Tunnicliffe >Assignee: Sam Tunnicliffe >Priority: Normal > Fix For: 5.x > > Time Spent: 1.5h > Remaining Estimate: 0h > > The distributed metadata log table can be simplified by leveraging the fact > that replicas are all responsible for the entire token range. Given this > assumption, we can then use {{ReversedLongLocalPartitioner}} introduced in > CASSANDRA-19391 to make it much easier to append to/read from the tail of the > log, effectively removing the need for the {{Period}} construct. This will > also apply to the local metadata log used at startup. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-19482) Simplify metadata log implementation using custom partitioner
[ https://issues.apache.org/jira/browse/CASSANDRA-19482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sam Tunnicliffe updated CASSANDRA-19482: Attachment: (was: ci_summary.html) > Simplify metadata log implementation using custom partitioner > - > > Key: CASSANDRA-19482 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19482 > Project: Cassandra > Issue Type: Improvement > Components: Transactional Cluster Metadata >Reporter: Sam Tunnicliffe >Assignee: Sam Tunnicliffe >Priority: Normal > Fix For: 5.x > > Time Spent: 1.5h > Remaining Estimate: 0h > > The distributed metadata log table can be simplified by leveraging the fact > that replicas are all responsible for the entire token range. Given this > assumption, we can then use {{ReversedLongLocalPartitioner}} introduced in > CASSANDRA-19391 to make it much easier to append to/read from the tail of the > log, effectively removing the need for the {{Period}} construct. This will > also apply to the local metadata log used at startup. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-18954) Transformations should be pure so that replaying them results in the same outcome regardless of the node state or configuration
[ https://issues.apache.org/jira/browse/CASSANDRA-18954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17835304#comment-17835304 ] Sam Tunnicliffe commented on CASSANDRA-18954: - Actually, scratch that just for now - I think we need to address CASSANDRA-12937 first > Transformations should be pure so that replaying them results in the same > outcome regardless of the node state or configuration > --- > > Key: CASSANDRA-18954 > URL: https://issues.apache.org/jira/browse/CASSANDRA-18954 > Project: Cassandra > Issue Type: Bug > Components: Transactional Cluster Metadata >Reporter: Jacek Lewandowski >Assignee: Jacek Lewandowski >Priority: Normal > > Discussed on Slack -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-19514) When jvm-dtest is shutting down an instance TCM retries block the shutdown causing the test to fail
[ https://issues.apache.org/jira/browse/CASSANDRA-19514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sam Tunnicliffe updated CASSANDRA-19514: Resolution: (was: Fixed) Status: Open (was: Resolved) Reopening this as the problem is entirely relevant to trunk so we should apply the patch there too. > When jvm-dtest is shutting down an instance TCM retries block the shutdown > causing the test to fail > --- > > Key: CASSANDRA-19514 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19514 > Project: Cassandra > Issue Type: Bug > Components: Cluster/Membership, Test/dtest/java >Reporter: David Capwell >Assignee: David Capwell >Priority: Normal > Fix For: 5.1 > > Time Spent: 10m > Remaining Estimate: 0h > > org.apache.cassandra.distributed.test.log.RequestCurrentEpochTest#testRequestingPeerWatermarks > {code} > java.lang.RuntimeException: java.util.concurrent.TimeoutException >org.apache.cassandra.utils.Throwables.maybeFail(Throwables.java:79) > > org.apache.cassandra.utils.FBUtilities.waitOnFutures(FBUtilities.java:540) > > org.apache.cassandra.distributed.impl.AbstractCluster.close(AbstractCluster.java:1098) > > org.apache.cassandra.distributed.test.log.RequestCurrentEpochTest.testRequestingPeerWatermarks(RequestCurrentEpochTest.java:77) >java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native > Method) > > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > Caused by: java.util.concurrent.TimeoutException > > org.apache.cassandra.utils.concurrent.AbstractFuture.get(AbstractFuture.java:253) > > org.apache.cassandra.utils.FBUtilities.waitOnFutures(FBUtilities.java:532) > Suppressed: java.util.concurrent.TimeoutException > {code} > In debugger I found the blocked future and it was > src/java/org/apache/cassandra/tcm/EpochAwareDebounce.java waiting on > src/java/org/apache/cassandra/tcm/RemoteProcessor.java retries -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-18954) Transformations should be pure so that replaying them results in the same outcome regardless of the node state or configuration
[ https://issues.apache.org/jira/browse/CASSANDRA-18954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17833530#comment-17833530 ] Sam Tunnicliffe commented on CASSANDRA-18954: - [~jlewandowski] I think that this might be obsolete now, following a few changes that landed before {{cep-21-tcm}} was merged, plus CASSANDRA-19271 and CASSANDRA-19384. Replaying log entries during startup no longer enacts any side effects. Would you mind if I closed this? > Transformations should be pure so that replaying them results in the same > outcome regardless of the node state or configuration > --- > > Key: CASSANDRA-18954 > URL: https://issues.apache.org/jira/browse/CASSANDRA-18954 > Project: Cassandra > Issue Type: Bug > Components: Transactional Cluster Metadata >Reporter: Jacek Lewandowski >Assignee: Jacek Lewandowski >Priority: Normal > > Discussed on Slack -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-19271) Improve setup and initialisation of LocalLog/LogSpec
[ https://issues.apache.org/jira/browse/CASSANDRA-19271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sam Tunnicliffe updated CASSANDRA-19271: Epic Link: CASSANDRA-19055 > Improve setup and initialisation of LocalLog/LogSpec > > > Key: CASSANDRA-19271 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19271 > Project: Cassandra > Issue Type: Improvement > Components: Transactional Cluster Metadata >Reporter: Alex Petrov >Assignee: Alex Petrov >Priority: Urgent > Fix For: 5.1 > > Attachments: ci_summary.html, result_details.tar.gz > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Comment Edited] (CASSANDRA-13855) Implement Http Seed provider
[ https://issues.apache.org/jira/browse/CASSANDRA-13855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17829941#comment-17829941 ] Sam Tunnicliffe edited comment on CASSANDRA-13855 at 3/22/24 5:27 PM: -- Not completely relevant to the discussion here, but slightly adjacent... the roles of both snitches and seeds have changed slightly in trunk. See CASSANDRA-19488, which I finally remembered to file, for more detail on snitches. Seeds are now really only necessary as initial contact points for joining nodes. Technically, they do still perform the same function regarding gossip convergence, but that is way less important/relevant now as we don't rely on gossip state for correctness. Of course, this doesn't make the goal of this JIRA any less valid; I just thought I should mention it. was (Author: beobal): Not completely relevant to the discussion here, but slightly adjacent... the roles of both snitches and seeds have changed slightly in trunk. See CASSANDRA-XXX, which I finally remembered to file, for more detail on snitches. Seeds are now really only necessary as initial contact points for joining nodes. Technically, they do still perform the same function regarding gossip convergence, but that is way less important/relevant now as we don't rely on gossip state for correctness. Of course, this doesn't make the goal of this JIRA any less valid; I just thought I should mention it. > Implement Http Seed provider > > > Key: CASSANDRA-13855 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13855 > Project: Cassandra > Issue Type: Improvement > Components: Legacy/Coordination, Legacy/Core >Reporter: Jon Haddad >Assignee: Claude Warren >Priority: Low > Labels: lhf > Fix For: 5.x > > Attachments: 0001-Add-URL-Seed-Provider-trunk.txt, signature.asc, > signature.asc, signature.asc > > Time Spent: 0.5h > Remaining Estimate: 0h > > Seems like including a dead simple seed provider that can fetch from a URL, 1 > line per seed, would be useful. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13855) Implement Http Seed provider
[ https://issues.apache.org/jira/browse/CASSANDRA-13855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17829941#comment-17829941 ] Sam Tunnicliffe commented on CASSANDRA-13855: - Not completely relevant to the discussion here, but slightly adjacent... the roles of both snitches and seeds have changed slightly in trunk. See CASSANDRA-XXX, which I finally remembered to file, for more detail on snitches. Seeds are now really only necessary as initial contact points for joining nodes. Technically, they do still perform the same function regarding gossip convergence, but that is far less important/relevant now, as we don't rely on gossip state for correctness. Of course, this doesn't make the goal of this JIRA any less valid; I just thought I should mention it. > Implement Http Seed provider > > > Key: CASSANDRA-13855 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13855 > Project: Cassandra > Issue Type: Improvement > Components: Legacy/Coordination, Legacy/Core >Reporter: Jon Haddad >Assignee: Claude Warren >Priority: Low > Labels: lhf > Fix For: 5.x > > Attachments: 0001-Add-URL-Seed-Provider-trunk.txt, signature.asc, > signature.asc, signature.asc > > Time Spent: 0.5h > Remaining Estimate: 0h > > Seems like including a dead-simple seed provider that can fetch from a URL, 1 > line per seed, would be useful. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
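Given the ticket's aim of a dead-simple provider that fetches one seed per line from a URL, a rough sketch of that idea follows. It is illustrative only: the {{SeedProvider}} interface shown is a simplified stand-in rather than Cassandra's real plugin API, the example URL is hypothetical, and the error handling is deliberately naive.

{code:java}
// Illustrative sketch of a URL-backed seed provider; the SeedProvider
// interface here is a simplified stand-in, not Cassandra's real API.
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.InetAddress;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

interface SeedProvider
{
    List<InetAddress> getSeeds();
}

final class HttpSeedProvider implements SeedProvider
{
    private final URL url; // e.g. http://config.internal/seeds.txt (hypothetical)

    HttpSeedProvider(URL url)
    {
        this.url = url;
    }

    @Override
    public List<InetAddress> getSeeds()
    {
        List<InetAddress> seeds = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(
                 new InputStreamReader(url.openStream(), StandardCharsets.UTF_8)))
        {
            String line;
            while ((line = in.readLine()) != null)
            {
                line = line.trim();
                if (line.isEmpty() || line.startsWith("#"))
                    continue; // skip blank lines and comments
                seeds.add(InetAddress.getByName(line)); // one seed per line
            }
        }
        catch (IOException e)
        {
            // Return whatever was read so far; a real implementation would
            // need a retry/fallback policy rather than silently degrading.
        }
        return seeds;
    }
}
{code}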
[jira] [Updated] (CASSANDRA-19488) Ensure snitches always defer to ClusterMetadata
[ https://issues.apache.org/jira/browse/CASSANDRA-19488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sam Tunnicliffe updated CASSANDRA-19488: Change Category: Operability Complexity: Normal Fix Version/s: 5.x Status: Open (was: Triage Needed) > Ensure snitches always defer to ClusterMetadata > --- > > Key: CASSANDRA-19488 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19488 > Project: Cassandra > Issue Type: Improvement > Components: Cluster/Membership, Messaging/Internode, Transactional > Cluster Metadata >Reporter: Sam Tunnicliffe >Assignee: Sam Tunnicliffe >Priority: Normal > Fix For: 5.x > > > Internally, C* always uses {{ClusterMetadata}} as the source of topology > information when calculating data placements, replica plans, etc., and as such > the role of the snitch has been somewhat reduced. > Sorting and comparison functions as provided by specialisations like > {{DynamicEndpointSnitch}} are still used, but the snitch should only be > responsible for providing the DC and rack for a new node when it first joins > a cluster. > Aside from initial startup and registration, snitch implementations should > always defer to {{ClusterMetadata}} for DC and rack, otherwise there is a > risk that the snitch config drifts out of sync with TCM and output from tools > like {{nodetool ring}} and {{gossipinfo}} becomes incorrect. > A complication is that topology is used when opening connections to peers, as > certain internode connection settings are variable at the DC level, so at the > time of connecting we want to check the location of the remote peer. Usually, > this is available from {{ClusterMetadata}}, but in the case of a brand > new node joining the cluster nothing is known a priori. The current > implementation assumes that the snitch will know the location of the new node > ahead of time, but in practice this is often not the case (though with > variants of {{PropertyFileSnitch}} it _should_ be), and the remote node is > temporarily assigned a default DC. This is problematic as it can cause the > internode connection settings which depend on DC to be incorrectly set. > Internode connections are long-lived, and any established while the DC is > unknown (potentially with incorrect config) will persist indefinitely. This > particular issue is not directly related to TCM and is present in earlier > versions. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
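As a rough illustration of the "defer to ClusterMetadata" idea in the description above, the sketch below resolves DC and rack from cluster metadata whenever the peer is registered, and falls back to locally configured values only for the node's own initial registration. All types here are invented, simplified stand-ins, not the actual snitch or TCM interfaces.

{code:java}
// Sketch: resolve DC/rack from cluster metadata first, using local snitch
// config only for a not-yet-registered self. Invented stand-in types.
import java.net.InetSocketAddress;
import java.util.Optional;

record Location(String datacenter, String rack) {}

interface ClusterMetadataView
{
    // The registered location of a peer, if known.
    Optional<Location> locationOf(InetSocketAddress peer);
}

final class MetadataFirstSnitch
{
    private final ClusterMetadataView metadata;
    private final Location localConfig; // e.g. from cassandra-rackdc.properties

    MetadataFirstSnitch(ClusterMetadataView metadata, Location localConfig)
    {
        this.metadata = metadata;
        this.localConfig = localConfig;
    }

    Location locationOf(InetSocketAddress peer, boolean isSelf)
    {
        // For registered peers, cluster metadata is authoritative, so snitch
        // config cannot drift out of sync with TCM in the output of tools
        // like nodetool ring or gossipinfo.
        Optional<Location> registered = metadata.locationOf(peer);
        if (registered.isPresent())
            return registered.get();

        // Only the local node, before its own registration, may use the
        // snitch-provided values; an unknown remote peer has no reliable
        // answer yet (the internode-connection problem in the description).
        return isSelf ? localConfig : new Location("UNKNOWN_DC", "UNKNOWN_RACK");
    }
}
{code}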
[jira] [Created] (CASSANDRA-19488) Ensure snitches always defer to ClusterMetadata
Sam Tunnicliffe created CASSANDRA-19488: --- Summary: Ensure snitches always defer to ClusterMetadata Key: CASSANDRA-19488 URL: https://issues.apache.org/jira/browse/CASSANDRA-19488 Project: Cassandra Issue Type: Improvement Components: Cluster/Membership, Messaging/Internode, Transactional Cluster Metadata Reporter: Sam Tunnicliffe Assignee: Sam Tunnicliffe Internally, C* always uses {{ClusterMetadata}} as the source of topology information when calculating data placements, replica plans, etc., and as such the role of the snitch has been somewhat reduced. Sorting and comparison functions as provided by specialisations like {{DynamicEndpointSnitch}} are still used, but the snitch should only be responsible for providing the DC and rack for a new node when it first joins a cluster. Aside from initial startup and registration, snitch implementations should always defer to {{ClusterMetadata}} for DC and rack, otherwise there is a risk that the snitch config drifts out of sync with TCM and output from tools like {{nodetool ring}} and {{gossipinfo}} becomes incorrect. A complication is that topology is used when opening connections to peers, as certain internode connection settings are variable at the DC level, so at the time of connecting we want to check the location of the remote peer. Usually, this is available from {{ClusterMetadata}}, but in the case of a brand new node joining the cluster nothing is known a priori. The current implementation assumes that the snitch will know the location of the new node ahead of time, but in practice this is often not the case (though with variants of {{PropertyFileSnitch}} it _should_ be), and the remote node is temporarily assigned a default DC. This is problematic as it can cause the internode connection settings which depend on DC to be incorrectly set. Internode connections are long-lived, and any established while the DC is unknown (potentially with incorrect config) will persist indefinitely. This particular issue is not directly related to TCM and is present in earlier versions. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-19130) Implement transactional table truncation
[ https://issues.apache.org/jira/browse/CASSANDRA-19130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17829898#comment-17829898 ] Sam Tunnicliffe commented on CASSANDRA-19130: - The way truncation works is that it writes a timestamp into a system table on each node, associated with the table being truncated (and a commitlog position). Then, when local reads and writes are done against that table, any cells with a timestamp earlier than the truncation are essentially discarded. If any node misses that message and so doesn't write the timestamp, it won't do this filtering and so data can be resurrected. This is a strictly one-time operation and there's no way for a node which misses such a message to catch up later, which is why {{TRUNCATE}} currently requires all nodes to be up. With TCM, we can improve this by having an entry in the log which contains the truncation timestamp. Then it can be distributed to peers the same way as any other log entry, allowing them to catch up if they miss it. Replicas and coordinators participating in a read already check that they're all up to date with each other and attempt to catch up if not. We shouldn't have to change how truncation works on the local level, just have {{TruncateStatement}} work by committing a new transform to the CMS. The trickiest bit will be to make sure that the {{execute}} method itself is side-effect free (i.e. it only produces a new ClusterMetadata). The way to do that is with a {{ChangeListener}} which implements a post-commit event to do the work of {{CFS::truncateBlocking}}. > Implement transactional table truncation > > > Key: CASSANDRA-19130 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19130 > Project: Cassandra > Issue Type: New Feature > Components: Consistency/Coordination >Reporter: Marcus Eriksson >Assignee: Stefan Miklosovic >Priority: Normal > Fix For: 5.x > > Time Spent: 10m > Remaining Estimate: 0h > > TRUNCATE table should leverage cluster metadata to ensure consistent > truncation timestamps across all replicas. The current implementation depends > on all nodes being available, but this could be reimplemented as a > {{Transformation}}. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
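A rough sketch of the shape this could take follows. The types below ({{Transformation}}, {{ClusterMetadata}}, {{ChangeListener}}) are simplified, invented stand-ins rather than the actual TCM interfaces; the point is the split between a side-effect-free {{execute}} that only records the truncation timestamp in metadata, and a post-commit listener that does the local work.

{code:java}
// Sketch: truncation as a pure metadata transformation plus a post-commit
// listener for local side effects. Simplified stand-in types only.
import java.util.HashMap;
import java.util.Map;

final class ClusterMetadata
{
    final long epoch;
    final Map<String, Long> truncationTimes; // table -> truncated-at timestamp

    ClusterMetadata(long epoch, Map<String, Long> truncationTimes)
    {
        this.epoch = epoch;
        this.truncationTimes = Map.copyOf(truncationTimes);
    }
}

interface Transformation
{
    ClusterMetadata execute(ClusterMetadata prev); // must be side-effect free
}

interface ChangeListener
{
    void notifyPostCommit(ClusterMetadata prev, ClusterMetadata next);
}

final class TruncateTable implements Transformation
{
    private final String table;
    private final long truncatedAt;

    TruncateTable(String table, long truncatedAt)
    {
        this.table = table;
        this.truncatedAt = truncatedAt;
    }

    @Override
    public ClusterMetadata execute(ClusterMetadata prev)
    {
        // Only produces new metadata; replaying this entry on any node, at
        // any time, yields the same result, so peers that missed the original
        // message catch up via ordinary log replay.
        Map<String, Long> times = new HashMap<>(prev.truncationTimes);
        times.merge(table, truncatedAt, Math::max);
        return new ClusterMetadata(prev.epoch + 1, times);
    }
}

final class TruncationApplier implements ChangeListener
{
    @Override
    public void notifyPostCommit(ClusterMetadata prev, ClusterMetadata next)
    {
        // The side-effecting local work (what CFS::truncateBlocking covers:
        // recording the timestamp and commitlog position, discarding data)
        // happens here, strictly after the entry has committed.
        next.truncationTimes.forEach((table, ts) -> {
            Long before = prev.truncationTimes.get(table);
            if (before == null || before < ts)
                truncateLocally(table, ts);
        });
    }

    private void truncateLocally(String table, long ts)
    {
        System.out.println("truncating " + table + " at " + ts);
    }
}
{code}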
[jira] [Updated] (CASSANDRA-19255) StorageService.getRangeToEndpointMap() MBean operation is running into NPE for LocalStrategy keyspaces
[ https://issues.apache.org/jira/browse/CASSANDRA-19255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sam Tunnicliffe updated CASSANDRA-19255: Attachment: 19482_patch_for_19255.diff > StorageService.getRangeToEndpointMap() MBean operation is running into NPE > for LocalStrategy keyspaces > -- > > Key: CASSANDRA-19255 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19255 > Project: Cassandra > Issue Type: Bug > Components: Cluster/Membership >Reporter: n.v.harikrishna >Assignee: n.v.harikrishna >Priority: Normal > Fix For: 5.x > > Attachments: 19482_patch_for_19255.diff, ci_summary.html, > result_details.tar.gz > > Time Spent: 10m > Remaining Estimate: 0h > > When the StorageService MBean operation getRangeToEndpointMap is called for > LocalStrategy keyspaces, it runs into an NPE. It worked in earlier major > versions, but fails in trunk. It can be reproduced locally using JConsole > or a tool like `jmxterm` (unfortunately these tools do not give the full > stacktrace). The same behavior is observed with the > getRangeToEndpointWithPortMap operation. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-19255) StorageService.getRangeToEndpointMap() MBean operation is running into NPE for LocalStrategy keyspaces
[ https://issues.apache.org/jira/browse/CASSANDRA-19255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sam Tunnicliffe updated CASSANDRA-19255: Status: Needs Committer (was: Patch Available) > StorageService.getRangeToEndpointMap() MBean operation is running into NPE > for LocalStrategy keyspaces > -- > > Key: CASSANDRA-19255 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19255 > Project: Cassandra > Issue Type: Bug > Components: Cluster/Membership >Reporter: n.v.harikrishna >Assignee: n.v.harikrishna >Priority: Normal > Attachments: ci_summary.html, result_details.tar.gz > > Time Spent: 10m > Remaining Estimate: 0h > > When the StorageService MBean operation getRangeToEndpointMap is called for > LocalStrategy keyspaces, it runs into an NPE. It worked in earlier major > versions, but fails in trunk. It can be reproduced locally using JConsole > or a tool like `jmxterm` (unfortunately these tools do not give the full > stacktrace). The same behavior is observed with the > getRangeToEndpointWithPortMap operation. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-19255) StorageService.getRangeToEndpointMap() MBean operation is running into NPE for LocalStrategy keyspaces
[ https://issues.apache.org/jira/browse/CASSANDRA-19255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sam Tunnicliffe updated CASSANDRA-19255: Fix Version/s: 5.x Since Version: 5.x Source Control Link: https://github.com/apache/cassandra/commit/a69c8657d75de627fb1fe518bfe1d657add11740 Resolution: Fixed Status: Resolved (was: Ready to Commit) LGTM too, committed as {{a69c8657d75de627fb1fe518bfe1d657add11740}} (with a couple of extremely minor changes). One thing to note is that this will need slight readjustment if CASSANDRA-19482 is committed, as it changes the way we handle the {{MetaStrategy}} keyspace. The attached diff will fix that if/when necessary. > StorageService.getRangeToEndpointMap() MBean operation is running into NPE > for LocalStrategy keyspaces > -- > > Key: CASSANDRA-19255 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19255 > Project: Cassandra > Issue Type: Bug > Components: Cluster/Membership >Reporter: n.v.harikrishna >Assignee: n.v.harikrishna >Priority: Normal > Fix For: 5.x > > Attachments: ci_summary.html, result_details.tar.gz > > Time Spent: 10m > Remaining Estimate: 0h > > When the StorageService MBean operation getRangeToEndpointMap is called for > LocalStrategy keyspaces, it runs into an NPE. It worked in earlier major > versions, but fails in trunk. It can be reproduced locally using JConsole > or a tool like `jmxterm` (unfortunately these tools do not give the full > stacktrace). The same behavior is observed with the > getRangeToEndpointWithPortMap operation. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
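The thread doesn't spell out the committed fix, but the general shape of guarding a range-to-endpoint computation against keyspaces that have no distributed placement looks roughly like the sketch below. Every type and name here is a simplified, invented stand-in for illustration, not the actual patch.

{code:java}
// Sketch: skip keyspaces without distributed placements instead of
// dereferencing a missing placement map (the NPE from the report).
// Invented stand-in types, not the committed patch.
import java.util.Collections;
import java.util.List;
import java.util.Map;

final class RangeToEndpointMapper
{
    // keyspace -> (token range -> endpoints); locally-replicated keyspaces
    // have no entry here, which is what used to trigger the NPE.
    private final Map<String, Map<String, List<String>>> placements;

    RangeToEndpointMapper(Map<String, Map<String, List<String>>> placements)
    {
        this.placements = placements;
    }

    Map<String, List<String>> rangeToEndpointMap(String keyspace)
    {
        Map<String, List<String>> placement = placements.get(keyspace);
        if (placement == null) // LocalStrategy and similar: nothing to report
            return Collections.emptyMap();
        return placement;
    }
}
{code}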
[jira] [Updated] (CASSANDRA-19255) StorageService.getRangeToEndpointMap() MBean operation is running into NPE for LocalStrategy keyspaces
[ https://issues.apache.org/jira/browse/CASSANDRA-19255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sam Tunnicliffe updated CASSANDRA-19255: Status: Ready to Commit (was: Review In Progress) > StorageService.getRangeToEndpointMap() MBean operation is running into NPE > for LocalStrategy keyspaces > -- > > Key: CASSANDRA-19255 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19255 > Project: Cassandra > Issue Type: Bug > Components: Cluster/Membership >Reporter: n.v.harikrishna >Assignee: n.v.harikrishna >Priority: Normal > Attachments: ci_summary.html, result_details.tar.gz > > Time Spent: 10m > Remaining Estimate: 0h > > When the StorageService MBean operation getRangeToEndpointMap is called for > LocalStrategy keyspaces, it runs into an NPE. It worked in earlier major > versions, but fails in trunk. It can be reproduced locally using JConsole > or a tool like `jmxterm` (unfortunately these tools do not give the full > stacktrace). The same behavior is observed with the > getRangeToEndpointWithPortMap operation. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-19255) StorageService.getRangeToEndpointMap() MBean operation is running into NPE for LocalStrategy keyspaces
[ https://issues.apache.org/jira/browse/CASSANDRA-19255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sam Tunnicliffe updated CASSANDRA-19255: Status: Review In Progress (was: Needs Committer) > StorageService.getRangeToEndpointMap() MBean operation is running into NPE > for LocalStrategy keyspaces > -- > > Key: CASSANDRA-19255 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19255 > Project: Cassandra > Issue Type: Bug > Components: Cluster/Membership >Reporter: n.v.harikrishna >Assignee: n.v.harikrishna >Priority: Normal > Attachments: ci_summary.html, result_details.tar.gz > > Time Spent: 10m > Remaining Estimate: 0h > > When the StorageService MBean operation getRangeToEndpointMap is called for > LocalStrategy keyspaces, it runs into an NPE. It worked in earlier major > versions, but fails in trunk. It can be reproduced locally using JConsole > or a tool like `jmxterm` (unfortunately these tools do not give the full > stacktrace). The same behavior is observed with the > getRangeToEndpointWithPortMap operation. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-19344) Range movements involving transient replicas must safely enact changes to read and write replica sets
[ https://issues.apache.org/jira/browse/CASSANDRA-19344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17829794#comment-17829794 ] Sam Tunnicliffe commented on CASSANDRA-19344: - The linked PR modifies a number of existing tests to make the original failure mode deterministic. It also adds support for transient replication to {{PlacementSimulator}}, {{MetadataChangeSimulationTest}} and the associated {{TokenPlacementModel}}. Finally, it modifies the way the {{PlacementTransitionPlan}} is prepared for operations involving range movements to ensure that any transition from a transient to a full replica happens safely. > Range movements involving transient replicas must safely enact changes to > read and write replica sets > - > > Key: CASSANDRA-19344 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19344 > Project: Cassandra > Issue Type: Bug > Components: CI >Reporter: Ekaterina Dimitrova >Assignee: Sam Tunnicliffe >Priority: Normal > Fix For: 5.x > > Attachments: ci_summary.html, result_details.tar.gz > > Time Spent: 10m > Remaining Estimate: 0h > > (edit) This was originally opened due to a flaky test > {{org.apache.cassandra.distributed.test.TransientRangeMovementTest.testRemoveNode-_jdk17}} > The test can fail in two different ways: > {code:java} > junit.framework.AssertionFailedError: NOT IN CURRENT: 31 -- [(00,20), > (31,50)] at > org.apache.cassandra.distributed.test.TransientRangeMovementTest.assertAllContained(TransientRangeMovementTest.java:203) > at > org.apache.cassandra.distributed.test.TransientRangeMovementTest.testRemoveNode(TransientRangeMovementTest.java:183) > at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native > Method) at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77) > at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43){code} > as in here - > [https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra/2639/workflows/32b92ce7-5e9d-4efb-8362-d200d2414597/jobs/55139/tests#failed-test-0] > and > {code:java} > junit.framework.AssertionFailedError: nodetool command [removenode, > 6d194555-f6eb-41d0-c000-0003, --force] was not successful stdout: > stderr: error: Node /127.0.0.4:7012 is alive and owns this ID. Use > decommission command to remove it from the ring -- StackTrace -- > java.lang.UnsupportedOperationException: Node /127.0.0.4:7012 is alive and > owns this ID.
Use decommission command to remove it from the ring at > org.apache.cassandra.tcm.sequences.SingleNodeSequences.removeNode(SingleNodeSequences.java:110) > at > org.apache.cassandra.service.StorageService.removeNode(StorageService.java:3682) > at org.apache.cassandra.tools.NodeProbe.removeNode(NodeProbe.java:1020) at > org.apache.cassandra.tools.nodetool.RemoveNode.execute(RemoveNode.java:51) at > org.apache.cassandra.tools.NodeTool$NodeToolCmd.runInternal(NodeTool.java:388) > at org.apache.cassandra.tools.NodeTool$NodeToolCmd.run(NodeTool.java:373) at > org.apache.cassandra.tools.NodeTool.execute(NodeTool.java:272) at > org.apache.cassandra.distributed.impl.Instance$DTestNodeTool.execute(Instance.java:1129) > at > org.apache.cassandra.distributed.impl.Instance.lambda$nodetoolResult$51(Instance.java:1038) > at org.apache.cassandra.concurrent.FutureTask.call(FutureTask.java:61) at > org.apache.cassandra.concurrent.FutureTask.run(FutureTask.java:71) at > java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) > at > java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) > at > io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) > at java.base/java.lang.Thread.run(Thread.java:833) Notifications: Error: > java.lang.UnsupportedOperationException: Node /127.0.0.4:7012 is alive and > owns this ID. Use decommission command to remove it from the ring at > org.apache.cassandra.tcm.sequences.SingleNodeSequences.removeNode(SingleNodeSequences.java:110) > at > org.apache.cassandra.service.StorageService.removeNode(StorageService.java:3682) > at org.apache.cassandra.tools.NodeProbe.removeNode(NodeProbe.java:1020) at > org.apache.cassandra.tools.nodetool.RemoveNode.execute(RemoveNode.java:51) at > org.apache.cassandra.tools.NodeTool$NodeToolCmd.runInternal(NodeTool.java:388) > at org.apache.cassandra.tools.NodeTool$NodeToolCmd.run(NodeTool.java:373) at > org.apache.cassandra.tools.NodeTool.execute(NodeTool.java:272) at > org.apache.cassandra.distributed.impl.Instance$DTestNodeTool.execute(Instance.java:1129) > at >
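For context on what "safely enact" means here: when a range movement promotes a transient replica to a full one, the node must start receiving writes (and be streamed the existing data) before it is ever added to the read replica set, otherwise reads can hit a replica that is still missing data. The toy sketch below, with invented types, only illustrates that ordering constraint; it is not the actual {{PlacementTransitionPlan}} logic.

{code:java}
// Toy sketch of the safe ordering when promoting a transient replica to a
// full one: writes first, then stream, then reads. Invented types only.
import java.util.List;

enum Step { ADD_TO_WRITE_SET, STREAM_EXISTING_DATA, ADD_TO_READ_SET }

final class TransientToFullTransition
{
    // A transient replica holds little or no data for the range, so it must
    // not serve reads until writes plus streaming have caught it up.
    static List<Step> plan()
    {
        return List.of(Step.ADD_TO_WRITE_SET,     // take new writes immediately
                       Step.STREAM_EXISTING_DATA, // backfill pre-existing data
                       Step.ADD_TO_READ_SET);     // only now safe to read from
    }
}
{code}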