[jira] [Updated] (CASSANDRA-18120) Single slow node dramatically reduces cluster logged batch write throughput regardless of CL
[ https://issues.apache.org/jira/browse/CASSANDRA-18120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Semb Wever updated CASSANDRA-18120:
-------------------------------------------
    Change Category: Operability
         Complexity: Normal
        Component/s: Consistency/Coordination
      Fix Version/s: 4.0.x
                     4.1.x
                     5.0.x
                     5.x
          Reviewers: Michael Semb Wever
             Status: Open  (was: Triage Needed)

> Single slow node dramatically reduces cluster logged batch write throughput regardless of CL
> --------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-18120
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-18120
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Consistency/Coordination
>            Reporter: Dan Sarisky
>            Assignee: Maxim Chanturiay
>            Priority: Normal
>             Fix For: 4.0.x, 4.1.x, 5.0.x, 5.x
>
> We issue writes to Cassandra as logged batches (RF=3, consistency levels TWO, QUORUM, or LOCAL_QUORUM).
>
> On clusters of any size, a single extremely slow node causes a ~90% loss of cluster-wide throughput using batched writes. We can replicate this in the lab via CPU or disk throttling. I observe this in 3.11, 4.0, and 4.1.
>
> The mechanism in play appears to be: logged batches are immediately written to two replica nodes, and the actual mutations aren't processed until those two nodes acknowledge the batch statements. Those replica nodes are selected randomly from all nodes in the local data center currently up in gossip. If a single node is slow, but still thought to be up in gossip, this eventually causes every other node to have all of its MutationStages waiting while the slow replica accepts batch writes.
>
> The code in play appears to be
> [https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/locator/ReplicaPlans.java#L245].
> In the method filterBatchlogEndpoints() there is a Collections.shuffle() to order the endpoints and a FailureDetector.isEndpointAlive() to test whether an endpoint is acceptable.
>
> This behavior causes Cassandra to move from a multi-node fault-tolerant system to a collection of single points of failure.
>
> We try to take administrator actions to kill off the extremely slow nodes, but it would be great to have some notion of "what node is a bad choice" when writing logged batches to replica nodes.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org
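The failure mode described above can be seen with a small simulation. This is an illustrative sketch only, not Cassandra's actual code: it mimics what filterBatchlogEndpoints() effectively does (shuffle the gossip-"up" endpoints and take the first two), and measures how often a single slow-but-"alive" node ends up on the batchlog write path.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class BatchlogSelectionSketch
{
    // Pick two batchlog endpoints uniformly at random from the nodes
    // currently considered alive (gossip "up" is assumed here).
    static List<Integer> pickTwo(List<Integer> liveEndpoints, Random rng)
    {
        List<Integer> shuffled = new ArrayList<>(liveEndpoints);
        Collections.shuffle(shuffled, rng);
        return shuffled.subList(0, 2);
    }

    // Fraction of batches whose batchlog write lands on the given node.
    static double hitRate(int clusterSize, int slowNode, int batches, long seed)
    {
        Random rng = new Random(seed);
        List<Integer> nodes = new ArrayList<>();
        for (int i = 0; i < clusterSize; i++)
            nodes.add(i);

        int hits = 0;
        for (int b = 0; b < batches; b++)
            if (pickTwo(nodes, rng).contains(slowNode))
                hits++;
        return (double) hits / batches;
    }

    public static void main(String[] args)
    {
        // On a 10-node cluster, each batch has a 2/10 chance of touching any
        // given node, so roughly 20% of all batches block on the slow replica,
        // regardless of the client's consistency level.
        System.out.printf("hit rate: %.3f%n", hitRate(10, 0, 100_000, 42L));
    }
}
```

Because every coordinator makes this random pick independently, the slow node keeps being selected for a constant fraction of batches cluster-wide, which is consistent with the eventual exhaustion of MutationStage threads reported above.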
[jira] [Comment Edited] (CASSANDRA-18120) Single slow node dramatically reduces cluster logged batch write throughput regardless of CL
[ https://issues.apache.org/jira/browse/CASSANDRA-18120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17849074#comment-17849074 ]

Michael Semb Wever edited comment on CASSANDRA-18120 at 5/23/24 6:39 PM:
-------------------------------------------------------------------------
[~shunsaker], do you want to share the patch you're willing to upstream? That patch would have had a lot of production exposure already, so it would be my preference.

[~maximc], are you ok if we focus on Shayne's patch? I know you've done a lot of work already, and it sucks when you've completed a patch and it was the first patch offered. Given your expertise now, and so that work doesn't go to waste, it would be very valuable to have you as a reviewer (and tester).

bq. Michael Semb Wever and Maxim Chanturiay provide strong arguments against Dynamic snitch.

This is not related to logged batch writes, and today the dynamic snitch does nothing for it anyway. The advice to disable the dynamic snitch has been a long-standing recommendation from The Last Pickle, aimed at competent Cassandra operators that have healthy and performant clusters, and solid enough monitoring and alerting in place to otherwise detect and deal with a slow node. The dynamic snitch comes with its own overhead, and on healthy performant clusters can't keep up, so it offers very little value and hurts latencies. (Don't look past those caveats though!)

If you have a problem with slow nodes, and don't have a way to deal with it, then the dynamic snitch is a good option, and adding the same ability to the batchlog makes sense.
[jira] [Updated] (CASSANDRA-18120) Single slow node dramatically reduces cluster logged batch write throughput regardless of CL
[ https://issues.apache.org/jira/browse/CASSANDRA-18120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Semb Wever updated CASSANDRA-18120:
-------------------------------------------
    Summary: Single slow node dramatically reduces cluster logged batch write throughput regardless of CL  (was: Single slow node dramatically reduces cluster write throughput regardless of CL)
[jira] [Commented] (CASSANDRA-18120) Single slow node dramatically reduces cluster write throughput regardless of CL
[ https://issues.apache.org/jira/browse/CASSANDRA-18120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17849074#comment-17849074 ]

Michael Semb Wever commented on CASSANDRA-18120:
------------------------------------------------
[~shunsaker], do you want to share the patch you're willing to upstream? This patch has had a lot of production exposure already, so it has my preference.

[~maximc], are you ok if we focus on Shayne's patch? I know you've done a lot of work already, and it sucks when you've completed a patch and it was the first patch offered. Given your expertise now, and so that work doesn't go to waste, it would be very valuable to have you as a reviewer (and tester).

bq. Michael Semb Wever and Maxim Chanturiay provide strong arguments against Dynamic snitch.

This is not related to logged batch writes, and today the dynamic snitch does nothing for it anyway. The advice to disable the dynamic snitch has been a long-standing recommendation from The Last Pickle, aimed at competent Cassandra operators that have healthy and performant clusters, and solid enough monitoring and alerting in place to otherwise detect and deal with a slow node. The dynamic snitch comes with its own overhead, and on healthy performant clusters can't keep up, so it offers very little value. (Don't look past those caveats though!)

If you have a problem with slow nodes, and don't have a way to deal with it, then the dynamic snitch is a good option, and adding the same ability to the batchlog adds to that.
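As a rough illustration of "adding the same ability to the batchlog": a dynamic-snitch-like approach would rank endpoints by a smoothed latency score and drop outliers before the random pick, instead of shuffling all gossip-"up" nodes indiscriminately. This is a hedged sketch under assumed names (the score map, the ratio threshold), not the dynamic snitch's real algorithm or any proposed patch.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

public class ScoredBatchlogFilter
{
    // Keep only endpoints whose latency score is within `ratio` of the best
    // score; the surviving set would then be shuffled as today. Lower score
    // means faster. Both the score source and the threshold are hypothetical.
    static List<String> filterByScore(Map<String, Double> latencyScores, double ratio)
    {
        double best = latencyScores.values().stream()
                                   .min(Double::compare)
                                   .orElse(0.0);
        List<String> kept = new ArrayList<>();
        for (Map.Entry<String, Double> e : latencyScores.entrySet())
            if (e.getValue() <= best * ratio)
                kept.add(e.getKey());
        kept.sort(Comparator.naturalOrder()); // deterministic order for display
        return kept;
    }

    public static void main(String[] args)
    {
        // node-c is an order of magnitude slower; with ratio=3 it is excluded
        // from batchlog candidacy while the healthy nodes remain eligible.
        Map<String, Double> scores = Map.of("node-a", 1.0, "node-b", 1.4, "node-c", 12.0);
        System.out.println(filterByScore(scores, 3.0));
    }
}
```

The caveats quoted above still apply: maintaining such scores has its own overhead, which is exactly the trade-off the dynamic snitch discussion is about.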
[jira] [Updated] (CASSANDRA-12937) Default setting (yaml) for SSTable compression
[ https://issues.apache.org/jira/browse/CASSANDRA-12937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stefan Miklosovic updated CASSANDRA-12937:
------------------------------------------
    Status: In Progress  (was: Patch Available)

> Default setting (yaml) for SSTable compression
> ----------------------------------------------
>
>                 Key: CASSANDRA-12937
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12937
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Local/Config
>            Reporter: Michael Semb Wever
>            Assignee: Stefan Miklosovic
>            Priority: Low
>              Labels: AdventCalendar2021
>             Fix For: 5.x
>
>          Time Spent: 8h 20m
>  Remaining Estimate: 0h
>
> In many situations the choice of compression for sstables is more relevant to the disks attached than to the schema and data.
> This issue is to add to cassandra.yaml a default value for sstable compression that new tables will inherit (instead of the defaults found in {{CompressionParams.DEFAULT}}).
> Examples where this can be relevant are filesystems that do on-the-fly compression (btrfs, zfs) or specific disk configurations or even specific C* versions (see CASSANDRA-10995).
>
> +Additional information for newcomers+
> Some new fields need to be added to {{cassandra.yaml}} to allow specifying the fields required for defining the default compression parameters. In {{DatabaseDescriptor}} a new {{CompressionParams}} field should be added for the default compression. This field should be initialized in {{DatabaseDescriptor.applySimpleConfig()}}. At the different places where {{CompressionParams.DEFAULT}} was used, the code should call {{DatabaseDescriptor#getDefaultCompressionParams}}, which should return a copy of the configured {{CompressionParams}}.
> Some unit test using {{OverrideConfigurationLoader}} should be used to verify that the table schema uses the new default when a new table is created (see CreateTest for some examples).
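The newcomer notes above describe a common pattern: replace a hard-coded static default with a yaml-configured one, handed out as a defensive copy so table schemas cannot mutate the shared instance. The following is a hypothetical, self-contained sketch of that pattern; the nested class is a stand-in, not Cassandra's real {{CompressionParams}}, and the method names only mirror the ones the ticket mentions.

```java
public class DefaultCompressionSketch
{
    // Stand-in for CompressionParams: just a class name and chunk length.
    static final class CompressionParams
    {
        final String className;
        final int chunkLengthKiB;

        CompressionParams(String className, int chunkLengthKiB)
        {
            this.className = className;
            this.chunkLengthKiB = chunkLengthKiB;
        }

        CompressionParams copy()
        {
            return new CompressionParams(className, chunkLengthKiB);
        }
    }

    // Hard-coded fallback, analogous to CompressionParams.DEFAULT.
    static final CompressionParams BUILT_IN_DEFAULT = new CompressionParams("LZ4Compressor", 16);

    private static CompressionParams configuredDefault;

    // Analogous to initializing the new field in DatabaseDescriptor.applySimpleConfig().
    static void applyConfig(String yamlClassName, int yamlChunkLengthKiB)
    {
        configuredDefault = new CompressionParams(yamlClassName, yamlChunkLengthKiB);
    }

    // Analogous to DatabaseDescriptor#getDefaultCompressionParams: call sites that
    // previously read CompressionParams.DEFAULT ask here instead, and receive a
    // copy rather than the shared instance.
    static CompressionParams getDefaultCompressionParams()
    {
        CompressionParams base = configuredDefault != null ? configuredDefault : BUILT_IN_DEFAULT;
        return base.copy();
    }

    public static void main(String[] args)
    {
        applyConfig("ZstdCompressor", 64);
        CompressionParams p = getDefaultCompressionParams();
        System.out.println(p.className + " " + p.chunkLengthKiB);
    }
}
```

Returning a copy is what makes the {{OverrideConfigurationLoader}}-based test meaningful: a table that tweaks its inherited parameters must not change the default seen by the next CREATE TABLE.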
[jira] [Comment Edited] (CASSANDRA-19651) idealCLWriteLatency metric reports the worst response time instead of the time when ideal CL is satisfied
[ https://issues.apache.org/jira/browse/CASSANDRA-19651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17849025#comment-17849025 ]

Dmitry Konstantinov edited comment on CASSANDRA-19651 at 5/23/24 4:05 PM:
--------------------------------------------------------------------------
I have found one more place in the current logic to fix: currently we consider requestedCLAchieved = true if org.apache.cassandra.service.AbstractWriteResponseHandler#signal is invoked, but this method is invoked not only when we have enough successful responses to treat the operation as complete; there is also a signal() invocation from org.apache.cassandra.service.AbstractWriteResponseHandler#onFailure when we have got too many failures and can say that there is no chance to get enough successful responses. So a condition {{if (blockFor() + failures <= candidateReplicaCount())}} should be added into the signal() logic for the ideal CL. I am going to share an updated patch version.
> idealCLWriteLatency metric reports the worst response time instead of the time when ideal CL is satisfied
> ---------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-19651
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19651
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Legacy/Observability
>            Reporter: Dmitry Konstantinov
>            Assignee: Dmitry Konstantinov
>            Priority: Normal
>             Fix For: 4.1.x, 5.0.x, 5.x
>
>         Attachments: 19651-4.1.patch
>
> org.apache.cassandra.service.AbstractWriteResponseHandler:
> {code:java}
> private final void decrementResponseOrExpired()
> {
>     int decrementedValue = responsesAndExpirations.decrementAndGet();
>     if (decrementedValue == 0)
>     {
>         // The condition being signaled is a valid proxy for the CL being achieved
>         // Only mark it as failed if the requested CL was achieved.
>         if (!condition.isSignalled() && requestedCLAchieved)
>         {
>             replicaPlan.keyspace().metric.writeFailedIdealCL.inc();
>         }
>         else
>         {
>             replicaPlan.keyspace().metric.idealCLWriteLatency.addNano(nanoTime() - queryStartNanoTime);
>         }
>     }
> } {code}
> Actual result: responsesAndExpirations is the total number of replicas across all DCs, which does not depend on the ideal CL, so the metric value for replicaPlan.keyspace().metric.idealCLWriteLatency is updated when we get the latest response/timeout from all replicas.
> Expected result: replicaPlan.keyspace().metric.idealCLWriteLatency is updated when we get enough responses from replicas according to the ideal CL.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
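The guard described in the comment can be reduced to a single predicate. This is a sketch of that condition only, not the actual patch: signal() is reached both when enough replicas acked and when onFailure() decides the request can no longer succeed, so the ideal-CL bookkeeping should treat the CL as achieved only while blockFor() acks were still attainable.

```java
public class IdealClSignalSketch
{
    // blockFor:   acks required by the consistency level.
    // failures:   failure responses received so far.
    // candidates: replicas that could respond at all.
    // True while enough non-failed replicas remain to reach blockFor acks;
    // false means signal() must have come from the onFailure() path.
    static boolean clStillAchievable(int blockFor, int failures, int candidates)
    {
        return blockFor + failures <= candidates;
    }

    public static void main(String[] args)
    {
        // QUORUM over RF=3: blockFor = 2, 3 candidate replicas.
        System.out.println(clStillAchievable(2, 0, 3)); // no failures: achievable
        System.out.println(clStillAchievable(2, 1, 3)); // one failure: still achievable
        System.out.println(clStillAchievable(2, 2, 3)); // two failures: CL unreachable
    }
}
```

With this check in place, requestedCLAchieved would stay false on the failure path, so the write is counted in writeFailedIdealCL rather than contributing a bogus idealCLWriteLatency sample.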
[jira] [Commented] (CASSANDRA-19651) idealCLWriteLatency metric reports the worst response time instead of the time when ideal CL is satisfied
[ https://issues.apache.org/jira/browse/CASSANDRA-19651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17849025#comment-17849025 ]

Dmitry Konstantinov commented on CASSANDRA-19651:
-------------------------------------------------
I have found one more place in the current logic to fix: currently we consider requestedCLAchieved = true if org.apache.cassandra.service.AbstractWriteResponseHandler#signal is invoked, but this method is invoked not only when we have enough successful responses to treat the operation as complete; there is also a signal() invocation from org.apache.cassandra.service.AbstractWriteResponseHandler#onFailure when we have got too many failures and can say that there is no chance to get enough successful responses. So a condition {{if (blockFor() + failures <= candidateReplicaCount())}} should be added into the signal() logic for the ideal CL. I am going to share an updated patch version.
[jira] [Updated] (CASSANDRA-19651) idealCLWriteLatency metric reports the worst response time instead of the time when ideal CL is satisfied
[ https://issues.apache.org/jira/browse/CASSANDRA-19651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dmitry Konstantinov updated CASSANDRA-19651:
--------------------------------------------
    Status: In Progress  (was: Patch Available)
[jira] [Commented] (CASSANDRA-12937) Default setting (yaml) for SSTable compression
[ https://issues.apache.org/jira/browse/CASSANDRA-12937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17849018#comment-17849018 ]

Stefan Miklosovic commented on CASSANDRA-12937:
-----------------------------------------------

I consolidated the links for the PRs as it was getting confusing. Claude's PR: [https://github.com/apache/cassandra/pull/3168], and mine, which is the same as his but squashed: [https://github.com/apache/cassandra/pull/3330]

I rebased mine against current trunk, where CASSANDRA-19592 was merged, and I see that the problematic test (SSTableCompressionTest#configChangeIsolation) passes now, which is indeed good news. I will run a CI to see if something else is broken.

btw [~jlewandowski] mentioned to me privately that it would be nice if we had the configuration like this:
{code}
sstable:
  selected_format: big
  default_compression: lz4  # check this
  format:
    big:
      option1: abc
      option2: 123
    bti:
      option3: xyz
      option4: 999
  compression:  # check this
    lz4:
      enabled: true
      chunk_length: 16KiB
      max_compressed_length: 16KiB
    snappy:
      enabled: true
      chunk_length: 16KiB
      max_compressed_length: 16KiB
    deflate:
      enabled: false
      chunk_length: 16KiB
      max_compressed_length: 16KiB
{code}
instead of what we have now:
{code}
sstable_compression:
  - class_name: lz4
    parameters:
      - enabled: "true"
        chunk_length: 16KiB
        max_compressed_length: 16KiB
{code}
The reasoning behind that is that we are just enriching an existing configuration section, we are not inventing anything new. Plus it would be cool to have predefined compression options, so if we just use lz4 in CQL then all its parameters will automatically be taken into consideration as well. If we provide some parameters in CQL, these will be merged into what is in cassandra.yaml.

[~claude] I can take a look into this.
> Default setting (yaml) for SSTable compression
> ----------------------------------------------
>
>                 Key: CASSANDRA-12937
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12937
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Local/Config
>            Reporter: Michael Semb Wever
>            Assignee: Stefan Miklosovic
>            Priority: Low
>              Labels: AdventCalendar2021
>             Fix For: 5.x
>
>          Time Spent: 8h 20m
>  Remaining Estimate: 0h
>
> In many situations the choice of compression for sstables is more relevant to
> the disks attached than to the schema and data.
> This issue is to add to cassandra.yaml a default value for sstable
> compression that new tables will inherit (instead of the defaults found in
> {{CompressionParams.DEFAULT}}).
> Examples where this can be relevant are filesystems that do on-the-fly
> compression (btrfs, zfs), or specific disk configurations, or even specific C*
> versions (see CASSANDRA-10995).
> +Additional information for newcomers+
> Some new fields need to be added to {{cassandra.yaml}} to allow specifying
> the fields required for defining the default compression parameters. In
> {{DatabaseDescriptor}} a new {{CompressionParams}} field should be added for
> the default compression. This field should be initialized in
> {{DatabaseDescriptor.applySimpleConfig()}}. At the different places where
> {{CompressionParams.DEFAULT}} was used, the code should call
> {{DatabaseDescriptor#getDefaultCompressionParams}}, which should return a
> copy of the configured {{CompressionParams}}.
> A unit test using {{OverrideConfigurationLoader}} should be used to verify
> that the table schema uses the new default when a new table is created (see
> CreateTest for an example).

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org
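A minimal sketch of the wiring the newcomer notes describe, with the merge semantics mentioned in the preceding comment. All names here are illustrative stand-ins, not the real Cassandra API: a DatabaseDescriptor-style holder keeps a default compression configuration parsed from cassandra.yaml, hands out a defensive copy in place of CompressionParams.DEFAULT, and lets CQL-supplied options override the yaml defaults.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: yaml-configured default compression parameters,
// with a hardcoded fallback standing in for CompressionParams.DEFAULT.
public class DefaultCompressionConfig
{
    // Stand-in for CompressionParams.DEFAULT
    private static final Map<String, String> HARDCODED_DEFAULT =
            Map.of("class_name", "LZ4Compressor", "chunk_length", "16KiB");

    private static volatile Map<String, String> configured; // set from applySimpleConfig()

    // Analogue of DatabaseDescriptor.applySimpleConfig(): capture the yaml value.
    public static void applySimpleConfig(Map<String, String> yamlOptions)
    {
        configured = yamlOptions == null ? null : new HashMap<>(yamlOptions);
    }

    // Call sites that used CompressionParams.DEFAULT switch to this accessor,
    // which returns a copy so callers cannot mutate the shared default.
    public static Map<String, String> getDefaultCompressionParams()
    {
        return new HashMap<>(configured != null ? configured : HARDCODED_DEFAULT);
    }

    // Options supplied in CQL are merged on top of the yaml default,
    // as suggested in the comment above.
    public static Map<String, String> withCqlOptions(Map<String, String> cqlOptions)
    {
        Map<String, String> merged = getDefaultCompressionParams();
        merged.putAll(cqlOptions);
        return merged;
    }
}
```

The defensive copy matters: if the accessor handed out the shared instance, one table's schema mutation could silently change the cluster-wide default.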
[jira] [Commented] (CASSANDRA-19660) Support for netty-tcnative 2.0.62+
[ https://issues.apache.org/jira/browse/CASSANDRA-19660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17849002#comment-17849002 ] Brandon Williams commented on CASSANDRA-19660: -- 2.0.62 for trunk sounds reasonable to me. > Support for netty-tcnative 2.0.62+ > -- > > Key: CASSANDRA-19660 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19660 > Project: Cassandra > Issue Type: Improvement >Reporter: Zbyszek Z >Priority: Normal > > Hello, > Are there plans to support netty-tcnative in version 2.0.62? Current version > 2.0.36 does not work with openssl3.x. Motivation is that openssl 3.0.9+ is > FIPS certified. > Currently i am able to replace library default boringSSL implementation with > openssl by recompiling netty-tcnative but cassandra fails to load 2.0.62 > regardless if it is compiled with 1.1.1 or 3.0. > Or is there other way to implement openssl3.x ? > Thank you -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-19660) Support for netty-tcnative 2.0.62+
[ https://issues.apache.org/jira/browse/CASSANDRA-19660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848999#comment-17848999 ] Zbyszek Z commented on CASSANDRA-19660: --- Oh, that's a bummer, as this is only one version up. On GitHub there is a wrong version number suggesting that .61 added the openssl version, but it was in fact in .62. Is there no way to reconsider? > Support for netty-tcnative 2.0.62+ > -- > > Key: CASSANDRA-19660 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19660 > Project: Cassandra > Issue Type: Improvement >Reporter: Zbyszek Z >Priority: Normal > > Hello, > Are there plans to support netty-tcnative in version 2.0.62? Current version > 2.0.36 does not work with openssl3.x. Motivation is that openssl 3.0.9+ is > FIPS certified. > Currently i am able to replace library default boringSSL implementation with > openssl by recompiling netty-tcnative but cassandra fails to load 2.0.62 > regardless if it is compiled with 1.1.1 or 3.0. > Or is there other way to implement openssl3.x ? > Thank you -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-19592) Expand CREATE TABLE CQL on a coordinating node before submitting to CMS
[ https://issues.apache.org/jira/browse/CASSANDRA-19592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sam Tunnicliffe updated CASSANDRA-19592: Reviewers: Sam Tunnicliffe, Stefan Miklosovic (was: Sam Tunnicliffe) > Expand CREATE TABLE CQL on a coordinating node before submitting to CMS > --- > > Key: CASSANDRA-19592 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19592 > Project: Cassandra > Issue Type: Bug > Components: Cluster/Schema >Reporter: Alex Petrov >Assignee: Alex Petrov >Priority: Normal > Fix For: 5.1 > > Attachments: ci_summary-1.html, ci_summary.html > > > This is done to unblock CASSANDRA-12937 and allow preserving defaults with > which the table was created between node bounces and between nodes with > different configurations. For now, we are preserving 5.0 behaviour. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-19592) Expand CREATE TABLE CQL on a coordinating node before submitting to CMS
[ https://issues.apache.org/jira/browse/CASSANDRA-19592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sam Tunnicliffe updated CASSANDRA-19592: Fix Version/s: 5.1 Since Version: NA Source Control Link: https://github.com/apache/cassandra/commit/7fe30fc313ac35b1156f5a37d2069e29cded710b Resolution: Fixed Status: Resolved (was: Ready to Commit) committed, thanks. > Expand CREATE TABLE CQL on a coordinating node before submitting to CMS > --- > > Key: CASSANDRA-19592 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19592 > Project: Cassandra > Issue Type: Bug > Components: Cluster/Schema >Reporter: Alex Petrov >Assignee: Alex Petrov >Priority: Normal > Fix For: 5.1 > > Attachments: ci_summary-1.html, ci_summary.html > > > This is done to unblock CASSANDRA-12937 and allow preserving defaults with > which the table was created between node bounces and between nodes with > different configurations. For now, we are preserving 5.0 behaviour. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
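The committed change above rests on one idea worth illustrating: expand a CREATE TABLE statement with the coordinator's node-local defaults *before* submitting it to the CMS, so the stored schema no longer depends on which node's configuration later interprets it. This is a hypothetical sketch of that expansion step, not the actual TCM code:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of "expand before submit": any table option the user
// did not specify is filled in from the coordinator's local defaults, and the
// fully-expanded definition is what gets submitted to the CMS.
public class CreateTableExpansion
{
    public static Map<String, String> expand(Map<String, String> userOptions,
                                             Map<String, String> coordinatorDefaults)
    {
        Map<String, String> expanded = new HashMap<>(coordinatorDefaults);
        expanded.putAll(userOptions); // user-specified options win over defaults
        return expanded;
    }
}
```

Because the expanded form is what the CMS records, the table keeps the defaults it was created with across node bounces and across nodes with different configurations, which is exactly what CASSANDRA-12937 needs.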
[jira] [Updated] (CASSANDRA-19592) Expand CREATE TABLE CQL on a coordinating node before submitting to CMS
[ https://issues.apache.org/jira/browse/CASSANDRA-19592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sam Tunnicliffe updated CASSANDRA-19592: Status: Ready to Commit (was: Review In Progress) > Expand CREATE TABLE CQL on a coordinating node before submitting to CMS > --- > > Key: CASSANDRA-19592 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19592 > Project: Cassandra > Issue Type: Bug > Components: Cluster/Schema >Reporter: Alex Petrov >Assignee: Alex Petrov >Priority: Normal > Attachments: ci_summary-1.html, ci_summary.html > > > This is done to unblock CASSANDRA-12937 and allow preserving defaults with > which the table was created between node bounces and between nodes with > different configurations. For now, we are preserving 5.0 behaviour. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-19592) Expand CREATE TABLE CQL on a coordinating node before submitting to CMS
[ https://issues.apache.org/jira/browse/CASSANDRA-19592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848984#comment-17848984 ] Sam Tunnicliffe commented on CASSANDRA-19592: - {quote}Does anything else need to be done except merging? {quote} No, I think it just fell between Alex & me. I'll get it rebased & merged. > Expand CREATE TABLE CQL on a coordinating node before submitting to CMS > --- > > Key: CASSANDRA-19592 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19592 > Project: Cassandra > Issue Type: Bug > Components: Cluster/Schema >Reporter: Alex Petrov >Assignee: Alex Petrov >Priority: Normal > Attachments: ci_summary-1.html, ci_summary.html > > > This is done to unblock CASSANDRA-12937 and allow preserving defaults with > which the table was created between node bounces and between nodes with > different configurations. For now, we are preserving 5.0 behaviour. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-19660) Support for netty-tcnative 2.0.62+
[ https://issues.apache.org/jira/browse/CASSANDRA-19660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848982#comment-17848982 ] Brandon Williams commented on CASSANDRA-19660: -- Trunk is already at 2.0.61: https://github.com/apache/cassandra/blob/trunk/.build/parent-pom-template.xml#L802 We won't be upgrading libs in released majors (including 5.0 at this point since it is so close.) > Support for netty-tcnative 2.0.62+ > -- > > Key: CASSANDRA-19660 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19660 > Project: Cassandra > Issue Type: Improvement >Reporter: Zbyszek Z >Priority: Normal > > Hello, > Are there plans to support netty-tcnative in version 2.0.62? Current version > 2.0.36 does not work with openssl3.x. Motivation is that openssl 3.0.9+ is > FIPS certified. > Currently i am able to replace library default boringSSL implementation with > openssl by recompiling netty-tcnative but cassandra fails to load 2.0.62 > regardless if it is compiled with 1.1.1 or 3.0. > Or is there other way to implement openssl3.x ? > Thank you -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-17667) Text value containing "/*" interpreted as multiline comment in cqlsh
[ https://issues.apache.org/jira/browse/CASSANDRA-17667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brad Schoening updated CASSANDRA-17667:
---------------------------------------
    Status: Patch Available  (was: In Progress)

> Text value containing "/*" interpreted as multiline comment in cqlsh
> --------------------------------------------------------------------
>
>                 Key: CASSANDRA-17667
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-17667
>             Project: Cassandra
>          Issue Type: Bug
>          Components: CQL/Interpreter
>            Reporter: ANOOP THOMAS
>            Assignee: Brad Schoening
>            Priority: Normal
>             Fix For: 4.0.x, 4.1.x, 5.0.x, 5.x
>
> I use the CQLSH command line utility to load some DDLs. The version of the
> utility I use is this:
> {noformat}
> [cqlsh 6.0.0 | Cassandra 4.0.0.47 | CQL spec 3.4.5 | Native protocol v5]
> {noformat}
> Command that loads DDL.cql:
> {noformat}
> cqlsh -u username -p password cassandra.example.com 65503 --ssl -f DDL.cql
> {noformat}
> I have a line in the CQL script that breaks the syntax:
> {noformat}
> INSERT into tablename (key,columnname1,columnname2) VALUES ('keyName','value1','/value2/*/value3');
> {noformat}
> {{/*}} here is interpreted as the start of a multi-line comment. It used to
> work on older versions of cqlsh. The error I see looks like this:
> {noformat}
> SyntaxException: line 4:2 mismatched input 'Update' expecting ')' (...,'value1','/value2INSERT into tablename(INSERT into tablename (key,columnname1,columnname2)) VALUES ('[Update]-...)
> SyntaxException: line 1:0 no viable alternative at input '(' ([(]...)
> {noformat}
> The same behavior occurs in interactive mode too. {{/*}} inside a CQL
> statement should not be interpreted as the start of a multi-line comment.
> With schema:
> {code:java}
> CREATE TABLE tablename ( key text primary key, columnname1 text, columnname2 text);
> {code}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org
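The fix this ticket calls for amounts to making the comment scanner string-literal aware. Here is an illustrative sketch of that idea (not cqlsh's actual lexer, which is Python; Java is used here for consistency with the other examples): `/*` only opens a comment when the scanner is outside a single-quoted CQL literal, where `''` escapes a quote.

```java
// Illustrative sketch: find the first "/*" that is NOT inside a
// single-quoted CQL string literal. Inside a literal, '' is an
// escaped quote, not a terminator.
public class CommentScanner
{
    // Returns the index of the first comment-opening "/*" outside a
    // string literal, or -1 if none exists.
    public static int findCommentStart(String input)
    {
        boolean inString = false;
        for (int i = 0; i < input.length(); i++)
        {
            char c = input.charAt(i);
            if (inString)
            {
                if (c == '\'')
                {
                    // '' inside a literal escapes the quote
                    if (i + 1 < input.length() && input.charAt(i + 1) == '\'')
                        i++;
                    else
                        inString = false;
                }
            }
            else if (c == '\'')
            {
                inString = true;
            }
            else if (c == '/' && i + 1 < input.length() && input.charAt(i + 1) == '*')
            {
                return i;
            }
        }
        return -1;
    }
}
```

With this rule, the reporter's INSERT statement is left intact because its `/*` sits inside `'…/value2/*/value3'`, while a genuine `/* comment */` after the statement is still recognized.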
[jira] [Commented] (CASSANDRA-19592) Expand CREATE TABLE CQL on a coordinating node before submitting to CMS
[ https://issues.apache.org/jira/browse/CASSANDRA-19592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848969#comment-17848969 ] Stefan Miklosovic commented on CASSANDRA-19592: --- Does anything else need to be done except merging? This would unblock CASSANDRA-12937, as you are surely aware. > Expand CREATE TABLE CQL on a coordinating node before submitting to CMS > --- > > Key: CASSANDRA-19592 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19592 > Project: Cassandra > Issue Type: Bug > Components: Cluster/Schema >Reporter: Alex Petrov >Assignee: Alex Petrov >Priority: Normal > Attachments: ci_summary-1.html, ci_summary.html > > > This is done to unblock CASSANDRA-12937 and allow preserving defaults with > which the table was created between node bounces and between nodes with > different configurations. For now, we are preserving 5.0 behaviour. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-19450) Hygiene updates for warnings and pytests
[ https://issues.apache.org/jira/browse/CASSANDRA-19450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brad Schoening updated CASSANDRA-19450: --- Description: * -Update 'Warning' message to write to stderr- * -Replace TimeoutError Exception with builtin (since Python 3.3)- * -Remove re.pattern_type (removed since Python 3.7)- * Fix mutable arg [] in read_until() * Remove redirect of stderr to stdout in pytest fixture with tty=false; Deprecation warnings can otherwise fail unit tests when stdout & stderr output is combined. * Fix several pycodestyle issues was: * Update 'Warning' message to write to stderr * -Replace TimeoutError Exception with builtin (since Python 3.3)- * -Remove re.pattern_type (removed since Python 3.7)- * Fix mutable arg [] in read_until() * Remove redirect of stderr to stdout in pytest fixture with tty=false; Deprecation warnings can otherwise fail unit tests when stdout & stderr output is combined. * Fix several pycodestyle issues > Hygiene updates for warnings and pytests > > > Key: CASSANDRA-19450 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19450 > Project: Cassandra > Issue Type: Improvement > Components: CQL/Interpreter >Reporter: Brad Schoening >Assignee: Brad Schoening >Priority: Low > Fix For: 5.x > > > > * -Update 'Warning' message to write to stderr- > * -Replace TimeoutError Exception with builtin (since Python 3.3)- > * -Remove re.pattern_type (removed since Python 3.7)- > * Fix mutable arg [] in read_until() > * Remove redirect of stderr to stdout in pytest fixture with tty=false; > Deprecation warnings can otherwise fail unit tests when stdout & stderr > output is combined. > * Fix several pycodestyle issues -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Created] (CASSANDRA-19660) Support for netty-tcnative 2.0.62+
Zbyszek Z created CASSANDRA-19660: - Summary: Support for netty-tcnative 2.0.62+ Key: CASSANDRA-19660 URL: https://issues.apache.org/jira/browse/CASSANDRA-19660 Project: Cassandra Issue Type: Improvement Reporter: Zbyszek Z Hello, Are there plans to support netty-tcnative in version 2.0.62? Current version 2.0.36 does not work with openssl3.x. Motivation is that openssl 3.0.9+ is FIPS certified. Currently i am able to replace library default boringSSL implementation with openssl by recompiling netty-tcnative but cassandra fails to load 2.0.62 regardless if it is compiled with 1.1.1 or 3.0. Or is there other way to implement openssl3.x ? Thank you -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-19632) wrap tracing logs in isTraceEnabled across the codebase
[ https://issues.apache.org/jira/browse/CASSANDRA-19632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Miklosovic updated CASSANDRA-19632: -- Status: Needs Committer (was: Patch Available) > wrap tracing logs in isTraceEnabled across the codebase > --- > > Key: CASSANDRA-19632 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19632 > Project: Cassandra > Issue Type: Improvement > Components: Legacy/Core >Reporter: Stefan Miklosovic >Assignee: Stefan Miklosovic >Priority: Normal > Fix For: 5.x > > Time Spent: 0.5h > Remaining Estimate: 0h > > Our usage of logger.isTraceEnabled across the codebase is inconsistent. This > would also fix issues similar in e.g. CASSANDRA-19429 as [~rustyrazorblade] > suggested. > We should fix this at least in trunk and 5.0 (not critical though) and > probably come up with a checkstyle rule to prevent not calling isTraceEnabled > while logging with TRACE level. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Comment Edited] (CASSANDRA-19632) wrap tracing logs in isTraceEnabled across the codebase
[ https://issues.apache.org/jira/browse/CASSANDRA-19632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848940#comment-17848940 ] Stefan Miklosovic edited comment on CASSANDRA-19632 at 5/23/24 12:59 PM: - I went through all logger.trace in the production code and I modified only 59 files instead of 127 in the first PR. [https://github.com/apache/cassandra/pull/3329] The perception I got by going through all of that is that people were already following the rule of "if it has more than 2 arguments then wrap it in logger.isTraceEnabled" so I went by that logic as well everywhere where it was not done like that. There were also inconsistent usages of logger.trace() with 0 / 1 / 2 arguments. Sometimes it was wrapped in isTraceEnabled, sometimes it was not, without any apparent reason. I think that for simple cases it is not necessary to wrap it, we have majority of cases like that in the code base (not wrapped). I have also fixed the cases where string concatenation was used and similar. Not all people also seem to understand that when it is logged like this: {code:java} logger.trace("abc {}", object); {code} then the actual object.toString() is evaluated _after_ we are absolutely sure we go to indeed log. I do not think that this is necessary, even "object" is some "heavyweight" when it comes to toString because it is not called prematurely anyway. {code:java} if (logger.isTraceEnabled()) logger.trace("abc {}", object); {code} as per [https://www.slf4j.org/faq.html#string_contents] {quote}The logging system will invoke complexObject.toString() method only after it has ascertained that the log statement was enabled. Otherwise, the cost of complexObject.toString() conversion will be advantageously avoided. {quote} was (Author: smiklosovic): I went through all logger.trace in the production code and I modified only 59 files instead of 127 in the first one. 
https://github.com/apache/cassandra/pull/3329 The perception I got by going through all of that is that people were already following the rule of "it it has more than 2 arguments then wrap it in logger.isTraceEnabled" so I went by that logic as well everywhere where it was not done like that. There were also inconsistent usages of logger.trace() with 0 / 1 / 2 arguments. Sometimes it was wrapped in isTraceEnabled, sometimes it was not, without any apparent reason. I think that for simple cases it is not necessary to wrap it, we have majority of cases like that in the code base (not wrapped). I have also fixed the cases where string concatenation was used and similar. Not all people also seem to understand that when it is logged like this: {code} logger.trace("abc {}", object); {code} then the actual object.toString() is evaluated _after_ we are absolutely sure we go to indeed log. I do not think that this is necessary, even "object" is some "heavyweight" when it comes to toString because it is not called prematurely anyway. {code} if (logger.isTraceEnabled()) logger.trace("abc {}", object); {code} as per https://www.slf4j.org/faq.html#string_contents {quote} The logging system will invoke complexObject.toString() method only after it has ascertained that the log statement was enabled. Otherwise, the cost of complexObject.toString() conversion will be advantageously avoided. {quote} > wrap tracing logs in isTraceEnabled across the codebase > --- > > Key: CASSANDRA-19632 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19632 > Project: Cassandra > Issue Type: Improvement > Components: Legacy/Core >Reporter: Stefan Miklosovic >Assignee: Stefan Miklosovic >Priority: Normal > Fix For: 5.x > > Time Spent: 20m > Remaining Estimate: 0h > > Our usage of logger.isTraceEnabled across the codebase is inconsistent. This > would also fix issues similar in e.g. CASSANDRA-19429 as [~rustyrazorblade] > suggested. 
> We should fix this at least in trunk and 5.0 (not critical though) and > probably come up with a checkstyle rule to prevent not calling isTraceEnabled > while logging with TRACE level. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
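The slf4j FAQ point quoted in the comment above can be demonstrated with a small stand-in logger (this is not slf4j itself, just a minimal model of its behaviour): with parameterized logging, the argument's toString() runs only after the level check, so an isTraceEnabled() guard mainly avoids eager argument *computation* and varargs overhead, not the toString() call.

```java
// Minimal stand-in for a parameterized logger, demonstrating that the
// costly toString() of an argument is deferred until after the level check.
public class TraceGuardDemo
{
    static int toStringCalls = 0;

    // An argument whose toString() is expensive; we count invocations.
    static final Object COSTLY = new Object()
    {
        @Override
        public String toString()
        {
            toStringCalls++;
            return "expensive";
        }
    };

    static boolean traceEnabled = false;

    // Parameterized logging: formatting (and toString) happens only
    // after the level check, mirroring slf4j's behaviour.
    static void trace(String format, Object arg)
    {
        if (!traceEnabled)
            return;
        String rendered = format.replace("{}", arg.toString());
        // a real logger would write 'rendered' to an appender here
    }

    public static int callsWhenDisabled()
    {
        traceEnabled = false;
        toStringCalls = 0;
        trace("abc {}", COSTLY); // toString never runs
        return toStringCalls;
    }

    public static int callsWhenEnabled()
    {
        traceEnabled = true;
        toStringCalls = 0;
        trace("abc {}", COSTLY); // toString runs exactly once
        return toStringCalls;
    }
}
```

This is why the comment argues the guard is only worth it for calls with many arguments or eagerly computed ones (string concatenation, method calls in the argument list), which matches the "wrap when more than 2 arguments" rule of thumb described above.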
[jira] [Comment Edited] (CASSANDRA-19632) wrap tracing logs in isTraceEnabled across the codebase
[ https://issues.apache.org/jira/browse/CASSANDRA-19632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848940#comment-17848940 ] Stefan Miklosovic edited comment on CASSANDRA-19632 at 5/23/24 12:57 PM: - I went through all logger.trace in the production code and I modified only 59 files instead of 127 in the first one. https://github.com/apache/cassandra/pull/3329 The perception I got by going through all of that is that people were already following the rule of "it it has more than 2 arguments then wrap it in logger.isTraceEnabled" so I went by that logic as well everywhere where it was not done like that. There were also inconsistent usages of logger.trace() with 0 / 1 / 2 arguments. Sometimes it was wrapped in isTraceEnabled, sometimes it was not, without any apparent reason. I think that for simple cases it is not necessary to wrap it, we have majority of cases like that in the code base (not wrapped). I have also fixed the cases where string concatenation was used and similar. Not all people also seem to understand that when it is logged like this: {code} logger.trace("abc {}", object); {code} then the actual object.toString() is evaluated _after_ we are absolutely sure we go to indeed log. I do not think that this is necessary, even "object" is some "heavyweight" when it comes to toString because it is not called prematurely anyway. {code} if (logger.isTraceEnabled()) logger.trace("abc {}", object); {code} as per https://www.slf4j.org/faq.html#string_contents {quote} The logging system will invoke complexObject.toString() method only after it has ascertained that the log statement was enabled. Otherwise, the cost of complexObject.toString() conversion will be advantageously avoided. {quote} was (Author: smiklosovic): I went through all logger.trace in the production code and I modified only 59 files instead of 127 in the first one. 
https://github.com/apache/cassandra/pull/3329 The perception I got by going through all of that is that people were already following the rule of "it it has more than 2 arguments then wrap it in logger.isTraceEnabled" so I went by that logic as well everywhere where it was not done like that. There were also inconsistent usages of logger.trace() with 0 / 1 / 2 arguments. Sometimes it was wrapped in isTraceEnabled, sometimes it was not, without any apparent reason. I think that for simple cases it is not necessary to wrap it, we have majority of cases like that in the code base (not wrapped). I have also fixed the cases where string concatenation was used and similar. Not all people also seem to understand that when it is logged like this: {code} logger.trace("abc {}", object); {code} then the actual object.toString() is evaluated _after_ we are absolutely sure we go to indeed log. I do not think that this is necessary, even "object" is some "heavyweight" when it comes to toString because it is not called prematurely anyway. {code} if (logger.isTraceEnabled()) logger.trace("abc {}", object); {code} > wrap tracing logs in isTraceEnabled across the codebase > --- > > Key: CASSANDRA-19632 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19632 > Project: Cassandra > Issue Type: Improvement > Components: Legacy/Core >Reporter: Stefan Miklosovic >Assignee: Stefan Miklosovic >Priority: Normal > Fix For: 5.x > > Time Spent: 20m > Remaining Estimate: 0h > > Our usage of logger.isTraceEnabled across the codebase is inconsistent. This > would also fix issues similar in e.g. CASSANDRA-19429 as [~rustyrazorblade] > suggested. > We should fix this at least in trunk and 5.0 (not critical though) and > probably come up with a checkstyle rule to prevent not calling isTraceEnabled > while logging with TRACE level. 
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Comment Edited] (CASSANDRA-19632) wrap tracing logs in isTraceEnabled across the codebase
[ https://issues.apache.org/jira/browse/CASSANDRA-19632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848940#comment-17848940 ] Stefan Miklosovic edited comment on CASSANDRA-19632 at 5/23/24 12:55 PM: - I went through all logger.trace in the production code and I modified only 59 files instead of 127 in the first one. https://github.com/apache/cassandra/pull/3329 The perception I got by going through all of that is that people were already following the rule of "it it has more than 2 arguments then wrap it in logger.isTraceEnabled" so I went by that logic as well everywhere where it was not done like that. There were also inconsistent usages of logger.trace() with 0 / 1 / 2 arguments. Sometimes it was wrapped in isTraceEnabled, sometimes it was not, without any apparent reason. I think that for simple cases it is not necessary to wrap it, we have majority of cases like that in the code base (not wrapped). I have also fixed the cases where string concatenation was used and similar. Not all people also dont seem to understand that when it is logger like this: {code} logger.trace("abc {}", object); {code} then the actual object.toString() is evaluated _after_ we are absolutely sure we go to indeed log. I do not think that this is necessary, even "object" is some "heavyweight" when it comes to toString because it is not called prematurely anyway. {code} if (logger.isTraceEnabled()) logger.trace("abc {}", object); {code} was (Author: smiklosovic): I went through all logger.trace in the production code and I modified only 59 files instead of 127 in the first one. https://github.com/apache/cassandra/pull/3329 The perception I got by going through all of that is that people were already following the rule of "it it has more than 2 arguments then wrap it in logger.isTraceEnabled" so I went by that logic as well everywhere where it was not done like that. There were also inconsistent usages of logger.trace() with 0 / 1 / 2 arguments. 
Sometimes it was wrapped in isTraceEnabled, sometimes it was not, without any apparent reason. I think that for simple cases it is not necessary to wrap it, we have majority of cases like that in the code base (not wrapped). I have also fixed the cases where string concatenation was used and similar. > wrap tracing logs in isTraceEnabled across the codebase > --- > > Key: CASSANDRA-19632 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19632 > Project: Cassandra > Issue Type: Improvement > Components: Legacy/Core >Reporter: Stefan Miklosovic >Assignee: Stefan Miklosovic >Priority: Normal > Fix For: 5.x > > Time Spent: 20m > Remaining Estimate: 0h > > Our usage of logger.isTraceEnabled across the codebase is inconsistent. This > would also fix issues similar in e.g. CASSANDRA-19429 as [~rustyrazorblade] > suggested. > We should fix this at least in trunk and 5.0 (not critical though) and > probably come up with a checkstyle rule to prevent not calling isTraceEnabled > while logging with TRACE level. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Comment Edited] (CASSANDRA-19632) wrap tracing logs in isTraceEnabled across the codebase
[ https://issues.apache.org/jira/browse/CASSANDRA-19632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848940#comment-17848940 ] Stefan Miklosovic edited comment on CASSANDRA-19632 at 5/23/24 12:55 PM: - I went through all logger.trace calls in the production code and modified only 59 files instead of the 127 in the first attempt. https://github.com/apache/cassandra/pull/3329

The impression I got from going through all of that is that people were already following the rule of "if it has more than 2 arguments, then wrap it in logger.isTraceEnabled", so I applied that logic everywhere it was not yet done. There were also inconsistent usages of logger.trace() with 0 / 1 / 2 arguments. Sometimes it was wrapped in isTraceEnabled, sometimes it was not, without any apparent reason. I think that for simple cases it is not necessary to wrap it; the majority of cases in the code base are like that (not wrapped). I have also fixed the cases where string concatenation was used, and similar.

Not all people seem to understand that when it is logged like this:
{code}
logger.trace("abc {}", object);
{code}
the actual object.toString() is evaluated _after_ we are absolutely sure we are indeed going to log. So I do not think the wrapping below is necessary, even if "object" is somewhat "heavyweight" when it comes to toString, because it is not called prematurely anyway:
{code}
if (logger.isTraceEnabled())
    logger.trace("abc {}", object);
{code}
> wrap tracing logs in isTraceEnabled across the codebase > --- > > Key: CASSANDRA-19632 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19632 > Project: Cassandra > Issue Type: Improvement > Components: Legacy/Core >Reporter: Stefan Miklosovic >Assignee: Stefan Miklosovic >Priority: Normal > Fix For: 5.x > > Time Spent: 20m > Remaining Estimate: 0h > > Our usage of logger.isTraceEnabled across the codebase is inconsistent. This > would also fix issues similar in e.g. CASSANDRA-19429 as [~rustyrazorblade] > suggested. > We should fix this at least in trunk and 5.0 (not critical though) and > probably come up with a checkstyle rule to prevent not calling isTraceEnabled > while logging with TRACE level. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
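The deferred-evaluation point above can be demonstrated with a self-contained sketch. The Logger class below is a hypothetical stand-in for an SLF4J-style logger, not the real implementation; it only illustrates the calling convention: with the {} placeholder style the argument's toString() runs after the level check, while string concatenation at the call site pays that cost even when TRACE is disabled.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class TraceDemo {
    // Hypothetical stand-in for an SLF4J-style logger, to show when an
    // argument's toString() is actually evaluated.
    static class Logger {
        private final boolean traceEnabled;
        Logger(boolean traceEnabled) { this.traceEnabled = traceEnabled; }
        boolean isTraceEnabled() { return traceEnabled; }
        void trace(String msg) {
            if (!isTraceEnabled()) return;  // message was already built by the caller
        }
        void trace(String format, Object arg) {
            if (!isTraceEnabled()) return;  // cheap level check happens first...
            String rendered = format.replace("{}", arg.toString()); // ...toString() only here
        }
    }

    static final AtomicInteger TO_STRING_CALLS = new AtomicInteger();
    static final Object HEAVY = new Object() {
        @Override public String toString() {
            TO_STRING_CALLS.incrementAndGet(); // count how often we pay the cost
            return "heavy";
        }
    };

    public static void main(String[] args) {
        Logger off = new Logger(false);            // TRACE disabled

        off.trace("abc {}", HEAVY);                // placeholder style: toString() never runs
        System.out.println(TO_STRING_CALLS.get()); // 0

        off.trace("abc " + HEAVY);                 // concatenation: toString() runs regardless
        System.out.println(TO_STRING_CALLS.get()); // 1
    }
}
```

The same distinction holds for the real SLF4J API: the parameterized form defers formatting until the logger has decided the event will actually be emitted.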
[jira] [Commented] (CASSANDRA-19632) wrap tracing logs in isTraceEnabled across the codebase
[ https://issues.apache.org/jira/browse/CASSANDRA-19632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848940#comment-17848940 ] Stefan Miklosovic commented on CASSANDRA-19632: --- I went through all logger.trace calls in the production code and modified only 59 files instead of the 127 in the first attempt. https://github.com/apache/cassandra/pull/3329 The impression I got from going through all of that is that people were already following the rule of "if it has more than 2 arguments, then wrap it in logger.isTraceEnabled", so I applied that logic everywhere it was not yet done. There were also inconsistent usages of logger.trace() with 0 / 1 / 2 arguments. Sometimes it was wrapped in isTraceEnabled, sometimes it was not, without any apparent reason. I think that for simple cases it is not necessary to wrap it; the majority of cases in the code base are like that (not wrapped). I have also fixed the cases where string concatenation was used, and similar. > wrap tracing logs in isTraceEnabled across the codebase > --- > > Key: CASSANDRA-19632 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19632 > Project: Cassandra > Issue Type: Improvement > Components: Legacy/Core >Reporter: Stefan Miklosovic >Assignee: Stefan Miklosovic >Priority: Normal > Fix For: 5.x > > Time Spent: 20m > Remaining Estimate: 0h > > Our usage of logger.isTraceEnabled across the codebase is inconsistent. This > would also fix issues similar in e.g. CASSANDRA-19429 as [~rustyrazorblade] > suggested. > We should fix this at least in trunk and 5.0 (not critical though) and > probably come up with a checkstyle rule to prevent not calling isTraceEnabled > while logging with TRACE level.
[jira] [Created] (CASSANDRA-19659) null values injected while drop compact storage was executed
Matthias Pfau created CASSANDRA-19659: - Summary: null values injected while drop compact storage was executed Key: CASSANDRA-19659 URL: https://issues.apache.org/jira/browse/CASSANDRA-19659 Project: Cassandra Issue Type: Bug Components: Cluster/Schema Reporter: Matthias Pfau We noticed that values of some inserts that were run in parallel to an alter table drop compact storage statement turned into null values. This happened with version 3.11.10. Since then, we upgraded to 3.11.17. We tried to write a reproducer (inserting millions of rows/columns in parallel to drop compact storage) but we were not able to reproduce the bug with 3.11.17. This was reported on the mailing list first ([https://lists.apache.org/thread/hgwyd917yp01q0k9op79gztkx3qwypbc]) and seems to relate to https://issues.apache.org/jira/browse/CASSANDRA-13004.
[jira] [Comment Edited] (CASSANDRA-19632) wrap tracing logs in isTraceEnabled across the codebase
[ https://issues.apache.org/jira/browse/CASSANDRA-19632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848892#comment-17848892 ] Stefan Miklosovic edited comment on CASSANDRA-19632 at 5/23/24 10:48 AM: - I did some research on this and it is quite an interesting read. https://www.slf4j.org/faq.html#logging_performance

If we do this:
{code}
logger.trace("abc" + obj + "def");
{code}
there is a performance penalty related to evaluating the argument. When we do this:
{code}
logger.trace("abc {}", def);
{code}
the {} is evaluated only when we actually go to log on trace, so it checks the trace level just once and evaluates {} at most once (if we indeed go to log). Doing this:
{code}
if (logger.isTraceEnabled())
    logger.trace("abc" + obj + "def");
{code}
is the least performance-friendly, because it checks whether we are going to log on trace and, if we do, that check is evaluated a second time inside logger.trace itself; plus it needs to construct the logging message by concatenation before doing so. Doing this:
{code}
if (logger.isTraceEnabled())
    logger.trace("abc {}", def);
{code}
will check whether we go to trace at most twice and will evaluate the placeholder at most once.

I think that wrapping logger.trace with logger.isTraceEnabled() is necessary only if 1) we do not use placeholders in the logging message, only string concatenation, which would construct the logging message regardless of whether we log or not, because the level was not checked yet, or 2) the number of placeholder arguments in logger.trace is bigger than 2. For example, for 2):
{code}
logger.trace("abc {} def {}", obj1, obj2);
{code}
This will be OK. However this:
{code}
logger.trace("abc {} def {} ghi {}", obj1, obj2, obj3);
{code}
will carry the hidden cost of constructing an Object[] with these three parameters. The logging internals create such an array whenever the number of arguments is bigger than 2.

The cost of calling logger.isTraceEnabled is negligible, about 1% of the cost of actually logging the message, but I think the wrapping is not necessary as long as we make sure that we are not using string concatenation when constructing the message and we are not using more than 2 placeholders. Also, it is important to check that evaluating the argument itself is not expensive, as we hit in CASSANDRA-19429, for example:
{code}
logger.trace("abc {}", thisIs.Hard.ToResolve())
{code}
Even if we use a placeholder, if thisIs.Hard.ToResolve() takes a lot of resources, that is not good, and in that case it is preferable to wrap it in isTraceEnabled(). There is no silver bullet; I think we need to go case by case by the rules I described and change it where it does not comply. According to the docs, the alternative is also to use lambdas:
{code}
logger.trace("abc {}", () -> thisIs.Hard.ToResolve())
{code}
This checks whether we go to log on trace level just once and evaluates the placeholder only if we indeed go to do that. I think this is the best solution. I will try to go over the codebase to see where we are at currently.

EDIT: lambdas were added in SLF4J 2.0.0-alpha1
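The more-than-2-arguments rule comes from how the logger API is overloaded. The sketch below uses hypothetical static methods that mirror the shape of SLF4J's one-argument, two-argument, and varargs trace overloads (it is not the real interface): Java resolves a three-argument call to the varargs overload, so an Object[] is allocated at the call site before any level check can run.

```java
public class VarargsDemo {
    // Hypothetical overloads mirroring the shape of an SLF4J-style trace(...):
    // the fixed-arity versions avoid allocation; three or more arguments fall
    // through to the varargs overload, which allocates an Object[] at the call
    // site even if the level turns out to be disabled.
    static String trace(String format, Object arg) {
        return "one-arg overload, no array";
    }
    static String trace(String format, Object arg1, Object arg2) {
        return "two-arg overload, no array";
    }
    static String trace(String format, Object... args) {
        return "varargs overload, Object[" + args.length + "] allocated";
    }

    public static void main(String[] args) {
        // Overload resolution prefers fixed arity; only the last call needs varargs.
        System.out.println(trace("abc {}", "x"));
        System.out.println(trace("abc {} def {}", "x", "y"));
        System.out.println(trace("abc {} def {} ghi {}", "x", "y", "z"));
    }
}
```

This is why guarding only pays off once the call crosses into the varargs overload (or uses concatenation): below that threshold the call site does no allocation when the level is disabled.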
[jira] [Commented] (CASSANDRA-19632) wrap tracing logs in isTraceEnabled across the codebase
[ https://issues.apache.org/jira/browse/CASSANDRA-19632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848892#comment-17848892 ] Stefan Miklosovic commented on CASSANDRA-19632: --- I did some research on this and it is quite an interesting read. https://www.slf4j.org/faq.html#logging_performance

If we do this:
{code}
logger.trace("abc" + obj + "def");
{code}
there is a performance penalty related to evaluating the argument. When we do this:
{code}
logger.trace("abc {}", def);
{code}
the {} is evaluated only when we actually go to log on trace, so it checks the trace level just once and evaluates {} at most once (if we indeed go to log). Doing this:
{code}
if (logger.isTraceEnabled())
    logger.trace("abc" + obj + "def");
{code}
is the least performance-friendly, because it checks whether we are going to log on trace and, if we do, that check is evaluated a second time inside logger.trace itself; plus it needs to construct the logging message by concatenation before doing so. Doing this:
{code}
if (logger.isTraceEnabled())
    logger.trace("abc {}", def);
{code}
will check whether we go to trace at most twice and will evaluate the placeholder at most once.

I think that wrapping logger.trace with logger.isTraceEnabled() is necessary only if 1) we do not use placeholders in the logging message, only string concatenation, which would construct the logging message regardless of whether we log or not, because the level was not checked yet, or 2) the number of placeholder arguments in logger.trace is bigger than 2. For example, for 2):
{code}
logger.trace("abc {} def {}", obj1, obj2);
{code}
This will be OK. However this:
{code}
logger.trace("abc {} def {} ghi {}", obj1, obj2, obj3);
{code}
will carry the hidden cost of constructing an Object[] with these three parameters. The logging internals create such an array whenever the number of arguments is bigger than 2.

The cost of calling logger.isTraceEnabled is negligible, about 1% of the cost of actually logging the message, but I think the wrapping is not necessary as long as we make sure that we are not using string concatenation when constructing the message and we are not using more than 2 placeholders. Also, it is important to check that evaluating the argument itself is not expensive, as we hit in CASSANDRA-19429, for example:
{code}
logger.trace("abc {}", thisIs.Hard.ToResolve())
{code}
Even if we use a placeholder, if thisIs.Hard.ToResolve() takes a lot of resources, that is not good, and in that case it is preferable to wrap it in isTraceEnabled(). There is no silver bullet; I think we need to go case by case by the rules I described and change it where it does not comply. According to the docs, the alternative is also to use lambdas:
{code}
logger.trace("abc {}", () -> thisIs.Hard.ToResolve())
{code}
This checks whether we go to log on trace level just once and evaluates the placeholder only if we indeed go to do that. I think this is the best solution. I will try to go over the codebase to see where we are at currently. > wrap tracing logs in isTraceEnabled across the codebase > --- > > Key: CASSANDRA-19632 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19632 > Project: Cassandra > Issue Type: Improvement > Components: Legacy/Core >Reporter: Stefan Miklosovic >Assignee: Stefan Miklosovic >Priority: Normal > Fix For: 5.x > > Time Spent: 10m > Remaining Estimate: 0h > > Our usage of logger.isTraceEnabled across the codebase is inconsistent. This > would also fix issues similar in e.g. CASSANDRA-19429 as [~rustyrazorblade] > suggested. > We should fix this at least in trunk and 5.0 (not critical though) and > probably come up with a checkstyle rule to prevent not calling isTraceEnabled > while logging with TRACE level.
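The lambda idea can be sketched with java.util.function.Supplier; the Logger class below is a hypothetical stand-in, not the SLF4J interface. (In real SLF4J this shape is available from 2.x via the fluent API, e.g. logger.atTrace().addArgument(() -> thisIs.Hard.ToResolve()).log("abc {}").) The expensive computation runs only when the level check passes.

```java
import java.util.function.Supplier;

public class SupplierDemo {
    // Hypothetical stand-in logger accepting a Supplier, so the argument is
    // computed lazily: only when TRACE is actually enabled.
    static class Logger {
        private final boolean traceEnabled;
        Logger(boolean traceEnabled) { this.traceEnabled = traceEnabled; }
        void trace(String format, Supplier<?> lazyArg) {
            if (!traceEnabled) return; // supplier is never invoked when disabled
            System.out.println(format.replace("{}", String.valueOf(lazyArg.get())));
        }
    }

    static int expensiveCalls = 0;
    static String hardToResolve() {    // stands in for thisIs.Hard.ToResolve()
        expensiveCalls++;
        return "resolved";
    }

    public static void main(String[] args) {
        new Logger(false).trace("abc {}", () -> hardToResolve()); // disabled: no work done
        System.out.println(expensiveCalls);                       // 0
        new Logger(true).trace("abc {}", () -> hardToResolve());  // enabled: prints "abc resolved"
        System.out.println(expensiveCalls);                       // 1
    }
}
```

The trade-off is a small allocation for the lambda capture at the call site, which is generally far cheaper than an expensive argument evaluation.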
[jira] [Commented] (CASSANDRA-19658) Test failure: replace_address_test.py::TestReplaceAddress::test_restart_failed_replace
[ https://issues.apache.org/jira/browse/CASSANDRA-19658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848784#comment-17848784 ] Brandon Williams commented on CASSANDRA-19658: -- Looks like fallout from CASSANDRA-15439 > Test failure: > replace_address_test.py::TestReplaceAddress::test_restart_failed_replace > -- > > Key: CASSANDRA-19658 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19658 > Project: Cassandra > Issue Type: Bug > Components: Cluster/Membership >Reporter: Brandon Williams >Assignee: Brandon Williams >Priority: Normal > Fix For: 4.0.x, 4.1.x, 5.0.x > > > This can be seen failing in butler: > https://butler.cassandra.apache.org/#/ci/upstream/workflow/Cassandra-5.0/failure/replace_address_test/TestReplaceAddress/test_restart_failed_replace > {noformat} > ccmlib.node.TimeoutError: 14 May 2024 18:19:08 [node1] after 120.13/120 > seconds Missing: ['FatClient /127.0.0.4:7000 has been silent for 3ms, > removing from gossip'] not found in system.log: > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-19658) Test failure: replace_address_test.py::TestReplaceAddress::test_restart_failed_replace
[ https://issues.apache.org/jira/browse/CASSANDRA-19658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Williams updated CASSANDRA-19658: - Fix Version/s: 5.0.x (was: 5.0-rc) > Test failure: > replace_address_test.py::TestReplaceAddress::test_restart_failed_replace > -- > > Key: CASSANDRA-19658 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19658 > Project: Cassandra > Issue Type: Bug > Components: Cluster/Membership >Reporter: Brandon Williams >Assignee: Brandon Williams >Priority: Normal > Fix For: 4.0.x, 4.1.x, 5.0.x > > > This can be seen failing in butler: > https://butler.cassandra.apache.org/#/ci/upstream/workflow/Cassandra-5.0/failure/replace_address_test/TestReplaceAddress/test_restart_failed_replace > {noformat} > ccmlib.node.TimeoutError: 14 May 2024 18:19:08 [node1] after 120.13/120 > seconds Missing: ['FatClient /127.0.0.4:7000 has been silent for 3ms, > removing from gossip'] not found in system.log: > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-19658) Test failure: replace_address_test.py::TestReplaceAddress::test_restart_failed_replace
[ https://issues.apache.org/jira/browse/CASSANDRA-19658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Williams updated CASSANDRA-19658: - Bug Category: Parent values: Correctness(12982)Level 1 values: Test Failure(12990) Complexity: Normal Component/s: Cluster/Membership Discovered By: DTest Fix Version/s: 4.0.x 4.1.x 5.0-rc Severity: Normal Status: Open (was: Triage Needed) > Test failure: > replace_address_test.py::TestReplaceAddress::test_restart_failed_replace > -- > > Key: CASSANDRA-19658 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19658 > Project: Cassandra > Issue Type: Bug > Components: Cluster/Membership >Reporter: Brandon Williams >Assignee: Brandon Williams >Priority: Normal > Fix For: 4.0.x, 4.1.x, 5.0-rc > > > This can be seen failing in butler: > https://butler.cassandra.apache.org/#/ci/upstream/workflow/Cassandra-5.0/failure/replace_address_test/TestReplaceAddress/test_restart_failed_replace > {noformat} > ccmlib.node.TimeoutError: 14 May 2024 18:19:08 [node1] after 120.13/120 > seconds Missing: ['FatClient /127.0.0.4:7000 has been silent for 3ms, > removing from gossip'] not found in system.log: > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Assigned] (CASSANDRA-19658) Test failure: replace_address_test.py::TestReplaceAddress::test_restart_failed_replace
[ https://issues.apache.org/jira/browse/CASSANDRA-19658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Williams reassigned CASSANDRA-19658: Assignee: Brandon Williams > Test failure: > replace_address_test.py::TestReplaceAddress::test_restart_failed_replace > -- > > Key: CASSANDRA-19658 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19658 > Project: Cassandra > Issue Type: Bug >Reporter: Brandon Williams >Assignee: Brandon Williams >Priority: Normal > > This can be seen failing in butler: > https://butler.cassandra.apache.org/#/ci/upstream/workflow/Cassandra-5.0/failure/replace_address_test/TestReplaceAddress/test_restart_failed_replace > {noformat} > ccmlib.node.TimeoutError: 14 May 2024 18:19:08 [node1] after 120.13/120 > seconds Missing: ['FatClient /127.0.0.4:7000 has been silent for 3ms, > removing from gossip'] not found in system.log: > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Created] (CASSANDRA-19658) Test failure: replace_address_test.py::TestReplaceAddress::test_restart_failed_replace
Brandon Williams created CASSANDRA-19658: Summary: Test failure: replace_address_test.py::TestReplaceAddress::test_restart_failed_replace Key: CASSANDRA-19658 URL: https://issues.apache.org/jira/browse/CASSANDRA-19658 Project: Cassandra Issue Type: Bug Reporter: Brandon Williams This can be seen failing in butler: https://butler.cassandra.apache.org/#/ci/upstream/workflow/Cassandra-5.0/failure/replace_address_test/TestReplaceAddress/test_restart_failed_replace {noformat} ccmlib.node.TimeoutError: 14 May 2024 18:19:08 [node1] after 120.13/120 seconds Missing: ['FatClient /127.0.0.4:7000 has been silent for 3ms, removing from gossip'] not found in system.log: {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-19534) unbounded queues in native transport requests lead to node instability
[ https://issues.apache.org/jira/browse/CASSANDRA-19534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848777#comment-17848777 ] Ekaterina Dimitrova commented on CASSANDRA-19534: - {quote}+1 LGTM (dropped a couple more little cleanup nits in the PR) {quote} Does this mean this is ready to commit? :D I am excited as this is one of the last two 5.0 rc blockers > unbounded queues in native transport requests lead to node instability > -- > > Key: CASSANDRA-19534 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19534 > Project: Cassandra > Issue Type: Bug > Components: Legacy/Local Write-Read Paths >Reporter: Jon Haddad >Assignee: Alex Petrov >Priority: Normal > Fix For: 4.1.x, 5.0-rc, 5.x > > Attachments: Scenario 1 - QUEUE + Backpressure.jpg, Scenario 1 - > QUEUE.jpg, Scenario 1 - Stock.jpg, Scenario 2 - QUEUE + Backpressure.jpg, > Scenario 2 - QUEUE.jpg, Scenario 2 - Stock.jpg, ci_summary.html, > image-2024-05-03-16-08-10-101.png, screenshot-1.png, screenshot-2.png, > screenshot-3.png, screenshot-4.png, screenshot-5.png, screenshot-6.png, > screenshot-7.png, screenshot-8.png, screenshot-9.png > > Time Spent: 9h 50m > Remaining Estimate: 0h > > When a node is under pressure, hundreds of thousands of requests can show up > in the native transport queue, and it looks like it can take way longer to > timeout than is configured. We should be shedding load much more > aggressively and use a bounded queue for incoming work. 
This is extremely > evident when we combine a resource consuming workload with a smaller one: > Running 5.0 HEAD on a single node as of today: > {noformat} > # populate only > easy-cass-stress run RandomPartitionAccess -p 100 -r 1 > --workload.rows=10 --workload.select=partition --maxrlat 100 --populate > 10m --rate 50k -n 1 > # workload 1 - larger reads > easy-cass-stress run RandomPartitionAccess -p 100 -r 1 > --workload.rows=10 --workload.select=partition --rate 200 -d 1d > # second workload - small reads > easy-cass-stress run KeyValue -p 1m --rate 20k -r .5 -d 24h{noformat} > It appears our results don't time out at the requested server time either: > > {noformat} > Writes Reads > Deletes Errors > Count Latency (p99) 1min (req/s) | Count Latency (p99) 1min (req/s) | > Count Latency (p99) 1min (req/s) | Count 1min (errors/s) > 950286 70403.93 634.77 | 789524 70442.07 426.02 | > 0 0 0 | 9580484 18980.45 > 952304 70567.62 640.1 | 791072 70634.34 428.36 | > 0 0 0 | 9636658 18969.54 > 953146 70767.34 640.1 | 791400 70767.76 428.36 | > 0 0 0 | 9695272 18969.54 > 956833 71171.28 623.14 | 794009 71175.6 412.79 | > 0 0 0 | 9749377 19002.44 > 959627 71312.58 656.93 | 795703 71349.87 435.56 | > 0 0 0 | 9804907 18943.11{noformat} > > After stopping the load test altogether, it took nearly a minute before the > requests were no longer queued. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
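The "bounded queue for incoming work" idea in the ticket can be illustrated with a small, self-contained sketch using java.util.concurrent; this is a hypothetical demonstration of the pattern, not Cassandra's actual native transport code. A bounded queue with an explicit rejection policy sheds excess load immediately instead of letting requests pile up far beyond their timeout.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class BoundedQueueDemo {
    public static void main(String[] args) throws Exception {
        // One worker, a queue bounded at 2 tasks, and AbortPolicy so that any
        // submission beyond capacity fails fast instead of queueing unboundedly.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 0, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(2),
                new ThreadPoolExecutor.AbortPolicy());

        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch block = new CountDownLatch(1);
        // Occupy the single worker so subsequent tasks must queue.
        pool.submit(() -> {
            started.countDown();
            try { block.await(); } catch (InterruptedException ignored) { }
        });
        started.await(); // ensure the blocker is running, not sitting in the queue

        int accepted = 0, shed = 0;
        for (int i = 0; i < 5; i++) {
            try {
                pool.submit(() -> { });
                accepted++;                        // fit in the bounded queue
            } catch (RejectedExecutionException e) {
                shed++;                            // caller learns immediately and can back off
            }
        }
        System.out.println(accepted + " accepted, " + shed + " shed");

        block.countDown();
        pool.shutdown();
    }
}
```

With an unbounded queue every one of the five submissions would be accepted and could sit queued long past any client timeout; the bounded variant pushes backpressure to the caller at submit time.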
[jira] [Updated] (CASSANDRA-19648) Flaky test: StartupChecksTest#testKernelBug1057843Check() on Non-Linux OS
[ https://issues.apache.org/jira/browse/CASSANDRA-19648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Williams updated CASSANDRA-19648: - Fix Version/s: 5.1 > Flaky test: StartupChecksTest#testKernelBug1057843Check() on Non-Linux OS > - > > Key: CASSANDRA-19648 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19648 > Project: Cassandra > Issue Type: Improvement > Components: Test/unit >Reporter: Ling Mao >Assignee: Ling Mao >Priority: Low > Fix For: 5.1-alpha1, 5.1 > > Time Spent: 20m > Remaining Estimate: 0h > > Flaky test: StartupChecksTest#testKernelBug1057843Check() cannot pass in my > MacOs(maybe Windows OS). Just skip this test when tested on Non-Linux OS -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-19656) Revisit disabling chronicle analytics
[ https://issues.apache.org/jira/browse/CASSANDRA-19656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Williams updated CASSANDRA-19656: - Test and Documentation Plan: run CI Status: Patch Available (was: Open) > Revisit disabling chronicle analytics > - > > Key: CASSANDRA-19656 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19656 > Project: Cassandra > Issue Type: Task > Components: Local/Other >Reporter: Brandon Williams >Assignee: Brandon Williams >Priority: Normal > Fix For: 5.0.x, 5.x > > > We first considered this in CASSANDRA-18538 but determined it wasn't a > problem. We have upgraded chronicle in CASSANDRA-18049 so we should > reconfirm with packet analysis that nothing is phoning home, and perhaps > consider taking further precautions by proactively disabling it. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-19656) Revisit disabling chronicle analytics
[ https://issues.apache.org/jira/browse/CASSANDRA-19656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848742#comment-17848742 ] Brandon Williams commented on CASSANDRA-19656: -- bq. Would be supportive of revisiting alternate libs in a future major to avoid the need to explicitly disable as well. I am in agreement there, it would be best to not have to do this again. For now, I have again confirmed nothing is phoning home while capturing packets during manual tests, unit tests, in-jvm dtests, and the fql and auditlog python dtests. [Here|https://github.com/driftx/cassandra/tree/CASSANDRA-19656-5.0] is a branch that revives my patch from CASSANDRA-18538 to disable in the server options, and takes [~aweisberg]'s suggestion of also disabling in static init in DatabaseDescriptor. If that looks agreeable I will merge up and run CI. > Revisit disabling chronicle analytics > - > > Key: CASSANDRA-19656 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19656 > Project: Cassandra > Issue Type: Task > Components: Local/Other >Reporter: Brandon Williams >Assignee: Brandon Williams >Priority: Normal > Fix For: 5.0.x, 5.x > > > We first considered this in CASSANDRA-18538 but determined it wasn't a > problem. We have upgraded chronicle in CASSANDRA-18049 so we should > reconfirm with packet analysis that nothing is phoning home, and perhaps > consider taking further precautions by proactively disabling it. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Comment Edited] (CASSANDRA-19556) Add guardrail to block DDL/DCL queries and replace alter_table_enabled guardrail
[ https://issues.apache.org/jira/browse/CASSANDRA-19556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848709#comment-17848709 ] Michael Semb Wever edited comment on CASSANDRA-19556 at 5/22/24 8:00 PM: - bq. it is removing alter_table_enabled guardrail. We can deprecate alter_table_enabled in 5.0.1 If I'm reading all comments correctly, it seems the right approach is this ticket waits til 5.0.0 is released, and then introduces the system property in 4.0, 4.1, 5.0 branches, deprecates alter_table_enabled in cassandra-5.0, and applies the current patch to trunk. was (Author: michaelsembwever): bq. it is removing alter_table_enabled guardrail. We can deprecate alter_table_enabled in 5.0.1 If I'm reading all comments correctly, it seems the right approach is this ticket waits til 5.0.0 is released, and then introduces the system property in Add guardrail to block DDL/DCL queries and replace alter_table_enabled > guardrail > > > Key: CASSANDRA-19556 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19556 > Project: Cassandra > Issue Type: New Feature > Components: Feature/Guardrails >Reporter: Yuqi Yan >Assignee: Yuqi Yan >Priority: Normal > Fix For: 5.x > > Time Spent: 1h 40m > Remaining Estimate: 0h > > Sometimes we want to block DDL/DCL queries to stop new schemas being created > or roles created. (e.g. when doing live-upgrade) > For DDL guardrail current implementation won't block the query if it's no-op > (e.g. CREATE TABLE...IF NOT EXISTS, but table already exists, etc. The > guardrail check is added in apply() right after all the existence check) > I don't have preference on either block every DDL query or check whether if > it's no-op here. Just we have some users always run CREATE..IF NOT EXISTS.. > at startup, which is no-op but will be blocked by this guardrail and failed > to start. 
> > 4.1 PR: [https://github.com/apache/cassandra/pull/3248] > trunk PR: [https://github.com/apache/cassandra/pull/3275]
[jira] [Commented] (CASSANDRA-19556) Add guardrail to block DDL/DCL queries and replace alter_table_enabled guardrail
[ https://issues.apache.org/jira/browse/CASSANDRA-19556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848712#comment-17848712 ] Stefan Miklosovic commented on CASSANDRA-19556: --- OK, that sounds good.
[jira] [Commented] (CASSANDRA-19556) Add guardrail to block DDL/DCL queries and replace alter_table_enabled guardrail
[ https://issues.apache.org/jira/browse/CASSANDRA-19556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848709#comment-17848709 ] Michael Semb Wever commented on CASSANDRA-19556: bq. it is removing alter_table_enabled guardrail. We can deprecate alter_table_enabled in 5.0.1 If I'm reading all comments correctly, it seems the right approach is this ticket waits til 5.0.0 is released, and then introduces the system property in
[jira] [Created] (CASSANDRA-19657) Use more appropriate GC logging
Jon Haddad created CASSANDRA-19657: -- Summary: Use more appropriate GC logging Key: CASSANDRA-19657 URL: https://issues.apache.org/jira/browse/CASSANDRA-19657 Project: Cassandra Issue Type: Bug Reporter: Jon Haddad Our default GC settings spam the gc log, resulting in unnecessary IO and CPU usage, and make the logs less effective to use with tools like GCEasy. Here are some examples of what our defaults log: {noformat} [2024-05-22T09:14:45.558-0700][0.022s][11668][9219][trace] Trying to allocate at address 0x0006c000 heap of size 0x1 [2024-05-22T09:14:45.559-0700][0.022s][11668][9219][debug] Running G1 PreTouch with 4 workers for 4 work units pre-touching 4294967296B. [2024-05-22T09:23:33.653-0700][121.363s][12868][21251][trace] GC(15) | 255|0x0007bf00, 0x0007bf00, 0x0007c000| 0%| F| |TAMS 0x0007bf00, 0x0007bf00| Untracked [2024-05-22T09:23:33.528-0700][121.238s][12868][21251][trace] GC(15) | 178|0x00077200, 0x00077300, 0x00077300|100%| E|CS|TAMS 0x00077200, 0x00077200| Complete [2024-05-22T09:24:01.731-0700][149.441s][12868][21251][debug] GC(16) Heap before GC invocations=16 (full 0): garbage-first heap total 4194304K, used 1918824K [0x0006c000, 0x0007c000) [2024-05-22T09:24:01.731-0700][149.441s][12868][21251][debug] GC(16) region size 16384K, 108 young (1769472K), 3 survivors (49152K) [2024-05-22T09:24:01.731-0700][149.441s][12868][21251][debug] GC(16) Metaspace used 63070K, capacity 65566K, committed 65680K, reserved 1105920K [2024-05-22T09:24:01.731-0700][149.441s][12868][21251][debug] GC(16) class space used 7274K, capacity 8432K, committed 8448K, reserved 1048576K {noformat} Quickly looking at the breakdown of trace, debug, and info: {noformat} grep trace logs/gc.log | wc -l 7771 grep debug logs/gc.log | wc -l 188 grep info logs/gc.log | wc -l 1065 {noformat} In summary, we're seeing almost 8K debug and trace log lines for every 1K of INFO, and the debug and trace output has no value for a user who would otherwise leave it unset. 
Really interested users might want this, but they should know how to enable it. We should use the INFO settings by default: {noformat} JVM_OPTS="$JVM_OPTS -Xlog:gc=info:file=${CASSANDRA_LOG_DIR}/gc.log:time,uptime,pid,tid,level:filecount=10,filesize=10485760" {noformat}
[jira] [Commented] (CASSANDRA-19656) Revisit disabling chronicle analytics
[ https://issues.apache.org/jira/browse/CASSANDRA-19656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848673#comment-17848673 ] C. Scott Andreas commented on CASSANDRA-19656: -- Brandon, good catch on the upgrade and on revisiting. +1 on proactively disabling to make sure we don't get surprised in the future. Would be supportive of revisiting alternate libs in a future major to avoid the need to explicitly disable as well.
[jira] [Commented] (CASSANDRA-19635) Update target Cassandra versions for integration tests, support new 5.0.x
[ https://issues.apache.org/jira/browse/CASSANDRA-19635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848669#comment-17848669 ] Bret McGuire commented on CASSANDRA-19635: -- An additional note here: there's an interesting goal to add support for running tests against current C* master, which at the moment would be the "tip of the spear" of Cassandra 5.0 development. While this is a useful feature to add, it's (a) secondary to the immediate concern about validating the Java driver against Cassandra 5.0.x and (b) something that should live in any CI infrastructure built for the ASF (CASSANDRA-18971) rather than something we add to the legacy DataStax infrastructure. > Update target Cassandra versions for integration tests, support new 5.0.x > - > > Key: CASSANDRA-19635 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19635 > Project: Cassandra > Issue Type: Task > Components: Client/java-driver >Reporter: Bret McGuire >Assignee: Shiva Kalyan >Priority: Normal > > {color:#172b4d}[CASSANDRA-19292|https://issues.apache.org/jira/browse/CASSANDRA-19292] > added support for running integration tests against Cassandra 4.1.x but we > still need the ability to run against Cassandra 5.0.x. As of this writing we > need [riptano/ccm|https://github.com/riptano/ccm] to manage Cassandra 5.0.x > clusters. The DataStax CI infrastructure, however, uses a private fork of > ccm which adds the ability to manage DSE clusters (something riptano/ccm > can't do right now). So we presumably need to do one of the following: > {color} > * Port Cassandra 5.0.x support to the private fork > * Port DSE support to riptano/ccm > * Change the build process to install both riptano/ccm and the private fork > into distinct venvs and manage accordingly
[jira] [Commented] (CASSANDRA-19632) wrap tracing logs in isTraceEnabled across the codebase
[ https://issues.apache.org/jira/browse/CASSANDRA-19632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848637#comment-17848637 ] Stefan Miklosovic commented on CASSANDRA-19632: --- That is probably true. I might check that. We can revert the cases where it is logging like this: {code:java} if (logger.isTraceEnabled()) logger.trace("a message"); {code} In these cases, I think it is pretty much overkill. It is more about not evaluating the message itself if there are some arguments. > wrap tracing logs in isTraceEnabled across the codebase > --- > > Key: CASSANDRA-19632 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19632 > Project: Cassandra > Issue Type: Improvement > Components: Legacy/Core >Reporter: Stefan Miklosovic >Assignee: Stefan Miklosovic >Priority: Normal > Fix For: 5.x > > Time Spent: 10m > Remaining Estimate: 0h > > Our usage of logger.isTraceEnabled across the codebase is inconsistent. This > would also fix issues similar to those in e.g. CASSANDRA-19429 as [~rustyrazorblade] > suggested. > We should fix this at least in trunk and 5.0 (not critical though) and > probably come up with a checkstyle rule to prevent not calling isTraceEnabled > while logging with TRACE level.
[jira] [Commented] (CASSANDRA-19632) wrap tracing logs in isTraceEnabled across the codebase
[ https://issues.apache.org/jira/browse/CASSANDRA-19632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848634#comment-17848634 ] Brandon Williams commented on CASSANDRA-19632: -- I think we should check some microbenchmarks around this. My understanding is the trace function will call isTraceEnabled itself, so wrapping this purely for trace logging calls shouldn't be beneficial, only if the flag is used to prevent other execution from occurring.
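The trade-off discussed in these two comments can be sketched with a minimal standalone example. Note this uses a hypothetical stub logger, not the actual SLF4J API Cassandra uses; it only mirrors the relevant behaviour: `trace(String)` checks the level internally, so an `isTraceEnabled()` guard pays off only when building the message (or its arguments) is expensive.

```java
// Hypothetical stub logger illustrating why guarding trace calls matters
// only for expensive message construction: trace() already checks the level.
public class TraceGuardSketch {
    static int expensiveCalls = 0;

    // Stands in for an expensive argument computation,
    // e.g. serializing a partition for a trace message.
    static String expensiveDescription() {
        expensiveCalls++;
        return "details";
    }

    static class StubLogger {
        final boolean traceEnabled;
        StubLogger(boolean traceEnabled) { this.traceEnabled = traceEnabled; }
        boolean isTraceEnabled() { return traceEnabled; }
        // Like SLF4J, the trace method itself checks the level before logging.
        void trace(String msg) { if (traceEnabled) System.out.println(msg); }
    }

    public static void main(String[] args) {
        StubLogger logger = new StubLogger(false); // TRACE disabled

        // Unguarded call: nothing is logged, but the argument is still
        // evaluated before trace() can check the level.
        logger.trace("state: " + expensiveDescription());
        System.out.println("unguarded evaluations: " + expensiveCalls);

        // Guarded call: the argument is never computed when TRACE is off.
        expensiveCalls = 0;
        if (logger.isTraceEnabled())
            logger.trace("state: " + expensiveDescription());
        System.out.println("guarded evaluations: " + expensiveCalls);
    }
}
```

With a cheap constant message, the guard buys nothing; the benchmark question above is whether the guarded form costs anything extra.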
[jira] [Updated] (CASSANDRA-19632) wrap tracing logs in isTraceEnabled across the codebase
[ https://issues.apache.org/jira/browse/CASSANDRA-19632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Miklosovic updated CASSANDRA-19632: -- Test and Documentation Plan: ci Status: Patch Available (was: In Progress) I've created a PR. I have not added a checkstyle rule; it is actually quite tricky to get right and I do not think we can generalize it enough. It might be very specific to the code.
[jira] [Commented] (CASSANDRA-19593) Transactional Guardrails
[ https://issues.apache.org/jira/browse/CASSANDRA-19593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848624#comment-17848624 ] Sam Tunnicliffe commented on CASSANDRA-19593: - {quote}This brings us to a more general problem of transactional configuration which should be done as well. It is questionable if it is desirable to do it as part of this ticket or not, however, I would like to look into how we could do that as well. {quote} We've been working on some proposals for this, some of which were briefly discussed in CASSANDRA-12937. I agree with [~ifesdjeen] that this warrants its own CEP. I know he's been working on a document for that; I'll see if we can get that ready for circulation. > Transactional Guardrails > > > Key: CASSANDRA-19593 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19593 > Project: Cassandra > Issue Type: New Feature > Components: Feature/Guardrails, Transactional Cluster Metadata >Reporter: Stefan Miklosovic >Assignee: Stefan Miklosovic >Priority: Normal > Fix For: 5.x > > Time Spent: 10m > Remaining Estimate: 0h > > I think it is time to start to think about this more seriously. TCM is > getting into pretty nice shape and we might start to investigate how to do > this.
[jira] [Updated] (CASSANDRA-19656) Revisit disabling chronicle analytics
[ https://issues.apache.org/jira/browse/CASSANDRA-19656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Williams updated CASSANDRA-19656: - Change Category: Quality Assurance Complexity: Normal Component/s: Local/Other Fix Version/s: 5.0.x 5.x Assignee: Brandon Williams Status: Open (was: Triage Needed)
[jira] [Created] (CASSANDRA-19656) Revisit disabling chronicle analytics
Brandon Williams created CASSANDRA-19656: Summary: Revisit disabling chronicle analytics Key: CASSANDRA-19656 URL: https://issues.apache.org/jira/browse/CASSANDRA-19656 Project: Cassandra Issue Type: Task Reporter: Brandon Williams We first considered this in CASSANDRA-18538 but determined it wasn't a problem. We have upgraded chronicle in CASSANDRA-18049 so we should reconfirm with packet analysis that nothing is phoning home, and perhaps consider taking further precautions by proactively disabling it.
[jira] [Commented] (CASSANDRA-19593) Transactional Guardrails
[ https://issues.apache.org/jira/browse/CASSANDRA-19593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848590#comment-17848590 ] Stefan Miklosovic commented on CASSANDRA-19593: --- There is a minimumReplicationFactor guardrail which looks into DatabaseDescriptor.getDefaultKeyspaceRF() when validating the value: {code:java} public static void validateMinRFThreshold(int warn, int fail) { validateMinIntThreshold(warn, fail, "minimum_replication_factor"); if (fail > DatabaseDescriptor.getDefaultKeyspaceRF()) throw new IllegalArgumentException(format("minimum_replication_factor_fail_threshold to be set (%d) " + "cannot be greater than default_keyspace_rf (%d)", fail, DatabaseDescriptor.getDefaultKeyspaceRF())); } {code} This is similarly done for maximum_replication_factor. Obviously, conf.default_keyspace_rf can be different per node, if misconfigured, so transformation application would not be the same on all nodes and it might fail on some and not fail on others. This brings us to a more general problem of transactional configuration which should be done as well. It is questionable if it is desirable to do it as part of this ticket or not, however, I would like to look into how we could do that as well.
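The per-node divergence described in this comment can be shown with a simplified standalone version of the quoted check. This is a sketch, not the actual Cassandra code: the `DatabaseDescriptor.getDefaultKeyspaceRF()` lookup is replaced by an explicit parameter so two differently-configured "nodes" can be compared side by side; everything else follows the snippet.

```java
// Simplified sketch of validateMinRFThreshold from the snippet above,
// with the node-local DatabaseDescriptor lookup made an explicit parameter.
public class MinRFCheckSketch {
    static void validateMinRFThreshold(int fail, int defaultKeyspaceRF) {
        if (fail > defaultKeyspaceRF)
            throw new IllegalArgumentException(String.format(
                "minimum_replication_factor_fail_threshold to be set (%d) " +
                "cannot be greater than default_keyspace_rf (%d)",
                fail, defaultKeyspaceRF));
    }

    public static void main(String[] args) {
        // Node A: default_keyspace_rf=3, so a fail threshold of 3 is accepted.
        validateMinRFThreshold(3, 3);

        // Node B: misconfigured with default_keyspace_rf=2; the very same
        // guardrail update throws here, so a cluster-wide transformation
        // would succeed on node A and fail on node B.
        try {
            validateMinRFThreshold(3, 2);
        } catch (IllegalArgumentException e) {
            System.out.println("node B rejected the update: " + e.getMessage());
        }
    }
}
```

The same transformation succeeding on one node while throwing on another is exactly the divergence that motivates a transactional, cluster-wide view of such settings.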
[jira] [Comment Edited] (CASSANDRA-19556) Add guardrail to block DDL/DCL queries and replace alter_table_enabled guardrail
[ https://issues.apache.org/jira/browse/CASSANDRA-19556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848560#comment-17848560 ] Stefan Miklosovic edited comment on CASSANDRA-19556 at 5/22/24 11:00 AM: - That being said, I am curious if the patch for trunk might go in as is - it is removing alter_table_enabled guardrail. If that is not desirable, then shadowing it is the next option. That means I would need to get alter_table_enabled back. Is everybody OK with this? Otherwise I am all ears how to do this - that was my whole point why I wanted to do it before 5.0.0 is out by removing the old one. Or we just do what Sam suggests, we just remove this feature altogether and replace it by a system property. That will work but it will not work while cluster is up and people need to prevent schema modifications operationally. was (Author: smiklosovic): That being said, I am curious if the patch for trunk might go in as is - it is removing alter_table_enabled guardrail. If that is not desirable, then shadowing it is the next option. That means I would need to get alter_table_enabled back. Is everybody OK with this? Otherwise I am all ears how to do this - that was my whole point why I wanted to do it before 5.0.0 is out by removing the old one.
[jira] [Comment Edited] (CASSANDRA-19556) Add guardrail to block DDL/DCL queries and replace alter_table_enabled guardrail
[ https://issues.apache.org/jira/browse/CASSANDRA-19556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848553#comment-17848553 ] Stefan Miklosovic edited comment on CASSANDRA-19556 at 5/22/24 10:49 AM: - OK, I think we can just defer this effort to trunk / 5.1 then. Removing it from 5.0-rc. was (Author: smiklosovic): OK, I think we can just defer this effor to trunk / 5.1 then. Removing it from 5.0-rc.
[jira] [Updated] (CASSANDRA-19556) Add guardrail to block DDL/DCL queries and replace alter_table_enabled guardrail
[ https://issues.apache.org/jira/browse/CASSANDRA-19556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Miklosovic updated CASSANDRA-19556: -- Fix Version/s: (was: 5.0-rc)
[jira] [Commented] (CASSANDRA-19556) Add guardrail to block DDL/DCL queries and replace alter_table_enabled guardrail
[ https://issues.apache.org/jira/browse/CASSANDRA-19556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848553#comment-17848553 ] Stefan Miklosovic commented on CASSANDRA-19556: --- OK, I think we can just defer this effort to trunk / 5.1 then. Removing it from 5.0-rc.
[jira] [Commented] (CASSANDRA-19556) Add guardrail to block DDL/DCL queries and replace alter_table_enabled guardrail
[ https://issues.apache.org/jira/browse/CASSANDRA-19556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848544#comment-17848544 ] Sam Tunnicliffe commented on CASSANDRA-19556: - [~mck] this certainly isn't critical for 5.1/6.0, my comment was just intended as a counterpoint to illustrate why it might be useful in a 5.0.x. To that point, I'd definitely think about adding _something_ to minors in branches with upgrade paths to current trunk. Not an actual guardrail, just a system property or similar to optionally disable certain operations immediately prior to upgrade. If we did go down that route, there is some precedent from back in the day for mandating a minimum minor version prior to a major upgrade (from {{{}NEWS.txt{}}}): {code:java} Upgrade to 3.0 is supported from Cassandra 2.1 versions greater or equal to 2.1.9, or Cassandra 2.2 versions greater or equal to 2.2.2. {code} but like I said, this isn't critical for upgrading to current trunk and I'm definitely not advocating for anything in 5.0-rc.
[jira] [Updated] (CASSANDRA-19650) CCM wrongly interprets CASSANDRA_USE_JDK11 for Cassandra 4.x
[ https://issues.apache.org/jira/browse/CASSANDRA-19650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Semb Wever updated CASSANDRA-19650: --- Discovered By: DTest (was: User Report) Fix Version/s: 3.0.x 3.11.x 4.0.x 4.1.x (was: NA) Severity: Normal (was: Low) > CCM wrongly interprets CASSANDRA_USE_JDK11 for Cassandra 4.x > > > Key: CASSANDRA-19650 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19650 > Project: Cassandra > Issue Type: Bug > Components: Build, Test/dtest/python >Reporter: Jacek Lewandowski >Assignee: Jacek Lewandowski >Priority: Normal > Fix For: 3.0.x, 3.11.x, 4.0.x, 4.1.x > > > CCM interprets {{CASSANDRA_USE_JDK11}} only by its existence in the > environment rather than by its actual value (true/false). > I can see two solutions: > - make it interpret {{CASSANDRA_USE_JDK11}} properly > - do not take into account {{CASSANDRA_USE_JDK11}} in the current env and set > it or unset it automatically when starting a node based on which Java > version was selected
[jira] [Commented] (CASSANDRA-19650) CCM wrongly interprets CASSANDRA_USE_JDK11 for Cassandra 4.x
[ https://issues.apache.org/jira/browse/CASSANDRA-19650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848505#comment-17848505 ] Michael Semb Wever commented on CASSANDRA-19650: In addition, CASSANDRA-19636 appears to have broken jdk-switching (upgrade_through_versions) when, after an upgrade and jdk-switch, a new node is added that relies on a jdk-switch. The following logging shows a four node cluster starting up using jdk8 (from matching $PATH and $JAVA_HOME) on 4.1.6. It then switches all four nodes to jdk11 (using $JAVA11_HOME) and 5.0. But when it tries to add the 5th node (on jdk11 and 5.0) it suddenly fails. {noformat} upgrade_tests/upgrade_through_versions_test.py::TestUpgrade_indev_4_1_x_To_indev_5_0_x::test_bootstrap_multidc … 21:00:02,930 ccm INFO Supported Java versions for Cassandra distribution in '/parallel-ci/work/.ccm/repository/githubCOLONapacheSLASHcassandra-4.1': None 21:00:03,64 ccm INFO node1: Using the current Java 8 available on PATH for the current invocation of Cassandra 4.1.6. 21:00:03,153 ccm INFO Supported Java versions for Cassandra distribution in '/parallel-ci/work/.ccm/repository/githubCOLONapacheSLASHcassandra-4.1': None 21:00:03,284 ccm INFO node1: Using the current Java 8 available on PATH for the current invocation of Cassandra 4.1.6. 21:00:03,333 ccm INFO Starting node1 with JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64 java_version=8 cassandra_version=4.1.6, install_dir=/parallel-ci/work/.ccm/repository/githubCOLONapacheSLASHcassandra-4.1 21:00:08,458 ccm INFO Supported Java versions for Cassandra distribution in '/parallel-ci/work/.ccm/repository/githubCOLONapacheSLASHcassandra-4.1': None 21:00:08,595 ccm INFO node2: Using the current Java 8 available on PATH for the current invocation of Cassandra 4.1.6. 
21:00:08,707 ccm INFO Supported Java versions for Cassandra distribution in '/parallel-ci/work/.ccm/repository/githubCOLONapacheSLASHcassandra-4.1': None 21:00:08,841 ccm INFO node2: Using the current Java 8 available on PATH for the current invocation of Cassandra 4.1.6. 21:00:08,889 ccm INFO Starting node2 with JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64 java_version=8 cassandra_version=4.1.6, install_dir=/parallel-ci/work/.ccm/repository/githubCOLONapacheSLASHcassandra-4.1 21:00:23,522 ccm INFO Supported Java versions for Cassandra distribution in '/parallel-ci/work/.ccm/repository/githubCOLONapacheSLASHcassandra-4.1': None 21:00:23,657 ccm INFO node3: Using the current Java 8 available on PATH for the current invocation of Cassandra 4.1.6. 21:00:23,756 ccm INFO Supported Java versions for Cassandra distribution in '/parallel-ci/work/.ccm/repository/githubCOLONapacheSLASHcassandra-4.1': None 21:00:23,901 ccm INFO node3: Using the current Java 8 available on PATH for the current invocation of Cassandra 4.1.6. 21:00:23,947 ccm INFO Starting node3 with JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64 java_version=8 cassandra_version=4.1.6, install_dir=/parallel-ci/work/.ccm/repository/githubCOLONapacheSLASHcassandra-4.1 21:00:38,573 ccm INFO Supported Java versions for Cassandra distribution in '/parallel-ci/work/.ccm/repository/githubCOLONapacheSLASHcassandra-4.1': None 21:00:38,707 ccm INFO node4: Using the current Java 8 available on PATH for the current invocation of Cassandra 4.1.6. 21:00:38,803 ccm INFO Supported Java versions for Cassandra distribution in '/parallel-ci/work/.ccm/repository/githubCOLONapacheSLASHcassandra-4.1': None 21:00:38,931 ccm INFO node4: Using the current Java 8 available on PATH for the current invocation of Cassandra 4.1.6. 
21:00:38,978 ccm INFO Starting node4 with JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64 java_version=8 cassandra_version=4.1.6, install_dir=/parallel-ci/work/.ccm/repository/githubCOLONapacheSLASHcassandra-4.1 … 21:02:18,416 ccm INFO Supported Java versions for Cassandra distribution in '/parallel-ci/work/.ccm/repository/githubCOLONapacheSLASHcassandra-5.0': [11, 17] 21:02:18,556 ccm WARNING node1: The current Java 8 is not supported by Cassandra 5.0 (supported versions: [11, 17]). 21:02:18,557 ccm INFO node1: CCM has found {17: 'JAVA17_HOME', 8: 'JAVA_HOME', 11: 'JAVA11_HOME'} Java distributions, the required Java version for Cassandra 5.0 is [11, 17]. … FAILED {noformat} ref: https://pastebin.com/JfGziJHh [~aweisberg], have you been able to provide steps to reproduce this ? > CCM wrongly interprets CASSANDRA_USE_JDK11 for Cassandra 4.x > > > Key: CASSANDRA-19650 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19650 > Project: Cassandra > Issue Type: Bug > Components: Build, Test/dtest/python >Reporter: Jacek Lewandowski >Assignee: Jacek Lewandowski >Priority: Normal >
[jira] [Commented] (CASSANDRA-19650) CCM wrongly interprets CASSANDRA_USE_JDK11 for Cassandra 4.x
[ https://issues.apache.org/jira/browse/CASSANDRA-19650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848503#comment-17848503 ] Michael Semb Wever commented on CASSANDRA-19650: All CI, all branches before 5, is currently broken because of CASSANDRA-19636. Specifically the cqlsh ({{`pylib/cassandra-cqlsh-tests.sh`}}) tests. {noformat} 18:29:42 + ccm create test -n 1 --install-dir=/home/cassandra/cassandra 18:29:43 Current cluster is now: test 18:29:43 16:29:43,74 ccm DEBUG using balanced tokens for non-vnode cluster 18:29:43 + ccm updateconf 'user_defined_functions_enabled: true' 18:29:44 + ccm updateconf 'scripted_user_defined_functions_enabled: true' 18:29:44 ++ ccm node1 versionfrombuild 18:29:44 + version_from_build=4.1.6 18:29:44 ++ python -c 'from distutils.version import LooseVersion 18:29:44 print ("postcdc" if LooseVersion("4.1.6") >= "3.8" else "precdc") 18:29:44 ' 18:29:45 :2: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead. 18:29:45 + export pre_or_post_cdc=postcdc 18:29:45 + pre_or_post_cdc=postcdc 18:29:45 + case "${pre_or_post_cdc}" in 18:29:45 + ccm updateconf 'cdc_enabled: true' 18:29:45 + ccm start --wait-for-binary-proto 18:29:46 16:29:46,124 ccm INFO Supported Java versions for Cassandra distribution in '/home/cassandra/cassandra': None 18:29:46 16:29:46,186 ccm WARNING node1: The current Java 8 is not supported by Cassandra 4.1.6 (supported versions: [11]). 
18:29:46 Traceback (most recent call last): 18:29:46 File "/home/cassandra/cassandra/venv/bin/ccm", line 7, in 18:29:46 exec(compile(f.read(), __file__, 'exec')) 18:29:46 File "/home/cassandra/cassandra/venv/src/ccm/ccm", line 112, in 18:29:46 cmd.run() 18:29:46 File "/home/cassandra/cassandra/venv/src/ccm/ccmlib/cmds/cluster_cmds.py", line 513, in run 18:29:46 if self.cluster.start(no_wait=self.options.no_wait, 18:29:46 File "/home/cassandra/cassandra/venv/src/ccm/ccmlib/cluster.py", line 526, in start 18:29:46 p = node.start(update_pid=False, jvm_args=jvm_args, jvm_version=jvm_version, 18:29:46 File "/home/cassandra/cassandra/venv/src/ccm/ccmlib/node.py", line 820, in start 18:29:46 env = self.get_env() 18:29:46 File "/home/cassandra/cassandra/venv/src/ccm/ccmlib/node.py", line 240, in get_env 18:29:46 env = common.update_java_version(jvm_version=None, 18:29:46 File "/home/cassandra/cassandra/venv/src/ccm/ccmlib/common.py", line 960, in update_java_version 18:29:46 return _update_java_version(current_java_version, current_java_home_version, 18:29:46 File "/home/cassandra/cassandra/venv/src/ccm/ccmlib/common.py", line 1031, in _update_java_version 18:29:46 raise RuntimeError('{}: Cannot find any Java distribution for the current invocation. Available Java distributions: {}, required Java distributions: {}' 18:29:46 RuntimeError: node1: Cannot find any Java distribution for the current invocation. Available Java distributions: {8: 'JAVA_HOME'}, required Java distributions: [11] {noformat} Ref: - https://ci-cassandra.apache.org/job/Cassandra-4.1-cqlsh-tests/435/ - https://nightlies.apache.org/cassandra/cassandra-4.1/Cassandra-4.1-cqlsh-tests/435/Cassandra-4.1-cqlsh-tests/cython=no,jdk=jdk_1.8_latest,label=cassandra/ This is because it was expecting {{`CASSANDRA_USE_JDK11=false`}} to work. It never did, but before CASSANDRA-19636 it was being ignored. 
Ref: https://github.com/apache/cassandra/blob/cassandra-4.1/pylib/cassandra-cqlsh-tests.sh#L47 > CCM wrongly interprets CASSANDRA_USE_JDK11 for Cassandra 4.x > > > Key: CASSANDRA-19650 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19650 > Project: Cassandra > Issue Type: Bug > Components: Build, Test/dtest/python >Reporter: Jacek Lewandowski >Assignee: Jacek Lewandowski >Priority: Normal > Fix For: NA > > > CCM interprets {{CASSANDRA_USE_JDK11}} only by its existence in the > environment rather than by its actual value (true/false). > I can see two solutions: > - make it interpret {{CASSANDRA_USE_JDK11}} properly > - do not take into account {{CASSANDRA_USE_JDK11}} in the current env and set > it or unset it automatically when starting a node based on which Java > version was selected
[jira] [Updated] (CASSANDRA-19651) idealCLWriteLatency metric reports the worst response time instead of the time when ideal CL is satisfied
[ https://issues.apache.org/jira/browse/CASSANDRA-19651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitry Konstantinov updated CASSANDRA-19651: Test and Documentation Plan: a unit test is updated to cover the issue; docs updates are not planned Status: Patch Available (was: In Progress) > idealCLWriteLatency metric reports the worst response time instead of the > time when ideal CL is satisfied > - > > Key: CASSANDRA-19651 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19651 > Project: Cassandra > Issue Type: Bug > Components: Legacy/Observability >Reporter: Dmitry Konstantinov >Assignee: Dmitry Konstantinov >Priority: Normal > Fix For: 4.1.x, 5.0.x, 5.x > > Attachments: 19651-4.1.patch > > > org.apache.cassandra.service.AbstractWriteResponseHandler: > {code:java} > private final void decrementResponseOrExpired() > { > int decrementedValue = responsesAndExpirations.decrementAndGet(); > if (decrementedValue == 0) > { > // The condition being signaled is a valid proxy for the CL being > achieved > // Only mark it as failed if the requested CL was achieved. > if (!condition.isSignalled() && requestedCLAchieved) > { > replicaPlan.keyspace().metric.writeFailedIdealCL.inc(); > } > else > { > > replicaPlan.keyspace().metric.idealCLWriteLatency.addNano(nanoTime() - > queryStartNanoTime); > } > } > } {code} > Actual result: responsesAndExpirations is a total number of replicas across > all DCs which does not depend on the ideal CL, so the metric value for > replicaPlan.keyspace().metric.idealCLWriteLatency is updated when we get the > latest response/timeout for all replicas. > Expected result: replicaPlan.keyspace().metric.idealCLWriteLatency is updated > when we get enough responses from replicas according to the ideal CL.
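The expected behaviour can be modelled as a counter sized to the ideal CL rather than to the total replica count. The sketch below is an illustrative model of that semantic only, not a patch against `AbstractWriteResponseHandler`; all names are hypothetical:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Model of the expected semantics: the ideal-CL latency is captured the
// moment the ideal number of responses has arrived, so slower trailing
// replicas do not inflate the metric.
class IdealClTracker {
    private final AtomicInteger remaining;          // responses still needed for ideal CL
    private final long startNanos;
    private volatile long idealClLatencyNanos = -1; // -1 until ideal CL is satisfied

    IdealClTracker(int idealClResponses, long startNanos) {
        this.remaining = new AtomicInteger(idealClResponses);
        this.startNanos = startNanos;
    }

    // Invoked once per replica response; records latency exactly once,
    // when the ideal CL is first met.
    void onResponse(long nowNanos) {
        if (remaining.decrementAndGet() == 0)
            idealClLatencyNanos = nowNanos - startNanos;
    }

    long idealClLatencyNanos() { return idealClLatencyNanos; }
}
```

Contrast with the quoted code, where `responsesAndExpirations` starts at the total replica count, so the latency is only recorded on the final (worst) response.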
[jira] [Updated] (CASSANDRA-19632) wrap tracing logs in isTraceEnabled across the codebase
[ https://issues.apache.org/jira/browse/CASSANDRA-19632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Miklosovic updated CASSANDRA-19632: -- Change Category: Code Clarity Complexity: Normal Component/s: Legacy/Core Assignee: Stefan Miklosovic Status: Open (was: Triage Needed) > wrap tracing logs in isTraceEnabled across the codebase > --- > > Key: CASSANDRA-19632 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19632 > Project: Cassandra > Issue Type: Improvement > Components: Legacy/Core >Reporter: Stefan Miklosovic >Assignee: Stefan Miklosovic >Priority: Normal > Fix For: 5.x > > > Our usage of logger.isTraceEnabled across the codebase is inconsistent. This > would also fix issues similar in e.g. CASSANDRA-19429 as [~rustyrazorblade] > suggested. > We should fix this at least in trunk and 5.0 (not critical though) and > probably come up with a checkstyle rule to prevent not calling isTraceEnabled > while logging with TRACE level.
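The pattern the ticket wants enforced is the standard guard around TRACE statements. The stub logger below just makes the cost difference observable without a logging dependency (class and method names are illustrative, not slf4j):

```java
// Demonstrates why TRACE statements get wrapped in an isTraceEnabled() check:
// without the guard, the (possibly expensive) message arguments are evaluated
// even when TRACE is disabled. Stub logger, not slf4j.
class TraceGuardDemo {
    static boolean traceEnabled = false;
    static int expensiveCalls = 0;

    // Stands in for an expensive argument, e.g. serializing a large object.
    static String expensiveDump() { expensiveCalls++; return "very-large-state"; }

    static void unguarded() {
        trace("state = " + expensiveDump());     // argument built, then discarded
    }

    static void guarded() {
        if (traceEnabled)                        // the style the ticket proposes
            trace("state = " + expensiveDump());
    }

    static void trace(String msg) { if (traceEnabled) System.out.println(msg); }
}
```

A checkstyle rule, as suggested, would flag the `unguarded()` form whenever the log level is TRACE.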
[jira] [Updated] (CASSANDRA-19632) wrap tracing logs in isTraceEnabled across the codebase
[ https://issues.apache.org/jira/browse/CASSANDRA-19632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Miklosovic updated CASSANDRA-19632: -- Fix Version/s: (was: 5.0.x) > wrap tracing logs in isTraceEnabled across the codebase > --- > > Key: CASSANDRA-19632 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19632 > Project: Cassandra > Issue Type: Improvement >Reporter: Stefan Miklosovic >Priority: Normal > Fix For: 5.x > > > Our usage of logger.isTraceEnabled across the codebase is inconsistent. This > would also fix issues similar in e.g. CASSANDRA-19429 as [~rustyrazorblade] > suggested. > We should fix this at least in trunk and 5.0 (not critical though) and > probably come up with a checkstyle rule to prevent not calling isTraceEnabled > while logging with TRACE level.
[jira] [Created] (CASSANDRA-19655) Incorrect date format in SettingsGraph
Dmitrii Kriukov created CASSANDRA-19655: --- Summary: Incorrect date format in SettingsGraph Key: CASSANDRA-19655 URL: https://issues.apache.org/jira/browse/CASSANDRA-19655 Project: Cassandra Issue Type: Bug Reporter: Dmitrii Kriukov Assignee: Dmitrii Kriukov ? "cassandra-stress - " + new SimpleDateFormat("-mm-dd hh:mm:ss").format(new Date()) should be ? "cassandra-stress - " + new SimpleDateFormat("-MM-dd hh:mm:ss").format(new Date())
[jira] [Commented] (CASSANDRA-19655) Incorrect date format in SettingsGraph
[ https://issues.apache.org/jira/browse/CASSANDRA-19655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848473#comment-17848473 ] Dmitrii Kriukov commented on CASSANDRA-19655: - PR https://github.com/apache/cassandra/pull/3309 > Incorrect date format in SettingsGraph > -- > > Key: CASSANDRA-19655 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19655 > Project: Cassandra > Issue Type: Bug >Reporter: Dmitrii Kriukov >Assignee: Dmitrii Kriukov >Priority: Normal > > ? "cassandra-stress - " + new SimpleDateFormat("-mm-dd > hh:mm:ss").format(new Date()) > should be > ? "cassandra-stress - " + new SimpleDateFormat("-MM-dd > hh:mm:ss").format(new Date())
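The underlying confusion is that in `SimpleDateFormat` patterns lowercase "mm" means minute-in-hour while uppercase "MM" means month-in-year, so a pattern using "mm" in the date part silently prints the minute where the month should be. This is easy to verify:

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.GregorianCalendar;

// "mm" = minutes, "MM" = months in SimpleDateFormat patterns.
public class DateFormatDemo {
    public static void main(String[] args) {
        // 21 May 2024, 09:30:00 (GregorianCalendar months are zero-based)
        Date d = new GregorianCalendar(2024, 4, 21, 9, 30, 0).getTime();
        System.out.println(new SimpleDateFormat("yyyy-mm-dd HH:mm:ss").format(d)); // 2024-30-21 09:30:00
        System.out.println(new SimpleDateFormat("yyyy-MM-dd HH:mm:ss").format(d)); // 2024-05-21 09:30:00
    }
}
```

The first line shows the bug class reported here: the minute (30) lands in the month position.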
[jira] [Commented] (CASSANDRA-19654) Update bundled Cassandra cassandra-driver-core dependency
[ https://issues.apache.org/jira/browse/CASSANDRA-19654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848457#comment-17848457 ] Jackson Fleming commented on CASSANDRA-19654: - [https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2018-14721] - is the CVE someone has flagged to us, but there's a lot more reported on the maven page for 3.11.0 ([https://mvnrepository.com/artifact/com.datastax.cassandra/cassandra-driver-core/3.11.0] ) > Update bundled Cassandra cassandra-driver-core dependency > - > > Key: CASSANDRA-19654 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19654 > Project: Cassandra > Issue Type: Task > Components: Dependencies >Reporter: Jackson Fleming >Priority: Normal > > There's a dependency in the Cassandra project on an old version of the Java > driver cassandra-driver-core - 3.11.0 in the 4.0 and later releases of > Cassandra > > (For example on the 4.1 branch > [https://github.com/apache/cassandra/blob/cassandra-4.1/build.xml#L691]) > > It appears that this dependency may have some security vulnerabilities in > transitive dependencies. > But also this is a very old version of the driver, ideally it would be > aligned to a newer version; I would suggest either 3.11.5 which is the latest > in that line of driver versions > [https://mvnrepository.com/artifact/com.datastax.cassandra/cassandra-driver-core] > or this gets updated to the latest 4.x driver (as of writing that's 4.18.1 in > [https://mvnrepository.com/artifact/org.apache.cassandra/java-driver-core] ) > but this seems like a larger undertaking.
[jira] [Created] (CASSANDRA-19654) Update bundled Cassandra cassandra-driver-core dependency
Jackson Fleming created CASSANDRA-19654: --- Summary: Update bundled Cassandra cassandra-driver-core dependency Key: CASSANDRA-19654 URL: https://issues.apache.org/jira/browse/CASSANDRA-19654 Project: Cassandra Issue Type: Task Components: Dependencies Reporter: Jackson Fleming There's a dependency in the Cassandra project on an old version of the Java driver cassandra-driver-core - 3.11.0 in the 4.0 and later releases of Cassandra (For example on the 4.1 branch [https://github.com/apache/cassandra/blob/cassandra-4.1/build.xml#L691]) It appears that this dependency may have some security vulnerabilities in transitive dependencies. But also this is a very old version of the driver, ideally it would be aligned to a newer version; I would suggest either 3.11.5 which is the latest in that line of driver versions [https://mvnrepository.com/artifact/com.datastax.cassandra/cassandra-driver-core] or this gets updated to the latest 4.x driver (as of writing that's 4.18.1 in [https://mvnrepository.com/artifact/org.apache.cassandra/java-driver-core] ) but this seems like a larger undertaking.
[jira] [Comment Edited] (CASSANDRA-12864) "commitlog_sync_batch_window_in_ms" parameter is not documented correctly
[ https://issues.apache.org/jira/browse/CASSANDRA-12864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848377#comment-17848377 ] Michael Semb Wever edited comment on CASSANDRA-12864 at 5/21/24 9:29 PM: - In addition, the page has a number of faults that should be corrected: - the default is periodic, - there's an odd '(Default Value: (complex option): ' section (maybe this is just asciidoc ?) - the sentence "Any data written to Cassandra will first be written to a commit log before being written to a memtable." isn't strictly-speaking correct, the commitlog and the memtable happen in parallel… And, it would be worthwhile if the page referenced, for more advanced info, this [blog post|https://cassandra.apache.org/_/blog/Learn-How-CommitLog-Works-in-Apache-Cassandra.html] these pages are now found at - https://cassandra.apache.org/doc/3.11/cassandra/architecture/storage_engine.html#commit-log - https://cassandra.apache.org/doc/4.0/cassandra/architecture/storage_engine.html#commit-log - https://cassandra.apache.org/doc/4.1/cassandra/architecture/storage_engine.html#commit-log - https://cassandra.apache.org/doc/5.0/cassandra/architecture/storage-engine.html - https://cassandra.apache.org/doc/5.1/cassandra/architecture/storage-engine.html - https://cassandra.apache.org/doc/latest/cassandra/architecture/storage-engine.html was (Author: michaelsembwever): In addition, the page has a number of faults that should be corrected: - the default is periodic, - there's an odd '(Default Value: (complex option): ' section (maybe this is just asciidoc ?) - the sentence "Any data written to Cassandra will first be written to a commit log before being written to a memtable." 
isn't strictly-speaking correct, the commitlog and the memtable happen in parallel… And, it would be worthwhile if the page referenced, for more advanced info, this [blog post|https://cassandra.apache.org/_/blog/Learn-How-CommitLog-Works-in-Apache-Cassandra.html] > "commitlog_sync_batch_window_in_ms" parameter is not documented correctly > - > > Key: CASSANDRA-12864 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12864 > Project: Cassandra > Issue Type: Bug > Components: Documentation >Reporter: Hiroyuki Yamada >Priority: Normal > > "commitlog_sync_batch_window_in_ms" doesn't seem to be working at least in > the latest versions in 2.1.16, 2.2.8 and 3.9. > Here is the way to reproduce the bug. > 1. set the following parameters in cassandra.yaml > * commitlog_sync: batch > * commitlog_sync_batch_window_in_ms: 1 (10s) > 2. issue an insert from cqlsh > 3. it immediately returns instead of waiting for 10 seconds. > Please refer to the communication in the mailing list. > http://www.mail-archive.com/user@cassandra.apache.org/msg49642.html
[jira] [Commented] (CASSANDRA-18732) Baseline Diagnostic vtables for Accord
[ https://issues.apache.org/jira/browse/CASSANDRA-18732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848391#comment-17848391 ] Caleb Rackliffe commented on CASSANDRA-18732: - A quick note on cache stats... Since CASSANDRA-14572, the JMX "AccordStateCache" metrics have been exposed as virtual tables automatically. The only new cache stats virtual table in this patch covers the global {{AccordCommandStore#stateCache}}. > Baseline Diagnostic vtables for Accord > -- > > Key: CASSANDRA-18732 > URL: https://issues.apache.org/jira/browse/CASSANDRA-18732 > Project: Cassandra > Issue Type: Improvement > Components: Accord, Observability/Metrics >Reporter: Caleb Rackliffe >Assignee: Caleb Rackliffe >Priority: Normal > Labels: pull-request-available > Fix For: 5.x > > Time Spent: 50m > Remaining Estimate: 0h > > In addition to JMX-based metrics, there are bits of diagnostic information > for Accord that we should consider exposing through vtables: > 1.) We should ensure that coordinator-level CQL transactions and the local > reads and writes they spawn are visible to the existing {{QueriesTable}} > vtable. > The first may already just work. We may need to make some tweaks to > {{TxnNamedRead}} and {{TxnWrite}} for the local operations though. > ({{CommandStore}} tasks are out of scope here, as they would probably be more > confusing than useful in {{QueriesTable}}?) > 2.) A new vtable for pending commands for a key. > - Disable SELECT */require a partition key > - Might require partial back-port of stringifying table/partition key from > Accord to be correct > - ex. {{SELECT timestamps FROM myawesometable where ks=? and table=? and > partition_key=?}} > - Clustering can be the Accord timestamp elements, no further normal columns. > 3.) A new vtable for command store-specific cache stats > - Gather via {{Store.execute()}} for correctness. 
> - Store id should be partition key (see {{AccordCommandStore}}) > - hits, misses, total (maybe just throw out the keyspaces and coalesce > ranges?) > 4.) (Requires [~aweisberg]'s outstanding work) A new vtable for live > migration state > - {{TableMigrationState}} could be flattened into a row > - Is this already persisted? If so, why a new vtable? > 5.) A vtable to expose {{accord.local.Node#coordinating()}} as a map > - ex. {{SELECT txn_id, type}}
[jira] [Created] (CASSANDRA-19653) Fix Cassandra 4 build for snakeyaml >= 2.0
Manuel Rojas created CASSANDRA-19653: Summary: Fix Cassandra 4 build for snakeyaml >= 2.0 Key: CASSANDRA-19653 URL: https://issues.apache.org/jira/browse/CASSANDRA-19653 Project: Cassandra Issue Type: Bug Reporter: Manuel Rojas Hello, We're trying to build a custom version of cassandra 4.1.4 that fixes some of the CVEs reported here: [https://mvnrepository.com/artifact/org.apache.cassandra/cassandra-all/4.1.4] So far the build passes with these upgrades on build.xml dependencies: |*Name*|*Default version*|*Modified version*| |slf4j|1.7.25|1.7.36| |*{color:#00875a}snakeyaml{color}*|*{color:#00875a}1.26{color}*|*{color:#00875a}1.33{color}*| |jackson-databind|2.13.2.2|2.17.0| |netty|4.1.58.Final|4.1.109.Final| |guava|27.0-jre|33.2.0-jre| |commons-codec|1.9|1.17.0| We're being asked to bump *snakeyaml* to >= {*}2.0{*}. This causes the build to fail with: {code:java} _build_java: [echo] Compiling for Java 8... [javac] Compiling 2138 source files to /Users/mrojas/git/cassandra/build/classes/main [javac] Note: Processing compiler hints annotations [javac] Note: Processing compiler hints annotations [javac] Note: Writing compiler command file at META-INF/hotspot_compiler [javac] Note: Done processing compiler hints annotations [javac] /Users/mrojas/git/cassandra/src/java/org/apache/cassandra/config/YamlConfigurationLoader.java:226: error: constructor Composer in class Composer cannot be applied to given types; [javac] constructor.setComposer(new Composer(null, null) [javac] ^ [javac] required: Parser,Resolver,LoaderOptions [javac] found: , [javac] reason: actual and formal argument lists differ in length [javac] /Users/mrojas/git/cassandra/src/java/org/apache/cassandra/config/YamlConfigurationLoader.java:279: error: incompatible types: Class cannot be converted to ClassLoader [javac] super(theRoot, classLoader); [javac] ^ [javac] where CAP#1 is a fresh type-variable: [javac] CAP#1 extends Object from capture of ? 
[javac] /Users/mrojas/git/cassandra/src/java/org/apache/cassandra/config/YamlConfigurationLoader.java:260: error: constructor Composer in class Composer cannot be applied to given types; [javac] constructor.setComposer(new Composer(null, null) [javac] ^ [javac] required: Parser,Resolver,LoaderOptions [javac] found: , [javac] reason: actual and formal argument lists differ in length [javac] /Users/mrojas/git/cassandra/src/java/org/apache/cassandra/tools/JMXTool.java:168: error: constructor Representer in class Representer cannot be applied to given types; [javac] Representer representer = new Representer(); [javac] ^ [javac] required: DumperOptions [javac] found: no arguments [javac] reason: actual and formal argument lists differ in length [javac] /Users/mrojas/git/cassandra/src/java/org/apache/cassandra/tools/JMXTool.java:398: error: no suitable constructor found for Constructor(no arguments) [javac] { [javac] ^ [javac] constructor Constructor.Constructor(LoaderOptions) is not applicable [javac] (actual and formal argument lists differ in length) [javac] constructor Constructor.Constructor(Class,LoaderOptions) is not applicable [javac] (actual and formal argument lists differ in length) [javac] constructor Constructor.Constructor(TypeDescription,LoaderOptions) is not applicable [javac] (actual and formal argument lists differ in length) [javac] constructor Constructor.Constructor(TypeDescription,Collection,LoaderOptions) is not applicable [javac] (actual and formal argument lists differ in length) [javac] constructor Constructor.Constructor(String,LoaderOptions) is not applicable [javac] (actual and formal argument lists differ in length) [javac] Note: Some input files use or override a deprecated API. [javac] Note: Recompile with -Xlint:deprecation for details. [javac] Note: Some input files use unchecked or unsafe operations. [javac] Note: Recompile with -Xlint:unchecked for details. 
[javac] Note: Some messages have been simplified; recompile with -Xdiags:verbose to get full output [javac] 5 errors BUILD FAILED {code} This is the result of running: {code:java} ➜ cassandra git:(99d9faeef5) ✗ ant realclean ➜ cassandra git:(99d9faeef5) ✗ ant artifacts {code} Could you please fix the build to work with snakeyaml >= 2.0? Thanks
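The compiler errors above spell out the snakeyaml 2.x signature changes: Composer, Representer and Constructor all gained mandatory options arguments. A sketch of the corresponding call-site changes, assuming snakeyaml >= 2.0 on the classpath (not compiled here, so treat the exact signatures as quoted from the errors above rather than independently verified):

```java
import org.yaml.snakeyaml.DumperOptions;
import org.yaml.snakeyaml.LoaderOptions;
import org.yaml.snakeyaml.constructor.Constructor;
import org.yaml.snakeyaml.representer.Representer;

// Call-site migration implied by the build errors: snakeyaml 2.x removed the
// no-arg / short-arg constructors and requires the options objects explicitly.
class SnakeYaml2CallSites {
    static Representer newRepresenter() {
        return new Representer(new DumperOptions());       // 1.x: new Representer()
    }

    static Constructor newConstructor(Class<?> root) {
        return new Constructor(root, new LoaderOptions()); // 1.x: new Constructor(root)
    }

    // Composer likewise now takes (Parser, Resolver, LoaderOptions),
    // matching the first javac error above.
}
```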
[jira] [Commented] (CASSANDRA-12864) "commitlog_sync_batch_window_in_ms" parameter is not documented correctly
[ https://issues.apache.org/jira/browse/CASSANDRA-12864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848377#comment-17848377 ] Michael Semb Wever commented on CASSANDRA-12864: In addition, the page has a number of faults that should be corrected: - the default is periodic, - there's an odd '(Default Value: (complex option): ' section (maybe this is just asciidoc ?) - the sentence "Any data written to Cassandra will first be written to a commit log before being written to a memtable." isn't strictly-speaking correct, the commitlog and the memtable happen in parallel… And, it would be worthwhile if the page referenced, for more advanced info, this [blog post|https://cassandra.apache.org/_/blog/Learn-How-CommitLog-Works-in-Apache-Cassandra.html] > "commitlog_sync_batch_window_in_ms" parameter is not documented correctly > - > > Key: CASSANDRA-12864 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12864 > Project: Cassandra > Issue Type: Bug > Components: Documentation >Reporter: Hiroyuki Yamada >Priority: Normal > > "commitlog_sync_batch_window_in_ms" doesn't seem to be working at least in > the latest versions in 2.1.16, 2.2.8 and 3.9. > Here is the way to reproduce the bug. > 1. set the following parameters in cassandra.yaml > * commitlog_sync: batch > * commitlog_sync_batch_window_in_ms: 1 (10s) > 2. issue an insert from cqlsh > 3. it immediately returns instead of waiting for 10 seconds. > Please refer to the communication in the mailing list. > http://www.mail-archive.com/user@cassandra.apache.org/msg49642.html
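For reference, the reproduction in the report corresponds to a cassandra.yaml fragment along these lines (3.x-era option names; the window value of 10000 ms is an assumption taken from the "(10s)" annotation in the report, not the literal text):

```yaml
# Batch commitlog sync: writes are supposed to wait for the commitlog fsync,
# with commitlog_sync_batch_window_in_ms bounding how long syncs may be batched.
commitlog_sync: batch
commitlog_sync_batch_window_in_ms: 10000   # assumed: the report's "(10s)"
```

The reported symptom is that inserts return immediately under this configuration instead of being grouped into the sync window.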
[jira] [Updated] (CASSANDRA-18493) SAI - LIKE prefix/suffix support
[ https://issues.apache.org/jira/browse/CASSANDRA-18493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Adamson updated CASSANDRA-18493: - Component/s: Feature/SAI (was: Feature/2i Index) > SAI - LIKE prefix/suffix support > > > Key: CASSANDRA-18493 > URL: https://issues.apache.org/jira/browse/CASSANDRA-18493 > Project: Cassandra > Issue Type: Epic > Components: Feature/SAI >Reporter: Mike Adamson >Assignee: Mike Adamson >Priority: Normal > Fix For: 5.x > > > This should provide the following functionality: > * LIKE abc% - prefix support > * LIKE %bcd - suffix support > * LIKE ab%cd - prefix/suffix support > Out of scope: > * LIKE %abc% - contains support > The index support for this can be broken down as follows (general ideas that are > open to suggestions): > * Prefix support. This can currently be achieved with the existing trie > index but this needs work to make it more performant in coalescing postings. > An alternative approach could be to modify the block balanced tree to support > variable length datatypes. This would make general range queries possible on > variable length types as well as prefix queries. These would benefit from the > auxiliary postings present in the balanced tree. > * Suffix support. This will need a reverse index on the values. This allows > a search of the suffix to operate in the same way as a prefix query. There is > no reason why suffix index cannot be built on top of the prefix index with > separate postings for prefix and suffix. We would need to look at the byte > comparable code in order to produce reverse values efficiently that sort > correctly. > * Prefix/Suffix support. This would require separate prefix and suffix index > searches and an intersection on the resulting postings.
[jira] [Updated] (CASSANDRA-18493) SAI - LIKE prefix/suffix support
[ https://issues.apache.org/jira/browse/CASSANDRA-18493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Adamson updated CASSANDRA-18493: - Description: This should provide the following functionality: * LIKE abc% - prefix support * LIKE %bcd - suffix support * LIKE ab%cd - prefix/suffix support Out of scope: * LIKE %abc% - contains support The index support for this can be broken down as follows (general ideas that are open to suggestions): * Prefix support. This can currently be achieved with the existing trie index but this needs work to make it more performant in coalescing postings. An alternative approach could be to modify the block balanced tree to support variable length datatypes. This would make general range queries possible on variable length types as well as prefix queries. These would benefit from the auxiliary postings present in the balanced tree. * Suffix support. This will need a reverse index on the values. This allows a search of the suffix to operate in the same way as a prefix query. There is no reason why suffix index cannot be built on top of the prefix index with separate postings for prefix and suffix. We would need to look at the byte comparable code in order to produce reverse values efficiently that sort correctly. * Prefix/Suffix support. This would require separate prefix and suffix index searches and an intersection on the resulting postings. 
was: This should provide the following functionality: * LIKE abc% - prefix support * LIKE %bcd - suffix support * LIKE %bc% - prefix/suffix support > SAI - LIKE prefix/suffix support > > > Key: CASSANDRA-18493 > URL: https://issues.apache.org/jira/browse/CASSANDRA-18493 > Project: Cassandra > Issue Type: Epic > Components: Feature/2i Index >Reporter: Mike Adamson >Assignee: Mike Adamson >Priority: Normal > Fix For: 5.x > > > This should provide the following functionality: > * LIKE abc% - prefix support > * LIKE %bcd - suffix support > * LIKE ab%cd - prefix/suffix support > Out of scope: > * LIKE %abc% - contains support > The index support for this can be broken down as follows (general ideas that are > open to suggestions): > * Prefix support. This can currently be achieved with the existing trie > index but this needs work to make it more performant in coalescing postings. > An alternative approach could be to modify the block balanced tree to support > variable length datatypes. This would make general range queries possible on > variable length types as well as prefix queries. These would benefit from the > auxiliary postings present in the balanced tree. > * Suffix support. This will need a reverse index on the values. This allows > a search of the suffix to operate in the same way as a prefix query. There is > no reason why a suffix index cannot be built on top of the prefix index with > separate postings for prefix and suffix. We would need to look at the byte > comparable code in order to produce reversed values efficiently that sort > correctly. > * Prefix/Suffix support. This would require separate prefix and suffix index > searches and an intersection on the resulting postings.
[jira] [Updated] (CASSANDRA-18493) SAI - LIKE prefix/suffix support
[ https://issues.apache.org/jira/browse/CASSANDRA-18493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Adamson updated CASSANDRA-18493: - Epic Link: CASSANDRA-19224 Issue Type: Epic (was: Improvement) Summary: SAI - LIKE prefix/suffix support (was: Add LIKE prefix/suffix support to SAI) > SAI - LIKE prefix/suffix support > > > Key: CASSANDRA-18493 > URL: https://issues.apache.org/jira/browse/CASSANDRA-18493 > Project: Cassandra > Issue Type: Epic > Components: Feature/2i Index >Reporter: Mike Adamson >Assignee: Mike Adamson >Priority: Normal > Fix For: 5.x > > > This should provide the following functionality: > * LIKE abc% - prefix support > * LIKE %bcd - suffix support > * LIKE %bc% - prefix/suffix support -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-18493) SAI - LIKE prefix/suffix support
[ https://issues.apache.org/jira/browse/CASSANDRA-18493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Adamson updated CASSANDRA-18493: - Epic Link: (was: CASSANDRA-19224) > SAI - LIKE prefix/suffix support > > > Key: CASSANDRA-18493 > URL: https://issues.apache.org/jira/browse/CASSANDRA-18493 > Project: Cassandra > Issue Type: Epic > Components: Feature/2i Index >Reporter: Mike Adamson >Assignee: Mike Adamson >Priority: Normal > Fix For: 5.x > > > This should provide the following functionality: > * LIKE abc% - prefix support > * LIKE %bcd - suffix support > * LIKE %bc% - prefix/suffix support -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-18493) Add LIKE prefix/suffix support to SAI
[ https://issues.apache.org/jira/browse/CASSANDRA-18493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Adamson updated CASSANDRA-18493: - Change Category: Operability Complexity: Byzantine Status: Open (was: Triage Needed) > Add LIKE prefix/suffix support to SAI > - > > Key: CASSANDRA-18493 > URL: https://issues.apache.org/jira/browse/CASSANDRA-18493 > Project: Cassandra > Issue Type: Improvement > Components: Feature/2i Index >Reporter: Mike Adamson >Priority: Normal > Fix For: 5.x > > > This should provide the following functionality: > * LIKE abc% - prefix support > * LIKE %bcd - suffix support > * LIKE %bc% - prefix/suffix support -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-19628) Correct testing instructions on the website
[ https://issues.apache.org/jira/browse/CASSANDRA-19628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lorina Poland updated CASSANDRA-19628: -- Component/s: Legacy/Documentation and Website (was: Documentation) > Correct testing instructions on the website > --- > > Key: CASSANDRA-19628 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19628 > Project: Cassandra > Issue Type: Bug > Components: Legacy/Documentation and Website >Reporter: Brandon Williams >Assignee: Brandon Williams >Priority: Normal > Fix For: 3.0.31, 4.0.14, 4.1.6, 5.0-beta2, 5.0, 5.1 > > > At https://cassandra.apache.org/_/development/testing.html it says to issue > these statements for cqlsh tests: > {noformat} > ccm updateconf "enable_user_defined_functions: true" > ccm updateconf "enable_scripted_user_defined_functions: true" > ccm updateconf "cdc_enabled: true" > {noformat} > But these actually break the configuration so it won't start. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-12864) "commitlog_sync_batch_window_in_ms" parameter is not documented correctly
[ https://issues.apache.org/jira/browse/CASSANDRA-12864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lorina Poland updated CASSANDRA-12864: -- Component/s: Documentation (was: Legacy/Documentation and Website) > "commitlog_sync_batch_window_in_ms" parameter is not documented correctly > - > > Key: CASSANDRA-12864 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12864 > Project: Cassandra > Issue Type: Bug > Components: Documentation >Reporter: Hiroyuki Yamada >Priority: Normal > > "commitlog_sync_batch_window_in_ms" doesn't seem to be working at least in > the latest versions in 2.1.16, 2.2.8 and 3.9. > Here is the way to reproduce the bug. > 1. set the following parameters in cassandra.yaml > * commitlog_sync: batch > * commitlog_sync_batch_window_in_ms: 1 (10s) > 2. issue an insert from cqlsh > 3. it immediately returns instead of waiting for 10 seconds. > Please refer to the communication in the mailing list. > http://www.mail-archive.com/user@cassandra.apache.org/msg49642.html -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-19628) Correct testing instructions on the website
[ https://issues.apache.org/jira/browse/CASSANDRA-19628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lorina Poland updated CASSANDRA-19628: -- Component/s: Documentation (was: Legacy/Documentation and Website) > Correct testing instructions on the website > --- > > Key: CASSANDRA-19628 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19628 > Project: Cassandra > Issue Type: Bug > Components: Documentation >Reporter: Brandon Williams >Assignee: Brandon Williams >Priority: Normal > Fix For: 3.0.31, 4.0.14, 4.1.6, 5.0-beta2, 5.0, 5.1 > > > At https://cassandra.apache.org/_/development/testing.html it says to issue > these statements for cqlsh tests: > {noformat} > ccm updateconf "enable_user_defined_functions: true" > ccm updateconf "enable_scripted_user_defined_functions: true" > ccm updateconf "cdc_enabled: true" > {noformat} > But these actually break the configuration so it won't start. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-19651) idealCLWriteLatency metric reports the worst response time instead of the time when ideal CL is satisfied
[ https://issues.apache.org/jira/browse/CASSANDRA-19651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitry Konstantinov updated CASSANDRA-19651: Since Version: 4.0 (was: 4.1.0) > idealCLWriteLatency metric reports the worst response time instead of the > time when ideal CL is satisfied > - > > Key: CASSANDRA-19651 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19651 > Project: Cassandra > Issue Type: Bug > Components: Legacy/Observability >Reporter: Dmitry Konstantinov >Assignee: Dmitry Konstantinov >Priority: Normal > Fix For: 4.1.x, 5.0.x, 5.x > > Attachments: 19651-4.1.patch > > > org.apache.cassandra.service.AbstractWriteResponseHandler: > {code:java} > private final void decrementResponseOrExpired() > { > int decrementedValue = responsesAndExpirations.decrementAndGet(); > if (decrementedValue == 0) > { > // The condition being signaled is a valid proxy for the CL being > achieved > // Only mark it as failed if the requested CL was achieved. > if (!condition.isSignalled() && requestedCLAchieved) > { > replicaPlan.keyspace().metric.writeFailedIdealCL.inc(); > } > else > { > > replicaPlan.keyspace().metric.idealCLWriteLatency.addNano(nanoTime() - > queryStartNanoTime); > } > } > } {code} > Actual result: responsesAndExpirations is a total number of replicas across > all DCs which does not depend on the ideal CL, so the metric value for > replicaPlan.keyspace().metric.idealCLWriteLatency is updated when we get the > latest response/timeout for all replicas. > Expected result: replicaPlan.keyspace().metric.idealCLWriteLatency is updated > when we get enough responses from replicas according to the ideal CL. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
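The expected behaviour described in the ticket — record the latency at the moment the ideal CL is first satisfied, rather than when the slowest of all replicas responds — amounts to counting down from the ideal CL's block-for count instead of from the total replica count. The following is a minimal sketch of that semantics only; the names are illustrative and are not the actual AbstractWriteResponseHandler fields.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of the expected-result semantics from the ticket: latency for the
// ideal CL should be recorded when `blockForIdealCL` responses have arrived,
// not when the last of all replicas responds. Names are illustrative.
class IdealClTracker {
    private final AtomicInteger remaining;          // responses still needed for ideal CL
    private final long startNanos;
    private volatile long idealClLatencyNanos = -1; // -1 until ideal CL satisfied

    IdealClTracker(int blockForIdealCL, long startNanos) {
        this.remaining = new AtomicInteger(blockForIdealCL);
        this.startNanos = startNanos;
    }

    // Called once per replica response; records latency exactly when the
    // countdown first reaches zero (i.e. the ideal CL is satisfied).
    void onResponse(long nowNanos) {
        if (remaining.decrementAndGet() == 0)
            idealClLatencyNanos = nowNanos - startNanos;
    }

    long idealClLatencyNanos() { return idealClLatencyNanos; }
}
```

With this countdown, a slow third replica at RF=3 and ideal CL QUORUM no longer affects the recorded value, which is the behaviour the ticket asks for.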
[jira] [Updated] (CASSANDRA-19641) Accord barriers/inclusive sync points cause failures in BurnTest
[ https://issues.apache.org/jira/browse/CASSANDRA-19641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ariel Weisberg updated CASSANDRA-19641: --- Attachment: ci_summary.html > Accord barriers/inclusive sync points cause failures in BurnTest > > > Key: CASSANDRA-19641 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19641 > Project: Cassandra > Issue Type: Bug > Components: Accord >Reporter: Ariel Weisberg >Assignee: Ariel Weisberg >Priority: Normal > Attachments: ci_summary.html > > > The burn test currently fails almost every run; we found several things to > fix.
[jira] [Updated] (CASSANDRA-19641) Accord barriers/inclusive sync points cause failures in BurnTest
[ https://issues.apache.org/jira/browse/CASSANDRA-19641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ariel Weisberg updated CASSANDRA-19641: --- Test and Documentation Plan: Small tweaks to one of the Accord tests, covered by existing simulator tests; going to add checks in AccordMigrationTest that validate that the cache and system table for migrated keys are being correctly populated Status: Patch Available (was: Open) > Accord barriers/inclusive sync points cause failures in BurnTest > > > Key: CASSANDRA-19641 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19641 > Project: Cassandra > Issue Type: Bug > Components: Accord >Reporter: Ariel Weisberg >Assignee: Ariel Weisberg >Priority: Normal > > The burn test currently fails almost every run; we found several things to > fix.
[jira] [Updated] (CASSANDRA-19652) ShallowInfoRetriever: cache offsets to avoid resetting of RandomAccessReader buffer
[ https://issues.apache.org/jira/browse/CASSANDRA-19652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Williams updated CASSANDRA-19652: - Change Category: Performance Complexity: Normal Status: Open (was: Triage Needed) > ShallowInfoRetriever: cache offsets to avoid resetting of RandomAccessReader > buffer > -- > > Key: CASSANDRA-19652 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19652 > Project: Cassandra > Issue Type: Improvement > Components: Local/SSTable >Reporter: Dmitry Konstantinov >Assignee: Dmitry Konstantinov >Priority: Normal > Fix For: 5.x > > > Currently in > org.apache.cassandra.io.sstable.format.big.RowIndexEntry.ShallowInfoRetriever#fetchIndex > we do two seek/read operations: the first to find the offset of the IndexInfo and > the second to read it. These are two quite distant regions of the file, and for > the standard disk access mode we get no benefit from the buffer in > RandomAccessReader due to jumping between the regions and resetting this > buffer again and again. A possible improvement here would be to read and cache > the first N offsets (to limit the amount of memory used) on the first read and > later do only sequential reads of the IndexInfo data. By caching less than 1 KB > we can reduce the number of syscalls even further, in my case from a few hundred > to fewer than 10.
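The caching idea in the ticket — one sequential read of the first N offsets up front, then sequential IndexInfo reads, instead of seeking back to the offsets region before every entry — can be sketched as follows. This is an illustrative sketch under an assumed file layout, not the actual RowIndexEntry code; a ByteBuffer stands in for the RandomAccessReader over the index component, and all names are made up.

```java
import java.nio.ByteBuffer;

// Sketch of the offsets-caching idea from the ticket: read the first N
// 8-byte offsets in one sequential pass and keep them in memory, so later
// IndexInfo lookups need no extra seek+read pair into the offsets region.
// The ByteBuffer stands in for the RandomAccessReader; layout is assumed.
class CachedOffsetsReader {
    private final long[] cachedOffsets; // first N offsets, read in one pass

    CachedOffsetsReader(ByteBuffer file, int offsetsBase, int n) {
        this.cachedOffsets = new long[n];
        ByteBuffer view = file.duplicate();
        view.position(offsetsBase);
        for (int i = 0; i < n; i++)            // one sequential scan replaces
            cachedOffsets[i] = view.getLong(); // n scattered seek+read pairs
    }

    // Serving an offset from the cache costs no syscall and no buffer reset.
    long offset(int i) { return cachedOffsets[i]; }
}
```

Keeping N small bounds the memory cost, which matches the ticket's point that caching well under 1 KB of offsets is enough to collapse a few hundred syscalls into fewer than ten.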
[jira] [Assigned] (CASSANDRA-19652) ShallowInfoRetriever: cache offsets to avoid resetting of RandomAccessReader buffer
[ https://issues.apache.org/jira/browse/CASSANDRA-19652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitry Konstantinov reassigned CASSANDRA-19652: --- Assignee: Dmitry Konstantinov > ShallowInfoRetriever: cache offsets to avoid resetting of RandomAccessReader > buffer > -- > > Key: CASSANDRA-19652 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19652 > Project: Cassandra > Issue Type: Improvement > Components: Local/SSTable >Reporter: Dmitry Konstantinov >Assignee: Dmitry Konstantinov >Priority: Normal > Fix For: 5.x > > > Currently in > org.apache.cassandra.io.sstable.format.big.RowIndexEntry.ShallowInfoRetriever#fetchIndex > we do two seek/read operations: the first to find the offset of the IndexInfo and > the second to read it. These are two quite distant regions of the file, and for > the standard disk access mode we get no benefit from the buffer in > RandomAccessReader due to jumping between the regions and resetting this > buffer again and again. A possible improvement here would be to read and cache > the first N offsets (to limit the amount of memory used) on the first read and > later do only sequential reads of the IndexInfo data. By caching less than 1 KB > we can reduce the number of syscalls even further, in my case from a few hundred > to fewer than 10.
[jira] [Updated] (CASSANDRA-19652) ShallowInfoRetriever: cache offsets to avoid resetting of RandomAccessReader buffer
[ https://issues.apache.org/jira/browse/CASSANDRA-19652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitry Konstantinov updated CASSANDRA-19652: Fix Version/s: 5.x > ShallowInfoRetriever: cache offsets to avoid resetting of RandomAccessReader > buffer > -- > > Key: CASSANDRA-19652 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19652 > Project: Cassandra > Issue Type: Improvement > Components: Local/SSTable >Reporter: Dmitry Konstantinov >Priority: Normal > Fix For: 5.x > > > Currently in > org.apache.cassandra.io.sstable.format.big.RowIndexEntry.ShallowInfoRetriever#fetchIndex > we do two seek/read operations: the first to find the offset of the IndexInfo and > the second to read it. These are two quite distant regions of the file, and for > the standard disk access mode we get no benefit from the buffer in > RandomAccessReader due to jumping between the regions and resetting this > buffer again and again. A possible improvement here would be to read and cache > the first N offsets (to limit the amount of memory used) on the first read and > later do only sequential reads of the IndexInfo data. By caching less than 1 KB > we can reduce the number of syscalls even further, in my case from a few hundred > to fewer than 10.
[jira] [Created] (CASSANDRA-19652) ShallowInfoRetriever: cache offsets to avoid resetting of RandomAccessReader buffer
Dmitry Konstantinov created CASSANDRA-19652: --- Summary: ShallowInfoRetriever: cache offsets to avoid resetting of RandomAccessReader buffer Key: CASSANDRA-19652 URL: https://issues.apache.org/jira/browse/CASSANDRA-19652 Project: Cassandra Issue Type: Improvement Components: Local/SSTable Reporter: Dmitry Konstantinov Currently in org.apache.cassandra.io.sstable.format.big.RowIndexEntry.ShallowInfoRetriever#fetchIndex we do two seek/read operations: the first to find the offset of the IndexInfo and the second to read it. These are two quite distant regions of the file, and for the standard disk access mode we get no benefit from the buffer in RandomAccessReader due to jumping between the regions and resetting this buffer again and again. A possible improvement here would be to read and cache the first N offsets (to limit the amount of memory used) on the first read and later do only sequential reads of the IndexInfo data. By caching less than 1 KB we can reduce the number of syscalls even further, in my case from a few hundred to fewer than 10.
[jira] [Assigned] (CASSANDRA-19651) idealCLWriteLatency metric reports the worst response time instead of the time when ideal CL is satisfied
[ https://issues.apache.org/jira/browse/CASSANDRA-19651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitry Konstantinov reassigned CASSANDRA-19651: --- Assignee: Dmitry Konstantinov > idealCLWriteLatency metric reports the worst response time instead of the > time when ideal CL is satisfied > - > > Key: CASSANDRA-19651 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19651 > Project: Cassandra > Issue Type: Bug > Components: Legacy/Observability >Reporter: Dmitry Konstantinov >Assignee: Dmitry Konstantinov >Priority: Normal > Fix For: 4.1.x, 5.0.x, 5.x > > Attachments: 19651-4.1.patch > > > org.apache.cassandra.service.AbstractWriteResponseHandler: > {code:java} > private final void decrementResponseOrExpired() > { > int decrementedValue = responsesAndExpirations.decrementAndGet(); > if (decrementedValue == 0) > { > // The condition being signaled is a valid proxy for the CL being > achieved > // Only mark it as failed if the requested CL was achieved. > if (!condition.isSignalled() && requestedCLAchieved) > { > replicaPlan.keyspace().metric.writeFailedIdealCL.inc(); > } > else > { > > replicaPlan.keyspace().metric.idealCLWriteLatency.addNano(nanoTime() - > queryStartNanoTime); > } > } > } {code} > Actual result: responsesAndExpirations is a total number of replicas across > all DCs which does not depend on the ideal CL, so the metric value for > replicaPlan.keyspace().metric.idealCLWriteLatency is updated when we get the > latest response/timeout for all replicas. > Expected result: replicaPlan.keyspace().metric.idealCLWriteLatency is updated > when we get enough responses from replicas according to the ideal CL. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-19651) idealCLWriteLatency metric reports the worst response time instead of the time when ideal CL is satisfied
[ https://issues.apache.org/jira/browse/CASSANDRA-19651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitry Konstantinov updated CASSANDRA-19651: Description: org.apache.cassandra.service.AbstractWriteResponseHandler: {code:java} private final void decrementResponseOrExpired() { int decrementedValue = responsesAndExpirations.decrementAndGet(); if (decrementedValue == 0) { // The condition being signaled is a valid proxy for the CL being achieved // Only mark it as failed if the requested CL was achieved. if (!condition.isSignalled() && requestedCLAchieved) { replicaPlan.keyspace().metric.writeFailedIdealCL.inc(); } else { replicaPlan.keyspace().metric.idealCLWriteLatency.addNano(nanoTime() - queryStartNanoTime); } } } {code} Actual result: responsesAndExpirations is a total number of replicas across all DCs which does not depend on the ideal CL, so the metric value for replicaPlan.keyspace().metric.idealCLWriteLatency is updated when we get the latest response/timeout for all replicas. Expected result: replicaPlan.keyspace().metric.idealCLWriteLatency is updated when we get enough responses from replicas according to the ideal CL. was: {code:java} private final void decrementResponseOrExpired() { int decrementedValue = responsesAndExpirations.decrementAndGet(); if (decrementedValue == 0) { // The condition being signaled is a valid proxy for the CL being achieved // Only mark it as failed if the requested CL was achieved. if (!condition.isSignalled() && requestedCLAchieved) { replicaPlan.keyspace().metric.writeFailedIdealCL.inc(); } else { replicaPlan.keyspace().metric.idealCLWriteLatency.addNano(nanoTime() - queryStartNanoTime); } } } {code} Actual result: responsesAndExpirations is a total number of replicas across all DCs which does not depend on the ideal CL, so the metric value for replicaPlan.keyspace().metric.idealCLWriteLatency is updated when we get the latest response/timeout for all replicas. 
Expected result: replicaPlan.keyspace().metric.idealCLWriteLatency is updated when we get enough responses from replicas according to the ideal CL. > idealCLWriteLatency metric reports the worst response time instead of the > time when ideal CL is satisfied > - > > Key: CASSANDRA-19651 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19651 > Project: Cassandra > Issue Type: Bug > Components: Legacy/Observability >Reporter: Dmitry Konstantinov >Priority: Normal > Fix For: 4.1.x, 5.0.x, 5.x > > Attachments: 19651-4.1.patch > > > org.apache.cassandra.service.AbstractWriteResponseHandler: > {code:java} > private final void decrementResponseOrExpired() > { > int decrementedValue = responsesAndExpirations.decrementAndGet(); > if (decrementedValue == 0) > { > // The condition being signaled is a valid proxy for the CL being > achieved > // Only mark it as failed if the requested CL was achieved. > if (!condition.isSignalled() && requestedCLAchieved) > { > replicaPlan.keyspace().metric.writeFailedIdealCL.inc(); > } > else > { > > replicaPlan.keyspace().metric.idealCLWriteLatency.addNano(nanoTime() - > queryStartNanoTime); > } > } > } {code} > Actual result: responsesAndExpirations is a total number of replicas across > all DCs which does not depend on the ideal CL, so the metric value for > replicaPlan.keyspace().metric.idealCLWriteLatency is updated when we get the > latest response/timeout for all replicas. > Expected result: replicaPlan.keyspace().metric.idealCLWriteLatency is updated > when we get enough responses from replicas according to the ideal CL. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Comment Edited] (CASSANDRA-19651) idealCLWriteLatency metric reports the worst response time instead of the time when ideal CL is satisfied
[ https://issues.apache.org/jira/browse/CASSANDRA-19651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848235#comment-17848235 ] Dmitry Konstantinov edited comment on CASSANDRA-19651 at 5/21/24 2:56 PM: -- Please find the patch for 4.1 branch attached: [^19651-4.1.patch] was (Author: dnk): I am going to attach a patch with a fix soon. > idealCLWriteLatency metric reports the worst response time instead of the > time when ideal CL is satisfied > - > > Key: CASSANDRA-19651 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19651 > Project: Cassandra > Issue Type: Bug > Components: Legacy/Observability >Reporter: Dmitry Konstantinov >Priority: Normal > Fix For: 4.1.x, 5.0.x, 5.x > > Attachments: 19651-4.1.patch > > > {code:java} > private final void decrementResponseOrExpired() > { > int decrementedValue = responsesAndExpirations.decrementAndGet(); > if (decrementedValue == 0) > { > // The condition being signaled is a valid proxy for the CL being > achieved > // Only mark it as failed if the requested CL was achieved. > if (!condition.isSignalled() && requestedCLAchieved) > { > replicaPlan.keyspace().metric.writeFailedIdealCL.inc(); > } > else > { > > replicaPlan.keyspace().metric.idealCLWriteLatency.addNano(nanoTime() - > queryStartNanoTime); > } > } > } {code} > Actual result: responsesAndExpirations is a total number of replicas across > all DCs which does not depend on the ideal CL, so the metric value for > replicaPlan.keyspace().metric.idealCLWriteLatency is updated when we get the > latest response/timeout for all replicas. > Expected result: replicaPlan.keyspace().metric.idealCLWriteLatency is updated > when we get enough responses from replicas according to the ideal CL. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-19651) idealCLWriteLatency metric reports the worst response time instead of the time when ideal CL is satisfied
[ https://issues.apache.org/jira/browse/CASSANDRA-19651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitry Konstantinov updated CASSANDRA-19651: Attachment: 19651-4.1.patch > idealCLWriteLatency metric reports the worst response time instead of the > time when ideal CL is satisfied > - > > Key: CASSANDRA-19651 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19651 > Project: Cassandra > Issue Type: Bug > Components: Legacy/Observability >Reporter: Dmitry Konstantinov >Priority: Normal > Fix For: 4.1.x, 5.0.x, 5.x > > Attachments: 19651-4.1.patch > > > {code:java} > private final void decrementResponseOrExpired() > { > int decrementedValue = responsesAndExpirations.decrementAndGet(); > if (decrementedValue == 0) > { > // The condition being signaled is a valid proxy for the CL being > achieved > // Only mark it as failed if the requested CL was achieved. > if (!condition.isSignalled() && requestedCLAchieved) > { > replicaPlan.keyspace().metric.writeFailedIdealCL.inc(); > } > else > { > > replicaPlan.keyspace().metric.idealCLWriteLatency.addNano(nanoTime() - > queryStartNanoTime); > } > } > } {code} > Actual result: responsesAndExpirations is a total number of replicas across > all DCs which does not depend on the ideal CL, so the metric value for > replicaPlan.keyspace().metric.idealCLWriteLatency is updated when we get the > latest response/timeout for all replicas. > Expected result: replicaPlan.keyspace().metric.idealCLWriteLatency is updated > when we get enough responses from replicas according to the ideal CL. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-19651) idealCLWriteLatency metric reports the worst response time instead of the time when ideal CL is satisfied
[ https://issues.apache.org/jira/browse/CASSANDRA-19651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848235#comment-17848235 ] Dmitry Konstantinov commented on CASSANDRA-19651: - I am going to attach a patch with a fix soon. > idealCLWriteLatency metric reports the worst response time instead of the > time when ideal CL is satisfied > - > > Key: CASSANDRA-19651 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19651 > Project: Cassandra > Issue Type: Bug > Components: Legacy/Observability >Reporter: Dmitry Konstantinov >Priority: Normal > Fix For: 4.1.x, 5.0.x, 5.x > > > {code:java} > private final void decrementResponseOrExpired() > { > int decrementedValue = responsesAndExpirations.decrementAndGet(); > if (decrementedValue == 0) > { > // The condition being signaled is a valid proxy for the CL being > achieved > // Only mark it as failed if the requested CL was achieved. > if (!condition.isSignalled() && requestedCLAchieved) > { > replicaPlan.keyspace().metric.writeFailedIdealCL.inc(); > } > else > { > > replicaPlan.keyspace().metric.idealCLWriteLatency.addNano(nanoTime() - > queryStartNanoTime); > } > } > } {code} > Actual result: responsesAndExpirations is a total number of replicas across > all DCs which does not depend on the ideal CL, so the metric value for > replicaPlan.keyspace().metric.idealCLWriteLatency is updated when we get the > latest response/timeout for all replicas. > Expected result: replicaPlan.keyspace().metric.idealCLWriteLatency is updated > when we get enough responses from replicas according to the ideal CL. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-19651) idealCLWriteLatency metric reports the worst response time instead of the time when ideal CL is satisfied
[ https://issues.apache.org/jira/browse/CASSANDRA-19651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Williams updated CASSANDRA-19651: - Fix Version/s: 5.0.x 5.x > idealCLWriteLatency metric reports the worst response time instead of the > time when ideal CL is satisfied > - > > Key: CASSANDRA-19651 > URL: https://issues.apache.org/jira/browse/CASSANDRA-19651 > Project: Cassandra > Issue Type: Bug > Components: Legacy/Observability >Reporter: Dmitry Konstantinov >Priority: Normal > Fix For: 4.1.x, 5.0.x, 5.x > > > {code:java} > private final void decrementResponseOrExpired() > { > int decrementedValue = responsesAndExpirations.decrementAndGet(); > if (decrementedValue == 0) > { > // The condition being signaled is a valid proxy for the CL being > achieved > // Only mark it as failed if the requested CL was achieved. > if (!condition.isSignalled() && requestedCLAchieved) > { > replicaPlan.keyspace().metric.writeFailedIdealCL.inc(); > } > else > { > > replicaPlan.keyspace().metric.idealCLWriteLatency.addNano(nanoTime() - > queryStartNanoTime); > } > } > } {code} > Actual result: responsesAndExpirations is a total number of replicas across > all DCs which does not depend on the ideal CL, so the metric value for > replicaPlan.keyspace().metric.idealCLWriteLatency is updated when we get the > latest response/timeout for all replicas. > Expected result: replicaPlan.keyspace().metric.idealCLWriteLatency is updated > when we get enough responses from replicas according to the ideal CL. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-19651) idealCLWriteLatency metric reports the worst response time instead of the time when ideal CL is satisfied
[ https://issues.apache.org/jira/browse/CASSANDRA-19651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Williams updated CASSANDRA-19651: - Bug Category: Parent values: Correctness(12982) Complexity: Low Hanging Fruit Component/s: Legacy/Observability Severity: Normal Status: Open (was: Triage Needed)
[jira] [Updated] (CASSANDRA-19651) idealCLWriteLatency metric reports the worst response time instead of the time when ideal CL is satisfied
[ https://issues.apache.org/jira/browse/CASSANDRA-19651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Williams updated CASSANDRA-19651: - Fix Version/s: 4.1.x
[jira] [Updated] (CASSANDRA-19651) idealCLWriteLatency metric reports the worst response time instead of the time when ideal CL is satisfied
[ https://issues.apache.org/jira/browse/CASSANDRA-19651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitry Konstantinov updated CASSANDRA-19651: Discovered By: User Report Since Version: 4.1.0
[jira] [Created] (CASSANDRA-19651) idealCLWriteLatency metric reports the worst response time instead of the time when ideal CL is satisfied
Dmitry Konstantinov created CASSANDRA-19651: --- Summary: idealCLWriteLatency metric reports the worst response time instead of the time when ideal CL is satisfied Key: CASSANDRA-19651 URL: https://issues.apache.org/jira/browse/CASSANDRA-19651 Project: Cassandra Issue Type: Bug Reporter: Dmitry Konstantinov
[jira] [Comment Edited] (CASSANDRA-19593) Transactional Guardrails
[ https://issues.apache.org/jira/browse/CASSANDRA-19593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848171#comment-17848171 ] Stefan Miklosovic edited comment on CASSANDRA-19593 at 5/21/24 11:45 AM: -
Progress: I have finished all transformations (Flags, Values, Thresholds, Customs; Custom is a custom guardrail as per CEP-24). I have also tested ser/de of these transformations, and there are tests for vtables and for diffing of transformations as well.

I have also started to validate the input in vtables before it is committed to TCM. That means no invalid configuration (e.g. a warn threshold bigger than the fail threshold) will be committed when a CQL statement against such vtables is executed. Validations are done per logical guardrail category where applicable.

It is worth saying that, from the implementation perspective, Values (related to the values guardrail) are rather special when it comes to CQL. Say we want to have this table:
{code:java}
VIRTUAL TABLE system_guardrails.values (
    name text PRIMARY KEY,
    disallowed frozen<set<text>>,
    ignored frozen<set<text>>,
    warned frozen<set<text>>
) {code}
and we run this query:
{code:java}
update system_guardrails.values set warned = {'QUORUM'}, disallowed = {'EACH_QUORUM'} WHERE name = 'read_consistency_levels'; {code}
The way it has worked until now is that the value for each respective column arrives in AbstractMutableVirtualTable#applyColumnUpdate. But consider: if two columns are modified, as in the example above, that translates to two separate commits into TCM, just because the mutable vtable iterates over the columns. I do not think this is desirable; there should be one commit per query, so the transformation may contain more than one column. To achieve that, I had to override the "apply" method in AbstractMutableVirtualTable and remove its "final" modifier. This was already discussed with [~blerer]; it might be possible to remove it to accommodate this kind of situation.
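The one-commit-per-query point above can be illustrated with a toy sketch. This is purely hypothetical code (BatchedVtableUpdate, applyPerColumn, applyBatched are invented names, not the Cassandra API); it only contrasts committing once per modified column with collecting all modified columns of a row into a single commit:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: per-column commits versus one commit per row update.
public class BatchedVtableUpdate
{
    int commits = 0; // counts simulated TCM commits

    // Per-column path, as the column-by-column iteration drives it:
    // every modified column triggers its own commit.
    void applyPerColumn(Map<String, Object> columnUpdates)
    {
        commits += columnUpdates.size();
    }

    // Batched path: one commit carries all modified columns of the row.
    void applyBatched(Map<String, Object> columnUpdates)
    {
        if (!columnUpdates.isEmpty())
            commits++;
    }

    public static void main(String[] args)
    {
        // Two columns modified by one UPDATE statement.
        Map<String, Object> update = new LinkedHashMap<>();
        update.put("warned", "{'QUORUM'}");
        update.put("disallowed", "{'EACH_QUORUM'}");

        BatchedVtableUpdate perColumn = new BatchedVtableUpdate();
        perColumn.applyPerColumn(update);
        System.out.println("per-column commits: " + perColumn.commits); // 2

        BatchedVtableUpdate batched = new BatchedVtableUpdate();
        batched.applyBatched(update);
        System.out.println("batched commits: " + batched.commits); // 1
    }
}
```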
Also, I am making sure that I am not committing something which has not changed. E.g. when I execute the above query twice, it will actually be committed just once, because the second time, when diffing it, there is no difference, hence no commit is necessary. This was quite tricky to get right, especially for values, because I wanted to model the situation where we remove a value by setting it to null, like this:
{code:java}
update system_guardrails.values set warned = null, disallowed = {} WHERE name = 'read_consistency_levels'; {code}
"null" and "empty" are two different operations in terms of a mutable vtable. If a column is set to null, it is treated as a deletion, but if it is an empty set, it is a regular update. This was tricky to get right too, but I think I am there.

was (Author: smiklosovic): Progress: I finished all transformations (Flags, Values, Thresholds, Customs) (Custom is a custom guardrail as per CEP-24). I have also tested ser/de of these transformations, there are tests for vtables and diffing of transformations as well. I have also started to validate the input in vtables before it is committed to TCM. That means that no invalid configuration (e.g. warn threshold bigger than fail threshold) will be committed when CQL statement against such vtables is executed. Validations are done per each logical guardrail category if applicable.

> Transactional Guardrails
>
> Key: CASSANDRA-19593
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19593
> Project: Cassandra
> Issue Type: New Feature
> Components: Feature/Guardrails, Transactional Cluster Metadata
> Reporter: Stefan Miklosovic
> Assignee: Stefan Miklosovic
> Priority: Normal
> Fix For: 5.x
> Time Spent: 10m
> Remaining Estimate: 0h
>
> I think it is time to start to think about this more seriously. TCM is getting into pretty nice shape and we might start to investigate how to do this.
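The null-vs-empty rule described above can be sketched as a small classification function. This is a hypothetical illustration (GuardrailValueDiff, classify, and Op are invented names, not the patch code): a null set in the update means deletion, an empty set is a regular update, and an unchanged value produces no commit at all, matching the diffing behaviour:

```java
import java.util.Set;

// Hypothetical sketch of the null-vs-empty distinction for a mutable vtable column.
public class GuardrailValueDiff
{
    public enum Op { DELETE, UPDATE, NO_OP }

    public static Op classify(Set<String> oldValue, Set<String> newValue)
    {
        if (newValue == null)                          // "set warned = null"
            return oldValue == null ? Op.NO_OP : Op.DELETE;
        if (newValue.equals(oldValue))                 // diffing: identical -> no TCM commit
            return Op.NO_OP;
        return Op.UPDATE;                              // includes "set disallowed = {}"
    }

    public static void main(String[] args)
    {
        System.out.println(classify(Set.of("QUORUM"), null));             // DELETE
        System.out.println(classify(Set.of("QUORUM"), Set.of()));         // UPDATE
        System.out.println(classify(Set.of("QUORUM"), Set.of("QUORUM"))); // NO_OP
    }
}
```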
[jira] [Commented] (CASSANDRA-19593) Transactional Guardrails
[ https://issues.apache.org/jira/browse/CASSANDRA-19593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848171#comment-17848171 ] Stefan Miklosovic commented on CASSANDRA-19593: --- Progress: I have finished all transformations (Flags, Values, Thresholds, Customs; Custom is a custom guardrail as per CEP-24). I have also tested ser/de of these transformations, and there are tests for vtables and for diffing of transformations as well. I have also started to validate the input in vtables before it is committed to TCM. That means no invalid configuration (e.g. a warn threshold bigger than the fail threshold) will be committed when a CQL statement against such vtables is executed. Validations are done per logical guardrail category where applicable.