[jira] [Updated] (CASSANDRA-15291) Batch the token metadata update to improve the speed
[ https://issues.apache.org/jira/browse/CASSANDRA-15291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jay Zhuang updated CASSANDRA-15291:
-----------------------------------
    Test and Documentation Plan: unit test
                         Status: Patch Available  (was: Open)

> Batch the token metadata update to improve the speed
> ----------------------------------------------------
>
>           Key: CASSANDRA-15291
>           URL: https://issues.apache.org/jira/browse/CASSANDRA-15291
>       Project: Cassandra
>    Issue Type: Improvement
>    Components: Cluster/Membership
>      Reporter: Jay Zhuang
>      Assignee: Jay Zhuang
>      Priority: Low
>
> There's a faster API to batch-load the tokens instead of updating one
> endpoint at a time. For a large vnode cluster (> 1K nodes), it can reduce
> the populate time from 14 seconds to 0.1 seconds.

--
This message was sent by Atlassian Jira (v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-15141) Faster token ownership calculation for NetworkTopologyStrategy
[ https://issues.apache.org/jira/browse/CASSANDRA-15141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jay Zhuang updated CASSANDRA-15141:
-----------------------------------
    Summary: Faster token ownership calculation for NetworkTopologyStrategy  (was: RemoveNode takes long time and blocks gossip stage)

> Faster token ownership calculation for NetworkTopologyStrategy
> --------------------------------------------------------------
>
>           Key: CASSANDRA-15141
>           URL: https://issues.apache.org/jira/browse/CASSANDRA-15141
>       Project: Cassandra
>    Issue Type: Improvement
>    Components: Cluster/Gossip, Cluster/Membership
>      Reporter: Jay Zhuang
>      Assignee: Jay Zhuang
>      Priority: Normal
>
> The function
> [{{getAddressReplicas()}}|https://github.com/apache/cassandra/blob/7df67eff2d66dba4bed2b4f6aeabf05144d9b057/src/java/org/apache/cassandra/service/StorageService.java#L3002]
> called during removenode and decommission is slow on a large vnode cluster
> with NetworkTopologyStrategy, as it needs to build the whole replication
> map for every token range.
> In one of our clusters (> 1k nodes), it takes about 20 seconds per
> NetworkTopologyStrategy keyspace, so processing a removenode message takes
> at least 80 seconds (20 * 4: 3 system keyspaces, 1 user keyspace). That
> blocks heartbeat propagation and causes nodes to be falsely marked down.
[jira] [Commented] (CASSANDRA-15291) Batch the token metadata update to improve the speed
[ https://issues.apache.org/jira/browse/CASSANDRA-15291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16915152#comment-16915152 ]

Jay Zhuang commented on CASSANDRA-15291:
----------------------------------------

| Branch | CircleCI |
| [15291-trunk|https://github.com/Instagram/cassandra/tree/15291-trunk] | https://circleci.com/workflow-run/ab9ff528-e34a-476f-ac91-767f1dab796a |
[jira] [Updated] (CASSANDRA-15291) Batch the token metadata update to improve the speed
[ https://issues.apache.org/jira/browse/CASSANDRA-15291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jay Zhuang updated CASSANDRA-15291:
-----------------------------------
    Summary: Batch the token metadata update to improve the speed  (was: Batch the token update to improve the token populate speed)
[jira] [Updated] (CASSANDRA-15291) Batch the token update to improve the token populate speed
[ https://issues.apache.org/jira/browse/CASSANDRA-15291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jay Zhuang updated CASSANDRA-15291:
-----------------------------------
    Change Category: Performance
         Complexity: Normal
           Priority: Low  (was: Normal)
             Status: Open  (was: Triage Needed)
[jira] [Created] (CASSANDRA-15291) Batch the token update to improve the token populate speed
Jay Zhuang created CASSANDRA-15291:
-----------------------------------

             Summary: Batch the token update to improve the token populate speed
                 Key: CASSANDRA-15291
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15291
             Project: Cassandra
          Issue Type: Improvement
          Components: Cluster/Membership
            Reporter: Jay Zhuang
            Assignee: Jay Zhuang

There's a faster API to batch-load the tokens instead of updating one endpoint at a time. For a large vnode cluster (> 1K nodes), it can reduce the populate time from 14 seconds to 0.1 seconds.
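The batching win described in the ticket can be sketched in plain Java. Everything below (`TokenRing`, `rebuildRingView`) is a hypothetical illustration of the idea, not Cassandra's actual `TokenMetadata` API: updating one endpoint at a time pays the ring-rebuild cost once per endpoint, while a batched update pays it once.

```java
import java.util.*;

// Hypothetical sketch of the idea behind CASSANDRA-15291: updating the
// token-to-endpoint map one endpoint at a time rebuilds the derived ring
// view once per endpoint, while a batched update rebuilds it once. The
// names here are illustrative, not Cassandra's real API.
public class TokenRing {
    private final SortedMap<Long, String> tokenToEndpoint = new TreeMap<>();
    public int rebuilds = 0;  // counts how often the derived ring view is rebuilt

    // One endpoint at a time: every call triggers a ring-view rebuild.
    public void updateTokens(String endpoint, Collection<Long> tokens) {
        for (Long t : tokens)
            tokenToEndpoint.put(t, endpoint);
        rebuildRingView();
    }

    // Batched: apply all endpoints' tokens, then rebuild the view once.
    public void updateTokensBatch(Map<String, ? extends Collection<Long>> all) {
        for (Map.Entry<String, ? extends Collection<Long>> e : all.entrySet())
            for (Long t : e.getValue())
                tokenToEndpoint.put(t, e.getKey());
        rebuildRingView();
    }

    private void rebuildRingView() { rebuilds++; }  // stands in for the expensive rebuild

    public static void main(String[] args) {
        Map<String, List<Long>> cluster = new LinkedHashMap<>();
        for (int node = 0; node < 1000; node++) {
            List<Long> vnodes = new ArrayList<>();
            for (int v = 0; v < 256; v++)          // 256 vnode tokens per node
                vnodes.add((long) node * 1000 + v);
            cluster.put("node" + node, vnodes);
        }
        TokenRing perEndpoint = new TokenRing();
        for (Map.Entry<String, List<Long>> e : cluster.entrySet())
            perEndpoint.updateTokens(e.getKey(), e.getValue());
        TokenRing batched = new TokenRing();
        batched.updateTokensBatch(cluster);
        System.out.println("per-endpoint rebuilds: " + perEndpoint.rebuilds); // 1000
        System.out.println("batched rebuilds:      " + batched.rebuilds);     // 1
    }
}
```

With 1,000 endpoints the per-endpoint path does 1,000 rebuilds versus one for the batch, which is the shape of the 14s-to-0.1s improvement reported above.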
[jira] [Updated] (CASSANDRA-15290) Avoid token cache invalidation for removing proxy node
[ https://issues.apache.org/jira/browse/CASSANDRA-15290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jay Zhuang updated CASSANDRA-15290:
-----------------------------------
    Test and Documentation Plan: The patch is deployed in our production environment.
                         Status: Patch Available  (was: Open)

> Avoid token cache invalidation for removing proxy node
> ------------------------------------------------------
>
>           Key: CASSANDRA-15290
>           URL: https://issues.apache.org/jira/browse/CASSANDRA-15290
>       Project: Cassandra
>    Issue Type: Improvement
>    Components: Cluster/Membership
>      Reporter: Jay Zhuang
>      Assignee: Jay Zhuang
>      Priority: Low
>       Fix For: 4.0.x
>
> As a proxy node doesn't own tokens, adding or removing one doesn't change
> token ownership, so there is no need to invalidate the token cache.
[jira] [Commented] (CASSANDRA-15290) Avoid token cache invalidation for removing proxy node
[ https://issues.apache.org/jira/browse/CASSANDRA-15290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16915145#comment-16915145 ]

Jay Zhuang commented on CASSANDRA-15290:
----------------------------------------

| Branch | CircleCI |
| [15290-trunk|https://github.com/Instagram/cassandra/tree/15290-trunk] | https://circleci.com/workflow-run/068aa546-39cd-4999-b959-4bd5d180c5d1 |
[jira] [Updated] (CASSANDRA-15290) Avoid token cache invalidation for removing proxy node
[ https://issues.apache.org/jira/browse/CASSANDRA-15290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jay Zhuang updated CASSANDRA-15290:
-----------------------------------
    Change Category: Performance
         Complexity: Normal
      Fix Version/s: 4.0.x
           Priority: Low  (was: Normal)
             Status: Open  (was: Triage Needed)
[jira] [Created] (CASSANDRA-15290) Avoid token cache invalidation for removing proxy node
Jay Zhuang created CASSANDRA-15290:
-----------------------------------

             Summary: Avoid token cache invalidation for removing proxy node
                 Key: CASSANDRA-15290
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15290
             Project: Cassandra
          Issue Type: Improvement
          Components: Cluster/Membership
            Reporter: Jay Zhuang
            Assignee: Jay Zhuang

As a proxy node doesn't own tokens, adding or removing one doesn't change token ownership, so there is no need to invalidate the token cache.
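The guard the ticket describes can be sketched as a simple ownership check before the cache drop. All names here (`TokenCacheGuard`, `invalidateTokenCache`) are hypothetical illustrations of the idea, not Cassandra's real API:

```java
import java.util.*;

// A minimal sketch of the CASSANDRA-15290 idea: a node that owns no tokens
// (a "proxy"/coordinator-only node) joining or leaving does not change token
// ownership, so the cached range-to-replicas mapping can be kept instead of
// being invalidated. Hypothetical names throughout.
public class TokenCacheGuard {
    private final Map<String, Set<Long>> tokensByEndpoint = new HashMap<>();
    public int cacheInvalidations = 0;  // counts expensive cache drops

    public void addEndpoint(String endpoint, Set<Long> ownedTokens) {
        tokensByEndpoint.put(endpoint, ownedTokens);
        if (!ownedTokens.isEmpty())
            invalidateTokenCache();  // ownership actually changed
    }

    public void removeEndpoint(String endpoint) {
        Set<Long> owned = tokensByEndpoint.remove(endpoint);
        // Skip the invalidation when the removed endpoint owned no tokens.
        if (owned != null && !owned.isEmpty())
            invalidateTokenCache();
    }

    private void invalidateTokenCache() { cacheInvalidations++; }

    public static void main(String[] args) {
        TokenCacheGuard guard = new TokenCacheGuard();
        guard.addEndpoint("data-node", new HashSet<>(Arrays.asList(1L, 2L)));
        guard.addEndpoint("proxy-node", Collections.emptySet());
        guard.removeEndpoint("proxy-node");   // no invalidation: owned nothing
        guard.removeEndpoint("data-node");    // invalidation: owned tokens
        System.out.println("invalidations: " + guard.cacheInvalidations); // 2
    }
}
```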
[jira] [Updated] (CASSANDRA-15239) [flaky in-mem dtest] nodeDownDuringMove - org.apache.cassandra.distributed.test.GossipTest
[ https://issues.apache.org/jira/browse/CASSANDRA-15239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jay Zhuang updated CASSANDRA-15239:
-----------------------------------
         Severity: Normal
       Complexity: Normal
    Discovered By: Unit Test
     Bug Category: Parent values: Correctness(12982), Level 1 values: Test Failure(12990)
           Status: Open  (was: Triage Needed)

> [flaky in-mem dtest] nodeDownDuringMove - org.apache.cassandra.distributed.test.GossipTest
> ------------------------------------------------------------------------------------------
>
>           Key: CASSANDRA-15239
>           URL: https://issues.apache.org/jira/browse/CASSANDRA-15239
>       Project: Cassandra
>    Issue Type: Bug
>    Components: Test/dtest
>      Reporter: Jay Zhuang
>      Priority: Normal
>
> The in-mem dtest fails from time to time:
> {noformat}
> nodeDownDuringMove - org.apache.cassandra.distributed.test.GossipTest
> java.lang.RuntimeException: java.lang.IllegalStateException: Unable to contact any seeds!
> {noformat}
> [https://circleci.com/gh/Instagram/cassandra/98]
> More details:
> {noformat}
> Testcase: nodeDownDuringMove(org.apache.cassandra.distributed.test.GossipTest): Caused an ERROR
> java.lang.IllegalStateException: Unable to contact any seeds!
> java.lang.RuntimeException: java.lang.IllegalStateException: Unable to contact any seeds!
>     at org.apache.cassandra.distributed.impl.IsolatedExecutor.waitOn(IsolatedExecutor.java:166)
>     at org.apache.cassandra.distributed.impl.IsolatedExecutor.lambda$sync$4(IsolatedExecutor.java:69)
>     at org.apache.cassandra.distributed.impl.Instance.startup(Instance.java:322)
>     at org.apache.cassandra.distributed.impl.AbstractCluster$Wrapper.startup(AbstractCluster.java:148)
>     at org.apache.cassandra.distributed.test.GossipTest.nodeDownDuringMove(GossipTest.java:96)
> Caused by: java.lang.IllegalStateException: Unable to contact any seeds!
>     at org.apache.cassandra.service.StorageService.bootstrap(StorageService.java:1261)
>     at org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:921)
>     at org.apache.cassandra.distributed.impl.Instance.lambda$startup$6(Instance.java:301)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:83)
>     at java.lang.Thread.run(Thread.java:748)
> Test org.apache.cassandra.distributed.test.GossipTest FAILED
> {noformat}
[jira] [Comment Edited] (CASSANDRA-15098) Endpoints no longer owning tokens are not removed for vnode
[ https://issues.apache.org/jira/browse/CASSANDRA-15098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16889202#comment-16889202 ]

Jay Zhuang edited comment on CASSANDRA-15098 at 7/19/19 10:07 PM:
------------------------------------------------------------------

Rebased the code and passed tests, please review:

| Branch | uTest | jvm-dTest | dTest | dTest vnode |
| [15098-3.0|https://github.com/instagram/cassandra/tree/15098-3.0] | [#107 passed|https://circleci.com/gh/Instagram/cassandra/107] | [#108 failed|https://circleci.com/gh/Instagram/cassandra/108], known issue: CASSANDRA-15239 | [#110 failed|https://circleci.com/gh/Instagram/cassandra/110], passed locally, known issue: CASSANDRA-14595 | [#109 failed|https://circleci.com/gh/Instagram/cassandra/109], passed locally, known issue: CASSANDRA-14595 |
| [15098-3.11|https://github.com/instagram/cassandra/tree/15098-3.11] | [#100 passed|https://circleci.com/gh/Instagram/cassandra/100] | [#99 passed|https://circleci.com/gh/Instagram/cassandra/99] | [#111 failed|https://circleci.com/gh/Instagram/cassandra/111], passed locally: CASSANDRA-14595 | [#112 failed|https://circleci.com/gh/Instagram/cassandra/112], passed locally: CASSANDRA-14595 |
| [15098-trunk|https://github.com/instagram/cassandra/tree/15098-trunk] | [#104 failed|https://circleci.com/gh/Instagram/cassandra/104], passed locally and re-run passed [#117|https://circleci.com/gh/Instagram/cassandra/117] | [#105 passed|https://circleci.com/gh/Instagram/cassandra/105] | [#114 passed|https://circleci.com/gh/Instagram/cassandra/114] | [#113 passed|https://circleci.com/gh/Instagram/cassandra/113] |

> Endpoints no longer owning tokens are not removed for vnode
> -----------------------------------------------------------
>
>           Key: CASSANDRA-15098
>           URL: https://issues.apache.org/jira/browse/CASSANDRA-15098
>       Project: Cassandra
>    Issue Type: Bug
>    Components: Cluster/Gossip
>      Reporter: Jay Zhuang
>      Assignee: Jay Zhuang
>      Priority: Normal
>
> The logic here to remove endpoints no longer owning tokens does not work
> for multiple tokens (vnodes):
> https://github.com/apache/cassandra/blob/06209037ea56b5a2a49615a99f1542d6ea1b2947/src/java/org/apache/cassandra/service/StorageService.java#L2505
> And it's very expensive to copy the token metadata for every check.
[jira] [Commented] (CASSANDRA-15098) Endpoints no longer owning tokens are not removed for vnode
[ https://issues.apache.org/jira/browse/CASSANDRA-15098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16889202#comment-16889202 ]

Jay Zhuang commented on CASSANDRA-15098:
----------------------------------------

Rebased the code and passed tests, please review:

| Branch | uTest | jvm-dTest | dTest | dTest vnode |
| [15098-3.0|https://github.com/instagram/cassandra/tree/15098-3.0] | [#107 passed|https://circleci.com/gh/Instagram/cassandra/107] | [#108 failed|https://circleci.com/gh/Instagram/cassandra/108], known issue: CASSANDRA-15239 | [#110 failed|https://circleci.com/gh/Instagram/cassandra/110], passed locally, known issue: CASSANDRA-14595 | [#109 failed|https://circleci.com/gh/Instagram/cassandra/109], passed locally, known issue: CASSANDRA-14595 |
| [15098-3.11|https://github.com/instagram/cassandra/tree/15098-3.11] | [#100 passed|https://circleci.com/gh/Instagram/cassandra/100] | [#99 passed|https://circleci.com/gh/Instagram/cassandra/99] | [#111 failed|https://circleci.com/gh/Instagram/cassandra/111], passed locally: CASSANDRA-14595 | [#112 failed|https://circleci.com/gh/Instagram/cassandra/112], passed locally: CASSANDRA-14595 |
| [15098-trunk|https://github.com/instagram/cassandra/tree/15098-trunk] | [#104 failed|https://circleci.com/gh/Instagram/cassandra/104], passed locally and re-run passed [#117|https://circleci.com/gh/Instagram/cassandra/117] | [#105 passed|https://circleci.com/gh/Instagram/cassandra/105] | [#114 passed|https://circleci.com/gh/Instagram/cassandra/114] | [#113 passed|https://circleci.com/gh/Instagram/cassandra/113] |
[jira] [Created] (CASSANDRA-15239) [flaky in-mem dtest] nodeDownDuringMove - org.apache.cassandra.distributed.test.GossipTest
Jay Zhuang created CASSANDRA-15239:
-----------------------------------

             Summary: [flaky in-mem dtest] nodeDownDuringMove - org.apache.cassandra.distributed.test.GossipTest
                 Key: CASSANDRA-15239
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15239
             Project: Cassandra
          Issue Type: Bug
          Components: Test/dtest
            Reporter: Jay Zhuang

The in-mem dtest fails from time to time:

{noformat}
nodeDownDuringMove - org.apache.cassandra.distributed.test.GossipTest
java.lang.RuntimeException: java.lang.IllegalStateException: Unable to contact any seeds!
{noformat}

[https://circleci.com/gh/Instagram/cassandra/98]

More details:

{noformat}
Testcase: nodeDownDuringMove(org.apache.cassandra.distributed.test.GossipTest): Caused an ERROR
java.lang.IllegalStateException: Unable to contact any seeds!
java.lang.RuntimeException: java.lang.IllegalStateException: Unable to contact any seeds!
    at org.apache.cassandra.distributed.impl.IsolatedExecutor.waitOn(IsolatedExecutor.java:166)
    at org.apache.cassandra.distributed.impl.IsolatedExecutor.lambda$sync$4(IsolatedExecutor.java:69)
    at org.apache.cassandra.distributed.impl.Instance.startup(Instance.java:322)
    at org.apache.cassandra.distributed.impl.AbstractCluster$Wrapper.startup(AbstractCluster.java:148)
    at org.apache.cassandra.distributed.test.GossipTest.nodeDownDuringMove(GossipTest.java:96)
Caused by: java.lang.IllegalStateException: Unable to contact any seeds!
    at org.apache.cassandra.service.StorageService.bootstrap(StorageService.java:1261)
    at org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:921)
    at org.apache.cassandra.distributed.impl.Instance.lambda$startup$6(Instance.java:301)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:83)
    at java.lang.Thread.run(Thread.java:748)
Test org.apache.cassandra.distributed.test.GossipTest FAILED
{noformat}
[jira] [Updated] (CASSANDRA-15097) Avoid updating unchanged gossip state
[ https://issues.apache.org/jira/browse/CASSANDRA-15097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jay Zhuang updated CASSANDRA-15097:
-----------------------------------
          Fix Version/s: 4.0, 3.11.5, 3.0.19
    Source Control Link: [3f70e7c72c703bc323b169a28e8754ce67d4e479|https://github.com/apache/cassandra/commit/3f70e7c72c703bc323b169a28e8754ce67d4e479]
          Since Version: 3.0.0
                 Status: Resolved  (was: Ready to Commit)
             Resolution: Fixed

Thanks [~samt] for the review. Committed as [3f70e7c|https://github.com/apache/cassandra/commit/3f70e7c72c703bc323b169a28e8754ce67d4e479].

> Avoid updating unchanged gossip state
> -------------------------------------
>
>           Key: CASSANDRA-15097
>           URL: https://issues.apache.org/jira/browse/CASSANDRA-15097
>       Project: Cassandra
>    Issue Type: Bug
>    Components: Cluster/Gossip
>      Reporter: Jay Zhuang
>      Assignee: Jay Zhuang
>      Priority: Normal
>       Fix For: 3.0.19, 3.11.5, 4.0
>
> A node might receive unchanged gossip states: a state might be updated just
> after the node sends a GOSSIP_SYN, so the reply carries a state that is
> already up to date locally. If the heartbeat in the GOSSIP_ACK message is
> newer, the node will unnecessarily re-apply the same state, which can be
> costly (e.g. re-applying a token change).
> This is very likely to happen on a large cluster when a node starts up: the
> first gossip message syncs all endpoints' tokens, which can take a while
> (in our case about 200 seconds), and during that time the node keeps
> gossiping with other nodes and receiving the full token states. That causes
> lots of pending gossip tasks.
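The wasted work described in this ticket (re-applying a state the node already holds) can be sketched with a version check before the apply. The names below (`GossipStateFilter`, `AppState`, `onAck`) are illustrative only; Cassandra's real `Gossiper`/`EndpointState` handling is richer:

```java
import java.util.*;

// Illustrative sketch of the CASSANDRA-15097 fix direction: before applying
// a state received in a GOSSIP_ACK, compare its version with the locally
// stored one and skip the (potentially expensive) apply when nothing is
// newer. Hypothetical names throughout.
public class GossipStateFilter {
    public static final class AppState {
        final String value;
        final int version;
        public AppState(String value, int version) { this.value = value; this.version = version; }
    }

    private final Map<String, AppState> localStates = new HashMap<>();
    public int applies = 0;  // counts expensive state applications (e.g. token updates)

    public void onAck(String endpoint, AppState remote) {
        AppState local = localStates.get(endpoint);
        if (local != null && remote.version <= local.version)
            return;  // unchanged (or stale) state: don't re-apply
        localStates.put(endpoint, remote);
        applies++;
    }

    public static void main(String[] args) {
        GossipStateFilter f = new GossipStateFilter();
        f.onAck("node1", new AppState("tokens-v1", 5));
        f.onAck("node1", new AppState("tokens-v1", 5));  // duplicate ACK: skipped
        f.onAck("node1", new AppState("tokens-v2", 6));  // newer version: applied
        System.out.println("applies: " + f.applies); // 2
    }
}
```

During the slow initial token sync the node keeps receiving full, unchanged token states from its peers; filtering them this way is what keeps the pending-gossip-task queue from growing.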
[jira] [Commented] (CASSANDRA-15097) Avoid updating unchanged gossip state
[ https://issues.apache.org/jira/browse/CASSANDRA-15097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16886404#comment-16886404 ]

Jay Zhuang commented on CASSANDRA-15097:
----------------------------------------

Added dTest results:

| Branch | uTest | jvm-dTest | dTest | dTest vnode |
| [15097-3.0|https://github.com/instagram/cassandra/tree/15097-3.0] | [#66 passed|https://circleci.com/gh/Instagram/cassandra/66] | [#67 passed|https://circleci.com/gh/Instagram/cassandra/67] | [#75 failed|https://circleci.com/gh/Instagram/cassandra/75], passed locally: CASSANDRA-14595 | [#74 failed|https://circleci.com/gh/Instagram/cassandra/74], passed locally: CASSANDRA-14595 |
| [15097-3.11|https://github.com/instagram/cassandra/tree/15097-3.11] | [#69 passed|https://circleci.com/gh/Instagram/cassandra/69] | [#68 passed|https://circleci.com/gh/Instagram/cassandra/68] | [#77 failed|https://circleci.com/gh/Instagram/cassandra/77], passed locally: CASSANDRA-14595 | [#76 failed|https://circleci.com/gh/Instagram/cassandra/76], passed locally: CASSANDRA-14595 |
| [15097-trunk|https://github.com/instagram/cassandra/tree/15097-trunk] | [#72 passed|https://circleci.com/gh/Instagram/cassandra/72] | [#73 passed|https://circleci.com/gh/Instagram/cassandra/73] | [#78 passed|https://circleci.com/gh/Instagram/cassandra/78] | [#79 failed|https://circleci.com/gh/Instagram/cassandra/79], passed locally |

All failed dtests pass locally with a 10x run:
{{$ pytest --count=10 --cassandra-dir=~/cassandra $TESTS}}
[jira] [Comment Edited] (CASSANDRA-15097) Avoid updating unchanged gossip state
[ https://issues.apache.org/jira/browse/CASSANDRA-15097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16885427#comment-16885427 ]

Jay Zhuang edited comment on CASSANDRA-15097 at 7/15/19 5:29 PM:
-----------------------------------------------------------------

Thanks [~samt]. The patch is rebased and the tests pass in CircleCI:

| Branch | uTest (circleci) |
| [15097-3.0|https://github.com/instagram/cassandra/tree/15097-3.0] | [pass|https://circleci.com/gh/Instagram/cassandra/tree/15097-3.0] |
| [15097-3.11|https://github.com/instagram/cassandra/tree/15097-3.11] | [pass|https://circleci.com/gh/Instagram/cassandra/tree/15097-3.11] |
| [15097-trunk|https://github.com/instagram/cassandra/tree/15097-trunk] | [pass|https://circleci.com/gh/Instagram/cassandra/tree/15097-trunk] |
[jira] [Updated] (CASSANDRA-15097) Avoid updating unchanged gossip state
[ https://issues.apache.org/jira/browse/CASSANDRA-15097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jay Zhuang updated CASSANDRA-15097:
-----------------------------------
    Status: Review In Progress  (was: Changes Suggested)
[jira] [Commented] (CASSANDRA-15097) Avoid updating unchanged gossip state
[ https://issues.apache.org/jira/browse/CASSANDRA-15097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16885427#comment-16885427 ]

Jay Zhuang commented on CASSANDRA-15097:
----------------------------------------

Here is a patch to filter out updated states:

| Branch | uTest (circleci) |
| [15097-3.0|https://github.com/instagram/cassandra/tree/15097-3.0] | [pass|https://circleci.com/gh/Instagram/cassandra/tree/15097-3.0] |
| [15097-3.11|https://github.com/instagram/cassandra/tree/15097-3.11] | [pass|https://circleci.com/gh/Instagram/cassandra/tree/15097-3.11] |
| [15097-trunk|https://github.com/instagram/cassandra/tree/15097-trunk] | [pass|https://circleci.com/gh/Instagram/cassandra/tree/15097-trunk] |
[jira] [Updated] (CASSANDRA-15141) RemoveNode takes long time and blocks gossip stage
[ https://issues.apache.org/jira/browse/CASSANDRA-15141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jay Zhuang updated CASSANDRA-15141: --- Test and Documentation Plan: It's deployed to all our production clusters. Status: Patch Available (was: Open) > RemoveNode takes long time and blocks gossip stage > -- > > Key: CASSANDRA-15141 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15141 > Project: Cassandra > Issue Type: Improvement > Components: Cluster/Gossip, Cluster/Membership >Reporter: Jay Zhuang >Assignee: Jay Zhuang >Priority: Normal > > This function > [{{getAddressReplicas()}}|https://github.com/apache/cassandra/blob/7df67eff2d66dba4bed2b4f6aeabf05144d9b057/src/java/org/apache/cassandra/service/StorageService.java#L3002], > called during removenode and decommission, is slow for a large vnode cluster with > NetworkTopologyStrategy, as it needs to build the whole replication map for > every token range. > In one of our clusters (> 1k nodes), it takes about 20 seconds for each > NetworkTopologyStrategy keyspace, so the total time to process a removenode > message is at least 80 seconds (20 * 4: 3 system keyspaces, 1 user > keyspace). It blocks heartbeat propagation and causes nodes to be falsely marked down. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-15141) RemoveNode takes long time and blocks gossip stage
[ https://issues.apache.org/jira/browse/CASSANDRA-15141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16876492#comment-16876492 ] Jay Zhuang commented on CASSANDRA-15141: Here is a patch to improve the performance of calculating endpoint replicas. It's 100x - 1000x faster than the default implementation: | Branch | uTest | JVM-dTest | dTest | | [15141-trunk|https://github.com/Instagram/cassandra/tree/15141-trunk] | [circle #51|https://circleci.com/gh/Instagram/cassandra/51] | [circle #50|https://circleci.com/gh/Instagram/cassandra/50] | [circle #53|https://circleci.com/gh/Instagram/cassandra/53] | > RemoveNode takes long time and blocks gossip stage > -- > > Key: CASSANDRA-15141 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15141 > Project: Cassandra > Issue Type: Improvement > Components: Cluster/Gossip, Cluster/Membership >Reporter: Jay Zhuang >Assignee: Jay Zhuang >Priority: Normal > > This function > [{{getAddressReplicas()}}|https://github.com/apache/cassandra/blob/7df67eff2d66dba4bed2b4f6aeabf05144d9b057/src/java/org/apache/cassandra/service/StorageService.java#L3002], > called during removenode and decommission, is slow for a large vnode cluster with > NetworkTopologyStrategy, as it needs to build the whole replication map for > every token range. > In one of our clusters (> 1k nodes), it takes about 20 seconds for each > NetworkTopologyStrategy keyspace, so the total time to process a removenode > message is at least 80 seconds (20 * 4: 3 system keyspaces, 1 user > keyspace). It blocks heartbeat propagation and causes nodes to be falsely marked down. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
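To see why the default calculation is slow on a large vnode cluster, consider the per-range replica walk. The sketch below is a hypothetical, SimpleStrategy-like simplification, not the actual Cassandra code (real NetworkTopologyStrategy additionally enforces DC and rack placement, which makes each walk even more expensive): for every token it restarts a clockwise scan of the ring, so the total cost grows roughly with tokens × ring size.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ReplicaWalk {
    // Simplified replica search: for each token, walk the ring clockwise
    // and take the first rf distinct endpoints. Restarting this walk for
    // every token is what dominates the cost on large vnode clusters.
    static Map<Long, List<String>> replicasPerRange(TreeMap<Long, String> ring, int rf) {
        Map<Long, List<String>> result = new HashMap<>();
        for (Long token : ring.keySet()) {
            List<String> replicas = new ArrayList<>();
            for (Long t : iterateFrom(ring, token)) {
                String ep = ring.get(t);
                if (!replicas.contains(ep))
                    replicas.add(ep);
                if (replicas.size() == rf)
                    break;
            }
            result.put(token, replicas);
        }
        return result;
    }

    // Ring order starting at `start` (inclusive), wrapping around.
    static List<Long> iterateFrom(TreeMap<Long, String> ring, Long start) {
        List<Long> order = new ArrayList<>(ring.tailMap(start, true).keySet());
        order.addAll(ring.headMap(start, false).keySet());
        return order;
    }

    public static void main(String[] args) {
        TreeMap<Long, String> ring = new TreeMap<>();
        ring.put(10L, "a"); ring.put(20L, "b"); ring.put(30L, "c"); ring.put(40L, "a");
        System.out.println(replicasPerRange(ring, 2).get(10L)); // prints [a, b]
    }
}
```

A faster implementation can amortize this by computing ownership in a single pass over the ring instead of one walk per range, which is the direction the patch above takes.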
[jira] [Updated] (CASSANDRA-15141) RemoveNode takes long time and blocks gossip stage
[ https://issues.apache.org/jira/browse/CASSANDRA-15141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jay Zhuang updated CASSANDRA-15141: --- Complexity: Challenging Change Category: Performance Status: Open (was: Triage Needed) > RemoveNode takes long time and blocks gossip stage > -- > > Key: CASSANDRA-15141 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15141 > Project: Cassandra > Issue Type: Improvement > Components: Cluster/Gossip, Cluster/Membership >Reporter: Jay Zhuang >Assignee: Jay Zhuang >Priority: Normal > > This function > [{{getAddressReplicas()}}|https://github.com/apache/cassandra/blob/7df67eff2d66dba4bed2b4f6aeabf05144d9b057/src/java/org/apache/cassandra/service/StorageService.java#L3002], > called during removenode and decommission, is slow for a large vnode cluster with > NetworkTopologyStrategy, as it needs to build the whole replication map for > every token range. > In one of our clusters (> 1k nodes), it takes about 20 seconds for each > NetworkTopologyStrategy keyspace, so the total time to process a removenode > message is at least 80 seconds (20 * 4: 3 system keyspaces, 1 user > keyspace). It blocks heartbeat propagation and causes nodes to be falsely marked down. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Created] (CASSANDRA-15141) RemoveNode takes long time and blocks gossip stage
Jay Zhuang created CASSANDRA-15141: -- Summary: RemoveNode takes long time and blocks gossip stage Key: CASSANDRA-15141 URL: https://issues.apache.org/jira/browse/CASSANDRA-15141 Project: Cassandra Issue Type: Improvement Components: Cluster/Gossip, Cluster/Membership Reporter: Jay Zhuang Assignee: Jay Zhuang This function [{{getAddressReplicas()}}|https://github.com/apache/cassandra/blob/7df67eff2d66dba4bed2b4f6aeabf05144d9b057/src/java/org/apache/cassandra/service/StorageService.java#L3002], called during removenode and decommission, is slow for a large vnode cluster with NetworkTopologyStrategy, as it needs to build the whole replication map for every token range. In one of our clusters (> 1k nodes), it takes about 20 seconds for each NetworkTopologyStrategy keyspace, so the total time to process a removenode message is at least 80 seconds (20 * 4: 3 system keyspaces, 1 user keyspace). It blocks heartbeat propagation and causes nodes to be falsely marked down. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-15097) Avoid updating unchanged gossip state
[ https://issues.apache.org/jira/browse/CASSANDRA-15097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jay Zhuang updated CASSANDRA-15097: --- Test and Documentation Plan: Unit tests pass, and the code is committed and running in the Instagram production environment. Status: Patch Available (was: Open) > Avoid updating unchanged gossip state > - > > Key: CASSANDRA-15097 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15097 > Project: Cassandra > Issue Type: Bug > Components: Cluster/Gossip >Reporter: Jay Zhuang >Assignee: Jay Zhuang >Priority: Normal > > A node might receive unchanged gossip states: a state might be updated just > after the node sends a GOSSIP_SYN, so the reply contains a state that is already up > to date locally. If the heartbeat in the GOSSIP_ACK message is newer, the node will > unnecessarily re-apply the same state, which can be costly, e.g. applying a token > change. > This is very likely to happen on a large cluster when a node starts up, as the > first gossip message syncs all endpoints' tokens, which can take some time > (about 200 seconds in our case); during that time the node keeps gossiping with > other nodes and receiving the full token states, which causes lots of pending gossip tasks. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-15133) Node restart causes unnecessary token metadata update
[ https://issues.apache.org/jira/browse/CASSANDRA-15133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jay Zhuang updated CASSANDRA-15133: --- Test and Documentation Plan: Unit tests pass, and the code is committed and running in the Instagram production environment. Status: Patch Available (was: Open) > Node restart causes unnecessary token metadata update > - > > Key: CASSANDRA-15133 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15133 > Project: Cassandra > Issue Type: Improvement > Components: Cluster/Gossip, Cluster/Membership >Reporter: Jay Zhuang >Assignee: Jay Zhuang >Priority: Normal > > Restarting a node causes a gossip generation update. When the message propagates > through the cluster, every node blindly updates its local token metadata even though > it has not changed. Updating token metadata is expensive for a large vnode > cluster and unnecessarily invalidates the token metadata cache. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-15097) Avoid updating unchanged gossip state
[ https://issues.apache.org/jira/browse/CASSANDRA-15097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jay Zhuang updated CASSANDRA-15097: --- Severity: Low Complexity: Normal Discovered By: User Report Bug Category: Parent values: Degradation(12984)Level 1 values: Performance Bug/Regression(12997) Status: Open (was: Triage Needed) > Avoid updating unchanged gossip state > - > > Key: CASSANDRA-15097 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15097 > Project: Cassandra > Issue Type: Bug > Components: Cluster/Gossip >Reporter: Jay Zhuang >Assignee: Jay Zhuang >Priority: Normal > > A node might receive unchanged gossip states: a state might be updated just > after the node sends a GOSSIP_SYN, so the reply contains a state that is already up > to date locally. If the heartbeat in the GOSSIP_ACK message is newer, the node will > unnecessarily re-apply the same state, which can be costly, e.g. applying a token > change. > This is very likely to happen on a large cluster when a node starts up, as the > first gossip message syncs all endpoints' tokens, which can take some time > (about 200 seconds in our case); during that time the node keeps gossiping with > other nodes and receiving the full token states, which causes lots of pending gossip tasks. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-15098) Endpoints no longer owning tokens are not removed for vnode
[ https://issues.apache.org/jira/browse/CASSANDRA-15098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jay Zhuang updated CASSANDRA-15098: --- Test and Documentation Plan: Unit tests pass, and the code is committed and running in the Instagram production environment. Status: Patch Available (was: Open) > Endpoints no longer owning tokens are not removed for vnode > --- > > Key: CASSANDRA-15098 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15098 > Project: Cassandra > Issue Type: Bug > Components: Cluster/Gossip >Reporter: Jay Zhuang >Assignee: Jay Zhuang >Priority: Normal > > The logic here to remove endpoints that no longer own tokens does not work > for multiple tokens (vnode): > https://github.com/apache/cassandra/blob/06209037ea56b5a2a49615a99f1542d6ea1b2947/src/java/org/apache/cassandra/service/StorageService.java#L2505 > It is also very expensive to copy the token metadata for every check. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-15098) Endpoints no longer owning tokens are not removed for vnode
[ https://issues.apache.org/jira/browse/CASSANDRA-15098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jay Zhuang updated CASSANDRA-15098: --- Severity: Normal Complexity: Normal Discovered By: User Report Bug Category: Parent values: Correctness(12982)Level 1 values: Persistent Corruption / Loss(12986) Status: Open (was: Triage Needed) > Endpoints no longer owning tokens are not removed for vnode > --- > > Key: CASSANDRA-15098 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15098 > Project: Cassandra > Issue Type: Bug > Components: Cluster/Gossip >Reporter: Jay Zhuang >Assignee: Jay Zhuang >Priority: Normal > > The logic here to remove endpoints that no longer own tokens does not work > for multiple tokens (vnode): > https://github.com/apache/cassandra/blob/06209037ea56b5a2a49615a99f1542d6ea1b2947/src/java/org/apache/cassandra/service/StorageService.java#L2505 > It is also very expensive to copy the token metadata for every check. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-15133) Node restart causes unnecessary token metadata update
[ https://issues.apache.org/jira/browse/CASSANDRA-15133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jay Zhuang updated CASSANDRA-15133: --- Complexity: Low Hanging Fruit Change Category: Performance Status: Open (was: Triage Needed) > Node restart causes unnecessary token metadata update > - > > Key: CASSANDRA-15133 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15133 > Project: Cassandra > Issue Type: Improvement > Components: Cluster/Gossip, Cluster/Membership >Reporter: Jay Zhuang >Assignee: Jay Zhuang >Priority: Normal > > Restarting a node causes a gossip generation update. When the message propagates > through the cluster, every node blindly updates its local token metadata even though > it has not changed. Updating token metadata is expensive for a large vnode > cluster and unnecessarily invalidates the token metadata cache. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-15133) Node restart causes unnecessary token metadata update
[ https://issues.apache.org/jira/browse/CASSANDRA-15133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16844502#comment-16844502 ] Jay Zhuang commented on CASSANDRA-15133: Here is a proposed fix: | [15133-trunk|https://github.com/cooldoger/cassandra/tree/15133-trunk] | [!https://circleci.com/gh/cooldoger/cassandra/tree/15133-trunk.svg?style=svg!|https://circleci.com/gh/cooldoger/cassandra/tree/15133-trunk] | It also fixes an issue with removing {{movingEndpoint}}: the following code never works, because removing an item while iterating over the collection throws a {{ConcurrentModificationException}}: {noformat} for (Pair pair : movingEndpoints) { if (pair.right.equals(endpoint)) { movingEndpoints.remove(pair); break; } } {noformat} > Node restart causes unnecessary token metadata update > - > > Key: CASSANDRA-15133 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15133 > Project: Cassandra > Issue Type: Improvement > Components: Cluster/Gossip, Cluster/Membership >Reporter: Jay Zhuang >Assignee: Jay Zhuang >Priority: Normal > > Restarting a node causes a gossip generation update. When the message propagates > through the cluster, every node blindly updates its local token metadata even though > it has not changed. Updating token metadata is expensive for a large vnode > cluster and unnecessarily invalidates the token metadata cache. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
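A standard safe pattern for the removal quoted above is to go through the {{Iterator}} so the list is only modified via the iterator itself. The sketch below uses a hypothetical stand-in for Cassandra's {{Pair}} type (the real one is {{org.apache.cassandra.utils.Pair}}):

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class SafeRemoval {
    // Hypothetical stand-in for org.apache.cassandra.utils.Pair.
    static final class Pair<L, R> {
        final L left;
        final R right;
        Pair(L left, R right) { this.left = left; this.right = right; }
    }

    // Remove the first pair whose right element matches `endpoint`.
    // Removing through the iterator never structurally modifies the
    // list behind the iterator's back, so no exception can be thrown.
    static boolean removeEndpoint(List<Pair<String, String>> movingEndpoints, String endpoint) {
        for (Iterator<Pair<String, String>> it = movingEndpoints.iterator(); it.hasNext(); ) {
            if (it.next().right.equals(endpoint)) {
                it.remove();
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<Pair<String, String>> moving = new ArrayList<>();
        moving.add(new Pair<>("token1", "10.0.0.1"));
        moving.add(new Pair<>("token2", "10.0.0.2"));
        System.out.println(removeEndpoint(moving, "10.0.0.2")); // prints true
        System.out.println(moving.size()); // prints 1
    }
}
```

On Java 8+ the same fix can be written as {{movingEndpoints.removeIf(p -> p.right.equals(endpoint))}}, with the caveat that {{removeIf}} removes every match rather than only the first.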
[jira] [Created] (CASSANDRA-15133) Node restart causes unnecessary token metadata update
Jay Zhuang created CASSANDRA-15133: -- Summary: Node restart causes unnecessary token metadata update Key: CASSANDRA-15133 URL: https://issues.apache.org/jira/browse/CASSANDRA-15133 Project: Cassandra Issue Type: Improvement Components: Cluster/Gossip, Cluster/Membership Reporter: Jay Zhuang Assignee: Jay Zhuang Restarting a node causes a gossip generation update. When the message propagates through the cluster, every node blindly updates its local token metadata even though it has not changed. Updating token metadata is expensive for a large vnode cluster and unnecessarily invalidates the token metadata cache. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
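The guard implied by the report above can be sketched like this. It is a hypothetical simplification (the real structure is Cassandra's {{TokenMetadata}}; the map-based model here is invented for illustration): compare the incoming token set with the stored one, and skip the expensive update, along with the cache invalidation it triggers, when nothing actually changed.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class TokenMetadataUpdate {
    // Hypothetical simplified token metadata: endpoint -> owned tokens.
    // Only rewrite the entry (and, in the real system, invalidate caches)
    // when the token set differs from what is already stored.
    static boolean updateIfChanged(Map<String, Set<Long>> tokenMetadata, String endpoint, Set<Long> newTokens) {
        Set<Long> current = tokenMetadata.get(endpoint);
        if (newTokens.equals(current))
            return false; // unchanged: skip the expensive update
        tokenMetadata.put(endpoint, new HashSet<>(newTokens));
        return true;
    }

    public static void main(String[] args) {
        Map<String, Set<Long>> tm = new HashMap<>();
        tm.put("10.0.0.1", new HashSet<>(Arrays.asList(1L, 2L)));
        System.out.println(updateIfChanged(tm, "10.0.0.1", new HashSet<>(Arrays.asList(1L, 2L)))); // prints false
        System.out.println(updateIfChanged(tm, "10.0.0.1", new HashSet<>(Arrays.asList(1L, 3L)))); // prints true
    }
}
```

The check costs one set comparison, which is cheap compared with rebuilding token metadata and invalidating its cache across a large vnode cluster.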
[jira] [Commented] (CASSANDRA-15098) Endpoints no longer owning tokens are not removed for vnode
[ https://issues.apache.org/jira/browse/CASSANDRA-15098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16825447#comment-16825447 ] Jay Zhuang commented on CASSANDRA-15098: Here is a utest to reproduce the problem and a proposed fix, please review: | Branch | uTest | | [15098-3.0|https://github.com/cooldoger/cassandra/tree/15098-3.0] | [!https://circleci.com/gh/cooldoger/cassandra/tree/15098-3.0.svg?style=svg!|https://circleci.com/gh/cooldoger/cassandra/tree/15098-3.0] | | [15098-3.11|https://github.com/cooldoger/cassandra/tree/15098-3.11] | [!https://circleci.com/gh/cooldoger/cassandra/tree/15098-3.11.svg?style=svg!|https://circleci.com/gh/cooldoger/cassandra/tree/15098-3.11] | | [15098-trunk|https://github.com/cooldoger/cassandra/tree/15098-trunk] | [!https://circleci.com/gh/cooldoger/cassandra/tree/15098-trunk.svg?style=svg!|https://circleci.com/gh/cooldoger/cassandra/tree/15098-trunk] | > Endpoints no longer owning tokens are not removed for vnode > --- > > Key: CASSANDRA-15098 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15098 > Project: Cassandra > Issue Type: Bug > Components: Cluster/Gossip >Reporter: Jay Zhuang >Assignee: Jay Zhuang >Priority: Normal > > The logic here to remove endpoints that no longer own tokens does not work > for multiple tokens (vnode): > https://github.com/apache/cassandra/blob/06209037ea56b5a2a49615a99f1542d6ea1b2947/src/java/org/apache/cassandra/service/StorageService.java#L2505 > It is also very expensive to copy the token metadata for every check. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
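The ownership check can be sketched without copying the whole token metadata for every endpoint: build the set of endpoints that still own at least one token in a single pass, then flag any known endpoint missing from that set. This is a hypothetical simplification with invented types, not the actual patch:

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class StaleEndpointRemoval {
    // Given the token -> endpoint ownership map, return the endpoints
    // that no longer own any token. One pass over the map handles the
    // multiple-tokens-per-endpoint (vnode) case correctly, with no copy
    // of the metadata per check.
    static Set<String> endpointsToRemove(Map<Long, String> tokenToEndpoint, Set<String> knownEndpoints) {
        Set<String> owners = new HashSet<>(tokenToEndpoint.values());
        Set<String> stale = new HashSet<>();
        for (String ep : knownEndpoints)
            if (!owners.contains(ep))
                stale.add(ep);
        return stale;
    }

    public static void main(String[] args) {
        Map<Long, String> owns = new java.util.HashMap<>();
        owns.put(1L, "10.0.0.1");
        owns.put(2L, "10.0.0.1"); // vnodes: one endpoint owns many tokens
        owns.put(3L, "10.0.0.2");
        Set<String> known = new HashSet<>(java.util.Arrays.asList("10.0.0.1", "10.0.0.2", "10.0.0.3"));
        System.out.println(endpointsToRemove(owns, known)); // prints [10.0.0.3]
    }
}
```

The key point is that the "still owns tokens" set is computed once per pass rather than once per endpoint, so an endpoint with many vnode tokens is handled the same as one with a single token.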
[jira] [Created] (CASSANDRA-15098) Endpoints no longer owning tokens are not removed for vnode
Jay Zhuang created CASSANDRA-15098: -- Summary: Endpoints no longer owning tokens are not removed for vnode Key: CASSANDRA-15098 URL: https://issues.apache.org/jira/browse/CASSANDRA-15098 Project: Cassandra Issue Type: Bug Components: Cluster/Gossip Reporter: Jay Zhuang Assignee: Jay Zhuang The logic here to remove endpoints that no longer own tokens does not work for multiple tokens (vnode): https://github.com/apache/cassandra/blob/06209037ea56b5a2a49615a99f1542d6ea1b2947/src/java/org/apache/cassandra/service/StorageService.java#L2505 It is also very expensive to copy the token metadata for every check. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-15097) Avoid updating unchanged gossip state
[ https://issues.apache.org/jira/browse/CASSANDRA-15097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16824636#comment-16824636 ] Jay Zhuang commented on CASSANDRA-15097: Here is a patch to filter out states that are already up to date: | Branch | uTest | | [15097-3.0|https://github.com/cooldoger/cassandra/tree/15097-3.0] | [!https://circleci.com/gh/cooldoger/cassandra/tree/15097-3.0.svg?style=svg!|https://circleci.com/gh/cooldoger/cassandra/tree/15097-3.0] | | [15097-3.11|https://github.com/cooldoger/cassandra/tree/15097-3.11] | [!https://circleci.com/gh/cooldoger/cassandra/tree/15097-3.11.svg?style=svg!|https://circleci.com/gh/cooldoger/cassandra/tree/15097-3.11] | | [15097-trunk|https://github.com/cooldoger/cassandra/tree/15097-trunk] | [!https://circleci.com/gh/cooldoger/cassandra/tree/15097-trunk.svg?style=svg!|https://circleci.com/gh/cooldoger/cassandra/tree/15097-trunk] | > Avoid updating unchanged gossip state > - > > Key: CASSANDRA-15097 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15097 > Project: Cassandra > Issue Type: Bug > Components: Cluster/Gossip >Reporter: Jay Zhuang >Assignee: Jay Zhuang >Priority: Normal > > A node might receive unchanged gossip states: a state might be updated just > after the node sends a GOSSIP_SYN, so the reply contains a state that is already up > to date locally. If the heartbeat in the GOSSIP_ACK message is newer, the node will > unnecessarily re-apply the same state, which can be costly, e.g. applying a token > change. > This is very likely to happen on a large cluster when a node starts up, as the > first gossip message syncs all endpoints' tokens, which can take some time > (about 200 seconds in our case); during that time the node keeps gossiping with > other nodes and receiving the full token states, which causes lots of pending gossip tasks. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Created] (CASSANDRA-15097) Avoid updating unchanged gossip state
Jay Zhuang created CASSANDRA-15097: -- Summary: Avoid updating unchanged gossip state Key: CASSANDRA-15097 URL: https://issues.apache.org/jira/browse/CASSANDRA-15097 Project: Cassandra Issue Type: Bug Components: Cluster/Gossip Reporter: Jay Zhuang Assignee: Jay Zhuang A node might receive unchanged gossip states: a state might be updated just after the node sends a GOSSIP_SYN, so the reply contains a state that is already up to date locally. If the heartbeat in the GOSSIP_ACK message is newer, the node will unnecessarily re-apply the same state, which can be costly, e.g. applying a token change. This is very likely to happen on a large cluster when a node starts up, as the first gossip message syncs all endpoints' tokens, which can take some time (about 200 seconds in our case); during that time the node keeps gossiping with other nodes and receiving the full token states, which causes lots of pending gossip tasks. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-13849) GossipStage blocks because of race in ActiveRepairService
[ https://issues.apache.org/jira/browse/CASSANDRA-13849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jay Zhuang updated CASSANDRA-13849: --- Fix Version/s: (was: 3.11.x) (was: 3.0.x) 3.0.16 3.11.2 > GossipStage blocks because of race in ActiveRepairService > - > > Key: CASSANDRA-13849 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13849 > Project: Cassandra > Issue Type: Bug >Reporter: Tom van der Woerdt >Assignee: Sergey Lapukhov >Priority: Major > Labels: patch > Fix For: 3.0.16, 3.11.2, 4.0 > > Attachments: CAS-13849.patch, CAS-13849_2.patch, CAS-13849_3.patch > > > Bad luck caused a kernel panic in a cluster, and that took another node with > it because GossipStage stopped responding. > I think it's pretty obvious what's happening, here are the relevant excerpts > from the stack traces : > {noformat} > "Thread-24004" #393781 daemon prio=5 os_prio=0 tid=0x7efca9647400 > nid=0xe75c waiting on condition [0x7efaa47fe000] >java.lang.Thread.State: TIMED_WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0x00052b63a7e8> (a > java.util.concurrent.CountDownLatch$Sync) > at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328) > at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277) > at > org.apache.cassandra.service.ActiveRepairService.prepareForRepair(ActiveRepairService.java:332) > - locked <0x0002e6bc99f0> (a > org.apache.cassandra.service.ActiveRepairService) > at > org.apache.cassandra.repair.RepairRunnable.runMayThrow(RepairRunnable.java:211) > at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > > at > 
java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:79) > at > org.apache.cassandra.concurrent.NamedThreadFactory$$Lambda$3/1498438472.run(Unknown > Source) > at java.lang.Thread.run(Thread.java:748) > "GossipTasks:1" #367 daemon prio=5 os_prio=0 tid=0x7efc5e971000 > nid=0x700b waiting for monitor entry [0x7dfb839fe000] >java.lang.Thread.State: BLOCKED (on object monitor) > at > org.apache.cassandra.service.ActiveRepairService.removeParentRepairSession(ActiveRepairService.java:421) > - waiting to lock <0x0002e6bc99f0> (a > org.apache.cassandra.service.ActiveRepairService) > at > org.apache.cassandra.service.ActiveRepairService.convict(ActiveRepairService.java:776) > at > org.apache.cassandra.gms.FailureDetector.interpret(FailureDetector.java:306) > at org.apache.cassandra.gms.Gossiper.doStatusCheck(Gossiper.java:775) > > at > org.apache.cassandra.gms.Gossiper.access$800(Gossiper.java:67) > at org.apache.cassandra.gms.Gossiper$GossipTask.run(Gossiper.java:187) > at > org.apache.cassandra.concurrent.DebuggableScheduledThreadPoolExecutor$UncomplainingRunnable.run(DebuggableScheduledThreadPoolExecutor.java:118) > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at > org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:79) > at > 
org.apache.cassandra.concurrent.NamedThreadFactory$$Lambda$3/1498438472.run(Unknown > Source) > at java.lang.Thread.run(Thread.java:748) > "GossipStage:1" #320 daemon prio=5 os_prio=0 tid=0x7efc5b9f2c00 > nid=0x6fcd waiting for monitor entry [0x7e260186a000] >java.lang.Thread.State: BLOCKED (on object monitor) > at >
[jira] [Commented] (CASSANDRA-14526) dtest to validate Cassandra state post failed/successful bootstrap
[ https://issues.apache.org/jira/browse/CASSANDRA-14526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16743263#comment-16743263 ] Jay Zhuang commented on CASSANDRA-14526: The new tests passed locally (except: CASSANDRA-14984, not related to this change or CASSANDRA-14525). Committed as [{{e6f58cb}}|https://github.com/apache/cassandra-dtest/commit/e6f58cb33f7a09f273c5990d5d21c7b529ba80bf]. Thanks [~chovatia.jayd...@gmail.com]. > dtest to validate Cassandra state post failed/successful bootstrap > -- > > Key: CASSANDRA-14526 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14526 > Project: Cassandra > Issue Type: Sub-task > Components: Test/dtest >Reporter: Jaydeepkumar Chovatia >Assignee: Jaydeepkumar Chovatia >Priority: Major > Labels: dtest > > Please find dtest here: > || dtest || > | [patch > |https://github.com/apache/cassandra-dtest/compare/master...jaydeepkumar1984:14526-trunk]| -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-14526) dtest to validate Cassandra state post failed/successful bootstrap
[ https://issues.apache.org/jira/browse/CASSANDRA-14526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jay Zhuang updated CASSANDRA-14526: --- Resolution: Fixed Reviewer: Jay Zhuang Status: Resolved (was: Patch Available) > dtest to validate Cassandra state post failed/successful bootstrap > -- > > Key: CASSANDRA-14526 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14526 > Project: Cassandra > Issue Type: Sub-task > Components: Test/dtest >Reporter: Jaydeepkumar Chovatia >Assignee: Jaydeepkumar Chovatia >Priority: Major > Labels: dtest > > Please find dtest here: > || dtest || > | [patch > |https://github.com/apache/cassandra-dtest/compare/master...jaydeepkumar1984:14526-trunk]| -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Created] (CASSANDRA-14984) [dtest] 2 TestBootstrap tests failed for branch 2.2
Jay Zhuang created CASSANDRA-14984: -- Summary: [dtest] 2 TestBootstrap tests failed for branch 2.2 Key: CASSANDRA-14984 URL: https://issues.apache.org/jira/browse/CASSANDRA-14984 Project: Cassandra Issue Type: Bug Components: Test/dtest Reporter: Jay Zhuang Failed tests: {noformat} test_decommissioned_wiped_node_can_join test_decommissioned_wiped_node_can_gossip_to_single_seed {noformat} Error: {noformat} ... # Decommision the new node and kill it logger.debug("Decommissioning & stopping node2") > node2.decommission() ... def handle_external_tool_process(process, cmd_args): out, err = process.communicate() if (out is not None) and isinstance(out, bytes): out = out.decode() if (err is not None) and isinstance(err, bytes): err = err.decode() rc = process.returncode if rc != 0: > raise ToolError(cmd_args, rc, out, err) E ccmlib.node.ToolError: Subprocess ['nodetool', '-h', 'localhost', '-p', '7200', 'decommission'] exited with non-zero status; exit status: 2; E stderr: error: Thread signal failed E -- StackTrace -- E java.io.IOException: Thread signal failed {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-14526) dtest to validate Cassandra state post failed/successful bootstrap
[ https://issues.apache.org/jira/browse/CASSANDRA-14526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16742385#comment-16742385 ] Jay Zhuang commented on CASSANDRA-14526: Thanks [~chovatia.jayd...@gmail.com], the change looks good to me. Kick off a build: || Branch || dTest || | 2.2 | [!https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/672/badge/icon!|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/672/] | | 3.0 | [!https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/671/badge/icon!|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/671/] | | 3.11 | [!https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/670/badge/icon!|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/670/] | | trunk | [!https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/669/badge/icon!|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/669/] | > dtest to validate Cassandra state post failed/successful bootstrap > -- > > Key: CASSANDRA-14526 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14526 > Project: Cassandra > Issue Type: Sub-task > Components: Test/dtest >Reporter: Jaydeepkumar Chovatia >Assignee: Jaydeepkumar Chovatia >Priority: Major > Labels: dtest > > Please find dtest here: > || dtest || > | [patch > |https://github.com/apache/cassandra-dtest/compare/master...jaydeepkumar1984:14526-trunk]| -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-14526) dtest to validate Cassandra state post failed/successful bootstrap

[ https://issues.apache.org/jira/browse/CASSANDRA-14526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16740064#comment-16740064 ]

Jay Zhuang commented on CASSANDRA-14526:
----------------------------------------

Hi [~chovatia.jayd...@gmail.com], the test {{secondary_indexes_test.py.TestPreJoinCallback.test_resume}} is still not stable. Only 2 out of 10 runs passed in my tests:
{noformat}
$ pytest --count=10 -p no:flaky --cassandra-dir=/Users/zjay/ws/cassandra secondary_indexes_test.py::TestPreJoinCallback::test_resume
...
secondary_indexes_test.py:1175: in _base_test
    joinFn(cluster, tokens[1])
secondary_indexes_test.py:1210: in resume
    node2.watch_log_for('Starting listening for CQL clients')
...
        if start + timeout < time.time():
>           raise TimeoutError(time.strftime("%d %b %Y %H:%M:%S", time.gmtime()) + " [" + self.name + "] Missing: " + str([e.pattern for e in tofind]) + ":\n" + reads[:50] + ".\nSee {} for remainder".format(filename))
E           ccmlib.node.TimeoutError: 11 Jan 2019 05:43:25 [node2] Missing: ['Starting listening for CQL clients']:
E           INFO  [main] 2019-01-10 21:33:18,285 YamlConfigura.
E           See system.log for remainder
...
= 8 failed, 2 passed, 2 error in 5153.09 seconds ==
{noformat}
Would you please take a look?
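The `pytest --count=10 -p no:flaky` invocation above repeats one test many times to measure flakiness instead of letting the flaky plugin retry-and-hide failures. The tallying it performs can be sketched as a plain loop (the `run_repeated` helper below is illustrative, not part of pytest):

```python
def run_repeated(test_fn, count=10):
    """Run a test callable `count` times and tally pass/fail counts,
    similar in spirit to `pytest --count=N -p no:flaky` used above."""
    results = {"passed": 0, "failed": 0}
    for _ in range(count):
        try:
            test_fn()
            results["passed"] += 1
        except Exception:
            # Any exception (assertion or runtime) counts as a failed run
            results["failed"] += 1
    return results
```

A test that only passes a small fraction of such runs, as in the "8 failed, 2 passed" summary above, is unstable rather than merely unlucky.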
[jira] [Commented] (CASSANDRA-14525) streaming failure during bootstrap makes new node into inconsistent state
[ https://issues.apache.org/jira/browse/CASSANDRA-14525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16739100#comment-16739100 ]

Jay Zhuang commented on CASSANDRA-14525:
----------------------------------------

I'm sorry for not committing the dtest change. Will do that ASAP (I'm still trying to confirm a few flaky tests are not introduced by the changes).

> streaming failure during bootstrap makes new node into inconsistent state
> --------------------------------------------------------------------------
>
>                 Key: CASSANDRA-14525
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14525
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Legacy/Core
>            Reporter: Jaydeepkumar Chovatia
>            Assignee: Jaydeepkumar Chovatia
>            Priority: Major
>             Fix For: 2.2.14, 3.0.18, 3.11.4, 4.0
>
> If bootstrap fails for newly joining node (most common reason is due to streaming failure) then Cassandra state remains in {{joining}} state which is fine but Cassandra also enables Native transport which makes overall state inconsistent. This further creates NullPointer exception if auth is enabled on the new node, please find reproducible steps here:
> For example if bootstrap fails due to streaming errors like
> {quote}
> java.util.concurrent.ExecutionException: org.apache.cassandra.streaming.StreamException: Stream failed
>  at com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:299) ~[guava-18.0.jar:na]
>  at com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:286) ~[guava-18.0.jar:na]
>  at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116) ~[guava-18.0.jar:na]
>  at org.apache.cassandra.service.StorageService.bootstrap(StorageService.java:1256) [apache-cassandra-3.0.16.jar:3.0.16]
>  at org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:894) [apache-cassandra-3.0.16.jar:3.0.16]
>  at org.apache.cassandra.service.StorageService.initServer(StorageService.java:660) [apache-cassandra-3.0.16.jar:3.0.16]
>  at org.apache.cassandra.service.StorageService.initServer(StorageService.java:573) [apache-cassandra-3.0.16.jar:3.0.16]
>  at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:330) [apache-cassandra-3.0.16.jar:3.0.16]
>  at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:567) [apache-cassandra-3.0.16.jar:3.0.16]
>  at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:695) [apache-cassandra-3.0.16.jar:3.0.16]
> Caused by: org.apache.cassandra.streaming.StreamException: Stream failed
>  at org.apache.cassandra.streaming.management.StreamEventJMXNotifier.onFailure(StreamEventJMXNotifier.java:85) ~[apache-cassandra-3.0.16.jar:3.0.16]
>  at com.google.common.util.concurrent.Futures$6.run(Futures.java:1310) ~[guava-18.0.jar:na]
>  at com.google.common.util.concurrent.MoreExecutors$DirectExecutor.execute(MoreExecutors.java:457) ~[guava-18.0.jar:na]
>  at com.google.common.util.concurrent.ExecutionList.executeListener(ExecutionList.java:156) ~[guava-18.0.jar:na]
>  at com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:145) ~[guava-18.0.jar:na]
>  at com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:202) ~[guava-18.0.jar:na]
>  at org.apache.cassandra.streaming.StreamResultFuture.maybeComplete(StreamResultFuture.java:211) ~[apache-cassandra-3.0.16.jar:3.0.16]
>  at org.apache.cassandra.streaming.StreamResultFuture.handleSessionComplete(StreamResultFuture.java:187) ~[apache-cassandra-3.0.16.jar:3.0.16]
>  at org.apache.cassandra.streaming.StreamSession.closeSession(StreamSession.java:440) ~[apache-cassandra-3.0.16.jar:3.0.16]
>  at org.apache.cassandra.streaming.StreamSession.onError(StreamSession.java:540) ~[apache-cassandra-3.0.16.jar:3.0.16]
>  at org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:307) ~[apache-cassandra-3.0.16.jar:3.0.16]
>  at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:79) ~[apache-cassandra-3.0.16.jar:3.0.16]
>  at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_121]
> {quote}
> then variable [StorageService.java::dataAvailable|https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/service/StorageService.java#L892] will be {{false}}. Since {{dataAvailable}} is {{false}} hence it will not call [StorageService.java::finishJoiningRing|https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/service/StorageService.java#L933] and as a result [StorageService.java::doAuthSetup|https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/service/StorageService.java#L999] will not be invoked.
[jira] [Commented] (CASSANDRA-14526) dtest to validate Cassandra state post failed/successful bootstrap

[ https://issues.apache.org/jira/browse/CASSANDRA-14526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16732692#comment-16732692 ]

Jay Zhuang commented on CASSANDRA-14526:
----------------------------------------

I still see the same failure after updating the branch:
{noformat}
        node3.start(jvm_args=["-Dcassandra.write_survey=true", "-Dcassandra.ring_delay_ms=5000"], wait_other_notice=True)
        self.assert_log_had_msg(node3, 'Some data streaming failed', timeout=30)
>       self.assert_log_had_msg(node3, 'Not starting client transports as bootstrap has not completed', timeout=30)

bootstrap_test.py:767:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = , node = , msg = 'Not starting client transports as bootstrap has not completed', timeout = 30, kwargs = {}

    def assert_log_had_msg(self, node, msg, timeout=600, **kwargs):
        """
        Wrapper for ccmlib.node.Node#watch_log_for to cause an assertion failure when a log message isn't
        found within the timeout.
        :param node: Node which logs we should watch
        :param msg: String message we expect to see in the logs.
        :param timeout: Seconds to wait for msg to appear
        """
        try:
            node.watch_log_for(msg, timeout=timeout, **kwargs)
        except TimeoutError:
>           pytest.fail("Log message was not seen within timeout:\n{0}".format(msg))
E           Failed: Log message was not seen within timeout:
E           Not starting client transports as bootstrap has not completed

dtest.py:266: Failed
{noformat}
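The `assert_log_had_msg` wrapper in the traceback above converts a log-watch timeout into a test failure. A self-contained sketch of the underlying technique, polling a log file for a pattern until a deadline (the function names mirror the dtest helpers, but this is a simplified stand-in, not the actual ccmlib implementation):

```python
import re
import time

def watch_log_for(path, pattern, timeout=600, poll_interval=0.1):
    """Poll a log file until `pattern` appears; raise TimeoutError otherwise.
    Simplified stand-in for ccmlib.node.Node#watch_log_for."""
    deadline = time.time() + timeout
    regex = re.compile(pattern)
    while time.time() < deadline:
        try:
            with open(path) as f:
                if any(regex.search(line) for line in f):
                    return True
        except FileNotFoundError:
            pass  # the log file may not exist yet while the node is starting
        time.sleep(poll_interval)
    raise TimeoutError("Missing: [{!r}]".format(pattern))

def assert_log_had_msg(path, msg, timeout=600):
    """Turn the timeout into an assertion failure, like the dtest wrapper above."""
    try:
        watch_log_for(path, msg, timeout=timeout)
    except TimeoutError:
        raise AssertionError("Log message was not seen within timeout:\n{0}".format(msg))
```

Re-reading the file on every poll keeps the sketch simple; the real helper tracks a file offset so it only scans new log lines.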
[jira] [Comment Edited] (CASSANDRA-14526) dtest to validate Cassandra state post failed/successful bootstrap

[ https://issues.apache.org/jira/browse/CASSANDRA-14526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16731753#comment-16731753 ]

Jay Zhuang edited comment on CASSANDRA-14526 at 1/2/19 4:08 AM:
----------------------------------------------------------------

Seems the new test is failing:
{noformat}
$ pytest --cassandra-dir=/Users/zjay/ws/cassandra bootstrap_test.py::TestBootstrap::test_bootstrap_binary_disabled
...
>       self.assert_log_had_msg(node3, 'Not starting client transports as bootstrap has not completed', timeout=30)
...
E           ccmlib.node.TimeoutError: 02 Jan 2019 03:45:28 [node3] Missing: ['Not starting client transports as bootstrap has not completed']:
E           INFO  [main] 2019-01-01 19:44:46,816 Config.java:4.
E           See system.log for remainder
...
{noformat}

was (Author: jay.zhuang):
Seems the test is failing for branch 2.2:
{noformat}
$ pytest --cassandra-dir=/Users/zjay/ws/cassandra bootstrap_test.py::TestBootstrap::test_bootstrap_binary_disabled
...
E           ccmlib.node.TimeoutError: 02 Jan 2019 03:45:28 [node3] Missing: ['Not starting client transports as bootstrap has not completed']:
E           INFO  [main] 2019-01-01 19:44:46,816 Config.java:4.
E           See system.log for remainder
...
{noformat}
[jira] [Commented] (CASSANDRA-14526) dtest to validate Cassandra state post failed/successful bootstrap

[ https://issues.apache.org/jira/browse/CASSANDRA-14526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16731753#comment-16731753 ]

Jay Zhuang commented on CASSANDRA-14526:
----------------------------------------

Seems the test is failing for branch 2.2:
{noformat}
$ pytest --cassandra-dir=/Users/zjay/ws/cassandra bootstrap_test.py::TestBootstrap::test_bootstrap_binary_disabled
...
E           ccmlib.node.TimeoutError: 02 Jan 2019 03:45:28 [node3] Missing: ['Not starting client transports as bootstrap has not completed']:
E           INFO  [main] 2019-01-01 19:44:46,816 Config.java:4.
E           See system.log for remainder
...
{noformat}
[jira] [Updated] (CASSANDRA-14525) streaming failure during bootstrap makes new node into inconsistent state
[ https://issues.apache.org/jira/browse/CASSANDRA-14525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jay Zhuang updated CASSANDRA-14525:
-----------------------------------
    Fix Version/s: (was: 4.x)
                   3.11.4
[jira] [Updated] (CASSANDRA-14525) streaming failure during bootstrap makes new node into inconsistent state
[ https://issues.apache.org/jira/browse/CASSANDRA-14525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jay Zhuang updated CASSANDRA-14525:
-----------------------------------
       Resolution: Fixed
    Fix Version/s: (was: 3.0.x)
                   (was: 2.2.x)
                   4.x
                   3.0.18
                   2.2.14
           Status: Resolved  (was: Ready to Commit)
[jira] [Commented] (CASSANDRA-14525) streaming failure during bootstrap makes new node into inconsistent state
[ https://issues.apache.org/jira/browse/CASSANDRA-14525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16731200#comment-16731200 ]

Jay Zhuang commented on CASSANDRA-14525:
----------------------------------------

Thanks [~chovatia.jayd...@gmail.com] and [~KurtG]. Committed as [{{a6196a3}}|https://github.com/apache/cassandra/commit/a6196a3a79b67dc6577747e591456328e57c314f].
[jira] [Resolved] (CASSANDRA-14946) DistributedReadWritePathTest fails in circleci

[ https://issues.apache.org/jira/browse/CASSANDRA-14946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jay Zhuang resolved CASSANDRA-14946.
------------------------------------
    Resolution: Duplicate

> DistributedReadWritePathTest fails in circleci
> ----------------------------------------------
>
>                 Key: CASSANDRA-14946
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14946
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Testing
>            Reporter: Jay Zhuang
>            Priority: Major
>
> {{org/apache/cassandra/distributed/DistributedReadWritePathTest}} test fails in circleci:
> {noformat}
>     [junit] Testcase: org.apache.cassandra.distributed.DistributedReadWritePathTest:coordinatorWrite: Caused an ERROR
>     [junit] Forked Java VM exited abnormally. Please note the time in the report does not reflect the time until the VM exit.
>     [junit] junit.framework.AssertionFailedError: Forked Java VM exited abnormally. Please note the time in the report does not reflect the time until the VM exit.
>     [junit] 	at java.util.Vector.forEach(Vector.java:1275)
>     [junit] 	at java.util.Vector.forEach(Vector.java:1275)
>     [junit] 	at java.lang.Thread.run(Thread.java:748)
> {noformat}
> The test works locally, seems circleci container doesn't have enough memory.
[jira] [Commented] (CASSANDRA-14946) DistributedReadWritePathTest fails in circleci

[ https://issues.apache.org/jira/browse/CASSANDRA-14946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16731143#comment-16731143 ]

Jay Zhuang commented on CASSANDRA-14946:
----------------------------------------

Seems it's a duplicate of CASSANDRA-14922
[jira] [Commented] (CASSANDRA-14525) streaming failure during bootstrap makes new node into inconsistent state
[ https://issues.apache.org/jira/browse/CASSANDRA-14525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16730522#comment-16730522 ]

Jay Zhuang commented on CASSANDRA-14525:
----------------------------------------

Hi, just a question: {{SystemKeyspace.bootstrapComplete()}} is checked here:
[https://github.com/apache/cassandra/commit/9c3fb65e697d810321936e06504de4b2f7cf633f#diff-b76a607445d53f18a98c9df14323c7ddR392]
But not here:
[https://github.com/apache/cassandra/commit/9c3fb65e697d810321936e06504de4b2f7cf633f#diff-b76a607445d53f18a98c9df14323c7ddR351]
Is that expected?
[jira] [Commented] (CASSANDRA-14525) streaming failure during bootstrap makes new node into inconsistent state
[ https://issues.apache.org/jira/browse/CASSANDRA-14525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16730460#comment-16730460 ] Jay Zhuang commented on CASSANDRA-14525: The uTest failure is because of CASSANDRA-14946. The check conditions are hard to read; how about switching from:
{noformat}
// We only start transports if bootstrap has completed and we're not in survey mode, OR if we are in
// survey mode and streaming has completed but we're not using auth.
// OR if we have not joined the ring yet.
if (StorageService.instance.hasJoined() &&
    ((!StorageService.instance.isSurveyMode() && !SystemKeyspace.bootstrapComplete()) ||
     (StorageService.instance.isSurveyMode() && StorageService.instance.isBootstrapMode())))
{
    logger.info("Not starting client transports as bootstrap has not completed");
    return;
}
else if (StorageService.instance.hasJoined() && StorageService.instance.isSurveyMode() &&
         DatabaseDescriptor.getAuthenticator().requireAuthentication())
{
    // Auth isn't initialised until we join the ring, so if we're in survey mode auth will always fail.
    logger.info("Not starting client transports as write_survey mode and authentication is enabled");
    return;
}
{noformat}
to:
{noformat}
// Do not start the transports if we have already joined the ring, AND
// - we are in survey mode and streaming has not completed or auth is enabled, OR
// - we are not in survey mode and bootstrap has not completed
if (StorageService.instance.hasJoined())
{
    if (StorageService.instance.isSurveyMode())
    {
        if (StorageService.instance.isBootstrapMode() || DatabaseDescriptor.getAuthenticator().requireAuthentication())
        {
            logger.info("Not starting client transports in write_survey mode as it's bootstrapping or auth is enabled");
            return;
        }
    }
    else
    {
        if (!SystemKeyspace.bootstrapComplete())
        {
            logger.info("Not starting client transports as bootstrap has not completed");
            return;
        }
    }
}
{noformat}
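The refactored check above boils down to a small decision function over five flags. The following is only an illustrative sketch for review purposes; the class and method names here are hypothetical and not Cassandra code, but the logic mirrors the proposed nested version:

```java
// Hypothetical condensation of the transport-start check discussed above.
// Given the five flags involved, decide whether client transports may start.
public class TransportStartSketch
{
    public static boolean shouldStartTransports(boolean hasJoined,
                                                boolean surveyMode,
                                                boolean bootstrapping,
                                                boolean authRequired,
                                                boolean bootstrapComplete)
    {
        if (!hasJoined)
            return true; // not joined the ring yet: transports may start
        if (surveyMode)
            // write_survey mode: only start if streaming finished and auth is off
            return !bootstrapping && !authRequired;
        // normal mode: only start after bootstrap has completed
        return bootstrapComplete;
    }
}
```

Walking this function over all flag combinations makes it straightforward to confirm that the original flat condition and the proposed nested one agree.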
[jira] [Created] (CASSANDRA-14946) DistributedReadWritePathTest fails in circleci
Jay Zhuang created CASSANDRA-14946: -- Summary: DistributedReadWritePathTest fails in circleci Key: CASSANDRA-14946 URL: https://issues.apache.org/jira/browse/CASSANDRA-14946 Project: Cassandra Issue Type: Bug Components: Testing Reporter: Jay Zhuang {{org/apache/cassandra/distributed/DistributedReadWritePathTest}} test fails in circleci:
{noformat}
[junit] Testcase: org.apache.cassandra.distributed.DistributedReadWritePathTest:coordinatorWrite: Caused an ERROR
[junit] Forked Java VM exited abnormally. Please note the time in the report does not reflect the time until the VM exit.
[junit] junit.framework.AssertionFailedError: Forked Java VM exited abnormally. Please note the time in the report does not reflect the time until the VM exit.
[junit] at java.util.Vector.forEach(Vector.java:1275)
[junit] at java.util.Vector.forEach(Vector.java:1275)
[junit] at java.lang.Thread.run(Thread.java:748)
{noformat}
The test works locally; it seems the circleci container doesn't have enough memory. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-14525) streaming failure during bootstrap makes new node into inconsistent state
[ https://issues.apache.org/jira/browse/CASSANDRA-14525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16727118#comment-16727118 ] Jay Zhuang commented on CASSANDRA-14525: Rebased the code and started the tests: | Branch | uTest | dTest | | [14525-2.2|https://github.com/cooldoger/cassandra/tree/14525-2.2] | [!https://circleci.com/gh/cooldoger/cassandra/tree/14525-2.2.svg?style=svg!|https://circleci.com/gh/cooldoger/cassandra/tree/14525-2.2] | [!https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/664/badge/icon!|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/664/] | | [14525-3.0|https://github.com/cooldoger/cassandra/tree/14525-3.0] | [!https://circleci.com/gh/cooldoger/cassandra/tree/14525-3.0.svg?style=svg!|https://circleci.com/gh/cooldoger/cassandra/tree/14525-3.0] | [!https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/665/badge/icon!|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/665/] | | [14525-3.11|https://github.com/cooldoger/cassandra/tree/14525-3.11] | [!https://circleci.com/gh/cooldoger/cassandra/tree/14525-3.11.svg?style=svg!|https://circleci.com/gh/cooldoger/cassandra/tree/14525-3.11] | [!https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/666/badge/icon!|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/666/] | | [14525-trunk|https://github.com/cooldoger/cassandra/tree/14525-trunk] | [!https://circleci.com/gh/cooldoger/cassandra/tree/14525-trunk.svg?style=svg!|https://circleci.com/gh/cooldoger/cassandra/tree/14525-trunk] | [!https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/667/badge/icon!|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/667/] | > streaming failure during bootstrap makes new node into inconsistent state > - > > Key: CASSANDRA-14525 > URL: 
https://issues.apache.org/jira/browse/CASSANDRA-14525
[jira] [Updated] (CASSANDRA-14616) cassandra-stress write hangs with default options
[ https://issues.apache.org/jira/browse/CASSANDRA-14616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jay Zhuang updated CASSANDRA-14616: --- Resolution: Fixed Fix Version/s: 4.0 3.11.4 3.0.18 Status: Resolved (was: Ready to Commit) Thank you [~Stefania] for the review. Committed as [{{bbf7dac}}|https://github.com/apache/cassandra/commit/bbf7dac87cdc41bf8e138a99f630e7a827ad0d98]. The dTest is committed as [{{325ef3f}}|https://github.com/apache/cassandra-dtest/commit/325ef3fa063252e6dad88473613abbd829e8c24d]. > cassandra-stress write hangs with default options > - > > Key: CASSANDRA-14616 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14616 > Project: Cassandra > Issue Type: Bug > Components: Stress >Reporter: Chris Lohfink >Assignee: Jay Zhuang >Priority: Major > Fix For: 3.0.18, 3.11.4, 4.0 > > > Cassandra stress sits there for incredibly long time after connecting to JMX. > To reproduce {code}./tools/bin/cassandra-stress write{code} > If you give it a -n its not as bad which is why dtests etc dont seem to be > impacted. Does not occur in 3.0 branch but does in 3.11 and trunk -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Assigned] (CASSANDRA-14616) cassandra-stress write hangs with default options
[ https://issues.apache.org/jira/browse/CASSANDRA-14616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jay Zhuang reassigned CASSANDRA-14616: -- Assignee: Jay Zhuang (was: Jeremy) -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-14616) cassandra-stress write hangs with default options
[ https://issues.apache.org/jira/browse/CASSANDRA-14616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16687347#comment-16687347 ] Jay Zhuang commented on CASSANDRA-14616: The failed utest is because of CASSANDRA-14891. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Created] (CASSANDRA-14891) [utest] LegacySSTableTest.testInaccurateSSTableMinMax test failed
Jay Zhuang created CASSANDRA-14891: -- Summary: [utest] LegacySSTableTest.testInaccurateSSTableMinMax test failed Key: CASSANDRA-14891 URL: https://issues.apache.org/jira/browse/CASSANDRA-14891 Project: Cassandra Issue Type: Bug Components: Testing Reporter: Jay Zhuang {noformat} junit.framework.AssertionFailedError at org.apache.cassandra.db.SinglePartitionSliceCommandTest.getUnfilteredsFromSinglePartition(SinglePartitionSliceCommandTest.java:404) at org.apache.cassandra.io.sstable.LegacySSTableTest.ttestInaccurateSSTableMinMax(LegacySSTableTest.java:323) {noformat} Related to CASSANDRA-14861 -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Assigned] (CASSANDRA-14616) cassandra-stress write hangs with default options
[ https://issues.apache.org/jira/browse/CASSANDRA-14616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jay Zhuang reassigned CASSANDRA-14616: -- Assignee: Jeremy Quinn -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-14616) cassandra-stress write hangs with default options
[ https://issues.apache.org/jira/browse/CASSANDRA-14616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jay Zhuang updated CASSANDRA-14616: --- Reproduced In: 3.11.0, 4.0 Status: Patch Available (was: Open) -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Assigned] (CASSANDRA-14616) cassandra-stress write hangs with default options
[ https://issues.apache.org/jira/browse/CASSANDRA-14616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jay Zhuang reassigned CASSANDRA-14616: -- Assignee: Jeremy (was: Jeremy Quinn) -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-14616) cassandra-stress write hangs with default options
[ https://issues.apache.org/jira/browse/CASSANDRA-14616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16687300#comment-16687300 ] Jay Zhuang commented on CASSANDRA-14616: Hi [~Yarnspinner], the fix looks good. I had a similar fix which re-enables {{warm-up}} to 50k as before ([{{StressAction.java}}|https://github.com/apache/cassandra/commit/6a1b1f26b7174e8c9bf86a96514ab626ce2a4117#diff-fd2f2d2364937fcb1c0d73c8314f1418L90])
|Branch|uTest|dTest|
|[14890-3.0|https://github.com/cooldoger/cassandra/tree/14890-3.0]|[!https://circleci.com/gh/cooldoger/cassandra/tree/14890-3.0.svg?style=svg!|https://circleci.com/gh/cooldoger/cassandra/tree/14890-3.0]|[!https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/661/badge/icon!|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/661/]|
|[14890-3.11|https://github.com/cooldoger/cassandra/tree/14890-3.11]|[!https://circleci.com/gh/cooldoger/cassandra/tree/14890-3.11.svg?style=svg!|https://circleci.com/gh/cooldoger/cassandra/tree/14890-3.11]|[!https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/662/badge/icon!|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/662/]|
|[14890-trunk|https://github.com/cooldoger/cassandra/tree/14890-trunk]|[!https://circleci.com/gh/cooldoger/cassandra/tree/14890-trunk.svg?style=svg!|https://circleci.com/gh/cooldoger/cassandra/tree/14890-trunk]|[!https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/663/badge/icon!|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/663/]|
Here is a dTest to reproduce the problem: |[14890|https://github.com/cooldoger/cassandra-dtest/tree/14890]| -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-14890) cassandra-stress hang for 200 seconds if `n` is not specified
[ https://issues.apache.org/jira/browse/CASSANDRA-14890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jay Zhuang updated CASSANDRA-14890: --- Resolution: Duplicate Status: Resolved (was: Patch Available) Resolved as duplicate of CASSANDRA-14616. > cassandra-stress hang for 200 seconds if `n` is not specified > - > > Key: CASSANDRA-14890 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14890 > Project: Cassandra > Issue Type: Bug > Components: Stress >Reporter: Jay Zhuang >Assignee: Jay Zhuang >Priority: Minor > > if parameter {{n}} is not specified, cassandra-stress will hang (wait) for > 200 seconds between warm-up and sending traffic. > For example, the following command will hang 200 seconds before sending the > traffic: > {noformat} > $ ./tools/bin/cassandra-stress write > ... > Created keyspaces. Sleeping 1s for propagation. > Sleeping 2s... > Warming up WRITE with 0 iterations... > Failed to connect over JMX; not collecting these stats > {noformat} > It's waiting for this: > [https://github.com/apache/cassandra/blob/06209037ea56b5a2a49615a99f1542d6ea1b2947/tools/stress/src/org/apache/cassandra/stress/util/Uncertainty.java#L72] > As there's no warm-up traffic (CASSANDRA-13773), it will wait until: > {noformat} > (measurements >= waiter.maxMeasurements) > {noformat} > {{maxMeasurements}} is 200 by default: > [https://github.com/apache/cassandra/blob/06209037ea56b5a2a49615a99f1542d6ea1b2947/tools/stress/src/org/apache/cassandra/stress/settings/SettingsCommand.java#L153] -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
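The 200-second hang described in the issue above can be modeled with a small sketch: the waiter accepts roughly one uncertainty sample per second and returns only when a sample beats the target or {{maxMeasurements}} rounds have elapsed. The class and method names below are hypothetical simplifications, not the real {{Uncertainty}} implementation:

```java
// Simplified, hypothetical model of the wait loop behind the ~200s hang:
// one uncertainty sample arrives per ~1s round; the waiter returns when a
// sample beats the target or after maxMeasurements rounds, whichever first.
public class UncertaintyWaitSketch
{
    public static int roundsUntilDone(double targetUncertainty, int maxMeasurements, double[] samples)
    {
        int measurements = 0;
        for (double uncertainty : samples)
        {
            measurements++;
            // mirrors the quoted condition: (measurements >= waiter.maxMeasurements)
            if (uncertainty < targetUncertainty || measurements >= maxMeasurements)
                return measurements;
        }
        return measurements;
    }

    public static void main(String[] args)
    {
        // With 0 warm-up iterations the uncertainty never improves, so the
        // loop runs the full 200 rounds -- roughly 200 seconds at 1s/round.
        double[] noTraffic = new double[200];
        java.util.Arrays.fill(noTraffic, Double.POSITIVE_INFINITY);
        System.out.println(roundsUntilDone(0.02, 200, noTraffic));
    }
}
```

This is why re-enabling the warm-up (so real measurements arrive) or passing {{-n}} (which skips the uncertainty waiter entirely) both avoid the hang.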
[jira] [Updated] (CASSANDRA-14890) cassandra-stress hang for 200 seconds if `n` is not specified
[ https://issues.apache.org/jira/browse/CASSANDRA-14890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jay Zhuang updated CASSANDRA-14890: --- Summary: cassandra-stress hang for 200 seconds if `n` is not specified (was: cassandra-stress hang for 200 seconds if n is not specified) -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-14890) cassandra-stress hang for 200 seconds if n is not specified
[ https://issues.apache.org/jira/browse/CASSANDRA-14890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16687254#comment-16687254 ] Jay Zhuang commented on CASSANDRA-14890: Here is a patch to re-enable {{warm-up}} if `n` is not set ([{{StressAction.java}}|https://github.com/apache/cassandra/commit/6a1b1f26b7174e8c9bf86a96514ab626ce2a4117#diff-fd2f2d2364937fcb1c0d73c8314f1418L90]): | Branch | uTest | dTest | | [14890-3.0|https://github.com/cooldoger/cassandra/tree/14890-3.0] | [!https://circleci.com/gh/cooldoger/cassandra/tree/14890-3.0.svg?style=svg!|https://circleci.com/gh/cooldoger/cassandra/tree/14890-3.0] | [!https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/661/badge/icon!|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/661/] | | [14890-3.11|https://github.com/cooldoger/cassandra/tree/14890-3.11] | [!https://circleci.com/gh/cooldoger/cassandra/tree/14890-3.11.svg?style=svg!|https://circleci.com/gh/cooldoger/cassandra/tree/14890-3.11] | [!https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/662/badge/icon!|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/662/] | | [14890-trunk|https://github.com/cooldoger/cassandra/tree/14890-trunk] | [!https://circleci.com/gh/cooldoger/cassandra/tree/14890-trunk.svg?style=svg!|https://circleci.com/gh/cooldoger/cassandra/tree/14890-trunk] | [!https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/663/badge/icon!|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/663/] | Here is the dTest to reproduce the problem: |[14890|https://github.com/cooldoger/cassandra-dtest/tree/14890]| [~Stefania] would you please review? 
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Comment Edited] (CASSANDRA-14890) cassandra-stress hang for 200 seconds if n is not specified
[ https://issues.apache.org/jira/browse/CASSANDRA-14890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16687254#comment-16687254 ] Jay Zhuang edited comment on CASSANDRA-14890 at 11/14/18 10:51 PM: --- Here is a patch to re-enable {{warm-up}} if `n` is not set ([{{StressAction.java}}|https://github.com/apache/cassandra/commit/6a1b1f26b7174e8c9bf86a96514ab626ce2a4117#diff-fd2f2d2364937fcb1c0d73c8314f1418L90]): |Branch|uTest|dTest| |[14890-3.0|https://github.com/cooldoger/cassandra/tree/14890-3.0]|[!https://circleci.com/gh/cooldoger/cassandra/tree/14890-3.0.svg?style=svg!|https://circleci.com/gh/cooldoger/cassandra/tree/14890-3.0]|[!https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/661/badge/icon!|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/661/]| |[14890-3.11|https://github.com/cooldoger/cassandra/tree/14890-3.11]|[!https://circleci.com/gh/cooldoger/cassandra/tree/14890-3.11.svg?style=svg!|https://circleci.com/gh/cooldoger/cassandra/tree/14890-3.11]|[!https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/662/badge/icon!|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/662/]| |[14890-trunk|https://github.com/cooldoger/cassandra/tree/14890-trunk]|[!https://circleci.com/gh/cooldoger/cassandra/tree/14890-trunk.svg?style=svg!|https://circleci.com/gh/cooldoger/cassandra/tree/14890-trunk]|[!https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/663/badge/icon!|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/663/]| Here is a dTest to reproduce the problem: |[14890|https://github.com/cooldoger/cassandra-dtest/tree/14890]| [~Stefania] would you please review? 
> cassandra-stress hang for 200 seconds if n is not specified > --- > > Key: CASSANDRA-14890 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14890 > Project: Cassandra > Issue Type: Bug > Components: Stress >Reporter: Jay Zhuang >Assignee: Jay Zhuang >Priority: Minor > > if parameter {{n}} is not specified, cassandra-stress will hang (wait) for > 200 seconds between warm-up and sending traffic. > For example, the following command will hang 200 seconds before sending the > traffic: > {noformat} > $ ./tools/bin/cassandra-stress write > ... > Created keyspaces. Sleeping 1s for propagation. > Sleeping 2s... > Warming up WRITE with 0 iterations... > Failed to connect over JMX; not collecting these stats > {noformat} > It's waiting for this: > [https://github.com/apache/cassandra/blob/06209037ea56b5a2a49615a99f1542d6ea1b2947/tools/stress/src/org/apache/cassandra/stress/util/Uncertainty.java#L72] > As there's no warm-up traffic (CASSANDRA-13773), it will wait until: > {noformat} > (measurements >= waiter.maxMeasurements) > {noformat} > {{maxMeasurements}} is 200 by default: > [https://github.com/apache/cassandra/blob/06209037ea56b5a2a49615a99f1542d6ea1b2947/tools/stress/src/org/apache/cassandra/stress/settings/SettingsCommand.java#L153] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (CASSANDRA-14890) cassandra-stress hangs for 200 seconds if n is not specified
[ https://issues.apache.org/jira/browse/CASSANDRA-14890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jay Zhuang updated CASSANDRA-14890: --- Status: Patch Available (was: Open) > cassandra-stress hang for 200 seconds if n is not specified > --- > > Key: CASSANDRA-14890 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14890 > Project: Cassandra > Issue Type: Bug > Components: Stress >Reporter: Jay Zhuang >Assignee: Jay Zhuang >Priority: Minor > > if parameter {{n}} is not specified, cassandra-stress will hang (wait) for > 200 seconds between warm-up and sending traffic. > For example, the following command will hang 200 seconds before sending the > traffic: > {noformat} > $ ./tools/bin/cassandra-stress write > ... > Created keyspaces. Sleeping 1s for propagation. > Sleeping 2s... > Warming up WRITE with 0 iterations... > Failed to connect over JMX; not collecting these stats > {noformat} > It's waiting for this: > [https://github.com/apache/cassandra/blob/06209037ea56b5a2a49615a99f1542d6ea1b2947/tools/stress/src/org/apache/cassandra/stress/util/Uncertainty.java#L72] > As there's no warm-up traffic (CASSANDRA-13773), it will wait until: > {noformat} > (measurements >= waiter.maxMeasurements) > {noformat} > {{maxMeasurements}} is 200 by default: > [https://github.com/apache/cassandra/blob/06209037ea56b5a2a49615a99f1542d6ea1b2947/tools/stress/src/org/apache/cassandra/stress/settings/SettingsCommand.java#L153] -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Created] (CASSANDRA-14890) cassandra-stress hangs for 200 seconds if n is not specified
Jay Zhuang created CASSANDRA-14890: -- Summary: cassandra-stress hang for 200 seconds if n is not specified Key: CASSANDRA-14890 URL: https://issues.apache.org/jira/browse/CASSANDRA-14890 Project: Cassandra Issue Type: Bug Components: Stress Reporter: Jay Zhuang Assignee: Jay Zhuang if parameter {{n}} is not specified, cassandra-stress will hang (wait) for 200 seconds between warm-up and sending traffic. For example, the following command will hang 200 seconds before sending the traffic: {noformat} $ ./tools/bin/cassandra-stress write ... Created keyspaces. Sleeping 1s for propagation. Sleeping 2s... Warming up WRITE with 0 iterations... Failed to connect over JMX; not collecting these stats {noformat} It's waiting for this: [https://github.com/apache/cassandra/blob/06209037ea56b5a2a49615a99f1542d6ea1b2947/tools/stress/src/org/apache/cassandra/stress/util/Uncertainty.java#L72] As there's no warm-up traffic (CASSANDRA-13773), it will wait until: {noformat} (measurements >= waiter.maxMeasurements) {noformat} {{maxMeasurements}} is 200 by default: [https://github.com/apache/cassandra/blob/06209037ea56b5a2a49615a99f1542d6ea1b2947/tools/stress/src/org/apache/cassandra/stress/settings/SettingsCommand.java#L153] -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
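The wait described above can be modeled with a small sketch (hypothetical names; the real logic lives in {{org.apache.cassandra.stress.util.Uncertainty}}, linked from the ticket). Each measurement interval contributes one sample, and the loop can only exit early once the observed uncertainty drops below a target, which never happens when warm-up produces zero operations, so it runs to the {{maxMeasurements}} cap of 200 intervals, roughly 200 seconds:

```java
import java.util.function.DoubleSupplier;

public class UncertaintyWaitSketch
{
    /** Returns the number of measurement intervals consumed before the loop exits. */
    static int awaitConvergence(int maxMeasurements, double targetUncertainty,
                                DoubleSupplier uncertaintyPerInterval)
    {
        int measurements = 0;
        while (measurements < maxMeasurements)
        {
            // one measurement interval (1 second in cassandra-stress)
            double u = uncertaintyPerInterval.getAsDouble();
            measurements++;
            if (u >= 0 && u < targetUncertainty) // converged -> stop early
                return measurements;
        }
        return measurements; // hit the cap: the 200-interval (~200s) hang
    }

    public static void main(String[] args)
    {
        // Zero warm-up traffic: uncertainty is undefined (modeled as NaN),
        // so the loop runs all 200 intervals before returning.
        System.out.println(awaitConvergence(200, 0.02, () -> Double.NaN));
        // Real traffic that converges quickly exits on the first interval.
        System.out.println(awaitConvergence(200, 0.02, () -> 0.01));
    }
}
```

This also shows why re-enabling warm-up (or specifying {{n}}) fixes it: real operations yield finite uncertainty samples, so the early-exit branch becomes reachable.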
[jira] [Commented] (CASSANDRA-14526) dtest to validate Cassandra state post failed/successful bootstrap
[ https://issues.apache.org/jira/browse/CASSANDRA-14526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1241#comment-1241 ] Jay Zhuang commented on CASSANDRA-14526: Hi [~chovatia.jayd...@gmail.com], the function name is duplicated (https://github.com/apache/cassandra-dtest/compare/master...jaydeepkumar1984:14526-trunk#diff-0b30b9f097df89d74be1d1af8205ac7eR707), I assume the first one could be removed. > dtest to validate Cassandra state post failed/successful bootstrap > -- > > Key: CASSANDRA-14526 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14526 > Project: Cassandra > Issue Type: Sub-task >Reporter: Jaydeepkumar Chovatia >Assignee: Jaydeepkumar Chovatia >Priority: Major > Labels: dtest > > Please find dtest here: > || dtest || > | [patch > |https://github.com/apache/cassandra-dtest/compare/master...jaydeepkumar1984:14526-trunk]| -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-14525) streaming failure during bootstrap makes new node into inconsistent state
[ https://issues.apache.org/jira/browse/CASSANDRA-14525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1220#comment-1220 ] Jay Zhuang commented on CASSANDRA-14525: Sure, I'll kick off the tests. > streaming failure during bootstrap makes new node into inconsistent state > - > > Key: CASSANDRA-14525 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14525 > Project: Cassandra > Issue Type: Bug > Components: Core >Reporter: Jaydeepkumar Chovatia >Assignee: Jaydeepkumar Chovatia >Priority: Major > Fix For: 4.0, 2.2.x, 3.0.x > > > If bootstrap fails for newly joining node (most common reason is due to > streaming failure) then Cassandra state remains in {{joining}} state which is > fine but Cassandra also enables Native transport which makes overall state > inconsistent. This further creates NullPointer exception if auth is enabled > on the new node, please find reproducible steps here: > For example if bootstrap fails due to streaming errors like > {quote}java.util.concurrent.ExecutionException: > org.apache.cassandra.streaming.StreamException: Stream failed > at > com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:299) > ~[guava-18.0.jar:na] > at > com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:286) > ~[guava-18.0.jar:na] > at > com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116) > ~[guava-18.0.jar:na] > at > org.apache.cassandra.service.StorageService.bootstrap(StorageService.java:1256) > [apache-cassandra-3.0.16.jar:3.0.16] > at > org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:894) > [apache-cassandra-3.0.16.jar:3.0.16] > at > org.apache.cassandra.service.StorageService.initServer(StorageService.java:660) > [apache-cassandra-3.0.16.jar:3.0.16] > at > org.apache.cassandra.service.StorageService.initServer(StorageService.java:573) > [apache-cassandra-3.0.16.jar:3.0.16] > at > 
org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:330) > [apache-cassandra-3.0.16.jar:3.0.16] > at > org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:567) > [apache-cassandra-3.0.16.jar:3.0.16] > at > org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:695) > [apache-cassandra-3.0.16.jar:3.0.16] > Caused by: org.apache.cassandra.streaming.StreamException: Stream failed > at > org.apache.cassandra.streaming.management.StreamEventJMXNotifier.onFailure(StreamEventJMXNotifier.java:85) > ~[apache-cassandra-3.0.16.jar:3.0.16] > at com.google.common.util.concurrent.Futures$6.run(Futures.java:1310) > ~[guava-18.0.jar:na] > at > com.google.common.util.concurrent.MoreExecutors$DirectExecutor.execute(MoreExecutors.java:457) > ~[guava-18.0.jar:na] > at > com.google.common.util.concurrent.ExecutionList.executeListener(ExecutionList.java:156) > ~[guava-18.0.jar:na] > at > com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:145) > ~[guava-18.0.jar:na] > at > com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:202) > ~[guava-18.0.jar:na] > at > org.apache.cassandra.streaming.StreamResultFuture.maybeComplete(StreamResultFuture.java:211) > ~[apache-cassandra-3.0.16.jar:3.0.16] > at > org.apache.cassandra.streaming.StreamResultFuture.handleSessionComplete(StreamResultFuture.java:187) > ~[apache-cassandra-3.0.16.jar:3.0.16] > at > org.apache.cassandra.streaming.StreamSession.closeSession(StreamSession.java:440) > ~[apache-cassandra-3.0.16.jar:3.0.16] > at > org.apache.cassandra.streaming.StreamSession.onError(StreamSession.java:540) > ~[apache-cassandra-3.0.16.jar:3.0.16] > at > org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:307) > ~[apache-cassandra-3.0.16.jar:3.0.16] > at > org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:79) > 
~[apache-cassandra-3.0.16.jar:3.0.16] > at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_121] > {quote} > then variable [StorageService.java::dataAvailable > |https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/service/StorageService.java#L892] > will be {{false}}. Since {{dataAvailable}} is {{false}} hence it will not > call [StorageService.java::finishJoiningRing > |https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/service/StorageService.java#L933] > and as a result > [StorageService.java::doAuthSetup|https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/service/StorageService.java#L999] > will not be invoked. > API
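The inconsistent state described above follows from the control flow around {{dataAvailable}}. A minimal sketch (method names follow the linked StorageService code; the fields and the request handler are illustrative, not the real implementation): when bootstrap fails, {{finishJoiningRing()}} and therefore {{doAuthSetup()}} are skipped, yet native transport still comes up, so an auth-enabled request hits uninitialized auth state:

```java
public class BootstrapFlowSketch
{
    boolean authSetupDone = false;
    boolean nativeTransportRunning = false;

    void joinTokenRing(boolean dataAvailable)
    {
        if (dataAvailable)
            finishJoiningRing();       // only reached on successful bootstrap
        // else: node stays in "joining"; doAuthSetup() is never invoked

        nativeTransportRunning = true; // started either way -> inconsistent state
    }

    void finishJoiningRing()
    {
        doAuthSetup();
    }

    void doAuthSetup()
    {
        authSetupDone = true;
    }

    /** Auth-enabled client request: fails if auth was never set up (NPE in real code). */
    String handleAuthenticatedRequest()
    {
        if (!authSetupDone)
            throw new IllegalStateException("auth resources not initialized");
        return "ok";
    }
}
```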
[jira] [Updated] (CASSANDRA-14791) [utest] tests unable to write system tmp directory
[ https://issues.apache.org/jira/browse/CASSANDRA-14791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jay Zhuang updated CASSANDRA-14791: --- Issue Type: Bug (was: Task) > [utest] tests unable to write system tmp directory > -- > > Key: CASSANDRA-14791 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14791 > Project: Cassandra > Issue Type: Bug > Components: Testing >Reporter: Jay Zhuang >Assignee: Jay Zhuang >Priority: Minor > Fix For: 4.0 > > > Some tests are failing from time to time because it cannot write to directory > {{/tmp/}}: > https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-trunk-test/lastCompletedBuild/testReport/org.apache.cassandra.streaming.compression/CompressedInputStreamTest/testException/ > {noformat} > java.lang.RuntimeException: java.nio.file.AccessDeniedException: > /tmp/na-1-big-Data.db > at > org.apache.cassandra.io.util.SequentialWriter.openChannel(SequentialWriter.java:119) > at > org.apache.cassandra.io.util.SequentialWriter.(SequentialWriter.java:152) > at > org.apache.cassandra.io.util.SequentialWriter.(SequentialWriter.java:141) > at > org.apache.cassandra.io.compress.CompressedSequentialWriter.(CompressedSequentialWriter.java:82) > at > org.apache.cassandra.streaming.compression.CompressedInputStreamTest.testCompressedReadWith(CompressedInputStreamTest.java:119) > at > org.apache.cassandra.streaming.compression.CompressedInputStreamTest.testException(CompressedInputStreamTest.java:78) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at java.lang.Thread.run(Thread.java:748) > Caused by: java.nio.file.AccessDeniedException: /tmp/na-1-big-Data.db > at > sun.nio.fs.UnixException.translateToIOException(UnixException.java:84) > at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102) > at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107) > at > sun.nio.fs.UnixFileSystemProvider.newFileChannel(UnixFileSystemProvider.java:177) > at 
java.nio.channels.FileChannel.open(FileChannel.java:287) > at java.nio.channels.FileChannel.open(FileChannel.java:335) > at > org.apache.cassandra.io.util.SequentialWriter.openChannel(SequentialWriter.java:100) > {noformat} > I guess it's because some Jenkins slaves don't have proper permission set. > For slave {{cassandra16}}, the tests are fine: > https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-trunk-test/723/testReport/junit/org.apache.cassandra.streaming.compression/CompressedInputStreamTest/testException/history/ -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-14791) [utest] tests unable to write system tmp directory
[ https://issues.apache.org/jira/browse/CASSANDRA-14791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jay Zhuang updated CASSANDRA-14791: --- Fix Version/s: 4.0 > [utest] tests unable to write system tmp directory > -- > > Key: CASSANDRA-14791 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14791 > Project: Cassandra > Issue Type: Bug > Components: Testing >Reporter: Jay Zhuang >Assignee: Jay Zhuang >Priority: Minor > Fix For: 4.0 > > > Some tests are failing from time to time because it cannot write to directory > {{/tmp/}}: > https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-trunk-test/lastCompletedBuild/testReport/org.apache.cassandra.streaming.compression/CompressedInputStreamTest/testException/ > {noformat} > java.lang.RuntimeException: java.nio.file.AccessDeniedException: > /tmp/na-1-big-Data.db > at > org.apache.cassandra.io.util.SequentialWriter.openChannel(SequentialWriter.java:119) > at > org.apache.cassandra.io.util.SequentialWriter.(SequentialWriter.java:152) > at > org.apache.cassandra.io.util.SequentialWriter.(SequentialWriter.java:141) > at > org.apache.cassandra.io.compress.CompressedSequentialWriter.(CompressedSequentialWriter.java:82) > at > org.apache.cassandra.streaming.compression.CompressedInputStreamTest.testCompressedReadWith(CompressedInputStreamTest.java:119) > at > org.apache.cassandra.streaming.compression.CompressedInputStreamTest.testException(CompressedInputStreamTest.java:78) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at java.lang.Thread.run(Thread.java:748) > Caused by: java.nio.file.AccessDeniedException: /tmp/na-1-big-Data.db > at > sun.nio.fs.UnixException.translateToIOException(UnixException.java:84) > at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102) > at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107) > at > sun.nio.fs.UnixFileSystemProvider.newFileChannel(UnixFileSystemProvider.java:177) > at 
java.nio.channels.FileChannel.open(FileChannel.java:287) > at java.nio.channels.FileChannel.open(FileChannel.java:335) > at > org.apache.cassandra.io.util.SequentialWriter.openChannel(SequentialWriter.java:100) > {noformat} > I guess it's because some Jenkins slaves don't have proper permission set. > For slave {{cassandra16}}, the tests are fine: > https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-trunk-test/723/testReport/junit/org.apache.cassandra.streaming.compression/CompressedInputStreamTest/testException/history/ -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-14791) [utest] tests unable to write system tmp directory
[ https://issues.apache.org/jira/browse/CASSANDRA-14791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jay Zhuang updated CASSANDRA-14791: --- Resolution: Fixed Status: Resolved (was: Ready to Commit) Thanks [~krummas] for the review. Committed as [{{73ebd20}}|https://github.com/apache/cassandra/commit/73ebd200c04335624f956e79624cf8494d872f19]. > [utest] tests unable to write system tmp directory > -- > > Key: CASSANDRA-14791 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14791 > Project: Cassandra > Issue Type: Task > Components: Testing >Reporter: Jay Zhuang >Assignee: Jay Zhuang >Priority: Minor > > Some tests are failing from time to time because it cannot write to directory > {{/tmp/}}: > https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-trunk-test/lastCompletedBuild/testReport/org.apache.cassandra.streaming.compression/CompressedInputStreamTest/testException/ > {noformat} > java.lang.RuntimeException: java.nio.file.AccessDeniedException: > /tmp/na-1-big-Data.db > at > org.apache.cassandra.io.util.SequentialWriter.openChannel(SequentialWriter.java:119) > at > org.apache.cassandra.io.util.SequentialWriter.(SequentialWriter.java:152) > at > org.apache.cassandra.io.util.SequentialWriter.(SequentialWriter.java:141) > at > org.apache.cassandra.io.compress.CompressedSequentialWriter.(CompressedSequentialWriter.java:82) > at > org.apache.cassandra.streaming.compression.CompressedInputStreamTest.testCompressedReadWith(CompressedInputStreamTest.java:119) > at > org.apache.cassandra.streaming.compression.CompressedInputStreamTest.testException(CompressedInputStreamTest.java:78) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at java.lang.Thread.run(Thread.java:748) > Caused by: java.nio.file.AccessDeniedException: /tmp/na-1-big-Data.db > at > sun.nio.fs.UnixException.translateToIOException(UnixException.java:84) > at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102) > at 
sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107) > at > sun.nio.fs.UnixFileSystemProvider.newFileChannel(UnixFileSystemProvider.java:177) > at java.nio.channels.FileChannel.open(FileChannel.java:287) > at java.nio.channels.FileChannel.open(FileChannel.java:335) > at > org.apache.cassandra.io.util.SequentialWriter.openChannel(SequentialWriter.java:100) > {noformat} > I guess it's because some Jenkins slaves don't have proper permission set. > For slave {{cassandra16}}, the tests are fine: > https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-trunk-test/723/testReport/junit/org.apache.cassandra.streaming.compression/CompressedInputStreamTest/testException/history/ -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-14791) [utest] tests unable to write system tmp directory
[ https://issues.apache.org/jira/browse/CASSANDRA-14791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16641314#comment-16641314 ] Jay Zhuang commented on CASSANDRA-14791: The root cause of this test failure is not because {{/tmp/}} directory is not writable. But because the unittest generated tmp files {{/tmp/na-1-big-Data.db}} and {{/tmp/na-1-big-CompressionInfo.db}} are not deleted after the test. So I guess on these nodes, the test was run by other user, which left the tmp files that the current user cannot override. I'm able to reproduce the same error message by: {noformat} sudo chown root:root /tmp/na-1-big-Data.db {noformat} Here is a patch for trunk: | Branch | uTest | | [14791|https://github.com/cooldoger/cassandra/tree/14791] | [!https://circleci.com/gh/cooldoger/cassandra/tree/14791.svg?style=svg!|https://circleci.com/gh/cooldoger/cassandra/tree/14791] | Passed the tests in Jenkins: https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-testall/36/testReport/org.apache.cassandra.streaming.compression/CompressedInputStreamTest/ > [utest] tests unable to write system tmp directory > -- > > Key: CASSANDRA-14791 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14791 > Project: Cassandra > Issue Type: Task > Components: Testing >Reporter: Jay Zhuang >Priority: Minor > > Some tests are failing from time to time because it cannot write to directory > {{/tmp/}}: > https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-trunk-test/lastCompletedBuild/testReport/org.apache.cassandra.streaming.compression/CompressedInputStreamTest/testException/ > {noformat} > java.lang.RuntimeException: java.nio.file.AccessDeniedException: > /tmp/na-1-big-Data.db > at > org.apache.cassandra.io.util.SequentialWriter.openChannel(SequentialWriter.java:119) > at > org.apache.cassandra.io.util.SequentialWriter.(SequentialWriter.java:152) > at > org.apache.cassandra.io.util.SequentialWriter.(SequentialWriter.java:141) > at > 
org.apache.cassandra.io.compress.CompressedSequentialWriter.(CompressedSequentialWriter.java:82) > at > org.apache.cassandra.streaming.compression.CompressedInputStreamTest.testCompressedReadWith(CompressedInputStreamTest.java:119) > at > org.apache.cassandra.streaming.compression.CompressedInputStreamTest.testException(CompressedInputStreamTest.java:78) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at java.lang.Thread.run(Thread.java:748) > Caused by: java.nio.file.AccessDeniedException: /tmp/na-1-big-Data.db > at > sun.nio.fs.UnixException.translateToIOException(UnixException.java:84) > at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102) > at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107) > at > sun.nio.fs.UnixFileSystemProvider.newFileChannel(UnixFileSystemProvider.java:177) > at java.nio.channels.FileChannel.open(FileChannel.java:287) > at java.nio.channels.FileChannel.open(FileChannel.java:335) > at > org.apache.cassandra.io.util.SequentialWriter.openChannel(SequentialWriter.java:100) > {noformat} > I guess it's because some Jenkins slaves don't have proper permission set. > For slave {{cassandra16}}, the tests are fine: > https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-trunk-test/723/testReport/junit/org.apache.cassandra.streaming.compression/CompressedInputStreamTest/testException/history/ -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
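One way to avoid the collision described above (a hedged sketch of the fix's general shape, not the committed patch verbatim) is to write test files into a unique per-run temp directory instead of fixed names under {{/tmp}}, and delete them when done, so a file left behind by a run under another user can never block a later run:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class TempFileHygiene
{
    static Path writeTestSSTable(String name, byte[] data) throws IOException
    {
        // unique directory owned by the current user, e.g. /tmp/cis-test-1234/
        Path dir = Files.createTempDirectory("cis-test-");
        Path file = dir.resolve(name);
        Files.write(file, data);
        file.toFile().deleteOnExit(); // belt-and-braces cleanup on JVM exit
        dir.toFile().deleteOnExit();
        return file;
    }

    public static void main(String[] args) throws IOException
    {
        Path f = writeTestSSTable("na-1-big-Data.db", new byte[]{ 1, 2, 3 });
        System.out.println(Files.size(f));
        Files.delete(f);              // explicit cleanup after the test
        Files.delete(f.getParent());
    }
}
```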
[jira] [Updated] (CASSANDRA-14791) [utest] tests unable to write system tmp directory
[ https://issues.apache.org/jira/browse/CASSANDRA-14791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jay Zhuang updated CASSANDRA-14791: --- Assignee: Jay Zhuang Status: Patch Available (was: Open) > [utest] tests unable to write system tmp directory > -- > > Key: CASSANDRA-14791 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14791 > Project: Cassandra > Issue Type: Task > Components: Testing >Reporter: Jay Zhuang >Assignee: Jay Zhuang >Priority: Minor > > Some tests are failing from time to time because it cannot write to directory > {{/tmp/}}: > https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-trunk-test/lastCompletedBuild/testReport/org.apache.cassandra.streaming.compression/CompressedInputStreamTest/testException/ > {noformat} > java.lang.RuntimeException: java.nio.file.AccessDeniedException: > /tmp/na-1-big-Data.db > at > org.apache.cassandra.io.util.SequentialWriter.openChannel(SequentialWriter.java:119) > at > org.apache.cassandra.io.util.SequentialWriter.(SequentialWriter.java:152) > at > org.apache.cassandra.io.util.SequentialWriter.(SequentialWriter.java:141) > at > org.apache.cassandra.io.compress.CompressedSequentialWriter.(CompressedSequentialWriter.java:82) > at > org.apache.cassandra.streaming.compression.CompressedInputStreamTest.testCompressedReadWith(CompressedInputStreamTest.java:119) > at > org.apache.cassandra.streaming.compression.CompressedInputStreamTest.testException(CompressedInputStreamTest.java:78) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at java.lang.Thread.run(Thread.java:748) > Caused by: java.nio.file.AccessDeniedException: /tmp/na-1-big-Data.db > at > sun.nio.fs.UnixException.translateToIOException(UnixException.java:84) > at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102) > at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107) > at > sun.nio.fs.UnixFileSystemProvider.newFileChannel(UnixFileSystemProvider.java:177) > at 
java.nio.channels.FileChannel.open(FileChannel.java:287) > at java.nio.channels.FileChannel.open(FileChannel.java:335) > at > org.apache.cassandra.io.util.SequentialWriter.openChannel(SequentialWriter.java:100) > {noformat} > I guess it's because some Jenkins slaves don't have proper permission set. > For slave {{cassandra16}}, the tests are fine: > https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-trunk-test/723/testReport/junit/org.apache.cassandra.streaming.compression/CompressedInputStreamTest/testException/history/ -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-14610) Flaky dtest: nodetool_test.TestNodetool.test_describecluster_more_information_three_datacenters
[ https://issues.apache.org/jira/browse/CASSANDRA-14610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16640961#comment-16640961 ] Jay Zhuang commented on CASSANDRA-14610: I'm unable to reproduce the problem locally, for the failed job in Jenkins, seems mostly it's because timeout to populate 6 nodes: {noformat} Error Message ccmlib.node.NodeError: Error starting node1. Stacktrace self = @since('4.0') def test_describecluster_more_information_three_datacenters(self): """ nodetool describecluster should be more informative. It should include detailes for total node count, list of datacenters, RF, number of nodes per dc, how many are down and version(s). @jira_ticket CASSANDRA-13853 @expected_result This test invokes nodetool describecluster and matches the output with the expected one """ cluster = self.cluster > cluster.populate([2, 3, 1]).start(wait_for_binary_proto=True) {noformat} Other tests which requires 6 nodes all marked as {{@pytest.mark.resource_intensive}} (then these tests are skipped). I think reducing the node number from 6 to 4 should help. +1 for the patch (also it passed 100 times locally run). > Flaky dtest: > nodetool_test.TestNodetool.test_describecluster_more_information_three_datacenters > --- > > Key: CASSANDRA-14610 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14610 > Project: Cassandra > Issue Type: Task > Components: Testing, Tools >Reporter: Jason Brown >Assignee: Marcus Eriksson >Priority: Minor > Labels: dtest > > @jay zhuang observed > nodetool_test.TestNodetool.test_describecluster_more_information_three_datacenters > being flaky in Apache Jenkins. 
I ran locally and got a different flaky > behavior: > {noformat} > out_node1_dc3, err, _ = node1_dc3.nodetool('describecluster') > assert 0 == len(err), err > > assert out_node1_dc1 == out_node1_dc3 > E AssertionError: assert 'Cluster Info...1=3, dc3=1}\n' == 'Cluster > Infor...1=3, dc3=1}\n' > E Cluster Information: > E Name: test > E Snitch: org.apache.cassandra.locator.PropertyFileSnitch > E DynamicEndPointSnitch: enabled > E Partitioner: org.apache.cassandra.dht.Murmur3Partitioner > E Schema versions: > E fc9ec7cd-80ba-3f27-87af-fc0bafcf7a03: [127.0.0.6, > 127.0.0.5, 127.0.0.4, 127.0.0.3, 127.0.0.2, 127.0.0.1]... > E > E ...Full output truncated (26 lines hidden), use '-vv' to show > 09:58:14,357 ccm DEBUG Log-watching thread exiting. > ===Flaky Test Report=== > test_describecluster_more_information_three_datacenters failed and was not > selected for rerun. > > assert 'Cluster Info...1=3, dc3=1}\n' == 'Cluster Infor...1=3, dc3=1}\n' > Cluster Information: > Name: test > Snitch: org.apache.cassandra.locator.PropertyFileSnitch > DynamicEndPointSnitch: enabled > Partitioner: org.apache.cassandra.dht.Murmur3Partitioner > Schema versions: > fc9ec7cd-80ba-3f27-87af-fc0bafcf7a03: [127.0.0.6, 127.0.0.5, > 127.0.0.4, 127.0.0.3, 127.0.0.2, 127.0.0.1]... > > ...Full output truncated (26 lines hidden), use '-vv' to show > [ /opt/orig/1/opt/dev/cassandra-dtest/nodetool_test.py:373>] > ===End Flaky Test Report=== > {noformat} > As this test is for a patch that was introduced for 4.0, this dtest (should) > only be failing on trunk. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-14610) Flaky dtest: nodetool_test.TestNodetool.test_describecluster_more_information_three_datacenters
[ https://issues.apache.org/jira/browse/CASSANDRA-14610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jay Zhuang updated CASSANDRA-14610: --- Reviewer: Jay Zhuang > Flaky dtest: > nodetool_test.TestNodetool.test_describecluster_more_information_three_datacenters > --- > > Key: CASSANDRA-14610 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14610 > Project: Cassandra > Issue Type: Task > Components: Testing, Tools >Reporter: Jason Brown >Assignee: Marcus Eriksson >Priority: Minor > Labels: dtest > > @jay zhuang observed > nodetool_test.TestNodetool.test_describecluster_more_information_three_datacenters > being flaky in Apache Jenkins. I ran locally and got a different flaky > behavior: > {noformat} > out_node1_dc3, err, _ = node1_dc3.nodetool('describecluster') > assert 0 == len(err), err > > assert out_node1_dc1 == out_node1_dc3 > E AssertionError: assert 'Cluster Info...1=3, dc3=1}\n' == 'Cluster > Infor...1=3, dc3=1}\n' > E Cluster Information: > E Name: test > E Snitch: org.apache.cassandra.locator.PropertyFileSnitch > E DynamicEndPointSnitch: enabled > E Partitioner: org.apache.cassandra.dht.Murmur3Partitioner > E Schema versions: > E fc9ec7cd-80ba-3f27-87af-fc0bafcf7a03: [127.0.0.6, > 127.0.0.5, 127.0.0.4, 127.0.0.3, 127.0.0.2, 127.0.0.1]... > E > E ...Full output truncated (26 lines hidden), use '-vv' to show > 09:58:14,357 ccm DEBUG Log-watching thread exiting. > ===Flaky Test Report=== > test_describecluster_more_information_three_datacenters failed and was not > selected for rerun. > > assert 'Cluster Info...1=3, dc3=1}\n' == 'Cluster Infor...1=3, dc3=1}\n' > Cluster Information: > Name: test > Snitch: org.apache.cassandra.locator.PropertyFileSnitch > DynamicEndPointSnitch: enabled > Partitioner: org.apache.cassandra.dht.Murmur3Partitioner > Schema versions: > fc9ec7cd-80ba-3f27-87af-fc0bafcf7a03: [127.0.0.6, 127.0.0.5, > 127.0.0.4, 127.0.0.3, 127.0.0.2, 127.0.0.1]... 
> > ...Full output truncated (26 lines hidden), use '-vv' to show > [ /opt/orig/1/opt/dev/cassandra-dtest/nodetool_test.py:373>] > ===End Flaky Test Report=== > {noformat} > As this test is for a patch that was introduced for 4.0, this dtest (should) > only be failing on trunk. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-12704) snapshot build is never able to publish to mvn artifactory
[ https://issues.apache.org/jira/browse/CASSANDRA-12704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jay Zhuang updated CASSANDRA-12704: --- Resolution: Fixed Fix Version/s: 4.0 Status: Resolved (was: Ready to Commit) > snapshot build never be able to publish to mvn artifactory > -- > > Key: CASSANDRA-12704 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12704 > Project: Cassandra > Issue Type: Bug > Components: Packaging >Reporter: Jay Zhuang >Assignee: Jay Zhuang >Priority: Minor > Fix For: 4.0 > > Attachments: 12704-trunk.txt > > > {code} > $ ant publish > {code} > works fine when property "release" is set, which publishes the binaries to > release Artifactory. > But for daily snapshot build, if "release" is set, it won't be snapshot build: > https://github.com/apache/cassandra/blob/cassandra-3.7/build.xml#L74 > if "release" is not set, it doesn't publish to snapshot Artifactory: > https://github.com/apache/cassandra/blob/cassandra-3.7/build.xml#L1888 > I would suggest just removing the "if check" for target "publish". -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-12704) snapshot build never be able to publish to mvn artifactory
[ https://issues.apache.org/jira/browse/CASSANDRA-12704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16631220#comment-16631220 ] Jay Zhuang commented on CASSANDRA-12704: Thanks [~michaelsembwever]. Committed to trunk as [{{87a}}|https://github.com/apache/cassandra/commit/87abe7249f7ad8b11235d61e048735bd6d62]. > snapshot build never be able to publish to mvn artifactory > -- > > Key: CASSANDRA-12704 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12704 > Project: Cassandra > Issue Type: Bug > Components: Packaging >Reporter: Jay Zhuang >Assignee: Jay Zhuang >Priority: Minor > Attachments: 12704-trunk.txt > > > {code} > $ ant publish > {code} > works fine when property "release" is set, which publishes the binaries to > release Artifactory. > But for daily snapshot build, if "release" is set, it won't be snapshot build: > https://github.com/apache/cassandra/blob/cassandra-3.7/build.xml#L74 > if "release" is not set, it doesn't publish to snapshot Artifactory: > https://github.com/apache/cassandra/blob/cassandra-3.7/build.xml#L1888 > I would suggest just removing the "if check" for target "publish". -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-12704) snapshot build never be able to publish to mvn artifactory
[ https://issues.apache.org/jira/browse/CASSANDRA-12704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jay Zhuang updated CASSANDRA-12704: --- Status: Ready to Commit (was: Patch Available) > snapshot build never be able to publish to mvn artifactory > -- > > Key: CASSANDRA-12704 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12704 > Project: Cassandra > Issue Type: Bug > Components: Packaging >Reporter: Jay Zhuang >Assignee: Jay Zhuang >Priority: Minor > Attachments: 12704-trunk.txt > > > {code} > $ ant publish > {code} > works fine when property "release" is set, which publishes the binaries to > release Artifactory. > But for daily snapshot build, if "release" is set, it won't be snapshot build: > https://github.com/apache/cassandra/blob/cassandra-3.7/build.xml#L74 > if "release" is not set, it doesn't publish to snapshot Artifactory: > https://github.com/apache/cassandra/blob/cassandra-3.7/build.xml#L1888 > I would suggest just removing the "if check" for target "publish". -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-14791) [utest] tests unable to write system tmp directory
[ https://issues.apache.org/jira/browse/CASSANDRA-14791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630718#comment-16630718 ] Jay Zhuang commented on CASSANDRA-14791: [~mshuler] talked about the docker option in the last NGCC: https://github.com/ngcc/ngcc2017/blob/master/Help_Test_Apache_Cassandra-NGCC_2017.pdf . Any idea how we can move forward with this? > [utest] tests unable to write system tmp directory > -- > > Key: CASSANDRA-14791 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14791 > Project: Cassandra > Issue Type: Task > Components: Testing >Reporter: Jay Zhuang >Priority: Minor > > Some tests are failing from time to time because it cannot write to directory > {{/tmp/}}: > https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-trunk-test/lastCompletedBuild/testReport/org.apache.cassandra.streaming.compression/CompressedInputStreamTest/testException/ > {noformat} > java.lang.RuntimeException: java.nio.file.AccessDeniedException: > /tmp/na-1-big-Data.db > at > org.apache.cassandra.io.util.SequentialWriter.openChannel(SequentialWriter.java:119) > at > org.apache.cassandra.io.util.SequentialWriter.(SequentialWriter.java:152) > at > org.apache.cassandra.io.util.SequentialWriter.(SequentialWriter.java:141) > at > org.apache.cassandra.io.compress.CompressedSequentialWriter.(CompressedSequentialWriter.java:82) > at > org.apache.cassandra.streaming.compression.CompressedInputStreamTest.testCompressedReadWith(CompressedInputStreamTest.java:119) > at > org.apache.cassandra.streaming.compression.CompressedInputStreamTest.testException(CompressedInputStreamTest.java:78) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at java.lang.Thread.run(Thread.java:748) > Caused by: java.nio.file.AccessDeniedException: /tmp/na-1-big-Data.db > at > sun.nio.fs.UnixException.translateToIOException(UnixException.java:84) > at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102) > at 
sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107) > at > sun.nio.fs.UnixFileSystemProvider.newFileChannel(UnixFileSystemProvider.java:177) > at java.nio.channels.FileChannel.open(FileChannel.java:287) > at java.nio.channels.FileChannel.open(FileChannel.java:335) > at > org.apache.cassandra.io.util.SequentialWriter.openChannel(SequentialWriter.java:100) > {noformat} > I guess it's because some Jenkins slaves don't have proper permission set. > For slave {{cassandra16}}, the tests are fine: > https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-trunk-test/723/testReport/junit/org.apache.cassandra.streaming.compression/CompressedInputStreamTest/testException/history/ -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-12704) snapshot build never be able to publish to mvn artifactory
[ https://issues.apache.org/jira/browse/CASSANDRA-12704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jay Zhuang updated CASSANDRA-12704: --- Reviewer: mck > snapshot build never be able to publish to mvn artifactory > -- > > Key: CASSANDRA-12704 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12704 > Project: Cassandra > Issue Type: Bug > Components: Packaging >Reporter: Jay Zhuang >Assignee: Jay Zhuang >Priority: Minor > Attachments: 12704-trunk.txt > > > {code} > $ ant publish > {code} > works fine when property "release" is set, which publishes the binaries to > release Artifactory. > But for daily snapshot build, if "release" is set, it won't be snapshot build: > https://github.com/apache/cassandra/blob/cassandra-3.7/build.xml#L74 > if "release" is not set, it doesn't publish to snapshot Artifactory: > https://github.com/apache/cassandra/blob/cassandra-3.7/build.xml#L1888 > I would suggest just removing the "if check" for target "publish". -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-12704) snapshot build never be able to publish to mvn artifactory
[ https://issues.apache.org/jira/browse/CASSANDRA-12704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630699#comment-16630699 ] Jay Zhuang commented on CASSANDRA-12704: Do you think the change should go to trunk only or other branches too? I would prefer branches from 2.2, as we might have snapshot artifacts for all active branches. > snapshot build never be able to publish to mvn artifactory > -- > > Key: CASSANDRA-12704 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12704 > Project: Cassandra > Issue Type: Bug > Components: Packaging >Reporter: Jay Zhuang >Assignee: Jay Zhuang >Priority: Minor > Attachments: 12704-trunk.txt > > > {code} > $ ant publish > {code} > works fine when property "release" is set, which publishes the binaries to > release Artifactory. > But for daily snapshot build, if "release" is set, it won't be snapshot build: > https://github.com/apache/cassandra/blob/cassandra-3.7/build.xml#L74 > if "release" is not set, it doesn't publish to snapshot Artifactory: > https://github.com/apache/cassandra/blob/cassandra-3.7/build.xml#L1888 > I would suggest just removing the "if check" for target "publish". -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-14791) [utest] tests unable to write system tmp directory
[ https://issues.apache.org/jira/browse/CASSANDRA-14791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16629010#comment-16629010 ] Jay Zhuang commented on CASSANDRA-14791: Hi [~mshuler], [~spo...@gmail.com], any idea if there's a permission setting we could set for the Jenkins Job/Slave? > [utest] tests unable to write system tmp directory > -- > > Key: CASSANDRA-14791 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14791 > Project: Cassandra > Issue Type: Task > Components: Testing >Reporter: Jay Zhuang >Priority: Minor > > Some tests are failing from time to time because it cannot write to directory > {{/tmp/}}: > https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-trunk-test/lastCompletedBuild/testReport/org.apache.cassandra.streaming.compression/CompressedInputStreamTest/testException/ > {noformat} > java.lang.RuntimeException: java.nio.file.AccessDeniedException: > /tmp/na-1-big-Data.db > at > org.apache.cassandra.io.util.SequentialWriter.openChannel(SequentialWriter.java:119) > at > org.apache.cassandra.io.util.SequentialWriter.(SequentialWriter.java:152) > at > org.apache.cassandra.io.util.SequentialWriter.(SequentialWriter.java:141) > at > org.apache.cassandra.io.compress.CompressedSequentialWriter.(CompressedSequentialWriter.java:82) > at > org.apache.cassandra.streaming.compression.CompressedInputStreamTest.testCompressedReadWith(CompressedInputStreamTest.java:119) > at > org.apache.cassandra.streaming.compression.CompressedInputStreamTest.testException(CompressedInputStreamTest.java:78) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at java.lang.Thread.run(Thread.java:748) > Caused by: java.nio.file.AccessDeniedException: /tmp/na-1-big-Data.db > at > sun.nio.fs.UnixException.translateToIOException(UnixException.java:84) > at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102) > at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107) > at > 
sun.nio.fs.UnixFileSystemProvider.newFileChannel(UnixFileSystemProvider.java:177) > at java.nio.channels.FileChannel.open(FileChannel.java:287) > at java.nio.channels.FileChannel.open(FileChannel.java:335) > at > org.apache.cassandra.io.util.SequentialWriter.openChannel(SequentialWriter.java:100) > {noformat} > I guess it's because some Jenkins slaves don't have proper permission set. > For slave {{cassandra16}}, the tests are fine: > https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-trunk-test/723/testReport/junit/org.apache.cassandra.streaming.compression/CompressedInputStreamTest/testException/history/ -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Created] (CASSANDRA-14791) [utest] tests unable to write system tmp directory
Jay Zhuang created CASSANDRA-14791: -- Summary: [utest] tests unable to write system tmp directory Key: CASSANDRA-14791 URL: https://issues.apache.org/jira/browse/CASSANDRA-14791 Project: Cassandra Issue Type: Task Components: Testing Reporter: Jay Zhuang Some tests are failing from time to time because it cannot write to directory {{/tmp/}}: https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-trunk-test/lastCompletedBuild/testReport/org.apache.cassandra.streaming.compression/CompressedInputStreamTest/testException/ {noformat} java.lang.RuntimeException: java.nio.file.AccessDeniedException: /tmp/na-1-big-Data.db at org.apache.cassandra.io.util.SequentialWriter.openChannel(SequentialWriter.java:119) at org.apache.cassandra.io.util.SequentialWriter.(SequentialWriter.java:152) at org.apache.cassandra.io.util.SequentialWriter.(SequentialWriter.java:141) at org.apache.cassandra.io.compress.CompressedSequentialWriter.(CompressedSequentialWriter.java:82) at org.apache.cassandra.streaming.compression.CompressedInputStreamTest.testCompressedReadWith(CompressedInputStreamTest.java:119) at org.apache.cassandra.streaming.compression.CompressedInputStreamTest.testException(CompressedInputStreamTest.java:78) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.lang.Thread.run(Thread.java:748) Caused by: java.nio.file.AccessDeniedException: /tmp/na-1-big-Data.db at sun.nio.fs.UnixException.translateToIOException(UnixException.java:84) at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102) at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107) at sun.nio.fs.UnixFileSystemProvider.newFileChannel(UnixFileSystemProvider.java:177) at java.nio.channels.FileChannel.open(FileChannel.java:287) at java.nio.channels.FileChannel.open(FileChannel.java:335) at org.apache.cassandra.io.util.SequentialWriter.openChannel(SequentialWriter.java:100) {noformat} I guess it's because some Jenkins slaves don't have proper permission set. 
For slave {{cassandra16}}, the tests are fine: https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-trunk-test/723/testReport/junit/org.apache.cassandra.streaming.compression/CompressedInputStreamTest/testException/history/ -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
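Independent of fixing slave permissions, tests can sidestep collisions in a shared {{/tmp}} by writing into a fresh per-run directory (the Java equivalent would be {{Files.createTempDirectory}}, which honours {{java.io.tmpdir}}). A minimal sketch of the idea, with hypothetical names:

```python
import os
import tempfile

def scratch_path(filename):
    """Return a path for a test scratch file inside a fresh private
    directory, instead of writing directly into the shared /tmp where
    leftover files owned by other build users trigger AccessDeniedException."""
    run_dir = tempfile.mkdtemp(prefix="cassandra-utest-")
    return os.path.join(run_dir, filename)

path = scratch_path("na-1-big-Data.db")
with open(path, "wb") as handle:
    # the write no longer collides with files left behind by other CI users
    handle.write(b"\x00")
```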
[jira] [Commented] (CASSANDRA-14497) Add Role login cache
[ https://issues.apache.org/jira/browse/CASSANDRA-14497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16598116#comment-16598116 ] Jay Zhuang commented on CASSANDRA-14497: Thanks [~beobal] for answering the questions. This is a very useful feature that we should try to get into {{4.0}}; we have several clusters with high QPS on {{system_auth}} because {{canLogin}} is not cached. Overall the patch looks good to me; I have a few comments, or really just questions: {quote} Permissions can be grouped together by assigning them to roles, which can then be granted to other roles. LOGIN is the way to differentiate these logical roles from ones which represent 'real' database users. {quote} A {{logical}} role doesn't have a password, right? Can we use that? {quote} Both login and superuser privs are part of authz really. {quote} Then if we disable the [{{authorizer}}|https://github.com/apache/cassandra/blob/trunk/conf/cassandra.yaml#L112], it should not do the login check, right? But the current implementation still checks, and the patch will load all the role hierarchy information into the cache and then find the primary role we need: https://github.com/beobal/cassandra/commit/cf3965c82cafd31a3b585e19fd6beba9a56b85e5#diff-b13b86e6bacbcc61c6e9f07715f46ed6R109 Maybe my questions are beyond the scope of this ticket. If we just want to add the cache with minimal impact, I think the patch looks good. 
> Add Role login cache > > > Key: CASSANDRA-14497 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14497 > Project: Cassandra > Issue Type: Improvement > Components: Auth >Reporter: Jay Zhuang >Assignee: Sam Tunnicliffe >Priority: Major > Labels: security > Fix For: 4.0 > > > The > [{{ClientState.login()}}|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/service/ClientState.java#L313] > function is used for all auth messages: > [{{AuthResponse.java:82}}|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/transport/messages/AuthResponse.java#L82]. > But the > [{{role.canLogin}}|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/auth/CassandraRoleManager.java#L521] > information is not cached, so it hits the database every time: > [{{CassandraRoleManager.java:407}}|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/auth/CassandraRoleManager.java#L407]. > For a cluster with lots of new connections, it causes a performance issue. > The mitigation for us was to increase the {{system_auth}} replication factor > to match the number of nodes, so > [{{local_one}}|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/auth/CassandraRoleManager.java#L488] > would be very cheap. The P99 dropped immediately, but I don't think it is > a good solution. > I would propose adding {{Role.canLogin}} to the RolesCache to improve > auth performance.
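The caching being proposed can be illustrated with a minimal time-bounded cache. This is a language-neutral sketch of the idea only — Cassandra's actual RolesCache is Java and configured via validity settings in cassandra.yaml — and the names below are hypothetical:

```python
import time

class CanLoginCache:
    """Sketch: cache each role's can-login flag for a validity window, so
    the auth path does not have to query system_auth on every connection."""

    def __init__(self, fetch_fn, validity_seconds=2.0, clock=time.monotonic):
        self._fetch = fetch_fn              # stands in for the system_auth query
        self._validity = validity_seconds
        self._clock = clock
        self._entries = {}                  # role -> (can_login, fetched_at)

    def can_login(self, role):
        now = self._clock()
        hit = self._entries.get(role)
        if hit is not None and now - hit[1] < self._validity:
            return hit[0]                   # served from cache, no DB round trip
        value = self._fetch(role)
        self._entries[role] = (value, now)
        return value

db_queries = []

def fetch_from_db(role):
    db_queries.append(role)                 # record every simulated DB hit
    return role != "logical_role"           # pretend only 'real' users may log in

cache = CanLoginCache(fetch_from_db)
cache.can_login("app_user")
cache.can_login("app_user")                 # second lookup is a cache hit
```

The trade-off is the usual one for auth caches: a revoked LOGIN privilege stays effective until the validity window expires.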
[jira] [Updated] (CASSANDRA-9989) Optimise BTree.build
[ https://issues.apache.org/jira/browse/CASSANDRA-9989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jay Zhuang updated CASSANDRA-9989: -- Fix Version/s: (was: 4.x) 4.0 > Optimise BTree.build > > > Key: CASSANDRA-9989 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9989 > Project: Cassandra > Issue Type: Sub-task >Reporter: Benedict >Assignee: Jay Zhuang >Priority: Minor > Fix For: 4.0 > > Attachments: 9989-trunk.txt > > > BTree.Builder could reduce its copying, and exploit toArray more efficiently, > with some work. It's not very important right now because we don't make as > much use of its bulk-add methods as we otherwise might, however over time > this work will become more useful. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-14678) Propose reducing the default value for PasswordAuthenticator number of hashing rounds
[ https://issues.apache.org/jira/browse/CASSANDRA-14678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16597986#comment-16597986 ] Jay Zhuang commented on CASSANDRA-14678: {quote} The purpose of hashing with bcrypt is to prevent easy decoding of passwords in case of a database leak, partially. A sleep wouldn't help with that, nor is it a good idea in general, speaking of DoS. {quote} That's true. If the hashed password is leaked, it would be easier to decode. {quote} That's fair, although in this case you'd still be sending the plaintext hash over unsecure network, which will be sufficient for anyone else to log in by intercepting just that. {quote} One solution is adding timestamp to the salt and the server only check the hash within {{timestamp +/- [a configurable time]}}. > Propose reducing the default value for PasswordAuthenticator number of > hashing rounds > - > > Key: CASSANDRA-14678 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14678 > Project: Cassandra > Issue Type: Wish > Components: Auth >Reporter: Shichao An >Priority: Major > > We saw performance degradation in some of our Cassandra clusters using > PasswordAuthenticator. When the clients start connecting to the Cassandra > nodes, the CPU load increases, and there is a high chance that the host will > be unable to recover from high CPU usage if the clients retry indefinitely at > relatively high frequency. In each reconnection, the clients try to initiate > auth handshakes, but may fail due to timeouts from the overloaded host, > whereas the sporadic auth handshakes will put more load to the host, so on so > forth. In our case, the load average can be 1000~3000 on a 32-core host. The > host is basically unable to serve any traffic. > We found it is caused by the slow `BCrypt.checkpw` operation, where the > generated salted hash round is 10 because `GENSALT_LOG2_ROUNDS_PROPERTY` > defaults 10, which makes it 2^10 rounds of hashing iterations. 
I changed the > hashing rounds to 4 by overriding `auth_bcrypt_gensalt_log2_rounds` system > property and it can effectively solve above-mentioned the CPU issue. > It took us some time to nail down the cause of this problem. Shall we reduce > the default value of `GENSALT_LOG2_ROUNDS_PROPERTY` to a smaller value than > 10? Any suggestions on the tradeoff between performance and cryptographic > impact? > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
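The timestamp-window idea floated in the comment above can be sketched as follows — a hypothetical scheme, not anything Cassandra implements, using stdlib HMAC in place of bcrypt for brevity:

```python
import hashlib
import hmac

WINDOW_SECONDS = 30                    # the "configurable time" from the comment

def client_token(secret, timestamp):
    """Client mixes the send-time timestamp into the MAC it transmits."""
    return hmac.new(secret, str(timestamp).encode(), hashlib.sha256).digest()

def server_accepts(secret, token, sent_at, now):
    """Server recomputes the MAC and accepts it only inside the time window,
    so a token captured off the wire cannot be replayed later."""
    if abs(now - sent_at) > WINDOW_SECONDS:
        return False
    return hmac.compare_digest(token, client_token(secret, sent_at))

token = client_token(b"s3cret-password-hash", 1000)
```

An intercepted token is then only useful within the window, which narrows (but does not eliminate) the replay exposure of sending a hash over an unsecured network.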
[jira] [Commented] (CASSANDRA-14551) ReplicationAwareTokenAllocator should block bootstrap if no replication number is set
[ https://issues.apache.org/jira/browse/CASSANDRA-14551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16597961#comment-16597961 ] Jay Zhuang commented on CASSANDRA-14551: Hi [~dikanggu], I just saw your response in the commit: {quote} I explicitly made it work with 0 replication factor case in this jira, CASSANDRA-12983. This change will break the behavior in that situation, right? {quote} Yeah, I see your point. I think the suggestion for that (and other use cases) is to use a separate dummy keyspace for {{allocate_tokens_for_keyspace}}, so you have more control over token allocation and the production keyspace. Defaulting the replication factor to 1 may not be a correct assumption. > ReplicationAwareTokenAllocator should block bootstrap if no replication > number is set > - > > Key: CASSANDRA-14551 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14551 > Project: Cassandra > Issue Type: Bug > Components: Configuration >Reporter: Jay Zhuang >Assignee: Jay Zhuang >Priority: Minor > > We're using > [ReplicationAwareTokenAllocator|https://www.datastax.com/dev/blog/token-allocation-algorithm]. > When bootstrapping a new DC, the tokens are not well distributed. The > problem is that the replication number is not set for the new DC before > the bootstrap. > I would suggest blocking the bootstrap if the replication number is not set. It's > unsafe to assume the default is 1 replica, which also causes the following > invalid stats: > {noformat} > WARN [main] 2018-06-29 17:30:55,696 TokenAllocation.java:69 - Replicated > node load in datacenter before allocation max NaN min NaN stddev NaN > WARN [main] 2018-06-29 17:30:55,696 TokenAllocation.java:70 - Replicated > node load in datacenter after allocation max NaN min NaN stddev NaN > {noformat}
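The dummy-keyspace suggestion above amounts to creating a keyspace whose only purpose is to pin the replication factor the token allocator should target. An illustrative sketch — keyspace and DC names are hypothetical:

```
-- CQL: a keyspace used purely to steer token allocation for the new DC
CREATE KEYSPACE token_alloc_dummy
    WITH replication = {'class': 'NetworkTopologyStrategy', 'new_dc': 3};

# cassandra.yaml on the bootstrapping nodes in the new DC:
# allocate_tokens_for_keyspace: token_alloc_dummy
```

Because the dummy keyspace carries no data, its replication settings can be tuned for allocation alone, decoupled from the production keyspaces.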
[jira] [Commented] (CASSANDRA-14678) Propose reducing the default value for PasswordAuthenticator number of hashing rounds
[ https://issues.apache.org/jira/browse/CASSANDRA-14678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16597608#comment-16597608 ] Jay Zhuang commented on CASSANDRA-14678: Should we add a sleep instead of doing hundreds of hashing rounds, which is very CPU intensive? That way it won't exhaust the node, and it protects against a potential DoS attack (an attacker repeatedly authenticating with an invalid password). For password protection, {{2^4=16}} rounds of hashing seems already good enough. > Propose reducing the default value for PasswordAuthenticator number of > hashing rounds > - > > Key: CASSANDRA-14678 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14678 > Project: Cassandra > Issue Type: Wish > Components: Auth >Reporter: Shichao An >Priority: Major > > We saw performance degradation in some of our Cassandra clusters using > PasswordAuthenticator. When the clients start connecting to the Cassandra > nodes, the CPU load increases, and there is a high chance that the host will > be unable to recover from high CPU usage if the clients retry indefinitely at > a relatively high frequency. In each reconnection, the clients try to initiate > auth handshakes, but may fail due to timeouts from the overloaded host, > whereas the sporadic auth handshakes put more load on the host, and so on. > In our case, the load average can be 1000~3000 on a 32-core host. The > host is basically unable to serve any traffic. > We found it is caused by the slow `BCrypt.checkpw` operation, where the > generated salted hash round is 10 because `GENSALT_LOG2_ROUNDS_PROPERTY` > defaults to 10, which makes it 2^10 rounds of hashing iterations. I changed the > hashing rounds to 4 by overriding the `auth_bcrypt_gensalt_log2_rounds` system > property, and it effectively solved the above-mentioned CPU issue. > It took us some time to nail down the cause of this problem. Shall we reduce > the default value of `GENSALT_LOG2_ROUNDS_PROPERTY` to a smaller value than > 10? Any suggestions on the tradeoff between performance and cryptographic > impact?
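The cost being debated above is exponential in the setting: bcrypt performs 2^log_rounds hashing iterations, so dropping the default from 10 to 4 cuts the work by a factor of 64. A sketch of that arithmetic, using the stdlib PBKDF2 as a stand-in for bcrypt (which is not in the Python standard library):

```python
import hashlib

def iterations(log2_rounds):
    """bcrypt's gensalt parameter is a log2: the cost doubles per increment."""
    return 2 ** log2_rounds

def check_pw(password, salt, log2_rounds):
    # Stand-in for BCrypt.checkpw: a key-stretching hash with an
    # equivalent iteration count.
    return hashlib.pbkdf2_hmac("sha256", password, salt, iterations(log2_rounds))

cost_ratio = iterations(10) // iterations(4)   # default vs proposed work factor
digest = check_pw(b"password", b"per-user-salt", 4)
```

The security trade-off works the same way in reverse: an offline attacker cracking a leaked hash also gets a 64x speedup at the lower setting.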
[jira] [Updated] (CASSANDRA-9989) Optimise BTree.build
[ https://issues.apache.org/jira/browse/CASSANDRA-9989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jay Zhuang updated CASSANDRA-9989: -- Resolution: Fixed Status: Resolved (was: Ready to Commit) Thanks [~benedict] for the review. Committed as [{{2e59ea8}}|https://github.com/apache/cassandra/commit/2e59ea8c7f21cb11b7ce71a5cdf303a8ed453bc0]. > Optimise BTree.build > > > Key: CASSANDRA-9989 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9989 > Project: Cassandra > Issue Type: Sub-task >Reporter: Benedict >Assignee: Jay Zhuang >Priority: Minor > Fix For: 4.x > > Attachments: 9989-trunk.txt > > > BTree.Builder could reduce its copying, and exploit toArray more efficiently, > with some work. It's not very important right now because we don't make as > much use of its bulk-add methods as we otherwise might, however over time > this work will become more useful. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-9989) Optimise BTree.Buider
[ https://issues.apache.org/jira/browse/CASSANDRA-9989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16597080#comment-16597080 ] Jay Zhuang commented on CASSANDRA-9989: --- I rebased and squashed the commits: |Branch|uTest| | [9989-rebased|https://github.com/cooldoger/cassandra/tree/9989-rebased] | [!https://circleci.com/gh/cooldoger/cassandra/tree/9989-rebased.svg?style=svg!|https://circleci.com/gh/cooldoger/cassandra/tree/9989-rebased] | > Optimise BTree.Buider > - > > Key: CASSANDRA-9989 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9989 > Project: Cassandra > Issue Type: Sub-task >Reporter: Benedict >Assignee: Jay Zhuang >Priority: Minor > Fix For: 4.x > > Attachments: 9989-trunk.txt > > > BTree.Builder could reduce its copying, and exploit toArray more efficiently, > with some work. It's not very important right now because we don't make as > much use of its bulk-add methods as we otherwise might, however over time > this work will become more useful. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-9989) Optimise BTree.Buider
[ https://issues.apache.org/jira/browse/CASSANDRA-9989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16596997#comment-16596997 ] Jay Zhuang commented on CASSANDRA-9989: --- There's a little improvement when we pre-compute the child size and split the leftover values across the last 2 nodes: [{{9989-2}}|https://github.com/cooldoger/cassandra/tree/9989-2]. Here are my benchmark results (it should only impact large bTree builds):
{noformat}
== calculate child size every round:
[java] Benchmark                      (dataSize)   Mode  Cnt       Score        Error  Units
[java] BTreeBuildBench.buildTreeTest           1  thrpt   16  124595.864 ± 10133.336  ops/ms
[java] BTreeBuildBench.buildTreeTest           2  thrpt   16  120228.601 ± 12859.617  ops/ms
[java] BTreeBuildBench.buildTreeTest           5  thrpt   16  103881.001 ±  8136.400  ops/ms
[java] BTreeBuildBench.buildTreeTest          10  thrpt   16   89141.480 ±  7716.011  ops/ms
[java] BTreeBuildBench.buildTreeTest          20  thrpt   16   67390.602 ±  8057.348  ops/ms
[java] BTreeBuildBench.buildTreeTest          40  thrpt   16   19633.234 ±  1545.773  ops/ms
[java] BTreeBuildBench.buildTreeTest         100  thrpt   16   10334.557 ±  1027.898  ops/ms
[java] BTreeBuildBench.buildTreeTest        1000  thrpt   16    1239.163 ±   173.303  ops/ms
[java] BTreeBuildBench.buildTreeTest       10000  thrpt   16     104.024 ±    12.069  ops/ms
[java] BTreeBuildBench.buildTreeTest      100000  thrpt   16      10.259 ±     1.088  ops/ms

== pre-calculate child size and split the leftover values across the last 2 nodes:
[java] Benchmark                      (dataSize)   Mode  Cnt       Score        Error  Units
[java] BTreeBuildBench.buildTreeTest           1  thrpt   16  122030.330 ± 10528.782  ops/ms
[java] BTreeBuildBench.buildTreeTest           2  thrpt   16  121939.935 ± 12627.014  ops/ms
[java] BTreeBuildBench.buildTreeTest           5  thrpt   16  104694.942 ±  9031.935  ops/ms
[java] BTreeBuildBench.buildTreeTest          10  thrpt   16   87687.949 ±  9029.432  ops/ms
[java] BTreeBuildBench.buildTreeTest          20  thrpt   16   67941.722 ±  7099.874  ops/ms
[java] BTreeBuildBench.buildTreeTest          40  thrpt   16   19468.380 ±  1640.993  ops/ms
[java] BTreeBuildBench.buildTreeTest         100  thrpt   16   10503.954 ±   980.228  ops/ms
[java] BTreeBuildBench.buildTreeTest        1000  thrpt   16    1374.558 ±   167.329  ops/ms
[java] BTreeBuildBench.buildTreeTest       10000  thrpt   16     111.364 ±     8.896  ops/ms
[java] BTreeBuildBench.buildTreeTest      100000  thrpt   16      10.728 ±     1.107  ops/ms
{noformat}
I would prefer the clearer code. > Optimise BTree.Buider > - > > Key: CASSANDRA-9989 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9989 > Project: Cassandra > Issue Type: Sub-task >Reporter: Benedict >Assignee: Jay Zhuang >Priority: Minor > Fix For: 4.x > > Attachments: 9989-trunk.txt > > > BTree.Builder could reduce its copying, and exploit toArray more efficiently, > with some work. It's not very important right now because we don't make as > much use of its bulk-add methods as we otherwise might, however over time > this work will become more useful.
[jira] [Commented] (CASSANDRA-9989) Optimise BTree.Buider
[ https://issues.apache.org/jira/browse/CASSANDRA-9989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16596957#comment-16596957 ] Jay Zhuang commented on CASSANDRA-9989: --- Nice catch. I updated the branch to split the left values to the left child nodes. Please review again: |branch|[9989|https://github.com/cooldoger/cassandra/commits/9989]| > Optimise BTree.Buider > - > > Key: CASSANDRA-9989 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9989 > Project: Cassandra > Issue Type: Sub-task >Reporter: Benedict >Assignee: Jay Zhuang >Priority: Minor > Fix For: 4.x > > Attachments: 9989-trunk.txt > > > BTree.Builder could reduce its copying, and exploit toArray more efficiently, > with some work. It's not very important right now because we don't make as > much use of its bulk-add methods as we otherwise might, however over time > this work will become more useful. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-9989) Optimise BTree.Buider
[ https://issues.apache.org/jira/browse/CASSANDRA-9989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16595399#comment-16595399 ] Jay Zhuang commented on CASSANDRA-9989: --- Thanks [~benedict]. {quote} 1. It might be nice to rename pos to index for consistency with indexOffsets {quote} Changed. {quote} 2. It might be nicer to split even more evenly, as far as possible - if only from a code perspective. The situation you're accounting for of a single key in the final child could be resolved by decrementing some K from every other node, I think. {quote} Make sense to me. It could be done by: {noformat} childSize = (size - 1) / childNum; {noformat} instead of {noformat} childSize = size / childNum; {noformat} The code is also simpler. Please review: |branch|[9989|https://github.com/cooldoger/cassandra/commits/9989]| > Optimise BTree.Buider > - > > Key: CASSANDRA-9989 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9989 > Project: Cassandra > Issue Type: Sub-task >Reporter: Benedict >Assignee: Jay Zhuang >Priority: Minor > Fix For: 4.x > > Attachments: 9989-trunk.txt > > > BTree.Builder could reduce its copying, and exploit toArray more efficiently, > with some work. It's not very important right now because we don't make as > much use of its bulk-add methods as we otherwise might, however over time > this work will become more useful. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
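The even-split arithmetic discussed above can be checked with a small sketch. This is the general even-split idea the review converged on, with hypothetical helper names — not the exact formula from the patch, which also accounts for the keys held by the parent node:

```python
def child_sizes(size, child_num):
    """Split `size` values across `child_num` children as evenly as possible:
    chunk sizes differ by at most one, with the larger chunks first."""
    base, extra = divmod(size, child_num)
    return [base + 1] * extra + [base] * (child_num - extra)

sizes = child_sizes(10, 3)
```

The even split removes the pathological case discussed earlier, where the last child could end up holding a single key while its siblings were full.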
[jira] [Commented] (CASSANDRA-9989) Optimise BTree.Buider
[ https://issues.apache.org/jira/browse/CASSANDRA-9989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16594239#comment-16594239 ]

Jay Zhuang commented on CASSANDRA-9989:
---------------------------------------

Sorry, I misunderstood the last part. I updated the patch to split the values evenly across all child nodes. TREE_SIZE now starts from index 0 instead of 1, and {{left}} is replaced with an incrementing counter. Please review:
|branch|[9989|https://github.com/cooldoger/cassandra/tree/9989]|

{{LongBTreeTest.java}} is timing out even on trunk; I'm increasing the timeout and running the test through.
[jira] [Commented] (CASSANDRA-9989) Optimise BTree.Buider
[ https://issues.apache.org/jira/browse/CASSANDRA-9989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16591169#comment-16591169 ]

Jay Zhuang commented on CASSANDRA-9989:
---------------------------------------

Sorry, I missed {{point 1.}} (I thought it was something else), but you already got it:
{noformat}
    int childrenNum = (size + childSize + 1) / (childSize + 1)
==> int childrenNum = size / (childSize + 1) + (childSize + 1) / (childSize + 1)
==> int childrenNum = size / (childSize + 1) + 1
{noformat}
We can make it clearer with: {{(size / (childSize + 1)) + 1}}.
{quote}
I think it might be nicer in this case to make the TREE_SIZE logically more obvious, so that its accessors (which are rather more complicated) are simplified, rather than its calculation? I don't think this is very tricky anyway - just set TREE_SIZE[0] = FAN_FACTOR, and leave the loop as it is, I think?
{quote}
It's also used here: [{{TREE_SIZE[level-2]}}|https://github.com/cooldoger/cassandra/commit/8369dc8b7be3ccf8d1972e9c8cff95adb3493005#diff-4b911b7d0959c6219175e2349968f3cdR196], which would need to be changed to: {{int grandchildSize = level == 1 ? 0 : TREE_SIZE[level - 2];}}. I prefer avoiding this check while building every node (I will add a comment that the leaf node is level 1), but I'm fine with either way.
{quote}
I think this is also more easily done by an incrementing counter, rather than decrementing?
{quote}
Sure, I'll update the patch for that.
[jira] [Commented] (CASSANDRA-9989) Optimise BTree.Buider
[ https://issues.apache.org/jira/browse/CASSANDRA-9989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16590983#comment-16590983 ]

Jay Zhuang commented on CASSANDRA-9989:
---------------------------------------

[~benedict], thank you very much for the review. I updated the branch based on your comments: [{{branch: 9989}}|https://github.com/cooldoger/cassandra/commits/9989].
{quote}
1. How did you arrive at your childrenNum calculation, and are we certain it is correct? This is pretty critical for correctness, and hard to test fully, so it would be nice to have some comments justifying it.
4. It would be nice if we removed MAX_DEPTH, and just truncated TREE_SIZE to the correct maximum in our static block
{quote}
Fixed. It now automatically calculates the maximum height of the tree that we could build.
{quote}
2. Why decrement left instead of just counting up the number of values written?
{quote}
It's used to update [{{indexOffsets\[i\]}}|https://github.com/cooldoger/cassandra/commit/0c1a9d11d6540ac7b233c400e0d8b1a56e647d5f#diff-4b911b7d0959c6219175e2349968f3cdR179].
{quote}
3. Why is TREE_SIZE indexed from 1, not 0?
{quote}
Just to make the initial calculation easier: [{{TREE_SIZE\[i-1\]}}|https://github.com/cooldoger/cassandra/commit/0c1a9d11d6540ac7b233c400e0d8b1a56e647d5f#diff-4b911b7d0959c6219175e2349968f3cdR84]. With the new patch, it's also used to get [{{grandchildSize}}|https://github.com/cooldoger/cassandra/commit/8369dc8b7be3ccf8d1972e9c8cff95adb3493005#diff-4b911b7d0959c6219175e2349968f3cdR196].
{quote}
I'm also torn on the splitting of the last two nodes - this is consistent with the current NodeBuilder logic, but it does complicate the code a little versus evenly splitting the remaining size amongst all the children.
{quote}
I was trying to make the tree a little more balanced by splitting the remainder equally across the last two nodes, but yes, it also makes sense to keep it the same as before. I updated the code and added a unit test to make sure the BTree is [exactly the same|https://github.com/cooldoger/cassandra/commit/8369dc8b7be3ccf8d1972e9c8cff95adb3493005#diff-cb7b127243f861292899bad7305217dbR592] as the one built with {{[NodeBuilder|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/utils/btree/NodeBuilder.java]}}.
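The TREE_SIZE table discussed in the review (indexed from 0, with TREE_SIZE[0] = FAN_FACTOR as suggested) can be sketched as below. This is a hedged illustration, not the actual BTree code: the class name, the fan factor value, and the capacity recurrence (each node holding up to FAN_FACTOR keys plus FAN_FACTOR + 1 child subtrees) are assumptions made for the example.

```java
// Hypothetical sketch of a per-level tree-capacity table.
// Entry i = maximum number of values a tree of height i + 1 can hold,
// assuming each node carries up to FAN_FACTOR keys and FAN_FACTOR + 1 children.
public final class TreeSizeTable
{
    static final int FAN_FACTOR = 32; // illustrative value, not the real constant

    static long[] treeSizes(int maxHeight)
    {
        long[] treeSize = new long[maxHeight];
        treeSize[0] = FAN_FACTOR; // a leaf holds up to FAN_FACTOR values
        for (int i = 1; i < maxHeight; i++)
            // an internal level adds FAN_FACTOR keys plus FAN_FACTOR + 1 subtrees
            treeSize[i] = FAN_FACTOR + (FAN_FACTOR + 1L) * treeSize[i - 1];
        return treeSize;
    }

    public static void main(String[] args)
    {
        long[] sizes = treeSizes(4);
        // height 2: 32 + 33 * 32 = 1088
        System.out.println(sizes[1]);
    }
}
```

Indexing from 0 this way lets accessors use `TREE_SIZE[level - 1]` directly, at the cost of a slightly less obvious initial entry; a build loop could instead guard the grandchild lookup with `level == 1 ? 0 : TREE_SIZE[level - 2]`, which is the trade-off discussed above.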
[jira] [Updated] (CASSANDRA-14596) [dtest] test_mutation_v5 - write_failures_test.TestWriteFailures
[ https://issues.apache.org/jira/browse/CASSANDRA-14596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jay Zhuang updated CASSANDRA-14596:
-----------------------------------
    Resolution: Fixed
        Status: Resolved  (was: Ready to Commit)

Thanks [~jasobrown] for the review. Committed as [{{2572ddc}}|https://github.com/apache/cassandra-dtest/commit/2572ddce6c9a33ae81e1543195bfae084f835d6d].

> [dtest] test_mutation_v5 - write_failures_test.TestWriteFailures
> ----------------------------------------------------------------
>
>                 Key: CASSANDRA-14596
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14596
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Testing
>            Reporter: Jason Brown
>           Assignee: Jay Zhuang
>           Priority: Minor
>             Labels: dtest
>
> dtest fails with the following pytest error:
> {noformat}
> s = b'\x00\x00'
> >       unpack = lambda s: packer.unpack(s)[0]
> E       struct.error: unpack requires a buffer of 4 bytes
> {noformat}
> Test fails on 3.11 (was introduced for 3.10), but succeeds on trunk
[jira] [Assigned] (CASSANDRA-14596) [dtest] test_mutation_v5 - write_failures_test.TestWriteFailures
[ https://issues.apache.org/jira/browse/CASSANDRA-14596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jay Zhuang reassigned CASSANDRA-14596:
--------------------------------------
    Assignee: Jay Zhuang