[jira] [Updated] (CASSANDRA-15291) Batch the token metadata update to improve the speed

2020-01-07 Thread Jay Zhuang (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jay Zhuang updated CASSANDRA-15291:
---
Test and Documentation Plan: unittest
 Status: Patch Available  (was: Open)

> Batch the token metadata update to improve the speed
> 
>
> Key: CASSANDRA-15291
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15291
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Cluster/Membership
>Reporter: Jay Zhuang
>Assignee: Jay Zhuang
>Priority: Low
>
> There's a faster API to batch-load the tokens instead of updating one 
> endpoint at a time. For a large vNode cluster (> 1K nodes), it can reduce the 
> token population time from 14 seconds to 0.1 seconds.






[jira] [Updated] (CASSANDRA-15141) Faster token ownership calculation for NetworkTopologyStrategy

2019-08-25 Thread Jay Zhuang (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jay Zhuang updated CASSANDRA-15141:
---
Summary: Faster token ownership calculation for NetworkTopologyStrategy  
(was: RemoveNode takes long time and blocks gossip stage)

> Faster token ownership calculation for NetworkTopologyStrategy
> --
>
> Key: CASSANDRA-15141
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15141
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Cluster/Gossip, Cluster/Membership
>Reporter: Jay Zhuang
>Assignee: Jay Zhuang
>Priority: Normal
>
> The function 
> [{{getAddressReplicas()}}|https://github.com/apache/cassandra/blob/7df67eff2d66dba4bed2b4f6aeabf05144d9b057/src/java/org/apache/cassandra/service/StorageService.java#L3002]
>  used during removenode and decommission is slow for a large vnode cluster with 
> NetworkTopologyStrategy, as it needs to build the whole replication map for 
> every token range.
> In one of our clusters (> 1k nodes), it takes about 20 seconds for each 
> NetworkTopologyStrategy keyspace, so the total time to process a removenode 
> message is at least 80 seconds (20 * 4: 3 system keyspaces, 1 user 
> keyspace). It blocks heartbeat propagation and causes false down-node reports.






[jira] [Commented] (CASSANDRA-15291) Batch the token metadata update to improve the speed

2019-08-24 Thread Jay Zhuang (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16915152#comment-16915152
 ] 

Jay Zhuang commented on CASSANDRA-15291:


| Branch | CircleCI |
| [15291-trunk|https://github.com/Instagram/cassandra/tree/15291-trunk] | 
https://circleci.com/workflow-run/ab9ff528-e34a-476f-ac91-767f1dab796a |

> Batch the token metadata update to improve the speed
> 
>
> Key: CASSANDRA-15291
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15291
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Cluster/Membership
>Reporter: Jay Zhuang
>Assignee: Jay Zhuang
>Priority: Low
>
> There's a faster API to batch-load the tokens instead of updating one 
> endpoint at a time. For a large vNode cluster (> 1K nodes), it can reduce the 
> token population time from 14 seconds to 0.1 seconds.






[jira] [Updated] (CASSANDRA-15291) Batch the token metadata update to improve the speed

2019-08-24 Thread Jay Zhuang (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jay Zhuang updated CASSANDRA-15291:
---
Summary: Batch the token metadata update to improve the speed  (was: Batch 
the token update to improve the token populate speed)

> Batch the token metadata update to improve the speed
> 
>
> Key: CASSANDRA-15291
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15291
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Cluster/Membership
>Reporter: Jay Zhuang
>Assignee: Jay Zhuang
>Priority: Low
>
> There's a faster API to batch-load the tokens instead of updating one 
> endpoint at a time. For a large vNode cluster (> 1K nodes), it can reduce the 
> token population time from 14 seconds to 0.1 seconds.






[jira] [Updated] (CASSANDRA-15291) Batch the token update to improve the token populate speed

2019-08-24 Thread Jay Zhuang (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jay Zhuang updated CASSANDRA-15291:
---
Change Category: Performance
 Complexity: Normal
   Priority: Low  (was: Normal)
 Status: Open  (was: Triage Needed)

> Batch the token update to improve the token populate speed
> --
>
> Key: CASSANDRA-15291
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15291
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Cluster/Membership
>Reporter: Jay Zhuang
>Assignee: Jay Zhuang
>Priority: Low
>
> There's a faster API to batch-load the tokens instead of updating one 
> endpoint at a time. For a large vNode cluster (> 1K nodes), it can reduce the 
> token population time from 14 seconds to 0.1 seconds.






[jira] [Created] (CASSANDRA-15291) Batch the token update to improve the token populate speed

2019-08-24 Thread Jay Zhuang (Jira)
Jay Zhuang created CASSANDRA-15291:
--

 Summary: Batch the token update to improve the token populate speed
 Key: CASSANDRA-15291
 URL: https://issues.apache.org/jira/browse/CASSANDRA-15291
 Project: Cassandra
  Issue Type: Improvement
  Components: Cluster/Membership
Reporter: Jay Zhuang
Assignee: Jay Zhuang


There's a faster API to batch-load the tokens instead of updating one endpoint 
at a time. For a large vNode cluster (> 1K nodes), it can reduce the token 
population time from 14 seconds to 0.1 seconds.
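
For illustration, here is a minimal, self-contained sketch of the batching idea 
(hypothetical types and method names, not Cassandra's actual {{TokenMetadata}} 
API): the per-endpoint path rebuilds the cached ring view after every call, 
while the batch path applies all pending token-to-endpoint mappings and rebuilds 
the view once.
{noformat}
import java.util.*;

// Hypothetical stand-in for token metadata: a sorted token -> endpoint map with a
// cached view that must be rebuilt whenever the mapping changes.
class SimpleTokenMetadata
{
    private final NavigableMap<Long, String> tokenToEndpoint = new TreeMap<>();
    private List<Long> cachedSortedTokens = Collections.emptyList();

    // One endpoint at a time: the cached view is rebuilt on every call.
    public void updateNormalTokens(Collection<Long> tokens, String endpoint)
    {
        for (Long token : tokens)
            tokenToEndpoint.put(token, endpoint);
        rebuildCachedView();
    }

    // Batched: apply every endpoint's tokens, then rebuild the view once.
    public void updateNormalTokens(Map<String, Collection<Long>> endpointTokens)
    {
        for (Map.Entry<String, Collection<Long>> e : endpointTokens.entrySet())
            for (Long token : e.getValue())
                tokenToEndpoint.put(token, e.getKey());
        rebuildCachedView();
    }

    private void rebuildCachedView()
    {
        cachedSortedTokens = new ArrayList<>(tokenToEndpoint.keySet());
    }
}
{noformat}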






[jira] [Updated] (CASSANDRA-15290) Avoid token cache invalidation for removing proxy node

2019-08-24 Thread Jay Zhuang (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jay Zhuang updated CASSANDRA-15290:
---
Test and Documentation Plan: The patch is deployed in our production env.
 Status: Patch Available  (was: Open)

> Avoid token cache invalidation for removing proxy node
> --
>
> Key: CASSANDRA-15290
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15290
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Cluster/Membership
>Reporter: Jay Zhuang
>Assignee: Jay Zhuang
>Priority: Low
> Fix For: 4.0.x
>
>
> As a proxy node doesn't own any tokens, adding or removing one doesn't change 
> token ownership, so there is no need to invalidate the token cache.






[jira] [Commented] (CASSANDRA-15290) Avoid token cache invalidation for removing proxy node

2019-08-24 Thread Jay Zhuang (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16915145#comment-16915145
 ] 

Jay Zhuang commented on CASSANDRA-15290:


| Branch | CircleCI |
| [15290-trunk|https://github.com/Instagram/cassandra/tree/15290-trunk] | 
https://circleci.com/workflow-run/068aa546-39cd-4999-b959-4bd5d180c5d1 |

> Avoid token cache invalidation for removing proxy node
> --
>
> Key: CASSANDRA-15290
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15290
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Cluster/Membership
>Reporter: Jay Zhuang
>Assignee: Jay Zhuang
>Priority: Low
> Fix For: 4.0.x
>
>
> As a proxy node doesn't own any tokens, adding or removing one doesn't change 
> token ownership, so there is no need to invalidate the token cache.






[jira] [Updated] (CASSANDRA-15290) Avoid token cache invalidation for removing proxy node

2019-08-24 Thread Jay Zhuang (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jay Zhuang updated CASSANDRA-15290:
---
Change Category: Performance
 Complexity: Normal
  Fix Version/s: 4.0.x
   Priority: Low  (was: Normal)
 Status: Open  (was: Triage Needed)

> Avoid token cache invalidation for removing proxy node
> --
>
> Key: CASSANDRA-15290
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15290
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Cluster/Membership
>Reporter: Jay Zhuang
>Assignee: Jay Zhuang
>Priority: Low
> Fix For: 4.0.x
>
>
> As a proxy node doesn't own any tokens, adding or removing one doesn't change 
> token ownership, so there is no need to invalidate the token cache.






[jira] [Created] (CASSANDRA-15290) Avoid token cache invalidation for removing proxy node

2019-08-24 Thread Jay Zhuang (Jira)
Jay Zhuang created CASSANDRA-15290:
--

 Summary: Avoid token cache invalidation for removing proxy node
 Key: CASSANDRA-15290
 URL: https://issues.apache.org/jira/browse/CASSANDRA-15290
 Project: Cassandra
  Issue Type: Improvement
  Components: Cluster/Membership
Reporter: Jay Zhuang
Assignee: Jay Zhuang


As a proxy node doesn't own any tokens, adding or removing one doesn't change 
token ownership, so there is no need to invalidate the token cache.
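
A minimal sketch of the idea, with assumed names (not the actual patch): only 
invalidate the token cache when the endpoint being added or removed actually 
owns tokens.
{noformat}
import java.util.*;

class TokenCacheGuard
{
    // endpointToTokens holds the current ownership; proxy (coordinator-only) nodes
    // have no entry. Returns true only when an invalidation is really needed.
    static boolean shouldInvalidate(Map<String, Set<Long>> endpointToTokens, String changedEndpoint)
    {
        Set<Long> owned = endpointToTokens.get(changedEndpoint);
        return owned != null && !owned.isEmpty();
    }
}
{noformat}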






[jira] [Updated] (CASSANDRA-15239) [flaky in-mem dtest] nodeDownDuringMove - org.apache.cassandra.distributed.test.GossipTest

2019-07-20 Thread Jay Zhuang (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jay Zhuang updated CASSANDRA-15239:
---
 Severity: Normal
   Complexity: Normal
Discovered By: Unit Test
 Bug Category: Parent values: Correctness(12982)Level 1 values: Test 
Failure(12990)
   Status: Open  (was: Triage Needed)

> [flaky in-mem dtest] nodeDownDuringMove - 
> org.apache.cassandra.distributed.test.GossipTest
> --
>
> Key: CASSANDRA-15239
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15239
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/dtest
>Reporter: Jay Zhuang
>Priority: Normal
>
> The in-mem dtest fails from time to time:
> {noformat}
> nodeDownDuringMove - org.apache.cassandra.distributed.test.GossipTest
> java.lang.RuntimeException: java.lang.IllegalStateException: Unable to 
> contact any seeds!
> {noformat}
> [https://circleci.com/gh/Instagram/cassandra/98]
> More details:
> {noformat}
> Testcase: 
> nodeDownDuringMove(org.apache.cassandra.distributed.test.GossipTest): Caused 
> an ERROR
> java.lang.IllegalStateException: Unable to contact any seeds!
> java.lang.RuntimeException: java.lang.IllegalStateException: Unable to 
> contact any seeds!
> at 
> org.apache.cassandra.distributed.impl.IsolatedExecutor.waitOn(IsolatedExecutor.java:166)
> at 
> org.apache.cassandra.distributed.impl.IsolatedExecutor.lambda$sync$4(IsolatedExecutor.java:69)
> at 
> org.apache.cassandra.distributed.impl.Instance.startup(Instance.java:322)
> at 
> org.apache.cassandra.distributed.impl.AbstractCluster$Wrapper.startup(AbstractCluster.java:148)
> at 
> org.apache.cassandra.distributed.test.GossipTest.nodeDownDuringMove(GossipTest.java:96)
> Caused by: java.lang.IllegalStateException: Unable to contact any seeds!
> at 
> org.apache.cassandra.service.StorageService.bootstrap(StorageService.java:1261)
> at 
> org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:921)
> at 
> org.apache.cassandra.distributed.impl.Instance.lambda$startup$6(Instance.java:301)
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at 
> org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:83)
> at java.lang.Thread.run(Thread.java:748)
> Test org.apache.cassandra.distributed.test.GossipTest FAILED
> {noformat}






[jira] [Comment Edited] (CASSANDRA-15098) Endpoints no longer owning tokens are not removed for vnode

2019-07-19 Thread Jay Zhuang (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16889202#comment-16889202
 ] 

Jay Zhuang edited comment on CASSANDRA-15098 at 7/19/19 10:07 PM:
--

Rebased the code and the tests passed, please review:

| Branch | uTest | jvm-dTest | dTest | dTest vnode |
| [15098-3.0|https://github.com/instagram/cassandra/tree/15098-3.0] | [#107 
passed |https://circleci.com/gh/Instagram/cassandra/107] | [#108 
failed|https://circleci.com/gh/Instagram/cassandra/108], known issue: 
CASSANDRA-15239 | [#110 
failed|https://circleci.com/gh/Instagram/cassandra/110], passed locally, known 
issue: CASSANDRA-14595 | [#109 failed | 
https://circleci.com/gh/Instagram/cassandra/109], passed locally, known issue: 
CASSANDRA-14595 |
| [15098-3.11|https://github.com/instagram/cassandra/tree/15098-3.11] | [#100 
passed|https://circleci.com/gh/Instagram/cassandra/100] | [#99 
passed|https://circleci.com/gh/Instagram/cassandra/99] | [#111 
failed|https://circleci.com/gh/Instagram/cassandra/111], passed locally: 
CASSANDRA-14595 | [#112 
failed|https://circleci.com/gh/Instagram/cassandra/112], passed locally: 
CASSANDRA-14595 |
| [15098-trunk|https://github.com/instagram/cassandra/tree/15098-trunk] | [#104 
failed|https://circleci.com/gh/Instagram/cassandra/104], passed locally and 
re-run passed [#117|https://circleci.com/gh/Instagram/cassandra/117] | [#105 
passed|https://circleci.com/gh/Instagram/cassandra/105] | [#114 
passed|https://circleci.com/gh/Instagram/cassandra/114] | [#113 
passed|https://circleci.com/gh/Instagram/cassandra/113] |



was (Author: jay.zhuang):
Rebased the code and the tests passed, please review:

 | Branch | uTest | jvm-dTest | dTest | dTest vnode |
| [15098-3.0|https://github.com/instagram/cassandra/tree/15098-3.0] | [#107 
passed |https://circleci.com/gh/Instagram/cassandra/107] | [#108 
failed|https://circleci.com/gh/Instagram/cassandra/108], known issue: 
CASSANDRA-15239 | [#110 
failed|https://circleci.com/gh/Instagram/cassandra/110], passed locally, known 
issue: CASSANDRA-14595 | [#109 failed | 
https://circleci.com/gh/Instagram/cassandra/109], passed locally, known issue: 
CASSANDRA-14595 |
| [15098-3.11|https://github.com/instagram/cassandra/tree/15098-3.11] | [#100 
passed|https://circleci.com/gh/Instagram/cassandra/100] | [#99 
passed|https://circleci.com/gh/Instagram/cassandra/99] | [#111 
failed|https://circleci.com/gh/Instagram/cassandra/111], passed locally: 
CASSANDRA-14595 | [#112 
failed|https://circleci.com/gh/Instagram/cassandra/112], passed locally: 
CASSANDRA-14595 |
| [15098-trunk|https://github.com/instagram/cassandra/tree/15098-trunk] | [#104 
failed|https://circleci.com/gh/Instagram/cassandra/104], passed locally and 
re-run passed [#117|https://circleci.com/gh/Instagram/cassandra/117] | [#105 
passed|https://circleci.com/gh/Instagram/cassandra/105] | [#114 
passed|https://circleci.com/gh/Instagram/cassandra/114] | [#113 
passed|https://circleci.com/gh/Instagram/cassandra/113] |


> Endpoints no longer owning tokens are not removed for vnode
> ---
>
> Key: CASSANDRA-15098
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15098
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Gossip
>Reporter: Jay Zhuang
>Assignee: Jay Zhuang
>Priority: Normal
>
> The logic here to remove endpoints that no longer own tokens does not work 
> for multiple tokens (vnode):
> https://github.com/apache/cassandra/blob/06209037ea56b5a2a49615a99f1542d6ea1b2947/src/java/org/apache/cassandra/service/StorageService.java#L2505
> And it's very expensive to copy the token metadata for every check.






[jira] [Commented] (CASSANDRA-15098) Endpoints no longer owning tokens are not removed for vnode

2019-07-19 Thread Jay Zhuang (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16889202#comment-16889202
 ] 

Jay Zhuang commented on CASSANDRA-15098:


Rebased the code and the tests passed, please review:

 | Branch | uTest | jvm-dTest | dTest | dTest vnode |
| [15098-3.0|https://github.com/instagram/cassandra/tree/15098-3.0] | [#107 
passed |https://circleci.com/gh/Instagram/cassandra/107] | [#108 
failed|https://circleci.com/gh/Instagram/cassandra/108], known issue: 
CASSANDRA-15239 | [#110 
failed|https://circleci.com/gh/Instagram/cassandra/110], passed locally, known 
issue: CASSANDRA-14595 | [#109 failed | 
https://circleci.com/gh/Instagram/cassandra/109], passed locally, known issue: 
CASSANDRA-14595 |
| [15098-3.11|https://github.com/instagram/cassandra/tree/15098-3.11] | [#100 
passed|https://circleci.com/gh/Instagram/cassandra/100] | [#99 
passed|https://circleci.com/gh/Instagram/cassandra/99] | [#111 
failed|https://circleci.com/gh/Instagram/cassandra/111], passed locally: 
CASSANDRA-14595 | [#112 
failed|https://circleci.com/gh/Instagram/cassandra/112], passed locally: 
CASSANDRA-14595 |
| [15098-trunk|https://github.com/instagram/cassandra/tree/15098-trunk] | [#104 
failed|https://circleci.com/gh/Instagram/cassandra/104], passed locally and 
re-run passed [#117|https://circleci.com/gh/Instagram/cassandra/117] | [#105 
passed|https://circleci.com/gh/Instagram/cassandra/105] | [#114 
passed|https://circleci.com/gh/Instagram/cassandra/114] | [#113 
passed|https://circleci.com/gh/Instagram/cassandra/113] |


> Endpoints no longer owning tokens are not removed for vnode
> ---
>
> Key: CASSANDRA-15098
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15098
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Gossip
>Reporter: Jay Zhuang
>Assignee: Jay Zhuang
>Priority: Normal
>
> The logic here to remove endpoints that no longer own tokens does not work 
> for multiple tokens (vnode):
> https://github.com/apache/cassandra/blob/06209037ea56b5a2a49615a99f1542d6ea1b2947/src/java/org/apache/cassandra/service/StorageService.java#L2505
> And it's very expensive to copy the token metadata for every check.






[jira] [Created] (CASSANDRA-15239) [flaky in-mem dtest] nodeDownDuringMove - org.apache.cassandra.distributed.test.GossipTest

2019-07-19 Thread Jay Zhuang (JIRA)
Jay Zhuang created CASSANDRA-15239:
--

 Summary: [flaky in-mem dtest] nodeDownDuringMove - 
org.apache.cassandra.distributed.test.GossipTest
 Key: CASSANDRA-15239
 URL: https://issues.apache.org/jira/browse/CASSANDRA-15239
 Project: Cassandra
  Issue Type: Bug
  Components: Test/dtest
Reporter: Jay Zhuang


The in-mem dtest fails from time to time:
{noformat}
nodeDownDuringMove - org.apache.cassandra.distributed.test.GossipTest
java.lang.RuntimeException: java.lang.IllegalStateException: Unable to contact 
any seeds!
{noformat}
[https://circleci.com/gh/Instagram/cassandra/98]

More details:
{noformat}
Testcase: nodeDownDuringMove(org.apache.cassandra.distributed.test.GossipTest): 
Caused an ERROR
java.lang.IllegalStateException: Unable to contact any seeds!
java.lang.RuntimeException: java.lang.IllegalStateException: Unable to contact 
any seeds!
at 
org.apache.cassandra.distributed.impl.IsolatedExecutor.waitOn(IsolatedExecutor.java:166)
at 
org.apache.cassandra.distributed.impl.IsolatedExecutor.lambda$sync$4(IsolatedExecutor.java:69)
at 
org.apache.cassandra.distributed.impl.Instance.startup(Instance.java:322)
at 
org.apache.cassandra.distributed.impl.AbstractCluster$Wrapper.startup(AbstractCluster.java:148)
at 
org.apache.cassandra.distributed.test.GossipTest.nodeDownDuringMove(GossipTest.java:96)
Caused by: java.lang.IllegalStateException: Unable to contact any seeds!
at 
org.apache.cassandra.service.StorageService.bootstrap(StorageService.java:1261)
at 
org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:921)
at 
org.apache.cassandra.distributed.impl.Instance.lambda$startup$6(Instance.java:301)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at 
org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:83)
at java.lang.Thread.run(Thread.java:748)


Test org.apache.cassandra.distributed.test.GossipTest FAILED
{noformat}






[jira] [Updated] (CASSANDRA-15097) Avoid updating unchanged gossip state

2019-07-18 Thread Jay Zhuang (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jay Zhuang updated CASSANDRA-15097:
---
  Fix Version/s: 4.0
 3.11.5
 3.0.19
Source Control Link: 
[3f70e7c72c703bc323b169a28e8754ce67d4e479|https://github.com/apache/cassandra/commit/3f70e7c72c703bc323b169a28e8754ce67d4e479]
  Since Version: 3.0.0
 Status: Resolved  (was: Ready to Commit)
 Resolution: Fixed

Thanks [~samt] for the review. Committed as 
[3f70e7c|https://github.com/apache/cassandra/commit/3f70e7c72c703bc323b169a28e8754ce67d4e479].

> Avoid updating unchanged gossip state
> -
>
> Key: CASSANDRA-15097
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15097
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Gossip
>Reporter: Jay Zhuang
>Assignee: Jay Zhuang
>Priority: Normal
> Fix For: 3.0.19, 3.11.5, 4.0
>
>
> A node might receive unchanged gossip states: the state might be updated just 
> after the node sends a GOSSIP_SYN, so the reply carries a state it already has. 
> If the heartbeat in the GOSSIP_ACK message is newer, the node will 
> unnecessarily re-apply the same state again, which can be costly, e.g. 
> applying a token change.
> This is very likely to happen in a large cluster when a node starts up, as the 
> first gossip message syncs all endpoints' tokens and can take some time (about 
> 200 seconds in our case); during that time the node keeps gossiping with other 
> nodes and receiving the full token states, which causes lots of pending gossip 
> tasks.






[jira] [Commented] (CASSANDRA-15097) Avoid updating unchanged gossip state

2019-07-16 Thread Jay Zhuang (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16886404#comment-16886404
 ] 

Jay Zhuang commented on CASSANDRA-15097:


Added dTest results:
| Branch | uTest | jvm-dTest | dTest | dTest vnode |
| [15097-3.0|https://github.com/instagram/cassandra/tree/15097-3.0] | [#66 
passed |https://circleci.com/gh/Instagram/cassandra/66] | [#67 
passed|https://circleci.com/gh/Instagram/cassandra/67] | [#75 
failed|https://circleci.com/gh/Instagram/cassandra/75], passed locally: 
CASSANDRA-14595 | [#74 failed | 
https://circleci.com/gh/Instagram/cassandra/74], passed locally: 
CASSANDRA-14595 |
| [15097-3.11|https://github.com/instagram/cassandra/tree/15097-3.11] | [#69 
passed|https://circleci.com/gh/Instagram/cassandra/69] | [#68 
passed|https://circleci.com/gh/Instagram/cassandra/68] | [#77 
failed|https://circleci.com/gh/Instagram/cassandra/77], passed locally: 
CASSANDRA-14595 | [#76 failed|https://circleci.com/gh/Instagram/cassandra/76], 
passed locally: CASSANDRA-14595 |
| [15097-trunk|https://github.com/instagram/cassandra/tree/15097-trunk] | [#72 
passed|https://circleci.com/gh/Instagram/cassandra/72] | [#73 
passed|https://circleci.com/gh/Instagram/cassandra/73] | [#78 
passed|https://circleci.com/gh/Instagram/cassandra/78] | [#79 
failed|https://circleci.com/gh/Instagram/cassandra/79], passed locally|
All failed dtests passed locally with a 10x run:
{{$ pytest --count=10 --cassandra-dir=~/cassandra $TESTS}}

> Avoid updating unchanged gossip state
> -
>
> Key: CASSANDRA-15097
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15097
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Gossip
>Reporter: Jay Zhuang
>Assignee: Jay Zhuang
>Priority: Normal
>
> A node might receive unchanged gossip states: the state might be updated just 
> after the node sends a GOSSIP_SYN, so the reply carries a state it already has. 
> If the heartbeat in the GOSSIP_ACK message is newer, the node will 
> unnecessarily re-apply the same state again, which can be costly, e.g. 
> applying a token change.
> This is very likely to happen in a large cluster when a node starts up, as the 
> first gossip message syncs all endpoints' tokens and can take some time (about 
> 200 seconds in our case); during that time the node keeps gossiping with other 
> nodes and receiving the full token states, which causes lots of pending gossip 
> tasks.






[jira] [Comment Edited] (CASSANDRA-15097) Avoid updating unchanged gossip state

2019-07-15 Thread Jay Zhuang (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16885427#comment-16885427
 ] 

Jay Zhuang edited comment on CASSANDRA-15097 at 7/15/19 5:29 PM:
-

Thanks [~samt]. The patch is rebased and the tests passed in CircleCI:
| Branch | uTest (circleci) |
| [15097-3.0|https://github.com/instagram/cassandra/tree/15097-3.0] | 
[pass|https://circleci.com/gh/Instagram/cassandra/tree/15097-3.0] |
| [15097-3.11|https://github.com/instagram/cassandra/tree/15097-3.11] | 
[pass|https://circleci.com/gh/Instagram/cassandra/tree/15097-3.11] |
| [15097-trunk|https://github.com/instagram/cassandra/tree/15097-trunk] | 
[pass|https://circleci.com/gh/Instagram/cassandra/tree/15097-trunk] |


was (Author: jay.zhuang):
Here is a patch to filter out updated states:
| Branch | uTest (circleci) |
| [15097-3.0|https://github.com/instagram/cassandra/tree/15097-3.0] | 
[pass|https://circleci.com/gh/Instagram/cassandra/tree/15097-3.0] |
| [15097-3.11|https://github.com/instagram/cassandra/tree/15097-3.11] | 
[pass|https://circleci.com/gh/Instagram/cassandra/tree/15097-3.11] |
| [15097-trunk|https://github.com/instagram/cassandra/tree/15097-trunk] | 
[pass|https://circleci.com/gh/Instagram/cassandra/tree/15097-trunk] |

> Avoid updating unchanged gossip state
> -
>
> Key: CASSANDRA-15097
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15097
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Gossip
>Reporter: Jay Zhuang
>Assignee: Jay Zhuang
>Priority: Normal
>
> A node might receive unchanged gossip states: the state might be updated just 
> after the node sends a GOSSIP_SYN, so the reply carries a state it already has. 
> If the heartbeat in the GOSSIP_ACK message is newer, the node will 
> unnecessarily re-apply the same state again, which can be costly, e.g. 
> applying a token change.
> This is very likely to happen in a large cluster when a node starts up, as the 
> first gossip message syncs all endpoints' tokens and can take some time (about 
> 200 seconds in our case); during that time the node keeps gossiping with other 
> nodes and receiving the full token states, which causes lots of pending gossip 
> tasks.






[jira] [Updated] (CASSANDRA-15097) Avoid updating unchanged gossip state

2019-07-15 Thread Jay Zhuang (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jay Zhuang updated CASSANDRA-15097:
---
Status: Review In Progress  (was: Changes Suggested)

> Avoid updating unchanged gossip state
> -
>
> Key: CASSANDRA-15097
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15097
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Gossip
>Reporter: Jay Zhuang
>Assignee: Jay Zhuang
>Priority: Normal
>
> A node might receive unchanged gossip states: the state might be updated just 
> after the node sends a GOSSIP_SYN, so the reply carries a state it already has. 
> If the heartbeat in the GOSSIP_ACK message is newer, the node will 
> unnecessarily re-apply the same state again, which can be costly, e.g. 
> applying a token change.
> This is very likely to happen in a large cluster when a node starts up, as the 
> first gossip message syncs all endpoints' tokens and can take some time (about 
> 200 seconds in our case); during that time the node keeps gossiping with other 
> nodes and receiving the full token states, which causes lots of pending gossip 
> tasks.






[jira] [Commented] (CASSANDRA-15097) Avoid updating unchanged gossip state

2019-07-15 Thread Jay Zhuang (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16885427#comment-16885427
 ] 

Jay Zhuang commented on CASSANDRA-15097:


Here is a patch to filter out updated states:
| Branch | uTest (circleci) |
| [15097-3.0|https://github.com/instagram/cassandra/tree/15097-3.0] | 
[pass|https://circleci.com/gh/Instagram/cassandra/tree/15097-3.0] |
| [15097-3.11|https://github.com/instagram/cassandra/tree/15097-3.11] | 
[pass|https://circleci.com/gh/Instagram/cassandra/tree/15097-3.11] |
| [15097-trunk|https://github.com/instagram/cassandra/tree/15097-trunk] | 
[pass|https://circleci.com/gh/Instagram/cassandra/tree/15097-trunk] |

> Avoid updating unchanged gossip state
> -
>
> Key: CASSANDRA-15097
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15097
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Gossip
>Reporter: Jay Zhuang
>Assignee: Jay Zhuang
>Priority: Normal
>
> A node might receive unchanged gossip states: the state might be updated just 
> after the node sends a GOSSIP_SYN, so the reply carries a state it already has. 
> If the heartbeat in the GOSSIP_ACK message is newer, the node will 
> unnecessarily re-apply the same state again, which can be costly, e.g. 
> applying a token change.
> This is very likely to happen in a large cluster when a node starts up, as the 
> first gossip message syncs all endpoints' tokens and can take some time (about 
> 200 seconds in our case); during that time the node keeps gossiping with other 
> nodes and receiving the full token states, which causes lots of pending gossip 
> tasks.






[jira] [Updated] (CASSANDRA-15141) RemoveNode takes long time and blocks gossip stage

2019-07-01 Thread Jay Zhuang (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jay Zhuang updated CASSANDRA-15141:
---
Test and Documentation Plan: It's deployed to all our production clusters.
 Status: Patch Available  (was: Open)

> RemoveNode takes long time and blocks gossip stage
> --
>
> Key: CASSANDRA-15141
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15141
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Cluster/Gossip, Cluster/Membership
>Reporter: Jay Zhuang
>Assignee: Jay Zhuang
>Priority: Normal
>
> The function 
> [{{getAddressReplicas()}}|https://github.com/apache/cassandra/blob/7df67eff2d66dba4bed2b4f6aeabf05144d9b057/src/java/org/apache/cassandra/service/StorageService.java#L3002]
>  used during removenode and decommission is slow for a large vnode cluster with 
> NetworkTopologyStrategy, as it needs to build the whole replication map for 
> every token range.
> In one of our clusters (> 1k nodes), it takes about 20 seconds for each 
> NetworkTopologyStrategy keyspace, so the total time to process a removenode 
> message is at least 80 seconds (20 * 4: 3 system keyspaces, 1 user 
> keyspace). It blocks heartbeat propagation and causes false down-node reports.






[jira] [Commented] (CASSANDRA-15141) RemoveNode takes long time and blocks gossip stage

2019-07-01 Thread Jay Zhuang (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16876492#comment-16876492
 ] 

Jay Zhuang commented on CASSANDRA-15141:


Here is a patch to improve the performance of calculating endpoint replicas. 
It's 100x - 1000x faster than the default implementation:
| Branch | uTest | JVM-dTest | dTest |
| [15141-trunk|https://github.com/Instagram/cassandra/tree/15141-trunk] | 
[circle #51|https://circleci.com/gh/Instagram/cassandra/51] | [circle 
#50|https://circleci.com/gh/Instagram/cassandra/50] | [circle 
#53|https://circleci.com/gh/Instagram/cassandra/53] |

> RemoveNode takes long time and blocks gossip stage
> --
>
> Key: CASSANDRA-15141
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15141
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Cluster/Gossip, Cluster/Membership
>Reporter: Jay Zhuang
>Assignee: Jay Zhuang
>Priority: Normal
>
> The function 
> [{{getAddressReplicas()}}|https://github.com/apache/cassandra/blob/7df67eff2d66dba4bed2b4f6aeabf05144d9b057/src/java/org/apache/cassandra/service/StorageService.java#L3002]
>  used during removenode and decommission is slow for a large vnode cluster with 
> NetworkTopologyStrategy, as it needs to build the whole replication map for 
> every token range.
> In one of our clusters (> 1k nodes), it takes about 20 seconds for each 
> NetworkTopologyStrategy keyspace, so the total time to process a removenode 
> message is at least 80 seconds (20 * 4: 3 system keyspaces, 1 user 
> keyspace). It blocks heartbeat propagation and causes false down-node reports.






[jira] [Updated] (CASSANDRA-15141) RemoveNode takes long time and blocks gossip stage

2019-05-24 Thread Jay Zhuang (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jay Zhuang updated CASSANDRA-15141:
---
 Complexity: Challenging
Change Category: Performance
 Status: Open  (was: Triage Needed)

> RemoveNode takes long time and blocks gossip stage
> --
>
> Key: CASSANDRA-15141
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15141
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Cluster/Gossip, Cluster/Membership
>Reporter: Jay Zhuang
>Assignee: Jay Zhuang
>Priority: Normal
>
> The function 
> [{{getAddressReplicas()}}|https://github.com/apache/cassandra/blob/7df67eff2d66dba4bed2b4f6aeabf05144d9b057/src/java/org/apache/cassandra/service/StorageService.java#L3002]
>  used during removenode and decommission is slow for a large vnode cluster with 
> NetworkTopologyStrategy, as it needs to build the whole replication map for 
> every token range.
> In one of our clusters (> 1k nodes), it takes about 20 seconds for each 
> NetworkTopologyStrategy keyspace, so the total time to process a removenode 
> message is at least 80 seconds (20 * 4: 3 system keyspaces, 1 user 
> keyspace). It blocks heartbeat propagation and causes false down-node reports.






[jira] [Created] (CASSANDRA-15141) RemoveNode takes long time and blocks gossip stage

2019-05-24 Thread Jay Zhuang (JIRA)
Jay Zhuang created CASSANDRA-15141:
--

 Summary: RemoveNode takes long time and blocks gossip stage
 Key: CASSANDRA-15141
 URL: https://issues.apache.org/jira/browse/CASSANDRA-15141
 Project: Cassandra
  Issue Type: Improvement
  Components: Cluster/Gossip, Cluster/Membership
Reporter: Jay Zhuang
Assignee: Jay Zhuang


The function 
[{{getAddressReplicas()}}|https://github.com/apache/cassandra/blob/7df67eff2d66dba4bed2b4f6aeabf05144d9b057/src/java/org/apache/cassandra/service/StorageService.java#L3002]
 used during removenode and decommission is slow for a large vnode cluster with 
NetworkTopologyStrategy, as it needs to build the whole replication map for every 
token range.
In one of our clusters (> 1k nodes), it takes about 20 seconds for each 
NetworkTopologyStrategy keyspace, so the total time to process a removenode 
message is at least 80 seconds (20 * 4: 3 system keyspaces, 1 user 
keyspace). It blocks heartbeat propagation and causes false down-node reports.
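
As a rough illustration of the direction (simplified, assumed types; not the 
actual patch): build the range-to-replicas map once per keyspace and invert it 
into an endpoint-to-ranges view in a single pass, instead of recomputing the 
replica set for every token range.
{noformat}
import java.util.*;

class AddressRanges
{
    // Invert a precomputed range -> replicas map into endpoint -> ranges with one
    // pass over the map, so the replication strategy is only consulted once.
    static Map<String, Set<String>> byEndpoint(Map<String, Set<String>> rangeToReplicas)
    {
        Map<String, Set<String>> endpointToRanges = new HashMap<>();
        for (Map.Entry<String, Set<String>> e : rangeToReplicas.entrySet())
            for (String endpoint : e.getValue())
                endpointToRanges.computeIfAbsent(endpoint, k -> new HashSet<>()).add(e.getKey());
        return endpointToRanges;
    }
}
{noformat}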






[jira] [Updated] (CASSANDRA-15097) Avoid updating unchanged gossip state

2019-05-20 Thread Jay Zhuang (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jay Zhuang updated CASSANDRA-15097:
---
Test and Documentation Plan: Unittest is passed. And the code is committed 
and running in Instagram production environment.
 Status: Patch Available  (was: Open)

> Avoid updating unchanged gossip state
> -
>
> Key: CASSANDRA-15097
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15097
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Gossip
>Reporter: Jay Zhuang
>Assignee: Jay Zhuang
>Priority: Normal
>
> A node might receive unchanged gossip states: the state might be updated just 
> after the node sends a GOSSIP_SYN, so the reply carries a state it already has. 
> If the heartbeat in the GOSSIP_ACK message is newer, the node will 
> unnecessarily re-apply the same state again, which can be costly, e.g. 
> applying a token change.
> This is very likely to happen in a large cluster when a node starts up, as the 
> first gossip message syncs all endpoints' tokens and can take some time (about 
> 200 seconds in our case); during that time the node keeps gossiping with other 
> nodes and receiving the full token states, which causes lots of pending gossip 
> tasks.






[jira] [Updated] (CASSANDRA-15133) Node restart causes unnecessary token metadata update

2019-05-20 Thread Jay Zhuang (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jay Zhuang updated CASSANDRA-15133:
---
Test and Documentation Plan: Unittest is passed. And the code is committed 
and running in Instagram production environment.
 Status: Patch Available  (was: Open)

> Node restart causes unnecessary token metadata update
> -
>
> Key: CASSANDRA-15133
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15133
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Cluster/Gossip, Cluster/Membership
>Reporter: Jay Zhuang
>Assignee: Jay Zhuang
>Priority: Normal
>
> Restarting a node causes a gossip generation update. When the message 
> propagates through the cluster, every node blindly updates its local token 
> metadata even though it has not changed. Updating token metadata is expensive 
> for a large vnode cluster and causes the token metadata cache to be 
> unnecessarily invalidated.






[jira] [Updated] (CASSANDRA-15097) Avoid updating unchanged gossip state

2019-05-20 Thread Jay Zhuang (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jay Zhuang updated CASSANDRA-15097:
---
 Severity: Low
   Complexity: Normal
Discovered By: User Report
 Bug Category: Parent values: Degradation(12984)Level 1 values: Performance 
Bug/Regression(12997)
   Status: Open  (was: Triage Needed)

> Avoid updating unchanged gossip state
> -
>
> Key: CASSANDRA-15097
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15097
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Gossip
>Reporter: Jay Zhuang
>Assignee: Jay Zhuang
>Priority: Normal
>
> A node might receive unchanged gossip states: the state might be updated just 
> after the node sends a GOSSIP_SYN, so the reply carries a state it already has. 
> If the heartbeat in the GOSSIP_ACK message is newer, the node will 
> unnecessarily re-apply the same state again, which can be costly, e.g. 
> applying a token change.
> This is very likely to happen in a large cluster when a node starts up, as the 
> first gossip message syncs all endpoints' tokens and can take some time (about 
> 200 seconds in our case); during that time the node keeps gossiping with other 
> nodes and receiving the full token states, which causes lots of pending gossip 
> tasks.






[jira] [Updated] (CASSANDRA-15098) Endpoints no longer owning tokens are not removed for vnode

2019-05-20 Thread Jay Zhuang (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jay Zhuang updated CASSANDRA-15098:
---
Test and Documentation Plan: Unittest is passed. And the code is committed 
and running in Instagram production environment.
 Status: Patch Available  (was: Open)

> Endpoints no longer owning tokens are not removed for vnode
> ---
>
> Key: CASSANDRA-15098
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15098
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Gossip
>Reporter: Jay Zhuang
>Assignee: Jay Zhuang
>Priority: Normal
>
> The logic here to remove endpoints that no longer own tokens does not work 
> for multiple tokens (vnode):
> https://github.com/apache/cassandra/blob/06209037ea56b5a2a49615a99f1542d6ea1b2947/src/java/org/apache/cassandra/service/StorageService.java#L2505
> And it's very expensive to copy the token metadata for every check.






[jira] [Updated] (CASSANDRA-15098) Endpoints no longer owning tokens are not removed for vnode

2019-05-20 Thread Jay Zhuang (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jay Zhuang updated CASSANDRA-15098:
---
 Severity: Normal
   Complexity: Normal
Discovered By: User Report
 Bug Category: Parent values: Correctness(12982)Level 1 values: Persistent 
Corruption / Loss(12986)
   Status: Open  (was: Triage Needed)

> Endpoints no longer owning tokens are not removed for vnode
> ---
>
> Key: CASSANDRA-15098
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15098
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Gossip
>Reporter: Jay Zhuang
>Assignee: Jay Zhuang
>Priority: Normal
>
> The logic here to remove endpoints that no longer own tokens does not work 
> for multiple tokens (vnode):
> https://github.com/apache/cassandra/blob/06209037ea56b5a2a49615a99f1542d6ea1b2947/src/java/org/apache/cassandra/service/StorageService.java#L2505
> And it's very expensive to copy the token metadata for every check.






[jira] [Updated] (CASSANDRA-15133) Node restart causes unnecessary token metadata update

2019-05-20 Thread Jay Zhuang (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jay Zhuang updated CASSANDRA-15133:
---
 Complexity: Low Hanging Fruit
Change Category: Performance
 Status: Open  (was: Triage Needed)

> Node restart causes unnecessary token metadata update
> -
>
> Key: CASSANDRA-15133
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15133
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Cluster/Gossip, Cluster/Membership
>Reporter: Jay Zhuang
>Assignee: Jay Zhuang
>Priority: Normal
>
> Restarting a node causes a gossip generation update. When the message 
> propagates through the cluster, every node blindly updates its local token 
> metadata even though it has not changed. Updating token metadata is expensive 
> for a large vnode cluster and causes the token metadata cache to be 
> unnecessarily invalidated.






[jira] [Commented] (CASSANDRA-15133) Node restart causes unnecessary token metadata update

2019-05-20 Thread Jay Zhuang (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16844502#comment-16844502
 ] 

Jay Zhuang commented on CASSANDRA-15133:


Here is a proposed fix:
| [15133-trunk|https://github.com/cooldoger/cassandra/tree/15133-trunk] | 
[!https://circleci.com/gh/cooldoger/cassandra/tree/15133-trunk.svg?style=svg!|https://circleci.com/gh/cooldoger/cassandra/tree/15133-trunk]
 |

It also fixes an issue with removing {{movingEndpoints}} entries, as the 
following code never works: removing an item while looping over the collection 
will throw a {{ConcurrentModificationException}}:
{noformat}
for (Pair pair : movingEndpoints)
{
if (pair.right.equals(endpoint))
{
movingEndpoints.remove(pair);
break;
}
}
{noformat}
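
A minimal, self-contained sketch of a safe removal pattern, with 
{{AbstractMap.SimpleEntry}} standing in for the {{Pair}} used in 
{{StorageService}} (an illustration, not the actual patch):
{noformat}
import java.util.*;
import java.util.AbstractMap.SimpleEntry;

class MovingEndpointRemoval
{
    // Let the collection remove matching entries itself instead of calling
    // remove() inside a for-each loop over the same collection.
    static void remove(Set<SimpleEntry<Long, String>> movingEndpoints, String endpoint)
    {
        movingEndpoints.removeIf(pair -> pair.getValue().equals(endpoint));
    }
}
{noformat}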

> Node restart causes unnecessary token metadata update
> -
>
> Key: CASSANDRA-15133
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15133
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Cluster/Gossip, Cluster/Membership
>Reporter: Jay Zhuang
>Assignee: Jay Zhuang
>Priority: Normal
>
> Restarting a node causes a gossip generation update. When the message 
> propagates through the cluster, every node blindly updates its local token 
> metadata even though it has not changed. Updating token metadata is expensive 
> for a large vnode cluster and causes the token metadata cache to be 
> unnecessarily invalidated.






[jira] [Created] (CASSANDRA-15133) Node restart causes unnecessary token metadata update

2019-05-20 Thread Jay Zhuang (JIRA)
Jay Zhuang created CASSANDRA-15133:
--

 Summary: Node restart causes unnecessary token metadata update
 Key: CASSANDRA-15133
 URL: https://issues.apache.org/jira/browse/CASSANDRA-15133
 Project: Cassandra
  Issue Type: Improvement
  Components: Cluster/Gossip, Cluster/Membership
Reporter: Jay Zhuang
Assignee: Jay Zhuang


Restarting a node causes a gossip generation update. When the message propagates 
through the cluster, every node blindly updates its local token metadata even 
though it has not changed. Updating token metadata is expensive for a large vnode 
cluster and causes the token metadata cache to be unnecessarily invalidated.
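
A minimal sketch of the idea, with assumed names (not the actual patch): compare 
the gossiped token set with what is already recorded locally and skip the 
expensive token metadata update when nothing has changed.
{noformat}
import java.util.*;

class TokenUpdateFilter
{
    // Only touch token metadata (and invalidate its caches) when the gossiped
    // token set actually differs from the locally recorded one.
    static boolean needsUpdate(Set<Long> gossipedTokens, Set<Long> currentTokens)
    {
        return !gossipedTokens.equals(currentTokens);
    }
}
{noformat}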






[jira] [Commented] (CASSANDRA-15098) Endpoints no longer owning tokens are not removed for vnode

2019-04-24 Thread Jay Zhuang (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16825447#comment-16825447
 ] 

Jay Zhuang commented on CASSANDRA-15098:


Here is a unit test to reproduce the problem and a proposed fix, please review:
| Branch | uTest |
| [15098-3.0|https://github.com/cooldoger/cassandra/tree/15098-3.0] | 
[!https://circleci.com/gh/cooldoger/cassandra/tree/15098-3.0.svg?style=svg!|https://circleci.com/gh/cooldoger/cassandra/tree/15098-3.0]
 |
| [15098-3.11|https://github.com/cooldoger/cassandra/tree/15098-3.11] | 
[!https://circleci.com/gh/cooldoger/cassandra/tree/15098-3.11.svg?style=svg!|https://circleci.com/gh/cooldoger/cassandra/tree/15098-3.11]
 |
| [15098-trunk|https://github.com/cooldoger/cassandra/tree/15098-trunk] | 
[!https://circleci.com/gh/cooldoger/cassandra/tree/15098-trunk.svg?style=svg!|https://circleci.com/gh/cooldoger/cassandra/tree/15098-trunk]
 |

> Endpoints no longer owning tokens are not removed for vnode
> ---
>
> Key: CASSANDRA-15098
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15098
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Gossip
>Reporter: Jay Zhuang
>Assignee: Jay Zhuang
>Priority: Normal
>
> The logic here to remove endpoints that no longer own tokens does not work 
> for multiple tokens (vnode):
> https://github.com/apache/cassandra/blob/06209037ea56b5a2a49615a99f1542d6ea1b2947/src/java/org/apache/cassandra/service/StorageService.java#L2505
> And it's very expensive to copy the token metadata for every check.






[jira] [Created] (CASSANDRA-15098) Endpoints no longer owning tokens are not removed for vnode

2019-04-24 Thread Jay Zhuang (JIRA)
Jay Zhuang created CASSANDRA-15098:
--

 Summary: Endpoints no longer owning tokens are not removed for 
vnode
 Key: CASSANDRA-15098
 URL: https://issues.apache.org/jira/browse/CASSANDRA-15098
 Project: Cassandra
  Issue Type: Bug
  Components: Cluster/Gossip
Reporter: Jay Zhuang
Assignee: Jay Zhuang


The logic here to remove endpoints that no longer own tokens does not work for 
multiple tokens (vnode):
https://github.com/apache/cassandra/blob/06209037ea56b5a2a49615a99f1542d6ea1b2947/src/java/org/apache/cassandra/service/StorageService.java#L2505

And it's very expensive to copy the token metadata for every check.
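
A minimal sketch of the intended check, with assumed names and simplified types 
(not the actual patch): an endpoint should only be removed when it no longer owns 
any token, which means considering all of its vnode tokens, and the check should 
not require cloning the whole token metadata.
{noformat}
import java.util.*;

class StaleEndpointCheck
{
    // After applying the new token ownership, report the endpoints that own no
    // tokens at all; these are the ones that can safely be removed.
    static Set<String> endpointsWithoutTokens(Map<Long, String> tokenToEndpoint,
                                              Set<String> knownEndpoints)
    {
        Set<String> owners = new HashSet<>(tokenToEndpoint.values());
        Set<String> stale = new HashSet<>(knownEndpoints);
        stale.removeAll(owners);
        return stale;
    }
}
{noformat}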






[jira] [Commented] (CASSANDRA-15097) Avoid updating unchanged gossip state

2019-04-23 Thread Jay Zhuang (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16824636#comment-16824636
 ] 

Jay Zhuang commented on CASSANDRA-15097:


Here is a patch to filter out updated states:
| Branch | uTest |
| [15097-3.0|https://github.com/cooldoger/cassandra/tree/15097-3.0] | 
[!https://circleci.com/gh/cooldoger/cassandra/tree/15097-3.0.svg?style=svg!|https://circleci.com/gh/cooldoger/cassandra/tree/15097-3.0]
 |
| [15097-3.11|https://github.com/cooldoger/cassandra/tree/15097-3.11] | 
[!https://circleci.com/gh/cooldoger/cassandra/tree/15097-3.11.svg?style=svg!|https://circleci.com/gh/cooldoger/cassandra/tree/15097-3.11]
 |
| [15097-trunk|https://github.com/cooldoger/cassandra/tree/15097-trunk] | 
[!https://circleci.com/gh/cooldoger/cassandra/tree/15097-trunk.svg?style=svg!|https://circleci.com/gh/cooldoger/cassandra/tree/15097-trunk]
 |

> Avoid updating unchanged gossip state
> -
>
> Key: CASSANDRA-15097
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15097
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Gossip
>Reporter: Jay Zhuang
>Assignee: Jay Zhuang
>Priority: Normal
>
> A node might receive unchanged gossip states: the state might be updated just 
> after the node sends a GOSSIP_SYN, so the reply carries a state it already has. 
> If the heartbeat in the GOSSIP_ACK message is newer, the node will 
> unnecessarily re-apply the same state again, which can be costly, e.g. 
> applying a token change.
> This is very likely to happen in a large cluster when a node starts up, as the 
> first gossip message syncs all endpoints' tokens and can take some time (about 
> 200 seconds in our case); during that time the node keeps gossiping with other 
> nodes and receiving the full token states, which causes lots of pending gossip 
> tasks.






[jira] [Created] (CASSANDRA-15097) Avoid updating unchanged gossip state

2019-04-23 Thread Jay Zhuang (JIRA)
Jay Zhuang created CASSANDRA-15097:
--

 Summary: Avoid updating unchanged gossip state
 Key: CASSANDRA-15097
 URL: https://issues.apache.org/jira/browse/CASSANDRA-15097
 Project: Cassandra
  Issue Type: Bug
  Components: Cluster/Gossip
Reporter: Jay Zhuang
Assignee: Jay Zhuang


A node might receive unchanged gossip states: the state might be updated just 
after the node sends a GOSSIP_SYN, so the reply carries a state it already has. 
If the heartbeat in the GOSSIP_ACK message is newer, the node will unnecessarily 
re-apply the same state again, which can be costly, e.g. applying a token change.
This is very likely to happen in a large cluster when a node starts up, as the 
first gossip message syncs all endpoints' tokens and can take some time (about 
200 seconds in our case); during that time the node keeps gossiping with other 
nodes and receiving the full token states, which causes lots of pending gossip 
tasks.
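
A minimal sketch of the filtering idea, with assumed names and simplified types 
(not the actual patch): apply only the application states whose version is 
strictly newer than the local copy, so an ACK that merely bumps the heartbeat 
does not re-apply unchanged states.
{noformat}
import java.util.*;

class GossipStateFilter
{
    // Keep only the remote states that are strictly newer than the local versions.
    static Map<String, Integer> newerStates(Map<String, Integer> localVersions,
                                            Map<String, Integer> remoteVersions)
    {
        Map<String, Integer> toApply = new HashMap<>();
        for (Map.Entry<String, Integer> e : remoteVersions.entrySet())
        {
            Integer local = localVersions.get(e.getKey());
            if (local == null || e.getValue() > local)
                toApply.put(e.getKey(), e.getValue());
        }
        return toApply;
    }
}
{noformat}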






[jira] [Updated] (CASSANDRA-13849) GossipStage blocks because of race in ActiveRepairService

2019-02-26 Thread Jay Zhuang (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jay Zhuang updated CASSANDRA-13849:
---
Fix Version/s: (was: 3.11.x)
   (was: 3.0.x)
   3.0.16
   3.11.2

> GossipStage blocks because of race in ActiveRepairService
> -
>
> Key: CASSANDRA-13849
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13849
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Tom van der Woerdt
>Assignee: Sergey Lapukhov
>Priority: Major
>  Labels: patch
> Fix For: 3.0.16, 3.11.2, 4.0
>
> Attachments: CAS-13849.patch, CAS-13849_2.patch, CAS-13849_3.patch
>
>
> Bad luck caused a kernel panic in a cluster, and that took another node with 
> it because GossipStage stopped responding.
> I think it's pretty obvious what's happening, here are the relevant excerpts 
> from the stack traces :
> {noformat}
> "Thread-24004" #393781 daemon prio=5 os_prio=0 tid=0x7efca9647400 
> nid=0xe75c waiting on condition [0x7efaa47fe000]
>java.lang.Thread.State: TIMED_WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for  <0x00052b63a7e8> (a 
> java.util.concurrent.CountDownLatch$Sync)
> at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
> at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277)
> at 
> org.apache.cassandra.service.ActiveRepairService.prepareForRepair(ActiveRepairService.java:332)
> - locked <0x0002e6bc99f0> (a 
> org.apache.cassandra.service.ActiveRepairService)
> at 
> org.apache.cassandra.repair.RepairRunnable.runMayThrow(RepairRunnable.java:211)
> at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)   
>   
>   at 
> java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:79)
> at 
> org.apache.cassandra.concurrent.NamedThreadFactory$$Lambda$3/1498438472.run(Unknown
>  Source)
> at java.lang.Thread.run(Thread.java:748)
> "GossipTasks:1" #367 daemon prio=5 os_prio=0 tid=0x7efc5e971000 
> nid=0x700b waiting for monitor entry [0x7dfb839fe000]
>java.lang.Thread.State: BLOCKED (on object monitor)
> at 
> org.apache.cassandra.service.ActiveRepairService.removeParentRepairSession(ActiveRepairService.java:421)
> - waiting to lock <0x0002e6bc99f0> (a 
> org.apache.cassandra.service.ActiveRepairService)
> at 
> org.apache.cassandra.service.ActiveRepairService.convict(ActiveRepairService.java:776)
> at 
> org.apache.cassandra.gms.FailureDetector.interpret(FailureDetector.java:306)
> at org.apache.cassandra.gms.Gossiper.doStatusCheck(Gossiper.java:775) 
>   
>  at 
> org.apache.cassandra.gms.Gossiper.access$800(Gossiper.java:67)
> at org.apache.cassandra.gms.Gossiper$GossipTask.run(Gossiper.java:187)
> at 
> org.apache.cassandra.concurrent.DebuggableScheduledThreadPoolExecutor$UncomplainingRunnable.run(DebuggableScheduledThreadPoolExecutor.java:118)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at 
> org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:79)
> at 
> org.apache.cassandra.concurrent.NamedThreadFactory$$Lambda$3/1498438472.run(Unknown
>  Source)
> at java.lang.Thread.run(Thread.java:748)
> "GossipStage:1" #320 daemon prio=5 os_prio=0 tid=0x7efc5b9f2c00 
> nid=0x6fcd waiting for monitor entry [0x7e260186a000]
>java.lang.Thread.State: BLOCKED (on object monitor)
> at 
> 
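
The two traces above show the pattern: prepareForRepair holds the ActiveRepairService monitor while it awaits a latch, and the failure-detector convict path blocks on that same monitor in removeParentRepairSession, so the gossip task (and the gossip stage behind it) cannot make progress. A stripped-down, hypothetical illustration of that monitor-plus-latch interaction (not the actual ActiveRepairService code):
{noformat}
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

final class MonitorPlusLatchSketch
{
    private final CountDownLatch prepareLatch = new CountDownLatch(1);

    // Roughly the prepareForRepair side of the trace:
    // waits for peer responses while still holding the object monitor.
    synchronized void prepareForRepair() throws InterruptedException
    {
        // If the responses never arrive (e.g. the peer died), this thread keeps the
        // monitor until the timeout, and everything below queues up behind it.
        prepareLatch.await(1, TimeUnit.HOURS);
    }

    // Roughly the convict -> removeParentRepairSession side of the trace:
    // blocked here on the monitor held by prepareForRepair above.
    synchronized void removeParentRepairSession()
    {
        prepareLatch.countDown();
    }
}
{noformat}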

[jira] [Commented] (CASSANDRA-14526) dtest to validate Cassandra state post failed/successful bootstrap

2019-01-15 Thread Jay Zhuang (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16743263#comment-16743263
 ] 

Jay Zhuang commented on CASSANDRA-14526:


The new tests passed locally (except CASSANDRA-14984, which is not related to this 
change or to CASSANDRA-14525). Committed as 
[{{e6f58cb}}|https://github.com/apache/cassandra-dtest/commit/e6f58cb33f7a09f273c5990d5d21c7b529ba80bf].
 Thanks [~chovatia.jayd...@gmail.com].

> dtest to validate Cassandra state post failed/successful bootstrap
> --
>
> Key: CASSANDRA-14526
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14526
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: Test/dtest
>Reporter: Jaydeepkumar Chovatia
>Assignee: Jaydeepkumar Chovatia
>Priority: Major
>  Labels: dtest
>
> Please find dtest here:
> || dtest ||
> | [patch 
> |https://github.com/apache/cassandra-dtest/compare/master...jaydeepkumar1984:14526-trunk]|



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-14526) dtest to validate Cassandra state post failed/successful bootstrap

2019-01-15 Thread Jay Zhuang (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jay Zhuang updated CASSANDRA-14526:
---
Resolution: Fixed
  Reviewer: Jay Zhuang
Status: Resolved  (was: Patch Available)

> dtest to validate Cassandra state post failed/successful bootstrap
> --
>
> Key: CASSANDRA-14526
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14526
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: Test/dtest
>Reporter: Jaydeepkumar Chovatia
>Assignee: Jaydeepkumar Chovatia
>Priority: Major
>  Labels: dtest
>
> Please find dtest here:
> || dtest ||
> | [patch 
> |https://github.com/apache/cassandra-dtest/compare/master...jaydeepkumar1984:14526-trunk]|



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Created] (CASSANDRA-14984) [dtest] 2 TestBootstrap tests failed for branch 2.2

2019-01-15 Thread Jay Zhuang (JIRA)
Jay Zhuang created CASSANDRA-14984:
--

 Summary: [dtest] 2 TestBootstrap tests failed for branch 2.2
 Key: CASSANDRA-14984
 URL: https://issues.apache.org/jira/browse/CASSANDRA-14984
 Project: Cassandra
  Issue Type: Bug
  Components: Test/dtest
Reporter: Jay Zhuang


Failed tests:
{noformat}
test_decommissioned_wiped_node_can_join
test_decommissioned_wiped_node_can_gossip_to_single_seed
{noformat}

Error:
{noformat}
...
# Decommision the new node and kill it
logger.debug("Decommissioning & stopping node2")
>   node2.decommission()
...
def handle_external_tool_process(process, cmd_args):
out, err = process.communicate()
if (out is not None) and isinstance(out, bytes):
out = out.decode()
if (err is not None) and isinstance(err, bytes):
err = err.decode()
rc = process.returncode

if rc != 0:
>   raise ToolError(cmd_args, rc, out, err)
E   ccmlib.node.ToolError: Subprocess ['nodetool', '-h', 'localhost', 
'-p', '7200', 'decommission'] exited with non-zero status; exit status: 2;
E   stderr: error: Thread signal failed
E   -- StackTrace --
E   java.io.IOException: Thread signal failed
{noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14526) dtest to validate Cassandra state post failed/successful bootstrap

2019-01-14 Thread Jay Zhuang (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16742385#comment-16742385
 ] 

Jay Zhuang commented on CASSANDRA-14526:


Thanks [~chovatia.jayd...@gmail.com], the change looks good to me. Kicking off a 
build:

|| Branch || dTest ||
| 2.2 | [!https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/672/badge/icon!|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/672/] |
| 3.0 | [!https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/671/badge/icon!|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/671/] |
| 3.11 | [!https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/670/badge/icon!|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/670/] |
| trunk | [!https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/669/badge/icon!|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/669/] |



> dtest to validate Cassandra state post failed/successful bootstrap
> --
>
> Key: CASSANDRA-14526
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14526
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: Test/dtest
>Reporter: Jaydeepkumar Chovatia
>Assignee: Jaydeepkumar Chovatia
>Priority: Major
>  Labels: dtest
>
> Please find dtest here:
> || dtest ||
> | [patch 
> |https://github.com/apache/cassandra-dtest/compare/master...jaydeepkumar1984:14526-trunk]|



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14526) dtest to validate Cassandra state post failed/successful bootstrap

2019-01-10 Thread Jay Zhuang (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16740064#comment-16740064
 ] 

Jay Zhuang commented on CASSANDRA-14526:


Hi [~chovatia.jayd...@gmail.com], the test 
{{secondary_indexes_test.py.TestPreJoinCallback.test_resume}} is still not 
stable. Only 2 out of 10 runs passed in my tests:
{noformat}
$ pytest --count=10 -p no:flaky --cassandra-dir=/Users/zjay/ws/cassandra 
secondary_indexes_test.py::TestPreJoinCallback::test_resume
...
secondary_indexes_test.py:1175: in _base_test
joinFn(cluster, tokens[1])
secondary_indexes_test.py:1210: in resume
node2.watch_log_for('Starting listening for CQL clients')
...
if start + timeout < time.time():
>   raise TimeoutError(time.strftime("%d %b %Y %H:%M:%S", 
> time.gmtime()) + " [" + self.name + "] Missing: " + str([e.pattern for e in 
> tofind]) + ":\n" + reads[:50] + ".\nSee {} for 
> remainder".format(filename))
E   ccmlib.node.TimeoutError: 11 Jan 2019 05:43:25 [node2] 
Missing: ['Starting listening for CQL clients']:

E   
INFO  [main] 2019-01-10 21:33:18,285 YamlConfigura.
E   See system.log for remainder
...
= 8 failed, 2 passed, 2 error in 5153.09 seconds ==
{noformat}
Would you please take a look?

> dtest to validate Cassandra state post failed/successful bootstrap
> --
>
> Key: CASSANDRA-14526
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14526
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: Test/dtest
>Reporter: Jaydeepkumar Chovatia
>Assignee: Jaydeepkumar Chovatia
>Priority: Major
>  Labels: dtest
>
> Please find dtest here:
> || dtest ||
> | [patch 
> |https://github.com/apache/cassandra-dtest/compare/master...jaydeepkumar1984:14526-trunk]|



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14525) streaming failure during bootstrap makes new node into inconsistent state

2019-01-09 Thread Jay Zhuang (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16739100#comment-16739100
 ] 

Jay Zhuang commented on CASSANDRA-14525:


I'm sorry for not committing the dtest change. Will do that ASAP (I'm still 
trying to confirm that a few flaky tests were not introduced by the changes).

> streaming failure during bootstrap makes new node into inconsistent state
> -
>
> Key: CASSANDRA-14525
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14525
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Core
>Reporter: Jaydeepkumar Chovatia
>Assignee: Jaydeepkumar Chovatia
>Priority: Major
> Fix For: 2.2.14, 3.0.18, 3.11.4, 4.0
>
>
> If bootstrap fails for a newly joining node (the most common reason being a 
> streaming failure), the node remains in the {{joining}} state, which is fine, but 
> Cassandra also enables the native transport, which makes the overall state 
> inconsistent. This further causes a NullPointerException if auth is enabled 
> on the new node; please find reproducible steps here:
> For example if bootstrap fails due to streaming errors like
> {quote}java.util.concurrent.ExecutionException: 
> org.apache.cassandra.streaming.StreamException: Stream failed
>  at 
> com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:299)
>  ~[guava-18.0.jar:na]
>  at 
> com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:286)
>  ~[guava-18.0.jar:na]
>  at 
> com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116) 
> ~[guava-18.0.jar:na]
>  at 
> org.apache.cassandra.service.StorageService.bootstrap(StorageService.java:1256)
>  [apache-cassandra-3.0.16.jar:3.0.16]
>  at 
> org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:894)
>  [apache-cassandra-3.0.16.jar:3.0.16]
>  at 
> org.apache.cassandra.service.StorageService.initServer(StorageService.java:660)
>  [apache-cassandra-3.0.16.jar:3.0.16]
>  at 
> org.apache.cassandra.service.StorageService.initServer(StorageService.java:573)
>  [apache-cassandra-3.0.16.jar:3.0.16]
>  at 
> org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:330) 
> [apache-cassandra-3.0.16.jar:3.0.16]
>  at 
> org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:567)
>  [apache-cassandra-3.0.16.jar:3.0.16]
>  at 
> org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:695) 
> [apache-cassandra-3.0.16.jar:3.0.16]
>  Caused by: org.apache.cassandra.streaming.StreamException: Stream failed
>  at 
> org.apache.cassandra.streaming.management.StreamEventJMXNotifier.onFailure(StreamEventJMXNotifier.java:85)
>  ~[apache-cassandra-3.0.16.jar:3.0.16]
>  at com.google.common.util.concurrent.Futures$6.run(Futures.java:1310) 
> ~[guava-18.0.jar:na]
>  at 
> com.google.common.util.concurrent.MoreExecutors$DirectExecutor.execute(MoreExecutors.java:457)
>  ~[guava-18.0.jar:na]
>  at 
> com.google.common.util.concurrent.ExecutionList.executeListener(ExecutionList.java:156)
>  ~[guava-18.0.jar:na]
>  at 
> com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:145)
>  ~[guava-18.0.jar:na]
>  at 
> com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:202)
>  ~[guava-18.0.jar:na]
>  at 
> org.apache.cassandra.streaming.StreamResultFuture.maybeComplete(StreamResultFuture.java:211)
>  ~[apache-cassandra-3.0.16.jar:3.0.16]
>  at 
> org.apache.cassandra.streaming.StreamResultFuture.handleSessionComplete(StreamResultFuture.java:187)
>  ~[apache-cassandra-3.0.16.jar:3.0.16]
>  at 
> org.apache.cassandra.streaming.StreamSession.closeSession(StreamSession.java:440)
>  ~[apache-cassandra-3.0.16.jar:3.0.16]
>  at 
> org.apache.cassandra.streaming.StreamSession.onError(StreamSession.java:540) 
> ~[apache-cassandra-3.0.16.jar:3.0.16]
>  at 
> org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:307)
>  ~[apache-cassandra-3.0.16.jar:3.0.16]
>  at 
> org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:79)
>  ~[apache-cassandra-3.0.16.jar:3.0.16]
>  at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_121]
> {quote}
> then variable [StorageService.java::dataAvailable 
> |https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/service/StorageService.java#L892]
>  will be {{false}}. Since {{dataAvailable}} is {{false}} hence it will not 
> call [StorageService.java::finishJoiningRing 
> |https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/service/StorageService.java#L933]
>  and as a result 
> 
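
Condensed into a short, hypothetical sketch (not the real StorageService code), the failure path described above is: streaming fails, so the join never finishes, so auth setup never runs, yet the client transports are still started.
{noformat}
final class BootstrapFlowSketch
{
    private boolean dataAvailable = false; // only true when streaming succeeds
    private boolean authReady = false;

    void joinTokenRing(boolean streamingSucceeded)
    {
        dataAvailable = streamingSucceeded;
        if (dataAvailable)
            finishJoiningRing();           // the only path that ever calls doAuthSetup()
        // on streaming failure we fall through: the node stays "joining", auth never initialised
    }

    private void finishJoiningRing() { doAuthSetup(); }
    private void doAuthSetup()       { authReady = true; }

    void startNativeTransport()
    {
        // The behaviour described above: the transport comes up regardless, so with
        // auth enabled the first authenticated request hits uninitialised auth state.
        if (!authReady)
            System.out.println("client transports started, but auth was never set up");
    }
}
{noformat}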

[jira] [Commented] (CASSANDRA-14526) dtest to validate Cassandra state post failed/successful bootstrap

2019-01-02 Thread Jay Zhuang (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16732692#comment-16732692
 ] 

Jay Zhuang commented on CASSANDRA-14526:


I still see the same failure after updating the branch:
{noformat}
node3.start(jvm_args=["-Dcassandra.write_survey=true", 
"-Dcassandra.ring_delay_ms=5000"], wait_other_notice=True)
self.assert_log_had_msg(node3, 'Some data streaming failed', timeout=30)
>   self.assert_log_had_msg(node3, 'Not starting client transports as 
> bootstrap has not completed', timeout=30)

bootstrap_test.py:767:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
_ _ _ _ _ _ _

self = , node = 
, msg = 'Not starting client transports 
as bootstrap has not completed', timeout = 30, kwargs = {}

def assert_log_had_msg(self, node, msg, timeout=600, **kwargs):
"""
Wrapper for ccmlib.node.Node#watch_log_for to cause an assertion 
failure when a log message isn't found
within the timeout.
:param node: Node which logs we should watch
:param msg: String message we expect to see in the logs.
:param timeout: Seconds to wait for msg to appear
"""
try:
node.watch_log_for(msg, timeout=timeout, **kwargs)
except TimeoutError:
>   pytest.fail("Log message was not seen within 
> timeout:\n{0}".format(msg))
E   Failed: Log message was not seen within timeout:
E   Not starting client transports as bootstrap has not completed

dtest.py:266: Failed
{noformat}

> dtest to validate Cassandra state post failed/successful bootstrap
> --
>
> Key: CASSANDRA-14526
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14526
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: Testing
>Reporter: Jaydeepkumar Chovatia
>Assignee: Jaydeepkumar Chovatia
>Priority: Major
>  Labels: dtest
>
> Please find dtest here:
> || dtest ||
> | [patch 
> |https://github.com/apache/cassandra-dtest/compare/master...jaydeepkumar1984:14526-trunk]|



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-14526) dtest to validate Cassandra state post failed/successful bootstrap

2019-01-01 Thread Jay Zhuang (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16731753#comment-16731753
 ] 

Jay Zhuang edited comment on CASSANDRA-14526 at 1/2/19 4:08 AM:


Seems the new test is failing:
{noformat}
$ pytest --cassandra-dir=/Users/zjay/ws/cassandra 
bootstrap_test.py::TestBootstrap::test_bootstrap_binary_disabled
...
>   self.assert_log_had_msg(node3, 'Not starting client transports as 
> bootstrap has not completed', timeout=30)
...
E   ccmlib.node.TimeoutError: 02 Jan 2019 03:45:28 [node3] 
Missing: ['Not starting client transports as bootstrap has not completed']:
E   INFO  [main] 2019-01-01 19:44:46,816 Config.java:4.
E   See system.log for remainder
...
{noformat}


was (Author: jay.zhuang):
Seems the test is failing for branch 2.2:
{noformat}
$ pytest --cassandra-dir=/Users/zjay/ws/cassandra 
bootstrap_test.py::TestBootstrap::test_bootstrap_binary_disabled
...
E   ccmlib.node.TimeoutError: 02 Jan 2019 03:45:28 [node3] 
Missing: ['Not starting client transports as bootstrap has not completed']:
E   INFO  [main] 2019-01-01 19:44:46,816 Config.java:4.
E   See system.log for remainder
...
{noformat}

> dtest to validate Cassandra state post failed/successful bootstrap
> --
>
> Key: CASSANDRA-14526
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14526
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: Testing
>Reporter: Jaydeepkumar Chovatia
>Assignee: Jaydeepkumar Chovatia
>Priority: Major
>  Labels: dtest
>
> Please find dtest here:
> || dtest ||
> | [patch 
> |https://github.com/apache/cassandra-dtest/compare/master...jaydeepkumar1984:14526-trunk]|



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14526) dtest to validate Cassandra state post failed/successful bootstrap

2019-01-01 Thread Jay Zhuang (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16731753#comment-16731753
 ] 

Jay Zhuang commented on CASSANDRA-14526:


Seems the test is failing for branch 2.2:
{noformat}
$ pytest --cassandra-dir=/Users/zjay/ws/cassandra 
bootstrap_test.py::TestBootstrap::test_bootstrap_binary_disabled
...
E   ccmlib.node.TimeoutError: 02 Jan 2019 03:45:28 [node3] 
Missing: ['Not starting client transports as bootstrap has not completed']:
E   INFO  [main] 2019-01-01 19:44:46,816 Config.java:4.
E   See system.log for remainder
...
{noformat}

> dtest to validate Cassandra state post failed/successful bootstrap
> --
>
> Key: CASSANDRA-14526
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14526
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: Testing
>Reporter: Jaydeepkumar Chovatia
>Assignee: Jaydeepkumar Chovatia
>Priority: Major
>  Labels: dtest
>
> Please find dtest here:
> || dtest ||
> | [patch 
> |https://github.com/apache/cassandra-dtest/compare/master...jaydeepkumar1984:14526-trunk]|



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-14525) streaming failure during bootstrap makes new node into inconsistent state

2018-12-30 Thread Jay Zhuang (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jay Zhuang updated CASSANDRA-14525:
---
Fix Version/s: (was: 4.x)
   3.11.4

> streaming failure during bootstrap makes new node into inconsistent state
> -
>
> Key: CASSANDRA-14525
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14525
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Jaydeepkumar Chovatia
>Assignee: Jaydeepkumar Chovatia
>Priority: Major
> Fix For: 2.2.14, 3.0.18, 3.11.4, 4.0
>
>
> If bootstrap fails for newly joining node (most common reason is due to 
> streaming failure) then Cassandra state remains in {{joining}} state which is 
> fine but Cassandra also enables Native transport which makes overall state 
> inconsistent. This further creates NullPointer exception if auth is enabled 
> on the new node, please find reproducible steps here:
> For example if bootstrap fails due to streaming errors like
> {quote}java.util.concurrent.ExecutionException: 
> org.apache.cassandra.streaming.StreamException: Stream failed
>  at 
> com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:299)
>  ~[guava-18.0.jar:na]
>  at 
> com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:286)
>  ~[guava-18.0.jar:na]
>  at 
> com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116) 
> ~[guava-18.0.jar:na]
>  at 
> org.apache.cassandra.service.StorageService.bootstrap(StorageService.java:1256)
>  [apache-cassandra-3.0.16.jar:3.0.16]
>  at 
> org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:894)
>  [apache-cassandra-3.0.16.jar:3.0.16]
>  at 
> org.apache.cassandra.service.StorageService.initServer(StorageService.java:660)
>  [apache-cassandra-3.0.16.jar:3.0.16]
>  at 
> org.apache.cassandra.service.StorageService.initServer(StorageService.java:573)
>  [apache-cassandra-3.0.16.jar:3.0.16]
>  at 
> org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:330) 
> [apache-cassandra-3.0.16.jar:3.0.16]
>  at 
> org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:567)
>  [apache-cassandra-3.0.16.jar:3.0.16]
>  at 
> org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:695) 
> [apache-cassandra-3.0.16.jar:3.0.16]
>  Caused by: org.apache.cassandra.streaming.StreamException: Stream failed
>  at 
> org.apache.cassandra.streaming.management.StreamEventJMXNotifier.onFailure(StreamEventJMXNotifier.java:85)
>  ~[apache-cassandra-3.0.16.jar:3.0.16]
>  at com.google.common.util.concurrent.Futures$6.run(Futures.java:1310) 
> ~[guava-18.0.jar:na]
>  at 
> com.google.common.util.concurrent.MoreExecutors$DirectExecutor.execute(MoreExecutors.java:457)
>  ~[guava-18.0.jar:na]
>  at 
> com.google.common.util.concurrent.ExecutionList.executeListener(ExecutionList.java:156)
>  ~[guava-18.0.jar:na]
>  at 
> com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:145)
>  ~[guava-18.0.jar:na]
>  at 
> com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:202)
>  ~[guava-18.0.jar:na]
>  at 
> org.apache.cassandra.streaming.StreamResultFuture.maybeComplete(StreamResultFuture.java:211)
>  ~[apache-cassandra-3.0.16.jar:3.0.16]
>  at 
> org.apache.cassandra.streaming.StreamResultFuture.handleSessionComplete(StreamResultFuture.java:187)
>  ~[apache-cassandra-3.0.16.jar:3.0.16]
>  at 
> org.apache.cassandra.streaming.StreamSession.closeSession(StreamSession.java:440)
>  ~[apache-cassandra-3.0.16.jar:3.0.16]
>  at 
> org.apache.cassandra.streaming.StreamSession.onError(StreamSession.java:540) 
> ~[apache-cassandra-3.0.16.jar:3.0.16]
>  at 
> org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:307)
>  ~[apache-cassandra-3.0.16.jar:3.0.16]
>  at 
> org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:79)
>  ~[apache-cassandra-3.0.16.jar:3.0.16]
>  at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_121]
> {quote}
> then variable [StorageService.java::dataAvailable 
> |https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/service/StorageService.java#L892]
>  will be {{false}}. Since {{dataAvailable}} is {{false}} hence it will not 
> call [StorageService.java::finishJoiningRing 
> |https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/service/StorageService.java#L933]
>  and as a result 
> [StorageService.java::doAuthSetup|https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/service/StorageService.java#L999]
>  will not be invoked.
> API 

[jira] [Updated] (CASSANDRA-14525) streaming failure during bootstrap makes new node into inconsistent state

2018-12-30 Thread Jay Zhuang (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jay Zhuang updated CASSANDRA-14525:
---
   Resolution: Fixed
Fix Version/s: (was: 3.0.x)
   (was: 2.2.x)
   4.x
   3.0.18
   2.2.14
   Status: Resolved  (was: Ready to Commit)

> streaming failure during bootstrap makes new node into inconsistent state
> -
>
> Key: CASSANDRA-14525
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14525
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Jaydeepkumar Chovatia
>Assignee: Jaydeepkumar Chovatia
>Priority: Major
> Fix For: 2.2.14, 3.0.18, 4.0, 4.x
>
>
> If bootstrap fails for newly joining node (most common reason is due to 
> streaming failure) then Cassandra state remains in {{joining}} state which is 
> fine but Cassandra also enables Native transport which makes overall state 
> inconsistent. This further creates NullPointer exception if auth is enabled 
> on the new node, please find reproducible steps here:
> For example if bootstrap fails due to streaming errors like
> {quote}java.util.concurrent.ExecutionException: 
> org.apache.cassandra.streaming.StreamException: Stream failed
>  at 
> com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:299)
>  ~[guava-18.0.jar:na]
>  at 
> com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:286)
>  ~[guava-18.0.jar:na]
>  at 
> com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116) 
> ~[guava-18.0.jar:na]
>  at 
> org.apache.cassandra.service.StorageService.bootstrap(StorageService.java:1256)
>  [apache-cassandra-3.0.16.jar:3.0.16]
>  at 
> org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:894)
>  [apache-cassandra-3.0.16.jar:3.0.16]
>  at 
> org.apache.cassandra.service.StorageService.initServer(StorageService.java:660)
>  [apache-cassandra-3.0.16.jar:3.0.16]
>  at 
> org.apache.cassandra.service.StorageService.initServer(StorageService.java:573)
>  [apache-cassandra-3.0.16.jar:3.0.16]
>  at 
> org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:330) 
> [apache-cassandra-3.0.16.jar:3.0.16]
>  at 
> org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:567)
>  [apache-cassandra-3.0.16.jar:3.0.16]
>  at 
> org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:695) 
> [apache-cassandra-3.0.16.jar:3.0.16]
>  Caused by: org.apache.cassandra.streaming.StreamException: Stream failed
>  at 
> org.apache.cassandra.streaming.management.StreamEventJMXNotifier.onFailure(StreamEventJMXNotifier.java:85)
>  ~[apache-cassandra-3.0.16.jar:3.0.16]
>  at com.google.common.util.concurrent.Futures$6.run(Futures.java:1310) 
> ~[guava-18.0.jar:na]
>  at 
> com.google.common.util.concurrent.MoreExecutors$DirectExecutor.execute(MoreExecutors.java:457)
>  ~[guava-18.0.jar:na]
>  at 
> com.google.common.util.concurrent.ExecutionList.executeListener(ExecutionList.java:156)
>  ~[guava-18.0.jar:na]
>  at 
> com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:145)
>  ~[guava-18.0.jar:na]
>  at 
> com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:202)
>  ~[guava-18.0.jar:na]
>  at 
> org.apache.cassandra.streaming.StreamResultFuture.maybeComplete(StreamResultFuture.java:211)
>  ~[apache-cassandra-3.0.16.jar:3.0.16]
>  at 
> org.apache.cassandra.streaming.StreamResultFuture.handleSessionComplete(StreamResultFuture.java:187)
>  ~[apache-cassandra-3.0.16.jar:3.0.16]
>  at 
> org.apache.cassandra.streaming.StreamSession.closeSession(StreamSession.java:440)
>  ~[apache-cassandra-3.0.16.jar:3.0.16]
>  at 
> org.apache.cassandra.streaming.StreamSession.onError(StreamSession.java:540) 
> ~[apache-cassandra-3.0.16.jar:3.0.16]
>  at 
> org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:307)
>  ~[apache-cassandra-3.0.16.jar:3.0.16]
>  at 
> org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:79)
>  ~[apache-cassandra-3.0.16.jar:3.0.16]
>  at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_121]
> {quote}
> then variable [StorageService.java::dataAvailable 
> |https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/service/StorageService.java#L892]
>  will be {{false}}. Since {{dataAvailable}} is {{false}} hence it will not 
> call [StorageService.java::finishJoiningRing 
> |https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/service/StorageService.java#L933]
>  and as a result 
> 

[jira] [Commented] (CASSANDRA-14525) streaming failure during bootstrap makes new node into inconsistent state

2018-12-30 Thread Jay Zhuang (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16731200#comment-16731200
 ] 

Jay Zhuang commented on CASSANDRA-14525:


Thanks [~chovatia.jayd...@gmail.com] and [~KurtG]. Committed as 
[{{a6196a3}}|https://github.com/apache/cassandra/commit/a6196a3a79b67dc6577747e591456328e57c314f].

> streaming failure during bootstrap makes new node into inconsistent state
> -
>
> Key: CASSANDRA-14525
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14525
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Jaydeepkumar Chovatia
>Assignee: Jaydeepkumar Chovatia
>Priority: Major
> Fix For: 4.0, 2.2.x, 3.0.x
>
>
> If bootstrap fails for newly joining node (most common reason is due to 
> streaming failure) then Cassandra state remains in {{joining}} state which is 
> fine but Cassandra also enables Native transport which makes overall state 
> inconsistent. This further creates NullPointer exception if auth is enabled 
> on the new node, please find reproducible steps here:
> For example if bootstrap fails due to streaming errors like
> {quote}java.util.concurrent.ExecutionException: 
> org.apache.cassandra.streaming.StreamException: Stream failed
>  at 
> com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:299)
>  ~[guava-18.0.jar:na]
>  at 
> com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:286)
>  ~[guava-18.0.jar:na]
>  at 
> com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116) 
> ~[guava-18.0.jar:na]
>  at 
> org.apache.cassandra.service.StorageService.bootstrap(StorageService.java:1256)
>  [apache-cassandra-3.0.16.jar:3.0.16]
>  at 
> org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:894)
>  [apache-cassandra-3.0.16.jar:3.0.16]
>  at 
> org.apache.cassandra.service.StorageService.initServer(StorageService.java:660)
>  [apache-cassandra-3.0.16.jar:3.0.16]
>  at 
> org.apache.cassandra.service.StorageService.initServer(StorageService.java:573)
>  [apache-cassandra-3.0.16.jar:3.0.16]
>  at 
> org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:330) 
> [apache-cassandra-3.0.16.jar:3.0.16]
>  at 
> org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:567)
>  [apache-cassandra-3.0.16.jar:3.0.16]
>  at 
> org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:695) 
> [apache-cassandra-3.0.16.jar:3.0.16]
>  Caused by: org.apache.cassandra.streaming.StreamException: Stream failed
>  at 
> org.apache.cassandra.streaming.management.StreamEventJMXNotifier.onFailure(StreamEventJMXNotifier.java:85)
>  ~[apache-cassandra-3.0.16.jar:3.0.16]
>  at com.google.common.util.concurrent.Futures$6.run(Futures.java:1310) 
> ~[guava-18.0.jar:na]
>  at 
> com.google.common.util.concurrent.MoreExecutors$DirectExecutor.execute(MoreExecutors.java:457)
>  ~[guava-18.0.jar:na]
>  at 
> com.google.common.util.concurrent.ExecutionList.executeListener(ExecutionList.java:156)
>  ~[guava-18.0.jar:na]
>  at 
> com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:145)
>  ~[guava-18.0.jar:na]
>  at 
> com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:202)
>  ~[guava-18.0.jar:na]
>  at 
> org.apache.cassandra.streaming.StreamResultFuture.maybeComplete(StreamResultFuture.java:211)
>  ~[apache-cassandra-3.0.16.jar:3.0.16]
>  at 
> org.apache.cassandra.streaming.StreamResultFuture.handleSessionComplete(StreamResultFuture.java:187)
>  ~[apache-cassandra-3.0.16.jar:3.0.16]
>  at 
> org.apache.cassandra.streaming.StreamSession.closeSession(StreamSession.java:440)
>  ~[apache-cassandra-3.0.16.jar:3.0.16]
>  at 
> org.apache.cassandra.streaming.StreamSession.onError(StreamSession.java:540) 
> ~[apache-cassandra-3.0.16.jar:3.0.16]
>  at 
> org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:307)
>  ~[apache-cassandra-3.0.16.jar:3.0.16]
>  at 
> org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:79)
>  ~[apache-cassandra-3.0.16.jar:3.0.16]
>  at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_121]
> {quote}
> then variable [StorageService.java::dataAvailable 
> |https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/service/StorageService.java#L892]
>  will be {{false}}. Since {{dataAvailable}} is {{false}} hence it will not 
> call [StorageService.java::finishJoiningRing 
> |https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/service/StorageService.java#L933]
>  and as a result 
> 

[jira] [Resolved] (CASSANDRA-14946) DistributedReadWritePathTest fails in circleci

2018-12-30 Thread Jay Zhuang (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jay Zhuang resolved CASSANDRA-14946.

Resolution: Duplicate

> DistributedReadWritePathTest fails in circleci
> --
>
> Key: CASSANDRA-14946
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14946
> Project: Cassandra
>  Issue Type: Bug
>  Components: Testing
>Reporter: Jay Zhuang
>Priority: Major
>
> {{org/apache/cassandra/distributed/DistributedReadWritePathTest}} test fails 
> in circleci:
> {noformat}
> [junit] Testcase: 
> org.apache.cassandra.distributed.DistributedReadWritePathTest:coordinatorWrite:
>  Caused an ERROR
> [junit] Forked Java VM exited abnormally. Please note the time in the 
> report does not reflect the time until the VM exit.
> [junit] junit.framework.AssertionFailedError: Forked Java VM exited 
> abnormally. Please note the time in the report does not reflect the time 
> until the VM exit.
> [junit]   at java.util.Vector.forEach(Vector.java:1275)
> [junit]   at java.util.Vector.forEach(Vector.java:1275)
> [junit]   at java.lang.Thread.run(Thread.java:748)
> {noformat}
> The test works locally; it seems the CircleCI container doesn't have enough memory.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14946) DistributedReadWritePathTest fails in circleci

2018-12-30 Thread Jay Zhuang (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16731143#comment-16731143
 ] 

Jay Zhuang commented on CASSANDRA-14946:


Seems it's a duplicate of CASSANDRA-14922.

> DistributedReadWritePathTest fails in circleci
> --
>
> Key: CASSANDRA-14946
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14946
> Project: Cassandra
>  Issue Type: Bug
>  Components: Testing
>Reporter: Jay Zhuang
>Priority: Major
>
> {{org/apache/cassandra/distributed/DistributedReadWritePathTest}} test fails 
> in circleci:
> {noformat}
> [junit] Testcase: 
> org.apache.cassandra.distributed.DistributedReadWritePathTest:coordinatorWrite:
>  Caused an ERROR
> [junit] Forked Java VM exited abnormally. Please note the time in the 
> report does not reflect the time until the VM exit.
> [junit] junit.framework.AssertionFailedError: Forked Java VM exited 
> abnormally. Please note the time in the report does not reflect the time 
> until the VM exit.
> [junit]   at java.util.Vector.forEach(Vector.java:1275)
> [junit]   at java.util.Vector.forEach(Vector.java:1275)
> [junit]   at java.lang.Thread.run(Thread.java:748)
> {noformat}
> The test works locally; it seems the CircleCI container doesn't have enough memory.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14525) streaming failure during bootstrap makes new node into inconsistent state

2018-12-28 Thread Jay Zhuang (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16730522#comment-16730522
 ] 

Jay Zhuang commented on CASSANDRA-14525:


Hi, just a question: {{SystemKeyspace.bootstrapComplete()}} is checked here: 
[https://github.com/apache/cassandra/commit/9c3fb65e697d810321936e06504de4b2f7cf633f#diff-b76a607445d53f18a98c9df14323c7ddR392]

But not here: 
[https://github.com/apache/cassandra/commit/9c3fb65e697d810321936e06504de4b2f7cf633f#diff-b76a607445d53f18a98c9df14323c7ddR351]

Is that expected?

> streaming failure during bootstrap makes new node into inconsistent state
> -
>
> Key: CASSANDRA-14525
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14525
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Jaydeepkumar Chovatia
>Assignee: Jaydeepkumar Chovatia
>Priority: Major
> Fix For: 4.0, 2.2.x, 3.0.x
>
>
> If bootstrap fails for newly joining node (most common reason is due to 
> streaming failure) then Cassandra state remains in {{joining}} state which is 
> fine but Cassandra also enables Native transport which makes overall state 
> inconsistent. This further creates NullPointer exception if auth is enabled 
> on the new node, please find reproducible steps here:
> For example if bootstrap fails due to streaming errors like
> {quote}java.util.concurrent.ExecutionException: 
> org.apache.cassandra.streaming.StreamException: Stream failed
>  at 
> com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:299)
>  ~[guava-18.0.jar:na]
>  at 
> com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:286)
>  ~[guava-18.0.jar:na]
>  at 
> com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116) 
> ~[guava-18.0.jar:na]
>  at 
> org.apache.cassandra.service.StorageService.bootstrap(StorageService.java:1256)
>  [apache-cassandra-3.0.16.jar:3.0.16]
>  at 
> org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:894)
>  [apache-cassandra-3.0.16.jar:3.0.16]
>  at 
> org.apache.cassandra.service.StorageService.initServer(StorageService.java:660)
>  [apache-cassandra-3.0.16.jar:3.0.16]
>  at 
> org.apache.cassandra.service.StorageService.initServer(StorageService.java:573)
>  [apache-cassandra-3.0.16.jar:3.0.16]
>  at 
> org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:330) 
> [apache-cassandra-3.0.16.jar:3.0.16]
>  at 
> org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:567)
>  [apache-cassandra-3.0.16.jar:3.0.16]
>  at 
> org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:695) 
> [apache-cassandra-3.0.16.jar:3.0.16]
>  Caused by: org.apache.cassandra.streaming.StreamException: Stream failed
>  at 
> org.apache.cassandra.streaming.management.StreamEventJMXNotifier.onFailure(StreamEventJMXNotifier.java:85)
>  ~[apache-cassandra-3.0.16.jar:3.0.16]
>  at com.google.common.util.concurrent.Futures$6.run(Futures.java:1310) 
> ~[guava-18.0.jar:na]
>  at 
> com.google.common.util.concurrent.MoreExecutors$DirectExecutor.execute(MoreExecutors.java:457)
>  ~[guava-18.0.jar:na]
>  at 
> com.google.common.util.concurrent.ExecutionList.executeListener(ExecutionList.java:156)
>  ~[guava-18.0.jar:na]
>  at 
> com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:145)
>  ~[guava-18.0.jar:na]
>  at 
> com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:202)
>  ~[guava-18.0.jar:na]
>  at 
> org.apache.cassandra.streaming.StreamResultFuture.maybeComplete(StreamResultFuture.java:211)
>  ~[apache-cassandra-3.0.16.jar:3.0.16]
>  at 
> org.apache.cassandra.streaming.StreamResultFuture.handleSessionComplete(StreamResultFuture.java:187)
>  ~[apache-cassandra-3.0.16.jar:3.0.16]
>  at 
> org.apache.cassandra.streaming.StreamSession.closeSession(StreamSession.java:440)
>  ~[apache-cassandra-3.0.16.jar:3.0.16]
>  at 
> org.apache.cassandra.streaming.StreamSession.onError(StreamSession.java:540) 
> ~[apache-cassandra-3.0.16.jar:3.0.16]
>  at 
> org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:307)
>  ~[apache-cassandra-3.0.16.jar:3.0.16]
>  at 
> org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:79)
>  ~[apache-cassandra-3.0.16.jar:3.0.16]
>  at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_121]
> {quote}
> then variable [StorageService.java::dataAvailable 
> |https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/service/StorageService.java#L892]
>  will be {{false}}. Since {{dataAvailable}} is {{false}} hence it will not 
> call [StorageService.java::finishJoiningRing 
> 

[jira] [Commented] (CASSANDRA-14525) streaming failure during bootstrap makes new node into inconsistent state

2018-12-28 Thread Jay Zhuang (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16730460#comment-16730460
 ] 

Jay Zhuang commented on CASSANDRA-14525:


The uTest failure is because of CASSANDRA-14946. The check conditions are hard to 
read; how about switching them from:
{noformat}
        // We only start transports if bootstrap has completed and we're not in survey mode, OR if we are in
        // survey mode and streaming has completed but we're not using auth.
        // OR if we have not joined the ring yet.
        if (StorageService.instance.hasJoined() &&
            ((!StorageService.instance.isSurveyMode() && !SystemKeyspace.bootstrapComplete()) ||
             (StorageService.instance.isSurveyMode() && StorageService.instance.isBootstrapMode())))
        {
            logger.info("Not starting client transports as bootstrap has not completed");
            return;
        }
        else if (StorageService.instance.hasJoined() &&
                 StorageService.instance.isSurveyMode() &&
                 DatabaseDescriptor.getAuthenticator().requireAuthentication())
        {
            // Auth isn't initialised until we join the ring, so if we're in survey mode auth will always fail.
            logger.info("Not starting client transports as write_survey mode and authentication is enabled");
            return;
        }
{noformat}
to:
{noformat}
        // Do not start the transports if we already joined the ring AND
        //   if we are in survey mode: streaming has not completed or auth is enabled
        //   if we are not in survey mode: bootstrap has not completed
        if (StorageService.instance.hasJoined())
        {
            if (StorageService.instance.isSurveyMode())
            {
                if (StorageService.instance.isBootstrapMode() ||
                    DatabaseDescriptor.getAuthenticator().requireAuthentication())
                {
                    logger.info("Not starting client transports in write_survey mode as it's bootstrapping or auth is enabled");
                    return;
                }
            }
            else
            {
                if (!SystemKeyspace.bootstrapComplete())
                {
                    logger.info("Not starting client transports as bootstrap has not completed");
                    return;
                }
            }
        }
{noformat}

> streaming failure during bootstrap makes new node into inconsistent state
> -
>
> Key: CASSANDRA-14525
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14525
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Jaydeepkumar Chovatia
>Assignee: Jaydeepkumar Chovatia
>Priority: Major
> Fix For: 4.0, 2.2.x, 3.0.x
>
>
> If bootstrap fails for newly joining node (most common reason is due to 
> streaming failure) then Cassandra state remains in {{joining}} state which is 
> fine but Cassandra also enables Native transport which makes overall state 
> inconsistent. This further creates NullPointer exception if auth is enabled 
> on the new node, please find reproducible steps here:
> For example if bootstrap fails due to streaming errors like
> {quote}java.util.concurrent.ExecutionException: 
> org.apache.cassandra.streaming.StreamException: Stream failed
>  at 
> com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:299)
>  ~[guava-18.0.jar:na]
>  at 
> com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:286)
>  ~[guava-18.0.jar:na]
>  at 
> com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116) 
> ~[guava-18.0.jar:na]
>  at 
> org.apache.cassandra.service.StorageService.bootstrap(StorageService.java:1256)
>  [apache-cassandra-3.0.16.jar:3.0.16]
>  at 
> org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:894)
>  [apache-cassandra-3.0.16.jar:3.0.16]
>  at 
> org.apache.cassandra.service.StorageService.initServer(StorageService.java:660)
>  [apache-cassandra-3.0.16.jar:3.0.16]
>  at 
> org.apache.cassandra.service.StorageService.initServer(StorageService.java:573)
>  [apache-cassandra-3.0.16.jar:3.0.16]
>  at 
> org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:330) 
> [apache-cassandra-3.0.16.jar:3.0.16]
>  at 
> org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:567)
>  [apache-cassandra-3.0.16.jar:3.0.16]
>  at 
> org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:695) 
> [apache-cassandra-3.0.16.jar:3.0.16]
>  Caused by: org.apache.cassandra.streaming.StreamException: Stream failed
>  at 
> org.apache.cassandra.streaming.management.StreamEventJMXNotifier.onFailure(StreamEventJMXNotifier.java:85)
>  

[jira] [Created] (CASSANDRA-14946) DistributedReadWritePathTest fails in circleci

2018-12-28 Thread Jay Zhuang (JIRA)
Jay Zhuang created CASSANDRA-14946:
--

 Summary: DistributedReadWritePathTest fails in circleci
 Key: CASSANDRA-14946
 URL: https://issues.apache.org/jira/browse/CASSANDRA-14946
 Project: Cassandra
  Issue Type: Bug
  Components: Testing
Reporter: Jay Zhuang


{{org/apache/cassandra/distributed/DistributedReadWritePathTest}} test fails in 
circleci:
{noformat}
[junit] Testcase: 
org.apache.cassandra.distributed.DistributedReadWritePathTest:coordinatorWrite: 
  Caused an ERROR
[junit] Forked Java VM exited abnormally. Please note the time in the 
report does not reflect the time until the VM exit.
[junit] junit.framework.AssertionFailedError: Forked Java VM exited 
abnormally. Please note the time in the report does not reflect the time until 
the VM exit.
[junit] at java.util.Vector.forEach(Vector.java:1275)
[junit] at java.util.Vector.forEach(Vector.java:1275)
[junit] at java.lang.Thread.run(Thread.java:748)
{noformat}
The test works locally; it seems the CircleCI container doesn't have enough memory.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14525) streaming failure during bootstrap makes new node into inconsistent state

2018-12-21 Thread Jay Zhuang (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16727118#comment-16727118
 ] 

Jay Zhuang commented on CASSANDRA-14525:


Rebased the code and started the tests:
|| Branch || uTest || dTest ||
| [14525-2.2|https://github.com/cooldoger/cassandra/tree/14525-2.2] | [!https://circleci.com/gh/cooldoger/cassandra/tree/14525-2.2.svg?style=svg!|https://circleci.com/gh/cooldoger/cassandra/tree/14525-2.2] | [!https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/664/badge/icon!|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/664/] |
| [14525-3.0|https://github.com/cooldoger/cassandra/tree/14525-3.0] | [!https://circleci.com/gh/cooldoger/cassandra/tree/14525-3.0.svg?style=svg!|https://circleci.com/gh/cooldoger/cassandra/tree/14525-3.0] | [!https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/665/badge/icon!|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/665/] |
| [14525-3.11|https://github.com/cooldoger/cassandra/tree/14525-3.11] | [!https://circleci.com/gh/cooldoger/cassandra/tree/14525-3.11.svg?style=svg!|https://circleci.com/gh/cooldoger/cassandra/tree/14525-3.11] | [!https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/666/badge/icon!|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/666/] |
| [14525-trunk|https://github.com/cooldoger/cassandra/tree/14525-trunk] | [!https://circleci.com/gh/cooldoger/cassandra/tree/14525-trunk.svg?style=svg!|https://circleci.com/gh/cooldoger/cassandra/tree/14525-trunk] | [!https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/667/badge/icon!|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/667/] |

> streaming failure during bootstrap makes new node into inconsistent state
> -
>
> Key: CASSANDRA-14525
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14525
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Jaydeepkumar Chovatia
>Assignee: Jaydeepkumar Chovatia
>Priority: Major
> Fix For: 4.0, 2.2.x, 3.0.x
>
>
> If bootstrap fails for newly joining node (most common reason is due to 
> streaming failure) then Cassandra state remains in {{joining}} state which is 
> fine but Cassandra also enables Native transport which makes overall state 
> inconsistent. This further creates NullPointer exception if auth is enabled 
> on the new node, please find reproducible steps here:
> For example if bootstrap fails due to streaming errors like
> {quote}java.util.concurrent.ExecutionException: 
> org.apache.cassandra.streaming.StreamException: Stream failed
>  at 
> com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:299)
>  ~[guava-18.0.jar:na]
>  at 
> com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:286)
>  ~[guava-18.0.jar:na]
>  at 
> com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116) 
> ~[guava-18.0.jar:na]
>  at 
> org.apache.cassandra.service.StorageService.bootstrap(StorageService.java:1256)
>  [apache-cassandra-3.0.16.jar:3.0.16]
>  at 
> org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:894)
>  [apache-cassandra-3.0.16.jar:3.0.16]
>  at 
> org.apache.cassandra.service.StorageService.initServer(StorageService.java:660)
>  [apache-cassandra-3.0.16.jar:3.0.16]
>  at 
> org.apache.cassandra.service.StorageService.initServer(StorageService.java:573)
>  [apache-cassandra-3.0.16.jar:3.0.16]
>  at 
> org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:330) 
> [apache-cassandra-3.0.16.jar:3.0.16]
>  at 
> org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:567)
>  [apache-cassandra-3.0.16.jar:3.0.16]
>  at 
> org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:695) 
> [apache-cassandra-3.0.16.jar:3.0.16]
>  Caused by: org.apache.cassandra.streaming.StreamException: Stream failed
>  at 
> org.apache.cassandra.streaming.management.StreamEventJMXNotifier.onFailure(StreamEventJMXNotifier.java:85)
>  ~[apache-cassandra-3.0.16.jar:3.0.16]
>  at com.google.common.util.concurrent.Futures$6.run(Futures.java:1310) 
> ~[guava-18.0.jar:na]
>  at 
> com.google.common.util.concurrent.MoreExecutors$DirectExecutor.execute(MoreExecutors.java:457)
>  ~[guava-18.0.jar:na]
>  at 
> com.google.common.util.concurrent.ExecutionList.executeListener(ExecutionList.java:156)
>  ~[guava-18.0.jar:na]
>  at 
> com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:145)
>  ~[guava-18.0.jar:na]
>  at 
> 

[jira] [Updated] (CASSANDRA-14616) cassandra-stress write hangs with default options

2018-12-06 Thread Jay Zhuang (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jay Zhuang updated CASSANDRA-14616:
---
   Resolution: Fixed
Fix Version/s: 4.0
   3.11.4
   3.0.18
   Status: Resolved  (was: Ready to Commit)

Thank you [~Stefania] for the review. Committed as 
[{{bbf7dac}}|https://github.com/apache/cassandra/commit/bbf7dac87cdc41bf8e138a99f630e7a827ad0d98].
 The dTest is committed as 
[{{325ef3f}}|https://github.com/apache/cassandra-dtest/commit/325ef3fa063252e6dad88473613abbd829e8c24d].

> cassandra-stress write hangs with default options
> -
>
> Key: CASSANDRA-14616
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14616
> Project: Cassandra
>  Issue Type: Bug
>  Components: Stress
>Reporter: Chris Lohfink
>Assignee: Jay Zhuang
>Priority: Major
> Fix For: 3.0.18, 3.11.4, 4.0
>
>
> Cassandra stress sits there for an incredibly long time after connecting to JMX. 
> To reproduce: {code}./tools/bin/cassandra-stress write{code}
> If you give it a -n it's not as bad, which is why dtests etc. don't seem to be 
> impacted. Does not occur in the 3.0 branch but does in 3.11 and trunk.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Assigned] (CASSANDRA-14616) cassandra-stress write hangs with default options

2018-12-06 Thread Jay Zhuang (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jay Zhuang reassigned CASSANDRA-14616:
--

Assignee: Jay Zhuang  (was: Jeremy)

> cassandra-stress write hangs with default options
> -
>
> Key: CASSANDRA-14616
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14616
> Project: Cassandra
>  Issue Type: Bug
>  Components: Stress
>Reporter: Chris Lohfink
>Assignee: Jay Zhuang
>Priority: Major
>
> Cassandra stress sits there for an incredibly long time after connecting to JMX. 
> To reproduce: {code}./tools/bin/cassandra-stress write{code}
> If you give it a -n it's not as bad, which is why dtests etc. don't seem to be 
> impacted. Does not occur in the 3.0 branch but does in 3.11 and trunk.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14616) cassandra-stress write hangs with default options

2018-11-14 Thread Jay Zhuang (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16687347#comment-16687347
 ] 

Jay Zhuang commented on CASSANDRA-14616:


The failed utest is caused by CASSANDRA-14891.

> cassandra-stress write hangs with default options
> -
>
> Key: CASSANDRA-14616
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14616
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Chris Lohfink
>Assignee: Jeremy
>Priority: Major
>
> Cassandra stress sits there for an incredibly long time after connecting to JMX. 
> To reproduce: {code}./tools/bin/cassandra-stress write{code}
> If you give it a -n it's not as bad, which is why dtests etc. don't seem to be 
> impacted. Does not occur in the 3.0 branch but does in 3.11 and trunk.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Created] (CASSANDRA-14891) [utest] LegacySSTableTest.testInaccurateSSTableMinMax test failed

2018-11-14 Thread Jay Zhuang (JIRA)
Jay Zhuang created CASSANDRA-14891:
--

 Summary: [utest] LegacySSTableTest.testInaccurateSSTableMinMax 
test failed
 Key: CASSANDRA-14891
 URL: https://issues.apache.org/jira/browse/CASSANDRA-14891
 Project: Cassandra
  Issue Type: Bug
  Components: Testing
Reporter: Jay Zhuang


{noformat}
junit.framework.AssertionFailedError
at 
org.apache.cassandra.db.SinglePartitionSliceCommandTest.getUnfilteredsFromSinglePartition(SinglePartitionSliceCommandTest.java:404)
at 
org.apache.cassandra.io.sstable.LegacySSTableTest.ttestInaccurateSSTableMinMax(LegacySSTableTest.java:323)
{noformat}

Related to CASSANDRA-14861



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Assigned] (CASSANDRA-14616) cassandra-stress write hangs with default options

2018-11-14 Thread Jay Zhuang (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jay Zhuang reassigned CASSANDRA-14616:
--

Assignee: Jeremy Quinn

> cassandra-stress write hangs with default options
> -
>
> Key: CASSANDRA-14616
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14616
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Chris Lohfink
>Assignee: Jeremy Quinn
>Priority: Major
>
> Cassandra stress sits there for an incredibly long time after connecting to JMX. 
> To reproduce: {code}./tools/bin/cassandra-stress write{code}
> If you give it a -n it's not as bad, which is why dtests etc. don't seem to be 
> impacted. Does not occur in the 3.0 branch but does in 3.11 and trunk.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-14616) cassandra-stress write hangs with default options

2018-11-14 Thread Jay Zhuang (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jay Zhuang updated CASSANDRA-14616:
---
Reproduced In: 3.11.0, 4.0
   Status: Patch Available  (was: Open)

> cassandra-stress write hangs with default options
> -
>
> Key: CASSANDRA-14616
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14616
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Chris Lohfink
>Assignee: Jeremy
>Priority: Major
>
> Cassandra stress sits there for an incredibly long time after connecting to JMX. 
> To reproduce: {code}./tools/bin/cassandra-stress write{code}
> If you give it a -n it's not as bad, which is why dtests etc. don't seem to be 
> impacted. Does not occur in the 3.0 branch but does in 3.11 and trunk.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Assigned] (CASSANDRA-14616) cassandra-stress write hangs with default options

2018-11-14 Thread Jay Zhuang (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jay Zhuang reassigned CASSANDRA-14616:
--

Assignee: Jeremy  (was: Jeremy Quinn)

> cassandra-stress write hangs with default options
> -
>
> Key: CASSANDRA-14616
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14616
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Chris Lohfink
>Assignee: Jeremy
>Priority: Major
>
> Cassandra stress sits there for an incredibly long time after connecting to JMX. 
> To reproduce: {code}./tools/bin/cassandra-stress write{code}
> If you give it a -n it's not as bad, which is why dtests etc. don't seem to be 
> impacted. Does not occur in the 3.0 branch but does in 3.11 and trunk.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14616) cassandra-stress write hangs with default options

2018-11-14 Thread Jay Zhuang (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16687300#comment-16687300
 ] 

Jay Zhuang commented on CASSANDRA-14616:


Hi [~Yarnspinner], the fix looks good. I had a similar fix which re-enables 
{{warm-up}} to 50k iterations, as before 
([{{StressAction.java}}|https://github.com/apache/cassandra/commit/6a1b1f26b7174e8c9bf86a96514ab626ce2a4117#diff-fd2f2d2364937fcb1c0d73c8314f1418L90]); 
a rough sketch of the idea is included after the links below.

|Branch|uTest|dTest|
|[14890-3.0|https://github.com/cooldoger/cassandra/tree/14890-3.0]|[!https://circleci.com/gh/cooldoger/cassandra/tree/14890-3.0.svg?style=svg!|https://circleci.com/gh/cooldoger/cassandra/tree/14890-3.0]|[!https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/661/badge/icon!|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/661/]|
|[14890-3.11|https://github.com/cooldoger/cassandra/tree/14890-3.11]|[!https://circleci.com/gh/cooldoger/cassandra/tree/14890-3.11.svg?style=svg!|https://circleci.com/gh/cooldoger/cassandra/tree/14890-3.11]|[!https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/662/badge/icon!|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/662/]|
|[14890-trunk|https://github.com/cooldoger/cassandra/tree/14890-trunk]|[!https://circleci.com/gh/cooldoger/cassandra/tree/14890-trunk.svg?style=svg!|https://circleci.com/gh/cooldoger/cassandra/tree/14890-trunk]|[!https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/663/badge/icon!|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/663/]|

Here is a dTest to reproduce the problem:
|[14890|https://github.com/cooldoger/cassandra-dtest/tree/14890]|
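
For context, here is a minimal sketch of the idea behind that change (hypothetical 
names, not the actual {{StressAction}} code): when no {{-n}} is given, fall back to a 
bounded warm-up instead of zero iterations, so the run never enters the uncertainty 
wait without any prior traffic.
{code}
// Sketch only: hypothetical names, not the actual cassandra-stress sources.
public final class WarmupSketch
{
    // the fixed 50k-iteration warm-up used before CASSANDRA-13773
    private static final int DEFAULT_WARMUP_ITERATIONS = 50_000;

    /** @param fixedOpCount the value of -n, or 0 when the user did not pass it */
    static int warmupIterations(long fixedOpCount)
    {
        if (fixedOpCount <= 0)
            return DEFAULT_WARMUP_ITERATIONS; // no -n: bounded warm-up, not 0 iterations
        // -n given: warm up with a fraction of the requested operations, capped at 50k
        return (int) Math.min(DEFAULT_WARMUP_ITERATIONS, Math.max(1, fixedOpCount / 10));
    }

    public static void main(String[] args)
    {
        System.out.println(warmupIterations(0));          // no -n: 50000
        System.out.println(warmupIterations(1_000_000));  // capped at 50000
        System.out.println(warmupIterations(1_000));      // proportional: 100
    }
}
{code}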

> cassandra-stress write hangs with default options
> -
>
> Key: CASSANDRA-14616
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14616
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Chris Lohfink
>Priority: Major
>
> Cassandra stress sits there for an incredibly long time after connecting to JMX. 
> To reproduce: {code}./tools/bin/cassandra-stress write{code}
> If you give it a -n it's not as bad, which is why dtests etc. don't seem to be 
> impacted. Does not occur in the 3.0 branch but does in 3.11 and trunk.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-14890) cassandra-stress hang for 200 seconds if `n` is not specified

2018-11-14 Thread Jay Zhuang (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jay Zhuang updated CASSANDRA-14890:
---
Resolution: Duplicate
Status: Resolved  (was: Patch Available)

Resolved as a duplicate of CASSANDRA-14616.

> cassandra-stress hang for 200 seconds if `n` is not specified
> -
>
> Key: CASSANDRA-14890
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14890
> Project: Cassandra
>  Issue Type: Bug
>  Components: Stress
>Reporter: Jay Zhuang
>Assignee: Jay Zhuang
>Priority: Minor
>
> If parameter {{n}} is not specified, cassandra-stress will hang (wait) for 
> 200 seconds between warm-up and sending traffic.
> For example, the following command will hang for 200 seconds before sending 
> the traffic:
> {noformat}
> $ ./tools/bin/cassandra-stress write
> ...
> Created keyspaces. Sleeping 1s for propagation.
> Sleeping 2s...
> Warming up WRITE with 0 iterations...
> Failed to connect over JMX; not collecting these stats
> {noformat}
> It's waiting for this: 
> [https://github.com/apache/cassandra/blob/06209037ea56b5a2a49615a99f1542d6ea1b2947/tools/stress/src/org/apache/cassandra/stress/util/Uncertainty.java#L72]
> As there's no warm-up traffic (CASSANDRA-13773), it will wait until:
> {noformat}
> (measurements >= waiter.maxMeasurements)
> {noformat}
> {{maxMeasurements}} is 200 by default:
> [https://github.com/apache/cassandra/blob/06209037ea56b5a2a49615a99f1542d6ea1b2947/tools/stress/src/org/apache/cassandra/stress/settings/SettingsCommand.java#L153]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-14890) cassandra-stress hang for 200 seconds if `n` is not specified

2018-11-14 Thread Jay Zhuang (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jay Zhuang updated CASSANDRA-14890:
---
Summary: cassandra-stress hang for 200 seconds if `n` is not specified  
(was: cassandra-stress hang for 200 seconds if n is not specified)

> cassandra-stress hang for 200 seconds if `n` is not specified
> -
>
> Key: CASSANDRA-14890
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14890
> Project: Cassandra
>  Issue Type: Bug
>  Components: Stress
>Reporter: Jay Zhuang
>Assignee: Jay Zhuang
>Priority: Minor
>
> If parameter {{n}} is not specified, cassandra-stress will hang (wait) for 
> 200 seconds between warm-up and sending traffic.
> For example, the following command will hang for 200 seconds before sending 
> the traffic:
> {noformat}
> $ ./tools/bin/cassandra-stress write
> ...
> Created keyspaces. Sleeping 1s for propagation.
> Sleeping 2s...
> Warming up WRITE with 0 iterations...
> Failed to connect over JMX; not collecting these stats
> {noformat}
> It's waiting for this: 
> [https://github.com/apache/cassandra/blob/06209037ea56b5a2a49615a99f1542d6ea1b2947/tools/stress/src/org/apache/cassandra/stress/util/Uncertainty.java#L72]
> As there's no warm-up traffic (CASSANDRA-13773), it will wait until:
> {noformat}
> (measurements >= waiter.maxMeasurements)
> {noformat}
> {{maxMeasurements}} is 200 by default:
> [https://github.com/apache/cassandra/blob/06209037ea56b5a2a49615a99f1542d6ea1b2947/tools/stress/src/org/apache/cassandra/stress/settings/SettingsCommand.java#L153]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14890) cassandra-stress hang for 200 seconds if n is not specified

2018-11-14 Thread Jay Zhuang (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16687254#comment-16687254
 ] 

Jay Zhuang commented on CASSANDRA-14890:


Here is a patch to re-enable {{warm-up}} if `n` is not set 
([{{StressAction.java}}|https://github.com/apache/cassandra/commit/6a1b1f26b7174e8c9bf86a96514ab626ce2a4117#diff-fd2f2d2364937fcb1c0d73c8314f1418L90]):
| Branch | uTest | dTest |
| [14890-3.0|https://github.com/cooldoger/cassandra/tree/14890-3.0] | 
[!https://circleci.com/gh/cooldoger/cassandra/tree/14890-3.0.svg?style=svg!|https://circleci.com/gh/cooldoger/cassandra/tree/14890-3.0]
 | 
[!https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/661/badge/icon!|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/661/]
 |
| [14890-3.11|https://github.com/cooldoger/cassandra/tree/14890-3.11] | 
[!https://circleci.com/gh/cooldoger/cassandra/tree/14890-3.11.svg?style=svg!|https://circleci.com/gh/cooldoger/cassandra/tree/14890-3.11]
 | 
[!https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/662/badge/icon!|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/662/]
 |
| [14890-trunk|https://github.com/cooldoger/cassandra/tree/14890-trunk] | 
[!https://circleci.com/gh/cooldoger/cassandra/tree/14890-trunk.svg?style=svg!|https://circleci.com/gh/cooldoger/cassandra/tree/14890-trunk]
 | 
[!https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/663/badge/icon!|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/663/]
 |

Here is the dTest to reproduce the problem:
|[14890|https://github.com/cooldoger/cassandra-dtest/tree/14890]|

[~Stefania] would you please review?

> cassandra-stress hang for 200 seconds if n is not specified
> ---
>
> Key: CASSANDRA-14890
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14890
> Project: Cassandra
>  Issue Type: Bug
>  Components: Stress
>Reporter: Jay Zhuang
>Assignee: Jay Zhuang
>Priority: Minor
>
> If parameter {{n}} is not specified, cassandra-stress will hang (wait) for 
> 200 seconds between warm-up and sending traffic.
> For example, the following command will hang for 200 seconds before sending 
> the traffic:
> {noformat}
> $ ./tools/bin/cassandra-stress write
> ...
> Created keyspaces. Sleeping 1s for propagation.
> Sleeping 2s...
> Warming up WRITE with 0 iterations...
> Failed to connect over JMX; not collecting these stats
> {noformat}
> It's waiting for this: 
> [https://github.com/apache/cassandra/blob/06209037ea56b5a2a49615a99f1542d6ea1b2947/tools/stress/src/org/apache/cassandra/stress/util/Uncertainty.java#L72]
> As there's no warm-up traffic (CASSANDRA-13773), it will wait until:
> {noformat}
> (measurements >= waiter.maxMeasurements)
> {noformat}
> {{maxMeasurements}} is 200 by default:
> [https://github.com/apache/cassandra/blob/06209037ea56b5a2a49615a99f1542d6ea1b2947/tools/stress/src/org/apache/cassandra/stress/settings/SettingsCommand.java#L153]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-14890) cassandra-stress hang for 200 seconds if n is not specified

2018-11-14 Thread Jay Zhuang (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16687254#comment-16687254
 ] 

Jay Zhuang edited comment on CASSANDRA-14890 at 11/14/18 10:51 PM:
---

Here is a patch to re-enable {{warm-up}} if `n` is not set 
([{{StressAction.java}}|https://github.com/apache/cassandra/commit/6a1b1f26b7174e8c9bf86a96514ab626ce2a4117#diff-fd2f2d2364937fcb1c0d73c8314f1418L90]):
|Branch|uTest|dTest|
|[14890-3.0|https://github.com/cooldoger/cassandra/tree/14890-3.0]|[!https://circleci.com/gh/cooldoger/cassandra/tree/14890-3.0.svg?style=svg!|https://circleci.com/gh/cooldoger/cassandra/tree/14890-3.0]|[!https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/661/badge/icon!|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/661/]|
|[14890-3.11|https://github.com/cooldoger/cassandra/tree/14890-3.11]|[!https://circleci.com/gh/cooldoger/cassandra/tree/14890-3.11.svg?style=svg!|https://circleci.com/gh/cooldoger/cassandra/tree/14890-3.11]|[!https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/662/badge/icon!|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/662/]|
|[14890-trunk|https://github.com/cooldoger/cassandra/tree/14890-trunk]|[!https://circleci.com/gh/cooldoger/cassandra/tree/14890-trunk.svg?style=svg!|https://circleci.com/gh/cooldoger/cassandra/tree/14890-trunk]|[!https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/663/badge/icon!|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/663/]|

Here is a dTest to reproduce the problem:
|[14890|https://github.com/cooldoger/cassandra-dtest/tree/14890]|

[~Stefania] would you please review?


was (Author: jay.zhuang):
Here is a patch to re-enable {{warm-up}} if `n` is not set 
([{{StressAction.java}}|https://github.com/apache/cassandra/commit/6a1b1f26b7174e8c9bf86a96514ab626ce2a4117#diff-fd2f2d2364937fcb1c0d73c8314f1418L90]):
| Branch | uTest | dTest |
| [14890-3.0|https://github.com/cooldoger/cassandra/tree/14890-3.0] | 
[!https://circleci.com/gh/cooldoger/cassandra/tree/14890-3.0.svg?style=svg!|https://circleci.com/gh/cooldoger/cassandra/tree/14890-3.0]
 | 
[!https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/661/badge/icon!|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/661/]
 |
| [14890-3.11|https://github.com/cooldoger/cassandra/tree/14890-3.11] | 
[!https://circleci.com/gh/cooldoger/cassandra/tree/14890-3.11.svg?style=svg!|https://circleci.com/gh/cooldoger/cassandra/tree/14890-3.11]
 | 
[!https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/662/badge/icon!|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/662/]
 |
| [14890-trunk|https://github.com/cooldoger/cassandra/tree/14890-trunk] | 
[!https://circleci.com/gh/cooldoger/cassandra/tree/14890-trunk.svg?style=svg!|https://circleci.com/gh/cooldoger/cassandra/tree/14890-trunk]
 | 
[!https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/663/badge/icon!|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/663/]
 |

Here is the dTest to reproduce the problem:
|[14890|https://github.com/cooldoger/cassandra-dtest/tree/14890]|

[~Stefania] would you please review?

> cassandra-stress hang for 200 seconds if n is not specified
> ---
>
> Key: CASSANDRA-14890
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14890
> Project: Cassandra
>  Issue Type: Bug
>  Components: Stress
>Reporter: Jay Zhuang
>Assignee: Jay Zhuang
>Priority: Minor
>
> If parameter {{n}} is not specified, cassandra-stress will hang (wait) for 
> 200 seconds between warm-up and sending traffic.
> For example, the following command will hang for 200 seconds before sending 
> the traffic:
> {noformat}
> $ ./tools/bin/cassandra-stress write
> ...
> Created keyspaces. Sleeping 1s for propagation.
> Sleeping 2s...
> Warming up WRITE with 0 iterations...
> Failed to connect over JMX; not collecting these stats
> {noformat}
> It's waiting for this: 
> [https://github.com/apache/cassandra/blob/06209037ea56b5a2a49615a99f1542d6ea1b2947/tools/stress/src/org/apache/cassandra/stress/util/Uncertainty.java#L72]
> As there's no warm-up traffic (CASSANDRA-13773), it will wait until:
> {noformat}
> (measurements >= waiter.maxMeasurements)
> {noformat}
> {{maxMeasurements}} is 200 by default:
> [https://github.com/apache/cassandra/blob/06209037ea56b5a2a49615a99f1542d6ea1b2947/tools/stress/src/org/apache/cassandra/stress/settings/SettingsCommand.java#L153]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (CASSANDRA-14890) cassandra-stress hang for 200 seconds if n is not specified

2018-11-14 Thread Jay Zhuang (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jay Zhuang updated CASSANDRA-14890:
---
Status: Patch Available  (was: Open)

> cassandra-stress hang for 200 seconds if n is not specified
> ---
>
> Key: CASSANDRA-14890
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14890
> Project: Cassandra
>  Issue Type: Bug
>  Components: Stress
>Reporter: Jay Zhuang
>Assignee: Jay Zhuang
>Priority: Minor
>
> If parameter {{n}} is not specified, cassandra-stress will hang (wait) for 
> 200 seconds between warm-up and sending traffic.
> For example, the following command will hang for 200 seconds before sending 
> the traffic:
> {noformat}
> $ ./tools/bin/cassandra-stress write
> ...
> Created keyspaces. Sleeping 1s for propagation.
> Sleeping 2s...
> Warming up WRITE with 0 iterations...
> Failed to connect over JMX; not collecting these stats
> {noformat}
> It's waiting for this: 
> [https://github.com/apache/cassandra/blob/06209037ea56b5a2a49615a99f1542d6ea1b2947/tools/stress/src/org/apache/cassandra/stress/util/Uncertainty.java#L72]
> As there's no warm-up traffic (CASSANDRA-13773), it will wait until:
> {noformat}
> (measurements >= waiter.maxMeasurements)
> {noformat}
> {{maxMeasurements}} is 200 by default:
> [https://github.com/apache/cassandra/blob/06209037ea56b5a2a49615a99f1542d6ea1b2947/tools/stress/src/org/apache/cassandra/stress/settings/SettingsCommand.java#L153]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Created] (CASSANDRA-14890) cassandra-stress hang for 200 seconds if n is not specified

2018-11-13 Thread Jay Zhuang (JIRA)
Jay Zhuang created CASSANDRA-14890:
--

 Summary: cassandra-stress hang for 200 seconds if n is not 
specified
 Key: CASSANDRA-14890
 URL: https://issues.apache.org/jira/browse/CASSANDRA-14890
 Project: Cassandra
  Issue Type: Bug
  Components: Stress
Reporter: Jay Zhuang
Assignee: Jay Zhuang


If parameter {{n}} is not specified, cassandra-stress will hang (wait) for 200 
seconds between warm-up and sending traffic.
For example, the following command will hang for 200 seconds before sending the 
traffic:
{noformat}
$ ./tools/bin/cassandra-stress write
...
Created keyspaces. Sleeping 1s for propagation.
Sleeping 2s...
Warming up WRITE with 0 iterations...
Failed to connect over JMX; not collecting these stats
{noformat}

It's waiting for this: 
[https://github.com/apache/cassandra/blob/06209037ea56b5a2a49615a99f1542d6ea1b2947/tools/stress/src/org/apache/cassandra/stress/util/Uncertainty.java#L72]
As there's no warm-up traffic (CASSANDRA-13773), it will wait until:
{noformat}
(measurements >= waiter.maxMeasurements)
{noformat}
{{maxMeasurements}} is 200 by default:
[https://github.com/apache/cassandra/blob/06209037ea56b5a2a49615a99f1542d6ea1b2947/tools/stress/src/org/apache/cassandra/stress/settings/SettingsCommand.java#L153]
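
To make the 200-second figure concrete, here is a minimal sketch of the wait loop 
described above (hypothetical names and a ~1 s measurement interval are assumptions; 
this is not the actual {{Uncertainty}} code): with zero warm-up operations the 
uncertainty never improves, so the loop only exits once the measurement count reaches 
the 200-measurement cap.
{code}
// Sketch only: hypothetical names, not the actual cassandra-stress Uncertainty class.
public final class UncertaintyWaitSketch
{
    public static void main(String[] args) throws InterruptedException
    {
        int maxMeasurements = 200;          // default cap (SettingsCommand)
        double targetUncertainty = 0.02;
        double uncertainty = Double.NaN;    // never updated: there was no warm-up traffic
        int measurements = 0;

        // NaN < target is always false, so with no traffic the loop only stops at the
        // cap, i.e. roughly 200 one-second ticks before the real traffic starts.
        while (!(uncertainty < targetUncertainty) && measurements < maxMeasurements)
        {
            Thread.sleep(1000);             // assumed measurement interval
            measurements++;
        }
        System.out.println("waited " + measurements + " measurements before sending traffic");
    }
}
{code}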



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14526) dtest to validate Cassandra state post failed/successful bootstrap

2018-10-27 Thread Jay Zhuang (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1241#comment-1241
 ] 

Jay Zhuang commented on CASSANDRA-14526:


Hi [~chovatia.jayd...@gmail.com], the function name is duplicated 
(https://github.com/apache/cassandra-dtest/compare/master...jaydeepkumar1984:14526-trunk#diff-0b30b9f097df89d74be1d1af8205ac7eR707);
 I assume the first one could be removed.

> dtest to validate Cassandra state post failed/successful bootstrap
> --
>
> Key: CASSANDRA-14526
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14526
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Jaydeepkumar Chovatia
>Assignee: Jaydeepkumar Chovatia
>Priority: Major
>  Labels: dtest
>
> Please find dtest here:
> || dtest ||
> | [patch 
> |https://github.com/apache/cassandra-dtest/compare/master...jaydeepkumar1984:14526-trunk]|



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14525) streaming failure during bootstrap makes new node into inconsistent state

2018-10-27 Thread Jay Zhuang (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1220#comment-1220
 ] 

Jay Zhuang commented on CASSANDRA-14525:


Sure, I'll kick off the tests.

> streaming failure during bootstrap makes new node into inconsistent state
> -
>
> Key: CASSANDRA-14525
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14525
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Jaydeepkumar Chovatia
>Assignee: Jaydeepkumar Chovatia
>Priority: Major
> Fix For: 4.0, 2.2.x, 3.0.x
>
>
> If bootstrap fails for a newly joining node (the most common reason is a 
> streaming failure), then the Cassandra state remains in the {{joining}} state, 
> which is fine, but Cassandra also enables the native transport, which makes the 
> overall state inconsistent. This further causes a NullPointerException if auth 
> is enabled on the new node; please find reproducible steps here:
> For example if bootstrap fails due to streaming errors like
> {quote}java.util.concurrent.ExecutionException: 
> org.apache.cassandra.streaming.StreamException: Stream failed
>  at 
> com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:299)
>  ~[guava-18.0.jar:na]
>  at 
> com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:286)
>  ~[guava-18.0.jar:na]
>  at 
> com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116) 
> ~[guava-18.0.jar:na]
>  at 
> org.apache.cassandra.service.StorageService.bootstrap(StorageService.java:1256)
>  [apache-cassandra-3.0.16.jar:3.0.16]
>  at 
> org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:894)
>  [apache-cassandra-3.0.16.jar:3.0.16]
>  at 
> org.apache.cassandra.service.StorageService.initServer(StorageService.java:660)
>  [apache-cassandra-3.0.16.jar:3.0.16]
>  at 
> org.apache.cassandra.service.StorageService.initServer(StorageService.java:573)
>  [apache-cassandra-3.0.16.jar:3.0.16]
>  at 
> org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:330) 
> [apache-cassandra-3.0.16.jar:3.0.16]
>  at 
> org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:567)
>  [apache-cassandra-3.0.16.jar:3.0.16]
>  at 
> org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:695) 
> [apache-cassandra-3.0.16.jar:3.0.16]
>  Caused by: org.apache.cassandra.streaming.StreamException: Stream failed
>  at 
> org.apache.cassandra.streaming.management.StreamEventJMXNotifier.onFailure(StreamEventJMXNotifier.java:85)
>  ~[apache-cassandra-3.0.16.jar:3.0.16]
>  at com.google.common.util.concurrent.Futures$6.run(Futures.java:1310) 
> ~[guava-18.0.jar:na]
>  at 
> com.google.common.util.concurrent.MoreExecutors$DirectExecutor.execute(MoreExecutors.java:457)
>  ~[guava-18.0.jar:na]
>  at 
> com.google.common.util.concurrent.ExecutionList.executeListener(ExecutionList.java:156)
>  ~[guava-18.0.jar:na]
>  at 
> com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:145)
>  ~[guava-18.0.jar:na]
>  at 
> com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:202)
>  ~[guava-18.0.jar:na]
>  at 
> org.apache.cassandra.streaming.StreamResultFuture.maybeComplete(StreamResultFuture.java:211)
>  ~[apache-cassandra-3.0.16.jar:3.0.16]
>  at 
> org.apache.cassandra.streaming.StreamResultFuture.handleSessionComplete(StreamResultFuture.java:187)
>  ~[apache-cassandra-3.0.16.jar:3.0.16]
>  at 
> org.apache.cassandra.streaming.StreamSession.closeSession(StreamSession.java:440)
>  ~[apache-cassandra-3.0.16.jar:3.0.16]
>  at 
> org.apache.cassandra.streaming.StreamSession.onError(StreamSession.java:540) 
> ~[apache-cassandra-3.0.16.jar:3.0.16]
>  at 
> org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:307)
>  ~[apache-cassandra-3.0.16.jar:3.0.16]
>  at 
> org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:79)
>  ~[apache-cassandra-3.0.16.jar:3.0.16]
>  at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_121]
> {quote}
> then variable [StorageService.java::dataAvailable 
> |https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/service/StorageService.java#L892]
>  will be {{false}}. Since {{dataAvailable}} is {{false}}, it will not 
> call [StorageService.java::finishJoiningRing 
> |https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/service/StorageService.java#L933]
>  and as a result 
> [StorageService.java::doAuthSetup|https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/service/StorageService.java#L999]
>  will not be invoked.
> API 
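
(The quoted description is cut off above.) As a rough sketch of the gap it describes, 
under assumed method names rather than the actual {{StorageService}} code: the 
client-facing transport should only come up once the ring join, and therefore the 
auth setup, has completed.
{code}
// Sketch only: hypothetical names, not the actual StorageService code.
public final class BootstrapGuardSketch
{
    static boolean dataAvailable;          // false when bootstrap streaming failed

    static void doAuthSetup()          { System.out.println("auth keyspace initialised"); }
    static void finishJoiningRing()    { doAuthSetup(); System.out.println("joined ring"); }
    static void startNativeTransport() { System.out.println("native transport started"); }

    static void completeJoin()
    {
        if (!dataAvailable)
        {
            // Streaming failed: stay in the joining state and do NOT serve clients,
            // otherwise auth lookups run against an uninitialised auth setup and NPE.
            System.out.println("bootstrap incomplete; native transport not started");
            return;
        }
        finishJoiningRing();
        startNativeTransport();
    }

    public static void main(String[] args)
    {
        dataAvailable = false;             // simulate the streaming failure above
        completeJoin();
    }
}
{code}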

[jira] [Updated] (CASSANDRA-14791) [utest] tests unable to write system tmp directory

2018-10-08 Thread Jay Zhuang (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jay Zhuang updated CASSANDRA-14791:
---
Issue Type: Bug  (was: Task)

> [utest] tests unable to write system tmp directory
> --
>
> Key: CASSANDRA-14791
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14791
> Project: Cassandra
>  Issue Type: Bug
>  Components: Testing
>Reporter: Jay Zhuang
>Assignee: Jay Zhuang
>Priority: Minor
> Fix For: 4.0
>
>
> Some tests are failing from time to time because they cannot write to the 
> {{/tmp/}} directory:
> https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-trunk-test/lastCompletedBuild/testReport/org.apache.cassandra.streaming.compression/CompressedInputStreamTest/testException/
> {noformat}
> java.lang.RuntimeException: java.nio.file.AccessDeniedException: 
> /tmp/na-1-big-Data.db
>   at 
> org.apache.cassandra.io.util.SequentialWriter.openChannel(SequentialWriter.java:119)
>   at 
> org.apache.cassandra.io.util.SequentialWriter.(SequentialWriter.java:152)
>   at 
> org.apache.cassandra.io.util.SequentialWriter.(SequentialWriter.java:141)
>   at 
> org.apache.cassandra.io.compress.CompressedSequentialWriter.(CompressedSequentialWriter.java:82)
>   at 
> org.apache.cassandra.streaming.compression.CompressedInputStreamTest.testCompressedReadWith(CompressedInputStreamTest.java:119)
>   at 
> org.apache.cassandra.streaming.compression.CompressedInputStreamTest.testException(CompressedInputStreamTest.java:78)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.nio.file.AccessDeniedException: /tmp/na-1-big-Data.db
>   at 
> sun.nio.fs.UnixException.translateToIOException(UnixException.java:84)
>   at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
>   at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
>   at 
> sun.nio.fs.UnixFileSystemProvider.newFileChannel(UnixFileSystemProvider.java:177)
>   at java.nio.channels.FileChannel.open(FileChannel.java:287)
>   at java.nio.channels.FileChannel.open(FileChannel.java:335)
>   at 
> org.apache.cassandra.io.util.SequentialWriter.openChannel(SequentialWriter.java:100)
> {noformat}
>  I guess it's because some Jenkins slaves don't have proper permission set. 
> For slave {{cassandra16}}, the tests are fine:
> https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-trunk-test/723/testReport/junit/org.apache.cassandra.streaming.compression/CompressedInputStreamTest/testException/history/



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-14791) [utest] tests unable to write system tmp directory

2018-10-08 Thread Jay Zhuang (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jay Zhuang updated CASSANDRA-14791:
---
Fix Version/s: 4.0

> [utest] tests unable to write system tmp directory
> --
>
> Key: CASSANDRA-14791
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14791
> Project: Cassandra
>  Issue Type: Bug
>  Components: Testing
>Reporter: Jay Zhuang
>Assignee: Jay Zhuang
>Priority: Minor
> Fix For: 4.0
>
>
> Some tests are failing from time to time because they cannot write to the 
> {{/tmp/}} directory:
> https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-trunk-test/lastCompletedBuild/testReport/org.apache.cassandra.streaming.compression/CompressedInputStreamTest/testException/
> {noformat}
> java.lang.RuntimeException: java.nio.file.AccessDeniedException: 
> /tmp/na-1-big-Data.db
>   at 
> org.apache.cassandra.io.util.SequentialWriter.openChannel(SequentialWriter.java:119)
>   at 
> org.apache.cassandra.io.util.SequentialWriter.(SequentialWriter.java:152)
>   at 
> org.apache.cassandra.io.util.SequentialWriter.(SequentialWriter.java:141)
>   at 
> org.apache.cassandra.io.compress.CompressedSequentialWriter.(CompressedSequentialWriter.java:82)
>   at 
> org.apache.cassandra.streaming.compression.CompressedInputStreamTest.testCompressedReadWith(CompressedInputStreamTest.java:119)
>   at 
> org.apache.cassandra.streaming.compression.CompressedInputStreamTest.testException(CompressedInputStreamTest.java:78)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.nio.file.AccessDeniedException: /tmp/na-1-big-Data.db
>   at 
> sun.nio.fs.UnixException.translateToIOException(UnixException.java:84)
>   at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
>   at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
>   at 
> sun.nio.fs.UnixFileSystemProvider.newFileChannel(UnixFileSystemProvider.java:177)
>   at java.nio.channels.FileChannel.open(FileChannel.java:287)
>   at java.nio.channels.FileChannel.open(FileChannel.java:335)
>   at 
> org.apache.cassandra.io.util.SequentialWriter.openChannel(SequentialWriter.java:100)
> {noformat}
>  I guess it's because some Jenkins slaves don't have proper permission set. 
> For slave {{cassandra16}}, the tests are fine:
> https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-trunk-test/723/testReport/junit/org.apache.cassandra.streaming.compression/CompressedInputStreamTest/testException/history/



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-14791) [utest] tests unable to write system tmp directory

2018-10-08 Thread Jay Zhuang (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jay Zhuang updated CASSANDRA-14791:
---
Resolution: Fixed
Status: Resolved  (was: Ready to Commit)

Thanks [~krummas] for the review. Committed as 
[{{73ebd20}}|https://github.com/apache/cassandra/commit/73ebd200c04335624f956e79624cf8494d872f19].

> [utest] tests unable to write system tmp directory
> --
>
> Key: CASSANDRA-14791
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14791
> Project: Cassandra
>  Issue Type: Task
>  Components: Testing
>Reporter: Jay Zhuang
>Assignee: Jay Zhuang
>Priority: Minor
>
> Some tests are failing from time to time because they cannot write to the 
> {{/tmp/}} directory:
> https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-trunk-test/lastCompletedBuild/testReport/org.apache.cassandra.streaming.compression/CompressedInputStreamTest/testException/
> {noformat}
> java.lang.RuntimeException: java.nio.file.AccessDeniedException: 
> /tmp/na-1-big-Data.db
>   at 
> org.apache.cassandra.io.util.SequentialWriter.openChannel(SequentialWriter.java:119)
>   at 
> org.apache.cassandra.io.util.SequentialWriter.(SequentialWriter.java:152)
>   at 
> org.apache.cassandra.io.util.SequentialWriter.(SequentialWriter.java:141)
>   at 
> org.apache.cassandra.io.compress.CompressedSequentialWriter.(CompressedSequentialWriter.java:82)
>   at 
> org.apache.cassandra.streaming.compression.CompressedInputStreamTest.testCompressedReadWith(CompressedInputStreamTest.java:119)
>   at 
> org.apache.cassandra.streaming.compression.CompressedInputStreamTest.testException(CompressedInputStreamTest.java:78)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.nio.file.AccessDeniedException: /tmp/na-1-big-Data.db
>   at 
> sun.nio.fs.UnixException.translateToIOException(UnixException.java:84)
>   at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
>   at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
>   at 
> sun.nio.fs.UnixFileSystemProvider.newFileChannel(UnixFileSystemProvider.java:177)
>   at java.nio.channels.FileChannel.open(FileChannel.java:287)
>   at java.nio.channels.FileChannel.open(FileChannel.java:335)
>   at 
> org.apache.cassandra.io.util.SequentialWriter.openChannel(SequentialWriter.java:100)
> {noformat}
>  I guess it's because some Jenkins slaves don't have proper permission set. 
> For slave {{cassandra16}}, the tests are fine:
> https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-trunk-test/723/testReport/junit/org.apache.cassandra.streaming.compression/CompressedInputStreamTest/testException/history/



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14791) [utest] tests unable to write system tmp directory

2018-10-07 Thread Jay Zhuang (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16641314#comment-16641314
 ] 

Jay Zhuang commented on CASSANDRA-14791:


The root cause of this test failure is not that the {{/tmp/}} directory is not 
writable, but that the unittest-generated tmp files 
{{/tmp/na-1-big-Data.db}} and {{/tmp/na-1-big-CompressionInfo.db}} are not 
deleted after the test. So I guess on these nodes the test was previously run by 
another user, which left tmp files that the current user cannot overwrite. I'm 
able to reproduce the same error message with:
{noformat}
sudo chown root:root /tmp/na-1-big-Data.db
{noformat}
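
For illustration, here is a minimal sketch of the fix idea (hypothetical class name, 
not the committed patch): write the test files into a unique per-run temp directory 
and delete them afterwards, so a file left behind by another user can never block the 
next run.
{code}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch only: not the committed patch.
public final class TempDirSketch
{
    public static void main(String[] args) throws IOException
    {
        Path dir = Files.createTempDirectory("compressed-stream-test-"); // unique per run
        Path data = dir.resolve("na-1-big-Data.db");
        Files.write(data, new byte[0]);               // stand-in for the test sstable
        System.out.println("writing test data to " + data);

        // clean up so nothing is left behind for another user/run to trip over
        Files.deleteIfExists(data);
        Files.deleteIfExists(dir);
    }
}
{code}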

Here is a patch for trunk:
| Branch | uTest |
| [14791|https://github.com/cooldoger/cassandra/tree/14791] | 
[!https://circleci.com/gh/cooldoger/cassandra/tree/14791.svg?style=svg!|https://circleci.com/gh/cooldoger/cassandra/tree/14791]
 |

Passed the tests in Jenkins:
https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-testall/36/testReport/org.apache.cassandra.streaming.compression/CompressedInputStreamTest/

> [utest] tests unable to write system tmp directory
> --
>
> Key: CASSANDRA-14791
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14791
> Project: Cassandra
>  Issue Type: Task
>  Components: Testing
>Reporter: Jay Zhuang
>Priority: Minor
>
> Some tests are failing from time to time because they cannot write to the 
> {{/tmp/}} directory:
> https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-trunk-test/lastCompletedBuild/testReport/org.apache.cassandra.streaming.compression/CompressedInputStreamTest/testException/
> {noformat}
> java.lang.RuntimeException: java.nio.file.AccessDeniedException: 
> /tmp/na-1-big-Data.db
>   at 
> org.apache.cassandra.io.util.SequentialWriter.openChannel(SequentialWriter.java:119)
>   at 
> org.apache.cassandra.io.util.SequentialWriter.(SequentialWriter.java:152)
>   at 
> org.apache.cassandra.io.util.SequentialWriter.(SequentialWriter.java:141)
>   at 
> org.apache.cassandra.io.compress.CompressedSequentialWriter.(CompressedSequentialWriter.java:82)
>   at 
> org.apache.cassandra.streaming.compression.CompressedInputStreamTest.testCompressedReadWith(CompressedInputStreamTest.java:119)
>   at 
> org.apache.cassandra.streaming.compression.CompressedInputStreamTest.testException(CompressedInputStreamTest.java:78)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.nio.file.AccessDeniedException: /tmp/na-1-big-Data.db
>   at 
> sun.nio.fs.UnixException.translateToIOException(UnixException.java:84)
>   at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
>   at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
>   at 
> sun.nio.fs.UnixFileSystemProvider.newFileChannel(UnixFileSystemProvider.java:177)
>   at java.nio.channels.FileChannel.open(FileChannel.java:287)
>   at java.nio.channels.FileChannel.open(FileChannel.java:335)
>   at 
> org.apache.cassandra.io.util.SequentialWriter.openChannel(SequentialWriter.java:100)
> {noformat}
>  I guess it's because some Jenkins slaves don't have proper permission set. 
> For slave {{cassandra16}}, the tests are fine:
> https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-trunk-test/723/testReport/junit/org.apache.cassandra.streaming.compression/CompressedInputStreamTest/testException/history/



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-14791) [utest] tests unable to write system tmp directory

2018-10-07 Thread Jay Zhuang (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jay Zhuang updated CASSANDRA-14791:
---
Assignee: Jay Zhuang
  Status: Patch Available  (was: Open)

> [utest] tests unable to write system tmp directory
> --
>
> Key: CASSANDRA-14791
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14791
> Project: Cassandra
>  Issue Type: Task
>  Components: Testing
>Reporter: Jay Zhuang
>Assignee: Jay Zhuang
>Priority: Minor
>
> Some tests are failing from time to time because they cannot write to the 
> {{/tmp/}} directory:
> https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-trunk-test/lastCompletedBuild/testReport/org.apache.cassandra.streaming.compression/CompressedInputStreamTest/testException/
> {noformat}
> java.lang.RuntimeException: java.nio.file.AccessDeniedException: 
> /tmp/na-1-big-Data.db
>   at 
> org.apache.cassandra.io.util.SequentialWriter.openChannel(SequentialWriter.java:119)
>   at 
> org.apache.cassandra.io.util.SequentialWriter.(SequentialWriter.java:152)
>   at 
> org.apache.cassandra.io.util.SequentialWriter.(SequentialWriter.java:141)
>   at 
> org.apache.cassandra.io.compress.CompressedSequentialWriter.(CompressedSequentialWriter.java:82)
>   at 
> org.apache.cassandra.streaming.compression.CompressedInputStreamTest.testCompressedReadWith(CompressedInputStreamTest.java:119)
>   at 
> org.apache.cassandra.streaming.compression.CompressedInputStreamTest.testException(CompressedInputStreamTest.java:78)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.nio.file.AccessDeniedException: /tmp/na-1-big-Data.db
>   at 
> sun.nio.fs.UnixException.translateToIOException(UnixException.java:84)
>   at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
>   at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
>   at 
> sun.nio.fs.UnixFileSystemProvider.newFileChannel(UnixFileSystemProvider.java:177)
>   at java.nio.channels.FileChannel.open(FileChannel.java:287)
>   at java.nio.channels.FileChannel.open(FileChannel.java:335)
>   at 
> org.apache.cassandra.io.util.SequentialWriter.openChannel(SequentialWriter.java:100)
> {noformat}
>  I guess it's because some Jenkins slaves don't have proper permission set. 
> For slave {{cassandra16}}, the tests are fine:
> https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-trunk-test/723/testReport/junit/org.apache.cassandra.streaming.compression/CompressedInputStreamTest/testException/history/



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14610) Flaky dtest: nodetool_test.TestNodetool.test_describecluster_more_information_three_datacenters

2018-10-06 Thread Jay Zhuang (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16640961#comment-16640961
 ] 

Jay Zhuang commented on CASSANDRA-14610:


I'm unable to reproduce the problem locally. For the failed jobs in Jenkins, it 
seems it's mostly because of a timeout while populating 6 nodes:
{noformat}
Error Message
ccmlib.node.NodeError: Error starting node1.
Stacktrace
self = 

@since('4.0')
def test_describecluster_more_information_three_datacenters(self):
"""
nodetool describecluster should be more informative. It should 
include detailes
for total node count, list of datacenters, RF, number of nodes per 
dc, how many
are down and version(s).
@jira_ticket CASSANDRA-13853
@expected_result This test invokes nodetool describecluster and 
matches the output with the expected one
"""
cluster = self.cluster
>   cluster.populate([2, 3, 1]).start(wait_for_binary_proto=True)
{noformat}

Other tests which require 6 nodes are all marked as 
{{@pytest.mark.resource_intensive}} (so those tests are skipped). I think 
reducing the node count from 6 to 4 should help.

+1 for the patch (it also passed 100 local runs).

> Flaky dtest: 
> nodetool_test.TestNodetool.test_describecluster_more_information_three_datacenters
> ---
>
> Key: CASSANDRA-14610
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14610
> Project: Cassandra
>  Issue Type: Task
>  Components: Testing, Tools
>Reporter: Jason Brown
>Assignee: Marcus Eriksson
>Priority: Minor
>  Labels: dtest
>
> @jay zhuang observed 
> nodetool_test.TestNodetool.test_describecluster_more_information_three_datacenters
>  being flaky in Apache Jenkins. I ran locally and got a different flaky 
> behavior:
> {noformat}
> out_node1_dc3, err, _ = node1_dc3.nodetool('describecluster')
> assert 0 == len(err), err
> >   assert out_node1_dc1 == out_node1_dc3
> E   AssertionError: assert 'Cluster Info...1=3, dc3=1}\n' == 'Cluster 
> Infor...1=3, dc3=1}\n'
> E   Cluster Information:
> E Name: test
> E Snitch: org.apache.cassandra.locator.PropertyFileSnitch
> E DynamicEndPointSnitch: enabled
> E Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
> E Schema versions:
> E fc9ec7cd-80ba-3f27-87af-fc0bafcf7a03: [127.0.0.6, 
> 127.0.0.5, 127.0.0.4, 127.0.0.3, 127.0.0.2, 127.0.0.1]...
> E 
> E ...Full output truncated (26 lines hidden), use '-vv' to show
> 09:58:14,357 ccm DEBUG Log-watching thread exiting.
> ===Flaky Test Report===
> test_describecluster_more_information_three_datacenters failed and was not 
> selected for rerun.
>   
>   assert 'Cluster Info...1=3, dc3=1}\n' == 'Cluster Infor...1=3, dc3=1}\n'
> Cluster Information:
>   Name: test
>   Snitch: org.apache.cassandra.locator.PropertyFileSnitch
>   DynamicEndPointSnitch: enabled
>   Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
>   Schema versions:
>   fc9ec7cd-80ba-3f27-87af-fc0bafcf7a03: [127.0.0.6, 127.0.0.5, 
> 127.0.0.4, 127.0.0.3, 127.0.0.2, 127.0.0.1]...
>   
>   ...Full output truncated (26 lines hidden), use '-vv' to show
>   [ /opt/orig/1/opt/dev/cassandra-dtest/nodetool_test.py:373>]
> ===End Flaky Test Report===
> {noformat}
> As this test is for a patch that was introduced for 4.0, this dtest (should) 
> only be failing on trunk.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-14610) Flaky dtest: nodetool_test.TestNodetool.test_describecluster_more_information_three_datacenters

2018-10-06 Thread Jay Zhuang (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jay Zhuang updated CASSANDRA-14610:
---
Reviewer: Jay Zhuang

> Flaky dtest: 
> nodetool_test.TestNodetool.test_describecluster_more_information_three_datacenters
> ---
>
> Key: CASSANDRA-14610
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14610
> Project: Cassandra
>  Issue Type: Task
>  Components: Testing, Tools
>Reporter: Jason Brown
>Assignee: Marcus Eriksson
>Priority: Minor
>  Labels: dtest
>
> @jay zhuang observed 
> nodetool_test.TestNodetool.test_describecluster_more_information_three_datacenters
>  being flaky in Apache Jenkins. I ran locally and got a different flaky 
> behavior:
> {noformat}
> out_node1_dc3, err, _ = node1_dc3.nodetool('describecluster')
> assert 0 == len(err), err
> >   assert out_node1_dc1 == out_node1_dc3
> E   AssertionError: assert 'Cluster Info...1=3, dc3=1}\n' == 'Cluster 
> Infor...1=3, dc3=1}\n'
> E   Cluster Information:
> E Name: test
> E Snitch: org.apache.cassandra.locator.PropertyFileSnitch
> E DynamicEndPointSnitch: enabled
> E Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
> E Schema versions:
> E fc9ec7cd-80ba-3f27-87af-fc0bafcf7a03: [127.0.0.6, 
> 127.0.0.5, 127.0.0.4, 127.0.0.3, 127.0.0.2, 127.0.0.1]...
> E 
> E ...Full output truncated (26 lines hidden), use '-vv' to show
> 09:58:14,357 ccm DEBUG Log-watching thread exiting.
> ===Flaky Test Report===
> test_describecluster_more_information_three_datacenters failed and was not 
> selected for rerun.
>   
>   assert 'Cluster Info...1=3, dc3=1}\n' == 'Cluster Infor...1=3, dc3=1}\n'
> Cluster Information:
>   Name: test
>   Snitch: org.apache.cassandra.locator.PropertyFileSnitch
>   DynamicEndPointSnitch: enabled
>   Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
>   Schema versions:
>   fc9ec7cd-80ba-3f27-87af-fc0bafcf7a03: [127.0.0.6, 127.0.0.5, 
> 127.0.0.4, 127.0.0.3, 127.0.0.2, 127.0.0.1]...
>   
>   ...Full output truncated (26 lines hidden), use '-vv' to show
>   [ /opt/orig/1/opt/dev/cassandra-dtest/nodetool_test.py:373>]
> ===End Flaky Test Report===
> {noformat}
> As this test is for a patch that was introduced for 4.0, this dtest (should) 
> only be failing on trunk.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-12704) snapshot build never be able to publish to mvn artifactory

2018-09-27 Thread Jay Zhuang (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-12704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jay Zhuang updated CASSANDRA-12704:
---
   Resolution: Fixed
Fix Version/s: 4.0
   Status: Resolved  (was: Ready to Commit)

> snapshot build never be able to publish to mvn artifactory
> --
>
> Key: CASSANDRA-12704
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12704
> Project: Cassandra
>  Issue Type: Bug
>  Components: Packaging
>Reporter: Jay Zhuang
>Assignee: Jay Zhuang
>Priority: Minor
> Fix For: 4.0
>
> Attachments: 12704-trunk.txt
>
>
> {code}
> $ ant publish
> {code}
> works fine when the property "release" is set, which publishes the binaries to 
> the release Artifactory.
> But for the daily snapshot build, if "release" is set, it won't be a snapshot build:
> https://github.com/apache/cassandra/blob/cassandra-3.7/build.xml#L74
> If "release" is not set, it doesn't publish to the snapshot Artifactory:
> https://github.com/apache/cassandra/blob/cassandra-3.7/build.xml#L1888
> I would suggest just removing the "if" check for the "publish" target.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-12704) snapshot build never be able to publish to mvn artifactory

2018-09-27 Thread Jay Zhuang (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-12704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16631220#comment-16631220
 ] 

Jay Zhuang commented on CASSANDRA-12704:


Thanks [~michaelsembwever]. Committed to trunk as 
[{{87a}}|https://github.com/apache/cassandra/commit/87abe7249f7ad8b11235d61e048735bd6d62].

> snapshot build never be able to publish to mvn artifactory
> --
>
> Key: CASSANDRA-12704
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12704
> Project: Cassandra
>  Issue Type: Bug
>  Components: Packaging
>Reporter: Jay Zhuang
>Assignee: Jay Zhuang
>Priority: Minor
> Attachments: 12704-trunk.txt
>
>
> {code}
> $ ant publish
> {code}
> works fine when the property "release" is set, which publishes the binaries to 
> the release Artifactory.
> But for the daily snapshot build, if "release" is set, it won't be a snapshot build:
> https://github.com/apache/cassandra/blob/cassandra-3.7/build.xml#L74
> If "release" is not set, it doesn't publish to the snapshot Artifactory:
> https://github.com/apache/cassandra/blob/cassandra-3.7/build.xml#L1888
> I would suggest just removing the "if" check for the "publish" target.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-12704) snapshot build never be able to publish to mvn artifactory

2018-09-27 Thread Jay Zhuang (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-12704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jay Zhuang updated CASSANDRA-12704:
---
Status: Ready to Commit  (was: Patch Available)

> snapshot build never be able to publish to mvn artifactory
> --
>
> Key: CASSANDRA-12704
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12704
> Project: Cassandra
>  Issue Type: Bug
>  Components: Packaging
>Reporter: Jay Zhuang
>Assignee: Jay Zhuang
>Priority: Minor
> Attachments: 12704-trunk.txt
>
>
> {code}
> $ ant publish
> {code}
> works fine when the property "release" is set, which publishes the binaries to 
> the release Artifactory.
> But for the daily snapshot build, if "release" is set, it won't be a snapshot build:
> https://github.com/apache/cassandra/blob/cassandra-3.7/build.xml#L74
> If "release" is not set, it doesn't publish to the snapshot Artifactory:
> https://github.com/apache/cassandra/blob/cassandra-3.7/build.xml#L1888
> I would suggest just removing the "if" check for the "publish" target.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14791) [utest] tests unable to write system tmp directory

2018-09-27 Thread Jay Zhuang (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630718#comment-16630718
 ] 

Jay Zhuang commented on CASSANDRA-14791:


[~mshuler] talked about the docker option in the last NGCC: 
https://github.com/ngcc/ngcc2017/blob/master/Help_Test_Apache_Cassandra-NGCC_2017.pdf
 . Any idea how we can move forward with this?

> [utest] tests unable to write system tmp directory
> --
>
> Key: CASSANDRA-14791
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14791
> Project: Cassandra
>  Issue Type: Task
>  Components: Testing
>Reporter: Jay Zhuang
>Priority: Minor
>
> Some tests are failing from time to time because they cannot write to the 
> {{/tmp/}} directory:
> https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-trunk-test/lastCompletedBuild/testReport/org.apache.cassandra.streaming.compression/CompressedInputStreamTest/testException/
> {noformat}
> java.lang.RuntimeException: java.nio.file.AccessDeniedException: 
> /tmp/na-1-big-Data.db
>   at 
> org.apache.cassandra.io.util.SequentialWriter.openChannel(SequentialWriter.java:119)
>   at 
> org.apache.cassandra.io.util.SequentialWriter.<init>(SequentialWriter.java:152)
>   at 
> org.apache.cassandra.io.util.SequentialWriter.<init>(SequentialWriter.java:141)
>   at 
> org.apache.cassandra.io.compress.CompressedSequentialWriter.<init>(CompressedSequentialWriter.java:82)
>   at 
> org.apache.cassandra.streaming.compression.CompressedInputStreamTest.testCompressedReadWith(CompressedInputStreamTest.java:119)
>   at 
> org.apache.cassandra.streaming.compression.CompressedInputStreamTest.testException(CompressedInputStreamTest.java:78)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.nio.file.AccessDeniedException: /tmp/na-1-big-Data.db
>   at 
> sun.nio.fs.UnixException.translateToIOException(UnixException.java:84)
>   at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
>   at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
>   at 
> sun.nio.fs.UnixFileSystemProvider.newFileChannel(UnixFileSystemProvider.java:177)
>   at java.nio.channels.FileChannel.open(FileChannel.java:287)
>   at java.nio.channels.FileChannel.open(FileChannel.java:335)
>   at 
> org.apache.cassandra.io.util.SequentialWriter.openChannel(SequentialWriter.java:100)
> {noformat}
>  I guess it's because some Jenkins slaves don't have the proper permissions set. 
> For slave {{cassandra16}}, the tests are fine:
> https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-trunk-test/723/testReport/junit/org.apache.cassandra.streaming.compression/CompressedInputStreamTest/testException/history/



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-12704) snapshot build never be able to publish to mvn artifactory

2018-09-27 Thread Jay Zhuang (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-12704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jay Zhuang updated CASSANDRA-12704:
---
Reviewer: mck

> snapshot build never be able to publish to mvn artifactory
> --
>
> Key: CASSANDRA-12704
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12704
> Project: Cassandra
>  Issue Type: Bug
>  Components: Packaging
>Reporter: Jay Zhuang
>Assignee: Jay Zhuang
>Priority: Minor
> Attachments: 12704-trunk.txt
>
>
> {code}
> $ ant publish
> {code}
> works fine when the property "release" is set, which publishes the binaries to 
> the release Artifactory.
> But for the daily snapshot build, if "release" is set, it won't be a snapshot build:
> https://github.com/apache/cassandra/blob/cassandra-3.7/build.xml#L74
> If "release" is not set, it doesn't publish to the snapshot Artifactory:
> https://github.com/apache/cassandra/blob/cassandra-3.7/build.xml#L1888
> I would suggest simply removing the "if" check from the "publish" target.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-12704) snapshot build never be able to publish to mvn artifactory

2018-09-27 Thread Jay Zhuang (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-12704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630699#comment-16630699
 ] 

Jay Zhuang commented on CASSANDRA-12704:


Do you think the change should go to trunk only, or to the other branches too?
I would prefer the branches from 2.2 onward, so that we can have snapshot 
artifacts for all active branches.

> snapshot build never be able to publish to mvn artifactory
> --
>
> Key: CASSANDRA-12704
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12704
> Project: Cassandra
>  Issue Type: Bug
>  Components: Packaging
>Reporter: Jay Zhuang
>Assignee: Jay Zhuang
>Priority: Minor
> Attachments: 12704-trunk.txt
>
>
> {code}
> $ ant publish
> {code}
> works fine when the property "release" is set, which publishes the binaries to 
> the release Artifactory.
> But for the daily snapshot build, if "release" is set, it won't be a snapshot build:
> https://github.com/apache/cassandra/blob/cassandra-3.7/build.xml#L74
> If "release" is not set, it doesn't publish to the snapshot Artifactory:
> https://github.com/apache/cassandra/blob/cassandra-3.7/build.xml#L1888
> I would suggest simply removing the "if" check from the "publish" target.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14791) [utest] tests unable to write system tmp directory

2018-09-26 Thread Jay Zhuang (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16629010#comment-16629010
 ] 

Jay Zhuang commented on CASSANDRA-14791:


Hi [~mshuler], [~spo...@gmail.com], any idea if there's a permission setting we 
could set for the Jenkins Job/Slave?

> [utest] tests unable to write system tmp directory
> --
>
> Key: CASSANDRA-14791
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14791
> Project: Cassandra
>  Issue Type: Task
>  Components: Testing
>Reporter: Jay Zhuang
>Priority: Minor
>
> Some tests are failing from time to time because they cannot write to the 
> {{/tmp/}} directory:
> https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-trunk-test/lastCompletedBuild/testReport/org.apache.cassandra.streaming.compression/CompressedInputStreamTest/testException/
> {noformat}
> java.lang.RuntimeException: java.nio.file.AccessDeniedException: 
> /tmp/na-1-big-Data.db
>   at 
> org.apache.cassandra.io.util.SequentialWriter.openChannel(SequentialWriter.java:119)
>   at 
> org.apache.cassandra.io.util.SequentialWriter.<init>(SequentialWriter.java:152)
>   at 
> org.apache.cassandra.io.util.SequentialWriter.<init>(SequentialWriter.java:141)
>   at 
> org.apache.cassandra.io.compress.CompressedSequentialWriter.<init>(CompressedSequentialWriter.java:82)
>   at 
> org.apache.cassandra.streaming.compression.CompressedInputStreamTest.testCompressedReadWith(CompressedInputStreamTest.java:119)
>   at 
> org.apache.cassandra.streaming.compression.CompressedInputStreamTest.testException(CompressedInputStreamTest.java:78)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.nio.file.AccessDeniedException: /tmp/na-1-big-Data.db
>   at 
> sun.nio.fs.UnixException.translateToIOException(UnixException.java:84)
>   at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
>   at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
>   at 
> sun.nio.fs.UnixFileSystemProvider.newFileChannel(UnixFileSystemProvider.java:177)
>   at java.nio.channels.FileChannel.open(FileChannel.java:287)
>   at java.nio.channels.FileChannel.open(FileChannel.java:335)
>   at 
> org.apache.cassandra.io.util.SequentialWriter.openChannel(SequentialWriter.java:100)
> {noformat}
>  I guess it's because some Jenkins slaves don't have the proper permissions set. 
> For slave {{cassandra16}}, the tests are fine:
> https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-trunk-test/723/testReport/junit/org.apache.cassandra.streaming.compression/CompressedInputStreamTest/testException/history/



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Created] (CASSANDRA-14791) [utest] tests unable to write system tmp directory

2018-09-25 Thread Jay Zhuang (JIRA)
Jay Zhuang created CASSANDRA-14791:
--

 Summary: [utest] tests unable to write system tmp directory
 Key: CASSANDRA-14791
 URL: https://issues.apache.org/jira/browse/CASSANDRA-14791
 Project: Cassandra
  Issue Type: Task
  Components: Testing
Reporter: Jay Zhuang


Some tests are failing from time to time because they cannot write to the 
{{/tmp/}} directory:
https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-trunk-test/lastCompletedBuild/testReport/org.apache.cassandra.streaming.compression/CompressedInputStreamTest/testException/

{noformat}
java.lang.RuntimeException: java.nio.file.AccessDeniedException: 
/tmp/na-1-big-Data.db
at 
org.apache.cassandra.io.util.SequentialWriter.openChannel(SequentialWriter.java:119)
at 
org.apache.cassandra.io.util.SequentialWriter.<init>(SequentialWriter.java:152)
at 
org.apache.cassandra.io.util.SequentialWriter.<init>(SequentialWriter.java:141)
at 
org.apache.cassandra.io.compress.CompressedSequentialWriter.<init>(CompressedSequentialWriter.java:82)
at 
org.apache.cassandra.streaming.compression.CompressedInputStreamTest.testCompressedReadWith(CompressedInputStreamTest.java:119)
at 
org.apache.cassandra.streaming.compression.CompressedInputStreamTest.testException(CompressedInputStreamTest.java:78)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.nio.file.AccessDeniedException: /tmp/na-1-big-Data.db
at 
sun.nio.fs.UnixException.translateToIOException(UnixException.java:84)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
at 
sun.nio.fs.UnixFileSystemProvider.newFileChannel(UnixFileSystemProvider.java:177)
at java.nio.channels.FileChannel.open(FileChannel.java:287)
at java.nio.channels.FileChannel.open(FileChannel.java:335)
at 
org.apache.cassandra.io.util.SequentialWriter.openChannel(SequentialWriter.java:100)
{noformat}

 I guess it's because some Jenkins slaves don't have the proper permissions set. For 
slave {{cassandra16}}, the tests are fine:
https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-trunk-test/723/testReport/junit/org.apache.cassandra.streaming.compression/CompressedInputStreamTest/testException/history/
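
One possible direction (just a sketch, not necessarily how this ticket will be fixed; the class and method names below are made up) is to stop hard-coding {{/tmp}} in the tests and let JUnit hand each test its own writable scratch directory:

{code}
import java.io.File;

import org.junit.Rule;
import org.junit.Test;
import org.junit.rules.TemporaryFolder;

public class TmpDirTestSketch
{
    // JUnit creates (and cleans up) a scratch directory the test user can always
    // write to, instead of relying on the permissions of the shared /tmp on the slave.
    @Rule
    public TemporaryFolder tmp = new TemporaryFolder();

    @Test
    public void writesToPerTestDirectory() throws Exception
    {
        File dataFile = tmp.newFile("na-1-big-Data.db");
        // ... open the SequentialWriter / FileChannel against dataFile here ...
        org.junit.Assert.assertTrue(dataFile.exists());
    }
}
{code}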



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14497) Add Role login cache

2018-08-30 Thread Jay Zhuang (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16598116#comment-16598116
 ] 

Jay Zhuang commented on CASSANDRA-14497:


Thanks [~beobal] for answering the questions. This is a very useful feature 
that we should try to get into {{4.0}}; we have several clusters seeing high 
QPS on {{system_auth}} because {{canLogin}} is not cached.

Overall the patch looks good to me. I have a few comments, or really just 
questions:
{quote}
Permissions can be grouped together by assigning them to roles, which can then 
be granted to other roles. LOGIN is the way to differentiate these logical 
roles from ones which represent 'real' database users.
{quote}
A {{logical}} role doesn't have a password, right? Can we use that?
{quote}
Both login and superuser privs are part of authz really. 
{quote}
Then if we disable the 
[{{authorizer}}|https://github.com/apache/cassandra/blob/trunk/conf/cassandra.yaml#L112],
 it should not do the login check, right? But the current implementation still 
checks, and the patch loads all the role hierarchy information into the cache 
and then finds the primary role we need: 
https://github.com/beobal/cassandra/commit/cf3965c82cafd31a3b585e19fd6beba9a56b85e5#diff-b13b86e6bacbcc61c6e9f07715f46ed6R109

Maybe my questions are beyond the scope of this ticket. If we just want to add 
the cache with minimal impact, I think the patch looks good.
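
To make the ask more concrete, here is a rough sketch of the kind of caching I mean (all class/field names and the expiry value below are made up; this is not the patch's actual wiring), keyed by role name so repeated connections don't hit {{system_auth}}:

{code}
import java.util.concurrent.TimeUnit;

import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;

public class CanLoginCacheSketch
{
    // Hypothetical: on a miss this reads the role from system_auth once,
    // then serves the flag from memory for the validity period.
    private final LoadingCache<String, Boolean> canLogin =
        CacheBuilder.newBuilder()
                    .expireAfterWrite(2, TimeUnit.SECONDS) // made-up validity period
                    .maximumSize(1000)
                    .build(new CacheLoader<String, Boolean>()
                    {
                        @Override
                        public Boolean load(String roleName)
                        {
                            return fetchCanLoginFromSystemAuth(roleName);
                        }
                    });

    public boolean canLogin(String roleName)
    {
        return canLogin.getUnchecked(roleName);
    }

    // Placeholder for the actual system_auth read.
    private boolean fetchCanLoginFromSystemAuth(String roleName)
    {
        return true;
    }
}
{code}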

> Add Role login cache
> 
>
> Key: CASSANDRA-14497
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14497
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Auth
>Reporter: Jay Zhuang
>Assignee: Sam Tunnicliffe
>Priority: Major
>  Labels: security
> Fix For: 4.0
>
>
> The 
> [{{ClientState.login()}}|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/service/ClientState.java#L313]
>  function is used for all auth message: 
> [{{AuthResponse.java:82}}|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/transport/messages/AuthResponse.java#L82].
>  But the 
> [{{role.canLogin}}|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/auth/CassandraRoleManager.java#L521]
>  information is not cached. So it hits the database every time: 
> [{{CassandraRoleManager.java:407}}|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/auth/CassandraRoleManager.java#L407].
>  For a cluster with lots of new connections, it causes a performance issue. 
> The mitigation for us is to increase the {{system_auth}} replication factor 
> to match the number of nodes, so 
> [{{local_one}}|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/auth/CassandraRoleManager.java#L488]
>  would be very cheap. The P99 dropped immediately, but I don't think it is 
> a good solution.
> I would propose adding {{Role.canLogin}} to the RolesCache to improve 
> auth performance.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-9989) Optimise BTree.build

2018-08-30 Thread Jay Zhuang (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-9989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jay Zhuang updated CASSANDRA-9989:
--
Fix Version/s: (was: 4.x)
   4.0

> Optimise BTree.build
> 
>
> Key: CASSANDRA-9989
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9989
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Benedict
>Assignee: Jay Zhuang
>Priority: Minor
> Fix For: 4.0
>
> Attachments: 9989-trunk.txt
>
>
> BTree.Builder could reduce its copying, and exploit toArray more efficiently, 
> with some work. It's not very important right now because we don't make as 
> much use of its bulk-add methods as we otherwise might, however over time 
> this work will become more useful.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14678) Propose reducing the default value for PasswordAuthenticator number of hashing rounds

2018-08-30 Thread Jay Zhuang (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16597986#comment-16597986
 ] 

Jay Zhuang commented on CASSANDRA-14678:


{quote}
The purpose of hashing with bcrypt is to prevent easy decoding of passwords in 
case of a database leak, partially. A sleep wouldn't help with that, nor is it 
a good idea in general, speaking of DoS.
{quote}
That's true. If the hashed password is leaked, it would be easier to decode.

{quote}
That's fair, although in this case you'd still be sending the plaintext hash 
over unsecure network, which will be sufficient for anyone else to log in by 
intercepting just that.
{quote}
One solution is adding a timestamp to the salt, with the server only accepting a 
hash whose timestamp is within {{timestamp +/- [a configurable time]}}.
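
Roughly what I have in mind, as an illustration only (the hash construction and all names below are made up, and this isn't bcrypt-specific): the client sends a timestamp alongside {{H(password || timestamp)}}, and the server rejects anything outside the window before re-computing the hash:

{code}
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

public class WindowedHashCheckSketch
{
    private static final long MAX_SKEW_MILLIS = 60_000; // the "configurable time", made-up default

    public static boolean verify(String password, long clientTimestamp, byte[] clientHash) throws Exception
    {
        // Reject anything outside timestamp +/- window before doing any hashing work.
        if (Math.abs(System.currentTimeMillis() - clientTimestamp) > MAX_SKEW_MILLIS)
            return false;

        // Recompute H(password || timestamp) and compare in constant time.
        MessageDigest digest = MessageDigest.getInstance("SHA-256");
        digest.update(password.getBytes(StandardCharsets.UTF_8));
        digest.update(Long.toString(clientTimestamp).getBytes(StandardCharsets.UTF_8));
        return MessageDigest.isEqual(digest.digest(), clientHash);
    }
}
{code}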


> Propose reducing the default value for PasswordAuthenticator number of 
> hashing rounds
> -
>
> Key: CASSANDRA-14678
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14678
> Project: Cassandra
>  Issue Type: Wish
>  Components: Auth
>Reporter: Shichao An
>Priority: Major
>
> We saw performance degradation in some of our Cassandra clusters using 
> PasswordAuthenticator. When the clients start connecting to the Cassandra 
> nodes, the CPU load increases, and there is a high chance that the host will 
> be unable to recover from high CPU usage if the clients retry indefinitely at 
> relatively high frequency. In each reconnection, the clients try to initiate 
> auth handshakes, but may fail due to timeouts from the overloaded host, while 
> those extra auth handshakes put even more load on the host, and so on. In our 
> case, the load average can be 1000~3000 on a 32-core host. The 
> host is basically unable to serve any traffic.
> We found it is caused by the slow `BCrypt.checkpw` operation, where the 
> generated salted hash round is 10 because `GENSALT_LOG2_ROUNDS_PROPERTY` 
> defaults to 10, which makes it 2^10 rounds of hashing iterations. I changed the 
> hashing rounds to 4 by overriding the `auth_bcrypt_gensalt_log2_rounds` system 
> property, and it effectively solved the above-mentioned CPU issue.
> It took us some time to nail down the cause of this problem. Shall we reduce 
> the default value of `GENSALT_LOG2_ROUNDS_PROPERTY` to a smaller value than 
> 10? Any suggestions on the tradeoff between performance and cryptographic 
> impact?
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14551) ReplicationAwareTokenAllocator should block bootstrap if no replication number is set

2018-08-30 Thread Jay Zhuang (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16597961#comment-16597961
 ] 

Jay Zhuang commented on CASSANDRA-14551:


Hi [~dikanggu], I just saw your response in the commit:
{quote}
I explicitly made it work with 0 replication factor case in this jira, 
CASSANDRA-12983. This change will break the behavior in that situation, right?
{quote}
Yeah, I see your point. I think the suggestion for that (and other use cases) 
is to use a separate dummy keyspace for {{allocate_tokens_for_keyspace}}, so you 
can have more control over token allocation and the production keyspace. 
Defaulting replication to 1 may not be the right assumption.

> ReplicationAwareTokenAllocator should block bootstrap if no replication 
> number is set
> -
>
> Key: CASSANDRA-14551
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14551
> Project: Cassandra
>  Issue Type: Bug
>  Components: Configuration
>Reporter: Jay Zhuang
>Assignee: Jay Zhuang
>Priority: Minor
>
> We're using 
> [ReplicationAwareTokenAllocator|https://www.datastax.com/dev/blog/token-allocation-algorithm].
>  When bootstrapping a new DC, the tokens are not well distributed. The 
> problem is that the replication number is not set for the new DC before 
> the bootstrap.
> I would suggest blocking the bootstrap if the replication number is not set. It's 
> unsafe to assume the default number of replicas is 1, which also causes the 
> following invalid stats:
> {noformat}
> WARN  [main] 2018-06-29 17:30:55,696 TokenAllocation.java:69 - Replicated 
> node load in datacenter before allocation max NaN min NaN stddev NaN
> WARN  [main] 2018-06-29 17:30:55,696 TokenAllocation.java:70 - Replicated 
> node load in datacenter after allocation max NaN min NaN stddev NaN
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14678) Propose reducing the default value for PasswordAuthenticator number of hashing rounds

2018-08-30 Thread Jay Zhuang (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16597608#comment-16597608
 ] 

Jay Zhuang commented on CASSANDRA-14678:


Should we add a sleep instead of doing hundreds of hashing rounds, which is very 
CPU intensive? That way the node won't be exhausted and is protected from a 
potential DoS attack (someone repeatedly authenticating with an invalid password).
For password protection, it seems {{2^4=16}} rounds of hashing is already good 
enough.
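
For context, a quick throwaway snippet (my own sketch, not project code) showing how the gensalt log-rounds parameter drives the {{BCrypt.checkpw}} cost with the bundled jBCrypt; this is why the default of 10 gets so expensive under reconnect storms:

{code}
import org.mindrot.jbcrypt.BCrypt;

public class BcryptRoundsCost
{
    public static void main(String[] args)
    {
        String password = "cassandra";
        for (int logRounds : new int[]{ 4, 10, 12 })
        {
            String hash = BCrypt.hashpw(password, BCrypt.gensalt(logRounds));
            long start = System.nanoTime();
            BCrypt.checkpw(password, hash); // 2^logRounds hashing iterations
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            System.out.println("log rounds " + logRounds + ": ~" + elapsedMs + " ms per checkpw");
        }
    }
}
{code}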

> Propose reducing the default value for PasswordAuthenticator number of 
> hashing rounds
> -
>
> Key: CASSANDRA-14678
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14678
> Project: Cassandra
>  Issue Type: Wish
>  Components: Auth
>Reporter: Shichao An
>Priority: Major
>
> We saw performance degradation in some of our Cassandra clusters using 
> PasswordAuthenticator. When the clients start connecting to the Cassandra 
> nodes, the CPU load increases, and there is a high chance that the host will 
> be unable to recover from high CPU usage if the clients retry indefinitely at 
> relatively high frequency. In each reconnection, the clients try to initiate 
> auth handshakes, but may fail due to timeouts from the overloaded host, while 
> those extra auth handshakes put even more load on the host, and so on. In our 
> case, the load average can be 1000~3000 on a 32-core host. The 
> host is basically unable to serve any traffic.
> We found it is caused by the slow `BCrypt.checkpw` operation, where the 
> generated salted hash round is 10 because `GENSALT_LOG2_ROUNDS_PROPERTY` 
> defaults to 10, which makes it 2^10 rounds of hashing iterations. I changed the 
> hashing rounds to 4 by overriding the `auth_bcrypt_gensalt_log2_rounds` system 
> property, and it effectively solved the above-mentioned CPU issue.
> It took us some time to nail down the cause of this problem. Shall we reduce 
> the default value of `GENSALT_LOG2_ROUNDS_PROPERTY` to a smaller value than 
> 10? Any suggestions on the tradeoff between performance and cryptographic 
> impact?
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-9989) Optimise BTree.build

2018-08-30 Thread Jay Zhuang (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-9989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jay Zhuang updated CASSANDRA-9989:
--
Resolution: Fixed
Status: Resolved  (was: Ready to Commit)

Thanks [~benedict] for the review. Committed as 
[{{2e59ea8}}|https://github.com/apache/cassandra/commit/2e59ea8c7f21cb11b7ce71a5cdf303a8ed453bc0].

> Optimise BTree.build
> 
>
> Key: CASSANDRA-9989
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9989
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Benedict
>Assignee: Jay Zhuang
>Priority: Minor
> Fix For: 4.x
>
> Attachments: 9989-trunk.txt
>
>
> BTree.Builder could reduce its copying, and exploit toArray more efficiently, 
> with some work. It's not very important right now because we don't make as 
> much use of its bulk-add methods as we otherwise might, however over time 
> this work will become more useful.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-9989) Optimise BTree.Buider

2018-08-29 Thread Jay Zhuang (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-9989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16597080#comment-16597080
 ] 

Jay Zhuang commented on CASSANDRA-9989:
---

I rebased and squashed the commits:
|Branch|uTest|
| [9989-rebased|https://github.com/cooldoger/cassandra/tree/9989-rebased] | 
[!https://circleci.com/gh/cooldoger/cassandra/tree/9989-rebased.svg?style=svg!|https://circleci.com/gh/cooldoger/cassandra/tree/9989-rebased]
 |


> Optimise BTree.Buider
> -
>
> Key: CASSANDRA-9989
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9989
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Benedict
>Assignee: Jay Zhuang
>Priority: Minor
> Fix For: 4.x
>
> Attachments: 9989-trunk.txt
>
>
> BTree.Builder could reduce its copying, and exploit toArray more efficiently, 
> with some work. It's not very important right now because we don't make as 
> much use of its bulk-add methods as we otherwise might, however over time 
> this work will become more useful.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-9989) Optimise BTree.Buider

2018-08-29 Thread Jay Zhuang (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-9989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16596997#comment-16596997
 ] 

Jay Zhuang commented on CASSANDRA-9989:
---

There's a little bit of improvement when we pre-compute the child size and split 
the leftover values across the last 2 nodes: 
[{{9989-2}}|https://github.com/cooldoger/cassandra/tree/9989-2]. Here are my 
benchmark test results (it should only impact large BTree builds):

{noformat}
== calculate child size every round:
 [java] Benchmark                      (dataSize)   Mode  Cnt       Score        Error   Units
 [java] BTreeBuildBench.buildTreeTest           1  thrpt   16  124595.864 ?  10133.336  ops/ms
 [java] BTreeBuildBench.buildTreeTest           2  thrpt   16  120228.601 ?  12859.617  ops/ms
 [java] BTreeBuildBench.buildTreeTest           5  thrpt   16  103881.001 ?   8136.400  ops/ms
 [java] BTreeBuildBench.buildTreeTest          10  thrpt   16   89141.480 ?   7716.011  ops/ms
 [java] BTreeBuildBench.buildTreeTest          20  thrpt   16   67390.602 ?   8057.348  ops/ms
 [java] BTreeBuildBench.buildTreeTest          40  thrpt   16   19633.234 ?   1545.773  ops/ms
 [java] BTreeBuildBench.buildTreeTest         100  thrpt   16   10334.557 ?   1027.898  ops/ms
 [java] BTreeBuildBench.buildTreeTest        1000  thrpt   16    1239.163 ?    173.303  ops/ms
 [java] BTreeBuildBench.buildTreeTest           1  thrpt   16     104.024 ?     12.069  ops/ms
 [java] BTreeBuildBench.buildTreeTest          10  thrpt   16      10.259 ?      1.088  ops/ms

== pre-calculate child size and split the left values to the last 2 nodes:
 [java] Benchmark                      (dataSize)   Mode  Cnt       Score        Error   Units
 [java] BTreeBuildBench.buildTreeTest           1  thrpt   16  122030.330 ?  10528.782  ops/ms
 [java] BTreeBuildBench.buildTreeTest           2  thrpt   16  121939.935 ?  12627.014  ops/ms
 [java] BTreeBuildBench.buildTreeTest           5  thrpt   16  104694.942 ?   9031.935  ops/ms
 [java] BTreeBuildBench.buildTreeTest          10  thrpt   16   87687.949 ?   9029.432  ops/ms
 [java] BTreeBuildBench.buildTreeTest          20  thrpt   16   67941.722 ?   7099.874  ops/ms
 [java] BTreeBuildBench.buildTreeTest          40  thrpt   16   19468.380 ?   1640.993  ops/ms
 [java] BTreeBuildBench.buildTreeTest         100  thrpt   16   10503.954 ?    980.228  ops/ms
 [java] BTreeBuildBench.buildTreeTest        1000  thrpt   16    1374.558 ?    167.329  ops/ms
 [java] BTreeBuildBench.buildTreeTest           1  thrpt   16     111.364 ?      8.896  ops/ms
 [java] BTreeBuildBench.buildTreeTest          10  thrpt   16      10.728 ?      1.107  ops/ms
{noformat}

I would prefer the clearer code.

> Optimise BTree.Buider
> -
>
> Key: CASSANDRA-9989
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9989
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Benedict
>Assignee: Jay Zhuang
>Priority: Minor
> Fix For: 4.x
>
> Attachments: 9989-trunk.txt
>
>
> BTree.Builder could reduce its copying, and exploit toArray more efficiently, 
> with some work. It's not very important right now because we don't make as 
> much use of its bulk-add methods as we otherwise might, however over time 
> this work will become more useful.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-9989) Optimise BTree.Buider

2018-08-29 Thread Jay Zhuang (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-9989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16596957#comment-16596957
 ] 

Jay Zhuang commented on CASSANDRA-9989:
---

Nice catch. I updated the branch to split the leftover values across the last 
child nodes. Please review again:

|branch|[9989|https://github.com/cooldoger/cassandra/commits/9989]|

> Optimise BTree.Buider
> -
>
> Key: CASSANDRA-9989
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9989
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Benedict
>Assignee: Jay Zhuang
>Priority: Minor
> Fix For: 4.x
>
> Attachments: 9989-trunk.txt
>
>
> BTree.Builder could reduce its copying, and exploit toArray more efficiently, 
> with some work. It's not very important right now because we don't make as 
> much use of its bulk-add methods as we otherwise might, however over time 
> this work will become more useful.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-9989) Optimise BTree.Buider

2018-08-28 Thread Jay Zhuang (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-9989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16595399#comment-16595399
 ] 

Jay Zhuang commented on CASSANDRA-9989:
---

Thanks [~benedict].

{quote}
1. It might be nice to rename pos to index for consistency with indexOffsets
{quote}
Changed.

{quote}
2. It might be nicer to split even more evenly, as far as possible - if only 
from a code perspective. The situation you're accounting for of a single key in 
the final child could be resolved by decrementing some K from every other node, 
I think.
{quote}
Makes sense to me. It could be done with:
{noformat}
childSize = (size - 1) / childNum;
{noformat}
instead of
{noformat}
childSize = size / childNum;
{noformat}
The code is also simpler.

Please review:
|branch|[9989|https://github.com/cooldoger/cassandra/commits/9989]|
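
Purely as an arithmetic sanity check (not the builder logic itself): the two formulas only disagree when {{size}} is an exact multiple of {{childNum}}, where {{(size - 1) / childNum}} comes out one smaller. A throwaway snippet to see it:

{code}
public class ChildSizeFormulaCheck
{
    public static void main(String[] args)
    {
        int childNum = 4; // example fan-out, arbitrary for illustration
        for (int size = 1; size <= 20; size++)
        {
            int oldChildSize = size / childNum;
            int newChildSize = (size - 1) / childNum;
            if (oldChildSize != newChildSize)
                System.out.println("size=" + size + ": size/childNum=" + oldChildSize
                                   + ", (size-1)/childNum=" + newChildSize);
        }
    }
}
{code}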

> Optimise BTree.Buider
> -
>
> Key: CASSANDRA-9989
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9989
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Benedict
>Assignee: Jay Zhuang
>Priority: Minor
> Fix For: 4.x
>
> Attachments: 9989-trunk.txt
>
>
> BTree.Builder could reduce its copying, and exploit toArray more efficiently, 
> with some work. It's not very important right now because we don't make as 
> much use of its bulk-add methods as we otherwise might, however over time 
> this work will become more useful.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-9989) Optimise BTree.Buider

2018-08-27 Thread Jay Zhuang (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-9989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16594239#comment-16594239
 ] 

Jay Zhuang commented on CASSANDRA-9989:
---

Sorry, I misunderstood the last part. I updated the patch to evenly split the 
values across all child nodes. TREE_SIZE now starts from index 0 instead of 1, 
and {{left}} is replaced with an incrementing counter. Please review:
|branch | [9989|https://github.com/cooldoger/cassandra/tree/9989]|

{{LongBTreeTest.java}} is timing out even on trunk; I'm increasing the timeout and 
running through the test.

> Optimise BTree.Buider
> -
>
> Key: CASSANDRA-9989
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9989
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Benedict
>Assignee: Jay Zhuang
>Priority: Minor
> Fix For: 4.x
>
> Attachments: 9989-trunk.txt
>
>
> BTree.Builder could reduce its copying, and exploit toArray more efficiently, 
> with some work. It's not very important right now because we don't make as 
> much use of its bulk-add methods as we otherwise might, however over time 
> this work will become more useful.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-9989) Optimise BTree.Buider

2018-08-23 Thread Jay Zhuang (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-9989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16591169#comment-16591169
 ] 

Jay Zhuang commented on CASSANDRA-9989:
---

Sorry, I missed {{point 1.}} (I thought it was something else), but you already 
got it:
{noformat}
     int childrenNum = (size + childSize + 1) / (childSize + 1)
==>  int childrenNum = size / (childSize + 1) + (childSize + 1) / (childSize + 1)
==>  int childrenNum = size / (childSize + 1) + 1
{noformat}
We can make it clearer as: {{(size / (childSize + 1)) + 1}}.
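
A tiny throwaway check of that identity (adding exactly one divisor to the numerator bumps the integer quotient by exactly one), just to convince ourselves the simplification is safe:

{code}
public class ChildrenNumIdentityCheck
{
    public static void main(String[] args)
    {
        // Brute-force check: (size + childSize + 1) / (childSize + 1) == size / (childSize + 1) + 1
        for (int size = 0; size <= 10_000; size++)
        {
            for (int childSize = 1; childSize <= 64; childSize++)
            {
                int original = (size + childSize + 1) / (childSize + 1);
                int simplified = size / (childSize + 1) + 1;
                if (original != simplified)
                    throw new AssertionError("mismatch at size=" + size + ", childSize=" + childSize);
            }
        }
        System.out.println("identity holds");
    }
}
{code}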

 
{quote}I think it might be nicer in this case to make the TREE_SIZE logically 
more obvious, so that its accessors (which are rather more complicated) are 
simplified, rather than its calculation? I don't think this is very tricky 
anyway - just set TREE_SIZE[0] = FAN_FACTOR, and leave the loop as it is, I 
think?
{quote}
It's also used here: 
[{{TREE_SIZE[level-2]}}|https://github.com/cooldoger/cassandra/commit/8369dc8b7be3ccf8d1972e9c8cff95adb3493005#diff-4b911b7d0959c6219175e2349968f3cdR196],
 which would need to be changed to {{int grandchildSize = level == 1 ? 0 : 
TREE_SIZE[level - 2];}}. I prefer avoiding this check while building every node 
(I'll add a comment that the leaf node is level 1). But I'm fine with either way.

 
{quote}I think this is also more easily done by an incrementing counter, rather 
than decrementing?
{quote}
Sure, I'll update the patch for that.

> Optimise BTree.Buider
> -
>
> Key: CASSANDRA-9989
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9989
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Benedict
>Assignee: Jay Zhuang
>Priority: Minor
> Fix For: 4.x
>
> Attachments: 9989-trunk.txt
>
>
> BTree.Builder could reduce its copying, and exploit toArray more efficiently, 
> with some work. It's not very important right now because we don't make as 
> much use of its bulk-add methods as we otherwise might, however over time 
> this work will become more useful.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-9989) Optimise BTree.Buider

2018-08-23 Thread Jay Zhuang (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-9989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16590983#comment-16590983
 ] 

Jay Zhuang commented on CASSANDRA-9989:
---

[~benedict], thank you very much for the review.

I updated the branch based on your comments: [{{branch: 
9989}}|https://github.com/cooldoger/cassandra/commits/9989].
{quote}1. How did you arrive at your childrenNum calculation, and are we 
certain it is correct? This is pretty critical for correctness, and hard to 
test fully, so it would be nice to have some comments justifying it.
 4. It would be nice if we removed MAX_DEPTH, and just truncated TREE_SIZE to 
the correct maximum in our static block
{quote}
Fixed. Now it auto-calculates the maximum height of the tree that we can build.
{quote}2. Why decrement left instead of just counting up the number of values 
written?
{quote}
It's used to update 
[{{indexOffsets\[i\]}}|https://github.com/cooldoger/cassandra/commit/0c1a9d11d6540ac7b233c400e0d8b1a56e647d5f#diff-4b911b7d0959c6219175e2349968f3cdR179].
{quote}3. Why is TREE_SIZE indexed from 1, not 0?
{quote}
Just to make the initial calculation easier: 
[{{TREE_SIZE\[i-1\]}}|https://github.com/cooldoger/cassandra/commit/0c1a9d11d6540ac7b233c400e0d8b1a56e647d5f#diff-4b911b7d0959c6219175e2349968f3cdR84].
 With the new patch, it's also used here: to get 
[{{grandchildSize}}|https://github.com/cooldoger/cassandra/commit/8369dc8b7be3ccf8d1972e9c8cff95adb3493005#diff-4b911b7d0959c6219175e2349968f3cdR196].
{quote}I'm also torn on the splitting of the last two nodes - this is 
consistent with the current NodeBuilder logic, but it does complicate the code 
a little versus evenly splitting the remaining size amongst all the children.
{quote}
I was thinking of making the tree a little more balanced by splitting the values 
equally across the last 2 nodes. But yes, it also makes sense to keep it the same 
as before. I updated the code and added a unit test to make sure the BTree is 
[exactly the 
same|https://github.com/cooldoger/cassandra/commit/8369dc8b7be3ccf8d1972e9c8cff95adb3493005#diff-cb7b127243f861292899bad7305217dbR592]
 as before (with 
{{[NodeBuilder()|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/utils/btree/NodeBuilder.java]}}).

> Optimise BTree.Buider
> -
>
> Key: CASSANDRA-9989
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9989
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Benedict
>Assignee: Jay Zhuang
>Priority: Minor
> Fix For: 4.x
>
> Attachments: 9989-trunk.txt
>
>
> BTree.Builder could reduce its copying, and exploit toArray more efficiently, 
> with some work. It's not very important right now because we don't make as 
> much use of its bulk-add methods as we otherwise might, however over time 
> this work will become more useful.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-14596) [dtest] test_mutation_v5 - write_failures_test.TestWriteFailures

2018-08-10 Thread Jay Zhuang (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jay Zhuang updated CASSANDRA-14596:
---
Resolution: Fixed
Status: Resolved  (was: Ready to Commit)

Thanks [~jasobrown] for the review. Committed as 
[{{2572ddc}}|https://github.com/apache/cassandra-dtest/commit/2572ddce6c9a33ae81e1543195bfae084f835d6d].

> [dtest] test_mutation_v5 - write_failures_test.TestWriteFailures
> 
>
> Key: CASSANDRA-14596
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14596
> Project: Cassandra
>  Issue Type: Bug
>  Components: Testing
>Reporter: Jason Brown
>Assignee: Jay Zhuang
>Priority: Minor
>  Labels: dtest
>
> dtest fails with the following pytest error:
> {noformat}
> s = b'\x00\x00'
> >   unpack = lambda s: packer.unpack(s)[0]
> E   struct.error: unpack requires a buffer of 4 bytes
> {noformat}
> Test fails on 3.11 (was introduced for 3.10), but succeeds on trunk



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Assigned] (CASSANDRA-14596) [dtest] test_mutation_v5 - write_failures_test.TestWriteFailures

2018-08-01 Thread Jay Zhuang (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jay Zhuang reassigned CASSANDRA-14596:
--

Assignee: Jay Zhuang

> [dtest] test_mutation_v5 - write_failures_test.TestWriteFailures
> 
>
> Key: CASSANDRA-14596
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14596
> Project: Cassandra
>  Issue Type: Bug
>  Components: Testing
>Reporter: Jason Brown
>Assignee: Jay Zhuang
>Priority: Minor
>  Labels: dtest
>
> dtest fails with the following pytest error:
> {noformat}
> s = b'\x00\x00'
> >   unpack = lambda s: packer.unpack(s)[0]
> E   struct.error: unpack requires a buffer of 4 bytes
> {noformat}
> Test fails on 3.11 (was introduced for 3.10), but succeeds on trunk



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org


