[jira] [Commented] (CASSANDRA-15708) Fix in-jvm upgrade dtests

2020-04-08 Thread Marcus Eriksson (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17078943#comment-17078943
 ] 

Marcus Eriksson commented on CASSANDRA-15708:
-

thanks, created CASSANDRA-15709 for the test failure

> Fix in-jvm upgrade dtests
> -
>
> Key: CASSANDRA-15708
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15708
> Project: Cassandra
> Issue Type: Bug
> Components: Test/dtest
> Reporter: Marcus Eriksson
> Assignee: Marcus Eriksson
> Priority: Normal
>
> In-jvm upgrade dtests were broken by CASSANDRA-15539



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Created] (CASSANDRA-15709) SEPExecutorTest changingMaxWorkersMeetsConcurrencyGoalsTest failure

2020-04-08 Thread Marcus Eriksson (Jira)
Marcus Eriksson created CASSANDRA-15709:
---

 Summary: SEPExecutorTest changingMaxWorkersMeetsConcurrencyGoalsTest failure
 Key: CASSANDRA-15709
 URL: https://issues.apache.org/jira/browse/CASSANDRA-15709
 Project: Cassandra
 Issue Type: Bug
 Reporter: Marcus Eriksson


{code}
[junit-timeout] Testsuite: org.apache.cassandra.concurrent.SEPExecutorTest
[junit-timeout] Testsuite: org.apache.cassandra.concurrent.SEPExecutorTest Tests run: 2, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 7.656 sec
[junit-timeout] 
[junit-timeout] Testcase: changingMaxWorkersMeetsConcurrencyGoalsTest(org.apache.cassandra.concurrent.SEPExecutorTest): FAILED
[junit-timeout] expected: but was:
[junit-timeout] junit.framework.AssertionFailedError: expected: but was:
[junit-timeout]     at org.apache.cassandra.concurrent.SEPExecutorTest.assertMaxTaskConcurrency(SEPExecutorTest.java:180)
[junit-timeout]     at org.apache.cassandra.concurrent.SEPExecutorTest.changingMaxWorkersMeetsConcurrencyGoalsTest(SEPExecutorTest.java:110)
[junit-timeout] 
[junit-timeout] 
[junit-timeout] Test org.apache.cassandra.concurrent.SEPExecutorTest FAILED
{code}






[jira] [Commented] (CASSANDRA-15686) Improvements in circle CI default config

2020-04-08 Thread David Capwell (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17078863#comment-17078863
 ] 

David Capwell commented on CASSANDRA-15686:
---

Here is the script I used to get the container CPUs

{code}
$ cat ci/get_cpu_count
#!/usr/bin/env bash

#set -o xtrace
set -o errexit
set -o pipefail
set -o nounset

if [ -d /sys/fs/cgroup/cpu ]; then
  quota=$(cat /sys/fs/cgroup/cpu/cpu.cfs_quota_us)
  period=$(cat /sys/fs/cgroup/cpu/cpu.cfs_period_us)
  shares=$(cat /sys/fs/cgroup/cpu/cpu.shares)
  if [ $quota -gt -1 ] && [ $period -gt 0 ]; then
    echo $(( $quota / $period  ))
    exit
  elif [ $shares -gt -1 ] && [ $shares -ne 1024 ]; then
    # in docker shares default to 1024 so double check for that default
    awk "
    function ceil(x) {
      y=int(x);
      return ( x>y ? y+1 : y )
    }
    BEGIN { print ceil($shares / 1024.0) }
    "
    exit
  fi
  fi
fi

nproc
{code}
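
For anyone who wants to reason about the branches without a container at hand, here is a rough Java sketch of the same decision order as the script above. The class name, method name and the fallback parameter are made up for illustration (the fallback stands in for nproc); this is not code from the Cassandra build.

```java
final class CpuCountSketch
{
    /**
     * Mirrors the branch order of the shell script: CFS quota first,
     * then cpu.shares (unless it is the Docker default of 1024),
     * then a caller-supplied fallback standing in for nproc.
     */
    static int cpuCount(long quotaUs, long periodUs, long shares, int fallback)
    {
        if (quotaUs > -1 && periodUs > 0)
            return (int) (quotaUs / periodUs); // integer division, like $(( quota / period ))
        if (shares > -1 && shares != 1024)
            return (int) Math.ceil(shares / 1024.0); // same rounding as the awk ceil()
        return fallback;
    }

    public static void main(String[] args)
    {
        // availableProcessors() is only an assumed stand-in for nproc here
        int fallback = Runtime.getRuntime().availableProcessors();
        System.out.println(cpuCount(400_000, 100_000, 1024, fallback));
        System.out.println(cpuCount(-1, 100_000, 2048, fallback));
        System.out.println(cpuCount(-1, 100_000, 1024, fallback));
    }
}
```

Note that the quota branch rounds down, so a quota of 1.5 CPUs yields 1 and a quota smaller than one period yields 0.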

> Improvements in circle CI default config
> 
>
> Key: CASSANDRA-15686
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15686
> Project: Cassandra
> Issue Type: Bug
> Components: Build
> Reporter: Kevin Gallardo
> Priority: Normal
>
> I have been looking at and playing around with the [default CircleCI 
> config|https://github.com/apache/cassandra/blob/trunk/.circleci/config.yml] 
> and have a few comments/questions on the following topics:
>  * Python dtests do not run successfully (200-300 failures) on {{medium}} 
> instances, they seem to only run with small flaky failures on {{large}} 
> instances or higher
>  * Python Upgrade tests:
>  ** Do not seem to run without many failures on any instance types / any 
> parallelism setting
>  ** Do not seem to parallelize well, it seems each container is going to 
> download multiple C* versions
>  ** Additionally it seems the configuration is not up to date, as currently 
> we get errors because {{JAVA8_HOME}} is not set
>  * Unit tests do not seem to parallelize optimally: the number of test runners 
> does not reflect the available CPUs on the container. Ideally, if # of runners 
> == # of CPUs, build time improves on any type of instance.
>  ** For instance, with the current configuration on medium instances, the 
> build uses 1 junit test runner, but 2 CPUs are available. With 2 runners, the 
> build time is reduced from 19min (at the current main config of 
> parallelism=4) to 12min.
>  * There are some typos in the file: some dtests say "Run Unit Tests" but 
> they are JVM dtests (see 
> [here|https://github.com/apache/cassandra/blob/trunk/.circleci/config.yml#L1077],
>  
> [here|https://github.com/apache/cassandra/blob/trunk/.circleci/config.yml#L1386])
> Some ways to address these would be:
>  * Do the Python dtests run successfully for anyone on {{medium}} instances? 
> If not, would it make sense to bump them to {{large}} so that they can be run 
> successfully?
>  * Does anybody ever run the python upgrade tests on CircleCI and what is the 
> configuration that makes it work?
>  * Would it make sense to either hardcode the number of test runners in the 
> unit tests with `-Dtest.runners` in the config file to reflect the number of 
> CPUs on the instances, or change the build so that it is able to detect the 
> appropriate number of cores available automatically?
> Additionally, it seems this default config file (config.yml) is not as well 
> maintained as the 
> [{{config-2_1.yml}}|https://github.com/apache/cassandra/blob/trunk/.circleci/config-2_1.yml]
>  (+its lowres/highres) version in the same folder (from CASSANDRA-14806). 
> What is the reasoning for maintaining these 2 versions of the build? Could 
> the better maintained version be used as the default? We could generate a 
> lowres version of the new config-2_1.yml, and rename it {{config.yml}} so 
> that it gets picked up by CircleCI automatically instead of the current 
> default.






[jira] [Commented] (CASSANDRA-15686) Improvements in circle CI default config

2020-04-08 Thread David Capwell (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17078840#comment-17078840
 ] 

David Capwell commented on CASSANDRA-15686:
---

[~mck] [~newkek] doesn't look like concurrent runners are safe =(

{code}
org.apache.cassandra.exceptions.ConfigurationException: 127.0.0.1:7014 is in use by another process.  Change listen_address:storage_port in cassandra.yaml to values that do not conflict with other services
    at org.apache.cassandra.net.InboundConnectionInitiator.bind(InboundConnectionInitiator.java:159)
    at org.apache.cassandra.net.InboundConnectionInitiator.bind(InboundConnectionInitiator.java:181)
    at org.apache.cassandra.net.InboundSockets$InboundSocket.open(InboundSockets.java:95)
    at org.apache.cassandra.net.InboundSockets$InboundSocket.open(InboundSockets.java:82)
    at org.apache.cassandra.net.InboundSockets.open(InboundSockets.java:209)
    at org.apache.cassandra.net.ConnectionTest.lambda$doTest$8(ConnectionTest.java:241)
    at org.apache.cassandra.net.ConnectionTest.doTestManual(ConnectionTest.java:262)
    at org.apache.cassandra.net.ConnectionTest.doTest(ConnectionTest.java:240)
    at org.apache.cassandra.net.ConnectionTest.test(ConnectionTest.java:229)
    at org.apache.cassandra.net.ConnectionTest.testCRCCorruption(ConnectionTest.java:725)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
{code}


I ran with 8 cores and 16gb memory (matches HIGHER in circle ci) and saw more 
failures than normal (normally ~1-3 failures, had 9; most are known flaky tests).  
At least for me, running with 8 runners didn't seem to improve job latency; 
it felt the same (though I only ran once, so it could have been an outlier).


[jira] [Commented] (CASSANDRA-15686) Improvements in circle CI default config

2020-04-08 Thread David Capwell (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17078822#comment-17078822
 ] 

David Capwell commented on CASSANDRA-15686:
---

FYI, I tested setting the number of runners based on the container CPU limits 
(4 CPUs, 8gb memory) and got the following

{code}
[junit-timeout] [36.344s][warning][os,thread] Failed to start thread - pthread_create failed (EAGAIN) for attributes: stacksize: 1024k, guardsize: 0k, detached.
{code}









[jira] [Commented] (CASSANDRA-15637) CqlInputFormat regression going from 2.1 to 3.x caused by semantic difference between thrift and the new system.size_estimates table when dealing with multiple dc deployments

2020-04-08 Thread David Capwell (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17078783#comment-17078783
 ] 

David Capwell commented on CASSANDRA-15637:
---

Anything I can do to help the review?

> CqlInputFormat regression going from 2.1 to 3.x caused by semantic difference 
> between thrift and the new system.size_estimates table when dealing with 
> multiple dc deployments
> --
>
> Key: CASSANDRA-15637
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15637
> Project: Cassandra
> Issue Type: Bug
> Components: Legacy/Tools
> Reporter: David Capwell
> Assignee: David Capwell
> Priority: Normal
> Labels: pull-request-available
> Fix For: 4.0-alpha
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> In 3.0 CqlInputFormat switched away from thrift in favor of a new 
> system.size_estimates table, but the semantics changed when dealing with 
> multiple DCs or when Cassandra is not collocated with Hadoop.
> The core issues are:
> * system.size_estimates uses the primary range, in a multi-dc setup this 
> could lead to uneven ranges
> example:
> {code}
> DC1: [0, 10, 20, 30]
> DC2: [1, 11, 21, 31]
> DC3: [2, 12, 22, 32]
> {code}
> Using NetworkTopologyStrategy the primary ranges are: [0, 1), [1, 2), [2, 
> 10), [10, 11), [11, 12), [12, 20), [20, 21), [21, 22), [22, 30), [30, 31), 
> [31, 32), [32, 0).
> Given this the only ranges that are more than one token are: [2, 10), [12, 
> 20), [22, 30).
> * system.size_estimates is not replicated, so every node in the cluster must 
> be queried to get estimates; if nodes are down in the DC with non-size-1 
> ranges, there is no way to get an estimate.
> * CqlInputFormat used to call describe_local_ring so all interactions were 
> with a single DC; the java driver doesn't filter by DC, so it appears to allow 
> cross-DC traffic and includes nodes from other DCs in the replica set. In the 
> example above, the number of splits went from 4 to 12.
> * CqlInputFormat used to call describe_splits_ex to dynamically calculate the 
> estimates, this was on the "local primary range" and was able to hit replicas 
> to create estimates if the primary was down. With system.size_estimates we no 
> longer have backup and no longer expose the "local primary range" in multi-dc.
> * CqlInputFormat had a config cassandra.input.keyRange which let you define 
> your own range.  If the range doesn't perfectly match the local range then 
> the intersectWith calls will produce ranges with no estimates.  Example: [0, 
> 10, 20], cassandra.input.keyRange=5,15.  This won't find any estimates so 
> will produce 2 splits with 128 estimate (default when not found).
> * CqlInputFormat special cases Cassandra being collocated with Hadoop and 
> assumes this when querying system.size_estimates as it doesn't filter to the 
> specific host, this means that non-collocated deployments randomly select the 
> nodes and create splits with ranges the hosts do not have locally.
> The problems can be reproduced deterministically; the following test shows them:
> 1) deploy a 3 DC cluster with 3 nodes each
> 2) assign tokens so that DC2 tokens are +1 of DC1 and DC3 tokens are +1 of DC2
> 3) CREATE KEYSPACE simpleuniform0 WITH replication = {'class': 
> 'NetworkTopologyStrategy', 'DC1': 3, 'DC2': 3, 'DC3': 3};
> 4) CREATE TABLE simpletable0 (pk bigint, ck bigint, value blob, PRIMARY KEY 
> (pk, ck))
> 5) insert 500k partitions uniformly: [0, 500,000)
> 6) wait until estimates catch up to writes
> 7) for all nodes, SELECT * FROM system.size_estimates
> You will get the following
> {code}
>  keyspace_name  | table_name   | range_start          | range_end            | mean_partition_size | partitions_count
> ----------------+--------------+----------------------+----------------------+---------------------+------------------
>  simpleuniform0 | simpletable0 | -9223372036854775808 | -6148914691236517206 |                  87 |           122240
>  simpleuniform0 | simpletable0 |  6148914691236517207 | -9223372036854775808 |                  87 |           121472
> (2 rows)
>  keyspace_name  | table_name   | range_start          | range_end            | mean_partition_size | partitions_count
> ----------------+--------------+----------------------+----------------------+---------------------+------------------
>  simpleuniform0 | simpletable0 |                    2 |  6148914691236517205 |                  87 |           243072
> (1 rows)
>  keyspace_name  | table_name   | range_start          | range_end            | mean_partition_size | partitions_count
> ----------------+--------------+----------------------+----------------------+---------------------+------------------
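
As an aside on the three-DC token example quoted in the description above (DC1: [0, 10, 20, 30], DC2: [1, 11, 21, 31], DC3: [2, 12, 22, 32]), the primary ranges can be derived by sorting all tokens onto one ring and pairing each token with its successor. The sketch below is only an illustration under that assumption; the class and method names are hypothetical, it uses the ticket's [start, end) notation, and it treats the wrapping range [32, 0) as not wide, as the ticket's toy example does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

final class PrimaryRangeSketch
{
    /** Returns {start, end} pairs between consecutive ring tokens; the last pair wraps around. */
    static List<long[]> primaryRanges(long[][] tokensPerDc)
    {
        TreeSet<Long> ring = new TreeSet<>();
        for (long[] dc : tokensPerDc)
            for (long token : dc)
                ring.add(token);

        List<Long> sorted = new ArrayList<>(ring);
        List<long[]> ranges = new ArrayList<>();
        for (int i = 0; i < sorted.size(); i++)
        {
            long start = sorted.get(i);
            long end = sorted.get((i + 1) % sorted.size()); // wrap to the first token
            ranges.add(new long[]{ start, end });
        }
        return ranges;
    }

    /** Keeps only non-wrapping ranges that span more than one token. */
    static List<long[]> widerThanOneToken(List<long[]> ranges)
    {
        List<long[]> wide = new ArrayList<>();
        for (long[] r : ranges)
            if (r[1] - r[0] > 1) // the wrapping range has end < start, so it is skipped
                wide.add(r);
        return wide;
    }

    public static void main(String[] args)
    {
        long[][] tokens = { { 0, 10, 20, 30 }, { 1, 11, 21, 31 }, { 2, 12, 22, 32 } };
        for (long[] r : widerThanOneToken(primaryRanges(tokens)))
            System.out.println("[" + r[0] + ", " + r[1] + ")");
    }
}
```

With the tokens from the ticket this gives twelve ranges in total, of which only [2, 10), [12, 20) and [22, 30) pass the filter, matching the ranges listed in the description.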

[jira] [Commented] (CASSANDRA-14781) Log message when mutation passed to CommitLog#add(Mutation) is too large is not descriptive enough

2020-04-08 Thread Jordan West (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17078740#comment-17078740
 ] 

Jordan West commented on CASSANDRA-14781:
-

I believe this patch is ready (and has one +1) but needs a committer to review. 
[~n.v.harikrishna], was another ticket opened? 

> Log message when mutation passed to CommitLog#add(Mutation) is too large is 
> not descriptive enough
> --
>
> Key: CASSANDRA-14781
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14781
> Project: Cassandra
> Issue Type: Bug
> Components: Consistency/Hints, Local/Commit Log, Messaging/Client
> Reporter: Jordan West
> Assignee: Tom Petracca
> Priority: Normal
> Labels: protocolv5
> Fix For: 4.0-beta
>
> Attachments: CASSANDRA-14781.patch, CASSANDRA-14781_3.0.patch, 
> CASSANDRA-14781_3.11.patch
>
>
> When hitting 
> [https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/db/commitlog/CommitLog.java#L256-L257],
>  the log message produced does not help the operator track down what data is 
> being written. At a minimum the keyspace and cfIds involved would be useful 
> (and are available) – more detail might not be reasonable to include. 






[jira] [Commented] (CASSANDRA-15704) add client request size metrics to netty pipeline

2020-04-08 Thread Jon Haddad (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17078726#comment-17078726
 ] 

Jon Haddad commented on CASSANDRA-15704:


Thanks for the review, David.  I ran a couple of tests and confirmed the double 
count.  I'll revise the patch.

> add client request size metrics to netty pipeline
> -
>
> Key: CASSANDRA-15704
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15704
> Project: Cassandra
> Issue Type: Improvement
> Components: Observability/Metrics
> Reporter: Jon Haddad
> Assignee: Jon Haddad
> Priority: Normal
> Labels: pull-request-available
> Time Spent: 20m
> Remaining Estimate: 0h
>
> We currently lack metrics around client connection incoming / outgoing bytes. 
>  It’s fairly standard to know how many bytes are read and written to the 
> network, but that aggregates client facing and internal cluster traffic to a 
> single number.  This patch will help us understand client overhead more 
> granularly.






[jira] [Commented] (CASSANDRA-15656) Expose repair streaming metric

2020-04-08 Thread Ekaterina Dimitrova (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17078710#comment-17078710
 ] 

Ekaterina Dimitrova commented on CASSANDRA-15656:
-

[~marcuse] this one should also be documented, if it isn't already.
CC again [~lor...@datastax.com]

> Expose repair streaming metric
> --
>
> Key: CASSANDRA-15656
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15656
> Project: Cassandra
> Issue Type: Improvement
> Components: Consistency/Repair, Consistency/Streaming
> Reporter: Marcus Eriksson
> Assignee: Marcus Eriksson
> Priority: Normal
> Fix For: 4.0-alpha
>
> Attachments: Screenshot 2020-04-02 at 09.04.41.png, Screenshot 
> 2020-04-02 at 09.05.03.png, Screenshot 2020-04-02 at 09.05.19.png
>
>
> We should expose a metric for how much data is streamed during repair






[jira] [Comment Edited] (CASSANDRA-15654) Track preview repair failures

2020-04-08 Thread Ekaterina Dimitrova (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17078708#comment-17078708
 ] 

Ekaterina Dimitrova edited comment on CASSANDRA-15654 at 4/8/20, 9:08 PM:
--

Marcus Eriksson, I almost forgot, we need to consider documenting this metric.
CC also [~lor...@datastax.com]


was (Author: e.dimitrova):
[~marcuse], I almost forgot, we need to consider documenting this metric.
CC also [~polandll]

> Track preview repair failures
> -
>
> Key: CASSANDRA-15654
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15654
> Project: Cassandra
> Issue Type: Improvement
> Components: Consistency/Repair, Observability/Metrics
> Reporter: Marcus Eriksson
> Assignee: Marcus Eriksson
> Priority: Low
> Fix For: 4.0-alpha
>
> Attachments: Screenshot 2020-04-02 at 09.10.06.png, Screenshot 
> 2020-04-02 at 09.10.34.png
>
>
> We should expose a metric for when preview repair fails






[jira] [Commented] (CASSANDRA-15654) Track preview repair failures

2020-04-08 Thread Ekaterina Dimitrova (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17078708#comment-17078708
 ] 

Ekaterina Dimitrova commented on CASSANDRA-15654:
-

[~marcuse], I almost forgot, we need to consider documenting this metric.
CC also [~polandll]







[jira] [Updated] (CASSANDRA-15390) Avoid unnecessary collection/iterator allocations during btree construction

2020-04-08 Thread Blake Eggleston (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Blake Eggleston updated CASSANDRA-15390:

Source Control Link: 
https://github.com/apache/cassandra/commit/01a091a0412c4a05eae94f24458dd139a42dfda3
 Resolution: Fixed
 Status: Resolved  (was: Ready to Commit)

thanks, committed to trunk as 
[01a091a0412c4a05eae94f24458dd139a42dfda3|https://github.com/apache/cassandra/commit/01a091a0412c4a05eae94f24458dd139a42dfda3]

> Avoid unnecessary collection/iterator allocations during btree construction
> ---
>
> Key: CASSANDRA-15390
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15390
> Project: Cassandra
> Issue Type: Sub-task
> Components: Local/Compaction
> Reporter: Blake Eggleston
> Assignee: Blake Eggleston
> Priority: Normal
> Fix For: 4.0
>
>
> A heavily used btree builder path does a lot of unnecessary conversions to 
> and from collections and iterators. Adding dedicated support for Object[] 
> reduces compaction garbage by up to 8.3%






[jira] [Updated] (CASSANDRA-15390) Avoid unnecessary collection/iterator allocations during btree construction

2020-04-08 Thread Blake Eggleston (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Blake Eggleston updated CASSANDRA-15390:

Status: Ready to Commit  (was: Changes Suggested)







[cassandra] branch trunk updated: Avoid unnecessary collection/iterator allocations during btree construction

2020-04-08 Thread bdeggleston
This is an automated email from the ASF dual-hosted git repository.

bdeggleston pushed a commit to branch trunk
in repository https://gitbox.apache.org/repos/asf/cassandra.git


The following commit(s) were added to refs/heads/trunk by this push:
     new 01a091a  Avoid unnecessary collection/iterator allocations during btree construction
01a091a is described below

commit 01a091a0412c4a05eae94f24458dd139a42dfda3
Author: Blake Eggleston 
AuthorDate: Wed Oct 16 09:03:26 2019 -0700

Avoid unnecessary collection/iterator allocations during btree construction

Patch by Blake Eggleston; Reviewed by Benedict Elliott Smith for CASSANDRA-15390
---
 CHANGES.txt                                     |  1 +
 .../org/apache/cassandra/utils/btree/BTree.java | 70 +-
 2 files changed, 56 insertions(+), 15 deletions(-)

diff --git a/CHANGES.txt b/CHANGES.txt
index acac895..ac0fe22 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -1,4 +1,5 @@
 4.0-alpha4
+ * Avoid unnecessary collection/iterator allocations during btree construction (CASSANDRA-15390)
  * Repair history tables should have TTL and TWCS (CASSANDRA-12701)
  * Fix cqlsh erroring out on Python 3.7 due to webbrowser module being absent (CASSANDRA-15572)
  * Fix IMH#acquireCapacity() to return correct Outcome when endpoint reserve runs out (CASSANDRA-15607)
diff --git a/src/java/org/apache/cassandra/utils/btree/BTree.java b/src/java/org/apache/cassandra/utils/btree/BTree.java
index 6d0af6e..97e935e 100644
--- a/src/java/org/apache/cassandra/utils/btree/BTree.java
+++ b/src/java/org/apache/cassandra/utils/btree/BTree.java
@@ -112,6 +112,38 @@ public class BTree
         public static Dir desc(boolean desc) { return desc ? DESC : ASC; }
     }
 
+    /**
+     * Enables methods to consume the contents of iterators, collections, or arrays without duplicating code or
+     * allocating intermediate objects. Instead of taking an argument that implements an interface, a method takes
+     * an opaque object as the input, and a singleton helper object it uses as an intermediary to access it's contents.
+     * The purpose of doing things this way is to avoid memory allocations on hot paths.
+     */
+    private interface IteratingFunction
+    {
+        /**
+         * Returns the next object at the given index. This method  must be called with sequentially increasing index
+         * values, starting at 0, and must only be called once per index value. The results of calling this method
+         * without following these rules are undefined.
+         */
+         K nextAt(T input, int idx);
+    }
+
+    private static final IteratingFunction ITERATOR_FUNCTION = new IteratingFunction()
+    {
+        public  K nextAt(Iterator input, int idx)
+        {
+            return (K) input.next();
+        }
+    };
+
+    private static final IteratingFunction ARRAY_FUNCTION = new IteratingFunction()
+    {
+        public  K nextAt(Object[] input, int idx)
+        {
+            return (K) input[idx];
+        }
+    };
+
     public static Object[] empty()
     {
         return EMPTY_LEAF;
@@ -124,7 +156,7 @@ public class BTree
 
 public static  Object[] build(Collection 
source, UpdateFunction updateF)
 {
-return buildInternal(source, source.size(), updateF);
+return buildInternal(source.iterator(), ITERATOR_FUNCTION, 
source.size(), updateF);
 }
 
 /**
@@ -138,35 +170,44 @@ public class BTree
 {
 if (size < 0)
 throw new IllegalArgumentException(Integer.toString(size));
-return buildInternal(source, size, updateF);
+return buildInternal(source.iterator(), ITERATOR_FUNCTION, size, 
updateF);
+}
+
+public static  Object[] build(Object[] 
source, int size, UpdateFunction updateF)
+{
+if (size < 0)
+throw new IllegalArgumentException(Integer.toString(size));
+return buildInternal(source, ARRAY_FUNCTION, size, updateF);
 }
 
-private static  Object[] 
buildLeaf(Iterator it, int size, UpdateFunction updateF)
+private static  Object[] buildLeaf(S 
source, IteratingFunction iterFunc, int size, int startIdx, 
UpdateFunction updateF)
 {
 V[] values = (V[]) new Object[size | 1];
 
+int idx = startIdx;
 for (int i = 0; i < size; i++)
 {
-K k = it.next();
+K k = iterFunc.nextAt(source, idx);
 values[i] = updateF.apply(k);
+idx++;
 }
-if (updateF != UpdateFunction.noOp())
+if (updateF != UpdateFunction.noOp())
 updateF.allocated(ObjectSizes.sizeOfArray(values));
 return values;
 }
 
-private static  Object[] 
buildInternal(Iterator it, int size, int level, UpdateFunction updateF)
+private static  Object[] buildInternal(S 
source, IteratingFunction iterFunc, int size, int level, int startIdx, 
UpdateFunction updateF)
 {
 assert size > 0;
 assert level >= 
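
The Javadoc in the patch above describes passing an opaque source together with a stateless singleton accessor, so that one build loop can read from either an iterator or an array without allocating a wrapper per call. A minimal sketch of that idea follows. The names (ElementAccessor, AccessorSketch, copy) and the generic signatures are assumptions made for illustration, not the ones used in BTree.java.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for the accessor interface described in the patch's Javadoc.
interface ElementAccessor<S>
{
    // Must be called with idx = 0, 1, 2, ... and exactly once per index.
    Object nextAt(S source, int idx);
}

final class AccessorSketch
{
    // Stateless accessors created once and shared by every caller.
    static final ElementAccessor<Iterator<?>> FROM_ITERATOR = (source, idx) -> source.next();
    static final ElementAccessor<Object[]> FROM_ARRAY = (source, idx) -> source[idx];

    // A single loop serves both source kinds; the accessor hides the difference.
    static <S> Object[] copy(S source, ElementAccessor<S> accessor, int size)
    {
        Object[] out = new Object[size];
        for (int i = 0; i < size; i++)
            out[i] = accessor.nextAt(source, i);
        return out;
    }

    public static void main(String[] args)
    {
        List<String> list = Arrays.asList("a", "b", "c");
        Object[] fromIterator = copy(list.iterator(), FROM_ITERATOR, list.size());
        Object[] fromArray = copy(new Object[]{ "a", "b", "c" }, FROM_ARRAY, 3);
        System.out.println(Arrays.equals(fromIterator, fromArray));
    }
}
```

The iterator accessor ignores idx and the array accessor ignores any cursor state, which is why the contract requires sequential, once-only index values.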

[jira] [Commented] (CASSANDRA-15642) Inconsistent failure messages on distributed queries

2020-04-08 Thread Kevin Gallardo (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17078645#comment-17078645
 ] 

Kevin Gallardo commented on CASSANDRA-15642:


I see. Framed as "waiting until we can guarantee we can never succeed", the 
behavior makes more sense.

That said, when testing for CASSANDRA-15543, {{blockFor()}} and 
{{cassandraReplicaCount()}} returned the same value, which means we would still 
fail as soon as n >= 1, bringing us back to the same conclusion.

bq. if you want to file a ticket for it I think I can make us both happy: we 
should always fail a query as soon as we know it cannot succeed [...]

I think we still miss out on information that could be returned to the user to 
improve usability, but I have already presented my arguments, so I won't keep 
insisting. The schema agreement error is still, to me, a clear case where 
things could be improved. If nothing else, I would hope for a place where this 
sort of behavior is documented and explained, rather than users having to 
discover it by themselves in unfortunate circumstances, or having to go through 
the code.

I also agree that the "speculative read" (or, put more simply, a "retry", iiuc?) 
seems like a good idea. It would be an improvement, though the drivers already 
provide this sort of utility, and in a form that users can customize more 
easily than a server-side solution, so I suppose it has upsides and downsides. 
Either way, something that makes completing a request more robust seems like a 
good idea.

> Inconsistent failure messages on distributed queries
> 
>
> Key: CASSANDRA-15642
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15642
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Consistency/Coordination
>Reporter: Kevin Gallardo
>Priority: Normal
>
> As a follow-up to some exploration I have done for CASSANDRA-15543, I 
> noticed the following behavior in both {{ReadCallback}} and 
> {{AbstractWriteResponseHandler}}:
>  - await responses
>  - when the required number of responses has come back: unblock the wait
>  - when a single failure happens: unblock the wait
>  - when unblocked, look to see if the counter of failures is > 1 and if so 
> return an error message based on the {{failures}} map that's been filled
> Error messages that can result from this behavior can be a ReadTimeout, a 
> ReadFailure, a WriteTimeout or a WriteFailure.
> In case of a Write/ReadFailure, the user will get back an error looking like 
> the following:
> "Failure: Received X responses, and Y failures"
> (if the behavior I describe is incorrect, please correct me)
> This causes a usability problem. Since the handler will fail and throw an 
> exception as soon as 1 failure happens, the error message that is returned to 
> the user may not be accurate.
> (note: I am not entirely sure of the behavior in case of timeouts for now)
> For example, say a request at CL = QUORUM = 3, a failed request may complete 
> first, then a successful one completes, and another fails. If the exception 
> is thrown fast enough, the error message could say 
>  "Failure: Received 0 response, and 1 failure at CL = 3"
> Which:
> 1. doesn't make a lot of sense because the CL doesn't match the number of 
> results in the message, so you end up thinking "what happened with the rest 
> of the required CL?"
> 2. is incorrect. We did receive a successful response, only it came after 
> the initial failure.
> From that logic, I think it is safe to assume that the information returned 
> in the error message cannot be trusted in case of a failure. The only 
> information users should extract from it is that at least 1 node has failed.
> For a big improvement in usability, the {{ReadCallback}} and 
> {{AbstractWriteResponseHandler}} could instead wait for all responses to come 
> back before unblocking the wait, or let it time out. This way, users will be 
> able to trust the information returned to them.
> Additionally, an error that happens first prevents a timeout from happening 
> because it fails immediately, and so it potentially hides problems with other 
> replicas. If we were to wait for all responses, we might get a timeout; in 
> that case we'd also be able to tell whether failures happened *before* 
> that timeout, giving a more complete diagnostic than today, where you can't 
> detect both errors at the same time.
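
To make the difference between the policies discussed in this ticket concrete, here is a small single-threaded sketch. ResponseTracker and all of its members are hypothetical names invented for illustration. This is not the code of {{ReadCallback}} or {{AbstractWriteResponseHandler}}, which are concurrent and considerably more involved.

```java
// Hypothetical, single-threaded model of a coordinator waiting on replicas.
final class ResponseTracker
{
    private final int blockFor;      // responses required by the consistency level
    private final int replicaCount;  // replicas that were contacted
    private int successes;
    private int failures;

    ResponseTracker(int blockFor, int replicaCount)
    {
        if (blockFor < 1 || blockFor > replicaCount)
            throw new IllegalArgumentException("blockFor must be in [1, replicaCount]");
        this.blockFor = blockFor;
        this.replicaCount = replicaCount;
    }

    void onSuccess() { successes++; }
    void onFailure() { failures++; }

    boolean succeeded() { return successes >= blockFor; }

    // Behavior described in the ticket: a single failure unblocks the wait.
    boolean failFastDone() { return succeeded() || failures >= 1; }

    // Alternative from the comments: stop only once success has become impossible.
    boolean cannotSucceed() { return failures > replicaCount - blockFor; }
    boolean decidedDone() { return succeeded() || cannotSucceed(); }

    // Proposal in the ticket: wait until every replica has answered (or a timeout fires).
    boolean allResponded() { return successes + failures == replicaCount; }

    String summary()
    {
        return "Received " + successes + " responses, and " + failures + " failures";
    }

    public static void main(String[] args)
    {
        ResponseTracker t = new ResponseTracker(3, 3);
        t.onFailure();
        System.out.println(t.failFastDone() + ": " + t.summary());
        t.onSuccess();
        t.onFailure();
        System.out.println(t.allResponded() + ": " + t.summary());
    }
}
```

When blockFor equals replicaCount, cannotSucceed() becomes true on the first failure, which is the case raised in the comment above: under that configuration the "wait until we can never succeed" policy behaves the same as failing fast.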



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15700) Performance regression on internode messaging

2020-04-08 Thread Aleksey Yeschenko (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Yeschenko updated CASSANDRA-15700:
--
Reviewers: Aleksey Yeschenko

> Performance regression on internode messaging
> -
>
> Key: CASSANDRA-15700
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15700
> Project: Cassandra
>  Issue Type: Bug
>  Components: Messaging/Internode
>Reporter: Sergio Bossa
>Assignee: Sergio Bossa
>Priority: Normal
> Fix For: 4.0-beta
>
> Attachments: Oss40patchedvsOss311.png, Oss40vsOss311.png, oss40.gc, 
> oss40_nogc.tar.xz, oss40_system.log
>
>
> [~jasonstack] and I have been investigating a performance regression 
> affecting 4.0 during a 3-node, RF 3 write throughput test with a 
> timeseries-like workload, as shown in this plot, where blue is 3.11 and orange is 4.0:
> !Oss40vsOss311.png|width=389,height=214!
>  It's been a bit of a long investigation, but two clues ended up standing out:
> 1) An abnormal number of expired messages on 4.0 (as shown in the attached 
> system log), while 3.11 has almost none.
> 2) Abnormal GC activity (as shown in the attached gc log).
> Turns out the two are related, as the [on expired 
> callback|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/net/OutboundConnection.java#L462]
>  creates a huge number of strings in the {{id()}} call. The next question is 
> what causes all those message expirations; we thoroughly reviewed the 
> internode messaging code and the only issue we could find so far is related 
> to the "batch pruning" calls 
> [here|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/net/OutboundMessageQueue.java#L81]
>  and 
> [here|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/net/OutboundMessageQueue.java#L188]:
>  it _seems_ too much time is spent on those, causing the event loop to fall 
> behind in processing the rest of the messages, which will end up being 
> expired. This is supported by the analysis of the collapsed stacks (after 
> fixing the GC issue):
> {noformat}
> (tprint (top-aggregated-calls oss40nogc "EventLoopDelivery:doRun" 5))
> org/apache/cassandra/net/OutboundConnection$EventLoopDelivery:doRun 3456
> org/apache/cassandra/net/OutboundMessageQueue:access$600 1621
> org/apache/cassandra/net/PrunableArrayQueue:prune 1621
> org/apache/cassandra/net/OutboundMessageQueue$WithLock:close 1621
> org/apache/cassandra/net/OutboundMessageQueue:pruneInternalQueueWithLock 1620
> {noformat}
> Those are the top 5 sampled calls from {{EventLoopDelivery#doRun()}} which 
> spends half of its time pruning. But only a tiny portion of such pruning time 
> is spent actually expiring:
> {noformat}
> (tprint (top-aggregated-calls oss40nogc 
> "OutboundMessageQueue:pruneInternalQueueWithLock" 5))
> org/apache/cassandra/net/OutboundMessageQueue:pruneInternalQueueWithLock 1900
> org/apache/cassandra/net/PrunableArrayQueue:prune 1894
> org/apache/cassandra/net/OutboundMessageQueue$1Pruner:onPruned 147
> org/apache/cassandra/net/OutboundConnection$$Lambda$444/740904487:accept 147
> org/apache/cassandra/net/OutboundConnection:onExpired 147
> {noformat}
> And indeed, the {{PrunableArrayQueue:prune()}} self time is dominant:
> {noformat}
> (tprint (top-self-calls oss40nogc "PrunableArrayQueue:prune" 5))
> org/apache/cassandra/net/PrunableArrayQueue:prune 1718
> org/apache/cassandra/net/OutboundConnection:releaseCapacity 27
> java/util/concurrent/ConcurrentHashMap:replaceNode 19
> java/util/concurrent/ConcurrentLinkedQueue:offer 16
> java/util/concurrent/LinkedBlockingQueue:offer 15
> {noformat}
> That said, before proceeding with a PR to fix those issues, I'd like to 
> understand: what's the reason to prune so often, rather than just when 
> polling the message during delivery? If there's a reason I'm missing, let's 
> talk about how to optimize pruning, otherwise let's get rid of that.






[jira] [Commented] (CASSANDRA-15700) Performance regression on internode messaging

2020-04-08 Thread Benedict Elliott Smith (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17078610#comment-17078610
 ] 

Benedict Elliott Smith commented on CASSANDRA-15700:


Your proposed solution sounds great, so there's not much use for me here.  
Aleksey is the more natural reviewer anyway.




[jira] [Commented] (CASSANDRA-15700) Performance regression on internode messaging

2020-04-08 Thread Aleksey Yeschenko (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17078557#comment-17078557
 ] 

Aleksey Yeschenko commented on CASSANDRA-15700:
---

I'll take a look soonish.




[jira] [Commented] (CASSANDRA-15708) Fix in-jvm upgrade dtests

2020-04-08 Thread David Capwell (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17078552#comment-17078552
 ] 

David Capwell commented on CASSANDRA-15708:
---

[~marcuse] can you file a ticket for the test failure you saw?

{code}
[junit-timeout] Testsuite: org.apache.cassandra.concurrent.SEPExecutorTest
[junit-timeout] Testsuite: org.apache.cassandra.concurrent.SEPExecutorTest 
Tests run: 2, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 7.656 sec
[junit-timeout] 
[junit-timeout] Testcase: 
changingMaxWorkersMeetsConcurrencyGoalsTest(org.apache.cassandra.concurrent.SEPExecutorTest):
 FAILED
[junit-timeout] expected: but was:
[junit-timeout] junit.framework.AssertionFailedError: expected: but 
was:
[junit-timeout] at 
org.apache.cassandra.concurrent.SEPExecutorTest.assertMaxTaskConcurrency(SEPExecutorTest.java:180)
[junit-timeout] at 
org.apache.cassandra.concurrent.SEPExecutorTest.changingMaxWorkersMeetsConcurrencyGoalsTest(SEPExecutorTest.java:110)
[junit-timeout] 
[junit-timeout] 
[junit-timeout] Test org.apache.cassandra.concurrent.SEPExecutorTest FAILED
{code}

> Fix in-jvm upgrade dtests
> -
>
> Key: CASSANDRA-15708
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15708
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/dtest
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Normal
>
> In-jvm upgrade dtests were broken by CASSANDRA-15539






[jira] [Commented] (CASSANDRA-15708) Fix in-jvm upgrade dtests

2020-04-08 Thread David Capwell (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17078553#comment-17078553
 ] 

David Capwell commented on CASSANDRA-15708:
---

LGTM +1




[jira] [Commented] (CASSANDRA-15708) Fix in-jvm upgrade dtests

2020-04-08 Thread David Capwell (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17078547#comment-17078547
 ] 

David Capwell commented on CASSANDRA-15708:
---

https://app.circleci.com/pipelines/github/dcapwell/cassandra/193/workflows/6d184c9e-f1b9-435e-bc67-66531ddbb47e/jobs/964

{code}
[junit-timeout] Caused by: java.lang.NoSuchFieldException: cdc_raw_directory
[junit-timeout] at java.lang.Class.getDeclaredField(Class.java:2070)
[junit-timeout] at 
org.apache.cassandra.distributed.impl.InstanceConfig.propagate(InstanceConfig.java:221)
{code}




[jira] [Commented] (CASSANDRA-15708) Fix in-jvm upgrade dtests

2020-04-08 Thread David Capwell (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17078543#comment-17078543
 ] 

David Capwell commented on CASSANDRA-15708:
---

Patch LGTM, I am just running the tests without the patch to produce the errors 
so they are recorded in JIRA.




[jira] [Commented] (CASSANDRA-15688) Invalid cdc_raw_directory prevents server startup

2020-04-08 Thread Jon Meredith (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17078535#comment-17078535
 ] 

Jon Meredith commented on CASSANDRA-15688:
--

And resolved - good to merge once you're happy with it.

CircleCI 
[Java8|https://circleci.com/workflow-run/406dfdde-7609-47c4-9083-abc06a452935] 
[Java11|https://circleci.com/workflow-run/b356b840-191d-44f7-90c7-63b898a3c11c] 
this time it really is a couple of flaky unrelated dtests.

> Invalid cdc_raw_directory prevents server startup
> -
>
> Key: CASSANDRA-15688
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15688
> Project: Cassandra
>  Issue Type: Bug
>  Components: Feature/Change Data Capture
>Reporter: Jon Meredith
>Assignee: Jon Meredith
>Priority: Normal
>  Labels: pull-request-available
> Fix For: 4.0-rc
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> If {{cdc_raw_directory}} is set to an invalid directory, it prevents startup 
> of the server even when {{cdc_enabled}} is set to false.
> The directory can be set either directly by the {{cdc_raw_directory}} setting 
> in the configuration YAML or indirectly via the {{cassandra.storage_dir}} 
> system property, which is how I encountered it.
> Easy to reproduce by setting {{cdc_raw_directory}} to {{notadir/notasubdir}}.
> Additionally, while investigating, I discovered that 
> {{DatabaseDescriptor.guessFileStore}} can cause a {{NullPointerException}} if 
> it runs out of parent elements before it can get a FileStore. It should 
> instead throw a more useful ConfigurationException providing details on the 
> problematic path.
>  {{guessFileStore}} is used for checks on {{commitlog_directory}}, 
> {{cdc_raw_directory}} and {{data_file_directories}}.
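
The parent-walking failure mode described above can be sketched with plain java.nio. This is an illustration under stated assumptions and not the patch itself: FileStoreGuess and guess are hypothetical names, and IllegalArgumentException stands in for Cassandra's ConfigurationException.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.FileStore;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.Paths;

final class FileStoreGuess
{
    // Walks up from dir until an existing ancestor yields a FileStore.
    // IllegalArgumentException is a stand-in for ConfigurationException.
    static FileStore guess(String dir)
    {
        for (Path p = Paths.get(dir); p != null; p = p.getParent())
        {
            try
            {
                return Files.getFileStore(p);
            }
            catch (NoSuchFileException e)
            {
                // p does not exist (yet); fall through and try its parent.
            }
            catch (IOException e)
            {
                throw new UncheckedIOException(e);
            }
        }
        // getParent() returned null: report the path instead of dereferencing null.
        throw new IllegalArgumentException(
            "Unable to find a filesystem for '" + dir + "': no existing parent directory");
    }

    public static void main(String[] args)
    {
        System.out.println(guess(System.getProperty("java.io.tmpdir")));
        try
        {
            guess("notadir/notasubdir");
        }
        catch (IllegalArgumentException e)
        {
            System.out.println(e.getMessage());
        }
    }
}
```

A relative path such as notadir/notasubdir has only one parent element, so the loop ends quickly when neither component exists. The loop condition makes that exit explicit, and the exception names the offending path.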






[jira] [Updated] (CASSANDRA-15708) Fix in-jvm upgrade dtests

2020-04-08 Thread David Capwell (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Capwell updated CASSANDRA-15708:
--
Test and Documentation Plan: circle ci
 Status: Patch Available  (was: Open)




[jira] [Updated] (CASSANDRA-15708) Fix in-jvm upgrade dtests

2020-04-08 Thread David Capwell (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Capwell updated CASSANDRA-15708:
--
Reviewers: Alex Petrov, David Capwell, David Capwell  (was: Alex Petrov, 
David Capwell)
   Alex Petrov, David Capwell, David Capwell  (was: Alex Petrov)
   Status: Review In Progress  (was: Patch Available)




[jira] [Commented] (CASSANDRA-15700) Performance regression on internode messaging

2020-04-08 Thread Sergio Bossa (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17078510#comment-17078510
 ] 

Sergio Bossa commented on CASSANDRA-15700:
--

{quote}Honestly though it sounds like there is not much between our proposals 
now, which is reassuring, so perhaps we should focus on the common ground we 
have. 
{quote}
Indeed, I think we agree on the need to fix this in the least risky way, 
which seems to mean keeping the current pruning implementation and accuracy, 
but avoiding running it *at every single delivery* (although maybe you 
dispute this being the problem, but the collapsed stacks speak clearly about 
that).

In the spirit of that, I've fixed the message queue algorithm to compute the 
expiration deadline in a way that can be used to actually run the pruning task 
_only after such deadline_. I'll give it another review on my own and possibly 
add more unit tests (please note the current implementation seems to have none 
at all), but performance tests now look much better (orange is patched 4.0):

!Oss40patchedvsOss311.png|width=556,height=299!

Here's the branch: 
[https://github.com/sbtourist/cassandra/commits/CASSANDRA-15700]

I understand there's no _panic_ to fix it, and you'll be away for the next 
couple of weeks, but realistically this means postponing the issue for at 
least 3 weeks, which will add to the delay we're already accumulating in 4.0 
for other reasons, so maybe you could delegate this to [~aleksey] as you mentioned?
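
The idea of running the pruning task only after the expiration deadline can be sketched as follows. This is a simplified, single-threaded illustration and not the code on the linked branch: DeadlinePrunedQueue, its fields, and the use of a caller-supplied clock value are all assumptions made for the example.

```java
import java.util.ArrayDeque;
import java.util.Iterator;

// Hypothetical queue that skips the pruning scan until something can have expired.
final class DeadlinePrunedQueue
{
    private static final class Entry
    {
        final String payload;
        final long expiresAt;

        Entry(String payload, long expiresAt)
        {
            this.payload = payload;
            this.expiresAt = expiresAt;
        }
    }

    private final ArrayDeque<Entry> entries = new ArrayDeque<>();
    private long earliestExpiry = Long.MAX_VALUE;
    int fullScans; // how many times the queue was actually walked

    void add(String payload, long expiresAt)
    {
        entries.add(new Entry(payload, expiresAt));
        earliestExpiry = Math.min(earliestExpiry, expiresAt);
    }

    // Returns the number of expired entries removed.
    int maybePrune(long now)
    {
        if (now < earliestExpiry)
            return 0; // nothing can have expired yet, so skip the O(n) scan

        fullScans++;
        int pruned = 0;
        long next = Long.MAX_VALUE;
        for (Iterator<Entry> it = entries.iterator(); it.hasNext();)
        {
            Entry e = it.next();
            if (e.expiresAt <= now)
            {
                it.remove();
                pruned++;
            }
            else
            {
                next = Math.min(next, e.expiresAt);
            }
        }
        earliestExpiry = next;
        return pruned;
    }

    // Polling may leave earliestExpiry stale (too early); that only costs one extra scan.
    String poll(long now)
    {
        maybePrune(now);
        Entry e = entries.poll();
        return e == null ? null : e.payload;
    }

    public static void main(String[] args)
    {
        DeadlinePrunedQueue q = new DeadlinePrunedQueue();
        q.add("a", 10);
        q.add("b", 20);
        System.out.println(q.maybePrune(5) + " pruned, scans=" + q.fullScans);
        System.out.println(q.maybePrune(10) + " pruned, scans=" + q.fullScans);
    }
}
```

The trade-off is that the tracked deadline is a lower bound rather than an exact value, so accuracy of expiration is unchanged while most delivery passes avoid walking the queue.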


[jira] [Updated] (CASSANDRA-15700) Performance regression on internode messaging

2020-04-08 Thread Sergio Bossa (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergio Bossa updated CASSANDRA-15700:
-
Attachment: Oss40patchedvsOss311.png

> Performance regression on internode messaging
> -
>
> Key: CASSANDRA-15700
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15700
> Project: Cassandra
> Issue Type: Bug
> Components: Messaging/Internode
> Reporter: Sergio Bossa
> Assignee: Sergio Bossa
> Priority: Normal
> Fix For: 4.0-beta
>
> Attachments: Oss40patchedvsOss311.png, Oss40vsOss311.png, oss40.gc, 
> oss40_nogc.tar.xz, oss40_system.log
>
>
> [~jasonstack] and I have been investigating a performance regression 
> affecting 4.0 during a 3-node, RF 3 write throughput test with a 
> timeseries-like workload, as shown in this plot, where blue is 3.11 and orange is 4.0:
> !Oss40vsOss311.png|width=389,height=214!
> It's been a bit of a long investigation, but two clues ended up standing out:
> 1) An abnormal number of expired messages on 4.0 (as shown in the attached 
> system log), while 3.11 has almost none.
> 2) Abnormal GC activity (as shown in the attached gc log).
> Turns out the two are related, as the [on expired 
> callback|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/net/OutboundConnection.java#L462]
> creates a huge number of strings in the {{id()}} call. The next question is 
> what causes all those message expirations; we thoroughly reviewed the 
> internode messaging code and the only issue we could find so far is related 
> to the "batch pruning" calls 
> [here|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/net/OutboundMessageQueue.java#L81]
>  and 
> [here|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/net/OutboundMessageQueue.java#L188]:
>  it _seems_ too much time is spent on those, causing the event loop to fall 
> behind in processing the rest of the messages, which will end up being 
> expired. This is supported by the analysis of the collapsed stacks (after 
> fixing the GC issue):
> {noformat}
> (tprint (top-aggregated-calls oss40nogc "EventLoopDelivery:doRun" 5))
> org/apache/cassandra/net/OutboundConnection$EventLoopDelivery:doRun 3456
> org/apache/cassandra/net/OutboundMessageQueue:access$600 1621
> org/apache/cassandra/net/PrunableArrayQueue:prune 1621
> org/apache/cassandra/net/OutboundMessageQueue$WithLock:close 1621
> org/apache/cassandra/net/OutboundMessageQueue:pruneInternalQueueWithLock 1620
> {noformat}
> Those are the top 5 sampled calls from {{EventLoopDelivery#doRun()}} which 
> spends half of its time pruning. But only a tiny portion of such pruning time 
> is spent actually expiring:
> {noformat}
> (tprint (top-aggregated-calls oss40nogc 
> "OutboundMessageQueue:pruneInternalQueueWithLock" 5))
> org/apache/cassandra/net/OutboundMessageQueue:pruneInternalQueueWithLock 1900
> org/apache/cassandra/net/PrunableArrayQueue:prune 1894
> org/apache/cassandra/net/OutboundMessageQueue$1Pruner:onPruned 147
> org/apache/cassandra/net/OutboundConnection$$Lambda$444/740904487:accept 147
> org/apache/cassandra/net/OutboundConnection:onExpired 147
> {noformat}
> And indeed, the {{PrunableArrayQueue:prune()}} self time is dominant:
> {noformat}
> (tprint (top-self-calls oss40nogc "PrunableArrayQueue:prune" 5))
> org/apache/cassandra/net/PrunableArrayQueue:prune 1718
> org/apache/cassandra/net/OutboundConnection:releaseCapacity 27
> java/util/concurrent/ConcurrentHashMap:replaceNode 19
> java/util/concurrent/ConcurrentLinkedQueue:offer 16
> java/util/concurrent/LinkedBlockingQueue:offer 15
> {noformat}
> That said, before proceeding with a PR to fix those issues, I'd like to 
> understand: what's the reason to prune so often, rather than just when 
> polling the message during delivery? If there's a reason I'm missing, let's 
> talk about how to optimize pruning, otherwise let's get rid of that.
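
The "prune only when polling" alternative raised in the question above can be pictured with the sketch below. It is a minimal illustration under stated assumptions, not the actual {{OutboundMessageQueue}} code: {{LazyExpiryQueue}}, {{Msg}} and the {{onExpired}} callback are hypothetical names, the queue is single-threaded, and the current time is passed in explicitly.

```java
import java.util.ArrayDeque;
import java.util.function.Consumer;

// Hypothetical sketch: expire messages lazily while polling, instead of
// scanning the whole queue on every delivery run.
final class LazyExpiryQueue {
    // Hypothetical message type: a payload plus an absolute expiry time.
    static final class Msg {
        final String payload;
        final long expiresAt;

        Msg(String payload, long expiresAt) {
            this.payload = payload;
            this.expiresAt = expiresAt;
        }
    }

    private final ArrayDeque<Msg> queue = new ArrayDeque<>();
    private final Consumer<Msg> onExpired;

    LazyExpiryQueue(Consumer<Msg> onExpired) {
        this.onExpired = onExpired;
    }

    void add(Msg m) {
        queue.addLast(m);
    }

    // Returns the next unexpired message, or null if none is left.
    // Expired messages found at the head are handed to the callback and
    // skipped, so each message is examined at most once.
    Msg poll(long now) {
        Msg m;
        while ((m = queue.pollFirst()) != null) {
            if (m.expiresAt > now) {
                return m;
            }
            onExpired.accept(m);
        }
        return null;
    }

    int size() {
        return queue.size();
    }
}
```

In this shape the cost of expiry is proportional to the number of messages actually polled, at the price of expired messages holding their capacity until the consumer reaches them.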



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-15708) Fix in-jvm upgrade dtests

2020-04-08 Thread Marcus Eriksson (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17078465#comment-17078465
 ] 

Marcus Eriksson edited comment on CASSANDRA-15708 at 4/8/20, 5:13 PM:
--

https://github.com/krummas/cassandra/commits/marcuse/15708

[jvm upgrade dtests|https://circleci.com/gh/krummas/cassandra/3109], [jvm 
dtests|https://circleci.com/gh/krummas/cassandra/3105], [unit 
tests|https://circleci.com/gh/krummas/cassandra/3106]


was (Author: krummas):
https://github.com/krummas/cassandra/commits/marcuse/15708
https://circleci.com/workflow-run/890e1586-abb6-4831-9728-161598393342 (but I 
guess those can't be viewed so I'll post screenshots once done)

> Fix in-jvm upgrade dtests
> -
>
> Key: CASSANDRA-15708
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15708
> Project: Cassandra
> Issue Type: Bug
> Components: Test/dtest
> Reporter: Marcus Eriksson
> Assignee: Marcus Eriksson
> Priority: Normal
>
> In-jvm upgrade dtests were broken by CASSANDRA-15539






[jira] [Comment Edited] (CASSANDRA-15666) Race condition when completing stream sessions

2020-04-08 Thread ZhaoYang (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17078472#comment-17078472
 ] 

ZhaoYang edited comment on CASSANDRA-15666 at 4/8/20, 5:06 PM:
---

bq. 1) Only the "follower" is allowed to send the CompleteMessage.
bq. 2) Only the "initiator" is allowed to close the session and its channels 
after receiving the CompleteMessage.

[~sbtourist] [~blerer] I have addressed the review feedback and included the above 
modification. Do you mind having a look?

Can you also trigger a run on Apache CI? Thanks.


was (Author: jasonstack):
bq. 1) Only the "follower" is allowed to send the CompleteMessage.
bq. 2) Only the "initiator" is allowed to close the session and its channels 
after receiving the CompleteMessage.

[~sbtourist] [~blerer] I have addressed the review feedback and included the above 
modification. Do you mind having a look?

> Race condition when completing stream sessions
> --
>
> Key: CASSANDRA-15666
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15666
> Project: Cassandra
> Issue Type: Bug
> Components: Legacy/Streaming and Messaging
> Reporter: Sergio Bossa
> Assignee: ZhaoYang
> Priority: Normal
> Fix For: 4.0
>
>
> {{StreamSession#prepareAsync()}} executes, as the name implies, 
> asynchronously from the IO thread: this opens the door to race conditions between 
> the sending of the {{PrepareSynAckMessage}} and the call to 
> {{StreamSession#maybeCompleted()}}. For example, the following could happen:
> 1) Node A sends {{PrepareSynAckMessage}} from the {{prepareAsync()}} thread.
> 2) Node B receives it and starts streaming.
> 3) Node A receives the streamed file and sends {{ReceivedMessage}}.
> 4) At this point, if this was the only file to stream, both nodes are ready 
> to close the session via {{maybeCompleted()}}, but:
> a) Node A will call it twice from both the IO thread and the thread at #1, 
> closing the session and its channels.
> b) Node B will attempt to send a {{CompleteMessage}}, but will fail because 
> the session has been closed in the meantime.
> There are other subtle variations of the pattern above, depending on the 
> order of concurrently sent/received messages.
> I believe the best fix would be to modify the message exchange so that:
> 1) Only the "follower" is allowed to send the {{CompleteMessage}}.
> 2) Only the "initiator" is allowed to close the session and its channels 
> after receiving the {{CompleteMessage}}.
> By doing so, the message exchange logic would be easier to reason about, 
> which is overall a win anyway.
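
The two rules proposed above can be sketched as a tiny role-based state holder. This is only an illustration of the rule split, assuming hypothetical names ({{StreamCompletionSketch}}, {{Role}}, {{onCompleteMessage}}); it is not the actual {{StreamSession}} code, it records sent messages in a list instead of writing to a channel, and it leaves the follower's own cleanup out of scope.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of the proposed completion rules.
final class StreamCompletionSketch {
    enum Role { INITIATOR, FOLLOWER }

    private final Role role;
    private final List<String> sent = new ArrayList<>();
    private boolean completeSent = false;
    private boolean closed = false;

    StreamCompletionSketch(Role role) {
        this.role = role;
    }

    // Called when this side has nothing left to send or receive.
    // Safe to call more than once, e.g. from two different threads' code paths.
    void maybeCompleted() {
        if (closed || completeSent) {
            return;
        }
        if (role == Role.FOLLOWER) {
            // Rule 1: only the follower sends the CompleteMessage.
            sent.add("CompleteMessage");
            completeSent = true;
        }
        // The initiator does nothing here: it waits for the follower's message.
    }

    // Called when a CompleteMessage arrives from the peer.
    void onCompleteMessage() {
        if (role != Role.INITIATOR) {
            throw new IllegalStateException("only the initiator expects a CompleteMessage");
        }
        // Rule 2: only the initiator closes the session and its channels.
        closed = true;
    }

    boolean isClosed() {
        return closed;
    }

    List<String> sentMessages() {
        return Collections.unmodifiableList(sent);
    }
}
```

With this split, a duplicate {{maybeCompleted()}} call on either side can no longer close the session before the {{CompleteMessage}} has been exchanged.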






[jira] [Commented] (CASSANDRA-15666) Race condition when completing stream sessions

2020-04-08 Thread ZhaoYang (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17078472#comment-17078472
 ] 

ZhaoYang commented on CASSANDRA-15666:
--

bq. 1) Only the "follower" is allowed to send the CompleteMessage.
bq. 2) Only the "initiator" is allowed to close the session and its channels 
after receiving the CompleteMessage.

[~sbtourist] [~blerer] I have addressed the review feedback and included the above 
modification. Do you mind having a look?




[jira] [Updated] (CASSANDRA-15666) Race condition when completing stream sessions

2020-04-08 Thread ZhaoYang (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ZhaoYang updated CASSANDRA-15666:
-
Test and Documentation Plan: 
Added interceptor to verify stream messages and state transition.

 

  was:
Added interceptor to verify stream messages and state transition.

CI: [https://circleci.com/workflow-run/80cdbf0c-65d3-439c-8134-61f492d1d55f]
 





[jira] [Updated] (CASSANDRA-15708) Fix in-jvm upgrade dtests

2020-04-08 Thread Marcus Eriksson (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcus Eriksson updated CASSANDRA-15708:

 Bug Category: Parent values: Code(13163)Level 1 values: Bug - Unclear 
Impact(13164)
   Complexity: Low Hanging Fruit
  Component/s: Test/dtest
Discovered By: Unit Test
Reviewers: Alex Petrov
 Severity: Low
   Status: Open  (was: Triage Needed)

https://github.com/krummas/cassandra/commits/marcuse/15708
https://circleci.com/workflow-run/890e1586-abb6-4831-9728-161598393342 (but I 
guess those can't be viewed so I'll post screenshots once done)




[jira] [Created] (CASSANDRA-15708) Fix in-jvm upgrade dtests

2020-04-08 Thread Marcus Eriksson (Jira)
Marcus Eriksson created CASSANDRA-15708:
---

 Summary: Fix in-jvm upgrade dtests
 Key: CASSANDRA-15708
 URL: https://issues.apache.org/jira/browse/CASSANDRA-15708
 Project: Cassandra
  Issue Type: Bug
Reporter: Marcus Eriksson
Assignee: Marcus Eriksson


In-jvm upgrade dtests were broken by CASSANDRA-15539






[jira] [Commented] (CASSANDRA-15688) Invalid cdc_raw_directory prevents server startup

2020-04-08 Thread Jon Meredith (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17078450#comment-17078450
 ] 

Jon Meredith commented on CASSANDRA-15688:
--

Time to eat humble pie. Reproduced locally, will investigate.
{code:java}
[junit-timeout] Forked Java VM exited abnormally. Please note the time in the report does not reflect the time until the VM exit.
[junit-timeout] junit.framework.AssertionFailedError: Forked Java VM exited abnormally. Please note the time in the report does not reflect the time until the VM exit.
[junit-timeout] at java.util.Vector.forEach(Vector.java:1275)
[junit-timeout] at java.util.Vector.forEach(Vector.java:1275)
[junit-timeout] at java.lang.Thread.run(Thread.java:748)
{code}

> Invalid cdc_raw_directory prevents server startup
> -
>
> Key: CASSANDRA-15688
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15688
> Project: Cassandra
> Issue Type: Bug
> Components: Feature/Change Data Capture
> Reporter: Jon Meredith
> Assignee: Jon Meredith
> Priority: Normal
> Labels: pull-request-available
> Fix For: 4.0-rc
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> If {{cdc_raw_directory}} is set to an invalid directory it prevents startup 
> of the server even when {{cdc_enabled}} is set to false.
> The directory can either be set directly by the {{cdc_raw_directory}} setting 
> in the configuration YAML or indirectly via the {{cassandra.storage_dir}} system 
> property, which is how I encountered it.
> Easy to reproduce by setting {{cdc_raw_directory}} to {{notadir/notasubdir}}.
> Additionally, while investigating, I discovered that 
> {{DatabaseDescriptor.guessFileStore}} can cause a {{NullPointerException}} if 
> it runs out of parent elements before it can get a {{FileStore}}. It should 
> instead throw a more useful {{ConfigurationException}} with details on the 
> problematic path.
> {{guessFileStore}} is used for checks on {{commitlog_directory}}, 
> {{cdc_raw_directory}} and {{data_file_directories}}.
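
The parent-walking behaviour described above can be sketched as follows. This is a hedged illustration rather than the real {{DatabaseDescriptor}} code: {{FileStoreGuess}} and {{ConfigException}} are hypothetical stand-ins (the latter for Cassandra's {{ConfigurationException}}), and only standard {{java.nio.file}} calls are used.

```java
import java.io.IOException;
import java.nio.file.FileStore;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical sketch: walk up the parents of a configured directory until
// one exists, and fail with a descriptive error instead of a
// NullPointerException when the parents run out.
final class FileStoreGuess {
    // Stand-in for a configuration error type (an assumption for this sketch).
    static final class ConfigException extends RuntimeException {
        ConfigException(String message) {
            super(message);
        }
    }

    static FileStore guessFileStore(String dir) throws IOException {
        Path path = Paths.get(dir);
        while (path != null) {
            try {
                return Files.getFileStore(path);
            } catch (NoSuchFileException e) {
                // This element does not exist yet; try its parent.
                // For a relative path, getParent() eventually returns null.
                path = path.getParent();
            }
        }
        throw new ConfigException("Unable to find filesystem for '" + dir + "'");
    }
}
```

With a relative value such as {{notadir/notasubdir}}, the loop runs out of parents after {{notadir}} and reports the offending path in the exception message.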






[jira] [Commented] (CASSANDRA-15688) Invalid cdc_raw_directory prevents server startup

2020-04-08 Thread Jon Meredith (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17078428#comment-17078428
 ] 

Jon Meredith commented on CASSANDRA-15688:
--

Many failures again, still believed to be unrelated. Rebased again (though 
nothing significant to tests landed) and reran.

CircleCI 
[Java8|https://circleci.com/workflow-run/1d67e766-5c04-4c44-9fc1-a1c455a32e7a] 
[Java11|https://circleci.com/workflow-run/e8b46dc7-6b16-42fe-97bb-b85de93e362f]




[jira] [Comment Edited] (CASSANDRA-15700) Performance regression on internode messaging

2020-04-08 Thread Benedict Elliott Smith (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17078265#comment-17078265
 ] 

Benedict Elliott Smith edited comment on CASSANDRA-15700 at 4/8/20, 4:00 PM:
-

bq. on what ground you say this is an isolated problem?

This appears to be an issue in communication: the word isolated refers to the 
problem _code_.  Please be assured I consider this a _serious_ problem, I am 
just unconcerned about (any difficulty) resolving it.  There is no need to 
panic and rush a fix.

bq. I’d still like to understand how the pruning approach we're discussing here 
is important to the control flow semantics at all

In the words of Dirk Gently: everything’s connected :)  Your prior alternative 
proposal to replace the existing semantics involved two distinct changes to 
control flow, namely introducing a hash timer wheel (something I’m in favour of 
generally, but demonstrably a control flow change, and preferable to defer 
until 5.0) and eliminating the expiry on enqueue.  I just consider these kinds 
of change to be riskier at this stage.

bq. The collapsed stacks clearly show most time is spent by pruning itself, 
that is by iterating the queue, rather than by expiring messages

I believe it shows as much as 10% of time in real expirations?  That is not 
insignificant, and given how relatively cheap evaluating an expiration is, it 
_may_ well be the case that the algorithmic inefficiency we are discussing is 
incidental to the behaviour.

bq. to make 4.0 behave as similarly as possible to 3.11

This leaves a significant gap still: 4.0 will use the local node message 
arrival to determine the timeout for its response, so there is still plenty of 
scope for messages to expire ahead of 3.x

bq. No, I meant to compute the "next expire time" as an approximation of the 
expire time of the processed messages, rather than relying on exactly computing 
it via the pruner at every event loop delivery run.
bq. We can keep the enqueue() pruning, as that's not the worst offender (see 
collapsed stacks).

The enqueue pruning is cheap because we compute the minimum expiration time, so 
it is infrequently called; if we only guess this number now, we offer no 
guarantees about the balance of new messages dropped in favour of expired messages.

bq. so why adding an additional mechanism and 4th config property (your 20% 
threshold)

Is this an additional mechanism? We already have a mechanism, we just pick our 
number differently.  Guessing at a number is also a mechanism, surely?  How 
would we configure the guess algorithm, and why wouldn’t we expose its 
parameters?  I had assumed we would not make this configurable, in the same way 
we would not make the assumptions of any guess algorithm configurable, since 
its purpose is just to guarantee algorithmic complexity and bound how far from 
our memory limits we permit expired messages to be preferred over unexpired 
messages.

Honestly though it sounds like there is not much between our proposals now, 
which is reassuring, so perhaps we should focus on the common ground we have.  
However, I am now on leave so I would appreciate it if you can be patient until 
I return to continue this discussion.  I would also like to look more closely 
again at the existing behaviour, as the pruning is closely related to the 
migration of new records from the MPSC queue to the internal queue.  I have 
been trying to respond promptly to your queries, but I feel that in doing so my 
responses have not been sufficiently well considered, and I would prefer to 
take time to produce a complete and coherent view and proposal.  Is that 
acceptable to you?
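
The point about the minimum expiration time can be pictured with the sketch below. It is an illustration under stated assumptions, not the {{PrunableArrayQueue}} implementation: {{MinExpiryQueue}} is a hypothetical name, entries are reduced to their expiry times, and the structure is single-threaded. A full scan happens only once the earliest known expiry has passed, and the same scan recomputes the next earliest expiry exactly.

```java
import java.util.ArrayDeque;
import java.util.Iterator;

// Hypothetical sketch: track the exact minimum expiry so that pruning can be
// skipped cheaply whenever nothing can have expired yet.
final class MinExpiryQueue {
    private final ArrayDeque<Long> expiries = new ArrayDeque<>();
    private long earliestExpiry = Long.MAX_VALUE;
    private int scans = 0;

    void add(long expiresAt) {
        expiries.addLast(expiresAt);
        earliestExpiry = Math.min(earliestExpiry, expiresAt);
    }

    // Returns the number of pruned entries. The queue is scanned only when
    // the earliest known expiry has been reached.
    int maybePrune(long now) {
        if (now < earliestExpiry) {
            return 0; // cheap path: nothing can have expired
        }
        scans++;
        int pruned = 0;
        long nextEarliest = Long.MAX_VALUE;
        for (Iterator<Long> it = expiries.iterator(); it.hasNext(); ) {
            long expiresAt = it.next();
            if (expiresAt <= now) {
                it.remove();
                pruned++;
            } else {
                nextEarliest = Math.min(nextEarliest, expiresAt);
            }
        }
        earliestExpiry = nextEarliest;
        return pruned;
    }

    int size() {
        return expiries.size();
    }

    int scans() {
        return scans;
    }
}
```

If {{earliestExpiry}} were only an estimate, the cheap path could either skip scans that would have freed capacity or trigger scans that free nothing, which is the trade-off the comment above is concerned with.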


was (Author: benedict):
bq. on what ground you say this is an isolated problem?

This appears to be an issue in communication: the word isolated refers to the 
problem _code_.  Please be assured I consider this a _serious_ problem, I am 
just unconcerned about resolving it.  There is no need to panic and rush a fix.

bq. I’d still like to understand how the pruning approach we're discussing here 
is important to the control flow semantics at all

In the words of Dirk Gently: everything’s connected :)  Your prior alternative 
proposal to replace the existing semantics involved two distinct changes to 
control flow, namely introducing a hash timer wheel (something I’m in favour of 
generally, but demonstrably a control flow change, and preferable to defer 
until 5.0) and eliminating the expiry on enqueue.  I just consider these kinds 
of change to be riskier at this stage.

bq. The collapsed stacks clearly show most time is spent by pruning itself, 
that is by iterating the queue, rather than by expiring messages

I believe it shows as much as 10% of time in real expirations?  That is not 
insignificant, and given how relatively cheap evaluating an expiration is, it 
_may_ well be the case that the algorithmic 

[jira] [Commented] (CASSANDRA-15686) Improvements in circle CI default config

2020-04-08 Thread Stefan Podkowinski (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17078419#comment-17078419
 ] 

Stefan Podkowinski commented on CASSANDRA-15686:


"there is no guidance or document saying they are not reliable on low config"

Can we update the README mentioned above on that?

> Improvements in circle CI default config
> 
>
> Key: CASSANDRA-15686
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15686
> Project: Cassandra
> Issue Type: Bug
> Components: Build
> Reporter: Kevin Gallardo
> Priority: Normal
>
> I have been looking at and playing around with the [default CircleCI 
> config|https://github.com/apache/cassandra/blob/trunk/.circleci/config.yml], 
> and have a few comments/questions regarding the following topics:
>  * Python dtests do not run successfully (200-300 failures) on {{medium}} 
> instances, they seem to only run with small flaky failures on {{large}} 
> instances or higher
>  * Python Upgrade tests:
>  ** Do not seem to run without many failures on any instance types / any 
> parallelism setting
>  ** Do not seem to parallelize well, it seems each container is going to 
> download multiple C* versions
>  ** Additionally it seems the configuration is not up to date, as currently 
> we get errors because {{JAVA8_HOME}} is not set
>  * Unit tests do not seem to parallelize optimally: the number of test runners does 
> not reflect the available CPUs on the container. Ideally, if # of runners == # 
> of CPUs, build time is improved on any type of instance.
>  ** For instance when using the current configuration, running on medium 
> instances, build will use 1 junit test runner, but 2 CPUs are available. If 
> using 2 runners, the build time is reduced from 19min (at the current main 
> config of parallelism=4) to 12min.
>  * There are some typos in the file, some dtests say "Run Unit Tests" but 
> they are JVM dtests (see 
> [here|https://github.com/apache/cassandra/blob/trunk/.circleci/config.yml#L1077],
>  
> [here|https://github.com/apache/cassandra/blob/trunk/.circleci/config.yml#L1386])
> So some ways to process these would be:
>  * Do the Python dtests run successfully for anyone on {{medium}} instances? 
> If not, would it make sense to bump them to {{large}} so that they can be run 
> successfully?
>  * Does anybody ever run the python upgrade tests on CircleCI and what is the 
> configuration that makes it work?
>  * Would it make sense to either hardcode the number of test runners in the 
> unit tests with `-Dtest.runners` in the config file to reflect the number of 
> CPUs on the instances, or change the build so that it is able to detect the 
> appropriate number of cores available automatically?
> Additionally, it seems this default config file (config.yml) is not as well 
> maintained as the 
> [{{config-2_1.yml}}|https://github.com/apache/cassandra/blob/trunk/.circleci/config-2_1.yml]
>  (+its lowres/highres) version in the same folder (from CASSANDRA-14806). 
> What is the reasoning for maintaining these 2 versions of the build? Could 
> the better maintained version be used as the default? We could generate a 
> lowres version of the new config-2_1.yml, and rename it {{config.yml}} so 
> that it gets picked up by CircleCI automatically instead of the current 
> default.
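
The "detect the number of cores automatically" option mentioned above could look roughly like this. It is a sketch only: {{TestRunnerCount}} and {{resolve}} are hypothetical names, and nothing here reflects how the Cassandra build actually reads {{test.runners}}. An explicit value wins; otherwise the runner count falls back to the CPUs visible to the JVM.

```java
// Hypothetical sketch: pick the number of test runners from an explicit
// setting if present, otherwise from the number of available CPUs.
final class TestRunnerCount {
    static int resolve(String configured, int availableCpus) {
        if (configured != null && !configured.trim().isEmpty()) {
            // An explicit -Dtest.runners=N value takes precedence.
            return Math.max(1, Integer.parseInt(configured.trim()));
        }
        return Math.max(1, availableCpus);
    }

    public static void main(String[] args) {
        int runners = resolve(System.getProperty("test.runners"),
                              Runtime.getRuntime().availableProcessors());
        System.out.println("test.runners=" + runners);
    }
}
```

One caveat: inside containers, older JVMs may report the host's CPU count from {{availableProcessors()}} rather than the container limit, so a hardcoded value per instance type may still be the more predictable choice.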






[jira] [Commented] (CASSANDRA-15686) Improvements in circle CI default config

2020-04-08 Thread Stefan Podkowinski (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17078414#comment-17078414
 ] 

Stefan Podkowinski commented on CASSANDRA-15686:


I understand that it can be confusing to have tests available to run that have 
no chance of completing successfully with the low-res settings. But if 
you look at the 
[README|https://github.com/apache/cassandra/tree/trunk/.circleci] you'll see 
that we use the same file `config-2_1.yml` and just patch it for high-res 
settings. The idea was to avoid having to maintain two versions of the CircleCI 
config files. 

> Improvements in circle CI default config
> 
>
> Key: CASSANDRA-15686
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15686
> Project: Cassandra
>  Issue Type: Bug
>  Components: Build
>Reporter: Kevin Gallardo
>Priority: Normal
>
> I have been looking at and played around with the [default CircleCI 
> config|https://github.com/apache/cassandra/blob/trunk/.circleci/config.yml], 
> a few comments/questions regarding the following topics:
>  * Python dtests do not run successfully (200-300 failures) on {{medium}} 
> instances, they seem to only run with small flaky failures on {{large}} 
> instances or higher
>  * Python Upgrade tests:
>  ** Do not seem to run without many failures on any instance types / any 
> parallelism setting
>  ** Do not seem to parallelize well, it seems each container is going to 
> download multiple C* versions
>  ** Additionally it seems the configuration is not up to date, as currently 
> we get errors because {{JAVA8_HOME}} is not set
>  * Unit tests do not seem to parallelize optimally, number of test runners do 
> not reflect the available CPUs on the container. Ideally if # of runners == # 
> of CPUs, build time is improved, on any type of instances.
>  ** For instance when using the current configuration, running on medium 
> instances, build will use 1 junit test runner, but 2 CPUs are available. If 
> using 2 runners, the build time is reduced from 19min (at the current main 
> config of parallelism=4) to 12min.
>  * There are some typos in the file, some dtests say "Run Unit Tests" but 
> they are JVM dtests (see 
> [here|https://github.com/apache/cassandra/blob/trunk/.circleci/config.yml#L1077],
>  
> [here|https://github.com/apache/cassandra/blob/trunk/.circleci/config.yml#L1386])
> So some ways to process these would be:
>  * Do the Python dtests run successfully for anyone on {{medium}} instances? 
> If not, would it make sense to bump them to {{large}} so that they can be run 
> successfully?
>  * Does anybody ever run the python upgrade tests on CircleCI and what is the 
> configuration that makes it work?
>  * Would it make sense to either hardcode the number of test runners in the 
> unit tests with `-Dtest.runners` in the config file to reflect the number of 
> CPUs on the instances, or change the build so that it is able to detect the 
> appropriate number of core available automatically?
> Additionally, it seems this default config file (config.yml) is not as well 
> maintained as the 
> [{{config-2_1.yml}}|https://github.com/apache/cassandra/blob/trunk/.circleci/config-2_1.yml]
>  (+its lowres/highres) version in the same folder (from CASSANDRA-14806). 
> What is the reasoning for maintaining these 2 versions of the build? Could 
> the better maintained version be used as the default? We could generate a 
> lowres version of the new config-2_1.yml, and rename it {{config.yml}} so 
> that it gets picked up by CircleCI automatically instead of the current 
> default.
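
The runner-count question in the description above can be sketched in a few lines. This is a minimal illustration, not the project's build logic: the helper names are hypothetical, and only the {{-Dtest.runners}} property comes from the ticket. Note that `os.cpu_count()` may report the host's CPUs rather than a container's quota, so a CI job might need to pass the instance's CPU count explicitly.

```python
import os


def pick_test_runners(available_cpus=None, max_runners=None):
    """Return how many JUnit runners to start (hypothetical helper).

    available_cpus: CPUs granted to the container; when omitted we fall
    back to os.cpu_count(), which may over-report inside a container.
    max_runners: optional upper bound, e.g. to protect memory.
    """
    cpus = available_cpus if available_cpus is not None else (os.cpu_count() or 1)
    runners = max(1, cpus)  # never go below one runner
    if max_runners is not None:
        runners = min(runners, max_runners)
    return runners


def ant_args(runners):
    """Build an illustrative command line; only -Dtest.runners is from the ticket."""
    return ["ant", "test", f"-Dtest.runners={runners}"]


if __name__ == "__main__":
    # A medium instance as described in the ticket: 2 CPUs available.
    print(" ".join(ant_args(pick_test_runners(available_cpus=2))))
```

Hardcoding the value per instance type in the config file and detecting it at build time are both compatible with this shape; the sketch only shows the "runners == CPUs" rule itself.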



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-15707) Fix cqlsh output test

2020-04-08 Thread Michael Semb Wever (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17078066#comment-17078066
 ] 

Michael Semb Wever edited comment on CASSANDRA-15707 at 4/8/20, 3:50 PM:
-

||branch||circleci||jenkins||
|[trunk_15707|https://github.com/apache/cassandra/compare/trunk...nastra:CASSANDRA-15707]|[circleci|https://circleci.com/gh/nastra/workflows/cassandra/tree/CASSANDRA-15707]|[!https://ci-cassandra.apache.org/job/Cassandra-devbranch/41/badge/icon!|https://ci-cassandra.apache.org/blue/organizations/jenkins/Cassandra-devbranch/detail/Cassandra-devbranch/41]|


was (Author: michaelsembwever):
||branch||circleci||jenkins||
|[trunk_15707|https://github.com/apache/cassandra/compare/trunk...nastra:CASSANDRA-15707]|[circleci|https://circleci.com/gh/nastra/workflows/cassandra/tree/CASSANDRA-15707]|[!https://ci-cassandra.apache.org/job/Cassandra-devbranch/36/badge/icon!|https://ci-cassandra.apache.org/blue/organizations/jenkins/Cassandra-devbranch/detail/Cassandra-devbranch/36]|

> Fix cqlsh output test
> -
>
> Key: CASSANDRA-15707
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15707
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/unit
>Reporter: Eduard Tudenhoefner
>Assignee: Eduard Tudenhoefner
>Priority: Normal
>  Labels: pull-request-available
> Fix For: 4.0-alpha
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> https://ci-cassandra.apache.org/view/branches/job/Cassandra-trunk/45/testReport/cqlshlib.test.test_cqlsh_output/TestCqlshOutput/
> {code}
> Sequences differ: ['CRE[438 chars]nt', ") WITH additional_write_policy = 
> '99p'",[710 chars], ''] != ['CRE[438 chars]nt', ') WITH 
> bloom_filter_fp_chance = 0.01', "[711 chars], '']
> First differing element 17:
> ") WITH additional_write_policy = '99p'"
> ') WITH bloom_filter_fp_chance = 0.01'
> Diff is 1475 characters long. Set self.maxDiff to None to see it.
> """Fail immediately, with the given message."""
> >>  raise self.failureException('Sequences differ: [\'CRE[438 chars]nt\', ") 
> >> WITH additional_write_policy = \'99p\'",[710 chars], \'\'] != [\'CRE[438 
> >> chars]nt\', \') WITH bloom_filter_fp_chance = 0.01\', "[711 chars], 
> >> \'\']\n\nFirst differing element 17:\n") WITH additional_write_policy = 
> >> \'99p\'"\n\') WITH bloom_filter_fp_chance = 0.01\'\n\nDiff is 1475 
> >> characters long. Set self.maxDiff to None to see it.')
> {code}
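
The failure above is a list comparison that stops at "First differing element 17". As a minimal sketch of that check (the helper name is hypothetical and this is not the cqlshlib test code), the first mismatching index of two sequences can be located like this. As the log message itself says, setting `maxDiff = None` on the `unittest.TestCase` makes unittest print the full diff instead of truncating it.

```python
def first_difference(first, second):
    """Return the index of the first differing element, or None if equal."""
    for index, (a, b) in enumerate(zip(first, second)):
        if a != b:
            return index
    if len(first) != len(second):
        # One sequence is a strict prefix of the other.
        return min(len(first), len(second))
    return None


if __name__ == "__main__":
    # The two lines reported in the log, placed at index 0 for brevity.
    first = [") WITH additional_write_policy = '99p'"]
    second = [") WITH bloom_filter_fp_chance = 0.01"]
    print(first_difference(first, second))
```

In this case the two option lists appear to differ in which table option follows `WITH`, which suggests the expected output in the test predates the `additional_write_policy` option; the log alone does not confirm this.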






[cassandra-builds] 02/02: Add Ubuntu 19.10 with JDK11 Image

2020-04-08 Thread mck
This is an automated email from the ASF dual-hosted git repository.

mck pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/cassandra-builds.git

commit 32527553a8f37a9a3e7b6847bcc06869f8aa59e2
Author: Eduard Tudenhoefner 
AuthorDate: Thu Apr 2 17:49:43 2020 +0200

Add Ubuntu 19.10 with JDK11 Image

This image includes Python 3.6+3.7+3.8 installations in case C* wants to
perform testing with multiple Python versions.
---
 docker/testing/ubuntu1910_j11.docker   | 127 +
 .../testing/ubuntu1910_j11_w_dependencies.docker   |  37 ++
 2 files changed, 164 insertions(+)

diff --git a/docker/testing/ubuntu1910_j11.docker b/docker/testing/ubuntu1910_j11.docker
new file mode 100644
index 000..38f60bb
--- /dev/null
+++ b/docker/testing/ubuntu1910_j11.docker
@@ -0,0 +1,127 @@
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+FROM ubuntu:19.10
+MAINTAINER Eduard Tudenhoefner 
+
+# install our python dependencies and some other stuff we need
+# libev4 libev-dev are for the python driver / libssl-dev is for python3.6
+
+RUN export DEBIAN_FRONTEND=noninteractive && \
+apt-get update && \
+apt-get install -y --no-install-recommends software-properties-common apt-utils vim
+
+RUN export DEBIAN_FRONTEND=noninteractive && \
+apt-get update && \
+apt-get install -y git-core python2.7 python3-pip python3.8 python3.8-venv python3.8-dev net-tools libev4 libev-dev wget gcc libssl-dev
+
+# need to install Python 3.6 as well
+RUN cd /opt && wget https://www.python.org/ftp/python/3.6.10/Python-3.6.10.tgz && \
+tar xzf Python-3.6.10.tgz && cd Python-3.6.10 && \
+./configure --enable-optimizations && \
+make altinstall && \
+cp /opt/Python-3.6.10/python /usr/bin/python3.6
+
+RUN update-alternatives --install /usr/bin/python python /usr/bin/python3.6 2
+RUN update-alternatives --install /usr/bin/python python /usr/bin/python3.8 3
+RUN python3.6 -m pip install --upgrade pip
+RUN python3.7 -m pip install --upgrade pip
+RUN python3.8 -m pip install --upgrade pip
+
+# solves warning: "jemalloc shared library could not be preloaded to speed up memory allocations"
+RUN export DEBIAN_FRONTEND=noninteractive && \
+apt-get update && \
+apt-get install -y --no-install-recommends libjemalloc2
+
+# install dumb-init as minimal init system
+RUN export DEBIAN_FRONTEND=noninteractive && \
+apt-get update && \
+apt-get install -y dumb-init
+
+# generate locales for the standard en_US.UTF8 value we use for testing
+RUN export DEBIAN_FRONTEND=noninteractive && \
+apt-get update && \
+apt-get install -y locales && \
+locale-gen en_US.UTF-8
+
+# as we only need the requirements.txt file from the dtest repo, let's just get it from GitHub as a raw asset
+# so we can avoid needing to clone the entire repo just to get this file
+ADD https://raw.githubusercontent.com/apache/cassandra-dtest/master/requirements.txt /opt
+RUN chmod 0644 /opt/requirements.txt
+
+# now setup python via virtualenv with all of the python dependencies we need according to requirements.txt
+RUN pip3 install virtualenv
+RUN pip3 install --upgrade wheel
+
+# openjdk + ant
+RUN export DEBIAN_FRONTEND=noninteractive && \
+apt-get update && \
+apt-get install -y --no-install-recommends openjdk-8-jdk openjdk-11-jdk ant ant-optional
+
+# make Java 8 the default executable (we use to run all tests against Java 8)
+RUN update-java-alternatives -s java-1.8.0-openjdk-amd64
+
+# setup our user -- if we don't do this docker will default to root and Cassandra will fail to start
+# as we appear to have a check that the user isn't starting Cassandra as root
+RUN export DEBIAN_FRONTEND=noninteractive && \
+apt-get install sudo && \
+adduser --disabled-password --gecos "" cassandra && \
+echo "cassandra ALL=(root) NOPASSWD:ALL" > /etc/sudoers.d/cassandra && \
+chmod 0440 /etc/sudoers.d/cassandra
+
+# fix up permissions on the cassandra home dir
+RUN chown -R cassandra:cassandra /home/cassandra
+
+# switch to the cassandra user... we are all done running things as root
+USER cassandra
+ENV HOME /home/cassandra
+WORKDIR /home/cassandra
+
+# Add environment variables for Ant and Java and add them to the PATH
+RUN echo 'export ANT_HOME=/usr/share/ant' >> /home/cassandra/.bashrc && \
+echo 'export JAVA8_HOME=/usr/lib/jvm/java-8-openjdk-amd64' >> /home/cassandra/.bashrc && \
+echo 'export 

[cassandra-builds] 01/02: fix typos

2020-04-08 Thread mck
This is an automated email from the ASF dual-hosted git repository.

mck pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/cassandra-builds.git

commit 02fc65f9def0069c09bc9516b5ae5aafcc14b58b
Author: Eduard Tudenhoefner 
AuthorDate: Thu Apr 2 17:49:16 2020 +0200

fix typos
---
 docker/testing/ubuntu1810_j11.docker | 2 +-
 docker/testing/ubuntu18_j11.docker   | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/docker/testing/ubuntu1810_j11.docker b/docker/testing/ubuntu1810_j11.docker
index 328cd31..835d2ad 100644
--- a/docker/testing/ubuntu1810_j11.docker
+++ b/docker/testing/ubuntu1810_j11.docker
@@ -81,7 +81,7 @@ RUN echo 'export ANT_HOME=/usr/share/ant' >> /home/cassandra/.bashrc && \
 echo 'export JAVA_HOME=$JAVA8_HOME' >> /home/cassandra/.bashrc
 
 # run pip commands and setup virtualenv (note we do this after we switch to cassandra user so we
-# setup the virtualenv for the cassandrauser and not the root user by acident)
+# setup the virtualenv for the cassandra user and not the root user by accident)
 RUN virtualenv --python=python3.6 --no-site-packages env
 RUN chmod +x env/bin/activate
 RUN /bin/bash -c "source ~/env/bin/activate && pip3 install Cython && pip3 install -r /opt/requirements.txt && pip3 freeze --user"
diff --git a/docker/testing/ubuntu18_j11.docker b/docker/testing/ubuntu18_j11.docker
index d54eb5e..7e92c0f 100644
--- a/docker/testing/ubuntu18_j11.docker
+++ b/docker/testing/ubuntu18_j11.docker
@@ -81,7 +81,7 @@ RUN echo 'export ANT_HOME=/usr/share/ant' >> /home/cassandra/.bashrc && \
 echo 'export JAVA_HOME=$JAVA8_HOME' >> /home/cassandra/.bashrc
 
 # run pip commands and setup virtualenv (note we do this after we switch to cassandra user so we
-# setup the virtualenv for the cassandrauser and not the root user by acident)
+# setup the virtualenv for the cassandra user and not the root user by accident)
 RUN virtualenv --python=python3.6 --no-site-packages env
 RUN chmod +x env/bin/activate
 RUN /bin/bash -c "source ~/env/bin/activate && pip3 install Cython && pip3 install -r /opt/requirements.txt && pip3 freeze --user"





[cassandra-builds] branch master updated (eeb3804 -> 3252755)

2020-04-08 Thread mck
This is an automated email from the ASF dual-hosted git repository.

mck pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/cassandra-builds.git.


from eeb3804  Jenkins devbranch-artifacts job to call `cassandra-builds/build-scripts/cassandra-artifacts.sh`
 new 02fc65f  fix typos
 new 3252755  Add Ubuntu 19.10 with JDK11 Image

The 2 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 docker/testing/ubuntu1810_j11.docker   |  2 +-
 docker/testing/ubuntu18_j11.docker |  2 +-
 ...ubuntu1810_j11.docker => ubuntu1910_j11.docker} | 38 +-
 ...docker => ubuntu1910_j11_w_dependencies.docker} |  6 ++--
 4 files changed, 34 insertions(+), 14 deletions(-)
 copy docker/testing/{ubuntu1810_j11.docker => ubuntu1910_j11.docker} (73%)
 copy docker/testing/{ubuntu1810_j11_w_dependencies.docker => ubuntu1910_j11_w_dependencies.docker} (91%)





[jira] [Commented] (CASSANDRA-15686) Improvements in circle CI default config

2020-04-08 Thread Kevin Gallardo (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17078372#comment-17078372
 ] 

Kevin Gallardo commented on CASSANDRA-15686:


bq. Unit tests should be able to complete on Circle CI with medium instances. 
The unit tests are run automatically, so that's the default.

FWIW, JVM dtests also run by default; unlike the Python dtests, they are able 
to complete on medium instances.

bq. Pretty much all others won't due to limited resources. You shouldn't try 
running them, but if you do, then we can't guarantee they may complete unless 
high resource settings are used.

Right, that was my understanding initially. However, those tests are available 
to run on the lowres config, and there is no guidance or documentation saying 
they are unreliable on that config, so people get confused as to why, for 
example, the Python dtests run with 200-300 failures when using the default 
config. I was suggesting that we either bump up the default config so that 
they can run successfully, remove them from the config since they can't be run 
(and only keep them in the HIGHRES file?), or document this better. I 
understood that David recommended fixing the tests so that they run on medium 
instances too; that also sounds reasonable IMO if it makes things less 
confusing in the long run.

Thanks for the input

> Improvements in circle CI default config
> 
>
> Key: CASSANDRA-15686
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15686
> Project: Cassandra
>  Issue Type: Bug
>  Components: Build
>Reporter: Kevin Gallardo
>Priority: Normal
>
> I have been looking at and playing around with the [default CircleCI 
> config|https://github.com/apache/cassandra/blob/trunk/.circleci/config.yml], 
> and have a few comments/questions regarding the following topics:
>  * Python dtests do not run successfully (200-300 failures) on {{medium}} 
> instances, they seem to only run with small flaky failures on {{large}} 
> instances or higher
>  * Python Upgrade tests:
>  ** Do not seem to run without many failures on any instance types / any 
> parallelism setting
>  ** Do not seem to parallelize well, it seems each container is going to 
> download multiple C* versions
>  ** Additionally it seems the configuration is not up to date, as currently 
> we get errors because {{JAVA8_HOME}} is not set
>  * Unit tests do not seem to parallelize optimally: the number of test runners 
> does not reflect the CPUs available on the container. Ideally, if # of runners 
> == # of CPUs, build time improves on any instance type.
>  ** For instance when using the current configuration, running on medium 
> instances, build will use 1 junit test runner, but 2 CPUs are available. If 
> using 2 runners, the build time is reduced from 19min (at the current main 
> config of parallelism=4) to 12min.
>  * There are some typos in the file, some dtests say "Run Unit Tests" but 
> they are JVM dtests (see 
> [here|https://github.com/apache/cassandra/blob/trunk/.circleci/config.yml#L1077],
>  
> [here|https://github.com/apache/cassandra/blob/trunk/.circleci/config.yml#L1386])
> So some ways to process these would be:
>  * Do the Python dtests run successfully for anyone on {{medium}} instances? 
> If not, would it make sense to bump them to {{large}} so that they can be run 
> successfully?
>  * Does anybody ever run the python upgrade tests on CircleCI and what is the 
> configuration that makes it work?
>  * Would it make sense to either hardcode the number of test runners in the 
> unit tests with `-Dtest.runners` in the config file to reflect the number of 
> CPUs on the instances, or change the build so that it is able to detect the 
> appropriate number of available cores automatically?
> Additionally, it seems this default config file (config.yml) is not as well 
> maintained as the 
> [{{config-2_1.yml}}|https://github.com/apache/cassandra/blob/trunk/.circleci/config-2_1.yml]
>  (+its lowres/highres) version in the same folder (from CASSANDRA-14806). 
> What is the reasoning for maintaining these 2 versions of the build? Could 
> the better maintained version be used as the default? We could generate a 
> lowres version of the new config-2_1.yml, and rename it {{config.yml}} so 
> that it gets picked up by CircleCI automatically instead of the current 
> default.






[jira] [Updated] (CASSANDRA-15700) Performance regression on internode messaging

2020-04-08 Thread Aleksey Yeschenko (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15700?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Yeschenko updated CASSANDRA-15700:
--
 Bug Category: Parent values: Degradation(12984)Level 1 values: Performance 
Bug/Regression(12997)
   Complexity: Normal
Discovered By: Performance Regression Test
Fix Version/s: (was: 4.0)
   4.0-beta
 Severity: Normal
   Status: Open  (was: Triage Needed)

> Performance regression on internode messaging
> -
>
> Key: CASSANDRA-15700
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15700
> Project: Cassandra
>  Issue Type: Bug
>  Components: Messaging/Internode
>Reporter: Sergio Bossa
>Assignee: Sergio Bossa
>Priority: Normal
> Fix For: 4.0-beta
>
> Attachments: Oss40vsOss311.png, oss40.gc, oss40_nogc.tar.xz, 
> oss40_system.log
>
>
> [~jasonstack] and I have been investigating a performance regression 
> affecting 4.0 during a 3-node, RF 3 write throughput test with a 
> timeseries-like workload, as shown in this plot, where blue is 3.11 and 
> orange is 4.0:
> !Oss40vsOss311.png|width=389,height=214!
>  It's been a bit of a long investigation, but two clues ended up standing out:
> 1) An abnormal number of expired messages on 4.0 (as shown in the attached  
> system log), while 3.11 has almost none.
> 2) Abnormal GC activity (as shown in the attached gc log).
> It turns out the two are related, as the [on expired 
> callback|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/net/OutboundConnection.java#L462]
>  creates a huge number of strings in the {{id()}} call. The next question is 
> what causes all those message expirations; we thoroughly reviewed the 
> internode messaging code and the only issue we could find so far is related 
> to the "batch pruning" calls 
> [here|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/net/OutboundMessageQueue.java#L81]
>  and 
> [here|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/net/OutboundMessageQueue.java#L188]:
>  it _seems_ too much time is spent on those, causing the event loop to fall 
> behind in processing the rest of the messages, which will end up being 
> expired. This is supported by the analysis of the collapsed stacks (after 
> fixing the GC issue):
> {noformat}
> (tprint (top-aggregated-calls oss40nogc "EventLoopDelivery:doRun" 5))
> org/apache/cassandra/net/OutboundConnection$EventLoopDelivery:doRun 3456
> org/apache/cassandra/net/OutboundMessageQueue:access$600 1621
> org/apache/cassandra/net/PrunableArrayQueue:prune 1621
> org/apache/cassandra/net/OutboundMessageQueue$WithLock:close 1621
> org/apache/cassandra/net/OutboundMessageQueue:pruneInternalQueueWithLock 1620
> {noformat}
> Those are the top 5 sampled calls from {{EventLoopDelivery#doRun()}}, which 
> spends roughly half of its time pruning. But only a tiny portion of that 
> pruning time is spent actually expiring:
> {noformat}
> (tprint (top-aggregated-calls oss40nogc 
> "OutboundMessageQueue:pruneInternalQueueWithLock" 5))
> org/apache/cassandra/net/OutboundMessageQueue:pruneInternalQueueWithLock 1900
> org/apache/cassandra/net/PrunableArrayQueue:prune 1894
> org/apache/cassandra/net/OutboundMessageQueue$1Pruner:onPruned 147
> org/apache/cassandra/net/OutboundConnection$$Lambda$444/740904487:accept 147
> org/apache/cassandra/net/OutboundConnection:onExpired 147
> {noformat}
> And indeed, the {{PrunableArrayQueue:prune()}} self time is dominant:
> {noformat}
> (tprint (top-self-calls oss40nogc "PrunableArrayQueue:prune" 5))
> org/apache/cassandra/net/PrunableArrayQueue:prune 1718
> org/apache/cassandra/net/OutboundConnection:releaseCapacity 27
> java/util/concurrent/ConcurrentHashMap:replaceNode 19
> java/util/concurrent/ConcurrentLinkedQueue:offer 16
> java/util/concurrent/LinkedBlockingQueue:offer 15
> {noformat}
> That said, before proceeding with a PR to fix those issues, I'd like to 
> understand: what's the reason to prune so often, rather than just when 
> polling the message during delivery? If there's a reason I'm missing, let's 
> talk about how to optimize pruning, otherwise let's get rid of that.
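
The proportions quoted in the description can be checked directly from the sample counts in the three listings. This is only a worked calculation over the numbers shown above; the variable names are illustrative, and dividing the self-time count from the third listing by the total from the second assumes the two listings are comparable.

```python
def share(part, whole):
    """Fraction of `whole` samples that fall under `part`."""
    return part / whole


# Sample counts copied from the collapsed-stack listings above.
DO_RUN_TOTAL = 3456            # EventLoopDelivery:doRun
PRUNE_UNDER_DO_RUN = 1621      # PrunableArrayQueue:prune beneath doRun
PRUNE_WITH_LOCK_TOTAL = 1900   # pruneInternalQueueWithLock
ON_EXPIRED = 147               # OutboundConnection:onExpired
PRUNE_TOTAL = 1894             # PrunableArrayQueue:prune (second listing)
PRUNE_SELF = 1718              # PrunableArrayQueue:prune self time (third listing)

pruning_share = share(PRUNE_UNDER_DO_RUN, DO_RUN_TOTAL)   # roughly 47%
expiring_share = share(ON_EXPIRED, PRUNE_WITH_LOCK_TOTAL)  # roughly 8%
self_share = share(PRUNE_SELF, PRUNE_TOTAL)                # roughly 91%

if __name__ == "__main__":
    print(f"doRun time spent pruning:      {pruning_share:.1%}")
    print(f"pruning time spent expiring:   {expiring_share:.1%}")
    print(f"prune() time that is self time: {self_share:.1%}")
```

Under those assumptions, iterating the queue accounts for most of the pruning cost, while the expiry callbacks account for under a tenth of it.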






[jira] [Commented] (CASSANDRA-15551) Fix flaky tests org.apache.cassandra.service.MoveTest testStateJumpToNormal and testMoveWithPendingRangesNetworkStrategyRackAwareThirtyNodes

2020-04-08 Thread Gianluca Righetto (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17078323#comment-17078323
 ] 

Gianluca Righetto commented on CASSANDRA-15551:
---

[~eduard.tudenhoefner] I can consistently reproduce this by using breakpoints, 
so the issue is still there. I'm about to submit a patch with a fix, but I 
think I found another potential race condition in this test yesterday, which 
I'm investigating now.

> Fix flaky tests org.apache.cassandra.service.MoveTest testStateJumpToNormal 
> and testMoveWithPendingRangesNetworkStrategyRackAwareThirtyNodes
> 
>
> Key: CASSANDRA-15551
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15551
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/unit
>Reporter: David Capwell
>Assignee: Gianluca Righetto
>Priority: Normal
> Fix For: 4.0-alpha
>
>
> testStateJumpToNormal failure was on java 11
> {code}
> java.lang.NullPointerException
>   at org.apache.cassandra.gms.Gossiper.getHostId(Gossiper.java:1028)
>   at org.apache.cassandra.gms.Gossiper.getHostId(Gossiper.java:1023)
>   at 
> org.apache.cassandra.service.StorageService.handleStateNormal(StorageService.java:2513)
>   at 
> org.apache.cassandra.service.StorageService.onChange(StorageService.java:2055)
>   at org.apache.cassandra.Util.createInitialRing(Util.java:225)
>   at 
> org.apache.cassandra.service.MoveTest.testStateJumpToNormal(MoveTest.java:935)
>   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> {code}
> testMoveWithPendingRangesNetworkStrategyRackAwareThirtyNodes failure was on 
> java 8
> {code}
> java.lang.NullPointerException
>   at 
> org.apache.cassandra.service.StorageService.updatePeerInfo(StorageService.java:2174)
>   at 
> org.apache.cassandra.service.StorageService.handleStateNormal(StorageService.java:2511)
>   at 
> org.apache.cassandra.service.StorageService.onChange(StorageService.java:2055)
>   at org.apache.cassandra.Util.createInitialRing(Util.java:225)
>   at 
> org.apache.cassandra.service.MoveTest.testMoveWithPendingRangesNetworkStrategyRackAwareThirtyNodes(MoveTest.java:199)
> {code}






[jira] [Commented] (CASSANDRA-15666) Race condition when completing stream sessions

2020-04-08 Thread Benjamin Lerer (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17078313#comment-17078313
 ] 

Benjamin Lerer commented on CASSANDRA-15666:


I put some comments on the PR.
It is always easier to fix some problems in major versions as there are fewer 
constraints during upgrades. So unless we believe that it will take a long 
time, it is probably better to fix it in the scope of that ticket.

> Race condition when completing stream sessions
> --
>
> Key: CASSANDRA-15666
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15666
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Streaming and Messaging
>Reporter: Sergio Bossa
>Assignee: ZhaoYang
>Priority: Normal
> Fix For: 4.0
>
>
> {{StreamSession#prepareAsync()}} executes, as the name implies, 
> asynchronously from the IO thread: this opens the door to race conditions 
> between the sending of the {{PrepareSynAckMessage}} and the call to 
> {{StreamSession#maybeCompleted()}}. I.e., the following could happen:
> 1) Node A sends {{PrepareSynAckMessage}} from the {{prepareAsync()}} thread.
> 2) Node B receives it and starts streaming.
> 3) Node A receives the streamed file and sends {{ReceivedMessage}}.
> 4) At this point, if this was the only file to stream, both nodes are ready 
> to close the session via {{maybeCompleted()}}, but:
> a) Node A will call it twice from both the IO thread and the thread at #1, 
> closing the session and its channels.
> b) Node B will attempt to send a {{CompleteMessage}}, but will fail because 
> the session has been closed in the meantime.
> There are other subtle variations of the pattern above, depending on the 
> order of concurrently sent/received messages.
> I believe the best fix would be to modify the message exchange so that:
> 1) Only the "follower" is allowed to send the {{CompleteMessage}}.
> 2) Only the "initiator" is allowed to close the session and its channels 
> after receiving the {{CompleteMessage}}.
> By doing so, the message exchange logic would be easier to reason about, 
> which is overall a win anyway.
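
The two rules proposed at the end of the description can be modelled with a toy state machine. This is a hedged sketch of the proposed message exchange, not Cassandra's {{StreamSession}}: the class, method and attribute names are hypothetical, and the model is single-threaded, so it only shows that repeated {{maybeCompleted()}} calls become harmless once the roles are fixed.

```python
from enum import Enum


class Role(Enum):
    INITIATOR = "initiator"
    FOLLOWER = "follower"


class SessionSketch:
    """Toy model of one side of a stream session (hypothetical, not the real class)."""

    def __init__(self, role):
        self.role = role
        self.closed = False
        self.complete_sent = False
        self.outbox = []  # messages this side would put on the wire

    def maybe_completed(self):
        """Safe to call more than once, e.g. from the IO and prepare paths."""
        if self.closed or self.complete_sent:
            return
        if self.role is Role.FOLLOWER:
            # Rule 1: only the follower announces completion.
            self.outbox.append("CompleteMessage")
            self.complete_sent = True
        # The initiator does nothing here: it waits for CompleteMessage.

    def on_complete_message(self):
        """Rule 2: only the initiator closes, and only after CompleteMessage."""
        if self.role is not Role.INITIATOR:
            raise RuntimeError("a follower should never receive CompleteMessage")
        self.closed = True


if __name__ == "__main__":
    initiator, follower = SessionSketch(Role.INITIATOR), SessionSketch(Role.FOLLOWER)
    initiator.maybe_completed()
    initiator.maybe_completed()  # second call no longer closes anything
    follower.maybe_completed()
    initiator.on_complete_message()
    print(follower.outbox, initiator.closed)
```

With a single sender of {{CompleteMessage}} and a single closer, the scenario in step 4 cannot close the channels before the message is sent, at least in this simplified model.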






[jira] [Commented] (CASSANDRA-15700) Performance regression on internode messaging

2020-04-08 Thread Benedict Elliott Smith (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17078265#comment-17078265
 ] 

Benedict Elliott Smith commented on CASSANDRA-15700:


bq. on what ground you say this is an isolated problem?

This appears to be a communication issue: the word "isolated" refers to the 
problem _code_.  Please be assured I consider this a _serious_ problem; I am 
just unconcerned about resolving it.  There is no need to panic and rush a fix.

bq. I’d still like to understand how the pruning approach we're discussing here 
is important to the control flow semantics at all

In the words of Dirk Gently: everything’s connected :)  Your prior alternative 
proposal to replace the existing semantics involved two distinct changes to 
control flow, namely introducing a hash timer wheel (something I’m in favour of 
generally, but demonstrably a control flow change, and preferable to defer 
until 5.0) and eliminating the expiry on enqueue.  I just consider these kinds 
of change to be riskier at this stage.

bq. The collapsed stacks clearly show most time is spent by pruning itself, 
that is by iterating the queue, rather than by expiring messages

I believe it shows as much as 10% of time in real expirations?  That is not 
insignificant, and given how relatively cheap evaluating an expiration is, it 
_may_ well be the case that the algorithmic inefficiency we are discussing is 
incidental to the behaviour.

bq. to make 4.0 behave as similarly as possible to 3.11

This leaves a significant gap still: 4.0 will use the local node message 
arrival to determine the timeout for its response, so there is still plenty of 
scope for messages to expire ahead of 3.x

bq. No, I meant to compute the "next expire time" as an approximation of the 
expire time of the processed messages, rather than relying on exactly computing 
it via the pruner at every event loop delivery run.
bq. We can keep the enqueue() pruning, as that's not the worst offender (see 
collapsed stacks).

The enqueue pruning is cheap because we compute the minimum expiration time, so 
it is infrequently called; if we only guess this number, we offer no guarantees 
about the balance of new messages dropped in favour of expired messages.

bq. so why adding an additional mechanism and 4th config property (your 20% 
threshold)

Is this an additional mechanism? We already have a mechanism, we just pick our 
number differently.  Guessing at a number is also a mechanism, surely?  How 
would we configure the guess algorithm, and why wouldn’t we expose its 
parameters?  I had assumed we would not make this configurable, in the same way 
we would not make the assumptions of any guess algorithm configurable, since 
its purpose is just to guarantee algorithmic complexity and bound how far from 
our memory limits we permit expired messages to be preferred over unexpired 
messages.

Honestly though it sounds like there is not much between our proposals now, 
which is reassuring, so perhaps we should focus on the common ground we have.  
However, I am now on leave so I would appreciate it if you can be patient until 
I return to continue this discussion.  I would also like to look more closely 
again at the existing behaviour, as the pruning is closely related to the 
migration of new records from the MPSC queue to the internal queue.  I have 
been trying to respond promptly to your queries, but I feel that in doing so my 
responses have not been sufficiently well considered, and I would prefer to 
take time to produce a complete and coherent view and proposal.  Is that 
acceptable to you?
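
The point about enqueue pruning being cheap because the minimum expiration time is tracked can be illustrated with a small sketch. This is an assumption-laden toy, not {{OutboundMessageQueue}} or {{PrunableArrayQueue}}: the class and its fields are hypothetical, time is a plain number, and there is no locking.

```python
class ExpiringQueueSketch:
    """Queue that only scans for expired items once the earliest expiry has passed."""

    def __init__(self):
        self._items = []                       # (expires_at, message) pairs
        self._earliest_expiry = float("inf")   # exact minimum over queued items
        self.full_scans = 0                    # how many O(n) prunes actually ran

    def __len__(self):
        return len(self._items)

    def enqueue(self, message, expires_at, now):
        self.maybe_prune(now)
        self._items.append((expires_at, message))
        self._earliest_expiry = min(self._earliest_expiry, expires_at)

    def maybe_prune(self, now):
        """Return expired messages; skip the scan if nothing can have expired yet."""
        if now < self._earliest_expiry:
            return []
        self.full_scans += 1
        expired = [m for (t, m) in self._items if t <= now]
        self._items = [(t, m) for (t, m) in self._items if t > now]
        # Recompute the exact minimum so the next check stays cheap.
        self._earliest_expiry = min((t for t, _ in self._items), default=float("inf"))
        return expired


if __name__ == "__main__":
    q = ExpiringQueueSketch()
    q.enqueue("a", expires_at=10, now=0)
    q.enqueue("b", expires_at=20, now=1)
    print(q.maybe_prune(now=5), q.maybe_prune(now=10), q.full_scans)
```

Replacing the exact minimum with an estimate would change when the gate opens: too early costs extra scans, too late leaves expired messages occupying capacity, which is the trade-off discussed in the comment above.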

> Performance regression on internode messaging
> -
>
> Key: CASSANDRA-15700
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15700
> Project: Cassandra
>  Issue Type: Bug
>  Components: Messaging/Internode
>Reporter: Sergio Bossa
>Assignee: Sergio Bossa
>Priority: Normal
> Fix For: 4.0
>
> Attachments: Oss40vsOss311.png, oss40.gc, oss40_nogc.tar.xz, 
> oss40_system.log
>
>
> [~jasonstack] and I have been investigating a performance regression 
> affecting 4.0 during a 3-node, RF 3 write throughput test with a 
> timeseries-like workload, as shown in this plot, where blue is 3.11 and 
> orange is 4.0:
> !Oss40vsOss311.png|width=389,height=214!
>  It's been a bit of a long investigation, but two clues ended up standing out:
> 1) An abnormal number of expired messages on 4.0 (as shown in the attached  
> system log), while 3.11 has almost none.
> 2) Abnormal GC activity (as shown in the attached gc log).
> It turns out the two are related, as the [on expired 
> callback|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/net/OutboundConnection.java#L462]
>  creates a huge number of strings 

[jira] [Updated] (CASSANDRA-14050) Many cqlsh_copy_tests are busted

2020-04-08 Thread Aleksandr Sorokoumov (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksandr Sorokoumov updated CASSANDRA-14050:
-
Reviewers: Aleksandr Sorokoumov, Aleksandr Sorokoumov  (was: Aleksandr 
Sorokoumov)
   Aleksandr Sorokoumov, Aleksandr Sorokoumov  (was: Aleksandr 
Sorokoumov)
   Status: Review In Progress  (was: Patch Available)

> Many cqlsh_copy_tests are busted
> 
>
> Key: CASSANDRA-14050
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14050
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Testing
>Reporter: Michael Kjellman
>Assignee: Stefania Alborghetti
>Priority: Normal
>
> Many cqlsh_copy_tests are busted. We should disable the entire suite until 
> this is resolved as these tests are currently nothing but a waste of time.
> test_bulk_round_trip_blogposts - cqlsh_tests.cqlsh_copy_tests.CqlshCopyTest
> test_bulk_round_trip_blogposts_with_max_connections - 
> cqlsh_tests.cqlsh_copy_tests.CqlshCopyTest
> test_bulk_round_trip_default - cqlsh_tests.cqlsh_copy_tests.CqlshCopyTest
> Error starting node3.
>  >> begin captured logging << 
> dtest: DEBUG: cluster ccm directory: /tmp/dtest-S9NfIH
> dtest: DEBUG: Done setting configuration options:
> {   'initial_token': None,
> 'memtable_allocation_type': 'offheap_objects',
> 'num_tokens': '256',
> 'phi_convict_threshold': 5,
> 'range_request_timeout_in_ms': 1,
> 'read_request_timeout_in_ms': 1,
> 'request_timeout_in_ms': 1,
> 'truncate_request_timeout_in_ms': 1,
> 'write_request_timeout_in_ms': 1}
> - >> end captured logging << -
>   File "/usr/lib/python2.7/unittest/case.py", line 329, in run
> testMethod()
>   File "/home/cassandra/cassandra-dtest/cqlsh_tests/cqlsh_copy_tests.py", 
> line 2546, in test_bulk_round_trip_blogposts
> stress_table='stresscql.blogposts')
>   File "/home/cassandra/cassandra-dtest/cqlsh_tests/cqlsh_copy_tests.py", 
> line 2451, in _test_bulk_round_trip
> self.prepare(nodes=nodes, partitioner=partitioner, 
> configuration_options=configuration_options)
>   File "/home/cassandra/cassandra-dtest/cqlsh_tests/cqlsh_copy_tests.py", 
> line 115, in prepare
> self.cluster.populate(nodes, 
> tokens=tokens).start(wait_for_binary_proto=True)
>   File 
> "/home/cassandra/env/local/lib/python2.7/site-packages/ccmlib/cluster.py", 
> line 423, in start
> raise NodeError("Error starting {0}.".format(node.name), p)
> "Error starting node3.\n >> begin captured logging << 
> \ndtest: DEBUG: cluster ccm directory: 
> /tmp/dtest-S9NfIH\ndtest: DEBUG: Done setting configuration options:\n{   
> 'initial_token': None,\n'memtable_allocation_type': 'offheap_objects',\n  
>   'num_tokens': '256',\n'phi_convict_threshold': 5,\n
> 'range_request_timeout_in_ms': 1,\n'read_request_timeout_in_ms': 
> 1,\n'request_timeout_in_ms': 1,\n
> 'truncate_request_timeout_in_ms': 1,\n'write_request_timeout_in_ms': 
> 1}\n- >> end captured logging << 
> -"



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-15686) Improvements in circle CI default config

2020-04-08 Thread Stefan Podkowinski (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17078223#comment-17078223
 ] 

Stefan Podkowinski edited comment on CASSANDRA-15686 at 4/8/20, 12:14 PM:
--

Unit tests should be able to complete on Circle CI with medium instances. The 
unit tests are run automatically, so that's the default. Pretty much all other 
tests won't complete due to limited resources. You shouldn't try running them, 
but if you do, we can't guarantee they will complete unless high resource 
settings are used.


was (Author: spo...@gmail.com):
Unit tests should be able to complete on Circle CI with medium instances. The 
unit tests are run automatically, so that's the default. Pretty much all others 
won't due to limited resources. You shouldn't try running them, but if you do, 
then we can guarantee they may complete unless high resource settings are used.

> Improvements in circle CI default config
> 
>
> Key: CASSANDRA-15686
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15686
> Project: Cassandra
>  Issue Type: Bug
>  Components: Build
>Reporter: Kevin Gallardo
>Priority: Normal
>
> I have been looking at and playing around with the [default CircleCI 
> config|https://github.com/apache/cassandra/blob/trunk/.circleci/config.yml], 
> and have a few comments/questions regarding the following topics:
>  * Python dtests do not run successfully (200-300 failures) on {{medium}} 
> instances, they seem to only run with small flaky failures on {{large}} 
> instances or higher
>  * Python Upgrade tests:
>  ** Do not seem to run without many failures on any instance types / any 
> parallelism setting
>  ** Do not seem to parallelize well, it seems each container is going to 
> download multiple C* versions
>  ** Additionally it seems the configuration is not up to date, as currently 
> we get errors because {{JAVA8_HOME}} is not set
>  * Unit tests do not seem to parallelize optimally: the number of test runners 
> does not reflect the available CPUs on the container. Ideally, if # of runners 
> == # of CPUs, build time is improved on any instance type.
>  ** For instance when using the current configuration, running on medium 
> instances, build will use 1 junit test runner, but 2 CPUs are available. If 
> using 2 runners, the build time is reduced from 19min (at the current main 
> config of parallelism=4) to 12min.
>  * There are some typos in the file, some dtests say "Run Unit Tests" but 
> they are JVM dtests (see 
> [here|https://github.com/apache/cassandra/blob/trunk/.circleci/config.yml#L1077],
>  
> [here|https://github.com/apache/cassandra/blob/trunk/.circleci/config.yml#L1386])
> So some ways to process these would be:
>  * Do the Python dtests run successfully for anyone on {{medium}} instances? 
> If not, would it make sense to bump them to {{large}} so that they can be run 
> successfully?
>  * Does anybody ever run the python upgrade tests on CircleCI and what is the 
> configuration that makes it work?
>  * Would it make sense to either hardcode the number of test runners in the 
> unit tests with `-Dtest.runners` in the config file to reflect the number of 
> CPUs on the instances, or change the build so that it is able to detect the 
> appropriate number of available cores automatically?
> Additionally, it seems this default config file (config.yml) is not as well 
> maintained as the 
> [{{config-2_1.yml}}|https://github.com/apache/cassandra/blob/trunk/.circleci/config-2_1.yml]
>  (+its lowres/highres) version in the same folder (from CASSANDRA-14806). 
> What is the reasoning for maintaining these 2 versions of the build? Could 
> the better maintained version be used as the default? We could generate a 
> lowres version of the new config-2_1.yml, and rename it {{config.yml}} so 
> that it gets picked up by CircleCI automatically instead of the current 
> default.
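
To illustrate the question above about matching test runners to CPUs, here is a minimal Python sketch. It assumes the count is passed through the `-Dtest.runners` property mentioned in the description; the function names and the `ant test` invocation are illustrative assumptions, not what the Cassandra build currently does. Note that inside a container the reported CPU count may reflect the host rather than the container's quota.

```python
import os


def detect_runner_count():
    """Return the number of CPUs usable by this process (at least 1)."""
    try:
        # Linux only: respects CPU affinity masks, though not cgroup quotas.
        return max(1, len(os.sched_getaffinity(0)))
    except AttributeError:
        # Platforms without sched_getaffinity fall back to the total CPU count.
        return max(1, os.cpu_count() or 1)


def build_test_command(runners):
    """Hypothetical ant invocation; only -Dtest.runners comes from the issue."""
    return ["ant", "test", "-Dtest.runners={}".format(runners)]


if __name__ == "__main__":
    print(" ".join(build_test_command(detect_runner_count())))
```

Hardcoding the value in the config file would replace `detect_runner_count()` with a constant per instance type; detecting it keeps the config independent of the instance size.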






[jira] [Commented] (CASSANDRA-15686) Improvements in circle CI default config

2020-04-08 Thread Stefan Podkowinski (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17078223#comment-17078223
 ] 

Stefan Podkowinski commented on CASSANDRA-15686:


Unit tests should be able to complete on Circle CI with medium instances. The 
unit tests are run automatically, so that's the default. Pretty much all others 
won't due to limited resources. You shouldn't try running them, but if you do, 
then we can guarantee they may complete unless high resource settings are used.







[jira] [Updated] (CASSANDRA-14050) Many cqlsh_copy_tests are busted

2020-04-08 Thread Aleksandr Sorokoumov (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksandr Sorokoumov updated CASSANDRA-14050:
-
Reviewers: Aleksandr Sorokoumov







[jira] [Updated] (CASSANDRA-15690) Single partition queries can mistakenly omit partition deletions and resurrect data

2020-04-08 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-15690:

Test and Documentation Plan: New in-jvm dtests added. For trunk, the 
proposed snapshot on mismatch feature has a descriptive comment in 
cassandra.yaml, but we should add full docs for repaired data tracking in 
general.
 Status: Patch Available  (was: Open)

Pushed branches with fixes and additional tests for 3.0, 3.11 and trunk. The 
trunk branch includes a second commit which adds the ability to trigger a 
snapshot if a mismatch is detected between repaired data across replicas either 
at query time (using the tracking capability from CASSANDRA-14145) or during a 
preview repair. The snapshotting is controlled by a yaml setting and can also 
be enabled/disabled via JMX. To avoid filling the disks with snapshots, each 
replica will take at most 1 snapshot per-table per-day via this mechanism 
(manually triggered snapshots are not affected). These snapshots can be very 
useful in debugging temporary divergences between replicas which may be 
resolved by full or read repairs before investigation takes place.
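
The "at most 1 snapshot per-table per-day" rule described above can be sketched as follows. This is a hypothetical Python illustration of the idea, not the code in the patch; the class and method names are invented. It would only be consulted for automatically triggered snapshots, since manually triggered ones are not affected.

```python
import datetime


class SnapshotLimiter:
    """Allows at most one automatic snapshot per table per calendar day."""

    def __init__(self):
        # Maps (keyspace, table) to the date of the last automatic snapshot.
        self._last_snapshot_day = {}

    def try_acquire(self, keyspace, table, today=None):
        """Return True if a snapshot may be taken now, and record it."""
        today = today or datetime.date.today()
        key = (keyspace, table)
        if self._last_snapshot_day.get(key) == today:
            # A snapshot was already taken for this table today.
            return False
        self._last_snapshot_day[key] = today
        return True
```

A replica detecting a mismatch would call `try_acquire` first and skip the snapshot when it returns `False`.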


||branch||utests||in-jvm dtests||dtests_with_vnodes||dtests_no_vnodes||
|[15690-3.0|https://github.com/beobal/cassandra/tree/15690-3.0]|[jdk8|https://circleci.com/gh/beobal/cassandra/1290]|[jdk8|https://circleci.com/gh/beobal/cassandra/1289]|[jdk8|https://circleci.com/gh/beobal/cassandra/1296]|[jdk8|https://circleci.com/gh/beobal/cassandra/1297]|
|[15690-3.11|https://github.com/beobal/cassandra/tree/15690-3.11]|[jdk8|https://circleci.com/gh/beobal/cassandra/1311]|[jdk8|https://circleci.com/gh/beobal/cassandra/1312]|[jdk8|https://circleci.com/gh/beobal/cassandra/1314]|[jdk8|https://circleci.com/gh/beobal/cassandra/1313]|
|[15690-trunk|https://github.com/beobal/cassandra/tree/15690-trunk]|[jdk8|https://circleci.com/gh/beobal/cassandra/1303], [jdk11|https://circleci.com/gh/beobal/cassandra/1307]|[jdk8|https://circleci.com/gh/beobal/cassandra/1304], [jdk11|https://circleci.com/gh/beobal/cassandra/1302]|[jdk8|https://circleci.com/gh/beobal/cassandra/1305], [jdk11|https://circleci.com/gh/beobal/cassandra/1309]|[jdk8|https://circleci.com/gh/beobal/cassandra/1306], [jdk11|https://circleci.com/gh/beobal/cassandra/1308]|


> Single partition queries can mistakenly omit partition deletions and 
> resurrect data
> ---
>
> Key: CASSANDRA-15690
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15690
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Coordination
>Reporter: Aleksey Yeschenko
>Assignee: Sam Tunnicliffe
>Priority: Normal
> Fix For: 3.0.x, 3.11.x, 4.0-alpha
>
>
> We have logic that allows us to exclude sstables with partition deletions 
> that are older than the minimum collected timestamp in a local request. 
> However, it’s possible that another node could have rows that aren’t known to 
> the local node that are in turn older than the excluded partition deletion. 
> In such a scenario, those will be mistakenly resurrected, which is a 
> correctness issue.






[jira] [Updated] (CASSANDRA-15647) Mismatching dependencies between cassandra dist and cassandra-all pom

2020-04-08 Thread Michael Semb Wever (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Semb Wever updated CASSANDRA-15647:
---
Reviewers: Michael Semb Wever
   Status: Review In Progress  (was: Patch Available)

> Mismatching dependencies between cassandra dist and cassandra-all pom
> -
>
> Key: CASSANDRA-15647
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15647
> Project: Cassandra
>  Issue Type: Bug
>  Components: Build, Dependencies
>Reporter: Marvin Froeder
>Assignee: Ryan Svihla
>Priority: Normal
> Fix For: 4.0-beta
>
>
> I noticed that the cassandra distribution (tar.gz) dependencies don't match 
> the dependency list for the cassandra-all artifact that is available on Maven Central.
> The Cassandra distribution only includes jna 4.2.2.
> But the Maven dependency also includes jna-platform 4.4.0.
> Breakdown of relevant maven dependencies:
> ```
> [INFO] +- org.apache.cassandra:cassandra-all:jar:4.0-alpha3:provided
> [INFO] |  +- net.java.dev.jna:jna:jar:4.2.2:provided
> [INFO] |  +- net.openhft:chronicle-threads:jar:1.16.0:provided
> [INFO] |  |  \- net.openhft:affinity:jar:3.1.7:provided
> [INFO] |  | \- net.java.dev.jna:jna-platform:jar:4.4.0:provided
> ```
> As you can see, jna is a direct dependency and jna-platform is a transitive 
> dependency from chronicle-threads.
> I expected this issue to have been fixed by 
> https://github.com/apache/cassandra/pull/240/, but this change seems to have 
> been reverted, as it is no longer in trunk.
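
The comparison described above can be sketched in a few lines of Python, assuming the `mvn dependency:tree` output is available as text and the jar names shipped in the distribution are known. The helper names are hypothetical, and the distribution listing is reduced to the single JNA jar the report mentions.

```python
import re

# Matches coordinates such as net.java.dev.jna:jna:jar:4.2.2:provided
COORDINATE = re.compile(r"([\w.\-]+):([\w.\-]+):jar:([\w.\-]+):\w+")


def jars_from_tree(tree_text, group):
    """Return the jar file names for one group id in a dependency tree."""
    return {
        "{}-{}.jar".format(artifact, version)
        for found_group, artifact, version in COORDINATE.findall(tree_text)
        if found_group == group
    }


def missing_from_dist(tree_text, dist_jars, group):
    """Jars of the given group that Maven resolves but the dist lacks."""
    return sorted(jars_from_tree(tree_text, group) - set(dist_jars))


TREE = r"""
[INFO] +- org.apache.cassandra:cassandra-all:jar:4.0-alpha3:provided
[INFO] |  +- net.java.dev.jna:jna:jar:4.2.2:provided
[INFO] |  +- net.openhft:chronicle-threads:jar:1.16.0:provided
[INFO] |  |  \- net.openhft:affinity:jar:3.1.7:provided
[INFO] |  | \- net.java.dev.jna:jna-platform:jar:4.4.0:provided
"""

# Per the report, the distribution only ships jna 4.2.2 from this group.
DIST_JARS = {"jna-4.2.2.jar"}

print(missing_from_dist(TREE, DIST_JARS, "net.java.dev.jna"))
```

With the tree quoted in the report, the only JNA jar missing from the distribution is `jna-platform-4.4.0.jar`, the transitive dependency pulled in through chronicle-threads.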






[jira] [Comment Edited] (CASSANDRA-15647) Mismatching dependencies between cassandra dist and cassandra-all pom

2020-04-08 Thread Michael Semb Wever (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17078076#comment-17078076
 ] 

Michael Semb Wever edited comment on CASSANDRA-15647 at 4/8/20, 11:10 AM:
--

||branch||circleci||jenkins||
|[trunk_15647|https://github.com/apache/cassandra/compare/trunk...thelastpickle:mck/trunk_15647]|[circleci|https://circleci.com/gh/thelastpickle/workflows/cassandra/tree/mck%2Ftrunk_15647]|[!https://ci-cassandra.apache.org/job/Cassandra-devbranch/37/badge/icon!|https://ci-cassandra.apache.org/blue/organizations/jenkins/Cassandra-devbranch/detail/Cassandra-devbranch/37]|


was (Author: michaelsembwever):
||branch||circleci||jenkins||
|[trunk_15647|https://github.com/apache/cassandra/compare/trunk...thelastpickle:mck/trunk_15647]|[circleci|https://circleci.com/gh/thelastpickle/workflows/cassandra/tree/mck%2Fmck/trunk_15647]|[!https://ci-cassandra.apache.org/job/Cassandra-devbranch/37/badge/icon!|https://ci-cassandra.apache.org/blue/organizations/jenkins/Cassandra-devbranch/detail/Cassandra-devbranch/37]|







[jira] [Commented] (CASSANDRA-15647) Mismatching dependencies between cassandra dist and cassandra-all pom

2020-04-08 Thread Michael Semb Wever (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17078076#comment-17078076
 ] 

Michael Semb Wever commented on CASSANDRA-15647:


||branch||circleci||jenkins||
|[trunk_15647|https://github.com/apache/cassandra/compare/trunk...thelastpickle:mck/trunk_15647]|[circleci|https://circleci.com/gh/thelastpickle/workflows/cassandra/tree/mck%2Fmck/trunk_15647]|[!https://ci-cassandra.apache.org/job/Cassandra-devbranch/37/badge/icon!|https://ci-cassandra.apache.org/blue/organizations/jenkins/Cassandra-devbranch/detail/Cassandra-devbranch/37]|







[jira] [Commented] (CASSANDRA-15707) Fix cqlsh output test

2020-04-08 Thread Michael Semb Wever (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17078066#comment-17078066
 ] 

Michael Semb Wever commented on CASSANDRA-15707:


||branch||circleci||jenkins||
|[trunk_15707|https://github.com/apache/cassandra/compare/trunk...nastra:CASSANDRA-15707]|[circleci|https://circleci.com/gh/nastra/workflows/cassandra/tree/CASSANDRA-15707]|[!https://ci-cassandra.apache.org/job/Cassandra-devbranch/36/badge/icon!|https://ci-cassandra.apache.org/blue/organizations/jenkins/Cassandra-devbranch/detail/Cassandra-devbranch/36]|

> Fix cqlsh output test
> -
>
> Key: CASSANDRA-15707
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15707
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/unit
>Reporter: Eduard Tudenhoefner
>Assignee: Eduard Tudenhoefner
>Priority: Normal
>  Labels: pull-request-available
> Fix For: 4.0-alpha
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> https://ci-cassandra.apache.org/view/branches/job/Cassandra-trunk/45/testReport/cqlshlib.test.test_cqlsh_output/TestCqlshOutput/
> {code}
> Sequences differ: ['CRE[438 chars]nt', ") WITH additional_write_policy = 
> '99p'",[710 chars], ''] != ['CRE[438 chars]nt', ') WITH 
> bloom_filter_fp_chance = 0.01', "[711 chars], '']
> First differing element 17:
> ") WITH additional_write_policy = '99p'"
> ') WITH bloom_filter_fp_chance = 0.01'
> Diff is 1475 characters long. Set self.maxDiff to None to see it.
> """Fail immediately, with the given message."""
> >>  raise self.failureException('Sequences differ: [\'CRE[438 chars]nt\', ") 
> >> WITH additional_write_policy = \'99p\'",[710 chars], \'\'] != [\'CRE[438 
> >> chars]nt\', \') WITH bloom_filter_fp_chance = 0.01\', "[711 chars], 
> >> \'\']\n\nFirst differing element 17:\n") WITH additional_write_policy = 
> >> \'99p\'"\n\') WITH bloom_filter_fp_chance = 0.01\'\n\nDiff is 1475 
> >> characters long. Set self.maxDiff to None to see it.')
> {code}






[jira] [Updated] (CASSANDRA-15707) Fix cqlsh output test

2020-04-08 Thread Michael Semb Wever (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Semb Wever updated CASSANDRA-15707:
---
Reviewers: Michael Semb Wever
   Status: Review In Progress  (was: Patch Available)







[jira] [Commented] (CASSANDRA-15573) Python 3.8 fails to execute cqlsh

2020-04-08 Thread Michael Semb Wever (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17078051#comment-17078051
 ] 

Michael Semb Wever commented on CASSANDRA-15573:


ASF Jenkins CI 
[results|https://ci-cassandra.apache.org/blue/organizations/jenkins/Cassandra-devbranch/detail/Cassandra-devbranch/34/pipeline]

> Python 3.8 fails to execute cqlsh
> -
>
> Key: CASSANDRA-15573
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15573
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: Tool/cqlsh
>Reporter: Yuki Morishita
>Assignee: Eduard Tudenhoefner
>Priority: Normal
>  Labels: pull-request-available
> Fix For: 4.0-alpha
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Python 3.8 renamed sre_parse.Pattern to sre_parse.State (see 
> [https://bugs.python.org/issue34681] and corresponding pull request 
> [https://github.com/python/cpython/pull/9310])
> So when executing cqlsh with Python 3.8, it throws error:
> {code:java}
> Traceback (most recent call last):
>   File ".\bin\cqlsh.py", line 175, in <module>
>     from cqlshlib import cql3handling, cqlhandling, pylexotron, sslhandling, cqlshhandling
>   File "C:\Users\Yuki Morishita\Projects\cassandra\bin\..\pylib\cqlshlib\cql3handling.py", line 19, in <module>
>     from cqlshlib.cqlhandling import CqlParsingRuleSet, Hint
>   File "C:\Users\Yuki Morishita\Projects\cassandra\bin\..\pylib\cqlshlib\cqlhandling.py", line 23, in <module>
>     from cqlshlib import pylexotron, util
>   File "C:\Users\Yuki Morishita\Projects\cassandra\bin\..\pylib\cqlshlib\pylexotron.py", line 342, in <module>
>     class ParsingRuleSet:
>   File "C:\Users\Yuki Morishita\Projects\cassandra\bin\..\pylib\cqlshlib\pylexotron.py", line 343, in ParsingRuleSet
>     RuleSpecScanner = SaferScanner([
>   File "C:\Users\Yuki Morishita\Projects\cassandra\bin\..\pylib\cqlshlib\saferscanner.py", line 74, in __init__
>     s = re.sre_parse.Pattern()
> AttributeError: module 'sre_parse' has no attribute 'Pattern'
> {code}
> h2. Summary of Work that was done
> Added a Python 3.8 compatible SaferScanner implementation ([diff 
> here|https://github.com/apache/cassandra/pull/518/commits/2e6813f0ef5817e5d8d655052d61ce75a5fc062c]).
>  Note that the changes from CASSANDRA-15659 are required in order to verify 
> that the issue is fixed.
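
The rename described above can be handled with a small compatibility lookup. This is a hedged sketch of one possible approach, not the SaferScanner change from the linked pull request; `new_parser_state` is a hypothetical helper name, and the parser module is a private CPython internal whose layout may change again.

```python
import re

try:
    # Python 3.11+ moved the parser module into the re package.
    from re import _parser as sre_parse
except ImportError:
    # Older versions expose it as a top-level module.
    import sre_parse


def new_parser_state():
    """Create the regex parser state object under either of its names."""
    # Python 3.8 renamed sre_parse.Pattern to sre_parse.State.
    state_class = getattr(sre_parse, "State", None) or sre_parse.Pattern
    return state_class()


state = new_parser_state()
state.flags = re.IGNORECASE
```

Looking the class up by name avoids branching on `sys.version_info`, so the same code path serves interpreters before and after the rename.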






[jira] [Comment Edited] (CASSANDRA-15659) Better support of Python 3 for cqlsh

2020-04-08 Thread Michael Semb Wever (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17077243#comment-17077243
 ] 

Michael Semb Wever edited comment on CASSANDRA-15659 at 4/8/20, 10:50 AM:
--

ASF Jenkins CI 
[results|https://ci-cassandra.apache.org/blue/organizations/jenkins/Cassandra-devbranch/detail/Cassandra-devbranch/32/pipeline]


was (Author: michaelsembwever):
ASF Jenkins CI 
[results|https://ci-cassandra.apache.org/blue/organizations/jenkins/Cassandra-devbranch/detail/Cassandra-devbranch/27/pipeline]

> Better support of Python 3 for cqlsh
> 
>
> Key: CASSANDRA-15659
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15659
> Project: Cassandra
>  Issue Type: Task
>  Components: Tool/cqlsh
>Reporter: Stefan Miklosovic
>Assignee: Eduard Tudenhoefner
>Priority: Normal
>  Labels: pull-request-available
> Fix For: 4.0-alpha
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> h2. From mailing list:
> [https://lists.apache.org/thread.html/r377099b632c62b641e4feef5b738084fc5369b0c7157fae867853597%40%3Cdev.cassandra.apache.org%3E]
>  
> As of today (24/3/2020), current trunk supports Python 3.6 (1), but Debian, 
> for example, does not ship any 3.6 version out of the box: Buster has 
> Python 3.7, and other (recent) releases have version 2.7. This means that 
> anyone who wants to use Python 3 on Debian has to use 3.6, which is not in the 
> repository and so must be downloaded, compiled, and installed by hand.
> A sane Python 3 version that is also present in the Debian repository should 
> be supported (or the requirement to run with 3.6 should be relaxed).
> (1) 
> [https://github.com/apache/cassandra/blob/bf9a1d487b9ba469e8d740cf7d1cd419535a7e79/bin/cqlsh#L57-L65]
> h2. Summary of work that was done:
> I relaxed the requirement of *cqlsh* only working with Python 2.7 & 3.6 by 
> allowing Python 3.6+.
>  Note that I left the constraint for Python 3.6 being the minimum Python3 
> version. 
>  As [~ptbannister] pointed out, we could remove the Python 3.6 min version 
> once we remove Python 2.7 support, as otherwise testing with lots of 
> different Python versions will get costly.
> 2 Dockerfiles were added in *pylib* for minimal local testing of *cqlsh* 
> starting up with Python 3.7 & 3.8, and that testing revealed
>  CASSANDRA-15572 and CASSANDRA-15573. 
>  CASSANDRA-15572 was fixed here as it was a one-liner, and I'm going to 
> tackle CASSANDRA-15573 later.
> Python 3.8 testing was added to the CircleCI config so that we can actually 
> see what else breaks with newer Python versions.
> A new Docker image with Ubuntu 19.10 was required for testing 
> ([https://github.com/apache/cassandra-builds/pull/17]). This docker image 
> sets up Python 2.7/3.6/3.7/3.8 with their respective virtual environments, 
> which are then being used by the CircleCI yaml.
> The image *spod/cassandra-testing-ubuntu1810-java11-w-dependencies:20190306* 
> couldn't be updated unfortunately because it can't be built anymore, due to 
> Ubuntu 18.10 being EOL.
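
The relaxed requirement described above can be pictured as a simple version gate. What follows is a minimal Python sketch under stated assumptions, not the actual check in bin/cqlsh that link (1) points at: the helper name `is_supported_python` is hypothetical, and the rule is assumed to be exactly "Python 2.7, or any Python 3 release from 3.6 upwards".

```python
import sys


def is_supported_python(version_info=sys.version_info):
    """Return True for Python 2.7 or Python 3.6+ (hypothetical helper)."""
    major, minor = version_info[0], version_info[1]
    if major == 2:
        # Only 2.7 is accepted on the Python 2 line.
        return minor == 7
    # 3.6 stays the minimum Python 3 version; anything newer is accepted.
    return major == 3 and minor >= 6
```

Called without arguments it inspects the running interpreter; passing a tuple such as `(3, 8, 2)` makes the rule easy to check against the versions named in the ticket.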






[jira] [Commented] (CASSANDRA-15573) Python 3.8 fails to execute cqlsh

2020-04-08 Thread Michael Semb Wever (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17078049#comment-17078049
 ] 

Michael Semb Wever commented on CASSANDRA-15573:


bq.  I think once CASSANDRA-15659 and CASSANDRA-15573 are in, we could tackle 
how and which versions of Python to test.

And could that please be done with Jenkins in mind as well :) Delegating as 
much as possible out of our circleci and jenkins configurations, and into 
re-usable test scripts would be a real win.

> Python 3.8 fails to execute cqlsh
> -
>
> Key: CASSANDRA-15573
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15573
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: Tool/cqlsh
>Reporter: Yuki Morishita
>Assignee: Eduard Tudenhoefner
>Priority: Normal
>  Labels: pull-request-available
> Fix For: 4.0-alpha
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Python 3.8 renamed sre_parse.Pattern to sre_parse.State (see 
> [https://bugs.python.org/issue34681] and corresponding pull request 
> [https://github.com/python/cpython/pull/9310])
> So when executing cqlsh with Python 3.8, it throws an error:
> {code:java}
> Traceback (most recent call last):
>   File ".\bin\cqlsh.py", line 175, in <module>
> from cqlshlib import cql3handling, cqlhandling, pylexotron, sslhandling, 
> cqlshhandling
>   File "C:\Users\Yuki 
> Morishita\Projects\cassandra\bin\..\pylib\cqlshlib\cql3handling.py", line 19, 
> in <module>
> from cqlshlib.cqlhandling import CqlParsingRuleSet, Hint
>   File "C:\Users\Yuki 
> Morishita\Projects\cassandra\bin\..\pylib\cqlshlib\cqlhandling.py", line 23, 
> in <module>
> from cqlshlib import pylexotron, util
>   File "C:\Users\Yuki 
> Morishita\Projects\cassandra\bin\..\pylib\cqlshlib\pylexotron.py", line 342, 
> in <module>
> class ParsingRuleSet:
>   File "C:\Users\Yuki 
> Morishita\Projects\cassandra\bin\..\pylib\cqlshlib\pylexotron.py", line 343, 
> in ParsingRuleSet
> RuleSpecScanner = SaferScanner([
>   File "C:\Users\Yuki 
> Morishita\Projects\cassandra\bin\..\pylib\cqlshlib\saferscanner.py", line 74, 
> in __init__
> s = re.sre_parse.Pattern()
> AttributeError: module 'sre_parse' has no attribute 'Pattern'
> {code}
> h2. Summary of Work that was done
> Added a Python 3.8 compatible SaferScanner implementation ([diff 
> here|https://github.com/apache/cassandra/pull/518/commits/2e6813f0ef5817e5d8d655052d61ce75a5fc062c]).
>  Note that the changes from CASSANDRA-15659 are required in order to verify 
> that the issue is fixed.
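
The rename described above can be handled with a lookup that works on both sides of Python 3.8. This is a minimal sketch and not the code from the linked diff: the helper name `new_parse_state` is hypothetical, and the fallback to the private `re._parser` module is an assumption added so the sketch also runs on Python 3.11+, where the parser moved inside the `re` package.

```python
try:
    # Python 3.11+: the parser lives in the private re._parser module.
    from re import _parser as sre_parse
except ImportError:
    # Older interpreters expose it as the top-level sre_parse module.
    import sre_parse


def new_parse_state():
    """Create the regex parser state object under either of its names."""
    # Python 3.8 renamed sre_parse.Pattern to sre_parse.State.
    state_cls = getattr(sre_parse, "State", None)
    if state_cls is None:
        state_cls = getattr(sre_parse, "Pattern")
    return state_cls()
```

The `getattr` lookup avoids comparing version numbers, so the same code path serves every interpreter that offers either name.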






[jira] [Updated] (CASSANDRA-15707) Fix cqlsh output test

2020-04-08 Thread Eduard Tudenhoefner (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eduard Tudenhoefner updated CASSANDRA-15707:

 Bug Category: Parent values: Correctness(12982)
   Complexity: Low Hanging Fruit
Discovered By: Unit Test
Fix Version/s: 4.0-alpha
 Severity: Low
   Status: Open  (was: Triage Needed)

> Fix cqlsh output test
> -
>
> Key: CASSANDRA-15707
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15707
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/unit
>Reporter: Eduard Tudenhoefner
>Assignee: Eduard Tudenhoefner
>Priority: Normal
>  Labels: pull-request-available
> Fix For: 4.0-alpha
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> https://ci-cassandra.apache.org/view/branches/job/Cassandra-trunk/45/testReport/cqlshlib.test.test_cqlsh_output/TestCqlshOutput/
> {code}
> Sequences differ: ['CRE[438 chars]nt', ") WITH additional_write_policy = 
> '99p'",[710 chars], ''] != ['CRE[438 chars]nt', ') WITH 
> bloom_filter_fp_chance = 0.01', "[711 chars], '']
> First differing element 17:
> ") WITH additional_write_policy = '99p'"
> ') WITH bloom_filter_fp_chance = 0.01'
> Diff is 1475 characters long. Set self.maxDiff to None to see it.
> """Fail immediately, with the given message."""
> >>  raise self.failureException('Sequences differ: [\'CRE[438 chars]nt\', ") 
> >> WITH additional_write_policy = \'99p\'",[710 chars], \'\'] != [\'CRE[438 
> >> chars]nt\', \') WITH bloom_filter_fp_chance = 0.01\', "[711 chars], 
> >> \'\']\n\nFirst differing element 17:\n") WITH additional_write_policy = 
> >> \'99p\'"\n\') WITH bloom_filter_fp_chance = 0.01\'\n\nDiff is 1475 
> >> characters long. Set self.maxDiff to None to see it.')
> {code}






[jira] [Updated] (CASSANDRA-15707) Fix cqlsh output test

2020-04-08 Thread Eduard Tudenhoefner (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eduard Tudenhoefner updated CASSANDRA-15707:

Test and Documentation Plan: nosetests test/test_cqlsh_output.py
 Status: Patch Available  (was: In Progress)

Passes now locally
{code}
$ nosetests test/test_cqlsh_output.py
nose.config: INFO: Ignoring files matching ['^\\.', '^_', '^setup\\.py$']
test_blob_output (cqlshlib.test.test_cqlsh_output.TestCqlshOutput) ... ok
test_boolean_output (cqlshlib.test.test_cqlsh_output.TestCqlshOutput) ... ok
test_cancel_statement (cqlshlib.test.test_cqlsh_output.TestCqlshOutput) ... ok
test_color_output (cqlshlib.test.test_cqlsh_output.TestCqlshOutput) ... ok
test_columnless_key_output (cqlshlib.test.test_cqlsh_output.TestCqlshOutput) 
... ok
test_count_output (cqlshlib.test.test_cqlsh_output.TestCqlshOutput) ... ok
test_describe_cluster_output (cqlshlib.test.test_cqlsh_output.TestCqlshOutput) 
... ok
test_describe_columnfamilies_output 
(cqlshlib.test.test_cqlsh_output.TestCqlshOutput) ... ok
test_describe_columnfamily_output 
(cqlshlib.test.test_cqlsh_output.TestCqlshOutput) ... ok
test_describe_keyspace_output (cqlshlib.test.test_cqlsh_output.TestCqlshOutput) 
... ok
test_describe_schema_output (cqlshlib.test.test_cqlsh_output.TestCqlshOutput) 
... ok
test_empty_cf_output (cqlshlib.test.test_cqlsh_output.TestCqlshOutput) ... ok
test_empty_line (cqlshlib.test.test_cqlsh_output.TestCqlshOutput) ... ok
test_eof_prints_newline (cqlshlib.test.test_cqlsh_output.TestCqlshOutput) ... ok
test_exit_prints_no_newline (cqlshlib.test.test_cqlsh_output.TestCqlshOutput) 
... ok
test_help (cqlshlib.test.test_cqlsh_output.TestCqlshOutput) ... ok
test_help_types (cqlshlib.test.test_cqlsh_output.TestCqlshOutput) ... ok
test_multiline_statements (cqlshlib.test.test_cqlsh_output.TestCqlshOutput) ... 
ok
test_no_color_output (cqlshlib.test.test_cqlsh_output.TestCqlshOutput) ... ok
test_no_prompt_or_colors_output 
(cqlshlib.test.test_cqlsh_output.TestCqlshOutput) ... ok
test_null_output (cqlshlib.test.test_cqlsh_output.TestCqlshOutput) ... ok
test_numeric_output (cqlshlib.test.test_cqlsh_output.TestCqlshOutput) ... ok
test_printing_cql_error (cqlshlib.test.test_cqlsh_output.TestCqlshOutput) ... ok
test_printing_integrity_error (cqlshlib.test.test_cqlsh_output.TestCqlshOutput) 
... ok
test_printing_lex_error (cqlshlib.test.test_cqlsh_output.TestCqlshOutput) ... ok
test_printing_parse_error (cqlshlib.test.test_cqlsh_output.TestCqlshOutput) ... 
ok
test_prompt (cqlshlib.test.test_cqlsh_output.TestCqlshOutput) ... ok
test_show_output (cqlshlib.test.test_cqlsh_output.TestCqlshOutput) ... ok
test_static_cf_output (cqlshlib.test.test_cqlsh_output.TestCqlshOutput) ... ok
test_string_output_ascii (cqlshlib.test.test_cqlsh_output.TestCqlshOutput) ... 
ok
test_string_output_utf8 (cqlshlib.test.test_cqlsh_output.TestCqlshOutput) ... ok
test_timestamp_output (cqlshlib.test.test_cqlsh_output.TestCqlshOutput) ... ok
test_user_types_output (cqlshlib.test.test_cqlsh_output.TestCqlshOutput) ... ok
test_user_types_with_collections 
(cqlshlib.test.test_cqlsh_output.TestCqlshOutput) ... ok
cqlshlib.test.test_cqlsh_output.testrun_cqlsh ... ok
cqlshlib.test.test_cqlsh_output.testcall_cqlsh ... ok

--
XML: /home/nastra/Development/workspace/cassandra/pylib/cqlshlib/nosetests.xml
--
Ran 36 tests in 24.021s

OK
{code}

> Fix cqlsh output test
> -
>
> Key: CASSANDRA-15707
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15707
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/unit
>Reporter: Eduard Tudenhoefner
>Assignee: Eduard Tudenhoefner
>Priority: Normal
>  Labels: pull-request-available
> Fix For: 4.0-alpha
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> https://ci-cassandra.apache.org/view/branches/job/Cassandra-trunk/45/testReport/cqlshlib.test.test_cqlsh_output/TestCqlshOutput/
> {code}
> Sequences differ: ['CRE[438 chars]nt', ") WITH additional_write_policy = 
> '99p'",[710 chars], ''] != ['CRE[438 chars]nt', ') WITH 
> bloom_filter_fp_chance = 0.01', "[711 chars], '']
> First differing element 17:
> ") WITH additional_write_policy = '99p'"
> ') WITH bloom_filter_fp_chance = 0.01'
> Diff is 1475 characters long. Set self.maxDiff to None to see it.
> """Fail immediately, with the given message."""
> >>  raise self.failureException('Sequences differ: [\'CRE[438 chars]nt\', ") 
> >> WITH additional_write_policy = \'99p\'",[710 chars], \'\'] != [\'CRE[438 
> >> chars]nt\', \') WITH bloom_filter_fp_chance = 0.01\', "[711 chars], 
> >> \'\']\n\nFirst differing element 17:\n") WITH additional_write_policy = 
> >> 

[jira] [Updated] (CASSANDRA-15707) Fix cqlsh output test

2020-04-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated CASSANDRA-15707:
---
Labels: pull-request-available  (was: )

> Fix cqlsh output test
> -
>
> Key: CASSANDRA-15707
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15707
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/unit
>Reporter: Eduard Tudenhoefner
>Assignee: Eduard Tudenhoefner
>Priority: Normal
>  Labels: pull-request-available
>
> https://ci-cassandra.apache.org/view/branches/job/Cassandra-trunk/45/testReport/cqlshlib.test.test_cqlsh_output/TestCqlshOutput/
> {code}
> Sequences differ: ['CRE[438 chars]nt', ") WITH additional_write_policy = 
> '99p'",[710 chars], ''] != ['CRE[438 chars]nt', ') WITH 
> bloom_filter_fp_chance = 0.01', "[711 chars], '']
> First differing element 17:
> ") WITH additional_write_policy = '99p'"
> ') WITH bloom_filter_fp_chance = 0.01'
> Diff is 1475 characters long. Set self.maxDiff to None to see it.
> """Fail immediately, with the given message."""
> >>  raise self.failureException('Sequences differ: [\'CRE[438 chars]nt\', ") 
> >> WITH additional_write_policy = \'99p\'",[710 chars], \'\'] != [\'CRE[438 
> >> chars]nt\', \') WITH bloom_filter_fp_chance = 0.01\', "[711 chars], 
> >> \'\']\n\nFirst differing element 17:\n") WITH additional_write_policy = 
> >> \'99p\'"\n\') WITH bloom_filter_fp_chance = 0.01\'\n\nDiff is 1475 
> >> characters long. Set self.maxDiff to None to see it.')
> {code}






[jira] [Commented] (CASSANDRA-15700) Performance regression on internode messaging

2020-04-08 Thread Sergio Bossa (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15700?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17078042#comment-17078042
 ] 

Sergio Bossa commented on CASSANDRA-15700:
--

{quote}This would be a regression from 3.0 unfortunately
{quote}
Oh that's a good point, forgot we do that in 3.0+ too. We can keep the 
{{enqueue()}} pruning, as that's not the worst offender (see collapsed stacks).
{quote}I assume you mean to maintain a guess of the number of items we _on 
average_ have in the queue?
{quote}
No, I meant to compute the "next expire time" as an approximation of the expire 
time of the processed messages, rather than relying on exactly computing it via 
the pruner at every event loop delivery run.
{quote}One of the features running through this work is that we make strong 
guarantees, and this would weaken that.  I would prefer to be able to stipulate 
that (e.g.) we never reject an {{enqueue}} if more than 20% of the queue is 
already expired.
{quote}
How is that related to what we're discussing? Enqueuing new messages is 
regulated by memory limiting, so that's what gives us strong guarantees; also, 
we do already prune the backlog if memory limits are met.
{quote}messing with important control flow semantics is something I would 
prefer to avoid.
{quote}
I'd still like to understand how the pruning approach we're discussing here is 
important to the control flow semantics at all, as I don't think I've got a 
clear answer yet, although it might be me missing the point. What I've 
heard/understood is:

1) It protects us against saturating memory upon network stalls.

2) It protects us against saturating memory upon too many expired messages.

AFAIU, none of those is accurate, as the current implementation doesn't satisfy 
#1, and #2 is covered by the memory limits implementation.
{quote}I think the least risky approach is to change how we compute and select 
the expiration time we use for triggering an expiration. This also has the 
benefit of maintaining well-defined guarantees, is simple, and modifies no 
behaviours besides the selection of this value.
{quote}
This is something that can definitely be tried first, to reduce the amount of 
pruner runs. I will test this next.
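
To make the idea of gating pruner runs on a tracked expiration time concrete, here is a toy Python model. It is only a sketch under stated assumptions: `ExpiringQueue` and its methods are hypothetical names, it is not Cassandra's OutboundMessageQueue, and it is not claimed to match what either commenter has in mind. It assumes each message carries an absolute expiry time and that a full scan is only worthwhile once the earliest tracked expiry has passed.

```python
from collections import deque


class ExpiringQueue:
    """Toy queue that scans for expired items only when one may exist."""

    def __init__(self):
        self._items = deque()          # (expires_at, payload) pairs
        self._earliest_expiry = float("inf")
        self.prune_runs = 0            # number of full scans performed

    def enqueue(self, payload, expires_at):
        self._items.append((expires_at, payload))
        self._earliest_expiry = min(self._earliest_expiry, expires_at)

    def maybe_prune(self, now):
        """Return expired payloads, scanning only if the gate has passed."""
        if now < self._earliest_expiry:
            return []                  # nothing can have expired yet
        self.prune_runs += 1
        expired = [p for (t, p) in self._items if t <= now]
        live = [(t, p) for (t, p) in self._items if t > now]
        self._items = deque(live)
        # Recompute the gate from the surviving items.
        self._earliest_expiry = min((t for t, _ in live), default=float("inf"))
        return expired
```

The trade-off in this toy version is that the gate is recomputed exactly after each scan; an approximate gate would make scans cheaper still, at the cost of occasionally scanning and finding nothing.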
{quote}More specifically (e.g.), you pick a time {{t}}, such that you expect 
the set of messages with expiration less than {{t}}, say {{M(

> Performance regression on internode messaging
> -
>
> Key: CASSANDRA-15700
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15700
> Project: Cassandra
>  Issue Type: Bug
>  Components: Messaging/Internode
>Reporter: Sergio Bossa
>Assignee: Sergio Bossa
>Priority: Normal
> Fix For: 4.0
>
> Attachments: Oss40vsOss311.png, oss40.gc, oss40_nogc.tar.xz, 
> oss40_system.log
>
>
> [~jasonstack] and I have been investigating a performance regression 
> affecting 4.0 during a 3 nodes, RF 3 write throughput test with a 
> timeseries-like workload, as shown in this plot, where blue is 3.11 and orange is 4.0:
> !Oss40vsOss311.png|width=389,height=214!
>  It's been a bit of a long investigation, but two clues ended up standing out:
> 1) An abnormal number of expired messages on 4.0 (as shown in the attached  
> system log), while 3.11 has almost none.
> 2) Abnormal GC activity (as shown in the attached gc log).
> Turns out the two are related, as the [on expired 
> callback|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/net/OutboundConnection.java#L462]
>  creates a huge amount of strings in the {{id()}} call. The next question is 
> what causes all those message expirations; we thoroughly reviewed the 
> internode messaging code and the only issue we could find so far is related 
> to the "batch pruning" calls 
> [here|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/net/OutboundMessageQueue.java#L81]
>  and 
> [here|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/net/OutboundMessageQueue.java#L188]:
>  it _seems_ too much time is spent on those, causing the event loop to fall 
> behind in processing the rest of the messages, which will end up being 
> expired. This is supported by the analysis of the collapsed stacks (after 
> fixing the GC issue):
> {noformat}
> (tprint (top-aggregated-calls oss40nogc "EventLoopDelivery:doRun" 5))
> org/apache/cassandra/net/OutboundConnection$EventLoopDelivery:doRun 3456
> org/apache/cassandra/net/OutboundMessageQueue:access$600 1621
> org/apache/cassandra/net/PrunableArrayQueue:prune 1621
> org/apache/cassandra/net/OutboundMessageQueue$WithLock:close 1621
> org/apache/cassandra/net/OutboundMessageQueue:pruneInternalQueueWithLock 1620
> {noformat}
> Those are the top 5 sampled calls from {{EventLoopDelivery#doRun()}} which 
> spends half of 

[jira] [Assigned] (CASSANDRA-15707) Fix cqlsh output test

2020-04-08 Thread Eduard Tudenhoefner (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eduard Tudenhoefner reassigned CASSANDRA-15707:
---

Assignee: Eduard Tudenhoefner

> Fix cqlsh output test
> -
>
> Key: CASSANDRA-15707
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15707
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/unit
>Reporter: Eduard Tudenhoefner
>Assignee: Eduard Tudenhoefner
>Priority: Normal
>
> https://ci-cassandra.apache.org/view/branches/job/Cassandra-trunk/45/testReport/cqlshlib.test.test_cqlsh_output/TestCqlshOutput/
> {code}
> Sequences differ: ['CRE[438 chars]nt', ") WITH additional_write_policy = 
> '99p'",[710 chars], ''] != ['CRE[438 chars]nt', ') WITH 
> bloom_filter_fp_chance = 0.01', "[711 chars], '']
> First differing element 17:
> ") WITH additional_write_policy = '99p'"
> ') WITH bloom_filter_fp_chance = 0.01'
> Diff is 1475 characters long. Set self.maxDiff to None to see it.
> """Fail immediately, with the given message."""
> >>  raise self.failureException('Sequences differ: [\'CRE[438 chars]nt\', ") 
> >> WITH additional_write_policy = \'99p\'",[710 chars], \'\'] != [\'CRE[438 
> >> chars]nt\', \') WITH bloom_filter_fp_chance = 0.01\', "[711 chars], 
> >> \'\']\n\nFirst differing element 17:\n") WITH additional_write_policy = 
> >> \'99p\'"\n\') WITH bloom_filter_fp_chance = 0.01\'\n\nDiff is 1475 
> >> characters long. Set self.maxDiff to None to see it.')
> {code}






[jira] [Commented] (CASSANDRA-15706) Fix cqlsh completion test

2020-04-08 Thread Eduard Tudenhoefner (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17078039#comment-17078039
 ] 

Eduard Tudenhoefner commented on CASSANDRA-15706:
-

Tests on Jenkins 
[here|https://ci-cassandra.apache.org/blue/organizations/jenkins/Cassandra-devbranch/detail/Cassandra-devbranch/35/pipeline/]

> Fix cqlsh completion test
> -
>
> Key: CASSANDRA-15706
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15706
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Testing
>Reporter: Eduard Tudenhoefner
>Assignee: Eduard Tudenhoefner
>Priority: Normal
>  Labels: pull-request-available
> Fix For: 4.0-alpha
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This has been failing for a while now because *system_views* and 
> *system_virtual_schema* are occurring in the completion.
> {code}
> cqlshlib.test.test_cqlsh_completion.TestCqlshCompletion.test_complete_in_drop_keyspace
>  (from nosetests)
> Failing for the past 1 build (Since Unstable#42 )
> Took 2 sec.
> Error Message
> Items in the second set but not the first:
> 'system_views'
> 'system_virtual_schema'
> """Fail immediately, with the given message."""
> >>  raise self.failureException("Items in the second set but not the 
> >> first:\n'system_views'\n'system_virtual_schema'")
> {code}
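
The "Items in the second set but not the first" message above is a set difference. A minimal Python sketch of that comparison follows; the helper name `unexpected_completions` is hypothetical, and the list of expected keyspaces in the usage note is an illustrative assumption rather than the list used by the real test.

```python
def unexpected_completions(expected, offered):
    """Return completions that were offered but not expected, sorted."""
    return sorted(set(offered) - set(expected))
```

For example, with an assumed expectation of `["system", "system_auth", "system_schema"]` and an offered list that additionally contains `system_views` and `system_virtual_schema`, the helper returns exactly those two names, which is what the failing assertion reports.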






[jira] [Commented] (CASSANDRA-15670) Transient Replication: unable to insert data when the keyspace is configured with the SimpleStrategy

2020-04-08 Thread Francisco Fernandez (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17078036#comment-17078036
 ] 

Francisco Fernandez commented on CASSANDRA-15670:
-

I've opened a PR that moves the logic for calculating the write endpoints for 
transient replication into the concrete ReplicationStrategy classes. That might 
have a small impact on performance, depending on the compiler optimizations. 
Was that the reason for the previous approach? 
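
The general shape of such a change, letting each strategy answer for itself instead of casting to one concrete type, can be sketched as follows. This is an illustration in Python, not the code in the PR: the class bodies, the method name `each_quorum_block_for`, and the quorum arithmetic are assumptions chosen for the example, and the transient '3/1' replica accounting from the ticket is not modeled.

```python
class ReplicationStrategy:
    """Hypothetical base class, not Cassandra's AbstractReplicationStrategy."""

    def each_quorum_block_for(self):
        raise NotImplementedError


class SimpleStrategy(ReplicationStrategy):
    def __init__(self, replication_factor):
        self.replication_factor = replication_factor

    def each_quorum_block_for(self):
        # No datacenter awareness: a single quorum over all replicas.
        return self.replication_factor // 2 + 1


class NetworkTopologyStrategy(ReplicationStrategy):
    def __init__(self, replication_factor_by_dc):
        self.replication_factor_by_dc = dict(replication_factor_by_dc)

    def each_quorum_block_for(self):
        # One quorum per datacenter, summed over all datacenters.
        return sum(rf // 2 + 1 for rf in self.replication_factor_by_dc.values())


def block_for(strategy):
    # The caller never casts; dispatch picks the right implementation.
    return strategy.each_quorum_block_for()
```

Because `block_for` only relies on the shared method, passing a `SimpleStrategy` can no longer fail the way the cast in the traceback below does.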

> Transient Replication: unable to insert data when the keyspace is configured 
> with the SimpleStrategy
> 
>
> Key: CASSANDRA-15670
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15670
> Project: Cassandra
>  Issue Type: Bug
>  Components: Feature/Transient Replication
>Reporter: Alan Boudreault
>Assignee: Francisco Fernandez
>Priority: Normal
>  Labels: pull-request-available
> Fix For: 4.0-rc
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> An error is thrown when trying to insert data with transient replication 
> + SimpleStrategy configured.
> Test case:
> {code:java}
> CREATE KEYSPACE test_tr WITH replication = {'class': 'SimpleStrategy', 
> 'replication_factor': '3/1'};
> CREATE TABLE test_tr.users (id int PRIMARY KEY, username text) with 
> read_repair ='NONE';
> INSERT INTO test_tr.users (id, username) VALUES (1, 'alan');{code}
>  
> traceback:
> {code:java}
> ERROR [Native-Transport-Requests-8] 2020-03-27 10:27:17,188 
> ErrorMessage.java:450 - Unexpected exception during request
> java.lang.ClassCastException: org.apache.cassandra.locator.SimpleStrategy 
> cannot be cast to org.apache.cassandra.locator.NetworkTopologyStrategy
>   at 
> org.apache.cassandra.db.ConsistencyLevel.eachQuorumForRead(ConsistencyLevel.java:103)
>   at 
> org.apache.cassandra.db.ConsistencyLevel.eachQuorumForWrite(ConsistencyLevel.java:112)
>   at 
> org.apache.cassandra.locator.ReplicaPlans$2.select(ReplicaPlans.java:409)
>   at 
> org.apache.cassandra.locator.ReplicaPlans.forWrite(ReplicaPlans.java:353)
>   at 
> org.apache.cassandra.locator.ReplicaPlans.forWrite(ReplicaPlans.java:348)
>   at 
> org.apache.cassandra.locator.ReplicaPlans.forWrite(ReplicaPlans.java:341)
>   at 
> org.apache.cassandra.locator.ReplicaPlans.forWrite(ReplicaPlans.java:330)
>   at 
> org.apache.cassandra.service.StorageProxy.performWrite(StorageProxy.java:1171)
>   at 
> org.apache.cassandra.service.StorageProxy.mutate(StorageProxy.java:713)
>   at 
> org.apache.cassandra.service.StorageProxy.mutateWithTriggers(StorageProxy.java:951)
>   at 
> org.apache.cassandra.cql3.statements.ModificationStatement.executeWithoutCondition(ModificationStatement.java:475)
>   at 
> org.apache.cassandra.cql3.statements.ModificationStatement.execute(ModificationStatement.java:453)
>   at 
> org.apache.cassandra.cql3.QueryProcessor.processStatement(QueryProcessor.java:216)
>   at 
> org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:247)
>   at 
> org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:233)
>   at 
> org.apache.cassandra.transport.messages.QueryMessage.execute(QueryMessage.java:108)
>   at 
> org.apache.cassandra.transport.Message$Request.execute(Message.java:253)
>   at 
> org.apache.cassandra.transport.Message$Dispatcher.processRequest(Message.java:725)
>   at 
> org.apache.cassandra.transport.Message$Dispatcher.lambda$channelRead0$0(Message.java:630)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at 
> org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:165)
>   at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:119)
>   at 
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>   at java.lang.Thread.run(Thread.java:748)
>  {code}
>  
> --> 
> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/ConsistencyLevel.java#L103






[cassandra-builds] branch master updated: Jenkins devbranch-artifacts job to call `cassandra-builds/build-scripts/cassandra-artifacts.sh`

2020-04-08 Thread mck
This is an automated email from the ASF dual-hosted git repository.

mck pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/cassandra-builds.git


The following commit(s) were added to refs/heads/master by this push:
 new eeb3804  Jenkins devbranch-artifacts job to call 
`cassandra-builds/build-scripts/cassandra-artifacts.sh`
eeb3804 is described below

commit eeb3804f734a70112fc7fc66ceef2fd855814aac
Author: mck 
AuthorDate: Wed Apr 8 12:26:00 2020 +0200

Jenkins devbranch-artifacts job to call 
`cassandra-builds/build-scripts/cassandra-artifacts.sh`
---
 jenkins-dsl/cassandra_job_dsl_seed.groovy | 1 +
 1 file changed, 1 insertion(+)

diff --git a/jenkins-dsl/cassandra_job_dsl_seed.groovy 
b/jenkins-dsl/cassandra_job_dsl_seed.groovy
index 7876f05..26430d2 100644
--- a/jenkins-dsl/cassandra_job_dsl_seed.groovy
+++ b/jenkins-dsl/cassandra_job_dsl_seed.groovy
@@ -505,6 +505,7 @@ job('Cassandra-devbranch-artifacts') {
 steps {
 buildDescription('', buildDescStr)
 shell("git clean -xdff ; git clone -b ${buildsBranch} ${buildsRepo}")
+shell('./cassandra-builds/build-scripts/cassandra-artifacts.sh')
 }
 publishers {
 postBuildTask {





[jira] [Updated] (CASSANDRA-15670) Transient Replication: unable to insert data when the keyspace is configured with the SimpleStrategy

2020-04-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated CASSANDRA-15670:
---
Labels: pull-request-available  (was: )

> Transient Replication: unable to insert data when the keyspace is configured 
> with the SimpleStrategy
> 
>
> Key: CASSANDRA-15670
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15670
> Project: Cassandra
>  Issue Type: Bug
>  Components: Feature/Transient Replication
>Reporter: Alan Boudreault
>Assignee: Francisco Fernandez
>Priority: Normal
>  Labels: pull-request-available
> Fix For: 4.0-rc
>
>
> An error is thrown when trying to insert data with transient replication 
> + SimpleStrategy configured.
> Test case:
> {code:java}
> CREATE KEYSPACE test_tr WITH replication = {'class': 'SimpleStrategy', 
> 'replication_factor': '3/1'};
> CREATE TABLE test_tr.users (id int PRIMARY KEY, username text) with 
> read_repair ='NONE';
> INSERT INTO test_tr.users (id, username) VALUES (1, 'alan');{code}
>  
> traceback:
> {code:java}
> ERROR [Native-Transport-Requests-8] 2020-03-27 10:27:17,188 
> ErrorMessage.java:450 - Unexpected exception during request
> java.lang.ClassCastException: org.apache.cassandra.locator.SimpleStrategy 
> cannot be cast to org.apache.cassandra.locator.NetworkTopologyStrategy
>   at 
> org.apache.cassandra.db.ConsistencyLevel.eachQuorumForRead(ConsistencyLevel.java:103)
>   at 
> org.apache.cassandra.db.ConsistencyLevel.eachQuorumForWrite(ConsistencyLevel.java:112)
>   at 
> org.apache.cassandra.locator.ReplicaPlans$2.select(ReplicaPlans.java:409)
>   at 
> org.apache.cassandra.locator.ReplicaPlans.forWrite(ReplicaPlans.java:353)
>   at 
> org.apache.cassandra.locator.ReplicaPlans.forWrite(ReplicaPlans.java:348)
>   at 
> org.apache.cassandra.locator.ReplicaPlans.forWrite(ReplicaPlans.java:341)
>   at 
> org.apache.cassandra.locator.ReplicaPlans.forWrite(ReplicaPlans.java:330)
>   at 
> org.apache.cassandra.service.StorageProxy.performWrite(StorageProxy.java:1171)
>   at 
> org.apache.cassandra.service.StorageProxy.mutate(StorageProxy.java:713)
>   at 
> org.apache.cassandra.service.StorageProxy.mutateWithTriggers(StorageProxy.java:951)
>   at 
> org.apache.cassandra.cql3.statements.ModificationStatement.executeWithoutCondition(ModificationStatement.java:475)
>   at 
> org.apache.cassandra.cql3.statements.ModificationStatement.execute(ModificationStatement.java:453)
>   at 
> org.apache.cassandra.cql3.QueryProcessor.processStatement(QueryProcessor.java:216)
>   at 
> org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:247)
>   at 
> org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:233)
>   at 
> org.apache.cassandra.transport.messages.QueryMessage.execute(QueryMessage.java:108)
>   at 
> org.apache.cassandra.transport.Message$Request.execute(Message.java:253)
>   at 
> org.apache.cassandra.transport.Message$Dispatcher.processRequest(Message.java:725)
>   at 
> org.apache.cassandra.transport.Message$Dispatcher.lambda$channelRead0$0(Message.java:630)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at 
> org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:165)
>   at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:119)
>   at 
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>   at java.lang.Thread.run(Thread.java:748)
>  {code}
>  
> --> 
> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/ConsistencyLevel.java#L103






[jira] [Updated] (CASSANDRA-15706) Fix cqlsh completion test

2020-04-08 Thread Benjamin Lerer (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Lerer updated CASSANDRA-15706:
---
Reviewers: Benjamin Lerer

> Fix cqlsh completion test
> -
>
> Key: CASSANDRA-15706
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15706
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Testing
>Reporter: Eduard Tudenhoefner
>Assignee: Eduard Tudenhoefner
>Priority: Normal
>  Labels: pull-request-available
> Fix For: 4.0-alpha
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This has been failing for a while now because *system_views* and 
> *system_virtual_schema* are occurring in the completion.
> {code}
> cqlshlib.test.test_cqlsh_completion.TestCqlshCompletion.test_complete_in_drop_keyspace
>  (from nosetests)
> Failing for the past 1 build (Since Unstable#42 )
> Took 2 sec.
> Error Message
> Items in the second set but not the first:
> 'system_views'
> 'system_virtual_schema'
> """Fail immediately, with the given message."""
> >>  raise self.failureException("Items in the second set but not the 
> >> first:\n'system_views'\n'system_virtual_schema'")
> {code}






[cassandra-builds] branch master updated (6efcddd -> 33ba1e3)

2020-04-08 Thread mck
This is an automated email from the ASF dual-hosted git repository.

mck pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/cassandra-builds.git.


from 6efcddd  Add Jenkins post-action to prune docker data
 add 33ba1e3  From the dtest build scripts remove the "--no-site-packages" 
flag (which breaks virtualenv >20), and add a tmpdir environment variable 
(pointing to a `tmp` folder in current directory)

No new revisions were added by this update.

Summary of changes:
 build-scripts/cassandra-dtest-pytest.sh   |  4 +++-
 build-scripts/cassandra-dtest.sh  |  4 +++-
 build-scripts/cassandra-test.sh   | 15 ---
 jenkins-dsl/cassandra_job_dsl_seed.groovy | 27 +++
 4 files changed, 33 insertions(+), 17 deletions(-)





[jira] [Created] (CASSANDRA-15707) Fix cqlsh output test

2020-04-08 Thread Eduard Tudenhoefner (Jira)
Eduard Tudenhoefner created CASSANDRA-15707:
---

 Summary: Fix cqlsh output test
 Key: CASSANDRA-15707
 URL: https://issues.apache.org/jira/browse/CASSANDRA-15707
 Project: Cassandra
  Issue Type: Bug
  Components: Test/unit
Reporter: Eduard Tudenhoefner


https://ci-cassandra.apache.org/view/branches/job/Cassandra-trunk/45/testReport/cqlshlib.test.test_cqlsh_output/TestCqlshOutput/

{code}
Sequences differ: ['CRE[438 chars]nt', ") WITH additional_write_policy = 
'99p'",[710 chars], ''] != ['CRE[438 chars]nt', ') WITH bloom_filter_fp_chance 
= 0.01', "[711 chars], '']

First differing element 17:
") WITH additional_write_policy = '99p'"
') WITH bloom_filter_fp_chance = 0.01'

Diff is 1475 characters long. Set self.maxDiff to None to see it.
"""Fail immediately, with the given message."""
>>  raise self.failureException('Sequences differ: [\'CRE[438 chars]nt\', ") 
>> WITH additional_write_policy = \'99p\'",[710 chars], \'\'] != [\'CRE[438 
>> chars]nt\', \') WITH bloom_filter_fp_chance = 0.01\', "[711 chars], 
>> \'\']\n\nFirst differing element 17:\n") WITH additional_write_policy = 
>> \'99p\'"\n\') WITH bloom_filter_fp_chance = 0.01\'\n\nDiff is 1475 
>> characters long. Set self.maxDiff to None to see it.')
{code}
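
The failure above hides most of the diff and suggests setting `self.maxDiff` to `None`. A minimal sketch of that behaviour with plain `unittest` follows; the list contents and the `first_failure_message` helper are hypothetical stand-ins, not the real cqlsh DESCRIBE output or test code.

```python
import unittest


def first_failure_message(max_diff):
    """Run a deliberately failing list comparison and return its failure text."""

    class DescribeOutputTest(unittest.TestCase):
        # None disables truncation; unittest's default limit is 640 characters.
        maxDiff = max_diff

        def runTest(self):
            # Hypothetical data, long enough for the diff to exceed the default limit.
            expected = ["option_%02d = 'some fairly long value %02d'" % (i, i)
                        for i in range(40)]
            actual = list(expected)
            actual[17] = "option_17 = 'a different value'"
            self.assertEqual(actual, expected)

    result = unittest.TestResult()
    DescribeOutputTest().run(result)
    # Each entry in result.failures is a (test, formatted traceback) pair.
    return result.failures[0][1]


if __name__ == "__main__":
    # With maxDiff = None the complete line-by-line diff is part of the message.
    print(first_failure_message(None))
```

Setting `maxDiff = None` on the test class (or on `self` inside a test) only changes how much of the diff is reported; it does not change whether the assertion fails.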






[jira] [Commented] (CASSANDRA-15706) Fix cqlsh completion test

2020-04-08 Thread Eduard Tudenhoefner (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17078006#comment-17078006
 ] 

Eduard Tudenhoefner commented on CASSANDRA-15706:
-

{code}
$ nosetests test/test_cqlsh_completion.py 
nose.config: INFO: Ignoring files matching ['^\\.', '^_', '^setup\\.py$']
test_complete_command_words 
(cqlshlib.test.test_cqlsh_completion.TestCqlshCompletion) ... ok
test_complete_in_alter_columnfamily 
(cqlshlib.test.test_cqlsh_completion.TestCqlshCompletion) ... ok
test_complete_in_batch 
(cqlshlib.test.test_cqlsh_completion.TestCqlshCompletion) ... ok
test_complete_in_create_columnfamily 
(cqlshlib.test.test_cqlsh_completion.TestCqlshCompletion) ... ok
test_complete_in_create_index 
(cqlshlib.test.test_cqlsh_completion.TestCqlshCompletion) ... ok
test_complete_in_create_keyspace 
(cqlshlib.test.test_cqlsh_completion.TestCqlshCompletion) ... ok
test_complete_in_create_table 
(cqlshlib.test.test_cqlsh_completion.TestCqlshCompletion) ... ok
test_complete_in_delete 
(cqlshlib.test.test_cqlsh_completion.TestCqlshCompletion) ... ok
Tests for Cassandra-10733 ... ok
test_complete_in_drop (cqlshlib.test.test_cqlsh_completion.TestCqlshCompletion) 
... ok
test_complete_in_drop_columnfamily 
(cqlshlib.test.test_cqlsh_completion.TestCqlshCompletion) ... ok
test_complete_in_drop_index 
(cqlshlib.test.test_cqlsh_completion.TestCqlshCompletion) ... ok
test_complete_in_drop_keyspace 
(cqlshlib.test.test_cqlsh_completion.TestCqlshCompletion) ... ok
test_complete_in_insert 
(cqlshlib.test.test_cqlsh_completion.TestCqlshCompletion) ... ok
test_complete_in_select 
(cqlshlib.test.test_cqlsh_completion.TestCqlshCompletion) ... ok
test_complete_in_string_literals 
(cqlshlib.test.test_cqlsh_completion.TestCqlshCompletion) ... ok
test_complete_in_truncate 
(cqlshlib.test.test_cqlsh_completion.TestCqlshCompletion) ... ok
test_complete_in_update 
(cqlshlib.test.test_cqlsh_completion.TestCqlshCompletion) ... ok
test_complete_in_use (cqlshlib.test.test_cqlsh_completion.TestCqlshCompletion) 
... ok
test_complete_in_uuid (cqlshlib.test.test_cqlsh_completion.TestCqlshCompletion) 
... ok
test_complete_on_empty_string 
(cqlshlib.test.test_cqlsh_completion.TestCqlshCompletion) ... ok
cqlshlib.test.test_cqlsh_completion.testrun_cqlsh ... ok

--
XML: /home/nastra/Development/workspace/cassandra/pylib/cqlshlib/nosetests.xml
--
Ran 22 tests in 165.902s

OK
{code}
Passes now locally as well

> Fix cqlsh completion test
> -
>
> Key: CASSANDRA-15706
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15706
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Testing
>Reporter: Eduard Tudenhoefner
>Assignee: Eduard Tudenhoefner
>Priority: Normal
>  Labels: pull-request-available
> Fix For: 4.0-alpha
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This has been failing for a while now because *system_views* and 
> *system_virtual_schema* are occurring in the completion.
> {code}
> cqlshlib.test.test_cqlsh_completion.TestCqlshCompletion.test_complete_in_drop_keyspace
>  (from nosetests)
> Failing for the past 1 build (Since Unstable#42 )
> Took 2 sec.
> Error Message
> Items in the second set but not the first:
> 'system_views'
> 'system_virtual_schema'
> """Fail immediately, with the given message."""
> >>  raise self.failureException("Items in the second set but not the 
> >> first:\n'system_views'\n'system_virtual_schema'")
> {code}






[jira] [Updated] (CASSANDRA-15706) Fix cqlsh completion test

2020-04-08 Thread Eduard Tudenhoefner (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eduard Tudenhoefner updated CASSANDRA-15706:

Test and Documentation Plan: Start up cqlsh and try *CREATE TABLE ...* 
completions. No system keyspaces should show up during completions
 Status: Patch Available  (was: In Progress)

> Fix cqlsh completion test
> -
>
> Key: CASSANDRA-15706
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15706
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Testing
>Reporter: Eduard Tudenhoefner
>Assignee: Eduard Tudenhoefner
>Priority: Normal
>  Labels: pull-request-available
> Fix For: 4.0-alpha
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This has been failing for a while now because *system_views* and 
> *system_virtual_schema* are occurring in the completion.
> {code}
> cqlshlib.test.test_cqlsh_completion.TestCqlshCompletion.test_complete_in_drop_keyspace
>  (from nosetests)
> Failing for the past 1 build (Since Unstable#42 )
> Took 2 sec.
> Error Message
> Items in the second set but not the first:
> 'system_views'
> 'system_virtual_schema'
> """Fail immediately, with the given message."""
> >>  raise self.failureException("Items in the second set but not the 
> >> first:\n'system_views'\n'system_virtual_schema'")
> {code}






[jira] [Updated] (CASSANDRA-15706) Fix cqlsh completion test

2020-04-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated CASSANDRA-15706:
---
Labels: pull-request-available  (was: )

> Fix cqlsh completion test
> -
>
> Key: CASSANDRA-15706
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15706
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Testing
>Reporter: Eduard Tudenhoefner
>Assignee: Eduard Tudenhoefner
>Priority: Normal
>  Labels: pull-request-available
> Fix For: 4.0-alpha
>
>
> This has been failing for a while now because *system_views* and 
> *system_virtual_schema* are occurring in the completion.
> {code}
> cqlshlib.test.test_cqlsh_completion.TestCqlshCompletion.test_complete_in_drop_keyspace
>  (from nosetests)
> Failing for the past 1 build (Since Unstable#42 )
> Took 2 sec.
> Error Message
> Items in the second set but not the first:
> 'system_views'
> 'system_virtual_schema'
> """Fail immediately, with the given message."""
> >>  raise self.failureException("Items in the second set but not the 
> >> first:\n'system_views'\n'system_virtual_schema'")
> {code}






[jira] [Updated] (CASSANDRA-15706) Fix cqlsh completion test

2020-04-08 Thread Eduard Tudenhoefner (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eduard Tudenhoefner updated CASSANDRA-15706:

 Bug Category: Parent values: Correctness(12982)
   Complexity: Low Hanging Fruit
Discovered By: Unit Test
Fix Version/s: 4.0-alpha
 Severity: Low
   Status: Open  (was: Triage Needed)

> Fix cqlsh completion test
> -
>
> Key: CASSANDRA-15706
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15706
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Testing
>Reporter: Eduard Tudenhoefner
>Assignee: Eduard Tudenhoefner
>Priority: Normal
> Fix For: 4.0-alpha
>
>
> This has been failing for a while now because *system_views* and 
> *system_virtual_schema* are occurring in the completion.
> {code}
> cqlshlib.test.test_cqlsh_completion.TestCqlshCompletion.test_complete_in_drop_keyspace
>  (from nosetests)
> Failing for the past 1 build (Since Unstable#42 )
> Took 2 sec.
> Error Message
> Items in the second set but not the first:
> 'system_views'
> 'system_virtual_schema'
> """Fail immediately, with the given message."""
> >>  raise self.failureException("Items in the second set but not the 
> >> first:\n'system_views'\n'system_virtual_schema'")
> {code}






[jira] [Created] (CASSANDRA-15706) Fix cqlsh completion test

2020-04-08 Thread Eduard Tudenhoefner (Jira)
Eduard Tudenhoefner created CASSANDRA-15706:
---

 Summary: Fix cqlsh completion test
 Key: CASSANDRA-15706
 URL: https://issues.apache.org/jira/browse/CASSANDRA-15706
 Project: Cassandra
  Issue Type: Bug
  Components: Legacy/Testing
Reporter: Eduard Tudenhoefner
Assignee: Eduard Tudenhoefner


This has been failing for a while now because *system_views* and 
*system_virtual_schema* are occurring in the completion.

{code}
cqlshlib.test.test_cqlsh_completion.TestCqlshCompletion.test_complete_in_drop_keyspace
 (from nosetests)

Failing for the past 1 build (Since Unstable#42 )
Took 2 sec.
Error Message
Items in the second set but not the first:
'system_views'
'system_virtual_schema'
"""Fail immediately, with the given message."""
>>  raise self.failureException("Items in the second set but not the 
>> first:\n'system_views'\n'system_virtual_schema'")
{code}
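
The failure above is a set comparison in which the completion returned two extra keyspaces. A minimal sketch of the idea, assuming a fixed list of system keyspace names; `SYSTEM_KEYSPACES`, `user_keyspaces` and `set_difference_report` are hypothetical names for illustration and are not taken from the cqlsh patch.

```python
# Assumption: keyspace names treated as "system" for completion purposes.
SYSTEM_KEYSPACES = frozenset({
    "system", "system_auth", "system_distributed", "system_schema",
    "system_traces", "system_views", "system_virtual_schema",
})


def user_keyspaces(all_keyspaces):
    """Hypothetical helper: drop system keyspaces from completion candidates."""
    return {ks for ks in all_keyspaces if ks not in SYSTEM_KEYSPACES}


def set_difference_report(expected, actual):
    """Report both halves of a set mismatch, like the failure message does."""
    return {
        "in first but not second": sorted(expected - actual),
        "in second but not first": sorted(actual - expected),
    }


# Hypothetical test fixture: two user keyspaces plus the two virtual keyspaces.
expected = {"ks_one", "ks_two"}
cluster = ["ks_one", "ks_two", "system_views", "system_virtual_schema"]

unfiltered = set_difference_report(expected, set(cluster))
filtered = set_difference_report(expected, user_keyspaces(cluster))
```

Here `unfiltered` reproduces the shape of the reported error (two items only in the second set), while `filtered` shows no difference once the system keyspaces are excluded.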






[jira] [Commented] (CASSANDRA-15676) flaky test testWriteUnknownResult- org.apache.cassandra.distributed.test.CasWriteTest

2020-04-08 Thread Eduard Tudenhoefner (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17077964#comment-17077964
 ] 

Eduard Tudenhoefner commented on CASSANDRA-15676:
-

For completeness, here's the history on Jenkins: 
https://ci-cassandra.apache.org/view/branches/job/Cassandra-trunk/45/testReport/junit/org.apache.cassandra.distributed.test/CasWriteTest/history/

> flaky test testWriteUnknownResult- 
> org.apache.cassandra.distributed.test.CasWriteTest
> -
>
> Key: CASSANDRA-15676
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15676
> Project: Cassandra
>  Issue Type: Task
>  Components: Test/dtest
>Reporter: Kevin Gallardo
>Assignee: Eduard Tudenhoefner
>Priority: Normal
> Fix For: 4.0-beta
>
>
> Failure observed in: 
> https://app.circleci.com/pipelines/github/newkek/cassandra/33/workflows/54007cf7-4424-4ec1-9655-665f6044e6d1/jobs/187/tests
> {noformat}
> testWriteUnknownResult - org.apache.cassandra.distributed.test.CasWriteTest
> junit.framework.AssertionFailedError: Expecting cause to be 
> CasWriteUncertainException
>   at 
> org.apache.cassandra.distributed.test.CasWriteTest.testWriteUnknownResult(CasWriteTest.java:257)
>   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> {noformat}






[jira] [Updated] (CASSANDRA-15660) Unable to specify -e/--execute flag in cqlsh

2020-04-08 Thread ZhaoYang (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ZhaoYang updated CASSANDRA-15660:
-
Reviewers: Brandon Williams, Dinesh Joshi  (was: Brandon Williams)

> Unable to specify -e/--execute flag in cqlsh
> 
>
> Key: CASSANDRA-15660
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15660
> Project: Cassandra
>  Issue Type: Bug
>  Components: Tool/cqlsh
>Reporter: Stefan Miklosovic
>Assignee: ZhaoYang
>Priority: Normal
>  Labels: pull-request-available
> Fix For: 4.0-alpha
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> From mailing list:
> [https://lists.apache.org/thread.html/r377099b632c62b641e4feef5b738084fc5369b0c7157fae867853597%40%3Cdev.cassandra.apache.org%3E]
> The bug looks like this:
> {code:java}
> $ /usr/bin/cqlsh -e 'describe keyspaces' -u cassandra -p cassandra 127.0.0.1
> Usage: cqlsh.py [options] [host [port]]
> cqlsh.py: error: '127.0.0.1' is not a valid port number.
> {code}
> This works fine in 3.x releases but fails on 4.
> The workaround for 4.x code as of today is to put these statements into a file 
> and use the "-f" flag.
>  






[jira] [Commented] (CASSANDRA-15313) Fix flaky - ChecksummingTransformerTest - org.apache.cassandra.transport.frame.checksum.ChecksummingTransformerTest

2020-04-08 Thread Eduard Tudenhoefner (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17077959#comment-17077959
 ] 

Eduard Tudenhoefner commented on CASSANDRA-15313:
-

For completeness, here's the Jenkins history for this test failure: 
https://ci-cassandra.apache.org/view/branches/job/Cassandra-trunk/45/testReport/junit/org.apache.cassandra.transport.frame.checksum/ChecksummingTransformerTest/history/

> Fix flaky - ChecksummingTransformerTest - 
> org.apache.cassandra.transport.frame.checksum.ChecksummingTransformerTest
> ---
>
> Key: CASSANDRA-15313
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15313
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/dtest
>Reporter: Vinay Chella
>Assignee: Brandon Williams
>Priority: Normal
> Fix For: 4.0-beta
>
> Attachments: CASSANDRA-15313-hack.patch
>
>
> During the recent runs, this test appears to be flaky.
> Example failure: 
> [https://circleci.com/gh/vinaykumarchella/cassandra/459#tests/containers/94]
> corruptionCausesFailure-compression - 
> org.apache.cassandra.transport.frame.checksum.ChecksummingTransformerTest
> {code:java}
> java.lang.OutOfMemoryError: GC overhead limit exceeded
>   at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
>   at java.nio.ByteBuffer.allocate(ByteBuffer.java:335)
>   at org.quicktheories.impl.Precursor.<init>(Precursor.java:17)
>   at 
> org.quicktheories.impl.ConcreteDetachedSource.<init>(ConcreteDetachedSource.java:8)
>   at 
> org.quicktheories.impl.ConcreteDetachedSource.detach(ConcreteDetachedSource.java:23)
>   at org.quicktheories.generators.Retry.generate(CodePoints.java:51)
>   at 
> org.quicktheories.generators.Generate.lambda$intArrays$10(Generate.java:190)
>   at 
> org.quicktheories.generators.Generate$$Lambda$17/1847008471.generate(Unknown 
> Source)
>   at org.quicktheories.core.DescribingGenerator.generate(Gen.java:255)
>   at org.quicktheories.core.Gen.lambda$map$0(Gen.java:36)
>   at org.quicktheories.core.Gen$$Lambda$20/71399214.generate(Unknown 
> Source)
>   at org.quicktheories.core.Gen.lambda$map$0(Gen.java:36)
>   at org.quicktheories.core.Gen$$Lambda$20/71399214.generate(Unknown 
> Source)
>   at org.quicktheories.core.Gen.lambda$mix$10(Gen.java:184)
>   at org.quicktheories.core.Gen$$Lambda$45/802243390.generate(Unknown 
> Source)
>   at org.quicktheories.core.Gen.lambda$flatMap$5(Gen.java:93)
>   at org.quicktheories.core.Gen$$Lambda$48/363509958.generate(Unknown 
> Source)
>   at 
> org.quicktheories.dsl.TheoryBuilder4.lambda$prgnToTuple$12(TheoryBuilder4.java:188)
>   at 
> org.quicktheories.dsl.TheoryBuilder4$$Lambda$40/2003496028.generate(Unknown 
> Source)
>   at org.quicktheories.core.DescribingGenerator.generate(Gen.java:255)
>   at org.quicktheories.core.FilteredGenerator.generate(Gen.java:225)
>   at org.quicktheories.core.Gen.lambda$map$0(Gen.java:36)
>   at org.quicktheories.core.Gen$$Lambda$20/71399214.generate(Unknown 
> Source)
>   at org.quicktheories.impl.Core.generate(Core.java:150)
>   at org.quicktheories.impl.Core.shrink(Core.java:103)
>   at org.quicktheories.impl.Core.run(Core.java:39)
>   at org.quicktheories.impl.TheoryRunner.check(TheoryRunner.java:35)
>   at org.quicktheories.dsl.TheoryBuilder4.check(TheoryBuilder4.java:150)
>   at 
> org.quicktheories.dsl.TheoryBuilder4.checkAssert(TheoryBuilder4.java:162)
>   at 
> org.apache.cassandra.transport.frame.checksum.ChecksummingTransformerTest.corruptionCausesFailure(ChecksummingTransformerTest.java:87)
> {code}






[jira] [Comment Edited] (CASSANDRA-15660) Unable to specify -e/--execute flag in cqlsh

2020-04-08 Thread ZhaoYang (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17077957#comment-17077957
 ] 

ZhaoYang edited comment on CASSANDRA-15660 at 4/8/20, 9:07 AM:
---

[~djoshi] thanks for the feedback. Writing shell script for the first time..

Updated the patch to remove "--python" option and its value from users 
arguments {{$@}}. This seems to be a safer approach comparing existing trunk - 
constructing users arguments as string..

Shellcheck passed and "cqlsh_tests/test_cqlsh.py" passed locally. Do you mind 
having another look? 


was (Author: jasonstack):
[~djoshi] thanks the feedback. Writing shell script for the first time..

Updated the patch to remove "--python" option and its value from users 
arguments {{$@}}. This seems to be a safer approach comparing existing trunk - 
constructing users arguments as string..

Shellcheck passed and "cqlsh_tests/test_cqlsh.py" passed locally. Do you mind 
having another look? 

> Unable to specify -e/--execute flag in cqlsh
> 
>
> Key: CASSANDRA-15660
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15660
> Project: Cassandra
>  Issue Type: Bug
>  Components: Tool/cqlsh
>Reporter: Stefan Miklosovic
>Assignee: ZhaoYang
>Priority: Normal
>  Labels: pull-request-available
> Fix For: 4.0-alpha
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> From mailing list:
> [https://lists.apache.org/thread.html/r377099b632c62b641e4feef5b738084fc5369b0c7157fae867853597%40%3Cdev.cassandra.apache.org%3E]
> The bug looks like this:
> {code:java}
> $ /usr/bin/cqlsh -e 'describe keyspaces' -u cassandra -p cassandra 127.0.0.1
> Usage: cqlsh.py [options] [host [port]]
> cqlsh.py: error: '127.0.0.1' is not a valid port number.
> {code}
> This works fine in 3.x releases but fails on 4.
> The workaround for 4.x code as of today is to put these statements into a file 
> and use the "-f" flag.
>  






[jira] [Comment Edited] (CASSANDRA-15660) Unable to specify -e/--execute flag in cqlsh

2020-04-08 Thread ZhaoYang (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17077957#comment-17077957
 ] 

ZhaoYang edited comment on CASSANDRA-15660 at 4/8/20, 9:07 AM:
---

[~djoshi] thanks for the feedback. Writing a shell script for the first time...

Updated the patch to remove the "--python" option and its value from the user's 
arguments {{$@}}. This seems safer than the existing trunk approach of 
constructing the user's arguments as a string.

Shellcheck passed and "cqlsh_tests/test_cqlsh.py" passed locally. Do you mind 
having another look? 


was (Author: jasonstack):
[~djoshi] thanks for the feedback. Writing shell script for the first time..

Updated the patch to remove "--python" option and its value from users 
arguments {{$@}}. This seems to be a safer approach comparing existing trunk - 
constructing users arguments as string..

Shellcheck passed and "cqlsh_tests/test_cqlsh.py" passed locally. Do you mind 
having another look? 

> Unable to specify -e/--execute flag in cqlsh
> 
>
> Key: CASSANDRA-15660
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15660
> Project: Cassandra
>  Issue Type: Bug
>  Components: Tool/cqlsh
>Reporter: Stefan Miklosovic
>Assignee: ZhaoYang
>Priority: Normal
>  Labels: pull-request-available
> Fix For: 4.0-alpha
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> From mailing list:
> [https://lists.apache.org/thread.html/r377099b632c62b641e4feef5b738084fc5369b0c7157fae867853597%40%3Cdev.cassandra.apache.org%3E]
> The bug looks like this:
> {code:java}
> $ /usr/bin/cqlsh -e 'describe keyspaces' -u cassandra -p cassandra 127.0.0.1
> Usage: cqlsh.py [options] [host [port]]
> cqlsh.py: error: '127.0.0.1' is not a valid port number.
> {code}
> This works fine in 3.x releases but fails on 4.
> The workaround for 4.x code as of today is to put these statements into a file 
> and use the "-f" flag.
>  






[jira] [Commented] (CASSANDRA-15660) Unable to specify -e/--execute flag in cqlsh

2020-04-08 Thread ZhaoYang (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17077957#comment-17077957
 ] 

ZhaoYang commented on CASSANDRA-15660:
--

[~djoshi] thanks the feedback. Writing shell script for the first time..

Updated the patch to remove "--python" option and its value from users 
arguments {{$@}}. This seems to be a safer approach comparing existing trunk - 
constructing users arguments as string..

Shellcheck passed and "cqlsh_tests/test_cqlsh.py" passed locally. Do you mind 
having another look? 
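
The difference between filtering an argument list and rebuilding the arguments as a string can be sketched in Python (the patch itself is a shell script; `strip_option` and the sample arguments are hypothetical). The sketch assumes the failure comes from a quoted value such as `'describe keyspaces'` being split into two words once the arguments are flattened into one string, which would be consistent with `127.0.0.1` ending up in the port position.

```python
def strip_option(args, name):
    """Return a copy of args without `name` and the value that follows it."""
    kept = []
    skip_next = False
    for arg in args:
        if skip_next:
            # This is the value belonging to the removed option.
            skip_next = False
            continue
        if arg == name:
            skip_next = True
            continue
        kept.append(arg)
    return kept


# Hypothetical invocation; "--python python3" is what the wrapper consumes itself.
user_args = ["--python", "python3", "-e", "describe keyspaces",
             "-u", "cassandra", "-p", "cassandra", "127.0.0.1"]

# Filtering the list keeps 'describe keyspaces' as a single argument.
as_list = strip_option(user_args, "--python")

# Flattening to a string and splitting again breaks the quoted statement in two,
# so 'keyspaces' becomes a positional argument and pushes the host into the port slot.
rejoined = " ".join(as_list).split()
```

In shell terms the list form corresponds to passing `"$@"` (quoted) through, while the string form corresponds to concatenating the arguments into a single variable and expanding it unquoted.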

> Unable to specify -e/--execute flag in cqlsh
> 
>
> Key: CASSANDRA-15660
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15660
> Project: Cassandra
>  Issue Type: Bug
>  Components: Tool/cqlsh
>Reporter: Stefan Miklosovic
>Assignee: ZhaoYang
>Priority: Normal
>  Labels: pull-request-available
> Fix For: 4.0-alpha
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> From mailing list:
> [https://lists.apache.org/thread.html/r377099b632c62b641e4feef5b738084fc5369b0c7157fae867853597%40%3Cdev.cassandra.apache.org%3E]
> The bug looks like this:
> {code:java}
> $ /usr/bin/cqlsh -e 'describe keyspaces' -u cassandra -p cassandra 127.0.0.1
> Usage: cqlsh.py [options] [host [port]]
> cqlsh.py: error: '127.0.0.1' is not a valid port number.
> {code}
> This works fine in 3.x releases but fails on 4.
> The workaround for 4.x code as of today is to put these statements into a file 
> and use the "-f" flag.
>  






[jira] [Commented] (CASSANDRA-15551) Fix flaky tests org.apache.cassandra.service.MoveTest testStateJumpToNormal and testMoveWithPendingRangesNetworkStrategyRackAwareThirtyNodes

2020-04-08 Thread Eduard Tudenhoefner (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17077953#comment-17077953
 ] 

Eduard Tudenhoefner commented on CASSANDRA-15551:
-

According to 
https://ci-cassandra.apache.org/view/branches/job/Cassandra-trunk/45/testReport/junit/org.apache.cassandra.service/MoveTest/history/
 this hasn't been failing in a while. Can we close this ticket?

> Fix flaky tests org.apache.cassandra.service.MoveTest testStateJumpToNormal 
> and testMoveWithPendingRangesNetworkStrategyRackAwareThirtyNodes
> 
>
> Key: CASSANDRA-15551
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15551
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/unit
>Reporter: David Capwell
>Assignee: Gianluca Righetto
>Priority: Normal
> Fix For: 4.0-alpha
>
>
> testStateJumpToNormal failure was on java 11
> {code}
> java.lang.NullPointerException
>   at org.apache.cassandra.gms.Gossiper.getHostId(Gossiper.java:1028)
>   at org.apache.cassandra.gms.Gossiper.getHostId(Gossiper.java:1023)
>   at 
> org.apache.cassandra.service.StorageService.handleStateNormal(StorageService.java:2513)
>   at 
> org.apache.cassandra.service.StorageService.onChange(StorageService.java:2055)
>   at org.apache.cassandra.Util.createInitialRing(Util.java:225)
>   at 
> org.apache.cassandra.service.MoveTest.testStateJumpToNormal(MoveTest.java:935)
>   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> {code}
> testMoveWithPendingRangesNetworkStrategyRackAwareThirtyNodes failure was on 
> java 8
> {code}
> java.lang.NullPointerException
>   at 
> org.apache.cassandra.service.StorageService.updatePeerInfo(StorageService.java:2174)
>   at 
> org.apache.cassandra.service.StorageService.handleStateNormal(StorageService.java:2511)
>   at 
> org.apache.cassandra.service.StorageService.onChange(StorageService.java:2055)
>   at org.apache.cassandra.Util.createInitialRing(Util.java:225)
>   at 
> org.apache.cassandra.service.MoveTest.testMoveWithPendingRangesNetworkStrategyRackAwareThirtyNodes(MoveTest.java:199)
> {code}






[jira] [Updated] (CASSANDRA-12701) Repair history tables should have TTL and TWCS

2020-04-08 Thread Marcus Eriksson (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-12701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcus Eriksson updated CASSANDRA-12701:

  Fix Version/s: 4.0-alpha
Source Control Link: 
https://github.com/apache/cassandra/commit/25aa10c9c12c294ff5998d481a0618193c66b5c2
 Resolution: Fixed
 Status: Resolved  (was: Ready to Commit)

committed, thanks!

one dtest failure: test_simple_strategy_each_quorum_counters - 
consistency_test.TestAccuracy - but that passes locally and looks unrelated

in-jvm upgrade dtests look broken, but that shouldn't be related to this - will 
look into those next

> Repair history tables should have TTL and TWCS
> --
>
> Key: CASSANDRA-12701
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12701
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Legacy/Core
>Reporter: Chris Lohfink
>Assignee: Marcus Eriksson
>Priority: Normal
>  Labels: lhf
> Fix For: 4.0-alpha
>
> Attachments: CASSANDRA-12701.txt
>
>
> Some tools schedule a lot of small subrange repairs which can lead to a lot 
> of repairs constantly being run. These partitions can grow pretty big in 
> theory. I don't think much reads from them, which might help, but it's still 
> kinda wasted disk space. I think a month TTL (longer than gc grace) and maybe 
> a 1 day TWCS window makes sense to me.
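
As a rough worked example of the proposal above: the figures below take "a month" as 30 days and use the 10-day gc grace mentioned in the schema history of `SystemDistributedKeyspace`. Both are assumptions for illustration, not the values set by the committed patch.

```python
from datetime import timedelta

TTL = timedelta(days=30)         # assumption: "a month" taken as 30 days
GC_GRACE = timedelta(days=10)    # gc_grace_seconds raised to 10 days in CASSANDRA-12954
TWCS_WINDOW = timedelta(days=1)  # proposed time window size

ttl_seconds = int(TTL.total_seconds())
gc_grace_seconds = int(GC_GRACE.total_seconds())

# Number of one-day windows that can still hold unexpired rows at any time.
live_windows = TTL // TWCS_WINDOW

# The proposal asks for the TTL to be longer than gc grace.
ttl_exceeds_gc_grace = TTL > GC_GRACE
```

With these assumed values the TTL is 2,592,000 seconds against 864,000 seconds of gc grace, and roughly 30 daily windows hold live repair history before whole windows expire.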



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[cassandra] branch trunk updated: Repair history tables should have TTL and TWCS

2020-04-08 Thread marcuse
This is an automated email from the ASF dual-hosted git repository.

marcuse pushed a commit to branch trunk
in repository https://gitbox.apache.org/repos/asf/cassandra.git


The following commit(s) were added to refs/heads/trunk by this push:
 new 25aa10c  Repair history tables should have TTL and TWCS
25aa10c is described below

commit 25aa10c9c12c294ff5998d481a0618193c66b5c2
Author: Marcus Eriksson 
AuthorDate: Thu Apr 2 14:17:40 2020 +0200

Repair history tables should have TTL and TWCS

Patch by marcuse; reviewed by Jon Meredith for CASSANDRA-12701
---
 CHANGES.txt|  1 +
 .../repair/SystemDistributedKeyspace.java  | 25 --
 .../apache/cassandra/schema/CompactionParams.java  |  6 ++
 3 files changed, 25 insertions(+), 7 deletions(-)

diff --git a/CHANGES.txt b/CHANGES.txt
index ab71308..acac895 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -1,4 +1,5 @@
 4.0-alpha4
+ * Repair history tables should have TTL and TWCS (CASSANDRA-12701)
  * Fix cqlsh erroring out on Python 3.7 due to webbrowser module being absent (CASSANDRA-15572)
  * Fix IMH#acquireCapacity() to return correct Outcome when endpoint reserve runs out (CASSANDRA-15607)
  * Fix nodetool describering output (CASSANDRA-15682)
diff --git a/src/java/org/apache/cassandra/repair/SystemDistributedKeyspace.java b/src/java/org/apache/cassandra/repair/SystemDistributedKeyspace.java
index a28a637..2e3b981 100644
--- a/src/java/org/apache/cassandra/repair/SystemDistributedKeyspace.java
+++ b/src/java/org/apache/cassandra/repair/SystemDistributedKeyspace.java
@@ -28,9 +28,11 @@ import java.util.List;
 import java.util.Map;
 import java.util.Set;
 import java.util.UUID;
+import java.util.concurrent.TimeUnit;
 
 import com.google.common.base.Joiner;
 import com.google.common.collect.Lists;
+import com.google.common.collect.ImmutableMap;
 import com.google.common.collect.Sets;
 
 import org.slf4j.Logger;
@@ -47,6 +49,7 @@ import org.apache.cassandra.dht.Token;
 import org.apache.cassandra.gms.Gossiper;
 import org.apache.cassandra.locator.InetAddressAndPort;
 import org.apache.cassandra.repair.messages.RepairOption;
+import org.apache.cassandra.schema.CompactionParams;
 import org.apache.cassandra.schema.KeyspaceMetadata;
 import org.apache.cassandra.schema.KeyspaceParams;
 import org.apache.cassandra.schema.SchemaConstants;
@@ -77,8 +80,9 @@ public final class SystemDistributedKeyspace
  * gen 2: (pre-)add coordinator_port and participants_v2 columns to repair_history in 3.0, 3.11, 4.0
  * gen 3: gc_grace_seconds raised from 0 to 10 days in CASSANDRA-12954 in 3.11.0
  * gen 4: compression chunk length reduced to 16KiB, memtable_flush_period_in_ms now unset on all tables in 4.0
+ * gen 5: add ttl and TWCS to repair_history tables
  */
-public static final long GENERATION = 4;
+public static final long GENERATION = 5;
 
 public static final String REPAIR_HISTORY = "repair_history";
 
@@ -105,7 +109,11 @@ public final class SystemDistributedKeyspace
  + "status text,"
  + "started_at timestamp,"
  + "finished_at timestamp,"
- + "PRIMARY KEY ((keyspace_name, columnfamily_name), id))");
+ + "PRIMARY KEY ((keyspace_name, columnfamily_name), id))")
+.defaultTimeToLive((int) TimeUnit.DAYS.toSeconds(30))
+.compaction(CompactionParams.twcs(ImmutableMap.of("compaction_window_unit","DAYS",
+  "compaction_window_size","1")))
+.build();
 
 private static final TableMetadata ParentRepairHistory =
 parse(PARENT_REPAIR_HISTORY,
@@ -121,7 +129,11 @@ public final class SystemDistributedKeyspace
  + "requested_ranges set<text>,"
  + "successful_ranges set<text>,"
  + "options map<text, text>,"
- + "PRIMARY KEY (parent_id))");
+ + "PRIMARY KEY (parent_id))")
+.defaultTimeToLive((int) TimeUnit.DAYS.toSeconds(30))
+.compaction(CompactionParams.twcs(ImmutableMap.of("compaction_window_unit","DAYS",
+  "compaction_window_size","1")))
+.build();
 
 private static final TableMetadata ViewBuildStatus =
 parse(VIEW_BUILD_STATUS,
@@ -131,14 +143,13 @@ public final class SystemDistributedKeyspace
  + "view_name text,"
  + "host_id uuid,"
  + "status text,"
- + "PRIMARY KEY ((keyspace_name, view_name), host_id))");
+ + "PRIMARY KEY ((keyspace_name, view_name), host_id))").build();
 
-private static TableMetadata parse(String table, String description, String cql)
+private static TableMetadata.Builder parse(String table, String description, String cql)
 {
 return