[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16151196#comment-16151196 ] mck edited comment on CASSANDRA-13418 at 9/1/17 9:39 PM: - {quote}P.s: https://github.com/thelastpickle/cassandra/commit/58440e707cd6490847a37dc8d76c150d3eb27aab#diff-e8e282423dcbf34d30a3578c8dec15cdR176 still think is less clear to inline it.{quote} I agree, but found no clear method name to use. As Marcus' comments, {{getFullyExpiredSSTables(..)}} isn't appropriate. Any suggestions for a clear name? Otherwise the method is at 70 lines length, not great but no disaster, so i'm ok either way. was (Author: michaelsembwever): {quote}P.s: https://github.com/thelastpickle/cassandra/commit/58440e707cd6490847a37dc8d76c150d3eb27aab#diff-e8e282423dcbf34d30a3578c8dec15cdR176 still think is less clear to inline it.{quote} I agree, but found not clear method name to use. As Marcus' comments, {{getFullyExpiredSSTables(..)}} isn't appropriate. Any suggestions for a clear name? Otherwise the method is at 70 lines length, not great but no disaster, so i'm ok either way. > Allow TWCS to ignore overlaps when dropping fully expired sstables > -- > > Key: CASSANDRA-13418 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13418 > Project: Cassandra > Issue Type: Improvement > Components: Compaction >Reporter: Corentin Chary >Assignee: Romain GERARD > Labels: twcs > Fix For: 3.11.x, 4.x > > Attachments: twcs-cleanup.png > > > http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If > you really want read-repairs you're going to have sstables blocking the > expiration of other fully expired SSTables because they overlap. > You can set unchecked_tombstone_compaction = true or tombstone_threshold to a > very low value and that will purge the blockers of old data that should > already have expired, thus removing the overlaps and allowing the other > SSTables to expire. 
> The thing is that this is rather CPU intensive and not optimal. If you have > time series, you might not care if all your data doesn't exactly expire at > the right time, or if data re-appears for some time, as long as it gets > deleted as soon as it can. And in this situation I believe it would be really > beneficial to allow users to simply ignore overlapping SSTables when looking > for fully expired ones. > To the question: why would you need read-repairs? > - Full repairs basically take longer than the TTL of the data on my dataset, > so this isn't really effective. > - Even with a 10% chance of doing a repair, we found out that this would be > enough to greatly reduce entropy of the most used data (and if you have > timeseries, you're likely to have a dashboard doing the same important > queries over and over again). > - LOCAL_QUORUM is too expensive (need >3 replicas), QUORUM is too slow. > I'll try to come up with a patch demonstrating how this would work, try it on > our system and report the effects. > cc: [~adejanovski], [~rgerard] as I know you worked on similar issues already. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16151196#comment-16151196 ] mck commented on CASSANDRA-13418: - {quote}P.s: https://github.com/thelastpickle/cassandra/commit/58440e707cd6490847a37dc8d76c150d3eb27aab#diff-e8e282423dcbf34d30a3578c8dec15cdR176 still think is less clear to inline it.{quote} I agree, but found no clear method name to use. As per Marcus' comments, {{getFullyExpiredSSTables(..)}} isn't appropriate. Any suggestions for a clear name? Otherwise the method is 70 lines long, not great but no disaster, so I'm ok either way. > Allow TWCS to ignore overlaps when dropping fully expired sstables > -- > > Key: CASSANDRA-13418 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13418 > Project: Cassandra > Issue Type: Improvement > Components: Compaction >Reporter: Corentin Chary >Assignee: Romain GERARD > Labels: twcs > Fix For: 3.11.x, 4.x > > Attachments: twcs-cleanup.png > > > http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If > you really want read-repairs you're going to have sstables blocking the > expiration of other fully expired SSTables because they overlap. > You can set unchecked_tombstone_compaction = true or tombstone_threshold to a > very low value and that will purge the blockers of old data that should > already have expired, thus removing the overlaps and allowing the other > SSTables to expire. > The thing is that this is rather CPU intensive and not optimal. If you have > time series, you might not care if all your data doesn't exactly expire at > the right time, or if data re-appears for some time, as long as it gets > deleted as soon as it can. And in this situation I believe it would be really > beneficial to allow users to simply ignore overlapping SSTables when looking > for fully expired ones. > To the question: why would you need read-repairs? 
> - Full repairs basically take longer than the TTL of the data on my dataset, > so this isn't really effective. > - Even with a 10% chance of doing a repair, we found out that this would be > enough to greatly reduce entropy of the most used data (and if you have > timeseries, you're likely to have a dashboard doing the same important > queries over and over again). > - LOCAL_QUORUM is too expensive (need >3 replicas), QUORUM is too slow. > I'll try to come up with a patch demonstrating how this would work, try it on > our system and report the effects. > cc: [~adejanovski], [~rgerard] as I know you worked on similar issues already. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
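The overlap logic under discussion can be sketched in a simplified model (hypothetical code, NOT Cassandra's actual {{getFullyExpiredSSTables(..)}}): an sstable whose newest cell has passed its TTL is a drop candidate, but by default it is held back when a still-live sstable overlaps its timestamp range, since dropping it could resurrect shadowed data; the proposed option simply skips that check.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, hypothetical model of "fully expired" selection with an
// optional ignore-overlaps mode. Field names and the overlap rule are
// illustrative assumptions, not Cassandra internals.
public class ExpiredSSTables
{
    static class SSTable
    {
        final long minTimestamp, maxTimestamp; // write-time range covered
        final long maxLocalDeletionTime;       // when the newest cell expires

        SSTable(long min, long max, long expiry)
        {
            minTimestamp = min;
            maxTimestamp = max;
            maxLocalDeletionTime = expiry;
        }

        boolean overlaps(SSTable o)
        {
            return minTimestamp <= o.maxTimestamp && o.minTimestamp <= maxTimestamp;
        }
    }

    static List<SSTable> fullyExpired(List<SSTable> sstables, long nowSeconds, boolean ignoreOverlaps)
    {
        List<SSTable> expired = new ArrayList<>();
        for (SSTable candidate : sstables)
        {
            if (candidate.maxLocalDeletionTime >= nowSeconds)
                continue; // still holds live data

            boolean blocked = false;
            if (!ignoreOverlaps)
                for (SSTable other : sstables)
                    if (other != candidate
                        && other.maxLocalDeletionTime >= nowSeconds
                        && candidate.overlaps(other))
                        blocked = true; // a live sstable overlaps: keep the candidate

            if (!blocked)
                expired.add(candidate);
        }
        return expired;
    }
}
```

With {{ignoreOverlaps}} set, an expired sstable is dropped even while a live one overlaps it, which is exactly the trade-off described: data may briefly reappear, but it is deleted as soon as possible without tombstone compactions.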
[jira] [Updated] (CASSANDRA-13754) BTree.Builder memory leak
[ https://issues.apache.org/jira/browse/CASSANDRA-13754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Stupp updated CASSANDRA-13754: - Resolution: Fixed Status: Resolved (was: Ready to Commit) Committed as [bed7fa5ef8492d1ff3852cf299622a5ad4e0b621|https://github.com/apache/cassandra/commit/bed7fa5ef8492d1ff3852cf299622a5ad4e0b621] to [cassandra-3.11|https://github.com/apache/cassandra/tree/cassandra-3.11] and merged to trunk. > BTree.Builder memory leak > - > > Key: CASSANDRA-13754 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13754 > Project: Cassandra > Issue Type: Bug > Components: Core > Environment: Cassandra 3.11.0, Netty 4.0.44.Final, OpenJDK 8u141-b15 >Reporter: Eric Evans >Assignee: Robert Stupp > Fix For: 3.11.1 > > > After a chronic bout of {{OutOfMemoryError}} in our development environment, > a heap analysis is showing that more than 10G of our 12G heaps are consumed > by the {{threadLocals}} members (instances of {{java.lang.ThreadLocalMap}}) > of various {{io.netty.util.concurrent.FastThreadLocalThread}} instances. > Reverting > [cecbe17|https://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=commit;h=cecbe17e3eafc052acc13950494f7dddf026aa54] > fixes the issue. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13339) java.nio.BufferOverflowException: null
[ https://issues.apache.org/jira/browse/CASSANDRA-13339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16150916#comment-16150916 ] Jason Brown commented on CASSANDRA-13339: - Here's where I'm at so far with the investigation: - A mutation is being stored on the coordinator as it is a replica for the data, as can be seen from the {{LocalMutationRunnable}} in the stack traces. - In both 3.0 and 3.9 (which have been reported), we execute the {{StorageProxy#performLocally}} method that takes an {{IAsyncCallbackWithFailure}} as the last parameter (the method has a different arity between the two Cassandra versions, but it's basically the same method). That method is called in a few different ways in {{StorageProxy}} -- {{#apply}} - the standard 'write a mutation' function -- sync batchlog - write the batchlog and block -- counter write -- syncWriteBatchedMutations -- asyncWriteBatchedMutations Due to the way everyone's currently reported stack traces look, what I'm suspecting is that the write thread thinks the rows in the Mutation (one of the {{PartitionUpdate}}'s {{holder}} instances to be specific) are empty when we check the serialized size, but not empty when we actually serialize. Here's why: The stack traces all fail in [{{UnfilteredRowIteratorSerializer#serialize}}|https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/db/rows/UnfilteredRowIteratorSerializer.java#L120]. At that point in the serialize method, we've already written out at least two bytes (one for the partition key length, and one for the flags). We then try to serialize the {{SerializationHeader}}, which serializes the {{EncodingStats}}, and then it fails. In [{{UnfilteredRowIteratorSerializer#serializedSize}}|https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/db/rows/UnfilteredRowIteratorSerializer.java#L150], it accounts for the partition key length and flags *at the minimum*. 
If the {{iterator}} argument to the method is empty ({{#isEmpty}}), it simply returns the currently computed size. Thus we always serialize the 'basic data' about a row, but then nothing else; we knew we needed to write something about a row, but didn't have the full knowledge about the row when we calculated the size. I think there may be some thread visibility issue or some race condition where the {{iterator}} is empty at {{UnfilteredRowIteratorSerializer#serializedSize}}, yet not empty at {{UnfilteredRowIteratorSerializer#serialize}}. Note that there may be something funny going on with the {{PartitionUpdate#holder}}, but I couldn't see anything obvious (without grasping at straws). Without more details or a way to reproduce, I'm kind of at a stand-still without just flailing at all the things. Thanks to all those who have commented, especially [~crichards] > java.nio.BufferOverflowException: null > -- > > Key: CASSANDRA-13339 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13339 > Project: Cassandra > Issue Type: Bug > Components: Core >Reporter: Chris Richards > > I'm seeing the following exception running Cassandra 3.9 (with Netty updated > to 4.1.8.Final) running on a 2 node cluster. It would have been processing > around 50 queries/second at the time (mixture of > inserts/updates/selects/deletes) : there's a collection of tables (some with > counters some without) and a single materialized view. 
> {code} > ERROR [MutationStage-4] 2017-03-15 22:50:33,052 StorageProxy.java:1353 - > Failed to apply mutation locally : {} > java.nio.BufferOverflowException: null > at > org.apache.cassandra.io.util.DataOutputBufferFixed.doFlush(DataOutputBufferFixed.java:52) > ~[apache-cassandra-3.9.jar:3.9] > at > org.apache.cassandra.io.util.BufferedDataOutputStreamPlus.write(BufferedDataOutputStreamPlus.java:132) > ~[apache-cassandra-3.9.jar:3.9] > at > org.apache.cassandra.io.util.BufferedDataOutputStreamPlus.writeUnsignedVInt(BufferedDataOutputStreamPlus.java:262) > ~[apache-cassandra-3.9.jar:3.9] > at > org.apache.cassandra.db.rows.EncodingStats$Serializer.serialize(EncodingStats.java:233) > ~[apache-cassandra-3.9.jar:3.9] > at > org.apache.cassandra.db.SerializationHeader$Serializer.serializeForMessaging(SerializationHeader.java:380) > ~[apache-cassandra-3.9.jar:3.9] > at > org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:122) > ~[apache-cassandra-3.9.jar:3.9] > at > org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:89) > ~[apache-cassandra-3.9.jar:3.9] > at > org.apache.cassandra.db.partitions.PartitionUpdate$PartitionUpdateSerializer.serialize(PartitionUpdate.java:790) > ~[apache-cassandra-3.9.jar:3.9] > at >
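The suspected failure mode — {{serializedSize}} under-reporting what {{serialize}} later writes into a fixed-size buffer — can be illustrated with a toy sketch (hypothetical code, not Cassandra's serializers; the sizes and field names are illustrative only):

```java
import java.nio.BufferOverflowException;
import java.nio.ByteBuffer;

// Toy model: a buffer is allocated from a size computed while the data looked
// empty, then the writer sees the data as non-empty and overruns the buffer,
// producing the same BufferOverflowException seen in the reported stack traces.
public class SizeMismatchDemo
{
    // stands in for a serializedSize() that sees an "empty" iterator:
    // it accounts only for the partition key length and flags bytes
    static int serializedSize(boolean seenAsEmpty)
    {
        return seenAsEmpty ? 2 : 2 + 8;
    }

    // stands in for serialize() seeing the same partition as NON-empty
    static void serialize(ByteBuffer out)
    {
        out.put((byte) 1);  // partition key length
        out.put((byte) 0);  // flags
        out.putLong(42L);   // stats-style payload: overflows the 2-byte buffer
    }

    public static void main(String[] args)
    {
        // sized while the data appeared empty, written while it wasn't
        ByteBuffer buf = ByteBuffer.allocate(serializedSize(true));
        try
        {
            serialize(buf);
        }
        catch (BufferOverflowException e)
        {
            System.out.println("java.nio.BufferOverflowException, as in the logs");
        }
    }
}
```

If both methods observed the same emptiness, the buffer would be sized correctly; the race described above is what makes the two observations disagree.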
[jira] [Commented] (CASSANDRA-12813) NPE in auth for bootstrapping node
[ https://issues.apache.org/jira/browse/CASSANDRA-12813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16150908#comment-16150908 ] Andres March commented on CASSANDRA-12813: -- any workaround? I got this on a 3.9 cluster with a new node bootstrapping. No upgrades from another version. > NPE in auth for bootstrapping node > -- > > Key: CASSANDRA-12813 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12813 > Project: Cassandra > Issue Type: Bug >Reporter: Charles Mims >Assignee: Alex Petrov > Fix For: 2.2.9, 3.0.10, 3.10 > > > {code} > ERROR [SharedPool-Worker-1] 2016-10-19 21:40:25,991 Message.java:617 - > Unexpected exception during request; channel = [id: 0x15eb017f, / omitted>:40869 => /10.0.0.254:9042] > java.lang.NullPointerException: null > at > org.apache.cassandra.auth.PasswordAuthenticator.doAuthenticate(PasswordAuthenticator.java:144) > ~[apache-cassandra-3.0.9.jar:3.0.9] > at > org.apache.cassandra.auth.PasswordAuthenticator.authenticate(PasswordAuthenticator.java:86) > ~[apache-cassandra-3.0.9.jar:3.0.9] > at > org.apache.cassandra.auth.PasswordAuthenticator.access$100(PasswordAuthenticator.java:54) > ~[apache-cassandra-3.0.9.jar:3.0.9] > at > org.apache.cassandra.auth.PasswordAuthenticator$PlainTextSaslAuthenticator.getAuthenticatedUser(PasswordAuthenticator.java:182) > ~[apache-cassandra-3.0.9.jar:3.0.9] > at > org.apache.cassandra.transport.messages.AuthResponse.execute(AuthResponse.java:78) > ~[apache-cassandra-3.0.9.jar:3.0.9] > at > org.apache.cassandra.transport.Message$Dispatcher.channelRead0(Message.java:513) > [apache-cassandra-3.0.9.jar:3.0.9] > at > org.apache.cassandra.transport.Message$Dispatcher.channelRead0(Message.java:407) > [apache-cassandra-3.0.9.jar:3.0.9] > at > io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105) > [netty-all-4.0.23.Final.jar:4.0.23.Final] > at > 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333) > [netty-all-4.0.23.Final.jar:4.0.23.Final] > at > io.netty.channel.AbstractChannelHandlerContext.access$700(AbstractChannelHandlerContext.java:32) > [netty-all-4.0.23.Final.jar:4.0.23.Final] > at > io.netty.channel.AbstractChannelHandlerContext$8.run(AbstractChannelHandlerContext.java:324) > [netty-all-4.0.23.Final.jar:4.0.23.Final] > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > [na:1.8.0_101] > at > org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:164) > [apache-cassandra-3.0.9.jar:3.0.9] > at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105) > [apache-cassandra-3.0.9.jar:3.0.9] > at java.lang.Thread.run(Thread.java:745) [na:1.8.0_101] > {code} > I have a node that has been joining for around 24 hours. My application is > configured with the IP address of the joining node in the list of nodes to > connect to (ruby driver), and I have been getting around 200 events of this > NPE per hour. I removed the IP of the joining node from the list of nodes > for my app to connect to and the errors stopped. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[3/3] cassandra git commit: Merge branch 'cassandra-3.11' into trunk
Merge branch 'cassandra-3.11' into trunk Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/e5f3bb6e Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/e5f3bb6e Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/e5f3bb6e Branch: refs/heads/trunk Commit: e5f3bb6e583a4f71a2522a040a93468404dfb653 Parents: fb0e001 bed7fa5 Author: Robert StuppAuthored: Fri Sep 1 19:16:29 2017 +0200 Committer: Robert Stupp Committed: Fri Sep 1 19:16:29 2017 +0200 -- CHANGES.txt | 1 + src/java/org/apache/cassandra/utils/btree/BTree.java | 2 +- 2 files changed, 2 insertions(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/e5f3bb6e/CHANGES.txt -- diff --cc CHANGES.txt index 78c2947,c4a3170..023ff06 --- a/CHANGES.txt +++ b/CHANGES.txt @@@ -1,134 -1,6 +1,135 @@@ +4.0 + * Add stress profile yaml with LWT (CASSANDRA-7960) + * Reduce memory copies and object creations when acting on ByteBufs (CASSANDRA-13789) + * simplify mx4j configuration (Cassandra-13578) + * Fix trigger example on 4.0 (CASSANDRA-13796) + * force minumum timeout value (CASSANDRA-9375) + * use netty for streaming (CASSANDRA-12229) + * Use netty for internode messaging (CASSANDRA-8457) + * Add bytes repaired/unrepaired to nodetool tablestats (CASSANDRA-13774) + * Don't delete incremental repair sessions if they still have sstables (CASSANDRA-13758) + * Fix pending repair manager index out of bounds check (CASSANDRA-13769) + * Don't use RangeFetchMapCalculator when RF=1 (CASSANDRA-13576) + * Don't optimise trivial ranges in RangeFetchMapCalculator (CASSANDRA-13664) + * Use an ExecutorService for repair commands instead of new Thread(..).start() (CASSANDRA-13594) + * Fix race / ref leak in anticompaction (CASSANDRA-13688) + * Expose tasks queue length via JMX (CASSANDRA-12758) + * Fix race / ref leak in PendingRepairManager (CASSANDRA-13751) + * Enable ppc64le runtime as unsupported architecture 
(CASSANDRA-13615) + * Improve sstablemetadata output (CASSANDRA-11483) + * Support for migrating legacy users to roles has been dropped (CASSANDRA-13371) + * Introduce error metrics for repair (CASSANDRA-13387) + * Refactoring to primitive functional interfaces in AuthCache (CASSANDRA-13732) + * Update metrics to 3.1.5 (CASSANDRA-13648) + * batch_size_warn_threshold_in_kb can now be set at runtime (CASSANDRA-13699) + * Avoid always rebuilding secondary indexes at startup (CASSANDRA-13725) + * Upgrade JMH from 1.13 to 1.19 (CASSANDRA-13727) + * Upgrade SLF4J from 1.7.7 to 1.7.25 (CASSANDRA-12996) + * Default for start_native_transport now true if not set in config (CASSANDRA-13656) + * Don't add localhost to the graph when calculating where to stream from (CASSANDRA-13583) + * Make CDC availability more deterministic via hard-linking (CASSANDRA-12148) + * Allow skipping equality-restricted clustering columns in ORDER BY clause (CASSANDRA-10271) + * Use common nowInSec for validation compactions (CASSANDRA-13671) + * Improve handling of IR prepare failures (CASSANDRA-13672) + * Send IR coordinator messages synchronously (CASSANDRA-13673) + * Flush system.repair table before IR finalize promise (CASSANDRA-13660) + * Fix column filter creation for wildcard queries (CASSANDRA-13650) + * Add 'nodetool getbatchlogreplaythrottle' and 'nodetool setbatchlogreplaythrottle' (CASSANDRA-13614) + * fix race condition in PendingRepairManager (CASSANDRA-13659) + * Allow noop incremental repair state transitions (CASSANDRA-13658) + * Run repair with down replicas (CASSANDRA-10446) + * Added started & completed repair metrics (CASSANDRA-13598) + * Added started & completed repair metrics (CASSANDRA-13598) + * Improve secondary index (re)build failure and concurrency handling (CASSANDRA-10130) + * Improve calculation of available disk space for compaction (CASSANDRA-13068) + * Change the accessibility of RowCacheSerializer for third party row cache plugins (CASSANDRA-13579) + * Allow 
sub-range repairs for a preview of repaired data (CASSANDRA-13570) + * NPE in IR cleanup when columnfamily has no sstables (CASSANDRA-13585) + * Fix Randomness of stress values (CASSANDRA-12744) + * Allow selecting Map values and Set elements (CASSANDRA-7396) + * Fast and garbage-free Streaming Histogram (CASSANDRA-13444) + * Update repairTime for keyspaces on completion (CASSANDRA-13539) + * Add configurable upper bound for validation executor threads (CASSANDRA-13521) + * Bring back maxHintTTL propery (CASSANDRA-12982) + * Add testing guidelines (CASSANDRA-13497) + * Add more repair metrics (CASSANDRA-13531) + * RangeStreamer should be smarter when
[1/3] cassandra git commit: BTree.Builder memory leak
Repository: cassandra Updated Branches: refs/heads/cassandra-3.11 cd3aca036 -> bed7fa5ef refs/heads/trunk fb0e0019e -> e5f3bb6e5 BTree.Builder memory leak patch by Robert Stupp; reviewed by Jeremiah Jordan for CASSANDRA-13754 Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/bed7fa5e Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/bed7fa5e Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/bed7fa5e Branch: refs/heads/cassandra-3.11 Commit: bed7fa5ef8492d1ff3852cf299622a5ad4e0b621 Parents: cd3aca0 Author: Robert StuppAuthored: Fri Sep 1 19:11:32 2017 +0200 Committer: Robert Stupp Committed: Fri Sep 1 19:12:01 2017 +0200 -- CHANGES.txt | 1 + src/java/org/apache/cassandra/utils/btree/BTree.java | 2 +- 2 files changed, 2 insertions(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/bed7fa5e/CHANGES.txt -- diff --git a/CHANGES.txt b/CHANGES.txt index e5ccf45..c4a3170 100644 --- a/CHANGES.txt +++ b/CHANGES.txt @@ -1,4 +1,5 @@ 3.11.1 + * BTree.Builder memory leak (CASSANDRA-13754) * Revert CASSANDRA-10368 of supporting non-pk column filtering due to correctness (CASSANDRA-13798) * Fix cassandra-stress hang issues when an error during cluster connection happens (CASSANDRA-12938) * Better bootstrap failure message when blocked by (potential) range movement (CASSANDRA-13744) http://git-wip-us.apache.org/repos/asf/cassandra/blob/bed7fa5e/src/java/org/apache/cassandra/utils/btree/BTree.java -- diff --git a/src/java/org/apache/cassandra/utils/btree/BTree.java b/src/java/org/apache/cassandra/utils/btree/BTree.java index 1a5d9ae..a4519b9 100644 --- a/src/java/org/apache/cassandra/utils/btree/BTree.java +++ b/src/java/org/apache/cassandra/utils/btree/BTree.java @@ -866,7 +866,7 @@ public class BTree private void cleanup() { quickResolver = null; -Arrays.fill(values, 0, count, null); +Arrays.fill(values, null); count = 0; detected = true; auto = true; - To 
unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[2/3] cassandra git commit: BTree.Builder memory leak
BTree.Builder memory leak patch by Robert Stupp; reviewed by Jeremiah Jordan for CASSANDRA-13754 Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/bed7fa5e Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/bed7fa5e Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/bed7fa5e Branch: refs/heads/trunk Commit: bed7fa5ef8492d1ff3852cf299622a5ad4e0b621 Parents: cd3aca0 Author: Robert StuppAuthored: Fri Sep 1 19:11:32 2017 +0200 Committer: Robert Stupp Committed: Fri Sep 1 19:12:01 2017 +0200 -- CHANGES.txt | 1 + src/java/org/apache/cassandra/utils/btree/BTree.java | 2 +- 2 files changed, 2 insertions(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/bed7fa5e/CHANGES.txt -- diff --git a/CHANGES.txt b/CHANGES.txt index e5ccf45..c4a3170 100644 --- a/CHANGES.txt +++ b/CHANGES.txt @@ -1,4 +1,5 @@ 3.11.1 + * BTree.Builder memory leak (CASSANDRA-13754) * Revert CASSANDRA-10368 of supporting non-pk column filtering due to correctness (CASSANDRA-13798) * Fix cassandra-stress hang issues when an error during cluster connection happens (CASSANDRA-12938) * Better bootstrap failure message when blocked by (potential) range movement (CASSANDRA-13744) http://git-wip-us.apache.org/repos/asf/cassandra/blob/bed7fa5e/src/java/org/apache/cassandra/utils/btree/BTree.java -- diff --git a/src/java/org/apache/cassandra/utils/btree/BTree.java b/src/java/org/apache/cassandra/utils/btree/BTree.java index 1a5d9ae..a4519b9 100644 --- a/src/java/org/apache/cassandra/utils/btree/BTree.java +++ b/src/java/org/apache/cassandra/utils/btree/BTree.java @@ -866,7 +866,7 @@ public class BTree private void cleanup() { quickResolver = null; -Arrays.fill(values, 0, count, null); +Arrays.fill(values, null); count = 0; detected = true; auto = true; - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
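The one-line change above widens the cleanup from {{Arrays.fill(values, 0, count, null)}} to {{Arrays.fill(values, null)}}. A hypothetical sketch of the leak pattern (not the real {{BTree.Builder}}; the {{dedupTo}} step is an illustrative assumption for how {{count}} can shrink without the slots above it being cleared):

```java
import java.util.Arrays;

// If any operation lowers `count` without nulling the slots above it, a
// cleanup that clears only values[0..count) leaves stale references in the
// recycled, thread-local builder, pinning those objects for the life of the
// thread. Clearing the whole array cannot leak regardless of count's history.
public class RecycledBuilder
{
    Object[] values = new Object[8];
    int count = 0;

    void add(Object o) { values[count++] = o; }

    // illustrative stand-in for a merge/dedup step that shrinks the
    // logical size but not the backing array
    void dedupTo(int newCount) { count = newCount; }

    // buggy variant: trusts `count` to bound the live references
    void cleanupPartial()
    {
        Arrays.fill(values, 0, count, null);
        count = 0;
    }

    // fixed variant, mirroring the committed change: clear everything
    void cleanupFull()
    {
        Arrays.fill(values, null);
        count = 0;
    }

    public static void main(String[] args)
    {
        RecycledBuilder b = new RecycledBuilder();
        b.add(new byte[1024 * 1024]);
        b.add(new byte[1024 * 1024]);
        b.dedupTo(1);                            // second buffer now sits above `count`
        b.cleanupPartial();
        System.out.println(b.values[1] != null); // stale reference survives
        b.cleanupFull();
        System.out.println(b.values[1] != null); // fully cleared
    }
}
```

Since these builders are recycled via thread locals (the {{FastThreadLocalThread}} heaps in the original report), a single retained slot per thread is enough to pin large amounts of memory indefinitely.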
[jira] [Updated] (CASSANDRA-13754) BTree.Builder memory leak
[ https://issues.apache.org/jira/browse/CASSANDRA-13754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Stupp updated CASSANDRA-13754: - Summary: BTree.Builder memory leak (was: FastThreadLocal leaks memory) > BTree.Builder memory leak > - > > Key: CASSANDRA-13754 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13754 > Project: Cassandra > Issue Type: Bug > Components: Core > Environment: Cassandra 3.11.0, Netty 4.0.44.Final, OpenJDK 8u141-b15 >Reporter: Eric Evans >Assignee: Robert Stupp > Fix For: 3.11.1 > > > After a chronic bout of {{OutOfMemoryError}} in our development environment, > a heap analysis is showing that more than 10G of our 12G heaps are consumed > by the {{threadLocals}} members (instances of {{java.lang.ThreadLocalMap}}) > of various {{io.netty.util.concurrent.FastThreadLocalThread}} instances. > Reverting > [cecbe17|https://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=commit;h=cecbe17e3eafc052acc13950494f7dddf026aa54] > fixes the issue. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13085) Cassandra fails to start because WindowsFailedSnapshotTracker can not write to CASSANDRA_HOME
[ https://issues.apache.org/jira/browse/CASSANDRA-13085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16150839#comment-16150839 ] Jason Rust commented on CASSANDRA-13085: We've also hit this issue when trying to deploy C* on Windows. If a new directory is chosen it might make sense to also use that as the default in the HeapUtils class, the only other class that references and tries to write to the CASSANDRA_HOME folder. A less-invasive workaround I've found is to set CASSANDRA_HOME to the data directory as the very last line of cassandra-env.ps1. This allows the libraries to be sourced from the real CASSANDRA_HOME, but then overwrites the variable before C* actually launches. > Cassandra fails to start because WindowsFailedSnapshotTracker can not write > to CASSANDRA_HOME > - > > Key: CASSANDRA-13085 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13085 > Project: Cassandra > Issue Type: Bug > Components: Local Write-Read Paths, Packaging > Environment: might be windows only considering the classname >Reporter: Pieter-Jan Pintens > Labels: windows > > We are currently trying to package Cassandra with our application. > On Windows our server does not start because it wants to write to > CASSANDRA_HOME\.toDelete; since we install to 'C:\program files\...' this is > not possible when started under a non-privileged user. We were hoping that > setting pointers for the data and log dir to a writable location (somewhere > under user home) would be enough to start Cassandra, but this component wants > to write to a path that we cannot modify. > For us there are a couple of solutions: > 1) the location can be specified using a system property like data and log > dirs > 2) this file is written to the data location > Our current workaround would be to patch this class file but that is hard > to maintain. 
> {noformat}
> Exception (java.lang.RuntimeException) encountered during startup: Failed to create failed snapshot tracking file [.toDelete]. Aborting
> java.lang.RuntimeException: Failed to create failed snapshot tracking file [.toDelete]. Aborting
> at org.apache.cassandra.db.WindowsFailedSnapshotTracker.deleteOldSnapshots(WindowsFailedSnapshotTracker.java:98)
> at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:186)
> at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:601)
> at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:730)
> at com.id.cassandra.wrapper.Main.main(Unknown Source)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
> at java.lang.reflect.Method.invoke(Unknown Source)
> {noformat}
-- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
Cassandra COPY TO CSV format , producing inconsistent data
Hi, we are facing issues with COPY TO CSV from a table. The date field value does not appear in the CSV, but a SELECT on the same table shows the date. This behavior is inconsistent: sometimes we get the data for the date field and sometimes not. Also, very occasionally the first run of the SELECT shows the date column as empty, and an immediate second run shows the value.

Improper result:

select call_uid,call_start_date from customer_calls where customer_uid=2904;

 call_uid       | call_start_date
----------------+------------------------------
 19096285868247 | 2017-08-30 13:30:23.839000+
 19096285878250 | 2017-08-30 13:30:33.842000+
 19096374614659 | null
 19096374616659 | null
 19096374618669 | null
 19096374620665 | null
 19096374622671 | null
 19096374624662 | null
 19096374626656 | null
 20195690924360 | 2017-08-29 07:54:12.171000+
 20195797463722 | 2017-08-30 13:29:51.558000+

Proper result (on executing a second time):

cqlsh:ncm> select call_uid,call_start_date from customer_calls where customer_uid=2904;

 call_uid       | call_start_date
----------------+------------------------------
 19096374614659 | 2017-08-31 14:09:30.248000+
 19096374616659 | 2017-08-31 14:09:32.247000+
 19096374618669 | 2017-08-31 14:09:34.258000+
 19096374620665 | 2017-08-31 14:09:36.253000+
 19096374622671 | 2017-08-31 14:09:38.259000+
 19096374624662 | 2017-08-31 14:09:40.25+
 19096374626656 | 2017-08-31 14:09:42.244000+

- To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13738) Load is over calculated after each IndexSummaryRedistribution
[ https://issues.apache.org/jira/browse/CASSANDRA-13738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16150783#comment-16150783 ] Jay Zhuang commented on CASSANDRA-13738: The 2.2 branch uTest fails for {{ant eclipse-warnings}}, but I'm unable to reproduce it locally:
{noformat}
eclipse-warnings:
[mkdir] Created dir: /home/ubuntu/cassandra/build/ecj
[echo] Running Eclipse Code Analysis. Output logged to /home/ubuntu/cassandra/build/ecj/eclipse_compiler_checks.txt
[java] incorrect classpath: /home/ubuntu/cassandra/build/cobertura/classes
[java] --
[java] 1. ERROR in /home/ubuntu/cassandra/src/java/org/apache/cassandra/db/compaction/CompactionManager.java (at line 853)
[java] ISSTableScanner scanner = cleanupStrategy.getScanner(sstable, getRateLimiter());
[java] ^^^
[java] Resource 'scanner' should be managed by try-with-resource
[java] --
[java] --
[java] 2. ERROR in /home/ubuntu/cassandra/src/java/org/apache/cassandra/db/compaction/LeveledCompactionStrategy.java (at line 257)
[java] scanners.add(new LeveledScanner(intersecting, range));
[java] ^^^
[java] Potential resource leak: '' may not be closed
[java] --
[java] --
[java] 3. ERROR in /home/ubuntu/cassandra/src/java/org/apache/cassandra/tools/SSTableExport.java (at line 315)
[java] ISSTableScanner scanner = reader.getScanner();
[java] ^^^
[java] Resource 'scanner' should be managed by try-with-resource
[java] --
[java] 3 problems (3 errors)
{noformat}
And for the other test failures, I don't think they're introduced by this patch. 
> Load is over calculated after each IndexSummaryRedistribution
> -------------------------------------------------------------
>
>                 Key: CASSANDRA-13738
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13738
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Jay Zhuang
>            Assignee: Jay Zhuang
>             Fix For: 2.2.x, 3.0.x, 3.11.x, 4.x
>
>         Attachments: sizeIssue.png
>
>
> For example, here is one of our clusters with about 500GB per node, but
> {{nodetool status}} shows far more load than there actually is, and it keeps
> increasing; restarting the process resets the load, but it keeps increasing
> afterwards:
> {noformat}
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address  Load      Tokens  Owns (effective)  Host ID                               Rack
> UN  IP1*     13.52 TB  256     100.0%            c4c31e0a-3f01-49f7-8a22-33043737975d  rac1
> UN  IP2*     14.25 TB  256     100.0%            efec4980-ec9e-4424-8a21-ce7ddaf80aa0  rac1
> UN  IP3*     13.52 TB  256     100.0%            7dbcfdfc-9c07-4b1a-a4b9-970b715ebed8  rac1
> UN  IP4*     22.13 TB  256     100.0%            8879e6c4-93e3-4cc5-b957-f999c6b9b563  rac1
> UN  IP5*     18.02 TB  256     100.0%            4a1eaf22-4a83-4736-9e1c-12f898d685fa  rac1
> UN  IP6*     11.68 TB  256     100.0%            d633c591-28af-42cc-bc5e-47d1c8bcf50f  rac1
> {noformat}
> !sizeIssue.png|test!
> The root cause is that if the SSTable index summary is redistributed (which
> typically executes hourly), the updated SSTable size is added again.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org
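The accounting bug described in that ticket can be illustrated with a small standalone sketch. All names below ({{LoadTracker}}, {{onRedistribution*}}) are hypothetical, not Cassandra's actual code; the point is that on redistribution the resampled sstable's new size must replace its old contribution rather than being added on top:

```java
// Hypothetical sketch of the double-counting described above. When an index
// summary is resampled, the buggy variant adds the sstable's size again, so
// the reported load only ever grows; the fixed variant applies the delta.
public class LoadTracker {
    private long load;

    void addSSTable(long size) { load += size; }

    // Buggy: the resampled size is added on top of the old contribution.
    void onRedistributionBuggy(long newSize) { load += newSize; }

    // Fixed: account only for the change between old and new size.
    void onRedistributionFixed(long oldSize, long newSize) { load += newSize - oldSize; }

    long load() { return load; }

    public static void main(String[] args) {
        LoadTracker buggy = new LoadTracker();
        buggy.addSSTable(100);
        buggy.onRedistributionBuggy(100);   // size unchanged, yet load doubles
        System.out.println("buggy load: " + buggy.load());

        LoadTracker fixed = new LoadTracker();
        fixed.addSSTable(100);
        fixed.onRedistributionFixed(100, 100);
        System.out.println("fixed load: " + fixed.load());
    }
}
```

Running the buggy variant once per hourly redistribution is enough to make a ~500GB node report multiple terabytes over time, matching the symptom in the ticket.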
[jira] [Updated] (CASSANDRA-13833) Failed compaction is not captured
[ https://issues.apache.org/jira/browse/CASSANDRA-13833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jay Zhuang updated CASSANDRA-13833: --- Status: Patch Available (was: Open) > Failed compaction is not captured > - > > Key: CASSANDRA-13833 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13833 > Project: Cassandra > Issue Type: Bug > Components: Compaction >Reporter: Jay Zhuang >Assignee: Jay Zhuang > > Follow up for CASSANDRA-13785, when the compaction failed, it fails silently. > No error message is logged and exceptions metric is not updated. Basically, > it's unable to get the exception: > [CompactionManager.java:1491|https://github.com/apache/cassandra/blob/cassandra-2.2/src/java/org/apache/cassandra/db/compaction/CompactionManager.java#L1491] > Here is the call stack: > {noformat} > at > org.apache.cassandra.db.compaction.CompactionTask.runMayThrow(CompactionTask.java:195) > at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) > at > org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:89) > at > org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:61) > at > org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionCandidate.run(CompactionManager.java:264) > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at > org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:79) > at java.lang.Thread.run(Thread.java:745) > {noformat} > There're 2 {{FutureTask}} in the call stack, 
for example
> {{FutureTask1(FutureTask2)}}. If the inner call throws an exception,
> {{FutureTask2}} sets its status, saves the exception and returns. But
> {{FutureTask1}} doesn't get any exception, so it sets its status to normal.
> That's why we're unable to get the exception in:
> [CompactionManager.java:1491|https://github.com/apache/cassandra/blob/cassandra-2.2/src/java/org/apache/cassandra/db/compaction/CompactionManager.java#L1491]
> 2.1.x works fine; here is the call stack:
> {noformat}
> at
> org.apache.cassandra.db.compaction.CompactionTask.runMayThrow(CompactionTask.java:177)
> ~[main/:na]
> at
> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
> ~[main/:na]
> at
> org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:73)
> ~[main/:na]
> at
> org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:59)
> ~[main/:na]
> at
> org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionCandidate.run(CompactionManager.java:264)
> ~[main/:na]
> at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> ~[na:1.8.0_141]
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> ~[na:1.8.0_141]
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> ~[na:1.8.0_141]
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> [na:1.8.0_141]
> at java.lang.Thread.run(Thread.java:748) [na:1.8.0_141]
> {noformat}
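The swallowed-exception behavior of nested {{FutureTask}}s described above is easy to reproduce in isolation. A minimal sketch (not Cassandra code): {{FutureTask.run()}} catches the task's exception and stores it internally, so when an inner {{FutureTask}} is wrapped in an outer one, {{inner.run()}} returns normally and the outer task completes "successfully":

```java
import java.util.concurrent.FutureTask;

// Demonstrates the nesting problem: the inner FutureTask captures its
// exception, so the outer FutureTask wrapping it never sees a failure.
public class NestedFutureTaskDemo {

    static FutureTask<Void> failingTask() {
        return new FutureTask<>(() -> {
            throw new RuntimeException("compaction failed");
        });
    }

    // true: get() on the directly-run task surfaces the exception.
    static boolean innerObservesException() {
        FutureTask<Void> inner = failingTask();
        inner.run();
        try { inner.get(); return false; } catch (Exception e) { return true; }
    }

    // false: FutureTask implements Runnable, so it can be wrapped again;
    // inner.run() returns normally even though the task failed, and the
    // outer task's status ends up "normal".
    static boolean outerObservesException() {
        FutureTask<Void> inner = failingTask();
        FutureTask<Void> outer = new FutureTask<>(inner, null);
        outer.run();
        try { outer.get(); return false; } catch (Exception e) { return true; }
    }

    public static void main(String[] args) {
        System.out.println("inner observes exception: " + innerObservesException());
        System.out.println("outer observes exception: " + outerObservesException());
    }
}
```

This is exactly the double-wrapping visible in the 2.2 call stack above (two {{FutureTask.run}} frames), and why the 2.1 stack, with a single {{FutureTask}}, still surfaces the error.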
[jira] [Commented] (CASSANDRA-13530) GroupCommitLogService
[ https://issues.apache.org/jira/browse/CASSANDRA-13530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16150661#comment-16150661 ]

Ariel Weisberg commented on CASSANDRA-13530:
--------------------------------------------

You are still testing batch at 2ms. I don't think that should hurt performance, but I would really like to see testing with it at the default value. If it's syncing extra times with smaller batches due to the 2ms setting, that would hurt performance.

My main question is: how many operations are in each batch when the batch and group commitlog services are syncing? Is it aggregating batches the way it is supposed to? Is it syncing more times per second and killing the throughput of the underlying device? Is the issue that the device is shared between data and the commit log, so it's better to have fewer, larger syncs?

Have you added a warmup phase to the testing so that everything is warmed up before you start measuring?

Can you modify each commit log service to log when each sync starts and completes, along with how many log entries are in each sync? When you log, have the log statements go to a dedicated thread via an unbounded blocking queue so they don't impact performance.

> GroupCommitLogService
> ---------------------
>
>                 Key: CASSANDRA-13530
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13530
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Yuji Ito
>            Assignee: Yuji Ito
>             Fix For: 2.2.x, 3.0.x, 3.11.x
>
>         Attachments: groupCommit22.patch, groupCommit30.patch,
> groupCommit3x.patch, groupCommitLog_noSerial_result.xlsx,
> groupCommitLog_result.xlsx, GuavaRequestThread.java, MicroRequestThread.java
>
>
> I propose a new CommitLogService, GroupCommitLogService, to improve the
> throughput when lots of requests are received.
> It improved the throughput by a maximum of 94%.
> I'd like to discuss this CommitLogService.
> Currently, we can select either of 2 CommitLog services: Periodic and Batch.
> In Periodic, we might lose some commit log which hasn't been written to the disk.
> In Batch, we can write commit log to the disk every time. The size of each
> commit log write is very small (< 4KB). Under high concurrency, these writes are
> gathered and persisted to the disk at once. But under insufficient
> concurrency, many small writes are issued and performance decreases due
> to the latency of the disk. Even if you use an SSD, processing many IO
> commands decreases performance.
> GroupCommitLogService writes several commit logs to the disk at once.
> The patch adds GroupCommitLogService (it is enabled by setting
> `commitlog_sync` and `commitlog_sync_group_window_in_ms` in cassandra.yaml).
> The only difference from Batch is the waiting on the semaphore.
> By waiting on the semaphore, several commit log writes are executed at the
> same time.
> In GroupCommitLogService, latency becomes worse if there is no
> concurrency.
> I measured the performance with my microbench (MicroRequestThread.java) by
> increasing the number of threads. The cluster has 3 nodes (replication factor:
> 3). Each node is an AWS EC2 m4.large instance + 200 IOPS io1 volume.
> The results are below. The GroupCommitLogService with a 10ms window improved
> UPDATE with Paxos by 94% and improved SELECT with Paxos by 76%.
> h6. SELECT / sec
> ||\# of threads||Batch 2ms||Group 10ms||
> |1|192|103|
> |2|163|212|
> |4|264|416|
> |8|454|800|
> |16|744|1311|
> |32|1151|1481|
> |64|1767|1844|
> |128|2949|3011|
> |256|4723|5000|
> h6. UPDATE / sec
> ||\# of threads||Batch 2ms||Group 10ms||
> |1|45|26|
> |2|39|51|
> |4|58|102|
> |8|102|198|
> |16|167|213|
> |32|289|295|
> |64|544|548|
> |128|1046|1058|
> |256|2020|2061|
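The group-commit idea in the proposal above can be sketched in a few dozen lines. This is a hypothetical illustration, not Yuji's actual patch: writer threads enqueue an entry and block until the next group sync, so a single flush covers many small writes, and latency is traded for fewer, larger syncs:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of group commit: many writers, one sync per window.
public class GroupCommitSketch {
    private final Object lock = new Object();
    private List<byte[]> pending = new ArrayList<>();
    private CountDownLatch groupSynced = new CountDownLatch(1);
    private final AtomicInteger syncCount = new AtomicInteger();

    // Writer threads: enqueue, then wait for the sync covering this write.
    void append(byte[] entry) throws InterruptedException {
        CountDownLatch myGroup;
        synchronized (lock) {
            pending.add(entry);
            myGroup = groupSynced;
        }
        myGroup.await(); // durable once the covering group sync completes
    }

    int pendingCount() {
        synchronized (lock) { return pending.size(); }
    }

    // Flusher thread: called once per sync-window (e.g. every 10ms).
    void syncGroup() {
        CountDownLatch done;
        synchronized (lock) {
            if (pending.isEmpty()) return;
            pending = new ArrayList<>();         // batch taken for writing
            done = groupSynced;
            groupSynced = new CountDownLatch(1); // next group starts here
        }
        syncCount.incrementAndGet();             // one fsync for the whole batch
        done.countDown();                        // release all covered writers
    }

    // Two concurrent writers covered by one group sync; returns sync count.
    static int demo() {
        GroupCommitSketch log = new GroupCommitSketch();
        Runnable writer = () -> {
            try { log.append(new byte[8]); }
            catch (InterruptedException e) { throw new RuntimeException(e); }
        };
        Thread w1 = new Thread(writer), w2 = new Thread(writer);
        w1.start(); w2.start();
        try {
            while (log.pendingCount() < 2) Thread.sleep(1); // wait for both
            log.syncGroup();
            w1.join(); w2.join();
        } catch (InterruptedException e) { throw new RuntimeException(e); }
        return log.syncCount.get();
    }

    public static void main(String[] args) {
        System.out.println("group syncs for 2 concurrent writes: " + demo());
    }
}
```

With no concurrency, a lone writer still waits out the window before its sync, which is the single-thread latency regression visible in the tables above.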
[jira] [Updated] (CASSANDRA-13838) Ensure FastThreadLocal.removeAll() is called for all threads
[ https://issues.apache.org/jira/browse/CASSANDRA-13838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Stupp updated CASSANDRA-13838:
-------------------------------------
    Status: Patch Available  (was: Open)

> Ensure FastThreadLocal.removeAll() is called for all threads
> ------------------------------------------------------------
>
>                 Key: CASSANDRA-13838
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13838
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Robert Stupp
>            Assignee: Robert Stupp
>
> There are a couple of places where it's not guaranteed that
> FastThreadLocal.removeAll() is called. Most misses are actually not that
> critical, but the miss for the thread created in
> org.apache.cassandra.streaming.ConnectionHandler.MessageHandler#start(java.net.Socket,
> int, boolean) could be critical, because these threads are created for every
> stream-session.
> (Follow-up from CASSANDRA-13754)
[jira] [Commented] (CASSANDRA-13838) Ensure FastThreadLocal.removeAll() is called for all threads
[ https://issues.apache.org/jira/browse/CASSANDRA-13838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16150634#comment-16150634 ]

Robert Stupp commented on CASSANDRA-13838:
------------------------------------------

||cassandra-3.11|[branch|https://github.com/apache/cassandra/compare/cassandra-3.11...snazy:13838-ftl-ensure-3.11]
||trunk|[branch|https://github.com/apache/cassandra/compare/trunk...snazy:13838-ftl-ensure-trunk]

> Ensure FastThreadLocal.removeAll() is called for all threads
> ------------------------------------------------------------
>
>                 Key: CASSANDRA-13838
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13838
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Robert Stupp
>            Assignee: Robert Stupp
>
> There are a couple of places where it's not guaranteed that
> FastThreadLocal.removeAll() is called. Most misses are actually not that
> critical, but the miss for the thread created in
> org.apache.cassandra.streaming.ConnectionHandler.MessageHandler#start(java.net.Socket,
> int, boolean) could be critical, because these threads are created for every
> stream-session.
> (Follow-up from CASSANDRA-13754)
[jira] [Commented] (CASSANDRA-13754) FastThreadLocal leaks memory
[ https://issues.apache.org/jira/browse/CASSANDRA-13754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16150611#comment-16150611 ] Jeremiah Jordan commented on CASSANDRA-13754: - and +1 for the patch. > FastThreadLocal leaks memory > > > Key: CASSANDRA-13754 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13754 > Project: Cassandra > Issue Type: Bug > Components: Core > Environment: Cassandra 3.11.0, Netty 4.0.44.Final, OpenJDK 8u141-b15 >Reporter: Eric Evans >Assignee: Robert Stupp > Fix For: 3.11.1 > > > After a chronic bout of {{OutOfMemoryError}} in our development environment, > a heap analysis is showing that more than 10G of our 12G heaps are consumed > by the {{threadLocals}} members (instances of {{java.lang.ThreadLocalMap}}) > of various {{io.netty.util.concurrent.FastThreadLocalThread}} instances. > Reverting > [cecbe17|https://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=commit;h=cecbe17e3eafc052acc13950494f7dddf026aa54] > fixes the issue. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-13754) FastThreadLocal leaks memory
[ https://issues.apache.org/jira/browse/CASSANDRA-13754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeremiah Jordan updated CASSANDRA-13754: Status: Ready to Commit (was: Patch Available) > FastThreadLocal leaks memory > > > Key: CASSANDRA-13754 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13754 > Project: Cassandra > Issue Type: Bug > Components: Core > Environment: Cassandra 3.11.0, Netty 4.0.44.Final, OpenJDK 8u141-b15 >Reporter: Eric Evans >Assignee: Robert Stupp > Fix For: 3.11.1 > > > After a chronic bout of {{OutOfMemoryError}} in our development environment, > a heap analysis is showing that more than 10G of our 12G heaps are consumed > by the {{threadLocals}} members (instances of {{java.lang.ThreadLocalMap}}) > of various {{io.netty.util.concurrent.FastThreadLocalThread}} instances. > Reverting > [cecbe17|https://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=commit;h=cecbe17e3eafc052acc13950494f7dddf026aa54] > fixes the issue. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13754) FastThreadLocal leaks memory
[ https://issues.apache.org/jira/browse/CASSANDRA-13754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16150609#comment-16150609 ] Jeremiah Jordan commented on CASSANDRA-13754: - +1 for just https://github.com/apache/cassandra/commit/2cafd0b6b4bbc5a6ec5726d47d0093bdac3af19c to fix this and splitting out the other changes to a new ticket. > FastThreadLocal leaks memory > > > Key: CASSANDRA-13754 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13754 > Project: Cassandra > Issue Type: Bug > Components: Core > Environment: Cassandra 3.11.0, Netty 4.0.44.Final, OpenJDK 8u141-b15 >Reporter: Eric Evans >Assignee: Robert Stupp > Fix For: 3.11.1 > > > After a chronic bout of {{OutOfMemoryError}} in our development environment, > a heap analysis is showing that more than 10G of our 12G heaps are consumed > by the {{threadLocals}} members (instances of {{java.lang.ThreadLocalMap}}) > of various {{io.netty.util.concurrent.FastThreadLocalThread}} instances. > Reverting > [cecbe17|https://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=commit;h=cecbe17e3eafc052acc13950494f7dddf026aa54] > fixes the issue. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-13754) FastThreadLocal leaks memory
[ https://issues.apache.org/jira/browse/CASSANDRA-13754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Stupp updated CASSANDRA-13754:
-------------------------------------
    Reviewer: Jeremiah Jordan
      Status: Patch Available  (was: In Progress)

Given that the FTL changes apparently have no influence on the OOM issue, but look serious enough to fix, I've split them out into CASSANDRA-13838. The patch for this ticket is reduced to the BTree change. CI looks good.

> FastThreadLocal leaks memory
> ----------------------------
>
>                 Key: CASSANDRA-13754
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13754
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>         Environment: Cassandra 3.11.0, Netty 4.0.44.Final, OpenJDK 8u141-b15
>            Reporter: Eric Evans
>            Assignee: Robert Stupp
>             Fix For: 3.11.1
>
>
> After a chronic bout of {{OutOfMemoryError}} in our development environment,
> a heap analysis is showing that more than 10G of our 12G heaps are consumed
> by the {{threadLocals}} members (instances of {{java.lang.ThreadLocalMap}})
> of various {{io.netty.util.concurrent.FastThreadLocalThread}} instances.
> Reverting
> [cecbe17|https://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=commit;h=cecbe17e3eafc052acc13950494f7dddf026aa54]
> fixes the issue.
[jira] [Commented] (CASSANDRA-13734) BufferUnderflowException when using uppercase UUID
[ https://issues.apache.org/jira/browse/CASSANDRA-13734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16150585#comment-16150585 ]

Claudia S commented on CASSANDRA-13734:
---------------------------------------

Hi,

Following are the schemas that were used in my tests. We also have more versions of the table using more complex clustering keys; let me know if these could have an impact and you need them as well.
{code}
CREATE TYPE IF NOT EXISTS event_log_system.vivates_participant (
    id text,
    global_role text,
    local_role text
);

CREATE TABLE IF NOT EXISTS event_log_system.event (
    id uuid,
    version text,
    created_at timestamp,
    event_type text,
    source_id text,
    session_id text,
    session_type text,
    business_process_id text,
    action text,
    action_outcome text,
    user frozen ,
    hp_id text,
    patient_id text,
    participants frozen,
    message text,
    extensions map ,
    PRIMARY KEY (id)
);

CREATE TABLE IF NOT EXISTS event_log_system.event_by_patient_timestamp (
    id uuid,
    version text,
    created_at timestamp,
    event_type text,
    source_id text,
    session_id text,
    session_type text,
    business_process_id text,
    action text,
    action_outcome text,
    user frozen ,
    hp_id text,
    patient_id text,
    participants frozen ,
    message text,
    extensions map ,
    PRIMARY KEY (patient_id, created_at, id)
) WITH CLUSTERING ORDER BY (created_at DESC);
{code}
And the queries we do (in some cases we also execute the same queries without using JSON):
{code}
SELECT JSON * FROM event WHERE id = ?

SELECT JSON * FROM event_by_patient_timestamp WHERE patient_id = ? AND created_at < ? LIMIT ?;

SELECT JSON COUNT(*) FROM event_by_patient_timestamp WHERE patient_id = ?
AND created_at > ?; {code} > BufferUnderflowException when using uppercase UUID > -- > > Key: CASSANDRA-13734 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13734 > Project: Cassandra > Issue Type: Bug > Environment: Cassandra 2.2.8 running on OSX 10.12.5 > * org.apache.cassandra:cassandra-all:jar:2.2.8 > * com.datastax.cassandra:cassandra-driver-core:jar:3.0.0 > * org.apache.cassandra:cassandra-thrift:jar:2.2.8 >Reporter: Claudia S > > We have a table with a primary key of type uuid which we query for results in > JSON format. When I accidentally caused a query passing a UUID which has an > uppercase letter I noticed that this causes a BufferUnderflowException on > Cassandra. > I directly attempted the queries using cqlsh, I can retrieve the entry using > standard select but whenever I pass JSON I get a BufferUnderflowException. > {code:title=cql queries} > cassandra@cqlsh:event_log_system> SELECT * FROM event WHERE id = > 559a4d83-9410-4b69-b459-566b8cf57aaa; > [RESULT REMOVED] > (1 rows) > cassandra@cqlsh:event_log_system> SELECT * FROM event WHERE id = > 559a4d83-9410-4b69-b459-566b8cf57AAA; > [RESULT REMOVED] > (1 rows) > cassandra@cqlsh:event_log_system> SELECT JSON * FROM event WHERE id = > 559a4d83-9410-4b69-b459-566b8cf57AAA; > ServerError: java.nio.BufferUnderflowException > cassandra@cqlsh:event_log_system> SELECT JSON * FROM event WHERE id = > 559a4d83-9410-4b69-b459-566b8cf57aaa; > ServerError: java.nio.BufferUnderflowException > {code} > {code:title=log} > TRACE [SharedPool-Worker-1] 2017-07-28 20:40:41,392 Message.java:506 - > Received: QUERY SELECT JSON * FROM event WHERE id = > 559a4d83-9410-4b69-b459-566b8cf57AAA;, v=4 > TRACE [SharedPool-Worker-1] 2017-07-28 20:40:41,392 QueryProcessor.java:221 - > Process org.apache.cassandra.cql3.statements.SelectStatement@67e6c0c @CL.ONE > TRACE [SharedPool-Worker-1] 2017-07-28 20:40:41,392 ReadCallback.java:76 - > Blockfor is 1; setting up requests to localhost/127.0.0.1 > TRACE [SharedPool-Worker-1] 
2017-07-28 20:40:41,393 > AbstractReadExecutor.java:118 - reading data locally > TRACE [SharedPool-Worker-2] 2017-07-28 20:40:41,393 SliceQueryFilter.java:269 > - collecting 0 of 2147483647: :false:0@150126701983 > TRACE [SharedPool-Worker-2] 2017-07-28 20:40:41,393 SliceQueryFilter.java:269 > - collecting 1 of 2147483647: can_login:false:1@150126701983 > TRACE [SharedPool-Worker-2] 2017-07-28 20:40:41,393 SliceQueryFilter.java:269 > - collecting 1 of 2147483647: is_superuser:false:1@150126701983 > TRACE [SharedPool-Worker-2] 2017-07-28 20:40:41,393 SliceQueryFilter.java:269 > - collecting 1 of 2147483647: salted_hash:false:60@150126701983 > TRACE [SharedPool-Worker-1] 2017-07-28 20:40:41,393
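One observation on the report above: {{java.util.UUID}} itself parses uppercase and lowercase hex to the same value, and the quoted cqlsh session shows the lowercase form failing under {{SELECT JSON}} as well, which points at the JSON serialization path rather than UUID parsing. A quick standalone check:

```java
import java.util.UUID;

// java.util.UUID.fromString() is case-insensitive for hex digits, so the
// uppercase and lowercase forms of the key map to the same 128-bit value.
public class UuidCaseDemo {
    public static void main(String[] args) {
        UUID lower = UUID.fromString("559a4d83-9410-4b69-b459-566b8cf57aaa");
        UUID upper = UUID.fromString("559A4D83-9410-4B69-B459-566B8CF57AAA");
        System.out.println("same value: " + lower.equals(upper)); // true
    }
}
```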
[jira] [Created] (CASSANDRA-13838) Ensure FastThreadLocal.removeAll() is called for all threads
Robert Stupp created CASSANDRA-13838:
----------------------------------------

             Summary: Ensure FastThreadLocal.removeAll() is called for all threads
                 Key: CASSANDRA-13838
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13838
             Project: Cassandra
          Issue Type: Improvement
            Reporter: Robert Stupp
            Assignee: Robert Stupp

There are a couple of places where it's not guaranteed that FastThreadLocal.removeAll() is called. Most misses are actually not that critical, but the miss for the thread created in org.apache.cassandra.streaming.ConnectionHandler.MessageHandler#start(java.net.Socket, int, boolean) could be critical, because these threads are created for every stream-session.

(Follow-up from CASSANDRA-13754)
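The usual fix for this class of miss is to wrap each manually created thread's {{Runnable}} so cleanup runs in a {{finally}} block. A stdlib-only sketch of the pattern, with a plain cleanup hook standing in for Netty's {{FastThreadLocal.removeAll()}} ({{withCleanup}} is an illustrative name, not Cassandra's API):

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Wrap-and-clean pattern: the cleanup hook runs even if the task throws,
// so per-thread state is always released when the thread body exits.
public class ThreadCleanupDemo {

    static Runnable withCleanup(Runnable task, Runnable cleanup) {
        return () -> {
            try {
                task.run();
            } finally {
                cleanup.run(); // always release per-thread state
            }
        };
    }

    // Returns true iff cleanup ran despite the task throwing.
    static boolean demoCleanupRuns() {
        AtomicBoolean cleaned = new AtomicBoolean(false);
        Thread t = new Thread(withCleanup(
                () -> { throw new RuntimeException("stream session died"); },
                () -> cleaned.set(true)));
        t.setUncaughtExceptionHandler((th, e) -> { /* swallow for the demo */ });
        t.start();
        try { t.join(); } catch (InterruptedException e) { throw new RuntimeException(e); }
        return cleaned.get();
    }

    public static void main(String[] args) {
        System.out.println("cleanup ran: " + demoCleanupRuns()); // true
    }
}
```

Applying this at every point where a raw {{Thread}} is created (such as the per-stream-session handler threads mentioned above) guarantees the thread-local maps are dropped when the thread dies.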
[jira] [Resolved] (CASSANDRA-13836) dtest failure: snapshot_test.py:TestSnapshot.test_basic_snapshot_and_restore
[ https://issues.apache.org/jira/browse/CASSANDRA-13836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcus Eriksson resolved CASSANDRA-13836. - Resolution: Fixed Fix Version/s: 4.0 committed as {{fb0e0019e76eb96659904}} > dtest failure: snapshot_test.py:TestSnapshot.test_basic_snapshot_and_restore > > > Key: CASSANDRA-13836 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13836 > Project: Cassandra > Issue Type: Bug >Reporter: Marcus Eriksson >Assignee: Marcus Eriksson > Fix For: 4.0 > > > Looks like sstableloader always tries to use SSL since CASSANDRA-12229 and > that makes the dtest hang -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13780) ADD Node streaming throughput performance
[ https://issues.apache.org/jira/browse/CASSANDRA-13780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16150566#comment-16150566 ]

Kevin Rivait commented on CASSANDRA-13780:
------------------------------------------

Thank you Jeff. Regarding TWCS, agreed this is a non-issue. We went back and dumped the SSTable max/min dates and verified that the buckets older than TTL and GCGS are in fact being dropped.

Regarding adding a DC, that is exactly what we did in our DEV/TEST environment, but we rebuilt nodes one at a time; we weren't sure of the consequences (if any) of rebuilding all nodes at the same time.

> ADD Node streaming throughput performance
> -----------------------------------------
>
>                 Key: CASSANDRA-13780
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13780
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>         Environment: Linux 2.6.32-696.3.2.el6.x86_64 #1 SMP Mon Jun 19 11:55:55 PDT 2017 x86_64 x86_64 x86_64 GNU/Linux
> Architecture:          x86_64
> CPU op-mode(s):        32-bit, 64-bit
> Byte Order:            Little Endian
> CPU(s):                40
> On-line CPU(s) list:   0-39
> Thread(s) per core:    2
> Core(s) per socket:    10
> Socket(s):             2
> NUMA node(s):          2
> Vendor ID:             GenuineIntel
> CPU family:            6
> Model:                 79
> Model name:            Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz
> Stepping:              1
> CPU MHz:               2199.869
> BogoMIPS:              4399.36
> Virtualization:        VT-x
> L1d cache:             32K
> L1i cache:             32K
> L2 cache:              256K
> L3 cache:              25600K
> NUMA node0 CPU(s):     0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
> NUMA node1 CPU(s):     1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
>              total  used  free  shared  buffers  cached
> Mem:          252G  217G   34G    708K    308M    149G
> -/+ buffers/cache:   67G  185G
> Swap:          16G    0B   16G
>            Reporter: Kevin Rivait
>             Fix For: 3.0.9
>
>
> Problem: Adding a new node to a large cluster runs at least 1000x slower than
> what the network and node hardware capacity can support, taking several days
> per new node. Adjusting stream throughput and other YAML parameters seems to
> have no effect on performance.
Essentially, it appears that Cassandra has an
> architectural scalability problem when adding new nodes to a moderate-
> to high-ingestion cluster, because Cassandra cannot add new node capacity
> fast enough to keep up with increasing data ingestion volumes and growth.
> Initial Configuration:
> Running 3.0.9 and have implemented TWCS on one of our largest tables.
> The largest table is partitioned on (ID, MM) using 1-day buckets with a TTL of
> 60 days.
> The next release will change partitioning to (ID, MMDD) so that partitions
> are aligned with daily TWCS buckets.
> Each node is currently creating roughly a 30GB SSTable per day.
> TWCS is working as expected; daily SSTables are dropping off after 70
> days (60 + 10-day grace).
> The current deployment is a 28-node, 2-datacenter cluster, 14 nodes in each DC,
> replication factor 3.
> Data directories are backed with 4 2TB SSDs on each node and 1 800GB SSD
> for commit logs.
> The requirement is to double cluster size, capacity, and ingestion volume within
> a few weeks.
> Observed Behavior:
> 1. Streaming throughput during add node: we observed a maximum of 6 Mb/s
> streaming from each of the 14 nodes on a 20Gb/s switched network, taking at
> least 106 hours for each node to join the cluster, and each node is only about 2.2
> TB in size.
> 2. Compaction on the newly added node: compaction has fallen behind, with
> anywhere from 4,000 to 10,000 SSTables at any given time. It took 3 weeks
> for compaction to finish on each newly added node. Increasing the number of
> compaction threads to match the number of CPUs (40) and increasing compaction
> throughput to 32MB/s seemed to be the sweet spot.
> 3. TWCS buckets on the new node: data was streamed to this node over 4 1/2 days.
> Compaction correctly placed the data in daily files, but the problem is the
> file dates reflect when compaction created the file and not the date of the
> last record written in the TWCS bucket, which will cause the files to remain
> around much longer than necessary.
> Two Questions: > 1. What can be done to substantially improve the performance of adding a new > node? > 2. Can compaction on TWCS partitions for newly added nodes change the file > create date to match the highest date record in the file -or- add another > piece of meta-data to the TWCS files that reflect the file drop date so that > TWCS partitions can be dropped consistently? -- This message was sent by Atlassian JIRA (v6.4.14#64029)
cassandra git commit: Set encryptionOptions to null if encryption is disabled in BulkLoadConnectionFactory
Repository: cassandra Updated Branches: refs/heads/trunk 8ed41fbc5 -> fb0e0019e Set encryptionOptions to null if encryption is disabled in BulkLoadConnectionFactory Patch by marcuse; reviewed by Jason Brown for CASSANDRA-13836 Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/fb0e0019 Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/fb0e0019 Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/fb0e0019 Branch: refs/heads/trunk Commit: fb0e0019e76eb96659904b9161f06b600718e704 Parents: 8ed41fb Author: Marcus ErikssonAuthored: Fri Sep 1 15:03:05 2017 +0200 Committer: Marcus Eriksson Committed: Fri Sep 1 15:55:06 2017 +0200 -- .../org/apache/cassandra/tools/BulkLoadConnectionFactory.java| 4 +++- src/java/org/apache/cassandra/tools/BulkLoader.java | 2 +- 2 files changed, 4 insertions(+), 2 deletions(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/fb0e0019/src/java/org/apache/cassandra/tools/BulkLoadConnectionFactory.java -- diff --git a/src/java/org/apache/cassandra/tools/BulkLoadConnectionFactory.java b/src/java/org/apache/cassandra/tools/BulkLoadConnectionFactory.java index d119081..b56d292 100644 --- a/src/java/org/apache/cassandra/tools/BulkLoadConnectionFactory.java +++ b/src/java/org/apache/cassandra/tools/BulkLoadConnectionFactory.java @@ -38,7 +38,9 @@ public class BulkLoadConnectionFactory extends DefaultConnectionFactory implemen { this.storagePort = storagePort; this.secureStoragePort = secureStoragePort; -this.encryptionOptions = encryptionOptions; +this.encryptionOptions = encryptionOptions != null && encryptionOptions.internode_encryption == EncryptionOptions.ServerEncryptionOptions.InternodeEncryption.none + ? 
null + : encryptionOptions; this.outboundBindAny = outboundBindAny; } http://git-wip-us.apache.org/repos/asf/cassandra/blob/fb0e0019/src/java/org/apache/cassandra/tools/BulkLoader.java -- diff --git a/src/java/org/apache/cassandra/tools/BulkLoader.java b/src/java/org/apache/cassandra/tools/BulkLoader.java index 0f1c555..01d8c33 100644 --- a/src/java/org/apache/cassandra/tools/BulkLoader.java +++ b/src/java/org/apache/cassandra/tools/BulkLoader.java @@ -104,7 +104,7 @@ public class BulkLoader // Give sockets time to gracefully close Thread.sleep(1000); -// System.exit(0); // We need that to stop non daemonized threads +System.exit(0); // We need that to stop non daemonized threads } catch (Exception e) { - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16150559#comment-16150559 ] Romain GERARD edited comment on CASSANDRA-13418 at 9/1/17 1:56 PM: --- Don't worry [~michaelsembwever], I am currently working on an issue with couchbase so I couldn't have checked it until monday. So no hard feeling :) P.s: https://github.com/thelastpickle/cassandra/commit/58440e707cd6490847a37dc8d76c150d3eb27aab#diff-e8e282423dcbf34d30a3578c8dec15cdR176 still think is less clear to inline it. was (Author: rgerard): Don't worry [~michaelsembwever], I am currently working with an issue on couchbase so I couldn't have checked it until monday. So no hard feeling :) P.s: https://github.com/thelastpickle/cassandra/commit/58440e707cd6490847a37dc8d76c150d3eb27aab#diff-e8e282423dcbf34d30a3578c8dec15cdR176 still think is less clear to inline it. > Allow TWCS to ignore overlaps when dropping fully expired sstables > -- > > Key: CASSANDRA-13418 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13418 > Project: Cassandra > Issue Type: Improvement > Components: Compaction >Reporter: Corentin Chary >Assignee: Romain GERARD > Labels: twcs > Fix For: 3.11.x, 4.x > > Attachments: twcs-cleanup.png > > > http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If > you really want read-repairs you're going to have sstables blocking the > expiration of other fully expired SSTables because they overlap. > You can set unchecked_tombstone_compaction = true or tombstone_threshold to a > very low value and that will purge the blockers of old data that should > already have expired, thus removing the overlaps and allowing the other > SSTables to expire. > The thing is that this is rather CPU intensive and not optimal. If you have > time series, you might not care if all your data doesn't exactly expire at > the right time, or if data re-appears for some time, as long as it gets > deleted as soon as it can. 
And in this situation I believe it would be really > beneficial to allow users to simply ignore overlapping SSTables when looking > for fully expired ones. > To the question: why would you need read-repairs? > - Full repairs basically take longer than the TTL of the data on my dataset, > so this isn't really effective. > - Even with a 10% chance of doing a repair, we found out that this would be > enough to greatly reduce entropy of the most used data (and if you have > timeseries, you're likely to have a dashboard doing the same important > queries over and over again). > - LOCAL_QUORUM is too expensive (need >3 replicas), QUORUM is too slow. > I'll try to come up with a patch demonstrating how this would work, try it on > our system and report the effects. > cc: [~adejanovski], [~rgerard] as I know you worked on similar issues already. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
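For reference, the interim workaround described in the issue (setting unchecked_tombstone_compaction = true or a very low tombstone_threshold to purge the overlapping "blocker" sstables) maps to TWCS compaction sub-options. A hedged sketch only — the keyspace/table name, window settings, and threshold value are illustrative, not taken from the ticket:

```sql
ALTER TABLE ks.timeseries WITH compaction = {
  'class': 'TimeWindowCompactionStrategy',
  'compaction_window_unit': 'DAYS',          -- illustrative window
  'compaction_window_size': '1',
  'unchecked_tombstone_compaction': 'true',  -- purge overlapping blockers
  'tombstone_threshold': '0.05'              -- the "very low value" mentioned above
};
```

As the description notes, this is CPU-intensive; the patch under review instead lets TWCS skip the overlap check entirely when hunting for fully expired sstables.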
[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16150559#comment-16150559 ] Romain GERARD edited comment on CASSANDRA-13418 at 9/1/17 1:51 PM: --- Don't worry [~michaelsembwever], I am currently working with an issue on couchbase so I couldn't have checked it until monday. So no hard feeling :) P.s: https://github.com/thelastpickle/cassandra/commit/58440e707cd6490847a37dc8d76c150d3eb27aab#diff-e8e282423dcbf34d30a3578c8dec15cdR176 still think is less clear to inline it. was (Author: rgerard): Don't worry [~mck], I am currently working with an issue on couchbase so I couldn't have checked it until monday. So no hard feeling :) P.s: https://github.com/thelastpickle/cassandra/commit/58440e707cd6490847a37dc8d76c150d3eb27aab#diff-e8e282423dcbf34d30a3578c8dec15cdR176 still think is less clear to inline it.
[jira] [Comment Edited] (CASSANDRA-13836) dtest failure: snapshot_test.py:TestSnapshot.test_basic_snapshot_and_restore
[ https://issues.apache.org/jira/browse/CASSANDRA-13836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16150556#comment-16150556 ] Jason Brown edited comment on CASSANDRA-13836 at 9/1/17 1:51 PM: - [~krummas] and I discussed offline, and we'll commit his current patch as-is, and open a new ticket for the daemon threads issue. This way we can unblock dtests on trunk. UPDATE: created CASSANDRA-13837 was (Author: jasobrown): [~krummas] and I discussed offline, and we'll commit his current patch as-is, and open a new ticket for the daemon threads issue. This way we can unblock dtests on trunk. > dtest failure: snapshot_test.py:TestSnapshot.test_basic_snapshot_and_restore > > > Key: CASSANDRA-13836 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13836 > Project: Cassandra > Issue Type: Bug >Reporter: Marcus Eriksson >Assignee: Marcus Eriksson > > Looks like sstableloader always tries to use SSL since CASSANDRA-12229 and > that makes the dtest hang
[jira] [Commented] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16150559#comment-16150559 ] Romain GERARD commented on CASSANDRA-13418: --- Don't worry [~mck], I am currently working with an issue on couchbase so I couldn't have checked it until monday. So no hard feeling :) P.s: https://github.com/thelastpickle/cassandra/commit/58440e707cd6490847a37dc8d76c150d3eb27aab#diff-e8e282423dcbf34d30a3578c8dec15cdR176 still think is less clear to inline it.
[jira] [Created] (CASSANDRA-13837) Hanging threads in BulkLoader
Jason Brown created CASSANDRA-13837: --- Summary: Hanging threads in BulkLoader Key: CASSANDRA-13837 URL: https://issues.apache.org/jira/browse/CASSANDRA-13837 Project: Cassandra Issue Type: Bug Reporter: Jason Brown Assignee: Jason Brown Priority: Minor [~krummas] discovered some threads that were not closing correctly when he fixed CASSANDRA-13836. We suspect this is due to CASSANDRA-8457/CASSANDRA-12229.
[jira] [Commented] (CASSANDRA-13836) dtest failure: snapshot_test.py:TestSnapshot.test_basic_snapshot_and_restore
[ https://issues.apache.org/jira/browse/CASSANDRA-13836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16150556#comment-16150556 ] Jason Brown commented on CASSANDRA-13836: - [~krummas] and I discussed offline, and we'll commit his current patch as-is, and open a new ticket for the daemon threads issue. This way we can unblock dtests on trunk.
[jira] [Commented] (CASSANDRA-13836) dtest failure: snapshot_test.py:TestSnapshot.test_basic_snapshot_and_restore
[ https://issues.apache.org/jira/browse/CASSANDRA-13836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16150518#comment-16150518 ] Jason Brown commented on CASSANDRA-13836: - Do we need to uncomment the {{System.exit()}} in {{BulkLoader}}? That was not one of the changes from CASSANDRA-12229, but from CASSANDRA-10637 (1.5 years ago). Otherwise +1
[jira] [Commented] (CASSANDRA-13836) dtest failure: snapshot_test.py:TestSnapshot.test_basic_snapshot_and_restore
[ https://issues.apache.org/jira/browse/CASSANDRA-13836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16150520#comment-16150520 ] Marcus Eriksson commented on CASSANDRA-13836: - I'm guessing CASSANDRA-12229 or CASSANDRA-8457 introduced some non-daemon threads so that we need the System.exit again? sstableloader exited fine before those two went in, but hangs afterwards
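Marcus's hypothesis above — non-daemon threads keeping the process alive after main returns — is the classic JVM exit hang, and it is why {{BulkLoader}} needs an explicit {{System.exit()}}. A minimal Python analogy of the same mechanism, as a sketch only (the names are invented for illustration; the actual fix under discussion is in Java):

```python
import threading

def worker(stop_event: threading.Event) -> None:
    # Stands in for a lingering library I/O thread, like the
    # streaming threads sstableloader is suspected to leave behind.
    while not stop_event.wait(0.05):
        pass  # keep running until told to stop

stop = threading.Event()
# daemon=False mirrors a non-daemon JVM thread: the process cannot
# exit normally while it is still running.
t = threading.Thread(target=worker, args=(stop,), daemon=False)
t.start()
assert t.is_alive()

# Without this explicit shutdown (the moral equivalent of the
# System.exit() call), the interpreter would wait forever on join.
stop.set()
t.join(timeout=1.0)
assert not t.is_alive()
```

The alternative fix, of course, is to mark such threads as daemon threads so that process exit is never blocked in the first place.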
[jira] [Assigned] (CASSANDRA-13836) dtest failure: snapshot_test.py:TestSnapshot.test_basic_snapshot_and_restore
[ https://issues.apache.org/jira/browse/CASSANDRA-13836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Brown reassigned CASSANDRA-13836: --- Assignee: Marcus Eriksson (was: Jason Brown)
[jira] [Updated] (CASSANDRA-13836) dtest failure: snapshot_test.py:TestSnapshot.test_basic_snapshot_and_restore
[ https://issues.apache.org/jira/browse/CASSANDRA-13836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Brown updated CASSANDRA-13836: Reviewer: Jason Brown
[jira] [Commented] (CASSANDRA-13836) dtest failure: snapshot_test.py:TestSnapshot.test_basic_snapshot_and_restore
[ https://issues.apache.org/jira/browse/CASSANDRA-13836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16150474#comment-16150474 ] Marcus Eriksson commented on CASSANDRA-13836: - https://github.com/krummas/cassandra/commits/marcuse/13836 It also seems we now need the System.exit(0) there again; we should probably have a look at that as well. [~jasobrown] could you review?
[jira] [Assigned] (CASSANDRA-13836) dtest failure: snapshot_test.py:TestSnapshot.test_basic_snapshot_and_restore
[ https://issues.apache.org/jira/browse/CASSANDRA-13836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Brown reassigned CASSANDRA-13836: --- Assignee: Jason Brown
[jira] [Created] (CASSANDRA-13836) dtest failure: snapshot_test.py:TestSnapshot.test_basic_snapshot_and_restore
Marcus Eriksson created CASSANDRA-13836: --- Summary: dtest failure: snapshot_test.py:TestSnapshot.test_basic_snapshot_and_restore Key: CASSANDRA-13836 URL: https://issues.apache.org/jira/browse/CASSANDRA-13836 Project: Cassandra Issue Type: Bug Reporter: Marcus Eriksson Looks like sstableloader always tries to use SSL since CASSANDRA-12229 and that makes the dtest hang
[jira] [Comment Edited] (CASSANDRA-13339) java.nio.BufferOverflowException: null
[ https://issues.apache.org/jira/browse/CASSANDRA-13339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16150412#comment-16150412 ] Jason Brown edited comment on CASSANDRA-13339 at 9/1/17 12:56 PM: -- [~theochu] Thanks for the additional data points. Are you using counters? Can you copy the stack trace, as well, into this ticket? Basically, I'm trying to get enough data so I can reproduce this bug - because then I can actually fix it. was (Author: jasobrown): [~theochu] Thanks for the additional data points. Are you using counters? Can you copy the stack trace, as well, into this ticket? > java.nio.BufferOverflowException: null > -- > > Key: CASSANDRA-13339 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13339 > Project: Cassandra > Issue Type: Bug > Components: Core >Reporter: Chris Richards > > I'm seeing the following exception running Cassandra 3.9 (with Netty updated > to 4.1.8.Final) running on a 2 node cluster. It would have been processing > around 50 queries/second at the time (mixture of > inserts/updates/selects/deletes) : there's a collection of tables (some with > counters some without) and a single materialized view.
> {code} > ERROR [MutationStage-4] 2017-03-15 22:50:33,052 StorageProxy.java:1353 - > Failed to apply mutation locally : {} > java.nio.BufferOverflowException: null > at > org.apache.cassandra.io.util.DataOutputBufferFixed.doFlush(DataOutputBufferFixed.java:52) > ~[apache-cassandra-3.9.jar:3.9] > at > org.apache.cassandra.io.util.BufferedDataOutputStreamPlus.write(BufferedDataOutputStreamPlus.java:132) > ~[apache-cassandra-3.9.jar:3.9] > at > org.apache.cassandra.io.util.BufferedDataOutputStreamPlus.writeUnsignedVInt(BufferedDataOutputStreamPlus.java:262) > ~[apache-cassandra-3.9.jar:3.9] > at > org.apache.cassandra.db.rows.EncodingStats$Serializer.serialize(EncodingStats.java:233) > ~[apache-cassandra-3.9.jar:3.9] > at > org.apache.cassandra.db.SerializationHeader$Serializer.serializeForMessaging(SerializationHeader.java:380) > ~[apache-cassandra-3.9.jar:3.9] > at > org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:122) > ~[apache-cassandra-3.9.jar:3.9] > at > org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:89) > ~[apache-cassandra-3.9.jar:3.9] > at > org.apache.cassandra.db.partitions.PartitionUpdate$PartitionUpdateSerializer.serialize(PartitionUpdate.java:790) > ~[apache-cassandra-3.9.jar:3.9] > at > org.apache.cassandra.db.Mutation$MutationSerializer.serialize(Mutation.java:393) > ~[apache-cassandra-3.9.jar:3.9] > at org.apache.cassandra.db.commitlog.CommitLog.add(CommitLog.java:279) > ~[apache-cassandra-3.9.jar:3.9] > at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:493) > ~[apache-cassandra-3.9.jar:3.9] > at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:396) > ~[apache-cassandra-3.9.jar:3.9] > at org.apache.cassandra.db.Mutation.applyFuture(Mutation.java:215) > ~[apache-cassandra-3.9.jar:3.9] > at org.apache.cassandra.db.Mutation.apply(Mutation.java:227) > ~[apache-cassandra-3.9.jar:3.9] > at 
org.apache.cassandra.db.Mutation.apply(Mutation.java:241) > ~[apache-cassandra-3.9.jar:3.9] > at > org.apache.cassandra.service.StorageProxy$8.runMayThrow(StorageProxy.java:1347) > ~[apache-cassandra-3.9.jar:3.9] > at > org.apache.cassandra.service.StorageProxy$LocalMutationRunnable.run(StorageProxy.java:2539) > [apache-cassandra-3.9.jar:3.9] > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > [na:1.8.0_121] > at > org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:164) > [apache-cassandra-3.9.jar:3.9] > at > org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:136) > [apache-cassandra-3.9.jar:3.9] > at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:109) > [apache-cassandra-3.9.jar:3.9] > at java.lang.Thread.run(Thread.java:745) [na:1.8.0_121] > {code} > and then again shortly afterwards > {code} > ERROR [MutationStage-3] 2017-03-15 23:27:36,198 StorageProxy.java:1353 - > Failed to apply mutation locally : {} > java.nio.BufferOverflowException: null > at > org.apache.cassandra.io.util.DataOutputBufferFixed.doFlush(DataOutputBufferFixed.java:52) > ~[apache-cassandra-3.9.jar:3.9] > at > org.apache.cassandra.io.util.BufferedDataOutputStreamPlus.write(BufferedDataOutputStreamPlus.java:132) > ~[apache-cassandra-3.9.jar:3.9] > at >
[jira] [Updated] (CASSANDRA-13835) Thrift get_slice responds slower on Cassandra 3
[ https://issues.apache.org/jira/browse/CASSANDRA-13835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pawel Szlendak updated CASSANDRA-13835: --- Description: I have recently upgraded from Cassandra 1.2.18 to Cassandra 3.10 and was surprised to notice performance degradation of my server application. I dug down through my application stack only to find out that the cause of the performance issue was slower response time of Cassandra 3.10 get_slice as compared to Cassandra 1.2.18 (almost x3 times slower on average). I am attaching a python script (attack.py) here that can be used to reproduce this issue on a Windows platform. The script uses the pycassa python library that can easily be installed using pip. REPRODUCTION STEPS: 1. Install Cassandra 1.2.18 from https://archive.apache.org/dist/cassandra/1.2.18/apache-cassandra-1.2.18-bin.tar.gz 2. Run Cassandra 1.2.18 from cmd console using cassandra.bat 3. Create a test keyspace and an empty CF using attack.py script {noformat} python attack.py create {noformat} 4. Run some get_slice queries to an empty CF and note down the average response time (in seconds) {noformat} python attack.py {noformat} get_slice count: 788 get_slice total response time: 0.3126376 *get_slice average response time: 0.000397208075838* 5. Stop Cassandra 1.2.18 and install Cassandra 3.10 from https://archive.apache.org/dist/cassandra/3.10/apache-cassandra-3.10-bin.tar.gz 6. Tweak cassandra.yaml to run thrift service (start_rpc=true) and run Cassandra from an elevated cmd console using cassandra.bat 7. Create a test keyspace and an empty CF using attack.py script {noformat} python attack.py create {noformat} 8. Run some get_slice queries to an empty CF using attack.py and note down the average response time (in seconds) {noformat} python attack.py {noformat} get_slice count: 788 get_slice total response time: 1.1646185 *get_slice average response time: 0.00147842634753* 9. 
Compare the average response times EXPECTED: get_slice response time of Cassandra 3.10 is not worse than on Cassandra 1.2.18 ACTUAL: get_slice response time of Cassandra 3.10 is x3 worse than that of Cassandra 1.2.18 REMARKS: - this seems to happen only on Windows platform (tested on Windows 10 and Windows Server 2008 R2) - running the very same procedure on Linux (Ubuntu) renders roughly the same response times - I sniffed the traffic to/from Cassandra 1.2.18 and Cassandra 3.10 and it can be seen that Cassandra 3.10 responds slower (Wireshark dumps attached) - when attacking the server with concurrent get_slice queries I can see lower CPU usage for Cassandra 3.10 than for Cassandra 1.2.18 - get_slice in attack.py queries the column family for a non-existing key (the column family is empty) I am willing to work on this on my own if you guys give me some tips on where to look. I am also aware that this might be more Windows/Java related; nevertheless, any help from your side would be much appreciated.
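The "get_slice count / total / average response time" figures quoted above come from the reporter's attack.py script (attached to the ticket, not reproduced here). A minimal sketch of that kind of client-side latency measurement — `fn` stands in for the actual pycassa get_slice call, and the names are invented for illustration:

```python
import time

def timed_calls(fn, n):
    """Time n calls to fn and return (count, total_seconds, average_seconds).

    This mirrors the per-query wall-clock measurement attack.py
    presumably performs around each thrift get_slice request.
    """
    total = 0.0
    for _ in range(n):
        start = time.perf_counter()  # monotonic, high-resolution clock
        fn()
        total += time.perf_counter() - start
    return n, total, total / n

# Placeholder workload instead of a live get_slice against Cassandra.
count, total, avg = timed_calls(lambda: sum(range(100)), 788)
print(count, total, avg)
```

Against a real cluster, `fn` would be something like `lambda: cf.get('missing-key')` on a pycassa ColumnFamily; the x3 regression reported above is the ratio of the two `avg` values.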
[jira] [Updated] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] mck updated CASSANDRA-13418: Fix Version/s: 4.x 3.11.x Status: Patch Available (was: Open) > Allow TWCS to ignore overlaps when dropping fully expired sstables > -- > > Key: CASSANDRA-13418 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13418 > Project: Cassandra > Issue Type: Improvement > Components: Compaction >Reporter: Corentin Chary >Assignee: Romain GERARD > Labels: twcs > Fix For: 3.11.x, 4.x > > Attachments: twcs-cleanup.png > > > http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If > you really want read-repairs you're going to have sstables blocking the > expiration of other fully expired SSTables because they overlap. > You can set unchecked_tombstone_compaction = true or tombstone_threshold to a > very low value and that will purge the blockers of old data that should > already have expired, thus removing the overlaps and allowing the other > SSTables to expire. > The thing is that this is rather CPU intensive and not optimal. If you have > time series, you might not care if all your data doesn't exactly expire at > the right time, or if data re-appears for some time, as long as it gets > deleted as soon as it can. And in this situation I believe it would be really > beneficial to allow users to simply ignore overlapping SSTables when looking > for fully expired ones. > To the question: why would you need read-repairs ? > - Full repairs basically take longer than the TTL of the data on my dataset, > so this isn't really effective. > - Even with a 10% chances of doing a repair, we found out that this would be > enough to greatly reduce entropy of the most used data (and if you have > timeseries, you're likely to have a dashboard doing the same important > queries over and over again). > - LOCAL_QUORUM is too expensive (need >3 replicas), QUORUM is too slow. 
> I'll try to come up with a patch demonstrating how this would work, try it on > our system and report the effects. > cc: [~adejanovski], [~rgerard] as I know you worked on similar issues already. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
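The per-table workaround described in the issue can be applied with a compaction-options change; a minimal sketch (keyspace/table names are hypothetical, and the window and threshold values should be tuned for the workload):

```sql
-- Hypothetical table. unchecked_tombstone_compaction lets single-sstable
-- tombstone compactions purge the "blocker" sstables; a low
-- tombstone_threshold triggers those compactions at a small garbage ratio.
ALTER TABLE metrics.samples WITH compaction = {
  'class': 'TimeWindowCompactionStrategy',
  'compaction_window_unit': 'DAYS',
  'compaction_window_size': '1',
  'unchecked_tombstone_compaction': 'true',
  'tombstone_threshold': '0.05'
};
```

As the description notes, this works but burns CPU on compactions whose only purpose is to unblock expiration, which is what motivates the overlap-ignoring option proposed here.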
[jira] [Commented] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16150446#comment-16150446 ] mck commented on CASSANDRA-13418: - Updated: || branch || testall || dtest || | [cassandra-3.11_13418|https://github.com/thelastpickle/cassandra/tree/mck/cassandra-3.11_13418] | [testall|https://circleci.com/gh/thelastpickle/cassandra/tree/mck%2Fcassandra-3.11_13418] | [dtest|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/265] | | [trunk_13418|https://github.com/thelastpickle/cassandra/tree/mck/trunk_13418] | [testall|https://circleci.com/gh/thelastpickle/cassandra/tree/mck%2Ftrunk_13418] | [dtest|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/265] |
[jira] [Comment Edited] (CASSANDRA-11500) Obsolete MV entry may not be properly deleted
[ https://issues.apache.org/jira/browse/CASSANDRA-11500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16150303#comment-16150303 ] ZhaoYang edited comment on CASSANDRA-11500 at 9/1/17 12:27 PM: --- [~pauloricardomg] thanks for the feedback (y) bq. I wasn't very comfortable with our previous approach of enforcing strict liveness during row merge, since it changes a lot of low-level structures/interfaces (like BTreeRow/MergeListener, etc) to enforce a table-level setting. Since we'll probably get rid of this when doing a proper implementation of virtual cells, I updated on this commit to perform the filtering during read instead which will give us the same result but with less change in unrelated code. Do you see any problem with this approach? As we discussed offline, we need to make sure the raw data, including tombstones and expired liveness, is shipped to the coordinator side. Enforcing strict liveness in {{ReadCommand.executeLocally()}} would remove the row before the digest or data response. Instead, we add {{enforceStrictLiveness}} to {{Row.purge}} to get the same result with fewer interface changes to {{Row}}. bq. One problem of replacing shadowable tombstones by expired liveness info is that it stores an additional unused ttl field for every shadowed view entry to solve the commutative view deletion problem. In order to avoid this I updated the patch to only use expired ttl when a shadowable tombstone would not work along with an explanation on why that is used since it's a hack Shadowable tombstones will be deprecated; an expired livenessInfo is used only when the deletion time is greater than the merged-row deletion, to avoid unnecessary expired livenessInfo. bq. in TableViews.java, the DeletionTracker should be applied even if existing has no data, eg. partition-deletion It's tested by "testRangeDeletionWithFlush()" in ViewTest. Without partition deletion info from the deletion tracker, the existing row is given as empty and it will resurrect deleted cells. bq.
In order to prevent against this, I added a note to the Upgrading section of NEWS.txt explaining about this caveat and that running repair before the upgrade should be sufficient to avoid it. (y) || source || unit tests || failing dtests || | [trunk|https://github.com/jasonstack/cassandra/commits/trunk-11500-squashed] | https://circleci.com/gh/jasonstack/cassandra/551 | secondary_indexes_test.TestPreJoinCallback.resumt_test | | [3.11|https://github.com/jasonstack/cassandra/commits/CASSANDRA-11500-strict-3.11] | https://circleci.com/gh/jasonstack/cassandra/557 | counter_tests.TestCounters.test_13691 | | [3.0|https://github.com/jasonstack/cassandra/commits/CASSANDRA-11500-strict-3.0] | https://circleci.com/gh/jasonstack/cassandra/556 | counter_tests.TestCounters.test_13691 authe_test.TestAuth.sysmtem_auth_ks_is_alterable_test | | [dtest|https://github.com/riptano/cassandra-dtest/commits/11500-poc] | Those failed dtests are not related. {code} Changes: 1. Use expired livenessInfo if the computed deletion time is greater than the merged-row deletion. There are only 2 cases: a. a non-pk base column used in the view pk is removed by partial update or partial delete b. an unselected base column is removed by partial update or partial delete The current shadowable tombstone is not used, to avoid the issue of resurrecting deleted cells. We will use expired livenessInfo and the merged base-row deletion instead. 2. It's strict liveness iff there is a non-key base column in the view pk. The existence of the view row is solely based on this non-key base column. 3. If there is no non-pk base column in the view pk, the view's liveness/deletion uses the max of the base livenessInfo + unselected columns. An unselected column's ttl is used only when it affects view-row liveness. Selected columns won't contribute to livenessInfo or row deletion. * this wouldn't support complex cases as explained above, eg. c/d unselected, update c@10, delete c@11, update d@5: the view row should be alive but is dead 4.
in TableViews.java, the DeletionTracker should be applied even if existing has no data, eg. partition-deletion 5. When generating the read command to read existing base data, we need to query all base columns instead of the view's queried columns (when base and view have the same key columns) in order to read the unselected columns. {code}
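The strict-liveness rule in points 2–3 above can be condensed into a toy predicate. This is a hedged sketch with boolean stand-ins, not Cassandra's actual {{Row}}/{{LivenessInfo}} model: when the view primary key includes a non-key base column, the view row's existence depends solely on that column's liveness.

```java
// Hypothetical illustration of "strict liveness": if the view PK contains a
// non-key base column, only that column's liveness decides whether the view
// row exists; otherwise normal liveness (any live cell) applies.
class ViewRowLiveness {
    static boolean isLive(boolean hasNonKeyBaseColumnInViewPk,
                          boolean pkSourceColumnLive,
                          boolean anyOtherCellLive) {
        if (hasNonKeyBaseColumnInViewPk)
            return pkSourceColumnLive; // strict: only the PK source column counts
        return pkSourceColumnLive || anyOtherCellLive; // normal liveness
    }
}
```

Under strict liveness a view row with live "other" cells but a dead PK source column is still considered dead, which is exactly the behaviour the filtering in {{Row.purge}} is meant to enforce.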
[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16150409#comment-16150409 ] mck edited comment on CASSANDRA-13418 at 9/1/17 12:15 PM: -- [~rgerard], I failed to see your last comment until now. I've addressed [~krummas]'s concerns [here|https://github.com/thelastpickle/cassandra/commit/58440e707cd6490847a37dc8d76c150d3eb27aab], but feel terrible now for stepping on your toes. A few code style issues beyond the braces have been fixed. Thanks for the push back Marcus! For example, I changed the names of the constants in {{TimeWindowCompactionStrategyOptions}} to be more in line with the previous constants there. Two additions to the tests in {{TimeWindowCompactionStrategyTest}} have been made: one for {{TimeWindowCompactionStrategyOptions.validateOptions}}, which is only there for the tests, and a new test method which does what Marcus asks for. ([~krummas], do you still want a dtest?) was (Author: michaelsembwever): [~rgerard], i failed to see your last comment til now. I've addressed [~krummas]'s concerns [here|https://github.com/thelastpickle/cassandra/commit/17b1d30ac8f07c49bfc4d51b14d3201cc969fcfe], but feel terrible now for stepping on your toes. A few code style issues beyond the braces have been fixed. Thanks for the push back Marcus! For example, I change the names of the constants in {{TimeWindowCompactionStrategyOptions}} to be more in align with the previous constants there. Two additions to the tests in {{TimeWindowCompactionStrategyTest}} are added. One for the {{TimeWindowCompactionStrategyOptions.validateOptions}} which is only there for the tests, and a new test method which does what Marcus asks for. ([~krummas], do you still want a dtest?)
[jira] [Commented] (CASSANDRA-13754) FastThreadLocal leaks memory
[ https://issues.apache.org/jira/browse/CASSANDRA-13754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16150416#comment-16150416 ] Markus Dlugi commented on CASSANDRA-13754: -- Your latest patch, which resets the entire {{BTree$Builder.values}} array, seemed to do the trick; the entire load test is now running smoothly. No more crazy GCing and, most importantly, no {{OutOfMemoryError}}s. Thanks a lot for the fast support and help! > FastThreadLocal leaks memory > > > Key: CASSANDRA-13754 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13754 > Project: Cassandra > Issue Type: Bug > Components: Core > Environment: Cassandra 3.11.0, Netty 4.0.44.Final, OpenJDK 8u141-b15 >Reporter: Eric Evans >Assignee: Robert Stupp > Fix For: 3.11.1 > > > After a chronic bout of {{OutOfMemoryError}} in our development environment, > a heap analysis is showing that more than 10G of our 12G heaps are consumed > by the {{threadLocals}} members (instances of {{java.lang.ThreadLocalMap}}) > of various {{io.netty.util.concurrent.FastThreadLocalThread}} instances. > Reverting > [cecbe17|https://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=commit;h=cecbe17e3eafc052acc13950494f7dddf026aa54] > fixes the issue.
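The fix Markus confirms can be illustrated with a stripped-down builder. This is a hypothetical class, not Cassandra's real {{BTree$Builder}}: the point is that a builder recycled through a (thread-local) cache must clear its entire backing array, otherwise every object it ever held stays strongly reachable and the heap fills with garbage that never becomes collectible.

```java
import java.util.Arrays;

// Hypothetical recycled builder illustrating the leak pattern described above.
class RecyclingBuilder {
    private Object[] values = new Object[16];
    private int count = 0;

    void add(Object v) {
        if (count == values.length)
            values = Arrays.copyOf(values, count * 2);
        values[count++] = v;
    }

    Object[] build() {
        return Arrays.copyOf(values, count);
    }

    // The fix: null out the *entire* array on recycle, not just reset the
    // count, so no stale references survive between uses of the builder.
    RecyclingBuilder recycle() {
        Arrays.fill(values, null);
        count = 0;
        return this;
    }
}
```

Resetting only `count` would make the builder look empty while the old references remained pinned in `values`; with a `FastThreadLocal`-cached builder per thread, that retained set grows with the largest build ever performed on each thread.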
[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16150409#comment-16150409 ] mck edited comment on CASSANDRA-13418 at 9/1/17 12:06 PM: -- [~rgerard], i failed to see your last comment til now. I've addressed [~krummas]'s concerns [here|https://github.com/thelastpickle/cassandra/commit/17b1d30ac8f07c49bfc4d51b14d3201cc969fcfe], but feel terrible now for stepping on your toes. A few code style issues beyond the braces have been fixed. Thanks for the push back Marcus! For example, I change the names of the constants in {{TimeWindowCompactionStrategyOptions}} to be more in align with the previous constants there. Two additions to the tests in {{TimeWindowCompactionStrategyTest}} are added. One for the {{TimeWindowCompactionStrategyOptions.validateOptions}} which is only there for the tests, and a new test method which does what Marcus asks for. ([~krummas], do you still want a dtest?) was (Author: michaelsembwever): [~rgerard], i failed to see your last comment til now. I've addressed [~krummas]'s concerns [here|https://github.com/thelastpickle/cassandra/commit/17b1d30ac8f07c49bfc4d51b14d3201cc969fcfe], but feel terrible now for stepping on your toes. A few code style issues beyond the braces have been fixed. Thanks for the push back Marcus! For example, I change the names of the constants in {{TimeWindowCompactionStrategyOptions}} to be more in align with the previous constants there. Two additions to the tests in {{TimeWindowCompactionStrategyTest}} are added. One for the {{TimeWindowCompactionStrategyOptions.validateOptions}} which is only there for the tests, and a new test method which does what Marcus asks for. ([~krummas], do you still want a dtest still warranted?) 
[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16150409#comment-16150409 ] mck edited comment on CASSANDRA-13418 at 9/1/17 12:05 PM: -- [~rgerard], i failed to see your last comment til now. I've addressed [~krummas]'s concerns [here|https://github.com/thelastpickle/cassandra/commit/17b1d30ac8f07c49bfc4d51b14d3201cc969fcfe], but feel terrible now for stepping on your toes. A few code style issues beyond the braces have been fixed. Thanks for the push back Marcus! For example, I change the names of the constants in {{TimeWindowCompactionStrategyOptions}} to be more in align with the previous constants there. Two additions to the tests in {{TimeWindowCompactionStrategyTest}} are added. One for the {{TimeWindowCompactionStrategyOptions.validateOptions}} which is only there for the tests, and a new test method which does what Marcus asks for. ([~krummas], do you still want a dtest still warranted?) was (Author: michaelsembwever): [~rgerard], i failed to see your last comment til now. I've addressed [~krummas]'s concerns [here|https://github.com/thelastpickle/cassandra/commit/17b1d30ac8f07c49bfc4d51b14d3201cc969fcfe], but feel terrible now for stepping on your toes. A few code style issues beyond the braces have been fixed. Thanks for the push back Marcus! For example, I change the names of the constants in {{TimeWindowCompactionStrategyOptions}} to be more in align with the previous constants there. Two additions to the tests in {{TimeWindowCompactionStrategyTest}} are added. One for the {{TimeWindowCompactionStrategyTest.validateOptions}} which is only there for the tests, and a new test method which does what Marcus asks for. ([~krummas], do you still want a dtest still warranted?) 
[jira] [Commented] (CASSANDRA-13339) java.nio.BufferOverflowException: null
[ https://issues.apache.org/jira/browse/CASSANDRA-13339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16150412#comment-16150412 ] Jason Brown commented on CASSANDRA-13339: - [~theochu] Thanks for the additional data points. Are you using counters? Can you copy the trace trace, as well, into this ticket? > java.nio.BufferOverflowException: null > -- > > Key: CASSANDRA-13339 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13339 > Project: Cassandra > Issue Type: Bug > Components: Core >Reporter: Chris Richards > > I'm seeing the following exception running Cassandra 3.9 (with Netty updated > to 4.1.8.Final) running on a 2 node cluster. It would have been processing > around 50 queries/second at the time (mixture of > inserts/updates/selects/deletes) : there's a collection of tables (some with > counters some without) and a single materialized view. > {code} > ERROR [MutationStage-4] 2017-03-15 22:50:33,052 StorageProxy.java:1353 - > Failed to apply mutation locally : {} > java.nio.BufferOverflowException: null > at > org.apache.cassandra.io.util.DataOutputBufferFixed.doFlush(DataOutputBufferFixed.java:52) > ~[apache-cassandra-3.9.jar:3.9] > at > org.apache.cassandra.io.util.BufferedDataOutputStreamPlus.write(BufferedDataOutputStreamPlus.java:132) > ~[apache-cassandra-3.9.jar:3.9] > at > org.apache.cassandra.io.util.BufferedDataOutputStreamPlus.writeUnsignedVInt(BufferedDataOutputStreamPlus.java:262) > ~[apache-cassandra-3.9.jar:3.9] > at > org.apache.cassandra.db.rows.EncodingStats$Serializer.serialize(EncodingStats.java:233) > ~[apache-cassandra-3.9.jar:3.9] > at > org.apache.cassandra.db.SerializationHeader$Serializer.serializeForMessaging(SerializationHeader.java:380) > ~[apache-cassandra-3.9.jar:3.9] > at > org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:122) > ~[apache-cassandra-3.9.jar:3.9] > at > 
org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:89) > ~[apache-cassandra-3.9.jar:3.9] > at > org.apache.cassandra.db.partitions.PartitionUpdate$PartitionUpdateSerializer.serialize(PartitionUpdate.java:790) > ~[apache-cassandra-3.9.jar:3.9] > at > org.apache.cassandra.db.Mutation$MutationSerializer.serialize(Mutation.java:393) > ~[apache-cassandra-3.9.jar:3.9] > at org.apache.cassandra.db.commitlog.CommitLog.add(CommitLog.java:279) > ~[apache-cassandra-3.9.jar:3.9] > at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:493) > ~[apache-cassandra-3.9.jar:3.9] > at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:396) > ~[apache-cassandra-3.9.jar:3.9] > at org.apache.cassandra.db.Mutation.applyFuture(Mutation.java:215) > ~[apache-cassandra-3.9.jar:3.9] > at org.apache.cassandra.db.Mutation.apply(Mutation.java:227) > ~[apache-cassandra-3.9.jar:3.9] > at org.apache.cassandra.db.Mutation.apply(Mutation.java:241) > ~[apache-cassandra-3.9.jar:3.9] > at > org.apache.cassandra.service.StorageProxy$8.runMayThrow(StorageProxy.java:1347) > ~[apache-cassandra-3.9.jar:3.9] > at > org.apache.cassandra.service.StorageProxy$LocalMutationRunnable.run(StorageProxy.java:2539) > [apache-cassandra-3.9.jar:3.9] > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > [na:1.8.0_121] > at > org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:164) > [apache-cassandra-3.9.jar:3.9] > at > org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:136) > [apache-cassandra-3.9.jar:3.9] > at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:109) > [apache-cassandra-3.9.jar:3.9] > at java.lang.Thread.run(Thread.java:745) [na:1.8.0_121] > {code} > and then again shortly afterwards > {code} > ERROR [MutationStage-3] 2017-03-15 23:27:36,198 StorageProxy.java:1353 
- > Failed to apply mutation locally : {} > java.nio.BufferOverflowException: null > at > org.apache.cassandra.io.util.DataOutputBufferFixed.doFlush(DataOutputBufferFixed.java:52) > ~[apache-cassandra-3.9.jar:3.9] > at > org.apache.cassandra.io.util.BufferedDataOutputStreamPlus.write(BufferedDataOutputStreamPlus.java:132) > ~[apache-cassandra-3.9.jar:3.9] > at > org.apache.cassandra.io.util.BufferedDataOutputStreamPlus.writeUnsignedVInt(BufferedDataOutputStreamPlus.java:262) > ~[apache-cassandra-3.9.jar:3.9] > at > org.apache.cassandra.db.rows.EncodingStats$Serializer.serialize(EncodingStats.java:233) > ~[apache-cassandra-3.9.jar:3.9] > at >
[jira] [Comment Edited] (CASSANDRA-13692) CompactionAwareWriter_getWriteDirectory throws incompatible exceptions
[ https://issues.apache.org/jira/browse/CASSANDRA-13692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16150355#comment-16150355 ] Dimitar Dimitrov edited comment on CASSANDRA-13692 at 9/1/17 12:03 PM: --- Some additional observations, after taking yet another look at the test results: * Although very similar, the 3.11 {{testall}} failures are not exactly the same as the ones in the baseline. * The trunk {{dtest}} failures seem to diverge from the pattern of "common-expected-to-be-unrelated failures plus test_13747 failures". I'll try to see whether this can be attributed to flakiness (looking closer at the results, re-running the CI run on the same branch, running another CI run on a clean branch copy of the trunk, etc.) was (Author: dimitarndimitrov): Some additional observations, after taking yet another look at the test results: * Although very similar, the 3.11 {{testall}} failures are not exactly the same as the ones in the baseline. * The trunk {{dtest}} failures seem to diverge from the pattern of "common-expected-to-be-unrelated failures plus test_13747 failures". I'll try to see whether this can be attributed to flakiness (looking closer to the results, re-running the CI run on the same branch, running another CI run on a clean branch copy of the trunk, etc.) 
> CompactionAwareWriter_getWriteDirectory throws incompatible exceptions > -- > > Key: CASSANDRA-13692 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13692 > Project: Cassandra > Issue Type: Bug > Components: Compaction >Reporter: Hao Zhong >Assignee: Dimitar Dimitrov > Labels: lhf > Attachments: c13692-2.2-dtest-results.PNG, > c13692-2.2-testall-results.PNG, c13692-3.0-dtest-results.PNG, > c13692-3.0-testall-results.PNG, c13692-3.11-dtest-results.PNG, > c13692-3.11-testall-results.PNG, c13692-dtest-results.PNG, > c13692-testall-results.PNG > > > The CompactionAwareWriter_getWriteDirectory throws RuntimeException: > {code} > public Directories.DataDirectory getWriteDirectory(Iterable > sstables, long estimatedWriteSize) > { > File directory = null; > for (SSTableReader sstable : sstables) > { > if (directory == null) > directory = sstable.descriptor.directory; > if (!directory.equals(sstable.descriptor.directory)) > { > logger.trace("All sstables not from the same disk - putting > results in {}", directory); > break; > } > } > Directories.DataDirectory d = > getDirectories().getDataDirectoryForFile(directory); > if (d != null) > { > long availableSpace = d.getAvailableSpace(); > if (availableSpace < estimatedWriteSize) > throw new RuntimeException(String.format("Not enough space to > write %s to %s (%s available)", > > FBUtilities.prettyPrintMemory(estimatedWriteSize), > d.location, > > FBUtilities.prettyPrintMemory(availableSpace))); > logger.trace("putting compaction results in {}", directory); > return d; > } > d = getDirectories().getWriteableLocation(estimatedWriteSize); > if (d == null) > throw new RuntimeException(String.format("Not enough disk space > to store %s", > > FBUtilities.prettyPrintMemory(estimatedWriteSize))); > return d; > } > {code} > However, the thrown exception does not trigger the failure policy. > CASSANDRA-11448 fixed a similar problem. 
The buggy code is: > {code} > protected Directories.DataDirectory getWriteDirectory(long writeSize) > { > Directories.DataDirectory directory = > getDirectories().getWriteableLocation(writeSize); > if (directory == null) > throw new RuntimeException("Insufficient disk space to write " + > writeSize + " bytes"); > return directory; > } > {code} > The fixed code is: > {code} > protected Directories.DataDirectory getWriteDirectory(long writeSize) > { > Directories.DataDirectory directory = > getDirectories().getWriteableLocation(writeSize); > if (directory == null) > throw new FSWriteError(new IOException("Insufficient disk space > to write " + writeSize + " bytes"), ""); > return directory; > } > {code} > The fixed code throws FSWE and triggers the failure policy.
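Why the exception type matters can be sketched with stand-in classes. These are hypothetical simplifications, not Cassandra's real {{FSWriteError}} or failure-policy code: the error handler dispatches on the exception's type, so a plain {{RuntimeException}} never reaches the disk failure policy, while a filesystem error type does.

```java
// Hypothetical stand-in for a filesystem write error carrying its cause.
class FSWriteError extends RuntimeException {
    FSWriteError(Throwable cause, String path) {
        super(path, cause);
    }
}

// Hypothetical dispatcher mirroring the behaviour described in the ticket:
// only filesystem error types trigger the configured disk failure policy
// (stop, best_effort, ...); anything else is treated as an ordinary failure.
class FailurePolicy {
    static String handle(Throwable t) {
        return (t instanceof FSWriteError) ? "disk_failure_policy" : "unhandled";
    }
}
```

This is why the CASSANDRA-11448 fix swapped the `RuntimeException` for an `FSWriteError`, and why the remaining `RuntimeException` throws in `getWriteDirectory` are inconsistent with it.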
[jira] [Commented] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16150409#comment-16150409 ] mck commented on CASSANDRA-13418: - [~rgerard], i failed to see your last comment til now. I've addressed [~krummas]'s concerns [here|https://github.com/thelastpickle/cassandra/commit/17b1d30ac8f07c49bfc4d51b14d3201cc969fcfe], but feel terrible now for stepping on your toes. A few code style issues beyond the braces have been fixed. Thanks for the push back Marcus! For example, I change the names of the constants in {{TimeWindowCompactionStrategyOptions}} to be more in align with the previous constants there. Two additions to the tests in {{TimeWindowCompactionStrategyTest}} are added. One for the {{TimeWindowCompactionStrategyTest.validateOptions}} which is only there for the tests, and a new test method which does what Marcus asks for. ([~krummas], do you still want a dtest still warranted?) > Allow TWCS to ignore overlaps when dropping fully expired sstables > -- > > Key: CASSANDRA-13418 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13418 > Project: Cassandra > Issue Type: Improvement > Components: Compaction >Reporter: Corentin Chary >Assignee: Romain GERARD > Labels: twcs > Attachments: twcs-cleanup.png > > > http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If > you really want read-repairs you're going to have sstables blocking the > expiration of other fully expired SSTables because they overlap. > You can set unchecked_tombstone_compaction = true or tombstone_threshold to a > very low value and that will purge the blockers of old data that should > already have expired, thus removing the overlaps and allowing the other > SSTables to expire. > The thing is that this is rather CPU intensive and not optimal. 
If you have > time series, you might not care if all your data doesn't exactly expire at > the right time, or if data re-appears for some time, as long as it gets > deleted as soon as it can. And in this situation I believe it would be really > beneficial to allow users to simply ignore overlapping SSTables when looking > for fully expired ones. > To the question: why would you need read-repairs ? > - Full repairs basically take longer than the TTL of the data on my dataset, > so this isn't really effective. > - Even with a 10% chances of doing a repair, we found out that this would be > enough to greatly reduce entropy of the most used data (and if you have > timeseries, you're likely to have a dashboard doing the same important > queries over and over again). > - LOCAL_QUORUM is too expensive (need >3 replicas), QUORUM is too slow. > I'll try to come up with a patch demonstrating how this would work, try it on > our system and report the effects. > cc: [~adejanovski], [~rgerard] as I know you worked on similar issues already. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
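The overlap check that blocks expiration, and the opt-out this ticket proposes, can be sketched as follows. This is an illustrative model only: the class shape, the field names ({{maxDeletionTime}}, {{minTimestamp}}) and the {{ignoreOverlaps}} flag are assumptions made for the sketch, not Cassandra's actual compaction API.

```java
import java.util.*;

// Illustrative model of TWCS expired-sstable dropping; names and the
// exact overlap condition are assumptions, not Cassandra's implementation.
public class ExpiredSSTables {
    public static class SSTable {
        public final long maxDeletionTime; // newest local deletion time in the sstable
        public final long minTimestamp;    // oldest write timestamp in the sstable
        public SSTable(long maxDeletionTime, long minTimestamp) {
            this.maxDeletionTime = maxDeletionTime;
            this.minTimestamp = minTimestamp;
        }
    }

    // An sstable is "fully expired" when all its data is past TTL
    // (maxDeletionTime < now). Normally it may still be kept alive by an
    // overlapping sstable holding older data; ignoreOverlaps skips that check.
    public static List<SSTable> fullyExpired(List<SSTable> candidates,
                                             List<SSTable> overlapping,
                                             long now,
                                             boolean ignoreOverlaps) {
        long minTimestampOfOverlaps = overlapping.stream()
                .mapToLong(s -> s.minTimestamp).min().orElse(Long.MAX_VALUE);
        List<SSTable> expired = new ArrayList<>();
        for (SSTable s : candidates) {
            boolean pastTtl = s.maxDeletionTime < now;
            // Blocked when an overlapping sstable contains data older than
            // what this sstable's tombstones might shadow.
            boolean blocked = !ignoreOverlaps && s.maxDeletionTime >= minTimestampOfOverlaps;
            if (pastTtl && !blocked)
                expired.add(s);
        }
        return expired;
    }

    public static void main(String[] args) {
        SSTable old = new SSTable(100, 10);     // fully expired at now=200
        SSTable blocker = new SSTable(900, 50); // overlaps and holds older data
        System.out.println(fullyExpired(List.of(old), List.of(blocker), 200, false).size()); // 0
        System.out.println(fullyExpired(List.of(old), List.of(blocker), 200, true).size());  // 1
    }
}
```

Under this model the blocker keeps the expired sstable on disk indefinitely, which is exactly the behavior the read-repair scenario in the description produces.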
[jira] [Commented] (CASSANDRA-13692) CompactionAwareWriter_getWriteDirectory throws incompatible exceptions
[ https://issues.apache.org/jira/browse/CASSANDRA-13692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16150387#comment-16150387 ] Aleksey Yeschenko commented on CASSANDRA-13692: --- [~dimitarndimitrov] I've noticed and I know who you are. Welcome to the community (: FWIW I've run the dtest locally a couple dozen times, and it's passing just fine. > CompactionAwareWriter_getWriteDirectory throws incompatible exceptions > -- > > Key: CASSANDRA-13692 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13692 > Project: Cassandra > Issue Type: Bug > Components: Compaction >Reporter: Hao Zhong >Assignee: Dimitar Dimitrov > Labels: lhf > Attachments: c13692-2.2-dtest-results.PNG, > c13692-2.2-testall-results.PNG, c13692-3.0-dtest-results.PNG, > c13692-3.0-testall-results.PNG, c13692-3.11-dtest-results.PNG, > c13692-3.11-testall-results.PNG, c13692-dtest-results.PNG, > c13692-testall-results.PNG > > > The CompactionAwareWriter_getWriteDirectory throws RuntimeException: > {code} > public Directories.DataDirectory getWriteDirectory(Iterable > sstables, long estimatedWriteSize) > { > File directory = null; > for (SSTableReader sstable : sstables) > { > if (directory == null) > directory = sstable.descriptor.directory; > if (!directory.equals(sstable.descriptor.directory)) > { > logger.trace("All sstables not from the same disk - putting > results in {}", directory); > break; > } > } > Directories.DataDirectory d = > getDirectories().getDataDirectoryForFile(directory); > if (d != null) > { > long availableSpace = d.getAvailableSpace(); > if (availableSpace < estimatedWriteSize) > throw new RuntimeException(String.format("Not enough space to > write %s to %s (%s available)", > > FBUtilities.prettyPrintMemory(estimatedWriteSize), > d.location, > > FBUtilities.prettyPrintMemory(availableSpace))); > logger.trace("putting compaction results in {}", directory); > return d; > } > d = getDirectories().getWriteableLocation(estimatedWriteSize); > if (d == 
null) > throw new RuntimeException(String.format("Not enough disk space > to store %s", > > FBUtilities.prettyPrintMemory(estimatedWriteSize))); > return d; > } > {code} > However, the thrown exception does not trigger the failure policy. > CASSANDRA-11448 fixed a similar problem. The buggy code is: > {code} > protected Directories.DataDirectory getWriteDirectory(long writeSize) > { > Directories.DataDirectory directory = > getDirectories().getWriteableLocation(writeSize); > if (directory == null) > throw new RuntimeException("Insufficient disk space to write " + > writeSize + " bytes"); > return directory; > } > {code} > The fixed code is: > {code} > protected Directories.DataDirectory getWriteDirectory(long writeSize) > { > Directories.DataDirectory directory = > getDirectories().getWriteableLocation(writeSize); > if (directory == null) > throw new FSWriteError(new IOException("Insufficient disk space > to write " + writeSize + " bytes"), ""); > return directory; > } > {code} > The fixed code throws FSWE and triggers the failure policy. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
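Why the exception type matters can be shown with a minimal sketch of the dispatch: an error handler that reacts only to file-system errors will ignore a plain {{RuntimeException}}, so the disk failure policy never fires. The class and method names below mirror the ones discussed ({{FSWriteError}}, an inspector hook) but are stand-ins, not Cassandra's actual implementation.

```java
import java.io.IOException;

// Minimal recreation of the pattern above: only FSWriteError (a file-system
// error) engages the failure policy; a bare RuntimeException slips past it.
public class FailurePolicyDemo {
    public static class FSWriteError extends RuntimeException {
        public FSWriteError(Throwable cause, String path) { super(path, cause); }
    }

    public static boolean failurePolicyTriggered = false;

    // Stand-in for the handler that inspects thrown exceptions.
    public static void inspectThrowable(Throwable t) {
        if (t instanceof FSWriteError)
            failurePolicyTriggered = true; // e.g. stop transports, or die
    }

    public static void getWriteDirectory(long writeSize, boolean haveSpace) {
        if (!haveSpace)
            throw new FSWriteError(
                new IOException("Insufficient disk space to write " + writeSize + " bytes"), "");
    }

    public static void main(String[] args) {
        try {
            getWriteDirectory(1 << 20, false);
        } catch (RuntimeException e) {
            inspectThrowable(e);
        }
        System.out.println(failurePolicyTriggered); // true
    }
}
```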
[jira] [Commented] (CASSANDRA-13532) sstabledump reports incorrect usage for argument order
[ https://issues.apache.org/jira/browse/CASSANDRA-13532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16150381#comment-16150381 ] Michael Sear commented on CASSANDRA-13532: -- I just came across this bug myself. I think, though, that it would be preferable to fix the parser so it consumes a single value per argument. e.g. : {code:java} sstabledump -k mykey1 -k mykey2 mysstable {code} Don't you think? I'd have thought this would be more consistent with the way arguments are normally used on the command line. > sstabledump reports incorrect usage for argument order > -- > > Key: CASSANDRA-13532 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13532 > Project: Cassandra > Issue Type: Bug > Components: Tools >Reporter: Ian Ilsley >Assignee: Varun Barala >Priority: Minor > Labels: lhf > Fix For: 3.0.15, 3.11.1, 4.0 > > Attachments: sstabledump#printUsage.patch > > > sstabledump usage reports > {{usage: sstabledump }} > However the actual usage is > {{sstabledump }} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
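The one-value-per-flag style suggested in the comment above can be parsed with a simple loop. This is a hand-rolled sketch, not sstabledump's actual parser; the {{parseKeys}} helper is hypothetical.

```java
import java.util.*;

// Sketch of "-k <key>" parsing where each -k consumes exactly one value,
// leaving the trailing positional argument for the sstable path.
public class ArgSketch {
    public static List<String> parseKeys(String[] args) {
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < args.length; i++) {
            if ("-k".equals(args[i])) {
                if (i + 1 >= args.length)
                    throw new IllegalArgumentException("-k requires a value");
                keys.add(args[++i]); // one value per -k occurrence
            }
        }
        return keys;
    }

    public static void main(String[] args) {
        // e.g. sstabledump -k mykey1 -k mykey2 mysstable
        System.out.println(parseKeys(new String[]{"-k", "mykey1", "-k", "mykey2", "mysstable"}));
        // prints [mykey1, mykey2]
    }
}
```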
[jira] [Commented] (CASSANDRA-13692) CompactionAwareWriter_getWriteDirectory throws incompatible exceptions
[ https://issues.apache.org/jira/browse/CASSANDRA-13692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16150351#comment-16150351 ] Dimitar Dimitrov commented on CASSANDRA-13692: -- Ah, sorry, I should have at least attached the failure logs - I didn't attach the build artifacts, as I wasn't sure whether they were sanitized with regard to non-public data. I'll sync with a more knowledgeable colleague and get back to you with the necessary info. P.S. As you've probably noticed, I'm still new to one of the more visible projects here, and many of the steps in the process are a bit hazy to me - I'll make sure to improve quickly on that though :)
[jira] [Commented] (CASSANDRA-13692) CompactionAwareWriter_getWriteDirectory throws incompatible exceptions
[ https://issues.apache.org/jira/browse/CASSANDRA-13692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16150355#comment-16150355 ] Dimitar Dimitrov commented on CASSANDRA-13692: -- Some additional observations, after taking yet another look at the test results: * Although very similar, the 3.11 {{testall}} failures are not exactly the same as the ones in the baseline. * The trunk {{dtest}} failures seem to diverge from the pattern of "common-expected-to-be-unrelated failures plus test_13747 failures". I'll try to see whether this can be attributed to flakiness (looking closer at the results, re-running the CI run on the same branch, running another CI run on a clean branch copy of trunk, etc.)
[jira] [Commented] (CASSANDRA-13754) FastThreadLocal leaks memory
[ https://issues.apache.org/jira/browse/CASSANDRA-13754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16150338#comment-16150338 ] Robert Stupp commented on CASSANDRA-13754: -- Your observation regarding {{BTree.Builder.values[]}} seems correct. However, {{SEPWorker}} must *not* remove the thread locals - it's the intention of these thread-locals to be kept for reuse. > FastThreadLocal leaks memory > > > Key: CASSANDRA-13754 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13754 > Project: Cassandra > Issue Type: Bug > Components: Core > Environment: Cassandra 3.11.0, Netty 4.0.44.Final, OpenJDK 8u141-b15 >Reporter: Eric Evans >Assignee: Robert Stupp > Fix For: 3.11.1 > > > After a chronic bout of {{OutOfMemoryError}} in our development environment, > a heap analysis is showing that more than 10G of our 12G heaps are consumed > by the {{threadLocals}} members (instances of {{java.lang.ThreadLocalMap}}) > of various {{io.netty.util.concurrent.FastThreadLocalThread}} instances. > Reverting > [cecbe17|https://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=commit;h=cecbe17e3eafc052acc13950494f7dddf026aa54] > fixes the issue. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-11500) Obsolete MV entry may not be properly deleted
[ https://issues.apache.org/jira/browse/CASSANDRA-11500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16150303#comment-16150303 ] ZhaoYang commented on CASSANDRA-11500: -- [~pauloricardomg] thanks for the feedback (y) bq. I wasn't very comfortable with our previous approach of enforcing strict liveness during row merge, since it changes a lot of low-level structures/interfaces (like BTreeRow/MergeListener, etc) to enforce a table-level setting. Since we'll probably get rid of this when doing a proper implementation of virtual cells, I updated on this commit to perform the filtering during read instead which will give us the same result but with less change in unrelated code. Do you see any problem with this approach? As we discussed offline, we need to make sure the raw data, including tombstones and expired liveness, is shipped to the coordinator side. Enforcing strict liveness in {{ReadCommand.executeLocally()}} would remove the row before the digest or data response. Instead, we add {{enforceStrictLiveness}} to {{Row.purge}} to get the same result with fewer interface changes to {{Row}}. bq. One problem of replacing shadowable tombstones by expired liveness info is that it stores an additional unused ttl field for every shadowed view entry to solve the commutative view deletion problem. In order to avoid this I updated the patch to only use expired ttl when a shadowable tombstone would not work along with an explanation on why that is used since it's a hack Shadowable tombstones will be deprecated; we use an expired livenessInfo only if the deletion time is greater than the merged-row deletion, to avoid unnecessary expired livenessInfo. bq. in TableViews.java, the DeletionTracker should be applied even if existing has no data, eg. partition-deletion It's tested by "testRangeDeletionWithFlush()" in ViewTest. Without partition deletion info from the deletion tracker, the existing row is given as empty and it would resurrect deleted cells. bq. 
In order to protect against this, I added a note to the Upgrading section of NEWS.txt explaining this caveat and that running repair before the upgrade should be sufficient to avoid it. (y)
|| source || unit || dtest ||
| [trunk|https://github.com/jasonstack/cassandra/commits/trunk-11500-squashed] | https://circleci.com/gh/jasonstack/cassandra/551 | x |
| [3.11|https://github.com/jasonstack/cassandra/commits/CASSANDRA-11500-strict-3.11] | https://circleci.com/gh/jasonstack/cassandra/557 | x |
| [3.0|https://github.com/jasonstack/cassandra/commits/CASSANDRA-11500-strict-3.0] | https://circleci.com/gh/jasonstack/cassandra/556 | x |
dtest branch: [dtest|https://github.com/riptano/cassandra-dtest/commits/11500-poc]
{code} Changes: 1. Use an expired livenessInfo if the computed deletion time is greater than the merged row deletion. There are only 2 cases: a. a non-PK base column used in the view PK is removed by a partial update or partial delete b. an unselected base column is removed by a partial update or partial delete The current shadowable tombstone is not used, to avoid the issue of resurrecting deleted cells. We will use an expired livenessInfo and the merged base row deletion instead. 2. It's strict liveness iff there is a non-key base column in the view PK. The existence of the view row is solely based on this non-key base column. 3. If there is no non-PK base column in the view PK, the view's liveness/deletion uses the max of the base livenessInfo + unselected columns. An unselected column's TTL is used only when it affects view row liveness. Selected columns won't contribute to livenessInfo or row deletion. * this wouldn't support complex cases as explained above, eg. c/d unselected, update c@10, delete c@11, update d@5: the view row should be alive but is dead 4. in TableViews.java, the DeletionTracker should be applied even if existing has no data, eg. partition-deletion 5. 
When generating the read command to read existing base data, we need to query all base columns instead of only the view's queried columns (when base and view have the same key columns), in order to read unselected columns. {code} > Obsolete MV entry may not be properly deleted > - > > Key: CASSANDRA-11500 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11500 > Project: Cassandra > Issue Type: Bug > Components: Materialized Views >Reporter: Sylvain Lebresne >Assignee: ZhaoYang > Fix For: 3.0.x, 3.11.x, 4.x > > > When a Materialized View uses a non-PK base table column in its PK, if an > update changes that column value, we add the new view entry and remove the > old one. When doing that removal, the current code uses the same timestamp > as for the liveness info of the new entry, which is the max timestamp for > any columns participating in the view PK. This is not correct for the > deletion as the old view entry could have
[jira] [Commented] (CASSANDRA-13692) CompactionAwareWriter_getWriteDirectory throws incompatible exceptions
[ https://issues.apache.org/jira/browse/CASSANDRA-13692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16150291#comment-16150291 ] Aleksey Yeschenko commented on CASSANDRA-13692: --- [~dimitarndimitrov] Maybe, maybe not. The screenshot is useless to me as I can't click on test details.
[jira] [Updated] (CASSANDRA-13810) Overload because of hint pressure + MVs
[ https://issues.apache.org/jira/browse/CASSANDRA-13810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paulo Motta updated CASSANDRA-13810: Labels: materializedviews (was: ) > Overload because of hint pressure + MVs > --- > > Key: CASSANDRA-13810 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13810 > Project: Cassandra > Issue Type: Bug > Components: Materialized Views >Reporter: Tom van der Woerdt > Labels: materializedviews > > Cluster setup: 3 DCs, 20 Cassandra nodes each, all 3.0.14, with approx. 200GB > data per machine. Many tables have MVs associated. > During some maintenance we did a rolling restart of all nodes in the cluster. > This caused a buildup of hints/batches, as expected. Most nodes came back > just fine, except for two nodes. > These two nodes came back with a loadavg of >100, and 'nodetool tpstats' > showed a million (not exaggerating) MutationStage tasks per second(!). It was > clear that these were mostly (all?) mutations coming from hints, as indicated > by thousands of log entries per second in debug.log : > {noformat} > DEBUG [SharedPool-Worker-107] 2017-08-27 13:16:51,098 HintVerbHandler.java:95 > - Failed to apply hint > java.util.concurrent.CompletionException: > org.apache.cassandra.exceptions.WriteTimeoutException: Operation timed out - > received only 0 responses. 
> at > java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292) > ~[na:1.8.0_144] > at > java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308) > ~[na:1.8.0_144] > at > java.util.concurrent.CompletableFuture.uniAccept(CompletableFuture.java:647) > ~[na:1.8.0_144] > at > java.util.concurrent.CompletableFuture$UniAccept.tryFire(CompletableFuture.java:632) > ~[na:1.8.0_144] > at > java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474) > ~[na:1.8.0_144] > at > java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977) > ~[na:1.8.0_144] > at org.apache.cassandra.db.Keyspace.applyInternal(Keyspace.java:481) > ~[apache-cassandra-3.0.14.jar:3.0.14] > at > org.apache.cassandra.db.Keyspace.lambda$applyInternal$0(Keyspace.java:495) > ~[apache-cassandra-3.0.14.jar:3.0.14] > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > ~[na:1.8.0_144] > at > org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:164) > ~[apache-cassandra-3.0.14.jar:3.0.14] > at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105) > ~[apache-cassandra-3.0.14.jar:3.0.14] > at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_144] > Caused by: org.apache.cassandra.exceptions.WriteTimeoutException: Operation > timed out - received only 0 responses. > ... 6 common frames omitted > {noformat} > After reading the relevant code, it seems that a hint is considered > droppable, and in the mutation path when the table contains a MV and the lock > fails to acquire and the mutation is droppable, it throws a WTE without > waiting until the timeout expires. This explains why Cassandra is able to > process a million mutations per second without actually considering them > 'dropped' in the 'nodetool tpstats' output. 
> I managed to recover the two nodes by stopping handoffs on all nodes in the > cluster and reenabling them one at a time. It's likely that the hint/batchlog > settings were sub-optimal on this cluster, but I think that the retry > behavior(?) of hints should be improved as it's hard to express hint > throughput in kb/s when the mutations can involve MVs. > More data available upon request -- I'm not sure which bits are relevant and > which aren't. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
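The fast-failure path described in the report can be sketched as follows. The method and flag names are illustrative stand-ins, not Cassandra's actual mutation code: the point is that a droppable (hinted) mutation hitting a contended MV partition lock "times out" immediately rather than waiting out the write timeout, which is why tpstats can show enormous MutationStage throughput with nothing counted as dropped.

```java
// Sketch of the behavior above: table has an MV, the MV partition lock
// cannot be acquired, and the mutation is droppable (e.g. hint replay) ->
// WriteTimeoutException is thrown without any real wait elapsing.
public class HintMvDemo {
    public static class WriteTimeoutException extends RuntimeException {}

    public static String applyMutation(boolean hasView, boolean lockAcquired, boolean droppable) {
        if (hasView && !lockAcquired) {
            if (droppable)
                throw new WriteTimeoutException(); // fails fast, no timeout wait
            return "blocked-until-lock";           // non-droppable writers wait instead
        }
        return "applied";
    }

    public static void main(String[] args) {
        try {
            applyMutation(true, false, true);
        } catch (WriteTimeoutException e) {
            // hint replay can spin through mutations like this at a very
            // high rate, producing the load pattern seen in the report
            System.out.println("fast WTE");
        }
    }
}
```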
[jira] [Updated] (CASSANDRA-13810) Overload because of hint pressure + MVs
[ https://issues.apache.org/jira/browse/CASSANDRA-13810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paulo Motta updated CASSANDRA-13810: Component/s: Materialized Views
[jira] [Updated] (CASSANDRA-13835) Thrift get_slice responds slower on Cassandra 3
[ https://issues.apache.org/jira/browse/CASSANDRA-13835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pawel Szlendak updated CASSANDRA-13835:
---
Description:

I have recently upgraded from Cassandra 1.2.18 to Cassandra 3.10 and was surprised to notice a performance degradation in my server application. I dug down through my application stack only to find that the cause was the slower response time of Cassandra 3.10 get_slice compared to Cassandra 1.2.18 (almost 3x slower on average). I am attaching a python script (attack.py) that can be used to reproduce this issue on a Windows platform. The script uses the pycassa python library, which can easily be installed using pip.

REPRODUCTION STEPS:
1. Install Cassandra 1.2.18 from https://archive.apache.org/dist/cassandra/1.2.18/apache-cassandra-1.2.18-bin.tar.gz
2. Run Cassandra 1.2.18 from a cmd console using cassandra.bat
3. Create a test keyspace and an empty CF using the attack.py script
{noformat} python attack.py create {noformat}
4. Run some get_slice queries against the empty CF and note down the average response time (in seconds)
{noformat} python attack.py {noformat}
get_slice count: 788
get_slice total response time: 0.3126376
*get_slice average response time: 0.000397208075838*
5. Stop Cassandra 1.2.18 and install Cassandra 3.10 from https://archive.apache.org/dist/cassandra/3.10/apache-cassandra-3.10-bin.tar.gz
6. Tweak cassandra.yaml to enable the thrift service (start_rpc=true) and run Cassandra from an elevated cmd console using cassandra.bat
7. Create a test keyspace and an empty CF using the attack.py script
{noformat} python attack.py create {noformat}
8. Run some get_slice queries against the empty CF using attack.py and note down the average response time (in seconds)
{noformat} python attack.py {noformat}
get_slice count: 788
get_slice total response time: 1.1646185
*get_slice average response time: 0.00147842634753*
9. Compare the average response times

EXPECTED: get_slice response time of Cassandra 3.10 is no worse than on Cassandra 1.2.18
ACTUAL: get_slice response time of Cassandra 3.10 is 3x worse than that of Cassandra 1.2.18

REMARKS:
- this seems to happen only on the Windows platform (tested on Windows 10 and Windows Server 2008 R2)
- running the very same procedure on Linux (Ubuntu) renders roughly the same response times
- I sniffed the traffic to/from Cassandra 1.2.18 and Cassandra 3.10, and it can be seen that Cassandra 3.10 responds slower (Wireshark dumps attached)
- when attacking the server with concurrent get_slice queries I see lower CPU usage for Cassandra 3.10 than for Cassandra 1.2.18

I am willing to work on this on my own if you give me some tips on where to look. I am also aware that this might be more Windows/Java related; nevertheless, any help from your side would be much appreciated.
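The measurement in the steps above boils down to a timing loop: issue many get_slice queries, then report count, total, and average latency. A hypothetical Java sketch of that loop (the attached attack.py uses pycassa against the Thrift interface; here the query is a stubbed stand-in, and SliceTimer is an illustrative name, not part of any attachment):

```java
// Hypothetical timing harness mirroring the benchmark described in the steps
// above. The query passed in is a stand-in for an actual get_slice call.
public class SliceTimer {
    // Returns { count, total seconds, average seconds per call }.
    static double[] time(Runnable query, int iterations) {
        long totalNanos = 0;
        for (int i = 0; i < iterations; i++) {
            long start = System.nanoTime();
            query.run();                          // stand-in for get_slice
            totalNanos += System.nanoTime() - start;
        }
        double totalSeconds = totalNanos / 1e9;
        return new double[] { iterations, totalSeconds, totalSeconds / iterations };
    }

    public static void main(String[] args) {
        double[] r = time(() -> { /* no-op stand-in for a get_slice call */ }, 788);
        System.out.printf("get_slice count: %.0f%n", r[0]);
        System.out.printf("get_slice total response time: %.7f%n", r[1]);
        System.out.printf("get_slice average response time: %.12f%n", r[2]);
    }
}
```

With a real client call substituted for the stub, comparing the averages printed against 1.2.18 and 3.10 reproduces the comparison in steps 4 and 8.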
[jira] [Comment Edited] (CASSANDRA-13692) CompactionAwareWriter_getWriteDirectory throws incompatible exceptions
[ https://issues.apache.org/jira/browse/CASSANDRA-13692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16150212#comment-16150212 ] Dimitar Dimitrov edited comment on CASSANDRA-13692 at 9/1/17 8:51 AM:
--
Okay, here are the branches with the proposed changes:
| [2.2|https://github.com/apache/cassandra/compare/cassandra-2.2...dimitarndimitrov:c13692-2.2] | [testall|^c13692-2.2-testall-results.PNG] | [dtest|^c13692-2.2-dtest-results.PNG] ([dtest-baseline|https://cassci.datastax.com/job/cassandra-2.2_dtest/lastCompletedBuild/testReport/]) |
| [3.0|https://github.com/apache/cassandra/compare/cassandra-3.0...dimitarndimitrov:c13692-3.0] | [testall|^c13692-3.0-testall-results.PNG] | [dtest|^c13692-3.0-dtest-results.PNG] ([dtest-baseline|https://cassci.datastax.com/job/cassandra-3.0_dtest/lastCompletedBuild/testReport/]) |
| [3.11|https://github.com/apache/cassandra/compare/cassandra-3.11...dimitarndimitrov:c13692-3.11] | [testall|^c13692-3.11-testall-results.PNG] ([testall-baseline|https://cassci.datastax.com/job/cassandra-3.11_testall/lastCompletedBuild/testReport/]) | [dtest|^c13692-3.11-dtest-results.PNG] ([dtest-baseline|https://cassci.datastax.com/job/cassandra-3.11_dtest/lastCompletedBuild/testReport/]) |
| [trunk|https://github.com/apache/cassandra/compare/trunk...dimitarndimitrov:c13692] | [testall|^c13692-testall-results.PNG] | [dtest|^c13692-dtest-results.PNG] ([dtest-baseline|https://cassci.datastax.com/job/trunk_dtest/lastCompletedBuild/testReport/]) |
{{testall}} results look good for all branches, but there's a common theme of consistency_test.TestConsistency.test_13747 dtests failing, in addition to the common, expected-to-be-unrelated {{dtest}} failures. My assumption is that this is related to CASSANDRA-13747 (the comments there seem to corroborate that). [~iamaleksey], do you have an idea if that could be the case?
> CompactionAwareWriter_getWriteDirectory throws incompatible exceptions > -- > > Key: CASSANDRA-13692 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13692 > Project: Cassandra > Issue Type: Bug > Components: Compaction >Reporter: Hao Zhong >Assignee: Dimitar Dimitrov > Labels: lhf > Attachments: c13692-2.2-dtest-results.PNG, > c13692-2.2-testall-results.PNG, c13692-3.0-dtest-results.PNG, > c13692-3.0-testall-results.PNG, c13692-3.11-dtest-results.PNG, > c13692-3.11-testall-results.PNG, c13692-dtest-results.PNG, > c13692-testall-results.PNG > > > The CompactionAwareWriter_getWriteDirectory throws RuntimeException: > {code} > public Directories.DataDirectory getWriteDirectory(Iterable > sstables, long estimatedWriteSize) > { > File directory = null; > for (SSTableReader sstable : sstables) > { > if (directory == null) > directory = sstable.descriptor.directory; > if (!directory.equals(sstable.descriptor.directory)) > { > logger.trace("All sstables not from the same disk - putting > results in {}",
[jira] [Created] (CASSANDRA-13835) Thrift get_slice responds slower on Cassandra 3
Pawel Szlendak created CASSANDRA-13835:
--
Summary: Thrift get_slice responds slower on Cassandra 3
Key: CASSANDRA-13835
URL: https://issues.apache.org/jira/browse/CASSANDRA-13835
Project: Cassandra
Issue Type: Bug
Reporter: Pawel Szlendak
Attachments: attack.py, cassandra120_get_slice_reply_time.png, cassandra310_get_slice_reply_time.png

-- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-13692) CompactionAwareWriter_getWriteDirectory throws incompatible exceptions
[ https://issues.apache.org/jira/browse/CASSANDRA-13692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dimitar Dimitrov updated CASSANDRA-13692:
-
    Attachment: c13692-2.2-dtest-results.PNG
                c13692-2.2-testall-results.PNG
                c13692-3.0-dtest-results.PNG
                c13692-3.0-testall-results.PNG
                c13692-3.11-dtest-results.PNG
                c13692-3.11-testall-results.PNG
                c13692-dtest-results.PNG
                c13692-testall-results.PNG

Adding screenshots from CI results.

> CompactionAwareWriter_getWriteDirectory throws incompatible exceptions
> --
>
>                 Key: CASSANDRA-13692
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13692
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Compaction
>            Reporter: Hao Zhong
>            Assignee: Dimitar Dimitrov
>              Labels: lhf
>         Attachments: c13692-2.2-dtest-results.PNG, c13692-2.2-testall-results.PNG, c13692-3.0-dtest-results.PNG, c13692-3.0-testall-results.PNG, c13692-3.11-dtest-results.PNG, c13692-3.11-testall-results.PNG, c13692-dtest-results.PNG, c13692-testall-results.PNG
>
> {{CompactionAwareWriter.getWriteDirectory}} throws a plain RuntimeException:
> {code}
> public Directories.DataDirectory getWriteDirectory(Iterable<SSTableReader> sstables, long estimatedWriteSize)
> {
>     File directory = null;
>     for (SSTableReader sstable : sstables)
>     {
>         if (directory == null)
>             directory = sstable.descriptor.directory;
>         if (!directory.equals(sstable.descriptor.directory))
>         {
>             logger.trace("All sstables not from the same disk - putting results in {}", directory);
>             break;
>         }
>     }
>     Directories.DataDirectory d = getDirectories().getDataDirectoryForFile(directory);
>     if (d != null)
>     {
>         long availableSpace = d.getAvailableSpace();
>         if (availableSpace < estimatedWriteSize)
>             throw new RuntimeException(String.format("Not enough space to write %s to %s (%s available)",
>                                                      FBUtilities.prettyPrintMemory(estimatedWriteSize),
>                                                      d.location,
>                                                      FBUtilities.prettyPrintMemory(availableSpace)));
>         logger.trace("putting compaction results in {}", directory);
>         return d;
>     }
>     d = getDirectories().getWriteableLocation(estimatedWriteSize);
>     if (d == null)
>         throw new RuntimeException(String.format("Not enough disk space to store %s",
>                                                  FBUtilities.prettyPrintMemory(estimatedWriteSize)));
>     return d;
> }
> {code}
> However, the thrown exception does not trigger the failure policy. CASSANDRA-11448 fixed a similar problem. The buggy code is:
> {code}
> protected Directories.DataDirectory getWriteDirectory(long writeSize)
> {
>     Directories.DataDirectory directory = getDirectories().getWriteableLocation(writeSize);
>     if (directory == null)
>         throw new RuntimeException("Insufficient disk space to write " + writeSize + " bytes");
>     return directory;
> }
> {code}
> The fixed code is:
> {code}
> protected Directories.DataDirectory getWriteDirectory(long writeSize)
> {
>     Directories.DataDirectory directory = getDirectories().getWriteableLocation(writeSize);
>     if (directory == null)
>         throw new FSWriteError(new IOException("Insufficient disk space to write " + writeSize + " bytes"), "");
>     return directory;
> }
> {code}
> The fixed code throws an FSWriteError and triggers the failure policy.

--
This message was sent by Atlassian JIRA (v6.4.14#64029)
-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org
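The incompatibility above comes down to type-based dispatch: the code that drives the disk failure policy reacts to filesystem-specific error types, so a bare RuntimeException sails past it while an FSWriteError wrapping an IOException is acted on. A minimal sketch of that idea, using stub classes (the FSWriteError and triggersFailurePolicy below are simplified stand-ins, not Cassandra's actual classes):

```java
import java.io.IOException;

public class FailurePolicyDemo
{
    // Simplified stand-in for org.apache.cassandra.io.FSWriteError:
    // a RuntimeException that carries the underlying I/O cause and a path
    static class FSWriteError extends RuntimeException
    {
        final String path;
        FSWriteError(Throwable cause, String path)
        {
            super(cause);
            this.path = path;
        }
    }

    // Simplified stand-in for the type check that decides whether the
    // disk failure policy should fire; a plain RuntimeException falls through
    static boolean triggersFailurePolicy(Throwable t)
    {
        return t instanceof FSWriteError;
    }

    public static void main(String[] args)
    {
        RuntimeException plain = new RuntimeException("Not enough space to write");
        FSWriteError typed = new FSWriteError(new IOException("Insufficient disk space"), "");

        System.out.println("plain RuntimeException triggers policy: " + triggersFailurePolicy(plain));
        System.out.println("FSWriteError triggers policy: " + triggersFailurePolicy(typed));
    }
}
```

The same message and cause travel in both cases; only the type differs, which is why the two exceptions are "incompatible" from the failure policy's point of view.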
[jira] [Comment Edited] (CASSANDRA-13692) CompactionAwareWriter_getWriteDirectory throws incompatible exceptions
[ https://issues.apache.org/jira/browse/CASSANDRA-13692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16150212#comment-16150212 ] Dimitar Dimitrov edited comment on CASSANDRA-13692 at 9/1/17 8:46 AM: -- Okay, here are the branches with the proposed changes: | [2.2|https://github.com/apache/cassandra/compare/cassandra-2.2...dimitarndimitrov:c13692-2.2] | [testall|^c13692-2.2-testall-results.png] | [dtest|^c13692-2.2-dtest-results.png] ([dtest-baseline|https://cassci.datastax.com/job/cassandra-2.2_dtest/lastCompletedBuild/testReport/]) | | [3.0|https://github.com/apache/cassandra/compare/cassandra-3.0...dimitarndimitrov:c13692-3.0] | [testall|^c13692-3.0-testall-results.png] | [dtest|^c13692-3.0-dtest-results.png] ([dtest-baseline|https://cassci.datastax.com/job/cassandra-3.0_dtest/lastCompletedBuild/testReport/]) | | [3.11|https://github.com/apache/cassandra/compare/cassandra-3.11...dimitarndimitrov:c13692-3.11] | [testall|^c13692-3.11-testall-results.png] ([testall-baseline|https://cassci.datastax.com/job/cassandra-3.11_testall/lastCompletedBuild/testReport/]) | [dtest|^c13692-3.11-dtest-results.png] ([dtest-baseline|https://cassci.datastax.com/job/cassandra-3.11_dtest/lastCompletedBuild/testReport/]) | | [trunk|https://github.com/apache/cassandra/compare/trunk...dimitarndimitrov:c13692] | [testall|^c13692-testall-results.png] | [dtest|^c13692-dtest-results.png] ([dtest-baseline|https://cassci.datastax.com/job/trunk_dtest/lastCompletedBuild/testReport/]) | {{testall}} looks good for all branches, but there's a common theme of consistency_test.TestConsistency.test_13747 {{dtest}}s failing, in addition to the common-expected-to-be-unrelated {{dtest}} failures. My assumption is that this is related to CASSANDRA-13747 (the comments there seem to corroborate that). [~iamaleksey] , do you have an idea if that could be the case? 
[jira] [Comment Edited] (CASSANDRA-13833) Failed compaction is not captured
[ https://issues.apache.org/jira/browse/CASSANDRA-13833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16150158#comment-16150158 ] Marcus Eriksson edited comment on CASSANDRA-13833 at 9/1/17 7:58 AM: - nice catch, code LGTM, just a small test fix for 3.11 and trunk: https://github.com/krummas/cassandra/commit/bb9c9e0b685d3b4e76a7b082b46b01a7ed6c8af5 running dtests: https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/261/ https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/262/ https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/263/ https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/264/ was (Author: krummas): nice catch, code LGTM, just a small fix for 3.11 and trunk: https://github.com/krummas/cassandra/commit/bb9c9e0b685d3b4e76a7b082b46b01a7ed6c8af5 running dtests: https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/261/ https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/262/ https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/263/ https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/264/ > Failed compaction is not captured > - > > Key: CASSANDRA-13833 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13833 > Project: Cassandra > Issue Type: Bug > Components: Compaction >Reporter: Jay Zhuang >Assignee: Jay Zhuang > > Follow up for CASSANDRA-13785, when the compaction failed, it fails silently. > No error message is logged and exceptions metric is not updated. 
Basically, > it's unable to get the exception: > [CompactionManager.java:1491|https://github.com/apache/cassandra/blob/cassandra-2.2/src/java/org/apache/cassandra/db/compaction/CompactionManager.java#L1491] > Here is the call stack: > {noformat} > at > org.apache.cassandra.db.compaction.CompactionTask.runMayThrow(CompactionTask.java:195) > at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) > at > org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:89) > at > org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:61) > at > org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionCandidate.run(CompactionManager.java:264) > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at > org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:79) > at java.lang.Thread.run(Thread.java:745) > {noformat} > There're 2 {{FutureTask}} in the call stack, for example > {{FutureTask1(FutureTask2))}}, If the call thrown an exception, > {{FutureTask2}} sets the status, save the exception and return. But > FutureTask1 doesn't get any exception, then set the status to normal. 
So > we're unable to get the exception in: > [CompactionManager.java:1491|https://github.com/apache/cassandra/blob/cassandra-2.2/src/java/org/apache/cassandra/db/compaction/CompactionManager.java#L1491] > 2.1.x is working fine, here is the call stack: > {noformat} > at > org.apache.cassandra.db.compaction.CompactionTask.runMayThrow(CompactionTask.java:177) > ~[main/:na] > at > org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) > ~[main/:na] > at > org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:73) > ~[main/:na] > at > org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:59) > ~[main/:na] > at > org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionCandidate.run(CompactionManager.java:264) > ~[main/:na] > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > ~[na:1.8.0_141] > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > ~[na:1.8.0_141] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > ~[na:1.8.0_141] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > [na:1.8.0_141] > at java.lang.Thread.run(Thread.java:748) [na:1.8.0_141] > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
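The nested-{{FutureTask}} behaviour described above can be reproduced in isolation. A minimal sketch (the class name is ours, not Cassandra's): the inner task's exception is captured by the inner future, so the outer future completes normally and the caller never sees the failure.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.FutureTask;

public class NestedFutureTaskDemo {
    public static void main(String[] args) throws Exception {
        // Inner task throws; FutureTask.run() catches the exception and
        // stores it in the inner future instead of propagating it.
        FutureTask<Object> inner = new FutureTask<>(() -> {
            throw new RuntimeException("compaction failed");
        });
        // Outer task runs the inner FutureTask as a plain Runnable.
        // Since inner.run() returns normally, the outer future completes
        // normally too -- this mirrors FutureTask1(FutureTask2) above.
        FutureTask<Object> outer = new FutureTask<>(inner, null);
        outer.run();
        try {
            outer.get(); // does NOT throw: the failure is invisible here
            System.out.println("outer.get() returned normally");
        } catch (ExecutionException e) {
            System.out.println("never reached: " + e.getCause());
        }
        try {
            inner.get(); // the exception is only visible on the inner future
        } catch (ExecutionException e) {
            System.out.println("inner holds: " + e.getCause().getMessage());
        }
    }
}
```

Running this prints that the outer future completed normally while the inner one holds the exception, which is exactly why the catch at CompactionManager.java:1491 never fires.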
[jira] [Commented] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16150175#comment-16150175 ] Marcus Eriksson commented on CASSANDRA-13418: - A dtest that makes sure that we drop sstables when the option is enabled, and that we don't drop them when it is disabled, would be good. > Allow TWCS to ignore overlaps when dropping fully expired sstables > -- > > Key: CASSANDRA-13418 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13418 > Project: Cassandra > Issue Type: Improvement > Components: Compaction >Reporter: Corentin Chary >Assignee: Romain GERARD > Labels: twcs > Attachments: twcs-cleanup.png > > > http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If > you really want read-repairs you're going to have sstables blocking the > expiration of other fully expired SSTables because they overlap. > You can set unchecked_tombstone_compaction = true or tombstone_threshold to a > very low value and that will purge the blockers of old data that should > already have expired, thus removing the overlaps and allowing the other > SSTables to expire. > The thing is that this is rather CPU intensive and not optimal. If you have > time series, you might not care if all your data doesn't exactly expire at > the right time, or if data re-appears for some time, as long as it gets > deleted as soon as it can. And in this situation I believe it would be really > beneficial to allow users to simply ignore overlapping SSTables when looking > for fully expired ones. > To the question: why would you need read-repairs? > - Full repairs basically take longer than the TTL of the data on my dataset, > so this isn't really effective. > - Even with a 10% chance of doing a repair, we found out that this would be > enough to greatly reduce entropy of the most used data (and if you have > timeseries, you're likely to have a dashboard doing the same important > queries over and over again). 
> - LOCAL_QUORUM is too expensive (need >3 replicas), QUORUM is too slow. > I'll try to come up with a patch demonstrating how this would work, try it on > our system and report the effects. > cc: [~adejanovski], [~rgerard] as I know you worked on similar issues already. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
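The workaround described in the ticket (setting {{unchecked_tombstone_compaction}} or a very low {{tombstone_threshold}}) is applied through the table's compaction options. A hedged sketch — the keyspace and table names are hypothetical; the subproperty names are the real compaction options discussed above:

```sql
-- Hypothetical table; unchecked_tombstone_compaction and tombstone_threshold
-- are the compaction subproperties the ticket describes as a workaround.
ALTER TABLE metrics.timeseries
WITH compaction = {
  'class': 'TimeWindowCompactionStrategy',
  'compaction_window_unit': 'DAYS',
  'compaction_window_size': '1',
  'unchecked_tombstone_compaction': 'true',
  'tombstone_threshold': '0.05'
};
```

As the ticket notes, this purges the blocking tombstones at the cost of extra compaction CPU, which is what the proposed "ignore overlaps" option aims to avoid.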
[jira] [Commented] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16150173#comment-16150173 ] Romain GERARD commented on CASSANDRA-13418: --- Will change the code style and the {{protected}} to {{private}} (splitting from {{getFullyExpiredSSTables}} seems more readable to me). If you can think of any more tests, I will add them.
[jira] [Commented] (CASSANDRA-13754) FastThreadLocal leaks memory
[ https://issues.apache.org/jira/browse/CASSANDRA-13754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16150172#comment-16150172 ] Markus Dlugi commented on CASSANDRA-13754: -- [~snazy], I don't think the node is overloaded. I originally thought so as well, so I made a little experiment where I included a cap in our load test limiting the {{INSERT}}s per minute from ~25,000 to ~10,000. As a consequence, the node survived a little longer, but in the end it still died with an {{OutOfMemoryError}} after more data had been inserted. So it's not that there are too many active writes, it's just that the node fails after a certain amount of total writes, which indicates to me that a memory leak is indeed happening. I also had another look into the heap dump I sent you, and you are correct that the heap is mostly filled with {{BTree$Builder}} instances that still have stuff in their {{values}} array. However, if you look closer, you will notice that for each of these instances, the {{values}} array always contains {{null}} for the first couple of entries, and only after those does the actual content appear. For some reason, the actual content always starts at index 28, whereas indices 0-27 are {{null}} - not sure if this is a coincidence? But you can also see that for all the {{BTree$Builder}} objects, the {{count}} attribute is 0, which also indicates to me that {{BTree$Builder.cleanup()}} has already run and those are not active writes. This theory is supported by the fact that my little workaround of manually calling {{FastThreadLocal.removeAll()}} actually works, because this means that no other objects except the {{FastThreadLocal}}s still have references to the builders. Therefore, I think we have two issues here: # {{SEPWorker}} is never cleaning the {{FastThreadLocal}}s, therefore accumulating references to otherwise dead objects - maybe we can include something to at least remove non-static entries regularly? 
# {{BTree$Builder}} seems to have an issue properly cleaning up after building, so the objects referenced by the {{FastThreadLocal}}s of the {{SEPWorker}} threads are very large and thus ultimately lead to the {{OutOfMemoryError}}s. > FastThreadLocal leaks memory > > > Key: CASSANDRA-13754 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13754 > Project: Cassandra > Issue Type: Bug > Components: Core > Environment: Cassandra 3.11.0, Netty 4.0.44.Final, OpenJDK 8u141-b15 >Reporter: Eric Evans >Assignee: Robert Stupp > Fix For: 3.11.1 > > > After a chronic bout of {{OutOfMemoryError}} in our development environment, > a heap analysis is showing that more than 10G of our 12G heaps are consumed > by the {{threadLocals}} members (instances of {{java.lang.ThreadLocalMap}}) > of various {{io.netty.util.concurrent.FastThreadLocalThread}} instances. > Reverting > [cecbe17|https://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=commit;h=cecbe17e3eafc052acc13950494f7dddf026aa54] > fixes the issue.
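The leak pattern described in the comment can be illustrated without Netty. A minimal sketch, assuming a plain {{java.lang.ThreadLocal}} as a stand-in for {{FastThreadLocal}} and a {{StringBuilder}} as a stand-in for the cached {{BTree$Builder}}: resetting the logical count to 0 does not release the large backing array, which stays pinned by the thread until the thread-local entry is explicitly removed (analogous to the {{FastThreadLocal.removeAll()}} workaround).

```java
public class ThreadLocalLeakSketch {
    // Hypothetical stand-in for the per-thread BTree$Builder cache: each
    // worker thread keeps a builder reachable via a thread-local, even
    // after the builder's contents are logically "done".
    static final ThreadLocal<StringBuilder> BUILDER =
            ThreadLocal.withInitial(() -> new StringBuilder(1 << 20)); // ~1 MB

    static String buildOnce() {
        StringBuilder b = BUILDER.get();
        b.setLength(0);       // "cleanup": the logical count drops to 0 ...
        b.append("row data"); // ... but the 1 MB backing array survives
        return b.toString();
    }

    public static void main(String[] args) {
        System.out.println(buildOnce());
        // The thread still pins the full backing array after the build:
        System.out.println("retained capacity: " + BUILDER.get().capacity());
        // The workaround described above: explicitly drop the thread-local
        // entry so the large array becomes garbage-collectable.
        BUILDER.remove();
        System.out.println("after remove(), the builder is unreferenced");
    }
}
```

With many long-lived worker threads each pinning a builder like this, the per-thread waste adds up to the multi-gigabyte heaps seen in the dump.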
[jira] [Commented] (CASSANDRA-13833) Failed compaction is not captured
[ https://issues.apache.org/jira/browse/CASSANDRA-13833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16150158#comment-16150158 ] Marcus Eriksson commented on CASSANDRA-13833: - nice catch, code LGTM, just a small fix for 3.11 and trunk: https://github.com/krummas/cassandra/commit/bb9c9e0b685d3b4e76a7b082b46b01a7ed6c8af5 running dtests: https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/261/ https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/262/ https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/263/ https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/264/
[jira] [Comment Edited] (CASSANDRA-13530) GroupCommitLogService
[ https://issues.apache.org/jira/browse/CASSANDRA-13530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16149884#comment-16149884 ] Yuji Ito edited comment on CASSANDRA-13530 at 9/1/17 7:17 AM: -- [~aweisberg] Sorry for the late reply. I measured the latencies again. As you said, a test requests 256 operations without SERIAL twice. The results below are from the 2nd round of requests. h5. Average latency of UPDATE ||Throughput\[ops\]||Batch - 2ms \[ms\]||Group - 15ms \[ms\]|| |100|1.63|9.58| |200|11.83|9.67| |500|17.31|10.20| |1000|19.93|10.75| I attached the result file ([^groupCommitLog_noSerial_result.xlsx]) including histograms of latency. was (Author: yuji): Sorry for late. I measured the latencies again. As you said, a test requests 256 operations without SERIAL twice. And the below results are reported in the 2nd requests. h5. Average latency of UPDATE ||Throughput\[ops\]||Batch - 2ms \[ms\]||Group - 15ms \[ms\]|| |100|1.63|9.58| |200|11.83|9.67| |500|17.31|10.20| |1000|19.93|10.75| I attached the result file ([^groupCommitLog_noSerial_result.xlsx]) including histograms of latency. > GroupCommitLogService > - > > Key: CASSANDRA-13530 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13530 > Project: Cassandra > Issue Type: Improvement >Reporter: Yuji Ito >Assignee: Yuji Ito > Fix For: 2.2.x, 3.0.x, 3.11.x > > Attachments: groupCommit22.patch, groupCommit30.patch, > groupCommit3x.patch, groupCommitLog_noSerial_result.xlsx, > groupCommitLog_result.xlsx, GuavaRequestThread.java, MicroRequestThread.java > > > I propose a new CommitLogService, GroupCommitLogService, to improve the > throughput when lots of requests are received. > It improved the throughput by up to 94%. > I'd like to discuss this CommitLogService. > Currently, we can select one of 2 CommitLog services: Periodic and Batch. > In Periodic, we might lose some commit log which hasn't been written to the disk. > In Batch, the commit log is written to the disk on every write. 
The size of each commit > log write is too small (< 4KB). Under high concurrency, these writes are > gathered and persisted to the disk at once. But with insufficient > concurrency, many small writes are issued and performance decreases due > to the latency of the disk. Even if you use an SSD, processing many IO > commands decreases performance. > GroupCommitLogService writes several commit logs to the disk at once. > The patch adds GroupCommitLogService (it is enabled by setting > `commitlog_sync` and `commitlog_sync_group_window_in_ms` in cassandra.yaml). > The only difference from Batch is the waiting on the semaphore. > By waiting on the semaphore, several commit log writes are executed at the > same time. > In GroupCommitLogService, the latency becomes worse if there is no > concurrency. > I measured the performance with my microbench (MicroRequestThread.java) by > increasing the number of threads. The cluster has 3 nodes (Replication factor: > 3). Each node is an AWS EC2 m4.large instance + a 200IOPS io1 volume. > The result is as below. The GroupCommitLogService with a 10ms window improved > update with Paxos by 94% and improved select with Paxos by 76%. > h6. SELECT / sec > ||\# of threads||Batch 2ms||Group 10ms|| > |1|192|103| > |2|163|212| > |4|264|416| > |8|454|800| > |16|744|1311| > |32|1151|1481| > |64|1767|1844| > |128|2949|3011| > |256|4723|5000| > h6. UPDATE / sec > ||\# of threads||Batch 2ms||Group 10ms|| > |1|45|26| > |2|39|51| > |4|58|102| > |8|102|198| > |16|167|213| > |32|289|295| > |64|544|548| > |128|1046|1058| > |256|2020|2061|
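The group-commit idea described above can be sketched without any Cassandra classes. A minimal illustration (class and method names are ours, not the patch's): writers enqueue mutations and block on a future, while a syncer thread wakes up once per window (cf. {{commitlog_sync_group_window_in_ms}}), performs one "sync" covering the whole queued group, and completes all the futures at once — turning many tiny disk writes into one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

public class GroupCommitSketch {
    private final List<CompletableFuture<Void>> waiters = new ArrayList<>();
    private final StringBuilder buffer = new StringBuilder(); // stands in for the commit log
    private int syncs = 0;
    private volatile boolean stopped = false;

    // A writer appends its mutation and gets a future that completes only
    // once the whole group has been "synced".
    public synchronized CompletableFuture<Void> append(String mutation) {
        buffer.append(mutation).append('\n');
        CompletableFuture<Void> ack = new CompletableFuture<>();
        waiters.add(ack);
        return ack;
    }

    // Runs once per group window: one "fsync" acknowledges every write
    // queued since the previous window.
    public synchronized void syncGroup() {
        if (waiters.isEmpty()) return;
        syncs++;
        waiters.forEach(w -> w.complete(null));
        waiters.clear();
    }

    public static void main(String[] args) throws Exception {
        GroupCommitSketch log = new GroupCommitSketch();
        Thread syncer = new Thread(() -> {
            while (!log.stopped) {
                try { TimeUnit.MILLISECONDS.sleep(15); } catch (InterruptedException e) { return; }
                log.syncGroup();
            }
        });
        syncer.start();
        List<CompletableFuture<Void>> acks = new ArrayList<>();
        for (int i = 0; i < 100; i++) acks.add(log.append("mutation-" + i));
        CompletableFuture.allOf(acks.toArray(new CompletableFuture[0])).get(5, TimeUnit.SECONDS);
        log.stopped = true;
        syncer.join();
        // Many writes are acknowledged by far fewer syncs.
        System.out.println("writes=100, group syncs=" + log.syncs);
    }
}
```

This also shows why latency worsens with no concurrency: a lone write still waits for the window to elapse before its group (of one) is synced.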
[jira] [Updated] (CASSANDRA-13833) Failed compaction is not captured
[ https://issues.apache.org/jira/browse/CASSANDRA-13833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcus Eriksson updated CASSANDRA-13833: Reviewer: Marcus Eriksson