[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16151196#comment-16151196 ] mck edited comment on CASSANDRA-13418 at 9/1/17 9:39 PM: - {quote}P.s: https://github.com/thelastpickle/cassandra/commit/58440e707cd6490847a37dc8d76c150d3eb27aab#diff-e8e282423dcbf34d30a3578c8dec15cdR176 still think is less clear to inline it.{quote} I agree, but found no clear method name to use. As Marcus' comments, {{getFullyExpiredSSTables(..)}} isn't appropriate. Any suggestions for a clear name? Otherwise the method is at 70 lines length, not great but no disaster, so i'm ok either way. was (Author: michaelsembwever): {quote}P.s: https://github.com/thelastpickle/cassandra/commit/58440e707cd6490847a37dc8d76c150d3eb27aab#diff-e8e282423dcbf34d30a3578c8dec15cdR176 still think is less clear to inline it.{quote} I agree, but found not clear method name to use. As Marcus' comments, {{getFullyExpiredSSTables(..)}} isn't appropriate. Any suggestions for a clear name? Otherwise the method is at 70 lines length, not great but no disaster, so i'm ok either way. > Allow TWCS to ignore overlaps when dropping fully expired sstables > -- > > Key: CASSANDRA-13418 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13418 > Project: Cassandra > Issue Type: Improvement > Components: Compaction >Reporter: Corentin Chary >Assignee: Romain GERARD > Labels: twcs > Fix For: 3.11.x, 4.x > > Attachments: twcs-cleanup.png > > > http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If > you really want read-repairs you're going to have sstables blocking the > expiration of other fully expired SSTables because they overlap. > You can set unchecked_tombstone_compaction = true or tombstone_threshold to a > very low value and that will purge the blockers of old data that should > already have expired, thus removing the overlaps and allowing the other > SSTables to expire. 
> The thing is that this is rather CPU intensive and not optimal. If you have > time series, you might not care if all your data doesn't exactly expire at > the right time, or if data re-appears for some time, as long as it gets > deleted as soon as it can. And in this situation I believe it would be really > beneficial to allow users to simply ignore overlapping SSTables when looking > for fully expired ones. > To the question: why would you need read-repairs? > - Full repairs basically take longer than the TTL of the data on my dataset, > so this isn't really effective. > - Even with a 10% chance of doing a repair, we found out that this would be > enough to greatly reduce entropy of the most used data (and if you have > timeseries, you're likely to have a dashboard doing the same important > queries over and over again). > - LOCAL_QUORUM is too expensive (need >3 replicas), QUORUM is too slow. > I'll try to come up with a patch demonstrating how this would work, try it on > our system and report the effects. > cc: [~adejanovski], [~rgerard] as I know you worked on similar issues already. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16151196#comment-16151196 ] mck commented on CASSANDRA-13418: - {quote}P.s: https://github.com/thelastpickle/cassandra/commit/58440e707cd6490847a37dc8d76c150d3eb27aab#diff-e8e282423dcbf34d30a3578c8dec15cdR176 still think is less clear to inline it.{quote} I agree, but found no clear method name to use. As per Marcus' comments, {{getFullyExpiredSSTables(..)}} isn't appropriate. Any suggestions for a clear name? Otherwise the method is 70 lines long, not great but no disaster, so I'm ok either way. > Allow TWCS to ignore overlaps when dropping fully expired sstables > -- > > Key: CASSANDRA-13418 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13418 > Project: Cassandra > Issue Type: Improvement > Components: Compaction >Reporter: Corentin Chary >Assignee: Romain GERARD > Labels: twcs > Fix For: 3.11.x, 4.x > > Attachments: twcs-cleanup.png > > > http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If > you really want read-repairs you're going to have sstables blocking the > expiration of other fully expired SSTables because they overlap. > You can set unchecked_tombstone_compaction = true or tombstone_threshold to a > very low value and that will purge the blockers of old data that should > already have expired, thus removing the overlaps and allowing the other > SSTables to expire. > The thing is that this is rather CPU intensive and not optimal. If you have > time series, you might not care if all your data doesn't exactly expire at > the right time, or if data re-appears for some time, as long as it gets > deleted as soon as it can. And in this situation I believe it would be really > beneficial to allow users to simply ignore overlapping SSTables when looking > for fully expired ones. > To the question: why would you need read-repairs? 
> - Full repairs basically take longer than the TTL of the data on my dataset, > so this isn't really effective. > - Even with a 10% chance of doing a repair, we found out that this would be > enough to greatly reduce entropy of the most used data (and if you have > timeseries, you're likely to have a dashboard doing the same important > queries over and over again). > - LOCAL_QUORUM is too expensive (need >3 replicas), QUORUM is too slow. > I'll try to come up with a patch demonstrating how this would work, try it on > our system and report the effects. > cc: [~adejanovski], [~rgerard] as I know you worked on similar issues already. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
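The overlap logic under discussion can be sketched in a simplified model (hypothetical code, NOT Cassandra's actual {{getFullyExpiredSSTables(..)}}): an sstable whose newest cell has passed its TTL is a drop candidate, but by default it is held back when a still-live sstable overlaps its timestamp range, since dropping it could resurrect shadowed data; the proposed option simply skips that check.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, hypothetical model of "fully expired" selection with an
// optional ignore-overlaps mode. Field names and the overlap rule are
// illustrative assumptions, not Cassandra internals.
public class ExpiredSSTables
{
    static class SSTable
    {
        final long minTimestamp, maxTimestamp; // write-time range covered
        final long maxLocalDeletionTime;       // when the newest cell expires

        SSTable(long min, long max, long expiry)
        {
            minTimestamp = min;
            maxTimestamp = max;
            maxLocalDeletionTime = expiry;
        }

        boolean overlaps(SSTable o)
        {
            return minTimestamp <= o.maxTimestamp && o.minTimestamp <= maxTimestamp;
        }
    }

    static List<SSTable> fullyExpired(List<SSTable> sstables, long nowSeconds, boolean ignoreOverlaps)
    {
        List<SSTable> expired = new ArrayList<>();
        for (SSTable candidate : sstables)
        {
            if (candidate.maxLocalDeletionTime >= nowSeconds)
                continue; // still holds live data

            boolean blocked = false;
            if (!ignoreOverlaps)
                for (SSTable other : sstables)
                    if (other != candidate
                        && other.maxLocalDeletionTime >= nowSeconds
                        && candidate.overlaps(other))
                        blocked = true; // a live sstable overlaps: keep the candidate

            if (!blocked)
                expired.add(candidate);
        }
        return expired;
    }
}
```

With {{ignoreOverlaps}} set, an expired sstable is dropped even while a live one overlaps it, which is exactly the trade-off described: data may briefly reappear, but it is deleted as soon as possible without tombstone compactions.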
[jira] [Updated] (CASSANDRA-13754) BTree.Builder memory leak
[ https://issues.apache.org/jira/browse/CASSANDRA-13754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Stupp updated CASSANDRA-13754: - Resolution: Fixed Status: Resolved (was: Ready to Commit) Committed as [bed7fa5ef8492d1ff3852cf299622a5ad4e0b621|https://github.com/apache/cassandra/commit/bed7fa5ef8492d1ff3852cf299622a5ad4e0b621] to [cassandra-3.11|https://github.com/apache/cassandra/tree/cassandra-3.11] and merged to trunk. > BTree.Builder memory leak > - > > Key: CASSANDRA-13754 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13754 > Project: Cassandra > Issue Type: Bug > Components: Core > Environment: Cassandra 3.11.0, Netty 4.0.44.Final, OpenJDK 8u141-b15 >Reporter: Eric Evans >Assignee: Robert Stupp > Fix For: 3.11.1 > > > After a chronic bout of {{OutOfMemoryError}} in our development environment, > a heap analysis is showing that more than 10G of our 12G heaps are consumed > by the {{threadLocals}} members (instances of {{java.lang.ThreadLocalMap}}) > of various {{io.netty.util.concurrent.FastThreadLocalThread}} instances. > Reverting > [cecbe17|https://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=commit;h=cecbe17e3eafc052acc13950494f7dddf026aa54] > fixes the issue. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13339) java.nio.BufferOverflowException: null
[ https://issues.apache.org/jira/browse/CASSANDRA-13339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16150916#comment-16150916 ] Jason Brown commented on CASSANDRA-13339: - Here's where I'm at so far with the investigation: - A mutation is being stored on the coordinator as it is a replica for the data, as can be seen from the {{LocalMutationRunnable}} in the stack traces. - In both 3.0 and 3.9 (which have been reported), we execute the {{StorageProxy#performLocally}} method that takes an {{IAsyncCallbackWithFailure}} as the last parameter (the method has a different arity between the two Cassandra versions, but it's basically the same method). That method is called in a few different ways in {{StorageProxy}} -- {{#apply}} - the standard 'write a mutation' function -- sync batchlog - write the batchlog and block -- counter write -- syncWriteBatchedMutations -- asyncWriteBatchedMutations Due to the way everyone's currently reported stack traces look, what I'm suspecting is that the write thread thinks the rows in the Mutation (one of the {{PartitionUpdate}}'s {{holder}} instances to be specific) are empty when we check the serialized size, but not empty when we actually serialize. Here's why: The stack traces all fail in [{{UnfilteredRowIteratorSerializer#serialize}}|https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/db/rows/UnfilteredRowIteratorSerializer.java#L120]. At that point in the serialize method, we've already written out at least two bytes (one for the partition key length, and one for the flags). We then try to serialize the {{SerializationHeader}}, which serializes the {{EncodingStats}}, and then it fails. In [{{UnfilteredRowIteratorSerializer#serializedSize}}|https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/db/rows/UnfilteredRowIteratorSerializer.java#L150], it accounts for the partition key length and flags *at the minimum*. 
If the {{iterator}} argument to the method is empty ({{#isEmpty}}), it simply returns the currently computed size. Thus we always serialize the 'basic data' about a row, but then nothing else; we knew we needed to write something about a row, but didn't have the full knowledge about the row when we calculated the size. I think there may be some thread visibility issue or some race condition where the {{iterator}} is empty at {{UnfilteredRowIteratorSerializer#serializedSize}}, yet not empty at {{UnfilteredRowIteratorSerializer#serialize}}. Note that there may be something funny going on with the {{PartitionUpdate#holder}}, but I couldn't see anything obvious (without grasping at straws). Without more details or a way to reproduce, I'm kind of at a stand-still without just flailing at all the things. Thanks to all those who have commented, especially [~crichards] > java.nio.BufferOverflowException: null > -- > > Key: CASSANDRA-13339 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13339 > Project: Cassandra > Issue Type: Bug > Components: Core >Reporter: Chris Richards > > I'm seeing the following exception running Cassandra 3.9 (with Netty updated > to 4.1.8.Final) running on a 2 node cluster. It would have been processing > around 50 queries/second at the time (mixture of > inserts/updates/selects/deletes) : there's a collection of tables (some with > counters some without) and a single materialized view. 
> {code} > ERROR [MutationStage-4] 2017-03-15 22:50:33,052 StorageProxy.java:1353 - > Failed to apply mutation locally : {} > java.nio.BufferOverflowException: null > at > org.apache.cassandra.io.util.DataOutputBufferFixed.doFlush(DataOutputBufferFixed.java:52) > ~[apache-cassandra-3.9.jar:3.9] > at > org.apache.cassandra.io.util.BufferedDataOutputStreamPlus.write(BufferedDataOutputStreamPlus.java:132) > ~[apache-cassandra-3.9.jar:3.9] > at > org.apache.cassandra.io.util.BufferedDataOutputStreamPlus.writeUnsignedVInt(BufferedDataOutputStreamPlus.java:262) > ~[apache-cassandra-3.9.jar:3.9] > at > org.apache.cassandra.db.rows.EncodingStats$Serializer.serialize(EncodingStats.java:233) > ~[apache-cassandra-3.9.jar:3.9] > at > org.apache.cassandra.db.SerializationHeader$Serializer.serializeForMessaging(SerializationHeader.java:380) > ~[apache-cassandra-3.9.jar:3.9] > at > org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:122) > ~[apache-cassandra-3.9.jar:3.9] > at > org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:89) > ~[apache-cassandra-3.9.jar:3.9] > at > org.apache.cassandra.db.partitions.PartitionUpdate$PartitionUpdateSerializer.serialize(PartitionUpdate.java:790) > ~[apache-cassandra-3.9.jar:3.9] > at >
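The suspected failure mode — {{serializedSize}} under-reporting what {{serialize}} later writes into a fixed-size buffer — can be illustrated with a toy sketch (hypothetical code, not Cassandra's serializers; the sizes and field names are illustrative only):

```java
import java.nio.BufferOverflowException;
import java.nio.ByteBuffer;

// Toy model: a buffer is allocated from a size computed while the data looked
// empty, then the writer sees the data as non-empty and overruns the buffer,
// producing the same BufferOverflowException seen in the reported stack traces.
public class SizeMismatchDemo
{
    // stands in for a serializedSize() that sees an "empty" iterator:
    // it accounts only for the partition key length and flags bytes
    static int serializedSize(boolean seenAsEmpty)
    {
        return seenAsEmpty ? 2 : 2 + 8;
    }

    // stands in for serialize() seeing the same partition as NON-empty
    static void serialize(ByteBuffer out)
    {
        out.put((byte) 1);  // partition key length
        out.put((byte) 0);  // flags
        out.putLong(42L);   // stats-style payload: overflows the 2-byte buffer
    }

    public static void main(String[] args)
    {
        // sized while the data appeared empty, written while it wasn't
        ByteBuffer buf = ByteBuffer.allocate(serializedSize(true));
        try
        {
            serialize(buf);
        }
        catch (BufferOverflowException e)
        {
            System.out.println("java.nio.BufferOverflowException, as in the logs");
        }
    }
}
```

If both methods observed the same emptiness, the buffer would be sized correctly; the race described above is what makes the two observations disagree.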
[jira] [Commented] (CASSANDRA-12813) NPE in auth for bootstrapping node
[ https://issues.apache.org/jira/browse/CASSANDRA-12813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16150908#comment-16150908 ] Andres March commented on CASSANDRA-12813: -- any workaround? I got this on a 3.9 cluster with a new node bootstrapping. No upgrades from another version. > NPE in auth for bootstrapping node > -- > > Key: CASSANDRA-12813 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12813 > Project: Cassandra > Issue Type: Bug >Reporter: Charles Mims >Assignee: Alex Petrov > Fix For: 2.2.9, 3.0.10, 3.10 > > > {code} > ERROR [SharedPool-Worker-1] 2016-10-19 21:40:25,991 Message.java:617 - > Unexpected exception during request; channel = [id: 0x15eb017f, / omitted>:40869 => /10.0.0.254:9042] > java.lang.NullPointerException: null > at > org.apache.cassandra.auth.PasswordAuthenticator.doAuthenticate(PasswordAuthenticator.java:144) > ~[apache-cassandra-3.0.9.jar:3.0.9] > at > org.apache.cassandra.auth.PasswordAuthenticator.authenticate(PasswordAuthenticator.java:86) > ~[apache-cassandra-3.0.9.jar:3.0.9] > at > org.apache.cassandra.auth.PasswordAuthenticator.access$100(PasswordAuthenticator.java:54) > ~[apache-cassandra-3.0.9.jar:3.0.9] > at > org.apache.cassandra.auth.PasswordAuthenticator$PlainTextSaslAuthenticator.getAuthenticatedUser(PasswordAuthenticator.java:182) > ~[apache-cassandra-3.0.9.jar:3.0.9] > at > org.apache.cassandra.transport.messages.AuthResponse.execute(AuthResponse.java:78) > ~[apache-cassandra-3.0.9.jar:3.0.9] > at > org.apache.cassandra.transport.Message$Dispatcher.channelRead0(Message.java:513) > [apache-cassandra-3.0.9.jar:3.0.9] > at > org.apache.cassandra.transport.Message$Dispatcher.channelRead0(Message.java:407) > [apache-cassandra-3.0.9.jar:3.0.9] > at > io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105) > [netty-all-4.0.23.Final.jar:4.0.23.Final] > at > 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333) > [netty-all-4.0.23.Final.jar:4.0.23.Final] > at > io.netty.channel.AbstractChannelHandlerContext.access$700(AbstractChannelHandlerContext.java:32) > [netty-all-4.0.23.Final.jar:4.0.23.Final] > at > io.netty.channel.AbstractChannelHandlerContext$8.run(AbstractChannelHandlerContext.java:324) > [netty-all-4.0.23.Final.jar:4.0.23.Final] > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > [na:1.8.0_101] > at > org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:164) > [apache-cassandra-3.0.9.jar:3.0.9] > at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105) > [apache-cassandra-3.0.9.jar:3.0.9] > at java.lang.Thread.run(Thread.java:745) [na:1.8.0_101] > {code} > I have a node that has been joining for around 24 hours. My application is > configured with the IP address of the joining node in the list of nodes to > connect to (ruby driver), and I have been getting around 200 events of this > NPE per hour. I removed the IP of the joining node from the list of nodes > for my app to connect to and the errors stopped. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[3/3] cassandra git commit: Merge branch 'cassandra-3.11' into trunk
Merge branch 'cassandra-3.11' into trunk Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/e5f3bb6e Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/e5f3bb6e Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/e5f3bb6e Branch: refs/heads/trunk Commit: e5f3bb6e583a4f71a2522a040a93468404dfb653 Parents: fb0e001 bed7fa5 Author: Robert StuppAuthored: Fri Sep 1 19:16:29 2017 +0200 Committer: Robert Stupp Committed: Fri Sep 1 19:16:29 2017 +0200 -- CHANGES.txt | 1 + src/java/org/apache/cassandra/utils/btree/BTree.java | 2 +- 2 files changed, 2 insertions(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/e5f3bb6e/CHANGES.txt -- diff --cc CHANGES.txt index 78c2947,c4a3170..023ff06 --- a/CHANGES.txt +++ b/CHANGES.txt @@@ -1,134 -1,6 +1,135 @@@ +4.0 + * Add stress profile yaml with LWT (CASSANDRA-7960) + * Reduce memory copies and object creations when acting on ByteBufs (CASSANDRA-13789) + * simplify mx4j configuration (Cassandra-13578) + * Fix trigger example on 4.0 (CASSANDRA-13796) + * force minumum timeout value (CASSANDRA-9375) + * use netty for streaming (CASSANDRA-12229) + * Use netty for internode messaging (CASSANDRA-8457) + * Add bytes repaired/unrepaired to nodetool tablestats (CASSANDRA-13774) + * Don't delete incremental repair sessions if they still have sstables (CASSANDRA-13758) + * Fix pending repair manager index out of bounds check (CASSANDRA-13769) + * Don't use RangeFetchMapCalculator when RF=1 (CASSANDRA-13576) + * Don't optimise trivial ranges in RangeFetchMapCalculator (CASSANDRA-13664) + * Use an ExecutorService for repair commands instead of new Thread(..).start() (CASSANDRA-13594) + * Fix race / ref leak in anticompaction (CASSANDRA-13688) + * Expose tasks queue length via JMX (CASSANDRA-12758) + * Fix race / ref leak in PendingRepairManager (CASSANDRA-13751) + * Enable ppc64le runtime as unsupported architecture 
(CASSANDRA-13615) + * Improve sstablemetadata output (CASSANDRA-11483) + * Support for migrating legacy users to roles has been dropped (CASSANDRA-13371) + * Introduce error metrics for repair (CASSANDRA-13387) + * Refactoring to primitive functional interfaces in AuthCache (CASSANDRA-13732) + * Update metrics to 3.1.5 (CASSANDRA-13648) + * batch_size_warn_threshold_in_kb can now be set at runtime (CASSANDRA-13699) + * Avoid always rebuilding secondary indexes at startup (CASSANDRA-13725) + * Upgrade JMH from 1.13 to 1.19 (CASSANDRA-13727) + * Upgrade SLF4J from 1.7.7 to 1.7.25 (CASSANDRA-12996) + * Default for start_native_transport now true if not set in config (CASSANDRA-13656) + * Don't add localhost to the graph when calculating where to stream from (CASSANDRA-13583) + * Make CDC availability more deterministic via hard-linking (CASSANDRA-12148) + * Allow skipping equality-restricted clustering columns in ORDER BY clause (CASSANDRA-10271) + * Use common nowInSec for validation compactions (CASSANDRA-13671) + * Improve handling of IR prepare failures (CASSANDRA-13672) + * Send IR coordinator messages synchronously (CASSANDRA-13673) + * Flush system.repair table before IR finalize promise (CASSANDRA-13660) + * Fix column filter creation for wildcard queries (CASSANDRA-13650) + * Add 'nodetool getbatchlogreplaythrottle' and 'nodetool setbatchlogreplaythrottle' (CASSANDRA-13614) + * fix race condition in PendingRepairManager (CASSANDRA-13659) + * Allow noop incremental repair state transitions (CASSANDRA-13658) + * Run repair with down replicas (CASSANDRA-10446) + * Added started & completed repair metrics (CASSANDRA-13598) + * Added started & completed repair metrics (CASSANDRA-13598) + * Improve secondary index (re)build failure and concurrency handling (CASSANDRA-10130) + * Improve calculation of available disk space for compaction (CASSANDRA-13068) + * Change the accessibility of RowCacheSerializer for third party row cache plugins (CASSANDRA-13579) + * Allow 
sub-range repairs for a preview of repaired data (CASSANDRA-13570) + * NPE in IR cleanup when columnfamily has no sstables (CASSANDRA-13585) + * Fix Randomness of stress values (CASSANDRA-12744) + * Allow selecting Map values and Set elements (CASSANDRA-7396) + * Fast and garbage-free Streaming Histogram (CASSANDRA-13444) + * Update repairTime for keyspaces on completion (CASSANDRA-13539) + * Add configurable upper bound for validation executor threads (CASSANDRA-13521) + * Bring back maxHintTTL propery (CASSANDRA-12982) + * Add testing guidelines (CASSANDRA-13497) + * Add more repair metrics (CASSANDRA-13531) + * RangeStreamer should be smarter when
[1/3] cassandra git commit: BTree.Builder memory leak
Repository: cassandra Updated Branches: refs/heads/cassandra-3.11 cd3aca036 -> bed7fa5ef refs/heads/trunk fb0e0019e -> e5f3bb6e5 BTree.Builder memory leak patch by Robert Stupp; reviewed by Jeremiah Jordan for CASSANDRA-13754 Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/bed7fa5e Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/bed7fa5e Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/bed7fa5e Branch: refs/heads/cassandra-3.11 Commit: bed7fa5ef8492d1ff3852cf299622a5ad4e0b621 Parents: cd3aca0 Author: Robert StuppAuthored: Fri Sep 1 19:11:32 2017 +0200 Committer: Robert Stupp Committed: Fri Sep 1 19:12:01 2017 +0200 -- CHANGES.txt | 1 + src/java/org/apache/cassandra/utils/btree/BTree.java | 2 +- 2 files changed, 2 insertions(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/bed7fa5e/CHANGES.txt -- diff --git a/CHANGES.txt b/CHANGES.txt index e5ccf45..c4a3170 100644 --- a/CHANGES.txt +++ b/CHANGES.txt @@ -1,4 +1,5 @@ 3.11.1 + * BTree.Builder memory leak (CASSANDRA-13754) * Revert CASSANDRA-10368 of supporting non-pk column filtering due to correctness (CASSANDRA-13798) * Fix cassandra-stress hang issues when an error during cluster connection happens (CASSANDRA-12938) * Better bootstrap failure message when blocked by (potential) range movement (CASSANDRA-13744) http://git-wip-us.apache.org/repos/asf/cassandra/blob/bed7fa5e/src/java/org/apache/cassandra/utils/btree/BTree.java -- diff --git a/src/java/org/apache/cassandra/utils/btree/BTree.java b/src/java/org/apache/cassandra/utils/btree/BTree.java index 1a5d9ae..a4519b9 100644 --- a/src/java/org/apache/cassandra/utils/btree/BTree.java +++ b/src/java/org/apache/cassandra/utils/btree/BTree.java @@ -866,7 +866,7 @@ public class BTree private void cleanup() { quickResolver = null; -Arrays.fill(values, 0, count, null); +Arrays.fill(values, null); count = 0; detected = true; auto = true; - To 
unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[2/3] cassandra git commit: BTree.Builder memory leak
BTree.Builder memory leak patch by Robert Stupp; reviewed by Jeremiah Jordan for CASSANDRA-13754 Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/bed7fa5e Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/bed7fa5e Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/bed7fa5e Branch: refs/heads/trunk Commit: bed7fa5ef8492d1ff3852cf299622a5ad4e0b621 Parents: cd3aca0 Author: Robert StuppAuthored: Fri Sep 1 19:11:32 2017 +0200 Committer: Robert Stupp Committed: Fri Sep 1 19:12:01 2017 +0200 -- CHANGES.txt | 1 + src/java/org/apache/cassandra/utils/btree/BTree.java | 2 +- 2 files changed, 2 insertions(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/bed7fa5e/CHANGES.txt -- diff --git a/CHANGES.txt b/CHANGES.txt index e5ccf45..c4a3170 100644 --- a/CHANGES.txt +++ b/CHANGES.txt @@ -1,4 +1,5 @@ 3.11.1 + * BTree.Builder memory leak (CASSANDRA-13754) * Revert CASSANDRA-10368 of supporting non-pk column filtering due to correctness (CASSANDRA-13798) * Fix cassandra-stress hang issues when an error during cluster connection happens (CASSANDRA-12938) * Better bootstrap failure message when blocked by (potential) range movement (CASSANDRA-13744) http://git-wip-us.apache.org/repos/asf/cassandra/blob/bed7fa5e/src/java/org/apache/cassandra/utils/btree/BTree.java -- diff --git a/src/java/org/apache/cassandra/utils/btree/BTree.java b/src/java/org/apache/cassandra/utils/btree/BTree.java index 1a5d9ae..a4519b9 100644 --- a/src/java/org/apache/cassandra/utils/btree/BTree.java +++ b/src/java/org/apache/cassandra/utils/btree/BTree.java @@ -866,7 +866,7 @@ public class BTree private void cleanup() { quickResolver = null; -Arrays.fill(values, 0, count, null); +Arrays.fill(values, null); count = 0; detected = true; auto = true; - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
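The one-line change above widens the cleanup from {{Arrays.fill(values, 0, count, null)}} to {{Arrays.fill(values, null)}}. A hypothetical sketch of the leak pattern (not the real {{BTree.Builder}}; the {{dedupTo}} step is an illustrative assumption for how {{count}} can shrink without the slots above it being cleared):

```java
import java.util.Arrays;

// If any operation lowers `count` without nulling the slots above it, a
// cleanup that clears only values[0..count) leaves stale references in the
// recycled, thread-local builder, pinning those objects for the life of the
// thread. Clearing the whole array cannot leak regardless of count's history.
public class RecycledBuilder
{
    Object[] values = new Object[8];
    int count = 0;

    void add(Object o) { values[count++] = o; }

    // illustrative stand-in for a merge/dedup step that shrinks the
    // logical size but not the backing array
    void dedupTo(int newCount) { count = newCount; }

    // buggy variant: trusts `count` to bound the live references
    void cleanupPartial()
    {
        Arrays.fill(values, 0, count, null);
        count = 0;
    }

    // fixed variant, mirroring the committed change: clear everything
    void cleanupFull()
    {
        Arrays.fill(values, null);
        count = 0;
    }

    public static void main(String[] args)
    {
        RecycledBuilder b = new RecycledBuilder();
        b.add(new byte[1024 * 1024]);
        b.add(new byte[1024 * 1024]);
        b.dedupTo(1);                            // second buffer now sits above `count`
        b.cleanupPartial();
        System.out.println(b.values[1] != null); // stale reference survives
        b.cleanupFull();
        System.out.println(b.values[1] != null); // fully cleared
    }
}
```

Since these builders are recycled via thread locals (the {{FastThreadLocalThread}} heaps in the original report), a single retained slot per thread is enough to pin large amounts of memory indefinitely.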
[jira] [Updated] (CASSANDRA-13754) BTree.Builder memory leak
[ https://issues.apache.org/jira/browse/CASSANDRA-13754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Stupp updated CASSANDRA-13754: - Summary: BTree.Builder memory leak (was: FastThreadLocal leaks memory) > BTree.Builder memory leak > - > > Key: CASSANDRA-13754 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13754 > Project: Cassandra > Issue Type: Bug > Components: Core > Environment: Cassandra 3.11.0, Netty 4.0.44.Final, OpenJDK 8u141-b15 >Reporter: Eric Evans >Assignee: Robert Stupp > Fix For: 3.11.1 > > > After a chronic bout of {{OutOfMemoryError}} in our development environment, > a heap analysis is showing that more than 10G of our 12G heaps are consumed > by the {{threadLocals}} members (instances of {{java.lang.ThreadLocalMap}}) > of various {{io.netty.util.concurrent.FastThreadLocalThread}} instances. > Reverting > [cecbe17|https://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=commit;h=cecbe17e3eafc052acc13950494f7dddf026aa54] > fixes the issue. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13085) Cassandra fails to start because WindowsFailedSnapshotTracker can not write to CASSANDRA_HOME
[ https://issues.apache.org/jira/browse/CASSANDRA-13085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16150839#comment-16150839 ] Jason Rust commented on CASSANDRA-13085: We've also hit this issue when trying to deploy C* on Windows. If a new directory is chosen it might make sense to also use that as the default in the HeapUtils class, the only other class that references and tries to write to the CASSANDRA_HOME folder. A less-invasive workaround I've found is to set CASSANDRA_HOME to the data directory as the very last line of cassandra-env.ps1. This allows the libraries to be sourced from the real CASSANDRA_HOME, but then overwrites the variable before C* actually launches. > Cassandra fails to start because WindowsFailedSnapshotTracker can not write > to CASSANDRA_HOME > - > > Key: CASSANDRA-13085 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13085 > Project: Cassandra > Issue Type: Bug > Components: Local Write-Read Paths, Packaging > Environment: might be windows only considering the classname >Reporter: Pieter-Jan Pintens > Labels: windows > > We are currently trying to package Cassandra with our application. > On Windows our server does not start because it wants to write to > CASSANDRA_HOME\.toDelete; since we install to 'C:\program files\...' this is > not possible when started under a non-privileged user. We were hoping that > setting pointers for the data and log dir to a writable location (somewhere > under user home) would be enough to start Cassandra, but this component wants > to write to a path that we cannot modify. > For us there are a couple of solutions: > 1) the location can be specified using a system property like data and log > dirs > 2) this file is written to the data location > Our current workaround would be to patch this class file but that is hard > to maintain. 
> {noformat}
> Exception (java.lang.RuntimeException) encountered during startup: Failed to create failed snapshot tracking file [.toDelete]. Aborting
> java.lang.RuntimeException: Failed to create failed snapshot tracking file [.toDelete]. Aborting
> at org.apache.cassandra.db.WindowsFailedSnapshotTracker.deleteOldSnapshots(WindowsFailedSnapshotTracker.java:98)
> at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:186)
> at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:601)
> at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:730)
> at com.id.cassandra.wrapper.Main.main(Unknown Source)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
> at java.lang.reflect.Method.invoke(Unknown Source)
> {noformat}
-- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
Cassandra COPY TO CSV format , producing inconsistent data
Hi, we are facing issues with COPY TO CSV from a table. The date field value does not appear in the CSV, but a SELECT on the same table shows the date. This behavior is inconsistent: sometimes we get the data for the date field and sometimes not. Also, very occasionally the first run of the SELECT shows the date column as empty, and an immediate second run shows the value.

Improper result:

select call_uid,call_start_date from customer_calls where customer_uid=2904;

 call_uid       | call_start_date
----------------+------------------------------
 19096285868247 | 2017-08-30 13:30:23.839000+
 19096285878250 | 2017-08-30 13:30:33.842000+
 19096374614659 | null
 19096374616659 | null
 19096374618669 | null
 19096374620665 | null
 19096374622671 | null
 19096374624662 | null
 19096374626656 | null
 20195690924360 | 2017-08-29 07:54:12.171000+
 20195797463722 | 2017-08-30 13:29:51.558000+

Proper result (on executing a second time):

cqlsh:ncm> select call_uid,call_start_date from customer_calls where customer_uid=2904;

 call_uid       | call_start_date
----------------+------------------------------
 19096374614659 | 2017-08-31 14:09:30.248000+
 19096374616659 | 2017-08-31 14:09:32.247000+
 19096374618669 | 2017-08-31 14:09:34.258000+
 19096374620665 | 2017-08-31 14:09:36.253000+
 19096374622671 | 2017-08-31 14:09:38.259000+
 19096374624662 | 2017-08-31 14:09:40.25+
 19096374626656 | 2017-08-31 14:09:42.244000+

- To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13738) Load is over calculated after each IndexSummaryRedistribution
[ https://issues.apache.org/jira/browse/CASSANDRA-13738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16150783#comment-16150783 ] Jay Zhuang commented on CASSANDRA-13738: The 2.2 branch uTest fails for {{ant eclipse-warnings}}, but I'm unable to reproduce it locally:
{noformat}
eclipse-warnings:
[mkdir] Created dir: /home/ubuntu/cassandra/build/ecj
[echo] Running Eclipse Code Analysis. Output logged to /home/ubuntu/cassandra/build/ecj/eclipse_compiler_checks.txt
[java] incorrect classpath: /home/ubuntu/cassandra/build/cobertura/classes
[java] --
[java] 1. ERROR in /home/ubuntu/cassandra/src/java/org/apache/cassandra/db/compaction/CompactionManager.java (at line 853)
[java] ISSTableScanner scanner = cleanupStrategy.getScanner(sstable, getRateLimiter());
[java] ^^^
[java] Resource 'scanner' should be managed by try-with-resource
[java] --
[java] --
[java] 2. ERROR in /home/ubuntu/cassandra/src/java/org/apache/cassandra/db/compaction/LeveledCompactionStrategy.java (at line 257)
[java] scanners.add(new LeveledScanner(intersecting, range));
[java] ^^^
[java] Potential resource leak: '' may not be closed
[java] --
[java] --
[java] 3. ERROR in /home/ubuntu/cassandra/src/java/org/apache/cassandra/tools/SSTableExport.java (at line 315)
[java] ISSTableScanner scanner = reader.getScanner();
[java] ^^^
[java] Resource 'scanner' should be managed by try-with-resource
[java] --
[java] 3 problems (3 errors)
{noformat}
And for the other test failures, I don't think they're introduced by this patch. 
> Load is over calculated after each IndexSummaryRedistribution
> -------------------------------------------------------------
>
>                 Key: CASSANDRA-13738
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13738
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Jay Zhuang
>            Assignee: Jay Zhuang
>             Fix For: 2.2.x, 3.0.x, 3.11.x, 4.x
>
>         Attachments: sizeIssue.png
>
>
> For example, here is one of our clusters with about 500GB per node, but
> {{nodetool status}} shows far more load than there actually is, and it keeps
> increasing; restarting the process resets the load, but it keeps increasing
> afterwards:
> {noformat}
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address  Load      Tokens  Owns (effective)  Host ID                               Rack
> UN  IP1*     13.52 TB  256     100.0%            c4c31e0a-3f01-49f7-8a22-33043737975d  rac1
> UN  IP2*     14.25 TB  256     100.0%            efec4980-ec9e-4424-8a21-ce7ddaf80aa0  rac1
> UN  IP3*     13.52 TB  256     100.0%            7dbcfdfc-9c07-4b1a-a4b9-970b715ebed8  rac1
> UN  IP4*     22.13 TB  256     100.0%            8879e6c4-93e3-4cc5-b957-f999c6b9b563  rac1
> UN  IP5*     18.02 TB  256     100.0%            4a1eaf22-4a83-4736-9e1c-12f898d685fa  rac1
> UN  IP6*     11.68 TB  256     100.0%            d633c591-28af-42cc-bc5e-47d1c8bcf50f  rac1
> {noformat}
> !sizeIssue.png|test!
> The root cause is that if the SSTable index summary is redistributed (which
> typically executes hourly), the updated SSTable size is added again.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org
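The accounting bug described in that ticket can be illustrated with a small standalone sketch. All names below ({{LoadTracker}}, {{onRedistribution*}}) are hypothetical, not Cassandra's actual code; the point is that on redistribution the resampled sstable's new size must replace its old contribution rather than being added on top:

```java
// Hypothetical sketch of the double-counting described above. When an index
// summary is resampled, the buggy variant adds the sstable's size again, so
// the reported load only ever grows; the fixed variant applies the delta.
public class LoadTracker {
    private long load;

    void addSSTable(long size) { load += size; }

    // Buggy: the resampled size is added on top of the old contribution.
    void onRedistributionBuggy(long newSize) { load += newSize; }

    // Fixed: account only for the change between old and new size.
    void onRedistributionFixed(long oldSize, long newSize) { load += newSize - oldSize; }

    long load() { return load; }

    public static void main(String[] args) {
        LoadTracker buggy = new LoadTracker();
        buggy.addSSTable(100);
        buggy.onRedistributionBuggy(100);   // size unchanged, yet load doubles
        System.out.println("buggy load: " + buggy.load());

        LoadTracker fixed = new LoadTracker();
        fixed.addSSTable(100);
        fixed.onRedistributionFixed(100, 100);
        System.out.println("fixed load: " + fixed.load());
    }
}
```

Running the buggy variant once per hourly redistribution is enough to make a ~500GB node report multiple terabytes over time, matching the symptom in the ticket.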
[jira] [Updated] (CASSANDRA-13833) Failed compaction is not captured
[ https://issues.apache.org/jira/browse/CASSANDRA-13833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jay Zhuang updated CASSANDRA-13833: --- Status: Patch Available (was: Open) > Failed compaction is not captured > - > > Key: CASSANDRA-13833 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13833 > Project: Cassandra > Issue Type: Bug > Components: Compaction >Reporter: Jay Zhuang >Assignee: Jay Zhuang > > Follow up for CASSANDRA-13785, when the compaction failed, it fails silently. > No error message is logged and exceptions metric is not updated. Basically, > it's unable to get the exception: > [CompactionManager.java:1491|https://github.com/apache/cassandra/blob/cassandra-2.2/src/java/org/apache/cassandra/db/compaction/CompactionManager.java#L1491] > Here is the call stack: > {noformat} > at > org.apache.cassandra.db.compaction.CompactionTask.runMayThrow(CompactionTask.java:195) > at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) > at > org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:89) > at > org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:61) > at > org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionCandidate.run(CompactionManager.java:264) > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at > org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:79) > at java.lang.Thread.run(Thread.java:745) > {noformat} > There're 2 {{FutureTask}} in the call stack, 
for example
> {{FutureTask1(FutureTask2)}}. If the inner call throws an exception,
> {{FutureTask2}} sets its status, saves the exception and returns. But
> {{FutureTask1}} doesn't get any exception, so it sets its status to normal.
> That's why we're unable to get the exception in:
> [CompactionManager.java:1491|https://github.com/apache/cassandra/blob/cassandra-2.2/src/java/org/apache/cassandra/db/compaction/CompactionManager.java#L1491]
> 2.1.x works fine; here is the call stack:
> {noformat}
> at
> org.apache.cassandra.db.compaction.CompactionTask.runMayThrow(CompactionTask.java:177)
> ~[main/:na]
> at
> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
> ~[main/:na]
> at
> org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:73)
> ~[main/:na]
> at
> org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:59)
> ~[main/:na]
> at
> org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionCandidate.run(CompactionManager.java:264)
> ~[main/:na]
> at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> ~[na:1.8.0_141]
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> ~[na:1.8.0_141]
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> ~[na:1.8.0_141]
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> [na:1.8.0_141]
> at java.lang.Thread.run(Thread.java:748) [na:1.8.0_141]
> {noformat}
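The swallowed-exception behavior of nested {{FutureTask}}s described above is easy to reproduce in isolation. A minimal sketch (not Cassandra code): {{FutureTask.run()}} catches the task's exception and stores it internally, so when an inner {{FutureTask}} is wrapped in an outer one, {{inner.run()}} returns normally and the outer task completes "successfully":

```java
import java.util.concurrent.FutureTask;

// Demonstrates the nesting problem: the inner FutureTask captures its
// exception, so the outer FutureTask wrapping it never sees a failure.
public class NestedFutureTaskDemo {

    static FutureTask<Void> failingTask() {
        return new FutureTask<>(() -> {
            throw new RuntimeException("compaction failed");
        });
    }

    // true: get() on the directly-run task surfaces the exception.
    static boolean innerObservesException() {
        FutureTask<Void> inner = failingTask();
        inner.run();
        try { inner.get(); return false; } catch (Exception e) { return true; }
    }

    // false: FutureTask implements Runnable, so it can be wrapped again;
    // inner.run() returns normally even though the task failed, and the
    // outer task's status ends up "normal".
    static boolean outerObservesException() {
        FutureTask<Void> inner = failingTask();
        FutureTask<Void> outer = new FutureTask<>(inner, null);
        outer.run();
        try { outer.get(); return false; } catch (Exception e) { return true; }
    }

    public static void main(String[] args) {
        System.out.println("inner observes exception: " + innerObservesException());
        System.out.println("outer observes exception: " + outerObservesException());
    }
}
```

This is exactly the double-wrapping visible in the 2.2 call stack above (two {{FutureTask.run}} frames), and why the 2.1 stack, with a single {{FutureTask}}, still surfaces the error.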
[jira] [Commented] (CASSANDRA-13530) GroupCommitLogService
[ https://issues.apache.org/jira/browse/CASSANDRA-13530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16150661#comment-16150661 ]

Ariel Weisberg commented on CASSANDRA-13530:
--------------------------------------------

You are still testing batch at 2ms. I don't think that should hurt performance, but I would really like to see testing with it at the default value. If it's syncing extra times with smaller batches due to the 2ms setting, that would hurt performance.

My main question is: how many operations are in each batch when the batch and group commitlog services are syncing? Is it aggregating batches the way it is supposed to? Is it syncing more times per second and killing the throughput of the underlying device? Is the issue that the device is shared between data and the commit log, so it's better to have fewer, larger syncs?

Have you added a warmup phase to the testing so that everything is warmed up before you start measuring?

Can you modify each commit log service to log when each sync starts and completes, along with how many log entries are in each sync? When you log, have the log statements go to a dedicated thread via an unbounded blocking queue so they don't impact performance.

> GroupCommitLogService
> ---------------------
>
>                 Key: CASSANDRA-13530
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13530
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Yuji Ito
>            Assignee: Yuji Ito
>             Fix For: 2.2.x, 3.0.x, 3.11.x
>
>         Attachments: groupCommit22.patch, groupCommit30.patch,
> groupCommit3x.patch, groupCommitLog_noSerial_result.xlsx,
> groupCommitLog_result.xlsx, GuavaRequestThread.java, MicroRequestThread.java
>
>
> I propose a new CommitLogService, GroupCommitLogService, to improve the
> throughput when lots of requests are received.
> It improved the throughput by a maximum of 94%.
> I'd like to discuss this CommitLogService.
> Currently, we can select either of 2 CommitLog services: Periodic and Batch.
> In Periodic, we might lose some commit log which hasn't been written to the disk.
> In Batch, we can write commit log to the disk every time. The size of each
> commit log write is very small (< 4KB). Under high concurrency, these writes are
> gathered and persisted to the disk at once. But under insufficient
> concurrency, many small writes are issued and performance decreases due
> to the latency of the disk. Even if you use an SSD, processing many IO
> commands decreases performance.
> GroupCommitLogService writes several commit logs to the disk at once.
> The patch adds GroupCommitLogService (it is enabled by setting
> `commitlog_sync` and `commitlog_sync_group_window_in_ms` in cassandra.yaml).
> The only difference from Batch is the waiting on the semaphore.
> By waiting on the semaphore, several commit log writes are executed at the
> same time.
> In GroupCommitLogService, latency becomes worse if there is no
> concurrency.
> I measured the performance with my microbench (MicroRequestThread.java) by
> increasing the number of threads. The cluster has 3 nodes (replication factor:
> 3). Each node is an AWS EC2 m4.large instance + 200 IOPS io1 volume.
> The results are below. The GroupCommitLogService with a 10ms window improved
> UPDATE with Paxos by 94% and improved SELECT with Paxos by 76%.
> h6. SELECT / sec
> ||\# of threads||Batch 2ms||Group 10ms||
> |1|192|103|
> |2|163|212|
> |4|264|416|
> |8|454|800|
> |16|744|1311|
> |32|1151|1481|
> |64|1767|1844|
> |128|2949|3011|
> |256|4723|5000|
> h6. UPDATE / sec
> ||\# of threads||Batch 2ms||Group 10ms||
> |1|45|26|
> |2|39|51|
> |4|58|102|
> |8|102|198|
> |16|167|213|
> |32|289|295|
> |64|544|548|
> |128|1046|1058|
> |256|2020|2061|
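The group-commit idea in the proposal above can be sketched in a few dozen lines. This is a hypothetical illustration, not Yuji's actual patch: writer threads enqueue an entry and block until the next group sync, so a single flush covers many small writes, and latency is traded for fewer, larger syncs:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of group commit: many writers, one sync per window.
public class GroupCommitSketch {
    private final Object lock = new Object();
    private List<byte[]> pending = new ArrayList<>();
    private CountDownLatch groupSynced = new CountDownLatch(1);
    private final AtomicInteger syncCount = new AtomicInteger();

    // Writer threads: enqueue, then wait for the sync covering this write.
    void append(byte[] entry) throws InterruptedException {
        CountDownLatch myGroup;
        synchronized (lock) {
            pending.add(entry);
            myGroup = groupSynced;
        }
        myGroup.await(); // durable once the covering group sync completes
    }

    int pendingCount() {
        synchronized (lock) { return pending.size(); }
    }

    // Flusher thread: called once per sync-window (e.g. every 10ms).
    void syncGroup() {
        CountDownLatch done;
        synchronized (lock) {
            if (pending.isEmpty()) return;
            pending = new ArrayList<>();         // batch taken for writing
            done = groupSynced;
            groupSynced = new CountDownLatch(1); // next group starts here
        }
        syncCount.incrementAndGet();             // one fsync for the whole batch
        done.countDown();                        // release all covered writers
    }

    // Two concurrent writers covered by one group sync; returns sync count.
    static int demo() {
        GroupCommitSketch log = new GroupCommitSketch();
        Runnable writer = () -> {
            try { log.append(new byte[8]); }
            catch (InterruptedException e) { throw new RuntimeException(e); }
        };
        Thread w1 = new Thread(writer), w2 = new Thread(writer);
        w1.start(); w2.start();
        try {
            while (log.pendingCount() < 2) Thread.sleep(1); // wait for both
            log.syncGroup();
            w1.join(); w2.join();
        } catch (InterruptedException e) { throw new RuntimeException(e); }
        return log.syncCount.get();
    }

    public static void main(String[] args) {
        System.out.println("group syncs for 2 concurrent writes: " + demo());
    }
}
```

With no concurrency, a lone writer still waits out the window before its sync, which is the single-thread latency regression visible in the tables above.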
[jira] [Updated] (CASSANDRA-13838) Ensure FastThreadLocal.removeAll() is called for all threads
[ https://issues.apache.org/jira/browse/CASSANDRA-13838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Stupp updated CASSANDRA-13838:
-------------------------------------
    Status: Patch Available  (was: Open)

> Ensure FastThreadLocal.removeAll() is called for all threads
> ------------------------------------------------------------
>
>                 Key: CASSANDRA-13838
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13838
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Robert Stupp
>            Assignee: Robert Stupp
>
> There are a couple of places where it's not guaranteed that
> FastThreadLocal.removeAll() is called. Most misses are actually not that
> critical, but the miss for the thread created in
> org.apache.cassandra.streaming.ConnectionHandler.MessageHandler#start(java.net.Socket,
> int, boolean) could be critical, because these threads are created for every
> stream-session.
> (Follow-up from CASSANDRA-13754)
[jira] [Commented] (CASSANDRA-13838) Ensure FastThreadLocal.removeAll() is called for all threads
[ https://issues.apache.org/jira/browse/CASSANDRA-13838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16150634#comment-16150634 ]

Robert Stupp commented on CASSANDRA-13838:
------------------------------------------

||cassandra-3.11|[branch|https://github.com/apache/cassandra/compare/cassandra-3.11...snazy:13838-ftl-ensure-3.11]
||trunk|[branch|https://github.com/apache/cassandra/compare/trunk...snazy:13838-ftl-ensure-trunk]

> Ensure FastThreadLocal.removeAll() is called for all threads
> ------------------------------------------------------------
>
>                 Key: CASSANDRA-13838
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13838
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Robert Stupp
>            Assignee: Robert Stupp
>
> There are a couple of places where it's not guaranteed that
> FastThreadLocal.removeAll() is called. Most misses are actually not that
> critical, but the miss for the thread created in
> org.apache.cassandra.streaming.ConnectionHandler.MessageHandler#start(java.net.Socket,
> int, boolean) could be critical, because these threads are created for every
> stream-session.
> (Follow-up from CASSANDRA-13754)
[jira] [Commented] (CASSANDRA-13754) FastThreadLocal leaks memory
[ https://issues.apache.org/jira/browse/CASSANDRA-13754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16150611#comment-16150611 ] Jeremiah Jordan commented on CASSANDRA-13754: - and +1 for the patch. > FastThreadLocal leaks memory > > > Key: CASSANDRA-13754 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13754 > Project: Cassandra > Issue Type: Bug > Components: Core > Environment: Cassandra 3.11.0, Netty 4.0.44.Final, OpenJDK 8u141-b15 >Reporter: Eric Evans >Assignee: Robert Stupp > Fix For: 3.11.1 > > > After a chronic bout of {{OutOfMemoryError}} in our development environment, > a heap analysis is showing that more than 10G of our 12G heaps are consumed > by the {{threadLocals}} members (instances of {{java.lang.ThreadLocalMap}}) > of various {{io.netty.util.concurrent.FastThreadLocalThread}} instances. > Reverting > [cecbe17|https://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=commit;h=cecbe17e3eafc052acc13950494f7dddf026aa54] > fixes the issue. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-13754) FastThreadLocal leaks memory
[ https://issues.apache.org/jira/browse/CASSANDRA-13754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeremiah Jordan updated CASSANDRA-13754: Status: Ready to Commit (was: Patch Available) > FastThreadLocal leaks memory > > > Key: CASSANDRA-13754 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13754 > Project: Cassandra > Issue Type: Bug > Components: Core > Environment: Cassandra 3.11.0, Netty 4.0.44.Final, OpenJDK 8u141-b15 >Reporter: Eric Evans >Assignee: Robert Stupp > Fix For: 3.11.1 > > > After a chronic bout of {{OutOfMemoryError}} in our development environment, > a heap analysis is showing that more than 10G of our 12G heaps are consumed > by the {{threadLocals}} members (instances of {{java.lang.ThreadLocalMap}}) > of various {{io.netty.util.concurrent.FastThreadLocalThread}} instances. > Reverting > [cecbe17|https://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=commit;h=cecbe17e3eafc052acc13950494f7dddf026aa54] > fixes the issue. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13754) FastThreadLocal leaks memory
[ https://issues.apache.org/jira/browse/CASSANDRA-13754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16150609#comment-16150609 ] Jeremiah Jordan commented on CASSANDRA-13754: - +1 for just https://github.com/apache/cassandra/commit/2cafd0b6b4bbc5a6ec5726d47d0093bdac3af19c to fix this and splitting out the other changes to a new ticket. > FastThreadLocal leaks memory > > > Key: CASSANDRA-13754 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13754 > Project: Cassandra > Issue Type: Bug > Components: Core > Environment: Cassandra 3.11.0, Netty 4.0.44.Final, OpenJDK 8u141-b15 >Reporter: Eric Evans >Assignee: Robert Stupp > Fix For: 3.11.1 > > > After a chronic bout of {{OutOfMemoryError}} in our development environment, > a heap analysis is showing that more than 10G of our 12G heaps are consumed > by the {{threadLocals}} members (instances of {{java.lang.ThreadLocalMap}}) > of various {{io.netty.util.concurrent.FastThreadLocalThread}} instances. > Reverting > [cecbe17|https://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=commit;h=cecbe17e3eafc052acc13950494f7dddf026aa54] > fixes the issue. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-13754) FastThreadLocal leaks memory
[ https://issues.apache.org/jira/browse/CASSANDRA-13754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Stupp updated CASSANDRA-13754:
-------------------------------------
    Reviewer: Jeremiah Jordan
      Status: Patch Available  (was: In Progress)

Given that the FTL changes apparently have no influence on the OOM issue, but look serious enough to fix, I've split them out into CASSANDRA-13838. The patch for this ticket is reduced to the BTree change. CI looks good.

> FastThreadLocal leaks memory
> ----------------------------
>
>                 Key: CASSANDRA-13754
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13754
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>         Environment: Cassandra 3.11.0, Netty 4.0.44.Final, OpenJDK 8u141-b15
>            Reporter: Eric Evans
>            Assignee: Robert Stupp
>             Fix For: 3.11.1
>
>
> After a chronic bout of {{OutOfMemoryError}} in our development environment,
> a heap analysis is showing that more than 10G of our 12G heaps are consumed
> by the {{threadLocals}} members (instances of {{java.lang.ThreadLocalMap}})
> of various {{io.netty.util.concurrent.FastThreadLocalThread}} instances.
> Reverting
> [cecbe17|https://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=commit;h=cecbe17e3eafc052acc13950494f7dddf026aa54]
> fixes the issue.
[jira] [Commented] (CASSANDRA-13734) BufferUnderflowException when using uppercase UUID
[ https://issues.apache.org/jira/browse/CASSANDRA-13734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16150585#comment-16150585 ]

Claudia S commented on CASSANDRA-13734:
---------------------------------------

Hi,

Following are the schemas that were used in my tests. We also have more versions of the table using more complex clustering keys; let me know if these could have an impact and you need them as well.
{code}
CREATE TYPE IF NOT EXISTS event_log_system.vivates_participant (
    id text,
    global_role text,
    local_role text
);

CREATE TABLE IF NOT EXISTS event_log_system.event (
    id uuid,
    version text,
    created_at timestamp,
    event_type text,
    source_id text,
    session_id text,
    session_type text,
    business_process_id text,
    action text,
    action_outcome text,
    user frozen ,
    hp_id text,
    patient_id text,
    participants frozen,
    message text,
    extensions map ,
    PRIMARY KEY (id)
);

CREATE TABLE IF NOT EXISTS event_log_system.event_by_patient_timestamp (
    id uuid,
    version text,
    created_at timestamp,
    event_type text,
    source_id text,
    session_id text,
    session_type text,
    business_process_id text,
    action text,
    action_outcome text,
    user frozen ,
    hp_id text,
    patient_id text,
    participants frozen ,
    message text,
    extensions map ,
    PRIMARY KEY (patient_id, created_at, id)
) WITH CLUSTERING ORDER BY (created_at DESC);
{code}
And the queries we do (in some cases we also execute the same queries without using JSON):
{code}
SELECT JSON * FROM event WHERE id = ?

SELECT JSON * FROM event_by_patient_timestamp WHERE patient_id = ? AND created_at < ? LIMIT ?;

SELECT JSON COUNT(*) FROM event_by_patient_timestamp WHERE patient_id = ?
AND created_at > ?; {code} > BufferUnderflowException when using uppercase UUID > -- > > Key: CASSANDRA-13734 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13734 > Project: Cassandra > Issue Type: Bug > Environment: Cassandra 2.2.8 running on OSX 10.12.5 > * org.apache.cassandra:cassandra-all:jar:2.2.8 > * com.datastax.cassandra:cassandra-driver-core:jar:3.0.0 > * org.apache.cassandra:cassandra-thrift:jar:2.2.8 >Reporter: Claudia S > > We have a table with a primary key of type uuid which we query for results in > JSON format. When I accidentally caused a query passing a UUID which has an > uppercase letter I noticed that this causes a BufferUnderflowException on > Cassandra. > I directly attempted the queries using cqlsh, I can retrieve the entry using > standard select but whenever I pass JSON I get a BufferUnderflowException. > {code:title=cql queries} > cassandra@cqlsh:event_log_system> SELECT * FROM event WHERE id = > 559a4d83-9410-4b69-b459-566b8cf57aaa; > [RESULT REMOVED] > (1 rows) > cassandra@cqlsh:event_log_system> SELECT * FROM event WHERE id = > 559a4d83-9410-4b69-b459-566b8cf57AAA; > [RESULT REMOVED] > (1 rows) > cassandra@cqlsh:event_log_system> SELECT JSON * FROM event WHERE id = > 559a4d83-9410-4b69-b459-566b8cf57AAA; > ServerError: java.nio.BufferUnderflowException > cassandra@cqlsh:event_log_system> SELECT JSON * FROM event WHERE id = > 559a4d83-9410-4b69-b459-566b8cf57aaa; > ServerError: java.nio.BufferUnderflowException > {code} > {code:title=log} > TRACE [SharedPool-Worker-1] 2017-07-28 20:40:41,392 Message.java:506 - > Received: QUERY SELECT JSON * FROM event WHERE id = > 559a4d83-9410-4b69-b459-566b8cf57AAA;, v=4 > TRACE [SharedPool-Worker-1] 2017-07-28 20:40:41,392 QueryProcessor.java:221 - > Process org.apache.cassandra.cql3.statements.SelectStatement@67e6c0c @CL.ONE > TRACE [SharedPool-Worker-1] 2017-07-28 20:40:41,392 ReadCallback.java:76 - > Blockfor is 1; setting up requests to localhost/127.0.0.1 > TRACE [SharedPool-Worker-1] 
2017-07-28 20:40:41,393 > AbstractReadExecutor.java:118 - reading data locally > TRACE [SharedPool-Worker-2] 2017-07-28 20:40:41,393 SliceQueryFilter.java:269 > - collecting 0 of 2147483647: :false:0@150126701983 > TRACE [SharedPool-Worker-2] 2017-07-28 20:40:41,393 SliceQueryFilter.java:269 > - collecting 1 of 2147483647: can_login:false:1@150126701983 > TRACE [SharedPool-Worker-2] 2017-07-28 20:40:41,393 SliceQueryFilter.java:269 > - collecting 1 of 2147483647: is_superuser:false:1@150126701983 > TRACE [SharedPool-Worker-2] 2017-07-28 20:40:41,393 SliceQueryFilter.java:269 > - collecting 1 of 2147483647: salted_hash:false:60@150126701983 > TRACE [SharedPool-Worker-1] 2017-07-28 20:40:41,393
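One observation on the report above: {{java.util.UUID}} itself parses uppercase and lowercase hex to the same value, and the quoted cqlsh session shows the lowercase form failing under {{SELECT JSON}} as well, which points at the JSON serialization path rather than UUID parsing. A quick standalone check:

```java
import java.util.UUID;

// java.util.UUID.fromString() is case-insensitive for hex digits, so the
// uppercase and lowercase forms of the key map to the same 128-bit value.
public class UuidCaseDemo {
    public static void main(String[] args) {
        UUID lower = UUID.fromString("559a4d83-9410-4b69-b459-566b8cf57aaa");
        UUID upper = UUID.fromString("559A4D83-9410-4B69-B459-566B8CF57AAA");
        System.out.println("same value: " + lower.equals(upper)); // true
    }
}
```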
[jira] [Created] (CASSANDRA-13838) Ensure FastThreadLocal.removeAll() is called for all threads
Robert Stupp created CASSANDRA-13838:
----------------------------------------

             Summary: Ensure FastThreadLocal.removeAll() is called for all threads
                 Key: CASSANDRA-13838
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13838
             Project: Cassandra
          Issue Type: Improvement
            Reporter: Robert Stupp
            Assignee: Robert Stupp

There are a couple of places where it's not guaranteed that FastThreadLocal.removeAll() is called. Most misses are actually not that critical, but the miss for the thread created in org.apache.cassandra.streaming.ConnectionHandler.MessageHandler#start(java.net.Socket, int, boolean) could be critical, because these threads are created for every stream-session.

(Follow-up from CASSANDRA-13754)
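The usual fix for this class of miss is to wrap each manually created thread's {{Runnable}} so cleanup runs in a {{finally}} block. A stdlib-only sketch of the pattern, with a plain cleanup hook standing in for Netty's {{FastThreadLocal.removeAll()}} ({{withCleanup}} is an illustrative name, not Cassandra's API):

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Wrap-and-clean pattern: the cleanup hook runs even if the task throws,
// so per-thread state is always released when the thread body exits.
public class ThreadCleanupDemo {

    static Runnable withCleanup(Runnable task, Runnable cleanup) {
        return () -> {
            try {
                task.run();
            } finally {
                cleanup.run(); // always release per-thread state
            }
        };
    }

    // Returns true iff cleanup ran despite the task throwing.
    static boolean demoCleanupRuns() {
        AtomicBoolean cleaned = new AtomicBoolean(false);
        Thread t = new Thread(withCleanup(
                () -> { throw new RuntimeException("stream session died"); },
                () -> cleaned.set(true)));
        t.setUncaughtExceptionHandler((th, e) -> { /* swallow for the demo */ });
        t.start();
        try { t.join(); } catch (InterruptedException e) { throw new RuntimeException(e); }
        return cleaned.get();
    }

    public static void main(String[] args) {
        System.out.println("cleanup ran: " + demoCleanupRuns()); // true
    }
}
```

Applying this at every point where a raw {{Thread}} is created (such as the per-stream-session handler threads mentioned above) guarantees the thread-local maps are dropped when the thread dies.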
[jira] [Resolved] (CASSANDRA-13836) dtest failure: snapshot_test.py:TestSnapshot.test_basic_snapshot_and_restore
[ https://issues.apache.org/jira/browse/CASSANDRA-13836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcus Eriksson resolved CASSANDRA-13836. - Resolution: Fixed Fix Version/s: 4.0 committed as {{fb0e0019e76eb96659904}} > dtest failure: snapshot_test.py:TestSnapshot.test_basic_snapshot_and_restore > > > Key: CASSANDRA-13836 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13836 > Project: Cassandra > Issue Type: Bug >Reporter: Marcus Eriksson >Assignee: Marcus Eriksson > Fix For: 4.0 > > > Looks like sstableloader always tries to use SSL since CASSANDRA-12229 and > that makes the dtest hang -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13780) ADD Node streaming throughput performance
[ https://issues.apache.org/jira/browse/CASSANDRA-13780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16150566#comment-16150566 ]

Kevin Rivait commented on CASSANDRA-13780:
------------------------------------------

Thank you Jeff. Regarding TWCS, agreed this is a non-issue. We went back and dumped the SSTable max/min dates and verified that the buckets older than TTL and GCGS are in fact being dropped.

Regarding adding a DC, that is exactly what we did in our DEV/TEST environment, but we rebuilt nodes one at a time; we weren't sure of the consequences (if any) of rebuilding all nodes at the same time.

> ADD Node streaming throughput performance
> -----------------------------------------
>
>                 Key: CASSANDRA-13780
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13780
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>         Environment: Linux 2.6.32-696.3.2.el6.x86_64 #1 SMP Mon Jun 19 11:55:55 PDT 2017 x86_64 x86_64 x86_64 GNU/Linux
> Architecture:          x86_64
> CPU op-mode(s):        32-bit, 64-bit
> Byte Order:            Little Endian
> CPU(s):                40
> On-line CPU(s) list:   0-39
> Thread(s) per core:    2
> Core(s) per socket:    10
> Socket(s):             2
> NUMA node(s):          2
> Vendor ID:             GenuineIntel
> CPU family:            6
> Model:                 79
> Model name:            Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz
> Stepping:              1
> CPU MHz:               2199.869
> BogoMIPS:              4399.36
> Virtualization:        VT-x
> L1d cache:             32K
> L1i cache:             32K
> L2 cache:              256K
> L3 cache:              25600K
> NUMA node0 CPU(s):     0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38
> NUMA node1 CPU(s):     1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39
>              total  used  free  shared  buffers  cached
> Mem:          252G  217G   34G    708K    308M    149G
> -/+ buffers/cache:   67G  185G
> Swap:          16G    0B   16G
>            Reporter: Kevin Rivait
>             Fix For: 3.0.9
>
>
> Problem: Adding a new node to a large cluster runs at least 1000x slower than
> what the network and node hardware capacity can support, taking several days
> per new node. Adjusting stream throughput and other YAML parameters seems to
> have no effect on performance.
Essentially, it appears that Cassandra has an
> architectural scalability problem when adding new nodes to a moderate-
> to high-ingestion cluster, because Cassandra cannot add new node capacity
> fast enough to keep up with increasing data ingestion volumes and growth.
> Initial Configuration:
> Running 3.0.9 and have implemented TWCS on one of our largest tables.
> The largest table is partitioned on (ID, MM) using 1-day buckets with a TTL of
> 60 days.
> The next release will change partitioning to (ID, MMDD) so that partitions
> are aligned with daily TWCS buckets.
> Each node is currently creating roughly a 30GB SSTable per day.
> TWCS is working as expected; daily SSTables are dropping off after 70
> days (60 + 10-day grace).
> The current deployment is a 28-node, 2-datacenter cluster, 14 nodes in each DC,
> replication factor 3.
> Data directories are backed with 4 2TB SSDs on each node and 1 800GB SSD
> for commit logs.
> The requirement is to double cluster size, capacity, and ingestion volume within
> a few weeks.
> Observed Behavior:
> 1. Streaming throughput during add node: we observed a maximum of 6 Mb/s
> streaming from each of the 14 nodes on a 20Gb/s switched network, taking at
> least 106 hours for each node to join the cluster, and each node is only about 2.2
> TB in size.
> 2. Compaction on the newly added node: compaction has fallen behind, with
> anywhere from 4,000 to 10,000 SSTables at any given time. It took 3 weeks
> for compaction to finish on each newly added node. Increasing the number of
> compaction threads to match the number of CPUs (40) and increasing compaction
> throughput to 32MB/s seemed to be the sweet spot.
> 3. TWCS buckets on the new node: data was streamed to this node over 4 1/2 days.
> Compaction correctly placed the data in daily files, but the problem is the
> file dates reflect when compaction created the file and not the date of the
> last record written in the TWCS bucket, which will cause the files to remain
> around much longer than necessary.
> Two Questions: > 1. What can be done to substantially improve the performance of adding a new > node? > 2. Can compaction on TWCS partitions for newly added nodes change the file > create date to match the highest date record in the file -or- add another > piece of meta-data to the TWCS files that reflect the file drop date so that > TWCS partitions can be dropped consistently? -- This message was sent by Atlassian JIRA (v6.4.14#64029)
cassandra git commit: Set encryptionOptions to null if encryption is disabled in BulkLoadConnectionFactory
Repository: cassandra Updated Branches: refs/heads/trunk 8ed41fbc5 -> fb0e0019e Set encryptionOptions to null if encryption is disabled in BulkLoadConnectionFactory Patch by marcuse; reviewed by Jason Brown for CASSANDRA-13836 Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/fb0e0019 Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/fb0e0019 Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/fb0e0019 Branch: refs/heads/trunk Commit: fb0e0019e76eb96659904b9161f06b600718e704 Parents: 8ed41fb Author: Marcus ErikssonAuthored: Fri Sep 1 15:03:05 2017 +0200 Committer: Marcus Eriksson Committed: Fri Sep 1 15:55:06 2017 +0200 -- .../org/apache/cassandra/tools/BulkLoadConnectionFactory.java| 4 +++- src/java/org/apache/cassandra/tools/BulkLoader.java | 2 +- 2 files changed, 4 insertions(+), 2 deletions(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/fb0e0019/src/java/org/apache/cassandra/tools/BulkLoadConnectionFactory.java -- diff --git a/src/java/org/apache/cassandra/tools/BulkLoadConnectionFactory.java b/src/java/org/apache/cassandra/tools/BulkLoadConnectionFactory.java index d119081..b56d292 100644 --- a/src/java/org/apache/cassandra/tools/BulkLoadConnectionFactory.java +++ b/src/java/org/apache/cassandra/tools/BulkLoadConnectionFactory.java @@ -38,7 +38,9 @@ public class BulkLoadConnectionFactory extends DefaultConnectionFactory implemen { this.storagePort = storagePort; this.secureStoragePort = secureStoragePort; -this.encryptionOptions = encryptionOptions; +this.encryptionOptions = encryptionOptions != null && encryptionOptions.internode_encryption == EncryptionOptions.ServerEncryptionOptions.InternodeEncryption.none + ? 
null + : encryptionOptions; this.outboundBindAny = outboundBindAny; } http://git-wip-us.apache.org/repos/asf/cassandra/blob/fb0e0019/src/java/org/apache/cassandra/tools/BulkLoader.java -- diff --git a/src/java/org/apache/cassandra/tools/BulkLoader.java b/src/java/org/apache/cassandra/tools/BulkLoader.java index 0f1c555..01d8c33 100644 --- a/src/java/org/apache/cassandra/tools/BulkLoader.java +++ b/src/java/org/apache/cassandra/tools/BulkLoader.java @@ -104,7 +104,7 @@ public class BulkLoader // Give sockets time to gracefully close Thread.sleep(1000); -// System.exit(0); // We need that to stop non daemonized threads +System.exit(0); // We need that to stop non daemonized threads } catch (Exception e) { - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16150559#comment-16150559 ] Romain GERARD edited comment on CASSANDRA-13418 at 9/1/17 1:56 PM: --- Don't worry [~michaelsembwever], I am currently working on an issue with couchbase so I couldn't have checked it until monday. So no hard feeling :) P.s: https://github.com/thelastpickle/cassandra/commit/58440e707cd6490847a37dc8d76c150d3eb27aab#diff-e8e282423dcbf34d30a3578c8dec15cdR176 still think is less clear to inline it. was (Author: rgerard): Don't worry [~michaelsembwever], I am currently working with an issue on couchbase so I couldn't have checked it until monday. So no hard feeling :) P.s: https://github.com/thelastpickle/cassandra/commit/58440e707cd6490847a37dc8d76c150d3eb27aab#diff-e8e282423dcbf34d30a3578c8dec15cdR176 still think is less clear to inline it. > Allow TWCS to ignore overlaps when dropping fully expired sstables > -- > > Key: CASSANDRA-13418 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13418 > Project: Cassandra > Issue Type: Improvement > Components: Compaction >Reporter: Corentin Chary >Assignee: Romain GERARD > Labels: twcs > Fix For: 3.11.x, 4.x > > Attachments: twcs-cleanup.png > > > http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If > you really want read-repairs you're going to have sstables blocking the > expiration of other fully expired SSTables because they overlap. > You can set unchecked_tombstone_compaction = true or tombstone_threshold to a > very low value and that will purge the blockers of old data that should > already have expired, thus removing the overlaps and allowing the other > SSTables to expire. > The thing is that this is rather CPU intensive and not optimal. If you have > time series, you might not care if all your data doesn't exactly expire at > the right time, or if data re-appears for some time, as long as it gets > deleted as soon as it can. 
And in this situation I believe it would be really > beneficial to allow users to simply ignore overlapping SSTables when looking > for fully expired ones. > To the question: why would you need read-repairs? > - Full repairs basically take longer than the TTL of the data on my dataset, > so this isn't really effective. > - Even with a 10% chance of doing a repair, we found out that this would be > enough to greatly reduce entropy of the most used data (and if you have > timeseries, you're likely to have a dashboard doing the same important > queries over and over again). > - LOCAL_QUORUM is too expensive (need >3 replicas), QUORUM is too slow. > I'll try to come up with a patch demonstrating how this would work, try it on > our system and report the effects. > cc: [~adejanovski], [~rgerard] as I know you worked on similar issues already. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
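For reference, the interim workaround described in the issue (setting unchecked_tombstone_compaction = true or a very low tombstone_threshold to purge the overlapping "blocker" sstables) maps to TWCS compaction sub-options. A hedged sketch only — the keyspace/table name, window settings, and threshold value are illustrative, not taken from the ticket:

```sql
ALTER TABLE ks.timeseries WITH compaction = {
  'class': 'TimeWindowCompactionStrategy',
  'compaction_window_unit': 'DAYS',          -- illustrative window
  'compaction_window_size': '1',
  'unchecked_tombstone_compaction': 'true',  -- purge overlapping blockers
  'tombstone_threshold': '0.05'              -- the "very low value" mentioned above
};
```

As the description notes, this is CPU-intensive; the patch under review instead lets TWCS skip the overlap check entirely when hunting for fully expired sstables.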
[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16150559#comment-16150559 ] Romain GERARD edited comment on CASSANDRA-13418 at 9/1/17 1:51 PM: --- Don't worry [~michaelsembwever], I am currently working with an issue on couchbase so I couldn't have checked it until monday. So no hard feeling :) P.s: https://github.com/thelastpickle/cassandra/commit/58440e707cd6490847a37dc8d76c150d3eb27aab#diff-e8e282423dcbf34d30a3578c8dec15cdR176 still think is less clear to inline it. was (Author: rgerard): Don't worry [~mck], I am currently working with an issue on couchbase so I couldn't have checked it until monday. So no hard feeling :) P.s: https://github.com/thelastpickle/cassandra/commit/58440e707cd6490847a37dc8d76c150d3eb27aab#diff-e8e282423dcbf34d30a3578c8dec15cdR176 still think is less clear to inline it.
[jira] [Comment Edited] (CASSANDRA-13836) dtest failure: snapshot_test.py:TestSnapshot.test_basic_snapshot_and_restore
[ https://issues.apache.org/jira/browse/CASSANDRA-13836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16150556#comment-16150556 ] Jason Brown edited comment on CASSANDRA-13836 at 9/1/17 1:51 PM: - [~krummas] and I discussed offline, and we'll commit his current patch as-is, and open a new ticket for the daemon threads issue. This way we can unblock dtests on trunk. UPDATE: created CASSANDRA-13837 was (Author: jasobrown): [~krummas] and I discussed offline, and we'll commit his current patch as-is, and open a new ticket for the daemon threads issue. This way we can unblock dtests on trunk. > dtest failure: snapshot_test.py:TestSnapshot.test_basic_snapshot_and_restore > > > Key: CASSANDRA-13836 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13836 > Project: Cassandra > Issue Type: Bug >Reporter: Marcus Eriksson >Assignee: Marcus Eriksson > > Looks like sstableloader always tries to use SSL since CASSANDRA-12229 and > that makes the dtest hang
[jira] [Commented] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16150559#comment-16150559 ] Romain GERARD commented on CASSANDRA-13418: --- Don't worry [~mck], I am currently working with an issue on couchbase so I couldn't have checked it until monday. So no hard feeling :) P.s: https://github.com/thelastpickle/cassandra/commit/58440e707cd6490847a37dc8d76c150d3eb27aab#diff-e8e282423dcbf34d30a3578c8dec15cdR176 still think is less clear to inline it.
[jira] [Created] (CASSANDRA-13837) Hanging threads in BulkLoader
Jason Brown created CASSANDRA-13837: --- Summary: Hanging threads in BulkLoader Key: CASSANDRA-13837 URL: https://issues.apache.org/jira/browse/CASSANDRA-13837 Project: Cassandra Issue Type: Bug Reporter: Jason Brown Assignee: Jason Brown Priority: Minor [~krummas] discovered some threads that were not closing correctly when he fixed CASSANDRA-13836. We suspect this is due to CASSANDRA-8457/CASSANDRA-12229.
[jira] [Commented] (CASSANDRA-13836) dtest failure: snapshot_test.py:TestSnapshot.test_basic_snapshot_and_restore
[ https://issues.apache.org/jira/browse/CASSANDRA-13836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16150556#comment-16150556 ] Jason Brown commented on CASSANDRA-13836: - [~krummas] and I discussed offline, and we'll commit his current patch as-is, and open a new ticket for the daemon threads issue. This way we can unblock dtests on trunk.
[jira] [Commented] (CASSANDRA-13836) dtest failure: snapshot_test.py:TestSnapshot.test_basic_snapshot_and_restore
[ https://issues.apache.org/jira/browse/CASSANDRA-13836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16150518#comment-16150518 ] Jason Brown commented on CASSANDRA-13836: - Do we need to uncomment the {{System.exit()}} in {{BulkLoader}}? That was not one of the changes from CASSANDRA-12229, but from CASSANDRA-10637 (1.5 years ago). Otherwise +1
[jira] [Commented] (CASSANDRA-13836) dtest failure: snapshot_test.py:TestSnapshot.test_basic_snapshot_and_restore
[ https://issues.apache.org/jira/browse/CASSANDRA-13836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16150520#comment-16150520 ] Marcus Eriksson commented on CASSANDRA-13836: - I'm guessing CASSANDRA-12229 or CASSANDRA-8457 introduced some non-daemon threads so that we need the System.exit again? sstableloader exited fine before those two went in, but hangs afterwards
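Marcus's hypothesis above — non-daemon threads keeping the process alive after main returns — is the classic JVM exit hang, and it is why {{BulkLoader}} needs an explicit {{System.exit()}}. A minimal Python analogy of the same mechanism, as a sketch only (the names are invented for illustration; the actual fix under discussion is in Java):

```python
import threading

def worker(stop_event: threading.Event) -> None:
    # Stands in for a lingering library I/O thread, like the
    # streaming threads sstableloader is suspected to leave behind.
    while not stop_event.wait(0.05):
        pass  # keep running until told to stop

stop = threading.Event()
# daemon=False mirrors a non-daemon JVM thread: the process cannot
# exit normally while it is still running.
t = threading.Thread(target=worker, args=(stop,), daemon=False)
t.start()
assert t.is_alive()

# Without this explicit shutdown (the moral equivalent of the
# System.exit() call), the interpreter would wait forever on join.
stop.set()
t.join(timeout=1.0)
assert not t.is_alive()
```

The alternative fix, of course, is to mark such threads as daemon threads so that process exit is never blocked in the first place.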
[jira] [Assigned] (CASSANDRA-13836) dtest failure: snapshot_test.py:TestSnapshot.test_basic_snapshot_and_restore
[ https://issues.apache.org/jira/browse/CASSANDRA-13836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Brown reassigned CASSANDRA-13836: --- Assignee: Marcus Eriksson (was: Jason Brown)
[jira] [Updated] (CASSANDRA-13836) dtest failure: snapshot_test.py:TestSnapshot.test_basic_snapshot_and_restore
[ https://issues.apache.org/jira/browse/CASSANDRA-13836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Brown updated CASSANDRA-13836: Reviewer: Jason Brown
[jira] [Commented] (CASSANDRA-13836) dtest failure: snapshot_test.py:TestSnapshot.test_basic_snapshot_and_restore
[ https://issues.apache.org/jira/browse/CASSANDRA-13836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16150474#comment-16150474 ] Marcus Eriksson commented on CASSANDRA-13836: - https://github.com/krummas/cassandra/commits/marcuse/13836 It also seems we now need the System.exit(0) there again; we should probably have a look at that as well. [~jasobrown] could you review?
[jira] [Assigned] (CASSANDRA-13836) dtest failure: snapshot_test.py:TestSnapshot.test_basic_snapshot_and_restore
[ https://issues.apache.org/jira/browse/CASSANDRA-13836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Brown reassigned CASSANDRA-13836: --- Assignee: Jason Brown
[jira] [Created] (CASSANDRA-13836) dtest failure: snapshot_test.py:TestSnapshot.test_basic_snapshot_and_restore
Marcus Eriksson created CASSANDRA-13836: --- Summary: dtest failure: snapshot_test.py:TestSnapshot.test_basic_snapshot_and_restore Key: CASSANDRA-13836 URL: https://issues.apache.org/jira/browse/CASSANDRA-13836 Project: Cassandra Issue Type: Bug Reporter: Marcus Eriksson Looks like sstableloader always tries to use SSL since CASSANDRA-12229 and that makes the dtest hang
[jira] [Comment Edited] (CASSANDRA-13339) java.nio.BufferOverflowException: null
[ https://issues.apache.org/jira/browse/CASSANDRA-13339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16150412#comment-16150412 ] Jason Brown edited comment on CASSANDRA-13339 at 9/1/17 12:56 PM: -- [~theochu] Thanks for the additional data points. Are you using counters? Can you copy the stack trace, as well, into this ticket? Basically, I'm trying to get enough data so I can reproduce this bug - because then I can actually fix it. was (Author: jasobrown): [~theochu] Thanks for the additional data points. Are you using counters? Can you copy the stack trace, as well, into this ticket? > java.nio.BufferOverflowException: null > -- > > Key: CASSANDRA-13339 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13339 > Project: Cassandra > Issue Type: Bug > Components: Core >Reporter: Chris Richards > > I'm seeing the following exception running Cassandra 3.9 (with Netty updated > to 4.1.8.Final) running on a 2 node cluster. It would have been processing > around 50 queries/second at the time (mixture of > inserts/updates/selects/deletes) : there's a collection of tables (some with > counters some without) and a single materialized view.
> {code} > ERROR [MutationStage-4] 2017-03-15 22:50:33,052 StorageProxy.java:1353 - > Failed to apply mutation locally : {} > java.nio.BufferOverflowException: null > at > org.apache.cassandra.io.util.DataOutputBufferFixed.doFlush(DataOutputBufferFixed.java:52) > ~[apache-cassandra-3.9.jar:3.9] > at > org.apache.cassandra.io.util.BufferedDataOutputStreamPlus.write(BufferedDataOutputStreamPlus.java:132) > ~[apache-cassandra-3.9.jar:3.9] > at > org.apache.cassandra.io.util.BufferedDataOutputStreamPlus.writeUnsignedVInt(BufferedDataOutputStreamPlus.java:262) > ~[apache-cassandra-3.9.jar:3.9] > at > org.apache.cassandra.db.rows.EncodingStats$Serializer.serialize(EncodingStats.java:233) > ~[apache-cassandra-3.9.jar:3.9] > at > org.apache.cassandra.db.SerializationHeader$Serializer.serializeForMessaging(SerializationHeader.java:380) > ~[apache-cassandra-3.9.jar:3.9] > at > org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:122) > ~[apache-cassandra-3.9.jar:3.9] > at > org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:89) > ~[apache-cassandra-3.9.jar:3.9] > at > org.apache.cassandra.db.partitions.PartitionUpdate$PartitionUpdateSerializer.serialize(PartitionUpdate.java:790) > ~[apache-cassandra-3.9.jar:3.9] > at > org.apache.cassandra.db.Mutation$MutationSerializer.serialize(Mutation.java:393) > ~[apache-cassandra-3.9.jar:3.9] > at org.apache.cassandra.db.commitlog.CommitLog.add(CommitLog.java:279) > ~[apache-cassandra-3.9.jar:3.9] > at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:493) > ~[apache-cassandra-3.9.jar:3.9] > at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:396) > ~[apache-cassandra-3.9.jar:3.9] > at org.apache.cassandra.db.Mutation.applyFuture(Mutation.java:215) > ~[apache-cassandra-3.9.jar:3.9] > at org.apache.cassandra.db.Mutation.apply(Mutation.java:227) > ~[apache-cassandra-3.9.jar:3.9] > at 
org.apache.cassandra.db.Mutation.apply(Mutation.java:241) > ~[apache-cassandra-3.9.jar:3.9] > at > org.apache.cassandra.service.StorageProxy$8.runMayThrow(StorageProxy.java:1347) > ~[apache-cassandra-3.9.jar:3.9] > at > org.apache.cassandra.service.StorageProxy$LocalMutationRunnable.run(StorageProxy.java:2539) > [apache-cassandra-3.9.jar:3.9] > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > [na:1.8.0_121] > at > org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:164) > [apache-cassandra-3.9.jar:3.9] > at > org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:136) > [apache-cassandra-3.9.jar:3.9] > at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:109) > [apache-cassandra-3.9.jar:3.9] > at java.lang.Thread.run(Thread.java:745) [na:1.8.0_121] > {code} > and then again shortly afterwards > {code} > ERROR [MutationStage-3] 2017-03-15 23:27:36,198 StorageProxy.java:1353 - > Failed to apply mutation locally : {} > java.nio.BufferOverflowException: null > at > org.apache.cassandra.io.util.DataOutputBufferFixed.doFlush(DataOutputBufferFixed.java:52) > ~[apache-cassandra-3.9.jar:3.9] > at > org.apache.cassandra.io.util.BufferedDataOutputStreamPlus.write(BufferedDataOutputStreamPlus.java:132) > ~[apache-cassandra-3.9.jar:3.9] > at >
[jira] [Updated] (CASSANDRA-13835) Thrift get_slice responds slower on Cassandra 3
[ https://issues.apache.org/jira/browse/CASSANDRA-13835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pawel Szlendak updated CASSANDRA-13835: --- Description: I have recently upgraded from Cassandra 1.2.18 to Cassandra 3.10 and was surprised to notice performance degradation of my server application. I dug down through my application stack only to find out that the cause of the performance issue was slower response time of Cassandra 3.10 get_slice as compared to Cassandra 1.2.18 (almost x3 times slower on average). I am attaching a python script (attack.py) here that can be used to reproduce this issue on a Windows platform. The script uses the pycassa python library that can easily be installed using pip. REPRODUCTION STEPS: 1. Install Cassandra 1.2.18 from https://archive.apache.org/dist/cassandra/1.2.18/apache-cassandra-1.2.18-bin.tar.gz 2. Run Cassandra 1.2.18 from cmd console using cassandra.bat 3. Create a test keyspace and an empty CF using attack.py script {noformat} python attack.py create {noformat} 4. Run some get_slice queries to an empty CF and note down the average response time (in seconds) {noformat} python attack.py {noformat} get_slice count: 788 get_slice total response time: 0.3126376 *get_slice average response time: 0.000397208075838* 5. Stop Cassandra 1.2.18 and install Cassandra 3.10 from https://archive.apache.org/dist/cassandra/3.10/apache-cassandra-3.10-bin.tar.gz 6. Tweak cassandra.yaml to run thrift service (start_rpc=true) and run Cassandra from an elevated cmd console using cassandra.bat 7. Create a test keyspace and an empty CF using attack.py script {noformat} python attack.py create {noformat} 8. Run some get_slice queries to an empty CF using attack.py and note down the average response time (in seconds) {noformat} python attack.py {noformat} get_slice count: 788 get_slice total response time: 1.1646185 *get_slice average response time: 0.00147842634753* 9. 
Compare the average response times EXPECTED: get_slice response time of Cassandra 3.10 is not worse than on Cassandra 1.2.18 ACTUAL: get_slice response time of Cassandra 3.10 is x3 worse than that of Cassandra 1.2.18 REMARKS: - this seems to happen only on Windows platform (tested on Windows 10 and Windows Server 2008 R2) - running the very same procedure on Linux (Ubuntu) renders roughly the same response times - I sniffed the traffic to/from Cassandra 1.2.18 and Cassandra 3.10 and it can be seen that Cassandra 3.10 responds slower (Wireshark dumps attached) - when attacking the server with concurrent get_slice queries I can see lower CPU usage for Cassandra 3.10 than for Cassandra 1.2.18 - get_slice in attack.py queries the column family for a non-existing key (the column family is empty) I am willing to work on this on my own if you guys give me some tips on where to look. I am also aware that this might be more Windows/Java related; nevertheless, any help from your side would be much appreciated.
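The "get_slice count / total / average response time" figures quoted above come from the reporter's attack.py script (attached to the ticket, not reproduced here). A minimal sketch of that kind of client-side latency measurement — `fn` stands in for the actual pycassa get_slice call, and the names are invented for illustration:

```python
import time

def timed_calls(fn, n):
    """Time n calls to fn and return (count, total_seconds, average_seconds).

    This mirrors the per-query wall-clock measurement attack.py
    presumably performs around each thrift get_slice request.
    """
    total = 0.0
    for _ in range(n):
        start = time.perf_counter()  # monotonic, high-resolution clock
        fn()
        total += time.perf_counter() - start
    return n, total, total / n

# Placeholder workload instead of a live get_slice against Cassandra.
count, total, avg = timed_calls(lambda: sum(range(100)), 788)
print(count, total, avg)
```

Against a real cluster, `fn` would be something like `lambda: cf.get('missing-key')` on a pycassa ColumnFamily; the x3 regression reported above is the ratio of the two `avg` values.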
[jira] [Updated] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] mck updated CASSANDRA-13418: Fix Version/s: 4.x 3.11.x Status: Patch Available (was: Open) > Allow TWCS to ignore overlaps when dropping fully expired sstables > -- > > Key: CASSANDRA-13418 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13418 > Project: Cassandra > Issue Type: Improvement > Components: Compaction >Reporter: Corentin Chary >Assignee: Romain GERARD > Labels: twcs > Fix For: 3.11.x, 4.x > > Attachments: twcs-cleanup.png > > > http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If > you really want read-repairs you're going to have sstables blocking the > expiration of other fully expired SSTables because they overlap. > You can set unchecked_tombstone_compaction = true or tombstone_threshold to a > very low value and that will purge the blockers of old data that should > already have expired, thus removing the overlaps and allowing the other > SSTables to expire. > The thing is that this is rather CPU intensive and not optimal. If you have > time series, you might not care if all your data doesn't exactly expire at > the right time, or if data re-appears for some time, as long as it gets > deleted as soon as it can. And in this situation I believe it would be really > beneficial to allow users to simply ignore overlapping SSTables when looking > for fully expired ones. > To the question: why would you need read-repairs ? > - Full repairs basically take longer than the TTL of the data on my dataset, > so this isn't really effective. > - Even with a 10% chances of doing a repair, we found out that this would be > enough to greatly reduce entropy of the most used data (and if you have > timeseries, you're likely to have a dashboard doing the same important > queries over and over again). > - LOCAL_QUORUM is too expensive (need >3 replicas), QUORUM is too slow. 
> I'll try to come up with a patch demonstrating how this would work, try it on > our system and report the effects. > cc: [~adejanovski], [~rgerard] as I know you worked on similar issues already. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
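The per-table workaround described in the issue can be applied with a compaction-options change; a minimal sketch (keyspace/table names are hypothetical, and the window and threshold values should be tuned for the workload):

```sql
-- Hypothetical table. unchecked_tombstone_compaction lets single-sstable
-- tombstone compactions purge the "blocker" sstables; a low
-- tombstone_threshold triggers those compactions at a small garbage ratio.
ALTER TABLE metrics.samples WITH compaction = {
  'class': 'TimeWindowCompactionStrategy',
  'compaction_window_unit': 'DAYS',
  'compaction_window_size': '1',
  'unchecked_tombstone_compaction': 'true',
  'tombstone_threshold': '0.05'
};
```

As the description notes, this works but burns CPU on compactions whose only purpose is to unblock expiration, which is what motivates the overlap-ignoring option proposed here.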
[jira] [Commented] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16150446#comment-16150446 ] mck commented on CASSANDRA-13418: - Updated: || branch || testall || dtest || | [cassandra-3.11_13418|https://github.com/thelastpickle/cassandra/tree/mck/cassandra-3.11_13418] | [testall|https://circleci.com/gh/thelastpickle/cassandra/tree/mck%2Fcassandra-3.11_13418] | [dtest|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/265] | | [trunk_13418|https://github.com/thelastpickle/cassandra/tree/mck/trunk_13418] | [testall|https://circleci.com/gh/thelastpickle/cassandra/tree/mck%2Ftrunk_13418] | [dtest|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/265] |
[jira] [Comment Edited] (CASSANDRA-11500) Obsolete MV entry may not be properly deleted
[ https://issues.apache.org/jira/browse/CASSANDRA-11500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16150303#comment-16150303 ] ZhaoYang edited comment on CASSANDRA-11500 at 9/1/17 12:27 PM: --- [~pauloricardomg] thanks for the feedback (y) bq. I wasn't very comfortable with our previous approach of enforcing strict liveness during row merge, since it changes a lot of low-level structures/interfaces (like BTreeRow/MergeListener, etc) to enforce a table-level setting. Since we'll probably get rid of this when doing a proper implementation of virtual cells, I updated on this commit to perform the filtering during read instead which will give us the same result but with less change in unrelated code. Do you see any problem with this approach? As we discussed offline, we need to make sure the raw data, including tombstones and expired liveness, is shipped to the coordinator side. Enforcing strict liveness in {{ReadCommand.executeLocally()}} would remove the row before the digest or data response. Instead, we add {{enforceStrictLiveness}} to {{Row.purge}} to get the same result with fewer interface changes to {{Row}}. bq. One problem of replacing shadowable tombstones by expired liveness info is that it stores an additional unused ttl field for every shadowed view entry to solve the commutative view deletion problem. In order to avoid this I updated the patch to only use expired ttl when a shadowable tombstone would not work along with an explanation on why that is used since it's a hack Shadowable tombstones will be deprecated; an expired livenessInfo is used only when the deletion time is greater than the merged-row deletion, to avoid unnecessary expired livenessInfo. bq. in TableViews.java, the DeletionTracker should be applied even if existing has no data, eg. partition-deletion It's tested by "testRangeDeletionWithFlush()" in ViewTest. Without partition deletion info from the deletion tracker, the existing row is given as empty and it will resurrect deleted cells. bq.
In order to prevent against this, I added a note to the Upgrading section of NEWS.txt explaining about this caveat and that running repair before the upgrade should be sufficient to avoid it. (y) || source || unit tests || failing dtests || | [trunk|https://github.com/jasonstack/cassandra/commits/trunk-11500-squashed] | https://circleci.com/gh/jasonstack/cassandra/551 | secondary_indexes_test.TestPreJoinCallback.resumt_test | | [3.11|https://github.com/jasonstack/cassandra/commits/CASSANDRA-11500-strict-3.11] | https://circleci.com/gh/jasonstack/cassandra/557 | counter_tests.TestCounters.test_13691 | | [3.0|https://github.com/jasonstack/cassandra/commits/CASSANDRA-11500-strict-3.0] | https://circleci.com/gh/jasonstack/cassandra/556 | counter_tests.TestCounters.test_13691 authe_test.TestAuth.sysmtem_auth_ks_is_alterable_test | | [dtest|https://github.com/riptano/cassandra-dtest/commits/11500-poc] | Those failed dtests are not related. {code} Changes: 1. Use expired livenessInfo if the computed deletion time is greater than the merged-row deletion. There are only 2 cases: a. a non-pk base column used in the view pk is removed by partial update or partial delete b. an unselected base column is removed by partial update or partial delete The current shadowable tombstone is not used, to avoid the issue of resurrecting deleted cells. We will use expired livenessInfo and the merged base-row deletion instead. 2. It's strict liveness iff there is a non-key base column in the view pk. The existence of the view row is solely based on this non-key base column. 3. If there is no non-pk base column in the view pk, the view's liveness/deletion uses the max of the base livenessInfo + unselected columns. An unselected column's ttl is used only when it affects view-row liveness. Selected columns won't contribute to livenessInfo or row deletion. * this wouldn't support complex cases as explained above, eg. c/d unselected, update c@10, delete c@11, update d@5: the view row should be alive but is dead 4.
in TableViews.java, the DeletionTracker should be applied even if existing has no data, eg. partition-deletion 5. When generating the read command to read existing base data, we need to query all base columns instead of the view's queried columns (when base and view have the same key columns) in order to read the unselected columns. {code}
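The strict-liveness rule in points 2–3 above can be condensed into a toy predicate. This is a hedged sketch with boolean stand-ins, not Cassandra's actual {{Row}}/{{LivenessInfo}} model: when the view primary key includes a non-key base column, the view row's existence depends solely on that column's liveness.

```java
// Hypothetical illustration of "strict liveness": if the view PK contains a
// non-key base column, only that column's liveness decides whether the view
// row exists; otherwise normal liveness (any live cell) applies.
class ViewRowLiveness {
    static boolean isLive(boolean hasNonKeyBaseColumnInViewPk,
                          boolean pkSourceColumnLive,
                          boolean anyOtherCellLive) {
        if (hasNonKeyBaseColumnInViewPk)
            return pkSourceColumnLive; // strict: only the PK source column counts
        return pkSourceColumnLive || anyOtherCellLive; // normal liveness
    }
}
```

Under strict liveness a view row with live "other" cells but a dead PK source column is still considered dead, which is exactly the behaviour the filtering in {{Row.purge}} is meant to enforce.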
[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16150409#comment-16150409 ] mck edited comment on CASSANDRA-13418 at 9/1/17 12:15 PM: -- [~rgerard], I failed to see your last comment until now. I've addressed [~krummas]'s concerns [here|https://github.com/thelastpickle/cassandra/commit/58440e707cd6490847a37dc8d76c150d3eb27aab], but feel terrible now for stepping on your toes. A few code style issues beyond the braces have been fixed. Thanks for the push back Marcus! For example, I changed the names of the constants in {{TimeWindowCompactionStrategyOptions}} to be more in line with the previous constants there. Two additions to the tests in {{TimeWindowCompactionStrategyTest}} have been made: one for {{TimeWindowCompactionStrategyOptions.validateOptions}}, which is only there for the tests, and a new test method which does what Marcus asks for. ([~krummas], do you still want a dtest?) was (Author: michaelsembwever): [~rgerard], i failed to see your last comment til now. I've addressed [~krummas]'s concerns [here|https://github.com/thelastpickle/cassandra/commit/17b1d30ac8f07c49bfc4d51b14d3201cc969fcfe], but feel terrible now for stepping on your toes. A few code style issues beyond the braces have been fixed. Thanks for the push back Marcus! For example, I change the names of the constants in {{TimeWindowCompactionStrategyOptions}} to be more in align with the previous constants there. Two additions to the tests in {{TimeWindowCompactionStrategyTest}} are added. One for the {{TimeWindowCompactionStrategyOptions.validateOptions}} which is only there for the tests, and a new test method which does what Marcus asks for. ([~krummas], do you still want a dtest?)
[jira] [Commented] (CASSANDRA-13754) FastThreadLocal leaks memory
[ https://issues.apache.org/jira/browse/CASSANDRA-13754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16150416#comment-16150416 ] Markus Dlugi commented on CASSANDRA-13754: -- Your latest patch, which resets the entire {{BTree$Builder.values}} array, seemed to do the trick; the entire load test is now running smoothly. No more crazy GCing and, most importantly, no {{OutOfMemoryError}}s. Thanks a lot for the fast support and help! > FastThreadLocal leaks memory > > > Key: CASSANDRA-13754 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13754 > Project: Cassandra > Issue Type: Bug > Components: Core > Environment: Cassandra 3.11.0, Netty 4.0.44.Final, OpenJDK 8u141-b15 >Reporter: Eric Evans >Assignee: Robert Stupp > Fix For: 3.11.1 > > > After a chronic bout of {{OutOfMemoryError}} in our development environment, > a heap analysis is showing that more than 10G of our 12G heaps are consumed > by the {{threadLocals}} members (instances of {{java.lang.ThreadLocalMap}}) > of various {{io.netty.util.concurrent.FastThreadLocalThread}} instances. > Reverting > [cecbe17|https://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=commit;h=cecbe17e3eafc052acc13950494f7dddf026aa54] > fixes the issue.
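The fix Markus confirms can be illustrated with a stripped-down builder. This is a hypothetical class, not Cassandra's real {{BTree$Builder}}: the point is that a builder recycled through a (thread-local) cache must clear its entire backing array, otherwise every object it ever held stays strongly reachable and the heap fills with garbage that never becomes collectible.

```java
import java.util.Arrays;

// Hypothetical recycled builder illustrating the leak pattern described above.
class RecyclingBuilder {
    private Object[] values = new Object[16];
    private int count = 0;

    void add(Object v) {
        if (count == values.length)
            values = Arrays.copyOf(values, count * 2);
        values[count++] = v;
    }

    Object[] build() {
        return Arrays.copyOf(values, count);
    }

    // The fix: null out the *entire* array on recycle, not just reset the
    // count, so no stale references survive between uses of the builder.
    RecyclingBuilder recycle() {
        Arrays.fill(values, null);
        count = 0;
        return this;
    }
}
```

Resetting only `count` would make the builder look empty while the old references remained pinned in `values`; with a `FastThreadLocal`-cached builder per thread, that retained set grows with the largest build ever performed on each thread.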
[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16150409#comment-16150409 ] mck edited comment on CASSANDRA-13418 at 9/1/17 12:06 PM: -- [~rgerard], i failed to see your last comment til now. I've addressed [~krummas]'s concerns [here|https://github.com/thelastpickle/cassandra/commit/17b1d30ac8f07c49bfc4d51b14d3201cc969fcfe], but feel terrible now for stepping on your toes. A few code style issues beyond the braces have been fixed. Thanks for the push back Marcus! For example, I change the names of the constants in {{TimeWindowCompactionStrategyOptions}} to be more in align with the previous constants there. Two additions to the tests in {{TimeWindowCompactionStrategyTest}} are added. One for the {{TimeWindowCompactionStrategyOptions.validateOptions}} which is only there for the tests, and a new test method which does what Marcus asks for. ([~krummas], do you still want a dtest?) was (Author: michaelsembwever): [~rgerard], i failed to see your last comment til now. I've addressed [~krummas]'s concerns [here|https://github.com/thelastpickle/cassandra/commit/17b1d30ac8f07c49bfc4d51b14d3201cc969fcfe], but feel terrible now for stepping on your toes. A few code style issues beyond the braces have been fixed. Thanks for the push back Marcus! For example, I change the names of the constants in {{TimeWindowCompactionStrategyOptions}} to be more in align with the previous constants there. Two additions to the tests in {{TimeWindowCompactionStrategyTest}} are added. One for the {{TimeWindowCompactionStrategyOptions.validateOptions}} which is only there for the tests, and a new test method which does what Marcus asks for. ([~krummas], do you still want a dtest still warranted?) 
[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16150409#comment-16150409 ] mck edited comment on CASSANDRA-13418 at 9/1/17 12:05 PM: -- [~rgerard], i failed to see your last comment til now. I've addressed [~krummas]'s concerns [here|https://github.com/thelastpickle/cassandra/commit/17b1d30ac8f07c49bfc4d51b14d3201cc969fcfe], but feel terrible now for stepping on your toes. A few code style issues beyond the braces have been fixed. Thanks for the push back Marcus! For example, I change the names of the constants in {{TimeWindowCompactionStrategyOptions}} to be more in align with the previous constants there. Two additions to the tests in {{TimeWindowCompactionStrategyTest}} are added. One for the {{TimeWindowCompactionStrategyOptions.validateOptions}} which is only there for the tests, and a new test method which does what Marcus asks for. ([~krummas], do you still want a dtest still warranted?) was (Author: michaelsembwever): [~rgerard], i failed to see your last comment til now. I've addressed [~krummas]'s concerns [here|https://github.com/thelastpickle/cassandra/commit/17b1d30ac8f07c49bfc4d51b14d3201cc969fcfe], but feel terrible now for stepping on your toes. A few code style issues beyond the braces have been fixed. Thanks for the push back Marcus! For example, I change the names of the constants in {{TimeWindowCompactionStrategyOptions}} to be more in align with the previous constants there. Two additions to the tests in {{TimeWindowCompactionStrategyTest}} are added. One for the {{TimeWindowCompactionStrategyTest.validateOptions}} which is only there for the tests, and a new test method which does what Marcus asks for. ([~krummas], do you still want a dtest still warranted?) 
[jira] [Commented] (CASSANDRA-13339) java.nio.BufferOverflowException: null
[ https://issues.apache.org/jira/browse/CASSANDRA-13339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16150412#comment-16150412 ] Jason Brown commented on CASSANDRA-13339: - [~theochu] Thanks for the additional data points. Are you using counters? Can you copy the trace trace, as well, into this ticket? > java.nio.BufferOverflowException: null > -- > > Key: CASSANDRA-13339 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13339 > Project: Cassandra > Issue Type: Bug > Components: Core >Reporter: Chris Richards > > I'm seeing the following exception running Cassandra 3.9 (with Netty updated > to 4.1.8.Final) running on a 2 node cluster. It would have been processing > around 50 queries/second at the time (mixture of > inserts/updates/selects/deletes) : there's a collection of tables (some with > counters some without) and a single materialized view. > {code} > ERROR [MutationStage-4] 2017-03-15 22:50:33,052 StorageProxy.java:1353 - > Failed to apply mutation locally : {} > java.nio.BufferOverflowException: null > at > org.apache.cassandra.io.util.DataOutputBufferFixed.doFlush(DataOutputBufferFixed.java:52) > ~[apache-cassandra-3.9.jar:3.9] > at > org.apache.cassandra.io.util.BufferedDataOutputStreamPlus.write(BufferedDataOutputStreamPlus.java:132) > ~[apache-cassandra-3.9.jar:3.9] > at > org.apache.cassandra.io.util.BufferedDataOutputStreamPlus.writeUnsignedVInt(BufferedDataOutputStreamPlus.java:262) > ~[apache-cassandra-3.9.jar:3.9] > at > org.apache.cassandra.db.rows.EncodingStats$Serializer.serialize(EncodingStats.java:233) > ~[apache-cassandra-3.9.jar:3.9] > at > org.apache.cassandra.db.SerializationHeader$Serializer.serializeForMessaging(SerializationHeader.java:380) > ~[apache-cassandra-3.9.jar:3.9] > at > org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:122) > ~[apache-cassandra-3.9.jar:3.9] > at > 
org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:89) > ~[apache-cassandra-3.9.jar:3.9] > at > org.apache.cassandra.db.partitions.PartitionUpdate$PartitionUpdateSerializer.serialize(PartitionUpdate.java:790) > ~[apache-cassandra-3.9.jar:3.9] > at > org.apache.cassandra.db.Mutation$MutationSerializer.serialize(Mutation.java:393) > ~[apache-cassandra-3.9.jar:3.9] > at org.apache.cassandra.db.commitlog.CommitLog.add(CommitLog.java:279) > ~[apache-cassandra-3.9.jar:3.9] > at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:493) > ~[apache-cassandra-3.9.jar:3.9] > at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:396) > ~[apache-cassandra-3.9.jar:3.9] > at org.apache.cassandra.db.Mutation.applyFuture(Mutation.java:215) > ~[apache-cassandra-3.9.jar:3.9] > at org.apache.cassandra.db.Mutation.apply(Mutation.java:227) > ~[apache-cassandra-3.9.jar:3.9] > at org.apache.cassandra.db.Mutation.apply(Mutation.java:241) > ~[apache-cassandra-3.9.jar:3.9] > at > org.apache.cassandra.service.StorageProxy$8.runMayThrow(StorageProxy.java:1347) > ~[apache-cassandra-3.9.jar:3.9] > at > org.apache.cassandra.service.StorageProxy$LocalMutationRunnable.run(StorageProxy.java:2539) > [apache-cassandra-3.9.jar:3.9] > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > [na:1.8.0_121] > at > org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:164) > [apache-cassandra-3.9.jar:3.9] > at > org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:136) > [apache-cassandra-3.9.jar:3.9] > at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:109) > [apache-cassandra-3.9.jar:3.9] > at java.lang.Thread.run(Thread.java:745) [na:1.8.0_121] > {code} > and then again shortly afterwards > {code} > ERROR [MutationStage-3] 2017-03-15 23:27:36,198 StorageProxy.java:1353 
- > Failed to apply mutation locally : {} > java.nio.BufferOverflowException: null > at > org.apache.cassandra.io.util.DataOutputBufferFixed.doFlush(DataOutputBufferFixed.java:52) > ~[apache-cassandra-3.9.jar:3.9] > at > org.apache.cassandra.io.util.BufferedDataOutputStreamPlus.write(BufferedDataOutputStreamPlus.java:132) > ~[apache-cassandra-3.9.jar:3.9] > at > org.apache.cassandra.io.util.BufferedDataOutputStreamPlus.writeUnsignedVInt(BufferedDataOutputStreamPlus.java:262) > ~[apache-cassandra-3.9.jar:3.9] > at > org.apache.cassandra.db.rows.EncodingStats$Serializer.serialize(EncodingStats.java:233) > ~[apache-cassandra-3.9.jar:3.9] > at >
[jira] [Comment Edited] (CASSANDRA-13692) CompactionAwareWriter_getWriteDirectory throws incompatible exceptions
[ https://issues.apache.org/jira/browse/CASSANDRA-13692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16150355#comment-16150355 ] Dimitar Dimitrov edited comment on CASSANDRA-13692 at 9/1/17 12:03 PM: --- Some additional observations, after taking yet another look at the test results: * Although very similar, the 3.11 {{testall}} failures are not exactly the same as the ones in the baseline. * The trunk {{dtest}} failures seem to diverge from the pattern of "common-expected-to-be-unrelated failures plus test_13747 failures". I'll try to see whether this can be attributed to flakiness (looking closer at the results, re-running the CI run on the same branch, running another CI run on a clean branch copy of the trunk, etc.) was (Author: dimitarndimitrov): Some additional observations, after taking yet another look at the test results: * Although very similar, the 3.11 {{testall}} failures are not exactly the same as the ones in the baseline. * The trunk {{dtest}} failures seem to diverge from the pattern of "common-expected-to-be-unrelated failures plus test_13747 failures". I'll try to see whether this can be attributed to flakiness (looking closer to the results, re-running the CI run on the same branch, running another CI run on a clean branch copy of the trunk, etc.) 
> CompactionAwareWriter_getWriteDirectory throws incompatible exceptions > -- > > Key: CASSANDRA-13692 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13692 > Project: Cassandra > Issue Type: Bug > Components: Compaction >Reporter: Hao Zhong >Assignee: Dimitar Dimitrov > Labels: lhf > Attachments: c13692-2.2-dtest-results.PNG, > c13692-2.2-testall-results.PNG, c13692-3.0-dtest-results.PNG, > c13692-3.0-testall-results.PNG, c13692-3.11-dtest-results.PNG, > c13692-3.11-testall-results.PNG, c13692-dtest-results.PNG, > c13692-testall-results.PNG > > > The CompactionAwareWriter_getWriteDirectory throws RuntimeException: > {code} > public Directories.DataDirectory getWriteDirectory(Iterable > sstables, long estimatedWriteSize) > { > File directory = null; > for (SSTableReader sstable : sstables) > { > if (directory == null) > directory = sstable.descriptor.directory; > if (!directory.equals(sstable.descriptor.directory)) > { > logger.trace("All sstables not from the same disk - putting > results in {}", directory); > break; > } > } > Directories.DataDirectory d = > getDirectories().getDataDirectoryForFile(directory); > if (d != null) > { > long availableSpace = d.getAvailableSpace(); > if (availableSpace < estimatedWriteSize) > throw new RuntimeException(String.format("Not enough space to > write %s to %s (%s available)", > > FBUtilities.prettyPrintMemory(estimatedWriteSize), > d.location, > > FBUtilities.prettyPrintMemory(availableSpace))); > logger.trace("putting compaction results in {}", directory); > return d; > } > d = getDirectories().getWriteableLocation(estimatedWriteSize); > if (d == null) > throw new RuntimeException(String.format("Not enough disk space > to store %s", > > FBUtilities.prettyPrintMemory(estimatedWriteSize))); > return d; > } > {code} > However, the thrown exception does not trigger the failure policy. > CASSANDRA-11448 fixed a similar problem. 
The buggy code is: > {code} > protected Directories.DataDirectory getWriteDirectory(long writeSize) > { > Directories.DataDirectory directory = > getDirectories().getWriteableLocation(writeSize); > if (directory == null) > throw new RuntimeException("Insufficient disk space to write " + > writeSize + " bytes"); > return directory; > } > {code} > The fixed code is: > {code} > protected Directories.DataDirectory getWriteDirectory(long writeSize) > { > Directories.DataDirectory directory = > getDirectories().getWriteableLocation(writeSize); > if (directory == null) > throw new FSWriteError(new IOException("Insufficient disk space > to write " + writeSize + " bytes"), ""); > return directory; > } > {code} > The fixed code throws FSWE and triggers the failure policy.
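Why the exception type matters can be sketched with stand-in classes. These are hypothetical simplifications, not Cassandra's real {{FSWriteError}} or failure-policy code: the error handler dispatches on the exception's type, so a plain {{RuntimeException}} never reaches the disk failure policy, while a filesystem error type does.

```java
// Hypothetical stand-in for a filesystem write error carrying its cause.
class FSWriteError extends RuntimeException {
    FSWriteError(Throwable cause, String path) {
        super(path, cause);
    }
}

// Hypothetical dispatcher mirroring the behaviour described in the ticket:
// only filesystem error types trigger the configured disk failure policy
// (stop, best_effort, ...); anything else is treated as an ordinary failure.
class FailurePolicy {
    static String handle(Throwable t) {
        return (t instanceof FSWriteError) ? "disk_failure_policy" : "unhandled";
    }
}
```

This is why the CASSANDRA-11448 fix swapped the `RuntimeException` for an `FSWriteError`, and why the remaining `RuntimeException` throws in `getWriteDirectory` are inconsistent with it.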
[jira] [Commented] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16150409#comment-16150409 ] mck commented on CASSANDRA-13418: - [~rgerard], i failed to see your last comment til now. I've addressed [~krummas]'s concerns [here|https://github.com/thelastpickle/cassandra/commit/17b1d30ac8f07c49bfc4d51b14d3201cc969fcfe], but feel terrible now for stepping on your toes. A few code style issues beyond the braces have been fixed. Thanks for the push back Marcus! For example, I change the names of the constants in {{TimeWindowCompactionStrategyOptions}} to be more in align with the previous constants there. Two additions to the tests in {{TimeWindowCompactionStrategyTest}} are added. One for the {{TimeWindowCompactionStrategyTest.validateOptions}} which is only there for the tests, and a new test method which does what Marcus asks for. ([~krummas], do you still want a dtest still warranted?) > Allow TWCS to ignore overlaps when dropping fully expired sstables > -- > > Key: CASSANDRA-13418 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13418 > Project: Cassandra > Issue Type: Improvement > Components: Compaction >Reporter: Corentin Chary >Assignee: Romain GERARD > Labels: twcs > Attachments: twcs-cleanup.png > > > http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If > you really want read-repairs you're going to have sstables blocking the > expiration of other fully expired SSTables because they overlap. > You can set unchecked_tombstone_compaction = true or tombstone_threshold to a > very low value and that will purge the blockers of old data that should > already have expired, thus removing the overlaps and allowing the other > SSTables to expire. > The thing is that this is rather CPU intensive and not optimal. 
If you have > time series, you might not care if all your data doesn't exactly expire at > the right time, or if data re-appears for some time, as long as it gets > deleted as soon as it can. And in this situation I believe it would be really > beneficial to allow users to simply ignore overlapping SSTables when looking > for fully expired ones. > To the question: why would you need read-repairs ? > - Full repairs basically take longer than the TTL of the data on my dataset, > so this isn't really effective. > - Even with a 10% chances of doing a repair, we found out that this would be > enough to greatly reduce entropy of the most used data (and if you have > timeseries, you're likely to have a dashboard doing the same important > queries over and over again). > - LOCAL_QUORUM is too expensive (need >3 replicas), QUORUM is too slow. > I'll try to come up with a patch demonstrating how this would work, try it on > our system and report the effects. > cc: [~adejanovski], [~rgerard] as I know you worked on similar issues already. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
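The overlap check that blocks expiration, and the opt-out this ticket proposes, can be sketched as follows. This is an illustrative model only: the class shape, the field names ({{maxDeletionTime}}, {{minTimestamp}}) and the {{ignoreOverlaps}} flag are assumptions made for the sketch, not Cassandra's actual compaction API.

```java
import java.util.*;

// Illustrative model of TWCS expired-sstable dropping; names and the
// exact overlap condition are assumptions, not Cassandra's implementation.
public class ExpiredSSTables {
    public static class SSTable {
        public final long maxDeletionTime; // newest local deletion time in the sstable
        public final long minTimestamp;    // oldest write timestamp in the sstable
        public SSTable(long maxDeletionTime, long minTimestamp) {
            this.maxDeletionTime = maxDeletionTime;
            this.minTimestamp = minTimestamp;
        }
    }

    // An sstable is "fully expired" when all its data is past TTL
    // (maxDeletionTime < now). Normally it may still be kept alive by an
    // overlapping sstable holding older data; ignoreOverlaps skips that check.
    public static List<SSTable> fullyExpired(List<SSTable> candidates,
                                             List<SSTable> overlapping,
                                             long now,
                                             boolean ignoreOverlaps) {
        long minTimestampOfOverlaps = overlapping.stream()
                .mapToLong(s -> s.minTimestamp).min().orElse(Long.MAX_VALUE);
        List<SSTable> expired = new ArrayList<>();
        for (SSTable s : candidates) {
            boolean pastTtl = s.maxDeletionTime < now;
            // Blocked when an overlapping sstable contains data older than
            // what this sstable's tombstones might shadow.
            boolean blocked = !ignoreOverlaps && s.maxDeletionTime >= minTimestampOfOverlaps;
            if (pastTtl && !blocked)
                expired.add(s);
        }
        return expired;
    }

    public static void main(String[] args) {
        SSTable old = new SSTable(100, 10);     // fully expired at now=200
        SSTable blocker = new SSTable(900, 50); // overlaps and holds older data
        System.out.println(fullyExpired(List.of(old), List.of(blocker), 200, false).size()); // 0
        System.out.println(fullyExpired(List.of(old), List.of(blocker), 200, true).size());  // 1
    }
}
```

Under this model the blocker keeps the expired sstable on disk indefinitely, which is exactly the behavior the read-repair scenario in the description produces.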
[jira] [Commented] (CASSANDRA-13692) CompactionAwareWriter_getWriteDirectory throws incompatible exceptions
[ https://issues.apache.org/jira/browse/CASSANDRA-13692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16150387#comment-16150387 ] Aleksey Yeschenko commented on CASSANDRA-13692: --- [~dimitarndimitrov] I've noticed and I know who you are. Welcome to the community (: FWIW I've run the dtest locally a couple dozen times, and it's passing just fine. > CompactionAwareWriter_getWriteDirectory throws incompatible exceptions > -- > > Key: CASSANDRA-13692 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13692 > Project: Cassandra > Issue Type: Bug > Components: Compaction >Reporter: Hao Zhong >Assignee: Dimitar Dimitrov > Labels: lhf > Attachments: c13692-2.2-dtest-results.PNG, > c13692-2.2-testall-results.PNG, c13692-3.0-dtest-results.PNG, > c13692-3.0-testall-results.PNG, c13692-3.11-dtest-results.PNG, > c13692-3.11-testall-results.PNG, c13692-dtest-results.PNG, > c13692-testall-results.PNG > > > The CompactionAwareWriter_getWriteDirectory throws RuntimeException: > {code} > public Directories.DataDirectory getWriteDirectory(Iterable > sstables, long estimatedWriteSize) > { > File directory = null; > for (SSTableReader sstable : sstables) > { > if (directory == null) > directory = sstable.descriptor.directory; > if (!directory.equals(sstable.descriptor.directory)) > { > logger.trace("All sstables not from the same disk - putting > results in {}", directory); > break; > } > } > Directories.DataDirectory d = > getDirectories().getDataDirectoryForFile(directory); > if (d != null) > { > long availableSpace = d.getAvailableSpace(); > if (availableSpace < estimatedWriteSize) > throw new RuntimeException(String.format("Not enough space to > write %s to %s (%s available)", > > FBUtilities.prettyPrintMemory(estimatedWriteSize), > d.location, > > FBUtilities.prettyPrintMemory(availableSpace))); > logger.trace("putting compaction results in {}", directory); > return d; > } > d = getDirectories().getWriteableLocation(estimatedWriteSize); > if (d == 
null) > throw new RuntimeException(String.format("Not enough disk space > to store %s", > > FBUtilities.prettyPrintMemory(estimatedWriteSize))); > return d; > } > {code} > However, the thrown exception does not trigger the failure policy. > CASSANDRA-11448 fixed a similar problem. The buggy code is: > {code} > protected Directories.DataDirectory getWriteDirectory(long writeSize) > { > Directories.DataDirectory directory = > getDirectories().getWriteableLocation(writeSize); > if (directory == null) > throw new RuntimeException("Insufficient disk space to write " + > writeSize + " bytes"); > return directory; > } > {code} > The fixed code is: > {code} > protected Directories.DataDirectory getWriteDirectory(long writeSize) > { > Directories.DataDirectory directory = > getDirectories().getWriteableLocation(writeSize); > if (directory == null) > throw new FSWriteError(new IOException("Insufficient disk space > to write " + writeSize + " bytes"), ""); > return directory; > } > {code} > The fixed code throws FSWE and triggers the failure policy. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
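Why the exception type matters can be shown with a minimal sketch of the dispatch: an error handler that reacts only to file-system errors will ignore a plain {{RuntimeException}}, so the disk failure policy never fires. The class and method names below mirror the ones discussed ({{FSWriteError}}, an inspector hook) but are stand-ins, not Cassandra's actual implementation.

```java
import java.io.IOException;

// Minimal recreation of the pattern above: only FSWriteError (a file-system
// error) engages the failure policy; a bare RuntimeException slips past it.
public class FailurePolicyDemo {
    public static class FSWriteError extends RuntimeException {
        public FSWriteError(Throwable cause, String path) { super(path, cause); }
    }

    public static boolean failurePolicyTriggered = false;

    // Stand-in for the handler that inspects thrown exceptions.
    public static void inspectThrowable(Throwable t) {
        if (t instanceof FSWriteError)
            failurePolicyTriggered = true; // e.g. stop transports, or die
    }

    public static void getWriteDirectory(long writeSize, boolean haveSpace) {
        if (!haveSpace)
            throw new FSWriteError(
                new IOException("Insufficient disk space to write " + writeSize + " bytes"), "");
    }

    public static void main(String[] args) {
        try {
            getWriteDirectory(1 << 20, false);
        } catch (RuntimeException e) {
            inspectThrowable(e);
        }
        System.out.println(failurePolicyTriggered); // true
    }
}
```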
[jira] [Commented] (CASSANDRA-13532) sstabledump reports incorrect usage for argument order
[ https://issues.apache.org/jira/browse/CASSANDRA-13532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16150381#comment-16150381 ] Michael Sear commented on CASSANDRA-13532: -- I just came across this bug myself. I think, though, that it would be preferable to fix the parser so it consumes a single value per argument. e.g. : {code:java} sstabledump -k mykey1 -k mykey2 mysstable {code} Don't you think? I'd have thought this would be more consistent with the way arguments are normally used on the command line. > sstabledump reports incorrect usage for argument order > -- > > Key: CASSANDRA-13532 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13532 > Project: Cassandra > Issue Type: Bug > Components: Tools >Reporter: Ian Ilsley >Assignee: Varun Barala >Priority: Minor > Labels: lhf > Fix For: 3.0.15, 3.11.1, 4.0 > > Attachments: sstabledump#printUsage.patch > > > sstabledump usage reports > {{usage: sstabledump }} > However the actual usage is > {{sstabledump }} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
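The one-value-per-flag style suggested in the comment above can be parsed with a simple loop. This is a hand-rolled sketch, not sstabledump's actual parser; the {{parseKeys}} helper is hypothetical.

```java
import java.util.*;

// Sketch of "-k <key>" parsing where each -k consumes exactly one value,
// leaving the trailing positional argument for the sstable path.
public class ArgSketch {
    public static List<String> parseKeys(String[] args) {
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < args.length; i++) {
            if ("-k".equals(args[i])) {
                if (i + 1 >= args.length)
                    throw new IllegalArgumentException("-k requires a value");
                keys.add(args[++i]); // one value per -k occurrence
            }
        }
        return keys;
    }

    public static void main(String[] args) {
        // e.g. sstabledump -k mykey1 -k mykey2 mysstable
        System.out.println(parseKeys(new String[]{"-k", "mykey1", "-k", "mykey2", "mysstable"}));
        // prints [mykey1, mykey2]
    }
}
```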
[jira] [Commented] (CASSANDRA-13692) CompactionAwareWriter_getWriteDirectory throws incompatible exceptions
[ https://issues.apache.org/jira/browse/CASSANDRA-13692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16150351#comment-16150351 ] Dimitar Dimitrov commented on CASSANDRA-13692: -- Ah, sorry, I should have at least attached the failure logs - I didn't attach the build artifacts, as I wasn't sure whether they were sanitized with regard to non-public data. I'll sync with a more knowledgeable colleague and get back to you with the necessary info. P.S. As you've probably noticed, I'm still new to one of the more visible projects here, and many of the steps in the process are a bit hazy to me - I'll make sure to improve quickly on that though :)
[jira] [Commented] (CASSANDRA-13692) CompactionAwareWriter_getWriteDirectory throws incompatible exceptions
[ https://issues.apache.org/jira/browse/CASSANDRA-13692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16150355#comment-16150355 ] Dimitar Dimitrov commented on CASSANDRA-13692: -- Some additional observations, after taking yet another look at the test results: * Although very similar, the 3.11 {{testall}} failures are not exactly the same as the ones in the baseline. * The trunk {{dtest}} failures seem to diverge from the pattern of "common-expected-to-be-unrelated failures plus test_13747 failures". I'll try to see whether this can be attributed to flakiness (looking closer at the results, re-running the CI run on the same branch, running another CI run on a clean branch copy of trunk, etc.)
[jira] [Commented] (CASSANDRA-13754) FastThreadLocal leaks memory
[ https://issues.apache.org/jira/browse/CASSANDRA-13754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16150338#comment-16150338 ] Robert Stupp commented on CASSANDRA-13754: -- Your observation regarding {{BTree.Builder.values[]}} seems correct. However, {{SEPWorker}} must *not* remove the thread locals - it's the intention of these thread-locals to be kept for reuse. > FastThreadLocal leaks memory > > > Key: CASSANDRA-13754 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13754 > Project: Cassandra > Issue Type: Bug > Components: Core > Environment: Cassandra 3.11.0, Netty 4.0.44.Final, OpenJDK 8u141-b15 >Reporter: Eric Evans >Assignee: Robert Stupp > Fix For: 3.11.1 > > > After a chronic bout of {{OutOfMemoryError}} in our development environment, > a heap analysis is showing that more than 10G of our 12G heaps are consumed > by the {{threadLocals}} members (instances of {{java.lang.ThreadLocalMap}}) > of various {{io.netty.util.concurrent.FastThreadLocalThread}} instances. > Reverting > [cecbe17|https://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=commit;h=cecbe17e3eafc052acc13950494f7dddf026aa54] > fixes the issue. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-11500) Obsolete MV entry may not be properly deleted
[ https://issues.apache.org/jira/browse/CASSANDRA-11500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16150303#comment-16150303 ] ZhaoYang commented on CASSANDRA-11500: -- [~pauloricardomg] thanks for the feedback (y) bq. I wasn't very comfortable with our previous approach of enforcing strict liveness during row merge, since it changes a lot of low-level structures/interfaces (like BTreeRow/MergeListener, etc) to enforce a table-level setting. Since we'll probably get rid of this when doing a proper implementation of virtual cells, I updated on this commit to perform the filtering during read instead which will give us the same result but with less change in unrelated code. Do you see any problem with this approach? As we discussed offline, we need to make sure the raw data, including tombstones and expired liveness, is shipped to the coordinator side. Enforcing strict liveness in {{ReadCommand.executeLocally()}} would remove the row before the digest or data response. Instead, we add {{enforceStrictLiveness}} to {{Row.purge}} to get the same result with fewer interface changes to {{Row}}. bq. One problem of replacing shadowable tombstones by expired liveness info is that it stores an additional unused ttl field for every shadowed view entry to solve the commutative view deletion problem. In order to avoid this I updated the patch to only use expired ttl when a shadowable tombstone would not work along with an explanation on why that is used since it's a hack Shadowable tombstones will be deprecated; we use an expired livenessInfo only if the deletion time is greater than the merged-row deletion, to avoid unnecessary expired livenessInfo. bq. in TableViews.java, the DeletionTracker should be applied even if existing has no data, eg. partition-deletion It's tested by "testRangeDeletionWithFlush()" in ViewTest. Without partition deletion info from the deletion tracker, the existing row is given as empty and it would resurrect deleted cells. bq. 
In order to protect against this, I added a note to the Upgrading section of NEWS.txt explaining this caveat and that running repair before the upgrade should be sufficient to avoid it. (y)
|| source || unit || dtest ||
| [trunk|https://github.com/jasonstack/cassandra/commits/trunk-11500-squashed] | https://circleci.com/gh/jasonstack/cassandra/551 | x |
| [3.11|https://github.com/jasonstack/cassandra/commits/CASSANDRA-11500-strict-3.11] | https://circleci.com/gh/jasonstack/cassandra/557 | x |
| [3.0|https://github.com/jasonstack/cassandra/commits/CASSANDRA-11500-strict-3.0] | https://circleci.com/gh/jasonstack/cassandra/556 | x |
dtest branch: [dtest|https://github.com/riptano/cassandra-dtest/commits/11500-poc]
{code} Changes: 1. Use an expired livenessInfo if the computed deletion time is greater than the merged row deletion. There are only 2 cases: a. a non-PK base column used in the view PK is removed by a partial update or partial delete b. an unselected base column is removed by a partial update or partial delete The current shadowable tombstone is not used, to avoid the issue of resurrecting deleted cells. We will use an expired livenessInfo and the merged base row deletion instead. 2. It's strict liveness iff there is a non-key base column in the view PK. The existence of the view row is solely based on this non-key base column. 3. If there is no non-PK base column in the view PK, the view's liveness/deletion uses the max of the base livenessInfo + unselected columns. An unselected column's TTL is used only when it affects view row liveness. Selected columns won't contribute to livenessInfo or row deletion. * this wouldn't support complex cases as explained above, eg. c/d unselected, update c@10, delete c@11, update d@5: the view row should be alive but is dead 4. in TableViews.java, the DeletionTracker should be applied even if existing has no data, eg. partition-deletion 5. 
When generating the read command to read existing base data, we need to query all base columns instead of only the view's queried columns (when base and view have the same key columns), in order to read unselected columns. {code} > Obsolete MV entry may not be properly deleted > - > > Key: CASSANDRA-11500 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11500 > Project: Cassandra > Issue Type: Bug > Components: Materialized Views >Reporter: Sylvain Lebresne >Assignee: ZhaoYang > Fix For: 3.0.x, 3.11.x, 4.x > > > When a Materialized View uses a non-PK base table column in its PK, if an > update changes that column value, we add the new view entry and remove the > old one. When doing that removal, the current code uses the same timestamp > as for the liveness info of the new entry, which is the max timestamp for > any columns participating in the view PK. This is not correct for the > deletion as the old view entry could have
[jira] [Commented] (CASSANDRA-13692) CompactionAwareWriter_getWriteDirectory throws incompatible exceptions
[ https://issues.apache.org/jira/browse/CASSANDRA-13692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16150291#comment-16150291 ] Aleksey Yeschenko commented on CASSANDRA-13692: --- [~dimitarndimitrov] Maybe, maybe not. The screenshot is useless to me as I can't click on test details.
[jira] [Updated] (CASSANDRA-13810) Overload because of hint pressure + MVs
[ https://issues.apache.org/jira/browse/CASSANDRA-13810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paulo Motta updated CASSANDRA-13810: Labels: materializedviews (was: ) > Overload because of hint pressure + MVs > --- > > Key: CASSANDRA-13810 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13810 > Project: Cassandra > Issue Type: Bug > Components: Materialized Views >Reporter: Tom van der Woerdt > Labels: materializedviews > > Cluster setup: 3 DCs, 20 Cassandra nodes each, all 3.0.14, with approx. 200GB > data per machine. Many tables have MVs associated. > During some maintenance we did a rolling restart of all nodes in the cluster. > This caused a buildup of hints/batches, as expected. Most nodes came back > just fine, except for two nodes. > These two nodes came back with a loadavg of >100, and 'nodetool tpstats' > showed a million (not exaggerating) MutationStage tasks per second(!). It was > clear that these were mostly (all?) mutations coming from hints, as indicated > by thousands of log entries per second in debug.log : > {noformat} > DEBUG [SharedPool-Worker-107] 2017-08-27 13:16:51,098 HintVerbHandler.java:95 > - Failed to apply hint > java.util.concurrent.CompletionException: > org.apache.cassandra.exceptions.WriteTimeoutException: Operation timed out - > received only 0 responses. 
> at > java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292) > ~[na:1.8.0_144] > at > java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308) > ~[na:1.8.0_144] > at > java.util.concurrent.CompletableFuture.uniAccept(CompletableFuture.java:647) > ~[na:1.8.0_144] > at > java.util.concurrent.CompletableFuture$UniAccept.tryFire(CompletableFuture.java:632) > ~[na:1.8.0_144] > at > java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474) > ~[na:1.8.0_144] > at > java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977) > ~[na:1.8.0_144] > at org.apache.cassandra.db.Keyspace.applyInternal(Keyspace.java:481) > ~[apache-cassandra-3.0.14.jar:3.0.14] > at > org.apache.cassandra.db.Keyspace.lambda$applyInternal$0(Keyspace.java:495) > ~[apache-cassandra-3.0.14.jar:3.0.14] > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > ~[na:1.8.0_144] > at > org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:164) > ~[apache-cassandra-3.0.14.jar:3.0.14] > at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105) > ~[apache-cassandra-3.0.14.jar:3.0.14] > at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_144] > Caused by: org.apache.cassandra.exceptions.WriteTimeoutException: Operation > timed out - received only 0 responses. > ... 6 common frames omitted > {noformat} > After reading the relevant code, it seems that a hint is considered > droppable, and in the mutation path when the table contains a MV and the lock > fails to acquire and the mutation is droppable, it throws a WTE without > waiting until the timeout expires. This explains why Cassandra is able to > process a million mutations per second without actually considering them > 'dropped' in the 'nodetool tpstats' output. 
> I managed to recover the two nodes by stopping handoffs on all nodes in the > cluster and reenabling them one at a time. It's likely that the hint/batchlog > settings were sub-optimal on this cluster, but I think that the retry > behavior(?) of hints should be improved as it's hard to express hint > throughput in kb/s when the mutations can involve MVs. > More data available upon request -- I'm not sure which bits are relevant and > which aren't. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
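The fast-failure path described in the report can be sketched as follows. The method and flag names are illustrative stand-ins, not Cassandra's actual mutation code: the point is that a droppable (hinted) mutation hitting a contended MV partition lock "times out" immediately rather than waiting out the write timeout, which is why tpstats can show enormous MutationStage throughput with nothing counted as dropped.

```java
// Sketch of the behavior above: table has an MV, the MV partition lock
// cannot be acquired, and the mutation is droppable (e.g. hint replay) ->
// WriteTimeoutException is thrown without any real wait elapsing.
public class HintMvDemo {
    public static class WriteTimeoutException extends RuntimeException {}

    public static String applyMutation(boolean hasView, boolean lockAcquired, boolean droppable) {
        if (hasView && !lockAcquired) {
            if (droppable)
                throw new WriteTimeoutException(); // fails fast, no timeout wait
            return "blocked-until-lock";           // non-droppable writers wait instead
        }
        return "applied";
    }

    public static void main(String[] args) {
        try {
            applyMutation(true, false, true);
        } catch (WriteTimeoutException e) {
            // hint replay can spin through mutations like this at a very
            // high rate, producing the load pattern seen in the report
            System.out.println("fast WTE");
        }
    }
}
```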
[jira] [Updated] (CASSANDRA-13810) Overload because of hint pressure + MVs
[ https://issues.apache.org/jira/browse/CASSANDRA-13810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paulo Motta updated CASSANDRA-13810: Component/s: Materialized Views
[jira] [Updated] (CASSANDRA-13835) Thrift get_slice responds slower on Cassandra 3
[ https://issues.apache.org/jira/browse/CASSANDRA-13835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pawel Szlendak updated CASSANDRA-13835:
---
Description:

I have recently upgraded from Cassandra 1.2.18 to Cassandra 3.10 and was surprised to notice a performance degradation in my server application. I dug down through my application stack only to find that the cause was the slower response time of Cassandra 3.10 get_slice compared to Cassandra 1.2.18 (almost 3x slower on average). I am attaching a python script (attack.py) that can be used to reproduce this issue on a Windows platform. The script uses the pycassa python library, which can easily be installed using pip.

REPRODUCTION STEPS:
1. Install Cassandra 1.2.18 from https://archive.apache.org/dist/cassandra/1.2.18/apache-cassandra-1.2.18-bin.tar.gz
2. Run Cassandra 1.2.18 from a cmd console using cassandra.bat
3. Create a test keyspace and an empty CF using the attack.py script
{noformat} python attack.py create {noformat}
4. Run some get_slice queries against the empty CF and note down the average response time (in seconds)
{noformat} python attack.py {noformat}
get_slice count: 788
get_slice total response time: 0.3126376
*get_slice average response time: 0.000397208075838*
5. Stop Cassandra 1.2.18 and install Cassandra 3.10 from https://archive.apache.org/dist/cassandra/3.10/apache-cassandra-3.10-bin.tar.gz
6. Tweak cassandra.yaml to enable the thrift service (start_rpc=true) and run Cassandra from an elevated cmd console using cassandra.bat
7. Create a test keyspace and an empty CF using the attack.py script
{noformat} python attack.py create {noformat}
8. Run some get_slice queries against the empty CF using attack.py and note down the average response time (in seconds)
{noformat} python attack.py {noformat}
get_slice count: 788
get_slice total response time: 1.1646185
*get_slice average response time: 0.00147842634753*
9. Compare the average response times

EXPECTED: get_slice response time of Cassandra 3.10 is no worse than on Cassandra 1.2.18
ACTUAL: get_slice response time of Cassandra 3.10 is 3x worse than that of Cassandra 1.2.18

REMARKS:
- this seems to happen only on the Windows platform (tested on Windows 10 and Windows Server 2008 R2)
- running the very same procedure on Linux (Ubuntu) renders roughly the same response times
- I sniffed the traffic to/from Cassandra 1.2.18 and Cassandra 3.10, and it can be seen that Cassandra 3.10 responds slower (Wireshark dumps attached)
- when attacking the server with concurrent get_slice queries I see lower CPU usage for Cassandra 3.10 than for Cassandra 1.2.18

I am willing to work on this on my own if you give me some tips on where to look. I am also aware that this might be more Windows/Java related; nevertheless, any help from your side would be much appreciated.
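The measurement in the steps above boils down to a timing loop: issue many get_slice queries, then report count, total, and average latency. A hypothetical Java sketch of that loop (the attached attack.py uses pycassa against the Thrift interface; here the query is a stubbed stand-in, and SliceTimer is an illustrative name, not part of any attachment):

```java
// Hypothetical timing harness mirroring the benchmark described in the steps
// above. The query passed in is a stand-in for an actual get_slice call.
public class SliceTimer {
    // Returns { count, total seconds, average seconds per call }.
    static double[] time(Runnable query, int iterations) {
        long totalNanos = 0;
        for (int i = 0; i < iterations; i++) {
            long start = System.nanoTime();
            query.run();                          // stand-in for get_slice
            totalNanos += System.nanoTime() - start;
        }
        double totalSeconds = totalNanos / 1e9;
        return new double[] { iterations, totalSeconds, totalSeconds / iterations };
    }

    public static void main(String[] args) {
        double[] r = time(() -> { /* no-op stand-in for a get_slice call */ }, 788);
        System.out.printf("get_slice count: %.0f%n", r[0]);
        System.out.printf("get_slice total response time: %.7f%n", r[1]);
        System.out.printf("get_slice average response time: %.12f%n", r[2]);
    }
}
```

With a real client call substituted for the stub, comparing the averages printed against 1.2.18 and 3.10 reproduces the comparison in steps 4 and 8.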
[jira] [Comment Edited] (CASSANDRA-13692) CompactionAwareWriter_getWriteDirectory throws incompatible exceptions
[ https://issues.apache.org/jira/browse/CASSANDRA-13692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16150212#comment-16150212 ] Dimitar Dimitrov edited comment on CASSANDRA-13692 at 9/1/17 8:51 AM:
--
Okay, here are the branches with the proposed changes:
| [2.2|https://github.com/apache/cassandra/compare/cassandra-2.2...dimitarndimitrov:c13692-2.2] | [testall|^c13692-2.2-testall-results.PNG] | [dtest|^c13692-2.2-dtest-results.PNG] ([dtest-baseline|https://cassci.datastax.com/job/cassandra-2.2_dtest/lastCompletedBuild/testReport/]) |
| [3.0|https://github.com/apache/cassandra/compare/cassandra-3.0...dimitarndimitrov:c13692-3.0] | [testall|^c13692-3.0-testall-results.PNG] | [dtest|^c13692-3.0-dtest-results.PNG] ([dtest-baseline|https://cassci.datastax.com/job/cassandra-3.0_dtest/lastCompletedBuild/testReport/]) |
| [3.11|https://github.com/apache/cassandra/compare/cassandra-3.11...dimitarndimitrov:c13692-3.11] | [testall|^c13692-3.11-testall-results.PNG] ([testall-baseline|https://cassci.datastax.com/job/cassandra-3.11_testall/lastCompletedBuild/testReport/]) | [dtest|^c13692-3.11-dtest-results.PNG] ([dtest-baseline|https://cassci.datastax.com/job/cassandra-3.11_dtest/lastCompletedBuild/testReport/]) |
| [trunk|https://github.com/apache/cassandra/compare/trunk...dimitarndimitrov:c13692] | [testall|^c13692-testall-results.PNG] | [dtest|^c13692-dtest-results.PNG] ([dtest-baseline|https://cassci.datastax.com/job/trunk_dtest/lastCompletedBuild/testReport/]) |
{{testall}} results look good for all branches, but there's a common theme of consistency_test.TestConsistency.test_13747 dtests failing, in addition to the common, expected-to-be-unrelated {{dtest}} failures. My assumption is that this is related to CASSANDRA-13747 (the comments there seem to corroborate that). [~iamaleksey], do you have an idea if that could be the case?
> CompactionAwareWriter_getWriteDirectory throws incompatible exceptions > -- > > Key: CASSANDRA-13692 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13692 > Project: Cassandra > Issue Type: Bug > Components: Compaction >Reporter: Hao Zhong >Assignee: Dimitar Dimitrov > Labels: lhf > Attachments: c13692-2.2-dtest-results.PNG, > c13692-2.2-testall-results.PNG, c13692-3.0-dtest-results.PNG, > c13692-3.0-testall-results.PNG, c13692-3.11-dtest-results.PNG, > c13692-3.11-testall-results.PNG, c13692-dtest-results.PNG, > c13692-testall-results.PNG > > > The CompactionAwareWriter_getWriteDirectory throws RuntimeException: > {code} > public Directories.DataDirectory getWriteDirectory(Iterable > sstables, long estimatedWriteSize) > { > File directory = null; > for (SSTableReader sstable : sstables) > { > if (directory == null) > directory = sstable.descriptor.directory; > if (!directory.equals(sstable.descriptor.directory)) > { > logger.trace("All sstables not from the same disk - putting > results in {}",
[jira] [Created] (CASSANDRA-13835) Thrift get_slice responds slower on Cassandra 3
Pawel Szlendak created CASSANDRA-13835:
--
Summary: Thrift get_slice responds slower on Cassandra 3
Key: CASSANDRA-13835
URL: https://issues.apache.org/jira/browse/CASSANDRA-13835
Project: Cassandra
Issue Type: Bug
Reporter: Pawel Szlendak
Attachments: attack.py, cassandra120_get_slice_reply_time.png, cassandra310_get_slice_reply_time.png

-- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-13692) CompactionAwareWriter_getWriteDirectory throws incompatible exceptions
[ https://issues.apache.org/jira/browse/CASSANDRA-13692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dimitar Dimitrov updated CASSANDRA-13692:
-
    Attachment: c13692-2.2-dtest-results.PNG
                c13692-2.2-testall-results.PNG
                c13692-3.0-dtest-results.PNG
                c13692-3.0-testall-results.PNG
                c13692-3.11-dtest-results.PNG
                c13692-3.11-testall-results.PNG
                c13692-dtest-results.PNG
                c13692-testall-results.PNG

Adding screenshots from CI results.

> CompactionAwareWriter_getWriteDirectory throws incompatible exceptions
> --
>
>                 Key: CASSANDRA-13692
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13692
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Compaction
>            Reporter: Hao Zhong
>            Assignee: Dimitar Dimitrov
>              Labels: lhf
>         Attachments: c13692-2.2-dtest-results.PNG, c13692-2.2-testall-results.PNG, c13692-3.0-dtest-results.PNG, c13692-3.0-testall-results.PNG, c13692-3.11-dtest-results.PNG, c13692-3.11-testall-results.PNG, c13692-dtest-results.PNG, c13692-testall-results.PNG
>
> {{CompactionAwareWriter.getWriteDirectory}} throws a plain RuntimeException:
> {code}
> public Directories.DataDirectory getWriteDirectory(Iterable<SSTableReader> sstables, long estimatedWriteSize)
> {
>     File directory = null;
>     for (SSTableReader sstable : sstables)
>     {
>         if (directory == null)
>             directory = sstable.descriptor.directory;
>         if (!directory.equals(sstable.descriptor.directory))
>         {
>             logger.trace("All sstables not from the same disk - putting results in {}", directory);
>             break;
>         }
>     }
>     Directories.DataDirectory d = getDirectories().getDataDirectoryForFile(directory);
>     if (d != null)
>     {
>         long availableSpace = d.getAvailableSpace();
>         if (availableSpace < estimatedWriteSize)
>             throw new RuntimeException(String.format("Not enough space to write %s to %s (%s available)",
>                                                      FBUtilities.prettyPrintMemory(estimatedWriteSize),
>                                                      d.location,
>                                                      FBUtilities.prettyPrintMemory(availableSpace)));
>         logger.trace("putting compaction results in {}", directory);
>         return d;
>     }
>     d = getDirectories().getWriteableLocation(estimatedWriteSize);
>     if (d == null)
>         throw new RuntimeException(String.format("Not enough disk space to store %s",
>                                                  FBUtilities.prettyPrintMemory(estimatedWriteSize)));
>     return d;
> }
> {code}
> However, the thrown exception does not trigger the failure policy. CASSANDRA-11448 fixed a similar problem. The buggy code is:
> {code}
> protected Directories.DataDirectory getWriteDirectory(long writeSize)
> {
>     Directories.DataDirectory directory = getDirectories().getWriteableLocation(writeSize);
>     if (directory == null)
>         throw new RuntimeException("Insufficient disk space to write " + writeSize + " bytes");
>     return directory;
> }
> {code}
> The fixed code is:
> {code}
> protected Directories.DataDirectory getWriteDirectory(long writeSize)
> {
>     Directories.DataDirectory directory = getDirectories().getWriteableLocation(writeSize);
>     if (directory == null)
>         throw new FSWriteError(new IOException("Insufficient disk space to write " + writeSize + " bytes"), "");
>     return directory;
> }
> {code}
> The fixed code throws an FSWriteError and triggers the failure policy.

--
This message was sent by Atlassian JIRA (v6.4.14#64029)
-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org
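The incompatibility above comes down to type-based dispatch: the code that drives the disk failure policy reacts to filesystem-specific error types, so a bare RuntimeException sails past it while an FSWriteError wrapping an IOException is acted on. A minimal sketch of that idea, using stub classes (the FSWriteError and triggersFailurePolicy below are simplified stand-ins, not Cassandra's actual classes):

```java
import java.io.IOException;

public class FailurePolicyDemo
{
    // Simplified stand-in for org.apache.cassandra.io.FSWriteError:
    // a RuntimeException that carries the underlying I/O cause and a path
    static class FSWriteError extends RuntimeException
    {
        final String path;
        FSWriteError(Throwable cause, String path)
        {
            super(cause);
            this.path = path;
        }
    }

    // Simplified stand-in for the type check that decides whether the
    // disk failure policy should fire; a plain RuntimeException falls through
    static boolean triggersFailurePolicy(Throwable t)
    {
        return t instanceof FSWriteError;
    }

    public static void main(String[] args)
    {
        RuntimeException plain = new RuntimeException("Not enough space to write");
        FSWriteError typed = new FSWriteError(new IOException("Insufficient disk space"), "");

        System.out.println("plain RuntimeException triggers policy: " + triggersFailurePolicy(plain));
        System.out.println("FSWriteError triggers policy: " + triggersFailurePolicy(typed));
    }
}
```

The same message and cause travel in both cases; only the type differs, which is why the two exceptions are "incompatible" from the failure policy's point of view.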
[jira] [Comment Edited] (CASSANDRA-13692) CompactionAwareWriter_getWriteDirectory throws incompatible exceptions
[ https://issues.apache.org/jira/browse/CASSANDRA-13692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16150212#comment-16150212 ] Dimitar Dimitrov edited comment on CASSANDRA-13692 at 9/1/17 8:46 AM: -- Okay, here are the branches with the proposed changes: | [2.2|https://github.com/apache/cassandra/compare/cassandra-2.2...dimitarndimitrov:c13692-2.2] | [testall|^c13692-2.2-testall-results.png] | [dtest|^c13692-2.2-dtest-results.png] ([dtest-baseline|https://cassci.datastax.com/job/cassandra-2.2_dtest/lastCompletedBuild/testReport/]) | | [3.0|https://github.com/apache/cassandra/compare/cassandra-3.0...dimitarndimitrov:c13692-3.0] | [testall|^c13692-3.0-testall-results.png] | [dtest|^c13692-3.0-dtest-results.png] ([dtest-baseline|https://cassci.datastax.com/job/cassandra-3.0_dtest/lastCompletedBuild/testReport/]) | | [3.11|https://github.com/apache/cassandra/compare/cassandra-3.11...dimitarndimitrov:c13692-3.11] | [testall|^c13692-3.11-testall-results.png] ([testall-baseline|https://cassci.datastax.com/job/cassandra-3.11_testall/lastCompletedBuild/testReport/]) | [dtest|^c13692-3.11-dtest-results.png] ([dtest-baseline|https://cassci.datastax.com/job/cassandra-3.11_dtest/lastCompletedBuild/testReport/]) | | [trunk|https://github.com/apache/cassandra/compare/trunk...dimitarndimitrov:c13692] | [testall|^c13692-testall-results.png] | [dtest|^c13692-dtest-results.png] ([dtest-baseline|https://cassci.datastax.com/job/trunk_dtest/lastCompletedBuild/testReport/]) | {{testall}} looks good for all branches, but there's a common theme of consistency_test.TestConsistency.test_13747 {{dtest}}s failing, in addition to the common-expected-to-be-unrelated {{dtest}} failures. My assumption is that this is related to CASSANDRA-13747 (the comments there seem to corroborate that). [~iamaleksey] , do you have an idea if that could be the case? 
[jira] [Comment Edited] (CASSANDRA-13833) Failed compaction is not captured
[ https://issues.apache.org/jira/browse/CASSANDRA-13833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16150158#comment-16150158 ] Marcus Eriksson edited comment on CASSANDRA-13833 at 9/1/17 7:58 AM: - nice catch, code LGTM, just a small test fix for 3.11 and trunk: https://github.com/krummas/cassandra/commit/bb9c9e0b685d3b4e76a7b082b46b01a7ed6c8af5 running dtests: https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/261/ https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/262/ https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/263/ https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/264/ was (Author: krummas): nice catch, code LGTM, just a small fix for 3.11 and trunk: https://github.com/krummas/cassandra/commit/bb9c9e0b685d3b4e76a7b082b46b01a7ed6c8af5 running dtests: https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/261/ https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/262/ https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/263/ https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/264/ > Failed compaction is not captured > - > > Key: CASSANDRA-13833 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13833 > Project: Cassandra > Issue Type: Bug > Components: Compaction >Reporter: Jay Zhuang >Assignee: Jay Zhuang > > Follow up for CASSANDRA-13785, when the compaction failed, it fails silently. > No error message is logged and exceptions metric is not updated. 
Basically, > it's unable to get the exception: > [CompactionManager.java:1491|https://github.com/apache/cassandra/blob/cassandra-2.2/src/java/org/apache/cassandra/db/compaction/CompactionManager.java#L1491] > Here is the call stack: > {noformat} > at > org.apache.cassandra.db.compaction.CompactionTask.runMayThrow(CompactionTask.java:195) > at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) > at > org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:89) > at > org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:61) > at > org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionCandidate.run(CompactionManager.java:264) > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at > org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:79) > at java.lang.Thread.run(Thread.java:745) > {noformat} > There're 2 {{FutureTask}} in the call stack, for example > {{FutureTask1(FutureTask2))}}, If the call thrown an exception, > {{FutureTask2}} sets the status, save the exception and return. But > FutureTask1 doesn't get any exception, then set the status to normal. 
So > we're unable to get the exception in: > [CompactionManager.java:1491|https://github.com/apache/cassandra/blob/cassandra-2.2/src/java/org/apache/cassandra/db/compaction/CompactionManager.java#L1491] > 2.1.x is working fine, here is the call stack: > {noformat} > at > org.apache.cassandra.db.compaction.CompactionTask.runMayThrow(CompactionTask.java:177) > ~[main/:na] > at > org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) > ~[main/:na] > at > org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:73) > ~[main/:na] > at > org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:59) > ~[main/:na] > at > org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionCandidate.run(CompactionManager.java:264) > ~[main/:na] > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > ~[na:1.8.0_141] > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > ~[na:1.8.0_141] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > ~[na:1.8.0_141] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > [na:1.8.0_141] > at java.lang.Thread.run(Thread.java:748) [na:1.8.0_141] > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
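The nested-{{FutureTask}} behaviour described above can be reproduced in isolation. A minimal sketch (the class name is ours, not Cassandra's): the inner task's exception is captured by the inner future, so the outer future completes normally and the caller never sees the failure.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.FutureTask;

public class NestedFutureTaskDemo {
    public static void main(String[] args) throws Exception {
        // Inner task throws; FutureTask.run() catches the exception and
        // stores it in the inner future instead of propagating it.
        FutureTask<Object> inner = new FutureTask<>(() -> {
            throw new RuntimeException("compaction failed");
        });
        // Outer task runs the inner FutureTask as a plain Runnable.
        // Since inner.run() returns normally, the outer future completes
        // normally too -- this mirrors FutureTask1(FutureTask2) above.
        FutureTask<Object> outer = new FutureTask<>(inner, null);
        outer.run();
        try {
            outer.get(); // does NOT throw: the failure is invisible here
            System.out.println("outer.get() returned normally");
        } catch (ExecutionException e) {
            System.out.println("never reached: " + e.getCause());
        }
        try {
            inner.get(); // the exception is only visible on the inner future
        } catch (ExecutionException e) {
            System.out.println("inner holds: " + e.getCause().getMessage());
        }
    }
}
```

Running this prints that the outer future completed normally while the inner one holds the exception, which is exactly why the catch at CompactionManager.java:1491 never fires.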
[jira] [Commented] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16150175#comment-16150175 ] Marcus Eriksson commented on CASSANDRA-13418: - A dtest that makes sure that we drop sstables when the option is enabled, and that we don't drop them when it is disabled, would be good. > Allow TWCS to ignore overlaps when dropping fully expired sstables > -- > > Key: CASSANDRA-13418 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13418 > Project: Cassandra > Issue Type: Improvement > Components: Compaction >Reporter: Corentin Chary >Assignee: Romain GERARD > Labels: twcs > Attachments: twcs-cleanup.png > > > http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If > you really want read-repairs you're going to have sstables blocking the > expiration of other fully expired SSTables because they overlap. > You can set unchecked_tombstone_compaction = true or tombstone_threshold to a > very low value and that will purge the blockers of old data that should > already have expired, thus removing the overlaps and allowing the other > SSTables to expire. > The thing is that this is rather CPU intensive and not optimal. If you have > time series, you might not care if all your data doesn't exactly expire at > the right time, or if data re-appears for some time, as long as it gets > deleted as soon as it can. And in this situation I believe it would be really > beneficial to allow users to simply ignore overlapping SSTables when looking > for fully expired ones. > To the question: why would you need read-repairs? > - Full repairs basically take longer than the TTL of the data on my dataset, > so this isn't really effective. > - Even with a 10% chance of doing a repair, we found out that this would be > enough to greatly reduce entropy of the most used data (and if you have > timeseries, you're likely to have a dashboard doing the same important > queries over and over again). 
> - LOCAL_QUORUM is too expensive (need >3 replicas), QUORUM is too slow. > I'll try to come up with a patch demonstrating how this would work, try it on > our system and report the effects. > cc: [~adejanovski], [~rgerard] as I know you worked on similar issues already. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
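The workaround described in the ticket (setting {{unchecked_tombstone_compaction}} or a very low {{tombstone_threshold}}) is applied through the table's compaction options. A hedged sketch — the keyspace and table names are hypothetical; the subproperty names are the real compaction options discussed above:

```sql
-- Hypothetical table; unchecked_tombstone_compaction and tombstone_threshold
-- are the compaction subproperties the ticket describes as a workaround.
ALTER TABLE metrics.timeseries
WITH compaction = {
  'class': 'TimeWindowCompactionStrategy',
  'compaction_window_unit': 'DAYS',
  'compaction_window_size': '1',
  'unchecked_tombstone_compaction': 'true',
  'tombstone_threshold': '0.05'
};
```

As the ticket notes, this purges the blocking tombstones at the cost of extra compaction CPU, which is what the proposed "ignore overlaps" option aims to avoid.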
[jira] [Commented] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16150173#comment-16150173 ] Romain GERARD commented on CASSANDRA-13418: --- Will change the code style and the {{protected}} to {{private}} (splitting from {{getFullyExpiredSSTables}} seems more readable to me). If you can think of any more tests, I will add them.
[jira] [Commented] (CASSANDRA-13754) FastThreadLocal leaks memory
[ https://issues.apache.org/jira/browse/CASSANDRA-13754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16150172#comment-16150172 ] Markus Dlugi commented on CASSANDRA-13754: -- [~snazy], I don't think the node is overloaded. I originally thought so as well, so I made a little experiment where I included a cap in our load test limiting the {{INSERT}}s per minute from ~25,000 to ~10,000. As a consequence, the node survived a little longer, but in the end it still died with an {{OutOfMemoryError}} after more data had been inserted. So it's not that there are too many active writes, it's just that the node fails after a certain amount of total writes, which indicates to me that a memory leak is indeed happening. I also had another look into the heap dump I sent you, and you are correct that the heap is mostly filled with {{BTree$Builder}} instances that still have stuff in their {{values}} array. However, if you look closer, you will notice that for each of these instances, the {{values}} array always contains {{null}} for the first couple of entries, and only after those does the actual content appear. For some reason, the actual content always starts at index 28, whereas indices 0-27 are {{null}} - not sure if this is a coincidence? But you can also see that for all the {{BTree$Builder}} objects, the {{count}} attribute is 0, which also indicates to me that {{BTree$Builder.cleanup()}} has already run and those are not active writes. This theory is supported by the fact that my little workaround of manually calling {{FastThreadLocal.removeAll()}} actually works, because this means that no other objects except the {{FastThreadLocal}}s still have references to the builders. Therefore, I think we have two issues here: # {{SEPWorker}} is never cleaning the {{FastThreadLocal}}s, therefore accumulating references to otherwise dead objects - maybe we can include something to at least remove non-static entries regularly? 
# {{BTree$Builder}} seems to have an issue properly cleaning up after building, so the objects referenced by the {{FastThreadLocal}}s of the {{SEPWorker}} threads are very large and thus ultimately lead to the {{OutOfMemoryError}}s. > FastThreadLocal leaks memory > > > Key: CASSANDRA-13754 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13754 > Project: Cassandra > Issue Type: Bug > Components: Core > Environment: Cassandra 3.11.0, Netty 4.0.44.Final, OpenJDK 8u141-b15 >Reporter: Eric Evans >Assignee: Robert Stupp > Fix For: 3.11.1 > > > After a chronic bout of {{OutOfMemoryError}} in our development environment, > a heap analysis is showing that more than 10G of our 12G heaps are consumed > by the {{threadLocals}} members (instances of {{java.lang.ThreadLocalMap}}) > of various {{io.netty.util.concurrent.FastThreadLocalThread}} instances. > Reverting > [cecbe17|https://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=commit;h=cecbe17e3eafc052acc13950494f7dddf026aa54] > fixes the issue.
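The leak pattern described in the comment can be illustrated without Netty. A minimal sketch, assuming a plain {{java.lang.ThreadLocal}} as a stand-in for {{FastThreadLocal}} and a {{StringBuilder}} as a stand-in for the cached {{BTree$Builder}}: resetting the logical count to 0 does not release the large backing array, which stays pinned by the thread until the thread-local entry is explicitly removed (analogous to the {{FastThreadLocal.removeAll()}} workaround).

```java
public class ThreadLocalLeakSketch {
    // Hypothetical stand-in for the per-thread BTree$Builder cache: each
    // worker thread keeps a builder reachable via a thread-local, even
    // after the builder's contents are logically "done".
    static final ThreadLocal<StringBuilder> BUILDER =
            ThreadLocal.withInitial(() -> new StringBuilder(1 << 20)); // ~1 MB

    static String buildOnce() {
        StringBuilder b = BUILDER.get();
        b.setLength(0);       // "cleanup": the logical count drops to 0 ...
        b.append("row data"); // ... but the 1 MB backing array survives
        return b.toString();
    }

    public static void main(String[] args) {
        System.out.println(buildOnce());
        // The thread still pins the full backing array after the build:
        System.out.println("retained capacity: " + BUILDER.get().capacity());
        // The workaround described above: explicitly drop the thread-local
        // entry so the large array becomes garbage-collectable.
        BUILDER.remove();
        System.out.println("after remove(), the builder is unreferenced");
    }
}
```

With many long-lived worker threads each pinning a builder like this, the per-thread waste adds up to the multi-gigabyte heaps seen in the dump.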
[jira] [Commented] (CASSANDRA-13833) Failed compaction is not captured
[ https://issues.apache.org/jira/browse/CASSANDRA-13833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16150158#comment-16150158 ] Marcus Eriksson commented on CASSANDRA-13833: - nice catch, code LGTM, just a small fix for 3.11 and trunk: https://github.com/krummas/cassandra/commit/bb9c9e0b685d3b4e76a7b082b46b01a7ed6c8af5 running dtests: https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/261/ https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/262/ https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/263/ https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/264/
[jira] [Comment Edited] (CASSANDRA-13530) GroupCommitLogService
[ https://issues.apache.org/jira/browse/CASSANDRA-13530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16149884#comment-16149884 ] Yuji Ito edited comment on CASSANDRA-13530 at 9/1/17 7:17 AM: -- [~aweisberg] Sorry for the late reply. I measured the latencies again. As you said, a test requests 256 operations without SERIAL twice. The results below are from the 2nd round of requests. h5. Average latency of UPDATE ||Throughput\[ops\]||Batch - 2ms \[ms\]||Group - 15ms \[ms\]|| |100|1.63|9.58| |200|11.83|9.67| |500|17.31|10.20| |1000|19.93|10.75| I attached the result file ([^groupCommitLog_noSerial_result.xlsx]) including histograms of latency. was (Author: yuji): Sorry for late. I measured the latencies again. As you said, a test requests 256 operations without SERIAL twice. And the below results are reported in the 2nd requests. h5. Average latency of UPDATE ||Throughput\[ops\]||Batch - 2ms \[ms\]||Group - 15ms \[ms\]|| |100|1.63|9.58| |200|11.83|9.67| |500|17.31|10.20| |1000|19.93|10.75| I attached the result file ([^groupCommitLog_noSerial_result.xlsx]) including histograms of latency. > GroupCommitLogService > - > > Key: CASSANDRA-13530 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13530 > Project: Cassandra > Issue Type: Improvement >Reporter: Yuji Ito >Assignee: Yuji Ito > Fix For: 2.2.x, 3.0.x, 3.11.x > > Attachments: groupCommit22.patch, groupCommit30.patch, > groupCommit3x.patch, groupCommitLog_noSerial_result.xlsx, > groupCommitLog_result.xlsx, GuavaRequestThread.java, MicroRequestThread.java > > > I propose a new CommitLogService, GroupCommitLogService, to improve the > throughput when lots of requests are received. > It improved the throughput by up to 94%. > I'd like to discuss this CommitLogService. > Currently, we can select one of 2 CommitLog services: Periodic and Batch. > In Periodic, we might lose some commit log which hasn't been written to the disk. > In Batch, the commit log is written to the disk on every write. 
The size of each commit > log write is too small (< 4KB). Under high concurrency, these writes are > gathered and persisted to the disk at once. But with insufficient > concurrency, many small writes are issued and performance decreases due > to the latency of the disk. Even if you use an SSD, processing many IO > commands decreases performance. > GroupCommitLogService writes several commit logs to the disk at once. > The patch adds GroupCommitLogService (it is enabled by setting > `commitlog_sync` and `commitlog_sync_group_window_in_ms` in cassandra.yaml). > The only difference from Batch is the waiting on the semaphore. > By waiting on the semaphore, several commit log writes are executed at the > same time. > In GroupCommitLogService, the latency becomes worse if there is no > concurrency. > I measured the performance with my microbench (MicroRequestThread.java) by > increasing the number of threads. The cluster has 3 nodes (Replication factor: > 3). Each node is an AWS EC2 m4.large instance + a 200IOPS io1 volume. > The result is as below. The GroupCommitLogService with a 10ms window improved > update with Paxos by 94% and improved select with Paxos by 76%. > h6. SELECT / sec > ||\# of threads||Batch 2ms||Group 10ms|| > |1|192|103| > |2|163|212| > |4|264|416| > |8|454|800| > |16|744|1311| > |32|1151|1481| > |64|1767|1844| > |128|2949|3011| > |256|4723|5000| > h6. UPDATE / sec > ||\# of threads||Batch 2ms||Group 10ms|| > |1|45|26| > |2|39|51| > |4|58|102| > |8|102|198| > |16|167|213| > |32|289|295| > |64|544|548| > |128|1046|1058| > |256|2020|2061|
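The group-commit idea described above can be sketched without any Cassandra classes. A minimal illustration (class and method names are ours, not the patch's): writers enqueue mutations and block on a future, while a syncer thread wakes up once per window (cf. {{commitlog_sync_group_window_in_ms}}), performs one "sync" covering the whole queued group, and completes all the futures at once — turning many tiny disk writes into one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

public class GroupCommitSketch {
    private final List<CompletableFuture<Void>> waiters = new ArrayList<>();
    private final StringBuilder buffer = new StringBuilder(); // stands in for the commit log
    private int syncs = 0;
    private volatile boolean stopped = false;

    // A writer appends its mutation and gets a future that completes only
    // once the whole group has been "synced".
    public synchronized CompletableFuture<Void> append(String mutation) {
        buffer.append(mutation).append('\n');
        CompletableFuture<Void> ack = new CompletableFuture<>();
        waiters.add(ack);
        return ack;
    }

    // Runs once per group window: one "fsync" acknowledges every write
    // queued since the previous window.
    public synchronized void syncGroup() {
        if (waiters.isEmpty()) return;
        syncs++;
        waiters.forEach(w -> w.complete(null));
        waiters.clear();
    }

    public static void main(String[] args) throws Exception {
        GroupCommitSketch log = new GroupCommitSketch();
        Thread syncer = new Thread(() -> {
            while (!log.stopped) {
                try { TimeUnit.MILLISECONDS.sleep(15); } catch (InterruptedException e) { return; }
                log.syncGroup();
            }
        });
        syncer.start();
        List<CompletableFuture<Void>> acks = new ArrayList<>();
        for (int i = 0; i < 100; i++) acks.add(log.append("mutation-" + i));
        CompletableFuture.allOf(acks.toArray(new CompletableFuture[0])).get(5, TimeUnit.SECONDS);
        log.stopped = true;
        syncer.join();
        // Many writes are acknowledged by far fewer syncs.
        System.out.println("writes=100, group syncs=" + log.syncs);
    }
}
```

This also shows why latency worsens with no concurrency: a lone write still waits for the window to elapse before its group (of one) is synced.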
[jira] [Updated] (CASSANDRA-13833) Failed compaction is not captured
[ https://issues.apache.org/jira/browse/CASSANDRA-13833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcus Eriksson updated CASSANDRA-13833: Reviewer: Marcus Eriksson