date:20171102


[ 
https://issues.apache.org/jira/browse/CASSANDRA-12182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16236776#comment-16236776
 ] 

Michał Szczygieł commented on CASSANDRA-12182:
--

Thank you [~KurtG] for the feedback. I've attached patch with a testcase.

> redundant StatusLogger print out when both dropped message and long GC event 
> happen
> ---
>
> Key: CASSANDRA-12182
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12182
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Wei Deng
>Assignee: Michał Szczygieł
>Priority: Minor
>  Labels: lhf
> Attachments: 12182-trunk.txt, 12182-trunk.txt
>
>
> I was stress testing a C* 3.0 environment and it appears that when the CPU is 
> running low, HINT and MUTATION messages will start to get dropped, and the GC 
> thread can also get some really long-running GC, and I'd get some redundant 
> log entries in system.log like the following:
> {noformat}
> WARN  [Service Thread] 2016-07-12 22:48:45,748  GCInspector.java:282 - G1 
> Young Generation GC in 522ms.  G1 Eden Space: 68157440 -> 0; G1 Old Gen: 
> 3376113224 -> 3468387912; G1 Survivor Space: 24117248 -> 0; 
> INFO  [Service Thread] 2016-07-12 22:48:45,763  StatusLogger.java:52 - Pool 
> NameActive   Pending  Completed   Blocked  All Time 
> Blocked
> INFO  [ScheduledTasks:1] 2016-07-12 22:48:45,775  MessagingService.java:983 - 
> MUTATION messages were dropped in last 5000 ms: 419 for internal timeout and 
> 0 for cross node timeout
> INFO  [ScheduledTasks:1] 2016-07-12 22:48:45,776  MessagingService.java:983 - 
> HINT messages were dropped in last 5000 ms: 89 for internal timeout and 0 for 
> cross node timeout
> INFO  [ScheduledTasks:1] 2016-07-12 22:48:45,776  StatusLogger.java:52 - Pool 
> NameActive   Pending  Completed   Blocked  All Time 
> Blocked
> INFO  [ScheduledTasks:1] 2016-07-12 22:48:45,798  StatusLogger.java:56 - 
> MutationStage32  4194   32997234 0
>  0
> INFO  [ScheduledTasks:1] 2016-07-12 22:48:45,798  StatusLogger.java:56 - 
> ViewMutationStage 0 0  0 0
>  0
> INFO  [ScheduledTasks:1] 2016-07-12 22:48:45,799  StatusLogger.java:56 - 
> ReadStage 0 0940 0
>  0
> INFO  [Service Thread] 2016-07-12 22:48:45,800  StatusLogger.java:56 - 
> MutationStage32  4363   32997333 0
>  0
> INFO  [Service Thread] 2016-07-12 22:48:45,801  StatusLogger.java:56 - 
> ViewMutationStage 0 0  0 0
>  0
> INFO  [Service Thread] 2016-07-12 22:48:45,801  StatusLogger.java:56 - 
> ReadStage 0 0940 0
>  0
> INFO  [Service Thread] 2016-07-12 22:48:45,802  StatusLogger.java:56 - 
> RequestResponseStage  0 0   11094437 0
>  0
> INFO  [Service Thread] 2016-07-12 22:48:45,802  StatusLogger.java:56 - 
> ReadRepairStage   0 0  5 0
>  0
> INFO  [ScheduledTasks:1] 2016-07-12 22:48:45,803  StatusLogger.java:56 - 
> RequestResponseStage  4 0   11094509 0
>  0
> INFO  [ScheduledTasks:1] 2016-07-12 22:48:45,807  StatusLogger.java:56 - 
> ReadRepairStage   0 0  5 0
>  0
> INFO  [ScheduledTasks:1] 2016-07-12 22:48:45,808  StatusLogger.java:56 - 
> CounterMutationStage  0 0  0 0
>  0
> INFO  [ScheduledTasks:1] 2016-07-12 22:48:45,809  StatusLogger.java:56 - 
> MiscStage 0 0  0 0
>  0
> INFO  [ScheduledTasks:1] 2016-07-12 22:48:45,809  StatusLogger.java:56 - 
> CompactionExecutor262   1234 0
>  0
> INFO  [ScheduledTasks:1] 2016-07-12 22:48:45,810  StatusLogger.java:56 - 
> MemtableReclaimMemory 0 0 79 0
>  0
> INFO  [ScheduledTasks:1] 2016-07-12 22:48:45,810  StatusLogger.java:56 - 
> PendingRangeCalculator0 0  3 0
>  0
> INFO  [ScheduledTasks:1] 2016-07-12 22:48:45,819  StatusLogger.java:56 - 
> GossipStage   0 0   5214 0
>  0
> INFO  [ScheduledTasks:1] 2016-07-12 22:48:45,820  StatusLogger.java:56 - 
> SecondaryIndexManagement  0 0  3 0
>  0
> INFO

[jira] [Updated] (CASSANDRA-12182) redundant StatusLogger print out when both dropped message and long GC event happen


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-12182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michał Szczygieł updated CASSANDRA-12182:
-
Status: Patch Available  (was: In Progress)

> redundant StatusLogger print out when both dropped message and long GC event 
> happen
> ---
>
> Key: CASSANDRA-12182
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12182
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Wei Deng
>Assignee: Michał Szczygieł
>Priority: Minor
>  Labels: lhf
> Attachments: 12182-trunk.txt, 12182-trunk.txt
>
>
> I was stress testing a C* 3.0 environment and it appears that when the CPU is 
> running low, HINT and MUTATION messages will start to get dropped, and the GC 
> thread can also get some really long-running GC, and I'd get some redundant 
> log entries in system.log like the following:
> {noformat}
> WARN  [Service Thread] 2016-07-12 22:48:45,748  GCInspector.java:282 - G1 
> Young Generation GC in 522ms.  G1 Eden Space: 68157440 -> 0; G1 Old Gen: 
> 3376113224 -> 3468387912; G1 Survivor Space: 24117248 -> 0; 
> INFO  [Service Thread] 2016-07-12 22:48:45,763  StatusLogger.java:52 - Pool 
> NameActive   Pending  Completed   Blocked  All Time 
> Blocked
> INFO  [ScheduledTasks:1] 2016-07-12 22:48:45,775  MessagingService.java:983 - 
> MUTATION messages were dropped in last 5000 ms: 419 for internal timeout and 
> 0 for cross node timeout
> INFO  [ScheduledTasks:1] 2016-07-12 22:48:45,776  MessagingService.java:983 - 
> HINT messages were dropped in last 5000 ms: 89 for internal timeout and 0 for 
> cross node timeout
> INFO  [ScheduledTasks:1] 2016-07-12 22:48:45,776  StatusLogger.java:52 - Pool 
> NameActive   Pending  Completed   Blocked  All Time 
> Blocked
> INFO  [ScheduledTasks:1] 2016-07-12 22:48:45,798  StatusLogger.java:56 - 
> MutationStage32  4194   32997234 0
>  0
> INFO  [ScheduledTasks:1] 2016-07-12 22:48:45,798  StatusLogger.java:56 - 
> ViewMutationStage 0 0  0 0
>  0
> INFO  [ScheduledTasks:1] 2016-07-12 22:48:45,799  StatusLogger.java:56 - 
> ReadStage 0 0940 0
>  0
> INFO  [Service Thread] 2016-07-12 22:48:45,800  StatusLogger.java:56 - 
> MutationStage32  4363   32997333 0
>  0
> INFO  [Service Thread] 2016-07-12 22:48:45,801  StatusLogger.java:56 - 
> ViewMutationStage 0 0  0 0
>  0
> INFO  [Service Thread] 2016-07-12 22:48:45,801  StatusLogger.java:56 - 
> ReadStage 0 0940 0
>  0
> INFO  [Service Thread] 2016-07-12 22:48:45,802  StatusLogger.java:56 - 
> RequestResponseStage  0 0   11094437 0
>  0
> INFO  [Service Thread] 2016-07-12 22:48:45,802  StatusLogger.java:56 - 
> ReadRepairStage   0 0  5 0
>  0
> INFO  [ScheduledTasks:1] 2016-07-12 22:48:45,803  StatusLogger.java:56 - 
> RequestResponseStage  4 0   11094509 0
>  0
> INFO  [ScheduledTasks:1] 2016-07-12 22:48:45,807  StatusLogger.java:56 - 
> ReadRepairStage   0 0  5 0
>  0
> INFO  [ScheduledTasks:1] 2016-07-12 22:48:45,808  StatusLogger.java:56 - 
> CounterMutationStage  0 0  0 0
>  0
> INFO  [ScheduledTasks:1] 2016-07-12 22:48:45,809  StatusLogger.java:56 - 
> MiscStage 0 0  0 0
>  0
> INFO  [ScheduledTasks:1] 2016-07-12 22:48:45,809  StatusLogger.java:56 - 
> CompactionExecutor262   1234 0
>  0
> INFO  [ScheduledTasks:1] 2016-07-12 22:48:45,810  StatusLogger.java:56 - 
> MemtableReclaimMemory 0 0 79 0
>  0
> INFO  [ScheduledTasks:1] 2016-07-12 22:48:45,810  StatusLogger.java:56 - 
> PendingRangeCalculator0 0  3 0
>  0
> INFO  [ScheduledTasks:1] 2016-07-12 22:48:45,819  StatusLogger.java:56 - 
> GossipStage   0 0   5214 0
>  0
> INFO  [ScheduledTasks:1] 2016-07-12 22:48:45,820  StatusLogger.java:56 - 
> SecondaryIndexManagement  0 0  3 0
>  0
> INFO  [ScheduledTasks:1] 2016-07-12 22:48:45,820  StatusLogger.java:56 - 
>

[jira] [Updated] (CASSANDRA-12182) redundant StatusLogger print out when both dropped message and long GC event happen


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-12182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michał Szczygieł updated CASSANDRA-12182:
-
Attachment: 12182-trunk.txt

> redundant StatusLogger print out when both dropped message and long GC event 
> happen
> ---
>
> Key: CASSANDRA-12182
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12182
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Wei Deng
>Assignee: Michał Szczygieł
>Priority: Minor
>  Labels: lhf
> Attachments: 12182-trunk.txt, 12182-trunk.txt
>
>
> I was stress testing a C* 3.0 environment and it appears that when the CPU is 
> running low, HINT and MUTATION messages will start to get dropped, and the GC 
> thread can also get some really long-running GC, and I'd get some redundant 
> log entries in system.log like the following:
> {noformat}
> WARN  [Service Thread] 2016-07-12 22:48:45,748  GCInspector.java:282 - G1 
> Young Generation GC in 522ms.  G1 Eden Space: 68157440 -> 0; G1 Old Gen: 
> 3376113224 -> 3468387912; G1 Survivor Space: 24117248 -> 0; 
> INFO  [Service Thread] 2016-07-12 22:48:45,763  StatusLogger.java:52 - Pool 
> NameActive   Pending  Completed   Blocked  All Time 
> Blocked
> INFO  [ScheduledTasks:1] 2016-07-12 22:48:45,775  MessagingService.java:983 - 
> MUTATION messages were dropped in last 5000 ms: 419 for internal timeout and 
> 0 for cross node timeout
> INFO  [ScheduledTasks:1] 2016-07-12 22:48:45,776  MessagingService.java:983 - 
> HINT messages were dropped in last 5000 ms: 89 for internal timeout and 0 for 
> cross node timeout
> INFO  [ScheduledTasks:1] 2016-07-12 22:48:45,776  StatusLogger.java:52 - Pool 
> NameActive   Pending  Completed   Blocked  All Time 
> Blocked
> INFO  [ScheduledTasks:1] 2016-07-12 22:48:45,798  StatusLogger.java:56 - 
> MutationStage32  4194   32997234 0
>  0
> INFO  [ScheduledTasks:1] 2016-07-12 22:48:45,798  StatusLogger.java:56 - 
> ViewMutationStage 0 0  0 0
>  0
> INFO  [ScheduledTasks:1] 2016-07-12 22:48:45,799  StatusLogger.java:56 - 
> ReadStage 0 0940 0
>  0
> INFO  [Service Thread] 2016-07-12 22:48:45,800  StatusLogger.java:56 - 
> MutationStage32  4363   32997333 0
>  0
> INFO  [Service Thread] 2016-07-12 22:48:45,801  StatusLogger.java:56 - 
> ViewMutationStage 0 0  0 0
>  0
> INFO  [Service Thread] 2016-07-12 22:48:45,801  StatusLogger.java:56 - 
> ReadStage 0 0940 0
>  0
> INFO  [Service Thread] 2016-07-12 22:48:45,802  StatusLogger.java:56 - 
> RequestResponseStage  0 0   11094437 0
>  0
> INFO  [Service Thread] 2016-07-12 22:48:45,802  StatusLogger.java:56 - 
> ReadRepairStage   0 0  5 0
>  0
> INFO  [ScheduledTasks:1] 2016-07-12 22:48:45,803  StatusLogger.java:56 - 
> RequestResponseStage  4 0   11094509 0
>  0
> INFO  [ScheduledTasks:1] 2016-07-12 22:48:45,807  StatusLogger.java:56 - 
> ReadRepairStage   0 0  5 0
>  0
> INFO  [ScheduledTasks:1] 2016-07-12 22:48:45,808  StatusLogger.java:56 - 
> CounterMutationStage  0 0  0 0
>  0
> INFO  [ScheduledTasks:1] 2016-07-12 22:48:45,809  StatusLogger.java:56 - 
> MiscStage 0 0  0 0
>  0
> INFO  [ScheduledTasks:1] 2016-07-12 22:48:45,809  StatusLogger.java:56 - 
> CompactionExecutor262   1234 0
>  0
> INFO  [ScheduledTasks:1] 2016-07-12 22:48:45,810  StatusLogger.java:56 - 
> MemtableReclaimMemory 0 0 79 0
>  0
> INFO  [ScheduledTasks:1] 2016-07-12 22:48:45,810  StatusLogger.java:56 - 
> PendingRangeCalculator0 0  3 0
>  0
> INFO  [ScheduledTasks:1] 2016-07-12 22:48:45,819  StatusLogger.java:56 - 
> GossipStage   0 0   5214 0
>  0
> INFO  [ScheduledTasks:1] 2016-07-12 22:48:45,820  StatusLogger.java:56 - 
> SecondaryIndexManagement  0 0  3 0
>  0
> INFO  [ScheduledTasks:1] 2016-07-12 22:48:45,820  StatusLogger.java:56 - 
> HintsDispatcher

[jira] [Comment Edited] (CASSANDRA-13475) First version of pluggable storage engine API.


[ 
https://issues.apache.org/jira/browse/CASSANDRA-13475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16236746#comment-16236746
 ] 

Blake Eggleston edited comment on CASSANDRA-13475 at 11/2/17 10:51 PM:
---

Let's keep discussion on this jira for the time being. Also, we're just talking 
about a plan at this point. What do you think of the plan as proposed? Any 
concerns? Things you think should be added, removed, or reordered?

edit: sorry Jason, I responded before I saw your response


was (Author: bdeggleston):
Let's keep discussion on this jira for the time being. Also, we're just talking 
about a plan at this point. What do you think of the plan as proposed? Any 
concerns? Things you think should be added, removed, or reordered?

> First version of pluggable storage engine API.
> --
>
> Key: CASSANDRA-13475
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13475
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Dikang Gu
>Assignee: Dikang Gu
>Priority: Major
>
> In order to support pluggable storage engine, we need to define a unified 
> interface/API, which can allow us to plug in different storage engines for 
> different requirements. 
> In very high level, the storage engine interface should include APIs to:
> 1. Apply update into the engine.
> 2. Query data from the engine.
> 3. Stream data in/out to/from the engine.
> 4. Table operations, like create/drop/truncate a table, etc.
> 5. Various stats about the engine.
> I create this ticket to start the discussions about the interface.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org

[jira] [Commented] (CASSANDRA-13592) Null Pointer exception at SELECT JSON statement

2017-11-02 Thread Chris mildebrandt (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-13592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16236755#comment-16236755
 ] 

Chris mildebrandt commented on CASSANDRA-13592:
---

I'm getting almost exactly the same stacktrace using Cassandra 3.11.1:
{noformat}
java.lang.NullPointerException: null
at 
org.apache.cassandra.dht.Murmur3Partitioner.getHash(Murmur3Partitioner.java:230)
 ~[apache-cassandra-3.11.1.jar:3.11.1]
at 
org.apache.cassandra.dht.Murmur3Partitioner.decorateKey(Murmur3Partitioner.java:66)
 ~[apache-cassandra-3.11.1.jar:3.11.1]
at 
org.apache.cassandra.config.CFMetaData.decorateKey(CFMetaData.java:627) 
~[apache-cassandra-3.11.1.jar:3.11.1]
at 
org.apache.cassandra.service.pager.PartitionRangeQueryPager.(PartitionRangeQueryPager.java:44)
 ~[apache-cassandra-3.11.1.jar:3.11.1]
at 
org.apache.cassandra.db.PartitionRangeReadCommand.getPager(PartitionRangeReadCommand.java:268)
 ~[apache-cassandra-3.11.1.jar:3.11.1]
at 
org.apache.cassandra.cql3.statements.SelectStatement.getPager(SelectStatement.java:475)
 ~[apache-cassandra-3.11.1.jar:3.11.1]
at 
org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:288)
 ~[apache-cassandra-3.11.1.jar:3.11.1]
at 
org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:118)
 ~[apache-cassandra-3.11.1.jar:3.11.1]
at 
org.apache.cassandra.cql3.QueryProcessor.processStatement(QueryProcessor.java:224)
 ~[apache-cassandra-3.11.1.jar:3.11.1]
at 
org.apache.cassandra.cql3.QueryProcessor.processPrepared(QueryProcessor.java:530)
 ~[apache-cassandra-3.11.1.jar:3.11.1]
at 
org.apache.cassandra.cql3.QueryProcessor.processPrepared(QueryProcessor.java:507)
 ~[apache-cassandra-3.11.1.jar:3.11.1]
at 
org.apache.cassandra.transport.messages.ExecuteMessage.execute(ExecuteMessage.java:146)
 ~[apache-cassandra-3.11.1.jar:3.11.1]
at 
org.apache.cassandra.transport.Message$Dispatcher.channelRead0(Message.java:517)
 [apache-cassandra-3.11.1.jar:3.11.1]
at 
org.apache.cassandra.transport.Message$Dispatcher.channelRead0(Message.java:410)
 [apache-cassandra-3.11.1.jar:3.11.1]
at 
io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
 [netty-all-4.0.44.Final.jar:4.0.44.Final]
at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:357)
 [netty-all-4.0.44.Final.jar:4.0.44.Final]
at 
io.netty.channel.AbstractChannelHandlerContext.access$600(AbstractChannelHandlerContext.java:35)
 [netty-all-4.0.44.Final.jar:4.0.44.Final]
at 
io.netty.channel.AbstractChannelHandlerContext$7.run(AbstractChannelHandlerContext.java:348)
 [netty-all-4.0.44.Final.jar:4.0.44.Final]
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
[na:1.8.0_131]
at 
org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:162)
 [apache-cassandra-3.11.1.jar:3.11.1]
at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:109) 
[apache-cassandra-3.11.1.jar:3.11.1]
at java.lang.Thread.run(Thread.java:748) [na:1.8.0_131]
{noformat}

It can be recreated with this project:
https://github.com/eyeofthefrog/CASSANDRA-13592

I think it's the same root cause, but let me know if I should open another 
issue. 

> Null Pointer exception at SELECT JSON statement
> ---
>
> Key: CASSANDRA-13592
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13592
> Project: Cassandra
>  Issue Type: Bug
>  Components: CQL
> Environment: Debian Linux
>Reporter: Wyss Philipp
>Assignee: ZhaoYang
>Priority: Major
>  Labels: beginner
> Fix For: 2.2.11, 3.0.15, 3.11.1, 4.0
>
> Attachments: system.log
>
>
> A Nulll pointer exception appears when the command
> {code}
> SELECT JSON * FROM examples.basic;
> ---MORE---
>  message="java.lang.NullPointerException">
> Examples.basic has the following description (DESC examples.basic;):
> CREATE TABLE examples.basic (
> key frozen> PRIMARY KEY,
> wert text
> ) WITH bloom_filter_fp_chance = 0.01
> AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
> AND comment = ''
> AND compaction = {'class': 
> 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 
> 'max_threshold': '32', 'min_threshold': '4'}
> AND compression = {'chunk_length_in_kb': '64', 'class': 
> 'org.apache.cassandra.io.compress.LZ4Compressor'}
> AND crc_check_chance = 1.0
> AND dclocal_read_repair_chance = 0.1
> AND default_time_to_live = 0
> AND gc_grace_seconds = 864000
> AND max_index_interval = 2048
> AND

[jira] [Commented] (CASSANDRA-13475) First version of pluggable storage engine API.


[ 
https://issues.apache.org/jira/browse/CASSANDRA-13475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16236746#comment-16236746
 ] 

Blake Eggleston commented on CASSANDRA-13475:
-

Let's keep discussion on this jira for the time being. Also, we're just talking 
about a plan at this point. What do you think of the plan as proposed? Any 
concerns? Things you think should be added, removed, or reordered?

> First version of pluggable storage engine API.
> --
>
> Key: CASSANDRA-13475
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13475
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Dikang Gu
>Assignee: Dikang Gu
>Priority: Major
>
> In order to support pluggable storage engine, we need to define a unified 
> interface/API, which can allow us to plug in different storage engines for 
> different requirements. 
> In very high level, the storage engine interface should include APIs to:
> 1. Apply update into the engine.
> 2. Query data from the engine.
> 3. Stream data in/out to/from the engine.
> 4. Table operations, like create/drop/truncate a table, etc.
> 5. Various stats about the engine.
> I create this ticket to start the discussions about the interface.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org

[jira] [Commented] (CASSANDRA-13475) First version of pluggable storage engine API.


[ 
https://issues.apache.org/jira/browse/CASSANDRA-13475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16236734#comment-16236734
 ] 

Jason Brown commented on CASSANDRA-13475:
-

[~dikanggu] please send out a message to the dev@ ML with the link to your quip 
doc, that way folks who aren't following this ticket (right now) can know where 
the action is taking place.

> First version of pluggable storage engine API.
> --
>
> Key: CASSANDRA-13475
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13475
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Dikang Gu
>Assignee: Dikang Gu
>Priority: Major
>
> In order to support pluggable storage engine, we need to define a unified 
> interface/API, which can allow us to plug in different storage engines for 
> different requirements. 
> In very high level, the storage engine interface should include APIs to:
> 1. Apply update into the engine.
> 2. Query data from the engine.
> 3. Stream data in/out to/from the engine.
> 4. Table operations, like create/drop/truncate a table, etc.
> 5. Various stats about the engine.
> I create this ticket to start the discussions about the interface.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org

[jira] [Commented] (CASSANDRA-13475) First version of pluggable storage engine API.

2017-11-02 Thread Dikang Gu (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-13475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16236710#comment-16236710
 ] 

Dikang Gu commented on CASSANDRA-13475:
---

[~bdeggleston], yeah, they are very good points. To have a central place for 
the discussion, I will try to answer your questions, and add more details to 
the quip: https://quip.com/bhw5ABUCi3co. Everyone should have access to the 
quip, and please feel free to edit/comment on it.

> First version of pluggable storage engine API.
> --
>
> Key: CASSANDRA-13475
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13475
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Dikang Gu
>Assignee: Dikang Gu
>Priority: Major
>
> In order to support pluggable storage engine, we need to define a unified 
> interface/API, which can allow us to plug in different storage engines for 
> different requirements. 
> In very high level, the storage engine interface should include APIs to:
> 1. Apply update into the engine.
> 2. Query data from the engine.
> 3. Stream data in/out to/from the engine.
> 4. Table operations, like create/drop/truncate a table, etc.
> 5. Various stats about the engine.
> I create this ticket to start the discussions about the interface.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org

[jira] [Commented] (CASSANDRA-13475) First version of pluggable storage engine API.


[ 
https://issues.apache.org/jira/browse/CASSANDRA-13475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16236648#comment-16236648
 ] 

Blake Eggleston commented on CASSANDRA-13475:
-

I think it’s too early to start looking at code, or talking about api 
specifics. We should start by getting a rough plan together. My thoughts on an 
initial plan are below. This is just a rough idea dump, so let me know if I’ve 
missed anything.

# Discuss expectations, guidelines, non-technical stuff, etc.
** Let’s start off by making sure we’re all on the same page about:
*** What we expect the end result to be
*** Guidelines on planning / implementing component refactors
*** Any approximate timelines you have in mind, if any
*** Pluggable storage's place in the cassandra project
# Agree on the boundaries of the storage engine layer. What it is and isn’t 
responsible for.
** This has already been discussed to some degree, but let’s agree on a 
definition.
# Work out a strategy for streaming and repair
** This is a bit hand wavy at the moment, and not having a solid streaming and 
repair story is a non starter. So let’s figure out how that’s going to work 
(including incremental repair) before we get too deep into anything els
# Decide how schema ui / metadata will be refactored to support multiple 
storage engines
# Work out a strategy for exposing metrics / monitoring from different engines.
# Migrate read command and write logic into cfs
# Identify remaining leaky parts of CFS class.
** Some of this will be legit storage implementation details. Other parts will 
be systems we’ve missed, or things that need to be abstracted.
# Identify systems not controlled by CFS that interacts with storage layer on 
it’s own
# Implement streaming / repair changes
# Refactor each leaky group of cfs components
# Refactor each non-cfs system that interacts with storage layer.
# Refactor metrics/monitoring systems
# Refactor schema ui, metadata implementation
# Extract interfaces from CFS and keyspace
# Introduce pluggable Keyspace/CFS factories controlled by schema

Thoughts?

> First version of pluggable storage engine API.
> --
>
> Key: CASSANDRA-13475
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13475
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Dikang Gu
>Assignee: Dikang Gu
>Priority: Major
>
> In order to support pluggable storage engine, we need to define a unified 
> interface/API, which can allow us to plug in different storage engines for 
> different requirements. 
> In very high level, the storage engine interface should include APIs to:
> 1. Apply update into the engine.
> 2. Query data from the engine.
> 3. Stream data in/out to/from the engine.
> 4. Table operations, like create/drop/truncate a table, etc.
> 5. Various stats about the engine.
> I create this ticket to start the discussions about the interface.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org

[jira] [Commented] (CASSANDRA-13948) Reload compaction strategies when JBOD disk boundary changes

2017-11-02 Thread Loic Lambiel (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-13948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16236514#comment-16236514
 ] 

Loic Lambiel commented on CASSANDRA-13948:
--

Did additional testing and wasn't able to reproduce :-/ 

I'll try the patch on more representative nodes in the coming days and report 
back any issue.

> Reload compaction strategies when JBOD disk boundary changes
> 
>
> Key: CASSANDRA-13948
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13948
> Project: Cassandra
>  Issue Type: Bug
>  Components: Compaction
>Reporter: Paulo Motta
>Assignee: Paulo Motta
>Priority: Major
> Fix For: 3.11.x, 4.x
>
> Attachments: debug.log
>
>
> The thread dump below shows a race between an sstable replacement by the 
> {{IndexSummaryRedistribution}} and 
> {{AbstractCompactionTask.getNextBackgroundTask}}:
> {noformat}
> Thread 94580: (state = BLOCKED)
>  - sun.misc.Unsafe.park(boolean, long) @bci=0 (Compiled frame; information 
> may be imprecise)
>  - java.util.concurrent.locks.LockSupport.park(java.lang.Object) @bci=14, 
> line=175 (Compiled frame)
>  - 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt() 
> @bci=1, line=836 (Compiled frame)
>  - 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(java.util.concurrent.locks.AbstractQueuedSynchronizer$Node,
>  int) @bci=67, line=870 (Compiled frame)
>  - java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(int) 
> @bci=17, line=1199 (Compiled frame)
>  - java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock() @bci=5, 
> line=943 (Compiled frame)
>  - 
> org.apache.cassandra.db.compaction.CompactionStrategyManager.handleListChangedNotification(java.lang.Iterable,
>  java.lang.Iterable) @bci=359, line=483 (Interpreted frame)
>  - 
> org.apache.cassandra.db.compaction.CompactionStrategyManager.handleNotification(org.apache.cassandra.notifications.INotification,
>  java.lang.Object) @bci=53, line=555 (Interpreted frame)
>  - 
> org.apache.cassandra.db.lifecycle.Tracker.notifySSTablesChanged(java.util.Collection,
>  java.util.Collection, org.apache.cassandra.db.compaction.OperationType, 
> java.lang.Throwable) @bci=50, line=409 (Interpreted frame)
>  - 
> org.apache.cassandra.db.lifecycle.LifecycleTransaction.doCommit(java.lang.Throwable)
>  @bci=157, line=227 (Interpreted frame)
>  - 
> org.apache.cassandra.utils.concurrent.Transactional$AbstractTransactional.commit(java.lang.Throwable)
>  @bci=61, line=116 (Compiled frame)
>  - 
> org.apache.cassandra.utils.concurrent.Transactional$AbstractTransactional.commit()
>  @bci=2, line=200 (Interpreted frame)
>  - 
> org.apache.cassandra.utils.concurrent.Transactional$AbstractTransactional.finish()
>  @bci=5, line=185 (Interpreted frame)
>  - 
> org.apache.cassandra.io.sstable.IndexSummaryRedistribution.redistributeSummaries()
>  @bci=559, line=130 (Interpreted frame)
>  - 
> org.apache.cassandra.db.compaction.CompactionManager.runIndexSummaryRedistribution(org.apache.cassandra.io.sstable.IndexSummaryRedistribution)
>  @bci=9, line=1420 (Interpreted frame)
>  - 
> org.apache.cassandra.io.sstable.IndexSummaryManager.redistributeSummaries(org.apache.cassandra.io.sstable.IndexSummaryRedistribution)
>  @bci=4, line=250 (Interpreted frame)
>  - 
> org.apache.cassandra.io.sstable.IndexSummaryManager.redistributeSummaries() 
> @bci=30, line=228 (Interpreted frame)
>  - org.apache.cassandra.io.sstable.IndexSummaryManager$1.runMayThrow() 
> @bci=4, line=125 (Interpreted frame)
>  - org.apache.cassandra.utils.WrappedRunnable.run() @bci=1, line=28 
> (Interpreted frame)
>  - 
> org.apache.cassandra.concurrent.DebuggableScheduledThreadPoolExecutor$UncomplainingRunnable.run()
>  @bci=4, line=118 (Compiled frame)
>  - java.util.concurrent.Executors$RunnableAdapter.call() @bci=4, line=511 
> (Compiled frame)
>  - java.util.concurrent.FutureTask.runAndReset() @bci=47, line=308 (Compiled 
> frame)
>  - 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask)
>  @bci=1, line=180 (Compiled frame)
>  - java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run() 
> @bci=37, line=294 (Compiled frame)
>  - 
> java.util.concurrent.ThreadPoolExecutor.runWorker(java.util.concurrent.ThreadPoolExecutor$Worker)
>  @bci=95, line=1149 (Compiled frame)
>  - java.util.concurrent.ThreadPoolExecutor$Worker.run() @bci=5, line=624 
> (Interpreted frame)
>  - 
> org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(java.lang.Runnable)
>  @bci=1, line=81 (Interpreted frame)
>  - org.apache.cassandra.concurrent.NamedThreadFactory$$Lambda$8.run() @bci=4 
> (Interpreted frame)
>  -

[jira] [Commented] (CASSANDRA-13988) Add a timeout field to EXECUTE / QUERY / BATCH messages


[ 
https://issues.apache.org/jira/browse/CASSANDRA-13988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16236307#comment-16236307
 ] 

Aleksey Yeschenko commented on CASSANDRA-13988:
---

Pretty sure this is a duplicate of CASSANDRA-2848.

> Add a timeout field to EXECUTE / QUERY / BATCH messages
> ---
>
> Key: CASSANDRA-13988
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13988
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Michaël Figuière
>Priority: Minor
>
> The request timeout at the coordinator level is currently statically 
> configured through the {{request_timeout_in_ms}} and 
> {{xxx_request_timeout_in_ms}} parameters in cassandra.yaml. There would be 
> some benefits in making it possible for the client to dynamically define it 
> through the CQL Protocol:
> * In practice, there's often a misalignment between the timeout configured in 
> Cassandra and in the client leading non-optimal query execution flow, where 
> the coordinator continues to work while the client is not waiting anymore, or 
> where the client waits for too long for a potential response. The 99th 
> percentile latency can be significantly impacted by such issues. 
> * While the read timeout is typically statically configured on the Drivers, 
> on the Java Driver 3.x the developer is free to set a custom timeout using 
> {{ResultSetFuture#get(long, TimeUnit)}} which can lead to an extra 
> misalignment of timeouts with the coordinator. The Java Driver 4.x will make 
> the timeout configurable per query through its new {{DriverConfigProfile}} 
> abstraction.
> * It makes it possible for applications to shift to a "remaining time budget" 
> approach rather than the often inappropriate static timeout one. Also, the 
> Java Driver 4.x plans to change its definition of {{readTimeout}} from a per 
> execution attempt time to an overall query execution time. So the Driver 
> itself would also be able to work on a "remaining time budget" for each of 
> its execution attempts.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org

[jira] [Updated] (CASSANDRA-12182) redundant StatusLogger print out when both dropped message and long GC event happen


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-12182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michał Szczygieł updated CASSANDRA-12182:
-
Status: In Progress  (was: Patch Available)

> redundant StatusLogger print out when both dropped message and long GC event 
> happen
> ---
>
> Key: CASSANDRA-12182
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12182
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Wei Deng
>Assignee: Michał Szczygieł
>Priority: Minor
>  Labels: lhf
> Attachments: 12182-trunk.txt
>
>
> I was stress testing a C* 3.0 environment and it appears that when the CPU is 
> running low, HINT and MUTATION messages will start to get dropped, and the GC 
> thread can also get some really long-running GC, and I'd get some redundant 
> log entries in system.log like the following:
> {noformat}
> WARN  [Service Thread] 2016-07-12 22:48:45,748  GCInspector.java:282 - G1 
> Young Generation GC in 522ms.  G1 Eden Space: 68157440 -> 0; G1 Old Gen: 
> 3376113224 -> 3468387912; G1 Survivor Space: 24117248 -> 0; 
> INFO  [Service Thread] 2016-07-12 22:48:45,763  StatusLogger.java:52 - Pool 
> NameActive   Pending  Completed   Blocked  All Time 
> Blocked
> INFO  [ScheduledTasks:1] 2016-07-12 22:48:45,775  MessagingService.java:983 - 
> MUTATION messages were dropped in last 5000 ms: 419 for internal timeout and 
> 0 for cross node timeout
> INFO  [ScheduledTasks:1] 2016-07-12 22:48:45,776  MessagingService.java:983 - 
> HINT messages were dropped in last 5000 ms: 89 for internal timeout and 0 for 
> cross node timeout
> INFO  [ScheduledTasks:1] 2016-07-12 22:48:45,776  StatusLogger.java:52 - Pool 
> NameActive   Pending  Completed   Blocked  All Time 
> Blocked
> INFO  [ScheduledTasks:1] 2016-07-12 22:48:45,798  StatusLogger.java:56 - 
> MutationStage32  4194   32997234 0
>  0
> INFO  [ScheduledTasks:1] 2016-07-12 22:48:45,798  StatusLogger.java:56 - 
> ViewMutationStage 0 0  0 0
>  0
> INFO  [ScheduledTasks:1] 2016-07-12 22:48:45,799  StatusLogger.java:56 - 
> ReadStage 0 0940 0
>  0
> INFO  [Service Thread] 2016-07-12 22:48:45,800  StatusLogger.java:56 - 
> MutationStage32  4363   32997333 0
>  0
> INFO  [Service Thread] 2016-07-12 22:48:45,801  StatusLogger.java:56 - 
> ViewMutationStage 0 0  0 0
>  0
> INFO  [Service Thread] 2016-07-12 22:48:45,801  StatusLogger.java:56 - 
> ReadStage 0 0940 0
>  0
> INFO  [Service Thread] 2016-07-12 22:48:45,802  StatusLogger.java:56 - 
> RequestResponseStage  0 0   11094437 0
>  0
> INFO  [Service Thread] 2016-07-12 22:48:45,802  StatusLogger.java:56 - 
> ReadRepairStage   0 0  5 0
>  0
> INFO  [ScheduledTasks:1] 2016-07-12 22:48:45,803  StatusLogger.java:56 - 
> RequestResponseStage  4 0   11094509 0
>  0
> INFO  [ScheduledTasks:1] 2016-07-12 22:48:45,807  StatusLogger.java:56 - 
> ReadRepairStage   0 0  5 0
>  0
> INFO  [ScheduledTasks:1] 2016-07-12 22:48:45,808  StatusLogger.java:56 - 
> CounterMutationStage  0 0  0 0
>  0
> INFO  [ScheduledTasks:1] 2016-07-12 22:48:45,809  StatusLogger.java:56 - 
> MiscStage 0 0  0 0
>  0
> INFO  [ScheduledTasks:1] 2016-07-12 22:48:45,809  StatusLogger.java:56 - 
> CompactionExecutor262   1234 0
>  0
> INFO  [ScheduledTasks:1] 2016-07-12 22:48:45,810  StatusLogger.java:56 - 
> MemtableReclaimMemory 0 0 79 0
>  0
> INFO  [ScheduledTasks:1] 2016-07-12 22:48:45,810  StatusLogger.java:56 - 
> PendingRangeCalculator0 0  3 0
>  0
> INFO  [ScheduledTasks:1] 2016-07-12 22:48:45,819  StatusLogger.java:56 - 
> GossipStage   0 0   5214 0
>  0
> INFO  [ScheduledTasks:1] 2016-07-12 22:48:45,820  StatusLogger.java:56 - 
> SecondaryIndexManagement  0 0  3 0
>  0
> INFO  [ScheduledTasks:1] 2016-07-12 22:48:45,820  StatusLogger.java:56 - 
> HintsDispatcher

[jira] [Commented] (CASSANDRA-13988) Add a timeout field to EXECUTE / QUERY / BATCH messages


[ 
https://issues.apache.org/jira/browse/CASSANDRA-13988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16236282#comment-16236282
 ] 

Michaël Figuière commented on CASSANDRA-13988:
--

Looking into it, it seems like the {{ReadCommand#getTimeout()}} abstract method 
offers a convenient opportunity to implement this feature.

> Add a timeout field to EXECUTE / QUERY / BATCH messages
> ---
>
> Key: CASSANDRA-13988
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13988
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Michaël Figuière
>Priority: Minor
>
> The request timeout at the coordinator level is currently statically 
> configured through the {{request_timeout_in_ms}} and 
> {{xxx_request_timeout_in_ms}} parameters in cassandra.yaml. There would be 
> some benefits in making it possible for the client to dynamically define it 
> through the CQL Protocol:
> * In practice, there's often a misalignment between the timeout configured in 
> Cassandra and in the client leading non-optimal query execution flow, where 
> the coordinator continues to work while the client is not waiting anymore, or 
> where the client waits for too long for a potential response. The 99th 
> percentile latency can be significantly impacted by such issues. 
> * While the read timeout is typically statically configured on the Drivers, 
> on the Java Driver 3.x the developer is free to set a custom timeout using 
> {{ResultSetFuture#get(long, TimeUnit)}} which can lead to an extra 
> misalignment of timeouts with the coordinator. The Java Driver 4.x will make 
> the timeout configurable per query through its new {{DriverConfigProfile}} 
> abstraction.
> * It makes it possible for applications to shift to a "remaining time budget" 
> approach rather than the often inappropriate static timeout one. Also, the 
> Java Driver 4.x plans to change its definition of {{readTimeout}} from a per 
> execution attempt time to an overall query execution time. So the Driver 
> itself would also be able to work on a "remaining time budget" for each of 
> its execution attempts.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org

[jira] [Created] (CASSANDRA-13988) Add a timeout field to EXECUTE / QUERY / BATCH messages

Michaël Figuière created CASSANDRA-13988:


 Summary: Add a timeout field to EXECUTE / QUERY / BATCH messages
 Key: CASSANDRA-13988
 URL: https://issues.apache.org/jira/browse/CASSANDRA-13988
 Project: Cassandra
  Issue Type: Improvement
Reporter: Michaël Figuière
Priority: Minor


The request timeout at the coordinator level is currently statically configured 
through the {{request_timeout_in_ms}} and {{xxx_request_timeout_in_ms}} 
parameters in cassandra.yaml. There would be some benefits in making it 
possible for the client to dynamically define it through the CQL Protocol:

* In practice, there's often a misalignment between the timeout configured in 
Cassandra and in the client leading non-optimal query execution flow, where the 
coordinator continues to work while the client is not waiting anymore, or where 
the client waits for too long for a potential response. The 99th percentile 
latency can be significantly impacted by such issues. 
* While the read timeout is typically statically configured on the Drivers, on 
the Java Driver 3.x the developer is free to set a custom timeout using 
{{ResultSetFuture#get(long, TimeUnit)}} which can lead to an extra misalignment 
of timeouts with the coordinator. The Java Driver 4.x will make the timeout 
configurable per query through its new {{DriverConfigProfile}} abstraction.
* It makes it possible for applications to shift to a "remaining time budget" 
approach rather than the often inappropriate static timeout one. Also, the Java 
Driver 4.x plans to change its definition of {{readTimeout}} from a per 
execution attempt time to an overall query execution time. So the Driver itself 
would also be able to work on a "remaining time budget" for each of its 
execution attempts.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org

[2/3] cassandra git commit: ninja-fix comment to correct the default RING_DEALY value

ninja-fix comment to correct the default RING_DEALY value


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/c8a3b58b
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/c8a3b58b
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/c8a3b58b

Branch: refs/heads/trunk
Commit: c8a3b58bdbf12909ac0a823308e8a278cd02001b
Parents: ea443df
Author: Jason Brown 
Authored: Thu Nov 2 10:46:24 2017 -0700
Committer: Jason Brown 
Committed: Thu Nov 2 10:46:24 2017 -0700

--
 conf/jvm.options | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/c8a3b58b/conf/jvm.options
--
diff --git a/conf/jvm.options b/conf/jvm.options
index f91466a..bfe2da9 100644
--- a/conf/jvm.options
+++ b/conf/jvm.options
@@ -49,7 +49,7 @@
 # Allow restoring specific tables from an archived commit log.
 #-Dcassandra.replayList=table
 
-# Allows overriding of the default RING_DELAY (1000ms), which is the amount of 
time a node waits
+# Allows overriding of the default RING_DELAY (3ms), which is the amount 
of time a node waits
 # before joining the ring.
 #-Dcassandra.ring_delay_ms=ms
 


-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org

[1/3] cassandra git commit: ninja-fix comment to correct the default RING_DEALY value

Repository: cassandra
Updated Branches:
  refs/heads/cassandra-3.11 ea443dfe3 -> c8a3b58bd
  refs/heads/trunk 684e250ba -> 87962dcf3


ninja-fix comment to correct the default RING_DEALY value


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/c8a3b58b
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/c8a3b58b
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/c8a3b58b

Branch: refs/heads/cassandra-3.11
Commit: c8a3b58bdbf12909ac0a823308e8a278cd02001b
Parents: ea443df
Author: Jason Brown 
Authored: Thu Nov 2 10:46:24 2017 -0700
Committer: Jason Brown 
Committed: Thu Nov 2 10:46:24 2017 -0700

--
 conf/jvm.options | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/c8a3b58b/conf/jvm.options
--
diff --git a/conf/jvm.options b/conf/jvm.options
index f91466a..bfe2da9 100644
--- a/conf/jvm.options
+++ b/conf/jvm.options
@@ -49,7 +49,7 @@
 # Allow restoring specific tables from an archived commit log.
 #-Dcassandra.replayList=table
 
-# Allows overriding of the default RING_DELAY (1000ms), which is the amount of 
time a node waits
+# Allows overriding of the default RING_DELAY (3ms), which is the amount 
of time a node waits
 # before joining the ring.
 #-Dcassandra.ring_delay_ms=ms
 


-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org

[3/3] cassandra git commit: Merge branch 'cassandra-3.11' into trunk

Merge branch 'cassandra-3.11' into trunk


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/87962dcf
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/87962dcf
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/87962dcf

Branch: refs/heads/trunk
Commit: 87962dcf364944f656b5212b8418432fbd1c4b95
Parents: 684e250 c8a3b58
Author: Jason Brown 
Authored: Thu Nov 2 10:46:42 2017 -0700
Committer: Jason Brown 
Committed: Thu Nov 2 10:47:17 2017 -0700

--
 conf/jvm.options | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/87962dcf/conf/jvm.options
--


-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org

[jira] [Commented] (CASSANDRA-13973) IllegalArgumentException in upgradesstables compaction

2017-11-02 Thread Dan Kinder (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-13973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16236142#comment-16236142
 ] 

Dan Kinder commented on CASSANDRA-13973:


Thanks [~jjirsa] I'll give it a shot.

> IllegalArgumentException in upgradesstables compaction
> --
>
> Key: CASSANDRA-13973
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13973
> Project: Cassandra
>  Issue Type: Bug
>  Components: Compaction
>Reporter: Dan Kinder
>Assignee: Jeff Jirsa
>Priority: Major
> Fix For: 3.0.x, 3.11.x, 4.x
>
>
> After an upgrade from 2.2.6 to 3.0.15 (sstable version la to mc), when I try 
> to run upgradesstables, most of them upgrade fine but I see the exception 
> below on several nodes, and it doesn't complete.
> CASSANDRA-12717 looks similar but the stack trace is not the same, so I 
> assumed it is not identical. The various nodes this happens on all give the 
> same trace.
> Might be notable that this is an analytics cluster with some large 
> partitions, in the GB size.
> {noformat}
> error: Out of range: 7316844981
> -- StackTrace --
> java.lang.IllegalArgumentException: Out of range: 7316844981
> at com.google.common.primitives.Ints.checkedCast(Ints.java:91)
> at 
> org.apache.cassandra.db.RowIndexEntry$IndexedEntry.promotedSize(RowIndexEntry.java:329)
> at 
> org.apache.cassandra.db.RowIndexEntry$Serializer.serialize(RowIndexEntry.java:133)
> at 
> org.apache.cassandra.io.sstable.format.big.BigTableWriter$IndexWriter.append(BigTableWriter.java:409)
> at 
> org.apache.cassandra.io.sstable.format.big.BigTableWriter.afterAppend(BigTableWriter.java:120)
> at 
> org.apache.cassandra.io.sstable.format.big.BigTableWriter.append(BigTableWriter.java:157)
> at 
> org.apache.cassandra.io.sstable.SSTableRewriter.append(SSTableRewriter.java:125)
> at 
> org.apache.cassandra.db.compaction.writers.MaxSSTableSizeWriter.realAppend(MaxSSTableSizeWriter.java:88)
> at 
> org.apache.cassandra.db.compaction.writers.CompactionAwareWriter.append(CompactionAwareWriter.java:109)
> at 
> org.apache.cassandra.db.compaction.CompactionTask.runMayThrow(CompactionTask.java:195)
> at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
> at 
> org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:89)
> at 
> org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:61)
> at 
> org.apache.cassandra.db.compaction.CompactionManager$5.execute(CompactionManager.java:424)
> at 
> org.apache.cassandra.db.compaction.CompactionManager$2.call(CompactionManager.java:311)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at 
> org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:79)
> at java.lang.Thread.run(Thread.java:748)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org

[jira] [Comment Edited] (CASSANDRA-13973) IllegalArgumentException in upgradesstables compaction

2017-11-02 Thread Jeff Jirsa (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-13973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16236109#comment-16236109
 ] 

Jeff Jirsa edited comment on CASSANDRA-13973 at 11/2/17 4:52 PM:
-

Thanks for the feedback [~slebresne]. I really appreciate you taking the time 
to respond. The risky feeling here is why I'm moving slow on this myself - it 
seems straightforward, but obviously the potential for unexpected surprises 
here is pretty high. 

Better messaging on that assert makes a lot of sense. Also good to see the 
second confirmation that changing {{column_index_size_in_kb}} is a good 
workaround, the drawback is that it's instance-wide, so if you have just a 
handful of wide rows (like this user, the histogram shows their 99% size is 
less than 1MB, but their max size is 394GB), you suffer a disk penalty on all 
keyspaces/tables/rows in order to not crash on the one bad row. 

[~dankinder] if you need to unblock yourself right now, changing 
{{column_index_size_in_kb}} on your instance to 256 (7G of index data needs to 
go under 2G in size, so multiplying factor of 4) PROBABLY works past this 
issue, but expect a bit more disk IO (particularly reads) after the change 
(+upgradesstables)






was (Author: jjirsa):
Thanks for the feedback [~slebresne]. I really appreciate you taking the time 
to respond. The risky feeling here is why I'm moving slow on this myself - it 
seems straightforward, but obviously the potential for unexpected surprises 
here is pretty high. 

Better messaging on that assert makes a lot of sense. Also good to see the 
second confirmation that changing {{column_index_size_in_kb}} is a good 
workaround, the drawback is that it's instance-wide, so if you have just a 
handful of wide rows (like this user, the histogram shows their 99% size is 
less than 1MB, but their max size is 394GB), you suffer a disk penalty on all 
keyspaces/tables/rows in order to not crash on the one bad row. 

[~dankinder] if you need to unblock yourself right now, changing 
{{column_index_size_in_kb}} on your instance to 256k (7G of index data needs to 
go under 2G in size, so multiplying factor of 4) PROBABLY works past this 
issue, but expect a bit more disk IO (particularly reads) after the change 
(+upgradesstables)





> IllegalArgumentException in upgradesstables compaction
> --
>
> Key: CASSANDRA-13973
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13973
> Project: Cassandra
>  Issue Type: Bug
>  Components: Compaction
>Reporter: Dan Kinder
>Assignee: Jeff Jirsa
>Priority: Major
> Fix For: 3.0.x, 3.11.x, 4.x
>
>
> After an upgrade from 2.2.6 to 3.0.15 (sstable version la to mc), when I try 
> to run upgradesstables, most of them upgrade fine but I see the exception 
> below on several nodes, and it doesn't complete.
> CASSANDRA-12717 looks similar but the stack trace is not the same, so I 
> assumed it is not identical. The various nodes this happens on all give the 
> same trace.
> Might be notable that this is an analytics cluster with some large 
> partitions, in the GB size.
> {noformat}
> error: Out of range: 7316844981
> -- StackTrace --
> java.lang.IllegalArgumentException: Out of range: 7316844981
> at com.google.common.primitives.Ints.checkedCast(Ints.java:91)
> at 
> org.apache.cassandra.db.RowIndexEntry$IndexedEntry.promotedSize(RowIndexEntry.java:329)
> at 
> org.apache.cassandra.db.RowIndexEntry$Serializer.serialize(RowIndexEntry.java:133)
> at 
> org.apache.cassandra.io.sstable.format.big.BigTableWriter$IndexWriter.append(BigTableWriter.java:409)
> at 
> org.apache.cassandra.io.sstable.format.big.BigTableWriter.afterAppend(BigTableWriter.java:120)
> at 
> org.apache.cassandra.io.sstable.format.big.BigTableWriter.append(BigTableWriter.java:157)
> at 
> org.apache.cassandra.io.sstable.SSTableRewriter.append(SSTableRewriter.java:125)
> at 
> org.apache.cassandra.db.compaction.writers.MaxSSTableSizeWriter.realAppend(MaxSSTableSizeWriter.java:88)
> at 
> org.apache.cassandra.db.compaction.writers.CompactionAwareWriter.append(CompactionAwareWriter.java:109)
> at 
> org.apache.cassandra.db.compaction.CompactionTask.runMayThrow(CompactionTask.java:195)
> at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
> at 
> org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:89)
> at 
> org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:61)
> at 
> org.apache.cassandra.db.compaction.CompactionManager$5.execute(CompactionManager.java:424)
> at 
> org.apache.cassandra.db.compaction.CompactionManager$2.call(CompactionManager.java:311)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
>

[jira] [Commented] (CASSANDRA-13973) IllegalArgumentException in upgradesstables compaction

2017-11-02 Thread Jeff Jirsa (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-13973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16236109#comment-16236109
 ] 

Jeff Jirsa commented on CASSANDRA-13973:


Thanks for the feedback [~slebresne]. I really appreciate you taking the time 
to respond. The risky feeling here is why I'm moving slow on this myself - it 
seems straightforward, but obviously the potential for unexpected surprises 
here is pretty high. 

Better messaging on that assert makes a lot of sense. Also good to see the 
second confirmation that changing {{column_index_size_in_kb}} is a good 
workaround, the drawback is that it's instance-wide, so if you have just a 
handful of wide rows (like this user, the histogram shows their 99% size is 
less than 1MB, but their max size is 394GB), you suffer a disk penalty on all 
keyspaces/tables/rows in order to not crash on the one bad row. 

[~dankinder] if you need to unblock yourself right now, changing 
{{column_index_size_in_kb}} on your instance to 256k (7G of index data needs to 
go under 2G in size, so multiplying factor of 4) PROBABLY works past this 
issue, but expect a bit more disk IO (particularly reads) after the change 
(+upgradesstables)





> IllegalArgumentException in upgradesstables compaction
> --
>
> Key: CASSANDRA-13973
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13973
> Project: Cassandra
>  Issue Type: Bug
>  Components: Compaction
>Reporter: Dan Kinder
>Assignee: Jeff Jirsa
>Priority: Major
> Fix For: 3.0.x, 3.11.x, 4.x
>
>
> After an upgrade from 2.2.6 to 3.0.15 (sstable version la to mc), when I try 
> to run upgradesstables, most of them upgrade fine but I see the exception 
> below on several nodes, and it doesn't complete.
> CASSANDRA-12717 looks similar but the stack trace is not the same, so I 
> assumed it is not identical. The various nodes this happens on all give the 
> same trace.
> Might be notable that this is an analytics cluster with some large 
> partitions, in the GB size.
> {noformat}
> error: Out of range: 7316844981
> -- StackTrace --
> java.lang.IllegalArgumentException: Out of range: 7316844981
> at com.google.common.primitives.Ints.checkedCast(Ints.java:91)
> at 
> org.apache.cassandra.db.RowIndexEntry$IndexedEntry.promotedSize(RowIndexEntry.java:329)
> at 
> org.apache.cassandra.db.RowIndexEntry$Serializer.serialize(RowIndexEntry.java:133)
> at 
> org.apache.cassandra.io.sstable.format.big.BigTableWriter$IndexWriter.append(BigTableWriter.java:409)
> at 
> org.apache.cassandra.io.sstable.format.big.BigTableWriter.afterAppend(BigTableWriter.java:120)
> at 
> org.apache.cassandra.io.sstable.format.big.BigTableWriter.append(BigTableWriter.java:157)
> at 
> org.apache.cassandra.io.sstable.SSTableRewriter.append(SSTableRewriter.java:125)
> at 
> org.apache.cassandra.db.compaction.writers.MaxSSTableSizeWriter.realAppend(MaxSSTableSizeWriter.java:88)
> at 
> org.apache.cassandra.db.compaction.writers.CompactionAwareWriter.append(CompactionAwareWriter.java:109)
> at 
> org.apache.cassandra.db.compaction.CompactionTask.runMayThrow(CompactionTask.java:195)
> at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
> at 
> org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:89)
> at 
> org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:61)
> at 
> org.apache.cassandra.db.compaction.CompactionManager$5.execute(CompactionManager.java:424)
> at 
> org.apache.cassandra.db.compaction.CompactionManager$2.call(CompactionManager.java:311)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at 
> org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:79)
> at java.lang.Thread.run(Thread.java:748)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org

[jira] [Commented] (CASSANDRA-13849) GossipStage blocks because of race in ActiveRepairService


[ 
https://issues.apache.org/jira/browse/CASSANDRA-13849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16236068#comment-16236068
 ] 

Blake Eggleston commented on CASSANDRA-13849:
-

Patch looks good. I've merged it up through trunk and started tests here:

|[3.0|https://github.com/bdeggleston/cassandra/tree/13849-3.0] | 
[utests|https://circleci.com/gh/bdeggleston/cassandra/152] | 
[dtests|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/408/]|
|[3.11|https://github.com/bdeggleston/cassandra/tree/13849-3.11] | 
[utests|https://circleci.com/gh/bdeggleston/cassandra/153] | 
[dtests|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/409/]|
|[trunk|https://github.com/bdeggleston/cassandra/tree/13849-trunk] | 
[utests|https://circleci.com/gh/bdeggleston/cassandra/154] | 
[dtests|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/410/]
 |

I'll commit once the tests are complete, assuming there aren't any problems.

> GossipStage blocks because of race in ActiveRepairService
> -
>
> Key: CASSANDRA-13849
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13849
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Tom van der Woerdt
>Assignee: Sergey Lapukhov
>Priority: Major
>  Labels: patch
> Fix For: 3.0.x, 3.11.x
>
> Attachments: CAS-13849.patch, CAS-13849_2.patch, CAS-13849_3.patch
>
>
> Bad luck caused a kernel panic in a cluster, and that took another node with 
> it because GossipStage stopped responding.
> I think it's pretty obvious what's happening, here are the relevant excerpts 
> from the stack traces :
> {noformat}
> "Thread-24004" #393781 daemon prio=5 os_prio=0 tid=0x7efca9647400 
> nid=0xe75c waiting on condition [0x7efaa47fe000]
>java.lang.Thread.State: TIMED_WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for  <0x00052b63a7e8> (a 
> java.util.concurrent.CountDownLatch$Sync)
> at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
> at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277)
> at 
> org.apache.cassandra.service.ActiveRepairService.prepareForRepair(ActiveRepairService.java:332)
> - locked <0x0002e6bc99f0> (a 
> org.apache.cassandra.service.ActiveRepairService)
> at 
> org.apache.cassandra.repair.RepairRunnable.runMayThrow(RepairRunnable.java:211)
> at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)   
>   
>   at 
> java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:79)
> at 
> org.apache.cassandra.concurrent.NamedThreadFactory$$Lambda$3/1498438472.run(Unknown
>  Source)
> at java.lang.Thread.run(Thread.java:748)
> "GossipTasks:1" #367 daemon prio=5 os_prio=0 tid=0x7efc5e971000 
> nid=0x700b waiting for monitor entry [0x7dfb839fe000]
>java.lang.Thread.State: BLOCKED (on object monitor)
> at 
> org.apache.cassandra.service.ActiveRepairService.removeParentRepairSession(ActiveRepairService.java:421)
> - waiting to lock <0x0002e6bc99f0> (a 
> org.apache.cassandra.service.ActiveRepairService)
> at 
> org.apache.cassandra.service.ActiveRepairService.convict(ActiveRepairService.java:776)
> at 
> org.apache.cassandra.gms.FailureDetector.interpret(FailureDetector.java:306)
> at org.apache.cassandra.gms.Gossiper.doStatusCheck(Gossiper.java:775) 
>   
>  at 
> org.apache.cassandra.gms.Gossiper.access$800(Gossiper.java:67)
> at org.apache.cassandra.gms.Gossiper$GossipTask.run(Gossiper.java:187)
> at 
> org.apache.cassandra.concurrent.DebuggableScheduledThreadPoolExecutor$UncomplainingRunnable.run(DebuggableScheduledThreadPoolExecutor.java:118)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
> at 
>

[jira] [Resolved] (CASSANDRA-13885) Allow to run full repairs in 3.0 without additional cost of anti-compaction


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Blake Eggleston resolved CASSANDRA-13885.
-
Resolution: Won't Fix

> Allow to run full repairs in 3.0 without additional cost of anti-compaction
> ---
>
> Key: CASSANDRA-13885
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13885
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Thomas Steinmaurer
>Priority: Major
>
> This ticket is basically the result of the discussion in Cassandra user list: 
> https://www.mail-archive.com/user@cassandra.apache.org/msg53562.html
> I was asked to open a ticket by Paulo Motta to think about back-porting 
> running full repairs without the additional cost of anti-compaction.
> Basically there is no way in 3.0 to run full repairs from several nodes 
> concurrently without troubles caused by (overlapping?) anti-compactions. 
> Coming from 2.1 this is a major change from an operational POV, basically 
> breaking any e.g. cron job based solution kicking off -pr based repairs on 
> several nodes concurrently.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org

[jira] [Commented] (CASSANDRA-13975) Add a workaround for overly large read repair mutations


[ 
https://issues.apache.org/jira/browse/CASSANDRA-13975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16235982#comment-16235982
 ] 

Aleksey Yeschenko commented on CASSANDRA-13975:
---

A straight-forward change pushed 
[here|https://github.com/iamaleksey/cassandra/commits/13975-3.0]. Unit test run 
[here|https://circleci.com/gh/iamaleksey/cassandra/63], dtest run 
[here|https://builds.apache.org/job/Cassandra-devbranch-dtest/407/].



> Add a workaround for overly large read repair mutations
> ---
>
> Key: CASSANDRA-13975
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13975
> Project: Cassandra
>  Issue Type: Bug
>  Components: Coordination
>Reporter: Aleksey Yeschenko
>Assignee: Aleksey Yeschenko
>Priority: Major
> Fix For: 3.0.x, 3.11.x
>
>
> It's currently possible for {{DataResolver}} to accumulate more changes to 
> read repair that would fit in a single serialized mutation. If that happens, 
> the node receiving the mutation would fail, and the read would time out, and 
> won't be able to proceed until the operator runs repair or manually drops the 
> affected partitions.
> Ideally we should either read repair iteratively, or at least split the 
> resulting mutation into smaller chunks in the end. In the meantime, for 
> 3.0.x, I suggest we add logging to catch this, and a -D flag to allow 
> proceeding with the requests as is when the mutation is too large, without 
> read repair.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org

[jira] [Updated] (CASSANDRA-13975) Add a workaround for overly large read repair mutations


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Yeschenko updated CASSANDRA-13975:
--
Status: Patch Available  (was: In Progress)

> Add a workaround for overly large read repair mutations
> ---
>
> Key: CASSANDRA-13975
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13975
> Project: Cassandra
>  Issue Type: Bug
>  Components: Coordination
>Reporter: Aleksey Yeschenko
>Assignee: Aleksey Yeschenko
>Priority: Major
> Fix For: 3.0.x, 3.11.x
>
>
> It's currently possible for {{DataResolver}} to accumulate more changes to 
> read repair that would fit in a single serialized mutation. If that happens, 
> the node receiving the mutation would fail, and the read would time out, and 
> won't be able to proceed until the operator runs repair or manually drops the 
> affected partitions.
> Ideally we should either read repair iteratively, or at least split the 
> resulting mutation into smaller chunks in the end. In the meantime, for 
> 3.0.x, I suggest we add logging to catch this, and a -D flag to allow 
> proceeding with the requests as is when the mutation is too large, without 
> read repair.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org

[jira] [Updated] (CASSANDRA-13975) Add a workaround for overly large read repair mutations


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Yeschenko updated CASSANDRA-13975:
--
Reviewer: Sam Tunnicliffe

> Add a workaround for overly large read repair mutations
> ---
>
> Key: CASSANDRA-13975
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13975
> Project: Cassandra
>  Issue Type: Bug
>  Components: Coordination
>Reporter: Aleksey Yeschenko
>Assignee: Aleksey Yeschenko
>Priority: Major
> Fix For: 3.0.x, 3.11.x
>
>
> It's currently possible for {{DataResolver}} to accumulate more changes to 
> read repair that would fit in a single serialized mutation. If that happens, 
> the node receiving the mutation would fail, and the read would time out, and 
> won't be able to proceed until the operator runs repair or manually drops the 
> affected partitions.
> Ideally we should either read repair iteratively, or at least split the 
> resulting mutation into smaller chunks in the end. In the meantime, for 
> 3.0.x, I suggest we add logging to catch this, and a -D flag to allow 
> proceeding with the requests as is when the mutation is too large, without 
> read repair.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org

[jira] [Updated] (CASSANDRA-13975) Add a workaround for overly large read repair mutations


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Yeschenko updated CASSANDRA-13975:
--
Summary: Add a workaround for overly large read repair mutations  (was: TBD)

> Add a workaround for overly large read repair mutations
> ---
>
> Key: CASSANDRA-13975
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13975
> Project: Cassandra
>  Issue Type: Bug
>  Components: Coordination
>Reporter: Aleksey Yeschenko
>Assignee: Aleksey Yeschenko
>Priority: Major
> Fix For: 3.0.x, 3.11.x
>
>
> It's currently possible for {{DataResolver}} to accumulate more changes to 
> read repair that would fit in a single serialized mutation. If that happens, 
> the node receiving the mutation would fail, and the read would time out, and 
> won't be able to proceed until the operator runs repair or manually drops the 
> affected partitions.
> Ideally we should either read repair iteratively, or at least split the 
> resulting mutation into smaller chunks in the end. In the meantime, for 
> 3.0.x, I suggest we add logging to catch this, and a -D flag to allow 
> proceeding with the requests as is when the mutation is too large, without 
> read repair.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org

[jira] [Updated] (CASSANDRA-13975) TBD


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Yeschenko updated CASSANDRA-13975:
--
Description: 
It's currently possible for {{DataResolver}} to accumulate more changes to read 
repair that would fit in a single serialized mutation. If that happens, the 
node receiving the mutation would fail, and the read would time out, and won't 
be able to proceed until the operator runs repair or manually drops the 
affected partitions.

Ideally we should either read repair iteratively, or at least split the 
resulting mutation into smaller chunks in the end. In the meantime, for 3.0.x, 
I suggest we add logging to catch this, and a -D flag to allow proceeding with 
the requests as is when the mutation is too large, without read repair.

  was:TBD


> TBD
> ---
>
> Key: CASSANDRA-13975
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13975
> Project: Cassandra
>  Issue Type: Bug
>  Components: Coordination
>Reporter: Aleksey Yeschenko
>Assignee: Aleksey Yeschenko
>Priority: Major
> Fix For: 3.0.x, 3.11.x
>
>
> It's currently possible for {{DataResolver}} to accumulate more changes to 
> read repair that would fit in a single serialized mutation. If that happens, 
> the node receiving the mutation would fail, and the read would time out, and 
> won't be able to proceed until the operator runs repair or manually drops the 
> affected partitions.
> Ideally we should either read repair iteratively, or at least split the 
> resulting mutation into smaller chunks in the end. In the meantime, for 
> 3.0.x, I suggest we add logging to catch this, and a -D flag to allow 
> proceeding with the requests as is when the mutation is too large, without 
> read repair.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org

[jira] [Resolved] (CASSANDRA-13982) Refactoring to specialised functional interfaces


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Brown resolved CASSANDRA-13982.
-
   Resolution: Fixed
Fix Version/s: (was: 4.x)
   4.0

some dtests are failing on other, unrelated branches, so i do not think any new 
failure is introduced with this patch. Thus, I'm +1, and committed as sha 
{{684e250ba6e5b5bd1c246ceac332a91b2dc90859}}

Thanks!

> Refactoring to specialised functional interfaces
> 
>
> Key: CASSANDRA-13982
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13982
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Ameya Ketkar
>Assignee: Ameya Ketkar
>Priority: Minor
>  Labels: static-analysis
> Fix For: 4.0
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Usage of specialised functional interfaces provided by JDK, will reduce the 
> autoboxing overhead hence. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org

cassandra git commit: Refactoring to specialised functional interfaces

Repository: cassandra
Updated Branches:
  refs/heads/trunk 3fe31ffdd -> 684e250ba


Refactoring to specialised functional interfaces

patch by Ameya Ketkar; reviewed by jasobrown for CASSANDRA-13982


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/684e250b
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/684e250b
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/684e250b

Branch: refs/heads/trunk
Commit: 684e250ba6e5b5bd1c246ceac332a91b2dc90859
Parents: 3fe31ff
Author: ameya 
Authored: Sat Oct 28 16:50:24 2017 -0700
Committer: Jason Brown 
Committed: Thu Nov 2 06:44:48 2017 -0700

--
 CHANGES.txt |  1 +
 .../cassandra/auth/jmx/AuthorizationProxy.java  | 15 ++--
 .../org/apache/cassandra/db/Directories.java|  5 +-
 .../org/apache/cassandra/db/ReadCommand.java|  3 +-
 .../db/compaction/CompactionController.java |  3 +-
 .../db/compaction/CompactionIterator.java   |  6 +-
 .../db/compaction/CompactionManager.java|  3 +-
 .../db/compaction/SSTableSplitter.java  |  3 +-
 .../cassandra/db/compaction/Upgrader.java   |  3 +-
 .../cassandra/db/compaction/Verifier.java   |  3 +-
 .../db/lifecycle/LifecycleTransaction.java  |  4 +-
 .../db/lifecycle/LogAwareFileLister.java|  8 +--
 .../cassandra/db/partitions/PurgeFunction.java  |  3 +-
 .../cassandra/hints/HintsDispatchExecutor.java  |  8 +--
 .../compress/CompressedInputStream.java |  8 +--
 .../cassandra/tools/SSTableMetadataViewer.java  |  8 +--
 .../cassandra/tools/StandaloneSSTableUtil.java  |  3 +-
 src/java/org/apache/cassandra/tools/Util.java   | 18 ++---
 .../test/microbench/AutoBoxingBench.java| 74 
 .../auth/jmx/AuthorizationProxyTest.java| 21 +++---
 .../db/compaction/CompactionControllerTest.java |  3 +-
 .../rows/UnfilteredRowIteratorsMergeTest.java   | 10 +--
 .../db/rows/UnfilteredRowsGenerator.java|  8 +--
 .../service/NativeTransportServiceTest.java |  7 +-
 24 files changed, 157 insertions(+), 71 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/684e250b/CHANGES.txt
--
diff --git a/CHANGES.txt b/CHANGES.txt
index 6c3eb53..71f4b1d 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -1,4 +1,5 @@
 4.0
+ * Refactoring to specialised functional interfaces (CASSANDRA-13982)
  * Speculative retry should allow more friendly params (CASSANDRA-13876)
  * Throw exception if we send/receive repair messages to incompatible nodes 
(CASSANDRA-13944)
  * Replace usages of MessageDigest with Guava's Hasher (CASSANDRA-13291)

http://git-wip-us.apache.org/repos/asf/cassandra/blob/684e250b/src/java/org/apache/cassandra/auth/jmx/AuthorizationProxy.java
--
diff --git a/src/java/org/apache/cassandra/auth/jmx/AuthorizationProxy.java 
b/src/java/org/apache/cassandra/auth/jmx/AuthorizationProxy.java
index 1d8f462..d9b63c6 100644
--- a/src/java/org/apache/cassandra/auth/jmx/AuthorizationProxy.java
+++ b/src/java/org/apache/cassandra/auth/jmx/AuthorizationProxy.java
@@ -23,8 +23,9 @@ import java.security.AccessControlContext;
 import java.security.AccessController;
 import java.security.Principal;
 import java.util.Set;
+import java.util.function.BooleanSupplier;
 import java.util.function.Function;
-import java.util.function.Supplier;
+import java.util.function.Predicate;
 import java.util.stream.Collectors;
 import javax.management.MBeanServer;
 import javax.management.MalformedObjectNameException;
@@ -110,7 +111,7 @@ public class AuthorizationProxy implements InvocationHandler
  Used to check whether the Role associated with the authenticated Subject 
has superuser
  status. By default, just delegates to Roles::hasSuperuserStatus, but can 
be overridden for testing.
  */
-protected Function isSuperuser = 
Roles::hasSuperuserStatus;
+protected Predicate isSuperuser = Roles::hasSuperuserStatus;
 
 /*
  Used to retrieve the set of all permissions granted to a given role. By 
default, this fetches
@@ -123,7 +124,7 @@ public class AuthorizationProxy implements InvocationHandler
  Used to decide whether authorization is enabled or not, usually this 
depends on the configured
  IAuthorizer, but can be overridden for testing.
  */
-protected Supplier isAuthzRequired = () -> 
DatabaseDescriptor.getAuthorizer().requireAuthorization();
+protected BooleanSupplier isAuthzRequired = () -> 
DatabaseDescriptor.getAuthorizer().requireAuthorization();
 
 /*
  Used to find matching MBeans when the invocation target is a pattern type 
ObjectName.
@@

[jira] [Commented] (CASSANDRA-13987) Multithreaded commitlog subtly changed durability

2017-11-02 Thread Ariel Weisberg (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-13987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16235731#comment-16235731
 ] 

Ariel Weisberg commented on CASSANDRA-13987:


bq.  but the ordering in the sidekick entries are not guaranteed to be in the 
same order as the commit log's entries. 
Just a heads up they would. You would increment the offsets atomically using 
CAS of two 4-byte values packed into one 8-byte value.

> Multithreaded commitlog subtly changed durability
> -
>
> Key: CASSANDRA-13987
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13987
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Jason Brown
>Assignee: Jason Brown
>Priority: Major
> Fix For: 4.x
>
>
> When multithreaded commitlog was introduced in CASSANDRA-3578, we subtly 
> changed the way that commitlog durability worked. Everything still gets 
> written to an mmap file. However, not everything is replayable from the 
> mmaped file after a process crash, in periodic mode.
> In brief, the reason this changesd is due to the chained markers that are 
> required for the multithreaded commit log. At each msync, we wait for 
> outstanding mutations to serialize into the commitlog, and update a marker 
> before and after the commits that have accumluated since the last sync. With 
> those markers, we can safely replay that section of the commitlog. Without 
> the markers, we have no guarantee that the commits in that section were 
> successfully written, thus we abandon those commits on replay.
> If you have correlated process failures of multiple nodes at "nearly" the 
> same time (see ["There Is No 
> Now"|http://queue.acm.org/detail.cfm?id=2745385]), it is possible to have 
> data loss if none of the nodes msync the commitlog. For example, with RF=3, 
> if quorum write succeeds on two nodes (and we acknowledge the write back to 
> the client), and then the process on both nodes OOMs (say, due to reading the 
> index for a 100GB partition), the write will be lost if neither process 
> msync'ed the commitlog. More exactly, the commitlog cannot be fully replayed. 
> The reason why this data is silently lost is due to the chained markers that 
> were introduced with CASSANDRA-3578.
> The problem we are addressing with this ticket is incrementally improving 
> 'durability' due to process crash, not host crash. (Note: operators should 
> use batch mode to ensure greater durability, but batch mode in it's current 
> implementation is a) borked, and b) will burn through, *very* rapidly, SSDs 
> that don't have a non-volatile write cache sitting in front.) 
> The current default for {{commitlog_sync_period_in_ms}} is 10 seconds, which 
> means that a node could lose up to ten seconds of data due to process crash. 
> The unfortunate thing is that the data is still avaialble, in the mmap file, 
> but we can't replay it due to incomplete chained markers.
> ftr, I don't believe we've ever had a stated policy about commitlog 
> durability wrt process crash. Pre-2.0 we naturally piggy-backed off the 
> memory mapped file and the fact that every mutation was acquired a lock and 
> wrote into the mmap buffer, and the ability to replay everything out of it 
> came for free. With CASSANDRA-3578, that was subtly changed. 
> Something [~jjirsa] pointed out to me is that [MySQL provides a way to adjust 
> the durability 
> guarantees|https://dev.mysql.com/doc/refman/5.6/en/innodb-parameters.html#sysvar_innodb_flush_log_at_trx_commit]
>  of each commit in innodb via the {{innodb_flush_log_at_trx_commit}}. I'm 
> using that idea as a loose springboard for what to do here.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org

[jira] [Commented] (CASSANDRA-13987) Multithreaded commitlog subtly changed durability


[ 
https://issues.apache.org/jira/browse/CASSANDRA-13987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16235707#comment-16235707
 ] 

Jason Brown commented on CASSANDRA-13987:
-

Just to add these here for completeness, I spoke with several other 
contributors, and here is a brief summary of each idea and my reasoning for not 
pursuing each.

[~mkjellman] proposed to reintroduce a lock to the commitlog path, albeit with 
a smaller scope. The basic idea would still use multiple threads to serialize 
the mutation into the log, but we would lock around getting the {{Allocation}} 
buffer and writing the mutation's length and checksum. This would allow us to 
be able to replay everything that successfully serialized into the commitlog; 
we could skip entries that did not completely serialize (and thus fail on 
deserialization) as we would be guaranteed the entry's length was written at 
the beginning of the entry (and thus we could skip to the next entry if 
possible).

The biggest downside here was the reintroduction of the lock, which is a larger 
topic than what I want to address here, and should involve a wider community 
discussion.

[~aweisberg] proposed having a mmaped sidekick file where we would capture the 
position (and checksum of the position) of each entry in the main commitlog 
file. The entries in the sidekick file would be fixed-size values (8 bytes), so 
we would always be able to read the values. We would use something like the 
main commitlog's CAS to allocate space for the sidekick entry, but the ordering 
in the sidekick entries are not guaranteed to be in the same order as the 
commit log's entries. On replay, we would need to read in the sidekick file to 
know the offsets, and we would need to attempt to replay as many of the entries 
from the main commitlog as appeared in the sidekick file.

While being a reasonably good idea, the downside for me is that introducing 
another file for ensuring more commitlog replayablility seemed more involved 
than probably necessary for the stated goal. Coorinated failures are already an 
edge condition, and imposing the sidekick file tax on every commitlog might be 
more than required. Also, I am concerned about the additional cost on replay to 
read the sidekick file, order the entries, and then ensure at least all those 
entries are replayed. We are sensitive to startup times, and this would add to 
it (albeit perhaps slightly). Another complicating factor for this idea is that 
is does not work with compressed or encrypted commitlogs.



> Multithreaded commitlog subtly changed durability
> -
>
> Key: CASSANDRA-13987
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13987
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Jason Brown
>Assignee: Jason Brown
>Priority: Major
> Fix For: 4.x
>
>
> When multithreaded commitlog was introduced in CASSANDRA-3578, we subtly 
> changed the way that commitlog durability worked. Everything still gets 
> written to an mmap file. However, not everything is replayable from the 
> mmaped file after a process crash, in periodic mode.
> In brief, the reason this changesd is due to the chained markers that are 
> required for the multithreaded commit log. At each msync, we wait for 
> outstanding mutations to serialize into the commitlog, and update a marker 
> before and after the commits that have accumluated since the last sync. With 
> those markers, we can safely replay that section of the commitlog. Without 
> the markers, we have no guarantee that the commits in that section were 
> successfully written, thus we abandon those commits on replay.
> If you have correlated process failures of multiple nodes at "nearly" the 
> same time (see ["There Is No 
> Now"|http://queue.acm.org/detail.cfm?id=2745385]), it is possible to have 
> data loss if none of the nodes msync the commitlog. For example, with RF=3, 
> if quorum write succeeds on two nodes (and we acknowledge the write back to 
> the client), and then the process on both nodes OOMs (say, due to reading the 
> index for a 100GB partition), the write will be lost if neither process 
> msync'ed the commitlog. More exactly, the commitlog cannot be fully replayed. 
> The reason why this data is silently lost is due to the chained markers that 
> were introduced with CASSANDRA-3578.
> The problem we are addressing with this ticket is incrementally improving 
> 'durability' due to process crash, not host crash. (Note: operators should 
> use batch mode to ensure greater durability, but batch mode in it's current 
> implementation is a) borked, and b) will burn through, *very* rapidly, SSDs 
> that don't have a non-volatile write cache sitting in front.) 
> The current default for {{commitlog_sync_period_in_ms}} is 10 seconds, which 
>

[jira] [Commented] (CASSANDRA-13987) Multithreaded commitlog subtly changed durability


[ 
https://issues.apache.org/jira/browse/CASSANDRA-13987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16235703#comment-16235703
 ] 

Jason Brown commented on CASSANDRA-13987:
-

Here is a branch that takes the simplest path: it updates the commit log 
chained markers (in periodic mode) much more quickly than it mysnc's.

||trunk||
|[branch|https://github.com/jasobrown/cassandra/tree/commitlog_mmap-more-frequent-markers]|
|[utests|https://circleci.com/gh/jasobrown/cassandra/tree/commitlog_mmap-more-frequent-markers]|
|[dtest|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/407/]|

The basic idea is that if we can update the chained markers (say, (a 
configurable) once every 100 milliseconds), that should probably be more than 
enough time to survive a correlated failure between two nodes about OOM. This 
is *not* a silver bullet to ensure complete replayability, as in that case you 
should use batch commit log mode for each commit to ensure durability. There 
are alternatives that I discussed with others (see next comment).

This branch does not solve the problem for compressed/encrypted commitlog (in 
periodic mode), as those implementations do not use mmaped files. I am not sure 
how best (or if) to address those. Switching them to use a memory mapped file 
might not be too difficult, code-wise, but i'm not sure about performance 
implications. Apparently, [~benedict] and I had some discussion about the use 
of mmap and the commitlog a lng time ago (CASSANDRA-6809), but I honestly 
can't remember the details beyond our comments on that ticket.

> Multithreaded commitlog subtly changed durability
> -
>
> Key: CASSANDRA-13987
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13987
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Jason Brown
>Assignee: Jason Brown
>Priority: Major
> Fix For: 4.x
>
>
> When multithreaded commitlog was introduced in CASSANDRA-3578, we subtly 
> changed the way that commitlog durability worked. Everything still gets 
> written to an mmap file. However, not everything is replayable from the 
> mmaped file after a process crash, in periodic mode.
> In brief, the reason this changesd is due to the chained markers that are 
> required for the multithreaded commit log. At each msync, we wait for 
> outstanding mutations to serialize into the commitlog, and update a marker 
> before and after the commits that have accumluated since the last sync. With 
> those markers, we can safely replay that section of the commitlog. Without 
> the markers, we have no guarantee that the commits in that section were 
> successfully written, thus we abandon those commits on replay.
> If you have correlated process failures of multiple nodes at "nearly" the 
> same time (see ["There Is No 
> Now"|http://queue.acm.org/detail.cfm?id=2745385]), it is possible to have 
> data loss if none of the nodes msync the commitlog. For example, with RF=3, 
> if quorum write succeeds on two nodes (and we acknowledge the write back to 
> the client), and then the process on both nodes OOMs (say, due to reading the 
> index for a 100GB partition), the write will be lost if neither process 
> msync'ed the commitlog. More exactly, the commitlog cannot be fully replayed. 
> The reason why this data is silently lost is due to the chained markers that 
> were introduced with CASSANDRA-3578.
> The problem we are addressing with this ticket is incrementally improving 
> 'durability' due to process crash, not host crash. (Note: operators should 
> use batch mode to ensure greater durability, but batch mode in it's current 
> implementation is a) borked, and b) will burn through, *very* rapidly, SSDs 
> that don't have a non-volatile write cache sitting in front.) 
> The current default for {{commitlog_sync_period_in_ms}} is 10 seconds, which 
> means that a node could lose up to ten seconds of data due to process crash. 
> The unfortunate thing is that the data is still avaialble, in the mmap file, 
> but we can't replay it due to incomplete chained markers.
> ftr, I don't believe we've ever had a stated policy about commitlog 
> durability wrt process crash. Pre-2.0 we naturally piggy-backed off the 
> memory mapped file and the fact that every mutation was acquired a lock and 
> wrote into the mmap buffer, and the ability to replay everything out of it 
> came for free. With CASSANDRA-3578, that was subtly changed. 
> Something [~jjirsa] pointed out to me is that [MySQL provides a way to adjust 
> the durability 
> guarantees|https://dev.mysql.com/doc/refman/5.6/en/innodb-parameters.html#sysvar_innodb_flush_log_at_trx_commit]
>  of each commit in innodb via the {{innodb_flush_log_at_trx_commit}}. I'm 
> using that idea as a loose springboard for what to do

[jira] [Created] (CASSANDRA-13987) Multithreaded commitlog subtly changed durability

Jason Brown created CASSANDRA-13987:
---

Summary: Multithreaded commitlog subtly changed durability
Key: CASSANDRA-13987
URL: https://issues.apache.org/jira/browse/CASSANDRA-13987
Project: Cassandra
Issue Type: Improvement
Reporter: Jason Brown
Assignee: Jason Brown
Priority: Major
Fix For: 4.x

When multithreaded commitlog was introduced in CASSANDRA-3578, we subtly
changed the way that commitlog durability worked. Everything still gets written
to an mmap file. However, not everything is replayable from the mmaped file
after a process crash, in periodic mode.

In brief, the reason this changesd is due to the chained markers that are
required for the multithreaded commit log. At each msync, we wait for
outstanding mutations to serialize into the commitlog, and update a marker
before and after the commits that have accumluated since the last sync. With
those markers, we can safely replay that section of the commitlog. Without the
markers, we have no guarantee that the commits in that section were
successfully written, thus we abandon those commits on replay.

If you have correlated process failures of multiple nodes at "nearly" the same
time (see ["There Is No Now"|http://queue.acm.org/detail.cfm?id=2745385]), it
is possible to have data loss if none of the nodes msync the commitlog. For
example, with RF=3, if quorum write succeeds on two nodes (and we acknowledge
the write back to the client), and then the process on both nodes OOMs (say,
due to reading the index for a 100GB partition), the write will be lost if
neither process msync'ed the commitlog. More exactly, the commitlog cannot be
fully replayed. The reason why this data is silently lost is due to the chained
markers that were introduced with CASSANDRA-3578.

The problem we are addressing with this ticket is incrementally improving
'durability' due to process crash, not host crash. (Note: operators should use
batch mode to ensure greater durability, but batch mode in it's current
implementation is a) borked, and b) will burn through, *very* rapidly, SSDs
that don't have a non-volatile write cache sitting in front.)

The current default for {{commitlog_sync_period_in_ms}} is 10 seconds, which
means that a node could lose up to ten seconds of data due to process crash.
The unfortunate thing is that the data is still avaialble, in the mmap file,
but we can't replay it due to incomplete chained markers.

ftr, I don't believe we've ever had a stated policy about commitlog durability
wrt process crash. Pre-2.0 we naturally piggy-backed off the memory mapped file
and the fact that every mutation was acquired a lock and wrote into the mmap
buffer, and the ability to replay everything out of it came for free. With
CASSANDRA-3578, that was subtly changed.

Something [~jjirsa] pointed out to me is that [MySQL provides a way to adjust
the durability
guarantees|https://dev.mysql.com/doc/refman/5.6/en/innodb-parameters.html#sysvar_innodb_flush_log_at_trx_commit]
of each commit in innodb via the {{innodb_flush_log_at_trx_commit}}. I'm using
that idea as a loose springboard for what to do here.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org

[jira] [Commented] (CASSANDRA-10404) Node to Node encryption transitional mode


[ 
https://issues.apache.org/jira/browse/CASSANDRA-10404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16235603#comment-16235603
 ] 

Jason Brown commented on CASSANDRA-10404:
-

Thanks, [~eperott]. [~spo...@gmail.com] any additional comments or concerns?

> Node to Node encryption transitional mode
> -
>
> Key: CASSANDRA-10404
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10404
> Project: Cassandra
>  Issue Type: New Feature
>Reporter: Tom Lewis
>Assignee: Jason Brown
>Priority: Major
> Fix For: 4.x
>
>
> Create a transitional mode for encryption that allows encrypted and 
> unencrypted traffic node-to-node during a change over to encryption from 
> unencrypted. This alleviates downtime during the switch.
>  This is similar to CASSANDRA-10559 which is intended for client-to-node



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org

[jira] [Commented] (CASSANDRA-10404) Node to Node encryption transitional mode