[jira] [Commented] (CASSANDRA-13948) Reload compaction strategies when JBOD disk boundary changes
[ https://issues.apache.org/jira/browse/CASSANDRA-13948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16237059#comment-16237059 ] Paulo Motta commented on CASSANDRA-13948: - bq. Did additional testing and wasn't able to reproduce :-/ this looks similar to CASSANDRA-12743, so I wonder if it's an existing race that showed up due to the large compaction backlog after the deadlock was fixed. bq. I'll try the patch on more representative nodes in the coming days and report back any issue. sounds good, if you manage to reproduce it would be nice if you could change the log level of the {{org.apache.cassandra.db.compaction}} package to {{TRACE}}. > Reload compaction strategies when JBOD disk boundary changes > > > Key: CASSANDRA-13948 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13948 > Project: Cassandra > Issue Type: Bug > Components: Compaction >Reporter: Paulo Motta >Assignee: Paulo Motta >Priority: Major > Fix For: 3.11.x, 4.x > > Attachments: debug.log > > > The thread dump below shows a race between an sstable replacement by the > {{IndexSummaryRedistribution}} and > {{AbstractCompactionTask.getNextBackgroundTask}}: > {noformat} > Thread 94580: (state = BLOCKED) > - sun.misc.Unsafe.park(boolean, long) @bci=0 (Compiled frame; information > may be imprecise) > - java.util.concurrent.locks.LockSupport.park(java.lang.Object) @bci=14, > line=175 (Compiled frame) > - > java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt() > @bci=1, line=836 (Compiled frame) > - > java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(java.util.concurrent.locks.AbstractQueuedSynchronizer$Node, > int) @bci=67, line=870 (Compiled frame) > - java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(int) > @bci=17, line=1199 (Compiled frame) > - java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock() @bci=5, > line=943 (Compiled frame) > - > 
org.apache.cassandra.db.compaction.CompactionStrategyManager.handleListChangedNotification(java.lang.Iterable, > java.lang.Iterable) @bci=359, line=483 (Interpreted frame) > - > org.apache.cassandra.db.compaction.CompactionStrategyManager.handleNotification(org.apache.cassandra.notifications.INotification, > java.lang.Object) @bci=53, line=555 (Interpreted frame) > - > org.apache.cassandra.db.lifecycle.Tracker.notifySSTablesChanged(java.util.Collection, > java.util.Collection, org.apache.cassandra.db.compaction.OperationType, > java.lang.Throwable) @bci=50, line=409 (Interpreted frame) > - > org.apache.cassandra.db.lifecycle.LifecycleTransaction.doCommit(java.lang.Throwable) > @bci=157, line=227 (Interpreted frame) > - > org.apache.cassandra.utils.concurrent.Transactional$AbstractTransactional.commit(java.lang.Throwable) > @bci=61, line=116 (Compiled frame) > - > org.apache.cassandra.utils.concurrent.Transactional$AbstractTransactional.commit() > @bci=2, line=200 (Interpreted frame) > - > org.apache.cassandra.utils.concurrent.Transactional$AbstractTransactional.finish() > @bci=5, line=185 (Interpreted frame) > - > org.apache.cassandra.io.sstable.IndexSummaryRedistribution.redistributeSummaries() > @bci=559, line=130 (Interpreted frame) > - > org.apache.cassandra.db.compaction.CompactionManager.runIndexSummaryRedistribution(org.apache.cassandra.io.sstable.IndexSummaryRedistribution) > @bci=9, line=1420 (Interpreted frame) > - > org.apache.cassandra.io.sstable.IndexSummaryManager.redistributeSummaries(org.apache.cassandra.io.sstable.IndexSummaryRedistribution) > @bci=4, line=250 (Interpreted frame) > - > org.apache.cassandra.io.sstable.IndexSummaryManager.redistributeSummaries() > @bci=30, line=228 (Interpreted frame) > - org.apache.cassandra.io.sstable.IndexSummaryManager$1.runMayThrow() > @bci=4, line=125 (Interpreted frame) > - org.apache.cassandra.utils.WrappedRunnable.run() @bci=1, line=28 > (Interpreted frame) > - > 
org.apache.cassandra.concurrent.DebuggableScheduledThreadPoolExecutor$UncomplainingRunnable.run() > @bci=4, line=118 (Compiled frame) > - java.util.concurrent.Executors$RunnableAdapter.call() @bci=4, line=511 > (Compiled frame) > - java.util.concurrent.FutureTask.runAndReset() @bci=47, line=308 (Compiled > frame) > - > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask) > @bci=1, line=180 (Compiled frame) > - java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run() > @bci=37, line=294 (Compiled frame) > - > java.util.concurrent.ThreadPoolExecutor.runWorker(java.util.concurrent.ThreadPoolExecutor$Worker) > @bci=95, line=1149 (Compiled frame) > -
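The dump shows handleListChangedNotification() parked in ReentrantReadWriteLock$WriteLock.lock(). As a hypothetical illustration only (WriteLockParkDemo and its method are invented names, not Cassandra code), a thread requesting the write lock parks until every reader releases the read lock:

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical demo of the blocking seen in the thread dump: a thread asking
// for the write lock (as handleListChangedNotification does) parks in
// LockSupport.park until the read lock (held e.g. by a background compaction
// task) is released.
public class WriteLockParkDemo {
    static final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    /** Returns true if the writer had to park while the read lock was held. */
    static boolean writerParksBehindReader() {
        try {
            lock.readLock().lock();               // simulate a long-running reader
            Thread writer = new Thread(() -> {
                lock.writeLock().lock();          // parks here, as in the thread dump
                lock.writeLock().unlock();
            });
            writer.start();
            boolean parked = false;               // wait up to ~2s for the writer to park
            for (int i = 0; i < 200 && !parked; i++) {
                parked = writer.getState() == Thread.State.WAITING;
                Thread.sleep(10);
            }
            lock.readLock().unlock();             // reader finishes; writer proceeds
            writer.join();
            return parked;
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("writer parked: " + writerParksBehindReader());
    }
}
```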
[jira] [Commented] (CASSANDRA-13475) First version of pluggable storage engine API.
[ https://issues.apache.org/jira/browse/CASSANDRA-13475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16237080#comment-16237080 ] Dikang Gu commented on CASSANDRA-13475: --- [~bdeggleston], I think I merged most of your questions into the quip; here is a snapshot of it: == Apache Cassandra Pluggable Storage Engine What is a Cassandra pluggable storage engine? A Cassandra pluggable storage engine is the component in the Cassandra database server that is responsible for managing how data is stored, both in memory and on disk, and for performing the actual data I/O operations for a database, as well as enabling certain feature sets that target a specific application need. More concretely, the storage engine will be **responsible** for: 1. Physical Storage: supporting C* data types and table schemas, as well as the format used for storing data on physical disk. 2. Query: the storage engine will support point queries and range queries of data stored in the database. 3. Memory Caches: the storage engine may implement a row cache or block cache for query performance optimization. 4. Advanced Data Types: it's up to the storage engine whether to support advanced data types like list/map/counter. 5. Index Support: it's up to the storage engine whether to support secondary indexes on the stored data. The storage engine will **NOT be responsible** for any distributed or network features, like schema, gossip, replication, streaming, repair, etc. Those features need to be implemented on top of the storage engine. Project Goal * A clear interface for the pluggable storage engine, which means there is a clear boundary around the storage engine, and we can drop in any storage engine implementation without changing other components. * Refactor the existing Cassandra code base to follow the pluggable storage engine architecture. 
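The responsibility list above could be sketched as a narrow Java interface. This is a hypothetical illustration; StorageEngine, InMemoryEngine, and every method name here are invented for the sketch and are not Cassandra classes:

```java
import java.util.NavigableMap;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical sketch of the "physical storage" and "query" responsibilities
// as a minimal engine interface, with a toy in-memory implementation.
interface StorageEngine {
    void apply(String key, String value);                          // store data
    String pointQuery(String key);                                 // point query
    SortedMap<String, String> rangeQuery(String from, String to);  // range query
}

class InMemoryEngine implements StorageEngine {
    private final NavigableMap<String, String> data = new TreeMap<>();

    public void apply(String key, String value) { data.put(key, value); }
    public String pointQuery(String key) { return data.get(key); }
    public SortedMap<String, String> rangeQuery(String from, String to) {
        return data.subMap(from, true, to, false);   // [from, to)
    }
}

public class StorageEngineSketch {
    public static void main(String[] args) {
        StorageEngine engine = new InMemoryEngine();
        engine.apply("a", "1");
        engine.apply("b", "2");
        System.out.println(engine.rangeQuery("a", "b").size());   // prints 1
    }
}
```

Anything distributed (replication, gossip, streaming coordination) stays on the caller's side of this boundary, per the list above.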
Timelines/Guidelines I expect it will be a year-long effort to refactor the existing storage engine to follow a mature pluggable storage engine API. During that time, we will refactor the existing storage engine piece by piece; there should be no regression (performance, reliability, or testability) introduced during the process. (Very high level) Designs streaming Current streaming is coupled with the storage engine, but it doesn't need to be. The StreamSession class could be a very general streaming-handling framework. My proposal is that, for the three streaming phases: 1. Connection Initialization: it can remain unchanged. 2. Stream preparation phase: we abstract the StreamTransferTask and StreamReceiveTask; each storage engine will implement its own TransferTask and ReceiveTask, which hide the details of how to bulk read/write from/to the storage engine. 3. Streaming phase: each storage engine implements its own StreamReader and StreamWriter to read/write data from/into the stream. On the receiving side, once the streamed message is fully received, the implementation will be responsible for ingesting the streamed files into the engine and making them available for client requests. repair For repair, my idea is that we can keep the high-level design that uses Merkle trees to calculate the difference and then uses the streaming framework to stream the data. To calculate the Merkle trees, different storage engines will have different implementations; a naive way is to sequentially scan a token range to build the Merkle trees, and then stream the inconsistent token ranges. It should be doable. But incremental repair may not be supported by all storage engines. keyspace Metadata Let's say we can configure the storage engine per keyspace. Under this design, we can add a storage engine option to the KeyspaceParams, which is stored in KeyspaceMetadata. We can support setting the storage engine during the creation of the keyspace, in CQL. 
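The streaming split described above (a generic framework moving bytes, with engine-specific readers and writers behind small interfaces) might look roughly like the following hypothetical sketch; none of these names exist in the code base:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;

// Hypothetical sketch of phase 3: each engine supplies its own writer/reader
// pair; the framework only moves opaque bytes between them.
interface EngineStreamWriter { void write(OutputStream out) throws IOException; }
interface EngineStreamReader { void read(InputStream in) throws IOException; }

class RowBlobWriter implements EngineStreamWriter {
    private final byte[] rows;
    RowBlobWriter(byte[] rows) { this.rows = rows; }
    public void write(OutputStream out) throws IOException { out.write(rows); }
}

class RowBlobReader implements EngineStreamReader {
    byte[] received;                                   // "ingested" once fully received
    public void read(InputStream in) throws IOException {
        received = in.readAllBytes();
    }
}

public class StreamingSketch {
    // Round-trip some rows through an in-memory "stream".
    static byte[] roundTrip(byte[] rows) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            new RowBlobWriter(rows).write(buf);                           // sending side
            RowBlobReader reader = new RowBlobReader();
            reader.read(new ByteArrayInputStream(buf.toByteArray()));     // receiving side
            return reader.received;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("rows".getBytes()).length);   // prints 4
    }
}
```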
We can also support a mechanism to override the option per server; in this case, streaming between different storage engines needs to be supported. When we open or initialize a keyspace, we will pick the specific storage engine based on the option in KeyspaceParams. table metadata I think we can keep most of the options in the TableParams, https://github.com/apache/cassandra/blob/8b3a60b9a7dbefeecc06bace617279612ec7092d/src/java/org/apache/cassandra/schema/TableParams.java#L36 The storage engine needs to respect the options in the TableParams and apply them if possible. For example, if the storage engine is not an LSM-tree-based implementation, it may not need compaction and will ignore that option. For storage-engine-specific options, again, like compaction, we can move them out of the general params and allow them to be loaded from config files. Metrics Each storage engine can implement its own JMX/MBeans, so metrics can still be exposed through JMX. read path Each storage
[jira] [Updated] (CASSANDRA-13475) First version of pluggable storage engine API.
[ https://issues.apache.org/jira/browse/CASSANDRA-13475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dikang Gu updated CASSANDRA-13475: -- Description: In order to support pluggable storage engine, we need to define a unified interface/API, which can allow us to plug in different storage engines for different requirements. Here is a design quip we are currently working on: https://quip.com/bhw5ABUCi3co In very high level, the storage engine interface should include APIs to: 1. Apply update into the engine. 2. Query data from the engine. 3. Stream data in/out to/from the engine. 4. Table operations, like create/drop/truncate a table, etc. 5. Various stats about the engine. I create this ticket to start the discussions about the interface. was: In order to support pluggable storage engine, we need to define a unified interface/API, which can allow us to plug in different storage engines for different requirements. In very high level, the storage engine interface should include APIs to: 1. Apply update into the engine. 2. Query data from the engine. 3. Stream data in/out to/from the engine. 4. Table operations, like create/drop/truncate a table, etc. 5. Various stats about the engine. I create this ticket to start the discussions about the interface. > First version of pluggable storage engine API. > -- > > Key: CASSANDRA-13475 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13475 > Project: Cassandra > Issue Type: Sub-task >Reporter: Dikang Gu >Assignee: Dikang Gu >Priority: Major > > In order to support pluggable storage engine, we need to define a unified > interface/API, which can allow us to plug in different storage engines for > different requirements. > Here is a design quip we are currently working on: > https://quip.com/bhw5ABUCi3co > In very high level, the storage engine interface should include APIs to: > 1. Apply update into the engine. > 2. Query data from the engine. > 3. Stream data in/out to/from the engine. > 4. 
Table operations, like create/drop/truncate a table, etc. > 5. Various stats about the engine. > I create this ticket to start the discussions about the interface. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
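API groups 4 and 5 from the list above (table operations and engine stats) could be exercised against a toy in-memory engine; ToyEngine and all method names here are invented for illustration and do not correspond to real Cassandra classes:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: table create/drop/truncate plus a trivial stat
// (write count) behind one engine object.
class ToyEngine {
    private final Map<String, Map<String, String>> tables = new HashMap<>();
    private long writeCount = 0;

    void createTable(String name)   { tables.putIfAbsent(name, new HashMap<>()); }
    void dropTable(String name)     { tables.remove(name); }
    void truncateTable(String name) { tables.get(name).clear(); }

    void apply(String table, String key, String value) {   // apply an update
        tables.get(table).put(key, value);
        writeCount++;
    }

    long writeCount()        { return writeCount; }        // engine stats
    Set<String> tableNames() { return tables.keySet(); }
}

public class TableOpsSketch {
    public static void main(String[] args) {
        ToyEngine engine = new ToyEngine();
        engine.createTable("t1");
        engine.apply("t1", "k", "v");
        engine.truncateTable("t1");
        System.out.println(engine.writeCount());   // prints 1: stats survive a truncate
    }
}
```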
[jira] [Commented] (CASSANDRA-12182) redundant StatusLogger print out when both dropped message and long GC event happen
[ https://issues.apache.org/jira/browse/CASSANDRA-12182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16236948#comment-16236948 ] Kurt Greaves commented on CASSANDRA-12182: -- Thanks [~mychal], I can RIP now that I know no one will ever break the status logger again. I ran the test a few hundred times to confirm there is no flakiness, and it seems to work perfectly, so props to you. I noticed that the "StatusLogger is busy" message obviously interleaves with the actual StatusLogger dump, but I think this is fine: unless it happens a ludicrous number of times it doesn't really interfere with the printout, and if it's happening that often, the dumps probably aren't the major concern. I've pinged IRC for someone to doubly review/commit. > redundant StatusLogger print out when both dropped message and long GC event > happen > --- > > Key: CASSANDRA-12182 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12182 > Project: Cassandra > Issue Type: Bug >Reporter: Wei Deng >Assignee: Michał Szczygieł >Priority: Minor > Labels: lhf > Attachments: 12182-trunk.txt, 12182-trunk.txt > > > I was stress testing a C* 3.0 environment and it appears that when the CPU is > running low, HINT and MUTATION messages will start to get dropped, and the GC > thread can also get some really long-running GC, and I'd get some redundant > log entries in system.log like the following: > {noformat} > WARN [Service Thread] 2016-07-12 22:48:45,748 GCInspector.java:282 - G1 > Young Generation GC in 522ms. 
G1 Eden Space: 68157440 -> 0; G1 Old Gen: > 3376113224 -> 3468387912; G1 Survivor Space: 24117248 -> 0; > INFO [Service Thread] 2016-07-12 22:48:45,763 StatusLogger.java:52 - Pool > NameActive Pending Completed Blocked All Time > Blocked > INFO [ScheduledTasks:1] 2016-07-12 22:48:45,775 MessagingService.java:983 - > MUTATION messages were dropped in last 5000 ms: 419 for internal timeout and > 0 for cross node timeout > INFO [ScheduledTasks:1] 2016-07-12 22:48:45,776 MessagingService.java:983 - > HINT messages were dropped in last 5000 ms: 89 for internal timeout and 0 for > cross node timeout > INFO [ScheduledTasks:1] 2016-07-12 22:48:45,776 StatusLogger.java:52 - Pool > NameActive Pending Completed Blocked All Time > Blocked > INFO [ScheduledTasks:1] 2016-07-12 22:48:45,798 StatusLogger.java:56 - > MutationStage32 4194 32997234 0 > 0 > INFO [ScheduledTasks:1] 2016-07-12 22:48:45,798 StatusLogger.java:56 - > ViewMutationStage 0 0 0 0 > 0 > INFO [ScheduledTasks:1] 2016-07-12 22:48:45,799 StatusLogger.java:56 - > ReadStage 0 0940 0 > 0 > INFO [Service Thread] 2016-07-12 22:48:45,800 StatusLogger.java:56 - > MutationStage32 4363 32997333 0 > 0 > INFO [Service Thread] 2016-07-12 22:48:45,801 StatusLogger.java:56 - > ViewMutationStage 0 0 0 0 > 0 > INFO [Service Thread] 2016-07-12 22:48:45,801 StatusLogger.java:56 - > ReadStage 0 0940 0 > 0 > INFO [Service Thread] 2016-07-12 22:48:45,802 StatusLogger.java:56 - > RequestResponseStage 0 0 11094437 0 > 0 > INFO [Service Thread] 2016-07-12 22:48:45,802 StatusLogger.java:56 - > ReadRepairStage 0 0 5 0 > 0 > INFO [ScheduledTasks:1] 2016-07-12 22:48:45,803 StatusLogger.java:56 - > RequestResponseStage 4 0 11094509 0 > 0 > INFO [ScheduledTasks:1] 2016-07-12 22:48:45,807 StatusLogger.java:56 - > ReadRepairStage 0 0 5 0 > 0 > INFO [ScheduledTasks:1] 2016-07-12 22:48:45,808 StatusLogger.java:56 - > CounterMutationStage 0 0 0 0 > 0 > INFO [ScheduledTasks:1] 2016-07-12 22:48:45,809 StatusLogger.java:56 - > MiscStage 0 0 0 0 > 0 > 
INFO [ScheduledTasks:1] 2016-07-12 22:48:45,809 StatusLogger.java:56 - > CompactionExecutor262 1234 0 > 0 > INFO [ScheduledTasks:1] 2016-07-12 22:48:45,810 StatusLogger.java:56 - > MemtableReclaimMemory 0 0 79 0 > 0 > INFO
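The "StatusLogger is busy" behaviour discussed in the comment above (skip a second full dump while one is already printing) can be sketched with a simple compare-and-set guard. StatusDumpGuard is a hypothetical name for illustration, not the actual patch:

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch: only one caller produces the full status dump at a
// time; concurrent triggers (e.g. dropped messages plus a long GC in the
// same window) log one short line instead of a redundant second dump.
public class StatusDumpGuard {
    private static final AtomicBoolean busy = new AtomicBoolean(false);

    /** Returns true if this call produced the dump, false if it was skipped. */
    static boolean log(Runnable dump) {
        if (!busy.compareAndSet(false, true)) {
            System.out.println("StatusLogger is busy");   // concurrent trigger skipped
            return false;
        }
        try {
            dump.run();
            return true;
        } finally {
            busy.set(false);
        }
    }

    public static void main(String[] args) {
        log(() -> log(() -> {}));   // the nested (concurrent) trigger is skipped
    }
}
```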
[jira] [Commented] (CASSANDRA-12182) redundant StatusLogger print out when both dropped message and long GC event happen
[ https://issues.apache.org/jira/browse/CASSANDRA-12182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16236776#comment-16236776 ] Michał Szczygieł commented on CASSANDRA-12182: -- Thank you [~KurtG] for the feedback. I've attached a patch with a test case. > redundant StatusLogger print out when both dropped message and long GC event > happen > --- > > Key: CASSANDRA-12182 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12182 > Project: Cassandra > Issue Type: Bug >Reporter: Wei Deng >Assignee: Michał Szczygieł >Priority: Minor > Labels: lhf > Attachments: 12182-trunk.txt, 12182-trunk.txt
[jira] [Updated] (CASSANDRA-12182) redundant StatusLogger print out when both dropped message and long GC event happen
[ https://issues.apache.org/jira/browse/CASSANDRA-12182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michał Szczygieł updated CASSANDRA-12182: - Status: Patch Available (was: In Progress) > redundant StatusLogger print out when both dropped message and long GC event > happen > --- > > Key: CASSANDRA-12182 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12182 > Project: Cassandra > Issue Type: Bug >Reporter: Wei Deng >Assignee: Michał Szczygieł >Priority: Minor > Labels: lhf > Attachments: 12182-trunk.txt, 12182-trunk.txt
[jira] [Updated] (CASSANDRA-12182) redundant StatusLogger print out when both dropped message and long GC event happen
[ https://issues.apache.org/jira/browse/CASSANDRA-12182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michał Szczygieł updated CASSANDRA-12182: - Attachment: 12182-trunk.txt > redundant StatusLogger print out when both dropped message and long GC event > happen > --- > > Key: CASSANDRA-12182 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12182 > Project: Cassandra > Issue Type: Bug >Reporter: Wei Deng >Assignee: Michał Szczygieł >Priority: Minor > Labels: lhf > Attachments: 12182-trunk.txt, 12182-trunk.txt
[jira] [Comment Edited] (CASSANDRA-13475) First version of pluggable storage engine API.
[ https://issues.apache.org/jira/browse/CASSANDRA-13475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16236746#comment-16236746 ] Blake Eggleston edited comment on CASSANDRA-13475 at 11/2/17 10:51 PM: --- Let's keep discussion on this jira for the time being. Also, we're just talking about a plan at this point. What do you think of the plan as proposed? Any concerns? Things you think should be added, removed, or reordered? edit: sorry Jason, I responded before I saw your response was (Author: bdeggleston): Let's keep discussion on this jira for the time being. Also, we're just talking about a plan at this point. What do you think of the plan as proposed? Any concerns? Things you think should be added, removed, or reordered? > First version of pluggable storage engine API. > -- > > Key: CASSANDRA-13475 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13475 > Project: Cassandra > Issue Type: Sub-task >Reporter: Dikang Gu >Assignee: Dikang Gu >Priority: Major
[jira] [Commented] (CASSANDRA-13592) Null Pointer exception at SELECT JSON statement
[ https://issues.apache.org/jira/browse/CASSANDRA-13592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16236755#comment-16236755 ] Chris mildebrandt commented on CASSANDRA-13592: --- I'm getting almost exactly the same stacktrace using Cassandra 3.11.1: {noformat} java.lang.NullPointerException: null at org.apache.cassandra.dht.Murmur3Partitioner.getHash(Murmur3Partitioner.java:230) ~[apache-cassandra-3.11.1.jar:3.11.1] at org.apache.cassandra.dht.Murmur3Partitioner.decorateKey(Murmur3Partitioner.java:66) ~[apache-cassandra-3.11.1.jar:3.11.1] at org.apache.cassandra.config.CFMetaData.decorateKey(CFMetaData.java:627) ~[apache-cassandra-3.11.1.jar:3.11.1] at org.apache.cassandra.service.pager.PartitionRangeQueryPager.(PartitionRangeQueryPager.java:44) ~[apache-cassandra-3.11.1.jar:3.11.1] at org.apache.cassandra.db.PartitionRangeReadCommand.getPager(PartitionRangeReadCommand.java:268) ~[apache-cassandra-3.11.1.jar:3.11.1] at org.apache.cassandra.cql3.statements.SelectStatement.getPager(SelectStatement.java:475) ~[apache-cassandra-3.11.1.jar:3.11.1] at org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:288) ~[apache-cassandra-3.11.1.jar:3.11.1] at org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:118) ~[apache-cassandra-3.11.1.jar:3.11.1] at org.apache.cassandra.cql3.QueryProcessor.processStatement(QueryProcessor.java:224) ~[apache-cassandra-3.11.1.jar:3.11.1] at org.apache.cassandra.cql3.QueryProcessor.processPrepared(QueryProcessor.java:530) ~[apache-cassandra-3.11.1.jar:3.11.1] at org.apache.cassandra.cql3.QueryProcessor.processPrepared(QueryProcessor.java:507) ~[apache-cassandra-3.11.1.jar:3.11.1] at org.apache.cassandra.transport.messages.ExecuteMessage.execute(ExecuteMessage.java:146) ~[apache-cassandra-3.11.1.jar:3.11.1] at org.apache.cassandra.transport.Message$Dispatcher.channelRead0(Message.java:517) [apache-cassandra-3.11.1.jar:3.11.1] at 
org.apache.cassandra.transport.Message$Dispatcher.channelRead0(Message.java:410) [apache-cassandra-3.11.1.jar:3.11.1] at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105) [netty-all-4.0.44.Final.jar:4.0.44.Final] at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:357) [netty-all-4.0.44.Final.jar:4.0.44.Final] at io.netty.channel.AbstractChannelHandlerContext.access$600(AbstractChannelHandlerContext.java:35) [netty-all-4.0.44.Final.jar:4.0.44.Final] at io.netty.channel.AbstractChannelHandlerContext$7.run(AbstractChannelHandlerContext.java:348) [netty-all-4.0.44.Final.jar:4.0.44.Final] at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_131] at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:162) [apache-cassandra-3.11.1.jar:3.11.1] at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:109) [apache-cassandra-3.11.1.jar:3.11.1] at java.lang.Thread.run(Thread.java:748) [na:1.8.0_131] {noformat} It can be recreated with this project: https://github.com/eyeofthefrog/CASSANDRA-13592 I think it's the same root cause, but let me know if I should open another issue. 
> Null Pointer exception at SELECT JSON statement > --- > > Key: CASSANDRA-13592 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13592 > Project: Cassandra > Issue Type: Bug > Components: CQL > Environment: Debian Linux >Reporter: Wyss Philipp >Assignee: ZhaoYang >Priority: Major > Labels: beginner > Fix For: 2.2.11, 3.0.15, 3.11.1, 4.0 > > Attachments: system.log > > > A Null pointer exception appears when the command > {code} > SELECT JSON * FROM examples.basic; > ---MORE--- > message="java.lang.NullPointerException"> > Examples.basic has the following description (DESC examples.basic;): > CREATE TABLE examples.basic ( > key frozen> PRIMARY KEY, > wert text > ) WITH bloom_filter_fp_chance = 0.01 > AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'} > AND comment = '' > AND compaction = {'class': > 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', > 'max_threshold': '32', 'min_threshold': '4'} > AND compression = {'chunk_length_in_kb': '64', 'class': > 'org.apache.cassandra.io.compress.LZ4Compressor'} > AND crc_check_chance = 1.0 > AND dclocal_read_repair_chance = 0.1 > AND default_time_to_live = 0 > AND gc_grace_seconds = 864000 > AND max_index_interval = 2048 > AND
[jira] [Commented] (CASSANDRA-13475) First version of pluggable storage engine API.
[ https://issues.apache.org/jira/browse/CASSANDRA-13475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16236746#comment-16236746 ] Blake Eggleston commented on CASSANDRA-13475: - Let's keep discussion on this jira for the time being. Also, we're just talking about a plan at this point. What do you think of the plan as proposed? Any concerns? Things you think should be added, removed, or reordered? > First version of pluggable storage engine API. > -- > > Key: CASSANDRA-13475 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13475 > Project: Cassandra > Issue Type: Sub-task >Reporter: Dikang Gu >Assignee: Dikang Gu >Priority: Major > > In order to support pluggable storage engine, we need to define a unified > interface/API, which can allow us to plug in different storage engines for > different requirements. > In very high level, the storage engine interface should include APIs to: > 1. Apply update into the engine. > 2. Query data from the engine. > 3. Stream data in/out to/from the engine. > 4. Table operations, like create/drop/truncate a table, etc. > 5. Various stats about the engine. > I create this ticket to start the discussions about the interface. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13475) First version of pluggable storage engine API.
[ https://issues.apache.org/jira/browse/CASSANDRA-13475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16236734#comment-16236734 ] Jason Brown commented on CASSANDRA-13475: - [~dikanggu] please send out a message to the dev@ ML with the link to your quip doc, that way folks who aren't following this ticket (right now) can know where the action is taking place. > First version of pluggable storage engine API. > -- > > Key: CASSANDRA-13475 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13475 > Project: Cassandra > Issue Type: Sub-task >Reporter: Dikang Gu >Assignee: Dikang Gu >Priority: Major > > In order to support pluggable storage engine, we need to define a unified > interface/API, which can allow us to plug in different storage engines for > different requirements. > In very high level, the storage engine interface should include APIs to: > 1. Apply update into the engine. > 2. Query data from the engine. > 3. Stream data in/out to/from the engine. > 4. Table operations, like create/drop/truncate a table, etc. > 5. Various stats about the engine. > I create this ticket to start the discussions about the interface. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13475) First version of pluggable storage engine API.
[ https://issues.apache.org/jira/browse/CASSANDRA-13475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16236710#comment-16236710 ] Dikang Gu commented on CASSANDRA-13475: --- [~bdeggleston], yeah, they are very good points. To have a central place for the discussion, I will try to answer your questions, and add more details to the quip: https://quip.com/bhw5ABUCi3co. Everyone should have access to the quip, and please feel free to edit/comment on it. > First version of pluggable storage engine API. > -- > > Key: CASSANDRA-13475 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13475 > Project: Cassandra > Issue Type: Sub-task >Reporter: Dikang Gu >Assignee: Dikang Gu >Priority: Major > > In order to support pluggable storage engine, we need to define a unified > interface/API, which can allow us to plug in different storage engines for > different requirements. > In very high level, the storage engine interface should include APIs to: > 1. Apply update into the engine. > 2. Query data from the engine. > 3. Stream data in/out to/from the engine. > 4. Table operations, like create/drop/truncate a table, etc. > 5. Various stats about the engine. > I create this ticket to start the discussions about the interface. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13475) First version of pluggable storage engine API.
[ https://issues.apache.org/jira/browse/CASSANDRA-13475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16236648#comment-16236648 ] Blake Eggleston commented on CASSANDRA-13475: - I think it’s too early to start looking at code, or talking about api specifics. We should start by getting a rough plan together. My thoughts on an initial plan are below. This is just a rough idea dump, so let me know if I’ve missed anything. # Discuss expectations, guidelines, non-technical stuff, etc. ** Let’s start off by making sure we’re all on the same page about: *** What we expect the end result to be *** Guidelines on planning / implementing component refactors *** Any approximate timelines you have in mind, if any *** Pluggable storage's place in the cassandra project # Agree on the boundaries of the storage engine layer. What it is and isn’t responsible for. ** This has already been discussed to some degree, but let’s agree on a definition. # Work out a strategy for streaming and repair ** This is a bit hand wavy at the moment, and not having a solid streaming and repair story is a non starter. So let’s figure out how that’s going to work (including incremental repair) before we get too deep into anything else # Decide how schema ui / metadata will be refactored to support multiple storage engines # Work out a strategy for exposing metrics / monitoring from different engines. # Migrate read command and write logic into cfs # Identify remaining leaky parts of CFS class. ** Some of this will be legit storage implementation details. Other parts will be systems we’ve missed, or things that need to be abstracted. # Identify systems not controlled by CFS that interact with the storage layer on their own # Implement streaming / repair changes # Refactor each leaky group of cfs components # Refactor each non-cfs system that interacts with storage layer. 
# Refactor metrics/monitoring systems # Refactor schema ui, metadata implementation # Extract interfaces from CFS and keyspace # Introduce pluggable Keyspace/CFS factories controlled by schema Thoughts? > First version of pluggable storage engine API. > -- > > Key: CASSANDRA-13475 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13475 > Project: Cassandra > Issue Type: Sub-task >Reporter: Dikang Gu >Assignee: Dikang Gu >Priority: Major > > In order to support pluggable storage engine, we need to define a unified > interface/API, which can allow us to plug in different storage engines for > different requirements. > In very high level, the storage engine interface should include APIs to: > 1. Apply update into the engine. > 2. Query data from the engine. > 3. Stream data in/out to/from the engine. > 4. Table operations, like create/drop/truncate a table, etc. > 5. Various stats about the engine. > I create this ticket to start the discussions about the interface. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13948) Reload compaction strategies when JBOD disk boundary changes
[ https://issues.apache.org/jira/browse/CASSANDRA-13948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16236514#comment-16236514 ] Loic Lambiel commented on CASSANDRA-13948: -- Did additional testing and wasn't able to reproduce :-/ I'll try the patch on more representative nodes in the coming days and report back any issue. > Reload compaction strategies when JBOD disk boundary changes > > > Key: CASSANDRA-13948 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13948 > Project: Cassandra > Issue Type: Bug > Components: Compaction >Reporter: Paulo Motta >Assignee: Paulo Motta >Priority: Major > Fix For: 3.11.x, 4.x > > Attachments: debug.log > > > The thread dump below shows a race between an sstable replacement by the > {{IndexSummaryRedistribution}} and > {{AbstractCompactionTask.getNextBackgroundTask}}: > {noformat} > Thread 94580: (state = BLOCKED) > - sun.misc.Unsafe.park(boolean, long) @bci=0 (Compiled frame; information > may be imprecise) > - java.util.concurrent.locks.LockSupport.park(java.lang.Object) @bci=14, > line=175 (Compiled frame) > - > java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt() > @bci=1, line=836 (Compiled frame) > - > java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(java.util.concurrent.locks.AbstractQueuedSynchronizer$Node, > int) @bci=67, line=870 (Compiled frame) > - java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(int) > @bci=17, line=1199 (Compiled frame) > - java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock() @bci=5, > line=943 (Compiled frame) > - > org.apache.cassandra.db.compaction.CompactionStrategyManager.handleListChangedNotification(java.lang.Iterable, > java.lang.Iterable) @bci=359, line=483 (Interpreted frame) > - > org.apache.cassandra.db.compaction.CompactionStrategyManager.handleNotification(org.apache.cassandra.notifications.INotification, > java.lang.Object) @bci=53, line=555 (Interpreted frame) > - > 
org.apache.cassandra.db.lifecycle.Tracker.notifySSTablesChanged(java.util.Collection, > java.util.Collection, org.apache.cassandra.db.compaction.OperationType, > java.lang.Throwable) @bci=50, line=409 (Interpreted frame) > - > org.apache.cassandra.db.lifecycle.LifecycleTransaction.doCommit(java.lang.Throwable) > @bci=157, line=227 (Interpreted frame) > - > org.apache.cassandra.utils.concurrent.Transactional$AbstractTransactional.commit(java.lang.Throwable) > @bci=61, line=116 (Compiled frame) > - > org.apache.cassandra.utils.concurrent.Transactional$AbstractTransactional.commit() > @bci=2, line=200 (Interpreted frame) > - > org.apache.cassandra.utils.concurrent.Transactional$AbstractTransactional.finish() > @bci=5, line=185 (Interpreted frame) > - > org.apache.cassandra.io.sstable.IndexSummaryRedistribution.redistributeSummaries() > @bci=559, line=130 (Interpreted frame) > - > org.apache.cassandra.db.compaction.CompactionManager.runIndexSummaryRedistribution(org.apache.cassandra.io.sstable.IndexSummaryRedistribution) > @bci=9, line=1420 (Interpreted frame) > - > org.apache.cassandra.io.sstable.IndexSummaryManager.redistributeSummaries(org.apache.cassandra.io.sstable.IndexSummaryRedistribution) > @bci=4, line=250 (Interpreted frame) > - > org.apache.cassandra.io.sstable.IndexSummaryManager.redistributeSummaries() > @bci=30, line=228 (Interpreted frame) > - org.apache.cassandra.io.sstable.IndexSummaryManager$1.runMayThrow() > @bci=4, line=125 (Interpreted frame) > - org.apache.cassandra.utils.WrappedRunnable.run() @bci=1, line=28 > (Interpreted frame) > - > org.apache.cassandra.concurrent.DebuggableScheduledThreadPoolExecutor$UncomplainingRunnable.run() > @bci=4, line=118 (Compiled frame) > - java.util.concurrent.Executors$RunnableAdapter.call() @bci=4, line=511 > (Compiled frame) > - java.util.concurrent.FutureTask.runAndReset() @bci=47, line=308 (Compiled > frame) > - > 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask) > @bci=1, line=180 (Compiled frame) > - java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run() > @bci=37, line=294 (Compiled frame) > - > java.util.concurrent.ThreadPoolExecutor.runWorker(java.util.concurrent.ThreadPoolExecutor$Worker) > @bci=95, line=1149 (Compiled frame) > - java.util.concurrent.ThreadPoolExecutor$Worker.run() @bci=5, line=624 > (Interpreted frame) > - > org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(java.lang.Runnable) > @bci=1, line=81 (Interpreted frame) > - org.apache.cassandra.concurrent.NamedThreadFactory$$Lambda$8.run() @bci=4 > (Interpreted frame) > -
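The top frames of the dump show the thread parked inside {{ReentrantReadWriteLock$WriteLock.lock()}}: the compaction-strategy notification handler cannot acquire its write lock while another thread still holds a conflicting read lock. A minimal, self-contained Java sketch of that parking behavior (illustrative only, not Cassandra code — the method names and timings here are assumptions for the demo):

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Sketch: a writer parks inside WriteLock.lock() until every reader has
// released the read lock — the state the blocked thread in the dump is in.
public class WriteLockParks {

    /** A reader holds the read lock for readerHoldMs; returns how long a writer waited. */
    static long measureWriterWaitMs(long readerHoldMs) throws InterruptedException {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        CountDownLatch readerReady = new CountDownLatch(1);

        Thread reader = new Thread(() -> {
            lock.readLock().lock();          // stands in for the competing thread
            readerReady.countDown();
            try {
                Thread.sleep(readerHoldMs);
            } catch (InterruptedException ignored) {
            } finally {
                lock.readLock().unlock();
            }
        });
        reader.start();
        readerReady.await();                 // the reader definitely holds the lock now

        long start = System.nanoTime();
        lock.writeLock().lock();             // parks here, like the thread in the dump
        long waitedMs = (System.nanoTime() - start) / 1_000_000;
        lock.writeLock().unlock();
        reader.join();
        return waitedMs;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("writer waited ~" + measureWriterWaitMs(200) + "ms");
    }
}
```

In the real race the "reader" side is the index summary redistribution committing its transaction, so the wait can last as long as that commit does.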
[jira] [Commented] (CASSANDRA-13988) Add a timeout field to EXECUTE / QUERY / BATCH messages
[ https://issues.apache.org/jira/browse/CASSANDRA-13988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16236307#comment-16236307 ] Aleksey Yeschenko commented on CASSANDRA-13988: --- Pretty sure this is a duplicate of CASSANDRA-2848. > Add a timeout field to EXECUTE / QUERY / BATCH messages > --- > > Key: CASSANDRA-13988 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13988 > Project: Cassandra > Issue Type: Improvement >Reporter: Michaël Figuière >Priority: Minor > > The request timeout at the coordinator level is currently statically > configured through the {{request_timeout_in_ms}} and > {{xxx_request_timeout_in_ms}} parameters in cassandra.yaml. There would be > some benefits in making it possible for the client to dynamically define it > through the CQL Protocol: > * In practice, there's often a misalignment between the timeout configured in > Cassandra and in the client, leading to a non-optimal query execution flow, where > the coordinator continues to work while the client is not waiting anymore, or > where the client waits for too long for a potential response. The 99th > percentile latency can be significantly impacted by such issues. > * While the read timeout is typically statically configured on the Drivers, > on the Java Driver 3.x the developer is free to set a custom timeout using > {{ResultSetFuture#get(long, TimeUnit)}} which can lead to an extra > misalignment of timeouts with the coordinator. The Java Driver 4.x will make > the timeout configurable per query through its new {{DriverConfigProfile}} > abstraction. > * It makes it possible for applications to shift to a "remaining time budget" > approach rather than the often inappropriate static timeout one. Also, the > Java Driver 4.x plans to change its definition of {{readTimeout}} from a per > execution attempt time to an overall query execution time. So the Driver > itself would also be able to work on a "remaining time budget" for each of > its execution attempts. 
[jira] [Updated] (CASSANDRA-12182) redundant StatusLogger print out when both dropped message and long GC event happen
[ https://issues.apache.org/jira/browse/CASSANDRA-12182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michał Szczygieł updated CASSANDRA-12182: - Status: In Progress (was: Patch Available) > redundant StatusLogger print out when both dropped message and long GC event > happen > --- > > Key: CASSANDRA-12182 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12182 > Project: Cassandra > Issue Type: Bug >Reporter: Wei Deng >Assignee: Michał Szczygieł >Priority: Minor > Labels: lhf > Attachments: 12182-trunk.txt > > > I was stress testing a C* 3.0 environment and it appears that when the CPU is > running low, HINT and MUTATION messages will start to get dropped, and the GC > thread can also get some really long-running GC, and I'd get some redundant > log entries in system.log like the following: > {noformat} > WARN [Service Thread] 2016-07-12 22:48:45,748 GCInspector.java:282 - G1 > Young Generation GC in 522ms. G1 Eden Space: 68157440 -> 0; G1 Old Gen: > 3376113224 -> 3468387912; G1 Survivor Space: 24117248 -> 0; > INFO [Service Thread] 2016-07-12 22:48:45,763 StatusLogger.java:52 - Pool > NameActive Pending Completed Blocked All Time > Blocked > INFO [ScheduledTasks:1] 2016-07-12 22:48:45,775 MessagingService.java:983 - > MUTATION messages were dropped in last 5000 ms: 419 for internal timeout and > 0 for cross node timeout > INFO [ScheduledTasks:1] 2016-07-12 22:48:45,776 MessagingService.java:983 - > HINT messages were dropped in last 5000 ms: 89 for internal timeout and 0 for > cross node timeout > INFO [ScheduledTasks:1] 2016-07-12 22:48:45,776 StatusLogger.java:52 - Pool > NameActive Pending Completed Blocked All Time > Blocked > INFO [ScheduledTasks:1] 2016-07-12 22:48:45,798 StatusLogger.java:56 - > MutationStage32 4194 32997234 0 > 0 > INFO [ScheduledTasks:1] 2016-07-12 22:48:45,798 StatusLogger.java:56 - > ViewMutationStage 0 0 0 0 > 0 > INFO [ScheduledTasks:1] 2016-07-12 22:48:45,799 StatusLogger.java:56 - > ReadStage 0 0940 0 > 0 
> INFO [Service Thread] 2016-07-12 22:48:45,800 StatusLogger.java:56 - > MutationStage32 4363 32997333 0 > 0 > INFO [Service Thread] 2016-07-12 22:48:45,801 StatusLogger.java:56 - > ViewMutationStage 0 0 0 0 > 0 > INFO [Service Thread] 2016-07-12 22:48:45,801 StatusLogger.java:56 - > ReadStage 0 0940 0 > 0 > INFO [Service Thread] 2016-07-12 22:48:45,802 StatusLogger.java:56 - > RequestResponseStage 0 0 11094437 0 > 0 > INFO [Service Thread] 2016-07-12 22:48:45,802 StatusLogger.java:56 - > ReadRepairStage 0 0 5 0 > 0 > INFO [ScheduledTasks:1] 2016-07-12 22:48:45,803 StatusLogger.java:56 - > RequestResponseStage 4 0 11094509 0 > 0 > INFO [ScheduledTasks:1] 2016-07-12 22:48:45,807 StatusLogger.java:56 - > ReadRepairStage 0 0 5 0 > 0 > INFO [ScheduledTasks:1] 2016-07-12 22:48:45,808 StatusLogger.java:56 - > CounterMutationStage 0 0 0 0 > 0 > INFO [ScheduledTasks:1] 2016-07-12 22:48:45,809 StatusLogger.java:56 - > MiscStage 0 0 0 0 > 0 > INFO [ScheduledTasks:1] 2016-07-12 22:48:45,809 StatusLogger.java:56 - > CompactionExecutor262 1234 0 > 0 > INFO [ScheduledTasks:1] 2016-07-12 22:48:45,810 StatusLogger.java:56 - > MemtableReclaimMemory 0 0 79 0 > 0 > INFO [ScheduledTasks:1] 2016-07-12 22:48:45,810 StatusLogger.java:56 - > PendingRangeCalculator0 0 3 0 > 0 > INFO [ScheduledTasks:1] 2016-07-12 22:48:45,819 StatusLogger.java:56 - > GossipStage 0 0 5214 0 > 0 > INFO [ScheduledTasks:1] 2016-07-12 22:48:45,820 StatusLogger.java:56 - > SecondaryIndexManagement 0 0 3 0 > 0 > INFO [ScheduledTasks:1] 2016-07-12 22:48:45,820 StatusLogger.java:56 - > HintsDispatcher
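The redundancy in the log above comes from two independent triggers — the GC inspector on the Service Thread and the dropped-message task on ScheduledTasks:1 — each flushing a full {{StatusLogger}} dump within milliseconds of each other. One possible suppression, sketched here purely for illustration (this is not the attached 12182 patch), is a compare-and-set throttle that lets only one trigger win per interval:

```java
import java.util.concurrent.atomic.AtomicLong;

// Illustrative sketch: when several events each want to dump pool status,
// only the first caller within the interval actually logs; the rest are
// suppressed, so concurrent GC and dropped-message triggers print one dump.
public class StatusLogThrottle {
    private final long intervalNanos;
    private final AtomicLong lastLogNanos;

    StatusLogThrottle(long intervalNanos) {
        this.intervalNanos = intervalNanos;
        // Seed in the past so the very first trigger is allowed to log.
        this.lastLogNanos = new AtomicLong(System.nanoTime() - intervalNanos);
    }

    /** Returns true if the caller won the right to log; false if suppressed. */
    boolean tryLog() {
        long now = System.nanoTime();
        long last = lastLogNanos.get();
        // CAS guarantees exactly one winner even under concurrent triggers.
        return now - last >= intervalNanos && lastLogNanos.compareAndSet(last, now);
    }

    public static void main(String[] args) {
        StatusLogThrottle throttle = new StatusLogThrottle(5_000_000_000L); // 5s
        System.out.println(throttle.tryLog()); // first trigger logs: true
        System.out.println(throttle.tryLog()); // second trigger suppressed: false
    }
}
```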
[jira] [Commented] (CASSANDRA-13988) Add a timeout field to EXECUTE / QUERY / BATCH messages
[ https://issues.apache.org/jira/browse/CASSANDRA-13988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16236282#comment-16236282 ] Michaël Figuière commented on CASSANDRA-13988: -- Looking into it, it seems like the {{ReadCommand#getTimeout()}} abstract method offers a convenient opportunity to implement this feature. > Add a timeout field to EXECUTE / QUERY / BATCH messages > --- > > Key: CASSANDRA-13988 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13988 > Project: Cassandra > Issue Type: Improvement >Reporter: Michaël Figuière >Priority: Minor > > The request timeout at the coordinator level is currently statically > configured through the {{request_timeout_in_ms}} and > {{xxx_request_timeout_in_ms}} parameters in cassandra.yaml. There would be > some benefits in making it possible for the client to dynamically define it > through the CQL Protocol: > * In practice, there's often a misalignment between the timeout configured in > Cassandra and in the client, leading to a non-optimal query execution flow, where > the coordinator continues to work while the client is not waiting anymore, or > where the client waits for too long for a potential response. The 99th > percentile latency can be significantly impacted by such issues. > * While the read timeout is typically statically configured on the Drivers, > on the Java Driver 3.x the developer is free to set a custom timeout using > {{ResultSetFuture#get(long, TimeUnit)}} which can lead to an extra > misalignment of timeouts with the coordinator. The Java Driver 4.x will make > the timeout configurable per query through its new {{DriverConfigProfile}} > abstraction. > * It makes it possible for applications to shift to a "remaining time budget" > approach rather than the often inappropriate static timeout one. Also, the > Java Driver 4.x plans to change its definition of {{readTimeout}} from a per > execution attempt time to an overall query execution time. 
So the Driver > itself would also be able to work on a "remaining time budget" for each of > its execution attempts. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Created] (CASSANDRA-13988) Add a timeout field to EXECUTE / QUERY / BATCH messages
Michaël Figuière created CASSANDRA-13988: Summary: Add a timeout field to EXECUTE / QUERY / BATCH messages Key: CASSANDRA-13988 URL: https://issues.apache.org/jira/browse/CASSANDRA-13988 Project: Cassandra Issue Type: Improvement Reporter: Michaël Figuière Priority: Minor The request timeout at the coordinator level is currently statically configured through the {{request_timeout_in_ms}} and {{xxx_request_timeout_in_ms}} parameters in cassandra.yaml. There would be some benefits in making it possible for the client to dynamically define it through the CQL Protocol: * In practice, there's often a misalignment between the timeout configured in Cassandra and in the client, leading to a non-optimal query execution flow, where the coordinator continues to work while the client is not waiting anymore, or where the client waits for too long for a potential response. The 99th percentile latency can be significantly impacted by such issues. * While the read timeout is typically statically configured on the Drivers, on the Java Driver 3.x the developer is free to set a custom timeout using {{ResultSetFuture#get(long, TimeUnit)}} which can lead to an extra misalignment of timeouts with the coordinator. The Java Driver 4.x will make the timeout configurable per query through its new {{DriverConfigProfile}} abstraction. * It makes it possible for applications to shift to a "remaining time budget" approach rather than the often inappropriate static timeout one. Also, the Java Driver 4.x plans to change its definition of {{readTimeout}} from a per execution attempt time to an overall query execution time. So the Driver itself would also be able to work on a "remaining time budget" for each of its execution attempts. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
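The "remaining time budget" approach described in the ticket can be sketched without reference to any real driver API: pin one deadline for the whole query and give each execution attempt only what is left of it. In this hypothetical Java sketch, the {{remaining()}} helper and the millisecond values are illustrative assumptions, not part of any driver:

```java
import java.time.Duration;

// Sketch of a per-query time budget: every retry/speculative attempt is
// bounded by the single overall deadline rather than a fresh static timeout.
public class TimeBudget {

    /** Time left until the overall per-query deadline, floored at zero. */
    static Duration remaining(long deadlineNanos) {
        long left = deadlineNanos - System.nanoTime();
        return left > 0 ? Duration.ofNanos(left) : Duration.ZERO;
    }

    public static void main(String[] args) throws InterruptedException {
        // One deadline covers all execution attempts of the query.
        long deadline = System.nanoTime() + Duration.ofMillis(100).toNanos();

        Duration attempt1 = remaining(deadline);   // budget for the first attempt
        Thread.sleep(60);                          // first attempt consumed ~60ms
        Duration attempt2 = remaining(deadline);   // later attempts get only what's left

        System.out.println("budget shrank: " + (attempt2.compareTo(attempt1) < 0));
        Thread.sleep(60);                          // total elapsed now exceeds 100ms
        System.out.println("budget exhausted: " + remaining(deadline).isZero());
    }
}
```

Sending the current {{remaining()}} value as the proposed timeout field would let the coordinator stop working as soon as the client has given up.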
[2/3] cassandra git commit: ninja-fix comment to correct the default RING_DEALY value
ninja-fix comment to correct the default RING_DEALY value Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/c8a3b58b Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/c8a3b58b Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/c8a3b58b Branch: refs/heads/trunk Commit: c8a3b58bdbf12909ac0a823308e8a278cd02001b Parents: ea443df Author: Jason Brown Authored: Thu Nov 2 10:46:24 2017 -0700 Committer: Jason Brown Committed: Thu Nov 2 10:46:24 2017 -0700 -- conf/jvm.options | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/c8a3b58b/conf/jvm.options -- diff --git a/conf/jvm.options b/conf/jvm.options index f91466a..bfe2da9 100644 --- a/conf/jvm.options +++ b/conf/jvm.options @@ -49,7 +49,7 @@ # Allow restoring specific tables from an archived commit log. #-Dcassandra.replayList=table -# Allows overriding of the default RING_DELAY (1000ms), which is the amount of time a node waits +# Allows overriding of the default RING_DELAY (30000ms), which is the amount of time a node waits # before joining the ring. #-Dcassandra.ring_delay_ms=ms - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[1/3] cassandra git commit: ninja-fix comment to correct the default RING_DEALY value
Repository: cassandra Updated Branches: refs/heads/cassandra-3.11 ea443dfe3 -> c8a3b58bd refs/heads/trunk 684e250ba -> 87962dcf3 ninja-fix comment to correct the default RING_DEALY value Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/c8a3b58b Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/c8a3b58b Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/c8a3b58b Branch: refs/heads/cassandra-3.11 Commit: c8a3b58bdbf12909ac0a823308e8a278cd02001b Parents: ea443df Author: Jason Brown Authored: Thu Nov 2 10:46:24 2017 -0700 Committer: Jason Brown Committed: Thu Nov 2 10:46:24 2017 -0700 -- conf/jvm.options | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/c8a3b58b/conf/jvm.options -- diff --git a/conf/jvm.options b/conf/jvm.options index f91466a..bfe2da9 100644 --- a/conf/jvm.options +++ b/conf/jvm.options @@ -49,7 +49,7 @@ # Allow restoring specific tables from an archived commit log. #-Dcassandra.replayList=table -# Allows overriding of the default RING_DELAY (1000ms), which is the amount of time a node waits +# Allows overriding of the default RING_DELAY (30000ms), which is the amount of time a node waits # before joining the ring. #-Dcassandra.ring_delay_ms=ms - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[3/3] cassandra git commit: Merge branch 'cassandra-3.11' into trunk
Merge branch 'cassandra-3.11' into trunk Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/87962dcf Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/87962dcf Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/87962dcf Branch: refs/heads/trunk Commit: 87962dcf364944f656b5212b8418432fbd1c4b95 Parents: 684e250 c8a3b58 Author: Jason BrownAuthored: Thu Nov 2 10:46:42 2017 -0700 Committer: Jason Brown Committed: Thu Nov 2 10:47:17 2017 -0700 -- conf/jvm.options | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/87962dcf/conf/jvm.options -- - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13973) IllegalArgumentException in upgradesstables compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-13973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16236142#comment-16236142 ] Dan Kinder commented on CASSANDRA-13973: Thanks [~jjirsa] I'll give it a shot. > IllegalArgumentException in upgradesstables compaction > -- > > Key: CASSANDRA-13973 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13973 > Project: Cassandra > Issue Type: Bug > Components: Compaction >Reporter: Dan Kinder >Assignee: Jeff Jirsa >Priority: Major > Fix For: 3.0.x, 3.11.x, 4.x > > > After an upgrade from 2.2.6 to 3.0.15 (sstable version la to mc), when I try > to run upgradesstables, most of them upgrade fine but I see the exception > below on several nodes, and it doesn't complete. > CASSANDRA-12717 looks similar but the stack trace is not the same, so I > assumed it is not identical. The various nodes this happens on all give the > same trace. > Might be notable that this is an analytics cluster with some large > partitions, in the GB size. 
> {noformat} > error: Out of range: 7316844981 > -- StackTrace -- > java.lang.IllegalArgumentException: Out of range: 7316844981 > at com.google.common.primitives.Ints.checkedCast(Ints.java:91) > at > org.apache.cassandra.db.RowIndexEntry$IndexedEntry.promotedSize(RowIndexEntry.java:329) > at > org.apache.cassandra.db.RowIndexEntry$Serializer.serialize(RowIndexEntry.java:133) > at > org.apache.cassandra.io.sstable.format.big.BigTableWriter$IndexWriter.append(BigTableWriter.java:409) > at > org.apache.cassandra.io.sstable.format.big.BigTableWriter.afterAppend(BigTableWriter.java:120) > at > org.apache.cassandra.io.sstable.format.big.BigTableWriter.append(BigTableWriter.java:157) > at > org.apache.cassandra.io.sstable.SSTableRewriter.append(SSTableRewriter.java:125) > at > org.apache.cassandra.db.compaction.writers.MaxSSTableSizeWriter.realAppend(MaxSSTableSizeWriter.java:88) > at > org.apache.cassandra.db.compaction.writers.CompactionAwareWriter.append(CompactionAwareWriter.java:109) > at > org.apache.cassandra.db.compaction.CompactionTask.runMayThrow(CompactionTask.java:195) > at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) > at > org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:89) > at > org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:61) > at > org.apache.cassandra.db.compaction.CompactionManager$5.execute(CompactionManager.java:424) > at > org.apache.cassandra.db.compaction.CompactionManager$2.call(CompactionManager.java:311) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at > org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:79) > at java.lang.Thread.run(Thread.java:748) > {noformat} -- This message was sent 
by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
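The {{Out of range: 7316844981}} error above is a checked long-to-int narrowing failing: the serialized size of the promoted row index for a multi-GB partition (~7.3 GB here) no longer fits in a signed 32-bit int. A minimal stand-in for such a checked cast — analogous to, but a reimplementation of, Guava's {{Ints.checkedCast}} used in {{RowIndexEntry.promotedSize}} — shows the mechanics:

```java
// Sketch of a checked long->int narrowing: casting truncates, so compare the
// round-tripped value and fail loudly instead of silently corrupting the size.
public class CheckedCast {

    static int checkedCast(long value) {
        int result = (int) value;
        if (result != value) {
            // Mirrors the "Out of range: 7316844981" failure in the stack trace.
            throw new IllegalArgumentException("Out of range: " + value);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(checkedCast(42L));        // fits in an int, prints 42
        try {
            checkedCast(7_316_844_981L);             // the value from the report
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());      // prints "Out of range: 7316844981"
        }
    }
}
```

This is why lowering {{column_index_size_in_kb}} works around the crash: fewer, larger index blocks per entry shrink the promoted index size back under {{Integer.MAX_VALUE}}.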
[jira] [Comment Edited] (CASSANDRA-13973) IllegalArgumentException in upgradesstables compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-13973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16236109#comment-16236109 ] Jeff Jirsa edited comment on CASSANDRA-13973 at 11/2/17 4:52 PM: - Thanks for the feedback [~slebresne]. I really appreciate you taking the time to respond. The risky feeling here is why I'm moving slow on this myself - it seems straightforward, but obviously the potential for unexpected surprises here is pretty high. Better messaging on that assert makes a lot of sense. Also good to see the second confirmation that changing {{column_index_size_in_kb}} is a good workaround, the drawback is that it's instance-wide, so if you have just a handful of wide rows (like this user, the histogram shows their 99% size is less than 1MB, but their max size is 394GB), you suffer a disk penalty on all keyspaces/tables/rows in order to not crash on the one bad row. [~dankinder] if you need to unblock yourself right now, changing {{column_index_size_in_kb}} on your instance to 256 (7G of index data needs to go under 2G in size, so multiplying factor of 4) PROBABLY works past this issue, but expect a bit more disk IO (particularly reads) after the change (+upgradesstables) was (Author: jjirsa): Thanks for the feedback [~slebresne]. I really appreciate you taking the time to respond. The risky feeling here is why I'm moving slow on this myself - it seems straightforward, but obviously the potential for unexpected surprises here is pretty high. Better messaging on that assert makes a lot of sense. Also good to see the second confirmation that changing {{column_index_size_in_kb}} is a good workaround, the drawback is that it's instance-wide, so if you have just a handful of wide rows (like this user, the histogram shows their 99% size is less than 1MB, but their max size is 394GB), you suffer a disk penalty on all keyspaces/tables/rows in order to not crash on the one bad row. 
[~dankinder] if you need to unblock yourself right now, changing {{column_index_size_in_kb}} on your instance to 256k (7G of index data needs to go under 2G in size, so multiplying factor of 4) PROBABLY works past this issue, but expect a bit more disk IO (particularly reads) after the change (+upgradesstables) > IllegalArgumentException in upgradesstables compaction > -- > > Key: CASSANDRA-13973 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13973 > Project: Cassandra > Issue Type: Bug > Components: Compaction >Reporter: Dan Kinder >Assignee: Jeff Jirsa >Priority: Major > Fix For: 3.0.x, 3.11.x, 4.x > > > After an upgrade from 2.2.6 to 3.0.15 (sstable version la to mc), when I try > to run upgradesstables, most of them upgrade fine but I see the exception > below on several nodes, and it doesn't complete. > CASSANDRA-12717 looks similar but the stack trace is not the same, so I > assumed it is not identical. The various nodes this happens on all give the > same trace. > Might be notable that this is an analytics cluster with some large > partitions, in the GB size. 
> {noformat} > error: Out of range: 7316844981 > -- StackTrace -- > java.lang.IllegalArgumentException: Out of range: 7316844981 > at com.google.common.primitives.Ints.checkedCast(Ints.java:91) > at > org.apache.cassandra.db.RowIndexEntry$IndexedEntry.promotedSize(RowIndexEntry.java:329) > at > org.apache.cassandra.db.RowIndexEntry$Serializer.serialize(RowIndexEntry.java:133) > at > org.apache.cassandra.io.sstable.format.big.BigTableWriter$IndexWriter.append(BigTableWriter.java:409) > at > org.apache.cassandra.io.sstable.format.big.BigTableWriter.afterAppend(BigTableWriter.java:120) > at > org.apache.cassandra.io.sstable.format.big.BigTableWriter.append(BigTableWriter.java:157) > at > org.apache.cassandra.io.sstable.SSTableRewriter.append(SSTableRewriter.java:125) > at > org.apache.cassandra.db.compaction.writers.MaxSSTableSizeWriter.realAppend(MaxSSTableSizeWriter.java:88) > at > org.apache.cassandra.db.compaction.writers.CompactionAwareWriter.append(CompactionAwareWriter.java:109) > at > org.apache.cassandra.db.compaction.CompactionTask.runMayThrow(CompactionTask.java:195) > at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) > at > org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:89) > at > org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:61) > at > org.apache.cassandra.db.compaction.CompactionManager$5.execute(CompactionManager.java:424) > at > org.apache.cassandra.db.compaction.CompactionManager$2.call(CompactionManager.java:311) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at > org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:79) > at java.lang.Thread.run(Thread.java:748) > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
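The {{Out of range: 7316844981}} failure above is Guava's {{Ints.checkedCast}} rejecting a promoted index size that no longer fits in a signed 32-bit int. A minimal self-contained sketch of that check (the class name is hypothetical and the cast is reimplemented here, not Guava's source) also shows the factor-of-4 arithmetic behind the {{column_index_size_in_kb}} workaround suggested in the comment:

```java
// Sketch of Guava's Ints.checkedCast behaviour, reimplemented so the
// example is self-contained; CheckedCastSketch is a hypothetical name.
public class CheckedCastSketch
{
    static int checkedCast(long value)
    {
        int result = (int) value;
        if (result != value)
            throw new IllegalArgumentException("Out of range: " + value);
        return result;
    }

    public static void main(String[] args)
    {
        long promotedSize = 7316844981L; // the value from the stack trace, ~7G
        System.out.println(promotedSize > Integer.MAX_VALUE); // true: ~7G > ~2.1G
        // Quadrupling column_index_size_in_kb (64 -> 256) cuts the number of
        // index entries roughly 4x, bringing ~7G of index data under the ~2G
        // int limit -- the "multiplying factor of 4" mentioned above.
        try
        {
            checkedCast(promotedSize);
        }
        catch (IllegalArgumentException e)
        {
            System.out.println(e.getMessage()); // Out of range: 7316844981
        }
    }
}
```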
[jira] [Commented] (CASSANDRA-13849) GossipStage blocks because of race in ActiveRepairService
[ https://issues.apache.org/jira/browse/CASSANDRA-13849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16236068#comment-16236068 ] Blake Eggleston commented on CASSANDRA-13849: - Patch looks good. I've merged it up through trunk and started tests here: |[3.0|https://github.com/bdeggleston/cassandra/tree/13849-3.0] | [utests|https://circleci.com/gh/bdeggleston/cassandra/152] | [dtests|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/408/]| |[3.11|https://github.com/bdeggleston/cassandra/tree/13849-3.11] | [utests|https://circleci.com/gh/bdeggleston/cassandra/153] | [dtests|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/409/]| |[trunk|https://github.com/bdeggleston/cassandra/tree/13849-trunk] | [utests|https://circleci.com/gh/bdeggleston/cassandra/154] | [dtests|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/410/] | I'll commit once the tests are complete, assuming there aren't any problems. > GossipStage blocks because of race in ActiveRepairService > - > > Key: CASSANDRA-13849 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13849 > Project: Cassandra > Issue Type: Bug >Reporter: Tom van der Woerdt >Assignee: Sergey Lapukhov >Priority: Major > Labels: patch > Fix For: 3.0.x, 3.11.x > > Attachments: CAS-13849.patch, CAS-13849_2.patch, CAS-13849_3.patch > > > Bad luck caused a kernel panic in a cluster, and that took another node with > it because GossipStage stopped responding. 
> I think it's pretty obvious what's happening, here are the relevant excerpts > from the stack traces : > {noformat} > "Thread-24004" #393781 daemon prio=5 os_prio=0 tid=0x7efca9647400 > nid=0xe75c waiting on condition [0x7efaa47fe000] >java.lang.Thread.State: TIMED_WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0x00052b63a7e8> (a > java.util.concurrent.CountDownLatch$Sync) > at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328) > at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277) > at > org.apache.cassandra.service.ActiveRepairService.prepareForRepair(ActiveRepairService.java:332) > - locked <0x0002e6bc99f0> (a > org.apache.cassandra.service.ActiveRepairService) > at > org.apache.cassandra.repair.RepairRunnable.runMayThrow(RepairRunnable.java:211) > at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > > at > java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:79) > at > org.apache.cassandra.concurrent.NamedThreadFactory$$Lambda$3/1498438472.run(Unknown > Source) > at java.lang.Thread.run(Thread.java:748) > "GossipTasks:1" #367 daemon prio=5 os_prio=0 tid=0x7efc5e971000 > nid=0x700b waiting for monitor entry [0x7dfb839fe000] >java.lang.Thread.State: BLOCKED (on object monitor) > at > org.apache.cassandra.service.ActiveRepairService.removeParentRepairSession(ActiveRepairService.java:421) > - waiting to lock <0x0002e6bc99f0> (a > org.apache.cassandra.service.ActiveRepairService) > at > 
org.apache.cassandra.service.ActiveRepairService.convict(ActiveRepairService.java:776) > at > org.apache.cassandra.gms.FailureDetector.interpret(FailureDetector.java:306) > at org.apache.cassandra.gms.Gossiper.doStatusCheck(Gossiper.java:775) > > at > org.apache.cassandra.gms.Gossiper.access$800(Gossiper.java:67) > at org.apache.cassandra.gms.Gossiper$GossipTask.run(Gossiper.java:187) > at > org.apache.cassandra.concurrent.DebuggableScheduledThreadPoolExecutor$UncomplainingRunnable.run(DebuggableScheduledThreadPoolExecutor.java:118) > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) > at >
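The two traces above show the blocking pattern: the repair thread parks on a {{CountDownLatch}} inside a {{synchronized}} method of {{ActiveRepairService}} while still holding the object monitor, so the gossip thread blocks on that same monitor in {{convict}} -> {{removeParentRepairSession}}. A self-contained sketch of the pattern (class and method bodies are illustrative stand-ins, not Cassandra's actual code):

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for ActiveRepairService showing why GossipStage blocks.
public class RepairDeadlockSketch
{
    private final CountDownLatch prepareLatch = new CountDownLatch(1);

    // Models prepareForRepair: awaits the latch while holding "this".
    // CountDownLatch.await does NOT release the object monitor.
    synchronized boolean prepareForRepair(long timeoutMs) throws InterruptedException
    {
        return prepareLatch.await(timeoutMs, TimeUnit.MILLISECONDS);
    }

    // Models removeParentRepairSession, called from the gossip stage on
    // convict(): must wait for the monitor held by prepareForRepair.
    synchronized void removeParentRepairSession()
    {
    }

    public static void main(String[] args) throws Exception
    {
        RepairDeadlockSketch svc = new RepairDeadlockSketch();
        Thread repair = new Thread(() -> {
            try { svc.prepareForRepair(500); } catch (InterruptedException ignored) { }
        });
        repair.start();
        Thread.sleep(100); // let the repair thread take the monitor and park
        Thread gossip = new Thread(svc::removeParentRepairSession);
        gossip.start();
        Thread.sleep(100);
        // Typically BLOCKED: gossip is stuck on the monitor until the await times out.
        System.out.println("gossip thread state: " + gossip.getState());
        repair.join();
        gossip.join();
    }
}
```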
[jira] [Resolved] (CASSANDRA-13885) Allow to run full repairs in 3.0 without additional cost of anti-compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-13885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Blake Eggleston resolved CASSANDRA-13885. - Resolution: Won't Fix > Allow to run full repairs in 3.0 without additional cost of anti-compaction > --- > > Key: CASSANDRA-13885 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13885 > Project: Cassandra > Issue Type: Bug >Reporter: Thomas Steinmaurer >Priority: Major > > This ticket is basically the result of the discussion in Cassandra user list: > https://www.mail-archive.com/user@cassandra.apache.org/msg53562.html > I was asked to open a ticket by Paulo Motta to think about back-porting > running full repairs without the additional cost of anti-compaction. > Basically there is no way in 3.0 to run full repairs from several nodes > concurrently without troubles caused by (overlapping?) anti-compactions. > Coming from 2.1 this is a major change from an operational POV, basically > breaking any e.g. cron job based solution kicking off -pr based repairs on > several nodes concurrently. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-13975) Add a workaround for overly large read repair mutations
[ https://issues.apache.org/jira/browse/CASSANDRA-13975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16235982#comment-16235982 ] Aleksey Yeschenko commented on CASSANDRA-13975: --- A straightforward change is pushed [here|https://github.com/iamaleksey/cassandra/commits/13975-3.0]. Unit test run [here|https://circleci.com/gh/iamaleksey/cassandra/63], dtest run [here|https://builds.apache.org/job/Cassandra-devbranch-dtest/407/]. > Add a workaround for overly large read repair mutations > --- > > Key: CASSANDRA-13975 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13975 > Project: Cassandra > Issue Type: Bug > Components: Coordination >Reporter: Aleksey Yeschenko >Assignee: Aleksey Yeschenko >Priority: Major > Fix For: 3.0.x, 3.11.x > > > It's currently possible for {{DataResolver}} to accumulate more changes to > read repair than would fit in a single serialized mutation. If that happens, > the node receiving the mutation would fail, the read would time out, and > reads won't be able to proceed until the operator runs repair or manually drops the > affected partitions. > Ideally we should either read repair iteratively, or at least split the > resulting mutation into smaller chunks in the end. In the meantime, for > 3.0.x, I suggest we add logging to catch this, and a -D flag to allow > proceeding with the requests as is when the mutation is too large, without > read repair. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
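The "split the resulting mutation into smaller chunks" option from the ticket can be sketched generically as size-bounded partitioning of update payloads. All names and the byte-size model here are hypothetical, not Cassandra's API:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: partition a list of per-update serialized sizes into
// chunks so that no chunk exceeds the maximum mutation size.
public class ChunkSketch
{
    static List<List<Integer>> splitBySize(List<Integer> updateSizes, int maxBytes)
    {
        List<List<Integer>> chunks = new ArrayList<>();
        List<Integer> current = new ArrayList<>();
        int currentBytes = 0;
        for (int size : updateSizes)
        {
            // Start a new chunk when adding this update would exceed the cap.
            if (!current.isEmpty() && currentBytes + size > maxBytes)
            {
                chunks.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(size);
            currentBytes += size;
        }
        if (!current.isEmpty())
            chunks.add(current);
        return chunks;
    }

    public static void main(String[] args)
    {
        // With a cap of 8 "bytes", updates of sizes 4,5,3,7,2 split into
        // four chunks, none exceeding the cap.
        System.out.println(splitBySize(List.of(4, 5, 3, 7, 2), 8)); // [[4], [5, 3], [7], [2]]
    }
}
```

A single update larger than the cap would still form its own over-sized chunk, which is why the ticket also proposes logging plus a -D escape hatch rather than splitting alone.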
[jira] [Updated] (CASSANDRA-13975) Add a workaround for overly large read repair mutations
[ https://issues.apache.org/jira/browse/CASSANDRA-13975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksey Yeschenko updated CASSANDRA-13975: -- Status: Patch Available (was: In Progress) > Add a workaround for overly large read repair mutations > --- > > Key: CASSANDRA-13975 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13975 > Project: Cassandra > Issue Type: Bug > Components: Coordination >Reporter: Aleksey Yeschenko >Assignee: Aleksey Yeschenko >Priority: Major > Fix For: 3.0.x, 3.11.x > > > It's currently possible for {{DataResolver}} to accumulate more changes to > read repair that would fit in a single serialized mutation. If that happens, > the node receiving the mutation would fail, and the read would time out, and > won't be able to proceed until the operator runs repair or manually drops the > affected partitions. > Ideally we should either read repair iteratively, or at least split the > resulting mutation into smaller chunks in the end. In the meantime, for > 3.0.x, I suggest we add logging to catch this, and a -D flag to allow > proceeding with the requests as is when the mutation is too large, without > read repair. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-13975) Add a workaround for overly large read repair mutations
[ https://issues.apache.org/jira/browse/CASSANDRA-13975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksey Yeschenko updated CASSANDRA-13975: -- Reviewer: Sam Tunnicliffe > Add a workaround for overly large read repair mutations > --- > > Key: CASSANDRA-13975 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13975 > Project: Cassandra > Issue Type: Bug > Components: Coordination >Reporter: Aleksey Yeschenko >Assignee: Aleksey Yeschenko >Priority: Major > Fix For: 3.0.x, 3.11.x > > > It's currently possible for {{DataResolver}} to accumulate more changes to > read repair that would fit in a single serialized mutation. If that happens, > the node receiving the mutation would fail, and the read would time out, and > won't be able to proceed until the operator runs repair or manually drops the > affected partitions. > Ideally we should either read repair iteratively, or at least split the > resulting mutation into smaller chunks in the end. In the meantime, for > 3.0.x, I suggest we add logging to catch this, and a -D flag to allow > proceeding with the requests as is when the mutation is too large, without > read repair. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-13975) Add a workaround for overly large read repair mutations
[ https://issues.apache.org/jira/browse/CASSANDRA-13975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksey Yeschenko updated CASSANDRA-13975: -- Summary: Add a workaround for overly large read repair mutations (was: TBD) > Add a workaround for overly large read repair mutations > --- > > Key: CASSANDRA-13975 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13975 > Project: Cassandra > Issue Type: Bug > Components: Coordination >Reporter: Aleksey Yeschenko >Assignee: Aleksey Yeschenko >Priority: Major > Fix For: 3.0.x, 3.11.x > > > It's currently possible for {{DataResolver}} to accumulate more changes to > read repair that would fit in a single serialized mutation. If that happens, > the node receiving the mutation would fail, and the read would time out, and > won't be able to proceed until the operator runs repair or manually drops the > affected partitions. > Ideally we should either read repair iteratively, or at least split the > resulting mutation into smaller chunks in the end. In the meantime, for > 3.0.x, I suggest we add logging to catch this, and a -D flag to allow > proceeding with the requests as is when the mutation is too large, without > read repair. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-13975) TBD
[ https://issues.apache.org/jira/browse/CASSANDRA-13975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksey Yeschenko updated CASSANDRA-13975: -- Description: It's currently possible for {{DataResolver}} to accumulate more changes to read repair that would fit in a single serialized mutation. If that happens, the node receiving the mutation would fail, and the read would time out, and won't be able to proceed until the operator runs repair or manually drops the affected partitions. Ideally we should either read repair iteratively, or at least split the resulting mutation into smaller chunks in the end. In the meantime, for 3.0.x, I suggest we add logging to catch this, and a -D flag to allow proceeding with the requests as is when the mutation is too large, without read repair. was:TBD > TBD > --- > > Key: CASSANDRA-13975 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13975 > Project: Cassandra > Issue Type: Bug > Components: Coordination >Reporter: Aleksey Yeschenko >Assignee: Aleksey Yeschenko >Priority: Major > Fix For: 3.0.x, 3.11.x > > > It's currently possible for {{DataResolver}} to accumulate more changes to > read repair that would fit in a single serialized mutation. If that happens, > the node receiving the mutation would fail, and the read would time out, and > won't be able to proceed until the operator runs repair or manually drops the > affected partitions. > Ideally we should either read repair iteratively, or at least split the > resulting mutation into smaller chunks in the end. In the meantime, for > 3.0.x, I suggest we add logging to catch this, and a -D flag to allow > proceeding with the requests as is when the mutation is too large, without > read repair. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Resolved] (CASSANDRA-13982) Refactoring to specialised functional interfaces
[ https://issues.apache.org/jira/browse/CASSANDRA-13982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Brown resolved CASSANDRA-13982. - Resolution: Fixed Fix Version/s: (was: 4.x) 4.0 Some dtests are failing on other, unrelated branches, so I do not think any new failure was introduced by this patch. Thus, I'm +1, and committed as sha {{684e250ba6e5b5bd1c246ceac332a91b2dc90859}}. Thanks! > Refactoring to specialised functional interfaces > > > Key: CASSANDRA-13982 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13982 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Ameya Ketkar >Assignee: Ameya Ketkar >Priority: Minor > Labels: static-analysis > Fix For: 4.0 > > Original Estimate: 24h > Remaining Estimate: 24h > > Using the specialised functional interfaces provided by the JDK reduces the > autoboxing overhead. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
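The refactoring pattern behind this ticket is replacing generic functional interfaces that box their primitive results with the specialised JDK counterparts, as the {{AuthorizationProxy}} hunk in the commit that follows does ({{Function<RoleResource, Boolean>}} -> {{Predicate}}, {{Supplier<Boolean>}} -> {{BooleanSupplier}}). A minimal illustration with a generic string example:

```java
import java.util.function.BooleanSupplier;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.function.Supplier;

public class SpecialisedInterfaceSketch
{
    public static void main(String[] args)
    {
        // Before: each invocation yields a Boolean wrapper object.
        Function<String, Boolean> boxed = String::isEmpty;
        Supplier<Boolean> boxedFlag = () -> true;

        // After: primitive-returning specialisations, no autoboxing on the
        // return path.
        Predicate<String> primitive = String::isEmpty;
        BooleanSupplier primitiveFlag = () -> true;

        System.out.println(boxed.apply(""));    // true (as a boxed Boolean)
        System.out.println(primitive.test("")); // true (as a primitive boolean)
        System.out.println(boxedFlag.get() == primitiveFlag.getAsBoolean()); // true
    }
}
```

Because {{Boolean.valueOf}} interns {{TRUE}}/{{FALSE}}, the win for booleans is mostly avoiding unbox checks; for {{Integer}}/{{Long}}-returning lambdas replaced by {{ToIntFunction}}/{{ToLongFunction}} the allocation savings are more direct.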
cassandra git commit: Refactoring to specialised functional interfaces
Repository: cassandra Updated Branches: refs/heads/trunk 3fe31ffdd -> 684e250ba Refactoring to specialised functional interfaces patch by Ameya Ketkar; reviewed by jasobrown for CASSANDRA-13982 Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/684e250b Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/684e250b Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/684e250b Branch: refs/heads/trunk Commit: 684e250ba6e5b5bd1c246ceac332a91b2dc90859 Parents: 3fe31ff Author: ameya Authored: Sat Oct 28 16:50:24 2017 -0700 Committer: Jason Brown Committed: Thu Nov 2 06:44:48 2017 -0700 -- CHANGES.txt | 1 + .../cassandra/auth/jmx/AuthorizationProxy.java | 15 ++-- .../org/apache/cassandra/db/Directories.java| 5 +- .../org/apache/cassandra/db/ReadCommand.java| 3 +- .../db/compaction/CompactionController.java | 3 +- .../db/compaction/CompactionIterator.java | 6 +- .../db/compaction/CompactionManager.java| 3 +- .../db/compaction/SSTableSplitter.java | 3 +- .../cassandra/db/compaction/Upgrader.java | 3 +- .../cassandra/db/compaction/Verifier.java | 3 +- .../db/lifecycle/LifecycleTransaction.java | 4 +- .../db/lifecycle/LogAwareFileLister.java| 8 +-- .../cassandra/db/partitions/PurgeFunction.java | 3 +- .../cassandra/hints/HintsDispatchExecutor.java | 8 +-- .../compress/CompressedInputStream.java | 8 +-- .../cassandra/tools/SSTableMetadataViewer.java | 8 +-- .../cassandra/tools/StandaloneSSTableUtil.java | 3 +- src/java/org/apache/cassandra/tools/Util.java | 18 ++--- .../test/microbench/AutoBoxingBench.java| 74 .../auth/jmx/AuthorizationProxyTest.java| 21 +++--- .../db/compaction/CompactionControllerTest.java | 3 +- .../rows/UnfilteredRowIteratorsMergeTest.java | 10 +-- .../db/rows/UnfilteredRowsGenerator.java| 8 +-- .../service/NativeTransportServiceTest.java | 7 +- 24 files changed, 157 insertions(+), 71 deletions(-) -- 
http://git-wip-us.apache.org/repos/asf/cassandra/blob/684e250b/CHANGES.txt -- diff --git a/CHANGES.txt b/CHANGES.txt index 6c3eb53..71f4b1d 100644 --- a/CHANGES.txt +++ b/CHANGES.txt @@ -1,4 +1,5 @@ 4.0 + * Refactoring to specialised functional interfaces (CASSANDRA-13982) * Speculative retry should allow more friendly params (CASSANDRA-13876) * Throw exception if we send/receive repair messages to incompatible nodes (CASSANDRA-13944) * Replace usages of MessageDigest with Guava's Hasher (CASSANDRA-13291) http://git-wip-us.apache.org/repos/asf/cassandra/blob/684e250b/src/java/org/apache/cassandra/auth/jmx/AuthorizationProxy.java -- diff --git a/src/java/org/apache/cassandra/auth/jmx/AuthorizationProxy.java b/src/java/org/apache/cassandra/auth/jmx/AuthorizationProxy.java index 1d8f462..d9b63c6 100644 --- a/src/java/org/apache/cassandra/auth/jmx/AuthorizationProxy.java +++ b/src/java/org/apache/cassandra/auth/jmx/AuthorizationProxy.java @@ -23,8 +23,9 @@ import java.security.AccessControlContext; import java.security.AccessController; import java.security.Principal; import java.util.Set; +import java.util.function.BooleanSupplier; import java.util.function.Function; -import java.util.function.Supplier; +import java.util.function.Predicate; import java.util.stream.Collectors; import javax.management.MBeanServer; import javax.management.MalformedObjectNameException; @@ -110,7 +111,7 @@ public class AuthorizationProxy implements InvocationHandler Used to check whether the Role associated with the authenticated Subject has superuser status. By default, just delegates to Roles::hasSuperuserStatus, but can be overridden for testing. */ -protected Function isSuperuser = Roles::hasSuperuserStatus; +protected Predicate isSuperuser = Roles::hasSuperuserStatus; /* Used to retrieve the set of all permissions granted to a given role. 
By default, this fetches @@ -123,7 +124,7 @@ public class AuthorizationProxy implements InvocationHandler Used to decide whether authorization is enabled or not, usually this depends on the configured IAuthorizer, but can be overridden for testing. */ -protected Supplier isAuthzRequired = () -> DatabaseDescriptor.getAuthorizer().requireAuthorization(); +protected BooleanSupplier isAuthzRequired = () -> DatabaseDescriptor.getAuthorizer().requireAuthorization(); /* Used to find matching MBeans when the invocation target is a pattern type ObjectName. @@
[jira] [Commented] (CASSANDRA-13987) Multithreaded commitlog subtly changed durability
[ https://issues.apache.org/jira/browse/CASSANDRA-13987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16235731#comment-16235731 ] Ariel Weisberg commented on CASSANDRA-13987: bq. but the ordering in the sidekick entries are not guaranteed to be in the same order as the commit log's entries. Just a heads up: they would be. You would increment the offsets atomically using a CAS of two 4-byte values packed into one 8-byte value. > Multithreaded commitlog subtly changed durability > - > > Key: CASSANDRA-13987 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13987 > Project: Cassandra > Issue Type: Improvement >Reporter: Jason Brown >Assignee: Jason Brown >Priority: Major > Fix For: 4.x > > > When multithreaded commitlog was introduced in CASSANDRA-3578, we subtly > changed the way that commitlog durability worked. Everything still gets > written to an mmap file. However, not everything is replayable from the > mmaped file after a process crash, in periodic mode. > In brief, the reason this changed is due to the chained markers that are > required for the multithreaded commit log. At each msync, we wait for > outstanding mutations to serialize into the commitlog, and update a marker > before and after the commits that have accumulated since the last sync. With > those markers, we can safely replay that section of the commitlog. Without > the markers, we have no guarantee that the commits in that section were > successfully written, thus we abandon those commits on replay. > If you have correlated process failures of multiple nodes at "nearly" the > same time (see ["There Is No > Now"|http://queue.acm.org/detail.cfm?id=2745385]), it is possible to have > data loss if none of the nodes msync the commitlog. 
For example, with RF=3, > if a quorum write succeeds on two nodes (and we acknowledge the write back to > the client), and then the process on both nodes OOMs (say, due to reading the > index for a 100GB partition), the write will be lost if neither process > msync'ed the commitlog. More exactly, the commitlog cannot be fully replayed. > The reason why this data is silently lost is due to the chained markers that > were introduced with CASSANDRA-3578. > The problem we are addressing with this ticket is incrementally improving > 'durability' due to process crash, not host crash. (Note: operators should > use batch mode to ensure greater durability, but batch mode in its current > implementation is a) borked, and b) will burn through, *very* rapidly, SSDs > that don't have a non-volatile write cache sitting in front.) > The current default for {{commitlog_sync_period_in_ms}} is 10 seconds, which > means that a node could lose up to ten seconds of data due to process crash. > The unfortunate thing is that the data is still available, in the mmap file, > but we can't replay it due to incomplete chained markers. > ftr, I don't believe we've ever had a stated policy about commitlog > durability wrt process crash. Pre-2.0 we naturally piggy-backed off the > memory mapped file and the fact that every mutation acquired a lock and > wrote into the mmap buffer, and the ability to replay everything out of it > came for free. With CASSANDRA-3578, that was subtly changed. > Something [~jjirsa] pointed out to me is that [MySQL provides a way to adjust > the durability > guarantees|https://dev.mysql.com/doc/refman/5.6/en/innodb-parameters.html#sysvar_innodb_flush_log_at_trx_commit] > of each commit in innodb via the {{innodb_flush_log_at_trx_commit}} setting. I'm > using that idea as a loose springboard for what to do here. 
-- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
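The packed-CAS trick Ariel describes above — advancing two 4-byte offsets with a single atomic 8-byte compare-and-swap, so both allocations commit in the same order — can be sketched as follows (hypothetical class, not the actual commitlog allocator):

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: reserve space in the commitlog and the sidekick file
// in one atomic step by packing both write offsets into one long.
public class PackedOffsetsSketch
{
    // High 32 bits: commitlog offset; low 32 bits: sidekick offset.
    private final AtomicLong packed = new AtomicLong(0);

    // Returns the packed offsets as they were before this allocation.
    long allocate(int logBytes, int sidekickBytes)
    {
        while (true)
        {
            long current = packed.get();
            int logOffset = (int) (current >>> 32);
            int sideOffset = (int) current;
            long next = ((long) (logOffset + logBytes) << 32)
                      | ((sideOffset + sidekickBytes) & 0xFFFFFFFFL);
            // One CAS advances both offsets, so commitlog order and sidekick
            // order can never diverge.
            if (packed.compareAndSet(current, next))
                return current;
        }
    }

    public static void main(String[] args)
    {
        PackedOffsetsSketch s = new PackedOffsetsSketch();
        s.allocate(100, 8);                  // first entry: 100 log bytes, 8 sidekick bytes
        long after = s.allocate(50, 8);      // second entry sees the advanced offsets
        System.out.println((int) (after >>> 32)); // 100
        System.out.println((int) after);          // 8
    }
}
```

A real allocator would also have to handle segment rollover when either offset reaches its file's end; that is omitted here.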
[jira] [Commented] (CASSANDRA-13987) Multithreaded commitlog subtly changed durability
[ https://issues.apache.org/jira/browse/CASSANDRA-13987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16235707#comment-16235707 ] Jason Brown commented on CASSANDRA-13987: - Just to add these here for completeness, I spoke with several other contributors, and here is a brief summary of each idea and my reasoning for not pursuing each. [~mkjellman] proposed to reintroduce a lock to the commitlog path, albeit with a smaller scope. The basic idea would still use multiple threads to serialize the mutation into the log, but we would lock around getting the {{Allocation}} buffer and writing the mutation's length and checksum. This would allow us to be able to replay everything that successfully serialized into the commitlog; we could skip entries that did not completely serialize (and thus fail on deserialization) as we would be guaranteed the entry's length was written at the beginning of the entry (and thus we could skip to the next entry if possible). The biggest downside here was the reintroduction of the lock, which is a larger topic than what I want to address here, and should involve a wider community discussion. [~aweisberg] proposed having a mmaped sidekick file where we would capture the position (and checksum of the position) of each entry in the main commitlog file. The entries in the sidekick file would be fixed-size values (8 bytes), so we would always be able to read the values. We would use something like the main commitlog's CAS to allocate space for the sidekick entry, but the ordering in the sidekick entries are not guaranteed to be in the same order as the commit log's entries. On replay, we would need to read in the sidekick file to know the offsets, and we would need to attempt to replay as many of the entries from the main commitlog as appeared in the sidekick file. 
While this is a reasonably good idea, the downside for me is that introducing another file to ensure more commitlog replayability seems more involved than necessary for the stated goal. Coordinated failures are already an edge condition, and imposing the sidekick-file tax on every commitlog might be more than required. I am also concerned about the additional cost on replay of reading the sidekick file, ordering the entries, and then ensuring that at least all those entries are replayed. We are sensitive to startup times, and this would add to them (albeit perhaps slightly). Another complicating factor is that this idea does not work with compressed or encrypted commitlogs. > Multithreaded commitlog subtly changed durability > - > > Key: CASSANDRA-13987 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13987 > Project: Cassandra > Issue Type: Improvement >Reporter: Jason Brown >Assignee: Jason Brown >Priority: Major > Fix For: 4.x > > > When multithreaded commitlog was introduced in CASSANDRA-3578, we subtly > changed the way that commitlog durability worked. Everything still gets > written to an mmap file. However, not everything is replayable from the > mmaped file after a process crash, in periodic mode. > In brief, the reason this changed is due to the chained markers that are > required for the multithreaded commit log. At each msync, we wait for > outstanding mutations to serialize into the commitlog, and update a marker > before and after the commits that have accumulated since the last sync. With > those markers, we can safely replay that section of the commitlog. Without > the markers, we have no guarantee that the commits in that section were > successfully written, thus we abandon those commits on replay. > If you have correlated process failures of multiple nodes at "nearly" the > same time (see ["There Is No > Now"|http://queue.acm.org/detail.cfm?id=2745385]), it is possible to have > data loss if none of the nodes msync the commitlog. 
For example, with RF=3, > if quorum write succeeds on two nodes (and we acknowledge the write back to > the client), and then the process on both nodes OOMs (say, due to reading the > index for a 100GB partition), the write will be lost if neither process > msync'ed the commitlog. More exactly, the commitlog cannot be fully replayed. > The reason why this data is silently lost is due to the chained markers that > were introduced with CASSANDRA-3578. > The problem we are addressing with this ticket is incrementally improving > 'durability' due to process crash, not host crash. (Note: operators should > use batch mode to ensure greater durability, but batch mode in its current > implementation is a) borked, and b) will burn through, *very* rapidly, SSDs > that don't have a non-volatile write cache sitting in front.) > The current default for {{commitlog_sync_period_in_ms}} is 10 seconds, which >
[jira] [Commented] (CASSANDRA-13987) Multithreaded commitlog subtly changed durability
[ https://issues.apache.org/jira/browse/CASSANDRA-13987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16235703#comment-16235703 ] Jason Brown commented on CASSANDRA-13987: - Here is a branch that takes the simplest path: it updates the commitlog's chained markers (in periodic mode) much more frequently than it msyncs. ||trunk|| |[branch|https://github.com/jasobrown/cassandra/tree/commitlog_mmap-more-frequent-markers]| |[utests|https://circleci.com/gh/jasobrown/cassandra/tree/commitlog_mmap-more-frequent-markers]| |[dtest|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/407/]| The basic idea is that if we update the chained markers frequently (say, at a configurable interval of once every 100 milliseconds), that should be more than enough to survive a correlated failure, such as two nodes OOMing at nearly the same time. This is *not* a silver bullet to ensure complete replayability; if you need every commit to be durable, you should use batch commitlog mode. There are alternatives that I discussed with others (see next comment). This branch does not solve the problem for compressed/encrypted commitlogs (in periodic mode), as those implementations do not use mmapped files. I am not sure how best (or whether) to address those. Switching them to use a memory-mapped file might not be too difficult, code-wise, but I'm not sure about the performance implications. Apparently, [~benedict] and I had some discussion about the use of mmap and the commitlog a long time ago (CASSANDRA-6809), but I honestly can't remember the details beyond our comments on that ticket. 
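As a rough illustration of the branch's approach (hypothetical names; the real patch writes the marker into the mmapped commitlog buffer rather than a field):

```java
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: decouple marker updates from msync. A cheap scheduled
// task stamps the "serialized up to here" marker every markerIntervalMs, while
// the expensive msync still runs at commitlog_sync_period_in_ms. After a
// process crash (not a host crash) the mmapped pages survive, so replay can
// trust everything up to the last written marker.
public class ChainedMarkerUpdater {
    final AtomicInteger serializedUpTo = new AtomicInteger(); // bytes fully serialized by writers
    volatile int markedUpTo;                                  // last marker value written

    void updateMarker() {
        // A real implementation writes this offset into the mmapped buffer
        // as the chained marker; here it is just a field for illustration.
        markedUpTo = serializedUpTo.get();
    }

    void start(ScheduledExecutorService scheduler, long markerIntervalMs) {
        scheduler.scheduleAtFixedRate(this::updateMarker,
                                      markerIntervalMs, markerIntervalMs,
                                      TimeUnit.MILLISECONDS);
    }
}
```

The marker write is cheap because it touches only the mmapped page; the kernel eventually flushes it, and a process crash alone does not lose it. Only a host crash loses un-msynced pages, which is why this helps process-crash durability specifically.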
> Multithreaded commitlog subtly changed durability > - > > Key: CASSANDRA-13987 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13987 > Project: Cassandra > Issue Type: Improvement >Reporter: Jason Brown >Assignee: Jason Brown >Priority: Major > Fix For: 4.x > > > When multithreaded commitlog was introduced in CASSANDRA-3578, we subtly > changed the way that commitlog durability worked. Everything still gets > written to an mmap file. However, not everything is replayable from the > mmaped file after a process crash, in periodic mode. > In brief, the reason this changed is due to the chained markers that are > required for the multithreaded commit log. At each msync, we wait for > outstanding mutations to serialize into the commitlog, and update a marker > before and after the commits that have accumulated since the last sync. With > those markers, we can safely replay that section of the commitlog. Without > the markers, we have no guarantee that the commits in that section were > successfully written, thus we abandon those commits on replay. > If you have correlated process failures of multiple nodes at "nearly" the > same time (see ["There Is No > Now"|http://queue.acm.org/detail.cfm?id=2745385]), it is possible to have > data loss if none of the nodes msync the commitlog. For example, with RF=3, > if quorum write succeeds on two nodes (and we acknowledge the write back to > the client), and then the process on both nodes OOMs (say, due to reading the > index for a 100GB partition), the write will be lost if neither process > msync'ed the commitlog. More exactly, the commitlog cannot be fully replayed. > The reason why this data is silently lost is due to the chained markers that > were introduced with CASSANDRA-3578. > The problem we are addressing with this ticket is incrementally improving > 'durability' due to process crash, not host crash. 
(Note: operators should > use batch mode to ensure greater durability, but batch mode in its current > implementation is a) borked, and b) will burn through, *very* rapidly, SSDs > that don't have a non-volatile write cache sitting in front.) > The current default for {{commitlog_sync_period_in_ms}} is 10 seconds, which > means that a node could lose up to ten seconds of data due to process crash. > The unfortunate thing is that the data is still available, in the mmap file, > but we can't replay it due to incomplete chained markers. > ftr, I don't believe we've ever had a stated policy about commitlog > durability wrt process crash. Pre-2.0 we naturally piggy-backed off the > memory mapped file and the fact that every mutation acquired a lock and > wrote into the mmap buffer, and the ability to replay everything out of it > came for free. With CASSANDRA-3578, that was subtly changed. > Something [~jjirsa] pointed out to me is that [MySQL provides a way to adjust > the durability > guarantees|https://dev.mysql.com/doc/refman/5.6/en/innodb-parameters.html#sysvar_innodb_flush_log_at_trx_commit] > of each commit in innodb via the {{innodb_flush_log_at_trx_commit}}. I'm > using that idea as a loose springboard for what to do
[jira] [Created] (CASSANDRA-13987) Multithreaded commitlog subtly changed durability
Jason Brown created CASSANDRA-13987: --- Summary: Multithreaded commitlog subtly changed durability Key: CASSANDRA-13987 URL: https://issues.apache.org/jira/browse/CASSANDRA-13987 Project: Cassandra Issue Type: Improvement Reporter: Jason Brown Assignee: Jason Brown Priority: Major Fix For: 4.x When multithreaded commitlog was introduced in CASSANDRA-3578, we subtly changed the way that commitlog durability worked. Everything still gets written to an mmap file. However, not everything is replayable from the mmaped file after a process crash, in periodic mode. In brief, the reason this changed is due to the chained markers that are required for the multithreaded commit log. At each msync, we wait for outstanding mutations to serialize into the commitlog, and update a marker before and after the commits that have accumulated since the last sync. With those markers, we can safely replay that section of the commitlog. Without the markers, we have no guarantee that the commits in that section were successfully written, thus we abandon those commits on replay. If you have correlated process failures of multiple nodes at "nearly" the same time (see ["There Is No Now"|http://queue.acm.org/detail.cfm?id=2745385]), it is possible to have data loss if none of the nodes msync the commitlog. For example, with RF=3, if quorum write succeeds on two nodes (and we acknowledge the write back to the client), and then the process on both nodes OOMs (say, due to reading the index for a 100GB partition), the write will be lost if neither process msync'ed the commitlog. More exactly, the commitlog cannot be fully replayed. The reason why this data is silently lost is due to the chained markers that were introduced with CASSANDRA-3578. The problem we are addressing with this ticket is incrementally improving 'durability' due to process crash, not host crash. 
(Note: operators should use batch mode to ensure greater durability, but batch mode in its current implementation is a) borked, and b) will burn through, *very* rapidly, SSDs that don't have a non-volatile write cache sitting in front.) The current default for {{commitlog_sync_period_in_ms}} is 10 seconds, which means that a node could lose up to ten seconds of data due to process crash. The unfortunate thing is that the data is still available, in the mmap file, but we can't replay it due to incomplete chained markers. ftr, I don't believe we've ever had a stated policy about commitlog durability wrt process crash. Pre-2.0 we naturally piggy-backed off the memory mapped file and the fact that every mutation acquired a lock and wrote into the mmap buffer, and the ability to replay everything out of it came for free. With CASSANDRA-3578, that was subtly changed. Something [~jjirsa] pointed out to me is that [MySQL provides a way to adjust the durability guarantees|https://dev.mysql.com/doc/refman/5.6/en/innodb-parameters.html#sysvar_innodb_flush_log_at_trx_commit] of each commit in innodb via the {{innodb_flush_log_at_trx_commit}}. I'm using that idea as a loose springboard for what to do here.
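The chained-marker replay rule the ticket describes (replay only sections whose markers were written) can be sketched with a deliberately simplified, hypothetical on-disk layout; the real commitlog format is more involved:

```java
// Hypothetical sketch of why replay abandons an unmarked section: each synced
// section is framed by a marker recording where the section ends. If the end
// marker was never written, the replayer cannot distinguish complete entries
// from a torn write, so it stops at the last fully marked section. The layout
// here (a single int marker per section) is invented for illustration only.
public class SectionReplay {
    static final int NO_MARKER = 0;

    /** Returns the number of bytes that are safe to replay. */
    static int replayableBytes(java.nio.ByteBuffer log) {
        int safe = 0;
        log.position(0);
        while (log.remaining() >= 4) {
            int sectionEnd = log.getInt();   // marker: offset where this section ends
            if (sectionEnd == NO_MARKER || sectionEnd > log.capacity())
                break;                        // marker never written -> stop here
            safe = sectionEnd;                // section fully marked -> replayable
            log.position(sectionEnd);         // jump to the next section's marker
        }
        return safe;
    }
}
```

Everything past the last valid marker is exactly the "still available but not replayable" data the ticket is about: the bytes may be in the mmap file, but nothing vouches for their completeness.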
[jira] [Commented] (CASSANDRA-10404) Node to Node encryption transitional mode
[ https://issues.apache.org/jira/browse/CASSANDRA-10404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16235603#comment-16235603 ] Jason Brown commented on CASSANDRA-10404: - Thanks, [~eperott]. [~spo...@gmail.com] any additional comments or concerns? > Node to Node encryption transitional mode > - > > Key: CASSANDRA-10404 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10404 > Project: Cassandra > Issue Type: New Feature >Reporter: Tom Lewis >Assignee: Jason Brown >Priority: Major > Fix For: 4.x > > > Create a transitional mode for encryption that allows encrypted and > unencrypted traffic node-to-node during a change over to encryption from > unencrypted. This alleviates downtime during the switch. > This is similar to CASSANDRA-10559 which is intended for client-to-node
[jira] [Commented] (CASSANDRA-10404) Node to Node encryption transitional mode
[ https://issues.apache.org/jira/browse/CASSANDRA-10404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16235402#comment-16235402 ] Per Otterström commented on CASSANDRA-10404: I'm +1 on this! > Node to Node encryption transitional mode > - > > Key: CASSANDRA-10404 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10404 > Project: Cassandra > Issue Type: New Feature >Reporter: Tom Lewis >Assignee: Jason Brown >Priority: Major > Fix For: 4.x > > > Create a transitional mode for encryption that allows encrypted and > unencrypted traffic node-to-node during a change over to encryption from > unencrypted. This alleviates downtime during the switch. > This is similar to CASSANDRA-10559 which is intended for client-to-node
[jira] [Commented] (CASSANDRA-13973) IllegalArgumentException in upgradesstables compaction
[ https://issues.apache.org/jira/browse/CASSANDRA-13973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16235374#comment-16235374 ] Sylvain Lebresne commented on CASSANDRA-13973: -- bq. do you two feel this is safe? I can't think of anything this would hard-break off the top of my head. That said, and for what it's worth, my more complete initial reflection is that: * on 3.0/3.X, this feels a tad risky: we're adding new code to the file indexing (granted, not excessively complex code), and code paths don't get much more critical than that. It could also change the performance profile, and while it might change it for the better in many cases, it may not always (especially since the patch relies on 2 new settings whose defaults may not be the right ones for someone's practical workload). As few people will run into this problem in the first place, asking those rare users to change {{column_index_size_in_kb}} would probably be safer overall (tbc, I'm suggesting here to improve the error message of the checked cast to point people to that work-around, not to leave people on their own as done currently). * on 4.0, we already have CASSANDRA-11206 (which is in fact in 3.11 as well) to help work with large indexes, and things like CASSANDRA-9754 are supposed to make that even better, so the memory benefits of this aren't that clear. CASSANDRA-11206 doesn't solve the {{AssertionError}} of this ticket, but we could move the index size from {{int}} to {{long}} (or varint) for that. Which isn't necessarily to say we shouldn't do this, but adding multiple ways to fix the same problem, each with its own new config setting (CASSANDRA-11206 added one, this patch adds 2), doesn't feel ideal, so that's to be taken into consideration imo. 
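The failure in the stack trace below comes from Guava's {{Ints.checkedCast}} rejecting a promoted index size that no longer fits in an {{int}}. A plain-Java reproduction of the same check (no Guava dependency; class and method names are illustrative):

```java
// Reproduces the semantics of Guava's Ints.checkedCast: the narrowing cast is
// taken, then verified against the original long; 7316844981 (the value from
// the ticket) fails because it exceeds Integer.MAX_VALUE. Widening the on-disk
// field from int to long (or a varint), as suggested above, removes the limit
// at the cost of a serialization format change.
public class IndexSize {
    static int checkedCast(long value) {
        int result = (int) value;
        if (result != value)
            throw new IllegalArgumentException("Out of range: " + value);
        return result;
    }
}
```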
> IllegalArgumentException in upgradesstables compaction > -- > > Key: CASSANDRA-13973 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13973 > Project: Cassandra > Issue Type: Bug > Components: Compaction >Reporter: Dan Kinder >Assignee: Jeff Jirsa >Priority: Major > Fix For: 3.0.x, 3.11.x, 4.x > > > After an upgrade from 2.2.6 to 3.0.15 (sstable version la to mc), when I try > to run upgradesstables, most of them upgrade fine but I see the exception > below on several nodes, and it doesn't complete. > CASSANDRA-12717 looks similar but the stack trace is not the same, so I > assumed it is not identical. The various nodes this happens on all give the > same trace. > Might be notable that this is an analytics cluster with some large > partitions, in the GB size. > {noformat} > error: Out of range: 7316844981 > -- StackTrace -- > java.lang.IllegalArgumentException: Out of range: 7316844981 > at com.google.common.primitives.Ints.checkedCast(Ints.java:91) > at > org.apache.cassandra.db.RowIndexEntry$IndexedEntry.promotedSize(RowIndexEntry.java:329) > at > org.apache.cassandra.db.RowIndexEntry$Serializer.serialize(RowIndexEntry.java:133) > at > org.apache.cassandra.io.sstable.format.big.BigTableWriter$IndexWriter.append(BigTableWriter.java:409) > at > org.apache.cassandra.io.sstable.format.big.BigTableWriter.afterAppend(BigTableWriter.java:120) > at > org.apache.cassandra.io.sstable.format.big.BigTableWriter.append(BigTableWriter.java:157) > at > org.apache.cassandra.io.sstable.SSTableRewriter.append(SSTableRewriter.java:125) > at > org.apache.cassandra.db.compaction.writers.MaxSSTableSizeWriter.realAppend(MaxSSTableSizeWriter.java:88) > at > org.apache.cassandra.db.compaction.writers.CompactionAwareWriter.append(CompactionAwareWriter.java:109) > at > org.apache.cassandra.db.compaction.CompactionTask.runMayThrow(CompactionTask.java:195) > at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) > at > 
org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:89) > at > org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:61) > at > org.apache.cassandra.db.compaction.CompactionManager$5.execute(CompactionManager.java:424) > at > org.apache.cassandra.db.compaction.CompactionManager$2.call(CompactionManager.java:311) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at > org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:79) > at java.lang.Thread.run(Thread.java:748) > {noformat}
[jira] [Updated] (CASSANDRA-13872) document speculative_retry on DDL page
[ https://issues.apache.org/jira/browse/CASSANDRA-13872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jordan Vaughan updated CASSANDRA-13872: --- Status: Patch Available (was: Reopened) Here's my patch documenting case-insensitivity and the new "P" suffix: [trunk-13872-DocumentPSuffix|https://github.com/jtvaughan/cassandra/tree/trunk-13872-DocumentPSuffix] > document speculative_retry on DDL page > -- > > Key: CASSANDRA-13872 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13872 > Project: Cassandra > Issue Type: Improvement > Components: Documentation and Website >Reporter: Jon Haddad >Assignee: Jordan Vaughan >Priority: Major > Labels: documentation, lhf > Fix For: 4.0 > > > There's no mention of speculative_retry or how it works on > https://cassandra.apache.org/doc/latest/cql/ddl.html
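The two behaviors the patch documents, case-insensitive parsing and the "P" shorthand for PERCENTILE, can be illustrated with a small hypothetical parser (this is not the actual Cassandra code, just a sketch of the documented behavior):

```java
// Hypothetical illustration of the documented speculative_retry parsing rules:
// values are case-insensitive, and "P" is accepted as shorthand for
// "PERCENTILE", so "99p", "99P", and "99PERCENTILE" all mean the 99th
// percentile. Class and method names are invented for this sketch.
public class SpeculativeRetryValue {
    static double parsePercentile(String value) {
        String v = value.trim().toUpperCase();
        if (v.endsWith("PERCENTILE"))
            return Double.parseDouble(v.substring(0, v.length() - "PERCENTILE".length()));
        if (v.endsWith("P"))
            return Double.parseDouble(v.substring(0, v.length() - 1));
        throw new IllegalArgumentException("not a percentile value: " + value);
    }
}
```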