[ https://issues.apache.org/jira/browse/CASSANDRA-15464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17798900#comment-17798900 ]
Benedict Elliott Smith commented on CASSANDRA-15464:
----------------------------------------------------

I think it is likely to have been fixed by CASSANDRA-15511

> Inserts to set<text> slow due to AtomicBTreePartition for ComplexColumnData.dataSize
> ------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-15464
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15464
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Legacy/Core
>            Reporter: Eric Jacobsen
>            Priority: Normal
>
> Concurrent inserts to set<text> can cause client timeouts and excessive CPU due to the compare-and-swap loop in AtomicBTreePartition for ComplexColumnData.dataSize. As the set grows longer, the probability that the compare-and-swap succeeds decreases.
> The problem we saw in production was with insertions into a set<text> whose length ranged from hundreds to thousands. Because of the semantics of what we store in the set, we had not anticipated the length being more than about 10. (Almost all rows have length <= 6; the largest observed was 7032. Total number of rows < 4000. 3 machines were used.)
> The bad behavior we saw was that all machines went to 100% CPU on all cores and clients were timing out. Our immediate solution in production was adding more machines (going from 3 machines to 6). The stack included partitions.AtomicBTreePartition.addAllWithSizeDelta … ComplexColumnData.dataSize.
> The AtomicBTreePartition code uses a compare-and-swap approach, yet the time between compares depends on the length of the set. When the set is long and updates are concurrent, each loop iteration is unlikely to make forward progress, so threads can spend their time retrying.
> Here is one example call stack:
> {noformat}
> "SharedPool-Worker-40" #167 daemon prio=10 os_prio=0 tid=0x00007f9bb4032800 nid=0x2ee5 runnable [0x00007f9b067f4000]
>    java.lang.Thread.State: RUNNABLE
>     at org.apache.cassandra.db.rows.ComplexColumnData.dataSize(ComplexColumnData.java:114)
>     at org.apache.cassandra.db.rows.BTreeRow.dataSize(BTreeRow.java:373)
>     at org.apache.cassandra.db.partitions.AtomicBTreePartition$RowUpdater.apply(AtomicBTreePartition.java:292)
>     at org.apache.cassandra.db.partitions.AtomicBTreePartition$RowUpdater.apply(AtomicBTreePartition.java:235)
>     at org.apache.cassandra.utils.btree.NodeBuilder.update(NodeBuilder.java:159)
>     at org.apache.cassandra.utils.btree.TreeBuilder.update(TreeBuilder.java:73)
>     at org.apache.cassandra.utils.btree.BTree.update(BTree.java:181)
>     at org.apache.cassandra.db.partitions.AtomicBTreePartition.addAllWithSizeDelta(AtomicBTreePartition.java:155)
>     at org.apache.cassandra.db.Memtable.put(Memtable.java:254)
>     at org.apache.cassandra.db.ColumnFamilyStore.apply(ColumnFamilyStore.java:1204)
>     at org.apache.cassandra.db.Keyspace.applyInternal(Keyspace.java:573)
>     at org.apache.cassandra.db.Keyspace.applyFuture(Keyspace.java:384)
>     at org.apache.cassandra.db.Mutation.applyFuture(Mutation.java:205)
>     at org.apache.cassandra.hints.Hint.applyFuture(Hint.java:99)
>     at org.apache.cassandra.hints.HintVerbHandler.doVerb(HintVerbHandler.java:95)
>     at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:67)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>     at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:164)
>     at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:136)
>     at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105)
>     at java.lang.Thread.run(Thread.java:748)
> {noformat}
> In a test
program to reproduce the problem, we raise the number of concurrent users and lower the think time between queries. Updates to elements of short sets complete without errors, but with long sets clients time out with errors, there are periods with all cores at 99.x% CPU, and jstack shows the time going to ComplexColumnData.dataSize.
> Here is the schema. Our long-term application solution was to make the set elements part of the primary key and avoid set<text> altogether, guaranteeing the code does not go through ComplexColumnData.dataSize.
> {noformat}
> CREATE TABLE x.x (
>     x int PRIMARY KEY,
>     y set<text> ) ...
> {noformat}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org