[ https://issues.apache.org/jira/browse/CASSANDRA-15464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17798900#comment-17798900 ]
Benedict Elliott Smith commented on CASSANDRA-15464:
----------------------------------------------------

I think it is likely to have been fixed by CASSANDRA-15511

> Inserts to set<text> slow due to AtomicBTreePartition for ComplexColumnData.dataSize
> ------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-15464
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15464
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Legacy/Core
>            Reporter: Eric Jacobsen
>            Priority: Normal
>
> Concurrent inserts to set<text> can cause client timeouts and excessive CPU due to the compare-and-swap loop in AtomicBTreePartition for ComplexColumnData.dataSize. As the set grows longer, the probability that the compare-and-swap succeeds decreases.
> The problem we saw in production was with insertions into a set<text> whose length ranged from hundreds to thousands. Because of the semantics of what we store in the set, we had not anticipated the length being more than about 10. (Almost all rows have length <= 6; the largest observed was 7032. Total number of rows < 4000. 3 machines were used.)
> The bad behavior we saw was that all machines went to 100% CPU on all cores and clients were timing out. Our immediate solution in production was adding more machines (going from 3 machines to 6). The stack included partitions.AtomicBTreePartition.addAllWithSizeDelta … ComplexColumnData.dataSize.
> The AtomicBTreePartition code uses a compare-and-swap approach, yet the time between compares depends on the length of the set. When the set is long and updates are concurrent, each loop iteration is unlikely to make forward progress, so threads can spend their time retrying.
> Here is one example call stack:
> {noformat}
> "SharedPool-Worker-40" #167 daemon prio=10 os_prio=0 tid=0x00007f9bb4032800 nid=0x2ee5 runnable [0x00007f9b067f4000]
>    java.lang.Thread.State: RUNNABLE
>     at org.apache.cassandra.db.rows.ComplexColumnData.dataSize(ComplexColumnData.java:114)
>     at org.apache.cassandra.db.rows.BTreeRow.dataSize(BTreeRow.java:373)
>     at org.apache.cassandra.db.partitions.AtomicBTreePartition$RowUpdater.apply(AtomicBTreePartition.java:292)
>     at org.apache.cassandra.db.partitions.AtomicBTreePartition$RowUpdater.apply(AtomicBTreePartition.java:235)
>     at org.apache.cassandra.utils.btree.NodeBuilder.update(NodeBuilder.java:159)
>     at org.apache.cassandra.utils.btree.TreeBuilder.update(TreeBuilder.java:73)
>     at org.apache.cassandra.utils.btree.BTree.update(BTree.java:181)
>     at org.apache.cassandra.db.partitions.AtomicBTreePartition.addAllWithSizeDelta(AtomicBTreePartition.java:155)
>     at org.apache.cassandra.db.Memtable.put(Memtable.java:254)
>     at org.apache.cassandra.db.ColumnFamilyStore.apply(ColumnFamilyStore.java:1204)
>     at org.apache.cassandra.db.Keyspace.applyInternal(Keyspace.java:573)
>     at org.apache.cassandra.db.Keyspace.applyFuture(Keyspace.java:384)
>     at org.apache.cassandra.db.Mutation.applyFuture(Mutation.java:205)
>     at org.apache.cassandra.hints.Hint.applyFuture(Hint.java:99)
>     at org.apache.cassandra.hints.HintVerbHandler.doVerb(HintVerbHandler.java:95)
>     at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:67)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>     at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:164)
>     at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:136)
>     at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105)
>     at java.lang.Thread.run(Thread.java:748)
> {noformat}
> In a test
program to reproduce the problem, we raise the number of concurrent users and lower the think time between queries. Updates to elements of short sets complete without errors, but with long sets clients time out with errors, there are periods with all cores at 99.x% CPU, and jstack shows the time going to ComplexColumnData.dataSize.
> Here is the schema. Our long-term application solution was to make the set elements part of the primary key and avoid set<text> altogether, guaranteeing the code does not go through ComplexColumnData.dataSize.
> {noformat}
> CREATE TABLE x.x (
>     x int PRIMARY KEY,
>     y set<text> ) ...
> {noformat}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org