[ https://issues.apache.org/jira/browse/CASSANDRA-8961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14359995#comment-14359995 ]

Aleksey Yeschenko commented on CASSANDRA-8961:
----------------------------------------------

CASSANDRA-8099 will make that query not use range tombstones, but you'll have 
to wait for 3.0 to get that.
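
Until then, one commonly suggested stopgap (a sketch only, not verified against this
exact reproduction) is to let single-SSTable tombstone compactions run more
aggressively via the table's compaction subproperties, optionally together with a
reduced gc_grace_seconds on a throwaway single-node test cluster:

{code}
-- Possible mitigation sketch for the trial.tbl table from the script below:
-- allow tombstone-driven single-sstable compactions to trigger more eagerly
-- so the range tombstones created by the delete+insert pass can be purged.
ALTER TABLE trial.tbl
  WITH compaction = { 'class' : 'LeveledCompactionStrategy',
                      'unchecked_tombstone_compaction' : 'true',
                      'tombstone_threshold' : '0.05' };
{code}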

> Data rewrite case causes almost non-functional compaction
> ---------------------------------------------------------
>
>                 Key: CASSANDRA-8961
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8961
>             Project: Cassandra
>          Issue Type: Bug
>         Environment: Centos 6.6, Cassandra 2.0.12 (Also seen in Cassandra 2.1)
>            Reporter: Dan Kinder
>            Priority: Minor
>
> There seems to be a bug where compaction grinds to a halt in this use case: 
> from time to time we have a set of rows we need to "migrate", changing their 
> primary key by deleting each row and inserting a new row with the same 
> partition key but a different clustering key. The python script below 
> demonstrates this; it takes a while to run (I didn't try to optimize it), but 
> once it finishes Cassandra will spend a long time trying to compact a few 
> hundred megabytes of data... on the order of days, or it will never finish.
> Not verified by this sandboxed experiment, but compression settings appear 
> not to matter, and the problem seems to affect STCS as well, not just LCS. I 
> am still testing whether other patterns, such as deleting all rows and then 
> inserting (or vice versa), cause the same terrible compaction performance.
> Even if this isn't a "bug" per se, is there a way to fix or work around the 
> behavior?
> {code}
> from cassandra.cluster import Cluster
>
> cluster = Cluster(['localhost'])
> # Connect without binding to a keyspace; the script creates its own.
> db = cluster.connect()
>
> db.execute("DROP KEYSPACE IF EXISTS trial")
> db.execute("""CREATE KEYSPACE trial
>               WITH REPLICATION = { 'class': 'SimpleStrategy',
>                                    'replication_factor': 1 }""")
> db.execute("""CREATE TABLE trial.tbl (
>                 pk text,
>                 data text,
>                 PRIMARY KEY(pk, data)
>               ) WITH compaction = { 'class' : 'LeveledCompactionStrategy' }
>                 AND compression = {'sstable_compression': ''}""")
>
> # Number of rows to insert and "move"
> n = 200000
>
> # Insert n rows with the same partition key, ~1KB of unique data per clustering key
> for i in range(n):
>     db.execute("INSERT INTO trial.tbl (pk, data) VALUES ('thepk', %s)",
>                [str(i).zfill(1024)])
>
> # "Move" those n rows: delete each one and insert a very similar replacement
> for i in range(n):
>     val = str(i).zfill(1024)
>     db.execute("DELETE FROM trial.tbl WHERE pk = 'thepk' AND data = %s",
>                [val])
>     db.execute("INSERT INTO trial.tbl (pk, data) VALUES ('thepk', %s)",
>                ["1" + val])
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
