[ https://issues.apache.org/jira/browse/CASSANDRA-15413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alex Petrov updated CASSANDRA-15413:
------------------------------------
    Bug Category: Parent values: Correctness(12982), Level 1 values: Recoverable Corruption / Loss(12986)
      Complexity: Challenging
     Component/s: Local/SSTable
   Discovered By: User Report
        Severity: Critical
        Assignee: Alex Petrov
          Status: Open  (was: Triage Needed)

> Missing results on reading large frozen text map
> ------------------------------------------------
>
>                 Key: CASSANDRA-15413
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15413
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Local/SSTable
>            Reporter: Tyler Codispoti
>            Assignee: Alex Petrov
>            Priority: Normal
>
> Cassandra version: 2.2.15
>
> I have been running into a case where, when fetching the results from a table
> with a frozen<map<text, text>>, if the number of results is greater than the
> fetch size (default 5000), we can end up with missing data.
>
> Side note: the table schema comes from using KairosDB, but we have isolated
> this issue to Cassandra itself. It looks like it can cause problems for users
> of KairosDB as well.
>
> Repro case, tested against a fresh install of Cassandra 2.2.15.
>
> 1. Create the table (cqlsh):
> {code:sql}
> CREATE KEYSPACE test
>   WITH REPLICATION = {
>     'class' : 'SimpleStrategy',
>     'replication_factor' : 1
>   };
>
> CREATE TABLE test.test (
>   name text,
>   tags frozen<map<text, text>>,
>   PRIMARY KEY (name, tags)
> ) WITH CLUSTERING ORDER BY (tags ASC);
> {code}
>
> 2. Insert the data (python3):
> {code:python}
> from cassandra.cluster import Cluster
>
> cluster = Cluster(['127.0.0.1'])
> session = cluster.connect('test')
>
> for i in range(0, 20000):
>     session.execute(
>         """
>         INSERT INTO test (name, tags)
>         VALUES (%s, %s)
>         """,
>         ("test_name", {'id': str(i)})
>     )
> {code}
>
> 3. Flush:
> {code:bash}
> nodetool flush
> {code}
>
> 4. Fetch the data (python3):
> {code:python}
> from cassandra.cluster import Cluster
>
> cluster = Cluster(['127.0.0.1'], control_connection_timeout=5000)
> session = cluster.connect('test')
> session.default_fetch_size = 5000
> session.default_timeout = 120
>
> count = 0
> rows = session.execute("select tags from test where name='test_name'")
> for row in rows:
>     count += 1
> print(count)
> {code}
>
> Result: 10111 (expected 20000)
>
> Changing the page size changes the result count. Some quick samples:
>
> ||default_fetch_size||count||
> |5000|10111|
> |1000|1830|
> |999|1840|
> |998|1850|
> |20000|20000|
> |100000|20000|
>
> In short, I cannot guarantee I will get all the results back unless the page
> size is greater than the number of rows.
>
> This seems to get worse with multiple SSTables (e.g. a nodetool flush between
> some of the insert batches). When using replication, the issue can get
> disgustingly bad, potentially giving a different result on each query.
>
> Interestingly, if we pad the values in the tag map ("id" in this repro case)
> so that the insertions happen in lexicographical order, there is no issue. I
> believe the issue also does not reproduce if "nodetool flush" is not called
> before querying.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
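[Editor's note] The reporter's final observation (padding the map values so that insertion order is lexicographic avoids the bug) can be checked without a Cassandra cluster. The sketch below is illustrative only and is not taken from the report: it shows that the repro's unpadded decimal ids ("0".."19999") are inserted out of lexicographic (text clustering) order, while zero-padded ids are not. The padding width of 5 via `zfill(5)` is an assumption chosen to cover 20000 values.

```python
# Hypothetical illustration (not part of the original report):
# text values are compared lexicographically, so "10" sorts before "2".
# The repro inserts ids in numeric order, which is NOT the clustering order.
unpadded = [str(i) for i in range(20000)]
assert sorted(unpadded) != unpadded  # insertion order differs from text order

# The workaround noted in the report: zero-pad the ids so that numeric
# insertion order and lexicographic order coincide (width 5 is an assumption).
padded = [str(i).zfill(5) for i in range(20000)]
assert sorted(padded) == padded  # insertion order now matches text order

print(unpadded[:3], padded[:3])
```

Running this prints `['0', '1', '2'] ['00000', '00001', '00002']`; the assertions confirm that only the padded ids arrive in the same order the text clustering column will store them.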