[ https://issues.apache.org/jira/browse/CASSANDRA-20709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17959553#comment-17959553 ]
Ekaterina Dimitrova commented on CASSANDRA-20709:
-------------------------------------------------

[~mmarshall], I assumed this is the case for both cassandra-5.0 and trunk branches and added the expected fix versions. Please feel free to correct them if I am wrong.

> SAI queries can miss a memtable/sstable if flush happens during query
> ---------------------------------------------------------------------
>
>                 Key: CASSANDRA-20709
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-20709
>             Project: Apache Cassandra
>          Issue Type: Bug
>          Components: Feature/SAI
>            Reporter: Michael Marshall
>            Priority: Normal
>             Fix For: 5.0.x, 6.x
>
>
> The IndexSearchResultIterator takes sstable index references to query and
> then gets the memtable index references within the searchMemtableIndexes
> method. This ordering is not safe because a concurrent flush can lead to
> missing rows that were in the memtable when the QueryView was created but in
> an sstable when the searchMemtableIndexes is called.
>
> The solution is to get the memtables before getting the sstables. I propose
> that we add the memtables to the QueryView to make the abstraction a bit
> simpler to follow. We already do the correct thing in hybrid ANN queries, but
> my proposed solution would simplify that code too by storing the memtable
> index references in the QueryView.
>
> Here is a test to reproduce the bug:
> {code:java}
> public class FlushIndexWhileQueryingTest extends SAITester
> {
>     @Test
>     public void testFlushDuringEqualityQuery() throws Throwable
>     {
>         createTable("CREATE TABLE %s (k text PRIMARY KEY, x int)");
>         createIndex("CREATE CUSTOM INDEX ON %s(x) USING 'StorageAttachedIndex'");
>         waitForTableIndexesQueryable();
>         execute("INSERT INTO %s (k, x) VALUES (?, ?)", "a", 0);
>         execute("INSERT INTO %s (k, x) VALUES (?, ?)", "b", 0);
>         execute("INSERT INTO %s (k, x) VALUES (?, ?)", "c", 1);
>
>         // We use a barrier to trigger flush at precisely the right time
>         InvokePointBuilder initialInvokePoint = InvokePointBuilder.newInvokePoint()
>                                                                   .onClass(IndexSearchResultIterator.class)
>                                                                   .onMethod("build")
>                                                                   .atEntry();
>         Injections.Barrier initialBarrier = Injections.newBarrier("pause_query", 2, false)
>                                                       .add(initialInvokePoint)
>                                                       .build();
>         InvokePointBuilder secondInvokePoint = InvokePointBuilder.newInvokePoint()
>                                                                  .onClass(MemtableIndexManager.class)
>                                                                  .onMethod("searchMemtableIndexes")
>                                                                  .atEntry();
>         Injections.Barrier secondBarrier = Injections.newBarrier("resume_query", 2, false)
>                                                      .add(secondInvokePoint)
>                                                      .build();
>         Injections.inject(initialBarrier, secondBarrier);
>
>         // Flush in a separate thread to allow the query to run concurrently
>         ForkJoinPool.commonPool().submit(() -> {
>             try
>             {
>                 initialBarrier.arrive();
>                 flush();
>                 secondBarrier.arrive();
>             }
>             catch (InterruptedException t)
>             {
>                 throw new RuntimeException(t);
>             }
>         });
>
>         assertRowCount(execute("SELECT k FROM %s WHERE x = 0"), 2);
>         assertEquals("Confirm that we hit the barrier (helps in case method name changed)", 0, initialBarrier.getCount());
>         assertEquals("Confirm that we hit the barrier (helps in case method name changed)", 0, secondBarrier.getCount());
>     }
> }
> {code}
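
For readers who want to see the ordering problem from the description in isolation, here is a minimal sketch in plain Java. It is a toy model and not Cassandra code: the class `FlushOrderingSketch` and all of its methods are hypothetical names invented for illustration, the "memtable" and "sstables" are just in-memory sets, and the flush is run through a hook between the two acquisition steps so the interleaving is deterministic rather than timing-dependent.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Toy model only: none of these types or methods are Cassandra classes.
final class FlushOrderingSketch
{
    // Live in-memory keys; flush() publishes this set as an "sstable" and swaps in a fresh one.
    private Set<String> memtable = new TreeSet<>();
    private final List<Set<String>> sstables = new ArrayList<>();

    synchronized void insert(String key)
    {
        memtable.add(key);
    }

    synchronized void flush()
    {
        sstables.add(memtable);      // the old memtable's contents become an sstable
        memtable = new TreeSet<>();  // later writes go to a new, empty memtable
    }

    synchronized Set<String> memtableRef()
    {
        return memtable;
    }

    synchronized List<Set<String>> sstableRefs()
    {
        return new ArrayList<>(sstables); // snapshot of the sstables visible right now
    }

    // Unsafe order: sstables first, memtable second.
    // A flush between the two steps moves rows to a place this query never looks.
    Set<String> queryUnsafe(Runnable betweenSteps)
    {
        List<Set<String>> disk = sstableRefs();
        betweenSteps.run();
        Set<String> mem = memtableRef();
        return merge(mem, disk);
    }

    // Safe order: memtable first, sstables second.
    // The retained memtable reference still holds the rows even if a flush follows.
    Set<String> querySafe(Runnable betweenSteps)
    {
        Set<String> mem = memtableRef();
        betweenSteps.run();
        List<Set<String>> disk = sstableRefs();
        return merge(mem, disk);
    }

    // Union of both sources; using a set removes rows seen in both places.
    private static Set<String> merge(Set<String> mem, List<Set<String>> disk)
    {
        Set<String> result = new TreeSet<>(mem);
        disk.forEach(result::addAll);
        return result;
    }

    public static void main(String[] args)
    {
        FlushOrderingSketch unsafeStore = new FlushOrderingSketch();
        unsafeStore.insert("a");
        unsafeStore.insert("b");
        System.out.println(unsafeStore.queryUnsafe(unsafeStore::flush)); // prints []

        FlushOrderingSketch safeStore = new FlushOrderingSketch();
        safeStore.insert("a");
        safeStore.insert("b");
        System.out.println(safeStore.querySafe(safeStore::flush)); // prints [a, b]
    }
}
```

In the safe ordering a row can show up in both the retained memtable reference and the newly published sstable, which is why the sketch merges into a set. How the actual fix holds references and handles overlapping results is not shown here and should be taken from the patch itself.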