[ 
https://issues.apache.org/jira/browse/CASSANDRA-16554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andres de la Peña updated CASSANDRA-16554:
------------------------------------------
    Reviewers: Andres de la Peña, Andres de la Peña  (was: Andres de la Peña)
               Andres de la Peña, Andres de la Peña  (was: Andres de la Peña)
       Status: Review In Progress  (was: Patch Available)

> Race between secondary index building and active compactions tracking
> ---------------------------------------------------------------------
>
>                 Key: CASSANDRA-16554
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-16554
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Feature/2i Index, Local/Other
>            Reporter: Yifan Cai
>            Assignee: Yifan Cai
>            Priority: Normal
>             Fix For: 4.0
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> There is a race condition between the secondary index build compaction task 
> and the active compactions tracking, especially when incremental repair is 
> running. 
> It could result into 2 exceptions. 
> {code:java}
> Caused by: java.util.NoSuchElementException
>       at 
> org.apache.cassandra.utils.AbstractIterator.next(AbstractIterator.java:64)
>       at 
> org.apache.cassandra.io.sstable.ReducingKeyIterator.next(ReducingKeyIterator.java:117)
>       at 
> org.apache.cassandra.index.internal.CollatedViewIndexBuilder.build(CollatedViewIndexBuilder.java:74)
>       at 
> org.apache.cassandra.db.compaction.CompactionManager$14.run(CompactionManager.java:1688)
> {code}
> {code:java}
> Caused by: java.io.EOFException
>       at 
> org.apache.cassandra.io.util.RebufferingInputStream.readByte(RebufferingInputStream.java:180)
>       at 
> org.apache.cassandra.utils.vint.VIntCoding.readUnsignedVInt(VIntCoding.java:68)
>       at 
> org.apache.cassandra.io.util.RebufferingInputStream.readUnsignedVInt(RebufferingInputStream.java:243)
>       at 
> org.apache.cassandra.db.RowIndexEntry$Serializer.readPosition(RowIndexEntry.java:364)
>       at 
> org.apache.cassandra.db.RowIndexEntry$Serializer.skip(RowIndexEntry.java:369)
>       at 
> org.apache.cassandra.io.sstable.KeyIterator.computeNext(KeyIterator.java:110)
> {code}
> In the first exception, the iterator returns true for the call of `hasNext`, 
> but the following call of `next` throws. 
> In the second exception, the file wrapper object returns false for the call 
> of `isEOF`, but the following call that reads from it throws EOFException. 
> The exceptions can be constantly reproduced by the test in the patch. 
> The root cause of the exception is from the thread-unsafe lazy initialization 
> found in `ReducingKeyIterator` and `KeyIterator`. When the `maybeInit` 
> methods of both classes are called from multiple threads, it is likely to 
> instantiate 2 instances and mess up the internal state. Those iterators might 
> not be considered being used in a multiple thread environment when being 
> added to the codebase initially. However, the `CollatedViewIndexBuilder` 
> contains the reference to those 2 iterator, and it, as a 
> `CompactionInfo.Holder`, is added to active compactions to be accessed from 
> other threads, e.g. by calling `ActiveCompactions#getCompactionsForSSTable`. 
> The implementation of `getCompactionInfo` in `CollatedViewIndexBuilder` 
> publishes the reference to the `ReducingKeyIterator` and transitively 
> `KeyIterator` to the other threads. Therefore, the other threads can possibly 
> race. 
> For the instance of NSEE, thread 1 calls the `hasNext` on the 
> `ReducingKeyIterator`, it initialize the internal merge iterator. The call 
> returns true. Right after, there is a thread 2 that calls 
> `ActiveCompactions#getCompactionsForSSTable` and it sees the instance of 
> `ReducingKeyIterator` is not initialized yet, so run `maybeInit` again. The 
> reference of `iter` is replaced with the second instance. Now, thread 1 calls 
> `next` using the new instance that does not has the correct state (expecting 
> HAS_NEXT). So it again calls `hasNext`, which fetches the next element from 
> the merged iterator. Those 2 `ReducingKeyIterator` share the same instances 
> of `KeyIterator`. If the key iterators are already at the end, calling 
> `hasNext` returns false and we get the NSEE. 
> Besides the explicit NSEE exception, with the above reasoning, it is also 
> possible to have the unexpected behavior that skips one item from the merge 
> iterator. It leads to no 2i is built for that missing partition. Such case is 
> totally hidden since no exception is ever thrown. 
> To fix the unexpected behaviors, we need to correctly lazy initialize the 
> iterators. 
> Looking at the implementations of `CompactionInfo.Holder`, `ScrubInfo` and 
> `VerifyInfo` also publishes the non-thread-safe `RandomAccessReader` when 
> creating the compaction info object. According to the code inspection, there 
> is a rare chance that a thread is calling `getFilePointer` from the file 
> object and another thread closes the file, which can produce a NPE. When 
> closing the file, the internal states, e.g. buffer is set to null. I did not 
> add the fix in this patch, as it is never spotted. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to