[ 
https://issues.apache.org/jira/browse/CASSANDRA-13432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15965534#comment-15965534
 ] 

Corentin Chary commented on CASSANDRA-13432:
--------------------------------------------

It's 2.1.15 but I don't believe it has been fixed. I believe that it's stuck in 
org.apache.cassandra.db.filter.QueryFilter$2.hasNext(QueryFilter.java:156) 
which doesn't count tombstones. A simple patch could be something like:

{code}
diff --git a/src/java/org/apache/cassandra/db/filter/QueryFilter.java 
b/src/java/org/apache/cassandra/db/filter/QueryFilter.java
index db531a5..8b718db 100644
--- a/src/java/org/apache/cassandra/db/filter/QueryFilter.java
+++ b/src/java/org/apache/cassandra/db/filter/QueryFilter.java
@@ -23,6 +23,10 @@ import java.util.Iterator;
 import java.util.List;
 import java.util.SortedSet;
 
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import org.apache.cassandra.config.DatabaseDescriptor;
 import org.apache.cassandra.db.Cell;
 import org.apache.cassandra.db.ColumnFamily;
 import org.apache.cassandra.db.DecoratedKey;
@@ -34,10 +38,12 @@ import 
org.apache.cassandra.db.columniterator.OnDiskAtomIterator;
 import org.apache.cassandra.db.composites.CellName;
 import org.apache.cassandra.db.composites.Composite;
 import org.apache.cassandra.io.sstable.SSTableReader;
+import org.apache.cassandra.tracing.Tracing;
 import org.apache.cassandra.utils.MergeIterator;
 
 public class QueryFilter
 {
+    private static final Logger logger = 
LoggerFactory.getLogger(QueryFilter.class);
     public final DecoratedKey key;
     public final String cfName;
     public final IDiskAtomFilter filter;
@@ -147,6 +153,7 @@ public class QueryFilter
         return new Iterator<Cell>()
         {
             private Cell next;
+            private int tombstoneCount = 0;                                    
                                                                                
                                                                                
             
                                                                                
                                                                                
                                                                                
             
             public boolean hasNext()                                           
                                                                                
                                                                                
             
             {                                                                  
                                                                                
                                                                                
             
@@ -181,6 +188,19 @@ public class QueryFilter                                   
                                                                                
                                                                                
             
                     }                                                          
                                                                                
                                                                                
             
                     else                                                       
                                                                                
                                                                                
             
                     {                                                          
                                                                                
                                                                                
             
+                        tombstoneCount++;                                      
                                                                                
                                                                                
             
+                        if (tombstoneCount > 
DatabaseDescriptor.getTombstoneFailureThreshold())                              
                                                                                
                                               
+                        {                                                      
                                                                                
                                                                                
             
+                            Tracing.trace("Scanned over {} tombstones; query 
aborted (see tombstone_failure_threshold)",                                     
                                                                                
               
+                                    
DatabaseDescriptor.getTombstoneFailureThreshold());                             
                                                                                
                                                        
+                            String msg = String.format("Scanned over %d 
tombstones in %s.%s for key: %1.512s; query aborted (see 
tombstone_failure_threshold).",                                                 
                                           
+                                    
DatabaseDescriptor.getTombstoneFailureThreshold(),                              
                                                                                
                                                        
+                                    returnCF.metadata().ksName,                
                                                                                
                                                                                
             
+                                    returnCF.metadata().cfName,                
                                                                                
                                                                                
             
+                                    "unknown");                                
                                                                                
                                                                                
             
+                            logger.error(msg);                                 
                                                                                
                                                                                
             
+                            throw new TombstoneOverwhelmingException();        
                                                                                
                                                                                
             
+                        }                                                      
                                                                                
                                                                                
             
                         returnCF.addAtom(atom);                                
                                                                                
                                                                                
             
                     }                                                          
                                                                                
                                                                                
             
                 }
{code}

> MemtableReclaimMemory can get stuck because of lack of timeout in 
> getTopLevelColumns()
> --------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-13432
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13432
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Corentin Chary
>             Fix For: 2.1.x
>
>
> This might affect 3.x too, I'm not sure.
> {code}
> $ nodetool tpstats
> Pool Name                    Active   Pending      Completed   Blocked  All 
> time blocked
> MutationStage                     0         0       32135875         0        
>          0
> ReadStage                       114         0       29492940         0        
>          0
> RequestResponseStage              0         0       86090931         0        
>          0
> ReadRepairStage                   0         0         166645         0        
>          0
> CounterMutationStage              0         0              0         0        
>          0
> MiscStage                         0         0              0         0        
>          0
> HintedHandoff                     0         0             47         0        
>          0
> GossipStage                       0         0         188769         0        
>          0
> CacheCleanupExecutor              0         0              0         0        
>          0
> InternalResponseStage             0         0              0         0        
>          0
> CommitLogArchiver                 0         0              0         0        
>          0
> CompactionExecutor                0         0          86835         0        
>          0
> ValidationExecutor                0         0              0         0        
>          0
> MigrationStage                    0         0              0         0        
>          0                                    
> AntiEntropyStage                  0         0              0         0        
>          0                                    
> PendingRangeCalculator            0         0             92         0        
>          0                                    
> Sampler                           0         0              0         0        
>          0                                    
> MemtableFlushWriter               0         0            563         0        
>          0                                    
> MemtablePostFlush                 0         0           1500         0        
>          0                                    
> MemtableReclaimMemory             1        29            534         0        
>          0                                    
> Native-Transport-Requests        41         0       54819182         0        
>       1896                            
> {code}
> {code}
> "MemtableReclaimMemory:195" - Thread t@6268
>    java.lang.Thread.State: WAITING
>       at sun.misc.Unsafe.park(Native Method)
>       at java.util.concurrent.locks.LockSupport.park(LockSupport.java:304)
>       at 
> org.apache.cassandra.utils.concurrent.WaitQueue$AbstractSignal.awaitUninterruptibly(WaitQueue.java:283)
>       at 
> org.apache.cassandra.utils.concurrent.OpOrder$Barrier.await(OpOrder.java:417)
>       at 
> org.apache.cassandra.db.ColumnFamilyStore$Flush$1.runMayThrow(ColumnFamilyStore.java:1151)
>       at 
> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
>       at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>       at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>       at java.lang.Thread.run(Thread.java:745)
>    Locked ownable synchronizers:
>       - locked <6e7b1160> (a java.util.concurrent.ThreadPoolExecutor$Worker)
> "SharedPool-Worker-195" - Thread t@989
>    java.lang.Thread.State: RUNNABLE
>       at 
> org.apache.cassandra.db.RangeTombstoneList.addInternal(RangeTombstoneList.java:690)
>       at 
> org.apache.cassandra.db.RangeTombstoneList.insertFrom(RangeTombstoneList.java:650)
>       at 
> org.apache.cassandra.db.RangeTombstoneList.add(RangeTombstoneList.java:171)
>       at 
> org.apache.cassandra.db.RangeTombstoneList.add(RangeTombstoneList.java:143)
>       at org.apache.cassandra.db.DeletionInfo.add(DeletionInfo.java:240)
>       at 
> org.apache.cassandra.db.ArrayBackedSortedColumns.delete(ArrayBackedSortedColumns.java:483)
>       at org.apache.cassandra.db.ColumnFamily.addAtom(ColumnFamily.java:153)
>       at 
> org.apache.cassandra.db.filter.QueryFilter$2.getNext(QueryFilter.java:184)
>       at 
> org.apache.cassandra.db.filter.QueryFilter$2.hasNext(QueryFilter.java:156)
>       at 
> org.apache.cassandra.utils.MergeIterator$Candidate.advance(MergeIterator.java:146)
>       at 
> org.apache.cassandra.utils.MergeIterator$ManyToOne.advance(MergeIterator.java:125)
>       at 
> org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:99)
>       at 
> com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
>       at 
> com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
>       at 
> org.apache.cassandra.db.filter.SliceQueryFilter.collectReducedColumns(SliceQueryFilter.java:263)
>       at 
> org.apache.cassandra.db.filter.QueryFilter.collateColumns(QueryFilter.java:108)
>       at 
> org.apache.cassandra.db.filter.QueryFilter.collateOnDiskAtom(QueryFilter.java:82)
>       at 
> org.apache.cassandra.db.filter.QueryFilter.collateOnDiskAtom(QueryFilter.java:69)
>       at 
> org.apache.cassandra.db.CollationController.collectAllData(CollationController.java:316)
>       at 
> org.apache.cassandra.db.CollationController.getTopLevelColumns(CollationController.java:62)
>       at 
> org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:2015)
>       at 
> org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1858)
>       at org.apache.cassandra.db.Keyspace.getRow(Keyspace.java:353)
>       at 
> org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:85)
>       at 
> org.apache.cassandra.db.ReadVerbHandler.doVerb(ReadVerbHandler.java:47)
>       at 
> org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:64)
>       at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>       at 
> org.apache.cassandra.concurrent.AbstractTracingAwareExecutorService$FutureTask.run(AbstractTracingAwareExecutorService.java:164)
>       at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105)
>       at java.lang.Thread.run(Thread.java:745)
>    Locked ownable synchronizers:
>       - None
> "SharedPool-Worker-206" - Thread t@1014
>    java.lang.Thread.State: RUNNABLE
>       at 
> org.apache.cassandra.db.RangeTombstoneList.addInternal(RangeTombstoneList.java:690)
>       at 
> org.apache.cassandra.db.RangeTombstoneList.insertFrom(RangeTombstoneList.java:650)
>       at 
> org.apache.cassandra.db.RangeTombstoneList.add(RangeTombstoneList.java:171)
>       at 
> org.apache.cassandra.db.RangeTombstoneList.add(RangeTombstoneList.java:143)
>       at org.apache.cassandra.db.DeletionInfo.add(DeletionInfo.java:240)
>       at 
> org.apache.cassandra.db.ArrayBackedSortedColumns.delete(ArrayBackedSortedColumns.java:483)
>       at org.apache.cassandra.db.ColumnFamily.addAtom(ColumnFamily.java:153)
>       at 
> org.apache.cassandra.db.filter.QueryFilter$2.getNext(QueryFilter.java:184)
>       at 
> org.apache.cassandra.db.filter.QueryFilter$2.hasNext(QueryFilter.java:156)
>       at 
> org.apache.cassandra.utils.MergeIterator$Candidate.advance(MergeIterator.java:146)
>       at 
> org.apache.cassandra.utils.MergeIterator$ManyToOne.<init>(MergeIterator.java:89)
>       at org.apache.cassandra.utils.MergeIterator.get(MergeIterator.java:48)
>       at 
> org.apache.cassandra.db.filter.QueryFilter.collateColumns(QueryFilter.java:105)
>       at 
> org.apache.cassandra.db.filter.QueryFilter.collateOnDiskAtom(QueryFilter.java:82)
>       at 
> org.apache.cassandra.db.filter.QueryFilter.collateOnDiskAtom(QueryFilter.java:69)
>       at 
> org.apache.cassandra.db.CollationController.collectAllData(CollationController.java:316)
>       at 
> org.apache.cassandra.db.CollationController.getTopLevelColumns(CollationController.java:62)
>       at 
> org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:2015)
>       at 
> org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1858)
>       at org.apache.cassandra.db.Keyspace.getRow(Keyspace.java:353)
>       at 
> org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:85)
>       at 
> org.apache.cassandra.db.ReadVerbHandler.doVerb(ReadVerbHandler.java:47)
>       at 
> org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:64)
>       at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>       at 
> org.apache.cassandra.concurrent.AbstractTracingAwareExecutorService$FutureTask.run(AbstractTracingAwareExecutorService.java:164)
>       at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105)
>       at java.lang.Thread.run(Thread.java:745)
>    Locked ownable synchronizers:
>       - None
> {code}
> As you can see MemtableReclaimMemory is waiting on the read barrier to be 
> released, but there are two queries currently being executed which are 
> locking this.
> Since most of the time is spent pretty low in the stack, these read 
> operations will never timeout (they are reading rows with tons of tombstones).
> We also can easily detect or purge the offending line because there is no 
> easy way to find out which partition is currently being read.
> The TombstoneFailureThreshold should also protect us, but it is probably 
> being checked too high in the call stack.
> Looks like RangeTombstoneList or DeletionInfo should also check for 
> DatabaseDescriptor.getTombstoneFailureThreshold()



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Reply via email to