Repository: cassandra
Updated Branches:
  refs/heads/cassandra-2.1 8602fe8ce -> 86b6ec529


Add option to do more aggressive tombstone compaction.

Patch by pauloricardomg; reviewed by marcuse for CASSANDRA-6563


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/367c7419
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/367c7419
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/367c7419

Branch: refs/heads/cassandra-2.1
Commit: 367c741931c2a20eb2213650313dc238e8b0f3aa
Parents: 9376fdd
Author: Marcus Eriksson <[email protected]>
Authored: Tue May 27 10:20:29 2014 +0200
Committer: Marcus Eriksson <[email protected]>
Committed: Tue May 27 10:39:17 2014 +0200

----------------------------------------------------------------------
 CHANGES.txt                                     |   1 +
 doc/cql3/CQL.textile                            |  21 ++--
 pylib/cqlshlib/cql3handling.py                  |   2 +-
 .../compaction/AbstractCompactionStrategy.java  |  21 ++++
 .../db/compaction/CompactionsTest.java          | 104 ++++++++++++++++---
 5 files changed, 123 insertions(+), 26 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/cassandra/blob/367c7419/CHANGES.txt
----------------------------------------------------------------------
diff --git a/CHANGES.txt b/CHANGES.txt
index 42a1148..6a16cae 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -3,6 +3,7 @@
  * Support selecting multiple rows in a partition using IN (CASSANDRA-6875)
  * cqlsh: always emphasize the partition key in DESC output (CASSANDRA-7274)
  * Copy compaction options to make sure they are reloaded (CASSANDRA-7290)
+ * Add option to do more aggressive tombstone compactions (CASSANDRA-6563)
 
 2.0.8
  * Always reallocate buffers in HSHA (CASSANDRA-6285)

http://git-wip-us.apache.org/repos/asf/cassandra/blob/367c7419/doc/cql3/CQL.textile
----------------------------------------------------------------------
diff --git a/doc/cql3/CQL.textile b/doc/cql3/CQL.textile
index 3c64bc6..393dc0d 100644
--- a/doc/cql3/CQL.textile
+++ b/doc/cql3/CQL.textile
@@ -335,16 +335,17 @@ h4(#compactionOptions). @compaction@ options
 
 The @compaction@ property must at least define the @'class'@ sub-option, that defines the compaction strategy class to use. The default supported class are @'SizeTieredCompactionStrategy'@ and @'LeveledCompactionStrategy'@. Custom strategy can be provided by specifying the full class name as a "string constant":#constants. The rest of the sub-options depends on the chosen class. The sub-options supported by the default classes are:
 
-|_. option                        |_. supported compaction strategy |_. default |_. description |
-| @enabled@                       | _all_                           | true      | A boolean denoting whether compaction should be enabled or not.|
-| @tombstone_threshold@           | _all_                           | 0.2       | A ratio such that if a sstable has more than this ratio of gcable tombstones over all contained columns, the sstable will be compacted (with no other sstables) for the purpose of purging those tombstones. |
-| @tombstone_compaction_interval@ | _all_                           | 1 day     | The minimum time to wait after an sstable creation time before considering it for "tombstone compaction", where "tombstone compaction" is the compaction triggered if the sstable has more gcable tombstones than @tombstone_threshold@. |
-| @min_sstable_size@              | SizeTieredCompactionStrategy    | 50MB      | The size tiered strategy groups SSTables to compact in buckets. A bucket groups SSTables that differs from less than 50% in size.  However, for small sizes, this would result in a bucketing that is too fine grained. @min_sstable_size@ defines a size threshold (in bytes) below which all SSTables belong to one unique bucket|
-| @min_threshold@                 | SizeTieredCompactionStrategy    | 4         | Minimum number of SSTables needed to start a minor compaction.|
-| @max_threshold@                 | SizeTieredCompactionStrategy    | 32        | Maximum number of SSTables processed by one minor compaction.|
-| @bucket_low@                    | SizeTieredCompactionStrategy    | 0.5       | Size tiered consider sstables to be within the same bucket if their size is within [average_size * @bucket_low@, average_size * @bucket_high@ ] (i.e the default groups sstable whose sizes diverges by at most 50%)|
-| @bucket_high@                   | SizeTieredCompactionStrategy    | 1.5       | Size tiered consider sstables to be within the same bucket if their size is within [average_size * @bucket_low@, average_size * @bucket_high@ ] (i.e the default groups sstable whose sizes diverges by at most 50%).|
-| @sstable_size_in_mb@            | LeveledCompactionStrategy       | 5MB       | The target size (in MB) for sstables in the leveled strategy. Note that while sstable sizes should stay less or equal to @sstable_size_in_mb@, it is possible to exceptionally have a larger sstable as during compaction, data for a given partition key are never split into 2 sstables|
+|_. option                         |_. supported compaction strategy |_. default |_. description |
+| @enabled@                        | _all_                           | true      | A boolean denoting whether compaction should be enabled or not.|
+| @tombstone_threshold@            | _all_                           | 0.2       | A ratio such that if a sstable has more than this ratio of gcable tombstones over all contained columns, the sstable will be compacted (with no other sstables) for the purpose of purging those tombstones. |
+| @tombstone_compaction_interval@  | _all_                           | 1 day     | The minimum time to wait after an sstable creation time before considering it for "tombstone compaction", where "tombstone compaction" is the compaction triggered if the sstable has more gcable tombstones than @tombstone_threshold@. |
+| @unchecked_tombstone_compaction@ | _all_                           | false    | Setting this to true enables more aggressive tombstone compactions - single sstable tombstone compactions will run without checking how likely it is that they will be successful. |
+| @min_sstable_size@               | SizeTieredCompactionStrategy    | 50MB      | The size tiered strategy groups SSTables to compact in buckets. A bucket groups SSTables that differs from less than 50% in size.  However, for small sizes, this would result in a bucketing that is too fine grained. @min_sstable_size@ defines a size threshold (in bytes) below which all SSTables belong to one unique bucket|
+| @min_threshold@                  | SizeTieredCompactionStrategy    | 4         | Minimum number of SSTables needed to start a minor compaction.|
+| @max_threshold@                  | SizeTieredCompactionStrategy    | 32        | Maximum number of SSTables processed by one minor compaction.|
+| @bucket_low@                     | SizeTieredCompactionStrategy    | 0.5       | Size tiered consider sstables to be within the same bucket if their size is within [average_size * @bucket_low@, average_size * @bucket_high@ ] (i.e the default groups sstable whose sizes diverges by at most 50%)|
+| @bucket_high@                    | SizeTieredCompactionStrategy    | 1.5       | Size tiered consider sstables to be within the same bucket if their size is within [average_size * @bucket_low@, average_size * @bucket_high@ ] (i.e the default groups sstable whose sizes diverges by at most 50%).|
+| @sstable_size_in_mb@             | LeveledCompactionStrategy       | 5MB       | The target size (in MB) for sstables in the leveled strategy. Note that while sstable sizes should stay less or equal to @sstable_size_in_mb@, it is possible to exceptionally have a larger sstable as during compaction, data for a given partition key are never split into 2 sstables|
 
 
 For the @compression@ property, the following default sub-options are available:
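
Illustration (not part of the commit): the effect of unchecked_tombstone_compaction described in the table above can be sketched as a standalone decision function. This is a minimal sketch under stated assumptions, not the code in AbstractCompactionStrategy. The class name TombstoneCandidateSketch, the method isCandidate, and the overlapsOtherSSTables flag are hypothetical, and the overlap handling is reduced to a yes/no flag, whereas the real strategy inspects the overlapping sstables before deciding.

```java
// Hypothetical, simplified model of the single-sstable tombstone compaction check.
class TombstoneCandidateSketch {
    // Default from the option table: 0.2
    static final float DEFAULT_TOMBSTONE_THRESHOLD = 0.2f;

    /**
     * Decides whether one sstable should be tombstone-compacted on its own.
     *
     * @param droppableRatio        estimated ratio of gcable tombstones in the sstable
     * @param tombstoneThreshold    value of tombstone_threshold
     * @param unchecked             value of unchecked_tombstone_compaction
     * @param overlapsOtherSSTables assumption: stands in for the real overlap analysis
     */
    static boolean isCandidate(double droppableRatio,
                               float tombstoneThreshold,
                               boolean unchecked,
                               boolean overlapsOtherSSTables) {
        // Below the threshold the sstable is never a candidate, checked or not.
        if (droppableRatio <= tombstoneThreshold)
            return false;

        // With the option on, the overlap check is skipped entirely.
        if (unchecked)
            return true;

        // Simplification: any overlap rejects the sstable in this sketch.
        return !overlapsOtherSSTables;
    }

    public static void main(String[] args) {
        float t = DEFAULT_TOMBSTONE_THRESHOLD;
        System.out.println(isCandidate(0.5, t, false, true));  // false
        System.out.println(isCandidate(0.5, t, true, true));   // true
        System.out.println(isCandidate(0.1, t, true, false));  // false
    }
}
```

The third call shows that the option does not bypass tombstone_threshold: it only removes the overlap check that runs after the threshold test.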

http://git-wip-us.apache.org/repos/asf/cassandra/blob/367c7419/pylib/cqlshlib/cql3handling.py
----------------------------------------------------------------------
diff --git a/pylib/cqlshlib/cql3handling.py b/pylib/cqlshlib/cql3handling.py
index 9b78638..b2557fe 100644
--- a/pylib/cqlshlib/cql3handling.py
+++ b/pylib/cqlshlib/cql3handling.py
@@ -79,7 +79,7 @@ class Cql3ParsingRuleSet(CqlParsingRuleSet):
         # (CQL3 option name, schema_columnfamilies column name (or None if same),
         #  list of known map keys)
         ('compaction', 'compaction_strategy_options',
-            ('class', 'max_threshold', 'tombstone_compaction_interval', 'tombstone_threshold', 'enabled')),
+            ('class', 'max_threshold', 'tombstone_compaction_interval', 'tombstone_threshold', 'enabled', 'unchecked_tombstone_compaction')),
         ('compression', 'compression_parameters',
             ('sstable_compression', 'chunk_length_kb', 'crc_check_chance')),
     )

http://git-wip-us.apache.org/repos/asf/cassandra/blob/367c7419/src/java/org/apache/cassandra/db/compaction/AbstractCompactionStrategy.java
----------------------------------------------------------------------
diff --git a/src/java/org/apache/cassandra/db/compaction/AbstractCompactionStrategy.java b/src/java/org/apache/cassandra/db/compaction/AbstractCompactionStrategy.java
index 0a857b3..dc7e43a 100644
--- a/src/java/org/apache/cassandra/db/compaction/AbstractCompactionStrategy.java
+++ b/src/java/org/apache/cassandra/db/compaction/AbstractCompactionStrategy.java
@@ -19,6 +19,7 @@ package org.apache.cassandra.db.compaction;
 
 import java.util.*;
 
+import com.google.common.collect.ImmutableMap;
 import com.google.common.base.Predicate;
 import com.google.common.collect.ImmutableMap;
 import com.google.common.collect.Iterables;
@@ -49,8 +50,12 @@ public abstract class AbstractCompactionStrategy
     protected static final float DEFAULT_TOMBSTONE_THRESHOLD = 0.2f;
     // minimum interval needed to perform tombstone removal compaction in seconds, default 86400 or 1 day.
     protected static final long DEFAULT_TOMBSTONE_COMPACTION_INTERVAL = 86400;
+    protected static final boolean DEFAULT_UNCHECKED_TOMBSTONE_COMPACTION_OPTION = false;
+
     protected static final String TOMBSTONE_THRESHOLD_OPTION = "tombstone_threshold";
     protected static final String TOMBSTONE_COMPACTION_INTERVAL_OPTION = "tombstone_compaction_interval";
+    // disable range overlap check when deciding if an SSTable is candidate for tombstone compaction (CASSANDRA-6563)
+    protected static final String UNCHECKED_TOMBSTONE_COMPACTION_OPTION = "unchecked_tombstone_compaction";
     protected static final String COMPACTION_ENABLED = "enabled";
 
     public final Map<String, String> options;
@@ -58,6 +63,7 @@ public abstract class AbstractCompactionStrategy
     protected final ColumnFamilyStore cfs;
     protected float tombstoneThreshold;
     protected long tombstoneCompactionInterval;
+    protected boolean uncheckedTombstoneCompaction;
 
     /**
      * pause/resume/getNextBackgroundTask must synchronize.  This guarantees that after pause completes,
@@ -88,6 +94,8 @@ public abstract class AbstractCompactionStrategy
             tombstoneThreshold = optionValue == null ? DEFAULT_TOMBSTONE_THRESHOLD : Float.parseFloat(optionValue);
             optionValue = options.get(TOMBSTONE_COMPACTION_INTERVAL_OPTION);
             tombstoneCompactionInterval = optionValue == null ? DEFAULT_TOMBSTONE_COMPACTION_INTERVAL : Long.parseLong(optionValue);
+            optionValue = options.get(UNCHECKED_TOMBSTONE_COMPACTION_OPTION);
+            uncheckedTombstoneCompaction = optionValue == null ? DEFAULT_UNCHECKED_TOMBSTONE_COMPACTION_OPTION : Boolean.parseBoolean(optionValue);
             if (!shouldBeEnabled())
                 this.disable();
         }
@@ -96,6 +104,7 @@ public abstract class AbstractCompactionStrategy
             logger.warn("Error setting compaction strategy options ({}), defaults will be used", e.getMessage());
             tombstoneThreshold = DEFAULT_TOMBSTONE_THRESHOLD;
             tombstoneCompactionInterval = DEFAULT_TOMBSTONE_COMPACTION_INTERVAL;
+            uncheckedTombstoneCompaction = DEFAULT_UNCHECKED_TOMBSTONE_COMPACTION_OPTION;
         }
     }
 
@@ -289,6 +298,10 @@ public abstract class AbstractCompactionStrategy
         if (droppableRatio <= tombstoneThreshold)
             return false;
 
+        //sstable range overlap check is disabled. See CASSANDRA-6563.
+        if (uncheckedTombstoneCompaction)
+            return true;
+
         Set<SSTableReader> overlaps = cfs.getOverlappingSSTables(Collections.singleton(sstable));
         if (overlaps.isEmpty())
         {
@@ -358,6 +371,13 @@ public abstract class AbstractCompactionStrategy
             }
         }
 
+        String unchecked = options.get(UNCHECKED_TOMBSTONE_COMPACTION_OPTION);
+        if (unchecked != null)
+        {
+            if (!unchecked.equalsIgnoreCase("true") && !unchecked.equalsIgnoreCase("false"))
+                throw new ConfigurationException(String.format("'%s' should be either 'true' or 'false', not '%s'",UNCHECKED_TOMBSTONE_COMPACTION_OPTION, unchecked));
+        }
+
         String compactionEnabled = options.get(COMPACTION_ENABLED);
         if (compactionEnabled != null)
         {
@@ -369,6 +389,7 @@ public abstract class AbstractCompactionStrategy
         Map<String, String> uncheckedOptions = new HashMap<String, String>(options);
         uncheckedOptions.remove(TOMBSTONE_THRESHOLD_OPTION);
         uncheckedOptions.remove(TOMBSTONE_COMPACTION_INTERVAL_OPTION);
+        uncheckedOptions.remove(UNCHECKED_TOMBSTONE_COMPACTION_OPTION);
         uncheckedOptions.remove(COMPACTION_ENABLED);
         return uncheckedOptions;
     }
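
Illustration (not part of the commit): the hunks above validate the new option strictly and then parse it with a default of false. A minimal standalone sketch of that pattern follows. The class name UncheckedOptionSketch and the methods validate and parse are hypothetical, and IllegalArgumentException stands in for Cassandra's ConfigurationException so the example compiles without Cassandra on the classpath.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the option handling shown in the diff.
class UncheckedOptionSketch {
    static final String OPTION = "unchecked_tombstone_compaction";
    static final boolean DEFAULT = false;

    // Rejects anything other than "true"/"false" (case-insensitive); absent is allowed.
    static void validate(Map<String, String> options) {
        String value = options.get(OPTION);
        if (value != null
            && !value.equalsIgnoreCase("true")
            && !value.equalsIgnoreCase("false")) {
            // Assumption: the real code throws ConfigurationException here.
            throw new IllegalArgumentException(String.format(
                "'%s' should be either 'true' or 'false', not '%s'", OPTION, value));
        }
    }

    // Falls back to the default when the option is not set.
    static boolean parse(Map<String, String> options) {
        String value = options.get(OPTION);
        return value == null ? DEFAULT : Boolean.parseBoolean(value);
    }

    public static void main(String[] args) {
        Map<String, String> options = new HashMap<>();
        System.out.println(parse(options)); // false (default)

        options.put(OPTION, "TRUE");
        validate(options);
        System.out.println(parse(options)); // true

        options.put(OPTION, "yes");
        try {
            validate(options);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The explicit check matters because Boolean.parseBoolean returns false for any string other than "true" (ignoring case), so a typo such as "yes" would otherwise silently leave the option off.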

http://git-wip-us.apache.org/repos/asf/cassandra/blob/367c7419/test/unit/org/apache/cassandra/db/compaction/CompactionsTest.java
----------------------------------------------------------------------
diff --git a/test/unit/org/apache/cassandra/db/compaction/CompactionsTest.java b/test/unit/org/apache/cassandra/db/compaction/CompactionsTest.java
index 7b91bed..98eacbf 100644
--- a/test/unit/org/apache/cassandra/db/compaction/CompactionsTest.java
+++ b/test/unit/org/apache/cassandra/db/compaction/CompactionsTest.java
@@ -26,7 +26,6 @@ import java.util.concurrent.TimeUnit;
 
 import com.google.common.base.Function;
 import com.google.common.collect.Iterables;
-import com.google.common.collect.SetMultimap;
 import com.google.common.collect.Sets;
 import org.junit.Test;
 import org.junit.runner.RunWith;
@@ -35,7 +34,6 @@ import org.apache.cassandra.OrderedJUnit4ClassRunner;
 import org.apache.cassandra.SchemaLoader;
 import org.apache.cassandra.Util;
 import org.apache.cassandra.db.*;
-import org.apache.cassandra.db.columniterator.IdentityQueryFilter;
 import org.apache.cassandra.db.columniterator.OnDiskAtomIterator;
 import org.apache.cassandra.db.filter.QueryFilter;
 import org.apache.cassandra.db.marshal.CompositeType;
@@ -54,12 +52,13 @@ import static org.junit.Assert.*;
 @RunWith(OrderedJUnit4ClassRunner.class)
 public class CompactionsTest extends SchemaLoader
 {
+    private static final String STANDARD1 = "Standard1";
     public static final String KEYSPACE1 = "Keyspace1";
 
     public ColumnFamilyStore testSingleSSTableCompaction(String strategyClassName) throws Exception
     {
         Keyspace keyspace = Keyspace.open(KEYSPACE1);
-        ColumnFamilyStore store = keyspace.getColumnFamilyStore("Standard1");
+        ColumnFamilyStore store = keyspace.getColumnFamilyStore(STANDARD1);
         store.clearUnsafe();
         store.metadata.gcGraceSeconds(1);
         store.setCompactionStrategyClass(strategyClassName);
@@ -67,18 +66,7 @@ public class CompactionsTest extends SchemaLoader
         // disable compaction while flushing
         store.disableAutoCompaction();
 
-        long timestamp = System.currentTimeMillis();
-        for (int i = 0; i < 10; i++)
-        {
-            DecoratedKey key = Util.dk(Integer.toString(i));
-            RowMutation rm = new RowMutation(KEYSPACE1, key.key);
-            for (int j = 0; j < 10; j++)
-                rm.add("Standard1", ByteBufferUtil.bytes(Integer.toString(j)),
-                       ByteBufferUtil.EMPTY_BYTE_BUFFER,
-                       timestamp,
-                       j > 0 ? 3 : 0); // let first column never expire, since deleting all columns does not produce sstable
-            rm.apply();
-        }
+        long timestamp = populate(KEYSPACE1, STANDARD1, 0, 9, 3); //ttl=3s
         store.forceBlockingFlush();
         assertEquals(1, store.getSSTables().size());
         long originalSize = store.getSSTables().iterator().next().uncompressedLength();
@@ -103,6 +91,22 @@ public class CompactionsTest extends SchemaLoader
         return store;
     }
 
+    private long populate(String ks, String cf, int startRowKey, int endRowKey, int ttl) {
+        long timestamp = System.currentTimeMillis();
+        for (int i = startRowKey; i <= endRowKey; i++)
+        {
+            DecoratedKey key = Util.dk(Integer.toString(i));
+            RowMutation rm = new RowMutation(ks, key.key);
+            for (int j = 0; j < 10; j++)
+                rm.add(cf, ByteBufferUtil.bytes(Integer.toString(j)),
+                       ByteBufferUtil.EMPTY_BYTE_BUFFER,
+                       timestamp,
+                       j > 0 ? ttl : 0); // let first column never expire, since deleting all columns does not produce sstable
+            rm.apply();
+        }
+        return timestamp;
+    }
+
     /**
      * Test to see if sstable has enough expired columns, it is compacted itself.
      */
@@ -158,6 +162,76 @@ public class CompactionsTest extends SchemaLoader
         assert !iter.hasNext();
     }
 
+    @Test
+    public void testUncheckedTombstoneSizeTieredCompaction() throws Exception
+    {
+        Keyspace keyspace = Keyspace.open(KEYSPACE1);
+        ColumnFamilyStore store = keyspace.getColumnFamilyStore(STANDARD1);
+        store.clearUnsafe();
+        store.metadata.gcGraceSeconds(1);
+        store.metadata.compactionStrategyOptions.put("tombstone_compaction_interval", "1");
+        store.metadata.compactionStrategyOptions.put("unchecked_tombstone_compaction", "false");
+        store.reload();
+        store.setCompactionStrategyClass(SizeTieredCompactionStrategy.class.getName());
+
+        // disable compaction while flushing
+        store.disableAutoCompaction();
+
+        //Populate sstable1 with keys [0..9]
+        populate(KEYSPACE1, STANDARD1, 0, 9, 3); //ttl=3s
+        store.forceBlockingFlush();
+
+        //Populate sstable2 with keys [10..19] (keys do not overlap with SSTable1)
+        long timestamp2 = populate(KEYSPACE1, STANDARD1, 10, 19, 3); //ttl=3s
+        store.forceBlockingFlush();
+
+        assertEquals(2, store.getSSTables().size());
+
+        Iterator<SSTableReader> it = store.getSSTables().iterator();
+        long originalSize1 = it.next().uncompressedLength();
+        long originalSize2 = it.next().uncompressedLength();
+
+        // wait enough to force single compaction
+        TimeUnit.SECONDS.sleep(5);
+
+        // enable compaction, submit background and wait for it to complete
+        store.enableAutoCompaction();
+        FBUtilities.waitOnFutures(CompactionManager.instance.submitBackground(store));
+        while (CompactionManager.instance.getPendingTasks() > 0 || CompactionManager.instance.getActiveCompactions() > 0)
+            TimeUnit.SECONDS.sleep(1);
+
+        // even though both sstables were candidate for tombstone compaction
+        // it was not executed because they have an overlapping token range
+        assertEquals(2, store.getSSTables().size());
+        it = store.getSSTables().iterator();
+        long newSize1 = it.next().uncompressedLength();
+        long newSize2 = it.next().uncompressedLength();
+        assertEquals("candidate sstable should not be tombstone-compacted because its key range overlap with other sstable",
+                      originalSize1, newSize1);
+        assertEquals("candidate sstable should not be tombstone-compacted because its key range overlap with other sstable",
+                      originalSize2, newSize2);
+
+        // now let's enable the magic property
+        store.metadata.compactionStrategyOptions.put("unchecked_tombstone_compaction", "true");
+        store.reload();
+
+        //submit background task again and wait for it to complete
+        FBUtilities.waitOnFutures(CompactionManager.instance.submitBackground(store));
+        while (CompactionManager.instance.getPendingTasks() > 0 || CompactionManager.instance.getActiveCompactions() > 0)
+            TimeUnit.SECONDS.sleep(1);
+
+        //we still have 2 sstables, since they were not compacted against each other
+        assertEquals(2, store.getSSTables().size());
+        it = store.getSSTables().iterator();
+        newSize1 = it.next().uncompressedLength();
+        newSize2 = it.next().uncompressedLength();
+        assertTrue("should be less than " + originalSize1 + ", but was " + newSize1, newSize1 < originalSize1);
+        assertTrue("should be less than " + originalSize2 + ", but was " + newSize2, newSize2 < originalSize2);
+
+        // make sure max timestamp of compacted sstables is recorded properly after compaction.
+        assertMaxTimestamp(store, timestamp2);
+    }
+
     public static void assertMaxTimestamp(ColumnFamilyStore cfs, long maxTimestampExpected)
     {
         long maxTimestampObserved = Long.MIN_VALUE;
