This is an automated email from the ASF dual-hosted git repository.
smiklosovic pushed a commit to branch trunk
in repository https://gitbox.apache.org/repos/asf/cassandra.git
The following commit(s) were added to refs/heads/trunk by this push:
new 3dd460bf34 Remove traces of sampling and auto training for Zstd dictionary compression
3dd460bf34 is described below
commit 3dd460bf34c77efc6a1e430057034c12bae86429
Author: Stefan Miklosovic <[email protected]>
AuthorDate: Tue Feb 3 11:55:08 2026 +1100
Remove traces of sampling and auto training for Zstd dictionary compression
This is unnecessary code as the current implementation of CEP-54 is not
implementing / exercising these features.
patch by Stefan Miklosovic; reviewed by Yifan Cai for CASSANDRA-21154
---
conf/cassandra.yaml | 10 --
conf/cassandra_latest.yaml | 10 --
.../pages/managing/operating/compression.adoc | 32 +---
src/java/org/apache/cassandra/config/Config.java | 4 -
.../cassandra/config/DatabaseDescriptor.java | 14 --
.../compression/CompressionDictionaryManager.java | 24 +--
.../CompressionDictionaryTrainingConfig.java | 13 --
.../compression/ICompressionDictionaryTrainer.java | 23 +--
.../db/compression/ZstdDictionaryTrainer.java | 44 +----
.../io/compress/CompressedSequentialWriter.java | 7 -
.../CompressionDictionaryIntegrationTest.java | 1 -
.../CompressionDictionaryManagerTest.java | 21 +--
.../CompressionDictionarySchedulerTest.java | 45 ++---
.../CompressionDictionaryTrainingConfigTest.java | 6 -
.../db/compression/SSTableChunkSamplerTest.java | 1 -
.../db/compression/ZstdDictionaryTrainerTest.java | 190 ++++-----------------
.../utils/CompressionDictionaryHelper.java | 4 +-
17 files changed, 65 insertions(+), 384 deletions(-)
diff --git a/conf/cassandra.yaml b/conf/cassandra.yaml
index a1be9ce1a9..6cb3c08db0 100644
--- a/conf/cassandra.yaml
+++ b/conf/cassandra.yaml
@@ -2934,13 +2934,3 @@ compression_dictionary_cache_size: 10
# Expired dictionaries will be removed from memory but can be reloaded if needed.
# Min unit: s
compression_dictionary_cache_expire: 24h
-
-# Enable automatic dictionary training based on sampling of write operations.
-# When enabled, the system will automatically collect samples and train new dictionaries.
-# Manual training via nodetool is always available regardless of this setting.
-compression_dictionary_training_auto_train_enabled: false
-
-# Sampling rate for automatic dictionary training (0.01-1).
-# Value of 0.01 means 1% of writes are sampled. Lower values reduce overhead but may
-# result in less representative sample data for dictionary training.
-compression_dictionary_training_sampling_rate: 0.01
diff --git a/conf/cassandra_latest.yaml b/conf/cassandra_latest.yaml
index 90d7ea0d1d..1d08bf1722 100644
--- a/conf/cassandra_latest.yaml
+++ b/conf/cassandra_latest.yaml
@@ -2673,13 +2673,3 @@ compression_dictionary_cache_size: 10
# Expired dictionaries will be removed from memory but can be reloaded if needed.
# Min unit: s
compression_dictionary_cache_expire: 24h
-
-# Enable automatic dictionary training based on sampling of write operations.
-# When enabled, the system will automatically collect samples and train new dictionaries.
-# Manual training via nodetool is always available regardless of this setting.
-compression_dictionary_training_auto_train_enabled: false
-
-# Sampling rate for automatic dictionary training (0.01-1).
-# Value of 0.01 means 1% of writes are sampled. Lower values reduce overhead but may
-# result in less representative sample data for dictionary training.
-compression_dictionary_training_sampling_rate: 0.01
diff --git a/doc/modules/cassandra/pages/managing/operating/compression.adoc b/doc/modules/cassandra/pages/managing/operating/compression.adoc
index ca028aff95..a5a06c4e00 100644
--- a/doc/modules/cassandra/pages/managing/operating/compression.adoc
+++ b/doc/modules/cassandra/pages/managing/operating/compression.adoc
@@ -110,7 +110,7 @@ dictionaries to maintain optimal compression ratios.
Before dictionary compression can provide optimal results, a compression
dictionary must be trained on representative data samples. Cassandra
-supports both manual and automatic training approaches.
+supports manual training approach for now.
==== Manual Dictionary Training
@@ -141,20 +141,6 @@ nodetool compressiondictionary train --force <keyspace> <table>
This can be useful for testing or when you want to train a dictionary from
limited data during initial setup.
-==== Automatic Dictionary Training
-
-Enable automatic training in `cassandra.yaml`:
-
-[source,yaml]
-----
-compression_dictionary_training_auto_train_enabled: true
-compression_dictionary_training_sampling_rate: 0.01 # 1% of writes
-----
-
-When enabled, Cassandra automatically samples write operations and
-trains dictionaries in the background based on the configured sampling
-rate (range: 0.01-1, where 0.01 = 1% of writes).
-
=== Dictionary Storage and Distribution
Compression dictionaries are stored cluster-wide in the
@@ -298,13 +284,6 @@ next access.
=== Training Configuration
-* `compression_dictionary_training_auto_train_enabled` (default: `false`):
-Enable automatic background dictionary training. When enabled, Cassandra
-samples writes and trains dictionaries automatically.
-* `compression_dictionary_training_sampling_rate` (default: `0.01`):
-Sampling rate for automatic training, range 0.01-1 where 0.01 = 1% of
-writes. Lower values reduce training overhead but may miss data patterns.
-
Example configuration:
[source,yaml]
@@ -314,10 +293,6 @@ compression_dictionary_refresh_interval: 3600
compression_dictionary_refresh_initial_delay: 10
compression_dictionary_cache_size: 10
compression_dictionary_cache_expire: 3600
-
-# Automatic training
-compression_dictionary_training_auto_train_enabled: false
-compression_dictionary_training_sampling_rate: 0.01
----
=== CQL training parameters:
@@ -407,11 +382,6 @@ typically minimal (default 64KB per dictionary × cache size).
`nodetool compressiondictionary train` samples SSTable chunk data and
performs CPU-intensive dictionary training. Consider running training
during off-peak hours.
-* *Automatic Training Impact*: When
-`compression_dictionary_training_auto_train_enabled` is true, write
-operations are sampled based on `compression_dictionary_training_sampling_rate`.
-This adds minimal overhead but should be monitored in write-intensive
-workloads.
* *Dictionary Refresh*: The dictionary refresh process
(`compression_dictionary_refresh_interval`) checks for new dictionaries
cluster-wide. The default 1-hour interval balances freshness with
diff --git a/src/java/org/apache/cassandra/config/Config.java b/src/java/org/apache/cassandra/config/Config.java
index 43cdf116ea..51cc039ec4 100644
--- a/src/java/org/apache/cassandra/config/Config.java
+++ b/src/java/org/apache/cassandra/config/Config.java
@@ -524,10 +524,6 @@ public class Config
public volatile int compression_dictionary_cache_size = 10; // max dictionaries per table
public volatile DurationSpec.IntSecondsBound compression_dictionary_cache_expire = new DurationSpec.IntSecondsBound("24h");
- // Dictionary training settings
- public volatile boolean compression_dictionary_training_auto_train_enabled = false;
- public volatile float compression_dictionary_training_sampling_rate = 0.01f; // samples 1%
-
public DataStorageSpec.LongMebibytesBound paxos_cache_size = null;
public DataStorageSpec.LongMebibytesBound consensus_migration_cache_size = null;
diff --git a/src/java/org/apache/cassandra/config/DatabaseDescriptor.java b/src/java/org/apache/cassandra/config/DatabaseDescriptor.java
index a40db14a6e..ec4e4f1e9e 100644
--- a/src/java/org/apache/cassandra/config/DatabaseDescriptor.java
+++ b/src/java/org/apache/cassandra/config/DatabaseDescriptor.java
@@ -1257,9 +1257,6 @@ public class DatabaseDescriptor
{
throw new ConfigurationException(ex.getMessage());
}
-
- if (conf.compression_dictionary_training_sampling_rate <= 0.0f || conf.compression_dictionary_training_sampling_rate > 1.0f)
- throw new ConfigurationException("Sampling rate has to be between (0.0;1], it is " + conf.compression_dictionary_training_sampling_rate);
}
@VisibleForTesting
@@ -4467,17 +4464,6 @@ public class DatabaseDescriptor
return conf.compression_dictionary_cache_expire.toSeconds();
}
- public static boolean getCompressionDictionaryTrainingAutoTrainEnabled()
- {
- return conf.compression_dictionary_training_auto_train_enabled;
- }
-
-
- public static float getCompressionDictionaryTrainingSamplingRate()
- {
- return conf.compression_dictionary_training_sampling_rate;
- }
-
public static int getStreamingKeepAlivePeriod()
{
return conf.streaming_keep_alive_period.toSeconds();
diff --git a/src/java/org/apache/cassandra/db/compression/CompressionDictionaryManager.java b/src/java/org/apache/cassandra/db/compression/CompressionDictionaryManager.java
index a7875fa3ad..2e05fb9681 100644
--- a/src/java/org/apache/cassandra/db/compression/CompressionDictionaryManager.java
+++ b/src/java/org/apache/cassandra/db/compression/CompressionDictionaryManager.java
@@ -18,7 +18,6 @@
package org.apache.cassandra.db.compression;
-import java.nio.ByteBuffer;
import java.util.List;
import java.util.Map;
import java.util.Set;
@@ -34,7 +33,6 @@ import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.apache.cassandra.config.DataStorageSpec;
-import org.apache.cassandra.config.DatabaseDescriptor;
import org.apache.cassandra.db.ColumnFamilyStore;
import org.apache.cassandra.db.compression.CompressionDictionary.LightweightCompressionDictionary;
import org.apache.cassandra.db.compression.CompressionDictionaryDetailsTabularData.CompressionDictionaryDataObject;
@@ -88,7 +86,7 @@ public class CompressionDictionaryManager implements CompressionDictionaryManage
scheduler.scheduleRefreshTask();
- trainer.start(false, createTrainingConfig());
+ trainer.start(createTrainingConfig());
}
if (registerBookkeeping && isEnabled)
@@ -148,7 +146,7 @@ public class CompressionDictionaryManager implements CompressionDictionaryManage
// Start trainer if it exists
if (trainer != null)
{
- trainer.start(false, createTrainingConfig());
+ trainer.start(createTrainingConfig());
}
return;
}
@@ -165,21 +163,6 @@ public class CompressionDictionaryManager implements CompressionDictionaryManage
}
}
- /**
- * Adds a sample to the dictionary trainer for learning compression patterns.
- * Samples are randomly selected to avoid bias and improve dictionary quality.
- *
- * @param sample the sample data to potentially add for training
- */
- public void addSample(ByteBuffer sample)
- {
- ICompressionDictionaryTrainer dictionaryTrainer = trainer;
- if (dictionaryTrainer != null && dictionaryTrainer.shouldSample())
- {
- dictionaryTrainer.addSample(sample);
- }
- }
-
@Nullable
@Override
public CompressionDictionary getCurrent()
@@ -252,7 +235,7 @@ public class CompressionDictionaryManager implements CompressionDictionaryManage
logger.info("Starting SSTable-based training for {}.{} with {} SSTables",
keyspaceName, tableName, sstables.size());
- trainer.start(true, trainingConfig);
+ trainer.start(trainingConfig);
scheduler.scheduleSSTableBasedTraining(trainer, sstables, trainingConfig, force);
}
@@ -389,7 +372,6 @@ public class CompressionDictionaryManager implements CompressionDictionaryManage
.builder()
.maxDictionarySize(getCompressionDictionaryTrainingMaxDictionarySize(compressionParams, parameters))
.maxTotalSampleSize(getCompressionDictionaryTrainingMaxTotalSampleSize(compressionParams, parameters))
- .samplingRate(DatabaseDescriptor.getCompressionDictionaryTrainingSamplingRate())
.chunkSize(compressionParams.chunkLength())
.build();
}
diff --git a/src/java/org/apache/cassandra/db/compression/CompressionDictionaryTrainingConfig.java b/src/java/org/apache/cassandra/db/compression/CompressionDictionaryTrainingConfig.java
index 794807974b..242490581f 100644
--- a/src/java/org/apache/cassandra/db/compression/CompressionDictionaryTrainingConfig.java
+++ b/src/java/org/apache/cassandra/db/compression/CompressionDictionaryTrainingConfig.java
@@ -28,7 +28,6 @@ public class CompressionDictionaryTrainingConfig
public final int maxDictionarySize;
public final int maxTotalSampleSize;
public final int acceptableTotalSampleSize;
- public final float samplingRate;
public final int chunkSize;
private CompressionDictionaryTrainingConfig(Builder builder)
@@ -36,7 +35,6 @@ public class CompressionDictionaryTrainingConfig
this.maxDictionarySize = builder.maxDictionarySize;
this.maxTotalSampleSize = builder.maxTotalSampleSize;
this.acceptableTotalSampleSize = builder.maxTotalSampleSize / 10 * 8;
- this.samplingRate = builder.samplingRate;
this.chunkSize = builder.chunkSize;
}
@@ -49,7 +47,6 @@ public class CompressionDictionaryTrainingConfig
{
private int maxDictionarySize = 65536; // 64KB default
private int maxTotalSampleSize = 10 * 1024 * 1024; // 10MB total
- private float samplingRate = 0.01f; // Sampling 1%
private int chunkSize = 64 * 1024; // 64KB default
public Builder maxDictionarySize(int size)
@@ -64,15 +61,6 @@ public class CompressionDictionaryTrainingConfig
return this;
}
- public Builder samplingRate(float samplingRate)
- {
- if (samplingRate <= 0.0f || samplingRate > 1.0f)
- throw new IllegalArgumentException("Sampling rate has to be between (0.0;1], it is " + samplingRate);
-
- this.samplingRate = samplingRate;
- return this;
- }
-
public Builder chunkSize(int chunkSize)
{
this.chunkSize = chunkSize;
@@ -83,7 +71,6 @@ public class CompressionDictionaryTrainingConfig
{
Preconditions.checkArgument(maxDictionarySize > 0, "maxDictionarySize must be positive");
Preconditions.checkArgument(maxTotalSampleSize > 0, "maxTotalSampleSize must be positive");
- Preconditions.checkArgument(samplingRate > 0, "samplingRate must be positive");
Preconditions.checkArgument(chunkSize > 0, "chunkSize must be positive");
return new CompressionDictionaryTrainingConfig(this);
}
diff --git a/src/java/org/apache/cassandra/db/compression/ICompressionDictionaryTrainer.java b/src/java/org/apache/cassandra/db/compression/ICompressionDictionaryTrainer.java
index 76a9b02bb2..5fad643fc4 100644
--- a/src/java/org/apache/cassandra/db/compression/ICompressionDictionaryTrainer.java
+++ b/src/java/org/apache/cassandra/db/compression/ICompressionDictionaryTrainer.java
@@ -41,18 +41,11 @@ public interface ICompressionDictionaryTrainer extends AutoCloseable
/**
* Starts the trainer for collecting samples.
*
- * @param manualTraining true if this is manual training, false for automatic
* @param trainingConfig training configuration to use
* @return true if the trainer is started; otherwise false. The trainer is started
- * in any of those conditions: 1. trainer closed; 2. not requested for
- * either manual or auto training; 3. failed to start
+ * in any of those conditions: 1. trainer closed; 2. failed to start
*/
- boolean start(boolean manualTraining, CompressionDictionaryTrainingConfig trainingConfig);
-
- /**
- * @return true if the trainer is ready to take a new sample; otherwise,
false
- */
- boolean shouldSample();
+ boolean start(CompressionDictionaryTrainingConfig trainingConfig);
/**
* Adds a sample to the training dataset.
@@ -122,14 +115,6 @@ public interface ICompressionDictionaryTrainer extends AutoCloseable
*/
void setDictionaryTrainedListener(Consumer<CompressionDictionary> listener);
- /**
- * Updates the sampling rate for this trainer.
- *
- * @param newSamplingRate the new sampling rate. For exmaple, 0.01 - sample 1% of data,
- * 1 = sample every time (100%), 0.5 - sample 50% of data.
- */
- void updateSamplingRate(float newSamplingRate);
-
/**
* Factory method to create appropriate trainer based on compression parameters.
*
@@ -149,7 +134,7 @@ public interface ICompressionDictionaryTrainer extends AutoCloseable
throw new IllegalArgumentException("Compressor does not support dictionary training: " + params.getSstableCompressor());
}
- IDictionaryCompressor dictionaryCompressor = (IDictionaryCompressor) compressor;
+ IDictionaryCompressor<?> dictionaryCompressor = (IDictionaryCompressor<?>) compressor;
return dictionaryCompressor.acceptableDictionaryKind().createTrainer(keyspaceName, tableName, compressor);
}
@@ -159,6 +144,6 @@ public interface ICompressionDictionaryTrainer extends AutoCloseable
SAMPLING,
TRAINING,
COMPLETED,
- FAILED;
+ FAILED
}
}
diff --git a/src/java/org/apache/cassandra/db/compression/ZstdDictionaryTrainer.java b/src/java/org/apache/cassandra/db/compression/ZstdDictionaryTrainer.java
index 1f370143ef..a49c779ceb 100644
--- a/src/java/org/apache/cassandra/db/compression/ZstdDictionaryTrainer.java
+++ b/src/java/org/apache/cassandra/db/compression/ZstdDictionaryTrainer.java
@@ -19,7 +19,6 @@
package org.apache.cassandra.db.compression;
import java.nio.ByteBuffer;
-import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;
@@ -31,7 +30,6 @@ import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.apache.cassandra.concurrent.ScheduledExecutors;
-import org.apache.cassandra.config.DatabaseDescriptor;
import org.apache.cassandra.db.compression.CompressionDictionary.DictId;
import org.apache.cassandra.db.compression.CompressionDictionary.Kind;
import org.apache.cassandra.io.compress.IDictionaryCompressor;
@@ -56,9 +54,6 @@ public class ZstdDictionaryTrainer implements ICompressionDictionaryTrainer
private final AtomicLong sampleCount;
private final int compressionLevel; // optimal if using the same level for training as when compressing.
- // Sampling rate can be updated during training
- private volatile float samplingRate;
-
// Minimum number of samples required by ZSTD library
private static final int MIN_SAMPLES_REQUIRED = 11;
@@ -70,31 +65,15 @@ public class ZstdDictionaryTrainer implements ICompressionDictionaryTrainer
private volatile String failureMessage;
public ZstdDictionaryTrainer(String keyspaceName, String tableName, int compressionLevel)
- {
- this(keyspaceName,
- tableName,
- compressionLevel,
- DatabaseDescriptor.getCompressionDictionaryTrainingSamplingRate());
- }
-
- @VisibleForTesting
- public ZstdDictionaryTrainer(String keyspaceName, String tableName, int compressionLevel, float samplingRate)
{
this.keyspaceName = keyspaceName;
this.tableName = tableName;
this.totalSampleSize = new AtomicLong(0);
this.sampleCount = new AtomicLong(0);
this.compressionLevel = compressionLevel;
- this.samplingRate = samplingRate;
this.currentTrainingStatus = TrainingStatus.NOT_STARTED;
}
- @Override
- public boolean shouldSample()
- {
- return zstdTrainer != null && ThreadLocalRandom.current().nextFloat() < samplingRate;
- }
-
@Override
public void addSample(ByteBuffer sample)
{
@@ -304,16 +283,15 @@ public class ZstdDictionaryTrainer implements ICompressionDictionaryTrainer
}
@Override
- public boolean start(boolean manualTraining, CompressionDictionaryTrainingConfig trainingConfig)
+ public boolean start(CompressionDictionaryTrainingConfig trainingConfig)
{
- if (closed || !(manualTraining || shouldAutoStartTraining()))
+ if (closed)
return false;
try
{
// reset on starting; a new zstdTrainer instance is created during reset
reset(trainingConfig);
- logger.info("Started dictionary training for {}.{}", keyspaceName, tableName);
currentTrainingStatus = TrainingStatus.SAMPLING;
failureMessage = null; // Clear any previous failure message
return true;
@@ -327,14 +305,6 @@ public class ZstdDictionaryTrainer implements ICompressionDictionaryTrainer
return false;
}
- /**
- * Determines if training should auto-start based on configuration.
- */
- private boolean shouldAutoStartTraining()
- {
- return DatabaseDescriptor.getCompressionDictionaryTrainingAutoTrainEnabled();
- }
-
@Override
public void reset(CompressionDictionaryTrainingConfig trainingConfig)
{
@@ -365,16 +335,6 @@ public class ZstdDictionaryTrainer implements ICompressionDictionaryTrainer
this.dictionaryTrainedListener = listener;
}
- @Override
- public void updateSamplingRate(float newSamplingRate)
- {
- if (newSamplingRate <= 0.0f || newSamplingRate > 1.0f)
- throw new IllegalArgumentException("Sampling rate has to be between (0.0;1], it is " + newSamplingRate);
-
- this.samplingRate = newSamplingRate;
- logger.debug("Updated sampling rate to {} for {}.{}", newSamplingRate, keyspaceName, tableName);
- }
-
/**
* Notifies the registered listener that a dictionary has been trained.
*
diff --git a/src/java/org/apache/cassandra/io/compress/CompressedSequentialWriter.java b/src/java/org/apache/cassandra/io/compress/CompressedSequentialWriter.java
index 71f1dc4c6f..484b50c5e5 100644
--- a/src/java/org/apache/cassandra/io/compress/CompressedSequentialWriter.java
+++ b/src/java/org/apache/cassandra/io/compress/CompressedSequentialWriter.java
@@ -184,13 +184,6 @@ public class CompressedSequentialWriter extends SequentialWriter
{
// compressing data with buffer re-use
buffer.flip();
-
- // Collect sample for dictionary training before compression
- if (isDictionaryEnabled)
- {
- compressionDictionaryManager.addSample(buffer.duplicate());
- }
-
compressed.clear();
compressor.compress(buffer, compressed);
}
diff --git a/test/unit/org/apache/cassandra/db/compression/CompressionDictionaryIntegrationTest.java b/test/unit/org/apache/cassandra/db/compression/CompressionDictionaryIntegrationTest.java
index b192b6bd2d..4bf4107712 100644
--- a/test/unit/org/apache/cassandra/db/compression/CompressionDictionaryIntegrationTest.java
+++ b/test/unit/org/apache/cassandra/db/compression/CompressionDictionaryIntegrationTest.java
@@ -56,7 +56,6 @@ public class CompressionDictionaryIntegrationTest extends CQLTester
public void configureDatabaseDescriptor()
{
Config config = DatabaseDescriptor.getRawConfig();
- config.compression_dictionary_training_sampling_rate = 1.0f;
config.flush_compression = Config.FlushCompression.table;
DatabaseDescriptor.setConfig(config);
}
diff --git a/test/unit/org/apache/cassandra/db/compression/CompressionDictionaryManagerTest.java b/test/unit/org/apache/cassandra/db/compression/CompressionDictionaryManagerTest.java
index d7bc52e07a..3efb9d9111 100644
--- a/test/unit/org/apache/cassandra/db/compression/CompressionDictionaryManagerTest.java
+++ b/test/unit/org/apache/cassandra/db/compression/CompressionDictionaryManagerTest.java
@@ -18,7 +18,6 @@
package org.apache.cassandra.db.compression;
-import java.nio.ByteBuffer;
import java.util.Map;
import org.junit.After;
@@ -37,7 +36,6 @@ import org.apache.cassandra.schema.KeyspaceParams;
import org.apache.cassandra.schema.TableMetadata;
import static org.assertj.core.api.Assertions.assertThat;
-import static org.assertj.core.api.Assertions.assertThatNoException;
import static org.assertj.core.api.Assertions.assertThatThrownBy;
public class CompressionDictionaryManagerTest
@@ -118,7 +116,7 @@ public class CompressionDictionaryManagerTest
TrainingState trainingState = TrainingState.fromCompositeData(managerWithDict.getTrainingState());
assertThat(trainingState.getStatus())
.as("Training status should be valid")
- .isEqualTo(TrainingStatus.NOT_STARTED);
+ .isEqualTo(TrainingStatus.SAMPLING);
}
@Test
@@ -190,23 +188,6 @@ public class CompressionDictionaryManagerTest
.isNotSameAs(initialTrainer);
}
- @Test
- public void testAddSample()
- {
- ByteBuffer sample = ByteBuffer.wrap("test sample data".getBytes());
- ByteBuffer emptyBuffer = ByteBuffer.allocate(0);
-
- // Should not throw for dictionary-enabled table
- assertThatNoException().isThrownBy(() -> managerWithDict.addSample(sample));
- assertThatNoException().isThrownBy(() -> managerWithDict.addSample(null));
- assertThatNoException().isThrownBy(() -> managerWithDict.addSample(emptyBuffer));
-
- // Should not throw for non-dictionary table (graceful handling)
- assertThatNoException().isThrownBy(() -> managerWithoutDict.addSample(sample));
- assertThatNoException().isThrownBy(() -> managerWithoutDict.addSample(null));
- assertThatNoException().isThrownBy(() -> managerWithoutDict.addSample(emptyBuffer));
- }
-
@Test
public void testTrainManualWithNonDictionaryTable()
{
diff --git a/test/unit/org/apache/cassandra/db/compression/CompressionDictionarySchedulerTest.java b/test/unit/org/apache/cassandra/db/compression/CompressionDictionarySchedulerTest.java
index 42f91a82e0..e9c5b00b47 100644
--- a/test/unit/org/apache/cassandra/db/compression/CompressionDictionarySchedulerTest.java
+++ b/test/unit/org/apache/cassandra/db/compression/CompressionDictionarySchedulerTest.java
@@ -64,15 +64,16 @@ public class CompressionDictionarySchedulerTest extends CQLTester
scheduler = new CompressionDictionaryScheduler(KEYSPACE, table, cache, true);
ColumnFamilyStore cfs = Keyspace.open(keyspace()).getColumnFamilyStore(table);
- CompressionDictionaryManager manager = cfs.compressionDictionaryManager();
-
- Set<SSTableReader> sstables = new HashSet<>();
- CompressionDictionaryTrainingConfig config = createSampleAllTrainingConfig(cfs);
+ try (CompressionDictionaryManager manager = cfs.compressionDictionaryManager())
+ {
+ Set<SSTableReader> sstables = new HashSet<>();
+ CompressionDictionaryTrainingConfig config = createSampleAllTrainingConfig(cfs);
- // Should not throw, but task will complete quickly with no SSTables
- scheduler.scheduleSSTableBasedTraining(manager.trainer(), sstables, config, true);
- spinUntilTrue(() -> !scheduler.isManualTrainingRunning());
- assertThat(manager.getCurrent()).isNull();
+ // Should not throw, but task will complete quickly with no SSTables
+ scheduler.scheduleSSTableBasedTraining(manager.trainer(), sstables, config, true);
+ spinUntilTrue(() -> !scheduler.isManualTrainingRunning());
+ assertThat(manager.getCurrent()).isNull();
+ }
}
@Test
@@ -84,23 +85,24 @@ public class CompressionDictionarySchedulerTest extends CQLTester
ColumnFamilyStore cfs = Keyspace.open(keyspace()).getColumnFamilyStore(table);
cfs.disableAutoCompaction();
- CompressionDictionaryManager manager = cfs.compressionDictionaryManager();
-
- createSSTables();
+ try (CompressionDictionaryManager manager = cfs.compressionDictionaryManager())
+ {
+ createSSTables();
- Set<SSTableReader> sstables = cfs.getLiveSSTables();
- assertThat(sstables).isNotEmpty();
+ Set<SSTableReader> sstables = cfs.getLiveSSTables();
+ assertThat(sstables).isNotEmpty();
- CompressionDictionaryTrainingConfig config = createSampleAllTrainingConfig(cfs);
- manager.trainer().start(true, config);
+ CompressionDictionaryTrainingConfig config = createSampleAllTrainingConfig(cfs);
+ manager.trainer().start(config);
- assertThat(manager.getCurrent()).as("There should be no dictionary at this step").isNull();
- scheduler.scheduleSSTableBasedTraining(manager.trainer(), sstables, config, true);
+ assertThat(manager.getCurrent()).as("There should be no dictionary at this step").isNull();
+ scheduler.scheduleSSTableBasedTraining(manager.trainer(), sstables, config, true);
- // Task should be scheduled
- assertThat(scheduler.isManualTrainingRunning()).isTrue();
- // A dictionary should be trained
- spinUntilTrue(() -> manager.getCurrent() != null);
+ // Task should be scheduled
+ assertThat(scheduler.isManualTrainingRunning()).isTrue();
+ // A dictionary should be trained
+ spinUntilTrue(() -> manager.getCurrent() != null);
+ }
}
private void createSSTables()
@@ -122,7 +124,6 @@ public class CompressionDictionarySchedulerTest extends CQLTester
.builder()
.maxDictionarySize(new DataStorageSpec.IntKibibytesBound(DEFAULT_TRAINING_MAX_DICTIONARY_SIZE_PARAMETER_VALUE).toBytes())
.maxTotalSampleSize(new DataStorageSpec.IntKibibytesBound(DEFAULT_TRAINING_MAX_TOTAL_SAMPLE_SIZE_PARAMETER_VALUE).toBytes())
- .samplingRate(1.0f)
.chunkSize(cfs.metadata().params.compression.chunkLength())
.build();
}
diff --git a/test/unit/org/apache/cassandra/db/compression/CompressionDictionaryTrainingConfigTest.java b/test/unit/org/apache/cassandra/db/compression/CompressionDictionaryTrainingConfigTest.java
index a066f629b9..37ae196af1 100644
--- a/test/unit/org/apache/cassandra/db/compression/CompressionDictionaryTrainingConfigTest.java
+++ b/test/unit/org/apache/cassandra/db/compression/CompressionDictionaryTrainingConfigTest.java
@@ -35,9 +35,6 @@ public class CompressionDictionaryTrainingConfigTest
assertThat(config.maxTotalSampleSize)
.as("Default max total sample size should be 10MB")
.isEqualTo(10 * 1024 * 1024);
- assertThat(config.samplingRate)
- .as("Default sampling rate should be 0.01 (1%)")
- .isEqualTo(0.01f);
}
@Test
@@ -45,19 +42,16 @@ public class CompressionDictionaryTrainingConfigTest
{
int dictSize = 16 * 1024; // 16KB
int sampleSize = 2 * 1024 * 1024; // 2MB
- float samplingRate = 0.005f; // 0.5%
CompressionDictionaryTrainingConfig config = CompressionDictionaryTrainingConfig.builder()
.maxDictionarySize(dictSize)
.maxTotalSampleSize(sampleSize)
- .samplingRate(samplingRate)
.build();
// Verify all calculated values are consistent
assertThat(config.maxDictionarySize).isEqualTo(dictSize);
assertThat(config.maxTotalSampleSize).isEqualTo(sampleSize);
assertThat(config.acceptableTotalSampleSize).isEqualTo(sampleSize / 10 * 8);
- assertThat(config.samplingRate).isEqualTo(0.005f);
// Verify relationship between max and acceptable sample sizes
assertThat(config.acceptableTotalSampleSize)
diff --git a/test/unit/org/apache/cassandra/db/compression/SSTableChunkSamplerTest.java b/test/unit/org/apache/cassandra/db/compression/SSTableChunkSamplerTest.java
index 22a4b081d5..2fc44853a2 100644
--- a/test/unit/org/apache/cassandra/db/compression/SSTableChunkSamplerTest.java
+++ b/test/unit/org/apache/cassandra/db/compression/SSTableChunkSamplerTest.java
@@ -195,7 +195,6 @@ public class SSTableChunkSamplerTest extends CQLTester
// Create a mock trainer that is not ready to sample
ICompressionDictionaryTrainer trainer = mock(ICompressionDictionaryTrainer.class, RETURNS_DEEP_STUBS);
- when(trainer.shouldSample()).thenReturn(false);
when(trainer.getTrainingState().getStatus()).thenReturn(ICompressionDictionaryTrainer.TrainingStatus.NOT_STARTED);
// Should throw IllegalStateException when trainer is not ready
diff --git a/test/unit/org/apache/cassandra/db/compression/ZstdDictionaryTrainerTest.java b/test/unit/org/apache/cassandra/db/compression/ZstdDictionaryTrainerTest.java
index 7794d25cb9..5e7f526012 100644
--- a/test/unit/org/apache/cassandra/db/compression/ZstdDictionaryTrainerTest.java
+++ b/test/unit/org/apache/cassandra/db/compression/ZstdDictionaryTrainerTest.java
@@ -49,7 +49,6 @@ public class ZstdDictionaryTrainerTest
private CompressionDictionaryTrainingConfig testConfig;
private ZstdDictionaryTrainer trainer;
- private Consumer<CompressionDictionary> mockCallback;
private AtomicReference<CompressionDictionary> callbackResult;
@BeforeClass
@@ -64,13 +63,12 @@ public class ZstdDictionaryTrainerTest
testConfig = CompressionDictionaryTrainingConfig.builder()
.maxDictionarySize(1024) // Small for testing
.maxTotalSampleSize(10 * 1024) // 10KB total
- .samplingRate(1) // 100% sampling for predictable tests
.build();
callbackResult = new AtomicReference<>();
- mockCallback = callbackResult::set;
+ Consumer<CompressionDictionary> mockCallback = callbackResult::set;
- trainer = new ZstdDictionaryTrainer(TEST_KEYSPACE, TEST_TABLE, COMPRESSION_LEVEL, testConfig.samplingRate);
+ trainer = new ZstdDictionaryTrainer(TEST_KEYSPACE, TEST_TABLE, COMPRESSION_LEVEL);
trainer.setDictionaryTrainedListener(mockCallback);
}
@@ -109,7 +107,7 @@ public class ZstdDictionaryTrainerTest
public void testTrainerStart()
{
// Auto start depends on configuration - test both scenarios
- boolean started = trainer.start(false, testConfig);
+ boolean started = trainer.start(testConfig);
if (started)
{
assertThat(trainer.getTrainingState().getStatus())
@@ -127,7 +125,7 @@ public class ZstdDictionaryTrainerTest
@Test
public void testTrainerStartManual()
{
- assertThat(trainer.start(true, testConfig))
+ assertThat(trainer.start(testConfig))
.as("Manual training should start successfully")
.isTrue();
assertThat(trainer.getTrainingState().getStatus())
@@ -141,25 +139,22 @@ public class ZstdDictionaryTrainerTest
@Test
public void testTrainerStartMultipleTimes()
{
- assertThat(trainer.start(true, testConfig))
+ assertThat(trainer.start(testConfig))
.as("First start (manual training) should succeed")
.isTrue();
Object firstTrainer = trainer.trainer();
assertThat(firstTrainer).isNotNull();
- assertThat(trainer.start(true, testConfig))
+ assertThat(trainer.start(testConfig))
.as("Second start (manual training) should suceed and reset")
.isTrue();
Object secondTrainer = trainer.trainer();
assertThat(secondTrainer).isNotNull().isNotSameAs(firstTrainer);
- assertThat(trainer.start(false, testConfig))
- .as("Third start (not manual training) should fail")
- .isFalse();
}
@Test
public void testTrainerCloseIdempotent()
{
- trainer.start(true, testConfig);
+ trainer.start(testConfig);
trainer.close();
trainer.close(); // Should not throw
trainer.close(); // Should not throw
@@ -172,7 +167,7 @@ public class ZstdDictionaryTrainerTest
@Test
public void testTrainerReset()
{
- trainer.start(true, testConfig);
+ trainer.start(testConfig);
addSampleData(1000); // Add some samples
assertThat(trainer.getTrainingState().getSampleCount())
@@ -194,10 +189,10 @@ public class ZstdDictionaryTrainerTest
@Test
public void testStartAfterClose()
{
- trainer.start(true, testConfig);
+ trainer.start(testConfig);
trainer.close();
- assertThat(trainer.start(true, testConfig))
+ assertThat(trainer.start(testConfig))
.as("Should not start after close")
.isFalse();
assertThat(trainer.getTrainingState().getStatus())
@@ -205,55 +200,10 @@ public class ZstdDictionaryTrainerTest
.isEqualTo(TrainingStatus.NOT_STARTED);
}
- @Test
- public void testShouldSample()
- {
- trainer.start(true, testConfig);
- // With sampling rate 1 (100%), should always return true
- for (int i = 0; i < 10; i++)
- {
- assertThat(trainer.shouldSample())
- .as("Should sample with rate 1")
- .isTrue();
- }
- }
-
- @Test
- public void testShouldSampleWithLowRate()
- {
- // Test with lower sampling rate
- CompressionDictionaryTrainingConfig lowSamplingConfig =
- CompressionDictionaryTrainingConfig.builder()
- .maxDictionarySize(1024)
- .maxTotalSampleSize(10 * 1024)
- .samplingRate(0.001f) // 0.1% sampling
- .build();
-
- try (ZstdDictionaryTrainer lowSamplingTrainer = new ZstdDictionaryTrainer(TEST_KEYSPACE, TEST_TABLE, COMPRESSION_LEVEL, lowSamplingConfig.samplingRate))
- {
- lowSamplingTrainer.setDictionaryTrainedListener(mockCallback);
- // With very low sampling rate, should mostly return false
- int sampleCount = 0;
- int iterations = 1000;
- for (int i = 0; i < iterations; i++)
- {
- if (lowSamplingTrainer.shouldSample())
- {
- sampleCount++;
- }
- }
-
- // Should be roughly 0.1% (1 out of 1000), allow some variance
- assertThat(sampleCount)
- .as("Sample rate should be low")
- .isLessThan(iterations / 10);
- }
- }
-
@Test
public void testAddSample()
{
- trainer.start(true, testConfig);
+ trainer.start(testConfig);
assertThat(trainer.getTrainingState().getSampleCount())
.as("Initial sample count should be 0")
@@ -291,7 +241,7 @@ public class ZstdDictionaryTrainerTest
@Test
public void testAddSampleAfterClose()
{
- trainer.start(true, testConfig);
+ trainer.start(testConfig);
trainer.close();
ByteBuffer sample = ByteBuffer.wrap(SAMPLE_DATA.getBytes());
@@ -308,7 +258,7 @@ public class ZstdDictionaryTrainerTest
@Test
public void testAddNullSample()
{
- trainer.start(true, testConfig);
+ trainer.start(testConfig);
trainer.addSample(null); // Should not throw
assertThat(trainer.getTrainingState().getStatus())
@@ -322,7 +272,7 @@ public class ZstdDictionaryTrainerTest
@Test
public void testAddEmptySample()
{
- trainer.start(true, testConfig);
+ trainer.start(testConfig);
ByteBuffer empty = ByteBuffer.allocate(0);
trainer.addSample(empty); // Should not throw
@@ -337,7 +287,7 @@ public class ZstdDictionaryTrainerTest
@Test
public void testIsReady()
{
- trainer.start(true, testConfig);
+ trainer.start(testConfig);
assertThat(trainer.isReady())
.as("Should not be ready initially")
.isFalse();
@@ -362,7 +312,7 @@ public class ZstdDictionaryTrainerTest
@Test
public void testTrainDictionaryWithInsufficientSampleCount()
{
- trainer.start(true, testConfig);
+ trainer.start(testConfig);
// Add sufficient data size but only 5 samples (less than minimum 11)
for (int i = 0; i < 5; i++)
@@ -395,7 +345,7 @@ public class ZstdDictionaryTrainerTest
@Test
public void testTrainDictionaryWithSufficientSampleCount()
{
- trainer.start(true, testConfig);
+ trainer.start(testConfig);
// Add 15 samples with sufficient total size
for (int i = 0; i < 15; i++)
@@ -416,7 +366,7 @@ public class ZstdDictionaryTrainerTest
@Test
public void testTrainDictionaryAsync() throws Exception
{
- Future<CompressionDictionary> future = startTraining(true, false, testConfig);
+ Future<CompressionDictionary> future = startTraining(false, testConfig);
CompressionDictionary dictionary = future.get(5, TimeUnit.SECONDS);
assertThat(dictionary).as("Dictionary should not be null").isNotNull();
@@ -431,7 +381,7 @@ public class ZstdDictionaryTrainerTest
public void testTrainDictionaryAsyncForce() throws Exception
{
// Don't add enough samples
- Future<CompressionDictionary> future = startTraining(true, true, testConfig, 512);
+ Future<CompressionDictionary> future = startTraining(true, testConfig, 512);
CompressionDictionary dictionary = future.get(1, TimeUnit.SECONDS);
assertThat(dictionary)
.as("Forced async training should produce dictionary")
@@ -442,7 +392,7 @@ public class ZstdDictionaryTrainerTest
public void testTrainDictionaryAsyncForceFailsWithNoData() throws Exception
{
AtomicReference<CompressionDictionary> dictRef = new AtomicReference<>();
- Future<CompressionDictionary> result = startTraining(true, true, testConfig, 0)
+ Future<CompressionDictionary> result = startTraining(true, testConfig, 0)
.addCallback((dict, t) -> dictRef.set(dict));
assertThat(result.isDone() && result.cause() != null)
@@ -459,7 +409,7 @@ public class ZstdDictionaryTrainerTest
@Test
public void testDictionaryTrainedListener()
{
- trainer.start(true, testConfig);
+ trainer.start(testConfig);
addSampleData(testConfig.acceptableTotalSampleSize);
// Train dictionary synchronously - callback should be called
@@ -523,88 +473,6 @@ public class ZstdDictionaryTrainerTest
.isFalse();
}
- @Test
- public void testUpdateSamplingRate()
- {
- trainer.start(true, testConfig);
-
- // Test updating to different valid sampling rates
- trainer.updateSamplingRate(0.1f);
-
- // With sampling rate 10 (10%), should mostly return false
- int sampleCount = 0;
- int iterations = 1000;
- for (int i = 0; i < iterations; i++)
- {
- if (trainer.shouldSample())
- {
- sampleCount++;
- }
- }
-
- // Should be roughly 10% (1 out of 10), allow some variance
- assertThat(sampleCount)
- .as("Sample rate should be approximately 10%")
- .isGreaterThan(iterations / 20) // at least 5%
- .isLessThan(iterations / 5); // at most 20%
-
- // Test updating to 100% sampling
- trainer.updateSamplingRate(1.0f);
-
- // Should always sample now
- for (int i = 0; i < 10; i++)
- {
- assertThat(trainer.shouldSample())
- .as("Should always sample with rate 1")
- .isTrue();
- }
- }
-
- @Test
- public void testUpdateSamplingRateValidation()
- {
- trainer.start(true, testConfig);
-
- // Test invalid sampling rates
- assertThatThrownBy(() -> trainer.updateSamplingRate(0f))
- .isInstanceOf(IllegalArgumentException.class)
- .hasMessageContaining("Sampling rate has to be between (0.0;1], it is 0.0");
-
- assertThatThrownBy(() -> trainer.updateSamplingRate(-1f))
- .isInstanceOf(IllegalArgumentException.class)
- .hasMessageContaining("Sampling rate has to be between (0.0;1], it is -1.0");
-
- assertThatThrownBy(() -> trainer.updateSamplingRate(-100f))
- .isInstanceOf(IllegalArgumentException.class)
- .hasMessageContaining("Sampling rate has to be between (0.0;1], it is -100.0");
- }
-
- @Test
- public void testUpdateSamplingRateBeforeStart()
- {
- // Should be able to update sampling rate even before start
- trainer.updateSamplingRate(0.2f);
-
- trainer.start(true, testConfig);
-
- // Verify the updated rate is used after start
- int sampleCount = 0;
- int iterations = 1000;
- for (int i = 0; i < iterations; i++)
- {
- if (trainer.shouldSample())
- {
- sampleCount++;
- }
- }
-
- // Should be roughly 20% (1 out of 5), allow some variance
- assertThat(sampleCount)
- .as("Sample rate should be approximately 20%")
- .isGreaterThan(iterations / 10) // at least 10%
- .isLessThan(iterations / 2); // at most 50%
- }
-
@Test
public void testTrainDictionaryNotInitialized()
{
@@ -619,7 +487,7 @@ public class ZstdDictionaryTrainerTest
@Test
public void testTrainDictionaryClosed()
{
- trainer.start(true, testConfig);
+ trainer.start(testConfig);
addSampleData(testConfig.acceptableTotalSampleSize);
trainer.close();
@@ -633,7 +501,7 @@ public class ZstdDictionaryTrainerTest
@Test
public void testTrainDictionaryInsufficientSampleSize()
{
- trainer.start(true, testConfig);
+ trainer.start(testConfig);
// Add enough samples (15) but with insufficient total size
for (int i = 0; i < 15; i++)
@@ -664,7 +532,7 @@ public class ZstdDictionaryTrainerTest
@Test
public void testTrainDictionaryInsufficientBothSampleCountAndSize()
{
- trainer.start(true, testConfig);
+ trainer.start(testConfig);
// Add only 3 samples with small size
for (int i = 0; i < 3; i++)
@@ -689,9 +557,9 @@ public class ZstdDictionaryTrainerTest
.hasMessageContaining("Use --force to train anyway");
}
- private Future<CompressionDictionary> startTraining(boolean manualTraining, boolean forceTrain, CompressionDictionaryTrainingConfig config, int sampleSize) throws Exception
+ private Future<CompressionDictionary> startTraining(boolean forceTrain, CompressionDictionaryTrainingConfig config, int sampleSize) throws Exception
{
- trainer.start(manualTraining, config);
+ trainer.start(config);
if (sampleSize > 0)
{
addSampleData(sampleSize);
@@ -713,9 +581,9 @@ public class ZstdDictionaryTrainerTest
return future;
}
- private Future<CompressionDictionary> startTraining(boolean manualTraining, boolean forceTrain, CompressionDictionaryTrainingConfig config) throws Exception
+ private Future<CompressionDictionary> startTraining(boolean forceTrain, CompressionDictionaryTrainingConfig config) throws Exception
{
- return startTraining(manualTraining, forceTrain, config, config.acceptableTotalSampleSize);
+ return startTraining(forceTrain, config, config.acceptableTotalSampleSize);
}
private void addSampleData(int totalSize)
@@ -742,7 +610,7 @@ public class ZstdDictionaryTrainerTest
.isEqualTo(0);
// Start training
- trainer.start(true, testConfig);
+ trainer.start(testConfig);
// Add some samples
byte[] sampleBytes = SAMPLE_DATA.getBytes();
diff --git a/test/unit/org/apache/cassandra/utils/CompressionDictionaryHelper.java b/test/unit/org/apache/cassandra/utils/CompressionDictionaryHelper.java
index 031d79f3b6..3d7d632778 100644
--- a/test/unit/org/apache/cassandra/utils/CompressionDictionaryHelper.java
+++ b/test/unit/org/apache/cassandra/utils/CompressionDictionaryHelper.java
@@ -56,9 +56,9 @@ public class CompressionDictionaryHelper
.maxTotalSampleSize(1024 * 1024) // 1MB total
.build();
- try (ZstdDictionaryTrainer trainer = new ZstdDictionaryTrainer(keyspace, table, 3, 100))
+ try (ZstdDictionaryTrainer trainer = new ZstdDictionaryTrainer(keyspace, table, 3))
{
- trainer.start(true, config);
+ trainer.start(config);
for (int i = 0; i < 25000; i++)
{
trainer.addSample(UTF8Type.instance.fromString(CompressionDictionaryHelper.INSTANCE.getRandomSample()));