[
https://issues.apache.org/jira/browse/CASSANDRA-21188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18061310#comment-18061310
]
Stefan Miklosovic edited comment on CASSANDRA-21188 at 2/26/26 10:04 AM:
-------------------------------------------------------------------------
[~ycai] I think it would be better to do this now instead of waiting for
CASSANDRA-19776. The problems described in CASSANDRA-19776 are not critical, and
the whole codebase already uses that approach. We are special in that we
reference the SSTables we want to train on in a custom way.
https://github.com/apache/cassandra/pull/4638
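A minimal sketch of the select-then-reference pattern, assuming this is the general approach the comment refers to. Elsewhere in the codebase, ColumnFamilyStore.selectAndReference re-selects the live SSTables and retries until every reader in the selection can be referenced, so a concurrent compaction causes a retry rather than an empty selection. The Reader and SelectAndReference types below are hypothetical stand-ins (not SSTableReader, Refs or ColumnFamilyStore), the retry count is bounded only for illustration, and this is not the code in the linked PR.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical model of "select, then reference, retry on failure".
// Reader and SelectAndReference are illustrative, not Cassandra classes.
public final class SelectAndReference
{
    public static final class Reader
    {
        public final String name;
        // Starts at 1, modelling the reference held while the reader is live.
        private int refs = 1;

        public Reader(String name) { this.name = name; }

        public synchronized boolean tryRef()
        {
            if (refs <= 0)
                return false; // already fully released, e.g. compacted away
            refs++;
            return true;
        }

        public synchronized void release() { refs--; }
    }

    // Snapshot the live set and try to reference every reader in it. If any
    // reader was released in between, drop the partial references and retry
    // with a fresh snapshot. Bounded here only to keep the sketch finite.
    public static List<Reader> selectAndReference(Supplier<List<Reader>> liveSet, int maxAttempts)
    {
        for (int attempt = 0; attempt < maxAttempts; attempt++)
        {
            List<Reader> snapshot = liveSet.get();
            List<Reader> referenced = new ArrayList<>();
            boolean ok = true;
            for (Reader r : snapshot)
            {
                if (r.tryRef())
                {
                    referenced.add(r);
                }
                else
                {
                    ok = false;
                    break;
                }
            }
            if (ok)
                return referenced;
            referenced.forEach(Reader::release); // undo partial references
        }
        throw new IllegalStateException("could not reference a stable set of readers");
    }
}
```

The key design point is that a failed tryRef triggers a fresh selection instead of silently dropping the reader, so the caller never ends up with an empty set just because compaction replaced the inputs.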
> Race between compaction and dictionary compression training. Status stuck at
> SAMPLING. ExportImportListCompressionDictionaryTest hangs
> --------------------------------------------------------------------------------------------------------------------------------------
>
> Key: CASSANDRA-21188
> URL: https://issues.apache.org/jira/browse/CASSANDRA-21188
> Project: Apache Cassandra
> Issue Type: Bug
> Components: Feature/Compression
> Reporter: Maxim Muzafarov
> Assignee: Stefan Miklosovic
> Priority: Normal
> Fix For: 5.x
>
>
> There is a race between the compaction process and the start of dictionary
> compression training:
> # CompressionDictionaryManager: we collect all live SSTables
> # ICompressionDictionaryTrainer: a new training starts
> # currentTrainingStatus moves to SAMPLING
> # All SSTables get compacted by a concurrent compaction thread
> # SSTableSamplingTask: in the constructor, sstable.tryRef returns null
> # We run this task on a thread pool and it exits in cancelManualTraining
> # currentTrainingStatus remains SAMPLING (it should be *FAILED* or
> {*}COMPLETED{*})
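Steps 5 to 7 can be sketched as a small state machine. This is a hypothetical model (SamplingRace, Trainer, SamplingTask and Ref are illustrative names, not the classes in the codebase), assuming the intended behaviour is that every exit from SAMPLING lands in a terminal state: when no SSTable can be referenced, the task moves the status to FAILED instead of leaving it at SAMPLING. Only SAMPLING, COMPLETED and FAILED come from the report; NOT_STARTED is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical model of the race in steps 5-7; names are illustrative only.
public final class SamplingRace
{
    // SAMPLING, COMPLETED and FAILED come from the report; NOT_STARTED is assumed.
    public enum TrainingStatus { NOT_STARTED, SAMPLING, COMPLETED, FAILED }

    // Stand-in for "sstable.tryRef() != null".
    public interface Ref { boolean tryRef(); }

    public static final class Trainer
    {
        private final AtomicReference<TrainingStatus> status =
                new AtomicReference<>(TrainingStatus.NOT_STARTED);

        public void start() { status.set(TrainingStatus.SAMPLING); }

        // SAMPLING -> FAILED; a no-op if the status has already moved on.
        public boolean failSampling()
        {
            return status.compareAndSet(TrainingStatus.SAMPLING, TrainingStatus.FAILED);
        }

        public TrainingStatus status() { return status.get(); }
    }

    public static final class SamplingTask implements Runnable
    {
        private final Trainer trainer;
        private final List<Ref> referenced = new ArrayList<>();

        public SamplingTask(Trainer trainer, List<Ref> candidates)
        {
            this.trainer = trainer;
            // Candidates were collected earlier; a concurrent compaction may
            // already have released some or all of them, so tryRef can fail.
            for (Ref r : candidates)
                if (r.tryRef())
                    referenced.add(r);
        }

        @Override
        public void run()
        {
            if (referenced.isEmpty())
            {
                // In the report this path exits via cancelManualTraining and the
                // status stays SAMPLING; here it is moved to a terminal state.
                trainer.failSampling();
                return;
            }
            // ... sample the referenced SSTables, train, set COMPLETED
            // (releasing the references is omitted for brevity) ...
        }
    }
}
```

Using compareAndSet keeps the transition safe if another path (for example an explicit cancel) has already moved the status out of SAMPLING.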
> As a result, ExportImportListCompressionDictionaryTest hangs for 10 minutes (the
> configured timeout constant) for no apparent reason.
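Why the wait lasts the full timeout: a client that polls at a fixed interval and stops only on a terminal status can leave a stuck SAMPLING state only through its timeout. A hypothetical sketch (StatusPoller and its Status enum are illustrative, not the nodetool or ToolRunner implementation):

```java
import java.time.Duration;
import java.util.EnumSet;
import java.util.Set;
import java.util.function.Supplier;

// Hypothetical polling client; names are illustrative only.
public final class StatusPoller
{
    public enum Status { SAMPLING, COMPLETED, FAILED }

    // Only these states end the wait.
    private static final Set<Status> TERMINAL = EnumSet.of(Status.COMPLETED, Status.FAILED);

    // Polls until the status is terminal or the timeout elapses; a status that
    // stays SAMPLING can only end in the timeout exception.
    public static Status await(Supplier<Status> status, Duration timeout, Duration interval)
    {
        long deadline = System.nanoTime() + timeout.toNanos();
        Status current = status.get();
        while (!TERMINAL.contains(current))
        {
            if (System.nanoTime() - deadline >= 0)
                throw new IllegalStateException("timed out while status was " + current);
            try
            {
                Thread.sleep(interval.toMillis());
            }
            catch (InterruptedException e)
            {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while polling", e);
            }
            current = status.get();
        }
        return current;
    }
}
```

With a 10-minute timeout and the roughly one-second polling interval visible in the log timestamps, a status stuck at SAMPLING costs the whole 10 minutes, whereas a transition to FAILED would end the wait on the next poll.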
> The logs:
> {code}
> INFO [PerDiskMemtableFlushWriter_0:1] 2026-02-21T17:07:05,061
> Flushing.java:157 - Writing
> Memtable-table_testexportingspecificdictionary_strateg_18@1268950324(61.523KiB
> serialized bytes, 1000 ops, 506.836KiB (0%) on-heap, 0B (0%) off-heap),
> flushed range = [min(-9223372036854775808), max(9223372036854775807))
> INFO [PerDiskMemtableFlushWriter_0:1] 2026-02-21T17:07:05,061
> Flushing.java:197 - Completed flushing
> /Users/maxim.muzafarov/IdeaProjects/cassandra/build/test/cassandra/data/cql_test_keyspace/table_testexportingspecificdictionary_strateg_18-1b255f4def2540a60000000000000056/pa-11-big-Data.db
> (28.198KiB) for commitlog position
> CommitLogPosition(segmentId=1771693567140, position=654098), time spent: 0
> ms, bytes flushed: 28875 / (rate: 28.198KiB/s), partitions flushed: 1000 /
> (rate: 1000/s), rows: 1000 / (rate: 1000/s), cpu time: 0 ms, heap allocated:
> 220.711KiB
> INFO [MemtableFlushWriter:1] 2026-02-21T17:07:05,084 LogTransaction.java:266
> - Unfinished transaction log, deleting
> /Users/maxim.muzafarov/IdeaProjects/cassandra/build/test/cassandra/data/cql_test_keyspace/table_testexportingspecificdictionary_strateg_18-1b255f4def2540a60000000000000056/pa_txn_flush_bf3ac330-0f47-11f1-88d2-574197b4b378.log
>
> DEBUG [MemtableFlushWriter:1] 2026-02-21T17:07:05,087
> ColumnFamilyStore.java:1416 - Flushed to
> [BigTableReader:big(path='/Users/maxim.muzafarov/IdeaProjects/cassandra/build/test/cassandra/data/cql_test_keyspace/table_testexportingspecificdictionary_strateg_18-1b255f4def2540a60000000000000056/pa-11-big-Data.db')]
> (1 sstables, 30.889KiB), biggest 30.889KiB, smallest 30.889KiB
> INFO [main] 2026-02-21T17:07:05,091 ColumnFamilyStore.java:1088 - Enqueuing
> flush of cql_test_keyspace.table_testexportingspecificdictionary_strateg_18,
> Reason: UNIT_TESTS, Usage: 506.836KiB (0%) on-heap, 0B (0%) off-heap
> INFO [PerDiskMemtableFlushWriter_0:2] 2026-02-21T17:07:05,092
> Flushing.java:157 - Writing
> Memtable-table_testexportingspecificdictionary_strateg_18@957877902(61.523KiB
> serialized bytes, 1000 ops, 506.836KiB (0%) on-heap, 0B (0%) off-heap),
> flushed range = [min(-9223372036854775808), max(9223372036854775807))
> INFO [PerDiskMemtableFlushWriter_0:2] 2026-02-21T17:07:05,094
> Flushing.java:197 - Completed flushing
> /Users/maxim.muzafarov/IdeaProjects/cassandra/build/test/cassandra/data/cql_test_keyspace/table_testexportingspecificdictionary_strateg_18-1b255f4def2540a60000000000000056/pa-12-big-Data.db
> (28.201KiB) for commitlog position
> CommitLogPosition(segmentId=1771693567140, position=726098), time spent: 0
> ms, bytes flushed: 28878 / (rate: 28.201KiB/s), partitions flushed: 1000 /
> (rate: 1000/s), rows: 1000 / (rate: 1000/s), cpu time: 0 ms, heap allocated:
> 220.711KiB
> INFO [MemtableFlushWriter:2] 2026-02-21T17:07:05,112 LogTransaction.java:266
> - Unfinished transaction log, deleting
> /Users/maxim.muzafarov/IdeaProjects/cassandra/build/test/cassandra/data/cql_test_keyspace/table_testexportingspecificdictionary_strateg_18-1b255f4def2540a60000000000000056/pa_txn_flush_bf3fa530-0f47-11f1-88d2-574197b4b378.log
>
> DEBUG [MemtableFlushWriter:2] 2026-02-21T17:07:05,116
> ColumnFamilyStore.java:1416 - Flushed to
> [BigTableReader:big(path='/Users/maxim.muzafarov/IdeaProjects/cassandra/build/test/cassandra/data/cql_test_keyspace/table_testexportingspecificdictionary_strateg_18-1b255f4def2540a60000000000000056/pa-12-big-Data.db')]
> (1 sstables, 30.887KiB), biggest 30.887KiB, smallest 30.887KiB
> DEBUG [CompactionExecutor:2] 2026-02-21T17:07:05,117 Directories.java:554 -
> FileStore /System/Volumes/Data (/dev/disk3s5) has 593792975872 bytes
> available, checking if we can write 103847 bytes
> INFO [CompactionExecutor:2] 2026-02-21T17:07:05,117 CompactionTask.java:229
> - Compacting (bf4375c0-0f47-11f1-88d2-574197b4b378)
> [/Users/maxim.muzafarov/IdeaProjects/cassandra/build/test/cassandra/data/cql_test_keyspace/table_testexportingspecificdictionary_strateg_18-1b255f4def2540a60000000000000056/pa-11-big-Data.db,
>
> /Users/maxim.muzafarov/IdeaProjects/cassandra/build/test/cassandra/data/cql_test_keyspace/table_testexportingspecificdictionary_strateg_18-1b255f4def2540a60000000000000056/pa-12-big-Data.db,
>
> /Users/maxim.muzafarov/IdeaProjects/cassandra/build/test/cassandra/data/cql_test_keyspace/table_testexportingspecificdictionary_strateg_18-1b255f4def2540a60000000000000056/pa-9-big-Data.db,
>
> /Users/maxim.muzafarov/IdeaProjects/cassandra/build/test/cassandra/data/cql_test_keyspace/table_testexportingspecificdictionary_strateg_18-1b255f4def2540a60000000000000056/pa-10-big-Data.db,
> ]
> DEBUG [CompactionExecutor:2] 2026-02-21T17:07:05,118 CursorCompactor.java:152
> - Cursor compaction for table:
> table_testexportingspecificdictionary_strateg_18 keyspace: cql_test_keyspace
> is supported.
> INFO [RMI TCP Connection(26)-127.0.0.1] 2026-02-21T17:07:05,146
> CommandInvokerService.java:185 - Executing command 'train' with execution ID:
> a5960218-7f58-41a2-a06f-d627acf20efd
> INFO [RMI TCP Connection(26)-127.0.0.1] 2026-02-21T17:07:05,147
> CompressionDictionaryManager.java:237 - Starting SSTable-based training for
> cql_test_keyspace.table_testexportingspecificdictionary_strateg_18 with 1
> SSTables
> INFO [RMI TCP Connection(26)-127.0.0.1] 2026-02-21T17:07:05,150
> CompressionDictionaryScheduler.java:101 - Starting SSTable-based dictionary
> training for
> cql_test_keyspace.table_testexportingspecificdictionary_strateg_18 from 1
> SSTables
> DEBUG [RMI TCP Connection(26)-127.0.0.1] 2026-02-21T17:07:05,150
> CompressionDictionaryScheduler.java:198 - Couldn't acquire reference to
> SSTable
> /Users/maxim.muzafarov/IdeaProjects/cassandra/build/test/cassandra/data/cql_test_keyspace/table_testexportingspecificdictionary_strateg_18-1b255f4def2540a60000000000000056/pa-13-big.
> It may have been removed.
> WARN [NonPeriodicTasks:1] 2026-02-21T17:07:05,150
> CompressionDictionaryScheduler.java:213 - No SSTables available for sampling
> in cql_test_keyspace.table_testexportingspecificdictionary_strateg_18
> INFO [RMI TCP Connection(26)-127.0.0.1] 2026-02-21T17:07:05,150
> ToolRunner.java:927 - >>>> Polling training status...SAMPLING
> INFO [CompactionExecutor:2] 2026-02-21T17:07:05,152
> CursorCompactor.java:1574 - Compaction ended
> bf4375c0-0f47-11f1-88d2-574197b4b378: { data bytes read = 294620, data bytes
> written = 297868, input (keys = [1:10000,] = 10000, rows = [1:10000,] =
> 10000, cells = [1:10000,] = 10000), output (keys = 10000, rows = 10000,
> cells = 10000)}
> INFO [CompactionExecutor:2] 2026-02-21T17:07:05,153 CompactionTask.java:336
> - Compacted (bf4375c0-0f47-11f1-88d2-574197b4b378) 4 sstables to
> [build/test/cassandra/data/cql_test_keyspace/table_testexportingspecificdictionary_strateg_18-1b255f4def2540a60000000000000056/pa-13-big,]
> to level=0. 101.413KiB to 91.637KiB (~90% of original) in 35ms. Read
> Throughput = 2.826MiB/s, Write Throughput = 2.554MiB/s, Row Throughput =
> ~10,000/s. 10,000 total partitions merged to 10,000. Partition merge counts
> were {1:10000, }. Time spent writing keys = 10ms
> INFO [NonPeriodicTasks:1] 2026-02-21T17:07:05,153 BigFormat.java:324 -
> Deleting sstable:
> /Users/maxim.muzafarov/IdeaProjects/cassandra/build/test/cassandra/data/cql_test_keyspace/table_testexportingspecificdictionary_strateg_18-1b255f4def2540a60000000000000056/pa-11-big
> INFO [NonPeriodicTasks:1] 2026-02-21T17:07:05,154 BigFormat.java:324 -
> Deleting sstable:
> /Users/maxim.muzafarov/IdeaProjects/cassandra/build/test/cassandra/data/cql_test_keyspace/table_testexportingspecificdictionary_strateg_18-1b255f4def2540a60000000000000056/pa-12-big
> INFO [NonPeriodicTasks:1] 2026-02-21T17:07:05,154 BigFormat.java:324 -
> Deleting sstable:
> /Users/maxim.muzafarov/IdeaProjects/cassandra/build/test/cassandra/data/cql_test_keyspace/table_testexportingspecificdictionary_strateg_18-1b255f4def2540a60000000000000056/pa-9-big
> INFO [NonPeriodicTasks:1] 2026-02-21T17:07:05,155 BigFormat.java:324 -
> Deleting sstable:
> /Users/maxim.muzafarov/IdeaProjects/cassandra/build/test/cassandra/data/cql_test_keyspace/table_testexportingspecificdictionary_strateg_18-1b255f4def2540a60000000000000056/pa-10-big
> INFO [NonPeriodicTasks:1] 2026-02-21T17:07:05,155 LogTransaction.java:266 -
> Unfinished transaction log, deleting
> /Users/maxim.muzafarov/IdeaProjects/cassandra/build/test/cassandra/data/cql_test_keyspace/table_testexportingspecificdictionary_strateg_18-1b255f4def2540a60000000000000056/pa_txn_compaction_bf4375c0-0f47-11f1-88d2-574197b4b378.log
>
> INFO [RMI TCP Connection(26)-127.0.0.1] 2026-02-21T17:07:06,155
> ToolRunner.java:927 - >>>> Polling training status...SAMPLING
> INFO [RMI TCP Connection(26)-127.0.0.1] 2026-02-21T17:07:07,159
> ToolRunner.java:927 - >>>> Polling training status...SAMPLING
> INFO [RMI TCP Connection(26)-127.0.0.1] 2026-02-21T17:07:08,163
> ToolRunner.java:927 - >>>> Polling training status...SAMPLING
> INFO [RMI TCP Connection(26)-127.0.0.1] 2026-02-21T17:07:09,168
> ToolRunner.java:927 - >>>> Polling training status...SAMPLING
> {code}