[ 
https://issues.apache.org/jira/browse/CASSANDRA-21188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18061310#comment-18061310
 ] 

Stefan Miklosovic edited comment on CASSANDRA-21188 at 2/26/26 10:04 AM:
-------------------------------------------------------------------------

[~ycai] I think it would be better to do this now instead of waiting for 
CASSANDRA-19776. The problems described in CASSANDRA-19776 are not critical, and 
the whole codebase is already using that. We are special in this regard in that 
we reference the SSTables we want to train on in a custom way.

https://github.com/apache/cassandra/pull/4638


> Race between compaction and dictionary compression training. Status stuck at 
> SAMPLING. ExportImportListCompressionDictionaryTest hangs
> --------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-21188
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-21188
>             Project: Apache Cassandra
>          Issue Type: Bug
>          Components: Feature/Compression
>            Reporter: Maxim Muzafarov
>            Assignee: Stefan Miklosovic
>            Priority: Normal
>             Fix For: 5.x
>
>
> There is a race between the compaction process and the start of dictionary 
> compression training:
>  # CompressionDictionaryManager: we collect all live SSTables
>  # ICompressionDictionaryTrainer: a new training starts
>  # currentTrainingStatus moves to SAMPLING
>  # All SSTables get compacted by a concurrent compaction thread
>  # SSTableSamplingTask: in the constructor, sstable.tryRef returns null
>  # We run this task on a thread pool and it exits in cancelManualTraining
>  # currentTrainingStatus remains SAMPLING (it should be *FAILED* or 
> {*}COMPLETED{*}!)
> ExportImportListCompressionDictionaryTest hangs for 10 minutes (a configured 
> constant) for no reason. 
> The logs:
> {code}
> INFO  [PerDiskMemtableFlushWriter_0:1] 2026-02-21T17:07:05,061 
> Flushing.java:157 - Writing 
> Memtable-table_testexportingspecificdictionary_strateg_18@1268950324(61.523KiB
>  serialized bytes, 1000 ops, 506.836KiB (0%) on-heap, 0B (0%) off-heap), 
> flushed range = [min(-9223372036854775808), max(9223372036854775807))
> INFO  [PerDiskMemtableFlushWriter_0:1] 2026-02-21T17:07:05,061 
> Flushing.java:197 - Completed flushing 
> /Users/maxim.muzafarov/IdeaProjects/cassandra/build/test/cassandra/data/cql_test_keyspace/table_testexportingspecificdictionary_strateg_18-1b255f4def2540a60000000000000056/pa-11-big-Data.db
>  (28.198KiB) for commitlog position 
> CommitLogPosition(segmentId=1771693567140, position=654098), time spent: 0 
> ms, bytes flushed: 28875 / (rate: 28.198KiB/s), partitions flushed: 1000 / 
> (rate: 1000/s), rows: 1000 / (rate: 1000/s), cpu time: 0 ms, heap allocated: 
> 220.711KiB
> INFO  [MemtableFlushWriter:1] 2026-02-21T17:07:05,084 LogTransaction.java:266 
> - Unfinished transaction log, deleting 
> /Users/maxim.muzafarov/IdeaProjects/cassandra/build/test/cassandra/data/cql_test_keyspace/table_testexportingspecificdictionary_strateg_18-1b255f4def2540a60000000000000056/pa_txn_flush_bf3ac330-0f47-11f1-88d2-574197b4b378.log
>  
> DEBUG [MemtableFlushWriter:1] 2026-02-21T17:07:05,087 
> ColumnFamilyStore.java:1416 - Flushed to 
> [BigTableReader:big(path='/Users/maxim.muzafarov/IdeaProjects/cassandra/build/test/cassandra/data/cql_test_keyspace/table_testexportingspecificdictionary_strateg_18-1b255f4def2540a60000000000000056/pa-11-big-Data.db')]
>  (1 sstables, 30.889KiB), biggest 30.889KiB, smallest 30.889KiB
> INFO  [main] 2026-02-21T17:07:05,091 ColumnFamilyStore.java:1088 - Enqueuing 
> flush of cql_test_keyspace.table_testexportingspecificdictionary_strateg_18, 
> Reason: UNIT_TESTS, Usage: 506.836KiB (0%) on-heap, 0B (0%) off-heap
> INFO  [PerDiskMemtableFlushWriter_0:2] 2026-02-21T17:07:05,092 
> Flushing.java:157 - Writing 
> Memtable-table_testexportingspecificdictionary_strateg_18@957877902(61.523KiB 
> serialized bytes, 1000 ops, 506.836KiB (0%) on-heap, 0B (0%) off-heap), 
> flushed range = [min(-9223372036854775808), max(9223372036854775807))
> INFO  [PerDiskMemtableFlushWriter_0:2] 2026-02-21T17:07:05,094 
> Flushing.java:197 - Completed flushing 
> /Users/maxim.muzafarov/IdeaProjects/cassandra/build/test/cassandra/data/cql_test_keyspace/table_testexportingspecificdictionary_strateg_18-1b255f4def2540a60000000000000056/pa-12-big-Data.db
>  (28.201KiB) for commitlog position 
> CommitLogPosition(segmentId=1771693567140, position=726098), time spent: 0 
> ms, bytes flushed: 28878 / (rate: 28.201KiB/s), partitions flushed: 1000 / 
> (rate: 1000/s), rows: 1000 / (rate: 1000/s), cpu time: 0 ms, heap allocated: 
> 220.711KiB
> INFO  [MemtableFlushWriter:2] 2026-02-21T17:07:05,112 LogTransaction.java:266 
> - Unfinished transaction log, deleting 
> /Users/maxim.muzafarov/IdeaProjects/cassandra/build/test/cassandra/data/cql_test_keyspace/table_testexportingspecificdictionary_strateg_18-1b255f4def2540a60000000000000056/pa_txn_flush_bf3fa530-0f47-11f1-88d2-574197b4b378.log
>  
> DEBUG [MemtableFlushWriter:2] 2026-02-21T17:07:05,116 
> ColumnFamilyStore.java:1416 - Flushed to 
> [BigTableReader:big(path='/Users/maxim.muzafarov/IdeaProjects/cassandra/build/test/cassandra/data/cql_test_keyspace/table_testexportingspecificdictionary_strateg_18-1b255f4def2540a60000000000000056/pa-12-big-Data.db')]
>  (1 sstables, 30.887KiB), biggest 30.887KiB, smallest 30.887KiB
> DEBUG [CompactionExecutor:2] 2026-02-21T17:07:05,117 Directories.java:554 - 
> FileStore /System/Volumes/Data (/dev/disk3s5) has 593792975872 bytes 
> available, checking if we can write 103847 bytes
> INFO  [CompactionExecutor:2] 2026-02-21T17:07:05,117 CompactionTask.java:229 
> - Compacting (bf4375c0-0f47-11f1-88d2-574197b4b378) 
> [/Users/maxim.muzafarov/IdeaProjects/cassandra/build/test/cassandra/data/cql_test_keyspace/table_testexportingspecificdictionary_strateg_18-1b255f4def2540a60000000000000056/pa-11-big-Data.db,
>  
> /Users/maxim.muzafarov/IdeaProjects/cassandra/build/test/cassandra/data/cql_test_keyspace/table_testexportingspecificdictionary_strateg_18-1b255f4def2540a60000000000000056/pa-12-big-Data.db,
>  
> /Users/maxim.muzafarov/IdeaProjects/cassandra/build/test/cassandra/data/cql_test_keyspace/table_testexportingspecificdictionary_strateg_18-1b255f4def2540a60000000000000056/pa-9-big-Data.db,
>  
> /Users/maxim.muzafarov/IdeaProjects/cassandra/build/test/cassandra/data/cql_test_keyspace/table_testexportingspecificdictionary_strateg_18-1b255f4def2540a60000000000000056/pa-10-big-Data.db,
>  ]
> DEBUG [CompactionExecutor:2] 2026-02-21T17:07:05,118 CursorCompactor.java:152 
> - Cursor compaction for table: 
> table_testexportingspecificdictionary_strateg_18 keyspace: cql_test_keyspace 
> is supported.
> INFO  [RMI TCP Connection(26)-127.0.0.1] 2026-02-21T17:07:05,146 
> CommandInvokerService.java:185 - Executing command 'train' with execution ID: 
> a5960218-7f58-41a2-a06f-d627acf20efd
> INFO  [RMI TCP Connection(26)-127.0.0.1] 2026-02-21T17:07:05,147 
> CompressionDictionaryManager.java:237 - Starting SSTable-based training for 
> cql_test_keyspace.table_testexportingspecificdictionary_strateg_18 with 1 
> SSTables
> INFO  [RMI TCP Connection(26)-127.0.0.1] 2026-02-21T17:07:05,150 
> CompressionDictionaryScheduler.java:101 - Starting SSTable-based dictionary 
> training for 
> cql_test_keyspace.table_testexportingspecificdictionary_strateg_18 from 1 
> SSTables
> DEBUG [RMI TCP Connection(26)-127.0.0.1] 2026-02-21T17:07:05,150 
> CompressionDictionaryScheduler.java:198 - Couldn't acquire reference to 
> SSTable 
> /Users/maxim.muzafarov/IdeaProjects/cassandra/build/test/cassandra/data/cql_test_keyspace/table_testexportingspecificdictionary_strateg_18-1b255f4def2540a60000000000000056/pa-13-big.
>  It may have been removed.
> WARN  [NonPeriodicTasks:1] 2026-02-21T17:07:05,150 
> CompressionDictionaryScheduler.java:213 - No SSTables available for sampling 
> in cql_test_keyspace.table_testexportingspecificdictionary_strateg_18
> INFO  [RMI TCP Connection(26)-127.0.0.1] 2026-02-21T17:07:05,150 
> ToolRunner.java:927 - >>>> Polling training status...SAMPLING
> INFO  [CompactionExecutor:2] 2026-02-21T17:07:05,152 
> CursorCompactor.java:1574 - Compaction ended 
> bf4375c0-0f47-11f1-88d2-574197b4b378: { data bytes read = 294620, data bytes 
> written = 297868,  input (keys = [1:10000,] = 10000, rows = [1:10000,] = 
> 10000, cells = [1:10000,] = 10000),  output (keys = 10000, rows = 10000, 
> cells = 10000)}
> INFO  [CompactionExecutor:2] 2026-02-21T17:07:05,153 CompactionTask.java:336 
> - Compacted (bf4375c0-0f47-11f1-88d2-574197b4b378) 4 sstables to 
> [build/test/cassandra/data/cql_test_keyspace/table_testexportingspecificdictionary_strateg_18-1b255f4def2540a60000000000000056/pa-13-big,]
>  to level=0.  101.413KiB to 91.637KiB (~90% of original) in 35ms.  Read 
> Throughput = 2.826MiB/s, Write Throughput = 2.554MiB/s, Row Throughput = 
> ~10,000/s.  10,000 total partitions merged to 10,000.  Partition merge counts 
> were {1:10000, }. Time spent writing keys = 10ms
> INFO  [NonPeriodicTasks:1] 2026-02-21T17:07:05,153 BigFormat.java:324 - 
> Deleting sstable: 
> /Users/maxim.muzafarov/IdeaProjects/cassandra/build/test/cassandra/data/cql_test_keyspace/table_testexportingspecificdictionary_strateg_18-1b255f4def2540a60000000000000056/pa-11-big
> INFO  [NonPeriodicTasks:1] 2026-02-21T17:07:05,154 BigFormat.java:324 - 
> Deleting sstable: 
> /Users/maxim.muzafarov/IdeaProjects/cassandra/build/test/cassandra/data/cql_test_keyspace/table_testexportingspecificdictionary_strateg_18-1b255f4def2540a60000000000000056/pa-12-big
> INFO  [NonPeriodicTasks:1] 2026-02-21T17:07:05,154 BigFormat.java:324 - 
> Deleting sstable: 
> /Users/maxim.muzafarov/IdeaProjects/cassandra/build/test/cassandra/data/cql_test_keyspace/table_testexportingspecificdictionary_strateg_18-1b255f4def2540a60000000000000056/pa-9-big
> INFO  [NonPeriodicTasks:1] 2026-02-21T17:07:05,155 BigFormat.java:324 - 
> Deleting sstable: 
> /Users/maxim.muzafarov/IdeaProjects/cassandra/build/test/cassandra/data/cql_test_keyspace/table_testexportingspecificdictionary_strateg_18-1b255f4def2540a60000000000000056/pa-10-big
> INFO  [NonPeriodicTasks:1] 2026-02-21T17:07:05,155 LogTransaction.java:266 - 
> Unfinished transaction log, deleting 
> /Users/maxim.muzafarov/IdeaProjects/cassandra/build/test/cassandra/data/cql_test_keyspace/table_testexportingspecificdictionary_strateg_18-1b255f4def2540a60000000000000056/pa_txn_compaction_bf4375c0-0f47-11f1-88d2-574197b4b378.log
>  
> INFO  [RMI TCP Connection(26)-127.0.0.1] 2026-02-21T17:07:06,155 
> ToolRunner.java:927 - >>>> Polling training status...SAMPLING
> INFO  [RMI TCP Connection(26)-127.0.0.1] 2026-02-21T17:07:07,159 
> ToolRunner.java:927 - >>>> Polling training status...SAMPLING
> INFO  [RMI TCP Connection(26)-127.0.0.1] 2026-02-21T17:07:08,163 
> ToolRunner.java:927 - >>>> Polling training status...SAMPLING
> INFO  [RMI TCP Connection(26)-127.0.0.1] 2026-02-21T17:07:09,168 
> ToolRunner.java:927 - >>>> Polling training status...SAMPLING
> {code}
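
The sequence in the description can be illustrated with a minimal, self-contained sketch. It is not Cassandra's actual code: the class TrainingStatusSketch, the FakeSSTable stand-in, and the NOT_STARTED and TRAINING states are assumptions made for illustration. Only the SAMPLING, FAILED and COMPLETED states and the tryRef-returns-null behaviour come from the report. The sketch shows one possible way to make sure the status always reaches a terminal state when no SSTable from the snapshot can be referenced any more.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical model of the status handling; names are illustrative only.
public class TrainingStatusSketch {
    // SAMPLING, FAILED and COMPLETED appear in the report; the rest are assumed.
    enum Status { NOT_STARTED, SAMPLING, TRAINING, COMPLETED, FAILED }

    // Stand-in for an SSTable that may have been compacted away after the
    // list of live SSTables was collected.
    static final class FakeSSTable {
        private final boolean compactedAway;

        FakeSSTable(boolean compactedAway) {
            this.compactedAway = compactedAway;
        }

        // Mimics the tryRef behaviour from the report: null once the SSTable is gone.
        Object tryRef() {
            return compactedAway ? null : new Object();
        }
    }

    private final AtomicReference<Status> status =
            new AtomicReference<>(Status.NOT_STARTED);

    Status status() {
        return status.get();
    }

    // Runs one simulated training over a previously collected snapshot.
    Status train(List<FakeSSTable> snapshot) {
        status.set(Status.SAMPLING);

        long referenced = snapshot.stream()
                .map(FakeSSTable::tryRef)
                .filter(ref -> ref != null)
                .count();

        if (referenced == 0) {
            // Without this transition the status would stay at SAMPLING
            // forever and any poller would wait until its timeout.
            status.set(Status.FAILED);
            return status.get();
        }

        // Sampling and training are elided; only the transitions are modelled.
        status.set(Status.TRAINING);
        status.set(Status.COMPLETED);
        return status.get();
    }

    public static void main(String[] args) {
        TrainingStatusSketch raced = new TrainingStatusSketch();
        System.out.println(raced.train(List.of(new FakeSSTable(true))));

        TrainingStatusSketch normal = new TrainingStatusSketch();
        System.out.println(normal.train(List.of(new FakeSSTable(false))));
    }
}
```

In this model a caller polling status() sees a terminal value in both cases, so a test waiting on the status would not need to run into its timeout.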


