Interesting! I suspect I know what is causing the increased disk usage in TWCS, and it's a solvable problem. The problem is roughly something like this:

- Window 1 has sstables 1, 2, 3, 4, 5, 6
- We start compacting 1, 2, 3, 4 (using STCS-in-TWCS first window)
- The TWCS window rolls over
- We flush (sstable 7), and trigger the TWCS window major compaction, which starts compacting 5, 6, 7 + any other sstable from that window
- If the first compaction (1, 2, 3, 4) has finished by the time sstable 7 is flushed, we'll include its result in that compaction. If it hasn't, we'll have to do the major compaction twice to guarantee we have exactly one sstable per window, which will temporarily increase disk space
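
To make the sequence above concrete, here is a toy Python model of it. It is only a sketch of the scenario as described, not Cassandra's actual compaction code: the function names (`merge`, `compact_window`) and the representation of an sstable as a frozenset of flush ids are made up for illustration.

```python
def merge(sstables):
    """Model a compaction: the output holds every flush id found in the inputs."""
    return frozenset().union(*sstables)


def compact_window(window, in_flight, in_flight_done):
    """Return (sstables left in the window, number of major compactions run).

    window         -- all sstables in the window when the major compaction is triggered
    in_flight      -- the subset claimed by the earlier STCS compaction
    in_flight_done -- True if that earlier compaction finished before the trigger
    """
    live = set(window)
    busy = set(in_flight)

    if in_flight_done:
        # The earlier compaction's result is already available as one sstable.
        live -= busy
        live.add(merge(busy))
        busy = set()

    # First major compaction: it can only take sstables nobody else holds.
    candidates = live - busy
    live -= candidates
    live.add(merge(candidates))
    majors = 1

    if busy:
        # The earlier compaction finishes afterwards and leaves a second
        # sstable in the window, so the major compaction has to run again.
        live -= busy
        live.add(merge(busy))
        live = {merge(live)}
        majors += 1

    return live, majors


window = {frozenset({i}) for i in range(1, 8)}     # sstables 1..7
in_flight = {frozenset({i}) for i in range(1, 5)}  # 1, 2, 3, 4

print(compact_window(window, in_flight, in_flight_done=True)[1])
print(compact_window(window, in_flight, in_flight_done=False)[1])
```

In this model both cases end with a single sstable in the window, but the second case rewrites the 5, 6, 7 data twice, which is where the temporary extra disk usage would come from.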
We can likely fix this by not scheduling the major compaction until we know all of the sstables in the window are available to be compacted.

Also your data model is probably typical, but not well suited for time series cases. If you find my 2016 Cassandra Summit TWCS talk (it's on YouTube), I mention aligning partition keys to TWCS windows, which involves adding a second component to the partition key. This is hugely important in terms of making sure TWCS data expires quickly and avoiding having to read from more than one TWCS window at a time.

- Jeff

On Mon, May 14, 2018 at 7:12 AM, Lucas Benevides <lu...@maurobenevides.com.br> wrote:

> Dear community,
>
> I want to tell you about my paper published in a conference in March. The
> title is "NoSQL Database Performance Tuning for IoT Data - Cassandra
> Case Study" and it is available (not for free) in
> http://www.scitepress.org/DigitalLibrary/Link.aspx?doi=10.5220/0006782702770284 .
>
> TWCS is used and compared with DTCS.
>
> I hope you can download it, unfortunately I cannot send copies as the
> publisher has its copyright.
>
> Lucas B. Dias
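
P.S. On aligning partition keys to TWCS windows: a minimal Python sketch of how a client might derive the second partition key component. Everything specific here is an assumption for illustration: the 1-day window, the names `window_bucket` and `partition_key`, and the `sensor_id` column are hypothetical, and the bucket size should match whatever window your table actually uses.

```python
from datetime import datetime, timezone

# Assumed for illustration: the table uses a 1-day TWCS window.
WINDOW_SECONDS = 24 * 60 * 60


def window_bucket(ts: datetime, window_seconds: int = WINDOW_SECONDS) -> int:
    """Round a timestamp down to the start of its window, in epoch seconds."""
    epoch = int(ts.timestamp())
    return epoch - (epoch % window_seconds)


def partition_key(sensor_id: str, ts: datetime) -> tuple:
    """Hypothetical composite partition key: (sensor_id, window bucket)."""
    return (sensor_id, window_bucket(ts))


morning = datetime(2018, 5, 14, 7, 12, tzinfo=timezone.utc)
evening = datetime(2018, 5, 14, 23, 59, tzinfo=timezone.utc)
next_day = datetime(2018, 5, 15, 0, 1, tzinfo=timezone.utc)

# Same day -> same partition; next day -> a new partition.
print(partition_key("sensor-1", morning) == partition_key("sensor-1", evening))
print(partition_key("sensor-1", morning) == partition_key("sensor-1", next_day))
```

With a key like this, each partition's data falls within a single window, so a read for a given time range only touches the sstables of the windows it covers.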