Samuel Klock created CASSANDRA-14941:
----------------------------------------
Summary: Expired secondary index sstables are not promptly
discarded under TWCS
Key: CASSANDRA-14941
URL: https://issues.apache.org/jira/browse/CASSANDRA-14941
Project: Cassandra
Issue Type: Bug
Components: Secondary Indexes
Reporter: Samuel Klock
We have a table in a cluster running 3.0.17 storing roughly time-series data
using TWCS with a secondary index. We've noticed that while expired sstables
for the table are discarded mostly when we expect them to be, the expired
sstables for the secondary index would linger for weeks longer than expected –
essentially indefinitely. Eventually the sstables would fill disks, which would
require manual steps (deleting ancient index sstables) to address. We verified
with {{sstableexpiredblockers}} that there wasn't anything on disk blocking the
expired sstables from being dropped, so this looks like a bug.
Through some debugging, we traced the problem to the index's memtables, which
were consistently (except _just_ after node restarts) reporting a minimum
timestamp from September 2015 – much older than any of our live data – which
causes {{CompactionController.getFullyExpiredSSTables()}} to consistently
return an empty set. The reason that the index sstables report this minimum
timestamp is because of how index updates are created, using
{{PartitionUpdate.singleRowUpdate()}}:
{code:java}
public static PartitionUpdate singleRowUpdate(CFMetaData metadata,
DecoratedKey key, Row row, Row staticRow)
{
MutableDeletionInfo deletionInfo = MutableDeletionInfo.live();
Holder holder = new Holder(
new PartitionColumns(
staticRow == null ? Columns.NONE :
Columns.from(staticRow.columns()),
row == null ? Columns.NONE : Columns.from(row.columns())
),
row == null ? BTree.empty() : BTree.singleton(row),
deletionInfo,
staticRow == null ? Rows.EMPTY_STATIC_ROW : staticRow,
EncodingStats.NO_STATS
);
return new PartitionUpdate(metadata, key, holder, deletionInfo, false);
}
{code}
The use of {{EncodingStats.NO_STATS}} makes it appear as though the earliest
timestamp in the resulting {{PartitionUpdate}} is from September 2015. That
timestamp becomes the minimum for the memtable.
Modifying this version of {{PartitionUpdate.singleRowUpdate()}} to:
{code:java}
public static PartitionUpdate singleRowUpdate(CFMetaData metadata,
DecoratedKey key, Row row, Row staticRow)
{
MutableDeletionInfo deletionInfo = MutableDeletionInfo.live();
staticRow = (staticRow == null ? Rows.EMPTY_STATIC_ROW : staticRow);
EncodingStats stats = EncodingStats.Collector.collect(staticRow,
(row == null ?
Collections.emptyIterator() :
Iterators.singletonIterator(row)),
deletionInfo);
Holder holder = new Holder(
new PartitionColumns(
staticRow == Rows.EMPTY_STATIC_ROW ? Columns.NONE :
Columns.from(staticRow.columns()),
row == null ? Columns.NONE : Columns.from(row.columns())
),
row == null ? BTree.empty() : BTree.singleton(row),
deletionInfo,
staticRow,
stats
);
return new PartitionUpdate(metadata, key, holder, deletionInfo, false);
}
{code}
(i.e., computing an {{EncodingStats}} from the contents of the update) seems to
fix the problem. However, we're not certain whether A) there's a functional
reason the method was using {{EncodingStats.NO_STATS}} previously or B) whether
the {{EncodingStats}} the revised version creates is correct (in particular,
the use of {{deletionInfo}} feels a little suspect). We're also not sure
whether there's a more appropriate fix (e.g., changing how the memtables
compute the minimum timestamp, particularly in the {{NO_STATS}} case).
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]