Dominic Letz created CASSANDRA-8547:
---------------------------------------
Summary: Make RangeTombstone.Tracker.isDeleted() faster
Key: CASSANDRA-8547
URL: https://issues.apache.org/jira/browse/CASSANDRA-8547
Project: Cassandra
Issue Type: Improvement
Components: Core
Environment: 2.0.11
Reporter: Dominic Letz
Attachments: rangetombstone.tracker.txt
During compactions and repairs on data with many tombstones, an exorbitant amount of time
is spent in RangeTombstone.Tracker.isDeleted().
The time spent there can be so large that compactions and repairs appear
"stalled", with the estimated remaining time frozen at the same value for
days.
Using VisualVM, I sample-profiled the code during execution and found this hot spot
in both compaction and repair (point-in-time backtraces
attached).
Looking at the code, the problem is the linear scan over all tracked ranges, done once per column:
{code}
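// As of 2.0.11: every call scans all tracked range tombstones, i.e. O(n) per column
public boolean isDeleted(Column column)
{
    for (RangeTombstone tombstone : ranges)
    {
        // Deleted if the name falls within [min, max] and the tombstone is at least as recent as the column
        if (comparator.compare(column.name(), tombstone.min) >= 0
            && comparator.compare(column.name(), tombstone.max) <= 0
            && tombstone.maxTimestamp() >= column.timestamp())
        {
            return true;
        }
    }
    return false;
}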
{code}
I would like to propose changing this to use a sorted structure instead (e.g.
RangeTombstoneList), so that a lookup does not have to scan every range.
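A minimal sketch of the proposed direction, assuming (for illustration only) that column names are plain `long` values rather than `ByteBuffer`s ordered by a `comparator`; the class `SortedRangeTracker` is hypothetical. Instead of keeping tombstones in a list, it normalizes them into sorted, non-overlapping segments in a `java.util.TreeMap`, each segment holding the highest deletion timestamp that covers it, so `isDeleted` becomes a single `floorEntry` lookup (O(log n)) rather than a scan. This is a swapped-in technique and not how `RangeTombstoneList` is actually implemented; it only illustrates the idea of keeping ranges sorted.

{code}
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical sketch: range tombstones stored as a step function over long
 * column names. Each key starts a segment that extends to the next key; the
 * value is the highest tombstone timestamp covering that segment, or NONE.
 */
final class SortedRangeTracker {
    private static final long NONE = Long.MIN_VALUE; // "no tombstone covers this segment"
    private final TreeMap<Long, Long> segments = new TreeMap<>();

    /** Records a tombstone deleting names in [min, max] (inclusive) at timestamp ts. */
    void add(long min, long max, long ts) {
        if (min > max || max == Long.MAX_VALUE || ts == NONE)
            throw new IllegalArgumentException("unsupported range or timestamp");
        long end = max + 1; // convert the inclusive max into an exclusive end
        split(min);
        split(end);
        // Raise the timestamp of every segment inside [min, end). Keys are copied
        // first so the map is not modified while iterating over a view of it.
        List<Long> keys = new ArrayList<>(segments.subMap(min, true, end, false).keySet());
        for (long k : keys)
            segments.put(k, Math.max(segments.get(k), ts));
    }

    /** Same semantics as the original: covered by a tombstone at least as recent. */
    boolean isDeleted(long name, long columnTimestamp) {
        Map.Entry<Long, Long> e = segments.floorEntry(name); // O(log n)
        if (e == null)
            return false; // name lies before every tracked segment
        long deletedAt = e.getValue();
        return deletedAt != NONE && deletedAt >= columnTimestamp;
    }

    /** Ensures a segment boundary exists at k, inheriting the value in effect there. */
    private void split(long k) {
        Map.Entry<Long, Long> floor = segments.floorEntry(k);
        if (floor == null)
            segments.put(k, NONE);
        else if (floor.getKey() != k)
            segments.put(k, floor.getValue());
    }
}
{code}

The trade-off is that the work moves into `add`, which costs O(log n + k) where k is the number of existing segments the new range overlaps; since compaction typically calls `isDeleted` once per column, far more often than it adds a tombstone, that is the cheaper side to pay on. Adjacent segments with equal timestamps are not merged in this sketch, so the map may keep more boundaries than strictly necessary.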