Tom Petracca created CASSANDRA-11623:
----------------------------------------
Summary: Compactions w/ Short Rows Spending Time in getOnDiskFilePointer
Key: CASSANDRA-11623
URL: https://issues.apache.org/jira/browse/CASSANDRA-11623
Project: Cassandra
Issue Type: Improvement
Reporter: Tom Petracca
Priority: Minor
Attachments: compactiontask_profile.png
I have been doing some performance tuning and profiling of my Cassandra
cluster and noticed that compaction was particularly slow for tables that I
know to have very short rows. Profiling shows a large share of the time being
spent in BigTableWriter.getOnDiskFilePointer(), and attaching strace to a
CompactionTask shows that the majority of the time is spent in lseek (called
by getOnDiskFilePointer), not in read or write.
Going deeper, it looks like we call getOnDiskFilePointer for each row
(sometimes multiple times per row) in order to see if we've reached our
expected sstable size and should start a new writer. For short rows, checking
this often is unnecessary overhead.
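To illustrate the cost, here is a rough sketch of how the number of position
queries changes if the size check runs every N rows instead of every row. This
is not Cassandra code: FakeWriter, writeRows, and the checkInterval parameter
are hypothetical names made up for the example, and the fake writer only
counts queries where a real writer would issue an lseek.

```java
final class RolloverCheckSketch {
    /** Hypothetical stand-in for an sstable writer (not a Cassandra class). */
    static final class FakeWriter {
        private long position = 0;
        int positionQueries = 0;

        void append(int rowBytes) {
            position += rowBytes;
        }

        /** Stands in for getOnDiskFilePointer(); a real writer would lseek here. */
        long onDiskPosition() {
            positionQueries++;
            return position;
        }
    }

    /**
     * Writes `rows` rows of `rowBytes` each, checking the writer's size only
     * every `checkInterval` rows and rolling over to a new writer once
     * `maxBytes` is reached. Returns the total number of position queries.
     */
    static int writeRows(int rows, int rowBytes, long maxBytes, int checkInterval) {
        FakeWriter writer = new FakeWriter();
        int queries = 0;
        for (int i = 1; i <= rows; i++) {
            writer.append(rowBytes);
            if (i % checkInterval == 0 && writer.onDiskPosition() >= maxBytes) {
                queries += writer.positionQueries;
                writer = new FakeWriter(); // start a new "sstable"
            }
        }
        return queries + writer.positionQueries;
    }

    public static void main(String[] args) {
        // 100,000 rows of 20 bytes each, with a 1,000,000-byte target size.
        System.out.println("check every row:      " + writeRows(100_000, 20, 1_000_000, 1));
        System.out.println("check every 128 rows: " + writeRows(100_000, 20, 1_000_000, 128));
    }
}
```

The trade-off is that a writer can overshoot the target size by up to
checkInterval - 1 rows before rolling over, which should be negligible when
rows are short. Whether an interval, a locally tracked byte count, or some
other approach fits the real writer is an open question.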
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)