[jira] [Commented] (CASSANDRA-11623) Compactions w/ Short Rows Spending Time in getOnDiskFilePointer

Tom Petracca (JIRA) Fri, 22 Apr 2016 09:03:28 -0700

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-11623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15254135#comment-15254135
 ]


Tom Petracca commented on CASSANDRA-11623:
------------------------------------------

Agreed that chunkOffset always equals getOnDiskFilePointer().  I went the route
I did because it seems the whole reason getOnDiskFilePointer was being called
in the first place was to know how big the file would be if we stopped writing
(in other words, should I start a new sstable?), so I included the impact the
buffer would have on the eventual on-disk size.  I'm happy to switch it to just
chunkOffset though, because at the end of the day none of it needs to be exact.
Either way it would stay in that getEffectiveOnDiskBytes method, because for
the reasons stated in the next paragraph I think the getOnDiskFilePointer
method itself still needs to hit lseek directly for
CompressedSequentialWriters.
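
To make the trade-off concrete, here is a minimal sketch of the idea, not the
actual patch.  TrackedChunkWriter is a hypothetical stand-in (it is not
Cassandra's CompressedSequentialWriter and does no compression), and the
method bodies are assumptions; only the names chunkOffset,
getOnDiskFilePointer and getEffectiveOnDiskBytes come from the discussion.
It keeps a running chunkOffset in memory so the "should I start a new
sstable?" check needs no file-position query, while getOnDiskFilePointer
still asks the channel.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical chunked sequential writer used only for illustration.
final class TrackedChunkWriter implements AutoCloseable {
    private final FileChannel channel;
    private final ByteBuffer buffer;  // bytes not yet flushed as a chunk
    private long chunkOffset = 0;     // bytes already handed to the channel

    TrackedChunkWriter(Path path, int chunkSize) throws IOException {
        this.channel = FileChannel.open(path,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING);
        this.buffer = ByteBuffer.allocate(chunkSize);
    }

    // Append a row, flushing a chunk each time the buffer fills up.
    void write(byte[] row) throws IOException {
        int pos = 0;
        while (pos < row.length) {
            int n = Math.min(buffer.remaining(), row.length - pos);
            buffer.put(row, pos, n);
            pos += n;
            if (!buffer.hasRemaining()) {
                flushChunk();
            }
        }
    }

    private void flushChunk() throws IOException {
        buffer.flip();
        while (buffer.hasRemaining()) {
            chunkOffset += channel.write(buffer);
        }
        buffer.clear();
    }

    // Exact position: queries the channel on every call.
    long getOnDiskFilePointer() throws IOException {
        return channel.position();
    }

    // In-memory counter only: no file-position query.
    long getChunkOffset() {
        return chunkOffset;
    }

    // Estimate of the final size if we stopped writing now.  A compressing
    // writer would flush fewer bytes than it buffers, so treat this as a
    // rough figure rather than an exact one.
    long getEffectiveOnDiskBytes() {
        return chunkOffset + buffer.position();
    }

    @Override
    public void close() throws IOException {
        flushChunk();
        channel.close();
    }
}
```

A caller deciding whether to roll over to a new writer would compare
getEffectiveOnDiskBytes() (or just getChunkOffset()) against its target size
after each row, and reserve getOnDiskFilePointer() for the rare cases that
need the exact position.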

And yeah, a while back I tried removing the seekToChunkStart call on 1.2
(because presumably it doesn't need to exist), but that ended up causing
weirdness where truncate calls would produce corrupt sstables.  I didn't dig
into it any further or try it on later versions, though.  It gets called far
less frequently (only on every flush, as opposed to on every row write), so
for now I'd say it's not important.

> Compactions w/ Short Rows Spending Time in getOnDiskFilePointer
> ---------------------------------------------------------------
>
>                 Key: CASSANDRA-11623
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11623
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Tom Petracca
>            Priority: Minor
>         Attachments: compactiontask_profile.png
>
>
> Been doing some performance tuning and profiling of my cassandra cluster and
> noticed that compaction speeds for my tables that I know to have very short
> rows were going particularly slowly.  Profiling shows a ton of time being
> spent in BigTableWriter.getOnDiskFilePointer(), and attaching strace to a
> CompactionTask shows that a majority of time is being spent in lseek (called
> by getOnDiskFilePointer), and not in read or write.
> Going deeper, it looks like we call getOnDiskFilePointer for each row
> (sometimes multiple times per row) in order to see if we've reached our
> expected sstable size and should start a new writer.  This is pretty
> unnecessary.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
