[ 
https://issues.apache.org/jira/browse/CASSANDRA-4205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13266855#comment-13266855
 ] 

Jonathan Ellis commented on CASSANDRA-4205:
-------------------------------------------

Checked Brandon's sstables with CASSANDRA-4211.  One had a max timestamp of 
1335979912016 after upgradesstables, the other 1335979951175.  cfhistograms 
showed about 20% of reads hit both sstables.

This sounds about right; it's reasonable that the sstable w/ the higher max 
timestamp will contain *some* rows whose version is actually older than the 
other sstable's max.  The CASSANDRA-2498 approach will always start with the 
newer sstable, but it will only skip the older one if the first-seen versions 
of the columns requested have timestamps newer than the max on the older sstable.
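
To make the skip condition concrete, here is a minimal Python sketch of the 
behaviour described above.  It is not Cassandra's actual read path: the 
SSTable class, read_named_columns, and the row/column contents are 
hypothetical, and only the two max timestamps are taken from the sstables 
checked above.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class SSTable:
    """Hypothetical stand-in for an sstable and its max-timestamp metadata."""
    name: str
    max_timestamp: int
    # row key -> column name -> (timestamp, value)
    rows: Dict[str, Dict[str, Tuple[int, str]]]


def read_named_columns(sstables: List[SSTable], key: str, columns: List[str]):
    """Return (columns found, number of sstables actually read)."""
    result: Dict[str, Tuple[int, str]] = {}
    touched = 0
    # Visit sstables from highest to lowest max timestamp.
    for sstable in sorted(sstables, key=lambda s: s.max_timestamp, reverse=True):
        # Stop once every requested column has been seen with a timestamp
        # strictly newer than anything this (and any later) sstable can hold.
        if all(c in result and result[c][0] > sstable.max_timestamp
               for c in columns):
            break
        touched += 1
        row = sstable.rows.get(key, {})
        for c in columns:
            if c in row and (c not in result or row[c][0] > result[c][0]):
                result[c] = row[c]
    return result, touched


# Made-up row contents; only the two max timestamps come from the comment.
newer = SSTable("newer", 1335979951175, {
    "a": {"col": (1335979951175, "v2")},  # newer than the other sstable's max
    "b": {"col": (1335979900000, "v1")},  # older than the other sstable's max
})
older = SSTable("older", 1335979912016, {
    "b": {"col": (1335979890000, "v0")},
})

print(read_named_columns([older, newer], "a", ["col"])[1])  # 1 sstable read
print(read_named_columns([older, newer], "b", ["col"])[1])  # 2 sstables read
```

Row "a" is served from the newer sstable alone, while row "b" still has to 
consult the older one even though the newer sstable already holds its latest 
version, which is the kind of read that shows up in cfhistograms as hitting 
both sstables.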

So, I think there is no bug here related to upgraded timestamps; the 
CASSANDRA-2498 optimization just isn't a magic bullet to prevent all 
multiple-sstable reads.

The patch for creating a new version to represent "we have *correct* timestamps 
including row tombstones" is still relevant, though.
                
> SSTables are not updated with max timestamp on upgradesstables/compaction 
> leading to non-optimal performance.
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-4205
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4205
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 1.0.0
>            Reporter: Thorkild Stray 
>            Assignee: Jonathan Ellis
>            Priority: Critical
>             Fix For: 1.0.10, 1.1.1
>
>         Attachments: 4205.txt
>
>
> We upgraded from 0.7.9 to 1.0.7 on a cluster with a heavy update load. After 
> converting all the reads to named column reads instead of get_slice calls, we 
> noticed that we still weren't getting the performance improvements 
> implemented in CASSANDRA-2498. A single named column read was still touching 
> multiple SSTables according to nodetool cfhistograms. 
> To verify whether this was a reporting issue or a real issue, we ran 
> multiple tests with stress and noticed that it worked as expected. After 
> changing stress so that it ran the read/write test directly in the CF having 
> issues (3 times stress & flush), we noticed that stress also touched multiple 
> SSTables (according to cfhistograms).
> So, the root of the problem is "something" left over from our pre-1.0 days. 
> All SSTables were upgraded with upgradesstables, and have been written and 
> compacted many times since the upgrade (4 months ago). The usage pattern for 
> this CF is that it is constantly read and updated (overwritten), but no 
> deletes. 
> After discussing the problem with Brandon Williams on #cassandra, it seems 
> the problem might be because a max timestamp has never been written for the 
> old SSTables that were upgraded from pre 1.0. They have only been compacted, 
> and the max timestamp is not recorded during compactions. 
> A suggested fix is to special case this in upgradesstables so that a max 
> timestamp always exists for all SSTables. 
> {panel}
> 06:08 < driftx> thorkild_: tx.  The thing is we don't record the max 
> timestamp on compactions, but we can do it specially for upgradesstables.
> 06:08 < driftx> so, nothing in... nothing out.
> 06:10 < thorkild_> driftx: ah, so when you upgrade from before the metadata 
> was written, and that data is only feed through upgradesstables and 
> compactions -> never properly written?
> 06:10 < thorkild_> that makes sense.
> 06:11 < driftx> right, we never create it, we just reuse it :(
> {panel}


        
