[
https://issues.apache.org/jira/browse/CASSANDRA-4205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13266855#comment-13266855
]
Jonathan Ellis commented on CASSANDRA-4205:
-------------------------------------------
Checked Brandon's sstables with CASSANDRA-4211. One had a max timestamp of
1335979912016 after upgradesstables, the other 1335979951175. cfhistograms
showed about 20% of reads hit both sstables.
This sounds about right: it's reasonable that the sstable with the higher max
timestamp will contain *some* rows whose columns are actually older than the
other sstable's max. The CASSANDRA-2498 approach will always start with the
newer sstable, but it will only skip the older one if the first-seen versions
of the requested columns all have timestamps newer than the max on the older
sstable.
So I think there is no bug here related to upgraded timestamps; the
max-timestamp optimization just isn't a magic bullet that prevents all
multi-sstable reads.
The patch for creating a new version to represent "we have *correct* timestamps
including row tombstones" is still relevant, though.
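For illustration, the skip heuristic described above can be sketched roughly as
follows. This is a simplified model, not Cassandra's actual read path; the data
layout (dicts with a max_ts field and per-column (value, timestamp) pairs) is
invented for the example.

```python
def read_named_columns(sstables, requested):
    """Collate named-column reads across sstables, newest-first by max
    timestamp, skipping an sstable when every requested column has already
    been seen with a timestamp newer than that sstable's max (the
    CASSANDRA-2498 idea, simplified).

    sstables: list of {'max_ts': int, 'columns': {name: (value, ts)}}
    Returns (result, sstables_touched)."""
    result = {}   # column name -> (value, ts), newest version seen so far
    touched = 0
    for sstable in sorted(sstables, key=lambda s: s['max_ts'], reverse=True):
        # Skip the rest: nothing in this sstable can be newer than what we
        # already have for every requested column.
        if len(result) == len(requested) and all(
                result[name][1] > sstable['max_ts'] for name in requested):
            break
        touched += 1
        for name in requested:
            col = sstable['columns'].get(name)
            if col and (name not in result or col[1] > result[name][1]):
                result[name] = col
    return result, touched
```

This also shows why ~20% of reads hitting both sstables is expected: whenever
the first-seen version of a column is older than the second sstable's max
timestamp, the skip condition fails and both sstables must be read. And if an
sstable has no recorded max timestamp at all (the pre-1.0 case in this ticket),
the skip condition can never be evaluated safely, so it is never skipped.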
> SSTables are not updated with max timestamp on upgradesstables/compaction
> leading to non-optimal performance.
> -------------------------------------------------------------------------------------------------------------
>
> Key: CASSANDRA-4205
> URL: https://issues.apache.org/jira/browse/CASSANDRA-4205
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Affects Versions: 1.0.0
> Reporter: Thorkild Stray
> Assignee: Jonathan Ellis
> Priority: Critical
> Fix For: 1.0.10, 1.1.1
>
> Attachments: 4205.txt
>
>
> We upgraded from 0.7.9 to 1.0.7 on a cluster with a heavy update load. After
> converting all the reads to named column reads instead of get_slice calls, we
> noticed that we still weren't getting the performance improvements
> implemented in CASSANDRA-2498. A single named column read was still touching
> multiple SSTables according to nodetool cfhistograms.
> To determine whether this was a reporting issue or a real one, we ran
> multiple tests with stress and observed the expected behavior. After changing
> stress to run the read/write test directly against the problematic CF (3
> rounds of stress & flush), we saw that stress also touched multiple SSTables
> (according to cfhistograms).
> So, the root of the problem is "something" left over from our pre-1.0 days.
> All SSTables were upgraded with upgradesstables, and have been written and
> compacted many times since the upgrade (4 months ago). The usage pattern for
> this CF is that it is constantly read and updated (overwritten), but no
> deletes.
> After discussing the problem with Brandon Williams on #cassandra, it seems
> the problem may be that a max timestamp was never written for the old
> SSTables that were upgraded from pre-1.0. They have only been compacted,
> and the max timestamp is not recorded during compactions.
> A suggested fix is to special-case this in upgradesstables so that a max
> timestamp always exists for all SSTables.
> {panel}
> 06:08 < driftx> thorkild_: tx. The thing is we don't record the max
> timestamp on compactions, but we can do it specially for upgradesstables.
> 06:08 < driftx> so, nothing in... nothing out.
> 06:10 < thorkild_> driftx: ah, so when you upgrade from before the metadata
> was written, and that data is only feed through upgradesstables and
> compactions -> never properly written?
> 06:10 < thorkild_> that makes sense.
> 06:11 < driftx> right, we never create it, we just reuse it :(
> {panel}
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira