[
https://issues.apache.org/jira/browse/CASSANDRA-3510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13154252#comment-13154252
]
Jonathan Ellis commented on CASSANDRA-3510:
-------------------------------------------
Referring to the isLatestVersion check?
> Incorrect query results due to invalid SSTable.maxTimestamp
> -----------------------------------------------------------
>
> Key: CASSANDRA-3510
> URL: https://issues.apache.org/jira/browse/CASSANDRA-3510
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Affects Versions: 1.0.3
> Reporter: Aaron Morton
> Priority: Critical
> Attachments: 0001-3510-ignore-maxTimestamp-if-Long.MIN_VALUE.patch,
> 0002-3510-update-maxTimestamp-during-repair.patch, 3510.patch
>
>
> related to CASSANDRA-3446
> (sorry this is so long, took me a bit to work through it all and there is a
> lot of new code :) )
>
> h1. Summary
> SSTable.maxTimestamp for files created before 1.0 defaults to Long.MIN_VALUE,
> and this means the wrong data is returned from queries.
>
> h2. Details
> Noticed on a cluster that was upgraded from 0.8.X to 1.X; it then had trouble
> similar to CASSANDRA-3446. It was rolled back to 0.8 and then migrated to
> 1.0.3.
> 4 Node cluster, all files upgraded to "hb" format.
> In a super CF there are situations where a get for a sub column returns a
> different value than a get for the parent column, e.g.
> {noformat}
> [default@XXX] get Users[ascii('username')]['meta']['password'];
> => (column=password, value=3130323130343130, timestamp=1307352647576000)
> [default@XX] get Users[ascii('username')]['meta'];
> (snip)
> => (column=password, value=3034323131303034, timestamp=1319563673493000)
> {noformat}
> The correct value is the second one.
> I added logging after line 109 in
> o.a.c.db.CollationController.collectTimeOrderedData() to log the sstable
> name and the file's max timestamp; this is what I got:
> {code:java}
> for (SSTableReader sstable : view.sstables)
> {
>     long currentMaxTs = sstable.getMaxTimestamp();
>     logger.debug(String.format("Got sstable %s and max TS %d", sstable, currentMaxTs));
>     reduceNameFilter(reducedFilter, container, currentMaxTs);
>     ...
> }
> {code}
> {noformat}
> DEBUG 14:08:46,012 Got sstable
> SSTableReader(path='/var/lib/cassandra/data/X/Users-hb-12348-Data.db') and
> max TS 1321824847534000
> DEBUG 14:08:47,231 Got sstable
> SSTableReader(path='/var/lib/cassandra/data/X/Users-hb-12346-Data.db') and
> max TS 1321813380793000
> DEBUG 14:08:49,879 Got sstable
> SSTableReader(path='/var/lib/cassandra/data/X/Users-hb-12330-Data.db') and
> max TS -9223372036854775808
> DEBUG 14:08:49,880 Got sstable
> SSTableReader(path='/var/lib/cassandra/data/X/Users-hb-12325-Data.db') and
> max TS -9223372036854775808
> {noformat}
> The key I was reading is present in files 12330 and 12325; the first contains
> the *old / wrong* value with timestamp 1307352647576000 above, and the second
> contains the *new / correct* value with timestamp 1319563673493000.
> *Updated:* Incorrect, it was a later file that had the correct value, see
> the first comment.
> When CollationController.collectTimeOrderedData() processes the 12325 file
> (after processing the 12330 file) while looping over the sstables, the call to
> reduceNameFilter() removes the column from the filter, because the column
> read from the 12330 file has a timestamp of 1307352647576000 and the 12325
> file incorrectly has a max timestamp of -9223372036854775808.
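> To make that mechanism concrete, here is a standalone, simplified model of the
> filter-reduction step (this is not the real CollationController code; the names
> ReduceNameFilterModel, namesToFetch and collectedTimestamps are invented for
> illustration):
> {code:java}
> import java.util.*;
>
> // Simplified model: once a column has been collected with a timestamp newer than
> // an sstable's recorded maxTimestamp, its name is dropped from the filter before
> // that sstable is read, on the assumption that the sstable cannot hold anything
> // newer. With a bogus maxTimestamp of Long.MIN_VALUE that assumption always
> // "holds", so the sstable that actually contains the latest value is skipped.
> public class ReduceNameFilterModel
> {
>     static void reduceNameFilter(Set<String> namesToFetch,
>                                  Map<String, Long> collectedTimestamps,
>                                  long sstableMaxTimestamp)
>     {
>         Iterator<String> iter = namesToFetch.iterator();
>         while (iter.hasNext())
>         {
>             Long ts = collectedTimestamps.get(iter.next());
>             if (ts != null && ts > sstableMaxTimestamp)
>                 iter.remove();
>         }
>     }
>
>     public static void main(String[] args)
>     {
>         Set<String> namesToFetch = new HashSet<String>(Arrays.asList("password"));
>         Map<String, Long> collected = new HashMap<String, Long>();
>         collected.put("password", 1307352647576000L); // old value already read from 12330
>
>         // 12325 holds the newer value, but its stats say maxTimestamp = Long.MIN_VALUE
>         reduceNameFilter(namesToFetch, collected, Long.MIN_VALUE);
>
>         System.out.println(namesToFetch); // [] -> 12325 is never consulted, the stale value wins
>     }
> }
> {code}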
> SSTableMetadata is reading the max timestamp from the stats file, but it is
> Long.MIN_VALUE. I think this happens because scrub creates the SSTableWriter
> using cfs.createCompactionWriter(), which sets the maxTimestamp in the metadata
> collector according to the maxTimestamp in the metadata for the file(s)
> that will be scrubbed / compacted. But for pre-1.0 format files the default
> in SSTableMetadata is Long.MIN_VALUE (see SSTableMetadata.deserialize() and
> the ctor). So scrubbing a pre-1.0 file will write a stats file that has
> maxTimestamp as Long.MIN_VALUE.
> During scrubbing the SSTableWriter does not update the maxTimestamp, because
> append(AbstractCompactedRow) is called, which expects that
> cfs.createCompactionWriter() was able to set the correct maxTimestamp on the
> metadata. Compaction also uses append(AbstractCompactedRow), so it may create an
> SSTable with an incorrect maxTimestamp if one of the input files started life
> as a pre-1.0 file and has a bad maxTimestamp.
> It looks like the only time the maxTimestamp is calculated is when the
> SSTable is originally written, so the error from the old files will be
> carried along.
> e.g. if files a, b and c have the maxTimestamps 10, 100 and Long.MIN_VALUE,
> compaction will write an SSTable with maxTimestamp 100. However file c may
> actually contain columns with a timestamp > 100, which will end up in the
> compacted file.
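> As a quick worked example of how the bad value propagates (just a sketch of the
> max-of-inputs behaviour described above, not the real metadata collector API):
> {code:java}
> // Sketch: if the new file's maxTimestamp is seeded from the maximum of the inputs'
> // recorded maxTimestamps, a bogus Long.MIN_VALUE input is silently absorbed by
> // Math.max, even though that file may contain columns newer than the result.
> public class MaxTimestampPropagation
> {
>     public static void main(String[] args)
>     {
>         long a = 10L;              // file a
>         long b = 100L;             // file b
>         long c = Long.MIN_VALUE;   // file c: pre-1.0 origin, stats default
>
>         long compacted = Math.max(a, Math.max(b, c));
>         System.out.println(compacted); // 100, even if c really holds columns with timestamps > 100
>     }
> }
> {code}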
> h1. Reproduce
> 1. Start a clean 0.8.7
> 2. Add a schema (details of the schema do not matter):
> {noformat}
> [default@unknown] create keyspace dev;
> 5f834620-140b-11e1-0000-242d50cf1fdf
> Waiting for schema agreement...
> ... schemas agree across the cluster
> [default@unknown]
> [default@unknown] use dev;
> Authenticated to keyspace: dev
> [default@dev]
> [default@dev] create column family super_dev with column_type = 'Super'
> ... and key_validation_class = 'AsciiType' and comparator = 'AsciiType' and
> ... subcomparator = 'AsciiType' and default_validation_class = 'AsciiType';
> 60490720-140b-11e1-0000-242d50cf1fdf
> Waiting for schema agreement...
> ... schemas agree across the cluster
> {noformat}
> 3. Shutdown 0.8.7
> 4. Start 1.0.3 using the same data. Check the schema version loaded; the example
> below shows the wrong schema being loaded. I stepped through the code and the
> wrong value was read from Migration.getLastMigrationId() due to this bug.
> {noformat}
> INFO [main] 2011-11-21 19:39:08,546 DatabaseDescriptor.java (line 501)
> Loading schema version 5f834620-140b-11e1-0000-242d50cf1fdf
> {noformat}
> 5. Check the schema using the 1.0.3 CLI
> {noformat}
> [default@unknown] use dev;
> Authenticated to keyspace: dev
> [default@dev] describe;
> Keyspace: dev:
> Replication Strategy: org.apache.cassandra.locator.NetworkTopologyStrategy
> Durable Writes: true
> Options: [datacenter1:1]
> Column Families:
> [default@dev]
> {noformat}
> 6. I then did a 1.0.3 scrub and re-started. The correct schema version was
> read, but stepping through the code both Schema SSTables had Long.MIN_VALUE as
> the maxTimestamp, so I think it was only the random order of the files that
> made it work.
> {noformat}
> DEBUG 19:52:30,744 Got sstable
> SSTableReader(path='/var/lib/cassandra/data/system/Schema-hb-4-Data.db') and
> max TS -9223372036854775808
> DEBUG 19:52:30,744 Got sstable
> SSTableReader(path='/var/lib/cassandra/data/system/Schema-hb-3-Data.db') and
> max TS -9223372036854775808
> {noformat}
> h1. Fixes
> Not sure (wanted to get the ticket opened and find out if I was imagining
> things), guessing...
> Use Long.MIN_VALUE as a magic maxTimestamp that means the value is not known.
> This would not fix issues where the incorrect maxTimestamp has already been
> included in a compaction.
>
> Looking at making scrub re-calculate the maxTimestamp.
> Also wondering if the maxTimestamp should default to Long.MAX_VALUE when read
> from a file format that does not support maxTimestamp?
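> To make the first idea concrete, a minimal sketch of the guard (reusing the
> invented names from the model above, not the real code):
> {code:java}
> // Sketch of the "magic value" idea: treat Long.MIN_VALUE as "maxTimestamp unknown"
> // and skip the name-filter reduction for such sstables, so they are always read.
> static void reduceNameFilter(Set<String> namesToFetch,
>                              Map<String, Long> collectedTimestamps,
>                              long sstableMaxTimestamp)
> {
>     if (sstableMaxTimestamp == Long.MIN_VALUE)
>         return; // unknown: cannot prove this sstable holds nothing newer
>
>     Iterator<String> iter = namesToFetch.iterator();
>     while (iter.hasNext())
>     {
>         Long ts = collectedTimestamps.get(iter.next());
>         if (ts != null && ts > sstableMaxTimestamp)
>             iter.remove();
>     }
> }
> {code}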
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira