[
https://issues.apache.org/jira/browse/CASSANDRA-11473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15233510#comment-15233510
]
Sylvain Lebresne commented on CASSANDRA-11473:
----------------------------------------------
I haven't looked very carefully yet (and probably won't be able to next week as
I'm on vacation), but one thing that would be good to know is the
history of that cluster. Has it, by any chance, been upgraded from a beta/RC of
3.0? The fact that the extra bytes are always there *and* are accounted for in
the row size strongly suggests it's not some corruption. But at the same time, it's
hard to believe that the code genuinely doesn't write the same thing that it
reads, as I'd assume something like that would have been detected easily and
we'd have lots of reports (of course, that could be something only happening in
very special cases, but the serialization code doesn't have tons of special
cases). But we definitely did change the file format between betas, and while
I don't remember exactly, we might have done that between RCs too. So, that's
the only idea I have thus far.
> Clustering column value is zeroed out in some query results
> -----------------------------------------------------------
>
> Key: CASSANDRA-11473
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11473
> Project: Cassandra
> Issue Type: Bug
> Environment: debian jessie patch current with Cassandra 3.0.4
> Reporter: Jason Kania
> Assignee: Tyler Hobbs
>
> As per a discussion on the mailing list,
> http://www.mail-archive.com/[email protected]/msg46902.html, we are
> encountering inconsistent query results when the following query is run:
> {noformat}
> select "subscriberId","sensorUnitId","sensorId","time" from
> "sensorReadingIndex" where "subscriberId"='JASKAN' AND "sensorUnitId"=0 AND
> "sensorId"=0 ORDER BY "time" LIMIT 10;
> {noformat}
> Invalid Query Results
> {noformat}
> subscriberId sensorUnitId sensorId time
> JASKAN 0 0 2015-05-24 2:09
> JASKAN 0 0 1969-12-31 19:00
> JASKAN 0 0 2016-01-21 2:10
> JASKAN 0 0 2016-01-21 2:10
> JASKAN 0 0 2016-01-21 2:10
> JASKAN 0 0 2016-01-21 2:11
> JASKAN 0 0 2016-01-21 2:22
> JASKAN 0 0 2016-01-21 2:22
> JASKAN 0 0 2016-01-21 2:22
> JASKAN 0 0 2016-01-21 2:22
> {noformat}
> Valid Query Results
> {noformat}
> subscriberId sensorUnitId sensorId time
> JASKAN 0 0 2015-05-24 2:09
> JASKAN 0 0 2015-05-24 2:09
> JASKAN 0 0 2015-05-24 2:10
> JASKAN 0 0 2015-05-24 2:10
> JASKAN 0 0 2015-05-24 2:10
> JASKAN 0 0 2015-05-24 2:10
> JASKAN 0 0 2015-05-24 2:11
> JASKAN 0 0 2015-05-24 2:13
> JASKAN 0 0 2015-05-24 2:13
> JASKAN 0 0 2015-05-24 2:14
> {noformat}
> Running the following yields no rows, indicating that the 1969... timestamp is
> invalid.
> {noformat}
> select "subscriberId","sensorUnitId","sensorId","time" FROM
> "edgeTransitionIndex" where "subscriberId"='JASKAN' AND "sensorUnitId"=0 AND
> "sensorId"=0 and time='1969-12-31 19:00:00-0500';
> {noformat}
> The schema is as follows:
> {noformat}
> CREATE TABLE sensorReading."sensorReadingIndex" (
> "subscriberId" text,
> "sensorUnitId" int,
> "sensorId" int,
> time timestamp,
> "classId" int,
> correlation float,
> PRIMARY KEY (("subscriberId", "sensorUnitId", "sensorId"), time)
> ) WITH CLUSTERING ORDER BY (time ASC)
> AND bloom_filter_fp_chance = 0.01
> AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
> AND comment = ''
> AND compaction = {'class':
> 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy',
> 'max_threshold': '32', 'min_threshold': '4'}
> AND compression = {'chunk_length_in_kb': '64', 'class':
> 'org.apache.cassandra.io.compress.LZ4Compressor'}
> AND crc_check_chance = 1.0
> AND dclocal_read_repair_chance = 0.1
> AND default_time_to_live = 0
> AND gc_grace_seconds = 864000
> AND max_index_interval = 2048
> AND memtable_flush_period_in_ms = 0
> AND min_index_interval = 128
> AND read_repair_chance = 0.0
> AND speculative_retry = '99PERCENTILE';
> CREATE INDEX classSecondaryIndex ON sensorReading."sensorReadingIndex"
> ("classId");
> {noformat}
> We were asked to provide our sstables as well, but these are very large and
> would require some data obfuscation. We are able to run code or scripts
> against the data on our servers if that is an option.
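
For anyone reading the results above: the odd 1969-12-31 19:00 row is what a
timestamp of 0 (the Unix epoch) looks like when rendered at UTC-5, which is
consistent with the "zeroed out" description in the issue title. Below is a
minimal sketch of that arithmetic in Python, plus a small check for flagging
such rows. The helper names and the row layout (dicts with a "time" key) are
hypothetical, and the UTC-5 offset is an assumption taken from the '-0500' in
the query above; none of this is Cassandra or driver code.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the client renders timestamps at UTC-5, as the '-0500'
# offset in the reporter's query suggests.
UTC_MINUS_5 = timezone(timedelta(hours=-5))
UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def render_timestamp(millis_since_epoch):
    """Format a millisecond epoch timestamp as 'YYYY-MM-DD HH:MM' at UTC-5."""
    moment = UNIX_EPOCH + timedelta(milliseconds=millis_since_epoch)
    return moment.astimezone(UTC_MINUS_5).strftime("%Y-%m-%d %H:%M")


def zeroed_rows(rows):
    """Return rows whose 'time' value is exactly the Unix epoch.

    `rows` is a hypothetical list of dicts holding timezone-aware datetimes.
    """
    return [row for row in rows if row["time"] == UNIX_EPOCH]


# A zeroed clustering value renders like the suspicious row in the issue.
print(render_timestamp(0))
```

A check like `zeroed_rows` could be run over query results to count how many
rows come back with the epoch value, without having to share the sstables.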