[
https://issues.apache.org/jira/browse/CASSANDRA-11525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15233209#comment-15233209
]
Jordan West commented on CASSANDRA-11525:
-----------------------------------------
[~doanduyhai] we have tracked down the root cause of the bug and it has
affected all versions of SASI since its original inclusion in Cassandra. The
issue is that when positions in the -Index.db file are > Integer.MAX_VALUE the
positions are factored into a 32-bit and 16-bit value. The 16-bit value was
being read as a signed short and for certain positions this resulted in
reconstructing an incorrect 64-bit offset from the 32-bit and 16-bit parts.
Thankfully, this is a quick, one-line fix (reading the short as unsigned), and
is entirely independent of the changes in CASSANDRA-11383 or this ticket. We
will include the fix for this with the merge of the changes in this ticket. We
are working on final verification using your SSTables before we merge.
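
The failure mode described above can be sketched in a few lines of Java. This is only an illustration under stated assumptions, not the SASI code itself: the class name, the helper names (`pack`, `unpackSigned`, `unpackUnsigned`) and the exact layout (upper 32 bits in an int, lower 16 bits in a short) are all hypothetical. The sample offset is chosen for illustration; under this assumed layout, the signed read yields 2147457128, which is the first indexPosition reported in the ticket's log output.

```java
import java.nio.ByteBuffer;

public class UnsignedShortOffset {
    // Assumed (hypothetical) layout: upper 32 bits as an int, lower 16 bits as a short.
    static ByteBuffer pack(long offset) {
        ByteBuffer buf = ByteBuffer.allocate(Integer.BYTES + Short.BYTES);
        buf.putInt((int) (offset >>> Short.SIZE)); // upper 32 bits of a 48-bit offset
        buf.putShort((short) offset);              // lower 16 bits
        buf.flip();
        return buf;
    }

    // Buggy variant: the short is sign-extended, so any low part >= 0x8000
    // becomes negative and the result is 65536 too small.
    static long unpackSigned(ByteBuffer buf) {
        long high = buf.getInt(0);
        short low = buf.getShort(Integer.BYTES);
        return (high << Short.SIZE) + low;
    }

    // Fixed variant: mask the short so it is treated as an unsigned 16-bit value.
    static long unpackUnsigned(ByteBuffer buf) {
        long high = buf.getInt(0);
        int low = buf.getShort(Integer.BYTES) & 0xFFFF;
        return (high << Short.SIZE) + low;
    }

    public static void main(String[] args) {
        long offset = 2147522664L; // illustrative value just above Integer.MAX_VALUE
        ByteBuffer buf = pack(offset);
        System.out.println("signed read:   " + unpackSigned(buf));
        System.out.println("unsigned read: " + unpackUnsigned(buf));
    }
}
```

Only offsets whose low 16 bits are 0x8000 or above are affected in this sketch, which would be consistent with the bug showing up for "certain positions" rather than for every large offset.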
> StaticTokenTreeBuilder should respect possibility of duplicate tokens
> ---------------------------------------------------------------------
>
> Key: CASSANDRA-11525
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11525
> Project: Cassandra
> Issue Type: Bug
> Components: sasi
> Environment: Cassandra 3.5-SNAPSHOT
> Reporter: DOAN DuyHai
> Assignee: Jordan West
> Fix For: 3.5
>
>
> Bug reproduced in *Cassandra 3.5-SNAPSHOT* (after the fix of OOM)
> {noformat}
> create table if not exists test.resource_bench (
> dsr_id uuid,
> rel_seq bigint,
> seq bigint,
> dsp_code varchar,
> model_code varchar,
> media_code varchar,
> transfer_code varchar,
> commercial_offer_code varchar,
> territory_code varchar,
> period_end_month_int int,
> authorized_societies_txt text,
> rel_type text,
> status text,
> dsp_release_code text,
> title text,
> contributors_name list<text>,
> unic_work text,
> paying_net_qty bigint,
> PRIMARY KEY ((dsr_id, rel_seq), seq)
> ) WITH CLUSTERING ORDER BY (seq ASC);
> CREATE CUSTOM INDEX resource_period_end_month_int_idx ON test.resource_bench
> (period_end_month_int) USING 'org.apache.cassandra.index.sasi.SASIIndex' WITH
> OPTIONS = {'mode': 'PREFIX'};
> {noformat}
> So the index is a {{DENSE}} numerical index.
> When running the request {{SELECT dsp_code, unic_work, paying_net_qty FROM
> test.resource_bench WHERE period_end_month_int = 201401}} with server-side
> paging, I hit this stack trace:
> {noformat}
> WARN [SharedPool-Worker-1] 2016-04-06 00:00:30,825
> AbstractLocalAwareExecutorService.java:169 - Uncaught exception on thread
> Thread[SharedPool-Worker-1,5,main]: {}
> java.lang.ArrayIndexOutOfBoundsException: -55
> at
> org.apache.cassandra.db.ClusteringPrefix$Serializer.deserialize(ClusteringPrefix.java:268)
> ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
> at
> org.apache.cassandra.db.Serializers$2.deserialize(Serializers.java:128)
> ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
> at
> org.apache.cassandra.db.Serializers$2.deserialize(Serializers.java:120)
> ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
> at
> org.apache.cassandra.io.sstable.IndexHelper$IndexInfo$Serializer.deserialize(IndexHelper.java:148)
> ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
> at
> org.apache.cassandra.db.RowIndexEntry$Serializer.deserialize(RowIndexEntry.java:218)
> ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
> at
> org.apache.cassandra.io.sstable.format.SSTableReader.keyAt(SSTableReader.java:1823)
> ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
> at
> org.apache.cassandra.index.sasi.SSTableIndex$DecoratedKeyFetcher.apply(SSTableIndex.java:168)
> ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
> at
> org.apache.cassandra.index.sasi.SSTableIndex$DecoratedKeyFetcher.apply(SSTableIndex.java:155)
> ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
> at
> org.apache.cassandra.index.sasi.disk.TokenTree$KeyIterator.computeNext(TokenTree.java:518)
> ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
> at
> org.apache.cassandra.index.sasi.disk.TokenTree$KeyIterator.computeNext(TokenTree.java:504)
> ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
> at
> org.apache.cassandra.index.sasi.utils.AbstractIterator.tryToComputeNext(AbstractIterator.java:116)
> ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
> at
> org.apache.cassandra.index.sasi.utils.AbstractIterator.hasNext(AbstractIterator.java:110)
> ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
> at
> org.apache.cassandra.utils.MergeIterator$Candidate.advance(MergeIterator.java:374)
> ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
> at
> org.apache.cassandra.utils.MergeIterator$ManyToOne.advance(MergeIterator.java:186)
> ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
> at
> org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:155)
> ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
> at
> org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47)
> ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
> at
> org.apache.cassandra.index.sasi.plan.QueryPlan$ResultIterator.computeNext(QueryPlan.java:106)
> ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
> at
> org.apache.cassandra.index.sasi.plan.QueryPlan$ResultIterator.computeNext(QueryPlan.java:71)
> ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
> at
> org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47)
> ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
> at
> org.apache.cassandra.db.transform.BasePartitions.hasNext(BasePartitions.java:72)
> ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
> at
> org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$Serializer.serialize(UnfilteredPartitionIterators.java:289)
> ~[apache-cassandra-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
> {noformat}
> There are 2 possible root causes:
> 1. Index corrupted
> 2. Raw SSTable is corrupted
> To rule out *scenario 1*, I dropped and rebuilt the index *many times*, but
> the exception was still there, so I modified the method
> {{SSTableReader.keyAt(long indexPosition)}} to log the impacted partition:
> {noformat}
> try
> {
>     if (isKeyCacheSetup())
>         cacheKey(key, rowIndexEntrySerializer.deserialize(in));
> } catch (IndexOutOfBoundsException ex)
> {
>     logger.error(String.format(
>         "Error when reading index entry for token '%s' at indexPosition %s ",
>         key.getToken().getTokenValue(), indexPosition));
> }
> {noformat}
> Below is the log output after the code modification:
> {noformat}
> system_ns3038406.ip-5-39-72.eu.log:ERROR [SharedPool-Worker-1] 2016-04-07
> 17:08:28,843 SSTableReader.java:1830 - Error when reading index entry for
> token '-7005474773654630139' at indexPosition 2147457128
> system_ns3038406.ip-5-39-72.eu.log:ERROR [SharedPool-Worker-1] 2016-04-07
> 17:08:28,917 SSTableReader.java:1830 - Error when reading index entry for
> token '-5016711186446865616' at indexPosition 2147458268
> system_ns3038406.ip-5-39-72.eu.log:ERROR [SharedPool-Worker-1] 2016-04-07
> 17:08:28,918 SSTableReader.java:1830 - Error when reading index entry for
> token '1027994831942941747' at indexPosition 2147459218
> {noformat}
> I double-checked the original C* data using {{cqlsh}}, but there seems to be
> no data for those tokens:
> {noformat}
> SELECT dsr_id,rel_seq FROM resource_bench WHERE
> token(dsr_id,rel_seq)=-7005474773654630139;
> dsr_id | rel_seq
> --------+---------
> (0 rows)
> SELECT dsr_id,rel_seq FROM resource_bench WHERE
> token(dsr_id,rel_seq)=-5016711186446865616;
> dsr_id | rel_seq
> --------+---------
> (0 rows)
> SELECT dsr_id,rel_seq FROM resource_bench WHERE
> token(dsr_id,rel_seq)=1027994831942941747;
> dsr_id | rel_seq
> --------+---------
> (0 rows)
> {noformat}
> /cc [~xedin] [~beobal]