[ https://issues.apache.org/jira/browse/CASSANDRA-6707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13902675#comment-13902675 ]
Piotr Kołaczkowski commented on CASSANDRA-6707:
-----------------------------------------------
I repeated the experiment with one minor change. This time I ran nodetool flush
before upgrading the first node, but deliberately did not run it before upgrading
the second node. Here are the results:
Before the upgrade:
{noformat}
Connected to test at localhost:9160.
[cqlsh 3.1.8 | Cassandra 1.2.15.1-SNAPSHOT | CQL spec 3.0.0 | Thrift protocol 19.36.2]
Use HELP for help.
cqlsh> select * from "PortfolioDemo"."StockHist" where key='JZL';
key | column1 | value
-----+------------+--------
JZL | 2013-11-09 | 517.41
JZL | 2013-11-10 | 621.67
JZL | 2013-11-11 | 647.35
JZL | 2013-11-12 | 189.38
JZL | 2013-11-13 | 29.725
JZL | 2013-11-14 | 385.86
JZL | 2013-11-15 | 900.27
JZL | 2013-11-16 | 210.4
JZL | 2013-11-17 | 844.64
JZL | 2013-11-18 | 619.82
JZL | 2013-11-19 | 956.49
JZL | 2013-11-20 | 928.05
JZL | 2013-11-21 | 542.06
JZL | 2013-11-22 | 437.06
JZL | 2013-11-23 | 806.88
JZL | 2013-11-24 | 179.27
JZL | 2013-11-25 | 207.45
JZL | 2013-11-26 | 848.3
JZL | 2013-11-27 | 715.32
JZL | 2013-11-28 | 445.85
JZL | 2013-11-29 | 821.07
JZL | 2013-11-30 | 873.4
JZL | 2013-12-01 | 625.07
JZL | 2013-12-02 | 21.017
JZL | 2013-12-03 | 881.37
JZL | 2013-12-04 | 443.81
JZL | 2013-12-05 | 432.7
JZL | 2013-12-06 | 850.86
JZL | 2013-12-07 | 38.699
JZL | 2013-12-08 | 612.97
JZL | 2013-12-09 | 158.81
JZL | 2013-12-10 | 378.39
JZL | 2013-12-11 | 245.21
JZL | 2013-12-12 | 428.54
JZL | 2013-12-13 | 664.41
JZL | 2013-12-14 | 784.94
JZL | 2013-12-15 | 820.02
JZL | 2013-12-16 | 859.82
JZL | 2013-12-17 | 258.5
JZL | 2013-12-18 | 731.21
JZL | 2013-12-19 | 384.75
JZL | 2013-12-20 | 696.25
JZL | 2013-12-21 | 936.1
JZL | 2013-12-22 | 781.04
JZL | 2013-12-23 | 113.63
JZL | 2013-12-24 | 254.12
JZL | 2013-12-25 | 120.91
JZL | 2013-12-26 | 65.565
JZL | 2013-12-27 | 378.6
JZL | 2013-12-28 | 712.02
JZL | 2013-12-29 | 953.41
JZL | 2013-12-30 | 788.21
JZL | 2013-12-31 | 236.73
JZL | 2014-01-01 | 727.81
JZL | 2014-01-02 | 128.59
JZL | 2014-01-03 | 290.18
JZL | 2014-01-04 | 930.29
JZL | 2014-01-05 | 160.98
JZL | 2014-01-06 | 992.55
JZL | 2014-01-07 | 92.251
JZL | 2014-01-08 | 456.14
JZL | 2014-01-09 | 969.27
JZL | 2014-01-10 | 769.52
JZL | 2014-01-11 | 864.01
JZL | 2014-01-12 | 516.35
JZL | 2014-01-13 | 547.88
JZL | 2014-01-14 | 128.87
JZL | 2014-01-15 | 847.73
JZL | 2014-01-16 | 232.34
JZL | 2014-01-17 | 491.26
JZL | 2014-01-18 | 196.56
JZL | 2014-01-19 | 57.35
JZL | 2014-01-20 | 978.43
JZL | 2014-01-21 | 588.59
JZL | 2014-01-22 | 377.69
JZL | 2014-01-23 | 772.32
JZL | 2014-01-24 | 377.71
JZL | 2014-01-25 | 121.46
JZL | 2014-01-26 | 202.91
JZL | 2014-01-27 | 679.37
JZL | 2014-01-28 | 558.55
JZL | 2014-01-29 | 493.89
JZL | 2014-01-30 | 759.51
JZL | 2014-01-31 | 331.46
JZL | 2014-02-01 | 291.12
JZL | 2014-02-02 | 533.44
JZL | 2014-02-03 | 950.21
JZL | 2014-02-04 | 920.72
JZL | 2014-02-05 | 843.61
JZL | 2014-02-06 | 447.53
JZL | 2014-02-07 | 797.89
JZL | 2014-02-08 | 419.86
JZL | 2014-02-09 | 640.36
JZL | 2014-02-10 | 123.98
JZL | 2014-02-11 | 339.73
JZL | 2014-02-12 | 833.88
JZL | 2014-02-13 | 699.87
JZL | 2014-02-14 | 705.4
JZL | 2014-02-15 | 655.25
JZL | 2014-02-16 | 950.93
cqlsh> select * from "PortfolioDemo"."StockHist" where key='JZL' and column1='2013-11-09';
key | column1 | value
-----+------------+--------
JZL | 2013-11-09 | 517.41
cqlsh> exit;
automaton@ubuntu:~$ cqlsh
Connected to test at localhost:9160.
[cqlsh 3.1.8 | Cassandra 1.2.15.1-SNAPSHOT | CQL spec 3.0.0 | Thrift protocol 19.36.2]
Use HELP for help.
cqlsh> select count(*) from "PortfolioDemo"."Stocks";
count
-------
2767
cqlsh> exit;
{noformat}
After running nodetool flush on the first node and then upgrading it:
{noformat}
Connected to test at localhost:9160.
[cqlsh 4.1.0 | Cassandra 2.0.5-CASSANDRA-6707-SNAPSHOT | CQL spec 3.1.1 | Thrift protocol 19.39.0]
Use HELP for help.
cqlsh> select count(*) from "PortfolioDemo"."Stocks";
count
-------
2767
(1 rows)
cqlsh> select * from "PortfolioDemo"."StockHist" where key='JZL' and column1='2013-11-09';
key | column1 | value
-----+------------+--------
JZL | 2013-11-09 | 517.41
(1 rows)
cqlsh> exit;
{noformat}
After upgrading the second node (deliberately with *no nodetool flush*):
{noformat}
automaton@ubuntu:~$ cqlsh
Connected to test at localhost:9160.
[cqlsh 4.1.0 | Cassandra 2.0.5-CASSANDRA-6707-SNAPSHOT | CQL spec 3.1.1 | Thrift protocol 19.39.0]
Use HELP for help.
cqlsh> select count(*) from "PortfolioDemo"."Stocks";
count
-------
1853
(1 rows)
cqlsh> select * from "PortfolioDemo"."StockHist" where key='JZL' and column1='2013-11-09';
(0 rows)
{noformat}
There were *no errors or warnings* in system.log while the upgraded nodes were
starting up.
It looks like we've got a serious problem with the commit log: apparently the data
that had been flushed before the upgrade survived, while the unflushed data was lost.
[~thobbs] loaded more data than I did, so his nodes probably managed to flush most
of it before the upgrade. That would explain his results quite well (a count only
slightly below the correct value).
> AIOOBE when doing select count(*) from on a mixed cluster.
> ----------------------------------------------------------
>
> Key: CASSANDRA-6707
> URL: https://issues.apache.org/jira/browse/CASSANDRA-6707
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Environment: old nodes: Cassandra 1.2.16 from DSE 3.2.5 (unreleased)
> new node: Cassandra 2.0.5 from DSE 4.0.0 (unreleased)
> Reporter: Piotr Kołaczkowski
> Assignee: Tyler Hobbs
> Attachments: 6707.patch
>
>
> After upgrading one node from 1.2 to 2.0, the following query fails with a
> timeout:
> {noformat}
> Connected to test at localhost:9160.
> [cqlsh 4.1.0 | Cassandra 2.0.5.1-SNAPSHOT | CQL spec 3.1.1 | Thrift protocol 19.39.0]
> Use HELP for help.
> cqlsh> select count(*) from cfs.sblocks;
> Request did not complete within rpc_timeout.
> {noformat}
> Table definition:
> {noformat}
> cqlsh> describe columnfamily cfs.sblocks;
> CREATE TABLE sblocks (
>   key blob,
>   column1 blob,
>   value blob,
>   PRIMARY KEY (key, column1)
> ) WITH COMPACT STORAGE AND
>   bloom_filter_fp_chance=0.000068 AND
>   caching='KEYS_ONLY' AND
>   comment='Stores blocks of information associated with a inode' AND
>   dclocal_read_repair_chance=0.000000 AND
>   gc_grace_seconds=864000 AND
>   index_interval=128 AND
>   read_repair_chance=0.100000 AND
>   replicate_on_write='true' AND
>   populate_io_cache_on_flush='true' AND
>   default_time_to_live=0 AND
>   speculative_retry='99.0PERCENTILE' AND
>   memtable_flush_period_in_ms=0 AND
>   compaction={'class': 'com.datastax.bdp.hadoop.cfs.compaction.CFSCompactionStrategy'} AND
>   compression={};
> {noformat}
> The 1.2 node reports the following error:
> {noformat}
> ERROR 08:38:02,006 Exception in thread Thread[Thread-32,5,main]
> java.lang.ArrayIndexOutOfBoundsException: 36
>     at org.apache.cassandra.net.MessageIn.read(MessageIn.java:59)
>     at org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:208)
>     at org.apache.cassandra.net.IncomingTcpConnection.handleModernVersion(IncomingTcpConnection.java:140)
>     at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:83)
> {noformat}
> There were no errors during the upgrade.