[ 
https://issues.apache.org/jira/browse/CASSANDRA-6707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13902675#comment-13902675
 ] 

Piotr Kołaczkowski commented on CASSANDRA-6707:
-----------------------------------------------

I repeated the experiment with one change: this time I ran nodetool flush 
before upgrading the first node, but deliberately skipped it before upgrading 
the second node. Here are the results:

Before upgrade:
{noformat}
Connected to test at localhost:9160.
[cqlsh 3.1.8 | Cassandra 1.2.15.1-SNAPSHOT | CQL spec 3.0.0 | Thrift protocol 19.36.2]
Use HELP for help.
cqlsh> select * from "PortfolioDemo"."StockHist" where key='JZL';

 key | column1    | value
-----+------------+--------
 JZL | 2013-11-09 | 517.41
 JZL | 2013-11-10 | 621.67
 JZL | 2013-11-11 | 647.35
 JZL | 2013-11-12 | 189.38
 JZL | 2013-11-13 | 29.725
 JZL | 2013-11-14 | 385.86
 JZL | 2013-11-15 | 900.27
 JZL | 2013-11-16 |  210.4
 JZL | 2013-11-17 | 844.64
 JZL | 2013-11-18 | 619.82
 JZL | 2013-11-19 | 956.49
 JZL | 2013-11-20 | 928.05
 JZL | 2013-11-21 | 542.06
 JZL | 2013-11-22 | 437.06
 JZL | 2013-11-23 | 806.88
 JZL | 2013-11-24 | 179.27
 JZL | 2013-11-25 | 207.45
 JZL | 2013-11-26 |  848.3
 JZL | 2013-11-27 | 715.32
 JZL | 2013-11-28 | 445.85
 JZL | 2013-11-29 | 821.07
 JZL | 2013-11-30 |  873.4
 JZL | 2013-12-01 | 625.07
 JZL | 2013-12-02 | 21.017
 JZL | 2013-12-03 | 881.37
 JZL | 2013-12-04 | 443.81
 JZL | 2013-12-05 |  432.7
 JZL | 2013-12-06 | 850.86
 JZL | 2013-12-07 | 38.699
 JZL | 2013-12-08 | 612.97
 JZL | 2013-12-09 | 158.81
 JZL | 2013-12-10 | 378.39
 JZL | 2013-12-11 | 245.21
 JZL | 2013-12-12 | 428.54
 JZL | 2013-12-13 | 664.41
 JZL | 2013-12-14 | 784.94
 JZL | 2013-12-15 | 820.02
 JZL | 2013-12-16 | 859.82
 JZL | 2013-12-17 |  258.5
 JZL | 2013-12-18 | 731.21
 JZL | 2013-12-19 | 384.75
 JZL | 2013-12-20 | 696.25
 JZL | 2013-12-21 |  936.1
 JZL | 2013-12-22 | 781.04
 JZL | 2013-12-23 | 113.63
 JZL | 2013-12-24 | 254.12
 JZL | 2013-12-25 | 120.91
 JZL | 2013-12-26 | 65.565
 JZL | 2013-12-27 |  378.6
 JZL | 2013-12-28 | 712.02
 JZL | 2013-12-29 | 953.41
 JZL | 2013-12-30 | 788.21
 JZL | 2013-12-31 | 236.73
 JZL | 2014-01-01 | 727.81
 JZL | 2014-01-02 | 128.59
 JZL | 2014-01-03 | 290.18
 JZL | 2014-01-04 | 930.29
 JZL | 2014-01-05 | 160.98
 JZL | 2014-01-06 | 992.55
 JZL | 2014-01-07 | 92.251
 JZL | 2014-01-08 | 456.14
 JZL | 2014-01-09 | 969.27
 JZL | 2014-01-10 | 769.52
 JZL | 2014-01-11 | 864.01
 JZL | 2014-01-12 | 516.35
 JZL | 2014-01-13 | 547.88
 JZL | 2014-01-14 | 128.87
 JZL | 2014-01-15 | 847.73
 JZL | 2014-01-16 | 232.34
 JZL | 2014-01-17 | 491.26
 JZL | 2014-01-18 | 196.56
 JZL | 2014-01-19 |  57.35
 JZL | 2014-01-20 | 978.43
 JZL | 2014-01-21 | 588.59
 JZL | 2014-01-22 | 377.69
 JZL | 2014-01-23 | 772.32
 JZL | 2014-01-24 | 377.71
 JZL | 2014-01-25 | 121.46
 JZL | 2014-01-26 | 202.91
 JZL | 2014-01-27 | 679.37
 JZL | 2014-01-28 | 558.55
 JZL | 2014-01-29 | 493.89
 JZL | 2014-01-30 | 759.51
 JZL | 2014-01-31 | 331.46
 JZL | 2014-02-01 | 291.12
 JZL | 2014-02-02 | 533.44
 JZL | 2014-02-03 | 950.21
 JZL | 2014-02-04 | 920.72
 JZL | 2014-02-05 | 843.61
 JZL | 2014-02-06 | 447.53
 JZL | 2014-02-07 | 797.89
 JZL | 2014-02-08 | 419.86
 JZL | 2014-02-09 | 640.36
 JZL | 2014-02-10 | 123.98
 JZL | 2014-02-11 | 339.73
 JZL | 2014-02-12 | 833.88
 JZL | 2014-02-13 | 699.87
 JZL | 2014-02-14 |  705.4
 JZL | 2014-02-15 | 655.25
 JZL | 2014-02-16 | 950.93

cqlsh> select * from "PortfolioDemo"."StockHist" where key='JZL' and column1='2013-11-09';

 key | column1    | value
-----+------------+--------
 JZL | 2013-11-09 | 517.41

cqlsh> exit;
automaton@ubuntu:~$ cqlsh
Connected to test at localhost:9160.
[cqlsh 3.1.8 | Cassandra 1.2.15.1-SNAPSHOT | CQL spec 3.0.0 | Thrift protocol 19.36.2]
Use HELP for help.
cqlsh> select count(*) from "PortfolioDemo"."Stocks";

 count
-------
  2767

cqlsh> exit;
{noformat}

After nodetool flush on the first node and upgrade of the first node:
{noformat}
Connected to test at localhost:9160.
[cqlsh 4.1.0 | Cassandra 2.0.5-CASSANDRA-6707-SNAPSHOT | CQL spec 3.1.1 | Thrift protocol 19.39.0]
Use HELP for help.
cqlsh> select count(*) from "PortfolioDemo"."Stocks";

 count
-------
  2767

(1 rows)

cqlsh> select * from "PortfolioDemo"."StockHist" where key='JZL' and column1='2013-11-09';

 key | column1    | value
-----+------------+--------
 JZL | 2013-11-09 | 517.41

(1 rows)

cqlsh> exit;
{noformat}

After upgrading the second node (*no nodetool flush* - deliberately):
{noformat}
automaton@ubuntu:~$ cqlsh
Connected to test at localhost:9160.
[cqlsh 4.1.0 | Cassandra 2.0.5-CASSANDRA-6707-SNAPSHOT | CQL spec 3.1.1 | Thrift protocol 19.39.0]
Use HELP for help.
cqlsh> select count(*) from "PortfolioDemo"."Stocks";

 count
-------
  1853

(1 rows)

cqlsh> select * from "PortfolioDemo"."StockHist" where key='JZL' and column1='2013-11-09';

(0 rows)
{noformat}

There were *no errors or warnings* reported in system.log when the new 
nodes were starting up.
It looks like we've got a serious problem with the commit log: data that was 
not flushed before the upgrade appears to be lost. [~thobbs] loaded more data 
than I did, so most of it had probably already been flushed to disk. That 
would explain his results pretty well (the count being only slightly below 
the correct value).
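
For anyone repeating the experiment, the before/after comparison can be automated. The following is only a minimal sketch, assuming the cqlsh output of select count(*) has been captured as text; the helper names parse_count and rows_lost are hypothetical and not part of Cassandra or cqlsh. The sample strings are the counts from the transcripts above.

```python
import re


def parse_count(cqlsh_output: str) -> int:
    """Extract the number printed under the 'count' header by cqlsh.

    Hypothetical helper: it only understands the simple table layout
    shown in the transcripts above.
    """
    match = re.search(r"count\s*\n\s*-+\s*\n\s*(\d+)", cqlsh_output)
    if match is None:
        raise ValueError("no count found in cqlsh output")
    return int(match.group(1))


def rows_lost(before: str, after: str) -> int:
    """Return how many rows disappeared between two captured outputs."""
    return parse_count(before) - parse_count(after)


# Captured output before the upgrade (cqlsh 3.1.8).
BEFORE = """
 count
-------
  2767
"""

# Captured output after upgrading the second node without a flush (cqlsh 4.1.0).
AFTER = """
 count
-------
  1853

(1 rows)
"""

if __name__ == "__main__":
    print("rows lost:", rows_lost(BEFORE, AFTER))
```

A non-zero result after an upgrade step would point at the same data loss as the manual comparison.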

> AIOOBE when doing select count(*) from on a mixed cluster.
> ----------------------------------------------------------
>
>                 Key: CASSANDRA-6707
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6707
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>         Environment: old nodes: Cassandra 1.2.16 from DSE 3.2.5  (unreleased)
> new node: Cassandra  2.0.5 from DSE 4.0.0 (unreleased)
>            Reporter: Piotr Kołaczkowski
>            Assignee: Tyler Hobbs
>         Attachments: 6707.patch
>
>
> After upgrading one node from 1.2 to 2.0, the following query fails with a 
> timeout:
> {noformat}
> Connected to test at localhost:9160.
> [cqlsh 4.1.0 | Cassandra 2.0.5.1-SNAPSHOT | CQL spec 3.1.1 | Thrift protocol 19.39.0]
> Use HELP for help.
> cqlsh> select count(*) from cfs.sblocks;
> Request did not complete within rpc_timeout.
> {noformat}
> Table definition:
> {noformat}
> cqlsh> describe columnfamily cfs.sblocks;
> CREATE TABLE sblocks (
>   key blob,
>   column1 blob,
>   value blob,
>   PRIMARY KEY (key, column1)
> ) WITH COMPACT STORAGE AND
>   bloom_filter_fp_chance=0.000068 AND
>   caching='KEYS_ONLY' AND
>   comment='Stores blocks of information associated with a inode' AND
>   dclocal_read_repair_chance=0.000000 AND
>   gc_grace_seconds=864000 AND
>   index_interval=128 AND
>   read_repair_chance=0.100000 AND
>   replicate_on_write='true' AND
>   populate_io_cache_on_flush='true' AND
>   default_time_to_live=0 AND
>   speculative_retry='99.0PERCENTILE' AND
>   memtable_flush_period_in_ms=0 AND
>   compaction={'class': 'com.datastax.bdp.hadoop.cfs.compaction.CFSCompactionStrategy'} AND
>   compression={};
> {noformat}
> The 1.2 node reports the following error:
> {noformat}
> ERROR 08:38:02,006 Exception in thread Thread[Thread-32,5,main]
> java.lang.ArrayIndexOutOfBoundsException: 36
>       at org.apache.cassandra.net.MessageIn.read(MessageIn.java:59)
>       at org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:208)
>       at org.apache.cassandra.net.IncomingTcpConnection.handleModernVersion(IncomingTcpConnection.java:140)
>       at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:83)
> {noformat}
> There were no errors during the upgrade.


