Re: Possibly cassandra 3.0.9 bug?

2017-11-15 Thread Pavel Drankov
Hi Alex,

I don't see any attached image. Could you please send it again?

Best wishes,
Pavel

On 16 November 2017 at 01:04, Alex Circus wrote:



Re: Flakey Dtests

2017-11-15 Thread Michael Kjellman
yes, true, some are flaky, but almost all of the ones i filed fail 100% (💯) of
the time. i look forward to triaging just the remaining flaky ones (hopefully,
with our powers combined, by the end of this month!!)

appreciate everyone’s help, no matter how small... i already personally did a
few “fun” random-python-class-is-missing-return-after-method fixes.

we’ve wanted this for a while and now is our time to actually execute and make 
good on our previous dev list promises. 

best,
kjellman

> On Nov 15, 2017, at 5:45 PM, Jeff Jirsa  wrote:
> 
> In lieu of a weekly wrap-up, here's a pre-Thanksgiving call for help.
> 
> If you haven't been paying attention to JIRA, you likely didn't notice that
> Josh went through and triage/categorized a bunch of issues by adding
> components, and Michael took the time to open a bunch of JIRAs for failing
> tests.
> 
> How many is a bunch? Something like 35 or so just for tests currently
> failing on trunk.  If you're a regular contributor, you already know that
> dtests are flakey - it'd be great if a few of us can go through and fix a
> few. Even incremental improvements are improvements. Here's an easy search
> to find them:
> 
> https://issues.apache.org/jira/secure/IssueNavigator.jspa?reset=true&jqlQuery=project+%3D+CASSANDRA+AND+component+%3D+Testing+ORDER+BY+updated+DESC%2C+priority+DESC%2C+created+ASC&mode=hide
> 
> If you're a new contributor, fixing tests is often a good way to learn a
> new part of the codebase. Many of these are dtests, which live in a
> different repo ( https://github.com/apache/cassandra-dtest ) and are in
> python, but have no fear, the repo has instructions for setting up and
> running dtests(
> https://github.com/apache/cassandra-dtest/blob/master/INSTALL.md )
> 
> Normal contribution workflow applies: self-assign the ticket if you want to
> work on it, click on 'start progress' to indicate that you're working on
> it, mark it 'patch available' when you've uploaded code to be reviewed (in
> a github branch, or as a standalone patch file attached to the JIRA). If
> you have questions, feel free to email the dev list (that's what it's here
> for).
> 
> Many thanks will be given,
> - Jeff


Flakey Dtests

2017-11-15 Thread Jeff Jirsa
In lieu of a weekly wrap-up, here's a pre-Thanksgiving call for help.

If you haven't been paying attention to JIRA, you likely didn't notice that
Josh went through and triaged/categorized a bunch of issues by adding
components, and Michael took the time to open a bunch of JIRAs for failing
tests.

How many is a bunch? Roughly 35, just for tests currently failing on trunk.
If you're a regular contributor, you already know that dtests are flaky; it'd
be great if a few of us could go through and fix some of them. Even
incremental improvements are improvements. Here's an easy search to find them:

https://issues.apache.org/jira/secure/IssueNavigator.jspa?reset=true&jqlQuery=project+%3D+CASSANDRA+AND+component+%3D+Testing+ORDER+BY+updated+DESC%2C+priority+DESC%2C+created+ASC&mode=hide
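For anyone who wants to adapt the search above, the `jqlQuery` parameter is just URL-encoded JQL. A small sketch using only Python's standard library (the helper names `jql_from_url` and `search_url` are hypothetical) that decodes it and re-encodes a modified query:

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# The JIRA search link from the message above.
URL = ("https://issues.apache.org/jira/secure/IssueNavigator.jspa?reset=true"
       "&jqlQuery=project+%3D+CASSANDRA+AND+component+%3D+Testing"
       "+ORDER+BY+updated+DESC%2C+priority+DESC%2C+created+ASC&mode=hide")


def jql_from_url(url: str) -> str:
    """Return the decoded jqlQuery parameter of a JIRA search URL."""
    params = parse_qs(urlsplit(url).query)
    return params["jqlQuery"][0]


def search_url(jql: str) -> str:
    """Build a search URL (same endpoint and parameters) for a JQL string."""
    query = urlencode({"reset": "true", "jqlQuery": jql, "mode": "hide"})
    return "https://issues.apache.org/jira/secure/IssueNavigator.jspa?" + query


# prints: project = CASSANDRA AND component = Testing ORDER BY updated DESC, priority DESC, created ASC
print(jql_from_url(URL))
```

Editing the decoded string (for example the ORDER BY clause) and passing it to `search_url` gives a link in the same form as the original.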

If you're a new contributor, fixing tests is often a good way to learn a
new part of the codebase. Many of these are dtests, which live in a
separate repo (https://github.com/apache/cassandra-dtest) and are written in
Python, but have no fear: the repo has instructions for setting up and
running dtests (https://github.com/apache/cassandra-dtest/blob/master/INSTALL.md).

Normal contribution workflow applies: self-assign the ticket if you want to
work on it, click on 'start progress' to indicate that you're working on
it, mark it 'patch available' when you've uploaded code to be reviewed (in
a GitHub branch, or as a standalone patch file attached to the JIRA). If
you have questions, feel free to email the dev list (that's what it's here
for).

Many thanks will be given,
- Jeff


Possibly cassandra 3.0.9 bug?

2017-11-15 Thread Alex Circus
Hi,

*In short:*
I use Cassandra 3.0.9 in a cluster of 6 nodes.
1. I create a keyspace called test:
CREATE KEYSPACE test WITH replication = {'class': 'SimpleStrategy',
'replication_factor': '3'} AND durable_writes = true;
2. I create a table called test:

CREATE TABLE test.test (
  test_id bigint,
  test_value text,
  PRIMARY KEY (test_id)
);

3. I insert test_id=23 with test_value set to a very large string/HTML (about
406,088 UTF-8 characters).

4. I query for test_id=35 and get a timeout (even with cqlsh
--request-timeout=3600).

5. If I run the same steps on an existing Cassandra 2.0 cluster, the SELECT
returns instantly. The Java heap size is 8 GB, and in JMX I see at most 4 GB
of these 8 GB used in the new cluster.
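Step 3 gives the size of the test value in characters, but `text` columns are stored and sent as UTF-8, so the byte size depends on the content: each character takes 1 to 4 bytes. A sketch with hypothetical stand-in strings (the real HTML is not available) shows the range for 406,088 characters:

```python
CHARS = 406_088  # length of the test value reported in step 3


def utf8_size(s: str) -> int:
    """Number of bytes the string occupies when encoded as UTF-8."""
    return len(s.encode("utf-8"))


# Hypothetical stand-ins: the real HTML value is not available.
ascii_html = ("<p>x</p>" * (CHARS // 8 + 1))[:CHARS]  # 1 byte per character
greek_text = ("\u03b1\u03b2\u03b3\u03b4" * (CHARS // 4 + 1))[:CHARS]  # 2 bytes each
emoji_text = "\U0001F4AF" * CHARS  # 4 bytes per character

for name, value in [("ascii", ascii_html), ("greek", greek_text), ("emoji", emoji_text)]:
    print(f"{name:6}{len(value):>8} chars -> {utf8_size(value):>9} bytes")
```

So the value is somewhere between roughly 0.4 MB and 1.6 MB; running `utf8_size` on the real value would show where it falls.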


*Detailed:*

The above was just a test. The real scenario is:

I migrated some tables from an old Cassandra 2.0 cluster with 9 nodes to a
new 6-node cluster running Cassandra 3.0.9, and there were a lot of
problems.

I have a table like this:

CREATE TABLE table (
  id text,
  ts text,
  score decimal,
  type text,
  values text,
  PRIMARY KEY (id, ts)
) WITH CLUSTERING ORDER BY (ts DESC)

and the following query (which returns instantly):

SELECT * FROM keyspace.table WHERE id='someId' AND ts IN
('2017-10-15','2017-10-16','2017-10-17','2017-10-18','2017-10-19','2017-10-20','2017-10-21','2017-10-22','2017-10-23','2017-10-24','2017-10-25','2017-10-26','2017-10-27','2017-10-28','2017-10-29','2017-10-30','2017-10-31','2017-11-01','2017-11-02','2017-11-03','2017-11-04','2017-11-05','2017-11-06');

*If I add one more day to the IN clause, the response never arrives (not even
after 10 minutes!):*

SELECT * FROM keyspace.table WHERE id='someId' AND ts IN
('2017-10-15','2017-10-16','2017-10-17','2017-10-18','2017-10-19','2017-10-20','2017-10-21','2017-10-22','2017-10-23','2017-10-24','2017-10-25','2017-10-26','2017-10-27','2017-10-28','2017-10-29','2017-10-30','2017-10-31','2017-11-01','2017-11-02','2017-11-03','2017-11-04','2017-11-05','2017-11-06','2017-11-07');
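The two queries differ only in the number of `ts` values. A sketch (the helper names are hypothetical; `keyspace.table` and `someId` are the placeholders from the message) that generates the IN list for any number of days and splits a long list into several smaller queries. If, say, two 12-value queries return quickly while the single 24-value query hangs, that would be consistent with the size of one response being the trigger rather than one particular row, though it would not prove it:

```python
from datetime import date, timedelta


def day_strings(start: date, n: int) -> list[str]:
    """n consecutive ISO-8601 dates starting at `start` (the ts values used above)."""
    return [(start + timedelta(days=i)).isoformat() for i in range(n)]


def in_query(partition_id: str, days: list[str]) -> str:
    """Render the SELECT from the message for a given list of ts values.

    String formatting is only for illustration; real client code should use
    a prepared statement with bound values instead.
    """
    values = ",".join(f"'{d}'" for d in days)
    return (f"SELECT * FROM keyspace.table WHERE id='{partition_id}' "
            f"AND ts IN ({values});")


def chunked_queries(partition_id: str, days: list[str], size: int) -> list[str]:
    """Split one large IN query into several queries of at most `size` values."""
    return [in_query(partition_id, days[i:i + size]) for i in range(0, len(days), size)]


START = date(2017, 10, 15)
works = day_strings(START, 23)  # '2017-10-15' .. '2017-11-06'
hangs = day_strings(START, 24)  # same list plus '2017-11-07'

for q in chunked_queries("someId", hangs, 12):
    print(q)
```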

*The 'values' column may hold large JSON data.*

I managed to trace one of the timeouts by looking into the system_traces
keyspace. Please look at the attached image: the last step took
10 minutes!

I think there is a size limit somewhere: with 23 values in the IN clause the
query works (in under 1 second), but with even one more it fails. The rows
are all the same size (the same JSON size in each). On node2 of the 6 nodes
it works with 24 values; on node1 and node3 it does not. I haven't checked
the other nodes yet.
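On the node-dependent behavior: with SimpleStrategy and replication_factor 3, each partition is stored on only three of the six nodes, namely the node owning the partition's token plus the next two distinct nodes clockwise on the ring. So for `id='someId'` some nodes are replicas and the others only coordinate, which may be worth correlating with the node1/node2/node3 results. A toy sketch of that placement rule; the tokens below are hypothetical (real Murmur3 tokens are 64-bit, and with vnodes each node owns many), and `nodetool getendpoints <keyspace> <table> <key>` reports the actual replicas for a key:

```python
from bisect import bisect_left


def simple_strategy_replicas(ring: dict[int, str], token: int, rf: int) -> list[str]:
    """Toy model of SimpleStrategy placement.

    ring maps each token to the node that owns it. The first replica is the
    node with the smallest token >= `token` (wrapping around past the highest
    token); further replicas are the next distinct nodes clockwise.
    """
    tokens = sorted(ring)
    wanted = min(rf, len(set(ring.values())))  # cannot exceed the number of nodes
    i = bisect_left(tokens, token)
    replicas: list[str] = []
    while len(replicas) < wanted:
        node = ring[tokens[i % len(tokens)]]
        if node not in replicas:  # with vnodes a node can own several tokens
            replicas.append(node)
        i += 1
    return replicas


# Hypothetical six-node ring with one token per node (not real Murmur3 tokens).
RING = {-90: "node1", -60: "node2", -30: "node3", 0: "node4", 30: "node5", 60: "node6"}
print(simple_strategy_replicas(RING, -45, 3))  # ['node3', 'node4', 'node5']
```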

I saw no conclusive logs except this one from Cassandra's debug.log (at the
moment of the timeout or very close to it):

*DEBUG [Thrift:2608] 2017-11-15 13:48:05,611 ReadCallback.java:126 - Timed out; received 0 of 1 responses*
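"received 0 of 1 responses" means the coordinator was waiting for one replica response, consistent with a read at consistency level ONE (the cqlsh default), and none arrived in time. To check the remaining nodes for the same message, a sketch that scans log lines for it; the regular expression is modeled on the single line quoted above, so it is an assumption that the other nodes log in exactly this layout:

```python
import re

# Modeled on the one debug.log line quoted in the message (assumed layout):
# "DEBUG [Thrift:2608] 2017-11-15 13:48:05,611 ReadCallback.java:126 - Timed out; received 0 of 1 responses"
TIMEOUT_RE = re.compile(
    r"^(?P<level>\w+)\s+\[(?P<thread>[^\]]+)\]\s+"
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+"
    r"ReadCallback\.java:\d+ - Timed out; "
    r"received (?P<received>\d+) of (?P<blockfor>\d+) responses"
)


def read_timeouts(lines):
    """Yield (timestamp, thread, received, required) for each ReadCallback timeout line."""
    for line in lines:
        m = TIMEOUT_RE.search(line)
        if m:
            yield m["ts"], m["thread"], int(m["received"]), int(m["blockfor"])
```

A file object yields lines, so each node's debug.log can be opened and passed straight to `read_timeouts` (package installs usually log under /var/log/cassandra/, but check the logging configuration).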

I think this problem has the same root cause as the one in the test (large
HTML text) and is related to a memory limit somewhere in the code.


Thank you,

Alex.
[image: screenshot.png]