Re: Inconsistent query results and node state

2016-03-31 Thread Tyler Hobbs
On Thu, Mar 31, 2016 at 11:53 AM, Jason Kania  wrote:

>
> To me it just seems like the timestamp column value is sometimes not being
> set somewhere in the pipeline and the result is the epoch 0 value.
>

I agree, especially since you can't directly query this row and that
timestamp doesn't fit in the normal ordering.


>
> Thoughts on how to proceed?
>

Please open a ticket at https://issues.apache.org/jira/browse/CASSANDRA and
include your schema and queries.  If possible, it would also be extremely
helpful if you can upload the sstables for that table.


-- 
Tyler Hobbs
DataStax 


Re: Inconsistent query results and node state

2016-03-31 Thread Jason Kania
Thanks for responding. The problems that we are having are in Cassandra 3.03 
and 3.0.4. We had upgraded to see if the problem went away.

The values have been out of sync this way for some time and we cannot get a row 
with the 1969 timestamp in any query that directly queries on the timestamp. 
The 1969-12-31 19:00 value comes inconsistently in range queries but seems to 
be tied to the 192.168.10.9 node.
We tried the writetime function value in the query on time but it is not 
allowed as the time column is part of the primary key. Instead we used it on an 
additional field that is written at the same time (classId):
subscriberId  sensorUnitId  sensorId  time  
writetime(classId)   JASKAN 0  0  2015-05-24 02:09  
  1458178461272000
   JASKAN 0  0  1969-12-31 19:00    1458178801214000
   JASKAN 0  0  2016-01-21 02:10    1458178801221000
   JASKAN 0  0  2016-01-21 02:10    1458178801226000
   JASKAN 0  0  2016-01-21 02:10    1458178801231000
   JASKAN 0  0  2016-01-21 02:11    1458178801235000
   JASKAN 0  0  2016-01-21 02:22    1458178801241000
   JASKAN 0  0  2016-01-21 02:22    1458178801247000
   JASKAN 0  0  2016-01-21 02:22    1458178801252000
   JASKAN 0  0  2016-01-21 02:22    1458178801258000
Based on the other column values in the table row, we confirmed that the actual 
time in the row showing up with the 1969-12-31 19:00 timestamp is associated 
with the following timestamp.
 subscriberId  sensorUnitId  sensorId  time  
writetime(classId)
   JASKAN 0  0  2016-01-21 02:09    1458178801214000

The 2016-01-21 02:09 timestamp is always present on all nodes if queried 
directly based on using tracing.
To me it just seems like the timestamp column value is sometimes not being set 
somewhere in the pipeline and the result is the epoch 0 value.
Thoughts on how to proceed?
Thanks,

Jason

  From: Tyler Hobbs <ty...@datastax.com>
 To: user@cassandra.apache.org 
 Sent: Wednesday, March 30, 2016 11:31 AM
 Subject: Re: Inconsistent query results and node state
   

org.apache.cassandra.service.DigestMismatchException: Mismatch for key 
DecoratedKey(-4908797801227889951, 4a41534b414e) 
(6a6c8ab013d7757e702af50cbdae045c vs 2ece61a01b2a640ac10509f4c49ae6fb)

That key matches the row you mentioned, so it seems like all of the replicas 
should have converged on the same value for that row.  Do you consistently get 
the 1969-12-31 19:00 timestamp back now?  If not, try selecting both "time" and 
"writetime(time)}" from that row and see what write timestamps each of the 
values have.

The ArrayIndexOutOfBoundsException in response to nodetool compact looks like a 
bug.  What version of Cassandra are you running?
 
On Wed, Mar 30, 2016 at 9:59 AM, Kai Wang <dep...@gmail.com> wrote:

Do you have NTP setup on all nodes?

On Tue, Mar 29, 2016 at 11:48 PM, Jason Kania <jason.ka...@ymail.com> wrote:

We have encountered a query inconsistency problem wherein the following query 
returns different results sporadically with invalid values for a timestamp 
field looking like the field is uninitialized (a zero timestamp) in the query 
results.

Attempts to repair and compact have not changed the results.

select "subscriberId","sensorUnitId","sensorId","time" from 
"sensorReadingIndex" where "subscriberId"='JASKAN' AND "sensorUnitId"=0 AND 
"sensorId"=0 ORDER BY "time" LIMIT 10;

Invalid Query Results
subscriberId    sensorUnitId    sensorId    time
JASKAN    0    0    2015-05-24 2:09
JASKAN    0    0    1969-12-31 19:00
JASKAN    0    0    2016-01-21 2:10
JASKAN    0    0    2016-01-21 2:10
JASKAN    0    0    2016-01-21 2:10
JASKAN    0    0    2016-01-21 2:11
JASKAN    0    0    2016-01-21 2:22
JASKAN    0    0    2016-01-21 2:22
JASKAN    0    0    2016-01-21 2:22
JASKAN    0    0    2016-01-21 2:22

Valid Query Results
subscriberId    sensorUnitId    sensorId    time
JASKAN    0    0    2015-05-24 2:09
JASKAN    0    0    2015-05-24 2:09
JASKAN    0    0    2015-05-24 2:10
JASKAN    0    0    2015-05-24 2:10
JASKAN    0    0    2015-05-24 2:10
JASKAN    0    0    2015-05-24 2:10
JASKAN    0    0    2015-05-24 2:11
JASKAN    0    0    2015-05-24 2:13
JASKAN    0    0    2015-05-24 2:13
JASKAN    0    0    2015-05-24 2:14

We have confirmed that the 1969-12-31 timestamp is not within the data based on 
running and number of queries so it looks like the invalid timestamp value is 
generated by the query. The query below returns no row.

select * from "sensorReadingIndex" where "subscriberId"='JASKAN' AND 
"sensorUni

Re: Inconsistent query results and node state

2016-03-31 Thread Jason Kania
Thanks for the response.

All nodes are using NTP.
Thanks,
Jason

  From: Kai Wang <dep...@gmail.com>
 To: user@cassandra.apache.org; Jason Kania <jason.ka...@ymail.com> 
 Sent: Wednesday, March 30, 2016 10:59 AM
 Subject: Re: Inconsistent query results and node state
   
Do you have NTP setup on all nodes?

On Tue, Mar 29, 2016 at 11:48 PM, Jason Kania <jason.ka...@ymail.com> wrote:

We have encountered a query inconsistency problem wherein the following query 
returns different results sporadically with invalid values for a timestamp 
field looking like the field is uninitialized (a zero timestamp) in the query 
results.

Attempts to repair and compact have not changed the results.

select "subscriberId","sensorUnitId","sensorId","time" from 
"sensorReadingIndex" where "subscriberId"='JASKAN' AND "sensorUnitId"=0 AND 
"sensorId"=0 ORDER BY "time" LIMIT 10;

Invalid Query Results
subscriberId    sensorUnitId    sensorId    time
JASKAN    0    0    2015-05-24 2:09
JASKAN    0    0    1969-12-31 19:00
JASKAN    0    0    2016-01-21 2:10
JASKAN    0    0    2016-01-21 2:10
JASKAN    0    0    2016-01-21 2:10
JASKAN    0    0    2016-01-21 2:11
JASKAN    0    0    2016-01-21 2:22
JASKAN    0    0    2016-01-21 2:22
JASKAN    0    0    2016-01-21 2:22
JASKAN    0    0    2016-01-21 2:22

Valid Query Results
subscriberId    sensorUnitId    sensorId    time
JASKAN    0    0    2015-05-24 2:09
JASKAN    0    0    2015-05-24 2:09
JASKAN    0    0    2015-05-24 2:10
JASKAN    0    0    2015-05-24 2:10
JASKAN    0    0    2015-05-24 2:10
JASKAN    0    0    2015-05-24 2:10
JASKAN    0    0    2015-05-24 2:11
JASKAN    0    0    2015-05-24 2:13
JASKAN    0    0    2015-05-24 2:13
JASKAN    0    0    2015-05-24 2:14

We have confirmed that the 1969-12-31 timestamp is not within the data based on 
running and number of queries so it looks like the invalid timestamp value is 
generated by the query. The query below returns no row.

select * from "sensorReadingIndex" where "subscriberId"='JASKAN' AND 
"sensorUnitId"=0 AND "sensorId"=0 AND time='1969-12-31 19:00:00-0500';

No logs are coming out but the following was observed intermittently in the 
tracing output, but not correlated to the invalid query results:

 Digest mismatch: org.apache.cassandra.service.DigestMismatchException: 
Mismatch for key DecoratedKey(-7563144029910940626, 
00064a41534b414e040400) 
(be22d379c18f75c2f51dd6942d2f9356 vs da4e95d571b41303b908e0c5c3fff7ba) 
[ReadRepairStage:3179] | 2016-03-29 23:12:35.025000 | 192.168.10.10 |
An error from the debug log that might be related is:
org.apache.cassandra.service.DigestMismatchException: Mismatch for key 
DecoratedKey(-4908797801227889951, 4a41534b414e) 
(6a6c8ab013d7757e702af50cbdae045c vs 2ece61a01b2a640ac10509f4c49ae6fb)
    at 
org.apache.cassandra.service.DigestResolver.resolve(DigestResolver.java:85) 
~[apache-cassandra-3.0.3.jar:3.0.3]
    at 
org.apache.cassandra.service.ReadCallback$AsyncRepairRunner.run(ReadCallback.java:225)
 ~[apache-cassandra-3.0.3.jar:3.0.3]
    at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 
[na:1.8.0_74]
    at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 
[na:1.8.0_74]
    at java.lang.Thread.run(Thread.java:745) [na:1.8.0_74]

The tracing files are attached and seem to show that in the failed case, 
content is skipped because of tombstones if we understand it correctly. This 
could be an inconsistency problem on 192.168.10.9 Unfortunately, attempts to 
compact on 192.168.10.9 only give the following error without any stack trace 
detail and are not fixed with repair.

root@cutthroat:/usr/local/bin/analyzer/bin# nodetool compact
error: null
-- StackTrace --
java.lang.ArrayIndexOutOfBoundsException
Any suggestions on how to fix or what to search for would be much appreciated.
Thanks,
Jason







  

Re: Inconsistent query results and node state

2016-03-30 Thread Tyler Hobbs
>
> org.apache.cassandra.service.DigestMismatchException: Mismatch for key
> DecoratedKey(-4908797801227889951, 4a41534b414e)
> (6a6c8ab013d7757e702af50cbdae045c vs 2ece61a01b2a640ac10509f4c49ae6fb)


That key matches the row you mentioned, so it seems like all of the
replicas should have converged on the same value for that row.  Do you
consistently get the *1969-12-31 19:00 *timestamp back now?  If not, try
selecting both "time" and "writetime(time)}" from that row and see what
write timestamps each of the values have.

The ArrayIndexOutOfBoundsException in response to nodetool compact looks
like a bug.  What version of Cassandra are you running?

On Wed, Mar 30, 2016 at 9:59 AM, Kai Wang  wrote:

> Do you have NTP setup on all nodes?
>
> On Tue, Mar 29, 2016 at 11:48 PM, Jason Kania 
> wrote:
>
>> We have encountered a query inconsistency problem wherein the following
>> query returns different results sporadically with invalid values for a
>> timestamp field looking like the field is uninitialized (a zero timestamp)
>> in the query results.
>>
>> Attempts to repair and compact have not changed the results.
>>
>> select "subscriberId","sensorUnitId","sensorId","time" from
>> "sensorReadingIndex" where "subscriberId"='JASKAN' AND "sensorUnitId"=0 AND
>> "sensorId"=0 ORDER BY "time" LIMIT 10;
>>
>> Invalid Query Results
>> subscriberIdsensorUnitIdsensorIdtime
>> JASKAN002015-05-24 2:09
>> JASKAN00*1969-12-31 19:00*
>> JASKAN002016-01-21 2:10
>> JASKAN002016-01-21 2:10
>> JASKAN002016-01-21 2:10
>> JASKAN002016-01-21 2:11
>> JASKAN002016-01-21 2:22
>> JASKAN002016-01-21 2:22
>> JASKAN002016-01-21 2:22
>> JASKAN002016-01-21 2:22
>>
>> Valid Query Results
>> subscriberIdsensorUnitIdsensorIdtime
>> JASKAN002015-05-24 2:09
>> JASKAN002015-05-24 2:09
>> JASKAN002015-05-24 2:10
>> JASKAN002015-05-24 2:10
>> JASKAN002015-05-24 2:10
>> JASKAN002015-05-24 2:10
>> JASKAN002015-05-24 2:11
>> JASKAN002015-05-24 2:13
>> JASKAN002015-05-24 2:13
>> JASKAN002015-05-24 2:14
>>
>> We have confirmed that the 1969-12-31 timestamp is not within the data
>> based on running and number of queries so it looks like the invalid
>> timestamp value is generated by the query. The query below returns no row.
>>
>> select * from "sensorReadingIndex" where "subscriberId"='JASKAN' AND
>> "sensorUnitId"=0 AND "sensorId"=0 AND time='1969-12-31 19:00:00-0500';
>>
>> No logs are coming out but the following was observed intermittently in
>> the tracing output, but not correlated to the invalid query results:
>>
>>  Digest mismatch: org.apache.cassandra.service.DigestMismatchException:
>> Mismatch for key DecoratedKey(-7563144029910940626,
>> 00064a41534b414e040400)
>> (be22d379c18f75c2f51dd6942d2f9356 vs da4e95d571b41303b908e0c5c3fff7ba)
>> [ReadRepairStage:3179] | 2016-03-29 23:12:35.025000 | 192.168.10.10 |
>>
>> An error from the debug log that might be related is:
>>
>> org.apache.cassandra.service.DigestMismatchException: Mismatch for key
>> DecoratedKey(-4908797801227889951, 4a41534b414e)
>> (6a6c8ab013d7757e702af50cbdae045c vs 2ece61a01b2a640ac10509f4c49ae6fb)
>> at
>> org.apache.cassandra.service.DigestResolver.resolve(DigestResolver.java:85)
>> ~[apache-cassandra-3.0.3.jar:3.0.3]
>> at
>> org.apache.cassandra.service.ReadCallback$AsyncRepairRunner.run(ReadCallback.java:225)
>> ~[apache-cassandra-3.0.3.jar:3.0.3]
>> at
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>> [na:1.8.0_74]
>> at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>> [na:1.8.0_74]
>> at java.lang.Thread.run(Thread.java:745) [na:1.8.0_74]
>>
>> The tracing files are attached and seem to show that in the failed case,
>> content is skipped because of tombstones if we understand it correctly.
>> This could be an inconsistency problem on 192.168.10.9 Unfortunately,
>> attempts to compact on 192.168.10.9 only give the following error without
>> any stack trace detail and are not fixed with repair.
>>
>> root@cutthroat:/usr/local/bin/analyzer/bin# nodetool compact
>> error: null
>> -- StackTrace --
>> java.lang.ArrayIndexOutOfBoundsException
>>
>> Any suggestions on how to fix or what to search for would be much
>> appreciated.
>>
>> Thanks,
>>
>> Jason
>>
>>
>>
>>
>


-- 
Tyler Hobbs
DataStax 


Re: Inconsistent query results and node state

2016-03-30 Thread Kai Wang
Do you have NTP setup on all nodes?

On Tue, Mar 29, 2016 at 11:48 PM, Jason Kania  wrote:

> We have encountered a query inconsistency problem wherein the following
> query returns different results sporadically with invalid values for a
> timestamp field looking like the field is uninitialized (a zero timestamp)
> in the query results.
>
> Attempts to repair and compact have not changed the results.
>
> select "subscriberId","sensorUnitId","sensorId","time" from
> "sensorReadingIndex" where "subscriberId"='JASKAN' AND "sensorUnitId"=0 AND
> "sensorId"=0 ORDER BY "time" LIMIT 10;
>
> Invalid Query Results
> subscriberIdsensorUnitIdsensorIdtime
> JASKAN002015-05-24 2:09
> JASKAN00*1969-12-31 19:00*
> JASKAN002016-01-21 2:10
> JASKAN002016-01-21 2:10
> JASKAN002016-01-21 2:10
> JASKAN002016-01-21 2:11
> JASKAN002016-01-21 2:22
> JASKAN002016-01-21 2:22
> JASKAN002016-01-21 2:22
> JASKAN002016-01-21 2:22
>
> Valid Query Results
> subscriberIdsensorUnitIdsensorIdtime
> JASKAN002015-05-24 2:09
> JASKAN002015-05-24 2:09
> JASKAN002015-05-24 2:10
> JASKAN002015-05-24 2:10
> JASKAN002015-05-24 2:10
> JASKAN002015-05-24 2:10
> JASKAN002015-05-24 2:11
> JASKAN002015-05-24 2:13
> JASKAN002015-05-24 2:13
> JASKAN002015-05-24 2:14
>
> We have confirmed that the 1969-12-31 timestamp is not within the data
> based on running and number of queries so it looks like the invalid
> timestamp value is generated by the query. The query below returns no row.
>
> select * from "sensorReadingIndex" where "subscriberId"='JASKAN' AND
> "sensorUnitId"=0 AND "sensorId"=0 AND time='1969-12-31 19:00:00-0500';
>
> No logs are coming out but the following was observed intermittently in
> the tracing output, but not correlated to the invalid query results:
>
>  Digest mismatch: org.apache.cassandra.service.DigestMismatchException:
> Mismatch for key DecoratedKey(-7563144029910940626,
> 00064a41534b414e040400)
> (be22d379c18f75c2f51dd6942d2f9356 vs da4e95d571b41303b908e0c5c3fff7ba)
> [ReadRepairStage:3179] | 2016-03-29 23:12:35.025000 | 192.168.10.10 |
>
> An error from the debug log that might be related is:
>
> org.apache.cassandra.service.DigestMismatchException: Mismatch for key
> DecoratedKey(-4908797801227889951, 4a41534b414e)
> (6a6c8ab013d7757e702af50cbdae045c vs 2ece61a01b2a640ac10509f4c49ae6fb)
> at
> org.apache.cassandra.service.DigestResolver.resolve(DigestResolver.java:85)
> ~[apache-cassandra-3.0.3.jar:3.0.3]
> at
> org.apache.cassandra.service.ReadCallback$AsyncRepairRunner.run(ReadCallback.java:225)
> ~[apache-cassandra-3.0.3.jar:3.0.3]
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> [na:1.8.0_74]
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> [na:1.8.0_74]
> at java.lang.Thread.run(Thread.java:745) [na:1.8.0_74]
>
> The tracing files are attached and seem to show that in the failed case,
> content is skipped because of tombstones if we understand it correctly.
> This could be an inconsistency problem on 192.168.10.9 Unfortunately,
> attempts to compact on 192.168.10.9 only give the following error without
> any stack trace detail and are not fixed with repair.
>
> root@cutthroat:/usr/local/bin/analyzer/bin# nodetool compact
> error: null
> -- StackTrace --
> java.lang.ArrayIndexOutOfBoundsException
>
> Any suggestions on how to fix or what to search for would be much
> appreciated.
>
> Thanks,
>
> Jason
>
>
>
>


Inconsistent query results and node state

2016-03-29 Thread Jason Kania
We have encountered a query inconsistency problem wherein the following query 
returns different results sporadically with invalid values for a timestamp 
field looking like the field is uninitialized (a zero timestamp) in the query 
results.

Attempts to repair and compact have not changed the results.

select "subscriberId","sensorUnitId","sensorId","time" from 
"sensorReadingIndex" where "subscriberId"='JASKAN' AND "sensorUnitId"=0 AND 
"sensorId"=0 ORDER BY "time" LIMIT 10;

Invalid Query Results
subscriberId    sensorUnitId    sensorId    time
JASKAN    0    0    2015-05-24 2:09
JASKAN    0    0    1969-12-31 19:00
JASKAN    0    0    2016-01-21 2:10
JASKAN    0    0    2016-01-21 2:10
JASKAN    0    0    2016-01-21 2:10
JASKAN    0    0    2016-01-21 2:11
JASKAN    0    0    2016-01-21 2:22
JASKAN    0    0    2016-01-21 2:22
JASKAN    0    0    2016-01-21 2:22
JASKAN    0    0    2016-01-21 2:22

Valid Query Results
subscriberId    sensorUnitId    sensorId    time
JASKAN    0    0    2015-05-24 2:09
JASKAN    0    0    2015-05-24 2:09
JASKAN    0    0    2015-05-24 2:10
JASKAN    0    0    2015-05-24 2:10
JASKAN    0    0    2015-05-24 2:10
JASKAN    0    0    2015-05-24 2:10
JASKAN    0    0    2015-05-24 2:11
JASKAN    0    0    2015-05-24 2:13
JASKAN    0    0    2015-05-24 2:13
JASKAN    0    0    2015-05-24 2:14

We have confirmed that the 1969-12-31 timestamp is not within the data based on 
running and number of queries so it looks like the invalid timestamp value is 
generated by the query. The query below returns no row.

select * from "sensorReadingIndex" where "subscriberId"='JASKAN' AND 
"sensorUnitId"=0 AND "sensorId"=0 AND time='1969-12-31 19:00:00-0500';

No logs are coming out but the following was observed intermittently in the 
tracing output, but not correlated to the invalid query results:

 Digest mismatch: org.apache.cassandra.service.DigestMismatchException: 
Mismatch for key DecoratedKey(-7563144029910940626, 
00064a41534b414e040400) 
(be22d379c18f75c2f51dd6942d2f9356 vs da4e95d571b41303b908e0c5c3fff7ba) 
[ReadRepairStage:3179] | 2016-03-29 23:12:35.025000 | 192.168.10.10 |
An error from the debug log that might be related is:
org.apache.cassandra.service.DigestMismatchException: Mismatch for key 
DecoratedKey(-4908797801227889951, 4a41534b414e) 
(6a6c8ab013d7757e702af50cbdae045c vs 2ece61a01b2a640ac10509f4c49ae6fb)
    at 
org.apache.cassandra.service.DigestResolver.resolve(DigestResolver.java:85) 
~[apache-cassandra-3.0.3.jar:3.0.3]
    at 
org.apache.cassandra.service.ReadCallback$AsyncRepairRunner.run(ReadCallback.java:225)
 ~[apache-cassandra-3.0.3.jar:3.0.3]
    at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 
[na:1.8.0_74]
    at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 
[na:1.8.0_74]
    at java.lang.Thread.run(Thread.java:745) [na:1.8.0_74]

The tracing files are attached and seem to show that in the failed case, 
content is skipped because of tombstones if we understand it correctly. This 
could be an inconsistency problem on 192.168.10.9 Unfortunately, attempts to 
compact on 192.168.10.9 only give the following error without any stack trace 
detail and are not fixed with repair.

root@cutthroat:/usr/local/bin/analyzer/bin# nodetool compact
error: null
-- StackTrace --
java.lang.ArrayIndexOutOfBoundsException
Any suggestions on how to fix or what to search for would be much appreciated.
Thanks,
Jason



Tracing session: 09e26410-f626-11e5-8b85-9b8e819c8182

 activity   

 | timestamp  | 
source| source_elapsed
-++---+


  Execute CQL3 query | 2016-03-29 23:18:13.969000 | 
192.168.10.10 |  0
 Parsing select "subscriberId","sensorUnitId","sensorId","time" from 
"sensorReadingIndex" where "subscriberId"='JASKAN' AND "sensorUnitId"=0 AND 
"sensorId"=0 ORDER BY "time" LIMIT 10; [SharedPool-Worker-2] | 2016-03-29 
23:18:13.97 | 192.168.10.10 |181

READ message received from 
/192.168.10.10 [MessagingService-Incoming-/192.168.10.10] | 2016-03-29 
23:18:13.97 |  192.168.10.9 | 20