Re: repair cause large number of SSTABLEs

2011-01-27 Thread aaron morton
The ArrayIndexOutOfBoundsException in the ReadStage looks like it can happen if a key is not of the expected type. Could the comparator for the CF have changed?
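
You can double-check what the CF is currently using via cassandra-cli; just a sketch (the keyspace name and the exact output layout here are assumptions):

$ cassandra-cli -host localhost -port 9160
[default@unknown] describe keyspace Queues;
Keyspace: Queues
  ...
    Column Family Name: ...
    Columns Sorted By: org.apache.cassandra.db.marshal.TimeUUIDType
  ...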

The error in the RequestResponseStage may be the race condition identified here 
 https://issues.apache.org/jira/browse/CASSANDRA-1959

Aaron


On 27 Jan 2011, at 19:22, B. Todd Burruss wrote:

 i ran out of file handles on the repairing node after doing nodetool repair - strange as i have never had this issue until using 0.7.0 (but i should say that i have not truly tested 0.7.0 until now.)  upped the number of file handles, removed data, restarted nodes, then restarted my test.  waited a little while.  i have two keyspaces on the cluster, so i checked the number of SSTables in one of them before nodetool repair and i see 36 Data.db files, spread over 11 column families.  very reasonable.
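
 a per-column-family breakdown is easy to pull straight from the data directory; just a sketch, assuming the same data path as below and relying on CF names never containing '-' (which cassandra enforces):

 [cassandra@kv-app02 ~]$ ls /data/cassandra-data/data/Queues/*Data.db | sed 's!.*/!!; s/-.*//' | sort | uniq -c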
 
 after running nodetool repair i have over 900 Data.db files, immediately!  now after waiting several hours i have over 1500 Data.db files.  out of these i have 95 compacted files.
 
 lsof reports 803 files in use by cassandra for the Queues keyspace ...
 
 [cassandra@kv-app02 ~]$ /usr/sbin/lsof -p 32645|grep Data.db|grep -c Queues
 803
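
 to see which column families are holding those descriptors, a sketch (assumes 32645 is still the cassandra pid, and that the last lsof column is the file path):

 [cassandra@kv-app02 ~]$ /usr/sbin/lsof -p 32645 | grep Queues | grep Data.db | awk '{print $NF}' | sed 's!.*/!!; s/-.*//' | sort | uniq -c | sort -rn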
 
 ... this doesn't sound right to me.  checking the server log i see a lot of 
 these messages:
 
 ERROR [RequestResponseStage:14] 2011-01-26 17:00:29,493 DebuggableThreadPoolExecutor.java (line 103) Error in ThreadPoolExecutor
 java.lang.ArrayIndexOutOfBoundsException: -1
        at java.util.ArrayList.fastRemove(ArrayList.java:441)
        at java.util.ArrayList.remove(ArrayList.java:424)
        at com.google.common.collect.AbstractMultimap.remove(AbstractMultimap.java:219)
        at com.google.common.collect.ArrayListMultimap.remove(ArrayListMultimap.java:60)
        at org.apache.cassandra.net.MessagingService.responseReceivedFrom(MessagingService.java:436)
        at org.apache.cassandra.net.ResponseVerbHandler.doVerb(ResponseVerbHandler.java:40)
        at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:63)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)
 
 
 and a lot of these:
 
 ERROR [ReadStage:809] 2011-01-26 21:48:01,047 DebuggableThreadPoolExecutor.java (line 103) Error in ThreadPoolExecutor
 java.lang.ArrayIndexOutOfBoundsException
 ERROR [ReadStage:809] 2011-01-26 21:48:01,047 AbstractCassandraDaemon.java (line 91) Fatal exception in thread Thread[ReadStage:809,5,main]
 java.lang.ArrayIndexOutOfBoundsException
 
 and some more like this:
 ERROR [ReadStage:15] 2011-01-26 20:59:14,695 DebuggableThreadPoolExecutor.java (line 103) Error in ThreadPoolExecutor
 java.lang.ArrayIndexOutOfBoundsException: 6
        at org.apache.cassandra.db.marshal.TimeUUIDType.compareTimestampBytes(TimeUUIDType.java:56)
        at org.apache.cassandra.db.marshal.TimeUUIDType.compare(TimeUUIDType.java:45)
        at org.apache.cassandra.db.marshal.TimeUUIDType.compare(TimeUUIDType.java:29)
        at org.apache.cassandra.db.filter.QueryFilter$1.compare(QueryFilter.java:98)
        at org.apache.cassandra.db.filter.QueryFilter$1.compare(QueryFilter.java:95)
        at org.apache.commons.collections.iterators.CollatingIterator.least(CollatingIterator.java:334)
        at org.apache.commons.collections.iterators.CollatingIterator.next(CollatingIterator.java:230)
        at org.apache.cassandra.utils.ReducingIterator.computeNext(ReducingIterator.java:68)
        at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136)
        at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131)
        at org.apache.cassandra.db.filter.SliceQueryFilter.collectReducedColumns(SliceQueryFilter.java:118)
        at org.apache.cassandra.db.filter.QueryFilter.collectCollatedColumns(QueryFilter.java:142)
        at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1230)
        at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1107)
        at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1077)
        at org.apache.cassandra.db.Table.getRow(Table.java:384)
        at org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:63)
        at org.apache.cassandra.db.ReadVerbHandler.doVerb(ReadVerbHandler.java:68)
        at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:63)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)
 



Re: repair cause large number of SSTABLEs

2011-01-27 Thread Todd Burruss
The comparator has not changed.

Sent from my Android phone using TouchDown (www.nitrodesk.com)

-----Original Message-----
From: aaron morton [aa...@thelastpickle.com]
Received: Thursday, 27 Jan 2011, 1:10am
To: user@cassandra.apache.org [user@cassandra.apache.org]
Subject: Re: repair cause large number of SSTABLEs

The ArrayIndexOutOfBoundsException in the ReadStage looks like it can happen if a key is not of the expected type. Could the comparator for the CF have changed?

The error in the RequestResponseStage may be the race condition identified here 
 https://issues.apache.org/jira/browse/CASSANDRA-1959

Aaron



Re: repair cause large number of SSTABLEs

2011-01-27 Thread Matthew Conway
Maybe related to https://issues.apache.org/jira/browse/CASSANDRA-1992?




RE: repair cause large number of SSTABLEs

2011-01-27 Thread Todd Burruss
thx, but i didn't do anything like removing/adding nodes.  just did a nodetool 
repair after running for an hour or so on a clean install


From: Matthew Conway [m...@backupify.com]
Sent: Thursday, January 27, 2011 8:17 AM
To: user@cassandra.apache.org
Subject: Re: repair cause large number of SSTABLEs

Maybe related to https://issues.apache.org/jira/browse/CASSANDRA-1992 ?




Re: repair cause large number of SSTABLEs

2011-01-27 Thread Brandon Williams
On Thu, Jan 27, 2011 at 10:21 AM, Todd Burruss bburr...@real.com wrote:

 thx, but i didn't do anything like removing/adding nodes.  just did a
 nodetool repair after running for an hour or so on a clean install


It affects anything that involves streaming.

-Brandon


Re: repair cause large number of SSTABLEs

2011-01-27 Thread B. Todd Burruss
ok thx.  what about the repair creating hundreds of new sstables, and lsof showing cassandra currently using over 800 Data.db files?  is this normal?


On 01/27/2011 08:40 AM, Brandon Williams wrote:
On Thu, Jan 27, 2011 at 10:21 AM, Todd Burruss bburr...@real.com wrote:

 thx, but i didn't do anything like removing/adding nodes.  just did a nodetool repair after running for an hour or so on a clean install

It affects anything that involves streaming.

-Brandon


Re: repair cause large number of SSTABLEs

2011-01-27 Thread Stu Hood
When the destination node fails to open the streamed SSTable, we assume it
was corrupted during transfer, and retry the stream. Independent of the
exception posted above, it is a problem that the failed transfers were not
cleaned up.

How many of the data files are marked as -tmp-?
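
In the meantime, streaming activity can be watched from nodetool while the repair runs; a sketch, assuming 0.7's nodetool and a local node:

$ nodetool -h localhost streams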
On Jan 27, 2011 9:00 AM, B. Todd Burruss bburr...@real.com wrote:
 ok thx.  what about the repair creating hundreds of new sstables, and lsof showing cassandra currently using over 800 Data.db files?  is this normal?



Re: repair cause large number of SSTABLEs

2011-01-27 Thread B. Todd Burruss
[cassandra@kv-app02 ~]$ ls -l /data/cassandra-data/data/Queues/*Data.db | grep -c -v \-tmp\-

824

[cassandra@kv-app02 ~]$ ls -l /data/cassandra-data/data/Queues/*-tmp-*Data.db | wc -l

829

[cassandra@kv-app02 ~]$ ls -l /data/cassandra-data/data/Queues/*Comp* | wc -l

247
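
fwiw, once the node is stopped, the -tmp- leftovers from the failed streams can be removed by hand; just a sketch against the same directory (stop cassandra first - it should also clear tmp files on its own at startup):

[cassandra@kv-app02 ~]$ rm /data/cassandra-data/data/Queues/*-tmp-*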


On 01/27/2011 11:14 AM, Stu Hood wrote:
 When the destination node fails to open the streamed SSTable, we assume it was corrupted during transfer, and retry the stream. Independent of the exception posted above, it is a problem that the failed transfers were not cleaned up.

 How many of the data files are marked as -tmp-?