Re: repair cause large number of SSTABLEs
The ArrayIndexOutOfBounds in the ReadStage looks like it can happen if a key is not of the expected type. Could the comparator for the CF have changed?

The error in the RequestResponseStage may be the race condition identified here: https://issues.apache.org/jira/browse/CASSANDRA-1959

Aaron

On 27 Jan 2011, at 19:22, B. Todd Burruss wrote:

> i ran out of file handles on the repairing node after doing nodetool repair - strange as i have never had this issue until using 0.7.0 (but i should say that i have not truly tested 0.7.0 until now.)
>
> up'ed the number of file handles, removed data, restarted nodes, then restarted my test. waited a little while. i have two keyspaces on the cluster, so i checked the number of SSTABLES in one of them before nodetool repair and i see 36 data.db files, spread over 11 column families. very reasonable.
>
> after running nodetool repair i have over 900 data.db files, immediately! now after waiting several hours i have over 1500 data.db files. out of these i have 95 compacted files. lsof reports 803 files in use by cassandra for the Queues keyspace ...
>
> [cassandra@kv-app02 ~]$ /usr/sbin/lsof -p 32645 | grep Data.db | grep -c Queues
> 803
>
> this doesn't sound right to me.
> checking the server log i see a lot of these messages:
>
> ERROR [RequestResponseStage:14] 2011-01-26 17:00:29,493 DebuggableThreadPoolExecutor.java (line 103) Error in ThreadPoolExecutor
> java.lang.ArrayIndexOutOfBoundsException: -1
>     at java.util.ArrayList.fastRemove(ArrayList.java:441)
>     at java.util.ArrayList.remove(ArrayList.java:424)
>     at com.google.common.collect.AbstractMultimap.remove(AbstractMultimap.java:219)
>     at com.google.common.collect.ArrayListMultimap.remove(ArrayListMultimap.java:60)
>     at org.apache.cassandra.net.MessagingService.responseReceivedFrom(MessagingService.java:436)
>     at org.apache.cassandra.net.ResponseVerbHandler.doVerb(ResponseVerbHandler.java:40)
>     at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:63)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>     at java.lang.Thread.run(Thread.java:619)
>
> and a lot of these:
>
> ERROR [ReadStage:809] 2011-01-26 21:48:01,047 DebuggableThreadPoolExecutor.java (line 103) Error in ThreadPoolExecutor
> java.lang.ArrayIndexOutOfBoundsException
> ERROR [ReadStage:809] 2011-01-26 21:48:01,047 AbstractCassandraDaemon.java (line 91) Fatal exception in thread Thread[ReadStage:809,5,main]
> java.lang.ArrayIndexOutOfBoundsException
>
> and some more like this:
>
> ERROR [ReadStage:15] 2011-01-26 20:59:14,695 DebuggableThreadPoolExecutor.java (line 103) Error in ThreadPoolExecutor
> java.lang.ArrayIndexOutOfBoundsException: 6
>     at org.apache.cassandra.db.marshal.TimeUUIDType.compareTimestampBytes(TimeUUIDType.java:56)
>     at org.apache.cassandra.db.marshal.TimeUUIDType.compare(TimeUUIDType.java:45)
>     at org.apache.cassandra.db.marshal.TimeUUIDType.compare(TimeUUIDType.java:29)
>     at org.apache.cassandra.db.filter.QueryFilter$1.compare(QueryFilter.java:98)
>     at org.apache.cassandra.db.filter.QueryFilter$1.compare(QueryFilter.java:95)
>     at org.apache.commons.collections.iterators.CollatingIterator.least(CollatingIterator.java:334)
>     at org.apache.commons.collections.iterators.CollatingIterator.next(CollatingIterator.java:230)
>     at org.apache.cassandra.utils.ReducingIterator.computeNext(ReducingIterator.java:68)
>     at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136)
>     at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131)
>     at org.apache.cassandra.db.filter.SliceQueryFilter.collectReducedColumns(SliceQueryFilter.java:118)
>     at org.apache.cassandra.db.filter.QueryFilter.collectCollatedColumns(QueryFilter.java:142)
>     at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1230)
>     at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1107)
>     at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1077)
>     at org.apache.cassandra.db.Table.getRow(Table.java:384)
>     at org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:63)
>     at org.apache.cassandra.db.ReadVerbHandler.doVerb(ReadVerbHandler.java:68)
>     at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:63)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>     at java.lang.Thread.run(Thread.java:619)
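[Editor's note] The per-keyspace file counting in the report above (lsof/ls piped through grep -c) can be sketched against a scratch directory; the file names below are made up for illustration (real SSTable components include the column family name and a generation number), so this is a sketch of the counting technique only, not of a real data directory:

```shell
# Sketch of the Data.db counting used in this thread, run against a
# scratch directory instead of a live node's data directory.
DATA_DIR=$(mktemp -d)
touch "$DATA_DIR/Queues-e-1-Data.db" \
      "$DATA_DIR/Queues-e-2-Data.db" \
      "$DATA_DIR/Queues-tmp-3-Data.db"

total=$(ls "$DATA_DIR" | grep -c 'Data\.db$')   # all SSTable data components
tmp=$(ls "$DATA_DIR" | grep -c -- '-tmp-')      # in-flight / failed streams
live=$((total - tmp))
echo "live=$live tmp=$tmp total=$total"         # -> live=2 tmp=1 total=3

rm -rf "$DATA_DIR"
```

The same `grep -c` pipeline works against `lsof -p <pid>` output to count open file handles per keyspace, which is how the 803 figure above was obtained.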
Re: repair cause large number of SSTABLEs
The comparator has not changed.

Sent from my Android phone using TouchDown (www.nitrodesk.com)

-----Original Message-----
From: aaron morton [aa...@thelastpickle.com]
Received: Thursday, 27 Jan 2011, 1:10am
To: user@cassandra.apache.org
Subject: Re: repair cause large number of SSTABLEs
Re: repair cause large number of SSTABLEs
Maybe related to https://issues.apache.org/jira/browse/CASSANDRA-1992?

On Thu, Jan 27, 2011, at 1:22 AM, B. Todd Burruss wrote:

> i ran out of file handles on the repairing node after doing nodetool repair - strange as i have never had this issue until using 0.7.0 (but i should say that i have not truly tested 0.7.0 until now.)
RE: repair cause large number of SSTABLEs
thx, but i didn't do anything like removing/adding nodes. just did a nodetool repair after running for an hour or so on a clean install

From: Matthew Conway [m...@backupify.com]
Sent: Thursday, January 27, 2011 8:17 AM
To: user@cassandra.apache.org
Subject: Re: repair cause large number of SSTABLEs

> Maybe related to https://issues.apache.org/jira/browse/CASSANDRA-1992?
Re: repair cause large number of SSTABLEs
On Thu, Jan 27, 2011 at 10:21 AM, Todd Burruss bburr...@real.com wrote:

> thx, but i didn't do anything like removing/adding nodes. just did a nodetool repair after running for an hour or so on a clean install

It affects anything that involves streaming.

-Brandon
Re: repair cause large number of SSTABLEs
ok thx. what about the repair creating hundreds of new sstables and lsof showing cassandra using currently over 800 Data.db files? is this normal?

On 01/27/2011 08:40 AM, Brandon Williams wrote:

> On Thu, Jan 27, 2011 at 10:21 AM, Todd Burruss bburr...@real.com wrote:
>> thx, but i didn't do anything like removing/adding nodes. just did a nodetool repair after running for an hour or so on a clean install
>
> It affects anything that involves streaming.
>
> -Brandon
Re: repair cause large number of SSTABLEs
When the destination node fails to open the streamed SSTable, we assume it was corrupted during transfer, and retry the stream. Independent of the exception posted above, it is a problem that the failed transfers were not cleaned up. How many of the data files are marked as -tmp-?

On Jan 27, 2011 9:00 AM, B. Todd Burruss bburr...@real.com wrote:

> ok thx. what about the repair creating hundreds of new sstables and lsof showing cassandra using currently over 800 Data.db files? is this normal?
Re: repair cause large number of SSTABLEs
[cassandra@kv-app02 ~]$ ls -l /data/cassandra-data/data/Queues/*Data.db | grep -c -v \-tmp\-
824
[cassandra@kv-app02 ~]$ ls -l /data/cassandra-data/data/Queues/*-tmp-*Data.db | wc -l
829
[cassandra@kv-app02 ~]$ ls -l /data/cassandra-data/data/Queues/*Comp* | wc -l
247

On 01/27/2011 11:14 AM, Stu Hood wrote:

> When the destination node fails to open the streamed SSTable, we assume it was corrupted during transfer, and retry the stream. Independent of the exception posted above, it is a problem that the failed transfers were not cleaned up. How many of the data files are marked as -tmp-?
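[Editor's note] The 829 -tmp- files counted above are the leftovers of failed stream transfers. A common manual remedy in this era was to delete the -tmp- components by hand, and only while the node is stopped; whether that is appropriate for a given cluster is a judgment call, so the sketch below runs against a scratch directory (with made-up file names) rather than a real keyspace data directory:

```shell
# Hedged sketch: clear leftover -tmp- SSTable components from failed
# streams. Only safe with the node stopped; demonstrated on a scratch
# directory standing in for the real keyspace data directory.
KS_DIR=$(mktemp -d)
touch "$KS_DIR/Queues-e-1-Data.db" \
      "$KS_DIR/Queues-tmp-2-Data.db" \
      "$KS_DIR/Queues-tmp-2-Index.db"

find "$KS_DIR" -name '*-tmp-*' -delete   # drop every tmp component

remaining=$(ls "$KS_DIR" | grep -c 'Queues')
echo "remaining=$remaining"              # -> remaining=1

rm -rf "$KS_DIR"
```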