The compaction code removes tombstones, and it runs whenever you have
enough sstable fragments.

I think I know what is happening -- as an optimization, if there is
only one version of a row, compaction will just copy it to the new
sstable. This means it won't clean out that row's tombstones.
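
Roughly the kind of short-circuit I mean, as a made-up Java-ish sketch
(the names are invented for illustration; this is not the actual
compaction code):

    // hypothetical fast path: only one sstable contains this row
    if (versionsOfRow.size() == 1)
    {
        // the row's bytes are copied straight across, so any tombstones
        // inside it are never examined or purged
        writer.append(versionsOfRow.get(0));
    }
    else
    {
        // merge path: versions are reconciled and tombstones older than
        // GCGraceSeconds are dropped
        writer.append(mergeAndGc(versionsOfRow, gcBefore));
    }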

Can you file a bug at https://issues.apache.org/jira/browse/CASSANDRA ?

-Jonathan

On Wed, Oct 21, 2009 at 2:01 PM, Ramzi Rabah <[email protected]> wrote:
> Hi Jonathan, I am still running into the timeout issue even after
> reducing the GCGraceSeconds to 1 hour (we have tons of deletes
> happening in our app). Which part of Cassandra is responsible for
> deleting the tombstone records, and how often does it run?
>
>
> On Tue, Oct 20, 2009 at 12:02 PM, Ramzi Rabah <[email protected]> wrote:
>> Thank you so much Jonathan.
>>
>> Data is test data so I'll just wipe it out and restart after updating
>> GCGraceSeconds.
>> Thanks for your help.
>>
>> Ray
>>
>> On Tue, Oct 20, 2009 at 11:39 AM, Jonathan Ellis <[email protected]> wrote:
>>> The problem is you have a few MB of actual data and a few hundred MB
>>> of tombstones (data marked deleted).  So what happens is get_key_range
>>> spends a long, long time iterating through the tombstoned rows,
>>> looking for keys that actually still exist.
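>>>
>>> To make that concrete, here is roughly what the range query ends up
>>> doing (a made-up sketch for illustration, not the real StorageProxy
>>> code -- the type and method names are invented):
>>>
>>>     // walk rows in key order; rows that are nothing but tombstones
>>>     // still have to be read and skipped, so with far more tombstones
>>>     // than live rows, almost all the time goes into skipping
>>>     List<String> keys = new ArrayList<String>();
>>>     for (Row row : rowsInKeyOrder(startKey))
>>>     {
>>>         if (row.isMarkedDeleted())   // tombstone: skip, but it cost a read
>>>             continue;
>>>         keys.add(row.getKey());
>>>         if (keys.size() >= limit)
>>>             break;
>>>     }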
>>>
>>> We're going to redesign this for CASSANDRA-344, but for the 0.4
>>> series, you should restart with GCGraceSeconds much lower (e.g. 3600),
>>> delete your old data files, and reload your data fresh.  (Instead of
>>> reloading, you can use "nodeprobe compact" on each node to force a
>>> major compaction, but it will take much longer since you have so many
>>> tombstones.)
>>>
>>> -Jonathan
>>>
>>> On Mon, Oct 19, 2009 at 10:45 PM, Ramzi Rabah <[email protected]> wrote:
>>>> Hi Jonathan:
>>>>
>>>> Here is the storage_conf.xml for one of the servers
>>>> http://email.slicezero.com/storage-conf.xml
>>>>
>>>> and here is the zipped data:
>>>> http://email.slicezero.com/datastoreDeletion.tgz
>>>>
>>>> Thanks
>>>> Ray
>>>>
>>>>
>>>>
>>>>
>>>> On Mon, Oct 19, 2009 at 8:30 PM, Jonathan Ellis <[email protected]> wrote:
>>>>> Yes, please.  You'll probably have to use something like
>>>>> http://www.getdropbox.com/ if you don't have a public web server to
>>>>> stash it temporarily.
>>>>>
>>>>> On Mon, Oct 19, 2009 at 10:28 PM, Ramzi Rabah <[email protected]> wrote:
>>>>>> Hi Jonathan, the data is about 60 MB. Would you like me to send it to you?
>>>>>>
>>>>>>
>>>>>> On Mon, Oct 19, 2009 at 8:20 PM, Jonathan Ellis <[email protected]> 
>>>>>> wrote:
>>>>>>> Is the data on 6, 9, or 10 small enough that you could tar.gz it up
>>>>>>> for me to use to reproduce over here?
>>>>>>>
>>>>>>> On Mon, Oct 19, 2009 at 10:17 PM, Ramzi Rabah <[email protected]> 
>>>>>>> wrote:
>>>>>>>> So my cluster has 4 nodes: node6, node8, node9, and node10. I turned
>>>>>>>> them all off.
>>>>>>>> 1- I started node6 by itself and still got the problem.
>>>>>>>> 2- I started node8 by itself and it ran fine (returned no keys)
>>>>>>>> 3- I started node9 by itself and still got the problem.
>>>>>>>> 4- I started node10 by itself and still got the problem.
>>>>>>>>
>>>>>>>> Ray
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Oct 19, 2009 at 7:44 PM, Jonathan Ellis <[email protected]> 
>>>>>>>> wrote:
>>>>>>>>> That's really strange...  Can you reproduce on a single-node cluster?
>>>>>>>>>
>>>>>>>>> On Mon, Oct 19, 2009 at 9:34 PM, Ramzi Rabah <[email protected]> 
>>>>>>>>> wrote:
>>>>>>>>>> The rows are very small. There are a handful of columns per row
>>>>>>>>>> (about 4-5).
>>>>>>>>>> Each column has a name which is a String (20-30 characters long), and
>>>>>>>>>> the value is an empty array of bytes (new byte[0]).
>>>>>>>>>> I just use the names of the columns, and don't need to store any
>>>>>>>>>> values in this Column Family.
>>>>>>>>>>
>>>>>>>>>> -- Ray
>>>>>>>>>>
>>>>>>>>>> On Mon, Oct 19, 2009 at 7:24 PM, Jonathan Ellis <[email protected]> 
>>>>>>>>>> wrote:
>>>>>>>>>>> Can you tell me anything about the nature of your rows?  Many/few
>>>>>>>>>>> columns?  Large/small column values?
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Oct 19, 2009 at 9:17 PM, Ramzi Rabah <[email protected]> 
>>>>>>>>>>> wrote:
>>>>>>>>>>>> Hi Jonathan,
>>>>>>>>>>>> I actually spoke too soon. Now even if I restart the servers it
>>>>>>>>>>>> still gives a timeout exception.
>>>>>>>>>>>> As for the sstable files, I'm not sure which ones are the
>>>>>>>>>>>> sstables, but here is the list of files in the data directory
>>>>>>>>>>>> that are prefixed with the column family name:
>>>>>>>>>>>> DatastoreDeletionSchedule-1-Data.db
>>>>>>>>>>>> DatastoreDeletionSchedule-1-Filter.db
>>>>>>>>>>>> DatastoreDeletionSchedule-1-Index.db
>>>>>>>>>>>> DatastoreDeletionSchedule-5-Data.db
>>>>>>>>>>>> DatastoreDeletionSchedule-5-Filter.db
>>>>>>>>>>>> DatastoreDeletionSchedule-5-Index.db
>>>>>>>>>>>> DatastoreDeletionSchedule-7-Data.db
>>>>>>>>>>>> DatastoreDeletionSchedule-7-Filter.db
>>>>>>>>>>>> DatastoreDeletionSchedule-7-Index.db
>>>>>>>>>>>> DatastoreDeletionSchedule-8-Data.db
>>>>>>>>>>>> DatastoreDeletionSchedule-8-Filter.db
>>>>>>>>>>>> DatastoreDeletionSchedule-8-Index.db
>>>>>>>>>>>>
>>>>>>>>>>>> I am not currently doing any system stat collection.
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Oct 19, 2009 at 6:41 PM, Jonathan Ellis 
>>>>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>>>>> How many sstable files are in the data directories for the
>>>>>>>>>>>>> columnfamily you are querying?
>>>>>>>>>>>>>
>>>>>>>>>>>>> How many are there after you restart and it is happy?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Are you doing system stat collection with munin or ganglia or 
>>>>>>>>>>>>> some such?
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Mon, Oct 19, 2009 at 8:25 PM, Ramzi Rabah <[email protected]> 
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>> Hi Jonathan, I updated to 0.4.1 and I still get the same
>>>>>>>>>>>>>> exception when I call get_key_range.
>>>>>>>>>>>>>> I checked all the server logs, and there is only one exception
>>>>>>>>>>>>>> being thrown by whichever server I am connecting to.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>> Ray
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Mon, Oct 19, 2009 at 4:52 PM, Jonathan Ellis 
>>>>>>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>>>>>>> No, it's smart enough to avoid scanning.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Mon, Oct 19, 2009 at 6:49 PM, Ramzi Rabah 
>>>>>>>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>>>>>>>> Hi Jonathan, thanks for the reply, I will update the code to
>>>>>>>>>>>>>>>> 0.4.1 and
>>>>>>>>>>>>>>>> will check all the logs on all the machines.
>>>>>>>>>>>>>>>> Just a simple question, when you do a get_key_range and you 
>>>>>>>>>>>>>>>> specify ""
>>>>>>>>>>>>>>>> and "" for start and end, and the limit is 25, if there are 
>>>>>>>>>>>>>>>> too many
>>>>>>>>>>>>>>>> entries, does it do a scan to find out the start or is it 
>>>>>>>>>>>>>>>> smart enough
>>>>>>>>>>>>>>>> to know what the start key is?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Mon, Oct 19, 2009 at 4:42 PM, Jonathan Ellis 
>>>>>>>>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>>>>>>>>> You should check the other nodes for potential exceptions 
>>>>>>>>>>>>>>>>> keeping them
>>>>>>>>>>>>>>>>> from replying.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Without seeing that it's hard to say if this is caused by an 
>>>>>>>>>>>>>>>>> old bug,
>>>>>>>>>>>>>>>>> but you should definitely upgrade to 0.4.1 either way :)
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Mon, Oct 19, 2009 at 5:51 PM, Ramzi Rabah 
>>>>>>>>>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>>>>>>>>>> Hello all,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I am running into problems with get_key_range. I have
>>>>>>>>>>>>>>>>>> OrderPreservingPartitioner defined in storage-conf.xml and I 
>>>>>>>>>>>>>>>>>> am using
>>>>>>>>>>>>>>>>>> a columnfamily that looks like
>>>>>>>>>>>>>>>>>>     <ColumnFamily CompareWith="BytesType"
>>>>>>>>>>>>>>>>>>                   Name="DatastoreDeletionSchedule"
>>>>>>>>>>>>>>>>>>                   />
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> My command is client.get_key_range("Keyspace1", 
>>>>>>>>>>>>>>>>>> "DatastoreDeletionSchedule",
>>>>>>>>>>>>>>>>>>                    "", "", 25, ConsistencyLevel.ONE);
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> It usually works fine, but after a day or so of server
>>>>>>>>>>>>>>>>>> writes into this column family, I started getting
>>>>>>>>>>>>>>>>>> ERROR [pool-1-thread-36] 2009-10-19 17:24:28,223 
>>>>>>>>>>>>>>>>>> Cassandra.java (line
>>>>>>>>>>>>>>>>>> 770) Internal error processing get_key_range
>>>>>>>>>>>>>>>>>> java.lang.RuntimeException: 
>>>>>>>>>>>>>>>>>> java.util.concurrent.TimeoutException:
>>>>>>>>>>>>>>>>>> Operation timed out.
>>>>>>>>>>>>>>>>>>        at 
>>>>>>>>>>>>>>>>>> org.apache.cassandra.service.StorageProxy.getKeyRange(StorageProxy.java:560)
>>>>>>>>>>>>>>>>>>        at 
>>>>>>>>>>>>>>>>>> org.apache.cassandra.service.CassandraServer.get_key_range(CassandraServer.java:595)
>>>>>>>>>>>>>>>>>>        at 
>>>>>>>>>>>>>>>>>> org.apache.cassandra.service.Cassandra$Processor$get_key_range.process(Cassandra.java:766)
>>>>>>>>>>>>>>>>>>        at 
>>>>>>>>>>>>>>>>>> org.apache.cassandra.service.Cassandra$Processor.process(Cassandra.java:609)
>>>>>>>>>>>>>>>>>>        at 
>>>>>>>>>>>>>>>>>> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:253)
>>>>>>>>>>>>>>>>>>        at 
>>>>>>>>>>>>>>>>>> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885)
>>>>>>>>>>>>>>>>>>        at 
>>>>>>>>>>>>>>>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
>>>>>>>>>>>>>>>>>>        at java.lang.Thread.run(Thread.java:619)
>>>>>>>>>>>>>>>>>> Caused by: java.util.concurrent.TimeoutException: Operation 
>>>>>>>>>>>>>>>>>> timed out.
>>>>>>>>>>>>>>>>>>        at 
>>>>>>>>>>>>>>>>>> org.apache.cassandra.net.AsyncResult.get(AsyncResult.java:97)
>>>>>>>>>>>>>>>>>>        at 
>>>>>>>>>>>>>>>>>> org.apache.cassandra.service.StorageProxy.getKeyRange(StorageProxy.java:556)
>>>>>>>>>>>>>>>>>>        ... 7 more
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I still get the timeout exceptions even though the servers 
>>>>>>>>>>>>>>>>>> have been
>>>>>>>>>>>>>>>>>> idle for 2 days. When I restart the cassandra servers, it 
>>>>>>>>>>>>>>>>>> seems to
>>>>>>>>>>>>>>>>>> work fine again. Any ideas what could be wrong?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> By the way, I am using 
>>>>>>>>>>>>>>>>>> version:apache-cassandra-incubating-0.4.0-rc2
>>>>>>>>>>>>>>>>>> Not sure if this is fixed in the 0.4.1 version
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>>>>>> Ray
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
