I opened https://issues.apache.org/jira/browse/CASSANDRA-507
Ray

On Wed, Oct 21, 2009 at 12:07 PM, Jonathan Ellis <[email protected]> wrote:
> The compaction code removes tombstones, and it runs whenever you have
> enough sstable fragments.
>
> I think I know what is happening -- as an optimization, if there is
> only one version of a row it will just copy it to the new sstable.
> This means it won't clean out tombstones.
>
> Can you file a bug at https://issues.apache.org/jira/browse/CASSANDRA ?
>
> -Jonathan
>
> On Wed, Oct 21, 2009 at 2:01 PM, Ramzi Rabah <[email protected]> wrote:
>> Hi Jonathan, I am still running into the timeout issue even after reducing
>> GCGraceSeconds to 1 hour (we have tons of deletes happening in our app).
>> Which part of Cassandra is responsible for deleting the tombstone records,
>> and how often does it run?
>>
>> On Tue, Oct 20, 2009 at 12:02 PM, Ramzi Rabah <[email protected]> wrote:
>>> Thank you so much Jonathan.
>>>
>>> The data is test data, so I'll just wipe it out and restart after updating GCGraceSeconds.
>>> Thanks for your help.
>>>
>>> Ray
>>>
>>> On Tue, Oct 20, 2009 at 11:39 AM, Jonathan Ellis <[email protected]> wrote:
>>>> The problem is you have a few MB of actual data and a few hundred MB
>>>> of tombstones (data marked deleted). So what happens is get_key_range
>>>> spends a long, long time iterating through the tombstoned rows,
>>>> looking for keys that actually still exist.
>>>>
>>>> We're going to redesign this for CASSANDRA-344, but for the 0.4
>>>> series, you should restart with GCGraceSeconds much lower (e.g. 3600),
>>>> delete your old data files, and reload your data fresh. (Instead of
>>>> reloading, you can use "nodeprobe compact" on each node to force a
>>>> major compaction, but it will take much longer since you have so many
>>>> tombstones.)
>>>>
>>>> -Jonathan
>>>>
>>>> On Mon, Oct 19, 2009 at 10:45 PM, Ramzi Rabah <[email protected]> wrote:
>>>>> Hi Jonathan:
>>>>>
>>>>> Here is the storage-conf.xml for one of the servers:
>>>>> http://email.slicezero.com/storage-conf.xml
>>>>>
>>>>> and here is the zipped data:
>>>>> http://email.slicezero.com/datastoreDeletion.tgz
>>>>>
>>>>> Thanks
>>>>> Ray
>>>>>
>>>>> On Mon, Oct 19, 2009 at 8:30 PM, Jonathan Ellis <[email protected]> wrote:
>>>>>> Yes, please. You'll probably have to use something like
>>>>>> http://www.getdropbox.com/ if you don't have a public web server to
>>>>>> stash it temporarily.
>>>>>>
>>>>>> On Mon, Oct 19, 2009 at 10:28 PM, Ramzi Rabah <[email protected]> wrote:
>>>>>>> Hi Jonathan, the data is about 60 MB. Would you like me to send it to you?
>>>>>>>
>>>>>>> On Mon, Oct 19, 2009 at 8:20 PM, Jonathan Ellis <[email protected]> wrote:
>>>>>>>> Is the data on 6, 9, or 10 small enough that you could tar.gz it up
>>>>>>>> for me to use to reproduce over here?
>>>>>>>>
>>>>>>>> On Mon, Oct 19, 2009 at 10:17 PM, Ramzi Rabah <[email protected]> wrote:
>>>>>>>>> So my cluster has 4 nodes: node6, node8, node9, and node10. I turned them all off.
>>>>>>>>> 1- I started node6 by itself and still got the problem.
>>>>>>>>> 2- I started node8 by itself and it ran fine (returned no keys).
>>>>>>>>> 3- I started node9 by itself and still got the problem.
>>>>>>>>> 4- I started node10 by itself and still got the problem.
>>>>>>>>>
>>>>>>>>> Ray
>>>>>>>>>
>>>>>>>>> On Mon, Oct 19, 2009 at 7:44 PM, Jonathan Ellis <[email protected]> wrote:
>>>>>>>>>> That's really strange... Can you reproduce on a single-node cluster?
>>>>>>>>>>
>>>>>>>>>> On Mon, Oct 19, 2009 at 9:34 PM, Ramzi Rabah <[email protected]> wrote:
>>>>>>>>>>> The rows are very small. There are a handful of columns per row
>>>>>>>>>>> (approximately 4-5 columns per row).
>>>>>>>>>>> Each column has a name which is a String (20-30 characters long),
>>>>>>>>>>> and the value is an empty array of bytes (new byte[0]).
>>>>>>>>>>> I just use the names of the columns, and don't need to store any
>>>>>>>>>>> values in this Column Family.
>>>>>>>>>>>
>>>>>>>>>>> -- Ray
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Oct 19, 2009 at 7:24 PM, Jonathan Ellis <[email protected]> wrote:
>>>>>>>>>>>> Can you tell me anything about the nature of your rows? Many/few
>>>>>>>>>>>> columns? Large/small column values?
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Oct 19, 2009 at 9:17 PM, Ramzi Rabah <[email protected]> wrote:
>>>>>>>>>>>>> Hi Jonathan,
>>>>>>>>>>>>> I actually spoke too early. Now even if I restart the servers it
>>>>>>>>>>>>> still gives a timeout exception.
>>>>>>>>>>>>> As for the sstable files, I'm not sure which ones are the sstables,
>>>>>>>>>>>>> but here is the list of files in the data directory that are
>>>>>>>>>>>>> prepended with the column family name:
>>>>>>>>>>>>> DatastoreDeletionSchedule-1-Data.db
>>>>>>>>>>>>> DatastoreDeletionSchedule-1-Filter.db
>>>>>>>>>>>>> DatastoreDeletionSchedule-1-Index.db
>>>>>>>>>>>>> DatastoreDeletionSchedule-5-Data.db
>>>>>>>>>>>>> DatastoreDeletionSchedule-5-Filter.db
>>>>>>>>>>>>> DatastoreDeletionSchedule-5-Index.db
>>>>>>>>>>>>> DatastoreDeletionSchedule-7-Data.db
>>>>>>>>>>>>> DatastoreDeletionSchedule-7-Filter.db
>>>>>>>>>>>>> DatastoreDeletionSchedule-7-Index.db
>>>>>>>>>>>>> DatastoreDeletionSchedule-8-Data.db
>>>>>>>>>>>>> DatastoreDeletionSchedule-8-Filter.db
>>>>>>>>>>>>> DatastoreDeletionSchedule-8-Index.db
>>>>>>>>>>>>>
>>>>>>>>>>>>> I am not currently doing any system stat collection.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Mon, Oct 19, 2009 at 6:41 PM, Jonathan Ellis <[email protected]> wrote:
>>>>>>>>>>>>>> How many sstable files are in the data directories for the
>>>>>>>>>>>>>> columnfamily you are querying?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> How many are there after you restart and it is happy?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Are you doing system stat collection with munin or ganglia or some such?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Mon, Oct 19, 2009 at 8:25 PM, Ramzi Rabah <[email protected]> wrote:
>>>>>>>>>>>>>>> Hi Jonathan, I updated to 0.4.1 and I still get the same exception
>>>>>>>>>>>>>>> when I call get_key_range.
>>>>>>>>>>>>>>> I checked all the server logs, and there is only one exception being
>>>>>>>>>>>>>>> thrown, by whichever server I am connecting to.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>>> Ray
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Mon, Oct 19, 2009 at 4:52 PM, Jonathan Ellis <[email protected]> wrote:
>>>>>>>>>>>>>>>> No, it's smart enough to avoid scanning.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Mon, Oct 19, 2009 at 6:49 PM, Ramzi Rabah <[email protected]> wrote:
>>>>>>>>>>>>>>>>> Hi Jonathan, thanks for the reply. I will update the code to 0.4.1
>>>>>>>>>>>>>>>>> and will check all the logs on all the machines.
>>>>>>>>>>>>>>>>> Just a simple question: when you do a get_key_range and you specify ""
>>>>>>>>>>>>>>>>> and "" for start and end, and the limit is 25, if there are too many
>>>>>>>>>>>>>>>>> entries, does it do a scan to find the start, or is it smart enough
>>>>>>>>>>>>>>>>> to know what the start key is?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Mon, Oct 19, 2009 at 4:42 PM, Jonathan Ellis <[email protected]> wrote:
>>>>>>>>>>>>>>>>>> You should check the other nodes for potential exceptions keeping
>>>>>>>>>>>>>>>>>> them from replying.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Without seeing that, it's hard to say if this is caused by an old bug,
>>>>>>>>>>>>>>>>>> but you should definitely upgrade to 0.4.1 either way :)
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Mon, Oct 19, 2009 at 5:51 PM, Ramzi Rabah <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>> Hello all,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I am running into problems with get_key_range. I have
>>>>>>>>>>>>>>>>>>> OrderPreservingPartitioner defined in storage-conf.xml and I am
>>>>>>>>>>>>>>>>>>> using a column family that looks like:
>>>>>>>>>>>>>>>>>>> <ColumnFamily CompareWith="BytesType" Name="DatastoreDeletionSchedule" />
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> My command is client.get_key_range("Keyspace1", "DatastoreDeletionSchedule",
>>>>>>>>>>>>>>>>>>> "", "", 25, ConsistencyLevel.ONE);
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> It usually works fine, but after a day or so of server writes into
>>>>>>>>>>>>>>>>>>> this column family, I started getting:
>>>>>>>>>>>>>>>>>>> ERROR [pool-1-thread-36] 2009-10-19 17:24:28,223 Cassandra.java (line 770)
>>>>>>>>>>>>>>>>>>> Internal error processing get_key_range
>>>>>>>>>>>>>>>>>>> java.lang.RuntimeException: java.util.concurrent.TimeoutException: Operation timed out.
>>>>>>>>>>>>>>>>>>>   at org.apache.cassandra.service.StorageProxy.getKeyRange(StorageProxy.java:560)
>>>>>>>>>>>>>>>>>>>   at org.apache.cassandra.service.CassandraServer.get_key_range(CassandraServer.java:595)
>>>>>>>>>>>>>>>>>>>   at org.apache.cassandra.service.Cassandra$Processor$get_key_range.process(Cassandra.java:766)
>>>>>>>>>>>>>>>>>>>   at org.apache.cassandra.service.Cassandra$Processor.process(Cassandra.java:609)
>>>>>>>>>>>>>>>>>>>   at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:253)
>>>>>>>>>>>>>>>>>>>   at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885)
>>>>>>>>>>>>>>>>>>>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
>>>>>>>>>>>>>>>>>>>   at java.lang.Thread.run(Thread.java:619)
>>>>>>>>>>>>>>>>>>> Caused by: java.util.concurrent.TimeoutException: Operation timed out.
>>>>>>>>>>>>>>>>>>>   at org.apache.cassandra.net.AsyncResult.get(AsyncResult.java:97)
>>>>>>>>>>>>>>>>>>>   at org.apache.cassandra.service.StorageProxy.getKeyRange(StorageProxy.java:556)
>>>>>>>>>>>>>>>>>>>   ... 7 more
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I still get the timeout exceptions even though the servers have been
>>>>>>>>>>>>>>>>>>> idle for 2 days. When I restart the Cassandra servers, it seems to
>>>>>>>>>>>>>>>>>>> work fine again. Any ideas what could be wrong?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> By the way, I am using version apache-cassandra-incubating-0.4.0-rc2.
>>>>>>>>>>>>>>>>>>> Not sure if this is fixed in the 0.4.1 version.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>>>>>>> Ray
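
The call that times out in the trace above is the plain Thrift get_key_range from the original report. A minimal Java sketch of issuing it is below, assuming a node listening on the default Thrift port 9160 and the 0.4-era generated classes under org.apache.cassandra.service; the arguments are the ones quoted in the thread, while the host, port, class name, and the list-of-keys return type are assumptions rather than anything taken from the thread.

    import java.util.List;

    import org.apache.cassandra.service.Cassandra;
    import org.apache.cassandra.service.ConsistencyLevel;
    import org.apache.thrift.protocol.TBinaryProtocol;
    import org.apache.thrift.transport.TSocket;
    import org.apache.thrift.transport.TTransport;

    public class KeyRangeCheck {
        public static void main(String[] args) throws Exception {
            // Hypothetical node address; 9160 is the usual Thrift listen port.
            TTransport transport = new TSocket("localhost", 9160);
            transport.open();
            Cassandra.Client client = new Cassandra.Client(new TBinaryProtocol(transport));

            // The call from the report: empty start/end keys, at most 25 keys back.
            // With hundreds of MB of tombstones this is the call that times out,
            // because the node has to skip over deleted rows to find live keys.
            List<String> keys = client.get_key_range(
                    "Keyspace1", "DatastoreDeletionSchedule", "", "", 25,
                    ConsistencyLevel.ONE);

            for (String key : keys) {
                System.out.println(key);
            }
            transport.close();
        }
    }

Only the connection boilerplate is added here; the empty start/end keys and the 25-key limit are exactly what the report uses.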

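The GCGraceSeconds workaround Jonathan suggests (lower it to 3600, then either reload the data or force a major compaction with "nodeprobe compact") is a storage-conf.xml setting. A hypothetical excerpt, assuming the 0.4-era layout where GCGraceSeconds sits directly under the top-level Storage element:

    <Storage>
      <!-- ...other settings unchanged... -->
      <!-- Tombstones only become eligible for removal during compaction once they
           are older than GCGraceSeconds. 3600 seconds = 1 hour, the value suggested
           in the thread; the shipped default is much larger (on the order of days). -->
      <GCGraceSeconds>3600</GCGraceSeconds>
    </Storage>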