I think there might be a bug in the deletion logic. I removed all the data on the cluster by calling remove on every single key I had inserted, then ran a major compaction (nodeprobe -host hostname compact) on one of the nodes. After the compaction finished, I was left with one data file, one index file, and the bloom filter file, and they hold the same amount of data as before I started the deletes.
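
For reference, each delete looked roughly like the loop below. This is only a sketch against the Thrift interface as I understand the 0.5-era bindings; the host, keyspace/column family names, and key list are all placeholders:

    import org.apache.cassandra.service.Cassandra;
    import org.apache.cassandra.service.ColumnPath;
    import org.apache.cassandra.service.ConsistencyLevel;
    import org.apache.thrift.protocol.TBinaryProtocol;
    import org.apache.thrift.transport.TSocket;
    import org.apache.thrift.transport.TTransport;

    public class BulkRemove {
        public static void main(String[] args) throws Exception {
            TTransport transport = new TSocket("hostname", 9160);
            Cassandra.Client client =
                new Cassandra.Client(new TBinaryProtocol(transport));
            transport.open();
            // null super_column/column deletes the whole row under
            // this column family
            ColumnPath path = new ColumnPath("Standard1", null, null);
            for (String key : args) {
                // each remove writes a tombstone; its timestamp must be
                // newer than the original insert's for the delete to win
                client.remove("Keyspace1", key, path,
                              System.currentTimeMillis(), ConsistencyLevel.ONE);
            }
            transport.close();
        }
    }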
On Thu, Dec 3, 2009 at 6:09 PM, Jonathan Ellis <[email protected]> wrote:
> cassandra never modifies data in-place. so it writes tombstones to
> suppress the older writes, and when compaction occurs the data and
> tombstones get GC'd (after the period specified in your config file).
>
> On Thu, Dec 3, 2009 at 8:07 PM, Ramzi Rabah <[email protected]> wrote:
>> Looking at jconsole I see a high number of writes when I do removes,
>> so I am guessing these are tombstones being written? If that's the
>> case, is the data being removed and replaced by tombstones? And will
>> they all be deleted eventually when compaction runs?
>>
>> On Thu, Dec 3, 2009 at 3:18 PM, Ramzi Rabah <[email protected]> wrote:
>>> Hi all,
>>>
>>> I ran a test where I inserted about 1.2 gigabytes worth of data into
>>> each node of a 4-node cluster.
>>> I ran a script that first calls a get on each column inserted, followed
>>> by a remove. Since I was basically removing every entry
>>> I inserted before, I expected that the disk space occupied by the
>>> nodes would go down and eventually reach 0. The disk space
>>> actually goes up when I do the bulk removes, to about 1.8 gigs per
>>> node. Am I missing something here?
>>>
>>> Thanks a lot for your help
>>> Ray
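
P.S. For anyone finding this thread later: the "period specified in your config file" that Jonathan mentions is, if I have the name right, the GCGraceSeconds element in storage-conf.xml, along these lines:

    <!-- seconds tombstones are kept before compaction may GC them;
         I believe the default is 864000, i.e. 10 days -->
    <GCGraceSeconds>864000</GCGraceSeconds>

So deleted data shouldn't be expected to vanish from disk until a compaction runs after that window has passed.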
