I have a two-week-old version of trunk. I probably need to update it to the latest build.
On Fri, Dec 4, 2009 at 12:34 PM, Jonathan Ellis <[email protected]> wrote:
> Are you testing trunk? If not, you should check that first to see if
> it's already fixed.
>
> On Fri, Dec 4, 2009 at 1:55 PM, Ramzi Rabah <[email protected]> wrote:
>> Just to be clear, what I meant is that I ran the deletions and
>> compaction with GCGraceSeconds set to 1 hour, so there was enough time
>> for the tombstones to expire.
>> Anyway, I will try to make a simpler test case to hopefully reproduce
>> this, and I will share the code if I can reproduce it.
>>
>> Ray
>>
>> On Fri, Dec 4, 2009 at 11:04 AM, Ramzi Rabah <[email protected]> wrote:
>>> Hi Jonathan, I have changed that to 3600 (one hour) based on your
>>> recommendation before.
>>>
>>> On Fri, Dec 4, 2009 at 11:01 AM, Jonathan Ellis <[email protected]> wrote:
>>>> this is what I was referring to by "the period specified in your config
>>>> file":
>>>>
>>>> <!--
>>>>  ~ Time to wait before garbage-collecting deletion markers. Set this to
>>>>  ~ a large enough value that you are confident that the deletion marker
>>>>  ~ will be propagated to all replicas by the time this many seconds has
>>>>  ~ elapsed, even in the face of hardware failures. The default value is
>>>>  ~ ten days.
>>>> -->
>>>> <GCGraceSeconds>864000</GCGraceSeconds>
>>>>
>>>> On Fri, Dec 4, 2009 at 12:51 PM, Ramzi Rabah <[email protected]> wrote:
>>>>> I think there might be a bug in the deletion logic. I removed all the
>>>>> data on the cluster by running remove on every single key I entered,
>>>>> and I ran a major compaction
>>>>> (nodeprobe -host hostname compact) on a certain node. After the
>>>>> compaction is over, I am left with one data file, one index file, and
>>>>> the bloom filter file,
>>>>> and they hold the same amount of data as before I started doing the deletes.
>>>>>
>>>>> On Thu, Dec 3, 2009 at 6:09 PM, Jonathan Ellis <[email protected]> wrote:
>>>>>> Cassandra never modifies data in place, so it writes tombstones to
>>>>>> suppress the older writes, and when compaction occurs the data and
>>>>>> tombstones get GC'd (after the period specified in your config file).
>>>>>>
>>>>>> On Thu, Dec 3, 2009 at 8:07 PM, Ramzi Rabah <[email protected]> wrote:
>>>>>>> Looking at jconsole I see a high number of writes when I do removes,
>>>>>>> so I am guessing these are tombstones being written? If that's the
>>>>>>> case, is the data being removed and replaced by tombstones? And will
>>>>>>> they all be deleted eventually when compaction runs?
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Dec 3, 2009 at 3:18 PM, Ramzi Rabah <[email protected]> wrote:
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> I ran a test where I inserted about 1.2 gigabytes worth of data into
>>>>>>>> each node of a 4-node cluster.
>>>>>>>> I ran a script that first calls a get on each column inserted, followed
>>>>>>>> by a remove. Since I was basically removing every entry
>>>>>>>> I inserted before, I expected that the disk space occupied by the
>>>>>>>> nodes would go down and eventually become 0. Instead, the disk space
>>>>>>>> actually goes up when I do the bulk removes, to about 1.8 gigs per
>>>>>>>> node. Am I missing something here?
>>>>>>>>
>>>>>>>> Thanks a lot for your help
>>>>>>>> Ray
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
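
For anyone skimming the thread, here is a minimal toy sketch in plain Java of the behavior Jonathan describes. This is not Cassandra's actual SSTable or compaction code, and all class and method names below are made up for illustration only: a remove appends a tombstone rather than reclaiming space, and compaction drops a tombstone only once it is older than the GC grace period. That is why disk usage rises right after a bulk delete and should only fall after a compaction run once GCGraceSeconds has elapsed.

import java.util.ArrayList;
import java.util.List;

// Toy model of append-only storage with tombstones and a GC grace period.
// Purely illustrative; it is NOT Cassandra's implementation.
public class TombstoneSketch {

    // Every write (including a delete) appends an entry; nothing is modified in place.
    static class Entry {
        final String key;
        final byte[] value;   // null marks a tombstone
        final long timestamp; // milliseconds, supplied by the client
        Entry(String key, byte[] value, long timestamp) {
            this.key = key; this.value = value; this.timestamp = timestamp;
        }
        boolean isTombstone() { return value == null; }
    }

    final List<Entry> log = new ArrayList<>();
    final long gcGraceMillis;

    TombstoneSketch(long gcGraceMillis) { this.gcGraceMillis = gcGraceMillis; }

    void insert(String key, byte[] value, long ts) { log.add(new Entry(key, value, ts)); }

    // A remove also *adds* data: a tombstone that suppresses earlier writes.
    void remove(String key, long ts) { log.add(new Entry(key, null, ts)); }

    // Major compaction: keep only the newest entry per key, and drop a tombstone
    // only once it is older than the GC grace period.
    void compact(long now) {
        List<Entry> compacted = new ArrayList<>();
        for (Entry e : log) {
            boolean newest = true;
            for (Entry other : log) {
                if (other.key.equals(e.key) && other.timestamp > e.timestamp) {
                    newest = false; break;   // superseded by a later write or tombstone
                }
            }
            if (!newest) continue;
            if (e.isTombstone() && now - e.timestamp > gcGraceMillis) continue; // past grace: purge
            compacted.add(e);
        }
        log.clear();
        log.addAll(compacted);
    }

    int entryCount() { return log.size(); }

    public static void main(String[] args) {
        TombstoneSketch store = new TombstoneSketch(3_600_000L); // 1 hour, like GCGraceSeconds=3600
        store.insert("k1", new byte[]{1}, 1_000L);
        store.remove("k1", 2_000L);              // tombstone written; the store grows
        store.compact(3_000L);                   // within grace: tombstone is kept
        System.out.println("after early compaction: " + store.entryCount() + " entries"); // 1
        store.compact(2_000L + 3_600_001L);      // grace elapsed: tombstone purged
        System.out.println("after late compaction:  " + store.entryCount() + " entries"); // 0
    }
}

In this model the second compaction only reclaims the space because the tombstone's age exceeds the grace period at compaction time; with a long grace period (the default is ten days), an early compaction keeps both the tombstones and looks like nothing was freed.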
