There's a ticket open for this:
https://issues.apache.org/jira/browse/CASSANDRA-2521. Vote on it if
you think it's important.
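
For anyone wondering what an alternative to the GC-triggered cleanup discussed below could look like, here is a minimal sketch of explicit reference counting. It is only an illustration, not necessarily what the ticket implements: the class name RefCountedResource, its methods, and the cleanup callback are all made up for this example.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: run a cleanup action (e.g. deleting a compacted file)
// deterministically when the last user lets go, instead of waiting for the GC
// to notice that an in-memory object has become unreachable.
final class RefCountedResource {
    // Starts at 1: the "owner" reference held until the resource is marked obsolete.
    private final AtomicInteger refs = new AtomicInteger(1);
    private final AtomicBoolean obsolete = new AtomicBoolean(false);
    private final Runnable cleanup;

    RefCountedResource(Runnable cleanup) {
        this.cleanup = cleanup;
    }

    // Readers call this before using the resource; false means it is already gone.
    boolean acquire() {
        while (true) {
            int n = refs.get();
            if (n <= 0) {
                return false;
            }
            if (refs.compareAndSet(n, n + 1)) {
                return true;
            }
        }
    }

    // Every successful acquire() must be paired with exactly one release().
    void release() {
        int n = refs.decrementAndGet();
        if (n == 0) {
            cleanup.run(); // last reference dropped: clean up right now
        } else if (n < 0) {
            throw new IllegalStateException("released more often than acquired");
        }
    }

    // Called once the resource has been replaced (e.g. after compaction);
    // drops the owner reference, and repeated calls have no further effect.
    void markObsolete() {
        if (obsolete.compareAndSet(false, true)) {
            release();
        }
    }
}
```

With this shape, cleanup runs as soon as the resource is marked obsolete and the last reader calls release(), regardless of which JVM or collector is in use. The cost is that every acquire() has to be paired with a release(), typically in a finally block.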

-ryan

On Wed, Jun 15, 2011 at 7:34 PM, Jeffrey Kesselman <jef...@gmail.com> wrote:
> The GC cleanup approach, if depending on specific objects being GCd,
> is fundamentally flawed.
>
> I brought this up earlier, won't restart that thread.  It should be in
> the archives.
>
>
> On Wed, Jun 15, 2011 at 10:17 PM, Terje Marthinussen
> <tmarthinus...@gmail.com> wrote:
>> Watching this on a node here right now and it sort of shows how bad this can
>> get.
>> This node still has 109GB free disk by the way...
>> INFO [CompactionExecutor:5] 2011-06-16 09:11:59,164 StorageService.java
>> (line 2071) requesting GC to free disk space
>>  INFO [CompactionExecutor:5] 2011-06-16 09:12:23,929 StorageService.java
>> (line 2071) requesting GC to free disk space
>>  INFO [CompactionExecutor:5] 2011-06-16 09:12:46,489 StorageService.java
>> (line 2071) requesting GC to free disk space
>>  INFO [CompactionExecutor:3] 2011-06-16 09:17:53,299 StorageService.java
>> (line 2071) requesting GC to free disk space
>>  INFO [CompactionExecutor:3] 2011-06-16 09:18:17,782 StorageService.java
>> (line 2071) requesting GC to free disk space
>>  INFO [CompactionExecutor:3] 2011-06-16 09:18:42,078 StorageService.java
>> (line 2071) requesting GC to free disk space
>>  INFO [CompactionExecutor:3] 2011-06-16 09:19:06,984 StorageService.java
>> (line 2071) requesting GC to free disk space
>>  INFO [CompactionExecutor:3] 2011-06-16 09:19:32,079 StorageService.java
>> (line 2071) requesting GC to free disk space
>>  INFO [CompactionExecutor:3] 2011-06-16 09:19:57,265 StorageService.java
>> (line 2071) requesting GC to free disk space
>>  INFO [CompactionExecutor:3] 2011-06-16 09:20:22,706 StorageService.java
>> (line 2071) requesting GC to free disk space
>>  INFO [CompactionExecutor:3] 2011-06-16 09:20:47,331 StorageService.java
>> (line 2071) requesting GC to free disk space
>>  INFO [CompactionExecutor:3] 2011-06-16 09:21:13,062 StorageService.java
>> (line 2071) requesting GC to free disk space
>>  INFO [CompactionExecutor:3] 2011-06-16 09:21:38,288 StorageService.java
>> (line 2071) requesting GC to free disk space
>>  INFO [CompactionExecutor:3] 2011-06-16 09:22:03,500 StorageService.java
>> (line 2071) requesting GC to free disk space
>>  INFO [CompactionExecutor:3] 2011-06-16 09:22:29,407 StorageService.java
>> (line 2071) requesting GC to free disk space
>>  INFO [CompactionExecutor:3] 2011-06-16 09:22:55,577 StorageService.java
>> (line 2071) requesting GC to free disk space
>>  INFO [CompactionExecutor:3] 2011-06-16 09:23:20,951 StorageService.java
>> (line 2071) requesting GC to free disk space
>>  INFO [CompactionExecutor:3] 2011-06-16 09:23:46,448 StorageService.java
>> (line 2071) requesting GC to free disk space
>>  INFO [CompactionExecutor:3] 2011-06-16 09:24:12,030 StorageService.java
>> (line 2071) requesting GC to free disk space
>>  INFO [ScheduledTasks:1] 2011-06-16 09:29:29,494 GCInspector.java (line 128)
>> GC for ParNew: 392 ms, 398997776 reclaimed leaving 2334786808 used; max is
>> 10844635136
>>  INFO [ScheduledTasks:1] 2011-06-16 09:29:32,831 GCInspector.java (line 128)
>> GC for ParNew: 737 ms, 332336832 reclaimed leaving 2473311448 used; max is
>> 10844635136
>>  INFO [CompactionExecutor:6] 2011-06-16 09:48:00,633 StorageService.java
>> (line 2071) requesting GC to free disk space
>>  INFO [CompactionExecutor:6] 2011-06-16 09:48:26,119 StorageService.java
>> (line 2071) requesting GC to free disk space
>>  INFO [CompactionExecutor:6] 2011-06-16 09:48:49,002 StorageService.java
>> (line 2071) requesting GC to free disk space
>>  INFO [CompactionExecutor:6] 2011-06-16 10:10:20,196 StorageService.java
>> (line 2071) requesting GC to free disk space
>>  INFO [CompactionExecutor:6] 2011-06-16 10:10:45,322 StorageService.java
>> (line 2071) requesting GC to free disk space
>>  INFO [CompactionExecutor:6] 2011-06-16 10:11:07,619 StorageService.java
>> (line 2071) requesting GC to free disk space
>>  INFO [CompactionExecutor:7] 2011-06-16 11:01:45,562 StorageService.java
>> (line 2071) requesting GC to free disk space
>>  INFO [CompactionExecutor:7] 2011-06-16 11:02:10,236 StorageService.java
>> (line 2071) requesting GC to free disk space
>>  INFO [CompactionExecutor:7] 2011-06-16 11:05:31,297 StorageService.java
>> (line 2071) requesting GC to free disk space
>> If I look at the data dir, I see 46 *Compacted files which makes up an
>> additional 137GB of space.
>> The oldest of these Compacted files dates back to Jun 16th 01:26.
>> If these got deleted, there should actually be enough disk for the node to
>> run a full compaction run if needed.
>> Either the GC cleanup tactic is seriously flawed or we have a potential bug
>> keeping references far longer than needed?
>> Terje
>>
>>
>> On Wed, Jun 15, 2011 at 11:50 PM, Shotaro Kamio <kamios...@gmail.com> wrote:
>>>
>>> We've encountered a situation where compacted sstable files aren't
>>> deleted after node repair. Even when GC is triggered via JMX, it
>>> sometimes leaves compacted files behind. In one case, a lot of files were
>>> left. Some files have already stayed for more than 10 hours. There is no
>>> guarantee that GC will clean up all compacted sstable files.
>>>
>>> We are very interested in the following ticket:
>>> https://issues.apache.org/jira/browse/CASSANDRA-2521
>>>
>>>
>>> Regards,
>>> Shotaro
>>>
>>>
>>> On Fri, May 27, 2011 at 11:27 AM, Jeffrey Kesselman <jef...@gmail.com>
>>> wrote:
>>> > I'm also not sure that will guarantee all space is cleaned up.  It
>>> > really depends on what you are doing inside Cassandra.  If you have
>>> > your own garbage collection that is just in some way tied to the GC run,
>>> > then it will run when it runs.
>>> >
>>> > If, on the other hand, you are associating records in your storage with
>>> > specific objects in memory and using one of the post-mortem hooks
>>> > (finalize or PhantomReference) to tell you to clean up that particular
>>> > record, then it's quite possible they won't all get cleaned up.  In
>>> > general, HotSpot does not find and clean every candidate object on every
>>> > GC run.  It starts with the easiest/fastest to find and then sees what
>>> > more it thinks it needs to do to free enough memory for anticipated
>>> > near-future needs.
>>> >
>>> > On Thu, May 26, 2011 at 10:16 PM, Jonathan Ellis <jbel...@gmail.com>
>>> > wrote:
>>> >> In summary, System.gc works fine unless you've deliberately done
>>> >> something like setting the -XX:+DisableExplicitGC flag.
>>> >>
>>> >> On Thu, May 26, 2011 at 5:58 PM, Konstantin  Naryshkin
>>> >> <konstant...@a-bb.net> wrote:
>>> >>> So, in summary, there is no way to predictably and efficiently tell
>>> >>> Cassandra to get rid of all of the extra space it is using on disk?
>>> >>>
>>> >>> ----- Original Message -----
>>> >>> From: "Jeffrey Kesselman" <jef...@gmail.com>
>>> >>> To: user@cassandra.apache.org
>>> >>> Sent: Thursday, May 26, 2011 8:57:49 PM
>>> >>> Subject: Re: Forcing Cassandra to free up some space
>>> >>>
>>> >>> Which JVM?  Which collector?  There have been and continue to be many.
>>> >>>
>>> >>> Hotspot itself supports a number of different collectors with
>>> >>> different behaviors.   Many of them do not collect every candidate on
>>> >>> every gc, but merely the easiest ones to find.  This is why depending
>>> >>> on finalizers is a *bad* idea in java code.  They may well never get
>>> >>> run.  (Finalizer is one of a few features the Sun Java team always
>>> >>> regretted putting in Java to start with.  It has caused quite a few
>>> >>> application problems over the years)
>>> >>>
>>> >>> The really important thing is that NONE of these collector
>>> >>> behaviors are guaranteed by specification not to change from version
>>> >>> to version.  Basing your code on unspecified behaviors is a good way
>>> >>> to hit mysterious failures on updates.
>>> >>>
>>> >>> For instance, in the mid 90s, IBM had a mode of their VM called
>>> >>> "infinite heap."  It *never* garbage collected, even if you called
>>> >>> System.gc.  Instead it just threw away address space and counted on
>>> >>> the total memory needs for the life of the program being less than the
>>> >>> total addressable space of the processor.
>>> >>>
>>> >>> It was *very* fast for certain kinds of applications.
>>> >>>
>>> >>> Far from being pedantic, not depending on undocumented behavior is
>>> >>> simply good engineering.
>>> >>>
>>> >>>
>>> >>> On Thu, May 26, 2011 at 4:51 PM, Jonathan Ellis <jbel...@gmail.com>
>>> >>> wrote:
>>> >>>> I've read the relevant source. While you're pedantically correct re
>>> >>>> the spec, you're wrong as to what the JVM actually does.
>>> >>>>
>>> >>>> On Thu, May 26, 2011 at 3:14 PM, Jeffrey Kesselman <jef...@gmail.com>
>>> >>>> wrote:
>>> >>>>> Some references...
>>> >>>>>
>>> >>>>> "An object enters an unreachable state when no more strong
>>> >>>>> references
>>> >>>>> to it exist. When an object is unreachable, it is a candidate for
>>> >>>>> collection. Note the wording: Just because an object is a candidate
>>> >>>>> for collection doesn't mean it will be immediately collected. The
>>> >>>>> JVM
>>> >>>>> is free to delay collection until there is an immediate need for the
>>> >>>>> memory being consumed by the object."
>>> >>>>>
>>> >>>>>
>>> >>>>> http://java.sun.com/docs/books/performance/1st_edition/html/JPAppGC.fm.html#998394
>>> >>>>>
>>> >>>>> and "Calling the gc method suggests that the Java Virtual Machine
>>> >>>>> expend effort toward recycling unused objects"
>>> >>>>>
>>> >>>>>
>>> >>>>> http://download.oracle.com/javase/6/docs/api/java/lang/System.html#gc()
>>> >>>>>
>>> >>>>> It goes on to say that the VM will make a "best effort", but "best
>>> >>>>> effort" is *deliberately* left up to the definition of the gc
>>> >>>>> implementor.
>>> >>>>>
>>> >>>>> I guess you missed the many lectures I have given on this subject
>>> >>>>> over
>>> >>>>> the years at Java One Conferences....
>>> >>>>>
>>> >>>>> On Thu, May 26, 2011 at 3:53 PM, Jonathan Ellis <jbel...@gmail.com>
>>> >>>>> wrote:
>>> >>>>>> It's a common misunderstanding that System.gc is only a suggestion;
>>> >>>>>> on
>>> >>>>>> any VM you're likely to run Cassandra on, System.gc will actually
>>> >>>>>> invoke a full collection.
>>> >>>>>>
>>> >>>>>> On Thu, May 26, 2011 at 2:18 PM, Jeffrey Kesselman
>>> >>>>>> <jef...@gmail.com> wrote:
>>> >>>>>>> Actually this is no guarantee.  It's a common misunderstanding that
>>> >>>>>>> System.gc "forces" GC.  It does not.  It is a suggestion only.  The
>>> >>>>>>> VM always has the option as to when and how much it GCs.
>>> >>>>>>>
>>> >>>>>>> On May 26, 2011 2:51 PM, "Jonathan Ellis" <jbel...@gmail.com>
>>> >>>>>>> wrote:
>>> >>>>>>>
>>> >>>>>>
>>> >>>>>>
>>> >>>>>>
>>> >>>>>> --
>>> >>>>>> Jonathan Ellis
>>> >>>>>> Project Chair, Apache Cassandra
>>> >>>>>> co-founder of DataStax, the source for professional Cassandra
>>> >>>>>> support
>>> >>>>>> http://www.datastax.com
>>> >>>>>>
>>> >>>>>
>>> >>>>>
>>> >>>>>
>>> >>>>> --
>>> >>>>> It's always darkest just before you are eaten by a grue.
>>> >>>>>
>>>
>>>
>>>
>>> --
>>> Shotaro Kamio
>>
>>
