Charles~

While it could be that we are asking too much of the memory subsystem,
the steadily increasing times for young gen sweeps seem like a strong
indicator that something is going wrong.  I would be more inclined to
believe the entire system was overstressed if I saw consistently high
latencies rather than this steady climb.

We are using some fairly allocation-happy libraries when we pull data
into and push data out of the system.  The application itself receives
continuous streams of events, each of which triggers a relatively small
amount of work.  Since an event is usually handled in its entirety
right away, none of the garbage it produces survives the processing of
that event.  Objects that do survive the first sweep usually go into
longer-term storage and won't become garbage until another event, an
indeterminate amount of time in the future, triggers their removal.
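
As a rough sketch of that lifetime pattern (the class and method names
here are made up for illustration, not from our actual code): most
allocations die with the event that caused them, while anything put
into the store survives until a later event removes or replaces it.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class EventProcessor {
    // long-lived: survivors of the first sweep end up referenced here
    private final Map<String, String> store =
        new ConcurrentHashMap<String, String>();

    void onEvent(String event) {
        // short-lived: everything parsed here is garbage once we return
        String[] parts = event.split(" ", 3);
        if ("PUT".equals(parts[0])) {
            store.put(parts[1], parts[2]);   // survives, eventually tenured
        } else if ("DEL".equals(parts[0])) {
            store.remove(parts[1]);          // old value becomes garbage
        }
    }
}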

Matt

On Wed, May 5, 2010 at 2:30 PM, Charles Oliver Nutter
<[email protected]> wrote:
> I will also mention that a lot of the time when we benchmark object-heavy
> algorithms in JRuby (which is most algorithms), the bottleneck is not
> in GC as much as it is in allocation rates. Kirk Pepperdine ran some
> numbers for me a few months back and calculated that for numeric
> benchmarks we were basically saturating the memory pipeline to
> *acquire* objects, even though GC times were negligible. It sounds
> like you're cranking through a lot of data...could you simply be
> asking too much of memory? What sort of application is it? Is it using
> one of the object-heavy languages like JRuby or Clojure?
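>
> Just as a sketch of how to eyeball the raw allocation rate: this uses
> the com.sun.management extension of ThreadMXBean, which needs a newer
> HotSpot than the 1.6.0_16 in your log, and doWork() is only a
> stand-in for whatever you want to measure.
>
> import java.lang.management.ManagementFactory;
> import com.sun.management.ThreadMXBean;
>
> public class AllocRate {
>     public static void main(String[] args) {
>         ThreadMXBean tmx =
>             (ThreadMXBean) ManagementFactory.getThreadMXBean();
>         long tid = Thread.currentThread().getId();
>         long bytesBefore = tmx.getThreadAllocatedBytes(tid);
>         long t0 = System.nanoTime();
>         doWork();  // stand-in for the allocation-heavy code under test
>         long bytes = tmx.getThreadAllocatedBytes(tid) - bytesBefore;
>         double secs = (System.nanoTime() - t0) / 1e9;
>         System.out.printf("allocated %.1f MB/s%n", bytes / 1e6 / secs);
>     }
>
>     static void doWork() {
>         StringBuilder sb = new StringBuilder();
>         for (int i = 0; i < 1000000; i++) sb.append(i);
>     }
> }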
>
> On Wed, May 5, 2010 at 1:28 PM, Charles Oliver Nutter
> <[email protected]> wrote:
>> Okay, my next thought is that your young generation is perhaps too big.
>> I assume you've already tried choking it down and letting GC run more
>> often against a smaller young heap?
>>
>> If GC times for young gen are getting longer, something has to be
>> changing. Finalizers? Weak/SoftReferences? You say you're not getting
>> to the point of CMS running, but a large young gen can still take a
>> long time to collect. Do less more often?
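>>
>> For instance (numbers purely illustrative, not a recommendation),
>> something like
>>
>>   -Xmn512m
>>   -XX:SurvivorRatio=8
>>   -XX:MaxTenuringThreshold=4
>>   -XX:+PrintTenuringDistribution
>>
>> would give you more frequent but shorter ParNew pauses and show, via
>> the tenuring distribution in the log, how long your objects actually
>> live.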
>>
>> On Wed, May 5, 2010 at 9:25 AM, Matt Fowles <[email protected]> wrote:
>>> Charles~
>>>
>>> I settled on that after running experiments varying the survivor ratio
>>> and tenuring threshold.  In the end, I found that >99.9% of the young
>>> garbage was caught by this configuration, and each extra young gen run
>>> only reclaimed about 1% of the previously surviving objects.  So it
>>> seemed like the trade-off just wasn't winning me anything.
>>>
>>> Matt
>>>
>>> On Tue, May 4, 2010 at 6:50 PM, Charles Oliver Nutter
>>> <[email protected]> wrote:
>>>> Why are you using -XX:MaxTenuringThreshold=0? That basically forces
>>>> all objects that survive one collection to be promoted immediately,
>>>> even if they just happen to be slightly longer-lived young objects.
>>>> Using 0 seems like a bad idea.
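>>>>
>>>> As an illustration (example numbers only), compare:
>>>>
>>>>   -XX:MaxTenuringThreshold=0 -XX:SurvivorRatio=20000
>>>>     anything that survives a single ParNew is copied straight into
>>>>     the old gen, even if it would have died milliseconds later
>>>>
>>>>   -XX:MaxTenuringThreshold=4 -XX:SurvivorRatio=8
>>>>   -XX:+PrintTenuringDistribution
>>>>     objects get a few trips through real survivor spaces first, and
>>>>     the logged tenuring distribution shows whether they need them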
>>>>
>>>> On Tue, May 4, 2010 at 4:46 PM, Matt Fowles <[email protected]> wrote:
>>>>> All~
>>>>>
>>>>> I have a large app that produces ~4g of garbage every minute, and I am
>>>>> trying to reduce the size of GC outliers.  About 99% of this data is
>>>>> garbage, but almost anything that survives one collection survives for
>>>>> an indeterminately long time.  We are currently using the following VM
>>>>> and options:
>>>>>
>>>>>
>>>>> java version "1.6.0_16"
>>>>> Java(TM) SE Runtime Environment (build 1.6.0_16-b01)
>>>>> Java HotSpot(TM) 64-Bit Server VM (build 14.2-b01, mixed mode)
>>>>>
>>>>> -verbose:gc
>>>>> -Xms32g -Xmx32g -Xmn4g
>>>>> -XX:+UseParNewGC
>>>>> -XX:ParallelGCThreads=4
>>>>> -XX:+UseConcMarkSweepGC
>>>>> -XX:ParallelCMSThreads=4
>>>>> -XX:MaxTenuringThreshold=0
>>>>> -XX:SurvivorRatio=20000
>>>>> -XX:CMSInitiatingOccupancyFraction=60
>>>>> -XX:+UseCMSInitiatingOccupancyOnly
>>>>> -XX:+CMSParallelRemarkEnabled
>>>>> -XX:MaxGCPauseMillis=50
>>>>> -Xloggc:gc.log
>>>>>
>>>>>
>>>>>
>>>>> As you can see from the GC log, we never actually reach the point
>>>>> where the CMS kicks in (after app startup).  But our young gens seem
>>>>> to take increasingly long to collect as time goes by.
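>>>>>
>>>>> (The attached log was produced with just -verbose:gc and -Xloggc; if
>>>>> it helps, standard HotSpot flags such as
>>>>>
>>>>>   -XX:+PrintGCDetails
>>>>>   -XX:+PrintGCTimeStamps
>>>>>   -XX:+PrintTenuringDistribution
>>>>>
>>>>> would break each ParNew pause down further and show where the extra
>>>>> time is going.)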
>>>>>
>>>>> One of the major metrics we have for measuring this system is latency
>>>>> as measured from connected clients and as measured internally.  You
>>>>> can see an attached graph of latency vs time for the clients.  It is
>>>>> not surprising that the internal latency (green and labeled 'sb') is
>>>>> not as large as the network latency.  I assume this is in part because
>>>>> VM safe points are less likely to occur within our internal timing
>>>>> markers.  But, one can easily see how the external latency
>>>>> measurements (blue and labeled 'network') display the same steady
>>>>> increase in times.
>>>>>
>>>>> My hope is to be able to tweak the young gen size and trade off GC
>>>>> frequency against pause length; however, the steadily increasing GC
>>>>> times persist regardless of how large or small I make the young
>>>>> generation.
>>>>>
>>>>> Has anyone seen this sort of behavior before?  Are there more switches
>>>>> that I should try running with?
>>>>>
>>>>> Obviously, I am working in parallel to profile the app and reduce the
>>>>> garbage load.  But if I still see this sort of problem, it is only a
>>>>> question of how long the app must run before I see unacceptable
>>>>> latency spikes.
>>>>>
>>>>> Matt
>>>>>
>>>>
>>>
>>
>

