[
https://issues.apache.org/jira/browse/CASSANDRA-8907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14733756#comment-14733756
]
Anuj Wadehra commented on CASSANDRA-8907:
-----------------------------------------
I am adding a third scenario to the two I described in my earlier comment:
3. The GC warn threshold is enabled by default and set to 5000 ms. Suppose an
application is NOT sensitive to GC pauses, e.g. a background job. Even though
no functionality is impacted and the application SLA is being met, it is seeing
5+ second GC pauses in the background. When the user upgrades Cassandra, he
will start getting warnings for every GC pause over 5 seconds. I would not call
that 'breaking' an existing log monitoring system with new warnings. Warnings
are just that: "an indication of a possible problem", not errors. Any GC pause
over 5 seconds indicates poor heap tuning or an insufficient heap. After the
upgrade, the user should start getting these warnings so that he can look at
options for tuning the JVM.
Of the three scenarios I described, scenarios 2 and 3 support enabling this
property by default and setting its value to something like 5+ seconds, so that
the user is aware of possible GC tuning problems up front. If the user is
warned and still wants to continue with long GC pauses, he can increase the GC
warn threshold. But it should be Cassandra's responsibility to make the user
aware of possible problems by raising a warning, especially when we already
have a GCInspector monitoring GC pauses.
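To make the proposal concrete, here is a minimal sketch of the decision being
discussed. It is not the GCInspector code or the attached patch: the class name
GcPauseClassifier, the Level enum, the 200 ms INFO threshold, and the rule that
a non-positive warn threshold disables escalation are all assumptions for
illustration. Only the 5000 ms warn value comes from this comment.

```java
// Hypothetical sketch, not Cassandra's GCInspector implementation.
final class GcPauseClassifier {
    enum Level { DEBUG, INFO, WARN }

    private final long logThresholdMs;   // assumed: durations above this log at INFO
    private final long warnThresholdMs;  // assumed: <= 0 disables WARN escalation

    GcPauseClassifier(long logThresholdMs, long warnThresholdMs) {
        this.logThresholdMs = logThresholdMs;
        this.warnThresholdMs = warnThresholdMs;
    }

    // Picks the log level for a reported GC duration in milliseconds.
    Level classify(long gcMillis) {
        if (warnThresholdMs > 0 && gcMillis > warnThresholdMs) {
            return Level.WARN;
        }
        if (gcMillis > logThresholdMs) {
            return Level.INFO;
        }
        return Level.DEBUG;
    }

    public static void main(String[] args) {
        // 5000 ms is the default proposed above; 200 ms is an assumed value.
        GcPauseClassifier byDefault = new GcPauseClassifier(200, 5000);
        System.out.println(byDefault.classify(1835)); // INFO
        System.out.println(byDefault.classify(9866)); // WARN

        // A user who accepts long pauses raises the warn threshold.
        GcPauseClassifier relaxed = new GcPauseClassifier(200, 15000);
        System.out.println(relaxed.classify(9866));   // INFO
    }
}
```

With a scheme like this, the background-job user in scenario 3 sees new
warnings after the upgrade and can silence them by raising a single value.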
> Raise GCInspector alerts to WARN
> --------------------------------
>
> Key: CASSANDRA-8907
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8907
> Project: Cassandra
> Issue Type: Improvement
> Reporter: Adam Hattrell
> Assignee: Amit Singh Chowdhery
> Labels: patch
> Attachments: cassnadra-8907.patch
>
>
> I'm fairly regularly running into folks wondering why their applications are
> reporting down nodes. Yet, they report, when they grep the logs there are
> no WARNs or ERRORs listed.
> Nine times out of ten, when I look through the logs, we see a ton of ParNew or
> CMS GC pauses similar to the following:
> INFO [ScheduledTasks:1] 2013-03-07 18:44:46,795 GCInspector.java (line 122)
> GC for ConcurrentMarkSweep: 1835 ms for 3 collections, 2606015656 used; max
> is 10611589120
> INFO [ScheduledTasks:1] 2013-03-07 19:45:08,029 GCInspector.java (line 122)
> GC for ParNew: 9866 ms for 8 collections, 2910124308 used; max is 6358564864
> To my mind these should be WARNs, as they have the potential to significantly
> impact the cluster's performance as a whole.
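Until such lines are logged at WARN, an operator has to pull them out of the
INFO output. The following is a rough sketch of that, assuming the message
format shown in the two sample lines quoted above, with each entry joined back
onto a single line. The class GcLogScan, its method names, and the regular
expression are hypothetical and not part of Cassandra.

```java
import java.util.OptionalLong;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for scanning log lines; not part of Cassandra.
final class GcLogScan {
    // Matches e.g. "GC for ParNew: 9866 ms for 8 collections".
    private static final Pattern GC_LINE =
        Pattern.compile("GC for (\\w+): (\\d+) ms for (\\d+) collections");

    // Returns the reported GC time in ms, or empty if the line is not a GC entry.
    static OptionalLong gcMillis(String line) {
        Matcher m = GC_LINE.matcher(line);
        return m.find()
            ? OptionalLong.of(Long.parseLong(m.group(2)))
            : OptionalLong.empty();
    }

    // True when the line is a GC entry whose reported time exceeds the threshold.
    static boolean exceeds(String line, long thresholdMs) {
        OptionalLong ms = gcMillis(line);
        return ms.isPresent() && ms.getAsLong() > thresholdMs;
    }

    public static void main(String[] args) {
        String line = "INFO [ScheduledTasks:1] 2013-03-07 19:45:08,029 "
            + "GCInspector.java (line 122) GC for ParNew: 9866 ms for 8 "
            + "collections, 2910124308 used; max is 6358564864";
        System.out.println(exceeds(line, 5000)); // true
    }
}
```

Note that the reported figure is the total across the listed collections, not
necessarily a single pause.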