[ 
https://issues.apache.org/jira/browse/CASSANDRA-8907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14733756#comment-14733756
 ] 

Anuj Wadehra commented on CASSANDRA-8907:
-----------------------------------------

I am adding a 3rd scenario to the 2 scenarios I mentioned in my earlier comment:

3. The GC warn threshold is enabled by default and set to 5000 ms. Suppose an 
application is NOT sensitive to GC pauses, e.g. some background job. Even though 
no functionality is impacted and the application SLA is being met, it is getting 
5+ sec GC pauses in the background. When the user upgrades Cassandra, he will 
start getting warnings for every GC pause over 5 sec. I won't call that 
'breaking' the existing log monitoring system with new warnings. Warnings are 
just warnings, "an indication of a possible problem", not errors. Any GC pause 
over 5 sec indicates poor heap tuning or an insufficient heap. After the 
upgrade, the user should start getting these warnings so that he can look at 
options for optimizing the JVM tuning. 

Of the 3 scenarios I mentioned, scenarios 2 and 3 support enabling this 
property by default and setting its value to something like 5+ sec, so that the 
user is aware of possible GC tuning problems upfront. If the user is warned and 
still wants to continue with long GC pauses, he can increase the GC warn 
threshold. But it should be Cassandra's responsibility to make the user aware 
of possible problems by raising a warning, especially when we have a 
GCInspector that is monitoring GC pauses.
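
To make the proposed behaviour concrete, here is a minimal sketch in Java of 
how a pause could be mapped to a log level. It is only an illustration: the 
class GcPauseLevel, the method levelFor and the rule that a non-positive 
threshold disables the warning are all hypothetical names and assumptions of 
this sketch, not the actual GCInspector code or the attached patch.

```java
// Hypothetical sketch; not Cassandra's actual GCInspector implementation.
final class GcPauseLevel {
    /** Default proposed in this comment: warn on pauses over 5000 ms. */
    static final long DEFAULT_WARN_THRESHOLD_MS = 5000;

    private GcPauseLevel() {}

    /**
     * Returns "WARN" when the pause is strictly longer than the threshold,
     * otherwise "INFO".
     */
    static String levelFor(long pauseMs, long warnThresholdMs) {
        // Assumption of this sketch: a non-positive threshold disables WARN.
        if (warnThresholdMs > 0 && pauseMs > warnThresholdMs) {
            return "WARN";
        }
        return "INFO";
    }
}
```

With the 5000 ms default, the 9866 ms ParNew pause quoted in the description 
would be reported as WARN and the 1835 ms ConcurrentMarkSweep pause would stay 
at INFO. A user who accepts long pauses could pass a larger threshold, e.g. 
15000 ms, and the 9866 ms pause would go back to INFO.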


> Raise GCInspector alerts to WARN
> --------------------------------
>
>                 Key: CASSANDRA-8907
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8907
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Adam Hattrell
>            Assignee: Amit Singh Chowdhery
>              Labels: patch
>         Attachments: cassnadra-8907.patch
>
>
> I'm fairly regularly running into folks wondering why their applications are 
> reporting down nodes. Yet, they report, when they grepped the logs they had 
> no WARNs or ERRORs listed.
> Nine times out of ten, when I look through the logs I see a ton of ParNew or 
> CMS GC pauses occurring, similar to the following:
> INFO [ScheduledTasks:1] 2013-03-07 18:44:46,795 GCInspector.java (line 122) 
> GC for ConcurrentMarkSweep: 1835 ms for 3 collections, 2606015656 used; max 
> is 10611589120
> INFO [ScheduledTasks:1] 2013-03-07 19:45:08,029 GCInspector.java (line 122) 
> GC for ParNew: 9866 ms for 8 collections, 2910124308 used; max is 6358564864
> To my mind these should be WARNs, as they have the potential to 
> significantly impact the cluster's performance as a whole.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
