C. Scott Andreas updated CASSANDRA-13491:
    Component/s: Metrics

> Emit metrics for JVM safepoint pause
> ------------------------------------
>                 Key: CASSANDRA-13491
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13491
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Metrics
>            Reporter: Simon Zhou
>            Priority: Major
> GC pauses are not the only source of JVM-induced latency. In one of our recent 
> production issues, the GC metrics looked fine (some pauses over 200 ms, the 
> longest 500 ms), but the GC logs showed periodic pauses like this:
> {code}
> 2017-04-26T01:51:29.420+0000: 352535.998: Total time for which application 
> threads were stopped: 19.8835870 seconds, Stopping threads took: 19.7842073 
> seconds
> {code}
> A delay this large is presumably a JVM malfunction, but it caused some 
> requests to time out. I therefore suggest emitting metrics for safepoint 
> pauses to improve observability. There are two problems, though:
> 1. This depends on the JVM. Some JVMs may not expose these internal MBeans. 
> The existing GCInspector has the same limitation.
> 2. HotSpot provides HotspotRuntime as an internal MBean from which we can 
> read safepoint pauses. However, it does not support notifications. Trying to 
> register a listener fails with the error "MBean 
> sun.management:type=HotspotRuntime does not implement 
> javax.management.NotificationBroadcaster". This means we would need to poll 
> HotspotRuntime for safepoint pause data periodically.
> Reference:
> http://blog.ragozin.info/2012/10/safepoints-in-hotspot-jvm.html
> Does anyone think we should support this?
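
The periodic polling described in point 2 might look roughly like the sketch below. It is not the implementation proposed in the ticket. The class `SafepointPoller` and its nested `Sample` type are hypothetical names, and the attribute names `SafepointCount`, `TotalSafepointTime`, and `SafepointSyncTime` are assumed from HotSpot's internal `HotspotRuntimeMBean`. That MBean is JVM-specific and may not be registered at all, so every read is treated as optional.

```java
import java.util.Optional;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;

/** Sketch: poll HotSpot's cumulative safepoint counters and report deltas. */
final class SafepointPoller {
    // MBean name taken from the error message quoted in the ticket.
    static final String HOTSPOT_RUNTIME = "sun.management:type=HotspotRuntime";

    /** Cumulative counters read at one instant (times assumed to be in ms). */
    static final class Sample {
        final long count;
        final long totalMillis;
        final long syncMillis;

        Sample(long count, long totalMillis, long syncMillis) {
            this.count = count;
            this.totalMillis = totalMillis;
            this.syncMillis = syncMillis;
        }

        /** Growth of each counter since an earlier sample. */
        Sample since(Sample earlier) {
            return new Sample(count - earlier.count,
                              totalMillis - earlier.totalMillis,
                              syncMillis - earlier.syncMillis);
        }
    }

    private final MBeanServer server;
    private final ObjectName name;
    private Sample last; // previous reading, null until the first successful poll

    SafepointPoller(MBeanServer server) {
        this.server = server;
        try {
            this.name = new ObjectName(HOTSPOT_RUNTIME);
        } catch (JMException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Reads the current counters, or empty if the MBean is unavailable. */
    Optional<Sample> read() {
        try {
            // Attribute names are assumptions; they are internal to HotSpot.
            long count = ((Number) server.getAttribute(name, "SafepointCount")).longValue();
            long total = ((Number) server.getAttribute(name, "TotalSafepointTime")).longValue();
            long sync = ((Number) server.getAttribute(name, "SafepointSyncTime")).longValue();
            return Optional.of(new Sample(count, total, sync));
        } catch (JMException | RuntimeException e) {
            return Optional.empty(); // non-HotSpot JVM, or MBean not registered
        }
    }

    /** Returns the change since the previous successful poll, if any. */
    synchronized Optional<Sample> pollDelta() {
        Optional<Sample> current = read();
        if (!current.isPresent()) {
            return Optional.empty();
        }
        Sample previous = last;
        last = current.get();
        return previous == null ? Optional.empty() : Optional.of(last.since(previous));
    }
}
```

A caller could construct the poller with `ManagementFactory.getPlatformMBeanServer()` and invoke `pollDelta()` from a `ScheduledExecutorService` at a fixed interval, publishing the delta's `totalMillis` and `syncMillis` as metrics. Because only cumulative counters are assumed to be available, this yields time spent in safepoints per interval rather than the duration of each individual pause.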
