C. Scott Andreas updated CASSANDRA-10245:
    Component/s: Observability

> Provide after the fact visibility into the reliability of the environment C* 
> operates in
> ----------------------------------------------------------------------------------------
>                 Key: CASSANDRA-10245
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10245
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Observability
>            Reporter: Ariel Weisberg
>            Priority: Major
>             Fix For: 4.x
> I think that by default databases should not be completely dependent on 
> operator-provided tools for monitoring node and network health.
> The database should be able to detect and report on several dimensions of 
> performance in its environment, and more specifically report on deviations 
> from acceptable performance.
> * Node-wide pauses
> * JVM-wide pauses
> * Latency and round-trip time to all endpoints
> * Block device IO latency
> If Java Flight Recorder (FR) were available for use in production, I would 
> say, as a start, just turn it on, add jHiccup (inside and outside the server 
> process), and add a daemon inside the server to measure network performance 
> between endpoints.
> FR is not available (it requires a license in production), so instead we 
> should focus on adding instrumentation for the facets of Flight Recorder that 
> are most useful in diagnosing performance issues. I think we can get pretty 
> far because what we need to do is not quite as undirected as the exploration 
> that FR and JMC facilitate.
> Until we dial in how we measure and how to signal without false positives, I 
> would expect this kind of logging to stay in the background for post-hoc 
> analysis.
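
For illustration only, the JVM-wide pause item above could be approximated in the spirit of jHiccup: a thread parks for a fixed interval and treats any overshoot past a threshold as a pause. This is a minimal sketch, not Cassandra code and not jHiccup's implementation; the class name PauseDetector, its methods, and the interval and threshold values are all hypothetical.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.LockSupport;

// Hypothetical sketch: all names here are assumptions, not an existing API.
final class PauseDetector implements Runnable {
    private final long intervalNanos;   // how long each probe asks to sleep
    private final long thresholdNanos;  // overshoot at or above this counts as a pause
    private final AtomicLong pauseCount = new AtomicLong();
    private final AtomicLong maxPauseNanos = new AtomicLong();
    private volatile boolean running = true;

    public PauseDetector(long intervalNanos, long thresholdNanos) {
        this.intervalNanos = intervalNanos;
        this.thresholdNanos = thresholdNanos;
    }

    // Records one probe; returns true if the overshoot counts as a pause.
    public boolean record(long elapsedNanos) {
        // parkNanos may return early, so clamp the overshoot at zero.
        long overshoot = Math.max(0L, elapsedNanos - intervalNanos);
        if (overshoot < thresholdNanos) {
            return false;
        }
        pauseCount.incrementAndGet();
        maxPauseNanos.accumulateAndGet(overshoot, Math::max);
        return true;
    }

    @Override
    public void run() {
        while (running) {
            long start = System.nanoTime();
            LockSupport.parkNanos(intervalNanos);
            long elapsed = System.nanoTime() - start;
            if (record(elapsed)) {
                // Background logging only, for post-hoc analysis.
                System.err.printf("pause: slept %d ms for a %d ms interval%n",
                        elapsed / 1_000_000, intervalNanos / 1_000_000);
            }
        }
    }

    public void stop() { running = false; }

    public long pauseCount() { return pauseCount.get(); }

    public long maxPauseMillis() { return maxPauseNanos.get() / 1_000_000; }
}
```

A detector like this would typically run on a daemon thread started with new Thread(detector, "pause-detector"). From inside the process it cannot tell a JVM-wide pause from a node-wide one, which is presumably why the description suggests running jHiccup both inside and outside the server process.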

