I've got a few thoughts... On the performance side, I took a look at a few CPU profiles from past benchmarks and I'm seeing DropWizard taking ~3% of CPU time. Is there a specific workload you're running where you're seeing it take up a significant share of CPU time? Could you share some metrics, profile data, or a workload so I can try to reproduce your findings? In my testing I've found that the majority of the overhead from metrics comes from JMX, not DropWizard.
On the operator side, inventing our own metrics lib risks making it harder to instrument Cassandra. There are libraries out there that allow you to tap into DropWizard metrics directly. For example, Sarma Pydipally did a presentation on this last year [1] based on some code I threw together.

If you're planning on making it easier to instrument C* by supporting sending metrics to the OTel collector [2], then I could see the change being a net win as long as the perf is no worse than the status quo.

It's hard to know the full extent of what you're planning and the impact, so I'll save any opinions till I know more about the plan.

Thanks for bringing this up!
Jon

[1] https://planetcassandra.org/leaf/apache-cassandra-lunch-62-grafana-dashboard-for-apache-cassandra-business-platform-team/
[2] https://opentelemetry.io/docs/collector/

On Tue, Mar 4, 2025 at 12:40 PM Dmitry Konstantinov <netud...@gmail.com> wrote:

> Hi all,
>
> After a long conversation with Benedict and Maxim in CASSANDRA-20250
> <https://issues.apache.org/jira/browse/CASSANDRA-20250> I would like to
> raise and discuss a proposal to deprecate Dropwizard/Codahale metrics usage
> in the next major release of Cassandra server and drop it in the following
> major release.
> Instead of it our own Java API and implementation should be introduced.
> For the next major release Dropwizard/Codahale API is still planned to
> support by extending Codahale implementations, to give potential users of
> this API enough time for transition.
> The proposal does not affect JMX API for metrics, it is only about local
> Java API changes within Cassandra server classpath, so it is about the
> cases when somebody outside of Cassandra server code relies on Codahale API
> in some kind of extensions or agents.
>
> Reasons:
> 1) Codahale metrics implementation is not very efficient from CPU and
> memory usage point of view.
> In the past we already replaced default
> Codahale implementations for Reservoir with our custom one and now in
> CASSANDRA-20250 <https://issues.apache.org/jira/browse/CASSANDRA-20250> we
> (Benedict and I) want to add a more efficient implementation for Counter
> and Meter logic. So, in total we do not have so much logic left from the
> original library (mostly a MetricRegistry as container for metrics) and the
> majority of logic is implemented by ourselves.
> We use metrics a lot along the read and write paths and they contribute a
> visible overhead (for example for plain write load it is about 9-11%
> according to async profiler CPU profile), so we want them to be highly
> optimized.
> From memory perspective Counter and Meter are built based on LongAdder and
> they are quite heavy for the amounts which we create and use.
>
> 2) Codahale metrics does not provide any way to replace Counter and Meter
> implementations. There are no full functional interfaces for these
> entities + MetricRegistry has casts/checks to implementations and cannot
> work with anything else.
> I looked through the already reported issues and found the following
> similar and unsuccessful attempt to introduce interfaces for metrics:
> https://github.com/dropwizard/metrics/issues/2186
> as well as other older attempts:
> https://github.com/dropwizard/metrics/issues/252
> https://github.com/dropwizard/metrics/issues/264
> https://github.com/dropwizard/metrics/issues/703
> https://github.com/dropwizard/metrics/pull/487
> https://github.com/dropwizard/metrics/issues/479
> https://github.com/dropwizard/metrics/issues/253
>
> So, the option to request an extensibility from Codahale metrics does not
> look real..
>
> 3) It looks like the library is in maintenance mode now, 5.x version is on
> hold and many integrations are also not so alive.
> The main benefit to use Codahale metrics should be a huge amount of
> reporters/integrations but if we check carefully the list of reporters
> mentioned here:
> https://metrics.dropwizard.io/4.2.0/manual/third-party.html#reporters
> we can see that almost all of them are dead/archived.
>
> 4) In general, exposing other 3rd party libraries as our own public API
> frequently creates too many limitations and issues (Guava is another
> typical example which I saw previously, it is easy to start but later you
> struggle more and more).
>
> Does anyone have any questions or concerns regarding this suggestion?
> --
> Dmitry Konstantinov
>
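
To make points 1 and 2 of the quoted proposal a bit more concrete, here is a minimal sketch of what a replaceable counter abstraction could look like. Everything in it is an assumption for illustration: `SimpleCounter`, `AdderCounter`, `AtomicCounter`, and `hammer` are hypothetical names, not Cassandra or Dropwizard types, and this is not the implementation from CASSANDRA-20250. It only shows that once callers depend on an interface, a LongAdder-backed counter (the approach the proposal attributes to Codahale) can be swapped for a lighter one without touching call sites.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.LongAdder;

public class CounterSketch {
    /** Hypothetical minimal counter contract; callers depend only on this. */
    interface SimpleCounter {
        void inc(long n);
        long count();
    }

    /** LongAdder-backed: scales under contention, but may allocate striped cells per counter. */
    static final class AdderCounter implements SimpleCounter {
        private final LongAdder adder = new LongAdder();
        public void inc(long n) { adder.add(n); }
        public long count() { return adder.sum(); }
    }

    /** AtomicLong-backed: a single value per counter, but all writers contend on it. */
    static final class AtomicCounter implements SimpleCounter {
        private final AtomicLong value = new AtomicLong();
        public void inc(long n) { value.addAndGet(n); }
        public long count() { return value.get(); }
    }

    /** Increments the counter from several threads, waits for them, and returns the final count. */
    static long hammer(SimpleCounter counter, int threads, int incrementsPerThread) {
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < incrementsPerThread; j++) {
                    counter.inc(1);
                }
            });
            workers[i].start();
        }
        try {
            for (Thread t : workers) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return counter.count();
    }

    public static void main(String[] args) {
        // Both implementations are used through the same interface.
        System.out.println(hammer(new AdderCounter(), 4, 100_000));
        System.out.println(hammer(new AtomicCounter(), 4, 100_000));
    }
}
```

Which implementation is actually cheaper on the read and write paths is exactly the kind of thing a profile or benchmark would have to show; the sketch says nothing about that, only about the API shape.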