Sooooo. That's a +1 from you, Jon? Just want to make sure.

On Thu, Oct 3, 2024 at 7:17 AM Jon Haddad <j...@rustyrazorblade.com> wrote:

> I love that we're having a discussion about observability. A HUGE thank
> you to anyone willing to invest time improving it in Cassandra.
>
> I'd really, really like to see us ship a Prom compatible metrics endpoint
> out of the box in C* that has low overhead. All the current OSS metrics
> exporters that I've seen have massive overhead. I'm specifically looking
> for sub-10s collection on clusters with a thousand nodes and 500+ tables.
> That means going directly to DropWizard and skipping JMX.
>
> I put together a POC of it a while ago here:
> https://github.com/rustyrazorblade/cassandra-prometheus-exporter. Please
> use commit 434be099d5983d537e2c70aad745194e575bc49a as a reference. I
> wasn't expecting anyone to actually care about the repo and the last commit
> broke it. There's some optimizations that could be done to further improve
> the exporter, I was working on that when I broke the repo :/
>
> For industry comparison the following DBs either ship entire monitoring
> stacks or provide strong recommendations / solutions:
>
> * ScyllaDB: https://www.scylladb.com/product/scylladb-monitoring-stack/
> * Cockroach: https://www.cockroachlabs.com/docs/v24.2/ui-overview-dashboard
> * Aerospike: https://aerospike.com/docs/monitorstack/new/components-of-monitoring-stack
> * MongoDB: https://www.mongodb.com/products/platform/atlas-charts/dashboard
> * Elastic: https://www.elastic.co/guide/en/elasticsearch/reference/8.15/monitoring-production.html
> * Redis: https://grafana.com/grafana/dashboards/12776-redis/
>
> Re: Logs - I wouldn't write off OTel logging [1]. OTel logs can be tagged
> with metadata including the span allowing you to do some really useful
> diagnostics. It's a significant improvement over standard logging.
>
> Anyways - I don't have a strong opinion on how the CEPs are done.
> Different ones or together, whichever works. I hope we can finally get a
> good metrics solution because that's an area of significant pain for end
> users.
> A lot of teams don't even have Cassandra dashboards because we
> currently provide zero direction.
>
> Jon
>
> [1] https://opentelemetry.io/docs/specs/otel/logs/
>
> Logs can be correlated with the rest of observability data in a few
> dimensions:
>
> * By the time of execution. Logs, traces and metrics can record the moment
> of time or the range of time the execution took place. This is the most
> basic form of correlation.
>
> * By the execution context, also known as the trace context. It is a
> standard practice to record the execution context (trace and span ids as
> well as user-defined context) in the spans. OpenTelemetry extends this
> practice to logs where possible by including TraceId and SpanId in the
> LogRecords. This allows to directly correlate logs and traces that
> correspond to the same execution context. It also allows to correlate logs
> from different components of a distributed system that participated in the
> particular request execution.
>
> * By the origin of the telemetry, also known as the Resource context.
> OpenTelemetry traces and metrics contain information about the Resource
> they come from. We extend this practice to logs by including the Resource
> in LogRecords.
>
> On Thu, Oct 3, 2024 at 6:11 AM João Reis <joaor...@apache.org> wrote:
>
>> Reducing the scope of CEP-32 to OpenTelemetry Tracing is a good idea (or
>> creating a new one). We recently added OpenTelemetry Tracing support to the
>> C# driver [1] and we also decided to not include Metrics and Logs in this
>> initiative because the driver already provides a way to collect metrics and
>> logs so it's not as important.
>>
>> I believe there's also efforts to add OpenTelemetry support to the java
>> driver but I'm not sure if it's limited to Tracing or if they include
>> metrics and logs.
>>
>> [1] https://github.com/datastax/csharp-driver/tree/master/doc/features/opentelemetry#readme
>>
>> On Tue, Oct 1, 2024 at 7:13 AM Yuki Morishita <mor.y...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> Since I have limited time working on the CEP-32, I'd appreciate the
>>> collaboration to make this CEP the reality.
>>>
>>> Another thing I'm thinking of is to reduce its scope to only the
>>> OpenTelemetry configuration and the way to work only with OpenTelemetry
>>> Tracing.
>>>
>>> If it's possible to create sub CEPs, I will create the one for tracing,
>>> metrics and logs. Otherwise, I can rewrite the current CEP-32 to only focus
>>> on OpenTelemetry Tracing.
>>> Or maybe scrap CEP-32 and create a new one for Tracing.
>>>
>>> On Mon, Sep 23, 2024 at 11:47 AM Saranya Krishnakumar <saran.krishna...@gmail.com> wrote:
>>>
>>>> Hi Patrick,
>>>>
>>>> I am interested in working on this CEP collaborating with Yuki. I
>>>> recently worked on adding metrics framework in Apache Cassandra Sidecar
>>>> project.
>>>>
>>>> Best,
>>>> Saranya Krishnakumar
>>>>
>>>> On Thu, Sep 19, 2024 at 10:57 AM Patrick McFadin <pmcfa...@gmail.com> wrote:
>>>>
>>>>> Here's another stalled CEP. In this case, no discuss thread or Jira.
>>>>>
>>>>> Yuki (or anyone else) know the status of this CEP?
>>>>>
>>>>> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-32%3A+%28DRAFT%29+OpenTelemetry+integration
>>>>>
>>>>> Patrick
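
For reference, the endpoint Jon describes earlier in the thread (Prometheus compatible, reading the in-process metrics registry directly instead of going through JMX) might look roughly like the sketch below. It is written in Python with only the standard library for brevity, not in Cassandra's Java, and it is not taken from Jon's POC. Everything in it is an assumption made for illustration: the REGISTRY dict is a hypothetical stand-in for the DropWizard registry, the metric names and labels are made up, every metric is reported as a gauge, and port 9500 is arbitrary.

```python
import re
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical stand-in for the in-process metrics registry.
# Keys are (metric name, labels); values are the current readings.
# The names and labels below are illustrative, not real Cassandra metrics.
REGISTRY = {
    ("table.read_latency.count", (("keyspace", "ks1"), ("table", "users"))): 1523,
    ("table.read_latency.count", (("keyspace", "ks1"), ("table", "orders"))): 87,
    ("storage.load.bytes", ()): 1048576,
}


def sanitize_name(name):
    """Map a dotted metric name onto the Prometheus metric-name charset."""
    cleaned = re.sub(r"[^a-zA-Z0-9_:]", "_", name)
    # Prometheus names may not start with a digit.
    return cleaned if re.match(r"[a-zA-Z_:]", cleaned) else "_" + cleaned


def escape_label_value(value):
    """Escape backslash, double quote and newline as the text format requires."""
    return str(value).replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_prometheus(registry):
    """Render the registry in the Prometheus text exposition format."""
    lines = []
    seen = set()
    for (name, labels), value in sorted(registry.items()):
        prom_name = sanitize_name(name)
        if prom_name not in seen:
            # Simplification: every metric is declared as a gauge.
            lines.append(f"# TYPE {prom_name} gauge")
            seen.add(prom_name)
        label_text = ",".join(f'{k}="{escape_label_value(v)}"' for k, v in labels)
        suffix = "{" + label_text + "}" if label_text else ""
        lines.append(f"{prom_name}{suffix} {value}")
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    """Serve the rendered registry on GET /metrics."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_prometheus(REGISTRY).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=9500):
    """Start the endpoint; blocks until interrupted."""
    HTTPServer(("127.0.0.1", port), MetricsHandler).serve_forever()
```

Calling serve() and pointing a Prometheus scrape job at http://127.0.0.1:9500/metrics would exercise it. The point of the shape is that a scrape is a single pass over an in-memory structure, with no per-metric remote call, which is where the overhead argument against JMX-based exporters comes from.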