+1 on not requiring distributed tracing, since there are no obvious integration points. Dropwizard metrics should suffice wrt functional requirements; after all, it does work for Spark [1], right? Wrt your ask for an established project with a reasonable dependency set, I think Dropwizard is the only option, no runner-up afaik. While we can rely on metrics that are specific to a particular Iceberg implementation (e.g. Hadoop), there are still some interesting metrics I'd consider more than nice-to-have tbh, like histograms of table operation latencies, since for example an Iceberg file append commit operation may consist of up to a dozen effective Hadoop filesystem operations. You have experience running Iceberg in production, so I was looking for advice: what are, say, the top three metrics that you'd strongly consider before running Iceberg in production?
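To make the latency-histogram idea concrete, here is a rough sketch, not a proposal for the actual implementation. It uses only the JDK so it stands alone; in practice a Dropwizard `Timer` would take the place of the hypothetical `LatencyRecorder` class. The metric names (`table.append.commit`, `fs.write`, `fs.rename`) and the stubbed filesystem calls are made up for illustration and are not Iceberg or Hadoop APIs.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

/** Hypothetical stand-in for a metrics timer; not part of Iceberg or Dropwizard. */
final class LatencyRecorder {
  private final Map<String, List<Long>> samples = new ConcurrentHashMap<>();

  /** Runs the operation and records its wall-clock duration under the given name. */
  public <T> T time(String name, Supplier<T> operation) {
    long start = System.nanoTime();
    try {
      return operation.get();
    } finally {
      record(name, System.nanoTime() - start);
    }
  }

  /** Adds one latency sample, in nanoseconds. */
  public void record(String name, long nanos) {
    samples.computeIfAbsent(name, k -> Collections.synchronizedList(new ArrayList<>()))
        .add(nanos);
  }

  public long count(String name) {
    List<Long> values = samples.get(name);
    return values == null ? 0 : values.size();
  }

  /** Nearest-rank quantile over all recorded samples; q must be in (0, 1]. */
  public long quantile(String name, double q) {
    List<Long> values = samples.get(name);
    if (values == null || values.isEmpty()) {
      throw new IllegalStateException("no samples for " + name);
    }
    List<Long> sorted;
    synchronized (values) {
      sorted = new ArrayList<>(values);
    }
    Collections.sort(sorted);
    int rank = (int) Math.ceil(q * sorted.size());
    return sorted.get(Math.max(rank, 1) - 1);
  }

  public static void main(String[] args) {
    LatencyRecorder recorder = new LatencyRecorder();
    for (int i = 0; i < 20; i++) {
      // One table-level operation wraps several (stubbed) filesystem calls.
      recorder.time("table.append.commit", () -> {
        recorder.time("fs.write", () -> "manifest");
        recorder.time("fs.write", () -> "metadata");
        recorder.time("fs.rename", () -> Boolean.TRUE);
        return null;
      });
    }
    System.out.println("commits recorded: " + recorder.count("table.append.commit"));
    System.out.println("fs.write calls recorded: " + recorder.count("fs.write"));
    System.out.println("commit p50 (ns): " + recorder.quantile("table.append.commit", 0.5));
  }
}
```

The point of timing at the table-operation level is that the commit latency covers the whole sequence of filesystem calls, which per-call filesystem metrics alone would not show.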
[1] https://spark.apache.org/docs/latest/monitoring.html

On Thu, Feb 21, 2019 at 11:26 PM Ryan Blue <rb...@netflix.com> wrote:

> Sounds like one of the first decision points is whether to use a framework
> with distributed tracing or not. I think I would opt for not requiring
> distributed tracing.
>
> Most of Iceberg is a self-contained library, so there are few points at
> which distributed tracing would make sense. Is there much value in tracing
> the metadata swap that happens in a metastore? I'm not sure there is. I
> think it would probably be sufficient to use a simpler metrics library.
>
> I've used DropWizard before, which I thought was trying to be the SLF4J of
> metrics. Is that still the case? I'd prefer to go with an established
> project that is likely to have broad support. And one that has a reasonable
> dependency set.
>
> On Mon, Feb 18, 2019 at 2:33 PM filip <filip....@gmail.com> wrote:
>
>> Both these solutions provide support for collecting metrics and
>> distributed tracing independent of the platform of choice. They seem to be
>> overlapping quite a lot though.
>>
>> OpenCensus [1] provides bindings for Go, Java, C++ and more [2] and it
>> also seems to support OOB backends and custom ones as well [3]. Looking
>> over the troubleshooting section [4] I could see reasonable value in
>> collecting performance metrics for measures around operations retries,
>> latencies, error rates, etc. though I guess that the distributed tracing
>> is their main selling point. The documentation advertises low footprint
>> too.
>>
>> Opentracing is focusing on providing a standard for distributed tracing
>> for both service and application level. No backend provided OOB afaik but
>> it seems it's covered quite extensively by existing backends such as
>> Zipkin, CNCF Jaeger and more [5]. Their specification documentation [6] is
>> very comprehensive.
>>
>> Oh and there is the OpenMetrics [7] too which aims to standardize on how
>> we expose metrics. I am learning a lot of interesting things from
>> their issues page [8]
>>
>> Then there is the good old codahale/dropwizard metrics library [9] that
>> we could leverage just as well to expose internal metrics from the library,
>> no potential distributed tracing support though.
>> I don't think that DW metrics supports tags though, reading [10] it seems
>> they're looking at it as a breaking change and engineering team is looking
>> to add tags support in version 5.0.
>>
>> I am thinking that distributed tracing might prove very useful for
>> troubleshooting operations that require atomic guarantees.
>> I am thinking/ hoping that should any backend we'd use for implementing
>> Iceberg be using either opencensus or opentracing we might get support of
>> distributed tracing, it'd be really interesting
>> to see spanning across process boundaries.
>>
>> I am saying a lot of "hoping" and "thinking" because I haven't used
>> either one in a real-world implementation but I thought I might get folks
>> interested in the topic and something good comes out of this.
>>
>> [1] https://opencensus.io/introduction/
>> https://opensource.google.com/projects/opencensus
>> [2] https://opencensus.io/language-support/
>> [3] https://opencensus.io/introduction/#backend-support
>> [4] https://opencensus.io/advanced-concepts/troubleshooting/
>> [5] https://opentracing.io/docs/supported-tracers/
>> [6] https://opentracing.io/specification/
>> [7] https://openmetrics.io/
>> [8] https://github.com/OpenObservability/OpenMetrics/issues
>> [9] https://metrics.dropwizard.io/4.0.0/
>> [10] https://github.com/dropwizard/metrics/issues/1175
>>
>>
>> On Mon, Feb 18, 2019 at 11:03 PM Ryan Blue <rb...@netflix.com.invalid>
>> wrote:
>>
>>> I don't know. Can you elaborate on what opencensus and opentracing are?
>>>
>>> On Mon, Feb 18, 2019 at 12:51 PM filip <filip....@gmail.com> wrote:
>>>
>>>>
>>>> /Filip
>>>>
>>>
>>>
>>> --
>>> Ryan Blue
>>> Software Engineer
>>> Netflix
>>>
>>
>>
>> --
>> Filip Bocse
>>
>
>
> --
> Ryan Blue
> Software Engineer
> Netflix
>

--
Filip Bocse