+1 on not requiring distributed tracing; I don't see obvious integration points for it either.
Dropwizard metrics should suffice with respect to the functional requirements; after all,
it works for Spark [1], right? As for your ask about choosing an
established project with a reasonable dependency set, I think Dropwizard
is the only option; there's no runner-up afaik.
While we can rely on metrics that are specific to a particular Iceberg
implementation (e.g. Hadoop), there are still some metrics I'd consider
more than nice-to-have, tbh, such as histograms of table operation
latencies, since, for example, a single Iceberg file append commit may
consist of up to a dozen underlying Hadoop filesystem operations.
You have experience running Iceberg in production, so I was looking
for advice: what are, say, the top three metrics you'd strongly
recommend having in place before running Iceberg in production?
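To illustrate what I mean by a latency histogram for table operations, here's a minimal stdlib-only Java sketch. `LatencyHistogram` and its bucket bounds are hypothetical, not an Iceberg or Dropwizard API; in practice Dropwizard's `Timer` (which keeps a latency histogram plus rates) would play this role.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLongArray;
import java.util.function.Supplier;

/**
 * Hypothetical fixed-bucket latency histogram, a stand-in for a Dropwizard Timer.
 * Bucket i counts latencies <= BOUNDS_MS[i]; the last bucket catches everything slower.
 */
final class LatencyHistogram {
  private static final long[] BOUNDS_MS = {1, 5, 10, 50, 100, 500, 1_000, 5_000};
  private final AtomicLongArray counts = new AtomicLongArray(BOUNDS_MS.length + 1);

  /** Records one observed latency, given in nanoseconds. */
  void record(long elapsedNanos) {
    long ms = TimeUnit.NANOSECONDS.toMillis(elapsedNanos);
    int i = 0;
    while (i < BOUNDS_MS.length && ms > BOUNDS_MS[i]) {
      i++;
    }
    counts.incrementAndGet(i);
  }

  /** Times an operation (e.g. a table commit) and records its latency, even if it throws. */
  <T> T time(Supplier<T> op) {
    long start = System.nanoTime();
    try {
      return op.get();
    } finally {
      record(System.nanoTime() - start);
    }
  }

  /** Total number of recorded observations across all buckets. */
  long count() {
    long total = 0;
    for (int i = 0; i < counts.length(); i++) {
      total += counts.get(i);
    }
    return total;
  }

  long bucketCount(int bucket) {
    return counts.get(bucket);
  }
}
```

Keeping one histogram per table operation (e.g. commit) and one per underlying filesystem call would let us tell a slow commit apart from the dozen or so filesystem calls it issues.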

[1] https://spark.apache.org/docs/latest/monitoring.html

On Thu, Feb 21, 2019 at 11:26 PM Ryan Blue <rb...@netflix.com> wrote:

> Sounds like one of the first decision points is whether to use a framework
> with distributed tracing or not. I think I would opt for not requiring
> distributed tracing.
>
> Most of Iceberg is a self-contained library, so there are few points at
> which distributed tracing would make sense. Is there much value in tracing
> the metadata swap that happens in a metastore? I'm not sure there is. I
> think it would probably be sufficient to use a simpler metrics library.
>
> I've used Dropwizard before, and I thought it was trying to be the SLF4J of
> metrics. Is that still the case? I'd prefer to go with an established
> project that is likely to have broad support, and one that has a reasonable
> dependency set.
>
> On Mon, Feb 18, 2019 at 2:33 PM filip <filip....@gmail.com> wrote:
>
>> Both these solutions provide support for collecting metrics and
>> distributed tracing independent of the platform of choice. They seem to be
>> overlapping quite a lot though.
>>
>> OpenCensus [1] provides bindings for Go, Java, C++ and more [2], and it
>> also seems to support backends out of the box as well as custom ones [3].
>> Looking over the troubleshooting section [4], I could see reasonable value in
>> collecting performance metrics for measures around operation retries,
>> latencies, error rates, etc., though I guess distributed tracing is their
>> main selling point. The documentation advertises a low footprint too.
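For the retries/latencies/error rates idea, here's a minimal sketch of the counters such an integration might keep around a retried operation like a metadata commit. `RetryMetrics` and its fields are hypothetical names, and this is plain Java rather than the OpenCensus API.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Supplier;

/** Hypothetical counters kept around a retried operation (e.g. a metadata commit). */
final class RetryMetrics {
  final AtomicLong attempts = new AtomicLong();  // every invocation of the operation
  final AtomicLong retries = new AtomicLong();   // attempts after the first one
  final AtomicLong successes = new AtomicLong(); // operations that eventually succeeded
  final AtomicLong failures = new AtomicLong();  // operations that exhausted all attempts

  /** Runs op up to maxAttempts times (no backoff), updating the counters along the way. */
  <T> T callWithRetries(Supplier<T> op, int maxAttempts) {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be >= 1");
    }
    RuntimeException last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      attempts.incrementAndGet();
      if (attempt > 1) {
        retries.incrementAndGet();
      }
      try {
        T result = op.get();
        successes.incrementAndGet();
        return result;
      } catch (RuntimeException e) {
        last = e; // e.g. a commit conflict; try again
      }
    }
    failures.incrementAndGet();
    throw last; // non-null here because maxAttempts >= 1
  }

  /** Fraction of attempts that failed, or 0.0 before anything has run. */
  double attemptErrorRate() {
    long total = attempts.get();
    return total == 0 ? 0.0 : (double) (total - successes.get()) / total;
  }
}
```

A real commit path would add backoff between attempts; the point here is only which numbers are worth exporting.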
>>
>> OpenTracing focuses on providing a standard for distributed tracing at
>> both the service and application level. No backend is provided out of the
>> box afaik, but it seems to be covered quite extensively by existing
>> backends such as Zipkin, CNCF Jaeger and more [5]. Their specification
>> documentation [6] is very comprehensive.
>>
>> Oh, and there is also OpenMetrics [7], which aims to standardize how
>> we expose metrics. I am learning a lot of interesting things from
>> their issues page [8].
>>
>> Then there is the good old codahale/dropwizard metrics library [9] that
>> we could leverage just as well to expose internal metrics from the library,
>> though with no potential distributed tracing support.
>> I don't think DW metrics supports tags either; reading [10], it seems
>> they consider adding them a breaking change, and the team is looking
>> to add tag support in version 5.0.
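Until tags land, a common workaround is to encode them into the metric name. A minimal sketch, assuming a hypothetical `TaggedNames` helper and a `base[key=value,...]` naming scheme chosen just for illustration (Dropwizard's own `MetricRegistry.name(...)` only joins name parts with dots):

```java
import java.util.Map;
import java.util.StringJoiner;
import java.util.TreeMap;

/** Hypothetical helper that flattens tags into a plain metric name string. */
final class TaggedNames {
  private TaggedNames() {}

  /** Returns base unchanged when there are no tags, else base[k1=v1,k2=v2] with keys sorted. */
  static String name(String base, Map<String, String> tags) {
    if (tags.isEmpty()) {
      return base;
    }
    StringJoiner joiner = new StringJoiner(",", base + "[", "]");
    // Sorting by key means the same tag set always yields the same metric name.
    for (Map.Entry<String, String> e : new TreeMap<>(tags).entrySet()) {
      joiner.add(e.getKey() + "=" + e.getValue());
    }
    return joiner.toString();
  }
}
```

The sketch doesn't escape keys or values containing `,`, `=` or `]`, and every reporter would have to parse the tags back out of the name, which is exactly the friction native tag support would remove.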
>>
>> I am thinking that distributed tracing might prove very useful for
>> troubleshooting operations that require atomic guarantees.
>> I am also hoping that, should any backend we use for implementing
>> Iceberg already use either OpenCensus or OpenTracing, we might get
>> distributed tracing support; it'd be really interesting to see traces
>> spanning process boundaries.
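Spanning process boundaries boils down to context propagation: the caller injects its trace and span ids into the outgoing request and the callee extracts them (OpenTracing exposes this as `Tracer.inject`/`Tracer.extract`). A minimal stdlib sketch, assuming the W3C Trace Context `traceparent` header format (`00-<trace-id>-<parent-id>-<flags>`); `TraceContext` is a hypothetical class, not the OpenTracing or OpenCensus API:

```java
import java.security.SecureRandom;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical trace context carrying W3C Trace Context ids as lowercase hex strings. */
final class TraceContext {
  private static final SecureRandom RANDOM = new SecureRandom();
  private static final Pattern TRACEPARENT =
      Pattern.compile("00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})");

  final String traceId; // shared by every span in the trace, across processes
  final String spanId;  // identifies this particular span

  TraceContext(String traceId, String spanId) {
    this.traceId = traceId;
    this.spanId = spanId;
  }

  static TraceContext newTrace() {
    return new TraceContext(randomHex(16), randomHex(8));
  }

  /** A child span in the same trace, e.g. the metastore call made during a commit. */
  TraceContext child() {
    return new TraceContext(traceId, randomHex(8));
  }

  /** Caller side: writes the context into an outgoing carrier such as request headers. */
  void inject(Map<String, String> headers) {
    headers.put("traceparent", "00-" + traceId + "-" + spanId + "-01"); // 01 = sampled
  }

  /** Callee side: reads the caller's context, if present and well-formed. */
  static Optional<TraceContext> extract(Map<String, String> headers) {
    String value = headers.get("traceparent");
    if (value == null) {
      return Optional.empty();
    }
    Matcher m = TRACEPARENT.matcher(value);
    return m.matches() ? Optional.of(new TraceContext(m.group(1), m.group(2))) : Optional.empty();
  }

  private static String randomHex(int bytes) {
    byte[] buf = new byte[bytes];
    RANDOM.nextBytes(buf);
    StringBuilder sb = new StringBuilder(bytes * 2);
    for (byte b : buf) {
      sb.append(String.format("%02x", b));
    }
    return sb.toString();
  }
}
```

Whichever library we'd pick, the backend on the other side of the call has to understand the same propagation format for the spans to join up.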
>>
>> I am saying "hoping" and "thinking" a lot because I haven't used
>> either one in a real-world implementation, but I thought I might get folks
>> interested in the topic and that something good might come out of this.
>>
>> [1] https://opencensus.io/introduction/
>> https://opensource.google.com/projects/opencensus
>> [2] https://opencensus.io/language-support/
>> [3] https://opencensus.io/introduction/#backend-support
>> [4] https://opencensus.io/advanced-concepts/troubleshooting/
>> [5] https://opentracing.io/docs/supported-tracers/
>> [6] https://opentracing.io/specification/
>> [7] https://openmetrics.io/
>> [8] https://github.com/OpenObservability/OpenMetrics/issues
>> [9] https://metrics.dropwizard.io/4.0.0/
>> [10] https://github.com/dropwizard/metrics/issues/1175
>>
>>
>> On Mon, Feb 18, 2019 at 11:03 PM Ryan Blue <rb...@netflix.com.invalid>
>> wrote:
>>
>>> I don't know. Can you elaborate on what opencensus and opentracing are?
>>>
>>> On Mon, Feb 18, 2019 at 12:51 PM filip <filip....@gmail.com> wrote:
>>>
>>>>
>>>> /Filip
>>>>
>>>
>>>
>>> --
>>> Ryan Blue
>>> Software Engineer
>>> Netflix
>>>
>>
>>
>> --
>> Filip Bocse
>>
>
>
> --
> Ryan Blue
> Software Engineer
> Netflix
>


-- 
Filip Bocse
