Re: [DISCUSS] Polaris correlation IDs and telemetry

Dmitri Bourlatchkov Thu, 23 Oct 2025 10:47:59 -0700

Hi Michael,

Thanks for the info!


Working with Envoy's tracing headers makes sense to me. However, I wonder:
why would Polaris need to generate a new request ID inside its code and
return it in response headers?

How important is it to propagate this ID to Polaris Events?

I'm just trying to understand the full context :)

Thanks,
Dmitri.

On Thu, Oct 23, 2025 at 1:29 PM Michael Collado wrote:

> I think the original intention for this requestId field was to support
> request propagation from load balancers, like Envoy
> (https://www.envoyproxy.io/docs/envoy/latest/intro/arch_overview/observability/tracing),
> which is distinct from OTel. Don't ask me why the default
> is Polaris-Request-Id - I think it was originally a custom thing, but then
> we changed to integrate with existing conventions. Unfortunately, looking
> through the code, I think that the actual functional plumbing has been lost
> in the course of multiple refactors around the call context and resolver. I
> don't see references to that property or the underlying header.
>
> Support for the unofficial x-request-id header feels like something we
> should definitely keep, especially when Polaris is one service in a mesh of
> services that maybe don't have OTel integration. I'm a fan of the OTel
> standard, but it's not entirely ubiquitous and there are many middleware
> layers that know how to forward on the x-request-id header.
>
> Mike
>
> On Thu, Oct 23, 2025 at 3:00 AM Robert Stupp wrote:
>
> > Yes, we should aim for interoperability with the existing de-facto
> > standard OTel and make it easy for users to integrate into their
> > observability platforms.
> >
> > On Wed, Oct 22, 2025 at 7:05 PM Dmitri Bourlatchkov wrote:
> > >
> > > Hi Alex and All,
> > >
> > > I certainly support the idea of following OTel standards for achieving
> > > "correlation" wrt Polaris requests and/or events.
> > >
> > > As to what form the correlation data should take, I believe it is
> > > conceptually what the OTel "context" represents. So, I believe it makes
> > > sense for Polaris to support standard context propagators at the API
> > layer.
> > >
> > > If the incoming request has OTel context information, then returning any
> > > other "correlation" data in the response is redundant, I think.
> > >
> > > If the incoming request does not have OTel context info, what is the
> > > purpose of generating a Polaris-specific "correlation ID"? How is it
> > > envisioned to be used?
> > >
> > > If the intention is to correlate a Polaris response (operation) with
> > > events that might have resulted from its execution, I believe a more
> > > robust approach would be to propagate the OTel trace info (starting a
> > > new trace if necessary) into event data. Then, Polaris can also return
> > > the trace context in the API response (top span). It's a bit awkward
> > > from the OTel perspective, but might be an option for supporting custom
> > > correlators. It could be covered by a feature flag. The header name
> > > could be "polaris-traceparent" for W3C Trace Context.
> > >
> > > Custom correlation code will be able to extract the trace ID from the
> > > response and from events and find related data. Granted, it will
> > > require a bit more effort for the custom code to decode trace IDs from
> > > the OTel context, but the format is standard and not complex. The
> > > benefit for Polaris, though, is that it can easily integrate with
> > > OTel-compatible observability platforms regardless of whether any
> > > particular deployment uses custom correlators or not.
> > >
> > > WDYT?
> > >
> > > Thanks,
> > > Dmitri.
> > >
> > > On Wed, Oct 22, 2025 at 6:03 AM Alexandre Dutra wrote:
> > >
> > > > Hi all,
> > > >
> > > > Today, Polaris has the notion of "request ID", but its purpose is not
> > > > entirely clear. It seems to serve as an observability feature to
> > > > facilitate correlation. A pending PR aims to rename it to
> > > > "correlation ID" for better alignment with industry standards [1].
> > > >
> > > > However, this PR has brought to light overlaps with core telemetry
> > > > features: when OpenTelemetry (OTel) is enabled in Polaris, each
> > > > request already has a trace ID and span ID, making a separate
> > > > correlation ID redundant.
> > > >
> > > > Moreover, using the OTel trace ID and span ID in Polaris events,
> > > > rather than the generated correlation ID, would significantly
> > > > simplify correlation of events with other traces.
> > > >
> > > > Therefore, I propose the following changes:
> > > >
> > > > 1. If OTel is enabled, use the trace ID and span ID as the
> > > > correlation ID for the request, instead of generating a random
> > > > correlation ID.
> > > > 2. Otherwise, if a (Polaris-specific) correlation ID header is
> > > > present in the request, use it.
> > > > 3. If neither of the above conditions is met, generate a random
> > > > correlation ID.
> > > >
> > > > I am somewhat undecided on the best approach when a correlation ID
> > > > header is present in the request. However, I believe it would be more
> > > > sensible to disregard it if OTel is enabled, as OTel offers a more
> > > > robust solution for client-to-server trace propagation, e.g. W3C
> > > > Trace Context propagation [2].
> > > >
> > > > Please share your thoughts!
> > > >
> > > > Thanks,
> > > > Alex
> > > >
> > > > [1]: https://github.com/apache/polaris/pull/2757
> > > > [2]: https://www.w3.org/TR/trace-context
> > > >
> >
>
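
For reference, the three-step resolution order proposed in Alex's original
message could be sketched roughly as follows. This is only an illustration,
not Polaris code: the function name, the lowercase header constant (based on
the Polaris-Request-Id default mentioned in the thread), and the choice to
join trace ID and span ID with a dash are all assumptions.

```python
import uuid
from typing import Mapping, Optional

# Assumed header name, based on the "Polaris-Request-Id" default from the thread.
CORRELATION_HEADER = "polaris-request-id"


def resolve_correlation_id(
    headers: Mapping[str, str],
    otel_enabled: bool,
    trace_id: Optional[str] = None,
    span_id: Optional[str] = None,
) -> str:
    """Pick a correlation ID using the three-step order from the proposal."""
    # Step 1: with OTel enabled and an active trace, derive the ID from it.
    # Joining the two IDs with "-" is an assumption made for this sketch.
    if otel_enabled and trace_id and span_id:
        return f"{trace_id}-{span_id}"
    # Step 2: otherwise honor a client-supplied header (case-insensitive lookup).
    lowered = {name.lower(): value for name, value in headers.items()}
    supplied = lowered.get(CORRELATION_HEADER)
    if supplied:
        return supplied
    # Step 3: fall back to a random ID.
    return uuid.uuid4().hex
```

Note that in this ordering a client-supplied header is ignored whenever OTel
supplies a trace, which is exactly the point Alex says he is undecided about.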

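On the point in the thread that custom correlators would need to decode trace
IDs from the W3C Trace Context format: a minimal sketch of such a decoder is
below. It assumes the value arrives in a header such as the
"polaris-traceparent" name floated above (a suggestion, not an existing
Polaris header), and it accepts only version 00 of the format. The names
TraceParent and parse_traceparent are made up for this example.

```python
import re
from typing import NamedTuple, Optional

# W3C Trace Context, version 00: version-traceid-parentid-flags in lowercase hex.
_TRACEPARENT = re.compile(r"([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


class TraceParent(NamedTuple):
    version: str
    trace_id: str
    parent_id: str
    sampled: bool


def parse_traceparent(value: str) -> Optional[TraceParent]:
    """Return the parsed fields, or None if the value is not a valid version-00 header."""
    match = _TRACEPARENT.fullmatch(value.strip())
    if match is None:
        return None
    version, trace_id, parent_id, flags = match.groups()
    # This sketch handles version 00 only; all-zero IDs are invalid per the spec.
    if version != "00" or set(trace_id) == {"0"} or set(parent_id) == {"0"}:
        return None
    # The lowest bit of the flags byte is the "sampled" flag.
    return TraceParent(version, trace_id, parent_id, bool(int(flags, 16) & 0x01))
```

A correlator could call parse_traceparent on both the response header and the
value stored in an event, then match the two on trace_id.
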