Hi Dmitri,

Yes, I think your suggestion to use just one field w3c_trace_context
makes more sense than two fields (span_id and trace_id).
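
To illustrate why one field should be enough, here is a minimal sketch
(not Polaris code) of how a consumer could recover the trace ID and span
ID from the stored value. It assumes the field holds a version-00 W3C
traceparent value; the class and method names are hypothetical.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of Polaris.
// Assumes w3c_trace_context stores a version-00 traceparent value:
//   version "-" trace-id "-" parent-id "-" trace-flags (lowercase hex).
final class TraceContextSketch {

  private static final Pattern TRACEPARENT =
      Pattern.compile("00-([0-9a-f]{32})-([0-9a-f]{16})-[0-9a-f]{2}");

  // The W3C spec treats all-zero trace and parent IDs as invalid.
  private static final String ZERO_TRACE_ID = "0".repeat(32);
  private static final String ZERO_SPAN_ID = "0".repeat(16);

  private TraceContextSketch() {}

  /** Returns the 32-hex-digit trace ID, or empty if the value is missing or invalid. */
  static Optional<String> traceId(String traceparent) {
    return match(traceparent).map(m -> m.group(1));
  }

  /** Returns the 16-hex-digit span (parent) ID, or empty if the value is missing or invalid. */
  static Optional<String> spanId(String traceparent) {
    return match(traceparent).map(m -> m.group(2));
  }

  private static Optional<Matcher> match(String traceparent) {
    if (traceparent == null) {
      return Optional.empty();
    }
    Matcher m = TRACEPARENT.matcher(traceparent.trim());
    if (!m.matches()) {
      return Optional.empty();
    }
    if (m.group(1).equals(ZERO_TRACE_ID) || m.group(2).equals(ZERO_SPAN_ID)) {
      return Optional.empty();
    }
    return Optional.of(m);
  }
}
```

For the example value from the W3C spec,
00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01, traceId(...)
yields 4bf92f3577b34da6a3ce929d0e0e4736 and spanId(...) yields
00f067aa0ba902b7; a malformed or all-zero value yields an empty Optional.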

That said, I think we are slowly drifting into implementation
considerations; let's get consensus on the general design first, and
we can certainly fine-tune the actual Java methods and SQL table
columns in a future PR. WDYT?

Thanks,
Alex

On Fri, Oct 24, 2025 at 5:14 PM Dmitri Bourlatchkov <[email protected]> wrote:
>
> Hi All,
>
> Many thanks for the background info, Adnan!
>
> +1 to action items proposed by Alex!
>
> Re: (3) Can we abstract request/trace info into a separate object without
> exposing those accessors on the Event class directly? OTel defines
> trace/span concepts, but a request ID is a bit foreign to OTel. Having
> tracing / request ID isolated in Java could help with maintaining it and
> potentially supporting other (custom) tracing methods.
>
> Re: (4) I'd like to propose storing OTel correlation data in the form of a
> standard context propagation string, e.g. W3C trace-context [1] (same value
> as its HTTP header), so the column could be called w3c_trace_context or
> simply trace_context.
>
> Open question: do we need to write a separate, individual trace ID field in
> SQL? I suppose it is not very useful since correlating it to other trace
> data already requires understanding OTel context propagation and a query
> against trace_context can still be made using string-matching clauses. We
> could probably (additionally) store it in the request_id column if the
> Polaris-specific request ID header is not set.
>
> As for span ID, I do not really see a use case for persisting it
> individually. It is very specific to OTel trace data construction.
>
> Actually, using the W3C trace context [1] encoding probably makes sense in
> the Java event representation too. Interested callers can easily decode
> this information since the format is well-defined. As a side benefit, this
> opens opportunities for downstream event consumers to connect (propagate
> context) to traces that produced events based on the event data itself,
> without relying on the intermediate frameworks. This may be desirable since
> the current Polaris event persistence impl. writes events in batches, so
> the association to individual requests that produced those events is lost.
> Whether to perform this kind of context propagation will be at the
> discretion of the event consumer, of course (outside of Polaris code).
>
> [1] https://www.w3.org/TR/trace-context/
>
> WDYT?
>
> Thanks,
> Dmitri.
>
> On Fri, Oct 24, 2025 at 7:18 AM Alexandre Dutra <[email protected]> wrote:
>
> > Hi all,
> >
> > Thank you for chiming in; the context around request IDs is now clear.
> >
> > Trying to summarize this thread into actionable items, here's what I
> > propose:
> >
> > 1. Restore the original functionality for request IDs.
> >     - Change the default header name back to x-request-id (despite the
> > x- prefix being deprecated), but keep it configurable as today.
> > 2. Remove RequestIdGenerator and related functionality.
> >     - Do not generate a request ID if the REST client doesn't provide one.
> > 3. Update PolarisEvent:
> >     - Expose new requestId(), traceId(), spanId() methods, all nullable.
> >     - This would align with the emerging consensus around including
> > contextual information in PolarisEvent [1].
> > 4. Update events table SQL schema:
> >     - Insert the client-provided request ID into the request_id
> > column; otherwise, insert null.
> >     - Add two new nullable columns, trace_id and span_id, and populate
> > them if OTel is enabled.
> >
> > From our discussions, I think it's important to not conflate OTel
> > tracing fields with Envoy tracing fields, which is why I suggest we
> > use separate fields / columns for them.
> >
> > Would the above plan work for everyone?
> >
> > Thanks,
> > Alex
> >
> > [1]: https://lists.apache.org/thread/rl5cpcft16sn5n00mfkmx9ldn3gsqtfy
> >
> >
> > On Fri, Oct 24, 2025 at 9:33 AM Adnan Hemani
> > <[email protected]> wrote:
> > >
> > > Hi all,
> > >
> > > Thanks to Alex for starting this thread - because of this, I’m just
> > > coming to the realization that OTel Trace and Span IDs are coming built-in
> > > with Quarkus and my previous work to generate a Request/Correlation ID is
> > > likely not needed as a result. My original motivation for generation of a
> > > Request/Correlation ID was to ensure that any client can uniquely identify
> > > a request made to Polaris, which would be especially useful for debugging
> > > failing requests or identifying call patterns.
> > >
> > > As a result, I’m a +1 on Michael’s opinion: we should remove the
> > > Request/Correlation ID generation and always use the OTel trace/span IDs
> > > (which come for free with Quarkus) instead for the Correlation ID unless a
> > > valid header is present, which would take over as the Correlation ID
> > > instead.
> > >
> > > —
> > >
> > > To answer Dmitri’s question re Polaris Events: The intended use case is
> > > to provide some sort of correlation between events that have occurred as
> > > part of the same request. For example, if a user makes a CommitTransaction
> > > request, it would be helpful to see all UpdateTable calls that were made as
> > > part of that one user request.
> > >
> > > Best,
> > > Adnan Hemani
> > >
> > > > On Oct 23, 2025, at 12:15 PM, Dmitri Bourlatchkov <[email protected]>
> > wrote:
> > > >
> > > > Hi Michael,
> > > >
> > > > Logging x-request-id headers makes sense.
> > > >
> > > > Just to confirm: if / when we restore that, Polaris will NOT generate new
> > > > IDs in case the header is not present in the request, correct?
> > > >
> > > > I believe x-request-id can co-exist with OTel.
> > > >
> > > > What about adding request IDs to events [1][2]? What's the intended use
> > > > case for that? Could you share some context here too?
> > > >
> > > > Side note: I proposed [2877] flagging event persistence as "beta" in
> > > > 1.2.0... This discussion adds another point towards that, I think.
> > > >
> > > > [1] https://github.com/apache/polaris/blob/2f0c7a43d446452004ea51196b618de9bdf0e25b/runtime/service/src/main/java/org/apache/polaris/service/events/listeners/inmemory/InMemoryBufferEventListener.java#L97
> > > > [2] https://github.com/apache/polaris/blob/19742cc20f4bc0b7e5a315a62f89c6085ad81b7d/runtime/service/src/main/java/org/apache/polaris/service/events/listeners/PolarisPersistenceEventListener.java#L66
> > > >
> > > > [2877] https://github.com/apache/polaris/pull/2877#discussion_r2456300613
> > > >
> > > > Thanks,
> > > > Dmitri.
> > > >
> > > > On Thu, Oct 23, 2025 at 2:23 PM Michael Collado <
> > [email protected]>
> > > > wrote:
> > > >
> > > >> Hey Dmitri
> > > >>
> > > >> Generating a request ID is new code that was added after the original
> > > >> x-request-id support. You can see the state from ~1 year ago, when we
> > > >> hard-coded request_id as the header we used for the MDC -
> > > >> https://github.com/apache/polaris/blob/a6197bd7d8cb5551253fa427e4373897205ecece/polaris-service/src/main/java/org/apache/polaris/service/PolarisApplication.java#L415-L416
> > > >> . At some point, it was changed to be configurable, then the
> > > >> ContextResolverFilter filter was refactored/eliminated and the
> > > >> RequestIdFilter took its responsibility, but lost some of its original
> > > >> functionality.
> > > >>
> > > >> My personal opinion is that restoring support for the x-request-id header
> > > >> is something that we should do, but if the header isn't present, falling
> > > >> back on simply using OTel trace IDs is good enough, and even better than
> > > >> generating another random request ID.
> > > >>
> > > >> Mike
> > > >>
> > > >> On Thu, Oct 23, 2025 at 10:47 AM Dmitri Bourlatchkov <
> > [email protected]>
> > > >> wrote:
> > > >>
> > > >>> Hi Michael,
> > > >>>
> > > >>> Thanks for the info!
> > > >>>
> > > >>> Working with Envoy's tracing headers makes sense to me. However, I wonder:
> > > >>> why would Polaris need to generate a new request ID inside its code,
> > > >>> and return it in response headers?
> > > >>>
> > > >>> How important is it to propagate this ID to Polaris Events?
> > > >>>
> > > >>> I'm just trying to understand the full context :)
> > > >>>
> > > >>> Thanks,
> > > >>> Dmitri.
> > > >>>
> > > >>> On Thu, Oct 23, 2025 at 1:29 PM Michael Collado <
> > [email protected]>
> > > >>> wrote:
> > > >>>
> > > >>>> I think the original intention for this requestId field was to support
> > > >>>> request propagation from load balancers, like Envoy (
> > > >>>> https://www.envoyproxy.io/docs/envoy/latest/intro/arch_overview/observability/tracing
> > > >>>> ), which is distinct from OTel. Don't ask me why the default
> > > >>>> is Polaris-Request-Id - I think it was originally a custom thing, but
> > > >>>> then we changed to integrate with existing conventions. Unfortunately,
> > > >>>> looking through the code, I think that the actual functional plumbing
> > > >>>> has been lost in the course of multiple refactors around the call
> > > >>>> context and resolver. I don't see references to that property or the
> > > >>>> underlying header.
> > > >>>>
> > > >>>> Support for the unofficial x-request-id header feels like something we
> > > >>>> should definitely keep, especially when Polaris is one service in a mesh
> > > >>>> of services that maybe don't have OTel integration. I'm a fan of the OTel
> > > >>>> standard, but it's not entirely ubiquitous and there are many middleware
> > > >>>> layers that know how to forward on the x-request-id header.
> > > >>>>
> > > >>>> Mike
> > > >>>>
> > > >>>> On Thu, Oct 23, 2025 at 3:00 AM Robert Stupp <[email protected]>
> > wrote:
> > > >>>>
> > > >>>>> Yes, we should aim for interoperability with the existing de-facto
> > > >>>>> standard OTel and make it easy for users to integrate into their
> > > >>>>> observability platforms.
> > > >>>>>
> > > >>>>> On Wed, Oct 22, 2025 at 7:05 PM Dmitri Bourlatchkov <
> > > >> [email protected]>
> > > >>>>> wrote:
> > > >>>>>>
> > > >>>>>> Hi Alex and All,
> > > >>>>>>
> > > >>>>>> I certainly support the idea of following OTel standards for achieving
> > > >>>>>> "correlation" wrt Polaris requests and/or events.
> > > >>>>>>
> > > >>>>>> As to what form the correlation data should take, I believe it is
> > > >>>>>> conceptually what the OTel "context" represents. So, I believe it makes
> > > >>>>>> sense for Polaris to support standard context propagators at the API
> > > >>>>>> layer.
> > > >>>>>>
> > > >>>>>> If the incoming request has OTel context information, then returning any
> > > >>>>>> other "correlation" data in the response is redundant, I think.
> > > >>>>>>
> > > >>>>>> If the incoming request does not have OTel context info, what is the
> > > >>>>>> purpose of generating a Polaris-specific "correlation ID"? How is it
> > > >>>>>> envisioned to be used?
> > > >>>>>>
> > > >>>>>> If the intention is to correlate a Polaris response (operation) with
> > > >>>>>> events that might have resulted from its execution, I believe a more
> > > >>>>>> robust approach would be to propagate the OTel trace info (starting a new
> > > >>>>>> trace if necessary) into event data. Then, Polaris can also return the
> > > >>>>>> trace context in the API response (top span). It's a bit awkward from the
> > > >>>>>> OTel perspective, but might be an option for supporting custom
> > > >>>>>> correlators. It could be covered by a feature flag. The header name could
> > > >>>>>> be "polaris-traceparent" for W3C Trace Context.
> > > >>>>>>
> > > >>>>>> Custom correlation code will be able to extract the trace ID from the
> > > >>>>>> response and from events and find related data. Granted, it will require
> > > >>>>>> a bit more effort for the custom code to decode trace IDs from the OTel
> > > >>>>>> context, but the format is standard and not complex. The benefit for
> > > >>>>>> Polaris, though, is that it can easily integrate with OTel-compatible
> > > >>>>>> observability platforms regardless of whether any particular deployment
> > > >>>>>> uses custom correlators or not.
> > > >>>>>>
> > > >>>>>> WDYT?
> > > >>>>>>
> > > >>>>>> Thanks,
> > > >>>>>> Dmitri.
> > > >>>>>>
> > > >>>>>> On Wed, Oct 22, 2025 at 6:03 AM Alexandre Dutra <
> > [email protected]
> > > >>>
> > > >>>>> wrote:
> > > >>>>>>
> > > >>>>>>> Hi all,
> > > >>>>>>>
> > > >>>>>>> Today, Polaris has the notion of "request ID", but its purpose is not
> > > >>>>>>> entirely clear. It seems to serve as an observability feature to
> > > >>>>>>> facilitate correlation. A pending PR aims to rename it to "correlation
> > > >>>>>>> ID" for better alignment with industry standards [1].
> > > >>>>>>>
> > > >>>>>>> However, this PR has brought to light overlaps with core telemetry
> > > >>>>>>> features: when OpenTelemetry (OTel) is enabled in Polaris, each
> > > >>>>>>> request already has a trace ID and span ID, making a separate
> > > >>>>>>> correlation ID redundant.
> > > >>>>>>>
> > > >>>>>>> Moreover, using the OTel trace ID and span ID in Polaris events,
> > > >>>>>>> rather than the generated correlation ID, would significantly simplify
> > > >>>>>>> correlation of events with other traces.
> > > >>>>>>>
> > > >>>>>>> Therefore, I propose the following changes:
> > > >>>>>>>
> > > >>>>>>> 1. If OTel is enabled, use the trace ID and span ID as the correlation
> > > >>>>>>> ID for the request, instead of generating a random correlation ID.
> > > >>>>>>> 2. Otherwise, if a (Polaris-specific) correlation ID header is present
> > > >>>>>>> in the request, use it.
> > > >>>>>>> 3. If neither of the above conditions is met, generate a random
> > > >>>>>>> correlation ID.
> > > >>>>>>>
> > > >>>>>>> I am somewhat undecided on the best approach when a correlation ID
> > > >>>>>>> header is present in the request. However, I believe it would be more
> > > >>>>>>> sensible to disregard it if OTel is enabled, as OTel offers a more
> > > >>>>>>> robust solution for client-to-server trace propagation, e.g. W3C Trace
> > > >>>>>>> Context propagation [2].
> > > >>>>>>>
> > > >>>>>>> Please share your thoughts!
> > > >>>>>>>
> > > >>>>>>> Thanks,
> > > >>>>>>> Alex
> > > >>>>>>>
> > > >>>>>>> [1]: https://github.com/apache/polaris/pull/2757
> > > >>>>>>> [2]: https://www.w3.org/TR/trace-context
> > > >>>>>>>
> > > >>>>>
> > > >>>>
> > > >>>
> > > >>
> > >
> >
