Hi Alex,

IMO, we should go with Option 2, with the value stored in this column always being the Request ID (whether that value is the `X-Request-Id`, the OTel Trace ID, etc.). This is mainly because I don't currently see a use case where knowing whether the Request ID is an OTel Trace ID or some other kind of value is useful to the user. I am open to changing my preferred option if someone can share a use case where that distinction matters.

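To make the Option 2 idea concrete, here is a rough sketch of how a single opaque value could be picked before it is persisted. The function name `resolve_request_id` and the precedence (client-supplied `X-Request-Id` first, OTel Trace ID as fallback) are assumptions for illustration only; neither has been decided in this thread.

```python
from typing import Mapping, Optional


def resolve_request_id(
    headers: Mapping[str, str],
    otel_trace_id: Optional[str] = None,
) -> Optional[str]:
    """Pick the single opaque value to store in the request ID column.

    Assumed precedence (hypothetical, not agreed on the list): a
    client-supplied X-Request-Id wins; otherwise fall back to the
    OTel trace ID. The caller never learns which source was used.
    """
    # HTTP header names are case-insensitive, so normalize before lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    request_id = normalized.get("x-request-id", "").strip()
    if request_id:
        return request_id
    # No usable client-supplied ID: fall back to the trace ID, if any.
    return otel_trace_id or None
```

Since the stored value is treated as opaque, nothing downstream needs to know which branch produced it, which is the property that makes a single column sufficient.
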
IOW, I don't think OTel information is relevant to Events if it is not used as the primary way to identify a request, and it therefore should not be persisted. However, the discussion on (https://lists.apache.org/thread/p9357rcy3d1j94w4yogtdwcf2kxzg3jr) may change this view: we may need to keep a "Span ID" (not necessarily OTel-generated or OTel-specific) to identify corresponding Before/After events. In that case, we could mix in Option 4 to store just the "Span ID".

I believe both email threads need more opinions before we can see the full set of requirements for what we should build.

Best,
Adnan Hemani

On Fri, Oct 31, 2025 at 5:22 AM Alexandre Dutra <[email protected]> wrote:

> Hi all,
>
> As a follow-up to [1], I'm starting this thread to discuss how we can
> persist client-generated request IDs and OTel context in our database.
>
> Quoting myself [2], the requirements I think we want to fulfil are:
>
> 1. Only one correlation ID is enough
> 2. The correlation ID is an opaque string
> 3. The main use case is to find events with matching correlation IDs
> 4. The only query pattern is by exact match
>
> A few options were suggested:
>
> 1) Two separate columns: request_id and otel_context, both of type
> TEXT (nullable).
>    - Pros: Easy to implement and offers good read performance,
>      especially with indexes.
>    - Cons: Could be overkill, as often one context is sufficient for
>      correlating records.
>
> 2) A single column: (e.g., correlation_id, final name TBD) of type TEXT.
>    - Pros: Same as option 1.
>    - Cons: If both a request ID and OTel context are available, we
>      can only store one. There's no straightforward way to identify the
>      type of context stored (unless we use a prefix).
>
> 3) Two columns: correlation_id and correlation_id_type.
>    - Pros: Same as option 1, and it addresses the issue of
>      identifying the ID type.
>    - Cons: Might be over-engineered. Is the ID type truly essential?
>      Isn't it opaque?
>
> 4) Leverage the existing additional_properties column: (JSONB in
> Postgres, TEXT in H2).
>    - Pros: Simple and flexible due to its schemaless nature, allowing
>      us to add anything we need.
>    - Cons: Query performance might not be optimal, though indexes could
>      help.
>
> What are your thoughts?
>
> Thanks,
> Alex
>
> [1] https://lists.apache.org/thread/bb1qyxjt827t3tomv2xp0s1kovxjsp94
> [2] https://lists.apache.org/thread/fqjjmxc6v8bbynwd5xfz83ngmp6gqqxj
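
For reference, a minimal sketch of how Option 2 combined with the "Span ID in `additional_properties`" idea might look, using an in-memory SQLite database purely as a stand-in for Postgres/H2. The table name `events`, the column `event_id`, the index name, the `span_id` JSON key, and the helper functions are all hypothetical; only `request_id`-style single-column storage, `additional_properties`, and exact-match lookup come from the thread.

```python
import json
import sqlite3
from typing import List, Optional, Tuple


def create_schema(conn: sqlite3.Connection) -> None:
    """Create a hypothetical events table with one opaque ID column."""
    conn.execute(
        """
        CREATE TABLE events (
            event_id TEXT PRIMARY KEY,
            request_id TEXT,              -- nullable, opaque (Option 2)
            additional_properties TEXT    -- JSON stored as text (Option 4)
        )
        """
    )
    # Exact match is the only query pattern, so a plain index is enough.
    conn.execute("CREATE INDEX idx_events_request_id ON events (request_id)")


def record_event(
    conn: sqlite3.Connection,
    event_id: str,
    request_id: Optional[str],
    span_id: Optional[str] = None,
) -> None:
    """Store an event; the span ID (if any) goes into the JSON properties."""
    props = {"span_id": span_id} if span_id is not None else {}
    conn.execute(
        "INSERT INTO events (event_id, request_id, additional_properties) "
        "VALUES (?, ?, ?)",
        (event_id, request_id, json.dumps(props)),
    )


def find_events(
    conn: sqlite3.Connection, request_id: str
) -> List[Tuple[str, dict]]:
    """Return (event_id, properties) pairs whose request ID matches exactly."""
    rows = conn.execute(
        "SELECT event_id, additional_properties FROM events "
        "WHERE request_id = ? ORDER BY event_id",
        (request_id,),
    ).fetchall()
    return [(event_id, json.loads(props)) for event_id, props in rows]
```

With this layout, correlating events is a single indexed equality lookup on `request_id`, and matching Before/After events can then be paired in application code by comparing the `span_id` property on the returned rows.
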
