obelix74 commented on PR #3385: URL: https://github.com/apache/polaris/pull/3385#issuecomment-3855547856
> Hi,
>
> What is the intended cleanup of these tables? Should we do a cron job doing plain SQL and truncation based on dates? My concern is that the OTel data and the Iceberg data are decorrelated if not stored together (in the spans, for example), so this adds yet another maintenance job before being prod-ready. Do you have any pointers?
>
> Another thing: what is the plan to make this usable in the standard observability stack? Since the spans don't contain this JDBC data, will Polaris implement a Grafana plugin correlatable to Tempo/Zipkin/OpenObserve, for example?

The original design spec (and a draft PR) had a cron job that uses the Quarkus scheduler to delete table metrics based on expiry. Based on review feedback, we removed it since the PR was becoming too large for one feature. The same issue exists for the events table, so we probably want a unified way to handle both. After these PRs are merged, we need to raise two new features:

1. Cleanup of these tables
2. A basic REST API to expose the queries

With regards to standard tooling integration: the metrics records store `otel_trace_id` and `otel_span_id` specifically for correlation with distributed traces. This allows joining Polaris metrics data with traces in Tempo/Zipkin/Jaeger.

For visualization, there are several approaches:

1. Direct SQL/Grafana - Grafana can query the metrics tables directly via a PostgreSQL datasource and correlate with Tempo using the trace ID.
2. Export to OTLP - a future enhancement could export the metrics as OTLP metrics/logs to an OpenTelemetry collector.
3. Custom Grafana plugin - this would be a larger effort and likely a separate initiative.
4. The aforementioned REST API for queries could also be fronted by a Prometheus endpoint that delivers the metrics in Prometheus exposition format (which is what I will do for my use case).

The current design prioritizes storing the data with correlation IDs, leaving the visualization layer flexible. Do you have a preference for how this should evolve?
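For reference, the expiry-based cleanup that was dropped from the draft PR amounts to a scheduled job issuing a date-bounded `DELETE`. Here is a minimal sketch in Python; the actual implementation would live in Polaris's Java/Quarkus codebase, and the table name `table_metrics` and column `timestamp_ms` are illustrative assumptions, not the real schema:

```python
from datetime import datetime, timedelta, timezone

def build_cleanup_statement(table, retention_days, now=None):
    """Build a parameterized DELETE dropping rows older than the retention window.

    The table/column names here are hypothetical; the real schema may differ.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=retention_days)
    # Store the cutoff as epoch millis, matching a BIGINT timestamp column.
    cutoff_ms = int(cutoff.timestamp() * 1000)
    sql = f"DELETE FROM {table} WHERE timestamp_ms < ?"
    return sql, (cutoff_ms,)

# A scheduler (Quarkus @Scheduled in Polaris, or cron + psql) would run this
# periodically for both the metrics and the events tables:
sql, params = build_cleanup_statement("table_metrics", retention_days=30)
```

A unified job could simply loop over both table names with per-table retention settings, which is one way the "unified way to handle both" could look.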
Once we are able to persist the table metrics, we can work on a variety of things around the data. Currently, as of `1.3.0`, the table metrics are logged and that's it (unless someone implements a custom metrics handler).
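As an illustration of option 4 above (serving the stored metrics in Prometheus exposition format from the query REST API), here is a minimal sketch. The row shape, metric name, and label names are invented for the example; the real Polaris API response is not defined yet:

```python
def to_prometheus(rows):
    """Render stored table-metric rows in Prometheus text exposition format.

    `rows` is a list of dicts with hypothetical keys ("catalog", "table",
    "total_records"); the real response shape may differ.
    """
    lines = [
        "# HELP polaris_table_total_records Total records reported in table metrics",
        "# TYPE polaris_table_total_records gauge",
    ]
    for row in rows:
        labels = f'catalog="{row["catalog"]}",table="{row["table"]}"'
        lines.append(f'polaris_table_total_records{{{labels}}} {row["total_records"]}')
    # Prometheus expects a trailing newline after the last sample.
    return "\n".join(lines) + "\n"
```

A scrape endpoint would fetch recent rows via the REST API, pass them through a renderer like this, and return the text with content type `text/plain; version=0.0.4`.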
