obelix74 commented on PR #3385:
URL: https://github.com/apache/polaris/pull/3385#issuecomment-3856668355
@rmannibucau Thanks for the detailed feedback! You raise excellent points
about observability stack integration.
I think there are actually two complementary use cases here:
### 1. Observability/Monitoring
For real-time monitoring, alerting, and trace correlation, I completely
agree that the OpenTelemetry approach is superior:
- Span attributes for Iceberg metrics (already supported via
otel_trace_id/otel_span_id correlation)
- Prometheus /metrics endpoint for aggregated metrics
- Let the observability stack (Tempo, Grafana, etc.) handle storage and
retention
This is the "hot path" for operational visibility.
### 2. Historical Analytics/Auditing (this PR's focus)
The JDBC persistence targets a different use case:
- Query optimization analysis - "Which tables have the most expensive
scans over the last 30 days?"
- Capacity planning - "What's the trend of data scanned per catalog?"
- Audit/compliance - "Show me all operations on table X by principal Y"
- Cost attribution - Correlate scan metrics with cloud costs
These queries need structured, queryable storage that's harder to achieve
with trace backends (which are optimized for trace retrieval, not analytical
queries).
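
To make the first question above concrete, here is a minimal sketch of the kind of aggregation involved. It is illustrative only: `ScanReport`, `ScanAnalytics`, and their fields are hypothetical names, not the schema from this PR. With JDBC persistence the same shape would be a `GROUP BY` over the persisted rows; in-memory Java streams are used here only so the example is self-contained.

```java
import java.time.Instant;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical row shape; the PR's actual persisted columns may differ.
record ScanReport(String table, String principal, long bytesScanned, Instant reportedAt) {}

final class ScanAnalytics {
  private ScanAnalytics() {}

  // Sums bytes scanned per table for reports at or after `since`,
  // then returns the `limit` tables with the largest totals, largest first.
  static List<Map.Entry<String, Long>> mostExpensiveTables(
      List<ScanReport> reports, Instant since, int limit) {
    return reports.stream()
        .filter(r -> !r.reportedAt().isBefore(since))
        .collect(Collectors.groupingBy(
            ScanReport::table, Collectors.summingLong(ScanReport::bytesScanned)))
        .entrySet().stream()
        .sorted(Map.Entry.<String, Long>comparingByValue(Comparator.reverseOrder()))
        .limit(limit)
        .toList();
  }
}
```

A trace backend can retrieve the individual spans behind each report, but a grouped, time-windowed ranking like this is the part that typically needs queryable storage.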
### Proposed Path Forward
The current implementation is designed to be pluggable via the
MetricsPersistence SPI:
- NoOpMetricsPersistence - Default, no storage (current behavior)
- JdbcMetricsPersistence - For users who want queryable historical data
- Future: OtlpMetricsPersistence - Export as OTLP logs/metrics to a collector
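
As a rough sketch of what "pluggable" means here, assuming a single-method SPI: only the names `MetricsPersistence` and `NoOpMetricsPersistence` come from this PR. The `write` method, the `ScanMetricsRecord` type, the `InMemoryMetricsPersistence` stand-in (used in place of a real JDBC-backed implementation), and the factory with its type strings are all hypothetical and may not match the actual code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical record shape, for illustration only.
record ScanMetricsRecord(String catalog, String table, String principal, long bytesScanned) {}

// Assumed SPI shape; the real interface in the PR may expose different methods.
interface MetricsPersistence {
  void write(ScanMetricsRecord record);
}

// Default: discards every record, so behavior matches a deployment with no storage.
final class NoOpMetricsPersistence implements MetricsPersistence {
  @Override
  public void write(ScanMetricsRecord record) {
    // intentionally empty
  }
}

// Hypothetical stand-in for a storage-backed implementation such as the JDBC one.
final class InMemoryMetricsPersistence implements MetricsPersistence {
  private final List<ScanMetricsRecord> rows = new ArrayList<>();

  @Override
  public void write(ScanMetricsRecord record) {
    rows.add(record);
  }

  // Returns an immutable snapshot of everything written so far.
  List<ScanMetricsRecord> rows() {
    return List.copyOf(rows);
  }
}

// Hypothetical selection by a configured type string.
final class MetricsPersistenceFactory {
  private MetricsPersistenceFactory() {}

  static MetricsPersistence forType(String type) {
    return switch (type) {
      case "noop" -> new NoOpMetricsPersistence();
      case "in-memory" -> new InMemoryMetricsPersistence();
      default -> throw new IllegalArgumentException(
          "Unknown metrics persistence type: " + type);
    };
  }
}
```

The point of the sketch is that callers depend only on the interface, so switching between no storage and a persistent backend is a configuration choice, not a code change.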
Users can choose based on their needs. For pure observability, they'd use
the existing OTEL integration + /metrics. For analytics, they'd enable JDBC
persistence.
Does this separation of concerns address your feedback? Or do you see the
JDBC approach as fundamentally problematic even for the analytics use case?
I need this data persisted for end-to-end auditing by both internal and
external auditors (PII data).