obelix74 commented on code in PR #3924: URL: https://github.com/apache/polaris/pull/3924#discussion_r2885527087
########## site/content/in-dev/unreleased/proposals/observability-rest-api.md: ########## @@ -0,0 +1,1329 @@ +--- +title: Observability REST API +linkTitle: Observability REST API +weight: 100 +--- +<!-- + Licensed to the Apache Software Foundation (ASF) under one + or more contributor license agreements. See the NOTICE file + distributed with this work for additional information + regarding copyright ownership. The ASF licenses this file + to you under the Apache License, Version 2.0 (the + "License"); you may not use this file except in compliance + with the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, + software distributed under the License is distributed on an + "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + KIND, either express or implied. See the License for the + specific language governing permissions and limitations + under the License. +--> + +# Proposal: REST API for Querying Table Metrics and Events + +**Author:** Anand Sankaran +**Date:** 2026-03-02 +**Status:** Draft Proposal +**Target:** Apache Polaris + +--- + +## Abstract + +This proposal defines REST API endpoints for querying table metrics and catalog events from Apache Polaris. The endpoints expose data already being persisted via the existing JDBC persistence model (`events`, `scan_metrics_report`, `commit_metrics_report` tables) and follow established Polaris API patterns. + +**Note:** The Events API in this proposal is designed to align with the emerging [Iceberg Events API specification](https://github.com/apache/iceberg/pull/12584), which is nearing consensus in the Apache Iceberg community. This ensures forward compatibility and consistency with the broader Iceberg ecosystem. + +--- + +## Table of Contents + +1. [Motivation](#1-motivation) +2. [Use Cases](#2-use-cases) +3. [Design Principles](#3-design-principles) +4. [API Specification](#4-api-specification) +5. 
[Authorization](#5-authorization) +6. [OpenAPI Schema](#6-openapi-schema) +7. [Implementation Notes](#7-implementation-notes) +8. [Iceberg Events API Alignment](#8-iceberg-events-api-alignment) + +--- + +## 1. Motivation + +Apache Polaris currently persists table metrics (scan reports, commit reports) and catalog events to the database, but provides no REST API to query this data. Users must access the database directly to retrieve metrics or audit information. + +Adding read-only REST endpoints enables: +- Programmatic access to metrics without database credentials +- Integration with monitoring dashboards and alerting systems +- Consistent authorization via Polaris RBAC +- Pagination and filtering without writing SQL + +--- + +## 2. Use Cases + +### 2.1 Table Health Monitoring +- Track write patterns: files added/removed per commit, record counts, duration trends +- Identify tables with high commit frequency or unusually large commits +- Detect issues indicating need for compaction (many small files) or optimization + +### 2.2 Query Performance Analysis +- Understand read patterns: files scanned vs skipped, planning duration +- Identify inefficient queries with low manifest/file pruning ratios +- Correlate performance with filter expressions and projected columns + +### 2.3 Capacity Planning & Chargeback +- Aggregate metrics by table, namespace, or principal over time +- Track storage growth trends (`total_file_size_bytes`) +- Attribute usage to teams/users via `principal_name` + +### 2.4 Debugging & Troubleshooting +- Correlate metrics with distributed traces (`otel_trace_id`, `otel_span_id`) +- Investigate specific commits by `snapshot_id` +- Trace operations via `request_id` + +### 2.5 Audit & Compliance +- Track who created/dropped/modified catalog objects +- Monitor administrative actions (credential rotation, grant changes) +- Generate compliance reports for access patterns + +--- + +## 3. 
Design Principles + +| Principle | Rationale | +|-----------|-----------| +| **Iceberg Events API alignment** | Events API follows the [Iceberg Events API spec](https://github.com/apache/iceberg/pull/12584) for ecosystem compatibility | +| **Dedicated metrics-reports namespace** | Metrics APIs use `/api/metrics-reports/v1/...` to separate from management and catalog APIs | +| **POST for complex filtering** | Events API uses POST with request body (per Iceberg spec) to support complex filters (arrays, nested objects) | +| **Read-only semantics** | All endpoints are read-only; metrics/events are written via existing flows | +| **Consistent pagination** | Follow `continuation-token` pattern (Iceberg) and `pageToken` pattern (Polaris APIs) | +| **Flexible filtering** | Time ranges, operation types, catalog objects - common query patterns | +| **RBAC integration** | Leverage existing Polaris authorization model | + +--- + +## 4. API Specification + +### 4.1 Endpoint Summary + +| Method | Path | Description | +|--------|------|-------------| +| POST | `/api/catalog/v1/{prefix}/events` | Query events for a catalog (Iceberg-compatible) | +| GET | `/api/metrics-reports/v1/catalogs/{catalogName}/namespaces/{namespace}/tables/{table}` | List metrics for a table (type specified via query parameter) | + +> **Note:** The Events API uses POST (not GET) and follows the Iceberg REST Catalog path structure (`/api/catalog/v1/{prefix}/events`) for compatibility with the [Iceberg Events API specification](https://github.com/apache/iceberg/pull/12584). The metrics API uses a dedicated `/api/metrics-reports/v1/` namespace since it exposes pre-populated records rather than managing catalog state - a server that doesn't support catalog management may still expose metrics reports. 
+ +### 4.2 Path Parameters + +| Parameter | Type | Description | +|-----------|------|-------------| +| `prefix` | string | Catalog prefix (typically the catalog name) | +| `catalogName` | string | Name of the catalog | +| `namespace` | string | Namespace (URL-encoded, multi-level separated by `%1F`) | +| `table` | string | Table name | + +### 4.3 Events API (Iceberg-Compatible) + +The Events API follows the [Iceberg Events API specification](https://github.com/apache/iceberg/pull/12584) for ecosystem compatibility. Key design decisions from the Iceberg spec: + +- **POST method**: Allows complex filtering with arrays and nested objects in the request body +- **Continuation token**: Opaque cursor for resumable pagination +- **Operation-centric model**: Events are structured around operations (create-table, update-table, etc.) +- **Custom extensions**: Support for `x-` prefixed custom operation types for Polaris-specific events + +#### Request Body (`QueryEventsRequest`) + +| Property | Type | Required | Description | +|----------|------|----------|-------------| +| `continuation-token` | string | No | Opaque cursor to resume fetching from previous request | +| `page-size` | integer | No | Maximum events per page (server may return fewer) | +| `after-timestamp-ms` | long | No | Filter: events after this timestamp (inclusive) | +| `operation-types` | array[string] | No | Filter by operation types (see below) | +| `catalog-objects-by-name` | array[array[string]] | No | Filter by namespace/table/view names | +| `catalog-objects-by-id` | array[object] | No | Filter by table/view UUIDs | +| `object-types` | array[string] | No | Filter by object type: `namespace`, `table`, `view` | +| `custom-filters` | object | No | Implementation-specific filter extensions | + +#### Standard Operation Types + +| Operation Type | Description | +|----------------|-------------| +| `create-table` | Table created and committed | +| `register-table` | Existing table registered in catalog | +| 
`drop-table` | Table dropped | +| `update-table` | Table metadata updated | +| `rename-table` | Table renamed | +| `create-view` | View created | +| `drop-view` | View dropped | +| `update-view` | View updated | +| `rename-view` | View renamed | +| `create-namespace` | Namespace created | +| `update-namespace-properties` | Namespace properties updated | +| `drop-namespace` | Namespace dropped | + +#### Polaris Custom Operation Types + +For Polaris-specific events not covered by the Iceberg spec, use the `x-` prefix convention: + +| Custom Operation Type | Description | +|----------------------|-------------| +| `x-polaris-create-catalog-role` | Catalog role created | +| `x-polaris-grant-privilege` | Privilege granted | +| `x-polaris-rotate-credentials` | Principal credentials rotated | +| `x-polaris-create-policy` | Policy created | +| `x-polaris-attach-policy` | Policy attached to resource | + +### 4.4 Query Parameters (Metrics API) + +#### List Table Metrics (`/.../tables/{table}`) + +| Parameter | Type | Required | Default | Description | +|-----------|------|----------|---------|-------------| +| `metricType` | string | **Yes** | - | Type of metrics to retrieve: `scan` or `commit` | +| `pageToken` | string | No | - | Cursor for pagination | +| `pageSize` | integer | No | 100 | Results per page (max: 1000) | +| `snapshotId` | long | No | - | Filter by snapshot ID | +| `principalName` | string | No | - | Filter by principal | +| `timestampFrom` | long | No | - | Start of time range (epoch ms) | +| `timestampTo` | long | No | - | End of time range (epoch ms) | +| `operation` | string | No | - | Filter by commit operation (only applicable when `metricType=commit`): `append`, `overwrite`, `delete`, `replace` | + +> **Note:** The `metricType` parameter is required. This design allows for future extensibility as new metric types are added (e.g., compaction metrics, maintenance metrics) without requiring new endpoints. 
+ +### 4.5 Example Requests and Responses + +#### Query Events (Iceberg-Compatible) + +**Request:** +```http +POST /api/catalog/v1/my-catalog/events +Authorization: Bearer <token> +Content-Type: application/json + +{ + "page-size": 2, + "operation-types": ["create-table", "update-table"], + "after-timestamp-ms": 1709251200000, + "catalog-objects-by-name": [ + ["analytics", "events"] + ], + "object-types": ["table"] +} +``` + +**Response:** +```json +{ + "next-page-token": "eyJ0cyI6MTcwOTMzNzYxMjM0NSwiaWQiOiI1NTBlODQwMCJ9", + "highest-processed-timestamp-ms": 1709337612345, + "events": [ + { + "event-id": "550e8400-e29b-41d4-a716-446655440000", + "request-id": "req-12345", + "request-event-count": 1, + "timestamp-ms": 1709337612345, + "actor": { + "principal": "[email protected]", + "client-ip": "192.168.1.100" + }, + "operation": { + "operation-type": "create-table", + "identifier": { + "namespace": ["analytics", "events"], + "name": "page_views" + }, + "table-uuid": "a1b2c3d4-e5f6-7890-abcd-ef1234567890", + "updates": [ + {"action": "assign-uuid", "uuid": "a1b2c3d4-e5f6-7890-abcd-ef1234567890"}, + {"action": "set-current-schema", "schema-id": 0}, + {"action": "set-default-spec", "spec-id": 0} + ] + } + }, + { + "event-id": "661f9511-f30c-52e5-b827-557766551111", + "request-id": "req-12346", + "request-event-count": 1, + "timestamp-ms": 1709337500000, + "actor": { + "principal": "[email protected]" + }, + "operation": { + "operation-type": "update-table", + "identifier": { + "namespace": ["analytics", "events"], + "name": "user_actions" + }, + "table-uuid": "b2c3d4e5-f6a7-8901-bcde-f23456789012", + "updates": [ + {"action": "add-snapshot", "snapshot-id": 123456789} + ], + "requirements": [ + {"type": "assert-table-uuid", "uuid": "b2c3d4e5-f6a7-8901-bcde-f23456789012"} + ] + } + } + ] +} +``` + +#### Query Events with Custom Polaris Operations + +**Request:** +```http +POST /api/catalog/v1/my-catalog/events +Authorization: Bearer <token> +Content-Type: 
application/json + +{ + "page-size": 10, + "operation-types": ["x-polaris-grant-privilege", "x-polaris-rotate-credentials"] +} +``` + +**Response:** +```json +{ + "next-page-token": "eyJ0cyI6MTcwOTMzODAwMDAwMH0=", + "highest-processed-timestamp-ms": 1709338000000, + "events": [ + { + "event-id": "772f0622-g41d-63f6-c938-668877662222", + "request-id": "req-admin-001", + "request-event-count": 1, + "timestamp-ms": 1709338000000, + "actor": { + "principal": "[email protected]" + }, + "operation": { + "operation-type": "custom", + "custom-type": "x-polaris-grant-privilege", + "identifier": { + "namespace": ["analytics", "events"], + "name": "page_views" + }, + "table-uuid": "a1b2c3d4-e5f6-7890-abcd-ef1234567890", + "privilege": "TABLE_READ_DATA", + "grantee": "data-analyst-role" + } + } + ] +} +``` + +#### List Metrics (Scan) + +**Request:** +```http +GET /api/metrics-reports/v1/catalogs/my-catalog/namespaces/analytics%1Fevents/tables/page_views?metricType=scan&pageSize=2&timestampFrom=1709251200000 +Authorization: Bearer <token> +``` + +**Response:** +```json +{ + "nextPageToken": null, + "metricType": "scan", + "reports": [ + { + "reportId": "scan-001-abc123", + "catalogId": 12345, + "tableId": 67890, + "timestampMs": 1709337612345, + "principalName": "[email protected]", + "requestId": "req-scan-001", + "otelTraceId": "abc123def456789012345678901234", + "otelSpanId": "def456789012", + "snapshotId": 1234567890123, + "schemaId": 0, + "filterExpression": "event_date >= '2024-03-01'", + "projectedFieldIds": "1,2,3,5,8", + "projectedFieldNames": "event_id,user_id,event_type,timestamp,page_url", + "resultDataFiles": 150, + "resultDeleteFiles": 5, + "totalFileSizeBytes": 1073741824, + "totalDataManifests": 12, + "totalDeleteManifests": 2, + "scannedDataManifests": 8, + "scannedDeleteManifests": 2, + "skippedDataManifests": 4, + "skippedDeleteManifests": 0, + "skippedDataFiles": 45, + "skippedDeleteFiles": 0, + "totalPlanningDurationMs": 250, + "equalityDeleteFiles": 3, + 
"positionalDeleteFiles": 2, + "indexedDeleteFiles": 0, + "totalDeleteFileSizeBytes": 52428800 + } + ] +} +``` + +#### List Metrics (Commit) + +**Request:** +```http +GET /api/metrics-reports/v1/catalogs/my-catalog/namespaces/analytics%1Fevents/tables/page_views?metricType=commit&operation=append&pageSize=2 +Authorization: Bearer <token> +``` + +**Response:** +```json +{ + "nextPageToken": "eyJ0cyI6MTcwOTMzNzcwMDAwMCwiaWQiOiJjb21taXQtMDAyIn0=", + "metricType": "commit", + "reports": [ + { + "reportId": "commit-001-xyz789", Review Comment: with the new design: ListMetricsResponse uses metricType as a discriminator to determine the concrete response type (ListScanMetricsResponse or ListCommitMetricsResponse). Each concrete response type has a reports array with a single, known item type: • ListScanMetricsResponse.reports → ScanMetricsReport[] • ListCommitMetricsResponse.reports → CommitMetricsReport[] I am not 100% sure, but I think Jackson supports it. Can you please see if it works? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
