dimas-b commented on code in PR #3385:
URL: https://github.com/apache/polaris/pull/3385#discussion_r2722097491

##########
.gitignore:
##########
@@ -112,3 +112,4 @@ venv
 # to override default properties for local development.
 # And then use `./gradlew run -Dquarkus.profile=local` to run Polaris with dev profile.
 application-local.properties
+.polaris-work-notes.md

Review Comment:
   is this change necessary in OSS? Looks like a local process? 😉

##########
persistence/relational-jdbc/src/main/resources/postgres/schema-v4.sql:
##########
@@ -0,0 +1,194 @@
+--
+-- Licensed to the Apache Software Foundation (ASF) under one
+-- or more contributor license agreements. See the NOTICE file
+-- distributed with this work for additional information
+-- regarding copyright ownership. The ASF licenses this file
+-- to you under the Apache License, Version 2.0 (the
+-- "License"). You may not use this file except in compliance
+-- with the License. You may obtain a copy of the License at
+--
+-- http://www.apache.org/licenses/LICENSE-2.0
+--
+-- Unless required by applicable law or agreed to in writing,
+-- software distributed under the License is distributed on an
+-- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+-- KIND, either express or implied. See the License for the
+-- specific language governing permissions and limitations
+-- under the License.
+--
+
+-- Changes from v3:
+-- * Added `scan_metrics_report` table for scan metrics as first-class entities
+-- * Added `commit_metrics_report` table for commit metrics as first-class entities
+
+CREATE SCHEMA IF NOT EXISTS POLARIS_SCHEMA;
+SET search_path TO POLARIS_SCHEMA;
+
+CREATE TABLE IF NOT EXISTS version (
+    version_key TEXT PRIMARY KEY,
+    version_value INTEGER NOT NULL
+);
+INSERT INTO version (version_key, version_value)
+VALUES ('version', 4)
+ON CONFLICT (version_key) DO UPDATE
+SET version_value = EXCLUDED.version_value;
+COMMENT ON TABLE version IS 'the version of the JDBC schema in use';
+
+-- Include all tables from v3
+-- (entities, grant_records, principal_authentication_data, policy_mapping_record, events)
+-- These are assumed to already exist from v3 migration
+
+-- Scan Metrics Report Entity Table
+CREATE TABLE IF NOT EXISTS scan_metrics_report (
+    report_id TEXT NOT NULL,
+    realm_id TEXT NOT NULL,
+    catalog_id TEXT NOT NULL,
+    catalog_name TEXT NOT NULL,
+    namespace TEXT NOT NULL,
+    table_name TEXT NOT NULL,
+
+    -- Report metadata
+    timestamp_ms BIGINT NOT NULL,
+    principal_name TEXT,
+    request_id TEXT,
+
+    -- Trace correlation
+    otel_trace_id TEXT,
+    otel_span_id TEXT,
+    report_trace_id TEXT,
+
+    -- Scan context
+    snapshot_id BIGINT,
+    schema_id INTEGER,
+    filter_expression TEXT,
+    projected_field_ids TEXT,
+    projected_field_names TEXT,
+
+    -- Scan metrics
+    result_data_files BIGINT DEFAULT 0,
+    result_delete_files BIGINT DEFAULT 0,
+    total_file_size_bytes BIGINT DEFAULT 0,
+    total_data_manifests BIGINT DEFAULT 0,
+    total_delete_manifests BIGINT DEFAULT 0,
+    scanned_data_manifests BIGINT DEFAULT 0,
+    scanned_delete_manifests BIGINT DEFAULT 0,
+    skipped_data_manifests BIGINT DEFAULT 0,
+    skipped_delete_manifests BIGINT DEFAULT 0,
+    skipped_data_files BIGINT DEFAULT 0,
+    skipped_delete_files BIGINT DEFAULT 0,
+    total_planning_duration_ms BIGINT DEFAULT 0,
+
+    -- Equality/positional delete metrics
+    equality_delete_files BIGINT DEFAULT 0,
+    positional_delete_files BIGINT DEFAULT 0,
+    indexed_delete_files BIGINT DEFAULT 0,
+    total_delete_file_size_bytes BIGINT DEFAULT 0,
+
+    -- Additional metadata (for extensibility)
+    metadata JSONB DEFAULT '{}'::JSONB,
+
+    PRIMARY KEY (report_id)
+);
+
+COMMENT ON TABLE scan_metrics_report IS 'Scan metrics reports as first-class entities';
+COMMENT ON COLUMN scan_metrics_report.report_id IS 'Unique identifier for the report';
+COMMENT ON COLUMN scan_metrics_report.realm_id IS 'Realm ID for multi-tenancy';
+COMMENT ON COLUMN scan_metrics_report.catalog_id IS 'Catalog ID';
+COMMENT ON COLUMN scan_metrics_report.otel_trace_id IS 'OpenTelemetry trace ID from HTTP headers';
+COMMENT ON COLUMN scan_metrics_report.report_trace_id IS 'Trace ID from report metadata';
+
+-- Indexes for scan_metrics_report

Review Comment:
   Are these query patterns actualized in Polaris code? I might be missing that, but IIRC, Polaris itself does not even need to query these reports 🤔

   If you're thinking of providing convenience tools for consumption in downstream builds, I believe it would be preferable to define a top-level API for that... and probably have an RFC (like OPA) + `dev` ML discussion...

   If your intended use case is specific to JDBC + custom queries without expecting general reuse by other OSS parties, it might be preferable to keep these indexes downstream. WDYT?
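
To illustrate the last option, a downstream-only index could be kept in a separate script that is applied after `schema-v4.sql`. This is only a sketch: the file name, the index name, and the choice of columns are assumptions made for illustration, not the indexes proposed in this PR (those are not shown in this comment).

```sql
-- downstream-indexes.sql (hypothetical file, maintained outside the OSS schema)
-- Apply after schema-v4.sql; assumes scan_metrics_report already exists.
SET search_path TO POLARIS_SCHEMA;

-- Hypothetical index for a downstream query that lists the most recent
-- scan reports for one table in one realm. Name and columns are illustrative.
CREATE INDEX IF NOT EXISTS idx_scan_report_table_time
    ON scan_metrics_report (realm_id, catalog_id, namespace, table_name, timestamp_ms DESC);
```

Keeping such a script separate would let the OSS schema carry only what Polaris itself queries, while a downstream build adds the indexes its own custom queries need.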
