wombatu-kun opened a new pull request, #16417: URL: https://github.com/apache/iceberg/pull/16417
## Problem Fixes #11950 — https://github.com/apache/iceberg/issues/11950 `MetricsConfig` keys per-column metrics modes by the **original** Iceberg column name, while `MetricsUtil.metricsMode(schema, config, fieldId)` resolves the field id to a name using whatever schema it is handed. When metrics are computed from a Parquet **footer** — the `add_files` / `migrate` / `snapshot` path via `ParquetUtil.fileMetrics` (`TableMigrationUtil`, the Delta→Iceberg snapshot action) — the schema is reconstructed from the file and carries **sanitized** names (`AvroSchemaUtil.makeCompatibleName`, e.g. `$event_time` → `_x24event_time`). The name lookup then misses and silently falls back to the default mode, which is `none` once a table exceeds `write.metadata.metrics.max-inferred-column-defaults` (default 100). Column bounds and value counts are silently dropped for escaped columns, degrading scan pruning. Explicit `write.metadata.metrics.column.<escaped-name>` overrides are ignored on the same path. The write path was already correct because `ParquetWriter.metrics()` passes the writer's original Iceberg schema; only the footer-reconstructed path was affected. ## Fix Alongside the original-name column modes, `MetricsConfig` now derives a separate alias map keyed by the **sanitized** form of each column name (sanitizing each dotted path component). `columnMode` falls back to this alias map, so a mode resolves whether the caller passes the original name (write path) or the sanitized name (footer path). The alias map is intentionally kept separate from `columnModes` so `validateReferencedColumns` keeps validating only user-supplied original names against the table schema. The change is confined to the private constructor, so every factory path benefits and there is no public API change (revapi clean). ## Tests - `TestMetricsModes`: inferred-default and explicit-override cases for an escaped column, asserting `columnMode` resolves by both the original and the sanitized name. - `TestParquet`: end-to-end regression that writes a Parquet file with an escaped column and verifies `ParquetUtil.fileMetrics` collects value counts and bounds for it. Closes #11950 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
