mzzz-zzm opened a new pull request, #1039: URL: https://github.com/apache/iceberg-go/pull/1039
Fixes #1001. ## Summary Add a defensive guard in `v3writerImpl.prepareEntry` that clears the Avro-facing `dataFile.DistinctCounts` pointer on entries we own, so v3 manifests cannot emit `data_file.distinct_counts` (deprecated by the v3 spec; Java parity: [apache/iceberg#12182](https://github.com/apache/iceberg/pull/12182)). ## Scope decision Per the discussion on #1001 (reviewer approved option **(c) Defensive-only**), this PR ships the minimal v3 gate and defers the v2 schema gap to a separate issue. **Why minimal**: while implementing this, I found that `data_file.distinct_counts` is **already** absent from the on-disk Avro for every manifest version on `main` — the per-version record schemas in [`internal/avro_schemas.go`](https://github.com/apache/iceberg-go/blob/main/internal/avro_schemas.go) (`data_file_v1` L238, `data_file_v2` L276, `data_file_v3` L360) do not declare a `distinct_counts` field, so the `hamba/avro` encoder drops the Go-side pointer at encode time on every version. The intended v3 outcome is therefore already in effect, but the v2 path is **non-compliant** (v2 spec still requires the field). Tracked separately as #1038. The guard added here is **forward-compatibility insurance**: when #1038 is fixed by adding `distinct_counts` back to `data_file_v2`, the v3 path will stay compliant with no further code changes. The `dataFile` struct, `DistinctValueCounts()` getter, and `DataFileBuilder.DistinctValueCounts` setter are intentionally preserved — the change is writer-side only. ## Test `TestWriteManifestV3OmitsDistinctCounts` writes a v3 manifest with non-empty distinct counts supplied via `DataFileBuilder.DistinctValueCounts`, round-trips it through `ReadManifest`, and asserts the read side observes no distinct counts. A v2 "keeps distinct counts" test was deliberately **not** added to this PR: against today's broken behavior it would either lie (assert `Empty` is correct) or fail CI (assert `{1:42}`). The v2 round-trip test belongs in the same commit that fixes the v2 schema (see #1038), where it can flip from failing to passing alongside the schema change. ## Verification - `make test` — all packages green. - `make lint` — clean. - DCO sign-off present. ## Related - Fixes #1001 - Discovered side-issue: #1038 (v2 schema gap) - Java precedent: [apache/iceberg#12182](https://github.com/apache/iceberg/pull/12182) - Parent v3 tracker: #589 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
