mzzz-zzm opened a new pull request, #1039:
URL: https://github.com/apache/iceberg-go/pull/1039

   Fixes #1001.
   
   ## Summary
   
   Add a defensive guard in `v3writerImpl.prepareEntry` that clears the 
Avro-facing `dataFile.DistinctCounts` pointer on entries we own, so v3 
manifests cannot emit `data_file.distinct_counts` (deprecated by the v3 spec; 
Java parity: 
[apache/iceberg#12182](https://github.com/apache/iceberg/pull/12182)).
   
   ## Scope decision
   
   Per the discussion on #1001 (reviewer approved option **(c) 
Defensive-only**), this PR ships the minimal v3 gate and defers the v2 schema 
gap to a separate issue.
   
   **Why minimal**: while implementing this, I found that 
`data_file.distinct_counts` is **already** absent from the on-disk Avro for 
every manifest version on `main` — the per-version record schemas in 
[`internal/avro_schemas.go`](https://github.com/apache/iceberg-go/blob/main/internal/avro_schemas.go)
 (`data_file_v1` L238, `data_file_v2` L276, `data_file_v3` L360) do not declare 
a `distinct_counts` field, so the `hamba/avro` encoder drops the Go-side 
pointer at encode time on every version. The intended v3 outcome is therefore 
already in effect, but the v2 path is **non-compliant** (v2 spec still requires 
the field). Tracked separately as #1038.
   
   The guard added here is **forward-compatibility insurance**: when #1038 is 
fixed by adding `distinct_counts` back to `data_file_v2`, the v3 path will stay 
compliant with no further code changes.
   
   The `dataFile` struct, `DistinctValueCounts()` getter, and 
`DataFileBuilder.DistinctValueCounts` setter are intentionally preserved — the 
change is writer-side only.
   
   ## Test
   
   `TestWriteManifestV3OmitsDistinctCounts` writes a v3 manifest with non-empty 
distinct counts supplied via `DataFileBuilder.DistinctValueCounts`, round-trips 
it through `ReadManifest`, and asserts the read side observes no distinct 
counts.
   
   A v2 "keeps distinct counts" test was deliberately **not** added to this PR: 
against today's broken behavior it would either lie (assert `Empty` is correct) 
or fail CI (assert `{1:42}`). The v2 round-trip test belongs in the same commit 
that fixes the v2 schema (see #1038), where it can flip from failing to passing 
alongside the schema change.
   
   ## Verification
   
   - `make test` — all packages green.
   - `make lint` — clean.
   - DCO sign-off present.
   
   ## Related
   
   - Fixes #1001
   - Discovered side-issue: #1038 (v2 schema gap)
   - Java precedent: 
[apache/iceberg#12182](https://github.com/apache/iceberg/pull/12182)
   - Parent v3 tracker: #589


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to