laskoviymishka opened a new issue, #987: URL: https://github.com/apache/iceberg-go/issues/987
Parent: #589

Once the shredded reader (#986) lands, the writer needs to decide what to shred per row and emit the shredded `STRUCT` alongside `metadata` and the residual `value`.

Plan:
- Mirror Java's posture of no shredding unless explicitly configured: add a new table property `write.variant.shredding-paths` carrying a list of `$.path.expressions` (same property name as Java, parsed by the same property layer that already handles `write.metadata.compression-codec` etc.).
- Add `table/internal/variant_shredder.go` with `ShredVariant(value variant.Value, schema ShreddingSchema) (typed any, residual []byte, err error)`.
- Hook it into the parquet writer in `table/internal/parquet_files.go` so shredded columns appear in the parquet schema and are populated per row.
- Statistics on shredded typed columns piggyback on the existing column-stats path: typed columns get min/max for free, which is the substrate the follow-up stats issue builds on.
- Cross-client coverage: write a shredded variant via iceberg-go, read it via Java/pyiceberg, and assert it equals the source. A round trip via the iceberg-go reader is also required.

Spec: [Parquet Variant shredding](https://github.com/apache/parquet-format/blob/master/VariantShredding.md).
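Illustratively, the property would be set like any other table property; the paths below are made-up examples, only the property name `write.variant.shredding-paths` comes from this issue:

```properties
# hypothetical configuration sketch; path list syntax mirrors Java's $.path expressions
write.variant.shredding-paths=$.user.id,$.event.ts
```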

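The shredder contract can be sketched with simplified stand-in types. This is a minimal illustration, not the real implementation: the actual `variant.Value` binary encoding and `ShreddingSchema` shape are not defined in this issue, so a plain Go map stands in for the variant value and JSON stands in for the residual's variant binary form.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ShreddingSchema is a hypothetical stand-in for the schema parsed from
// write.variant.shredding-paths; paths are simplified to top-level field names.
type ShreddingSchema struct {
	Fields []string
}

// ShredVariant splits a variant value (modeled here as a decoded JSON object)
// into typed column values and a residual blob holding everything not shredded.
// The real writer would emit the residual in the variant binary encoding.
func ShredVariant(value map[string]any, schema ShreddingSchema) (typed map[string]any, residual []byte, err error) {
	typed = make(map[string]any, len(schema.Fields))
	rest := make(map[string]any, len(value))
	for k, v := range value {
		rest[k] = v
	}
	// Move each shredded field out of the residual and into a typed column.
	for _, f := range schema.Fields {
		if v, ok := rest[f]; ok {
			typed[f] = v
			delete(rest, f)
		}
	}
	residual, err = json.Marshal(rest)
	return typed, residual, err
}

func main() {
	row := map[string]any{"id": 7, "tag": "a", "extra": true}
	typed, residual, err := ShredVariant(row, ShreddingSchema{Fields: []string{"id", "tag"}})
	if err != nil {
		panic(err)
	}
	fmt.Println(typed["id"], typed["tag"], string(residual))
}
```

The key design point the sketch mirrors: shredded fields are removed from the residual rather than duplicated, so the typed columns plus `value` together reconstruct the original variant, and the typed columns can feed the existing min/max column-stats path unchanged.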