jecsand838 opened a new pull request, #8124: URL: https://github.com/apache/arrow-rs/pull/8124
# Which issue does this PR close?

- Part of https://github.com/apache/arrow-rs/issues/4886
- Follows up on https://github.com/apache/arrow-rs/pull/8047

# Rationale for this change

Avro allows safe widening between numeric primitives and interoperability between `bytes` and UTF‑8 `string` during **schema resolution**. Implementing promotion-aware decoding lets us:

- Honor the Avro spec’s resolution matrix directly in the reader, improving interoperability with evolving schemas.
- Decode **directly into the target Arrow type** (avoiding extra passes and temporary arrays).
- Produce clear errors for **illegal promotions**, instead of surprising behavior. (Per the spec, unresolved writer/reader mismatches are errors.)

# What changes are included in this PR?

**Core decoding (`arrow-avro/src/reader/record.rs`):**

- Add promotion-aware decoder variants:
  - `Int32ToInt64`, `Int32ToFloat32`, `Int32ToFloat64`
  - `Int64ToFloat32`, `Int64ToFloat64`
  - `Float32ToFloat64`
  - `BytesToString`, `StringToBytes`
- Teach `Decoder::try_new` to inspect `ResolutionInfo::Promotion` and select the appropriate variant, so conversion happens **as we decode**, not after.
- Extend `decode`, `append_null`, and `flush` to handle the new variants and materialize the correct Arrow arrays (`Int64Array`, `Float32Array`, `Float64Array`, `StringArray`, `BinaryArray`).
- Keep existing behavior for `Utf8View` for non-promoted strings; promotions to `string` materialize a `StringArray` (not `StringViewArray`) for correctness and simplicity. (StringView remains available for native UTF‑8 paths.)

**Integration tests & helpers (`arrow-avro/src/reader/mod.rs`):**

- Add utilities to load a file’s **writer schema** JSON and synthesize a **reader schema** with field-level promotions (`make_reader_schema_with_promotions`).
- Add cross‑codec tests on `alltypes_plain` (no compression, snappy, zstd, bzip2, xz) that validate:
  - Mixed numeric promotions to `float`/`double` and `int to long`.
  - `bytes to string` and `string to bytes`.
  - Timestamp/timezone behavior unchanged.
- Add **negative** test ensuring **illegal promotions** (e.g., `boolean to double`) produce a descriptive error.

# Are these changes tested?

Yes.

- **Unit tests** (in `record.rs`) for each promotion path:
  - `int to long`, `int to float`, `int to double`
  - `long to float`, `long to double`
  - `float to double`
  - `bytes to string` (including non‑ASCII UTF‑8) and `string to bytes`
  - Verifies that **illegal** promotions fail fast.
- **Integration tests** (in `mod.rs`) reading real `alltypes_plain` Avro files across multiple compression codecs, asserting exact Arrow outputs for promoted fields.
- Existing tests continue to pass.

# Are there any user-facing changes?

N/A

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at: us...@infra.apache.org
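
For readers unfamiliar with the promotion rules this PR refers to, the writer-to-reader resolution matrix can be sketched in plain Rust as below. This is a standalone illustration and not the code in `arrow-avro`: the `Prim` enum and the `resolve` and `bytes_to_string` functions are hypothetical names invented for this sketch, and the `Promotion` variants merely reuse the decoder variant names listed above. The UTF‑8 validation in `bytes_to_string` is likewise an assumption about how a `bytes to string` promotion might reject invalid input.

```rust
/// Avro primitive types relevant to promotion (hypothetical enum for this sketch).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Prim {
    Boolean,
    Int,
    Long,
    Float,
    Double,
    Bytes,
    Str,
}

/// Promotion kinds, named after the decoder variants listed in the PR description.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Promotion {
    Int32ToInt64,
    Int32ToFloat32,
    Int32ToFloat64,
    Int64ToFloat32,
    Int64ToFloat64,
    Float32ToFloat64,
    BytesToString,
    StringToBytes,
}

/// Resolve a writer primitive against a reader primitive.
/// Returns `Ok(None)` when the types match, `Ok(Some(..))` for a promotion
/// permitted by the Avro spec, and `Err(..)` for anything else.
pub fn resolve(writer: Prim, reader: Prim) -> Result<Option<Promotion>, String> {
    if writer == reader {
        return Ok(None);
    }
    let promotion = match (writer, reader) {
        (Prim::Int, Prim::Long) => Promotion::Int32ToInt64,
        (Prim::Int, Prim::Float) => Promotion::Int32ToFloat32,
        (Prim::Int, Prim::Double) => Promotion::Int32ToFloat64,
        (Prim::Long, Prim::Float) => Promotion::Int64ToFloat32,
        (Prim::Long, Prim::Double) => Promotion::Int64ToFloat64,
        (Prim::Float, Prim::Double) => Promotion::Float32ToFloat64,
        (Prim::Bytes, Prim::Str) => Promotion::BytesToString,
        (Prim::Str, Prim::Bytes) => Promotion::StringToBytes,
        _ => {
            return Err(format!(
                "illegal promotion: {:?} to {:?}",
                writer, reader
            ))
        }
    };
    Ok(Some(promotion))
}

/// Convert a decoded `bytes` value into a `String`, rejecting invalid UTF-8.
/// (Assumption: this sketch validates eagerly and reports an error.)
pub fn bytes_to_string(raw: &[u8]) -> Result<String, String> {
    std::str::from_utf8(raw)
        .map(|s| s.to_owned())
        .map_err(|e| format!("bytes are not valid UTF-8: {}", e))
}
```

Under these assumptions, `resolve(Prim::Int, Prim::Long)` selects `Int32ToInt64`, while `resolve(Prim::Boolean, Prim::Double)` returns an error, mirroring the negative test described in the PR. Making the choice once, up front, is what lets the conversion happen during decoding rather than as a separate pass afterwards.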