jecsand838 opened a new pull request, #8124:
URL: https://github.com/apache/arrow-rs/pull/8124

   # Which issue does this PR close?
   
   - Part of https://github.com/apache/arrow-rs/issues/4886
   - Follows up on https://github.com/apache/arrow-rs/pull/8047
   
   # Rationale for this change
   
   Avro's **schema resolution** rules allow numeric primitives to be promoted 
to wider types and allow `bytes` and UTF‑8 `string` to be read as each other.
   
   Implementing promotion-aware decoding lets us:
   - Honor the Avro spec’s resolution matrix directly in the reader, improving 
interoperability with evolving schemas. 
   - Decode **directly into the target Arrow type** (avoiding extra passes and 
temporary arrays).
   - Produce clear errors for **illegal promotions**, instead of surprising 
behavior. (Per the spec, unresolved writer/reader mismatches are errors.)
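
For reference, the primitive promotions the Avro specification permits can be written as a small lookup. The following is a minimal sketch only: `Primitive`, `can_promote`, and `check_resolution` are hypothetical names used for illustration and are not the types or functions in `arrow-avro`.

```rust
/// Avro primitive types relevant to promotion (illustrative subset,
/// not the arrow-avro representation).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Primitive {
    Boolean,
    Int,
    Long,
    Float,
    Double,
    Bytes,
    String,
}

/// Returns true when data written as `writer` may be read as `reader`
/// under the Avro specification's promotion rules:
/// int -> long/float/double, long -> float/double, float -> double,
/// string <-> bytes.
pub fn can_promote(writer: Primitive, reader: Primitive) -> bool {
    use Primitive::*;
    matches!(
        (writer, reader),
        (Int, Long | Float | Double)
            | (Long, Float | Double)
            | (Float, Double)
            | (String, Bytes)
            | (Bytes, String)
    )
}

/// Identical types resolve trivially, listed promotions are accepted,
/// and every other writer/reader pair is reported as an error.
pub fn check_resolution(writer: Primitive, reader: Primitive) -> Result<(), String> {
    if writer == reader || can_promote(writer, reader) {
        Ok(())
    } else {
        Err(format!("illegal promotion: {:?} to {:?}", writer, reader))
    }
}
```

Promotion is one-directional for the numeric types: `check_resolution(Primitive::Int, Primitive::Long)` succeeds, while the reverse and `Boolean` to `Double` both return an error.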
   
   # What changes are included in this PR?
   
   **Core decoding (`arrow-avro/src/reader/record.rs`):**
   - Add promotion-aware decoder variants:
     - `Int32ToInt64`, `Int32ToFloat32`, `Int32ToFloat64`
     - `Int64ToFloat32`, `Int64ToFloat64`
     - `Float32ToFloat64`
     - `BytesToString`, `StringToBytes`
   - Teach `Decoder::try_new` to inspect `ResolutionInfo::Promotion` and select 
the appropriate variant, so conversion happens **as we decode**, not after.
   - Extend `decode`, `append_null`, and `flush` to handle the new variants and 
materialize the correct Arrow arrays (`Int64Array`, `Float32Array`, 
`Float64Array`, `StringArray`, `BinaryArray`).
   - Keep the existing `Utf8View` behavior for non-promoted strings; promotions 
to `string` materialize a `StringArray` (not a `StringViewArray`) for correctness 
and simplicity. (`StringViewArray` remains available for native UTF‑8 paths.)
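
To illustrate what converting "as we decode" means, here is a minimal, dependency-free sketch. It assumes the Avro binary encoding of `int`/`long` (zigzag variable-length integers) and uses plain `Vec`s in place of Arrow builders. `PromotingDecoder` and `read_varint_zigzag` are hypothetical names; the real `Decoder` in `record.rs` is structured differently.

```rust
/// Reads one zigzag-encoded variable-length integer from the front of `buf`,
/// advancing the slice. Returns `None` on truncated or overlong input.
pub fn read_varint_zigzag(buf: &mut &[u8]) -> Option<i64> {
    let mut value: u64 = 0;
    let mut shift = 0u32;
    loop {
        let (&byte, rest) = buf.split_first()?;
        *buf = rest;
        if shift >= 64 {
            return None; // more than 10 bytes: not a valid 64-bit varint
        }
        value |= u64::from(byte & 0x7F) << shift;
        if byte & 0x80 == 0 {
            break;
        }
        shift += 7;
    }
    // Undo the zigzag mapping: 0, 1, 2, 3, ... -> 0, -1, 1, -2, ...
    Some(((value >> 1) as i64) ^ -((value & 1) as i64))
}

/// Hypothetical promotion-aware decoder: the writer type is Avro `int`,
/// but values are appended directly to a buffer of the reader's type.
pub enum PromotingDecoder {
    Int32ToInt64(Vec<i64>),
    Int32ToFloat64(Vec<f64>),
}

impl PromotingDecoder {
    /// Decodes one writer-side `int` and stores it as the promoted type,
    /// so no intermediate `i32` buffer is ever built.
    pub fn decode(&mut self, buf: &mut &[u8]) -> Result<(), String> {
        let raw = read_varint_zigzag(buf).ok_or("truncated or overlong varint")?;
        let v = i32::try_from(raw)
            .map_err(|_| "value out of range for Avro int".to_string())?;
        match self {
            Self::Int32ToInt64(out) => out.push(i64::from(v)),
            Self::Int32ToFloat64(out) => out.push(f64::from(v)),
        }
        Ok(())
    }
}
```

Because the variant is chosen once when the decoder is constructed, the per-value work is a single match plus a lossless cast, which is the property the bullet about `Decoder::try_new` describes.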
   
   **Integration tests & helpers (`arrow-avro/src/reader/mod.rs`):**
   - Add utilities to load a file’s **writer schema** JSON and synthesize a 
**reader schema** with field-level promotions 
(`make_reader_schema_with_promotions`).
   - Add cross‑codec tests on `alltypes_plain` (no compression, snappy, zstd, 
bzip2, xz) that validate:
     - Mixed numeric promotions to `float`/`double` and `int` to `long`.
     - `bytes` to `string` and `string` to `bytes`.
     - Timestamp/timezone behavior unchanged.
   - Add a **negative** test ensuring that **illegal promotions** (e.g., 
`boolean` to `double`) produce a descriptive error.
   
   # Are these changes tested?
   
   Yes.
   - **Unit tests** (in `record.rs`) for each promotion path:
     - `int` to `long`, `int` to `float`, `int` to `double`
     - `long` to `float`, `long` to `double`
     - `float` to `double`
     - `bytes` to `string` (including non‑ASCII UTF‑8) and `string` to `bytes`
     - **Illegal** promotions, which must fail fast.
   - **Integration tests** (in `mod.rs`) reading real `alltypes_plain` Avro 
files across multiple compression codecs, asserting exact Arrow outputs for 
promoted fields.
   - Existing tests continue to pass.
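
As a rough illustration of what the `bytes` to `string` cases exercise: in Avro's binary encoding both types are a length followed by raw bytes, so the promotion amounts to UTF‑8 validation in one direction and a plain reinterpretation in the other. The helper names below are hypothetical and are not taken from the PR's test code.

```rust
/// `bytes` -> `string`: the payload must be valid UTF-8, otherwise the
/// promotion is rejected with the standard library's `Utf8Error`.
pub fn bytes_to_string(payload: &[u8]) -> Result<&str, std::str::Utf8Error> {
    std::str::from_utf8(payload)
}

/// `string` -> `bytes`: every UTF-8 string is already a valid byte
/// sequence, so this direction cannot fail.
pub fn string_to_bytes(s: &str) -> &[u8] {
    s.as_bytes()
}
```

A non‑ASCII value such as `"héllo"` round-trips through both helpers unchanged, while a payload like `[0xFF, 0xFE]` is rejected by `bytes_to_string`.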
   
   # Are there any user-facing changes?
   
   N/A

