jecsand838 opened a new pull request, #8255: URL: https://github.com/apache/arrow-rs/pull/8255
# Which issue does this PR close?

- Part of https://github.com/apache/arrow-rs/issues/4886

# Rationale for this change

Apache Avro’s `decimal` logical type annotates either `bytes` or `fixed` and carries `precision` and `scale`. Implementations should reject invalid combinations such as `scale > precision`, and the underlying bytes are the two’s‑complement big‑endian representation of the unscaled integer. On the Arrow side, Rust now exposes first‑class `Decimal32`, `Decimal64`, `Decimal128`, and `Decimal256` data types with documented maximum precisions (9, 18, 38, 76 respectively). Until now, `arrow-avro` decoded all Avro decimals to 128/256‑bit Arrow decimals, even when a narrower type would suffice.

# What changes are included in this PR?

**`arrow-avro/src/codec.rs`**

* Map `Codec::Decimal(precision, scale, _size)` to Arrow’s `Decimal32`/`64`/`128`/`256` **by precision**, preferring the narrowest type (≤9→32, ≤18→64, ≤38→128, otherwise 256).
* Strengthen decimal attribute parsing:
  * Error if `scale > precision`.
  * Error if `precision` exceeds Arrow’s maximum (Decimal256).
  * If Avro uses `fixed`, check that declared `precision` fits the byte width (≤4→max 9, ≤8→18, ≤16→38, ≤32→76).
* Update docstring of `Codec::Decimal` to mention `Decimal32`/`64`.

**`arrow-avro/src/reader/record.rs`**

* Add `Decoder::Decimal32` and `Decoder::Decimal64` variants with corresponding builders (`Decimal32Builder`, `Decimal64Builder`).
* Builder selection:
  * If Avro uses **fixed**: choose by size (≤4→Decimal32, ≤8→Decimal64, ≤16→Decimal128, ≤32→Decimal256).
  * If Avro uses **bytes**: choose by declared precision (≤9/≤18/≤38/≤76).
* Implement decode paths that sign‑extend Avro’s two’s‑complement payload to 4/8 bytes and append values to the new builders; update `append_null`/`flush` for 32/64‑bit decimals.
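
For readers who want the mapping and decode rules in one place, here is a minimal, standalone sketch of the precision thresholds and the sign‑extension step described above. It is not the code in this PR: `DecimalWidth`, `width_for_precision`, `max_precision_for_fixed`, `sign_extend_to`, `decode_i32`, and `decode_i64` are hypothetical names, it uses only the standard library, and rejecting empty or over‑long payloads is an assumption made for the illustration.

```rust
/// Hypothetical stand-in for the four Arrow decimal widths named in the PR.
#[derive(Debug, Clone, Copy, PartialEq)]
enum DecimalWidth {
    D32,
    D64,
    D128,
    D256,
}

/// Pick the narrowest width for a declared precision, using the thresholds
/// from the PR description (9, 18, 38, 76), and reject `scale > precision`.
fn width_for_precision(precision: usize, scale: usize) -> Result<DecimalWidth, String> {
    if scale > precision {
        return Err(format!("scale {scale} exceeds precision {precision}"));
    }
    match precision {
        0..=9 => Ok(DecimalWidth::D32),
        10..=18 => Ok(DecimalWidth::D64),
        19..=38 => Ok(DecimalWidth::D128),
        39..=76 => Ok(DecimalWidth::D256),
        _ => Err(format!("precision {precision} exceeds the Decimal256 maximum of 76")),
    }
}

/// Largest precision allowed for a `fixed` of `size` bytes, per the PR's table.
/// Returns `None` for sizes above 32 bytes.
fn max_precision_for_fixed(size: usize) -> Option<usize> {
    match size {
        0..=4 => Some(9),
        5..=8 => Some(18),
        9..=16 => Some(38),
        17..=32 => Some(76),
        _ => None,
    }
}

/// Sign-extend a big-endian two's-complement payload to exactly `N` bytes.
/// Assumption for this sketch: empty or over-long payloads are errors.
fn sign_extend_to<const N: usize>(raw: &[u8]) -> Result<[u8; N], String> {
    if raw.is_empty() {
        return Err("empty decimal payload".to_string());
    }
    if raw.len() > N {
        return Err(format!("payload of {} bytes does not fit in {N} bytes", raw.len()));
    }
    // The top bit of the first byte is the sign; pad with 0xFF if negative.
    let fill = if raw[0] & 0x80 != 0 { 0xFF } else { 0x00 };
    let mut out = [fill; N];
    out[N - raw.len()..].copy_from_slice(raw);
    Ok(out)
}

/// Decode an unscaled value destined for a 32-bit decimal.
fn decode_i32(raw: &[u8]) -> Result<i32, String> {
    sign_extend_to::<4>(raw).map(i32::from_be_bytes)
}

/// Decode an unscaled value destined for a 64-bit decimal.
fn decode_i64(raw: &[u8]) -> Result<i64, String> {
    sign_extend_to::<8>(raw).map(i64::from_be_bytes)
}
```

Under these assumptions, the two‑byte payload `[0x04, 0xD2]` decodes to the unscaled value `1234`, and `[0xFB, 0x2E]` decodes to `-1234` because the padding bytes are filled with `0xFF`.
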

**`arrow-avro/src/reader/mod.rs` (tests)**

* Expand `test_decimal` to assert that:
  * bytes‑backed decimals with precision 4 map to `Decimal32`, and those with precision 10 map to `Decimal64`;
  * legacy fixed\[8] decimals map to `Decimal64`;
  * fixed\[16] decimals map to `Decimal128`.
* Add a nulls path test for bytes‑backed `Decimal32`.

# Are these changes tested?

Yes. Unit tests under `arrow-avro/src/reader/mod.rs` construct expected `Decimal32Array`/`Decimal64Array`/`Decimal128Array` with `with_precision_and_scale`, and compare against batches decoded from Avro files (including legacy fixed and bytes‑backed cases). The tests also exercise small batch sizes to cover buffering paths; a new Avro data file is added for higher‑width decimals.

The tests use new Avro test files that are described in more detail in this arrow-testing PR: https://github.com/apache/arrow-testing/pull/112

# Are there any user-facing changes?

N/A, because `arrow-avro` is not public.
