jecsand838 opened a new pull request, #8255:
URL: https://github.com/apache/arrow-rs/pull/8255

   # Which issue does this PR close?
   
   - Part of https://github.com/apache/arrow-rs/issues/4886
   
   # Rationale for this change
   
   Apache Avro’s `decimal` logical type annotates either `bytes` or `fixed` and carries `precision` and `scale` attributes. The Avro specification requires `scale` to be no greater than `precision`, so implementations should reject invalid combinations such as `scale > precision`. The encoded bytes hold the two’s-complement, big-endian representation of the unscaled integer. On the Arrow side, arrow-rs now exposes first-class `Decimal32`, `Decimal64`, `Decimal128`, and `Decimal256` data types with documented maximum precisions of 9, 18, 38, and 76, respectively. Until now, `arrow-avro` decoded every Avro decimal to a 128- or 256-bit Arrow decimal, even when a narrower type would suffice.
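To make the encoding concrete: with `scale = 2`, the value `-1.23` is stored as the unscaled integer `-123`, whose minimal two's-complement big-endian form is the single byte `0x85` (or `[0xFF, 0x85]` when padded to two bytes). The std-only Rust sketch below recovers the unscaled value and renders it with its scale. `decode_unscaled` and `format_decimal` are hypothetical helpers for illustration, not functions from `arrow-avro`.

```rust
/// Hypothetical helper: interpret `bytes` as a big-endian two's-complement
/// integer. Returns `None` if the payload is wider than 16 bytes (i128).
/// Treating an empty payload as zero is an assumption of this sketch.
fn decode_unscaled(bytes: &[u8]) -> Option<i128> {
    if bytes.len() > 16 {
        return None;
    }
    // Seed with all ones when the first byte's sign bit is set, so shifting
    // the remaining bytes in performs the sign extension.
    let mut value: i128 = match bytes.first() {
        Some(b) if b & 0x80 != 0 => -1,
        _ => 0,
    };
    for &b in bytes {
        value = (value << 8) | i128::from(b);
    }
    Some(value)
}

/// Hypothetical helper: render an unscaled integer with `scale` fractional digits.
fn format_decimal(unscaled: i128, scale: usize) -> String {
    let digits = unscaled.unsigned_abs().to_string();
    // Left-pad so at least one digit remains before the decimal point.
    let padded = format!("{:0>width$}", digits, width = scale + 1);
    let (int_part, frac_part) = padded.split_at(padded.len() - scale);
    let sign = if unscaled < 0 { "-" } else { "" };
    if scale == 0 {
        format!("{}{}", sign, int_part)
    } else {
        format!("{}{}.{}", sign, int_part, frac_part)
    }
}

fn main() {
    let unscaled = decode_unscaled(&[0xFF, 0x85]).unwrap();
    println!("{}", format_decimal(unscaled, 2)); // prints "-1.23"
}
```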
   
   # What changes are included in this PR?
   
   **`arrow-avro/src/codec.rs`**
   
   * Map `Codec::Decimal(precision, scale, _size)` to Arrow’s `Decimal32`/`64`/`128`/`256` **by precision**, preferring the narrowest type (≤9→32, ≤18→64, ≤38→128, otherwise 256).
   * Strengthen decimal attribute parsing:
     * Error if `scale > precision`.
     * Error if `precision` exceeds Arrow’s maximum decimal precision (76, the `Decimal256` limit).
     * If Avro uses `fixed`, check that the declared `precision` fits the byte width (size ≤4→max precision 9, ≤8→18, ≤16→38, ≤32→76).
   * Update the docstring of `Codec::Decimal` to mention `Decimal32`/`64`.
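The validation and mapping rules above can be sketched as follows. This is a simplified, std-only illustration, not the code in `codec.rs`: the `DecimalWidth` enum and the `resolve_decimal` and `max_precision_for_fixed` functions are hypothetical stand-ins for the Arrow `DataType` that `arrow-avro` actually produces.

```rust
/// Hypothetical stand-in for the Arrow decimal type chosen for a column.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DecimalWidth {
    Decimal32,
    Decimal64,
    Decimal128,
    Decimal256,
}

/// Largest precision any Arrow decimal type supports (Decimal256).
const MAX_DECIMAL_PRECISION: usize = 76;

/// Maximum precision accepted for an Avro `fixed` of `size` bytes,
/// following the size buckets listed above.
fn max_precision_for_fixed(size: usize) -> Option<usize> {
    match size {
        0..=4 => Some(9),
        5..=8 => Some(18),
        9..=16 => Some(38),
        17..=32 => Some(76),
        _ => None,
    }
}

/// Validate the decimal attributes, then pick the narrowest type by precision.
fn resolve_decimal(
    precision: usize,
    scale: usize,
    fixed_size: Option<usize>,
) -> Result<DecimalWidth, String> {
    if scale > precision {
        return Err(format!("scale {} exceeds precision {}", scale, precision));
    }
    if precision > MAX_DECIMAL_PRECISION {
        return Err(format!(
            "precision {} exceeds Arrow maximum {}",
            precision, MAX_DECIMAL_PRECISION
        ));
    }
    if let Some(size) = fixed_size {
        let max = max_precision_for_fixed(size)
            .ok_or_else(|| format!("fixed size {} is wider than 32 bytes", size))?;
        if precision > max {
            return Err(format!("precision {} does not fit in fixed[{}]", precision, size));
        }
    }
    Ok(match precision {
        0..=9 => DecimalWidth::Decimal32,
        10..=18 => DecimalWidth::Decimal64,
        19..=38 => DecimalWidth::Decimal128,
        _ => DecimalWidth::Decimal256,
    })
}

fn main() {
    println!("{:?}", resolve_decimal(10, 2, None)); // prints "Ok(Decimal64)"
}
```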
   
   **`arrow-avro/src/reader/record.rs`**
   
   * Add `Decoder::Decimal32` and `Decoder::Decimal64` variants with corresponding builders (`Decimal32Builder`, `Decimal64Builder`).
   * Builder selection:
   
     * If Avro uses **fixed**: choose by size (≤4→Decimal32, ≤8→Decimal64, ≤16→Decimal128, ≤32→Decimal256).
     * If Avro uses **bytes**: choose by declared precision (≤9→Decimal32, ≤18→Decimal64, ≤38→Decimal128, ≤76→Decimal256).
   * Implement decode paths that sign-extend Avro’s two’s-complement payload to 4 or 8 bytes and append the values to the new builders; update `append_null`/`flush` for the 32- and 64-bit decimals.
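Sign-extending a short two's-complement payload to a fixed width can be sketched with only the standard library: fill a 4- or 8-byte buffer with `0xFF` when the payload's sign bit is set (`0x00` otherwise), copy the payload into the rightmost bytes, and read it back with `i32::from_be_bytes` or `i64::from_be_bytes`. The helper names are hypothetical, and rejecting payloads wider than the target width is an assumption of this sketch, not a description of the PR's error handling.

```rust
/// Hypothetical helper: sign-extend a big-endian two's-complement payload
/// to exactly `N` bytes. Returns `None` if the payload is wider than `N`
/// (an assumption of this sketch).
fn sign_extend<const N: usize>(raw: &[u8]) -> Option<[u8; N]> {
    if raw.len() > N {
        return None;
    }
    // Pad with 0xFF for negative values and 0x00 for non-negative ones.
    let fill: u8 = match raw.first() {
        Some(b) if b & 0x80 != 0 => 0xFF,
        _ => 0x00,
    };
    let mut out = [fill; N];
    // Big-endian: the payload occupies the rightmost (least significant) bytes.
    out[N - raw.len()..].copy_from_slice(raw);
    Some(out)
}

/// Hypothetical: unscaled value for a Decimal32 column (payload of at most 4 bytes).
fn decode_decimal32(raw: &[u8]) -> Option<i32> {
    sign_extend::<4>(raw).map(i32::from_be_bytes)
}

/// Hypothetical: unscaled value for a Decimal64 column (payload of at most 8 bytes).
fn decode_decimal64(raw: &[u8]) -> Option<i64> {
    sign_extend::<8>(raw).map(i64::from_be_bytes)
}

fn main() {
    println!("{:?}", decode_decimal32(&[0x85])); // prints "Some(-123)"
    println!("{:?}", decode_decimal64(&[0x01, 0x00])); // prints "Some(256)"
}
```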
   
   **`arrow-avro/src/reader/mod.rs` (tests)**
   
   * Expand `test_decimal` to assert that:
   
     * bytes-backed decimals with precision 4 map to `Decimal32`, and those with precision 10 map to `Decimal64`;
     * legacy `fixed[8]` decimals map to `Decimal64`;
     * `fixed[16]` decimals map to `Decimal128`.
   * Add a null-handling test for bytes-backed `Decimal32`.
   
   # Are these changes tested?
   
   Yes. Unit tests in `arrow-avro/src/reader/mod.rs` construct the expected `Decimal32Array`/`Decimal64Array`/`Decimal128Array` values with `with_precision_and_scale` and compare them against batches decoded from Avro files (including legacy fixed and bytes-backed cases). The tests also exercise small batch sizes to cover buffering paths, and a new Avro data file is added for higher-width decimals.
   
   The tests use new Avro test files, which are described in more detail in this arrow-testing PR: https://github.com/apache/arrow-testing/pull/112
   
   # Are there any user-facing changes?
   
   N/A, since `arrow-avro` is not public.
   

