scovich opened a new issue, #8549: URL: https://github.com/apache/arrow-rs/issues/8549
**Describe the bug**

The parquet reader always widens `INT32 (Decimal)` and `INT64 (Decimal)` columns to `Decimal128`. It might be possible for a reader to specifically request the narrower types in its read schema, but a reader that just reads what was written to disk gets back a different type than what was written.

**To Reproduce**

Several unit tests _expect_ the current behavior, and started failing when I changed the parquet reader to preserve types:

```
test arrow::array_reader::primitive_array::tests::test_primitive_array_reader_decimal_types ... FAILED
test arrow::arrow_reader::tests::test_decimal ... FAILED
test arrow::arrow_reader::tests::test_arbitrary_decimal ... FAILED
test arrow::arrow_reader::tests::test_read_decimal_file ... FAILED
test arrow::arrow_writer::tests::arrow_writer_decimal128_dictionary ... FAILED
test arrow::arrow_writer::tests::arrow_writer_decimal ... FAILED
test arrow::arrow_writer::tests::arrow_writer_decimal256_dictionary ... FAILED
test arrow::arrow_writer::tests::arrow_writer_decimal64_dictionary ... FAILED
test arrow::schema::tests::test_arrow_schema_roundtrip ... FAILED
test arrow::schema::tests::test_decimal_fields ... FAILED
test arrow::schema::tests::test_column_desc_to_field ... FAILED
test statistics::test_decimal128 ... FAILED
test statistics::test_decimal_256 ... FAILED
test statistics::test_decimal64 ... FAILED
test statistics::test_data_page_stats_with_all_null_page ... FAILED
```

**Expected behavior**

If a column was written as `INT32 (Decimal)` it should come back as `Decimal32` unless the user requested something else via the read schema. Ditto for `INT64 (Decimal)` coming back as `Decimal64`.

**Additional context**

See https://github.com/apache/arrow-rs/pull/8540#discussion_r2399703980

The variant shredding decimal integration tests fail by default, because they expect 32- and 64-bit decimals to faithfully round trip through parquet; I had to add a manual casting operation to the `VariantArray` constructor to compensate.
For the fix I tried (but eventually reverted in favor of the above-mentioned casting), see https://github.com/apache/arrow-rs/pull/8540/files/cd978f5374503fff3a265bc06ad6ce143a4110f3..5b55f121d7461bd31dfc2d204f8fc303523a3988

The idea was to split the `decimal_type` helper (which only understood 128- and 256-bit decimals) into `decimal_[32|64|128|256]_type` helpers, where each helper's name indicates a lower bound on the resulting decimal's width. Too-high precision can still force a wider decimal, though it's not clear to me that this is actually correct -- is e.g. `INT32 (Decimal(15, 2))` legal?
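To make the proposed lower-bound semantics concrete, here is a minimal stdlib-only sketch (not the actual arrow-rs code; `narrowest_decimal_width` and its signature are illustrative). It picks the narrowest decimal width that is at least the physical type's natural width and can still hold the requested precision, using the usual per-width precision caps (9, 18, 38, 76 digits for 32-, 64-, 128-, and 256-bit decimals):

```rust
/// Hypothetical helper: given a lower bound on the decimal width in bits
/// (e.g. 32 for a column stored as INT32) and the declared precision,
/// return the narrowest decimal width that satisfies both, or None if
/// the precision exceeds even a 256-bit decimal.
fn narrowest_decimal_width(min_width: u16, precision: u8) -> Option<u16> {
    // (width in bits, max decimal digits that width can hold)
    const LIMITS: [(u16, u8); 4] = [(32, 9), (64, 18), (128, 38), (256, 76)];
    LIMITS
        .iter()
        .find(|&&(width, max_precision)| width >= min_width && precision <= max_precision)
        .map(|&(width, _)| width)
}

fn main() {
    // INT32 (Decimal(7, 2)) fits in a Decimal32 and should stay there.
    assert_eq!(narrowest_decimal_width(32, 7), Some(32));
    // INT32 (Decimal(15, 2)): 15 digits exceed the 9-digit cap for
    // 32 bits, so this sketch widens to 64 bits -- this is exactly the
    // case whose legality the issue text questions.
    assert_eq!(narrowest_decimal_width(32, 15), Some(64));
    // INT64 (Decimal(20, 2)) would widen to Decimal128.
    assert_eq!(narrowest_decimal_width(64, 20), Some(128));
    // Precision beyond 76 digits has no valid width.
    assert_eq!(narrowest_decimal_width(32, 80), None);
    println!("ok");
}
```

Under this scheme the reader preserves the written width whenever the precision allows it, which is the round-trip behavior the expected-behavior section asks for.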
