parthchandra opened a new issue, #7040: URL: https://github.com/apache/arrow-rs/issues/7040
**Describe the bug**

The [parquet spec](https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#unsigned-integers) says that a uint8 or uint16 value must be stored as an `int32` annotated with `INT(8, false)` or `INT(16, false)`, respectively. A file with such values gets read into an `int32` vector, and the value read may be negative. When these values are cast to the unsigned type, the cast method checks whether the value lies within the valid range for that unsigned type. Since a negative value is outside that range, the cast method will either return null or throw an error (depending on the specified cast option).

**To Reproduce**

I modified `parquet/examples/read_parquet.rs` to read columns `_9` and `_10` from the attached file.

The file schema and contents, as dumped by the parquet CLI:

Schema
```
File path:  ./alltypes_extended_plain.parquet
Created by: parquet-mr version 1.13.1 (build db4183109d5b734ec5930d870cdae161e408ddba)
Properties:
  writer.model.name: example
Schema:
message root {
  optional boolean _1;
  optional int32 _2 (INTEGER(8,true));
  optional int32 _3 (INTEGER(16,true));
  optional int32 _4;
  optional int64 _5;
  optional float _6;
  optional double _7;
  optional binary _8 (STRING);
  optional int32 _9 (INTEGER(8,false));
  optional int32 _10 (INTEGER(16,false));
  optional int32 _11 (INTEGER(32,false));
  optional int64 _12 (INTEGER(64,false));
  optional binary _13 (ENUM);
  optional fixed_len_byte_array(3) _14;
  optional int32 _15 (DECIMAL(5,2));
  optional int64 _16 (DECIMAL(18,10));
  optional fixed_len_byte_array(16) _17 (DECIMAL(38,37));
  optional int64 _18 (TIMESTAMP(MILLIS,true));
  optional int64 _19 (TIMESTAMP(MICROS,true));
  optional int32 _20 (DATE);
}
```

Values
```
{"_1": null, "_2": null, "_3": null, "_4": null, "_5": null, "_6": null, "_7": null, "_8": null, "_9": null, "_10": null, "_11": null, "_12": null, "_13": null, "_14": null, "_15": null, "_16": null, "_17": null, "_18": null, "_19": null, "_20": null}
{"_1": null, "_2": null, "_3": null, "_4": null, "_5": null, "_6": null, "_7": null, "_8": null, "_9": null, "_10": null, "_11": null, "_12": null, "_13": null, "_14": null, "_15": null, "_16": null, "_17": null, "_18": null, "_19": null, "_20": null}
{"_1": true, "_2": 18, "_3": 10002, "_4": 10002, "_5": 10002, "_6": 10002.0, "_7": 10002.0, "_8": "100021000210002100021000210002100021000210002100021000210002100021000210002100021000210002100021000210002100021000210002100021000210002100021000210002100021000210002100021000210002100021000210002100021000210002100021000210002100021000210002", "_9": -18, "_10": -10002, "_11": -10002, "_12": -10002, "_13": "10002", "_14": [50, 50, 50], "_15": 10002, "_16": 10002, "_17": [50, 50, 50, 50, 50, 50, 50, 50, 50, 50, 50, 50, 50, 50, 50, 50], "_18": 10002, "_19": 10002, "_20": 10002}
{"_1": null, "_2": null, "_3": null, "_4": null, "_5": null, "_6": null, "_7": null, "_8": null, "_9": null, "_10": null, "_11": null, "_12": null, "_13": null, "_14": null, "_15": null, "_16": null, "_17": null, "_18": null, "_19": null, "_20": null}
{"_1": true, "_2": 20, "_3": 10004, "_4": 10004, "_5": 10004, "_6": 10004.0, "_7": 10004.0, "_8": "100041000410004100041000410004100041000410004100041000410004100041000410004100041000410004100041000410004100041000410004100041000410004100041000410004100041000410004100041000410004100041000410004100041000410004100041000410004100041000410004", "_9": -20, "_10": -10004, "_11": -10004, "_12": -10004, "_13": "10004", "_14": [52, 52, 52], "_15": 10004, "_16": 10004, "_17": [52, 52, 52, 52, 52, 52, 52, 52, 52, 52, 52, 52, 52, 52, 52, 52], "_18": 10004, "_19": 10004, "_20": 10004}
{"_1": null, "_2": null, "_3": null, "_4": null, "_5": null, "_6": null, "_7": null, "_8": null, "_9": null, "_10": null, "_11": null, "_12": null, "_13": null, "_14": null, "_15": null, "_16": null, "_17": null, "_18": null, "_19": null, "_20": null}
{"_1": null, "_2": null, "_3": null, "_4": null, "_5": null, "_6": null, "_7": null, "_8": null, "_9": null, "_10": null, "_11": null, "_12": null, "_13": null, "_14": null, "_15": null, "_16": null, "_17": null, "_18": null, "_19": null, "_20": null}
{"_1": null, "_2": null, "_3": null, "_4": null, "_5": null, "_6": null, "_7": null, "_8": null, "_9": null, "_10": null, "_11": null, "_12": null, "_13": null, "_14": null, "_15": null, "_16": null, "_17": null, "_18": null, "_19": null, "_20": null}
{"_1": true, "_2": 24, "_3": 10008, "_4": 10008, "_5": 10008, "_6": 10008.0, "_7": 10008.0, "_8": "100081000810008100081000810008100081000810008100081000810008100081000810008100081000810008100081000810008100081000810008100081000810008100081000810008100081000810008100081000810008100081000810008100081000810008100081000810008100081000810008", "_9": -24, "_10": -10008, "_11": -10008, "_12": -10008, "_13": "10008", "_14": [56, 56, 56], "_15": 10008, "_16": 10008, "_17": [56, 56, 56, 56, 56, 56, 56, 56, 56, 56, 56, 56, 56, 56, 56, 56], "_18": 10008, "_19": 10008, "_20": 10008}
{"_1": null, "_2": null, "_3": null, "_4": null, "_5": null, "_6": null, "_7": null, "_8": null, "_9": null, "_10": null, "_11": null, "_12": null, "_13": null, "_14": null, "_15": null, "_16": null, "_17": null, "_18": null, "_19": null, "_20": null}
```

Results
```
+----+-----+
| _9 | _10 |
+----+-----+
|    |     |
|    |     |
|    |     |
|    |     |
|    |     |
|    |     |
|    |     |
|    |     |
|    |     |
|    |     |
+----+-----+
```

**Expected behavior**

Expect non-null values to be returned.

**Additional context**

Parquet file generated by Spark: [alltypes_extended_plain.parquet.zip](https://github.com/user-attachments/files/18577920/alltypes_extended_plain.parquet.zip)
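
To illustrate the mechanism described under "Describe the bug", here is a minimal standalone sketch in plain Rust. It does not use the arrow-rs or parquet crates, and the helper names (`checked_u8`, `wrapping_u8`, `checked_u16`, `wrapping_u16`) are hypothetical, introduced only for this example. It contrasts a range-checked conversion, which rejects a negative `int32`, with a conversion that keeps only the low 8 or 16 bits. Treating the low bits as the intended unsigned value is an assumption here, not something the issue or the spec excerpt above confirms.

```rust
// Hypothetical helpers for illustration only; these are not arrow-rs APIs.

/// Range-checked conversion: yields `None` when the physical `i32` does not
/// fit in `u8`, which mirrors a cast that produces null for out-of-range input.
fn checked_u8(physical: i32) -> Option<u8> {
    u8::try_from(physical).ok()
}

/// Truncating conversion: keeps only the low 8 bits of the physical `i32`.
/// Assumption: the low bits carry the intended unsigned value.
fn wrapping_u8(physical: i32) -> u8 {
    physical as u8
}

/// Range-checked conversion for the 16-bit case.
fn checked_u16(physical: i32) -> Option<u16> {
    u16::try_from(physical).ok()
}

/// Truncating conversion for the 16-bit case (keeps the low 16 bits).
fn wrapping_u16(physical: i32) -> u16 {
    physical as u16
}
```

With the values from the dump, `checked_u8(-18)` and `checked_u16(-10002)` both return `None`, which matches the all-null output in the results table, while `wrapping_u8(-18)` returns `238` and `wrapping_u16(-10002)` returns `55534`.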
