albertlockett opened a new issue, #8250: URL: https://github.com/apache/arrow-rs/issues/8250
**Is your feature request related to a problem or challenge? Please describe what you are trying to do.**

When reading columns where the Arrow data type is a dictionary-encoded `PrimitiveArray`, it seems like we're doing something a bit inefficient. The builder chooses to use a `PrimitiveArrayReader` here: https://github.com/apache/arrow-rs/blob/3dcd23ffa3cbc0d9496e1660c6f68ce563a336b4/parquet/src/arrow/array_reader/primitive_array.rs#L482

This reader decodes the native array and then casts it back to a dictionary here: https://github.com/apache/arrow-rs/blob/main/parquet/src/arrow/array_reader/primitive_array.rs

This seems inefficient because a lot of the values in the decoded array get thrown away after the cast.

**Describe the solution you'd like**

It seems like it would be more efficient to decode directly into a dictionary array, similar to what we do for dictionaries with byte array types. E.g.

- Implement a `PrimitiveArrayDictionaryReader` implementation of `ArrayReader`, similar to what we've done for byte arrays here: https://github.com/apache/arrow-rs/blob/main/parquet/src/arrow/array_reader/byte_array_dictionary.rs
- Choose the appropriate reader implementation in `ArrayReaderBuilder::build_primitive_reader` here: https://github.com/apache/arrow-rs/blob/3dcd23ffa3cbc0d9496e1660c6f68ce563a336b4/parquet/src/arrow/array_reader/builder.rs#L303

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
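
To make the request in this issue concrete, here is a minimal, std-only sketch of the two decode paths it contrasts. It is not the arrow-rs API: `DictArray`, `decode_then_cast`, `decode_direct`, and `materialize` are hypothetical names invented for illustration. It also assumes a single dictionary page, `i64` values, no nulls, and no fallback to plain encoding, all of which a real reader would have to handle.

```rust
use std::collections::HashMap;

/// Toy stand-in for a dictionary-encoded array (hypothetical type):
/// `keys[i]` is an index into `values`.
#[derive(Debug, PartialEq)]
pub struct DictArray {
    pub keys: Vec<u32>,
    pub values: Vec<i64>,
}

/// The path described in the issue: expand every row to a native value,
/// then rebuild a dictionary from the expanded values.
pub fn decode_then_cast(page_dict: &[i64], page_keys: &[u32]) -> DictArray {
    // Step 1: materialize one native value per row. This buffer is
    // dropped once the dictionary has been rebuilt.
    let native: Vec<i64> = page_keys.iter().map(|&k| page_dict[k as usize]).collect();

    // Step 2: "cast" back to a dictionary by deduplicating with a hash map.
    let mut seen: HashMap<i64, u32> = HashMap::new();
    let mut values: Vec<i64> = Vec::new();
    let mut keys: Vec<u32> = Vec::with_capacity(native.len());
    for v in native {
        let key = *seen.entry(v).or_insert_with(|| {
            values.push(v);
            (values.len() - 1) as u32
        });
        keys.push(key);
    }
    DictArray { keys, values }
}

/// The proposed path: keep the page dictionary and its keys as they are,
/// without materializing or hashing any row values.
pub fn decode_direct(page_dict: &[i64], page_keys: &[u32]) -> DictArray {
    DictArray {
        keys: page_keys.to_vec(),
        values: page_dict.to_vec(),
    }
}

/// Expand a dictionary array to plain values, to compare results logically.
pub fn materialize(array: &DictArray) -> Vec<i64> {
    array.keys.iter().map(|&k| array.values[k as usize]).collect()
}
```

Both functions yield arrays that materialize to the same logical values, but `decode_then_cast` allocates a full-length native buffer and hashes every row, while `decode_direct` only copies the keys and the dictionary. The two results may differ in key numbering, since the rebuilt dictionary is ordered by first appearance rather than by the page dictionary's order.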