albertlockett opened a new issue, #8250:
URL: https://github.com/apache/arrow-rs/issues/8250

   **Is your feature request related to a problem or challenge? Please describe 
what you are trying to do.**
   When reading columns where the Arrow data type is a dictionary-encoded 
`PrimitiveArray`, it seems like we're doing something a bit inefficient. The 
builder chooses to use a `PrimitiveArrayReader` here:
   
https://github.com/apache/arrow-rs/blob/3dcd23ffa3cbc0d9496e1660c6f68ce563a336b4/parquet/src/arrow/array_reader/primitive_array.rs#L482
   
   This reader decodes the native array and then casts it back to a 
dictionary here:
   
https://github.com/apache/arrow-rs/blob/main/parquet/src/arrow/array_reader/primitive_array.rs
   
   This seems inefficient because the full native array is materialized 
first, and a lot of its values get thrown away after the cast.
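
To make the round trip concrete, here is a minimal std-only sketch. It is not the arrow-rs code path: `materialize` and `re_encode` are hypothetical names, nulls are ignored, and it assumes the Parquet data is itself dictionary encoded (a list of distinct values plus one index per row).

```rust
use std::collections::HashMap;

/// Step 1 (hypothetical): expand dictionary-encoded data into one native
/// value per row, roughly what decoding to a flat primitive array does.
fn materialize(dictionary: &[i32], indices: &[usize]) -> Vec<i32> {
    indices.iter().map(|&i| dictionary[i]).collect()
}

/// Step 2 (hypothetical): rebuild keys and distinct values from the flat
/// array. A cast to a dictionary type has to deduplicate in some form.
fn re_encode(values: &[i32]) -> (Vec<u32>, Vec<i32>) {
    let mut seen: HashMap<i32, u32> = HashMap::new();
    let mut dict: Vec<i32> = Vec::new();
    let mut keys: Vec<u32> = Vec::with_capacity(values.len());
    for &v in values {
        // Reuse the key of a value seen before, otherwise append a new entry.
        let key = *seen.entry(v).or_insert_with(|| {
            dict.push(v);
            (dict.len() - 1) as u32
        });
        keys.push(key);
    }
    (keys, dict)
}
```

In this sketch every row is expanded to a full value and then hashed again, only to arrive back at a dictionary and a list of keys equivalent to the ones we started with.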
   
   **Describe the solution you'd like**
   It seems like it would be more efficient to decode directly into a 
dictionary array, similar to what we do for dictionaries with byte array types. 
E.g.:
   - Add a `PrimitiveArrayDictionaryReader` implementation of 
`ArrayReader`, similar to what we've done for byte arrays here 
https://github.com/apache/arrow-rs/blob/main/parquet/src/arrow/array_reader/byte_array_dictionary.rs
   - Choose the appropriate reader implementation in 
`ArrayReaderBuilder::build_primitive_reader` here 
https://github.com/apache/arrow-rs/blob/3dcd23ffa3cbc0d9496e1660c6f68ce563a336b4/parquet/src/arrow/array_reader/builder.rs#L303
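
The direct path proposed above could look roughly like the following std-only sketch. `DictColumn` and `decode_direct` are hypothetical names for illustration, not arrow-rs or parquet APIs, and the sketch assumes a single dictionary page, `i32` values, `u32` keys, and no nulls.

```rust
/// Hypothetical stand-in for a dictionary array: one key per row plus the
/// distinct values those keys point into.
#[derive(Debug, PartialEq)]
struct DictColumn {
    keys: Vec<u32>,
    values: Vec<i32>,
}

/// Keep the page's dictionary and indices as they are, checking only that
/// every index is in range. No per-row value is materialized or hashed.
fn decode_direct(page_dictionary: &[i32], page_indices: &[u32]) -> Result<DictColumn, String> {
    for (row, &idx) in page_indices.iter().enumerate() {
        if idx as usize >= page_dictionary.len() {
            return Err(format!(
                "row {}: index {} out of range for dictionary of length {}",
                row,
                idx,
                page_dictionary.len()
            ));
        }
    }
    Ok(DictColumn {
        keys: page_indices.to_vec(),
        values: page_dictionary.to_vec(),
    })
}
```

A real reader would presumably also have to cope with pages that are not dictionary encoded, with dictionaries that differ between column chunks, and with nulls, none of which this sketch attempts.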
   
   

