Yordan Pavlov created ARROW-11410:
-------------------------------------

             Summary: [Rust][Parquet] Implement returning dictionary arrays from parquet reader
                 Key: ARROW-11410
                 URL: https://issues.apache.org/jira/browse/ARROW-11410
             Project: Apache Arrow
          Issue Type: Improvement
          Components: Rust
            Reporter: Yordan Pavlov


Currently the Rust parquet reader returns a regular array (e.g. a string
array) even when the column is dictionary encoded in the parquet file.

If the parquet reader could return dictionary arrays for dictionary encoded
columns, this would bring several benefits:
 * faster reading of dictionary encoded columns from parquet (as no
conversion/expansion into a regular array would be necessary)
 * lower memory use, as a dictionary array typically stores each distinct
value only once, plus one small integer key per row
 * faster filtering, as SIMD can be used to filter over the numeric keys of a
dictionary string array instead of comparing string values in a string array
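
To make the expansion and filtering points concrete, here is a minimal,
std-only sketch. The `DictColumn` type and its methods are hypothetical names
made up for illustration; this is not the arrow crate's dictionary array API,
and the plain loops only stand in for what a SIMD kernel would do over the
keys.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a dictionary-encoded string column: each row
/// holds a small integer key into a table of distinct values.
/// (Illustration only; not the arrow crate's dictionary array type.)
pub struct DictColumn {
    pub keys: Vec<u32>,
    pub values: Vec<String>,
}

impl DictColumn {
    /// Dictionary-encode plain strings, assigning keys in first-seen order.
    pub fn encode(rows: &[&str]) -> Self {
        let mut values: Vec<String> = Vec::new();
        let mut index: HashMap<String, u32> = HashMap::new();
        let mut keys = Vec::with_capacity(rows.len());
        for row in rows {
            let key = match index.get(*row) {
                Some(k) => *k,
                None => {
                    let k = values.len() as u32;
                    values.push(row.to_string());
                    index.insert(row.to_string(), k);
                    k
                }
            };
            keys.push(key);
        }
        DictColumn { keys, values }
    }

    /// Expand into a regular string column: one owned string per row.
    /// This is the conversion step that returning the dictionary avoids.
    pub fn expand(&self) -> Vec<String> {
        self.keys
            .iter()
            .map(|&k| self.values[k as usize].clone())
            .collect()
    }

    /// Equality filter: find the needle in the dictionary once, then
    /// compare integer keys only. No per-row string comparison happens.
    pub fn eq_mask(&self, needle: &str) -> Vec<bool> {
        match self.values.iter().position(|v| v == needle) {
            Some(pos) => {
                let wanted = pos as u32;
                self.keys.iter().map(|&k| k == wanted).collect()
            }
            // Needle is not in the dictionary, so no row can match.
            None => vec![false; self.keys.len()],
        }
    }
}
```

With this layout, `eq_mask` does a single string search over the distinct
values and then touches only the `u32` keys, whereas filtering the output of
`expand` would compare a full string for every row.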

[~nevime], [~alamb] let me know what you think.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
