tustvold opened a new issue, #1655:
URL: https://github.com/apache/arrow-rs/issues/1655

   **Is your feature request related to a problem or challenge? Please describe 
what you are trying to do.**
   
   Currently `ArrayBuilderContext` has multiple responsibilities:
   
   * Parquet -> Arrow schema conversion
   * Constructing the necessary ArrayBuilders
   * Projection pushdown
   
   The result is not only immensely confusing but also:
   
   * Overlaps with code in `parquet_to_arrow_schema_by_columns`
   * Hard to test - #1484 
   * Potentially inconsistent - #1652
   * Buggy - #1654
   
   **Describe the solution you'd like**
   
   Create an `ArrowSchemaConverter` which takes a `FileMetaData` and an 
optional column projection and returns a `ParquetField`, where:
   
   ```rust
   struct ParquetField {
       rep_level: i16,
       def_level: i16,
       arrow_type: DataType,
       parquet_type: TypePtr,
       leaf_idx: Option<usize>,
       children: Vec<ParquetField>,
   }
   ```
   
   This can then easily be used to generate the Schema or ArrayReader for the 
projected columns, replacing the existing logic.
   
   As FileMetaData can easily be created, this should be significantly easier 
to test than the current logic.
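
As a rough illustration of how such a tree might be consumed, the sketch below walks a `ParquetField` to collect the leaf column indices that a projected read would need. It is only a sketch under stated assumptions: `DataType` is a small stand-in for the arrow type, `parquet_type` is left out so the code does not depend on the parquet crate, and `leaf_indices` and `example_root` are hypothetical helpers, not part of the proposal.

```rust
// Stand-in for arrow's `DataType` (assumption: only the variants needed here).
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int32,
    Utf8,
    Struct(Vec<DataType>),
}

// The proposed struct, minus `parquet_type`, which is omitted so the
// sketch compiles without the parquet crate.
#[allow(dead_code)]
#[derive(Debug, Clone)]
struct ParquetField {
    rep_level: i16,
    def_level: i16,
    arrow_type: DataType,
    leaf_idx: Option<usize>,
    children: Vec<ParquetField>,
}

impl ParquetField {
    // Hypothetical helper: depth-first list of the leaf column indices
    // found at or below this field.
    fn leaf_indices(&self) -> Vec<usize> {
        let mut out: Vec<usize> = self.leaf_idx.into_iter().collect();
        for child in &self.children {
            out.extend(child.leaf_indices());
        }
        out
    }
}

// Hypothetical example: an optional struct whose projection kept leaf
// columns 0 and 2 and skipped leaf column 1.
fn example_root() -> ParquetField {
    let leaf = |idx: usize, arrow_type: DataType| ParquetField {
        rep_level: 0,
        def_level: 2,
        arrow_type,
        leaf_idx: Some(idx),
        children: vec![],
    };
    let children = vec![leaf(0, DataType::Int32), leaf(2, DataType::Utf8)];
    ParquetField {
        rep_level: 0,
        def_level: 1,
        // The struct's arrow type is derived from the projected children only.
        arrow_type: DataType::Struct(
            children.iter().map(|c| c.arrow_type.clone()).collect(),
        ),
        leaf_idx: None,
        children,
    }
}
```

Because both the schema and the set of leaf columns to read would be derived from the same tree in this sketch, the two cannot disagree about which columns were projected.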
   
   **Describe alternatives you've considered**
   
   Some of the bugs can be worked around manually but the code is getting 
increasingly difficult to reason about, and I think it has reached a point 
where we need to spend some time to refactor it.
   
   **Additional context**
   
   https://github.com/apache/arrow-rs/issues/1654
   https://github.com/apache/arrow-rs/issues/1652
   https://github.com/apache/arrow-rs/issues/1459
   


