tustvold opened a new issue #1032:
URL: https://github.com/apache/arrow-rs/issues/1032


   **Is your feature request related to a problem or challenge? Please describe 
what you are trying to do.**
   
   The current API for the parquet crate is rather large, and exposes quite a 
lot of implementation detail. 
   
   This has a couple of implications:
   
   * It complicates iterating on the crate without making breaking changes to 
public APIs
   * It adds to users' cognitive load, as they have to work out which APIs to use
   
   Some examples of this:
   
   * The `util` module contains all sorts of random stuff - a hash 
implementation, maths functions, memory tracking, etc...
   * The `compression` module
   * `data_type::AsBytes`, `data_type::SliceAsBytes`, 
`data_type::SliceAsBytesDataType`
   * `data_type::DataType`, `ColumnReaderImpl`, `RecordReader`
   * `schema::types::to_thrift`
   
   **Describe the solution you'd like**
   
   I'm not familiar enough with the design of the crate to authoritatively 
weigh in on what should or shouldn't be public, however, it is my observation 
that a number of the APIs don't appear to be optimised for external consumption.
   
   My **personal** preference would be to make everything below the 
file-level APIs (i.e. `SerializedFileReader`, `ParquetFileArrowReader`, 
`RowIter`) crate-local. This would have the benefit of being pretty 
unambiguous and easy to communicate and maintain.
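
   To illustrate what "crate-local" could look like in practice, here is a 
minimal sketch. The module and type names (`util`, `file`, 
`ExampleFileReader`, `ceil_div`) are hypothetical and only stand in for the 
idea; this is not the parquet crate's actual layout:

```rust
// Hypothetical layout for illustration only, not the parquet crate's source.

// Implementation detail: the module is private and its items are
// `pub(crate)`, so they are usable anywhere inside the crate but are not
// part of the public API.
mod util {
    /// Integer division rounding up (assumed helper, for illustration).
    pub(crate) fn ceil_div(value: usize, divisor: usize) -> usize {
        (value + divisor - 1) / divisor
    }
}

// File-level API: the only surface external users would see.
pub mod file {
    use super::util;

    /// Hypothetical file-level reader.
    pub struct ExampleFileReader {
        num_rows: usize,
        rows_per_group: usize,
    }

    impl ExampleFileReader {
        pub fn new(num_rows: usize, rows_per_group: usize) -> Self {
            assert!(rows_per_group > 0, "rows_per_group must be non-zero");
            Self { num_rows, rows_per_group }
        }

        /// Public method built on top of a crate-local helper.
        pub fn num_row_groups(&self) -> usize {
            util::ceil_div(self.num_rows, self.rows_per_group)
        }
    }
}
```

   With this shape, a downstream crate can call 
`file::ExampleFileReader::num_row_groups`, but cannot name `util::ceil_div`, 
so the helper can be changed or removed without a breaking release.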
   
   This change would obviously need to be made in a major arrow release, the 
next of which I believe is in January 2022 (@alamb could maybe confirm). I 
don't know if there are people making use of the lower-level APIs operating on 
columns, row groups, column chunks, pages, etc. However, any APIs could be made 
public again in a point-release based on user feedback.
   
   I think this touches on the objectives for the crate: is the intent to 
provide APIs for manipulating parquet files, or APIs for implementing parquet 
readers and writers for your own custom in-memory format? If the latter, this 
change would be at odds with it, but I'm not sure that is the case.



