[GitHub] [arrow] mapleFU commented on pull request #35825: GH-32723: [C++][Parquet] Add option to use LARGE* variants of binary types

via GitHub Wed, 14 Jun 2023 19:41:01 -0700


mapleFU commented on PR #35825:
URL: https://github.com/apache/arrow/pull/35825#issuecomment-1592255125


   @arthurpassos Let's go through it from the bottom up:
   
   1. Encoder/Decoder: a different encoder per physical type, which may accept 
Arrow data or a plain array. Only for leaf columns.
   2. PageReader/PageWriter: handles a "Page". A page is independent of the 
encoding; these just read and write pages.
   3. ColumnWriter/ColumnReader: the values writer/reader wrapper. It wraps the 
logic around the Encoder/Decoder, including statistics and dictionary 
fallback. It holds the `PageReader` and `PageWriter`, and is only for leaf 
columns.
   4. RecordReader: a leaf column may be repeated or optional, even nested. So 
if there are 1000 rows, the number of leaf values might be 1000, 2000, or even 
100000. `RecordReader` wraps the `ColumnReader` and reads in units of "rows".
   5. `parquet::arrow::ColumnReader`: Hey, another `ColumnReader`! You may have 
noticed the namespaces. In Parquet we have `parquet::` and 
`parquet::arrow::`; the `parquet::arrow::` part assembles and disassembles the 
records in `parquet::` to and from Arrow data structures.
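
   To make point 4 concrete, here is a minimal standalone sketch of why the
number of rows and the number of leaf values can differ. It is a toy model
only: `LevelCounts` and `CountRecordsAndValues` are hypothetical names, not
part of the Arrow/Parquet C++ API. It assumes the usual Parquet convention
that a repetition level of 0 starts a new record and that a leaf value is
stored only where the definition level equals the column's maximum.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy model only: these names are hypothetical and are not part of the
// Arrow/Parquet C++ API.
struct LevelCounts {
  std::size_t records = 0;      // number of rows
  std::size_t leaf_values = 0;  // number of leaf values physically stored
};

// Counts rows and stored leaf values for one leaf column.
// Assumption: a repetition level of 0 marks the start of a new record, and a
// value is present only when its definition level equals max_def_level.
LevelCounts CountRecordsAndValues(const std::vector<std::int16_t>& rep_levels,
                                  const std::vector<std::int16_t>& def_levels,
                                  std::int16_t max_def_level) {
  // Both level streams carry exactly one entry per level slot.
  assert(rep_levels.size() == def_levels.size());
  LevelCounts counts;
  for (std::size_t i = 0; i < rep_levels.size(); ++i) {
    if (rep_levels[i] == 0) ++counts.records;
    if (def_levels[i] == max_def_level) ++counts.leaf_values;
  }
  return counts;
}
```

   For a `repeated int32` column holding the three rows `[1, 2, 3]`, `[]` and
`[4]`, the repetition levels would be `{0, 1, 1, 0, 0}` and the definition
levels `{1, 1, 1, 0, 1}` with a maximum definition level of 1, so the sketch
counts 3 records but 4 leaf values.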
   


