mapleFU commented on PR #35825: URL: https://github.com/apache/arrow/pull/35825#issuecomment-1592255125
@arthurpassos Let's go through it from the bottom up:

1. Encoder/Decoder: there is a separate encoder for each physical type, and encoders can accept either Arrow arrays or raw value arrays. They work on leaf columns only.
2. PageReader/PageWriter: these handle "pages". A page is independent of the encoding; these classes just read and write pages.
3. ColumnWriter/ColumnReader: the value writer/reader wrappers. They wrap the Encoder/Decoder together with extra logic such as statistics and dictionary fallback. The reader holds a `PageReader` and the writer holds a `PageWriter`, and like the encoders they work on leaf columns only.
4. RecordReader: a leaf column may be repeated or optional, or even nested. So if there are 1000 rows, the number of leaf values might be 1000, 2000, or even 100000. `RecordReader` wraps the `ColumnReader` so that it can read in units of "rows" (records).
5. `parquet::arrow::ColumnReader`: hey, another `ColumnReader`! Note the namespaces: in Parquet we have `parquet::` and `parquet::arrow::`. The `parquet::arrow::` layer assembles the records read by `parquet::` into Arrow data structures (and, on the write path, disassembles Arrow data back into them).
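To make point 4 concrete, here is a minimal sketch of why rows and leaf values differ. It assumes the Dremel-style definition/repetition levels from the Parquet format and a simple `repeated int32 xs;` column (max definition level 1, max repetition level 1). All names here (`LeafBatch`, `LevelStats`, `CountLevels`, `AssembleRows`) are hypothetical and are not the parquet-cpp API; the sketch only illustrates the grouping a record-level reader performs on top of a leaf-column reader.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical, simplified model; NOT the parquet-cpp API.
// A leaf-column reader yields (definition level, repetition level) pairs
// plus the non-null values; a record reader groups them into rows.
struct LeafBatch {
  std::vector<int16_t> def_levels;  // one entry per level slot
  std::vector<int16_t> rep_levels;  // one entry per level slot
  std::vector<int32_t> values;      // only slots with def == max_def carry a value
};

struct LevelStats {
  std::size_t num_records = 0;        // rows in the table
  std::size_t num_level_entries = 0;  // (def, rep) pairs read from the leaf column
  std::size_t num_values = 0;         // non-null leaf values
};

// Counts rows vs. leaf values for a leaf column, given its max definition level.
LevelStats CountLevels(const LeafBatch& b, int16_t max_def_level) {
  assert(b.def_levels.size() == b.rep_levels.size());
  LevelStats s;
  s.num_level_entries = b.rep_levels.size();
  for (std::size_t i = 0; i < b.rep_levels.size(); ++i) {
    if (b.rep_levels[i] == 0) ++s.num_records;             // rep level 0 starts a new record
    if (b.def_levels[i] == max_def_level) ++s.num_values;  // fully defined slot holds a value
  }
  return s;
}

// Assembles rows for the assumed schema `repeated int32 xs;`
// (max_def_level = 1, max_rep_level = 1). def level 0 means an empty list.
std::vector<std::vector<int32_t>> AssembleRows(const LeafBatch& b) {
  assert(b.def_levels.size() == b.rep_levels.size());
  std::vector<std::vector<int32_t>> rows;
  std::size_t next_value = 0;
  for (std::size_t i = 0; i < b.rep_levels.size(); ++i) {
    if (b.rep_levels[i] == 0) {
      rows.emplace_back();  // begin a new record
    } else if (rows.empty()) {
      throw std::runtime_error("batch must start at a record boundary (rep level 0)");
    }
    if (b.def_levels[i] == 1) {
      if (next_value >= b.values.size()) throw std::runtime_error("too few values");
      rows.back().push_back(b.values[next_value++]);
    }
  }
  return rows;
}
```

For example, the three rows `[1, 2, 3]`, `[]` and `[4]` are stored as five level entries (def `1 1 1 0 1`, rep `0 1 1 0 0`) and four values, so a reader asked for 3 records has to consume 5 level entries. That mismatch is exactly what a record-level wrapper around the leaf-column reader has to track.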
