mapleFU commented on PR #38390: URL: https://github.com/apache/arrow/pull/38390#issuecomment-1787513972
Parquet support in this library consists of two parts:

1. `parquet`: the core part of Parquet. It mostly handles Parquet-defined types, such as physical types and logical types.
2. `parquet/arrow`: the Arrow part of Parquet. It reads and writes Arrow arrays and maintains some additional Arrow-related information, such as schema and metadata.

The writer has the following structure:

```
parquet::arrow::FileWriter (holds a parquet::ParquetFileWriter and some Arrow-related schema/metadata)
parquet::ParquetFileWriter (holds the file metadata and at most one active row-group writer)
  - parquet::FileSerializer extends parquet::ParquetFileWriter::Contents
parquet::RowGroupWriter (holds some ColumnWriters)
  - parquet::RowGroupSerializer extends parquet::RowGroupWriter::Contents
parquet::ColumnWriter
  - may hold an encoder and write multiple pages
```

There are also `buffered` and `unbuffered` RowGroupWriters (created by `ParquetFileWriter::AppendBufferedRowGroup()` and `ParquetFileWriter::AppendRowGroup()`, respectively); the documentation explains the difference.
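To make the layering a bit more concrete, here is a toy C++ model of it. It is a sketch under loud assumptions, not the real parquet-cpp code: every class name (`ToyColumnWriter`, `ToyRowGroupWriter`, `ToyParquetFileWriter`, `ToyArrowFileWriter`) is hypothetical, the page size of 4 values is arbitrary, and "file metadata" is reduced to a page count per row group. The real writers also deal with encoding, compression, and the on-disk layout, all of which are omitted here.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Toy model of the writer layering above. Every class name is hypothetical;
// this is NOT the real Arrow/Parquet C++ API.

// "ColumnWriter": cuts a page every kPageSize values, so a column chunk may hold several pages.
class ToyColumnWriter {
 public:
  static constexpr std::size_t kPageSize = 4;  // hypothetical page size, in values
  void Write(const std::vector<int>& values) {
    for (int v : values) {
      page_.push_back(v);
      if (page_.size() == kPageSize) FlushPage();
    }
  }
  void Close() { if (!page_.empty()) FlushPage(); }  // flush a partial last page
  std::size_t num_pages() const { return pages_.size(); }
 private:
  void FlushPage() { pages_.push_back(page_); page_.clear(); }
  std::vector<int> page_;
  std::vector<std::vector<int>> pages_;
};

// "RowGroupWriter": one column writer per column. Unbuffered mode hands columns out
// strictly in order via NextColumn(); buffered mode allows any order via column(i).
class ToyRowGroupWriter {
 public:
  ToyRowGroupWriter(std::size_t num_columns, bool buffered)
      : columns_(num_columns), buffered_(buffered) { assert(num_columns > 0); }
  ToyColumnWriter* NextColumn() {
    if (buffered_) throw std::logic_error("buffered row group: use column(i)");
    if (next_ >= columns_.size()) throw std::out_of_range("no more columns");
    if (next_ > 0) columns_[next_ - 1].Close();  // finish the previous column first
    return &columns_[next_++];
  }
  ToyColumnWriter* column(std::size_t i) {
    if (!buffered_) throw std::logic_error("unbuffered row group: use NextColumn()");
    return &columns_.at(i);
  }
  void Close() { for (auto& c : columns_) c.Close(); }
  std::size_t total_pages() const {
    std::size_t n = 0;
    for (const auto& c : columns_) n += c.num_pages();
    return n;
  }
 private:
  std::vector<ToyColumnWriter> columns_;
  bool buffered_;
  std::size_t next_ = 0;
};

// "ParquetFileWriter": file-level metadata (here: pages per row group) plus at most
// one active row-group writer; appending a new row group finalizes the old one.
class ToyParquetFileWriter {
 public:
  explicit ToyParquetFileWriter(std::size_t num_columns) : num_columns_(num_columns) {}
  ToyRowGroupWriter* AppendRowGroup(bool buffered) {
    CloseActiveRowGroup();
    active_ = std::make_unique<ToyRowGroupWriter>(num_columns_, buffered);
    return active_.get();
  }
  void Close() { CloseActiveRowGroup(); }
  const std::vector<std::size_t>& pages_per_row_group() const { return metadata_; }
 private:
  void CloseActiveRowGroup() {
    if (!active_) return;
    active_->Close();
    metadata_.push_back(active_->total_pages());
    active_.reset();
  }
  std::size_t num_columns_;
  std::unique_ptr<ToyRowGroupWriter> active_;
  std::vector<std::size_t> metadata_;  // stand-in for the file metadata
};

// Arrow-side layer: keeps an Arrow schema string and writes each table
// (one vector per column) as one unbuffered row group.
class ToyArrowFileWriter {
 public:
  ToyArrowFileWriter(std::string schema, std::size_t num_columns)
      : schema_(std::move(schema)), writer_(num_columns) {}
  void WriteTable(const std::vector<std::vector<int>>& columns) {
    ToyRowGroupWriter* rg = writer_.AppendRowGroup(/*buffered=*/false);
    for (const auto& values : columns) rg->NextColumn()->Write(values);
  }
  void Close() { writer_.Close(); }
  const std::string& schema() const { return schema_; }
  const ToyParquetFileWriter& parquet_writer() const { return writer_; }
 private:
  std::string schema_;
  ToyParquetFileWriter writer_;
};
```

For example, constructing `ToyArrowFileWriter writer("a: int32, b: int32", 2)`, writing `{{1, 2, 3, 4, 5}, {6, 7, 8, 9, 10}}` and then `{{11}, {12}}` with `WriteTable`, and calling `Close()` leaves `pages_per_row_group()` equal to `{4, 2}`: each five-value column spills into two pages, and each one-value column fills one page. Finalizing the previous row group inside `AppendRowGroup` is how this toy enforces "at most one active row-group writer"; the buffered mode keeps every column open until `Close()`, which is what lets it accept columns in any order.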
