mapleFU commented on PR #38390:
URL: https://github.com/apache/arrow/pull/38390#issuecomment-1787513972

   Parquet is split into two parts in this library:
   1. `parquet`: the core part of Parquet. It mostly handles Parquet's own 
types, such as the physical and logical types the format defines.
   2. `parquet/arrow`: the Arrow part of Parquet. It reads and writes Arrow 
arrays and maintains some additional Arrow-related information.
   
   The writer has the structure below:
   
   ```
   parquet::arrow::FileWriter (holds a parquet::ParquetFileWriter and some 
Arrow-related schema/metadata)
   parquet::ParquetFileWriter (holds the file metadata and at most one active 
row-group writer)
   - parquet::FileSerializer extends parquet::ParquetFileWriter::Contents
   parquet::RowGroupWriter (holds some ColumnWriters)
   - parquet::RowGroupSerializer extends parquet::RowGroupWriter::Contents
   ColumnWriter
   - may hold an encoder and write multiple pages
   ```
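
To make the layering concrete, here is a minimal toy model of that hierarchy. It is only a sketch: the class names mirror the real ones, but everything in the `sketch` namespace is hypothetical, does no real encoding or I/O, and `FileMetadata`, `kPageSize` and `WriteDemo` are made up for illustration. It shows the `Contents` delegation, the "at most one active row group" rule, and a column writer emitting several pages.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

namespace sketch {  // toy model, NOT the real parquet:: classes

struct FileMetadata {  // hypothetical, only counts things
  int num_row_groups = 0;
  std::size_t num_pages = 0;
};

// Lowest level: buffers values and emits a page whenever the buffer fills up.
class ColumnWriter {
 public:
  void WriteBatch(const std::vector<std::int64_t>& values) {
    for (std::int64_t v : values) {
      page_.push_back(v);
      if (page_.size() == kPageSize) FlushPage();
    }
  }
  void Close() { if (!page_.empty()) FlushPage(); }  // safe to call twice
  std::size_t num_pages() const { return num_pages_; }
 private:
  static constexpr std::size_t kPageSize = 2;  // tiny on purpose
  void FlushPage() { ++num_pages_; page_.clear(); }
  std::vector<std::int64_t> page_;
  std::size_t num_pages_ = 0;
};

// Public row-group handle; all work is delegated to a Contents implementation.
class RowGroupWriter {
 public:
  class Contents {
   public:
    virtual ~Contents() = default;
    virtual ColumnWriter* NextColumn() = 0;
    virtual void Close() = 0;
    virtual std::size_t num_pages() const = 0;
  };
  explicit RowGroupWriter(std::unique_ptr<Contents> c) : contents_(std::move(c)) {}
  ColumnWriter* NextColumn() { return contents_->NextColumn(); }
  void Close() { contents_->Close(); }
  std::size_t num_pages() const { return contents_->num_pages(); }
 private:
  std::unique_ptr<Contents> contents_;
};

class RowGroupSerializer : public RowGroupWriter::Contents {
 public:
  ColumnWriter* NextColumn() override {
    if (!columns_.empty()) columns_.back()->Close();  // finish previous column
    columns_.push_back(std::make_unique<ColumnWriter>());
    return columns_.back().get();
  }
  void Close() override { if (!columns_.empty()) columns_.back()->Close(); }
  std::size_t num_pages() const override {
    std::size_t n = 0;
    for (const auto& c : columns_) n += c->num_pages();
    return n;
  }
 private:
  std::vector<std::unique_ptr<ColumnWriter>> columns_;
};

// Public file handle; again a thin wrapper around Contents.
class ParquetFileWriter {
 public:
  class Contents {
   public:
    virtual ~Contents() = default;
    virtual RowGroupWriter* AppendRowGroup() = 0;
    virtual void Close() = 0;
    virtual FileMetadata metadata() const = 0;
  };
  explicit ParquetFileWriter(std::unique_ptr<Contents> c) : contents_(std::move(c)) {}
  RowGroupWriter* AppendRowGroup() { return contents_->AppendRowGroup(); }
  void Close() { contents_->Close(); }
  FileMetadata metadata() const { return contents_->metadata(); }
 private:
  std::unique_ptr<Contents> contents_;
};

class FileSerializer : public ParquetFileWriter::Contents {
 public:
  RowGroupWriter* AppendRowGroup() override {
    CloseActiveRowGroup();  // at most one row group is active at a time
    active_ = std::make_unique<RowGroupWriter>(std::make_unique<RowGroupSerializer>());
    ++metadata_.num_row_groups;
    return active_.get();
  }
  void Close() override { CloseActiveRowGroup(); }
  FileMetadata metadata() const override { return metadata_; }
 private:
  void CloseActiveRowGroup() {
    if (!active_) return;
    active_->Close();
    metadata_.num_pages += active_->num_pages();
    active_.reset();
  }
  std::unique_ptr<RowGroupWriter> active_;
  FileMetadata metadata_;
};

// Writes two row groups and returns the resulting (toy) metadata.
inline FileMetadata WriteDemo() {
  ParquetFileWriter writer(std::make_unique<FileSerializer>());
  RowGroupWriter* rg = writer.AppendRowGroup();
  assert(rg != nullptr);
  rg->NextColumn()->WriteBatch({1, 2, 3});  // one full page + one partial page
  rg->NextColumn()->WriteBatch({4});        // one partial page
  rg = writer.AppendRowGroup();             // implicitly closes the first row group
  rg->NextColumn()->WriteBatch({5, 6});     // exactly one full page
  writer.Close();
  return writer.metadata();
}

}  // namespace sketch
```

Calling `sketch::WriteDemo()` yields metadata with 2 row groups and 4 pages. Note that the pointer returned by the first `AppendRowGroup()` is no longer valid once the second row group is appended, which is the toy equivalent of "at most one active row-group writer".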
   
   Also, there are `buffered` and `unbuffered` variants of RowGroupWriter. You 
can go through the documentation for the details.
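
As a rough illustration of the difference (a toy sketch, not the real API): as far as I know, the unbuffered row group from `ParquetFileWriter::AppendRowGroup()` is written column by column via `NextColumn()` because data goes straight to the sink, while the buffered one from `AppendBufferedRowGroup()` lets you reach any column via `column(i)` and holds the data in memory until the row group is closed. The classes `UnbufferedRowGroup` and `BufferedRowGroup` and the function `WriteInterleaved` below are hypothetical and use a `std::string` as the sink.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

namespace sketch {  // toy model, NOT the real parquet:: classes

// Unbuffered: data is appended to the sink immediately, so once a later
// column has been started, an earlier column can no longer be written.
class UnbufferedRowGroup {
 public:
  explicit UnbufferedRowGroup(std::string* sink) : sink_(sink) {}
  void Write(std::size_t column, const std::string& data) {
    if (column < current_) throw std::logic_error("column already finished");
    current_ = column;
    *sink_ += data;
  }
 private:
  std::string* sink_;
  std::size_t current_ = 0;
};

// Buffered: each column accumulates in memory, so columns can be written
// in any order; everything is emitted in column order on Close().
class BufferedRowGroup {
 public:
  BufferedRowGroup(std::string* sink, std::size_t num_columns)
      : sink_(sink), buffers_(num_columns) {}
  void Write(std::size_t column, const std::string& data) {
    buffers_.at(column) += data;
  }
  void Close() {
    for (auto& b : buffers_) {
      *sink_ += b;
      b.clear();
    }
  }
 private:
  std::string* sink_;
  std::vector<std::string> buffers_;
};

// Interleaves writes to two columns through the buffered variant.
inline std::string WriteInterleaved() {
  std::string sink;
  BufferedRowGroup rg(&sink, 2);
  rg.Write(0, "a0");
  rg.Write(1, "b0");
  rg.Write(0, "a1");
  rg.Write(1, "b1");
  rg.Close();
  return sink;
}

}  // namespace sketch
```

Here `sketch::WriteInterleaved()` returns `"a0a1b0b1"`: the interleaved writes end up grouped per column. The same interleaving through `UnbufferedRowGroup` would throw on the third write. The trade-off is memory: the buffered variant keeps a whole row group in memory before anything reaches the sink.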


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
