alamb commented on a change in pull request #797: URL: https://github.com/apache/arrow-rs/pull/797#discussion_r715711686
##########
File path: parquet/src/arrow/mod.rs
##########
@@ -79,8 +52,57 @@
 //!     writer.write(&batch).expect("Writing batch");
 //! }
 //! writer.close().unwrap();
+//! ```
+
+//! `WriterProperties` can be used to set several configuration options
+//! ```rust, no_run
+//! use parquet::basic::{ Compression, Encoding };
+//! // File compression
+//! let props = WriterProperties::builder()
+//!     .set_compression(Compression::SNAPPY)
+//!     .build();
+//! // Max row group size compression
+//! let props = WriterProperties::builder()
+//!     .set_max_row_group_size(100)
+//!     .build();
+//! // File encoding
+//! let props = WriterProperties::builder()
+//!     .set_encoding(Encoding::RLE)
+//!     .build();
+//! // Parquet Version
+//! let props = WriterProperties::builder()
+//!     .set_writer_version(WriterVersion::PARQUET_1_0)
+//!     .build();
+//! ```
+//!
+//! # Example of reading parquet file into arrow record batch
+//!
+//! ```rust, no_run
+//! use arrow::record_batch::RecordBatchReader;
+//! use parquet::file::reader::SerializedFileReader;
+//! use parquet::arrow::{ParquetFileArrowReader, ArrowReader};
+//! use std::sync::Arc;
+//! use std::fs::File;
 //!
+//! let file = File::open("data.parquet").unwrap();
+//! let file_reader = SerializedFileReader::new(file).unwrap();

Review comment:

> Perhaps I am missing something. My understanding is that part of the code in the reader example is meant to demonstrate reading an on disk parquet file - hence the need to use SerializedFileReader. Is this understanding correct?

I am probably confused. I was imagining that the example for ~`SerializedFileReader`~ `ArrowReader` would demonstrate reading a parquet file, and that a (separate) example for ~`SerializedFileWriter`~ `ArrowWriter` would demonstrate writing a `RecordBatch` (created somehow) to a file (as I think that is the common use case).

> Assuming thats the case, can you just confirm that try_from_iter is the preferred approach to creating a record batch over try_new?

I don't think one is preferred over the other. I find the code to create `RecordBatch`es from `try_from_iter` is slightly shorter but they both do the same thing so I think either is fine.