alamb commented on code in PR #10020:
URL: https://github.com/apache/arrow-rs/pull/10020#discussion_r3350282205
##########
parquet/src/arrow/arrow_writer/mod.rs:
##########
@@ -586,6 +586,72 @@ impl ArrowWriterOptions {
/// grows with the row group size. Supplying a factory that spills to a temp
/// file or object storage instead bounds peak write memory, decoupling it
/// from the row group size while keeping large, read-optimal row groups.
+ ///
+ /// # Example: a custom [`PageStore`]
+ ///
+ /// A store only has to map an opaque, store-allocated [`PageKey`] to a blob
+ /// and hand the blob back once. The keys need not be dense or sequential —
+ /// here a `HashMap`-backed store mints sparse handles, proving the writer
+ /// relies only on the opaque-handle contract. A real spilling backend would
+ /// write the bytes to a temp file in `put` and read them back in `take`.
+ ///
+ /// ```
+ /// # use std::collections::HashMap;
+ /// # use std::sync::Arc;
+ /// # use bytes::Bytes;
+ /// # use arrow_array::{ArrayRef, Int64Array, RecordBatch};
+ /// # use parquet::arrow::arrow_writer::{
+ /// # ArrowWriter, ArrowWriterOptions, PageKey, PageStore, PageStoreFactory,
+ /// # };
+ /// # use parquet::arrow::arrow_reader::ParquetRecordBatchReader;
+ /// # use parquet::errors::{ParquetError, Result};
+ /// #[derive(Default)]
+ /// struct MapPageStore {
+ /// blobs: HashMap<u64, Bytes>,
+ /// next: u64,
+ /// }
+ ///
+ /// impl PageStore for MapPageStore {
+ /// fn put(&mut self, value: Bytes) -> Result<PageKey> {
+ /// // Mint a sparse handle (every other integer) to show the writer
Review Comment:
A minor nit (it does not need to be done here): these comments mostly demonstrate a property of the implementation (namely, that the IDs don't need to be dense). That is cool, but it also makes the example longer than necessary and therefore possibly harder to understand.
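Without the sparse-key commentary, the in-memory store can be quite short. The following is a hypothetical, std-only sketch and not the PR's code. `BlobStore` and `MapStore` stand in for `PageStore` and `MapPageStore`, and `Vec<u8>` and `u64` replace `Bytes` and `PageKey`, because this hunk does not show the full trait (including the `take` signature).

```rust
use std::collections::HashMap;
use std::io;

/// Hypothetical stand-in for the `PageStore` contract: `put` stores a blob
/// and returns an opaque key; `take` hands that blob back exactly once.
trait BlobStore {
    fn put(&mut self, value: Vec<u8>) -> io::Result<u64>;
    fn take(&mut self, key: u64) -> io::Result<Vec<u8>>;
}

/// Keeps every blob in memory; keys come from a simple running counter.
#[derive(Default)]
struct MapStore {
    blobs: HashMap<u64, Vec<u8>>,
    next: u64,
}

impl BlobStore for MapStore {
    fn put(&mut self, value: Vec<u8>) -> io::Result<u64> {
        let key = self.next;
        self.next += 1;
        self.blobs.insert(key, value);
        Ok(key)
    }

    fn take(&mut self, key: u64) -> io::Result<Vec<u8>> {
        // `remove` returns the blob once; a second `take` of the same key errors.
        self.blobs
            .remove(&key)
            .ok_or_else(|| io::Error::new(io::ErrorKind::NotFound, "unknown or already-taken key"))
    }
}
```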
Another good example would probably be one that is actually tempfile-backed, as that is something I expect would actually get used (this can be done later, or never).
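A tempfile-backed store might look roughly like the sketch here. It is not written against the PR's real `PageStore` trait. `BlobStore`, `TempFileStore`, and the `u64` keys are hypothetical stand-ins that mirror the contract in the doc comment: `put` writes the bytes to a temp file, and `take` reads them back once. It uses only `std` (no `tempfile` crate) so that it is self-contained.

```rust
use std::collections::HashMap;
use std::fs::{self, File, OpenOptions};
use std::io::{self, Read, Seek, SeekFrom, Write};
use std::path::PathBuf;
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical stand-in for the `PageStore` contract (not the PR's trait).
trait BlobStore {
    fn put(&mut self, value: Vec<u8>) -> io::Result<u64>;
    fn take(&mut self, key: u64) -> io::Result<Vec<u8>>;
}

/// Distinguishes multiple stores created within one process.
static COUNTER: AtomicU64 = AtomicU64::new(0);

/// Appends every blob to one temp file; only (offset, len) per key stays in memory.
struct TempFileStore {
    path: PathBuf,
    file: File,
    end: u64,
    index: HashMap<u64, (u64, usize)>,
    next: u64,
}

impl TempFileStore {
    fn new(name: &str) -> io::Result<Self> {
        let n = COUNTER.fetch_add(1, Ordering::Relaxed);
        let file_name = format!("{}-{}-{}.spill", name, std::process::id(), n);
        let path = std::env::temp_dir().join(file_name);
        let file = OpenOptions::new()
            .read(true)
            .write(true)
            .create(true)
            .truncate(true)
            .open(&path)?;
        Ok(Self { path, file, end: 0, index: HashMap::new(), next: 0 })
    }
}

impl BlobStore for TempFileStore {
    fn put(&mut self, value: Vec<u8>) -> io::Result<u64> {
        // Append at the current end of the file and remember where the blob went.
        let offset = self.end;
        self.file.seek(SeekFrom::Start(offset))?;
        self.file.write_all(&value)?;
        self.end += value.len() as u64;
        let key = self.next;
        self.next += 1;
        self.index.insert(key, (offset, value.len()));
        Ok(key)
    }

    fn take(&mut self, key: u64) -> io::Result<Vec<u8>> {
        // Removing the index entry enforces "hand the blob back once".
        let (offset, len) = self.index.remove(&key).ok_or_else(|| {
            io::Error::new(io::ErrorKind::NotFound, "unknown or already-taken key")
        })?;
        self.file.seek(SeekFrom::Start(offset))?;
        let mut buf = vec![0u8; len];
        self.file.read_exact(&mut buf)?;
        Ok(buf)
    }
}

impl Drop for TempFileStore {
    fn drop(&mut self) {
        // Best-effort cleanup of the spill file; errors are ignored.
        let _ = fs::remove_file(&self.path);
    }
}
```

Only a small `(offset, len)` entry per blob stays in memory. The trade-off is one seek plus one read per `take`. A production version would probably use the `tempfile` crate for safe naming and cleanup instead of hand-built file names.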
--
This is an automated message from the Apache Git Service.