alamb commented on issue #8378: URL: https://github.com/apache/arrow-rs/issues/8378#issuecomment-3312738786
I agree that sampling to determine optimal encoding parameters is likely a good feature. I am not sure how to work it into the parquet writer yet (and I am not sure it really belongs in the writer itself; it seems like it may belong at some higher level, before the encoder is created).

What I suggest is starting with something standalone, and then we can eventually decide whether we want to work it into the encoder.

For example, maybe something like this (I am not sure I like "advisor", so maybe someone can suggest a better name):

```rust
// create a structure for analyzing data and determining encoding parameters
let mut advisor = ParquetWriterAdvisor::new(schema);
// feed some batches into the advisor (user can decide how many to push here)
advisor.push_batch(&batch1);
advisor.push_batch(&batch2);
// ask advisor for recommended writer options
let suggested_properties: WriterProperties = advisor.advise();
```
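To make the standalone idea a bit more concrete, here is a minimal, dependency-free sketch of what the sampling step might look like. Everything in it is hypothetical and exists only for illustration: `SamplingAdvisor`, `SampleBatch`, `ColumnAdvice`, and the `max_distinct_ratio` threshold are not part of the parquet crate, plain `i64` columns stand in for Arrow arrays, and the only decision made is whether dictionary encoding looks worthwhile based on the fraction of distinct values seen in the sample.

```rust
use std::collections::{BTreeMap, HashSet};

/// Hypothetical per-column recommendation (not a parquet crate type).
#[derive(Debug, Clone, PartialEq)]
pub struct ColumnAdvice {
    pub dictionary_enabled: bool,
}

/// Hypothetical stand-in for a record batch: column name -> sampled values.
pub type SampleBatch = BTreeMap<String, Vec<i64>>;

/// Running statistics collected for one column across all pushed batches.
#[derive(Default)]
struct ColumnStats {
    rows: usize,
    distinct: HashSet<i64>,
}

/// Hypothetical advisor: accumulates samples, then suggests settings.
pub struct SamplingAdvisor {
    /// Assumed heuristic: recommend a dictionary when
    /// distinct / rows is at or below this ratio.
    max_distinct_ratio: f64,
    stats: BTreeMap<String, ColumnStats>,
}

impl SamplingAdvisor {
    pub fn new(max_distinct_ratio: f64) -> Self {
        Self {
            max_distinct_ratio,
            stats: BTreeMap::new(),
        }
    }

    /// Feed one sampled batch; the caller decides how many to push.
    pub fn push_batch(&mut self, batch: &SampleBatch) {
        for (name, values) in batch {
            let stats = self.stats.entry(name.clone()).or_default();
            stats.rows += values.len();
            stats.distinct.extend(values.iter().copied());
        }
    }

    /// Produce one recommendation per column seen so far.
    pub fn advise(&self) -> BTreeMap<String, ColumnAdvice> {
        self.stats
            .iter()
            .map(|(name, stats)| {
                // Treat a column with no sampled rows as fully distinct.
                let ratio = if stats.rows == 0 {
                    1.0
                } else {
                    stats.distinct.len() as f64 / stats.rows as f64
                };
                let advice = ColumnAdvice {
                    dictionary_enabled: ratio <= self.max_distinct_ratio,
                };
                (name.clone(), advice)
            })
            .collect()
    }
}
```

In a real version, the per-column result would presumably be translated into a `WriterProperties` value through its builder rather than returned as a map, and the heuristic would need to look at more than distinct counts (value widths, sortedness, and so on).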