alamb commented on issue #8378:
URL: https://github.com/apache/arrow-rs/issues/8378#issuecomment-3312738786

   I agree that sampling to determine optimal encoding parameters is likely a 
good feature.
   
   I am not sure how to work it into the parquet writer yet (and I am not sure 
it really belongs in the writer itself; it seems like it may belong at some 
higher level, before the encoder is created).
   
   What I suggest is starting with something standalone, and then we can 
eventually decide if we want to work it into the encoder.
   
   For example, maybe something like this (I am not sure I like "advisor", so 
maybe someone can suggest a better name):
   
   ```rust
   // create a structure for analyzing data and determining writer settings
   let mut advisor = ParquetWriterAdvisor::new(schema);
   // feed some batches into the advisor (user can decide how many to push here)
   advisor.push_batch(&batch1); 
   advisor.push_batch(&batch2); 
   // ask advisor for recommended writer options
   let suggested_properties: WriterProperties = advisor.advise();
   ```
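
   To make the shape of such a standalone helper a bit more concrete, here is a rough, std-only sketch. Everything in it is hypothetical: `ParquetWriterAdvisor` does not exist in the parquet crate, `Batch` is a simplified stand-in for an Arrow `RecordBatch`, `ColumnAdvice` stands in for whatever would eventually be mapped onto `WriterProperties`, and the distinct-value-ratio heuristic is just one assumption about what "sampling" could measure.

```rust
use std::collections::{BTreeMap, HashSet};

// Hypothetical stand-in for an Arrow RecordBatch: column name -> string values.
type Batch = BTreeMap<String, Vec<String>>;

// Hypothetical per-column recommendation; a real version would presumably be
// translated into WriterProperties settings.
#[derive(Debug, Clone, PartialEq)]
struct ColumnAdvice {
    dictionary_enabled: bool,
}

// Statistics accumulated per column across all sampled batches.
#[derive(Default)]
struct ColumnStats {
    rows: usize,
    distinct: HashSet<String>,
}

// Hypothetical advisor: it lives outside the writer and only observes samples.
struct ParquetWriterAdvisor {
    // Assumed heuristic: suggest dictionary encoding when
    // distinct values / rows is at or below this ratio.
    max_distinct_ratio: f64,
    stats: BTreeMap<String, ColumnStats>,
}

impl ParquetWriterAdvisor {
    fn new(max_distinct_ratio: f64) -> Self {
        Self { max_distinct_ratio, stats: BTreeMap::new() }
    }

    // Feed one sampled batch; the caller decides how many batches to push.
    fn push_batch(&mut self, batch: &Batch) {
        for (name, values) in batch {
            let col = self.stats.entry(name.clone()).or_default();
            col.rows += values.len();
            col.distinct.extend(values.iter().cloned());
        }
    }

    // Produce one recommendation per column seen so far.
    fn advise(&self) -> BTreeMap<String, ColumnAdvice> {
        self.stats
            .iter()
            .map(|(name, col)| {
                // Columns with no rows are treated as fully distinct.
                let ratio = if col.rows == 0 {
                    1.0
                } else {
                    col.distinct.len() as f64 / col.rows as f64
                };
                let advice = ColumnAdvice {
                    dictionary_enabled: ratio <= self.max_distinct_ratio,
                };
                (name.clone(), advice)
            })
            .collect()
    }
}

// Small usage example: one low-cardinality and one high-cardinality column.
fn example_usage() -> BTreeMap<String, ColumnAdvice> {
    let to_col = |vals: &[&str]| vals.iter().map(|v| v.to_string()).collect::<Vec<_>>();
    let mut batch = Batch::new();
    batch.insert("country".to_string(), to_col(&["US", "US", "DE", "US"]));
    batch.insert("id".to_string(), to_col(&["a1", "b2", "c3", "d4"]));

    let mut advisor = ParquetWriterAdvisor::new(0.5);
    advisor.push_batch(&batch);
    advisor.advise()
}
```

   The point of keeping it separate is that the caller controls how much data is sampled and can inspect or override the advice before any writer or encoder is created.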


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.