alamb commented on issue #9242: URL: https://github.com/apache/arrow-rs/issues/9242#issuecomment-3801918121
I agree this would be a great API to add.

I don't really see how you could do this with [get_column_writers](https://github.com/apache/arrow-rs/blob/d60b1cd7d60007382bf84c58ac7ba2626d887e19/parquet/src/arrow/arrow_writer/mod.rs#L427-L438) any more than you can with the replacement API [ArrowRowGroupWriterFactory](https://docs.rs/parquet/latest/parquet/arrow/arrow_writer/struct.ArrowRowGroupWriterFactory.html).

That being said, it looks to me like ArrowRowGroupWriterFactory has a properties pointer on it that we could relatively easily support overriding.

So, following the example in https://docs.rs/parquet/latest/parquet/arrow/arrow_writer/struct.ArrowColumnWriter.html

Maybe the API could look something like

```rust
// Create parquet writer
let mut writer = SerializedFileWriter::new(&mut out, root_schema, props.clone())
    .unwrap();
// Create a factory for building Arrow column writers
let row_group_factory = ArrowRowGroupWriterFactory::new(&writer, Arc::clone(&schema));
// Create column writers for the 0th row group
let col_writers = row_group_factory.create_column_writers(0).unwrap();

// Change the properties to disable dictionary encoding for row group 1
let new_props = row_group_factory
    .writer_properties() // **** <--------------- NEW API to get properties
    .clone()
    .into_builder()
    .set_dictionary_enabled(false) // disable dictionary
    .build();
let row_group_factory = row_group_factory
    .with_properties(Arc::new(new_props))?; // **** <--------------- NEW API to set properties
// Create column writers for the 1st row group, using the new properties
let col_writers = row_group_factory.create_column_writers(1).unwrap();
```
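
To make the shape of the two proposed methods a bit more concrete, here is a minimal, self-contained sketch of the getter/setter pattern. It does not use the parquet crate: `Props`, `RowGroupFactory`, and `describe_row_group` are hypothetical stand-ins for `WriterProperties`, `ArrowRowGroupWriterFactory`, and `create_column_writers`, and `with_properties` returns `Self` here rather than a `Result` purely to keep the example short.

```rust
use std::sync::Arc;

// Hypothetical stand-in for WriterProperties; only one setting is modeled.
#[derive(Debug, Clone, PartialEq)]
struct Props {
    dictionary_enabled: bool,
}

// Hypothetical stand-in for ArrowRowGroupWriterFactory.
struct RowGroupFactory {
    props: Arc<Props>,
}

impl RowGroupFactory {
    fn new(props: Arc<Props>) -> Self {
        Self { props }
    }

    /// Sketch of the proposed getter: borrow the properties currently in effect.
    fn writer_properties(&self) -> &Arc<Props> {
        &self.props
    }

    /// Sketch of the proposed setter: consume the factory and return one
    /// that uses `props` for every row group created afterwards.
    fn with_properties(mut self, props: Arc<Props>) -> Self {
        self.props = props;
        self
    }

    /// Stand-in for create_column_writers: reports which settings the
    /// given row group would be written with.
    fn describe_row_group(&self, row_group_index: usize) -> String {
        format!(
            "row group {}: dictionary_enabled={}",
            row_group_index, self.props.dictionary_enabled
        )
    }
}

fn main() {
    let factory = RowGroupFactory::new(Arc::new(Props { dictionary_enabled: true }));
    println!("{}", factory.describe_row_group(0));

    // Copy the current properties, change one setting, and swap them in.
    let mut new_props = factory.writer_properties().as_ref().clone();
    new_props.dictionary_enabled = false;
    let factory = factory.with_properties(Arc::new(new_props));
    println!("{}", factory.describe_row_group(1));
}
```

Consuming `self` in `with_properties` mirrors the rebinding of `row_group_factory` in the snippet above; a `&mut self` setter would work as well if callers prefer to keep a single binding.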
