devinjdangelo opened a new issue, #7298: URL: https://github.com/apache/arrow-datafusion/issues/7298
### Is your feature request related to a problem or challenge?

Currently, the only way to customize how files are written as the result of a `COPY` or `INSERT` query is via session-level defaults, e.g.:

```sql
set datafusion.execution.parquet.max_row_group_size=9999;
INSERT INTO my_table values (1,2), (3,4);
COPY my_table to mytable.parquet;
```

We should implement statement- and table-level options so individual statements can customize the write behavior as desired, e.g.:

```sql
COPY my_table to mytable.parquet (max_row_group_size 9999)
```

Or, to set default options for a specific table rather than globally in a session:

```sql
CREATE EXTERNAL TABLE my_table
STORED AS PARQUET
LOCATION 'my_table/'
OPTIONS (max_row_group_size 9999)
```

### Describe the solution you'd like

We could implement a `WriteOptions` struct with a `WriteOptions::from(Vec<(String, String)>)` method so the struct can be created from the arbitrary string tuples passed in statements like those above. `FileSink`s could then accept a `WriteOptions` struct and use it to construct a serializer with the desired settings. The DataFrame API can be refactored to accept `WriteOptions` directly. The existing code that creates a `parquet::WriterProperties` from session configs should be refactored to reduce code duplication and share implementation details with the parsing of statement-level overrides. A rough sketch of this idea appears at the end of this issue.

### Describe alternatives you've considered

Rather than a single generic `WriteOptions` struct, we may want a `WriteOptions` trait with format-specific structs, e.g. `CsvWriteOptions`. Each file format could then decide how to handle each option and, if desired, emit a warning or error when invalid options are passed (e.g. `max_row_group_size` passed to the CSV writer). A sketch of this variant is also included below.

### Additional context

Relevant recent PRs for supporting writes: #7244 #7283
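For illustration, a minimal sketch of the generic `WriteOptions` idea, assuming the options arrive as the `(key, value)` string pairs produced by the parser. The field layout and the `get_or` helper are hypothetical, not an existing DataFusion API:

```rust
// Hypothetical sketch only: field names, helper methods, and defaults
// are illustrative, not the final DataFusion design.
use std::collections::HashMap;

/// Generic, format-agnostic bag of write options parsed from
/// `(key, value)` string pairs supplied at statement or table level.
#[derive(Debug, Default, Clone)]
pub struct WriteOptions {
    options: HashMap<String, String>,
}

impl From<Vec<(String, String)>> for WriteOptions {
    fn from(pairs: Vec<(String, String)>) -> Self {
        Self {
            options: pairs.into_iter().collect(),
        }
    }
}

impl WriteOptions {
    /// Look up an option and parse it into the requested type,
    /// falling back to `default` when the key is absent or unparseable.
    pub fn get_or<T: std::str::FromStr>(&self, key: &str, default: T) -> T {
        self.options
            .get(key)
            .and_then(|v| v.parse::<T>().ok())
            .unwrap_or(default)
    }
}

fn main() {
    // Options as they might arrive from `COPY ... (max_row_group_size 9999)`.
    let opts = WriteOptions::from(vec![(
        "max_row_group_size".to_string(),
        "9999".to_string(),
    )]);
    let row_group_size: usize = opts.get_or("max_row_group_size", 1024 * 1024);
    println!("max_row_group_size = {row_group_size}");
}
```

A `FileSink` could then translate these strings into format-specific writer settings (e.g. `parquet::WriterProperties`) at plan time.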

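And a sketch of the trait-based alternative, where each format validates its own options and can reject keys that make no sense for it. Again, the trait name, method signatures, and error handling here are hypothetical:

```rust
// Hypothetical sketch of the trait-based alternative; trait name,
// methods, and error type are illustrative only.

/// Each file format implements its own options struct and decides
/// which keys it accepts.
pub trait FormatWriteOptions: Sized {
    fn try_from_pairs(pairs: Vec<(String, String)>) -> Result<Self, String>;
}

#[derive(Debug)]
pub struct CsvWriteOptions {
    pub delimiter: u8,
    pub has_header: bool,
}

impl FormatWriteOptions for CsvWriteOptions {
    fn try_from_pairs(pairs: Vec<(String, String)>) -> Result<Self, String> {
        let mut opts = CsvWriteOptions {
            delimiter: b',',
            has_header: true,
        };
        for (key, value) in pairs {
            match key.as_str() {
                "delimiter" => opts.delimiter = value.bytes().next().unwrap_or(b','),
                "has_header" => {
                    opts.has_header = value.parse().map_err(|e| format!("{e}"))?
                }
                // Reject options that make no sense for CSV, e.g.
                // `max_row_group_size`, instead of silently ignoring them.
                other => return Err(format!("invalid CSV write option: {other}")),
            }
        }
        Ok(opts)
    }
}

fn main() {
    // A Parquet-only option passed to the CSV writer is surfaced as an error.
    let result = CsvWriteOptions::try_from_pairs(vec![(
        "max_row_group_size".to_string(),
        "9999".to_string(),
    )]);
    assert!(result.is_err());
}
```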