alamb opened a new issue, #5654: URL: https://github.com/apache/arrow-datafusion/issues/5654
**Is your feature request related to a problem or challenge? Please describe what you are trying to do.**

I would like to convert parquet data from one format to another, for example to see the effects of page pruning (https://github.com/apache/arrow-datafusion/issues/4085) or of different orderings on compression and other properties.

arrow-rs and DataFusion have all the parts we need (reading from files, sorting, writing to files); we just need to put them together.

We do have a very specialized version in the tpch benchmark driver: https://github.com/apache/arrow-datafusion/blob/26e1b20ea3362ea62cb713004a0636b8af6a16d7/benchmarks/src/tpch.rs#L332-L400

**Describe the solution you'd like**

I would like DataFusion to support DuckDB-style `COPY` SQL statements.

For example:

```sql
-- export the table `t` to data.parquet
COPY t TO 'data.parquet' (FORMAT PARQUET);

-- export as parquet, compressed with ZSTD, with a row_group_size of 100000
COPY t TO 'data.parquet' (FORMAT PARQUET, COMPRESSION ZSTD, ROW_GROUP_SIZE 100000);

-- export the output of a query `SELECT * FROM tbl`
COPY (SELECT * FROM tbl ORDER BY time) TO 'data.parquet' (FORMAT PARQUET);
```

Reference:
1. https://duckdb.org/docs/sql/statements/copy
2. https://duckdb.org/docs/sql/statements/export

**Describe alternatives you've considered**

@metesynnada is working on `INSERT INTO` style syntax in https://github.com/apache/arrow-datafusion/issues/5130

Bonus points for CSV support (ideally the code structure will allow support in the long term, but not as part of the initial PR):

```sql
-- export as CSV with the given options
COPY t TO 'data.csv' (FORMAT CSV, DELIMITER '|', HEADER);
```

**Additional context**

https://github.com/apache/arrow-datafusion/issues/5130#issuecomment-1430323461
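
To make the proposed option syntax a bit more concrete, here is a rough sketch of how the parenthesized option list (for example `FORMAT PARQUET, COMPRESSION ZSTD, ROW_GROUP_SIZE 100000`) might be turned into name/value pairs. The function `parse_copy_options` is hypothetical and is not part of DataFusion or sqlparser; it uses only the Rust standard library and assumes no option value contains a comma.

```rust
use std::collections::BTreeMap;

/// Hypothetical helper (not a DataFusion API): parse the text between the
/// parentheses of a `COPY ... TO ... (...)` statement into option pairs.
/// Option names are upper-cased; bare flags such as `HEADER` map to "".
fn parse_copy_options(list: &str) -> BTreeMap<String, String> {
    let mut options = BTreeMap::new();
    // Naive split on commas: a quoted value that itself contains a comma
    // (e.g. DELIMITER ',') would need a real tokenizer instead.
    for item in list.split(',') {
        let item = item.trim();
        if item.is_empty() {
            continue;
        }
        // The first whitespace-delimited word is the option name,
        // everything after it is the value.
        let (name, value) = match item.split_once(char::is_whitespace) {
            Some((name, value)) => (name, value.trim()),
            None => (item, ""), // bare flag with no value
        };
        // Strip surrounding single quotes from string literals like '|'.
        let value = value.trim_matches('\'');
        options.insert(name.to_ascii_uppercase(), value.to_string());
    }
    options
}

fn main() {
    let opts = parse_copy_options("FORMAT PARQUET, COMPRESSION ZSTD, ROW_GROUP_SIZE 100000");
    for (name, value) in &opts {
        println!("{name} = {value}");
    }
}
```

A real implementation would more likely extend the SQL parser and map these options onto the parquet or CSV writer settings; the sketch only illustrates the shape of the option list.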
