alamb opened a new issue, #5654:
URL: https://github.com/apache/arrow-datafusion/issues/5654

   **Is your feature request related to a problem or challenge? Please describe 
what you are trying to do.**
   I would like to convert parquet data from one format to another, for example 
to see the effects of page pruning -- 
https://github.com/apache/arrow-datafusion/issues/4085 -- or of different 
orderings on compression and other properties.
   
   arrow-rs and DataFusion have all the parts we need (reading from files, 
sorting, writing to files); we just need to put them together.
   
   We do have a very specialized version in the tpch benchmark driver:
   
https://github.com/apache/arrow-datafusion/blob/26e1b20ea3362ea62cb713004a0636b8af6a16d7/benchmarks/src/tpch.rs#L332-L400
   
   **Describe the solution you'd like**
   I would like DataFusion to support DuckDB-style `COPY` SQL statements.
   
   For example:
   
   ```sql
   -- export the table `t` to data.parquet
   COPY t TO 'data.parquet' (FORMAT PARQUET);
   -- export as parquet, compressed with ZSTD, with a row_group_size of 100000
   COPY t TO 'data.parquet' (FORMAT PARQUET, COMPRESSION ZSTD, ROW_GROUP_SIZE 100000);
   -- export the output of a query `SELECT * FROM tbl ORDER BY time`
   COPY (SELECT * FROM tbl ORDER BY time) TO 'data.parquet' (FORMAT PARQUET);
   ```
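
As a rough illustration of how the option lists in the examples above could be turned into typed settings, here is a minimal, standalone Rust sketch. It is not DataFusion or DuckDB code: `CopyOptions` and `parse_copy_options` are hypothetical names, and the parser is deliberately naive (it splits on commas and does not handle a quoted comma such as `DELIMITER ','`).

```rust
use std::collections::BTreeMap;

/// Hypothetical typed view of the options in `COPY ... TO '...' (<options>)`.
#[derive(Debug, PartialEq)]
struct CopyOptions {
    /// Upper-cased value of the FORMAT option, e.g. "PARQUET" or "CSV".
    format: String,
    /// All other options keyed by upper-cased name.
    /// A bare flag such as HEADER is stored with an empty value.
    extra: BTreeMap<String, String>,
}

/// Parse the text between the parentheses,
/// e.g. "FORMAT PARQUET, COMPRESSION ZSTD, ROW_GROUP_SIZE 100000".
/// Assumption: options are comma separated and values contain no commas.
fn parse_copy_options(input: &str) -> Result<CopyOptions, String> {
    let mut format = None;
    let mut extra = BTreeMap::new();

    for item in input.split(',') {
        let item = item.trim();
        if item.is_empty() {
            continue;
        }
        // Split "NAME value" at the first whitespace; a lone NAME is a flag.
        let (key, value) = match item.split_once(char::is_whitespace) {
            Some((k, v)) => (k, v.trim().trim_matches('\'')),
            None => (item, ""),
        };
        let key = key.to_ascii_uppercase();
        if key == "FORMAT" {
            format = Some(value.to_ascii_uppercase());
        } else {
            extra.insert(key, value.to_string());
        }
    }

    let format = format.ok_or_else(|| "missing FORMAT option".to_string())?;
    Ok(CopyOptions { format, extra })
}

fn main() {
    let opts =
        parse_copy_options("FORMAT PARQUET, COMPRESSION ZSTD, ROW_GROUP_SIZE 100000")
            .expect("valid option list");
    println!("{:?}", opts);
}
```

Keeping unrecognized options in a generic map, rather than one struct field per option, is one way the same entry point could later accept format-specific options without changing the parser.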
   
   Reference:
   1. https://duckdb.org/docs/sql/statements/copy
   2. https://duckdb.org/docs/sql/statements/export
   
   **Describe alternatives you've considered**
   @metesynnada is working on `INSERT INTO` style syntax in 
https://github.com/apache/arrow-datafusion/issues/5130
   
   Bonus points for CSV support (ideally the code structure will allow it to be 
added in the long term, but it need not be part of the initial PR):
   
   ```sql
   -- export as CSV with the given options
   COPY t TO 'data.csv' (FORMAT CSV, DELIMITER '|', HEADER);
   ```
   
   **Additional context**
   
    
https://github.com/apache/arrow-datafusion/issues/5130#issuecomment-1430323461
   

