devinjdangelo commented on PR #9240: URL: https://github.com/apache/arrow-datafusion/pull/9240#issuecomment-1947588213
> Wondering as we write partitioned data then should also test the underlying file/folder structure, how the data was written to disk?

The copy.slt tests rely on the read path for partitioned tables to make sure the files were written out correctly:

```sql
# Copy to directory as partitioned files
query ITT
COPY (values (1, 'a', 'x'), (2, 'b', 'y'), (3, 'c', 'z'))
TO 'test_files/scratch/copy/partitioned_table2/'
(format parquet, compression 'zstd(10)', partition_by 'column2, column3');
----
3

# validate multiple partitioned parquet file output
statement ok
CREATE EXTERNAL TABLE validate_partitioned_parquet2
STORED AS PARQUET
LOCATION 'test_files/scratch/copy/partitioned_table2/'
PARTITIONED BY (column2, column3);

query I??
select * from validate_partitioned_parquet2 order by column1,column2,column3;
----
1 a x
2 b y
3 c z
```

The code currently doesn't do any checking or validation of existing directories or files before writing. DuckDB describes the options it provides for controlling this behavior here: https://duckdb.org/docs/data/partitioning/partitioned_writes.html
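For context, DuckDB gates writes into an existing directory behind an explicit `OVERWRITE_OR_IGNORE` flag rather than validating by default. Going from memory of the linked page, their syntax looks roughly like the sketch below (the `orders` table and `year`/`month` columns are illustrative, and the exact flag spelling should be checked against the DuckDB docs):

```sql
-- DuckDB-style partitioned write that permits writing into an
-- existing (possibly non-empty) target directory
COPY orders TO 'orders'
(FORMAT PARQUET, PARTITION_BY (year, month), OVERWRITE_OR_IGNORE 1);
```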
