hveiga opened a new issue, #10971: URL: https://github.com/apache/datafusion/issues/10971
### Is your feature request related to a problem or challenge?

I am using DataFusion to repartition data stored in Parquet files into a different set of Parquet files. I would like the newly created files to contain the columns I am partitioning by, but currently those columns are removed from the files because they become part of the directory structure. Something like:

    COPY (SELECT col1, col2, col3, col4 FROM my_external_table)
    TO '/output'
    PARTITIONED BY (col1)
    OPTIONS (format parquet);

    ...
    /output/col1=val1/some_random_file_name.parquet
    /output/col1=val2/some_random_file_name.parquet
    /output/col1=val3/some_random_file_name.parquet
    ...

`some_random_file_name.parquet` will not contain the column `col1`. I would like to keep it, as it might be needed later for other use cases.

### Describe the solution you'd like

I would like a flag in the `OPTIONS` of the `COPY` statement that lets the partition columns remain in the partitioned files. For example: `keep_partitioned_by_columns`, set to `false` by default.

### Describe alternatives you've considered

None. I don't think there is an alternative at the moment.

### Additional context

Related discussion: https://github.com/apache/datafusion/discussions/10962

I also don't know whether this has implications when reading a hive-partitioned directory structure, since a given column would then appear both in the Parquet files and in the directory structure. It's worth pointing out in case there is a collision.
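To make the requested behavior concrete, here is a minimal, engine-independent sketch in plain Python of hive-style partitioning with and without the partition column kept in the file payload. It is only an illustration under stated assumptions: `partition_rows` and its `keep_partition_columns` parameter are hypothetical names that mirror the proposed `keep_partitioned_by_columns` flag, and none of this is DataFusion API.

```python
from collections import defaultdict


def partition_rows(rows, partition_by, keep_partition_columns=False):
    """Group rows into hive-style directories keyed by the partition columns.

    `keep_partition_columns` is a hypothetical flag mirroring the proposed
    `keep_partitioned_by_columns` option; it is not an existing DataFusion API.
    """
    files = defaultdict(list)
    for row in rows:
        # The directory name encodes the partition values, e.g. "col1=val1".
        directory = "/".join(f"{col}={row[col]}" for col in partition_by)
        if keep_partition_columns:
            # Requested behavior: the column stays in the file contents too.
            payload = dict(row)
        else:
            # Behavior described in this issue: the partition columns live
            # only in the path, not in the file contents.
            payload = {k: v for k, v in row.items() if k not in partition_by}
        files[directory].append(payload)
    return dict(files)


rows = [
    {"col1": "val1", "col2": 1, "col3": "a", "col4": True},
    {"col1": "val2", "col2": 2, "col3": "b", "col4": False},
    {"col1": "val1", "col2": 3, "col3": "c", "col4": True},
]

dropped = partition_rows(rows, ["col1"])
kept = partition_rows(rows, ["col1"], keep_partition_columns=True)
```

With the flag off, every payload under `col1=val1` lacks `col1`. With it on, each row carries `col1` both in its directory name and in its contents, which is exactly the duplication that could collide when a reader infers partition columns from the path.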