hveiga opened a new issue, #10971:
URL: https://github.com/apache/datafusion/issues/10971

   ### Is your feature request related to a problem or challenge?
   
   
   I am using DataFusion to repartition data stored in Parquet files into a 
different set of Parquet files. I would like the newly created files to 
contain the columns I am partitioning by. Currently, however, each partition 
column is removed from the files because it becomes part of the directory 
structure. For example:
   
   COPY (SELECT col1, col2, col3, col4 FROM my_external_table) TO '/output' 
PARTITIONED BY (col1) OPTIONS (format parquet);
   
   ...
   
   /output/col1=val1/some_random_file_name.parquet
   /output/col1=val2/some_random_file_name.parquet
   /output/col1=val3/some_random_file_name.parquet
   ...
   
   `some_random_file_name.parquet` will not contain the column `col1`. I would 
like to keep it as it might be needed later for other use cases.
   
   ### Describe the solution you'd like
   
   I would like a flag in the `OPTIONS` of the `COPY` statement that lets the 
partition columns remain in the partitioned files. For example: 
`keep_partitioned_by_columns`, set to `false` by default.
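
To make the requested behavior concrete, here is a minimal sketch of a hive-style partitioned write with such a flag. It is an illustration only, not DataFusion code: it writes CSV with the Python standard library instead of Parquet, and `write_partitioned`, `keep_partition_col`, `header`, and the `data.csv` file name are hypothetical names chosen for this example.

```python
import csv
import tempfile
from collections import defaultdict
from pathlib import Path


def write_partitioned(rows, out_dir, partition_col, keep_partition_col=False):
    """Write rows into hive-style directories, one CSV file per partition value."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[partition_col]].append(row)

    written = []
    for value, group in groups.items():
        part_dir = Path(out_dir) / f"{partition_col}={value}"
        part_dir.mkdir(parents=True, exist_ok=True)
        # With the default (False), the partition column lives only in the
        # directory name, which is the behavior described in this issue.
        columns = [c for c in group[0] if keep_partition_col or c != partition_col]
        path = part_dir / "data.csv"
        with path.open("w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=columns, extrasaction="ignore")
            writer.writeheader()
            writer.writerows(group)
        written.append(path)
    return written


def header(path):
    """Return the column names stored in a written file."""
    with path.open(newline="") as f:
        return next(csv.reader(f))


rows = [
    {"col1": "val1", "col2": 1},
    {"col1": "val2", "col2": 2},
]
with tempfile.TemporaryDirectory() as tmp:
    dropped = write_partitioned(rows, Path(tmp) / "a", "col1")
    kept = write_partitioned(rows, Path(tmp) / "b", "col1", keep_partition_col=True)
    dropped_header = header(dropped[0])  # partition column absent from the file
    kept_header = header(kept[0])        # partition column retained in the file
```

Defaulting the flag to `False` keeps the current behavior unchanged, which matches the default proposed above.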
   
   ### Describe alternatives you've considered
   
   None. I don't think there is an alternative at the moment.
   
   ### Additional context
   
   Related discussion: https://github.com/apache/datafusion/discussions/10962
   
   I am also not sure whether this has implications when reading a 
hive-partitioned directory structure: a given column would then appear both 
in the parquet files and in the directory structure. I am pointing it out in 
case this causes a collision.
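
The possible collision can be sketched as follows. This is an assumption-laden illustration, not how DataFusion reads partitioned tables: `partition_values` and `colliding_columns` are hypothetical helpers that only detect when a column name appears both in a `key=value` directory segment and in a file's own column list. How a reader should resolve such a collision is left open here.

```python
from pathlib import PurePosixPath


def partition_values(path):
    """Extract key=value directory segments from a hive-style path."""
    values = {}
    for segment in PurePosixPath(path).parent.parts:
        key, sep, value = segment.partition("=")
        if sep and key:
            values[key] = value
    return values


def colliding_columns(path, file_columns):
    """Return columns present both in the directory structure and in the file."""
    return sorted(set(partition_values(path)) & set(file_columns))


path = "/output/col1=val1/some_random_file_name.parquet"

# Current behavior: the file does not carry col1, so nothing collides.
without_col = colliding_columns(path, ["col2", "col3", "col4"])

# With the proposed flag enabled: col1 is in both places.
with_col = colliding_columns(path, ["col1", "col2", "col3", "col4"])
```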


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

