Rick Zamora created ARROW-8244:
----------------------------------

             Summary: Add `write_to_dataset` option to populate the "file_path" metadata fields
                 Key: ARROW-8244
                 URL: https://issues.apache.org/jira/browse/ARROW-8244
             Project: Apache Arrow
          Issue Type: Wish
            Reporter: Rick Zamora


Prior to [dask#6023|https://github.com/dask/dask/pull/6023], Dask used the
`write_to_dataset` API to write partitioned Parquet datasets. That PR switches
to a (hopefully temporary) custom solution, because `write_to_dataset` makes it
difficult to populate the "file_path" column-chunk metadata fields of the
`FileMetaData` objects collected through the optional `metadata_collector`
kwarg. Dask needs these fields set correctly to generate a proper global
`"_metadata"` file, since each row group listed there must point to the data
file it belongs to.

Possible solutions to this problem:
 # Optionally populate the file-path fields within `write_to_dataset`
 # Always populate the file-path fields within `write_to_dataset`
 # Return the file paths of the data written by `write_to_dataset` (leaving it to the user to populate the file-path fields manually)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
