[ 
https://issues.apache.org/jira/browse/ARROW-5349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16845920#comment-16845920
 ] 

Martin Durant commented on ARROW-5349:
--------------------------------------

It depends on what is passed back to the caller: just the metadata object, or 
some indication of which file it went into (sorry, I don't know the API that's 
being built exactly). If the caller defines which file to write to, it would 
seem reasonable to let it set this attribute on the metadata object before 
writing to `_metadata`. However, that might be muddied if partitioning is also 
happening upon write and you end up with multiple files for each piece.
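
To make the caller-sets-the-path idea concrete, here is a minimal sketch. It assumes the caller chooses each file's location and therefore can record it on the collected metadata before the combined `_metadata` is written. The `PieceMetaData` class and the `write_piece` and `collect_dataset_metadata` helpers are hypothetical stand-ins for illustration, not the Arrow API; only the name `set_file_path` is taken from the issue description.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import List, Tuple


@dataclass
class PieceMetaData:
    """Hypothetical stand-in for the metadata of one written file."""
    num_rows: int
    file_path: str = ""  # left empty inside the data file itself

    def set_file_path(self, path: str) -> None:
        # Record where the piece lives, relative to the dataset root.
        self.file_path = path


def write_piece(root: str, relative_path: str, num_rows: int) -> PieceMetaData:
    """Pretend to write one data file at root/relative_path (hypothetical)."""
    # A real writer would serialize the row groups here; this sketch only
    # returns the metadata object that the write would produce.
    return PieceMetaData(num_rows=num_rows)


def collect_dataset_metadata(
    root: str, pieces: List[Tuple[str, int]]
) -> List[PieceMetaData]:
    """Write each piece and tag its metadata with the path the caller chose."""
    collected = []
    for relative_path, num_rows in pieces:
        md = write_piece(root, relative_path, num_rows)
        # The caller decided the destination, so it can set the path
        # before the metadata is combined into `_metadata`.
        md.set_file_path(str(PurePosixPath(relative_path)))
        collected.append(md)
    return collected


pieces = [("year=2018/part-0.parquet", 3), ("year=2019/part-0.parquet", 5)]
combined = collect_dataset_metadata("dataset_root", pieces)
```

If the writer itself partitions the data and picks the file names, the caller no longer knows the paths, which is the complication mentioned above; in that case the writer would have to set or report the path for each piece.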

> [Python/C++] Provide a way to specify the file path in parquet ColumnChunkMetaData
> ----------------------------------------------------------------------------------
>
>                 Key: ARROW-5349
>                 URL: https://issues.apache.org/jira/browse/ARROW-5349
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++, Python
>            Reporter: Joris Van den Bossche
>            Priority: Major
>              Labels: parquet, pull-request-available
>             Fix For: 0.14.0
>
>          Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> After ARROW-5258 / https://github.com/apache/arrow/pull/4236 it is now 
> possible to collect the file metadata while writing different files (how to 
> write that metadata was not yet addressed; see the original issue ARROW-1983).
> However, currently, the {{file_path}} information in the ColumnChunkMetaData 
> object is not set. This is, I think, expected / correct for the metadata as 
> included within the single file; but for using the metadata in the combined 
> dataset `_metadata`, it needs a file path set.
> So if you want to use this metadata for a partitioned dataset, there needs to 
> be a way to specify this file path. 
> Ideas I am currently considering: either we could specify a file path to be 
> used when writing, or we could expose the `set_file_path` method on the Python 
> side so you can create an updated version of the metadata after collecting it.
> cc [~pearu] [~mdurant]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
