I submitted a PR with a possible solution. It permits setting the
file_path fields to a particular value after a file has already been
written, so that the mutated FileMetaData (not the footer on disk) can
then be written to a separate _metadata file.

https://github.com/apache/arrow/pull/4386

On Fri, May 24, 2019 at 12:22 PM Ryan Blue <[email protected]> wrote:
>
> Hi Richard,
>
> I think the original intent of file_path was to allow storing columns in
> other data files. But Parquet has never really had support for this, so the
> field is ignored as far as I know. I see no problem with setting it to the
> same file, but it is probably more reliable not to, so that your metadata
> isn't incorrect when you rename or move the file.
>
> rb
>
> On Wed, May 22, 2019 at 8:33 PM Richard Zamora <[email protected]> wrote:
>
> > I’d like to solicit some feedback on the use of the `file_path` attribute
> > for ColumnChunk metadata in Parquet.  How exactly is this attribute used in
> > practice for both single-file and distributed datasets?
> >
> > More specifically: Is it bad form to set the `file_path` value in footer
> > metadata when the data is stored in the same file?  Should the value only
> > be set in the `_metadata` file, or in cases where the actual column-chunk
> > data is stored in a different location?  My intuition is that the answer to
> > both of these questions is “yes,”  but any feedback/details from people
> > with strong parquet experience is very welcome :)
> >
> > Note that the context for these questions is an ongoing discussion about
> > the necessary metadata API in `arrow.parquet` (e.g.
> > https://github.com/apache/arrow/pull/4361 and
> > https://issues.apache.org/jira/browse/ARROW-5349?page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel&focusedCommentId=16845670#comment-16845670
> > )
> >
> > Thanks for your help!
> > -Rick
> >
>
>
> --
> Ryan Blue
> Software Engineer
> Netflix
