Proper use of the ColumnChunk `file_path` attribute

Richard Zamora Wed, 22 May 2019 20:34:32 -0700

I’d like to solicit some feedback on the use of the `file_path` attribute for 
ColumnChunk metadata in Parquet.  How exactly is this attribute used in 
practice for both single-file and distributed datasets?


More specifically: Is it bad form to set the `file_path` value in the footer 
metadata when the column-chunk data is stored in the same file?  Should the 
value be set only in the `_metadata` file, or in other cases where the actual 
column-chunk data is stored in a different location?  My intuition is that the 
answer to both of these questions is “yes,” but any feedback/details from 
people with strong Parquet experience is very welcome :)
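For reference, the `ColumnChunk` struct in the Parquet format's `parquet.thrift` documents `file_path` as optional. When it is unset, the column data is assumed to be in the same file as the metadata. When it is set, the path is relative to the file the metadata was read from. The minimal Python sketch below models only that resolution rule. `ColumnChunk` here is a hypothetical stand-in for the Thrift struct, and `resolve_chunk_location` is an illustrative helper, not part of any Parquet library. Treating an empty string the same as "unset" is an assumption.

```python
import posixpath
from dataclasses import dataclass
from typing import Optional


@dataclass
class ColumnChunk:
    # Hypothetical, stripped-down mirror of the Thrift ColumnChunk struct;
    # only the optional file_path field is modeled.
    file_path: Optional[str] = None


def resolve_chunk_location(metadata_file: str, chunk: ColumnChunk) -> str:
    """Return the path of the file holding this chunk's data.

    An unset file_path means the data lives in the same file as the
    metadata (e.g. an ordinary data-file footer). Otherwise file_path is
    interpreted relative to the directory of the metadata file
    (e.g. a `_metadata` file pointing at the data files beside it).
    """
    # Assumption: an empty string is treated the same as an unset field.
    if not chunk.file_path:
        return metadata_file
    base_dir = posixpath.dirname(metadata_file)
    return posixpath.normpath(posixpath.join(base_dir, chunk.file_path))
```

Under this rule, a chunk described in the footer of `dataset/part.0.parquet` with no `file_path` resolves to that same file, while a chunk described in `dataset/_metadata` with `file_path="part.1.parquet"` resolves to `dataset/part.1.parquet`.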

Note that the context for these questions is an ongoing discussion about the 
necessary metadata API in `arrow.parquet` (e.g. 
https://github.com/apache/arrow/pull/4361 and 
https://issues.apache.org/jira/browse/ARROW-5349?page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel&focusedCommentId=16845670#comment-16845670)

Thanks for your help!
-Rick