This aspect of writing Parquet datasets ought to have its own section in the
documentation:
http://arrow.apache.org/docs/python/parquet.html

This would be a useful contribution to the project:
https://issues.apache.org/jira/browse/ARROW-3154

On Fri, Aug 31, 2018 at 5:57 PM Anton Goloborodko <[email protected]> wrote:
>
> Oh, you are absolutely right, ParquetWriter takes a schema! Many thanks,
> it's really embarrassing that I did not notice it...
>
> On Fri, 31 Aug 2018 at 17:43, Wes McKinney <[email protected]> wrote:
>
> > hi Anton,
> >
> > Does pa.parquet.write_metadata not do what you want?
> >
> > https://github.com/apache/arrow/blob/master/python/pyarrow/parquet.py#L1205
> >
> > See also https://issues.apache.org/jira/browse/ARROW-1983
> >
> > - Wes
> >
> > On Fri, Aug 31, 2018 at 5:38 PM Anton Goloborodko
> > <[email protected]> wrote:
> > >
> > > Dear Arrow developers,
> > >
> > > Our lab is planning to use pyarrow to store some biological information
> > > in Parquet files. We also have to store some metadata alongside, e.g.
> > > which sample the data comes from, how it was obtained and processed, etc.
> > >
> > > Parquet seems to support file-wide metadata, but I cannot find how to
> > > write it via pyarrow. The closest thing I could find is how to write
> > > row-group metadata (https://github.com/pandas-dev/pandas/pull/20534),
> > > but this seems like overkill, since our metadata is the same for all
> > > row groups in the file.
> > >
> > > Is there any way to write file-wide Parquet metadata with pyarrow?
> > >
> > > Thank you!
> > > Anton.
