It is possible to change the default Parquet version when instantiating PyArrow's ParquetWriter[1]. Here's the PR[2] that upgraded the default Parquet format version from 2.4 to 2.6, which adds nanosecond timestamp support. It was released in Arrow v13.
[1] https://github.com/apache/arrow/blob/e198f309c577de9a265c04af2bc4644c33f54375/python/pyarrow/parquet/core.py#L953
[2] https://github.com/apache/arrow/pull/36137

On Wed, Feb 21, 2024 at 4:15 PM Li Jin <ice.xell...@gmail.com> wrote:

> "Exponentially exposed" -> "potentially exposed"
>
> On Wed, Feb 21, 2024 at 4:13 PM Li Jin <ice.xell...@gmail.com> wrote:
>
> > Thanks - since we don't control all the invocations of pq.write_table, I
> > wonder if there is some configuration for the "default" behavior?
> >
> > Also, I wonder if there are other API surfaces that are exponentially
> > exposed to this, e.g., dataset or pd.DataFrame.to_parquet?
> >
> > Thanks!
> > Li
> >
> > On Wed, Feb 21, 2024 at 3:53 PM Jacek Pliszka <jacek.plis...@gmail.com>
> > wrote:
> >
> >> Hi!
> >>
> >> pq.write_table(
> >>     table, config.output_filename, coerce_timestamps="us",
> >>     allow_truncated_timestamps=True,
> >> )
> >>
> >> allows you to write us instead of ns.
> >>
> >> BR
> >>
> >> J
> >>
> >> On Wed, Feb 21, 2024 at 9:44 PM Li Jin <ice.xell...@gmail.com> wrote:
> >>
> >> > Hi,
> >> >
> >> > My colleague has informed me that during the Arrow 12->15 upgrade, he
> >> > found that writing a pandas DataFrame with datetime64[ns] to Parquet
> >> > results in nanosecond metadata and nanosecond values.
> >> >
> >> > I wonder if this is configurable back to the old behavior so we can
> >> > enable "nanosecond in Parquet" gradually? There is code reading
> >> > Parquet files that doesn't handle Parquet nanoseconds yet.
> >> >
> >> > Thanks!
> >> > Li