It is possible to override the default Parquet format version when
instantiating PyArrow's ParquetWriter[1]. Here's the PR[2] that upgraded
the default Parquet format version from 2.4 to 2.6, which added nanosecond
timestamp support. The change was released in Arrow 13.
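
For example, here is a minimal sketch (file name and table contents are
placeholders) showing the option on both write entry points:

    import datetime
    import pyarrow as pa
    import pyarrow.parquet as pq

    # Placeholder table with a nanosecond-resolution timestamp column.
    table = pa.table({
        "ts": pa.array([datetime.datetime(2024, 2, 21)],
                       type=pa.timestamp("ns"))
    })

    # version="2.4" restores the pre-Arrow-13 default: nanosecond timestamps
    # are cast down to microseconds on write. allow_truncated_timestamps=True
    # suppresses the error raised if values carry sub-microsecond precision.
    pq.write_table(table, "out.parquet", version="2.4",
                   allow_truncated_timestamps=True)

    # The same option exists on ParquetWriter:
    with pq.ParquetWriter("out.parquet", table.schema, version="2.4",
                          allow_truncated_timestamps=True) as writer:
        writer.write_table(table)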

[1]
https://github.com/apache/arrow/blob/e198f309c577de9a265c04af2bc4644c33f54375/python/pyarrow/parquet/core.py#L953

[2]https://github.com/apache/arrow/pull/36137
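
Regarding pd.DataFrame.to_parquet in the quoted question below: with the
pyarrow engine, pandas forwards extra keyword arguments through to
pyarrow.parquet.write_table, so (if my reading of the pandas engine code
is right) the same knobs should apply there:

    import pandas as pd

    df = pd.DataFrame({"ts": pd.to_datetime(["2024-02-21"])})  # datetime64[ns]

    # Extra kwargs pass through to pyarrow.parquet.write_table.
    df.to_parquet("out.parquet", engine="pyarrow", version="2.4",
                  allow_truncated_timestamps=True)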

On Wed, Feb 21, 2024 at 4:15 PM Li Jin <ice.xell...@gmail.com> wrote:

> “Exponentially exposed” -> “potentially exposed”
>
> On Wed, Feb 21, 2024 at 4:13 PM Li Jin <ice.xell...@gmail.com> wrote:
>
> > Thanks - since we don’t control all the invocations of pq.write_table, I
> > wonder if there is some configuration for the “default” behavior?
> >
> > Also I wonder if there are other API surfaces that are exponentially
> > exposed to this, e.g., dataset or pd.DataFrame.to_parquet?
> >
> > Thanks!
> > Li
> >
> > On Wed, Feb 21, 2024 at 3:53 PM Jacek Pliszka <jacek.plis...@gmail.com>
> > wrote:
> >
> >> Hi!
> >>
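> >>             # coerce_timestamps="us" downcasts nanosecond timestamps on
> >>             # write; allow_truncated_timestamps=True suppresses the
> >>             # truncation error if sub-microsecond precision would be lost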
> >>             pq.write_table(
> >>                 table, config.output_filename, coerce_timestamps="us",
> >>                 allow_truncated_timestamps=True,
> >>             )
> >>
> >> allows you to write timestamps as microseconds (us) instead of
> >> nanoseconds (ns).
> >>
> >> BR
> >>
> >> J
> >>
> >>
> >> On Wed, Feb 21, 2024 at 9:44 PM Li Jin <ice.xell...@gmail.com> wrote:
> >>
> >> > Hi,
> >> >
> >> > My colleague has informed me that during the Arrow 12 -> 15 upgrade,
> >> > he found that writing a pandas DataFrame with datetime64[ns] to
> >> > Parquet will result in nanosecond metadata and nanosecond values.
> >> >
> >> > I wonder if this is configurable back to the old behavior, so we can
> >> > enable “nanosecond in Parquet” gradually? There is code reading
> >> > Parquet files that doesn’t handle Parquet nanoseconds yet.
> >> >
> >> > Thanks!
> >> > Li
> >> >
> >>
> >
>
