Hi Joris,

Out of curiosity I tried it with fastparquet as well and that couldn't
even save that column:

ValueError: Can't infer object conversion type: 0    1970-01-01 01:00:00+01:00
1    1970-01-01 02:00:00+02:00
Name: pd_mixed, dtype: object
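
For reference, the two values in that column denote the same instant, so
the "local" and "UTC-normalized" readings discussed below differ by exactly
the offsets. Here is a minimal sketch of the difference using only the
Python standard library. It is not what pandas, pyarrow or fastparquet do
internally, and the variable names are made up for illustration:

```python
from datetime import datetime, timedelta, timezone

# The two values from the pd_mixed column, built with fixed UTC offsets
# (assumption: plain stdlib datetimes stand in for the pandas Timestamps).
mixed = [
    datetime(1970, 1, 1, 1, 0, tzinfo=timezone(timedelta(hours=1))),
    datetime(1970, 1, 1, 2, 0, tzinfo=timezone(timedelta(hours=2))),
]

# "Local" semantics: drop the offset and keep the wall-clock time.
local_naive = [ts.replace(tzinfo=None) for ts in mixed]

# UTC-normalized semantics: convert to UTC first, then drop the offset.
utc_naive = [ts.astimezone(timezone.utc).replace(tzinfo=None) for ts in mixed]

print(local_naive)
print(utc_naive)
```

With local semantics the two rows stay one hour apart, while after UTC
normalization they are identical. On the pandas side, something like
pd.to_datetime(df["pd_mixed"], utc=True) before writing should give the
normalized form, though I have not checked that against either writer.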

Br,

Zoltan

On Thu, Jul 11, 2019 at 3:55 PM Joris Van den Bossche
<[email protected]> wrote:
>
> Created an issue for the mixed timezones here:
> https://issues.apache.org/jira/browse/ARROW-5912
>
> On Thu, Jul 11, 2019 at 09:42 Joris Van den Bossche <
> [email protected]> wrote:
>
> > Clarification regarding the mixed types (this is in the end not really
> > related to parquet, but to how pandas gets converted to pyarrow)
> >
> > On Thu, Jul 11, 2019 at 09:17 Zoltan Ivanfi <[email protected]> wrote:
> >
> >> ...
> >> This matched my expectations up until pd_mixed. I was surprised to see
> >> that timestamps with mixed time zones were stored using local
> >> semantics instead of being normalized to UTC,
> >>
> >
> > For the actual parquet writing semantics, it is more relevant to look at
> > the arrow Table that gets created from this DataFrame:
> >
> > In [20]: pa.Table.from_pandas(df)
> > Out[20]:
> > pyarrow.Table
> > datetime: timestamp[ns]
> > pd_no_tz: timestamp[ns]
> > pd_paris: timestamp[ns, tz=Europe/Paris]
> > pd_helsinki: timestamp[ns, tz=Europe/Helsinki]
> > pd_mixed: timestamp[us]
> >
> > For all columns except for pd_mixed the result is clear and expected, but
> > apparently pyarrow converts the mixed timestamps to a TimestampArray
> > without timezone using the "local times", and not the UTC normalized times.
> >
> > Now, that certainly feels a bit buggy to me (or at least unexpected). But,
> > this is an issue for the python -> arrow conversion, not related to the
> > actual parquet writing. I will open a separate JIRA for this.
> >
> > Joris
> >
