Created an issue for the mixed timezones here:
https://issues.apache.org/jira/browse/ARROW-5912

On Thu, 11 Jul 2019 at 09:42, Joris Van den Bossche <
[email protected]> wrote:

> Clarification regarding the mixed types (this is in the end not really
> related to parquet, but to how pandas gets converted to pyarrow)
>
> On Thu, 11 Jul 2019 at 09:17, Zoltan Ivanfi <[email protected]> wrote:
>
>> ...
>> This matched my expectations up until pd_mixed. I was surprised to see
>> that timestamps with mixed time zones were stored using local
>> semantics instead of being normalized to UTC,
>>
>
> For the actual parquet writing semantics, it is more relevant to look at
> the arrow Table that gets created from this DataFrame:
>
> In [20]: pa.Table.from_pandas(df)
> Out[20]:
> pyarrow.Table
> datetime: timestamp[ns]
> pd_no_tz: timestamp[ns]
> pd_paris: timestamp[ns, tz=Europe/Paris]
> pd_helsinki: timestamp[ns, tz=Europe/Helsinki]
> pd_mixed: timestamp[us]
>
> For all columns except for pd_mixed the result is clear and expected, but
> apparently pyarrow converts the mixed timestamps to a TimestampArray
> without timezone using the "local times", and not the UTC-normalized times.
>
> Now, that certainly feels a bit buggy to me (or at least unexpected). But,
> this is an issue for the python -> arrow conversion, not related to the
> actual parquet writing. I will open a separate JIRA for this.
>
> Joris
>
