Created an issue for the mixed timezones here: https://issues.apache.org/jira/browse/ARROW-5912
On Thu, 11 Jul 2019 at 09:42, Joris Van den Bossche wrote:
> Clarification regarding the mixed types (in the end this is not really
> related to parquet, but to how pandas gets converted to pyarrow):
>
> On Thu, 11 Jul 2019 at 09:17, Zoltan Ivanfi wrote:
>
>> ...
>> This matched my expectations up until pd_mixed. I was surprised to see
>> that timestamps with mixed time zones were stored using local
>> semantics instead of being normalized to UTC,
>>
>
> For the actual parquet writing semantics, it is more relevant to look at
> the arrow Table that gets created from this DataFrame:
>
> In [20]: pa.Table.from_pandas(df)
> Out[20]:
> pyarrow.Table
> datetime: timestamp[ns]
> pd_no_tz: timestamp[ns]
> pd_paris: timestamp[ns, tz=Europe/Paris]
> pd_helsinki: timestamp[ns, tz=Europe/Helsinki]
> pd_mixed: timestamp[us]
>
> For all columns except pd_mixed the result is clear and expected, but
> apparently pyarrow converts the mixed timestamps to a TimestampArray
> without a timezone using the "local times", not the UTC-normalized times.
>
> Now, that certainly feels a bit buggy to me (or at least unexpected). But
> this is an issue for the python -> arrow conversion, not related to the
> actual parquet writing. I will open a separate JIRA for this.
>
> Joris
>
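To make the difference between "local times" and "UTC-normalized times" for the pd_mixed case concrete, here is a minimal stdlib-only sketch (plain Python datetimes, not pyarrow code). It assumes fixed UTC offsets of +02:00 and +03:00 as stand-ins for Europe/Paris and Europe/Helsinki in July; the helper names `local_semantics` and `utc_normalized` are hypothetical and only illustrate the two possible conversions.

```python
from datetime import datetime, timedelta, timezone

# Assumption: fixed offsets standing in for Europe/Paris (CEST) and
# Europe/Helsinki (EEST) in July, so no tz database is needed.
PARIS_SUMMER = timezone(timedelta(hours=2))
HELSINKI_SUMMER = timezone(timedelta(hours=3))

# Two aware timestamps with the same wall-clock time in different zones,
# i.e. two *different* instants in a single "mixed" column.
mixed = [
    datetime(2019, 7, 11, 12, 0, tzinfo=PARIS_SUMMER),
    datetime(2019, 7, 11, 12, 0, tzinfo=HELSINKI_SUMMER),
]


def local_semantics(values):
    """Hypothetical helper: drop the zone and keep the wall-clock time."""
    return [v.replace(tzinfo=None) for v in values]


def utc_normalized(values):
    """Hypothetical helper: convert each value to UTC, then drop the zone."""
    return [v.astimezone(timezone.utc).replace(tzinfo=None) for v in values]


if __name__ == "__main__":
    # Local semantics: both become 12:00, so two distinct instants collapse.
    print(local_semantics(mixed))
    # UTC normalization: 10:00 and 09:00, preserving the real instants.
    print(utc_normalized(mixed))
```

With local semantics the two values become indistinguishable even though they denote different moments, which is why the pd_mixed result looks like a bug rather than a deliberate choice.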
