[ https://issues.apache.org/jira/browse/ARROW-15492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17488603#comment-17488603 ]
Micah Kornfield commented on ARROW-15492:
-----------------------------------------
So this looks like an oversight with INT96. The isAdjustedToUtc flag of the
logical type isn't taken into account when building the [arrow type for
int96|https://github.com/apache/arrow/blob/85f192a45755b3f15653fdc0a8fbd788086e125f/cpp/src/parquet/arrow/schema_internal.cc#L197],
even though it is used for [int64|#L197]. [~amznero] would you be interested in
contributing a fix for this?
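To make the mismatch concrete, here is a minimal pure-Python model of the type mapping described in the comment. It is not the C++ code in schema_internal.cc: the names `ArrowTimestamp`, `from_int64_timestamp` and `from_int96` are hypothetical, `"ns"` stands in for the default unit INT96 is read with, and the `fixed` flag models the proposed change that would let INT96 honor isAdjustedToUtc the same way INT64 does.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ArrowTimestamp:
    """Hypothetical stand-in for an Arrow timestamp type: a unit plus an optional tz."""
    unit: str
    tz: Optional[str]


def from_int64_timestamp(unit: str, is_adjusted_to_utc: bool) -> ArrowTimestamp:
    # INT64 path: the logical type's isAdjustedToUtc flag decides whether
    # the resulting Arrow type is UTC-anchored or timezone-naive.
    return ArrowTimestamp(unit, "UTC" if is_adjusted_to_utc else None)


def from_int96(is_adjusted_to_utc: bool, fixed: bool = False) -> ArrowTimestamp:
    # INT96 path as described in the comment: the flag is ignored, so the
    # result is always timezone-naive.
    if fixed:
        # Hypothetical fix: reuse the INT64 rule for INT96 columns.
        return from_int64_timestamp("ns", is_adjusted_to_utc)
    return ArrowTimestamp("ns", None)
```

With `fixed=True`, a UTC-adjusted INT96 column would map to `ArrowTimestamp(unit='ns', tz='UTC')`, matching what the INT64 path already produces; without it, the timezone is lost.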
> [Python] handle timestamp type in parquet file for compatibility with older HiveQL
> ----------------------------------------------------------------------------------
>
> Key: ARROW-15492
> URL: https://issues.apache.org/jira/browse/ARROW-15492
> Project: Apache Arrow
> Issue Type: New Feature
> Affects Versions: 6.0.1
> Reporter: nero
> Priority: Major
>
> Hi there,
> I'm facing an issue when writing a Parquet file with PyArrow.
> Older versions of Hive can only recognize timestamps stored as INT96, so I
> use pyarrow.parquet.write_to_dataset with the
> `use_deprecated_int96_timestamps=True` option to save the Parquet file. But
> Hive skips the timestamp conversion when the created_by field in the file
> metadata does not start with "parquet-mr".
> [hive/ParquetRecordReaderBase.java at
> f1ff99636a5546231336208a300a114bcf8c5944 · apache/hive
> (github.com)|https://github.com/apache/hive/blob/f1ff99636a5546231336208a300a114bcf8c5944/ql/src/java/org/apache/hadoop/hive/ql/io/parquet/ParquetRecordReaderBase.java#L137-L139]
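For background: an INT96 timestamp is a 12-byte value made of 8 little-endian bytes of nanoseconds within the day followed by 4 little-endian bytes holding the Julian day number (2440588 is 1970-01-01). The sketch below encodes and decodes that layout with the standard library and models the `created_by` prefix test from the linked Hive code as a plain string check. The helper names are hypothetical, and `hive_skips_conversion` is a simplification of the behavior described above, not Hive's actual code.

```python
import struct
from datetime import datetime, timedelta, timezone

JULIAN_DAY_OF_UNIX_EPOCH = 2440588  # Julian day number of 1970-01-01
UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def encode_int96(ts: datetime) -> bytes:
    """Pack a timezone-aware datetime into the 12-byte INT96 layout."""
    days, rem = divmod(ts - UNIX_EPOCH, timedelta(days=1))
    # rem is a timedelta in [0, 1 day); convert it to nanoseconds.
    nanos = (rem // timedelta(microseconds=1)) * 1000
    # "<qi": little-endian int64 nanos-of-day, then int32 Julian day (12 bytes).
    return struct.pack("<qi", nanos, days + JULIAN_DAY_OF_UNIX_EPOCH)


def decode_int96(raw: bytes) -> datetime:
    """Unpack a 12-byte INT96 value into a UTC datetime (microsecond precision)."""
    nanos, julian_day = struct.unpack("<qi", raw)
    return UNIX_EPOCH + timedelta(
        days=julian_day - JULIAN_DAY_OF_UNIX_EPOCH,
        microseconds=nanos // 1000,
    )


def hive_skips_conversion(created_by: str) -> bool:
    # Simplified model: only files whose created_by starts with "parquet-mr"
    # get the INT96 timestamp conversion; every other writer is skipped.
    return not (created_by or "").startswith("parquet-mr")
```

For example, with illustrative writer strings, `hive_skips_conversion("parquet-mr version 1.10.1")` returns `False`, while a PyArrow-style string such as `"parquet-cpp-arrow version 6.0.1"` returns `True`, which is why INT96 files written by PyArrow are treated differently.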
>
> So I have to save the timestamp columns with timezone info (adjusted to
> UTC+8). But when pyarrow.parquet reads a directory containing Parquet files
> created by both PyArrow and parquet-mr, the resulting Arrow Table ignores the
> timezone info for the parquet-mr files.
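The symptom can be illustrated with the standard library alone. Assuming the UTC+8 offset from the report, the sketch below takes one wall-clock reading, stores it as a UTC instant the way a timezone-aware writer would, and then shows the naive value a reader sees when the timezone is dropped. The variable names are illustrative only; the 8-hour gap is what surfaces when files from the two writers are mixed.

```python
from datetime import datetime, timedelta, timezone

UTC_PLUS_8 = timezone(timedelta(hours=8))

# Wall-clock time as recorded in the source system (UTC+8).
local = datetime(2022, 2, 8, 9, 0, tzinfo=UTC_PLUS_8)

# A timezone-aware writer stores the instant; normalized to UTC it is 01:00.
instant_utc = local.astimezone(timezone.utc)

# A reader that ignores the timezone keeps only the naive UTC fields.
naive_view = instant_utc.replace(tzinfo=None)

# Reading the same value back with its timezone restores the original reading.
restored = instant_utc.astimezone(UTC_PLUS_8)
```

Here `naive_view` is 01:00 with no timezone, while `restored` is 09:00+08:00, so the two views of the same instant differ by 8 hours on the wall clock.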
>
> Maybe PyArrow could expose the created_by option ({*}preferred{*};
> parquet::WriterProperties::created_by is already available in C++).
> Or Arrow could handle timezone-aware timestamps in files created by parquet-mr?
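The second proposal can be sketched as a read-side normalization. This is a minimal application-level stand-in, not Arrow's implementation: `attach_utc_if_naive` is a hypothetical helper, and it assumes that the naive timestamps read from parquet-mr files represent UTC instants, so attaching UTC makes them comparable with the timezone-aware values from the PyArrow-written files.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, List


def attach_utc_if_naive(values: Iterable[datetime]) -> List[datetime]:
    """Hypothetical read-side fix: treat naive values as UTC instants.

    Assumption: naive timestamps from parquet-mr files are UTC instants.
    Values that already carry a timezone are passed through unchanged.
    """
    return [v if v.tzinfo is not None else v.replace(tzinfo=timezone.utc) for v in values]


# Illustrative mixed directory: one tz-aware value (PyArrow file) and one
# naive value (parquet-mr file) describing the same instant.
from_pyarrow = datetime(2022, 2, 8, 9, 0, tzinfo=timezone(timedelta(hours=8)))
from_parquet_mr = datetime(2022, 2, 8, 1, 0)

merged = attach_utc_if_naive([from_pyarrow, from_parquet_mr])
```

After normalization both entries are timezone-aware and compare equal, whereas before it a naive and an aware `datetime` never compare equal in Python.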
>
> Maybe related to https://issues.apache.org/jira/browse/ARROW-14422