Issue [1] raises a behavior change in Arrow's Parquet reader. In general,
when reading a Parquet file with legacy timestamps that was not written by
Arrow, isAdjustedToUTC is ignored during the read, and filtering on such a
file does not work correctly.


When converting from a "deprecated" Parquet 1.0 ConvertedType, a timestamp
should be forced to adjustedToUtc = true.

On the Parquet standard side: Parquet has a ConvertedType for legacy
timestamps, and legacy timestamps *do not* have an adjustedToUtc flag. So,
for compatibility with older writers, a reader needs to treat such a
timestamp as adjustedToUtc (a UTC timestamp). See [2] [3].
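A minimal sketch of that rule, written in Python for illustration only: a
legacy timestamp ConvertedType is mapped to a TIMESTAMP logical type with the
UTC flag forced to true. `ConvertedType`, `TimestampLogicalType` and
`timestamp_from_converted_type` are hypothetical names; they are not the
parquet-format Thrift definitions or the Arrow C++/Go API.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ConvertedType(Enum):
    # Hypothetical subset of Parquet 1.0 ConvertedType values.
    TIMESTAMP_MILLIS = "TIMESTAMP_MILLIS"
    TIMESTAMP_MICROS = "TIMESTAMP_MICROS"
    UTF8 = "UTF8"


@dataclass(frozen=True)
class TimestampLogicalType:
    # Hypothetical stand-in for the TIMESTAMP logical type annotation.
    is_adjusted_to_utc: bool
    unit: str  # "MILLIS" or "MICROS"


# Legacy timestamp converted types and the unit each one implies.
_UNIT_FOR_LEGACY = {
    ConvertedType.TIMESTAMP_MILLIS: "MILLIS",
    ConvertedType.TIMESTAMP_MICROS: "MICROS",
}


def timestamp_from_converted_type(
    ct: ConvertedType,
) -> Optional[TimestampLogicalType]:
    """Map a legacy timestamp ConvertedType to a TIMESTAMP logical type.

    The legacy type carries no adjustedToUtc flag, so it is read as a
    UTC timestamp: the flag is forced to True. Returns None for
    converted types that are not timestamps.
    """
    unit = _UNIT_FOR_LEGACY.get(ct)
    if unit is None:
        return None
    return TimestampLogicalType(is_adjusted_to_utc=True, unit=unit)


if __name__ == "__main__":
    print(timestamp_from_converted_type(ConvertedType.TIMESTAMP_MICROS))
```

The pre-16.0 reader behavior described below amounts to not forcing this flag
for files not written by Arrow.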

However, as mentioned in [4], Arrow ignores "adjustedToUtc" for legacy
files, so the Arrow Parquet readers in C++ and Go did not follow the
standard before 16.0. Following it now would be a breaking change. Is that
acceptable, or should we revert this change in C++ and Go back to the
previous implementation?

Best,
Xuwei Fu

[1] https://github.com/apache/arrow/issues/39489
[2] https://github.com/apache/parquet-format/blob/eb4b31c1d64a01088d02a2f9aefc6c17c54cc6fc/LogicalTypes.md?plain=1#L480-L485
[3] https://github.com/apache/parquet-format/blob/eb4b31c1d64a01088d02a2f9aefc6c17c54cc6fc/LogicalTypes.md?plain=1#L308
[4] https://github.com/apache/arrow/pull/39491#issuecomment-1884465635
