Github user wgtmac commented on the issue:
https://github.com/apache/orc/pull/233
Thanks @majetideepak for the comment!
On the Java side, the input timestamp in the writer's TimestampColumnVector is
in UTC. It leverages java.sql.Timestamp, which knows the local timezone and can
therefore PRINT in the local timezone. You can print the millis variable at
line 109 of TimestampTreeWriter.java to verify this. The name
SerializationUtils.convertToUtc(localTimezone, millis) at line 113 is somewhat
confusing, because the result is not the timestamp in UTC; it is the value with
a local-timezone offset applied, which I think is also a problem.
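To illustrate the point about java.sql.Timestamp, here is a minimal sketch (not ORC code; the class name `TimestampPrintDemo` and helper `render` are made up for this example). It assumes only standard JDK behavior: the stored millis are UTC-based, and only the printed form depends on the JVM default timezone.

```java
import java.sql.Timestamp;
import java.util.TimeZone;

public class TimestampPrintDemo {
    // Hypothetical helper: renders the same epoch-millis value while a given
    // zone is the JVM default, then restores the previous default.
    static String render(long epochMillis, String zoneId) {
        TimeZone saved = TimeZone.getDefault();
        try {
            TimeZone.setDefault(TimeZone.getTimeZone(zoneId));
            return new Timestamp(epochMillis).toString();
        } finally {
            TimeZone.setDefault(saved);
        }
    }

    public static void main(String[] args) {
        long millis = 0L; // 1970-01-01 00:00:00 UTC

        // The underlying value is the same in both cases ...
        System.out.println(new Timestamp(millis).getTime() == millis); // true

        // ... only the printed wall-clock text changes with the default zone.
        System.out.println(render(millis, "UTC"));                 // 1970-01-01 00:00:00.0
        System.out.println(render(millis, "America/Los_Angeles")); // 1969-12-31 16:00:00.0
    }
}
```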
ORC-10 fixed the bug related to the missing writer timezone. The original
design was meant to be resilient to moves between different reader timezones.
However, this caused an issue in C++ between timezones with different daylight
saving rules, so the writer timezone is now forced to be written. The GMT
offset that ORC-10 adds actually converts the value to the local timezone so
that ColumnPrinter can print the same time in the local timezone. This causes a
new problem: the C++ reader returns the timestamp value in the local timezone,
not UTC, which differs from the Java reader. I believe this is
why @owen has created [ORC-37](https://issues.apache.org/jira/browse/ORC-37).
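A rough sketch of the effect described above, written in Java for illustration only (this is not the C++ reader code; `GmtOffsetShiftDemo` and `addGmtOffset` are hypothetical names). It assumes that "adding the GMT offset" means adding the zone's offset at that instant to the UTC value.

```java
import java.time.Instant;
import java.util.TimeZone;

public class GmtOffsetShiftDemo {
    // Hypothetical helper: adds the zone's GMT offset (in millis, at that
    // instant) to a UTC epoch-millis value.
    static long addGmtOffset(long utcMillis, String zoneId) {
        return utcMillis + TimeZone.getTimeZone(zoneId).getOffset(utcMillis);
    }

    public static void main(String[] args) {
        long utcMillis = 0L; // 1970-01-01 00:00:00 UTC
        long shifted = addGmtOffset(utcMillis, "America/Los_Angeles");

        // The shifted value is no longer the original instant.
        System.out.println(shifted); // -28800000 (8 hours earlier)

        // Read as UTC, the shifted value shows the local wall-clock time,
        // which is convenient for printing but is not the UTC value a
        // reader holding the original millis would see.
        System.out.println(Instant.ofEpochMilli(shifted));   // 1969-12-31T16:00:00Z
        System.out.println(Instant.ofEpochMilli(utcMillis)); // 1970-01-01T00:00:00Z
    }
}
```

With a shift like this, two readers handed the same stored value can disagree by the zone offset, which is the mismatch being discussed.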
The SQL type TimestampTz is a new type, distinct from the traditional SQL type
Timestamp. I don't think it is a good idea to mix the ORC timestamp type with
TimestampTz, and there is another open issue for it:
[ORC-189](https://issues.apache.org/jira/browse/ORC-189)
It is very confusing that a timestamp written with the Java writer is read
differently by the C++ reader. I think we need to fix it, and doing so can also
resolve ORC-37. What do you think?
---