Github user stiga-huang commented on the issue:
https://github.com/apache/orc/pull/233
@omalley @majetideepak @wgtmac Thanks for your follow-up on ORC-322! If I
understand correctly, the convention is that TimestampColumnVector should
only accept timestamps in local time. Timestamp values stored in an ORC file
are `local_timestamp - local_orc_epoch`. The TimestampColumnVector returned by
the Java reader has timestamps in local time. However, the
TimestampColumnVector returned by the C++ reader has UTC timestamps.
If so, the C++ writer doesn't need to subtract gmtOffset from each timestamp,
because after that shift the values in the ORC file are `utc_timestamp -
local_orc_epoch`.
If not, I think the bug in ORC-320 should still be fixed (ORC-322 aims to fix
ORC-320). The root cause of ORC-320 is that the gmtOffsets obtained in the
writer and the reader can differ, even though both use the same timezone.
To be specific, the writer looks up gmtOffset using timestamp `ts`, then writes
`ts - gmtOffset` (let's ignore the ORC epoch since it's the same in the writer
and the reader). The reader uses `ts - gmtOffset` to look up gmtOffset2, then
reads out `ts - gmtOffset + gmtOffset2`. However, `gmtOffset2` may not equal
`gmtOffset`, for example when `ts` and `ts - gmtOffset` fall on opposite sides
of a daylight-saving transition.
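To make the mismatch concrete, here is a minimal Python sketch of the round
trip described above. It is not the ORC code: the zone is hypothetical, with a
single offset change at an assumed instant `TRANSITION`, and the names
`gmt_offset`, `write_value`, `read_value` and `round_trip_error` are made up
for illustration. The ORC epoch is left out, as in the explanation above.

```python
# Hypothetical zone with one offset change; not the actual ORC implementation.
TRANSITION = 1_520_762_400      # assumed instant (Unix seconds) where the offset changes
OFFSET_BEFORE = -8 * 3600       # assumed offset before the transition
OFFSET_AFTER = -7 * 3600        # assumed offset after the transition


def gmt_offset(seconds):
    """Look up the offset by treating `seconds` as a point on the UTC timeline."""
    return OFFSET_BEFORE if seconds < TRANSITION else OFFSET_AFTER


def write_value(ts):
    """Writer side: look up gmtOffset with ts, store ts - gmtOffset."""
    return ts - gmt_offset(ts)


def read_value(stored):
    """Reader side: look up gmtOffset2 with the stored value, then add it back."""
    return stored + gmt_offset(stored)


def round_trip_error(ts):
    """Difference between the value read back and the value written."""
    return read_value(write_value(ts)) - ts


far = TRANSITION - 24 * 3600    # a day before the transition
near = TRANSITION - 5 * 3600    # within 8 hours of the transition

print(round_trip_error(far))    # both lookups use the same offset
print(round_trip_error(near))   # the two lookups straddle the transition
```

In this sketch the timestamp far from the transition round-trips unchanged,
while the one close to it comes back shifted by the difference between the two
offsets, because the stored value `ts - gmtOffset` already lies past the
transition when the reader does its lookup.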
Thanks for your patience reading this long comment!
---