[ https://issues.apache.org/jira/browse/ORC-1055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17459183#comment-17459183 ]
Gang Wu commented on ORC-1055:
------------------------------
The problem is caused by CSVFileImport.cc and Writer.cc using inconsistent
time zones.
The C++ CSV import tool uses the local time zone to parse timestamps:
[orc/CSVFileImport.cc at main · apache/orc (github.com)|https://github.com/apache/orc/blob/main/tools/src/CSVFileImport.cc#L257]
But the C++ writer fixes the time zone to GMT for some reason:
[orc/Writer.cc at main · apache/orc (github.com)|https://github.com/apache/orc/blob/main/c%2B%2B/src/Writer.cc#L63]
We may fix it by adding a conversion from the local time zone to GMT in
CSVFileImport.cc.
[~Guiyankuang] [~dongjoon]
> [C++] Timestamp values read in Hive are different when using ORC file created
> using CSV to ORC converter tools
> --------------------------------------------------------------------------------------------------------------
>
> Key: ORC-1055
> URL: https://issues.apache.org/jira/browse/ORC-1055
> Project: ORC
> Issue Type: Bug
> Components: C++
> Reporter: Yiqun Zhang
> Priority: Major
> Attachments: converted_by_cpp.orc, timestamp.csv
>
>
> I have a CSV file with a column containing the timestamp value 0001-01-01
> 00:00:00.0. I convert the CSV file to an ORC file using the CSV-to-ORC
> converter and place the ORC file in a Hive table backed by ORC files. When
> querying the data with Hive beeline and Spark SQL, the value returned differs
> from the one in the CSV file.
> If the file is converted with the C++ tool, the value read by Hive beeline and
> Spark SQL queries is 0001-01-03 00:00:00.
> Reported by [~vraval48]
--
This message was sent by Atlassian Jira
(v8.20.1#820001)