[ 
https://issues.apache.org/jira/browse/ORC-1053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17451204#comment-17451204
 ] 

Varun Raval edited comment on ORC-1053 at 11/30/21, 3:55 PM:
-------------------------------------------------------------

I tested using the main branch. The sample CSV file is timestamp.csv; it has a 
single column.

The files converted_by_cpp.orc and converted_by_java.orc were generated by the 
C++ and Java tools, respectively.

Commands used:
 # /root/orc/build/tools/src/csv-import "struct<d:timestamp>" input.csv output.orc
 # java -jar /root/orc/build/java/tools/orc-tools-1.8.0-SNAPSHOT-uber.jar convert --schema "struct<d:timestamp>" -o output.orc -t "yyyy-MM-dd HH:mm:ss.SSS" input.csv

The destination table in Hive is an external table containing a single column 
of type timestamp. The table description is shown in hive_table_desc.jpg.



> Timestamp values read in Hive are different when using ORC file created using 
> CSV to ORC converter tools
> --------------------------------------------------------------------------------------------------------
>
>                 Key: ORC-1053
>                 URL: https://issues.apache.org/jira/browse/ORC-1053
>             Project: ORC
>          Issue Type: Bug
>          Components: C++, Java
>            Reporter: Varun Raval
>            Priority: Major
>         Attachments: converted_by_cpp.orc, converted_by_java.orc, 
> timestamp.csv
>
>
> I have a CSV file with a column whose timestamp values are 0001-01-01 
> 00:00:00.0. I convert the CSV file to an ORC file using the CSV to ORC 
> converter tools and place the ORC file in a Hive table backed by ORC files. 
> Querying the data with Hive beeline and Spark SQL returns different results 
> depending on which converter was used:
> If converted using the C++ tool, the value read by Hive beeline and Spark SQL 
> queries is 0001-01-03 00:00:00
> If converted using the Java tool, the value read by Hive beeline and Spark SQL 
> queries is 0001-01-02 23:56:02.0
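
To put numbers on the discrepancy, here is a minimal Python sketch that computes how far each value read back differs from the value written. The three timestamps are taken from the description above (fractional seconds dropped); the variable names are only illustrative, and the sketch does not try to explain the cause of either shift.

```python
from datetime import datetime

# Values from the issue description; names are illustrative only.
written = datetime(1, 1, 1, 0, 0, 0)       # value stored in the CSV file
read_cpp = datetime(1, 1, 3, 0, 0, 0)      # read back after conversion with the C++ tool
read_java = datetime(1, 1, 2, 23, 56, 2)   # read back after conversion with the Java tool

cpp_shift = read_cpp - written     # offset introduced on the C++ path
java_shift = read_java - written   # offset introduced on the Java path
gap = read_cpp - read_java         # disagreement between the two tools

print(cpp_shift)    # 2 days, 0:00:00
print(java_shift)   # 1 day, 23:56:02
print(gap)          # 0:03:58
```

So both paths are off by roughly two days, and they disagree with each other by 3 minutes 58 seconds.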


