[ https://issues.apache.org/jira/browse/ORC-1054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17451286#comment-17451286 ]

Varun Raval commented on ORC-1054:
----------------------------------

The file converted_by_hive.orc was generated by Hive as a result of running an
INSERT command. It contains two rows: 2021-11-10 01:02:15.553 and
0001-01-01 00:00:00.0. A SELECT query with a WHERE clause on the timestamp
column retrieves data from this ORC file correctly.

Another observation:

The query `select * from master.csvtest where d > unix_timestamp('2021-11-10 00:00:00.000');`
retrieves rows correctly from both ORC files (those generated by the CSV tools
and those generated by Hive).

However, the query `select * from master.csvtest where d > '2021-11-10 00:00:00.000';`
retrieves rows correctly only from the ORC file generated by Hive.

> Unable to compare data (generated using CSV to ORC converter) on timestamp column
> ---------------------------------------------------------------------------------
>
>                 Key: ORC-1054
>                 URL: https://issues.apache.org/jira/browse/ORC-1054
>             Project: ORC
>          Issue Type: Bug
>          Components: C++, Java
>            Reporter: Varun Raval
>            Priority: Major
>         Attachments: converted_by_hive.orc, file1.orc, hive_table_desc.jpg, timestamp1.csv
>
>
> I have a CSV file with timestamp columns. I convert the CSV file to an ORC file
> using the CSV to ORC converter and place the ORC file in a Hive table backed by
> ORC files. I am not able to query the data using the timestamp column in Apache
> Hive Beeline. If a timestamp is used in the query's filter, the corresponding
> rows are not retrieved.
> For example, table csvtest has a single column (t) of timestamp data type. It
> has a row '2021-11-10 01:02:15'. The query
> "select * from csvtest where t > '2021-11-10 00:00:00'" does not return any
> result, while the query "select * from csvtest" returns the correct row.
> However, the same query "select * from csvtest where t > '2021-11-10 00:00:00'"
> works with Spark SQL, and the rows are retrieved correctly.
> Is this an issue with how the ORC file is created, or is it a Hive
> configuration issue?
> I have tested it on the master branch, and the results are the same for both
> the C++ and Java CSV to ORC converters.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
