[
https://issues.apache.org/jira/browse/ORC-611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Owen O'Malley resolved ORC-611.
-------------------------------
Fix Version/s: 1.7.0, 1.6.4
Resolution: Fixed
I just committed this. Thank you, Panos!
> Incorrect min-max stats for sub-millisecond timestamps
> ------------------------------------------------------
>
> Key: ORC-611
> URL: https://issues.apache.org/jira/browse/ORC-611
> Project: ORC
> Issue Type: Bug
> Components: C++, Java
> Reporter: Csaba Ringhofer
> Assignee: Panagiotis Garefalakis
> Priority: Major
> Fix For: 1.6.4, 1.7.0
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> The issue is related to the precision of storing timestamps:
> - nanoseconds for the data itself
> - only milliseconds for min-max statistics
> Both min and max are rounded to the same value, while min should be rounded
> down and max should be rounded up to ensure that the values are actually
> within that range.
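The rounding rule described above can be illustrated with a minimal sketch. It assumes timestamps are held as integer nanoseconds since the epoch and statistics as integer milliseconds; the helper names are hypothetical and are not taken from the ORC code base, and this is not necessarily what the committed fix does.

```python
NANOS_PER_MILLI = 1_000_000  # nanoseconds in one millisecond


def min_stat_millis(nanos: int) -> int:
    """Round a nanosecond timestamp DOWN to whole milliseconds (lower bound).

    Hypothetical helper. Python's // is floor division, so this also
    rounds toward negative infinity for pre-epoch (negative) values.
    """
    return nanos // NANOS_PER_MILLI


def max_stat_millis(nanos: int) -> int:
    """Round a nanosecond timestamp UP to whole milliseconds (upper bound).

    Hypothetical helper. Ceiling division written as negated floor division.
    """
    return -(-nanos // NANOS_PER_MILLI)


# 1970-01-01 00:00:00.0005 is 500 microseconds, i.e. 500_000 ns after the epoch.
ts_nanos = 500_000
lo_ms = min_stat_millis(ts_nanos)
hi_ms = max_stat_millis(ts_nanos)

# The stored value lies inside the [min, max] range expressed in nanoseconds.
in_range = lo_ms * NANOS_PER_MILLI <= ts_nanos <= hi_ms * NANOS_PER_MILLI
```

With min rounded down and max rounded up, the value from the repro falls inside the recorded range, so an equality predicate on it would not prune the data.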
> Repro in Hive:
> {code}
> create table tsstat (ts timestamp) stored as orc;
> insert into tsstat values ("1970-01-01 00:00:00.0005");
> select * from tsstat where ts = "1970-01-01 00:00:00.0005";
> -- returned 0 rows
> {code}
> Both the Java and the C++ writers have this issue (thanks [~stigahuang] for
> looking them up):
> https://github.com/apache/orc/blob/fea154436c37c81a16b13d879b510096cfaa2946/java/core/src/java/org/apache/orc/impl/writer/TimestampTreeWriter.java#L108
> https://github.com/apache/orc/blob/fea154436c37c81a16b13d879b510096cfaa2946/c%2B%2B/src/ColumnWriter.cc#L1800
> I guess that there are already files with this issue in production, so I
> think that the only way to fix this is to hack the reader:
> - decrease/increase min/max stats by 1 ms after reading them from file
> - also be careful about the values pushed down, as the same precision loss
> can occur there too, e.g. "WHERE ts < '1970-01-01 00:00:00.0005' AND ts >
> '1970-01-01 00:00:00.0004'" shouldn't be converted to ts < "1970-01-01" AND
> ts > "1970-01-01"
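The reader-side workaround proposed in the two bullets above might look roughly like the following sketch. It assumes the file stores min/max as integer milliseconds and that the pushed-down literal is kept as integer nanoseconds rather than truncated; the function names are hypothetical and do not correspond to any ORC or Impala API.

```python
NANOS_PER_MILLI = 1_000_000  # nanoseconds in one millisecond


def widened_range_nanos(min_ms: int, max_ms: int) -> tuple[int, int]:
    """Widen millisecond stats read from a file by 1 ms on each side.

    Hypothetical helper. Returns (lower, upper) bounds in nanoseconds.
    Widening both ends covers files whose stats were rounded in
    either direction.
    """
    return (min_ms - 1) * NANOS_PER_MILLI, (max_ms + 1) * NANOS_PER_MILLI


def may_contain_equal(min_ms: int, max_ms: int, literal_nanos: int) -> bool:
    """Decide whether a 'ts = literal' predicate could match any row.

    The literal is compared at nanosecond precision; it is NOT
    truncated to milliseconds before the comparison.
    """
    lower, upper = widened_range_nanos(min_ms, max_ms)
    return lower <= literal_nanos <= upper


# Stats as an affected writer might have stored them for 00:00:00.0005:
# both min and max collapsed to 0 ms.
file_min_ms, file_max_ms = 0, 0

keeps_repro_row = may_contain_equal(file_min_ms, file_max_ms, 500_000)
prunes_far_value = not may_contain_equal(file_min_ms, file_max_ms, 5_000_000)
```

The trade-off is slightly less effective pruning (the range grows by up to 2 ms) in exchange for never skipping data that actually matches.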
> The issue was discovered during an Impala review:
> https://gerrit.cloudera.org/#/c/15403/1/be/src/exec/hdfs-orc-scanner.cc@875
--
This message was sent by Atlassian Jira
(v8.3.4#803005)