[
https://issues.apache.org/jira/browse/HIVE-17186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16103551#comment-16103551
]
Andrew Sherman commented on HIVE-17186:
---------------------------------------
This looks like an artifact of binary floating-point arithmetic; plain Java shows the same values:
{noformat}
double d1 = 0.06D;
double d2 = 0.01D;
double d3 = d1 + d2;
double d4 = d1 - d2;
System.out.println("d3 = " + d3);
System.out.println("d4 = " + d4);
{noformat}
gives
{noformat}
d3 = 0.06999999999999999
d4 = 0.049999999999999996
{noformat}
> `double` type constant operation loses precision
> ------------------------------------------------
>
> Key: HIVE-17186
> URL: https://issues.apache.org/jira/browse/HIVE-17186
> Project: Hive
> Issue Type: Bug
> Reporter: Dongjoon Hyun
>
> This might be an issue where Hive loses precision and generates a wrong
> result when handling *double* constant operations. It was reported in the
> following environment.
> *ENVIRONMENT*
> https://github.com/hortonworks/hive-testbench/blob/hive14/tpch-gen/ddl/orc.sql
> *SQL*
> {code}
> hive> explain select l_discount from lineitem where l_discount between 0.06 - 0.01 and 0.06 + 0.01 limit 10;
> OK
> Plan not optimized by CBO.
> Stage-0
> Fetch Operator
> limit:10
> Stage-1
> Map 1 vectorized
> File Output Operator [FS_9]
> compressed:false
> Statistics:Num rows: 10 Data size: 80 Basic stats: COMPLETE Column stats: COMPLETE
> table:{"input format:":"org.apache.hadoop.mapred.TextInputFormat","output format:":"org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat","serde:":"org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe"}
> Limit [LIM_8]
> Number of rows:10
> Statistics:Num rows: 10 Data size: 80 Basic stats: COMPLETE Column stats: COMPLETE
> Select Operator [OP_7]
> outputColumnNames:["_col0"]
> Statistics:Num rows: 2999994854 Data size: 23999958832 Basic stats: COMPLETE Column stats: COMPLETE
> Filter Operator [FIL_6]
> predicate:l_discount BETWEEN 0.049999999999999996 AND 0.06999999999999999 (type: boolean)
> Statistics:Num rows: 2999994854 Data size: 23999958832 Basic stats: COMPLETE Column stats: COMPLETE
> TableScan [TS_0]
> alias:lineitem
> Statistics:Num rows: 5999989709 Data size: 4832986297043 Basic stats: COMPLETE Column stats: COMPLETE
> hive> select max(l_discount) from lineitem where l_discount between 0.06 - 0.01 and 0.06 + 0.01 limit 10;
> OK
> 0.06
> Time taken: 314.923 seconds, Fetched: 1 row(s)
> {code}
> Hive excludes 0.07, contrary to users' intuition. This difference also
> confuses some users, because they assume that Hive's result is the correct
> one. Is there any way for Hive to fix this?