[ 
https://issues.apache.org/jira/browse/HIVE-17186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16103551#comment-16103551
 ] 

Andrew Sherman commented on HIVE-17186:
---------------------------------------

This looks like an artifact of floating-point arithmetic; plain Java shows the same behavior:

{noformat}
double d1 = 0.06D;
double d2 = 0.01D;
double d3 = d1 + d2;
double d4 = d1 - d2;
System.out.println("d3 = " + d3);
System.out.println("d4 = " + d4);
{noformat}
gives
{noformat}
d3 = 0.06999999999999999
d4 = 0.049999999999999996
{noformat}
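
To show why 0.07 then falls outside the range, here is a minimal sketch that compares the double bound with the same bound computed in exact decimal arithmetic. The class and method names are made up for illustration and are not Hive code; {{java.math.BigDecimal}} is used only to demonstrate the difference, not as a proposed fix.

```java
import java.math.BigDecimal;

// Hypothetical demo class, not part of Hive.
public class DoubleConstantDemo {
    // Upper bound computed with binary double arithmetic, as in the snippet above.
    static double doubleUpperBound() {
        return 0.06D + 0.01D;
    }

    // The same bound computed with exact decimal arithmetic.
    static BigDecimal decimalUpperBound() {
        return new BigDecimal("0.06").add(new BigDecimal("0.01"));
    }

    public static void main(String[] args) {
        double value = 0.07D;
        // The double sum is slightly below 0.07, so this comparison is false.
        System.out.println("double:  0.07 <= " + doubleUpperBound()
            + " is " + (value <= doubleUpperBound()));
        // The decimal sum is exactly 0.07, so this comparison is true.
        System.out.println("decimal: 0.07 <= " + decimalUpperBound()
            + " is " + (new BigDecimal("0.07").compareTo(decimalUpperBound()) <= 0));
    }
}
```

With doubles the upper bound ends up just below 0.07, so a stored 0.07 does not satisfy the BETWEEN predicate; with decimal arithmetic the bound is exactly 0.07.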

> `double` type constant operation loses precision
> ------------------------------------------------
>
>                 Key: HIVE-17186
>                 URL: https://issues.apache.org/jira/browse/HIVE-17186
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Dongjoon Hyun
>
> Hive may lose precision and produce a wrong result when evaluating 
> operations on *double* constants. This was reported in the following 
> environment.
> *ENVIRONMENT*
> https://github.com/hortonworks/hive-testbench/blob/hive14/tpch-gen/ddl/orc.sql
> *SQL*
> {code}
> hive> explain select l_discount from lineitem where l_discount between 0.06 - 0.01 and 0.06 + 0.01 limit 10;
> OK
> Plan not optimized by CBO.
> Stage-0
>    Fetch Operator
>       limit:10
>       Stage-1
>          Map 1 vectorized
>          File Output Operator [FS_9]
>             compressed:false
>             Statistics:Num rows: 10 Data size: 80 Basic stats: COMPLETE Column stats: COMPLETE
>             table:{"input format:":"org.apache.hadoop.mapred.TextInputFormat","output format:":"org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat","serde:":"org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe"}
>             Limit [LIM_8]
>                Number of rows:10
>                Statistics:Num rows: 10 Data size: 80 Basic stats: COMPLETE Column stats: COMPLETE
>                Select Operator [OP_7]
>                   outputColumnNames:["_col0"]
>                   Statistics:Num rows: 2999994854 Data size: 23999958832 Basic stats: COMPLETE Column stats: COMPLETE
>                   Filter Operator [FIL_6]
>                      predicate:l_discount BETWEEN 0.049999999999999996 AND 0.06999999999999999 (type: boolean)
>                      Statistics:Num rows: 2999994854 Data size: 23999958832 Basic stats: COMPLETE Column stats: COMPLETE
>                      TableScan [TS_0]
>                         alias:lineitem
>                         Statistics:Num rows: 5999989709 Data size: 4832986297043 Basic stats: COMPLETE Column stats: COMPLETE
> hive> select max(l_discount) from lineitem where l_discount between 0.06 - 0.01 and 0.06 + 0.01 limit 10;
> OK
> 0.06
> Time taken: 314.923 seconds, Fetched: 1 row(s)
> {code}
> Hive excludes 0.07, contrary to users' intuition. This difference also 
> confuses some users, because they believe that Hive's result is the correct 
> one. Is there any way for Hive to fix this?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
