[ https://issues.apache.org/jira/browse/HIVE-13292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15196819#comment-15196819 ]
Sergey Shelukhin commented on HIVE-13292:
-----------------------------------------
With the DOUBLE type, this is usually by design.
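
A minimal sketch of why this tends to happen, written in Python rather than HiveQL and using illustrative values that are not taken from the TPC-H data: double addition is not associative, so a sum that one engine accumulates row by row and another builds from merged partial sums can differ in its last digits. The helper `naive_sum` and the two-way partition split are assumptions made for the demonstration, not a description of how either engine schedules its aggregation.

```python
import math

def naive_sum(xs):
    """Left-to-right double addition, rounding after every step."""
    total = 0.0
    for x in xs:
        total += x
    return total

# Illustrative values only; they are not taken from the lineitem table.
values = [0.1, 0.2, 0.3]

# One task adds every row in order.
single_pass = naive_sum(values)

# Two tasks each add one partition, then the partial sums are merged.
partials = [naive_sum(values[:1]), naive_sum(values[1:])]
merged = naive_sum(partials)

print(single_pass, merged)       # 0.6000000000000001 0.6
print(single_pass == merged)     # False
print(math.isclose(single_pass, merged, rel_tol=1e-12))  # True
```

The two results differ only in the final digit. An explicit loop is used instead of the built-in `sum`, because Python 3.12 and later apply compensated summation to floats there, which would hide the effect.
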
> Different DOUBLE type precision issue between Spark and MR engine
> -----------------------------------------------------------------
>
> Key: HIVE-13292
> URL: https://issues.apache.org/jira/browse/HIVE-13292
> Project: Hive
> Issue Type: Bug
> Environment: Apache Hive 2.0.0
> Apache Spark 1.6.0
> Reporter: Xin Hao
>
> The Spark and MR engines return DOUBLE results that differ in their last few digits.
> Found when running TPC-H query 5 at scale factor 2 (2 GB of data).
> Details follow.
> (1) The MR engine output:
> MOZAMBIQUE,1.0646195910990009E8
> ETHIOPIA,1.0108856206629996E8
> ALGERIA,9.987582690420012E7
> MOROCCO,9.785484184850013E7
> KENYA,9.412388077690017E7
> (2) The Spark engine output:
> MOZAMBIQUE,1.064619591099E8
> ETHIOPIA,1.0108856206630005E8
> ALGERIA,9.987582690419997E7
> MOROCCO,9.785484184850003E7
> KENYA,9.412388077690002E7
> (3) The full SQL used:
> drop table if exists ${env:RESULT_TABLE};
> create table ${env:RESULT_TABLE} (
> pid1 STRING,
> pid2 DOUBLE
> )
> row format delimited fields terminated by ',' lines terminated by '\n'
> stored as ${env:HIVE_DEFAULT_FILEFORMAT_RESULT_TABLE} location
> '${env:RESULT_DIR}';
> insert into table ${env:RESULT_TABLE}
> select
> n_name,
> sum(l_extendedprice * (1 - l_discount)) as revenue
> from
> customer,
> orders,
> lineitem,
> supplier,
> nation,
> region
> where
> c_custkey = o_custkey
> and l_orderkey = o_orderkey
> and l_suppkey = s_suppkey
> and c_nationkey = s_nationkey
> and s_nationkey = n_nationkey
> and n_regionkey = r_regionkey
> and r_name = 'AFRICA'
> and o_orderdate >= '1993-01-01'
> and o_orderdate < '1994-01-01'
> group by
> n_name
> order by
> revenue desc;
> (4) A similar issue remains even after the original query is simplified to
> the one below:
> drop table if exists ${env:RESULT_TABLE};
> create table ${env:RESULT_TABLE} (
> pid2 DOUBLE
> )
> row format delimited fields terminated by ',' lines terminated by '\n'
> stored as ${env:HIVE_DEFAULT_FILEFORMAT_RESULT_TABLE} location
> '${env:RESULT_DIR}';
> insert into table ${env:RESULT_TABLE}
> select
> sum(l_extendedprice * (1 - l_discount)) as revenue
> from
> lineitem
> group by
> l_orderkey
> order by
> revenue;
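
As a rough check of how far apart the two result sets are, the following Python sketch compares the values reported in (1) and (2) pairwise. The relative tolerance of 1e-12 and the helper `relative_difference` are assumptions chosen for illustration, not Hive or Spark settings.

```python
import math

# Values copied from the MR and Spark outputs in the report.
mr = {
    "MOZAMBIQUE": 1.0646195910990009E8,
    "ETHIOPIA": 1.0108856206629996E8,
    "ALGERIA": 9.987582690420012E7,
    "MOROCCO": 9.785484184850013E7,
    "KENYA": 9.412388077690017E7,
}
spark = {
    "MOZAMBIQUE": 1.064619591099E8,
    "ETHIOPIA": 1.0108856206630005E8,
    "ALGERIA": 9.987582690419997E7,
    "MOROCCO": 9.785484184850003E7,
    "KENYA": 9.412388077690002E7,
}

def relative_difference(a, b):
    """Absolute difference scaled by the larger magnitude."""
    return abs(a - b) / max(abs(a), abs(b))

rel_diffs = {name: relative_difference(mr[name], spark[name]) for name in mr}

# Assumed tolerance; pick one that suits the data being compared.
all_close = all(math.isclose(mr[n], spark[n], rel_tol=1e-12) for n in mr)

for name, diff in rel_diffs.items():
    print(f"{name}: relative difference {diff:.2e}")
print("all within tolerance:", all_close)
```

On these five rows every pair agrees to within a few parts in 10^15, so a comparison between the two engines would need a tolerance on DOUBLE columns rather than exact equality.
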