[ https://issues.apache.org/jira/browse/HIVE-13292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15196819#comment-15196819 ]
Sergey Shelukhin commented on HIVE-13292:
-----------------------------------------

With double type, it's usually by design

> Different DOUBLE type precision issue between Spark and MR engine
> -----------------------------------------------------------------
>
>                 Key: HIVE-13292
>                 URL: https://issues.apache.org/jira/browse/HIVE-13292
>             Project: Hive
>          Issue Type: Bug
>         Environment: Apache Hive 2.0.0
> Apache Spark 1.6.0
>            Reporter: Xin Hao
>
> Different DOUBLE type precision issue between Spark and MR engine.
> Found when executing the TPC-H query5 with scale factor 2 (2GB data size).
> More details are as below.
> (1)The MR engine output:
> MOZAMBIQUE,1.0646195910990009E8
> ETHIOPIA,1.0108856206629996E8
> ALGERIA,9.987582690420012E7
> MOROCCO,9.785484184850013E7
> KENYA,9.412388077690017E7
> (2)The Spark engine output:
> MOZAMBIQUE,1.064619591099E8
> ETHIOPIA,1.0108856206630005E8
> ALGERIA,9.987582690419997E7
> MOROCCO,9.785484184850003E7
> KENYA,9.412388077690002E7
> (3)Detail SQL used:
> drop table if exists ${env:RESULT_TABLE};
> create table ${env:RESULT_TABLE} (
> pid1 STRING,
> pid2 DOUBLE
> )
> row format delimited fields terminated by ',' lines terminated by '\n'
> stored as ${env:HIVE_DEFAULT_FILEFORMAT_RESULT_TABLE} location
> '${env:RESULT_DIR}';
> insert into table ${env:RESULT_TABLE}
> select
> n_name,
> sum(l_extendedprice * (1 - l_discount)) as revenue
> from
> customer,
> orders,
> lineitem,
> supplier,
> nation,
> region
> where
> c_custkey = o_custkey
> and l_orderkey = o_orderkey
> and l_suppkey = s_suppkey
> and c_nationkey = s_nationkey
> and s_nationkey = n_nationkey
> and n_regionkey = r_regionkey
> and r_name = 'AFRICA'
> and o_orderdate >= '1993-01-01'
> and o_orderdate < '1994-01-01'
> group by
> n_name
> order by
> revenue desc;
> (4)Similar issue also exists even after we simplified original query to a
> simpler one as below:
> drop table if exists ${env:RESULT_TABLE};
> create table ${env:RESULT_TABLE} (
> pid2 DOUBLE
> )
> row format delimited fields terminated by ',' lines terminated by '\n'
> stored as ${env:HIVE_DEFAULT_FILEFORMAT_RESULT_TABLE} location
> '${env:RESULT_DIR}';
> insert into table ${env:RESULT_TABLE}
> select
> sum(l_extendedprice * (1 - l_discount)) as revenue
> from
> lineitem
> group by
> l_orderkey
> order by
> revenue;

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
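
For readers wondering why differing DOUBLE sums can be "by design": addition of IEEE 754 doubles is not associative, so two engines that add the same rows in a different order, or combine partial aggregates differently, may disagree in the last few digits. The sketch below illustrates only that general effect. It is not Hive, Spark, or MR code; the data is synthetic, and `naive_sum` and `partitioned_sum` are hypothetical helpers that assume an engine adds per-partition partial sums together.

```python
import math
import random


def naive_sum(values):
    """Plain left-to-right double addition (explicit loop, so no
    compensated summation is applied behind the scenes)."""
    total = 0.0
    for v in values:
        total += v
    return total


def partitioned_sum(values, num_partitions):
    """Sum each partition on its own, then add the partial sums.
    Assumption: this loosely mimics combining partial aggregates."""
    partials = [naive_sum(values[i::num_partitions])
                for i in range(num_partitions)]
    return naive_sum(partials)


# Smallest possible case: grouping alone changes the result.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left, right)  # 0.6000000000000001 0.6

# Synthetic stand-in for l_extendedprice * (1 - l_discount); not TPC-H data.
rng = random.Random(13292)
values = [rng.uniform(900.0, 105000.0) * (1 - rng.randint(0, 10) / 100)
          for _ in range(50_000)]

one_pass = naive_sum(values)
four_way = partitioned_sum(values, 4)
exact = math.fsum(values)  # correctly rounded, independent of order

# The three results typically agree to about 15 significant digits and
# may differ after that, much like the MR and Spark outputs quoted above.
print(repr(one_pass), repr(four_way), repr(exact))
```

If bit-identical results across engines matter, an exact type such as DECIMAL for the summed expression is the usual alternative, at some cost in performance; whether that suits this workload is a separate question.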