[jira] [Commented] (HIVE-13292) Different DOUBLE type precision issue between Spark and MR engine

Sergey Shelukhin (JIRA) Tue, 15 Mar 2016 23:04:13 -0700

    [ 
https://issues.apache.org/jira/browse/HIVE-13292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15196819#comment-15196819
 ]


Sergey Shelukhin commented on HIVE-13292:
-----------------------------------------

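With the DOUBLE type, this is usually by design: floating-point addition is not associative, so a SUM can differ in its last digits depending on the order in which rows are combined.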
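Here is a minimal sketch of why this happens. It makes a hypothetical assumption: one engine adds rows in a single pass, while the other sums partitions separately and then merges the partial sums. Because floating-point addition is not associative, the two orders can round differently. The helper names are illustrative only and are not Hive or Spark APIs.

```python
def sequential_sum(values):
    # Add values strictly left to right, as a single task consuming rows in order would.
    total = 0.0
    for v in values:
        total += v
    return total


def two_phase_sum(partitions):
    # Hypothetical two-phase aggregation: sum each partition on its own
    # (a partial aggregate), then add the partial sums together.
    return sequential_sum([sequential_sum(p) for p in partitions])


rows = [0.1, 0.2, 0.3]
one_pass = sequential_sum(rows)             # (0.1 + 0.2) + 0.3
split = two_phase_sum([[0.1], [0.2, 0.3]])  # 0.1 + (0.2 + 0.3)

print(one_pass)           # 0.6000000000000001
print(split)              # 0.6
print(one_pass == split)  # False
```

The same input rows give two different doubles, depending only on grouping. Over millions of rows the effect shows up in the trailing digits of the result.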

> Different DOUBLE type precision issue between Spark and MR engine
> -----------------------------------------------------------------
>
>                 Key: HIVE-13292
>                 URL: https://issues.apache.org/jira/browse/HIVE-13292
>             Project: Hive
>          Issue Type: Bug
>         Environment: Apache Hive 2.0.0
> Apache Spark 1.6.0
>            Reporter: Xin Hao
>
> Different DOUBLE type precision issue between Spark and MR engine.
> Found when running TPC-H query 5 at scale factor 2 (2 GB of data).
> Details are below.
> (1) The MR engine output:
> MOZAMBIQUE,1.0646195910990009E8
> ETHIOPIA,1.0108856206629996E8
> ALGERIA,9.987582690420012E7
> MOROCCO,9.785484184850013E7
> KENYA,9.412388077690017E7
> (2) The Spark engine output:
> MOZAMBIQUE,1.064619591099E8
> ETHIOPIA,1.0108856206630005E8
> ALGERIA,9.987582690419997E7
> MOROCCO,9.785484184850003E7
> KENYA,9.412388077690002E7
> (3) The SQL used:
> drop table if exists ${env:RESULT_TABLE};
> create table ${env:RESULT_TABLE} (
>   pid1 STRING,
>   pid2 DOUBLE
> )
> row format delimited fields terminated by ',' lines terminated by '\n'
> stored as ${env:HIVE_DEFAULT_FILEFORMAT_RESULT_TABLE} location 
> '${env:RESULT_DIR}';
> insert into table ${env:RESULT_TABLE}
> select
>         n_name,
>         sum(l_extendedprice * (1 - l_discount)) as revenue
> from
>         customer,
>         orders,
>         lineitem,
>         supplier,
>         nation,
>         region
> where
>         c_custkey = o_custkey
>         and l_orderkey = o_orderkey
>         and l_suppkey = s_suppkey
>         and c_nationkey = s_nationkey
>         and s_nationkey = n_nationkey
>         and n_regionkey = r_regionkey
>         and r_name = 'AFRICA'
>         and o_orderdate >= '1993-01-01'
>         and o_orderdate < '1994-01-01'
> group by
>         n_name
> order by
>         revenue desc;
> (4) A similar issue still occurs after we simplified the original query to the
> one below:
> drop table if exists ${env:RESULT_TABLE};
> create table ${env:RESULT_TABLE} (
>   pid2 DOUBLE
> )
> row format delimited fields terminated by ',' lines terminated by '\n'
> stored as ${env:HIVE_DEFAULT_FILEFORMAT_RESULT_TABLE} location 
> '${env:RESULT_DIR}';
> insert into table ${env:RESULT_TABLE}
> select
>         sum(l_extendedprice * (1 - l_discount)) as revenue
> from
>         lineitem
> group by
>         l_orderkey
> order by
>         revenue;
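The sketch below applies the same reasoning to the outputs in the report. It uses only the per-nation numbers copied from the MR and Spark results and measures how far apart the two engines' sums are. The relative differences come out at roughly 1e-15, a small number of ulps at this magnitude, and both engines rank the nations identically. That pattern is consistent with summation-order rounding rather than a logic error. The output formatting is an arbitrary choice, and `math.ulp` requires Python 3.9 or later.

```python
import math

# Per-nation revenue exactly as reported for each engine in the issue description.
mr = {
    "MOZAMBIQUE": 1.0646195910990009e8,
    "ETHIOPIA": 1.0108856206629996e8,
    "ALGERIA": 9.987582690420012e7,
    "MOROCCO": 9.785484184850013e7,
    "KENYA": 9.412388077690017e7,
}
spark = {
    "MOZAMBIQUE": 1.064619591099e8,
    "ETHIOPIA": 1.0108856206630005e8,
    "ALGERIA": 9.987582690419997e7,
    "MOROCCO": 9.785484184850003e7,
    "KENYA": 9.412388077690002e7,
}

for nation in mr:
    diff = abs(mr[nation] - spark[nation])
    rel = diff / abs(mr[nation])
    # math.ulp gives the spacing between adjacent doubles near this value.
    ulps = diff / math.ulp(mr[nation])
    print(f"{nation:<10} rel. diff {rel:.1e} (~{ulps:.0f} ulp)")

# Both engines order the nations the same way; only the trailing digits differ.
mr_rank = sorted(mr, key=mr.get, reverse=True)
spark_rank = sorted(spark, key=spark.get, reverse=True)
print(mr_rank == spark_rank)
```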



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
