[ https://issues.apache.org/jira/browse/DRILL-1572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14288885#comment-14288885 ]
Daniel Barclay (Drill/MapR) commented on DRILL-1572:
----------------------------------------------------
Are we (and is our code) taking into account that addition is not associative
for limited-precision numeric representations (such as the IEEE 754
floating-point types that Java's float and double are based on)?
Is Drill supposed to compute a sum by adding the values in the same order each
time? If so, expecting to get exactly the same sum value might be valid.
Or is Drill required only to add all the values together in _some_
order/grouping (that is, not necessarily in the same order--e.g., computing
sub-sums for batches and combining those batches' sums in different orders
in different runs)? If so, we can't expect to get the same sum value each
time. (And recall that in general the difference can be large--not just the
equivalent of a couple of least-significant bits/digits.)
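To make the concern concrete, here is a minimal, self-contained Java sketch. It is not Drill code: the class and method names (SumOrderDemo, sumInOrder, sumByBatches) and the input values are hypothetical, chosen only to illustrate the point. It adds the same values with different groupings and batch orders, using plain double arithmetic.

```java
public class SumOrderDemo {
    // Hypothetical helper: adds the values left to right, as a single ordered scan would.
    static double sumInOrder(double[] values) {
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum;
    }

    // Hypothetical helper: computes one sub-sum per batch, then combines the
    // sub-sums in the order given by batchOrder (indices into batches).
    static double sumByBatches(double[][] batches, int[] batchOrder) {
        double total = 0.0;
        for (int i : batchOrder) {
            total += sumInOrder(batches[i]);
        }
        return total;
    }

    public static void main(String[] args) {
        // Small discrepancy: the two groupings differ in the last digit.
        System.out.println((0.1 + 0.2) + 0.3); // prints 0.6000000000000001
        System.out.println(0.1 + (0.2 + 0.3)); // prints 0.6

        // Large discrepancy: the same three batches combined in two orders.
        // At magnitude 1e17 adjacent doubles are 16 apart, so adding 1.0 is lost.
        double[][] batches = { {1e17}, {-1e17}, {1.0} };
        System.out.println(sumByBatches(batches, new int[] {0, 1, 2})); // prints 1.0
        System.out.println(sumByBatches(batches, new int[] {0, 2, 1})); // prints 0.0
    }
}
```

If batch sub-sums can be combined in a different order from run to run, a verifier that compares sums for exact equality can fail even though both runs did the arithmetic correctly.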
> accuracy issue with tpch query 01.q and 10.q
> --------------------------------------------
>
> Key: DRILL-1572
> URL: https://issues.apache.org/jira/browse/DRILL-1572
> Project: Apache Drill
> Issue Type: Bug
> Components: Functions - Drill
> Affects Versions: 0.7.0
> Reporter: Chun Chang
> Assignee: Deneche A. Hakim
> Priority: Critical
> Fix For: 0.8.0
>
>
> code base:
> #Wed Oct 22 11:40:19 PDT 2014
> git.commit.id.abbrev=ae2790e
> The following two tpch queries failed verification due to accuracy in
> returned data.
> /home/work/drill-testing/testing/framework/resources/Advanced/Passing/tpch100/parquet/01.q
> :
> {noformat}
> -- using 1395599672 as a seed to the RNG
> select
> l_returnflag,
> l_linestatus,
> sum(l_quantity) as sum_qty,
> sum(l_extendedprice) as sum_base_price,
> sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
> sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
> avg(l_quantity) as avg_qty,
> avg(l_extendedprice) as avg_price,
> avg(l_discount) as avg_disc,
> count(*) as count_order
> from
> lineitem
> where
> l_shipdate <= date '1998-12-01' - interval '120' day (3)
> group by
> l_returnflag,
> l_linestatus
> order by
> l_returnflag,
> l_linestatus
> {noformat}
> Expected number of rows: 4
> Actual number of rows from Drill: 4
> Number of matching rows: 0
> Number of rows missing: 4
> Number of rows unexpected: 4
> {noformat}
> These rows are not expected (first 10):
> A F 3.775127758E9 5.660776097194428E12
> 5.377736398183944E12 5.59284742951595E12 25.499370423275426
> 38236.11698430475 0.05000224353079674 148047881
> N O 7.269911583E9 1.0901214476134316E13
> 1.0356163586785012E13 1.0770418891237377E13 25.499873337396807
> 38236.997134222445 0.04999763132401859 285095988
> R F 3.77572497E9 5.661603032745363E12
> 5.378513563915393E12 5.593662252666899E12 25.50006628406532
> 38236.697258453125 0.050001304339521574 148067261
> N F 9.8553062E7 1.4777109838597995E11
> 1.4038496596503476E11 1.4599979303277576E11 25.501556956882876
> 38237.19938880449 0.04998528433803116 3864590
> These rows are missing (first 10):
> A F 3.775127758E9 5.660776097197787E12
> 5.377736398184481E12 5.592847429514863E12 25.499370423275426
> 38236.11698432743 0.05000224347714149 148047881 (1 time(s))
> N O 7.269911583E9 1.0901214476133223E13
> 1.0356163586779275E13 1.0770418891231504E13 25.499873337396807
> 38236.99713421861 0.04999763124732218 285095988 (1 time(s))
> R F 3.77572497E9 5.661603032743618E12
> 5.378513563916123E12 5.593662252665821E12 25.50006628406532
> 38236.69725844134 0.05000130428587516 148067261 (1 time(s))
> N F 9.8553062E7 1.4777109838598825E11
> 1.4038496596503897E11 1.4599979303278268E11 25.501556956882876
> 38237.19938880664 0.04998528433773886 3864590 (1 time(s))
> {noformat}
> Test_Failed: 2014/10/22 11:26:11.0011 - Verification failed.
> /home/work/drill-testing/testing/framework/resources/Advanced/Passing/tpch100/parquet/10.q
> :
> {noformat}
> -- tpch10 using 1395599672 as a seed to the RNG
> select
> c.c_custkey,
> c.c_name,
> sum(l.l_extendedprice * (1 - l.l_discount)) as revenue,
> c.c_acctbal,
> n.n_name,
> c.c_address,
> c.c_phone,
> c.c_comment
> from
> customer c,
> orders o,
> lineitem l,
> nation n
> where
> c.c_custkey = o.o_custkey
> and l.l_orderkey = o.o_orderkey
> and o.o_orderdate >= date '1994-03-01'
> and o.o_orderdate < date '1994-03-01' + interval '3' month
> and l.l_returnflag = 'R'
> and c.c_nationkey = n.n_nationkey
> group by
> c.c_custkey,
> c.c_name,
> c.c_acctbal,
> c.c_phone,
> n.n_name,
> c.c_address,
> c.c_comment
> order by
> revenue desc
> limit 20
> {noformat}
> Expected number of rows: 20
> Actual number of rows from Drill: 20
> Number of matching rows: 17
> Number of rows missing: 3
> Number of rows unexpected: 3
> {noformat}
> These rows are not expected (first 10):
> 6372220 Customer#006372220 793123.1516 2836.62 FRANCE
> bfd3hpM99xDp6AFsGNOPP 16-143-244-4177 regular theodolites are according to
> the unusual
> 14211121 Customer#014211121 796135.1836 7443.03 MOROCCO
> ks7nhxDqzdk72CfWM 25-755-902-4219 lyly final packages doubt furiously
> carefully bold theodolites. final
> 246700 Customer#000246700 801786.5193999999 5244.71 CHINA
> o6FXqCXJjKy3JdCAvuU3XJNRFcz35rAoc 28-466-828-8872 even asymptotes
> cajole slyly with the furiously bold accounts. furiously unusual platelets
> believe quickly final,
> These rows are missing (first 10):
> 14211121 Customer#014211121 796135.1835999999 7443.03
> MOROCCO ks7nhxDqzdk72CfWM 25-755-902-4219 lyly final packages doubt
> furiously carefully bold theodolites. final (1 time(s))
> 246700 Customer#000246700 801786.5194000001 5244.71 CHINA
> o6FXqCXJjKy3JdCAvuU3XJNRFcz35rAoc 28-466-828-8872 even asymptotes
> cajole slyly with the furiously bold accounts. furiously unusual platelets
> believe quickly final, (1 time(s))
> 6372220 Customer#006372220 793123.1516000001 2836.62 FRANCE
> bfd3hpM99xDp6AFsGNOPP 16-143-244-4177 regular theodolites are according to
> the unusual (1 time(s))
> {noformat}
> Test_Failed: 2014/10/22 11:23:10.0010 - Verification failed.