[ 
https://issues.apache.org/jira/browse/DRILL-1572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14288885#comment-14288885
 ] 

Daniel Barclay (Drill/MapR) commented on DRILL-1572:
----------------------------------------------------

Are we and our code remembering that addition is not associative for 
limited-precision numeric representations (such as IEEE-754/Java/etc. 
floating-point types)?


Is Drill supposed to compute a sum by adding the values in the same order each 
time?  If so, expecting to get the exactly same sum value might be valid.

Or is Drill required only to add all the values together in _some_ 
order/grouping (that is, not necessarily in the same order--e.g., computing 
sub-sums for batches and assimilating those batches' sums in different orders 
in different runs)?  If so, we can't expect to get the same sum value each 
time.  (And recall that in general the difference can be large--not just the 
equivalent of a couple of least-significant bits/digits.)


> accuracy issue with tpch query 01.q and 10.q
> --------------------------------------------
>
>                 Key: DRILL-1572
>                 URL: https://issues.apache.org/jira/browse/DRILL-1572
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Functions - Drill
>    Affects Versions: 0.7.0
>            Reporter: Chun Chang
>            Assignee: Deneche A. Hakim
>            Priority: Critical
>             Fix For: 0.8.0
>
>
> code base:
> #Wed Oct 22 11:40:19 PDT 2014
> git.commit.id.abbrev=ae2790e
> The following two tpch queries failed verification due to accuracy in 
> returned data.
> /home/work/drill-testing/testing/framework/resources/Advanced/Passing/tpch100/parquet/01.q
>  :
> {noformat}
> -- using 1395599672 as a seed to the RNG
> select
>   l_returnflag,
>   l_linestatus,
>   sum(l_quantity) as sum_qty,
>   sum(l_extendedprice) as sum_base_price,
>   sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
>   sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
>   avg(l_quantity) as avg_qty,
>   avg(l_extendedprice) as avg_price,
>   avg(l_discount) as avg_disc,
>   count(*) as count_order
> from
>   lineitem
> where
>   l_shipdate <= date '1998-12-01' - interval '120' day (3)
> group by
>   l_returnflag,
>   l_linestatus
> order by
>   l_returnflag,
>   l_linestatus
> {noformat}
>          Expected number of rows: 4
> Actual number of rows from Drill: 4
>          Number of matching rows: 0
>           Number of rows missing: 4
>        Number of rows unexpected: 4
> {noformat}
> These rows are not expected (first 10):
>       A       F       3.775127758E9   5.660776097194428E12    
> 5.377736398183944E12    5.59284742951595E12     25.499370423275426      
> 38236.11698430475       0.05000224353079674     148047881       
>       N       O       7.269911583E9   1.0901214476134316E13   
> 1.0356163586785012E13   1.0770418891237377E13   25.499873337396807      
> 38236.997134222445      0.04999763132401859     285095988       
>       R       F       3.77572497E9    5.661603032745363E12    
> 5.378513563915393E12    5.593662252666899E12    25.50006628406532       
> 38236.697258453125      0.050001304339521574    148067261       
>       N       F       9.8553062E7     1.4777109838597995E11   
> 1.4038496596503476E11   1.4599979303277576E11   25.501556956882876      
> 38237.19938880449       0.04998528433803116     3864590 
> These rows are missing (first 10):
>       A       F       3.775127758E9   5.660776097197787E12    
> 5.377736398184481E12    5.592847429514863E12    25.499370423275426      
> 38236.11698432743       0.05000224347714149     148047881        (1 time(s))
>       N       O       7.269911583E9   1.0901214476133223E13   
> 1.0356163586779275E13   1.0770418891231504E13   25.499873337396807      
> 38236.99713421861       0.04999763124732218     285095988        (1 time(s))
>       R       F       3.77572497E9    5.661603032743618E12    
> 5.378513563916123E12    5.593662252665821E12    25.50006628406532       
> 38236.69725844134       0.05000130428587516     148067261        (1 time(s))
>       N       F       9.8553062E7     1.4777109838598825E11   
> 1.4038496596503897E11   1.4599979303278268E11   25.501556956882876      
> 38237.19938880664       0.04998528433773886     3864590  (1 time(s))
> {noformat}
> Test_Failed: 2014/10/22 11:26:11.0011 - Verification failed.
> /home/work/drill-testing/testing/framework/resources/Advanced/Passing/tpch100/parquet/10.q
>  :
> {noformat}
> -- tpch10 using 1395599672 as a seed to the RNG
> select
>   c.c_custkey,
>   c.c_name,
>   sum(l.l_extendedprice * (1 - l.l_discount)) as revenue,
>   c.c_acctbal,
>   n.n_name,
>   c.c_address,
>   c.c_phone,
>   c.c_comment
> from
>   customer c,
>   orders o,
>   lineitem l,
>   nation n
> where
>   c.c_custkey = o.o_custkey
>   and l.l_orderkey = o.o_orderkey
>   and o.o_orderdate >= date '1994-03-01'
>   and o.o_orderdate < date '1994-03-01' + interval '3' month
>   and l.l_returnflag = 'R'
>   and c.c_nationkey = n.n_nationkey
> group by
>   c.c_custkey,
>   c.c_name,
>   c.c_acctbal,
>   c.c_phone,
>   n.n_name,
>   c.c_address,
>   c.c_comment
> order by
>   revenue desc
> limit 20
> {noformat}
>          Expected number of rows: 20
> Actual number of rows from Drill: 20
>          Number of matching rows: 17
>           Number of rows missing: 3
>        Number of rows unexpected: 3
> {noformat}
> These rows are not expected (first 10):
>       6372220 Customer#006372220      793123.1516     2836.62 FRANCE  
> bfd3hpM99xDp6AFsGNOPP   16-143-244-4177  regular theodolites are according to 
> the unusual       
>       14211121        Customer#014211121      796135.1836     7443.03 MOROCCO 
> ks7nhxDqzdk72CfWM       25-755-902-4219 lyly final packages doubt furiously 
> carefully bold theodolites. final   
>       246700  Customer#000246700      801786.5193999999       5244.71 CHINA   
> o6FXqCXJjKy3JdCAvuU3XJNRFcz35rAoc       28-466-828-8872  even asymptotes 
> cajole slyly with the furiously bold accounts. furiously unusual platelets 
> believe quickly final,      
> These rows are missing (first 10):
>       14211121        Customer#014211121      796135.1835999999       7443.03 
> MOROCCO ks7nhxDqzdk72CfWM       25-755-902-4219 lyly final packages doubt 
> furiously carefully bold theodolites. final    (1 time(s))
>       246700  Customer#000246700      801786.5194000001       5244.71 CHINA   
> o6FXqCXJjKy3JdCAvuU3XJNRFcz35rAoc       28-466-828-8872  even asymptotes 
> cajole slyly with the furiously bold accounts. furiously unusual platelets 
> believe quickly final,       (1 time(s))
>       6372220 Customer#006372220      793123.1516000001       2836.62 FRANCE  
> bfd3hpM99xDp6AFsGNOPP   16-143-244-4177  regular theodolites are according to 
> the unusual        (1 time(s))
> {noformat}
> Test_Failed: 2014/10/22 11:23:10.0010 - Verification failed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to