Bugs item #2757642, was opened at 2009-04-13 06:12 Message generated for change (Comment added) made by stmane You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=482468&aid=2757642&group_id=56967
Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: SQL/Core Group: MonetDB5 "stable" Status: Open Resolution: None Priority: 5 Private: No Submitted By: John Rylander (johnrylander) Assigned to: Niels Nes (nielsnes) Summary: Inaccurate AVG of SQRT with 51M records Initial Comment: I'm using the latest stable Windows Vista x64-compatible release 5.10.2 based on the 1.28.2 kernel. On an table "biglist" with 51751608 records and just one field, a numeric 14,2 field called "rt", I executed the following: UPDATE biglist SET rt=1.21; (Rows affected 51751608) Then I tried: SELECT AVG(rt) FROM biglist; And of course it reported 1.21--fine. :) But then I tried: SELECT AVG(SQRT(rt)) from biglist; and instead of the expected 1.1, I got 0.64837467465745224. A wide variety of other SQRT aggregations also were wildly off the mark. (And I tried this earlier with a 7,2 numeric field--no joy.) Any suggestions? My apologies if this has already been fixed, or if I'm doing something wrong--I'm entirely new to MonetDB, but think it's an amazing project--amazing! I hope this can be resolved soon, so I can rely on it. :) Thanks to all the developers and testers! I hope to do some tiny part here with this report. ---------------------------------------------------------------------- >Comment By: Stefan Manegold (stmane) Date: 2009-04-14 08:19 Message: With the Feb2009 CVS release branch version of Sat Arp 11 2009, i.e., $ mserver5 --version MonetDB server v5.10.3 (64-bit), based on kernel v1.28.3 (64-bit oids) Copyright (c) 1993-July 2008 CWI Copyright (c) August 2008-2009 MonetDB B.V., all rights reserved Visit http://monetdb.cwi.nl/ for further information Configured for prefix: /ufs/manegold/_/scratch0/Monet/Testing/Stable/prefix.--enable-strict_--disable-optimize_--enable-debug_--enable-assert Libraries: libpcre: 7.8 2008-09-05 (compiled with 7.8) openssl: OpenSSL 0.9.8g 19 Oct 2007 (compiled with OpenSSL 0.9.8g 19 Oct 2007) libxml2: 2.7.3 (compiled with 2.7.3) Compiled by: [email protected] Compilation: gcc -Wall -Wextra -std=c99 -g -Werror-implicit-function-declaration -Werror -Wpointer-arith -Wdeclaration-after-statement -Wundef -Wp,-D_FORTIFY_SOURCE=2 Linking : ld -IPA -m elf_x86_64 and the attached 1.21.zip (~700 KB; expands to ~500 MB; contains 100000000x "1.21"), I get the following on my 64-bit Linux machine: create table n1 (a numeric(14,2)); create table n8 (a numeric(14,2)); create table d1 (a double); create table d8 (a double); copy 10 records into n1 from '/tmp/1.21'; [ 10 ] copy 10 records into d1 from '/tmp/1.21'; [ 10 ] copy 100000000 records into n8 from '/tmp/1.21'; [ 100000000 ] copy 100000000 records into d8 from '/tmp/1.21'; [ 100000000 ] select 'count(n1) == 10', count(*), 'avg(n1) == 1.21', avg(a), 'avg(sqrt(n1)) == 1.1', avg(sqrt(a)) from n1; % .L15, sys.n1, ., sys., ., sys. # table_name % L15, L16, single_value, scale_down_L17, single_value, L20 # name % char, wrd, char, double, char, double # type % 15, 2, 15, 22, 20, 22 # length [ "count(n1) == 10", 10, "avg(n1) == 1.21", 1.21, "avg(sqrt(n1)) == 1.1", 1.1000000238418579 ] select 'count(d1) == 10', count(*), 'avg(d1) == 1.21', avg(a), 'avg(sqrt(d1)) == 1.1', avg(sqrt(a)) from d1; % .L21, sys.d1, ., sys., ., sys. # table_name % L21, L22, single_value, L23, single_value, L24 # name % char, wrd, char, double, char, double # type % 15, 2, 15, 22, 20, 22 # length [ "count(d1) == 10", 10, "avg(d1) == 1.21", 1.2100000000000002, "avg(sqrt(d1)) == 1.1", 1.0999999999999999 ] select 'avg(sqrt(n1)) == 1.1', avg(sqrt(a)) from n1; % .L165, sys. # table_name % L165, L166 # name % char, double # type % 20, 22 # length [ "avg(sqrt(n1)) == 1.1", 1.1000000953674316 ] select 'avg(sqrt(d1)) == 1.1', avg(sqrt(a)) from d1; % .L171, sys. # table_name % L171, L172 # name % char, double # type % 20, 22 # length [ "avg(sqrt(d1)) == 1.1", 1.0999999999999999 ] select 'count(n8) == 100000000', count(*), 'avg(n8) == 1.21', avg(a), 'avg(sqrt(n8)) == 1.1', avg(sqrt(a)) from n8; % .L141, sys.n8, ., sys., ., sys. # table_name % L141, L142, single_value, scale_down_L143, single_value, L144 # name % char, wrd, char, double, char, double # type % 22, 9, 15, 22, 20, 22 # length [ "count(n8) == 100000000", 100000000, "avg(n8) == 1.21", 1.21, "avg(sqrt(n8)) == 1.1", 1.1000000238418579 ] select 'count(d8) == 100000000', count(*), 'avg(d8) == 1.21', avg(a), 'avg(sqrt(d8)) == 1.1', avg(sqrt(a)) from d8; % .L145, sys.d8, ., sys., ., sys. # table_name % L145, L146, single_value, L147, single_value, L150 # name % char, wrd, char, double, char, double # type % 22, 9, 15, 22, 20, 22 # length [ "count(d8) == 100000000", 100000000, "avg(d8) == 1.21", 1.2099999975113553, "avg(sqrt(d8)) == 1.1", 1.0999999983176105 ] select 'avg(sqrt(n8)) == 1.1', avg(sqrt(a)) from n8; % .L255, sys. # table_name % L255, L256 # name % char, double # type % 20, 22 # length [ "avg(sqrt(n8)) == 1.1", 0.33554432000000001 ] select 'avg(sqrt(d8)) == 1.1', avg(sqrt(a)) from d8; % .L261, sys. # table_name % L261, L262 # name % char, double # type % 20, 22 # length [ "avg(sqrt(d8)) == 1.1", 1.0999999983176105 ] I.e., a significant rounding(?) error occurs with 100000000 tuples (only) with type numeric(14,2) and query select 'avg(sqrt(n8)) == 1.1', avg(sqrt(a)) from n8; but not with query select 'count(n8) == 100000000', count(*), 'avg(n8) == 1.21', avg(a), 'avg(sqrt(n8)) == 1.1', avg(sqrt(a)) from n8; and not with type double with either of the queries. See attached sqrt.sql & sql.Feb2009.-n1.out for more details. ---------------------------------------------------------------------- Comment By: John Rylander (johnrylander) Date: 2009-04-14 00:27 Message: More fascinating still. I emptied BigList 2 and refilled and tested it as follows (same cut-and-paste into a fixed-pitch editor for this, too): sql>insert into biglist2 select rt from biglist limit 10; Rows affected 10 sql>select avg(sqrt(rt)) as avg from biglist2; +------------------------+ | avg | +========================+ | 1.1000000953674316 | +------------------------+ 1 tuple sql>insert into biglist2 select rt from biglist2; Rows affected 10 sql>select avg(sqrt(rt)) as avg from biglist2; +------------------------+ | avg | +========================+ | 1.1000001907348633 | +------------------------+ 1 tuple sql>insert into biglist2 select rt from biglist2; Rows affected 20 sql>select avg(sqrt(rt)) as avg from biglist2; +------------------------+ | avg | +========================+ | 1.0999998092651366 | +------------------------+ 1 tuple sql>insert into biglist2 select rt from biglist2; Rows affected 40 sql>select avg(sqrt(rt)) as avg from biglist2; +------------------------+ | avg | +========================+ | 1.0999991416931152 | +------------------------+ 1 tuple sql>insert into biglist2 select rt from biglist2; Rows affected 80 sql>select avg(sqrt(rt)) as avg from biglist2; +------------------------+ | avg | +========================+ | 1.1000008583068848 | +------------------------+ 1 tuple sql>insert into biglist2 select rt from biglist2; Rows affected 160 sql>select avg(sqrt(rt)) as avg from biglist2; +------------------------+ | avg | +========================+ | 1.100003433227539 | +------------------------+ 1 tuple sql>insert into biglist2 select rt from biglist2; Rows affected 320 sql>select avg(sqrt(rt)) as avg from biglist2; +------------------------+ | avg | +========================+ | 1.0999964714050292 | +------------------------+ 1 tuple sql>insert into biglist2 select rt from biglist2; Rows affected 640 sql>select avg(sqrt(rt)) as avg from biglist2; +------------------------+ | avg | +========================+ | 1.0999860763549805 | +------------------------+ 1 tuple sql>insert into biglist2 select rt from biglist2; Rows affected 1280 sql>select avg(sqrt(rt)) as avg from biglist2; +------------------------+ | avg | +========================+ | 1.1000141143798827 | +------------------------+ 1 tuple sql>insert into biglist2 select rt from biglist2; Rows affected 2560 sql>select avg(sqrt(rt)) as avg from biglist2; +------------------------+ | avg | +========================+ | 1.1000558853149414 | +------------------------+ 1 tuple sql>select avg(sqrt(rt)) as avg from biglist2; +------------------------+ | avg | +========================+ | 1.1000558853149414 | +------------------------+ 1 tuple sql>insert into biglist2 select rt from biglist2; Rows affected 5120 sql>select avg(sqrt(rt)) as avg from biglist2; +------------------------+ | avg | +========================+ | 1.0999435424804687 | +------------------------+ 1 tuple sql>insert into biglist2 select rt from biglist2; Rows affected 10240 sql>select avg(sqrt(rt)) as avg from biglist2; +------------------------+ | avg | +========================+ | 1.0997764587402343 | +------------------------+ 1 tuple sql>insert into biglist2 select rt from biglist2; Rows affected 20480 sql>select avg(sqrt(rt)) as avg from biglist2; +------------------------+ | avg | +========================+ | 1.1002252578735352 | +------------------------+ 1 tuple sql>insert into biglist2 select rt from biglist2; Rows affected 40960 sql>select avg(sqrt(rt)) as avg from biglist2; +------------------------+ | avg | +========================+ | 1.1008938789367675 | +------------------------+ 1 tuple sql>insert into biglist2 select rt from biglist2; Rows affected 81920 sql>select avg(sqrt(rt)) as avg from biglist2; +------------------------+ | avg | +========================+ | 1.0990918159484864 | +------------------------+ 1 tuple sql>insert into biglist2 select rt from biglist2; Rows affected 163840 sql>select avg(sqrt(rt)) as avg from biglist2; +------------------------+ | avg | +========================+ | 1.0964209556579589 | +------------------------+ 1 tuple sql>insert into biglist2 select rt from biglist2; Rows affected 327680 sql>select avg(sqrt(rt)) as avg from biglist2; +------------------------+ | avg | +========================+ | 1.1035164833068847 | +------------------------+ 1 tuple sql>insert into biglist2 select rt from biglist2; Rows affected 655360 sql>select avg(sqrt(rt)) as avg from biglist2; +------------------------+ | avg | +========================+ | 1.1142581939697265 | +------------------------+ 1 tuple sql>insert into biglist2 select rt from biglist2; Rows affected 1310720 sql>select avg(sqrt(rt)) as avg from biglist2; +------------------------+ | avg | +========================+ | 1.0841147422790527 | +------------------------+ 1 tuple sql>insert into biglist2 select rt from biglist2; Rows affected 2621440 sql>select avg(sqrt(rt)) as avg from biglist2; +------------------------+ | avg | +========================+ | 1.0420574188232421 | +------------------------+ 1 tuple sql>insert into biglist2 select rt from biglist2; Rows affected 5242880 sql>select avg(sqrt(rt)) as avg from biglist2; +------------------------+ | avg | +========================+ | 1.021028709411621 | +------------------------+ 1 tuple sql>insert into biglist2 select rt from biglist2; Rows affected 10485760 sql>select avg(sqrt(rt)) as avg from biglist2; +------------------------+ | avg | +========================+ | 1.221028709411621 | +------------------------+ 1 tuple sql>insert into biglist2 select rt from biglist2; Rows affected 20971520 sql>select avg(sqrt(rt)) as avg from biglist2; +------------------------+ | avg | +========================+ | 0.80000000000000004 | +------------------------+ 1 tuple sql>insert into biglist2 select rt from biglist2; Rows affected 41943040 sql>select avg(sqrt(rt)) as avg from biglist2; +------------------------+ | avg | +========================+ | 0.40000000000000002 | +------------------------+ 1 tuple sql>insert into biglist2 select rt from biglist2; Rows affected 83886080 sql>select avg(sqrt(rt)) as avg from biglist2; +------------------------+ | avg | +========================+ | 0.20000000000000001 | +------------------------+ 1 tuple sql> ---------------------------------------------------------------------- Comment By: John Rylander (johnrylander) Date: 2009-04-14 00:14 Message: This is almost unreadable without a fixed-pitch typeface, so I'd suggest cutting and pasting what follows in this message into a text editor--I think the results are fascinating. Population a single-column table with only millions of instance of the number 1.21, one can see a pattern in the increasingly large scale aggregation. (The correct/ideal result in each case of the AVG(SQRT(RT)) is exactly 1.1, of course.) Copied from an mclient session: sql>insert into biglist2 select rt from biglist limit 10000000; Rows affected 10000000 sql>select avg(sqrt(rt)) from biglist2; +------------------------+ | L4 | +========================+ | 1.0220502 | +------------------------+ 1 tuple sql>insert into biglist2 select rt from biglist limit 10000000; Rows affected 10000000 sql>select avg(sqrt(rt)) from biglist2; +------------------------+ | L7 | +========================+ | 1.1831894000000001 | +------------------------+ 1 tuple sql>insert into biglist2 select rt from biglist limit 10000000; Rows affected 10000000 sql>select avg(sqrt(rt)) from biglist2; +------------------------+ | L12 | +========================+ | 1.1184810666666667 | +------------------------+ 1 tuple sql>insert into biglist2 select rt from biglist limit 10000000; Rows affected 10000000 sql>select avg(sqrt(rt)) from biglist2; +------------------------+ | L15 | +========================+ | 0.83886079999999996 | +------------------------+ 1 tuple sql>insert into biglist2 select rt from biglist limit 10000000; Rows affected 10000000 sql>select avg(sqrt(rt)) from biglist2; +------------------------+ | L20 | +========================+ | 0.67108864000000001 | +------------------------+ 1 tuple sql>insert into biglist2 select rt from biglist limit 10000000; Rows affected 10000000 sql>select avg(sqrt(rt)) from biglist2; +------------------------+ | L23 | +========================+ | 0.55924053333333335 | +------------------------+ 1 tuple sql>insert into biglist2 select rt from biglist limit 10000000; Rows affected 10000000 sql>select avg(sqrt(rt)) from biglist2; +------------------------+ | L26 | +========================+ | 0.47934902857142858 | +------------------------+ 1 tuple sql>insert into biglist2 select rt from biglist limit 10000000; Rows affected 10000000 sql>select avg(sqrt(rt)) from biglist2; +------------------------+ | L31 | +========================+ | 0.41943039999999998 | +------------------------+ 1 tuple sql>select max(rt), min(rt) from biglist2; +------------------+------------------+ | L34 | L35 | +==================+==================+ | 1.21 | 1.21 | +------------------+------------------+ 1 tuple sql>select count(rt) from biglist2; +----------+ | L36 | +==========+ | 80000000 | +----------+ 1 tuple sql> ---------------------------------------------------------------------- Comment By: John Rylander (johnrylander) Date: 2009-04-13 23:10 Message: BTW, I created the uploaded BZ2 file using 7zip "Ultimate" compression with the Bzip2 format--I hope this is not too troublesome. If it is, let me know a better format, and I'll re-upload. ---------------------------------------------------------------------- Comment By: John Rylander (johnrylander) Date: 2009-04-13 22:29 Message: Agreed with both comments below. At first I suspected a compound rounding error--and maybe the problem will still prove to be some variation on that theme--so I increased the precision from 7,2 to 14,2 with no change in results. (More precisely, I remember that the results were still way off; I do not recall if they were identical.) And this is undoubtedly related to table size: small tables do not produce this error; to my mind, this doesn't seriously mitigate the concern. I am not sure where the line is drawn, and I'm not yet good enough at automating MonetDB to easily make up an iterative test case to find out. (I was hoping this would be easy for someone else out there. ;) ) BTW, I also implemented this BigList table and query in PostgreSQL 8.3.7, and it gave precisely correct answers to the queries. (Surprisingly precise, actually--I wonder if it uses quadruple-precision sometimes, but I don't know.) I'll try to upload the BigList table in BZ2 form shortly here. It's certainly my personal record for compression efficacy: a 296 MB table compressed to 17KB. :) OF course, it's nothing but the same value about 51M times, created via: copy select * from biglist INTO '....\\BigList.cpy' USING DELIMITERS ',', '\n' NULL AS '' I hope I'm right in presuming it'll be easy for others to COPY INTO a table given the above information? ---------------------------------------------------------------------- Comment By: Martin Kersten (mlkersten) Date: 2009-04-13 22:01 Message: Don't forget that SQRT works on doubles and produces doubles. This leads to rounding errors, but does not explain yours. >create table tmp(rt decimal(10,2)); insert into tmp values(1.2); insert into tmp values(1.2); insert into tmp values(1.2); insert into tmp values(1.2); insert into tmp values(1.2); select avg(rt), avg(sqrt(rt)) from tmp; &3 >&2 1 -1 >&2 1 -1 >&2 1 -1 >&2 1 -1 >&2 1 -1 > &1 2 1 2 1 % sys., sys. # table_name % scale_down_L11, L12 # name % double, double # type % 22, 22 # length [ 1.2, 1.0954451560974121 ] >select sqrt(1.2); &1 0 1 1 1 % . # table_name % sqrt_single_value # name % real # type % 13 # length [ 1.09544516 ] ---------------------------------------------------------------------- Comment By: Niels Nes (nielsnes) Date: 2009-04-13 21:46 Message: It seems the bug is already fixed or related to your db size. I tested with a small table and this works fine. Could you try your query on a small table? ---------------------------------------------------------------------- Comment By: John Rylander (johnrylander) Date: 2009-04-13 15:39 Message: If others can confirm this bug, i.e., if I'm not just doing some wrong and this hasn't already been fixed in a developer build, then I think the priority of this bug should be raised significantly. ---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=482468&aid=2757642&group_id=56967 ------------------------------------------------------------------------------ This SF.net email is sponsored by: High Quality Requirements in a Collaborative Environment. Download a free trial of Rational Requirements Composer Now! http://p.sf.net/sfu/www-ibm-com _______________________________________________ Monetdb-bugs mailing list [email protected] https://lists.sourceforge.net/lists/listinfo/monetdb-bugs
