Bugs item #2757642, was opened at 2009-04-12 23:12
Message generated for change (Comment added) made by johnrylander
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=482468&aid=2757642&group_id=56967

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: SQL/Core
Group: MonetDB5 "stable"
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: John Rylander (johnrylander)
Assigned to: Niels Nes (nielsnes)
Summary: Inaccurate AVG of SQRT with 51M records

Initial Comment:
I'm using the latest stable Windows Vista x64-compatible release 5.10.2 based 
on the 1.28.2 kernel.

On an table "biglist" with  51751608 records and just one field, a numeric 14,2 
field called "rt", I executed the following:

UPDATE biglist SET rt=1.21;
(Rows affected 51751608)

Then I tried:

SELECT AVG(rt) FROM biglist;

And of course it reported 1.21--fine.  :)

But then I tried:

SELECT AVG(SQRT(rt)) from biglist;

and instead of the expected 1.1, I got  0.64837467465745224. A wide variety of 
other SQRT aggregations also were wildly off the mark.  (And I tried this 
earlier with a 7,2 numeric field--no joy.)

Any suggestions?  My apologies if this has already been fixed, or if I'm doing 
something wrong--I'm entirely new to MonetDB, but think it's an amazing 
project--amazing!  I hope this can be resolved soon, so I can rely on it.  :)

Thanks to all the developers and testers!  I hope to do some tiny part here 
with this report.


----------------------------------------------------------------------

>Comment By: John Rylander (johnrylander)
Date: 2009-04-13 17:27

Message:
More fascinating still.  I emptied BigList 2 and refilled and tested it as
follows (same cut-and-paste into a fixed-pitch editor for this, too):



sql>insert into biglist2 select rt from biglist limit 10;
Rows affected 10
sql>select avg(sqrt(rt)) as avg from biglist2;
+------------------------+
| avg                    |
+========================+
|     1.1000000953674316 |
+------------------------+
1 tuple
sql>insert into biglist2 select rt from biglist2;
Rows affected 10
sql>select avg(sqrt(rt)) as avg from biglist2;
+------------------------+
| avg                    |
+========================+
|     1.1000001907348633 |
+------------------------+
1 tuple

sql>insert into biglist2 select rt from biglist2;
Rows affected 20
sql>select avg(sqrt(rt)) as avg from biglist2;
+------------------------+
| avg                    |
+========================+
|     1.0999998092651366 |
+------------------------+
1 tuple
sql>insert into biglist2 select rt from biglist2;
Rows affected 40
sql>select avg(sqrt(rt)) as avg from biglist2;
+------------------------+
| avg                    |
+========================+
|     1.0999991416931152 |
+------------------------+
1 tuple
sql>insert into biglist2 select rt from biglist2;
Rows affected 80
sql>select avg(sqrt(rt)) as avg from biglist2;
+------------------------+
| avg                    |
+========================+
|     1.1000008583068848 |
+------------------------+
1 tuple
sql>insert into biglist2 select rt from biglist2;
Rows affected 160
sql>select avg(sqrt(rt)) as avg from biglist2;
+------------------------+
| avg                    |
+========================+
|      1.100003433227539 |
+------------------------+
1 tuple
sql>insert into biglist2 select rt from biglist2;
Rows affected 320
sql>select avg(sqrt(rt)) as avg from biglist2;
+------------------------+
| avg                    |
+========================+
|     1.0999964714050292 |
+------------------------+
1 tuple
sql>insert into biglist2 select rt from biglist2;
Rows affected 640
sql>select avg(sqrt(rt)) as avg from biglist2;
+------------------------+
| avg                    |
+========================+
|     1.0999860763549805 |
+------------------------+
1 tuple
sql>insert into biglist2 select rt from biglist2;
Rows affected 1280
sql>select avg(sqrt(rt)) as avg from biglist2;
+------------------------+
| avg                    |
+========================+
|     1.1000141143798827 |
+------------------------+
1 tuple
sql>insert into biglist2 select rt from biglist2;
Rows affected 2560
sql>select avg(sqrt(rt)) as avg from biglist2;
+------------------------+
| avg                    |
+========================+
|     1.1000558853149414 |
+------------------------+
1 tuple
sql>select avg(sqrt(rt)) as avg from biglist2;
+------------------------+
| avg                    |
+========================+
|     1.1000558853149414 |
+------------------------+
1 tuple
sql>insert into biglist2 select rt from biglist2;
Rows affected 5120
sql>select avg(sqrt(rt)) as avg from biglist2;
+------------------------+
| avg                    |
+========================+
|     1.0999435424804687 |
+------------------------+
1 tuple
sql>insert into biglist2 select rt from biglist2;
Rows affected 10240
sql>select avg(sqrt(rt)) as avg from biglist2;
+------------------------+
| avg                    |
+========================+
|     1.0997764587402343 |
+------------------------+
1 tuple
sql>insert into biglist2 select rt from biglist2;
Rows affected 20480
sql>select avg(sqrt(rt)) as avg from biglist2;
+------------------------+
| avg                    |
+========================+
|     1.1002252578735352 |
+------------------------+
1 tuple
sql>insert into biglist2 select rt from biglist2;
Rows affected 40960
sql>select avg(sqrt(rt)) as avg from biglist2;
+------------------------+
| avg                    |
+========================+
|     1.1008938789367675 |
+------------------------+
1 tuple
sql>insert into biglist2 select rt from biglist2;
Rows affected 81920
sql>select avg(sqrt(rt)) as avg from biglist2;
+------------------------+
| avg                    |
+========================+
|     1.0990918159484864 |
+------------------------+
1 tuple
sql>insert into biglist2 select rt from biglist2;
Rows affected 163840
sql>select avg(sqrt(rt)) as avg from biglist2;
+------------------------+
| avg                    |
+========================+
|     1.0964209556579589 |
+------------------------+
1 tuple
sql>insert into biglist2 select rt from biglist2;
Rows affected 327680
sql>select avg(sqrt(rt)) as avg from biglist2;
+------------------------+
| avg                    |
+========================+
|     1.1035164833068847 |
+------------------------+
1 tuple
sql>insert into biglist2 select rt from biglist2;
Rows affected 655360
sql>select avg(sqrt(rt)) as avg from biglist2;
+------------------------+
| avg                    |
+========================+
|     1.1142581939697265 |
+------------------------+
1 tuple
sql>insert into biglist2 select rt from biglist2;
Rows affected 1310720
sql>select avg(sqrt(rt)) as avg from biglist2;
+------------------------+
| avg                    |
+========================+
|     1.0841147422790527 |
+------------------------+
1 tuple
sql>insert into biglist2 select rt from biglist2;
Rows affected 2621440
sql>select avg(sqrt(rt)) as avg from biglist2;
+------------------------+
| avg                    |
+========================+
|     1.0420574188232421 |
+------------------------+
1 tuple
sql>insert into biglist2 select rt from biglist2;
Rows affected 5242880
sql>select avg(sqrt(rt)) as avg from biglist2;
+------------------------+
| avg                    |
+========================+
|      1.021028709411621 |
+------------------------+
1 tuple
sql>insert into biglist2 select rt from biglist2;
Rows affected 10485760
sql>select avg(sqrt(rt)) as avg from biglist2;
+------------------------+
| avg                    |
+========================+
|      1.221028709411621 |
+------------------------+
1 tuple
sql>insert into biglist2 select rt from biglist2;
Rows affected 20971520
sql>select avg(sqrt(rt)) as avg from biglist2;
+------------------------+
| avg                    |
+========================+
|    0.80000000000000004 |
+------------------------+
1 tuple
sql>insert into biglist2 select rt from biglist2;
Rows affected 41943040
sql>select avg(sqrt(rt)) as avg from biglist2;
+------------------------+
| avg                    |
+========================+
|    0.40000000000000002 |
+------------------------+
1 tuple
sql>insert into biglist2 select rt from biglist2;
Rows affected 83886080
sql>select avg(sqrt(rt)) as avg from biglist2;
+------------------------+
| avg                    |
+========================+
|    0.20000000000000001 |
+------------------------+
1 tuple
sql>

----------------------------------------------------------------------

Comment By: John Rylander (johnrylander)
Date: 2009-04-13 17:14

Message:
This is almost unreadable without a fixed-pitch typeface, so I'd suggest
cutting and pasting what follows in this message into a text editor--I
think the results are fascinating.  Population a single-column table with
only millions of instance of the number 1.21, one can see a pattern in the
increasingly large scale aggregation.  (The correct/ideal result in each
case of the AVG(SQRT(RT)) is exactly 1.1, of course.)

Copied from an mclient session:

sql>insert into biglist2 select rt from biglist limit 10000000;
Rows affected 10000000
sql>select avg(sqrt(rt)) from biglist2;
+------------------------+
| L4                     |
+========================+
|              1.0220502 |
+------------------------+
1 tuple
sql>insert into biglist2 select rt from biglist limit 10000000;
Rows affected 10000000
sql>select avg(sqrt(rt)) from biglist2;
+------------------------+
| L7                     |
+========================+
|     1.1831894000000001 |
+------------------------+
1 tuple
sql>insert into biglist2 select rt from biglist limit 10000000;
Rows affected 10000000
sql>select avg(sqrt(rt)) from biglist2;
+------------------------+
| L12                    |
+========================+
|     1.1184810666666667 |
+------------------------+
1 tuple
sql>insert into biglist2 select rt from biglist limit 10000000;
Rows affected 10000000
sql>select avg(sqrt(rt)) from biglist2;
+------------------------+
| L15                    |
+========================+
|    0.83886079999999996 |
+------------------------+
1 tuple
sql>insert into biglist2 select rt from biglist limit 10000000;
Rows affected 10000000
sql>select avg(sqrt(rt)) from biglist2;
+------------------------+
| L20                    |
+========================+
|    0.67108864000000001 |
+------------------------+
1 tuple
sql>insert into biglist2 select rt from biglist limit 10000000;
Rows affected 10000000
sql>select avg(sqrt(rt)) from biglist2;
+------------------------+
| L23                    |
+========================+
|    0.55924053333333335 |
+------------------------+
1 tuple
sql>insert into biglist2 select rt from biglist limit 10000000;
Rows affected 10000000
sql>select avg(sqrt(rt)) from biglist2;
+------------------------+
| L26                    |
+========================+
|    0.47934902857142858 |
+------------------------+
1 tuple
sql>insert into biglist2 select rt from biglist limit 10000000;
Rows affected 10000000
sql>select avg(sqrt(rt)) from biglist2;
+------------------------+
| L31                    |
+========================+
|    0.41943039999999998 |
+------------------------+
1 tuple
sql>select max(rt), min(rt) from biglist2;
+------------------+------------------+
| L34              | L35              |
+==================+==================+
| 1.21             | 1.21             |
+------------------+------------------+
1 tuple
sql>select count(rt) from biglist2;
+----------+
| L36      |
+==========+
| 80000000 |
+----------+
1 tuple
sql>

----------------------------------------------------------------------

Comment By: John Rylander (johnrylander)
Date: 2009-04-13 16:10

Message:
BTW, I created the uploaded BZ2 file using 7zip "Ultimate" compression with
the Bzip2 format--I hope this is not too troublesome.  If it is, let me
know a better format, and I'll re-upload.

----------------------------------------------------------------------

Comment By: John Rylander (johnrylander)
Date: 2009-04-13 15:29

Message:
Agreed with both comments below.  

At first I suspected a compound rounding error--and maybe the problem will
still prove to be some variation on that theme--so I increased the
precision from 7,2 to 14,2 with no change in results.  (More precisely, I
remember that the results were still way off; I do not recall if they were
identical.)

And this is undoubtedly related to table size: small tables do not produce
this error;  to my mind, this doesn't seriously mitigate the concern.  I am
not sure where the line is drawn, and I'm not yet good enough at automating
MonetDB to easily make up an iterative test case to find out.  (I was
hoping this would be easy for someone else out there. ;) )  BTW, I also
implemented this BigList table and query in PostgreSQL 8.3.7, and it gave
precisely correct answers to the queries.  (Surprisingly precise,
actually--I wonder if it uses quadruple-precision sometimes, but I don't
know.)

I'll try to upload the BigList table in BZ2 form shortly here.  It's
certainly my personal record for compression efficacy: a 296 MB table
compressed to 17KB.  :)  OF course, it's nothing but the same value about
51M times, created via:

 copy select * from biglist INTO '....\\BigList.cpy' USING DELIMITERS ',',
'\n' NULL AS ''

I hope I'm right in presuming it'll be easy for others to COPY INTO a
table given the above information?





----------------------------------------------------------------------

Comment By: Martin Kersten (mlkersten)
Date: 2009-04-13 15:01

Message:
Don't forget that SQRT works on doubles and produces doubles.
This leads to rounding errors, but does not explain yours.

>create table tmp(rt decimal(10,2));
insert into tmp values(1.2);
insert into tmp values(1.2);
insert into tmp values(1.2);
insert into tmp values(1.2);
insert into tmp values(1.2);
select avg(rt), avg(sqrt(rt)) from tmp; &3
>&2 1 -1
>&2 1 -1
>&2 1 -1
>&2 1 -1
>&2 1 -1
>
&1 2 1 2 1
% sys., sys. # table_name
% scale_down_L11,       L12 # name
% double,       double # type
% 22,   22 # length
[ 1.2,  1.0954451560974121      ]
>select sqrt(1.2);
&1 0 1 1 1
% . # table_name
% sqrt_single_value # name
% real # type
% 13 # length
[ 1.09544516    ]


----------------------------------------------------------------------

Comment By: Niels Nes (nielsnes)
Date: 2009-04-13 14:46

Message:
It seems the bug is already fixed or related to your db size. I tested with
a small table and this works fine.
Could you try your query on a small table?

----------------------------------------------------------------------

Comment By: John Rylander (johnrylander)
Date: 2009-04-13 08:39

Message:
If others can confirm this bug, i.e., if I'm not just doing some wrong and
this hasn't already been fixed in a developer build, then I think the
priority of this bug should be raised significantly.

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=482468&aid=2757642&group_id=56967

------------------------------------------------------------------------------
This SF.net email is sponsored by:
High Quality Requirements in a Collaborative Environment.
Download a free trial of Rational Requirements Composer Now!
http://p.sf.net/sfu/www-ibm-com
_______________________________________________
Monetdb-bugs mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/monetdb-bugs

Reply via email to