Bugs item #2783279, was opened at 2009-04-29 00:33
Message generated for change (Comment added) made by stmane
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=482468&aid=2783279&group_id=56967
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Core
Group: MonetDB Common CVS Head
>Status: Closed
>Resolution: Fixed
Priority: 5
Private: No
Submitted By: Stefan de Konink (skinkie)
Assigned to: Niels Nes (nielsnes)
>Summary: multi-column join with expression fails
Initial Comment:
CREATE TABLE "sys"."way_nds" (
"way" int,
"idx" int,
"to_node" int
);
create table segments (way integer, node1 integer, node2 integer);
insert into segments select t1.way, t1.to_node, t2.to_node from way_nds as t1,
way_nds as t2 where t1.way = t2.way and t1.idx = t2.idx-1;
way_nds consists of 370896230 rows
The error on the server:
!ERROR: bunfastins: bat too large
!ERROR: bunfastins: bat too large
!ERROR: bunfastins: bat too large
!ERROR: bunfastins: bat too large
!ERROR: bunfastins: bat too large
!ERROR: bunfastins: bat too large
!ERROR: bunfastins: bat too large
!ERROR: bunfastins: bat too large
!ERROR: bunfastins: bat too large
!ERROR: bunfastins: bat too large
Segmentation fault
(will try to run GDB, but currently have enabled optimize)
I am using 64bit/oid32.
----------------------------------------------------------------------
>Comment By: Stefan Manegold (stmane)
Date: 2009-05-13 19:58
Message:
fixed in CVS by
===================================================================
2009/04/29 - stmane: MonetDB5/src/modules/mal/mkey.mx,1.8.2.1
do not try to dereference a pointer to a 32-bit int or flt
as (potentially) 64-bit wrd ---
--- you might not get what you expect ...
===================================================================
2009/04/29 - stmane: MonetDB5/src/modules/mal/mkey.mx,1.8.2.2
since hash value shave been wrd for more than half a year, now,
let's also use type wrd for the hash mask
===================================================================
2009/05/13 - stmane: MonetDB5/src/modules/mal/mkey.mx,1.8.2.3
make sure we use a "reasonable" 64-bit mask when left-shifting 1 by >= 32
bit
(on 64-bit systems)
===================================================================
2009/05/13 - stmane: MonetDB5/src/modules/mal/mkey.mx,1.8.2.4
make sure that bulk_rotate_xor_hash() is side-effect free, i.e.,
produces a new result BAT instead of modifying its first input in place
===================================================================
test added in
sql/src/test/BugTracker-2009/Tests/multi-column_join_with_expression.SF-2783279.*
----------------------------------------------------------------------
Comment By: Stefan Manegold (stmane)
Date: 2009-04-30 23:08
Message:
A simplified test reveals (still same setup as before), both with and
without my mkey patch:
sql>CREATE TABLE "sys"."way_nds_9" ( "way" int, "idx" int, "to_node" int
);
0 tuples
sql>insert into way_nds_9 select * from way_nds limit 9;
Rows affected 9
sql>select * from way_nds_9;
+-----+-----+-----------+
| way | idx | to_node |
+=====+=====+===========+
| 35 | 0 | 200542 |
| 35 | 1 | 274057218 |
| 35 | 2 | 200550 |
| 35 | 3 | 200551 |
| 35 | 4 | 200553 |
| 37 | 0 | 200511 |
| 37 | 1 | 177231081 |
| 37 | 2 | 200513 |
| 37 | 3 | 177081428 |
+-----+-----+-----------+
9 tuples
sql>select * from way_nds_9 as t1, way_nds_9 as t2 where t1.way = t2.way
and t1.idx = t2.idx;
+-----+-----+-----------+-----+-----+-----------+
| way | idx | to_node | way | idx | to_node |
+=====+=====+===========+=====+=====+===========+
| 35 | 0 | 200542 | 35 | 0 | 200542 |
| 35 | 1 | 274057218 | 35 | 1 | 274057218 |
| 35 | 2 | 200550 | 35 | 2 | 200550 |
| 35 | 3 | 200551 | 35 | 3 | 200551 |
| 35 | 4 | 200553 | 35 | 4 | 200553 |
| 37 | 0 | 200511 | 37 | 0 | 200511 |
| 37 | 1 | 177231081 | 37 | 1 | 177231081 |
| 37 | 2 | 200513 | 37 | 2 | 200513 |
| 37 | 3 | 177081428 | 37 | 3 | 177081428 |
+-----+-----+-----------+-----+-----+-----------+
9 tuples
sql>select * from way_nds_9 as t1, way_nds_9 as t2 where t1.way = t2.way
and t1.idx = t2.idx-1;
0 tuples
sql>select * from way_nds_9 as t1, way_nds_9 as t2 where t1.way = t2.way
and t2.idx-1 = t1.idx;
0 tuples
sql>select * from way_nds_9 as t1, way_nds_9 as t2 where t1.idx = t2.idx-1
and t1.way = t2.way;
+-----+-----+-----------+-----+-----+-----------+
| way | idx | to_node | way | idx | to_node |
+=====+=====+===========+=====+=====+===========+
| 35 | 0 | 200542 | 35 | 1 | 274057218 |
| 35 | 1 | 274057218 | 35 | 2 | 200550 |
| 35 | 2 | 200550 | 35 | 3 | 200551 |
| 35 | 3 | 200551 | 35 | 4 | 200553 |
| 37 | 0 | 200511 | 37 | 1 | 177231081 |
| 37 | 1 | 177231081 | 37 | 2 | 200513 |
| 37 | 2 | 200513 | 37 | 3 | 177081428 |
+-----+-----+-----------+-----+-----+-----------+
7 tuples
sql>select * from way_nds_9 as t1, way_nds_9 as t2 where t2.idx-1 = t1.idx
and t1.way = t2.way;
+-----+-----+-----------+-----+-----+-----------+
| way | idx | to_node | way | idx | to_node |
+=====+=====+===========+=====+=====+===========+
| 35 | 0 | 200542 | 35 | 1 | 274057218 |
| 35 | 1 | 274057218 | 35 | 2 | 200550 |
| 35 | 2 | 200550 | 35 | 3 | 200551 |
| 35 | 3 | 200551 | 35 | 4 | 200553 |
| 37 | 0 | 200511 | 37 | 1 | 177231081 |
| 37 | 1 | 177231081 | 37 | 2 | 200513 |
| 37 | 2 | 200513 | 37 | 3 | 177081428 |
+-----+-----+-----------+-----+-----+-----------+
7 tuples
Hence, there seems to be a bug in the generated plan and/or MAL code for
multi-column joins in case the second term of the conjunctive where clause
contains an (arithmetic) expression (as opposed to just column references)
...
... detailed analysis of generated plan and MAL code pending ...
----------------------------------------------------------------------
Comment By: Stefan Manegold (stmane)
Date: 2009-04-30 22:23
Message:
avoiding the multi-column join (with arithmetic term) seems to work fine:
sql>select min(idx), max(idx), min(way), max(way) from way_nds;
+-----+-------+-----+----------+
| L11 | L12 | L13 | L14 |
+=====+=======+=====+==========+
| 0 | 36168 | 35 | 33244337 |
+-----+-------+-----+----------+
1 tuple
sql>select count(*) from way_nds as t1, way_nds as t2 where ((t1.way *
100000) + t1.idx) = ((t2.way * 100000) + t2.idx);
+-----------+
| L7 |
+===========+
| 370896230 |
+-----------+
1 tuple
sql>select count(*) from way_nds as t1, way_nds as t2 where ((t1.way *
100000) + t1.idx) = ((t2.way * 100000) + (t2.idx-1));
+-----------+
| L10 |
+===========+
| 343983657 |
+-----------+
1 tuple
----------------------------------------------------------------------
Comment By: Stefan Manegold (stmane)
Date: 2009-04-30 22:11
Message:
Without my patch,
select count(*) from way_nds as t1, way_nds as t2 where t1.way = t2.way
and t1.idx = t2.idx-1;
seems to trigger a segfault.
----------------------------------------------------------------------
Comment By: Stefan Manegold (stmane)
Date: 2009-04-30 20:06
Message:
With today's CVS code base (Feb2009 branch) plus the attached mkey.mx
patch, I get the following on 64-bit Fedora 10, using 32-bit OIDs,
configured with --enable-strict --disable-debug --enable-optimize
--disable-assert --enable-oid32, compiled with gcc 4.3.2, server started
with default `mserver5 --dbinit='include sql;'`:
sql>CREATE TABLE "sys"."way_nds" ( "way" int, "idx" int, "to_node" int );
0 tuples
sql>copy 370896230 records into way_nds from
'/export/data1/konink/way_nds.csv' delimiters ',';
Rows affected 370896230
sql>select count(*) from way_nds;
+-----------+
| L1 |
+===========+
| 370896230 |
+-----------+
1 tuple
sql>select * from way_nds limit 9;
+-----+-----+-----------+
| way | idx | to_node |
+=====+=====+===========+
| 35 | 0 | 200542 |
| 35 | 1 | 274057218 |
| 35 | 2 | 200550 |
| 35 | 3 | 200551 |
| 35 | 4 | 200553 |
| 37 | 0 | 200511 |
| 37 | 1 | 177231081 |
| 37 | 2 | 200513 |
| 37 | 3 | 177081428 |
+-----+-----+-----------+
9 tuples
sql>select count(*) from way_nds as t1, way_nds as t2 where t1.way =
t2.way and t1.idx = t2.idx;
+-----------+
| L2 |
+===========+
| 370896230 |
+-----------+
1 tuple
sql>select count(*) from way_nds as t1, way_nds as t2 where t1.way =
t2.way and t1.idx = t2.idx-1;
+----+
| L3 |
+====+
| 0 |
+----+
1 tuple
sql>select min(idx) from way_nds;
+----+
| L6 |
+====+
| 0 |
+----+
1 tuple
sql>select max(idx) from way_nds;
+-------+
| L4 |
+=======+
| 36168 |
+-------+
1 tuple
sql>select count(*) from way_nds as t1, way_nds as t2 where t2.idx > 0 and
t1.way = t2.way and t1.idx = t2.idx-1;
+-----------+
| L5 |
+===========+
| 343983657 |
+-----------+
1 tuple
... I suspect that there are (still?) (despite or due to my patch??) some
problems with the multi-column join, possibly releated to or caused by
potential bugs in bulk_rotate_xor_hash() of module mkey, in particular when
hashing negaitve numbers and using a 32-bit left shift as done by the SQL
code generation ...
----------------------------------------------------------------------
Comment By: Stefan Manegold (stmane)
Date: 2009-04-30 19:50
Message:
With the latest (April 30) CVS code base (Feb2009 branch or HEAD):
- does the "bat too large" error still occur?
- does the segfault still occur?
----------------------------------------------------------------------
Comment By: Martin Kersten (mlkersten)
Date: 2009-04-29 09:45
Message:
The sporadic segfault has been catched in the HEAD
----------------------------------------------------------------------
Comment By: Stefan Manegold (stmane)
Date: 2009-04-29 08:46
Message:
My question about the primary key was not about to add one, but rather
about which attribute (or combination of attributes) has unique values /
value combinations (and hence would qualify as a primary key) --- this
information would help to (roughly) estimate the join result size ...
... apart from that, ALGjoinPath() should of course not segfault, even if
(a previous) inter mediate result "overflows", and hence triggers an
exception/error ...
----------------------------------------------------------------------
Comment By: Stefan de Konink (skinkie)
Date: 2009-04-29 08:10
Message:
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffe2aa3950 (LWP 7736)]
0x00007fffe544a9c9 in ALGjoinPath ()
from
/export/scratch1/konink/monetdb-head/lib/MonetDB5/lib/lib_optimizer.so
(gdb) bt
#0 0x00007fffe544a9c9 in ALGjoinPath ()
from
/export/scratch1/konink/monetdb-head/lib/MonetDB5/lib/lib_optimizer.so
#1 0x00007ffff7d12975 in DFLOWstep ()
from /export/scratch1/konink/monetdb-head/lib/libmonetdb5.so.5
#2 0x00007ffff7d14894 in runDFLOWworker ()
from /export/scratch1/konink/monetdb-head/lib/libmonetdb5.so.5
#3 0x0000003c1a4073da in start_thread () from /lib64/libpthread.so.0
#4 0x0000003c19ce62bd in clone () from /lib64/libc.so.6
[I'll recompile with debugging enabled again]
Currently no primary key has been set. But if I set it, it would be (way,
idx). But since my query makes an explicit relation between way and idx,
would a primary key make any difference?
----------------------------------------------------------------------
Comment By: Stefan Manegold (stmane)
Date: 2009-04-29 05:09
Message:
"bat too large" occurs only if there are BUN_MAX BUNs; with 32-bit OIDs,
BUN_MAX is INT_MAX-1, i.e., 2^31-1-1 == 2147483646 >> 370896230.
I suspect that an intermediate result while calculating the (self-)join
"explodes".
What is the primary key of table way_nds? (way), (idx), (to_node),
(way,idx), ..., (way,idx,to_node)?
----------------------------------------------------------------------
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=482468&aid=2783279&group_id=56967
------------------------------------------------------------------------------
The NEW KODAK i700 Series Scanners deliver under ANY circumstances! Your
production scanning environment may not be a perfect world - but thanks to
Kodak, there's a perfect scanner to get the job done! With the NEW KODAK i700
Series Scanner you'll get full speed at 300 dpi even with all image
processing features enabled. http://p.sf.net/sfu/kodak-com
_______________________________________________
Monetdb-bugs mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/monetdb-bugs