Bugs item #2338524, was opened at 2008-11-24 13:08
Message generated for change (Comment added) made by lsidir
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=482468&aid=2338524&group_id=56967
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: PF/runtime
Group: MonetDB4 "stable"
Status: Open
Resolution: Fixed
Priority: 8
Private: No
Submitted By: Floris Ouwendijk (floris233)
Assigned to: Lefteris Sidirourgos (lsidir)
Summary: Deadlock on barrier
Initial Comment:
While running multiple processes performing queries that add/remove documents,
the system locked.
The following is a (gdb) dump of the threads, and for each of the threads the
location.
The version of the database is the August 2008 tarball release.
(gdb) info threads
67 Thread 1157667136 (LWP 14091) 0x000000366440c888 in __lll_mutex_lock_wait
() from /lib64/libpthread.so.0
66 Thread 1166059840 (LWP 1404) 0x000000366440c888 in __lll_mutex_lock_wait
() from /lib64/libpthread.so.0
62 Thread 1149274432 (LWP 1400) 0x000000366440c888 in __lll_mutex_lock_wait
() from /lib64/libpthread.so.0
23 Thread 1124096320 (LWP 5202) 0x000000366440c0ed in sem_wait () from
/lib64/libpthread.so.0
5 Thread 1107310912 (LWP 5041) 0x000000366440d531 in __nanosleep_nocancel ()
from /lib64/libpthread.so.0
4 Thread 1098918208 (LWP 5040) 0x00000036638c7db2 in __select_nocancel ()
from /lib64/libc.so.6
3 Thread 1090525504 (LWP 5039) 0x00000036638c7db2 in __select_nocancel ()
from /lib64/libc.so.6
2 Thread 1082132800 (LWP 5038) 0x000000366440d531 in __nanosleep_nocancel ()
from /lib64/libpthread.so.0
* 1 Thread 46912538871296 (LWP 5035) 0x00000036638c0e1b in __read_nocancel ()
from /lib64/libc.so.6
(gdb) thre 1
[Switching to thread 1 (Thread 46912538871296 (LWP 5035))]#0
0x00000036638c0e1b in __read_nocancel () from /lib64/libc.so.6
(gdb) l
11979 MT_unset_lock(PF_UPDATE_LOCK, "pflock_begin: updateLock for
update");
11980 } else {
11981 /* update in exclusive mode */
11982 MT_down_sema(PF_UPDATE_BARRIER, "pflock_begin: updateBarrier
for retry");
11983 MT_set_lock(PF_SHORT_LOCK, "pflock_begin: shortLock for
writer");
11984 pf_writer = *wsid;
11985 MT_unset_lock(PF_SHORT_LOCK, "pflock_begin: shortLock for
writer");
11986 }
11987 return GDK_SUCCEED;
11988 }
(gdb) bt
#0 0x00000036638c0e1b in __read_nocancel () from /lib64/libc.so.6
#1 0x000000366386a617 in _IO_new_file_underflow () from /lib64/libc.so.6
#2 0x000000366386930e in _IO_file_xsgetn_internal () from /lib64/libc.so.6
#3 0x000000366385f692 in fread () from /lib64/libc.so.6
#4 0x00002aaaacf63c63 in file_read (s=0x605d70, buf=0x67bbbe, elmsize=1,
cnt=1) at stream.c:447
#5 0x00002aaaacf63311 in stream_read (s=0x605d70, buf=0x67bbbe, elmsize=1,
cnt=1) at stream.c:165
#6 0x00002aaaabc329ff in readClient (c=0x2aaaabe99bc0) at monet_client.c:428
#7 0x00002aaaabc3323a in serveClient (stk=0, lt=0x6536b8, res=0x7fffe6999c30)
at monet_client.c:622
#8 0x00002aaaabbfd54b in interpret (stk=0, lt=0x6536b8, res=0x7fffe6999c30) at
monet_interpreter.c:808
#9 0x00002aaaabc2dc48 in handleRequest (t=0x2aaaacf4c9a0, q=0x6ed8188,
res=0x7fffe6999c30) at monet_queue.c:342
#10 0x00002aaaabc2e145 in doRequest (t=0x2aaaacf4c9a0, preference=0x0) at
monet_queue.c:368
#11 0x00002aaaabc846dd in monetInterpreter (status=0x7fffe6999d34) at
monet_process.c:45
#12 0x0000000000401fa4 in main (argc=3, av=0x7fffe6999e58) at Mserver.c:234
(gdb) thre 2
[Switching to thread 2 (Thread 1082132800 (LWP 5038))]#0 0x000000366440d531 in
__nanosleep_nocancel () from /lib64/libpthread.so.0
(gdb) l
11989
11990 /* query execution finishes; note that we have PF_SHORT_LOCK during
this */
11991 int CMDpflock_end(lng *wsid)
11992 {
11993 int mode = ((*wsid) & (3LL << 30)) >> 30;
11994 lng nil = lng_nil;
11995
11996 if (mode == PF_MODE_DOCMGT) {
11997 /* last docmgt function yields the meta barrier
11998 */
(gdb) bt
#0 0x000000366440d531 in __nanosleep_nocancel () from /lib64/libpthread.so.0
#1 0x00002aaaaca238e8 in MT_sleep_ms (ms=5000) at gdk_posix.c:1770
#2 0x00002aaaac788e36 in GDKvmtrim (limit=0x2aaaacd464c0) at gdk_utils.c:1350
#3 0x00000036644062f7 in start_thread () from /lib64/libpthread.so.0
#4 0x00000036638ce86d in clone () from /lib64/libc.so.6
#5 0x0000000000000000 in ?? ()
(gdb) thre 3
[Switching to thread 3 (Thread 1090525504 (LWP 5039))]#0 0x00000036638c7db2 in
__select_nocancel () from /lib64/libc.so.6
(gdb) l
11999 if (*wsid == pf_special) {
12000 pf_special = -1;
12001 } else {
12002 pf_ndocmgt--;
12003 }
12004 if (pf_ndocmgt == 0) {
12005 MT_up_sema(PF_META_BARRIER, "pflock_end: metaBarrier for
docmgt");
12006 }
12007 } else {
12008 /* last active meta-information query releases meta-barrier
(gdb) bt
#0 0x00000036638c7db2 in __select_nocancel () from /lib64/libc.so.6
#1 0x00002aaaba0bbed0 in MAPIlisten (stk=2, lt=0x125e2e8, res=0x410010e0) at
mapi.c:729
#2 0x00002aaaabbfd54b in interpret (stk=2, lt=0x125e2e8, res=0x410010e0) at
monet_interpreter.c:808
#3 0x00002aaaabc2dc48 in handleRequest (t=0x2aaaacf4ca48, q=0x1258088,
res=0x410010e0) at monet_queue.c:342
#4 0x00002aaaabc2e145 in doRequest (t=0x2aaaacf4ca48, preference=0x0) at
monet_queue.c:368
#5 0x00002aaaabc846dd in monetInterpreter (status=0x7fffe6997544) at
monet_process.c:45
#6 0x00000036644062f7 in start_thread () from /lib64/libpthread.so.0
#7 0x00000036638ce86d in clone () from /lib64/libc.so.6
#8 0x0000000000000000 in ?? ()
(gdb) thre 4
[Switching to thread 4 (Thread 1098918208 (LWP 5040))]#0 0x00000036638c7db2 in
__select_nocancel () from /lib64/libc.so.6
(gdb) l
12009 */
12010 BUN self_passed; /* this query uses meta-information functions?
*/
12011 BUN others_passed; /* there are multiple such queries active? */
12012
12013 self_passed = BUNlocate(ws_overlaps_ws, wsid, &nil);
12014 BUNdelHead(ws_overlaps_ws, wsid, FALSE); /* release all active
info on this query */
12015 others_passed = BUNfnd(BATmirror(ws_overlaps_ws), &nil);
12016 if (self_passed != BUN_NONE && others_passed == BUN_NONE) {
12017 MT_up_sema(PF_META_BARRIER, "pflock_end: metaBarrier");
12018 }
(gdb) bt
#0 0x00000036638c7db2 in __select_nocancel () from /lib64/libc.so.6
#1 0x00002aaaba52bdda in shttpd_poll (ctx=0x41801920, milliseconds=200) at
shttpd.c:2981
#2 0x00002aaaba535638 in CMDrpcd_start (port=0x41801b80, open=0x41801b90
"\001", option=0x2aaabc127fd8 "") at xrpc_server.c:1005
#3 0x00002aaaba52d037 in CMDrpcd_start_unpack11689948 (argc=4,
argv=0x41801b70) at xrpc_server.glue.c:43
#4 0x00002aaaabbfed91 in interpret (stk=3, lt=0x11c0418, res=0x418020e0) at
monet_interpreter.c:1070
#5 0x00002aaaabc2dc48 in handleRequest (t=0x2aaaacf4caf0, q=0x11f4768,
res=0x418020e0) at monet_queue.c:342
#6 0x00002aaaabc2e145 in doRequest (t=0x2aaaacf4caf0, preference=0x0) at
monet_queue.c:368
#7 0x00002aaaabc846dd in monetInterpreter (status=0x7fffe6998184) at
monet_process.c:45
#8 0x00000036644062f7 in start_thread () from /lib64/libpthread.so.0
#9 0x00000036638ce86d in clone () from /lib64/libc.so.6
#10 0x0000000000000000 in ?? ()
(gdb) thre 5
[Switching to thread 5 (Thread 1107310912 (LWP 5041))]#0 0x000000366440d531 in
__nanosleep_nocancel () from /lib64/libpthread.so.0
(gdb) l
12019
12020 /* last active update query releases update-barrier
12021 */
12022 if (pf_writer == *wsid) {
12023 /* finish in exclusive mode */
12024 pf_writer = -1;
12025 MT_up_sema(PF_UPDATE_BARRIER, "pflock_end: exclusive
updateBarrier");
12026 } else if (mode != PF_MODE_RDONLY) {
12027 /* finish in concurrent mode */
12028 if (--pf_nreaders == 0)
(gdb) bt
#0 0x000000366440d531 in __nanosleep_nocancel () from /lib64/libpthread.so.0
#1 0x00002aaaaca238e8 in MT_sleep_ms (ms=30000) at gdk_posix.c:1770
#2 0x00002aaab82c42cf in CMDsleep (secs=0x42001150) at alarm.c:145
#3 0x00002aaab82c361a in CMDsleep_unpack2122305048 (argc=2, argv=0x42001140)
at alarm.glue.c:43
#4 0x00002aaaabbfed91 in interpret (stk=545, lt=0x66e308, res=0x420030e0) at
monet_interpreter.c:1070
#5 0x00002aaaabc0a3cb in interpret_seqblock (stk=545, lt=0x1363898,
res=0x420030e0, scope=1) at monet_interpreter.c:2194
#6 0x00002aaaabbfc98b in interpret (stk=546, lt=0x1363898, res=0x420030e0) at
monet_interpreter.c:654
#7 0x00002aaaabc01244 in interpret_while (stk=546, lt=0x11f07d0,
res=0x420030e0) at monet_interpreter.c:1187
#8 0x00002aaaabbfcc85 in interpret (stk=546, lt=0x11f07d0, res=0x420030e0) at
monet_interpreter.c:721
#9 0x00002aaaabc0a3cb in interpret_seqblock (stk=546, lt=0x11f0708,
res=0x420030e0, scope=1) at monet_interpreter.c:2194
#10 0x00002aaaabbfc98b in interpret (stk=4, lt=0x11f0708, res=0x420030e0) at
monet_interpreter.c:654
#11 0x00002aaaabbfe726 in interpret (stk=4, lt=0x11e5338, res=0x420030e0) at
monet_interpreter.c:1044
#12 0x00002aaaabc2dc48 in handleRequest (t=0x2aaaacf4cb98, q=0x1376c78,
res=0x420030e0) at monet_queue.c:342
#13 0x00002aaaabc2e145 in doRequest (t=0x2aaaacf4cb98, preference=0x0) at
monet_queue.c:368
#14 0x00002aaaabc846dd in monetInterpreter (status=0x7fffe6998ea4) at
monet_process.c:45
#15 0x00000036644062f7 in start_thread () from /lib64/libpthread.so.0
#16 0x00000036638ce86d in clone () from /lib64/libc.so.6
#17 0x0000000000000000 in ?? ()
(gdb) thre 23
[Switching to thread 23 (Thread 1124096320 (LWP 5202))]#0 0x000000366440c0ed
in sem_wait () from /lib64/libpthread.so.0
(gdb) l
12029 MT_up_sema(PF_UPDATE_BARRIER, "pflock_end:
updateBarrier");
12030 }
12031 }
12032 return GDK_SUCCEED;
12033 }
12034
12035 /* we are going to use a meta-information function that returns all
collections/documents
12036 * thus we must block all new docmgt queries and wait for all active
ones to finish
12037 */
12038 int CMDpflock_meta(lng *wsid)
(gdb) bt
#0 0x000000366440c0ed in sem_wait () from /lib64/libpthread.so.0
#1 0x00002aaab871e4da in CMDpflock_begin (wsid=0x42ffdd30) at
pf_support.c:11959
#2 0x00002aaab86e65ec in CMDpflock_begin_unpack57322442 (argc=2,
argv=0x42ffdd20) at pf_support.glue.c:1906
#3 0x00002aaaabbfed91 in interpret (stk=3185, lt=0x2aab5502c078,
res=0x43003dc0) at monet_interpreter.c:1070
#4 0x00002aaaabc0a3cb in interpret_seqblock (stk=3185, lt=0x2aab57e414b8,
res=0x43003dc0, scope=1) at monet_interpreter.c:2194
#5 0x00002aaaabbfc98b in interpret (stk=3571, lt=0x2aab55ef1228,
res=0x43003dc0) at monet_interpreter.c:654
#6 0x00002aaaabc0a3cb in interpret_seqblock (stk=3571, lt=0x2aab4dfe9308,
res=0x43003dc0, scope=1) at monet_interpreter.c:2194
#7 0x00002aaaabbfc98b in interpret (stk=3683, lt=0x2aab4dfe9308,
res=0x43003dc0) at monet_interpreter.c:654
#8 0x00002aaaabbfe726 in interpret (stk=3683, lt=0x2aab4e221688,
res=0x43003dc0) at monet_interpreter.c:1044
#9 0x00002aaaabc0a0cc in interpret_assignment (stk=3683, lt=0x2aab4e221638,
res=0x43003dc0) at monet_interpreter.c:2146
#10 0x00002aaaabbfc8fa in interpret (stk=3683, lt=0x2aab4e221638,
res=0x43003dc0) at monet_interpreter.c:638
#11 0x00002aaaabc0a3cb in interpret_seqblock (stk=3683, lt=0x2aab562f9580,
res=0x43003dc0, scope=1) at monet_interpreter.c:2194
#12 0x00002aaaabbfc98b in interpret (stk=1, lt=0x2aab562f9580, res=0x43003dc0)
at monet_interpreter.c:654
#13 0x00002aaaab1a2667 in CMDcatch (stk=1, lt=0x2aab562f9558, res=0x43003dc0)
at builtin.c:1064
#14 0x00002aaaabbfd54b in interpret (stk=1, lt=0x2aab562f9558, res=0x43003dc0)
at monet_interpreter.c:808
#15 0x00002aaaabc0a0cc in interpret_assignment (stk=1, lt=0x2aab562f9508,
res=0x43003dc0) at monet_interpreter.c:2146
#16 0x00002aaaabbfc8fa in interpret (stk=1, lt=0x2aab562f9508, res=0x43003dc0)
at monet_interpreter.c:638
#17 0x00002aaaabc0a3cb in interpret_seqblock (stk=1, lt=0x2aab557a6f88,
res=0x43003dc0, scope=0) at monet_interpreter.c:2194
#18 0x00002aaaabbfc98b in interpret (stk=1, lt=0x2aab4e1a3218, res=0x43003dc0)
at monet_interpreter.c:654
#19 0x00002aaab9b5c48d in xquery_tree_exec (ctx=0x155a1c8, t=0x2aab4e1a3218,
repeat=0) at pathfinder.c:247
#20 0x00002aaab9b5c9ee in xquery_mil_exec (ctx=0x155a1c8,
buf=0x2aab5c069880 "var tijah_scoreDB;\nvar tijah_ftiTape;\ntijah_ftiTape
:= new (str, bat);\nvar time_load;\nvar time_query;\nvar time_print;\nvar
a0000;\nvar a0001;\nvar a0002;\nvar a0003;\nvar a0004;\nvar a0005;\nvar
a0006;\nvar a"...) at pathfinder.c:305
#21 0x00002aaab9b5d299 in xquery_compile_exec (ctx=0x155a1c8, options=2,
url=0x161de49 "pf:del-doc(\"prepared67113.data\")\n", is_url=0,
prologue=0x43004330,
query=0x43004328, epilogue=0x43004320, nsurl=0x0) at pathfinder.c:439
#22 0x00002aaab9b63065 in xquery_prepare (ctx=0x155a1c8, usec=5233816459,
query=0x161de49 "pf:del-doc(\"prepared67113.data\")\n") at pathfinder.c:2113
#23 0x00002aaab9b63694 in xquery_client_engine (mc=0x2aaaba2bf0a8) at
pathfinder.c:2256
#24 0x00002aaaba0baf7a in mapi_client_engine (FC=0x2aaaba2bf0a8) at mapi.c:316
#25 0x00000036644062f7 in start_thread () from /lib64/libpthread.so.0
#26 0x00000036638ce86d in clone () from /lib64/libc.so.6
#27 0x0000000000000000 in ?? ()
(gdb) thre 62
[Switching to thread 62 (Thread 1149274432 (LWP 1400))]#0 0x000000366440c888
in __lll_mutex_lock_wait () from /lib64/libpthread.so.0
(gdb) l
12039 {
12040 int mode = ((*wsid) & (3LL << 30)) >> 30;
12041 lng nil = lng_nil;
12042 BUN others_passed; /* there are multiple such queries active? */
12043
12044 MT_set_lock(PF_SHORT_LOCK, "pflock_meta: shortLock precheck");
12045 if (*wsid == pf_special || BUNlocate(ws_overlaps_ws, wsid, &nil) !=
BUN_NONE) {
12046 MT_unset_lock(PF_SHORT_LOCK, "pflock_meta: shortLock precheck");
12047 return GDK_SUCCEED; /* already got permission to proceed */
12048 }
(gdb) bt
#0 0x000000366440c888 in __lll_mutex_lock_wait () from /lib64/libpthread.so.0
#1 0x00000036644088b5 in _L_mutex_lock_107 () from /lib64/libpthread.so.0
#2 0x0000003664408343 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x00002aaab871ef20 in CMDpflock_meta (wsid=0x44800010) at pf_support.c:12066
#4 0x00002aaab86e6782 in CMDpflock_meta_unpack40025922 (argc=2,
argv=0x44800000) at pf_support.glue.c:1948
#5 0x00002aaaabbfed91 in interpret (stk=1641, lt=0x7556ec8, res=0x44801e70) at
monet_interpreter.c:1070
#6 0x00002aaaabbfcb0f in interpret (stk=1641, lt=0xca27f08, res=0x44801e70) at
monet_interpreter.c:683
#7 0x00002aaaabc0a3cb in interpret_seqblock (stk=1641, lt=0xfddb428,
res=0x44801e70, scope=1) at monet_interpreter.c:2194
#8 0x00002aaaabbfc98b in interpret (stk=3889, lt=0xfddb428, res=0x44801e70) at
monet_interpreter.c:654
#9 0x00002aaaabc0a3cb in interpret_seqblock (stk=3889, lt=0x6768148,
res=0x44801e70, scope=1) at monet_interpreter.c:2194
#10 0x00002aaaabbfc98b in interpret (stk=1589, lt=0x6768148, res=0x44801e70) at
monet_interpreter.c:654
#11 0x00002aaaabbfe726 in interpret (stk=1589, lt=0x67b47d0, res=0x44801e70) at
monet_interpreter.c:1044
#12 0x00002aaaabbfb887 in interpret_params (stk=1589, lt=0xecdff98,
params=0x448020a0, fcn_nme=0xd7253c8 "cross") at monet_interpreter.c:282
#13 0x00002aaaabbfddcd in interpret (stk=1589, lt=0xecdff78, res=0x44806dc0) at
monet_interpreter.c:884
#14 0x00002aaaabc0a0cc in interpret_assignment (stk=1589, lt=0xecdff28,
res=0x44806dc0) at monet_interpreter.c:2146
#15 0x00002aaaabbfc8fa in interpret (stk=1589, lt=0xecdff28, res=0x44806dc0) at
monet_interpreter.c:638
#16 0x00002aaaabc0a3cb in interpret_seqblock (stk=1589, lt=0xbf86300,
res=0x44806dc0, scope=1) at monet_interpreter.c:2194
#17 0x00002aaaabbfc98b in interpret (stk=6, lt=0xbf86300, res=0x44806dc0) at
monet_interpreter.c:654
#18 0x00002aaaab1a2667 in CMDcatch (stk=6, lt=0xbf862d8, res=0x44806dc0) at
builtin.c:1064
#19 0x00002aaaabbfd54b in interpret (stk=6, lt=0xbf862d8, res=0x44806dc0) at
monet_interpreter.c:808
#20 0x00002aaaabc0a0cc in interpret_assignment (stk=6, lt=0xbf86288,
res=0x44806dc0) at monet_interpreter.c:2146
#21 0x00002aaaabbfc8fa in interpret (stk=6, lt=0xbf86288, res=0x44806dc0) at
monet_interpreter.c:638
#22 0x00002aaaabc0a3cb in interpret_seqblock (stk=6, lt=0xdb31d18,
res=0x44806dc0, scope=0) at monet_interpreter.c:2194
#23 0x00002aaaabbfc98b in interpret (stk=6, lt=0xc825908, res=0x44806dc0) at
monet_interpreter.c:654
#24 0x00002aaab9b5c48d in xquery_tree_exec (ctx=0xc293788, t=0xc825908,
repeat=0) at pathfinder.c:247
#25 0x00002aaab9b5c9ee in xquery_mil_exec (ctx=0xc293788,
buf=0xf1f3fd0 "var tijah_scoreDB;\nvar tijah_ftiTape;\nvar time_load;\nvar
time_query;\nvar time_print;\nvar a0000;\nvar a0001;\nvar a0002;\nvar
a0003;\nvar a0004;\nvar a0005;\nvar a0006;\nvar a0007;\nvar a0008;\nvar
a0009;\nvar a"...) at pathfinder.c:305
#26 0x00002aaab9b5d299 in xquery_compile_exec (ctx=0xc293788, options=2,
url=0xd6e5aa9 "count(pf:collections()[.=\"cons02_log.xml\"])", is_url=0,
prologue=0x44807330, query=0x44807328, epilogue=0x44807320, nsurl=0x0) at
pathfinder.c:439
#27 0x00002aaab9b63065 in xquery_prepare (ctx=0xc293788, usec=5271918731,
query=0xd6e5aa9 "count(pf:collections()[.=\"cons02_log.xml\"])") at
pathfinder.c:2113
#28 0x00002aaab9b63694 in xquery_client_engine (mc=0x2aaaba2bf2b0) at
pathfinder.c:2256
#29 0x00002aaaba0baf7a in mapi_client_engine (FC=0x2aaaba2bf2b0) at mapi.c:316
#30 0x00000036644062f7 in start_thread () from /lib64/libpthread.so.0
#31 0x00000036638ce86d in clone () from /lib64/libc.so.6
#32 0x0000000000000000 in ?? ()
(gdb) thre 66
[Switching to thread 66 (Thread 1166059840 (LWP 1404))]#0 0x000000366440c888
in __lll_mutex_lock_wait () from /lib64/libpthread.so.0
(gdb) l
12049 if (mode == PF_MODE_DOCMGT) {
12050 /* SPECIAL CASE: a meta function inside a docmgt function.
12051 * now we must wait for potential other docmgts to finish first
12052 */
12053 int reapply_barrier = (pf_ndocmgt > 1);
12054 if (reapply_barrier) pf_ndocmgt--;
12055 MT_unset_lock(PF_SHORT_LOCK, "pflock_meta: metaLock precheck");
12056 if (reapply_barrier) {
12057 MT_down_sema(PF_META_BARRIER, "pflock_meta: metaBarrier for
convert");
12058 MT_set_lock(PF_SHORT_LOCK, "pflock_meta: shortLock
special");
(gdb) bt
#0 0x000000366440c888 in __lll_mutex_lock_wait () from /lib64/libpthread.so.0
#1 0x00000036644088b5 in _L_mutex_lock_107 () from /lib64/libpthread.so.0
#2 0x0000003664408343 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x00002aaab871e3a8 in CMDpflock_begin (wsid=0x45802d30) at
pf_support.c:11954
#4 0x00002aaab86e65ec in CMDpflock_begin_unpack57322442 (argc=2,
argv=0x45802d20) at pf_support.glue.c:1906
#5 0x00002aaaabbfed91 in interpret (stk=1577, lt=0xeb91088, res=0x45808dc0) at
monet_interpreter.c:1070
#6 0x00002aaaabc0a3cb in interpret_seqblock (stk=1577, lt=0x30d93c8,
res=0x45808dc0, scope=1) at monet_interpreter.c:2194
#7 0x00002aaaabbfc98b in interpret (stk=1322, lt=0x8fea0e8, res=0x45808dc0) at
monet_interpreter.c:654
#8 0x00002aaaabc0a3cb in interpret_seqblock (stk=1322, lt=0x8fea308,
res=0x45808dc0, scope=1) at monet_interpreter.c:2194
#9 0x00002aaaabbfc98b in interpret (stk=1667, lt=0x8fea308, res=0x45808dc0) at
monet_interpreter.c:654
#10 0x00002aaaabbfe726 in interpret (stk=1667, lt=0x5e42748, res=0x45808dc0) at
monet_interpreter.c:1044
#11 0x00002aaaabc0a0cc in interpret_assignment (stk=1667, lt=0x5e426f8,
res=0x45808dc0) at monet_interpreter.c:2146
#12 0x00002aaaabbfc8fa in interpret (stk=1667, lt=0x5e426f8, res=0x45808dc0) at
monet_interpreter.c:638
#13 0x00002aaaabc0a3cb in interpret_seqblock (stk=1667, lt=0x5843be0,
res=0x45808dc0, scope=1) at monet_interpreter.c:2194
#14 0x00002aaaabbfc98b in interpret (stk=10, lt=0x5843be0, res=0x45808dc0) at
monet_interpreter.c:654
#15 0x00002aaaab1a2667 in CMDcatch (stk=10, lt=0x5843bb8, res=0x45808dc0) at
builtin.c:1064
#16 0x00002aaaabbfd54b in interpret (stk=10, lt=0x5843bb8, res=0x45808dc0) at
monet_interpreter.c:808
#17 0x00002aaaabc0a0cc in interpret_assignment (stk=10, lt=0x5843b68,
res=0x45808dc0) at monet_interpreter.c:2146
#18 0x00002aaaabbfc8fa in interpret (stk=10, lt=0x5843b68, res=0x45808dc0) at
monet_interpreter.c:638
#19 0x00002aaaabc0a3cb in interpret_seqblock (stk=10, lt=0x73351c8,
res=0x45808dc0, scope=0) at monet_interpreter.c:2194
#20 0x00002aaaabbfc98b in interpret (stk=10, lt=0x3afd8b8, res=0x45808dc0) at
monet_interpreter.c:654
#21 0x00002aaab9b5c48d in xquery_tree_exec (ctx=0x11fc808, t=0x3afd8b8,
repeat=0) at pathfinder.c:247
#22 0x00002aaab9b5c9ee in xquery_mil_exec (ctx=0x11fc808,
buf=0xf2209f0 "var tijah_scoreDB;\nvar tijah_ftiTape;\ntijah_ftiTape := new
(str, bat);\nvar time_load;\nvar time_query;\nvar time_print;\nvar a0000;\nvar
a0001;\nvar a0002;\nvar a0003;\nvar a0004;\nvar a0005;\nvar a0006;\nvar a"...)
at pathfinder.c:305
#23 0x00002aaab9b5d299 in xquery_compile_exec (ctx=0x11fc808, options=2,
url=0x7e18e19 "for $d in
pf:documents()[...@collection='cons02_log.xml']//text() return
pf:del-doc($d)\n", is_url=0, prologue=0x45809330, query=0x45809328,
epilogue=0x45809320, nsurl=0x0) at pathfinder.c:439
#24 0x00002aaab9b63065 in xquery_prepare (ctx=0x11fc808, usec=5281945585,
query=0x7e18e19 "for $d in
pf:documents()[...@collection='cons02_log.xml']//text() return
pf:del-doc($d)\n") at pathfinder.c:2113
#25 0x00002aaab9b63694 in xquery_client_engine (mc=0x2aaaba2bf450) at
pathfinder.c:2256
#26 0x00002aaaba0baf7a in mapi_client_engine (FC=0x2aaaba2bf450) at mapi.c:316
#27 0x00000036644062f7 in start_thread () from /lib64/libpthread.so.0
#28 0x00000036638ce86d in clone () from /lib64/libc.so.6
#29 0x0000000000000000 in ?? ()
(gdb) thre 67
[Switching to thread 67 (Thread 1157667136 (LWP 14091))]#0 0x000000366440c888
in __lll_mutex_lock_wait () from /lib64/libpthread.so.0
(gdb) l
12059 pf_special = *wsid;
12060 MT_unset_lock(PF_SHORT_LOCK, "pflock_meta: shortLock
special");
12061 }
12062 return GDK_SUCCEED;
12063 }
12064 MT_unset_lock(PF_SHORT_LOCK, "pflock_meta: shortLock precheck");
12065
12066 MT_set_lock(PF_META_LOCK, "pflock_meta: metaLock");
12067 MT_set_lock(PF_SHORT_LOCK, "pflock_meta: shortLock insert");
12068 others_passed = BUNfnd(BATmirror(ws_overlaps_ws), &nil);
(gdb) bt
#0 0x000000366440c888 in __lll_mutex_lock_wait () from /lib64/libpthread.so.0
#1 0x00000036644088b5 in _L_mutex_lock_107 () from /lib64/libpthread.so.0
#2 0x0000003664408343 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3 0x00002aaab871e3a8 in CMDpflock_begin (wsid=0x45001d30) at
pf_support.c:11954
#4 0x00002aaab86e65ec in CMDpflock_begin_unpack57322442 (argc=2,
argv=0x45001d20) at pf_support.glue.c:1906
#5 0x00002aaaabbfed91 in interpret (stk=1373, lt=0xaabdec8, res=0x45007dc0) at
monet_interpreter.c:1070
#6 0x00002aaaabc0a3cb in interpret_seqblock (stk=1373, lt=0x4022388,
res=0x45007dc0, scope=1) at monet_interpreter.c:2194
#7 0x00002aaaabbfc98b in interpret (stk=3307, lt=0xed39118, res=0x45007dc0) at
monet_interpreter.c:654
#8 0x00002aaaabc0a3cb in interpret_seqblock (stk=3307, lt=0x9864c88,
res=0x45007dc0, scope=1) at monet_interpreter.c:2194
#9 0x00002aaaabbfc98b in interpret (stk=3409, lt=0x9864c88, res=0x45007dc0) at
monet_interpreter.c:654
#10 0x00002aaaabbfe726 in interpret (stk=3409, lt=0x98006a8, res=0x45007dc0) at
monet_interpreter.c:1044
#11 0x00002aaaabc0a0cc in interpret_assignment (stk=3409, lt=0x9800658,
res=0x45007dc0) at monet_interpreter.c:2146
#12 0x00002aaaabbfc8fa in interpret (stk=3409, lt=0x9800658, res=0x45007dc0) at
monet_interpreter.c:638
#13 0x00002aaaabc0a3cb in interpret_seqblock (stk=3409, lt=0xa082db0,
res=0x45007dc0, scope=1) at monet_interpreter.c:2194
#14 0x00002aaaabbfc98b in interpret (stk=5, lt=0xa082db0, res=0x45007dc0) at
monet_interpreter.c:654
#15 0x00002aaaab1a2667 in CMDcatch (stk=5, lt=0xa082d88, res=0x45007dc0) at
builtin.c:1064
#16 0x00002aaaabbfd54b in interpret (stk=5, lt=0xa082d88, res=0x45007dc0) at
monet_interpreter.c:808
#17 0x00002aaaabc0a0cc in interpret_assignment (stk=5, lt=0xa082d38,
res=0x45007dc0) at monet_interpreter.c:2146
#18 0x00002aaaabbfc8fa in interpret (stk=5, lt=0xa082d38, res=0x45007dc0) at
monet_interpreter.c:638
#19 0x00002aaaabc0a3cb in interpret_seqblock (stk=5, lt=0xdc50e98,
res=0x45007dc0, scope=0) at monet_interpreter.c:2194
#20 0x00002aaaabbfc98b in interpret (stk=5, lt=0xc0eb0b8, res=0x45007dc0) at
monet_interpreter.c:654
#21 0x00002aaab9b5c48d in xquery_tree_exec (ctx=0xb41c0f8, t=0xc0eb0b8,
repeat=0) at pathfinder.c:247
#22 0x00002aaab9b5c9ee in xquery_mil_exec (ctx=0xb41c0f8,
buf=0xed6be40 "var tijah_scoreDB;\nvar tijah_ftiTape;\ntijah_ftiTape := new
(str, bat);\nvar time_load;\nvar time_query;\nvar time_print;\nvar a0000;\nvar
a0001;\nvar a0002;\nvar a0003;\nvar a0004;\nvar a0005;\nvar a0006;\nvar a"...)
at pathfinder.c:305
#23 0x00002aaab9b5d299 in xquery_compile_exec (ctx=0xb41c0f8, options=2,
url=0xf8faf39 "for $d in
pf:documents()[...@collection='cons02_log.xml']//text() return
pf:del-doc($d)\n", is_url=0, prologue=0x45008330, query=0x45008328,
epilogue=0x45008320, nsurl=0x0) at pathfinder.c:439
#24 0x00002aaab9b63065 in xquery_prepare (ctx=0xb41c0f8, usec=235526232460,
query=0xf8faf39 "for $d in
pf:documents()[...@collection='cons02_log.xml']//text() return
pf:del-doc($d)\n") at pathfinder.c:2113
#25 0x00002aaab9b63694 in xquery_client_engine (mc=0x2aaaba2bf248) at
pathfinder.c:2256
#26 0x00002aaaba0baf7a in mapi_client_engine (FC=0x2aaaba2bf248) at mapi.c:316
#27 0x00000036644062f7 in start_thread () from /lib64/libpthread.so.0
#28 0x00000036638ce86d in clone () from /lib64/libc.so.6
#29 0x0000000000000000 in ?? ()
----------------------------------------------------------------------
>Comment By: Lefteris Sidirourgos (lsidir)
Date: 2009-02-17 11:41
Message:
Stefan, I don't think we can find a test for this problem. It is a race
condition that I doubt if it makes any sense to try to reproduce it
automatically during night tests.
----------------------------------------------------------------------
Comment By: Stefan Manegold (stmane)
Date: 2009-02-16 13:37
Message:
re-opened to remind us that we should consider adding a test.
----------------------------------------------------------------------
Comment By: Lefteris Sidirourgos (lsidir)
Date: 2008-11-25 12:04
Message:
The bugfix has also been applied to the Nov2008 branch (but not part of the
release, it will be in a upcoming bugfix release). However, the nightly
builds include the fix.
----------------------------------------------------------------------
Comment By: Peter Boncz (boncz)
Date: 2008-11-25 01:10
Message:
race was found. BUGFIX 2338542
fixes race condition when a docmgt query with a meta-command like:
for $d in pf:documents() return pf:del-doc($d)
is followed shortly by another docmgt query:
pf:del-doc("x")
problem is that because the docmgt counter is increased by
the second query arriving (though the first one prevents it
using a META barrier from starting), the increase of that counter
makes the pflock_end of the first query (its upgrade fromdocmgt
to meta query makes it a "special") fail to yield that META
barrier -- it should in fact do so for every special query
finishing no matter the docmgt count.
FIX IS NOT APPLIED IN STABLE YET (0_24) ONLY HEAD. NFI: do not use HEAD
now, it is too raw!! Rather get 0-24 and apply:
-bash-3.1$ cvs diff pf_support.mx
Index: pf_support.mx
===================================================================
RCS file: /cvsroot/monetdb/pathfinder/runtime/pf_support.mx,v
retrieving revision 1.299.2.9
diff -r1.299.2.9 pf_support.mx
8209a8210
> MT_up_sema(PF_META_BARRIER, "pflock_end: metaBarrier for
special docmgt");
8212,8213d8212
< }
< if (pf_ndocmgt == 0) {
-bash-3.1$
----------------------------------------------------------------------
Comment By: Peter Boncz (boncz)
Date: 2008-11-24 17:43
Message:
For clarity, now useing the gdb thread numbers:
(1) Mserver MIL console, blocked on read()
(2) Mserver vmtrim() thread, blocked on timer
(3) MIL interpreter, blocked on socket
(4) XRPC server, blocked on socket
(5) pf_logmanager() checkpoint thread (pathfinder.mx), blocked on timer
(23) pf:del-doc("prepared67113.data"), blocked on PFlock_begin on
PF_META_BARRIER (11959)
(63) count(pf:collections()[.="cons02_log.xml"]), blocked in PFlock_meta()
on
sema PF_LOCK_META (12066)
(66) for $d in pf:documents()[...@collection='cons02_log.xml']//text() return
pf:del-doc($d), blocked in PFlock_begin() on PF_LOCK_META
(67) for $d in pf:documents()[...@collection='cons02_log.xml']//text() return
pf:del-doc($d), blocked in PFlock_begin() on PF_LOCK_META
code is below. So the last three queries are blocked because the meta-lock
is taken. It is held by the pf:del-doc query, which is help up unexpectedly
by getting the meta semaphore.
The problem thus is that some query has left (maybe with an error) that
had the meta semaphore, but did not give it back..
11944 /* query execution starts */
11945 int CMDpflock_begin(lng *wsid)
11946 {
11947 int mode = ((*wsid) & (3LL << 30)) >> 30;
11948 if (mode == PF_MODE_RDONLY)
11949 return GDK_SUCCEED;
11950
11951 if (mode == PF_MODE_DOCMGT) {
11952 /* multiple docmgts may be active in parallel. the first
takes the meta-barrier */
11953 int take_barrier;
11954 MT_set_lock(PF_META_LOCK, "pflock_begin: metaLock for
mgmt");
11955 MT_set_lock(PF_SHORT_LOCK, "pflock_begin: shortLock for
mgmt");
11956 take_barrier = (pf_ndocmgt++ == 0);
11957 MT_unset_lock(PF_SHORT_LOCK, "pflock_begin: shirtLock for
mgmt");
11958 if (take_barrier) {
11959 MT_down_sema(PF_META_BARRIER, "pflock_begin:
metaBarrier");
11960 }
11961 MT_unset_lock(PF_META_LOCK, "pflock_begin: metaLock for
mgmt");
11962 return GDK_SUCCEED;
11963 }
11964
11965 /* we only have update queries below this point; first check if
we are in exclusive retry mode */
11966 MT_set_lock(PF_SHORT_LOCK, "pflock_begin: shortLock for
convoy"); /* protects pf_convoy */
11967 if (mode == PF_MODE_RETRY) {
11968 pf_convoy += PF_CONVOY; /* re-try: schedule a convoy */
11969 } else if (pf_convoy > 0) {
11970 mode = PF_MODE_RETRY; pf_convoy--; /* convert to exclusive
execution due to convoy */
11971 }
11972 MT_unset_lock(PF_SHORT_LOCK, "pflock_begin: shortLock for
convoy");
11973
11974 if (mode == PF_MODE_UPDATE) {
11975 /* update in concurrent mode */
11976 MT_set_lock(PF_UPDATE_LOCK, "pflock_begin: updateLock for
update");
11977 if (pf_nreaders++ == 0)
11978 MT_down_sema(PF_UPDATE_BARRIER, "pflock_begin:
updateBarrier for update");
11979 MT_unset_lock(PF_UPDATE_LOCK, "pflock_begin: updateLock for
update");
11980 } else {
11981 /* update in exclusive mode */
11982 MT_down_sema(PF_UPDATE_BARRIER, "pflock_begin: updateBarrier
for retry");
11983 MT_set_lock(PF_SHORT_LOCK, "pflock_begin: shortLock for
writer");
11984 pf_writer = *wsid;
11985 MT_unset_lock(PF_SHORT_LOCK, "pflock_begin: shortLock for
writer");
11986 }
11987 return GDK_SUCCEED;
11988 }
12035 /* we are going to use a meta-information function that returns all
collections/documents
12036 * thus we must block all new docmgt queries and wait for all active
ones to finish
12037 */
12038 int CMDpflock_meta(lng *wsid)
12039 {
12040 int mode = ((*wsid) & (3LL << 30)) >> 30;
12041 lng nil = lng_nil;
12042 BUN others_passed; /* there are multiple such queries active? */
12043
12044 MT_set_lock(PF_SHORT_LOCK, "pflock_meta: shortLock precheck");
12045 if (*wsid == pf_special || BUNlocate(ws_overlaps_ws, wsid, &nil)
!= BUN_NONE) {
12046 MT_unset_lock(PF_SHORT_LOCK, "pflock_meta: shortLock
precheck");
12047 return GDK_SUCCEED; /* already got permission to proceed */
12048 }
12049 if (mode == PF_MODE_DOCMGT) {
12050 /* SPECIAL CASE: a meta function inside a docmgt function.
12051 * now we must wait for potential other docmgts to finish
first
12052 */
12053 int reapply_barrier = (pf_ndocmgt > 1);
12054 if (reapply_barrier) pf_ndocmgt--;
12055 MT_unset_lock(PF_SHORT_LOCK, "pflock_meta: metaLock
precheck");
12056 if (reapply_barrier) {
12057 MT_down_sema(PF_META_BARRIER, "pflock_meta: metaBarrier
for convert");
12058 MT_set_lock(PF_SHORT_LOCK, "pflock_meta: shortLock
special");
12059 pf_special = *wsid;
12060 MT_unset_lock(PF_SHORT_LOCK, "pflock_meta: shortLock
special");
12061 }
12062 return GDK_SUCCEED;
12063 }
12064 MT_unset_lock(PF_SHORT_LOCK, "pflock_meta: shortLock precheck");
12065
12066 MT_set_lock(PF_META_LOCK, "pflock_meta: metaLock");
12067 MT_set_lock(PF_SHORT_LOCK, "pflock_meta: shortLock insert");
12068 others_passed = BUNfnd(BATmirror(ws_overlaps_ws), &nil);
12069 BUNins(ws_overlaps_ws, wsid, &nil, FALSE);
12070 MT_unset_lock(PF_SHORT_LOCK, "pflock_meta: shortLock insert");
12071 if (others_passed == BUN_NONE) {
12072 /* first query with meta function: get barrier */
12073 MT_down_sema(PF_META_BARRIER, "pflock_meta: metaBarrier");
12074 }
12075 MT_unset_lock(PF_META_LOCK, "pflock_meta: metaLock");
12076 return GDK_SUCCEED;
12077 }
----------------------------------------------------------------------
Comment By: Peter Boncz (boncz)
Date: 2008-11-24 15:57
Message:
The trace shows the following:
(1) MIL server thread
(2) BBPtrim thread
(3) xrpc background server
(4) checkpoint background thread
(5) pf:del-doc("prepared67113.data")
(6) count(pf:collections()[.="cons02_log.xml"])
(7) for $d in pf:documents()[...@collection='cons02_log.xml']//text() return
pf:del-doc($d)
(8) for $d in pf:documents()[...@collection='cons02_log.xml']//text() return
pf:del-doc($d)
The first question for NFI is why threads (7) and (8) are removing the
same documents concurrently. Maybe thread (5) is even removing a document
covered by both (7) and (8) This will necessarily lead to (correct)
runtime errors.
analysis continues.. NFI feedback appreciated
----------------------------------------------------------------------
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=482468&aid=2338524&group_id=56967
------------------------------------------------------------------------------
Open Source Business Conference (OSBC), March 24-25, 2009, San Francisco, CA
-OSBC tackles the biggest issue in open source: Open Sourcing the Enterprise
-Strategies to boost innovation and cut costs with open source participation
-Receive a $600 discount off the registration fee with the source code: SFAD
http://p.sf.net/sfu/XcvMzF8H
_______________________________________________
Monetdb-bugs mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/monetdb-bugs