Bugs item #2338524, was opened at 2008-11-24 13:08
Message generated for change (Comment added) made by lsidir
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=482468&aid=2338524&group_id=56967

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: PF/runtime
Group: MonetDB4 "stable"
Status: Pending
Resolution: Fixed
Priority: 5
Private: No
Submitted By: Floris Ouwendijk (floris233)
Assigned to: Lefteris Sidirourgos (lsidir)
Summary: Deadlock on barrier

Initial Comment:
While running multiple processes performing queries that add/remove documents, 
the system locked. 
The following is a (gdb) dump of the threads, and for each of the threads the 
location.
The version of the database is the August 2008 tarball release.

(gdb) info threads
  67 Thread 1157667136 (LWP 14091)  0x000000366440c888 in __lll_mutex_lock_wait 
() from /lib64/libpthread.so.0
  66 Thread 1166059840 (LWP 1404)  0x000000366440c888 in __lll_mutex_lock_wait 
() from /lib64/libpthread.so.0
  62 Thread 1149274432 (LWP 1400)  0x000000366440c888 in __lll_mutex_lock_wait 
() from /lib64/libpthread.so.0
  23 Thread 1124096320 (LWP 5202)  0x000000366440c0ed in sem_wait () from 
/lib64/libpthread.so.0
  5 Thread 1107310912 (LWP 5041)  0x000000366440d531 in __nanosleep_nocancel () 
from /lib64/libpthread.so.0
  4 Thread 1098918208 (LWP 5040)  0x00000036638c7db2 in __select_nocancel () 
from /lib64/libc.so.6
  3 Thread 1090525504 (LWP 5039)  0x00000036638c7db2 in __select_nocancel () 
from /lib64/libc.so.6
  2 Thread 1082132800 (LWP 5038)  0x000000366440d531 in __nanosleep_nocancel () 
from /lib64/libpthread.so.0
* 1 Thread 46912538871296 (LWP 5035)  0x00000036638c0e1b in __read_nocancel () 
from /lib64/libc.so.6






(gdb) thre 1
[Switching to thread 1 (Thread 46912538871296 (LWP 5035))]#0  
0x00000036638c0e1b in __read_nocancel () from /lib64/libc.so.6
(gdb) l
11979           MT_unset_lock(PF_UPDATE_LOCK, "pflock_begin: updateLock for 
update");
11980       } else {
11981           /* update in exclusive mode */
11982           MT_down_sema(PF_UPDATE_BARRIER, "pflock_begin: updateBarrier 
for retry");
11983           MT_set_lock(PF_SHORT_LOCK, "pflock_begin: shortLock for 
writer");
11984           pf_writer = *wsid;
11985           MT_unset_lock(PF_SHORT_LOCK, "pflock_begin: shortLock for 
writer");
11986       }
11987       return GDK_SUCCEED;
11988   }
(gdb) bt
#0  0x00000036638c0e1b in __read_nocancel () from /lib64/libc.so.6
#1  0x000000366386a617 in _IO_new_file_underflow () from /lib64/libc.so.6
#2  0x000000366386930e in _IO_file_xsgetn_internal () from /lib64/libc.so.6
#3  0x000000366385f692 in fread () from /lib64/libc.so.6
#4  0x00002aaaacf63c63 in file_read (s=0x605d70, buf=0x67bbbe, elmsize=1, 
cnt=1) at stream.c:447
#5  0x00002aaaacf63311 in stream_read (s=0x605d70, buf=0x67bbbe, elmsize=1, 
cnt=1) at stream.c:165
#6  0x00002aaaabc329ff in readClient (c=0x2aaaabe99bc0) at monet_client.c:428
#7  0x00002aaaabc3323a in serveClient (stk=0, lt=0x6536b8, res=0x7fffe6999c30) 
at monet_client.c:622
#8  0x00002aaaabbfd54b in interpret (stk=0, lt=0x6536b8, res=0x7fffe6999c30) at 
monet_interpreter.c:808
#9  0x00002aaaabc2dc48 in handleRequest (t=0x2aaaacf4c9a0, q=0x6ed8188, 
res=0x7fffe6999c30) at monet_queue.c:342
#10 0x00002aaaabc2e145 in doRequest (t=0x2aaaacf4c9a0, preference=0x0) at 
monet_queue.c:368
#11 0x00002aaaabc846dd in monetInterpreter (status=0x7fffe6999d34) at 
monet_process.c:45
#12 0x0000000000401fa4 in main (argc=3, av=0x7fffe6999e58) at Mserver.c:234














(gdb) thre 2
[Switching to thread 2 (Thread 1082132800 (LWP 5038))]#0  0x000000366440d531 in 
__nanosleep_nocancel () from /lib64/libpthread.so.0
(gdb) l
11989
11990   /* query execution finishes; note that we have PF_SHORT_LOCK during 
this */
11991   int CMDpflock_end(lng *wsid)
11992   {
11993       int mode = ((*wsid) & (3LL << 30)) >> 30;
11994       lng nil = lng_nil;
11995
11996       if (mode == PF_MODE_DOCMGT) {
11997           /* last docmgt function yields the meta barrier
11998            */
(gdb) bt
#0  0x000000366440d531 in __nanosleep_nocancel () from /lib64/libpthread.so.0
#1  0x00002aaaaca238e8 in MT_sleep_ms (ms=5000) at gdk_posix.c:1770
#2  0x00002aaaac788e36 in GDKvmtrim (limit=0x2aaaacd464c0) at gdk_utils.c:1350
#3  0x00000036644062f7 in start_thread () from /lib64/libpthread.so.0
#4  0x00000036638ce86d in clone () from /lib64/libc.so.6
#5  0x0000000000000000 in ?? ()











(gdb) thre 3
[Switching to thread 3 (Thread 1090525504 (LWP 5039))]#0  0x00000036638c7db2 in 
__select_nocancel () from /lib64/libc.so.6
(gdb) l
11999           if (*wsid == pf_special) {
12000               pf_special = -1;
12001           } else {
12002               pf_ndocmgt--;
12003           }
12004           if (pf_ndocmgt == 0) {
12005               MT_up_sema(PF_META_BARRIER, "pflock_end: metaBarrier for 
docmgt");
12006           }
12007       } else {
12008           /* last active meta-information query releases meta-barrier
(gdb) bt
#0  0x00000036638c7db2 in __select_nocancel () from /lib64/libc.so.6
#1  0x00002aaaba0bbed0 in MAPIlisten (stk=2, lt=0x125e2e8, res=0x410010e0) at 
mapi.c:729
#2  0x00002aaaabbfd54b in interpret (stk=2, lt=0x125e2e8, res=0x410010e0) at 
monet_interpreter.c:808
#3  0x00002aaaabc2dc48 in handleRequest (t=0x2aaaacf4ca48, q=0x1258088, 
res=0x410010e0) at monet_queue.c:342
#4  0x00002aaaabc2e145 in doRequest (t=0x2aaaacf4ca48, preference=0x0) at 
monet_queue.c:368
#5  0x00002aaaabc846dd in monetInterpreter (status=0x7fffe6997544) at 
monet_process.c:45
#6  0x00000036644062f7 in start_thread () from /lib64/libpthread.so.0
#7  0x00000036638ce86d in clone () from /lib64/libc.so.6
#8  0x0000000000000000 in ?? ()









(gdb) thre 4
[Switching to thread 4 (Thread 1098918208 (LWP 5040))]#0  0x00000036638c7db2 in 
__select_nocancel () from /lib64/libc.so.6
(gdb) l
12009            */
12010           BUN self_passed; /* this query uses meta-information functions? 
*/
12011           BUN others_passed; /* there are multiple such queries active? */
12012
12013           self_passed = BUNlocate(ws_overlaps_ws, wsid, &nil);
12014           BUNdelHead(ws_overlaps_ws, wsid, FALSE); /* release all active 
info on this query */
12015           others_passed = BUNfnd(BATmirror(ws_overlaps_ws), &nil);
12016           if (self_passed != BUN_NONE && others_passed == BUN_NONE) {
12017               MT_up_sema(PF_META_BARRIER, "pflock_end: metaBarrier");
12018           }
(gdb) bt
#0  0x00000036638c7db2 in __select_nocancel () from /lib64/libc.so.6
#1  0x00002aaaba52bdda in shttpd_poll (ctx=0x41801920, milliseconds=200) at 
shttpd.c:2981
#2  0x00002aaaba535638 in CMDrpcd_start (port=0x41801b80, open=0x41801b90 
"\001", option=0x2aaabc127fd8 "") at xrpc_server.c:1005
#3  0x00002aaaba52d037 in CMDrpcd_start_unpack11689948 (argc=4, 
argv=0x41801b70) at xrpc_server.glue.c:43
#4  0x00002aaaabbfed91 in interpret (stk=3, lt=0x11c0418, res=0x418020e0) at 
monet_interpreter.c:1070
#5  0x00002aaaabc2dc48 in handleRequest (t=0x2aaaacf4caf0, q=0x11f4768, 
res=0x418020e0) at monet_queue.c:342
#6  0x00002aaaabc2e145 in doRequest (t=0x2aaaacf4caf0, preference=0x0) at 
monet_queue.c:368
#7  0x00002aaaabc846dd in monetInterpreter (status=0x7fffe6998184) at 
monet_process.c:45
#8  0x00000036644062f7 in start_thread () from /lib64/libpthread.so.0
#9  0x00000036638ce86d in clone () from /lib64/libc.so.6
#10 0x0000000000000000 in ?? ()









(gdb) thre 5
[Switching to thread 5 (Thread 1107310912 (LWP 5041))]#0  0x000000366440d531 in 
__nanosleep_nocancel () from /lib64/libpthread.so.0
(gdb) l
12019
12020           /* last active update query releases update-barrier
12021            */
12022           if (pf_writer == *wsid) {
12023               /* finish in exclusive mode */
12024               pf_writer = -1;
12025               MT_up_sema(PF_UPDATE_BARRIER, "pflock_end: exclusive 
updateBarrier");
12026           } else if (mode != PF_MODE_RDONLY) {
12027               /* finish in concurrent mode */
12028               if (--pf_nreaders == 0)
(gdb) bt
#0  0x000000366440d531 in __nanosleep_nocancel () from /lib64/libpthread.so.0
#1  0x00002aaaaca238e8 in MT_sleep_ms (ms=30000) at gdk_posix.c:1770
#2  0x00002aaab82c42cf in CMDsleep (secs=0x42001150) at alarm.c:145
#3  0x00002aaab82c361a in CMDsleep_unpack2122305048 (argc=2, argv=0x42001140) 
at alarm.glue.c:43
#4  0x00002aaaabbfed91 in interpret (stk=545, lt=0x66e308, res=0x420030e0) at 
monet_interpreter.c:1070
#5  0x00002aaaabc0a3cb in interpret_seqblock (stk=545, lt=0x1363898, 
res=0x420030e0, scope=1) at monet_interpreter.c:2194
#6  0x00002aaaabbfc98b in interpret (stk=546, lt=0x1363898, res=0x420030e0) at 
monet_interpreter.c:654
#7  0x00002aaaabc01244 in interpret_while (stk=546, lt=0x11f07d0, 
res=0x420030e0) at monet_interpreter.c:1187
#8  0x00002aaaabbfcc85 in interpret (stk=546, lt=0x11f07d0, res=0x420030e0) at 
monet_interpreter.c:721
#9  0x00002aaaabc0a3cb in interpret_seqblock (stk=546, lt=0x11f0708, 
res=0x420030e0, scope=1) at monet_interpreter.c:2194
#10 0x00002aaaabbfc98b in interpret (stk=4, lt=0x11f0708, res=0x420030e0) at 
monet_interpreter.c:654
#11 0x00002aaaabbfe726 in interpret (stk=4, lt=0x11e5338, res=0x420030e0) at 
monet_interpreter.c:1044
#12 0x00002aaaabc2dc48 in handleRequest (t=0x2aaaacf4cb98, q=0x1376c78, 
res=0x420030e0) at monet_queue.c:342
#13 0x00002aaaabc2e145 in doRequest (t=0x2aaaacf4cb98, preference=0x0) at 
monet_queue.c:368
#14 0x00002aaaabc846dd in monetInterpreter (status=0x7fffe6998ea4) at 
monet_process.c:45
#15 0x00000036644062f7 in start_thread () from /lib64/libpthread.so.0
#16 0x00000036638ce86d in clone () from /lib64/libc.so.6
#17 0x0000000000000000 in ?? ()







(gdb) thre 23
[Switching to thread 23 (Thread 1124096320 (LWP 5202))]#0  0x000000366440c0ed 
in sem_wait () from /lib64/libpthread.so.0
(gdb) l
12029                   MT_up_sema(PF_UPDATE_BARRIER, "pflock_end: 
updateBarrier");
12030           }
12031       }
12032       return GDK_SUCCEED;
12033   }
12034
12035   /* we are going to use a meta-information function that returns all 
collections/documents
12036    * thus we must block all new docmgt queries and wait for all active 
ones to finish
12037    */
12038   int CMDpflock_meta(lng *wsid)
(gdb) bt
#0  0x000000366440c0ed in sem_wait () from /lib64/libpthread.so.0
#1  0x00002aaab871e4da in CMDpflock_begin (wsid=0x42ffdd30) at 
pf_support.c:11959
#2  0x00002aaab86e65ec in CMDpflock_begin_unpack57322442 (argc=2, 
argv=0x42ffdd20) at pf_support.glue.c:1906
#3  0x00002aaaabbfed91 in interpret (stk=3185, lt=0x2aab5502c078, 
res=0x43003dc0) at monet_interpreter.c:1070
#4  0x00002aaaabc0a3cb in interpret_seqblock (stk=3185, lt=0x2aab57e414b8, 
res=0x43003dc0, scope=1) at monet_interpreter.c:2194
#5  0x00002aaaabbfc98b in interpret (stk=3571, lt=0x2aab55ef1228, 
res=0x43003dc0) at monet_interpreter.c:654
#6  0x00002aaaabc0a3cb in interpret_seqblock (stk=3571, lt=0x2aab4dfe9308, 
res=0x43003dc0, scope=1) at monet_interpreter.c:2194
#7  0x00002aaaabbfc98b in interpret (stk=3683, lt=0x2aab4dfe9308, 
res=0x43003dc0) at monet_interpreter.c:654
#8  0x00002aaaabbfe726 in interpret (stk=3683, lt=0x2aab4e221688, 
res=0x43003dc0) at monet_interpreter.c:1044
#9  0x00002aaaabc0a0cc in interpret_assignment (stk=3683, lt=0x2aab4e221638, 
res=0x43003dc0) at monet_interpreter.c:2146
#10 0x00002aaaabbfc8fa in interpret (stk=3683, lt=0x2aab4e221638, 
res=0x43003dc0) at monet_interpreter.c:638
#11 0x00002aaaabc0a3cb in interpret_seqblock (stk=3683, lt=0x2aab562f9580, 
res=0x43003dc0, scope=1) at monet_interpreter.c:2194
#12 0x00002aaaabbfc98b in interpret (stk=1, lt=0x2aab562f9580, res=0x43003dc0) 
at monet_interpreter.c:654
#13 0x00002aaaab1a2667 in CMDcatch (stk=1, lt=0x2aab562f9558, res=0x43003dc0) 
at builtin.c:1064
#14 0x00002aaaabbfd54b in interpret (stk=1, lt=0x2aab562f9558, res=0x43003dc0) 
at monet_interpreter.c:808
#15 0x00002aaaabc0a0cc in interpret_assignment (stk=1, lt=0x2aab562f9508, 
res=0x43003dc0) at monet_interpreter.c:2146
#16 0x00002aaaabbfc8fa in interpret (stk=1, lt=0x2aab562f9508, res=0x43003dc0) 
at monet_interpreter.c:638
#17 0x00002aaaabc0a3cb in interpret_seqblock (stk=1, lt=0x2aab557a6f88, 
res=0x43003dc0, scope=0) at monet_interpreter.c:2194
#18 0x00002aaaabbfc98b in interpret (stk=1, lt=0x2aab4e1a3218, res=0x43003dc0) 
at monet_interpreter.c:654
#19 0x00002aaab9b5c48d in xquery_tree_exec (ctx=0x155a1c8, t=0x2aab4e1a3218, 
repeat=0) at pathfinder.c:247
#20 0x00002aaab9b5c9ee in xquery_mil_exec (ctx=0x155a1c8,
    buf=0x2aab5c069880 "var tijah_scoreDB;\nvar tijah_ftiTape;\ntijah_ftiTape 
:= new (str, bat);\nvar time_load;\nvar time_query;\nvar time_print;\nvar 
a0000;\nvar a0001;\nvar a0002;\nvar a0003;\nvar a0004;\nvar a0005;\nvar 
a0006;\nvar a"...) at pathfinder.c:305
#21 0x00002aaab9b5d299 in xquery_compile_exec (ctx=0x155a1c8, options=2, 
url=0x161de49 "pf:del-doc(\"prepared67113.data\")\n", is_url=0, 
prologue=0x43004330,
    query=0x43004328, epilogue=0x43004320, nsurl=0x0) at pathfinder.c:439
#22 0x00002aaab9b63065 in xquery_prepare (ctx=0x155a1c8, usec=5233816459, 
query=0x161de49 "pf:del-doc(\"prepared67113.data\")\n") at pathfinder.c:2113
#23 0x00002aaab9b63694 in xquery_client_engine (mc=0x2aaaba2bf0a8) at 
pathfinder.c:2256
#24 0x00002aaaba0baf7a in mapi_client_engine (FC=0x2aaaba2bf0a8) at mapi.c:316
#25 0x00000036644062f7 in start_thread () from /lib64/libpthread.so.0
#26 0x00000036638ce86d in clone () from /lib64/libc.so.6
#27 0x0000000000000000 in ?? ()









(gdb) thre 62
[Switching to thread 62 (Thread 1149274432 (LWP 1400))]#0  0x000000366440c888 
in __lll_mutex_lock_wait () from /lib64/libpthread.so.0
(gdb) l
12039   {
12040       int mode = ((*wsid) & (3LL << 30)) >> 30;
12041       lng nil = lng_nil;
12042       BUN others_passed; /* there are multiple such queries active? */
12043
12044       MT_set_lock(PF_SHORT_LOCK, "pflock_meta: shortLock precheck");
12045       if (*wsid == pf_special || BUNlocate(ws_overlaps_ws, wsid, &nil) != 
BUN_NONE) {
12046           MT_unset_lock(PF_SHORT_LOCK, "pflock_meta: shortLock precheck");
12047           return GDK_SUCCEED; /* already got permission to proceed */
12048       }
(gdb) bt
#0  0x000000366440c888 in __lll_mutex_lock_wait () from /lib64/libpthread.so.0
#1  0x00000036644088b5 in _L_mutex_lock_107 () from /lib64/libpthread.so.0
#2  0x0000003664408343 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x00002aaab871ef20 in CMDpflock_meta (wsid=0x44800010) at pf_support.c:12066
#4  0x00002aaab86e6782 in CMDpflock_meta_unpack40025922 (argc=2, 
argv=0x44800000) at pf_support.glue.c:1948
#5  0x00002aaaabbfed91 in interpret (stk=1641, lt=0x7556ec8, res=0x44801e70) at 
monet_interpreter.c:1070
#6  0x00002aaaabbfcb0f in interpret (stk=1641, lt=0xca27f08, res=0x44801e70) at 
monet_interpreter.c:683
#7  0x00002aaaabc0a3cb in interpret_seqblock (stk=1641, lt=0xfddb428, 
res=0x44801e70, scope=1) at monet_interpreter.c:2194
#8  0x00002aaaabbfc98b in interpret (stk=3889, lt=0xfddb428, res=0x44801e70) at 
monet_interpreter.c:654
#9  0x00002aaaabc0a3cb in interpret_seqblock (stk=3889, lt=0x6768148, 
res=0x44801e70, scope=1) at monet_interpreter.c:2194
#10 0x00002aaaabbfc98b in interpret (stk=1589, lt=0x6768148, res=0x44801e70) at 
monet_interpreter.c:654
#11 0x00002aaaabbfe726 in interpret (stk=1589, lt=0x67b47d0, res=0x44801e70) at 
monet_interpreter.c:1044
#12 0x00002aaaabbfb887 in interpret_params (stk=1589, lt=0xecdff98, 
params=0x448020a0, fcn_nme=0xd7253c8 "cross") at monet_interpreter.c:282
#13 0x00002aaaabbfddcd in interpret (stk=1589, lt=0xecdff78, res=0x44806dc0) at 
monet_interpreter.c:884
#14 0x00002aaaabc0a0cc in interpret_assignment (stk=1589, lt=0xecdff28, 
res=0x44806dc0) at monet_interpreter.c:2146
#15 0x00002aaaabbfc8fa in interpret (stk=1589, lt=0xecdff28, res=0x44806dc0) at 
monet_interpreter.c:638
#16 0x00002aaaabc0a3cb in interpret_seqblock (stk=1589, lt=0xbf86300, 
res=0x44806dc0, scope=1) at monet_interpreter.c:2194
#17 0x00002aaaabbfc98b in interpret (stk=6, lt=0xbf86300, res=0x44806dc0) at 
monet_interpreter.c:654
#18 0x00002aaaab1a2667 in CMDcatch (stk=6, lt=0xbf862d8, res=0x44806dc0) at 
builtin.c:1064
#19 0x00002aaaabbfd54b in interpret (stk=6, lt=0xbf862d8, res=0x44806dc0) at 
monet_interpreter.c:808
#20 0x00002aaaabc0a0cc in interpret_assignment (stk=6, lt=0xbf86288, 
res=0x44806dc0) at monet_interpreter.c:2146
#21 0x00002aaaabbfc8fa in interpret (stk=6, lt=0xbf86288, res=0x44806dc0) at 
monet_interpreter.c:638
#22 0x00002aaaabc0a3cb in interpret_seqblock (stk=6, lt=0xdb31d18, 
res=0x44806dc0, scope=0) at monet_interpreter.c:2194
#23 0x00002aaaabbfc98b in interpret (stk=6, lt=0xc825908, res=0x44806dc0) at 
monet_interpreter.c:654
#24 0x00002aaab9b5c48d in xquery_tree_exec (ctx=0xc293788, t=0xc825908, 
repeat=0) at pathfinder.c:247
#25 0x00002aaab9b5c9ee in xquery_mil_exec (ctx=0xc293788,
    buf=0xf1f3fd0 "var tijah_scoreDB;\nvar tijah_ftiTape;\nvar time_load;\nvar 
time_query;\nvar time_print;\nvar a0000;\nvar a0001;\nvar a0002;\nvar 
a0003;\nvar a0004;\nvar a0005;\nvar a0006;\nvar a0007;\nvar a0008;\nvar 
a0009;\nvar a"...) at pathfinder.c:305
#26 0x00002aaab9b5d299 in xquery_compile_exec (ctx=0xc293788, options=2, 
url=0xd6e5aa9 "count(pf:collections()[.=\"cons02_log.xml\"])", is_url=0,
    prologue=0x44807330, query=0x44807328, epilogue=0x44807320, nsurl=0x0) at 
pathfinder.c:439
#27 0x00002aaab9b63065 in xquery_prepare (ctx=0xc293788, usec=5271918731, 
query=0xd6e5aa9 "count(pf:collections()[.=\"cons02_log.xml\"])") at 
pathfinder.c:2113
#28 0x00002aaab9b63694 in xquery_client_engine (mc=0x2aaaba2bf2b0) at 
pathfinder.c:2256
#29 0x00002aaaba0baf7a in mapi_client_engine (FC=0x2aaaba2bf2b0) at mapi.c:316
#30 0x00000036644062f7 in start_thread () from /lib64/libpthread.so.0
#31 0x00000036638ce86d in clone () from /lib64/libc.so.6
#32 0x0000000000000000 in ?? ()









(gdb) thre 66
[Switching to thread 66 (Thread 1166059840 (LWP 1404))]#0  0x000000366440c888 
in __lll_mutex_lock_wait () from /lib64/libpthread.so.0
(gdb) l
12049       if (mode == PF_MODE_DOCMGT) {
12050           /* SPECIAL CASE: a meta function inside a docmgt function.
12051            * now we must wait for potential other docmgts to finish first
12052            */
12053           int reapply_barrier = (pf_ndocmgt > 1);
12054           if (reapply_barrier) pf_ndocmgt--;
12055           MT_unset_lock(PF_SHORT_LOCK, "pflock_meta: metaLock precheck");
12056           if (reapply_barrier) {
12057               MT_down_sema(PF_META_BARRIER, "pflock_meta: metaBarrier for 
convert");
12058               MT_set_lock(PF_SHORT_LOCK, "pflock_meta: shortLock 
special");
(gdb) bt
#0  0x000000366440c888 in __lll_mutex_lock_wait () from /lib64/libpthread.so.0
#1  0x00000036644088b5 in _L_mutex_lock_107 () from /lib64/libpthread.so.0
#2  0x0000003664408343 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x00002aaab871e3a8 in CMDpflock_begin (wsid=0x45802d30) at 
pf_support.c:11954
#4  0x00002aaab86e65ec in CMDpflock_begin_unpack57322442 (argc=2, 
argv=0x45802d20) at pf_support.glue.c:1906
#5  0x00002aaaabbfed91 in interpret (stk=1577, lt=0xeb91088, res=0x45808dc0) at 
monet_interpreter.c:1070
#6  0x00002aaaabc0a3cb in interpret_seqblock (stk=1577, lt=0x30d93c8, 
res=0x45808dc0, scope=1) at monet_interpreter.c:2194
#7  0x00002aaaabbfc98b in interpret (stk=1322, lt=0x8fea0e8, res=0x45808dc0) at 
monet_interpreter.c:654
#8  0x00002aaaabc0a3cb in interpret_seqblock (stk=1322, lt=0x8fea308, 
res=0x45808dc0, scope=1) at monet_interpreter.c:2194
#9  0x00002aaaabbfc98b in interpret (stk=1667, lt=0x8fea308, res=0x45808dc0) at 
monet_interpreter.c:654
#10 0x00002aaaabbfe726 in interpret (stk=1667, lt=0x5e42748, res=0x45808dc0) at 
monet_interpreter.c:1044
#11 0x00002aaaabc0a0cc in interpret_assignment (stk=1667, lt=0x5e426f8, 
res=0x45808dc0) at monet_interpreter.c:2146
#12 0x00002aaaabbfc8fa in interpret (stk=1667, lt=0x5e426f8, res=0x45808dc0) at 
monet_interpreter.c:638
#13 0x00002aaaabc0a3cb in interpret_seqblock (stk=1667, lt=0x5843be0, 
res=0x45808dc0, scope=1) at monet_interpreter.c:2194
#14 0x00002aaaabbfc98b in interpret (stk=10, lt=0x5843be0, res=0x45808dc0) at 
monet_interpreter.c:654
#15 0x00002aaaab1a2667 in CMDcatch (stk=10, lt=0x5843bb8, res=0x45808dc0) at 
builtin.c:1064
#16 0x00002aaaabbfd54b in interpret (stk=10, lt=0x5843bb8, res=0x45808dc0) at 
monet_interpreter.c:808
#17 0x00002aaaabc0a0cc in interpret_assignment (stk=10, lt=0x5843b68, 
res=0x45808dc0) at monet_interpreter.c:2146
#18 0x00002aaaabbfc8fa in interpret (stk=10, lt=0x5843b68, res=0x45808dc0) at 
monet_interpreter.c:638
#19 0x00002aaaabc0a3cb in interpret_seqblock (stk=10, lt=0x73351c8, 
res=0x45808dc0, scope=0) at monet_interpreter.c:2194
#20 0x00002aaaabbfc98b in interpret (stk=10, lt=0x3afd8b8, res=0x45808dc0) at 
monet_interpreter.c:654
#21 0x00002aaab9b5c48d in xquery_tree_exec (ctx=0x11fc808, t=0x3afd8b8, 
repeat=0) at pathfinder.c:247
#22 0x00002aaab9b5c9ee in xquery_mil_exec (ctx=0x11fc808,
    buf=0xf2209f0 "var tijah_scoreDB;\nvar tijah_ftiTape;\ntijah_ftiTape := new 
(str, bat);\nvar time_load;\nvar time_query;\nvar time_print;\nvar a0000;\nvar 
a0001;\nvar a0002;\nvar a0003;\nvar a0004;\nvar a0005;\nvar a0006;\nvar a"...) 
at pathfinder.c:305
#23 0x00002aaab9b5d299 in xquery_compile_exec (ctx=0x11fc808, options=2,
    url=0x7e18e19 "for $d in pf:documents()[EMAIL 
PROTECTED]'cons02_log.xml']//text() return pf:del-doc($d)\n", is_url=0, 
prologue=0x45809330, query=0x45809328,
    epilogue=0x45809320, nsurl=0x0) at pathfinder.c:439
#24 0x00002aaab9b63065 in xquery_prepare (ctx=0x11fc808, usec=5281945585,
    query=0x7e18e19 "for $d in pf:documents()[EMAIL 
PROTECTED]'cons02_log.xml']//text() return pf:del-doc($d)\n") at 
pathfinder.c:2113
#25 0x00002aaab9b63694 in xquery_client_engine (mc=0x2aaaba2bf450) at 
pathfinder.c:2256
#26 0x00002aaaba0baf7a in mapi_client_engine (FC=0x2aaaba2bf450) at mapi.c:316
#27 0x00000036644062f7 in start_thread () from /lib64/libpthread.so.0
#28 0x00000036638ce86d in clone () from /lib64/libc.so.6
#29 0x0000000000000000 in ?? ()










(gdb) thre 67
[Switching to thread 67 (Thread 1157667136 (LWP 14091))]#0  0x000000366440c888 
in __lll_mutex_lock_wait () from /lib64/libpthread.so.0
(gdb) l
12059               pf_special = *wsid;
12060               MT_unset_lock(PF_SHORT_LOCK, "pflock_meta: shortLock 
special");
12061           }
12062           return GDK_SUCCEED;
12063       }
12064       MT_unset_lock(PF_SHORT_LOCK, "pflock_meta: shortLock precheck");
12065
12066       MT_set_lock(PF_META_LOCK, "pflock_meta: metaLock");
12067       MT_set_lock(PF_SHORT_LOCK, "pflock_meta: shortLock insert");
12068       others_passed = BUNfnd(BATmirror(ws_overlaps_ws), &nil);
(gdb) bt
#0  0x000000366440c888 in __lll_mutex_lock_wait () from /lib64/libpthread.so.0
#1  0x00000036644088b5 in _L_mutex_lock_107 () from /lib64/libpthread.so.0
#2  0x0000003664408343 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x00002aaab871e3a8 in CMDpflock_begin (wsid=0x45001d30) at 
pf_support.c:11954
#4  0x00002aaab86e65ec in CMDpflock_begin_unpack57322442 (argc=2, 
argv=0x45001d20) at pf_support.glue.c:1906
#5  0x00002aaaabbfed91 in interpret (stk=1373, lt=0xaabdec8, res=0x45007dc0) at 
monet_interpreter.c:1070
#6  0x00002aaaabc0a3cb in interpret_seqblock (stk=1373, lt=0x4022388, 
res=0x45007dc0, scope=1) at monet_interpreter.c:2194
#7  0x00002aaaabbfc98b in interpret (stk=3307, lt=0xed39118, res=0x45007dc0) at 
monet_interpreter.c:654
#8  0x00002aaaabc0a3cb in interpret_seqblock (stk=3307, lt=0x9864c88, 
res=0x45007dc0, scope=1) at monet_interpreter.c:2194
#9  0x00002aaaabbfc98b in interpret (stk=3409, lt=0x9864c88, res=0x45007dc0) at 
monet_interpreter.c:654
#10 0x00002aaaabbfe726 in interpret (stk=3409, lt=0x98006a8, res=0x45007dc0) at 
monet_interpreter.c:1044
#11 0x00002aaaabc0a0cc in interpret_assignment (stk=3409, lt=0x9800658, 
res=0x45007dc0) at monet_interpreter.c:2146
#12 0x00002aaaabbfc8fa in interpret (stk=3409, lt=0x9800658, res=0x45007dc0) at 
monet_interpreter.c:638
#13 0x00002aaaabc0a3cb in interpret_seqblock (stk=3409, lt=0xa082db0, 
res=0x45007dc0, scope=1) at monet_interpreter.c:2194
#14 0x00002aaaabbfc98b in interpret (stk=5, lt=0xa082db0, res=0x45007dc0) at 
monet_interpreter.c:654
#15 0x00002aaaab1a2667 in CMDcatch (stk=5, lt=0xa082d88, res=0x45007dc0) at 
builtin.c:1064
#16 0x00002aaaabbfd54b in interpret (stk=5, lt=0xa082d88, res=0x45007dc0) at 
monet_interpreter.c:808
#17 0x00002aaaabc0a0cc in interpret_assignment (stk=5, lt=0xa082d38, 
res=0x45007dc0) at monet_interpreter.c:2146
#18 0x00002aaaabbfc8fa in interpret (stk=5, lt=0xa082d38, res=0x45007dc0) at 
monet_interpreter.c:638
#19 0x00002aaaabc0a3cb in interpret_seqblock (stk=5, lt=0xdc50e98, 
res=0x45007dc0, scope=0) at monet_interpreter.c:2194
#20 0x00002aaaabbfc98b in interpret (stk=5, lt=0xc0eb0b8, res=0x45007dc0) at 
monet_interpreter.c:654
#21 0x00002aaab9b5c48d in xquery_tree_exec (ctx=0xb41c0f8, t=0xc0eb0b8, 
repeat=0) at pathfinder.c:247
#22 0x00002aaab9b5c9ee in xquery_mil_exec (ctx=0xb41c0f8,
    buf=0xed6be40 "var tijah_scoreDB;\nvar tijah_ftiTape;\ntijah_ftiTape := new 
(str, bat);\nvar time_load;\nvar time_query;\nvar time_print;\nvar a0000;\nvar 
a0001;\nvar a0002;\nvar a0003;\nvar a0004;\nvar a0005;\nvar a0006;\nvar a"...) 
at pathfinder.c:305
#23 0x00002aaab9b5d299 in xquery_compile_exec (ctx=0xb41c0f8, options=2,
    url=0xf8faf39 "for $d in pf:documents()[EMAIL 
PROTECTED]'cons02_log.xml']//text() return pf:del-doc($d)\n", is_url=0, 
prologue=0x45008330, query=0x45008328,
    epilogue=0x45008320, nsurl=0x0) at pathfinder.c:439
#24 0x00002aaab9b63065 in xquery_prepare (ctx=0xb41c0f8, usec=235526232460,
    query=0xf8faf39 "for $d in pf:documents()[EMAIL 
PROTECTED]'cons02_log.xml']//text() return pf:del-doc($d)\n") at 
pathfinder.c:2113
#25 0x00002aaab9b63694 in xquery_client_engine (mc=0x2aaaba2bf248) at 
pathfinder.c:2256
#26 0x00002aaaba0baf7a in mapi_client_engine (FC=0x2aaaba2bf248) at mapi.c:316
#27 0x00000036644062f7 in start_thread () from /lib64/libpthread.so.0
#28 0x00000036638ce86d in clone () from /lib64/libc.so.6
#29 0x0000000000000000 in ?? ()





----------------------------------------------------------------------

Comment By: Lefteris Sidirourgos (lsidir)
Date: 2008-11-25 12:04

Message:
The bugfix has also been applied to the Nov2008 branch (but not part of the
release, it will be in a upcoming bugfix release). However, the nightly
builds include the fix. 

----------------------------------------------------------------------

Comment By: Peter Boncz (boncz)
Date: 2008-11-25 01:10

Message:
race was found. BUGFIX 2338542

fixes race condition when a docmgt query with a meta-command like:

for $d in pf:documents() return pf:del-doc($d)

is followed shortly by another docmgt query:

pf:del-doc("x")

problem is that because the docmgt counter is increased by
the second query arriving (though the first one prevents it
using a META barrier from starting), the increase of that counter
makes the pflock_end of the first query (its upgrade fromdocmgt
to meta query makes it a "special") fail to yield that META
barrier -- it should in fact do so for every special query
finishing no matter the docmgt count.

FIX IS NOT APPLIED IN STABLE YET (0_24) ONLY HEAD. NFI: do not use HEAD
now, it is too raw!! Rather get 0-24 and apply:

-bash-3.1$ cvs diff pf_support.mx
Index: pf_support.mx
===================================================================
RCS file: /cvsroot/monetdb/pathfinder/runtime/pf_support.mx,v
retrieving revision 1.299.2.9
diff -r1.299.2.9 pf_support.mx
8209a8210
>             MT_up_sema(PF_META_BARRIER, "pflock_end: metaBarrier for
special docmgt");
8212,8213d8212
<         }
<         if (pf_ndocmgt == 0) {
-bash-3.1$



----------------------------------------------------------------------

Comment By: Peter Boncz (boncz)
Date: 2008-11-24 17:43

Message:
For clarity, now useing the gdb thread numbers:
(1) Mserver MIL console, blocked on read()
(2) Mserver vmtrim() thread, blocked on timer
(3) MIL interpreter, blocked on socket
(4) XRPC server, blocked on socket
(5) pf_logmanager() checkpoint thread (pathfinder.mx), blocked on timer
(23) pf:del-doc("prepared67113.data"), blocked on PFlock_begin on
PF_META_BARRIER (11959)
(63) count(pf:collections()[.="cons02_log.xml"]), blocked in PFlock_meta()
on 
sema PF_LOCK_META (12066)
(66) for $d in pf:documents()[EMAIL PROTECTED]'cons02_log.xml']//text() return
pf:del-doc($d), blocked in PFlock_begin() on PF_LOCK_META
(67) for $d in pf:documents()[EMAIL PROTECTED]'cons02_log.xml']//text() return
pf:del-doc($d), blocked in PFlock_begin() on PF_LOCK_META

code is below. So the last three queries are blocked because the meta-lock
is taken. It is held by the pf:del-doc query, which is help up unexpectedly
by getting the meta semaphore.

The problem thus is that some query has left (maybe with an error) that
had the meta semaphore, but did not give it back..


11944 /* query execution starts */
11945 int CMDpflock_begin(lng *wsid)
11946 {
11947     int mode = ((*wsid) & (3LL << 30)) >> 30;
11948     if (mode == PF_MODE_RDONLY)
11949         return GDK_SUCCEED;
11950
11951     if (mode == PF_MODE_DOCMGT) {
11952         /* multiple docmgts may be active in parallel. the first
takes the meta-barrier */
11953         int take_barrier;
11954         MT_set_lock(PF_META_LOCK, "pflock_begin: metaLock for
mgmt");
11955         MT_set_lock(PF_SHORT_LOCK, "pflock_begin: shortLock for
mgmt");
11956         take_barrier = (pf_ndocmgt++ == 0);
11957         MT_unset_lock(PF_SHORT_LOCK, "pflock_begin: shirtLock for
mgmt");
11958         if (take_barrier) {
11959             MT_down_sema(PF_META_BARRIER, "pflock_begin:
metaBarrier");
11960         }
11961         MT_unset_lock(PF_META_LOCK, "pflock_begin: metaLock for
mgmt");
11962         return GDK_SUCCEED;
11963     }
11964
11965     /* we only have update queries below this point; first check if
we are in exclusive retry mode */
11966     MT_set_lock(PF_SHORT_LOCK, "pflock_begin: shortLock for
convoy"); /* protects pf_convoy */
11967     if (mode == PF_MODE_RETRY) {
11968         pf_convoy += PF_CONVOY; /* re-try: schedule a convoy */
11969     } else if (pf_convoy > 0) {
11970         mode = PF_MODE_RETRY; pf_convoy--; /* convert to exclusive
execution due to convoy */
11971     }
11972     MT_unset_lock(PF_SHORT_LOCK, "pflock_begin: shortLock for
convoy");
11973
11974     if (mode == PF_MODE_UPDATE) {
11975         /* update in concurrent mode */
11976         MT_set_lock(PF_UPDATE_LOCK, "pflock_begin: updateLock for
update");
11977         if (pf_nreaders++ == 0)
11978             MT_down_sema(PF_UPDATE_BARRIER, "pflock_begin:
updateBarrier for update");
11979         MT_unset_lock(PF_UPDATE_LOCK, "pflock_begin: updateLock for
update");
11980     } else {
11981         /* update in exclusive mode */
11982         MT_down_sema(PF_UPDATE_BARRIER, "pflock_begin: updateBarrier
for retry");
11983         MT_set_lock(PF_SHORT_LOCK, "pflock_begin: shortLock for
writer");
11984         pf_writer = *wsid;
11985         MT_unset_lock(PF_SHORT_LOCK, "pflock_begin: shortLock for
writer");
11986     }
11987     return GDK_SUCCEED;
11988 }

12035 /* we are going to use a meta-information function that returns all
collections/documents
12036  * thus we must block all new docmgt queries and wait for all active
ones to finish
12037  */
12038 int CMDpflock_meta(lng *wsid)
12039 {
12040     int mode = ((*wsid) & (3LL << 30)) >> 30;
12041     lng nil = lng_nil;
12042     BUN others_passed; /* there are multiple such queries active? */
12043
12044     MT_set_lock(PF_SHORT_LOCK, "pflock_meta: shortLock precheck");
12045     if (*wsid == pf_special || BUNlocate(ws_overlaps_ws, wsid, &nil)
!= BUN_NONE) {
12046         MT_unset_lock(PF_SHORT_LOCK, "pflock_meta: shortLock
precheck");
12047         return GDK_SUCCEED; /* already got permission to proceed */
12048     }
12049     if (mode == PF_MODE_DOCMGT) {
12050         /* SPECIAL CASE: a meta function inside a docmgt function.
12051          * now we must wait for potential other docmgts to finish
first
12052          */
12053         int reapply_barrier = (pf_ndocmgt > 1);
12054         if (reapply_barrier) pf_ndocmgt--;
12055         MT_unset_lock(PF_SHORT_LOCK, "pflock_meta: metaLock
precheck");
12056         if (reapply_barrier) {
12057             MT_down_sema(PF_META_BARRIER, "pflock_meta: metaBarrier
for convert");
12058             MT_set_lock(PF_SHORT_LOCK, "pflock_meta: shortLock
special");
12059             pf_special = *wsid;
12060             MT_unset_lock(PF_SHORT_LOCK, "pflock_meta: shortLock
special");
12061         }
12062         return GDK_SUCCEED;
12063     }
12064     MT_unset_lock(PF_SHORT_LOCK, "pflock_meta: shortLock precheck");
12065
12066     MT_set_lock(PF_META_LOCK, "pflock_meta: metaLock");
12067     MT_set_lock(PF_SHORT_LOCK, "pflock_meta: shortLock insert");
12068     others_passed = BUNfnd(BATmirror(ws_overlaps_ws), &nil);
12069     BUNins(ws_overlaps_ws, wsid, &nil, FALSE);
12070     MT_unset_lock(PF_SHORT_LOCK, "pflock_meta: shortLock insert");
12071     if (others_passed == BUN_NONE) {
12072         /* first query with meta function: get barrier */
12073         MT_down_sema(PF_META_BARRIER, "pflock_meta: metaBarrier");
12074     }
12075     MT_unset_lock(PF_META_LOCK, "pflock_meta: metaLock");
12076     return GDK_SUCCEED;
12077 }


----------------------------------------------------------------------

Comment By: Peter Boncz (boncz)
Date: 2008-11-24 15:57

Message:
The trace shows the following:
(1) MIL server thread
(2) BBPtrim thread
(3) xrpc background server
(4) checkpoint background thread
(5) pf:del-doc("prepared67113.data")
(6) count(pf:collections()[.="cons02_log.xml"])
(7) for $d in pf:documents()[EMAIL PROTECTED]'cons02_log.xml']//text() return
pf:del-doc($d)
(8) for $d in pf:documents()[EMAIL PROTECTED]'cons02_log.xml']//text() return
pf:del-doc($d)

The first question for NFI is why threads (7) and (8) are removing the
same documents concurrently. Maybe thread (5) is even removing a document
covered by both (7) and (8)  This will necessarily lead to (correct)
runtime errors.

analysis continues.. NFI feedback appreciated



----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=482468&aid=2338524&group_id=56967

-------------------------------------------------------------------------
This SF.Net email is sponsored by the Moblin Your Move Developer's challenge
Build the coolest Linux based applications with Moblin SDK & win great prizes
Grand prize is a trip for two to an Open Source event anywhere in the world
http://moblin-contest.org/redirect.php?banner_id=100&url=/
_______________________________________________
Monetdb-bugs mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/monetdb-bugs

Reply via email to