> Bill Page wrote:
>> (1) -> writing: ifile:File List Integer:=open("/tmp/jazz1","output")
>> writing: write!(ifile, [-1,2,3])
>> writing: write!(ifile, [10,-10,0,111])
>> writing: write!(ifile, [7])
>> writing: reopen!(ifile, "input")
>> writing: read! ifile
>> writing: read! ifile
>> writing: readIfCan! ifile
>> writing: readIfCan! ifile
>> writing: iomode ifile
>> writing: name ifile
>> writing: close! ifile
>> writing: )system rm /tmp/jazz1
>>
>> (At this point after waiting several hours I killed the make and sman
>> processes. I was expecting to see: 'making FILE.input', etc.
>>
>> make[2]: *** [all] Terminated
>> /bin/bash: line 1: 1054 Terminated make
>> make[1]: *** [all-paste] Error 143
>> make[1]: Leaving directory `/home/page/fricas-build/src'
>> make: *** [all-src] Error 2
>> make[3]: *** Deleting file `FILE.pht'
>> make[3]: *** [FILE.pht] Terminated
>> p...@billpage:~/fricas-build$
>>
>> ---
>>
>> I was able to determine that the code that generates these messages is
>> in 'src/hyper/htinp.c' but I do not know enough about this program to
>> even guess what might be suddenly going wrong. I do vaguely recall
>> your comments about a possible race condition.
>>

Waldek wrote:
>
> Yes, the communication between processes is not doing any
> synchronization. But in non-parallel build on unloaded machine
> we should always win races. I wonder if your problem is related
> to bug 145 (it seems that bug 145 appears because part of
> input to AXIOMsys got discarded).
>
>> Is anyone else seeing these sort of failures? Do you have any ideas
>> for what I might try in order to debug this?
>>
>
> One thing I did was to move AXIOMsys to AXIOMsys.bin and
> instead of binary AXIOMsys use script like:
>
> #!/bin/sh
>
> exec strace -o srapp.$$ /path/to/target/directory/AXIOMsys.bin "$@"
>

I tried this but the output looks complex and confusing to me. At some
point in the srapp file I begin to see Segmentation Fault:

...
open("/usr/lib/locale/ISO-8859-1/LC_CTYPE", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/share/locale-langpack/ISO-8859-1/LC_CTYPE", O_RDONLY) = -1 ENOENT (No such file or directory)
readlink("/proc/self/exe", "/home/page/fricas-build/target/x86_64-unknown-linux/bin/AXIOMsys.bin"..., 4096) = 68
open("/home/page/fricas-build/target/x86_64-unknown-linux/bin/AXIOMsys.bin", O_RDONLY) = 3
lseek(3, -8, SEEK_END) = 74674216
read(3, "LCBS\0\0\0\0"..., 8) = 8
lseek(3, -16, SEEK_END) = 74674208
read(3, "\...@\t\0\0\0\0\0"..., 8) = 8
lseek(3, 606208, SEEK_SET) = 606208
lseek(3, 0, SEEK_CUR) = 606208
read(3, "LCBS\0\0\0\0"..., 8) = 8
lseek(3, -48, SEEK_END) = 74674176
read(3, "U\363\3531\0\0\0\0\1\0\0\0\0\0\0\0\0\0\377\377\1\0\0\0\0\0 \0\0\0\0\0"..., 32) = 32
close(3) = 0
uname({sys="Linux", node="billpage", ...}) = 0
personality(0xffffffff /* PER_???
*/) = 4456448 mmap(NULL, 33554432, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2aaaab4c9000 mmap(0x20000000, 1044480, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x20000000 mmap(0x20100000, 1044480, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x20100000 mmap(0x1000000000, 8589869056, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x1000000000 mmap(0x20200000, 1044480, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x20200000 mmap(NULL, 4096, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x2aaaad4c9000 mprotect(0x2aaaad4c9000, 4096, PROT_NONE) = 0 open("/etc/localtime", O_RDONLY) = 3 fstat(3, {st_mode=S_IFREG|0644, st_size=2819, ...}) = 0 fstat(3, {st_mode=S_IFREG|0644, st_size=2819, ...}) = 0 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2aaaad4ca000 read(3, "TZif2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\4\0\0\0\4\0\0\0\0\0"..., 4096) = 2819 lseek(3, -1802, SEEK_CUR) = 1017 read(3, "TZif2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\5\0\0\0\5\0\0\0\0\0"..., 4096) = 1802 close(3) = 0 munmap(0x2aaaad4ca000, 4096) = 0 open("/home/page/fricas-build/target/x86_64-unknown-linux/bin/AXIOMsys.bin", O_RDONLY) = 3 lseek(3, 606208, SEEK_SET) = 606208 read(3, "LCBS\0\0\0\0\24\17\0\0\0\0\0\0\3\0\0\0\0\0\0\0\4\0\0\0\0\0\0\0;"..., 4096) = 4096 mmap(0x20000000, 8192, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_FIXED, 3, 0x95000) = 0x20000000 mmap(0x20100000, 4096, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_FIXED, 3, 0x97000) = 0x20100000 mmap(0x1000000000, 73904128, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_FIXED, 3, 0x98000) = 0x1000000000 lseek(3, 74526720, SEEK_SET) = 74526720 read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096 read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 
4096) = 4096 read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096 read(3, "\0\20\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\20\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096 read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096 read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096 read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096 read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\20\0\0\0\0\0\0\0 \0\0\0\0\0\0\0"..., 4096) = 4096 read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096 read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\20\0\0\0\0\0\0\0"..., 4096) = 4096 read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096 read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096 read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096 read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096 read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096 read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096 read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096 read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096 read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096 read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096 read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096 read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096 read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096 read(3, 
"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096 read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096 read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096 read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096 read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096 read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096 read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096 read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096 read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096 read(3, ""..., 0) = 0 mprotect(0x1000000000, 73904128, PROT_READ|PROT_EXEC) = 0 rt_sigaction(SIGILL, {0x40ea70, [HUP INT QUIT PIPE ALRM TERM CHLD TSTP URG XCPU XFSZ VTALRM PROF WINCH IO], SA_RESTORER|SA_RESTART|SA_NODEFER|SA_SIGINFO, 0x2aaaab188040}, NULL, 8) = 0 rt_sigaction(SIGTRAP, {0x40ea70, [HUP INT QUIT PIPE ALRM TERM CHLD TSTP URG XCPU XFSZ VTALRM PROF WINCH IO], SA_RESTORER|SA_RESTART|SA_NODEFER|SA_SIGINFO, 0x2aaaab188040}, NULL, 8) = 0 rt_sigaction(SIGSEGV, {0x40ea70, [HUP INT QUIT PIPE ALRM TERM CHLD TSTP URG XCPU XFSZ VTALRM PROF WINCH IO], SA_RESTORER|SA_STACK|SA_RESTART|SA_NODEFER|SA_SIGINFO, 0x2aaaab188040}, NULL, 8) = 0 mmap(NULL, 4493312, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x2aaaad4ca000 mmap(NULL, 280, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x2aaaad913000 sigaltstack({ss_sp=0x2aaaad8d2000, ss_flags=0, ss_size=262144}, NULL) = 0 mprotect(0x2aaaad4ca000, 4096, PROT_NONE) = 0 mprotect(0x2aaaad7c9000, 4096, PROT_NONE) = 0 mprotect(0x2aaaad7ca000, 4096, PROT_NONE) = 0 mprotect(0x2aaaad4cb000, 4096, PROT_READ|PROT_EXEC) = 0 
mprotect(0x2aaaad7c8000, 4096, PROT_NONE) = 0
mprotect(0x2aaaad7cb000, 4096, PROT_NONE) = 0
--- SIGSEGV (Segmentation fault) @ 0 (0) ---
mprotect(0x10000e7000, 4096, PROT_READ|PROT_WRITE|PROT_EXEC) = 0
rt_sigreturn(0x10000e7000) = 68720425183
--- SIGSEGV (Segmentation fault) @ 0 (0) ---
mprotect(0x10002a5000, 4096, PROT_READ|PROT_WRITE|PROT_EXEC) = 0
...

I am not exactly sure how to interpret this but certainly something went
badly wrong in AXIOMsys.

> Sometimes adding tee command on AXIOMsys output was useful (strace
> only records beginning of each output block, tee gave whole output).
> Similar thing can be done to hypertex, session and sman.
>

In the tee file I find:

...
p...@billpage:~/fricas-build$ cat ./src/paste/tee.3871
Checking for foreign routines
AXIOM="/home/page/fricas-build/target/x86_64-unknown-linux"
spad-lib="/home/page/fricas-build/target/x86_64-unknown-linux/lib/libspad.so"
foreign routines found
openServer result 0
FriCAS (AXIOM fork) Computer Algebra System
Version: FriCAS 2009-10-26
Timestamp: Saturday October 31, 2009 at 21:36:01
-----------------------------------------------------------------------------
Issue )copyright to view copyright notices.
Issue )summary for a summary of useful system commands.
Issue )quit to leave FriCAS and return to shell.
-----------------------------------------------------------------------------
Re-reading compress.daase
Re-reading interp.daase
Re-reading operation.daase
Re-reading category.daase
Re-reading browse.daase
(1) -> Type HELP for debugger help, or (SB-EXT:QUIT) to exit from SBCL.
(no restarts: If you didn't do this on purpose, please report it as a bug.)
(A SIMPLE-ERROR was caught when trying to print SB-DEBUG:*DEBUG-CONDITION* when
entering the debugger. Printing was aborted and the SIMPLE-ERROR was stored in
SB-DEBUG::*NESTED-DEBUG-CONDITION*.)
Type HELP for debugger help, or (SB-EXT:QUIT) to exit from SBCL.
...

And many more identical messages.

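The tee approach quoted above can be sketched as a wrapper in the same style as the strace one. This is only an illustration, not the script that was actually used: the stand-in program `fake-AXIOMsys.bin`, the temporary directory and the log name `tee.$$` are assumptions made so that the example runs anywhere. In a real build the wrapper would call the renamed AXIOMsys.bin instead.

```shell
#!/bin/sh
# Sketch: capture a program's complete output with tee via a wrapper script.
# Everything under $demo is hypothetical scaffolding for the demonstration.
demo=$(mktemp -d)

# Stand-in for the real binary: prints a banner and echoes its arguments.
cat > "$demo/fake-AXIOMsys.bin" <<'EOF'
#!/bin/sh
echo "banner from stand-in"
echo "args: $*"
EOF
chmod +x "$demo/fake-AXIOMsys.bin"

# Wrapper with the original name: passes all arguments through and copies
# stdout and stderr to a per-process log file tee.<pid>.
cat > "$demo/AXIOMsys" <<EOF
#!/bin/sh
"$demo/fake-AXIOMsys.bin" "\$@" 2>&1 | tee "$demo/tee.\$\$"
EOF
chmod +x "$demo/AXIOMsys"

# Run the wrapper as the build would run the binary, then read the log back.
out=$("$demo/AXIOMsys" -foo bar)
log=$(cat "$demo"/tee.*)

rm -rf "$demo"
```

Unlike the strace wrapper, this one cannot use `exec`, because the shell has to stay around to run the pipeline. One consequence is that the exit status seen by the caller is that of tee, not of the wrapped program.
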
> In a few cases hangs were caused by errors when computing examples,
> the method above allowed to see error messages and fix the problem.
>
> Another technique is to attach gdb to hanging process. Unfortunately,
> for some reason we are compiling sman without '-g' flag. Anyway
>
> gdb
> attach xxxx
> bt
>
> shows what process xxxx is doing.
>

Here is the bt of the hypertex process:

...
#0 0x00002b04261e3de5 in recv () from /lib/libc.so.6
#1 0x0000000000420d21 in sread (sock=0x17e80a0, buf=0x7fff05917354 "������@", buf_size=4, msg=0x426a7a "integer") at /usr/include/bits/socket2.h:45
#2 0x0000000000420e6d in fill_buf (sock=0x17e80a0, buf=0x7fff05917354 "������@", len=4, msg=0x426a7a "integer") at ../../../fricas-src/src/lib/sockio-c.c:308
#3 0x0000000000421122 in get_int (sock=0x6) at ../../../fricas-src/src/lib/sockio-c.c:319
#4 0x000000000040d5de in get_spad_output (pfile=0x1853710, command=0x18539c0 ")system rm /tmp/jazz1", com_type=<value optimized out>) at ../../../fricas-src/src/hyper/htinp.c:384
#5 0x000000000040e2e2 in make_the_input_file (page=0x17ce560) at ../../../fricas-src/src/hyper/htinp.c:458
#6 0x000000000040e5a9 in ht2_input () at ../../../fricas-src/src/hyper/htinp.c:126
#7 0x000000000040ecb5 in main (argc=<value optimized out>, argv=<value optimized out>) at ../../../fricas-src/src/hyper/hyper.c:286
(gdb) q
The program is running. Quit anyway (and detach it)? (y or n) y
Detaching from program: /home/page/fricas-build/target/x86_64-unknown-linux/bin/hypertex, process 2362
----

Do I understand correctly that this shows that hypertex is waiting for
output from the AXIOMsys process? So my conclusion, I guess, is that the
AXIOMsys process aborted unexpectedly. It seems strange to me that the
command that apparently caused this was:

)system rm /tmp/jazz1

Does any of this give you a better clue what is happening?

Regards,
Bill Page.