Maybe there is still a bug in Thread? I now use threads in a very simple way:
    for q in get_seq_data(config, min_n_read, min_len_aln):
      var (seqs, seed_id) = q
      log("len(seqs)=", $len(seqs), ", seed_id=", seed_id)
      var cargs: ConsensusArgs = (inseqs: seqs, seed_id: seed_id, config: config)
      if n_core == 0:
        process_consensus(cargs)
      else:
        var rthread: ref Thread[ConsensusArgs]
        new(rthread)
        createThread(rthread[], process_consensus, cargs)
        joinThread(rthread[])
... (threadpool first creates 48 threads, even though I do not use
threadpool.)
[New Thread 0x7ffff015a700 (LWP 202052)]
[New Thread 0x7fffefedb700 (LWP 202053)]
[New Thread 0x7fffefbdc700 (LWP 202054)]
main(n_core=1)
len(seqs)=25, seed_id=2
[New Thread 0x7fffef52b700 (LWP 202055)]
[Thread 0x7fffef52b700 (LWP 202055) exited]
len(seqs)=98, seed_id=14
[New Thread 0x7fffef52b700 (LWP 202056)]
[Thread 0x7fffef52b700 (LWP 202056) exited]
len(seqs)=58, seed_id=15
[New Thread 0x7fffef52b700 (LWP 202057)]
[Thread 0x7fffef52b700 (LWP 202057) exited]
len(seqs)=43, seed_id=22
[New Thread 0x7fffef52b700 (LWP 202058)]
[Thread 0x7fffef52b700 (LWP 202058) exited]
len(seqs)=55, seed_id=25
[New Thread 0x7fffef52b700 (LWP 202059)]
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffef52b700 (LWP 202059)]
deallocOsPages_e5IRqVbks39a9bBzvLjGxw2g (a=0x7ffff7f3d0c8) at /home/UNIXHOME/cdunn/repo/gh/Nim/lib/system/alloc.nim:740
740 osDeallocPages(it, it.origSize and not 1)
(gdb) bt
#0 deallocOsPages_e5IRqVbks39a9bBzvLjGxw2g (a=0x7ffff7f3d0c8) at /home/UNIXHOME/cdunn/repo/gh/Nim/lib/system/alloc.nim:740
#1 0x00000000004143f3 in deallocOsPages_njssp69aa7hvxte9bJ8uuDcg_3 () at /home/UNIXHOME/cdunn/repo/gh/Nim/lib/system/gc.nim:107
#2 threadProcWrapStackFrame_dXJaXMz804k05DGz7X4RkA (thrd=0x7ffff7f79328) at /home/UNIXHOME/cdunn/repo/gh/Nim/lib/system/threads.nim:427
#3 threadProcWrapper_2AvjU29bJvs3FXJIcnmn4Kg_2 (closure=0x7ffff7f79328) at /home/UNIXHOME/cdunn/repo/gh/Nim/lib/system/threads.nim:437
#4 0x00007ffff76ba182 in start_thread (arg=0x7fffef52b700) at pthread_create.c:312
#5 0x00007ffff73e700d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
(gdb) l
735 when defined(debugHeapLinks):
736 cprintf("owner %p; dealloc A: %p size: %ld; next: %p\n", addr(a),
737 it, it.origSize and not 1, next)
738 sysAssert it.origSize >= PageSize, "origSize too small"
739 # note:
740 osDeallocPages(it, it.origSize and not 1)
741 it = next
742 when false:
743 for p in elements(a.chunkStarts):
744 var page = cast[PChunk](p shl PageShift)
(gdb) p it
$1 = (BigChunk_Rv9c70Uhp2TytkX7eH78qEg *) 0x101010101010101
That is with Nim origin/devel up-to-date, at
commit 172a9c8e97694846c3348983a9b2b7c2931c939d
Author: Dominik Picheta
Date: Mon Mar 27 12:14:06 2017
My program works fine without threads (n_core=0). It worked fine when I used
threadpool.
Another problem with this approach is that it runs 3x slower (despite calling
GC_disable within the thread) than my single-threaded version, which was itself
3x faster than C+Python/multiprocessing. Very disappointing. The single-threaded
version also suffers an explosion in memory fragmentation, though not as bad as
before I started re-using strings and seqs within each task.
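
The buffer re-use mentioned above might look roughly like the sketch below. The names (complement, revcomp_into) are hypothetical and not taken from the real program; the idea is only that one string is allocated once and then overwritten by every task. As far as I know, setLen does not release the existing allocation when it shrinks a string, but treat that as an assumption to verify for your Nim version.

```nim
# Hypothetical helper: complement of a single base, 'N' for anything else.
proc complement(c: char): char =
  case c
  of 'A': 'T'
  of 'C': 'G'
  of 'G': 'C'
  of 'T': 'A'
  else: 'N'

# Writes the reverse complement of `src` into a caller-owned buffer,
# so repeated calls do not need a fresh string each time.
proc revcomp_into(src: string, buf: var string) =
  buf.setLen(src.len)              # resize in place; old contents are overwritten
  for i in 0 ..< src.len:
    buf[i] = complement(src[src.len - 1 - i])

when isMainModule:
  var buf = newStringOfCap(1024)   # allocated once, shared by all calls below
  for s in ["AACG", "AT", "GGGTTT"]:
    revcomp_into(s, buf)
    echo s, " -> ", buf
```

The same pattern applies to seqs: keep one seq per task and call setLen on it rather than building a new one in every iteration.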
So at this point, I've lost my runtime advantage; I have to jump through hoops
to avoid memory fragmentation (compared with Python multiprocessing); and now I
have this seg-fault.
If anyone wants to debug this, let me know. I can put together a full test-case
(via my corporate cloud server). I have 3 test-cases: 75k, 1.4M, and 800M. This
crash happens only on the largest, but at least it happens pretty quickly.
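
As a starting point for a standalone test-case, the create/join pattern from the top of this post can be reduced to something like the sketch below. It is not the real program: ConsensusArgs here is a simplified stand-in (no config field, plus a hypothetical `total` pointer for returning a result), process_consensus only sums sequence lengths, and the input data is made up. On its own it is not expected to reproduce the crash, which so far only shows up with the largest input.

```nim
# Compile with: nim c --threads:on repro.nim
type
  # Simplified stand-in for the real argument tuple.
  ConsensusArgs = tuple[inseqs: seq[string], seed_id: int, total: ptr int]

proc process_consensus(args: ConsensusArgs) {.thread.} =
  # Placeholder work: sum the sequence lengths and report via a raw pointer.
  var n = 0
  for s in args.inseqs:
    n += s.len
  args.total[] = n

proc run_task(seqs: seq[string], seed_id: int, n_core: int): int =
  var total = 0
  let cargs: ConsensusArgs = (inseqs: seqs, seed_id: seed_id, total: addr total)
  if n_core == 0:
    # Single-threaded path: call the proc directly.
    process_consensus(cargs)
  else:
    # Same pattern as in the post: one fresh thread per task, joined at once.
    var rthread: ref Thread[ConsensusArgs]
    new(rthread)
    createThread(rthread[], process_consensus, cargs)
    joinThread(rthread[])
  result = total

when isMainModule:
  for seed_id in 0 ..< 5:
    let seqs = @["ACGT", "GGCC", "TTA"]
    echo "seed_id=", seed_id, " total=", run_task(seqs, seed_id, 1)
```

Scaling the number of iterations and the size of `seqs` up toward the 800M case would be the obvious next step when trying to trigger the seg-fault outside the full program.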