Yes, I thought the same. I thus removed the dependency on MPI; I'm now serializing and deserializing directly, without using MPI. My current code is at <https://bitbucket.org/eschnett/funhpc.jl/branch/memdebug>, and running "julia Wave.jl" triggers the problem reliably in a few seconds.
The deserialization call is in Comm.jl in the function recv_item.

-erik

On Sun, Sep 21, 2014 at 2:00 PM, Jake Bolewski <[email protected]> wrote:
> I saw a couple of posts back that you are using MPI? Any chance that
> MPI is issuing a callback on a different thread? This could be an issue
> with c-interop and can be sometimes solved by following the steps in
> the thread safety section of the manual.
>
> On Sunday, September 21, 2014 1:44:23 PM UTC-4, Erik Schnetter wrote:
>>
>> Unfortunately I don't have a simple example that reproduces the
>> problem. So far, I've managed to whittle it down to an application
>> running in a single process without dependencies on external packages.
>>
>> -erik
>>
>> On Sun, Sep 21, 2014 at 1:04 PM, Tim Holy <[email protected]> wrote:
>> > If you have/find a clean example, certainly posting an issue will
>> > make sense. I can't comment on whether the task switch during I/O
>> > is inevitable.
>> >
>> > --Tim
>> >
>> > On Sunday, September 21, 2014 10:25:11 AM Erik Schnetter wrote:
>> >> I'm aware that Julia's threads are "green threads". The issue of
>> >> thread safety still remains; if one thread is suspended in a
>> >> critical region, another can enter that region. Storing handles in
>> >> global data structures and incrementing global variables are such
>> >> actions, and I'm not 100% sure that the respective regions in
>> >> serialize.jl are yield-free, even without my info output. I was
>> >> surprised to see that I/O causes task switches -- maybe something
>> >> else (hashing? dictionaries? creating new lambdas in C?) also
>> >> causes task switches?
>> >>
>> >> gdb points to memory allocation routines in libc, called from gc.c
>> >> or array.c. I assume that something overwrites memory, destroying
>> >> libc malloc's data structures, leading to a crash later.
>> >>
>> >> -erik
>> >>
>> >> On Sun, Sep 21, 2014 at 5:26 AM, Tim Holy <[email protected]> wrote:
>> >> > Hi Erik,
>> >> >
>> >> > First, one comment: tasks are not "true" (kernel) threads.
>> >> > Currently a julia process is single-threaded. Tasks are better
>> >> > considered as a form of cooperative multitasking.
>> >> >
>> >> > Yes, I've also found that I/O causes task switching. I don't
>> >> > personally know a great way around this. One option would
>> >> > presumably be to have some form of message queue; I am pretty
>> >> > sure that push!ing a new message on it---as long as you don't
>> >> > need to touch I/O to create the message---would not cause a
>> >> > switch. You can also use time() and other markers to indicate
>> >> > the status of control flow.
>> >> >
>> >> > I haven't been reading things carefully enough to know whether
>> >> > there's any history behind this, but if you haven't said so
>> >> > already...what does gdb (or equivalent) say about the segfault?
>> >> >
>> >> > --Tim
>> >> >
>> >> > On Saturday, September 20, 2014 08:24:59 PM Erik Schnetter wrote:
>> >> >> I am trying to track down a segfault in a Julia application.
>> >> >> Currently I am zooming in on "deserialize", as avoiding calling
>> >> >> it seems to reliably cure the problem, while calling it (even
>> >> >> if not using the result) seems to reliably trigger the
>> >> >> segfault.
>> >> >>
>> >> >> I am using many threads (tasks), and deserialize is called
>> >> >> concurrently. Is this safe? I've been bitten in the past by
>> >> >> this; e.g. I've accidentally added an "info" statement into a
>> >> >> sequence of statements that needs to be atomic, and I/O
>> >> >> apparently switches threads. Is there a list of
>> >> >> known-to-be-safe or known-to-be-unsafe functions? Is
>> >> >> deserialization thread-safe in this respect?
>> >> >>
>> >> >> I am in particular deserializing function calls and lambda
>> >> >> expressions, and I see global variables ("lambda_numbers",
>> >> >> "known_lambda_data"). Are the respective data structures
>> >> >> (WeakKeyDict and Dict) thread-safe?
>> >> >>
>> >> >> Is there a locking mechanism in Julia? This would temporarily
>> >> >> only allow a single thread (task) to run, aborting with an
>> >> >> error if this thread becomes unrunnable. In other words,
>> >> >> calling "yield" when holding a lock would be a no-op.
>> >> >>
>> >> >> -erik
>> >
>>
>> --
>> Erik Schnetter <[email protected]>
>> http://www.perimeterinstitute.ca/personal/eschnetter/

--
Erik Schnetter <[email protected]>
http://www.perimeterinstitute.ca/personal/eschnetter/
