Yes, I thought the same. I have therefore removed the dependency on MPI and
now serialize and deserialize directly, without going through it. My current
code is at <https://bitbucket.org/eschnett/funhpc.jl/branch/memdebug>,
and running "julia Wave.jl" triggers the problem reliably in a few
seconds.

The deserialization call is in Comm.jl in the function recv_item.
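
For reference, the failure mode discussed further down in this thread (a
task switch inside a region that is assumed to be atomic) can be sketched
as follows. This is only a hypothetical illustration, not code from
serialize.jl or from the repository above: `counter` and
`unsafe_increment!` are made-up names, and `yield()` stands in for any call
that may switch tasks, such as I/O.

```julia
# Hypothetical illustration of a lost update under cooperative multitasking.
# A one-element array serves as the shared "global" state.
counter = [0]

function unsafe_increment!(c)
    old = c[1]       # read the shared value
    yield()          # stand-in for a call that may switch tasks (e.g. I/O)
    c[1] = old + 1   # write back a value that may be stale by now
end

# Start ten tasks that all update the same counter, and wait for them.
@sync for _ in 1:10
    @async unsafe_increment!(counter)
end

# Without the yield() the result would be 10; with it, updates are lost.
println(counter[1])
```

If the read and the write are not separated by anything that yields, the
tasks cannot interleave there and no update is lost; whether the
corresponding regions in serialize.jl are free of such yield points is
exactly what is unclear.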

-erik

On Sun, Sep 21, 2014 at 2:00 PM, Jake Bolewski <[email protected]> wrote:
> I saw a couple of posts back that you are using MPI. Any chance that MPI is
> issuing a callback on a different thread? This could be an issue with
> C interop and can sometimes be solved by following the steps in the thread
> safety section of the manual.
>
> On Sunday, September 21, 2014 1:44:23 PM UTC-4, Erik Schnetter wrote:
>>
>> Unfortunately I don't have a simple example that reproduces the
>> problem. So far, I've managed to whittle it down to an application
>> running in a single process without dependencies on external packages.
>>
>> -erik
>>
>> On Sun, Sep 21, 2014 at 1:04 PM, Tim Holy <[email protected]> wrote:
>> > If you have/find a clean example, certainly posting an issue will make
>> > sense. I can't comment on whether the task switch during I/O is inevitable.
>> >
>> > --Tim
>> >
>> > On Sunday, September 21, 2014 10:25:11 AM Erik Schnetter wrote:
>> >> I'm aware that Julia's threads are "green threads". The issue of
>> >> thread safety still remains; if one thread is suspended in a critical
>> >> region, another can enter that region. Storing handles in global data
>> >> structures and incrementing global variables are such actions, and I'm
>> >> not 100% sure that the respective regions in serialize.jl are
>> >> yield-free, even without my info output. I was surprised to see that
>> >> I/O causes task switches -- maybe something else (hashing?
>> >> dictionaries? creating new lambdas in C?) also causes task switches?
>> >>
>> >> gdb points to memory allocation routines in libc, called from gc.c or
>> >> array.c. I assume that something overwrites memory, destroying libc
>> >> malloc's data structures, leading to a crash later.
>> >>
>> >> -erik
>> >>
>> >> On Sun, Sep 21, 2014 at 5:26 AM, Tim Holy <[email protected]> wrote:
>> >> > Hi Erik,
>> >> >
>> >> > First, one comment: tasks are not "true" (kernel) threads. Currently a
>> >> > julia process is single-threaded. Tasks are better considered as a form
>> >> > of cooperative multitasking.
>> >> >
>> >> > Yes, I've also found that I/O causes task switching. I don't personally
>> >> > know a great way around this. One option would presumably be to have some
>> >> > form of message queue; I am pretty sure that push!ing a new message on
>> >> > it---as long as you don't need to touch I/O to create the message---would
>> >> > not cause a switch. You can also use time() and other markers to indicate
>> >> > the status of control flow.
>> >> >
>> >> > I haven't been reading things carefully enough to know whether there's any
>> >> > history behind this, but if you haven't said so already...what does gdb
>> >> > (or equivalent) say about the segfault?
>> >> >
>> >> > --Tim
>> >> >
>> >> > On Saturday, September 20, 2014 08:24:59 PM Erik Schnetter wrote:
>> >> >> I am trying to track down a segfault in a Julia application. Currently I
>> >> >> am zooming in on "deserialize", as avoiding calling it seems to reliably
>> >> >> cure the problem, while calling it (even if not using the result) seems
>> >> >> to reliably trigger the segfault.
>> >> >>
>> >> >> I am using many threads (tasks), and deserialize is called concurrently.
>> >> >> Is this safe? I've been bitten in the past by this; e.g. I've
>> >> >> accidentally added an "info" statement into a sequence of statements
>> >> >> that needs to be atomic, and I/O apparently switches threads. Is there a
>> >> >> list of known-to-be-safe or known-to-be-unsafe functions? Is
>> >> >> deserialization thread-safe in this respect?
>> >> >>
>> >> >> I am in particular deserializing function calls and lambda expressions,
>> >> >> and I see global variables ("lambda_numbers", "known_lambda_data"). Are
>> >> >> the respective data structures (WeakKeyDict and Dict) thread-safe?
>> >> >>
>> >> >> Is there a locking mechanism in Julia? This would temporarily only allow
>> >> >> a single thread (task) to run, aborting with an error if this thread
>> >> >> becomes unrunnable. In other words, calling "yield" when holding a lock
>> >> >> would be a no-op.
>> >> >>
>> >> >> -erik
>> >
>>
>>
>>
>> --
>> Erik Schnetter <[email protected]>
>> http://www.perimeterinstitute.ca/personal/eschnetter/



-- 
Erik Schnetter <[email protected]>
http://www.perimeterinstitute.ca/personal/eschnetter/