On 26/02/2012 02:23, Sanket Agrawal wrote:
I have to take back what I said about the increase in worker tasks being
related to some Mac OS pthread bug. I can now reproduce the issue on
Linux (Red Hat x86_64) too (and cause a segmentation fault once in a
while). So it now seems the issue is due either to some interaction
between the GHC RTS and C pthread mutexes, or to a bug in my code.

What I have done is create a simple test case that reproduces the
increase in the number of worker threads with each run of the Haskell
timer thread (which syncs with C pthreads). I have put the code on
GitHub, with documentation on how to reproduce the issue:
https://github.com/sanketr/cffitest

I would appreciate feedback on whether it is a bug in my code or a GHC
bug that needs to be reported.

What version of GHC is this?  I vaguely remember fixing something like this.

The rule of thumb is: if you think it is a bug then report it, and we'll investigate further.

Cheers,
        Simon



On Sat, Feb 25, 2012 at 3:41 PM, Sanket Agrawal
<[email protected]> wrote:

    On further investigation, it seems to be very specific to Mac OS
    Lion (I am running 10.7.3) - all tests were with the -N3 option:

    - I can reliably crash the code with a seg fault or bus error if I
    create more than 8 threads in C FFI (each thread creates its own
    mutex, for 1-1 coordination with the Haskell timer thread). My iMac
    has 4 processors. In gdb, I can see that the crash happened in
    __psynch_cvsignal (), which seems to be related to pthread mutexes.

    - If I increase the number of C FFI threads (and hence, pthread
    mutexes) to >=7, the number of tasks starts increasing. 8 is the max
    number of FFI threads in my testing where the code runs without
    crashing. But it seems that there is some kind of leak related to
    pthread mutexes. What the timer thread does is fork 8 parallel
    Haskell threads to acquire mutexes from each of the C FFI threads.
    Though the function returns after acquiring the mutex, collecting
    data, and releasing the mutex, some of the threads seem to be marked
    as active by the GC because of a mutex memory leak. Exactly how, I
    don't know.

    - If I keep the number of C FFI threads at <=6, there is no memory
    leak. The number of tasks stays steady.

    So, it seems to be a pthread library issue (and not a GHC issue).
    Something to keep in mind when developing code on Mac that involves
    mutex coordination with C FFI.
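
The fork-and-collect pattern described in the bullets above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from the thread: `collectFrom` and `c_usleep` are hypothetical names, and `usleep` only simulates a blocking C call where the real program would acquire a producer's mutex. In the threaded RTS, a Haskell thread blocked in a `safe` foreign call occupies an OS thread for the duration of the call, so several concurrent calls of this kind need several worker threads.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Monad (forM)
import Foreign.C.Types (CInt (..), CUInt (..))

-- ASSUMPTION: usleep stands in for the C function that locks a producer's
-- mutex, copies its data, and unlocks. It is not the poster's C code.
foreign import ccall safe "unistd.h usleep"
  c_usleep :: CUInt -> IO CInt

-- | Fork one Haskell thread per producer, block each one in a safe foreign
-- call, then wait for all of them and return the results in producer order.
collectFrom :: Int -> IO [Int]
collectFrom nProducers = do
  boxes <- forM [0 .. nProducers - 1] $ \i -> do
    box <- newEmptyMVar
    _ <- forkIO $ do
      _ <- c_usleep 1000   -- blocking foreign call, roughly 1 ms
      putMVar box i        -- hand the "collected data" back
    pure box
  -- takeMVar blocks until the corresponding worker has finished.
  mapM takeMVar boxes
```

To watch the task list, such a program could be built with `ghc -threaded -rtsopts` and run with `+RTS -N3 -s`, calling `collectFrom 8` once per timer iteration.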


    On Sat, Feb 25, 2012 at 2:59 PM, Sanket Agrawal
    <[email protected]> wrote:

        I wrote a program that uses a timed thread to collect data from
        a C producer (using FFI). The number of threads in the C
        producer is fixed (they are created at init). One Haskell timer
        thread uses threadDelay to run itself at a timed interval. When
        I look at the RTS output after killing the program after a
        couple of timer iterations, I see the number of worker tasks
        increasing with time.

        For example, below is the output after 20 iterations of the
        timer event:

                               MUT time (elapsed)       GC time  (elapsed)
           Task  0 (worker) :    0.00s    (  0.00s)       0.00s    (  0.00s)
           Task  1 (worker) :    0.00s    (  0.00s)       0.00s    (  0.00s)
           .......output until task 37 snipped as it is the same as task 1.......
           Task 38 (worker) :    0.07s    (  0.09s)       0.00s    (  0.00s)
           Task 39 (worker) :    0.07s    (  0.09s)       0.00s    (  0.00s)
           Task 40 (worker) :    0.18s    ( 10.20s)       0.00s    (  0.00s)
           Task 41 (worker) :    0.18s    ( 10.20s)       0.00s    (  0.00s)
           Task 42 (worker) :    0.18s    ( 10.20s)       0.00s    (  0.00s)
           Task 43 (worker) :    0.18s    ( 10.20s)       0.00s    (  0.00s)
           Task 44 (worker) :    0.52s    ( 10.74s)       0.00s    (  0.00s)
           Task 45 (worker) :    0.52s    ( 10.75s)       0.00s    (  0.00s)
           Task 46 (worker) :    0.52s    ( 10.75s)       0.00s    (  0.00s)
           Task 47 (bound)  :    0.00s    (  0.00s)       0.00s    (  0.00s)


        After two iterations of the timer event:

                                MUT time (elapsed)       GC time  (elapsed)
           Task  0 (worker) :    0.00s    (  0.00s)       0.00s    (  0.00s)
           Task  1 (worker) :    0.00s    (  0.00s)       0.00s    (  0.00s)
           Task  2 (worker) :    0.07s    (  0.09s)       0.00s    (  0.00s)
           Task  3 (worker) :    0.07s    (  0.09s)       0.00s    (  0.00s)
           Task  4 (worker) :    0.16s    (  1.21s)       0.00s    (  0.00s)
           Task  5 (worker) :    0.16s    (  1.21s)       0.00s    (  0.00s)
           Task  6 (worker) :    0.16s    (  1.21s)       0.00s    (  0.00s)
           Task  7 (worker) :    0.16s    (  1.21s)       0.00s    (  0.00s)
           Task  8 (worker) :    0.48s    (  1.80s)       0.00s    (  0.00s)
           Task  9 (worker) :    0.48s    (  1.81s)       0.00s    (  0.00s)
           Task 10 (worker) :    0.48s    (  1.81s)       0.00s    (  0.00s)
           Task 11 (bound)  :    0.00s    (  0.00s)       0.00s    (  0.00s)


        The Haskell code has one forkIO call to kick off the C FFI,
        which creates 8 threads. Runtime options are "-N3 +RTS -s". The
        timer event is kicked off after forkIO. It is of the form
        (pseudo-code):

        timerevent <other arguments> time = run
          where
            run = threadDelay time >> do some work >> run
            <other variables defined for run function>
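
The pseudo-code above can be made concrete roughly as follows. This is only a sketch: `timerEvent`, `timerEventN`, and `countTicks` are hypothetical names, and the "work" is a plain counter increment standing in for the FFI data collection, which the thread does not show.

```haskell
import Control.Concurrent (threadDelay)
import Control.Monad (forever, replicateM_)
import Data.IORef (modifyIORef', newIORef, readIORef)

-- | Endless timer loop: sleep, do the work, repeat.
-- 'timerEvent' is a hypothetical name, not the poster's actual function.
timerEvent :: Int -> IO () -> IO a
timerEvent delayMicros work = forever (threadDelay delayMicros >> work)

-- | Bounded variant, convenient for experiments: exactly n iterations.
timerEventN :: Int -> Int -> IO () -> IO ()
timerEventN n delayMicros work =
  replicateM_ n (threadDelay delayMicros >> work)

-- | Run n iterations whose "work" bumps a counter; return the final count.
countTicks :: Int -> Int -> IO Int
countTicks n delayMicros = do
  ref <- newIORef 0
  timerEventN n delayMicros (modifyIORef' ref (+ 1))
  readIORef ref
```

The delay is in microseconds, as `threadDelay` expects, so `timerEvent 1000000 work` would run `work` about once per second.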

        I also wrote a simpler program using just timer events (fork one
        timer event, and run another timer event after that), but didn't
        see any tasks in the RTS output.

        I tried searching the GHC pages for documentation on RTS output,
        but didn't find anything that could help me debug the above
        issue. I suspect that the timer event is the root cause of the
        increasing number of tasks (with all but the last 9 tasks idle -
        I guess 8 tasks belong to the C FFI, and one task to the
        timerevent thread), and hence of the memory leak.

        I would appreciate pointers on how to debug it. The timerevent
        does forkIO a call to send collected data from the C FFI to a db
        server, but disabling that fork still results in an increasing
        number of tasks. So it seems strongly correlated with the timer
        event, though I am unable to reproduce it with a simpler version
        of the timer event (which removes the MVar sync/callback from
        the C FFI).





_______________________________________________
Glasgow-haskell-users mailing list
[email protected]
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users

