I am having difficulty imagining how there could be a loop that evaluates a 
black hole a second time on the same thread. The paper "Haskell on a 
Shared-Memory Multiprocessor" mentions that the runtime must not do this. My 
reading of that description is that it should never happen. Even if the code 
were recursive, it should not happen: code might recurse forever and chew up 
memory until failure, but a thread that creates a black hole should not trip 
over its own black hole.
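
The only shape I can even construct that would do it is a value whose thunk 
depends on itself, a minimal sketch (nothing like the real code):

-- A thunk whose value depends on itself: forcing it re-enters the
-- black hole the same thread just created.
selfDependent :: Int
selfDependent = selfDependent + 1

main :: IO ()
main = print selfDependent

GHC normally turns that into a <<loop>> exception, precisely because the 
thread finds a black hole it owns.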

So I am not sure what I am looking for; there is no recursion in the encode 
function.

Can you give some examples of what I should be looking for?

Mike


On Nov 11, 2014, at 8:37 PM, John Lato <jwl...@gmail.com> wrote:

> The blocked on black hole message is very suspicious.  It means that thread 7 
> is blocked waiting for another thread to evaluate a thunk.  But in this case, 
> it's thread 7 that created that thunk and is supposed to be doing the 
> evaluating.  This is some evidence that Gregory's theory is correct and your 
> encode function loops somewhere.
> 
> On Wed Nov 12 2014 at 11:25:30 AM Michael Jones <m...@proclivis.com> wrote:
> Gregory,
> 
> The 7.8.3 user guide says, under the -M<size> option, that by default the 
> heap is unlimited. I have several applications, and they all have messages 
> like:
> 
> 7fddc7bcd700: cap 2: waking up thread 7 on cap 2
> 7fddc7bcd700: cap 2: thread 4 stopped (yielding)
> 7fddcaad6740: cap 2: running thread 7 (ThreadRunGHC)
> 7fddcaad6740: cap 2: thread 7 stopped (heap overflow)
> 7fddcaad6740: cap 2: requesting parallel GC
> 7fddc5ffe700: cap 0: starting GC
> 7fddc57fd700: cap 1: starting GC
> 7fdda77fe700: cap 3: starting GC
> 7fddcaad6740: cap 2: starting GC
> 
> I assumed that when the heap ran out of space, it either triggered a GC or 
> enlarged the heap. The programs that have these messages run for very long 
> periods of time, and when I heap profile them they use about 500KB to 1MB 
> and are quite stable.
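> 
> (For completeness, capping the heap would look something like the line 
> below, with a hypothetical binary name; I am currently running with the 
> default unlimited heap:
> 
> ./myApp +RTS -M512m -RTS
> 
> )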
> 
> As a test, I ran the hang application with profiling to see if memory jumps 
> up before or after the hang.
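> 
> (For reference, the heap-profiling run was roughly the following; this is 
> a sketch of the usual flags rather than my exact command line:
> 
> ghc -O2 -threaded -rtsopts -prof -fprof-auto Main.hs -o app
> ./app +RTS -hc -p -RTS
> hp2ps -c app.hp
> 
> )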
> 
> What I notice is that the app moves along using about 800KB, then spikes to 
> 2MB at the hang. So I believe you, but I am confused about the RTS behavior: 
> how can I have all these overflow messages in a normal application, and how 
> do I tell the difference between these routine messages and a real heap 
> problem?
> 
> So, I dug deeper into the log. A normal execution for sending a command looks 
> like:
> 
> 7f99e6ffd700: cap 0: running thread 7 (ThreadRunGHC)
> 7f99e6ffd700: cap 0: thread 7 stopped (heap overflow)
> 7f99e6ffd700: cap 0: requesting parallel GC
> 7f99e6ffd700: cap 0: starting GC
> 7f99e6ffd700: cap 0: GC working
> 7f99e6ffd700: cap 0: GC idle
> 7f99e6ffd700: cap 0: GC done
> 7f99e6ffd700: cap 0: GC idle
> 7f99e6ffd700: cap 0: GC done
> 7f99e6ffd700: cap 0: all caps stopped for GC
> 7f99e6ffd700: cap 0: finished GC
> 7f99e6ffd700: cap 0: running thread 7 (ThreadRunGHC)
> 7f99e6ffd700: cap 0: sendCommand
> 7f99e6ffd700: cap 0: sendCommand: encoded
> 7f99e6ffd700: cap 0: sendCommand: size 4
> 7f99e6ffd700: cap 0: sendCommand: unpacked
> 7f99e6ffd700: cap 0: Sending command of size 4
> 7f99e6ffd700: cap 0: Sending command of size "\NUL\EOT"
> 7f99e6ffd700: cap 0: sendCommand: sent
> 7f99e6ffd700: cap 0: sendCommand: flushed
> 7f99e6ffd700: cap 0: thread 7 stopped (blocked on an MVar)
> 7f99e6ffd700: cap 0: running thread 2 (ThreadRunGHC)
> 7f99e6ffd700: cap 0: thread 2 stopped (yielding)
> 7f99e6ffd700: cap 0: running thread 45 (ThreadRunGHC)
> 7f99e6ffd700: cap 0: fetchTelemetryServer
> 7f99e6ffd700: cap 0: fetchTelemetryServer: got lock
> 
> The thread runs, overflows, GCs, runs again, then blocks on an MVar.
> 
> For the hang case:
> 
> 7f99e6ffd700: cap 0: running thread 7 (ThreadRunGHC)
> 7f99e6ffd700: cap 0: sendCommand
> 7f99e6ffd700: cap 0: thread 7 stopped (heap overflow)
> 7f99e6ffd700: cap 0: requesting parallel GC
> 7f99e6ffd700: cap 0: starting GC
> 7f99e6ffd700: cap 0: GC working
> 7f99e6ffd700: cap 0: GC idle
> 7f99e6ffd700: cap 0: GC done
> 7f99e6ffd700: cap 0: GC idle
> 7f99e6ffd700: cap 0: GC done
> 7f99e6ffd700: cap 0: all caps stopped for GC
> 7f99e6ffd700: cap 0: finished GC
> 7f9a05362a40: cap 0: running thread 1408 (ThreadRunGHC)
> 7f9a05362a40: cap 0: thread 1408 stopped (yielding)
> 7f99e6ffd700: cap 0: running thread 7 (ThreadRunGHC)
> 7f99e6ffd700: cap 0: thread 7 stopped (heap overflow)
> 7f99e6ffd700: cap 0: requesting parallel GC
> 7f99e6ffd700: cap 0: starting GC
> 7f99e6ffd700: cap 0: GC working
> 7f99e6ffd700: cap 0: GC idle
> 7f99e6ffd700: cap 0: GC done
> 7f99e6ffd700: cap 0: GC idle
> 7f99e6ffd700: cap 0: GC done
> 7f99e6ffd700: cap 0: all caps stopped for GC
> 7f99e6ffd700: cap 0: finished GC
> 7f99e6ffd700: cap 0: running thread 7 (ThreadRunGHC)
> 7f99e6ffd700: cap 0: thread 7 stopped (yielding)
> 7f99e6ffd700: cap 0: running thread 2 (ThreadRunGHC)
> 7f99e6ffd700: cap 0: thread 2 stopped (yielding)
> 7f99e6ffd700: cap 0: running thread 45 (ThreadRunGHC)
> 7f99e6ffd700: cap 0: fetchTelemetryServer
> 7f99e6ffd700: cap 0: fetchTelemetryServer: got lock
> ...
> 7f99e6ffd700: cap 0: fetchTelemetryServer: got lock
> 7f99e6ffd700: cap 0: fetchTelemetryServer: unlock
> 7f99e6ffd700: cap 0: fetchTelemetryServer
> 7f99e6ffd700: cap 0: fetchTelemetryServer: got lock
> 7f99e6ffd700: cap 0: fetchTelemetryServer: unlock
> 7f99e6ffd700: cap 0: fetchTelemetryServer
> 7f99e6ffd700: cap 0: thread 45 stopped (yielding)
> 7f9a05362a40: cap 0: running thread 1408 (ThreadRunGHC)
> 7f9a05362a40: cap 0: thread 1408 stopped (suspended while making a foreign call)
> 7f9a05362a40: cap 0: running thread 1408 (ThreadRunGHC)
> 7f9a05362a40: cap 0: thread 1408 stopped (suspended while making a foreign call)
> 7f99e6ffd700: cap 0: running thread 7 (ThreadRunGHC)
> 7f99e6ffd700: cap 0: thread 7 stopped (blocked on black hole owned by thread 7)
> 7f9a05362a40: cap 0: running thread 1408 (ThreadRunGHC)
> 7f9a05362a40: cap 0: thread 1408 stopped (suspended while making a foreign call)
> 
> Run, overflow, GC, run, overflow, GC, run, yield, other activity, run, 
> blocked on a black hole.
> 
> And that is the last activity for thread 7.
> 
> I found a few links about black holes and such, so I’ll go off and read 
> those and try to learn what the RTS is doing and why it can hang on a black 
> hole. But if you have some hints, let me know.
> 
> Mike
> 
> 
> 
> On Nov 11, 2014, at 4:01 PM, Gregory Collins <g...@gregorycollins.net> wrote:
> 
>> 
>> On Tue, Nov 11, 2014 at 2:06 PM, Michael Jones <m...@proclivis.com> wrote:
>> Those are all over the log even when it runs properly. So I assume the 
>> runtime is resizing the heap or something.
>> 
>> No, it means you're exhausting the heap (maybe the runtime stack for the 
>> thread running "encode"), probably because "encode" is infinite-looping. I 
>> think Occam's razor applies here: check that any recursion you're doing is 
>> actually reducing the recursive argument. Perhaps you could post the code 
>> (e.g. http://gist.github.com/)?
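>> 
>> (By "reducing the recursive argument" I mean avoiding the classic shape 
>> below; this is only a sketch of the pattern, not a guess at your code:
>> 
>> sumTo :: Int -> Int
>> sumTo 0 = 0
>> sumTo n = n + sumTo n   -- loops and piles up thunks; should be sumTo (n - 1)
>> 
>> )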
>> 
>> G
>> 
>> 
>> -- 
>> Gregory Collins <g...@gregorycollins.net>
> 

_______________________________________________
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
