Simon,

This was fixed some time back. I combed the code base looking for other busy 
loops and there are no more. I commented out the code that runs the I2C + 
Machines + IO stuff, and only left the GUI code. It appears that just the 
wxhaskell part of the program fails to start. This matches a previous 
observation based on printing.

I’ll see if I can hack up the code to a minimal set that I can publish. All the 
IP is in the I2C code, so I might be able to get it down to one file.

Mike

On Jan 19, 2015, at 3:37 AM, Simon Marlow <marlo...@gmail.com> wrote:

> Hi Michael,
> 
> Previously in this thread it was pointed out that your code was doing busy 
> waiting, and so the problem can be fixed by modifying your code to not do 
> busy waiting.  Did you do this?  The -C flag is just a workaround which will 
> make the RTS reschedule more often; it won't fix the underlying problem.
> 
> The code you showed us was:
> 
> sendTransactions :: MonadIO m => SMBusDevice DeviceDC590 -> TVar Bool -> 
> ProcessT m (Spec, String) ()
> sendTransactions dev dts = repeatedly $ do
>  dts' <- liftIO $ atomically $ readTVar dts
>  when (dts' == True) (do
>      (_, transactions) <- await
>      liftIO $ sendOut dev transactions)
> 
> This busy-loops whenever the TVar contains False, because repeatedly
> reruns the body immediately and nothing in it blocks.
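
For reference, one way to remove this kind of busy wait is to let STM block until the flag changes, rather than re-reading it in a loop. The following is only a minimal sketch under stated assumptions: it uses just the stm package, the names waitUntilEnabled, sender, and demo are hypothetical, and a plain TChan plus a TVar log stand in for the machines await and for sendOut, which are not reproduced here.

```haskell
import Control.Concurrent (forkIO, threadDelay)
import Control.Concurrent.STM
import Control.Monad (forever)

-- Block until the flag is True. `check False` calls `retry`, which suspends
-- the thread until a TVar read in this transaction is written again, so no
-- CPU is burned while the flag stays False.
waitUntilEnabled :: TVar Bool -> IO ()
waitUntilEnabled flag = atomically $ readTVar flag >>= check

-- Hypothetical stand-in for the sender loop: wait for the flag, then take
-- one item from the queue and record it as "sent".
sender :: TVar Bool -> TChan String -> TVar [String] -> IO ()
sender flag queue sent = forever $ do
  waitUntilEnabled flag
  msg <- atomically $ readTChan queue
  atomically $ modifyTVar' sent (msg :)

-- Returns what had been sent while the flag was False, and what was sent
-- after the flag was set to True.
demo :: IO ([String], [String])
demo = do
  flag  <- newTVarIO False
  queue <- newTChanIO
  sent  <- newTVarIO []
  _ <- forkIO (sender flag queue sent)
  atomically $ mapM_ (writeTChan queue) ["a", "b"]
  threadDelay 100000                 -- sender is blocked, not spinning
  before <- readTVarIO sent
  atomically $ writeTVar flag True   -- wakes the blocked sender
  -- Block (again without polling) until both items have been recorded.
  atomically $ readTVar sent >>= check . (== 2) . length
  after <- readTVarIO sent
  return (before, reverse after)

main :: IO ()
main = demo >>= print
```

In the quoted sendTransactions, the same idea would presumably amount to replacing the readTVar/when pair with a single blocking step such as liftIO (waitUntilEnabled dts) before the await; that has not been tested against the real code.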
> 
> Cheers,
> Simon
> 
> On 18/01/2015 01:15, Michael Jones wrote:
>> I have narrowed down the problem a bit. It turns out that if I run the
>> program and wait long enough, it will often start. Judging by the event
>> log, a slow start can take anywhere from 1000 to 10000 entries.
>> 
>> When I look at a good start vs. slow start, I see that in both cases
>> things startup and there is some thread activity for thread 2 and 3,
>> then the application starts creating other threads, which is when the
>> wxhaskell GUI pops up and IO out my /dev/i2c begins. In the slow case,
>> it just gets stuck on thread 2/3 activity for a very long time.
>> 
>> If I switch from -C0.001 to -C0.010, the startup is more reliable, in
>> that most starts result in an immediate GUI and i2c IO.
>> 
>> The behavior suggests to me that some initial threads are starving the
>> ability for other threads to start, and perhaps on a dual core machine
>> it is more of a problem than single or quad core machines. For certain,
>> due to some printing, I know that the main thread is starting, and that
>> a print just before the first fork is not printing. Code between them is
>> evaluating wxhaskell functions, but the main frame is not yet asked to
>> become visible. From last week, I know that a non-GUI version of the
>> app is getting stuck, but I do not know if it eventually runs like this
>> case.
>> 
>> Is there some convention that when I look at an event log you can tell
>> which threads are OS threads vs threads from fork?
>> 
>> Perhaps someone that knows the scheduler might have some advice. It
>> seems odd that a scheduler could behave this way. The scheduler should
>> have some built in notion of fairness.
>> 
>> 
>> On Jan 12, 2015, at 11:02 PM, Michael Jones <m...@proclivis.com
>> <mailto:m...@proclivis.com>> wrote:
>> 
>>> Sorry I am reviving an old problem, but it has resurfaced, such that
>>> one system behaves differently from another.
>>> 
>>> Using -C0.001 solved problems on a Mac + VM + Ubuntu 14. It worked on
>>> a single core 32 bit Atom NUC. But on a dual core Atom MinnowBoardMax,
>>> something bad is going on. In summary, the same code that runs on two
>>> machines does not run on a third machine. So this indicates I have not
>>> made any breaking changes to the code or cabal files. Compiling with
>>> GHC 7.8.3.
>>> 
>>> This bad system has Ubuntu 14 installed, with an updated Linux 3.18.1
>>> kernel. It is a dual core 64 bit x86 Atom processor. The application
>>> hangs at startup. If I remove the -C0.00N option and instead use -V0,
>>> the application runs. It has bad timing properties, but it does at
>>> least run. Note that a hang hangs an IO thread talking USB, and the
>>> GUI thread.
>>> 
>>> When testing with the -C0.00N option, it did run 2 times out of 20
>>> tries, so "fail" means it fails most of the time, not every time. When
>>> it did run, it continued to run properly. This perhaps indicates some
>>> kind of internal race condition.
>>> 
>>> In the fail to run case, it does some printing up to the point where
>>> it tries to create a wxHaskell frame. In another non-UI version of the
>>> program it also fails to run. Logging to a file gives a similar
>>> indication. It is clear that the program starts up, then fails during
>>> the run in some form of lockup, well after the initial startup code.
>>> 
>>> If I run with the strace command, it always runs with -C0.00N.
>>> 
>>> All the above was done with profiling enabled, so I removed that and
>>> instead enabled eventlog to look for clues.
>>> 
>>> In this case it lies between good and bad, in that IO to my USB is
>>> working, but the GUI comes up blank and never paints. Running this
>>> case without -v0 (event log) the gui partially paints and stops, but
>>> USB continues.
>>> 
>>> Questions:
>>> 
>>> 1) Does ghc 7.8.4 have any improvements that might pertain to these
>>> kinds of scheduling/thread problems?
>>> 2) Is there anything about the nature of a thread using USB, I2C, or
>>> wxHaskell IO that leads to problems that a pure calculation app would
>>> not have?
>>> 3) Any ideas how to track down the problem when changing conditions
>>> (compiler or runtime options) affects behavior?
>>> 4) Are there other options besides -V and -C for the runtime that
>>> might apply?
>>> 5) What does -V0 do that makes a problem program run?
>>> 
>>> Mike
>>> 
>>> 
>>> 
>>> 
>>> On Oct 29, 2014, at 6:02 PM, Michael Jones <m...@proclivis.com
>>> <mailto:m...@proclivis.com>> wrote:
>>> 
>>>> John,
>>>> 
>>>> Adding -C0.005 makes it much better. Using -C0.001 makes it behave
>>>> more like -N4.
>>>> 
>>>> Thanks. This saves my project, as I need to deploy on a single core
>>>> Atom and was stuck.
>>>> 
>>>> Mike
>>>> 
>>>> On Oct 29, 2014, at 5:12 PM, John Lato <jwl...@gmail.com
>>>> <mailto:jwl...@gmail.com>> wrote:
>>>> 
>>>>> By any chance do the delays get shorter if you run your program with
>>>>> `+RTS -C0.005` ?  If so, I suspect you're having a problem very
>>>>> similar to one that we had with ghc-7.8 (7.6 too, but it's worse on
>>>>> ghc-7.8 for some reason), involving possible misbehavior of the
>>>>> thread scheduler.
>>>>> 
>>>>> On Wed, Oct 29, 2014 at 2:18 PM, Michael Jones <m...@proclivis.com
>>>>> <mailto:m...@proclivis.com>> wrote:
>>>>> 
>>>>>    I have a general question about thread behavior in 7.8.3 vs 7.6.X
>>>>> 
>>>>>    I moved from 7.6 to 7.8 and my application behaves very
>>>>>    differently. I have three threads, an application thread that
>>>>>    plots data with wxhaskell or sends it over a network (depends on
>>>>>    settings), a thread doing usb bulk writes, and a thread doing
>>>>>    usb bulk reads. Data is moved around with TChan, and TVar is
>>>>>    used for coordination.
>>>>> 
>>>>>    When the application was compiled with 7.6, my stream of usb
>>>>>    traffic was smooth. With 7.8, there are lots of delays where
>>>>>    nothing seems to be running. These delays are up to 40ms,
>>>>>    whereas with 7.6 delays were 1 ms or so.
>>>>> 
>>>>>    When I add -N2 or -N4, the 7.8 program runs fine. But on 7.6 it
>>>>>    runs fine without -N2/4.
>>>>> 
>>>>>    The program is compiled -O2 with profiling. The -N2/4 version
>>>>>    uses more memory, but in both cases with 7.8 and with 7.6 there
>>>>>    is no space leak.
>>>>> 
>>>>>    I tried to compile and use -ls so I could take a look with
>>>>>    threadscope, but the application hangs and writes no data to the
>>>>>    file. The CPU fans run wild like it is in an infinite loop. It
>>>>>    at least pops an unpainted wxhaskell window, so it got partially
>>>>>    running.
>>>>> 
>>>>>    One of my libraries uses option -fsimpl-tick-factor=200 to get
>>>>>    around the compiler.
>>>>> 
>>>>>    What do I need to know about changes to threading and event
>>>>>    logging between 7.6 and 7.8? Is there some general documentation
>>>>>    somewhere that might help?
>>>>> 
>>>>>    I am on Ubuntu 14.04 LTS. I downloaded the 7.8 tool chain tar
>>>>>    ball and installed myself, after removing 7.6 with apt-get.
>>>>> 
>>>>>    Any hints appreciated.
>>>>> 
>>>>>    Mike
>>>>> 
>>>>> 
>>>>>    _______________________________________________
>>>>>    Glasgow-haskell-users mailing list
>>>>>    Glasgow-haskell-users@haskell.org
>>>>>    <mailto:Glasgow-haskell-users@haskell.org>
>>>>>    http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
>>>>> 
>>>>> 
>>>> 
>>> 
>> 
>> 
>> 
>> 

