Simon,

This was fixed some time back. I combed the code base looking for other busy loops and there are no more. I commented out the code that runs the I2C + Machines + IO stuff, and only left the GUI code. It appears that just the wxhaskell part of the program fails to start. This matches a previous observation based on printing.

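For anyone following the busy-waiting discussion quoted further down, here is a minimal sketch of the blocking alternative. It is not the project code: it uses only the stm package, and the names waitEnabled, nextWhenEnabled and demo are made up for illustration, with a TChan standing in for the real transaction source. The idea is that reading the TVar and calling check in one transaction makes the thread sleep until the TVar is written, instead of spinning while it is False.

```haskell
import Control.Concurrent (forkIO, threadDelay)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Concurrent.STM
import Control.Monad (forM_, replicateM)

-- Block until the flag is True. `check False` calls `retry`, which
-- suspends the thread until a TVar read in this transaction changes,
-- so no CPU is burned while the flag is False.
waitEnabled :: TVar Bool -> STM ()
waitEnabled flag = readTVar flag >>= check

-- Take the next item from the channel, but only once the flag is True.
-- Both conditions are waited on in a single transaction.
nextWhenEnabled :: TVar Bool -> TChan a -> STM a
nextWhenEnabled flag chan = waitEnabled flag >> readTChan chan

-- Hypothetical demo: a consumer thread collects three items, but only
-- after the main thread flips the flag.
demo :: IO [Int]
demo = do
  flag <- newTVarIO False
  chan <- newTChanIO
  done <- newEmptyMVar
  _ <- forkIO $ do
    xs <- replicateM 3 (atomically (nextWhenEnabled flag chan))
    putMVar done xs
  -- Items are queued while the consumer is still blocked on the flag.
  atomically $ forM_ [1, 2, 3] (writeTChan chan)
  threadDelay 10000
  atomically $ writeTVar flag True
  takeMVar done
```

Loading this in GHCi and running demo should return the queued items in order once the flag is set. The design point is that the wait happens inside the transaction, so the runtime wakes the thread on a write to the TVar rather than relying on -C to reschedule a spinning loop.
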
I’ll see if I can hack up the code to a minimal set that I can publish. All the IP is in the I2C code, so I might be able to get it down to one file.

Mike

On Jan 19, 2015, at 3:37 AM, Simon Marlow <marlo...@gmail.com> wrote:

> Hi Michael,
>
> Previously in this thread it was pointed out that your code was doing busy
> waiting, and so the problem can be fixed by modifying your code to not do
> busy waiting. Did you do this? The -C flag is just a workaround which will
> make the RTS reschedule more often, it won't fix the underlying problem.
>
> The code you showed us was:
>
> sendTransactions :: MonadIO m => SMBusDevice DeviceDC590 -> TVar Bool ->
> ProcessT m (Spec, String) ()
> sendTransactions dev dts = repeatedly $ do
>   dts' <- liftIO $ atomically $ readTVar dts
>   when (dts' == True) (do
>     (_, transactions) <- await
>     liftIO $ sendOut dev transactions)
>
> This loops when the contents of the TVar is False.
>
> Cheers,
> Simon
>
> On 18/01/2015 01:15, Michael Jones wrote:
>> I have narrowed down the problem a bit. It turns out that many times if
>> I run the program and wait long enough, it will start. Given an event
>> log, it may take from 1000-10000 entries sometimes.
>>
>> When I look at a good start vs. slow start, I see that in both cases
>> things startup and there is some thread activity for thread 2 and 3,
>> then the application starts creating other threads, which is when the
>> wxhaskell GUI pops up and IO out my /dev/i2c begins. In the slow case,
>> it just gets stuck on thread 2/3 activity for a very long time.
>>
>> If I switch from -C0.001 to -C0.010, the startup is more reliable, in
>> that most starts result in an immediate GUI and i2c IO.
>>
>> The behavior suggests to me that some initial threads are starving the
>> ability for other threads to start, and perhaps on a dual core machine
>> it is more of a problem than single or quad core machines.
>> For certain, due to some printing, I know that the main thread is
>> starting, and that a print just before the first fork is not printing.
>> Code between them is evaluating wxhaskell functions, but the main frame
>> is not yet asked to become visible. From last week, I know that a
>> non-gui version of the app is getting stuck, but I do not know if it
>> eventually runs like this case.
>>
>> Is there some convention that when I look at an event log you can tell
>> which threads are OS threads vs threads from fork?
>>
>> Perhaps someone that knows the scheduler might have some advice. It
>> seems odd that a scheduler could behave this way. The scheduler should
>> have some built in notion of fairness.
>>
>>
>> On Jan 12, 2015, at 11:02 PM, Michael Jones <m...@proclivis.com
>> <mailto:m...@proclivis.com>> wrote:
>>
>>> Sorry I am reviving an old problem, but it has resurfaced, such that
>>> one system behaves different than another.
>>>
>>> Using -C0.001 solved problems on a Mac + VM + Ubuntu 14. It worked on
>>> a single core 32 bit Atom NUC. But on a dual core Atom MinnowBoardMax,
>>> something bad is going on. In summary, the same code that runs on two
>>> machines does not run on a third machine. So this indicates I have not
>>> made any breaking changes to the code or cabal files. Compiling with
>>> GHC 7.8.3.
>>>
>>> This bad system has Ubuntu 14 installed, with an updated Linux 3.18.1
>>> kernel. It is a dual core 64 bit I86 Atom processor. The application
>>> hangs at startup. If I remove the -C0.00N option and instead use -V0,
>>> the application runs. It has bad timing properties, but it does at
>>> least run. Note that a hang hangs an IO thread talking USB, and the
>>> GUI thread.
>>>
>>> When testing with the -C0.00N option, it did run 2 times out of 20
>>> tries, so fail means fail most but not all of the time. When it did
>>> run, it continued to run properly. This perhaps indicates some kind of
>>> internal race condition.
>>>
>>> In the fail to run case, it does some printing up to the point where
>>> it tries to create a wxHaskell frame. In another non-UI version of the
>>> program it also fails to run. Logging to a file gives a similar
>>> indication. It is clear that the program starts up, then fails during
>>> the run in some form of lockup, well after the initial startup code.
>>>
>>> If I run with the strace command, it always runs with -C0.00N.
>>>
>>> All the above was done with profiling enabled, so I removed that and
>>> instead enabled eventlog to look for clues.
>>>
>>> In this case it lies between good and bad, in that IO to my USB is
>>> working, but the GUI comes up blank and never paints. Running this
>>> case without -v0 (event log) the gui partially paints and stops, but
>>> USB continues.
>>>
>>> Questions:
>>>
>>> 1) Does ghc 7.8.4 have any improvements that might pertain to these
>>> kinds of scheduling/thread problems?
>>> 2) Is there anything about the nature of a thread using USB, I2C, or
>>> wxHaskell IO that leads to problems that a pure calculation app would
>>> not have?
>>> 3) Any ideas how to track down the problem when changing conditions
>>> (compiler or runtime options) affects behavior?
>>> 4) Are there other options besides -V and -C for the runtime that
>>> might apply?
>>> 5) What does -V0 do that makes a problem program run?
>>>
>>> Mike
>>>
>>>
>>> On Oct 29, 2014, at 6:02 PM, Michael Jones <m...@proclivis.com
>>> <mailto:m...@proclivis.com>> wrote:
>>>
>>>> John,
>>>>
>>>> Adding -C0.005 makes it much better. Using -C0.001 makes it behave
>>>> more like -N4.
>>>>
>>>> Thanks. This saves my project, as I need to deploy on a single core
>>>> Atom and was stuck.
>>>>
>>>> Mike
>>>>
>>>> On Oct 29, 2014, at 5:12 PM, John Lato <jwl...@gmail.com
>>>> <mailto:jwl...@gmail.com>> wrote:
>>>>
>>>>> By any chance do the delays get shorter if you run your program with
>>>>> `+RTS -C0.005` ?
>>>>> If so, I suspect you're having a problem very
>>>>> similar to one that we had with ghc-7.8 (7.6 too, but it's worse on
>>>>> ghc-7.8 for some reason), involving possible misbehavior of the
>>>>> thread scheduler.
>>>>>
>>>>> On Wed, Oct 29, 2014 at 2:18 PM, Michael Jones <m...@proclivis.com
>>>>> <mailto:m...@proclivis.com>> wrote:
>>>>>
>>>>> I have a general question about thread behavior in 7.8.3 vs 7.6.X
>>>>>
>>>>> I moved from 7.6 to 7.8 and my application behaves very
>>>>> differently. I have three threads, an application thread that
>>>>> plots data with wxhaskell or sends it over a network (depends on
>>>>> settings), a thread doing usb bulk writes, and a thread doing
>>>>> usb bulk reads. Data is moved around with TChan, and TVar is
>>>>> used for coordination.
>>>>>
>>>>> When the application was compiled with 7.6, my stream of usb
>>>>> traffic was smooth. With 7.8, there are lots of delays where
>>>>> nothing seems to be running. These delays are up to 40ms,
>>>>> whereas with 7.6 delays were 1ms or so.
>>>>>
>>>>> When I add -N2 or -N4, the 7.8 program runs fine. But on 7.6 it
>>>>> runs fine without -N2/4.
>>>>>
>>>>> The program is compiled -O2 with profiling. The -N2/4 version
>>>>> uses more memory, but in both cases with 7.8 and with 7.6 there
>>>>> is no space leak.
>>>>>
>>>>> I tried to compile and use -ls so I could take a look with
>>>>> threadscope, but the application hangs and writes no data to the
>>>>> file. The CPU fans run wild like it is in an infinite loop. It
>>>>> at least pops an unpainted wxhaskell window, so it got partially
>>>>> running.
>>>>>
>>>>> One of my libraries uses option -fsimpl-tick-factor=200 to get
>>>>> around the compiler.
>>>>>
>>>>> What do I need to know about changes to threading and event
>>>>> logging between 7.6 and 7.8? Is there some general documentation
>>>>> somewhere that might help?
>>>>>
>>>>> I am on Ubuntu 14.04 LTS.
>>>>> I downloaded the 7.8 tool chain tar
>>>>> ball and installed myself, after removing 7.6 with apt-get.
>>>>>
>>>>> Any hints appreciated.
>>>>>
>>>>> Mike
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> Glasgow-haskell-users mailing list
>>>>> Glasgow-haskell-users@haskell.org
>>>>> <mailto:Glasgow-haskell-users@haskell.org>
>>>>> http://www.haskell.org/mailman/listinfo/glasgow-haskell-users