David Thanks for looking into this. It sounds as if you made real progress.
This reply is just to say that Simon M is on holiday this week, so you won't hear back from him till next wk. Simon | -----Original Message----- | From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] | On Behalf Of David Kirkman | Sent: 13 August 2006 21:59 | To: glasgow-haskell-users@haskell.org | Subject: Mac/PPC threaded RTS problems -- potential clue | | I managed to build ghc-6.5.20060804 on a powerpc mac, and I spent a | little time Saturday night trying to work out what might be going on | with the threaded RTS. | | Running the testsuite with make WAY=threaded1 shows many (73) | failures. Many of them (the conc??? set) have to do with TVars, in | particular writing TVars. On my dual-proc G5 macs, with | ghc-6.5.20060804, the following program will hang when running with | the threaded RTS, but will work fine when compiled without -threaded. | | >module Main where | > | >import GHC.Conc | >import Control.Concurrent | > | >main = do | > t1 <- atomically (do t1 <- newTVar 80 | > return t1) | > atomically ( do writeTVar t1 1 ) | > putStr ("done\n") | | This problem seems to be powerpc specific -- it works fine with the | threaded RTS on a multi-processor intel mac (built by me, from the | same ghc-6.5.20060804 source tree I used to build the ppc compiler). | | Looking around in STM.c (via decidedly low-tech printfs) I quickly | zoomed in on cond_lock_tvar, which lead me to cas (atomic compare and | swap) in SMP.h, where I found (I think) a fairly clear error in the | powerpc code -- I've appended a patch to the end of this message. | | The problem is that the inline assembler code was placing the result | of an operation in a register that is used as input later in the code. | At the bottom of this message I've extracted a short short code | fragment that you can run through gcc (on a powerpc machine) to see | the generated assembly output. | | The changes to fix the problem are fairly simple. The first adds an | ampersand to the output list of the assembly fragment ("=r" (result) | --> "=&r" (result)) The ampersand just tells gcc that result can not | be placed in a register used for any of the input parameters (o, n, or | p). Otherwise, it feels free to place output parameters in the same | registers used by the inputs -- but because of the flow of control | here we need everything in a distinct register. This change fixes the | TVar program above. | | The second change adds a clobber list (the :"cc", "memory"). This | tells gcc that the condition code (due to the compare) and memory (due | to the store) might be changed during the asm execution. The lack of | a clobber list did not seem to be causing any trouble, but without it | gcc is free to assume that no state is changed during the execution. | | Applying the following patch to SMP.h, and rebuilding everything, I | not only fixed the simple writeTVar program, but it also fixed 8 | programs in the testsuite (conc043 -> conc049, conc052 and conc053). | The only conc test program that still fails is conc039. But there are | still many mac problems, I still have 132 unexpected failures with | make fast. At least the patch does not cause any new failures (in | either make fast or make WAY=threaded1). | | Anyway, seeing as the change to SMP.h fixes a fair number of test | cases in the testsuite, I figure there is some chance that it might | fix some of the problems that people are having with the threaded RTS. | On the other hand, I'm not real happy the the large number of | testsuite failures my build gets, so I can't really call this | 'tested'. But I'm posting because it might be a useful clue for | somebody with a little more mac/ghc experience. | | Cheers, | | -david k. | | // | // Short code to run through gcc -S. On my powermac, without the change | // the generated assembly produces | // 1: lwarx r0, 0, r0 | // load to here ---^ ^------- from address here | // But we need this value for the stwcx. | // | // with the fix, the first line of the generated assembly becomes | // 1: lwarx r11, 0, r0 | // and r0 remains unmodified if we need to use it later in the stwcx. | | /* | * CMPXCHG - the single-word atomic compare-and-exchange instruction. Used | * in the STM implementation. | */ | long cas(long* p, long o, long n) | { | long result; | | // | // Change | // :"=r" (result) --> :"=&r" (result) | // to get result and p in different registers | __asm__ __volatile__ ( | "1: lwarx %0, 0, %3\n" | " cmpw %0, %1\n" | " bne 2f\n" | " stwcx. %2, 0, %3\n" | " bne- 1b\n" | "2:" | :"=r" (result) | :"r" (o), "r" (n), "r" (p) | ); | return result; | } | | | Here's a "diff -cp" for SMP.h | | *** SMP.h Sun Aug 13 01:08:53 2006 | --- SMP-new.h Sun Aug 13 01:08:47 2006 | *************** cas(StgVolatilePtr p, StgWord o, StgWord | *** 76,83 **** | " stwcx. %2, 0, %3\n" | " bne- 1b\n" | "2:" | ! :"=r" (result) | :"r" (o), "r" (n), "r" (p) | ); | return result; | #else | --- 76,84 ---- | " stwcx. %2, 0, %3\n" | " bne- 1b\n" | "2:" | ! :"=&r" (result) | :"r" (o), "r" (n), "r" (p) | + :"cc", "memory" | ); | return result; | #else | _______________________________________________ | Glasgow-haskell-users mailing list | Glasgow-haskell-users@haskell.org | http://www.haskell.org/mailman/listinfo/glasgow-haskell-users _______________________________________________ Glasgow-haskell-users mailing list Glasgow-haskell-users@haskell.org http://www.haskell.org/mailman/listinfo/glasgow-haskell-users