Simon,

Many thanks for your informative reply! Please note my other responses below.
-Pete

On Oct 19, 2006, at 5:30 AM, Simon Marlow wrote:
> Peter Tanski wrote:
>> (1) how do I obtain the latest 6.4.3 release? It is no longer on CVS.
>
> It is in CVS. I tagged it this morning, so you can check out the tag
> ghc-6-4-3 from CVS. Alternatively, you can grab the 6.4.3 source
> distribution from here: http://haskell.org/ghc/dist/6.4.3
Thanks. Got it. Here are the results of some tests:

  platform: OS X 10.4.8, Darwin (Mach) kernel version 8.8.0,
            1 processor (PowerPC 7450)
  built with: gcc-4.0.1 (Apple)
  ghc-6.4.3 configured with: $ ./configure --prefix=/opt/local

mk/build.mk contained:

  SRC_HC_OPTS      = -H32m -O -fasm -Rghc-timing
  GhcStage1HcOpts  = -O0 -DDEBUG -W
  GhcLibHcOpts     = -O
  GhcLibWays       =
  GhcRtsCcOpts     = -O2 -pg
  SplitObjs        = NO
  NoFibWays        =
  SRC_RUNTEST_OPTS += +RTS -H10m -RTS
  STRIP            = :
  GhcBootLibs      = YES

Tests were performed with the stage1 compiler.

OVERALL SUMMARY for test run started at Fri Oct 20 00:06:42 EDT 2006
    1474 total tests, which gave rise to
    4494 test cases, of which
       0 caused framework failures
     646 were skipped
    2880 expected passes
      33 expected failures
       2 unexpected passes
     517 unexpected failures
    (no ghci, some missing libraries)

Unexpected passes:
    barton-mangler-bug(optasm)
    cholewo-eval(optasm)

Notable unexpected failures (panics):
    ffi009(normal) ffi009(opt) ffi009(optasm) ffi009(threaded1)
    hs-boot(normal) hs-boot(opt) hs-boot(optasm)
    rnfail045(normal) tcfail159(normal) tcfail163(normal) tcfail164(normal)

I will look at these more closely later; if you are curious about any of them I will send the logs.
On a side note, I will make an effort to test ghc-6.4.3 and ghc-6.6 in parallel mode (with PAR defined). Even though I have a uni-processor machine, I have PVM version 3 installed and have used PVM in C programs.
>> (2) To get a working debug build of ghc-6.4.2 I had to modify the file
>> ghc/compiler/nativeGen/RegisterAlloc.hs by adding a deriving
>> declaration at line 158:
>>
>>     data FreeRegs = FreeRegs !Word32 !Word32
>>   +   deriving (Show)
>>
>> This fix was in the 6.6 branch. Is it also now in the 6.4.3 branch?
>
> This fix isn't in 6.4.3, sorry.
I attached the cvs patch to this email for your convenience. Without this fix I can't make a debug build on OS X/ppc. (Note: I do not know what directory you apply patches from, so I made this patch from the root cvs directory, that is, the directory containing fptools/ghc.)
[Attachment: RegisterAlloc-deriv_Show-patch (binary data)]
>> (3) I cheated and modified the ghc script that invokes the executable
>> ..lib/ghc-6.4.2/ghc-6.4.2 by inserting a gdb invocation in the exec
>> statement. (I was working on compiling Crypto with the original Cabal
>> setup but didn't want to resort to makefiles.):
>>
>>     # Mini-driver for GHC
>>     exec gdb --args $GHCBIN $TOPDIROPT ${1+"$@"}
>>
>> Is there a better way to go about this?
>
> I normally create a .gdbinit file with a suitable 'set args' command in
> the current directory when I'm running gdb on GHC itself.
The problem I had was with 'set args': the failure building Crypto was intermittent, and while building through Cabal there was no easy way to set a program to run for each build command before running ghc (hooks or Distribution.Make seemed overkill for a debug session). By placing 'gdb --args' in the mini-driver script, Cabal would unknowingly start gdb and pass it all the arguments for each build. For each build command Cabal sent to ghc, gdb started, and I simply had to run ghc with the appropriate arguments with 'r'.
>> (4) Would you please elaborate on the problem and the fix? The
>> problems consistently showed up in ghc/rts/GC.c:threadSqueezeStack, in
>> the variable frame (note: comments *follow* code):
>>
>>     (gdb) disas threadSqueezeStack
>>     ...
>
> threadSqueezeStack is a red herring: the problem is that two OS threads
> have mistakenly been allowed into the runtime simultaneously, and
> they're both trying to execute the same Haskell thread. The bug was in
> the Capability management, which was wrongly assuming that
> pthread_cond_wait() would only wake up after an explicit
> pthread_cond_signal(), but this is not the case. Some OSs exhibit
> spurious wakeups of pthread_cond_wait() more than others, it seems.
Thank you for the insight! I am certain you already know the quote below by heart, but for the benefit of onlookers, here it is.

IEEE Std 1003.1-2001, under pthread_cond_wait(), states:
    When using condition variables there is always a Boolean predicate
    involving shared variables associated with each condition wait that
    is true if the thread should proceed. Spurious wakeups from the
    pthread_cond_timedwait() or pthread_cond_wait() functions may occur.
    Since the return from pthread_cond_timedwait() or pthread_cond_wait()
    does not imply anything about the value of this predicate, the
    predicate should be re-evaluated upon such return.

    When a thread waits on a condition variable, having specified a
    particular mutex to either the pthread_cond_timedwait() or the
    pthread_cond_wait() operation, a dynamic binding is formed between
    that mutex and condition variable that remains in effect as long as
    at least one thread is blocked on the condition variable. During
    this time, the effect of an attempt by any thread to wait on that
    condition variable using a different mutex is undefined.
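For the onlookers again: in code, the standard's advice amounts to putting the wait inside a loop that re-checks the shared predicate. A minimal sketch of the pattern (the names here are mine, not the RTS's):

    #include <pthread.h>

    static pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  cond  = PTHREAD_COND_INITIALIZER;
    static int ready = 0;  /* the Boolean predicate the standard refers to */

    void consumer(void)
    {
        pthread_mutex_lock(&mutex);
        /* A spurious wakeup returns from pthread_cond_wait() with the
         * predicate still false, so the wait must sit inside a loop. */
        while (!ready)
            pthread_cond_wait(&cond, &mutex);
        /* ... the predicate is now known to hold ... */
        pthread_mutex_unlock(&mutex);
    }

    void producer(void)
    {
        pthread_mutex_lock(&mutex);
        ready = 1;
        pthread_cond_signal(&cond);
        pthread_mutex_unlock(&mutex);
    }

A caller that assumes a single return from pthread_cond_wait() implies a signal -- as the Capability code did -- skips the while loop and proceeds on a predicate that may still be false.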
Before I read this, my impression was that spurious wakeups occurred only on multi-processor or multi-core systems -- I am using a single-processor machine. At the time the two OS threads had entered the runtime, it might be that no thread was blocked on the condition variable (see the second quoted paragraph, above). Following your advice below:
> The problem was pretty obvious from the +RTS -Ds output, incidentally.
> However, getting the bug to reproduce with +RTS -Ds on was a bit tricky.
This is the output I get from modifying the crypto.cabal file to:

    Ghc-options: [original opts.] +RTS -Ds -RTS
        -- note: added the closing -RTS bound, since Cabal/Makefile
        -- options may follow Ghc-options

sched (task 0xa000ed88): ### Running thread 1 in bound thread
    -- Schedule.c: schedule(), if( m == mainThread )
sched (task 0xa000ed88): -->> running thread 1 ThreadRunGHC ...
    -- Schedule.c: schedule(): run_thread: (after RELEASE_LOCK())
    -- macro for RELEASE_LOCK and debugBelch in OSThreads.h
    -- (apparently my "debug" build did not define LOCK_DEBUG, so
    --  RELEASE_LOCK's debugBelch version was not defined)

Here, thread 1 (OS Thread 0 in gdb -- the Main Thread) has released the lock... all the rest follows...
sched (task 0x3005c00): worker: returning; workers waiting: 0
    -- Capability.c: waitForReturnCapability()
sched (task 0x3005c00): worker (token 2): re-entering RTS
    -- Schedule.c: resumeThread()
sched (task 0x3005c00): thread 2 did a _ccall_gc
    -- Schedule.c: suspendThread()
sched (task 0x3005c00): worker: released capability to returning worker
    -- Capability.c: releaseCapability()
    -- statement arises just after signalCondition(&returning_worker_cond)
sched (task 0x3005c00): worker (token 2): leaving RTS
    -- Schedule.c: suspendThread()
sched (task 0xa000ed88): --<< thread 2 (ThreadRunGHC) stopped, yielding
    -- Schedule.c: schedule(): case ThreadYielding:
    -- Thread 1 (mainThread: Thread 0) crashed here, before thread 2's
    --     else // (t->what_next == prev_what_next)
    --         debugBelch("...", (long)t->id, whatNext_strs[t->what_next]);
    -- could be evaluated
If I am reading the source code in rts/OSThreads, Schedule, Capability and Task correctly, for a uni-processor machine with RTS_SUPPORTS_THREADS defined:
(a) in the Scheduler there are:

        sched_mutex
        term_mutex        // this does not seem to be used
        thread_id_mutex   // this is not used in the scheduler?
        bound_cond_cache  (m->bound_thread_cond)

(b) in Capability.c there are:

        returning_worker_cond
        thread_ready_cond
        static (Condition*) passTarget

I checked the code fairly carefully and the only mutex locked seemed to be sched_mutex. Calls from
    Schedule.c:schedule()     --> Capability.c:waitForCapability()    and
    Schedule.c:resumeThread() --> Capability.c:waitForReturnCapability(),

for example, seem to only pass sched_mutex, so I don't think the problem could be due to pairing a condition variable with a mutex different from the one passed to pthread_cond_wait(&cond_var, &orig_mutex) -- in the RTS, through waitCondition().
However: ghc/rts/Schedule.c:2022-2032
    #if defined(RTS_SUPPORTS_THREADS)
        // Allocating a new condition for each thread is expensive, so we
        // cache one.  This is a pretty feeble hack, but it helps speed up
        // consecutive call-ins quite a bit.
        if (bound_cond_cache_full) {
            m->bound_thread_cond = bound_cond_cache;
            bound_cond_cache_full = 0;
        } else {
            initCondition(&m->bound_thread_cond);
        }
    #endif
That hack may become worrisome, should threads obtain separate mutexes (I did not check the code for GranSim).
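To make the worry concrete: here is a hypothetical scenario (my own sketch, not RTS code) in which a cached condition variable ends up waited on with two different mutexes. Per the second quoted paragraph above, if waiter_a is still blocked when waiter_b waits, the behaviour is undefined:

    #include <pthread.h>
    #include <stddef.h>

    static pthread_cond_t  cached_cond = PTHREAD_COND_INITIALIZER;
    static pthread_mutex_t mutex_a = PTHREAD_MUTEX_INITIALIZER;
    static pthread_mutex_t mutex_b = PTHREAD_MUTEX_INITIALIZER;
    static int pred_a = 0, pred_b = 0;

    void *waiter_a(void *arg)
    {
        (void)arg;
        pthread_mutex_lock(&mutex_a);
        while (!pred_a)                /* binds cached_cond to mutex_a */
            pthread_cond_wait(&cached_cond, &mutex_a);
        pthread_mutex_unlock(&mutex_a);
        return NULL;
    }

    void *waiter_b(void *arg)
    {
        (void)arg;
        pthread_mutex_lock(&mutex_b);
        while (!pred_b)
            /* undefined if a thread is still blocked on cached_cond
             * with mutex_a: one condition variable, two mutexes */
            pthread_cond_wait(&cached_cond, &mutex_b);
        pthread_mutex_unlock(&mutex_b);
        return NULL;
    }

As long as every bound_thread_cond is only ever paired with sched_mutex, this cannot happen, which is why the cache is safe today.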
> To support variable-length structures it's a common technique; the
> FLEXIBLE_ARRAY macro is used to work around the portability issues.
It works well now and the implementation is elegant. If I had attempted to implement the RTS it would have been more ... heavy-handed (and probably inefficient). The following are just some things to keep in mind:
a. The RTS in 6.4.3 actually requires gnu extensions:
    typedef struct {
        StgInt srt_offset;
        StgInfoTable i;
    } StgRetInfoTable;
Here, StgRetInfoTable contains a structure (StgInfoTable) that itself ends in a ZLA member. Should GHC move toward a native-Windows version this might be a problem; I do not think Microsoft's CL compiler allows a structure to contain a structure with a flexible array member.
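Reduced to a toy example (mine, simplified from the header excerpts below), the nesting looks like this; the FLEXIBLE_ARRAY-style macro shows the usual portability workaround, picking the GNU zero-length spelling where a true C99 flexible array member would be rejected:

    /* The general idiom; the RTS's own FLEXIBLE_ARRAY may differ in detail. */
    #if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
    #define FLEXIBLE_ARRAY          /* C99: data[]   */
    #else
    #define FLEXIBLE_ARRAY 0        /* GNU C: data[0] */
    #endif

    struct inner {
        int  n;
        char data[FLEXIBLE_ARRAY];
    };

    struct outer {      /* same shape as StgRetInfoTable wrapping StgInfoTable */
        int tag;
        struct inner i; /* C99 6.7.2.1 forbids a structure containing a
                         * flexible array member from being a member of
                         * another structure; gcc allows it as an
                         * extension, hence "requires gnu extensions" */
    };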
    # 280 "ghc/includes/InfoTables.h"
    typedef struct _StgInfoTable {
        StgClosureInfo layout;
        StgHalfWord type;
        StgHalfWord srt_bitmap;
        StgCode code[];
    } StgInfoTable;

    # 44 "ghc/includes/Closures.h"
    typedef struct {
        const struct _StgInfoTable* info;
    } StgHeader;

>> b. the C sizeof() operator does not correctly report the size of
>> structures containing ZLA's, so sizeof(StgInfoTable) reports 8, not 12.
>
> 8 is the correct size for StgInfoTable (on a 32-bit machine). Why do
> you think it should be 12?
I haven't used ZLA's before, so that was partly my misconception and partly an underlying disagreement I have with flexible array members in general. I had always thought of a flexible array member as a pointer; I now understand that a structure containing a ZLA/flexible array member is treated as if the member does not exist, especially for sizeof() -- even though, I would have thought, its incomplete type must reserve some space (sizeof(void), which is also an incomplete type, is 1). And the implementation does know the offset to the flexible array member, so I felt sizeof() should account for that.
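A small test (mine, not RTS code) shows what sizeof() and offsetof() actually report for such a structure, and the allocation idiom that goes with it:

    #include <stdio.h>
    #include <stdlib.h>
    #include <stddef.h>

    struct fam {          /* same shape as StgInfoTable's trailing code[] */
        int  a;
        int  b;
        char tail[];      /* contributes nothing to sizeof(struct fam) */
    };

    int main(void)
    {
        /* sizeof ignores the flexible member (modulo trailing padding)... */
        printf("sizeof(struct fam)         = %zu\n", sizeof(struct fam));
        /* ...but its offset is well-defined: it marks where the
         * caller-allocated array storage begins. */
        printf("offsetof(struct fam, tail) = %zu\n",
               offsetof(struct fam, tail));

        /* the usual allocation idiom: base size plus the trailing elements */
        size_t n = 16;
        struct fam *p = malloc(sizeof *p + n * sizeof p->tail[0]);
        if (p) { p->tail[0] = 'x'; free(p); }
        return 0;
    }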
Bug 25805 [4.0 regression], at <http://gcc.gnu.org/bugzilla/show_bug.cgi?id=25805>, is a good example of a worrisome error:
The following program fails to initialise d1.x[] using Apple's gcc 4.0.1 (build 5363), even when -fno-zero-initialized-in-bss is given (note: -fzero-initialized-in-bss is on by default). This program does not fail with a version of gcc 4.2.0 I built.
----- test-zla.c -----
#include <stdio.h>

struct s {
    int a;
    int x[];
} d1 = { 0, 0 };

int d2 = 0;

int main(int argc, const char* argv[])
{
    d2 = 1;
    if (d1.x[0] != 0)
        printf("error in initialization of d1.x[0]\n");
    d1.x[0] = 0;
    if (d1.x[0] == 0)
        printf("flexible array member now contains 0.\n");
    printf("sizeof struct s { int a; int x[]; } d1: %zu\n", sizeof(d1));
    return 0;
}
------------------------

With Apple's gcc 4.0.1, in the assembler output the label d1 is always:

    _d1:
        .space 4

With gcc 4.2.0, the label d1 is:

    _d1:
        .space 8

Of course, this bug only affects initialisation. The CCS_DECLARE macro in StgProf.h:202-215 might be affected, but I'm not certain about that.
>> 0x00c246c0 <threadSqueezeStack+88>: addi r9,r2,-12
>>     ; (((StgRetInfoTable *)(((StgClosure *)frame)->header.info) - 1))
>>     ; the -12 is the size of StgInfoTable
>
> Actually this is manipulating a pointer to StgRetInfoTable, which does
> indeed have size 12.
That was partly a typo (I had tested sizeof(StgRetInfoTable) but wrote StgInfoTable) and partly my previously stated misconception: that the size of StgInfoTable should include the correct (initialised) size of its member 'code[]'.
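To make the arithmetic concrete for onlookers, here is a self-contained sketch of that pointer manipulation, with stand-in typedefs (mine, simplified) for the RTS types quoted earlier:

    #include <stdint.h>

    typedef uint16_t StgHalfWord;
    typedef uint32_t StgClosureInfo;   /* stand-in: really a union in the RTS */
    typedef int32_t  StgInt;
    typedef uint8_t  StgCode;

    typedef struct _StgInfoTable {
        StgClosureInfo layout;         /* 4 bytes               */
        StgHalfWord    type;           /* 2 bytes               */
        StgHalfWord    srt_bitmap;     /* 2 bytes  -> sizeof 8  */
        StgCode        code[];         /* entry code follows the table */
    } StgInfoTable;

    typedef struct {                   /* nesting needs the gnu extension */
        StgInt       srt_offset;       /* 4 bytes  -> sizeof 12 */
        StgInfoTable i;
    } StgRetInfoTable;

    typedef struct { const struct _StgInfoTable *info; } StgHeader;
    typedef struct { StgHeader header; } StgClosure;

    /* The closure's info pointer points just past the info table, at the
     * entry code, so backing up one whole StgRetInfoTable (12 bytes on a
     * 32-bit machine) finds the return info table.  This subtraction is
     * the "addi r9,r2,-12" in the disassembly above. */
    static inline const StgRetInfoTable *get_ret_itbl(const StgClosure *c)
    {
        return ((const StgRetInfoTable *)c->header.info) - 1;
    }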
>> Finally, if there are alignment issues, wouldn't that be better
>> controlled explicitly through pragmas?
>
> Could you elaborate a bit? Where do you want to use alignment pragmas,
> and what would they buy us?
Hopefully they would buy speed and maybe even some space, at least on some RISC architectures. If ZLA's are used and there is any padding at the end of the structure (in place of the ZLA member or after it), the alignment of whatever follows in memory may be off. On the PowerPC, integers are optimally aligned on boundaries matching their size (i.e., a 4-byte int32_t has only 'good' performance when aligned at 2 and 'optimal' performance when aligned at 4); similarly for floating point (8-byte doubles are optimal when 8-byte aligned, etc.). There is a space (not performance) penalty for aligning members on larger boundaries.
The C99 standard (TC2), in Section 6.7.2.1 (Flexible Array Members), states:
    In most situations, the flexible array member is ignored. In
    particular, the size of the structure is as if the flexible array
    member were omitted except that it may have more trailing padding
    than the omission would imply.
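That extra trailing padding is easy to demonstrate (my own example; the exact numbers depend on the ABI):

    #include <stdio.h>
    #include <stddef.h>

    /* 'tail' is an array of doubles, so the struct takes 8-byte
     * alignment; sizeof() then includes padding after 'a', even though
     * the flexible member itself contributes no size. */
    struct padded {
        int    a;
        double tail[];
    };

    int main(void)
    {
        printf("offsetof(struct padded, tail) = %zu\n",
               offsetof(struct padded, tail));   /* typically 8 */
        printf("sizeof(struct padded)         = %zu\n",
               sizeof(struct padded));           /* typically 8, not 4 */
        return 0;
    }

If whatever follows such a structure in memory assumes a smaller alignment, it can end up mis-aligned; if it assumes a larger one, the padding is wasted space. That is the effect I had in mind.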
Gcc (4.x) seems to align structures with flexible array members (ZLAs) at ".align 2" -- on Darwin the assembler's .align argument is a power of two, so this is 4-byte alignment. For the above example program:

    _d1:
        .space 4
        .globl _d2
        .align 2

(Note: you cannot use the __alignof__ keyword to find this out, because a ZLA member has incomplete type, so you have to look at the assembly output.)
It might be interesting to measure the performance difference from using the Darwin pragma:

    #pragma options align=4

I am not certain about the effect this would have on the binary compatibility of code compiled this way with code compiled with the default alignment. Code compiled with the gcc option -fpack-struct=n is not binary compatible with other code.
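The numeric version of that experiment can also be written with #pragma pack, which gcc understands as well; a sketch (with the same caveat: anything that crosses this boundary must be compiled the same way):

    #include <stdio.h>

    /* cap member alignment at 4 bytes for this region only; push/pop
     * restores the default so unrelated headers are unaffected */
    #pragma pack(push, 4)
    struct packed4 {
        char   tag;
        double payload;    /* normally 8-aligned; here at offset 4 */
    };
    #pragma pack(pop)

    struct natural {
        char   tag;
        double payload;    /* 8-aligned: offset 8 on most ABIs */
    };

    int main(void)
    {
        printf("sizeof(struct packed4) = %zu\n", sizeof(struct packed4)); /* 12 */
        printf("sizeof(struct natural) = %zu\n", sizeof(struct natural)); /* 16 */
        return 0;
    }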
The original reason I suggested using pragmas was the potential memory savings: on x86 architectures, packing structs well might save as much space as using flexible array members. After thinking carefully about the original system, though, I doubt it was a good idea: your solution is much more elegant.
-Pete