Looking back through old mail.  I mailed this to the Hugs maintainers
about 2 years ago, but I think it is still relevant.  It contains a
suggestion about why GC doesn't always work right on Hugs on sparcs
when type.c/static.c/whatever are compiled -O/-O2, and an easy fix.

J

| -----Original Message-----
| From: Julian Seward (Intl Vendor) [mailto:[EMAIL PROTECTED]] 
| Sent: Thursday, April 22, 1999 10:13 AM
| To: '[EMAIL PROTECTED]'
| Cc: '[EMAIL PROTECTED]'
| Subject: Observation re Hugs GC problem on sparcs w/gcc -O2
| 
| 
| 
| Mark
| 
| I seem to gather that (normal) Hugs has GC problems on Sparc 
| with optimisation on.  David W also reported this recently.
| 
| I got the impression that you think this happens because the 
| register window mechanism magically obscures some registers 
| which could hold roots, at the point when GC occurs.
| 
| I would like to offer a different explanation, plus an easy 
| fix.  When I was working for the UFO project at Manchester, 
| we did a collector which took roots from the C stack.  It too 
| had problems on sparcs, and the following observation saved 
| the day.  I think it might equally apply to Hugs.
| 
| In the code below, complexFnReturningAPair is obvious,
| and ... compute_and_alloc ... refers to some code which
| may cause a garbage collection.
| 
| 
| Pair complexFnReturningAPair ( ... )   { .... }
| 
| 
|    Pair p;
| 
|    p = complexFnReturningAPair ( ... );
|    for ( ... ) {
|       ... fst(p) ... snd(p) ...
|       ... compute_and_alloc ...
|    }
| 
| So: p becomes a pair.  We then enter a loop which refers to
| fst(p) and/or snd(p), and in which GC could happen. After cppisation, 
| the code looks like
| 
|    p = complexFnReturningAPair ( ... );
|    for ( ... ) {
|       ... heapTopFst[p] ... heapTopSnd[p] ...
|       ... compute_and_alloc ...
|    }
| 
| Expanding the array address calculations gives
| 
|    p = complexFnReturningAPair ( ... );
|    for ( ... ) {
|       ... heapTopFst + 4*p ... heapTopSnd + 4*p ...
|       ... compute_and_alloc ...
|    }
| 
| Now the critical step, done by gcc when optimising: lift
| 4*p out of the loop since it's loop-invariant, and
| (incidentally) CSE the two 4*p's together:
| 
|    tmp = 4 * complexFnReturningAPair ( ... );
|    for ( ... ) {
|       ... heapTopFst + tmp ... heapTopSnd + tmp ...
|       ... compute_and_alloc ...
|    }
| 
| Blargh!  Now, inside the loop, there is no mention of the 
| original p returned by complexFnReturningAPair, neither in 
| regs nor on the stack.  So the pair won't be retained by GC 
| even though it's still live in the loop.
| 
| The moral of the story is this: when using an 
| array-index-addressed heap, as Hugs does, we need to treat as 
| a root not only valid array indexes into the heap, but also 
| intermediates created by the array address calculations.  That is:
| 
|    p                           sat -MAX_HEAP < p <= 0
|    p*sizeof(int)               sat -MAX_HEAP < p <= 0
|    heapTopFst+ p*sizeof(int)   sat -MAX_HEAP < p <= 0
|    heapTopSnd+ p*sizeof(int)   sat -MAX_HEAP < p <= 0
| 
| Looking in machdep.c:gcCStack(), I only see the first of 
| those four possibilities handled.  Perhaps the following 
| would be better:
| 
| #define Blargh markWithoutMove(*ptr); \
|                markWithoutMove((*ptr)/sizeof(Cell)); \
|                markWithoutMove(( 
| (void*)(*ptr)-(void*)heapTopFst)/sizeof(Cell));  \
|                markWithoutMove((
| (void*)(*ptr)-(void*)heapTopSnd)/sizeof(Cell))
| 
| #define StackGrowsDown  { while (ptr<=CStackBase) { Blargh; 
| ptr++; }; }
| #define StackGrowsUp    { while (ptr>=CStackBase) { Blargh; 
| ptr--; }; }
| #define GuessDirection  if (ptr>CStackBase) StackGrowsUp else 
| StackGrowsDown
| 
| I tried this on x86 and it made no difference at all, since 
| there isn't a problem for x86 anyway.  I'd be interested to 
| hear if it makes any difference on Sparcs.  (note -- the code 
| might not be exactly right). Check if you try it ...
| 
| Finally, I think I know why the problem doesn't strike x86s.  
| That is because the x86 has addressing modes of the form
| 
|    (reg1 + k*reg2)    for k as 1, 2, 4 or 8
| 
| So a reference to heapTopFst[p] becomes 
| 
|    ; reg1 holds p
|    ; reg2 holds heapTopFst
|    movl (reg1 + 4*reg2), reg3
| 
| and there is never any point precomputing 4*p, since the 
| architecture can do it for free.
| 
| The entire diatribe applies not only to sparcs but other 
| riscs which have lots of registers but only simple addressing 
| modes.  Have you had problem reports for (eg) MIPS too?
| 
| J
| 

_______________________________________________
Hugs-Bugs mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/hugs-bugs

Reply via email to