Non-updateable thunks

Joachim Breitner Wed, 01 Aug 2012 03:40:27 -0700

Hello,

I’m still working on issues of performance vs. sharing; I assume some
of the people on this list have seen my "dup" paper¹ as referees.


I’m now wondering about an approach where the compiler (either
automatically or by user annotation; I’ll leave that question for later)
would mark some thunks as reentrant, i.e. simply skip the blackholing
and update frame pushing. A quick test showed that this should work
quite well. Take the usual example:
        
        import System.Environment
        main = do
            a <- getArgs
            let n = length a
            print n
            let l = [n..30000000]
            print $ last l + last l

This obviously leaks memory:

        $ ./Test +RTS -t
        0
        60000000
        <<ghc: 2400054760 bytes, 4596 GCs, 169560494/935354240 avg/max
        bytes residency (11 samples), 2121M in use, 0.00 INIT (0.00
        elapsed), 0.63 MUT (0.63 elapsed), 4.28 GC (4.29 elapsed) :ghc>>
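
The residency comes from sharing: l is a single updatable thunk, so the
first "last l" materialises the list while the second reference keeps
all of it alive. For comparison, a minimal sketch of the usual manual
workaround, which rebuilds the list at each use site. The names mkList
and sumOfLasts are hypothetical, and the sketch assumes the optimiser
does not re-share the two calls (CSE or full laziness may do so with
-O, which is part of the problem):

```haskell
import System.Environment (getArgs)

-- Hypothetical helper (not part of the original program): the list is
-- produced by a function call, so each use site builds its own list.
mkList :: Int -> Int -> [Int]
mkList lo hi = [lo .. hi]

-- Each `last` walks a separate list, which can be garbage collected
-- while it is being consumed, as long as the two calls stay unshared.
sumOfLasts :: Int -> Int -> Int
sumOfLasts lo hi = last (mkList lo hi) + last (mkList lo hi)

main :: IO ()
main = do
    a <- getArgs
    let n = length a
    print n
    print (sumOfLasts n 30000000)
```

This trades a second traversal (and a second allocation of the list
cells) for constant residency, but it depends on the compiler leaving
the two calls alone.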


I then modified the assembly (a crude but effective way of testing
this ;-)) to not push a stack frame:

$ diff -u Test.s Test-modified.s 
--- Test.s      2012-08-01 11:30:00.000000000 +0200
+++ Test-modified.s     2012-08-01 11:29:40.000000000 +0200
@@ -56,20 +56,20 @@
        leaq -40(%rbp),%rax
        cmpq %r15,%rax
        jb .LcpZ
-       addq $16,%r12
-       cmpq 144(%r13),%r12
-       ja .Lcq1
-       movq $stg_upd_frame_info,-16(%rbp)
-       movq %rbx,-8(%rbp)
+       //addq $16,%r12
+       //cmpq 144(%r13),%r12
+       //ja .Lcq1
+       //movq $stg_upd_frame_info,-16(%rbp)
+       //movq %rbx,-8(%rbp)
        movq $ghczmprim_GHCziTypes_Izh_con_info,-8(%r12)
        movq $30000000,0(%r12)
        leaq -7(%r12),%rax
-       movq %rax,-24(%rbp)
+       movq %rax,-8(%rbp)
        movq 16(%rbx),%rax
-       movq %rax,-32(%rbp)
-       movq $stg_ap_pp_info,-40(%rbp)
+       movq %rax,-16(%rbp)
+       movq $stg_ap_pp_info,-24(%rbp)
        movl $base_GHCziEnum_zdfEnumInt_closure,%r14d
-       addq $-40,%rbp
+       addq $-24,%rbp
        jmp base_GHCziEnum_enumFromTo_info
 .Lcq1:
        movq $16,192(%r13)
     
Now it runs fast and slim (and did not crash on the first try, which I
find surprising after hand-modifying the assembly code):

        $ ./Test +RTS -t
        0
        60000000
        <<ghc: 4800054840 bytes, 9192 GCs, 28632/28632 avg/max bytes
        residency (1 samples), 1M in use, 0.00 INIT (0.00 elapsed), 0.73
        MUT (0.73 elapsed), 0.04 GC (0.04 elapsed) :ghc>>
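
Note the trade-off in these numbers: allocation doubles (4800054840
vs. 2400054760 bytes) because the list is built twice, while residency
drops to a constant. The difference between the two kinds of thunk can
be modelled in plain Haskell. This is only an illustrative sketch with
hypothetical names (Updatable, Reentrant), using an IORef in place of
the closure update; it is not how the RTS implements either:

```haskell
import Data.IORef

-- Model of an updatable thunk: the first force runs the computation and
-- overwrites the cell with the result (the "update"); later forces reuse it.
newtype Updatable a = Updatable (IORef (Either (IO a) a))

newUpdatable :: IO a -> IO (Updatable a)
newUpdatable act = Updatable <$> newIORef (Left act)

forceUpdatable :: Updatable a -> IO a
forceUpdatable (Updatable ref) = do
    st <- readIORef ref
    case st of
        Right v  -> return v              -- already evaluated: shared result
        Left act -> do
            v <- act
            writeIORef ref (Right v)      -- update with the value
            return v

-- Model of a non-updateable (reentrant) thunk: nothing is cached, so
-- every force reruns the code and nothing is retained afterwards.
newtype Reentrant a = Reentrant (IO a)

forceReentrant :: Reentrant a -> IO a
forceReentrant (Reentrant act) = act

main :: IO ()
main = do
    counter <- newIORef (0 :: Int)
    -- The counter records how often the computation actually runs.
    let work = modifyIORef counter (+1) >> return (42 :: Int)
    u <- newUpdatable work
    _ <- forceUpdatable u
    _ <- forceUpdatable u
    readIORef counter >>= print   -- prints 1: evaluated once, then shared
    let r = Reentrant work
    _ <- forceReentrant r
    _ <- forceReentrant r
    readIORef counter >>= print   -- prints 3: evaluated again on each force
```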


My question is: Has anybody worked in that direction? And are there any
fundamental problems with the current RTS implementation and such
closures? 

Greetings,
Joachim


¹ http://arxiv.org/abs/1207.2017
currently not about to appear anywhere else, but I have not given up
hope yet :-)


-- 
Dipl.-Math. Dipl.-Inform. Joachim Breitner
Wissenschaftlicher Mitarbeiter
http://pp.info.uni-karlsruhe.de/~breitner
