Re: storing highly shared data structures

2006-01-12 Thread Christian Maeder

Bulat Ziganshin wrote:

char *ghc_rts_opts = -A10m;

(see 4.14.5 in GHC user manual)


Yes, thanks, that is better for me
Christian
___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


Re: storing highly shared data structures

2006-01-11 Thread Simon Marlow

Bulat Ziganshin wrote:

Hello Simon,

Tuesday, January 10, 2006, 12:26:30 PM, you wrote:



CM My old version is faster, because the version with makeStableName
does
CM very much GC.

CMMUT   time   27.28s  ( 28.91s elapsed)
CMGCtime  133.98s  (140.08s elapsed)

try to add infamous +RTS -A10m switch ;)



SM The real problem seems to be that minor GCs are taking too long.  Having 
SM said that, you can usually reduce GC overhead with large -A or -H options.


it is the same problem as with scanning large IOArrays on each GC.


Actually I think it's a different problem, with the same workaround.


in this case, i think, table of stable names scanned each time and
therefore program works so slow. if ghc will dynamically increase
+RTS -A area when there are a large IOArrays/stable names table,
this would help improve speed significantly.


I'm not keen on this, because its a hack and doesn't address the real 
cause of the problem.  We're working on addressing the array issue, see 
this ticket I created today:


  http://cvs.haskell.org/trac/ghc/ticket/650


also, i want to remind my
old suggestion to temporarily disable all GCs, or better to
temporarily change -A area at the programs request. That will help
programs like Joels one and my own, where large data structures (say,
5 or 20 mb) are created, used and then completely discarded


You can change the allocation area size from within a program quite 
easily.  Write a little C function to assign to 
RtsFlags.GcFlags.minAllocAreaSize (#include RtsFlags.h first), and 
call it from Haskell; the next time GC runs it will allocate the larger 
nursery.  Please try this and let me know if it works.


Cheers,
Simon

___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


Re: storing highly shared data structures

2006-01-11 Thread Christian Maeder

Simon Marlow wrote:
You can change the allocation area size from within a program quite 
easily.  Write a little C function to assign to 
RtsFlags.GcFlags.minAllocAreaSize (#include RtsFlags.h first), and 
call it from Haskell; the next time GC runs it will allocate the larger 
nursery.  Please try this and let me know if it works.


Can someone supply a code snippet for me (as I don't speak C and always 
hoped to avoid C by using haskell)?


Christian
___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


Re: storing highly shared data structures

2006-01-11 Thread Esa Ilari Vuokko
On 1/11/06, Christian Maeder [EMAIL PROTECTED] wrote:
 Simon Marlow wrote:
  You can change the allocation area size from within a program quite
  easily.  Write a little C function to assign to
  RtsFlags.GcFlags.minAllocAreaSize (#include RtsFlags.h first), and
  call it from Haskell; the next time GC runs it will allocate the larger
  nursery.  Please try this and let me know if it works.

 Can someone supply a code snippet for me (as I don't speak C and always
 hoped to avoid C by using haskell)?
Hi

Following compiles and runs for me (ghc 6.4.1), but I didn't test any effects.

C-bits:
#include Rts.h
#include RtsFlags.h
void SetMinAllocArea(int a) {
RtsFlags.GcFlags.minAllocAreaSize=a;
}

Haskell bits:
import Foreign.C.Types
foreign import ccall SetMinAllocArea setMinAllocArea :: CInt - IO ()

It requires compiling c-bits without --make, something like
ghc -c bits.c -o bits.o
and including it on final linking or --make step (or ar/ld, similary)
ghc bits.o --make main

HTH,
Esa
___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


Re: storing highly shared data structures

2006-01-11 Thread Simon Marlow

Esa Ilari Vuokko wrote:

On 1/11/06, Christian Maeder [EMAIL PROTECTED] wrote:


Simon Marlow wrote:


You can change the allocation area size from within a program quite
easily.  Write a little C function to assign to
RtsFlags.GcFlags.minAllocAreaSize (#include RtsFlags.h first), and
call it from Haskell; the next time GC runs it will allocate the larger
nursery.  Please try this and let me know if it works.


Can someone supply a code snippet for me (as I don't speak C and always
hoped to avoid C by using haskell)?


Hi

Following compiles and runs for me (ghc 6.4.1), but I didn't test any effects.

C-bits:
#include Rts.h
#include RtsFlags.h
void SetMinAllocArea(int a) {
RtsFlags.GcFlags.minAllocAreaSize=a;
}

Haskell bits:
import Foreign.C.Types
foreign import ccall SetMinAllocArea setMinAllocArea :: CInt - IO ()

It requires compiling c-bits without --make, something like
ghc -c bits.c -o bits.o
and including it on final linking or --make step (or ar/ld, similary)
ghc bits.o --make main


I forgot to mention, minAllocAreaSize is in units of BLOCK_SIZE, so 
divide bytes by BLOCK_SIZE before setting it.  You can get BLOCK_SIZE 
from Rts.h.


Cheers,
Simon

___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


Re: storing highly shared data structures

2006-01-10 Thread Simon Marlow

Christian Maeder wrote:

Bulat Ziganshin wrote:

CM My old version is faster, because the version with makeStableName 
does

CM very much GC.

CMMUT   time   27.28s  ( 28.91s elapsed)
CMGCtime  133.98s  (140.08s elapsed)

try to add infamous +RTS -A10m switch ;)



You saved my day, thank you Bulat!

Without that flag:

  MUT   time   24.30s  ( 24.76s elapsed)
  GCtime  131.25s  (140.01s elapsed)
  EXIT  time0.00s  (  0.00s elapsed)
  Total time  155.55s  (164.77s elapsed)

And with it:

  MUT   time   23.86s  ( 24.86s elapsed)
  GCtime   11.03s  ( 11.68s elapsed)
  EXIT  time0.00s  (  0.00s elapsed)
  Total time   34.89s  ( 36.54s elapsed)


The real problem seems to be that minor GCs are taking too long.  Having 
said that, you can usually reduce GC overhead with large -A or -H options.


Cheers,
Simon

___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


Re: storing highly shared data structures

2006-01-10 Thread Jan-Willem Maessen


On Jan 10, 2006, at 4:26 AM, Simon Marlow wrote:


Christian Maeder wrote:

Bulat Ziganshin wrote:
CM My old version is faster, because the version with  
makeStableName does

CM very much GC.

CMMUT   time   27.28s  ( 28.91s elapsed)
CMGCtime  133.98s  (140.08s elapsed)

try to add infamous +RTS -A10m switch ;)

You saved my day, thank you Bulat!
Without that flag:
  MUT   time   24.30s  ( 24.76s elapsed)
  GCtime  131.25s  (140.01s elapsed)
  EXIT  time0.00s  (  0.00s elapsed)
  Total time  155.55s  (164.77s elapsed)
And with it:
  MUT   time   23.86s  ( 24.86s elapsed)
  GCtime   11.03s  ( 11.68s elapsed)
  EXIT  time0.00s  (  0.00s elapsed)
  Total time   34.89s  ( 36.54s elapsed)


The real problem seems to be that minor GCs are taking too long.   
Having said that, you can usually reduce GC overhead with large -A  
or -H options.


Is the full table of stable names being traversed at every minor GC?   
If so, it might be worth somehow segregating the nursery stable  
names.  I bet this complicates management a bunch, though, since  
right now there's only one table to look things up in.


-Jan



Cheers,
Simon

___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


Re: storing highly shared data structures

2006-01-10 Thread Simon Marlow

Jan-Willem Maessen wrote:


On Jan 10, 2006, at 4:26 AM, Simon Marlow wrote:


Christian Maeder wrote:


Bulat Ziganshin wrote:

CM My old version is faster, because the version with  
makeStableName does

CM very much GC.

CMMUT   time   27.28s  ( 28.91s elapsed)
CMGCtime  133.98s  (140.08s elapsed)

try to add infamous +RTS -A10m switch ;)


You saved my day, thank you Bulat!
Without that flag:
  MUT   time   24.30s  ( 24.76s elapsed)
  GCtime  131.25s  (140.01s elapsed)
  EXIT  time0.00s  (  0.00s elapsed)
  Total time  155.55s  (164.77s elapsed)
And with it:
  MUT   time   23.86s  ( 24.86s elapsed)
  GCtime   11.03s  ( 11.68s elapsed)
  EXIT  time0.00s  (  0.00s elapsed)
  Total time   34.89s  ( 36.54s elapsed)



The real problem seems to be that minor GCs are taking too long.   
Having said that, you can usually reduce GC overhead with large -A  or 
-H options.



Is the full table of stable names being traversed at every minor GC?   
If so, it might be worth somehow segregating the nursery stable  names.  
I bet this complicates management a bunch, though, since  right now 
there's only one table to look things up in.


It's traversed, but not re-hashed.  So it shouldn't be a bottleneck, 
unless the table is really huge.  I agree that separating the gen 0 
stable names into a separate table would be the right fix, though.


Cheers,
Simon

___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


Re: storing highly shared data structures

2006-01-09 Thread Simon Marlow

Christian Maeder wrote:

Simon Marlow wrote:


Right - Ptr isn't the right thing here, because GC will move objects
around.  That's why we have StablePtr and StableName. 



may it be that makeStableName is expensive? (or it is my additional Map?)

My old version is faster, because the version with makeStableName does 
very much GC.


Christian

1. with makeStableName (and a Map):

2,447,401,824 bytes allocated in the heap
703,294,688 bytes copied during GC
 50,780,688 bytes maximum residency (24 sample(s))

   9328 collections in generation 0 (129.88s)
 24 collections in generation 1 (  4.10s)

 99 Mb total memory in use

  INIT  time0.00s  (  0.00s elapsed)
  MUT   time   27.28s  ( 28.91s elapsed)
  GCtime  133.98s  (140.08s elapsed)
  EXIT  time0.00s  (  0.00s elapsed)
  Total time  161.26s  (168.99s elapsed)

  %GC time  83.1%  (82.9% elapsed)

  Alloc rate89,714,143 bytes per MUT second

  Productivity  16.9% of total user, 16.1% of total elapsed


Interesting... this could mean that updating the stable name table on 
each GC is very costly.  This deserves more investigation, but I'm 
afraid it's not high on the priority list for me.  I can help if anyone 
else wants to look into it.


Cheers,
Simon

___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


Re: storing highly shared data structures

2006-01-09 Thread Christian Maeder

Bulat Ziganshin wrote:

CM My old version is faster, because the version with makeStableName does
CM very much GC.

CMMUT   time   27.28s  ( 28.91s elapsed)
CMGCtime  133.98s  (140.08s elapsed)

try to add infamous +RTS -A10m switch ;)


You saved my day, thank you Bulat!

Without that flag:

  MUT   time   24.30s  ( 24.76s elapsed)
  GCtime  131.25s  (140.01s elapsed)
  EXIT  time0.00s  (  0.00s elapsed)
  Total time  155.55s  (164.77s elapsed)

And with it:

  MUT   time   23.86s  ( 24.86s elapsed)
  GCtime   11.03s  ( 11.68s elapsed)
  EXIT  time0.00s  (  0.00s elapsed)
  Total time   34.89s  ( 36.54s elapsed)

___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


Re: storing highly shared data structures

2006-01-09 Thread Christian Maeder

Bulat Ziganshin wrote:

try to add infamous +RTS -A10m switch ;)


Maybe -H300m is more famous?

  MUT   time   24.92s  ( 29.79s elapsed)
  GCtime6.32s  (  7.67s elapsed)
  EXIT  time0.00s  (  0.00s elapsed)
  Total time   31.24s  ( 37.46s elapsed)

Christian
___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


Re: storing highly shared data structures

2006-01-06 Thread Christian Maeder

Simon Marlow wrote:

Right - Ptr isn't the right thing here, because GC will move objects
around.  That's why we have StablePtr and StableName. 


may it be that makeStableName is expensive? (or it is my additional Map?)

My old version is faster, because the version with makeStableName does 
very much GC.


Christian

1. with makeStableName (and a Map):

2,447,401,824 bytes allocated in the heap
703,294,688 bytes copied during GC
 50,780,688 bytes maximum residency (24 sample(s))

   9328 collections in generation 0 (129.88s)
 24 collections in generation 1 (  4.10s)

 99 Mb total memory in use

  INIT  time0.00s  (  0.00s elapsed)
  MUT   time   27.28s  ( 28.91s elapsed)
  GCtime  133.98s  (140.08s elapsed)
  EXIT  time0.00s  (  0.00s elapsed)
  Total time  161.26s  (168.99s elapsed)

  %GC time  83.1%  (82.9% elapsed)

  Alloc rate89,714,143 bytes per MUT second

  Productivity  16.9% of total user, 16.1% of total elapsed


2. without makeStableName:

7,560,158,340 bytes allocated in the heap
578,736,496 bytes copied during GC
 44,307,024 bytes maximum residency (25 sample(s))

  28832 collections in generation 0 (  6.83s)
 25 collections in generation 1 (  3.20s)

102 Mb total memory in use

  INIT  time0.00s  (  0.00s elapsed)
  MUT   time  139.71s  (146.74s elapsed)
  GCtime   10.03s  ( 10.94s elapsed)
  EXIT  time0.00s  (  0.00s elapsed)
  Total time  149.74s  (157.68s elapsed)

  %GC time   6.7%  (6.9% elapsed)

  Alloc rate54,113,222 bytes per MUT second

  Productivity  93.3% of total user, 88.6% of total elapsed
___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


Re: storing highly shared data structures

2006-01-02 Thread Benjamin Franksen
On Thursday 29 December 2005 18:22, Christian Maeder wrote:
 Einar Karttunen wrote:
  On 22.12 14:43, Christian Maeder wrote:
  How can I detect this sharing in order to avoid traversing the
  very same symbol table for every symbol?
 
  By using System.Mem.StableName
  SerTH (http://cs.helsinki.fi/u/ekarttun/SerTH/) implements this,
  so you can look at the source for pointers.

 Thanks for your hints. What a pity that makeStableName goes to IO
 (why would this break referential transparency?) and that StableName
 a and TypeRep have no Ord instances.

This has been asked before (by me too). The reason is that (due to 
efficient implementation issues) the order relation between TypeReps is 
not guaranteed to be the same for every program run. I guess the 
situation is similar with makeStableName, which is in the IO monad 
because the result may be different between program runs even given the 
same value as input.

Ben
___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


Re: storing highly shared data structures

2005-12-29 Thread Christian Maeder

Einar Karttunen wrote:

On 22.12 14:43, Christian Maeder wrote:

How can I detect this sharing in order to avoid traversing the very
same symbol table for every symbol?


By using System.Mem.StableName
SerTH (http://cs.helsinki.fi/u/ekarttun/SerTH/) implements this,
so you can look at the source for pointers.


Thanks for your hints. What a pity that makeStableName goes to IO (why 
would this break referential transparency?) and that StableName a and 
TypeRep have no Ord instances.


Facing RealWorld (and HashTable),
Christian
___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


RE: storing highly shared data structures

2005-12-23 Thread Simon Marlow
On 22 December 2005 13:43, Christian Maeder wrote:

 for storing highly shared data structures we use so called Annotated
 Terms (shortly ATerms, details below).
 
http://www.cwi.nl/htbin/sen1/twiki/bin/view/SEN1/ATerm
 
 In contrast to the Binary (or GhcBinary) class we compute the sharing,
 which saves a lot of space for data types that keep redundant
 information.
 
 With this we can store some of our data structures (of course only
 non-cyclic and finite ones) in a few KBs that need MBs if stored
 without sharing (as when using the Binary or the Show/Read
 classes).
 
 So far so good. The problem remaining is that an object is _traversed_
 as if being unshared and thus the _time_ for the ATermTable
 construction becomes too long for us.
 
 GHC's internal data structures (on the heap) are in many cases shared,
 by pointer references. I.e. if I add a (single) symbol table to every
 symbol that I use, then the symbol table will not be copied, but only
 a reference added to my symbol.
 
 How can I detect this sharing in order to avoid traversing the very
 same symbol table for every symbol?
 
 I've tried to use a Map (Ptr ()) ShATerm. So before traversing an
 object I look up its address and check if is was traversed before (if
 not I traverse it and store it in my map for future lookups).
 
 1.) I'm not sure if it is safe to use Ptr () (meanwhile garbage
 collection, heap compaction and what not could happen).

Right - Ptr isn't the right thing here, because GC will move objects
around.  That's why we have StablePtr and StableName.  In fact, what you
really want here is the pointer-equality memo table implementation in
the Memo module (package util).  This is scheduled to be removed in 6.6,
but it will be available as a Cabal package.

Cheers,
Simon
___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


Re: storing highly shared data structures

2005-12-22 Thread Einar Karttunen
On 22.12 14:43, Christian Maeder wrote:
 How can I detect this sharing in order to avoid traversing the very
 same symbol table for every symbol?

By using System.Mem.StableName
SerTH (http://cs.helsinki.fi/u/ekarttun/SerTH/) implements this,
so you can look at the source for pointers.

 I've tried to use a Map (Ptr ()) ShATerm. So before traversing an
 object I look up its address and check if is was traversed before (if
 not I traverse it and store it in my map for future lookups).

GHC can move the objects in memory.

 2.) A single Map (Ptr ()) ShATerm does not work! It seems that
 sometimes objects are shared that have different types (as tests
 revealed). This seems obvious for newtypes but also happens (I think)
 without newtype declarations.

Yes, this is quite evil. For example the empty list can be shared
across type boundaries. Also phantom types can share the same
address. In addition you have to worry about libraries using
unsafeCoerce#

 So, I'm now thinking if I should try using the type of my data as key
 as well, i.e. using Map (TypeRep, Ptr ()) ShATerm to avoid duplicate
 traversals.

TypeRep is not Ord iirc, which makes this harder. Of course you can
show it and then use that.

- Einar

___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users