On Tue, Jul 27, 2010 at 12:42:27AM +0100, Thomas Schilling wrote: > > * a garbage colletor which can be enabled with '-fjgc' is available. > > after some more testing, I will make it the default in the 0.7.5 release. > > * A new memory allocator based on Jeff Bonwick's slab memory allocator for > > kernels. This means that in many cases, memory can be pulled off a slab > > and immediately used with no initialization, in addition, it allowed a > > rearrangement of the GC meta-data to optimizally use the cache. > > That sounds like you're essentially using BIBOP (big bag of pages, > http://www.memorymanagement.org/glossary/b.html#bibop). Did you > compare the overheads with the header-based scheme? Do you still need > a tag/header word or do you get the tag by looking up the per page > info? Do you use this scheme for all objects or only for small > objects? I could imagine a hybrid scheme where where you use bibop > for, say, single-constructor objects, although I wonder how you would > deal with thunks and their update.
Yeah, it is somewhat similar to that, except I don't need to look up in a table what the type is, I just take the address modulo the block size and there is a shared info section at the front of the block. all the GC bookeeping bits are kept here too and can usually fit on a single cache line, that way you can mark a object used without actually having to dereference a pointer to it, allowing the cache line for the object itself to fill hopefully by the time I get around to it on the grey list. objects that contain no pointers are never touched by the GC, an advantage of keeping the bookeeping bits away from the objects. There is no per object overhead/tag word imposed by the storage manager, other than the one or two bits used by the GC. (one bit right now, but I will need to expand it to add some GC features that are missing (destructors, incremential collection)). I currently use this scheme for all objects, however, beyond a certain size I think a header approach might be better, so I will look into that. In particluar, my arrary sizes are limited to what can fit in a block right now when you have the GC on, which is a limitation I need to fix. I did experiment with a header scheme, however, it was never really optimized that well so I don't think I could make performance conclusions from it. I may add a switch to control this though. Honestly, a slab inspired allocator has been my plan since day one of writing jhc, I had a fair amount of experience with them from my previous OS work and felt they would be a good fit for jhcs model. > > * packed representation of algebraic types, 'Maybe Foo' is actually > > represented in memory as a NULL pointer or just a 'Foo' with no tag bits. > > Combined with the slab allocator, this can double the number of some > > common > > values that can be put in a cache line. > > Interesting. How do you distinguish 'Just e' from 'Just _|_'? Do > you need the whole program assumption to disprove that the latter case > can happen? each word has two tag bits, one says whether the value is in WHNF, the other says whether it should be followed by the GC. So to eval something, I check the tag bit, if it says WHNF I am done, otherwise the object must be one of a thunk or an indirection, if it is a thunk, it has a code pointer as its first entry, otherwise it is a redirection that _must_ be a WHNF pointer. Unlike ghc, there are strong invarients about what sort of objects can appear where, WHNF and non WHNF values actually have different types in Grin and C. For instance, an indirection to an indirection is impossible, as is redirecting a whnf value, this also means a lot of things that are easy for ghc, a collector that moves things around incrementally, implementing concurrent locks by overwriting an evaluation pointer, etc are difficult in jhc. So there are tradeoffs. (a selectable alternate RTS may mitigate this though) Now, the interesting thing is that if the WHNF bit is set, there are no rts or gc imposed restrictions on the word whatsoever, it need not even be a pointer, and often isn't. (unless the gc flag is set, in which case it has to be a heap pointer, but the format of what it is pointing to is completly undefined). For instance, data Foo !Word8 !Word8 will theoretically be packed directly into the word itself as the two most signifigant bytes. For each type I can statically generate an "optimal" layout based on its structure. For instance, maybe benefits from two of these optimizations, first of all, nullary constructors (Nothing) need never appear in the heap, so they are given values that pack directly into a word, this happens for all nullay constructors in general. Then, since there is only one constructor left (Just) we can discard the tag field, because if something of type Maybe, is in WHNF, and is not Nothing then in must be a Just. so the Just is a single heap allocated word (which are packed end to end in a page with no overhead) I am refining the optimization algorithm, on 64 bit machines I have another few bits to play with for instance that I can take advantage of. A particularly nice case is that characters can be fully integreated into the word, so the entire space usage of "Hello" is 10 words as lists benefit from the same packing benefits as Maybe. compare this to 30 or so words used in a traditional "everything is on the heap and tagged" model. The manual has a section describing the RTS, it doesn't describe the GC though, but talks about how I pack things in the pointers. http://repetae.net/computer/jhc/manual.html#the-run-time-system John -- John Meacham - ⑆repetae.net⑆john⑈ - http://notanumber.net/ _______________________________________________ Haskell mailing list [email protected] http://www.haskell.org/mailman/listinfo/haskell
