On Mon, 2007-09-17 at 00:08 -0600, zooko wrote: > On Sep 16, 2007, at 11:53 PM, skaller wrote: > > >> But it sure has sped up some slow programs dramatically. > > > > My experience is slightly different. The Felix gc is actually slightly > > slower with Judy than the doubly linked list I used before. > > Memory consumption is about the same I think .. not sure. > > What chip architecture are you using? Other people's reports suggest > that if your machine has 64-byte cache lines then Judy will perform > much better than with other cache line sizes.
I personally have a pair of AMD64s at the moment, but Felix is a portable compiler generating portable ISO C++, so generated code runs on all architectures and all OS with a C++ compiler. > http://permalink.gmane.org/gmane.comp.lib.judy.devel/44 > > Thanks for the nice description of your felix gc algorithm. That is > a very cool trick! Yup, the calculation is actually: Word_t cand = (Word_t)p; // candidate pointer Word_t fp=cand; Word_t *w = (Word_t*)JudyLLast(j_shape,&fp,&je); // equal or less registered pointer if(w == NULL || w == (Word_t*)(void*)PPJERR) return; // no such pointer, candidate is a foreigner, ignore it if( (*w & 1UL) == reachable) return; // already handled or handling this object, // don't recursively scan it gc_shape_t *shape = (gc_shape_t*)(*w & ~1UL); // shape of lower object unsigned long n = get_count((void*)fp) * shape->count * shape->amt; // length of the object: also uses another JudyL array if(cand >= (Word_t)(void*)((unsigned char*)(void*)fp+n)) return; // not interior to the lower object: candidate is a foreigner *w = (*w & ~1uL) | reachable; // we own the object, it's reachable // use the RTTI to find the pointers in the object and // recursively examine them This uses 2 1/2 JudyL arrays, one for the registered pointers, one for the RTTI, and 1/2 for the array count (if the object is an array). It's cute, there is no special test for NULL either: NULL is 'just another foreign pointer'. Now, the scan on all objects following which checks the low bit of the address (malloc always returns even addresses) removes garbage from the registered pointer array into the 'to be deleted array', then the finalisers for them is run by scanning that array, and the array is bulk cleared. The sequential scans could *probably* be improved if JudyL returned a 'state' object which could be used as an argument to a 'get_next' function (unsafe, would only work if the array wasn't modified). C actually makes that hard. If you're recursing down a tree you want to 'yield' the current thing, and then 'resume' where you left off in the visitation. You not only have to use an array to keep state instead of the stack, you also have to keep track of the current code location so you can jump back to it. Felix actually automates precisely that mechanism, which I call control inversion. It's the same as using a pthread with a lock to cause context switching, except there's no pthread, just a heaped C++ class object with a resume() method that uses a 'switch' to jump to where it last yielded from. If Judy was actually able to do this it would be even faster, since it would save 'recursing' down the bytes of the last key to find the place to start searching for the next key. On a 64 bit machine, addresses are 8 bytes, and 8 lookups per pointer isn't nearly as 'invisible' as 4 on a 32 bit machine, even if the high order byte nodes are already in the cache. [And yes in *w = (*w & ~1uL) | reachable; the ~1uL isn't portable, probably won't work on win64 ..] -- John Skaller <skaller at users dot sf dot net> Felix, successor to C++: http://felix.sf.net ------------------------------------------------------------------------- This SF.net email is sponsored by: Microsoft Defy all challenges. Microsoft(R) Visual Studio 2005. http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/ _______________________________________________ Judy-devel mailing list [email protected] https://lists.sourceforge.net/lists/listinfo/judy-devel
