Very much appreciated, thanks!
On Sun, Feb 10, 2013 at 3:31 PM, Moritz Heidkamp <[email protected] > wrote: > Fellow Chickeneers, > > I have finally gotten around to finishing my long standing plan of > porting the useful persistent hash map data structure from Clojure to > Chicken. Thankfully, ClojureScript grew its own implementation of it in > the meantime which written (mostly) in ClojureScript itself so porting > was way easier than with the original Java implementation. But first, > let me explain why this data structure is so nice. In a nutshell it > allows you to program in a functional style like you are used to with > lists in Scheme. As you know, SRFI 69 style hash tables don't lend > themselves very well to this style because they are mutable. Thus we > Schemers often resort to alists when we need associative data structures > which is fine in many cases but has (at least) two drawbacks: > > 1) Lookup, deletion, and insertion are fairly expensive operations of > O(n) complexity. This can be circumvented by converting the alist > into a hash table once it is "finished" but this requires a fair > amount of copying. Also, from this point on you lose the (de facto) > immutability of the alist which allowed you to program in a > functional way. So now you have to deal with all the challenges of > mutable state again. In conclusion you can't have your cake and eat > it, too. > > 2) Alists are not of a distinct type so you can't for example dispatch > or specialize on them. The closest approximation of a type predicate > is of O(n) complexity, too, as it would require to walk the whole > list to check whether all elements are indeed pairs. This is also an > issue with serialization and deserialization as their external > representation is indistinguishable from a list of pairs. > > In practice these drawbacks are usually not much of an issue but it > strikes me as a kludgy work-aroundish state of affairs. I want to use a > proper data structure when dealing with associative data without > compromising programming convenience! And this is where persistent hash > maps come into play. In essence they work like alists without the > drawbacks mentioned above: > > 1) Lookup complexity is practically constant and contrived benchmarks > show that performance is comparable to SRFI 69 hash tables. Insertion > and deletion is around O(log(n)). As an optimization the > implementation allows to turn a persistent map into a so called > transient map which can be mutated. This allows for efficient batch > modification. Later it can be turned back into a persistent map > without any copying. > > 2) Maps are implemented as a distinct type. What's currently missing is > an external representation but this can easily be added, e.g. by > means of a SRFI 10 reader extension. > > In a way, persistent hash maps give you the best of alists and hash > tables combined which is why I highly recommend you to check them > out. The egg's documentation can be found at the usual place > (http://wiki.call-cc.org/eggref/4/persistent-hash-map), the source code > lives at https://bitbucket.org/DerGuteMoritz/persistent-hash-map. There > are almost certainly still some bugs in the implementation, especially > in edge cases. I'm happy to receive bug reports if you happen to run > into those! > > With regards to the implementation I would like share a few > tidbits. Since I wanted to achieve decently performing code I took this > project as an opportunity to give Felix' programming for performance > guide a try (http://wiki.call-cc.org/programming-for-performance). It > was really helpful in reaching this goal, so thanks Felix for taking the > time to write it up! My findings are as follows: > > The original ClojureScript (and Clojure) implementation uses types to > represent different kinds of nodes. For those types several generic > functions ("protocol methods" in Clojure lingo, see > http://clojure.org/protocols) are defined. I wanted to keep this > structure in my port so as to not diverge all too far from the original > implementation. That way I hope to be able to integrate upstream patches > in the future more swiftly. To that end I tried coops and fast-generic > (as recommended in the guide) and found that coops yields slightly > smaller compiled code (about 210K versus 244K with fast-generic) while > fast-generic's dispatch is about an order of magnitude faster. The catch > is that fast-generic types are only visible inside the compiled module > but in this case this doesn't matter because the generic functions are > completely internal to the implementation so I decided to go with > fast-generic. The types themselves are represented as Scheme records. I > tried both the typed-records and the record-variants extensions. It > turned out that typed-records yielded slightly better performance than > record-variants with all declarations enabled (unsafe, unchecked, > inline). The typed-records may have additionally benefited from > pervasive type declarations on all procedures defined in the module (a > .types file is provided by the egg, too). With these things I managed to > achieve quite good performance characteristics without resorting to > direct calls to any ##sys# or ##core# functions. The only possibly not > quite portable bit is a foreign call to __builtin_popcount which is > supported by GCC since version 3.4 and by LLVM since version 1.5. Some > architectures support popcount as a native instruction (e.g. Intel CPUs > since Core i7) which can be enabled by passing -mpopcnt or, for example, > -march=corei7 to the compiler via CSC_OPTIONS. If you run into problems > with that, please let me know. Also, if you have ideas how to ooze out > some more performance I'm all ears. The next step on my plan is to try > specializations. We'll see how that goes. > > Cheers all around > Moritz > > _______________________________________________ > Chicken-users mailing list > [email protected] > https://lists.nongnu.org/mailman/listinfo/chicken-users >
_______________________________________________ Chicken-users mailing list [email protected] https://lists.nongnu.org/mailman/listinfo/chicken-users
