Very much appreciated, thanks!

On Sun, Feb 10, 2013 at 3:31 PM, Moritz Heidkamp <[email protected]
> wrote:

> Fellow Chickeneers,
>
> I have finally gotten around to finishing my long standing plan of
> porting the useful persistent hash map data structure from Clojure to
> Chicken. Thankfully, ClojureScript grew its own implementation of it in
> the meantime which written (mostly) in ClojureScript itself so porting
> was way easier than with the original Java implementation. But first,
> let me explain why this data structure is so nice. In a nutshell it
> allows you to program in a functional style like you are used to with
> lists in Scheme. As you know, SRFI 69 style hash tables don't lend
> themselves very well to this style because they are mutable. Thus we
> Schemers often resort to alists when we need associative data structures
> which is fine in many cases but has (at least) two drawbacks:
>
> 1) Lookup, deletion, and insertion are fairly expensive operations of
>    O(n) complexity. This can be circumvented by converting the alist
>    into a hash table once it is "finished" but this requires a fair
>    amount of copying. Also, from this point on you lose the (de facto)
>    immutability of the alist which allowed you to program in a
>    functional way. So now you have to deal with all the challenges of
>    mutable state again. In conclusion you can't have your cake and eat
>    it, too.
>
> 2) Alists are not of a distinct type so you can't for example dispatch
>    or specialize on them. The closest approximation of a type predicate
>    is of O(n) complexity, too, as it would require to walk the whole
>    list to check whether all elements are indeed pairs. This is also an
>    issue with serialization and deserialization as their external
>    representation is indistinguishable from a list of pairs.
>
> In practice these drawbacks are usually not much of an issue but it
> strikes me as a kludgy work-aroundish state of affairs. I want to use a
> proper data structure when dealing with associative data without
> compromising programming convenience! And this is where persistent hash
> maps come into play. In essence they work like alists without the
> drawbacks mentioned above:
>
> 1) Lookup complexity is practically constant and contrived benchmarks
>    show that performance is comparable to SRFI 69 hash tables. Insertion
>    and deletion is around O(log(n)). As an optimization the
>    implementation allows to turn a persistent map into a so called
>    transient map which can be mutated. This allows for efficient batch
>    modification. Later it can be turned back into a persistent map
>    without any copying.
>
> 2) Maps are implemented as a distinct type. What's currently missing is
>    an external representation but this can easily be added, e.g. by
>    means of a SRFI 10 reader extension.
>
> In a way, persistent hash maps give you the best of alists and hash
> tables combined which is why I highly recommend you to check them
> out. The egg's documentation can be found at the usual place
> (http://wiki.call-cc.org/eggref/4/persistent-hash-map), the source code
> lives at https://bitbucket.org/DerGuteMoritz/persistent-hash-map.  There
> are almost certainly still some bugs in the implementation, especially
> in edge cases. I'm happy to receive bug reports if you happen to run
> into those!
>
> With regards to the implementation I would like share a few
> tidbits. Since I wanted to achieve decently performing code I took this
> project as an opportunity to give Felix' programming for performance
> guide a try (http://wiki.call-cc.org/programming-for-performance). It
> was really helpful in reaching this goal, so thanks Felix for taking the
> time to write it up! My findings are as follows:
>
> The original ClojureScript (and Clojure) implementation uses types to
> represent different kinds of nodes. For those types several generic
> functions ("protocol methods" in Clojure lingo, see
> http://clojure.org/protocols) are defined. I wanted to keep this
> structure in my port so as to not diverge all too far from the original
> implementation. That way I hope to be able to integrate upstream patches
> in the future more swiftly. To that end I tried coops and fast-generic
> (as recommended in the guide) and found that coops yields slightly
> smaller compiled code (about 210K versus 244K with fast-generic) while
> fast-generic's dispatch is about an order of magnitude faster. The catch
> is that fast-generic types are only visible inside the compiled module
> but in this case this doesn't matter because the generic functions are
> completely internal to the implementation so I decided to go with
> fast-generic. The types themselves are represented as Scheme records. I
> tried both the typed-records and the record-variants extensions. It
> turned out that typed-records yielded slightly better performance than
> record-variants with all declarations enabled (unsafe, unchecked,
> inline). The typed-records may have additionally benefited from
> pervasive type declarations on all procedures defined in the module (a
> .types file is provided by the egg, too). With these things I managed to
> achieve quite good performance characteristics without resorting to
> direct calls to any ##sys# or ##core# functions. The only possibly not
> quite portable bit is a foreign call to __builtin_popcount which is
> supported by GCC since version 3.4 and by LLVM since version 1.5. Some
> architectures support popcount as a native instruction (e.g. Intel CPUs
> since Core i7) which can be enabled by passing -mpopcnt or, for example,
> -march=corei7 to the compiler via CSC_OPTIONS. If you run into problems
> with that, please let me know. Also, if you have ideas how to ooze out
> some more performance I'm all ears. The next step on my plan is to try
> specializations. We'll see how that goes.
>
> Cheers all around
> Moritz
>
> _______________________________________________
> Chicken-users mailing list
> [email protected]
> https://lists.nongnu.org/mailman/listinfo/chicken-users
>
_______________________________________________
Chicken-users mailing list
[email protected]
https://lists.nongnu.org/mailman/listinfo/chicken-users

Reply via email to