Hello,
tl;dr: I would like some input, because the crypto-hash function for edn is a basic building block for distributing values and cannot easily be changed later.

I have developed a cross-platform implementation that creates UUID5 values from hashes, so that values can be stored safely in untrusted distributed environments (1). It is for a p2p synchronisation system we are working on (2). I am pretty sure that I don't introduce collisions, but please prove me wrong.

The conceptually critical part is using a commutative operation for commutative data structures like maps and sets. I first hash each key-value entry and then XOR the entry hashes, so you cannot, for example, easily make two key-value entries collide before they enter the crypto-hash function. The problem is that this makes hashing maps more expensive (3).

My goal is to hash a map of 1 million entries (a blob of a few megabytes) in roughly a second or less, so that hashing takes about as long as transmitting the data over mobile or slow internet connections between untrusted peers (for instance DataScript JavaScript mobile clients and a Datomic database on the server), which keeps synchronisation latency low. I want to push the speed as far as I can without sacrificing safety, because it determines the boundaries of the distributed systems that can be built on top of it. Inside trusted networks hashing can always be disabled and the UUIDs treated as random ones, but this weakens the overall system. An alternative is to avoid commutative data structures completely, which can reach up to native byte-array speed but gives up Clojure's value semantics.

I had a look at cheshire (4) and transit-java (5), and I might be able to squeeze out a factor of 2 by dispatching on JVM types instead of protocols. But this is already a fairly low-level and limited optimization (sequential hashing is fairly fast, and the profiler only sometimes shows protocol overhead), and I might be missing good ideas for speeding up the commutative crunching, so I wanted to ask here first.
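To make the commutative step concrete, here is a minimal Python sketch of the idea. It is not hasch's actual code or encoding: the one-byte type tags, the choice of SHA-512, the integer/string encodings and the names `edn_hash`/`edn_uuid` are all assumptions for illustration, and the two reserved revision bits mentioned in (1) are not modelled. Sequences concatenate their child digests (order matters), each map entry is hashed as one unit before the entry digests are XORed, sets XOR their element digests, and the result is truncated into a UUID. Strictly, RFC 4122 version 5 means SHA-1 over a namespace and a name; here only the version and variant bits are set on a truncated SHA-512 digest.

```python
import hashlib
import uuid

# Hypothetical one-byte type tags (not hasch's real encoding).
TAG_NIL, TAG_BOOL, TAG_INT, TAG_STR, TAG_SEQ, TAG_MAP, TAG_SET, TAG_ENTRY = range(8)


def _h(tag: int, payload: bytes) -> bytes:
    """SHA-512 over a one-byte type tag followed by the payload."""
    return hashlib.sha512(bytes([tag]) + payload).digest()


def _xor(digests) -> bytes:
    """Order-independent combination of 64-byte digests."""
    acc = 0
    for d in digests:
        acc ^= int.from_bytes(d, "big")
    return acc.to_bytes(64, "big")


def edn_hash(value) -> bytes:
    if value is None:
        return _h(TAG_NIL, b"")
    if isinstance(value, bool):  # test before int: bool subclasses int
        return _h(TAG_BOOL, b"\x01" if value else b"\x00")
    if isinstance(value, int):
        return _h(TAG_INT, str(value).encode("ascii"))
    if isinstance(value, str):
        return _h(TAG_STR, value.encode("utf-8"))
    if isinstance(value, (list, tuple)):
        # Sequential: order matters, so concatenate the child digests.
        return _h(TAG_SEQ, b"".join(edn_hash(x) for x in value))
    if isinstance(value, dict):
        # Commutative: hash each key/value pair as one entry, then XOR entries.
        return _h(TAG_MAP, _xor(_h(TAG_ENTRY, edn_hash(k) + edn_hash(v))
                                for k, v in value.items()))
    if isinstance(value, (set, frozenset)):
        # Commutative: XOR the element digests.
        return _h(TAG_SET, _xor(edn_hash(x) for x in value))
    raise TypeError(f"unsupported type: {type(value).__name__}")


def edn_uuid(value) -> uuid.UUID:
    """Truncate the digest to 128 bits and set the version-5/RFC 4122 bits."""
    return uuid.UUID(bytes=edn_hash(value)[:16], version=5)


a = {"name": "Christian", "tags": {"p2p", "sync"}}
b = {"tags": {"sync", "p2p"}, "name": "Christian"}
print(edn_uuid(a) == edn_uuid(b))  # True: insertion order is irrelevant
print(edn_uuid([1, 2]) == edn_uuid([2, 1]))  # False: sequences keep order
print(edn_uuid(a).version)  # 5
```

One caveat on the design choice: XOR is linear over GF(2), so if an adversary can pick which entries end up in a map or set from a large pool of candidates, Gaussian elimination can find two different collections whose XOR is equal before the final hash. This is the known weakness of XOR-combined incremental hashing (XHASH in Bellare and Micciancio's work); addition modulo a large number (AdHash) or multiplication in a suitable group (MuHash) are the alternatives usually studied for this.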
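Why each key-value pair is hashed as one unit before XORing can be shown with a small, hypothetical Python comparison. SHA-256 and the function names are assumptions, and both functions stop at the XOR, i.e. they show the value before it would enter the final crypto-hash. A naive variant that XORs key and value digests independently cannot tell `{"a": "1", "b": "2"}` from `{"a": "2", "b": "1"}`, and it maps an entry whose key equals its value, such as `{"x": "x"}`, to the same value as the empty map; hashing the entry as a whole avoids both.

```python
import hashlib


def digest(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def naive_combine(m: dict) -> int:
    """Hypothetical naive scheme: XOR key and value digests independently."""
    acc = 0
    for k, v in m.items():
        acc ^= int.from_bytes(digest(k.encode()), "big")
        acc ^= int.from_bytes(digest(v.encode()), "big")
    return acc


def entry_combine(m: dict) -> int:
    """Hash each (key, value) pair as one unit, then XOR the entry digests."""
    acc = 0
    for k, v in m.items():
        # Fixed-length digests are concatenated, so the split is unambiguous.
        entry = digest(digest(k.encode()) + digest(v.encode()))
        acc ^= int.from_bytes(entry, "big")
    return acc


m1 = {"a": "1", "b": "2"}
m2 = {"a": "2", "b": "1"}
print(naive_combine(m1) == naive_combine(m2))  # True: swapped values collide
print(entry_combine(m1) == entry_combine(m2))  # False
print(naive_combine({"x": "x"}) == naive_combine({}))  # True: entry cancels itself
```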
Benchmarking has also been an adventure: performance varies significantly between JVM runs, and even with Criterium one run was only half as fast as a few hours earlier (probably OS/CPU throttling?).

If you have further requirements or ideas, now would be the time to raise them. I am also not aware of other solutions, so please point them out if you know of any.

Thanks,
Christian

(1) https://github.com/ghubber/hasch/blob/master/src/clj/hasch/platform.clj (two bits of precious entropy are reserved for internal revisions)
(2) https://github.com/ghubber/geschichte
(3) https://github.com/ghubber/hasch#speed
(4) https://github.com/dakrone/cheshire
(5) https://github.com/cognitect/transit-java

-- 
You received this message because you are subscribed to the Google Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your first post.
To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/clojure?hl=en