-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hello,

tl;dr:
I would like some input, because the crypto-hash function for edn is
a basic building block for distributing values and cannot easily be
changed later.


I have developed a cross-platform implementation that creates UUID5
values from hashes, so that values can be stored safely in untrusted
distributed environments (1) (for a p2p synchronisation system we are
working on (2)).
I am pretty sure that I don't introduce collisions, but please prove
me wrong. The conceptually critical part is to use a commutative
operation for commutative data structures like maps and sets. I first
hash each kv entry and then xor the resulting hashes, so you cannot,
e.g., easily make two kv entries collide before they enter the
crypto-hash function.
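
A minimal sketch of the scheme described above, in Python rather than
Clojure so it fits in a few lines. The type tags, the use of SHA-512,
and the names hash_value and to_uuid5 are assumptions made for
illustration; this is not hasch's actual encoding, and it makes no
claim about collision resistance.

```python
import hashlib
import uuid

DIGEST_LEN = hashlib.sha512().digest_size  # 64 bytes


def _digest(tag: bytes, payload: bytes) -> bytes:
    # Prefix a type tag so that e.g. the int 1 and the string "1" differ.
    return hashlib.sha512(tag + b":" + payload).digest()


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def hash_value(v) -> bytes:
    """Hash a nested value; maps and sets are order-independent."""
    if v is None:
        return _digest(b"nil", b"")
    if isinstance(v, bool):  # must come before int: bool is an int subclass
        return _digest(b"bool", b"1" if v else b"0")
    if isinstance(v, int):
        return _digest(b"int", str(v).encode("ascii"))
    if isinstance(v, str):
        return _digest(b"str", v.encode("utf-8"))
    if isinstance(v, (list, tuple)):
        # Sequential: order matters, so feed the element hashes in order.
        return _digest(b"seq", b"".join(hash_value(x) for x in v))
    if isinstance(v, (set, frozenset)):
        acc = bytes(DIGEST_LEN)
        for x in v:
            acc = _xor(acc, hash_value(x))
        return _digest(b"set", acc)
    if isinstance(v, dict):
        acc = bytes(DIGEST_LEN)
        for k, val in v.items():
            # Hash each kv entry first, then xor the entry hashes.
            entry = _digest(b"entry", hash_value(k) + hash_value(val))
            acc = _xor(acc, entry)
        return _digest(b"map", acc)
    raise TypeError(f"unsupported type: {type(v).__name__}")


def to_uuid5(digest: bytes) -> uuid.UUID:
    # Take the first 16 bytes; uuid.UUID overwrites the version and
    # variant bits so the result is tagged as a version-5 UUID.
    return uuid.UUID(bytes=digest[:16], version=5)


a = to_uuid5(hash_value({"name": "edn", "tags": {"x", "y"}, "rev": [1, 2]}))
b = to_uuid5(hash_value({"rev": [1, 2], "tags": {"y", "x"}, "name": "edn"}))
print(a == b)
```

The two maps above differ only in insertion order, so they get the same
UUID, while [1, 2] and [2, 1] hash differently. The per-entry hash
before the xor is what makes maps cost more than sequential values.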

Now the problem is that this makes hashing maps more expensive (3). My
goal was to hash a map of 1 million entries (blob size a few
megabytes) in roughly a second or better, so that it takes no longer
than transmission over mobile or slow internet between untrusted peers
(DataScript JavaScript mobile clients and a Datomic database on the
server, for instance) and keeps synchronisation latency low. I want
to push the speed as far as I can (without sacrificing safety),
because this determines the boundaries of the distributed systems that
can be built with it. Inside trusted networks hashing can always be
disabled, treating the ids as random UUIDs, but this weakens the
overall system. An alternative is to avoid commutative data structures
completely, which can give up to native byte-array speed but drops
Clojure value semantics.

I had a look at cheshire (4) and transit-java (5), and I might be able
to squeeze out a factor of 2 by using JVM dispatch on types instead of
protocols. But this is already a fairly low-level and limited
optimization (sequential hashing is fairly fast and the profiler only
sometimes shows protocol overhead), and I might be missing some good
ideas for improving the commutative crunching, so I wanted to ask here
first. Benchmarking has also proven quite adventurous, with
significant performance differences between JVM runs. Even with
criterium, a run was just half as fast as one a few hours earlier
(probably OS/CPU throttling (?...)).

If you have further requirements or ideas, it would make sense to
raise them now. I am also not aware of other solutions, so please
point them out if you know of any.


Thanks,
Christian

(1) https://github.com/ghubber/hasch/blob/master/src/clj/hasch/platform.clj
(2 bits of precious entropy are reserved for internal revisions)
(2) https://github.com/ghubber/geschichte
(3) https://github.com/ghubber/hasch#speed
(4) https://github.com/dakrone/cheshire
(5) https://github.com/cognitect/transit-java
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1

iQEcBAEBAgAGBQJUlyujAAoJEKel+aujRZMkpgYH+QFmNvrAOiu65CMDVNynZTGB
B9Pb7N62gCA3sYZZq6R0taY6ZraVKozwiJ27WOO7u6c0XJeYAMFHSTUvBepbKqkj
qYpgeVkkNMB9MyCLDvhBxaT4aId35CfXJBWYu0pK1jsq8Kwlfwwukq4ThOndUMB/
wp226e7i7YogLB8tetPQ2wHC9wfw1ITFSQtIf9avt1tAlxroeg23scV8NPG0LRSO
p33kFrGWYnZcl27kS2AsdjC5akkS3QUDm3FBiaLVF4swCkir5jOMGOm8zhUincUL
NPKpbBJ2DT6RU2nTgJESWRxK9Sph8+hvI+3Vu9S6VVgmMDGq4uixMqVhtfYipys=
=6W+f
-----END PGP SIGNATURE-----

-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en