There probably aren’t any hash collisions in this set ....
The construction of the golomb-coded set reminds me of a method for
uniformly sampling simplices:
Generate n uniform random numbers in the range [0,1], sort them, then
take the differences (including the differences to 0 at the bottom
end and 1 at the top). This results in (nicely homogenous)
barycentric coordinates; to find the sample point, simply multiply
each vertex by its barycentric weight and sum their contributions.
The fact that the differences in golomb-coding* are geometrically
distributed (much more likely to be small than large) has as its
continuous corollary that the distance to the boundary of the simplex
in high enough dimension is also much more likely to be small than
large (in the continuous case, exponentially distributed?); in other
words, we recover the folk wisdom that approximation (and hence
search) are difficult in high dimensions because very few of the
points of a given volume are "close" to its centroid. This also
explains why jitter is difficult to avoid: there is only one way to
be exactly in tempo, and many more ways to be off.
(in the discrete case, the birthday paradox is also a reflection of
high dimensional shapes being almost all surface and very little
content, resp. that events naturally clump)
-Dave
* thinking-out-loud, here is a somewhat self-delimiting octet-
friendly variation on rice-coding: code each six bits of the
remainder as a bare utf-8 continuation byte, and each non-zero
sixteen of the quotient as a normal utf-8 character. If useful, I'd
propose calling it "orzo-coding", as it wouldn't be as fine-grained
as rice-coding but could be prepared relatively quickly and easily...
--
To unsubscribe: http://lists.canonical.org/mailman/listinfo/kragen-discuss