There probably aren’t any hash collisions in this set ....

The construction of the golomb-coded set reminds me of a method for uniformly sampling simplices:

Generate n uniform random numbers in the range [0,1], sort them, then take the differences (including the differences to 0 at the bottom end and 1 at the top). This results in (nicely homogenous) barycentric coordinates; to find the sample point, simply multiply each vertex by its barycentric weight and sum their contributions.

The fact that the differences in golomb-coding* are geometrically distributed (much more likely to be small than large) has as its continuous corollary that the distance to the boundary of the simplex in high enough dimension is also much more likely to be small than large (in the continuous case, exponentially distributed?); in other words, we recover the folk wisdom that approximation (and hence search) are difficult in high dimensions because very few of the points of a given volume are "close" to its centroid. This also explains why jitter is difficult to avoid: there is only one way to be exactly in tempo, and many more ways to be off.

(in the discrete case, the birthday paradox is also a reflection of high dimensional shapes being almost all surface and very little content, resp. that events naturally clump)

-Dave

* thinking-out-loud, here is a somewhat self-delimiting octet- friendly variation on rice-coding: code each six bits of the remainder as a bare utf-8 continuation byte, and each non-zero sixteen of the quotient as a normal utf-8 character. If useful, I'd propose calling it "orzo-coding", as it wouldn't be as fine-grained as rice-coding but could be prepared relatively quickly and easily...

--
To unsubscribe: http://lists.canonical.org/mailman/listinfo/kragen-discuss

Reply via email to