Your message was fun, Jeff.

I reasoned this out similarly, thinking along the lines of base-64 used in
MIME.  The permissible character set for DOS-platform file names contains at
least 46 characters.  The number of different names expressible in 8 base-46
characters is sufficient to have a minuscule collision probability for
archives of any reasonable size.  A 100,000-message archive seems two orders
of magnitude too large for MHonArc's basic design; anything that large that
uses a filesystem as its database needs to be organized hierarchically.  That
would bring the subdirectory names into the available namespace as well.

-- SP

> Anyway, sorry I didn't jump in then, but the kind-of-fun question was
> implicitly raised: how many bits of randomness do you need for
> reproducible URLs in MHonArc?  (Hey, it's not every day that real life
> questions can be tackled like problem sets!)


> Now if we are restricted to ending the filenames with something like
> .htm, then there are only about 41 bits of randomness, and then we
> run about 1% risk of collision for a puny n=100,000 message archive.
> That's pushing it.
>
> Ok, one last note. If we use a real filesystem, with upper and lower
> case letters in the filenames, we'd still need 10 characters in the
> filename to meet/exceed the acceptable safety margin (57 bits). So
> those lower case letters don't help us much in the region we are
> interested in.
>
> Using MD-5 checksums for filenames is complete overkill statistically
> speaking. They are 128 bits, and would consume 20-odd characters in
> the filename. 10 character filenames would do the trick nicely. There
> is certainly no need to combine MD-5 and message-IDs from a
> statistical standpoint.
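Jeff's character counts can be checked the same way: find the shortest name
whose name space reaches the 57-bit margin for a few candidate alphabets.
The alphabet sizes below are illustrative assumptions (36 = case-insensitive
letters and digits, 46 = my DOS count, 62 = mixed case, 64 = MIME base-64),
and `chars_needed` is a hypothetical helper.

```python
def chars_needed(alphabet_size: int, bits: int) -> int:
    """Smallest length L with alphabet_size**L >= 2**bits (exact integer math)."""
    target = 2 ** bits
    length = 0
    while alphabet_size ** length < target:
        length += 1
    return length


if __name__ == "__main__":
    # Alphabet sizes are illustrative assumptions, not MHonArc settings.
    alphabets = {
        "case-insensitive letters + digits": 36,
        "DOS-safe characters": 46,
        "mixed-case letters + digits": 62,
        "MIME base-64": 64,
    }
    for label, size in alphabets.items():
        print(f"{label:34} 57 bits -> {chars_needed(size, 57)} chars")
    # A 128-bit MD5 digest written in base-64 and in hex.
    print("MD5 as base-64:", chars_needed(64, 128), "chars")
    print("MD5 as hex:    ", chars_needed(16, 128), "chars")
```

Mixed case gets there in 10 characters versus 12 without it, which supports
the point that lower case buys only a couple of characters.  An MD5 digest
needs 22 base-64 characters (24 with the usual "==" padding) or 32 in hex.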
