Hi folks

>There is a technique for assessing the "naturalness" of economic data.
>This technique, known as Benford's Law, demonstrates that the
>first digits of naturally occurring phenomena do not occur with equal
>frequency.  In fact, lower digits occur with greater frequency in
>tabulated natural data than larger digits.

Great approach to this from Rodolfo... Benford's Law is visually familiar to
those of us old enough to remember such anachronisms as tables of logarithms
and slide rules! Statistically, it arises because models of natural processes
(say, radioactive decay and general "time between" distributions) yield
exponentially decaying distributions, the waiting times of a Poisson process
being the classic example. From a computing point of view, it's the same
effect you see if you generate floating-point numbers from "random" bits: the
distribution is skewed (the probability a number lies between 1 and 2 is the
same as between 2 and 4). Pick an (unbounded) random number... how can you do
it? You can't make all numbers out to infinity equally probable, and any
smooth transform of the number picked should be "equally random" and have the
same distribution. The answer turns out to be logarithmic, hence Benford's
Law.
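
For anyone who'd rather see the effect on a screen than in a book of log
tables, here's a quick sketch (Python, purely illustrative, and the names
are just mine): the leading digits of the powers of 2 line up with the
Benford prediction log10(1 + 1/d).

import math
from collections import Counter

# Leading digits of 2^1 .. 2^1000.
powers = [2**n for n in range(1, 1001)]
counts = Counter(int(str(p)[0]) for p in powers)

for d in range(1, 10):
    observed = counts[d] / len(powers)
    predicted = math.log10(1 + 1 / d)
    print(f"digit {d}: observed {observed:.3f}, Benford predicts {predicted:.3f}")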

(It may be apocryphal, but apparently some 8-bit machine (perhaps Atari?)
had a means of generating "random" numbers because some memory location was
subject to "noise" - effectively some component acted as a radio antenna. It
may even have been by design... but of course results obtained by sampling
this location for random bits were awful. Being natural, they were not only
non-uniform and non-independent but also subject to their surroundings. Can
anyone validate this?).

Anyway, such logarithmic behavior is certainly visible in the Mersenne data.
Heuristically, the probability that N=2^n-1 is prime should be proportional
to 1/log N, i.e. proportional to 1/n. (We ignore constraints such as n
needing to be prime and the factors being of a specific form, but this is a
good enough start.) Theoretically, then, we expect the number of Mersenne
primes with exponent less than L to be a partial sum of these probabilities,
proportional to log L. Hence we expect the exponent of the n'th Mersenne
prime to grow exponentially with n, and, conversely, the logarithms of the
exponents should be statistically regularly spaced. (What follows may be
*very* sensitive to a better model of the distribution, but this will do as
a first estimate.)
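
As a quick sanity check on that heuristic, here's a rough sketch (Python;
the exponent list is, I believe, the 37 known at the time of writing, and
the whole thing is strictly back-of-envelope): look at the successive gaps
in log2 of the exponents and see whether they scatter around a roughly
constant mean, as exponential growth would suggest.

import math

# Exponents of the 37 known Mersenne primes (assuming I have the list right).
EXPONENTS = [2, 3, 5, 7, 13, 17, 19, 31, 61, 89, 107, 127, 521, 607, 1279,
             2203, 2281, 3217, 4253, 4423, 9689, 9941, 11213, 19937, 21701,
             23209, 44497, 86243, 110503, 132049, 216091, 756839, 859433,
             1257787, 1398269, 2976221, 3021377]

logs = [math.log2(p) for p in EXPONENTS]
gaps = [b - a for a, b in zip(logs, logs[1:])]

print("mean gap in log2(exponent):", sum(gaps) / len(gaps))
print("individual gaps:", [round(g, 3) for g in gaps])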

By an argument similar to Benford's Law, the fractional parts of these
logarithms should, for a "random" phenomenon, be uniformly distributed on
[0,1). If the phenomenon is truly random, this should hold no matter what
base of logarithm we choose. However, consider plotting the statistical
deviation of the observations from uniformity against the base of the
logarithm. Any marked deviation beyond statistical "noise" and sampling
error is a good indicator of non-random data for which the logarithm base
is some sort of controlling parameter. (In effect this is similar to
curve-fitting our expected distribution model to the observed data.)
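
To make that concrete, here's a toy sketch (Python, same exponent list as
above, and not a serious statistical test): the fractional parts of
log_b(exponent) for a few bases b, compared with the mean 1/2 and variance
1/12 of a uniform distribution on [0,1).

import math

EXPONENTS = [2, 3, 5, 7, 13, 17, 19, 31, 61, 89, 107, 127, 521, 607, 1279,
             2203, 2281, 3217, 4253, 4423, 9689, 9941, 11213, 19937, 21701,
             23209, 44497, 86243, 110503, 132049, 216091, 756839, 859433,
             1257787, 1398269, 2976221, 3021377]

for base in (2.0, math.e, 10.0):
    parts = [math.log(p, base) % 1.0 for p in EXPONENTS]
    mean = sum(parts) / len(parts)
    var = sum((x - mean) ** 2 for x in parts) / len(parts)
    print(f"base {base:6.3f}: mean {mean:.3f} (0.5 if uniform), "
          f"variance {var:.3f} ({1 / 12:.3f} if uniform)")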

I'd be interested to hear from anyone who constructs such a statistical
deviation vs. logarithm base plot. We might expect the overall scaling, and
artifacts such as Noll's islands, to manifest themselves as large deviations
from randomness and spikes in the plot. This is one for the statisticians:
to create a suitable measure of the deviation of these fractional parts from
a uniform distribution on [0,1). Perhaps the sample variance will be a good
first measure, but with only 37 samples and a high degree of
non-independence, beware!
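
In case it helps anyone get started, here's a rough sketch of that scan
(Python; the deviation measure is just |sample variance - 1/12|, i.e. the
sample-variance measure suggested above, and with 37 non-independent
samples the caveat very much stands):

import math

EXPONENTS = [2, 3, 5, 7, 13, 17, 19, 31, 61, 89, 107, 127, 521, 607, 1279,
             2203, 2281, 3217, 4253, 4423, 9689, 9941, 11213, 19937, 21701,
             23209, 44497, 86243, 110503, 132049, 216091, 756839, 859433,
             1257787, 1398269, 2976221, 3021377]

def deviation(base):
    # How far the fractional parts of log_base(exponent) sit from the
    # variance of a uniform [0,1) distribution.
    parts = [math.log(p, base) % 1.0 for p in EXPONENTS]
    mean = sum(parts) / len(parts)
    var = sum((x - mean) ** 2 for x in parts) / len(parts)
    return abs(var - 1.0 / 12.0)

# Scan bases from just above 1 up to 10 and list the largest deviations.
bases = [1.05 + 0.01 * k for k in range(896)]
scored = sorted(((deviation(b), b) for b in bases), reverse=True)
for dev, b in scored[:10]:
    print(f"base {b:6.2f}: deviation {dev:.4f}")

Plot deviation(b) against b; any spikes rising clearly above sampling noise
would be the places to look for the scaling and Noll's-islands effects
mentioned above.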

Chris Nash
Lexington KY
UNITED STATES



