Re: Mersenne: Any statistics majors out there?

1999-05-11 Thread Jud McCranie

At 10:13 AM 5/11/99, you wrote:

> The theoretical distribution is most definitely not an F-distribution,
> though the shape may be reminiscent.

If we knew what the distribution is theoretically, we could fit it.  I'm a
former temporary part-time adjunct instructor of statistics for a minor
university (really!), but it has been a long time and I don't have most of my
books at hand.


> Since the pdf of the normal distribution does not have a "pretty"
> analytic integral, the function I[0,x] is complex.

I've got a decent numerical integration routine (that I use for integrating the
Gaussian dist.)
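
Such a routine needs nothing exotic. For illustration, here is a minimal sketch
using composite Simpson's rule, checked against erf() from the C math library
(an illustration only, not the routine mentioned above):

#include <stdio.h>
#include <math.h>

/* Standard normal probability density function. */
static double normal_pdf(double t)
{
    return exp(-0.5 * t * t) / 2.50662827463100050242;  /* sqrt(2*pi) */
}

/* Composite Simpson's rule for the integral of f from 0 to x (n even). */
static double simpson(double (*f)(double), double x, int n)
{
    double h = x / n, sum = f(0.0) + f(x);
    int i;
    for (i = 1; i < n; i++)
        sum += (i % 2 ? 4.0 : 2.0) * f(i * h);
    return sum * h / 3.0;
}

int main(void)
{
    double x;
    for (x = 0.5; x <= 4.0; x += 0.5) {
        double numeric = simpson(normal_pdf, x, 1000);  /* I[0,x]           */
        double library = 0.5 * erf(x / sqrt(2.0));      /* closed form, erf */
        printf("x = %.1f  Simpson = %.10f  erf = %.10f\n", x, numeric, library);
    }
    return 0;
}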


> All assuming that the underlying distribution of the source data is
> normal - or near enough to make no practical difference!

If you're talking about the actual data George gave us - it is far from
normal.  At first I was going to do a Kolmogorov-Smirnov test to see if it was
close to a normal, but that is really designed for the raw data.  Since the
data was in a histogram I used the Chi-squared test.  The data posted was far
from a normal distribution (even just the right half of it was far from a
normal).
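
For reference, the chi-squared calculation on binned data amounts to the
following sketch (the bin counts and the fitted mean and sigma below are
made-up placeholders, not the posted data):

#include <stdio.h>
#include <math.h>

/* CDF of a normal distribution with mean mu and standard deviation s. */
static double normal_cdf(double x, double mu, double s)
{
    return 0.5 * erfc(-(x - mu) / (s * sqrt(2.0)));
}

int main(void)
{
    /* Hypothetical histogram: bin edges and observed counts (placeholders).
       Bins are [edge[i], edge[i+1]). */
    double edge[] = { 0.20, 0.22, 0.24, 0.26, 0.28, 0.30, 0.32 };
    double obs[]  = { 120, 850, 2900, 3100, 1600, 430 };
    double mu = 0.262, sigma = 0.021;         /* fitted values (assumed) */
    double total = 0.0, chi2 = 0.0;
    int nbins = 6, i;

    for (i = 0; i < nbins; i++)
        total += obs[i];

    for (i = 0; i < nbins; i++) {
        double p = normal_cdf(edge[i + 1], mu, sigma)
                 - normal_cdf(edge[i], mu, sigma);
        double expected = total * p;
        chi2 += (obs[i] - expected) * (obs[i] - expected) / expected;
    }
    /* Compare against the chi-squared critical value with nbins - 1 - 2
       degrees of freedom (two parameters were estimated from the data). */
    printf("chi-squared = %.2f with %d degrees of freedom\n", chi2, nbins - 3);
    return 0;
}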



Jud McCranie





Re: Mersenne: Any statistics majors out there?

1999-05-10 Thread Brian J Beesley

Yesterday I wrote
 
 Could I suggest that the maximum convolution error is likely to occur
 when whole blocks of bits are 1, thus forcing large integers into the
 transform so that rounding/truncation errors are most likely to occur.
 
 Consider the LL test algorithm x' = (x^2 - 2) (mod 2^p - 1)
 
 If x = 1, then x' = -1, x'' = +1, x''' = -1, etc, etc, for all values

Aaarrghhh! I think I must have consumed too much Old Bushmills 
brain solvent ... x = 1, x' = x'' = x''' = -1 ... 8*)

 of p. And -1 (mod 2^p - 1) = (2^p - 2) (mod 2^p -1). Furthermore
 2^p - 2 = 2 * (2^(p-1) - 1), a number which contains (p-1) consecutive
 1 bits.

Fortunately my error does not affect this argument.
 
 So running a couple of iterations of the LL test algorithm starting
 at 1 (instead of 4), looking at the max convolution error and checking
 that the value of the residual after 2 iterations is 1, might be
 instructive ... ?

Well, I tried it using a modified version of Richard Crandall's generic 
C program lucdwt. Starting with 1 instead of 4 gives the correct 
residual 0xFF...FE (i.e. 2^p - 2, which is -1 mod 2^p - 1) for all iterations 
and with _zero_ error, provided the transform size is greater than p/32.

So, almost all zero bits and almost all one bits are both very easy 
on the transform.
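
The algebra is easy to confirm without any FFT at all, for an exponent small
enough that the squaring fits in 64-bit integer arithmetic - a sketch, not
lucdwt itself:

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    const int p = 31;                 /* small exponent: M31 = 2^31 - 1 */
    const uint64_t M = (1ULL << p) - 1;
    uint64_t x = 1;                   /* start at 1 instead of 4        */
    int i;

    for (i = 1; i <= 5; i++) {
        x = (x * x + M - 2) % M;      /* x' = x^2 - 2 (mod 2^p - 1)     */
        printf("iteration %d: x = 0x%llX  (2^%d - 2? %s)\n",
               i, (unsigned long long)x, p, x == M - 1 ? "yes" : "no");
    }
    return 0;
}

Changing the starting value to 4 in the same sketch shows the opposite
behaviour - the residues look random within a handful of iterations.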

Is it possible to generate "worst case" data, and, if so, would 
anyone care to suggest what it might be? e.g. I remember we used 
to fill space with 0xE5 on FM disks, or 0x6DB6 on MFM disks, 
because these bit patterns were particularly difficult for the 
hardware to cope with - thus, if the test pattern was reliable, the 
data should be safe.

Bearing in mind the strong links between signal processing and 
Fourier transforms, this illustration may possibly be not entirely 
without relevance.

My feeling is that running the LL test itself, starting at 4, is 
probably as effective a way as any of pseudo-randomly generating 
bit patterns for the purpose of testing the FFT logic, unless 
someone can demonstrate a better method. (Of course it takes a 
few iterations for the mod 2^p-1 to "kick in" and start to 
"randomize" the bit pattern - but after less than 50 iterations the 
initial conditions have been well and truly obfuscated)



Regards
Brian Beesley




Re: Mersenne: Any statistics majors out there?

1999-05-10 Thread Brian J Beesley

  though.  So outside about 14 sigmas you should be able to say the
  probability is below 10e-40. The problem is that if there are small
  deviations from "Gaussian-ness" way out on the wings of your distribution,
  the REAL probability of a certain result is not well approximated by the
  Error Function result.
 
 This is a real good point - if we are assuming a Gaussian distribution, then
 we are assuming the best case. The worst case is given by Tchebycheff's
 theorem, which states that, given a probability distribution where only the
 mean and standard deviation is known, then the probability that an
 observation will fall more than x standard deviations from the mean is
 bounded above *only* by 1/x^2. (It's a tight bound for one value of x, but
 with a very unlikely distribution). In other words, if you have no
 guarantees about the distribution, "counting sigmas" is going to give you a
 false sense of security, and if the distribution is even slightly deviant
 from Gaussian, then the result can be very wrong indeed.

This is all fine - except the assertion about a Gaussian distribution 
being the best case.

Take a rectangular distribution - a continuous variable is equally 
likely to have any value between 0 and 1, but never less than 0 nor 
greater than 1.

This distribution has a mean of 0.5 and a standard deviation of 
0.289 (variance = 1/12), so _all_ observations taken from this 
distribution are well within 2 sigmas of the mean.

In fact, any distribution with negative kurtosis will be a better case 
than a Gaussian distribution in this respect, provided that it is 
symmetric (zero skew) and reasonably well behaved.
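
A quick numerical check of the rectangular case (a throwaway sketch, nothing
to do with the LL code):

#include <stdio.h>
#include <stdlib.h>
#include <math.h>

int main(void)
{
    const int N = 1000000;
    double *u = malloc(N * sizeof(double));
    double sum = 0.0, m2 = 0.0, m4 = 0.0, mean, var;
    int i;

    srand(12345);
    for (i = 0; i < N; i++) {
        u[i] = (double)rand() / RAND_MAX;    /* rectangular on [0,1] */
        sum += u[i];
    }
    mean = sum / N;
    for (i = 0; i < N; i++) {
        double d = u[i] - mean;
        m2 += d * d;
        m4 += d * d * d * d;
    }
    var = m2 / N;
    printf("mean = %.4f, sd = %.4f  (theory: 0.5 and 0.2887)\n", mean, sqrt(var));
    printf("excess kurtosis = %.3f  (theory: -1.2 for a rectangular distribution)\n",
           (m4 / N) / (var * var) - 3.0);
    printf("worst possible deviation, 0.5, is only %.2f sigmas\n", 0.5 / sqrt(var));
    free(u);
    return 0;
}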

On numeric grounds, we should expect the distribution of the 
underlying errors to have a negative kurtosis. This is because of the 
finite precision of the result - the data points themselves are 
uncertain to an extent, e.g. if the data is accurate to only 4 bits 
precision then a "true" value of 0.1 could be recorded as either 
0.0625 or 0.1250. This effect tends to reduce the kurtosis (the 
central "hump" of the distribution is flattened and broadened - this 
causes an overestimation of the population standard deviation from 
a sample of observations, however large).


Regards
Brian Beesley




Re: Mersenne: Any statistics majors out there?

1999-05-10 Thread Nicolau C. Saldanha

On Mon, 10 May 1999, Brian J Beesley wrote:

   though.  So outside about 14 sigmas you should be able to say the
   probability is below 10e-40. The problem is that if there are small
   deviations from "Gaussian-ness" way out on the wings of your distribution,
   the REAL probability of a certain result is not well approximated by the
   Error Function result.

...

 In fact, any distribution with negative kurtosis will be a better case 
 than a Gaussian distribution in this respect, provided that it is 
 symmetric (zero skew) and reasonably well behaved.
 
 On numeric grounds, we should expect the distribution of the 
 underlying errors to have a negative kurtosis. This is because of the 
 finite precision of the result - the data points themselves are 
 uncertain to an extent, e.g. if the data is accurate to only 4 bits 
 precision then a "true" value of 0.1 could be recorded as either 
 0.0625 or 0.1250. This effect tends to reduce the kurtosis (the 
 central "hump" of the distribution is flattened and broadened - this 
 causes an overestimation of the population standard deviation from 
 a sample of observations, however large).

I am sorry, but what is kurtosis? Some measure of the speed with which
the tail of the distribution falls, maybe? Or some sort of curvature?

My own impression is that since the errors come from adding many many
small errors we should expect something close to a gaussian distribution,
except that very large errors should be not just very improbable
but actually impossible. Large but not so very large errors should
also have true probabilities smaller than whatever is predicted by
the gaussian distribution.

http://www.mat.puc-rio.br/~nicolau





Re: Mersenne: Any statistics majors out there?

1999-05-10 Thread BJ . Beesley

> I am sorry, but what is kurtosis? Some measure of the speed with which
> the tail of the distribution falls, maybe? Or some sort of curvature?

Kurtosis is based on the fourth central moment of a distribution (normalized
by the square of the variance); the "excess" is quoted relative to a normal
distribution, for which that normalized moment is 3.

A distribution with positive kurtosis has longer "tails" than a normal
distribution with the same standard deviation would have.

A good example of a smooth continuous distribution with negative
kurtosis would be defined by the probability density function
p = a * e^(-(x-u)^4/b) (for some constants a, b)
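
The excess kurtosis of that density works out to about -0.81, independent of
the constants; a quick numerical check (a sketch, taking u = 0 and b = 1,
which costs no generality since shifting and scaling do not change kurtosis):

#include <stdio.h>
#include <math.h>

/* Unnormalized density p(x) = exp(-x^4). */
static double f(double x) { return exp(-x * x * x * x); }

int main(void)
{
    const double L = 6.0;          /* the density is negligible beyond |x| = 6 */
    const int n = 200000;
    double h = 2.0 * L / n, z = 0.0, m2 = 0.0, m4 = 0.0;
    int i;

    /* Trapezoid rule for the 0th, 2nd and 4th moments; the step h cancels
       out of the ratios, so it is omitted from the sums. */
    for (i = 0; i <= n; i++) {
        double x = -L + i * h;
        double w = (i == 0 || i == n) ? 0.5 : 1.0;
        z  += w * f(x);
        m2 += w * f(x) * x * x;
        m4 += w * f(x) * x * x * x * x;
    }
    m2 /= z;
    m4 /= z;
    printf("excess kurtosis = %.4f\n", m4 / (m2 * m2) - 3.0);   /* about -0.81 */
    return 0;
}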

> My own impression is that since the errors come from adding many many
> small errors we should expect something close to a gaussian distribution,
> except that very large errors should be not just very improbable
> but actually impossible. Large but not so very large errors should
> also have true probabilities smaller than whatever is predicted by
> the gaussian distribution.

The source of the errors is the difference between a floating-point
value and the nearest integer. The error measure is therefore not
the result of summing many small errors.
Also, we are operating in the region where the nearest integer sometimes
has nearly as many significant bits as the precision of the floating-
point number being compared.
What we are trying to find is how far we can push this before we run
into a real danger that we will round off to the wrong integer. The
point is that, the less "guard bits" we can get away with, the larger
the exponents we can use with a particular transform size - and there
is a substantial performance loss for each increase in transform size
in terms of CPU time per iteration.
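
The effect is easy to reproduce in miniature. Below is a self-contained
sketch - an ordinary complex FFT squaring of a random integer split into
16-bit pieces, not prime95's weighted transform - that reports the largest
distance of any output from its nearest integer:

#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <complex.h>

/* In-place iterative radix-2 FFT; sign = -1 forward, +1 inverse (unscaled). */
static void fft(double complex *a, int n, int sign)
{
    const double PI = acos(-1.0);
    int i, j, len;
    for (i = 1, j = 0; i < n; i++) {            /* bit-reversal permutation */
        int bit = n >> 1;
        for (; j & bit; bit >>= 1) j ^= bit;
        j ^= bit;
        if (i < j) { double complex t = a[i]; a[i] = a[j]; a[j] = t; }
    }
    for (len = 2; len <= n; len <<= 1) {
        double ang = sign * 2.0 * PI / len;
        double complex wlen = cos(ang) + sin(ang) * I;
        for (i = 0; i < n; i += len) {
            double complex w = 1.0;
            for (j = 0; j < len / 2; j++) {
                double complex u = a[i + j];
                double complex v = a[i + j + len / 2] * w;
                a[i + j] = u + v;
                a[i + j + len / 2] = u - v;
                w *= wlen;
            }
        }
    }
}

int main(void)
{
    const int n = 2048;            /* transform size         */
    const int bits = 16;           /* bits per input "digit"  */
    double complex *a = malloc(n * sizeof *a);
    double maxerr = 0.0;
    int i;

    srand(1);
    /* Random digits in the lower half, zeros in the upper half, so the
       cyclic convolution equals the ordinary squaring convolution. */
    for (i = 0; i < n; i++)
        a[i] = (i < n / 2) ? (double)(rand() % (1 << bits)) : 0.0;

    fft(a, n, -1);
    for (i = 0; i < n; i++) a[i] *= a[i];       /* pointwise squaring */
    fft(a, n, +1);

    /* Every output should be an integer; measure how far we drift. */
    for (i = 0; i < n; i++) {
        double x = creal(a[i]) / n;             /* scale the inverse transform */
        double err = fabs(x - floor(x + 0.5));
        if (err > maxerr) maxerr = err;
    }
    printf("max convolution error = %.6f\n", maxerr);
    free(a);
    return 0;
}

Pushing the bits per digit and the transform size toward their limits drives
that number toward 0.5, which is exactly the trade-off being discussed.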

Regards
Brian Beesley




Re: Mersenne: Any statistics majors out there?

1999-05-08 Thread Chris Nash


 though.  So outside about 14 sigmas you should be able to say the
 probability is below 10e-40. The problem is that if there are small
 deviations from "Gaussian-ness" way out on the wings of your distribution,
 the REAL probability of a certain result is not well approximated by the
 Error Function result.

This is a real good point - if we are assuming a Gaussian distribution, then
we are assuming the best case. The worst case is given by Tchebycheff's
theorem, which states that, given a probability distribution where only the
mean and standard deviation is known, then the probability that an
observation will fall more than x standard deviations from the mean is
bounded above *only* by 1/x^2. (It's a tight bound for one value of x, but
with a very unlikely distribution). In other words, if you have no
guarantees about the distribution, "counting sigmas" is going to give you a
false sense of security, and if the distribution is even slightly deviant
from Gaussian, then the result can be very wrong indeed.
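
For concreteness, the two tails side by side (a throwaway sketch using erfc()
from the C library):

#include <stdio.h>
#include <math.h>

int main(void)
{
    int k;
    printf("  k    Chebyshev bound 1/k^2    Gaussian two-sided tail\n");
    for (k = 2; k <= 14; k++) {
        double cheb  = 1.0 / ((double)k * k);
        double gauss = erfc(k / sqrt(2.0));   /* P(|X - mu| > k sigma), normal */
        printf(" %2d    %20.3e    %23.3e\n", k, cheb, gauss);
    }
    return 0;
}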

However we do have a little comfort. George's function - maximum convolution
error from a single iteration - does have some distribution. It "appears"
bell-shaped and looks like a normal distribution, and the samples George has
made are also reasonably sized. Not only that, each observation is actually
the maximum of the deviations from integer over many channels in the FFT -
in other words, the observations are already coming from a "smoothed"
distribution. I'm sure the distribution of a maximum of several observations
is something for which there is a statistical test. (The resulting
distribution may not be normal, but may have an analogous test).
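
The maximum can at least be simulated. A rough sketch - assuming independent,
identically distributed Gaussian channel deviations, which is itself an
idealization - shows that the per-iteration maximum comes out shifted and
right-skewed rather than normal:

#include <stdio.h>
#include <stdlib.h>
#include <math.h>

/* One standard normal deviate via Box-Muller. */
static double gauss(void)
{
    double u1 = (rand() + 1.0) / (RAND_MAX + 2.0);
    double u2 = (rand() + 1.0) / (RAND_MAX + 2.0);
    return sqrt(-2.0 * log(u1)) * cos(6.283185307179586 * u2);   /* 2*pi */
}

int main(void)
{
    const int channels = 4096;     /* "FFT outputs" per iteration (assumed) */
    const int trials = 2000;       /* "iterations"                          */
    double *mx = malloc(trials * sizeof(double));
    double sum = 0.0, m2 = 0.0, m3 = 0.0, mean;
    int t, c;

    srand(1);
    for (t = 0; t < trials; t++) {
        double m = 0.0;
        for (c = 0; c < channels; c++) {
            double d = fabs(gauss());          /* one channel's |deviation| */
            if (d > m) m = d;
        }
        mx[t] = m;
        sum += m;
    }
    mean = sum / trials;
    for (t = 0; t < trials; t++) {
        double d = mx[t] - mean;
        m2 += d * d;
        m3 += d * d * d;
    }
    m2 /= trials;
    m3 /= trials;
    printf("max of %d channels: mean = %.3f, sd = %.3f, skewness = %.3f\n",
           channels, mean, sqrt(m2), m3 / pow(m2, 1.5));
    free(mx);
    return 0;
}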

Thanks too, Todd, for backing up the heuristic calculation of "14 sigmas" that
I got the hard way :)

Chris






Re: Mersenne: Any statistics majors out there?

1999-05-08 Thread Jud McCranie

At 12:00 PM 5/8/99 -0400, Chris Nash wrote:

> This is a real good point - if we are assuming a Gaussian distribution, then
> we are assuming the best case. The worst case is given by Tchebycheff's
> theorem, which states that, given a probability distribution where only the
> mean and standard deviation is known, then the probability that an
> observation will fall more than x standard deviations from the mean is
> bounded above *only* by 1/x^2.

Yes, but that is a really bad worst case.  Pathological even.  Looking at the
graphs of the two data sets, you can tell that it is reasonably well behaved.


Jud McCranie
"I have macular stars in my eyes."






Re: Mersenne: Any statistics majors out there?

1999-05-08 Thread Jud McCranie

At 06:17 PM 5/8/99 -0500, Ken Kriesel wrote:
> Aren't gaussians symmetric about the mean value?  What George plotted is not.


Yes, but it isn't too far off.  But it does drop off quite a bit faster on the
left.  That is why I think it may be better to chop the data at the mean, throw
away the part under the mean, reflect the other data with respect to the mean,
and calculate the standard deviation from that "massaged" data.  Then a Normal
distribution should approximate it pretty well.
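
On binned data that "fold at the mean" estimate is only a few lines (a sketch
with made-up bin counts, not the posted spreadsheets):

#include <stdio.h>
#include <math.h>

int main(void)
{
    /* Hypothetical histogram of per-iteration maximum errors (placeholder
       numbers, not the posted spreadsheets): bin centres and counts. */
    double centre[] = { 0.24, 0.25, 0.26, 0.27, 0.28, 0.29, 0.30 };
    double count[]  = {  300, 1800, 3500, 2600, 1100,  250,   40 };
    int nbins = 7, i;
    double n = 0.0, mean = 0.0, ss = 0.0, w = 0.0;

    for (i = 0; i < nbins; i++) { n += count[i]; mean += count[i] * centre[i]; }
    mean /= n;

    /* Keep only the bins at or above the mean and pretend each one also has
       a mirror image below the mean; the reflected copy contributes the same
       amount to both sums, so it cancels out of the ratio. */
    for (i = 0; i < nbins; i++) {
        if (centre[i] >= mean) {
            double d = centre[i] - mean;
            ss += count[i] * d * d;
            w  += count[i];
        }
    }
    printf("mean = %.4f, folded sigma estimate = %.4f\n", mean, sqrt(ss / w));
    return 0;
}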


Jud McCranie
"I have macular stars in my eyes."






Mersenne: Any statistics majors out there?

1999-05-07 Thread George Woltman

Hi all,

I'm working on version 19 of prime95 and I need your help.
In the past, the exponents at which a larger FFT is used was picked
rather haphazardly.  I simply picked a few exponents trying to find
one that could run a thousand or so iterations without the convolution
error greatly exceeding 0.3.

For this release, I thought it would be nice to use a more
scientific approach.  What I've done is run almost 1 iterations
of an exponent and used Excel to graph the number of iterations with 
an error between 0.200 and 0.201, 0.201 and 0.202, etc.  As you can imagine
you get a Bell-like curve that peaks at some value.

Since it has been 25 years since I took statistics courses
in college, I am at a loss as to how to use this graph to predict the
probability that N iterations will have an error below M.  I could
then use such information to find the exponent such that the chance 
of a convolution error exceeding 0.5 is less than 1 in 10^40 (or some
other criterion that we agree is "safe").
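
If the per-iteration maximum error really were normal with some fitted mean mu
and deviation sigma - an assumption other posts in this thread question - the
calculation would look roughly like this sketch (mu, sigma and the iteration
count are placeholders only):

#include <stdio.h>
#include <math.h>

int main(void)
{
    /* Placeholder values - read these off the fitted histogram instead. */
    double mu = 0.26, sigma = 0.02;   /* fitted per-iteration maximum error */
    double limit = 0.5;               /* catastrophic threshold             */
    double iters = 1.0e7;             /* iterations in a full LL test, say  */

    /* Probability that a single iteration exceeds the limit (upper tail). */
    double p_one = 0.5 * erfc((limit - mu) / (sigma * sqrt(2.0)));

    /* Probability that at least one of the iterations exceeds it. */
    double p_any = 1.0 - pow(1.0 - p_one, iters);
    if (p_one * iters < 1e-6)
        p_any = p_one * iters;        /* pow() loses the tiny tail here     */

    printf("per-iteration P(error > %.2f) = %.3e\n", limit, p_one);
    printf("P(any of %.0e iterations exceeds it) = %.3e\n", iters, p_any);
    return 0;
}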

Rather than mail these largish spreadsheets to the entire list,
two sample graphs are at ftp://entropia.com/gimps/x1345000.xls and 
ftp://entropia.com/gimps/x136.xls (in the graphs a 400 on the X-axis
is really a convolution error of 0.400, the Y-axis is a count of iterations).

Any help is appreciated.

Best regards,
George 

P.S.  A brief explanation of convolution errors:  Since the FFT in prime95
is done in floating point arithmetic, round-off errors build up.  After the
FFT, every value is rounded back to an integer.  The convolution error is
the largest amount that a result value differs from the correct integer
value.  If the convolution error exceeds 0.5, then the round-back-to-integer
step will round to the wrong integer value - a catastrophic failure.

Another way to look at this:  Each FFT value contains about 20 bits of data.
That is, values ranging from 2^19 to -2^19.  Consider an FFT of 128K elements.
If each FFT value was close to the 2^19 value, then each result value would
need 19+19+17=55 bits to hold the result.  Since a float only holds 53 bits
there would be a loss of information.  Fortunately, probability assures us
that it is extremely unlikely that every (or even most) FFT values will
be close to the 2^19 maximum value.
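
The worst-case bit count is easy to tabulate for other transform sizes (a
sketch; 53 is the double mantissa width, and 19 magnitude bits per value as
in the example above):

#include <stdio.h>

int main(void)
{
    int logn;
    printf("FFT size    worst-case result bits (19 + 19 + log2 n)\n");
    for (logn = 15; logn <= 20; logn++) {     /* 32K ... 1M elements */
        int need = 19 + 19 + logn;
        printf("%6dK     %d bits %s\n", (1 << logn) >> 10, need,
               need > 53 ? "(exceeds the 53-bit double mantissa)" : "");
    }
    return 0;
}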







Re: Mersenne: Any statistics majors out there?

1999-05-07 Thread Todd Sauke

George,

You indicate that the error distribution looks "like a Bell curve".
There is reasonable theoretical basis for the errors to follow a Bell
curve.  The sum of many random plusses and minuses combines as in the
classic "random walk" problem to give a Gaussian probability distribution.
The probabilities for various outcomes of Gaussian distributed events are
given by the "Error Function" which is defined something like the integral
over a Gaussian probability distribution from its center to some desired
point (like 0.5) expressed in terms of the width of the distribution
(you've got to worry about the pesky square-root-2s).  Probability outside
one sigma ~ 0.3; outside 2 sigma < .05; outside 3 sigma < 0.003; outside 4
sigma < 0.6e-4; outside 5 sigma < 6e-7; outside 6 sigma < 2e-9; outside 7
sigma < 3e-12; . . . outside 14 sigma < 2e-44.   This function is tabulated
in reference books, but usually not out to within 10e-40 of the terminal
value.  I suppose there are good approximations applicable out there
though.  So outside about 14 sigmas you should be able to say the
probability is below 10e-40. The problem is that if there are small
deviations from "Gaussian-ness" way out on the wings of your distribution,
the REAL probability of a certain result is not well approximated by the
Error Function result.  Even though there are reasonable theoretical
considerations to indicate that your error distribution "should" be
strictly Gaussian, any "weird", low probability-type effects that cause
small deviations from "Gaussian-ness" will make the "computed" probability
of, say, 10e-40 be wrong and give a false sense of security.  This is why
people usually do what you instinctively did before and go with a "gut
feeling" about how many "sigmas out on the distribution" to go for safety.
Without a firm idea of what the "real" distribution is or unassailable
theoretical conviction that the errors "must" be Gaussian,  a probability
calculation out to 10e-40 just gives false security.  That said, you really
don't need to go out to 10e-40 anyway.  The probability of having a
roundoff error greater than 0.5 over an entire LL test only needs to be
small compared to other typical error risks, like gamma rays, people using
excessive overclocking, or whatever effects cause the ~1% errors revealed
by doublechecking.  You probably want a per iteration error greater than
0.5 to be much less than about 10e-10, say.  So if 0.5 is outside about 6
or 7 sigmas of your error distribution, you would be OK, ASSUMING the
distribution is nearly Gaussian.  You might look at the integral of the
error histograms you get to see if they deviate subtly from the error
function.  I was unable to load the raw data you referenced; I'm using the
OLD Excel 4.0.  If you want to send me some raw data readable by Excel 4.0
I would be happy to look at it and look for how faithfully it follows a
Gaussian.

Sincerely,

Todd Sauke
