Re: Mersenne: Any statistics majors out there?
At 10:13 AM 5/11/99, you wrote:

The theoretical distribution is most definitely not an F-distribution, though the shape may be reminiscent. If we knew what the distribution is theoretically, we could fit it. I'm a former temporary part-time adjunct instructor of statistics for a minor university (really!), but it has been a long time and I don't have most of my books at hand.

Since the pdf of the normal distribution does not have a "pretty" analytic integral, the function I[0,x] is complex. I've got a decent numerical integration routine (that I use for integrating the Gaussian dist.)

> All assuming that the underlying distribution of the source data is normal - or near enough to make no practical difference!

If you're talking about the actual data George gave us, it is far from normal. At first I was going to do a Kolmogorov-Smirnov test to see if it was close to a normal, but that is really designed for the raw data. Since the data was in a histogram I used the chi-squared test. The data posted was far from a normal distribution (even just the right half of it was far from normal).

Jud McCranie [EMAIL PROTECTED]
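For what it's worth, the C99 math library's erfc() gives the Gaussian tail integral directly, so tail probabilities can be had without a hand-rolled integrator. A minimal sketch (illustrative only, not Jud's actual routine):

/* Normal tail probabilities via C99 erfc(), in place of numerically
 * integrating the Gaussian pdf: for a normal variable X,
 * P(X > mu + k*sigma) = erfc(k / sqrt(2)) / 2. */
#include <math.h>
#include <stdio.h>

/* one-sided upper tail probability, k standard deviations out */
static double normal_upper_tail(double k)
{
    return 0.5 * erfc(k / sqrt(2.0));
}

int main(void)
{
    for (int k = 1; k <= 7; k++)
        printf("P(X > mu + %d sigma) = %.3e\n", k, normal_upper_tail(k));
    return 0;
}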
Re: Mersenne: Any statistics majors out there?
Yesterday I wrote:

> Could I suggest that the maximum convolution error is likely to occur when whole blocks of bits are 1, thus forcing large integers into the transform so that rounding/truncation errors are most likely to occur. Consider the LL test algorithm
>
> x' = (x^2 - 2) (mod 2^p - 1)
>
> If x = 1, then x' = -1, x'' = +1, x''' = -1, etc, etc, for all values

Aaarrghhh! I think I must have consumed too much Old Bushmills brain solvent ... x = 1, x' = x'' = x''' = -1 ... 8*)

> of p. And -1 (mod 2^p - 1) = (2^p - 2) (mod 2^p - 1). Furthermore 2^p - 2 = 2 * (2^(p-1) - 1), a number which contains (p-1) consecutive 1 bits.

Fortunately my error does not affect this argument.

> So running a couple of iterations of the LL test algorithm starting at 1 (instead of 4), looking at the max convolution error and checking that the value of the residual after 2 iterations is 1, might be instructive ... ?

Well, I tried it using a modified version of Richard Crandall's generic C program lucdwt. Starting with 1 instead of 4 gives the correct residual 0xFFFF...FFFE (-1 mod (2^p-1)) for all iterations and with _zero_ error, provided the transform size is at least p/32. So, almost all zero bits and almost all one bits are both very easy on the transform.

Is it possible to generate "worst case" data, and, if so, would anyone care to suggest what it might be? E.g. I remember we used to fill space with 0xE5 on FM disks, or 0x6DB6 on MFM disks, because these bit patterns were particularly difficult for the hardware to cope with - thus, if the test pattern was reliable, the data should be safe. Bearing in mind the strong links between signal processing and Fourier transforms, this illustration may possibly be not entirely without relevance.

My feeling is that running the LL test itself, starting at 4, is probably as effective a way as any of pseudo-randomly generating bit patterns for the purpose of testing the FFT logic, unless someone can demonstrate a better method. (Of course it takes a few iterations for the mod 2^p-1 to "kick in" and start to "randomize" the bit pattern - but after fewer than 50 iterations the initial conditions have been well and truly obfuscated.)

Regards
Brian Beesley
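The corrected algebra is easy to confirm with plain integer arithmetic (no FFT, so nothing to do with lucdwt's error behaviour). A sketch for a small exponent, assuming a gcc/clang-style compiler with __int128:

/* Verify directly that starting the LL recurrence x' = x^2 - 2 (mod 2^p - 1)
 * at x = 1 gives 2^p - 2 (i.e. -1) on every subsequent iteration. */
#include <stdio.h>
#include <stdint.h>

int main(void)
{
    const unsigned p = 31;               /* any small exponent */
    const uint64_t m = (1ULL << p) - 1;  /* 2^p - 1 */
    uint64_t x = 1;
    for (int i = 0; i < 5; i++) {
        unsigned __int128 sq = (unsigned __int128)x * x;
        x = (uint64_t)((sq + m - 2) % m);   /* x^2 - 2 mod m, kept non-negative */
        printf("iteration %d: x = %llu (2^p - 2 = %llu)\n",
               i + 1, (unsigned long long)x, (unsigned long long)(m - 2));
    }
    return 0;
}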
Re: Mersenne: Any statistics majors out there?
> > though. So outside about 14 sigmas you should be able to say the probability is below 10e-40. The problem is that if there are small deviations from "Gaussian-ness" way out on the wings of your distribution, the REAL probability of a certain result is not well approximated by the Error Function result.
>
> This is a real good point - if we are assuming a Gaussian distribution, then we are assuming the best case. The worst case is given by Tchebycheff's theorem, which states that, given a probability distribution where only the mean and standard deviation are known, the probability that an observation will fall more than x standard deviations from the mean is bounded above *only* by 1/x^2. (It's a tight bound for one value of x, but with a very unlikely distribution.) In other words, if you have no guarantees about the distribution, "counting sigmas" is going to give you a false sense of security, and if the distribution is even slightly deviant from Gaussian, then the result can be very wrong indeed.

This is all fine - except the assertion about a Gaussian distribution being the best case. Take a rectangular distribution: a continuous variable is equally likely to have any value between 0 and 1, but never less than 0 nor greater than 1. This distribution has a mean of 0.5 and a standard deviation of 0.289 (variance = 1/12), so _all_ observations taken from this distribution are well within 2 sigmas of the mean. In fact, any distribution with negative kurtosis will be a better case than a Gaussian distribution in this respect, provided that it is symmetric (zero skew) and reasonably well behaved.

On numeric grounds, we should expect the distribution of the underlying errors to have a negative kurtosis. This is because of the finite precision of the result - the data points themselves are uncertain to an extent; e.g. if the data is accurate to only 4 bits of precision then a "true" value of 0.1 could be recorded as either 0.0625 or 0.1250. This effect tends to reduce the kurtosis (the central "hump" of the distribution is flattened and broadened - this causes an overestimation of the population standard deviation from a sample of observations, however large).

Regards
Brian Beesley
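The three cases Brian contrasts are easy to put side by side numerically. A sketch comparing, at each k, the Chebyshev worst-case bound, the exact Gaussian two-sided tail, and the rectangular distribution (all of whose mass lies within sqrt(3) ~ 1.73 sigmas of the mean):

/* Tail mass beyond k sigma under three assumptions:
 * Chebyshev bound 1/k^2, Gaussian erfc(k/sqrt(2)), exact uniform tail. */
#include <math.h>
#include <stdio.h>

int main(void)
{
    for (int k = 1; k <= 6; k++) {
        double chebyshev = 1.0 / ((double)k * k);   /* upper bound, any distribution */
        double gaussian  = erfc(k / sqrt(2.0));     /* exact two-sided normal tail */
        double uniform   = (k < sqrt(3.0)) ? 1.0 - k / sqrt(3.0) : 0.0;
        printf("k=%d  Chebyshev <= %.4f   Gaussian = %.3e   Uniform = %.4f\n",
               k, chebyshev, gaussian, uniform);
    }
    return 0;
}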
Re: Mersenne: Any statistics majors out there?
On Mon, 10 May 1999, Brian J Beesley wrote:

> > though. So outside about 14 sigmas you should be able to say the probability is below 10e-40. The problem is that if there are small deviations from "Gaussian-ness" way out on the wings of your distribution, the REAL probability of a certain result is not well approximated by the Error Function result.
>
> ...
>
> In fact, any distribution with negative kurtosis will be a better case than a Gaussian distribution in this respect, provided that it is symmetric (zero skew) and reasonably well behaved. On numeric grounds, we should expect the distribution of the underlying errors to have a negative kurtosis. This is because of the finite precision of the result - the data points themselves are uncertain to an extent; e.g. if the data is accurate to only 4 bits of precision then a "true" value of 0.1 could be recorded as either 0.0625 or 0.1250. This effect tends to reduce the kurtosis (the central "hump" of the distribution is flattened and broadened - this causes an overestimation of the population standard deviation from a sample of observations, however large).

I am sorry, but what is kurtosis? Some measure of the speed with which the tail of the distribution falls, maybe? Or some sort of curvature?

My own impression is that since the errors come from adding many, many small errors, we should expect something close to a Gaussian distribution, except that very large errors should be not just very improbable but actually impossible. Large but not so very large errors should also have true probabilities smaller than whatever is predicted by the Gaussian distribution.

http://www.mat.puc-rio.br/~nicolau
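A toy Monte Carlo (illustrative only) of that impression: a sum of many bounded errors looks Gaussian near the centre, but a deviation beyond the hard per-term bound times the number of terms is literally impossible, not merely tiny.

/* Sum N_TERMS uniform errors in [-0.5, 0.5] per sample; histogram the
 * sums over +/- 4 sigma.  |sum| can never exceed N_TERMS * 0.5 = 32,
 * i.e. ~13.9 sigma, so the far tails are exactly zero. */
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

#define N_TERMS   64
#define N_SAMPLES 100000
#define N_BINS    41

int main(void)
{
    int hist[N_BINS] = {0};
    double sigma = sqrt(N_TERMS / 12.0);   /* sd of a sum of N_TERMS uniforms */
    for (int s = 0; s < N_SAMPLES; s++) {
        double sum = 0.0;
        for (int i = 0; i < N_TERMS; i++)
            sum += (double)rand() / RAND_MAX - 0.5;
        int bin = (int)((sum / sigma + 4.0) / 8.0 * N_BINS);
        if (bin >= 0 && bin < N_BINS)
            hist[bin]++;
    }
    for (int b = 0; b < N_BINS; b++)
        printf("%+5.2f sigma: %d\n", -4.0 + (b + 0.5) * 8.0 / N_BINS, hist[b]);
    return 0;
}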
Re: Mersenne: Any statistics majors out there?
> I am sorry, but what is kurtosis? Some measure of the speed with which the tail of the distribution falls, maybe? Or some sort of curvature?

Kurtosis is the excess of the fourth moment with respect to the normal distribution. A distribution with positive kurtosis has longer "tails" than a normal distribution with the same standard deviation would have. A good example of a smooth continuous distribution with negative kurtosis would be defined by the probability density function

p = a * e^(-(x-u)^4/b)

(for some constants a, b).

> My own impression is that since the errors come from adding many, many small errors, we should expect something close to a Gaussian distribution, except that very large errors should be not just very improbable but actually impossible. Large but not so very large errors should also have true probabilities smaller than whatever is predicted by the Gaussian distribution.

The source of the errors is the difference between a floating-point value and the nearest integer. The error measure is therefore not the result of summing many small errors. Also, we are operating in the region where the nearest integer sometimes has nearly as many significant bits as the precision of the floating-point number being compared. What we are trying to find is how far we can push this before we run into a real danger that we will round off to the wrong integer.

The point is that the fewer "guard bits" we can get away with, the larger the exponents we can use with a particular transform size - and there is a substantial performance loss for each increase in transform size in terms of CPU time per iteration.

Regards
Brian Beesley
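As a sketch of the quantity Brian defines: excess kurtosis is the fourth central moment over the squared variance, minus the Gaussian value of 3. A uniform (rectangular) sample should come out near -1.2, i.e. negative kurtosis, shorter tails than a Gaussian.

#include <stdio.h>
#include <stdlib.h>

static double excess_kurtosis(const double *x, int n)
{
    double mean = 0.0, m2 = 0.0, m4 = 0.0;
    for (int i = 0; i < n; i++)
        mean += x[i];
    mean /= n;
    for (int i = 0; i < n; i++) {
        double d2 = (x[i] - mean) * (x[i] - mean);
        m2 += d2;                      /* second central moment (variance) */
        m4 += d2 * d2;                 /* fourth central moment */
    }
    m2 /= n;
    m4 /= n;
    return m4 / (m2 * m2) - 3.0;
}

int main(void)
{
    enum { N = 100000 };
    static double x[N];
    for (int i = 0; i < N; i++)
        x[i] = (double)rand() / RAND_MAX;   /* rectangular distribution on [0,1] */
    printf("excess kurtosis of uniform sample: %.3f\n", excess_kurtosis(x, N));
    return 0;
}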
Re: Mersenne: Any statistics majors out there?
> though. So outside about 14 sigmas you should be able to say the probability is below 10e-40. The problem is that if there are small deviations from "Gaussian-ness" way out on the wings of your distribution, the REAL probability of a certain result is not well approximated by the Error Function result.

This is a real good point - if we are assuming a Gaussian distribution, then we are assuming the best case. The worst case is given by Tchebycheff's theorem, which states that, given a probability distribution where only the mean and standard deviation are known, the probability that an observation will fall more than x standard deviations from the mean is bounded above *only* by 1/x^2. (It's a tight bound for one value of x, but with a very unlikely distribution.) In other words, if you have no guarantees about the distribution, "counting sigmas" is going to give you a false sense of security, and if the distribution is even slightly deviant from Gaussian, then the result can be very wrong indeed.

However, we do have a little comfort. George's function - maximum convolution error from a single iteration - does have some distribution. It "appears" bell-shaped and looks like a normal distribution, and the samples George has made are reasonably sized. Not only that, each observation is actually the maximum of the deviations from integer over many channels in the FFT - in other words, the observations are already coming from a "smoothed" distribution. I'm sure the distribution of a maximum of several observations is something there is a statistical test for. (The resulting distribution may not be normal, but may have an analogous test.)

Thanks too, Todd, for backing up the heuristic calculation of "14 sigmas" that I got the hard way :)

Chris
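The distribution of a maximum is indeed well studied: for n independent draws with cumulative distribution F, the maximum has cdf F(x)^n, which for well-behaved F tends toward the Gumbel extreme-value form. A sketch under the purely illustrative assumption of independent normal per-channel errors:

/* cdf of the max of n independent normal deviations: pow(Phi(k), n). */
#include <math.h>
#include <stdio.h>

static double normal_cdf(double x) { return 0.5 * erfc(-x / sqrt(2.0)); }

int main(void)
{
    const int n = 65536;   /* e.g. number of channels in a 64K transform */
    for (double k = 3.0; k <= 6.0; k += 0.5)
        printf("P(max of %d channels <= %.1f sigma) = %.6f\n",
               n, k, pow(normal_cdf(k), n));
    return 0;
}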
Re: Mersenne: Any statistics majors out there?
At 12:00 PM 5/8/99 -0400, Chris Nash wrote:

> This is a real good point - if we are assuming a Gaussian distribution, then we are assuming the best case. The worst case is given by Tchebycheff's theorem, which states that, given a probability distribution where only the mean and standard deviation are known, the probability that an observation will fall more than x standard deviations from the mean is bounded above *only* by 1/x^2.

Yes, but that is a really bad worst case. Pathological, even. Looking at the graphs of the two data sets, you can tell that it is reasonably well behaved.

Jud McCranie [EMAIL PROTECTED]
I have macular stars in my eyes.
Re: Mersenne: Any statistics majors out there?
At 06:17 PM 5/8/99 -0500, Ken Kriesel wrote:

> Aren't gaussians symmetric about the mean value? What George plotted is not.

Yes, but it isn't too far off. But it does drop off quite a bit faster on the left. That is why I think it may be better to chop the data at the mean, throw away the part under the mean, reflect the remaining data with respect to the mean, and calculate the standard deviation from that "massaged" data. Then a normal distribution should approximate it pretty well.

Jud McCranie [EMAIL PROTECTED]
I have macular stars in my eyes.
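A minimal C sketch of this folding procedure, on made-up sample values. Since the folded data is symmetric about the mean by construction, sigma follows directly from the mean square deviation of the kept points (each mirror image contributes the same squared deviation):

#include <math.h>
#include <stdio.h>

/* estimate sigma using only observations at or above the mean */
static double folded_sigma(const double *x, int n)
{
    double mean = 0.0;
    for (int i = 0; i < n; i++) mean += x[i];
    mean /= n;

    double ss = 0.0;
    int kept = 0;
    for (int i = 0; i < n; i++) {
        if (x[i] >= mean) {            /* keep the right half only */
            double d = x[i] - mean;
            ss += d * d;
            kept++;
        }
    }
    return kept ? sqrt(ss / kept) : 0.0;
}

int main(void)
{
    double sample[] = {0.21, 0.23, 0.24, 0.25, 0.25, 0.26, 0.27, 0.30};
    int n = sizeof sample / sizeof sample[0];
    printf("folded sigma estimate: %f\n", folded_sigma(sample, n));
    return 0;
}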
Mersenne: Any statistics majors out there?
Hi all,

I'm working on version 19 of prime95 and I need your help. In the past, the exponents at which a larger FFT is used were picked rather haphazardly. I simply picked a few exponents, trying to find one that could run a thousand or so iterations without the convolution error greatly exceeding 0.3. For this release, I thought it would be nice to use a more scientific approach.

What I've done is run almost 1,000,000 iterations of an exponent and used Excel to graph the number of iterations with an error between 0.200 and 0.201, 0.201 and 0.202, etc. As you can imagine, you get a Bell-like curve that peaks at some value. Since it has been 25 years since I took statistics courses in college, I am at a loss as to how to use this graph to predict the probability that N iterations will have an error below M. I could then use such information to find the exponent such that the chance of a convolution error exceeding 0.5 is less than 1 in 10^40 (or some other criterion that we agree is "safe").

Rather than mail these largish spreadsheets to the entire list, two sample graphs are at ftp://entropia.com/gimps/x1345000.xls and ftp://entropia.com/gimps/x136.xls (in the graphs a 400 on the X-axis is really a convolution error of 0.400; the Y-axis is a count of iterations). Any help is appreciated.

Best regards,
George

P.S. A brief explanation of convolution errors: Since the FFT in prime95 is done in floating-point arithmetic, round-off errors build up. After the FFT, every value is rounded back to an integer. The convolution error is the largest amount by which a result value differs from the correct integer value. If the convolution error exceeds 0.5, then the round-back-to-integer step will round to the wrong integer value - a catastrophic failure.

Another way to look at this: each FFT value contains about 20 bits of data; that is, values ranging from 2^19 to -2^19. Consider an FFT of 128K elements. If each FFT value were close to the 2^19 value, then each result value would need 19+19+17 = 55 bits to hold the result. Since a double only holds 53 bits, there would be a loss of information. Fortunately, probability assures us that it is extremely unlikely that every (or even most) FFT values will be close to the 2^19 maximum value.
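A sketch of the binning George describes (0.200-0.201, 0.201-0.202, ...), with synthetic stand-in values where prime95 would supply the real per-iteration maximum convolution error:

#include <stdio.h>
#include <stdlib.h>

#define BINS 500                       /* 0.001-wide buckets covering 0.000 .. 0.500 */
static long histogram[BINS];

static void record_error(double err)
{
    int bin = (int)(err * 1000.0);
    if (bin < 0)     bin = 0;
    if (bin >= BINS) bin = BINS - 1;   /* an error >= 0.5 lands in the top bucket */
    histogram[bin]++;
}

int main(void)
{
    /* stand-in errors; a real run would feed the measured value each iteration */
    for (int i = 0; i < 1000000; i++)
        record_error(0.20 + 0.10 * rand() / RAND_MAX);

    for (int b = 0; b < BINS; b++)     /* one line per non-empty bucket, Excel-friendly */
        if (histogram[b])
            printf("%.3f\t%ld\n", b / 1000.0, histogram[b]);
    return 0;
}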
Re: Mersenne: Any statistics majors out there?
George,

You indicate that the error distribution looks "like a Bell curve". There is a reasonable theoretical basis for the errors to follow a Bell curve. The sum of many random plusses and minuses combines, as in the classic "random walk" problem, to give a Gaussian probability distribution. The probabilities for various outcomes of Gaussian-distributed events are given by the "Error Function", which is defined something like the integral over a Gaussian probability distribution from its center to some desired point (like 0.5) expressed in terms of the width of the distribution (you've got to worry about the pesky square-root-2s).

Probability outside one sigma ~ 0.3; outside 2 sigma, 0.05; outside 3 sigma, 0.003; outside 4 sigma, 6e-5; outside 5 sigma, 6e-7; outside 6 sigma, 2e-9; outside 7 sigma, 3e-12; ... outside 14 sigma, 2e-44.

This function is tabulated in reference books, but usually not out to within 10e-40 of the terminal value. I suppose there are good approximations applicable out there, though. So outside about 14 sigmas you should be able to say the probability is below 10e-40. The problem is that if there are small deviations from "Gaussian-ness" way out on the wings of your distribution, the REAL probability of a certain result is not well approximated by the Error Function result. Even though there are reasonable theoretical considerations to indicate that your error distribution "should" be strictly Gaussian, any "weird", low-probability-type effects that cause small deviations from "Gaussian-ness" will make the "computed" probability of, say, 10e-40 be wrong and give a false sense of security. This is why people usually do what you instinctively did before and go with a "gut feeling" about how many "sigmas out on the distribution" to go for safety. Without a firm idea of what the "real" distribution is, or unassailable theoretical conviction that the errors "must" be Gaussian, a probability calculation out to 10e-40 just gives false security.

That said, you really don't need to go out to 10e-40 anyway. The probability of having a roundoff error greater than 0.5 over an entire LL test only needs to be small compared to other typical error risks, like gamma rays, people using excessive overclocking, or whatever effects cause the ~1% errors revealed by doublechecking. You probably want a per-iteration error greater than 0.5 to be much less than about 10e-10, say. So if 0.5 is outside about 6 or 7 sigmas of your error distribution, you would be OK, ASSUMING the distribution is nearly Gaussian.

You might look at the integral of the error histograms you get to see if they deviate subtly from the error function. I was unable to load the raw data you referenced; I'm using the OLD Excel 4.0. If you want to send me some raw data readable by Excel 4.0 I would be happy to look at it and look for how faithfully it follows a Gaussian.

Sincerely,
Todd Sauke
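A sketch of the safety estimate Todd outlines, ASSUMING a strictly Gaussian error distribution. The mu and sigma below are made-up fit parameters (not measured values), and the iteration count is a rough stand-in for one LL test:

#include <math.h>
#include <stdio.h>

int main(void)
{
    double mu = 0.25, sigma = 0.02;    /* hypothetical fitted centre and width */
    double iters = 5.0e6;              /* rough number of iterations in one LL test */

    double k = (0.5 - mu) / sigma;     /* how many sigmas out the 0.5 cliff lies */
    double p_iter = 0.5 * erfc(k / sqrt(2.0));        /* per-iteration tail */
    double p_test = -expm1(iters * log1p(-p_iter));   /* 1-(1-p)^iters, underflow-safe */

    printf("0.5 is %.1f sigmas out: p(iteration) = %.2e, p(whole test) = %.2e\n",
           k, p_iter, p_test);
    return 0;
}

The expm1/log1p pair matters here: with p_iter far below machine epsilon, computing pow(1.0 - p_iter, iters) directly would round to exactly 1 and report a zero failure probability.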