At 07:47 AM 3/11/2003 -0500, Richard Woods wrote:
However, any difference in FFT size between a P4 and other CPU, because
of SSE support/nonsupport, could make a difference to the algorithm
because it _does_ take FFT size into account.
There was a bug in calculating the FFT size (bytes of memory consumed)
for the P4. This bug caused the P-1 bounds selecting code to produce
different results than the x86 code. This is a fairly benign bug and will be
fixed in version 23.3.
In case you care, the details are: There is a global variable called FFTLEN
that is used in many places and is initialized by the FFT init routine. The
routine to select the P-1 bounds is called before the FFT code is initialized.
Thus, the routine to calculate the number of bytes consumed by an FFT
cannot use the global variable FFTLEN. In fact, that routine is passed
an argument - fftlen in lower case. Well, you guessed it, in the P4 section
of the routine I referenced FFTLEN rather than fftlen. The routine worked
fine once the FFT code was initialized - only the P-1 bounds selecting code
was affected.
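
A minimal sketch of the mix-up described above, in C. This is not the
actual source: the function names, the is_p4 flag, and the padding rule
are hypothetical and exist only for illustration. Only the two names
FFTLEN and fftlen come from the description.

```c
#include <assert.h>
#include <stddef.h>

/* Global set by the FFT init routine; still 0 before initialization. */
unsigned long FFTLEN = 0;

/* Hypothetical padding rule (an assumption, not the real one):
   add pad_per_4k extra bytes for every 4096 bytes of raw data. */
size_t padded_bytes(unsigned long len, size_t pad_per_4k)
{
    size_t raw = (size_t)len * sizeof(double);
    return raw + (raw / 4096) * pad_per_4k;
}

/* Buggy shape: the P4 branch reads the global instead of the argument. */
size_t fft_bytes_buggy(unsigned long fftlen, int is_p4)
{
    assert(fftlen > 0);
    if (is_p4)
        return padded_bytes(FFTLEN, 128);  /* wrong: global may be unset */
    return padded_bytes(fftlen, 64);
}

/* Fixed shape: both branches use the argument that was passed in. */
size_t fft_bytes_fixed(unsigned long fftlen, int is_p4)
{
    assert(fftlen > 0);
    if (is_p4)
        return padded_bytes(fftlen, 128);
    return padded_bytes(fftlen, 64);
}
```

Called before FFT init, the buggy P4 branch in this sketch computes its
size from a zero FFTLEN. Once FFTLEN has been set to the same length as
the argument, the two versions agree, which matches the routine working
fine after initialization.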
BTW, the FFT size is more than FFT length * sizeof (double). There are
various paddings thrown in for better cache usage. Sadly, if I had just
used "FFT length * sizeof (double)" as an estimate for the size in selecting
the P-1 bounds, this bug never would have happened, and that simpler
estimate is more than accurate enough for the purposes of selecting bounds.