willem wrote:
> 1 If we take the mandelbrot parameter N = 5000, then CalculatePoint will be
> called 25000000 times.
> And CalculatePoint pushes x*Step - 1.5, Cy on the stack,
> then it executes the statements of CalculatePoint,
> and finally it pops a boolean from the stack.
It does not do any pushes and pops; you can compile to assembler and
see for yourself. In this particular mandelbrot program, data is passed to
CalculatePoint via global variables.
In general, FPC passes up to 3 parameters into functions by using
registers, and simple (boolean, integer and alike) function return
values are also returned in registers.
> Together that is 50000000 pushes and pops.
There is a loop in CalculatePoint that executes 600 assembler
instructions per point (on average). Saving 2 push/pop instructions will
improve speed by 1/300, which won't be noticeable.
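
To put numbers on that estimate, here is a minimal sketch. The figures (N = 5000, 600 instructions per point, 2 instructions saved per call) are the ones quoted in this thread; the function names CallCount and RelativeSaving are made up for illustration and are not part of mandelbrot.pas.

```pascal
program CallOverhead;
{$mode objfpc}

// Number of CalculatePoint calls for an N x N image (hypothetical helper).
function CallCount(N: Int64): Int64;
begin
  Result := N * N;
end;

// Fraction of the per-point work removed when SavedInstr out of
// InstrPerPoint instructions disappear (hypothetical helper).
function RelativeSaving(SavedInstr, InstrPerPoint: Integer): Double;
begin
  Result := SavedInstr / InstrPerPoint;
end;

var
  Percent: Double;
begin
  // Figures taken from the discussion above.
  WriteLn('Calls to CalculatePoint: ', CallCount(5000));
  Percent := RelativeSaving(2, 600) * 100;
  WriteLn('Saving per point: ', Percent:0:2, ' %');
end.
```

That works out to 25000000 calls and a saving of about a third of one percent per point, assuming the 600-instruction average holds.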
> You can avoid that by making CalculatePoint inline,
> so there are no function calls in the inner loops of mandelbrot.pas.
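
For reference, this is roughly what the inline suggestion looks like in FPC. It is only a sketch: the parameter list (Cx, Cy), the constants MaxIter and Limit, and the loop body are assumptions made for illustration, since the real mandelbrot.pas passes its data through global variables. Note that inline is a request; the compiler may still emit an ordinary call.

```pascal
program InlineSketch;
{$mode objfpc}
{$inline on}   // allow FPC to honour the "inline" directive

const
  MaxIter = 50;    // hypothetical iteration limit
  Limit   = 4.0;   // squared escape radius (assumption)

// Hypothetical signature; returns True if the point did not escape.
function CalculatePoint(Cx, Cy: Double): Boolean; inline;
var
  Zr, Zi, Tr, Ti: Double;
  i: Integer;
begin
  Zr := 0;
  Zi := 0;
  Result := True;
  for i := 1 to MaxIter do
  begin
    Tr := Zr * Zr;
    Ti := Zi * Zi;
    if Tr + Ti > Limit then
    begin
      Result := False;   // escaped, so the point is outside the set
      Exit;
    end;
    Zi := 2 * Zr * Zi + Cy;
    Zr := Tr - Ti + Cx;
  end;
end;

begin
  WriteLn(CalculatePoint(0.0, 0.0));   // origin never escapes
  WriteLn(CalculatePoint(2.0, 2.0));   // escapes on the second iteration
end.
```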
> 2 SSE2
> SSE is based on SIMD (single instruction, multiple data).
> SSE can multiply 4 pairs of single-precision values with one multiply
> instruction.
> In CalculatePoint there are 4 multiplications:
> 2*Zr*Zi
> Zi*Zi
> Zr*Zr
> SSE2 can multiply these parameters in one instruction with the mulps
> instruction (multiply packed single).
> But the parameters must be single precision, not double precision.
> Single precision is 32 bits; I think it will not affect the outcome of
> the mandelbrot bitmap.
> But 4 multiplications in 1 instruction will speed up the program.
> We need only a 4*2 matrix (a*b).
> matrix a:
> 2*Zr*Zi*1
> Zi*Zi*1*1
> Zr*Zr*1*1
> 1*1*1*1
> matrix b is the output.
But the question is - how many instructions will you need to arrange
these matrices before they can be multiplied?
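
To make the arrangement cost visible, here is a plain Pascal emulation of a packed multiply. It is not SSE code: TVec4 merely stands in for one 128-bit register, MulPacked imitates what mulps does lane by lane, and PackedProducts is a made-up helper name. The lane layout is one possible choice, not the only one.

```pascal
program PackedMulSketch;
{$mode objfpc}

type
  TVec4 = array[0..3] of Single;   // stands in for one 128-bit SSE register

// Emulates mulps: four independent single-precision products.
function MulPacked(const A, B: TVec4): TVec4;
var
  i: Integer;
begin
  for i := 0 to 3 do
    Result[i] := A[i] * B[i];
end;

// Hypothetical helper: arranges the lanes, then does one packed multiply.
// Result lanes: [Zr*Zi, Zi*Zi, Zr*Zr, unused].
function PackedProducts(Zr, Zi: Single): TVec4;
var
  A, B: TVec4;
begin
  // Arrangement step: every lane must be filled before the multiply.
  A[0] := Zr;  B[0] := Zi;   // Zr*Zi, still has to be doubled afterwards
  A[1] := Zi;  B[1] := Zi;   // Zi*Zi
  A[2] := Zr;  B[2] := Zr;   // Zr*Zr
  A[3] := 1;   B[3] := 1;    // unused lane
  Result := MulPacked(A, B);
end;

var
  P: TVec4;
begin
  P := PackedProducts(0.5, -0.25);   // arbitrary sample values
  WriteLn('2*Zr*Zi = ', (2 * P[0]):0:4);
  WriteLn('Zi*Zi   = ', P[1]:0:4);
  WriteLn('Zr*Zr   = ', P[2]:0:4);
end.
```

Even in this toy version the single multiply is surrounded by eight lane assignments plus an extra doubling of lane 0, and the results still have to be taken out of the vector again. How that translates into real shuffle and move instructions depends on the code the compiler or the hand-written assembler produces.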
Regards,
Sergei