willem wrote:

1 If we take the mandelbrot parameter N = 5000, then CalculatePoint will be called 25000000 times.
And CalculatePoint pushes x*Step - 1.5, Cy on the stack,
then it executes the statements of CalculatePoint,
and finally it pops a boolean from the stack.

It does not do any pushes and pops; you can compile to assembler and see for yourself. In this particular mandelbrot program, data is passed to CalculatePoint via global variables. In general, FPC passes up to 3 parameters to functions in registers, and simple function return values (boolean, integer and the like) are also returned in registers.

Together that makes 50000000 pushes and pops.
There is a loop in CalculatePoint that executes about 600 assembler instructions per point (on average). Saving 2 push/pop instructions per call will improve speed by about 1/300, which won't be noticeable.
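
As a rough check of that estimate, here is a small sketch in Python. It only redoes the arithmetic with the approximate figures quoted in this thread (N = 5000, about 600 instructions per point, 2 instructions saved per call); none of these are measurements, and the variable names are made up for illustration.

```python
# Back-of-the-envelope estimate of the gain from removing call overhead.
# All figures are the approximate ones quoted in the thread, not measurements.
N = 5000
calls = N * N                  # one CalculatePoint call per pixel
instr_per_point = 600          # assumed average cost of the loop per point
saved_per_call = 2             # assumed: one push and one pop per call

total_instr = calls * instr_per_point
saved_instr = calls * saved_per_call
fraction_saved = saved_instr / total_instr   # share of all instructions saved

print("calls:", calls)
print("instructions saved:", saved_instr)
print("fraction of work saved:", fraction_saved)
```

The saved share depends only on the ratio 2/600, so it stays the same whatever N is.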

You can avoid that by inlining CalculatePoint,
so that there are no function calls in the inner loops of mandelbrot.pas.

2 SSE2
SSE is a SIMD (single instruction, multiple data) instruction set.
SSE can multiply four pairs of single-precision values with one multiply instruction.
In CalculatePoint there are 4 multiplications:
2*Zr*Zi
Zi*Zi
Zr*Zr

SSE2 can multiply these parameters in one instruction with the MULPS instruction (multiply packed single).
But the parameters must be single precision, not double precision.
Single precision is 32 bits; I think it will not affect the outcome of the mandelbrot bitmap.
But 4 multiplications in 1 instruction will speed up the program.
We need only a 4*2 matrix (a*b).
matrix a:
2*Zr*Zi*1
Zi*Zi*1*1
Zr*Zr*1*1
1*1*1*1
matrix b is the output.

But the question is - how many instructions will you need to arrange these matrices before they can be multiplied?
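
To make the question concrete, here is a minimal sketch of what a packed single multiply does, simulated in plain Python rather than written as real SSE code. The helpers `to_single` and `mulps`, and the lane layout in `left` and `right`, are assumptions made up for this illustration. The real instruction multiplies four single-precision lanes element by element; it is not a matrix product. Filling the two operands is the "arranging" step that would cost extra instructions.

```python
import struct

def to_single(x):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def mulps(a, b):
    """Simulate a packed single multiply: four independent lane-wise products."""
    assert len(a) == 4 and len(b) == 4
    return [to_single(to_single(x) * to_single(y)) for x, y in zip(a, b)]

# Sample values, chosen to be exactly representable in single precision.
zr, zi = 0.25, -0.5

# Packing step: in real code, getting Zr and Zi into these lanes is extra work.
left = [zr, zi, zr, 2.0]
right = [zr, zi, zi, zr]

# One simulated instruction yields Zr*Zr, Zi*Zi, Zr*Zi and 2*Zr.
zr2, zi2, zrzi, two_zr = mulps(left, right)

# Real and imaginary parts of z*z, before the constant C is added.
new_zr_part = zr2 - zi2
new_zi_part = 2.0 * zrzi

print(zr2, zi2, zrzi, two_zr)
print(new_zr_part, new_zi_part)
```

Note that 2*Zr*Zi needs two dependent multiplications, so a single packed multiply cannot deliver it directly; here it is finished with a separate doubling.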

Regards,
Sergei
