Re: [fpc-devel] Nested functions in numlib

2017-04-05 Thread Michael Schnell
Performance-wise, moreover for some tasks (such as Matrix 
multiplication) with modern multi-core machines parallel calculation 
could increase performance greatly. This can be done e.g. by using a 
thread pool. (I once did a thread pool implementation based on TThread, 
but I suppose there are more "official" sources).


-Michael
___
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel


Re: [fpc-devel] Nested functions in numlib

2017-04-04 Thread Werner Pamler

Am 04.04.2017 um 03:28 schrieb Marco van de Voort:

Did you test performance? Repeated access to parent frame in tight loops
might be suboptimal. Could maybe be helped with some pointer work?


Right, I should have done that before asking...

Here are the results of a test running the original roof1r routine (A), 
the modified one using the nested function (B) and other modified one 
using a non-nested function but calling the version with the nested 
function (C). In each case, several functions are passed to the root 
finder which is called 5 million times, each call with a (reproducibly) 
different parameter:


f(x) = x
  (A)   ORIGINAL version: 0.656s for 500 
runs (check: y = 0.)
  (B) NESTED version: 0.703s for 500 
runs (7%)
  (C)Global function calling nested function: 0.735s for 500 
runs (12%)


f(x) = x^2
ORIGINAL version: 6.296s for 500 
runs (check: y = 0.)
  NESTED version: 6.313s for 500 
runs (0%)
 Global function calling nested function: 6.546s for 500 
runs (4%)


f(x) = exp(x)
ORIGINAL version: 6.734s for 500 
runs (check: y = 0.)
  NESTED version: 6.703s for 500 
runs (0%)
 Global function calling nested function: 6.890s for 500 
runs (2%)


f(x) = arcsin(x)
ORIGINAL version: 5.718s for 500 
runs (check: y = 0.)
  NESTED version: 5.718s for 500 
runs (0%)
 Global function calling nested function: 5.937s for 500 
runs (4%)


f(x) = erf(x)
ORIGINAL version: 6.391s for 500 
runs (check: y = 0.)
  NESTED version: 6.422s for 500 
runs (0%)
 Global function calling nested function: 6.673s for 500 
runs (4%)


f(x) = gammaLn(x)
ORIGINAL version: 15.260s for 500 
runs (check: y = 0.)
  NESTED version: 15.142s for 500 
runs (-1%)
 Global function calling nested function: 15.426s for 500 
runs (1%)


I would interpret these results such that there are no dramatic 
slow-downs due to calling variant C. Variant B (nested funtion) is 
roughly the same speed as the original procedure.

___
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel


Re: [fpc-devel] Nested functions in numlib

2017-04-04 Thread Marco van de Voort
In our previous episode, Marco van de Voort said:
> > Is there a chance that such a patch would be accepted?
> 
> Did you test performance? Repeated access to parent frame in tight loops
> might be suboptimal. Could maybe be helped with some pointer work?

(no it can't be helped with pointer work, it is a loop around a function
call, not a memory access, don't reply in the middle of the
night Marco)
___
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel


Re: [fpc-devel] Nested functions in numlib

2017-04-03 Thread Marco van de Voort
In our previous episode, Werner Pamler said:
> Is there a chance that such a patch would be accepted?

Did you test performance? Repeated access to parent frame in tight loops
might be suboptimal. Could maybe be helped with some pointer work?
___
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel


[fpc-devel] Nested functions in numlib

2017-04-03 Thread Werner Pamler
In addition to the incomplete gamma function I am planning to add a 
series of other special functions useful for statistical calculations, 
among them the incomplete beta function and its inverse.


Calculation of the inverse incomplete beta function usually is performed 
in the literature by numerical root finding procedures. Numlib is 
well-equipped for this purpose, it has the unit "roo" with the procedure 
"roof1r" which does a bisection search of the root lying between 
positive and negative guess values. The function for which the zero 
point is to be found must be specified as a procedure variable of type 
rfunc1r (defined in "typ", probably meaning "real function of one real 
variable"); in case of the inverse incomplete beta function, this 
function is the complete beta function, of course, and this depends on 
three parameters, a, b, and x.


In the current state of numlib, parameters must be declared globally so 
that the function passed as a parameter gets access to them:


  var
_a, _b, _y: ArbFloat;

  function _betai(x: ArbFloat): ArbFloat;
  begin
Result := betai(_a, _b, x) - _y;
  end;

  function invbetai(a, b, y: ArbFloat): ArbFloat;
  const
EPS = 1e-7;
  var
term: ArbInt = 0;
  begin
_a := a;
_b := b;
_y := y;
roof1r(@_betai, 0, 1, EPS, EPS, Result, term);
  end;

This is not very nice. It would be better if the test function could be 
declared as a nested function inside "invbetai":


  function invbetai(a, b, y: ArbFloat): ArbFloat;

function _invbetai(x: ArbFloat): ArbFloat;
begin
  Result := betai(a, b, x) - y;
end;

  const
EPS = 1e-7;
  var
term: ArbInt = 0;
  begin
roof1r(@_betai(0, 1, EPS, EPS, Result, term);
  end;

In order to make this construction pass the compiler I would like to

- declare another rfuncf1r type, now as "is nested"
type
  rfuncf1rn = function(x: ArbFloat): ArbFloat is nested;   // 
trailing "n" means "nested"


- modify roof1r to use the nested function
Procedure roof1r(f: rfunc1rn; a, b, ae, re: ArbFloat; Var x: 
ArbFloat; var term: ArbInt); overload;


- to avoid breaking existing code and to avoid forcing users of the 
original version to add a {$modeswitch nestedprocvars} to their code, I 
want to overload this function with a version using the non-nested 
function, but calling the nested version:


  Procedure roof1r(f: rfunc1r; a, b, ae, re: ArbFloat; Var x: ArbFloat; 
Var term: ArbInt); overload;


function _f(x: ArbFloat): ArbFloat;
begin
  Result := f(x);
end;

  begin
roof1r(@_f, a, b, ae, re, x, term);
  end;

Is there a chance that such a patch would be accepted?

___
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel