Re: [fpc-devel] Nested functions in numlib
Regarding performance: for some tasks (such as matrix multiplication), parallel calculation on modern multi-core machines could also increase performance greatly. This can be done, e.g., by using a thread pool. (I once wrote a thread pool implementation based on TThread, but I suppose there are more "official" sources.)

-Michael
___
fpc-devel maillist - fpc-devel@lists.freepascal.org
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel
Re: [fpc-devel] Nested functions in numlib
On 04.04.2017 at 03:28, Marco van de Voort wrote:
> Did you test performance? Repeated access to parent frame in tight loops
> might be suboptimal. Could maybe be helped with some pointer work?

Right, I should have done that before asking...

Here are the results of a test running the original roof1r routine (A), the modified one using the nested function (B), and another modified one using a non-nested function but calling the version with the nested function (C). In each case, several functions are passed to the root finder, which is called 5 million times, each call with a (reproducibly) different parameter.

All times are for 500 runs; percentages are relative to (A). The original version reported "check: y = 0." in every case.

  f(x)         (A) original   (B) nested         (C) global calling nested
  x              0.656 s        0.703 s (7%)       0.735 s (12%)
  x^2            6.296 s        6.313 s (0%)       6.546 s (4%)
  exp(x)         6.734 s        6.703 s (0%)       6.890 s (2%)
  arcsin(x)      5.718 s        5.718 s (0%)       5.937 s (4%)
  erf(x)         6.391 s        6.422 s (0%)       6.673 s (4%)
  gammaLn(x)    15.260 s       15.142 s (-1%)     15.426 s (1%)

I interpret these results as showing no dramatic slowdown from calling variant C. Variant B (nested function) runs at roughly the same speed as the original procedure.
Re: [fpc-devel] Nested functions in numlib
In our previous episode, Marco van de Voort said:
> > Is there a chance that such a patch would be accepted?
>
> Did you test performance? Repeated access to parent frame in tight loops
> might be suboptimal. Could maybe be helped with some pointer work?

(No, it can't be helped with pointer work: it is a loop around a function call, not a memory access. Don't reply in the middle of the night, Marco.)
Re: [fpc-devel] Nested functions in numlib
In our previous episode, Werner Pamler said:
> Is there a chance that such a patch would be accepted?

Did you test performance? Repeated access to the parent frame in tight loops might be suboptimal. Could maybe be helped with some pointer work?
[fpc-devel] Nested functions in numlib
In addition to the incomplete gamma function, I am planning to add a series of other special functions useful for statistical calculations, among them the incomplete beta function and its inverse. In the literature, the inverse incomplete beta function is usually calculated by numerical root finding. Numlib is well-equipped for this purpose: it has the unit "roo" with the procedure "roof1r", which does a bisection search for the root lying between the positive and negative guess values.

The function whose zero is to be found must be specified as a procedure variable of type rfunc1r (defined in "typ", probably meaning "real function of one real variable"). In the case of the inverse incomplete beta function, this function is the incomplete beta function, of course, and it depends on three parameters: a, b, and x. In the current state of numlib, the parameters must be declared globally so that the function passed as a parameter has access to them:

  var
    _a, _b, _y: ArbFloat;

  function _betai(x: ArbFloat): ArbFloat;
  begin
    Result := betai(_a, _b, x) - _y;
  end;

  function invbetai(a, b, y: ArbFloat): ArbFloat;
  const
    EPS = 1e-7;
  var
    term: ArbInt = 0;
  begin
    _a := a;
    _b := b;
    _y := y;
    roof1r(@_betai, 0, 1, EPS, EPS, Result, term);
  end;

This is not very nice. It would be better if the test function could be declared as a nested function inside "invbetai":

  function invbetai(a, b, y: ArbFloat): ArbFloat;

    function _invbetai(x: ArbFloat): ArbFloat;
    begin
      Result := betai(a, b, x) - y;
    end;

  const
    EPS = 1e-7;
  var
    term: ArbInt = 0;
  begin
    roof1r(@_invbetai, 0, 1, EPS, EPS, Result, term);
  end;

In order to make this construction pass the compiler, I would like to

- declare another rfunc1r type, now as "is nested":

    type
      rfunc1rn = function(x: ArbFloat): ArbFloat is nested;
      // trailing "n" means "nested"

- modify roof1r to use the nested function:

    Procedure roof1r(f: rfunc1rn; a, b, ae, re: ArbFloat;
      Var x: ArbFloat; Var term: ArbInt); overload;

- overload this procedure with a version that takes the non-nested function but calls the nested version. This avoids breaking existing code and avoids forcing users of the original version to add a {$modeswitch nestedprocvars} to their code:

    Procedure roof1r(f: rfunc1r; a, b, ae, re: ArbFloat;
      Var x: ArbFloat; Var term: ArbInt); overload;

      function _f(x: ArbFloat): ArbFloat;
      begin
        Result := f(x);
      end;

    begin
      roof1r(@_f, a, b, ae, re, x, term);
    end;

Is there a chance that such a patch would be accepted?
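For readers who want to try the idea without numlib, the nested-procvar pattern from the original post can be sketched as a standalone program. This is only an illustration under stated assumptions: TFunc, TFuncNested, FindRootNested, FindRoot, SolveSquare and ShiftedLine are hypothetical names, the root finder is a plain bisection rather than numlib's roof1r, and the two root finders have distinct names instead of being overloads, to keep the sketch simple.

```pascal
program NestedRootDemo;

{$mode objfpc}
{$modeswitch nestedprocvars}

type
  // Hypothetical stand-ins for numlib's rfunc1r and the proposed rfunc1rn.
  TFunc = function(x: Double): Double;
  TFuncNested = function(x: Double): Double is nested;

// Plain bisection on [a, b]; assumes f(a) and f(b) have opposite signs.
// This is NOT numlib's roof1r, only a minimal root finder for the demo.
function FindRootNested(f: TFuncNested; a, b, eps: Double): Double;
var
  mid: Double;
  leftIsNegative: Boolean;
begin
  leftIsNegative := f(a) < 0;
  while b - a > eps do
  begin
    mid := 0.5 * (a + b);
    // Keep the half-interval over which the sign still changes.
    if (f(mid) < 0) = leftIsNegative then
      a := mid
    else
      b := mid;
  end;
  Result := 0.5 * (a + b);
end;

// Variant for plain (non-nested) function pointers: it wraps f in a
// nested function and forwards to the nested version, as proposed
// for the non-nested roof1r in the original post.
function FindRoot(f: TFunc; a, b, eps: Double): Double;

  function Wrapper(x: Double): Double;
  begin
    Result := f(x);
  end;

begin
  Result := FindRootNested(@Wrapper, a, b, eps);
end;

// Solves x^2 = y on [0, 10]; the nested function reads y from the
// enclosing scope, so no global variables are needed.
function SolveSquare(y: Double): Double;

  function Residual(x: Double): Double;
  begin
    Result := x * x - y;
  end;

begin
  Result := FindRootNested(@Residual, 0.0, 10.0, 1e-9);
end;

// A global function, passed through the non-nested entry point.
function ShiftedLine(x: Double): Double;
begin
  Result := x - 3.0;
end;

begin
  WriteLn(SolveSquare(2.0):0:6);
  WriteLn(FindRoot(@ShiftedLine, 0.0, 10.0, 1e-9):0:6);
end.
```

Only the unit that declares and calls the "is nested" type needs {$modeswitch nestedprocvars}; code that passes a global function such as ShiftedLine to FindRoot would compile without it if the root finders lived in a separate unit.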