This might prove quite complicated to implement. Writing a similar test function and modifying my node dump to get what I need (the node dump tool does its work before the second pass, so inlined functions aren't expanded), I get a very complicated node tree for a single inlined function call to "floor" with a parameter of type Single:
----
...
......
.........2
.........SmallInt
.........$000000005E511570
.........
......
...
...
......
.........8
.........Double
.........$000000005E5114F0
.........
......
...
...
......
.........
............Double
............$000000005E5114F0
............ti_may_be_in_reg
.........
......
......
.........
............
...............TESTSINGLES
............
.........
.........
............
...............X
............
.........
......
...
...
......
.........
............
...............SmallInt
...............$000000005E511570
...............ti_may_be_in_reg
............
.........
......
......
.........
............
...............
..................Double
..................$000000005E5114F0
..................ti_may_be_in_reg
...............
............
.........
.........
............
...............
..................
..................... $fpc_frac_real(Double):Double;
.....................
........................
........................
...........................Double
...........................$000000005E5114F0
...........................ti_may_be_in_reg
........................
.....................
..................
...............
...............
.................. 0.0000000000000000E+000
...............
............
.........
......
...
...
......
.........FALSE
.........Double
.........tt_persistent
.........$000000005E5114F0
......
...
...
......
.........TRUE
.........SmallInt
.........tt_persistent
.........$000000005E511570
......
...
...
......
.........SmallInt
.........$000000005E511570
.........ti_may_be_in_reg
......
...
----

Because it's a test function, the result type is SmallInt rather than LongInt due to the compiler options, but the effect is the same. Most of the attributes and flags I can ignore, but it's going to be a mammoth task to check all of these nodes and confirm that the function is what it's meant to be.
Don't get me wrong, it can be done, but I'm worried it will take the compiler a disproportionately long time to do so. Still, this is an example where the node dump is useful, even if it still needs some work.

Gareth aka. Kit

On Mon 04/02/19 19:28 , "J. Gareth Moreton" gar...@moreton-family.com sent:

I might hold on this for a little bit until I get more out of my node outputting feature, since I need to study the nodes produced by an inlined Floor function carefully. For example, Floor's formal parameter is further passed separately into Trunc and Frac. Normally that's not a problem, but if the actual parameter is a complex expression (i.e. isn't a simple constant or variable), then it may produce even more nodes as it's calculated twice, once for Trunc and once for Frac... or it's computed beforehand and put into a temporary store that's hidden from the programmer. I won't know for sure until I study the nodes and make a good contingency.

I'll likely make 3 versions of the floor function (not including the Pascal version that already exists, which the compiler can fall back on if it's dealing with the "Extended" type, for example): one that uses SSE2, one that uses SSE4.1 (which introduces the ROUNDSD instruction) and one that uses AVX (which is effectively identical to the SSE4.1 one, albeit using the AVX instructions).

The node optimisation is definitely the better choice, thinking about it now, also because if the compiler determines that the parameters are of type Single, it can use the single-precision SSE instructions rather than converting from Single to Double and back again. I just feel like this is possibly a little bloated because it's the kind of optimisation that belongs to an internal function rather than one in a supplementary unit... unless you want to promote "floor" and similar functions from the Math unit into internal functions through the System unit.
This is proving to be a fascinating learning experience, not just of coding but also of design and discussion!

Gareth aka. Kit

On Mon 04/02/19 20:04 , "Florian Klämpfl" flor...@freepascal.org sent:

On 04.02.19 at 17:47, J. Gareth Moreton wrote:
> Oh whoops, sorry about that and not replying to the list.
>
> I'll try not to screw up. Generally I think Double is preferred because
> then everything uses SSE2 and no awkward ferrying of data between it and
> the floating-point stack is required (come to think of it, only Win64
> actually requires the presence of SSE2 and refuses to install if it's
> not present).
>
> Given that Florian prefers a node micro-optimisation for functions like
> floor, it should be easy enough to check if the input is of type Single
> or Double, and drop out if it's Extended (falling back to the actual
> source code).

Well, in the case of a node optimization in combination with inline, I do not see it as a real micro-optimization, as it results in the best code, which is not the case if it is ifdef'ed assembler code in a unit, which is not used most of the time (the fpc x86-64 rtl is normally built with -Cfsse2).
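The question raised in the quoted message, whether a complex actual parameter ends up calculated twice (once for Trunc and once for Frac) or is stored in a hidden temporary first, can be illustrated in plain Pascal. What follows is only a sketch under stated assumptions: `FloorViaTruncFrac` is written in the spirit of the Pascal fallback discussed in the thread and is not a copy of the RTL source, and `Expensive` and `EvalCount` are hypothetical names introduced purely to make evaluations countable. It says nothing about the nodes the compiler actually generates.

```pascal
program FloorSketch;

{$mode objfpc}

var
  { Hypothetical counter: records how often the argument expression runs. }
  EvalCount: Integer = 0;

{ Hypothetical stand-in for a "complex expression" with a visible side effect. }
function Expensive(X: Double): Double;
begin
  Inc(EvalCount);
  Result := X * 0.5;
end;

{ Floor built from Trunc and Frac (an illustration, not the RTL code):
  truncate towards zero, then subtract 1 if the fractional part is negative. }
function FloorViaTruncFrac(X: Double): Int64; inline;
begin
  Result := Trunc(X) - Ord(Frac(X) < 0);
end;

var
  A, B, C: Int64;
  T: Double;
begin
  { Naive textual expansion: the argument expression appears twice. }
  EvalCount := 0;
  A := Trunc(Expensive(-5.0)) - Ord(Frac(Expensive(-5.0)) < 0);
  WriteLn('textual expansion: ', A, ', evaluations = ', EvalCount);

  { Expansion through a temporary: the argument is evaluated once. }
  EvalCount := 0;
  T := Expensive(-5.0);
  B := Trunc(T) - Ord(Frac(T) < 0);
  WriteLn('via temporary: ', B, ', evaluations = ', EvalCount);

  { An ordinary call evaluates a value parameter once, inlined or not. }
  EvalCount := 0;
  C := FloorViaTruncFrac(Expensive(-5.0));
  WriteLn('function call: ', C, ', evaluations = ', EvalCount);
end.
```

All three forms yield -3 for an argument of -2.5, but the textual expansion evaluates `Expensive` twice while the other two evaluate it once. Since a value parameter is evaluated once per call, an inlined call has to behave like the temporary-based form, so any node-level pattern match for floor would presumably need to recognise a tree that includes such temporaries.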
_______________________________________________ fpc-devel maillist - fpc-devel@lists.freepascal.org http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel