Hi everyone,

I've been investigating a new optimisation that uses Florian's GetNextInstructionUsingRegTrackingUse method. It improves the removal of MOV instructions and the like that write to registers whose values are never used (usually because the subroutine exits soon after). A few times it even eliminates a register from a subroutine completely, which in theory means I can also remove the "push/pop" pair and the SEH directives for that register.

I have run into one problem though, and I haven't been able to solve it yet (although I have one idea that I'll investigate when I'm less tired). It seems that in some rare circumstances, volatile registers aren't deallocated properly after a "call" instruction. One case that stands out is the "fpc_ansistr_concat_multi" routine in the Win64 version of the System unit (search for ".section .text.n_fpc_ansistr_concat_multi" in the assembler dump "system.s"): it inserts a call to "fpc_unicodestr_assign" but doesn't free the volatile registers until much later. When compiling under -O4, the result is that "fpc_unicodestr_assign" is called, then %edx is ALLOCATED (an allocation that is then removed because the tracked registers show %rdx as already assigned), and then %edx is used for temporary storage (these instructions are removed by the peephole optimizer via "MovMov2Mov 3"). The problem is that, because of the register tracking and how GetNextInstructionUsingRegTrackingUse works, the "xorq %rdx,%rdx" instruction prior to "call fpc_unicodestr_assign" now looks like a dead store: the value of %rdx is completely overwritten by the instructions following the "call", and nothing in the tracking information indicates that this is a new allocation. This is of course false, because %rdx contains one of fpc_unicodestr_assign's parameters.
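
To make the failure mode concrete, here is a toy Python model of the forward scan. It is only a sketch and not FPC code: the names Item, looks_like_dead_store and routine are made up for illustration, and it assumes that the scan sees a parameter register as consumed by a "call" only through the deallocation marker that follows it, which is my reading of the behaviour described above rather than something taken from the compiler sources.

```python
# Toy model only: none of these names come from the FPC sources.
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    text: str                          # description, for display only
    reads: frozenset = frozenset()     # registers explicitly read
    writes: frozenset = frozenset()    # registers fully overwritten
    dealloc: frozenset = frozenset()   # registers released by a marker


def looks_like_dead_store(items, store_index, reg):
    """Scan forward from a store to reg. Return True if reg is overwritten
    before any explicit read or deallocation marker is seen."""
    for item in items[store_index + 1:]:
        if reg in item.reads or reg in item.dealloc:
            return False   # value was (or may have been) consumed
        if reg in item.writes:
            return True    # overwritten first, so the store looks dead
    return False           # ran off the end: stay conservative


def routine(dealloc_right_after_call):
    """Build the instruction stream with the marker early or late."""
    xor = Item("xorq %rdx,%rdx", writes=frozenset({"rdx"}))
    # Assumption: the call's use of %rdx as a parameter is implicit,
    # so it is not listed in 'reads'.
    call = Item("call fpc_unicodestr_assign")
    free = Item("dealloc %rdx marker", dealloc=frozenset({"rdx"}))
    # A write to %edx zero-extends, so it overwrites all of %rdx.
    temp = Item("write temporary to %edx", writes=frozenset({"rdx"}))
    if dealloc_right_after_call:
        return [xor, call, free, temp]
    return [xor, call, temp, free]


print(looks_like_dead_store(routine(True), 0, "rdx"))   # False
print(looks_like_dead_store(routine(False), 0, "rdx"))  # True
```

With the marker directly after the call, the scan stops there and the "xorq" is kept. With the marker delayed (and the redundant allocation dropped), the scan walks straight past the call to the temporary write and wrongly reports the "xorq" as dead.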

So far it's not causing problems with the peephole optimizer as it stands, but it's an annoying block in my optimisation work and is arguably incorrect as far as register tracking is concerned. Additionally, because all of the volatile registers are marked as 'in use' until many instructions later, the register allocator is forced to use a non-volatile register (%ebx in this case). To summarise, the volatile registers aren't being deallocated immediately after "call fpc_unicodestr_assign" in the "fpc_ansistr_concat_multi" subroutine; this is blocking potential new optimisations and is a source of minor inefficiencies.

If anyone has any answers or insight into this anomaly, I would be most grateful. Thank you.

Gareth aka. Kit

P.S. To compile the system unit and get the assembler dump as I describe, build the RTL under Win64 with 'make clean all FPC=(freshly built FPC binary) OPT="-O4 -a -ar"'. You might want to build FPC with the "-dDEBUG_OPTALLOC" definition as well for extra information in the dumps.

_______________________________________________
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel