Thanks Jonas. I'll see what I can put together. A record with a single field is a bit of a special case, but one I'll keep in mind. More than anything, I'll have to study the disassembly to see what's happening, and whether things are faster with primitive types simply because they're register variables (which are always faster than stack variables, even when the stack is in L1 cache) or because of something else.

Gareth aka. Kit

On 28/06/2020 12:54, Jonas Maebe wrote:
[accidentally only sent to Gareth initially]

On 28/06/2020 12:31, J. Gareth Moreton wrote:
So someone reached out to me directly again asking for an FPC
optimisation.  Now I want to see if this is possible to optimise and
won't break something or be annoyingly specific.
The general optimisation that would handle this is promoting individual
record members into standalone variables when possible. FPC currently
has no support at all for this.
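
To make "promoting record members into standalone variables" concrete,
here is a minimal before/after sketch. It is written in C purely for
illustration and is not FPC code; struct point and both function names
are made up for the example. The second function is what the first one
would effectively become after such a promotion, assuming the record's
address never escapes the routine:

```c
#include <stdint.h>

/* Hypothetical two-field record, used only for this illustration. */
struct point { int32_t x, y; };

/* Source form: the loop updates the fields of a local record, which
   without promotion lives on the stack as a whole. */
static int32_t sum_via_record(int32_t n)
{
    struct point p = { 0, 0 };
    for (int32_t i = 0; i < n; i++) {
        p.x += i;
        p.y += 2 * i;
    }
    return p.x + p.y;
}

/* After promotion: each member is an independent scalar, so each one
   can be assigned to its own register like any other local variable. */
static int32_t sum_via_scalars(int32_t n)
{
    int32_t p_x = 0;
    int32_t p_y = 0;
    for (int32_t i = 0; i < n; i++) {
        p_x += i;
        p_y += 2 * i;
    }
    return p_x + p_y;
}
```

Both functions compute the same result; the difference is only in what
the register allocator gets to work with.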

An optimisation that's a bit less general (although orthogonal in some
cases, namely when you don't need to access individual members), is
keeping records as a whole in a register. FPC already has support for
this, see tstoreddef.is_intregable and tabstractvarsym.setregable.

It does not get triggered here on x86-64 because of another involved
method: the {$if} at the end of tabstractvarsym.is_regvar. On all
architectures except PowerPC and PowerPC64, that code prevents records
that are written to from being kept in registers.

The reason for this is that other supported architectures lack
instructions to efficiently extract and insert bitfields from/into
integer registers (although perhaps some of the newer x86-64 include
them as part of an extension; and I think AArch64 and certain MIPS
subarchs could also support it efficiently). This means that to perform
an operation on a field of a record kept in a register, you have to do
the following in the general case:
1) extract the field. On generic x86, that would be a move to a
temporary register, then possibly a shift, and then possibly an "and".
2) perform the operation
3) possibly shift back the value to the correct position, clear it in the
original register (mask its position with 0), and then "or" the result
to insert it again
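
As a rough illustration of those three steps (in C rather than anything
from FPC's code generator, and with a made-up layout of two 16-bit
fields packed into one 32-bit value), updating one field of a
register-held record could look like the sketch below. HI_SHIFT,
HI_MASK, struct pair and both function names are assumptions for the
example only:

```c
#include <stdint.h>

/* Hypothetical layout: "lo" in bits 0..15, "hi" in bits 16..31. */
#define HI_SHIFT 16
#define HI_MASK  0xFFFFu

/* Increment the "hi" field of a record held entirely in one register. */
static uint32_t inc_hi_in_register(uint32_t rec)
{
    uint32_t tmp = rec;                       /* 1) move to a temporary   */
    tmp >>= HI_SHIFT;                         /*    shift the field down  */
    tmp &= HI_MASK;                           /*    "and" to isolate it   */

    tmp = (tmp + 1) & HI_MASK;                /* 2) perform the operation */

    tmp <<= HI_SHIFT;                         /* 3) shift back into place */
    rec &= ~((uint32_t)HI_MASK << HI_SHIFT);  /*    clear the old field   */
    rec |= tmp;                               /*    "or" the result in    */
    return rec;
}

/* The same update when the record simply stays in memory. */
struct pair { uint16_t lo, hi; };

static void inc_hi_in_memory(struct pair *p)
{
    p->hi++;  /* one load, one add, one store (or a single read-modify-write) */
}
```

The register version needs roughly half a dozen instructions plus a
temporary register, where the memory version needs one to three.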

In this case, just loading a value from memory (probably L1 cache, since
register variables are only used locally within a single routine),
performing the operation, and storing it back, is quite likely to be
faster, and definitely results in much smaller code.

However, as you've undoubtedly realised, in this case none of that
shifting/masking would come into play, since the record only contains a
single field. So you could definitely add an exception for that case for
all architectures. We even have the perfect helper method for that in
the meantime: tabstractrecordsymtable.has_single_field()
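
The rule described above, together with the suggested exception, can be
modelled roughly as follows. This is a deliberately simplified sketch in
C, not the actual logic of tabstractvarsym.is_regvar (which checks more
than this); the enum, the struct and all function names are hypothetical
stand-ins:

```c
#include <stdbool.h>

/* Hypothetical model only; none of these names exist in FPC. */
enum arch { ARCH_X86_64, ARCH_AARCH64, ARCH_POWERPC, ARCH_POWERPC64 };

struct record_info {
    bool fits_in_int_register;  /* roughly what is_intregable decides */
    bool is_written_to;         /* the record is modified in the routine */
    bool has_single_field;      /* what has_single_field() would report */
};

/* Architectures exempted by the {$if}: they can extract and insert
   bitfields efficiently. */
static bool has_cheap_bitfield_ops(enum arch a)
{
    return a == ARCH_POWERPC || a == ARCH_POWERPC64;
}

/* Behaviour as described in this mail. */
static bool keep_in_register_current(enum arch a, const struct record_info *r)
{
    if (!r->fits_in_int_register)
        return false;
    if (r->is_written_to && !has_cheap_bitfield_ops(a))
        return false;
    return true;
}

/* With the suggested exception: a single-field record never needs any
   shifting or masking, so it qualifies on every architecture. */
static bool keep_in_register_proposed(enum arch a, const struct record_info *r)
{
    if (!r->fits_in_int_register)
        return false;
    if (r->has_single_field)
        return true;
    return keep_in_register_current(a, r);
}
```

With this model, a written-to single-field record on x86-64 is rejected
by the current rule and accepted by the proposed one, while multi-field
records behave the same under both.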


Jonas

PS: that person also asked the same question on the forum
(https://forum.lazarus.freepascal.org/index.php?topic=50364)

PS2: the case Benito mentions is a different thing again. Managed
records can never be kept *only* in a register, because they need
initialisation and finalisation, which requires them to be in memory.
Caching individual fields of those locally in a register (while the
record itself remains in memory) would definitely require the general
optimisation I mentioned in the first paragraph.
_______________________________________________
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel

