Michael Schnell wrote:
When it accesses a misaligned 32-bit value, it does two accesses (not four): e.g. one for 8 bits and one for 24 bits (when reading, each of the two accesses fetches a full 32-bit word anyway).
Logically, you should think about it the way I explained. That Intel added an optimization to reduce the speed impact is a different issue: internally the processor still needs separate "8 bit" data paths and has to shift to reorder the bytes.
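To make the "two accesses plus shifting" picture concrete, here is a minimal sketch in C. It is only a software model of a little-endian memory, not a description of what any particular processor does internally. The function name `load_u32_two_accesses` and its parameters are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: model a 32-bit little-endian load at an arbitrary
   byte offset using only aligned 32-bit word reads. */
uint32_t load_u32_two_accesses(const uint32_t *words, size_t word_count,
                               size_t byte_offset)
{
    size_t   index = byte_offset / 4;                  /* aligned word holding the first byte */
    unsigned shift = (unsigned)(byte_offset % 4) * 8;  /* bit offset inside that word */

    /* A misaligned load touches one extra word; make sure it exists. */
    assert(index + (shift != 0 ? 1u : 0u) < word_count);

    uint32_t low = words[index];                       /* first aligned access */
    if (shift == 0)
        return low;                                    /* aligned: one access is enough */

    uint32_t high = words[index + 1];                  /* second aligned access */

    /* Upper bytes of the first word become the low part of the result,
       lower bytes of the second word become the high part. */
    return (low >> shift) | (high << (32 - shift));
}
```

With the two words `0x44332211` and `0x88776655`, a load at byte offset 3 takes 8 bits from the first word and 24 bits from the second and yields `0x77665544`, which matches the 8 bit / 24 bit split mentioned above.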
Perhaps this behaviour is specified in their optimization documents, or maybe you have the VHDL source? :-)
Transferring data from/to the 1st level cache imposes a lot more delay than the misaligned access. Thus if there are many instances of a record variable that are used for calculation, it might be much faster to use the packed version. If there are only a few, usually the unpacked version should be faster.
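The cache argument is about record size, which can be sketched as follows. This is C rather than Pascal, and it assumes a compiler that supports `#pragma pack` (GCC, Clang and MSVC do). The record types and the helper `records_per_cache` are hypothetical, and the sketch only compares sizes; it does not measure speed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Naturally aligned layout: the compiler typically inserts padding
   after `tag` so that `value` sits on a 4-byte boundary. */
struct unpacked_rec {
    uint8_t  tag;
    uint32_t value;
};

/* Packed layout: no padding, so `value` starts at offset 1 (misaligned). */
#pragma pack(push, 1)
struct packed_rec {
    uint8_t  tag;
    uint32_t value;
};
#pragma pack(pop)

/* Hypothetical helper: how many whole records fit into cache_bytes. */
size_t records_per_cache(size_t cache_bytes, size_t record_size)
{
    assert(record_size > 0);
    return cache_bytes / record_size;
}
```

The packed record is 5 bytes, while the unpacked one is usually 8 on common ABIs. For an assumed 32 KiB cache, `records_per_cache(32768, 5)` gives 6553 records against 4096 for 8-byte records. Whether that gain outweighs the misaligned accesses is exactly what a benchmark would have to show.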
Show me the benchmark results ;-)

Micha