Daniël Mantione wrote:
On Fri, 29 Feb 2008, Christian Iversen wrote:
Instead "unaligned" will simulate an unaligned load with two loads
and some rotation etc. On the ARM, where every mnemonic can rotate
operands, this isn't that bad of a penalty.
Therefore, I wouldn't be surprised if, even on ARM, arrays of
packed structures are faster than arrays of unpacked structures.
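
For readers following along, here is a rough sketch in C of what "two
loads and some rotation" could look like. It is only an illustration of
the idea, not what FPC actually emits: the helper name load_u32_unaligned
is made up, it assumes little-endian byte numbering within each word, and
it assumes the word after the one being read is also readable.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from FPC): fetch a 32-bit value that starts at
   an arbitrary byte offset, using only aligned 32-bit loads.
   Assumes little-endian byte numbering inside each word, and that
   words[index + 1] is valid whenever the offset is not a multiple of 4. */
static uint32_t load_u32_unaligned(const uint32_t *words, size_t byte_offset)
{
    size_t index = byte_offset / 4;                    /* aligned word holding the first byte */
    unsigned shift = (unsigned)(byte_offset % 4) * 8;  /* misalignment in bits */
    uint32_t lo = words[index];                        /* first aligned load */

    if (shift == 0)
        return lo;                                     /* already aligned, one load suffices */

    uint32_t hi = words[index + 1];                    /* second aligned load */

    /* Combine the upper bytes of the first word with the lower bytes of
       the second word. On ARM the shifts can be folded into the
       combining instruction, which is why the cost stays small. */
    return (lo >> shift) | (hi << (32 - shift));
}
```

The aligned case is handled separately because shifting a 32-bit value
by 32 bits is undefined in C.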
That's possible. Why would it be faster, btw? Better cache coherency?
Like I mentioned, unlike modern x86 processors, ARM processors cannot
detect an array traversal and preload the array into the cache. If the
array is not in cache, you get cache miss after cache miss.
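
To put a rough number on the cache-miss argument, here is a small sketch
in C. Everything in it is an assumption for illustration: the record
layout, the GCC/Clang-specific __attribute__((packed)), the helper name
cache_lines_touched, and the premise that the array starts on a cache
line boundary and is traversed once from a cold cache.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative record, not taken from any real code base.
   With natural alignment the compiler usually pads it to 8 bytes. */
struct unpacked_rec {
    uint8_t  tag;
    uint32_t value;
};

/* Same fields without padding: 5 bytes.
   __attribute__((packed)) is a GCC/Clang extension. */
struct __attribute__((packed)) packed_rec {
    uint8_t  tag;
    uint32_t value;
};

/* Hypothetical helper: number of cache lines covered by an array of
   `count` elements of `elem_size` bytes, assuming the array starts on a
   line boundary. With no prefetching, a cold sequential traversal
   misses roughly once per line. */
static size_t cache_lines_touched(size_t count, size_t elem_size,
                                  size_t line_size)
{
    size_t bytes = count * elem_size;
    return (bytes + line_size - 1) / line_size;  /* round up to whole lines */
}
```

With an assumed 32-byte line, 1000 packed records (5 bytes each) cover
157 lines, while 1000 records padded to 8 bytes cover 250 lines. Whether
the saved misses outweigh the cost of the unaligned accesses would still
have to be measured.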
Unlike modern x86 processors?
Granted, I haven't timed it, but most processors since early P4 models
are supposed to have "Streaming access detection", which is a fancy way
of saying array detection.
Are you sure your information is current?
(I could be wrong too, of course)
--
Kind regards
Christian Iversen
_______________________________________________
fpc-devel maillist - fpc-devel@lists.freepascal.org
http://lists.freepascal.org/mailman/listinfo/fpc-devel