Re: [fpc-devel] Patch, font rendering on Arm-Linux devices.

Daniël Mantione Thu, 28 Feb 2008 03:38:03 -0800


On Thu, 28 Feb 2008, Yury Sidorov wrote:

Yes, but if you have an array of them (as we have in this case),
considerably more of these records will fit in the cache, so you
will have considerably fewer cache misses. This becomes even more serious
when the processor in question does not have prefetching: in that case,
traversing the array causes cache miss after cache miss, and a smaller
array will have fewer of these misses.
You are right. An array of packed records is a bit more efficient than an array of non-packed records, at least on modern x86 CPUs.
I ran some benchmarks and got the following on a Core Duo:
2070 ms - non-packed
1910 ms - packed
But on CPUs that do not support misaligned data access, packed records are speed killers and should be used only as a last resort.
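
To make the cache-density argument concrete, here is a minimal sketch in C (the closest testable analogue of a Pascal packed record). The record layout, the names glyph_natural, glyph_packed and records_per_line, and the cache-line sizes mentioned afterwards are assumptions for illustration, not taken from the patch under discussion. The packed attribute is a GCC/Clang extension.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record, not from the patch: one 8-bit and one 32-bit field. */
struct glyph_natural {
    uint8_t  kind;
    uint32_t offset;   /* the compiler pads before this field to align it */
};

/* Same fields with the padding removed (GCC/Clang extension). */
struct glyph_packed {
    uint8_t  kind;
    uint32_t offset;
} __attribute__((packed));

/* Number of whole records of record_size bytes that fit in line_size bytes. */
static size_t records_per_line(size_t record_size, size_t line_size)
{
    return line_size / record_size;
}
```

On a typical ABI where a 32-bit integer is 4-byte aligned, sizeof(struct glyph_natural) is 8 and sizeof(struct glyph_packed) is 5, so a 32-byte cache line holds 4 natural records or 6 packed ones. The price is that the offset field of most packed array elements is no longer aligned.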

I am not 100% sure about this. Your Core Duo has an array-traversal detector that activates prefetching. An ARM does not have such logic and will suffer cache miss after cache miss.

However, it is certain that a manual unaligned load on ARM is more expensive than a hardware unaligned load on x86.
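
As a rough illustration of what a "manual unaligned load" involves, the sketch below reads a 32-bit little-endian value from an arbitrary address using only byte loads. The function name load_u32_le is hypothetical, and the code a compiler actually generates for a packed field on a given ARM core may differ; the point is only that one word load becomes four byte loads plus shifts and ORs.

```c
#include <stdint.h>

/* Assemble a little-endian 32-bit value from four single-byte loads.
   Works at any address, aligned or not. */
static uint32_t load_u32_le(const unsigned char *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}
```

For example, with the bytes AA 78 56 34 12 in a buffer, load_u32_le(buf + 1) reads from an odd address and yields 0x12345678.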

Also, if a record is not an element of a large array, it is better to declare it as non-packed on all CPUs.


Yes.

Daniël

_______________________________________________
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
http://lists.freepascal.org/mailman/listinfo/fpc-devel
