On 16.10.2017 at 23:08, Markus Beth wrote:
> On 16.10.2017 22:41, Florian Klämpfl wrote:
>>> P.S.: I am currently working on another version of CompareByte that might
>>> have a slightly higher latency for very small len but a higher throughput
>>> (2 cycles per iteration vs. 3 cycles on an Intel Arrandale CPU (Westmere
>>> microarchitecture)). But this would need some more testing and
>>> benchmarking. I can come up with it here again if this would be of any
>>> interest.
>>
>> Small lengths in terms of matching string or overall lengths?
>
> It is small length in terms of matching string as there is some setup work
> before the loop.
>
>> BTW: I would really like to see a PCMPSTR based implementation :)
> PCMPSTR is (at the moment) out of my scope. I thought PCMPSTR is part of
> SSE4.2. How would you deal with Intel core microarchitecture CPUs that
> don't have it?

Just set a flag at startup if it is supported and then branch on the flag.
As the flag never changes, branch prediction will most likely work very well.
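
To illustrate the idea, here is a rough sketch in C rather than Pascal, and not the actual RTL code. All names (has_sse42, init_cpu_flags, compare_bytes and the two variants) are made up for this example. It assumes GCC or Clang on x86 for the __builtin_cpu_supports check; on anything else the flag simply stays 0. The SSE4.2 variant is only a placeholder that falls back to the plain loop, since the point here is the dispatch and not the PCMPSTR code itself.

```c
#include <stddef.h>

/* Hypothetical flag: written once at startup, only read afterwards. */
int has_sse42 = 0;

/* Call once at program start, before the first compare_bytes call. */
void init_cpu_flags(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    /* Assumption: GCC/Clang CPU detection builtins are available. */
    __builtin_cpu_init();
    has_sse42 = __builtin_cpu_supports("sse4.2") ? 1 : 0;
#else
    has_sse42 = 0; /* unknown compiler or CPU: use the generic path */
#endif
}

/* Portable fallback: compares byte by byte, returns the difference of
   the first mismatching pair, or 0 if the first len bytes are equal. */
int compare_bytes_generic(const unsigned char *a, const unsigned char *b,
                          size_t len)
{
    size_t i;
    for (i = 0; i < len; i++) {
        if (a[i] != b[i])
            return (int)a[i] - (int)b[i];
    }
    return 0;
}

/* Placeholder for a PCMPSTR based variant. It just reuses the generic
   loop here; a real version would go in its place. */
int compare_bytes_sse42(const unsigned char *a, const unsigned char *b,
                        size_t len)
{
    return compare_bytes_generic(a, b, len);
}

/* Dispatcher: the branch on has_sse42 always goes the same way after
   startup, so it should be cheap for the branch predictor. */
int compare_bytes(const void *a, const void *b, size_t len)
{
    const unsigned char *pa = (const unsigned char *)a;
    const unsigned char *pb = (const unsigned char *)b;

    if (has_sse42)
        return compare_bytes_sse42(pa, pb, len);
    return compare_bytes_generic(pa, pb, len);
}
```

Callers would run init_cpu_flags() once during startup and then only ever call compare_bytes(). A function pointer set at startup would be another way to do the same thing, but a plain branch on a flag is what I described above.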