On 16.10.2017 at 23:08, Markus Beth wrote:
> On 16.10.2017 22:41, Florian Klämpfl wrote:
>>> P.S.: I am currently working on another version of CompareByte that might
>>> have a slightly higher latency for very small len but a higher throughput
>>> (2 cycles per iteration vs. 3 cycles on an Intel Arrandale CPU (Westmere
>>> microarchitecture)). But this would need some more testing and benchmarking.
>>> I can come up with it here again if this would be of any interest.
>>
>> Small lengths in terms of matching string or overall lengths?
> 
> It is small length in terms of matching string as there is some setup work 
> before the loop.
> 
>> BTW: I would really like to see a PCMPSTR based implementation :)
> PCMPSTR is (at the moment) out of my scope. I thought PCMPSTR is part of
> SSE4.2. How would you deal with Intel Core microarchitecture CPUs that
> don't have it?

Just set a flag at startup if it is supported and then branch on the flag. As
the flag never changes, branch prediction will most likely work very well.
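
A minimal sketch of that idea in C, not the actual FPC RTL code. It assumes
GCC or Clang on x86 for the CPU check (__builtin_cpu_supports); on any other
target the flag simply stays 0. The names has_sse42, init_cpu_flags and
compare_byte_sse42 are hypothetical, and the SSE4.2 path is only a placeholder
that falls back to the generic loop.

```c
#include <stddef.h>

/* Hypothetical flag: written once at startup, only read afterwards. */
static int has_sse42 = 0;

/* Assumption: GCC or Clang on x86. Elsewhere the flag stays 0. */
static void init_cpu_flags(void)
{
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    has_sse42 = __builtin_cpu_supports("sse4.2") ? 1 : 0;
#endif
}

/* Portable fallback: plain bytewise comparison. */
static int compare_byte_generic(const void *a, const void *b, size_t len)
{
    const unsigned char *p = a;
    const unsigned char *q = b;
    for (size_t i = 0; i < len; i++) {
        if (p[i] != q[i])
            return p[i] < q[i] ? -1 : 1;
    }
    return 0;
}

/* Placeholder for an SSE4.2 string-compare path. A real version would use
   the SSE4.2 instructions; here it just delegates to the generic loop. */
static int compare_byte_sse42(const void *a, const void *b, size_t len)
{
    return compare_byte_generic(a, b, len);
}

/* The flag never changes after init, so this branch always goes the same
   way on a given machine. */
int compare_byte(const void *a, const void *b, size_t len)
{
    if (has_sse42)
        return compare_byte_sse42(a, b, len);
    return compare_byte_generic(a, b, len);
}
```

init_cpu_flags() would be called once before the first compare_byte() call.
An alternative with the same effect is to store a function pointer at startup
instead of testing a flag on every call.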

_______________________________________________
fpc-devel maillist  -  fpc-devel@lists.freepascal.org
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel
