On Thu, Oct 10, 2013 at 11:44 PM, Richard Yao wrote:
> On 10/10/2013 11:38 PM, Richard Yao wrote:
>> On 10/10/2013 11:29 PM, Xin Li wrote:
>>> On 10/10/13 20:18, Richard Yao wrote:
>>>> Thanks for letting us know about this. I have a few comments:
>>>
>>>> 1. We could eliminate a branch entirely by doing this:
>>>
>>>> mlen = MIN(d_end - dst, mlen);
>>>> while (--mlen >= 0) *dst++ = *cpy++;
>>>
>>> I don't think this eliminates the branching as MIN is usually a macro
>>> that expands to a > b ? b : a.
>>
>> My mistake. I was thinking of generic swap routines. I do think that
>> using the MIN() macro is more readable though.
>
> On second thought, I was right the first time. It is possible to do this
> without branching:
>
> #define MIN(x, y) ((y) ^ (((x) ^ (y)) & -((x) < (y))))
> #define MIN(x, y) ((x) ^ (((x) ^ (y)) & -((x) < (y))))

This does not do what you say it does.

> http://graphics.stanford.edu/~seander/bithacks.html#IntegerMinOrMax
>
> This makes MIN(d_end - dst, mlen) look inefficient, but a proper
> optimizing compiler should store the result of d_end - dst in a register
> to avoid doing the subtraction more than once.

Please let the compiler do this work!  It is possible that the icache
pollution from multiple xors, ands and a cmp will be worse than a
single, well-predicted branch.  The compiler may also have other
optimizations up its sleeve.

Do you have any evidence that this "optimization" is useful?


-- 
Eitan Adler
_______________________________________________
developer mailing list
http://lists.open-zfs.org/mailman/listinfo/developer
