It is late here, and in the future I will refrain from sending emails at
this hour without some sort of stimulant in my system. At this point, it
is safe to say that the stream of corrections has had a similar effect,
and I am fully awake now.

I think using MIN() makes the code more readable. The point about
eliminating the branch was an error on my part. My reference to bit
twiddling hacks was meant to show that it could be done, not to suggest
that it was a good idea.

With that said, the compiler should be smart enough to do this trick on
its own when the CPU benefits from it and I am content to let it do that.
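
For reference, here is a minimal sketch of the two forms discussed in this
thread, assuming two's complement integers. The names min_bithack() and
copy_clamped() and the use of ptrdiff_t for mlen are hypothetical and chosen
only for illustration; they are not taken from the actual patch.

```c
#include <stddef.h>

/* Conventional macro: expands to a conditional expression. */
#define MIN(a, b) ((a) < (b) ? (a) : (b))

/*
 * Bit-twiddling form from the bithacks page: y ^ ((x ^ y) & -(x < y)).
 * Assumes two's complement, so -(1) is an all-ones mask.
 * The function name is hypothetical.
 */
static inline ptrdiff_t
min_bithack(ptrdiff_t x, ptrdiff_t y)
{
	return (y ^ ((x ^ y) & -(ptrdiff_t)(x < y)));
}

/*
 * Hypothetical helper showing the clamped copy loop: copy at most as
 * many bytes as fit before d_end, and return the new dst.
 */
static unsigned char *
copy_clamped(unsigned char *dst, const unsigned char *d_end,
    const unsigned char *cpy, ptrdiff_t mlen)
{
	mlen = MIN(d_end - dst, mlen);
	while (--mlen >= 0)
		*dst++ = *cpy++;
	return (dst);
}
```

Both forms should return the same value for any pair of inputs. Whether
either one compiles to a branch depends on the compiler and the target, which
is why I would rather write the plain MIN() and let the compiler choose.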

On 10/10/2013 11:55 PM, Eitan Adler wrote:
> On Thu, Oct 10, 2013 at 11:44 PM, Richard Yao <[email protected]> wrote:
>> On 10/10/2013 11:38 PM, Richard Yao wrote:
>>> On 10/10/2013 11:29 PM, Xin Li wrote:
>>>> On 10/10/13 20:18, Richard Yao wrote:
>>>>> Thanks for letting us know about this. I have a few comments:
>>>>
>>>>> 1. We could eliminate a branch entirely by doing this:
>>>>
>>>>> mlen = MIN(d_end - dst, mlen); while (--mlen >= 0) *dst++ = *cpy++
>>>>
>>>> I don't think this eliminates the branching as MIN is usually a macro
>>>> that expands to a > b ? b : a.
>>>
>>> My mistake. I was thinking of generic swap routines. I do think that
>>> using the MIN() macro is more readable though.
>>
>> On second thought, I was right the first time. It is possible to do this
>> without branching:
>>
>> #define MIN(x, y) ((y) ^ (((x) ^ (y)) & -((x) < (y))))
>> #define MIN(x, y) ((x) ^ (((x) ^ (y)) & -((x) < (y))))
> 
> This does not do what you say it does.
> 
>> http://graphics.stanford.edu/~seander/bithacks.html#IntegerMinOrMax
>>
>> This makes MIN(d_end - dst, mlen) look inefficient, but a proper
>> optimizing compiler should store the result of d_end - dst in a register
>> to avoid doing the subtraction 3 times.
> 
> Please let the compiler do this work!  It is possible that the icache
> pollution from multiple xors, ands and a cmp will be worse than a
> single, well predicted branch.  The compiler may also have other
> optimizations up its sleeve.
> 
> Do you have any evidence that this "optimization" is useful?
> 
> 

