On 11/6/18 12:07 PM, David Laight wrote: > From: Vlastimil Babka [mailto:vba...@suse.cz] >> Sent: 06 November 2018 10:22 > ... >>>> - return type_dma + (is_reclaimable & !is_dma) * KMALLOC_RECLAIM; >>>> + return type_dma + is_reclaimable * !is_dma * KMALLOC_RECLAIM; >>> >>> ISTM that changing is_dma and is_reclaimable from int to bool will stop the >>> bleating. >>> >>> It is also strange that this code is trying so hard here to avoid >>> conditional instructions > > I've done some experiments, compiled with gcc 4.7.3 and -O2 > The constants match those from the kernel headers. > > It is noticable that there isn't a cmov in sight.
There is with newer gcc: https://godbolt.org/z/qFdByQ But even that didn't remove the imul in f3() so that's indeed a bust. > The code would also be better if the KMALLOC constants matched the GFP ones. That would be hard, as __GFP flags have also other constraints (especially __GFP_DMA relative to other zone restricting __GFP flags) and KMALLOC_* are used as array index. > unsigned int f1(unsigned int flags) > { > return !__builtin_expect(flags & (__GFP_DMA | __GFP_RECLAIM), 0) ? 0 > : flags & __GFP_DMA ? KMALLOC_DMA : KMALLOC_RECLAIM; > } > ... > 0000000000000020 <f1>: > 20: 40 f6 c7 11 test $0x11,%dil > 24: 75 03 jne 29 <f1+0x9> > 26: 31 c0 xor %eax,%eax > 28: c3 retq > 29: 83 e7 01 and $0x1,%edi > 2c: 83 ff 01 cmp $0x1,%edi > 2f: 19 c0 sbb %eax,%eax > 31: 83 c0 02 add $0x2,%eax > 34: c3 retq > > The jne will be predicted not taken and the retq predicted. > So this might only be 1 clock in the normal case. I think this is the winner. It's also a single branch and not two, because the compiler could figure out some of the "clever arithmetics" itself. Care to send a full patch?