On 11/7/18 11:41 AM, David Laight wrote:
> From: Vlastimil Babka
>> Sent: 06 November 2018 12:51
>>
>> On 11/6/18 12:07 PM, David Laight wrote:
>>> From: Vlastimil Babka [mailto:vba...@suse.cz]
>>> 0000000000000020 <f1>:
>>>   20:   40 f6 c7 11             test   $0x11,%dil
>>>   24:   75 03                   jne    29 <f1+0x9>
>>>   26:   31 c0                   xor    %eax,%eax
>>>   28:   c3                      retq
>>>   29:   83 e7 01                and    $0x1,%edi
>>>   2c:   83 ff 01                cmp    $0x1,%edi
>>>   2f:   19 c0                   sbb    %eax,%eax
>>>   31:   83 c0 02                add    $0x2,%eax
>>>   34:   c3                      retq
>>>
>>> The jne will be predicted not taken and the retq predicted.
>>> So this might only be 1 clock in the normal case.
>>
>> I think this is the winner. It's also a single branch and not two,
>> because the compiler could figure out some of the "clever arithmetics"
>> itself. Care to send a full patch?
> 
> I've not got a suitable source tree lurking.
> So someone else would need to do it.
> I'll waive any copyright that could plausibly be assigned to the above!

There we go. This is to replace the current fix by Bart (sorry), which seems
to add an extra IMUL. Apparently current mainline is spamming anyone running
sparse with lots of warnings, so it should be merged soon.
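
For anyone without a kernel tree at hand, the claim that the rewrite keeps
the old behaviour can be checked in user space. What follows is only a
minimal sketch: the names GFP_DMA_BIT, GFP_RECLAIMABLE_BIT, cache_type,
type_old(), type_new() and check_all_flag_combinations() are made up for
illustration, and the bit values 0x01 and 0x10 are an assumption inferred
from the "test $0x11" in the quoted disassembly, not taken from gfp.h.

```c
#include <assert.h>

/* Assumed bit values, inferred from the 0x11 mask in the disassembly. */
#define GFP_DMA_BIT         0x01u
#define GFP_RECLAIMABLE_BIT 0x10u

/* Hypothetical stand-in for enum kmalloc_cache_type with ZONE_DMA enabled. */
enum cache_type { TYPE_NORMAL = 0, TYPE_RECLAIM, TYPE_DMA };

/* Old arithmetic form, including the "x & !y" that sparse complains about. */
static int type_old(unsigned int flags)
{
	int is_dma = !!(flags & GFP_DMA_BIT);
	int type_dma = is_dma * TYPE_DMA;
	int is_reclaimable = !!(flags & GFP_RECLAIMABLE_BIT);

	return type_dma + (is_reclaimable & !is_dma) * TYPE_RECLAIM;
}

/* New form: one test for the common case, then a ternary for the rest. */
static int type_new(unsigned int flags)
{
	if ((flags & (GFP_DMA_BIT | GFP_RECLAIMABLE_BIT)) == 0)
		return TYPE_NORMAL;

	return (flags & GFP_DMA_BIT) ? TYPE_DMA : TYPE_RECLAIM;
}

/* Compare both variants over every combination of the low eight bits. */
static void check_all_flag_combinations(void)
{
	unsigned int flags;

	for (flags = 0; flags <= 0xffu; flags++)
		assert(type_old(flags) == type_new(flags));
}
```

Calling check_all_flag_combinations() from a small main() exercises both
variants; it says nothing about the generated code, only about the results.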

----8<----
From ddd2fc6fcba425733f8320413a1451410687c9c3 Mon Sep 17 00:00:00 2001
From: Vlastimil Babka <vba...@suse.cz>
Date: Fri, 9 Nov 2018 08:47:12 +0100
Subject: [PATCH] mm, slab: fix sparse warning in kmalloc_type()

Multiple people have reported the following sparse warning:

./include/linux/slab.h:332:43: warning: dubious: x & !y

The minimal fix would be to change the bitwise & to logical &&, which emits the
same code, but Andrew has suggested that the branch-avoiding tricks are maybe
not worthwhile. David Laight provided a nice comparison of the disassembly of
multiple variants, which shows that the current version produces a 4-deep
dependency chain, and that fixing the sparse warning by changing the & to
multiplication emits an IMUL, making it even more expensive.

The code as rewritten by this patch yielded the best disassembly, with a single
predictable branch for the most common case, and a ternary operator for the
rest, which gcc seems to compile without a branch or cmov by itself.

The result should be more readable, without a sparse warning and probably also
faster for the common case.

Reported-by: Bart Van Assche <bvanass...@acm.org>
Reported-by: Darryl T. Agostinelli <dagostine...@gmail.com>
Suggested-by: Andrew Morton <a...@linux-foundation.org>
Suggested-by: David Laight <david.lai...@aculab.com>
Fixes: 1291523f2c1d ("mm, slab/slub: introduce kmalloc-reclaimable caches")
Signed-off-by: Vlastimil Babka <vba...@suse.cz>
---
 include/linux/slab.h | 24 ++++++++++++------------
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/include/linux/slab.h b/include/linux/slab.h
index 918f374e7156..18c6920c2803 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -304,6 +304,8 @@ enum kmalloc_cache_type {
        KMALLOC_RECLAIM,
 #ifdef CONFIG_ZONE_DMA
        KMALLOC_DMA,
+#else
+       KMALLOC_DMA = KMALLOC_NORMAL,
 #endif
        NR_KMALLOC_TYPES
 };
@@ -314,22 +316,20 @@ kmalloc_caches[NR_KMALLOC_TYPES][KMALLOC_SHIFT_HIGH + 1];
 
 static __always_inline enum kmalloc_cache_type kmalloc_type(gfp_t flags)
 {
-       int is_dma = 0;
-       int type_dma = 0;
-       int is_reclaimable;
-
-#ifdef CONFIG_ZONE_DMA
-       is_dma = !!(flags & __GFP_DMA);
-       type_dma = is_dma * KMALLOC_DMA;
-#endif
+       int gfp_dma = IS_ENABLED(CONFIG_ZONE_DMA) ? __GFP_DMA : 0;
 
-       is_reclaimable = !!(flags & __GFP_RECLAIMABLE);
+       /*
+        * The most common case is KMALLOC_NORMAL, so test for it
+        * with a single branch for both flags.
+        */
+       if (likely((flags & (gfp_dma | __GFP_RECLAIMABLE)) == 0))
+               return KMALLOC_NORMAL;
 
        /*
-        * If an allocation is both __GFP_DMA and __GFP_RECLAIMABLE, return
-        * KMALLOC_DMA and effectively ignore __GFP_RECLAIMABLE
+        * At least one of the flags has to be set. If both are, __GFP_DMA
+        * is more important.
         */
-       return type_dma + (is_reclaimable & !is_dma) * KMALLOC_RECLAIM;
+       return flags & gfp_dma ? KMALLOC_DMA : KMALLOC_RECLAIM;
 }
 
 /*
-- 
2.19.1
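
As a footnote to the f1 listing quoted at the top: the tail after the jne
(and / cmp / sbb / add) is how gcc gets the ternary without a branch or
cmov. A rough C transliteration is below. It is a sketch only; the function
name tail_branchless() is made up, and it assumes __GFP_DMA is bit 0 and
that the enum values are RECLAIM = 1 and DMA = 2, as the listing suggests.

```c
#include <assert.h>

/* Hypothetical transliteration of the and/cmp/sbb/add tail of f1. */
static int tail_branchless(unsigned int flags)
{
	unsigned int dma = flags & 0x1u;  /* and $0x1,%edi */
	int carry = dma < 1u;             /* cmp $0x1,%edi: carry set iff dma == 0 */
	int eax = -carry;                 /* sbb %eax,%eax: eax = 0 - carry */

	assert(eax == 0 || eax == -1);    /* only two outcomes are possible */
	return eax + 2;                   /* add $0x2,%eax: 1 = RECLAIM, 2 = DMA */
}
```

With the DMA bit clear this yields 1 and with it set it yields 2, which
matches KMALLOC_RECLAIM and KMALLOC_DMA under the assumptions above.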
