On 29/01/2018 at 21:20, Mathieu Desnoyers wrote:
Allow PowerPC to skip the full memory barrier in switch_mm(), and
only issue the barrier when scheduling into a task belonging to a
process that has registered to use private expedited membarrier
commands.

Threads that target the same VM but belong to different thread groups
are a tricky case. It has a few consequences:

It turns out that we cannot rely on get_nr_threads(p) to count the
number of threads using a VM. We can use
(atomic_read(&mm->mm_users) == 1 && get_nr_threads(p) == 1)
instead to skip the synchronize_sched() for cases where the VM only has
a single user, and that user only has a single thread.

It also turns out that we cannot use for_each_thread() to set
thread flags in all threads using a VM, as it only iterates over the
thread group.
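
A minimal userspace sketch of that situation (a hypothetical demo,
not part of the patch): clone(2) with CLONE_VM but without
CLONE_THREAD gives two tasks sharing one mm, each in its own
single-thread thread group, so mm_users is 2 while get_nr_threads()
reports 1 for each.

#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/wait.h>

static int worker(void *arg)
{
        /*
         * Shares the parent's address space, but belongs to a
         * different thread group: for_each_thread() in the parent
         * never visits this task, and get_nr_threads() stays at 1.
         */
        return 0;
}

int main(void)
{
        const size_t stack_sz = 64 * 1024;
        char *stack = malloc(stack_sz);

        if (!stack)
                return 1;
        /*
         * CLONE_VM without CLONE_THREAD: mm->mm_users becomes 2,
         * yet each task's thread group holds a single thread.
         * (Pass the stack top: assume a downward-growing stack.)
         */
        if (clone(worker, stack + stack_sz, CLONE_VM | SIGCHLD, NULL) == -1)
                return 1;
        wait(NULL);
        return 0;
}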

Therefore, test the membarrier state variable directly rather than
relying on thread flags. This means
membarrier_register_private_expedited() needs to set the
MEMBARRIER_STATE_PRIVATE_EXPEDITED flag, issue synchronize_sched(), and
only then set MEMBARRIER_STATE_PRIVATE_EXPEDITED_READY which allows
private expedited membarrier commands to succeed.
membarrier_arch_switch_mm() now tests for the
MEMBARRIER_STATE_PRIVATE_EXPEDITED flag.
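
The registration path described above then looks roughly like this
(a sketch from memory of kernel/sched/membarrier.c; details may
differ):

static void membarrier_register_private_expedited(void)
{
        struct task_struct *p = current;
        struct mm_struct *mm = p->mm;

        if (atomic_read(&mm->membarrier_state)
                        & MEMBARRIER_STATE_PRIVATE_EXPEDITED_READY)
                return;
        atomic_or(MEMBARRIER_STATE_PRIVATE_EXPEDITED, &mm->membarrier_state);
        if (!(atomic_read(&mm->mm_users) == 1 && get_nr_threads(p) == 1)) {
                /*
                 * Ensure all future scheduler executions will observe
                 * the new membarrier state for this process.
                 */
                synchronize_sched();
        }
        atomic_or(MEMBARRIER_STATE_PRIVATE_EXPEDITED_READY,
                  &mm->membarrier_state);
}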

Looking at switch_mm_irqs_off(), I found it more complex than
expected, and it turns out that this patch is the reason for that
complexity.

Before the patch (i.e. in kernel 4.14), we have:

00000000 <switch_mm_irqs_off>:
   0:   81 24 01 c8     lwz     r9,456(r4)
   4:   71 29 00 01     andi.   r9,r9,1
   8:   40 82 00 1c     bne     24 <switch_mm_irqs_off+0x24>
   c:   39 24 01 c8     addi    r9,r4,456
  10:   39 40 00 01     li      r10,1
  14:   7d 00 48 28     lwarx   r8,0,r9
  18:   7d 08 53 78     or      r8,r8,r10
  1c:   7d 00 49 2d     stwcx.  r8,0,r9
  20:   40 c2 ff f4     bne-    14 <switch_mm_irqs_off+0x14>
  24:   7c 04 18 40     cmplw   r4,r3
  28:   81 24 00 24     lwz     r9,36(r4)
  2c:   91 25 04 4c     stw     r9,1100(r5)
  30:   4d 82 00 20     beqlr
  34:   48 00 00 00     b       34 <switch_mm_irqs_off+0x34>
                        34: R_PPC_REL24 switch_mmu_context


After the patch (i.e. in 5.13-rc6), it becomes:

00000000 <switch_mm_irqs_off>:
   0:   81 24 02 18     lwz     r9,536(r4)
   4:   71 29 00 01     andi.   r9,r9,1
   8:   41 82 00 24     beq     2c <switch_mm_irqs_off+0x2c>
   c:   7c 04 18 40     cmplw   r4,r3
  10:   81 24 00 24     lwz     r9,36(r4)
  14:   91 25 04 d0     stw     r9,1232(r5)
  18:   4d 82 00 20     beqlr
  1c:   81 24 00 28     lwz     r9,40(r4)
  20:   71 29 00 0a     andi.   r9,r9,10
  24:   40 82 00 34     bne     58 <switch_mm_irqs_off+0x58>
  28:   48 00 00 00     b       28 <switch_mm_irqs_off+0x28>
                        28: R_PPC_REL24 switch_mmu_context
  2c:   39 24 02 18     addi    r9,r4,536
  30:   39 40 00 01     li      r10,1
  34:   7d 00 48 28     lwarx   r8,0,r9
  38:   7d 08 53 78     or      r8,r8,r10
  3c:   7d 00 49 2d     stwcx.  r8,0,r9
  40:   40 a2 ff f4     bne     34 <switch_mm_irqs_off+0x34>
  44:   7c 04 18 40     cmplw   r4,r3
  48:   81 24 00 24     lwz     r9,36(r4)
  4c:   91 25 04 d0     stw     r9,1232(r5)
  50:   4d 82 00 20     beqlr
  54:   48 00 00 00     b       54 <switch_mm_irqs_off+0x54>
                        54: R_PPC_REL24 switch_mmu_context
  58:   2c 03 00 00     cmpwi   r3,0
  5c:   41 82 ff cc     beq     28 <switch_mm_irqs_off+0x28>
  60:   48 00 00 00     b       60 <switch_mm_irqs_off+0x60>
                        60: R_PPC_REL24 switch_mmu_context


In particular, the comparison of 'prev' to 0 is pointless, as both
cases end up just branching to 'switch_mmu_context'.
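
From memory, the extra code in the second listing comes from
membarrier_arch_switch_mm() in arch/powerpc/include/asm/membarrier.h,
roughly (details may differ):

static inline void membarrier_arch_switch_mm(struct mm_struct *prev,
                                             struct mm_struct *next,
                                             struct task_struct *tsk)
{
        /*
         * This is the andi. r9,r9,10 test above (2 | 8 = 10); the
         * "!prev" short-circuit is what emits the cmpwi r3,0.
         */
        if (likely(!(atomic_read(&next->membarrier_state) &
                     (MEMBARRIER_STATE_PRIVATE_EXPEDITED |
                      MEMBARRIER_STATE_GLOBAL_EXPEDITED)) || !prev))
                return;

        /*
         * The membarrier system call requires a full memory barrier
         * after storing to rq->curr, before going back to user-space.
         * On this !CONFIG_SMP build, smp_mb() is only barrier(), so
         * the whole test guards a no-op.
         */
        smp_mb();
}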

I don't understand why all that complexity is needed just to replace
a simple 'smp_mb__after_unlock_lock()', which on this build expands
to no more than a compiler barrier:

#define smp_mb__after_unlock_lock()     smp_mb()
#define smp_mb()        barrier()
# define barrier() __asm__ __volatile__("": : :"memory")


Am I missing some subtlety?

Thanks
Christophe
