Hello,

On Tue, Jun 06, 2017 at 11:18:36AM -0500, Michael Bringmann wrote:
> On 05/25/2017 10:30 AM, Michael Bringmann wrote:
> > I will try that patch shortly.  I also updated my patch to be conditional
> > on whether the pool's cpumask attribute was empty.  You should have received
> > V2 of that patch by now.
> 
> Let's try this again.
> 
> The hotplug problem goes away with the changes that you provided earlier, and

So, that means we're ending up in situations where NUMA online is a
proper superset of NUMA possible.
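For context, wq samples the cpu -> node binding exactly once, at boot,
for every possible CPU.  Here's a rough sketch of that init path,
paraphrased from wq_numa_init() in kernel/workqueue.c (not the verbatim
code, error paths trimmed):

	/*
	 * Paraphrased sketch of wq_numa_init(): the per-node possible
	 * masks are built once.  A CPU whose node binding shows up only
	 * after boot never makes it into wq_numa_possible_cpumask[],
	 * which is how online can end up a proper superset of possible.
	 */
	static void __init wq_numa_init_sketch(void)
	{
		cpumask_var_t *tbl;	/* becomes wq_numa_possible_cpumask */
		int node, cpu;

		tbl = kzalloc(nr_node_ids * sizeof(tbl[0]), GFP_KERNEL);
		BUG_ON(!tbl);

		for_each_node(node)
			BUG_ON(!zalloc_cpumask_var_node(&tbl[node], GFP_KERNEL,
					node_online(node) ? node : NUMA_NO_NODE));

		for_each_possible_cpu(cpu) {
			node = cpu_to_node(cpu);
			if (WARN_ON(node == NUMA_NO_NODE))
				return;	/* mapping missing, NUMA support bails */
			cpumask_set_cpu(cpu, tbl[node]);
		}
	}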

> shown in the patch below.  I kept this change to get_unbound_pool() as
> a just-in-case, to help explain the crash in the event that it occurs
> again:
> 
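>     /* workaround: if the pool's cpumask ended up empty, pin it to the
>      * CPU this code happens to be running on */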
>     if (!cpumask_weight(pool->attrs->cpumask))
>         cpumask_copy(pool->attrs->cpumask, cpumask_of(smp_processor_id()));
> 
> I could also insert 
> 
>     BUG_ON(!cpumask_weight(pool->attrs->cpumask));
> 
> at that place, but I really prefer not to crash the system if there is a 
> workaround.

I don't think that's the right thing to do: it doesn't make logical
sense, and it isn't correct.  The above could enable CPUs which are
explicitly excluded from a workqueue.  The only fallback which makes
sense is falling back to the default pwq.
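
To be concrete, here's a minimal sketch of that fallback,
hand-compressed from the use_dfl_pwq path of wq_update_unbound_numa()
(not the verbatim kernel code, and eliding the wq->mutex handling):

	/* try to build the node-affine pwq as usual */
	pwq = alloc_unbound_pwq(wq, target_attrs);
	if (!pwq) {
		pr_warn("workqueue: allocation failed while updating NUMA affinity of \"%s\"\n",
			wq->name);
		goto use_dfl_pwq;
	}
	...

use_dfl_pwq:
	/* point the node back at the default pwq rather than fabricating a cpumask */
	spin_lock_irq(&wq->dfl_pwq->pool->lock);
	get_pwq(wq->dfl_pwq);			/* pin a reference */
	spin_unlock_irq(&wq->dfl_pwq->pool->lock);
	old_pwq = numa_pwq_tbl_install(wq, node, wq->dfl_pwq);
	put_pwq_unlocked(old_pwq);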

> > Can you please post the messages with the debug patch from the previous
> > thread?  In fact, let's please continue on that thread.  I'm having a
> > hard time following what's going wrong with the code.
> 
> Are these the failure logs that you requested?
> 
> 
> Red Hat Enterprise Linux Server 7.3 (Maipo)
> Kernel 4.12.0-rc1.wi91275_debug_03.ppc64le+ on an ppc64le
> 
> ltcalpine2-lp20 login: root
> Password: 
> Last login: Wed May 24 18:45:40 from oc1554177480.austin.ibm.com
> [root@ltcalpine2-lp20 ~]# numactl -H
> available: 2 nodes (0,6)
> node 0 cpus:
> node 0 size: 0 MB
> node 0 free: 0 MB
> node 6 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 
> 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 
> 51 52 53 54 55 56 57 58 59 60 61 62 63
> node 6 size: 19858 MB
> node 6 free: 16920 MB
> node distances:
> node   0   6 
>   0:  10  40 
>   6:  40  10 
> [root@ltcalpine2-lp20 ~]# numactl -H
> available: 2 nodes (0,6)
> node 0 cpus:
> node 0 size: 0 MB
> node 0 free: 0 MB
> node 6 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 
> 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 
> 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 
> 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 
> 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 
> 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 
> 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 
> 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 
> 178 179 180 181 182 183 184 185 186 187 188 189 190 191
> node 6 size: 19858 MB
> node 6 free: 16362 MB
> node distances:
> node   0   6 
>   0:  10  40 
>   6:  40  10 
> [root@ltcalpine2-lp20 ~]# [  321.310943] workqueue:get_unbound_pool has empty 
> cpumask for pool attrs
> [  321.310961] ------------[ cut here ]------------
> [  321.310997] WARNING: CPU: 184 PID: 13201 at kernel/workqueue.c:3375 
> alloc_unbound_pwq+0x5c0/0x5e0
> [  321.311005] Modules linked in: rpadlpar_io rpaphp dccp_diag dccp tcp_diag 
> udp_diag inet_diag unix_diag af_packet_diag netlink_diag sg pseries_rng 
> ghash_generic gf128mul xts vmx_crypto binfmt_misc ip_tables xfs libcrc32c 
> sd_mod ibmvscsi ibmveth scsi_transport_srp dm_mirror dm_region_hash dm_log 
> dm_mod
> [  321.311097] CPU: 184 PID: 13201 Comm: cpuhp/184 Not tainted 
> 4.12.0-rc1.wi91275_debug_03.ppc64le+ #8
> [  321.311106] task: c000000408961080 task.stack: c000000406394000
> [  321.311113] NIP: c000000000116c80 LR: c000000000116c7c CTR: 
> 0000000000000000
> [  321.311121] REGS: c0000004063977b0 TRAP: 0700   Not tainted  
> (4.12.0-rc1.wi91275_debug_03.ppc64le+)
> [  321.311128] MSR: 8000000000029033 <SF,EE,ME,IR,DR,RI,LE>
> [  321.311150]   CR: 28000082  XER: 00000000
> [  321.311159] CFAR: c000000000a2dc80 SOFTE: 1 
> [  321.311159] GPR00: c000000000116c7c c000000406397a30 c0000000013ae900 
> 000000000000003b 
> [  321.311159] GPR04: c000000408961a38 0000000000000006 00000000a49e41e5 
> ffffffffa4a5a483 
> [  321.311159] GPR08: 00000000000062cc 0000000000000000 0000000000000000 
> c000000408961a38 
> [  321.311159] GPR12: 0000000000000000 c00000000fb38c00 c00000000011e858 
> c00000040a902ac0 
> [  321.311159] GPR16: 0000000000000000 0000000000000000 0000000000000000 
> 0000000000000000 
> [  321.311159] GPR20: c000000406394000 0000000000000002 c000000406394000 
> 0000000000000000 
> [  321.311159] GPR24: c000000405075400 c000000404fc0000 0000000000000110 
> c0000000015a4c88 
> [  321.311159] GPR28: 0000000000000000 c0000004fe256000 c0000004fe256008 
> c0000004fe052800 
> [  321.311290] NIP [c000000000116c80] alloc_unbound_pwq+0x5c0/0x5e0
> [  321.311298] LR [c000000000116c7c] alloc_unbound_pwq+0x5bc/0x5e0
> [  321.311305] Call Trace:
> [  321.311310] [c000000406397a30] [c000000000116c7c] 
> alloc_unbound_pwq+0x5bc/0x5e0 (unreliable)
> [  321.311323] [c000000406397ad0] [c000000000116e30] 
> wq_update_unbound_numa+0x190/0x270
> [  321.311334] [c000000406397b60] [c000000000118eb0] 
> workqueue_offline_cpu+0xe0/0x130
> [  321.311345] [c000000406397bf0] [c0000000000e9f20] 
> cpuhp_invoke_callback+0x240/0xcd0
> [  321.311355] [c000000406397cb0] [c0000000000eab28] 
> cpuhp_down_callbacks+0x78/0xf0
> [  321.311365] [c000000406397d00] [c0000000000eae6c] 
> cpuhp_thread_fun+0x18c/0x1a0
> [  321.311376] [c000000406397d30] [c0000000001251cc] 
> smpboot_thread_fn+0x2fc/0x3b0
> [  321.311386] [c000000406397dc0] [c00000000011e9c0] kthread+0x170/0x1b0
> [  321.311397] [c000000406397e30] [c00000000000b4f4] 
> ret_from_kernel_thread+0x5c/0x68
> [  321.311406] Instruction dump:
> [  321.311413] 3d42fff0 892ac565 2f890000 40fefd98 39200001 3c62ff89 3c82ff6c 
> 3863d590 
> [  321.311437] 38847cb0 992ac565 48916fc9 60000000 <0fe00000> 4bfffd70 
> 60000000 60420000 

The only way offlining can lead to this failure is when the wq NUMA
possible cpumask is a proper subset of the matching online mask.  Can
you please print out the per-node NUMA online cpumask and
wq_numa_possible_cpumask and verify that online stays within possible
for each node?  (A sketch follows below.)  If it doesn't, the ppc arch
init code needs to be updated so that the cpu <-> node binding is
established for all possible cpus at boot.  Note that this isn't a
requirement coming solely from wq; all node-affine (thus percpu)
allocations depend on it.
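
Something along these lines would do -- an untested debug sketch,
assuming it sits in kernel/workqueue.c so that the static
wq_numa_possible_cpumask[] is in scope:

	/* dump per-node online vs. wq possible masks and complain if
	 * online ever escapes possible */
	static void wq_check_node_masks(void)
	{
		int node;

		for_each_node(node) {
			pr_info("node %d: online=%*pbl wq_possible=%*pbl\n",
				node,
				cpumask_pr_args(cpumask_of_node(node)),
				cpumask_pr_args(wq_numa_possible_cpumask[node]));
			WARN(!cpumask_subset(cpumask_of_node(node),
					     wq_numa_possible_cpumask[node]),
			     "workqueue: node %d online escapes possible\n",
			     node);
		}
	}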

Thanks.

-- 
tejun
