Re: RESEND Re: [Patch 2/2]: powerpc/hotplug/mm: Fix hot-add memory node assoc
Hello:

On 06/14/2017 12:27 AM, Balbir Singh wrote:
> On Wed, Jun 14, 2017 at 3:25 PM, Balbir Singh wrote:
>>
>> On Wed, Jun 14, 2017 at 8:21 AM, Michael Bringmann wrote:
>>>
>>> On a related note, we are discussing the addition of 2 new device-tree
>>> properties with Pete Heyrman and his fellows that should simplify the
>>> determination of the set of required nodes.
>>>
>>> * One property would provide the total/max number of nodes needed by
>>>   the kernel on the current hardware.
>
> Yes, that would be nice to have
>
>>> * A second property would provide the total/max number of nodes that
>>>   the kernel could use on any system to which it could be migrated.
>
> Not sure about this one, are you suggesting more memory can be added
> depending on the migration target?

We would use only one of these numbers to allocate nodes. I have only been
on the periphery of the discussions, so I can not communicate the full
reasoning as to why both measures would be needed. We would like to have
the first number for node allocation/initialization, but if only the
second value were provided, we would likely need to use it.

>>> These properties aren't available, yet, and it takes time to define new
>>> properties in the PAPR and have them implemented in pHyp and the kernel.
>>> As an intermediary step, the systems which are doing a lot of dynamic
>>> hot-add/hot-remove configuration could provide equivalent information to
>>> the PowerPC kernel with a command line parameter. The 'numa.c' code
>>> would then read this value and fill in the necessary entries in the
>>> 'node_possible_map'.
>>>
>>> Would you foresee any problems with using such a feature?
>
> Sorry my mailer goofed up, resending
>
> Balbir Singh

Thanks.

-- 
Michael W. Bringmann
Linux Technology Center
IBM Corporation
Tie-Line 363-5196
External: (512) 286-5196
Cell: (512) 466-0650
m...@linux.vnet.ibm.com
Re: RESEND Re: [Patch 2/2]: powerpc/hotplug/mm: Fix hot-add memory node assoc
On Wed, Jun 14, 2017 at 3:25 PM, Balbir Singh wrote:
>
> On Wed, Jun 14, 2017 at 8:21 AM, Michael Bringmann wrote:
>>
>> On a related note, we are discussing the addition of 2 new device-tree
>> properties with Pete Heyrman and his fellows that should simplify the
>> determination of the set of required nodes.
>>
>> * One property would provide the total/max number of nodes needed by
>>   the kernel on the current hardware.

Yes, that would be nice to have

>> * A second property would provide the total/max number of nodes that
>>   the kernel could use on any system to which it could be migrated.

Not sure about this one, are you suggesting more memory can be added
depending on the migration target?

>> These properties aren't available, yet, and it takes time to define new
>> properties in the PAPR and have them implemented in pHyp and the kernel.
>> As an intermediary step, the systems which are doing a lot of dynamic
>> hot-add/hot-remove configuration could provide equivalent information to
>> the PowerPC kernel with a command line parameter. The 'numa.c' code
>> would then read this value and fill in the necessary entries in the
>> 'node_possible_map'.
>>
>> Would you foresee any problems with using such a feature?

Sorry my mailer goofed up, resending

Balbir Singh
Re: RESEND Re: [Patch 2/2]: powerpc/hotplug/mm: Fix hot-add memory node assoc
On a related note, we are discussing the addition of 2 new device-tree
properties with Pete Heyrman and his fellows that should simplify the
determination of the set of required nodes.

* One property would provide the total/max number of nodes needed by the
  kernel on the current hardware.

* A second property would provide the total/max number of nodes that the
  kernel could use on any system to which it could be migrated.

These properties aren't available, yet, and it takes time to define new
properties in the PAPR and have them implemented in pHyp and the kernel.
As an intermediary step, the systems which are doing a lot of dynamic
hot-add/hot-remove configuration could provide equivalent information to
the PowerPC kernel with a command line parameter. The 'numa.c' code would
then read this value and fill in the necessary entries in the
'node_possible_map'.

Would you foresee any problems with using such a feature?

Thanks.

On 06/13/2017 05:45 AM, Michael Ellerman wrote:
> Michael Bringmann writes:
>
>> Here is the information from 2 different kernels. I have not been able
>> to retrieve the information matching yesterday's attachments, yet, as
>> those dumps were acquired in April.
>>
>> Attached please find 2 dumps of similar material from kernels running
>> with my current patches (Linux 4.4, Linux 4.12).
>
> OK thanks.
>
> I'd actually like to see the dmesg output from a kernel *without* your
> patches.
>
> Looking at the device tree properties:
>
> ltcalpine2-lp9:/proc/device-tree/ibm,dynamic-reconfiguration-memory # lsprop ibm,associativity-lookup-arrays
> ibm,associativity-lookup-arrays
>                  0004   = 4 arrays
>                  0004   = of 4 entries each
>                  0001 0001
>                  0003 0006 0006
>                  0003 0007 0007
>
> Which does tell us that nodes 0, 1, 6 and 7 exist.
>
> So your idea of looking at that and setting any node found in there
> online should work.
>
> My only worry is that behaviour appears to be completely undocumented
> in PAPR, ie. PAPR explicitly says that property only needs to contain
> values for LMBs present at boot.
>
> But possibly we can talk to the PowerVM/PAPR guys and have that changed
> so that it becomes something we can rely on.
>
> cheers

-- 
Michael W. Bringmann
Linux Technology Center
IBM Corporation
Tie-Line 363-5196
External: (512) 286-5196
Cell: (512) 466-0650
m...@linux.vnet.ibm.com
Re: RESEND Re: [Patch 2/2]: powerpc/hotplug/mm: Fix hot-add memory node assoc
Michael Bringmann writes:

> Here is the information from 2 different kernels. I have not been able
> to retrieve the information matching yesterday's attachments, yet, as
> those dumps were acquired in April.
>
> Attached please find 2 dumps of similar material from kernels running
> with my current patches (Linux 4.4, Linux 4.12).

OK thanks.

I'd actually like to see the dmesg output from a kernel *without* your
patches.

Looking at the device tree properties:

ltcalpine2-lp9:/proc/device-tree/ibm,dynamic-reconfiguration-memory # lsprop ibm,associativity-lookup-arrays
ibm,associativity-lookup-arrays
                 0004   = 4 arrays
                 0004   = of 4 entries each
                 0001 0001
                 0003 0006 0006
                 0003 0007 0007

Which does tell us that nodes 0, 1, 6 and 7 exist.

So your idea of looking at that and setting any node found in there
online should work.

My only worry is that behaviour appears to be completely undocumented in
PAPR, ie. PAPR explicitly says that property only needs to contain
values for LMBs present at boot.

But possibly we can talk to the PowerVM/PAPR guys and have that changed
so that it becomes something we can rely on.

cheers
RESEND Re: [Patch 2/2]: powerpc/hotplug/mm: Fix hot-add memory node assoc
Here is the information from 2 different kernels. I have not been able to
retrieve the information matching yesterday's attachments, yet, as those
dumps were acquired in April.

Attached please find 2 dumps of similar material from kernels running with
my current patches (Linux 4.4, Linux 4.12).

On 06/07/2017 07:08 AM, Michael Ellerman wrote:
> Michael Bringmann writes:
>
>> On 06/06/2017 04:48 AM, Michael Ellerman wrote:
>>> Michael Bringmann writes:
>>>> On 06/01/2017 04:36 AM, Michael Ellerman wrote:
>>>>> Do you actually see mention of nodes 0 and 8 in the dmesg?
>>>>
>>>> When the 'numa.c' code is built with debug messages, and the system
>>>> was given that configuration by pHyp, yes, I did.
>>>>
>>>>> What does it say?
>>>>
>>>> The debug message for each core thread would be something like,
>>>>
>>>>     removing cpu 64 from node 0
>>>>     adding cpu 64 to node 8
>>>>
>>>> repeated for all 8 threads of the CPU, and usually with the messages
>>>> for all of the CPUs coming out intermixed on the console/dmesg log.
>>>
>>> OK. I meant what do you see at boot.
>>
>> Here is an example with nodes 0,2,6,7, node 0 starts out empty:
>>
>> [0.00] Initmem setup node 0
>> [0.00] NODE_DATA [mem 0x3bff7d6300-0x3bff7d]
>> [0.00] NODE_DATA(0) on node 7
>> [0.00] Initmem setup node 2 [mem 0x-0x13]
>> [0.00] NODE_DATA [mem 0x136300-0x13]
>> [0.00] Initmem setup node 6 [mem 0x14-0x34afff]
>> [0.00] NODE_DATA [mem 0x34afff6300-0x34afff]
>> [0.00] Initmem setup node 7 [mem 0x34b000-0x3b]
>> [0.00] NODE_DATA [mem 0x3bff7cc600-0x3bff7d62ff]
>>
>> [0.00] Zone ranges:
>> [0.00]   DMA      [mem 0x-0x003b]
>> [0.00]   DMA32    empty
>> [0.00]   Normal   empty
>> [0.00] Movable zone start for each node
>> [0.00] Early memory node ranges
>> [0.00]   node 2: [mem 0x-0x0013]
>> [0.00]   node 6: [mem 0x0014-0x0034afff]
>> [0.00]   node 7: [mem 0x0034b000-0x003b]
>> [0.00] Could not find start_pfn for node 0
>> [0.00] Initmem setup node 0 [mem 0x-0x]
>> [0.00] Initmem setup node 2 [mem 0x-0x0013]
>> [0.00] Initmem setup node 6 [mem 0x0014-0x0034afff]
>> [0.00] Initmem setup node 7 [mem 0x0034b000-0x003b]
>> [0.00] percpu: Embedded 3 pages/cpu @c03bf800 s155672 r0 d40936 u262144
>> [0.00] Built 4 zonelists in Node order, mobility grouping on. Total pages: 3928320
>>
>> and,
>>
>> [root@ltcalpine2-lp20 ~]# numactl --hardware
>> available: 4 nodes (0,2,6-7)
>> node 0 cpus:
>> node 0 size: 0 MB
>> node 0 free: 0 MB
>> node 2 cpus: 16 17 18 19 20 21 22 23 32 33 34 35 36 37 38 39 56 57 58 59 60 61 62 63
>> node 2 size: 81792 MB
>> node 2 free: 81033 MB
>> node 6 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 24 25 26 27 28 29 30 31 40 41 42 43 44 45 46 47
>> node 6 size: 133743 MB
>> node 6 free: 133097 MB
>> node 7 cpus: 48 49 50 51 52 53 54 55
>> node 7 size: 29877 MB
>> node 7 free: 29599 MB
>> node distances:
>> node   0   2   6   7
>>   0:  10  40  40  40
>>   2:  40  10  40  40
>>   6:  40  40  10  20
>>   7:  40  40  20  10
>> [root@ltcalpine2-lp20 ~]#
>
> What kernel is that running?
>
> And can you show me the full ibm,dynamic-memory and lookup-arrays
> properties for that system?
>
> cheers

-- 
Michael W. Bringmann
Linux Technology Center
IBM Corporation
Tie-Line 363-5196
External: (512) 286-5196
Cell: (512) 466-0650
m...@linux.vnet.ibm.com

Red Hat Enterprise Linux Server 7.3 (Maipo)
Kernel 4.12.0-rc3.wi91275_054c_060106.ppc64le+ on an ppc64le

ltcalpine2-lp20 login: root
Password:
Last login: Wed Jun 7 11:03:12 from oc1554177480.austin.ibm.com
[root@ltcalpine2-lp20 ~]# numactl -H
available: 3 nodes (0,2-3)
node 0 cpus:
node 0 size: 0 MB
node 0 free: 0 MB
node 2 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 48 49 50 51 52 53 54 55
node 2 size: 188668 MB
node 2 free: 187903 MB
node 3 cpus: 40 41 42 43 44 45 46 47 56 57 58 59 60 61 62 63
node 3 size: 56261 MB
node 3 free: 55324 MB
node distances:
node   0   2   3
  0:  10  40  40
  2:  40  10  20
  3:  40  20  10
[root@ltcalpine2-lp20 ~]# cd /proc/device-tree/ibm,dynamic-reconfiguration-memory
[root@ltcalpine2-lp20 ibm,dynamic-reconfiguration-memory]# lsprop ibm,dynamic-memory
ibm,dynamic-memory
                 059e 2000 8002 0001 0008 3000 8003
                 0001 0008 4000 8004