[PATCH 02/34] New SMBIOS driver for x86 and ia64.
This, along with the System Firmware (sysfw) interface, replaces the existing DMI code in the kernel. This subsystem provides functionality for individual drivers to access the SMBIOS structures for their own use via smbios_walk(), as well as helper functions for some kernel modules.

Cc: linux-i...@vger.kernel.org
Cc: x...@kernel.org
Cc: linux-a...@vger.kernel.org
Cc: linux-...@vger.kernel.org
Cc: openipmi-develo...@lists.sourceforge.net
Cc: platform-driver-...@vger.kernel.org
Cc: linux-crypto@vger.kernel.org
Cc: dri-de...@lists.freedesktop.org
Cc: lm-sens...@lm-sensors.org
Cc: linux-...@vger.kernel.org
Cc: linux-...@vger.kernel.org
Cc: linux-in...@vger.kernel.org
Cc: linux-me...@vger.kernel.org
Cc: net...@vger.kernel.org
Cc: linux-...@vger.kernel.org
Cc: rtc-li...@googlegroups.com
Cc: e...@driverdev.osuosl.org
Cc: linux-...@vger.kernel.org
Cc: device-drivers-de...@blackfin.uclinux.org
Cc: linux-watch...@vger.kernel.org
Cc: grant.lik...@secretlab.ca
Cc: d...@debian.org
Cc: rpur...@rpsys.net
Cc: eric.p...@tremplin-utc.net
Cc: abe...@mit.edu
Cc: john...@2ka.mipt.ru
Signed-off-by: Prarit Bhargava pra...@redhat.com
---
 Documentation/ABI/obsolete/sysfs-dmi   |  40 ++
 Documentation/ABI/testing/sysfw-smbios |  36 ++
 arch/ia64/Kconfig                      |   5 +
 arch/ia64/include/asm/smbios.h         |  12 +
 arch/x86/Kconfig                       |  10 +
 arch/x86/include/asm/smbios.h          |  19 +
 drivers/firmware/Kconfig               |  20 +
 drivers/firmware/Makefile              |   2 +
 drivers/firmware/smbios-sysfs.c        | 705
 drivers/firmware/smbios.c              | 687 +++
 include/linux/smbios.h                 | 243 +++
 11 files changed, 1779 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/ABI/obsolete/sysfs-dmi
 create mode 100644 Documentation/ABI/testing/sysfw-smbios
 create mode 100644 arch/ia64/include/asm/smbios.h
 create mode 100644 arch/x86/include/asm/smbios.h
 create mode 100644 drivers/firmware/smbios-sysfs.c
 create mode 100644 drivers/firmware/smbios.c
 create mode 100644 include/linux/smbios.h

diff --git a/Documentation/ABI/obsolete/sysfs-dmi b/Documentation/ABI/obsolete/sysfs-dmi
new file mode 100644
index 000..547dc4b
--- /dev/null
+++ b/Documentation/ABI/obsolete/sysfs-dmi
@@ -0,0 +1,40 @@
+What:		/sys/class/dmi
+Date:		July 2011
+KernelVersion:	3.0
+Contact:	Prarit Bhargava pra...@redhat.com
+Description:
+		The dmi class is exported if CONFIG_SMBIOS_DMI_COMPAT is
+		set.
+
+		The DMI code currently exposes several values via sysfs.
+		These values are:
+
+		bios_date:	The datestamp of the BIOS.
+		bios_vendor:	The company that wrote the BIOS.
+		bios_version:	The version of the BIOS.
+		board_asset_tag: A unique identifier for the system
+				motherboard.
+		board_name:	The name of the type of motherboard.
+		board_serial:	The serial number of the motherboard.
+		board_vendor:	The company that designed the motherboard.
+		board_version:	The version of the motherboard.
+		chassis_asset_tag: A unique identifier for the chassis.
+		chassis_serial:	The serial number of the chassis.
+		chassis_type:	The type of chassis.
+		chassis_vendor:	The company that designed the chassis.
+		chassis_version: The version of the chassis.
+		product_name:	The name of the system as determined by the
+				OEM.
+		product_serial:	The serial number of the system.
+		product_uuid:	A unique UUID for the system.
+		product_version: The version number of the system.
+		sys_vendor:	The OEM company for the system.
+
+		In addition to these the standard class files are exposed
+		for DMI (uevent, power, subsystem) as well as a modalias
+		file.
+
+		The dmi class is deprecated and should not be used by new
+		code. Existing code should be migrated to use
+		/sys/class/smbios/* which exposes the same data.
+		The dmi class link will be removed in July of 2013.
diff --git a/Documentation/ABI/testing/sysfw-smbios b/Documentation/ABI/testing/sysfw-smbios
new file mode 100644
index 000..c6045f1
--- /dev/null
+++ b/Documentation/ABI/testing/sysfw-smbios
@@ -0,0 +1,36 @@
+What:		/sys/bus/sysfw
+Date:		May 1 2011
+KernelVersion:	2.6.39
+Contact:	Prarit Bhargava pra...@redhat.com
+Description:
+		The dmi class is exported if CONFIG_SMBIOS is set.
+
+		The SMBIOS code currently exposes several values via sysfs
+		primarily for use by module handling code.
+
+		These values
[PATCH] crypto, qat, use generic numa functions
While testing, the following panic was seen:

IP: [8115b8d7] __alloc_pages_nodemask+0x97/0x420
PGD 0
Oops: [#1] SMP
Modules linked in: aesni_intel ptp lrw qat_dh895xcc(+) intel_qat pps_core i2c_algo_bit authenc gf128mul iTCO_wdt ioatdma glue_helper sb_edac i2c_i801 ablk_helper serio_raw iTCO_vendor_support pcspkr edac_core shpchp i2c_core cryptd dca lpc_ich mfd_core wmi xfs libcrc32c sd_mod crc_t10dif crct10dif_common ahci libahci libata dm_mirror dm_region_hash dm_log dm_mod
CPU: 0 PID: 1235 Comm: systemd-udevd Not tainted 3.10.0-165.el7.x86_64 #1
Hardware name: Intel Corporation SandyBridge Platform/To be filled by O.E.M., BIOS CCFRCLC0.019.1308201516 08/20/2013
task: 88006d068000 ti: 88006ca0c000 task.ti: 88006ca0c000
RIP: 0010:[8115b8d7] [8115b8d7] __alloc_pages_nodemask+0x97/0x420
RSP: 0018:88006ca0f928 EFLAGS: 00010246
RAX: 2000 RBX: RCX: 88006ca0ffd8
RDX: RSI: 0002 RDI: 002052d0
RBP: 88006ca0f9c8 R08: 0008 R09: 0002
R10: 0068 R11: ffc4 R12: 002052d0
R13: R14: 0002 R15:
FS: 7f999a6f9880() GS:880076a0() knlGS:
CS: 0010 DS: ES: CR0: 80050033
CR2: 2008 CR3: 6c916000 CR4: 001407f0
DR0: DR1: DR2:
DR3: DR6: 0ff0 DR7: 0400
Stack:
 88007ac07700 88006ca0f940 811a43d9
 88006ca0fa00 811a4a0a 88007ac00e30 88007ac00e10
 880076a17 8802 2000 000180d0
Call Trace:
 [811a43d9] ? discard_slab+0x39/0x50
 [811a4a0a] ? deactivate_slab+0x35a/0x3c0
 [811a3521] new_slab+0x91/0x300
 [815ee9ed] __slab_alloc+0x2bb/0x482
 [8101b923] ? native_sched_clock+0x13/0x80
 [8101b999] ? sched_clock+0x9/0x10
 [a01b8177] ? adf_probe+0xb7/0x5a0 [qat_dh895xcc]
 [812cce6f] ? idr_get_empty_slot+0x16f/0x3c0
 [812cce6f] ? idr_get_empty_slot+0x16f/0x3c0
 [811a690b] kmem_cache_alloc_node_trace+0x9b/0x220
 [a01b8177] adf_probe+0xb7/0x5a0 [qat_dh895xcc]
 [81237bd2] ? sysfs_addrm_finish+0x42/0xe0
 [812379b1] ? __sysfs_add_one+0x61/0x100
 [812fee25] local_pci_probe+0x45/0xa0
 [81300295] ? pci_match_device+0xc5/0xd0
 [813003d9] pci_device_probe+0xf9/0x150
 [813caee7] driver_probe_device+0x87/0x390
 [813cb2c3] __driver_attach+0x93/0xa0
 [813cb230] ? __device_attach+0x40/0x40
 [813c8c73] bus_for_each_dev+0x73/0xc0
 [813ca93e] driver_attach+0x1e/0x20
 [813ca490] bus_add_driver+0x200/0x2d0
 [813cb944] driver_register+0x64/0xf0
 [812ffe95] __pci_register_driver+0xa5/0xc0
 [a01be000] ? 0xa01bdfff
 [a01be03a] adfdrv_init+0x3a/0x1000 [qat_dh895xcc]
 [810020b8] do_one_initcall+0xb8/0x230
 [810da32a] load_module+0x131a/0x1b20
 [812ee3e0] ? ddebug_proc_write+0xf0/0xf0
 [810d68c3] ? copy_module_from_fd.isra.43+0x53/0x150
 [810dace6] SyS_finit_module+0xa6/0xd0
 [81601a69] system_call_fastpath+0x16/0x1b
Code: c1 eb 02 c1 e8 13 83 e3 02 83 e0 01 09 c3 44 23 25 cf 22 8a 00 48 c7 45 c0 00 00 00 00 41 f6 c4 10 0f 85 55 02 00 00 48 8b 45 b0 48 83 78 08 00 0f 84 a3 01 00 00 0f 1f 44 00 00 48 8b 45 b0 44

The method in which the qat code determines the numa node for memory allocations is a bit clunky. On 2 socket, single node systems it is possible that adf_get_dev_node_id() returns node 1, even though node 1 doesn't exist.

This code transitions the qat code to the generic numa functions. Changing adf_get_dev_node_id() to a simple call to dev_to_node() results in a change to the adf_accel_dev struct as well. In addition to that change, qat_crypto_get_instance_node() must check for any node as a valid numa_node value.

Cc: Tadeusz Struk tadeusz.st...@intel.com
Cc: Herbert Xu herb...@gondor.apana.org.au
Cc: David S. Miller da...@davemloft.net
Cc: Bruce Allan bruce.w.al...@intel.com
Cc: Prarit Bhargava pra...@redhat.com
Cc: John Griffin john.grif...@intel.com
Cc: qat-li...@intel.com
Cc: linux-crypto@vger.kernel.org
Signed-off-by: Prarit Bhargava pra...@redhat.com
---
 drivers/crypto/qat/qat_common/adf_accel_devices.h |  2 +-
 drivers/crypto/qat/qat_common/qat_algs.c          |  7 +--
 drivers/crypto/qat/qat_common/qat_crypto.c        |  4 +++-
 drivers/crypto/qat/qat_dh895xcc/adf_drv.c         | 19 ++-
 4 files changed, 7 insertions(+), 25 deletions(-)

diff --git a/drivers/crypto/qat/qat_common/adf_accel_devices.h b/drivers/crypto/qat/qat_common/adf_accel_devices.h
index 9282381..025f52f 100644
--- a/drivers/crypto/qat/qat_common/adf_accel_devices.h
+++ b/drivers/crypto/qat/qat_common
Re: [PATCH] crypto, qat, use generic numa functions
On 10/08/2014 11:50 AM, Tadeusz Struk wrote:
> Hi Prarit,
> On 10/07/2014 05:12 PM, Prarit Bhargava wrote:
>> The method in which the qat code determines the numa node for memory allocations is a bit clunky. On 2 socket, single node systems it is possible that adf_get_dev_node_id() returns node 1, even though node 1 doesn't exist.
>>
>> This code transitions the qat code to the generic numa functions. Changing adf_get_dev_node_id() to a simple call to dev_to_node() results in a change to the adf_accel_dev struct as well.
>
> The problem with that is we don't want to use any valid numa node, but the node we are connected to, or we don't want to use the accelerator at all. Otherwise, when the first valid numa node happens to be the remote node, the dma transactions will be slow and instead of accelerating we will slow things down. A patch that enforces this is on its way.

Yeah, I was actually wondering if dev_to_node() returns NUMA_NO_NODE, then we should just default to 0?

I'll wait for your patch ...

P.

--
To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/2] crypto: qat - Enforce valid numa configuration.
On 10/08/2014 01:38 PM, Tadeusz Struk wrote:
> In a system with NUMA configuration we want to enforce that the accelerator is connected to a node with memory to avoid cross QPI memory transaction. Otherwise there is no point in using the accelerator as the encryption in software will be faster.
>
> Signed-off-by: Tadeusz Struk tadeusz.st...@intel.com
> ---
>  drivers/crypto/qat/qat_common/adf_accel_devices.h |  3 +--
>  drivers/crypto/qat/qat_common/adf_transport.c     | 12 +++-
>  drivers/crypto/qat/qat_common/qat_algs.c          |  5 +++--
>  drivers/crypto/qat/qat_common/qat_crypto.c        |  8 +---
>  drivers/crypto/qat/qat_dh895xcc/adf_admin.c       |  2 +-
>  drivers/crypto/qat/qat_dh895xcc/adf_drv.c         |  9 -
>  drivers/crypto/qat/qat_dh895xcc/adf_isr.c         |  2 +-
>  7 files changed, 26 insertions(+), 15 deletions(-)
>
> diff --git a/drivers/crypto/qat/qat_common/adf_accel_devices.h b/drivers/crypto/qat/qat_common/adf_accel_devices.h
> index 3cfe195..96e0b06 100644
> --- a/drivers/crypto/qat/qat_common/adf_accel_devices.h
> +++ b/drivers/crypto/qat/qat_common/adf_accel_devices.h
> @@ -203,8 +203,7 @@ struct adf_accel_dev {
>  	struct dentry *debugfs_dir;
>  	struct list_head list;
>  	struct module *owner;
> -	uint8_t accel_id;
> -	uint8_t numa_node;
>  	struct adf_accel_pci accel_pci_dev;
> +	uint8_t accel_id;
>  } __packed;
>  #endif
> diff --git a/drivers/crypto/qat/qat_common/adf_transport.c b/drivers/crypto/qat/qat_common/adf_transport.c
> index 5f3fa45..9dd2cb7 100644
> --- a/drivers/crypto/qat/qat_common/adf_transport.c
> +++ b/drivers/crypto/qat/qat_common/adf_transport.c
> @@ -419,9 +419,10 @@ static int adf_init_bank(struct adf_accel_dev *accel_dev,
>  		WRITE_CSR_RING_BASE(csr_addr, bank_num, i, 0);
>  		ring = &bank->rings[i];
>  		if (hw_data->tx_rings_mask & (1 << i)) {
> -			ring->inflights = kzalloc_node(sizeof(atomic_t),
> -						       GFP_KERNEL,
> -						       accel_dev->numa_node);
> +			ring->inflights =
> +				kzalloc_node(sizeof(atomic_t),
> +					     GFP_KERNEL,
> +					     dev_to_node(&GET_DEV(accel_dev)));
>  			if (!ring->inflights)
>  				goto err;
>  		} else {
> @@ -469,13 +470,14 @@ int adf_init_etr_data(struct adf_accel_dev *accel_dev)
>  	int i, ret;
>
>  	etr_data = kzalloc_node(sizeof(*etr_data), GFP_KERNEL,
> -				accel_dev->numa_node);
> +				dev_to_node(&GET_DEV(accel_dev)));
>  	if (!etr_data)
>  		return -ENOMEM;
>
>  	num_banks = GET_MAX_BANKS(accel_dev);
>  	size = num_banks * sizeof(struct adf_etr_bank_data);
> -	etr_data->banks = kzalloc_node(size, GFP_KERNEL, accel_dev->numa_node);
> +	etr_data->banks = kzalloc_node(size, GFP_KERNEL,
> +				       dev_to_node(&GET_DEV(accel_dev)));
>  	if (!etr_data->banks) {
>  		ret = -ENOMEM;
>  		goto err_bank;
> diff --git a/drivers/crypto/qat/qat_common/qat_algs.c b/drivers/crypto/qat/qat_common/qat_algs.c
> index bffa8bf..0897a1c 100644
> --- a/drivers/crypto/qat/qat_common/qat_algs.c
> +++ b/drivers/crypto/qat/qat_common/qat_algs.c
> @@ -598,7 +598,8 @@ static int qat_alg_sgl_to_bufl(struct qat_crypto_instance *inst,
>  	if (unlikely(!n))
>  		return -EINVAL;
>
> -	bufl = kmalloc_node(sz, GFP_ATOMIC, inst->accel_dev->numa_node);
> +	bufl = kmalloc_node(sz, GFP_ATOMIC,
> +			    dev_to_node(&GET_DEV(inst->accel_dev)));
>  	if (unlikely(!bufl))
>  		return -ENOMEM;
>
> @@ -644,7 +645,7 @@ static int qat_alg_sgl_to_bufl(struct qat_crypto_instance *inst,
>  		struct qat_alg_buf *bufers;
>
>  		buflout = kmalloc_node(sz, GFP_ATOMIC,
> -				       inst->accel_dev->numa_node);
> +				       dev_to_node(&GET_DEV(inst->accel_dev)));
>  		if (unlikely(!buflout))
>  			goto err;
>  		bloutp = dma_map_single(dev, buflout, sz, DMA_TO_DEVICE);
> diff --git a/drivers/crypto/qat/qat_common/qat_crypto.c b/drivers/crypto/qat/qat_common/qat_crypto.c
> index 060dc0a..c1eefc4 100644
> --- a/drivers/crypto/qat/qat_common/qat_crypto.c
> +++ b/drivers/crypto/qat/qat_common/qat_crypto.c
> @@ -109,12 +109,14 @@ struct qat_crypto_instance *qat_crypto_get_instance_node(int node)
>  	list_for_each(itr, adf_devmgr_get_head()) {
>  		accel_dev = list_entry(itr, struct adf_accel_dev, list);
> -		if (accel_dev->numa_node == node && adf_dev_started(accel_dev))
> +		if ((node == dev_to_node(&GET_DEV(accel_dev)) ||
> +			dev_to_node(&GET_DEV(accel_dev)) < 0)
> +				&& adf_dev_started(accel_dev))
Re: [PATCH 2/2] crypto: qat - Enforce valid numa configuration.
On 10/08/2014 02:11 PM, Tadeusz Struk wrote:
> On 10/08/2014 10:57 AM, Prarit Bhargava wrote:
>>> 	node = adf_get_dev_node_id(pdev);
>>         ^^^ I don't think you should ever make this call. IMO it is wrong to do it that way. Just stick with
>>
>> 	node = dev_to_node(&pdev->dev)
>>
>> as the line below forces a default to that anyway.
>
> But then how do I know which node I'm physically connected to?

The pci_dev maps to the bus, which maps to a numa node. The pci_dev's numa value is copied directly from the bus (or buses, depending on how deep it is).

I'd argue (strongly) that the pci_dev's numa ID had better be correct; otherwise that is a FW bug (and a bad one at that these days).

dev_to_node() should return the correct value.

>>> +	if (node != dev_to_node(&pdev->dev) && dev_to_node(&pdev->dev) > 0) {
>>> +		/* If the accelerator is connected to a node with no memory
>>> +		 * there is no point in using the accelerator since the remote
>>> +		 * memory transaction will be very slow. */
>>> +		dev_err(&pdev->dev, "Invalid NUMA configuration.\n");
>>> +		return -EINVAL;
>>
>> Hmm ... I wonder if it would be safe to do
>>
>> 	/* force allocations to node 0 */
>> 	node = 0;
>> 	dev_err(&pdev->dev, "Invalid NUMA configuration detected, node id = %d . Defaulting node to 0. \n", node);
>>
>> and then continue?
>
> As the comment says, there is no point continuing if the configuration is wrong. Defaulting to 0 will cause the same panic you pointed out in your first patch if node 0 has no memory.

Okay, but at least fix up the warning message to output the node_id. That's sort of the important piece here.

P.

>> And maybe even a FW_WARN of some sort here might be appropriate to indicate that something is wrong with the mapping? In any case a better error message is always a good idea IMO.
Re: [PATCH 2/2] crypto: qat - Enforce valid numa configuration.
On 10/08/2014 02:57 PM, Tadeusz Struk wrote:
> On 10/08/2014 11:35 AM, Prarit Bhargava wrote:
>>> But then how do I know which node I'm physically connected to?
>>
>> The pci_dev maps to the bus which maps to a numa node. The pci_dev's numa value is copied directly from the bus (or busses depending on how deep it is).
>>
>> I'd argue (strongly) that the pci_dev's numa ID better be correct o/w that is a FW bug (and a bad one at that these days).
>>
>> dev_to_node() should return the correct value.
>
> I'm not saying that the dev_to_node() returns incorrect value. It will always return the closest numa node for the given device.

No that isn't correct. dev_to_node() will return the node the device is connected to.

> What we want to enforce is that the closest numa node is the node that the device is physically connected to. In case if the closest numa node is the remote node we don't want to use this accelerator.

P.
Re: [PATCH 2/2] crypto: qat - Enforce valid numa configuration.
[sorry ... accidentally hit reply instead of reply all ... resending to everyone]

On 10/08/2014 03:25 PM, Tadeusz Struk wrote:
> On 10/08/2014 12:01 PM, Prarit Bhargava wrote:
>> No that isn't correct. dev_to_node() will return the node the device is connected to.
>
> include/linux/device.h:
>
> static inline int dev_to_node(struct device *dev)
> {
> 	return dev->numa_node;
> }
>
> struct device {
> 	...
> 	int numa_node; /* NUMA node this device is close to */
> 	...

That's just bad english. The numa node value (for pci devices) is read from the ACPI tables on the system and represents the node that the pci_dev is connected to.

> };
>
> In case when there are two nodes and only node 0 has memory, dev->numa_node will be 0 even though the device will be connected to the pci root port of node 1.

Your calculation completely falls apart and returns incorrect values when cpu hotplug is used or if there are multi-socket nodes (as was the case on the system that panicked), or if one uses the new cluster-on-die mode.

P.
Re: [PATCH 2/2] crypto: qat - Enforce valid numa configuration.
On 10/09/2014 12:14 PM, Tadeusz Struk wrote:
> On 10/09/2014 04:23 AM, Prarit Bhargava wrote:
>>> 	int numa_node; /* NUMA node this device is close to */
>>> 	...
>>
>> That's just bad english. The numa node value (for pci devices) is read from the ACPI tables on the system and represents the node that the pci_dev is connected to.
>>
>>> };
>>>
>>> In case when there are two nodes and only node 0 has memory, dev->numa_node will be 0 even though the device will be connected to the pci root port of node 1.
>>
>> Your calculation completely falls apart and returns incorrect values when cpu hotplug is used or if there are multi-socket nodes (as was the case on the system that panicked), or if one uses the new cluster-on-die mode.
>
> This calculation is solely for multi-socket configuration. This is why it was introduced and what it was tested for. There is no point discussing NUMA for single-socket configuration. Single socket configurations are not NUMA. In this case dev->numa_node is usually equal to NUMA_NO_NODE (-1) and adf_get_dev_node_id(pdev) will always return 0;

The fact that you return an incorrect value here for any configuration is, simply put, bad. You shouldn't do that.

> Please confirm that, but I think the system it panicked on was a two socket system with only node 0 populated with memory and the accelerator plugged in to node 1 (phys_proc_id == 1). In this case adf_get_dev_node_id(pdev) returned 1 and this was passed to kzalloc_node(size, GFP_KERNEL, 1), and because there was no memory on node 1 kzalloc_node() panicked.

Yep; but my interpretation was that node 1 didn't exist at all and it panicked.

> This patch will make sure that this will not happen and that the configuration will be optimal.

Yep, it will. But what about cpu hotplug?

P.
Re: [PATCH 2/2] crypto: qat - Enforce valid numa configuration.
On 10/09/2014 03:55 PM, Tadeusz Struk wrote:
> On 10/09/2014 10:32 AM, Prarit Bhargava wrote:
>>> This calculation is solely for multi-socket configuration. This is why it was introduced and what it was tested for. There is no point discussing NUMA for single-socket configuration. Single socket configurations are not NUMA. In this case dev->numa_node is usually equal to NUMA_NO_NODE (-1) and adf_get_dev_node_id(pdev) will always return 0;
>>
>> The fact that you return an incorrect value here for any configuration is, simply put, bad. You shouldn't do that.
>
> Well I wouldn't say it is incorrect. adf_get_dev_node_id() returns the phys proc id the dev is connected to, so in single socket configuration there is only one socket 0.

That's not entirely true -- see my previous comment about

>>> Please confirm that, but I think the system it panicked on was a two socket system with only node 0 populated with memory and the accelerator plugged in to node 1 (phys_proc_id == 1). In this case adf_get_dev_node_id(pdev) returned 1 and this was passed to kzalloc_node(size, GFP_KERNEL, 1), and because there was no memory on node 1 kzalloc_node() panicked.
>>
>> Yep; but my interpretation was that node 1 didn't exist at all and it panicked.
>
> Why didn't exist?

The reason the kernel panics is because there is only node 0. Try allocating, for example, from node 100. You'll hit the same panic. That's different than saying node 0 has no memory ... I think we agree on this FWIW ... we just have slightly different interpretations of the panic :)

> The fact that there was no memory on node 1 doesn't make it disappear. There are two sockets in the platform, 0 & 1, even though only one (node 0) has memory. The only problem with that is that it is far from optimal if a device connected to node 1 uses memory on node 0. And this is what would happen if we would use dev_to_node(dev) here.
>
>>> This patch will make sure that this will not happen and that the configuration will be optimal.
>>
>> Yep, it will. But what about cpu hotplug?
>
> I don't think cpu hotplug matters here. This is a one-time (probe) determination of whether the configuration is optimal or not and whether it makes sense to use this accelerator or not.

It absolutely matters. num_online_cpus() *changes* depending on the # of cpus.

P.
Re: [PATCH 2/2] crypto: qat - Enforce valid numa configuration.
On 10/09/2014 07:12 PM, Tadeusz Struk wrote:
> On 10/09/2014 02:42 PM, Prarit Bhargava wrote:
>>> I don't think cpu hotplug matters here. This is one (probe) time determination if the configuration is optimal or not and if it makes sense to use this accelerator or not.
>>
>> It absolutely matters. num_online_cpus() *changes* depending on the # of cpus.
>
> Sure, but I still think that we are safe here.

No, you're not. Dropping a single CPU changes num_online_cpus(), which results in

static uint8_t adf_get_dev_node_id(struct pci_dev *pdev)
{
	unsigned int bus_per_cpu = 0;
	struct cpuinfo_x86 *c = &cpu_data(num_online_cpus() - 1);

this being different.

	if (!c->phys_proc_id)
		return 0;

	bus_per_cpu = 256 / (c->phys_proc_id + 1);

this being different

	if (bus_per_cpu != 0)
		return pdev->bus->number / bus_per_cpu;

and this being different

	return 0;
}

P.
Re: [PATCH 2/2] crypto: qat - Enforce valid numa configuration.
On 10/10/2014 09:25 AM, Tadeusz Struk wrote:
> On 10/10/2014 04:23 AM, Prarit Bhargava wrote:
>>> Sure, but I still think that we are safe here.
>>
>> No, you're not. Dropping a single CPU changes num_online_cpus(), which results in
>>
>> static uint8_t adf_get_dev_node_id(struct pci_dev *pdev)
>> {
>> 	unsigned int bus_per_cpu = 0;
>> 	struct cpuinfo_x86 *c = &cpu_data(num_online_cpus() - 1);
>>
>> this being different.
>>
>> 	if (!c->phys_proc_id)
>> 		return 0;
>>
>> 	bus_per_cpu = 256 / (c->phys_proc_id + 1);
>>
>> this being different
>>
>> 	if (bus_per_cpu != 0)
>> 		return pdev->bus->number / bus_per_cpu;
>>
>> and this being different
>>
>> 	return 0;
>> }
>
> You forgot to explain how this is not safe.

Sorry, I thought I did explain it. My apologies.

So let's say you boot the system and load the driver. At this time,

num_online_cpus@boot = 4.

Crunch through the math above, and you reference the cpuinfo_x86 struct for cpu 3 (the fourth cpu), and the calculation takes into account c->phys_proc_id.

So let's say now you boot the system and disable a cpu. In this case, now

num_online_cpus@module_load = 3.

Crunch through the math above and you're referencing a different cpuinfo_x86 struct, the one for cpu 2. That may or may not point at the same c->phys_proc_id. That changes the calculation and gives an incorrect value.

In addition to that, I haven't even talked about the possibility of hot-adding and hot-removing cpus in sockets, which changes the numbering scheme completely.

In short, that calculation is wrong. Don't use it; stick with the widely accepted and used dev_to_node() of the pci_dev. It is used in other cases IIRC to determine the numa location of the device. It shouldn't be any different for this driver.

P.

> T.
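
The hotplug argument above can be checked outside the kernel. What follows is a minimal userspace sketch of the arithmetic in the quoted adf_get_dev_node_id(), not driver code: struct mock_cpuinfo, model_dev_node_id() and the cpus[] table are hypothetical stand-ins for struct cpuinfo_x86, the driver function and cpu_data(), and num_online is simply passed in rather than read from num_online_cpus().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's per-CPU struct cpuinfo_x86. */
struct mock_cpuinfo {
	unsigned int phys_proc_id;	/* socket the CPU belongs to */
};

/*
 * Userspace model of the arithmetic in adf_get_dev_node_id() as quoted in
 * the thread. cpus[] is indexed by CPU number, num_online stands in for
 * num_online_cpus() and bus_number for pdev->bus->number.
 */
static unsigned int model_dev_node_id(const struct mock_cpuinfo *cpus,
				      size_t num_online,
				      unsigned int bus_number)
{
	const struct mock_cpuinfo *c;
	unsigned int bus_per_cpu;

	assert(num_online > 0);
	c = &cpus[num_online - 1];	/* picked by CPU count alone */

	if (!c->phys_proc_id)
		return 0;

	bus_per_cpu = 256 / (c->phys_proc_id + 1);
	if (bus_per_cpu != 0)
		return bus_number / bus_per_cpu;

	return 0;
}
```

Assuming a two-socket machine whose four CPU numbers alternate between sockets (phys_proc_id 0, 1, 0, 1) and a device on bus 0x85, the model yields node 1 with four CPUs online and node 0 with three online. The device has not moved; only the CPU count changed.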
Re: [PATCH v2 2/2] crypto: qat - Enforce valid numa configuration
On 10/13/2014 09:24 PM, Tadeusz Struk wrote:

<snip>

> -	node = adf_get_dev_node_id(pdev);
> -	accel_dev = kzalloc_node(sizeof(*accel_dev), GFP_KERNEL, node);
> +	if (num_possible_nodes() > 1 && dev_to_node(&pdev->dev) < 0) {
> +		/* If the accelerator is connected to a node with no memory
> +		 * there is no point in using the accelerator since the remote
> +		 * memory transaction will be very slow. */
> +		dev_err(&pdev->dev, "Invalid NUMA configuration.\n");

This is a lot better. Thank you for taking my comments into account here.

Let's say I have a non-functional qat device and I see the above message in the boot log. The log doesn't say what to do ... so perhaps change it to

	dev_err(&pdev->dev, FW_BUG "numa node is set to %d. This can be overridden by using the numa_node module parameter.", dev_to_node(&pdev->dev));

and add a numa_node module parameter to let the user set that at module load time in case their FW is broken?

I've found that sysadmins are knowledgeable about these types of things these days and are more than capable of looking at sysfs and numactl to determine where a device is.

P.
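
The probe-time check under review, together with the request to report the node id, can be sketched in userspace as follows. This is an illustration under stated assumptions, not the driver's code: model_check_numa() and MODEL_NUMA_NO_NODE are hypothetical names, possible_nodes stands in for num_possible_nodes(), dev_node for dev_to_node(&pdev->dev), and the message is written to a buffer instead of going through dev_err().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

#define MODEL_NUMA_NO_NODE (-1)	/* mirrors the kernel's NUMA_NO_NODE */

/*
 * Userspace model of the v2 check: refuse the device when the system has
 * more than one possible node but the device has no node assigned.
 * Returns 0 if the device may be used, -EINVAL otherwise. On failure the
 * diagnostic written to msg includes the offending node id.
 */
static int model_check_numa(int possible_nodes, int dev_node,
			    char *msg, size_t len)
{
	assert(msg != NULL && len > 0);

	if (possible_nodes > 1 && dev_node < 0) {
		snprintf(msg, len,
			 "Invalid NUMA configuration, device node id = %d",
			 dev_node);
		return -EINVAL;
	}

	msg[0] = '\0';	/* nothing to report */
	return 0;
}
```

With two possible nodes and no node assigned, the model returns -EINVAL and the message carries the value -1; with a single possible node, or with a valid node id, it returns 0.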
Re: [PATCH v2 2/2] crypto: qat - Enforce valid numa configuration
On 10/14/2014 10:50 AM, Tadeusz Struk wrote:
> On 10/14/2014 03:53 AM, Prarit Bhargava wrote:
>>> -	node = adf_get_dev_node_id(pdev);
>>> -	accel_dev = kzalloc_node(sizeof(*accel_dev), GFP_KERNEL, node);
>>> +	if (num_possible_nodes() > 1 && dev_to_node(&pdev->dev) < 0) {
>>> +		/* If the accelerator is connected to a node with no memory
>>> +		 * there is no point in using the accelerator since the remote
>>> +		 * memory transaction will be very slow. */
>>> +		dev_err(&pdev->dev, "Invalid NUMA configuration.\n");
>>
>> This is a lot better. Thank you for taking my comments into account here.
>
> Thanks for taking the time to review my patch and providing your comments.
>
>> Let's say I have a non-functional qat device and I see the above message in the boot log. The log doesn't say what to do ... so perhaps change it to
>>
>> 	dev_err(&pdev->dev, FW_BUG "numa node is set to %d. This can be overridden by using the numa_node module parameter.", dev_to_node(&pdev->dev));
>>
>> and add a numa_node module parameter to let the user set that at module load time in case their FW is broken?
>>
>> I've found that sysadmins are knowledgeable about these types of things these days and are more than capable of looking at sysfs and numactl to determine where a device is.
>
> But then what if there are two devices and each belongs to different node. In this case we would fix one and break the other. I think if the

Oh, that's a really good point. But can you at least change the message to do a FW_BUG and dump the node information? That would be useful for debugging.

P.

> FW is broken then using on core encryption will be safer. If a sysadmins is really knowledgeable, then she or he can change the code to customize it for a given platform and rebuild the module. Other than that as far as I know module parameters are not encouraged.
> T
Re: [PATCH v2 2/2] crypto: qat - Enforce valid numa configuration
On 10/14/2014 01:18 PM, Tadeusz Struk wrote:
> On 10/14/2014 08:41 AM, Prarit Bhargava wrote:
>> Oh, that's a really good point. But can you at least change the message to do a FW_BUG and dump the node information? That would be useful for debugging.
>
> But this not always will be a FW_BUG. If a user will not populate one of the nodes with memory this will happen as well.

Hmmm ... let's maybe think about this. I wonder if there is some mechanism with which we can determine that? Larry Woodman -- is there any mm related call that we can make to determine if a node is memory-less?

> I could see this to be the main reason of this message to be printed. In this case num_possible_nodes() will be e.g. 2 and dev_to_node(&pdev->dev) will be -1 so I don't really know what will be a useful info to print so we don't confuse the user.

If you see -1, it means "No node was assigned" ... so -1 in a debug message is okay IMO.

P.

> T
Re: [PATCH v2 2/2] crypto: qat - Enforce valid numa configuration
On 10/14/2014 01:27 PM, Prarit Bhargava wrote:
> On 10/14/2014 01:18 PM, Tadeusz Struk wrote:
>> On 10/14/2014 08:41 AM, Prarit Bhargava wrote:
>>> Oh, that's a really good point. But can you at least change the message to do a FW_BUG and dump the node information? That would be useful for debugging.
>>
>> But this not always will be a FW_BUG. If a user will not populate one of the nodes with memory this will happen as well.
>
> Hmmm ... let's maybe think about this. I wonder if there is some mechanism with which we can determine that? Larry Woodman -- is there any mm related call that we can make to determine if a node is memory-less?
>
>> I could see this to be the main reason of this message to be printed. In this case num_possible_nodes() will be e.g. 2 and dev_to_node(&pdev->dev) will be -1 so I don't really know what will be a useful info to print so we don't confuse the user.
>
> If you see -1, it means "No node was assigned" ... so -1 in a debug message is okay IMO.

Never mind -- I'm not thinking straight after a long weekend :) This is all okay. The message above will only print iff node < 0, i.e. -1.

So I'll ack shortly.

P.

> P.
>
>> T
Re: [PATCH v2 0/2] crypto: qat - Fix for invalid dma mapping and numa
On 10/15/2014 06:35 AM, Nikolay Aleksandrov wrote:
> On 14/10/14 03:24, Tadeusz Struk wrote:
>> Hi,
>> These two patches fix invalid (zero length) dma mapping and enforce numa configuration for maximum performance.
>>
>> Change log:
>> v2 - Removed numa node calculation based bus number and use predefined functions instead.
>>
>> Signed-off-by: Tadeusz Struk tadeusz.st...@intel.com
>> ---
>>
>> Tadeusz Struk (2):
>>       crypto: qat - Prevent dma mapping zero length assoc data
>>       crypto: qat - Enforce valid numa configuration
>>
>>  drivers/crypto/qat/qat_common/adf_accel_devices.h |  3 +-
>>  drivers/crypto/qat/qat_common/adf_transport.c     | 12 +---
>>  drivers/crypto/qat/qat_common/qat_algs.c          |  7 +++--
>>  drivers/crypto/qat/qat_common/qat_crypto.c        |  8 +++--
>>  drivers/crypto/qat/qat_dh895xcc/adf_admin.c       |  2 +
>>  drivers/crypto/qat/qat_dh895xcc/adf_drv.c         | 32 -
>>  drivers/crypto/qat/qat_dh895xcc/adf_isr.c         |  2 +
>>  7 files changed, 32 insertions(+), 34 deletions(-)
>
> I just gave a quick run of these patches and they seem to fix the NUMA issue and the 0 length warnings.
>
> Tested-by: Nikolay Aleksandrov niko...@redhat.com

Thanks Nik :)

Reviewed-by: Prarit Bhargava pra...@redhat.com

P.
[PATCH 1/2] hwrng: amd - Revert managed API changes
After commit 31b2a73c9c5f ("hwrng: amd - Migrate to managed API"), the
amd-rng driver uses devres with pci_dev->dev to keep track of resources,
but does not actually register a PCI driver. This results in the
following issues:

1. The message

WARNING: CPU: 2 PID: 621 at drivers/base/dd.c:349 driver_probe_device+0x38c

is output when the i2c_amd756 driver loads and attempts to register a PCI
driver. The PCI & device subsystems assume that no resources have been
registered for the device, and the WARN_ON() triggers since amd-rng has
already done so.

2. The driver leaks memory because the driver does not attach to a
device. The driver only uses the PCI device as a reference. devm_*()
functions will release resources on driver detach, which the amd-rng
driver will never do. As a result,

3. The driver cannot be reloaded because there is always a use of the
ioport and region after the first load of the driver.

Revert the changes made by 31b2a73c9c5f ("hwrng: amd - Migrate to managed
API").

Signed-off-by: Prarit Bhargava <pra...@redhat.com>
Fixes: 31b2a73c9c5f ("hwrng: amd - Migrate to managed API")
Cc: Matt Mackall <m...@selenic.com>
Cc: Herbert Xu <herb...@gondor.apana.org.au>
Cc: Corentin LABBE <clabbe.montj...@gmail.com>
Cc: PrasannaKumar Muralidharan <prasannatsmku...@gmail.com>
Cc: Wei Yongjun <weiyongj...@huawei.com>
Cc: linux-crypto@vger.kernel.org
Cc: linux-ge...@lists.infradead.org
---
 drivers/char/hw_random/amd-rng.c | 42 ++
 1 file changed, 34 insertions(+), 8 deletions(-)

diff --git a/drivers/char/hw_random/amd-rng.c b/drivers/char/hw_random/amd-rng.c
index 4a99ac756f08..9959c762da2f 100644
--- a/drivers/char/hw_random/amd-rng.c
+++ b/drivers/char/hw_random/amd-rng.c
@@ -55,6 +55,7 @@
 struct amd768_priv {
 	void __iomem *iobase;
 	struct pci_dev *pcidev;
+	u32 pmbase;
 };
 
 static int amd_rng_read(struct hwrng *rng, void *buf, size_t max, bool wait)
@@ -148,33 +149,58 @@ static int __init mod_init(void)
 	if (pmbase == 0)
 		return -EIO;
 
-	priv = devm_kzalloc(&pdev->dev, sizeof(*priv), GFP_KERNEL);
+	priv = kzalloc(sizeof(*priv), GFP_KERNEL);
 	if (!priv)
 		return -ENOMEM;
 
-	if (!devm_request_region(&pdev->dev, pmbase + PMBASE_OFFSET,
-				PMBASE_SIZE, DRV_NAME)) {
+	if (!request_region(pmbase + PMBASE_OFFSET, PMBASE_SIZE, DRV_NAME)) {
 		dev_err(&pdev->dev, DRV_NAME " region 0x%x already in use!\n",
 			pmbase + 0xF0);
-		return -EBUSY;
+		err = -EBUSY;
+		goto out;
 	}
 
-	priv->iobase = devm_ioport_map(&pdev->dev, pmbase + PMBASE_OFFSET,
-			PMBASE_SIZE);
+	priv->iobase = ioport_map(pmbase + PMBASE_OFFSET, PMBASE_SIZE);
 	if (!priv->iobase) {
 		pr_err(DRV_NAME "Cannot map ioport\n");
-		return -ENOMEM;
+		err = -EINVAL;
+		goto err_iomap;
 	}
 
 	amd_rng.priv = (unsigned long)priv;
+	priv->pmbase = pmbase;
 	priv->pcidev = pdev;
 
 	pr_info(DRV_NAME " detected\n");
-	return devm_hwrng_register(&pdev->dev, &amd_rng);
+	err = hwrng_register(&amd_rng);
+	if (err) {
+		pr_err(DRV_NAME " registering failed (%d)\n", err);
+		goto err_hwrng;
+	}
+	return 0;
+
+err_hwrng:
+	ioport_unmap(priv->iobase);
+err_iomap:
+	release_region(pmbase + PMBASE_OFFSET, PMBASE_SIZE);
+out:
+	kfree(priv);
+	return err;
 }
 
 static void __exit mod_exit(void)
 {
+	struct amd768_priv *priv;
+
+	priv = (struct amd768_priv *)amd_rng.priv;
+
+	hwrng_unregister(&amd_rng);
+
+	ioport_unmap(priv->iobase);
+
+	release_region(priv->pmbase + PMBASE_OFFSET, PMBASE_SIZE);
+
+	kfree(priv);
 }
 
 module_init(mod_init);
-- 
1.7.9.3
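Editorial note: the error path that the amd-rng revert restores acquires resources in a fixed order and, on failure, jumps to the label that releases only what has been acquired so far. The userspace sketch below models that shape. Every name in it (model_init, model_exit, model_held, the fail_point enum) is hypothetical, the resources are reduced to flags, and the error code used for a failed registration is arbitrary; it is an illustration of the pattern, not the driver code.

```c
#include <errno.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
enum fail_point { FAIL_NONE, FAIL_REGION, FAIL_MAP, FAIL_REGISTER };

struct model_state {
	int priv_allocated;
	int region_held;
	int port_mapped;
	int rng_registered;
};

static struct model_state st;

/*
 * Same shape as mod_init() in the patch: acquire in order and, on failure,
 * jump to the label that undoes everything acquired up to that point.
 */
static int model_init(enum fail_point fail)
{
	int err;

	st.priv_allocated = 1;		/* stands in for kzalloc() */

	if (fail == FAIL_REGION) {	/* request_region() failed */
		err = -EBUSY;
		goto out;
	}
	st.region_held = 1;

	if (fail == FAIL_MAP) {		/* ioport_map() failed */
		err = -EINVAL;
		goto err_iomap;
	}
	st.port_mapped = 1;

	if (fail == FAIL_REGISTER) {	/* hwrng_register() failed */
		err = -ENODEV;		/* arbitrary code for the model */
		goto err_hwrng;
	}
	st.rng_registered = 1;
	return 0;

err_hwrng:
	st.port_mapped = 0;		/* ioport_unmap() */
err_iomap:
	st.region_held = 0;		/* release_region() */
out:
	st.priv_allocated = 0;		/* kfree() */
	return err;
}

/* Same shape as mod_exit(): release in reverse order of acquisition. */
static void model_exit(void)
{
	st.rng_registered = 0;
	st.port_mapped = 0;
	st.region_held = 0;
	st.priv_allocated = 0;
}

/* Number of resources still held; 0 means nothing is left behind. */
static int model_held(void)
{
	return st.priv_allocated + st.region_held +
	       st.port_mapped + st.rng_registered;
}
```

In this model a failed model_init() always leaves model_held() at 0, and model_exit() followed by another model_init(FAIL_NONE) succeeds, which corresponds to the reload case described in issue 3 of the patch description.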
[PATCH 2/2] hwrng: geode - Revert managed API changes
After commit 6e9b5e76882c ("hwrng: geode - Migrate to managed API") the
geode-rng driver uses devres with pci_dev->dev to keep track of resources,
but does not actually register a PCI driver. This results in the
following issues:

1. The driver leaks memory because the driver does not attach to a
device. The driver only uses the PCI device as a reference. devm_*()
functions will release resources on driver detach, which the geode-rng
driver will never do. As a result,

2. The driver cannot be reloaded because there is always a use of the
ioport and region after the first load of the driver.

Revert the changes made by 6e9b5e76882c ("hwrng: geode - Migrate to
managed API").

Signed-off-by: Prarit Bhargava <pra...@redhat.com>
Fixes: 6e9b5e76882c ("hwrng: geode - Migrate to managed API")
Cc: Matt Mackall <m...@selenic.com>
Cc: Herbert Xu <herb...@gondor.apana.org.au>
Cc: Corentin LABBE <clabbe.montj...@gmail.com>
Cc: PrasannaKumar Muralidharan <prasannatsmku...@gmail.com>
Cc: Wei Yongjun <weiyongj...@huawei.com>
Cc: linux-crypto@vger.kernel.org
Cc: linux-ge...@lists.infradead.org
---
 drivers/char/hw_random/geode-rng.c | 50 +---
 1 file changed, 35 insertions(+), 15 deletions(-)

diff --git a/drivers/char/hw_random/geode-rng.c b/drivers/char/hw_random/geode-rng.c
index e7a245942029..e1d421a36a13 100644
--- a/drivers/char/hw_random/geode-rng.c
+++ b/drivers/char/hw_random/geode-rng.c
@@ -31,6 +31,9 @@
 #include
 #include
 
+
+#define PFX	KBUILD_MODNAME ": "
+
 #define GEODE_RNG_DATA_REG   0x50
 #define GEODE_RNG_STATUS_REG 0x54
 
@@ -82,6 +85,7 @@ static int geode_rng_data_present(struct hwrng *rng, int wait)
 
 static int __init mod_init(void)
 {
+	int err = -ENODEV;
 	struct pci_dev *pdev = NULL;
 	const struct pci_device_id *ent;
 	void __iomem *mem;
@@ -89,27 +93,43 @@ static int __init mod_init(void)
 
 	for_each_pci_dev(pdev) {
 		ent = pci_match_id(pci_tbl, pdev);
-		if (ent) {
-			rng_base = pci_resource_start(pdev, 0);
-			if (rng_base == 0)
-				return -ENODEV;
-
-			mem = devm_ioremap(&pdev->dev, rng_base, 0x58);
-			if (!mem)
-				return -ENOMEM;
-			geode_rng.priv = (unsigned long)mem;
-
-			pr_info("AMD Geode RNG detected\n");
-			return devm_hwrng_register(&pdev->dev, &geode_rng);
-		}
+		if (ent)
+			goto found;
 	}
-
 	/* Device not found. */
-	return -ENODEV;
+	goto out;
+
+found:
+	rng_base = pci_resource_start(pdev, 0);
+	if (rng_base == 0)
+		goto out;
+	err = -ENOMEM;
+	mem = ioremap(rng_base, 0x58);
+	if (!mem)
+		goto out;
+	geode_rng.priv = (unsigned long)mem;
+
+	pr_info("AMD Geode RNG detected\n");
+	err = hwrng_register(&geode_rng);
+	if (err) {
+		pr_err(PFX "RNG registering failed (%d)\n",
+		       err);
+		goto err_unmap;
+	}
+out:
+	return err;
+
+err_unmap:
+	iounmap(mem);
+	goto out;
 }
 
 static void __exit mod_exit(void)
 {
+	void __iomem *mem = (void __iomem *)geode_rng.priv;
+
+	hwrng_unregister(&geode_rng);
+	iounmap(mem);
 }
 
 module_init(mod_init);
-- 
1.7.9.3
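Editorial note: the geode-rng mod_init() restored by this patch differs in shape from the amd-rng one. It preloads err with -ENODEV, searches for a matching device, and funnels every exit through a single out label, with one extra label for the only resource that needs undoing. A minimal userspace sketch of that control flow follows. The struct model_dev table, the map_ok and register_ok switches, and the -EIO code for a failed registration are assumptions made for the illustration and do not come from the driver.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical device table entry; stands in for struct pci_dev. */
struct model_dev {
	int id;
	unsigned long base;	/* 0 models pci_resource_start() returning 0 */
};

static int model_mapped;	/* 1 while the modelled mapping is live */

static int model_probe(const struct model_dev *devs, size_t n, int want_id,
		       int map_ok, int register_ok)
{
	int err = -ENODEV;	/* default: no device, or device unusable */
	size_t i;

	for (i = 0; i < n; i++)
		if (devs[i].id == want_id)
			goto found;
	/* Device not found. */
	goto out;

found:
	if (devs[i].base == 0)
		goto out;
	err = -ENOMEM;
	if (!map_ok)		/* models ioremap() returning NULL */
		goto out;
	model_mapped = 1;

	err = register_ok ? 0 : -EIO;	/* models hwrng_register() */
	if (err)
		goto err_unmap;
out:
	return err;

err_unmap:
	model_mapped = 0;	/* models iounmap() */
	goto out;
}
```

The point of the preloaded err is that the early exits need no assignment of their own: each goto out returns whatever error was most recently armed.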
[PATCH 0/2] hwrng: revert managed API changes for amd and geode
When booting top-of-tree the following WARN_ON triggers in the kernel on
a 15h AMD system.

WARNING: CPU: 2 PID: 621 at drivers/base/dd.c:349 driver_probe_device+0x38c
Modules linked in: i2c_amd756(+) amd_rng sg pcspkr parport_pc(+) parport k8
CPU: 2 PID: 621 Comm: systemd-udevd Not tainted 4.11.0-0.rc1.git0.1.el7_UNS
Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./TYAN High-End
Call Trace:
 dump_stack+0x63/0x8e
 __warn+0xd1/0xf0
 warn_slowpath_null+0x1d/0x20
 driver_probe_device+0x38c/0x470
 __driver_attach+0xc9/0xf0
 ? driver_probe_device+0x470/0x470
 bus_for_each_dev+0x5d/0x90
 driver_attach+0x1e/0x20
 bus_add_driver+0x1d0/0x290
 driver_register+0x60/0xe0
 ? 0xa0037000
 __pci_register_driver+0x4c/0x50
 amd756_driver_init+0x1e/0x1000 [i2c_amd756]
 do_one_initcall+0x51/0x1b0
 ? __vunmap+0x85/0xd0
 ? do_init_module+0x27/0x1fa
 do_init_module+0x60/0x1fa
 load_module+0x15d1/0x1ad0
 ? m_show+0x1c0/0x1c0
 SYSC_finit_module+0xa9/0xd0

There are PCI devices that contain both a RNG and SMBUS device. The
RNG device is initialized by the amd-rng driver but the driver does not
register against the device. The SMBUS device is initialized by the
i2c-amd756 driver and registers against the device and hits the WARN_ON()
because the amd-rng driver has already allocated resources against the
device.

The amd-rng driver was incorrectly migrated to the device resource model
(devres), and after code inspection I found that the geode-rng driver was
also incorrectly migrated. These drivers are using devres but do not
register a driver against the device, and both drivers are expecting a
memory cleanup on a driver detach that will never happen. This results
in a memory leak when the driver is unloaded and the inability to reload
the driver.

Revert 31b2a73c9c5f ("hwrng: amd - Migrate to managed API"), and
6e9b5e76882c ("hwrng: geode - Migrate to managed API").

Signed-off-by: Prarit Bhargava <pra...@redhat.com>
Fixes: 31b2a73c9c5f ("hwrng: amd - Migrate to managed API")
Fixes: 6e9b5e76882c ("hwrng: geode - Migrate to managed API")
Cc: Matt Mackall <m...@selenic.com>
Cc: Herbert Xu <herb...@gondor.apana.org.au>
Cc: Corentin LABBE <clabbe.montj...@gmail.com>
Cc: PrasannaKumar Muralidharan <prasannatsmku...@gmail.com>
Cc: Wei Yongjun <weiyongj...@huawei.com>
Cc: linux-crypto@vger.kernel.org
Cc: linux-ge...@lists.infradead.org

Prarit Bhargava (2):
  hwrng: amd - Revert managed API changes
  hwrng: geode - Revert managed API changes

 drivers/char/hw_random/amd-rng.c   | 42 --
 drivers/char/hw_random/geode-rng.c | 50 +---
 2 files changed, 69 insertions(+), 23 deletions(-)

-- 
1.7.9.3
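Editorial note: the failure described in this cover letter comes from the lifetime rule of managed resources: they are recorded on the device and released only when a driver detaches from it. The toy userspace model below is a rough sketch of that rule under heavy simplification. The types and functions (model_device, model_devm_alloc, model_driver_detach, model_probe_would_warn) are hypothetical and are not the kernel's devres implementation.

```c
#include <stdlib.h>

/* Hypothetical, heavily simplified model of per-device managed resources. */
struct model_res {
	void *data;
	struct model_res *next;
};

struct model_device {
	struct model_res *res_list;	/* resources recorded on this device */
};

static int model_live_allocs;

/* In the spirit of devm_kzalloc(): allocate and record on the device. */
static void *model_devm_alloc(struct model_device *dev, size_t size)
{
	struct model_res *r = malloc(sizeof(*r));

	if (!r)
		return NULL;
	r->data = calloc(1, size);
	if (!r->data) {
		free(r);
		return NULL;
	}
	r->next = dev->res_list;
	dev->res_list = r;
	model_live_allocs++;
	return r->data;
}

/* Driver detach: the only point at which recorded resources are freed. */
static void model_driver_detach(struct model_device *dev)
{
	while (dev->res_list) {
		struct model_res *r = dev->res_list;

		dev->res_list = r->next;
		free(r->data);
		free(r);
		model_live_allocs--;
	}
}

/*
 * Models the assumption stated in the patch description: a device that is
 * about to be probed is expected to have no resources recorded on it.
 */
static int model_probe_would_warn(const struct model_device *dev)
{
	return dev->res_list != NULL;
}
```

A module that calls model_devm_alloc() against a device it never binds to has no detach event, so in this model the allocation stays live after the module is gone, and a later probe of the same device by another driver sees a non-empty list. That mirrors the leak and the WARN_ON() reported above.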