Re: [PATCH 1/2] mm/madvise: allow process_madvise operations on entire memory range
On Tue, Dec 22, 2020 at 09:48:43AM -0800, Suren Baghdasaryan wrote:
> Thanks for the feedback! The use case is userspace memory reaping
> similar to oom-reaper. Detailed justification is here:
> https://lore.kernel.org/linux-mm/20201124053943.1684874-1-sur...@google.com

Given that this new variant of process_madvise

 a) does not work on an address range
 b) is destructive
 c) doesn't share much code at all with the rest of process_madvise

Why not add a proper separate syscall?
Re: [PATCH v5 07/12] media: uvcvideo: Implement UVC_EXT_GPIO_UNIT
Hi Ricardo, On Tue, Dec 22, 2020 at 07:36:52PM +0100, Ricardo Ribalda wrote: > On Tue, Dec 22, 2020 at 9:34 AM Laurent Pinchart wrote: > > On Mon, Dec 21, 2020 at 05:48:14PM +0100, Ricardo Ribalda wrote: > > > Some devices can implement a physical switch to disable the input of the > > > camera on demand. Think of it like an elegant privacy sticker. > > > > > > The system can read the status of the privacy switch via a GPIO. > > > > > > It is important to know the status of the switch, e.g. to notify the > > > user when the camera will produce black frames and a videochat > > > application is used. > > > > > > In some systems, the GPIO is connected to main SoC instead of the > > > camera controller, with the connected reported by the system firmware > > > > s/connected/connection/ > > > > > (ACPI or DT). In that case, the UVC device isn't aware of the GPIO. We > > > need to implement a virtual entity to handle the GPIO fully on the > > > driver side. > > > > > > For example, for ACPI-based systems, the GPIO is reported in the USB > > > device object: > > > > > > Scope (\_SB.PCI0.XHCI.RHUB.HS07) > > > { > > > > > > /.../ > > > > > > Name (_CRS, ResourceTemplate () // _CRS: Current Resource Settings > > > { > > > GpioIo (Exclusive, PullDefault, 0x, 0x, > > > IoRestrictionOutputOnly, > > > "\\_SB.PCI0.GPIO", 0x00, ResourceConsumer, , > > > ) > > > { // Pin list > > > 0x0064 > > > } > > > }) > > > Name (_DSD, Package (0x02) // _DSD: Device-Specific Data > > > { > > > ToUUID ("daffd814-6eba-4d8c-8a91-bc9bbf4aa301") /* Device > > > Properties for _DSD */, > > > Package (0x01) > > > { > > > Package (0x02) > > > { > > > "privacy-gpio", > > > Package (0x04) > > > { > > > \_SB.PCI0.XHCI.RHUB.HS07, > > > Zero, > > > Zero, > > > One > > > } > > > } > > > } > > > }) > > > } > > > > > > Signed-off-by: Ricardo Ribalda > > > --- > > > drivers/media/usb/uvc/uvc_ctrl.c | 7 ++ > > > drivers/media/usb/uvc/uvc_driver.c | 156 + > > > drivers/media/usb/uvc/uvc_entity.c | 1 + > > > 
drivers/media/usb/uvc/uvcvideo.h | 16 +++ > > > 4 files changed, 180 insertions(+) > > > > > > diff --git a/drivers/media/usb/uvc/uvc_ctrl.c > > > b/drivers/media/usb/uvc/uvc_ctrl.c > > > index 528254230535..a430fa666897 100644 > > > --- a/drivers/media/usb/uvc/uvc_ctrl.c > > > +++ b/drivers/media/usb/uvc/uvc_ctrl.c > > > @@ -1300,6 +1300,10 @@ static void __uvc_ctrl_status_event_work(struct > > > uvc_device *dev, > > > > > > mutex_unlock(>ctrl_mutex); > > > > > > + /* Events not started by the UVC device. E.g. the GPIO unit */ > > > + if (!w->urb) > > > + return; > > > + > > > /* Resubmit the URB. */ > > > w->urb->interval = dev->int_ep->desc.bInterval; > > > ret = usb_submit_urb(w->urb, GFP_KERNEL); > > > @@ -2317,6 +2321,9 @@ int uvc_ctrl_init_device(struct uvc_device *dev) > > > } else if (UVC_ENTITY_TYPE(entity) == UVC_ITT_CAMERA) { > > > bmControls = entity->camera.bmControls; > > > bControlSize = entity->camera.bControlSize; > > > + } else if (UVC_ENTITY_TYPE(entity) == UVC_EXT_GPIO_UNIT) { > > > + bmControls = entity->gpio.bmControls; > > > + bControlSize = entity->gpio.bControlSize; > > > } > > > > > > /* Remove bogus/blacklisted controls */ > > > diff --git a/drivers/media/usb/uvc/uvc_driver.c > > > b/drivers/media/usb/uvc/uvc_driver.c > > > index c0c5f75ade40..72516101fdd0 100644 > > > --- a/drivers/media/usb/uvc/uvc_driver.c > > > +++ b/drivers/media/usb/uvc/uvc_driver.c > > > @@ -7,6 +7,7 @@ > > > */ > > > > > > #include > > > +#include > > > #include > > > #include > > > #include > > > @@ -1020,6 +1021,7 @@ static int uvc_parse_streaming(struct uvc_device > > > *dev, > > > } > > > > > > static const u8 uvc_camera_guid[16] = UVC_GUID_UVC_CAMERA; > > > +static const u8 uvc_gpio_guid[16] = UVC_GUID_EXT_GPIO_CONTROLLER; > > > static const u8 uvc_media_transport_input_guid[16] = > > > UVC_GUID_UVC_MEDIA_TRANSPORT_INPUT; > > > static const u8 uvc_processing_guid[16] = UVC_GUID_UVC_PROCESSING; > > > @@ -1051,6 +1053,9 @@ static struct uvc_entity 
*uvc_alloc_entity(u16 > > > type, u16 id, > > >* is initialized by the caller. > > >*/ > > > switch (type) { > > > + case UVC_EXT_GPIO_UNIT: > > > + memcpy(entity->guid, uvc_gpio_guid, 16); > > > + break; > > > case UVC_ITT_CAMERA: > > > memcpy(entity->guid, uvc_camera_guid, 16); > > > break; > > > @@ -1464,6 +1469,137 @@ static int
Re: Does uaccess_kernel() work for detecting kernel thread?
On Tue, Dec 22, 2020 at 11:39:08PM +0900, Tetsuo Handa wrote:
> For example, if uaccess_kernel() is "false" due to CONFIG_SET_FS=n,
> isn't sg_check_file_access() failing to detect kernel context?

sg_check_file_access does exactly the right thing - fail for all kernel
threads as those can't support the magic it does.

> For another example, if uaccess_kernel() is "false" due to CONFIG_SET_FS=n,
> isn't TOMOYO unexpectedly checking permissions for socket operations?

Can someone explain WTF TOMOYO is even doing there? A security module
has absolutely no business checking what context it is called from, but
must check the process credentials instead.
Re: [PATCH] vdpa_sim: use iova module to allocate IOVA addresses
On Wed, Dec 23, 2020 at 11:43:40AM +0800, Jason Wang wrote: On 2020/12/23 上午1:45, Stefano Garzarella wrote: The identical mapping used until now created issues when mapping different virtual pages with the same physical address. To solve this issue, we can use the iova module, to handle the IOVA allocation. For semplicity we use an IOVA allocator with byte granularity. Should be simplicity, so did one comment below. Right, I'll fix here and in the comment below. We add two new functions, vdpasim_map_range() and vdpasim_unmap_range(), to handle the IOVA allocation and the registration into the IOMMU/IOTLB. These functions are used by dma_map_ops callbacks. Signed-off-by: Stefano Garzarella Few nits, but: Acked-by: Jason Wang Thanks! --- drivers/vdpa/vdpa_sim/vdpa_sim.h | 2 + drivers/vdpa/vdpa_sim/vdpa_sim.c | 108 +++ drivers/vdpa/Kconfig | 1 + 3 files changed, 69 insertions(+), 42 deletions(-) diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.h b/drivers/vdpa/vdpa_sim/vdpa_sim.h index b02142293d5b..6efe205e583e 100644 --- a/drivers/vdpa/vdpa_sim/vdpa_sim.h +++ b/drivers/vdpa/vdpa_sim/vdpa_sim.h @@ -6,6 +6,7 @@ #ifndef _VDPA_SIM_H #define _VDPA_SIM_H +#include #include #include #include @@ -55,6 +56,7 @@ struct vdpasim { /* virtio config according to device type */ void *config; struct vhost_iotlb *iommu; + struct iova_domain iova; void *buffer; u32 status; u32 generation; diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c index b3fcc67bfdf0..341b9daf2ea4 100644 --- a/drivers/vdpa/vdpa_sim/vdpa_sim.c +++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c @@ -17,6 +17,7 @@ #include #include #include +#include #include "vdpa_sim.h" @@ -128,30 +129,57 @@ static int dir_to_perm(enum dma_data_direction dir) return perm; } +static dma_addr_t vdpasim_map_range(struct vdpasim *vdpasim, phys_addr_t paddr, + size_t size, unsigned int perm) +{ + struct iova *iova; + dma_addr_t dma_addr; + int ret; + + /* We set the limit_pfn to the maximum (~0UL - 1) */ + iova = 
alloc_iova(>iova, size, ~0UL - 1, true); Let's use ULONG_MAX? Definitely, much better! + if (!iova) + return DMA_MAPPING_ERROR; + + dma_addr = iova_dma_addr(>iova, iova); + + spin_lock(>iommu_lock); + ret = vhost_iotlb_add_range(vdpasim->iommu, (u64)dma_addr, + (u64)dma_addr + size - 1, (u64)paddr, perm); + spin_unlock(>iommu_lock); + + if (ret) { + __free_iova(>iova, iova); + return DMA_MAPPING_ERROR; + } + + return dma_addr; +} + +static void vdpasim_unmap_range(struct vdpasim *vdpasim, dma_addr_t dma_addr, + size_t size) +{ + spin_lock(>iommu_lock); + vhost_iotlb_del_range(vdpasim->iommu, (u64)dma_addr, + (u64)dma_addr + size - 1); + spin_unlock(>iommu_lock); + + free_iova(>iova, iova_pfn(>iova, dma_addr)); +} + static dma_addr_t vdpasim_map_page(struct device *dev, struct page *page, unsigned long offset, size_t size, enum dma_data_direction dir, unsigned long attrs) { struct vdpasim *vdpasim = dev_to_sim(dev); - struct vhost_iotlb *iommu = vdpasim->iommu; - u64 pa = (page_to_pfn(page) << PAGE_SHIFT) + offset; - int ret, perm = dir_to_perm(dir); + phys_addr_t paddr = page_to_phys(page) + offset; + int perm = dir_to_perm(dir); if (perm < 0) return DMA_MAPPING_ERROR; - /* For simplicity, use identical mapping to avoid e.g iova -* allocator. 
-*/ - spin_lock(>iommu_lock); - ret = vhost_iotlb_add_range(iommu, pa, pa + size - 1, - pa, dir_to_perm(dir)); - spin_unlock(>iommu_lock); - if (ret) - return DMA_MAPPING_ERROR; - - return (dma_addr_t)(pa); + return vdpasim_map_range(vdpasim, paddr, size, perm); } static void vdpasim_unmap_page(struct device *dev, dma_addr_t dma_addr, @@ -159,12 +187,8 @@ static void vdpasim_unmap_page(struct device *dev, dma_addr_t dma_addr, unsigned long attrs) { struct vdpasim *vdpasim = dev_to_sim(dev); - struct vhost_iotlb *iommu = vdpasim->iommu; - spin_lock(>iommu_lock); - vhost_iotlb_del_range(iommu, (u64)dma_addr, - (u64)dma_addr + size - 1); - spin_unlock(>iommu_lock); + vdpasim_unmap_range(vdpasim, dma_addr, size); } static void *vdpasim_alloc_coherent(struct device *dev, size_t size, @@ -172,27 +196,22 @@ static void *vdpasim_alloc_coherent(struct device *dev, size_t size, unsigned
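The problem the commit message describes (an identity mapping breaks down when two mappings share the same physical address) can be illustrated with a small userspace sketch. This is not the kernel iova module or the vhost IOTLB API; every name below (toy_iommu, toy_map_range, toy_unmap_range, toy_translate) is hypothetical, and a trivial bump allocator stands in for the real IOVA allocator. The only point is that each mapping receives its own IOVA, so unmapping one does not disturb another mapping of the same physical address.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_MAPS  16
#define TOY_MAP_ERROR UINT64_MAX

/* One IOVA -> physical translation; hypothetical, not struct vhost_iotlb. */
struct toy_map {
	uint64_t iova;
	uint64_t paddr;
	size_t size;
	bool used;
};

struct toy_iommu {
	struct toy_map maps[TOY_MAX_MAPS];
	uint64_t next_iova;	/* bump allocator: addresses are never reused */
};

/* Allocate a fresh IOVA range for paddr and record the translation. */
static uint64_t toy_map_range(struct toy_iommu *mmu, uint64_t paddr, size_t size)
{
	assert(mmu != NULL && size > 0);

	for (size_t i = 0; i < TOY_MAX_MAPS; i++) {
		struct toy_map *m = &mmu->maps[i];

		if (m->used)
			continue;
		m->iova = mmu->next_iova;
		m->paddr = paddr;
		m->size = size;
		m->used = true;
		mmu->next_iova += size;
		return m->iova;
	}
	return TOY_MAP_ERROR;	/* table full */
}

/* Drop the translation that starts at iova, if there is one. */
static void toy_unmap_range(struct toy_iommu *mmu, uint64_t iova)
{
	for (size_t i = 0; i < TOY_MAX_MAPS; i++) {
		struct toy_map *m = &mmu->maps[i];

		if (m->used && m->iova == iova)
			m->used = false;
	}
}

/* Translate an IOVA to a physical address, or TOY_MAP_ERROR if unmapped. */
static uint64_t toy_translate(const struct toy_iommu *mmu, uint64_t iova)
{
	for (size_t i = 0; i < TOY_MAX_MAPS; i++) {
		const struct toy_map *m = &mmu->maps[i];

		if (m->used && iova >= m->iova && iova - m->iova < m->size)
			return m->paddr + (iova - m->iova);
	}
	return TOY_MAP_ERROR;
}
```

With an identity mapping the two mappings of the same physical address would occupy the same IOTLB range, and unmapping the first would also tear down the second. In the sketch they get distinct IOVAs, so the second mapping stays usable after the first is unmapped.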
Re: [PATCH] crypto: keembay-ocs-aes - Add dependency on HAS_IOMEM
On Thu, Dec 17, 2020 at 04:35:10PM +, Daniele Alessandrelli wrote:
> From: Daniele Alessandrelli
>
> Add dependency for CRYPTO_DEV_KEEMBAY_OCS_AES_SM4 on HAS_IOMEM to
> prevent build failures.
>
> Fixes: 88574332451380f4 ("crypto: keembay - Add support for Keem Bay OCS AES/SM4")
> Reported-by: kernel test robot
> Signed-off-by: Daniele Alessandrelli
> ---
>  drivers/crypto/keembay/Kconfig | 1 +
>  1 file changed, 1 insertion(+)

Patch applied.  Thanks.
-- 
Email: Herbert Xu
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: [PATCH bpf-next 1/2] bpf: Add a bpf_kallsyms_lookup helper
On Tue, Dec 22, 2020 at 09:17:41PM +0100, Florent Revest wrote:
> On Tue, Dec 22, 2020 at 3:18 PM Christoph Hellwig wrote:
> >
> > FYI, there is a reason why kallsyms_lookup is not exported any more.
> > I don't think adding that back through a backdoor is a good idea.
>
> Did you maybe mean kallsyms_lookup_name (the one that looks an address
> up based on a symbol name) ? It used to be exported but isn't anymore
> indeed.
> However, this is not what we're trying to do. As far as I can tell,
> kallsyms_lookup (the one that looks a symbol name up based on an
> address) has never been exported but its close cousins sprint_symbol
> and sprint_symbol_no_offset (which only call kallsyms_lookup and
> pretty print the result) are still exported, they are also used by
> vsprintf. Is this an issue ?

Indeed, I thought of kallsyms_lookup_name. Let me take another look at
the patch, but kallsyms_lookup still seems like a very lowlevel function
to export to arbitrary eBPF programs.
Re: [PATCH] crypto: CRYPTO_DEV_KEEMBAY_OCS_AES_SM4 should depend on ARCH_KEEMBAY
On Wed, Dec 16, 2020 at 02:14:59PM +0100, Geert Uytterhoeven wrote:
> The Intel Keem Bay Offload and Crypto Subsystem (OCS) is only present on
> Intel Keem Bay SoCs. Hence add a dependency on ARCH_KEEMBAY, to prevent
> asking the user about this driver when configuring a kernel without
> Intel Keem Bay platform support.
>
> While at it, fix a misspelling of "cipher".
>
> Fixes: 88574332451380f4 ("crypto: keembay - Add support for Keem Bay OCS AES/SM4")
> Signed-off-by: Geert Uytterhoeven
> ---
>  drivers/crypto/keembay/Kconfig | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)

Patch applied.  Thanks.
-- 
Email: Herbert Xu
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
[PATCH] arm64: dts: mt8192: add thermal zones, cooling map and trips
Add thermal zone node to support mt8192 read temperature. Thermal throttle will start at 68C and the target temperature is 85C. This patch depends on [1]. [1]https://patchwork.kernel.org/project/linux-mediatek/patch/20201221061018.18503-3-yz...@mediatek.com/ Signed-off-by: Michael Kao --- arch/arm64/boot/dts/mediatek/mt8192.dtsi | 169 +++ 1 file changed, 169 insertions(+) diff --git a/arch/arm64/boot/dts/mediatek/mt8192.dtsi b/arch/arm64/boot/dts/mediatek/mt8192.dtsi index 4a0d941aec30..4020e40a092a 100644 --- a/arch/arm64/boot/dts/mediatek/mt8192.dtsi +++ b/arch/arm64/boot/dts/mediatek/mt8192.dtsi @@ -9,6 +9,7 @@ #include #include #include +#include / { compatible = "mediatek,mt8192"; @@ -42,6 +43,7 @@ clock-frequency = <170100>; next-level-cache = <_0>; capacity-dmips-mhz = <530>; + #cooling-cells = <2>; }; cpu1: cpu@100 { @@ -52,6 +54,7 @@ clock-frequency = <170100>; next-level-cache = <_0>; capacity-dmips-mhz = <530>; + #cooling-cells = <2>; }; cpu2: cpu@200 { @@ -62,6 +65,7 @@ clock-frequency = <170100>; next-level-cache = <_0>; capacity-dmips-mhz = <530>; + #cooling-cells = <2>; }; cpu3: cpu@300 { @@ -72,6 +76,7 @@ clock-frequency = <170100>; next-level-cache = <_0>; capacity-dmips-mhz = <530>; + #cooling-cells = <2>; }; cpu4: cpu@400 { @@ -82,6 +87,7 @@ clock-frequency = <217100>; next-level-cache = <_1>; capacity-dmips-mhz = <1024>; + #cooling-cells = <2>; }; cpu5: cpu@500 { @@ -92,6 +98,7 @@ clock-frequency = <217100>; next-level-cache = <_1>; capacity-dmips-mhz = <1024>; + #cooling-cells = <2>; }; cpu6: cpu@600 { @@ -102,6 +109,7 @@ clock-frequency = <217100>; next-level-cache = <_1>; capacity-dmips-mhz = <1024>; + #cooling-cells = <2>; }; cpu7: cpu@700 { @@ -112,6 +120,7 @@ clock-frequency = <217100>; next-level-cache = <_1>; capacity-dmips-mhz = <1024>; + #cooling-cells = <2>; }; cpu-map { @@ -178,6 +187,140 @@ method = "smc"; }; + thermal-zones { + soc_max { + polling-delay = <1000>; /* milliseconds */ + polling-delay-passive = <1000>; /* milliseconds 
*/ + thermal-sensors = < 0>; + sustainable-power = <1500>; + + trips { + threshold: trip-point@0 { + temperature = <68000>; + hysteresis = <2000>; + type = "passive"; + }; + + target: target@1 { + temperature = <85000>; + hysteresis = <2000>; + type = "passive"; + }; + + soc_max_crit: soc_max_crit@0 { + temperature = <115000>; + hysteresis = <2000>; + type = "critical"; + }; + }; + + cooling-maps { + map0 { + trip = <>; + cooling-device = < + THERMAL_NO_LIMIT + THERMAL_NO_LIMIT>, +< + THERMAL_NO_LIMIT + THERMAL_NO_LIMIT>, +< + THERMAL_NO_LIMIT + THERMAL_NO_LIMIT>, +< +
Re: [PATCH v1 0/5] dm: dm-user: New target that proxies BIOs to userspace
FYI, a few years ago I spent some time helping a customer to prepare their block device in userspace using fuse code for upstreaming, but at some point they abandoned the project. But if for some reason we don't want to use nbd I think a driver using the fuse infrastructure would be the next logical choice.
[PATCH] thermal: cpufreq_cooling: fix slab OOB issue
From: brian-sy yang

KASAN reports a slab out-of-bounds access in cpu_power_to_freq().
If the power limit is below the power of OPP0 in the EM table, the loop
ends with a negative array index, which causes the out-of-bounds access.
Fix this by returning the lowest frequency when no suitable OPP is found
in the EM table for the limited power.

Backtrace:
[] die+0x104/0x5ac
[] bug_handler+0x64/0xd0
[] brk_handler+0x160/0x258
[] do_debug_exception+0x248/0x3f0
[] el1_dbg+0x14/0xbc
[] __kasan_report+0x1dc/0x1e0
[] kasan_report+0x10/0x20
[] __asan_report_load8_noabort+0x18/0x28
[] cpufreq_power2state+0x180/0x43c
[] power_actor_set_power+0x114/0x1d4
[] allocate_power+0xaec/0xde0
[] power_allocator_throttle+0x3ec/0x5a4
[] handle_thermal_trip+0x160/0x294
[] thermal_zone_device_check+0xe4/0x154
[] process_one_work+0x5e4/0xe28
[] worker_thread+0xa4c/0xfac
[] kthread+0x33c/0x358
[] ret_from_fork+0xc/0x18

Signed-off-by: brian-sy yang
---
 drivers/thermal/cpufreq_cooling.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/thermal/cpufreq_cooling.c b/drivers/thermal/cpufreq_cooling.c
index cc2959f22f01..fb33b3480a8f 100644
--- a/drivers/thermal/cpufreq_cooling.c
+++ b/drivers/thermal/cpufreq_cooling.c
@@ -123,7 +123,7 @@ static u32 cpu_power_to_freq(struct cpufreq_cooling_device *cpufreq_cdev,
 {
 	int i;
 
-	for (i = cpufreq_cdev->max_level; i >= 0; i--) {
+	for (i = cpufreq_cdev->max_level; i > 0; i--) {
 		if (power >= cpufreq_cdev->em->table[i].power)
 			break;
 	}
-- 
2.18.0
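The effect of changing the loop bound can be checked outside the kernel with a minimal sketch. The struct and function names below (toy_opp, toy_power_to_freq) are hypothetical stand-ins, not the kernel's energy-model types, and the sketch assumes the function returns the frequency of the entry the loop stops on and that the table is sorted by ascending power. With "i >= 0" a budget below table[0].power would leave i at -1; with "i > 0" the loop bottoms out at index 0, so the lowest frequency is returned.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one energy-model entry; not the kernel struct. */
struct toy_opp {
	unsigned int frequency;	/* kHz */
	unsigned int power;	/* mW */
};

/*
 * Return the highest frequency whose power cost fits within 'power'.
 * The table is sorted by ascending power and max_level is its last valid
 * index. Because the loop condition is i > 0, i can never drop below 0,
 * so a budget smaller than table[0].power selects the lowest frequency.
 */
static unsigned int toy_power_to_freq(const struct toy_opp *table,
				      int max_level, unsigned int power)
{
	int i;

	assert(table != NULL && max_level >= 0);

	for (i = max_level; i > 0; i--) {
		if (power >= table[i].power)
			break;
	}

	return table[i].frequency;
}
```

For a three-entry table with powers 100, 300 and 700, a budget of 50 selects entry 0 rather than reading table[-1].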
RE: [PATCH v1] scsi: ufs-mediatek: Enable UFSHCI_QUIRK_SKIP_MANUAL_WB_FLUSH_CTRL
> On 2020-12-23 12:19, Stanley Chu wrote:
> > Hi Can,
> >
> > On Tue, 2020-12-22 at 19:34 +0800, Can Guo wrote:
> >> On 2020-12-22 15:29, Stanley Chu wrote:
> >> > Flush during hibern8 is sufficient on MediaTek platforms, thus
> >> > enable UFSHCI_QUIRK_SKIP_MANUAL_WB_FLUSH_CTRL to skip enabling
> >> > fWriteBoosterBufferFlush during WriteBooster initialization.
> >> >
> >> > Signed-off-by: Stanley Chu
> >> > ---
> >> >  drivers/scsi/ufs/ufs-mediatek.c | 1 +
> >> >  1 file changed, 1 insertion(+)
> >> >
> >> > diff --git a/drivers/scsi/ufs/ufs-mediatek.c
> >> > b/drivers/scsi/ufs/ufs-mediatek.c
> >> > index 80618af7c872..c55202b92a43 100644
> >> > --- a/drivers/scsi/ufs/ufs-mediatek.c
> >> > +++ b/drivers/scsi/ufs/ufs-mediatek.c
> >> > @@ -661,6 +661,7 @@ static int ufs_mtk_init(struct ufs_hba *hba)
> >> >
> >> >  	/* Enable WriteBooster */
> >> >  	hba->caps |= UFSHCD_CAP_WB_EN;
> >> > +	hba->quirks |= UFSHCI_QUIRK_SKIP_MANUAL_WB_FLUSH_CTRL;
> >> >  	hba->vps->wb_flush_threshold = UFS_WB_BUF_REMAIN_PERCENT(80);
> >> >
> >> >  	if (host->caps & UFS_MTK_CAP_DISABLE_AH8)
> >>
> >> I guess we need it too...
> >
> > AHHA, if you decide to add this in your platform too later, maybe we
> > could change the way it does: Keep manual flush disabled by default and
> > remove this quirk.

Ack on that. I never understood why it was needed in the first place.
Maybe just remove it, and allow to perform explicit flush from sysfs.

Thanks,
Avri

> >
>
> Yeah... I will get back with an answer later.
The amount of 500,000.00 euros has been donated to you. Contact: manuelfranco4l...@gmail.com
Manuel Franco has donated 500,000.00 euros to you. He won the Powerball jackpot of 758.7 million US dollars on April 23, 2019. For more information, send an e-mail to: manuelfranco4l...@gmail.com
Re: [PATCH] erofs: support direct IO for uncompressed file
On Wed, Dec 23, 2020 at 03:39:01AM +0800, Gao Xiang wrote:
> Hi Christoph,
>
> On Tue, Dec 22, 2020 at 02:22:34PM +, Christoph Hellwig wrote:
> > Please do not add new callers of __blockdev_direct_IO and use the modern
> > iomap variant instead.
>
> We've talked about this topic before. The current status is that iomap
> doesn't support tail-packing inline data yet (Chao once sent out a version),
> and erofs only cares about read infrastructure for now (So we don't think
> more about how to deal with tail-packing inline write path). Plus, the
> original patch was once lack of inline data regression test from gfs2 folks.

So resend Chao's prep patch as part of the series switching parts of
erofs to iomap. We need to move things off the old infrastructure instead
of adding more users and everyone needs to help a little.
Re: [LKP] [locking/rwsem] 617f3ef951: unixbench.score -21.2% regression
Hi Waiman,

Do you have time to look at this? Thanks.

As you describe in commit 617f3ef95177840c77f59c2aec1029d27d5547d6
("locking/rwsem: Remove reader optimistic spinning"), the patch that
disables reader optimistic spinning shows reduced performance in lightly
loaded cases. So for this regression, is it as expected?

On 12/17/2020 9:33 AM, kernel test robot wrote:

Greeting,

FYI, we noticed a -21.2% regression of unixbench.score due to commit:

commit: 617f3ef95177840c77f59c2aec1029d27d5547d6 ("locking/rwsem: Remove reader optimistic spinning")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

in testcase: unixbench
on test machine: 16 threads Intel(R) Xeon(R) E-2278G CPU @ 3.40GHz with 32G memory
with following parameters:

	runtime: 300s
	nr_task: 30%
	test: shell8
	cpufreq_governor: performance
	ucode: 0xde

test-description: UnixBench is the original BYTE UNIX benchmark suite aims to test performance of Unix-like system.
test-url: https://github.com/kdlucas/byte-unixbench

If you fix the issue, kindly add following tag
Reported-by: kernel test robot

Details are as below:
-->

To reproduce:

	git clone https://github.com/intel/lkp-tests.git
	cd lkp-tests
	bin/lkp install job.yaml  # job file is attached in this email
	bin/lkp run job.yaml

=========================================================================================
compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase/ucode:
  gcc-9/performance/x86_64-rhel-8.3/30%/debian-10.4-x86_64-20200603.cgz/300s/lkp-cfl-e1/shell8/unixbench/0xde

commit:
  1a728dff85 ("locking/rwsem: Enable reader optimistic lock stealing")
  617f3ef951 ("locking/rwsem: Remove reader optimistic spinning")

1a728dff855a318b 617f3ef95177840c77f59c2aec1
---------------- ---------------------------
       fail:runs  %reproduction    fail:runs
           |             |             |
         39:4         -992%             :4     perf-profile.calltrace.cycles-pp.error_entry
         25:4         -635%             :4     perf-profile.children.cycles-pp.error_entry

         %stddev     %change         %stddev
             \          |                \
     21807 ±  3%     -21.2%      17186          unixbench.score
   1287072 ±  3%     -38.7%     788414          unixbench.time.involuntary_context_switches
     37161 ±  4%     +31.3%      48798          unixbench.time.major_page_faults
 1.047e+08 ±  3%     -21.1%   82610985          unixbench.time.minor_page_faults
      1341           -27.1%     978.00          unixbench.time.percent_of_cpu_this_job_got
    370.87           -33.3%     247.55          unixbench.time.system_time
    490.05           -23.3%     376.03          unixbench.time.user_time
   3083520 ±  3%     +59.7%    4924900          unixbench.time.voluntary_context_switches
    824314 ±  3%     -21.2%     649654          unixbench.workload
      0.03 ± 27%     -51.9%       0.02 ± 59%    perf-sched.sch_delay.avg.ms.schedule_timeout.rcu_gp_kthread.kthread.ret_from_fork
    385.15 ±  2%     +62.5%     625.72          uptime.idle
     17.03            -1.8%      16.73          boot-time.boot
     11.01            -1.6%      10.83          boot-time.dhcp
    214.12 ±  3%      -3.1%     207.49          boot-time.idle
     13.72 ±  4%     +23.5       37.24          mpstat.cpu.all.idle%
      1.06            -0.1        0.94          mpstat.cpu.all.irq%
     49.32 ±  2%     -11.8       37.53          mpstat.cpu.all.sys%
     35.24 ±  2%     -11.6       23.68          mpstat.cpu.all.usr%
     15.50 ±  3%    +145.2%      38.00          vmstat.cpu.id
     49.00 ±  2%     -22.4%      38.00          vmstat.cpu.sy
     33.75 ±  2%     -33.3%      22.50 ±  2%    vmstat.cpu.us
     21.75 ±  3%     -33.3%      14.50 ±  3%    vmstat.procs.r
     97370 ±  3%     +56.4%     152258          vmstat.system.cs
     37589            -2.1%      36804          vmstat.system.in
     11861 ±  9%     -18.0%       9730          slabinfo.filp.active_objs
     13242 ±  8%     -15.5%      11184          slabinfo.filp.num_objs
     14731 ±  7%      -9.5%      13325 ±  5%    slabinfo.kmalloc-8.active_objs
     14731 ±  7%      -9.5%      13325 ±  5%    slabinfo.kmalloc-8.num_objs
      5545 ±  2%     -13.8%       4780 ±  4%    slabinfo.pid.active_objs
      5563 ±  2%     -13.8%       4793 ±  4%    slabinfo.pid.num_objs
      5822 ± 14%     -40.4%       3468 ±  5%    slabinfo.task_delay_info.active_objs
      5825 ± 14%     -40.5%       3468 ±  5%    slabinfo.task_delay_info.num_objs
  32104492 ±  3%    +303.3%  1.295e+08 ± 11%    cpuidle.C1.time
    882330 ±  5%    +131.5%    2042656 ± 10%    cpuidle.C1.usage
  21965263 ±  3%    +340.5%   96762398 ± 14%    cpuidle.C1E.time
    442911 ±  2%    +211.3%    1378866 ± 14%    cpuidle.C1E.usage
   6511399 ±  4%    +606.6%   46010023 ±
Re: [PATCH v6 1/2] lib/string.c: add __sysfs_match_string_with_gaps() helper
On Tue, Dec 22, 2020 at 3:43 PM Andy Shevchenko wrote:
>
> On Tue, Dec 22, 2020 at 3:09 PM Alexandru Ardelean wrote:
> >
> > The original docstring of the __sysfs_match_string() and match_string()
> > helper, implied that -1 could be used to search through NULL terminated
> > arrays, and positive 'n' could be used to go through arrays that may have
> > NULL elements in the middle of the array.
> >
> > This isn't true. Regardless of the value of 'n', the first NULL element in
> > the array will stop the search, even if the element may be after a NULL
> > element.
> >
> > To allow for a behavior where we can use the __sysfs_match_string() to
> > search over arrays with NULL elements in the middle, the
> > __sysfs_match_string_with_gaps() helper is added.
> > If n > 0, the search will continue until the element is found or n is
> > reached.
> > If n < 0, the search will continue until the element is found or a NULL
> > character is found.
>
> I'm wondering if we can leave __sysfs_match_string() alone (w/o adding
> unnecessary branch).

Works for me.
Will re-spin.

>
> int __sysfs_match_string_with_gaps(const char * const *array, size_t n, const char *str)
> {
>        const char *item;
>        int index;
>
>        for (index = 0; index < n; index++) {
>                item = array[index];
>                if (!item)
>                        continue;
>                if (sysfs_streq(item, str))
>                        return index;
>        }
>        return -EINVAL;
> }
>
> Note, the check n>0 seems redundant for this particular function.
>
> > +static int __sysfs_match_string_common(const char * const *array, ssize_t n,
> > +                                       const char *str, bool gaps)
> > +{
> > +       const char *item;
> > +       int index;
> > +
> > +       for (index = 0; index < n; index++) {
> > +               item = array[index];
> > +               if (!item) {
> > +                       if (gaps && n > 0)
> > +                               continue;
> > +                       break;
> > +               }
> > +               if (sysfs_streq(item, str))
> > +                       return index;
> > +       }
> > +
> > +       return -EINVAL;
> > +}
> > +
> >  /**
> >   * __sysfs_match_string - matches given string in an array
> >   * @array: array of strings
> > @@ -770,21 +790,32 @@ EXPORT_SYMBOL(match_string);
> >   */
> >  int __sysfs_match_string(const char * const *array, size_t n, const char *str)
> >  {
> > -       const char *item;
> > -       int index;
> > -
> > -       for (index = 0; index < n; index++) {
> > -               item = array[index];
> > -               if (!item)
> > -                       break;
> > -               if (sysfs_streq(item, str))
> > -                       return index;
> > -       }
> > -
> > -       return -EINVAL;
> > +       return __sysfs_match_string_common(array, n, str, false);
> >  }
>
> --
> With Best Regards,
> Andy Shevchenko
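The difference between the two lookups discussed in this thread (stop at the first NULL entry versus skip NULL entries) can be tried in userspace with a minimal sketch. The names toy_streq, toy_match_string and toy_match_string_with_gaps are hypothetical; toy_streq is only a simplified stand-in for sysfs_streq() that ignores one trailing newline, and the gap-skipping loop follows the "continue on NULL" shape suggested in the review rather than any merged kernel code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Simplified stand-in for the kernel's sysfs_streq(): two strings are equal
 * if they match once a single trailing newline is ignored on either side.
 */
static bool toy_streq(const char *a, const char *b)
{
	size_t la = strlen(a), lb = strlen(b);

	if (la > 0 && a[la - 1] == '\n')
		la--;
	if (lb > 0 && b[lb - 1] == '\n')
		lb--;
	return la == lb && memcmp(a, b, la) == 0;
}

/* Stops at the first NULL entry, so later entries are never examined. */
static int toy_match_string(const char * const *array, size_t n,
			    const char *str)
{
	size_t index;

	assert(array != NULL && str != NULL);

	for (index = 0; index < n; index++) {
		if (!array[index])
			break;
		if (toy_streq(array[index], str))
			return (int)index;
	}
	return -EINVAL;
}

/* Skips NULL entries and keeps scanning until all n slots were checked. */
static int toy_match_string_with_gaps(const char * const *array, size_t n,
				      const char *str)
{
	size_t index;

	assert(array != NULL && str != NULL);

	for (index = 0; index < n; index++) {
		if (!array[index])
			continue;
		if (toy_streq(array[index], str))
			return (int)index;
	}
	return -EINVAL;
}
```

For an array such as { "off", NULL, "auto", "on" } with n = 4, the first helper cannot find "auto" because it stops at the gap, while the second returns index 2.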
Re: [PATCH AUTOSEL 5.4 008/130] staging: wimax: depends on NET
On Tue, Dec 22, 2020 at 09:16:11PM -0500, Sasha Levin wrote:
> From: Randy Dunlap
>
> [ Upstream commit 9364a2cf567187c0a075942c22d1f434c758de5d ]
>
> Fix build errors when CONFIG_NET is not enabled. E.g. (trimmed):
>
> ld: drivers/staging/wimax/op-msg.o: in function `wimax_msg_alloc':
> op-msg.c:(.text+0xa9): undefined reference to `__alloc_skb'
> ld: op-msg.c:(.text+0xcc): undefined reference to `genlmsg_put'
> ld: op-msg.c:(.text+0xfc): undefined reference to `nla_put'
> ld: op-msg.c:(.text+0x168): undefined reference to `kfree_skb'
> ld: drivers/staging/wimax/op-msg.o: in function `wimax_msg_data_len':
> op-msg.c:(.text+0x1ba): undefined reference to `nla_find'
> ld: drivers/staging/wimax/op-msg.o: in function `wimax_msg_send':
> op-msg.c:(.text+0x311): undefined reference to `init_net'
> ld: op-msg.c:(.text+0x326): undefined reference to `netlink_broadcast'
> ld: drivers/staging/wimax/stack.o: in function `__wimax_state_change':
> stack.c:(.text+0x433): undefined reference to `netif_carrier_off'
> ld: stack.c:(.text+0x46b): undefined reference to `netif_carrier_on'
> ld: stack.c:(.text+0x478): undefined reference to `netif_tx_wake_queue'
> ld: drivers/staging/wimax/stack.o: in function `wimax_subsys_exit':
> stack.c:(.exit.text+0xe): undefined reference to `genl_unregister_family'
> ld: drivers/staging/wimax/stack.o: in function `wimax_subsys_init':
> stack.c:(.init.text+0x1a): undefined reference to `genl_register_family'
>
> Cc: Greg Kroah-Hartman
> Cc: Jakub Kicinski
> Cc: Arnd Bergmann
> Cc: net...@vger.kernel.org
> Acked-by: Arnd Bergmann
> Signed-off-by: Randy Dunlap
> Link: https://lore.kernel.org/r/20201102072456.20303-1-rdun...@infradead.org
> Signed-off-by: Greg Kroah-Hartman
> Signed-off-by: Sasha Levin
> ---
>  net/wimax/Kconfig | 1 +
>  1 file changed, 1 insertion(+)

This isn't needed in any backported kernel as it only is relevant when
the code moved to drivers/staging/

thanks,

greg k-h
Re: [PATCH] mm/uaccess: Use 'unsigned long' to placate UBSAN warnings, again
On 12/22/20 9:04 PM, Josh Poimboeuf wrote:
> GCC 7 has a known bug where UBSAN ignores '-fwrapv' and generates false
> signed-overflow-UB warnings. The type mismatch between 'i' and
> 'nr_segs' in copy_compat_iovec_from_user() is causing such a warning,
> which also happens to violate uaccess rules:
>
>   lib/iov_iter.o: warning: objtool: iovec_from_user()+0x22d: call to __ubsan_handle_add_overflow() with UACCESS enabled
>
> Fix it by making the variable types match.
>
> This is similar to a previous commit:
>
>   29da93fea3ea ("mm/uaccess: Use 'unsigned long' to placate UBSAN warnings on older GCC versions")
>
> Reported-by: Randy Dunlap
> Signed-off-by: Josh Poimboeuf

All good. Thanks.

Acked-by: Randy Dunlap # build-tested

> ---
>  lib/iov_iter.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/lib/iov_iter.c b/lib/iov_iter.c
> index 1635111c5bd2..2e6a42f5d1df 100644
> --- a/lib/iov_iter.c
> +++ b/lib/iov_iter.c
> @@ -1656,7 +1656,8 @@ static int copy_compat_iovec_from_user(struct iovec *iov,
>  {
>  	const struct compat_iovec __user *uiov =
>  		(const struct compat_iovec __user *)uvec;
> -	int ret = -EFAULT, i;
> +	int ret = -EFAULT;
> +	unsigned long i;
>
>  	if (!user_access_begin(uvec, nr_segs * sizeof(*uvec)))
>  		return -EFAULT;
>

-- 
~Randy
[PATCH v2 2/2] arm64: dts: mt6779: Support ufshci and ufsphy
Support UFS on MT6779 platforms by adding ufshci and ufsphy nodes
in dts file.

Reviewed-by: Hanks Chen
Signed-off-by: Stanley Chu
---
 arch/arm64/boot/dts/mediatek/mt6779.dtsi | 36 +++-
 1 file changed, 35 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/boot/dts/mediatek/mt6779.dtsi b/arch/arm64/boot/dts/mediatek/mt6779.dtsi
index 370f309d32de..6eaf230bb0d1 100644
--- a/arch/arm64/boot/dts/mediatek/mt6779.dtsi
+++ b/arch/arm64/boot/dts/mediatek/mt6779.dtsi
@@ -225,6 +225,41 @@
 			#clock-cells = <1>;
 		};
 
+		ufshci: ufshci@1127 {
+			compatible = "mediatek,mt8183-ufshci";
+			reg = <0 0x1127 0 0x2300>;
+			interrupts = ;
+			phys = <>;
+
+			clocks = <_ao CLK_INFRA_UFS>,
+				 <_ao CLK_INFRA_UFS_TICK>,
+				 <_ao CLK_INFRA_UFS_AXI>,
+				 <_ao CLK_INFRA_UNIPRO_TICK>,
+				 <_ao CLK_INFRA_UNIPRO_MBIST>,
+				 < CLK_TOP_FAES_UFSFDE>,
+				 <_ao CLK_INFRA_AES_UFSFDE>,
+				 <_ao CLK_INFRA_AES_BCLK>;
+			clock-names = "ufs", "ufs_tick", "ufs_axi",
+				      "unipro_tick", "unipro_mbist",
+				      "aes_top", "aes_infra", "aes_bclk";
+			freq-table-hz = <0 0>, <0 0>, <0 0>,
+					<0 0>, <0 0>, <0 0>,
+					<0 0>, <0 0>;
+
+			mediatek,ufs-disable-ah8;
+			mediatek,ufs-support-va09;
+		};
+
+		ufsphy: phy@11fa {
+			compatible = "mediatek,mt8183-ufsphy";
+			reg = <0 0x11fa 0 0xc000>;
+			#phy-cells = <0>;
+
+			clocks = <_ao CLK_INFRA_UNIPRO_SCK>,
+				 <_ao CLK_INFRA_UFS_MP_SAP_BCLK>;
+			clock-names = "unipro", "mp";
+		};
+
 		mfgcfg: clock-controller@13fbf000 {
 			compatible = "mediatek,mt6779-mfgcfg", "syscon";
 			reg = <0 0x13fbf000 0 0x1000>;
@@ -266,6 +301,5 @@
 			reg = <0 0x1b00 0 0x1000>;
 			#clock-cells = <1>;
 		};
-
 	};
 };
-- 
2.18.0
[PATCH v2 0/2] arm64: Support Universal Flash Storage on MediaTek MT6779 platform
Hi,

This series adds UFS (Universal Flash Storage) support on MediaTek MT6779 SoC platform.

Changes since v1:
  - Fix irq attribute in dts in patch [2/2]

Stanley Chu (2):
  arm64: configs: Support Universal Flash Storage on MediaTek platforms
  arm64: dts: mt6779: Support ufshci and ufsphy

 arch/arm64/boot/dts/mediatek/mt6779.dtsi | 36 +++-
 arch/arm64/configs/defconfig             |  1 +
 2 files changed, 36 insertions(+), 1 deletion(-)

-- 
2.18.0
[PATCH v2 1/2] arm64: configs: Support Universal Flash Storage on MediaTek platforms
Support UFS on MediaTek platforms by enabling CONFIG_SCSI_UFS_MEDIATEK.

Reviewed-by: Hanks Chen
Signed-off-by: Stanley Chu
---
 arch/arm64/configs/defconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/arm64/configs/defconfig b/arch/arm64/configs/defconfig
index 17a2df6a263e..e92f42a43bfa 100644
--- a/arch/arm64/configs/defconfig
+++ b/arch/arm64/configs/defconfig
@@ -277,6 +277,7 @@ CONFIG_SCSI_MPT3SAS=m
 CONFIG_SCSI_UFSHCD=y
 CONFIG_SCSI_UFSHCD_PLATFORM=y
 CONFIG_SCSI_UFS_QCOM=m
+CONFIG_SCSI_UFS_MEDIATEK=m
 CONFIG_SCSI_UFS_HISI=y
 CONFIG_ATA=y
 CONFIG_SATA_AHCI=y
-- 
2.18.0
Re: [PATCHSET] saner elf compat
On Wed, Dec 23, 2020 at 07:03:20AM +, Al Viro wrote: Argh Wrong commit blamed - the parent of the correct one. It's actually 2aa362c49c31 ("coredump: extend core dump note section to contain file names of mapped files"). My apologies - fat-fingered cut'n'paste... siginfo commit does suffer the same problem, but it becomes an issue only for 32bit processes under mips64 big-endian kernel (there it yields e.g. zero .__sigfault.si_addr in $_siginfo when using gdb with a coredump of 32bit process, whatever the actual faulting address had been). And b-e mips64 is rather uncommon, so that's less of an issue.
Re: [PATCH AUTOSEL 5.4 057/130] ALSA: usb-audio: Check valid altsetting at parsing rates for UAC2/3
On Wed, 23 Dec 2020 03:17:00 +0100, Sasha Levin wrote: > > From: Takashi Iwai > > [ Upstream commit 93db51d06b32227319dae2ac289029ccf1b33181 ] > > The current driver code assumes blindly that all found sample rates for > the same endpoint from the UAC2 and UAC3 descriptors can be used no > matter which altsetting, but actually this was wrong: some devices > accept only limited sample rates in each altsetting. For determining > which altsetting supports which rate, we need to verify each sample rate > and check the validity via UAC2_AS_VAL_ALT_SETTINGS. This control > reports back the available altsettings as a bitmap. > > This patch implements the missing piece above, the verification and > reconstructs the sample rate tables based on the result. > > An open question is how to deal with the altsettings that ended up > with no valid sample rates after verification. At least, there is a > device that showed this problem although the sample rates did work in > the later usage (see bug link). For now, we accept such an altset as > is, assuming that it's a firmware bug. > > Reported-by: Dylan Robinson > Tested-by: Keith Milner > Tested-by: Dylan Robinson > BugLink: https://bugzilla.suse.com/show_bug.cgi?id=1178203 > Link: https://lore.kernel.org/r/20201123085347.19667-4-ti...@suse.de > Signed-off-by: Takashi Iwai > Signed-off-by: Sasha Levin Please drop this for 5.4 or older. At least this caused some problem on 5.3 kernel that confused USB core by some reason while it works fine with the recent upstream. thanks, Takashi
RE: [v2 1/2] rtc: pcf2127: properly set flag WD_CD for rtc chips(pcf2129, pca2129)
Hi Alexandre, Any comments? Regards, Biwen Li > -Original Message- > From: Biwen Li > Sent: 2020年12月2日 11:19 > To: Leo Li ; alexandre.bell...@bootlin.com; Anson > Huang ; Aisheng Dong > Cc: linux-kernel@vger.kernel.org; Jiafei Pan ; > linux-...@vger.kernel.org; Biwen Li > Subject: [v2 1/2] rtc: pcf2127: properly set flag WD_CD for rtc chips(pcf2129, > pca2129) > > From: Biwen Li > > Properly set flag WD_CD for rtc chips(pcf2129, pca2129) > > Signed-off-by: Biwen Li > --- > Change in v2: > - set flag WD_CD according to compatible > > drivers/rtc/rtc-pcf2127.c | 7 ++- > 1 file changed, 6 insertions(+), 1 deletion(-) > > diff --git a/drivers/rtc/rtc-pcf2127.c b/drivers/rtc/rtc-pcf2127.c index > 03c9cb6b0b6e..a5418b657c50 100644 > --- a/drivers/rtc/rtc-pcf2127.c > +++ b/drivers/rtc/rtc-pcf2127.c > @@ -620,6 +620,10 @@ static int pcf2127_probe(struct device *dev, struct > regmap *regmap, >* Watchdog timer enabled and reset pin /RST activated when timed out. >* Select 1Hz clock source for watchdog timer. >* Note: Countdown timer disabled and not available. > + * For pca2129, pcf2129, only bit[7] is for Symbol WD_CD > + * of register watchdg_tim_ctl. The bit[6] is labeled > + * as T. Bits labeled as T must always be written with > + * logic 0. >*/ > ret = regmap_update_bits(pcf2127->regmap, PCF2127_REG_WD_CTL, >PCF2127_BIT_WD_CTL_CD1 | > @@ -627,7 +631,8 @@ static int pcf2127_probe(struct device *dev, struct > regmap *regmap, >PCF2127_BIT_WD_CTL_TF1 | >PCF2127_BIT_WD_CTL_TF0, >PCF2127_BIT_WD_CTL_CD1 | > - PCF2127_BIT_WD_CTL_CD0 | > + (device_property_match_string(dev, > "compatible", > "nxp,pcf2127") > + ? (PCF2127_BIT_WD_CTL_CD0) : (0)) | >PCF2127_BIT_WD_CTL_TF1); > if (ret) { > dev_err(dev, "%s: watchdog config (wd_ctl) failed\n", __func__); > -- > 2.17.1
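For readers following the WD_CD discussion, here is a minimal userspace sketch of the value the quoted hunk is trying to write. The bit positions for CD1 (bit 7) and CD0 (bit 6) come from the comment in the patch; the TF1/TF0 positions (bits 1 and 0) and the helper names wd_ctl_value() and update_bits() are assumptions made for illustration, not taken from the driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit 7 and bit 6 follow the patch comment; TF1/TF0 at bits 1:0 are assumed. */
#define WD_CTL_TF0  (1u << 0)
#define WD_CTL_TF1  (1u << 1)
#define WD_CTL_CD0  (1u << 6)   /* labeled "T" on pcf2129/pca2129, must stay 0 */
#define WD_CTL_CD1  (1u << 7)
#define WD_CTL_MASK (WD_CTL_CD1 | WD_CTL_CD0 | WD_CTL_TF1 | WD_CTL_TF0)

/* Hypothetical helper: value to program, depending on whether the chip has CD0. */
static uint8_t wd_ctl_value(bool has_cd0)
{
	uint8_t val = WD_CTL_CD1 | WD_CTL_TF1;

	if (has_cd0)
		val |= WD_CTL_CD0;
	return val;
}

/* Userspace stand-in for a masked register update: only bits in mask change. */
static uint8_t update_bits(uint8_t old, uint8_t mask, uint8_t val)
{
	assert((val & ~mask) == 0);
	return (uint8_t)((old & ~mask) | val);
}
```

One thing that may be worth checking in the real patch: as far as I know, device_property_match_string() returns the index of the matching string (which can be 0) or a negative errno, so a sketch like this would derive has_cd0 from a ">= 0" comparison rather than from the raw return value.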
Re: [PATCHSET] saner elf compat
[Denys Vlasenko cc'd] On Wed, Dec 16, 2020 at 09:44:53AM +, Maciej W. Rozycki wrote: > On Wed, 16 Dec 2020, Al Viro wrote: > > > > It may be worth pushing through GDB's gdb.threads/tls-core.exp test > > > case, > > > making sure no UNSUPPORTED results have been produced due to resource > > > limits preventing a core from being dumped (and no FAILs, of course), > > > with > > > o32/n32 native GDB. This should guarantee our output is still as > > > expected > > > by an interpreter. Sadly I'm currently not set up for such testing > > > though > > > eventually I mean to. > > > > Umm... What triple does one use for n32 gdb? > > I don't think there's a standardised one, just configure with CC/CXX set > for n32 compilation, e.g.: > > $ /path/to/configure CC="gcc -mabi=n32" CXX="g++ -mabi=n32" > > (and any other options set as usually). This has to be with CC/CXX rather > than CFLAGS/CXXFLAGS so that it is guaranteed to be never overridden with > any logic that might do any fiddling with compilation options. This will > set up the test suite accordingly. > > NB this may already be the compiler's default, depending on how it was > configured, i.e. if `--with-abi=n32' was used, in which case no extra > options will be required. I don't know if any standard MIPS distribution > does it though; 64-bit MIPS/Debian might. This will be reported with `gcc > --help -v', somewhere along the way. > > Let me know if there are issues with this approach. One issue is that testsuite doesn't care about $CC, $CFLAGS or anything of that sort. What I'd done was cat >~/bin/cc-n32 <<'EOF' #!/bin/sh exec /usr/bin/gcc -mabi=n32 "$@" EOF chmod +x ~/bin/cc-n32 and add CC_FOR_TARGET="/home/al/bin/cc-n32" in RUNTESTFLAGS. With that it works. Moreover, it fixes a test failure on mainline. 
Mainline kernel (5.10, same behaviour as debian/buster mips64el one): Test run by al on Tue Dec 22 21:23:09 2020 Native configuration is mips64el-unknown-linux-gnuabin32 === gdb tests === Schedule of variations: unix Running target unix Using /usr/share/dejagnu/baseboards/unix.exp as board description file for target. Using /usr/share/dejagnu/config/unix.exp as generic interface file for target. Using /home/al/binutils-gdb/gdb/testsuite/config/unix.exp as tool-and-target-specific interface file. Running /home/al/binutils-gdb/gdb/testsuite/gdb.threads/tls-core.exp ... FAIL: gdb.threads/tls-core.exp: native: print thread-local storage variable === gdb Summary === # of expected passes5 # of unexpected failures1 vfs.git #work.elf-compat: Test run by al on Tue Dec 22 21:31:14 2020 Native configuration is mips64el-unknown-linux-gnuabin32 === gdb tests === Schedule of variations: unix Running target unix Using /usr/share/dejagnu/baseboards/unix.exp as board description file for target. Using /usr/share/dejagnu/config/unix.exp as generic interface file for target. Using /home/al/binutils-gdb/gdb/testsuite/config/unix.exp as tool-and-target-specific interface file. Running /home/al/binutils-gdb/gdb/testsuite/gdb.threads/tls-core.exp ... === gdb Summary === # of expected passes6 Which is bloody embarrassing, since I'd completely missed the behaviour change - this series was supposed to be an equivalent transformation. Anyway, the minimal patch fixing that failure is this one-liner and unlike the elf-compat series it's trivial to backport: [mips] fix n32 coredump breakage Back in 2012, 49ae4d4b113b ("coredump: add a new elf note with siginfo of the signal") has introduced a new ELF coredump note - NT_FILE. It contains a mix of strings and addresses, and addresses are 32bit for 32bit targets and 64bit for 64bit ones. Eventually gdb has come to use it. 
Biarch targets had been taken care of from the very beginning - the same commit has added a macro (user_long_t) with default being long and fs/compat_binfmt_elf.c overriding it to compat_long_t. Unfortunately, Denis had missed the mips weirdness. As a result, on mips64 both o32 and n32 ended up using 64-bit layout. readelf(1) is not happy. More importantly, neither is gdb(1); as a matter of fact, gdb.threads/tls-core.exp kept complaining. Note that gcore(1) is using 32bit layout for n32 case - it's only the kernel n32 coredumps that get broken NT_FILE note. NOTE: similar patch is almost certainly needed for o32; I have only tested it with n32 gdb, though. Fixes: 49ae4d4b113b ("coredump: add a new elf note with siginfo of the signal") Signed-off-by: Al Viro --- diff --git a/arch/mips/kernel/binfmt_elfn32.c b/arch/mips/kernel/binfmt_elfn32.c index 6ee3f7218c67..c073136968e8 100644 --- a/arch/mips/kernel/binfmt_elfn32.c +++ b/arch/mips/kernel/binfmt_elfn32.c @@ -103,4 +103,6 @@ jiffies_to_old_timeval32(unsigned long jiffies, struct old_timeval32 *value) #undef ns_to_kernel_old_timeval #define ns_to_kernel_old_timeval ns_to_old_timeval32
Re: [PATCH] ubifs: Fix read out-of-bounds in ubifs_jnl_write_inode()
On 2020/12/23 14:28, Chengsong Ke wrote: Reviewed-by: Zhihao Cheng From: kechengsong ubifs_jnl_write_inode() probably cause read out-of-bounds in some situation. There is kasan stack: [ 336.432159] BUG: KASAN: slab-out-of-bounds in ecc_sw_hamming_calculate+0x1dc/0x7d0 [ 336.433634] Read of size 4 at addr 888019612ff8 by task kworker/u8:4/135 [ 336.434605] [ 336.434830] CPU: 1 PID: 135 Comm: kworker/u8:4 Not tainted 5.10.0-11826-gaf2a097952f3-dirty #338 [ 336.436050] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20190727_073836-buildvm-ppc64le-16.ppc.fedoraproject.org-3.fc31 04/01/2014 [ 336.437876] Workqueue: writeback wb_workfn (flush-ubifs_0_0) [ 336.438670] Call Trace: [ 336.439021] ? dump_stack+0xdd/0x126 [ 336.439513] ? print_address_description.constprop.0+0x2c/0x3c0 [ 336.440308] ? _raw_write_lock_irqsave+0x140/0x140 [ 336.440921] ? ecc_sw_hamming_calculate+0x1dc/0x7d0 [ 336.441546] ? ecc_sw_hamming_calculate+0x1dc/0x7d0 [ 336.442186] ? kasan_report.cold+0x5d/0xd8 [ 336.442711] ? nand_reset_op+0x280/0x310 [ 336.443218] ? ecc_sw_hamming_calculate+0x1dc/0x7d0 [ 336.443842] ? __asan_load4+0x77/0x120 [ 336.444334] ? ecc_sw_hamming_calculate+0x1dc/0x7d0 [ 336.444963] ? nand_ecc_sw_hamming_calculate+0x6c/0x80 [ 336.445619] ? rawnand_sw_hamming_calculate+0x12/0x20 [ 336.446263] ? nand_write_page_swecc+0xa9/0x160 [ 336.446849] ? nand_do_write_ops+0x390/0x830 [ 336.447406] ? __writeback_single_inode+0x6cc/0x880 [ 336.448041] ? nand_write_oob+0x78/0x100 [ 336.448568] ? mtd_write_oob_std+0xe2/0x160 [ 336.449127] ? mtd_write_oob+0xec/0x1b0 [ 336.449679] ? mtd_write+0x92/0xf0 [ 336.450128] ? mtd_write_oob+0x1b0/0x1b0 [ 336.450633] ? ubi_self_check_all_ff+0x82/0x2e0 [ubi] [ 336.451328] ? __list_add_valid+0x2b/0x130 [ 336.451865] ? ubi_io_write+0x2c2/0xa90 [ubi] [ 336.452472] ? _raw_read_lock_irq+0x90/0x90 [ 336.453078] ? kmem_cache_alloc_trace+0x465/0x8b0 [ 336.453749] ? do_sync_erase+0x350/0x350 [ubi] [ 336.454430] ? 
__kasan_check_write+0x20/0x30 [ 336.455050] ? down_write+0xf2/0x190 [ 336.455569] ? down_write_killable+0x1b0/0x1b0 [ 336.456221] ? check_mapping+0x2c/0x590 [ubi] [ 336.456890] ? ubi_eba_write_leb+0x58a/0xfa0 [ubi] [ 336.457618] ? __kmalloc+0x490/0x910 [ 336.458142] ? ubifs_jnl_write_inode.cold+0x6f/0x878 [ubifs] [ 336.459033] ? writeback_sb_inodes+0x3a9/0x9a0 [ 336.459672] ? __writeback_inodes_wb+0xc8/0x170 [ 336.460330] ? wb_writeback+0x637/0x700 [ 336.460882] ? wb_workfn+0x8af/0xb80 [ 336.461398] ? process_one_work+0x467/0x9f0 [ 336.462004] ? worker_thread+0x34d/0x8e0 [ 336.462582] ? kthread+0x204/0x280 [ 336.463047] ? ret_from_fork+0x1f/0x30 [ 336.463570] ? create_prof_cpu_mask+0x30/0x30 [ 336.464185] ? ubi_eba_read_leb_sg+0x1f0/0x1f0 [ubi] [ 336.464917] ? hrtimer_active+0x9b/0x100 [ 336.465468] ? ubi_leb_write+0x22c/0x2f0 [ubi] [ 336.466130] ? ubifs_leb_write+0xf2/0x1b0 [ubifs] [ 336.466851] ? ubifs_wbuf_write_nolock+0x412/0x1280 [ubifs] [ 336.467686] ? write_head+0xdf/0x1c0 [ubifs] [ 336.468355] ? ubifs_jnl_write_inode.cold+0x3ec/0x878 [ubifs] [ 336.469183] ? ret_from_fork+0x1e/0x30 [ 336.469707] ? ubifs_jnl_write_data+0x660/0x660 [ubifs] [ 336.470497] ? unwind_next_frame+0x247/0xca0 [ 336.471095] ? ret_from_fork+0x1f/0x30 [ 336.471574] ? fprop_reflect_period_percpu.isra.0+0x1f/0x1b0 [ 336.472335] ? generic_writepages+0x93/0x140 [ 336.472933] ? __kasan_check_write+0x20/0x30 [ 336.473526] ? mutex_lock+0xa6/0x110 [ 336.474031] ? __mutex_lock_slowpath+0x30/0x30 [ 336.474662] ? ubifs_write_inode+0x1c3/0x290 [ubifs] [ 336.475446] ? __writeback_single_inode+0x6cc/0x880 [ 336.476155] ? wbc_attach_and_unlock_inode+0x2b6/0x400 [ 336.476891] ? writeback_sb_inodes+0x3a9/0x9a0 [ 336.477528] ? write_inode_now+0x1e0/0x1e0 [ 336.478119] ? __writeback_inodes_wb+0xc8/0x170 [ 336.478770] ? wb_writeback+0x637/0x700 [ 336.479326] ? __writeback_inodes_wb+0x170/0x170 [ 336.479992] ? current_work+0xa0/0xa0 [ 336.480524] ? _find_next_bit.constprop.0+0x3e/0x140 [ 336.481241] ? 
find_next_bit+0x18/0x30 [ 336.481780] ? cpumask_next+0x2f/0x40 [ 336.482312] ? wb_workfn+0x8af/0xb80 [ 336.482832] ? update_cfs_group+0x1e/0x1b0 [ 336.483421] ? inode_wait_for_writeback+0x60/0x60 [ 336.484106] ? schedule+0xb7/0x240 [ 336.484595] ? finish_task_switch+0x14e/0x9a0 [ 336.485225] ? __kasan_check_write+0x20/0x30 [ 336.485841] ? __schedule+0x6f4/0x1600 [ 336.486382] ? __kasan_check_read+0x1d/0x30 [ 336.486981] ? read_word_at_a_time+0x16/0x30 [ 336.487594] ? process_one_work+0x467/0x9f0 [ 336.488198] ? worker_thread+0x34d/0x8e0 [ 336.488762] ? rescuer_thread+0x820/0x820 [ 336.489344] ? kthread+0x204/0x280 [ 336.489839] ? kthread_bind+0x50/0x50 [ 336.490367] ? ret_from_fork+0x1f/0x30 [ 336.490913] [ 336.491138] Allocated by task 135: [ 336.491629] kasan_save_stack+0x23/0x60 [ 336.492189] __kasan_kmalloc.constprop.0+0x10b/0x120 [
Re: [PATCH v3 3/4] x86/signal: Prevent an alternate stack overflow before a signal delivery
On Wed, Dec 23, 2020 at 2:57 AM Chang S. Bae wrote: > The kernel pushes data on the userspace stack when entering a signal. If > using a sigaltstack(), the kernel precisely knows the user stack size. > > When the kernel knows that the user stack is too small, avoid the overflow > and do an immediate SIGSEGV instead. > > This overflow is known to occur on systems with large XSAVE state. The > effort to increase the size typically used for altstacks reduces the > frequency of these overflows, but this approach is still useful for legacy > binaries. > > Suggested-by: Jann Horn > Signed-off-by: Chang S. Bae > Reviewed-by: Len Brown > Cc: Jann Horn > Cc: x...@kernel.org > Cc: linux-kernel@vger.kernel.org Reviewed-by: Jann Horn
Re: [PATCH] powerpc/32s: Fix RTAS machine check with VMAP stack
On 22/12/2020 at 08:11, Christophe Leroy wrote: When we have VMAP stack, exception prolog 1 sets r1, not r11. But exception prolog 1 uses r1 to setup r1 when machine check happens in kernel. So r1 must be restored when the branch is not taken. See subsequent patch I just sent out. Christophe Fixes: da7bb43ab9da ("powerpc/32: Fix vmap stack - Properly set r1 before activating MMU") Fixes: d2e006036082 ("powerpc/32: Use SPRN_SPRG_SCRATCH2 in exception prologs") Cc: sta...@vger.kernel.org Signed-off-by: Christophe Leroy --- arch/powerpc/kernel/head_book3s_32.S | 7 +++ 1 file changed, 7 insertions(+) diff --git a/arch/powerpc/kernel/head_book3s_32.S b/arch/powerpc/kernel/head_book3s_32.S index 349bf3f0c3af..fbc48a500846 100644 --- a/arch/powerpc/kernel/head_book3s_32.S +++ b/arch/powerpc/kernel/head_book3s_32.S @@ -260,9 +260,16 @@ __secondary_hold_acknowledge: MachineCheck: EXCEPTION_PROLOG_0 #ifdef CONFIG_PPC_CHRP +#ifdef CONFIG_VMAP_STACK + mtspr SPRN_SPRG_SCRATCH2,r1 + mfspr r1, SPRN_SPRG_THREAD + lwz r1, RTAS_SP(r1) + cmpwi cr1, r1, 0 +#else mfspr r11, SPRN_SPRG_THREAD lwz r11, RTAS_SP(r11) cmpwi cr1, r11, 0 +#endif bne cr1, 7f #endif /* CONFIG_PPC_CHRP */ EXCEPTION_PROLOG_1 for_rtas=1
Re: [v2] i2c: mediatek: Move suspend and resume handling to NOIRQ phase
Hi sirs: If there is no new comment, I will resend it in 5.11.
[PATCH v1 2/2] perf arm64: Add argument support for SDT
Two OP formats are currently used for SDT marker arguments in Arm64 ELF: one is the general register xNUM (e.g. x1, x2, etc.), the other uses the stack pointer to access local variables (e.g. [sp], [sp, 8]). This patch adds SDT marker argument support for Arm64; it parses the OP and converts it to the uprobe-compatible format. Signed-off-by: Leo Yan --- tools/perf/arch/arm64/util/perf_regs.c | 94 ++ 1 file changed, 94 insertions(+) diff --git a/tools/perf/arch/arm64/util/perf_regs.c b/tools/perf/arch/arm64/util/perf_regs.c index 54efa12fdbea..6b4b18283041 100644 --- a/tools/perf/arch/arm64/util/perf_regs.c +++ b/tools/perf/arch/arm64/util/perf_regs.c @@ -1,4 +1,12 @@ // SPDX-License-Identifier: GPL-2.0 +#include +#include +#include +#include +#include + +#include "../../../util/debug.h" +#include "../../../util/event.h" #include "../../../util/perf_regs.h" const struct sample_reg sample_reg_masks[] = { @@ -37,3 +45,89 @@ const struct sample_reg sample_reg_masks[] = { SMPL_REG(pc, PERF_REG_ARM64_PC), SMPL_REG_END }; + +/* %xNUM */ +#define SDT_OP_REGEX1 "^(x[1-2]?[0-9]|3[0-1])$" + +/* [sp], [sp, NUM] or [sp,NUM] */ +#define SDT_OP_REGEX2 "^\\[sp(, *)?([0-9]+)?\\]$" + +static regex_t sdt_op_regex1, sdt_op_regex2; + +static int sdt_init_op_regex(void) +{ + static int initialized; + int ret = 0; + + if (initialized) + return 0; + + ret = regcomp(_op_regex1, SDT_OP_REGEX1, REG_EXTENDED); + if (ret) + goto error; + + ret = regcomp(_op_regex2, SDT_OP_REGEX2, REG_EXTENDED); + if (ret) + goto free_regex1; + + initialized = 1; + return 0; + +free_regex1: + regfree(_op_regex1); +error: + pr_debug4("Regex compilation error.\n"); + return ret; +} + +/* + * SDT marker arguments on Arm64 uses %xREG or [sp, NUM], currently + * support these two formats. 
+ */ +int arch_sdt_arg_parse_op(char *old_op, char **new_op) +{ + int ret, new_len; + regmatch_t rm[5]; + + ret = sdt_init_op_regex(); + if (ret < 0) + return ret; + + if (!regexec(_op_regex1, old_op, 3, rm, 0)) { + /* Extract xNUM */ + new_len = 2;/* % NULL */ + new_len += (int)(rm[1].rm_eo - rm[1].rm_so); + + *new_op = zalloc(new_len); + if (!*new_op) + return -ENOMEM; + + scnprintf(*new_op, new_len, "%%%.*s", + (int)(rm[1].rm_eo - rm[1].rm_so), old_op + rm[1].rm_so); + } else if (!regexec(_op_regex2, old_op, 5, rm, 0)) { + /* [sp], [sp, NUM] or [sp,NUM] */ + new_len = 7;/* + ( % s p ) NULL */ + + /* If the arugment is [sp], need to fill offset '0' */ + if (rm[2].rm_so == -1) + new_len += 1; + else + new_len += (int)(rm[2].rm_eo - rm[2].rm_so); + + *new_op = zalloc(new_len); + if (!*new_op) + return -ENOMEM; + + if (rm[2].rm_so == -1) + scnprintf(*new_op, new_len, "+0(%%sp)"); + else + scnprintf(*new_op, new_len, "+%.*s(%%sp)", + (int)(rm[2].rm_eo - rm[2].rm_so), + old_op + rm[2].rm_so); + } else { + pr_debug4("Skipping unsupported SDT argument: %s\n", old_op); + return SDT_ARG_SKIP; + } + + return SDT_ARG_VALID; +} -- 2.17.1
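The conversion described in this patch can be sketched as a small standalone function. This is not the perf code itself: sdt_op_to_uprobe() is a hypothetical name, it writes into a caller-supplied buffer instead of allocating, and it compiles its regexes on every call for brevity. Note one deliberate difference from the quoted SDT_OP_REGEX1: in "^(x[1-2]?[0-9]|3[0-1])$" the alternation seems to bind so that the second branch is "3[0-1]" without the leading "x", so the sketch spells it "x3[01]" on the assumption that x30 and x31 were intended.

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>
#include <string.h>

/*
 * Convert an Arm64 SDT operand ("x1", "[sp]", "[sp, 8]", "[sp,8]") into
 * uprobe syntax ("%x1", "+0(%sp)", "+8(%sp)").
 * Returns 0 on success, -1 if the operand is not supported.
 */
static int sdt_op_to_uprobe(const char *op, char *out, size_t out_len)
{
	regex_t reg_re, sp_re;
	regmatch_t m[3];
	int ret = -1;

	assert(op && out && out_len > 0);

	/* x0..x31; the "x3[01]" branch is an assumption, see the lead-in. */
	if (regcomp(&reg_re, "^(x[12]?[0-9]|x3[01])$", REG_EXTENDED))
		return -1;
	/* [sp], [sp, NUM] or [sp,NUM]; group 2 captures NUM when present. */
	if (regcomp(&sp_re, "^\\[sp(, *([0-9]+))?\\]$", REG_EXTENDED)) {
		regfree(&reg_re);
		return -1;
	}

	if (!regexec(&reg_re, op, 2, m, 0)) {
		snprintf(out, out_len, "%%%.*s",
			 (int)(m[1].rm_eo - m[1].rm_so), op + m[1].rm_so);
		ret = 0;
	} else if (!regexec(&sp_re, op, 3, m, 0)) {
		if (m[2].rm_so == -1)	/* plain [sp]: offset defaults to 0 */
			snprintf(out, out_len, "+0(%%sp)");
		else
			snprintf(out, out_len, "+%.*s(%%sp)",
				 (int)(m[2].rm_eo - m[2].rm_so), op + m[2].rm_so);
		ret = 0;
	}

	regfree(&sp_re);
	regfree(&reg_re);
	return ret;
}
```

Nesting the number inside the optional ", *" group means "[sp8]" is rejected, which the two independent optional groups in the quoted regex would accept.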
[PATCH] powerpc/32s: Fix RTAS machine check with VMAP stack - again
When it is not a RTAS machine check, don't trash r1 because it is needed by prolog 1. Fixes: 9c7422b92cb2 ("powerpc/32s: Fix RTAS machine check with VMAP stack") Cc: sta...@vger.kernel.org Signed-off-by: Christophe Leroy --- Sorry Michael for this last minute fix of the fix. arch/powerpc/kernel/head_book3s_32.S | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/kernel/head_book3s_32.S b/arch/powerpc/kernel/head_book3s_32.S index fbc48a500846..858fbc8b19f3 100644 --- a/arch/powerpc/kernel/head_book3s_32.S +++ b/arch/powerpc/kernel/head_book3s_32.S @@ -265,12 +265,14 @@ MachineCheck: mfspr r1, SPRN_SPRG_THREAD lwz r1, RTAS_SP(r1) cmpwi cr1, r1, 0 + bne cr1, 7f + mfspr r1, SPRN_SPRG_SCRATCH2 #else mfspr r11, SPRN_SPRG_THREAD lwz r11, RTAS_SP(r11) cmpwi cr1, r11, 0 -#endif bne cr1, 7f +#endif #endif /* CONFIG_PPC_CHRP */ EXCEPTION_PROLOG_1 for_rtas=1 7: EXCEPTION_PROLOG_2 -- 2.25.0
[PATCH v1 1/2] perf probe: Fixup Arm64 SDT arguments
The Arm64 ELF section '.note.stapsdt' uses the string format "-4@[sp, NUM]" when a probe accesses data on the stack; e.g. dumping an Arm64 ELF file shows the following argument format: Arguments: -4@[sp, 12] -4@[sp, 8] -4@[sp, 4] Compared with other archs' argument formats, the Arm64 argument contains an extra space character inside the square brackets; because argv_split() uses the space as a separator, the argument is wrongly divided into two items. To support Arm64 SDT, this patch fixes up this case: if an item contains the substring "[sp", it is concatenated with the following item. A detailed explanation is added in a comment. Signed-off-by: Leo Yan --- tools/perf/util/probe-file.c | 32 ++-- 1 file changed, 30 insertions(+), 2 deletions(-) diff --git a/tools/perf/util/probe-file.c b/tools/perf/util/probe-file.c index 064b63a6a3f3..60878c859e60 100644 --- a/tools/perf/util/probe-file.c +++ b/tools/perf/util/probe-file.c @@ -794,6 +794,8 @@ static char *synthesize_sdt_probe_command(struct sdt_note *note, char *ret = NULL, **args; int i, args_count, err; unsigned long long ref_ctr_offset; + char *arg; + int arg_idx = 0; if (strbuf_init(, 32) < 0) return NULL; @@ -815,8 +817,34 @@ static char *synthesize_sdt_probe_command(struct sdt_note *note, if (note->args) { args = argv_split(note->args, _count); - for (i = 0; i < args_count; ++i) { - if (synthesize_sdt_probe_arg(, i, args[i]) < 0) + for (i = 0; i < args_count; ) { + /* +* FIXUP: Arm64 ELF section '.note.stapsdt' uses string +* format "-4@[sp, NUM]" if a probe is to access data in +* the stack, e.g. below is an example for the SDT +* Arguments: +* +* Arguments: -4@[sp, 12] -4@[sp, 8] -4@[sp, 4] +* +* Since the string introduces an extra space character +* in the middle of square brackets, the argument is +* divided into two items. Fixup for this case, if an +* item contains sub string "[sp,", need to concatenate +* the two items. 
+*/ + if (strstr(args[i], "[sp,") && (i+1) < args_count) { + arg = strcat(args[i], args[i+1]); + i += 2; + } else { + arg = strdup(args[i]); + i += 1; + } + + err = synthesize_sdt_probe_arg(, arg_idx, arg); + free(arg); + arg_idx++; + + if (err < 0) goto error; } } -- 2.17.1
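The fixup described above (rejoining an argument that argv_split() cut at the space inside "[sp, NUM]") can be sketched in isolation as follows. sdt_next_arg() is a hypothetical helper, not a perf function. Unlike the quoted hunk, which calls strcat() on args[i], this sketch allocates a fresh buffer for the joined string, on the assumption that the split tokens are sized exactly and have no spare room; it also keeps a single space between the halves, which is an arbitrary choice since the operand regex in patch 2/2 accepts both spellings.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Return a newly allocated copy of the argument starting at args[i].
 * If args[i] contains "[sp," and a following token exists, the two
 * tokens are joined with one space and *consumed is set to 2;
 * otherwise args[i] is copied as-is and *consumed is set to 1.
 * The caller frees the result. Returns NULL on allocation failure.
 */
static char *sdt_next_arg(char **args, int count, int i, int *consumed)
{
	size_t len;
	char *arg;

	assert(args && consumed && i >= 0 && i < count);

	if (strstr(args[i], "[sp,") && i + 1 < count) {
		len = strlen(args[i]) + 1 + strlen(args[i + 1]) + 1;
		arg = malloc(len);
		if (!arg)
			return NULL;
		snprintf(arg, len, "%s %s", args[i], args[i + 1]);
		*consumed = 2;
	} else {
		len = strlen(args[i]) + 1;
		arg = malloc(len);
		if (!arg)
			return NULL;
		memcpy(arg, args[i], len);
		*consumed = 1;
	}
	return arg;
}
```

A caller would advance its index by *consumed and keep a separate running argument number, much like arg_idx in the patch.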
[PATCH v1 0/2] perf arm64: Support SDT
This patch is to enable SDT on Arm64. Since Arm64 SDT marker in ELF file is different from other archs, especially for using stack pointer (sp) to retrieve data for local variables, patch 01 is used to fixup the arguments for this special case. Patch 02 is to add argument support for Arm64 SDT. This patch set has been verified on Arm64/x86_64 platforms with a testing program usdt_test [1]. The program run the SDT interfaces one by one for DTRACE_PROBE, DTRACE_PROBE1, ..., DTRACE_PROBE12, so it tries to verify probe with different count of arguments (the arguments count is 0 to 12). The testing flow and result are shown as below: # perf buildid-cache --add /root/test/usdt_test # perf probe sdt_usdt:test_probe # perf probe sdt_usdt:test_probe_param1 # perf probe sdt_usdt:test_probe_param1x # perf probe sdt_usdt:test_probe_param2 # perf probe sdt_usdt:test_probe_param2x # perf probe sdt_usdt:test_probe_param3 # perf probe sdt_usdt:test_probe_param3x # perf probe sdt_usdt:test_probe_param4 # perf probe sdt_usdt:test_probe_param4x # perf probe sdt_usdt:test_probe_param5 # perf probe sdt_usdt:test_probe_param5x # perf probe sdt_usdt:test_probe_param6 # perf probe sdt_usdt:test_probe_param6x # perf probe sdt_usdt:test_probe_param7 # perf probe sdt_usdt:test_probe_param7x # perf probe sdt_usdt:test_probe_param8 # perf probe sdt_usdt:test_probe_param8x # perf probe sdt_usdt:test_probe_param9 # perf probe sdt_usdt:test_probe_param9x # perf probe sdt_usdt:test_probe_param10 # perf probe sdt_usdt:test_probe_param10x # perf probe sdt_usdt:test_probe_param11 # perf probe sdt_usdt:test_probe_param11x # perf probe sdt_usdt:test_probe_param12 # perf probe sdt_usdt:test_probe_param12x # perf record \ -e sdt_usdt:test_probe_param1 -e sdt_usdt:test_probe_param1x \ -e sdt_usdt:test_probe_param2 -e sdt_usdt:test_probe_param2x \ -e sdt_usdt:test_probe_param3 -e sdt_usdt:test_probe_param3x \ -e sdt_usdt:test_probe_param4 -e sdt_usdt:test_probe_param4x \ -e sdt_usdt:test_probe_param5 -e 
sdt_usdt:test_probe_param5x \ -e sdt_usdt:test_probe_param6 -e sdt_usdt:test_probe_param6x \ -e sdt_usdt:test_probe_param7 -e sdt_usdt:test_probe_param7x \ -e sdt_usdt:test_probe_param8 -e sdt_usdt:test_probe_param8x \ -e sdt_usdt:test_probe_param9 -e sdt_usdt:test_probe_param9x \ -e sdt_usdt:test_probe_param10 -e sdt_usdt:test_probe_param10x \ -e sdt_usdt:test_probe_param11 -e sdt_usdt:test_probe_param11x \ -e sdt_usdt:test_probe_param12 -e sdt_usdt:test_probe_param12x \ -e sdt_usdt:test_probe -aR sleep 5 # ./usdt_test => Execute in another terminal # perf script usdt_test 7999 [003] 80493.418276: sdt_usdt:test_probe: (b0d80714) usdt_test 7999 [003] 80493.418352: sdt_usdt:test_probe_param1: (b0d80728) arg1=1 usdt_test 7999 [003] 80493.418379: sdt_usdt:test_probe_param2: (b0d80744) arg1=1 arg2=2 usdt_test 7999 [003] 80493.418405: sdt_usdt:test_probe_param3: (b0d80764) arg1=1 arg2=2 arg3=3 usdt_test 7999 [003] 80493.418432: sdt_usdt:test_probe_param4: (b0d80788) arg1=1 arg2=2 arg3=3 arg4=4 usdt_test 7999 [003] 80493.418459: sdt_usdt:test_probe_param5: (b0d807b0) arg1=1 arg2=2 arg3=3 arg4=4 arg5=5 usdt_test 7999 [003] 80493.418487: sdt_usdt:test_probe_param6: (b0d807dc) arg1=1 arg2=2 arg3=3 arg4=4 arg5=5 arg6=6 usdt_test 7999 [003] 80493.418516: sdt_usdt:test_probe_param7: (b0d8080c) arg1=1 arg2=2 arg3=3 arg4=4 arg5=5 arg6=6 arg7=7 usdt_test 7999 [003] 80493.418545: sdt_usdt:test_probe_param8: (b0d80840) arg1=1 arg2=2 arg3=3 arg4=4 arg5=5 arg6=6 arg7=7 arg8=8 usdt_test 7999 [003] 80493.418574: sdt_usdt:test_probe_param9: (b0d80874) arg1=1 arg2=2 arg3=3 arg4=4 arg5=5 arg6=6 arg7=7 arg8=8 arg9=9 usdt_test 7999 [003] 80493.418603: sdt_usdt:test_probe_param10: (b0d808a8) arg1=1 arg2=2 arg3=3 arg4=4 arg5=5 arg6=6 arg7=7 arg8=8 arg9=9 arg10=10 usdt_test 7999 [003] 80493.418632: sdt_usdt:test_probe_param11: (b0d808dc) arg1=1 arg2=2 arg3=3 arg4=4 arg5=5 arg6=6 arg7=7 arg8=8 arg9=9 arg10=10 arg11=11 usdt_test 7999 [003] 80493.418662: sdt_usdt:test_probe_param12: (b0d80910) 
arg1=1 arg2=2 arg3=3 arg4=4 arg5=5 arg6=6 arg7=7 arg8=8 arg9=9 arg10=10 arg11=11 arg12=12 usdt_test 7999 [003] 80493.418687: sdt_usdt:test_probe_param1x: (b0d8092c) arg1=1 usdt_test 7999 [003] 80493.418713: sdt_usdt:test_probe_param2x: (b0d80950) arg1=1 arg2=2 usdt_test 7999 [003] 80493.418739: sdt_usdt:test_probe_param3x: (b0d8097c) arg1=1 arg2=2 arg3=3 usdt_test 7999 [003] 80493.418766: sdt_usdt:test_probe_param4x: (b0d809b0) arg1=1 arg2=2 arg3=3 arg4=4 usdt_test 7999 [003] 80493.418792: sdt_usdt:test_probe_param5x: (b0d809ec) arg1=1 arg2=2 arg3=3
[PATCH] kconfig: remove 'kvmconfig' and 'xenconfig' shorthands
Linux 5.10 is out. Remove the 'kvmconfig' and 'xenconfig' shorthands as previously announced. Signed-off-by: Masahiro Yamada --- scripts/kconfig/Makefile | 10 -- 1 file changed, 10 deletions(-) diff --git a/scripts/kconfig/Makefile b/scripts/kconfig/Makefile index e46df0a2d4f9..2c40e68853dd 100644 --- a/scripts/kconfig/Makefile +++ b/scripts/kconfig/Makefile @@ -94,16 +94,6 @@ configfiles=$(wildcard $(srctree)/kernel/configs/$@ $(srctree)/arch/$(SRCARCH)/c $(Q)$(CONFIG_SHELL) $(srctree)/scripts/kconfig/merge_config.sh -m .config $(configfiles) $(Q)$(MAKE) -f $(srctree)/Makefile olddefconfig -PHONY += kvmconfig -kvmconfig: kvm_guest.config - @echo >&2 "WARNING: 'make $@' will be removed after Linux 5.10" - @echo >&2 " Please use 'make $<' instead." - -PHONY += xenconfig -xenconfig: xen.config - @echo >&2 "WARNING: 'make $@' will be removed after Linux 5.10" - @echo >&2 " Please use 'make $<' instead." - PHONY += tinyconfig tinyconfig: $(Q)$(MAKE) -f $(srctree)/Makefile allnoconfig tiny.config -- 2.27.0
Re: [PATCH] sh: check return code of request_irq
On Wed, Dec 23, 2020 at 5:54 AM Nick Desaulniers wrote: > > request_irq is marked __must_check, but the call in shx3_prepare_cpus > has a void return type, so it can't propagate failure to the caller. > Follow cues from hexagon and just print an error. > > Fixes: c7936b9abcf5 ("sh: smp: Hook in to the generic IPI handler for SH-X3 > SMP.") > Cc: Miguel Ojeda > Cc: Paul Mundt > Reported-by: Guenter Roeck > Signed-off-by: Nick Desaulniers Thanks for the patch, Nick. I just wondered if there was a better error handling than printing the message. I have no idea if the system will boot up correctly when the request_irq() fails here. I hope the maintainers will suggest something, if any. > --- > arch/sh/kernel/cpu/sh4a/smp-shx3.c | 5 +++-- > 1 file changed, 3 insertions(+), 2 deletions(-) > > diff --git a/arch/sh/kernel/cpu/sh4a/smp-shx3.c > b/arch/sh/kernel/cpu/sh4a/smp-shx3.c > index f8a2bec0f260..1261dc7b84e8 100644 > --- a/arch/sh/kernel/cpu/sh4a/smp-shx3.c > +++ b/arch/sh/kernel/cpu/sh4a/smp-shx3.c > @@ -73,8 +73,9 @@ static void shx3_prepare_cpus(unsigned int max_cpus) > BUILD_BUG_ON(SMP_MSG_NR >= 8); > > for (i = 0; i < SMP_MSG_NR; i++) > - request_irq(104 + i, ipi_interrupt_handler, > - IRQF_PERCPU, "IPI", (void *)(long)i); > + if (request_irq(104 + i, ipi_interrupt_handler, > + IRQF_PERCPU, "IPI", (void *)(long)i)) > + pr_err("Failed to request irq %d\n", i); > > for (i = 0; i < max_cpus; i++) > set_cpu_present(i, true); > -- > 2.29.2.729.g45daf8777d-goog > -- Best Regards Masahiro Yamada
Re: linux-next: Tree for Dec 21 (objtool warning)
On 12/22/20 9:09 PM, Josh Poimboeuf wrote: > On Mon, Dec 21, 2020 at 08:03:17AM -0800, Randy Dunlap wrote: >> On 12/20/20 7:18 PM, Stephen Rothwell wrote: >>> Hi all, >>> >>> News: there will be no linux-next releases between Dec 24 and Jan >>> 3 inclusive. >>> >>> Please do not add any v5.12 destined code to your linux-next included >>> branches until after v5.11-rc1 has been released. >>> >>> Changes since 20201218: >>> >> >> on x86_64: >> >> arch/x86/kernel/sys_ia32.o: warning: objtool: cp_stat64()+0xd8: call to >> new_encode_dev() with UACCESS enabled > > Can you send a .o for this one? Please gzip it because my email has > been rejecting .o files lately :-/ > Sure, it's attached. -- ~Randy sys_ia32.o.gz Description: application/gzip
[PATCH] mm/buffer.c: remove the macro check in check_irqs_on()
The macro irqs_disabled is always defined in include/linux/irqflags.h, so we don't need the macro check. Signed-off-by: Hui Su --- fs/buffer.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/fs/buffer.c b/fs/buffer.c index 32647d2011df..34b505542d96 100644 --- a/fs/buffer.c +++ b/fs/buffer.c @@ -1247,9 +1247,7 @@ static DEFINE_PER_CPU(struct bh_lru, bh_lrus) = {{ NULL }}; static inline void check_irqs_on(void) { -#ifdef irqs_disabled BUG_ON(irqs_disabled()); -#endif } /* -- 2.25.1
[PATCH v2 1/3] iommu/vt-d: Move intel_iommu info from struct intel_svm to struct intel_svm_dev
Current struct intel_svm has a field to record the struct intel_iommu pointer for a PASID bind. And struct intel_svm will be shared by all the devices bind to the same process. The devices may be behind different DMAR units. As the iommu driver code uses the intel_iommu pointer stored in intel_svm struct to do cache invalidations, it may only flush the cache on a single DMAR unit, for others, the cache invalidation is missed. As intel_svm struct already has a device list, this patch just moves the intel_iommu pointer to be a field of intel_svm_dev struct. Fixes: 1c4f88b7f1f92 ("iommu/vt-d: Shared virtual address in scalable mode") Cc: Lu Baolu Cc: Jacob Pan Cc: Raj Ashok Cc: David Woodhouse Reported-by: Guo Kaijie Reported-by: Xin Zeng Signed-off-by: Guo Kaijie Signed-off-by: Xin Zeng Signed-off-by: Liu Yi L Tested-by: Guo Kaijie --- drivers/iommu/intel/svm.c | 9 + include/linux/intel-iommu.h | 2 +- 2 files changed, 6 insertions(+), 5 deletions(-) diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c index 3242ebd0bca3..4a10c9ff368c 100644 --- a/drivers/iommu/intel/svm.c +++ b/drivers/iommu/intel/svm.c @@ -142,7 +142,7 @@ static void intel_flush_svm_range_dev (struct intel_svm *svm, struct intel_svm_d } desc.qw2 = 0; desc.qw3 = 0; - qi_submit_sync(svm->iommu, , 1, 0); + qi_submit_sync(sdev->iommu, , 1, 0); if (sdev->dev_iotlb) { desc.qw0 = QI_DEV_EIOTLB_PASID(svm->pasid) | @@ -166,7 +166,7 @@ static void intel_flush_svm_range_dev (struct intel_svm *svm, struct intel_svm_d } desc.qw2 = 0; desc.qw3 = 0; - qi_submit_sync(svm->iommu, , 1, 0); + qi_submit_sync(sdev->iommu, , 1, 0); } } @@ -211,7 +211,7 @@ static void intel_mm_release(struct mmu_notifier *mn, struct mm_struct *mm) */ rcu_read_lock(); list_for_each_entry_rcu(sdev, >devs, list) - intel_pasid_tear_down_entry(svm->iommu, sdev->dev, + intel_pasid_tear_down_entry(sdev->iommu, sdev->dev, svm->pasid, true); rcu_read_unlock(); @@ -363,6 +363,7 @@ int intel_svm_bind_gpasid(struct iommu_domain *domain, 
struct device *dev, } sdev->dev = dev; sdev->sid = PCI_DEVID(info->bus, info->devfn); + sdev->iommu = iommu; /* Only count users if device has aux domains */ if (iommu_dev_feature_enabled(dev, IOMMU_DEV_FEAT_AUX)) @@ -546,6 +547,7 @@ intel_svm_bind_mm(struct device *dev, unsigned int flags, goto out; } sdev->dev = dev; + sdev->iommu = iommu; ret = intel_iommu_enable_pasid(iommu, dev); if (ret) { @@ -575,7 +577,6 @@ intel_svm_bind_mm(struct device *dev, unsigned int flags, kfree(sdev); goto out; } - svm->iommu = iommu; if (pasid_max > intel_pasid_max_id) pasid_max = intel_pasid_max_id; diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h index d956987ed032..94522685a0d9 100644 --- a/include/linux/intel-iommu.h +++ b/include/linux/intel-iommu.h @@ -758,6 +758,7 @@ struct intel_svm_dev { struct list_head list; struct rcu_head rcu; struct device *dev; + struct intel_iommu *iommu; struct svm_dev_ops *ops; struct iommu_sva sva; u32 pasid; @@ -771,7 +772,6 @@ struct intel_svm { struct mmu_notifier notifier; struct mm_struct *mm; - struct intel_iommu *iommu; unsigned int flags; u32 pasid; int gpasid; /* In case that guest PASID is different from host PASID */ -- 2.25.1
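A minimal userspace sketch of the idea behind the patch above, assuming nothing about the real driver beyond what the diff shows: once the IOMMU pointer lives in the per-device structure, a flush that walks the device list reaches every unit. All `model_*` names are hypothetical stand-ins rather than kernel types, and the counter merely represents a queued invalidation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct intel_iommu (one DMAR unit). */
struct model_iommu {
	unsigned int flushes;	/* invalidations queued on this unit */
};

/* Hypothetical stand-in for struct intel_svm_dev. */
struct model_svm_dev {
	struct model_iommu *iommu;	/* per-device pointer, as the patch introduces */
	struct model_svm_dev *next;
};

/* Hypothetical stand-in for struct intel_svm, shared by all bound devices. */
struct model_svm {
	struct model_svm_dev *devs;
};

/* Queue one invalidation on the unit that serves each bound device. */
void model_flush_all(struct model_svm *svm)
{
	struct model_svm_dev *sdev;

	assert(svm != NULL);
	for (sdev = svm->devs; sdev != NULL; sdev = sdev->next)
		sdev->iommu->flushes++;
}
```

With two devices behind two different units, one call to `model_flush_all()` increments both counters; a single pointer stored in the shared structure could only ever name one of the units.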
[PATCH] ubifs: Fix read out-of-bounds in ubifs_jnl_write_inode()
From: kechengsong ubifs_jnl_write_inode() may read out of bounds in some situations. KASAN reports the following stack: [ 336.432159] BUG: KASAN: slab-out-of-bounds in ecc_sw_hamming_calculate+0x1dc/0x7d0 [ 336.433634] Read of size 4 at addr 888019612ff8 by task kworker/u8:4/135 [ 336.434605] [ 336.434830] CPU: 1 PID: 135 Comm: kworker/u8:4 Not tainted 5.10.0-11826-gaf2a097952f3-dirty #338 [ 336.436050] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20190727_073836-buildvm-ppc64le-16.ppc.fedoraproject.org-3.fc31 04/01/2014 [ 336.437876] Workqueue: writeback wb_workfn (flush-ubifs_0_0) [ 336.438670] Call Trace: [ 336.439021] ? dump_stack+0xdd/0x126 [ 336.439513] ? print_address_description.constprop.0+0x2c/0x3c0 [ 336.440308] ? _raw_write_lock_irqsave+0x140/0x140 [ 336.440921] ? ecc_sw_hamming_calculate+0x1dc/0x7d0 [ 336.441546] ? ecc_sw_hamming_calculate+0x1dc/0x7d0 [ 336.442186] ? kasan_report.cold+0x5d/0xd8 [ 336.442711] ? nand_reset_op+0x280/0x310 [ 336.443218] ? ecc_sw_hamming_calculate+0x1dc/0x7d0 [ 336.443842] ? __asan_load4+0x77/0x120 [ 336.444334] ? ecc_sw_hamming_calculate+0x1dc/0x7d0 [ 336.444963] ? nand_ecc_sw_hamming_calculate+0x6c/0x80 [ 336.445619] ? rawnand_sw_hamming_calculate+0x12/0x20 [ 336.446263] ? nand_write_page_swecc+0xa9/0x160 [ 336.446849] ? nand_do_write_ops+0x390/0x830 [ 336.447406] ? __writeback_single_inode+0x6cc/0x880 [ 336.448041] ? nand_write_oob+0x78/0x100 [ 336.448568] ? mtd_write_oob_std+0xe2/0x160 [ 336.449127] ? mtd_write_oob+0xec/0x1b0 [ 336.449679] ? mtd_write+0x92/0xf0 [ 336.450128] ? mtd_write_oob+0x1b0/0x1b0 [ 336.450633] ? ubi_self_check_all_ff+0x82/0x2e0 [ubi] [ 336.451328] ? __list_add_valid+0x2b/0x130 [ 336.451865] ? ubi_io_write+0x2c2/0xa90 [ubi] [ 336.452472] ? _raw_read_lock_irq+0x90/0x90 [ 336.453078] ? kmem_cache_alloc_trace+0x465/0x8b0 [ 336.453749] ? do_sync_erase+0x350/0x350 [ubi] [ 336.454430] ? __kasan_check_write+0x20/0x30 [ 336.455050] ? down_write+0xf2/0x190 [ 336.455569] ? 
down_write_killable+0x1b0/0x1b0 [ 336.456221] ? check_mapping+0x2c/0x590 [ubi] [ 336.456890] ? ubi_eba_write_leb+0x58a/0xfa0 [ubi] [ 336.457618] ? __kmalloc+0x490/0x910 [ 336.458142] ? ubifs_jnl_write_inode.cold+0x6f/0x878 [ubifs] [ 336.459033] ? writeback_sb_inodes+0x3a9/0x9a0 [ 336.459672] ? __writeback_inodes_wb+0xc8/0x170 [ 336.460330] ? wb_writeback+0x637/0x700 [ 336.460882] ? wb_workfn+0x8af/0xb80 [ 336.461398] ? process_one_work+0x467/0x9f0 [ 336.462004] ? worker_thread+0x34d/0x8e0 [ 336.462582] ? kthread+0x204/0x280 [ 336.463047] ? ret_from_fork+0x1f/0x30 [ 336.463570] ? create_prof_cpu_mask+0x30/0x30 [ 336.464185] ? ubi_eba_read_leb_sg+0x1f0/0x1f0 [ubi] [ 336.464917] ? hrtimer_active+0x9b/0x100 [ 336.465468] ? ubi_leb_write+0x22c/0x2f0 [ubi] [ 336.466130] ? ubifs_leb_write+0xf2/0x1b0 [ubifs] [ 336.466851] ? ubifs_wbuf_write_nolock+0x412/0x1280 [ubifs] [ 336.467686] ? write_head+0xdf/0x1c0 [ubifs] [ 336.468355] ? ubifs_jnl_write_inode.cold+0x3ec/0x878 [ubifs] [ 336.469183] ? ret_from_fork+0x1e/0x30 [ 336.469707] ? ubifs_jnl_write_data+0x660/0x660 [ubifs] [ 336.470497] ? unwind_next_frame+0x247/0xca0 [ 336.471095] ? ret_from_fork+0x1f/0x30 [ 336.471574] ? fprop_reflect_period_percpu.isra.0+0x1f/0x1b0 [ 336.472335] ? generic_writepages+0x93/0x140 [ 336.472933] ? __kasan_check_write+0x20/0x30 [ 336.473526] ? mutex_lock+0xa6/0x110 [ 336.474031] ? __mutex_lock_slowpath+0x30/0x30 [ 336.474662] ? ubifs_write_inode+0x1c3/0x290 [ubifs] [ 336.475446] ? __writeback_single_inode+0x6cc/0x880 [ 336.476155] ? wbc_attach_and_unlock_inode+0x2b6/0x400 [ 336.476891] ? writeback_sb_inodes+0x3a9/0x9a0 [ 336.477528] ? write_inode_now+0x1e0/0x1e0 [ 336.478119] ? __writeback_inodes_wb+0xc8/0x170 [ 336.478770] ? wb_writeback+0x637/0x700 [ 336.479326] ? __writeback_inodes_wb+0x170/0x170 [ 336.479992] ? current_work+0xa0/0xa0 [ 336.480524] ? _find_next_bit.constprop.0+0x3e/0x140 [ 336.481241] ? find_next_bit+0x18/0x30 [ 336.481780] ? cpumask_next+0x2f/0x40 [ 336.482312] ? 
wb_workfn+0x8af/0xb80 [ 336.482832] ? update_cfs_group+0x1e/0x1b0 [ 336.483421] ? inode_wait_for_writeback+0x60/0x60 [ 336.484106] ? schedule+0xb7/0x240 [ 336.484595] ? finish_task_switch+0x14e/0x9a0 [ 336.485225] ? __kasan_check_write+0x20/0x30 [ 336.485841] ? __schedule+0x6f4/0x1600 [ 336.486382] ? __kasan_check_read+0x1d/0x30 [ 336.486981] ? read_word_at_a_time+0x16/0x30 [ 336.487594] ? process_one_work+0x467/0x9f0 [ 336.488198] ? worker_thread+0x34d/0x8e0 [ 336.488762] ? rescuer_thread+0x820/0x820 [ 336.489344] ? kthread+0x204/0x280 [ 336.489839] ? kthread_bind+0x50/0x50 [ 336.490367] ? ret_from_fork+0x1f/0x30 [ 336.490913] [ 336.491138] Allocated by task 135: [ 336.491629] kasan_save_stack+0x23/0x60 [ 336.492189] __kasan_kmalloc.constprop.0+0x10b/0x120 [ 336.492898] kasan_kmalloc+0xd/0x20 [ 336.493401]
[PATCH v2 3/3] iommu/vt-d: Fix ineffective devTLB invalidation for subdevices
iommu_flush_dev_iotlb() is called to invalidate caches on devices. It only loops over the devices which are fully attached to the domain, so it is ineffective for sub-devices and leaves invalid cache entries on them. Fix it by adding a loop over the subdevices as well. Also, domain->has_iotlb_device needs to be updated when attaching to subdevices. Fixes: 67b8e02b5e761 ("iommu/vt-d: Aux-domain specific domain attach/detach") Signed-off-by: Liu Yi L --- drivers/iommu/intel/iommu.c | 63 +++-- 1 file changed, 47 insertions(+), 16 deletions(-) diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c index acfe0a5b955e..e97c5ac1d7fc 100644 --- a/drivers/iommu/intel/iommu.c +++ b/drivers/iommu/intel/iommu.c @@ -726,6 +726,8 @@ static int domain_update_device_node(struct dmar_domain *domain) return nid; } +static void domain_update_iotlb(struct dmar_domain *domain); + /* Some capabilities may be different across iommus */ static void domain_update_iommu_cap(struct dmar_domain *domain) { @@ -739,6 +741,8 @@ static void domain_update_iommu_cap(struct dmar_domain *domain) */ if (domain->nid == NUMA_NO_NODE) domain->nid = domain_update_device_node(domain); + + domain_update_iotlb(domain); } struct context_entry *iommu_context_addr(struct intel_iommu *iommu, u8 bus, @@ -1459,6 +1463,18 @@ iommu_support_dev_iotlb (struct dmar_domain *domain, struct intel_iommu *iommu, return NULL; } +static bool dev_iotlb_enabled(struct device_domain_info *info) +{ + struct pci_dev *pdev; + + if (!info->dev || !dev_is_pci(info->dev)) + return false; + + pdev = to_pci_dev(info->dev); + + return !!pdev->ats_enabled; +} + static void domain_update_iotlb(struct dmar_domain *domain) { struct device_domain_info *info; @@ -1466,17 +1482,20 @@ static void domain_update_iotlb(struct dmar_domain *domain) assert_spin_locked(&device_domain_lock); - list_for_each_entry(info, &domain->devices, link) { - struct pci_dev *pdev; - - if (!info->dev || !dev_is_pci(info->dev)) - continue; - - pdev = 
to_pci_dev(info->dev); - if (pdev->ats_enabled) { + list_for_each_entry(info, &domain->devices, link) + if (dev_iotlb_enabled(info)) { has_iotlb_device = true; break; } + + if (!has_iotlb_device) { + struct subdev_domain_info *sinfo; + + list_for_each_entry(sinfo, &domain->subdevices, link_domain) + if (dev_iotlb_enabled(get_domain_info(sinfo->pdev))) { + has_iotlb_device = true; + break; + } } domain->has_iotlb_device = has_iotlb_device; @@ -1557,25 +1576,37 @@ static void iommu_disable_dev_iotlb(struct device_domain_info *info) #endif } +static void __iommu_flush_dev_iotlb(struct device_domain_info *info, + u64 addr, unsigned int mask) +{ + u16 sid, qdep; + + if (!info || !info->ats_enabled) + return; + + sid = info->bus << 8 | info->devfn; + qdep = info->ats_qdep; + qi_flush_dev_iotlb(info->iommu, sid, info->pfsid, + qdep, addr, mask); +} + static void iommu_flush_dev_iotlb(struct dmar_domain *domain, u64 addr, unsigned mask) { - u16 sid, qdep; unsigned long flags; struct device_domain_info *info; + struct subdev_domain_info *sinfo; if (!domain->has_iotlb_device) return; spin_lock_irqsave(&device_domain_lock, flags); - list_for_each_entry(info, &domain->devices, link) { - if (!info->ats_enabled) - continue; + list_for_each_entry(info, &domain->devices, link) + __iommu_flush_dev_iotlb(info, addr, mask); - sid = info->bus << 8 | info->devfn; - qdep = info->ats_qdep; - qi_flush_dev_iotlb(info->iommu, sid, info->pfsid, - qdep, addr, mask); + list_for_each_entry(sinfo, &domain->subdevices, link_domain) { + __iommu_flush_dev_iotlb(get_domain_info(sinfo->pdev), + addr, mask); } spin_unlock_irqrestore(&device_domain_lock, flags); } -- 2.25.1
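The shape of the fix above, one flush helper applied to both the fully attached devices and the subdevices, can be sketched in plain C roughly as follows. This is an illustration under simplifying assumptions: the two lists are modelled as arrays, the `model_*` names are hypothetical, and the counter only represents an invalidation request.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct device_domain_info. */
struct model_dev_info {
	bool ats_enabled;
	unsigned int flushes;	/* devTLB invalidations sent to this device */
};

/* Hypothetical stand-in for struct dmar_domain with its two lists. */
struct model_domain {
	struct model_dev_info **devices;
	size_t ndevices;
	struct model_dev_info **subdevices;
	size_t nsubdevices;
};

/* Same guard as __iommu_flush_dev_iotlb(): skip devices without ATS. */
static void model_flush_one(struct model_dev_info *info)
{
	if (info == NULL || !info->ats_enabled)
		return;
	info->flushes++;
}

/* Walk both lists so that subdevices are no longer skipped. */
void model_flush_domain(struct model_domain *domain)
{
	size_t i;

	assert(domain != NULL);
	for (i = 0; i < domain->ndevices; i++)
		model_flush_one(domain->devices[i]);
	for (i = 0; i < domain->nsubdevices; i++)
		model_flush_one(domain->subdevices[i]);
}
```

Factoring the per-device work into one helper keeps the ATS check in a single place, which is the same design choice the patch makes.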
[PATCH v2 2/3] iommu/vt-d: Track device aux-attach with subdevice_domain_info
In the existing code, looping over all devices attached to a domain does not include sub-devices attached via iommu_aux_attach_device(). This was found while working on the patch below: there is no device in the domain->devices list, so the cap and ecap of the iommu unit cannot be obtained. But this domain actually has a subdevice which is attached in the aux manner. But it is tracked by the domain. This patch fixes that. https://lore.kernel.org/kvm/1599734733-6431-17-git-send-email-yi.l@intel.com/ This fix also goes beyond the patch above; such sub-device tracking is necessary for other cases, for example, flushing device_iotlb for a domain which has sub-devices attached in the auxiliary manner. Co-developed-by: Xin Zeng Signed-off-by: Xin Zeng Signed-off-by: Liu Yi L --- drivers/iommu/intel/iommu.c | 95 +++-- include/linux/intel-iommu.h | 16 +-- 2 files changed, 82 insertions(+), 29 deletions(-) diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c index a49afa11673c..acfe0a5b955e 100644 --- a/drivers/iommu/intel/iommu.c +++ b/drivers/iommu/intel/iommu.c @@ -1881,6 +1881,7 @@ static struct dmar_domain *alloc_domain(int flags) domain->flags |= DOMAIN_FLAG_USE_FIRST_LEVEL; domain->has_iotlb_device = false; INIT_LIST_HEAD(&domain->devices); + INIT_LIST_HEAD(&domain->subdevices); return domain; } @@ -2632,7 +2633,7 @@ static struct dmar_domain *dmar_insert_one_dev_info(struct intel_iommu *iommu, info->iommu = iommu; info->pasid_table = NULL; info->auxd_enabled = 0; - INIT_LIST_HEAD(&info->auxiliary_domains); + INIT_LIST_HEAD(&info->subdevices); if (dev && dev_is_pci(dev)) { struct pci_dev *pdev = to_pci_dev(info->dev); @@ -5172,33 +5173,61 @@ is_aux_domain(struct device *dev, struct iommu_domain *domain) domain->type == IOMMU_DOMAIN_UNMANAGED; } -static void auxiliary_link_device(struct dmar_domain *domain, - struct device *dev) +static inline struct subdev_domain_info * +lookup_subdev_info(struct dmar_domain *domain, struct device *dev) +{ + struct subdev_domain_info *sinfo; + + if 
(!list_empty(&domain->subdevices)) { + list_for_each_entry(sinfo, &domain->subdevices, link_domain) { + if (sinfo->pdev == dev) + return sinfo; + } + } + + return NULL; +} + +static int auxiliary_link_device(struct dmar_domain *domain, +struct device *dev) { struct device_domain_info *info = get_domain_info(dev); + struct subdev_domain_info *sinfo = lookup_subdev_info(domain, dev); assert_spin_locked(&device_domain_lock); if (WARN_ON(!info)) - return; + return -EINVAL; + + if (!sinfo) { + sinfo = kzalloc(sizeof(*sinfo), GFP_ATOMIC); + sinfo->domain = domain; + sinfo->pdev = dev; + list_add(&sinfo->link_phys, &info->subdevices); + list_add(&sinfo->link_domain, &domain->subdevices); + } - domain->auxd_refcnt++; - list_add(&domain->auxd, &info->auxiliary_domains); + return ++sinfo->users; } -static void auxiliary_unlink_device(struct dmar_domain *domain, - struct device *dev) +static int auxiliary_unlink_device(struct dmar_domain *domain, + struct device *dev) { struct device_domain_info *info = get_domain_info(dev); + struct subdev_domain_info *sinfo = lookup_subdev_info(domain, dev); + int ret; assert_spin_locked(&device_domain_lock); - if (WARN_ON(!info)) - return; + if (WARN_ON(!info || !sinfo || sinfo->users <= 0)) + return -EINVAL; - list_del(&domain->auxd); - domain->auxd_refcnt--; + ret = --sinfo->users; + if (!ret) { + list_del(&sinfo->link_phys); + list_del(&sinfo->link_domain); + kfree(sinfo); + } - if (!domain->auxd_refcnt && domain->default_pasid > 0) - ioasid_free(domain->default_pasid); + return ret; } static int aux_domain_add_dev(struct dmar_domain *domain, @@ -5227,6 +5256,19 @@ static int aux_domain_add_dev(struct dmar_domain *domain, } spin_lock_irqsave(&device_domain_lock, flags); + ret = auxiliary_link_device(domain, dev); + if (ret <= 0) + goto link_failed; + + /* +* Subdevices from the same physical device can be attached to the +* same domain. For such cases, only the first subdevice attachment +* needs to go through the full steps in this function. So if ret > +* 1, just goto out. 
+*/ + if (ret > 1) + goto out; + /* * iommu->lock must be held to attach domain to iommu and setup the * pasid entry for second level translation. @@ -5245,10
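The per-(domain, device) user counting introduced by the hunks above can be sketched as follows. This is a simplified model, not the driver code: the `model_*` names are hypothetical, list handling and locking are left out, and -1 stands in for the -EINVAL that the patch returns.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct subdev_domain_info. */
struct model_subdev {
	int users;	/* attachments of this physical device to this domain */
};

/* Returns the new user count; 1 means "first attachment, do the full setup". */
int model_link(struct model_subdev *sinfo)
{
	assert(sinfo != NULL);
	return ++sinfo->users;
}

/*
 * Returns the remaining user count, where 0 means "last user gone, tear
 * down", or -1 if there was nothing to unlink.
 */
int model_unlink(struct model_subdev *sinfo)
{
	assert(sinfo != NULL);
	if (sinfo->users <= 0)
		return -1;
	return --sinfo->users;
}
```

A caller in this model would run the expensive setup only when `model_link()` returns 1 and the teardown only when `model_unlink()` returns 0, which is how the `ret > 1` shortcut in aux_domain_add_dev() reads.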
[PATCH v2 0/3] iommu/vt-d: Misc fixes on scalable mode
This patchset fixes a bug regarding native SVM usage, as well as several bugs around subdevice tracking (subdevices are attached to a device in the auxiliary manner) and ineffective device_tlb flushes. Liu Yi L (3): iommu/vt-d: Move intel_iommu info from struct intel_svm to struct intel_svm_dev iommu/vt-d: Track device aux-attach with subdevice_domain_info iommu/vt-d: Fix ineffective devTLB invalidation for subdevices drivers/iommu/intel/iommu.c | 158 +++- drivers/iommu/intel/svm.c | 9 +- include/linux/intel-iommu.h | 18 ++-- 3 files changed, 135 insertions(+), 50 deletions(-) -- 2.25.1
[PATCH] x86/iommu: Fix two minimal issues in check_iommu_entries()
check_iommu_entries() checks for cyclic dependencies in iommu entries and breaks a cycle by setting x->depend to NULL. But this repair isn't correct if q is in front of p; an "EXECUTION ORDER INVALID!" report then follows. Fix it by NULLing the depend pointer of whichever entry is in front. The second issue is in the execution order report: the two entries are printed in reversed order. Fix that as well. Signed-off-by: Zhenzhong Duan --- arch/x86/kernel/pci-iommu_table.c | 7 +-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/arch/x86/kernel/pci-iommu_table.c b/arch/x86/kernel/pci-iommu_table.c index 2e9006c..40c8249 100644 --- a/arch/x86/kernel/pci-iommu_table.c +++ b/arch/x86/kernel/pci-iommu_table.c @@ -60,7 +60,10 @@ void __init check_iommu_entries(struct iommu_table_entry *start, printk(KERN_ERR "CYCLIC DEPENDENCY FOUND! %pS depends on %pS and vice-versa. BREAKING IT.\n", p->detect, q->detect); /* Heavy handed way..*/ - x->depend = NULL; + if (p > q) + q->depend = NULL; + else + p->depend = NULL; } } @@ -68,7 +71,7 @@ void __init check_iommu_entries(struct iommu_table_entry *start, q = find_dependents_of(p, finish, p); if (q && q > p) { printk(KERN_ERR "EXECUTION ORDER INVALID! %pS should be called before %pS!\n", - q->detect, p->detect); + p->detect, q->detect); } } } -- 1.8.3.1
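To illustrate the first fix, here is a small sketch of breaking a two-entry cycle by clearing the dependency of whichever entry sits earlier in the table. It is a simplification under stated assumptions: `depend` points directly at another entry here (the kernel table stores a detect-function pointer and looks the entry up), and the `model_*` names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct iommu_table_entry. */
struct model_entry {
	const char *name;
	struct model_entry *depend;	/* entry that must run before this one */
};

/* Break two-entry cycles by clearing the dependency of the earlier entry. */
void model_break_cycles(struct model_entry *start, struct model_entry *finish)
{
	struct model_entry *p, *q;

	assert(start != NULL && finish != NULL);
	for (p = start; p < finish; p++) {
		q = p->depend;
		if (q == NULL || q->depend != p)
			continue;	/* no mutual dependency here */
		if (p > q)
			q->depend = NULL;	/* q is in front */
		else
			p->depend = NULL;	/* p is in front */
	}
}
```

Entries run in table order, so the entry in front cannot wait for one behind it. Clearing the front entry's dependency leaves the remaining edge pointing from the later entry to the earlier one, which the existing order already satisfies.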
Re: [PATCH] Bluetooth: btrtl: Add null check in setup
Hi Abhishek, > btrtl_dev->ic_info is only available from the controller on cold boot > (the lmp subversion matches the device model and this is used to look up > the ic_info). On warm boots (firmware already loaded), > btrtl_dev->ic_info is null. > > Fixes: 05672a2c14a4 (Bluetooth: btrtl: Enable central-peripheral role) > Signed-off-by: Abhishek Pandit-Subedi > --- > > drivers/bluetooth/btrtl.c | 23 +-- > 1 file changed, 13 insertions(+), 10 deletions(-) > > diff --git a/drivers/bluetooth/btrtl.c b/drivers/bluetooth/btrtl.c > index 1abf6a4d672734f..978f3c773856b05 100644 > --- a/drivers/bluetooth/btrtl.c > +++ b/drivers/bluetooth/btrtl.c > @@ -719,16 +719,19 @@ int btrtl_setup_realtek(struct hci_dev *hdev) >*/ > set_bit(HCI_QUIRK_SIMULTANEOUS_DISCOVERY, &hdev->quirks); > > - /* Enable central-peripheral role (able to create new connections with > - * an existing connection in slave role). > - */ > - switch (btrtl_dev->ic_info->lmp_subver) { > - case RTL_ROM_LMP_8822B: > - set_bit(HCI_QUIRK_VALID_LE_STATES, &hdev->quirks); > - break; > - default: > - rtl_dev_dbg(hdev, "Central-peripheral role not enabled."); > - break; > + if (btrtl_dev->ic_info) { > + /* Enable central-peripheral role (able to create new > + * connections with an existing connection in slave role). > + */ > + switch (btrtl_dev->ic_info->lmp_subver) { > + case RTL_ROM_LMP_8822B: > + set_bit(HCI_QUIRK_VALID_LE_STATES, &hdev->quirks); > + break; > + default: > + rtl_dev_dbg(hdev, > + "Central-peripheral role not enabled."); > + break; > + } > } if (!btrtl_dev->ic_info) goto done; > > btrtl_free(btrtl_dev); Regards Marcel
Re: [PATCH v4] ovl: fix dentry leak in ovl_get_redirect
Thanks Viro. @Miklos, can you please advise? On 20/12/22 11:26 AM, Al Viro wrote: On Tue, Dec 22, 2020 at 11:06:26AM +0800, Liangyan wrote: Cc: Fixes: a6c606551141 ("ovl: redirect on rename-dir") Signed-off-by: Liangyan Reviewed-by: Joseph Qi Suggested-by: Al Viro Fine by me... I can put it through vfs.git#fixes, but IMO that would be better off in overlayfs tree.
Re: [PATCH] Revert "kbuild: avoid static_assert for genksyms"
On Sun, Dec 20, 2020 at 3:40 AM Masahiro Yamada wrote: > > This reverts commit 14dc3983b5dff513a90bd5a8cc90acaf7867c3d0. > > Marco Elver had sent a proper fix earlier, and also pointed out > corner cases: > > "I guess what you propose is simpler, but might still have corner cases > where we still get warnings. In particular, if some file (for whatever > reason) does not include build_bug.h and uses a raw _Static_assert(), > then we still get warnings. E.g. I see 1 user of raw _Static_assert() > (drivers/gpu/drm/amd/amdgpu/amdgv_sriovmsg.h )." > > I believe the raw use of _Static_assert() should be allowed, so this > should be fixed in genksyms. > > Even after commit 14dc3983b5df ("kbuild: avoid static_assert for > genksyms"), I confirmed the following test code emits the warning. > > >8 > #include > > _Static_assert((1 ?: 0), ""); > > void foo(void) { } > EXPORT_SYMBOL(foo); > >8 > > WARNING: modpost: EXPORT symbol "foo" [vmlinux] version generation failed, > symbol will not be versioned. > > Now that commit 869b91992bce ("genksyms: Ignore module scoped I updated the commit id in the mainline. 9ab55d7f240f Now, applied to linux-kbuild. > _Static_assert()") fixed this issue properly, the workaround should > be reverted. > > Link: https://lkml.org/lkml/2020/12/10/845 > Cc: Marco Elver > Signed-off-by: Masahiro Yamada > --- > > I will apply this after Marco's patch is pulled. > > > > include/linux/build_bug.h | 5 - > 1 file changed, 5 deletions(-) > > diff --git a/include/linux/build_bug.h b/include/linux/build_bug.h > index 7bb66e15b481..e3a0be2c90ad 100644 > --- a/include/linux/build_bug.h > +++ b/include/linux/build_bug.h > @@ -77,9 +77,4 @@ > #define static_assert(expr, ...) __static_assert(expr, ##__VA_ARGS__, #expr) > #define __static_assert(expr, msg, ...) _Static_assert(expr, msg) > > -#ifdef __GENKSYMS__ > -/* genksyms gets confused by _Static_assert */ > -#define _Static_assert(expr, ...) 
> -#endif > - > #endif /* _LINUX_BUILD_BUG_H */ > -- > 2.27.0 > -- Best Regards Masahiro Yamada
drivers/acpi/x86/s2idle.c:395:13: sparse: sparse: restricted suspend_state_t degrades to integer
tree: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master head: 614cb5894306cfa2c7d9b6168182876ff5948735 commit: fef98671194be005853cbbf51b164a3927589b64 ACPI: PM: s2idle: Move x86-specific code to the x86 directory date: 5 days ago config: i386-randconfig-s001-20201221 (attached as .config) compiler: gcc-9 (Debian 9.3.0-15) 9.3.0 reproduce: # apt-get install sparse # sparse version: v0.6.3-184-g1b896707-dirty # https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=fef98671194be005853cbbf51b164a3927589b64 git remote add linus https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git git fetch --no-tags linus master git checkout fef98671194be005853cbbf51b164a3927589b64 # save the attached .config to linux build tree make W=1 C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' ARCH=i386 If you fix the issue, kindly add following tag as appropriate Reported-by: kernel test robot "sparse warnings: (new ones prefixed by >>)" >> drivers/acpi/x86/s2idle.c:395:13: sparse: sparse: restricted suspend_state_t >> degrades to integer drivers/acpi/x86/s2idle.c:395:33: sparse: sparse: restricted suspend_state_t degrades to integer vim +395 drivers/acpi/x86/s2idle.c 348 349 static int lps0_device_attach(struct acpi_device *adev, 350const struct acpi_device_id *not_used) 351 { 352 union acpi_object *out_obj; 353 354 if (lps0_device_handle) 355 return 0; 356 357 if (!(acpi_gbl_FADT.flags & ACPI_FADT_LOW_POWER_S0)) 358 return 0; 359 360 if (acpi_s2idle_vendor_amd()) { 361 guid_parse(ACPI_LPS0_DSM_UUID_AMD, _dsm_guid); 362 out_obj = acpi_evaluate_dsm(adev->handle, _dsm_guid, 0, 0, NULL); 363 rev_id = 0; 364 } else { 365 guid_parse(ACPI_LPS0_DSM_UUID, _dsm_guid); 366 out_obj = acpi_evaluate_dsm(adev->handle, _dsm_guid, 1, 0, NULL); 367 rev_id = 1; 368 } 369 370 /* Check if the _DSM is present and as expected. 
*/ 371 if (!out_obj || out_obj->type != ACPI_TYPE_BUFFER) { 372 acpi_handle_debug(adev->handle, 373"_DSM function 0 evaluation failed\n"); 374 return 0; 375 } 376 377 lps0_dsm_func_mask = *(char *)out_obj->buffer.pointer; 378 379 ACPI_FREE(out_obj); 380 381 acpi_handle_debug(adev->handle, "_DSM function mask: 0x%x\n", 382lps0_dsm_func_mask); 383 384 lps0_device_handle = adev->handle; 385 386 if (acpi_s2idle_vendor_amd()) 387 lpi_device_get_constraints_amd(); 388 else 389 lpi_device_get_constraints(); 390 391 /* 392 * Use suspend-to-idle by default if the default suspend mode was not 393 * set from the command line. 394 */ > 395 if (mem_sleep_default > PM_SUSPEND_MEM && > !acpi_sleep_default_s3) 396 mem_sleep_current = PM_SUSPEND_TO_IDLE; 397 398 /* 399 * Some LPS0 systems, like ASUS Zenbook UX430UNR/i7-8550U, require the 400 * EC GPE to be enabled while suspended for certain wakeup devices to 401 * work, so mark it as wakeup-capable. 402 */ 403 acpi_ec_mark_gpe_for_wake(); 404 405 return 0; 406 } 407 --- 0-DAY CI Kernel Test Service, Intel Corporation https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org .config.gz Description: application/gzip
[PATCH] block/blk-merge: remove the next_bvec label in __blk_bios_map_sg()
Remove the next_bvec label in __blk_bios_map_sg() to simplify the bvec traversal logic. Signed-off-by: sh --- block/blk-merge.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/block/blk-merge.c b/block/blk-merge.c index 808768f6b174..aa113cbc0f35 100644 --- a/block/blk-merge.c +++ b/block/blk-merge.c @@ -494,15 +494,15 @@ static int __blk_bios_map_sg(struct request_queue *q, struct bio *bio, * to bio */ if (new_bio && - __blk_segment_map_sg_merge(q, &bvec, &bvprv, sg)) - goto next_bvec; + __blk_segment_map_sg_merge(q, &bvec, &bvprv, sg)) { + new_bio = false; + continue; + } if (bvec.bv_offset + bvec.bv_len <= PAGE_SIZE) nsegs += __blk_bvec_map_sg(bvec, sglist, sg); else nsegs += blk_bvec_map_sg(q, &bvec, sglist, sg); - next_bvec: - new_bio = false; } if (likely(bio->bi_iter.bi_size)) { bvprv = bvec; -- 2.25.1
Re: [PATCH v2 15/48] opp: Support set_opp() customization without requiring to use regulators
On 17-12-20, 21:06, Dmitry Osipenko wrote: > Support set_opp() customization without requiring to use regulators. This > is needed by drivers which want to use dev_pm_opp_set_rate() for changing > rates of multiple clocks and don't need to touch a regulator. > > One example is NVIDIA Tegra30/114 SoCs which have two sibling 3D hardware > units which should use the same clock rate, while voltage > scaling is done using a power domain. In this case OPP table doesn't have > a regulator, causing a NULL dereference in _set_opp_custom(). > > Signed-off-by: Dmitry Osipenko > --- > drivers/opp/core.c | 16 > 1 file changed, 12 insertions(+), 4 deletions(-) > > diff --git a/drivers/opp/core.c b/drivers/opp/core.c > index 3d02fe33630b..625dae7a5ecb 100644 > --- a/drivers/opp/core.c > +++ b/drivers/opp/core.c > @@ -828,17 +828,25 @@ static int _set_opp_custom(const struct opp_table > *opp_table, > struct dev_pm_opp_supply *old_supply, > struct dev_pm_opp_supply *new_supply) > { > - struct dev_pm_set_opp_data *data; > + struct dev_pm_set_opp_data *data, tmp_data; > + unsigned int regulator_count; > int size; > > - data = opp_table->set_opp_data; > + if (opp_table->set_opp_data) { > + data = opp_table->set_opp_data; > + regulator_count = opp_table->regulator_count; > + } else { > + data = &tmp_data; > + regulator_count = 0; > + } > + We should use the same structure; you can add some checks, but don't replace the structure altogether. > data->regulators = opp_table->regulators; > - data->regulator_count = opp_table->regulator_count; > + data->regulator_count = regulator_count; > data->clk = opp_table->clk; > data->dev = dev; > > data->old_opp.rate = old_freq; > - size = sizeof(*old_supply) * opp_table->regulator_count; > + size = sizeof(*old_supply) * regulator_count; > if (!old_supply) > memset(data->old_opp.supplies, 0, size); > else -- viresh
RE: [PATCH 2/3] aspeed-video: clear spurious interrupt bits unconditionally
> -Original Message- > From: Zev Weiss > Sent: Wednesday, December 23, 2020 11:54 AM > To: Ryan Chen > Cc: Joel Stanley ; Eddie James ; > Mauro Carvalho Chehab ; Andrew Jeffery > ; linux-me...@vger.kernel.org; OpenBMC Maillist > ; Linux ARM > ; linux-aspeed > ; Linux Kernel Mailing List > ; Jae Hyun Yoo > Subject: Re: [PATCH 2/3] aspeed-video: clear spurious interrupt bits > unconditionally > > On Tue, Dec 22, 2020 at 08:53:33PM CST, Ryan Chen wrote: > >> -Original Message- > >> From: Joel Stanley > >> Sent: Wednesday, December 23, 2020 9:07 AM > >> To: Zev Weiss ; Ryan Chen > >> > >> Cc: Eddie James ; Mauro Carvalho Chehab > >> ; Andrew Jeffery ; > >> linux-me...@vger.kernel.org; OpenBMC Maillist > >> ; Linux ARM > >> ; linux-aspeed > >> ; Linux Kernel Mailing List > >> ; Jae Hyun Yoo > >> > >> Subject: Re: [PATCH 2/3] aspeed-video: clear spurious interrupt bits > >> unconditionally > >> > >> On Tue, 22 Dec 2020 at 19:14, Zev Weiss wrote: > >> > > >> > On Mon, Dec 21, 2020 at 10:47:37PM CST, Joel Stanley wrote: > >> > >On Tue, 15 Dec 2020 at 02:46, Zev Weiss > wrote: > >> > >> > >> > >> Instead of testing and conditionally clearing them one by one, > >> > >> we can instead just unconditionally clear them all at once. > >> > >> > >> > >> Signed-off-by: Zev Weiss > >> > > > >> > >I had a poke at the assembly and it looks like GCC is clearing the > >> > >bits unconditionally anyway, so removing the tests provides no change. > >> > > > >> > >Combining them is a good further optimization. > >> > > > >> > >Reviewed-by: Joel Stanley > >> > > > >> > >A question unrelated to this patch: Do you know why the driver > >> > >doesn't clear the status bits in the interrupt handler? I would > >> > >expect it to write the value of sts back to the register to ack > >> > >the pending interrupt. 
> >> > > > >> > > >> > No, I don't, and I was sort of wondering the same thing actually -- > >> > I'm not deeply familiar with this hardware or driver though, so I > >> > was a bit hesitant to start messing with things. (Though maybe > >> > doing so would address the "stickiness" aspect when it does > >> > manifest.) Perhaps Eddie or Jae can shed some light here? > >> > >> I think you're onto something here - this would be why the status > >> bits seem to stick until the device is reset. > >> > >> Until Aspeed can clarify if this is a hardware or software issue, I > >> suggest we ack the bits and log a message when we see them, instead > >> of always ignoring them without taking any action. > >> > >> Can you write a patch that changes the interrupt handler to ack > >> status bits as it handles each of them? > >> > >Hello Zev, before the patch, do you met issue with irq handler? > >[continuous incoming?] > > > >In aspeed_video_irq handler should only handle enable interrupt expected. > > u32 sts = aspeed_video_read(video, VE_INTERRUPT_STATUS); > > + sts &= aspeed_video_read(video, VE_INTERRUPT_CTRL); > > > >Ryan > > > > Hi Ryan, > > Prior to any of these patches I encountered a problem pretty much exactly like > what Jae described in his commit message in 65d270acb2d (but the kernel I > was running included that patch). Adding the diagnostic in patch #1 of this > series showed that it was apparently the same problem, just with a different > interrupt that Jae's patch didn't include. > > From what you wrote above, I gather that it is in fact expected for the > hardware to assert interrupts that aren't enabled in VE_INTERRUPT_CTRL? > If so, I guess something like that would obviate the need for both Jae's > earlier > patch and this whole series. > Yes, I expected handle enabled in VE_INTERRUPT_CTRL. 
> I think the question Joel raised is somewhat independent though -- if the > VE_INTERRUPT_STATUS register asserts interrupts we're not actually using, > should the driver acknowledge them anyway or just leave them alone? My opinion is to leave them alone and ignore them. > (Though if we're just going to ignore them anyway maybe it doesn't ultimately > matter very much.) > > > Zev
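The masking Ryan suggests in this thread (`sts &= aspeed_video_read(video, VE_INTERRUPT_CTRL)`) can be illustrated with a small standalone sketch. It only models the bit arithmetic on two plain integers; the `model_*` names are hypothetical, and whether the leftover bits should also be acknowledged is exactly the open question discussed above, so the sketch takes no position on it.

```c
#include <assert.h>
#include <stdint.h>

/* Result of splitting a raw status word by the enable mask. */
struct model_irq_split {
	uint32_t enabled;	/* asserted and enabled: the handler acts on these */
	uint32_t spurious;	/* asserted but not enabled */
};

/* status models VE_INTERRUPT_STATUS, ctrl models VE_INTERRUPT_CTRL. */
struct model_irq_split model_split_status(uint32_t status, uint32_t ctrl)
{
	struct model_irq_split split;

	split.enabled = status & ctrl;
	split.spurious = status & ~ctrl;
	return split;
}
```

For example, with a status word of 0x0B and an enable mask of 0x03, the handler would act on 0x03 and bit 0x08 would be the spurious one.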
[PATCH v2] HID: Add Wireless Radio Control feature for Chicony devices
Some Chicony keyboards support an airplane mode hotkey (Fn+F2) through the "Wireless Radio Control" feature, for example the wireless keyboard [04f2:1236] shipped with an ASUS all-in-one desktop. After consulting Chicony about this hotkey, we learned that the device sends a report with 0x11 as the report ID and 0x1 as the value when the key is pressed down. This patch maps the event to KEY_RFKILL. Signed-off-by: Jian-Hong Pan --- v2: Remove the duplicated key pressed check. drivers/hid/hid-chicony.c | 55 +++ drivers/hid/hid-ids.h | 1 + 2 files changed, 56 insertions(+) diff --git a/drivers/hid/hid-chicony.c b/drivers/hid/hid-chicony.c index 3f0ed6a95223..ca556d39da2a 100644 --- a/drivers/hid/hid-chicony.c +++ b/drivers/hid/hid-chicony.c @@ -21,6 +21,39 @@ #include "hid-ids.h" +#define CH_WIRELESS_CTL_REPORT_ID 0x11 + +static int ch_report_wireless(struct hid_report *report, u8 *data, int size) +{ + struct hid_device *hdev = report->device; + struct input_dev *input; + + if (report->id != CH_WIRELESS_CTL_REPORT_ID || report->maxfield != 1) + return 0; + + input = report->field[0]->hidinput->input; + if (!input) { + hid_warn(hdev, "can't find wireless radio control's input"); + return 0; + } + + input_report_key(input, KEY_RFKILL, 1); + input_sync(input); + input_report_key(input, KEY_RFKILL, 0); + input_sync(input); + + return 1; +} + +static int ch_raw_event(struct hid_device *hdev, + struct hid_report *report, u8 *data, int size) +{ + if (report->application == HID_GD_WIRELESS_RADIO_CTLS) + return ch_report_wireless(report, data, size); + + return 0; +} + #define ch_map_key_clear(c) hid_map_usage_clear(hi, usage, bit, max, \ EV_KEY, (c)) static int ch_input_mapping(struct hid_device *hdev, struct hid_input *hi, @@ -77,10 +110,30 @@ static __u8 *ch_switch12_report_fixup(struct hid_device *hdev, __u8 *rdesc, return rdesc; } +static int ch_probe(struct hid_device *hdev, const struct hid_device_id *id) +{ + int ret; + + hdev->quirks |= HID_QUIRK_INPUT_PER_APP; + ret = hid_parse(hdev); + if 
(ret) { + hid_err(hdev, "Chicony hid parse failed: %d\n", ret); + return ret; + } + + ret = hid_hw_start(hdev, HID_CONNECT_DEFAULT); + if (ret) { + hid_err(hdev, "Chicony hw start failed: %d\n", ret); + return ret; + } + + return 0; +} static const struct hid_device_id ch_devices[] = { { HID_USB_DEVICE(USB_VENDOR_ID_CHICONY, USB_DEVICE_ID_CHICONY_TACTICAL_PAD) }, { HID_USB_DEVICE(USB_VENDOR_ID_CHICONY, USB_DEVICE_ID_CHICONY_WIRELESS2) }, + { HID_USB_DEVICE(USB_VENDOR_ID_CHICONY, USB_DEVICE_ID_CHICONY_WIRELESS3) }, { HID_USB_DEVICE(USB_VENDOR_ID_CHICONY, USB_DEVICE_ID_CHICONY_ACER_SWITCH12) }, { } }; @@ -91,6 +144,8 @@ static struct hid_driver ch_driver = { .id_table = ch_devices, .report_fixup = ch_switch12_report_fixup, .input_mapping = ch_input_mapping, + .probe = ch_probe, + .raw_event = ch_raw_event, }; module_hid_driver(ch_driver); diff --git a/drivers/hid/hid-ids.h b/drivers/hid/hid-ids.h index 4c5f23640f9c..06d90301a3dc 100644 --- a/drivers/hid/hid-ids.h +++ b/drivers/hid/hid-ids.h @@ -270,6 +270,7 @@ #define USB_DEVICE_ID_CHICONY_PIXART_USB_OPTICAL_MOUSE 0x1053 #define USB_DEVICE_ID_CHICONY_PIXART_USB_OPTICAL_MOUSE20x0939 #define USB_DEVICE_ID_CHICONY_WIRELESS20x1123 +#define USB_DEVICE_ID_CHICONY_WIRELESS30x1236 #define USB_DEVICE_ID_ASUS_AK1D0x1125 #define USB_DEVICE_ID_CHICONY_TOSHIBA_WT10A0x1408 #define USB_DEVICE_ID_CHICONY_ACER_SWITCH120x1421 -- 2.29.2
Re: [PATCH v2 28/48] soc/tegra: Introduce core power domain driver
On 22-12-20, 22:39, Dmitry Osipenko wrote:
> 22.12.2020 22:21, Dmitry Osipenko wrote:
> >>> + if (IS_ERR(opp)) {
> >>> + dev_err(>dev, "failed to find OPP for level %u: %pe\n",
> >>> + level, opp);
> >>> + return PTR_ERR(opp);
> >>> + }
> >>> +
> >>> + err = dev_pm_opp_set_voltage(>dev, opp);
> >> IIUC, you implemented this callback because you want to use the voltage
> >> triplet present in the OPP table ?
> >>
> >> And so you are setting the regulator ("power") later in this patch ?
> > yes
> >
> >> I am not in favor of implementing this routine, as it just adds a wrapper
> >> above the regulator API. What you should be doing rather is get the
> >> regulator by yourself here (instead of depending on the OPP core). And
> >> then you can do dev_pm_opp_get_voltage() here and set the voltage
> >> yourself. You may want to implement a version supporting triplet here
> >> though for the same.
> >>
> >> And you won't require the sync version of the API as well then.
> >>
> > That's what I initially did for this driver. I don't mind to revert back
> > to the initial variant in v3, it appeared to me that it will be nicer
> > and cleaner to have OPP API managing everything here.
>
> I forgot one important detail (why the initial variant wasn't good)..
> OPP entries that have unsupportable voltages should be filtered out and
> OPP core performs the filtering only if regulator is assigned to the OPP
> table.
>
> If regulator is assigned to the OPP table, then we need to use OPP API
> for driving the regulator, hence that's why I added
> dev_pm_opp_sync_regulators() and dev_pm_opp_set_voltage().
>
> Perhaps it should be possible to add dev_pm_opp_get_regulator() that

What's wrong with getting the regulator in the driver as well ? Apart
from the OPP core ?

> will return the OPP table regulator in order to allow driver to use the
> regulator directly. But I'm not sure whether this is a much better
> option than the opp_sync_regulators() and opp_set_voltage() APIs.

set_voltage() is still fine as there is some data that the OPP core has,
but sync_regulator() has nothing to do with the OPP core. And this may
lead to more wrapper helpers in the OPP core, which I am afraid of.

And so even if it is not the best, I would like the OPP core to provide
the data and not get into this. Of course there is an exception to this,
opp_set_rate.

--
viresh
[PATCH v2] net/ncsi: Use real net-device for response handler
When aggregating ncsi interfaces and dedicated interfaces to bond
interfaces, the ncsi response handler will use the wrong net device to
find ncsi_dev, so that the ncsi interface will not work properly.
Here, we use the original net device to fix it.

Fixes: 138635cc27c9 ("net/ncsi: NCSI response packet handler")
Signed-off-by: John Wang
---
v2: Use orig_dev instead of pt->dev
---
 net/ncsi/ncsi-rsp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ncsi/ncsi-rsp.c b/net/ncsi/ncsi-rsp.c
index a94bb59793f0..e1c6bb4ab98f 100644
--- a/net/ncsi/ncsi-rsp.c
+++ b/net/ncsi/ncsi-rsp.c
@@ -1120,7 +1120,7 @@ int ncsi_rcv_rsp(struct sk_buff *skb, struct net_device *dev,
 	int payload, i, ret;

 	/* Find the NCSI device */
-	nd = ncsi_find_dev(dev);
+	nd = ncsi_find_dev(orig_dev);
 	ndp = nd ? TO_NCSI_DEV_PRIV(nd) : NULL;
 	if (!ndp)
 		return -ENODEV;
--
2.25.1
[PATCH] drm/hisilicon: Add load and unload callback functions
Implement the load and unload callbacks of the drm_driver structure, so
that there is no need to call load in the hibmc_pci_probe function and
unload in the hibmc_pci_remove function.

Signed-off-by: Tian Tao
---
 drivers/gpu/drm/hisilicon/hibmc/hibmc_drm_drv.c | 17 +++++++----------
 1 file changed, 7 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/hisilicon/hibmc/hibmc_drm_drv.c b/drivers/gpu/drm/hisilicon/hibmc/hibmc_drm_drv.c
index 0d4e902..109ca87 100644
--- a/drivers/gpu/drm/hisilicon/hibmc/hibmc_drm_drv.c
+++ b/drivers/gpu/drm/hisilicon/hibmc/hibmc_drm_drv.c
@@ -27,6 +27,9 @@

 DEFINE_DRM_GEM_FOPS(hibmc_fops);

+static int hibmc_load(struct drm_device *dev, unsigned long flags);
+static void hibmc_unload(struct drm_device *dev);
+
 static irqreturn_t hibmc_drm_interrupt(int irq, void *arg)
 {
 	struct drm_device *dev = (struct drm_device *)arg;
@@ -63,6 +66,8 @@ static const struct drm_driver hibmc_driver = {
 	.dumb_map_offset	= drm_gem_vram_driver_dumb_mmap_offset,
 	.gem_prime_mmap		= drm_gem_prime_mmap,
 	.irq_handler		= hibmc_drm_interrupt,
+	.load			= hibmc_load,
+	.unload			= hibmc_unload,
 };

 static int __maybe_unused hibmc_pm_suspend(struct device *dev)
@@ -248,7 +253,7 @@ static int hibmc_hw_init(struct hibmc_drm_private *priv)
 	return 0;
 }

-static int hibmc_unload(struct drm_device *dev)
+static void hibmc_unload(struct drm_device *dev)
 {
 	drm_atomic_helper_shutdown(dev);

@@ -256,11 +261,9 @@ static int hibmc_unload(struct drm_device *dev)
 		drm_irq_uninstall(dev);

 	pci_disable_msi(dev->pdev);
-
-	return 0;
 }

-static int hibmc_load(struct drm_device *dev)
+static int hibmc_load(struct drm_device *dev, unsigned long flags)
 {
 	struct hibmc_drm_private *priv = to_hibmc_drm_private(dev);
 	int ret;
@@ -335,12 +338,6 @@ static int hibmc_pci_probe(struct pci_dev *pdev,
 		goto err_return;
 	}

-	ret = hibmc_load(dev);
-	if (ret) {
-		drm_err(dev, "failed to load hibmc: %d\n", ret);
-		goto err_return;
-	}
-
 	ret = drm_dev_register(dev, 0);
 	if (ret) {
 		drm_err(dev, "failed to register drv for userspace access: %d\n",
--
2.7.4
[PATCH v13 1/6] locking/qspinlock: Rename mcs lock/unlock macros and make them more generic
The mcs unlock macro (arch_mcs_lock_handoff) should accept the value to be stored into the lock argument as another argument. This allows using the same macro in cases where the value to be stored when passing the lock is different from 1. Signed-off-by: Alex Kogan Reviewed-by: Steve Sistare Reviewed-by: Waiman Long --- arch/arm/include/asm/mcs_spinlock.h | 6 +++--- include/asm-generic/mcs_spinlock.h | 4 ++-- kernel/locking/mcs_spinlock.h | 18 +- kernel/locking/qspinlock.c | 4 ++-- kernel/locking/qspinlock_paravirt.h | 2 +- 5 files changed, 17 insertions(+), 17 deletions(-) diff --git a/arch/arm/include/asm/mcs_spinlock.h b/arch/arm/include/asm/mcs_spinlock.h index 529d2cf4d06f..1eb4d733459c 100644 --- a/arch/arm/include/asm/mcs_spinlock.h +++ b/arch/arm/include/asm/mcs_spinlock.h @@ -6,7 +6,7 @@ #include /* MCS spin-locking. */ -#define arch_mcs_spin_lock_contended(lock) \ +#define arch_mcs_spin_wait(lock) \ do { \ /* Ensure prior stores are observed before we enter wfe. */ \ smp_mb(); \ @@ -14,9 +14,9 @@ do { \ wfe(); \ } while (0)\ -#define arch_mcs_spin_unlock_contended(lock) \ +#define arch_mcs_lock_handoff(lock, val) \ do { \ - smp_store_release(lock, 1); \ + smp_store_release((lock), (val)); \ dsb_sev(); \ } while (0) diff --git a/include/asm-generic/mcs_spinlock.h b/include/asm-generic/mcs_spinlock.h index 10cd4ffc6ba2..f933d99c63e0 100644 --- a/include/asm-generic/mcs_spinlock.h +++ b/include/asm-generic/mcs_spinlock.h @@ -4,8 +4,8 @@ /* * Architectures can define their own: * - * arch_mcs_spin_lock_contended(l) - * arch_mcs_spin_unlock_contended(l) + * arch_mcs_spin_wait(l) + * arch_mcs_lock_handoff(l, val) * * See kernel/locking/mcs_spinlock.c. 
*/ diff --git a/kernel/locking/mcs_spinlock.h b/kernel/locking/mcs_spinlock.h index 5e10153b4d3c..904ba5d0f3f4 100644 --- a/kernel/locking/mcs_spinlock.h +++ b/kernel/locking/mcs_spinlock.h @@ -21,7 +21,7 @@ struct mcs_spinlock { int count; /* nesting count, see qspinlock.c */ }; -#ifndef arch_mcs_spin_lock_contended +#ifndef arch_mcs_spin_wait /* * Using smp_cond_load_acquire() provides the acquire semantics * required so that subsequent operations happen after the @@ -29,20 +29,20 @@ struct mcs_spinlock { * ARM64 would like to do spin-waiting instead of purely * spinning, and smp_cond_load_acquire() provides that behavior. */ -#define arch_mcs_spin_lock_contended(l) \ -do { \ - smp_cond_load_acquire(l, VAL); \ +#define arch_mcs_spin_wait(l) \ +do { \ + smp_cond_load_acquire(l, VAL); \ } while (0) #endif -#ifndef arch_mcs_spin_unlock_contended +#ifndef arch_mcs_lock_handoff /* * smp_store_release() provides a memory barrier to ensure all * operations in the critical section has been completed before * unlocking. */ -#define arch_mcs_spin_unlock_contended(l) \ - smp_store_release((l), 1) +#define arch_mcs_lock_handoff(l, val) \ + smp_store_release((l), (val)) #endif /* @@ -91,7 +91,7 @@ void mcs_spin_lock(struct mcs_spinlock **lock, struct mcs_spinlock *node) WRITE_ONCE(prev->next, node); /* Wait until the lock holder passes the lock down. */ - arch_mcs_spin_lock_contended(>locked); + arch_mcs_spin_wait(>locked); } /* @@ -115,7 +115,7 @@ void mcs_spin_unlock(struct mcs_spinlock **lock, struct mcs_spinlock *node) } /* Pass lock to next waiter. 
*/ - arch_mcs_spin_unlock_contended(>locked); + arch_mcs_lock_handoff(>locked, 1); } #endif /* __LINUX_MCS_SPINLOCK_H */ diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c index cbff6ba53d56..435d696f9250 100644 --- a/kernel/locking/qspinlock.c +++ b/kernel/locking/qspinlock.c @@ -471,7 +471,7 @@ void queued_spin_lock_slowpath(struct qspinlock *lock, u32 val) WRITE_ONCE(prev->next, node); pv_wait_node(node, prev); - arch_mcs_spin_lock_contended(>locked); +
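To illustrate why the handoff macro gains a value argument, here is a minimal user-space sketch using C11 atomics. It is not the kernel implementation: the names mcs_spin_wait() and mcs_lock_handoff() are hypothetical stand-ins for arch_mcs_spin_wait() and arch_mcs_lock_handoff(), and the acquire/release pairing is an assumption modelled on the generic versions quoted in the diff above.

```c
#include <stdatomic.h>

/*
 * Hypothetical counterpart of arch_mcs_spin_wait(): spin until the
 * previous lock holder stores a non-zero value, and return that value.
 */
static unsigned int mcs_spin_wait(atomic_uint *locked)
{
	unsigned int v;

	while ((v = atomic_load_explicit(locked, memory_order_acquire)) == 0)
		;	/* busy-wait; the kernel may use wfe or similar here */
	return v;
}

/*
 * Hypothetical counterpart of arch_mcs_lock_handoff(): the caller picks
 * the value to publish instead of it being hard-coded to 1.
 */
static void mcs_lock_handoff(atomic_uint *locked, unsigned int val)
{
	atomic_store_explicit(locked, val, memory_order_release);
}
```

With the value as a parameter, a waiter can learn more than "the lock is yours" from the word it was spinning on, which is what the later patches in this series seem to rely on.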
[PATCH v13 3/6] locking/qspinlock: Introduce CNA into the slow path of qspinlock
In CNA, spinning threads are organized in two queues, a primary queue for threads running on the same node as the current lock holder, and a secondary queue for threads running on other nodes. After acquiring the MCS lock and before acquiring the spinlock, the MCS lock holder checks whether the next waiter in the primary queue (if exists) is running on the same NUMA node. If it is not, that waiter is detached from the main queue and moved into the tail of the secondary queue. This way, we gradually filter the primary queue, leaving only waiters running on the same preferred NUMA node. For more details, see https://arxiv.org/abs/1810.05600. Note that this variant of CNA may introduce starvation by continuously passing the lock between waiters in the main queue. This issue will be addressed later in the series. Enabling CNA is controlled via a new configuration option (NUMA_AWARE_SPINLOCKS). By default, the CNA variant is patched in at the boot time only if we run on a multi-node machine in native environment and the new config is enabled. (For the time being, the patching requires CONFIG_PARAVIRT_SPINLOCKS to be enabled as well. However, this should be resolved once static_call() is available.) This default behavior can be overridden with the new kernel boot command-line option "numa_spinlock=on/off" (default is "auto"). 
Signed-off-by: Alex Kogan Reviewed-by: Steve Sistare Reviewed-by: Waiman Long --- .../admin-guide/kernel-parameters.txt | 10 + arch/x86/Kconfig | 20 ++ arch/x86/include/asm/qspinlock.h | 4 + arch/x86/kernel/alternative.c | 4 + kernel/locking/mcs_spinlock.h | 2 +- kernel/locking/qspinlock.c| 42 ++- kernel/locking/qspinlock_cna.h| 336 ++ 7 files changed, 413 insertions(+), 5 deletions(-) create mode 100644 kernel/locking/qspinlock_cna.h diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index 44fde25bb221..a6ae826c6076 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -3430,6 +3430,16 @@ numa_balancing= [KNL,X86] Enable or disable automatic NUMA balancing. Allowed values are enable and disable + numa_spinlock= [NUMA, PV_OPS] Select the NUMA-aware variant + of spinlock. The options are: + auto - Enable this variant if running on a multi-node + machine in native environment. + on - Unconditionally enable this variant. + off - Unconditionally disable this variant. + + Not specifying this option is equivalent to + numa_spinlock=auto. + numa_zonelist_order= [KNL, BOOT] Select zonelist order for NUMA. 'node', 'default' can be specified This can be set from sysctl after boot. diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index fbf26e0f7a6a..8c5ecbe5bcd6 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -1565,6 +1565,26 @@ config NUMA Otherwise, you should say N. +config NUMA_AWARE_SPINLOCKS + bool "Numa-aware spinlocks" + depends on NUMA + depends on QUEUED_SPINLOCKS + depends on 64BIT + # For now, we depend on PARAVIRT_SPINLOCKS to make the patching work. + # This is awkward, but hopefully would be resolved once static_call() + # is available. + depends on PARAVIRT_SPINLOCKS + default y + help + Introduce NUMA (Non Uniform Memory Access) awareness into + the slow path of spinlocks. 
+ + In this variant of qspinlock, the kernel will try to keep the lock + on the same node, thus reducing the number of remote cache misses, + while trading some of the short term fairness for better performance. + + Say N if you want absolute first come first serve fairness. + config AMD_NUMA def_bool y prompt "Old style AMD Opteron NUMA detection" diff --git a/arch/x86/include/asm/qspinlock.h b/arch/x86/include/asm/qspinlock.h index d86ab942219c..21d09e8db979 100644 --- a/arch/x86/include/asm/qspinlock.h +++ b/arch/x86/include/asm/qspinlock.h @@ -27,6 +27,10 @@ static __always_inline u32 queued_fetch_set_pending_acquire(struct qspinlock *lo return val; } +#ifdef CONFIG_NUMA_AWARE_SPINLOCKS +extern void cna_configure_spin_lock_slowpath(void); +#endif + #ifdef CONFIG_PARAVIRT_SPINLOCKS extern void native_queued_spin_lock_slowpath(struct qspinlock *lock, u32 val); extern void __pv_init_lock_hash(void); diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c index 2400ad62f330..e04f48c2191d 100644 --- a/arch/x86/kernel/alternative.c +++ b/arch/x86/kernel/alternative.c @@ -741,6 +741,10 @@ void __init
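The queue filtering described in the commit message can be sketched in user space roughly as follows. This is a simplified model under stated assumptions, not the code from qspinlock_cna.h: struct waiter, order_once() and the explicit head/tail pointers for the secondary queue are hypothetical, whereas the actual patch keeps the secondary queue encoded in the MCS node.

```c
#include <stddef.h>

/* Hypothetical, simplified queue node: one per spinning thread. */
struct waiter {
	int numa_node;
	struct waiter *next;
};

/*
 * One filtering step performed by the MCS lock holder: if the next
 * waiter in the primary queue runs on a different NUMA node and has a
 * successor, detach it and append it to the tail of the secondary queue.
 */
static void order_once(struct waiter *holder,
		       struct waiter **sec_head, struct waiter **sec_tail)
{
	struct waiter *next = holder->next;

	if (!next || next->numa_node == holder->numa_node)
		return;		/* nothing to move */
	if (!next->next)
		return;		/* keep the last waiter in the primary queue */

	holder->next = next->next;	/* unlink from the primary queue */
	next->next = NULL;

	if (*sec_tail)
		(*sec_tail)->next = next;
	else
		*sec_head = next;	/* secondary queue was empty */
	*sec_tail = next;
}
```

Repeating this step on each handoff gradually leaves only same-node waiters at the front of the primary queue, which is the filtering effect the commit message describes.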
[PATCH v13 5/6] locking/qspinlock: Avoid moving certain threads between waiting queues in CNA
Prohibit moving certain threads (e.g., in irq and nmi contexts) to the secondary queue. Those prioritized threads will always stay in the primary queue, and so will have a shorter wait time for the lock. Signed-off-by: Alex Kogan Reviewed-by: Steve Sistare Reviewed-by: Waiman Long --- kernel/locking/qspinlock_cna.h | 26 -- 1 file changed, 20 insertions(+), 6 deletions(-) diff --git a/kernel/locking/qspinlock_cna.h b/kernel/locking/qspinlock_cna.h index d3e27549c769..ac3109ab0a84 100644 --- a/kernel/locking/qspinlock_cna.h +++ b/kernel/locking/qspinlock_cna.h @@ -4,6 +4,7 @@ #endif #include +#include /* * Implement a NUMA-aware version of MCS (aka CNA, or compact NUMA-aware lock). @@ -35,7 +36,8 @@ * running on the same NUMA node. If it is not, that waiter is detached from the * main queue and moved into the tail of the secondary queue. This way, we * gradually filter the primary queue, leaving only waiters running on the same - * preferred NUMA node. + * preferred NUMA node. Note that certain priortized waiters (e.g., in + * irq and nmi contexts) are excluded from being moved to the secondary queue. * * We change the NUMA node preference after a waiter at the head of the * secondary queue spins for a certain amount of time (10ms, by default). @@ -49,6 +51,8 @@ * Dave Dice */ +#define CNA_PRIORITY_NODE 0x + struct cna_node { struct mcs_spinlock mcs; u16 numa_node; @@ -121,9 +125,10 @@ static int __init cna_init_nodes(void) static __always_inline void cna_init_node(struct mcs_spinlock *node) { + bool priority = !in_task() || irqs_disabled() || rt_task(current); struct cna_node *cn = (struct cna_node *)node; - cn->numa_node = cn->real_numa_node; + cn->numa_node = priority ? 
CNA_PRIORITY_NODE : cn->real_numa_node; cn->start_time = 0; } @@ -262,11 +267,13 @@ static u32 cna_order_queue(struct mcs_spinlock *node) next_numa_node = ((struct cna_node *)next)->numa_node; if (next_numa_node != numa_node) { - struct mcs_spinlock *nnext = READ_ONCE(next->next); + if (next_numa_node != CNA_PRIORITY_NODE) { + struct mcs_spinlock *nnext = READ_ONCE(next->next); - if (nnext) { - cna_splice_next(node, next, nnext); - next = nnext; + if (nnext) { + cna_splice_next(node, next, nnext); + next = nnext; + } } /* * Inherit NUMA node id of primary queue, to maintain the @@ -284,6 +291,13 @@ static __always_inline u32 cna_wait_head_or_lock(struct qspinlock *lock, struct cna_node *cn = (struct cna_node *)node; if (!cn->start_time || !intra_node_threshold_reached(cn)) { + /* +* We are at the head of the wait queue, no need to use +* the fake NUMA node ID. +*/ + if (cn->numa_node == CNA_PRIORITY_NODE) + cn->numa_node = cn->real_numa_node; + /* * Try and put the time otherwise spent spin waiting on * _Q_LOCKED_PENDING_MASK to use by sorting our lists. -- 2.24.3 (Apple Git-128)
[PATCH v13 2/6] locking/qspinlock: Refactor the qspinlock slow path
Move some of the code manipulating the spin lock into separate functions.
This would allow easier integration of alternative ways to manipulate
that lock.

Signed-off-by: Alex Kogan
Reviewed-by: Steve Sistare
Reviewed-by: Waiman Long
---
 kernel/locking/qspinlock.c | 38 ++++++++++++++++++++++++++++++++++++--
 1 file changed, 36 insertions(+), 2 deletions(-)

diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c
index 435d696f9250..e3518709ffdc 100644
--- a/kernel/locking/qspinlock.c
+++ b/kernel/locking/qspinlock.c
@@ -289,6 +289,34 @@ static __always_inline u32 __pv_wait_head_or_lock(struct qspinlock *lock,
 #define queued_spin_lock_slowpath	native_queued_spin_lock_slowpath
 #endif

+/*
+ * __try_clear_tail - try to clear tail by setting the lock value to
+ * _Q_LOCKED_VAL.
+ * @lock: Pointer to the queued spinlock structure
+ * @val: Current value of the lock
+ * @node: Pointer to the MCS node of the lock holder
+ */
+static __always_inline bool __try_clear_tail(struct qspinlock *lock,
+					     u32 val,
+					     struct mcs_spinlock *node)
+{
+	return atomic_try_cmpxchg_relaxed(&lock->val, &val, _Q_LOCKED_VAL);
+}
+
+/*
+ * __mcs_lock_handoff - pass the MCS lock to the next waiter
+ * @node: Pointer to the MCS node of the lock holder
+ * @next: Pointer to the MCS node of the first waiter in the MCS queue
+ */
+static __always_inline void __mcs_lock_handoff(struct mcs_spinlock *node,
+					       struct mcs_spinlock *next)
+{
+	arch_mcs_lock_handoff(&next->locked, 1);
+}
+
+#define try_clear_tail		__try_clear_tail
+#define mcs_lock_handoff	__mcs_lock_handoff
+
 #endif /* _GEN_PV_LOCK_SLOWPATH */

 /**
@@ -533,7 +561,7 @@ void queued_spin_lock_slowpath(struct qspinlock *lock, u32 val)
 	 * PENDING will make the uncontended transition fail.
 	 */
 	if ((val & _Q_TAIL_MASK) == tail) {
-		if (atomic_try_cmpxchg_relaxed(&lock->val, &val, _Q_LOCKED_VAL))
+		if (try_clear_tail(lock, val, node))
 			goto release; /* No contention */
 	}

@@ -550,7 +578,7 @@ void queued_spin_lock_slowpath(struct qspinlock *lock, u32 val)
 	if (!next)
 		next = smp_cond_load_relaxed(&node->next, (VAL));

-	arch_mcs_lock_handoff(&next->locked, 1);
+	mcs_lock_handoff(node, next);
 	pv_kick_node(lock, next);

 release:
@@ -575,6 +603,12 @@ EXPORT_SYMBOL(queued_spin_lock_slowpath);
 #undef pv_kick_node
 #undef pv_wait_head_or_lock

+#undef try_clear_tail
+#define try_clear_tail		__try_clear_tail
+
+#undef mcs_lock_handoff
+#define mcs_lock_handoff	__mcs_lock_handoff
+
 #undef queued_spin_lock_slowpath
 #define queued_spin_lock_slowpath	__pv_queued_spin_lock_slowpath

--
2.24.3 (Apple Git-128)
[PATCH v13 6/6] locking/qspinlock: Introduce the shuffle reduction optimization into CNA
This performance optimization chooses probabilistically to avoid moving threads from the main queue into the secondary one when the secondary queue is empty. It is helpful when the lock is only lightly contended. In particular, it makes CNA less eager to create a secondary queue, but does not introduce any extra delays for threads waiting in that queue once it is created. Signed-off-by: Alex Kogan Reviewed-by: Steve Sistare Reviewed-by: Waiman Long --- kernel/locking/qspinlock_cna.h | 39 +- 1 file changed, 38 insertions(+), 1 deletion(-) diff --git a/kernel/locking/qspinlock_cna.h b/kernel/locking/qspinlock_cna.h index ac3109ab0a84..621399242735 100644 --- a/kernel/locking/qspinlock_cna.h +++ b/kernel/locking/qspinlock_cna.h @@ -5,6 +5,7 @@ #include #include +#include /* * Implement a NUMA-aware version of MCS (aka CNA, or compact NUMA-aware lock). @@ -86,6 +87,34 @@ static inline bool intra_node_threshold_reached(struct cna_node *cn) return current_time - threshold > 0; } +/* + * Controls the probability for enabling the ordering of the main queue + * when the secondary queue is empty. The chosen value reduces the amount + * of unnecessary shuffling of threads between the two waiting queues + * when the contention is low, while responding fast enough and enabling + * the shuffling when the contention is high. + */ +#define SHUFFLE_REDUCTION_PROB_ARG (7) + +/* Per-CPU pseudo-random number seed */ +static DEFINE_PER_CPU(u32, seed); + +/* + * Return false with probability 1 / 2^@num_bits. + * Intuitively, the larger @num_bits the less likely false is to be returned. + * @num_bits must be a number between 0 and 31. 
+ */ +static bool probably(unsigned int num_bits) +{ + u32 s; + + s = this_cpu_read(seed); + s = next_pseudo_random32(s); + this_cpu_write(seed, s); + + return s & ((1 << num_bits) - 1); +} + static void __init cna_init_nodes_per_cpu(unsigned int cpu) { struct mcs_spinlock *base = per_cpu_ptr([0].mcs, cpu); @@ -290,7 +319,15 @@ static __always_inline u32 cna_wait_head_or_lock(struct qspinlock *lock, { struct cna_node *cn = (struct cna_node *)node; - if (!cn->start_time || !intra_node_threshold_reached(cn)) { + if (node->locked <= 1 && probably(SHUFFLE_REDUCTION_PROB_ARG)) { + /* +* When the secondary queue is empty, skip the call to +* cna_order_queue() with high probability. This optimization +* reduces the overhead of unnecessary shuffling of threads +* between waiting queues when the lock is only lightly contended. +*/ + cn->partial_order = LOCAL_WAITER_FOUND; + } else if (!cn->start_time || !intra_node_threshold_reached(cn)) { /* * We are at the head of the wait queue, no need to use * the fake NUMA node ID. -- 2.24.3 (Apple Git-128)
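As a rough user-space illustration of the probably() helper above, the following sketch keeps a single global seed instead of a per-CPU one. The body of next_pseudo_random32() is an assumption here: it uses the Numerical Recipes linear congruential step, which is how I understand the kernel helper of that name to be defined, but treat the constants as illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

/* One global seed for the sketch; the patch keeps one seed per CPU. */
static uint32_t seed;

/* Assumed LCG step (Numerical Recipes constants), wraps modulo 2^32. */
static uint32_t next_pseudo_random32(uint32_t s)
{
	return s * 1664525u + 1013904223u;
}

/*
 * Return false only when the low @num_bits bits of the new seed are all
 * zero, i.e. roughly once in 2^num_bits calls. @num_bits must be 0..31.
 */
static bool probably(unsigned int num_bits)
{
	seed = next_pseudo_random32(seed);
	return seed & ((UINT32_C(1) << num_bits) - 1);
}
```

With SHUFFLE_REDUCTION_PROB_ARG set to 7, the queue ordering is skipped in about 127 out of 128 calls while the secondary queue is empty. Note that the low bits of such a generator are strongly periodic, so this is a cheap rate limiter rather than a high-quality random source.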
[PATCH v13 4/6] locking/qspinlock: Introduce starvation avoidance into CNA
Keep track of the time the thread at the head of the secondary queue has been waiting, and force inter-node handoff once this time passes a preset threshold. The default value for the threshold (10ms) can be overridden with the new kernel boot command-line option "numa_spinlock_threshold". The ms value is translated internally to the nearest rounded-up jiffies. Signed-off-by: Alex Kogan Reviewed-by: Steve Sistare Reviewed-by: Waiman Long --- .../admin-guide/kernel-parameters.txt | 9 ++ kernel/locking/qspinlock_cna.h| 95 --- 2 files changed, 92 insertions(+), 12 deletions(-) diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index a6ae826c6076..fffd31089db0 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -3440,6 +3440,15 @@ Not specifying this option is equivalent to numa_spinlock=auto. + numa_spinlock_threshold=[NUMA, PV_OPS] + Set the time threshold in milliseconds for the + number of intra-node lock hand-offs before the + NUMA-aware spinlock is forced to be passed to + a thread on another NUMA node. Valid values + are in the [1..100] range. Smaller values result + in a more fair, but less performant spinlock, + and vice versa. The default value is 10. + numa_zonelist_order= [KNL, BOOT] Select zonelist order for NUMA. 'node', 'default' can be specified This can be set from sysctl after boot. diff --git a/kernel/locking/qspinlock_cna.h b/kernel/locking/qspinlock_cna.h index 590402ad69ef..d3e27549c769 100644 --- a/kernel/locking/qspinlock_cna.h +++ b/kernel/locking/qspinlock_cna.h @@ -37,6 +37,12 @@ * gradually filter the primary queue, leaving only waiters running on the same * preferred NUMA node. * + * We change the NUMA node preference after a waiter at the head of the + * secondary queue spins for a certain amount of time (10ms, by default). 
+ * We do that by flushing the secondary queue into the head of the primary queue, + * effectively changing the preference to the NUMA node of the waiter at the head + * of the secondary queue at the time of the flush. + * * For more details, see https://arxiv.org/abs/1810.05600. * * Authors: Alex Kogan @@ -49,13 +55,33 @@ struct cna_node { u16 real_numa_node; u32 encoded_tail; /* self */ u32 partial_order; /* enum val */ + s32 start_time; }; enum { LOCAL_WAITER_FOUND, LOCAL_WAITER_NOT_FOUND, + FLUSH_SECONDARY_QUEUE }; +/* + * Controls the threshold time in ms (default = 10) for intra-node lock + * hand-offs before the NUMA-aware variant of spinlock is forced to be + * passed to a thread on another NUMA node. The default setting can be + * changed with the "numa_spinlock_threshold" boot option. + */ +#define MSECS_TO_JIFFIES(m)\ + (((m) + (MSEC_PER_SEC / HZ) - 1) / (MSEC_PER_SEC / HZ)) +static int intra_node_handoff_threshold __ro_after_init = MSECS_TO_JIFFIES(10); + +static inline bool intra_node_threshold_reached(struct cna_node *cn) +{ + s32 current_time = (s32)jiffies; + s32 threshold = cn->start_time + intra_node_handoff_threshold; + + return current_time - threshold > 0; +} + static void __init cna_init_nodes_per_cpu(unsigned int cpu) { struct mcs_spinlock *base = per_cpu_ptr([0].mcs, cpu); @@ -98,6 +124,7 @@ static __always_inline void cna_init_node(struct mcs_spinlock *node) struct cna_node *cn = (struct cna_node *)node; cn->numa_node = cn->real_numa_node; + cn->start_time = 0; } /* @@ -197,8 +224,15 @@ static void cna_splice_next(struct mcs_spinlock *node, /* stick `next` on the secondary queue tail */ if (node->locked <= 1) { /* if secondary queue is empty */ + struct cna_node *cn = (struct cna_node *)node; + /* create secondary queue */ next->next = next; + + cn->start_time = (s32)jiffies; + /* make sure start_time != 0 iff secondary queue is not empty */ + if (!cn->start_time) + cn->start_time = 1; } else { /* add to the tail of the secondary queue */ 
struct mcs_spinlock *tail_2nd = decode_tail(node->locked); @@ -249,11 +283,15 @@ static __always_inline u32 cna_wait_head_or_lock(struct qspinlock *lock, { struct cna_node *cn = (struct cna_node *)node; - /* -* Try and put the time otherwise spent spin waiting on -* _Q_LOCKED_PENDING_MASK to use by sorting our lists. -*/ -
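The two pieces of arithmetic in this patch, rounding the millisecond threshold up to jiffies and comparing tick counters in a wraparound-safe way, can be checked in user space with a sketch like the one below. The names msecs_to_ticks() and threshold_reached() are hypothetical, and the sketch uses unsigned arithmetic followed by a cast, whereas the patch works on s32 values directly.

```c
#include <stdbool.h>
#include <stdint.h>

#define MSEC_PER_SEC 1000

/*
 * Round a millisecond count up to whole ticks for a tick rate of @hz,
 * mirroring the MSECS_TO_JIFFIES() macro. Assumes hz <= 1000.
 */
static int32_t msecs_to_ticks(int32_t ms, int32_t hz)
{
	int32_t ms_per_tick = MSEC_PER_SEC / hz;

	return (ms + ms_per_tick - 1) / ms_per_tick;
}

/*
 * True once @now is strictly later than @start + @threshold, even if
 * the 32-bit tick counter wrapped around in between.
 */
static bool threshold_reached(uint32_t now, uint32_t start,
			      uint32_t threshold)
{
	return (int32_t)(now - (start + threshold)) > 0;
}
```

For example, the default of 10 ms becomes 10 ticks at HZ=1000, 3 ticks at HZ=250 and 1 tick at HZ=100, so the effective threshold is never shorter than requested.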
[PATCH v13 0/6] Add NUMA-awareness to qspinlock
Change from v12:

Added a shuffle reduction optimization (SRO, last patch in the series)
in order to address the regression in unixbench.
Reported-by: kernel test robot

I note that despite my initial experiments, more thorough testing on
our system did not reproduce the regression.

The rest of the series remains unchanged.

Summary
-------

Lock throughput can be increased by handing a lock to a waiter on the
same NUMA node as the lock holder, provided care is taken to avoid
starvation of waiters on other NUMA nodes. This patch introduces CNA
(compact NUMA-aware lock) as the slow path for qspinlock. It is
enabled through a configuration option (NUMA_AWARE_SPINLOCKS).

CNA is a NUMA-aware version of the MCS lock. Spinning threads are
organized in two queues, a primary queue for threads running on the same
node as the current lock holder, and a secondary queue for threads
running on other nodes. Threads store the ID of the node on which
they are running in their queue nodes. After acquiring the MCS lock and
before acquiring the spinlock, the MCS lock holder checks whether the next
waiter in the primary queue (if one exists) is running on the same NUMA
node. If it is not, that waiter is detached from the main queue and moved
into the tail of the secondary queue. This way, we gradually filter the
primary queue, leaving only waiters running on the same preferred NUMA
node. Note that certain prioritized waiters (e.g., in irq and nmi
contexts) are excluded from being moved to the secondary queue. We change
the NUMA node preference after a waiter at the head of the secondary queue
spins for a certain amount of time. We do that by flushing the secondary
queue into the head of the primary queue, effectively changing the
preference to the NUMA node of the waiter at the head of the secondary
queue at the time of the flush.

More details are available at https://arxiv.org/abs/1810.05600.

We have done some performance evaluation with the locktorture module
as well as with several benchmarks from the will-it-scale repo.
The following locktorture results are from an Oracle X5-4 server
(four Intel Xeon E7-8895 v3 @ 2.60GHz sockets with 18 hyperthreaded
cores each). Each number represents an average (over 25 runs) of the
total number of ops (x10^7) reported at the end of each run. The
standard deviation is also reported in parentheses, and in general is
about 3% of the average. The 'stock' kernel is v5.10.0-rc7,
commit ca4bbdaf1716, compiled in the default configuration.
'CNA' is the modified kernel with NUMA_AWARE_SPINLOCKS set;
'CNA-wo-SRO' is the modified kernel with NUMA_AWARE_SPINLOCKS set and
without the last patch in the series (the SRO optimization).
The speedup is calculated by dividing the result of the corresponding
variant by the result achieved with 'stock'.

#thr  stock           CNA-wo-SRO / speedup     CNA / speedup
  1   2.707 (0.127)    2.693 (0.100) / 0.995    2.718 (0.101) / 1.004
  2   3.262 (0.075)    3.250 (0.132) / 0.996    3.246 (0.098) / 0.995
  4   4.331 (0.125)    4.804 (0.184) / 1.109    4.733 (0.143) / 1.093
  8   5.092 (0.148)    6.996 (0.206) / 1.374    7.000 (0.194) / 1.375
 16   5.865 (0.119)    8.763 (0.161) / 1.494    8.778 (0.217) / 1.497
 32   6.314 (0.098)    9.837 (0.256) / 1.558    9.720 (0.167) / 1.539
 36   6.434 (0.101)    9.929 (0.259) / 1.543    9.988 (0.208) / 1.552
 72   6.342 (0.080)   10.416 (0.244) / 1.642   10.224 (0.203) / 1.612
108   6.168 (0.080)   10.490 (0.199) / 1.701   10.334 (0.173) / 1.675
142   5.895 (0.119)   10.480 (0.171) / 1.778   10.424 (0.222) / 1.768

The following tables contain throughput results (ops/us) from the same
setup for will-it-scale/open1_threads:

#thr  stock           CNA-wo-SRO / speedup     CNA / speedup
  1   0.508 (0.001)    0.507 (0.001) / 0.997    0.508 (0.001) / 0.999
  2   0.755 (0.021)    0.764 (0.018) / 1.012    0.757 (0.017) / 1.002
  4   1.409 (0.027)    1.417 (0.024) / 1.006    1.387 (0.027) / 0.984
  8   1.726 (0.092)    1.657 (0.129) / 0.960    1.654 (0.135) / 0.959
 16   1.878 (0.099)    1.811 (0.100) / 0.964    1.761 (0.087) / 0.938
 32   1.012 (0.040)    1.705 (0.086) / 1.685    1.685 (0.081) / 1.666
 36   0.930 (0.088)    1.726 (0.090) / 1.855    1.727 (0.086) / 1.856
 72   0.826 (0.037)    1.645 (0.079) / 1.991    1.621 (0.076) / 1.962
108   0.845 (0.028)    1.685 (0.072) / 1.993    1.688 (0.073) / 1.997
142   0.827 (0.035)    1.712 (0.069) / 2.070    1.696 (0.064) / 2.052

and will-it-scale/lock2_threads:

#thr  stock           CNA-wo-SRO / speedup     CNA / speedup
  1   1.587 (0.004)    1.564 (0.003) / 0.985    1.577 (0.002) / 0.994
  2   2.802 (0.057)    2.752 (0.049) / 0.982    2.776 (0.065) / 0.991
  4   5.365 (0.352)    5.368 (0.196) / 1.001    5.348 (0.297) / 0.997
  8   4.161 (0.270)    4.001 (0.402) / 0.962    4.032 (0.389) / 0.969
 16   4.144 (0.130)    3.940 (0.159) / 0.951    3.917 (0.133) / 0.945
 32   2.444 (0.097)    3.996 (0.102) / 1.635    3.969 (0.130) / 1.624
 36   2.429 (0.070)    3.891 (0.087) / 1.602    3.894 (0.096) / 1.603
 72   1.847 (0.095)    3.929 (0.108) / 2.128    3.942 (0.094) / 2.135
108   1.903 (0.117)    3.898 (0.108) / 2.048    3.901 (0.105) /
Re: [External] Re: [PATCH] net/ncsi: Use real net-device for response handler
On Wed, Dec 23, 2020 at 10:25 AM Jakub Kicinski wrote:
>
> On Tue, 22 Dec 2020 10:38:21 -0800 Samuel Mendoza-Jonas wrote:
> > On Tue, 2020-12-22 at 06:13 +, Joel Stanley wrote:
> > > On Sun, 20 Dec 2020 at 12:40, John Wang wrote:
> > > > When aggregating ncsi interfaces and dedicated interfaces to bond
> > > > interfaces, the ncsi response handler will use the wrong net device
> > > > to find ncsi_dev, so that the ncsi interface will not work properly.
> > > > Here, we use the net device registered to packet_type to fix it.
> > > >
> > > > Fixes: 138635cc27c9 ("net/ncsi: NCSI response packet handler")
> > > > Signed-off-by: John Wang
>
> This sounds like exactly the case for which orig_dev was introduced.
> I think you should use the orig_dev argument, rather than pt->dev.

will send a v2

>
> Can you test if that works?

Yes, it works.

> > >
> > > Can you show me how to reproduce this?

On g220a, eth1 is the dedicated interface, eth0 is the ncsi interface

kernel cfg:
CONFIG_BONDING=y

cat /etc/systemd/network/00-bmc-bond1.netdev
[NetDev]
Name=bond1
Description=Bond eth0 and eth1
Kind=bond

[Bond]
Mode=active-backup

cat /etc/systemd/network/00-bmc-eth0.network
[Match]
Name=eth0
[Network]
Bond=bond1

cat /etc/systemd/network/00-bmc-eth0.network
[Match]
Name=eth1
[Network]
Bond=bond1
PrimarySlave=true

ip addr
6: bond1: mtu 1500 qdisc noqueue qlen 1000
    link/ether b4:05:5d:8f:6a:ad brd ff:ff:ff:ff:ff:ff
    inet 169.254.11.178/16 brd 169.254.255.255 scope link bond1
       valid_lft forever preferred_lft forever
    inet 192.168.1.108/24 brd 192.168.1.255 scope global bond1
       valid_lft forever preferred_lft forever
    inet 10.2.16.118/24 brd 10.2.16.255 scope global bond1
       valid_lft forever preferred_lft forever
    inet6 fe80::b605:5dff:fe8f:6aad/64 scope link
    ...

Without this patch:

After bmc boots:
echo eth0 > /sys/class/net/bond1/bonding/active_slave

admin@g220a:~#
admin@g220a:~# echo eth0 > /sys/class/net/bond1/bonding/active_slave
[  105.964357] bond1: (slave eth0): making interface the new active one
admin@g220a:~# ping 10.2.16.1
PING 10.2.16.1 (10.2.16.1): 56 data bytes
64 bytes from 10.2.16.1: seq=0 ttl=255 time=7.096 ms
64 bytes from 10.2.16.1: seq=1 ttl=255 time=2.143 ms
64 bytes from 10.2.16.1: seq=2 ttl=255 time=2.111 ms
[  112.642734] ftgmac100 1e66.ethernet eth0: NCSI Channel 0 timed out!
64 bytes from 10.2.16.1: seq=3 ttl=255 time=2.039 ms
64 bytes from 10.2.16.1: seq=4 ttl=255 time=2.037 ms
[  117.842814] ftgmac100 1e66.ethernet eth0: NCSI: No channel with link found, configuring channel 0
[  134.482746] ftgmac100 1e66.ethernet eth0: NCSI Channel 0 timed out!
[  139.682820] ftgmac100 1e66.ethernet eth0: NCSI: No channel with link found, configuring channel 0

with this patch:

After bmc boots:
admin@g220a:~# echo eth0 > /sys/class/net/bond1/bonding/active_slave
[58332.123754] bond1: (slave eth0): making interface the new active one
admin@g220a:~# ping 10.2.16.1
PING 10.2.16.1 (10.2.16.1): 56 data bytes
64 bytes from 10.2.16.1: seq=0 ttl=255 time=7.279 ms
...
...
64 bytes from 10.2.16.1: seq=N ttl=255 time=2.037 ms

> > >
> > > I don't know the ncsi or net code well enough to know if this is the
> > > correct fix. If you are confident it is correct then I have no
> > > objections.
> >
> > This looks like it is probably right; pt->dev will be the original
> > device from ncsi_register_dev(), if a response comes in to
> > ncsi_rcv_rsp() associated with a different device then the driver will
> > fail to find the correct ncsi_dev_priv. An example of the broken case
> > would be good to see though.
>
> From the description sounds like the case is whenever the ncsi
> interface is in a bond, the netdev from the second argument is
> the bond not the interface from which the frame came. It should
> be possible to repro even with only one interface on the system,
> create a bond or a team and add the ncsi interface to it.
>
> Does that make sense? I'm likely missing the subtleties here. :)

I guess so.
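To illustrate the point made in this thread, here is a minimal userspace
sketch (not the ncsi code itself) of why looking up per-device state by the
handler's second argument fails under a bond, while looking it up by
orig_dev succeeds. All structure and function names below are simplified,
hypothetical stand-ins; only the idea that the handler sees both "dev" and
"orig_dev" comes from the discussion above.

```c
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for the kernel structure. */
struct net_device {
	const char *name;
};

/* Toy per-device state, registered against the physical interface. */
struct ncsi_dev_priv {
	struct net_device *dev;
};

/* Toy lookup: find the state registered for @dev, or NULL. */
static struct ncsi_dev_priv *ncsi_find_dev(struct ncsi_dev_priv *tbl, size_t n,
					   const struct net_device *dev)
{
	for (size_t i = 0; i < n; i++)
		if (tbl[i].dev == dev)
			return &tbl[i];
	return NULL;
}

/*
 * Model of a response handler. @dev is the device the stack presents
 * (the bond, when the interface is enslaved); @orig_dev is the interface
 * the frame actually arrived on. @use_orig_dev selects which one is
 * used for the lookup.
 */
static struct ncsi_dev_priv *rcv_rsp_lookup(struct ncsi_dev_priv *tbl, size_t n,
					    struct net_device *dev,
					    struct net_device *orig_dev,
					    int use_orig_dev)
{
	return ncsi_find_dev(tbl, n, use_orig_dev ? orig_dev : dev);
}
```

With eth0 registered and enslaved to bond1, a lookup keyed on the bond
returns NULL, while a lookup keyed on orig_dev returns the eth0 entry.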
Re: [PATCH v3 09/24] wfx: add hwio.c/hwio.h
Jérôme Pouiller writes:

> On Tuesday 22 December 2020 16:27:01 CET Greg Kroah-Hartman wrote:
>>
>> On Tue, Dec 22, 2020 at 05:10:11PM +0200, Kalle Valo wrote:
>> > Jerome Pouiller writes:
>> >
>> > > +/*
>> > > + * Internal helpers.
>> > > + *
>> > > + * About CONFIG_VMAP_STACK:
>> > > + * When CONFIG_VMAP_STACK is enabled, it is not possible to run DMA on stack
>> > > + * allocated data. Functions below that work with registers (aka functions
>> > > + * ending with "32") automatically reallocate buffers with kmalloc. However,
>> > > + * functions that work with arbitrary length buffers let's caller to handle
>> > > + * memory location. In doubt, enable CONFIG_DEBUG_SG to detect badly located
>> > > + * buffer.
>> > > + */
>> >
>> > This sounds very hacky to me, I have understood that you should never
>> > use stack with DMA.
>>
>> You should never do that because some platforms do not support it, so no
>> driver should ever try to do that as they do not know what platform they
>> are running on.
>
> Yes, I have learned this rule the hard way.
>
> There is no better way than a comment to warn the user that the argument
> will be used with a DMA? A Sparse annotation, for example?

I have not seen anything, but something like a sparse annotation would be
useful. Please let me know if you find anything like that.

But I think that CONFIG_VMAP_STACK is irrelevant, and the comment should
be clarified to say that stack memory must NOT be used for DMA operations
under any circumstances.

-- 
https://patchwork.kernel.org/project/linux-wireless/list/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches
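As an illustration of the pattern the quoted comment describes for the "32"
helpers, here is a rough userspace sketch of a register read that never hands
the caller's (possibly on-stack) storage to the bus layer. It is a model
under stated assumptions, not the wfx code: bus_xfer_fn, read32_bounced and
fake_xfer are hypothetical names, and malloc()/free() stand in for
kmalloc()/kfree().

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical bus transfer; in a driver this would hand @buf to DMA. */
typedef int (*bus_xfer_fn)(void *priv, unsigned int addr, void *buf, size_t len);

/*
 * Read a 32-bit register through a heap-allocated bounce buffer, so the
 * bus layer only ever sees heap memory. The result is copied to @val
 * after the transfer completes.
 */
static int read32_bounced(bus_xfer_fn xfer, void *priv, unsigned int addr,
			  uint32_t *val)
{
	uint32_t *tmp = malloc(sizeof(*tmp));	/* kmalloc() in a driver */
	int ret;

	if (!tmp)
		return -1;			/* -ENOMEM in kernel code */
	ret = xfer(priv, addr, tmp, sizeof(*tmp));
	if (!ret)
		*val = *tmp;
	free(tmp);				/* kfree() in a driver */
	return ret;
}

/* Fake bus used only for demonstration: returns a pattern based on @addr. */
static int fake_xfer(void *priv, unsigned int addr, void *buf, size_t len)
{
	uint32_t v = 0xA5000000u | addr;

	(void)priv;
	memcpy(buf, &v, len < sizeof(v) ? len : sizeof(v));
	return 0;
}
```

The caller may keep its result variable on the stack; only the temporary
heap buffer is passed to the transfer function.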
[PATCH 1/3] objtool: Refactor ORC section generation
Decouple ORC entries from instructions. This simplifies the
control/data flow, and is going to make it easier to support alternative
instructions which change the stack layout.

Signed-off-by: Josh Poimboeuf
---
 tools/objtool/Makefile      |   4 -
 tools/objtool/arch.h        |   4 -
 tools/objtool/builtin-orc.c |   6 +-
 tools/objtool/check.h       |   3 -
 tools/objtool/objtool.h     |   3 +-
 tools/objtool/orc_gen.c     | 272 ++--
 tools/objtool/weak.c        |   7 +-
 7 files changed, 140 insertions(+), 159 deletions(-)

diff --git a/tools/objtool/Makefile b/tools/objtool/Makefile
index 5cdb19036d7f..a43096f713c7 100644
--- a/tools/objtool/Makefile
+++ b/tools/objtool/Makefile
@@ -46,10 +46,6 @@ ifeq ($(SRCARCH),x86)
 	SUBCMD_ORC := y
 endif
 
-ifeq ($(SUBCMD_ORC),y)
-	CFLAGS += -DINSN_USE_ORC
-endif
-
 export SUBCMD_CHECK SUBCMD_ORC
 export srctree OUTPUT CFLAGS SRCARCH AWK
 include $(srctree)/tools/build/Makefile.include
diff --git a/tools/objtool/arch.h b/tools/objtool/arch.h
index 4a84c3081b8e..5e3f3ea8bb89 100644
--- a/tools/objtool/arch.h
+++ b/tools/objtool/arch.h
@@ -11,10 +11,6 @@
 #include "objtool.h"
 #include "cfi.h"
 
-#ifdef INSN_USE_ORC
-#include
-#endif
-
 enum insn_type {
 	INSN_JUMP_CONDITIONAL,
 	INSN_JUMP_UNCONDITIONAL,
diff --git a/tools/objtool/builtin-orc.c b/tools/objtool/builtin-orc.c
index 7b31121fa60b..508bdf6ae8dc 100644
--- a/tools/objtool/builtin-orc.c
+++ b/tools/objtool/builtin-orc.c
@@ -51,11 +51,7 @@ int cmd_orc(int argc, const char **argv)
 		if (list_empty(&file->insn_list))
 			return 0;
 
-		ret = create_orc(file);
-		if (ret)
-			return ret;
-
-		ret = create_orc_sections(file);
+		ret = orc_create(file);
 		if (ret)
 			return ret;
 
diff --git a/tools/objtool/check.h b/tools/objtool/check.h
index 5ec00a4b891b..4c10916ff1cf 100644
--- a/tools/objtool/check.h
+++ b/tools/objtool/check.h
@@ -43,9 +43,6 @@ struct instruction {
 	struct symbol *func;
 	struct list_head stack_ops;
 	struct cfi_state cfi;
-#ifdef INSN_USE_ORC
-	struct orc_entry orc;
-#endif
 };
 
 static inline bool is_static_jump(struct instruction *insn)
diff --git a/tools/objtool/objtool.h b/tools/objtool/objtool.h
index 4125d4578b23..5e58d3537e2f 100644
--- a/tools/objtool/objtool.h
+++ b/tools/objtool/objtool.h
@@ -26,7 +26,6 @@ struct objtool_file *objtool_open_read(const char *_objname);
 
 int check(struct objtool_file *file);
 int orc_dump(const char *objname);
-int create_orc(struct objtool_file *file);
-int create_orc_sections(struct objtool_file *file);
+int orc_create(struct objtool_file *file);
 
 #endif /* _OBJTOOL_H */
diff --git a/tools/objtool/orc_gen.c b/tools/objtool/orc_gen.c
index 235663b96adc..73efba2bfa72 100644
--- a/tools/objtool/orc_gen.c
+++ b/tools/objtool/orc_gen.c
@@ -12,89 +12,84 @@
 #include "check.h"
 #include "warn.h"
 
-int create_orc(struct objtool_file *file)
+static int init_orc_entry(struct orc_entry *orc, struct cfi_state *cfi)
 {
-	struct instruction *insn;
+	struct instruction *insn = container_of(cfi, struct instruction, cfi);
+	struct cfi_reg *bp = &cfi->regs[CFI_BP];
 
-	for_each_insn(file, insn) {
-		struct orc_entry *orc = &insn->orc;
-		struct cfi_reg *cfa = &insn->cfi.cfa;
-		struct cfi_reg *bp = &insn->cfi.regs[CFI_BP];
+	memset(orc, 0, sizeof(*orc));
 
-		if (!insn->sec->text)
-			continue;
-
-		orc->end = insn->cfi.end;
+	orc->end = cfi->end;
 
-		if (cfa->base == CFI_UNDEFINED) {
-			orc->sp_reg = ORC_REG_UNDEFINED;
-			continue;
-		}
-
-		switch (cfa->base) {
-		case CFI_SP:
-			orc->sp_reg = ORC_REG_SP;
-			break;
-		case CFI_SP_INDIRECT:
-			orc->sp_reg = ORC_REG_SP_INDIRECT;
-			break;
-		case CFI_BP:
-			orc->sp_reg = ORC_REG_BP;
-			break;
-		case CFI_BP_INDIRECT:
-			orc->sp_reg = ORC_REG_BP_INDIRECT;
-			break;
-		case CFI_R10:
-			orc->sp_reg = ORC_REG_R10;
-			break;
-		case CFI_R13:
-			orc->sp_reg = ORC_REG_R13;
-			break;
-		case CFI_DI:
-			orc->sp_reg = ORC_REG_DI;
-			break;
-		case CFI_DX:
-			orc->sp_reg = ORC_REG_DX;
-			break;
-		default:
-			WARN_FUNC("unknown CFA base reg %d",
-				  insn->sec, insn->offset, cfa->base);
-			return -1;
-
[PATCH 2/3] objtool: Add 'alt_group' struct
Create a new struct associated with each group of alternatives
instructions. This will help with the removal of fake jumps, and more
importantly with adding support for stack layout changes in
alternatives.

Signed-off-by: Josh Poimboeuf
---
 tools/objtool/check.c | 29 +++--
 tools/objtool/check.h | 13 -
 2 files changed, 35 insertions(+), 7 deletions(-)

diff --git a/tools/objtool/check.c b/tools/objtool/check.c
index c6ab44543c92..67f39b57c6f7 100644
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -984,20 +984,28 @@ static int handle_group_alt(struct objtool_file *file,
 			    struct instruction *orig_insn,
 			    struct instruction **new_insn)
 {
-	static unsigned int alt_group_next_index = 1;
 	struct instruction *last_orig_insn, *last_new_insn, *insn, *fake_jump = NULL;
-	unsigned int alt_group = alt_group_next_index++;
+	struct alt_group *orig_alt_group, *new_alt_group;
 	unsigned long dest_off;
 
+
+	orig_alt_group = malloc(sizeof(*orig_alt_group));
+	if (!orig_alt_group) {
+		WARN("malloc failed");
+		return -1;
+	}
 	last_orig_insn = NULL;
 	insn = orig_insn;
 	sec_for_each_insn_from(file, insn) {
 		if (insn->offset >= special_alt->orig_off + special_alt->orig_len)
 			break;
 
-		insn->alt_group = alt_group;
+		insn->alt_group = orig_alt_group;
 		last_orig_insn = insn;
 	}
+	orig_alt_group->orig_group = NULL;
+	orig_alt_group->first_insn = orig_insn;
+	orig_alt_group->last_insn = last_orig_insn;
 
 	if (next_insn_same_sec(file, last_orig_insn)) {
 		fake_jump = malloc(sizeof(*fake_jump));
@@ -1028,8 +1036,13 @@ static int handle_group_alt(struct objtool_file *file,
 		return 0;
 	}
 
+	new_alt_group = malloc(sizeof(*new_alt_group));
+	if (!new_alt_group) {
+		WARN("malloc failed");
+		return -1;
+	}
+
 	last_new_insn = NULL;
-	alt_group = alt_group_next_index++;
 	insn = *new_insn;
 	sec_for_each_insn_from(file, insn) {
 		struct reloc *alt_reloc;
@@ -1041,7 +1054,7 @@ static int handle_group_alt(struct objtool_file *file,
 
 		insn->ignore = orig_insn->ignore_alts;
 		insn->func = orig_insn->func;
-		insn->alt_group = alt_group;
+		insn->alt_group = new_alt_group;
 
 		/*
 		 * Since alternative replacement code is copy/pasted by the
@@ -1090,6 +1103,10 @@ static int handle_group_alt(struct objtool_file *file,
 		return -1;
 	}
 
+	new_alt_group->orig_group = orig_alt_group;
+	new_alt_group->first_insn = *new_insn;
+	new_alt_group->last_insn = last_new_insn;
+
 	if (fake_jump)
 		list_add(&fake_jump->list, &last_new_insn->list);
 
@@ -2405,7 +2422,7 @@ static int validate_return(struct symbol *func, struct instruction *insn, struct
 static void fill_alternative_cfi(struct objtool_file *file, struct instruction *insn)
 {
 	struct instruction *first_insn = insn;
-	int alt_group = insn->alt_group;
+	struct alt_group *alt_group = insn->alt_group;
 
 	sec_for_each_insn_continue(file, insn) {
 		if (insn->alt_group != alt_group)
diff --git a/tools/objtool/check.h b/tools/objtool/check.h
index 4c10916ff1cf..b74c383c2d83 100644
--- a/tools/objtool/check.h
+++ b/tools/objtool/check.h
@@ -19,6 +19,17 @@ struct insn_state {
 	s8 instr;
 };
 
+struct alt_group {
+	/*
+	 * Pointer from a replacement group to the original group.  NULL if it
+	 * *is* the original group.
+	 */
+	struct alt_group *orig_group;
+
+	/* First and last instructions in the group */
+	struct instruction *first_insn, *last_insn;
+};
+
 struct instruction {
 	struct list_head list;
 	struct hlist_node hash;
@@ -34,7 +45,7 @@ struct instruction {
 	s8 instr;
 	u8 visited;
 	u8 ret_offset;
-	int alt_group;
+	struct alt_group *alt_group;
 	struct symbol *call_dest;
 	struct instruction *jump_dest;
 	struct instruction *first_jump_src;
-- 
2.29.2
[PATCH 3/3] objtool: Support stack layout changes in alternatives
The ORC unwinder showed a warning [1] which revealed the stack layout
didn't match what was expected. The problem was that paravirt patching
had replaced "CALL *pv_ops.irq.save_fl" with "PUSHF;POP". That changed
the stack layout between the PUSHF and the POP, so unwinding from an
interrupt which occurred between those two instructions would fail.

Part of the agreed upon solution was to rework the custom paravirt
patching code to use alternatives instead, since objtool already knows
how to read alternatives (and converging runtime patching infrastructure
is always a good thing anyway). But the main problem still remains,
which is that runtime patching can change the stack layout.

Making stack layout changes in alternatives was disallowed with commit
7117f16bf460 ("objtool: Fix ORC vs alternatives"), but now that paravirt
is going to be doing it, it needs to be supported.

One way to do so would be to modify the ORC table when the code gets
patched. But ORC is simple -- a good thing! -- and it's best to leave
it alone.

Instead, support stack layout changes by "flattening" all possible stack
states (CFI) from parallel alternative code streams into a single set of
linear states. The only necessary limitation is that CFI conflicts are
disallowed at all possible instruction boundaries.

For example, this scenario is allowed:

          Alt1                    Alt2                    Alt3

   0x00   CALL *pv_ops.save_fl    CALL xen_save_fl        PUSHF
   0x01                                                   POP %RAX
   0x02                                                   NOP
   ...
   0x05                           NOP
   ...
   0x07

The unwind information for offset-0x00 is identical for all 3
alternatives. Similarly offset-0x05 and higher also are identical (and
the same as 0x00). However offset-0x01 has deviating CFI, but that is
only relevant for Alt3, neither of the other alternative instruction
streams will ever hit that offset.

This scenario is NOT allowed:

          Alt1                    Alt2

   0x00   CALL *pv_ops.save_fl    PUSHF
   0x01                           NOP6
   ...
   0x07   NOP                     POP %RAX

The problem here is that offset-0x7, which is an instruction boundary in
both possible instruction patch streams, has two conflicting stack
layouts.

[ The above examples were stolen from Peter Zijlstra. ]

The new flattened CFI array is used both for the detection of conflicts
(like the second example above) and the generation of linear ORC
entries.

BTW, another benefit of these changes is that, thanks to some related
cleanups (new fake nops and alt_group struct) objtool can finally be rid
of fake jumps, which were a constant source of headaches.

[1] https://lkml.kernel.org/r/2020170536.arx2zbn4ngvjoov7@treble

Cc: Shinichiro Kawasaki
Signed-off-by: Josh Poimboeuf
---
 .../Documentation/stack-validation.txt |  16 +-
 tools/objtool/check.c                  | 175 ++
 tools/objtool/check.h                  |   6 +
 tools/objtool/orc_gen.c                |  56 +-
 4 files changed, 157 insertions(+), 96 deletions(-)

diff --git a/tools/objtool/Documentation/stack-validation.txt b/tools/objtool/Documentation/stack-validation.txt
index 0542e46c7552..30f38fdc0d56 100644
--- a/tools/objtool/Documentation/stack-validation.txt
+++ b/tools/objtool/Documentation/stack-validation.txt
@@ -315,13 +315,15 @@ they mean, and suggestions for how to fix them.
    function tracing inserts additional calls, which is not obvious from the
    sources).
 
-10. file.o: warning: func()+0x5c: alternative modifies stack
-
-    This means that an alternative includes instructions that modify the
-    stack. The problem is that there is only one ORC unwind table, this means
-    that the ORC unwind entries must be valid for each of the alternatives.
-    The easiest way to enforce this is to ensure alternatives do not contain
-    any ORC entries, which in turn implies the above constraint.
+10. file.o: warning: func()+0x5c: stack layout conflict in alternatives
+
+    This means that in the use of the alternative() or ALTERNATIVE()
+    macro, the code paths have conflicting modifications to the stack.
+    The problem is that there is only one ORC unwind table, which means
+    that the ORC unwind entries must be consistent for all possible
+    instruction boundaries regardless of which code has been patched.
+    This limitation can be overcome by massaging the alternatives with
+    NOPs to shift the stack changes around so they no longer conflict.
 
 11. file.o: warning: unannotated intra-function call
 
diff --git a/tools/objtool/check.c b/tools/objtool/check.c
index 67f39b57c6f7..81d56fdef1c3 100644
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -19,8 +19,6 @@
 #include
 #include
 
-#define FAKE_JUMP_OFFSET -1
-
 struct alternative {
 	struct list_head list;
 	struct instruction *insn;
@@ -767,9
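For readers following the "flattening" idea in the commit message above,
the conflict check can be modelled in a few lines of standalone C. This is
only a toy sketch, not objtool's implementation: the CFI state is reduced to
a single stack-size number, and the names alt_insn, flat_init and flat_merge
as well as the 16-byte region bound are assumptions made for illustration.

```c
#include <stddef.h>

#define ALT_MAX_LEN 16		/* toy bound on the patched region, in bytes */
#define CFI_UNSET   (-1)	/* no alternative has an instruction here yet */

/* One instruction boundary in an alternative, with a toy CFI state. */
struct alt_insn {
	unsigned int offset;	/* byte offset of the instruction */
	int stack_size;		/* stack layout before it executes */
};

/* Mark every offset in the flat array as not yet claimed. */
static void flat_init(int flat[ALT_MAX_LEN])
{
	for (size_t i = 0; i < ALT_MAX_LEN; i++)
		flat[i] = CFI_UNSET;
}

/*
 * Merge one alternative into the flat per-offset array.
 * Returns -1 if it fits, otherwise the offset at which it conflicts with
 * a previously merged alternative (or lies outside the toy region).
 */
static int flat_merge(int flat[ALT_MAX_LEN], const struct alt_insn *insns,
		      size_t n)
{
	for (size_t i = 0; i < n; i++) {
		unsigned int off = insns[i].offset;

		if (off >= ALT_MAX_LEN)
			return (int)off;	/* outside the region: reject */
		if (flat[off] == CFI_UNSET)
			flat[off] = insns[i].stack_size;
		else if (flat[off] != insns[i].stack_size)
			return (int)off;	/* two layouts at one boundary */
	}
	return -1;
}
```

Feeding it the two scenarios from the commit message (with made-up stack
sizes of 8 before the PUSHF and 16 after it) accepts the three-way example
and reports a conflict at offset 0x07 for the second one.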
[PATCH 0/3] Alternatives vs ORC, a slightly easier way
These patches replace Peter's "Alternatives vs ORC, the hard way". The
end result should be the same (support for paravirt patching's use of
alternatives which modify the stack).

Josh Poimboeuf (3):
  objtool: Refactor ORC section generation
  objtool: Add 'alt_group' struct
  objtool: Support stack layout changes in alternatives

 .../Documentation/stack-validation.txt |  16 +-
 tools/objtool/Makefile                 |   4 -
 tools/objtool/arch.h                   |   4 -
 tools/objtool/builtin-orc.c            |   6 +-
 tools/objtool/check.c                  | 190 ++-
 tools/objtool/check.h                  |  22 +-
 tools/objtool/objtool.h                |   3 +-
 tools/objtool/orc_gen.c                | 308 ++
 tools/objtool/weak.c                   |   7 +-
 9 files changed, 315 insertions(+), 245 deletions(-)

-- 
2.29.2
Re: [PATCH v3 03/24] wfx: add Makefile/Kconfig
Jérôme Pouiller writes:

> On Tuesday 22 December 2020 16:02:38 CET Kalle Valo wrote:
>> Jerome Pouiller writes:
>>
>> > From: Jérôme Pouiller
>> >
>> > Signed-off-by: Jérôme Pouiller
>>
>> [...]
>>
>> > +wfx-$(CONFIG_SPI) += bus_spi.o
>> > +wfx-$(subst m,y,$(CONFIG_MMC)) += bus_sdio.o
>>
>> Why this subst? And why only for MMC?
>
> CONFIG_SPI is a boolean (y or empty). The both values make senses.
>
> CONFIG_MMC is a tristate (y, m or empty). The substitution above
> ensure that bus_sdio.o will included in wfx.ko if CONFIG_MMC is 'm'
> ("wfx-$(CONFIG_MMC) += bus_sdio.o" wouldn't make the job).
>
> You may want to know what it happens if CONFIG_MMC=m while CONFIG_WFX=y.
> This line in Kconfig prevents to compile wfx statically if MMC is a
> module:
>    depends on MMC || !MMC # do not allow WFX=y if MMC=m

Ok, thanks for explaining this.

-- 
https://patchwork.kernel.org/project/linux-wireless/list/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches
Re: linux-next: Tree for Dec 21 (objtool warning)
On Mon, Dec 21, 2020 at 08:03:17AM -0800, Randy Dunlap wrote:
> On 12/20/20 7:18 PM, Stephen Rothwell wrote:
> > Hi all,
> >
> > News: there will be no linux-next releases between Dec 24 and Jan
> > 3 inclusive.
> >
> > Please do not add any v5.12 destined code to your linux-next included
> > branches until after v5.11-rc1 has been released.
> >
> > Changes since 20201218:
> >
>
> on x86_64:
>
> arch/x86/kernel/sys_ia32.o: warning: objtool: cp_stat64()+0xd8: call to new_encode_dev() with UACCESS enabled

Can you send a .o for this one? Please gzip it because my email has
been rejecting .o files lately :-/

-- 
Josh
Re: [PATCH v13 4/6] powerpc: Delete unused function delete_fdt_mem_rsv()
On 12/22/20 5:08 PM, Thiago Jung Bauermann wrote:

Lakshmi Ramasubramanian writes:

delete_fdt_mem_rsv() defined in "arch/powerpc/kexec/file_load.c"
has been renamed to fdt_find_and_del_mem_rsv(), and moved to
"drivers/of/kexec.c".

Remove delete_fdt_mem_rsv() in "arch/powerpc/kexec/file_load.c".

Co-developed-by: Prakhar Srivastava
Signed-off-by: Prakhar Srivastava
Signed-off-by: Lakshmi Ramasubramanian
---
 arch/powerpc/include/asm/kexec.h |  1 -
 arch/powerpc/kexec/file_load.c   | 32
 2 files changed, 33 deletions(-)

As I mentioned in the other email, this patch could remove
setup_new_fdt() as well.

I'm a bit ambivalent on whether this patch should be squashed with
patch 2 or left on its own, but I tend toward the latter option because
patch 2 is big enough already.

I also think Patch #2 is already big enough - I don't want to make more
changes in that patch.

I will remove delete_fdt_mem_rsv() and setup_new_fdt() in this patch
(Patch #4) and call of_kexec_setup_new_fdt() directly (in
setup_new_fdt_ppc64()).

thanks,
 -lakshmi
[PATCH] mm/uaccess: Use 'unsigned long' to placate UBSAN warnings, again
GCC 7 has a known bug where UBSAN ignores '-fwrapv' and generates false
signed-overflow-UB warnings. The type mismatch between 'i' and
'nr_segs' in copy_compat_iovec_from_user() is causing such a warning,
which also happens to violate uaccess rules:

  lib/iov_iter.o: warning: objtool: iovec_from_user()+0x22d: call to __ubsan_handle_add_overflow() with UACCESS enabled

Fix it by making the variable types match.

This is similar to a previous commit:

  29da93fea3ea ("mm/uaccess: Use 'unsigned long' to placate UBSAN warnings on older GCC versions")

Reported-by: Randy Dunlap
Signed-off-by: Josh Poimboeuf
---
 lib/iov_iter.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index 1635111c5bd2..2e6a42f5d1df 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -1656,7 +1656,8 @@ static int copy_compat_iovec_from_user(struct iovec *iov,
 {
 	const struct compat_iovec __user *uiov =
 		(const struct compat_iovec __user *)uvec;
-	int ret = -EFAULT, i;
+	int ret = -EFAULT;
+	unsigned long i;
 
 	if (!user_access_begin(uvec, nr_segs * sizeof(*uvec)))
 		return -EFAULT;
-- 
2.29.2
Re: [PATCH v13 2/6] powerpc: Move arch independent ima kexec functions to drivers/of/kexec.c
On 12/22/20 4:40 PM, Thiago Jung Bauermann wrote:

Lakshmi Ramasubramanian writes:

On 12/22/20 11:45 AM, Mimi Zohar wrote:

On Tue, 2020-12-22 at 10:53 -0800, Lakshmi Ramasubramanian wrote:

On 12/22/20 6:26 AM, Mimi Zohar wrote:

Hi Mimi,

On Sat, 2020-12-19 at 09:57 -0800, Lakshmi Ramasubramanian wrote:

diff --git a/arch/powerpc/kexec/Makefile b/arch/powerpc/kexec/Makefile
index 4aff6846c772..b6c52608cb49 100644
--- a/arch/powerpc/kexec/Makefile
+++ b/arch/powerpc/kexec/Makefile
@@ -9,13 +9,6 @@ obj-$(CONFIG_PPC32)+= relocate_32.o
 obj-$(CONFIG_KEXEC_FILE) += file_load.o ranges.o file_load_$(BITS).o elf_$(BITS).o
 
-ifdef CONFIG_HAVE_IMA_KEXEC
-ifdef CONFIG_IMA
-obj-y += ima.o
-endif
-endif

Notice how "kexec/ima.o" is only included if the architecture supports
it and IMA is configured. In addition only if CONFIG_IMA_KEXEC is
configured, is the IMA measurement list carried across kexec. After
moving the rest of ima.c to drivers/of/kexec.c, this changes. Notice
how drivers/of/Kconfig includes kexec.o:

obj-$(CONFIG_KEXEC_FILE) += kexec.o

It is not dependent on CONFIG_HAVE_IMA_KEXEC. Shouldn't all of the
functions defined in ima.c being moved to kexec.o be defined within a
CONFIG_HAVE_IMA_KEXEC ifdef?

Thanks for reviewing the changes.

In "drivers/of/kexec.c" the function remove_ima_buffer() is defined
under "#ifdef CONFIG_HAVE_IMA_KEXEC"

setup_ima_buffer() is defined under "#ifdef CONFIG_IMA_KEXEC" - the same
way it was defined in "arch/powerpc/kexec/ima.c".

As you know, CONFIG_IMA_KEXEC depends on CONFIG_HAVE_IMA_KEXEC (as
defined in "security/integrity/ima/Kconfig").

ima_get_kexec_buffer() and ima_free_kexec_buffer() are unconditionally
defined in "drivers/of/kexec.c" even though they are called only when
CONFIG_HAVE_IMA_KEXEC is enabled. I will update these two functions to
be moved under "#ifdef CONFIG_HAVE_IMA_KEXEC"

The issue is the reverse. CONFIG_HAVE_IMA_KEXEC may be enabled without
CONFIG_IMA_KEXEC being enabled. This allows the architecture to support
carrying the measurement list across kexec, but requires enabling it at
build time.

Only if CONFIG_HAVE_IMA_KEXEC is enabled should any of these functions
be compiled at build. This allows restoring the previous IMA
measurement list, even if CONFIG_IMA_KEXEC is not enabled.

Only if CONFIG_IMA_KEXEC is enabled, should carrying the measurement
list across kexec be enabled. See how arch_ima_add_kexec_buffer,
write_number, setup_ima_buffer are ifdef'ed in
arch/powerpc/kexec/ima.c.

Yes - I agree. I will make the following changes:

=> Enable the functions moved from "arch/powerpc/kexec/ima.c" to
   "drivers/of/kexec.c" only when CONFIG_HAVE_IMA_KEXEC is enabled.

=> Also, compile write_number() and setup_ima_buffer() only when
   CONFIG_IMA_KEXEC is enabled.

Sounds good, with one additional change:

So far, CONFIG_HAVE_IMA_KEXEC was tested only in files that were built
when CONFIG_IMA was set. With this series this is not the case anymore
(in drivers/of/kexec.c). The simplest way to keep this consistent is to
only enable CONFIG_HAVE_IMA_KEXEC if CONFIG_IMA is also set.

For example, with this:

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index e9f13fe08492..4ddd17215ecf 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -548,7 +548,7 @@ config KEXEC
 config KEXEC_FILE
 	bool "kexec file based system call"
 	select KEXEC_CORE
-	select HAVE_IMA_KEXEC
+	select HAVE_IMA_KEXEC if IMA
 	select BUILD_BIN2C
 	select KEXEC_ELF
 	depends on PPC64

And then the same thing on the arm64 patch.

This is a good idea Thiago - I will make this change in the Kconfig for
both powerpc and arm64.

thanks,
 -lakshmi
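The guard layout agreed on in this thread can be pictured with a small
standalone C sketch. It is not the drivers/of/kexec.c code: the two probe
functions are hypothetical and only report whether their section was
compiled in, and the configuration chosen here (HAVE_IMA_KEXEC set,
IMA_KEXEC unset) is an assumption picked to show the case Mimi describes.

```c
/*
 * Assumed configuration for this sketch: the architecture can carry the
 * measurement list (HAVE_IMA_KEXEC), but the feature itself is disabled.
 */
#define CONFIG_HAVE_IMA_KEXEC 1
/* #define CONFIG_IMA_KEXEC 1 */

#ifdef CONFIG_HAVE_IMA_KEXEC

/* Restore-side helpers are built whenever the architecture supports it. */
static int restore_helpers_built(void) { return 1; }

#ifdef CONFIG_IMA_KEXEC
/* Carry-over helpers need the feature itself to be enabled. */
static int carry_helpers_built(void) { return 1; }
#else
static int carry_helpers_built(void) { return 0; }
#endif

#else /* !CONFIG_HAVE_IMA_KEXEC */

static int restore_helpers_built(void) { return 0; }
static int carry_helpers_built(void) { return 0; }

#endif
```

With the configuration above, the restore-side section is compiled and the
carry-over section is stubbed out; defining CONFIG_IMA_KEXEC as well turns
the second one on.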
Re: [PATCH v13 2/6] powerpc: Move arch independent ima kexec functions to drivers/of/kexec.c
On 12/22/20 4:19 PM, Thiago Jung Bauermann wrote:

Lakshmi Ramasubramanian writes:

The functions defined in "arch/powerpc/kexec/ima.c" handle setting up
and freeing the resources required to carry over the IMA measurement
list from the current kernel to the next kernel across kexec system call.
These functions do not have architecture specific code, but are
currently limited to powerpc.

Move setup_ima_buffer() call into of_kexec_setup_new_fdt() defined in
"drivers/of/kexec.c".

Move the remaining architecture independent functions from
"arch/powerpc/kexec/ima.c" to "drivers/of/kexec.c".
Delete "arch/powerpc/kexec/ima.c" and "arch/powerpc/include/asm/ima.h".
Remove references to the deleted files in powerpc and in ima.

Co-developed-by: Prakhar Srivastava
Signed-off-by: Prakhar Srivastava
Signed-off-by: Lakshmi Ramasubramanian
---
 arch/powerpc/include/asm/ima.h     |  27
 arch/powerpc/kexec/Makefile        |   7 -
 arch/powerpc/kexec/file_load.c     |   7 -
 arch/powerpc/kexec/ima.c           | 202 -
 drivers/of/kexec.c                 | 235 +
 include/linux/of.h                 |   2 +
 security/integrity/ima/ima.h       |   4 -
 security/integrity/ima/ima_kexec.c |   1 +
 8 files changed, 238 insertions(+), 247 deletions(-)
 delete mode 100644 arch/powerpc/include/asm/ima.h
 delete mode 100644 arch/powerpc/kexec/ima.c

This looks good, provided the changes from the discussion with Mimi are
made. Also, minor nits below.

I will address the changes Mimi had stated.

diff --git a/security/integrity/ima/ima.h b/security/integrity/ima/ima.h
index 6ebefec616e4..7c3947ad3773 100644
--- a/security/integrity/ima/ima.h
+++ b/security/integrity/ima/ima.h
@@ -24,10 +24,6 @@
 
 #include "../integrity.h"
 
-#ifdef CONFIG_HAVE_IMA_KEXEC
-#include
-#endif
-
 enum ima_show_type { IMA_SHOW_BINARY, IMA_SHOW_BINARY_NO_FIELD_LEN,
 		     IMA_SHOW_BINARY_OLD_STRING_FMT, IMA_SHOW_ASCII };
 enum tpm_pcrs { TPM_PCR0 = 0, TPM_PCR8 = 8, TPM_PCR10 = 10 };

This belongs in patch 1.

No - the reference to "asm/ima.h" cannot be removed in Patch #1 since
ima_get_kexec_buffer() and ima_free_kexec_buffer() are still declared in
this header. They are moved in this patch only (Patch #2).

diff --git a/security/integrity/ima/ima_kexec.c b/security/integrity/ima/ima_kexec.c
index 38bcd7543e27..8a6712981dee 100644
--- a/security/integrity/ima/ima_kexec.c
+++ b/security/integrity/ima/ima_kexec.c
@@ -10,6 +10,7 @@
 #include
 #include
 #include
+#include
 #include
 #include "ima.h"

This include isn't necessary.

This change is necessary because ima_get_kexec_buffer() and
ima_free_kexec_buffer() are now declared in "linux/of.h".

 -lakshmi
Re: [PATCH v13 2/6] powerpc: Move arch independent ima kexec functions to drivers/of/kexec.c
On 12/22/20 4:48 PM, Thiago Jung Bauermann wrote:

Actually, I have one more comment on this patch:

Lakshmi Ramasubramanian writes:

diff --git a/arch/powerpc/kexec/file_load.c b/arch/powerpc/kexec/file_load.c
index 956bcb2d1ec2..9f3ec0b239ef 100644
--- a/arch/powerpc/kexec/file_load.c
+++ b/arch/powerpc/kexec/file_load.c
@@ -20,7 +20,6 @@
 #include
 #include
 #include
-#include
 
 #define SLAVE_CODE_SIZE 256 /* First 0x100 bytes */
 
@@ -163,12 +162,6 @@ int setup_new_fdt(const struct kimage *image, void *fdt,
 	if (ret)
 		goto err;
 
-	ret = setup_ima_buffer(image, fdt, fdt_path_offset(fdt, "/chosen"));
-	if (ret) {
-		pr_err("Error setting up the new device tree.\n");
-		return ret;
-	}
-
 	return 0;
 
 err:

With this change, setup_new_fdt() is nothing more than a call to
of_kexec_setup_new_fdt(). It should be removed, and its caller should
call of_kexec_setup_new_fdt() directly.

This change could be done in patch 4 of this series, to keep this patch
simpler.

Sure Thiago - I will make that change.

thanks,
 -lakshmi
UBSAN: shift-out-of-bounds in vhci_hub_control
Hello,

syzbot found the following issue on:

HEAD commit:    a409ed15 Merge tag 'gpio-v5.11-1' of git://git.kernel.org/..
git tree:       upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=1053b62350
kernel config:  https://syzkaller.appspot.com/x/.config?x=f7c39e7211134bc0
dashboard link: https://syzkaller.appspot.com/bug?extid=297d20e437b79283bf6d
compiler:       gcc (GCC) 10.1.0-syz 20200507
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=15f4f13750
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=1115f30f50

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+297d20e437b79283b...@syzkaller.appspotmail.com

UBSAN: shift-out-of-bounds in drivers/usb/usbip/vhci_hcd.c:399:41
shift exponent 768 is too large for 32-bit type 'int'
CPU: 1 PID: 8482 Comm: syz-executor092 Not tainted 5.10.0-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:79 [inline]
 dump_stack+0x107/0x163 lib/dump_stack.c:120
 ubsan_epilogue+0xb/0x5a lib/ubsan.c:148
 __ubsan_handle_shift_out_of_bounds.cold+0xb1/0x181 lib/ubsan.c:395
 vhci_hub_control.cold+0x205/0x246 drivers/usb/usbip/vhci_hcd.c:399
 rh_call_control drivers/usb/core/hcd.c:683 [inline]
 rh_urb_enqueue drivers/usb/core/hcd.c:841 [inline]
 usb_hcd_submit_urb+0xcaa/0x22d0 drivers/usb/core/hcd.c:1544
 usb_submit_urb+0x6e4/0x1560 drivers/usb/core/urb.c:585
 usb_start_wait_urb+0x101/0x4c0 drivers/usb/core/message.c:58
 usb_internal_control_msg drivers/usb/core/message.c:102 [inline]
 usb_control_msg+0x31c/0x4a0 drivers/usb/core/message.c:153
 do_proc_control+0x4cb/0x9c0 drivers/usb/core/devio.c:1165
 proc_control drivers/usb/core/devio.c:1191 [inline]
 usbdev_do_ioctl drivers/usb/core/devio.c:2535 [inline]
 usbdev_ioctl+0x12c1/0x3b20 drivers/usb/core/devio.c:2708
 vfs_ioctl fs/ioctl.c:48 [inline]
 __do_sys_ioctl fs/ioctl.c:753 [inline]
 __se_sys_ioctl fs/ioctl.c:739 [inline]
 __x64_sys_ioctl+0x193/0x200 fs/ioctl.c:739
 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x443f39
Code: 18 89 d0 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 fb d7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:7ffd18a092c8 EFLAGS: 0246 ORIG_RAX: 0010
RAX: ffda RBX: 004002e0 RCX: 00443f39
RDX: 2000 RSI: c0185500 RDI: 0003
RBP: 006ce018 R08: R09: 004002e0
R10: 000f R11: 0246 R12: 00401bc0
R13: 00401c50 R14: R15:

---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkal...@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
syzbot can test patches for this issue, for details see:
https://goo.gl/tpsmEJ#testing-patches
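For context on the class of bug reported above ("shift exponent 768 is too
large for 32-bit type 'int'"), a common defensive pattern is to validate an
externally supplied bit index before shifting. The sketch below is a generic
standalone illustration, not a proposed fix for vhci_hcd.c; set_status_bit
is a hypothetical helper name.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Set bit @bit in @status only if the shift is well defined for a 32-bit
 * value. Returns false and leaves @status untouched when @bit is out of
 * range, for example when it comes from an untrusted request field.
 */
static bool set_status_bit(uint32_t *status, unsigned int bit)
{
	if (bit >= sizeof(*status) * CHAR_BIT)
		return false;
	*status |= UINT32_C(1) << bit;	/* unsigned operand: no signed overflow */
	return true;
}
```

A request carrying the value 768 would be rejected by this check rather than
reaching the shift.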
linux-next: Tree for Dec 23
Hi all, News: there will be no linux-next releases between Dec 24 and Jan 3 inclusive. Please do not add any v5.12 destined code to your linux-next included branches until after v5.11-rc1 has been released. Changes since 20201222: Non-merge commits (relative to Linus' tree): 923 941 files changed, 28744 insertions(+), 9573 deletions(-) I have created today's linux-next tree at git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git (patches at http://www.kernel.org/pub/linux/kernel/next/ ). If you are tracking the linux-next tree using git, you should not use "git pull" to do so as that will try to merge the new linux-next release with the old one. You should use "git fetch" and checkout or reset to the new master. You can see which trees have been included by looking in the Next/Trees file in the source. There are also quilt-import.log and merge.log files in the Next directory. Between each merge, the tree was built with a ppc64_defconfig for powerpc, an allmodconfig for x86_64, a multi_v7_defconfig for arm and a native build of tools/perf. After the final fixups (if any), I do an x86_64 modules_install followed by builds for x86_64 allnoconfig, powerpc allnoconfig (32 and 64 bit), ppc44x_defconfig, allyesconfig and pseries_le_defconfig and i386, sparc and sparc64 defconfig and htmldocs. And finally, a simple boot test of the powerpc pseries_le_defconfig kernel in qemu (with and without kvm enabled). Below is a summary of the state of the merge. I am currently merging 329 trees (counting Linus' and 85 trees of bug fix patches pending for the current merge release). Stats about the size of the tree over time can be seen at http://neuling.org/linux-next-size.html . Status of my local build tests will be at http://kisskb.ellerman.id.au/linux-next . If maintainers want to give advice about cross compilers/configs that work, we are always open to add more builds. Thanks to Randy Dunlap for doing many randconfig builds. 
And to Paul Gortmaker for triage and bug fixes. -- Cheers, Stephen Rothwell $ git checkout master $ git reset --hard stable Merging origin/master (8653b778e454 Merge tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux) Merging fixes/fixes (9223e74f9960 Merge tag 'io_uring-5.10-2020-11-27' of git://git.kernel.dk/linux-block) Merging kbuild-current/fixes (e37b12e4bb21 Merge tag 'for-linus-5.11-ofs1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux) Merging arc-current/for-curr (3a71e423133a ARC: build: use $(READELF) instead of hard-coded readelf) Merging arm-current/fixes (e64ab473ddda ARM: 9034/1: __div64_32(): straighten up inline asm constraints) Merging arm64-fixes/for-next/fixes (9fd339a45be5 arm64: Work around broken GCC 4.9 handling of "S" constraint) Merging arm-soc-fixes/arm/fixes (f012afb6af3d ARM: dts: ux500/golden: Set display max brightness) Merging drivers-memory-fixes/fixes (3650b228f83a Linux 5.10-rc1) Merging m68k-current/for-linus (2ae92e8b9b7e MAINTAINERS: Update m68k Mac entry) Merging powerpc-fixes/fixes (9c7422b92cb2 powerpc/32s: Fix RTAS machine check with VMAP stack) Merging s390-fixes/fixes (586592478b1f Merge tag 's390-5.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux) Merging sparc/master (0a95a6d1a4cd sparc: use for_each_child_of_node() macro) Merging fscrypt-current/for-stable (d19d8d345eec fscrypt: fix inline encryption not used on new files) Merging net/master (2575bc1aa9d5 net: mvpp2: Fix GoP port 3 Networking Complex Control configurations) Merging bpf/master (e7e518053c26 bpf: Add schedule point in htab_init_buckets()) Merging ipsec/master (56ce7c25ae15 xfrm: Fix oops in xfrm_replay_advance_bmp) Merging netfilter/master (2575bc1aa9d5 net: mvpp2: Fix GoP port 3 Networking Complex Control configurations) Merging ipvs/master (5c8193f568ae netfilter: ipset: fix shift-out-of-bounds in htable_bits()) Merging wireless-drivers/master (bfe55584713b MAINTAINERS: switch to different 
email address) Merging mac80211/master (2c85ebc57b3e Linux 5.10) Merging rdma-fixes/for-rc (340b940ea0ed RDMA/cm: Fix an attempt to use non-valid pointer when cleaning timewait) Merging sound-current/for-linus (13be30f156fd ALSA/hda: apply jack fixup for the Acer Veriton N4640G/N6640G/N2510G) Merging sound-asoc-fixes/for-linus (fd19c7352504 Merge remote-tracking branch 'asoc/for-5.11' into asoc-linus) Merging regmap-fixes/for-linus (e6e9354b5830 regmap: Remove duplicate `type` field from regmap `regcache_sync` trace event) Merging regulator-fixes/for-linus (639b12846819 Merge remote-tracking branch 'regulator/for-5.11' into regulator-linus) Merging spi-fixes/for-linus (676c63ebebaf Merge remote-tracking branch 'spi/for-5.11' into spi-linus) Merging pci-current/for-linus (f8394f232b1e Linux 5.10-rc3) Merging driver-core.current/driver-core-linus (accefff5b547 Merge tag 'ar
Re: [PATCH v1] scsi: ufs-mediatek: Enable UFSHCI_QUIRK_SKIP_MANUAL_WB_FLUSH_CTRL
On 2020-12-23 12:19, Stanley Chu wrote: Hi Can, On Tue, 2020-12-22 at 19:34 +0800, Can Guo wrote: On 2020-12-22 15:29, Stanley Chu wrote: > Flush during hibern8 is sufficient on MediaTek platforms, thus > enable UFSHCI_QUIRK_SKIP_MANUAL_WB_FLUSH_CTRL to skip enabling > fWriteBoosterBufferFlush during WriteBooster initialization. > > Signed-off-by: Stanley Chu > --- > drivers/scsi/ufs/ufs-mediatek.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/drivers/scsi/ufs/ufs-mediatek.c > b/drivers/scsi/ufs/ufs-mediatek.c > index 80618af7c872..c55202b92a43 100644 > --- a/drivers/scsi/ufs/ufs-mediatek.c > +++ b/drivers/scsi/ufs/ufs-mediatek.c > @@ -661,6 +661,7 @@ static int ufs_mtk_init(struct ufs_hba *hba) > >/* Enable WriteBooster */ >hba->caps |= UFSHCD_CAP_WB_EN; > + hba->quirks |= UFSHCI_QUIRK_SKIP_MANUAL_WB_FLUSH_CTRL; >hba->vps->wb_flush_threshold = UFS_WB_BUF_REMAIN_PERCENT(80); > >if (host->caps & UFS_MTK_CAP_DISABLE_AH8) I guess we need it too... AHHA, if you decide to add this in your platform too later, maybe we could change the way it does: Keep manual flush disabled by default and remove this quirk. Yeah... I will get back with an answer later. Thanks, Can Guo. Thanks, Stanley Chu Change LGTM. Regards, Can Guo.
Re: [PATCH v2 19/48] opp: Fix adding OPP entries in a wrong order if rate is unavailable
On 22-12-20, 22:19, Dmitry Osipenko wrote: > 22.12.2020 12:12, Viresh Kumar пишет: > > On 17-12-20, 21:06, Dmitry Osipenko wrote: > >> Fix adding OPP entries in a wrong (opposite) order if OPP rate is > >> unavailable. The OPP comparison is erroneously skipped if OPP rate is > >> missing, thus OPPs are left unsorted. > >> > >> Signed-off-by: Dmitry Osipenko > >> --- > >> drivers/opp/core.c | 23 --- > >> drivers/opp/opp.h | 2 +- > >> 2 files changed, 13 insertions(+), 12 deletions(-) > >> > >> diff --git a/drivers/opp/core.c b/drivers/opp/core.c > >> index 34f7e530d941..5c7f130a8de2 100644 > >> --- a/drivers/opp/core.c > >> +++ b/drivers/opp/core.c > >> @@ -1531,9 +1531,10 @@ static bool _opp_supported_by_regulators(struct > >> dev_pm_opp *opp, > >>return true; > >> } > >> > >> -int _opp_compare_key(struct dev_pm_opp *opp1, struct dev_pm_opp *opp2) > >> +int _opp_compare_key(struct dev_pm_opp *opp1, struct dev_pm_opp *opp2, > >> + bool rate_not_available) > >> { > >> - if (opp1->rate != opp2->rate) > >> + if (!rate_not_available && opp1->rate != opp2->rate) > > > > rate will be 0 for both the OPPs here if rate_not_available is true and so > > this > > change shouldn't be required. > > The rate_not_available is negated in the condition. This change is > required because both rates are 0 and then we should proceed to the > levels comparison. Won't that happen without this patch ? > I guess it's not clear by looking at this patch, please see a full > version of the function: > > int _opp_compare_key(struct dev_pm_opp *opp1, struct dev_pm_opp *opp2, > bool rate_not_available) > { > if (!rate_not_available && opp1->rate != opp2->rate) > return opp1->rate < opp2->rate ? -1 : 1; > if (opp1->bandwidth && opp2->bandwidth && > opp1->bandwidth[0].peak != opp2->bandwidth[0].peak) > return opp1->bandwidth[0].peak < opp2->bandwidth[0].peak ? -1 : 1; > if (opp1->level != opp2->level) > return opp1->level < opp2->level ? 
-1 : 1; > return 0; > } > > Perhaps we could check whether opp1->rate=0, like it's done for the > opp1->bandwidth. I'll consider this variant for v3, thanks. -- viresh
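The question raised in this exchange (does the comparison already fall through to the level when both rates are 0?) can be checked in isolation. This is a simplified sketch, assuming a stand-in `struct opp_key` with only `rate` and `level`; it omits the bandwidth step and is not the code in drivers/opp/core.c.

```c
/* Simplified stand-in for the fields compared in the thread. */
struct opp_key {
	unsigned long rate;	/* 0 when no clock rate is available */
	unsigned int level;
};

/* Order by rate first, then by level; returns -1, 0 or 1. */
static int opp_key_compare(const struct opp_key *a, const struct opp_key *b)
{
	if (a->rate != b->rate)
		return a->rate < b->rate ? -1 : 1;
	if (a->level != b->level)
		return a->level < b->level ? -1 : 1;
	return 0;
}
```

In this sketch, two keys whose rates are both 0 are ordered by level without any extra flag, because the rate test is simply false for equal rates.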
Re: [PATCH] perf stat: Create '--add-default' option to append default list
On 12/23/2020 8:56 AM, Jin, Yao wrote: Hi Arnaldo, On 12/23/2020 12:15 AM, Arnaldo Carvalho de Melo wrote: Em Tue, Dec 22, 2020 at 09:11:31AM +0800, Jin Yao escreveu: The event default list includes the most common events which are widely used by users. But with -e option, the current perf only counts the events assigned by -e option. Users may want to collect some extra events with the default list. For this case, users have to manually add all the events from the default list. It's inconvenient. Also, users may don't know how to get the default list. It's better to add a new option to append default list to the -e events. The new option is '--add-default'. Before: root@kbl-ppc:~# ./perf stat -e power/energy-pkg/ -a -- sleep 1 Performance counter stats for 'system wide': 2.05 Joules power/energy-pkg/ 1.000857974 seconds time elapsed After: root@kbl-ppc:~# ./perf stat -e power/energy-pkg/ -a --add-default -- sleep 1 I thought about: perf stat -e +power/energy-pkg/ -a -- sleep 1 I was surprised to see that '+' syntax had been supported. root@kbl-ppc:~# ./perf stat -e +power/energy-pkg/ -a -- sleep 1 Performance counter stats for 'system wide': 1.99 Joules +power/energy-pkg/ 1.000877852 seconds time elapsed root@kbl-ppc:~# ./perf stat -e +power/energy-pkg/,+cycles -a -- sleep 1 Performance counter stats for 'system wide': 2.00 Joules +power/energy-pkg/ 13,780,620 +cycles 1.001639147 seconds time elapsed Are there any scripts or usages need the prefix '+' before event? I don't know. But if we append the '+' to the default event list, will break something potentially? Which would have its counterpart: perf stat -e -cycles -0a --sleep 1 To remove an event from the defaults, perhaps to deal with some specific hardware where the default or what is in -d, -dd, -ddd, etc can't all be counted. I.e. - and + would remove or add from whaver list was there at that point. - Arnaldo Yes, + and - are more flexible solution. Just for above question, will '+' break existing usage? 
And for '-', I don't know if user can remember clearly for what the events are in default list. For '-', another difficulty is it may conflict with the hardware cache event. Say we remove the "- { return '-'; }" from parse-events.l, such as: diff --git a/tools/perf/util/parse-events.l b/tools/perf/util/parse-events.l index 9db5097317f4..145653d1ce16 100644 --- a/tools/perf/util/parse-events.l +++ b/tools/perf/util/parse-events.l @@ -387,7 +387,6 @@ r{num_raw_hex} { return raw(yyscanner); } {name} { return pmu_str_check(yyscanner, _parse_state); } {name_tag} { return str(yyscanner, PE_NAME); } "/"{ BEGIN(config); return '/'; } -- { return '-'; } , { BEGIN(event); return ','; } : { return ':'; } "{"{ BEGIN(event); return '{'; } The syntax of '-' is supported. root@kbl-ppc:~# ./perf stat -e -cycles -a -- sleep 1 Performance counter stats for 'system wide': 14,008,859 -cycles 1.001471494 seconds time elapsed But the parsing of hardware cache event would be failed. :( root@kbl-ppc:~# ./perf stat -e LLC-stores -a -- sleep 1 event syntax error: 'LLC-stores' \___ parser error That complicates things. 
:( Thanks Jin Yao Thanks Jin Yao Performance counter stats for 'system wide': 2.10 Joules power/energy-pkg/ # 0.000 K/sec 8,009.89 msec cpu-clock # 7.995 CPUs utilized 140 context-switches # 0.017 K/sec 9 cpu-migrations # 0.001 K/sec 66 page-faults # 0.008 K/sec 10,671,929 cycles # 0.001 GHz 4,736,880 instructions # 0.44 insn per cycle 942,951 branches # 0.118 M/sec 76,096 branch-misses # 8.07% of all branches 1.001809960 seconds time elapsed Signed-off-by: Jin Yao --- tools/perf/Documentation/perf-stat.txt | 5 + tools/perf/builtin-stat.c | 4 +++- tools/perf/util/stat.h | 1 + 3 files changed, 9 insertions(+), 1 deletion(-) diff --git a/tools/perf/Documentation/perf-stat.txt b/tools/perf/Documentation/perf-stat.txt index 5d4a673d7621..75a83c2e4dc5 100644 --- a/tools/perf/Documentation/perf-stat.txt +++ b/tools/perf/Documentation/perf-stat.txt @@ -438,6 +438,11 @@ convenient for post processing. --summary:: Print summary for interval mode (-I). +--add-default:: +The default event list includes the most common events which are widely +used by users. But with -e option, the perf only counts the events assigned +by -e
Re: [PATCH phy] PHY: Ingenic: fix unconditional build of phy-ingenic-usb
On 22-12-20, 13:10, Alexander Lobakin wrote: > Currently drivers/phy/ingenic/Makefile adds phy-ingenic-usb to targets > not depending on actual Kconfig symbol CONFIG_PHY_INGENIC_USB, so this > driver always gets built[-in] on every system. > Add missing dependency. Applied, thanks -- ~Vinod
[PATCH v2] arm64: dts: mt8192: add nor_flash device node
From: bayi cheng add nor_flash device node Change-Id: I79f0228529bd8a33e5f354b7a861a4ec8d92e9ba Signed-off-by: bayi cheng --- Change in v2: 1: add dependent patch of arm soc 2: change compatible name Depends on: https://patchwork.kernel.org/patch/11713559/ [v4,1/3] arm64: dts: Add Mediatek SoC MT8192 and evaluation board dts and Makefile --- arch/arm64/boot/dts/mediatek/mt8192.dtsi | 13 + 1 file changed, 13 insertions(+) diff --git a/arch/arm64/boot/dts/mediatek/mt8192.dtsi b/arch/arm64/boot/dts/mediatek/mt8192.dtsi index e12e024..751c877 100644 --- a/arch/arm64/boot/dts/mediatek/mt8192.dtsi +++ b/arch/arm64/boot/dts/mediatek/mt8192.dtsi @@ -379,6 +379,19 @@ status = "disabled"; }; + nor_flash: spi@11234000 { + compatible = "mediatek,mt8192-nor"; + reg = <0 0x11234000 0 0xe0>; + interrupts = ; + clocks = <>, +<>, +<>; + clock-names = "spi", "sf", "axi"; + #address-cells = <1>; + #size-cells = <0>; + status = "disable"; + }; + i2c3: i2c3@11cb { compatible = "mediatek,mt8192-i2c"; reg = <0 0x11cb 0 0x1000>, -- 1.9.1
Re: [PATCH v2 14/48] opp: Filter out OPPs based on availability of a required-OPP
On 22-12-20, 22:17, Dmitry Osipenko wrote: > 22.12.2020 11:59, Viresh Kumar пишет: > > On 17-12-20, 21:06, Dmitry Osipenko wrote: > >> A required OPP may not be available, and thus, all OPPs which are using > >> this required OPP should be unavailable too. > >> > >> Signed-off-by: Dmitry Osipenko > >> --- > >> drivers/opp/core.c | 11 ++- > >> 1 file changed, 10 insertions(+), 1 deletion(-) > > > > Please send a separate patchset for fixes, as these can also go to 5.11 > > itself. > > Alright, although I don't think that this patch fixes any problems for > existing OPP users. Because nobody is using this feature, but otherwise this is a fix for me. > >> diff --git a/drivers/opp/core.c b/drivers/opp/core.c > >> index d9feb7639598..3d02fe33630b 100644 > >> --- a/drivers/opp/core.c > >> +++ b/drivers/opp/core.c > >> @@ -1588,7 +1588,7 @@ int _opp_add(struct device *dev, struct dev_pm_opp > >> *new_opp, > >> struct opp_table *opp_table, bool rate_not_available) > >> { > >>struct list_head *head; > >> - int ret; > >> + int i, ret; > >> > >>mutex_lock(_table->lock); > >>head = _table->opp_list; > >> @@ -1615,6 +1615,15 @@ int _opp_add(struct device *dev, struct dev_pm_opp > >> *new_opp, > >> __func__, new_opp->rate); > >>} > >> > >> + for (i = 0; i < opp_table->required_opp_count && new_opp->available; > >> i++) { > >> + if (new_opp->required_opps[i]->available) > >> + continue; > >> + > >> + new_opp->available = false; > >> + dev_warn(dev, "%s: OPP not supported by required OPP %pOF > >> (%lu)\n", > >> + __func__, new_opp->required_opps[i]->np, > >> new_opp->rate); > > > > Why not just break from here ? > > The new_opp could be already marked as unavailable by a previous voltage > check, hence this loop should be skipped entirely in that case. Then add a separate check for that before the loop as we don't need that check on every iteration here. -- viresh
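The suggestion above (test availability once before the loop, then break on the first unavailable required OPP) could look roughly like the following. It is a sketch under stated assumptions: `struct sketch_opp` and `filter_by_required_opps` are hypothetical names, and locking and the warning message are left out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced stand-in for struct dev_pm_opp. */
struct sketch_opp {
	bool available;
	struct sketch_opp **required_opps;
};

/* Returns true if the OPP is still available after the check. */
static bool filter_by_required_opps(struct sketch_opp *opp, size_t count)
{
	size_t i;

	/* Already disabled by an earlier check: skip the loop entirely. */
	if (!opp->available)
		return false;

	for (i = 0; i < count; i++) {
		if (opp->required_opps[i]->available)
			continue;

		/* One unavailable required OPP is enough; stop here. */
		opp->available = false;
		break;
	}

	return opp->available;
}
```

The loop condition no longer needs to re-test `available` on every iteration, which is the point of hoisting the check.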
Re: [PATCH v1] scsi: ufs-mediatek: Enable UFSHCI_QUIRK_SKIP_MANUAL_WB_FLUSH_CTRL
Hi Can, On Tue, 2020-12-22 at 19:34 +0800, Can Guo wrote: > On 2020-12-22 15:29, Stanley Chu wrote: > > Flush during hibern8 is sufficient on MediaTek platforms, thus > > enable UFSHCI_QUIRK_SKIP_MANUAL_WB_FLUSH_CTRL to skip enabling > > fWriteBoosterBufferFlush during WriteBooster initialization. > > > > Signed-off-by: Stanley Chu > > --- > > drivers/scsi/ufs/ufs-mediatek.c | 1 + > > 1 file changed, 1 insertion(+) > > > > diff --git a/drivers/scsi/ufs/ufs-mediatek.c > > b/drivers/scsi/ufs/ufs-mediatek.c > > index 80618af7c872..c55202b92a43 100644 > > --- a/drivers/scsi/ufs/ufs-mediatek.c > > +++ b/drivers/scsi/ufs/ufs-mediatek.c > > @@ -661,6 +661,7 @@ static int ufs_mtk_init(struct ufs_hba *hba) > > > > /* Enable WriteBooster */ > > hba->caps |= UFSHCD_CAP_WB_EN; > > + hba->quirks |= UFSHCI_QUIRK_SKIP_MANUAL_WB_FLUSH_CTRL; > > hba->vps->wb_flush_threshold = UFS_WB_BUF_REMAIN_PERCENT(80); > > > > if (host->caps & UFS_MTK_CAP_DISABLE_AH8) > > I guess we need it too... AHHA, if you decide to add this in your platform too later, maybe we could change the way it does: Keep manual flush disabled by default and remove this quirk. Thanks, Stanley Chu > > Change LGTM. > > Regards, > > Can Guo.
Re: [PATCH v2 11/48] opp: Add dev_pm_opp_find_level_ceil()
On 22-12-20, 22:15, Dmitry Osipenko wrote: > 22.12.2020 09:42, Viresh Kumar пишет: > > On 17-12-20, 21:06, Dmitry Osipenko wrote: > >> Add a ceil version of the dev_pm_opp_find_level(). It's handy to have if > >> levels don't start from 0 in OPP table and zero usually means a minimal > >> level. > >> > >> Signed-off-by: Dmitry Osipenko > > > > Why doesn't the exact version work for you here ? > > > > The exact version won't find OPP for level=0 if levels don't start with > 0, where 0 means that minimal level is desired. Right, but why do you need to send 0 for your platform ? -- viresh
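To make the difference between the exact and ceil lookups concrete, here is a small sketch over a plain sorted array. The function names and the array representation are assumptions for illustration only; they are not the dev_pm_opp_* API.

```c
#include <stddef.h>

/* Index of the entry equal to `wanted`, or -1 if there is none. */
static int find_level_exact(const unsigned int *levels, size_t n,
			    unsigned int wanted)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (levels[i] == wanted)
			return (int)i;
	return -1;
}

/*
 * Index of the first entry >= `wanted` in an ascending array,
 * or -1 if every entry is smaller.
 */
static int find_level_ceil(const unsigned int *levels, size_t n,
			   unsigned int wanted)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (levels[i] >= wanted)
			return (int)i;
	return -1;
}
```

For a table whose levels are 1, 2, 3, an exact lookup of 0 finds nothing, while the ceil lookup of 0 returns the minimal level, which is the case described in the patch.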
Re: [PATCH v3 3/5] RISC-V: Align the .init.text section
On Fri, 18 Dec 2020 00:19:09 PST (-0800), ati...@atishpatra.org wrote: On Thu, Dec 17, 2020 at 12:33 AM Atish Patra wrote: On Wed, Dec 16, 2020 at 10:51 PM Palmer Dabbelt wrote: > > On Tue, 15 Dec 2020 22:02:54 PST (-0800), Palmer Dabbelt wrote: > > On Wed, 04 Nov 2020 16:04:37 PST (-0800), Atish Patra wrote: > >> In order to improve kernel text protection, we need separate .init.text/ > >> .init.data/.text in separate sections. However, RISC-V linker relaxation > >> code is not aware of any alignment between sections. As a result, it may > >> relax any RISCV_CALL relocations between sections to JAL without realizing > >> that an inter section alignment may move the address farther. That may > >> lead to a relocation truncated fit error. However, linker relaxation code > >> is aware of the individual section alignments. > >> > >> The detailed discussion on this issue can be found here. > >> https://github.com/riscv/riscv-gnu-toolchain/issues/738 > >> > >> Keep the .init.text section aligned so that linker relaxation will take > >> that as a hint while relaxing inter section calls. > >> Here are the code size changes for each section because of this change. > >> > >> section change in size (in bytes) > >> .head.text +4 > >> .text +40 > >> .init.text +6530 > >> .exit.text +84 > >> > >> The only significant increase in size happened for .init.text because > >> all intra relocations also use 2MB alignment. > >> > >> Suggested-by: Jim Wilson > >> Signed-off-by: Atish Patra > >> --- > >> arch/riscv/kernel/vmlinux.lds.S | 8 +++- > >> 1 file changed, 7 insertions(+), 1 deletion(-) > >> > >> diff --git a/arch/riscv/kernel/vmlinux.lds.S b/arch/riscv/kernel/vmlinux.lds.S > >> index 3ffbd6cbdb86..cacd7898ba7f 100644 > >> --- a/arch/riscv/kernel/vmlinux.lds.S > >> +++ b/arch/riscv/kernel/vmlinux.lds.S > >> @@ -30,7 +30,13 @@ SECTIONS > >> . 
= ALIGN(PAGE_SIZE); > >> > >> __init_begin = .; > >> -INIT_TEXT_SECTION(PAGE_SIZE) > >> +__init_text_begin = .; > >> +.init.text : AT(ADDR(.init.text) - LOAD_OFFSET) ALIGN(SECTION_ALIGN) { \ > >> +_sinittext = .; \ > >> +INIT_TEXT \ > >> +_einittext = .; \ > >> +} > >> + > >> . = ALIGN(8); > >> __soc_early_init_table : { > >> __soc_early_init_table_start = .; > > > > Not sure what's going on here (or why I wasn't catching it earlier), but this > > is breaking boot on one of my test configs. I'm not getting any Linux boot > > spew, so it's something fairly early. I'm running defconfig with > > > > CONFIG_PREEMPT=y > > CONFIG_DEBUG_PREEMPT=y > > CONFIG_PROVE_LOCKING=y > > > > It looks like that's been throwing a bunch of warnings for a while, but it did > > at least used to boot. No idea what PREEMPT would have to do with this, and > > the other two don't generally trigger issues that early in boot (or at least, > > trigger halts that early in boot). > > I am able to reproduce this issue but with CONFIG_PROVE_LOCKING not CONFIG_PREEMPT. With CONFIG_PREEMPT, I see a bunch of warnings around smp_processor_id but it boots even with 5.0. If CONFIG_PROVE_LOCKING is enabled, I am not able to boot using 5.0. However, 5.2.0 works fine. I am going to take a look at the issue with 5.0 and PROVE_LOCKING. The config preempt warnings are resolved by the following patch. I have tested it in Qemu. https://patchwork.kernel.org/project/linux-riscv/patch/20201116081238.44223-1-wangkefeng.w...@huawei.com/ Thanks! > > There's a bunch of other stuff that depends on this that's on for-next so I > > don't want to just drop it, but I also don't want to break something. I'm just > > running QEMU's virt board. > > I just verified for-next on QEMU 5.2.0 for virt (RV32,64, nommu) and sifive_u as well. I will give it a try on unleashed tomorrow as well with the above configs enabled. > > I'll take a look again tomorrow night, but if anyone has some time to look > > that'd be great! 
> > Looks like this breaks on QEMU 5.0.0 but works on 5.2.0. I will take a look tomorrow to check the root cause. I guess technically > that means could be considered a regression, but as we don't really have any > scheme for which old versions of QEMU we support it's not absolute. I'd > usually err on the side of keeping support for older platforms, but in this > case it's probably just not worth the time so I'm going to just ignore it. > > ___ > linux-riscv mailing list > linux-ri...@lists.infradead.org > http://lists.infradead.org/mailman/listinfo/linux-riscv -- Regards, Atish
[PATCH v1 2/2] arm64: dts: mt6779: Support ufshci and ufsphy
Support UFS on MT6779 platforms by adding ufshci and ufsphy nodes in dts file. Reviewed-by: Hanks Chen Signed-off-by: Stanley Chu --- arch/arm64/boot/dts/mediatek/mt6779.dtsi | 36 +++- 1 file changed, 35 insertions(+), 1 deletion(-) diff --git a/arch/arm64/boot/dts/mediatek/mt6779.dtsi b/arch/arm64/boot/dts/mediatek/mt6779.dtsi index 370f309d32de..a8584b00cc9d 100644 --- a/arch/arm64/boot/dts/mediatek/mt6779.dtsi +++ b/arch/arm64/boot/dts/mediatek/mt6779.dtsi @@ -225,6 +225,41 @@ #clock-cells = <1>; }; + ufshci: ufshci@1127 { + compatible = "mediatek,mt8183-ufshci"; + reg = <0 0x1127 0 0x2300>; + interrupts = ; + phys = <>; + + clocks = <_ao CLK_INFRA_UFS>, +<_ao CLK_INFRA_UFS_TICK>, +<_ao CLK_INFRA_UFS_AXI>, +<_ao CLK_INFRA_UNIPRO_TICK>, +<_ao CLK_INFRA_UNIPRO_MBIST>, +< CLK_TOP_FAES_UFSFDE>, +<_ao CLK_INFRA_AES_UFSFDE>, +<_ao CLK_INFRA_AES_BCLK>; + clock-names = "ufs", "ufs_tick", "ufs_axi", + "unipro_tick", "unipro_mbist", + "aes_top", "aes_infra", "aes_bclk"; + freq-table-hz = <0 0>, <0 0>, <0 0>, + <0 0>, <0 0>, <0 0>, + <0 0>, <0 0>; + + mediatek,ufs-disable-ah8; + mediatek,ufs-support-va09; + }; + + ufsphy: phy@11fa { + compatible = "mediatek,mt8183-ufsphy"; + reg = <0 0x11fa 0 0xc000>; + #phy-cells = <0>; + + clocks = <_ao CLK_INFRA_UNIPRO_SCK>, +<_ao CLK_INFRA_UFS_MP_SAP_BCLK>; + clock-names = "unipro", "mp"; + }; + mfgcfg: clock-controller@13fbf000 { compatible = "mediatek,mt6779-mfgcfg", "syscon"; reg = <0 0x13fbf000 0 0x1000>; @@ -266,6 +301,5 @@ reg = <0 0x1b00 0 0x1000>; #clock-cells = <1>; }; - }; }; -- 2.18.0
[PATCH v1 1/2] arm64: configs: Support Universal Flash Storage on MediaTek platforms
Support UFS on MediaTek platforms by enabling CONFIG_SCSI_UFS_MEDIATEK. Reviewed-by: Hanks Chen Signed-off-by: Stanley Chu --- arch/arm64/configs/defconfig | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/arm64/configs/defconfig b/arch/arm64/configs/defconfig index 838301650a79..89ab646e0a1e 100644 --- a/arch/arm64/configs/defconfig +++ b/arch/arm64/configs/defconfig @@ -282,6 +282,7 @@ CONFIG_MEGARAID_SAS=y CONFIG_SCSI_MPT3SAS=m CONFIG_SCSI_UFSHCD=y CONFIG_SCSI_UFSHCD_PLATFORM=y +CONFIG_SCSI_UFS_MEDIATEK=m CONFIG_SCSI_UFS_QCOM=m CONFIG_SCSI_UFS_HISI=y CONFIG_ATA=y -- 2.18.0
[PATCH v1 0/2] arm64: Support Universal Flash Storage on MediaTek MT6779 platform
Hi, This series adds UFS (Universal Flash Storage) support on MediaTek MT6779 SoC platform. Stanley Chu (2): arm64: configs: Support Universal Flash Storage on MediaTek platforms arm64: dts: mt6779: Support ufshci and ufsphy arch/arm64/boot/dts/mediatek/mt6779.dtsi | 36 +++- arch/arm64/configs/defconfig | 1 + 2 files changed, 36 insertions(+), 1 deletion(-) -- 2.18.0
Re: [PATCH 1/2] mm/madvise: allow process_madvise operations on entire memory range
On Tue, Dec 22, 2020 at 9:48 AM Suren Baghdasaryan wrote: > > On Tue, Dec 22, 2020 at 5:44 AM Christoph Hellwig wrote: > > > > On Fri, Dec 11, 2020 at 09:27:46PM +0100, Jann Horn wrote: > > > > Can we just use one element in iovec to indicate entire address rather > > > > than using up the reserved flags? > > > > > > > > struct iovec { > > > > .iov_base = NULL, > > > > .iov_len = (~(size_t)0), > > > > }; > > > > > > In addition to Suren's objections, I think it's also worth considering > > > how this looks in terms of compat API. If a compat process does > > > process_madvise() on another compat process, it would be specifying > > > the maximum 32-bit number, rather than the maximum 64-bit number, so > > > you'd need special code to catch that case, which would be ugly. > > > > > > And when a compat process uses this API on a non-compat process, it > > > semantically gets really weird: The actual address range covered would > > > be larger than the address range specified. > > > > > > And if we want different access checks for the two flavors in the > > > future, gating that different behavior on special values in the iovec > > > would feel too magical to me. > > > > > > And the length value SIZE_MAX doesn't really make sense anyway because > > > the length of the whole address space would be SIZE_MAX+1, which you > > > can't express. > > > > > > So I'm in favor of a new flag, and strongly against using SIZE_MAX as > > > a magic number here. > > > > Yes, using SIZE_MAX is a horrible interface in this case. I'm not > > a huge fan of a flag either. What is the use case for the madvise > > to all of a processes address space anyway? > > Thanks for the feedback! The use case is userspace memory reaping > similar to oom-reaper. 
Detailed justification is here: > https://lore.kernel.org/linux-mm/20201124053943.1684874-1-sur...@google.com Actually this post is the most informative and includes test results: https://lore.kernel.org/linux-api/cajucfpgz1kpm3g1gzh+09z7aowkg05qsammisj7h5mdmrrr...@mail.gmail.com/
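The compat objection quoted in this thread can be illustrated with fixed-width integers. This is only a sketch, assuming a 64-bit kernel that zero-extends a 32-bit caller's length; `is_whole_range_native` and `widen_compat_len` are hypothetical names and no such check exists in process_madvise().

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical check for the rejected "SIZE_MAX means everything"
 * convention, as seen by a 64-bit kernel.
 */
static bool is_whole_range_native(uint64_t iov_len)
{
	return iov_len == UINT64_MAX;
}

/* A compat caller's size_t is 32 bits wide; widening zero-extends it. */
static uint64_t widen_compat_len(uint32_t compat_len)
{
	return (uint64_t)compat_len;
}
```

A compat process passing its own maximum length ends up with 0xffffffff after widening, which does not match the 64-bit magic value, so special-case code would be needed. That is the ugliness the thread objects to.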
mmotm 2020-12-22-20-07 uploaded
The mm-of-the-moment snapshot 2020-12-22-20-07 has been uploaded to https://www.ozlabs.org/~akpm/mmotm/ mmotm-readme.txt says README for mm-of-the-moment: https://www.ozlabs.org/~akpm/mmotm/ This is a snapshot of my -mm patch queue. Uploaded at random hopefully more than once a week. You will need quilt to apply these patches to the latest Linus release (5.x or 5.x-rcY). The series file is in broken-out.tar.gz and is duplicated in https://ozlabs.org/~akpm/mmotm/series The file broken-out.tar.gz contains two datestamp files: .DATE and .DATE-yyyy-mm-dd-hh-mm-ss. Both contain the string yyyy-mm-dd-hh-mm-ss, followed by the base kernel version against which this patch series is to be applied. This tree is partially included in linux-next. To see which patches are included in linux-next, consult the `series' file. Only the patches within the #NEXT_PATCHES_START/#NEXT_PATCHES_END markers are included in linux-next. A full copy of the full kernel tree with the linux-next and mmotm patches already applied is available through git within an hour of the mmotm release. Individual mmotm releases are tagged. The master branch always points to the latest release, so it's constantly rebasing. https://github.com/hnaz/linux-mm The directory https://www.ozlabs.org/~akpm/mmots/ (mm-of-the-second) contains daily snapshots of the -mm tree. It is updated more frequently than mmotm, and is untested. 
A git copy of this tree is also available at https://github.com/hnaz/linux-mm This mmotm tree contains the following patches against 5.10: (patches marked "*" will be included in linux-next) origin.patch * kasan-drop-unnecessary-gpl-text-from-comment-headers.patch * kasan-kasan_vmalloc-depends-on-kasan_generic.patch * kasan-group-vmalloc-code.patch * kasan-shadow-declarations-only-for-software-modes.patch * kasan-rename-unpoison_shadow-to-unpoison_range.patch * kasan-rename-kasan_shadow_-to-kasan_granule_.patch * kasan-only-build-initc-for-software-modes.patch * kasan-split-out-shadowc-from-commonc.patch * kasan-define-kasan_memory_per_shadow_page.patch * kasan-rename-report-and-tags-files.patch * kasan-dont-duplicate-config-dependencies.patch * kasan-hide-invalid-free-check-implementation.patch * kasan-decode-stack-frame-only-with-kasan_stack_enable.patch * kasan-arm64-only-init-shadow-for-software-modes.patch * kasan-arm64-only-use-kasan_depth-for-software-modes.patch * kasan-arm64-move-initialization-message.patch * kasan-arm64-rename-kasan_init_tags-and-mark-as-__init.patch * kasan-rename-addr_has_shadow-to-addr_has_metadata.patch * kasan-rename-print_shadow_for_address-to-print_memory_metadata.patch * kasan-rename-shadow-layout-macros-to-meta.patch * kasan-separate-metadata_fetch_row-for-each-mode.patch * kasan-introduce-config_kasan_hw_tags.patch * arm64-enable-armv85-a-asm-arch-option.patch * arm64-mte-add-in-kernel-mte-helpers.patch * arm64-mte-reset-the-page-tag-in-page-flags.patch * arm64-mte-add-in-kernel-tag-fault-handler.patch * arm64-kasan-allow-enabling-in-kernel-mte.patch * arm64-mte-convert-gcr_user-into-an-exclude-mask.patch * arm64-mte-switch-gcr_el1-in-kernel-entry-and-exit.patch * kasan-mm-untag-page-address-in-free_reserved_area.patch * arm64-kasan-align-allocations-for-hw_tags.patch * arm64-kasan-add-arch-layer-for-memory-tagging-helpers.patch * kasan-define-kasan_granule_size-for-hw_tags.patch * kasan-x86-s390-update-undef-config_kasan.patch 
* kasan-arm64-expand-config_kasan-checks.patch * kasan-arm64-implement-hw_tags-runtime.patch * kasan-arm64-print-report-from-tag-fault-handler.patch * kasan-mm-reset-tags-when-accessing-metadata.patch * kasan-arm64-enable-config_kasan_hw_tags.patch * kasan-add-documentation-for-hardware-tag-based-mode.patch * kselftest-arm64-check-gcr_el1-after-context-switch.patch * kasan-simplify-quarantine_put-call-site.patch * kasan-rename-get_alloc-free_info.patch * kasan-introduce-set_alloc_info.patch * kasan-arm64-unpoison-stack-only-with-config_kasan_stack.patch * kasan-allow-vmap_stack-for-hw_tags-mode.patch * kasan-remove-__kasan_unpoison_stack.patch * kasan-inline-kasan_reset_tag-for-tag-based-modes.patch * kasan-inline-random_tag-for-hw_tags.patch * kasan-open-code-kasan_unpoison_slab.patch * kasan-inline-unpoison_range-and-check_invalid_free.patch * kasan-add-and-integrate-kasan-boot-parameters.patch * kasan-mm-check-kasan_enabled-in-annotations.patch * kasan-mm-rename-kasan_poison_kfree.patch * kasan-dont-round_up-too-much.patch * kasan-simplify-assign_tag-and-set_tag-calls.patch * kasan-clarify-comment-in-__kasan_kfree_large.patch * kasan-sanitize-objects-when-metadata-doesnt-fit.patch * kasan-mm-allow-cache-merging-with-no-metadata.patch * kasan-update-documentation.patch * mm-slub-call-account_slab_page-after-slab-page-initialization.patch * mm-memcg-slab-pre-allocate-obj_cgroups-for-slab-caches-with-slab_account.patch * lib-zlib-fix-inflating-zlib-streams-on-s390.patch * selftests-vm-fix-building-protection-keys-test.patch *
Re: [PATCH v2 1/2] mm: cma: allocate cma areas bottom-up
On Mon, 21 Dec 2020 09:05:51 -0800 Roman Gushchin wrote:

> Subject: [PATCH v3 1/2] mm: cma: allocate cma areas bottom-up

i386 allmodconfig:

In file included from ./include/vdso/const.h:5,
                 from ./include/linux/const.h:4,
                 from ./include/linux/bits.h:5,
                 from ./include/linux/bitops.h:6,
                 from ./include/linux/kernel.h:11,
                 from ./include/asm-generic/bug.h:20,
                 from ./arch/x86/include/asm/bug.h:93,
                 from ./include/linux/bug.h:5,
                 from ./include/linux/mmdebug.h:5,
                 from ./include/linux/mm.h:9,
                 from ./include/linux/memblock.h:13,
                 from mm/cma.c:24:
mm/cma.c: In function ‘cma_declare_contiguous_nid’:
./include/uapi/linux/const.h:20:19: warning: conversion from ‘long long unsigned int’ to ‘phys_addr_t’ {aka ‘unsigned int’} changes value from ‘4294967296’ to ‘0’ [-Woverflow]
 #define __AC(X,Y) (X##Y)
                   ^~
./include/uapi/linux/const.h:21:18: note: in expansion of macro ‘__AC’
 #define _AC(X,Y) __AC(X,Y)
                  ^~~~
./include/linux/sizes.h:46:18: note: in expansion of macro ‘_AC’
 #define SZ_4G    _AC(0x100000000, ULL)
                  ^~~
mm/cma.c:349:53: note: in expansion of macro ‘SZ_4G’
   addr = memblock_alloc_range_nid(size, alignment, SZ_4G,
                                                    ^
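The warning above comes down to a 64-bit constant being narrowed to a 32-bit type. A minimal userspace sketch of that narrowing follows. It assumes a 32-bit phys_addr_t, as in the i386 build, and models it with a made-up typedef. The clamp helper is purely hypothetical and is not the fix proposed in this thread; it only shows one way to keep a limit representable.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: stands in for phys_addr_t on a 32-bit build without PAE. */
typedef uint32_t phys_addr32_t;

/* Same numeric value as the kernel's SZ_4G (4294967296). */
#define SKETCH_SZ_4G 0x100000000ULL

/* Plain narrowing keeps only the low 32 bits, so 4 GiB turns into 0. */
static phys_addr32_t narrow_limit(uint64_t limit)
{
	return (phys_addr32_t)limit;
}

/* Hypothetical helper: saturate at the largest value the 32-bit type holds. */
static phys_addr32_t clamp_limit(uint64_t limit)
{
	return limit > UINT32_MAX ? UINT32_MAX : (phys_addr32_t)limit;
}
```

Calling narrow_limit(SKETCH_SZ_4G) yields 0, which is the value change gcc reports, while clamp_limit(SKETCH_SZ_4G) stays at the top of the 32-bit range.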
Re: [PATCH] riscv: return -ENOSYS for syscall -1
On Tue, 22 Dec 2020 08:22:19 PST (-0800), tycho@tycho.pizza wrote:
> On Mon, Dec 21, 2020 at 11:52:00PM +0100, Andreas Schwab wrote:
>> Properly return -ENOSYS for syscall -1 instead of leaving the return value
>> uninitialized. This fixes the strace teststuite.
>>
>> Fixes: 5340627e3fe0 ("riscv: add support for SECCOMP and SECCOMP_FILTER")
>> Signed-off-by: Andreas Schwab
>> ---
>>  arch/riscv/kernel/entry.S | 9 +
>>  1 file changed, 1 insertion(+), 8 deletions(-)
>>
>> diff --git a/arch/riscv/kernel/entry.S b/arch/riscv/kernel/entry.S
>> index 524d918f3601..d07763001eb0 100644
>> --- a/arch/riscv/kernel/entry.S
>> +++ b/arch/riscv/kernel/entry.S
>> @@ -186,14 +186,7 @@ check_syscall_nr:
>>  	 * Syscall number held in a7.
>>  	 * If syscall number is above allowed value, redirect to ni_syscall.
>>  	 */
>> -	bge a7, t0, 1f
>> -	/*
>> -	 * Check if syscall is rejected by tracer, i.e., a7 == -1.
>> -	 * If yes, we pretend it was executed.
>> -	 */
>> -	li t1, -1
>> -	beq a7, t1, ret_from_syscall_rejected
>> -	blt a7, t1, 1f
>> +	bgeu a7, t0, 1f
>
> IIUC, this is all dead code anyway for the path where seccomp actually
> rejects the syscall, since it should do the rejection directly in
> handle_syscall_trace_enter(), which is called above this hunk. So it
> seems good to me.
>
> Reviewed-by: Tycho Andersen

Thanks, this is on fixes.
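For readers less used to RISC-V branches, the switch from bge to bgeu in the hunk above can be sketched in C. This is a userspace illustration only: it assumes a 64-bit a7 register, and the syscall count used in the usage note is a made-up number. The point is that an unsigned comparison sees -1 as the largest possible value, so one branch covers both "too large" and "negative".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Signed check, like "bge a7, t0": negative syscall numbers slip through. */
static bool out_of_range_signed(int64_t nr, int64_t nr_syscalls)
{
	return nr >= nr_syscalls;
}

/* Unsigned check, like "bgeu a7, t0": -1 is compared as UINT64_MAX. */
static bool out_of_range_unsigned(int64_t nr, int64_t nr_syscalls)
{
	return (uint64_t)nr >= (uint64_t)nr_syscalls;
}
```

With a hypothetical table of 450 syscalls, out_of_range_signed(-1, 450) is false while out_of_range_unsigned(-1, 450) is true, so only the unsigned form sends syscall -1 down the ni_syscall path.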
Re: [PATCH] HID: Add Wireless Radio Control feature for Chicony devices
On Wed, Dec 23, 2020 at 12:41 AM Chris Chiu wrote:
>
> On Tue, Dec 22, 2020 at 3:41 PM Jian-Hong Pan wrote:
> >
> > Some Chicony's keyboards support airplane mode hotkey (Fn+F2) with
> > "Wireless Radio Control" feature. For example, the wireless keyboard
> > [04f2:1236] shipped with ASUS all-in-one desktop.
> >
> > After consulting Chicony for this hotkey, learned the device will send
> > with 0x11 as the report ID and 0x1 as the value when the key is pressed
> > down.
> >
> > This patch maps the event as KEY_RFKILL.
> >
> > Signed-off-by: Jian-Hong Pan
> > ---
> > drivers/hid/hid-chicony.c | 58 +++
> > drivers/hid/hid-ids.h | 1 +
> > 2 files changed, 59 insertions(+)
> >
> > diff --git a/drivers/hid/hid-chicony.c b/drivers/hid/hid-chicony.c
> > index 3f0ed6a95223..aca963aa0f1e 100644
> > --- a/drivers/hid/hid-chicony.c
> > +++ b/drivers/hid/hid-chicony.c
> > @@ -21,6 +21,42 @@
> >
> > #include "hid-ids.h"
> >
> > +#define KEY_PRESSED 0x01
> > +#define CH_WIRELESS_CTL_REPORT_ID 0x11
> > +
> > +static int ch_report_wireless(struct hid_report *report, u8 *data, int
> > size)
> > +{
> > +	struct hid_device *hdev = report->device;
> > +	struct input_dev *input;
> > +
> > +	if (report->id != CH_WIRELESS_CTL_REPORT_ID ||
> > +	    report->maxfield != 1 ||
> > +	    *report->field[0]->value != KEY_PRESSED)
>
> Maybe replace this line with hid_check_keys_pressed() and the KEY_PRESSED
> is not required.

Thanks for your suggestion!
I tried hid_check_keys_pressed(), but it always reports that no key is
pressed in this case.
However, if the idea is that the arrival of a report already implies an
input event, then the key-press check is redundant. That makes sense, so I
will change this in the next version.

Thanks!
Jian-Hong Pan

> > +		return 0;
> > +
> > +	input = report->field[0]->hidinput->input;
> > +	if (!input) {
> > +		hid_warn(hdev, "can't find wireless radio control's input");
> > +		return 0;
> > +	}
> > +
> > +	input_report_key(input, KEY_RFKILL, 1);
> > +	input_sync(input);
> > +	input_report_key(input, KEY_RFKILL, 0);
> > +	input_sync(input);
> > +
> > +	return 1;
> > +}
> > +
> > +static int ch_raw_event(struct hid_device *hdev,
> > +		struct hid_report *report, u8 *data, int size)
> > +{
> > +	if (report->application == HID_GD_WIRELESS_RADIO_CTLS)
> > +		return ch_report_wireless(report, data, size);
> > +
> > +	return 0;
> > +}
> > +
> > #define ch_map_key_clear(c) hid_map_usage_clear(hi, usage, bit, max, \
> > 					EV_KEY, (c))
> > static int ch_input_mapping(struct hid_device *hdev, struct hid_input *hi,
> > @@ -77,10 +113,30 @@ static __u8 *ch_switch12_report_fixup(struct
> > hid_device *hdev, __u8 *rdesc,
> > 	return rdesc;
> > }
> >
> > +static int ch_probe(struct hid_device *hdev, const struct hid_device_id
> > *id)
> > +{
> > +	int ret;
> > +
> > +	hdev->quirks |= HID_QUIRK_INPUT_PER_APP;
> > +	ret = hid_parse(hdev);
> > +	if (ret) {
> > +		hid_err(hdev, "Chicony hid parse failed: %d\n", ret);
> > +		return ret;
> > +	}
> > +
> > +	ret = hid_hw_start(hdev, HID_CONNECT_DEFAULT);
> > +	if (ret) {
> > +		hid_err(hdev, "Chicony hw start failed: %d\n", ret);
> > +		return ret;
> > +	}
> > +
> > +	return 0;
> > +}
> >
> > static const struct hid_device_id ch_devices[] = {
> > 	{ HID_USB_DEVICE(USB_VENDOR_ID_CHICONY,
> > USB_DEVICE_ID_CHICONY_TACTICAL_PAD) },
> > 	{ HID_USB_DEVICE(USB_VENDOR_ID_CHICONY,
> > USB_DEVICE_ID_CHICONY_WIRELESS2) },
> > +	{ HID_USB_DEVICE(USB_VENDOR_ID_CHICONY,
> > USB_DEVICE_ID_CHICONY_WIRELESS3) },
> > 	{ HID_USB_DEVICE(USB_VENDOR_ID_CHICONY,
> > USB_DEVICE_ID_CHICONY_ACER_SWITCH12) },
> > 	{ }
> > };
> > @@ -91,6 +147,8 @@ static struct hid_driver ch_driver = {
> > 	.id_table = ch_devices,
> > 	.report_fixup = ch_switch12_report_fixup,
> > 	.input_mapping = ch_input_mapping,
> > +	.probe = ch_probe,
> > +	.raw_event = ch_raw_event,
> > };
> > module_hid_driver(ch_driver);
> >
> > diff --git a/drivers/hid/hid-ids.h b/drivers/hid/hid-ids.h
> > index 4c5f23640f9c..06d90301a3dc 100644
> > --- a/drivers/hid/hid-ids.h
> > +++ b/drivers/hid/hid-ids.h
> > @@ -270,6 +270,7 @@
> > #define USB_DEVICE_ID_CHICONY_PIXART_USB_OPTICAL_MOUSE 0x1053
> > #define USB_DEVICE_ID_CHICONY_PIXART_USB_OPTICAL_MOUSE2 0x0939
> > #define USB_DEVICE_ID_CHICONY_WIRELESS2 0x1123
> > +#define USB_DEVICE_ID_CHICONY_WIRELESS3 0x1236
> > #define USB_DEVICE_ID_ASUS_AK1D 0x1125
> > #define USB_DEVICE_ID_CHICONY_TOSHIBA_WT10A 0x1408
> > #define USB_DEVICE_ID_CHICONY_ACER_SWITCH12 0x1421
> > --
> > 2.29.2
> >
Re: [RFC PATCH 1/3] mm: support hugetlb free page reporting
> On 12/21/20 11:46 PM, Liang Li wrote: > > Free page reporting only supports buddy pages, it can't report the > > free pages reserved for hugetlbfs case. On the other hand, hugetlbfs > > is a good choice for a system with a huge amount of RAM, because it > > can help to reduce the memory management overhead and improve system > > performance. > > This patch add the support for reporting hugepages in the free list > > of hugetlb, it canbe used by virtio_balloon driver for memory > > overcommit and pre zero out free pages for speeding up memory population. > > My apologies as I do not follow virtio_balloon driver. Comments from > the hugetlb perspective. Any comments are welcome. > > static struct page *dequeue_huge_page_node_exact(struct hstate *h, int nid) > > @@ -5531,6 +5537,29 @@ follow_huge_pgd(struct mm_struct *mm, unsigned long > > address, pgd_t *pgd, int fla > > return pte_page(*(pte_t *)pgd) + ((address & ~PGDIR_MASK) >> > > PAGE_SHIFT); > > } > > > > +bool isolate_free_huge_page(struct page *page, struct hstate *h, int nid) > > Looks like this always returns true. Should it be type void? will change in the next revision. > > +{ > > + bool ret = true; > > + > > + VM_BUG_ON_PAGE(!PageHead(page), page); > > + > > + list_move(>lru, >hugepage_activelist); > > + set_page_refcounted(page); > > + h->free_huge_pages--; > > + h->free_huge_pages_node[nid]--; > > + > > + return ret; > > +} > > + > > ... > > +static void > > +hugepage_reporting_drain(struct page_reporting_dev_info *prdev, > > + struct hstate *h, struct scatterlist *sgl, > > + unsigned int nents, bool reported) > > +{ > > + struct scatterlist *sg = sgl; > > + > > + /* > > + * Drain the now reported pages back into their respective > > + * free lists/areas. We assume at least one page is populated. 
> > + */ > > + do { > > + struct page *page = sg_page(sg); > > + > > + putback_isolate_huge_page(h, page); > > + > > + /* If the pages were not reported due to error skip flagging > > */ > > + if (!reported) > > + continue; > > + > > + __SetPageReported(page); > > + } while ((sg = sg_next(sg))); > > + > > + /* reinitialize scatterlist now that it is empty */ > > + sg_init_table(sgl, nents); > > +} > > + > > +/* > > + * The page reporting cycle consists of 4 stages, fill, report, drain, and > > + * idle. We will cycle through the first 3 stages until we cannot obtain a > > + * full scatterlist of pages, in that case we will switch to idle. > > + */ > > As mentioned, I am not familiar with virtio_balloon and the overall design. > So, some of this does not make sense to me. > > > +static int > > +hugepage_reporting_cycle(struct page_reporting_dev_info *prdev, > > + struct hstate *h, unsigned int nid, > > + struct scatterlist *sgl, unsigned int *offset) > > +{ > > + struct list_head *list = >hugepage_freelists[nid]; > > + unsigned int page_len = PAGE_SIZE << h->order; > > + struct page *page, *next; > > + long budget; > > + int ret = 0, scan_cnt = 0; > > + > > + /* > > + * Perform early check, if free area is empty there is > > + * nothing to process so we can skip this free_list. > > + */ > > + if (list_empty(list)) > > + return ret; > > Do note that not all entries on the hugetlb free lists are free. Reserved > entries are also on the free list. The actual number of free entries is > 'h->free_huge_pages - h->resv_huge_pages'. > Is the intention to process reserved pages as well as free pages? 
Yes, Reserved pages was treated as 'free pages' > > + > > + spin_lock_irq(_lock); > > + > > + if (huge_page_order(h) > MAX_ORDER) > > + budget = HUGEPAGE_REPORTING_CAPACITY; > > + else > > + budget = HUGEPAGE_REPORTING_CAPACITY * 32; > > + > > + /* loop through free list adding unreported pages to sg list */ > > + list_for_each_entry_safe(page, next, list, lru) { > > + /* We are going to skip over the reported pages. */ > > + if (PageReported(page)) { > > + if (++scan_cnt >= MAX_SCAN_NUM) { > > + ret = scan_cnt; > > + break; > > + } > > + continue; > > + } > > + > > + /* > > + * If we fully consumed our budget then update our > > + * state to indicate that we are requesting additional > > + * processing and exit this list. > > + */ > > + if (budget < 0) { > > + atomic_set(>state, PAGE_REPORTING_REQUESTED); > > + next = page; > > + break; > > + } > > + > > + /* Attempt to pull page from list and place in
Re: [PATCH 2/3] aspeed-video: clear spurious interrupt bits unconditionally
On Tue, Dec 22, 2020 at 08:53:33PM CST, Ryan Chen wrote: -Original Message- From: Joel Stanley Sent: Wednesday, December 23, 2020 9:07 AM To: Zev Weiss ; Ryan Chen Cc: Eddie James ; Mauro Carvalho Chehab ; Andrew Jeffery ; linux-me...@vger.kernel.org; OpenBMC Maillist ; Linux ARM ; linux-aspeed ; Linux Kernel Mailing List ; Jae Hyun Yoo Subject: Re: [PATCH 2/3] aspeed-video: clear spurious interrupt bits unconditionally On Tue, 22 Dec 2020 at 19:14, Zev Weiss wrote: > > On Mon, Dec 21, 2020 at 10:47:37PM CST, Joel Stanley wrote: > >On Tue, 15 Dec 2020 at 02:46, Zev Weiss wrote: > >> > >> Instead of testing and conditionally clearing them one by one, we > >> can instead just unconditionally clear them all at once. > >> > >> Signed-off-by: Zev Weiss > > > >I had a poke at the assembly and it looks like GCC is clearing the > >bits unconditionally anyway, so removing the tests provides no change. > > > >Combining them is a good further optimization. > > > >Reviewed-by: Joel Stanley > > > >A question unrelated to this patch: Do you know why the driver > >doesn't clear the status bits in the interrupt handler? I would > >expect it to write the value of sts back to the register to ack the > >pending interrupt. > > > > No, I don't, and I was sort of wondering the same thing actually -- > I'm not deeply familiar with this hardware or driver though, so I was > a bit hesitant to start messing with things. (Though maybe doing so > would address the "stickiness" aspect when it does manifest.) Perhaps > Eddie or Jae can shed some light here? I think you're onto something here - this would be why the status bits seem to stick until the device is reset. Until Aspeed can clarify if this is a hardware or software issue, I suggest we ack the bits and log a message when we see them, instead of always ignoring them without taking any action. Can you write a patch that changes the interrupt handler to ack status bits as it handles each of them? 
Hello Zev,

Before the patch, did you hit an issue with the irq handler (interrupts coming in continuously)?
The aspeed_video_irq handler is expected to handle only the interrupts that are enabled:

  u32 sts = aspeed_video_read(video, VE_INTERRUPT_STATUS);
+ sts &= aspeed_video_read(video, VE_INTERRUPT_CTRL);

Ryan

Hi Ryan,

Prior to any of these patches I encountered a problem pretty much exactly like what Jae described in his commit message in 65d270acb2d (but the kernel I was running included that patch). Adding the diagnostic in patch #1 of this series showed that it was apparently the same problem, just with a different interrupt that Jae's patch didn't include.

From what you wrote above, I gather that it is in fact expected for the hardware to assert interrupts that aren't enabled in VE_INTERRUPT_CTRL? If so, I guess something like that would obviate the need for both Jae's earlier patch and this whole series.

I think the question Joel raised is somewhat independent though -- if the VE_INTERRUPT_STATUS register asserts interrupts we're not actually using, should the driver acknowledge them anyway or just leave them alone? (Though if we're just going to ignore them anyway maybe it doesn't ultimately matter very much.)

Zev
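The two ideas discussed in this subthread, masking the status with the enable register and acknowledging whatever is pending, can be sketched with a simulated register pair. This is not the driver's code: the struct and helper names are invented for illustration, and the write-1-to-clear acknowledge behavior is an assumption taken from Joel's expectation earlier in the thread, not something confirmed for this hardware.

```c
#include <assert.h>
#include <stdint.h>

/* Simulated registers; the real driver goes through aspeed_video_read() on MMIO. */
struct fake_video_regs {
	uint32_t status; /* stands in for VE_INTERRUPT_STATUS */
	uint32_t ctrl;   /* stands in for VE_INTERRUPT_CTRL (enable mask) */
};

/* Ryan's suggestion: only look at pending bits that are also enabled. */
static uint32_t enabled_pending(const struct fake_video_regs *r)
{
	return r->status & r->ctrl;
}

/* Assumed write-1-to-clear acknowledge: every bit passed in is dropped from status. */
static void ack_bits(struct fake_video_regs *r, uint32_t bits)
{
	r->status &= ~bits;
}
```

With status 0x0b and ctrl 0x03, enabled_pending() returns 0x03. Acknowledging the full status value rather than only the masked one also clears the stray bit 0x08, which is the open question Zev raises.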
[PATCH] checkpatch: Prefer strscpy to strlcpy
Prefer strscpy over the deprecated strlcpy function.

Requested-by: Andrew Morton
Signed-off-by: Joe Perches
---
 scripts/checkpatch.pl | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index 00085308ed9d..27679cc0ec17 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -6646,6 +6646,12 @@ sub process {
 #			}
 #		}
 
+# strlcpy uses that should likely be strscpy
+		if ($line =~ /\bstrlcpy\s*\(/) {
+			WARN("STRLCPY",
+			     "Prefer strscpy over strlcpy - see: https://lore.kernel.org/r/CAHk-=wgfRnXz0W3D37d01q3JFkr_i_uTL=V6A6G1oUZcprmknw\@mail.gmail.com/\n" . $herecurr);
+		}
+
 # typecasts on min/max could be min_t/max_t
 		if ($perl_version_ok &&
 		    defined $stat &&
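As background for the warning text: strlcpy returns the length of the source string, so it has to walk the whole source even when the destination is tiny, whereas strscpy returns the number of bytes copied or -E2BIG on truncation. The following is a userspace sketch of strscpy-like behavior only. The function name is made up, and this is not the kernel implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical userspace stand-in with strscpy-like return values. */
static ssize_t sketch_strscpy(char *dst, const char *src, size_t size)
{
	size_t i;

	/* Nothing can be stored, not even the terminator. */
	if (size == 0)
		return -E2BIG;

	/* Copy at most size - 1 bytes and never read past the source's NUL. */
	for (i = 0; i < size - 1 && src[i] != '\0'; i++)
		dst[i] = src[i];
	dst[i] = '\0';

	/* If the source did not end here, the copy was truncated. */
	return src[i] == '\0' ? (ssize_t)i : -E2BIG;
}
```

A caller can therefore detect truncation with a simple "ret < 0" test instead of comparing the return value against the buffer size, and the destination is NUL-terminated in both cases.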
Re: [PATCH] vdpa_sim: use iova module to allocate IOVA addresses
On 2020/12/23 上午1:45, Stefano Garzarella wrote: The identical mapping used until now created issues when mapping different virtual pages with the same physical address. To solve this issue, we can use the iova module, to handle the IOVA allocation. For semplicity we use an IOVA allocator with byte granularity. Should be simplicity, so did one comment below. We add two new functions, vdpasim_map_range() and vdpasim_unmap_range(), to handle the IOVA allocation and the registration into the IOMMU/IOTLB. These functions are used by dma_map_ops callbacks. Signed-off-by: Stefano Garzarella Few nits, but: Acked-by: Jason Wang --- drivers/vdpa/vdpa_sim/vdpa_sim.h | 2 + drivers/vdpa/vdpa_sim/vdpa_sim.c | 108 +++ drivers/vdpa/Kconfig | 1 + 3 files changed, 69 insertions(+), 42 deletions(-) diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.h b/drivers/vdpa/vdpa_sim/vdpa_sim.h index b02142293d5b..6efe205e583e 100644 --- a/drivers/vdpa/vdpa_sim/vdpa_sim.h +++ b/drivers/vdpa/vdpa_sim/vdpa_sim.h @@ -6,6 +6,7 @@ #ifndef _VDPA_SIM_H #define _VDPA_SIM_H +#include #include #include #include @@ -55,6 +56,7 @@ struct vdpasim { /* virtio config according to device type */ void *config; struct vhost_iotlb *iommu; + struct iova_domain iova; void *buffer; u32 status; u32 generation; diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c index b3fcc67bfdf0..341b9daf2ea4 100644 --- a/drivers/vdpa/vdpa_sim/vdpa_sim.c +++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c @@ -17,6 +17,7 @@ #include #include #include +#include #include "vdpa_sim.h" @@ -128,30 +129,57 @@ static int dir_to_perm(enum dma_data_direction dir) return perm; } +static dma_addr_t vdpasim_map_range(struct vdpasim *vdpasim, phys_addr_t paddr, + size_t size, unsigned int perm) +{ + struct iova *iova; + dma_addr_t dma_addr; + int ret; + + /* We set the limit_pfn to the maximum (~0UL - 1) */ + iova = alloc_iova(>iova, size, ~0UL - 1, true); Let's use ULONG_MAX? 
+ if (!iova) + return DMA_MAPPING_ERROR; + + dma_addr = iova_dma_addr(>iova, iova); + + spin_lock(>iommu_lock); + ret = vhost_iotlb_add_range(vdpasim->iommu, (u64)dma_addr, + (u64)dma_addr + size - 1, (u64)paddr, perm); + spin_unlock(>iommu_lock); + + if (ret) { + __free_iova(>iova, iova); + return DMA_MAPPING_ERROR; + } + + return dma_addr; +} + +static void vdpasim_unmap_range(struct vdpasim *vdpasim, dma_addr_t dma_addr, + size_t size) +{ + spin_lock(>iommu_lock); + vhost_iotlb_del_range(vdpasim->iommu, (u64)dma_addr, + (u64)dma_addr + size - 1); + spin_unlock(>iommu_lock); + + free_iova(>iova, iova_pfn(>iova, dma_addr)); +} + static dma_addr_t vdpasim_map_page(struct device *dev, struct page *page, unsigned long offset, size_t size, enum dma_data_direction dir, unsigned long attrs) { struct vdpasim *vdpasim = dev_to_sim(dev); - struct vhost_iotlb *iommu = vdpasim->iommu; - u64 pa = (page_to_pfn(page) << PAGE_SHIFT) + offset; - int ret, perm = dir_to_perm(dir); + phys_addr_t paddr = page_to_phys(page) + offset; + int perm = dir_to_perm(dir); if (perm < 0) return DMA_MAPPING_ERROR; - /* For simplicity, use identical mapping to avoid e.g iova -* allocator. 
-*/ - spin_lock(>iommu_lock); - ret = vhost_iotlb_add_range(iommu, pa, pa + size - 1, - pa, dir_to_perm(dir)); - spin_unlock(>iommu_lock); - if (ret) - return DMA_MAPPING_ERROR; - - return (dma_addr_t)(pa); + return vdpasim_map_range(vdpasim, paddr, size, perm); } static void vdpasim_unmap_page(struct device *dev, dma_addr_t dma_addr, @@ -159,12 +187,8 @@ static void vdpasim_unmap_page(struct device *dev, dma_addr_t dma_addr, unsigned long attrs) { struct vdpasim *vdpasim = dev_to_sim(dev); - struct vhost_iotlb *iommu = vdpasim->iommu; - spin_lock(>iommu_lock); - vhost_iotlb_del_range(iommu, (u64)dma_addr, - (u64)dma_addr + size - 1); - spin_unlock(>iommu_lock); + vdpasim_unmap_range(vdpasim, dma_addr, size); } static void *vdpasim_alloc_coherent(struct device *dev, size_t size, @@ -172,27 +196,22 @@ static void *vdpasim_alloc_coherent(struct device *dev, size_t size, unsigned long attrs) { struct vdpasim *vdpasim = dev_to_sim(dev); - struct vhost_iotlb *iommu =
Re: [RFC PATCH 1/3] mm: support hugetlb free page reporting
> On 12/22/20 11:59 AM, Alexander Duyck wrote:
> > On Mon, Dec 21, 2020 at 11:47 PM Liang Li
> > wrote:
> >> +
> >> +	if (huge_page_order(h) > MAX_ORDER)
> >> +		budget = HUGEPAGE_REPORTING_CAPACITY;
> >> +	else
> >> +		budget = HUGEPAGE_REPORTING_CAPACITY * 32;
> >
> > Wouldn't huge_page_order always be more than MAX_ORDER? Seems like we
> > don't even really need budget since this should probably be pulling
> > out no more than one hugepage at a time.
>
> On standard x86_64 configs, 2MB huge pages are of order 9 < MAX_ORDER (11).
> What is important for hugetlb is the largest order that can be allocated
> from buddy. Anything bigger is considered a gigantic page and has to be
> allocated differently.
>
> If the code above is trying to distinguish between huge and gigantic pages,
> it is off by 1. The largest order that can be allocated from the buddy is
> (MAX_ORDER - 1). So, the check should be '>='.
>
> --
> Mike Kravetz

Yes, you're right! Thanks.

Liang
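Mike's off-by-one point is easy to check with a few concrete orders. The sketch below is plain userspace C with an assumed MAX_ORDER of 11, the value he quotes for standard x86_64 configs. The function names are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: value quoted in the thread; buddy serves orders 0 .. MAX_ORDER - 1. */
#define SKETCH_MAX_ORDER 11

/* Comparison as posted in the RFC: order == MAX_ORDER is misclassified. */
static bool is_gigantic_as_posted(unsigned int order)
{
	return order > SKETCH_MAX_ORDER;
}

/* Comparison as suggested in the review. */
static bool is_gigantic_suggested(unsigned int order)
{
	return order >= SKETCH_MAX_ORDER;
}
```

Order 9 (2MB pages) and order 18 (1GB pages with 4KB base pages) come out the same either way. Only an order of exactly 11 differs: the posted check treats it as buddy-allocatable, the suggested one does not.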
Re: [RFC PATCH 1/3] mm: support hugetlb free page reporting
> > +hugepage_reporting_cycle(struct page_reporting_dev_info *prdev, > > +struct hstate *h, unsigned int nid, > > +struct scatterlist *sgl, unsigned int *offset) > > +{ > > + struct list_head *list = >hugepage_freelists[nid]; > > + unsigned int page_len = PAGE_SIZE << h->order; > > + struct page *page, *next; > > + long budget; > > + int ret = 0, scan_cnt = 0; > > + > > + /* > > +* Perform early check, if free area is empty there is > > +* nothing to process so we can skip this free_list. > > +*/ > > + if (list_empty(list)) > > + return ret; > > + > > + spin_lock_irq(_lock); > > + > > + if (huge_page_order(h) > MAX_ORDER) > > + budget = HUGEPAGE_REPORTING_CAPACITY; > > + else > > + budget = HUGEPAGE_REPORTING_CAPACITY * 32; > > Wouldn't huge_page_order always be more than MAX_ORDER? Seems like we > don't even really need budget since this should probably be pulling > out no more than one hugepage at a time. I want to distinguish between a 2M page and a 1GB page here. The order of a 1GB page is greater than MAX_ORDER, while a 2M page's order is less than MAX_ORDER. > > > + /* loop through free list adding unreported pages to sg list */ > > + list_for_each_entry_safe(page, next, list, lru) { > > + /* We are going to skip over the reported pages. */ > > + if (PageReported(page)) { > > + if (++scan_cnt >= MAX_SCAN_NUM) { > > + ret = scan_cnt; > > + break; > > + } > > + continue; > > + } > > + > > It would probably have been better to place this set before your new > set. I don't see your new set necessarily being the best use for page > reporting. I haven't quite understood what you mean here; could you explain it again? > > > + /* > > +* If we fully consumed our budget then update our > > +* state to indicate that we are requesting additional > > +* processing and exit this list. 
> > +*/ > > + if (budget < 0) { > > + atomic_set(>state, PAGE_REPORTING_REQUESTED); > > + next = page; > > + break; > > + } > > + > > If budget is only ever going to be 1 then we probably could just look > at making this the default case for any time we find a non-reported > page. and here again. > > + /* Attempt to pull page from list and place in scatterlist > > */ > > + if (*offset) { > > + isolate_free_huge_page(page, h, nid); > > + /* Add page to scatter list */ > > + --(*offset); > > + sg_set_page([*offset], page, page_len, 0); > > + > > + continue; > > + } > > + > > There is no point in the continue case if we only have a budget of 1. > We should probably just tighten up the loop so that all it does is > search until it finds the 1 page it can pull, pull it, and then return > it. The scatterlist doesn't serve much purpose and could be reduced to > just a single entry. I will think about it more. > > +static int > > +hugepage_reporting_process_hstate(struct page_reporting_dev_info *prdev, > > + struct scatterlist *sgl, struct hstate *h) > > +{ > > + unsigned int leftover, offset = HUGEPAGE_REPORTING_CAPACITY; > > + int ret = 0, nid; > > + > > + for (nid = 0; nid < MAX_NUMNODES; nid++) { > > + ret = hugepage_reporting_cycle(prdev, h, nid, sgl, ); > > + > > + if (ret < 0) > > + return ret; > > + } > > + > > + /* report the leftover pages before going idle */ > > + leftover = HUGEPAGE_REPORTING_CAPACITY - offset; > > + if (leftover) { > > + sgl = [offset]; > > + ret = prdev->report(prdev, sgl, leftover); > > + > > + /* flush any remaining pages out from the last report */ > > + spin_lock_irq(_lock); > > + hugepage_reporting_drain(prdev, h, sgl, leftover, !ret); > > + spin_unlock_irq(_lock); > > + } > > + > > + return ret; > > +} > > + > > If HUGEPAGE_REPORTING_CAPACITY is 1 it would make more sense to > rewrite this code to just optimize for a find and process a page > approach rather than trying to batch pages. Yes, I will make a change. 
Thanks for your comments! Liang
Re: [PATCH v4 2/2] firmware: arm_scmi: Augment SMC/HVC to allow optional interrupt
On 12/22/2020 6:56 AM, Jim Quinlan wrote: > The SMC/HVC SCMI transport is modified to allow the completion of an SCMI > message to be indicated by an interrupt rather than the return of the smc > call. This accommodates the existing behavior of the BrcmSTB SCMI > "platform" whose SW is already out in the field and cannot be changed. > > Signed-off-by: Jim Quinlan This looks good to me, just one question below: [snip] > @@ -111,6 +145,8 @@ static int smc_send_message(struct scmi_chan_info *cinfo, > shmem_tx_prepare(scmi_info->shmem, xfer); > > arm_smccc_1_1_invoke(scmi_info->func_id, 0, 0, 0, 0, 0, 0, 0, ); > + if (scmi_info->irq) > + wait_for_completion(_info->tx_complete); Do we need this to have a preceding call to reinit_completion()? It does not look like this is going to make any practical difference but there are drivers doing that for correctness. -- Florian
Re: [PATCH] mm/userfaultfd: fix memory corruption due to writeprotect
On Tue, Dec 22, 2020 at 09:56:11PM -0500, Andrea Arcangeli wrote: > On Tue, Dec 22, 2020 at 04:39:46PM -0700, Yu Zhao wrote: > > We are talking about non-COW anon pages here -- they can't be mapped > > more than once. So why not just identify them by checking > > page_mapcount == 1 and then unconditionally reuse them? (This is > > probably where I've missed things.) > > The problem in depending on page_mapcount to decide if it's COW or > non-COW (respectively wp_page_copy or wp_page_reuse) is that is GUP > may elevate the count of a COW anon page that become a non-COW anon > page. > > This is Jann's idea not mine. > > The problem is we have an unprivileged long term GUP like vmsplice > that facilitates elevating the page count indefinitely, until the > parent finally writes a secret to it. Theoretically a short term pin > would do it too so it's not just vmpslice, but the short term pin > would be incredibly more challenging to become a concern since it'd > kill a phone battery and flash before it can read any data. > > So what happens with your page_mapcount == 1 check is that it doesn't > mean non-COW (we thought it did until it didn't for the long term gup > pin in vmsplice). > > Jann's testcases does fork() and set page_mapcount 2 and page_count to > 2, vmsplice, take unprivileged infinitely long GUP pin to set > page_count to 3, queue the page in the pipe with page_count elevated, > munmap to drop page_count to 2 and page_mapcount to 1. > > page_mapcount is 1, so you'd think the page is non-COW and owned by > the parent, but the child can still read it so it's very much still > wp_page_copy material if the parent tries to modify it. Otherwise the > child can read the content. > > This was supposed to be solvable by just doing the COW in gup(write=0) > case if page_mapcount > 1 with commit 17839856fd58. 
I'm not exactly > sure why that didn't fly and it had to be reverted by Peter in > a308c71bf1e6e19cc2e4ced31853ee0fc7cb439a but at the time this was > happening I was side tracked by urgent issues and I didn't manage to > look back of how we ended up with the big hammer page_count == 1 check > instead to decide if to call wp_page_reuse or wp_page_shared. > > So anyway, the only thing that is clear to me is that keeping the > child from reading the page_mapcount == 1 pages of the parent, is the > only reason why wp_page_reuse(vmf) will only be called on > page_count(page) == 1 and not on page_mapcount(page) == 1. > > It's also the reason why your page_mapcount assumption will risk to > reintroduce the issue, and I only wish we could put back page_mapcount > == 1 back there. > > Still even if we put back page_mapcount there, it is not ok to leave > the page fault with stale TLB entries and to rely on the fact > wp_page_shared won't run. It'd also avoid the problem but I think if > you leave stale TLB entries in change_protection just like NUMA > balancing does, it also requires a catcher just like NUMA balancing > has, or it'd truly work by luck. > > So until we can put a page_mapcount == 1 check back there, the > page_count will be by definition unreliable because of the speculative > lookups randomly elevating all non zero page_counts at any time in the > background on all pages, so you will never be able to tell if a page > is true COW or if it's just a spurious COW because of a speculative > lookup. It is impossible to differentiate a speculative lookup from a > vmsplice ref in a child. Thanks for the details. In your patch, do we need to take wrprotect_rwsem in handle_userfault() as well? Otherwise, it seems userspace would have to synchronize between its wrprotect ioctl and its fault handler, i.e., the fault handler needs to be aware that the content of write-protected pages can actually change before the ioctl returns.
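The counter sequence Andrea walks through in the quoted text (fork, vmsplice pin, munmap) can be followed with a toy model. This is only bookkeeping in userspace C with invented names. It does not touch real page structs and ignores speculative references, which the thread notes make page_count noisy in practice.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy page: one anonymous page, initially mapped once by the parent. */
struct sketch_page {
	int mapcount; /* how many page tables map the page */
	int refcount; /* mappings plus any extra pins */
};

/* fork(): the child maps the same page, so both counters go up. */
static void sketch_fork(struct sketch_page *p)    { p->mapcount++; p->refcount++; }

/* vmsplice-style GUP pin: only the reference count goes up. */
static void sketch_gup_pin(struct sketch_page *p) { p->refcount++; }

/* munmap() in the child: its mapping and that mapping's reference go away. */
static void sketch_munmap(struct sketch_page *p)  { p->mapcount--; p->refcount--; }

/* The two candidate "may the parent reuse this page in place?" tests. */
static bool reuse_by_mapcount(const struct sketch_page *p) { return p->mapcount == 1; }
static bool reuse_by_refcount(const struct sketch_page *p) { return p->refcount == 1; }
```

Starting from mapcount 1 and refcount 1, the three steps end at mapcount 1 and refcount 2. A mapcount-based test would let the parent write in place even though the pipe still holds the page, while the refcount-based test forces a copy.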