date:20201222

Re: [PATCH 1/2] mm/madvise: allow process_madvise operations on entire memory range

2020-12-22 Thread Christoph Hellwig

On Tue, Dec 22, 2020 at 09:48:43AM -0800, Suren Baghdasaryan wrote:
> Thanks for the feedback! The use case is userspace memory reaping
> similar to oom-reaper. Detailed justification is here:
> https://lore.kernel.org/linux-mm/20201124053943.1684874-1-sur...@google.com

Given that this new variant of process_madvise

  a) does not work on an address range
  b) is destructive
  c) doesn't share much code at all with the rest of process_madvise

Why not add a proper separate syscall?

Re: [PATCH v5 07/12] media: uvcvideo: Implement UVC_EXT_GPIO_UNIT

2020-12-22 Thread Laurent Pinchart

Hi Ricardo,

On Tue, Dec 22, 2020 at 07:36:52PM +0100, Ricardo Ribalda wrote:
> On Tue, Dec 22, 2020 at 9:34 AM Laurent Pinchart wrote:
> > On Mon, Dec 21, 2020 at 05:48:14PM +0100, Ricardo Ribalda wrote:
> > > Some devices can implement a physical switch to disable the input of the
> > > camera on demand. Think of it like an elegant privacy sticker.
> > >
> > > The system can read the status of the privacy switch via a GPIO.
> > >
> > > It is important to know the status of the switch, e.g. to notify the
> > > user when the camera will produce black frames and a videochat
> > > application is used.
> > >
> > > In some systems, the GPIO is connected to main SoC instead of the
> > > camera controller, with the connected reported by the system firmware
> >
> > s/connected/connection/
> >
> > > (ACPI or DT). In that case, the UVC device isn't aware of the GPIO. We
> > > need to implement a virtual entity to handle the GPIO fully on the
> > > driver side.
> > >
> > > For example, for ACPI-based systems, the GPIO is reported in the USB
> > > device object:
> > >
> > >   Scope (\_SB.PCI0.XHCI.RHUB.HS07)
> > >   {
> > >
> > > /.../
> > >
> > > Name (_CRS, ResourceTemplate ()  // _CRS: Current Resource Settings
> > > {
> > > GpioIo (Exclusive, PullDefault, 0x, 0x, 
> > > IoRestrictionOutputOnly,
> > > "\\_SB.PCI0.GPIO", 0x00, ResourceConsumer, ,
> > > )
> > > {   // Pin list
> > > 0x0064
> > > }
> > > })
> > > Name (_DSD, Package (0x02)  // _DSD: Device-Specific Data
> > > {
> > > ToUUID ("daffd814-6eba-4d8c-8a91-bc9bbf4aa301") /* Device 
> > > Properties for _DSD */,
> > > Package (0x01)
> > > {
> > > Package (0x02)
> > > {
> > > "privacy-gpio",
> > > Package (0x04)
> > > {
> > > \_SB.PCI0.XHCI.RHUB.HS07,
> > > Zero,
> > > Zero,
> > > One
> > > }
> > > }
> > > }
> > > })
> > >   }
> > >
> > > Signed-off-by: Ricardo Ribalda 
> > > ---
> > >  drivers/media/usb/uvc/uvc_ctrl.c   |   7 ++
> > >  drivers/media/usb/uvc/uvc_driver.c | 156 +
> > >  drivers/media/usb/uvc/uvc_entity.c |   1 +
> > >  drivers/media/usb/uvc/uvcvideo.h   |  16 +++
> > >  4 files changed, 180 insertions(+)
> > >
> > > diff --git a/drivers/media/usb/uvc/uvc_ctrl.c 
> > > b/drivers/media/usb/uvc/uvc_ctrl.c
> > > index 528254230535..a430fa666897 100644
> > > --- a/drivers/media/usb/uvc/uvc_ctrl.c
> > > +++ b/drivers/media/usb/uvc/uvc_ctrl.c
> > > @@ -1300,6 +1300,10 @@ static void __uvc_ctrl_status_event_work(struct 
> > > uvc_device *dev,
> > >
> > >   mutex_unlock(>ctrl_mutex);
> > >
> > > + /* Events not started by the UVC device. E.g. the GPIO unit */
> > > + if (!w->urb)
> > > + return;
> > > +
> > >   /* Resubmit the URB. */
> > >   w->urb->interval = dev->int_ep->desc.bInterval;
> > >   ret = usb_submit_urb(w->urb, GFP_KERNEL);
> > > @@ -2317,6 +2321,9 @@ int uvc_ctrl_init_device(struct uvc_device *dev)
> > >   } else if (UVC_ENTITY_TYPE(entity) == UVC_ITT_CAMERA) {
> > >   bmControls = entity->camera.bmControls;
> > >   bControlSize = entity->camera.bControlSize;
> > > + } else if (UVC_ENTITY_TYPE(entity) == UVC_EXT_GPIO_UNIT) {
> > > + bmControls = entity->gpio.bmControls;
> > > + bControlSize = entity->gpio.bControlSize;
> > >   }
> > >
> > >   /* Remove bogus/blacklisted controls */
> > > diff --git a/drivers/media/usb/uvc/uvc_driver.c 
> > > b/drivers/media/usb/uvc/uvc_driver.c
> > > index c0c5f75ade40..72516101fdd0 100644
> > > --- a/drivers/media/usb/uvc/uvc_driver.c
> > > +++ b/drivers/media/usb/uvc/uvc_driver.c
> > > @@ -7,6 +7,7 @@
> > >   */
> > >
> > >  #include 
> > > +#include 
> > >  #include 
> > >  #include 
> > >  #include 
> > > @@ -1020,6 +1021,7 @@ static int uvc_parse_streaming(struct uvc_device 
> > > *dev,
> > >  }
> > >
> > >  static const u8 uvc_camera_guid[16] = UVC_GUID_UVC_CAMERA;
> > > +static const u8 uvc_gpio_guid[16] = UVC_GUID_EXT_GPIO_CONTROLLER;
> > >  static const u8 uvc_media_transport_input_guid[16] =
> > >   UVC_GUID_UVC_MEDIA_TRANSPORT_INPUT;
> > >  static const u8 uvc_processing_guid[16] = UVC_GUID_UVC_PROCESSING;
> > > @@ -1051,6 +1053,9 @@ static struct uvc_entity *uvc_alloc_entity(u16 
> > > type, u16 id,
> > >* is initialized by the caller.
> > >*/
> > >   switch (type) {
> > > + case UVC_EXT_GPIO_UNIT:
> > > + memcpy(entity->guid, uvc_gpio_guid, 16);
> > > + break;
> > >   case UVC_ITT_CAMERA:
> > >   memcpy(entity->guid, uvc_camera_guid, 16);
> > >   break;
> > > @@ -1464,6 +1469,137 @@ static int

Re: Does uaccess_kernel() work for detecting kernel thread?

2020-12-22 Thread Christoph Hellwig

On Tue, Dec 22, 2020 at 11:39:08PM +0900, Tetsuo Handa wrote:
> For example, if uaccess_kernel() is "false" due to CONFIG_SET_FS=n,
> isn't sg_check_file_access() failing to detect kernel context?

sg_check_file_access does exactly the right thing - fail for all kernel
threads as those can't support the magic it does.

> For another example, if uaccess_kernel() is "false" due to CONFIG_SET_FS=n,
> isn't TOMOYO unexpectedly checking permissions for socket operations?

Can someone explain WTF TOMOYO is even doing there?  A security module
has absolutely no business checking what context it is called from, but
must check the process credentials instead.

Re: [PATCH] vdpa_sim: use iova module to allocate IOVA addresses

2020-12-22 Thread Stefano Garzarella


On Wed, Dec 23, 2020 at 11:43:40AM +0800, Jason Wang wrote:


On 2020/12/23 上午1:45, Stefano Garzarella wrote:

The identical mapping used until now created issues when mapping
different virtual pages with the same physical address.
To solve this issue, we can use the iova module, to handle the IOVA
allocation.
For semplicity we use an IOVA allocator with byte granularity.



Should be simplicity, so did one comment below.


Right, I'll fix here and in the comment below.






We add two new functions, vdpasim_map_range() and vdpasim_unmap_range(),
to handle the IOVA allocation and the registration into the IOMMU/IOTLB.
These functions are used by dma_map_ops callbacks.

Signed-off-by: Stefano Garzarella 



Few nits, but:

Acked-by: Jason Wang 


Thanks!





---
 drivers/vdpa/vdpa_sim/vdpa_sim.h |   2 +
 drivers/vdpa/vdpa_sim/vdpa_sim.c | 108 +++
 drivers/vdpa/Kconfig |   1 +
 3 files changed, 69 insertions(+), 42 deletions(-)

diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.h b/drivers/vdpa/vdpa_sim/vdpa_sim.h
index b02142293d5b..6efe205e583e 100644
--- a/drivers/vdpa/vdpa_sim/vdpa_sim.h
+++ b/drivers/vdpa/vdpa_sim/vdpa_sim.h
@@ -6,6 +6,7 @@
 #ifndef _VDPA_SIM_H
 #define _VDPA_SIM_H
+#include 
 #include 
 #include 
 #include 
@@ -55,6 +56,7 @@ struct vdpasim {
/* virtio config according to device type */
void *config;
struct vhost_iotlb *iommu;
+   struct iova_domain iova;
void *buffer;
u32 status;
u32 generation;
diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c
index b3fcc67bfdf0..341b9daf2ea4 100644
--- a/drivers/vdpa/vdpa_sim/vdpa_sim.c
+++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c
@@ -17,6 +17,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "vdpa_sim.h"
@@ -128,30 +129,57 @@ static int dir_to_perm(enum dma_data_direction dir)
return perm;
 }
+static dma_addr_t vdpasim_map_range(struct vdpasim *vdpasim, phys_addr_t paddr,
+   size_t size, unsigned int perm)
+{
+   struct iova *iova;
+   dma_addr_t dma_addr;
+   int ret;
+
+   /* We set the limit_pfn to the maximum (~0UL - 1) */
+   iova = alloc_iova(&vdpasim->iova, size, ~0UL - 1, true);



Let's use ULONG_MAX?


Definitely, much better!





+   if (!iova)
+   return DMA_MAPPING_ERROR;
+
+   dma_addr = iova_dma_addr(&vdpasim->iova, iova);
+
+   spin_lock(&vdpasim->iommu_lock);
+   ret = vhost_iotlb_add_range(vdpasim->iommu, (u64)dma_addr,
+   (u64)dma_addr + size - 1, (u64)paddr, perm);
+   spin_unlock(&vdpasim->iommu_lock);
+
+   if (ret) {
+   __free_iova(&vdpasim->iova, iova);
+   return DMA_MAPPING_ERROR;
+   }
+
+   return dma_addr;
+}
+
+static void vdpasim_unmap_range(struct vdpasim *vdpasim, dma_addr_t dma_addr,
+   size_t size)
+{
+   spin_lock(&vdpasim->iommu_lock);
+   vhost_iotlb_del_range(vdpasim->iommu, (u64)dma_addr,
+ (u64)dma_addr + size - 1);
+   spin_unlock(&vdpasim->iommu_lock);
+
+   free_iova(&vdpasim->iova, iova_pfn(&vdpasim->iova, dma_addr));
+}
+
 static dma_addr_t vdpasim_map_page(struct device *dev, struct page *page,
   unsigned long offset, size_t size,
   enum dma_data_direction dir,
   unsigned long attrs)
 {
struct vdpasim *vdpasim = dev_to_sim(dev);
-   struct vhost_iotlb *iommu = vdpasim->iommu;
-   u64 pa = (page_to_pfn(page) << PAGE_SHIFT) + offset;
-   int ret, perm = dir_to_perm(dir);
+   phys_addr_t paddr = page_to_phys(page) + offset;
+   int perm = dir_to_perm(dir);
if (perm < 0)
return DMA_MAPPING_ERROR;
-   /* For simplicity, use identical mapping to avoid e.g iova
-* allocator.
-*/
-   spin_lock(&vdpasim->iommu_lock);
-   ret = vhost_iotlb_add_range(iommu, pa, pa + size - 1,
-   pa, dir_to_perm(dir));
-   spin_unlock(&vdpasim->iommu_lock);
-   if (ret)
-   return DMA_MAPPING_ERROR;
-
-   return (dma_addr_t)(pa);
+   return vdpasim_map_range(vdpasim, paddr, size, perm);
 }
 static void vdpasim_unmap_page(struct device *dev, dma_addr_t dma_addr,
@@ -159,12 +187,8 @@ static void vdpasim_unmap_page(struct device *dev, 
dma_addr_t dma_addr,
   unsigned long attrs)
 {
struct vdpasim *vdpasim = dev_to_sim(dev);
-   struct vhost_iotlb *iommu = vdpasim->iommu;
-   spin_lock(&vdpasim->iommu_lock);
-   vhost_iotlb_del_range(iommu, (u64)dma_addr,
- (u64)dma_addr + size - 1);
-   spin_unlock(&vdpasim->iommu_lock);
+   vdpasim_unmap_range(vdpasim, dma_addr, size);
 }
 static void *vdpasim_alloc_coherent(struct device *dev, size_t size,
@@ -172,27 +196,22 @@ static void *vdpasim_alloc_coherent(struct device *dev, 
size_t size,
unsigned

Re: [PATCH] crypto: keembay-ocs-aes - Add dependency on HAS_IOMEM

2020-12-22 Thread Herbert Xu

On Thu, Dec 17, 2020 at 04:35:10PM +0000, Daniele Alessandrelli wrote:
> From: Daniele Alessandrelli 
> 
> Add dependency for CRYPTO_DEV_KEEMBAY_OCS_AES_SM4 on HAS_IOMEM to
> prevent build failures.
> 
> Fixes: 88574332451380f4 ("crypto: keembay - Add support for Keem Bay OCS 
> AES/SM4")
> Reported-by: kernel test robot 
> Signed-off-by: Daniele Alessandrelli 
> ---
>  drivers/crypto/keembay/Kconfig | 1 +
>  1 file changed, 1 insertion(+)

Patch applied.  Thanks.
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

Re: [PATCH bpf-next 1/2] bpf: Add a bpf_kallsyms_lookup helper

2020-12-22 Thread Christoph Hellwig

On Tue, Dec 22, 2020 at 09:17:41PM +0100, Florent Revest wrote:
> On Tue, Dec 22, 2020 at 3:18 PM Christoph Hellwig  wrote:
> >
> > FYI, there is a reason why kallsyms_lookup is not exported any more.
> > I don't think adding that back through a backdoor is a good idea.
> 
> Did you maybe mean kallsyms_lookup_name (the one that looks an address
> up based on a symbol name) ? It used to be exported but isn't anymore
> indeed.
> However, this is not what we're trying to do. As far as I can tell,
> kallsyms_lookup (the one that looks a symbol name up based on an
> address) has never been exported but its close cousins sprint_symbol
> and sprint_symbol_no_offset (which only call kallsyms_lookup and
> pretty print the result) are still exported, they are also used by
> vsprintf. Is this an issue ?

Indeed, I was thinking of kallsyms_lookup_name.  Let me take another
look at the patch, but kallsyms_lookup still seems like a very
low-level function to export to arbitrary eBPF programs.
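
To keep the two lookup directions discussed above apart, here is a toy
userspace sketch. The table, the toy_ names and the addresses are invented
for illustration; nothing here is the kernel's kallsyms implementation or
its function signatures.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical symbol record: start address plus name. */
struct toy_sym {
	unsigned long addr;
	const char *name;
};

/* Hypothetical table, sorted by ascending start address. */
static const struct toy_sym toy_syms[] = {
	{ 0x1000, "alpha" },
	{ 0x1040, "beta" },
	{ 0x1100, "gamma" },
};
#define TOY_NSYMS (sizeof(toy_syms) / sizeof(toy_syms[0]))

/* Name -> address (the kallsyms_lookup_name() direction); 0 if unknown. */
static unsigned long toy_lookup_name(const char *name)
{
	size_t i;

	for (i = 0; i < TOY_NSYMS; i++)
		if (strcmp(toy_syms[i].name, name) == 0)
			return toy_syms[i].addr;
	return 0;
}

/*
 * Address -> name (the kallsyms_lookup() direction). Picks the last symbol
 * that starts at or before addr and optionally reports the offset into it.
 * Returns NULL if addr lies before the first symbol.
 */
static const char *toy_lookup_addr(unsigned long addr, unsigned long *offset)
{
	const struct toy_sym *best = NULL;
	size_t i;

	for (i = 0; i < TOY_NSYMS; i++) {
		if (toy_syms[i].addr > addr)
			break;
		best = &toy_syms[i];
	}
	if (!best)
		return NULL;
	if (offset)
		*offset = addr - best->addr;
	return best->name;
}
```

In this toy table, toy_lookup_name("beta") yields 0x1040, while
toy_lookup_addr(0x1044, &off) yields "beta" with off set to 4.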

Re: [PATCH] crypto: CRYPTO_DEV_KEEMBAY_OCS_AES_SM4 should depend on ARCH_KEEMBAY

2020-12-22 Thread Herbert Xu

On Wed, Dec 16, 2020 at 02:14:59PM +0100, Geert Uytterhoeven wrote:
> The Intel Keem Bay Offload and Crypto Subsystem (OCS) is only present on
> Intel Keem Bay SoCs.  Hence add a dependency on ARCH_KEEMBAY, to prevent
> asking the user about this driver when configuring a kernel without
> Intel Keem Bay platform support.
> 
> While at it, fix a misspelling of "cipher".
> 
> Fixes: 88574332451380f4 ("crypto: keembay - Add support for Keem Bay OCS 
> AES/SM4")
> Signed-off-by: Geert Uytterhoeven 
> ---
>  drivers/crypto/keembay/Kconfig | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)

Patch applied.  Thanks.
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

[PATCH] arm64: dts: mt8192: add thermal zones, cooling map and trips

2020-12-22 Thread Michael Kao

Add thermal zone nodes so that the mt8192 temperature can be read.
Thermal throttling starts at 68C and the target temperature is 85C.

This patch depends on [1].

[1] https://patchwork.kernel.org/project/linux-mediatek/patch/20201221061018.18503-3-yz...@mediatek.com/

Signed-off-by: Michael Kao 
---
 arch/arm64/boot/dts/mediatek/mt8192.dtsi | 169 +++
 1 file changed, 169 insertions(+)

diff --git a/arch/arm64/boot/dts/mediatek/mt8192.dtsi 
b/arch/arm64/boot/dts/mediatek/mt8192.dtsi
index 4a0d941aec30..4020e40a092a 100644
--- a/arch/arm64/boot/dts/mediatek/mt8192.dtsi
+++ b/arch/arm64/boot/dts/mediatek/mt8192.dtsi
@@ -9,6 +9,7 @@
 #include 
 #include 
 #include 
+#include 
 
 / {
compatible = "mediatek,mt8192";
@@ -42,6 +43,7 @@
clock-frequency = <170100>;
next-level-cache = <_0>;
capacity-dmips-mhz = <530>;
+   #cooling-cells = <2>;
};
 
cpu1: cpu@100 {
@@ -52,6 +54,7 @@
clock-frequency = <170100>;
next-level-cache = <_0>;
capacity-dmips-mhz = <530>;
+   #cooling-cells = <2>;
};
 
cpu2: cpu@200 {
@@ -62,6 +65,7 @@
clock-frequency = <170100>;
next-level-cache = <_0>;
capacity-dmips-mhz = <530>;
+   #cooling-cells = <2>;
};
 
cpu3: cpu@300 {
@@ -72,6 +76,7 @@
clock-frequency = <170100>;
next-level-cache = <_0>;
capacity-dmips-mhz = <530>;
+   #cooling-cells = <2>;
};
 
cpu4: cpu@400 {
@@ -82,6 +87,7 @@
clock-frequency = <217100>;
next-level-cache = <_1>;
capacity-dmips-mhz = <1024>;
+   #cooling-cells = <2>;
};
 
cpu5: cpu@500 {
@@ -92,6 +98,7 @@
clock-frequency = <217100>;
next-level-cache = <_1>;
capacity-dmips-mhz = <1024>;
+   #cooling-cells = <2>;
};
 
cpu6: cpu@600 {
@@ -102,6 +109,7 @@
clock-frequency = <217100>;
next-level-cache = <_1>;
capacity-dmips-mhz = <1024>;
+   #cooling-cells = <2>;
};
 
cpu7: cpu@700 {
@@ -112,6 +120,7 @@
clock-frequency = <217100>;
next-level-cache = <_1>;
capacity-dmips-mhz = <1024>;
+   #cooling-cells = <2>;
};
 
cpu-map {
@@ -178,6 +187,140 @@
method = "smc";
};
 
+   thermal-zones {
+   soc_max {
+   polling-delay = <1000>; /* milliseconds */
+   polling-delay-passive = <1000>; /* milliseconds */
+   thermal-sensors = < 0>;
+   sustainable-power = <1500>;
+
+   trips {
+   threshold: trip-point@0 {
+   temperature = <68000>;
+   hysteresis = <2000>;
+   type = "passive";
+   };
+
+   target: target@1 {
+   temperature = <85000>;
+   hysteresis = <2000>;
+   type = "passive";
+   };
+
+   soc_max_crit: soc_max_crit@0 {
+   temperature = <115000>;
+   hysteresis = <2000>;
+   type = "critical";
+   };
+   };
+
+   cooling-maps {
+   map0 {
+   trip = <>;
+   cooling-device = <
+   THERMAL_NO_LIMIT
+   THERMAL_NO_LIMIT>,
+<
+   THERMAL_NO_LIMIT
+   THERMAL_NO_LIMIT>,
+<
+   THERMAL_NO_LIMIT
+   THERMAL_NO_LIMIT>,
+<
+

Re: [PATCH v1 0/5] dm: dm-user: New target that proxies BIOs to userspace

2020-12-22 Thread Christoph Hellwig



FYI, a few years ago I spent some time helping a customer to prepare
their block device in userspace using fuse code for upstreaming, but
at some point they abandoned the project.  But if for some reason we
don't want to use nbd I think a driver using the fuse infrastructure
would be the next logical choice.

[PATCH] thermal: cpufreq_cooling: fix slab OOB issue

2020-12-22 Thread Michael Kao

From: brian-sy yang 

KASAN reports a slab out-of-bounds access in cpu_power_to_freq().
If the power limit is below the power of OPP0 in the EM table,
the loop exits with a negative array index, which causes the
out-of-bounds read.

Return the lowest frequency if no suitable OPP can be found in the
EM table for the limited power.

Backtrace:
[] die+0x104/0x5ac
[] bug_handler+0x64/0xd0
[] brk_handler+0x160/0x258
[] do_debug_exception+0x248/0x3f0
[] el1_dbg+0x14/0xbc
[] __kasan_report+0x1dc/0x1e0
[] kasan_report+0x10/0x20
[] __asan_report_load8_noabort+0x18/0x28
[] cpufreq_power2state+0x180/0x43c
[] power_actor_set_power+0x114/0x1d4
[] allocate_power+0xaec/0xde0
[] power_allocator_throttle+0x3ec/0x5a4
[] handle_thermal_trip+0x160/0x294
[] thermal_zone_device_check+0xe4/0x154
[] process_one_work+0x5e4/0xe28
[] worker_thread+0xa4c/0xfac
[] kthread+0x33c/0x358
[] ret_from_fork+0xc/0x18

Signed-off-by: brian-sy yang 
---
 drivers/thermal/cpufreq_cooling.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/thermal/cpufreq_cooling.c 
b/drivers/thermal/cpufreq_cooling.c
index cc2959f22f01..fb33b3480a8f 100644
--- a/drivers/thermal/cpufreq_cooling.c
+++ b/drivers/thermal/cpufreq_cooling.c
@@ -123,7 +123,7 @@ static u32 cpu_power_to_freq(struct cpufreq_cooling_device 
*cpufreq_cdev,
 {
int i;
 
-   for (i = cpufreq_cdev->max_level; i >= 0; i--) {
+   for (i = cpufreq_cdev->max_level; i > 0; i--) {
if (power >= cpufreq_cdev->em->table[i].power)
break;
}
-- 
2.18.0
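
A minimal userspace sketch of the bounded lookup this patch describes. The
struct opp_entry type, the power_to_freq() name and the sample values are
hypothetical stand-ins, not the kernel's energy-model API. With the old
"i >= 0" condition, a budget below the first entry's power leaves i at -1
and table[-1] is read; stopping at "i > 0" lets the loop fall through to
index 0, the lowest frequency.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one energy-model entry (not the kernel struct). */
struct opp_entry {
	unsigned int frequency; /* kHz */
	unsigned int power;     /* mW */
};

/*
 * Return the highest frequency whose power cost fits within the budget.
 * The table is assumed to be sorted by ascending power. The loop never
 * decrements i below 0, so if no entry fits, index 0 (the lowest OPP)
 * is used.
 */
static unsigned int power_to_freq(const struct opp_entry *table, size_t n,
				  unsigned int power)
{
	int i;

	assert(table && n > 0);

	for (i = (int)n - 1; i > 0; i--) {
		if (power >= table[i].power)
			break;
	}

	return table[i].frequency;
}
```

With a table of { 500000, 100 }, { 1000000, 300 }, { 1500000, 700 }, a
budget of 50 falls back to 500000 instead of indexing before the array.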

RE: [PATCH v1] scsi: ufs-mediatek: Enable UFSHCI_QUIRK_SKIP_MANUAL_WB_FLUSH_CTRL

2020-12-22 Thread Avri Altman

> 
> On 2020-12-23 12:19, Stanley Chu wrote:
> > Hi Can,
> >
> > On Tue, 2020-12-22 at 19:34 +0800, Can Guo wrote:
> >> On 2020-12-22 15:29, Stanley Chu wrote:
> >> > Flush during hibern8 is sufficient on MediaTek platforms, thus
> >> > enable UFSHCI_QUIRK_SKIP_MANUAL_WB_FLUSH_CTRL to skip
> enabling
> >> > fWriteBoosterBufferFlush during WriteBooster initialization.
> >> >
> >> > Signed-off-by: Stanley Chu 
> >> > ---
> >> >  drivers/scsi/ufs/ufs-mediatek.c | 1 +
> >> >  1 file changed, 1 insertion(+)
> >> >
> >> > diff --git a/drivers/scsi/ufs/ufs-mediatek.c
> >> > b/drivers/scsi/ufs/ufs-mediatek.c
> >> > index 80618af7c872..c55202b92a43 100644
> >> > --- a/drivers/scsi/ufs/ufs-mediatek.c
> >> > +++ b/drivers/scsi/ufs/ufs-mediatek.c
> >> > @@ -661,6 +661,7 @@ static int ufs_mtk_init(struct ufs_hba *hba)
> >> >
> >> >/* Enable WriteBooster */
> >> >hba->caps |= UFSHCD_CAP_WB_EN;
> >> > +  hba->quirks |= UFSHCI_QUIRK_SKIP_MANUAL_WB_FLUSH_CTRL;
> >> >hba->vps->wb_flush_threshold =
> UFS_WB_BUF_REMAIN_PERCENT(80);
> >> >
> >> >if (host->caps & UFS_MTK_CAP_DISABLE_AH8)
> >>
> >> I guess we need it too...
> >
> > AHHA, if you decide to add this in your platform too later, maybe we
> > could change the way it does: Keep manual flush disabled by default and
> > remove this quirk.
Ack on that.
I never understood why it was needed in the first place.
Maybe just remove it, and allow to perform explicit flush from sysfs.

Thanks,
Avri
> >
> 
> Yeah... I will get back with an answer later.

Re: [PATCH] erofs: support direct IO for uncompressed file

2020-12-22 Thread Christoph Hellwig

On Wed, Dec 23, 2020 at 03:39:01AM +0800, Gao Xiang wrote:
> Hi Christoph,
> 
> On Tue, Dec 22, 2020 at 02:22:34PM +0000, Christoph Hellwig wrote:
> > Please do not add new callers of __blockdev_direct_IO and use the modern
> > iomap variant instead.
> 
> We've talked about this topic before. The current status is that iomap
> doesn't support tail-packing inline data yet (Chao once sent out a version),
> and erofs only cares about the read infrastructure for now (so we haven't
> thought further about how to handle the tail-packing inline write path). Plus,
> the original patch was still missing an inline data regression test from the
> gfs2 folks.

So resend Chao's prep patch as part of the series switching parts of
erofs to iomap.  We need to move things off the old infrastructure instead
of adding more users and everyone needs to help a little.

Re: [LKP] [locking/rwsem] 617f3ef951: unixbench.score -21.2% regression

2020-12-22 Thread Xing Zhengjun


Hi Waiman,

   Do you have time to look at this? Thanks.
   As you describe in commit 617f3ef95177840c77f59c2aec1029d27d5547d6
("locking/rwsem: Remove reader optimistic spinning"), the patch that
disables reader optimistic spinning reduces performance in lightly
loaded cases. Is this regression therefore expected?


On 12/17/2020 9:33 AM, kernel test robot wrote:


Greeting,

FYI, we noticed a -21.2% regression of unixbench.score due to commit:


commit: 617f3ef95177840c77f59c2aec1029d27d5547d6 ("locking/rwsem: Remove reader 
optimistic spinning")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master


in testcase: unixbench
on test machine: 16 threads Intel(R) Xeon(R) E-2278G CPU @ 3.40GHz with 32G 
memory
with following parameters:

runtime: 300s
nr_task: 30%
test: shell8
cpufreq_governor: performance
ucode: 0xde

test-description: UnixBench is the original BYTE UNIX benchmark suite aims to 
test performance of Unix-like system.
test-url: https://github.com/kdlucas/byte-unixbench



If you fix the issue, kindly add following tag
Reported-by: kernel test robot 


Details are as below:
-->


To reproduce:

 git clone https://github.com/intel/lkp-tests.git
 cd lkp-tests
 bin/lkp install job.yaml  # job file is attached in this email
 bin/lkp run job.yaml

=
compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase/ucode:
   
gcc-9/performance/x86_64-rhel-8.3/30%/debian-10.4-x86_64-20200603.cgz/300s/lkp-cfl-e1/shell8/unixbench/0xde

commit:
   1a728dff85 ("locking/rwsem: Enable reader optimistic lock stealing")
   617f3ef951 ("locking/rwsem: Remove reader optimistic spinning")

1a728dff855a318b 617f3ef95177840c77f59c2aec1
---------------- ---------------------------
       fail:runs  %reproduction    fail:runs
           |             |             |
         39:4         -992%            :4     perf-profile.calltrace.cycles-pp.error_entry
         25:4         -635%            :4     perf-profile.children.cycles-pp.error_entry
         %stddev     %change         %stddev
             \          |                \
     21807 ±  3%     -21.2%      17186        unixbench.score
   1287072 ±  3%     -38.7%     788414        unixbench.time.involuntary_context_switches
     37161 ±  4%     +31.3%      48798        unixbench.time.major_page_faults
 1.047e+08 ±  3%     -21.1%   82610985        unixbench.time.minor_page_faults
      1341           -27.1%     978.00        unixbench.time.percent_of_cpu_this_job_got
    370.87           -33.3%     247.55        unixbench.time.system_time
    490.05           -23.3%     376.03        unixbench.time.user_time
   3083520 ±  3%     +59.7%    4924900        unixbench.time.voluntary_context_switches
    824314 ±  3%     -21.2%     649654        unixbench.workload
   0.03 ± 27% -51.9%   0.02 ± 59%  
perf-sched.sch_delay.avg.ms.schedule_timeout.rcu_gp_kthread.kthread.ret_from_fork
 385.15 ±  2% +62.5% 625.72uptime.idle
  17.03-1.8%  16.73boot-time.boot
  11.01-1.6%  10.83boot-time.dhcp
 214.12 ±  3%  -3.1% 207.49boot-time.idle
  13.72 ±  4% +23.5   37.24mpstat.cpu.all.idle%
   1.06-0.10.94mpstat.cpu.all.irq%
  49.32 ±  2% -11.8   37.53mpstat.cpu.all.sys%
  35.24 ±  2% -11.6   23.68mpstat.cpu.all.usr%
  15.50 ±  3%+145.2%  38.00vmstat.cpu.id
  49.00 ±  2% -22.4%  38.00vmstat.cpu.sy
  33.75 ±  2% -33.3%  22.50 ±  2%  vmstat.cpu.us
  21.75 ±  3% -33.3%  14.50 ±  3%  vmstat.procs.r
  97370 ±  3% +56.4% 152258vmstat.system.cs
  37589-2.1%  36804vmstat.system.in
  11861 ±  9% -18.0%   9730slabinfo.filp.active_objs
  13242 ±  8% -15.5%  11184slabinfo.filp.num_objs
  14731 ±  7%  -9.5%  13325 ±  5%  slabinfo.kmalloc-8.active_objs
  14731 ±  7%  -9.5%  13325 ±  5%  slabinfo.kmalloc-8.num_objs
   5545 ±  2% -13.8%   4780 ±  4%  slabinfo.pid.active_objs
   5563 ±  2% -13.8%   4793 ±  4%  slabinfo.pid.num_objs
   5822 ± 14% -40.4%   3468 ±  5%  
slabinfo.task_delay_info.active_objs
   5825 ± 14% -40.5%   3468 ±  5%  slabinfo.task_delay_info.num_objs
   32104492 ±  3%+303.3%  1.295e+08 ± 11%  cpuidle.C1.time
 882330 ±  5%+131.5%2042656 ± 10%  cpuidle.C1.usage
   21965263 ±  3%+340.5%   96762398 ± 14%  cpuidle.C1E.time
 442911 ±  2%+211.3%1378866 ± 14%  cpuidle.C1E.usage
6511399 ±  4%+606.6%   46010023 ±

Re: [PATCH v6 1/2] lib/string.c: add __sysfs_match_string_with_gaps() helper

2020-12-22 Thread Alexandru Ardelean

On Tue, Dec 22, 2020 at 3:43 PM Andy Shevchenko
 wrote:
>
> On Tue, Dec 22, 2020 at 3:09 PM Alexandru Ardelean
>  wrote:
> >
> > The original docstring of the __sysfs_match_string() and match_string()
> > helper, implied that -1 could be used to search through NULL terminated
> > arrays, and positive 'n' could be used to go through arrays that may have
> > NULL elements in the middle of the array.
> >
> > This isn't true. Regardless of the value of 'n', the first NULL element in
> > the array will stop the search, even if the element may be after a NULL
> > element.
> >
> > To allow for a behavior where we can use the __sysfs_match_string() to
> > search over arrays with NULL elements in the middle, the
> > __sysfs_match_string_with_gaps() helper is added.
> > If n > 0, the search will continue until the element is found or n is
> > reached.
> > If n < 0, the search will continue until the element is found or a NULL
> > character is found.
>
> I'm wondering if we can leave __sysfs_match_string() alone (w/o adding
> unnecessary branch).

Works for me.
Will re-spin.

>
> int __sysfs_match_string_with_gaps(const char * const *array, size_t
> n, const char *str)
> {
>const char *item;
>int index;
>
>for (index = 0; index < n; index++) {
>item = array[index];
>if (!item)
>continue;
>if (sysfs_streq(item, str))
>return index;
>}
>return -EINVAL;
> }
>
> Note, the check n>0 seems redundant for this particular function.
>
> > +static int __sysfs_match_string_common(const char * const *array, ssize_t 
> > n,
> > +  const char *str, bool gaps)
> > +{
> > +   const char *item;
> > +   int index;
> > +
> > +   for (index = 0; index < n; index++) {
> > +   item = array[index];
> > +   if (!item) {
> > +   if (gaps && n > 0)
> > +   continue;
> > +   break;
> > +   }
> > +   if (sysfs_streq(item, str))
> > +   return index;
> > +   }
> > +
> > +   return -EINVAL;
> > +}
> > +
> >  /**
> >   * __sysfs_match_string - matches given string in an array
> >   * @array: array of strings
> > @@ -770,21 +790,32 @@ EXPORT_SYMBOL(match_string);
> >   */
> >  int __sysfs_match_string(const char * const *array, size_t n, const char 
> > *str)
> >  {
> > -   const char *item;
> > -   int index;
> > -
> > -   for (index = 0; index < n; index++) {
> > -   item = array[index];
> > -   if (!item)
> > -   break;
> > -   if (sysfs_streq(item, str))
> > -   return index;
> > -   }
> > -
> > -   return -EINVAL;
> > +   return __sysfs_match_string_common(array, n, str, false);
> >  }
>
> --
> With Best Regards,
> Andy Shevchenko
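
The loop suggested above can be tried out in userspace with the sketch
below. It is an illustration only: streq_nl() is a simplified stand-in for
the kernel's sysfs_streq(), and the function name match_string_with_gaps()
and the sample array are hypothetical, not the helper that was eventually
merged.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Simplified userspace stand-in for sysfs_streq(): the strings are equal
 * if they match exactly or differ only by a single trailing newline.
 */
static int streq_nl(const char *a, const char *b)
{
	while (*a && *a == *b) {
		a++;
		b++;
	}
	if (*a == *b)
		return 1;
	if (!*a && *b == '\n' && !b[1])
		return 1;
	if (!*b && *a == '\n' && !a[1])
		return 1;
	return 0;
}

/* Search all n slots, skipping NULL entries instead of stopping at them. */
static int match_string_with_gaps(const char * const *array, size_t n,
				  const char *str)
{
	size_t index;

	for (index = 0; index < n; index++) {
		const char *item = array[index];

		if (!item)
			continue;	/* a gap: keep scanning */
		if (streq_nl(item, str))
			return (int)index;
	}

	return -EINVAL;
}
```

For an array { "off", NULL, "auto", "on" } with n = 4, looking up "auto\n"
returns index 2 even though a NULL entry precedes it.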

Re: [PATCH AUTOSEL 5.4 008/130] staging: wimax: depends on NET

2020-12-22 Thread Greg Kroah-Hartman

On Tue, Dec 22, 2020 at 09:16:11PM -0500, Sasha Levin wrote:
> From: Randy Dunlap 
> 
> [ Upstream commit 9364a2cf567187c0a075942c22d1f434c758de5d ]
> 
> Fix build errors when CONFIG_NET is not enabled. E.g. (trimmed):
> 
> ld: drivers/staging/wimax/op-msg.o: in function `wimax_msg_alloc':
> op-msg.c:(.text+0xa9): undefined reference to `__alloc_skb'
> ld: op-msg.c:(.text+0xcc): undefined reference to `genlmsg_put'
> ld: op-msg.c:(.text+0xfc): undefined reference to `nla_put'
> ld: op-msg.c:(.text+0x168): undefined reference to `kfree_skb'
> ld: drivers/staging/wimax/op-msg.o: in function `wimax_msg_data_len':
> op-msg.c:(.text+0x1ba): undefined reference to `nla_find'
> ld: drivers/staging/wimax/op-msg.o: in function `wimax_msg_send':
> op-msg.c:(.text+0x311): undefined reference to `init_net'
> ld: op-msg.c:(.text+0x326): undefined reference to `netlink_broadcast'
> ld: drivers/staging/wimax/stack.o: in function `__wimax_state_change':
> stack.c:(.text+0x433): undefined reference to `netif_carrier_off'
> ld: stack.c:(.text+0x46b): undefined reference to `netif_carrier_on'
> ld: stack.c:(.text+0x478): undefined reference to `netif_tx_wake_queue'
> ld: drivers/staging/wimax/stack.o: in function `wimax_subsys_exit':
> stack.c:(.exit.text+0xe): undefined reference to `genl_unregister_family'
> ld: drivers/staging/wimax/stack.o: in function `wimax_subsys_init':
> stack.c:(.init.text+0x1a): undefined reference to `genl_register_family'
> 
> Cc: Greg Kroah-Hartman 
> Cc: Jakub Kicinski 
> Cc: Arnd Bergmann 
> Cc: net...@vger.kernel.org
> Acked-by: Arnd Bergmann 
> Signed-off-by: Randy Dunlap 
> Link: https://lore.kernel.org/r/20201102072456.20303-1-rdun...@infradead.org
> Signed-off-by: Greg Kroah-Hartman 
> Signed-off-by: Sasha Levin 
> ---
>  net/wimax/Kconfig | 1 +
>  1 file changed, 1 insertion(+)

This isn't needed in any backported kernel as it only is relevant when
the code moved to drivers/staging/

thanks,

greg k-h

Re: [PATCH] mm/uaccess: Use 'unsigned long' to placate UBSAN warnings, again

2020-12-22 Thread Randy Dunlap

On 12/22/20 9:04 PM, Josh Poimboeuf wrote:
> GCC 7 has a known bug where UBSAN ignores '-fwrapv' and generates false
> signed-overflow-UB warnings.  The type mismatch between 'i' and
> 'nr_segs' in copy_compat_iovec_from_user() is causing such a warning,
> which also happens to violate uaccess rules:
> 
>   lib/iov_iter.o: warning: objtool: iovec_from_user()+0x22d: call to 
> __ubsan_handle_add_overflow() with UACCESS enabled
> 
> Fix it by making the variable types match.
> 
> This is similar to a previous commit:
> 
>   29da93fea3ea ("mm/uaccess: Use 'unsigned long' to placate UBSAN warnings on 
> older GCC versions")
> 
> Reported-by: Randy Dunlap 
> Signed-off-by: Josh Poimboeuf 

All good. Thanks.

Acked-by: Randy Dunlap  # build-tested


> ---
>  lib/iov_iter.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/lib/iov_iter.c b/lib/iov_iter.c
> index 1635111c5bd2..2e6a42f5d1df 100644
> --- a/lib/iov_iter.c
> +++ b/lib/iov_iter.c
> @@ -1656,7 +1656,8 @@ static int copy_compat_iovec_from_user(struct iovec 
> *iov,
>  {
>   const struct compat_iovec __user *uiov =
>   (const struct compat_iovec __user *)uvec;
> - int ret = -EFAULT, i;
> + int ret = -EFAULT;
> + unsigned long i;
>  
>   if (!user_access_begin(uvec, nr_segs * sizeof(*uvec)))
>   return -EFAULT;
> 


-- 
~Randy
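
A userspace sketch of the pattern the fix adopts: the loop counter has the
same type as the segment count, so neither the comparison nor the increment
involves signed arithmetic. The struct and function names below are
hypothetical simplifications, not the kernel's definitions, and the sketch
leaves out user-access handling entirely.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 32-bit layout, loosely modelled on a compat iovec. */
struct compat_iovec_s {
	uint32_t iov_base;
	uint32_t iov_len;
};

/* Hypothetical native layout. */
struct native_iovec_s {
	void *iov_base;
	size_t iov_len;
};

/*
 * Widen nr_segs compat entries into native ones. The counter i is
 * 'unsigned long', matching nr_segs. Lengths that would be negative
 * as a signed 32-bit value are rejected with -EINVAL.
 */
static int widen_iovec(struct native_iovec_s *dst,
		       const struct compat_iovec_s *src,
		       unsigned long nr_segs)
{
	unsigned long i;

	for (i = 0; i < nr_segs; i++) {
		if (src[i].iov_len > (uint32_t)INT32_MAX)
			return -EINVAL;
		dst[i].iov_base = (void *)(uintptr_t)src[i].iov_base;
		dst[i].iov_len = src[i].iov_len;
	}

	return 0;
}
```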

[PATCH v2 2/2] arm64: dts: mt6779: Support ufshci and ufsphy

2020-12-22 Thread Stanley Chu

Support UFS on MT6779 platforms by adding ufshci and ufsphy
nodes in dts file.

Reviewed-by: Hanks Chen 
Signed-off-by: Stanley Chu 
---
 arch/arm64/boot/dts/mediatek/mt6779.dtsi | 36 +++-
 1 file changed, 35 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/boot/dts/mediatek/mt6779.dtsi 
b/arch/arm64/boot/dts/mediatek/mt6779.dtsi
index 370f309d32de..6eaf230bb0d1 100644
--- a/arch/arm64/boot/dts/mediatek/mt6779.dtsi
+++ b/arch/arm64/boot/dts/mediatek/mt6779.dtsi
@@ -225,6 +225,41 @@
#clock-cells = <1>;
};
 
+   ufshci: ufshci@1127 {
+   compatible = "mediatek,mt8183-ufshci";
+   reg = <0 0x1127 0 0x2300>;
+   interrupts = ;
+   phys = <>;
+
+   clocks = <_ao CLK_INFRA_UFS>,
+<_ao CLK_INFRA_UFS_TICK>,
+<_ao CLK_INFRA_UFS_AXI>,
+<_ao CLK_INFRA_UNIPRO_TICK>,
+<_ao CLK_INFRA_UNIPRO_MBIST>,
+< CLK_TOP_FAES_UFSFDE>,
+<_ao CLK_INFRA_AES_UFSFDE>,
+<_ao CLK_INFRA_AES_BCLK>;
+   clock-names = "ufs", "ufs_tick", "ufs_axi",
+ "unipro_tick", "unipro_mbist",
+ "aes_top", "aes_infra", "aes_bclk";
+   freq-table-hz = <0 0>, <0 0>, <0 0>,
+   <0 0>, <0 0>, <0 0>,
+   <0 0>, <0 0>;
+
+   mediatek,ufs-disable-ah8;
+   mediatek,ufs-support-va09;
+   };
+
+   ufsphy: phy@11fa {
+   compatible = "mediatek,mt8183-ufsphy";
+   reg = <0 0x11fa 0 0xc000>;
+   #phy-cells = <0>;
+
+   clocks = <_ao CLK_INFRA_UNIPRO_SCK>,
+<_ao CLK_INFRA_UFS_MP_SAP_BCLK>;
+   clock-names = "unipro", "mp";
+   };
+
mfgcfg: clock-controller@13fbf000 {
compatible = "mediatek,mt6779-mfgcfg", "syscon";
reg = <0 0x13fbf000 0 0x1000>;
@@ -266,6 +301,5 @@
reg = <0 0x1b00 0 0x1000>;
#clock-cells = <1>;
};
-
};
 };
-- 
2.18.0

[PATCH v2 0/2] arm64: Support Universal Flash Storage on MediaTek MT6779 platform

2020-12-22 Thread Stanley Chu

Hi,
This series adds UFS (Universal Flash Storage) support on MediaTek MT6779 SoC 
platform.

Changes since v1:
  - Fix irq attribute in dts in patch [2/2]

Stanley Chu (2):
  arm64: configs: Support Universal Flash Storage on MediaTek platforms
  arm64: dts: mt6779: Support ufshci and ufsphy

 arch/arm64/boot/dts/mediatek/mt6779.dtsi | 36 +++-
 arch/arm64/configs/defconfig |  1 +
 2 files changed, 36 insertions(+), 1 deletion(-)

-- 
2.18.0

[PATCH v2 1/2] arm64: configs: Support Universal Flash Storage on MediaTek platforms

2020-12-22 Thread Stanley Chu

Support UFS on MediaTek platforms by enabling CONFIG_SCSI_UFS_MEDIATEK.

Reviewed-by: Hanks Chen 
Signed-off-by: Stanley Chu 
---
 arch/arm64/configs/defconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/arm64/configs/defconfig b/arch/arm64/configs/defconfig
index 17a2df6a263e..e92f42a43bfa 100644
--- a/arch/arm64/configs/defconfig
+++ b/arch/arm64/configs/defconfig
@@ -277,6 +277,7 @@ CONFIG_SCSI_MPT3SAS=m
 CONFIG_SCSI_UFSHCD=y
 CONFIG_SCSI_UFSHCD_PLATFORM=y
 CONFIG_SCSI_UFS_QCOM=m
+CONFIG_SCSI_UFS_MEDIATEK=m
 CONFIG_SCSI_UFS_HISI=y
 CONFIG_ATA=y
 CONFIG_SATA_AHCI=y
-- 
2.18.0

Re: [PATCHSET] saner elf compat

2020-12-22 Thread Al Viro

On Wed, Dec 23, 2020 at 07:03:20AM +, Al Viro wrote:

Argh  Wrong commit blamed - the parent of the correct one.
It's actually 2aa362c49c31 ("coredump: extend core dump note section to
contain file names of mapped files").  My apologies - fat-fingered
cut'n'paste...

siginfo commit does suffer the same problem, but it becomes an issue
only for 32bit processes under mips64 big-endian kernel (there it yields
e.g. zero .__sigfault.si_addr in $_siginfo when using gdb with a coredump
of 32bit process, whatever the actual faulting address had been).  And
b-e mips64 is rather uncommon, so that's less of an issue.

Re: [PATCH AUTOSEL 5.4 057/130] ALSA: usb-audio: Check valid altsetting at parsing rates for UAC2/3

2020-12-22 Thread Takashi Iwai

On Wed, 23 Dec 2020 03:17:00 +0100,
Sasha Levin wrote:
> 
> From: Takashi Iwai 
> 
> [ Upstream commit 93db51d06b32227319dae2ac289029ccf1b33181 ]
> 
> The current driver code assumes blindly that all found sample rates for
> the same endpoint from the UAC2 and UAC3 descriptors can be used no
> matter which altsetting, but actually this was wrong: some devices
> accept only limited sample rates in each altsetting.  For determining
> which altsetting supports which rate, we need to verify each sample rate
> and check the validity via UAC2_AS_VAL_ALT_SETTINGS.  This control
> reports back the available altsettings as a bitmap.
> 
> This patch implements the missing piece above, the verification and
> reconstructs the sample rate tables based on the result.
> 
> An open question is how to deal with the altsettings that ended up
> with no valid sample rates after verification.  At least, there is a
> device that showed this problem although the sample rates did work in
> the later usage (see bug link).  For now, we accept such an altset as
> is, assuming that it's a firmware bug.
> 
> Reported-by: Dylan Robinson 
> Tested-by: Keith Milner 
> Tested-by: Dylan Robinson 
> BugLink: https://bugzilla.suse.com/show_bug.cgi?id=1178203
> Link: https://lore.kernel.org/r/20201123085347.19667-4-ti...@suse.de
> Signed-off-by: Takashi Iwai 
> Signed-off-by: Sasha Levin 

Please drop this for 5.4 or older.  At least on a 5.3 kernel this caused
a problem that confused the USB core for some reason, while it works
fine with recent upstream.


thanks,

Takashi

RE: [v2 1/2] rtc: pcf2127: properly set flag WD_CD for rtc chips(pcf2129, pca2129)

2020-12-22 Thread Biwen Li

Hi Alexandre,

Any comments?

Regards,
Biwen Li
> -Original Message-
> From: Biwen Li 
> Sent: December 2, 2020 11:19
> To: Leo Li ; alexandre.bell...@bootlin.com; Anson
> Huang ; Aisheng Dong 
> Cc: linux-kernel@vger.kernel.org; Jiafei Pan ;
> linux-...@vger.kernel.org; Biwen Li 
> Subject: [v2 1/2] rtc: pcf2127: properly set flag WD_CD for rtc chips(pcf2129,
> pca2129)
> 
> From: Biwen Li 
> 
> Properly set flag WD_CD for rtc chips(pcf2129, pca2129)
> 
> Signed-off-by: Biwen Li 
> ---
> Change in v2:
>   - set flag WD_CD according to compatible
> 
>  drivers/rtc/rtc-pcf2127.c | 7 ++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/rtc/rtc-pcf2127.c b/drivers/rtc/rtc-pcf2127.c index
> 03c9cb6b0b6e..a5418b657c50 100644
> --- a/drivers/rtc/rtc-pcf2127.c
> +++ b/drivers/rtc/rtc-pcf2127.c
> @@ -620,6 +620,10 @@ static int pcf2127_probe(struct device *dev, struct
> regmap *regmap,
>* Watchdog timer enabled and reset pin /RST activated when timed out.
>* Select 1Hz clock source for watchdog timer.
>* Note: Countdown timer disabled and not available.
> +  * For pca2129, pcf2129, only bit[7] is for Symbol WD_CD
> +  * of register watchdg_tim_ctl. The bit[6] is labeled
> +  * as T. Bits labeled as T must always be written with
> +  * logic 0.
>*/
>   ret = regmap_update_bits(pcf2127->regmap, PCF2127_REG_WD_CTL,
>PCF2127_BIT_WD_CTL_CD1 |
> @@ -627,7 +631,8 @@ static int pcf2127_probe(struct device *dev, struct
> regmap *regmap,
>PCF2127_BIT_WD_CTL_TF1 |
>PCF2127_BIT_WD_CTL_TF0,
>PCF2127_BIT_WD_CTL_CD1 |
> -  PCF2127_BIT_WD_CTL_CD0 |
> +  (device_property_match_string(dev, 
> "compatible",
> "nxp,pcf2127")
> +   ? (PCF2127_BIT_WD_CTL_CD0) : (0)) |
>PCF2127_BIT_WD_CTL_TF1);
>   if (ret) {
>   dev_err(dev, "%s: watchdog config (wd_ctl) failed\n", __func__);
> --
> 2.17.1
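
The masking logic in the quoted hunk can be sketched in plain C as follows.
This is only an illustration under stated assumptions: the bit positions for
CD1/CD0 follow the patch comment (bit 7 and bit 6), the TF1/TF0 positions
(bits 1 and 0) are assumed, and wd_ctl_value() and update_bits() are
hypothetical stand-ins, not the driver or regmap functions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit layout of the watchdog control register (illustrative). */
#define WD_CTL_TF0  (1u << 0)
#define WD_CTL_TF1  (1u << 1)
#define WD_CTL_CD0  (1u << 6)   /* labeled "T" on pcf2129/pca2129: write 0 */
#define WD_CTL_CD1  (1u << 7)
#define WD_CTL_MASK (WD_CTL_CD1 | WD_CTL_CD0 | WD_CTL_TF1 | WD_CTL_TF0)

/* Value to program: CD0 is only set on chips that really have it. */
static uint8_t wd_ctl_value(bool has_cd0)
{
	return (uint8_t)(WD_CTL_CD1 | (has_cd0 ? WD_CTL_CD0 : 0u) | WD_CTL_TF1);
}

/* Read-modify-write in the style of an update-bits helper. */
static uint8_t update_bits(uint8_t old, uint8_t mask, uint8_t val)
{
	return (uint8_t)((old & ~mask) | (val & mask));
}
```

Because CD0 stays in the mask in both cases, the bit is actively cleared on
chips where it is a "T" bit rather than left at its previous value.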

Re: [PATCHSET] saner elf compat

2020-12-22 Thread Al Viro

[Denys Vlasenko cc'd]

On Wed, Dec 16, 2020 at 09:44:53AM +, Maciej W. Rozycki wrote:
> On Wed, 16 Dec 2020, Al Viro wrote:
> 
> > >  It may be worth pushing through GDB's gdb.threads/tls-core.exp test 
> > > case, 
> > > making sure no UNSUPPORTED results have been produced due to resource 
> > > limits preventing a core from being dumped (and no FAILs, of course), 
> > > with 
> > > o32/n32 native GDB.  This should guarantee our output is still as 
> > > expected 
> > > by an interpreter.  Sadly I'm currently not set up for such testing 
> > > though 
> > > eventually I mean to.
> > 
> > Umm...  What triple does one use for n32 gdb?
> 
>  I don't think there's a standardised one, just configure with CC/CXX set 
> for n32 compilation, e.g.:
> 
> $ /path/to/configure CC="gcc -mabi=n32" CXX="g++ -mabi=n32"
> 
> (and any other options set as usually).  This has to be with CC/CXX rather 
> than CFLAGS/CXXFLAGS so that it is guaranteed to be never overridden with 
> any logic that might do any fiddling with compilation options.  This will 
> set up the test suite accordingly.
> 
>  NB this may already be the compiler's default, depending on how it was 
> configured, i.e. if `--with-abi=n32' was used, in which case no extra 
> options will be required.  I don't know if any standard MIPS distribution 
> does it though; 64-bit MIPS/Debian might.  This will be reported with `gcc 
> --help -v', somewhere along the way.
> 
>  Let me know if there are issues with this approach.

One issue is that the testsuite doesn't care about $CC, $CFLAGS or anything
of that sort.  What I'd done was
cat >~/bin/cc-n32 <<'EOF'
#!/bin/sh
exec /usr/bin/gcc -mabi=n32 "$@"
EOF
chmod +x ~/bin/cc-n32
and add CC_FOR_TARGET="/home/al/bin/cc-n32" in RUNTESTFLAGS.

With that it works.  Moreover, it fixes a test failure on mainline.
Mainline kernel (5.10, same behaviour as debian/buster mips64el one):
Test run by al on Tue Dec 22 21:23:09 2020
Native configuration is mips64el-unknown-linux-gnuabin32

=== gdb tests ===

Schedule of variations:
unix

Running target unix
Using /usr/share/dejagnu/baseboards/unix.exp as board description file for 
target.
Using /usr/share/dejagnu/config/unix.exp as generic interface file for target.
Using /home/al/binutils-gdb/gdb/testsuite/config/unix.exp as 
tool-and-target-specific interface file.
Running /home/al/binutils-gdb/gdb/testsuite/gdb.threads/tls-core.exp ...
FAIL: gdb.threads/tls-core.exp: native: print thread-local storage variable
=== gdb Summary ===

# of expected passes            5
# of unexpected failures        1

vfs.git #work.elf-compat:
Test run by al on Tue Dec 22 21:31:14 2020
Native configuration is mips64el-unknown-linux-gnuabin32

=== gdb tests ===

Schedule of variations:
unix

Running target unix
Using /usr/share/dejagnu/baseboards/unix.exp as board description file for 
target.
Using /usr/share/dejagnu/config/unix.exp as generic interface file for target.
Using /home/al/binutils-gdb/gdb/testsuite/config/unix.exp as 
tool-and-target-specific interface file.
Running /home/al/binutils-gdb/gdb/testsuite/gdb.threads/tls-core.exp ...

=== gdb Summary ===

# of expected passes            6

Which is bloody embarrassing, since I'd completely missed the
behaviour change - this series was supposed to be an equivalent
transformation.

Anyway, the minimal patch fixing that failure is this one-liner and
unlike the elf-compat series it's trivial to backport:

[mips] fix n32 coredump breakage

Back in 2012, 49ae4d4b113b ("coredump: add a new elf note with siginfo
of the signal") has introduced a new ELF coredump note - NT_FILE.  It contains
a mix of strings and addresses, and addresses are 32bit for 32bit targets
and 64bit for 64bit ones.  Eventually gdb has come to use it.

Biarch targets had been taken care of from the very beginning - the
same commit has added a macro (user_long_t) with default being long
and fs/compat_binfmt_elf.c overriding it to compat_long_t.

Unfortunately, Denis had missed the mips weirdness.  As the result,
on mips64 both o32 and n32 ended up using 64-bit layout.  readelf(1)
is not happy.  More importantly, neither is gdb(1); as a matter
of fact, gdb.threads/tls-core.exp kept complaining.  Note that gcore(1)
is using 32bit layout for n32 case - it's only the kernel n32 coredumps
that get broken NT_FILE note.
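
To make the size mismatch concrete, here is a rough sketch of how the fixed
part of an NT_FILE note scales with the word size.  It assumes the layout
documented next to the note writer in fs/binfmt_elf.c (count, page_size,
then one start/end/file_ofs triple per mapping, followed by the file names);
nt_file_fixed_size() is a hypothetical helper, not a kernel function.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Size in bytes of the numeric part of an NT_FILE payload:
 *   count, page_size, then count * { start, end, file_ofs },
 * each stored as one "long" of the dumped task (4 or 8 bytes).
 * The NUL-terminated file names that follow are not counted here.
 */
static size_t nt_file_fixed_size(size_t count, size_t word_size)
{
	assert(word_size == 4 || word_size == 8);
	return (2 + 3 * count) * word_size;
}
```

For two mapped files a reader expecting the 32-bit layout looks for 32 bytes
of numbers, while a note written with 64-bit words carries 64, so every
field after the first lands at the wrong offset.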

NOTE: similar patch is almost certainly needed for o32; I have only
tested it with n32 gdb, though.

Fixes: 49ae4d4b113b ("coredump: add a new elf note with siginfo of the signal")
Signed-off-by: Al Viro 
---
diff --git a/arch/mips/kernel/binfmt_elfn32.c b/arch/mips/kernel/binfmt_elfn32.c
index 6ee3f7218c67..c073136968e8 100644
--- a/arch/mips/kernel/binfmt_elfn32.c
+++ b/arch/mips/kernel/binfmt_elfn32.c
@@ -103,4 +103,6 @@ jiffies_to_old_timeval32(unsigned long jiffies, struct 
old_timeval32 *value)
 #undef ns_to_kernel_old_timeval
 #define ns_to_kernel_old_timeval ns_to_old_timeval32

Re: [PATCH] ubifs: Fix read out-of-bounds in ubifs_jnl_write_inode()

2020-12-22 Thread Zhihao Cheng


On 2020/12/23 14:28, Chengsong Ke wrote:

Reviewed-by: Zhihao Cheng 

From: kechengsong 

ubifs_jnl_write_inode() may cause an out-of-bounds read in some situations.
Here is the KASAN stack:
[  336.432159] BUG: KASAN: slab-out-of-bounds in 
ecc_sw_hamming_calculate+0x1dc/0x7d0
[  336.433634] Read of size 4 at addr 888019612ff8 by task kworker/u8:4/135
[  336.434605]
[  336.434830] CPU: 1 PID: 135 Comm: kworker/u8:4 Not tainted 
5.10.0-11826-gaf2a097952f3-dirty #338
[  336.436050] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
?-20190727_073836-buildvm-ppc64le-16.ppc.fedoraproject.org-3.fc31 04/01/2014
[  336.437876] Workqueue: writeback wb_workfn (flush-ubifs_0_0)
[  336.438670] Call Trace:
[  336.439021]  ? dump_stack+0xdd/0x126
[  336.439513]  ? print_address_description.constprop.0+0x2c/0x3c0
[  336.440308]  ? _raw_write_lock_irqsave+0x140/0x140
[  336.440921]  ? ecc_sw_hamming_calculate+0x1dc/0x7d0
[  336.441546]  ? ecc_sw_hamming_calculate+0x1dc/0x7d0
[  336.442186]  ? kasan_report.cold+0x5d/0xd8
[  336.442711]  ? nand_reset_op+0x280/0x310
[  336.443218]  ? ecc_sw_hamming_calculate+0x1dc/0x7d0
[  336.443842]  ? __asan_load4+0x77/0x120
[  336.444334]  ? ecc_sw_hamming_calculate+0x1dc/0x7d0
[  336.444963]  ? nand_ecc_sw_hamming_calculate+0x6c/0x80
[  336.445619]  ? rawnand_sw_hamming_calculate+0x12/0x20
[  336.446263]  ? nand_write_page_swecc+0xa9/0x160
[  336.446849]  ? nand_do_write_ops+0x390/0x830
[  336.447406]  ? __writeback_single_inode+0x6cc/0x880
[  336.448041]  ? nand_write_oob+0x78/0x100
[  336.448568]  ? mtd_write_oob_std+0xe2/0x160
[  336.449127]  ? mtd_write_oob+0xec/0x1b0
[  336.449679]  ? mtd_write+0x92/0xf0
[  336.450128]  ? mtd_write_oob+0x1b0/0x1b0
[  336.450633]  ? ubi_self_check_all_ff+0x82/0x2e0 [ubi]
[  336.451328]  ? __list_add_valid+0x2b/0x130
[  336.451865]  ? ubi_io_write+0x2c2/0xa90 [ubi]
[  336.452472]  ? _raw_read_lock_irq+0x90/0x90
[  336.453078]  ? kmem_cache_alloc_trace+0x465/0x8b0
[  336.453749]  ? do_sync_erase+0x350/0x350 [ubi]
[  336.454430]  ? __kasan_check_write+0x20/0x30
[  336.455050]  ? down_write+0xf2/0x190
[  336.455569]  ? down_write_killable+0x1b0/0x1b0
[  336.456221]  ? check_mapping+0x2c/0x590 [ubi]
[  336.456890]  ? ubi_eba_write_leb+0x58a/0xfa0 [ubi]
[  336.457618]  ? __kmalloc+0x490/0x910
[  336.458142]  ? ubifs_jnl_write_inode.cold+0x6f/0x878 [ubifs]
[  336.459033]  ? writeback_sb_inodes+0x3a9/0x9a0
[  336.459672]  ? __writeback_inodes_wb+0xc8/0x170
[  336.460330]  ? wb_writeback+0x637/0x700
[  336.460882]  ? wb_workfn+0x8af/0xb80
[  336.461398]  ? process_one_work+0x467/0x9f0
[  336.462004]  ? worker_thread+0x34d/0x8e0
[  336.462582]  ? kthread+0x204/0x280
[  336.463047]  ? ret_from_fork+0x1f/0x30
[  336.463570]  ? create_prof_cpu_mask+0x30/0x30
[  336.464185]  ? ubi_eba_read_leb_sg+0x1f0/0x1f0 [ubi]
[  336.464917]  ? hrtimer_active+0x9b/0x100
[  336.465468]  ? ubi_leb_write+0x22c/0x2f0 [ubi]
[  336.466130]  ? ubifs_leb_write+0xf2/0x1b0 [ubifs]
[  336.466851]  ? ubifs_wbuf_write_nolock+0x412/0x1280 [ubifs]
[  336.467686]  ? write_head+0xdf/0x1c0 [ubifs]
[  336.468355]  ? ubifs_jnl_write_inode.cold+0x3ec/0x878 [ubifs]
[  336.469183]  ? ret_from_fork+0x1e/0x30
[  336.469707]  ? ubifs_jnl_write_data+0x660/0x660 [ubifs]
[  336.470497]  ? unwind_next_frame+0x247/0xca0
[  336.471095]  ? ret_from_fork+0x1f/0x30
[  336.471574]  ? fprop_reflect_period_percpu.isra.0+0x1f/0x1b0
[  336.472335]  ? generic_writepages+0x93/0x140
[  336.472933]  ? __kasan_check_write+0x20/0x30
[  336.473526]  ? mutex_lock+0xa6/0x110
[  336.474031]  ? __mutex_lock_slowpath+0x30/0x30
[  336.474662]  ? ubifs_write_inode+0x1c3/0x290 [ubifs]
[  336.475446]  ? __writeback_single_inode+0x6cc/0x880
[  336.476155]  ? wbc_attach_and_unlock_inode+0x2b6/0x400
[  336.476891]  ? writeback_sb_inodes+0x3a9/0x9a0
[  336.477528]  ? write_inode_now+0x1e0/0x1e0
[  336.478119]  ? __writeback_inodes_wb+0xc8/0x170
[  336.478770]  ? wb_writeback+0x637/0x700
[  336.479326]  ? __writeback_inodes_wb+0x170/0x170
[  336.479992]  ? current_work+0xa0/0xa0
[  336.480524]  ? _find_next_bit.constprop.0+0x3e/0x140
[  336.481241]  ? find_next_bit+0x18/0x30
[  336.481780]  ? cpumask_next+0x2f/0x40
[  336.482312]  ? wb_workfn+0x8af/0xb80
[  336.482832]  ? update_cfs_group+0x1e/0x1b0
[  336.483421]  ? inode_wait_for_writeback+0x60/0x60
[  336.484106]  ? schedule+0xb7/0x240
[  336.484595]  ? finish_task_switch+0x14e/0x9a0
[  336.485225]  ? __kasan_check_write+0x20/0x30
[  336.485841]  ? __schedule+0x6f4/0x1600
[  336.486382]  ? __kasan_check_read+0x1d/0x30
[  336.486981]  ? read_word_at_a_time+0x16/0x30
[  336.487594]  ? process_one_work+0x467/0x9f0
[  336.488198]  ? worker_thread+0x34d/0x8e0
[  336.488762]  ? rescuer_thread+0x820/0x820
[  336.489344]  ? kthread+0x204/0x280
[  336.489839]  ? kthread_bind+0x50/0x50
[  336.490367]  ? ret_from_fork+0x1f/0x30
[  336.490913]
[  336.491138] Allocated by task 135:
[  336.491629]  kasan_save_stack+0x23/0x60
[  336.492189]  __kasan_kmalloc.constprop.0+0x10b/0x120
[

Re: [PATCH v3 3/4] x86/signal: Prevent an alternate stack overflow before a signal delivery

2020-12-22 Thread Jann Horn

On Wed, Dec 23, 2020 at 2:57 AM Chang S. Bae  wrote:
> The kernel pushes data on the userspace stack when entering a signal. If
> using a sigaltstack(), the kernel precisely knows the user stack size.
>
> When the kernel knows that the user stack is too small, avoid the overflow
> and do an immediate SIGSEGV instead.
>
> This overflow is known to occur on systems with large XSAVE state. The
> effort to increase the size typically used for altstacks reduces the
> frequency of these overflows, but this approach is still useful for legacy
> binaries.
>
> Suggested-by: Jann Horn 
> Signed-off-by: Chang S. Bae 
> Reviewed-by: Len Brown 
> Cc: Jann Horn 
> Cc: x...@kernel.org
> Cc: linux-kernel@vger.kernel.org

Reviewed-by: Jann Horn
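
The kind of check described in the quoted commit message can be sketched in
userspace C as below.  This is not the kernel code: the function name, the
16-byte alignment and the parameters are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical check: would a signal frame of frame_size bytes, pushed
 * from the top of an alternate stack [ss_sp, ss_sp + ss_size), still
 * lie inside that stack once the stack pointer is aligned down?
 */
static bool frame_fits_on_altstack(uintptr_t ss_sp, size_t ss_size,
				   size_t frame_size)
{
	uintptr_t top = ss_sp + ss_size;
	uintptr_t new_sp;

	if (frame_size > ss_size)
		return false;           /* cannot fit at all */

	/* Stack grows down; assume the frame must be 16-byte aligned. */
	new_sp = (top - frame_size) & ~(uintptr_t)15;

	return new_sp >= ss_sp;
}
```

When the check fails, the approach described above is to deliver SIGSEGV
immediately instead of writing below the alternate stack.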

Re: [PATCH] powerpc/32s: Fix RTAS machine check with VMAP stack

2020-12-22 Thread Christophe Leroy





On 22/12/2020 at 08:11, Christophe Leroy wrote:

When we have VMAP stack, exception prolog 1 sets r1, not r11.


But exception prolog 1 uses r1 to set up r1 when a machine check happens in the kernel.
So r1 must be restored when the branch is not taken. See the subsequent patch I 
just sent out.

Christophe



Fixes: da7bb43ab9da ("powerpc/32: Fix vmap stack - Properly set r1 before activating 
MMU")
Fixes: d2e006036082 ("powerpc/32: Use SPRN_SPRG_SCRATCH2 in exception prologs")
Cc: sta...@vger.kernel.org
Signed-off-by: Christophe Leroy 
---
  arch/powerpc/kernel/head_book3s_32.S | 7 +++
  1 file changed, 7 insertions(+)

diff --git a/arch/powerpc/kernel/head_book3s_32.S 
b/arch/powerpc/kernel/head_book3s_32.S
index 349bf3f0c3af..fbc48a500846 100644
--- a/arch/powerpc/kernel/head_book3s_32.S
+++ b/arch/powerpc/kernel/head_book3s_32.S
@@ -260,9 +260,16 @@ __secondary_hold_acknowledge:
  MachineCheck:
EXCEPTION_PROLOG_0
  #ifdef CONFIG_PPC_CHRP
+#ifdef CONFIG_VMAP_STACK
+   mtspr   SPRN_SPRG_SCRATCH2,r1
+   mfspr   r1, SPRN_SPRG_THREAD
+   lwz r1, RTAS_SP(r1)
+   cmpwi   cr1, r1, 0
+#else
mfspr   r11, SPRN_SPRG_THREAD
lwz r11, RTAS_SP(r11)
cmpwi   cr1, r11, 0
+#endif
bne cr1, 7f
  #endif /* CONFIG_PPC_CHRP */
EXCEPTION_PROLOG_1 for_rtas=1

Re: [v2] i2c: mediatek: Move suspend and resume handling to NOIRQ phase

2020-12-22 Thread Qii Wang

Hi sirs:
If there are no new comments, I will resend it in 5.11.

[PATCH v1 2/2] perf arm64: Add argument support for SDT

2020-12-22 Thread Leo Yan

Two OP formats are currently used for SDT marker arguments in Arm64 ELF
files: one is the general register xNUM (e.g. x1, x2, etc.), the other
uses the stack pointer to access local variables (e.g. [sp], [sp, 8]).

This patch adds SDT marker argument support for Arm64; it parses the OP
and converts it to a uprobe-compatible format.

Signed-off-by: Leo Yan 
---
 tools/perf/arch/arm64/util/perf_regs.c | 94 ++
 1 file changed, 94 insertions(+)

diff --git a/tools/perf/arch/arm64/util/perf_regs.c 
b/tools/perf/arch/arm64/util/perf_regs.c
index 54efa12fdbea..6b4b18283041 100644
--- a/tools/perf/arch/arm64/util/perf_regs.c
+++ b/tools/perf/arch/arm64/util/perf_regs.c
@@ -1,4 +1,12 @@
 // SPDX-License-Identifier: GPL-2.0
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "../../../util/debug.h"
+#include "../../../util/event.h"
 #include "../../../util/perf_regs.h"
 
 const struct sample_reg sample_reg_masks[] = {
@@ -37,3 +45,89 @@ const struct sample_reg sample_reg_masks[] = {
SMPL_REG(pc, PERF_REG_ARM64_PC),
SMPL_REG_END
 };
+
+/* %xNUM */
+#define SDT_OP_REGEX1  "^(x[1-2]?[0-9]|3[0-1])$"
+
+/* [sp], [sp, NUM] or [sp,NUM] */
+#define SDT_OP_REGEX2  "^\\[sp(, *)?([0-9]+)?\\]$"
+
+static regex_t sdt_op_regex1, sdt_op_regex2;
+
+static int sdt_init_op_regex(void)
+{
+   static int initialized;
+   int ret = 0;
+
+   if (initialized)
+   return 0;
+
+   ret = regcomp(&sdt_op_regex1, SDT_OP_REGEX1, REG_EXTENDED);
+   if (ret)
+   goto error;
+
+   ret = regcomp(&sdt_op_regex2, SDT_OP_REGEX2, REG_EXTENDED);
+   if (ret)
+   goto free_regex1;
+
+   initialized = 1;
+   return 0;
+
+free_regex1:
+   regfree(&sdt_op_regex1);
+error:
+   pr_debug4("Regex compilation error.\n");
+   return ret;
+}
+
+/*
+ * SDT marker arguments on Arm64 use %xREG or [sp, NUM]; currently only
+ * these two formats are supported.
+ */
+int arch_sdt_arg_parse_op(char *old_op, char **new_op)
+{
+   int ret, new_len;
+   regmatch_t rm[5];
+
+   ret = sdt_init_op_regex();
+   if (ret < 0)
+   return ret;
+
+   if (!regexec(&sdt_op_regex1, old_op, 3, rm, 0)) {
+   /* Extract xNUM */
+   new_len = 2;/* % NULL */
+   new_len += (int)(rm[1].rm_eo - rm[1].rm_so);
+
+   *new_op = zalloc(new_len);
+   if (!*new_op)
+   return -ENOMEM;
+
+   scnprintf(*new_op, new_len, "%%%.*s",
+   (int)(rm[1].rm_eo - rm[1].rm_so), old_op + rm[1].rm_so);
+   } else if (!regexec(&sdt_op_regex2, old_op, 5, rm, 0)) {
+   /* [sp], [sp, NUM] or [sp,NUM] */
+   new_len = 7;/* + ( % s p ) NULL */
+
+   /* If the argument is [sp], need to fill offset '0' */
+   if (rm[2].rm_so == -1)
+   new_len += 1;
+   else
+   new_len += (int)(rm[2].rm_eo - rm[2].rm_so);
+
+   *new_op = zalloc(new_len);
+   if (!*new_op)
+   return -ENOMEM;
+
+   if (rm[2].rm_so == -1)
+   scnprintf(*new_op, new_len, "+0(%%sp)");
+   else
+   scnprintf(*new_op, new_len, "+%.*s(%%sp)",
+ (int)(rm[2].rm_eo - rm[2].rm_so),
+ old_op + rm[2].rm_so);
+   } else {
+   pr_debug4("Skipping unsupported SDT argument: %s\n", old_op);
+   return SDT_ARG_SKIP;
+   }
+
+   return SDT_ARG_VALID;
+}
-- 
2.17.1
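
For readers who want to try the conversion outside of perf, here is a
minimal userspace sketch of the same idea using POSIX regex.  It is not the
patch's code: sdt_op_to_uprobe() is a hypothetical name, the two patterns
are slightly tightened variants written for this sketch rather than the
patch's macros, and it writes into a caller-supplied buffer instead of
allocating.

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>
#include <string.h>

/*
 * Convert an Arm64 SDT operand into a uprobe-style operand:
 *   "x1"      -> "%x1"
 *   "[sp]"    -> "+0(%sp)"
 *   "[sp, 8]" -> "+8(%sp)"
 * Returns 0 on success, -1 for unsupported operands or regex errors.
 */
static int sdt_op_to_uprobe(const char *op, char *out, size_t out_len)
{
	regex_t re_reg, re_sp;
	regmatch_t m[3];
	int ret = -1;

	assert(op && out && out_len > 0);

	/* x0..x31; group 1 is the whole register name. */
	if (regcomp(&re_reg, "^(x([12]?[0-9]|3[01]))$", REG_EXTENDED))
		return -1;
	/* [sp], [sp,NUM] or [sp, NUM]; group 2 is the offset if present. */
	if (regcomp(&re_sp, "^\\[sp(, *([0-9]+))?\\]$", REG_EXTENDED)) {
		regfree(&re_reg);
		return -1;
	}

	if (!regexec(&re_reg, op, 3, m, 0)) {
		snprintf(out, out_len, "%%%.*s",
			 (int)(m[1].rm_eo - m[1].rm_so), op + m[1].rm_so);
		ret = 0;
	} else if (!regexec(&re_sp, op, 3, m, 0)) {
		if (m[2].rm_so == -1)   /* plain [sp]: offset is 0 */
			snprintf(out, out_len, "+0(%%sp)");
		else
			snprintf(out, out_len, "+%.*s(%%sp)",
				 (int)(m[2].rm_eo - m[2].rm_so),
				 op + m[2].rm_so);
		ret = 0;
	}

	regfree(&re_reg);
	regfree(&re_sp);
	return ret;
}
```

Compiling the patterns on every call keeps the sketch short; caching the
compiled regexes, as the patch does with a static flag, avoids that cost.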

[PATCH] powerpc/32s: Fix RTAS machine check with VMAP stack - again

2020-12-22 Thread Christophe Leroy

When it is not a RTAS machine check, don't trash r1
because it is needed by prolog 1.

Fixes: 9c7422b92cb2 ("powerpc/32s: Fix RTAS machine check with VMAP stack")
Cc: sta...@vger.kernel.org
Signed-off-by: Christophe Leroy 
---
Sorry Michael for this last minute fix of the fix.

 arch/powerpc/kernel/head_book3s_32.S | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/head_book3s_32.S 
b/arch/powerpc/kernel/head_book3s_32.S
index fbc48a500846..858fbc8b19f3 100644
--- a/arch/powerpc/kernel/head_book3s_32.S
+++ b/arch/powerpc/kernel/head_book3s_32.S
@@ -265,12 +265,14 @@ MachineCheck:
mfspr   r1, SPRN_SPRG_THREAD
lwz r1, RTAS_SP(r1)
cmpwi   cr1, r1, 0
+   bne cr1, 7f
+   mfspr   r1, SPRN_SPRG_SCRATCH2
 #else
mfspr   r11, SPRN_SPRG_THREAD
lwz r11, RTAS_SP(r11)
cmpwi   cr1, r11, 0
-#endif
bne cr1, 7f
+#endif
 #endif /* CONFIG_PPC_CHRP */
EXCEPTION_PROLOG_1 for_rtas=1
 7: EXCEPTION_PROLOG_2
-- 
2.25.0

[PATCH v1 1/2] perf probe: Fixup Arm64 SDT arguments

2020-12-22 Thread Leo Yan

The Arm64 ELF section '.note.stapsdt' uses the string format "-4@[sp, NUM]"
if the probe accesses data on the stack.  For example, dumping an Arm64 ELF
file shows the following argument format:

  Arguments: -4@[sp, 12] -4@[sp, 8] -4@[sp, 4]

Compared with other archs' argument formats, the Arm64 format has an
extra space character inside the square brackets.  Because argv_split()
uses space as the separator, such an argument is wrongly divided into
two items.

To support Arm64 SDT, this patch fixes up this case: if an item
contains the substring "[sp,", it is concatenated with the following
item.  A detailed explanation is added in a comment.

Signed-off-by: Leo Yan 
---
 tools/perf/util/probe-file.c | 32 ++--
 1 file changed, 30 insertions(+), 2 deletions(-)

diff --git a/tools/perf/util/probe-file.c b/tools/perf/util/probe-file.c
index 064b63a6a3f3..60878c859e60 100644
--- a/tools/perf/util/probe-file.c
+++ b/tools/perf/util/probe-file.c
@@ -794,6 +794,8 @@ static char *synthesize_sdt_probe_command(struct sdt_note 
*note,
char *ret = NULL, **args;
int i, args_count, err;
unsigned long long ref_ctr_offset;
+   char *arg;
+   int arg_idx = 0;
 
if (strbuf_init(, 32) < 0)
return NULL;
@@ -815,8 +817,34 @@ static char *synthesize_sdt_probe_command(struct sdt_note 
*note,
if (note->args) {
args = argv_split(note->args, _count);
 
-   for (i = 0; i < args_count; ++i) {
-   if (synthesize_sdt_probe_arg(, i, args[i]) < 0)
+   for (i = 0; i < args_count; ) {
+   /*
+* FIXUP: Arm64 ELF section '.note.stapsdt' uses string
+* format "-4@[sp, NUM]" if a probe is to access data in
+* the stack, e.g. below is an example for the SDT
+* Arguments:
+*
+*   Arguments: -4@[sp, 12] -4@[sp, 8] -4@[sp, 4]
+*
+* Since the string introduces an extra space character
+* in the middle of square brackets, the argument is
+* divided into two items.  Fixup for this case, if an
+* item contains sub string "[sp,", need to concatenate
+* the two items.
+*/
+   if (strstr(args[i], "[sp,") && (i+1) < args_count) {
+   arg = strcat(args[i], args[i+1]);
+   i += 2;
+   } else {
+   arg = strdup(args[i]);
+   i += 1;
+   }
+
+   err = synthesize_sdt_probe_arg(, arg_idx, arg);
+   free(arg);
+   arg_idx++;
+
+   if (err < 0)
goto error;
}
}
-- 
2.17.1

[PATCH v1 0/2] perf arm64: Support SDT

2020-12-22 Thread Leo Yan

This patch set enables SDT on Arm64.

Since the Arm64 SDT marker format in ELF files differs from other archs,
especially in using the stack pointer (sp) to retrieve data for local
variables, patch 01 fixes up the arguments for this special case.
Patch 02 adds argument support for Arm64 SDT.

This patch set has been verified on Arm64/x86_64 platforms with a
testing program usdt_test [1].  The program runs the SDT interfaces
one by one for DTRACE_PROBE, DTRACE_PROBE1, ..., DTRACE_PROBE12, so
it verifies probes with different argument counts (0 to 12).

The testing flow and results are shown below:

  # perf buildid-cache --add /root/test/usdt_test
  # perf probe sdt_usdt:test_probe
  # perf probe sdt_usdt:test_probe_param1
  # perf probe sdt_usdt:test_probe_param1x
  # perf probe sdt_usdt:test_probe_param2
  # perf probe sdt_usdt:test_probe_param2x
  # perf probe sdt_usdt:test_probe_param3
  # perf probe sdt_usdt:test_probe_param3x
  # perf probe sdt_usdt:test_probe_param4
  # perf probe sdt_usdt:test_probe_param4x
  # perf probe sdt_usdt:test_probe_param5
  # perf probe sdt_usdt:test_probe_param5x
  # perf probe sdt_usdt:test_probe_param6
  # perf probe sdt_usdt:test_probe_param6x
  # perf probe sdt_usdt:test_probe_param7
  # perf probe sdt_usdt:test_probe_param7x
  # perf probe sdt_usdt:test_probe_param8
  # perf probe sdt_usdt:test_probe_param8x
  # perf probe sdt_usdt:test_probe_param9
  # perf probe sdt_usdt:test_probe_param9x
  # perf probe sdt_usdt:test_probe_param10
  # perf probe sdt_usdt:test_probe_param10x
  # perf probe sdt_usdt:test_probe_param11
  # perf probe sdt_usdt:test_probe_param11x
  # perf probe sdt_usdt:test_probe_param12
  # perf probe sdt_usdt:test_probe_param12x

  # perf record \
-e sdt_usdt:test_probe_param1 -e sdt_usdt:test_probe_param1x \
-e sdt_usdt:test_probe_param2 -e sdt_usdt:test_probe_param2x \
-e sdt_usdt:test_probe_param3 -e sdt_usdt:test_probe_param3x \
-e sdt_usdt:test_probe_param4 -e sdt_usdt:test_probe_param4x \
-e sdt_usdt:test_probe_param5 -e sdt_usdt:test_probe_param5x \
-e sdt_usdt:test_probe_param6 -e sdt_usdt:test_probe_param6x \
-e sdt_usdt:test_probe_param7 -e sdt_usdt:test_probe_param7x \
-e sdt_usdt:test_probe_param8 -e sdt_usdt:test_probe_param8x \
-e sdt_usdt:test_probe_param9 -e sdt_usdt:test_probe_param9x \
-e sdt_usdt:test_probe_param10 -e sdt_usdt:test_probe_param10x \
-e sdt_usdt:test_probe_param11 -e sdt_usdt:test_probe_param11x \
-e sdt_usdt:test_probe_param12 -e sdt_usdt:test_probe_param12x \
-e sdt_usdt:test_probe  -aR sleep 5

   # ./usdt_test   => Execute in another terminal

   # perf script

   usdt_test  7999 [003] 80493.418276:  sdt_usdt:test_probe: 
(b0d80714)
   usdt_test  7999 [003] 80493.418352:   sdt_usdt:test_probe_param1: 
(b0d80728) arg1=1
   usdt_test  7999 [003] 80493.418379:   sdt_usdt:test_probe_param2: 
(b0d80744) arg1=1 arg2=2
   usdt_test  7999 [003] 80493.418405:   sdt_usdt:test_probe_param3: 
(b0d80764) arg1=1 arg2=2 arg3=3
   usdt_test  7999 [003] 80493.418432:   sdt_usdt:test_probe_param4: 
(b0d80788) arg1=1 arg2=2 arg3=3 arg4=4
   usdt_test  7999 [003] 80493.418459:   sdt_usdt:test_probe_param5: 
(b0d807b0) arg1=1 arg2=2 arg3=3 arg4=4 arg5=5
   usdt_test  7999 [003] 80493.418487:   sdt_usdt:test_probe_param6: 
(b0d807dc) arg1=1 arg2=2 arg3=3 arg4=4 arg5=5 arg6=6
   usdt_test  7999 [003] 80493.418516:   sdt_usdt:test_probe_param7: 
(b0d8080c) arg1=1 arg2=2 arg3=3 arg4=4 arg5=5 arg6=6 arg7=7
   usdt_test  7999 [003] 80493.418545:   sdt_usdt:test_probe_param8: 
(b0d80840) arg1=1 arg2=2 arg3=3 arg4=4 arg5=5 arg6=6 arg7=7 arg8=8
   usdt_test  7999 [003] 80493.418574:   sdt_usdt:test_probe_param9: 
(b0d80874) arg1=1 arg2=2 arg3=3 arg4=4 arg5=5 arg6=6 arg7=7 arg8=8 arg9=9
   usdt_test  7999 [003] 80493.418603:  sdt_usdt:test_probe_param10: 
(b0d808a8) arg1=1 arg2=2 arg3=3 arg4=4 arg5=5 arg6=6 arg7=7 arg8=8 arg9=9 
arg10=10
   usdt_test  7999 [003] 80493.418632:  sdt_usdt:test_probe_param11: 
(b0d808dc) arg1=1 arg2=2 arg3=3 arg4=4 arg5=5 arg6=6 arg7=7 arg8=8 arg9=9 
arg10=10 arg11=11
   usdt_test  7999 [003] 80493.418662:  sdt_usdt:test_probe_param12: 
(b0d80910) arg1=1 arg2=2 arg3=3 arg4=4 arg5=5 arg6=6 arg7=7 arg8=8 arg9=9 
arg10=10 arg11=11 arg12=12
   usdt_test  7999 [003] 80493.418687:  sdt_usdt:test_probe_param1x: 
(b0d8092c) arg1=1
   usdt_test  7999 [003] 80493.418713:  sdt_usdt:test_probe_param2x: 
(b0d80950) arg1=1 arg2=2
   usdt_test  7999 [003] 80493.418739:  sdt_usdt:test_probe_param3x: 
(b0d8097c) arg1=1 arg2=2 arg3=3
   usdt_test  7999 [003] 80493.418766:  sdt_usdt:test_probe_param4x: 
(b0d809b0) arg1=1 arg2=2 arg3=3 arg4=4
   usdt_test  7999 [003] 80493.418792:  sdt_usdt:test_probe_param5x: 
(b0d809ec) arg1=1 arg2=2 arg3=3

[PATCH] kconfig: remove 'kvmconfig' and 'xenconfig' shorthands

2020-12-22 Thread Masahiro Yamada

Linux 5.10 is out. Remove the 'kvmconfig' and 'xenconfig' shorthands
as previously announced.

Signed-off-by: Masahiro Yamada 
---

 scripts/kconfig/Makefile | 10 --
 1 file changed, 10 deletions(-)

diff --git a/scripts/kconfig/Makefile b/scripts/kconfig/Makefile
index e46df0a2d4f9..2c40e68853dd 100644
--- a/scripts/kconfig/Makefile
+++ b/scripts/kconfig/Makefile
@@ -94,16 +94,6 @@ configfiles=$(wildcard $(srctree)/kernel/configs/$@ 
$(srctree)/arch/$(SRCARCH)/c
$(Q)$(CONFIG_SHELL) $(srctree)/scripts/kconfig/merge_config.sh -m 
.config $(configfiles)
$(Q)$(MAKE) -f $(srctree)/Makefile olddefconfig
 
-PHONY += kvmconfig
-kvmconfig: kvm_guest.config
-   @echo >&2 "WARNING: 'make $@' will be removed after Linux 5.10"
-   @echo >&2 " Please use 'make $<' instead."
-
-PHONY += xenconfig
-xenconfig: xen.config
-   @echo >&2 "WARNING: 'make $@' will be removed after Linux 5.10"
-   @echo >&2 " Please use 'make $<' instead."
-
 PHONY += tinyconfig
 tinyconfig:
$(Q)$(MAKE) -f $(srctree)/Makefile allnoconfig tiny.config
-- 
2.27.0

Re: [PATCH] sh: check return code of request_irq

2020-12-22 Thread Masahiro Yamada

On Wed, Dec 23, 2020 at 5:54 AM Nick Desaulniers
 wrote:
>
> request_irq is marked __must_check, but the call in shx3_prepare_cpus
> has a void return type, so it can't propagate failure to the caller.
> Follow cues from hexagon and just print an error.
>
> Fixes: c7936b9abcf5 ("sh: smp: Hook in to the generic IPI handler for SH-X3 
> SMP.")
> Cc: Miguel Ojeda 
> Cc: Paul Mundt 
> Reported-by: Guenter Roeck 
> Signed-off-by: Nick Desaulniers 


Thanks for the patch, Nick.

I just wondered whether there is better error handling than
printing a message. I have no idea whether the system will
boot up correctly when request_irq() fails here.

I hope the maintainers will suggest something, if there is anything.




> ---
>  arch/sh/kernel/cpu/sh4a/smp-shx3.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/arch/sh/kernel/cpu/sh4a/smp-shx3.c 
> b/arch/sh/kernel/cpu/sh4a/smp-shx3.c
> index f8a2bec0f260..1261dc7b84e8 100644
> --- a/arch/sh/kernel/cpu/sh4a/smp-shx3.c
> +++ b/arch/sh/kernel/cpu/sh4a/smp-shx3.c
> @@ -73,8 +73,9 @@ static void shx3_prepare_cpus(unsigned int max_cpus)
> BUILD_BUG_ON(SMP_MSG_NR >= 8);
>
> for (i = 0; i < SMP_MSG_NR; i++)
> -   request_irq(104 + i, ipi_interrupt_handler,
> -   IRQF_PERCPU, "IPI", (void *)(long)i);
> +   if (request_irq(104 + i, ipi_interrupt_handler,
> +   IRQF_PERCPU, "IPI", (void *)(long)i))
> +   pr_err("Failed to request irq %d\n", i);
>
> for (i = 0; i < max_cpus; i++)
> set_cpu_present(i, true);
> --
> 2.29.2.729.g45daf8777d-goog
>


-- 
Best Regards
Masahiro Yamada
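
For readers following the thread, the pattern under discussion (check a
__must_check return value in a function that itself returns void, and log
on failure) can be modelled in userspace roughly as below. This is only a
sketch under stated assumptions: fake_request_irq(), NR_IPI_MSGS and the
"IRQ 106 is busy" behaviour are made up for illustration and are not the
kernel API or the SH-X3 code.

```c
#include <assert.h>
#include <stdio.h>

#define NR_IPI_MSGS 4	/* hypothetical stand-in for SMP_MSG_NR */

/*
 * Hypothetical stand-in for request_irq(): returns 0 on success and a
 * negative errno-style value on failure. IRQ 106 is made to fail.
 */
static int fake_request_irq(unsigned int irq)
{
	return (irq == 106) ? -16 : 0;
}

/*
 * Same shape as the patched loop: every return value is checked and a
 * failure is only logged. Unlike the void kernel function, this model
 * also returns the number of failures so a caller can inspect it.
 */
static int prepare_ipis(void)
{
	int failures = 0;
	int i;

	/* Runtime stand-in for BUILD_BUG_ON(SMP_MSG_NR >= 8). */
	assert(NR_IPI_MSGS < 8);

	for (i = 0; i < NR_IPI_MSGS; i++) {
		if (fake_request_irq(104 + i)) {
			fprintf(stderr, "Failed to request irq %d\n", i);
			failures++;
		}
	}
	return failures;
}
```

Whether logging is sufficient on real hardware is exactly the open
question raised above; the model only shows the control flow.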

Re: linux-next: Tree for Dec 21 (objtool warning)

2020-12-22 Thread Randy Dunlap

On 12/22/20 9:09 PM, Josh Poimboeuf wrote:
> On Mon, Dec 21, 2020 at 08:03:17AM -0800, Randy Dunlap wrote:
>> On 12/20/20 7:18 PM, Stephen Rothwell wrote:
>>> Hi all,
>>>
>>> News: there will be no linux-next releases between Dec 24 and Jan
>>> 3 inclusive.
>>>
>>> Please do not add any v5.12 destined code to your linux-next included
>>> branches until after v5.11-rc1 has been released.
>>>
>>> Changes since 20201218:
>>>
>>
>> on x86_64:
>>
>> arch/x86/kernel/sys_ia32.o: warning: objtool: cp_stat64()+0xd8: call to 
>> new_encode_dev() with UACCESS enabled
> 
> Can you send a .o for this one?  Please gzip it because my email has
> been rejecting .o files lately :-/
> 

Sure, it's attached.

-- 
~Randy



sys_ia32.o.gz
Description: application/gzip

[PATCH] mm/buffer.c: remove the macro check in check_irqs_on()

2020-12-22 Thread Hui Su

The macro irqs_disabled is always defined in include/linux/irqflags.h,
so we don't need the macro check.

Signed-off-by: Hui Su 
---
 fs/buffer.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/fs/buffer.c b/fs/buffer.c
index 32647d2011df..34b505542d96 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -1247,9 +1247,7 @@ static DEFINE_PER_CPU(struct bh_lru, bh_lrus) = {{ NULL 
}};
 
 static inline void check_irqs_on(void)
 {
-#ifdef irqs_disabled
BUG_ON(irqs_disabled());
-#endif
 }
 
 /*
-- 
2.25.1

[PATCH v2 1/3] iommu/vt-d: Move intel_iommu info from struct intel_svm to struct intel_svm_dev

2020-12-22 Thread Liu Yi L

The current struct intel_svm has a field that records the struct intel_iommu
pointer for a PASID bind, and struct intel_svm is shared by all the
devices bound to the same process. Those devices may sit behind different
DMAR units. Because the iommu driver code uses the intel_iommu pointer stored
in struct intel_svm for cache invalidation, it may flush the cache on only
a single DMAR unit; the invalidation is missed on the others.

As struct intel_svm already has a device list, this patch simply moves the
intel_iommu pointer into struct intel_svm_dev.

Fixes: 1c4f88b7f1f92 ("iommu/vt-d: Shared virtual address in scalable mode")
Cc: Lu Baolu 
Cc: Jacob Pan 
Cc: Raj Ashok 
Cc: David Woodhouse 
Reported-by: Guo Kaijie 
Reported-by: Xin Zeng 
Signed-off-by: Guo Kaijie 
Signed-off-by: Xin Zeng 
Signed-off-by: Liu Yi L 
Tested-by: Guo Kaijie 
---
 drivers/iommu/intel/svm.c   | 9 +
 include/linux/intel-iommu.h | 2 +-
 2 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index 3242ebd0bca3..4a10c9ff368c 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -142,7 +142,7 @@ static void intel_flush_svm_range_dev (struct intel_svm 
*svm, struct intel_svm_d
}
desc.qw2 = 0;
desc.qw3 = 0;
-   qi_submit_sync(svm->iommu, &desc, 1, 0);
+   qi_submit_sync(sdev->iommu, &desc, 1, 0);
 
if (sdev->dev_iotlb) {
desc.qw0 = QI_DEV_EIOTLB_PASID(svm->pasid) |
@@ -166,7 +166,7 @@ static void intel_flush_svm_range_dev (struct intel_svm 
*svm, struct intel_svm_d
}
desc.qw2 = 0;
desc.qw3 = 0;
-   qi_submit_sync(svm->iommu, &desc, 1, 0);
+   qi_submit_sync(sdev->iommu, &desc, 1, 0);
}
 }
 
@@ -211,7 +211,7 @@ static void intel_mm_release(struct mmu_notifier *mn, 
struct mm_struct *mm)
 */
rcu_read_lock();
list_for_each_entry_rcu(sdev, &svm->devs, list)
-   intel_pasid_tear_down_entry(svm->iommu, sdev->dev,
+   intel_pasid_tear_down_entry(sdev->iommu, sdev->dev,
svm->pasid, true);
rcu_read_unlock();
 
@@ -363,6 +363,7 @@ int intel_svm_bind_gpasid(struct iommu_domain *domain, 
struct device *dev,
}
sdev->dev = dev;
sdev->sid = PCI_DEVID(info->bus, info->devfn);
+   sdev->iommu = iommu;
 
/* Only count users if device has aux domains */
if (iommu_dev_feature_enabled(dev, IOMMU_DEV_FEAT_AUX))
@@ -546,6 +547,7 @@ intel_svm_bind_mm(struct device *dev, unsigned int flags,
goto out;
}
sdev->dev = dev;
+   sdev->iommu = iommu;
 
ret = intel_iommu_enable_pasid(iommu, dev);
if (ret) {
@@ -575,7 +577,6 @@ intel_svm_bind_mm(struct device *dev, unsigned int flags,
kfree(sdev);
goto out;
}
-   svm->iommu = iommu;
 
if (pasid_max > intel_pasid_max_id)
pasid_max = intel_pasid_max_id;
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index d956987ed032..94522685a0d9 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -758,6 +758,7 @@ struct intel_svm_dev {
struct list_head list;
struct rcu_head rcu;
struct device *dev;
+   struct intel_iommu *iommu;
struct svm_dev_ops *ops;
struct iommu_sva sva;
u32 pasid;
@@ -771,7 +772,6 @@ struct intel_svm {
struct mmu_notifier notifier;
struct mm_struct *mm;
 
-   struct intel_iommu *iommu;
unsigned int flags;
u32 pasid;
int gpasid; /* In case that guest PASID is different from host PASID */
-- 
2.25.1
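
The idea of the patch above (keep the IOMMU pointer per bound device rather
than once per shared SVM structure, so that an invalidation reaches every
unit) can be illustrated with a small userspace model. All names below
(model_iommu, model_svm_dev, model_svm, flush_all) are hypothetical
simplifications, not the kernel structures or functions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of one DMAR unit; counts the flushes it receives. */
struct model_iommu {
	int flushes;
};

/* One bound device; carries its own unit pointer, as in the patch. */
struct model_svm_dev {
	struct model_iommu *iommu;
	struct model_svm_dev *next;
};

/* Shared per-process structure; it no longer stores a single unit. */
struct model_svm {
	struct model_svm_dev *devs;
};

/* Walk the device list and flush the unit that each device sits behind. */
static void flush_all(struct model_svm *svm)
{
	struct model_svm_dev *sdev;

	for (sdev = svm->devs; sdev != NULL; sdev = sdev->next) {
		assert(sdev->iommu != NULL);
		sdev->iommu->flushes++;
	}
}
```

With two devices behind two different units, both counters end up at one
after a single flush_all() call; with a single pointer stored in the shared
structure, only one of the two units would have been flushed.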

[PATCH] ubifs: Fix read out-of-bounds in ubifs_jnl_write_inode()

2020-12-22 Thread Chengsong Ke

From: kechengsong 

ubifs_jnl_write_inode() can cause an out-of-bounds read in some situations.
Here is the KASAN report:
[  336.432159] BUG: KASAN: slab-out-of-bounds in 
ecc_sw_hamming_calculate+0x1dc/0x7d0
[  336.433634] Read of size 4 at addr 888019612ff8 by task kworker/u8:4/135
[  336.434605]
[  336.434830] CPU: 1 PID: 135 Comm: kworker/u8:4 Not tainted 
5.10.0-11826-gaf2a097952f3-dirty #338
[  336.436050] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
?-20190727_073836-buildvm-ppc64le-16.ppc.fedoraproject.org-3.fc31 04/01/2014
[  336.437876] Workqueue: writeback wb_workfn (flush-ubifs_0_0)
[  336.438670] Call Trace:
[  336.439021]  ? dump_stack+0xdd/0x126
[  336.439513]  ? print_address_description.constprop.0+0x2c/0x3c0
[  336.440308]  ? _raw_write_lock_irqsave+0x140/0x140
[  336.440921]  ? ecc_sw_hamming_calculate+0x1dc/0x7d0
[  336.441546]  ? ecc_sw_hamming_calculate+0x1dc/0x7d0
[  336.442186]  ? kasan_report.cold+0x5d/0xd8
[  336.442711]  ? nand_reset_op+0x280/0x310
[  336.443218]  ? ecc_sw_hamming_calculate+0x1dc/0x7d0
[  336.443842]  ? __asan_load4+0x77/0x120
[  336.444334]  ? ecc_sw_hamming_calculate+0x1dc/0x7d0
[  336.444963]  ? nand_ecc_sw_hamming_calculate+0x6c/0x80
[  336.445619]  ? rawnand_sw_hamming_calculate+0x12/0x20
[  336.446263]  ? nand_write_page_swecc+0xa9/0x160
[  336.446849]  ? nand_do_write_ops+0x390/0x830
[  336.447406]  ? __writeback_single_inode+0x6cc/0x880
[  336.448041]  ? nand_write_oob+0x78/0x100
[  336.448568]  ? mtd_write_oob_std+0xe2/0x160
[  336.449127]  ? mtd_write_oob+0xec/0x1b0
[  336.449679]  ? mtd_write+0x92/0xf0
[  336.450128]  ? mtd_write_oob+0x1b0/0x1b0
[  336.450633]  ? ubi_self_check_all_ff+0x82/0x2e0 [ubi]
[  336.451328]  ? __list_add_valid+0x2b/0x130
[  336.451865]  ? ubi_io_write+0x2c2/0xa90 [ubi]
[  336.452472]  ? _raw_read_lock_irq+0x90/0x90
[  336.453078]  ? kmem_cache_alloc_trace+0x465/0x8b0
[  336.453749]  ? do_sync_erase+0x350/0x350 [ubi]
[  336.454430]  ? __kasan_check_write+0x20/0x30
[  336.455050]  ? down_write+0xf2/0x190
[  336.455569]  ? down_write_killable+0x1b0/0x1b0
[  336.456221]  ? check_mapping+0x2c/0x590 [ubi]
[  336.456890]  ? ubi_eba_write_leb+0x58a/0xfa0 [ubi]
[  336.457618]  ? __kmalloc+0x490/0x910
[  336.458142]  ? ubifs_jnl_write_inode.cold+0x6f/0x878 [ubifs]
[  336.459033]  ? writeback_sb_inodes+0x3a9/0x9a0
[  336.459672]  ? __writeback_inodes_wb+0xc8/0x170
[  336.460330]  ? wb_writeback+0x637/0x700
[  336.460882]  ? wb_workfn+0x8af/0xb80
[  336.461398]  ? process_one_work+0x467/0x9f0
[  336.462004]  ? worker_thread+0x34d/0x8e0
[  336.462582]  ? kthread+0x204/0x280
[  336.463047]  ? ret_from_fork+0x1f/0x30
[  336.463570]  ? create_prof_cpu_mask+0x30/0x30
[  336.464185]  ? ubi_eba_read_leb_sg+0x1f0/0x1f0 [ubi]
[  336.464917]  ? hrtimer_active+0x9b/0x100
[  336.465468]  ? ubi_leb_write+0x22c/0x2f0 [ubi]
[  336.466130]  ? ubifs_leb_write+0xf2/0x1b0 [ubifs]
[  336.466851]  ? ubifs_wbuf_write_nolock+0x412/0x1280 [ubifs]
[  336.467686]  ? write_head+0xdf/0x1c0 [ubifs]
[  336.468355]  ? ubifs_jnl_write_inode.cold+0x3ec/0x878 [ubifs]
[  336.469183]  ? ret_from_fork+0x1e/0x30
[  336.469707]  ? ubifs_jnl_write_data+0x660/0x660 [ubifs]
[  336.470497]  ? unwind_next_frame+0x247/0xca0
[  336.471095]  ? ret_from_fork+0x1f/0x30
[  336.471574]  ? fprop_reflect_period_percpu.isra.0+0x1f/0x1b0
[  336.472335]  ? generic_writepages+0x93/0x140
[  336.472933]  ? __kasan_check_write+0x20/0x30
[  336.473526]  ? mutex_lock+0xa6/0x110
[  336.474031]  ? __mutex_lock_slowpath+0x30/0x30
[  336.474662]  ? ubifs_write_inode+0x1c3/0x290 [ubifs]
[  336.475446]  ? __writeback_single_inode+0x6cc/0x880
[  336.476155]  ? wbc_attach_and_unlock_inode+0x2b6/0x400
[  336.476891]  ? writeback_sb_inodes+0x3a9/0x9a0
[  336.477528]  ? write_inode_now+0x1e0/0x1e0
[  336.478119]  ? __writeback_inodes_wb+0xc8/0x170
[  336.478770]  ? wb_writeback+0x637/0x700
[  336.479326]  ? __writeback_inodes_wb+0x170/0x170
[  336.479992]  ? current_work+0xa0/0xa0
[  336.480524]  ? _find_next_bit.constprop.0+0x3e/0x140
[  336.481241]  ? find_next_bit+0x18/0x30
[  336.481780]  ? cpumask_next+0x2f/0x40
[  336.482312]  ? wb_workfn+0x8af/0xb80
[  336.482832]  ? update_cfs_group+0x1e/0x1b0
[  336.483421]  ? inode_wait_for_writeback+0x60/0x60
[  336.484106]  ? schedule+0xb7/0x240
[  336.484595]  ? finish_task_switch+0x14e/0x9a0
[  336.485225]  ? __kasan_check_write+0x20/0x30
[  336.485841]  ? __schedule+0x6f4/0x1600
[  336.486382]  ? __kasan_check_read+0x1d/0x30
[  336.486981]  ? read_word_at_a_time+0x16/0x30
[  336.487594]  ? process_one_work+0x467/0x9f0
[  336.488198]  ? worker_thread+0x34d/0x8e0
[  336.488762]  ? rescuer_thread+0x820/0x820
[  336.489344]  ? kthread+0x204/0x280
[  336.489839]  ? kthread_bind+0x50/0x50
[  336.490367]  ? ret_from_fork+0x1f/0x30
[  336.490913]
[  336.491138] Allocated by task 135:
[  336.491629]  kasan_save_stack+0x23/0x60
[  336.492189]  __kasan_kmalloc.constprop.0+0x10b/0x120
[  336.492898]  kasan_kmalloc+0xd/0x20
[  336.493401]

[PATCH v2 3/3] iommu/vt-d: Fix ineffective devTLB invalidation for subdevices

2020-12-22 Thread Liu Yi L

iommu_flush_dev_iotlb() is called to invalidate caches on the device. It only
loops over the devices which are fully attached to the domain, so it is
ineffective for sub-devices and leaves stale cache entries on the device.
Fix it by looping over the subdevices as well. Also, domain->has_iotlb_device
needs to be updated when attaching to subdevices.

Fixes: 67b8e02b5e761 ("iommu/vt-d: Aux-domain specific domain attach/detach")
Signed-off-by: Liu Yi L 
---
 drivers/iommu/intel/iommu.c | 63 +++--
 1 file changed, 47 insertions(+), 16 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index acfe0a5b955e..e97c5ac1d7fc 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -726,6 +726,8 @@ static int domain_update_device_node(struct dmar_domain 
*domain)
return nid;
 }
 
+static void domain_update_iotlb(struct dmar_domain *domain);
+
 /* Some capabilities may be different across iommus */
 static void domain_update_iommu_cap(struct dmar_domain *domain)
 {
@@ -739,6 +741,8 @@ static void domain_update_iommu_cap(struct dmar_domain 
*domain)
 */
if (domain->nid == NUMA_NO_NODE)
domain->nid = domain_update_device_node(domain);
+
+   domain_update_iotlb(domain);
 }
 
 struct context_entry *iommu_context_addr(struct intel_iommu *iommu, u8 bus,
@@ -1459,6 +1463,18 @@ iommu_support_dev_iotlb (struct dmar_domain *domain, 
struct intel_iommu *iommu,
return NULL;
 }
 
+static bool dev_iotlb_enabled(struct device_domain_info *info)
+{
+   struct pci_dev *pdev;
+
+   if (!info->dev || !dev_is_pci(info->dev))
+   return false;
+
+   pdev = to_pci_dev(info->dev);
+
+   return !!pdev->ats_enabled;
+}
+
 static void domain_update_iotlb(struct dmar_domain *domain)
 {
struct device_domain_info *info;
@@ -1466,17 +1482,20 @@ static void domain_update_iotlb(struct dmar_domain 
*domain)
 
assert_spin_locked(&device_domain_lock);
 
-   list_for_each_entry(info, &domain->devices, link) {
-   struct pci_dev *pdev;
-
-   if (!info->dev || !dev_is_pci(info->dev))
-   continue;
-
-   pdev = to_pci_dev(info->dev);
-   if (pdev->ats_enabled) {
+   list_for_each_entry(info, &domain->devices, link)
+   if (dev_iotlb_enabled(info)) {
has_iotlb_device = true;
break;
}
+
+   if (!has_iotlb_device) {
+   struct subdev_domain_info *sinfo;
+
+   list_for_each_entry(sinfo, &domain->subdevices, link_domain)
+   if (dev_iotlb_enabled(get_domain_info(sinfo->pdev))) {
+   has_iotlb_device = true;
+   break;
+   }
}
 
domain->has_iotlb_device = has_iotlb_device;
@@ -1557,25 +1576,37 @@ static void iommu_disable_dev_iotlb(struct 
device_domain_info *info)
 #endif
 }
 
+static void __iommu_flush_dev_iotlb(struct device_domain_info *info,
+   u64 addr, unsigned int mask)
+{
+   u16 sid, qdep;
+
+   if (!info || !info->ats_enabled)
+   return;
+
+   sid = info->bus << 8 | info->devfn;
+   qdep = info->ats_qdep;
+   qi_flush_dev_iotlb(info->iommu, sid, info->pfsid,
+  qdep, addr, mask);
+}
+
 static void iommu_flush_dev_iotlb(struct dmar_domain *domain,
  u64 addr, unsigned mask)
 {
-   u16 sid, qdep;
unsigned long flags;
struct device_domain_info *info;
+   struct subdev_domain_info *sinfo;
 
if (!domain->has_iotlb_device)
return;
 
spin_lock_irqsave(&device_domain_lock, flags);
-   list_for_each_entry(info, &domain->devices, link) {
-   if (!info->ats_enabled)
-   continue;
+   list_for_each_entry(info, &domain->devices, link)
+   __iommu_flush_dev_iotlb(info, addr, mask);
 
-   sid = info->bus << 8 | info->devfn;
-   qdep = info->ats_qdep;
-   qi_flush_dev_iotlb(info->iommu, sid, info->pfsid,
-   qdep, addr, mask);
+   list_for_each_entry(sinfo, &domain->subdevices, link_domain) {
+   __iommu_flush_dev_iotlb(get_domain_info(sinfo->pdev),
+   addr, mask);
}
spin_unlock_irqrestore(&device_domain_lock, flags);
 }
-- 
2.25.1
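
The flush logic described in the patch above (one helper that flushes a
single entry, applied first to the fully attached devices and then to the
subdevices) can be sketched in userspace as follows. This is a simplified
model under assumptions: model_info, flush_one(), any_ats() and
flush_domain() are hypothetical names, plain arrays stand in for the kernel
lists, and locking is left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a device entry; counts the flushes it receives. */
struct model_info {
	bool ats_enabled;
	int flushes;
};

/* Flush one entry; a missing entry or one without ATS is skipped. */
static void flush_one(struct model_info *info)
{
	if (info == NULL || !info->ats_enabled)
		return;
	info->flushes++;
}

/* True if any entry in the array has ATS enabled. */
static bool any_ats(struct model_info *const *list, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (list[i] != NULL && list[i]->ats_enabled)
			return true;
	return false;
}

/*
 * Flush both groups. The early return models the has_iotlb_device
 * shortcut, which must take the subdevices into account as well.
 */
static void flush_domain(struct model_info *const *devices, size_t ndev,
			 struct model_info *const *subdevices, size_t nsub)
{
	size_t i;

	if (!any_ats(devices, ndev) && !any_ats(subdevices, nsub))
		return;

	for (i = 0; i < ndev; i++)
		flush_one(devices[i]);
	for (i = 0; i < nsub; i++)
		flush_one(subdevices[i]);
}
```

If only the first array were considered, a domain whose sole ATS-capable
entry is a subdevice would never be flushed, which is the situation the
commit message describes.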

[PATCH v2 2/3] iommu/vt-d: Track device aux-attach with subdevice_domain_info

2020-12-22 Thread Liu Yi L

In the existing code, looping over all devices attached to a domain does not
include sub-devices attached via iommu_aux_attach_device().

This was found while I was working on the patch below. There is no
device in the domain->devices list, so the cap and ecap of the iommu unit
cannot be obtained. But the domain actually has a subdevice, attached in
the auxiliary manner, that is not tracked by the domain. This patch
fixes it.

https://lore.kernel.org/kvm/1599734733-6431-17-git-send-email-yi.l@intel.com/

The fix goes beyond the patch above: such sub-device tracking is
necessary for other cases too, for example flushing device_iotlb for a
domain which has sub-devices attached in the auxiliary manner.

Co-developed-by: Xin Zeng 
Signed-off-by: Xin Zeng 
Signed-off-by: Liu Yi L 
---
 drivers/iommu/intel/iommu.c | 95 +++--
 include/linux/intel-iommu.h | 16 +--
 2 files changed, 82 insertions(+), 29 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index a49afa11673c..acfe0a5b955e 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -1881,6 +1881,7 @@ static struct dmar_domain *alloc_domain(int flags)
domain->flags |= DOMAIN_FLAG_USE_FIRST_LEVEL;
domain->has_iotlb_device = false;
INIT_LIST_HEAD(&domain->devices);
+   INIT_LIST_HEAD(&domain->subdevices);
 
return domain;
 }
@@ -2632,7 +2633,7 @@ static struct dmar_domain 
*dmar_insert_one_dev_info(struct intel_iommu *iommu,
info->iommu = iommu;
info->pasid_table = NULL;
info->auxd_enabled = 0;
-   INIT_LIST_HEAD(&info->auxiliary_domains);
+   INIT_LIST_HEAD(&info->subdevices);
 
if (dev && dev_is_pci(dev)) {
struct pci_dev *pdev = to_pci_dev(info->dev);
@@ -5172,33 +5173,61 @@ is_aux_domain(struct device *dev, struct iommu_domain 
*domain)
domain->type == IOMMU_DOMAIN_UNMANAGED;
 }
 
-static void auxiliary_link_device(struct dmar_domain *domain,
- struct device *dev)
+static inline struct subdev_domain_info *
+lookup_subdev_info(struct dmar_domain *domain, struct device *dev)
+{
+   struct subdev_domain_info *sinfo;
+
+   if (!list_empty(&domain->subdevices)) {
+   list_for_each_entry(sinfo, &domain->subdevices, link_domain) {
+   if (sinfo->pdev == dev)
+   return sinfo;
+   }
+   }
+
+   return NULL;
+}
+
+static int auxiliary_link_device(struct dmar_domain *domain,
+struct device *dev)
 {
struct device_domain_info *info = get_domain_info(dev);
+   struct subdev_domain_info *sinfo = lookup_subdev_info(domain, dev);
 
assert_spin_locked(&device_domain_lock);
if (WARN_ON(!info))
-   return;
+   return -EINVAL;
+
+   if (!sinfo) {
+   sinfo = kzalloc(sizeof(*sinfo), GFP_ATOMIC);
+   sinfo->domain = domain;
+   sinfo->pdev = dev;
+   list_add(&sinfo->link_phys, &info->subdevices);
+   list_add(&sinfo->link_domain, &domain->subdevices);
+   }
 
-   domain->auxd_refcnt++;
-   list_add(&domain->auxd, &info->auxiliary_domains);
+   return ++sinfo->users;
 }
 
-static void auxiliary_unlink_device(struct dmar_domain *domain,
-   struct device *dev)
+static int auxiliary_unlink_device(struct dmar_domain *domain,
+  struct device *dev)
 {
struct device_domain_info *info = get_domain_info(dev);
+   struct subdev_domain_info *sinfo = lookup_subdev_info(domain, dev);
+   int ret;
 
assert_spin_locked(&device_domain_lock);
-   if (WARN_ON(!info))
-   return;
+   if (WARN_ON(!info || !sinfo || sinfo->users <= 0))
+   return -EINVAL;
 
-   list_del(&domain->auxd);
-   domain->auxd_refcnt--;
+   ret = --sinfo->users;
+   if (!ret) {
+   list_del(&sinfo->link_phys);
+   list_del(&sinfo->link_domain);
+   kfree(sinfo);
+   }
 
-   if (!domain->auxd_refcnt && domain->default_pasid > 0)
-   ioasid_free(domain->default_pasid);
+   return ret;
 }
 
 static int aux_domain_add_dev(struct dmar_domain *domain,
@@ -5227,6 +5256,19 @@ static int aux_domain_add_dev(struct dmar_domain *domain,
}
 
spin_lock_irqsave(&device_domain_lock, flags);
+   ret = auxiliary_link_device(domain, dev);
+   if (ret <= 0)
+   goto link_failed;
+
+   /*
+* Subdevices from the same physical device can be attached to the
+* same domain. For such cases, only the first subdevice attachment
+* needs to go through the full steps in this function. So if ret >
+* 1, just goto out.
+*/
+   if (ret > 1)
+   goto out;
+
/*
 * iommu->lock must be held to attach domain to iommu and setup the
 * pasid entry for second level translation.
@@ -5245,10

[PATCH v2 0/3] iommu/vt-d: Misc fixes on scalable mode

2020-12-22 Thread Liu Yi L

This patchset fixes a bug in native SVM usage, as well as several bugs
around tracking of subdevices (attached to a device in the auxiliary
manner) and ineffective device_tlb flushes.

Liu Yi L (3):
  iommu/vt-d: Move intel_iommu info from struct intel_svm to struct
intel_svm_dev
  iommu/vt-d: Track device aux-attach with subdevice_domain_info
  iommu/vt-d: Fix ineffective devTLB invalidation for subdevices

 drivers/iommu/intel/iommu.c | 158 +++-
 drivers/iommu/intel/svm.c   |   9 +-
 include/linux/intel-iommu.h |  18 ++--
 3 files changed, 135 insertions(+), 50 deletions(-)

-- 
2.25.1

[PATCH] x86/iommu: Fix two minimal issues in check_iommu_entries()

2020-12-22 Thread Zhenzhong Duan

check_iommu_entries() checks for cyclic dependencies in the iommu entries
and breaks a cycle by setting x->depend to NULL. But this repair is not
correct if q is in front of p: an "EXECUTION ORDER INVALID!" report will
follow. Fix it by NULLing the depend pointer of whichever entry is in
front.

The second issue is in the execution order report itself: the two
entries are printed in reversed order. Fix it.

Signed-off-by: Zhenzhong Duan 
---
 arch/x86/kernel/pci-iommu_table.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/pci-iommu_table.c 
b/arch/x86/kernel/pci-iommu_table.c
index 2e9006c..40c8249 100644
--- a/arch/x86/kernel/pci-iommu_table.c
+++ b/arch/x86/kernel/pci-iommu_table.c
@@ -60,7 +60,10 @@ void __init check_iommu_entries(struct iommu_table_entry 
*start,
printk(KERN_ERR "CYCLIC DEPENDENCY FOUND! %pS depends 
on %pS and vice-versa. BREAKING IT.\n",
   p->detect, q->detect);
/* Heavy handed way..*/
-   x->depend = NULL;
+   if (p > q)
+   q->depend = NULL;
+   else
+   p->depend = NULL;
}
}
 
@@ -68,7 +71,7 @@ void __init check_iommu_entries(struct iommu_table_entry 
*start,
q = find_dependents_of(p, finish, p);
if (q && q > p) {
printk(KERN_ERR "EXECUTION ORDER INVALID! %pS should be 
called before %pS!\n",
-  p->detect, q->detect);
+  q->detect, p->detect);
}
}
 }
-- 
1.8.3.1
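
The repair rule from the patch above (when two entries depend on each
other, clear the depend pointer of whichever entry comes first in the
table) can be sketched in userspace as below. This is a simplified model
under assumptions: model_entry and break_cycles() are hypothetical names,
each entry points directly at the entry it depends on, and only two-entry
cycles are handled. The kernel code works on detect callbacks and
find_dependents_of() instead.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical table entry; depend is the entry that must run first. */
struct model_entry {
	const char *name;
	struct model_entry *depend;
};

/*
 * Break two-entry cycles in table[0..n). Both pointers refer to elements
 * of the same array, so comparing them with > is well defined.
 * Returns the number of cycles that were broken.
 */
static int break_cycles(struct model_entry *table, size_t n)
{
	int broken = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		struct model_entry *p = &table[i];
		struct model_entry *q = p->depend;

		if (q != NULL && q->depend == p) {
			/* Clear the dependency of whichever entry is in front. */
			if (p > q)
				q->depend = NULL;
			else
				p->depend = NULL;
			broken++;
		}
	}
	return broken;
}
```

After the repair, the entry that is later in the table still depends on the
earlier one, so the remaining dependency agrees with the table order and no
execution order complaint should follow.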

Re: [PATCH] Bluetooth: btrtl: Add null check in setup

2020-12-22 Thread Marcel Holtmann

Hi Abhishek,

> btrtl_dev->ic_info is only available from the controller on cold boot
> (the lmp subversion matches the device model and this is used to look up
> the ic_info). On warm boots (firmware already loaded),
> btrtl_dev->ic_info is null.
> 
> Fixes: 05672a2c14a4 (Bluetooth: btrtl: Enable central-peripheral role)
> Signed-off-by: Abhishek Pandit-Subedi 
> ---
> 
> drivers/bluetooth/btrtl.c | 23 +--
> 1 file changed, 13 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/bluetooth/btrtl.c b/drivers/bluetooth/btrtl.c
> index 1abf6a4d672734f..978f3c773856b05 100644
> --- a/drivers/bluetooth/btrtl.c
> +++ b/drivers/bluetooth/btrtl.c
> @@ -719,16 +719,19 @@ int btrtl_setup_realtek(struct hci_dev *hdev)
>*/
>   set_bit(HCI_QUIRK_SIMULTANEOUS_DISCOVERY, &hdev->quirks);
> 
> - /* Enable central-peripheral role (able to create new connections with
> -  * an existing connection in slave role).
> -  */
> - switch (btrtl_dev->ic_info->lmp_subver) {
> - case RTL_ROM_LMP_8822B:
> - set_bit(HCI_QUIRK_VALID_LE_STATES, &hdev->quirks);
> - break;
> - default:
> - rtl_dev_dbg(hdev, "Central-peripheral role not enabled.");
> - break;
> + if (btrtl_dev->ic_info) {
> + /* Enable central-peripheral role (able to create new
> +  * connections with an existing connection in slave role).
> +  */
> + switch (btrtl_dev->ic_info->lmp_subver) {
> + case RTL_ROM_LMP_8822B:
> + set_bit(HCI_QUIRK_VALID_LE_STATES, &hdev->quirks);
> + break;
> + default:
> + rtl_dev_dbg(hdev,
> + "Central-peripheral role not enabled.");
> + break;
> + }
>   }


if (!btrtl_dev->ic_info)
goto done;

> 
>   btrtl_free(btrtl_dev);

Regards

Marcel
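
The early-exit form suggested in the reply above (skip the quirk setup
with a goto when ic_info is NULL, instead of nesting the switch inside an
if block) can be modelled in userspace like this. It is only a sketch:
model_dev, model_ic_info, model_setup() and the MODEL_LMP_8822B value are
hypothetical, and the freed flag merely stands in for the cleanup that
follows the done label.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_LMP_8822B 0x8822	/* placeholder, not the driver's constant */

struct model_ic_info {
	unsigned int lmp_subver;
};

struct model_dev {
	const struct model_ic_info *ic_info;	/* NULL on a warm boot */
	bool valid_le_states;
	bool freed;
};

static int model_setup(struct model_dev *dev)
{
	/* Nothing to look up on a warm boot; go straight to cleanup. */
	if (dev->ic_info == NULL)
		goto done;

	switch (dev->ic_info->lmp_subver) {
	case MODEL_LMP_8822B:
		dev->valid_le_states = true;
		break;
	default:
		break;
	}

done:
	dev->freed = true;	/* stands in for the common cleanup path */
	return 0;
}
```

The early exit keeps the switch at its original indentation, which is
presumably the point of the suggestion.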

Re: [PATCH v4] ovl: fix dentry leak in ovl_get_redirect

2020-12-22 Thread Liangyan


Thanks Viro.

@Miklos, can you please advise?

On 20/12/22 11:26 AM, Al Viro wrote:

On Tue, Dec 22, 2020 at 11:06:26AM +0800, Liangyan wrote:


Cc: 
Fixes: a6c606551141 ("ovl: redirect on rename-dir")
Signed-off-by: Liangyan 
Reviewed-by: Joseph Qi 
Suggested-by: Al Viro 


Fine by me...  I can put it through vfs.git#fixes, but IMO
that would be better off in overlayfs tree.

Re: [PATCH] Revert "kbuild: avoid static_assert for genksyms"

2020-12-22 Thread Masahiro Yamada

On Sun, Dec 20, 2020 at 3:40 AM Masahiro Yamada  wrote:
>
> This reverts commit 14dc3983b5dff513a90bd5a8cc90acaf7867c3d0.
>
> Marco Elver had sent a proper fix earlier, and also pointed out
> corner cases:
>
> "I guess what you propose is simpler, but might still have corner cases
> where we still get warnings. In particular, if some file (for whatever
> reason) does not include build_bug.h and uses a raw _Static_assert(),
> then we still get warnings. E.g. I see 1 user of raw _Static_assert()
> (drivers/gpu/drm/amd/amdgpu/amdgv_sriovmsg.h )."
>
> I believe the raw use of _Static_assert() should be allowed, so this
> should be fixed in genksyms.
>
> Even after commit 14dc3983b5df ("kbuild: avoid static_assert for
> genksyms"), I confirmed the following test code emits the warning.
>
>   >8
>   #include 
>
>   _Static_assert((1 ?: 0), "");
>
>   void foo(void) { }
>   EXPORT_SYMBOL(foo);
>   >8
>
>   WARNING: modpost: EXPORT symbol "foo" [vmlinux] version generation failed, 
> symbol will not be versioned.
>
> Now that commit 869b91992bce ("genksyms: Ignore module scoped

I updated the commit id in the mainline.

 9ab55d7f240f


Now, applied to linux-kbuild.



> _Static_assert()") fixed this issue properly, the workaround should
> be reverted.
>
> Link: https://lkml.org/lkml/2020/12/10/845
> Cc: Marco Elver 
> Signed-off-by: Masahiro Yamada 
> ---
>
> I will apply this after Marco's patch is pulled.
>
>
>
>  include/linux/build_bug.h | 5 -
>  1 file changed, 5 deletions(-)
>
> diff --git a/include/linux/build_bug.h b/include/linux/build_bug.h
> index 7bb66e15b481..e3a0be2c90ad 100644
> --- a/include/linux/build_bug.h
> +++ b/include/linux/build_bug.h
> @@ -77,9 +77,4 @@
>  #define static_assert(expr, ...) __static_assert(expr, ##__VA_ARGS__, #expr)
>  #define __static_assert(expr, msg, ...) _Static_assert(expr, msg)
>
> -#ifdef __GENKSYMS__
> -/* genksyms gets confused by _Static_assert */
> -#define _Static_assert(expr, ...)
> -#endif
> -
>  #endif /* _LINUX_BUILD_BUG_H */
> --
> 2.27.0
>


-- 
Best Regards
Masahiro Yamada
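
As a small illustration of the "raw use of _Static_assert()" that the
commit message above says should be allowed, here is a userspace sketch in
plain C11. The model_msg struct and model_msg_size() are hypothetical; the
sketch only shows the language feature and says nothing about how genksyms
parses it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical fixed-layout message used only for this illustration. */
struct model_msg {
	unsigned char tag;
	unsigned char payload[7];
};

/* Raw C11 static assertions at file scope, with no wrapper macro. */
_Static_assert(sizeof(struct model_msg) == 8, "model_msg must stay 8 bytes");
_Static_assert(offsetof(struct model_msg, payload) == 1, "payload follows tag");

/* Checked at compile time above; returned here so it can be inspected. */
static size_t model_msg_size(void)
{
	return sizeof(struct model_msg);
}
```

A failing condition stops the build with the given message, so nothing
from these checks remains at run time.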

drivers/acpi/x86/s2idle.c:395:13: sparse: sparse: restricted suspend_state_t degrades to integer

2020-12-22 Thread kernel test robot

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 
master
head:   614cb5894306cfa2c7d9b6168182876ff5948735
commit: fef98671194be005853cbbf51b164a3927589b64 ACPI: PM: s2idle: Move 
x86-specific code to the x86 directory
date:   5 days ago
config: i386-randconfig-s001-20201221 (attached as .config)
compiler: gcc-9 (Debian 9.3.0-15) 9.3.0
reproduce:
# apt-get install sparse
# sparse version: v0.6.3-184-g1b896707-dirty
# 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=fef98671194be005853cbbf51b164a3927589b64
git remote add linus 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
git fetch --no-tags linus master
git checkout fef98671194be005853cbbf51b164a3927589b64
# save the attached .config to linux build tree
make W=1 C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' ARCH=i386 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 


"sparse warnings: (new ones prefixed by >>)"
>> drivers/acpi/x86/s2idle.c:395:13: sparse: sparse: restricted suspend_state_t 
>> degrades to integer
   drivers/acpi/x86/s2idle.c:395:33: sparse: sparse: restricted suspend_state_t 
degrades to integer

vim +395 drivers/acpi/x86/s2idle.c

   348  
   349  static int lps0_device_attach(struct acpi_device *adev,
   350const struct acpi_device_id *not_used)
   351  {
   352  union acpi_object *out_obj;
   353  
   354  if (lps0_device_handle)
   355  return 0;
   356  
   357  if (!(acpi_gbl_FADT.flags & ACPI_FADT_LOW_POWER_S0))
   358  return 0;
   359  
   360  if (acpi_s2idle_vendor_amd()) {
   361  guid_parse(ACPI_LPS0_DSM_UUID_AMD, _dsm_guid);
   362  out_obj = acpi_evaluate_dsm(adev->handle, 
_dsm_guid, 0, 0, NULL);
   363  rev_id = 0;
   364  } else {
   365  guid_parse(ACPI_LPS0_DSM_UUID, _dsm_guid);
   366  out_obj = acpi_evaluate_dsm(adev->handle, 
_dsm_guid, 1, 0, NULL);
   367  rev_id = 1;
   368  }
   369  
   370  /* Check if the _DSM is present and as expected. */
   371  if (!out_obj || out_obj->type != ACPI_TYPE_BUFFER) {
   372  acpi_handle_debug(adev->handle,
   373"_DSM function 0 evaluation 
failed\n");
   374  return 0;
   375  }
   376  
   377  lps0_dsm_func_mask = *(char *)out_obj->buffer.pointer;
   378  
   379  ACPI_FREE(out_obj);
   380  
   381  acpi_handle_debug(adev->handle, "_DSM function mask: 0x%x\n",
   382lps0_dsm_func_mask);
   383  
   384  lps0_device_handle = adev->handle;
   385  
   386  if (acpi_s2idle_vendor_amd())
   387  lpi_device_get_constraints_amd();
   388  else
   389  lpi_device_get_constraints();
   390  
   391  /*
   392   * Use suspend-to-idle by default if the default suspend mode 
was not
   393   * set from the command line.
   394   */
 > 395  if (mem_sleep_default > PM_SUSPEND_MEM && 
 > !acpi_sleep_default_s3)
   396  mem_sleep_current = PM_SUSPEND_TO_IDLE;
   397  
   398  /*
   399   * Some LPS0 systems, like ASUS Zenbook UX430UNR/i7-8550U, 
require the
   400   * EC GPE to be enabled while suspended for certain wakeup 
devices to
   401   * work, so mark it as wakeup-capable.
   402   */
   403  acpi_ec_mark_gpe_for_wake();
   404  
   405  return 0;
   406  }
   407  

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org


.config.gz
Description: application/gzip

[PATCH] block/blk-merge: remove the next_bvec label in __blk_bios_map_sg()

2020-12-22 Thread sh

Remove the next_bvec label in __blk_bios_map_sg() to simplify the logic
of the bvec traversal.

Signed-off-by: sh 
---
 block/blk-merge.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/block/blk-merge.c b/block/blk-merge.c
index 808768f6b174..aa113cbc0f35 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -494,15 +494,15 @@ static int __blk_bios_map_sg(struct request_queue *q, 
struct bio *bio,
 * to bio
 */
if (new_bio &&
-   __blk_segment_map_sg_merge(q, &bvec, &bvprv, sg))
-   goto next_bvec;
+   __blk_segment_map_sg_merge(q, &bvec, &bvprv, sg)) {
+   new_bio = false;
+   continue;
+   }
 
if (bvec.bv_offset + bvec.bv_len <= PAGE_SIZE)
nsegs += __blk_bvec_map_sg(bvec, sglist, sg);
else
nsegs += blk_bvec_map_sg(q, &bvec, sglist, sg);
- next_bvec:
-   new_bio = false;
}
if (likely(bio->bi_iter.bi_size)) {
bvprv = bvec;
-- 
2.25.1

Re: [PATCH v2 15/48] opp: Support set_opp() customization without requiring to use regulators

2020-12-22 Thread Viresh Kumar

On 17-12-20, 21:06, Dmitry Osipenko wrote:
> Support set_opp() customization without requiring the use of regulators. This
> is needed by drivers which want to use dev_pm_opp_set_rate() for changing
> the rates of multiple clocks and don't need to touch a regulator.
> 
> One example is NVIDIA Tegra30/114 SoCs, which have two sibling 3D hardware
> units which should be set to the same clock rate, while voltage
> scaling is done using a power domain. In this case the OPP table doesn't have
> a regulator, causing a NULL dereference in _set_opp_custom().
> 
> Signed-off-by: Dmitry Osipenko 
> ---
>  drivers/opp/core.c | 16 
>  1 file changed, 12 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/opp/core.c b/drivers/opp/core.c
> index 3d02fe33630b..625dae7a5ecb 100644
> --- a/drivers/opp/core.c
> +++ b/drivers/opp/core.c
> @@ -828,17 +828,25 @@ static int _set_opp_custom(const struct opp_table 
> *opp_table,
>  struct dev_pm_opp_supply *old_supply,
>  struct dev_pm_opp_supply *new_supply)
>  {
> - struct dev_pm_set_opp_data *data;
> + struct dev_pm_set_opp_data *data, tmp_data;
> + unsigned int regulator_count;
>   int size;
>  
> - data = opp_table->set_opp_data;
> + if (opp_table->set_opp_data) {
> + data = opp_table->set_opp_data;
> + regulator_count = opp_table->regulator_count;
> + } else {
> + data = &tmp_data;
> + regulator_count = 0;
> + }
> +

We should use the same structure; you can add some checks, but don't replace the
structure altogether.

>   data->regulators = opp_table->regulators;
> - data->regulator_count = opp_table->regulator_count;
> + data->regulator_count = regulator_count;
>   data->clk = opp_table->clk;
>   data->dev = dev;
>  
>   data->old_opp.rate = old_freq;
> - size = sizeof(*old_supply) * opp_table->regulator_count;
> + size = sizeof(*old_supply) * regulator_count;
>   if (!old_supply)
>   memset(data->old_opp.supplies, 0, size);
>   else

-- 
viresh

RE: [PATCH 2/3] aspeed-video: clear spurious interrupt bits unconditionally

2020-12-22 Thread Ryan Chen

> -Original Message-
> From: Zev Weiss 
> Sent: Wednesday, December 23, 2020 11:54 AM
> To: Ryan Chen 
> Cc: Joel Stanley ; Eddie James ;
> Mauro Carvalho Chehab ; Andrew Jeffery
> ; linux-me...@vger.kernel.org; OpenBMC Maillist
> ; Linux ARM
> ; linux-aspeed
> ; Linux Kernel Mailing List
> ; Jae Hyun Yoo 
> Subject: Re: [PATCH 2/3] aspeed-video: clear spurious interrupt bits
> unconditionally
> 
> On Tue, Dec 22, 2020 at 08:53:33PM CST, Ryan Chen wrote:
> >> -Original Message-
> >> From: Joel Stanley 
> >> Sent: Wednesday, December 23, 2020 9:07 AM
> >> To: Zev Weiss ; Ryan Chen
> >> 
> >> Cc: Eddie James ; Mauro Carvalho Chehab
> >> ; Andrew Jeffery ;
> >> linux-me...@vger.kernel.org; OpenBMC Maillist
> >> ; Linux ARM
> >> ; linux-aspeed
> >> ; Linux Kernel Mailing List
> >> ; Jae Hyun Yoo
> >> 
> >> Subject: Re: [PATCH 2/3] aspeed-video: clear spurious interrupt bits
> >> unconditionally
> >>
> >> On Tue, 22 Dec 2020 at 19:14, Zev Weiss  wrote:
> >> >
> >> > On Mon, Dec 21, 2020 at 10:47:37PM CST, Joel Stanley wrote:
> >> > >On Tue, 15 Dec 2020 at 02:46, Zev Weiss 
> wrote:
> >> > >>
> >> > >> Instead of testing and conditionally clearing them one by one,
> >> > >> we can instead just unconditionally clear them all at once.
> >> > >>
> >> > >> Signed-off-by: Zev Weiss 
> >> > >
> >> > >I had a poke at the assembly and it looks like GCC is clearing the
> >> > >bits unconditionally anyway, so removing the tests provides no change.
> >> > >
> >> > >Combining them is a good further optimization.
> >> > >
> >> > >Reviewed-by: Joel Stanley 
> >> > >
> >> > >A question unrelated to this patch: Do you know why the driver
> >> > >doesn't clear the status bits in the interrupt handler? I would
> >> > >expect it to write the value of sts back to the register to ack
> >> > >the pending interrupt.
> >> > >
> >> >
> >> > No, I don't, and I was sort of wondering the same thing actually --
> >> > I'm not deeply familiar with this hardware or driver though, so I
> >> > was a bit hesitant to start messing with things.  (Though maybe
> >> > doing so would address the "stickiness" aspect when it does
> >> > manifest.)  Perhaps Eddie or Jae can shed some light here?
> >>
> >> I think you're onto something here - this would be why the status
> >> bits seem to stick until the device is reset.
> >>
> >> Until Aspeed can clarify if this is a hardware or software issue, I
> >> suggest we ack the bits and log a message when we see them, instead
> >> of always ignoring them without taking any action.
> >>
> >> Can you write a patch that changes the interrupt handler to ack
> >> status bits as it handles each of them?
> >>
> >Hello Zev, before the patch, did you hit an issue with the irq handler?
> >[Interrupts arriving continuously?]
> >
> >The aspeed_video_irq handler should only handle the interrupts that are enabled.
> >   u32 sts = aspeed_video_read(video, VE_INTERRUPT_STATUS);
> > + sts &= aspeed_video_read(video, VE_INTERRUPT_CTRL);
> >
> >Ryan
> >
> 
> Hi Ryan,
> 
> Prior to any of these patches I encountered a problem pretty much exactly like
> what Jae described in his commit message in 65d270acb2d (but the kernel I
> was running included that patch).  Adding the diagnostic in patch #1 of this
> series showed that it was apparently the same problem, just with a different
> interrupt that Jae's patch didn't include.
> 
>  From what you wrote above, I gather that it is in fact expected for the
> hardware to assert interrupts that aren't enabled in VE_INTERRUPT_CTRL?
> If so, I guess something like that would obviate the need for both Jae's 
> earlier
> patch and this whole series.
> 
Yes, I expect the handler to process only the interrupts enabled in VE_INTERRUPT_CTRL.

> I think the question Joel raised is somewhat independent though -- if the
> VE_INTERRUPT_STATUS register asserts interrupts we're not actually using,
> should the driver acknowledge them anyway or just leave them alone?
In my opinion the driver should leave them alone and ignore them.

> (Though if we're just going to ignore them anyway maybe it doesn't ultimately
> matter very much.)
> 
> 
> Zev
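
The masking Ryan suggests (status ANDed with the enable register) can be sketched in plain C. This is only a userspace illustration: struct fake_video_regs and both helper names are hypothetical, and the write-1-to-clear style acknowledge is an assumption about the hardware, not something confirmed in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for VE_INTERRUPT_CTRL and VE_INTERRUPT_STATUS. */
struct fake_video_regs {
	uint32_t irq_ctrl;   /* interrupts the driver has enabled */
	uint32_t irq_status; /* interrupts the hardware has asserted */
};

/* Return only the asserted interrupts that are also enabled. */
static uint32_t pending_enabled_irqs(const struct fake_video_regs *regs)
{
	assert(regs != NULL);
	return regs->irq_status & regs->irq_ctrl;
}

/*
 * Acknowledge the given bits. Assumption: the status register behaves
 * like write-1-to-clear; here that is modeled by clearing the bits.
 */
static void ack_irqs(struct fake_video_regs *regs, uint32_t bits)
{
	assert(regs != NULL);
	regs->irq_status &= ~bits;
}
```

With this shape, status bits that are asserted but not enabled are neither handled nor acknowledged, which matches the "leave them alone" position above.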

[PATCH v2] HID: Add Wireless Radio Control feature for Chicony devices

2020-12-22 Thread Jian-Hong Pan

Some Chicony keyboards support an airplane mode hotkey (Fn+F2) through the
"Wireless Radio Control" feature, for example the wireless keyboard
[04f2:1236] shipped with an ASUS all-in-one desktop.

According to Chicony, the device sends a report with 0x11 as the report ID
and 0x1 as the value when the key is pressed down.

This patch maps the event to KEY_RFKILL.

Signed-off-by: Jian-Hong Pan 
---
v2: Remove the duplicated key pressed check.

 drivers/hid/hid-chicony.c | 55 +++
 drivers/hid/hid-ids.h |  1 +
 2 files changed, 56 insertions(+)

diff --git a/drivers/hid/hid-chicony.c b/drivers/hid/hid-chicony.c
index 3f0ed6a95223..ca556d39da2a 100644
--- a/drivers/hid/hid-chicony.c
+++ b/drivers/hid/hid-chicony.c
@@ -21,6 +21,39 @@
 
 #include "hid-ids.h"
 
+#define CH_WIRELESS_CTL_REPORT_ID  0x11
+
+static int ch_report_wireless(struct hid_report *report, u8 *data, int size)
+{
+   struct hid_device *hdev = report->device;
+   struct input_dev *input;
+
+   if (report->id != CH_WIRELESS_CTL_REPORT_ID || report->maxfield != 1)
+   return 0;
+
+   input = report->field[0]->hidinput->input;
+   if (!input) {
+   hid_warn(hdev, "can't find wireless radio control's input");
+   return 0;
+   }
+
+   input_report_key(input, KEY_RFKILL, 1);
+   input_sync(input);
+   input_report_key(input, KEY_RFKILL, 0);
+   input_sync(input);
+
+   return 1;
+}
+
+static int ch_raw_event(struct hid_device *hdev,
+   struct hid_report *report, u8 *data, int size)
+{
+   if (report->application == HID_GD_WIRELESS_RADIO_CTLS)
+   return ch_report_wireless(report, data, size);
+
+   return 0;
+}
+
 #define ch_map_key_clear(c) hid_map_usage_clear(hi, usage, bit, max, \
EV_KEY, (c))
 static int ch_input_mapping(struct hid_device *hdev, struct hid_input *hi,
@@ -77,10 +110,30 @@ static __u8 *ch_switch12_report_fixup(struct hid_device 
*hdev, __u8 *rdesc,
return rdesc;
 }
 
+static int ch_probe(struct hid_device *hdev, const struct hid_device_id *id)
+{
+   int ret;
+
+   hdev->quirks |= HID_QUIRK_INPUT_PER_APP;
+   ret = hid_parse(hdev);
+   if (ret) {
+   hid_err(hdev, "Chicony hid parse failed: %d\n", ret);
+   return ret;
+   }
+
+   ret = hid_hw_start(hdev, HID_CONNECT_DEFAULT);
+   if (ret) {
+   hid_err(hdev, "Chicony hw start failed: %d\n", ret);
+   return ret;
+   }
+
+   return 0;
+}
 
 static const struct hid_device_id ch_devices[] = {
{ HID_USB_DEVICE(USB_VENDOR_ID_CHICONY, 
USB_DEVICE_ID_CHICONY_TACTICAL_PAD) },
{ HID_USB_DEVICE(USB_VENDOR_ID_CHICONY, 
USB_DEVICE_ID_CHICONY_WIRELESS2) },
+   { HID_USB_DEVICE(USB_VENDOR_ID_CHICONY, 
USB_DEVICE_ID_CHICONY_WIRELESS3) },
{ HID_USB_DEVICE(USB_VENDOR_ID_CHICONY, 
USB_DEVICE_ID_CHICONY_ACER_SWITCH12) },
{ }
 };
@@ -91,6 +144,8 @@ static struct hid_driver ch_driver = {
.id_table = ch_devices,
.report_fixup = ch_switch12_report_fixup,
.input_mapping = ch_input_mapping,
+   .probe = ch_probe,
+   .raw_event = ch_raw_event,
 };
 module_hid_driver(ch_driver);
 
diff --git a/drivers/hid/hid-ids.h b/drivers/hid/hid-ids.h
index 4c5f23640f9c..06d90301a3dc 100644
--- a/drivers/hid/hid-ids.h
+++ b/drivers/hid/hid-ids.h
@@ -270,6 +270,7 @@
 #define USB_DEVICE_ID_CHICONY_PIXART_USB_OPTICAL_MOUSE 0x1053
 #define USB_DEVICE_ID_CHICONY_PIXART_USB_OPTICAL_MOUSE2 0x0939
 #define USB_DEVICE_ID_CHICONY_WIRELESS2 0x1123
+#define USB_DEVICE_ID_CHICONY_WIRELESS3 0x1236
 #define USB_DEVICE_ID_ASUS_AK1D 0x1125
 #define USB_DEVICE_ID_CHICONY_TOSHIBA_WT10A 0x1408
 #define USB_DEVICE_ID_CHICONY_ACER_SWITCH12 0x1421
-- 
2.29.2

Re: [PATCH v2 28/48] soc/tegra: Introduce core power domain driver

2020-12-22 Thread Viresh Kumar

On 22-12-20, 22:39, Dmitry Osipenko wrote:
> 22.12.2020 22:21, Dmitry Osipenko пишет:
> >>> + if (IS_ERR(opp)) {
> >>> + dev_err(>dev, "failed to find OPP for level %u: %pe\n",
> >>> + level, opp);
> >>> + return PTR_ERR(opp);
> >>> + }
> >>> +
> >>> + err = dev_pm_opp_set_voltage(>dev, opp);
> >> IIUC, you implemented this callback because you want to use the voltage 
> >> triplet
> >> present in the OPP table ?
> >>
> >> And so you are setting the regulator ("power") later in this patch ?
> > yes
> > 
> >> I am not in favor of implementing this routine, as it just adds a wrapper 
> >> above
> >> the regulator API. What you should be doing rather is get the regulator by
> >> yourself here (instead of depending on the OPP core). And then you can do
> >> dev_pm_opp_get_voltage() here and set the voltage yourself. You may want to
> >> implement a version supporting triplet here though for the same.
> >>
> >> And you won't require the sync version of the API as well then.
> >>
> > That's what I initially did for this driver. I don't mind to revert back
> > to the initial variant in v3, it appeared to me that it will be nicer
> > and cleaner to have OPP API managing everything here.
> 
> I forgot one important detail (why the initial variant wasn't good)..
> OPP entries that have unsupportable voltages should be filtered out and
> OPP core performs the filtering only if regulator is assigned to the OPP
> table.
> 
> If regulator is assigned to the OPP table, then we need to use OPP API
> for driving the regulator, hence that's why I added
> dev_pm_opp_sync_regulators() and dev_pm_opp_set_voltage().
> 
> Perhaps it should be possible to add dev_pm_opp_get_regulator() that

What's wrong with getting the regulator in the driver as well ? Apart from the
OPP core ?

> will return the OPP table regulator in order to allow driver to use the
> regulator directly. But I'm not sure whether this is a much better
> option than the opp_sync_regulators() and opp_set_voltage() APIs.

set_voltage() is still fine as there is some data that the OPP core has, but
sync_regulator() has nothing to do with OPP core.

And this may lead to more wrapper helpers in the OPP core, which I am afraid of.
And so even if it is not the best, I would like the OPP core to provide the data
and not get into this. Of course, there is an exception to this: opp_set_rate.

-- 
viresh

[PATCH v2] net/ncsi: Use real net-device for response handler

2020-12-22 Thread John Wang

When NCSI interfaces and dedicated interfaces are aggregated into bond
interfaces, the NCSI response handler uses the wrong net device to find
ncsi_dev, so the NCSI interface does not work properly.
Use the original net device (orig_dev) to fix this.

Fixes: 138635cc27c9 ("net/ncsi: NCSI response packet handler")
Signed-off-by: John Wang 
---
v2:
  Use orig_dev instead of pt->dev
---
 net/ncsi/ncsi-rsp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ncsi/ncsi-rsp.c b/net/ncsi/ncsi-rsp.c
index a94bb59793f0..e1c6bb4ab98f 100644
--- a/net/ncsi/ncsi-rsp.c
+++ b/net/ncsi/ncsi-rsp.c
@@ -1120,7 +1120,7 @@ int ncsi_rcv_rsp(struct sk_buff *skb, struct net_device 
*dev,
int payload, i, ret;
 
/* Find the NCSI device */
-   nd = ncsi_find_dev(dev);
+   nd = ncsi_find_dev(orig_dev);
ndp = nd ? TO_NCSI_DEV_PRIV(nd) : NULL;
if (!ndp)
return -ENODEV;
-- 
2.25.1

[PATCH] drm/hisilicon: Add load and unload callback functions

2020-12-22 Thread Tian Tao

Hook hibmc_load() and hibmc_unload() up as the load and unload callbacks
of struct drm_driver, so there is no need to call load from
hibmc_pci_probe() and unload from hibmc_pci_remove().

Signed-off-by: Tian Tao 
---
 drivers/gpu/drm/hisilicon/hibmc/hibmc_drm_drv.c | 17 +++--
 1 file changed, 7 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/hisilicon/hibmc/hibmc_drm_drv.c 
b/drivers/gpu/drm/hisilicon/hibmc/hibmc_drm_drv.c
index 0d4e902..109ca87 100644
--- a/drivers/gpu/drm/hisilicon/hibmc/hibmc_drm_drv.c
+++ b/drivers/gpu/drm/hisilicon/hibmc/hibmc_drm_drv.c
@@ -27,6 +27,9 @@
 
 DEFINE_DRM_GEM_FOPS(hibmc_fops);
 
+static int hibmc_load(struct drm_device *dev, unsigned long flags);
+static void hibmc_unload(struct drm_device *dev);
+
 static irqreturn_t hibmc_drm_interrupt(int irq, void *arg)
 {
struct drm_device *dev = (struct drm_device *)arg;
@@ -63,6 +66,8 @@ static const struct drm_driver hibmc_driver = {
.dumb_map_offset= drm_gem_vram_driver_dumb_mmap_offset,
.gem_prime_mmap = drm_gem_prime_mmap,
.irq_handler= hibmc_drm_interrupt,
+   .load   = hibmc_load,
+   .unload = hibmc_unload,
 };
 
 static int __maybe_unused hibmc_pm_suspend(struct device *dev)
@@ -248,7 +253,7 @@ static int hibmc_hw_init(struct hibmc_drm_private *priv)
return 0;
 }
 
-static int hibmc_unload(struct drm_device *dev)
+static void hibmc_unload(struct drm_device *dev)
 {
drm_atomic_helper_shutdown(dev);
 
@@ -256,11 +261,9 @@ static int hibmc_unload(struct drm_device *dev)
drm_irq_uninstall(dev);
 
pci_disable_msi(dev->pdev);
-
-   return 0;
 }
 
-static int hibmc_load(struct drm_device *dev)
+static int hibmc_load(struct drm_device *dev, unsigned long flags)
 {
struct hibmc_drm_private *priv = to_hibmc_drm_private(dev);
int ret;
@@ -335,12 +338,6 @@ static int hibmc_pci_probe(struct pci_dev *pdev,
goto err_return;
}
 
-   ret = hibmc_load(dev);
-   if (ret) {
-   drm_err(dev, "failed to load hibmc: %d\n", ret);
-   goto err_return;
-   }
-
ret = drm_dev_register(dev, 0);
if (ret) {
drm_err(dev, "failed to register drv for userspace access: 
%d\n",
-- 
2.7.4

[PATCH v13 1/6] locking/qspinlock: Rename mcs lock/unlock macros and make them more generic

2020-12-22 Thread Alex Kogan

The MCS unlock macro (renamed to arch_mcs_lock_handoff) now takes the value
to be stored into the lock as an additional argument. This allows using the
same macro in cases where the value stored when passing the lock is
different from 1.
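
As a rough userspace illustration of the wait/handoff pair described above, the following sketch uses C11 atomics in place of smp_cond_load_acquire() and smp_store_release(). The function names are hypothetical, and returning the observed value from the wait side is done here only to make the example checkable; the kernel macro does not return it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/*
 * Userspace stand-in for the wait side: spin with acquire semantics
 * until the word becomes nonzero, then return the value that was seen.
 */
static int mcs_spin_wait(atomic_int *locked)
{
	int val;

	assert(locked != NULL);
	while ((val = atomic_load_explicit(locked, memory_order_acquire)) == 0)
		; /* busy-wait; a real implementation would relax the CPU */
	return val;
}

/*
 * Userspace stand-in for the handoff side: publish an arbitrary
 * nonzero value with release semantics instead of a hard-coded 1.
 */
static void mcs_lock_handoff(atomic_int *locked, int val)
{
	assert(locked != NULL);
	atomic_store_explicit(locked, val, memory_order_release);
}
```

Passing the value explicitly lets a caller encode extra state in the handoff word, which is what the later patches in this series rely on.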

Signed-off-by: Alex Kogan 
Reviewed-by: Steve Sistare 
Reviewed-by: Waiman Long 
---
 arch/arm/include/asm/mcs_spinlock.h |  6 +++---
 include/asm-generic/mcs_spinlock.h  |  4 ++--
 kernel/locking/mcs_spinlock.h   | 18 +-
 kernel/locking/qspinlock.c  |  4 ++--
 kernel/locking/qspinlock_paravirt.h |  2 +-
 5 files changed, 17 insertions(+), 17 deletions(-)

diff --git a/arch/arm/include/asm/mcs_spinlock.h 
b/arch/arm/include/asm/mcs_spinlock.h
index 529d2cf4d06f..1eb4d733459c 100644
--- a/arch/arm/include/asm/mcs_spinlock.h
+++ b/arch/arm/include/asm/mcs_spinlock.h
@@ -6,7 +6,7 @@
 #include 
 
 /* MCS spin-locking. */
-#define arch_mcs_spin_lock_contended(lock) \
+#define arch_mcs_spin_wait(lock)   \
 do {   \
/* Ensure prior stores are observed before we enter wfe. */ \
smp_mb();   \
@@ -14,9 +14,9 @@ do {  
\
wfe();  \
 } while (0)\
 
-#define arch_mcs_spin_unlock_contended(lock)   \
+#define arch_mcs_lock_handoff(lock, val)   \
 do {   \
-   smp_store_release(lock, 1); \
+   smp_store_release((lock), (val));   \
dsb_sev();  \
 } while (0)
 
diff --git a/include/asm-generic/mcs_spinlock.h 
b/include/asm-generic/mcs_spinlock.h
index 10cd4ffc6ba2..f933d99c63e0 100644
--- a/include/asm-generic/mcs_spinlock.h
+++ b/include/asm-generic/mcs_spinlock.h
@@ -4,8 +4,8 @@
 /*
  * Architectures can define their own:
  *
- *   arch_mcs_spin_lock_contended(l)
- *   arch_mcs_spin_unlock_contended(l)
+ *   arch_mcs_spin_wait(l)
+ *   arch_mcs_lock_handoff(l, val)
  *
  * See kernel/locking/mcs_spinlock.c.
  */
diff --git a/kernel/locking/mcs_spinlock.h b/kernel/locking/mcs_spinlock.h
index 5e10153b4d3c..904ba5d0f3f4 100644
--- a/kernel/locking/mcs_spinlock.h
+++ b/kernel/locking/mcs_spinlock.h
@@ -21,7 +21,7 @@ struct mcs_spinlock {
int count;  /* nesting count, see qspinlock.c */
 };
 
-#ifndef arch_mcs_spin_lock_contended
+#ifndef arch_mcs_spin_wait
 /*
  * Using smp_cond_load_acquire() provides the acquire semantics
  * required so that subsequent operations happen after the
@@ -29,20 +29,20 @@ struct mcs_spinlock {
  * ARM64 would like to do spin-waiting instead of purely
  * spinning, and smp_cond_load_acquire() provides that behavior.
  */
-#define arch_mcs_spin_lock_contended(l)
\
-do {   \
-   smp_cond_load_acquire(l, VAL);  \
+#define arch_mcs_spin_wait(l)  \
+do {   \
+   smp_cond_load_acquire(l, VAL);  \
 } while (0)
 #endif
 
-#ifndef arch_mcs_spin_unlock_contended
+#ifndef arch_mcs_lock_handoff
 /*
  * smp_store_release() provides a memory barrier to ensure all
  * operations in the critical section has been completed before
  * unlocking.
  */
-#define arch_mcs_spin_unlock_contended(l)  \
-   smp_store_release((l), 1)
+#define arch_mcs_lock_handoff(l, val)  \
+   smp_store_release((l), (val))
 #endif
 
 /*
@@ -91,7 +91,7 @@ void mcs_spin_lock(struct mcs_spinlock **lock, struct 
mcs_spinlock *node)
WRITE_ONCE(prev->next, node);
 
/* Wait until the lock holder passes the lock down. */
-   arch_mcs_spin_lock_contended(>locked);
+   arch_mcs_spin_wait(>locked);
 }
 
 /*
@@ -115,7 +115,7 @@ void mcs_spin_unlock(struct mcs_spinlock **lock, struct 
mcs_spinlock *node)
}
 
/* Pass lock to next waiter. */
-   arch_mcs_spin_unlock_contended(>locked);
+   arch_mcs_lock_handoff(>locked, 1);
 }
 
 #endif /* __LINUX_MCS_SPINLOCK_H */
diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c
index cbff6ba53d56..435d696f9250 100644
--- a/kernel/locking/qspinlock.c
+++ b/kernel/locking/qspinlock.c
@@ -471,7 +471,7 @@ void queued_spin_lock_slowpath(struct qspinlock *lock, u32 
val)
WRITE_ONCE(prev->next, node);
 
pv_wait_node(node, prev);
-   arch_mcs_spin_lock_contended(>locked);
+

[PATCH v13 3/6] locking/qspinlock: Introduce CNA into the slow path of qspinlock

2020-12-22 Thread Alex Kogan

In CNA, spinning threads are organized in two queues, a primary queue for
threads running on the same node as the current lock holder, and a
secondary queue for threads running on other nodes. After acquiring the
MCS lock and before acquiring the spinlock, the MCS lock
holder checks whether the next waiter in the primary queue (if one exists) is
running on the same NUMA node. If it is not, that waiter is detached from
the main queue and moved into the tail of the secondary queue. This way,
we gradually filter the primary queue, leaving only waiters running on
the same preferred NUMA node. For more details, see
https://arxiv.org/abs/1810.05600.
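
The filtering step described above can be modeled with a single-threaded sketch. It ignores all atomics and the encoding the real patch uses for the secondary queue; struct waiter, struct queues and cna_filter_step() are hypothetical names chosen for illustration. Following the cna_order_queue() hunk quoted elsewhere in this series, a waiter is only moved when it has a successor.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified waiter: NUMA node id plus a link to the next waiter. */
struct waiter {
	int numa_node;
	struct waiter *next;
};

/* The lock holder heads the primary queue; the secondary queue is separate. */
struct queues {
	struct waiter *holder;
	struct waiter *secondary_head;
	struct waiter *secondary_tail;
};

/*
 * One filtering step: if the waiter right behind the holder runs on a
 * different node and is not the last waiter, detach it from the primary
 * queue and append it to the secondary queue. Returns 1 if it was moved.
 */
static int cna_filter_step(struct queues *q)
{
	struct waiter *holder, *next;

	assert(q != NULL && q->holder != NULL);
	holder = q->holder;
	next = holder->next;

	if (!next || next->numa_node == holder->numa_node)
		return 0;
	if (!next->next)
		return 0; /* no successor yet: leave the waiter in place */

	holder->next = next->next;
	next->next = NULL;
	if (q->secondary_tail)
		q->secondary_tail->next = next;
	else
		q->secondary_head = next;
	q->secondary_tail = next;
	return 1;
}
```

Repeating this step each time the lock changes hands gradually leaves only same-node waiters directly behind the holder.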

Note that this variant of CNA may introduce starvation by continuously
passing the lock between waiters in the main queue. This issue will be
addressed later in the series.

Enabling CNA is controlled via a new configuration option
(NUMA_AWARE_SPINLOCKS). By default, the CNA variant is patched in at the
boot time only if we run on a multi-node machine in native environment and
the new config is enabled. (For the time being, the patching requires
CONFIG_PARAVIRT_SPINLOCKS to be enabled as well. However, this should be
resolved once static_call() is available.) This default behavior can be
overridden with the new kernel boot command-line option
"numa_spinlock=on/off" (default is "auto").

Signed-off-by: Alex Kogan 
Reviewed-by: Steve Sistare 
Reviewed-by: Waiman Long 
---
 .../admin-guide/kernel-parameters.txt |  10 +
 arch/x86/Kconfig  |  20 ++
 arch/x86/include/asm/qspinlock.h  |   4 +
 arch/x86/kernel/alternative.c |   4 +
 kernel/locking/mcs_spinlock.h |   2 +-
 kernel/locking/qspinlock.c|  42 ++-
 kernel/locking/qspinlock_cna.h| 336 ++
 7 files changed, 413 insertions(+), 5 deletions(-)
 create mode 100644 kernel/locking/qspinlock_cna.h

diff --git a/Documentation/admin-guide/kernel-parameters.txt 
b/Documentation/admin-guide/kernel-parameters.txt
index 44fde25bb221..a6ae826c6076 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -3430,6 +3430,16 @@
numa_balancing= [KNL,X86] Enable or disable automatic NUMA balancing.
Allowed values are enable and disable
 
+   numa_spinlock=  [NUMA, PV_OPS] Select the NUMA-aware variant
+   of spinlock. The options are:
+   auto - Enable this variant if running on a multi-node
+   machine in native environment.
+   on  - Unconditionally enable this variant.
+   off - Unconditionally disable this variant.
+
+   Not specifying this option is equivalent to
+   numa_spinlock=auto.
+
numa_zonelist_order= [KNL, BOOT] Select zonelist order for NUMA.
'node', 'default' can be specified
This can be set from sysctl after boot.
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index fbf26e0f7a6a..8c5ecbe5bcd6 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1565,6 +1565,26 @@ config NUMA
 
  Otherwise, you should say N.
 
+config NUMA_AWARE_SPINLOCKS
+   bool "Numa-aware spinlocks"
+   depends on NUMA
+   depends on QUEUED_SPINLOCKS
+   depends on 64BIT
+   # For now, we depend on PARAVIRT_SPINLOCKS to make the patching work.
+   # This is awkward, but hopefully would be resolved once static_call()
+   # is available.
+   depends on PARAVIRT_SPINLOCKS
+   default y
+   help
+ Introduce NUMA (Non Uniform Memory Access) awareness into
+ the slow path of spinlocks.
+
+ In this variant of qspinlock, the kernel will try to keep the lock
+ on the same node, thus reducing the number of remote cache misses,
+ while trading some of the short term fairness for better performance.
+
+ Say N if you want absolute first come first serve fairness.
+
 config AMD_NUMA
def_bool y
prompt "Old style AMD Opteron NUMA detection"
diff --git a/arch/x86/include/asm/qspinlock.h b/arch/x86/include/asm/qspinlock.h
index d86ab942219c..21d09e8db979 100644
--- a/arch/x86/include/asm/qspinlock.h
+++ b/arch/x86/include/asm/qspinlock.h
@@ -27,6 +27,10 @@ static __always_inline u32 
queued_fetch_set_pending_acquire(struct qspinlock *lo
return val;
 }
 
+#ifdef CONFIG_NUMA_AWARE_SPINLOCKS
+extern void cna_configure_spin_lock_slowpath(void);
+#endif
+
 #ifdef CONFIG_PARAVIRT_SPINLOCKS
 extern void native_queued_spin_lock_slowpath(struct qspinlock *lock, u32 val);
 extern void __pv_init_lock_hash(void);
diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index 2400ad62f330..e04f48c2191d 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -741,6 +741,10 @@ void __init

[PATCH v13 5/6] locking/qspinlock: Avoid moving certain threads between waiting queues in CNA

2020-12-22 Thread Alex Kogan

Prohibit moving certain threads (e.g., in irq and nmi contexts)
to the secondary queue. Those prioritized threads will always stay
in the primary queue, and so will have a shorter wait time for the lock.

Signed-off-by: Alex Kogan 
Reviewed-by: Steve Sistare 
Reviewed-by: Waiman Long 
---
 kernel/locking/qspinlock_cna.h | 26 --
 1 file changed, 20 insertions(+), 6 deletions(-)

diff --git a/kernel/locking/qspinlock_cna.h b/kernel/locking/qspinlock_cna.h
index d3e27549c769..ac3109ab0a84 100644
--- a/kernel/locking/qspinlock_cna.h
+++ b/kernel/locking/qspinlock_cna.h
@@ -4,6 +4,7 @@
 #endif
 
 #include 
+#include 
 
 /*
  * Implement a NUMA-aware version of MCS (aka CNA, or compact NUMA-aware lock).
@@ -35,7 +36,8 @@
  * running on the same NUMA node. If it is not, that waiter is detached from 
the
  * main queue and moved into the tail of the secondary queue. This way, we
  * gradually filter the primary queue, leaving only waiters running on the same
- * preferred NUMA node.
+ * preferred NUMA node. Note that certain prioritized waiters (e.g., in
+ * irq and nmi contexts) are excluded from being moved to the secondary queue.
  *
  * We change the NUMA node preference after a waiter at the head of the
  * secondary queue spins for a certain amount of time (10ms, by default).
@@ -49,6 +51,8 @@
  *  Dave Dice 
  */
 
+#define CNA_PRIORITY_NODE  0x
+
 struct cna_node {
struct mcs_spinlock mcs;
u16 numa_node;
@@ -121,9 +125,10 @@ static int __init cna_init_nodes(void)
 
 static __always_inline void cna_init_node(struct mcs_spinlock *node)
 {
+   bool priority = !in_task() || irqs_disabled() || rt_task(current);
struct cna_node *cn = (struct cna_node *)node;
 
-   cn->numa_node = cn->real_numa_node;
+   cn->numa_node = priority ? CNA_PRIORITY_NODE : cn->real_numa_node;
cn->start_time = 0;
 }
 
@@ -262,11 +267,13 @@ static u32 cna_order_queue(struct mcs_spinlock *node)
next_numa_node = ((struct cna_node *)next)->numa_node;
 
if (next_numa_node != numa_node) {
-   struct mcs_spinlock *nnext = READ_ONCE(next->next);
+   if (next_numa_node != CNA_PRIORITY_NODE) {
+   struct mcs_spinlock *nnext = READ_ONCE(next->next);
 
-   if (nnext) {
-   cna_splice_next(node, next, nnext);
-   next = nnext;
+   if (nnext) {
+   cna_splice_next(node, next, nnext);
+   next = nnext;
+   }
}
/*
 * Inherit NUMA node id of primary queue, to maintain the
@@ -284,6 +291,13 @@ static __always_inline u32 cna_wait_head_or_lock(struct 
qspinlock *lock,
struct cna_node *cn = (struct cna_node *)node;
 
if (!cn->start_time || !intra_node_threshold_reached(cn)) {
+   /*
+* We are at the head of the wait queue, no need to use
+* the fake NUMA node ID.
+*/
+   if (cn->numa_node == CNA_PRIORITY_NODE)
+   cn->numa_node = cn->real_numa_node;
+
/*
 * Try and put the time otherwise spent spin waiting on
 * _Q_LOCKED_PENDING_MASK to use by sorting our lists.
-- 
2.24.3 (Apple Git-128)

[PATCH v13 2/6] locking/qspinlock: Refactor the qspinlock slow path

2020-12-22 Thread Alex Kogan

Move some of the code manipulating the spin lock into separate functions.
This would allow easier integration of alternative ways to manipulate
that lock.
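
The hook pattern this refactoring introduces can be sketched in userspace C11. The sketch below assumes a lock word where the value 1 means "locked, no tail" (named Q_LOCKED_VAL here as a stand-in); struct fake_qspinlock and default_try_clear_tail() are hypothetical names, and an alternative implementation would define try_clear_tail before this point to override the default.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define Q_LOCKED_VAL 1u /* assumed encoding: locked, empty tail */

struct fake_qspinlock {
	_Atomic uint32_t val;
};

/*
 * Default hook: try to replace the expected lock word (our tail, lock
 * free) with Q_LOCKED_VAL in one relaxed compare-and-exchange. Fails
 * if another waiter has queued behind us in the meantime.
 */
static bool default_try_clear_tail(struct fake_qspinlock *lock, uint32_t val)
{
	assert(lock != NULL);
	return atomic_compare_exchange_strong_explicit(&lock->val, &val,
						       Q_LOCKED_VAL,
						       memory_order_relaxed,
						       memory_order_relaxed);
}

/* A variant can pre-define try_clear_tail; otherwise use the default. */
#ifndef try_clear_tail
#define try_clear_tail default_try_clear_tail
#endif
```

The slow path then calls try_clear_tail() without knowing which implementation is behind the name, which is the integration point the commit message refers to.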

Signed-off-by: Alex Kogan 
Reviewed-by: Steve Sistare 
Reviewed-by: Waiman Long 
---
 kernel/locking/qspinlock.c | 38 --
 1 file changed, 36 insertions(+), 2 deletions(-)

diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c
index 435d696f9250..e3518709ffdc 100644
--- a/kernel/locking/qspinlock.c
+++ b/kernel/locking/qspinlock.c
@@ -289,6 +289,34 @@ static __always_inline u32  __pv_wait_head_or_lock(struct 
qspinlock *lock,
 #define queued_spin_lock_slowpath  native_queued_spin_lock_slowpath
 #endif
 
+/*
+ * __try_clear_tail - try to clear tail by setting the lock value to
+ * _Q_LOCKED_VAL.
+ * @lock: Pointer to the queued spinlock structure
+ * @val: Current value of the lock
+ * @node: Pointer to the MCS node of the lock holder
+ */
+static __always_inline bool __try_clear_tail(struct qspinlock *lock,
+u32 val,
+struct mcs_spinlock *node)
+{
+   return atomic_try_cmpxchg_relaxed(&lock->val, &val, _Q_LOCKED_VAL);
+}
+
+/*
+ * __mcs_lock_handoff - pass the MCS lock to the next waiter
+ * @node: Pointer to the MCS node of the lock holder
+ * @next: Pointer to the MCS node of the first waiter in the MCS queue
+ */
+static __always_inline void __mcs_lock_handoff(struct mcs_spinlock *node,
+  struct mcs_spinlock *next)
+{
+   arch_mcs_lock_handoff(&next->locked, 1);
+}
+
+#define try_clear_tail __try_clear_tail
+#define mcs_lock_handoff   __mcs_lock_handoff
+
 #endif /* _GEN_PV_LOCK_SLOWPATH */
 
 /**
@@ -533,7 +561,7 @@ void queued_spin_lock_slowpath(struct qspinlock *lock, u32 
val)
 *   PENDING will make the uncontended transition fail.
 */
if ((val & _Q_TAIL_MASK) == tail) {
-   if (atomic_try_cmpxchg_relaxed(>val, , _Q_LOCKED_VAL))
+   if (try_clear_tail(lock, val, node))
goto release; /* No contention */
}
 
@@ -550,7 +578,7 @@ void queued_spin_lock_slowpath(struct qspinlock *lock, u32 
val)
if (!next)
next = smp_cond_load_relaxed(>next, (VAL));
 
-   arch_mcs_lock_handoff(>locked, 1);
+   mcs_lock_handoff(node, next);
pv_kick_node(lock, next);
 
 release:
@@ -575,6 +603,12 @@ EXPORT_SYMBOL(queued_spin_lock_slowpath);
 #undef pv_kick_node
 #undef pv_wait_head_or_lock
 
+#undef try_clear_tail
+#define try_clear_tail __try_clear_tail
+
+#undef mcs_lock_handoff
+#define mcs_lock_handoff   __mcs_lock_handoff
+
 #undef  queued_spin_lock_slowpath
 #define queued_spin_lock_slowpath  __pv_queued_spin_lock_slowpath
 
-- 
2.24.3 (Apple Git-128)

[PATCH v13 6/6] locking/qspinlock: Introduce the shuffle reduction optimization into CNA

2020-12-22 Thread Alex Kogan

This performance optimization chooses probabilistically to avoid moving
threads from the main queue into the secondary one when the secondary queue
is empty.

It is helpful when the lock is only lightly contended. In particular, it
makes CNA less eager to create a secondary queue, but does not introduce
any extra delays for threads waiting in that queue once it is created.
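
A minimal userspace sketch of the probabilistic check follows. It assumes a simple linear congruential generator as the pseudo-random step (the constants are an assumption for illustration) and uses one global seed instead of a per-CPU variable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Single global seed; the patch keeps one seed per CPU instead. */
static uint32_t seed;

/* Assumed generator: a 32-bit linear congruential step. */
static uint32_t next_pseudo_random32(uint32_t s)
{
	return s * 1664525u + 1013904223u;
}

/*
 * Return false with probability 1 / 2^num_bits, true otherwise.
 * The result is false only when the low num_bits bits are all zero.
 */
static bool probably(unsigned int num_bits)
{
	assert(num_bits <= 31);
	seed = next_pseudo_random32(seed);
	return (seed & ((1u << num_bits) - 1)) != 0;
}
```

With num_bits set to 7, as in SHUFFLE_REDUCTION_PROB_ARG, the check returns false about once in 128 calls, so the queue ordering still runs occasionally even while the secondary queue is empty.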

Signed-off-by: Alex Kogan 
Reviewed-by: Steve Sistare 
Reviewed-by: Waiman Long 
---
 kernel/locking/qspinlock_cna.h | 39 +-
 1 file changed, 38 insertions(+), 1 deletion(-)

diff --git a/kernel/locking/qspinlock_cna.h b/kernel/locking/qspinlock_cna.h
index ac3109ab0a84..621399242735 100644
--- a/kernel/locking/qspinlock_cna.h
+++ b/kernel/locking/qspinlock_cna.h
@@ -5,6 +5,7 @@
 
 #include 
 #include 
+#include 
 
 /*
  * Implement a NUMA-aware version of MCS (aka CNA, or compact NUMA-aware lock).
@@ -86,6 +87,34 @@ static inline bool intra_node_threshold_reached(struct 
cna_node *cn)
return current_time - threshold > 0;
 }
 
+/*
+ * Controls the probability for enabling the ordering of the main queue
+ * when the secondary queue is empty. The chosen value reduces the amount
+ * of unnecessary shuffling of threads between the two waiting queues
+ * when the contention is low, while responding fast enough and enabling
+ * the shuffling when the contention is high.
+ */
+#define SHUFFLE_REDUCTION_PROB_ARG  (7)
+
+/* Per-CPU pseudo-random number seed */
+static DEFINE_PER_CPU(u32, seed);
+
+/*
+ * Return false with probability 1 / 2^@num_bits.
+ * Intuitively, the larger @num_bits the less likely false is to be returned.
+ * @num_bits must be a number between 0 and 31.
+ */
+static bool probably(unsigned int num_bits)
+{
+   u32 s;
+
+   s = this_cpu_read(seed);
+   s = next_pseudo_random32(s);
+   this_cpu_write(seed, s);
+
+   return s & ((1 << num_bits) - 1);
+}
+
 static void __init cna_init_nodes_per_cpu(unsigned int cpu)
 {
struct mcs_spinlock *base = per_cpu_ptr([0].mcs, cpu);
@@ -290,7 +319,15 @@ static __always_inline u32 cna_wait_head_or_lock(struct 
qspinlock *lock,
 {
struct cna_node *cn = (struct cna_node *)node;
 
-   if (!cn->start_time || !intra_node_threshold_reached(cn)) {
+   if (node->locked <= 1 && probably(SHUFFLE_REDUCTION_PROB_ARG)) {
+   /*
+* When the secondary queue is empty, skip the call to
+* cna_order_queue() with high probability. This optimization
+* reduces the overhead of unnecessary shuffling of threads
+* between waiting queues when the lock is only lightly 
contended.
+*/
+   cn->partial_order = LOCAL_WAITER_FOUND;
+   } else if (!cn->start_time || !intra_node_threshold_reached(cn)) {
/*
 * We are at the head of the wait queue, no need to use
 * the fake NUMA node ID.
-- 
2.24.3 (Apple Git-128)

[PATCH v13 4/6] locking/qspinlock: Introduce starvation avoidance into CNA

2020-12-22 Thread Alex Kogan

Keep track of the time the thread at the head of the secondary queue
has been waiting, and force inter-node handoff once this time passes
a preset threshold. The default value for the threshold (10ms) can be
overridden with the new kernel boot command-line option
"numa_spinlock_threshold". The ms value is translated internally to the
smallest number of jiffies that covers it.
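
The two pieces of arithmetic involved, the rounded-up conversion from milliseconds to jiffies and the wrap-safe threshold comparison, can be sketched as follows. Passing the tick rate as a parameter (hz) and the function names are choices made for this illustration; the patch uses the compile-time HZ constant and signed 32-bit fields.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MSEC_PER_SEC 1000u

/* Convert milliseconds to ticks at rate hz, rounding up. */
static uint32_t msecs_to_jiffies_roundup(uint32_t ms, uint32_t hz)
{
	uint32_t ms_per_jiffy;

	assert(hz > 0 && hz <= MSEC_PER_SEC);
	ms_per_jiffy = MSEC_PER_SEC / hz;
	return (ms + ms_per_jiffy - 1) / ms_per_jiffy;
}

/*
 * True once "now" is strictly past start + threshold. The subtraction
 * is done in unsigned arithmetic and then viewed as signed, so the
 * comparison stays correct when the tick counter wraps around
 * (assumes the usual two's-complement conversion).
 */
static bool threshold_reached(uint32_t start, uint32_t threshold, uint32_t now)
{
	return (int32_t)(now - (start + threshold)) > 0;
}
```

For the default of 10 ms this gives 3 ticks at 250 ticks per second and 10 ticks at 1000 ticks per second.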

Signed-off-by: Alex Kogan 
Reviewed-by: Steve Sistare 
Reviewed-by: Waiman Long 
---
 .../admin-guide/kernel-parameters.txt |  9 ++
 kernel/locking/qspinlock_cna.h| 95 ---
 2 files changed, 92 insertions(+), 12 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt 
b/Documentation/admin-guide/kernel-parameters.txt
index a6ae826c6076..fffd31089db0 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -3440,6 +3440,15 @@
Not specifying this option is equivalent to
numa_spinlock=auto.
 
+   numa_spinlock_threshold=[NUMA, PV_OPS]
+   Set the time threshold in milliseconds for the
+   number of intra-node lock hand-offs before the
+   NUMA-aware spinlock is forced to be passed to
+   a thread on another NUMA node.  Valid values
+   are in the [1..100] range. Smaller values result
+   in a more fair, but less performant spinlock,
+   and vice versa. The default value is 10.
+
numa_zonelist_order= [KNL, BOOT] Select zonelist order for NUMA.
'node', 'default' can be specified
This can be set from sysctl after boot.
diff --git a/kernel/locking/qspinlock_cna.h b/kernel/locking/qspinlock_cna.h
index 590402ad69ef..d3e27549c769 100644
--- a/kernel/locking/qspinlock_cna.h
+++ b/kernel/locking/qspinlock_cna.h
@@ -37,6 +37,12 @@
  * gradually filter the primary queue, leaving only waiters running on the same
  * preferred NUMA node.
  *
+ * We change the NUMA node preference after a waiter at the head of the
+ * secondary queue spins for a certain amount of time (10ms, by default).
+ * We do that by flushing the secondary queue into the head of the primary queue,
+ * effectively changing the preference to the NUMA node of the waiter at the head
+ * of the secondary queue at the time of the flush.
+ *
  * For more details, see https://arxiv.org/abs/1810.05600.
  *
  * Authors: Alex Kogan 
@@ -49,13 +55,33 @@ struct cna_node {
u16 real_numa_node;
u32 encoded_tail;   /* self */
u32 partial_order;  /* enum val */
+   s32 start_time;
 };
 
 enum {
LOCAL_WAITER_FOUND,
LOCAL_WAITER_NOT_FOUND,
+   FLUSH_SECONDARY_QUEUE
 };
 
+/*
+ * Controls the threshold time in ms (default = 10) for intra-node lock
+ * hand-offs before the NUMA-aware variant of spinlock is forced to be
+ * passed to a thread on another NUMA node. The default setting can be
+ * changed with the "numa_spinlock_threshold" boot option.
+ */
+#define MSECS_TO_JIFFIES(m)\
+   (((m) + (MSEC_PER_SEC / HZ) - 1) / (MSEC_PER_SEC / HZ))
+static int intra_node_handoff_threshold __ro_after_init = MSECS_TO_JIFFIES(10);
+
+static inline bool intra_node_threshold_reached(struct cna_node *cn)
+{
+   s32 current_time = (s32)jiffies;
+   s32 threshold = cn->start_time + intra_node_handoff_threshold;
+
+   return current_time - threshold > 0;
+}
+
 static void __init cna_init_nodes_per_cpu(unsigned int cpu)
 {
struct mcs_spinlock *base = per_cpu_ptr(&qnodes[0].mcs, cpu);
@@ -98,6 +124,7 @@ static __always_inline void cna_init_node(struct mcs_spinlock *node)
struct cna_node *cn = (struct cna_node *)node;
 
cn->numa_node = cn->real_numa_node;
+   cn->start_time = 0;
 }
 
 /*
@@ -197,8 +224,15 @@ static void cna_splice_next(struct mcs_spinlock *node,
 
/* stick `next` on the secondary queue tail */
if (node->locked <= 1) { /* if secondary queue is empty */
+   struct cna_node *cn = (struct cna_node *)node;
+
/* create secondary queue */
next->next = next;
+
+   cn->start_time = (s32)jiffies;
+   /* make sure start_time != 0 iff secondary queue is not empty */
+   if (!cn->start_time)
+   cn->start_time = 1;
} else {
/* add to the tail of the secondary queue */
struct mcs_spinlock *tail_2nd = decode_tail(node->locked);
@@ -249,11 +283,15 @@ static __always_inline u32 cna_wait_head_or_lock(struct qspinlock *lock,
 {
struct cna_node *cn = (struct cna_node *)node;
 
-   /*
-* Try and put the time otherwise spent spin waiting on
-* _Q_LOCKED_PENDING_MASK to use by sorting our lists.
-*/
-

[PATCH v13 0/6] Add NUMA-awareness to qspinlock

2020-12-22 Thread Alex Kogan

Change from v12:


Added a shuffle reduction optimization (SRO, last patch in the series)
in order to address the regression in unixbench.
Reported-by: kernel test robot 

I note that, despite my initial experiments, more thorough testing
on our system did not reproduce the regression.

The rest of the series remains unchanged.

Summary
---

Lock throughput can be increased by handing a lock to a waiter on the
same NUMA node as the lock holder, provided care is taken to avoid
starvation of waiters on other NUMA nodes. This patch introduces CNA
(compact NUMA-aware lock) as the slow path for qspinlock. It is
enabled through a configuration option (NUMA_AWARE_SPINLOCKS).

CNA is a NUMA-aware version of the MCS lock. Spinning threads are
organized in two queues, a primary queue for threads running on the same
node as the current lock holder, and a secondary queue for threads
running on other nodes. Threads store the ID of the node on which
they are running in their queue nodes. After acquiring the MCS lock and
before acquiring the spinlock, the MCS lock holder checks whether the next
waiter in the primary queue (if one exists) is running on the same NUMA node.
If it is not, that waiter is detached from the main queue and moved into
the tail of the secondary queue. This way, we gradually filter the primary
queue, leaving only waiters running on the same preferred NUMA node. Note
that certain prioritized waiters (e.g., in irq and nmi contexts) are
excluded from being moved to the secondary queue. We change the NUMA node
preference after a waiter at the head of the secondary queue spins for a
certain amount of time. We do that by flushing the secondary queue into
the head of the primary queue, effectively changing the preference to the
NUMA node of the waiter at the head of the secondary queue at the time of
the flush.

More details are available at https://arxiv.org/abs/1810.05600.
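
The filtering and flushing steps described above can be modelled with a toy user-space sketch. Everything below is hypothetical and only meant to illustrate the idea: the `toy_*` names and the fixed-size array queues are assumptions, and the real code operates on linked MCS queue nodes while waiters are spinning.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_WAITERS 16

/* A toy queue of waiters, each represented only by its NUMA node ID. */
struct toy_queue {
	int node[MAX_WAITERS];
	size_t len;
};

void toy_push_tail(struct toy_queue *q, int numa_node)
{
	assert(q->len < MAX_WAITERS);
	q->node[q->len++] = numa_node;
}

/*
 * One filtering step: if the next waiter in the primary queue runs on a
 * different node than the lock holder, move it to the tail of the
 * secondary queue. Returns true if a waiter was moved.
 */
bool toy_filter_next(struct toy_queue *primary, struct toy_queue *secondary,
		     int holder_node)
{
	if (primary->len == 0 || primary->node[0] == holder_node)
		return false;

	toy_push_tail(secondary, primary->node[0]);
	memmove(&primary->node[0], &primary->node[1],
		(primary->len - 1) * sizeof(primary->node[0]));
	primary->len--;
	return true;
}

/*
 * Flush: splice the whole secondary queue in front of the primary queue,
 * so the waiter at the head of the secondary queue is served next and
 * its node becomes the preferred one.
 */
void toy_flush_secondary(struct toy_queue *primary, struct toy_queue *secondary)
{
	assert(primary->len + secondary->len <= MAX_WAITERS);

	memmove(&primary->node[secondary->len], &primary->node[0],
		primary->len * sizeof(primary->node[0]));
	memcpy(&primary->node[0], &secondary->node[0],
	       secondary->len * sizeof(secondary->node[0]));
	primary->len += secondary->len;
	secondary->len = 0;
}
```

In this model, starvation avoidance amounts to calling `toy_flush_secondary()` once the waiter at the head of the secondary queue has waited longer than the threshold.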

We have done some performance evaluation with the locktorture module
as well as with several benchmarks from the will-it-scale repo.
The following locktorture results are from an Oracle X5-4 server
(four Intel Xeon E7-8895 v3 @ 2.60GHz sockets with 18 hyperthreaded
cores each). Each number represents an average (over 25 runs) of the
total number of ops (x10^7) reported at the end of each run. The 
standard deviation is also reported in (), and in general is about 3%
from the average. The 'stock' kernel is v5.10.0-rc7,
commit ca4bbdaf1716, compiled in the default configuration. 
'CNA' is the modified kernel with NUMA_AWARE_SPINLOCKS set;
'CNA-wo-SRO' is the modified kernel with NUMA_AWARE_SPINLOCKS set 
and without the last patch in the series (the SRO optimization).
The speedup is calculated by dividing the result of the corresponding 
variant by the result achieved with 'stock'.
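
As a worked instance of that calculation, the small sketch below recomputes a speedup from two throughput figures. The helper name `speedup()` is an assumption made for illustration; the inputs in the usage note are taken from the 142-thread locktorture row.

```c
#include <assert.h>

/* Speedup of a variant relative to the stock kernel: variant / stock. */
double speedup(double variant_ops, double stock_ops)
{
	assert(stock_ops > 0.0);
	return variant_ops / stock_ops;
}
```

For example, `speedup(10.480, 5.895)` gives roughly 1.778, matching the CNA-wo-SRO entry for 142 threads.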

#thr stock          CNA-wo-SRO / speedup    CNA / speedup
  1  2.707 (0.127)  2.693 (0.100) / 0.995   2.718 (0.101) / 1.004
  2  3.262 (0.075)  3.250 (0.132) / 0.996   3.246 (0.098) / 0.995
  4  4.331 (0.125)  4.804 (0.184) / 1.109   4.733 (0.143) / 1.093
  8  5.092 (0.148)  6.996 (0.206) / 1.374   7.000 (0.194) / 1.375
 16  5.865 (0.119)  8.763 (0.161) / 1.494   8.778 (0.217) / 1.497
 32  6.314 (0.098)  9.837 (0.256) / 1.558   9.720 (0.167) / 1.539
 36  6.434 (0.101)  9.929 (0.259) / 1.543   9.988 (0.208) / 1.552
 72  6.342 (0.080) 10.416 (0.244) / 1.642  10.224 (0.203) / 1.612
108  6.168 (0.080) 10.490 (0.199) / 1.701  10.334 (0.173) / 1.675
142  5.895 (0.119) 10.480 (0.171) / 1.778  10.424 (0.222) / 1.768

The following tables contain throughput results (ops/us) from the same
setup for will-it-scale/open1_threads: 

#thr stock         CNA-wo-SRO / speedup   CNA / speedup
  1  0.508 (0.001) 0.507 (0.001) / 0.997  0.508 (0.001) / 0.999
  2  0.755 (0.021) 0.764 (0.018) / 1.012  0.757 (0.017) / 1.002
  4  1.409 (0.027) 1.417 (0.024) / 1.006  1.387 (0.027) / 0.984
  8  1.726 (0.092) 1.657 (0.129) / 0.960  1.654 (0.135) / 0.959
 16  1.878 (0.099) 1.811 (0.100) / 0.964  1.761 (0.087) / 0.938
 32  1.012 (0.040) 1.705 (0.086) / 1.685  1.685 (0.081) / 1.666
 36  0.930 (0.088) 1.726 (0.090) / 1.855  1.727 (0.086) / 1.856
 72  0.826 (0.037) 1.645 (0.079) / 1.991  1.621 (0.076) / 1.962
108  0.845 (0.028) 1.685 (0.072) / 1.993  1.688 (0.073) / 1.997
142  0.827 (0.035) 1.712 (0.069) / 2.070  1.696 (0.064) / 2.052

and will-it-scale/lock2_threads:

#thr stock         CNA-wo-SRO / speedup   CNA / speedup
  1  1.587 (0.004) 1.564 (0.003) / 0.985  1.577 (0.002) / 0.994
  2  2.802 (0.057) 2.752 (0.049) / 0.982  2.776 (0.065) / 0.991
  4  5.365 (0.352) 5.368 (0.196) / 1.001  5.348 (0.297) / 0.997
  8  4.161 (0.270) 4.001 (0.402) / 0.962  4.032 (0.389) / 0.969
 16  4.144 (0.130) 3.940 (0.159) / 0.951  3.917 (0.133) / 0.945
 32  2.444 (0.097) 3.996 (0.102) / 1.635  3.969 (0.130) / 1.624
 36  2.429 (0.070) 3.891 (0.087) / 1.602  3.894 (0.096) / 1.603
 72  1.847 (0.095) 3.929 (0.108) / 2.128  3.942 (0.094) / 2.135
108  1.903 (0.117) 3.898 (0.108) / 2.048  3.901 (0.105) /

Re: [External] Re: [PATCH] net/ncsi: Use real net-device for response handler

2020-12-22 Thread John Wang

On Wed, Dec 23, 2020 at 10:25 AM Jakub Kicinski  wrote:
>
> On Tue, 22 Dec 2020 10:38:21 -0800 Samuel Mendoza-Jonas wrote:
> > On Tue, 2020-12-22 at 06:13 +, Joel Stanley wrote:
> > > On Sun, 20 Dec 2020 at 12:40, John Wang wrote:
> > > > When aggregating ncsi interfaces and dedicated interfaces to bond
> > > > interfaces, the ncsi response handler will use the wrong net device
> > > > to find ncsi_dev, so that the ncsi interface will not work properly.
> > > > Here, we use the net device registered to packet_type to fix it.
> > > >
> > > > Fixes: 138635cc27c9 ("net/ncsi: NCSI response packet handler")
> > > > Signed-off-by: John Wang 
>
> This sounds like exactly the case for which orig_dev was introduced.
> I think you should use the orig_dev argument, rather than pt->dev.

will send a v2

>
> Can you test if that works?

Yes,  it works.

>
> > > Can you show me how to reproduce this?

On g220a, eth1 is the dedicated interface, eth0 is the ncsi interface

kernel cfg:
CONFIG_BONDING=y

cat /etc/systemd/network/00-bmc-bond1.netdev
[NetDev]
Name=bond1
Description=Bond eth0 and eth1
Kind=bond

[Bond]
Mode=active-backup

cat /etc/systemd/network/00-bmc-eth0.network
[Match]
Name=eth0
[Network]
Bond=bond1

cat /etc/systemd/network/00-bmc-eth1.network
[Match]
Name=eth1
[Network]
Bond=bond1
PrimarySlave=true

ip addr

6: bond1:  mtu 1500 qdisc noqueue qlen 1000
link/ether b4:05:5d:8f:6a:ad brd ff:ff:ff:ff:ff:ff
inet 169.254.11.178/16 brd 169.254.255.255 scope link bond1
   valid_lft forever preferred_lft forever
inet 192.168.1.108/24 brd 192.168.1.255 scope global bond1
   valid_lft forever preferred_lft forever
inet 10.2.16.118/24 brd 10.2.16.255 scope global bond1
   valid_lft forever preferred_lft forever
inet6 fe80::b605:5dff:fe8f:6aad/64 scope link
...


Without this patch:
After bmc boots:
echo eth0 > /sys/class/net/bond1/bonding/active_slave
admin@g220a:~#
admin@g220a:~# echo eth0 > /sys/class/net/bond1/bonding/active_slave
[  105.964357] bond1: (slave eth0): making interface the new active one
admin@g220a:~# ping 10.2.16.1
PING 10.2.16.1 (10.2.16.1): 56 data bytes
64 bytes from 10.2.16.1: seq=0 ttl=255 time=7.096 ms
64 bytes from 10.2.16.1: seq=1 ttl=255 time=2.143 ms
64 bytes from 10.2.16.1: seq=2 ttl=255 time=2.111 ms
[  112.642734] ftgmac100 1e66.ethernet eth0: NCSI Channel 0 timed out!
64 bytes from 10.2.16.1: seq=3 ttl=255 time=2.039 ms
64 bytes from 10.2.16.1: seq=4 ttl=255 time=2.037 ms
[  117.842814] ftgmac100 1e66.ethernet eth0: NCSI: No channel with link found, configuring channel 0
[  134.482746] ftgmac100 1e66.ethernet eth0: NCSI Channel 0 timed out!
[  139.682820] ftgmac100 1e66.ethernet eth0: NCSI: No channel with link found, configuring channel 0

with this patch:
After bmc boots:

admin@g220a:~# echo eth0 > /sys/class/net/bond1/bonding/active_slave
[58332.123754] bond1: (slave eth0): making interface the new active one
admin@g220a:~# ping 10.2.16.1
PING 10.2.16.1 (10.2.16.1): 56 data bytes
64 bytes from 10.2.16.1: seq=0 ttl=255 time=7.279 ms
...
...
64 bytes from 10.2.16.1: seq=N ttl=255 time=2.037 ms



> > >
> > > I don't know the ncsi or net code well enough to know if this is the
> > > correct fix. If you are confident it is correct then I have no
> > > objections.
> >
> > This looks like it is probably right; pt->dev will be the original
> > device from ncsi_register_dev(), if a response comes in to
> > ncsi_rcv_rsp() associated with a different device then the driver will
> > fail to find the correct ncsi_dev_priv. An example of the broken case
> > would be good to see though.
>
> From the description sounds like the case is whenever the ncsi
> interface is in a bond, the netdev from the second argument is
> the bond not the interface from which the frame came. It should
> be possible to repro even with only one interface on the system,
> create a bond or a team and add the ncsi interface to it.
>
> Does that make sense? I'm likely missing the subtleties here.

:)  I guess so.

Re: [PATCH v3 09/24] wfx: add hwio.c/hwio.h

2020-12-22 Thread Kalle Valo

Jérôme Pouiller  writes:

> On Tuesday 22 December 2020 16:27:01 CET Greg Kroah-Hartman wrote:
>> 
>> On Tue, Dec 22, 2020 at 05:10:11PM +0200, Kalle Valo wrote:
>> > Jerome Pouiller  writes:
>> >
>> > > +/*
>> > > + * Internal helpers.
>> > > + *
>> > > + * About CONFIG_VMAP_STACK:
>> > > + * When CONFIG_VMAP_STACK is enabled, it is not possible to run DMA on stack
>> > > + * allocated data. Functions below that work with registers (aka functions
>> > > + * ending with "32") automatically reallocate buffers with kmalloc. However,
>> > > + * functions that work with arbitrary length buffers let's caller to handle
>> > > + * memory location. In doubt, enable CONFIG_DEBUG_SG to detect badly located
>> > > + * buffer.
>> > > + */
>> >
>> > This sounds very hacky to me, I have understood that you should never
>> > use stack with DMA.
>> 
>> You should never do that because some platforms do not support it, so no
>> driver should ever try to do that as they do not know what platform they
>> are running on.
>
> Yes, I have learned this rule the hard way.
>
> There is no better way than a comment to warn the user that the argument
> will be used with a DMA? A Sparse annotation, for example?

I have not seen anything, but something like sparse annotation would be
useful. Please let me know if you find anything like that.

But I think that CONFIG_VMAP_STACK is irrelevant and the comment should
be clarified to say that stack memory must NOT be used for DMA operations
under any circumstances.

-- 
https://patchwork.kernel.org/project/linux-wireless/list/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches

[PATCH 1/3] objtool: Refactor ORC section generation

2020-12-22 Thread Josh Poimboeuf

Decouple ORC entries from instructions.  This simplifies the
control/data flow, and is going to make it easier to support alternative
instructions which change the stack layout.

Signed-off-by: Josh Poimboeuf 
---
 tools/objtool/Makefile  |   4 -
 tools/objtool/arch.h|   4 -
 tools/objtool/builtin-orc.c |   6 +-
 tools/objtool/check.h   |   3 -
 tools/objtool/objtool.h |   3 +-
 tools/objtool/orc_gen.c | 272 ++--
 tools/objtool/weak.c|   7 +-
 7 files changed, 140 insertions(+), 159 deletions(-)

diff --git a/tools/objtool/Makefile b/tools/objtool/Makefile
index 5cdb19036d7f..a43096f713c7 100644
--- a/tools/objtool/Makefile
+++ b/tools/objtool/Makefile
@@ -46,10 +46,6 @@ ifeq ($(SRCARCH),x86)
SUBCMD_ORC := y
 endif
 
-ifeq ($(SUBCMD_ORC),y)
-   CFLAGS += -DINSN_USE_ORC
-endif
-
 export SUBCMD_CHECK SUBCMD_ORC
 export srctree OUTPUT CFLAGS SRCARCH AWK
 include $(srctree)/tools/build/Makefile.include
diff --git a/tools/objtool/arch.h b/tools/objtool/arch.h
index 4a84c3081b8e..5e3f3ea8bb89 100644
--- a/tools/objtool/arch.h
+++ b/tools/objtool/arch.h
@@ -11,10 +11,6 @@
 #include "objtool.h"
 #include "cfi.h"
 
-#ifdef INSN_USE_ORC
-#include 
-#endif
-
 enum insn_type {
INSN_JUMP_CONDITIONAL,
INSN_JUMP_UNCONDITIONAL,
diff --git a/tools/objtool/builtin-orc.c b/tools/objtool/builtin-orc.c
index 7b31121fa60b..508bdf6ae8dc 100644
--- a/tools/objtool/builtin-orc.c
+++ b/tools/objtool/builtin-orc.c
@@ -51,11 +51,7 @@ int cmd_orc(int argc, const char **argv)
if (list_empty(&file->insn_list))
return 0;
 
-   ret = create_orc(file);
-   if (ret)
-   return ret;
-
-   ret = create_orc_sections(file);
+   ret = orc_create(file);
if (ret)
return ret;
 
diff --git a/tools/objtool/check.h b/tools/objtool/check.h
index 5ec00a4b891b..4c10916ff1cf 100644
--- a/tools/objtool/check.h
+++ b/tools/objtool/check.h
@@ -43,9 +43,6 @@ struct instruction {
struct symbol *func;
struct list_head stack_ops;
struct cfi_state cfi;
-#ifdef INSN_USE_ORC
-   struct orc_entry orc;
-#endif
 };
 
 static inline bool is_static_jump(struct instruction *insn)
diff --git a/tools/objtool/objtool.h b/tools/objtool/objtool.h
index 4125d4578b23..5e58d3537e2f 100644
--- a/tools/objtool/objtool.h
+++ b/tools/objtool/objtool.h
@@ -26,7 +26,6 @@ struct objtool_file *objtool_open_read(const char *_objname);
 
 int check(struct objtool_file *file);
 int orc_dump(const char *objname);
-int create_orc(struct objtool_file *file);
-int create_orc_sections(struct objtool_file *file);
+int orc_create(struct objtool_file *file);
 
 #endif /* _OBJTOOL_H */
diff --git a/tools/objtool/orc_gen.c b/tools/objtool/orc_gen.c
index 235663b96adc..73efba2bfa72 100644
--- a/tools/objtool/orc_gen.c
+++ b/tools/objtool/orc_gen.c
@@ -12,89 +12,84 @@
 #include "check.h"
 #include "warn.h"
 
-int create_orc(struct objtool_file *file)
+static int init_orc_entry(struct orc_entry *orc, struct cfi_state *cfi)
 {
-   struct instruction *insn;
+   struct instruction *insn = container_of(cfi, struct instruction, cfi);
+   struct cfi_reg *bp = &cfi->regs[CFI_BP];
 
-   for_each_insn(file, insn) {
-   struct orc_entry *orc = &insn->orc;
-   struct cfi_reg *cfa = &insn->cfi.cfa;
-   struct cfi_reg *bp = &insn->cfi.regs[CFI_BP];
+   memset(orc, 0, sizeof(*orc));
 
-   if (!insn->sec->text)
-   continue;
-
-   orc->end = insn->cfi.end;
+   orc->end = cfi->end;
 
-   if (cfa->base == CFI_UNDEFINED) {
-   orc->sp_reg = ORC_REG_UNDEFINED;
-   continue;
-   }
-
-   switch (cfa->base) {
-   case CFI_SP:
-   orc->sp_reg = ORC_REG_SP;
-   break;
-   case CFI_SP_INDIRECT:
-   orc->sp_reg = ORC_REG_SP_INDIRECT;
-   break;
-   case CFI_BP:
-   orc->sp_reg = ORC_REG_BP;
-   break;
-   case CFI_BP_INDIRECT:
-   orc->sp_reg = ORC_REG_BP_INDIRECT;
-   break;
-   case CFI_R10:
-   orc->sp_reg = ORC_REG_R10;
-   break;
-   case CFI_R13:
-   orc->sp_reg = ORC_REG_R13;
-   break;
-   case CFI_DI:
-   orc->sp_reg = ORC_REG_DI;
-   break;
-   case CFI_DX:
-   orc->sp_reg = ORC_REG_DX;
-   break;
-   default:
-   WARN_FUNC("unknown CFA base reg %d",
- insn->sec, insn->offset, cfa->base);
-   return -1;
-

[PATCH 2/3] objtool: Add 'alt_group' struct

2020-12-22 Thread Josh Poimboeuf

Create a new struct associated with each group of alternatives
instructions.  This will help with the removal of fake jumps, and more
importantly with adding support for stack layout changes in
alternatives.

Signed-off-by: Josh Poimboeuf 
---
 tools/objtool/check.c | 29 +++--
 tools/objtool/check.h | 13 -
 2 files changed, 35 insertions(+), 7 deletions(-)

diff --git a/tools/objtool/check.c b/tools/objtool/check.c
index c6ab44543c92..67f39b57c6f7 100644
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -984,20 +984,28 @@ static int handle_group_alt(struct objtool_file *file,
struct instruction *orig_insn,
struct instruction **new_insn)
 {
-   static unsigned int alt_group_next_index = 1;
struct instruction *last_orig_insn, *last_new_insn, *insn, *fake_jump = NULL;
-   unsigned int alt_group = alt_group_next_index++;
+   struct alt_group *orig_alt_group, *new_alt_group;
unsigned long dest_off;
 
+
+   orig_alt_group = malloc(sizeof(*orig_alt_group));
+   if (!orig_alt_group) {
+   WARN("malloc failed");
+   return -1;
+   }
last_orig_insn = NULL;
insn = orig_insn;
sec_for_each_insn_from(file, insn) {
if (insn->offset >= special_alt->orig_off + special_alt->orig_len)
break;
 
-   insn->alt_group = alt_group;
+   insn->alt_group = orig_alt_group;
last_orig_insn = insn;
}
+   orig_alt_group->orig_group = NULL;
+   orig_alt_group->first_insn = orig_insn;
+   orig_alt_group->last_insn = last_orig_insn;
 
if (next_insn_same_sec(file, last_orig_insn)) {
fake_jump = malloc(sizeof(*fake_jump));
@@ -1028,8 +1036,13 @@ static int handle_group_alt(struct objtool_file *file,
return 0;
}
 
+   new_alt_group = malloc(sizeof(*new_alt_group));
+   if (!new_alt_group) {
+   WARN("malloc failed");
+   return -1;
+   }
+
last_new_insn = NULL;
-   alt_group = alt_group_next_index++;
insn = *new_insn;
sec_for_each_insn_from(file, insn) {
struct reloc *alt_reloc;
@@ -1041,7 +1054,7 @@ static int handle_group_alt(struct objtool_file *file,
 
insn->ignore = orig_insn->ignore_alts;
insn->func = orig_insn->func;
-   insn->alt_group = alt_group;
+   insn->alt_group = new_alt_group;
 
/*
 * Since alternative replacement code is copy/pasted by the
@@ -1090,6 +1103,10 @@ static int handle_group_alt(struct objtool_file *file,
return -1;
}
 
+   new_alt_group->orig_group = orig_alt_group;
+   new_alt_group->first_insn = *new_insn;
+   new_alt_group->last_insn = last_new_insn;
+
if (fake_jump)
list_add(&fake_jump->list, &last_new_insn->list);
 
@@ -2405,7 +2422,7 @@ static int validate_return(struct symbol *func, struct instruction *insn, struct
 static void fill_alternative_cfi(struct objtool_file *file, struct instruction *insn)
 {
struct instruction *first_insn = insn;
-   int alt_group = insn->alt_group;
+   struct alt_group *alt_group = insn->alt_group;
 
sec_for_each_insn_continue(file, insn) {
if (insn->alt_group != alt_group)
diff --git a/tools/objtool/check.h b/tools/objtool/check.h
index 4c10916ff1cf..b74c383c2d83 100644
--- a/tools/objtool/check.h
+++ b/tools/objtool/check.h
@@ -19,6 +19,17 @@ struct insn_state {
s8 instr;
 };
 
+struct alt_group {
+   /*
+* Pointer from a replacement group to the original group.  NULL if it
+* *is* the original group.
+*/
+   struct alt_group *orig_group;
+
+   /* First and last instructions in the group */
+   struct instruction *first_insn, *last_insn;
+};
+
 struct instruction {
struct list_head list;
struct hlist_node hash;
@@ -34,7 +45,7 @@ struct instruction {
s8 instr;
u8 visited;
u8 ret_offset;
-   int alt_group;
+   struct alt_group *alt_group;
struct symbol *call_dest;
struct instruction *jump_dest;
struct instruction *first_jump_src;
-- 
2.29.2

[PATCH 3/3] objtool: Support stack layout changes in alternatives

2020-12-22 Thread Josh Poimboeuf

The ORC unwinder showed a warning [1] which revealed the stack layout
didn't match what was expected.  The problem was that paravirt patching
had replaced "CALL *pv_ops.irq.save_fl" with "PUSHF;POP".  That changed
the stack layout between the PUSHF and the POP, so unwinding from an
interrupt which occurred between those two instructions would fail.

Part of the agreed upon solution was to rework the custom paravirt
patching code to use alternatives instead, since objtool already knows
how to read alternatives (and converging runtime patching infrastructure
is always a good thing anyway).  But the main problem still remains,
which is that runtime patching can change the stack layout.

Making stack layout changes in alternatives was disallowed with commit
7117f16bf460 ("objtool: Fix ORC vs alternatives"), but now that paravirt
is going to be doing it, it needs to be supported.

One way to do so would be to modify the ORC table when the code gets
patched.  But ORC is simple -- a good thing! -- and it's best to leave
it alone.

Instead, support stack layout changes by "flattening" all possible stack
states (CFI) from parallel alternative code streams into a single set of
linear states.  The only necessary limitation is that CFI conflicts are
disallowed at all possible instruction boundaries.

For example, this scenario is allowed:

          Alt1                    Alt2                    Alt3

   0x00   CALL *pv_ops.save_fl    CALL xen_save_fl        PUSHF
   0x01                                                   POP %RAX
   0x02                                                   NOP
   ...
   0x05                           NOP
   ...
   0x07   <insn>

The unwind information for offset-0x00 is identical for all 3
alternatives.  Similarly offset-0x05 and higher also are identical (and
the same as 0x00).  However offset-0x01 has deviating CFI, but that is
only relevant for Alt3, neither of the other alternative instruction
streams will ever hit that offset.

This scenario is NOT allowed:

          Alt1                    Alt2

   0x00   CALL *pv_ops.save_fl    PUSHF
   0x01                           NOP6
   ...
   0x07   NOP                     POP %RAX

The problem here is that offset-0x7, which is an instruction boundary in
both possible instruction patch streams, has two conflicting stack
layouts.

[ The above examples were stolen from Peter Zijlstra. ]

The new flattened CFI array is used both for the detection of conflicts
(like the second example above) and the generation of linear ORC
entries.
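
The conflict check on a flattened array can be sketched roughly as below. This is a simplified user-space model, not objtool's implementation: the names, the fixed `GROUP_LEN`, and the reduction of a full CFI state to a single CFA offset are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define GROUP_LEN 8	/* hypothetical number of byte offsets tracked */
#define CFI_UNSET (-1)	/* no instruction boundary recorded here yet */

/* Toy stand-in for a CFI state: only the CFA offset from the stack pointer. */
struct flat_cfi {
	int cfa_offset[GROUP_LEN];
};

void flat_cfi_init(struct flat_cfi *flat)
{
	for (size_t i = 0; i < GROUP_LEN; i++)
		flat->cfa_offset[i] = CFI_UNSET;
}

/*
 * Record the stack state at one instruction boundary of one alternative.
 * Returns false if another alternative already recorded a different
 * state at the same offset, i.e. the layouts conflict.
 */
bool flat_cfi_record(struct flat_cfi *flat, size_t offset, int cfa_offset)
{
	assert(offset < GROUP_LEN);
	assert(cfa_offset >= 0);

	if (flat->cfa_offset[offset] == CFI_UNSET) {
		flat->cfa_offset[offset] = cfa_offset;
		return true;
	}

	return flat->cfa_offset[offset] == cfa_offset;
}
```

Feeding the first example into this model records every boundary without conflict, because the deviating state at offset 0x01 is only ever recorded by one alternative. In the second example, the two streams record different states at offset 0x07, so the second `flat_cfi_record()` call for that offset returns false.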

BTW, another benefit of these changes is that, thanks to some related
cleanups (new fake nops and alt_group struct) objtool can finally be rid
of fake jumps, which were a constant source of headaches.

[1] https://lkml.kernel.org/r/2020170536.arx2zbn4ngvjoov7@treble

Cc: Shinichiro Kawasaki 
Signed-off-by: Josh Poimboeuf 
---
 .../Documentation/stack-validation.txt|  16 +-
 tools/objtool/check.c | 175 ++
 tools/objtool/check.h |   6 +
 tools/objtool/orc_gen.c   |  56 +-
 4 files changed, 157 insertions(+), 96 deletions(-)

diff --git a/tools/objtool/Documentation/stack-validation.txt b/tools/objtool/Documentation/stack-validation.txt
index 0542e46c7552..30f38fdc0d56 100644
--- a/tools/objtool/Documentation/stack-validation.txt
+++ b/tools/objtool/Documentation/stack-validation.txt
@@ -315,13 +315,15 @@ they mean, and suggestions for how to fix them.
   function tracing inserts additional calls, which is not obvious from the
   sources).
 
-10. file.o: warning: func()+0x5c: alternative modifies stack
-
-This means that an alternative includes instructions that modify the
-stack. The problem is that there is only one ORC unwind table, this means
-that the ORC unwind entries must be valid for each of the alternatives.
-The easiest way to enforce this is to ensure alternatives do not contain
-any ORC entries, which in turn implies the above constraint.
+10. file.o: warning: func()+0x5c: stack layout conflict in alternatives
+
+This means that in the use of the alternative() or ALTERNATIVE()
+macro, the code paths have conflicting modifications to the stack.
+The problem is that there is only one ORC unwind table, which means
+that the ORC unwind entries must be consistent for all possible
+instruction boundaries regardless of which code has been patched.
+This limitation can be overcome by massaging the alternatives with
+NOPs to shift the stack changes around so they no longer conflict.
 
 11. file.o: warning: unannotated intra-function call
 
diff --git a/tools/objtool/check.c b/tools/objtool/check.c
index 67f39b57c6f7..81d56fdef1c3 100644
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -19,8 +19,6 @@
 #include 
 #include 
 
-#define FAKE_JUMP_OFFSET -1
-
 struct alternative {
struct list_head list;
struct instruction *insn;
@@ -767,9

[PATCH 0/3] Alternatives vs ORC, a slightly easier way

2020-12-22 Thread Josh Poimboeuf

These patches replace Peter's "Alternatives vs ORC, the hard way".  The
end result should be the same (support for paravirt patching's use of
alternatives which modify the stack).

Josh Poimboeuf (3):
  objtool: Refactor ORC section generation
  objtool: Add 'alt_group' struct
  objtool: Support stack layout changes in alternatives

 .../Documentation/stack-validation.txt|  16 +-
 tools/objtool/Makefile|   4 -
 tools/objtool/arch.h  |   4 -
 tools/objtool/builtin-orc.c   |   6 +-
 tools/objtool/check.c | 190 ++-
 tools/objtool/check.h |  22 +-
 tools/objtool/objtool.h   |   3 +-
 tools/objtool/orc_gen.c   | 308 ++
 tools/objtool/weak.c  |   7 +-
 9 files changed, 315 insertions(+), 245 deletions(-)

-- 
2.29.2

Re: [PATCH v3 03/24] wfx: add Makefile/Kconfig

2020-12-22 Thread Kalle Valo

Jérôme Pouiller  writes:

> On Tuesday 22 December 2020 16:02:38 CET Kalle Valo wrote:
>> Jerome Pouiller  writes:
>> 
>> > From: Jérôme Pouiller 
>> >
>> > Signed-off-by: Jérôme Pouiller 
>> 
>> [...]
>> 
>> > +wfx-$(CONFIG_SPI) += bus_spi.o
>> > +wfx-$(subst m,y,$(CONFIG_MMC)) += bus_sdio.o
>> 
>> Why this subst? And why only for MMC?
>
> CONFIG_SPI is a boolean (y or empty). Both values make sense.
>
> CONFIG_MMC is a tristate (y, m or empty). The substitution above
> ensures that bus_sdio.o will be included in wfx.ko if CONFIG_MMC is 'm'
> ("wfx-$(CONFIG_MMC) += bus_sdio.o" wouldn't do the job).
>
> You may want to know what happens if CONFIG_MMC=m while CONFIG_WFX=y.
> This line in Kconfig prevents wfx from being compiled statically if MMC
> is a module:
>depends on MMC || !MMC # do not allow WFX=y if MMC=m

Ok, thanks for explaining this.
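
For readers unfamiliar with the "depends on MMC || !MMC" idiom, here is a small sketch of the tristate arithmetic behind it, written in C purely for illustration. It assumes the usual Kconfig expression rules (n < m < y, "!" maps a value x to y - x, "||" takes the maximum); the enum and function names are made up for this example.

```c
#include <assert.h>

/* Kconfig tristate values, ordered n < m < y. */
enum tristate { TRI_N = 0, TRI_M = 1, TRI_Y = 2 };

enum tristate tri_not(enum tristate a)
{
	return (enum tristate)(TRI_Y - a);
}

enum tristate tri_or(enum tristate a, enum tristate b)
{
	return a > b ? a : b;
}

/*
 * Upper bound that "depends on MMC || !MMC" places on the dependent
 * symbol (WFX in this thread) for a given value of MMC.
 */
enum tristate wfx_max_value(enum tristate mmc)
{
	assert((int)mmc >= TRI_N && (int)mmc <= TRI_Y);
	return tri_or(mmc, tri_not(mmc));
}
```

The expression only evaluates to 'm' when MMC itself is 'm', which is what caps WFX at 'm' in that case and leaves it unrestricted when MMC is 'y' or disabled.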

-- 
https://patchwork.kernel.org/project/linux-wireless/list/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches

Re: linux-next: Tree for Dec 21 (objtool warning)

2020-12-22 Thread Josh Poimboeuf

On Mon, Dec 21, 2020 at 08:03:17AM -0800, Randy Dunlap wrote:
> On 12/20/20 7:18 PM, Stephen Rothwell wrote:
> > Hi all,
> > 
> > News: there will be no linux-next releases between Dec 24 and Jan
> > 3 inclusive.
> > 
> > Please do not add any v5.12 destined code to your linux-next included
> > branches until after v5.11-rc1 has been released.
> > 
> > Changes since 20201218:
> > 
> 
> on x86_64:
> 
> arch/x86/kernel/sys_ia32.o: warning: objtool: cp_stat64()+0xd8: call to new_encode_dev() with UACCESS enabled

Can you send a .o for this one?  Please gzip it because my email has
been rejecting .o files lately :-/

-- 
Josh

Re: [PATCH v13 4/6] powerpc: Delete unused function delete_fdt_mem_rsv()

2020-12-22 Thread Lakshmi Ramasubramanian


On 12/22/20 5:08 PM, Thiago Jung Bauermann wrote:


Lakshmi Ramasubramanian  writes:


delete_fdt_mem_rsv() defined in "arch/powerpc/kexec/file_load.c"
has been renamed to fdt_find_and_del_mem_rsv(), and moved to
"drivers/of/kexec.c".

Remove delete_fdt_mem_rsv() in "arch/powerpc/kexec/file_load.c".

Co-developed-by: Prakhar Srivastava 
Signed-off-by: Prakhar Srivastava 
Signed-off-by: Lakshmi Ramasubramanian 
---
  arch/powerpc/include/asm/kexec.h |  1 -
  arch/powerpc/kexec/file_load.c   | 32 
  2 files changed, 33 deletions(-)


As I mentioned in the other email, this patch could remove
setup_new_fdt() as well.

I'm a bit ambivalent on whether this patch should be squashed with
patch 2 or left on its own, but I tend toward the latter option because
patch 2 is big enough already.



I also think Patch #2 is already big enough - I don't want to make more 
changes in that patch.


I will remove delete_fdt_mem_rsv() and setup_new_fdt() in this patch 
(Patch #4) and call of_kexec_setup_new_fdt() directly (in 
setup_new_fdt_ppc64()).


thanks,
 -lakshmi

[PATCH] mm/uaccess: Use 'unsigned long' to placate UBSAN warnings, again

2020-12-22 Thread Josh Poimboeuf

GCC 7 has a known bug where UBSAN ignores '-fwrapv' and generates false
signed-overflow-UB warnings.  The type mismatch between 'i' and
'nr_segs' in copy_compat_iovec_from_user() is causing such a warning,
which also happens to violate uaccess rules:

  lib/iov_iter.o: warning: objtool: iovec_from_user()+0x22d: call to __ubsan_handle_add_overflow() with UACCESS enabled

Fix it by making the variable types match.

This is similar to a previous commit:

  29da93fea3ea ("mm/uaccess: Use 'unsigned long' to placate UBSAN warnings on older GCC versions")

Reported-by: Randy Dunlap 
Signed-off-by: Josh Poimboeuf 
---
 lib/iov_iter.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index 1635111c5bd2..2e6a42f5d1df 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -1656,7 +1656,8 @@ static int copy_compat_iovec_from_user(struct iovec *iov,
 {
const struct compat_iovec __user *uiov =
(const struct compat_iovec __user *)uvec;
-   int ret = -EFAULT, i;
+   int ret = -EFAULT;
+   unsigned long i;
 
if (!user_access_begin(uvec, nr_segs * sizeof(*uvec)))
return -EFAULT;
-- 
2.29.2

Re: [PATCH v13 2/6] powerpc: Move arch independent ima kexec functions to drivers/of/kexec.c

2020-12-22 Thread Lakshmi Ramasubramanian


On 12/22/20 4:40 PM, Thiago Jung Bauermann wrote:


Lakshmi Ramasubramanian  writes:


On 12/22/20 11:45 AM, Mimi Zohar wrote:

On Tue, 2020-12-22 at 10:53 -0800, Lakshmi Ramasubramanian wrote:

On 12/22/20 6:26 AM, Mimi Zohar wrote:

Hi Mimi,



On Sat, 2020-12-19 at 09:57 -0800, Lakshmi Ramasubramanian wrote:


diff --git a/arch/powerpc/kexec/Makefile b/arch/powerpc/kexec/Makefile
index 4aff6846c772..b6c52608cb49 100644
--- a/arch/powerpc/kexec/Makefile
+++ b/arch/powerpc/kexec/Makefile
@@ -9,13 +9,6 @@ obj-$(CONFIG_PPC32)+= relocate_32.o
   obj-$(CONFIG_KEXEC_FILE) += file_load.o ranges.o
file_load_$(BITS).o elf_$(BITS).o
-ifdef CONFIG_HAVE_IMA_KEXEC
-ifdef CONFIG_IMA
-obj-y  += ima.o
-endif
-endif


Notice how "kexec/ima.o" is only included if the architecture supports
it and IMA is configured.  In addition only if CONFIG_IMA_KEXEC is
configured, is the IMA measurement list carried across kexec.  After
moving the rest of ima.c to drivers/of/kexec.c, this changes.   Notice
how drivers/of/Makefile includes kexec.o:

obj-$(CONFIG_KEXEC_FILE) += kexec.o

It is not dependent on CONFIG_HAVE_IMA_KEXEC.  Shouldn't all of the
functions defined in ima.c being moved to kexec.o be defined within a
CONFIG_HAVE_IMA_KEXEC ifdef?



Thanks for reviewing the changes.

In "drivers/of/kexec.c" the function remove_ima_buffer() is defined
under "#ifdef CONFIG_HAVE_IMA_KEXEC"

setup_ima_buffer() is defined under "#ifdef CONFIG_IMA_KEXEC" - the same
way it was defined in "arch/powerpc/kexec/ima.c".

As you know, CONFIG_IMA_KEXEC depends on CONFIG_HAVE_IMA_KEXEC (as
defined in "security/integrity/ima/Kconfig").

ima_get_kexec_buffer() and ima_free_kexec_buffer() are unconditionally
defined in "drivers/of/kexec.c" even though they are called only when
CONFIG_HAVE_IMA_KEXEC is enabled. I will update these two functions to
be moved under "#ifdef CONFIG_HAVE_IMA_KEXEC"

The issue is the reverse.  CONFIG_HAVE_IMA_KEXEC may be enabled without
CONFIG_IMA_KEXEC being enabled.  This allows the architecture to
support carrying the measurement list across kexec, but requires
enabling it at build time.
Only if CONFIG_HAVE_IMA_KEXEC is enabled should any of these functions
be compiled at build.  This allows restoring the previous IMA
measurement list, even if CONFIG_IMA_KEXEC is not enabled.
Only if CONFIG_IMA_KEXEC is enabled, should carrying the measurement
list across kexec be enabled.  See how arch_ima_add_kexec_buffer,
write_number, setup_ima_buffer are ifdef'ed in
arch/powerpc/kexec/ima.c.



Yes - I agree. I will make the following changes:

=> Enable the functions moved from "arch/powerpc/kexec/ima.c" to
"drivers/of/kexec.c" only when CONFIG_HAVE_IMA_KEXEC is enabled.

=> Also, compile write_number() and setup_ima_buffer() only when
CONFIG_IMA_KEXEC is enabled.


Sounds good, with one additional change:

So far, CONFIG_HAVE_IMA_KEXEC was tested only in files that were built
when CONFIG_IMA was set. With this series this is not the case anymore
(in drivers/of/kexec.c). The simplest way to keep this consistent is to
only enable CONFIG_HAVE_IMA_KEXEC if CONFIG_IMA is also set.

For example, with this:

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index e9f13fe08492..4ddd17215ecf 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -548,7 +548,7 @@ config KEXEC
  config KEXEC_FILE
bool "kexec file based system call"
select KEXEC_CORE
-   select HAVE_IMA_KEXEC
+   select HAVE_IMA_KEXEC if IMA
select BUILD_BIN2C
select KEXEC_ELF
depends on PPC64

And then the same thing on the arm64 patch.


This is a good idea Thiago - I will make this change in the Kconfig for 
both powerpc and arm64.


thanks,
 -lakshmi
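
Editor's sketch of the two-level gating agreed in this thread: the restore-side helpers are built whenever the architecture supports carrying the list, while the setup-side helpers additionally need the feature to be enabled. DEMO_HAVE_IMA_KEXEC and DEMO_IMA_KEXEC are hypothetical macros standing in for the Kconfig symbols, and the function bodies are placeholders.

```c
/* Assumption: both options are switched on for this demo build.  In the
 * kernel the equivalent symbols come from Kconfig, not from #define. */
#define DEMO_HAVE_IMA_KEXEC 1
#define DEMO_IMA_KEXEC 1

/* Restore side: needed to read back a list left by the previous kernel. */
#ifdef DEMO_HAVE_IMA_KEXEC
static int demo_restore_buffer(void) { return 1; }	/* real work here */
#else
static int demo_restore_buffer(void) { return 0; }	/* stub */
#endif

/* Setup side: only built when carrying the list forward is enabled. */
#if defined(DEMO_HAVE_IMA_KEXEC) && defined(DEMO_IMA_KEXEC)
static int demo_setup_buffer(void) { return 1; }	/* real work here */
#else
static int demo_setup_buffer(void) { return 0; }	/* stub */
#endif
```

Removing the DEMO_IMA_KEXEC define leaves the restore side built and turns only the setup side into a stub, which is the asymmetry Mimi describes.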

Re: [PATCH v13 2/6] powerpc: Move arch independent ima kexec functions to drivers/of/kexec.c

2020-12-22 Thread Lakshmi Ramasubramanian


On 12/22/20 4:19 PM, Thiago Jung Bauermann wrote:


Lakshmi Ramasubramanian  writes:


The functions defined in "arch/powerpc/kexec/ima.c" handle setting up
and freeing the resources required to carry over the IMA measurement
list from the current kernel to the next kernel across the kexec system call.
These functions do not have architecture specific code, but are
currently limited to powerpc.

Move setup_ima_buffer() call into of_kexec_setup_new_fdt() defined in
"drivers/of/kexec.c".

Move the remaining architecture independent functions from
"arch/powerpc/kexec/ima.c" to "drivers/of/kexec.c".
Delete "arch/powerpc/kexec/ima.c" and "arch/powerpc/include/asm/ima.h".
Remove references to the deleted files in powerpc and in ima.

Co-developed-by: Prakhar Srivastava 
Signed-off-by: Prakhar Srivastava 
Signed-off-by: Lakshmi Ramasubramanian 
---
  arch/powerpc/include/asm/ima.h |  27 
  arch/powerpc/kexec/Makefile|   7 -
  arch/powerpc/kexec/file_load.c |   7 -
  arch/powerpc/kexec/ima.c   | 202 -
  drivers/of/kexec.c | 235 +
  include/linux/of.h |   2 +
  security/integrity/ima/ima.h   |   4 -
  security/integrity/ima/ima_kexec.c |   1 +
  8 files changed, 238 insertions(+), 247 deletions(-)
  delete mode 100644 arch/powerpc/include/asm/ima.h
  delete mode 100644 arch/powerpc/kexec/ima.c


This looks good, provided the changes from the discussion with Mimi are
made. Also, minor nits below.


I will address the changes Mimi had stated.




diff --git a/security/integrity/ima/ima.h b/security/integrity/ima/ima.h
index 6ebefec616e4..7c3947ad3773 100644
--- a/security/integrity/ima/ima.h
+++ b/security/integrity/ima/ima.h
@@ -24,10 +24,6 @@
  
  #include "../integrity.h"
  
-#ifdef CONFIG_HAVE_IMA_KEXEC
-#include <asm/ima.h>
-#endif
-
  enum ima_show_type { IMA_SHOW_BINARY, IMA_SHOW_BINARY_NO_FIELD_LEN,
 IMA_SHOW_BINARY_OLD_STRING_FMT, IMA_SHOW_ASCII };
  enum tpm_pcrs { TPM_PCR0 = 0, TPM_PCR8 = 8, TPM_PCR10 = 10 };


This belongs in patch 1.


No - the reference to "asm/ima.h" cannot be removed in Patch #1 since 
ima_get_kexec_buffer() and ima_free_kexec_buffer() are still declared in 
this header. They are moved in this patch only (Patch #2).



diff --git a/security/integrity/ima/ima_kexec.c b/security/integrity/ima/ima_kexec.c
index 38bcd7543e27..8a6712981dee 100644
--- a/security/integrity/ima/ima_kexec.c
+++ b/security/integrity/ima/ima_kexec.c
@@ -10,6 +10,7 @@
  #include 
  #include 
  #include 
+#include <linux/of.h>
  #include 
  #include "ima.h"


This include isn't necessary.


This change is necessary because ima_get_kexec_buffer() and 
ima_free_kexec_buffer() are now declared in "linux/of.h".


 -lakshmi

Re: [PATCH v13 2/6] powerpc: Move arch independent ima kexec functions to drivers/of/kexec.c

2020-12-22 Thread Lakshmi Ramasubramanian


On 12/22/20 4:48 PM, Thiago Jung Bauermann wrote:


Actually, I have one more comment on this patch:

Lakshmi Ramasubramanian  writes:


diff --git a/arch/powerpc/kexec/file_load.c b/arch/powerpc/kexec/file_load.c
index 956bcb2d1ec2..9f3ec0b239ef 100644
--- a/arch/powerpc/kexec/file_load.c
+++ b/arch/powerpc/kexec/file_load.c
@@ -20,7 +20,6 @@
  #include 
  #include 
  #include 
-#include 
  
  #define SLAVE_CODE_SIZE		256	/* First 0x100 bytes */
  
@@ -163,12 +162,6 @@ int setup_new_fdt(const struct kimage *image, void *fdt,

if (ret)
goto err;
  
-	ret = setup_ima_buffer(image, fdt, fdt_path_offset(fdt, "/chosen"));

-   if (ret) {
-   pr_err("Error setting up the new device tree.\n");
-   return ret;
-   }
-
return 0;
  
  err:


With this change, setup_new_fdt() is nothing more than a call to
of_kexec_setup_new_fdt(). It should be removed, and its caller should
call of_kexec_setup_new_fdt() directly.

This change could be done in patch 4 of this series, to keep this patch
simpler.



Sure Thiago - I will make that change.

thanks,
 -lakshmi

UBSAN: shift-out-of-bounds in vhci_hub_control

2020-12-22 Thread syzbot

Hello,

syzbot found the following issue on:

HEAD commit:a409ed15 Merge tag 'gpio-v5.11-1' of git://git.kernel.org/..
git tree:   upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=1053b62350
kernel config:  https://syzkaller.appspot.com/x/.config?x=f7c39e7211134bc0
dashboard link: https://syzkaller.appspot.com/bug?extid=297d20e437b79283bf6d
compiler:   gcc (GCC) 10.1.0-syz 20200507
syz repro:  https://syzkaller.appspot.com/x/repro.syz?x=15f4f13750
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=1115f30f50

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+297d20e437b79283b...@syzkaller.appspotmail.com


UBSAN: shift-out-of-bounds in drivers/usb/usbip/vhci_hcd.c:399:41
shift exponent 768 is too large for 32-bit type 'int'
CPU: 1 PID: 8482 Comm: syz-executor092 Not tainted 5.10.0-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:79 [inline]
 dump_stack+0x107/0x163 lib/dump_stack.c:120
 ubsan_epilogue+0xb/0x5a lib/ubsan.c:148
 __ubsan_handle_shift_out_of_bounds.cold+0xb1/0x181 lib/ubsan.c:395
 vhci_hub_control.cold+0x205/0x246 drivers/usb/usbip/vhci_hcd.c:399
 rh_call_control drivers/usb/core/hcd.c:683 [inline]
 rh_urb_enqueue drivers/usb/core/hcd.c:841 [inline]
 usb_hcd_submit_urb+0xcaa/0x22d0 drivers/usb/core/hcd.c:1544
 usb_submit_urb+0x6e4/0x1560 drivers/usb/core/urb.c:585
 usb_start_wait_urb+0x101/0x4c0 drivers/usb/core/message.c:58
 usb_internal_control_msg drivers/usb/core/message.c:102 [inline]
 usb_control_msg+0x31c/0x4a0 drivers/usb/core/message.c:153
 do_proc_control+0x4cb/0x9c0 drivers/usb/core/devio.c:1165
 proc_control drivers/usb/core/devio.c:1191 [inline]
 usbdev_do_ioctl drivers/usb/core/devio.c:2535 [inline]
 usbdev_ioctl+0x12c1/0x3b20 drivers/usb/core/devio.c:2708
 vfs_ioctl fs/ioctl.c:48 [inline]
 __do_sys_ioctl fs/ioctl.c:753 [inline]
 __se_sys_ioctl fs/ioctl.c:739 [inline]
 __x64_sys_ioctl+0x193/0x200 fs/ioctl.c:739
 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x443f39
Code: 18 89 d0 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 48 89 f8 48 89 f7 48 
89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 
fb d7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:7ffd18a092c8 EFLAGS: 0246 ORIG_RAX: 0010
RAX: ffda RBX: 004002e0 RCX: 00443f39
RDX: 2000 RSI: c0185500 RDI: 0003
RBP: 006ce018 R08:  R09: 004002e0
R10: 000f R11: 0246 R12: 00401bc0
R13: 00401c50 R14:  R15: 



---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkal...@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
syzbot can test patches for this issue, for details see:
https://goo.gl/tpsmEJ#testing-patches
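
Editor's sketch of the class of bug reported above: in C, shifting a 32-bit value by 32 or more is undefined behaviour, so an exponent derived from untrusted input has to be range-checked first. demo_bit_mask32() is a hypothetical helper for illustration and is not the fix that was applied to vhci_hcd.c.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Store (1u << bit) in *out only when bit is a valid exponent for a
 * 32-bit value.  On an out-of-range exponent (such as the 768 seen in
 * the report) return false and leave *out untouched.
 */
static bool demo_bit_mask32(unsigned int bit, uint32_t *out)
{
	if (bit >= 32)
		return false;
	*out = UINT32_C(1) << bit;
	return true;
}
```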

linux-next: Tree for Dec 23

2020-12-22 Thread Stephen Rothwell

Hi all,

News: there will be no linux-next releases between Dec 24 and Jan
3 inclusive.

Please do not add any v5.12 destined code to your linux-next included
branches until after v5.11-rc1 has been released.

Changes since 20201222:

Non-merge commits (relative to Linus' tree): 923
 941 files changed, 28744 insertions(+), 9573 deletions(-)



I have created today's linux-next tree at
git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
(patches at http://www.kernel.org/pub/linux/kernel/next/ ).  If you
are tracking the linux-next tree using git, you should not use "git pull"
to do so as that will try to merge the new linux-next release with the
old one.  You should use "git fetch" and checkout or reset to the new
master.

You can see which trees have been included by looking in the Next/Trees
file in the source.  There are also quilt-import.log and merge.log
files in the Next directory.  Between each merge, the tree was built
with a ppc64_defconfig for powerpc, an allmodconfig for x86_64, a
multi_v7_defconfig for arm and a native build of tools/perf. After
the final fixups (if any), I do an x86_64 modules_install followed by
builds for x86_64 allnoconfig, powerpc allnoconfig (32 and 64 bit),
ppc44x_defconfig, allyesconfig and pseries_le_defconfig and i386, sparc
and sparc64 defconfig and htmldocs. And finally, a simple boot test
of the powerpc pseries_le_defconfig kernel in qemu (with and without
kvm enabled).

Below is a summary of the state of the merge.

I am currently merging 329 trees (counting Linus' and 85 trees of bug
fix patches pending for the current merge release).

Stats about the size of the tree over time can be seen at
http://neuling.org/linux-next-size.html .

Status of my local build tests will be at
http://kisskb.ellerman.id.au/linux-next .  If maintainers want to give
advice about cross compilers/configs that work, we are always open to add
more builds.

Thanks to Randy Dunlap for doing many randconfig builds.  And to Paul
Gortmaker for triage and bug fixes.

-- 
Cheers,
Stephen Rothwell

$ git checkout master
$ git reset --hard stable
Merging origin/master (8653b778e454 Merge tag 'clk-for-linus' of 
git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux)
Merging fixes/fixes (9223e74f9960 Merge tag 'io_uring-5.10-2020-11-27' of 
git://git.kernel.dk/linux-block)
Merging kbuild-current/fixes (e37b12e4bb21 Merge tag 'for-linus-5.11-ofs1' of 
git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux)
Merging arc-current/for-curr (3a71e423133a ARC: build: use $(READELF) instead 
of hard-coded readelf)
Merging arm-current/fixes (e64ab473ddda ARM: 9034/1: __div64_32(): straighten 
up inline asm constraints)
Merging arm64-fixes/for-next/fixes (9fd339a45be5 arm64: Work around broken GCC 
4.9 handling of "S" constraint)
Merging arm-soc-fixes/arm/fixes (f012afb6af3d ARM: dts: ux500/golden: Set 
display max brightness)
Merging drivers-memory-fixes/fixes (3650b228f83a Linux 5.10-rc1)
Merging m68k-current/for-linus (2ae92e8b9b7e MAINTAINERS: Update m68k Mac entry)
Merging powerpc-fixes/fixes (9c7422b92cb2 powerpc/32s: Fix RTAS machine check 
with VMAP stack)
Merging s390-fixes/fixes (586592478b1f Merge tag 's390-5.11-1' of 
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux)
Merging sparc/master (0a95a6d1a4cd sparc: use for_each_child_of_node() macro)
Merging fscrypt-current/for-stable (d19d8d345eec fscrypt: fix inline encryption 
not used on new files)
Merging net/master (2575bc1aa9d5 net: mvpp2: Fix GoP port 3 Networking Complex 
Control configurations)
Merging bpf/master (e7e518053c26 bpf: Add schedule point in htab_init_buckets())
Merging ipsec/master (56ce7c25ae15 xfrm: Fix oops in xfrm_replay_advance_bmp)
Merging netfilter/master (2575bc1aa9d5 net: mvpp2: Fix GoP port 3 Networking 
Complex Control configurations)
Merging ipvs/master (5c8193f568ae netfilter: ipset: fix shift-out-of-bounds in 
htable_bits())
Merging wireless-drivers/master (bfe55584713b MAINTAINERS: switch to different 
email address)
Merging mac80211/master (2c85ebc57b3e Linux 5.10)
Merging rdma-fixes/for-rc (340b940ea0ed RDMA/cm: Fix an attempt to use 
non-valid pointer when cleaning timewait)
Merging sound-current/for-linus (13be30f156fd ALSA/hda: apply jack fixup for 
the Acer Veriton N4640G/N6640G/N2510G)
Merging sound-asoc-fixes/for-linus (fd19c7352504 Merge remote-tracking branch 
'asoc/for-5.11' into asoc-linus)
Merging regmap-fixes/for-linus (e6e9354b5830 regmap: Remove duplicate `type` 
field from regmap `regcache_sync` trace event)
Merging regulator-fixes/for-linus (639b12846819 Merge remote-tracking branch 
'regulator/for-5.11' into regulator-linus)
Merging spi-fixes/for-linus (676c63ebebaf Merge remote-tracking branch 
'spi/for-5.11' into spi-linus)
Merging pci-current/for-linus (f8394f232b1e Linux 5.10-rc3)
Merging driver-core.current/driver-core-linus (accefff5b547 Merge tag 
'ar

Re: [PATCH v1] scsi: ufs-mediatek: Enable UFSHCI_QUIRK_SKIP_MANUAL_WB_FLUSH_CTRL

2020-12-22 Thread Can Guo


On 2020-12-23 12:19, Stanley Chu wrote:

Hi Can,

On Tue, 2020-12-22 at 19:34 +0800, Can Guo wrote:

On 2020-12-22 15:29, Stanley Chu wrote:
> Flush during hibern8 is sufficient on MediaTek platforms, thus
> enable UFSHCI_QUIRK_SKIP_MANUAL_WB_FLUSH_CTRL to skip enabling
> fWriteBoosterBufferFlush during WriteBooster initialization.
>
> Signed-off-by: Stanley Chu 
> ---
>  drivers/scsi/ufs/ufs-mediatek.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/drivers/scsi/ufs/ufs-mediatek.c
> b/drivers/scsi/ufs/ufs-mediatek.c
> index 80618af7c872..c55202b92a43 100644
> --- a/drivers/scsi/ufs/ufs-mediatek.c
> +++ b/drivers/scsi/ufs/ufs-mediatek.c
> @@ -661,6 +661,7 @@ static int ufs_mtk_init(struct ufs_hba *hba)
>
>/* Enable WriteBooster */
>hba->caps |= UFSHCD_CAP_WB_EN;
> +  hba->quirks |= UFSHCI_QUIRK_SKIP_MANUAL_WB_FLUSH_CTRL;
>hba->vps->wb_flush_threshold = UFS_WB_BUF_REMAIN_PERCENT(80);
>
>if (host->caps & UFS_MTK_CAP_DISABLE_AH8)

I guess we need it too...


AHHA, if you decide to add this in your platform too later, maybe we
could change the way it does: Keep manual flush disabled by default and
remove this quirk.



Yeah... I will get back with an answer later.

Thanks,

Can Guo.


Thanks,
Stanley Chu


Change LGTM.

Regards,

Can Guo.

Re: [PATCH v2 19/48] opp: Fix adding OPP entries in a wrong order if rate is unavailable

2020-12-22 Thread Viresh Kumar

On 22-12-20, 22:19, Dmitry Osipenko wrote:
> 22.12.2020 12:12, Viresh Kumar wrote:
> > On 17-12-20, 21:06, Dmitry Osipenko wrote:
> >> Fix adding OPP entries in a wrong (opposite) order if OPP rate is
> >> unavailable. The OPP comparison is erroneously skipped if OPP rate is
> >> missing, thus OPPs are left unsorted.
> >>
> >> Signed-off-by: Dmitry Osipenko 
> >> ---
> >>  drivers/opp/core.c | 23 ---
> >>  drivers/opp/opp.h  |  2 +-
> >>  2 files changed, 13 insertions(+), 12 deletions(-)
> >>
> >> diff --git a/drivers/opp/core.c b/drivers/opp/core.c
> >> index 34f7e530d941..5c7f130a8de2 100644
> >> --- a/drivers/opp/core.c
> >> +++ b/drivers/opp/core.c
> >> @@ -1531,9 +1531,10 @@ static bool _opp_supported_by_regulators(struct 
> >> dev_pm_opp *opp,
> >>return true;
> >>  }
> >>  
> >> -int _opp_compare_key(struct dev_pm_opp *opp1, struct dev_pm_opp *opp2)
> >> +int _opp_compare_key(struct dev_pm_opp *opp1, struct dev_pm_opp *opp2,
> >> +   bool rate_not_available)
> >>  {
> >> -  if (opp1->rate != opp2->rate)
> >> +  if (!rate_not_available && opp1->rate != opp2->rate)
> > 
> > rate will be 0 for both the OPPs here if rate_not_available is true and so
> > this change shouldn't be required.
> 
> The rate_not_available is negated in the condition. This change is
> required because both rates are 0 and then we should proceed to the
> levels comparison.

Won't that happen without this patch ?

> I guess it's not clear by looking at this patch, please see a full
> version of the function:
> 
> int _opp_compare_key(struct dev_pm_opp *opp1, struct dev_pm_opp *opp2,
>  bool rate_not_available)
> {
>   if (!rate_not_available && opp1->rate != opp2->rate)
> return opp1->rate < opp2->rate ? -1 : 1;
>   if (opp1->bandwidth && opp2->bandwidth &&
>   opp1->bandwidth[0].peak != opp2->bandwidth[0].peak)
> return opp1->bandwidth[0].peak < opp2->bandwidth[0].peak ? -1 : 1;
>   if (opp1->level != opp2->level)
> return opp1->level < opp2->level ? -1 : 1;
>   return 0;
> }
> 
> Perhaps we could check whether opp1->rate=0, like it's done for the
> opp1->bandwidth. I'll consider this variant for v3, thanks.

-- 
viresh
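
Editor's sketch of the point under discussion, assuming (as stated in the thread) that an unavailable rate is stored as 0 in both OPPs: the unflagged comparison then already falls through to the level check. struct demo_opp is a hypothetical reduction of dev_pm_opp, and the bandwidth step is left out for brevity.

```c
/* Hypothetical, reduced stand-in for struct dev_pm_opp. */
struct demo_opp {
	unsigned long rate;	/* 0 when no rate is available */
	unsigned int level;
};

/* Order by rate first, then by level; returns -1, 0 or 1. */
static int demo_compare_key(const struct demo_opp *a,
			    const struct demo_opp *b)
{
	if (a->rate != b->rate)
		return a->rate < b->rate ? -1 : 1;
	if (a->level != b->level)
		return a->level < b->level ? -1 : 1;
	return 0;
}
```

With two rate-less OPPs the first test is false, so the result is decided by the level without any extra flag.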

Re: [PATCH] perf stat: Create '--add-default' option to append default list

2020-12-22 Thread Jin, Yao





On 12/23/2020 8:56 AM, Jin, Yao wrote:

Hi Arnaldo,

On 12/23/2020 12:15 AM, Arnaldo Carvalho de Melo wrote:

On Tue, Dec 22, 2020 at 09:11:31AM +0800, Jin Yao wrote:

The event default list includes the most common events which are widely
used by users. But with -e option, the current perf only counts the events
assigned by -e option. Users may want to collect some extra events with
the default list. For this case, users have to manually add all the events
from the default list. It's inconvenient. Also, users may not know how to
get the default list.

It's better to add a new option to append default list to the -e events.
The new option is '--add-default'.

Before:

root@kbl-ppc:~# ./perf stat -e power/energy-pkg/ -a -- sleep 1

  Performance counter stats for 'system wide':

   2.05 Joules power/energy-pkg/

    1.000857974 seconds time elapsed

After:

root@kbl-ppc:~# ./perf stat -e power/energy-pkg/ -a --add-default -- sleep 1


I thought about:

 perf stat -e +power/energy-pkg/ -a -- sleep 1



I was surprised to see that the '+' syntax is already supported.

root@kbl-ppc:~# ./perf stat -e +power/energy-pkg/ -a -- sleep 1

  Performance counter stats for 'system wide':

   1.99 Joules +power/energy-pkg/

    1.000877852 seconds time elapsed

root@kbl-ppc:~# ./perf stat -e +power/energy-pkg/,+cycles -a -- sleep 1

  Performance counter stats for 'system wide':

   2.00 Joules +power/energy-pkg/
     13,780,620    +cycles

    1.001639147 seconds time elapsed

Do any scripts or existing usages rely on a '+' prefix before an event? I don't know. But if we
make '+' mean "append to the default event list", could that break something?



Which would have its counterpart:

 perf stat -e -cycles -a -- sleep 1

To remove an event from the defaults, perhaps to deal with some specific
hardware where the default or what is in -d, -dd, -ddd, etc can't all be
counted. I.e. - and + would remove or add from whatever list was there at
that point.

- Arnaldo


Yes, + and - are a more flexible solution. But as asked above, will '+' break existing
usage? And for '-', I don't know whether users can clearly remember which events are in the default list.




For '-', another difficulty is it may conflict with the hardware cache event.

Say we remove the "-  { return '-'; }" from parse-events.l, 
such as:

diff --git a/tools/perf/util/parse-events.l b/tools/perf/util/parse-events.l
index 9db5097317f4..145653d1ce16 100644
--- a/tools/perf/util/parse-events.l
+++ b/tools/perf/util/parse-events.l
@@ -387,7 +387,6 @@ r{num_raw_hex}  { return raw(yyscanner); }
 {name} { return pmu_str_check(yyscanner, _parse_state); }
 {name_tag} { return str(yyscanner, PE_NAME); }
 "/"{ BEGIN(config); return '/'; }
--  { return '-'; }
 ,  { BEGIN(event); return ','; }
 :  { return ':'; }
 "{"{ BEGIN(event); return '{'; }

The syntax of '-' is supported.

root@kbl-ppc:~# ./perf stat -e -cycles -a -- sleep 1

 Performance counter stats for 'system wide':

14,008,859  -cycles

   1.001471494 seconds time elapsed

But parsing of hardware cache events would then fail. :(

root@kbl-ppc:~# ./perf stat -e LLC-stores -a -- sleep 1
event syntax error: 'LLC-stores'
\___ parser error

That complicates things. :(

Thanks
Jin Yao


Thanks
Jin Yao


  Performance counter stats for 'system wide':

   2.10 Joules power/energy-pkg/ #    0.000 K/sec
   8,009.89 msec   cpu-clock #    7.995 CPUs utilized
    140    context-switches  #    0.017 K/sec
  9    cpu-migrations    #    0.001 K/sec
 66    page-faults   #    0.008 K/sec
 10,671,929    cycles    #    0.001 GHz
  4,736,880    instructions  #    0.44  insn per cycle
    942,951    branches  #    0.118 M/sec
 76,096    branch-misses #    8.07% of all branches

    1.001809960 seconds time elapsed

Signed-off-by: Jin Yao 
---
  tools/perf/Documentation/perf-stat.txt | 5 +
  tools/perf/builtin-stat.c  | 4 +++-
  tools/perf/util/stat.h | 1 +
  3 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/tools/perf/Documentation/perf-stat.txt b/tools/perf/Documentation/perf-stat.txt
index 5d4a673d7621..75a83c2e4dc5 100644
--- a/tools/perf/Documentation/perf-stat.txt
+++ b/tools/perf/Documentation/perf-stat.txt
@@ -438,6 +438,11 @@ convenient for post processing.
  --summary::
  Print summary for interval mode (-I).
+--add-default::
+The default event list includes the most common events which are widely
+used by users. But with -e option, the perf only counts the events assigned
+by -e
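
Editor's sketch related to the '+'/'-' discussion in this thread: one way to keep names such as LLC-stores intact is to treat the sign as an operator only when it is the first character of an event token. This is a hypothetical pre-pass for illustration; it is not how perf's flex-based parser is structured, and demo_split_prefix() does not exist in perf.

```c
/* What a token asks for relative to the default event list. */
enum demo_op { DEMO_REPLACE, DEMO_ADD, DEMO_REMOVE };

/*
 * Classify one event token and point *name past any leading sign.
 * Only the first character is inspected, so a '-' inside the name
 * (for example "LLC-stores") is left alone.
 */
static enum demo_op demo_split_prefix(const char *token, const char **name)
{
	if (token[0] == '+') {
		*name = token + 1;
		return DEMO_ADD;
	}
	if (token[0] == '-') {
		*name = token + 1;
		return DEMO_REMOVE;
	}
	*name = token;
	return DEMO_REPLACE;
}
```

Whether a leading '+' may change meaning without breaking existing users is the open question raised above, and this sketch does not settle it.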

Re: [PATCH phy] PHY: Ingenic: fix unconditional build of phy-ingenic-usb

2020-12-22 Thread Vinod Koul

On 22-12-20, 13:10, Alexander Lobakin wrote:
> Currently drivers/phy/ingenic/Makefile adds phy-ingenic-usb to targets
> not depending on actual Kconfig symbol CONFIG_PHY_INGENIC_USB, so this
> driver always gets built[-in] on every system.
> Add missing dependency.

Applied, thanks

-- 
~Vinod

[PATCH v2] arm64: dts: mt8192: add nor_flash device node

2020-12-22 Thread Bayi Cheng

From: bayi cheng 

add nor_flash device node

Change-Id: I79f0228529bd8a33e5f354b7a861a4ec8d92e9ba
Signed-off-by: bayi cheng 
---
Change in v2:
1: add dependent patch of arm soc
2: change compatible name

Depends on:
https://patchwork.kernel.org/patch/11713559/
[v4,1/3] arm64: dts: Add Mediatek SoC MT8192 and evaluation board dts and 
Makefile
---
 arch/arm64/boot/dts/mediatek/mt8192.dtsi | 13 +
 1 file changed, 13 insertions(+)

diff --git a/arch/arm64/boot/dts/mediatek/mt8192.dtsi b/arch/arm64/boot/dts/mediatek/mt8192.dtsi
index e12e024..751c877 100644
--- a/arch/arm64/boot/dts/mediatek/mt8192.dtsi
+++ b/arch/arm64/boot/dts/mediatek/mt8192.dtsi
@@ -379,6 +379,19 @@
status = "disabled";
};
 
+   nor_flash: spi@11234000 {
+   compatible = "mediatek,mt8192-nor";
+   reg = <0 0x11234000 0 0xe0>;
+   interrupts = ;
+   clocks = <>,
+<>,
+<>;
+   clock-names = "spi", "sf", "axi";
+   #address-cells = <1>;
+   #size-cells = <0>;
+   status = "disabled";
+   };
+
i2c3: i2c3@11cb {
compatible = "mediatek,mt8192-i2c";
reg = <0 0x11cb 0 0x1000>,
-- 
1.9.1

Re: [PATCH v2 14/48] opp: Filter out OPPs based on availability of a required-OPP

2020-12-22 Thread Viresh Kumar

On 22-12-20, 22:17, Dmitry Osipenko wrote:
> 22.12.2020 11:59, Viresh Kumar wrote:
> > On 17-12-20, 21:06, Dmitry Osipenko wrote:
> >> A required OPP may not be available, and thus, all OPPs which are using
> >> this required OPP should be unavailable too.
> >>
> >> Signed-off-by: Dmitry Osipenko 
> >> ---
> >>  drivers/opp/core.c | 11 ++-
> >>  1 file changed, 10 insertions(+), 1 deletion(-)
> > 
> > Please send a separate patchset for fixes, as these can also go to 5.11 
> > itself.
> 
> Alright, although I don't think that this patch fixes any problems for
> existing OPP users.

Because nobody is using this feature, but otherwise this is a fix for me.

> >> diff --git a/drivers/opp/core.c b/drivers/opp/core.c
> >> index d9feb7639598..3d02fe33630b 100644
> >> --- a/drivers/opp/core.c
> >> +++ b/drivers/opp/core.c
> >> @@ -1588,7 +1588,7 @@ int _opp_add(struct device *dev, struct dev_pm_opp 
> >> *new_opp,
> >> struct opp_table *opp_table, bool rate_not_available)
> >>  {
> >>struct list_head *head;
> >> -  int ret;
> >> +  int i, ret;
> >>  
> >>mutex_lock(&opp_table->lock);
> >>head = &opp_table->opp_list;
> >> @@ -1615,6 +1615,15 @@ int _opp_add(struct device *dev, struct dev_pm_opp 
> >> *new_opp,
> >> __func__, new_opp->rate);
> >>}
> >>  
> >> +  for (i = 0; i < opp_table->required_opp_count && new_opp->available; 
> >> i++) {
> >> +  if (new_opp->required_opps[i]->available)
> >> +  continue;
> >> +
> >> +  new_opp->available = false;
> >> +  dev_warn(dev, "%s: OPP not supported by required OPP %pOF 
> >> (%lu)\n",
> >> +   __func__, new_opp->required_opps[i]->np, 
> >> new_opp->rate);
> > 
> > Why not just break from here ?
> 
> The new_opp could be already marked as unavailable by a previous voltage
> check, hence this loop should be skipped entirely in that case.

Then add a separate check for that before the loop as we don't need that check
on every iteration here.

-- 
viresh
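
Editor's sketch of the loop shape suggested above: test the new OPP's availability once before the loop, then stop at the first unavailable required OPP. struct demo_opp and demo_filter_by_required() are hypothetical reductions, not the kernel's definitions.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced stand-in for struct dev_pm_opp. */
struct demo_opp {
	bool available;
};

/*
 * Mark new_opp unavailable if any required OPP is unavailable.
 * Returns the index of the offending required OPP, or -1 when nothing
 * changed (either all are available or new_opp was already off).
 */
static int demo_filter_by_required(struct demo_opp *new_opp,
				   struct demo_opp *const *required,
				   size_t count)
{
	size_t i;

	/* Checked once, instead of in the loop condition. */
	if (!new_opp->available)
		return -1;

	for (i = 0; i < count; i++) {
		if (required[i]->available)
			continue;
		new_opp->available = false;
		return (int)i;	/* first hit is enough */
	}
	return -1;
}
```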

Re: [PATCH v1] scsi: ufs-mediatek: Enable UFSHCI_QUIRK_SKIP_MANUAL_WB_FLUSH_CTRL

2020-12-22 Thread Stanley Chu

Hi Can,

On Tue, 2020-12-22 at 19:34 +0800, Can Guo wrote:
> On 2020-12-22 15:29, Stanley Chu wrote:
> > Flush during hibern8 is sufficient on MediaTek platforms, thus
> > enable UFSHCI_QUIRK_SKIP_MANUAL_WB_FLUSH_CTRL to skip enabling
> > fWriteBoosterBufferFlush during WriteBooster initialization.
> > 
> > Signed-off-by: Stanley Chu 
> > ---
> >  drivers/scsi/ufs/ufs-mediatek.c | 1 +
> >  1 file changed, 1 insertion(+)
> > 
> > diff --git a/drivers/scsi/ufs/ufs-mediatek.c 
> > b/drivers/scsi/ufs/ufs-mediatek.c
> > index 80618af7c872..c55202b92a43 100644
> > --- a/drivers/scsi/ufs/ufs-mediatek.c
> > +++ b/drivers/scsi/ufs/ufs-mediatek.c
> > @@ -661,6 +661,7 @@ static int ufs_mtk_init(struct ufs_hba *hba)
> > 
> > /* Enable WriteBooster */
> > hba->caps |= UFSHCD_CAP_WB_EN;
> > +   hba->quirks |= UFSHCI_QUIRK_SKIP_MANUAL_WB_FLUSH_CTRL;
> > hba->vps->wb_flush_threshold = UFS_WB_BUF_REMAIN_PERCENT(80);
> > 
> > if (host->caps & UFS_MTK_CAP_DISABLE_AH8)
> 
> I guess we need it too...

AHHA, if you decide to add this in your platform too later, maybe we
could change the way it does: Keep manual flush disabled by default and
remove this quirk.

Thanks,
Stanley Chu
> 
> Change LGTM.
> 
> Regards,
> 
> Can Guo.

Re: [PATCH v2 11/48] opp: Add dev_pm_opp_find_level_ceil()

2020-12-22 Thread Viresh Kumar

On 22-12-20, 22:15, Dmitry Osipenko wrote:
> 22.12.2020 09:42, Viresh Kumar wrote:
> > On 17-12-20, 21:06, Dmitry Osipenko wrote:
> >> Add a ceil version of the dev_pm_opp_find_level(). It's handy to have if
> >> levels don't start from 0 in OPP table and zero usually means a minimal
> >> level.
> >>
> >> Signed-off-by: Dmitry Osipenko 
> > 
> > Why doesn't the exact version work for you here ?
> > 
> 
> The exact version won't find OPP for level=0 if levels don't start with
> 0, where 0 means that minimal level is desired.

Right, but why do you need to send 0 for your platform ?

-- 
viresh
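
Editor's sketch of the difference between the exact and ceil lookups being discussed, assuming a table of levels sorted in ascending order. The demo_* functions are hypothetical and work on a plain array rather than an OPP table.

```c
#include <stddef.h>

/* Return the entry whose level equals the request, or NULL. */
static const unsigned int *demo_find_level_exact(const unsigned int *levels,
						  size_t n, unsigned int level)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (levels[i] == level)
			return &levels[i];
	return NULL;
}

/* Return the first entry whose level is >= the request, or NULL. */
static const unsigned int *demo_find_level_ceil(const unsigned int *levels,
						 size_t n, unsigned int level)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (levels[i] >= level)
			return &levels[i];
	return NULL;
}
```

With levels {1, 2, 4}, a request for 0 finds nothing in the exact variant but resolves to the lowest entry in the ceil variant, which is the case Dmitry describes.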

Re: [PATCH v3 3/5] RISC-V: Align the .init.text section

2020-12-22 Thread Palmer Dabbelt


On Fri, 18 Dec 2020 00:19:09 PST (-0800), ati...@atishpatra.org wrote:

On Thu, Dec 17, 2020 at 12:33 AM Atish Patra  wrote:


On Wed, Dec 16, 2020 at 10:51 PM Palmer Dabbelt  wrote:
>
> On Tue, 15 Dec 2020 22:02:54 PST (-0800), Palmer Dabbelt wrote:
> > On Wed, 04 Nov 2020 16:04:37 PST (-0800), Atish Patra wrote:
> >> In order to improve kernel text protection, we need separate .init.text/
> >> .init.data/.text in separate sections. However, RISC-V linker relaxation
> >> code is not aware of any alignment between sections. As a result, it may
> >> relax any RISCV_CALL relocations between sections to JAL without realizing
> >> that an inter section alignment may move the address farther. That may
> >> lead to a "relocation truncated to fit" error. However, linker relaxation code
> >> is aware of the individual section alignments.
> >>
> >> The detailed discussion on this issue can be found here.
> >> https://github.com/riscv/riscv-gnu-toolchain/issues/738
> >>
> >> Keep the .init.text section aligned so that linker relaxation will take
> >> that as a hint while relaxing inter section calls.
> >> Here are the code size changes for each section because of this change.
> >>
> >> section change in size (in bytes)
> >>   .head.text  +4
> >>   .text   +40
> >>   .init.text  +6530
> >>   .exit.text  +84
> >>
> >> The only significant increase in size happened for .init.text because
> >> all intra relocations also use 2MB alignment.
> >>
> >> Suggested-by: Jim Wilson 
> >> Signed-off-by: Atish Patra 
> >> ---
> >>  arch/riscv/kernel/vmlinux.lds.S | 8 +++-
> >>  1 file changed, 7 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/arch/riscv/kernel/vmlinux.lds.S b/arch/riscv/kernel/vmlinux.lds.S
> >> index 3ffbd6cbdb86..cacd7898ba7f 100644
> >> --- a/arch/riscv/kernel/vmlinux.lds.S
> >> +++ b/arch/riscv/kernel/vmlinux.lds.S
> >> @@ -30,7 +30,13 @@ SECTIONS
> >>  . = ALIGN(PAGE_SIZE);
> >>
> >>  __init_begin = .;
> >> -INIT_TEXT_SECTION(PAGE_SIZE)
> >> +__init_text_begin = .;
> >> +.init.text : AT(ADDR(.init.text) - LOAD_OFFSET) ALIGN(SECTION_ALIGN) { \
> >> +_sinittext = .; \
> >> +INIT_TEXT   \
> >> +_einittext = .; \
> >> +}
> >> +
> >>  . = ALIGN(8);
> >>  __soc_early_init_table : {
> >>  __soc_early_init_table_start = .;
> >
> > > Not sure what's going on here (or why I wasn't catching it earlier), but this
> > > is breaking boot on one of my test configs.  I'm not getting any Linux boot
> > > spew, so it's something fairly early.  I'm running defconfig with
> > >
> > > CONFIG_PREEMPT=y
> > > CONFIG_DEBUG_PREEMPT=y
> > > CONFIG_PROVE_LOCKING=y
> > >
> > > It looks like that's been throwing a bunch of warnings for a while, but it did
> > > at least used to boot.  No idea what PREEMPT would have to do with this, and
> > > the other two don't generally trigger issues that early in boot (or at least,
> > > trigger halts that early in boot).
> >


I am able to reproduce this issue but with CONFIG_PROVE_LOCKING not
CONFIG_PREEMPT.
With CONFIG_PREEMPT, I see a bunch of warnings around smp_processor_id
but it boots even with 5.0.
If CONFIG_PROVE_LOCKING is enabled, I am not able to boot using 5.0.
However, 5.2.0 works fine.
I am going to take a look at the issue with 5.0 and PROVE_LOCKING.

The config preempt warnings are resolved by the following patch. I
have tested it in Qemu.

https://patchwork.kernel.org/project/linux-riscv/patch/20201116081238.44223-1-wangkefeng.w...@huawei.com/


Thanks!




> > There's a bunch of other stuff that depends on this that's on for-next so I
> > don't want to just drop it, but I also don't want to break something.  I'm just
> > running QEMU's virt board.
> >

I just verified for-next on QEMU 5.2.0 for virt (RV32,64, nommu) and
sifive_u as well.
I will give it a try on unleashed tomorrow as well with the above
configs enabled.

> > I'll take a look again tomorrow night, but if anyone has some time to look
> > that'd be great!
>
> Looks like this breaks on QEMU 5.0.0 but works on 5.2.0.

I will take a look tomorrow to check the root cause.

I guess technically
> that means could be considered a regression, but as we don't really have any
> scheme for which old versions of QEMU we support it's not absolute.  I'd
> usually err on the side of keeping support for older platforms, but in this
> case it's probably just not worth the time so I'm going to just ignore it.
>
> ___
> linux-riscv mailing list
> linux-ri...@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-riscv



--
Regards,
Atish

[PATCH v1 2/2] arm64: dts: mt6779: Support ufshci and ufsphy

2020-12-22 Thread Stanley Chu

Support UFS on MT6779 platforms by adding ufshci and ufsphy
nodes in dts file.

Reviewed-by: Hanks Chen 
Signed-off-by: Stanley Chu 
---
 arch/arm64/boot/dts/mediatek/mt6779.dtsi | 36 +++-
 1 file changed, 35 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/boot/dts/mediatek/mt6779.dtsi 
b/arch/arm64/boot/dts/mediatek/mt6779.dtsi
index 370f309d32de..a8584b00cc9d 100644
--- a/arch/arm64/boot/dts/mediatek/mt6779.dtsi
+++ b/arch/arm64/boot/dts/mediatek/mt6779.dtsi
@@ -225,6 +225,41 @@
#clock-cells = <1>;
};
 
+   ufshci: ufshci@1127 {
+   compatible = "mediatek,mt8183-ufshci";
+   reg = <0 0x1127 0 0x2300>;
+   interrupts = ;
+   phys = <>;
+
+   clocks = <_ao CLK_INFRA_UFS>,
+<_ao CLK_INFRA_UFS_TICK>,
+<_ao CLK_INFRA_UFS_AXI>,
+<_ao CLK_INFRA_UNIPRO_TICK>,
+<_ao CLK_INFRA_UNIPRO_MBIST>,
+< CLK_TOP_FAES_UFSFDE>,
+<_ao CLK_INFRA_AES_UFSFDE>,
+<_ao CLK_INFRA_AES_BCLK>;
+   clock-names = "ufs", "ufs_tick", "ufs_axi",
+ "unipro_tick", "unipro_mbist",
+ "aes_top", "aes_infra", "aes_bclk";
+   freq-table-hz = <0 0>, <0 0>, <0 0>,
+   <0 0>, <0 0>, <0 0>,
+   <0 0>, <0 0>;
+
+   mediatek,ufs-disable-ah8;
+   mediatek,ufs-support-va09;
+   };
+
+   ufsphy: phy@11fa {
+   compatible = "mediatek,mt8183-ufsphy";
+   reg = <0 0x11fa 0 0xc000>;
+   #phy-cells = <0>;
+
+   clocks = <_ao CLK_INFRA_UNIPRO_SCK>,
+<_ao CLK_INFRA_UFS_MP_SAP_BCLK>;
+   clock-names = "unipro", "mp";
+   };
+
mfgcfg: clock-controller@13fbf000 {
compatible = "mediatek,mt6779-mfgcfg", "syscon";
reg = <0 0x13fbf000 0 0x1000>;
@@ -266,6 +301,5 @@
reg = <0 0x1b00 0 0x1000>;
#clock-cells = <1>;
};
-
};
 };
-- 
2.18.0

[PATCH v1 1/2] arm64: configs: Support Universal Flash Storage on MediaTek platforms

2020-12-22 Thread Stanley Chu

Support UFS on MediaTek platforms by enabling CONFIG_SCSI_UFS_MEDIATEK.

Reviewed-by: Hanks Chen 
Signed-off-by: Stanley Chu 
---
 arch/arm64/configs/defconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/arm64/configs/defconfig b/arch/arm64/configs/defconfig
index 838301650a79..89ab646e0a1e 100644
--- a/arch/arm64/configs/defconfig
+++ b/arch/arm64/configs/defconfig
@@ -282,6 +282,7 @@ CONFIG_MEGARAID_SAS=y
 CONFIG_SCSI_MPT3SAS=m
 CONFIG_SCSI_UFSHCD=y
 CONFIG_SCSI_UFSHCD_PLATFORM=y
+CONFIG_SCSI_UFS_MEDIATEK=m
 CONFIG_SCSI_UFS_QCOM=m
 CONFIG_SCSI_UFS_HISI=y
 CONFIG_ATA=y
-- 
2.18.0

[PATCH v1 0/2] arm64: Support Universal Flash Storage on MediaTek MT6779 platform

2020-12-22 Thread Stanley Chu

Hi,
This series adds UFS (Universal Flash Storage) support on the MediaTek MT6779
SoC platform.

Stanley Chu (2):
  arm64: configs: Support Universal Flash Storage on MediaTek platforms
  arm64: dts: mt6779: Support ufshci and ufsphy

 arch/arm64/boot/dts/mediatek/mt6779.dtsi | 36 +++-
 arch/arm64/configs/defconfig |  1 +
 2 files changed, 36 insertions(+), 1 deletion(-)

-- 
2.18.0

Re: [PATCH 1/2] mm/madvise: allow process_madvise operations on entire memory range

2020-12-22 Thread Suren Baghdasaryan

On Tue, Dec 22, 2020 at 9:48 AM Suren Baghdasaryan  wrote:
>
> On Tue, Dec 22, 2020 at 5:44 AM Christoph Hellwig  wrote:
> >
> > On Fri, Dec 11, 2020 at 09:27:46PM +0100, Jann Horn wrote:
> > > > Can we just use one element in iovec to indicate entire address rather
> > > > than using up the reserved flags?
> > > >
> > > > struct iovec {
> > > > .iov_base = NULL,
> > > > .iov_len = (~(size_t)0),
> > > > };
> > >
> > > In addition to Suren's objections, I think it's also worth considering
> > > how this looks in terms of compat API. If a compat process does
> > > process_madvise() on another compat process, it would be specifying
> > > the maximum 32-bit number, rather than the maximum 64-bit number, so
> > > you'd need special code to catch that case, which would be ugly.
> > >
> > > And when a compat process uses this API on a non-compat process, it
> > > semantically gets really weird: The actual address range covered would
> > > be larger than the address range specified.
> > >
> > > And if we want different access checks for the two flavors in the
> > > future, gating that different behavior on special values in the iovec
> > > would feel too magical to me.
> > >
> > > And the length value SIZE_MAX doesn't really make sense anyway because
> > > the length of the whole address space would be SIZE_MAX+1, which you
> > > can't express.
> > >
> > > So I'm in favor of a new flag, and strongly against using SIZE_MAX as
> > > a magic number here.
> >
> > Yes, using SIZE_MAX is a horrible interface in this case.  I'm not
> > a huge fan of a flag either.  What is the use case for the madvise
> > to all of a processes address space anyway?
>
> Thanks for the feedback! The use case is userspace memory reaping
> similar to oom-reaper. Detailed justification is here:
> https://lore.kernel.org/linux-mm/20201124053943.1684874-1-sur...@google.com

Actually, this post is the most informative and includes test results:
https://lore.kernel.org/linux-api/cajucfpgz1kpm3g1gzh+09z7aowkg05qsammisj7h5mdmrrr...@mail.gmail.com/
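
To make the SIZE_MAX objection quoted above concrete, here is a minimal
user-space sketch (not kernel code; both helper names are made up for
illustration). It shows that the length of a full address space,
SIZE_MAX + 1, wraps to 0 in a size_t, and that the largest value a 32-bit
compat caller can pass is not the 64-bit maximum a native kernel would test
for.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Length of "the whole address space" is SIZE_MAX + 1.  Unsigned
 * arithmetic wraps, so the result is 0 and cannot be told apart from
 * an empty range.
 */
static size_t whole_space_len(void)
{
	return (size_t)SIZE_MAX + 1;
}

/*
 * Hypothetical check: does the sentinel a 32-bit compat task can pass
 * equal the sentinel a 64-bit kernel would compare against?
 * Returns 1 if they match, 0 otherwise.
 */
static int compat_sentinel_matches(void)
{
	uint64_t native_max = UINT64_MAX;		/* ~(size_t)0 on 64-bit */
	uint64_t compat_max = (uint64_t)UINT32_MAX;	/* ~(size_t)0 on 32-bit */

	return native_max == compat_max;
}
```

Both effects are what would force special-case code in a compat path if a
magic iov_len were used instead of a flag.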

mmotm 2020-12-22-20-07 uploaded

2020-12-22 Thread akpm

The mm-of-the-moment snapshot 2020-12-22-20-07 has been uploaded to

   https://www.ozlabs.org/~akpm/mmotm/

mmotm-readme.txt says

README for mm-of-the-moment:

https://www.ozlabs.org/~akpm/mmotm/

This is a snapshot of my -mm patch queue.  Uploaded at random hopefully
more than once a week.

You will need quilt to apply these patches to the latest Linus release (5.x
or 5.x-rcY).  The series file is in broken-out.tar.gz and is duplicated in
https://ozlabs.org/~akpm/mmotm/series

The file broken-out.tar.gz contains two datestamp files: .DATE and
.DATE-yyyy-mm-dd-hh-mm-ss.  Both contain the string yyyy-mm-dd-hh-mm-ss,
followed by the base kernel version against which this patch series is to
be applied.

This tree is partially included in linux-next.  To see which patches are
included in linux-next, consult the `series' file.  Only the patches
within the #NEXT_PATCHES_START/#NEXT_PATCHES_END markers are included in
linux-next.


A full copy of the full kernel tree with the linux-next and mmotm patches
already applied is available through git within an hour of the mmotm
release.  Individual mmotm releases are tagged.  The master branch always
points to the latest release, so it's constantly rebasing.

https://github.com/hnaz/linux-mm

The directory https://www.ozlabs.org/~akpm/mmots/ (mm-of-the-second)
contains daily snapshots of the -mm tree.  It is updated more frequently
than mmotm, and is untested.

A git copy of this tree is also available at

https://github.com/hnaz/linux-mm



This mmotm tree contains the following patches against 5.10:
(patches marked "*" will be included in linux-next)

  origin.patch
* kasan-drop-unnecessary-gpl-text-from-comment-headers.patch
* kasan-kasan_vmalloc-depends-on-kasan_generic.patch
* kasan-group-vmalloc-code.patch
* kasan-shadow-declarations-only-for-software-modes.patch
* kasan-rename-unpoison_shadow-to-unpoison_range.patch
* kasan-rename-kasan_shadow_-to-kasan_granule_.patch
* kasan-only-build-initc-for-software-modes.patch
* kasan-split-out-shadowc-from-commonc.patch
* kasan-define-kasan_memory_per_shadow_page.patch
* kasan-rename-report-and-tags-files.patch
* kasan-dont-duplicate-config-dependencies.patch
* kasan-hide-invalid-free-check-implementation.patch
* kasan-decode-stack-frame-only-with-kasan_stack_enable.patch
* kasan-arm64-only-init-shadow-for-software-modes.patch
* kasan-arm64-only-use-kasan_depth-for-software-modes.patch
* kasan-arm64-move-initialization-message.patch
* kasan-arm64-rename-kasan_init_tags-and-mark-as-__init.patch
* kasan-rename-addr_has_shadow-to-addr_has_metadata.patch
* kasan-rename-print_shadow_for_address-to-print_memory_metadata.patch
* kasan-rename-shadow-layout-macros-to-meta.patch
* kasan-separate-metadata_fetch_row-for-each-mode.patch
* kasan-introduce-config_kasan_hw_tags.patch
* arm64-enable-armv85-a-asm-arch-option.patch
* arm64-mte-add-in-kernel-mte-helpers.patch
* arm64-mte-reset-the-page-tag-in-page-flags.patch
* arm64-mte-add-in-kernel-tag-fault-handler.patch
* arm64-kasan-allow-enabling-in-kernel-mte.patch
* arm64-mte-convert-gcr_user-into-an-exclude-mask.patch
* arm64-mte-switch-gcr_el1-in-kernel-entry-and-exit.patch
* kasan-mm-untag-page-address-in-free_reserved_area.patch
* arm64-kasan-align-allocations-for-hw_tags.patch
* arm64-kasan-add-arch-layer-for-memory-tagging-helpers.patch
* kasan-define-kasan_granule_size-for-hw_tags.patch
* kasan-x86-s390-update-undef-config_kasan.patch
* kasan-arm64-expand-config_kasan-checks.patch
* kasan-arm64-implement-hw_tags-runtime.patch
* kasan-arm64-print-report-from-tag-fault-handler.patch
* kasan-mm-reset-tags-when-accessing-metadata.patch
* kasan-arm64-enable-config_kasan_hw_tags.patch
* kasan-add-documentation-for-hardware-tag-based-mode.patch
* kselftest-arm64-check-gcr_el1-after-context-switch.patch
* kasan-simplify-quarantine_put-call-site.patch
* kasan-rename-get_alloc-free_info.patch
* kasan-introduce-set_alloc_info.patch
* kasan-arm64-unpoison-stack-only-with-config_kasan_stack.patch
* kasan-allow-vmap_stack-for-hw_tags-mode.patch
* kasan-remove-__kasan_unpoison_stack.patch
* kasan-inline-kasan_reset_tag-for-tag-based-modes.patch
* kasan-inline-random_tag-for-hw_tags.patch
* kasan-open-code-kasan_unpoison_slab.patch
* kasan-inline-unpoison_range-and-check_invalid_free.patch
* kasan-add-and-integrate-kasan-boot-parameters.patch
* kasan-mm-check-kasan_enabled-in-annotations.patch
* kasan-mm-rename-kasan_poison_kfree.patch
* kasan-dont-round_up-too-much.patch
* kasan-simplify-assign_tag-and-set_tag-calls.patch
* kasan-clarify-comment-in-__kasan_kfree_large.patch
* kasan-sanitize-objects-when-metadata-doesnt-fit.patch
* kasan-mm-allow-cache-merging-with-no-metadata.patch
* kasan-update-documentation.patch
* mm-slub-call-account_slab_page-after-slab-page-initialization.patch
* mm-memcg-slab-pre-allocate-obj_cgroups-for-slab-caches-with-slab_account.patch
* lib-zlib-fix-inflating-zlib-streams-on-s390.patch
* selftests-vm-fix-building-protection-keys-test.patch
*

Re: [PATCH v2 1/2] mm: cma: allocate cma areas bottom-up

2020-12-22 Thread Andrew Morton

On Mon, 21 Dec 2020 09:05:51 -0800 Roman Gushchin  wrote:

> Subject: [PATCH v3 1/2] mm: cma: allocate cma areas bottom-up

i386 allmodconfig:

In file included from ./include/vdso/const.h:5,
 from ./include/linux/const.h:4,
 from ./include/linux/bits.h:5,
 from ./include/linux/bitops.h:6,
 from ./include/linux/kernel.h:11,
 from ./include/asm-generic/bug.h:20,
 from ./arch/x86/include/asm/bug.h:93,
 from ./include/linux/bug.h:5,
 from ./include/linux/mmdebug.h:5,
 from ./include/linux/mm.h:9,
 from ./include/linux/memblock.h:13,
 from mm/cma.c:24:
mm/cma.c: In function ‘cma_declare_contiguous_nid’:
./include/uapi/linux/const.h:20:19: warning: conversion from ‘long long unsigned int’ to ‘phys_addr_t’ {aka ‘unsigned int’} changes value from ‘4294967296’ to ‘0’ [-Woverflow]
 #define __AC(X,Y) (X##Y)
   ^~
./include/uapi/linux/const.h:21:18: note: in expansion of macro ‘__AC’
 #define _AC(X,Y) __AC(X,Y)
  ^~~~
./include/linux/sizes.h:46:18: note: in expansion of macro ‘_AC’
 #define SZ_4G    _AC(0x100000000, ULL)
  ^~~
mm/cma.c:349:53: note: in expansion of macro ‘SZ_4G’
addr = memblock_alloc_range_nid(size, alignment, SZ_4G,
 ^
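
For anyone who wants to see the truncation outside the kernel build, here is
a minimal user-space sketch. The typedef and helper names are stand-ins
invented for this example, and the clamp is only one possible way to avoid
the wrap, not necessarily what the patch should do.

```c
#include <stdint.h>

#define EX_SZ_4G 0x100000000ULL	/* 4 GiB, i.e. 4294967296 as in the warning */

typedef uint32_t ex_phys_addr_t;	/* stand-in for a 32-bit phys_addr_t */

/* Plain narrowing conversion: this is what the warning is describing. */
static ex_phys_addr_t ex_narrow(unsigned long long v)
{
	return (ex_phys_addr_t)v;
}

/* Hypothetical alternative: clamp to the largest representable address. */
static ex_phys_addr_t ex_clamp(unsigned long long v)
{
	return v > UINT32_MAX ? UINT32_MAX : (ex_phys_addr_t)v;
}
```

With a 32-bit phys_addr_t the 4 GiB limit becomes 0 after narrowing, so any
range check against it would be comparing against the wrong bound.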

Re: [PATCH] riscv: return -ENOSYS for syscall -1

2020-12-22 Thread Palmer Dabbelt


On Tue, 22 Dec 2020 08:22:19 PST (-0800), tycho@tycho.pizza wrote:

On Mon, Dec 21, 2020 at 11:52:00PM +0100, Andreas Schwab wrote:

Properly return -ENOSYS for syscall -1 instead of leaving the return value
uninitialized.  This fixes the strace testsuite.

Fixes: 5340627e3fe0 ("riscv: add support for SECCOMP and SECCOMP_FILTER")
Signed-off-by: Andreas Schwab 
---
 arch/riscv/kernel/entry.S | 9 +
 1 file changed, 1 insertion(+), 8 deletions(-)

diff --git a/arch/riscv/kernel/entry.S b/arch/riscv/kernel/entry.S
index 524d918f3601..d07763001eb0 100644
--- a/arch/riscv/kernel/entry.S
+++ b/arch/riscv/kernel/entry.S
@@ -186,14 +186,7 @@ check_syscall_nr:
 * Syscall number held in a7.
 * If syscall number is above allowed value, redirect to ni_syscall.
 */
-   bge a7, t0, 1f
-   /*
-* Check if syscall is rejected by tracer, i.e., a7 == -1.
-* If yes, we pretend it was executed.
-*/
-   li t1, -1
-   beq a7, t1, ret_from_syscall_rejected
-   blt a7, t1, 1f
+   bgeu a7, t0, 1f


IIUC, this is all dead code anyway for the path where seccomp actually
rejects the syscall, since it should do the rejection directly in
handle_syscall_trace_enter(), which is called above this hunk. So it
seems good to me.

Reviewed-by: Tycho Andersen 


Thanks, this is on fixes.
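
As an aside, the reason a single bgeu is enough can be shown with a small C
sketch. This is user-space illustration only; the helper names and the table
size of 450 used when calling them are made up. Reinterpreting -1 as
unsigned turns it into the largest possible value, so one unsigned compare
rejects both negative and too-large syscall numbers.

```c
#include <stdbool.h>

/* Signed compare (like bge): -1 is below the limit, so it is accepted. */
static bool ex_in_range_signed(long nr, long limit)
{
	return nr < limit;
}

/* Unsigned compare (like bgeu): -1 becomes ULONG_MAX and is rejected. */
static bool ex_in_range_unsigned(long nr, unsigned long limit)
{
	return (unsigned long)nr < limit;
}
```

Anything the unsigned check rejects would then take the ni_syscall path and
get -ENOSYS, as described in the commit message.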

Re: [PATCH] HID: Add Wireless Radio Control feature for Chicony devices

2020-12-22 Thread Jian-Hong Pan

On Wed, Dec 23, 2020 at 12:41 AM Chris Chiu wrote:
>
> On Tue, Dec 22, 2020 at 3:41 PM Jian-Hong Pan  wrote:
> >
> > Some Chicony's keyboards support airplane mode hotkey (Fn+F2) with
> > "Wireless Radio Control" feature. For example, the wireless keyboard
> > [04f2:1236] shipped with ASUS all-in-one desktop.
> >
> > After consulting Chicony for this hotkey, learned the device will send
> > with 0x11 as the report ID and 0x1 as the value when the key is pressed
> > down.
> >
> > This patch maps the event as KEY_RFKILL.
> >
> > Signed-off-by: Jian-Hong Pan 
> > ---
> >  drivers/hid/hid-chicony.c | 58 +++
> >  drivers/hid/hid-ids.h |  1 +
> >  2 files changed, 59 insertions(+)
> >
> > diff --git a/drivers/hid/hid-chicony.c b/drivers/hid/hid-chicony.c
> > index 3f0ed6a95223..aca963aa0f1e 100644
> > --- a/drivers/hid/hid-chicony.c
> > +++ b/drivers/hid/hid-chicony.c
> > @@ -21,6 +21,42 @@
> >
> >  #include "hid-ids.h"
> >
> > +#define KEY_PRESSED0x01
> > +#define CH_WIRELESS_CTL_REPORT_ID  0x11
> > +
> > +static int ch_report_wireless(struct hid_report *report, u8 *data, int 
> > size)
> > +{
> > +   struct hid_device *hdev = report->device;
> > +   struct input_dev *input;
> > +
> > +   if (report->id != CH_WIRELESS_CTL_REPORT_ID ||
> > +   report->maxfield != 1 ||
> > +   *report->field[0]->value != KEY_PRESSED)
>
> Maybe replace this line with hid_check_keys_pressed() and the KEY_PRESSED
> is not required.

Thanks for your suggestion!

I tried hid_check_keys_pressed(), but it always reports that no key is
pressed in this case.
However, if the idea is that an incoming report already implies an input
event, then the key-press check is redundant.
That makes sense.  I will modify the patch accordingly.

Thanks!
Jian-Hong Pan

> > +   return 0;
> > +
> > +   input = report->field[0]->hidinput->input;
> > +   if (!input) {
> > +   hid_warn(hdev, "can't find wireless radio control's input");
> > +   return 0;
> > +   }
> > +
> > +   input_report_key(input, KEY_RFKILL, 1);
> > +   input_sync(input);
> > +   input_report_key(input, KEY_RFKILL, 0);
> > +   input_sync(input);
> > +
> > +   return 1;
> > +}
> > +
> > +static int ch_raw_event(struct hid_device *hdev,
> > +   struct hid_report *report, u8 *data, int size)
> > +{
> > +   if (report->application == HID_GD_WIRELESS_RADIO_CTLS)
> > +   return ch_report_wireless(report, data, size);
> > +
> > +   return 0;
> > +}
> > +
> >  #define ch_map_key_clear(c)hid_map_usage_clear(hi, usage, bit, max, \
> > EV_KEY, (c))
> >  static int ch_input_mapping(struct hid_device *hdev, struct hid_input *hi,
> > @@ -77,10 +113,30 @@ static __u8 *ch_switch12_report_fixup(struct 
> > hid_device *hdev, __u8 *rdesc,
> > return rdesc;
> >  }
> >
> > +static int ch_probe(struct hid_device *hdev, const struct hid_device_id 
> > *id)
> > +{
> > +   int ret;
> > +
> > +   hdev->quirks |= HID_QUIRK_INPUT_PER_APP;
> > +   ret = hid_parse(hdev);
> > +   if (ret) {
> > +   hid_err(hdev, "Chicony hid parse failed: %d\n", ret);
> > +   return ret;
> > +   }
> > +
> > +   ret = hid_hw_start(hdev, HID_CONNECT_DEFAULT);
> > +   if (ret) {
> > +   hid_err(hdev, "Chicony hw start failed: %d\n", ret);
> > +   return ret;
> > +   }
> > +
> > +   return 0;
> > +}
> >
> >  static const struct hid_device_id ch_devices[] = {
> > { HID_USB_DEVICE(USB_VENDOR_ID_CHICONY, 
> > USB_DEVICE_ID_CHICONY_TACTICAL_PAD) },
> > { HID_USB_DEVICE(USB_VENDOR_ID_CHICONY, 
> > USB_DEVICE_ID_CHICONY_WIRELESS2) },
> > +   { HID_USB_DEVICE(USB_VENDOR_ID_CHICONY, 
> > USB_DEVICE_ID_CHICONY_WIRELESS3) },
> > { HID_USB_DEVICE(USB_VENDOR_ID_CHICONY, 
> > USB_DEVICE_ID_CHICONY_ACER_SWITCH12) },
> > { }
> >  };
> > @@ -91,6 +147,8 @@ static struct hid_driver ch_driver = {
> > .id_table = ch_devices,
> > .report_fixup = ch_switch12_report_fixup,
> > .input_mapping = ch_input_mapping,
> > +   .probe = ch_probe,
> > +   .raw_event = ch_raw_event,
> >  };
> >  module_hid_driver(ch_driver);
> >
> > diff --git a/drivers/hid/hid-ids.h b/drivers/hid/hid-ids.h
> > index 4c5f23640f9c..06d90301a3dc 100644
> > --- a/drivers/hid/hid-ids.h
> > +++ b/drivers/hid/hid-ids.h
> > @@ -270,6 +270,7 @@
> >  #define USB_DEVICE_ID_CHICONY_PIXART_USB_OPTICAL_MOUSE 0x1053
> >  #define USB_DEVICE_ID_CHICONY_PIXART_USB_OPTICAL_MOUSE20x0939
> >  #define USB_DEVICE_ID_CHICONY_WIRELESS20x1123
> > +#define USB_DEVICE_ID_CHICONY_WIRELESS30x1236
> >  #define USB_DEVICE_ID_ASUS_AK1D0x1125
> >  #define USB_DEVICE_ID_CHICONY_TOSHIBA_WT10A0x1408
> >  #define USB_DEVICE_ID_CHICONY_ACER_SWITCH120x1421
> > --
> > 2.29.2
> >
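
For reference, the check described in the commit message (report ID 0x11,
value 0x1 on key press) can be sketched on a raw byte buffer as below. This
is a user-space illustration, not the driver code: the helper name is
hypothetical, and the layout (report ID in byte 0, key state in byte 1) is
an assumption that the thread does not confirm.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define EX_WIRELESS_CTL_REPORT_ID 0x11
#define EX_KEY_PRESSED            0x01

/*
 * Assumed layout: data[0] = report ID, data[1] = key state.
 * Returns true only for a "pressed" wireless radio control report.
 */
static bool ex_is_rfkill_press(const uint8_t *data, size_t size)
{
	if (!data || size < 2)
		return false;

	return data[0] == EX_WIRELESS_CTL_REPORT_ID &&
	       data[1] == EX_KEY_PRESSED;
}
```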

Re: [RFC PATCH 1/3] mm: support hugetlb free page reporting

2020-12-22 Thread Liang Li

> On 12/21/20 11:46 PM, Liang Li wrote:
> > Free page reporting only supports buddy pages, it can't report the
> > free pages reserved for hugetlbfs case. On the other hand, hugetlbfs
> > is a good choice for a system with a huge amount of RAM, because it
> > can help to reduce the memory management overhead and improve system
> > performance.
> > This patch add the support for reporting hugepages in the free list
> > of hugetlb, it canbe used by virtio_balloon driver for memory
> > overcommit and pre zero out free pages for speeding up memory population.
>
> My apologies as I do not follow virtio_balloon driver.  Comments from
> the hugetlb perspective.

Any comments are welcome.


> >  static struct page *dequeue_huge_page_node_exact(struct hstate *h, int nid)
> > @@ -5531,6 +5537,29 @@ follow_huge_pgd(struct mm_struct *mm, unsigned long 
> > address, pgd_t *pgd, int fla
> >   return pte_page(*(pte_t *)pgd) + ((address & ~PGDIR_MASK) >> 
> > PAGE_SHIFT);
> >  }
> >
> > +bool isolate_free_huge_page(struct page *page, struct hstate *h, int nid)
>
> Looks like this always returns true.  Should it be type void?

will change in the next revision.

> > +{
> > + bool ret = true;
> > +
> > + VM_BUG_ON_PAGE(!PageHead(page), page);
> > +
> > + list_move(&page->lru, &h->hugepage_activelist);
> > + set_page_refcounted(page);
> > + h->free_huge_pages--;
> > + h->free_huge_pages_node[nid]--;
> > +
> > + return ret;
> > +}
> > +
>
> ...

> > +static void
> > +hugepage_reporting_drain(struct page_reporting_dev_info *prdev,
> > +  struct hstate *h, struct scatterlist *sgl,
> > +  unsigned int nents, bool reported)
> > +{
> > + struct scatterlist *sg = sgl;
> > +
> > + /*
> > +  * Drain the now reported pages back into their respective
> > +  * free lists/areas. We assume at least one page is populated.
> > +  */
> > + do {
> > + struct page *page = sg_page(sg);
> > +
> > + putback_isolate_huge_page(h, page);
> > +
> > + /* If the pages were not reported due to error skip flagging 
> > */
> > + if (!reported)
> > + continue;
> > +
> > + __SetPageReported(page);
> > + } while ((sg = sg_next(sg)));
> > +
> > + /* reinitialize scatterlist now that it is empty */
> > + sg_init_table(sgl, nents);
> > +}
> > +
> > +/*
> > + * The page reporting cycle consists of 4 stages, fill, report, drain, and
> > + * idle. We will cycle through the first 3 stages until we cannot obtain a
> > + * full scatterlist of pages, in that case we will switch to idle.
> > + */
>
> As mentioned, I am not familiar with virtio_balloon and the overall design.
> So, some of this does not make sense to me.
>
> > +static int
> > +hugepage_reporting_cycle(struct page_reporting_dev_info *prdev,
> > +  struct hstate *h, unsigned int nid,
> > +  struct scatterlist *sgl, unsigned int *offset)
> > +{
> > + struct list_head *list = &h->hugepage_freelists[nid];
> > + unsigned int page_len = PAGE_SIZE << h->order;
> > + struct page *page, *next;
> > + long budget;
> > + int ret = 0, scan_cnt = 0;
> > +
> > + /*
> > +  * Perform early check, if free area is empty there is
> > +  * nothing to process so we can skip this free_list.
> > +  */
> > + if (list_empty(list))
> > + return ret;
>
> Do note that not all entries on the hugetlb free lists are free.  Reserved
> entries are also on the free list.  The actual number of free entries is
> 'h->free_huge_pages - h->resv_huge_pages'.
> Is the intention to process reserved pages as well as free pages?

Yes, reserved pages are treated as 'free pages'.

> > +
> > + spin_lock_irq(&hugetlb_lock);
> > +
> > + if (huge_page_order(h) > MAX_ORDER)
> > + budget = HUGEPAGE_REPORTING_CAPACITY;
> > + else
> > + budget = HUGEPAGE_REPORTING_CAPACITY * 32;
> > +
> > + /* loop through free list adding unreported pages to sg list */
> > + list_for_each_entry_safe(page, next, list, lru) {
> > + /* We are going to skip over the reported pages. */
> > + if (PageReported(page)) {
> > + if (++scan_cnt >= MAX_SCAN_NUM) {
> > + ret = scan_cnt;
> > + break;
> > + }
> > + continue;
> > + }
> > +
> > + /*
> > +  * If we fully consumed our budget then update our
> > +  * state to indicate that we are requesting additional
> > +  * processing and exit this list.
> > +  */
> > + if (budget < 0) {
> > + atomic_set(&prdev->state, PAGE_REPORTING_REQUESTED);
> > + next = page;
> > + break;
> > + }
> > +
> > + /* Attempt to pull page from list and place in

Re: [PATCH 2/3] aspeed-video: clear spurious interrupt bits unconditionally

2020-12-22 Thread Zev Weiss

On Tue, Dec 22, 2020 at 08:53:33PM CST, Ryan Chen wrote:

-Original Message-
From: Joel Stanley 
Sent: Wednesday, December 23, 2020 9:07 AM
To: Zev Weiss ; Ryan Chen

Cc: Eddie James ; Mauro Carvalho Chehab
; Andrew Jeffery ;
linux-me...@vger.kernel.org; OpenBMC Maillist ;
Linux ARM ; linux-aspeed
; Linux Kernel Mailing List
; Jae Hyun Yoo 
Subject: Re: [PATCH 2/3] aspeed-video: clear spurious interrupt bits
unconditionally

On Tue, 22 Dec 2020 at 19:14, Zev Weiss  wrote:
>
> On Mon, Dec 21, 2020 at 10:47:37PM CST, Joel Stanley wrote:
> >On Tue, 15 Dec 2020 at 02:46, Zev Weiss  wrote:
> >>
> >> Instead of testing and conditionally clearing them one by one, we
> >> can instead just unconditionally clear them all at once.
> >>
> >> Signed-off-by: Zev Weiss 
> >
> >I had a poke at the assembly and it looks like GCC is clearing the
> >bits unconditionally anyway, so removing the tests provides no change.
> >
> >Combining them is a good further optimization.
> >
> >Reviewed-by: Joel Stanley 
> >
> >A question unrelated to this patch: Do you know why the driver
> >doesn't clear the status bits in the interrupt handler? I would
> >expect it to write the value of sts back to the register to ack the
> >pending interrupt.
> >
>
> No, I don't, and I was sort of wondering the same thing actually --
> I'm not deeply familiar with this hardware or driver though, so I was
> a bit hesitant to start messing with things.  (Though maybe doing so
> would address the "stickiness" aspect when it does manifest.)  Perhaps
> Eddie or Jae can shed some light here?

I think you're onto something here - this would be why the status bits seem to
stick until the device is reset.

Until Aspeed can clarify if this is a hardware or software issue, I suggest we ack
the bits and log a message when we see them, instead of always ignoring them
without taking any action.

Can you write a patch that changes the interrupt handler to ack status bits as it
handles each of them?

Hello Zev, before this patch, did you run into an issue with the irq handler?
[Interrupts continuously incoming?]

The aspeed_video_irq handler should only handle interrupts that are enabled:
  u32 sts = aspeed_video_read(video, VE_INTERRUPT_STATUS);
+ sts &= aspeed_video_read(video, VE_INTERRUPT_CTRL);

Ryan

Hi Ryan,

Prior to any of these patches I encountered a problem pretty much 
exactly like what Jae described in his commit message in 65d270acb2d 
(but the kernel I was running included that patch).  Adding the 
diagnostic in patch #1 of this series showed that it was apparently the 
same problem, just with a different interrupt that Jae's patch didn't 
include.

From what you wrote above, I gather that it is in fact expected for the 
hardware to assert interrupts that aren't enabled in VE_INTERRUPT_CTRL?  
If so, I guess something like that would obviate the need for both Jae's 
earlier patch and this whole series.

I think the question Joel raised is somewhat independent though -- if 
the VE_INTERRUPT_STATUS register asserts interrupts we're not actually 
using, should the driver acknowledge them anyway or just leave them 
alone?  (Though if we're just going to ignore them anyway maybe it 
doesn't ultimately matter very much.)

Zev
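
To illustrate the masking Ryan suggests above, here is a small user-space
sketch. The helper names and any bit values used with them are invented for
the example and do not correspond to real VE_INTERRUPT_* definitions. The
first helper keeps only the status bits that are enabled in the control
register; the second isolates the bits that are asserted without being
enabled, which are the ones this series calls spurious.

```c
#include <stdint.h>

/* Status bits whose interrupt is enabled in the control register. */
static uint32_t ex_enabled_status(uint32_t sts, uint32_t ctrl)
{
	return sts & ctrl;
}

/* Status bits that are asserted although they are not enabled. */
static uint32_t ex_spurious_status(uint32_t sts, uint32_t ctrl)
{
	return sts & ~ctrl;
}
```

Whether the second set should be acknowledged or simply left alone is the
open question in this thread; the sketch only shows how the two sets are
separated.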

[PATCH] checkpatch: Prefer strscpy to strlcpy

2020-12-22 Thread Joe Perches

Prefer strscpy over the deprecated strlcpy function.

Requested-by: Andrew Morton 
Signed-off-by: Joe Perches 
---
 scripts/checkpatch.pl | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index 00085308ed9d..27679cc0ec17 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -6646,6 +6646,12 @@ sub process {
 #  }
 #  }
 
+# strlcpy uses that should likely be strscpy
+   if ($line =~ /\bstrlcpy\s*\(/) {
+   WARN("STRLCPY",
+                            "Prefer strscpy over strlcpy - see: https://lore.kernel.org/r/CAHk-=wgfRnXz0W3D37d01q3JFkr_i_uTL=V6A6G1oUZcprmknw\@mail.gmail.com/\n" . $herecurr);
+   }
+
 # typecasts on min/max could be min_t/max_t
if ($perl_version_ok &&
defined $stat &&
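
For context on why strscpy is preferred, here is a user-space sketch of
strscpy-style semantics. It is not the kernel implementation, and
ex_strscpy is a made-up name. The point it illustrates: the copy never reads
the source past the destination size, always NUL-terminates a non-empty
buffer, and reports truncation as -E2BIG, whereas strlcpy returns
strlen(src) and therefore has to walk the whole source string.

```c
#include <errno.h>
#include <stddef.h>

/*
 * Copy src into dst, which holds size bytes.
 * Returns the number of characters copied (without the NUL),
 * or -E2BIG if size is 0 or src had to be truncated.
 */
static long ex_strscpy(char *dst, const char *src, size_t size)
{
	size_t i;

	if (size == 0)
		return -E2BIG;

	/* Stop at the terminator or when only room for the NUL is left. */
	for (i = 0; i < size - 1 && src[i] != '\0'; i++)
		dst[i] = src[i];
	dst[i] = '\0';

	return src[i] == '\0' ? (long)i : -E2BIG;
}
```

A caller can then test for a negative return value instead of comparing the
result against the buffer size, which is the usual strlcpy idiom.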

Re: [PATCH] vdpa_sim: use iova module to allocate IOVA addresses

2020-12-22 Thread Jason Wang




On 2020/12/23 1:45 AM, Stefano Garzarella wrote:

The identical mapping used until now created issues when mapping
different virtual pages with the same physical address.
To solve this issue, we can use the iova module, to handle the IOVA
allocation.
For semplicity we use an IOVA allocator with byte granularity.



That should be "simplicity"; the same goes for one comment below.




We add two new functions, vdpasim_map_range() and vdpasim_unmap_range(),
to handle the IOVA allocation and the registration into the IOMMU/IOTLB.
These functions are used by dma_map_ops callbacks.

Signed-off-by: Stefano Garzarella 



Few nits, but:

Acked-by: Jason Wang 
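
To illustrate the problem described in the commit message, here is a toy
user-space sketch. It is not the iova module API: the struct and both
helpers are invented for the example, and the bump allocator merely stands
in for a real IOVA allocator. With an identity mapping, two mappings of the
same physical address produce the same I/O address and therefore collide;
with an allocator each mapping gets its own address.

```c
#include <stdint.h>

/* Identity mapping: the I/O address is simply the physical address. */
static uint64_t ex_identity_map(uint64_t paddr)
{
	return paddr;
}

/* Toy bump allocator with byte granularity (hypothetical). */
struct ex_iova_space {
	uint64_t next;	/* next unused I/O virtual address */
};

static uint64_t ex_iova_map(struct ex_iova_space *s, uint64_t paddr,
			    uint64_t size)
{
	uint64_t iova = s->next;

	(void)paddr;	/* a real implementation would record iova -> paddr */
	s->next += size;
	return iova;
}
```

In the patch itself the translation is registered in the IOTLB via
vhost_iotlb_add_range(), which the sketch leaves out.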



---
  drivers/vdpa/vdpa_sim/vdpa_sim.h |   2 +
  drivers/vdpa/vdpa_sim/vdpa_sim.c | 108 +++
  drivers/vdpa/Kconfig |   1 +
  3 files changed, 69 insertions(+), 42 deletions(-)

diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.h b/drivers/vdpa/vdpa_sim/vdpa_sim.h
index b02142293d5b..6efe205e583e 100644
--- a/drivers/vdpa/vdpa_sim/vdpa_sim.h
+++ b/drivers/vdpa/vdpa_sim/vdpa_sim.h
@@ -6,6 +6,7 @@
  #ifndef _VDPA_SIM_H
  #define _VDPA_SIM_H
  
+#include 

  #include 
  #include 
  #include 
@@ -55,6 +56,7 @@ struct vdpasim {
/* virtio config according to device type */
void *config;
struct vhost_iotlb *iommu;
+   struct iova_domain iova;
void *buffer;
u32 status;
u32 generation;
diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c
index b3fcc67bfdf0..341b9daf2ea4 100644
--- a/drivers/vdpa/vdpa_sim/vdpa_sim.c
+++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c
@@ -17,6 +17,7 @@
  #include 
  #include 
  #include 
+#include 
  
  #include "vdpa_sim.h"
  
@@ -128,30 +129,57 @@ static int dir_to_perm(enum dma_data_direction dir)

return perm;
  }
  
+static dma_addr_t vdpasim_map_range(struct vdpasim *vdpasim, phys_addr_t paddr,

+   size_t size, unsigned int perm)
+{
+   struct iova *iova;
+   dma_addr_t dma_addr;
+   int ret;
+
+   /* We set the limit_pfn to the maximum (~0UL - 1) */
+   iova = alloc_iova(&vdpasim->iova, size, ~0UL - 1, true);



Let's use ULONG_MAX?



+   if (!iova)
+   return DMA_MAPPING_ERROR;
+
+   dma_addr = iova_dma_addr(&vdpasim->iova, iova);
+
+   spin_lock(&vdpasim->iommu_lock);
+   ret = vhost_iotlb_add_range(vdpasim->iommu, (u64)dma_addr,
+   (u64)dma_addr + size - 1, (u64)paddr, perm);
+   spin_unlock(&vdpasim->iommu_lock);
+
+   if (ret) {
+   __free_iova(&vdpasim->iova, iova);
+   return DMA_MAPPING_ERROR;
+   }
+
+   return dma_addr;
+}
+
+static void vdpasim_unmap_range(struct vdpasim *vdpasim, dma_addr_t dma_addr,
+   size_t size)
+{
+   spin_lock(&vdpasim->iommu_lock);
+   vhost_iotlb_del_range(vdpasim->iommu, (u64)dma_addr,
+ (u64)dma_addr + size - 1);
+   spin_unlock(&vdpasim->iommu_lock);
+
+   free_iova(&vdpasim->iova, iova_pfn(&vdpasim->iova, dma_addr));
+}
+
  static dma_addr_t vdpasim_map_page(struct device *dev, struct page *page,
   unsigned long offset, size_t size,
   enum dma_data_direction dir,
   unsigned long attrs)
  {
struct vdpasim *vdpasim = dev_to_sim(dev);
-   struct vhost_iotlb *iommu = vdpasim->iommu;
-   u64 pa = (page_to_pfn(page) << PAGE_SHIFT) + offset;
-   int ret, perm = dir_to_perm(dir);
+   phys_addr_t paddr = page_to_phys(page) + offset;
+   int perm = dir_to_perm(dir);
  
  	if (perm < 0)

return DMA_MAPPING_ERROR;
  
-	/* For simplicity, use identical mapping to avoid e.g iova

-* allocator.
-*/
-   spin_lock(&vdpasim->iommu_lock);
-   ret = vhost_iotlb_add_range(iommu, pa, pa + size - 1,
-   pa, dir_to_perm(dir));
-   spin_unlock(&vdpasim->iommu_lock);
-   if (ret)
-   return DMA_MAPPING_ERROR;
-
-   return (dma_addr_t)(pa);
+   return vdpasim_map_range(vdpasim, paddr, size, perm);
  }
  
  static void vdpasim_unmap_page(struct device *dev, dma_addr_t dma_addr,

@@ -159,12 +187,8 @@ static void vdpasim_unmap_page(struct device *dev, 
dma_addr_t dma_addr,
   unsigned long attrs)
  {
struct vdpasim *vdpasim = dev_to_sim(dev);
-   struct vhost_iotlb *iommu = vdpasim->iommu;
  
-	spin_lock(>iommu_lock);

-   vhost_iotlb_del_range(iommu, (u64)dma_addr,
- (u64)dma_addr + size - 1);
-   spin_unlock(>iommu_lock);
+   vdpasim_unmap_range(vdpasim, dma_addr, size);
  }
  
  static void *vdpasim_alloc_coherent(struct device *dev, size_t size,

@@ -172,27 +196,22 @@ static void *vdpasim_alloc_coherent(struct device *dev, 
size_t size,
unsigned long attrs)
  {
struct vdpasim *vdpasim = dev_to_sim(dev);
-   struct vhost_iotlb *iommu =

Re: [RFC PATCH 1/3] mm: support hugetlb free page reporting

2020-12-22 Thread Liang Li

> On 12/22/20 11:59 AM, Alexander Duyck wrote:
> > On Mon, Dec 21, 2020 at 11:47 PM Liang Li  
> > wrote:
> >> +
> >> +   if (huge_page_order(h) > MAX_ORDER)
> >> +   budget = HUGEPAGE_REPORTING_CAPACITY;
> >> +   else
> >> +   budget = HUGEPAGE_REPORTING_CAPACITY * 32;
> >
> > Wouldn't huge_page_order always be more than MAX_ORDER? Seems like we
> > don't even really need budget since this should probably be pulling
> > out no more than one hugepage at a time.
>
> On standard x86_64 configs, 2MB huge pages are of order 9 < MAX_ORDER (11).
> What is important for hugetlb is the largest order that can be allocated
> from buddy.  Anything bigger is considered a gigantic page and has to be
> allocated differently.
>
> If the code above is trying to distinguish between huge and gigantic pages,
> it is off by 1.  The largest order that can be allocated from the buddy is
> (MAX_ORDER - 1).  So, the check should be '>='.
>
> --
> Mike Kravetz

Yes, you're right!  thanks

Liang
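
The off-by-one that Mike describes can be illustrated with a minimal
userspace sketch. It is not the patch's code: MAX_ORDER is hard-coded to 11
(the value quoted above for standard x86_64 configs), the helper names are
hypothetical, and the orders 9 (2MB) and 18 (1GB) assume 4KB base pages.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: MAX_ORDER is 11, as quoted above for standard x86_64 configs. */
#define MAX_ORDER 11

/* Hypothetical helper: the check as posted in the RFC ('>'). */
bool is_gigantic_as_posted(unsigned int order)
{
	return order > MAX_ORDER;
}

/*
 * Hypothetical helper: the check with the suggested fix ('>=').
 * The largest order the buddy allocator can serve is MAX_ORDER - 1,
 * so order == MAX_ORDER already counts as gigantic.
 */
bool is_gigantic_fixed(unsigned int order)
{
	return order >= MAX_ORDER;
}

/* Self-check covering the 2MB, boundary, and 1GB cases. */
void check_gigantic_examples(void)
{
	assert(!is_gigantic_fixed(9));              /* 2MB page, from buddy */
	assert(is_gigantic_fixed(18));              /* 1GB page, gigantic   */
	assert(is_gigantic_fixed(MAX_ORDER));       /* boundary: gigantic   */
	assert(!is_gigantic_as_posted(MAX_ORDER));  /* posted check misses it */
}
```

Both checks agree for 2MB and 1GB pages; they differ only at order ==
MAX_ORDER, which is the case the '>=' form is meant to catch.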

Re: [RFC PATCH 1/3] mm: support hugetlb free page reporting

2020-12-22 Thread Liang Li

> > +hugepage_reporting_cycle(struct page_reporting_dev_info *prdev,
> > +struct hstate *h, unsigned int nid,
> > +struct scatterlist *sgl, unsigned int *offset)
> > +{
> > +   struct list_head *list = &h->hugepage_freelists[nid];
> > +   unsigned int page_len = PAGE_SIZE << h->order;
> > +   struct page *page, *next;
> > +   long budget;
> > +   int ret = 0, scan_cnt = 0;
> > +
> > +   /*
> > +* Perform early check, if free area is empty there is
> > +* nothing to process so we can skip this free_list.
> > +*/
> > +   if (list_empty(list))
> > +   return ret;
> > +
> > +   spin_lock_irq(&hugetlb_lock);
> > +
> > +   if (huge_page_order(h) > MAX_ORDER)
> > +   budget = HUGEPAGE_REPORTING_CAPACITY;
> > +   else
> > +   budget = HUGEPAGE_REPORTING_CAPACITY * 32;
>
> Wouldn't huge_page_order always be more than MAX_ORDER? Seems like we
> don't even really need budget since this should probably be pulling
> out no more than one hugepage at a time.

I want to distinguish between 2MB and 1GB pages here. The order of a 1GB
page is greater than MAX_ORDER, while the order of a 2MB page is less than
MAX_ORDER.

>
> > +   /* loop through free list adding unreported pages to sg list */
> > +   list_for_each_entry_safe(page, next, list, lru) {
> > +   /* We are going to skip over the reported pages. */
> > +   if (PageReported(page)) {
> > +   if (++scan_cnt >= MAX_SCAN_NUM) {
> > +   ret = scan_cnt;
> > +   break;
> > +   }
> > +   continue;
> > +   }
> > +
>
> It would probably have been better to place this set before your new
> set. I don't see your new set necessarily being the best use for page
> reporting.

I haven't quite understood what you mean. Could you explain it again?

>
> > +   /*
> > +* If we fully consumed our budget then update our
> > +* state to indicate that we are requesting additional
> > +* processing and exit this list.
> > +*/
> > +   if (budget < 0) {
> > +   atomic_set(&prdev->state, PAGE_REPORTING_REQUESTED);
> > +   next = page;
> > +   break;
> > +   }
> > +
>
> If budget is only ever going to be 1 then we probably could just look
> at making this the default case for any time we find a non-reported
> page.

Same here, could you explain this one again?

> > +   /* Attempt to pull page from list and place in scatterlist */
> > +   if (*offset) {
> > +   isolate_free_huge_page(page, h, nid);
> > +   /* Add page to scatter list */
> > +   --(*offset);
> > +   sg_set_page(&sgl[*offset], page, page_len, 0);
> > +
> > +   continue;
> > +   }
> > +
>
> There is no point in the continue case if we only have a budget of 1.
> We should probably just tighten up the loop so that all it does is
> search until it finds the 1 page it can pull, pull it, and then return
> it. The scatterlist doesn't serve much purpose and could be reduced to
> just a single entry.

I will think about it more.

> > +static int
> > +hugepage_reporting_process_hstate(struct page_reporting_dev_info *prdev,
> > +   struct scatterlist *sgl, struct hstate *h)
> > +{
> > +   unsigned int leftover, offset = HUGEPAGE_REPORTING_CAPACITY;
> > +   int ret = 0, nid;
> > +
> > +   for (nid = 0; nid < MAX_NUMNODES; nid++) {
> > +   ret = hugepage_reporting_cycle(prdev, h, nid, sgl, &offset);
> > +
> > +   if (ret < 0)
> > +   return ret;
> > +   }
> > +
> > +   /* report the leftover pages before going idle */
> > +   leftover = HUGEPAGE_REPORTING_CAPACITY - offset;
> > +   if (leftover) {
> > +   sgl = &sgl[offset];
> > +   ret = prdev->report(prdev, sgl, leftover);
> > +
> > +   /* flush any remaining pages out from the last report */
> > +   spin_lock_irq(&hugetlb_lock);
> > +   hugepage_reporting_drain(prdev, h, sgl, leftover, !ret);
> > +   spin_unlock_irq(&hugetlb_lock);
> > +   }
> > +
> > +   return ret;
> > +}
> > +
>
> If HUGEPAGE_REPORTING_CAPACITY is 1 it would make more sense to
> rewrite this code to just optimize for a find and process a page
> approach rather than trying to batch pages.

Yes, I will make a change. Thanks for your comments!

Liang

Re: [PATCH v4 2/2] firmware: arm_scmi: Augment SMC/HVC to allow optional interrupt

2020-12-22 Thread Florian Fainelli




On 12/22/2020 6:56 AM, Jim Quinlan wrote:
> The SMC/HVC SCMI transport is modified to allow the completion of an SCMI
> message to be indicated by an interrupt rather than the return of the smc
> call.  This accommodates the existing behavior of the BrcmSTB SCMI
> "platform" whose SW is already out in the field and cannot be changed.
> 
> Signed-off-by: Jim Quinlan 

This looks good to me, just one question below:

[snip]

> @@ -111,6 +145,8 @@ static int smc_send_message(struct scmi_chan_info *cinfo,
>   shmem_tx_prepare(scmi_info->shmem, xfer);
>  
>   arm_smccc_1_1_invoke(scmi_info->func_id, 0, 0, 0, 0, 0, 0, 0, &res);
> + if (scmi_info->irq)
> +         wait_for_completion(&scmi_info->tx_complete);

Do we need this to have a preceding call to reinit_completion()? It does
not look like this is going to make any practical difference but there
are drivers doing that for correctness.
-- 
Florian
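
To make the reinit_completion() question concrete, here is a minimal
userspace model of a completion, reduced to its counter. It is a sketch
under stated assumptions, not the SCMI transport code: the toy_* names are
hypothetical, and the "stray" completion event (one that arrives with no
waiter) is an assumed scenario, not something the patch is known to produce.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the kernel's struct completion: only the counter. */
struct toy_completion {
	unsigned int done;
};

void toy_init_completion(struct toy_completion *c)   { c->done = 0; }
void toy_reinit_completion(struct toy_completion *c) { c->done = 0; }
void toy_complete(struct toy_completion *c)          { c->done++; }

/* Non-blocking model of a wait: consume one event if one is pending. */
bool toy_try_wait(struct toy_completion *c)
{
	if (!c->done)
		return false;
	c->done--;
	return true;
}

/*
 * Returns true if a wait issued right after sending a message would
 * succeed before the reply for that message has been signalled.
 */
bool stale_wakeup_possible(bool reinit_before_send)
{
	struct toy_completion tx;

	toy_init_completion(&tx);
	toy_complete(&tx);              /* assumed stray event, no waiter */

	if (reinit_before_send)
		toy_reinit_completion(&tx);

	/* The message is sent here; its reply has not arrived yet. */
	return toy_try_wait(&tx);
}

/* Self-check: only the variant without reinit sees the stale event. */
void check_completion_examples(void)
{
	assert(stale_wakeup_possible(false));
	assert(!stale_wakeup_possible(true));
}
```

In this model, resetting the counter before each send is what keeps a
leftover event from satisfying the next wait. If no such leftover event
can occur, the reset makes no practical difference.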

Re: [PATCH] mm/userfaultfd: fix memory corruption due to writeprotect

2020-12-22 Thread Yu Zhao

On Tue, Dec 22, 2020 at 09:56:11PM -0500, Andrea Arcangeli wrote:
> On Tue, Dec 22, 2020 at 04:39:46PM -0700, Yu Zhao wrote:
> > We are talking about non-COW anon pages here -- they can't be mapped
> > more than once. So why not just identify them by checking
> > page_mapcount == 1 and then unconditionally reuse them? (This is
> > probably where I've missed things.)
> 
> The problem in depending on page_mapcount to decide if it's COW or
> non-COW (respectively wp_page_copy or wp_page_reuse) is that GUP
> may elevate the count of a COW anon page that becomes a non-COW anon
> page.
> 
> This is Jann's idea not mine.
> 
> The problem is we have an unprivileged long term GUP like vmsplice
> that facilitates elevating the page count indefinitely, until the
> parent finally writes a secret to it. Theoretically a short term pin
> would do it too so it's not just vmsplice, but the short term pin
> would be incredibly more challenging to become a concern since it'd
> kill a phone battery and flash before it can read any data.
> 
> So what happens with your page_mapcount == 1 check is that it doesn't
> mean non-COW (we thought it did until it didn't for the long term gup
> pin in vmsplice).
> 
> Jann's testcase does fork() to set page_mapcount to 2 and page_count to
> 2, vmsplice to take an unprivileged infinitely long GUP pin and set
> page_count to 3, queues the page in the pipe with page_count elevated,
> then munmap to drop page_count to 2 and page_mapcount to 1.
> 
> page_mapcount is 1, so you'd think the page is non-COW and owned by
> the parent, but the child can still read it so it's very much still
> wp_page_copy material if the parent tries to modify it. Otherwise the
> child can read the content.
> 
> This was supposed to be solvable by just doing the COW in gup(write=0)
> case if page_mapcount > 1 with commit 17839856fd58. I'm not exactly
> sure why that didn't fly and it had to be reverted by Peter in
> a308c71bf1e6e19cc2e4ced31853ee0fc7cb439a but at the time this was
> happening I was sidetracked by urgent issues and I didn't manage to
> look back at how we ended up with the big hammer page_count == 1 check
> instead to decide whether to call wp_page_reuse or wp_page_shared.
> 
> So anyway, the only thing that is clear to me is that keeping the
> child from reading the page_mapcount == 1 pages of the parent, is the
> only reason why wp_page_reuse(vmf) will only be called on
> page_count(page) == 1 and not on page_mapcount(page) == 1.
> 
> It's also the reason why your page_mapcount assumption risks
> reintroducing the issue, and I only wish we could put page_mapcount
> == 1 back there.
> 
> Still even if we put back page_mapcount there, it is not ok to leave
> the page fault with stale TLB entries and to rely on the fact
> wp_page_shared won't run. It'd also avoid the problem but I think if
> you leave stale TLB entries in change_protection just like NUMA
> balancing does, it also requires a catcher just like NUMA balancing
> has, or it'd truly work by luck.
> 
> So until we can put a page_mapcount == 1 check back there, the
> page_count will be by definition unreliable because of the speculative
> lookups randomly elevating all non zero page_counts at any time in the
> background on all pages, so you will never be able to tell if a page
> is true COW or if it's just a spurious COW because of a speculative
> lookup. It is impossible to differentiate a speculative lookup from a
> vmsplice ref in a child.

Thanks for the details.

In your patch, do we need to take wrprotect_rwsem in
handle_userfault() as well? Otherwise, it seems userspace would have
to synchronize between its wrprotect ioctl and its fault handler, i.e.,
the fault handler needs to be aware that the content of write-protected
pages can actually change before the ioctl returns.
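
The fork/vmsplice/munmap sequence quoted above can be followed with a small
counter model. This is only an illustrative sketch: struct toy_page and the
toy_* helpers are hypothetical, and they track nothing but the two counters
named in the thread (page_mapcount and page_count), starting from a private
anon page mapped once by the parent.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy page: just the two counters discussed in the thread. */
struct toy_page {
	int mapcount;   /* models page_mapcount */
	int refcount;   /* models page_count    */
};

/* fork(): the child maps the same page, adding a mapping and a reference. */
void toy_fork(struct toy_page *p)    { p->mapcount++; p->refcount++; }

/* vmsplice(): a long term GUP pin adds a reference but no mapping. */
void toy_gup_pin(struct toy_page *p) { p->refcount++; }

/* munmap() in the child: drops its mapping and the mapping's reference. */
void toy_munmap(struct toy_page *p)  { p->mapcount--; p->refcount--; }

/* The two candidate "safe to reuse on write fault" checks. */
bool reuse_by_mapcount(const struct toy_page *p) { return p->mapcount == 1; }
bool reuse_by_refcount(const struct toy_page *p) { return p->refcount == 1; }

/* Replays the sequence and returns the final counters. */
struct toy_page run_testcase_sequence(void)
{
	struct toy_page p = { .mapcount = 1, .refcount = 1 };

	toy_fork(&p);      /* mapcount 2, refcount 2 */
	toy_gup_pin(&p);   /* mapcount 2, refcount 3 */
	toy_munmap(&p);    /* mapcount 1, refcount 2 */
	return p;
}

/* Self-check: the two reuse checks disagree at the end of the sequence. */
void check_refcount_examples(void)
{
	struct toy_page p = run_testcase_sequence();

	assert(reuse_by_mapcount(&p));    /* would reuse: child pin is missed */
	assert(!reuse_by_refcount(&p));   /* would copy: pin is still visible */
}
```

At the end of the sequence the mapcount check would let the parent write in
place while the pipe still holds a reference, whereas the refcount check
forces a copy. The model does not cover the speculative lookups that can
transiently raise page_count.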
