date:20201021

[PATCH v2 1/1] scsi: ufs: Fix unexpected values get from ufshcd_read_desc_param()

2020-10-21 Thread Can Guo

Since WB feature has been added, WB related sysfs entries can be accessed
even when an UFS device does not support WB feature. In that case, the
descriptors which are not supported by the UFS device may be wrongly
reported when they are accessed from their corrsponding sysfs entries.
Fix it by adding a sanity check of parameter offset against the actual
decriptor length.

Signed-off-by: Can Guo 
---
 drivers/scsi/ufs/ufshcd.c | 24 +++-
 1 file changed, 15 insertions(+), 9 deletions(-)

diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
index a2ebcc8..aeec10d 100644
--- a/drivers/scsi/ufs/ufshcd.c
+++ b/drivers/scsi/ufs/ufshcd.c
@@ -3184,13 +3184,19 @@ int ufshcd_read_desc_param(struct ufs_hba *hba,
/* Get the length of descriptor */
ufshcd_map_desc_id_to_length(hba, desc_id, _len);
if (!buff_len) {
-   dev_err(hba->dev, "%s: Failed to get desc length", __func__);
+   dev_err(hba->dev, "%s: Failed to get desc length\n", __func__);
+   return -EINVAL;
+   }
+
+   if (param_offset >= buff_len) {
+   dev_err(hba->dev, "%s: Invalid offset 0x%x in descriptor IDN 
0x%x, length 0x%x\n",
+   __func__, param_offset, desc_id, buff_len);
return -EINVAL;
}
 
/* Check whether we need temp memory */
if (param_offset != 0 || param_size < buff_len) {
-   desc_buf = kmalloc(buff_len, GFP_KERNEL);
+   desc_buf = kzalloc(buff_len, GFP_KERNEL);
if (!desc_buf)
return -ENOMEM;
} else {
@@ -3204,14 +3210,14 @@ int ufshcd_read_desc_param(struct ufs_hba *hba,
desc_buf, _len);
 
if (ret) {
-   dev_err(hba->dev, "%s: Failed reading descriptor. desc_id %d, 
desc_index %d, param_offset %d, ret %d",
+   dev_err(hba->dev, "%s: Failed reading descriptor. desc_id %d, 
desc_index %d, param_offset %d, ret %d\n",
__func__, desc_id, desc_index, param_offset, ret);
goto out;
}
 
/* Sanity check */
if (desc_buf[QUERY_DESC_DESC_TYPE_OFFSET] != desc_id) {
-   dev_err(hba->dev, "%s: invalid desc_id %d in descriptor header",
+   dev_err(hba->dev, "%s: invalid desc_id %d in descriptor 
header\n",
__func__, desc_buf[QUERY_DESC_DESC_TYPE_OFFSET]);
ret = -EINVAL;
goto out;
@@ -3221,12 +3227,12 @@ int ufshcd_read_desc_param(struct ufs_hba *hba,
buff_len = desc_buf[QUERY_DESC_LENGTH_OFFSET];
ufshcd_update_desc_length(hba, desc_id, desc_index, buff_len);
 
-   /* Check wherher we will not copy more data, than available */
-   if (is_kmalloc && (param_offset + param_size) > buff_len)
-   param_size = buff_len - param_offset;
-
-   if (is_kmalloc)
+   if (is_kmalloc) {
+   /* Make sure we don't copy more data than available */
+   if (param_offset + param_size > buff_len)
+   param_size = buff_len - param_offset;
memcpy(param_read_buf, _buf[param_offset], param_size);
+   }
 out:
if (is_kmalloc)
kfree(desc_buf);
-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux 
Foundation Collaborative Project.

Re: [PATCH 0/2] block layer filter and block device snapshot module

2020-10-21 Thread Hannes Reinecke


On 10/21/20 4:10 PM, Sergei Shtepa wrote:

The 10/21/2020 16:31, Hannes Reinecke wrote:

I do understand where you are coming from, but then we already have a
dm-snap which does exactly what you want to achieve.
Of course, that would require a reconfiguration of the storage stack on
the machine, which is not always possible (or desired).


Yes, reconfiguring the storage stack on a machine is almost impossible.



What I _could_ imagine would be a 'dm-intercept' thingie, which
redirects the current submit_bio() function for any block device, and
re-routes that to a linear device-mapper device pointing back to the
original block device.

That way you could attach it to basically any block device, _and_ can
use the existing device-mapper functionality to do fancy stuff once the
submit_io() callback has been re-routed.

And it also would help in other scenarios, too; with such a
functionality we could seamlessly clone devices without having to move
the whole setup to device-mapper first.


Hm...
Did I understand correctly that the filter itself can be left approximately
as it is, but the blk-snap module can be replaced with 'dm-intercept',
which would use the re-route mechanism from the dm?
I think I may be able to implement it, if you describe your idea in more
detail.


Actually, once we have an dm-intercept, why do you need the block-layer 
filter at all?
From you initial description the block-layer filter was implemented 
such that blk-snap could work; but if we have dm-intercept (and with it 
the ability to use device-mapper functionality even for normal block 
devices) there wouldn't be any need for the block-layer filter, no?


Cheers,

Hannes
--
Dr. Hannes ReineckeKernel Storage Architect
h...@suse.de  +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer

[PATCH 3/3] Documentation: introduce a param "irqflood_suppress"

2020-10-21 Thread Pingfan Liu

The param "irqflood_suppress" is helpful for capture kernel to survive irq
flood.

Signed-off-by: Pingfan Liu 
Cc: Thomas Gleixner 
Cc: Peter Zijlstra 
Cc: Jisheng Zhang 
Cc: Andrew Morton 
Cc: "Guilherme G. Piccoli" 
Cc: Petr Mladek 
Cc: Marc Zyngier 
Cc: Linus Walleij 
Cc: afzal mohammed 
Cc: Lina Iyer 
Cc: "Gustavo A. R. Silva" 
Cc: Maulik Shah 
Cc: Al Viro 
Cc: Jonathan Corbet 
Cc: Pawan Gupta 
Cc: Mike Kravetz 
Cc: Oliver Neukum 
To: linux-kernel@vger.kernel.org
Cc: linux-...@vger.kernel.org
Cc: ke...@lists.infradead.org
---
 Documentation/admin-guide/kernel-parameters.txt | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/Documentation/admin-guide/kernel-parameters.txt 
b/Documentation/admin-guide/kernel-parameters.txt
index a106874..0a25a05 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2009,6 +2009,9 @@
for it. Also check all handlers each timer
interrupt. Intended to get systems with badly broken
firmware running.
+   irqflood_suppress   [HW]
+   When a irq fully occupies a cpu in a long time, 
suppressing
+   it to make kernel move on. It is useful in the capture 
kernel.
 
isapnp= [ISAPNP]
Format: ,,,
-- 
2.7.5

[PATCH 1/3] kernel/watchdog: show irq percentage if irq floods

2020-10-21 Thread Pingfan Liu

In kdump case, the capture kernel has no chance to shutdown devices, and
leave the devices in unstable state. Irq flood is one of the consequence.

When irq floods, the kernel is totally occupies by irq.  Currently, there
may be nothing or just soft lockup warning showed in console. It is better
to warn users with irq flood info.

Soft lockup watchdog is a good foundation to implement the detector. A irq
flood is reported if the following two conditions are met:
  -1. the irq occupies too much (98%) of the past interval.
  -2. no tasks has been scheduled in the past interval. This is implemented
  by check_hint.
  A note: is_softlockup() implies the 2nd condition, but unfortunately, irq
  flood can come from anywhere. If irq_enter_rcu()->tick_irq_enter(), then
  touch_softlockup_watchdog_sched() will reset watchdog, and softlockup is
  never detected.

Signed-off-by: Pingfan Liu 
Cc: Thomas Gleixner 
Cc: Peter Zijlstra 
Cc: Jisheng Zhang 
Cc: Andrew Morton 
Cc: "Guilherme G. Piccoli" 
Cc: Petr Mladek 
Cc: Marc Zyngier 
Cc: Linus Walleij 
Cc: afzal mohammed 
Cc: Lina Iyer 
Cc: "Gustavo A. R. Silva" 
Cc: Maulik Shah 
Cc: Al Viro 
Cc: Jonathan Corbet 
Cc: Pawan Gupta 
Cc: Mike Kravetz 
Cc: Oliver Neukum 
To: linux-kernel@vger.kernel.org
Cc: linux-...@vger.kernel.org
Cc: ke...@lists.infradead.org
---
 kernel/watchdog.c | 41 +
 1 file changed, 41 insertions(+)

diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 5abb5b2..230ac38 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -23,6 +23,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -175,6 +176,13 @@ static DEFINE_PER_CPU(bool, softlockup_touch_sync);
 static DEFINE_PER_CPU(bool, soft_watchdog_warn);
 static DEFINE_PER_CPU(unsigned long, hrtimer_interrupts);
 static DEFINE_PER_CPU(unsigned long, hrtimer_interrupts_saved);
+
+#ifdef CONFIG_IRQ_TIME_ACCOUNTING
+static DEFINE_PER_CPU(long, check_hint) = {-1};
+static DEFINE_PER_CPU(unsigned long, last_irq_ts);
+static DEFINE_PER_CPU(unsigned long, last_total_ts);
+#endif
+
 static unsigned long soft_lockup_nmi_warn;
 
 static int __init nowatchdog_setup(char *str)
@@ -331,12 +339,43 @@ static DEFINE_PER_CPU(struct cpu_stop_work, 
softlockup_stop_work);
  */
 static int softlockup_fn(void *data)
 {
+#ifdef CONFIG_IRQ_TIME_ACCOUNTING
+   __this_cpu_write(check_hint, -1);
+#endif
__touch_watchdog();
complete(this_cpu_ptr(_completion));
 
return 0;
 }
 
+#ifdef CONFIG_IRQ_TIME_ACCOUNTING
+static void check_irq_flood(void)
+{
+   unsigned long irqts, totalts, percent, cnt;
+   u64 *cpustat = kcpustat_this_cpu->cpustat;
+
+   totalts = running_clock();
+   irqts = cpustat[CPUTIME_IRQ] + cpustat[CPUTIME_SOFTIRQ];
+   cnt = __this_cpu_inc_return(check_hint);
+
+   if (cnt >= 5) {
+   totalts = totalts - __this_cpu_read(last_total_ts);
+   irqts = irqts - __this_cpu_read(last_irq_ts);
+   percent = irqts * 100 / totalts;
+   percent =  percent < 100 ? percent : 100;
+   __this_cpu_write(check_hint, -1);
+   if (percent >= 98)
+   pr_info("Irq flood occupies more than %lu%% of the past 
%lu seconds\n",
+   percent, totalts >> 30);
+   } else if (cnt == 0) {
+   __this_cpu_write(last_total_ts, totalts);
+   __this_cpu_write(last_irq_ts, irqts);
+   }
+}
+#else
+static void check_irq_flood(void){}
+#endif
+
 /* watchdog kicker functions */
 static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
 {
@@ -348,6 +387,8 @@ static enum hrtimer_restart watchdog_timer_fn(struct 
hrtimer *hrtimer)
if (!watchdog_enabled)
return HRTIMER_NORESTART;
 
+   /* When irq floods, watchdog may be still touched. Hence it can not be 
done inside lockup */
+   check_irq_flood();
/* kick the hardlockup detector */
watchdog_interrupt_count();
 
-- 
2.7.5

[PATCH 2/3] kernel/watchdog: suppress max irq when irq floods

2020-10-21 Thread Pingfan Liu

The capture kernel should try its best to save the crash info. Normally,
irq flood is caused by some trivial devices, which has no impact on saving
vmcore.

Introducing a parameter "irqflood_suppress" to enable suppress irq flood.

Signed-off-by: Pingfan Liu 
Cc: Thomas Gleixner 
Cc: Peter Zijlstra 
Cc: Jisheng Zhang 
Cc: Andrew Morton 
Cc: "Guilherme G. Piccoli" 
Cc: Petr Mladek 
Cc: Marc Zyngier 
Cc: Linus Walleij 
Cc: afzal mohammed 
Cc: Lina Iyer 
Cc: "Gustavo A. R. Silva" 
Cc: Maulik Shah 
Cc: Al Viro 
Cc: Jonathan Corbet 
Cc: Pawan Gupta 
Cc: Mike Kravetz 
Cc: Oliver Neukum 
To: linux-kernel@vger.kernel.org
Cc: linux-...@vger.kernel.org
Cc: ke...@lists.infradead.org
---
 include/linux/irq.h   |  2 ++
 kernel/irq/spurious.c | 32 
 kernel/watchdog.c |  9 -
 3 files changed, 42 insertions(+), 1 deletion(-)

diff --git a/include/linux/irq.h b/include/linux/irq.h
index 1b7f4df..140cb61 100644
--- a/include/linux/irq.h
+++ b/include/linux/irq.h
@@ -684,6 +684,8 @@ extern void note_interrupt(struct irq_desc *desc, 
irqreturn_t action_ret);
 /* Enable/disable irq debugging output: */
 extern int noirqdebug_setup(char *str);
 
+void suppress_max_irq(void);
+
 /* Checks whether the interrupt can be requested by request_irq(): */
 extern int can_request_irq(unsigned int irq, unsigned long irqflags);
 
diff --git a/kernel/irq/spurious.c b/kernel/irq/spurious.c
index f865e5f..d3d94d6 100644
--- a/kernel/irq/spurious.c
+++ b/kernel/irq/spurious.c
@@ -464,3 +464,35 @@ static int __init irqpoll_setup(char *str)
 }
 
 __setup("irqpoll", irqpoll_setup);
+
+#ifdef CONFIG_IRQ_TIME_ACCOUNTING
+
+static bool irqflood_suppress;
+
+static int __init irqflood_suppress_setup(char *str)
+{
+   irqflood_suppress = true;
+   pr_info("enable auto suppress irqflood\n");
+   return 1;
+}
+__setup("irqflood_suppress", irqflood_suppress_setup);
+
+void suppress_max_irq(void)
+{
+   unsigned int tmp, maxirq = 0, max = 0;
+   int irq;
+
+   if (!irqflood_suppress)
+   return;
+   for_each_active_irq(irq) {
+   tmp = kstat_irqs_cpu(irq, smp_processor_id());
+   if (max < tmp) {
+   maxirq = irq;
+   max = tmp;
+   }
+   }
+   pr_warn("Suppress irq:%u, which is triggered %u times\n",
+   maxirq, max);
+   disable_irq_nosync(maxirq);
+}
+#endif
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 230ac38..28a74e5 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -24,6 +24,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -364,9 +365,15 @@ static void check_irq_flood(void)
percent = irqts * 100 / totalts;
percent =  percent < 100 ? percent : 100;
__this_cpu_write(check_hint, -1);
-   if (percent >= 98)
+   if (percent >= 98) {
pr_info("Irq flood occupies more than %lu%% of the past 
%lu seconds\n",
percent, totalts >> 30);
+   /*
+* Suppress top irq when scheduler does not work for 
long time and irq
+* occupies too much time.
+*/
+   suppress_max_irq();
+   }
} else if (cnt == 0) {
__this_cpu_write(last_total_ts, totalts);
__this_cpu_write(last_irq_ts, irqts);
-- 
2.7.5

[PATCH 0/3] warn and suppress irqflood

2020-10-21 Thread Pingfan Liu

I hit a irqflood bug on powerpc platform, and two years ago, on a x86 platform.
When the bug happens, the kernel is totally occupies by irq.  Currently, there
may be nothing or just soft lockup warning showed in console. It is better
to warn users with irq flood info.

In the kdump case, the kernel can move on by suppressing the irq flood.

Cc: Thomas Gleixner 
Cc: Peter Zijlstra 
Cc: Jisheng Zhang 
Cc: Andrew Morton 
Cc: "Guilherme G. Piccoli" 
Cc: Petr Mladek 
Cc: Marc Zyngier 
Cc: Linus Walleij 
Cc: afzal mohammed 
Cc: Lina Iyer 
Cc: "Gustavo A. R. Silva" 
Cc: Maulik Shah 
Cc: Al Viro 
Cc: Jonathan Corbet 
Cc: Pawan Gupta 
Cc: Mike Kravetz 
Cc: Oliver Neukum 
To: linux-kernel@vger.kernel.org
Cc: linux-...@vger.kernel.org
Cc: ke...@lists.infradead.org

Pingfan Liu (3):
  kernel/watchdog: show irq percentage if irq floods
  kernel/watchdog: suppress max irq when irq floods
  Documentation: introduce a param "irqflood_suppress"

 Documentation/admin-guide/kernel-parameters.txt |  3 ++
 include/linux/irq.h |  2 ++
 kernel/irq/spurious.c   | 32 +
 kernel/watchdog.c   | 48 +
 4 files changed, 85 insertions(+)

-- 
2.7.5

Re: [PATCH] PM / sysfs: Expose suspend resume driver flags in sysfs

2020-10-21 Thread Chen Yu

Hi Greg,
thanks for taking a look at this.
On Thu, Oct 22, 2020 at 07:31:43AM +0200, Greg Kroah-Hartman wrote:
> On Thu, Oct 22, 2020 at 11:23:24AM +0800, Chen Yu wrote:
> > Currently there are 4 driver flags to control system suspend/resume
> > behavior: DPM_FLAG_NO_DIRECT_COMPLETE, DPM_FLAG_SMART_PREPARE,
> > DPM_FLAG_SMART_SUSPEND and DPM_FLAG_MAY_SKIP_RESUME. Make these flags
> > visible in sysfs as read-only to get a brief understanding of the
> > expected behavior of each device during suspend/resume, so as to
> > facilitate suspend/resume debugging/tuning.
> > 
> > For example:
> > /sys/devices/pci:00/:00:15.1/power/driver_flags:4
> > (DPM_FLAG_SMART_SUSPEND)
> > 
> > /sys/devices/pci:00/:00:07.3/power/driver_flags:5
> > (DPM_FLAG_NO_DIRECT_COMPLETE | DPM_FLAG_SMART_SUSPEND)
> > 
> > Acked-by: Len Brown 
> > Signed-off-by: Chen Yu 
> > ---
> >  drivers/base/power/sysfs.c | 29 -
> >  1 file changed, 28 insertions(+), 1 deletion(-)
> 
> There is no Documentataion/ABI/ entry for your new file, which makes
> this patch impossible to properly review by anyone, and prevents it from
> being able to be accepted.
> 
> Please fix.
> 
Okay, will add the entry in the document.

thanks,
Chenyu
> thanks,
> 
> greg k-h

Re: [PATCH v8 -tip 13/26] kernel/entry: Add support for core-wide protection of kernel-mode

2020-10-21 Thread Li, Aubrey

On 2020/10/20 9:43, Joel Fernandes (Google) wrote:
> Core-scheduling prevents hyperthreads in usermode from attacking each
> other, but it does not do anything about one of the hyperthreads
> entering the kernel for any reason. This leaves the door open for MDS
> and L1TF attacks with concurrent execution sequences between
> hyperthreads.
> 
> This patch therefore adds support for protecting all syscall and IRQ
> kernel mode entries. Care is taken to track the outermost usermode exit
> and entry using per-cpu counters. In cases where one of the hyperthreads
> enter the kernel, no additional IPIs are sent. Further, IPIs are avoided
> when not needed - example: idle and non-cookie HTs do not need to be
> forced into kernel mode.
> 
> More information about attacks:
> For MDS, it is possible for syscalls, IRQ and softirq handlers to leak
> data to either host or guest attackers. For L1TF, it is possible to leak
> to guest attackers. There is no possible mitigation involving flushing
> of buffers to avoid this since the execution of attacker and victims
> happen concurrently on 2 or more HTs.
> 
> Cc: Julien Desfossez 
> Cc: Tim Chen 
> Cc: Aaron Lu 
> Cc: Aubrey Li 
> Cc: Tim Chen 
> Cc: Paul E. McKenney 
> Co-developed-by: Vineeth Pillai 
> Tested-by: Julien Desfossez 
> Signed-off-by: Vineeth Pillai 
> Signed-off-by: Joel Fernandes (Google) 
> ---
>  .../admin-guide/kernel-parameters.txt |   7 +
>  include/linux/entry-common.h  |   2 +-
>  include/linux/sched.h |  12 +
>  kernel/entry/common.c |  25 +-
>  kernel/sched/core.c   | 229 ++
>  kernel/sched/sched.h  |   3 +
>  6 files changed, 275 insertions(+), 3 deletions(-)
> 
> diff --git a/Documentation/admin-guide/kernel-parameters.txt 
> b/Documentation/admin-guide/kernel-parameters.txt
> index 3236427e2215..48567110f709 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -4678,6 +4678,13 @@
>  
>   sbni=   [NET] Granch SBNI12 leased line adapter
>  
> + sched_core_protect_kernel=
> + [SCHED_CORE] Pause SMT siblings of a core running in
> + user mode, if at least one of the siblings of the core
> + is running in kernel mode. This is to guarantee that
> + kernel data is not leaked to tasks which are not trusted
> + by the kernel.
> +
>   sched_debug [KNL] Enables verbose scheduler debug messages.
>  
>   schedstats= [KNL,X86] Enable or disable scheduled statistics.
> diff --git a/include/linux/entry-common.h b/include/linux/entry-common.h
> index 474f29638d2c..260216de357b 100644
> --- a/include/linux/entry-common.h
> +++ b/include/linux/entry-common.h
> @@ -69,7 +69,7 @@
>  
>  #define EXIT_TO_USER_MODE_WORK   
> \
>   (_TIF_SIGPENDING | _TIF_NOTIFY_RESUME | _TIF_UPROBE |   \
> -  _TIF_NEED_RESCHED | _TIF_PATCH_PENDING |   \
> +  _TIF_NEED_RESCHED | _TIF_PATCH_PENDING | _TIF_UNSAFE_RET | \
>ARCH_EXIT_TO_USER_MODE_WORK)
>  
>  /**
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index d38e904dd603..fe6f225bfbf9 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -2071,4 +2071,16 @@ int sched_trace_rq_nr_running(struct rq *rq);
>  
>  const struct cpumask *sched_trace_rd_span(struct root_domain *rd);
>  
> +#ifdef CONFIG_SCHED_CORE
> +void sched_core_unsafe_enter(void);
> +void sched_core_unsafe_exit(void);
> +bool sched_core_wait_till_safe(unsigned long ti_check);
> +bool sched_core_kernel_protected(void);
> +#else
> +#define sched_core_unsafe_enter(ignore) do { } while (0)
> +#define sched_core_unsafe_exit(ignore) do { } while (0)
> +#define sched_core_wait_till_safe(ignore) do { } while (0)
> +#define sched_core_kernel_protected(ignore) do { } while (0)
> +#endif
> +
>  #endif
> diff --git a/kernel/entry/common.c b/kernel/entry/common.c
> index 0a1e20f8d4e8..c8dc6b1b1f40 100644
> --- a/kernel/entry/common.c
> +++ b/kernel/entry/common.c
> @@ -137,6 +137,26 @@ static __always_inline void exit_to_user_mode(void)
>  /* Workaround to allow gradual conversion of architecture code */
>  void __weak arch_do_signal(struct pt_regs *regs) { }
>  
> +unsigned long exit_to_user_get_work(void)
> +{
> + unsigned long ti_work = READ_ONCE(current_thread_info()->flags);
> +
> + if (IS_ENABLED(CONFIG_SCHED_CORE) && !sched_core_kernel_protected())
> + return ti_work;
> +
> +#ifdef CONFIG_SCHED_CORE
> + ti_work &= EXIT_TO_USER_MODE_WORK;
> + if ((ti_work & _TIF_UNSAFE_RET) == ti_work) {

Though _TIF_UNSAFE_RET is not x86 specific, but I only saw the definition in 
x86.
I'm not sure if other SMT capable architectures are happy with this?


> + sched_core_unsafe_exit();
> +

Re: [PATCH 0/4] vDPA: API for reporting IOVA range

2020-10-21 Thread Jason Wang




On 2020/10/21 下午10:45, Michael S. Tsirkin wrote:

On Wed, Jun 17, 2020 at 11:29:43AM +0800, Jason Wang wrote:

Hi All:

This series introduces API for reporing IOVA range. This is a must for
userspace to work correclty:

- for the process that uses vhost-vDPA directly to properly allocate
   IOVA
- for VM(qemu), when vIOMMU is not enabled, fail early if GPA is out
   of range
- for VM(qemu), when vIOMMU is enabled, determine a valid guest
   address width

Please review.

Thanks

OK so what is the plan here? Change begin-end->first-last and repost?



I've posted V2 with this change, but it get some warning for buildbot.

Will post a V3.

Thanks





Jason Wang (4):
   vdpa: introduce config op to get valid iova range
   vdpa_sim: implement get_iova_range bus operation
   vdpa: get_iova_range() is mandatory for device specific DMA
 translation
   vhost: vdpa: report iova range

  drivers/vdpa/vdpa.c  |  4 
  drivers/vdpa/vdpa_sim/vdpa_sim.c | 11 +++
  drivers/vhost/vdpa.c | 27 +++
  include/linux/vdpa.h | 14 ++
  include/uapi/linux/vhost.h   |  4 
  include/uapi/linux/vhost_types.h |  5 +
  6 files changed, 65 insertions(+)

--
2.20.1

[PATCH v2] i2c: designware: call i2c_dw_read_clear_intrbits_slave() once

2020-10-21 Thread Michael Wu

If some bits were cleared by i2c_dw_read_clear_intrbits_slave() in
i2c_dw_isr_slave() and not handled immediately, those cleared bits would
not be shown again by later i2c_dw_read_clear_intrbits_slave(). They
therefore were forgotten to be handled.

i2c_dw_read_clear_intrbits_slave() should be called once in an ISR and take
its returned state for all later handlings.

Signed-off-by: Michael Wu 
---

Changes in v2:
 - revert moving I2C_SLAVE_WRITE_REQUESTED reporting

 drivers/i2c/busses/i2c-designware-slave.c | 10 ++
 1 file changed, 2 insertions(+), 8 deletions(-)

diff --git a/drivers/i2c/busses/i2c-designware-slave.c 
b/drivers/i2c/busses/i2c-designware-slave.c
index 44974b53a626..8eced99b7aeb 100644
--- a/drivers/i2c/busses/i2c-designware-slave.c
+++ b/drivers/i2c/busses/i2c-designware-slave.c
@@ -159,7 +159,6 @@ static int i2c_dw_irq_handler_slave(struct dw_i2c_dev *dev)
u32 raw_stat, stat, enabled, tmp;
u8 val = 0, slave_activity;
 
-   regmap_read(dev->map, DW_IC_INTR_STAT, );
regmap_read(dev->map, DW_IC_ENABLE, );
regmap_read(dev->map, DW_IC_RAW_INTR_STAT, _stat);
regmap_read(dev->map, DW_IC_STATUS, );
@@ -168,6 +167,7 @@ static int i2c_dw_irq_handler_slave(struct dw_i2c_dev *dev)
if (!enabled || !(raw_stat & ~DW_IC_INTR_ACTIVITY) || !dev->slave)
return 0;
 
+   stat = i2c_dw_read_clear_intrbits_slave(dev);
dev_dbg(dev->dev,
"%#x STATUS SLAVE_ACTIVITY=%#x : RAW_INTR_STAT=%#x : 
INTR_STAT=%#x\n",
enabled, slave_activity, raw_stat, stat);
@@ -188,11 +188,9 @@ static int i2c_dw_irq_handler_slave(struct dw_i2c_dev *dev)
 val);
}
regmap_read(dev->map, DW_IC_CLR_RD_REQ, );
-   stat = i2c_dw_read_clear_intrbits_slave(dev);
} else {
regmap_read(dev->map, DW_IC_CLR_RD_REQ, );
regmap_read(dev->map, DW_IC_CLR_RX_UNDER, );
-   stat = i2c_dw_read_clear_intrbits_slave(dev);
}
if (!i2c_slave_event(dev->slave,
 I2C_SLAVE_READ_REQUESTED,
@@ -207,7 +205,6 @@ static int i2c_dw_irq_handler_slave(struct dw_i2c_dev *dev)
regmap_read(dev->map, DW_IC_CLR_RX_DONE, );
 
i2c_slave_event(dev->slave, I2C_SLAVE_STOP, );
-   stat = i2c_dw_read_clear_intrbits_slave(dev);
return 1;
}
 
@@ -217,10 +214,8 @@ static int i2c_dw_irq_handler_slave(struct dw_i2c_dev *dev)
if (!i2c_slave_event(dev->slave, I2C_SLAVE_WRITE_RECEIVED,
 ))
dev_vdbg(dev->dev, "Byte %X acked!", val);
-   } else {
+   } else
i2c_slave_event(dev->slave, I2C_SLAVE_STOP, );
-   stat = i2c_dw_read_clear_intrbits_slave(dev);
-   }
 
return 1;
 }
@@ -230,7 +225,6 @@ static irqreturn_t i2c_dw_isr_slave(int this_irq, void 
*dev_id)
struct dw_i2c_dev *dev = dev_id;
int ret;
 
-   i2c_dw_read_clear_intrbits_slave(dev);
ret = i2c_dw_irq_handler_slave(dev);
if (ret > 0)
complete(>cmd_complete);
-- 
2.17.1

[PATCH] net-veth: Fix memleak in veth_newlink

2020-10-21 Thread Dinghao Liu

When rtnl_configure_link() fails, peer needs to be
freed just like when register_netdevice() fails.

Signed-off-by: Dinghao Liu 
---
 drivers/net/veth.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index 8c737668008a..6c68094399cc 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -1405,8 +1405,6 @@ static int veth_newlink(struct net *src_net, struct 
net_device *dev,
/* nothing to do */
 err_configure_peer:
unregister_netdevice(peer);
-   return err;
-
 err_register_peer:
free_netdev(peer);
return err;
-- 
2.17.1

Re: [PATCH v4 5/5] riscv: Add numa support for riscv64 platform

2020-10-21 Thread Anup Patel

On Tue, Oct 6, 2020 at 5:48 AM Atish Patra  wrote:
>
> Use the generic numa implementation to add NUMA support for RISC-V.
> This is based on Greentime's patch[1] but modified to use generic NUMA
> implementation and few more fixes.
>
> [1] https://lkml.org/lkml/2020/1/10/233
>
> Co-developed-by: Greentime Hu 
> Signed-off-by: Greentime Hu 
> Signed-off-by: Atish Patra 
> ---
>  arch/riscv/Kconfig  | 31 ++-
>  arch/riscv/include/asm/mmzone.h | 13 +
>  arch/riscv/include/asm/numa.h   |  8 
>  arch/riscv/include/asm/pci.h| 14 ++
>  arch/riscv/kernel/setup.c   | 10 --
>  arch/riscv/kernel/smpboot.c | 12 +++-
>  arch/riscv/mm/init.c|  4 +++-
>  7 files changed, 87 insertions(+), 5 deletions(-)
>  create mode 100644 arch/riscv/include/asm/mmzone.h
>  create mode 100644 arch/riscv/include/asm/numa.h
>
> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> index df18372861d8..7beb6ddb6eb1 100644
> --- a/arch/riscv/Kconfig
> +++ b/arch/riscv/Kconfig
> @@ -137,7 +137,7 @@ config PAGE_OFFSET
> default 0xffe0 if 64BIT && MAXPHYSMEM_128GB
>
>  config ARCH_FLATMEM_ENABLE
> -   def_bool y
> +   def_bool !NUMA
>
>  config ARCH_SPARSEMEM_ENABLE
> def_bool y
> @@ -295,6 +295,35 @@ config TUNE_GENERIC
>
>  endchoice
>
> +# Common NUMA Features
> +config NUMA
> +   bool "NUMA Memory Allocation and Scheduler Support"
> +   select GENERIC_ARCH_NUMA
> +   select OF_NUMA
> +   select ARCH_SUPPORTS_NUMA_BALANCING
> +   help
> + Enable NUMA (Non-Uniform Memory Access) support.
> +
> + The kernel will try to allocate memory used by a CPU on the
> + local memory of the CPU and add some more NUMA awareness to the 
> kernel.
> +
> +config NODES_SHIFT
> +   int "Maximum NUMA Nodes (as a power of 2)"
> +   range 1 10
> +   default "2"
> +   depends on NEED_MULTIPLE_NODES
> +   help
> + Specify the maximum number of NUMA Nodes available on the target
> + system.  Increases memory reserved to accommodate various tables.
> +
> +config USE_PERCPU_NUMA_NODE_ID
> +   def_bool y
> +   depends on NUMA
> +
> +config NEED_PER_CPU_EMBED_FIRST_CHUNK
> +   def_bool y
> +   depends on NUMA
> +
>  config RISCV_ISA_C
> bool "Emit compressed instructions when building Linux"
> default y
> diff --git a/arch/riscv/include/asm/mmzone.h b/arch/riscv/include/asm/mmzone.h
> new file mode 100644
> index ..fa17e01d9ab2
> --- /dev/null
> +++ b/arch/riscv/include/asm/mmzone.h
> @@ -0,0 +1,13 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef __ASM_MMZONE_H
> +#define __ASM_MMZONE_H
> +
> +#ifdef CONFIG_NUMA
> +
> +#include 
> +
> +extern struct pglist_data *node_data[];
> +#define NODE_DATA(nid) (node_data[(nid)])
> +
> +#endif /* CONFIG_NUMA */
> +#endif /* __ASM_MMZONE_H */
> diff --git a/arch/riscv/include/asm/numa.h b/arch/riscv/include/asm/numa.h
> new file mode 100644
> index ..8c8cf4297cc3
> --- /dev/null
> +++ b/arch/riscv/include/asm/numa.h
> @@ -0,0 +1,8 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef __ASM_NUMA_H
> +#define __ASM_NUMA_H
> +
> +#include 
> +#include 
> +
> +#endif /* __ASM_NUMA_H */
> diff --git a/arch/riscv/include/asm/pci.h b/arch/riscv/include/asm/pci.h
> index 1c473a1bd986..658e112c3ce7 100644
> --- a/arch/riscv/include/asm/pci.h
> +++ b/arch/riscv/include/asm/pci.h
> @@ -32,6 +32,20 @@ static inline int pci_proc_domain(struct pci_bus *bus)
> /* always show the domain in /proc */
> return 1;
>  }
> +
> +#ifdef CONFIG_NUMA
> +
> +static inline int pcibus_to_node(struct pci_bus *bus)
> +{
> +   return dev_to_node(>dev);
> +}
> +#ifndef cpumask_of_pcibus
> +#define cpumask_of_pcibus(bus) (pcibus_to_node(bus) == -1 ?\
> +cpu_all_mask : \
> +cpumask_of_node(pcibus_to_node(bus)))
> +#endif
> +#endif /* CONFIG_NUMA */
> +
>  #endif  /* CONFIG_PCI */
>
>  #endif  /* _ASM_RISCV_PCI_H */
> diff --git a/arch/riscv/kernel/setup.c b/arch/riscv/kernel/setup.c
> index 07fa6d13367e..53a806a9cbaf 100644
> --- a/arch/riscv/kernel/setup.c
> +++ b/arch/riscv/kernel/setup.c
> @@ -101,13 +101,19 @@ void __init setup_arch(char **cmdline_p)
>
>  static int __init topology_init(void)
>  {
> -   int i;
> +   int i, ret;
> +
> +   for_each_online_node(i)
> +   register_one_node(i);
>
> for_each_possible_cpu(i) {
> struct cpu *cpu = _cpu(cpu_devices, i);
>
> cpu->hotpluggable = cpu_has_hotplug(i);
> -   register_cpu(cpu, i);
> +   ret = register_cpu(cpu, i);
> +   if (unlikely(ret))
> +   pr_warn("Warning: %s: register_cpu %d failed (%d)\n",
> +  __func__, i, ret);
> }
>
> return

Re: [PATCH v4 4/5] riscv: Add support pte_protnone and pmd_protnone if CONFIG_NUMA_BALANCING

2020-10-21 Thread Anup Patel

On Tue, Oct 6, 2020 at 5:48 AM Atish Patra  wrote:
>
> From: Greentime Hu 
>
> These two functions are used to distinguish between PROT_NONENUMA
> protections and hinting fault protections.
>
> Signed-off-by: Greentime Hu 
> ---
>  arch/riscv/include/asm/pgtable.h | 20 
>  1 file changed, 20 insertions(+)
>
> diff --git a/arch/riscv/include/asm/pgtable.h 
> b/arch/riscv/include/asm/pgtable.h
> index 515b42f98d34..2751110675e6 100644
> --- a/arch/riscv/include/asm/pgtable.h
> +++ b/arch/riscv/include/asm/pgtable.h
> @@ -183,6 +183,11 @@ static inline unsigned long pmd_page_vaddr(pmd_t pmd)
> return (unsigned long)pfn_to_virt(pmd_val(pmd) >> _PAGE_PFN_SHIFT);
>  }
>
> +static inline pte_t pmd_pte(pmd_t pmd)
> +{
> +   return __pte(pmd_val(pmd));
> +}
> +
>  /* Yields the page frame number (PFN) of a page table entry */
>  static inline unsigned long pte_pfn(pte_t pte)
>  {
> @@ -286,6 +291,21 @@ static inline pte_t pte_mkhuge(pte_t pte)
> return pte;
>  }
>
> +#ifdef CONFIG_NUMA_BALANCING
> +/*
> + * See the comment in include/asm-generic/pgtable.h
> + */
> +static inline int pte_protnone(pte_t pte)
> +{
> +   return (pte_val(pte) & (_PAGE_PRESENT | _PAGE_PROT_NONE)) == 
> _PAGE_PROT_NONE;
> +}
> +
> +static inline int pmd_protnone(pmd_t pmd)
> +{
> +   return pte_protnone(pmd_pte(pmd));
> +}
> +#endif
> +
>  /* Modify page protection bits */
>  static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
>  {
> --
> 2.25.1
>

Looks good to me.

Reviewed-by: Anup Patel 

Regards,
Anup

Re: [PATCH v4 3/5] riscv: Separate memory init from paging init

2020-10-21 Thread Anup Patel

On Tue, Oct 6, 2020 at 5:48 AM Atish Patra  wrote:
>
> Currently, we perform some memory init functions in paging init. But,
> that will be an issue for NUMA support where DT needs to be flattened
> before numa initialization and memblock_present can only be called
> after numa initialization.
>
> Move memory initialization related functions to a separate function.
>
> Signed-off-by: Atish Patra 
> Reviewed-by: Greentime Hu 
> ---
>  arch/riscv/include/asm/pgtable.h | 1 +
>  arch/riscv/kernel/setup.c| 1 +
>  arch/riscv/mm/init.c | 6 +-
>  3 files changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/arch/riscv/include/asm/pgtable.h 
> b/arch/riscv/include/asm/pgtable.h
> index eaea1f717010..515b42f98d34 100644
> --- a/arch/riscv/include/asm/pgtable.h
> +++ b/arch/riscv/include/asm/pgtable.h
> @@ -466,6 +466,7 @@ static inline void __kernel_map_pages(struct page *page, 
> int numpages, int enabl
>  extern void *dtb_early_va;
>  void setup_bootmem(void);
>  void paging_init(void);
> +void misc_mem_init(void);
>
>  #define FIRST_USER_ADDRESS  0
>
> diff --git a/arch/riscv/kernel/setup.c b/arch/riscv/kernel/setup.c
> index 2c6dd329312b..07fa6d13367e 100644
> --- a/arch/riscv/kernel/setup.c
> +++ b/arch/riscv/kernel/setup.c
> @@ -78,6 +78,7 @@ void __init setup_arch(char **cmdline_p)
>  #else
> unflatten_device_tree();
>  #endif
> +   misc_mem_init();
>
>  #ifdef CONFIG_SWIOTLB
> swiotlb_init(1);
> diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
> index ed6e83871112..114c3966aadb 100644
> --- a/arch/riscv/mm/init.c
> +++ b/arch/riscv/mm/init.c
> @@ -565,8 +565,12 @@ static void __init resource_init(void)
>  void __init paging_init(void)
>  {
> setup_vm_final();
> -   sparse_init();
> setup_zero_page();
> +}
> +
> +void __init misc_mem_init(void)
> +{
> +   sparse_init();
> zone_sizes_init();
> resource_init();
>  }
> --
> 2.25.1
>

Looks good to me.

Reviewed-by: Anup Patel 

Regards,
Anup

Re: [PATCH] usb: typec: Expose Product Type VDOs via sysfs

2020-10-21 Thread Prashant Malani

Hi Greg,

Thanks for taking a look.

On Wed, Oct 21, 2020 at 10:30 PM Greg KH  wrote:
>
> On Wed, Oct 21, 2020 at 02:18:02PM -0700, Prashant Malani wrote:
> > A PD-capable device can return up to 3 Product Type VDOs as part of its
> > DiscoverIdentity Response (USB PD Spec, Rev 3.0, Version 2.0, Section
> > 6.4.4.3.1). Add a sysfs attribute to expose these to userspace.
>
> You forgot to add the proper Documentation/ABI/ file at the same time,
> which makes this patch impossible to review properly and able to be
> applied :(
>
Sorry about that, will fix and resend.

Best regards,

-Prashant
> Please fix.
>
> thanks,
>
> greg k-h

Re: [PATCH] PM / sysfs: Expose suspend resume driver flags in sysfs

2020-10-21 Thread Greg Kroah-Hartman

On Thu, Oct 22, 2020 at 11:23:24AM +0800, Chen Yu wrote:
> Currently there are 4 driver flags to control system suspend/resume
> behavior: DPM_FLAG_NO_DIRECT_COMPLETE, DPM_FLAG_SMART_PREPARE,
> DPM_FLAG_SMART_SUSPEND and DPM_FLAG_MAY_SKIP_RESUME. Make these flags
> visible in sysfs as read-only to get a brief understanding of the
> expected behavior of each device during suspend/resume, so as to
> facilitate suspend/resume debugging/tuning.
> 
> For example:
> /sys/devices/pci:00/:00:15.1/power/driver_flags:4
> (DPM_FLAG_SMART_SUSPEND)
> 
> /sys/devices/pci:00/:00:07.3/power/driver_flags:5
> (DPM_FLAG_NO_DIRECT_COMPLETE | DPM_FLAG_SMART_SUSPEND)
> 
> Acked-by: Len Brown 
> Signed-off-by: Chen Yu 
> ---
>  drivers/base/power/sysfs.c | 29 -
>  1 file changed, 28 insertions(+), 1 deletion(-)

There is no Documentataion/ABI/ entry for your new file, which makes
this patch impossible to properly review by anyone, and prevents it from
being able to be accepted.

Please fix.

thanks,

greg k-h

Re: [PATCH] usb: typec: Expose Product Type VDOs via sysfs

2020-10-21 Thread Greg KH

On Wed, Oct 21, 2020 at 02:18:02PM -0700, Prashant Malani wrote:
> A PD-capable device can return up to 3 Product Type VDOs as part of its
> DiscoverIdentity Response (USB PD Spec, Rev 3.0, Version 2.0, Section
> 6.4.4.3.1). Add a sysfs attribute to expose these to userspace.

You forgot to add the proper Documentation/ABI/ file at the same time,
which makes this patch impossible to review properly and able to be
applied :(

Please fix.

thanks,

greg k-h

kvm+nouveau induced lockdep gripe

2020-10-21 Thread Mike Galbraith

I've only as yet seen nouveau lockdep gripage when firing up one of my
full distro KVM's.

[   91.655613] ==
[   91.655614] WARNING: possible circular locking dependency detected
[   91.655614] 5.9.1-rt18-rt #5 Tainted: G S  E
[   91.655615] --
[   91.655615] libvirtd/1868 is trying to acquire lock:
[   91.655616] 918554b801c0 (>mmap_lock#2){}-{0:0}, at: 
mpol_rebind_mm+0x1e/0x50
[   91.655622]
   but task is already holding lock:
[   91.655622] 995b6c80 (_rwsem){}-{0:0}, at: 
cpuset_attach+0x38/0x390
[   91.655625]
   which lock already depends on the new lock.

[   91.655625]
   the existing dependency chain (in reverse order) is:
[   91.655625]
   -> #3 (_rwsem){}-{0:0}:
[   91.655626]lock_acquire+0x92/0x410
[   91.655629]cpuset_read_lock+0x39/0xf0
[   91.655630]__sched_setscheduler+0x4be/0xaf0
[   91.655632]_sched_setscheduler+0x69/0x70
[   91.655633]__kthread_create_on_node+0x114/0x170
[   91.655634]kthread_create_on_node+0x37/0x40
[   91.655635]setup_irq_thread+0x37/0x90
[   91.655637]__setup_irq+0x4de/0x7b0
[   91.655637]request_threaded_irq+0xf8/0x160
[   91.655638]nvkm_pci_oneinit+0x4c/0x70 [nouveau]
[   91.655674]nvkm_subdev_init+0x60/0x1e0 [nouveau]
[   91.655689]nvkm_device_init+0x10b/0x240 [nouveau]
[   91.655716]nvkm_udevice_init+0x47/0x70 [nouveau]
[   91.655742]nvkm_object_init+0x3d/0x180 [nouveau]
[   91.655755]nvkm_ioctl_new+0x1a1/0x260 [nouveau]
[   91.655768]nvkm_ioctl+0x10a/0x240 [nouveau]
[   91.655779]nvif_object_ctor+0xeb/0x150 [nouveau]
[   91.655790]nvif_device_ctor+0x1f/0x60 [nouveau]
[   91.655801]nouveau_cli_init+0x1dc/0x5c0 [nouveau]
[   91.655826]nouveau_drm_device_init+0x66/0x810 [nouveau]
[   91.655850]nouveau_drm_probe+0xfb/0x200 [nouveau]
[   91.655873]local_pci_probe+0x42/0x90
[   91.655875]pci_device_probe+0xe7/0x1a0
[   91.655876]really_probe+0xf7/0x4d0
[   91.655877]driver_probe_device+0x5d/0x140
[   91.655878]device_driver_attach+0x4f/0x60
[   91.655879]__driver_attach+0xa2/0x140
[   91.655880]bus_for_each_dev+0x67/0x90
[   91.655881]bus_add_driver+0x192/0x230
[   91.655882]driver_register+0x5b/0xf0
[   91.655883]do_one_initcall+0x56/0x3c4
[   91.655884]do_init_module+0x5b/0x21c
[   91.655886]load_module+0x1cc7/0x2430
[   91.655887]__do_sys_finit_module+0xa7/0xe0
[   91.655888]do_syscall_64+0x33/0x40
[   91.655889]entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   91.655890]
   -> #2 (>mutex){+.+.}-{0:0}:
[   91.655891]lock_acquire+0x92/0x410
[   91.655893]_mutex_lock+0x28/0x40
[   91.655895]nvkm_udevice_fini+0x21/0x70 [nouveau]
[   91.655919]nvkm_object_fini+0xb8/0x210 [nouveau]
[   91.655931]nvkm_object_fini+0x73/0x210 [nouveau]
[   91.655943]nvkm_ioctl_del+0x7e/0xa0 [nouveau]
[   91.655954]nvkm_ioctl+0x10a/0x240 [nouveau]
[   91.655966]nvif_object_dtor+0x4a/0x60 [nouveau]
[   91.655976]nvif_client_dtor+0xe/0x40 [nouveau]
[   91.655986]nouveau_cli_fini+0x78/0x90 [nouveau]
[   91.656010]nouveau_drm_postclose+0xa6/0xe0 [nouveau]
[   91.656033]drm_file_free.part.9+0x27e/0x2d0 [drm]
[   91.656045]drm_release+0x6f/0xf0 [drm]
[   91.656052]__fput+0xb2/0x260
[   91.656053]task_work_run+0x73/0xc0
[   91.656055]exit_to_user_mode_prepare+0x204/0x230
[   91.656056]syscall_exit_to_user_mode+0x4a/0x330
[   91.656057]entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   91.656058]
   -> #1 (>lock){+.+.}-{0:0}:
[   91.656059]lock_acquire+0x92/0x410
[   91.656060]_mutex_lock+0x28/0x40
[   91.656061]nouveau_mem_fini+0x4a/0x70 [nouveau]
[   91.656086]ttm_tt_destroy+0x22/0x70 [ttm]
[   91.656089]ttm_bo_cleanup_memtype_use+0x32/0xa0 [ttm]
[   91.656091]ttm_bo_put+0xe7/0x670 [ttm]
[   91.656093]ttm_bo_vm_close+0x15/0x30 [ttm]
[   91.656096]remove_vma+0x3e/0x70
[   91.656097]__do_munmap+0x2b7/0x4f0
[   91.656098]__vm_munmap+0x5b/0xa0
[   91.656098]__x64_sys_munmap+0x27/0x30
[   91.656099]do_syscall_64+0x33/0x40
[   91.656100]entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   91.656100]
   -> #0 (>mmap_lock#2){}-{0:0}:
[   91.656102]validate_chain+0x981/0x1250
[   91.656103]__lock_acquire+0x86f/0xbd0
[   91.656104]lock_acquire+0x92/0x410
[   91.656105]down_write+0x3b/0x50
[   91.656106]mpol_rebind_mm+0x1e/0x50
[   91.656108]cpuset_attach+0x229/0x390
[   91.656109]cgroup_migrate_execute+0x46d/0x490
[

Re: Re: [PATCH] can: vxcan: Fix memleak in vxcan_newlink

2020-10-21 Thread dinghao . liu

> 
> On 21.10.20 07:21, Dinghao Liu wrote:
> > When rtnl_configure_link() fails, peer needs to be
> > freed just like when register_netdevice() fails.
> > 
> > Signed-off-by: Dinghao Liu 
> 
> Acked-by: Oliver Hartkopp 
> 
> Btw. as the vxcan.c driver bases on veth.c the same issue can be found 
> there!
> 
> At this point:
> https://elixir.bootlin.com/linux/latest/source/drivers/net/veth.c#L1398
> 
> err_register_dev:
>  /* nothing to do */
> err_configure_peer:
>  unregister_netdevice(peer);
>  return err; <<<
> 
> err_register_peer:
>  free_netdev(peer);
>  return err;
> }
> 
> IMO the return must be removed to fall through the next label and free 
> the netdevice too.
> 
> Would you like so send a patch for veth.c too?
> 

Sure.

Regards,
Dinghao

ltp or kvm triggerable lockdep alloc_pid() deadlock gripe

2020-10-21 Thread Mike Galbraith

Greetings,

The gripe below is repeatable in two ways here, boot with nomodeset so
nouveau doesn't steal the lockdep show when I then fire up one of my
(oink) full distro VM's, or from an ltp directory ./runltp -f cpuset
with the attached subset of controllers file placed in ./runtest dir.

Method2 may lead to a real deal deadlock, I've got a crashdump of one,
stack traces of uninterruptible sleepers attached.

[  154.927302] ==
[  154.927303] WARNING: possible circular locking dependency detected
[  154.927304] 5.9.1-rt18-rt #5 Tainted: G S  E
[  154.927305] --
[  154.927306] cpuset_inherit_/4992 is trying to acquire lock:
[  154.927307] 9d334c5e64d8 (>seqcount){+.+.}-{0:0}, at: 
__slab_alloc.isra.87+0xad/0xc0
[  154.927317]
   but task is already holding lock:
[  154.927317] ac4052d0 (pidmap_lock){+.+.}-{2:2}, at: 
alloc_pid+0x1fb/0x510
[  154.927324]
   which lock already depends on the new lock.

[  154.927324]
   the existing dependency chain (in reverse order) is:
[  154.927325]
   -> #1 (pidmap_lock){+.+.}-{2:2}:
[  154.927328]lock_acquire+0x92/0x410
[  154.927331]rt_spin_lock+0x2b/0xc0
[  154.927335]free_pid+0x27/0xc0
[  154.927338]release_task+0x34a/0x640
[  154.927340]do_exit+0x6e9/0xcf0
[  154.927342]kthread+0x11c/0x190
[  154.927344]ret_from_fork+0x1f/0x30
[  154.927347]
   -> #0 (>seqcount){+.+.}-{0:0}:
[  154.927350]validate_chain+0x981/0x1250
[  154.927352]__lock_acquire+0x86f/0xbd0
[  154.927354]lock_acquire+0x92/0x410
[  154.927356]___slab_alloc+0x71b/0x820
[  154.927358]__slab_alloc.isra.87+0xad/0xc0
[  154.927359]kmem_cache_alloc+0x700/0x8c0
[  154.927361]radix_tree_node_alloc.constprop.22+0xa2/0xf0
[  154.927365]idr_get_free+0x207/0x2b0
[  154.927367]idr_alloc_u32+0x54/0xa0
[  154.927369]idr_alloc_cyclic+0x4f/0xa0
[  154.927370]alloc_pid+0x22b/0x510
[  154.927372]copy_process+0xeb5/0x1de0
[  154.927375]_do_fork+0x52/0x750
[  154.927377]__do_sys_clone+0x64/0x70
[  154.927379]do_syscall_64+0x33/0x40
[  154.927382]entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  154.927384]
   other info that might help us debug this:

[  154.927384]  Possible unsafe locking scenario:

[  154.927385]CPU0CPU1
[  154.927386]
[  154.927386]   lock(pidmap_lock);
[  154.927388]lock(>seqcount);
[  154.927389]lock(pidmap_lock);
[  154.927391]   lock(>seqcount);
[  154.927392]
*** DEADLOCK ***

[  154.927393] 4 locks held by cpuset_inherit_/4992:
[  154.927394]  #0: 9d33decea5b0 ((lock).lock){+.+.}-{2:2}, at: 
__radix_tree_preload+0x52/0x3b0
[  154.927399]  #1: ac598fa0 (rcu_read_lock){}-{1:2}, at: 
rt_spin_lock+0x5/0xc0
[  154.927405]  #2: ac4052d0 (pidmap_lock){+.+.}-{2:2}, at: 
alloc_pid+0x1fb/0x510
[  154.927409]  #3: ac598fa0 (rcu_read_lock){}-{1:2}, at: 
rt_spin_lock+0x5/0xc0
[  154.927414]
   stack backtrace:
[  154.927416] CPU: 3 PID: 4992 Comm: cpuset_inherit_ Kdump: loaded Tainted: G 
S  E 5.9.1-rt18-rt #5
[  154.927418] Hardware name: MEDION MS-7848/MS-7848, BIOS M7848W08.20C 
09/23/2013
[  154.927419] Call Trace:
[  154.927422]  dump_stack+0x77/0x9b
[  154.927425]  check_noncircular+0x148/0x160
[  154.927432]  ? validate_chain+0x981/0x1250
[  154.927435]  validate_chain+0x981/0x1250
[  154.927441]  __lock_acquire+0x86f/0xbd0
[  154.927446]  lock_acquire+0x92/0x410
[  154.927449]  ? __slab_alloc.isra.87+0xad/0xc0
[  154.927452]  ? kmem_cache_alloc+0x648/0x8c0
[  154.927453]  ? lock_acquire+0x92/0x410
[  154.927458]  ___slab_alloc+0x71b/0x820
[  154.927460]  ? __slab_alloc.isra.87+0xad/0xc0
[  154.927463]  ? radix_tree_node_alloc.constprop.22+0xa2/0xf0
[  154.927468]  ? __slab_alloc.isra.87+0x83/0xc0
[  154.927472]  ? radix_tree_node_alloc.constprop.22+0xa2/0xf0
[  154.927474]  ? __slab_alloc.isra.87+0xad/0xc0
[  154.927476]  __slab_alloc.isra.87+0xad/0xc0
[  154.927480]  ? radix_tree_node_alloc.constprop.22+0xa2/0xf0
[  154.927482]  kmem_cache_alloc+0x700/0x8c0
[  154.927487]  radix_tree_node_alloc.constprop.22+0xa2/0xf0
[  154.927491]  idr_get_free+0x207/0x2b0
[  154.927495]  idr_alloc_u32+0x54/0xa0
[  154.927500]  idr_alloc_cyclic+0x4f/0xa0
[  154.927503]  alloc_pid+0x22b/0x510
[  154.927506]  ? copy_thread+0x88/0x200
[  154.927512]  copy_process+0xeb5/0x1de0
[  154.927520]  _do_fork+0x52/0x750
[  154.927523]  ? lock_acquire+0x92/0x410
[  154.927525]  ? __might_fault+0x3e/0x90
[  154.927530]  ? find_held_lock+0x2d/0x90
[  154.927535]  __do_sys_clone+0x64/0x70
[  154.927541]  do_syscall_64+0x33/0x40
[  154.927544]

[RFC PATCH v3] tools/x86: add kcpuid tool to show raw CPU features

2020-10-21 Thread Feng Tang

End users frequently want to know what features their processor
supports, independent of what the kernel supports.

/proc/cpuinfo is great. It is omnipresent and since it is provided by
the kernel it is always as up to date as the kernel. But, it could be
ambiguous about processor features which can be disabled by the kernel
at boot-time or compile-time.

There are some user space tools showing more raw features, but they are
not bound with kernel, and go with distros. Many end users are still
using old distros with new kernels (upgraded by themselves), and may
not upgrade the distros only to get a newer tool.

So here arise the need for a new tool, which
  * Shows raw cpu features got from running cpuid
  * Be easier to obtain updates for compared to existing userspace
tooling (perhaps distributed like perf)
  * Inherits "modern" kernel development process, in contrast to some
of the existing userspace cpuid tools which are still being developed
without git and distributed in tarballs from non-https sites.
  * Can produce output consistent with /proc/cpuinfo to make comparison
easier.

This RFC is an early prototype, and would get community's opinion on
whether it's the right thing to do, and what functions it should also
support.

It contains one .c core file and one text file which shows the bits
definition of all CPUID output data, while in v1, a specific data
structure is defined for each eax/ebx/ecx/edx output of each leaf
and subleaf, which is less expandable [1].

This is based on the prototype code from Borislav Petkov [2].

The supported options are:

  Usage: kcpuid [-abdfhr] [-l leaf] [-s subleaf]
-a|--all Show info of all CPUID leafs and subleafs(default 
on)
-b|--bitflagsShow boolean flags only
-d|--detail  Show details of the flag/fields
-f|--flags   Show boolean flags only
-h|--helpShow usage info
-l|--leaf=index  Specify the leaf you want to check
-r|--raw Show raw cpuid data
-s|--subleaf=sub Specify the subleaf you want to check

Current RFC version only shows limited number of cpu features, and will
be completed

output of the tool (output cut version)
---

#kcpuid -r

Basic Leafs :

0x: EAX=0x000d, EBX=0x756e6547, ECX=0x6c65746e, EDX=0x49656e69
0x0001: EAX=0x000206d7, EBX=0x0d200800, ECX=0x1fbee3ff, EDX=0xbfebfbff
0x0002: EAX=0x76035a01, EBX=0x00f0b2ff, ECX=0x, EDX=0x00ca
0x0004: subleafs:
  0: EAX=0x3c004121, EBX=0x01c0003f, ECX=0x003f, EDX=0x
  1: EAX=0x3c004122, EBX=0x01c0003f, ECX=0x003f, EDX=0x
  2: EAX=0x3c004143, EBX=0x01c0003f, ECX=0x01ff, EDX=0x
  3: EAX=0x3c07c163, EBX=0x04c0003f, ECX=0x3fff, EDX=0x0006
...

Extended Leafs :

0x8000: EAX=0x8008, EBX=0x, ECX=0x, EDX=0x
0x8001: EAX=0x, EBX=0x, ECX=0x0001, EDX=0x2c100800

#kcpuid -d

max_basic_leafs : 0xd   - Max input value for supported 
subleafs
stepping: 0x7   - Stepping ID
model   : 0xd   - Model
family  : 0x6   - Family ID
processor   : 0x0   - Processor Type
sse3 - Streaming SIMD Extensions 3(SSE3)
pclmulqdq- Support PCLMULQDQ instruction
dtes64   - DS area uses 64-bit layout
mwait- MONITOR/MWAIT supported
ds_cpl   - CPL Qualified Debug Store, which allows for branch 
message storage qualified by CPL
vmx  - Virtual Machine Extensions supported
smx  - Safer Mode Extension supported
...

#kcpuid -b

sse3
pclmulqdq
dtes64
mwait
ds_cpl
vmx
smx
eist
tm2
...

#kcpuid -l 0x1

stepping: 0x7
model   : 0xd
family  : 0x6
processor   : 0x0
clflush_size: 0x8
max_cpu_id  : 0x20
apic_id : 0xf
sse3
pclmulqdq
dtes64
mwait
ds_cpl
...

[1]. 
https://lore.kernel.org/lkml/1598514543-90152-1-git-send-email-feng.t...@intel.com/
[2]. http://sr71.net/~dave/intel/stupid-cpuid.c

Originally-from: Borislav Petkov 
Suggested-by: Dave Hansen 
Suggested-by: Borislav Petkov 
Signed-off-by: Feng Tang 
---
Changelog:

  v3:
  * use .csv file instead of plain text file which also uses
comma to separate fields (Borislav)
  * add option to support a user specified .csv file (Dave)
  * make install will put the csv file under /usr/share/hwdata/ (Dave)

  v2:
  * use a new text file to store all the bits definition of each
CPUID

Re: [RESEND PATCH v18 0/4] overlayfs override_creds=off & nested get xattr fix

2020-10-21 Thread Eric Biggers

On Wed, Oct 21, 2020 at 08:18:59AM -0700, Mark Salyzyn wrote:
> Mark Salyzyn (3):
>   Add flags option to get xattr method paired to __vfs_getxattr
>   overlayfs: handle XATTR_NOSECURITY flag for get xattr method
>   overlayfs: override_creds=off option bypass creator_cred
> 
> Mark Salyzyn + John Stultz (1):
>   overlayfs: inode_owner_or_capable called during execv
> 
> The first three patches address fundamental security issues that should
> be solved regardless of the override_creds=off feature.
> 
> The fourth adds the feature depends on these other fixes.

FYI, I didn't receive patch 4, and neither https://lkml.kernel.org/linux-fsdevel
nor https://lkml.kernel.org/linux-unionfs have it either.

- Eric

Re: [PATCH V2 2/8] drm/lima: Unconditionally call dev_pm_opp_of_remove_table()

2020-10-21 Thread Viresh Kumar

On 28-08-20, 11:37, Viresh Kumar wrote:
> dev_pm_opp_of_remove_table() doesn't report any errors when it fails to
> find the OPP table with error -ENODEV (i.e. OPP table not present for
> the device). And we can call dev_pm_opp_of_remove_table()
> unconditionally here.
> 
> Reviewed-by: Qiang Yu 
> Signed-off-by: Viresh Kumar 
> 
> ---
> V2: Applied Reviewed by tag.
> ---
>  drivers/gpu/drm/lima/lima_devfreq.c | 6 +-
>  drivers/gpu/drm/lima/lima_devfreq.h | 1 -
>  2 files changed, 1 insertion(+), 6 deletions(-)

Qiang, can you please pick it up ?

-- 
viresh

Re: [PATCH V2 3/8] drm/msm: Unconditionally call dev_pm_opp_of_remove_table()

2020-10-21 Thread Viresh Kumar

On 21-10-20, 09:58, Rob Clark wrote:
> On Wed, Oct 21, 2020 at 12:24 AM Viresh Kumar  wrote:
> >
> > On 05-10-20, 11:56, Viresh Kumar wrote:
> > > On 28-08-20, 11:37, Viresh Kumar wrote:
> > > > dev_pm_opp_of_remove_table() doesn't report any errors when it fails to
> > > > find the OPP table with error -ENODEV (i.e. OPP table not present for
> > > > the device). And we can call dev_pm_opp_of_remove_table()
> > > > unconditionally here.
> > > >
> > > > While at it, also create a label to put clkname.
> > > >
> > > > Signed-off-by: Viresh Kumar 
> > >
> > > Can someone please apply this and the other drm patch (2/8) ?
> >
> > Rob/Rajendra, can someone please have a look at these patches ?
> 
> Oh, sorry I missed that, could someone re-send it and CC
> freedr...@lists.freedesktop.org so it shows up in patchworks.. that is
> more reliable counting on me to not overlook something in my mail

I have just bounced it back to you and freedreno was already cc'd,
though I have bounced it there again.

-- 
viresh

Re: [PATCH 4/5] RISC-V: Protect .init.text & .init.data

2020-10-21 Thread Anup Patel

On Thu, Oct 22, 2020 at 7:01 AM Atish Patra  wrote:
>
> On Fri, Oct 16, 2020 at 11:24 AM Atish Patra  wrote:
> >
> > On Tue, Oct 13, 2020 at 10:24 PM Atish Patra  wrote:
> > >
> > > On Tue, Oct 13, 2020 at 6:21 PM Jim Wilson  wrote:
> > > >
> > > > On Tue, Oct 13, 2020 at 3:25 PM Atish Patra  
> > > > wrote:
> > > > > This happens only when copy_from_user is called from function that is
> > > > > annotated with __init.
> > > > > Adding Kito & Jim for their input
> > > > >
> > > > > @kito, @Jim: Please let me know if I should create a issue in
> > > > > riscv-gnu-toolchain repo or somewhere else.
> > > >
> > > > I can't do anything useful without a testcase that I can use to
> > > > reproduce the problem.  The interactions here are complex, so pointing
> > > > at lines of code or kernel config options doesn't give me any useful
> > > > info.
> > > >
> > > > Relaxation can convert calls to a jal.  I don't know of any open bugs
> > > > in this area that can generate relocation errors.  if it is a
> > > > relaxation error then turning off relaxation should work around the
> > > > problem as you suggested.
> > > >
> > > > A kernel build problem is serious.  I think this is worth a bug
> > > > report.  FSF binutils or riscv-gnu-toolchain is fine.
> > > >
> > >
> > > I have created an issue with detailed descriptions and reproduction steps.
> > > Please let me know if you need anything else.
> > >
> >
> > It may be a toolchain issue. Here is the ongoing discussion in case
> > anybody else is interested.
> >
> > https://github.com/riscv/riscv-gnu-toolchain/issues/738
> >
> > > > Jim
> > >
> > >
> > >
> > > --
> > > Regards,
> > > Atish
> >
> >
> >
> > --
> > Regards,
> > Atish
>
> Thanks to Jim, we know the cause now. Jim has provided an excellent
> analysis of the issue in the github issue report.
> https://github.com/riscv/riscv-gnu-toolchain/issues/738
>
> To summarize, the linker relaxation code is not aware of the
> alignments between sections.
> That's why it relaxes the calls from .text to .init.text and  converts
> a auipc+jalr pair to jal even if the address can't be fit +/- 1MB.
>
> There are few ways we can solve this problem.
>
> 1. As per Jim's suggestion, linker relaxation code is aware of the
> section alignments. We can mark .init.text as a 2MB aligned section.
>For calls within a section, section's alignment will be used in the
> calculation. For calls across sections, e.g. from .init.text to .text,
> the maximum
>section alignment of every section will be used. Thus, all
> relaxation within .init.text and between any sections will be
> impacted.
>Thus, it avoids the error but results in the following increase in
> size of various sections.
>  section   change in size (in bytes)
>  .head.text  +4
>  .text   +40
>  .init.text.+6530
>  .exit.text+84
>
> The only significant increase is .init.text but it is freed after
> boot. Thus, I don't see any significant performance degradation due to
> that.
>
> Here is the diff
> --- a/arch/riscv/kernel/vmlinux.lds.S
> +++ b/arch/riscv/kernel/vmlinux.lds.S
> @@ -51,7 +51,13 @@ SECTIONS
> . = ALIGN(SECTION_ALIGN);
> __init_begin = .;
> __init_text_begin = .;
> -   INIT_TEXT_SECTION(PAGE_SIZE)
> +   . = ALIGN(PAGE_SIZE);   \
> +   .init.text : AT(ADDR(.init.text) - LOAD_OFFSET)
> ALIGN(SECTION_ALIGN) {  \
> +   _sinittext = .; \
> +   INIT_TEXT   \
> +   _einittext = .; \
> +   }
> +
> . = ALIGN(8);
> __soc_early_init_table : {
> __soc_early_init_table_start = .;
>
> 2. We will continue to keep head.txt & .init.text together before
> .text. However, we will map the pages that contain head & init.text at
> page
> granularity so that .head.text and init.text can have different
> permissions. I have not measured the performance impact of this but it
> won't
> too bad given that the combined size of sections .head.txt &
> .init.text is 200K. So we are talking about page level permission only
> for
> ~50 pages during boot.
>
> 3. Keep head.text in a separate 2MB aligned section. .init.text will
> follow .head.text in its own section as well. This increases the
> kernel
> size by 2MB for MMU kernels. For nommu case, it will only increase
> by 64 bytes due to smaller section alignment for nommu kernels.
>
> Both solutions 1 & 2 come at minimal performance on boot time while
> solution 3 comes at increased kernel size.
>
> Any preference ?

I prefer solution#3 because:
1. This will help us avoid special handling of static objects
2.  This will make RISC-V linker script more aligned with other
 major architectures

I don't think solution#3 will increase the size of the kernel by 2MB. We
can make head.text part of text

[PATCH V3 4/4] misc: vop: mapping kernel memory to user space as noncached

2020-10-21 Thread Sherry Sun

Mapping kernel space memory to user space as noncached, since user space
need check the updates of avail_idx and device page flags timely.

Signed-off-by: Joakim Zhang 
Signed-off-by: Sherry Sun 
---
 drivers/misc/mic/vop/vop_vringh.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/misc/mic/vop/vop_vringh.c 
b/drivers/misc/mic/vop/vop_vringh.c
index b5612183dcb8..b75c2b713a3b 100644
--- a/drivers/misc/mic/vop/vop_vringh.c
+++ b/drivers/misc/mic/vop/vop_vringh.c
@@ -1058,7 +1058,7 @@ static int vop_mmap(struct file *f, struct vm_area_struct 
*vma)
}
err = remap_pfn_range(vma, vma->vm_start + offset,
  pa >> PAGE_SHIFT, size,
- vma->vm_page_prot);
+ pgprot_noncached(vma->vm_page_prot));
if (err)
goto ret;
size_rem -= size;
-- 
2.17.1

[PATCH V3 1/4] misc: vop: change the way of allocating vring and device page

2020-10-21 Thread Sherry Sun

Allocate vrings use dma_alloc_coherent is a common way in kernel. As the
memory interacted between two systems should use consistent memory to
avoid caching effects, same as device page memory.

The orginal way use __get_free_pages and dma_map_single to allocate and
map vring, but not use dma_sync_single_for_cpu/device api to sync the
changes of vring between EP and RC, which will cause memory
synchronization problem for those devices which don't support hardware
dma coherent.

Signed-off-by: Joakim Zhang 
Signed-off-by: Sherry Sun 
---
 drivers/misc/mic/host/mic_main.c  | 15 +++
 drivers/misc/mic/vop/vop_vringh.c | 43 +--
 2 files changed, 16 insertions(+), 42 deletions(-)

diff --git a/drivers/misc/mic/host/mic_main.c b/drivers/misc/mic/host/mic_main.c
index ea4608527ea0..ebacaa35cb47 100644
--- a/drivers/misc/mic/host/mic_main.c
+++ b/drivers/misc/mic/host/mic_main.c
@@ -10,6 +10,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include "../common/mic_dev.h"
@@ -48,18 +49,11 @@ static struct ida g_mic_ida;
 /* Initialize the device page */
 static int mic_dp_init(struct mic_device *mdev)
 {
-   mdev->dp = kzalloc(MIC_DP_SIZE, GFP_KERNEL);
+   mdev->dp = dma_alloc_coherent(>pdev->dev, MIC_DP_SIZE,
+ >dp_dma_addr, GFP_KERNEL);
if (!mdev->dp)
return -ENOMEM;
 
-   mdev->dp_dma_addr = mic_map_single(mdev,
-   mdev->dp, MIC_DP_SIZE);
-   if (mic_map_error(mdev->dp_dma_addr)) {
-   kfree(mdev->dp);
-   dev_err(>pdev->dev, "%s %d err %d\n",
-   __func__, __LINE__, -ENOMEM);
-   return -ENOMEM;
-   }
mdev->ops->write_spad(mdev, MIC_DPLO_SPAD, mdev->dp_dma_addr);
mdev->ops->write_spad(mdev, MIC_DPHI_SPAD, mdev->dp_dma_addr >> 32);
return 0;
@@ -68,8 +62,7 @@ static int mic_dp_init(struct mic_device *mdev)
 /* Uninitialize the device page */
 static void mic_dp_uninit(struct mic_device *mdev)
 {
-   mic_unmap_single(mdev, mdev->dp_dma_addr, MIC_DP_SIZE);
-   kfree(mdev->dp);
+   dma_free_coherent(>pdev->dev, MIC_DP_SIZE, mdev->dp, 
mdev->dp_dma_addr);
 }
 
 /**
diff --git a/drivers/misc/mic/vop/vop_vringh.c 
b/drivers/misc/mic/vop/vop_vringh.c
index 7014ffe88632..2cc3c22482b5 100644
--- a/drivers/misc/mic/vop/vop_vringh.c
+++ b/drivers/misc/mic/vop/vop_vringh.c
@@ -298,9 +298,8 @@ static int vop_virtio_add_device(struct vop_vdev *vdev,
mutex_init(>vr_mutex);
vr_size = PAGE_ALIGN(round_up(vring_size(num, 
MIC_VIRTIO_RING_ALIGN), 4) +
sizeof(struct _mic_vring_info));
-   vr->va = (void *)
-   __get_free_pages(GFP_KERNEL | __GFP_ZERO,
-get_order(vr_size));
+   vr->va = dma_alloc_coherent(vop_dev(vdev), vr_size, _addr,
+   GFP_KERNEL);
if (!vr->va) {
ret = -ENOMEM;
dev_err(vop_dev(vdev), "%s %d err %d\n",
@@ -310,15 +309,6 @@ static int vop_virtio_add_device(struct vop_vdev *vdev,
vr->len = vr_size;
vr->info = vr->va + round_up(vring_size(num, 
MIC_VIRTIO_RING_ALIGN), 4);
vr->info->magic = cpu_to_le32(MIC_MAGIC + vdev->virtio_id + i);
-   vr_addr = dma_map_single(>dev, vr->va, vr_size,
-DMA_BIDIRECTIONAL);
-   if (dma_mapping_error(>dev, vr_addr)) {
-   free_pages((unsigned long)vr->va, get_order(vr_size));
-   ret = -ENOMEM;
-   dev_err(vop_dev(vdev), "%s %d err %d\n",
-   __func__, __LINE__, ret);
-   goto err;
-   }
vqconfig[i].address = cpu_to_le64(vr_addr);
 
vring_init(>vr, num, vr->va, MIC_VIRTIO_RING_ALIGN);
@@ -339,11 +329,8 @@ static int vop_virtio_add_device(struct vop_vdev *vdev,
dev_dbg(>dev,
"%s %d index %d va %p info %p vr_size 0x%x\n",
__func__, __LINE__, i, vr->va, vr->info, vr_size);
-   vvr->buf = (void *)__get_free_pages(GFP_KERNEL,
-   get_order(VOP_INT_DMA_BUF_SIZE));
-   vvr->buf_da = dma_map_single(>dev,
- vvr->buf, VOP_INT_DMA_BUF_SIZE,
- DMA_BIDIRECTIONAL);
+   vvr->buf = dma_alloc_coherent(vop_dev(vdev), 
VOP_INT_DMA_BUF_SIZE,
+ >buf_da, GFP_KERNEL);
}
 
snprintf(irqname, sizeof(irqname), "vop%dvirtio%d", vpdev->index,
@@ -382,10 +369,8 @@ static int vop_virtio_add_device(struct vop_vdev *vdev,
for (j = 0; j < i; j++) {
struct vop_vringh *vvr = >vvr[j];
 
-

[PATCH V3 3/4] misc: vop: simply return the saved dma address instead of virt_to_phys

2020-10-21 Thread Sherry Sun

The device page and vring should use consistent memory which are
allocated by dma_alloc_coherent api, when user space wants to get its
physical address, virt_to_phys cannot be used, should simply return the
saved device page dma address by get_dp_dma callback and the vring dma
address saved in mic_vqconfig.

Signed-off-by: Sherry Sun 
Signed-off-by: Joakim Zhang 
---
 drivers/misc/mic/bus/vop_bus.h| 2 ++
 drivers/misc/mic/host/mic_boot.c  | 8 
 drivers/misc/mic/vop/vop_vringh.c | 8 +++-
 3 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/drivers/misc/mic/bus/vop_bus.h b/drivers/misc/mic/bus/vop_bus.h
index 4fa02808c1e2..d891eacae358 100644
--- a/drivers/misc/mic/bus/vop_bus.h
+++ b/drivers/misc/mic/bus/vop_bus.h
@@ -75,6 +75,7 @@ struct vop_driver {
  * node to add/remove/configure virtio devices.
  * @get_dp: Get access to the virtio device page used by the self
  *  node to add/remove/configure virtio devices.
+ * @get_dp_dma: Get dma address of the virtio device page.
  * @send_intr: Send an interrupt to the peer node on a specified doorbell.
  * @remap: Map a buffer with the specified DMA address and length.
  * @unmap: Unmap a buffer previously mapped.
@@ -92,6 +93,7 @@ struct vop_hw_ops {
void (*ack_interrupt)(struct vop_device *vpdev, int num);
void __iomem * (*get_remote_dp)(struct vop_device *vpdev);
void * (*get_dp)(struct vop_device *vpdev);
+   dma_addr_t (*get_dp_dma)(struct vop_device *vpdev);
void (*send_intr)(struct vop_device *vpdev, int db);
void __iomem * (*remap)(struct vop_device *vpdev,
  dma_addr_t pa, size_t len);
diff --git a/drivers/misc/mic/host/mic_boot.c b/drivers/misc/mic/host/mic_boot.c
index 8cb85b8b3e19..9591bf609f57 100644
--- a/drivers/misc/mic/host/mic_boot.c
+++ b/drivers/misc/mic/host/mic_boot.c
@@ -89,6 +89,13 @@ static void *__mic_get_dp(struct vop_device *vpdev)
return mdev->dp;
 }
 
+static dma_addr_t __mic_get_dp_dma(struct vop_device *vpdev)
+{
+   struct mic_device *mdev = vpdev_to_mdev(>dev);
+
+   return mdev->dp_dma_addr;
+}
+
 static void __iomem *__mic_get_remote_dp(struct vop_device *vpdev)
 {
return NULL;
@@ -120,6 +127,7 @@ static struct vop_hw_ops vop_hw_ops = {
.ack_interrupt = __mic_ack_interrupt,
.next_db = __mic_next_db,
.get_dp = __mic_get_dp,
+   .get_dp_dma = __mic_get_dp_dma,
.get_remote_dp = __mic_get_remote_dp,
.send_intr = __mic_send_intr,
.remap = __mic_ioremap,
diff --git a/drivers/misc/mic/vop/vop_vringh.c 
b/drivers/misc/mic/vop/vop_vringh.c
index cc928d45af5a..b5612183dcb8 100644
--- a/drivers/misc/mic/vop/vop_vringh.c
+++ b/drivers/misc/mic/vop/vop_vringh.c
@@ -1009,7 +1009,13 @@ vop_query_offset(struct vop_vdev *vdev, unsigned long 
offset,
 * 
 */
if (!offset) {
-   *pa = virt_to_phys(vpdev->hw_ops->get_dp(vpdev));
+   if (vpdev->hw_ops->get_dp_dma)
+   *pa = vpdev->hw_ops->get_dp_dma(vpdev);
+   else {
+   dev_err(vop_dev(vdev), "can't get device page physical 
address\n");
+   return -EINVAL;
+   }
+
*size = MIC_DP_SIZE;
return 0;
}
-- 
2.17.1

[PATCH V3 0/4] Change vring space from nomal memory to dma coherent memory

2020-10-21 Thread Sherry Sun

Changes in V3:
1. Change the device page allocation method of the Intel mic layer in Patch 1 to
align with the vring allocation. 
2. Move the vring physical address changes in mmap callback from Prach 3 to 1.
3. Use must_be_zero instead of directly deleting used_address_updated in
Patch 2 to avoid the influence of ABI change.
(I tried to use dma_mmap_coherent api in Patch 4 as Christoph suggested, but
 met some issues explained here 
https://lore.kernel.org/patchwork/patch/1313327/,
 so there is no change to Patch 4, and still can't find a better way than
 patch 3)

The original vop driver only supports dma coherent device, as it allocates and
maps vring by _get_free_pages and dma_map_single, but not use 
dma_sync_single_for_cpu/device to sync the updates of device_page/vring between
EP and RC, which will cause memory synchronization problem for device don't
support hardware dma coherent.

And allocate vrings use dma_alloc_coherent is a common way in kernel, as the
memory interacted between two systems should use consistent memory to avoid
caching effects. So here add noncoherent platform support for vop driver.
Also add some related dma changes to make sure noncoherent platform works
well.

Sherry Sun (4):
  misc: vop: change the way of allocating vring and device page
  misc: vop: do not allocate and reassign the used ring
  misc: vop: simply return the saved dma address instead of virt_to_phys
  misc: vop: mapping kernel memory to user space as noncached

 drivers/misc/mic/bus/vop_bus.h |  2 +
 drivers/misc/mic/host/mic_boot.c   |  8 +++
 drivers/misc/mic/host/mic_main.c   | 15 ++
 drivers/misc/mic/vop/vop_debugfs.c |  2 -
 drivers/misc/mic/vop/vop_main.c| 48 +++--
 drivers/misc/mic/vop/vop_vringh.c  | 84 +++---
 include/uapi/linux/mic_common.h|  5 +-
 7 files changed, 42 insertions(+), 122 deletions(-)

-- 
2.17.1

[PATCH V3 2/4] misc: vop: do not allocate and reassign the used ring

2020-10-21 Thread Sherry Sun

We don't need to allocate and reassign the used ring here and remove the
used_address_updated flag.Since RC has allocated the entire vring,
including the used ring. Simply add the corresponding offset can get the
used ring address.

If following the orginal way to reassign the used ring, will encounter a
problem. When host finished with descriptor, it will update the used
ring with putused_kern api, if reassign used ring at EP side, used
ring will be io device memory for RC, use memcpy in putused_kern will
cause kernel panic.

Signed-off-by: Sherry Sun 
Signed-off-by: Joakim Zhang 
---
 drivers/misc/mic/vop/vop_debugfs.c |  2 --
 drivers/misc/mic/vop/vop_main.c| 48 --
 drivers/misc/mic/vop/vop_vringh.c  | 31 ---
 include/uapi/linux/mic_common.h|  5 ++--
 4 files changed, 8 insertions(+), 78 deletions(-)

diff --git a/drivers/misc/mic/vop/vop_debugfs.c 
b/drivers/misc/mic/vop/vop_debugfs.c
index 9d4f175f4dd1..05eca4a98585 100644
--- a/drivers/misc/mic/vop/vop_debugfs.c
+++ b/drivers/misc/mic/vop/vop_debugfs.c
@@ -79,8 +79,6 @@ static int vop_dp_show(struct seq_file *s, void *pos)
seq_printf(s, "Vdev reset %d\n", dc->vdev_reset);
seq_printf(s, "Guest Ack %d ", dc->guest_ack);
seq_printf(s, "Host ack %d\n", dc->host_ack);
-   seq_printf(s, "Used address updated %d ",
-  dc->used_address_updated);
seq_printf(s, "Vdev 0x%llx\n", dc->vdev);
seq_printf(s, "c2h doorbell %d ", dc->c2h_vdev_db);
seq_printf(s, "h2c doorbell %d\n", dc->h2c_vdev_db);
diff --git a/drivers/misc/mic/vop/vop_main.c b/drivers/misc/mic/vop/vop_main.c
index 714b94f42d38..1ccc94dfd6ac 100644
--- a/drivers/misc/mic/vop/vop_main.c
+++ b/drivers/misc/mic/vop/vop_main.c
@@ -250,10 +250,6 @@ static void vop_del_vq(struct virtqueue *vq, int n)
struct _vop_vdev *vdev = to_vopvdev(vq->vdev);
struct vop_device *vpdev = vdev->vpdev;
 
-   dma_unmap_single(>dev, vdev->used[n],
-vdev->used_size[n], DMA_BIDIRECTIONAL);
-   free_pages((unsigned long)vdev->used_virt[n],
-  get_order(vdev->used_size[n]));
vring_del_virtqueue(vq);
vpdev->hw_ops->unmap(vpdev, vdev->vr[n]);
vdev->vr[n] = NULL;
@@ -340,8 +336,8 @@ static struct virtqueue *vop_find_vq(struct virtio_device 
*dev,
vdev->used_size[index] = PAGE_ALIGN(sizeof(__u16) * 3 +
 sizeof(struct vring_used_elem) *
 le16_to_cpu(config.num));
-   used = (void *)__get_free_pages(GFP_KERNEL | __GFP_ZERO,
-   get_order(vdev->used_size[index]));
+   used = va + PAGE_ALIGN(sizeof(struct vring_desc) * 
le16_to_cpu(config.num) +
+  sizeof(__u16) * (3 + le16_to_cpu(config.num)));
vdev->used_virt[index] = used;
if (!used) {
err = -ENOMEM;
@@ -355,27 +351,15 @@ static struct virtqueue *vop_find_vq(struct virtio_device 
*dev,
   name, used);
if (!vq) {
err = -ENOMEM;
-   goto free_used;
+   goto unmap;
}
 
-   vdev->used[index] = dma_map_single(>dev, used,
-   vdev->used_size[index],
-   DMA_BIDIRECTIONAL);
-   if (dma_mapping_error(>dev, vdev->used[index])) {
-   err = -ENOMEM;
-   dev_err(_vop_dev(vdev), "%s %d err %d\n",
-   __func__, __LINE__, err);
-   goto del_vq;
-   }
+   vdev->used[index] = config.address + PAGE_ALIGN(sizeof(struct 
vring_desc) * le16_to_cpu(config.num) +
+   sizeof(__u16) * (3 + le16_to_cpu(config.num)));
writeq(vdev->used[index], >used_address);
 
vq->priv = vdev;
return vq;
-del_vq:
-   vring_del_virtqueue(vq);
-free_used:
-   free_pages((unsigned long)used,
-  get_order(vdev->used_size[index]));
 unmap:
vpdev->hw_ops->unmap(vpdev, vdev->vr[index]);
return ERR_PTR(err);
@@ -388,9 +372,7 @@ static int vop_find_vqs(struct virtio_device *dev, unsigned 
nvqs,
struct irq_affinity *desc)
 {
struct _vop_vdev *vdev = to_vopvdev(dev);
-   struct vop_device *vpdev = vdev->vpdev;
-   struct mic_device_ctrl __iomem *dc = vdev->dc;
-   int i, err, retry, queue_idx = 0;
+   int i, err, queue_idx = 0;
 
/* We must have this many virtqueues. */
if (nvqs > ioread8(>desc->num_vq))
@@ -412,24 +394,6 @@ static int vop_find_vqs(struct virtio_device *dev, 
unsigned nvqs,
}
}
 
-   iowrite8(1, >used_address_updated);
-   /*
-* Send an interrupt to the host to inform it that used
-* rings have been re-assigned.
-*/
-

[PATCH v3 1/3] ARM: dts: rockchip: veyron: Remove 0 point from brightness-levels

2020-10-21 Thread Alexandru Stan

The extra 0 only adds one point in the userspace visible range,
so this change is almost a noop with the current driver behavior.

We don't need the 0% point, userspace seems to handle this just fine
because it uses the bl_power property to turn off the display.

Furthermore after adding "backlight: pwm_bl: Fix interpolation" patch,
the backlight interpolation will work a little differently. So we need
to preemptively remove the 0-3 segment since otherwise we would have a
252 long interpolation that would slowly go between 0 and 3, looking
really bad in userspace. So it's almost a noop/cleanup now, but it will
be required in the future.

Signed-off-by: Alexandru Stan 
---

 arch/arm/boot/dts/rk3288-veyron-jaq.dts| 2 +-
 arch/arm/boot/dts/rk3288-veyron-minnie.dts | 2 +-
 arch/arm/boot/dts/rk3288-veyron-tiger.dts  | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/arm/boot/dts/rk3288-veyron-jaq.dts 
b/arch/arm/boot/dts/rk3288-veyron-jaq.dts
index af77ab20586d..4a148cf1defc 100644
--- a/arch/arm/boot/dts/rk3288-veyron-jaq.dts
+++ b/arch/arm/boot/dts/rk3288-veyron-jaq.dts
@@ -20,7 +20,7 @@ / {
 
  {
/* Jaq panel PWM must be >= 3%, so start non-zero brightness at 8 */
-   brightness-levels = <0 8 255>;
+   brightness-levels = <8 255>;
num-interpolated-steps = <247>;
 };
 
diff --git a/arch/arm/boot/dts/rk3288-veyron-minnie.dts 
b/arch/arm/boot/dts/rk3288-veyron-minnie.dts
index f8b69e0a16a0..82fc6fba 100644
--- a/arch/arm/boot/dts/rk3288-veyron-minnie.dts
+++ b/arch/arm/boot/dts/rk3288-veyron-minnie.dts
@@ -39,7 +39,7 @@ volum_up {
 
  {
/* Minnie panel PWM must be >= 1%, so start non-zero brightness at 3 */
-   brightness-levels = <0 3 255>;
+   brightness-levels = <3 255>;
num-interpolated-steps = <252>;
 };
 
diff --git a/arch/arm/boot/dts/rk3288-veyron-tiger.dts 
b/arch/arm/boot/dts/rk3288-veyron-tiger.dts
index 069f0c2c1fdf..52a84cbe7a90 100644
--- a/arch/arm/boot/dts/rk3288-veyron-tiger.dts
+++ b/arch/arm/boot/dts/rk3288-veyron-tiger.dts
@@ -23,7 +23,7 @@ / {
 
  {
/* Tiger panel PWM must be >= 1%, so start non-zero brightness at 3 */
-   brightness-levels = <0 3 255>;
+   brightness-levels = <3 255>;
num-interpolated-steps = <252>;
 };
 
-- 
2.28.0

[PATCH v3 3/3] backlight: pwm_bl: Fix interpolation

2020-10-21 Thread Alexandru Stan

The previous behavior was a little unexpected, its properties/problems:
1. It was designed to generate strictly increasing values (no repeats)
2. It had quantization errors when calculating step size. Resulting in
unexpected jumps near the end of some segments.

Example settings:
brightness-levels = <0 1 2 4 8 16 32 64 128 256>;
num-interpolated-steps = <16>;

Whenever num-interpolated-steps was larger than the distance
between 2 consecutive brightness levels the table would get really
discontinuous. The slope of the interpolation would stick with
integers only and if it was 0 the whole line segment would get skipped.

The distances between 1 2 4 and 8 would be 1 (property #1 fighting us),
and only starting with 16 it would start to interpolate properly.

Property #1 is not enough. The goal here is more than just monotonically
increasing. We should still care about the shape of the curve. Repeated
points might be desired if we're in the part of the curve where we want
to go slow (aka slope near 0).

Problem #2 is plainly a bug. Imagine if the 64 entry was 63 instead,
the calculated slope on the 32-63 segment will be almost half as it
should be.

The most expected and simplest algorithm for interpolation is linear
interpolation, which would handle both problems.
Let's just implement that!

Take pairs of points from the brightness-levels array and linearly
interpolate between them. On the X axis (what userspace sees) we'll
now have equally sized intervals (num-interpolated-steps sized,
as opposed to before where we were at the mercy of quantization).

END

Signed-off-by: Alexandru Stan 
---

 drivers/video/backlight/pwm_bl.c | 70 ++--
 1 file changed, 31 insertions(+), 39 deletions(-)

diff --git a/drivers/video/backlight/pwm_bl.c b/drivers/video/backlight/pwm_bl.c
index dfc760830eb9..e48fded3e414 100644
--- a/drivers/video/backlight/pwm_bl.c
+++ b/drivers/video/backlight/pwm_bl.c
@@ -230,8 +230,7 @@ static int pwm_backlight_parse_dt(struct device *dev,
  struct platform_pwm_backlight_data *data)
 {
struct device_node *node = dev->of_node;
-   unsigned int num_levels = 0;
-   unsigned int levels_count;
+   unsigned int num_levels;
unsigned int num_steps = 0;
struct property *prop;
unsigned int *table;
@@ -260,12 +259,11 @@ static int pwm_backlight_parse_dt(struct device *dev,
if (!prop)
return 0;
 
-   data->max_brightness = length / sizeof(u32);
+   num_levels = length / sizeof(u32);
 
/* read brightness levels from DT property */
-   if (data->max_brightness > 0) {
-   size_t size = sizeof(*data->levels) * data->max_brightness;
-   unsigned int i, j, n = 0;
+   if (num_levels > 0) {
+   size_t size = sizeof(*data->levels) * num_levels;
 
data->levels = devm_kzalloc(dev, size, GFP_KERNEL);
if (!data->levels)
@@ -273,7 +271,7 @@ static int pwm_backlight_parse_dt(struct device *dev,
 
ret = of_property_read_u32_array(node, "brightness-levels",
 data->levels,
-data->max_brightness);
+num_levels);
if (ret < 0)
return ret;
 
@@ -298,7 +296,13 @@ static int pwm_backlight_parse_dt(struct device *dev,
 * between two points.
 */
if (num_steps) {
-   if (data->max_brightness < 2) {
+   unsigned int num_input_levels = num_levels;
+   unsigned int i;
+   u32 x1, x2, x, dx;
+   u32 y1, y2;
+   s64 dy;
+
+   if (num_input_levels < 2) {
dev_err(dev, "can't interpolate\n");
return -EINVAL;
}
@@ -308,14 +312,7 @@ static int pwm_backlight_parse_dt(struct device *dev,
 * taking in consideration the number of interpolated
 * steps between two levels.
 */
-   for (i = 0; i < data->max_brightness - 1; i++) {
-   if ((data->levels[i + 1] - data->levels[i]) /
-  num_steps)
-   num_levels += num_steps;
-   else
-   num_levels++;
-   }
-   num_levels++;
+   num_levels = (num_input_levels - 1) * num_steps + 1;
dev_dbg(dev, "new number of brightness levels: %d\n",
num_levels);
 
@@ -327,24 +324,25 @@ static int pwm_backlight_parse_dt(struct device *dev,

[PATCH v3 2/3] arm64: dts: qcom: trogdor: Add brightness-levels

2020-10-21 Thread Alexandru Stan

We want userspace to represent the human perceived brightness.
Since the led drivers and the leds themselves don't have a
linear response to the value we give them in terms of perceived
brightness, we'll bake the curve into the dts.

The panel also doesn't have a good response under 5%, so we'll avoid
sending it anything lower than that.

Note: Ideally this patch should be coupled with the driver change from
"backlight: pwm_bl: Fix interpolation", but it can work without it,
without looking too ugly.

Signed-off-by: Alexandru Stan 
---

 arch/arm64/boot/dts/qcom/sc7180-trogdor.dtsi | 9 +
 1 file changed, 9 insertions(+)

diff --git a/arch/arm64/boot/dts/qcom/sc7180-trogdor.dtsi 
b/arch/arm64/boot/dts/qcom/sc7180-trogdor.dtsi
index bf875589d364..ccdabc6c4994 100644
--- a/arch/arm64/boot/dts/qcom/sc7180-trogdor.dtsi
+++ b/arch/arm64/boot/dts/qcom/sc7180-trogdor.dtsi
@@ -179,6 +179,15 @@ pp3300_fp_tp: pp3300-fp-tp-regulator {
backlight: backlight {
compatible = "pwm-backlight";
 
+   /* The panels don't seem to like anything below ~ 5% */
+   brightness-levels = <
+   196 256 324 400 484 576 676 784 900 1024 1156 1296
+   1444 1600 1764 1936 2116 2304 2500 2704 2916 3136
+   3364 3600 3844 4096
+   >;
+   num-interpolated-steps = <64>;
+   default-brightness-level = <951>;
+
pwms = <_ec_pwm 1>;
enable-gpios = < 12 GPIO_ACTIVE_HIGH>;
power-supply = <_sys>;
-- 
2.28.0

[PATCH v3 0/3] PWM backlight interpolation adjustments

2020-10-21 Thread Alexandru Stan

I was trying to adjust the brightness-levels for the trogdor boards:
https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/2291209
Like on a lot of panels, trogdor's low end needs to be cropped,
and now that we have the interpolation stuff I wanted to make use of it
and bake in even the curve that's customary to have on chromebooks.

I found the current behavior of the pwm_bl driver a little unintuitive
and non-linear. See patch 1 for a suggested fix for this.

A few veyron dts files were relying on this (perhaps weird) behavior.
Those devices also want a minimum brightness like trogdor, so changed
them to use the new way.

Finally, given that trogdor's dts is part of linux-next now, add the
brightness-levels to it, since that's the original reason I was looking at
this.

Changes in v3:
- Reordered patches, since both dts changes will work just fine
  even before the driver change.
- Rewrote a bit of the commit message to describe the new policy,
  as Daniel suggested.
- Removed redundant s64 for something that's always positive

Changes in v2:
- Fixed type promotion in the driver
- Removed "backlight: pwm_bl: Artificially add 0% during interpolation",
  userspace works just fine without it because it already knows how to use
  bl_power for turning off the display.
- Added brightness-levels to trogdor as well, now the dts is upstream.


Alexandru Stan (3):
  ARM: dts: rockchip: veyron: Remove 0 point from brightness-levels
  arm64: dts: qcom: trogdor: Add brightness-levels
  backlight: pwm_bl: Fix interpolation

 arch/arm/boot/dts/rk3288-veyron-jaq.dts  |  2 +-
 arch/arm/boot/dts/rk3288-veyron-minnie.dts   |  2 +-
 arch/arm/boot/dts/rk3288-veyron-tiger.dts|  2 +-
 arch/arm64/boot/dts/qcom/sc7180-trogdor.dtsi |  9 +++
 drivers/video/backlight/pwm_bl.c | 70 +---
 5 files changed, 43 insertions(+), 42 deletions(-)

-- 
2.28.0

Re: [PATCH v2 5/6] crypto: lib/sha256 - Unroll LOAD and BLEND loops

2020-10-21 Thread Eric Biggers

On Tue, Oct 20, 2020 at 04:39:56PM -0400, Arvind Sankar wrote:
> Unrolling the LOAD and BLEND loops improves performance by ~8% on x86_64
> (tested on Broadwell Xeon) while not increasing code size too much.
> 
> Signed-off-by: Arvind Sankar 
> ---

Looks good,

Reviewed-by: Eric Biggers

Re: [PATCH] drm/amd/display: fix a possible NULL pointer dereference in bios_parser_get_src_obj()

2020-10-21 Thread Alex Deucher

On Mon, Oct 19, 2020 at 8:38 AM estherbdf <603571...@qq.com> wrote:
>
> [Why] the func  bios_parser_get_src_obj () is similar to  
> bios_parser_get_dst_obj () which is fixed by the 
> commit("drm/amd/display: Banch of smatch error and warning 
> fixes in DC").
> the symbol 'id' is uninitialized and it is not checked before dereference 
> it,may lead to null pointer dereference.
> [How] Initialized variable explicitly with NULL and add sanitizer.

I think the current code is safe as is.  get_src_obj_list() will
return 0 if *id_list is NULL and bios_parser_get_src_obj() checks if
number <= index.

Alex


>
> Signed-off-by: estherbdf <603571...@qq.com>
> ---
>  drivers/gpu/drm/amd/display/dc/bios/bios_parser.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/display/dc/bios/bios_parser.c 
> b/drivers/gpu/drm/amd/display/dc/bios/bios_parser.c
> index 008d4d1..94c6cca 100644
> --- a/drivers/gpu/drm/amd/display/dc/bios/bios_parser.c
> +++ b/drivers/gpu/drm/amd/display/dc/bios/bios_parser.c
> @@ -190,7 +190,7 @@ static enum bp_result bios_parser_get_src_obj(struct 
> dc_bios *dcb,
> struct graphics_object_id *src_object_id)
>  {
> uint32_t number;
> -   uint16_t *id;
> +   uint16_t *id = NULL;
> ATOM_OBJECT *object;
> struct bios_parser *bp = BP_FROM_DCB(dcb);
>
> @@ -206,7 +206,7 @@ static enum bp_result bios_parser_get_src_obj(struct 
> dc_bios *dcb,
>
> number = get_src_obj_list(bp, object, );
>
> -   if (number <= index)
> +   if (number <= index || !id)
> return BP_RESULT_BADINPUT;
>
> *src_object_id = object_id_from_bios_object_id(id[index]);
> --
> 1.9.1
>
>
> ___
> amd-gfx mailing list
> amd-...@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [PATCH v2 4/6] crypto: lib/sha256 - Unroll SHA256 loop 8 times intead of 64

2020-10-21 Thread Eric Biggers

On Tue, Oct 20, 2020 at 04:39:55PM -0400, Arvind Sankar wrote:
> This reduces code size substantially (on x86_64 with gcc-10 the size of
> sha256_update() goes from 7593 bytes to 1952 bytes including the new
> SHA256_K array), and on x86 is slightly faster than the full unroll
> (tesed on Broadwell Xeon).

tesed => tested

> 
> Signed-off-by: Arvind Sankar 
> ---
>  lib/crypto/sha256.c | 166 
>  1 file changed, 30 insertions(+), 136 deletions(-)
> 
> diff --git a/lib/crypto/sha256.c b/lib/crypto/sha256.c
> index c6bfeacc5b81..5efd390706c6 100644
> --- a/lib/crypto/sha256.c
> +++ b/lib/crypto/sha256.c
> @@ -18,6 +18,17 @@
>  #include 
>  #include 
>  
> +static const u32 SHA256_K[] = {
> + 0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5, 0x3956c25b, 0x59f111f1, 
> 0x923f82a4, 0xab1c5ed5,
> + 0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3, 0x72be5d74, 0x80deb1fe, 
> 0x9bdc06a7, 0xc19bf174,
> + 0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc, 0x2de92c6f, 0x4a7484aa, 
> 0x5cb0a9dc, 0x76f988da,
> + 0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7, 0xc6e00bf3, 0xd5a79147, 
> 0x06ca6351, 0x14292967,
> + 0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13, 0x650a7354, 0x766a0abb, 
> 0x81c2c92e, 0x92722c85,
> + 0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3, 0xd192e819, 0xd6990624, 
> 0xf40e3585, 0x106aa070,
> + 0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5, 0x391c0cb3, 0x4ed8aa4a, 
> 0x5b9cca4f, 0x682e6ff3,
> + 0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208, 0x90befffa, 0xa4506ceb, 
> 0xbef9a3f7, 0xc67178f2,
> +};

Limit this to 80 columns?

Otherwise this looks good.

- Eric

Re: [PATCH v2 3/6] crypto: lib/sha256 - Clear W[] in sha256_update() instead of sha256_transform()

2020-10-21 Thread Eric Biggers

On Tue, Oct 20, 2020 at 04:39:54PM -0400, Arvind Sankar wrote:
> The temporary W[] array is currently zeroed out once every call to
> sha256_transform(), i.e. once every 64 bytes of input data. Moving it to
> sha256_update() instead so that it is cleared only once per update can
> save about 2-3% of the total time taken to compute the digest, with a
> reasonable memset() implementation, and considerably more (~20%) with a
> bad one (eg the x86 purgatory currently uses a memset() coded in C).
> 
> Signed-off-by: Arvind Sankar 

Looks good,

Reviewed-by: Eric Biggers

RE: [PATCH] scsi: ufs: make sure scan sequence for multiple hosts

2020-10-21 Thread Chanho Park

> > Did you mean /dev/disk/by-[part]label/ symlink? It's quite reasonable to
> > use them by udev in userspace such as initramfs but some cases does not
> use
> > initramfs or initrd. In that case, we need to load the root
> > device(/dev/sda[N]) directly from kernel.
> 
> Please use udev or systemd instead of adding code in the UFS driver that
> is
> not necessary when udev or systemd is used.
>

What I mentioned was how it can be handled when we mount rootfs directly from 
kernel.

1) kernel -> initramfs (mount root) -> systemd
2) kernel (mount root) -> systemd
 -> In this case, we normally use root=/dev/sda1 from kernel commandline to 
mount the rootfs.

Like fstab can support legacy node mount, ufs driver also needs to provide an 
option for using the permanent legacy node. If you're really worry about adding 
a new codes for all UFS driver, we can put this as controller specific or 
optional feature.

Best Regards,
Chanho Park

Re: [PATCH v2 2/6] crypto: lib/sha256 - Don't clear temporary variables

2020-10-21 Thread Eric Biggers

On Tue, Oct 20, 2020 at 04:39:53PM -0400, Arvind Sankar wrote:
> The assignments to clear a through h and t1/t2 are optimized out by the
> compiler because they are unused after the assignments.
> 
> These variables shouldn't be very sensitive: t1/t2 can be calculated
> from a through h, so they don't reveal any additional information.
> Knowing a through h is equivalent to knowing one 64-byte block's SHA256
> hash (with non-standard initial value) which, assuming SHA256 is secure,
> doesn't reveal any information about the input.
> 
> Signed-off-by: Arvind Sankar 

I don't entirely buy the second paragraph.  It could be the case that the input
is less than or equal to one SHA-256 block (64 bytes), in which case leaking
'a' through 'h' would reveal the final SHA-256 hash if the input length is
known.  And note that callers might consider either the input, the resulting
hash, or both to be sensitive information -- it depends.

> ---
>  lib/crypto/sha256.c | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/lib/crypto/sha256.c b/lib/crypto/sha256.c
> index d43bc39ab05e..099cd11f83c1 100644
> --- a/lib/crypto/sha256.c
> +++ b/lib/crypto/sha256.c
> @@ -202,7 +202,6 @@ static void sha256_transform(u32 *state, const u8 *input)
>   state[4] += e; state[5] += f; state[6] += g; state[7] += h;
>  
>   /* clear any sensitive info... */
> - a = b = c = d = e = f = g = h = t1 = t2 = 0;
>   memzero_explicit(W, 64 * sizeof(u32));
>  }

Your change itself is fine, though.  As you mentioned, these assignments get
optimized out, so they weren't accomplishing anything.

The fact is, there just isn't any way to guarantee in C code that all sensitive
variables get cleared.

So we shouldn't (and generally don't) bother trying to clear individual u32's,
ints, etc. like this, but rather only structs and arrays, as clearing those is
more likely to work as intended.

- Eric

Re: [PATCH] drm/amdgpu: remove unneeded break

2020-10-21 Thread Alex Deucher

Applied.  Thanks!

Alex

On Mon, Oct 19, 2020 at 11:08 AM Harry Wentland  wrote:
>
> On 2020-10-19 10:55 a.m., Christian König wrote:
> > Am 19.10.20 um 16:43 schrieb t...@redhat.com:
> >> From: Tom Rix 
> >>
> >> A break is not needed if it is preceded by a return or break
> >>
> >> Signed-off-by: Tom Rix 
> >
> > Acked-by: Christian König 
>
> Reviewed-by: Harry Wentland 
>
> Harry
>
> >
> >> ---
> >>   drivers/gpu/drm/amd/display/dc/dce/dce_transform.c  | 1 -
> >>   drivers/gpu/drm/amd/display/dc/dce100/dce100_resource.c | 7 ---
> >>   drivers/gpu/drm/amd/display/dc/dce110/dce110_resource.c | 7 ---
> >>   drivers/gpu/drm/amd/display/dc/dce112/dce112_resource.c | 7 ---
> >>   drivers/gpu/drm/amd/display/dc/dce120/dce120_resource.c | 7 ---
> >>   drivers/gpu/drm/amd/display/dc/dce60/dce60_resource.c   | 7 ---
> >>   drivers/gpu/drm/amd/display/dc/dce80/dce80_resource.c   | 7 ---
> >>   7 files changed, 43 deletions(-)
> >>
> >> diff --git a/drivers/gpu/drm/amd/display/dc/dce/dce_transform.c
> >> b/drivers/gpu/drm/amd/display/dc/dce/dce_transform.c
> >> index 2a32b66959ba..130a0a0c8332 100644
> >> --- a/drivers/gpu/drm/amd/display/dc/dce/dce_transform.c
> >> +++ b/drivers/gpu/drm/amd/display/dc/dce/dce_transform.c
> >> @@ -1330,7 +1330,6 @@ static bool configure_graphics_mode(
> >>   REG_SET(OUTPUT_CSC_CONTROL, 0,
> >>   OUTPUT_CSC_GRPH_MODE, 0);
> >>   break;
> >> -break;
> >>   case COLOR_SPACE_SRGB_LIMITED:
> >>   /* TV RGB */
> >>   REG_SET(OUTPUT_CSC_CONTROL, 0,
> >> diff --git a/drivers/gpu/drm/amd/display/dc/dce100/dce100_resource.c
> >> b/drivers/gpu/drm/amd/display/dc/dce100/dce100_resource.c
> >> index d741787f75dc..42c7d157da32 100644
> >> --- a/drivers/gpu/drm/amd/display/dc/dce100/dce100_resource.c
> >> +++ b/drivers/gpu/drm/amd/display/dc/dce100/dce100_resource.c
> >> @@ -418,25 +418,18 @@ static int map_transmitter_id_to_phy_instance(
> >>   switch (transmitter) {
> >>   case TRANSMITTER_UNIPHY_A:
> >>   return 0;
> >> -break;
> >>   case TRANSMITTER_UNIPHY_B:
> >>   return 1;
> >> -break;
> >>   case TRANSMITTER_UNIPHY_C:
> >>   return 2;
> >> -break;
> >>   case TRANSMITTER_UNIPHY_D:
> >>   return 3;
> >> -break;
> >>   case TRANSMITTER_UNIPHY_E:
> >>   return 4;
> >> -break;
> >>   case TRANSMITTER_UNIPHY_F:
> >>   return 5;
> >> -break;
> >>   case TRANSMITTER_UNIPHY_G:
> >>   return 6;
> >> -break;
> >>   default:
> >>   ASSERT(0);
> >>   return 0;
> >> diff --git a/drivers/gpu/drm/amd/display/dc/dce110/dce110_resource.c
> >> b/drivers/gpu/drm/amd/display/dc/dce110/dce110_resource.c
> >> index 2bbfa2e176a9..382581c4a674 100644
> >> --- a/drivers/gpu/drm/amd/display/dc/dce110/dce110_resource.c
> >> +++ b/drivers/gpu/drm/amd/display/dc/dce110/dce110_resource.c
> >> @@ -471,25 +471,18 @@ static int map_transmitter_id_to_phy_instance(
> >>   switch (transmitter) {
> >>   case TRANSMITTER_UNIPHY_A:
> >>   return 0;
> >> -break;
> >>   case TRANSMITTER_UNIPHY_B:
> >>   return 1;
> >> -break;
> >>   case TRANSMITTER_UNIPHY_C:
> >>   return 2;
> >> -break;
> >>   case TRANSMITTER_UNIPHY_D:
> >>   return 3;
> >> -break;
> >>   case TRANSMITTER_UNIPHY_E:
> >>   return 4;
> >> -break;
> >>   case TRANSMITTER_UNIPHY_F:
> >>   return 5;
> >> -break;
> >>   case TRANSMITTER_UNIPHY_G:
> >>   return 6;
> >> -break;
> >>   default:
> >>   ASSERT(0);
> >>   return 0;
> >> diff --git a/drivers/gpu/drm/amd/display/dc/dce112/dce112_resource.c
> >> b/drivers/gpu/drm/amd/display/dc/dce112/dce112_resource.c
> >> index b622b4b1dac3..7b4b2304bbff 100644
> >> --- a/drivers/gpu/drm/amd/display/dc/dce112/dce112_resource.c
> >> +++ b/drivers/gpu/drm/amd/display/dc/dce112/dce112_resource.c
> >> @@ -446,25 +446,18 @@ static int map_transmitter_id_to_phy_instance(
> >>   switch (transmitter) {
> >>   case TRANSMITTER_UNIPHY_A:
> >>   return 0;
> >> -break;
> >>   case TRANSMITTER_UNIPHY_B:
> >>   return 1;
> >> -break;
> >>   case TRANSMITTER_UNIPHY_C:
> >>   return 2;
> >> -break;
> >>   case TRANSMITTER_UNIPHY_D:
> >>   return 3;
> >> -break;
> >>   case TRANSMITTER_UNIPHY_E:
> >>   return 4;
> >> -break;
> >>   case TRANSMITTER_UNIPHY_F:
> >>   return 5;
> >> -break;
> >>   case TRANSMITTER_UNIPHY_G:
> >>   return 6;
> >> -break;
> >>   default:
> >>   ASSERT(0);
> >>   return 0;
> >> diff --git a/drivers/gpu/drm/amd/display/dc/dce120/dce120_resource.c
> >> b/drivers/gpu/drm/amd/display/dc/dce120/dce120_resource.c
> >> index 16fe7344702f..3d782b7c86cb 100644
> >> ---

Re: [PATCH] sched/fair: check for idle core

2020-10-21 Thread Viresh Kumar

On 21-10-20, 20:11, Rafael J. Wysocki wrote:
> On Wednesday, October 21, 2020 3:10:26 PM CEST Peter Zijlstra wrote:
> > On Wed, Oct 21, 2020 at 02:19:50PM +0200, Peter Zijlstra wrote:
> > > On Wed, Oct 21, 2020 at 01:56:55PM +0200, Julia Lawall wrote:
> > > > Prior to 5.8, my machine was using intel_pstate and had few background
> > > > tasks.  Thus the problem wasn't visible in practice.  Starting with 5.8
> > > > the kernel decided that intel_cpufreq would be more appropriate, which
> > > > introduced kworkers every 0.004 seconds on all cores.
> > > 
> > > That still doesn't make any sense. Are you running the legacy on-demand
> > > thing or something?
> > > 
> > > Rafael, Srinivas, Viresh, how come it defaults to that?
> > 
> > Does we want something like this?
> > 
> > ---
> >  arch/x86/configs/i386_defconfig   | 3 +--
> >  arch/x86/configs/x86_64_defconfig | 3 +--
> >  drivers/cpufreq/Kconfig   | 7 +--
> >  3 files changed, 7 insertions(+), 6 deletions(-)
> > 
> > diff --git a/arch/x86/configs/i386_defconfig 
> > b/arch/x86/configs/i386_defconfig
> > index 78210793d357..c343ad459737 100644
> > --- a/arch/x86/configs/i386_defconfig
> > +++ b/arch/x86/configs/i386_defconfig
> > @@ -41,8 +41,7 @@ CONFIG_PM_DEBUG=y
> >  CONFIG_PM_TRACE_RTC=y
> >  CONFIG_ACPI_DOCK=y
> >  CONFIG_ACPI_BGRT=y
> > -CONFIG_CPU_FREQ_DEFAULT_GOV_USERSPACE=y
> > -CONFIG_CPU_FREQ_GOV_ONDEMAND=y
> > +CONFIG_CPU_FREQ_DEFAULT_GOV_SCHEDUTIL=y
> >  CONFIG_X86_ACPI_CPUFREQ=y
> >  CONFIG_EFI_VARS=y
> >  CONFIG_KPROBES=y
> > diff --git a/arch/x86/configs/x86_64_defconfig 
> > b/arch/x86/configs/x86_64_defconfig
> > index 9936528e1939..23e1ea85c1ec 100644
> > --- a/arch/x86/configs/x86_64_defconfig
> > +++ b/arch/x86/configs/x86_64_defconfig
> > @@ -38,8 +38,7 @@ CONFIG_PM_DEBUG=y
> >  CONFIG_PM_TRACE_RTC=y
> >  CONFIG_ACPI_DOCK=y
> >  CONFIG_ACPI_BGRT=y
> > -CONFIG_CPU_FREQ_DEFAULT_GOV_USERSPACE=y
> > -CONFIG_CPU_FREQ_GOV_ONDEMAND=y
> > +CONFIG_CPU_FREQ_DEFAULT_GOV_SCHEDUTIL=y
> >  CONFIG_X86_ACPI_CPUFREQ=y
> >  CONFIG_IA32_EMULATION=y
> >  CONFIG_EFI_VARS=y
> > diff --git a/drivers/cpufreq/Kconfig b/drivers/cpufreq/Kconfig
> > index 2c7171e0b001..8dfca6e9b836 100644
> > --- a/drivers/cpufreq/Kconfig
> > +++ b/drivers/cpufreq/Kconfig
> > @@ -37,8 +37,7 @@ config CPU_FREQ_STAT
> >  choice
> > prompt "Default CPUFreq governor"
> > default CPU_FREQ_DEFAULT_GOV_USERSPACE if ARM_SA1100_CPUFREQ || 
> > ARM_SA1110_CPUFREQ
> > -   default CPU_FREQ_DEFAULT_GOV_SCHEDUTIL if ARM64 || ARM
> > -   default CPU_FREQ_DEFAULT_GOV_SCHEDUTIL if X86_INTEL_PSTATE && SMP
> > +   default CPU_FREQ_DEFAULT_GOV_SCHEDUTIL if SMP
> > default CPU_FREQ_DEFAULT_GOV_PERFORMANCE
> > help
> >   This option sets which CPUFreq governor shall be loaded at
> > @@ -71,6 +70,7 @@ config CPU_FREQ_DEFAULT_GOV_USERSPACE
> >  
> >  config CPU_FREQ_DEFAULT_GOV_ONDEMAND
> > bool "ondemand"
> > +   depends on !SMP
> > select CPU_FREQ_GOV_ONDEMAND
> > select CPU_FREQ_GOV_PERFORMANCE
> > help
> > @@ -83,6 +83,7 @@ config CPU_FREQ_DEFAULT_GOV_ONDEMAND
> >  
> >  config CPU_FREQ_DEFAULT_GOV_CONSERVATIVE
> > bool "conservative"
> > +   depends on !SMP
> > select CPU_FREQ_GOV_CONSERVATIVE
> > select CPU_FREQ_GOV_PERFORMANCE
> > help
> 
> The changes above should work.
> 
> > @@ -144,6 +145,7 @@ config CPU_FREQ_GOV_USERSPACE
> >  
> >  config CPU_FREQ_GOV_ONDEMAND
> > tristate "'ondemand' cpufreq policy governor"
> > +   depends on !SMP
> 
> But I don't think that we can do this and the one below.

I have exactly the same comments.

> > select CPU_FREQ_GOV_COMMON
> > help
> >   'ondemand' - This driver adds a dynamic cpufreq policy governor.
> > @@ -163,6 +165,7 @@ config CPU_FREQ_GOV_ONDEMAND
> >  config CPU_FREQ_GOV_CONSERVATIVE
> > tristate "'conservative' cpufreq governor"
> > depends on CPU_FREQ
> > +   depends on !SMP
> > select CPU_FREQ_GOV_COMMON
> > help
> >   'conservative' - this driver is rather similar to the 'ondemand'
> > 
> 
> 
> 

-- 
viresh

RE: [PATCH] scsi: megaraid_sas: use spin_lock() in hard IRQ

2020-10-21 Thread Tianxianting

Thanks,
Do you mean Megasas raid can be used in m68k arch?

-Original Message-
From: Finn Thain [mailto:fth...@telegraphics.com.au] 
Sent: Thursday, October 22, 2020 10:25 AM
To: tianxianting (RD) 
Cc: kashyap.de...@broadcom.com; sumit.sax...@broadcom.com; 
shivasharan.srikanteshw...@broadcom.com; j...@linux.ibm.com; 
martin.peter...@oracle.com; megaraidlinux@broadcom.com; 
linux-s...@vger.kernel.org; linux-kernel@vger.kernel.org
Subject: Re: [PATCH] scsi: megaraid_sas: use spin_lock() in hard IRQ

On Wed, 21 Oct 2020, Xianting Tian wrote:

> Since we already in hard IRQ context when running megasas_isr(),

On m68k, hard irq context does not mean interrupts are disabled. Are there no 
other architectures in that category?

> so use spin_lock() is enough, which is faster than spin_lock_irqsave().
> 

Is that measurable?

> Signed-off-by: Xianting Tian 
> ---
>  drivers/scsi/megaraid/megaraid_sas_base.c | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/scsi/megaraid/megaraid_sas_base.c 
> b/drivers/scsi/megaraid/megaraid_sas_base.c
> index 2b7e7b5f3..bd186254d 100644
> --- a/drivers/scsi/megaraid/megaraid_sas_base.c
> +++ b/drivers/scsi/megaraid/megaraid_sas_base.c
> @@ -3977,15 +3977,14 @@ static irqreturn_t megasas_isr(int irq, void 
> *devp)  {
>   struct megasas_irq_context *irq_context = devp;
>   struct megasas_instance *instance = irq_context->instance;
> - unsigned long flags;
>   irqreturn_t rc;
>  
>   if (atomic_read(>fw_reset_no_pci_access))
>   return IRQ_HANDLED;
>  
> - spin_lock_irqsave(>hba_lock, flags);
> + spin_lock(>hba_lock);
>   rc = megasas_deplete_reply_queue(instance, DID_OK);
> - spin_unlock_irqrestore(>hba_lock, flags);
> + spin_unlock(>hba_lock);
>  
>   return rc;
>  }
>

[PATCH v2] firmware: gsmi: Drop the use of dma_pool_* API functions

2020-10-21 Thread Furquan Shaikh

GSMI driver uses dma_pool_* API functions for buffer allocation
because it requires that the SMI buffers are allocated within 32-bit
physical address space. However, this does not work well with IOMMU
since there is no real device and hence no domain associated with the
device.

Since this is not a real device, it does not require any device
address(IOVA) for the buffer allocations. The only requirement is to
ensure that the physical address allocated to the buffer is within
32-bit physical address space. This is because the buffers have
nothing to do with DMA at all. It is required for communication with
firmware executing in SMI mode which has access only to the bottom
4GiB of memory. Hence, this change switches to using a SLAB cache
created with SLAB_CACHE_DMA32 that guarantees that the allocation
happens from the DMA32 memory zone. All calls to dma_pool_* are
replaced with kmem_cache_*.

In addition to that, all the code for managing the dma_pool for GSMI
platform device is dropped.

Signed-off-by: Furquan Shaikh 
---
Changelog since v1:
- Switched to using SLAB cache with SLAB_CACHE_DMA32.
- Added comment to code and commit message explaining the reason for
using DMA32 memory zone.

 drivers/firmware/google/gsmi.c | 38 +++---
 1 file changed, 26 insertions(+), 12 deletions(-)

diff --git a/drivers/firmware/google/gsmi.c b/drivers/firmware/google/gsmi.c
index 7d9367b22010..092d8cb209a3 100644
--- a/drivers/firmware/google/gsmi.c
+++ b/drivers/firmware/google/gsmi.c
@@ -17,7 +17,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -85,7 +84,6 @@
 struct gsmi_buf {
u8 *start;  /* start of buffer */
size_t length;  /* length of buffer */
-   dma_addr_t handle;  /* dma allocation handle */
u32 address;/* physical address of buffer */
 };
 
@@ -97,7 +95,7 @@ static struct gsmi_device {
spinlock_t lock;/* serialize access to SMIs */
u16 smi_cmd;/* SMI command port */
int handshake_type; /* firmware handler interlock type */
-   struct dma_pool *dma_pool;  /* DMA buffer pool */
+   struct kmem_cache *mem_pool;/* kmem cache for gsmi_buf allocations 
*/
 } gsmi_dev;
 
 /* Packed structures for communicating with the firmware */
@@ -157,14 +155,20 @@ static struct gsmi_buf *gsmi_buf_alloc(void)
}
 
/* allocate buffer in 32bit address space */
-   smibuf->start = dma_pool_alloc(gsmi_dev.dma_pool, GFP_KERNEL,
-  >handle);
+   smibuf->start = kmem_cache_alloc(gsmi_dev.mem_pool, GFP_KERNEL);
if (!smibuf->start) {
printk(KERN_ERR "gsmi: failed to allocate name buffer\n");
kfree(smibuf);
return NULL;
}
 
+   if ((u64)virt_to_phys(smibuf->start) >> 32) {
+   printk(KERN_ERR "gsmi: allocation not within 32-bit physical 
address space\n");
+   kfree(smibuf->start);
+   kfree(smibuf);
+   return NULL;
+   }
+
/* fill in the buffer handle */
smibuf->length = GSMI_BUF_SIZE;
smibuf->address = (u32)virt_to_phys(smibuf->start);
@@ -176,8 +180,7 @@ static void gsmi_buf_free(struct gsmi_buf *smibuf)
 {
if (smibuf) {
if (smibuf->start)
-   dma_pool_free(gsmi_dev.dma_pool, smibuf->start,
- smibuf->handle);
+   kmem_cache_free(gsmi_dev.mem_pool, smibuf->start);
kfree(smibuf);
}
 }
@@ -914,9 +917,20 @@ static __init int gsmi_init(void)
spin_lock_init(_dev.lock);
 
ret = -ENOMEM;
-   gsmi_dev.dma_pool = dma_pool_create("gsmi", _dev.pdev->dev,
-GSMI_BUF_SIZE, GSMI_BUF_ALIGN, 0);
-   if (!gsmi_dev.dma_pool)
+
+   /*
+* SLAB cache is created using SLAB_CACHE_DMA32 to ensure that the
+* allocations for gsmi_buf come from the DMA32 memory zone. These
+* buffers have nothing to do with DMA. They are required for
+* communication with firmware executing in SMI mode which can only
+* access the bottom 4GiB of physical memory. Since DMA32 memory zone
+* guarantees allocation under the 4GiB boundary, this driver creates
+* a SLAB cache with SLAB_CACHE_DMA32 flag.
+*/
+   gsmi_dev.mem_pool = kmem_cache_create("gsmi", GSMI_BUF_SIZE,
+ GSMI_BUF_ALIGN,
+ SLAB_CACHE_DMA32, NULL);
+   if (!gsmi_dev.mem_pool)
goto out_err;
 
/*
@@ -1032,7 +1046,7 @@ static __init int gsmi_init(void)
gsmi_buf_free(gsmi_dev.param_buf);
gsmi_buf_free(gsmi_dev.data_buf);
gsmi_buf_free(gsmi_dev.name_buf);
-

Re: [PATCH 2/2] venus: venc: fix handlig of S_SELECTION and G_SELECTION

2020-10-21 Thread vgarodia


Hi Tomasz,

On 2020-10-13 19:09, Tomasz Figa wrote:

Hi Vikash,

On Tue, Oct 13, 2020 at 02:56:21PM +0530, vgaro...@codeaurora.org 
wrote:


On 2020-10-08 19:51, Tomasz Figa wrote:
> On Wed, Oct 7, 2020 at 9:33 PM  wrote:
> >
> > Hi Tomasz,
> >
> > On 2020-10-01 20:47, Tomasz Figa wrote:
> > > On Thu, Oct 1, 2020 at 3:32 AM Stanimir Varbanov
> > >  wrote:
> > >>
> > >> Hi Tomasz,
> > >>
> > >> On 9/25/20 11:55 PM, Tomasz Figa wrote:
> > >> > Hi Dikshita, Stanimir,
> > >> >
> > >> > On Thu, Sep 24, 2020 at 7:31 PM Dikshita Agarwal
> > >> >  wrote:
> > >> >>
> > >> >> From: Stanimir Varbanov 
> > >> >>
> > >> >> - return correct width and height for G_SELECTION
> > >> >> - if requested rectangle wxh doesn't match with capture port wxh
> > >> >>   adjust the rectangle to supported wxh.
> > >> >>
> > >> >> Signed-off-by: Dikshita Agarwal 
> > >> >> ---
> > >> >>  drivers/media/platform/qcom/venus/venc.c | 20 
> > >> >>  1 file changed, 12 insertions(+), 8 deletions(-)
> > >> >>
> > >> >> diff --git a/drivers/media/platform/qcom/venus/venc.c 
b/drivers/media/platform/qcom/venus/venc.c
> > >> >> index 7d2aaa8..a2cc12d 100644
> > >> >> --- a/drivers/media/platform/qcom/venus/venc.c
> > >> >> +++ b/drivers/media/platform/qcom/venus/venc.c
> > >> >> @@ -463,13 +463,13 @@ static int venc_g_fmt(struct file *file, void 
*fh, struct v4l2_format *f)
> > >> >> switch (s->target) {
> > >> >> case V4L2_SEL_TGT_CROP_DEFAULT:
> > >> >> case V4L2_SEL_TGT_CROP_BOUNDS:
> > >> >> -   s->r.width = inst->width;
> > >> >> -   s->r.height = inst->height;
> > >> >> -   break;
> > >> >> -   case V4L2_SEL_TGT_CROP:
> > >> >> s->r.width = inst->out_width;
> > >> >> s->r.height = inst->out_height;
> > >> >> break;
> > >> >> +   case V4L2_SEL_TGT_CROP:
> > >> >> +   s->r.width = inst->width;
> > >> >> +   s->r.height = inst->height;
> > >> >> +   break;
> > >> >> default:
> > >> >> return -EINVAL;
> > >> >> }inter
> > >> >> @@ -490,10 +490,14 @@ static int venc_g_fmt(struct file *file, void 
*fh, struct v4l2_format *f)
> > >> >>
> > >> >> switch (s->target) {
> > >> >> case V4L2_SEL_TGT_CROP:
> > >> >> -   if (s->r.width != inst->out_width ||
> > >> >> -   s->r.height != inst->out_height ||
> > >> >> -   s->r.top != 0 || s->r.left != 0)
> > >> >> -   return -EINVAL;
> > >> >> +   if (s->r.width != inst->width ||
> > >> >> +   s->r.height != inst->height ||
> > >> >> +   s->r.top != 0 || s->r.left != 0) {
> > >> >> +   s->r.top = 0;
> > >> >> +   s->r.left = 0;
> > >> >> +   s->r.width = inst->width;
> > >> >> +   s->r.height = inst->height;
> > >> >
> > >> > What's the point of exposing the selection API if no selection can
> > >> > actually be done?
> > >>
> > >> If someone can guarantee that dropping of s_selection will not break
> > >> userspace applications I'm fine with removing it.
> > >
> > > Indeed the specification could be made more clear about this. The
> > > visible rectangle configuration is described as optional, so I'd
> > > consider the capability to be optional as well.
> > >
> > > Of course it doesn't change the fact that something that is optional
> > > in the API may be mandatory for some specific integrations, like
> > > Chrome OS or Android.
> > >
> > >>
> > >> I implemented g/s_selection with the idea to add crop functionality
> > >> later because with current firmware interface it needs more work.
> > >
> > > I suggested one thing internally, but not sure if it was understood
> > > correctly:
> > >
> > > Most of the encoders only support partial cropping, with the rectangle
> > > limited to top = 0 and left = 0, in other words, only setting the
> > > visible width and height. This can be easily implemented on most of
> > > the hardware, even those that don't have dedicated cropping
> > > capability, by configuring the hardware as follows:
> > >
> > > stride = CAPTURE format width (or bytesperline)
> > > width = CROP width
> > > height = CROP height
> >
> > Assuming the bitstream height and width would be configured with
> > capture
> > plane
> > setting (s_fmt), configuring the crop as height/width would indicate
> > to
> > venus
> > hardware as scaling. To distinguish scaling with crop, firmware
> > needs to
> > be
> > configured separately indicating crop rectangle.
>
> The V4L2 encoder API does _not_ configure the bitstream width and
> height currently. Scaling is not defined in the API at the moment. As
> per the spec [1], the CAPTURE width and height fields are
> ignored/read-only.
>
> [1]
> 
https://www.kernel.org/doc/html/latest/userspace-api/media/v4l/dev-encoder.html#initialization
>
> Currently there

Re: [f2fs-dev] [PATCH v2 2/2] f2fs: add F2FS_IOC_SET_COMPRESS_OPTION ioctl

2020-10-21 Thread Daeho Jeong

> The use of ?: here is a bit strange.  How about:
>
> return algorithm < COMPRESS_MAX && f2fs_cops[algorithm] != NULL;
>

Ack

> Likewise, EINVAL tends to be over-used, which makes it ambiguous.  Maybe use
> ENOPKG for the case where algorithm < COMPRESS_MAX but the algorithm wasn't
> compiled into the kernel?  That would be similar to how opening an encrypted
> file fails with ENOPKG when the encryption algorithm isn't available.

Ack

> How about EBUSY for f2fs_is_mmap_file(inode) || get_dirty_pages(inode),
> and EFBIG for inode->i_size != 0?

Ack

Thanks~!

2020년 10월 22일 (목) 오후 1:26, Eric Biggers 님이 작성:
>
> On Thu, Oct 22, 2020 at 12:58:48PM +0900, Daeho Jeong wrote:
> > diff --git a/fs/f2fs/compress.c b/fs/f2fs/compress.c
> > index 7895186cc765..3b58a41223f8 100644
> > --- a/fs/f2fs/compress.c
> > +++ b/fs/f2fs/compress.c
> > @@ -514,6 +514,11 @@ bool f2fs_is_compress_backend_ready(struct inode 
> > *inode)
> >   return f2fs_cops[F2FS_I(inode)->i_compress_algorithm];
> >  }
> >
> > +bool f2fs_is_compress_algorithm_ready(unsigned char algorithm)
> > +{
> > + return algorithm >= COMPRESS_MAX ? false : f2fs_cops[algorithm];
> > +}
>
> The use of ?: here is a bit strange.  How about:
>
> return algorithm < COMPRESS_MAX && f2fs_cops[algorithm] != NULL;
>
> > + if (option.log_cluster_size < MIN_COMPRESS_LOG_SIZE ||
> > + option.log_cluster_size > MAX_COMPRESS_LOG_SIZE ||
> > + !f2fs_is_compress_algorithm_ready(option.algorithm))
> > + return -EINVAL;
>
> Likewise, EINVAL tends to be over-used, which makes it ambiguous.  Maybe use
> ENOPKG for the case where algorithm < COMPRESS_MAX but the algorithm wasn't
> compiled into the kernel?  That would be similar to how opening an encrypted
> file fails with ENOPKG when the encryption algorithm isn't available.
>
> > + if (f2fs_is_mmap_file(inode) ||
> > + get_dirty_pages(inode) || inode->i_size) {
> > + ret = -EINVAL;
> > + goto out;
> > + }
>
> How about EBUSY for f2fs_is_mmap_file(inode) || get_dirty_pages(inode),
> and EFBIG for inode->i_size != 0?
>
> - Eric

Re: [PATCH v2 1/6] crypto: Use memzero_explicit() for clearing state

2020-10-21 Thread Eric Biggers

On Tue, Oct 20, 2020 at 04:39:52PM -0400, Arvind Sankar wrote:
> Without the barrier_data() inside memzero_explicit(), the compiler may
> optimize away the state-clearing if it can tell that the state is not
> used afterwards. At least in lib/crypto/sha256.c:__sha256_final(), the
> function can get inlined into sha256(), in which case the memset is
> optimized away.
> 
> Signed-off-by: Arvind Sankar 

Reviewed-by: Eric Biggers 

Maybe get the one in arch/arm64/crypto/sha3-ce-glue.c too?

- Eric

Re: [PATCH v2 05/15] perf session: introduce decompressor into trace reader object

2020-10-21 Thread Namhyung Kim

On Thu, Oct 22, 2020 at 1:00 AM Alexey Budankov
 wrote:
>
>
> Introduce decompressor to trace reader object so that decompression
> could be executed on per trace file basis separately for every
> trace file located in trace directory.

I'm slightly uncomfortable with the word 'trace' here as it's
used for other cases like perf trace and ftrace.  Maybe we can
change it to 'profile' and/or 'data file'?

Thanks
Namhyung

>
> Signed-off-by: Alexey Budankov 
> ---
>  tools/perf/util/session.c | 4 +++-
>  tools/perf/util/session.h | 1 +
>  2 files changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
> index 911b2dbcd0ac..6afc670fdf0c 100644
> --- a/tools/perf/util/session.c
> +++ b/tools/perf/util/session.c
> @@ -44,6 +44,8 @@ static int perf_session__process_compressed_event(struct 
> perf_session *session,
> u64 decomp_last_rem = 0;
> size_t mmap_len, decomp_len = session->header.env.comp_mmap_len;
> struct decomp *decomp, *decomp_last = session->decomp_last;
> +   struct zstd_data *zstd_data = session->reader ?
> +   &(session->reader->zstd_data) : &(session->zstd_data);
>
> if (decomp_last) {
> decomp_last_rem = decomp_last->size - decomp_last->head;
> @@ -71,7 +73,7 @@ static int perf_session__process_compressed_event(struct 
> perf_session *session,
> src = (void *)event + sizeof(struct perf_record_compressed);
> src_size = event->pack.header.size - sizeof(struct 
> perf_record_compressed);
>
> -   decomp_size = zstd_decompress_stream(&(session->zstd_data), src, 
> src_size,
> +   decomp_size = zstd_decompress_stream(zstd_data, src, src_size,
> &(decomp->data[decomp_last_rem]), decomp_len 
> - decomp_last_rem);
> if (!decomp_size) {
> munmap(decomp, mmap_len);
> diff --git a/tools/perf/util/session.h b/tools/perf/util/session.h
> index abdb8518a81f..4fc9ccdf7970 100644
> --- a/tools/perf/util/session.h
> +++ b/tools/perf/util/session.h
> @@ -42,6 +42,7 @@ struct reader {
> u64  data_size;
> u64  data_offset;
> reader_cb_t  process;
> +   struct zstd_data zstd_data;
>  };
>
>  struct perf_session {
> --
> 2.24.1
>
>

Re: [PATCH v2 6/6] crypto: lib/sha - Combine round constants and message schedule

2020-10-21 Thread Eric Biggers

On Tue, Oct 20, 2020 at 04:39:57PM -0400, Arvind Sankar wrote:
> Putting the round constants and the message schedule arrays together in
> one structure saves one register, which can be a significant benefit on
> register-constrained architectures. On x86-32 (tested on Broadwell
> Xeon), this gives a 10% performance benefit.
> 
> Signed-off-by: Arvind Sankar 
> Suggested-by: David Laight 
> ---
>  lib/crypto/sha256.c | 49 ++---
>  1 file changed, 28 insertions(+), 21 deletions(-)
> 
> diff --git a/lib/crypto/sha256.c b/lib/crypto/sha256.c
> index 3a8802d5f747..985cd0560d79 100644
> --- a/lib/crypto/sha256.c
> +++ b/lib/crypto/sha256.c
> @@ -29,6 +29,11 @@ static const u32 SHA256_K[] = {
>   0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208, 0x90befffa, 0xa4506ceb, 
> 0xbef9a3f7, 0xc67178f2,
>  };
>  
> +struct KW {
> + u32 K[64];
> + u32 W[64];
> +};

Note that this doubles the stack usage from 256 to 512 bytes.  That's pretty
large for kernel code, especially when compiler options can increase the stack
usage well beyond the "expected" value.

So unless this gives a big performance improvement on architectures other than
32-bit x86 (which people don't really care about these days), we probably
shouldn't do this.

FWIW, it's possible to reduce the length of 'W' to 16 words by computing the
next W value just before each round 16-63, or by computing the next W values in
batches of 16 before rounds 16, 32, and 48.  (This is similar to what lib/sha1.c
does for SHA-1.)  In a quick userspace benchmark that seems to reduce
performance by about 25% on x86_64, though.

- Eric

Re: [PATCH v2 04/15] perf session: move reader object definition to header file

2020-10-21 Thread Namhyung Kim

On Thu, Oct 22, 2020 at 12:59 AM Alexey Budankov
 wrote:
>
>
> Move definition of reader to session header file to be shared
> among different source files. Introduce reference to active
> reader object from session object.
>
> Signed-off-by: Alexey Budankov 

Acked-by: Namhyung Kim 

Thanks
Namhyung

> ---
>  tools/perf/util/session.c | 27 ---
>  tools/perf/util/session.h | 25 +
>  2 files changed, 25 insertions(+), 27 deletions(-)
>
> diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
> index 6f09d506b2f6..911b2dbcd0ac 100644
> --- a/tools/perf/util/session.c
> +++ b/tools/perf/util/session.c
> @@ -2110,33 +2110,6 @@ static int 
> __perf_session__process_decomp_events(struct perf_session *session)
> return 0;
>  }
>
> -/*
> - * On 64bit we can mmap the data file in one go. No need for tiny mmap
> - * slices. On 32bit we use 32MB.
> - */
> -#if BITS_PER_LONG == 64
> -#define MMAP_SIZE ULLONG_MAX
> -#define NUM_MMAPS 1
> -#else
> -#define MMAP_SIZE (32 * 1024 * 1024ULL)
> -#define NUM_MMAPS 128
> -#endif
> -
> -struct reader;
> -
> -typedef s64 (*reader_cb_t)(struct perf_session *session,
> -  union perf_event *event,
> -  u64 file_offset,
> -  const char *file_path);
> -
> -struct reader {
> -   int  fd;
> -   const char   *path;
> -   u64  data_size;
> -   u64  data_offset;
> -   reader_cb_t  process;
> -};
> -
>  static int
>  reader__process_events(struct reader *rd, struct perf_session *session,
>struct ui_progress *prog)
> diff --git a/tools/perf/util/session.h b/tools/perf/util/session.h
> index 378ffc3e2809..abdb8518a81f 100644
> --- a/tools/perf/util/session.h
> +++ b/tools/perf/util/session.h
> @@ -20,6 +20,30 @@ struct thread;
>  struct auxtrace;
>  struct itrace_synth_opts;
>
> +/*
> + * On 64bit we can mmap the data file in one go. No need for tiny mmap
> + * slices. On 32bit we use 32MB.
> + */
> +#if BITS_PER_LONG == 64
> +#define MMAP_SIZE ULLONG_MAX
> +#define NUM_MMAPS 1
> +#else
> +#define MMAP_SIZE (32 * 1024 * 1024ULL)
> +#define NUM_MMAPS 128
> +#endif
> +
> +typedef s64 (*reader_cb_t)(struct perf_session *session,
> +  union perf_event *event,
> +  u64 file_offset, const char *file_path);
> +
> +struct reader {
> +   int  fd;
> +   const char   *path;
> +   u64  data_size;
> +   u64  data_offset;
> +   reader_cb_t  process;
> +};
> +
>  struct perf_session {
> struct perf_header  header;
> struct machines machines;
> @@ -41,6 +65,7 @@ struct perf_session {
> struct zstd_datazstd_data;
> struct decomp   *decomp;
> struct decomp   *decomp_last;
> +   struct reader   *reader;
>  };
>
>  struct decomp {
> --
> 2.24.1
>

Re: [PATCH v2 03/15] perf data: open data directory in read access mode

2020-10-21 Thread Namhyung Kim

On Thu, Oct 22, 2020 at 12:58 AM Alexey Budankov
 wrote:
>
>
> Open files located at trace data directory in case read access
> mode is requested. File are opened and its fds assigned to
> perf_data dir files especially for loading data directories
> content in perf report mode.
>
> Signed-off-by: Alexey Budankov 
> ---
>  tools/perf/util/data.c | 4 
>  1 file changed, 4 insertions(+)
>
> diff --git a/tools/perf/util/data.c b/tools/perf/util/data.c
> index c47aa34fdc0a..6ad61ac6ba67 100644
> --- a/tools/perf/util/data.c
> +++ b/tools/perf/util/data.c
> @@ -321,6 +321,10 @@ static int open_dir(struct perf_data *data)
> return -1;
>
> ret = open_file(data);
> +   if (!ret && perf_data__is_dir(data)) {

I think this __is_dir() check is unnecessary since it's checked
from the caller side already.

Thanks
Namhyung


> +   if (perf_data__is_read(data))
> +   ret = perf_data__open_dir(data);
> +   }
>
> /* Cleanup whatever we managed to create so far. */
> if (ret && perf_data__is_write(data))
> --
> 2.24.1
>

Re: [f2fs-dev] [PATCH v2 1/2] f2fs: add F2FS_IOC_GET_COMPRESS_OPTION ioctl

2020-10-21 Thread Daeho Jeong

Yep, it sounds more clear~

2020년 10월 22일 (목) 오후 1:22, Eric Biggers 님이 작성:
>
> On Thu, Oct 22, 2020 at 12:58:47PM +0900, Daeho Jeong wrote:
> > + if (!f2fs_compressed_file(inode)) {
> > + inode_unlock(inode);
> > + return -EINVAL;
> > + }
>
> How about using ENODATA here?  EINVAL tends to be used for lots of different
> reasons, and it's not always clear what it means.
>
> Note that FS_IOC_GET_ENCRYPTION_POLICY fails with ENODATA when called on an
> unencrypted file, which is a similar case.
>
> - Eric

[PATCH] thermal: imx: Do NOT return -EPROBE_DEFER when "#cooling-cells" is present

2020-10-21 Thread Anson Huang

The legacy CPU cooling should ONLY be used when "#cooling-cells" is NOT
present in cpu node, current implementation for registering legacy cooling
always return -EPROBE_DEFER when cpufreq is NOT ready, that will cause
thermal driver probe failed when cpufreq failed to probe with a non
-EPROBE_DEFER reason. In such case, thermal driver should continue probe
and provide temperature monitor and other cooling methods.

So, for the case of "#cooling-cells" present in cpu node, no need to
return -EPROBE_DEFER even cpufreq is NOT ready, this is to make sure
thermal driver can continue probe.

Signed-off-by: Anson Huang 
---
 drivers/thermal/imx_thermal.c | 25 +++--
 1 file changed, 19 insertions(+), 6 deletions(-)

diff --git a/drivers/thermal/imx_thermal.c b/drivers/thermal/imx_thermal.c
index 9f00182..df60dcb 100644
--- a/drivers/thermal/imx_thermal.c
+++ b/drivers/thermal/imx_thermal.c
@@ -630,17 +630,29 @@ MODULE_DEVICE_TABLE(of, of_imx_thermal_match);
 static int imx_thermal_register_legacy_cooling(struct imx_thermal_data *data)
 {
struct device_node *np;
+   struct device *cpu_dev;
int ret = 0;
 
-   data->policy = cpufreq_cpu_get(0);
-   if (!data->policy) {
-   pr_debug("%s: CPUFreq policy not found\n", __func__);
-   return -EPROBE_DEFER;
+   cpu_dev = get_cpu_device(0);
+   if (!cpu_dev) {
+   pr_err("imx thermal: failed to get cpu0 device\n");
+   return -ENODEV;
+   }
+
+   np = of_node_get(cpu_dev->of_node);
+   if (!np) {
+   pr_err("imx thermal: failed to find cpu0 node\n");
+   return -ENOENT;
}
 
-   np = of_get_cpu_node(data->policy->cpu, NULL);
+   if (!of_find_property(np, "#cooling-cells", NULL)) {
+   data->policy = cpufreq_cpu_get(0);
+   if (!data->policy) {
+   pr_debug("%s: CPUFreq policy not found\n", __func__);
+   ret = -EPROBE_DEFER;
+   goto put_node;
+   }
 
-   if (!np || !of_find_property(np, "#cooling-cells", NULL)) {
data->cdev = cpufreq_cooling_register(data->policy);
if (IS_ERR(data->cdev)) {
ret = PTR_ERR(data->cdev);
@@ -648,6 +660,7 @@ static int imx_thermal_register_legacy_cooling(struct 
imx_thermal_data *data)
}
}
 
+put_node:
of_node_put(np);
 
return ret;
-- 
2.7.4

Re: [PATCH v2 01/15] perf session: introduce trace file path to be shown in raw trace dump

2020-10-21 Thread Namhyung Kim

Hi,

On Thu, Oct 22, 2020 at 12:56 AM Alexey Budankov
 wrote:
>
>
> Extend reader, ordered_event and decomp objects to contain path
> of a trace file being displayed.
>
> Signed-off-by: Alexey Budankov 

Acked-by: Namhyung Kim 

Thanks
Namhyung

> ---
>  tools/perf/util/ordered-events.h | 1 +
>  tools/perf/util/session.c| 2 ++
>  tools/perf/util/session.h| 1 +
>  3 files changed, 4 insertions(+)
>
> diff --git a/tools/perf/util/ordered-events.h 
> b/tools/perf/util/ordered-events.h
> index 75345946c4b9..42c9764c6b5b 100644
> --- a/tools/perf/util/ordered-events.h
> +++ b/tools/perf/util/ordered-events.h
> @@ -9,6 +9,7 @@ struct perf_sample;
>  struct ordered_event {
> u64 timestamp;
> u64 file_offset;
> +   const char  *file_path;
> union perf_event*event;
> struct list_headlist;
>  };
> diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
> index 7a5f03764702..4478ddae485b 100644
> --- a/tools/perf/util/session.c
> +++ b/tools/perf/util/session.c
> @@ -2119,6 +2119,7 @@ typedef s64 (*reader_cb_t)(struct perf_session *session,
>
>  struct reader {
> int  fd;
> +   const char   *path;
> u64  data_size;
> u64  data_offset;
> reader_cb_t  process;
> @@ -2241,6 +2242,7 @@ static int __perf_session__process_events(struct 
> perf_session *session)
> .data_size  = session->header.data_size,
> .data_offset= session->header.data_offset,
> .process= process_simple,
> +   .path   = session->data->file.path,
> };
> struct ordered_events *oe = >ordered_events;
> struct perf_tool *tool = session->tool;
> diff --git a/tools/perf/util/session.h b/tools/perf/util/session.h
> index f76480166d38..378ffc3e2809 100644
> --- a/tools/perf/util/session.h
> +++ b/tools/perf/util/session.h
> @@ -46,6 +46,7 @@ struct perf_session {
>  struct decomp {
> struct decomp *next;
> u64 file_pos;
> +   const char *file_path;
> size_t mmap_len;
> u64 head;
> size_t size;
> --
> 2.24.1
>
>

Re: [PATCH v2 02/15] perf report: output trace file name in raw trace dump

2020-10-21 Thread Namhyung Kim

On Thu, Oct 22, 2020 at 12:57 AM Alexey Budankov
 wrote:
>
>
> Print path and name of a trace file into raw dump (-D)
> @. Print offset of PERF_RECORD_COMPRESSED
> record instead of zero for decompressed records:
>   0x22...@perf.data [0x30]: event: 9
> or
>   0x15c...@perf.data/data.7 [0x30]: event: 9
>
> Signed-off-by: Alexey Budankov 

Acked-by: Namhyung Kim 

Thanks
Namhyung

> ---
>  tools/perf/builtin-inject.c |  3 +-
>  tools/perf/util/session.c   | 75 ++---
>  tools/perf/util/tool.h  |  3 +-
>  3 files changed, 48 insertions(+), 33 deletions(-)
>
> diff --git a/tools/perf/builtin-inject.c b/tools/perf/builtin-inject.c
> index 452a75fe68e5..037f8a98220c 100644
> --- a/tools/perf/builtin-inject.c
> +++ b/tools/perf/builtin-inject.c
> @@ -106,7 +106,8 @@ static int perf_event__repipe_op2_synth(struct 
> perf_session *session,
>
>  static int perf_event__repipe_op4_synth(struct perf_session *session,
> union perf_event *event,
> -   u64 data __maybe_unused)
> +   u64 data __maybe_unused,
> +   const char *str __maybe_unused)
>  {
> return perf_event__repipe_synth(session->tool, event);
>  }
> diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
> index 4478ddae485b..6f09d506b2f6 100644
> --- a/tools/perf/util/session.c
> +++ b/tools/perf/util/session.c
> @@ -36,7 +36,8 @@
>
>  #ifdef HAVE_ZSTD_SUPPORT
>  static int perf_session__process_compressed_event(struct perf_session 
> *session,
> - union perf_event *event, 
> u64 file_offset)
> + union perf_event *event, 
> u64 file_offset,
> + const char *file_path)
>  {
> void *src;
> size_t decomp_size, src_size;
> @@ -58,6 +59,7 @@ static int perf_session__process_compressed_event(struct 
> perf_session *session,
> }
>
> decomp->file_pos = file_offset;
> +   decomp->file_path = file_path;
> decomp->mmap_len = mmap_len;
> decomp->head = 0;
>
> @@ -98,7 +100,8 @@ static int perf_session__process_compressed_event(struct 
> perf_session *session,
>  static int perf_session__deliver_event(struct perf_session *session,
>union perf_event *event,
>struct perf_tool *tool,
> -  u64 file_offset);
> +  u64 file_offset,
> +  const char *file_path);
>
>  static int perf_session__open(struct perf_session *session)
>  {
> @@ -180,7 +183,8 @@ static int ordered_events__deliver_event(struct 
> ordered_events *oe,
> ordered_events);
>
> return perf_session__deliver_event(session, event->event,
> -  session->tool, event->file_offset);
> +  session->tool, event->file_offset,
> +  event->file_path);
>  }
>
>  struct perf_session *perf_session__new(struct perf_data *data,
> @@ -452,7 +456,8 @@ static int process_stat_round_stub(struct perf_session 
> *perf_session __maybe_unu
>
>  static int perf_session__process_compressed_event_stub(struct perf_session 
> *session __maybe_unused,
>union perf_event 
> *event __maybe_unused,
> -  u64 file_offset 
> __maybe_unused)
> +  u64 file_offset 
> __maybe_unused,
> +  const char *file_path 
> __maybe_unused)
>  {
> dump_printf(": unhandled!\n");
> return 0;
> @@ -1227,13 +1232,14 @@ static void sample_read__printf(struct perf_sample 
> *sample, u64 read_format)
>  }
>
>  static void dump_event(struct evlist *evlist, union perf_event *event,
> -  u64 file_offset, struct perf_sample *sample)
> +  u64 file_offset, struct perf_sample *sample,
> +  const char *file_path)
>  {
> if (!dump_trace)
> return;
>
> -   printf("\n%#" PRIx64 " [%#x]: event: %d\n",
> -  file_offset, event->header.size, event->header.type);
> +   printf("\n%#" PRIx64 "@%s [%#x]: event: %d\n",
> +  file_offset, file_path, event->header.size, 
> event->header.type);
>
> trace_event(event);
> if (event->header.type == PERF_RECORD_SAMPLE && 
> evlist->trace_event_sample_raw)
> @@ -1424,12 +1430,13 @@ static int machines__deliver_event(struct machines 
> *machines,
>struct evlist *evlist,
>union

Re: [PATCH] kbuild: Use uname for LINUX_COMPILE_HOST detection

2020-10-21 Thread Masahiro Yamada

On Tue, Oct 20, 2020 at 6:35 PM Chris Down  wrote:
>
> `hostname` may not be present on some systems as it's not mandated by
> POSIX/SUSv4. This isn't just a theoretical problem: on Arch Linux,
> `hostname` is provided by `inetutils`, which isn't part of the base
> distribution.
>
> ./scripts/mkcompile_h: line 38: hostname: command not found
>
> Use `uname -n` instead, which is more likely to be available (and
> mandated by standards).
>
> Signed-off-by: Chris Down 
> Cc: Masahiro Yamada 
> Cc: Michal Marek 
> Cc: linux-kbu...@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> ---

Applied to linux-kbuild. Thanks.


>  scripts/mkcompile_h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/scripts/mkcompile_h b/scripts/mkcompile_h
> index baf3ab8d9d49..4ae735039daf 100755
> --- a/scripts/mkcompile_h
> +++ b/scripts/mkcompile_h
> @@ -35,7 +35,7 @@ else
> LINUX_COMPILE_BY=$KBUILD_BUILD_USER
>  fi
>  if test -z "$KBUILD_BUILD_HOST"; then
> -   LINUX_COMPILE_HOST=`hostname`
> +   LINUX_COMPILE_HOST=`uname -n`
>  else
> LINUX_COMPILE_HOST=$KBUILD_BUILD_HOST
>  fi
> --
> 2.28.0
>


-- 
Best Regards
Masahiro Yamada

Re: [f2fs-dev] [PATCH v2 2/2] f2fs: add F2FS_IOC_SET_COMPRESS_OPTION ioctl

2020-10-21 Thread Eric Biggers

On Thu, Oct 22, 2020 at 12:58:48PM +0900, Daeho Jeong wrote:
> diff --git a/fs/f2fs/compress.c b/fs/f2fs/compress.c
> index 7895186cc765..3b58a41223f8 100644
> --- a/fs/f2fs/compress.c
> +++ b/fs/f2fs/compress.c
> @@ -514,6 +514,11 @@ bool f2fs_is_compress_backend_ready(struct inode *inode)
>   return f2fs_cops[F2FS_I(inode)->i_compress_algorithm];
>  }
>  
> +bool f2fs_is_compress_algorithm_ready(unsigned char algorithm)
> +{
> + return algorithm >= COMPRESS_MAX ? false : f2fs_cops[algorithm];
> +}

The use of ?: here is a bit strange.  How about:

return algorithm < COMPRESS_MAX && f2fs_cops[algorithm] != NULL;

> + if (option.log_cluster_size < MIN_COMPRESS_LOG_SIZE ||
> + option.log_cluster_size > MAX_COMPRESS_LOG_SIZE ||
> + !f2fs_is_compress_algorithm_ready(option.algorithm))
> + return -EINVAL;

Likewise, EINVAL tends to be over-used, which makes it ambiguous.  Maybe use
ENOPKG for the case where algorithm < COMPRESS_MAX but the algorithm wasn't
compiled into the kernel?  That would be similar to how opening an encrypted
file fails with ENOPKG when the encryption algorithm isn't available.

> + if (f2fs_is_mmap_file(inode) ||
> + get_dirty_pages(inode) || inode->i_size) {
> + ret = -EINVAL;
> + goto out;
> + }

How about EBUSY for f2fs_is_mmap_file(inode) || get_dirty_pages(inode),
and EFBIG for inode->i_size != 0?

- Eric

Re: [f2fs-dev] [PATCH v2 1/2] f2fs: add F2FS_IOC_GET_COMPRESS_OPTION ioctl

2020-10-21 Thread Eric Biggers

On Thu, Oct 22, 2020 at 12:58:47PM +0900, Daeho Jeong wrote:
> + if (!f2fs_compressed_file(inode)) {
> + inode_unlock(inode);
> + return -EINVAL;
> + }

How about using ENODATA here?  EINVAL tends to be used for lots of different
reasons, and it's not always clear what it means.

Note that FS_IOC_GET_ENCRYPTION_POLICY fails with ENODATA when called on an
unencrypted file, which is a similar case.

- Eric

Re: [PATCH] kunit: Fix kunit.py --raw_output option

2020-10-21 Thread Brendan Higgins

On Wed, Oct 21, 2020 at 8:05 PM David Gow  wrote:
>
> Due to the raw_output() function on kunit_parser.py actually being a
> generator, it only runs if something reads the lines it returns. Since
> we no-longer do that (parsing doesn't actually happen if raw_output is
> enabled), it was not printing anything.
>
> Fixes:  45ba7a893ad89114e773b3dc32f6431354c465d6 ("kunit: kunit_tool: 
> Separate out config/build/exec/parse")
> Signed-off-by: David Gow 

Thanks for fixing this!

Reviewed-by: Brendan Higgins 
Tested-by: Brendan Higgins

Re: Performance regressions in "boot_time" tests in Linux 5.8 Kernel

2020-10-21 Thread b...@redhat.com

Hi Rahul,

On 10/20/20 at 03:26pm, Rahul Gopakumar wrote:
> >> Here, do you mean it even cost more time with the patch applied?
> 
> Yes, we ran it multiple times and it looks like there is a 
> very minor increase with the patch.
> 
.. 
> On 10/20/20 at 01:45pm, Rahul Gopakumar wrote:
> > Hi Baoquan,
> > 
> > We had some trouble applying the patch to problem commit and the latest 
> > upstream commit. Steven (CC'ed) helped us by providing the updated draft 
> > patch. We applied it on the latest commit 
> > (3e4fb4346c781068610d03c12b16c0cfb0fd24a3), and it doesn't look like 
> > improving the performance numbers.
> 
> Thanks for your feedback. From the code, I am sure what the problem is,
> but I didn't test it on system with huge memory. Forget mentioning my
> draft patch is based on akpm/master branch since it's a mm issue, it
> might be a little different with linus's mainline kernel, sorry for the
> inconvenience.
> 
> I will test and debug this on a server with 4T memory in our lab, and
> update if any progress.
> 
> > 
> > Patch on latest commit - 20.161 secs
> > Vanilla latest commit - 19.50 secs
> 

Can you tell how you measure the boot time? I checked the boot logs you
attached, E.g in below two logs, I saw patch_dmesg.log even has less
time during memmap init. Now I have got a machine with 1T memory for
testing, but didn't see obvious time cost increase. At above, you said
"Patch on latest commit - 20.161 secs", could you tell where this 20.161
secs comes from, so that I can investigate and reproduce on my system?

patch_dmesg.log:
[0.023126] Initmem setup node 1 [mem 0x0056-0x00aa]
[0.023128] On node 1 totalpages: 89128960
[0.023129]   Normal zone: 1392640 pages used for memmap
[0.023130]   Normal zone: 89128960 pages, LIFO batch:63
[0.023893] Initmem setup node 2 [mem 0x00ab-0x01033fff]
[0.023895] On node 2 totalpages: 89391104
[0.023896]   Normal zone: 1445888 pages used for memmap
[0.023897]   Normal zone: 89391104 pages, LIFO batch:63
[0.026744] ACPI: PM-Timer IO Port: 0x448
[0.026747] ACPI: Local APIC address 0xfee0

vanilla_dmesg.log:
[0.024295] Initmem setup node 1 [mem 0x0056-0x00aa]
[0.024298] On node 1 totalpages: 89128960
[0.024299]   Normal zone: 1392640 pages used for memmap
[0.024299]   Normal zone: 89128960 pages, LIFO batch:63
[0.025289] Initmem setup node 2 [mem 0x00ab-0x01033fff]
[0.025291] On node 2 totalpages: 89391104
[0.025292]   Normal zone: 1445888 pages used for memmap
[0.025293]   Normal zone: 89391104 pages, LIFO batch:63
[2.096982] ACPI: PM-Timer IO Port: 0x448
[2.096987] ACPI: Local APIC address 0xfee0

[PATCH v2 1/2] f2fs: add F2FS_IOC_GET_COMPRESS_OPTION ioctl

2020-10-21 Thread Daeho Jeong

From: Daeho Jeong 

Added a new F2FS_IOC_GET_COMPRESS_OPTION ioctl to get file compression
option of a file.

struct f2fs_comp_option {
u8 algorithm; => compression algorithm
  => 0:lzo, 1:lz4, 2:zstd, 3:lzorle
u8 log_cluster_size;  => log scale cluster size
  => 2 ~ 8
};

struct f2fs_comp_option option;

ioctl(fd, F2FS_IOC_GET_COMPRESS_OPTION, );

Signed-off-by: Daeho Jeong 
---
 fs/f2fs/f2fs.h |  7 +++
 fs/f2fs/file.c | 30 ++
 2 files changed, 37 insertions(+)

diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 53fe2853579c..a33c90cf979b 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -433,6 +433,8 @@ static inline bool __has_cursum_space(struct f2fs_journal 
*journal,
_IOR(F2FS_IOCTL_MAGIC, 19, __u64)
 #define F2FS_IOC_SEC_TRIM_FILE _IOW(F2FS_IOCTL_MAGIC, 20,  \
struct f2fs_sectrim_range)
+#define F2FS_IOC_GET_COMPRESS_OPTION   _IOR(F2FS_IOCTL_MAGIC, 21,  \
+   struct f2fs_comp_option)
 
 /*
  * should be same as XFS_IOC_GOINGDOWN.
@@ -481,6 +483,11 @@ struct f2fs_sectrim_range {
u64 flags;
 };
 
+struct f2fs_comp_option {
+   u8 algorithm;
+   u8 log_cluster_size;
+};
+
 /* for inline stuff */
 #define DEF_INLINE_RESERVED_SIZE   1
 static inline int get_extra_isize(struct inode *inode);
diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index ef5a844de53f..7e64259f6f5e 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -3936,6 +3936,33 @@ static int f2fs_sec_trim_file(struct file *filp, 
unsigned long arg)
return ret;
 }
 
+static int f2fs_ioc_get_compress_option(struct file *filp, unsigned long arg)
+{
+   struct inode *inode = file_inode(filp);
+   struct f2fs_comp_option option;
+
+   if (!f2fs_sb_has_compression(F2FS_I_SB(inode)))
+   return -EOPNOTSUPP;
+
+   inode_lock(inode);
+
+   if (!f2fs_compressed_file(inode)) {
+   inode_unlock(inode);
+   return -EINVAL;
+   }
+
+   option.algorithm = F2FS_I(inode)->i_compress_algorithm;
+   option.log_cluster_size = F2FS_I(inode)->i_log_cluster_size;
+
+   inode_unlock(inode);
+
+   if (copy_to_user((struct f2fs_comp_option __user *)arg, ,
+   sizeof(option)))
+   return -EFAULT;
+
+   return 0;
+}
+
 long f2fs_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
 {
if (unlikely(f2fs_cp_error(F2FS_I_SB(file_inode(filp)
@@ -4024,6 +4051,8 @@ long f2fs_ioctl(struct file *filp, unsigned int cmd, 
unsigned long arg)
return f2fs_reserve_compress_blocks(filp, arg);
case F2FS_IOC_SEC_TRIM_FILE:
return f2fs_sec_trim_file(filp, arg);
+   case F2FS_IOC_GET_COMPRESS_OPTION:
+   return f2fs_ioc_get_compress_option(filp, arg);
default:
return -ENOTTY;
}
@@ -4194,6 +4223,7 @@ long f2fs_compat_ioctl(struct file *file, unsigned int 
cmd, unsigned long arg)
case F2FS_IOC_RELEASE_COMPRESS_BLOCKS:
case F2FS_IOC_RESERVE_COMPRESS_BLOCKS:
case F2FS_IOC_SEC_TRIM_FILE:
+   case F2FS_IOC_GET_COMPRESS_OPTION:
break;
default:
return -ENOIOCTLCMD;
-- 
2.29.0.rc1.297.gfa9743e501-goog

[PATCH v2 2/2] f2fs: add F2FS_IOC_SET_COMPRESS_OPTION ioctl

2020-10-21 Thread Daeho Jeong

From: Daeho Jeong 

Added a new F2FS_IOC_SET_COMPRESS_OPTION ioctl to change file
compression option of a file.

struct f2fs_comp_option {
u8 algorithm; => compression algorithm
  => 0:lzo, 1:lz4, 2:zstd, 3:lzorle
u8 log_cluster_size;  => log scale cluster size
  => 2 ~ 8
};

struct f2fs_comp_option option;

option.algorithm = 1;
option.log_cluster_size = 7;

ioctl(fd, F2FS_IOC_SET_COMPRESS_OPTION, );

Signed-off-by: Daeho Jeong 
---
 fs/f2fs/compress.c |  5 +
 fs/f2fs/f2fs.h |  7 +++
 fs/f2fs/file.c | 48 ++
 3 files changed, 60 insertions(+)

diff --git a/fs/f2fs/compress.c b/fs/f2fs/compress.c
index 7895186cc765..3b58a41223f8 100644
--- a/fs/f2fs/compress.c
+++ b/fs/f2fs/compress.c
@@ -514,6 +514,11 @@ bool f2fs_is_compress_backend_ready(struct inode *inode)
return f2fs_cops[F2FS_I(inode)->i_compress_algorithm];
 }
 
+bool f2fs_is_compress_algorithm_ready(unsigned char algorithm)
+{
+   return algorithm >= COMPRESS_MAX ? false : f2fs_cops[algorithm];
+}
+
 static mempool_t *compress_page_pool;
 static int num_compress_pages = 512;
 module_param(num_compress_pages, uint, 0444);
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index a33c90cf979b..cc38afde6c04 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -435,6 +435,8 @@ static inline bool __has_cursum_space(struct f2fs_journal 
*journal,
struct f2fs_sectrim_range)
 #define F2FS_IOC_GET_COMPRESS_OPTION   _IOR(F2FS_IOCTL_MAGIC, 21,  \
struct f2fs_comp_option)
+#define F2FS_IOC_SET_COMPRESS_OPTION   _IOW(F2FS_IOCTL_MAGIC, 22,  \
+   struct f2fs_comp_option)
 
 /*
  * should be same as XFS_IOC_GOINGDOWN.
@@ -3915,6 +3917,7 @@ bool f2fs_compress_write_end(struct inode *inode, void 
*fsdata,
 int f2fs_truncate_partial_cluster(struct inode *inode, u64 from, bool lock);
 void f2fs_compress_write_end_io(struct bio *bio, struct page *page);
 bool f2fs_is_compress_backend_ready(struct inode *inode);
+bool f2fs_is_compress_algorithm_ready(unsigned char algorithm);
 int f2fs_init_compress_mempool(void);
 void f2fs_destroy_compress_mempool(void);
 void f2fs_decompress_pages(struct bio *bio, struct page *page, bool verity);
@@ -3945,6 +3948,10 @@ static inline bool f2fs_is_compress_backend_ready(struct 
inode *inode)
/* not support compression */
return false;
 }
+static inline bool f2fs_is_compress_algorithm_ready(unsigned char algorithm)
+{
+   return false;
+}
 static inline struct page *f2fs_compress_control_page(struct page *page)
 {
WARN_ON_ONCE(1);
diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index 7e64259f6f5e..b18ef22e2d9d 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -3963,6 +3963,51 @@ static int f2fs_ioc_get_compress_option(struct file 
*filp, unsigned long arg)
return 0;
 }
 
+static int f2fs_ioc_set_compress_option(struct file *filp, unsigned long arg)
+{
+   struct inode *inode = file_inode(filp);
+   struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
+   struct f2fs_comp_option option;
+   int ret;
+
+   if (!f2fs_sb_has_compression(sbi))
+   return -EOPNOTSUPP;
+
+   if (!f2fs_compressed_file(inode))
+   return -EINVAL;
+
+   if (!(filp->f_mode & FMODE_WRITE))
+   return -EBADF;
+
+   if (copy_from_user(, (struct f2fs_comp_option __user *)arg,
+   sizeof(option)))
+   return -EFAULT;
+
+   if (option.log_cluster_size < MIN_COMPRESS_LOG_SIZE ||
+   option.log_cluster_size > MAX_COMPRESS_LOG_SIZE ||
+   !f2fs_is_compress_algorithm_ready(option.algorithm))
+   return -EINVAL;
+
+   file_start_write(filp);
+   inode_lock(inode);
+
+   if (f2fs_is_mmap_file(inode) ||
+   get_dirty_pages(inode) || inode->i_size) {
+   ret = -EINVAL;
+   goto out;
+   }
+
+   F2FS_I(inode)->i_compress_algorithm = option.algorithm;
+   F2FS_I(inode)->i_log_cluster_size = option.log_cluster_size;
+   F2FS_I(inode)->i_cluster_size = 1 << option.log_cluster_size;
+   f2fs_mark_inode_dirty_sync(inode, true);
+out:
+   inode_unlock(inode);
+   file_end_write(filp);
+
+   return ret;
+}
+
 long f2fs_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
 {
if (unlikely(f2fs_cp_error(F2FS_I_SB(file_inode(filp)
@@ -4053,6 +4098,8 @@ long f2fs_ioctl(struct file *filp, unsigned int cmd, 
unsigned long arg)
return f2fs_sec_trim_file(filp, arg);
case F2FS_IOC_GET_COMPRESS_OPTION:
return f2fs_ioc_get_compress_option(filp, arg);
+   case F2FS_IOC_SET_COMPRESS_OPTION:
+   return f2fs_ioc_set_compress_option(filp, arg);
default:

Re: [PATCH] ext: EXT4_KUNIT_TESTS should depend on EXT4_FS instead of selecting it

2020-10-21 Thread Randy Dunlap

On 10/21/20 8:43 PM, Theodore Y. Ts'o wrote:
> On Wed, Oct 21, 2020 at 04:07:15PM -0700, Randy Dunlap wrote:
>>> I'm don't particularly care how this gets achieved, but please think
>>> about how to make it easy for a kernel developer to run a specific set
>>> of subsystem unit tests.  (In fact, being able to do something like
>>> "kunit.py run fs/ext4 fs/jbd2" or maybe "kunit.py run fs/..." would be
>>> *great*.  No need to fuss with hand editing the .kunitconfig file at
>>> all would be **wonderful**.
>>
>> I understand the wish for ease of use, but this is still the tail
>> wagging the dog.
>>
>> The primary documentation for 'select' is
>> Documentation/kbuild/kconfig-language.rst, which says:
>>
>>   Note:
>>  select should be used with care. select will force
>>  a symbol to a value without visiting the dependencies.
>>  By abusing select you are able to select a symbol FOO even
>>  if FOO depends on BAR that is not set.
>>  In general use select only for non-visible symbols
>>  (no prompts anywhere) and for symbols with no dependencies.
>>  That will limit the usefulness but on the other hand avoid
>>  the illegal configurations all over.
>>
> 
> Well, the KUNIT configs are kinda of a special case, since normally
> they don't have a lot of huge number of dependencies, since unit tests
> in general are not integration tests.  So ideally, dependencies will
> mostly be replaced with mocking functions.  And if there are *real*
> dependencies that the Kunit Unit tests need, they can be explicitly
> pulled in with selects.
> 
> That being said, as I said, I'm not picky about *how* this gets
> achieved.  But ease of use is a key part of making people more likely
> to run the unit tests.  So another way of solving the problem might be
> to put some kind of automated dependency solver into kunit.py, or some
> way of manually adding the necessary dependencies in some kind of
> Kunitconfig file that are in directories where their are Unit tests,
> or maybe some kind of extenstion to the Kconfig file.  My main
> requirement is that the only thing that should be necessary for
> enabling the ext4 Kunit tests should be adding a single line to the
> .kunitconfig file.  It's not fair to make the human developer manually
> have to figure out the dependency chains.
> 
> As far as I'm concerned, ease of use is important enough to justfy
> special casing and/or bending the rules as far as "select" is concered
> for Kunit-related CONFIG items.  But if someone else want to suggest a
> better approach, I'm all ears.
> 
> Cheers,

Indeed.  For the record, I support testing and have for a long time.
I just don't care for this big fscking hammer approach.
But I doubt that I can change your mind.

g'day.
-- 
~Randy

Re: [PATCH v2 1/3] backlight: pwm_bl: Fix interpolation

2020-10-21 Thread Alexandru Stan

On Wed, 14 Oct 2020 at 23:55, Geert Uytterhoeven  wrote:
>
> Hi Alexandru,
>
> On Tue, Oct 13, 2020 at 1:57 PM Alexandru Stan  wrote:
> > Whenever num-interpolated-steps was larger than the distance
> > between 2 consecutive brightness levels the table would get really
> > discontinuous. The slope of the interpolation would stick with
> > integers only and if it was 0 the whole line segment would get skipped.
> >
> > Example settings:
> > brightness-levels = <0 1 2 4 8 16 32 64 128 256>;
> > num-interpolated-steps = <16>;
> >
> > The distances between 1 2 4 and 8 would be 1, and only starting with 16
> > it would start to interpolate properly.
> >
> > Let's change it so there's always interpolation happening, even if
> > there's no enough points available (read: values in the table would
> > appear more than once). This should match the expected behavior much
> > more closely.
> >
> > Signed-off-by: Alexandru Stan 
>
> Thanks for your patch!

Thanks for your reply!

I'm sorry I haven't replied earlier. Looks like your reply was marked as spam.
Rest be assured my spam filter has been disciplined! :D

>
> > --- a/drivers/video/backlight/pwm_bl.c
> > +++ b/drivers/video/backlight/pwm_bl.c
> > @@ -327,24 +324,25 @@ static int pwm_backlight_parse_dt(struct device *dev,
> > table = devm_kzalloc(dev, size, GFP_KERNEL);
> > if (!table)
> > return -ENOMEM;
> > -
> > -   /* Fill the interpolated table. */
> > -   levels_count = 0;
> > -   for (i = 0; i < data->max_brightness - 1; i++) {
> > -   value = data->levels[i];
> > -   n = (data->levels[i + 1] - value) / 
> > num_steps;
> > -   if (n > 0) {
> > -   for (j = 0; j < num_steps; j++) {
> > -   table[levels_count] = value;
> > -   value += n;
> > -   levels_count++;
> > -   }
> > -   } else {
> > -   table[levels_count] = 
> > data->levels[i];
> > -   levels_count++;
> > +   /*
> > +* Fill the interpolated table[x] = y
> > +* by draw lines between each (x1, y1) to (x2, y2).
> > +*/
> > +   dx = num_steps;
> > +   for (i = 0; i < num_input_levels - 1; i++) {
> > +   x1 = i * dx;
> > +   x2 = x1 + dx;
> > +   y1 = data->levels[i];
> > +   y2 = data->levels[i + 1];
> > +   dy = (s64)y2 - y1;
> > +
> > +   for (x = x1; x < x2; x++) {
> > +   table[x] = y1 +
> > +   div_s64(dy * ((s64)x - x1), 
> > dx);
>
> Yummy, 64-by-32 divisions.
> Shouldn't this use a rounded division?

It won't hurt. But it really doesn't make much of a difference either way.

>
> Nevertheless, I think it would be worthwhile to implement this using
> a (modified) Bresenham algorithm, avoiding multiplications and
> divisions, and possibly increasing accuracy as well.
>
> https://en.wikipedia.org/wiki/Bresenham%27s_line_algorithm

Sure, it might be a little faster to use Bresenham's line algorithm.
Looks like to implement it I would have to deal with some fixed point
math and still have to do divisions occasionally.
I don't think performance is critical here, the values get calculated
only once when the driver loads, and the algorithm's accuracy
improvements might be at most 1 LSB.

Meanwhile the formula I already implemented is almost the same as the
formulas found at
https://en.wikipedia.org/wiki/Linear_interpolation#:~:text=gives
I would like to keep it as is, as straightforward as possible.

>
> > }
> > }
> > -   table[levels_count] = data->levels[i];
> > +   /* Fill in the last point, since no line starts 
> > here. */
> > +   table[x2] = y2;
> >
> > /*
> >  * As we use interpolation lets remove current
>
> Gr{oetje,eeting}s,
>
> Geert
>
> --
> Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- 
> ge...@linux-m68k.org
>
> In personal conversations with technical people, I call myself a hacker. But
> when I'm talking to journalists I just say "programmer" or something like 
> that.
> -- Linus Torvalds

Alexandru Stan (amstan)

[PATCH] mm,thp,shmem: limit shmem THP alloc gfp_mask

2020-10-21 Thread Rik van Riel

The allocation flags of anonymous transparent huge pages can be controlled
through the files in /sys/kernel/mm/transparent_hugepage/defrag, which can
help the system from getting bogged down in the page reclaim and compaction
code when many THPs are getting allocated simultaneously.

However, the gfp_mask for shmem THP allocations were not limited by those
configuration settings, and some workloads ended up with all CPUs stuck
on the LRU lock in the page reclaim code, trying to allocate dozens of
THPs simultaneously.

This patch applies the same configurated limitation of THPs to shmem
hugepage allocations, to prevent that from happening.

This way a THP defrag setting of "never" or "defer+madvise" will result
in quick allocation failures without direct reclaim when no 2MB free
pages are available.

Signed-off-by: Rik van Riel 
---

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index c603237e006c..0a5b164a26d9 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -614,6 +614,8 @@ bool gfp_pfmemalloc_allowed(gfp_t gfp_mask);
 extern void pm_restrict_gfp_mask(void);
 extern void pm_restore_gfp_mask(void);
 
+extern gfp_t alloc_hugepage_direct_gfpmask(struct vm_area_struct *vma);
+
 #ifdef CONFIG_PM_SLEEP
 extern bool pm_suspended_storage(void);
 #else
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 9474dbc150ed..9b08ce5cc387 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -649,7 +649,7 @@ static vm_fault_t __do_huge_pmd_anonymous_page(struct 
vm_fault *vmf,
  * available
  * never: never stall for any thp allocation
  */
-static inline gfp_t alloc_hugepage_direct_gfpmask(struct vm_area_struct *vma)
+gfp_t alloc_hugepage_direct_gfpmask(struct vm_area_struct *vma)
 {
const bool vma_madvised = !!(vma->vm_flags & VM_HUGEPAGE);
 
diff --git a/mm/shmem.c b/mm/shmem.c
index 537c137698f8..d1290eb508e5 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1545,8 +1545,11 @@ static struct page *shmem_alloc_hugepage(gfp_t gfp,
return NULL;
 
shmem_pseudo_vma_init(, info, hindex);
-   page = alloc_pages_vma(gfp | __GFP_COMP | __GFP_NORETRY | __GFP_NOWARN,
-   HPAGE_PMD_ORDER, , 0, numa_node_id(), true);
+   /* Limit the gfp mask according to THP configuration. */
+   gfp |= __GFP_COMP | __GFP_NORETRY | __GFP_NOWARN;
+   gfp &= alloc_hugepage_direct_gfpmask();
+   page = alloc_pages_vma(gfp, HPAGE_PMD_ORDER, , 0, numa_node_id(),
+  true);
shmem_pseudo_vma_destroy();
if (page)
prep_transhuge_page(page);

-- 
All rights reversed.

Re: [PATCH rfc 0/2] mm: cma: make cma_release() non-blocking

2020-10-21 Thread Xiaqing (A)





On 2020/10/22 10:45, Roman Gushchin wrote:


On Thu, Oct 22, 2020 at 09:54:53AM +0800, Xiaqing (A) wrote:


On 2020/10/17 6:52, Roman Gushchin wrote:


This small patchset makes cma_release() non-blocking and simplifies
the code in hugetlbfs, where previously we had to temporarily drop
hugetlb_lock around the cma_release() call.

It should help Zi Yan on his work on 1 GB THPs: splitting a gigantic
THP under a memory pressure requires a cma_release() call. If it's
a blocking function, it complicates the already complicated code.
Because there are at least two use cases like this (hugetlbfs is
another example), I believe it's just better to make cma_release()
non-blocking.

It also makes it more consistent with other memory releasing functions
in the kernel: most of them are non-blocking.


Roman Gushchin (2):
mm: cma: make cma_release() non-blocking
mm: hugetlb: don't drop hugetlb_lock around cma_release() call

   mm/cma.c | 51 +--
   mm/hugetlb.c |  6 --
   2 files changed, 49 insertions(+), 8 deletions(-)


I don't think this patch is a good idea.It transfers part or even all of the 
time of
cma_release to cma_alloc, which is more concerned by performance indicators.

I'm not quite sure: if cma_alloc() is racing with cma_release(), cma_alloc() 
will
wait for the cma_lock mutex anyway. So we don't really transfer anything to 
cma_alloc().


On Android phones, CPU resource competition is intense in many scenarios,
As a result, kernel threads and workers can be scheduled only after some ticks 
or more.
In this case, the performance of cma_alloc will deteriorate significantly,
which is not good news for many services on Android.

Ok, I agree, if the cpu is heavily loaded, it might affect the total execution 
time.

If we aren't going into the mutex->spinlock conversion direction (as Mike 
suggested),
we can address the performance concerns by introducing a cma_release_nowait() 
function,
so that the default cma_release() would work in the old way.
cma_release_nowait() can set an atomic flag on a cma area, which will cause 
following
cma_alloc()'s to flush the release queue. In this case there will be no 
performance
penalty unless somebody is using cma_release_nowait().
Will it work for you?


That looks good to me.

Thanks!



Thank you!

Re: [PATCH] ext: EXT4_KUNIT_TESTS should depend on EXT4_FS instead of selecting it

2020-10-21 Thread Theodore Y. Ts'o

On Wed, Oct 21, 2020 at 04:07:15PM -0700, Randy Dunlap wrote:
> > I'm don't particularly care how this gets achieved, but please think
> > about how to make it easy for a kernel developer to run a specific set
> > of subsystem unit tests.  (In fact, being able to do something like
> > "kunit.py run fs/ext4 fs/jbd2" or maybe "kunit.py run fs/..." would be
> > *great*.  No need to fuss with hand editing the .kunitconfig file at
> > all would be **wonderful**.
> 
> I understand the wish for ease of use, but this is still the tail
> wagging the dog.
> 
> The primary documentation for 'select' is
> Documentation/kbuild/kconfig-language.rst, which says:
> 
>   Note:
>   select should be used with care. select will force
>   a symbol to a value without visiting the dependencies.
>   By abusing select you are able to select a symbol FOO even
>   if FOO depends on BAR that is not set.
>   In general use select only for non-visible symbols
>   (no prompts anywhere) and for symbols with no dependencies.
>   That will limit the usefulness but on the other hand avoid
>   the illegal configurations all over.
> 

Well, the KUNIT configs are kinda of a special case, since normally
they don't have a lot of huge number of dependencies, since unit tests
in general are not integration tests.  So ideally, dependencies will
mostly be replaced with mocking functions.  And if there are *real*
dependencies that the Kunit Unit tests need, they can be explicitly
pulled in with selects.

That being said, as I said, I'm not picky about *how* this gets
achieved.  But ease of use is a key part of making people more likely
to run the unit tests.  So another way of solving the problem might be
to put some kind of automated dependency solver into kunit.py, or some
way of manually adding the necessary dependencies in some kind of
Kunitconfig file that are in directories where their are Unit tests,
or maybe some kind of extenstion to the Kconfig file.  My main
requirement is that the only thing that should be necessary for
enabling the ext4 Kunit tests should be adding a single line to the
.kunitconfig file.  It's not fair to make the human developer manually
have to figure out the dependency chains.

As far as I'm concerned, ease of use is important enough to justfy
special casing and/or bending the rules as far as "select" is concered
for Kunit-related CONFIG items.  But if someone else want to suggest a
better approach, I'm all ears.

Cheers,

- Ted

Re: [PATCH net] selftests: mptcp: depends on built-in IPv6

2020-10-21 Thread Jakub Kicinski

On Wed, 21 Oct 2020 17:35:32 -0700 (PDT) Mat Martineau wrote:
> > Recently, CONFIG_MPTCP_IPV6 no longer selects CONFIG_IPV6. As a
> > consequence, if CONFIG_MPTCP_IPV6=y is added to the kconfig, it will no
> > longer ensure CONFIG_IPV6=y. If it is not enabled, CONFIG_MPTCP_IPV6
> > will stay disabled and selftests will fail.
> >
> > We also need CONFIG_IPV6 to be built-in. For more details, please see
> > commit 0ed37ac586c0 ("mptcp: depends on IPV6 but not as a module").
> >
> > Note that 'make kselftest-merge' will take all 'config' files found in
> > 'tools/testsing/selftests'. Because some of them already set
> > CONFIG_IPV6=y, MPTCP selftests were still passing. But they will fail if
> > MPTCP selftests are launched manually after having executed this command
> > to prepare the kernel config:
> >
> >  ./scripts/kconfig/merge_config.sh -m .config \
> >  ./tools/testing/selftests/net/mptcp/config
> >
> > Fixes: 010b430d5df5 ("mptcp: MPTCP_IPV6 should depend on IPV6 instead of 
> > selecting it")
> > Signed-off-by: Matthieu Baerts 
>
> Reviewed-by: Mat Martineau 

Applied, thank you!

linux-next: Tree for Oct 22

2020-10-21 Thread Stephen Rothwell

Hi all,

Since the merge window is open, please do not add any v5.11 material to
your linux-next included branches until after v5.10-rc1 has been released.

Changes since 20201021:

The pci tree gained a conflict against Linus' tree.

Non-merge commits (relative to Linus' tree): 2341
 2721 files changed, 347786 insertions(+), 42467 deletions(-)



I have created today's linux-next tree at
git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
(patches at http://www.kernel.org/pub/linux/kernel/next/ ).  If you
are tracking the linux-next tree using git, you should not use "git pull"
to do so as that will try to merge the new linux-next release with the
old one.  You should use "git fetch" and checkout or reset to the new
master.

You can see which trees have been included by looking in the Next/Trees
file in the source.  There are also quilt-import.log and merge.log
files in the Next directory.  Between each merge, the tree was built
with a ppc64_defconfig for powerpc, an allmodconfig for x86_64, a
multi_v7_defconfig for arm and a native build of tools/perf. After
the final fixups (if any), I do an x86_64 modules_install followed by
builds for x86_64 allnoconfig, powerpc allnoconfig (32 and 64 bit),
ppc44x_defconfig, allyesconfig and pseries_le_defconfig and i386, sparc
and sparc64 defconfig and htmldocs. And finally, a simple boot test
of the powerpc pseries_le_defconfig kernel in qemu (with and without
kvm enabled).

Below is a summary of the state of the merge.

I am currently merging 329 trees (counting Linus' and 86 trees of bug
fix patches pending for the current merge release).

Stats about the size of the tree over time can be seen at
http://neuling.org/linux-next-size.html .

Status of my local build tests will be at
http://kisskb.ellerman.id.au/linux-next .  If maintainers want to give
advice about cross compilers/configs that work, we are always open to add
more builds.

Thanks to Randy Dunlap for doing many randconfig builds.  And to Paul
Gortmaker for triage and bug fixes.

-- 
Cheers,
Stephen Rothwell

$ git checkout master
$ git reset --hard stable
Merging origin/master (f804b3159482 Merge tag 'linux-watchdog-5.10-rc1' of 
git://www.linux-watchdog.org/linux-watchdog)
Merging fixes/fixes (9123e3a74ec7 Linux 5.9-rc1)
Merging kbuild-current/fixes (e30d694c3381 Documentation/llvm: Fix clang target 
examples)
Merging arc-current/for-curr (6364d1b41cc3 arc: include/asm: fix typos of 
"themselves")
Merging arm-current/fixes (9123e3a74ec7 Linux 5.9-rc1)
Merging arm64-fixes/for-next/fixes (39e4716caa59 crypto: arm64: Use x16 with 
indirect branch to bti_c)
Merging arm-soc-fixes/arm/fixes (6869f774b1cd Merge tag 
'omap-for-v5.9/fixes-rc7' of 
git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap into arm/fixes)
Merging uniphier-fixes/fixes (48778464bb7d Linux 5.8-rc2)
Merging drivers-memory-fixes/fixes (7ff3a2a626f7 memory: jz4780_nemc: Fix an 
error pointer vs NULL check in probe())
Merging m68k-current/for-linus (50c5feeea0af ide/macide: Convert Mac IDE driver 
to platform driver)
Merging powerpc-fixes/fixes (d1781f237047 selftests/powerpc: Make alignment 
handler test P9N DD2.1 vector CI load workaround)
Merging s390-fixes/fixes (549738f15da0 Linux 5.9-rc8)
Merging sparc/master (0a95a6d1a4cd sparc: use for_each_child_of_node() macro)
Merging fscrypt-current/for-stable (2b4eae95c736 fscrypt: don't evict dirty 
inodes after removing key)
Merging net/master (0ed37ac586c0 mptcp: depends on IPV6 but not as a module)
Merging bpf/master (c5eb48e89286 bpf, doc: Fix patchwork URL to point to 
kernel.org instance)
Merging ipsec/master (7fe94612dd4c xfrm: interface: fix the priorities for ipip 
and ipv6 tunnels)
Merging netfilter/master (31cc578ae2de netfilter: nftables_offload: KASAN 
slab-out-of-bounds Read in nft_flow_rule_create)
Merging ipvs/master (48d072c4e8cd selftests: netfilter: add time counter check)
Merging wireless-drivers/master (df41c19abbea drivers/net/wan/hdlc_fr: Move the 
skb_headroom check out of fr_hard_header)
Merging mac80211/master (9ff9b0d392ea Merge tag 'net-next-5.10' of 
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next)
Merging rdma-fixes/for-rc (a1b8638ba132 Linux 5.9-rc7)
Merging sound-current/for-linus (7da4c510abff ALSA: usb-audio: Line6 Pod Go 
interface requires static clock rate quirk)
Merging sound-asoc-fixes/for-linus (8101e3024d76 Merge remote-tracking branch 
'asoc/for-5.10' into asoc-linus)
Merging regmap-fixes/for-linus (549738f15da0 Linux 5.9-rc8)
Merging regulator-fixes/for-linus (b7c11f48ff81 Merge remote-tracking branch 
'regulator/for-5.10' into regulator-linus)
Merging spi-fixes/for-linus (d4f3a651ab82 Merge remote-tracking branch 
'spi/for-5.9' into spi-linus)
Merging pci-current/for-linus (76a6b0b90d53 MAINTAINERS: Add Pali Rohár as 
aardvark PCI maintainer)
Merging driver-core.current/driver-core-linus (270315b8235e Me

[PATCH] Docs/bpf: Improve bpf_design_QA.rst

2020-10-21 Thread Rhys Rustad-Elliott

- Rephrase wording such that the file flows better
- Improve consistency with respect to word use, grammar and style
- Remove point on loops not being supported as BPF now has support for
  bounded loops

Signed-off-by: Rhys Rustad-Elliott 
---
 Documentation/bpf/bpf_design_QA.rst | 382 +---
 1 file changed, 183 insertions(+), 199 deletions(-)

diff --git a/Documentation/bpf/bpf_design_QA.rst 
b/Documentation/bpf/bpf_design_QA.rst
index 2df7b067ab93..fe9087144d75 100644
--- a/Documentation/bpf/bpf_design_QA.rst
+++ b/Documentation/bpf/bpf_design_QA.rst
@@ -1,254 +1,238 @@
 ==
 BPF Design Q
 ==
-
-BPF extensibility and applicability to networking, tracing, security
-in the linux kernel and several user space implementations of BPF
-virtual machine led to a number of misunderstanding on what BPF actually is.
-This short QA is an attempt to address that and outline a direction
-of where BPF is heading long term.
+The extensibility of BPF and its wide applicability (eg. to networking, tracing
+and security), as well as the existence of several userspace implementations of
+the BPF virtual machine have led to a number of misunderstandings regarding 
what
+BPF actually is. This short Q is an attempt to address those 
misunderstandings
+as well as outline where BPF is heading in the long term.
 
 .. contents::
 :local:
-:depth: 3
-
-Questions and Answers
-=
+:depth: 2
 
-Q: Is BPF a generic instruction set similar to x64 and arm64?
--
+Q: Is BPF a generic instruction set similar to x86-64 and arm64?
+
 A: NO.
 
-Q: Is BPF a generic virtual machine ?
+Q: Is BPF a generic virtual machine?
 -
 A: NO.
 
-BPF is generic instruction set *with* C calling convention.

+Q: Then what is BPF?
+
+A: BPF is a generic instruction set *with* C calling conventions.
 
-Q: Why C calling convention was chosen?
-~~~
-
-A: Because BPF programs are designed to run in the linux kernel
-which is written in C, hence BPF defines instruction set compatible
-with two most used architectures x64 and arm64 (and takes into
-consideration important quirks of other architectures) and
-defines calling convention that is compatible with C calling
-convention of the linux kernel on those architectures.
+Q: Why were C calling conventions chosen?
+~
+A: BPF programs are designed to run in the Linux kernel which is written in C;
+hence BPF defines an instruction set compatible with the two most used
+architectures, x86-64 and arm64 (while taking into consideration important
+quirks of other architectures) and uses calling conventions that are compatible
+with the C calling conventions of the Linux kernel on those architectures.
 
 Q: Can multiple return values be supported in the future?
 ~
-A: NO. BPF allows only register R0 to be used as return value.
+A: NO. BPF allows only register R0 to be used as a return value.
 
-Q: Can more than 5 function arguments be supported in the future?
-~
-A: NO. BPF calling convention only allows registers R1-R5 to be used
-as arguments. BPF is not a standalone instruction set.
-(unlike x64 ISA that allows msft, cdecl and other conventions)
+Q: Can more than five function arguments be supported in the future?
+
+A: NO. The BPF calling convention only allows registers R1-R5 to be used as
+arguments. BPF is not a standalone instruction set. This is unlike, for 
example,
+the x86-64 ISA, which allows msft, cdecl and other calling conventions.
 
-Q: Can BPF programs access instruction pointer or return address?
--
+Q: Can BPF programs access the instruction pointer or return addresses?
+---
 A: NO.
 
-Q: Can BPF programs access stack pointer ?
---
+Q: Can BPF programs access the stack pointer?
+-
 A: NO.
 
-Only frame pointer (register R10) is accessible.
-From compiler point of view it's necessary to have stack pointer.
-For example, LLVM defines register R11 as stack pointer in its
-BPF backend, but it makes sure that generated code never uses it.
+Only the frame pointer (register R10) is accessible.
 
-Q: Does C-calling convention diminishes possible use cases?

+Note that from the compiler's point of view, it is necessary to have a stack
+pointer. For example, LLVM defines

Re: [PATCH v4] Revert "virtio-net: ethtool configurable RXCSUM"

2020-10-21 Thread Jakub Kicinski

On Wed, 21 Oct 2020 10:30:43 -0400 Michael S. Tsirkin wrote:
> This reverts commit 3618ad2a7c0e78e4258386394d5d5f92a3dbccf8.
> 
> When control vq is not negotiated, that commit causes a crash:
> 
> [   72.229171] kernel BUG at drivers/net/virtio_net.c:1667!

Applied, thank you!

Re: [RFCv2 05/16] x86/kvm: Make VirtIO use DMA API in KVM guest

2020-10-21 Thread Halil Pasic

On Tue, 20 Oct 2020 09:18:48 +0300
"Kirill A. Shutemov"  wrote:

> VirtIO for KVM is a primary way to provide IO. All memory that used for
> communication with the host has to be marked as shared.
> 
> The easiest way to archive that is to use DMA API that already knows how
> to deal with shared memory.
> 
> Signed-off-by: Kirill A. Shutemov 
> ---
>  drivers/virtio/virtio_ring.c | 4 
>  1 file changed, 4 insertions(+)
> 
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index becc77697960..ace733845d5d 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -12,6 +12,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #ifdef DEBUG
>  /* For development, we want to crash whenever the ring is screwed. */
> @@ -255,6 +256,9 @@ static bool vring_use_dma_api(struct virtio_device *vdev)
>   if (xen_domain())
>   return true;
>  
> + if (kvm_mem_protected())
> + return true;
> +

I guess it does not matter because Christophs comment, but this breaks
the build for s390, because there is no kvm_mem_protected() for s390.

Regards,
Halil

>   return false;
>  }
>

[PATCH v2] KVM: arm64: ARM_SMCCC_ARCH_WORKAROUND_1 doesn't return SMCCC_RET_NOT_REQUIRED

2020-10-21 Thread Stephen Boyd

According to the SMCCC spec[1](7.5.2 Discovery) the
ARM_SMCCC_ARCH_WORKAROUND_1 function id only returns 0, 1, and
SMCCC_RET_NOT_SUPPORTED.

 0 is "workaround required and safe to call this function"
 1 is "workaround not required but safe to call this function"
 SMCCC_RET_NOT_SUPPORTED is "might be vulnerable or might not be, who knows, I 
give up!"

SMCCC_RET_NOT_SUPPORTED might as well mean "workaround required, except
calling this function may not work because it isn't implemented in some
cases". Wonderful. We map this SMC call to

 0 is SPECTRE_MITIGATED
 1 is SPECTRE_UNAFFECTED
 SMCCC_RET_NOT_SUPPORTED is SPECTRE_VULNERABLE

For KVM hypercalls (hvc), we've implemented this function id to return
SMCCC_RET_NOT_SUPPORTED, 0, and SMCCC_RET_NOT_REQUIRED. One of those
isn't supposed to be there. Per the code we call
arm64_get_spectre_v2_state() to figure out what to return for this
feature discovery call.

 0 is SPECTRE_MITIGATED
 SMCCC_RET_NOT_REQUIRED is SPECTRE_UNAFFECTED
 SMCCC_RET_NOT_SUPPORTED is SPECTRE_VULNERABLE

Let's clean this up so that KVM tells the guest this mapping:

 0 is SPECTRE_MITIGATED
 1 is SPECTRE_UNAFFECTED
 SMCCC_RET_NOT_SUPPORTED is SPECTRE_VULNERABLE

(Note: Moving SMCCC_RET_NOT_AFFECTED to a header is left out of this
patch as it would need to move from proton-pack.c which isn't in stable
trees and the name isn't actually part of the SMCCC spec)

Cc: Andre Przywara 
Cc: Steven Price 
Cc: Marc Zyngier 
Cc: sta...@vger.kernel.org
Link: https://developer.arm.com/documentation/den0028/latest [1]
Fixes: c118bbb52743 ("arm64: KVM: Propagate full Spectre v2 workaround state to 
KVM guests")
Signed-off-by: Stephen Boyd 
---

I see that before commit c118bbb52743 ("arm64: KVM: Propagate full
Spectre v2 workaround state to KVM guests") we had this mapping:

 0 is SPECTRE_MITIGATED
 SMCCC_RET_NOT_SUPPORTED is SPECTRE_VULNERABLE

so the return value '1' wasn't there then. Once the commit was merged we
introduced the notion of NOT_REQUIRED here when it shouldn't have been
introduced.

Changes from v1:
 * Way longer commit text, more background (sorry)
 * Dropped proton-pack part because it was wrong
 * Rebased onto other patch accepted upstream

 arch/arm64/kvm/hypercalls.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm64/kvm/hypercalls.c b/arch/arm64/kvm/hypercalls.c
index 9824025ccc5c..6a62312d7813 100644
--- a/arch/arm64/kvm/hypercalls.c
+++ b/arch/arm64/kvm/hypercalls.c
@@ -31,7 +31,7 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
val = SMCCC_RET_SUCCESS;
break;
case SPECTRE_UNAFFECTED:
-   val = SMCCC_RET_NOT_REQUIRED;
+   val = 1;
break;
}
break;

base-commit: 66dd3474702aa98d5844367e1577cdad78ef7c65
-- 
Sent by a computer, using git, on the internet

RE: [PATCH] scsi: megaraid_sas: use spin_lock() in hard IRQ

2020-10-21 Thread Finn Thain

On Thu, 22 Oct 2020, Tianxianting wrote:

> Do you mean Megasas raid can be used in m68k arch?

m68k is one example of an architecture on which the unstated assumptions 
in your patch would be invalid. Does this help to clarify what I wrote?

If Megasas raid did work on m68k, I'm sure it could potentially benefit 
from the theoretical performance improvement from your patch.

So perhaps you would consider adding support for slower CPUs like m68k.

Re: [PATCH v11 2/3] ASoC: qcom: dt-bindings: Add sc7180 machine bindings

2020-10-21 Thread Tzung-Bi Shih

Hi, sorry for jumping into your discussion but I am trying to
summarize them to make sure we are on the same page.  Pardon me to
manually copy-and-paste partial sentences to quote.

ACK:
- Don't expose DAI connections in compatible strings.
- Use "model" DT property to make the card more UCM2-friendly.
- Expose new DT properties to distinguish different DMIC models.

NACK:
- All the board variations using exactly the same compatible string.
=> This is less realistic.  Although the CODECS information can be
retrieved from DT, it is inevitable to have some custom code for each
CODEC.

Per Mark's words:
> a different CODEC is something that often justifies a separate compatible
I think we should use different compatible strings for new CODECS
combinations.  And we should try to reuse the machine driver if they
share the most code.  In the worst case, introduce a new machine
driver for the new CODECS combinations.

- Srinivas's suggestion to set driver_name.
e.g. card->driver_name = "SM8250";
=> This sounds like a new DT property should be parsed in
sound/soc/qcom/common.c.  For example: "qcom,family"?  But as we do
less care about UCM2 for now, I would prefer to just leave it as is.


I would expect the following variants in DTS (just for example):

sound {
  compatible = "qcom,sc7180-trogdor";
  model = "sc7180-rt5682-max98357a-1mic";
}

sound {
  compatible = "qcom,sc7180-trogdor";
  model = "sc7180-rt5682-max98357a-2mic";
  dmic-gpio = ...
}

sound {
  compatible = "qcom,sc7180-pompom";
  model = "sc7180-adau7002-max98357a";
}


Please correct me if there is any misunderstanding.

Re: [PATCH v1] ARM: vfp: Use long jump to fix THUMB2 kernel compilation error

2020-10-21 Thread Kees Cook

On Thu, Oct 22, 2020 at 03:00:06AM +0300, Dmitry Osipenko wrote:
> 22.10.2020 02:40, Kees Cook пишет:
> > On Thu, Oct 22, 2020 at 01:57:37AM +0300, Dmitry Osipenko wrote:
> >> The vfp_kmode_exception() function now is unreachable using relative
> >> branching in THUMB2 kernel configuration, resulting in a "relocation
> >> truncated to fit: R_ARM_THM_JUMP19 against symbol `vfp_kmode_exception'"
> >> linker error. Let's use long jump in order to fix the issue.
> > 
> > Eek. Is this with gcc or clang?
> 
> GCC 9.3.0
> 
> >> Fixes: eff8728fe698 ("vmlinux.lds.h: Add PGO and AutoFDO input sections")
> > 
> > Are you sure it wasn't 512dd2eebe55 ("arm/build: Add missing sections") ?
> > That commit may have implicitly moved the location of .vfp11_veneer,
> > though I thought I had chosen the correct position.
> 
> I re-checked that the fixes tag is correct.
> 
> >> Signed-off-by: Dmitry Osipenko 
> >> ---
> >>  arch/arm/vfp/vfphw.S | 3 ++-
> >>  1 file changed, 2 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/arch/arm/vfp/vfphw.S b/arch/arm/vfp/vfphw.S
> >> index 4fcff9f59947..6e2b29f0c48d 100644
> >> --- a/arch/arm/vfp/vfphw.S
> >> +++ b/arch/arm/vfp/vfphw.S
> >> @@ -82,7 +82,8 @@ ENTRY(vfp_support_entry)
> >>ldr r3, [sp, #S_PSR]@ Neither lazy restore nor FP exceptions
> >>and r3, r3, #MODE_MASK  @ are supported in kernel mode
> >>teq r3, #USR_MODE
> >> -  bne vfp_kmode_exception @ Returns through lr
> >> +  ldr r1, =vfp_kmode_exception
> >> +  bxner1  @ Returns through lr
> >>  
> >>VFPFMRX r1, FPEXC   @ Is the VFP enabled?
> >>DBGSTR1 "fpexc %08x", r1
> > 
> > This seems like a workaround though? I suspect the vfp11_veneer needs
> > moving?
> > 
> 
> I don't know where it needs to be moved. Please feel free to make a
> patch if you have a better idea, I'll be glad to test it.

I might have just been distracted by the common "vfp" prefix. It's
possible that the text section shuffling just ended up being very large,
so probably this patch is right then!

-- 
Kees Cook

Re: [RFCv2 14/16] KVM: Handle protected memory in __kvm_map_gfn()/__kvm_unmap_gfn()

2020-10-21 Thread Halil Pasic

On Tue, 20 Oct 2020 09:18:57 +0300
"Kirill A. Shutemov"  wrote:

> We cannot access protected pages directly. Use ioremap() to
> create a temporary mapping of the page. The mapping is destroyed
> on __kvm_unmap_gfn().
> 
> The new interface gfn_to_pfn_memslot_protected() is used to detect if
> the page is protected.
> 
> ioremap_cache_force() is a hack to bypass IORES_MAP_SYSTEM_RAM check in
> the x86 ioremap code. We need a better solution.
> 
> Signed-off-by: Kirill A. Shutemov 
> ---
>  arch/powerpc/kvm/book3s_64_mmu_hv.c|  2 +-
>  arch/powerpc/kvm/book3s_64_mmu_radix.c |  2 +-
>  arch/x86/include/asm/io.h  |  2 +
>  arch/x86/include/asm/pgtable_types.h   |  1 +
>  arch/x86/kvm/mmu/mmu.c |  6 ++-
>  arch/x86/mm/ioremap.c  | 16 ++--
>  include/linux/kvm_host.h   |  3 +-
>  include/linux/kvm_types.h  |  1 +
>  virt/kvm/kvm_main.c| 52 +++---
>  9 files changed, 63 insertions(+), 22 deletions(-)
> 

You declare ioremap_cache_force() arch/x86/include/asm/io.h  in and
define it in arch/x86/mm/ioremap.c which is architecture specific code,
but use it in __kvm_map_gfn() in virt/kvm/kvm_main.c which is common
code.

Thus your series breaks the build for the s390 architecture. Have you
tried to (cross) compile for s390?

Regards,
Halil

[PATCH v3 1/2] usb: dwc3: Trigger a GCTL soft reset when switching modes in DRD

2020-10-21 Thread John Stultz

From: Yu Chen 

With the current dwc3 code on the HiKey960 we often see the
COREIDLE flag get stuck off in __dwc3_gadget_start(), which
seems to prevent the reset irq and causes the USB gadget to
fail to initialize.

We had seen occasional initialization failures with older
kernels but with recent 5.x era kernels it seemed to be becoming
much more common, so I dug back through some older trees and
realized I dropped this quirk from Yu Chen during upstreaming
as I couldn't provide a proper rational for it and it didn't
seem to be necessary. I now realize I was wrong.

After resubmitting the quirk, Thinh Nguyen pointed out that it
shouldn't be a quirk at all and it is actually mentioned in the
programming guide that it should be done when switching modes
in DRD.

So, to avoid these !COREIDLE lockups seen on HiKey960, this
patch issues GCTL soft reset when switching modes if the
controller is in DRD mode.

Cc: Felipe Balbi 
Cc: Tejas Joglekar 
Cc: Yang Fei 
Cc: YongQin Liu 
Cc: Andrzej Pietrasiewicz 
Cc: Thinh Nguyen 
Cc: Jun Li 
Cc: Mauro Carvalho Chehab 
Cc: Greg Kroah-Hartman 
Cc: linux-...@vger.kernel.org
Signed-off-by: Yu Chen 
Signed-off-by: John Stultz 
---
v2:
* Rework to always call the GCTL soft reset in DRD mode,
  rather then using a quirk as suggested by Thinh Nguyen

v3:
* Move GCTL soft reset under the spinlock as suggested by
  Thinh Nguyen
---
 drivers/usb/dwc3/core.c | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/drivers/usb/dwc3/core.c b/drivers/usb/dwc3/core.c
index bdf0925da6b6..a2a88284a95b 100644
--- a/drivers/usb/dwc3/core.c
+++ b/drivers/usb/dwc3/core.c
@@ -114,10 +114,24 @@ void dwc3_set_prtcap(struct dwc3 *dwc, u32 mode)
dwc->current_dr_role = mode;
 }
 
+static void dwc3_gctl_core_soft_reset(struct dwc3 *dwc)
+{
+   int reg;
+
+   reg = dwc3_readl(dwc->regs, DWC3_GCTL);
+   reg |= (DWC3_GCTL_CORESOFTRESET);
+   dwc3_writel(dwc->regs, DWC3_GCTL, reg);
+
+   reg = dwc3_readl(dwc->regs, DWC3_GCTL);
+   reg &= ~(DWC3_GCTL_CORESOFTRESET);
+   dwc3_writel(dwc->regs, DWC3_GCTL, reg);
+}
+
 static void __dwc3_set_mode(struct work_struct *work)
 {
struct dwc3 *dwc = work_to_dwc(work);
unsigned long flags;
+   int hw_mode;
int ret;
u32 reg;
 
@@ -156,6 +170,11 @@ static void __dwc3_set_mode(struct work_struct *work)
 
spin_lock_irqsave(>lock, flags);
 
+   /* Execute a GCTL Core Soft Reset when switch mode in DRD*/
+   hw_mode = DWC3_GHWPARAMS0_MODE(dwc->hwparams.hwparams0);
+   if (hw_mode == DWC3_GHWPARAMS0_MODE_DRD)
+   dwc3_gctl_core_soft_reset(dwc);
+
dwc3_set_prtcap(dwc, dwc->desired_dr_role);
 
spin_unlock_irqrestore(>lock, flags);
-- 
2.17.1

[PATCH v3 2/2] usb: dwc3: Fix DRD mode change sequence following programming guide

2020-10-21 Thread John Stultz

In reviewing the previous patch, Thinh Nguyen pointed out that
the DRD mode change sequence should be like the following when
switching from host -> device according to the programming guide
(for all DRD IPs):
1. Reset controller with GCTL.CoreSoftReset
2. Set GCTL.PrtCapDir(device)
3. Soft reset with DCTL.CSftRst
4. Then follow up with the initializing registers sequence

The current code does:
a. Soft reset with DCTL.CSftRst on driver probe
b. Reset controller with GCTL.CoreSoftReset (added in previous
   patch)
c. Set GCTL.PrtCapDir(device)
d. < missing DCTL.CSftRst >
e. Then follow up with initializing registers sequence

So this patch adds the DCTL.CSftRst soft reset that was currently
missing from the dwc3 mode switching.

Cc: Felipe Balbi 
Cc: Tejas Joglekar 
Cc: Yang Fei 
Cc: YongQin Liu 
Cc: Andrzej Pietrasiewicz 
Cc: Thinh Nguyen 
Cc: Jun Li 
Cc: Mauro Carvalho Chehab 
Cc: Greg Kroah-Hartman 
Cc: linux-...@vger.kernel.org
Signed-off-by: John Stultz 
---
Feedback would be appreciated. I'm a little worried I should be
conditionalizing the DCTL.CSftRst on DRD mode controllers, but
I'm really not sure what the right thing to do is for non-DRD
mode controllers.
---
 drivers/usb/dwc3/core.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/usb/dwc3/core.c b/drivers/usb/dwc3/core.c
index a2a88284a95b..c87d8add19e4 100644
--- a/drivers/usb/dwc3/core.c
+++ b/drivers/usb/dwc3/core.c
@@ -40,6 +40,8 @@
 
 #define DWC3_DEFAULT_AUTOSUSPEND_DELAY 5000 /* ms */
 
+static int dwc3_core_soft_reset(struct dwc3 *dwc);
+
 /**
  * dwc3_get_dr_mode - Validates and sets dr_mode
  * @dwc: pointer to our context structure
@@ -177,6 +179,7 @@ static void __dwc3_set_mode(struct work_struct *work)
 
dwc3_set_prtcap(dwc, dwc->desired_dr_role);
 
+   dwc3_core_soft_reset(dwc);
spin_unlock_irqrestore(>lock, flags);
 
switch (dwc->desired_dr_role) {
-- 
2.17.1

Re: [PATCH] serial: pmac_zilog: don't init if zilog is not available

2020-10-21 Thread Finn Thain

On Wed, 21 Oct 2020, Laurent Vivier wrote:

> Le 21/10/2020 à 01:43, Finn Thain a écrit :
> 
> > Laurent, can we avoid the irq == 0 warning splat like this?
> > 
> > diff --git a/drivers/tty/serial/pmac_zilog.c 
> > b/drivers/tty/serial/pmac_zilog.c
> > index 96e7aa479961..7db600cd8cc7 100644
> > --- a/drivers/tty/serial/pmac_zilog.c
> > +++ b/drivers/tty/serial/pmac_zilog.c
> > @@ -1701,8 +1701,10 @@ static int __init pmz_init_port(struct 
> > uart_pmac_port *uap)
> > int irq;
> >  
> > r_ports = platform_get_resource(uap->pdev, IORESOURCE_MEM, 0);
> > +   if (!r_ports)
> > +   return -ENODEV;
> > irq = platform_get_irq(uap->pdev, 0);
> > -   if (!r_ports || irq <= 0)
> > +   if (irq <= 0)
> > return -ENODEV;
> >  
> > uap->port.mapbase  = r_ports->start;
> > 
> 
> No, this doesn't fix the problem.
> 

Then I had better stop guessing and start up Aranym...

The patch below seems to fix the problem for me. Does it work on your 
system(s)?

diff --git a/arch/m68k/mac/config.c b/arch/m68k/mac/config.c
index a621fcc1a576..4e802f70333d 100644
--- a/arch/m68k/mac/config.c
+++ b/arch/m68k/mac/config.c
@@ -776,16 +776,12 @@ static struct resource scc_b_rsrcs[] = {
 struct platform_device scc_a_pdev = {
.name   = "scc",
.id = 0,
-   .num_resources  = ARRAY_SIZE(scc_a_rsrcs),
-   .resource   = scc_a_rsrcs,
 };
 EXPORT_SYMBOL(scc_a_pdev);
 
 struct platform_device scc_b_pdev = {
.name   = "scc",
.id = 1,
-   .num_resources  = ARRAY_SIZE(scc_b_rsrcs),
-   .resource   = scc_b_rsrcs,
 };
 EXPORT_SYMBOL(scc_b_pdev);
 
@@ -812,10 +808,15 @@ static void __init mac_identify(void)
 
/* Set up serial port resources for the console initcall. */
 
-   scc_a_rsrcs[0].start = (resource_size_t) mac_bi_data.sccbase + 2;
-   scc_a_rsrcs[0].end   = scc_a_rsrcs[0].start;
-   scc_b_rsrcs[0].start = (resource_size_t) mac_bi_data.sccbase;
-   scc_b_rsrcs[0].end   = scc_b_rsrcs[0].start;
+   scc_a_rsrcs[0].start = (resource_size_t)mac_bi_data.sccbase + 2;
+   scc_a_rsrcs[0].end   = scc_a_rsrcs[0].start;
+   scc_a_pdev.num_resources = ARRAY_SIZE(scc_a_rsrcs);
+   scc_a_pdev.resource  = scc_a_rsrcs;
+
+   scc_b_rsrcs[0].start = (resource_size_t)mac_bi_data.sccbase;
+   scc_b_rsrcs[0].end   = scc_b_rsrcs[0].start;
+   scc_b_pdev.num_resources = ARRAY_SIZE(scc_b_rsrcs);
+   scc_b_pdev.resource  = scc_b_rsrcs;
 
switch (macintosh_config->scc_type) {
case MAC_SCC_PSC:
diff --git a/drivers/tty/serial/pmac_zilog.c b/drivers/tty/serial/pmac_zilog.c
index 96e7aa479961..95abdb305d67 100644
--- a/drivers/tty/serial/pmac_zilog.c
+++ b/drivers/tty/serial/pmac_zilog.c
@@ -1697,18 +1697,17 @@ extern struct platform_device scc_a_pdev, scc_b_pdev;
 
 static int __init pmz_init_port(struct uart_pmac_port *uap)
 {
-   struct resource *r_ports;
-   int irq;
+   struct resource *r_ports, *r_irq;
 
r_ports = platform_get_resource(uap->pdev, IORESOURCE_MEM, 0);
-   irq = platform_get_irq(uap->pdev, 0);
-   if (!r_ports || irq <= 0)
+   r_irq = platform_get_resource(uap->pdev, IORESOURCE_IRQ, 0);
+   if (!r_ports || !r_irq)
return -ENODEV;
 
uap->port.mapbase  = r_ports->start;
uap->port.membase  = (unsigned char __iomem *) r_ports->start;
uap->port.iotype   = UPIO_MEM;
-   uap->port.irq  = irq;
+   uap->port.irq  = r_irq->start;
uap->port.uartclk  = ZS_CLOCK;
uap->port.fifosize = 1;
uap->port.ops  = _pops;

Re: [PATCH v4] Revert "virtio-net: ethtool configurable RXCSUM"

2020-10-21 Thread Jason Wang




On 2020/10/21 下午10:30, Michael S. Tsirkin wrote:

This reverts commit 3618ad2a7c0e78e4258386394d5d5f92a3dbccf8.

When control vq is not negotiated, that commit causes a crash:

[   72.229171] kernel BUG at drivers/net/virtio_net.c:1667!
[   72.230266] invalid opcode:  [#1] PREEMPT SMP
[   72.231172] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 
5.9.0-rc8-02934-g3618ad2a7c0e7 #1
[   72.231172] EIP: virtnet_send_command+0x120/0x140
[   72.231172] Code: 00 0f 94 c0 8b 7d f0 65 33 3d 14 00 00 00 75 1c 8d 65 f4 
5b 5e 5f 5d c3 66 90 be 01 00 00 00 e9 6e ff ff ff 8d b6 00
+00 00 00 <0f> 0b e8 d9 bb 82 00 eb 17 8d b4 26 00 00 00 00 8d b4 26 00 00 00
[   72.231172] EAX: 000d EBX: f72895c0 ECX: 0017 EDX: 0011
[   72.231172] ESI: f7197800 EDI: ed69bd00 EBP: ed69bcf4 ESP: ed69bc98
[   72.231172] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00010246
[   72.231172] CR0: 80050033 CR2:  CR3: 02c84000 CR4: 000406f0
[   72.231172] Call Trace:
[   72.231172]  ? __virt_addr_valid+0x45/0x60
[   72.231172]  ? ___cache_free+0x51f/0x760
[   72.231172]  ? kobject_uevent_env+0xf4/0x560
[   72.231172]  virtnet_set_guest_offloads+0x4d/0x80
[   72.231172]  virtnet_set_features+0x85/0x120
[   72.231172]  ? virtnet_set_guest_offloads+0x80/0x80
[   72.231172]  __netdev_update_features+0x27a/0x8e0
[   72.231172]  ? kobject_uevent+0xa/0x20
[   72.231172]  ? netdev_register_kobject+0x12c/0x160
[   72.231172]  register_netdevice+0x4fe/0x740
[   72.231172]  register_netdev+0x1c/0x40
[   72.231172]  virtnet_probe+0x728/0xb60
[   72.231172]  ? _raw_spin_unlock+0x1d/0x40
[   72.231172]  ? virtio_vdpa_get_status+0x1c/0x20
[   72.231172]  virtio_dev_probe+0x1c6/0x271
[   72.231172]  really_probe+0x195/0x2e0
[   72.231172]  driver_probe_device+0x26/0x60
[   72.231172]  device_driver_attach+0x49/0x60
[   72.231172]  __driver_attach+0x46/0xc0
[   72.231172]  ? device_driver_attach+0x60/0x60
[   72.231172]  bus_add_driver+0x197/0x1c0
[   72.231172]  driver_register+0x66/0xc0
[   72.231172]  register_virtio_driver+0x1b/0x40
[   72.231172]  virtio_net_driver_init+0x61/0x86
[   72.231172]  ? veth_init+0x14/0x14
[   72.231172]  do_one_initcall+0x76/0x2e4
[   72.231172]  ? rdinit_setup+0x2a/0x2a
[   72.231172]  do_initcalls+0xb2/0xd5
[   72.231172]  kernel_init_freeable+0x14f/0x179
[   72.231172]  ? rest_init+0x100/0x100
[   72.231172]  kernel_init+0xd/0xe0
[   72.231172]  ret_from_fork+0x1c/0x30
[   72.231172] Modules linked in:
[   72.269563] ---[ end trace a6ebc4afea0e6cb1 ]---

The reason is that virtnet_set_features now calls virtnet_set_guest_offloads
unconditionally, it used to only call it when there is something
to configure.

If device does not have a control vq, everything breaks.

Revert the original commit for now.

Cc: Tonghao Zhang 
Cc: Willem de Bruijn 
Fixes: 3618ad2a7c0e7 ("virtio-net: ethtool configurable RXCSUM")
Reported-by: kernel test robot 
Signed-off-by: Michael S. Tsirkin 
---



Acked-by: Jason Wang 





Same patch as all of v1-v3, just tweaking the commit log.

  drivers/net/virtio_net.c | 50 +++-
  1 file changed, 13 insertions(+), 37 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index d2d2c4a53cf2..21b71148c532 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -68,8 +68,6 @@ static const unsigned long guest_offloads[] = {
(1ULL << VIRTIO_NET_F_GUEST_ECN)  | \
(1ULL << VIRTIO_NET_F_GUEST_UFO))
  
-#define GUEST_OFFLOAD_CSUM_MASK (1ULL << VIRTIO_NET_F_GUEST_CSUM)

-
  struct virtnet_stat_desc {
char desc[ETH_GSTRING_LEN];
size_t offset;
@@ -2524,48 +2522,29 @@ static int virtnet_get_phys_port_name(struct net_device 
*dev, char *buf,
return 0;
  }
  
-static netdev_features_t virtnet_fix_features(struct net_device *netdev,

- netdev_features_t features)
-{
-   /* If Rx checksum is disabled, LRO should also be disabled. */
-   if (!(features & NETIF_F_RXCSUM))
-   features &= ~NETIF_F_LRO;
-
-   return features;
-}
-
  static int virtnet_set_features(struct net_device *dev,
netdev_features_t features)
  {
struct virtnet_info *vi = netdev_priv(dev);
-   u64 offloads = vi->guest_offloads;
+   u64 offloads;
int err;
  
-	/* Don't allow configuration while XDP is active. */

-   if (vi->xdp_queue_pairs)
-   return -EBUSY;
-
if ((dev->features ^ features) & NETIF_F_LRO) {
+   if (vi->xdp_queue_pairs)
+   return -EBUSY;
+
if (features & NETIF_F_LRO)
-   offloads |= GUEST_OFFLOAD_LRO_MASK &
-   vi->guest_offloads_capable;
+   offloads = vi->guest_offloads_capable;
else
-   offloads &= ~GUEST_OFFLOAD_LRO_MASK;
+

[PATCH] PM / sysfs: Expose suspend resume driver flags in sysfs

2020-10-21 Thread Chen Yu

Currently there are 4 driver flags to control system suspend/resume
behavior: DPM_FLAG_NO_DIRECT_COMPLETE, DPM_FLAG_SMART_PREPARE,
DPM_FLAG_SMART_SUSPEND and DPM_FLAG_MAY_SKIP_RESUME. Make these flags
visible in sysfs as read-only to get a brief understanding of the
expected behavior of each device during suspend/resume, so as to
facilitate suspend/resume debugging/tuning.

For example:
/sys/devices/pci:00/:00:15.1/power/driver_flags:4
(DPM_FLAG_SMART_SUSPEND)

/sys/devices/pci:00/:00:07.3/power/driver_flags:5
(DPM_FLAG_NO_DIRECT_COMPLETE | DPM_FLAG_SMART_SUSPEND)

Acked-by: Len Brown 
Signed-off-by: Chen Yu 
---
 drivers/base/power/sysfs.c | 29 -
 1 file changed, 28 insertions(+), 1 deletion(-)

diff --git a/drivers/base/power/sysfs.c b/drivers/base/power/sysfs.c
index a1474fb67db9..48313a1040a5 100644
--- a/drivers/base/power/sysfs.c
+++ b/drivers/base/power/sysfs.c
@@ -607,6 +607,13 @@ static ssize_t async_store(struct device *dev, struct 
device_attribute *attr,
 
 static DEVICE_ATTR_RW(async);
 
+static ssize_t driver_flags_show(struct device *dev,
+struct device_attribute *attr, char *buf)
+{
+   return sysfs_emit(buf, "%x\n", dev->power.driver_flags);
+}
+static DEVICE_ATTR_RO(driver_flags);
+
 #endif /* CONFIG_PM_SLEEP */
 #endif /* CONFIG_PM_ADVANCED_DEBUG */
 
@@ -691,6 +698,20 @@ static const struct attribute_group 
pm_qos_flags_attr_group = {
.attrs  = pm_qos_flags_attrs,
 };
 
+static struct attribute *pm_driver_flags_attrs[] = {
+#ifdef CONFIG_PM_ADVANCED_DEBUG
+#ifdef CONFIG_PM_SLEEP
+   _attr_driver_flags.attr,
+#endif
+#endif
+   NULL,
+};
+
+static const struct attribute_group pm_driver_flags_attr_group = {
+   .name   = power_group_name,
+   .attrs  = pm_driver_flags_attrs,
+};
+
 int dpm_sysfs_add(struct device *dev)
 {
int rc;
@@ -719,11 +740,17 @@ int dpm_sysfs_add(struct device *dev)
if (rc)
goto err_wakeup;
}
-   rc = pm_wakeup_source_sysfs_add(dev);
+   rc = sysfs_merge_group(>kobj, _driver_flags_attr_group);
if (rc)
goto err_latency;
+
+   rc = pm_wakeup_source_sysfs_add(dev);
+   if (rc)
+   goto err_flags;
return 0;
 
+ err_flags:
+   sysfs_unmerge_group(>kobj, _driver_flags_attr_group);
  err_latency:
sysfs_unmerge_group(>kobj, _qos_latency_tolerance_attr_group);
  err_wakeup:
-- 
2.17.1

Re: [PATCH] usb: typec: Expose Product Type VDOs via sysfs

2020-10-21 Thread Benson Leung

Hi Prashant,

On Wed, Oct 21, 2020 at 02:18:02PM -0700, Prashant Malani wrote:
> A PD-capable device can return up to 3 Product Type VDOs as part of its
> DiscoverIdentity Response (USB PD Spec, Rev 3.0, Version 2.0, Section
> 6.4.4.3.1). Add a sysfs attribute to expose these to userspace.
> 
> Cc: Benson Leung 
> Cc: Heikki Krogerus 
> Signed-off-by: Prashant Malani 

Reviewed-by: Benson Leung 

> ---
>  drivers/usb/typec/class.c | 10 ++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/drivers/usb/typec/class.c b/drivers/usb/typec/class.c
> index 35eec707cb51..e6abb0dee9fa 100644
> --- a/drivers/usb/typec/class.c
> +++ b/drivers/usb/typec/class.c
> @@ -122,10 +122,20 @@ static ssize_t product_show(struct device *dev, struct 
> device_attribute *attr,
>  }
>  static DEVICE_ATTR_RO(product);
>  
> +static ssize_t product_type_vdo_show(struct device *dev, struct 
> device_attribute *attr,
> +  char *buf)
> +{
> + struct usb_pd_identity *id = get_pd_identity(dev);
> +
> + return sprintf(buf, "0x%08x\n0x%08x\n0x%08x\n", id->vdo[0], id->vdo[1], 
> id->vdo[2]);
> +}
> +static DEVICE_ATTR_RO(product_type_vdo);
> +
>  static struct attribute *usb_pd_id_attrs[] = {
>   _attr_id_header.attr,
>   _attr_cert_stat.attr,
>   _attr_product.attr,
> + _attr_product_type_vdo.attr,
>   NULL
>  };
>  
> -- 
> 2.29.0.rc1.297.gfa9743e501-goog
> 

-- 
Benson Leung
Staff Software Engineer
Chrome OS Kernel
Google Inc.
ble...@google.com
Chromium OS Project
ble...@chromium.org


signature.asc
Description: PGP signature

drm_modes: signed integer overflow

2020-10-21 Thread Randy Dunlap

Hi,

With linux-next 20201021, when booting up, I am seeing this:

[0.560896] UBSAN: signed-integer-overflow in 
../drivers/gpu/drm/drm_modes.c:765:20
[0.560903] 2376000 * 1000 cannot be represented in type 'int'
[0.560909] CPU: 3 PID: 7 Comm: kworker/u16:0 Not tainted 
5.9.0-next-20201021 #2
[0.560914] Hardware name: TOSHIBA PORTEGE R835/Portable PC, BIOS Version 
4.10   01/08/2013
[0.560924] Workqueue: events_unbound async_run_entry_fn

[0.560930] Call Trace:
[0.560938]  dump_stack+0x5e/0x74
[0.560943]  ubsan_epilogue+0x9/0x45
[0.560948]  handle_overflow+0x8b/0x98
[0.560953]  ? set_track+0x3f/0xad
[0.560958]  __ubsan_handle_mul_overflow+0xe/0x10
[0.560964]  drm_mode_vrefresh+0x4a/0xbc
[0.560970] initcall i915_init+0x0/0x6a returned 0 after 116076 usecs
[0.560974] calling  cn_proc_init+0x0/0x36 @ 1
[0.560978]  cea_mode_alternate_clock+0x11/0x62
[0.560985]  drm_match_cea_mode+0xc7/0x1e7
[0.560987] initcall cn_proc_init+0x0/0x36 returned 0 after 3 usecs
[0.560990] calling  topology_sysfs_init+0x0/0x2d @ 1
[0.561000]  drm_mode_validate_ycbcr420+0xd/0x48
[0.561005]  drm_helper_probe_single_connector_modes+0x6db/0x7da
[0.561012]  drm_client_modeset_probe+0x225/0x143f
[0.561018]  ? bitmap_fold+0x8a/0x8a
[0.561023]  ? update_cfs_rq_load_avg+0x192/0x1a2
[0.561029]  __drm_fb_helper_initial_config_and_unlock+0x3f/0x5b7
[0.561035]  ? get_sd_balance_interval+0x1c/0x40
[0.561040]  drm_fb_helper_initial_config+0x48/0x4f
[0.561047]  intel_fbdev_initial_config+0x13/0x23
[0.561052]  async_run_entry_fn+0x89/0x15c
[0.561058]  process_one_work+0x15c/0x1f3
[0.561064]  worker_thread+0x1ac/0x25d
[0.561069]  ? process_scheduled_works+0x2e/0x2e
[0.561074]  kthread+0x10e/0x116
[0.561078]  ? kthread_parkme+0x1c/0x1c
[0.561083]  ret_from_fork+0x22/0x30
[0.561087] 


-- 
~Randy
Reported-by: Randy Dunlap

[PATCH] kunit: Fix kunit.py --raw_output option

2020-10-21 Thread David Gow

Due to the raw_output() function on kunit_parser.py actually being a
generator, it only runs if something reads the lines it returns. Since
we no-longer do that (parsing doesn't actually happen if raw_output is
enabled), it was not printing anything.

Fixes:  45ba7a893ad89114e773b3dc32f6431354c465d6 ("kunit: kunit_tool: Separate 
out config/build/exec/parse")
Signed-off-by: David Gow 
---
 tools/testing/kunit/kunit_parser.py | 1 -
 1 file changed, 1 deletion(-)

diff --git a/tools/testing/kunit/kunit_parser.py 
b/tools/testing/kunit/kunit_parser.py
index 8019e3dd4c32..744ee9cb0073 100644
--- a/tools/testing/kunit/kunit_parser.py
+++ b/tools/testing/kunit/kunit_parser.py
@@ -66,7 +66,6 @@ def isolate_kunit_output(kernel_output):
 def raw_output(kernel_output):
for line in kernel_output:
print(line)
-   yield line
 
 DIVIDER = '=' * 60
 
-- 
2.29.0.rc1.297.gfa9743e501-goog

Re: [PATCH v7 3/4] docs: Add documentation for userspace client interface

2020-10-21 Thread Jeffrey Hugo


On 10/21/2020 11:46 AM, Hemant Kumar wrote:

Hi Jeff,

On 10/21/20 8:28 AM, Jeffrey Hugo wrote:

On 10/16/2020 10:04 PM, Hemant Kumar wrote:

+release


Should this be "close" since close() is the actual function userspace 
would call?
I was keeping kernel driver in mind while writing this, i can change it 
to close() if release() is confusing here.


I guess I was considering the client perspective, but this is kernel 
documentation so I can see your perspective.  I don't have a strong 
preference.  I suppose keep it as is.





+---
+
+Decrements UCI device reference count and UCI channel reference 
count upon last
+release(). UCI channel clean up is performed. MHI channel moves to 
disabled

+state and inbound buffers are freed.
+




Thanks,
Hemant



--
Jeffrey Hugo
Qualcomm Technologies, Inc. is a member of the
Code Aurora Forum, a Linux Foundation Collaborative Project.

Re: [RFC PATCH net-next 7/9] net: dsa: microchip: ksz9477: add hardware time stamping support

2020-10-21 Thread Richard Cochran

On Thu, Oct 22, 2020 at 02:39:35AM +0300, Vladimir Oltean wrote:
> So how _does_ that work for TI PHYTER?
> 
> As far as we understand, the PHYTER appears to autonomously mangle PTP packets
> in the following way:
> - subtracting t2 on RX from the correctionField of the Pdelay_Req
> - adding t3 on TX to the correctionField of the Pdelay_Resp

The Phyter does not support peer-to-peer one step.

The only driver that implements it is ptp_ines.c.

And *that* driver/HW implements it correctly.

Thanks,
Richard

Re: [PATCH] serial: pmac_zilog: don't init if zilog is not available

2020-10-21 Thread Michael Ellerman

Laurent Vivier  writes:
> Le 20/10/2020 à 20:32, Greg KH a écrit :
>> On Tue, Oct 20, 2020 at 08:19:26PM +0200, Laurent Vivier wrote:
>>> Le 20/10/2020 à 19:37, Greg KH a écrit :
 On Tue, Oct 20, 2020 at 06:37:41PM +0200, Laurent Vivier wrote:
> Le 20/10/2020 à 18:28, Greg KH a écrit :
>> On Tue, Oct 20, 2020 at 06:23:03PM +0200, Laurent Vivier wrote:
>>> We can avoid to probe for the Zilog device (and generate ugly kernel 
>>> warning)
>>> if kernel is built for Mac but not on a Mac.
>>>
>>> Signed-off-by: Laurent Vivier 
>>> ---
>>>  drivers/tty/serial/pmac_zilog.c | 11 +++
>>>  1 file changed, 11 insertions(+)
>>>
>>> diff --git a/drivers/tty/serial/pmac_zilog.c 
>>> b/drivers/tty/serial/pmac_zilog.c
>>> index 063484b22523..d1d2e55983c3 100644
>>> --- a/drivers/tty/serial/pmac_zilog.c
>>> +++ b/drivers/tty/serial/pmac_zilog.c
>>> @@ -1867,6 +1867,12 @@ static struct platform_driver pmz_driver = {
>>>  static int __init init_pmz(void)
>>>  {
>>> int rc, i;
>>> +
>>> +#ifdef CONFIG_MAC
>>> +   if (!MACH_IS_MAC)
>>> +   return -ENODEV;
>>> +#endif
>>
>> Why is the #ifdef needed?
>>
>> We don't like putting #ifdef in .c files for good reasons.  Can you make
>> the api check for this work with and without that #ifdef needed?
>
> The #ifdef is needed because this file can be compiled for PowerMac and
> m68k Mac. For PowerMac, the MACH_IS_MAC is not defined, so we need the
> #ifdef.
>
> We need the MAC_IS_MAC because the same kernel can be used with several
> m68k machines, so the init_pmz can be called on a m68k machine without
> the zilog device (it's a multi-targets kernel).
>
> You can check it's the good way to do by looking inside:
>
> drivers/video/fbdev/valkyriefb.c +317
> drivers/macintosh/adb.c +316
>
> That are two files used by both, mac and pmac.

 Why not fix it to work properly like other arch checks are done
>>> I would be happy to do the same.
>>>
 Put it in a .h file and do the #ifdef there.  Why is this "special"?
>>>
>>> I don't know.
>>>
>>> Do you mean something like:
>>>
>>> drivers/tty/serial/pmac_zilog.h
>>> ...
>>> #ifndef MACH_IS_MAC
>>> #define MACH_IS_MAC (0)
>>> #endif
>>> ...
>>>
>>> drivers/tty/serial/pmac_zilog.c
>>> ...
>>> static int __init pmz_console_init(void)
>>> {
>>> if (!MACH_IS_MAC)
>>> return -ENODEV;
>>> ...
>> 
>> Yup, that would be a good start, but why is the pmac_zilog.h file
>> responsible for this?  Shouldn't this be in some arch-specific file
>> somewhere?
>
> For m68k, MACH_IS_MAC is defined in arch/m68k/include/asm/setup.h
>
> If I want to define it for any other archs I don't know in which file we
> can put it.
>
> But as m68k mac is only sharing drivers with pmac perhaps we can put
> this in arch/powerpc/include/asm/setup.h?

It doesn't really belong in there.

I'd accept a patch to create arch/powerpc/include/asm/macintosh.h, with
MACH_IS_MAC defined in there.

cheers

Re: [f2fs-dev] [PATCH] f2fs: add compr_inode and compr_blocks sysfs nodes

2020-10-21 Thread Daeho Jeong

Yep, It sounds good to me.

2020년 10월 21일 (수) 오후 3:08, Chao Yu 님이 작성:
>
> On 2020/10/16 13:14, Daeho Jeong wrote:
> > From: Daeho Jeong 
> >
> > Added compr_inode to show compressed inode count and compr_blocks to
> > show compressed block count in sysfs.
>
> As there are so many entries in ../f2fs// directory, it looks a mess
> there, I suggest that we can add a new directory 'stats' in ../f2fs//,
> in where we can store all readonly stats related entries there later.
>
> How do you think?
>
> Thanks,
>
> >
> > Signed-off-by: Daeho Jeong 
> > ---
> >   Documentation/ABI/testing/sysfs-fs-f2fs | 10 ++
> >   fs/f2fs/sysfs.c | 17 +
> >   2 files changed, 27 insertions(+)
> >
> > diff --git a/Documentation/ABI/testing/sysfs-fs-f2fs 
> > b/Documentation/ABI/testing/sysfs-fs-f2fs
> > index 834d0becae6d..a01c26484c69 100644
> > --- a/Documentation/ABI/testing/sysfs-fs-f2fs
> > +++ b/Documentation/ABI/testing/sysfs-fs-f2fs
> > @@ -350,3 +350,13 @@ Date:April 2020
> >   Contact:"Daeho Jeong" 
> >   Description:Give a way to change iostat_period time. 3secs by 
> > default.
> >   The new iostat trace gives stats gap given the period.
> > +
> > +What:/sys/fs/f2fs//compr_inode
> > +Date:October 2020
> > +Contact: "Daeho Jeong" 
> > +Description: Show compressed inode count
> > +
> > +What:/sys/fs/f2fs//compr_blocks
> > +Date:October 2020
> > +Contact: "Daeho Jeong" 
> > +Description: Show compressed block count
> > diff --git a/fs/f2fs/sysfs.c b/fs/f2fs/sysfs.c
> > index 94c98e412aa1..7139a29a00d3 100644
> > --- a/fs/f2fs/sysfs.c
> > +++ b/fs/f2fs/sysfs.c
> > @@ -223,6 +223,19 @@ static ssize_t avg_vblocks_show(struct f2fs_attr *a,
> >   f2fs_update_sit_info(sbi);
> >   return sprintf(buf, "%llu\n", (unsigned long long)(si->avg_vblocks));
> >   }
> > +
> > +static ssize_t compr_inode_show(struct f2fs_attr *a,
> > + struct f2fs_sb_info *sbi, char *buf)
> > +{
> > + return sprintf(buf, "%u\n", atomic_read(>compr_inode));
> > +}
> > +
> > +static ssize_t compr_blocks_show(struct f2fs_attr *a,
> > + struct f2fs_sb_info *sbi, char *buf)
> > +{
> > + return sprintf(buf, "%llu\n", atomic64_read(>compr_blocks));
> > +}
> > +
> >   #endif
> >
> >   static ssize_t main_blkaddr_show(struct f2fs_attr *a,
> > @@ -591,6 +604,8 @@ F2FS_STAT_ATTR(STAT_INFO, f2fs_stat_info, 
> > gc_background_calls, bg_gc);
> >   F2FS_GENERAL_RO_ATTR(moved_blocks_background);
> >   F2FS_GENERAL_RO_ATTR(moved_blocks_foreground);
> >   F2FS_GENERAL_RO_ATTR(avg_vblocks);
> > +F2FS_GENERAL_RO_ATTR(compr_inode);
> > +F2FS_GENERAL_RO_ATTR(compr_blocks);
> >   #endif
> >
> >   #ifdef CONFIG_FS_ENCRYPTION
> > @@ -675,6 +690,8 @@ static struct attribute *f2fs_attrs[] = {
> >   ATTR_LIST(moved_blocks_foreground),
> >   ATTR_LIST(moved_blocks_background),
> >   ATTR_LIST(avg_vblocks),
> > + ATTR_LIST(compr_inode),
> > + ATTR_LIST(compr_blocks),
> >   #endif
> >   NULL,
> >   };
> >

Re: [PATCH rfc 0/2] mm: cma: make cma_release() non-blocking

2020-10-21 Thread Roman Gushchin

On Thu, Oct 22, 2020 at 09:54:53AM +0800, Xiaqing (A) wrote:
> 
> 
> On 2020/10/17 6:52, Roman Gushchin wrote:
> 
> > This small patchset makes cma_release() non-blocking and simplifies
> > the code in hugetlbfs, where previously we had to temporarily drop
> > hugetlb_lock around the cma_release() call.
> > 
> > It should help Zi Yan on his work on 1 GB THPs: splitting a gigantic
> > THP under a memory pressure requires a cma_release() call. If it's
> > a blocking function, it complicates the already complicated code.
> > Because there are at least two use cases like this (hugetlbfs is
> > another example), I believe it's just better to make cma_release()
> > non-blocking.
> > 
> > It also makes it more consistent with other memory releasing functions
> > in the kernel: most of them are non-blocking.
> > 
> > 
> > Roman Gushchin (2):
> >mm: cma: make cma_release() non-blocking
> >mm: hugetlb: don't drop hugetlb_lock around cma_release() call
> > 
> >   mm/cma.c | 51 +--
> >   mm/hugetlb.c |  6 --
> >   2 files changed, 49 insertions(+), 8 deletions(-)
> > 
> I don't think this patch is a good idea.It transfers part or even all of the 
> time of
> cma_release to cma_alloc, which is more concerned by performance indicators.

I'm not quite sure: if cma_alloc() is racing with cma_release(), cma_alloc() 
will
wait for the cma_lock mutex anyway. So we don't really transfer anything to 
cma_alloc().

> On Android phones, CPU resource competition is intense in many scenarios,
> As a result, kernel threads and workers can be scheduled only after some 
> ticks or more.
> In this case, the performance of cma_alloc will deteriorate significantly,
> which is not good news for many services on Android.

Ok, I agree, if the cpu is heavily loaded, it might affect the total execution 
time.

If we aren't going into the mutex->spinlock conversion direction (as Mike 
suggested),
we can address the performance concerns by introducing a cma_release_nowait() 
function,
so that the default cma_release() would work in the old way.
cma_release_nowait() can set an atomic flag on a cma area, which will cause 
following
cma_alloc()'s to flush the release queue. In this case there will be no 
performance
penalty unless somebody is using cma_release_nowait().
Will it work for you?

Thank you!

PING: [PATCH] block: add io_error stat for block device

2020-10-21 Thread zhenwei pi


Hi, Jens

What do you think about this, adding io error stat for block devices is 
reasonable?


On 9/10/20 10:20 AM, zhenwei pi wrote:

Currently if hitting block req error, block layer only prints error
log with a rate limitation. Then agent has to parse kernel log to
record what happens.

In this patch, add read/write/discard/flush stat counter to record
io errors.

Signed-off-by: zhenwei pi 
---
  block/blk-core.c  | 14 +++---
  block/genhd.c | 19 +++
  include/linux/part_stat.h |  1 +
  3 files changed, 31 insertions(+), 3 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 10c08ac50697..8f1424835700 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1558,9 +1558,17 @@ bool blk_update_request(struct request *req, 
blk_status_t error,
req->q->integrity.profile->complete_fn(req, nr_bytes);
  #endif
  
-	if (unlikely(error && !blk_rq_is_passthrough(req) &&

-!(req->rq_flags & RQF_QUIET)))
-   print_req_error(req, error, __func__);
+   if (unlikely(error && !blk_rq_is_passthrough(req))) {
+   if (op_is_flush(req_op(req)))
+   part_stat_inc(>rq_disk->part0,
+   io_errors[STAT_FLUSH]);
+   else
+   part_stat_inc(>rq_disk->part0,
+   io_errors[op_stat_group(req_op(req))]);
+
+   if (!(req->rq_flags & RQF_QUIET))
+   print_req_error(req, error, __func__);
+   }
  
  	blk_account_io_completion(req, nr_bytes);
  
diff --git a/block/genhd.c b/block/genhd.c

index 99c64641c314..852035095485 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -104,6 +104,7 @@ static void part_stat_read_all(struct hd_struct *part, 
struct disk_stats *stat)
stat->sectors[group] += ptr->sectors[group];
stat->ios[group] += ptr->ios[group];
stat->merges[group] += ptr->merges[group];
+   stat->io_errors[group] += ptr->io_errors[group];
}
  
  		stat->io_ticks += ptr->io_ticks;

@@ -1374,6 +1375,22 @@ static ssize_t disk_discard_alignment_show(struct device 
*dev,
return sprintf(buf, "%d\n", queue_discard_alignment(disk->queue));
  }
  
+static ssize_t io_error_show(struct device *dev,

+ struct device_attribute *attr, char *buf)
+{
+   struct hd_struct *p = dev_to_part(dev);
+   struct disk_stats stat;
+
+   part_stat_read_all(p, );
+
+   return sprintf(buf,
+   "%8lu %8lu %8lu %8lu\n",
+   stat.io_errors[STAT_READ],
+   stat.io_errors[STAT_WRITE],
+   stat.io_errors[STAT_DISCARD],
+   stat.io_errors[STAT_FLUSH]);
+}
+
  static DEVICE_ATTR(range, 0444, disk_range_show, NULL);
  static DEVICE_ATTR(ext_range, 0444, disk_ext_range_show, NULL);
  static DEVICE_ATTR(removable, 0444, disk_removable_show, NULL);
@@ -1386,6 +1403,7 @@ static DEVICE_ATTR(capability, 0444, 
disk_capability_show, NULL);
  static DEVICE_ATTR(stat, 0444, part_stat_show, NULL);
  static DEVICE_ATTR(inflight, 0444, part_inflight_show, NULL);
  static DEVICE_ATTR(badblocks, 0644, disk_badblocks_show, 
disk_badblocks_store);
+static DEVICE_ATTR(io_error, 0444, io_error_show, NULL);
  
  #ifdef CONFIG_FAIL_MAKE_REQUEST

  ssize_t part_fail_show(struct device *dev,
@@ -1437,6 +1455,7 @@ static struct attribute *disk_attrs[] = {
  #ifdef CONFIG_FAIL_IO_TIMEOUT
_attr_fail_timeout.attr,
  #endif
+   _attr_io_error.attr,
NULL
  };
  
diff --git a/include/linux/part_stat.h b/include/linux/part_stat.h

index 24125778ef3e..4fe3836d2308 100644
--- a/include/linux/part_stat.h
+++ b/include/linux/part_stat.h
@@ -9,6 +9,7 @@ struct disk_stats {
unsigned long sectors[NR_STAT_GROUPS];
unsigned long ios[NR_STAT_GROUPS];
unsigned long merges[NR_STAT_GROUPS];
+   unsigned long io_errors[NR_STAT_GROUPS];
unsigned long io_ticks;
local_t in_flight[2];
  };



--
zhenwei pi

Re: [RFC PATCH net-next 7/9] net: dsa: microchip: ksz9477: add hardware time stamping support

2020-10-21 Thread Richard Cochran

I'm just catching up with this.

Really. Truly. Please -- Include the maintainer on CC for such patches!

In case you don't know who that is, you can always consult the MAINTAINERS file.

There you will find the following entry.

PTP HARDWARE CLOCK SUPPORT
M:  Richard Cochran 
L:  net...@vger.kernel.org
S:  Maintained
W:  http://linuxptp.sourceforge.net/

On Thu, Oct 22, 2020 at 02:39:35AM +0300, Vladimir Oltean wrote:
> On Mon, Oct 19, 2020 at 07:24:33PM +0200, Christian Eggers wrote:
> > The PTP hardware performs internal detection of PTP frames (likely
> > similar as ptp_classify_raw() and ptp_parse_header()). As these filters
> > cannot be disabled, the current delay mode (E2E/P2P) and the clock mode
> > (master/slave) must be configured via sysfs attributes.

This is a complete no-go.  NAK.

Richard

[PATCH] treewide: Convert macro and uses of section(foo) to section("foo")

2020-10-21 Thread Joe Perches

Use a more generic form for __section that requires quotes to avoid
complications with clang and gcc differences.

Remove the quote operator # from compiler_attributes.h __section macro.

Convert all unquoted __section(foo) uses to quoted __section("foo").
Also convert __attribute__((section("foo"))) uses to __section("foo")
even if the __attribute__ has multiple list entry forms.

Conversion done using a script:

Link: 
https://lore.kernel.org/lkml/75393e5ddc272dc7403de74d645e6c6e0f4e70eb.ca...@perches.com/2-convert_section.pl

Signed-off-by: Joe Perches 
---

This conversion was previously submitted to -next last month
https://lore.kernel.org/lkml/46f69161e60b802488ba8c8f3f8bbf922aa3b49b.ca...@perches.com/

Nick Desaulniers found a defect in the conversion of 2 boot files
for powerpc, but no other defect was found for any other arch.

The script was corrected to avoid converting these 2 files.

There is no difference between the script output when run on today's -next
and Linus' tree through commit f804b3159482, so this should be reasonable to
apply now.

 arch/arc/include/asm/linkage.h|  8 +++
 arch/arc/include/asm/mach_desc.h  |  2 +-
 arch/arc/plat-hsdk/platform.c |  2 +-
 arch/arm/include/asm/cache.h  |  2 +-
 arch/arm/include/asm/cpuidle.h|  2 +-
 arch/arm/include/asm/idmap.h  |  2 +-
 arch/arm/include/asm/mach/arch.h  |  4 ++--
 arch/arm/include/asm/setup.h  |  2 +-
 arch/arm/include/asm/smp.h|  2 +-
 arch/arm/include/asm/tcm.h|  8 +++
 arch/arm/kernel/cpuidle.c |  2 +-
 arch/arm/kernel/devtree.c |  2 +-
 arch/arm64/include/asm/cache.h|  2 +-
 arch/arm64/kernel/efi.c   |  2 +-
 arch/arm64/kernel/smp_spin_table.c|  2 +-
 arch/arm64/mm/mmu.c   |  2 +-
 arch/csky/include/asm/tcm.h   |  8 +++
 arch/ia64/include/asm/cache.h |  2 +-
 arch/microblaze/kernel/setup.c|  2 +-
 arch/mips/include/asm/cache.h |  2 +-
 arch/mips/include/asm/machine.h   |  2 +-
 arch/mips/kernel/setup.c  |  2 +-
 arch/mips/mm/init.c   |  2 +-
 arch/parisc/include/asm/cache.h   |  2 +-
 arch/parisc/include/asm/ldcw.h|  2 +-
 arch/parisc/kernel/ftrace.c   |  2 +-
 arch/parisc/mm/init.c |  6 ++---
 arch/powerpc/include/asm/cache.h  |  2 +-
 arch/powerpc/include/asm/machdep.h|  2 +-
 arch/powerpc/kernel/btext.c   |  2 +-
 arch/powerpc/kernel/prom_init.c   |  2 +-
 arch/powerpc/kvm/book3s_64_vio_hv.c   |  2 +-
 arch/riscv/include/asm/soc.h  |  4 ++--
 arch/riscv/kernel/cpu_ops.c   |  4 ++--
 arch/riscv/kernel/setup.c |  4 ++--
 arch/s390/boot/startup.c  |  2 +-
 arch/s390/include/asm/cache.h |  2 +-
 arch/s390/include/asm/sections.h  |  4 ++--
 arch/s390/mm/init.c   |  2 +-
 arch/sh/boards/of-generic.c   |  2 +-
 arch/sh/include/asm/cache.h   |  2 +-
 arch/sh/include/asm/machvec.h |  2 +-
 arch/sh/include/asm/smp.h |  2 +-
 arch/sparc/include/asm/cache.h|  2 +-
 arch/sparc/kernel/btext.c |  2 +-
 arch/um/include/shared/init.h | 22 -
 arch/um/kernel/skas/clone.c   |  2 +-
 arch/um/kernel/um_arch.c  |  2 +-
 arch/x86/boot/compressed/pgtable_64.c |  2 +-
 arch/x86/boot/tty.c   |  8 +++
 arch/x86/boot/video.h |  2 +-
 arch/x86/include/asm/apic.h   |  4 ++--
 arch/x86/include/asm/cache.h  |  2 +-
 arch/x86/include/asm/intel-mid.h  |  2 +-
 arch/x86/include/asm/irqflags.h   |  2 +-
 arch/x86/include/asm/mem_encrypt.h|  2 +-
 arch/x86/include/asm/setup.h  |  2 +-
 arch/x86/kernel/cpu/cpu.h |  2 +-
 arch/x86/kernel/head64.c  |  2 +-
 arch/x86/mm/mem_encrypt.c |  6 ++---
 arch/x86/mm/mem_encrypt_identity.c|  2 +-
 arch/x86/platform/pvh/enlighten.c |  4 ++--
 arch/x86/purgatory/purgatory.c|  4 ++--
 arch/x86/um/stub_segv.c   |  2 +-
 arch/x86/xen/enlighten.c  |  2 +-
 arch/x86/xen/enlighten_pvh.c  |  2 +-
 arch/xtensa/kernel/setup.c|  2 +-
 drivers/clk/clk.c |  2 +-
 drivers/clocksource/timer-probe.c |  2 +-
 drivers/irqchip/irqchip.c |  2 +-
 drivers/of/of_reserved_mem.c  |  2 +-
 drivers/thermal/thermal_core.h|  2 +-
 fs/xfs/xfs_message.h  |  2 +-
 include/asm-generic/bug.h |  6 ++---
 include/asm-generic/error-injection.h |  2 +-
 include/asm-generic/kprobes.h |  4 ++--
 include/kunit/test.h  |  2 +-
 include/linux/acpi.h  |  4 ++--
 include/linux/cache.h |  2 +-
 include/linux/compiler.h  |  8 +++
 include/linux/compiler_attributes.h   |  2 +-
 include/linux/cpu.h   |  2 +-

Re: [PATCH v2 02/20] kvm: x86/mmu: Introduce tdp_iter

2020-10-21 Thread Yu Zhang

On Wed, Oct 21, 2020 at 11:08:52AM -0700, Ben Gardon wrote:
> On Wed, Oct 21, 2020 at 7:59 AM Yu Zhang  wrote:
> >
> > On Wed, Oct 14, 2020 at 11:26:42AM -0700, Ben Gardon wrote:
> > > The TDP iterator implements a pre-order traversal of a TDP paging
> > > structure. This iterator will be used in future patches to create
> > > an efficient implementation of the KVM MMU for the TDP case.
> > >
> > > Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell
> > > machine. This series introduced no new failures.
> > >
> > > This series can be viewed in Gerrit at:
> > >   https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538
> > >
> > > Signed-off-by: Ben Gardon 
> > > ---
> > >  arch/x86/kvm/Makefile   |   3 +-
> > >  arch/x86/kvm/mmu/mmu.c  |  66 
> > >  arch/x86/kvm/mmu/mmu_internal.h |  66 
> > >  arch/x86/kvm/mmu/tdp_iter.c | 176 
> > >  arch/x86/kvm/mmu/tdp_iter.h |  56 ++
> > >  5 files changed, 300 insertions(+), 67 deletions(-)
> > >  create mode 100644 arch/x86/kvm/mmu/tdp_iter.c
> > >  create mode 100644 arch/x86/kvm/mmu/tdp_iter.h
> > >
> > > diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile
> > > index 7f86a14aed0e9..4525c1151bf99 100644
> > > --- a/arch/x86/kvm/Makefile
> > > +++ b/arch/x86/kvm/Makefile
> > > @@ -15,7 +15,8 @@ kvm-$(CONFIG_KVM_ASYNC_PF)  += $(KVM)/async_pf.o
> > >
> > >  kvm-y+= x86.o emulate.o i8259.o irq.o lapic.o \
> > >  i8254.o ioapic.o irq_comm.o cpuid.o pmu.o mtrr.o 
> > > \
> > > -hyperv.o debugfs.o mmu/mmu.o mmu/page_track.o
> > > +hyperv.o debugfs.o mmu/mmu.o mmu/page_track.o \
> > > +mmu/tdp_iter.o
> > >
> > >  kvm-intel-y  += vmx/vmx.o vmx/vmenter.o vmx/pmu_intel.o 
> > > vmx/vmcs12.o \
> > >  vmx/evmcs.o vmx/nested.o vmx/posted_intr.o
> > > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > > index 6c9db349600c8..6d82784ed5679 100644
> > > --- a/arch/x86/kvm/mmu/mmu.c
> > > +++ b/arch/x86/kvm/mmu/mmu.c
> > > @@ -121,28 +121,6 @@ module_param(dbg, bool, 0644);
> > >
> > >  #define PTE_PREFETCH_NUM 8
> > >
> > > -#define PT_FIRST_AVAIL_BITS_SHIFT 10
> > > -#define PT64_SECOND_AVAIL_BITS_SHIFT 54
> > > -
> > > -/*
> > > - * The mask used to denote special SPTEs, which can be either MMIO SPTEs 
> > > or
> > > - * Access Tracking SPTEs.
> > > - */
> > > -#define SPTE_SPECIAL_MASK (3ULL << 52)
> > > -#define SPTE_AD_ENABLED_MASK (0ULL << 52)
> > > -#define SPTE_AD_DISABLED_MASK (1ULL << 52)
> > > -#define SPTE_AD_WRPROT_ONLY_MASK (2ULL << 52)
> > > -#define SPTE_MMIO_MASK (3ULL << 52)
> > > -
> > > -#define PT64_LEVEL_BITS 9
> > > -
> > > -#define PT64_LEVEL_SHIFT(level) \
> > > - (PAGE_SHIFT + (level - 1) * PT64_LEVEL_BITS)
> > > -
> > > -#define PT64_INDEX(address, level)\
> > > - (((address) >> PT64_LEVEL_SHIFT(level)) & ((1 << PT64_LEVEL_BITS) - 
> > > 1))
> > > -
> > > -
> > >  #define PT32_LEVEL_BITS 10
> > >
> > >  #define PT32_LEVEL_SHIFT(level) \
> > > @@ -155,19 +133,6 @@ module_param(dbg, bool, 0644);
> > >  #define PT32_INDEX(address, level)\
> > >   (((address) >> PT32_LEVEL_SHIFT(level)) & ((1 << PT32_LEVEL_BITS) - 
> > > 1))
> > >
> > > -
> > > -#ifdef CONFIG_DYNAMIC_PHYSICAL_MASK
> > > -#define PT64_BASE_ADDR_MASK (physical_mask & ~(u64)(PAGE_SIZE-1))
> > > -#else
> > > -#define PT64_BASE_ADDR_MASK (((1ULL << 52) - 1) & ~(u64)(PAGE_SIZE-1))
> > > -#endif
> > > -#define PT64_LVL_ADDR_MASK(level) \
> > > - (PT64_BASE_ADDR_MASK & ~((1ULL << (PAGE_SHIFT + (((level) - 1) \
> > > - * PT64_LEVEL_BITS))) - 1))
> > > -#define PT64_LVL_OFFSET_MASK(level) \
> > > - (PT64_BASE_ADDR_MASK & ((1ULL << (PAGE_SHIFT + (((level) - 1) \
> > > - * PT64_LEVEL_BITS))) - 1))
> > > -
> > >  #define PT32_BASE_ADDR_MASK PAGE_MASK
> > >  #define PT32_DIR_BASE_ADDR_MASK \
> > >   (PAGE_MASK & ~((1ULL << (PAGE_SHIFT + PT32_LEVEL_BITS)) - 1))
> > > @@ -192,8 +157,6 @@ module_param(dbg, bool, 0644);
> > >  #define SPTE_HOST_WRITEABLE  (1ULL << PT_FIRST_AVAIL_BITS_SHIFT)
> > >  #define SPTE_MMU_WRITEABLE   (1ULL << (PT_FIRST_AVAIL_BITS_SHIFT + 1))
> > >
> > > -#define SHADOW_PT_INDEX(addr, level) PT64_INDEX(addr, level)
> > > -
> > >  /* make pte_list_desc fit well in cache line */
> > >  #define PTE_LIST_EXT 3
> > >
> > > @@ -349,11 +312,6 @@ void kvm_mmu_set_mmio_spte_mask(u64 mmio_value, u64 
> > > access_mask)
> > >  }
> > >  EXPORT_SYMBOL_GPL(kvm_mmu_set_mmio_spte_mask);
> > >
> > > -static bool is_mmio_spte(u64 spte)
> > > -{
> > > - return (spte & SPTE_SPECIAL_MASK) == SPTE_MMIO_MASK;
> > > -}
> > > -
> > >  static inline bool sp_ad_disabled(struct kvm_mmu_page *sp)
> > >  {
> > >   return sp->role.ad_disabled;
> > > @@ -626,35 +584,11 @@ static int is_nx(struct

Re: [PATCH rfc 0/2] mm: cma: make cma_release() non-blocking

2020-10-21 Thread Roman Gushchin

On Wed, Oct 21, 2020 at 05:15:53PM -0700, Mike Kravetz wrote:
> On 10/16/20 3:52 PM, Roman Gushchin wrote:
> > This small patchset makes cma_release() non-blocking and simplifies
> > the code in hugetlbfs, where previously we had to temporarily drop
> > hugetlb_lock around the cma_release() call.
> > 
> > It should help Zi Yan on his work on 1 GB THPs: splitting a gigantic
> > THP under a memory pressure requires a cma_release() call. If it's
> > a blocking function, it complicates the already complicated code.
> > Because there are at least two use cases like this (hugetlbfs is
> > another example), I believe it's just better to make cma_release()
> > non-blocking.
> > 
> > It also makes it more consistent with other memory releasing functions
> > in the kernel: most of them are non-blocking.
> 
> Thanks for looking into this Roman.

Hi Mike,

> 
> I may be missing something, but why does cma_release have to be blocking
> today?  Certainly, it takes the bitmap in cma_clear_bitmap and could
> block.  However, I do not see why cma->lock has to be a mutex.  I may be
> missing something, but I do not see any code protected by the mutex doing
> anything that could sleep?
> 
> Could we simply change that mutex to a spinlock?

I actually have suggested it few months ago, but the idea was
opposed by Joonsoo: https://lkml.org/lkml/2020/4/3/12 .

The time of a bitmap operation is definitely not an issue in my context,
but I can't speak for something like an embedded/rt case.

Thanks!

Re: [PATCH v4 2/2] PM / devfreq: Add governor attribute flag for specifc sysfs nodes

2020-10-21 Thread Chanwoo Choi

On 10/22/20 11:30 AM, Dmitry Osipenko wrote:
> Hello Chanwoo,
> 
> 20.10.2020 06:04, Chanwoo Choi пишет:
>> @@ -1354,55 +1365,71 @@ static ssize_t governor_store(struct device *dev, 
>> struct device_attribute *attr,
>>  struct devfreq *df = to_devfreq(dev);
>>  int ret;
>>  char str_governor[DEVFREQ_NAME_LEN + 1];
>> -const struct devfreq_governor *governor, *prev_governor;
>> +const struct devfreq_governor *new_governor, *prev_governor;
>>  
>>  if (!df->governor)
>>  return -EINVAL;
>> +prev_governor = df->governor;
>>  
>>  ret = sscanf(buf, "%" __stringify(DEVFREQ_NAME_LEN) "s", str_governor);
>>  if (ret != 1)
>>  return -EINVAL;
>>  
>>  mutex_lock(_list_lock);
>> -governor = try_then_request_governor(str_governor);
>> -if (IS_ERR(governor)) {
>> -ret = PTR_ERR(governor);
>> +new_governor = try_then_request_governor(str_governor);
>> +if (IS_ERR(new_governor)) {
>> +ret = PTR_ERR(new_governor);
>>  goto out;
>>  }
>> -if (df->governor == governor) {
>> +if (prev_governor == new_governor) {
>>  ret = 0;
>>  goto out;
>> -} else if (IS_SUPPORTED_FLAG(df->governor->flags, IMMUTABLE)
>> -|| IS_SUPPORTED_FLAG(governor->flags, IMMUTABLE)) {
>> +} else if (IS_SUPPORTED_FLAG(prev_governor->flags, IMMUTABLE)
>> +|| IS_SUPPORTED_FLAG(new_governor->flags, IMMUTABLE)) {
>>  ret = -EINVAL;
>>  goto out;
>>  }
>>  
>> -ret = df->governor->event_handler(df, DEVFREQ_GOV_STOP, NULL);
>> +/*
>> + * Stop the previous governor and remove the specific sysfs files
>> + * which depend on previous governor.
>> + */
>> +ret = prev_governor->event_handler(df, DEVFREQ_GOV_STOP, NULL);
>>  if (ret) {
>>  dev_warn(dev, "%s: Governor %s not stopped(%d)\n",
>> - __func__, df->governor->name, ret);
>> + __func__, prev_governor->name, ret);
>>  goto out;
>>  }
>>  
>> -prev_governor = df->governor;
>> -df->governor = governor;
>> -strncpy(df->governor_name, governor->name, DEVFREQ_NAME_LEN);
>> -ret = df->governor->event_handler(df, DEVFREQ_GOV_START, NULL);
>> +remove_sysfs_files(df, prev_governor);
>> +
>> +/*
>> + * Start the new governor and create the specific sysfs files
>> + * which depend on new governor.
>> + */
>> +strncpy(df->governor_name, new_governor->name, DEVFREQ_NAME_LEN);
>> +ret = new_governor->event_handler(df, DEVFREQ_GOV_START, NULL);
> 
> The "df->governor = new_governor" needs to be set before invocation of
> the event_handler(GOV_START) because governors like a
> performance-governor need to have the df->governor being set before the
> GOV_START callback, otherwise performance governor will use
> devfreq->prev_governor->get_target_freq() in devfreq_update_target(),
> thus setting a wrong (non-max) freq.

You're right. It is my mistake. Thanks for your detailed review.

> 
>>  if (ret) {
>>  dev_warn(dev, "%s: Governor %s not started(%d)\n",
>> - __func__, df->governor->name, ret);
>> -df->governor = prev_governor;
>> + __func__, new_governor->name, ret);
>>  strncpy(df->governor_name, prev_governor->name,
>>  DEVFREQ_NAME_LEN);
>> -ret = df->governor->event_handler(df, DEVFREQ_GOV_START, NULL);
> 
> Same here.

OK.

> 
> 


-- 
Best Regards,
Chanwoo Choi
Samsung Electronics

Re: [PATCH v4 2/2] PM / devfreq: Add governor attribute flag for specifc sysfs nodes

2020-10-21 Thread Chanwoo Choi

On 10/22/20 11:30 AM, Dmitry Osipenko wrote:
> 20.10.2020 06:04, Chanwoo Choi пишет:
>>  /**
>>   * struct devfreq_governor - Devfreq policy governor
>>   * @node:   list node - contains registered devfreq governors
>>   * @name:   Governor's name
>> + * @attr:   Governor's sysfs attribute flag
>>   * @flags:  Governor's feature flags
>>   * @get_target_freq:Returns desired operating frequency for the 
>> device.
>>   *  Basically, get_target_freq will run
>> @@ -57,6 +68,7 @@ struct devfreq_governor {
>>  struct list_head node;
>>  
>>  const char name[DEVFREQ_NAME_LEN];
>> +const u64 attr;
>>  const u64 flags;
> 
> What about to use plural for the "attrs" as well?
> 
> 

OK. I'll edit it.


-- 
Best Regards,
Chanwoo Choi
Samsung Electronics

Re: [PATCH v4 2/2] PM / devfreq: Add governor attribute flag for specifc sysfs nodes

2020-10-21 Thread Dmitry Osipenko

Hello Chanwoo,

20.10.2020 06:04, Chanwoo Choi пишет:
> @@ -1354,55 +1365,71 @@ static ssize_t governor_store(struct device *dev, 
> struct device_attribute *attr,
>   struct devfreq *df = to_devfreq(dev);
>   int ret;
>   char str_governor[DEVFREQ_NAME_LEN + 1];
> - const struct devfreq_governor *governor, *prev_governor;
> + const struct devfreq_governor *new_governor, *prev_governor;
>  
>   if (!df->governor)
>   return -EINVAL;
> + prev_governor = df->governor;
>  
>   ret = sscanf(buf, "%" __stringify(DEVFREQ_NAME_LEN) "s", str_governor);
>   if (ret != 1)
>   return -EINVAL;
>  
>   mutex_lock(_list_lock);
> - governor = try_then_request_governor(str_governor);
> - if (IS_ERR(governor)) {
> - ret = PTR_ERR(governor);
> + new_governor = try_then_request_governor(str_governor);
> + if (IS_ERR(new_governor)) {
> + ret = PTR_ERR(new_governor);
>   goto out;
>   }
> - if (df->governor == governor) {
> + if (prev_governor == new_governor) {
>   ret = 0;
>   goto out;
> - } else if (IS_SUPPORTED_FLAG(df->governor->flags, IMMUTABLE)
> - || IS_SUPPORTED_FLAG(governor->flags, IMMUTABLE)) {
> + } else if (IS_SUPPORTED_FLAG(prev_governor->flags, IMMUTABLE)
> + || IS_SUPPORTED_FLAG(new_governor->flags, IMMUTABLE)) {
>   ret = -EINVAL;
>   goto out;
>   }
>  
> - ret = df->governor->event_handler(df, DEVFREQ_GOV_STOP, NULL);
> + /*
> +  * Stop the previous governor and remove the specific sysfs files
> +  * which depend on previous governor.
> +  */
> + ret = prev_governor->event_handler(df, DEVFREQ_GOV_STOP, NULL);
>   if (ret) {
>   dev_warn(dev, "%s: Governor %s not stopped(%d)\n",
> -  __func__, df->governor->name, ret);
> +  __func__, prev_governor->name, ret);
>   goto out;
>   }
>  
> - prev_governor = df->governor;
> - df->governor = governor;
> - strncpy(df->governor_name, governor->name, DEVFREQ_NAME_LEN);
> - ret = df->governor->event_handler(df, DEVFREQ_GOV_START, NULL);
> + remove_sysfs_files(df, prev_governor);
> +
> + /*
> +  * Start the new governor and create the specific sysfs files
> +  * which depend on new governor.
> +  */
> + strncpy(df->governor_name, new_governor->name, DEVFREQ_NAME_LEN);
> + ret = new_governor->event_handler(df, DEVFREQ_GOV_START, NULL);

The "df->governor = new_governor" needs to be set before invocation of
the event_handler(GOV_START) because governors like a
performance-governor need to have the df->governor being set before the
GOV_START callback, otherwise performance governor will use
devfreq->prev_governor->get_target_freq() in devfreq_update_target(),
thus setting a wrong (non-max) freq.

>   if (ret) {
>   dev_warn(dev, "%s: Governor %s not started(%d)\n",
> -  __func__, df->governor->name, ret);
> - df->governor = prev_governor;
> +  __func__, new_governor->name, ret);
>   strncpy(df->governor_name, prev_governor->name,
>   DEVFREQ_NAME_LEN);
> - ret = df->governor->event_handler(df, DEVFREQ_GOV_START, NULL);

Same here.

Re: [PATCH v4 2/2] PM / devfreq: Add governor attribute flag for specifc sysfs nodes

2020-10-21 Thread Dmitry Osipenko

20.10.2020 06:04, Chanwoo Choi пишет:
>  /**
>   * struct devfreq_governor - Devfreq policy governor
>   * @node:list node - contains registered devfreq governors
>   * @name:Governor's name
> + * @attr:Governor's sysfs attribute flag
>   * @flags:   Governor's feature flags
>   * @get_target_freq: Returns desired operating frequency for the device.
>   *   Basically, get_target_freq will run
> @@ -57,6 +68,7 @@ struct devfreq_governor {
>   struct list_head node;
>  
>   const char name[DEVFREQ_NAME_LEN];
> + const u64 attr;
>   const u64 flags;

What about to use plural for the "attrs" as well?

Re: [PATCH] scsi: megaraid_sas: use spin_lock() in hard IRQ

2020-10-21 Thread Finn Thain

On Wed, 21 Oct 2020, Xianting Tian wrote:

> Since we already in hard IRQ context when running megasas_isr(),

On m68k, hard irq context does not mean interrupts are disabled. Are there 
no other architectures in that category?

> so use spin_lock() is enough, which is faster than spin_lock_irqsave().
> 

Is that measurable?

> Signed-off-by: Xianting Tian 
> ---
>  drivers/scsi/megaraid/megaraid_sas_base.c | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/scsi/megaraid/megaraid_sas_base.c 
> b/drivers/scsi/megaraid/megaraid_sas_base.c
> index 2b7e7b5f3..bd186254d 100644
> --- a/drivers/scsi/megaraid/megaraid_sas_base.c
> +++ b/drivers/scsi/megaraid/megaraid_sas_base.c
> @@ -3977,15 +3977,14 @@ static irqreturn_t megasas_isr(int irq, void *devp)
>  {
>   struct megasas_irq_context *irq_context = devp;
>   struct megasas_instance *instance = irq_context->instance;
> - unsigned long flags;
>   irqreturn_t rc;
>  
>   if (atomic_read(>fw_reset_no_pci_access))
>   return IRQ_HANDLED;
>  
> - spin_lock_irqsave(>hba_lock, flags);
> + spin_lock(>hba_lock);
>   rc = megasas_deplete_reply_queue(instance, DID_OK);
> - spin_unlock_irqrestore(>hba_lock, flags);
> + spin_unlock(>hba_lock);
>  
>   return rc;
>  }
>

1 2 3 4 5 6 7 8 9 >

1 - 100 of 877 matches

Mail list logo