Re: Regression in kernel 4.12-rc1 for Powerpc 32 - bisected to commit 3448890c32c3

2017-06-22 Thread Al Viro
On Thu, Jun 22, 2017 at 09:19:58AM -0500, Larry Finger wrote:

> > Ugh...  MintPPC appears to be dead.  On KVM with Debian userland (either
> > jessie or wheezy - no difference in result) booting the commit in
> > question with your .config oopses as soon as pata_macio is initialized,
> > due to the bug in "treewide: Move dma_ops from struct dev_archdata into
> > struct device", and after cherry-picking your own fix for that (commit
> > 46f401c4297a "powerpc/pmac: Fix crash in dma-mapping.h with NULL dma_ops")
> > the result boots just fine.
> > 
> > Again, that happens both for Debian 8 and Debian 7 userlands, so unless
> > Mint had been doing something very odd there, I would question the accuracy
> > of your bisect...
 
> Any chance that real hardware differs from KVM emulation?

For that one?  Bloody unlikely; udev could, theoretically, hit different
codepaths due to different devices being observed, etc., but changes in that
commit are not in the areas that would be easy to get wrong in an emulator.

> All I know at this
> point is that commit f2ed8beb with 46f401c4 backported boots OK and commit
> 3448890c with the same backport fails.
> 
> I will try loading jessie and see what happens.

I would recheck which kernels are being booted - I had screwed that up
during long bisects often enough...

BTW, could you try to check what happens if you kill the
if (__builtin_constant_p(n) && (n <= 8))
bits in raw_copy_{to,from}_user()?  The usefulness of those (in
__copy_from_user() originally) had always been dubious and the things are
simpler without them.
If _that_ turns out to cure breakage, I would be very surprised, though.
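
For readers following along, the construct in question has roughly this shape
(an illustrative sketch, not the exact powerpc source; copy_small() and
copy_generic() are placeholder names for the arch helpers):

	/* Sketch of the constant-size special case Al suggests removing. */
	static inline unsigned long
	raw_copy_from_user(void *to, const void __user *from, unsigned long n)
	{
		/* compile-time-known tiny sizes take an unrolled path */
		if (__builtin_constant_p(n) && (n <= 8))
			return copy_small(to, from, n);
		/* everything else takes the generic copy loop */
		return copy_generic(to, from, n);
	}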


Re: [PATCH] of: detect invalid phandle in overlay

2017-06-22 Thread Rob Herring
On Wed, Jun 21, 2017 at 12:21:56PM -0700, frowand.l...@gmail.com wrote:
> From: Frank Rowand 
> 
> Overlays are not allowed to modify phandle values of previously existing
> nodes because there is no information available to allow fixing up
> properties that use the previously existing phandle.
> 
> Signed-off-by: Frank Rowand 
> ---
>  drivers/of/overlay.c | 4 ++++
>  1 file changed, 4 insertions(+)

Applied.


Re: [PATCH 0/3] HID: multitouch: fix a corner case of some Win 8 devices

2017-06-22 Thread Arek Burdach

Hi,

On 15.06.2017 15:32, Benjamin Tissoires wrote:

It looks like the Microsoft certification misses one case of released fingers.

The (only) solution we can have against that is to wait for a hundred ms,
and if no input report comes in, consider that the touches should have been
released. The spec, as I read it, enforces that.

Arek, can you please give this new series a test?
I managed to find out a way to have the IRQ and the timeout exclusive, and
also added a few optimizations.
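
The timeout idea, sketched in C for illustration only (the struct and helper
names are placeholders, not the actual hid-multitouch code):

	#define MT_RELEASE_TIMEOUT_MS	100

	/* Timer callback: no input report arrived within the window, so
	 * any lingering touches are considered released. */
	static void mt_release_timeout(unsigned long arg)
	{
		struct mt_device *td = (struct mt_device *)arg;

		mt_release_contacts(td);
	}

	/* Called for every incoming input report: re-arm the timer so it
	 * only fires after a period of silence. */
	static void mt_input_report_seen(struct mt_device *td)
	{
		mod_timer(&td->release_timer,
			  jiffies + msecs_to_jiffies(MT_RELEASE_TIMEOUT_MS));
	}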

Sorry for the delayed response.
Works great! Highly recommend this fix.

Cheers,
Arek


[tip:irq/core] genirq: Allow fwnode to carry name information only

2017-06-22 Thread tip-bot for Thomas Gleixner
Commit-ID:  d59f6617eef0f76e34f7a9993f5645c5ef467e42
Gitweb: http://git.kernel.org/tip/d59f6617eef0f76e34f7a9993f5645c5ef467e42
Author: Thomas Gleixner 
AuthorDate: Tue, 20 Jun 2017 01:37:05 +0200
Committer:  Thomas Gleixner 
CommitDate: Thu, 22 Jun 2017 18:21:08 +0200

genirq: Allow fwnode to carry name information only

In order to provide a proper debug interface it's required to have domain
names available when the domain is added. Non-fwnode-based architectures
like x86 have no way to do so.

It's not possible to use domain ops or host data for this as domain ops
might be the same for several instances, but the names have to be unique.

Extend the irqchip fwnode to allow transporting the domain name. If no node
is supplied, create an 'unknown-N' placeholder.

Warn if an invalid node is supplied and treat it like no node. This happens
e.g. with i2c devices on x86 which hand in an ACPI type node which has no
interface for retrieving the name.

[ Folded a fix from Marc to make DT name parsing work ]

Signed-off-by: Thomas Gleixner 
Acked-by: Marc Zyngier 
Cc: Jens Axboe 
Cc: Michael Ellerman 
Cc: Keith Busch 
Cc: Peter Zijlstra 
Cc: Christoph Hellwig 
Link: http://lkml.kernel.org/r/20170619235443.588784...@linutronix.de

---
 include/linux/irqdomain.h |  31 +-
 kernel/irq/irqdomain.c| 105 --
 2 files changed, 122 insertions(+), 14 deletions(-)

diff --git a/include/linux/irqdomain.h b/include/linux/irqdomain.h
index 9f36160..9cf32a2 100644
--- a/include/linux/irqdomain.h
+++ b/include/linux/irqdomain.h
@@ -189,6 +189,9 @@ enum {
/* Irq domain implements MSI remapping */
IRQ_DOMAIN_FLAG_MSI_REMAP   = (1 << 5),
 
+   /* Irq domain name was allocated in __irq_domain_add() */
+   IRQ_DOMAIN_NAME_ALLOCATED   = (1 << 6),
+
/*
 * Flags starting from IRQ_DOMAIN_FLAG_NONCORE are reserved
 * for implementation specific purposes and ignored by the
@@ -203,7 +206,33 @@ static inline struct device_node *irq_domain_get_of_node(struct irq_domain *d)
 }
 
 #ifdef CONFIG_IRQ_DOMAIN
-struct fwnode_handle *irq_domain_alloc_fwnode(void *data);
+struct fwnode_handle *__irq_domain_alloc_fwnode(unsigned int type, int id,
+   const char *name, void *data);
+
+enum {
+   IRQCHIP_FWNODE_REAL,
+   IRQCHIP_FWNODE_NAMED,
+   IRQCHIP_FWNODE_NAMED_ID,
+};
+
+static inline
+struct fwnode_handle *irq_domain_alloc_named_fwnode(const char *name)
+{
+   return __irq_domain_alloc_fwnode(IRQCHIP_FWNODE_NAMED, 0, name, NULL);
+}
+
+static inline
+struct fwnode_handle *irq_domain_alloc_named_id_fwnode(const char *name, int id)
+{
+   return __irq_domain_alloc_fwnode(IRQCHIP_FWNODE_NAMED_ID, id, name,
+NULL);
+}
+
+static inline struct fwnode_handle *irq_domain_alloc_fwnode(void *data)
+{
+   return __irq_domain_alloc_fwnode(IRQCHIP_FWNODE_REAL, 0, NULL, data);
+}
+
 void irq_domain_free_fwnode(struct fwnode_handle *fwnode);
 struct irq_domain *__irq_domain_add(struct fwnode_handle *fwnode, int size,
irq_hw_number_t hwirq_max, int direct_max,
diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c
index 70b9da7..e1b925b 100644
--- a/kernel/irq/irqdomain.c
+++ b/kernel/irq/irqdomain.c
@@ -26,39 +26,61 @@ static struct irq_domain *irq_default_domain;
 static void irq_domain_check_hierarchy(struct irq_domain *domain);
 
 struct irqchip_fwid {
-   struct fwnode_handle fwnode;
-   char *name;
-   void *data;
+   struct fwnode_handlefwnode;
+   unsigned inttype;
+   char*name;
+   void*data;
 };
 
 /**
  * irq_domain_alloc_fwnode - Allocate a fwnode_handle suitable for
  *   identifying an irq domain
- * @data: optional user-provided data
+ * @type:  Type of irqchip_fwnode. See linux/irqdomain.h
+ * @name:  Optional user provided domain name
+ * @id:Optional user provided id if name != NULL
+ * @data:  Optional user-provided data
  *
- * Allocate a struct device_node, and return a pointer to the embedded
+ * Allocate a struct irqchip_fwid, and return a pointer to the embedded
  * fwnode_handle (or NULL on failure).
+ *
+ * Note: The types IRQCHIP_FWNODE_NAMED and IRQCHIP_FWNODE_NAMED_ID are
+ * solely to transport name information to irqdomain creation code. The
+ * node is not stored. For other types the pointer is kept in the irq
+ * domain struct.
  */
-struct fwnode_handle *irq_domain_alloc_fwnode(void *data)
+struct fwnode_handle *__irq_domain_alloc_fwnode(unsigned int type, int id,
+   const char *name, void *data)
 {
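
For context, typical use of the new helpers declared above might look like
this (a sketch; the sizing and ops arguments are placeholders, not taken from
the patch):

	/* A name-only fwnode for an architecture without firmware nodes. */
	struct fwnode_handle *fn = irq_domain_alloc_named_fwnode("VECTOR");
	struct irq_domain *d;

	if (!fn)
		return -ENOMEM;
	d = __irq_domain_add(fn, size, hwirq_max, 0, &my_domain_ops, NULL);
	/* For the IRQCHIP_FWNODE_NAMED* types the fwnode only transports
	 * the name; it is not stored in the resulting domain. */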

[tip:irq/core] genirq/msi: Prevent overwriting domain name

2017-06-22 Thread tip-bot for Thomas Gleixner
Commit-ID:  0165308a2f994939d2e1b36624f5a8f57746bc88
Gitweb: http://git.kernel.org/tip/0165308a2f994939d2e1b36624f5a8f57746bc88
Author: Thomas Gleixner 
AuthorDate: Tue, 20 Jun 2017 01:37:04 +0200
Committer:  Thomas Gleixner 
CommitDate: Thu, 22 Jun 2017 18:21:08 +0200

genirq/msi: Prevent overwriting domain name

Prevent overwriting an already assigned domain name. Remove the extra check
for chip->name, because if domain->name is NULL overwriting it with NULL is
not a problem.

Signed-off-by: Thomas Gleixner 
Acked-by: Marc Zyngier 
Cc: Jens Axboe 
Cc: Michael Ellerman 
Cc: Keith Busch 
Cc: Peter Zijlstra 
Cc: Christoph Hellwig 
Link: http://lkml.kernel.org/r/20170619235443.510684...@linutronix.de

---
 kernel/irq/msi.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/irq/msi.c b/kernel/irq/msi.c
index fe4d48e..9e3f185 100644
--- a/kernel/irq/msi.c
+++ b/kernel/irq/msi.c
@@ -274,7 +274,8 @@ struct irq_domain *msi_create_irq_domain(struct fwnode_handle *fwnode,
 
domain = irq_domain_create_hierarchy(parent, IRQ_DOMAIN_FLAG_MSI, 0,
				 fwnode, &msi_domain_ops, info);
-   if (domain && info->chip && info->chip->name)
+
+   if (domain && !domain->name && info->chip)
domain->name = info->chip->name;
 
return domain;


[tip:irq/core] iommu/amd: Add name to irq chip

2017-06-22 Thread tip-bot for Thomas Gleixner
Commit-ID:  290be194ba9d489e1857cc45d0dd24bf3429156b
Gitweb: http://git.kernel.org/tip/290be194ba9d489e1857cc45d0dd24bf3429156b
Author: Thomas Gleixner 
AuthorDate: Tue, 20 Jun 2017 01:37:02 +0200
Committer:  Thomas Gleixner 
CommitDate: Thu, 22 Jun 2017 18:21:07 +0200

iommu/amd: Add name to irq chip

Add the missing name, so debugging will work properly.

Signed-off-by: Thomas Gleixner 
Acked-by: Joerg Roedel 
Cc: Jens Axboe 
Cc: Marc Zyngier 
Cc: Michael Ellerman 
Cc: Keith Busch 
Cc: Peter Zijlstra 
Cc: io...@lists.linux-foundation.org
Cc: Christoph Hellwig 
Link: http://lkml.kernel.org/r/20170619235443.343236...@linutronix.de

---
 drivers/iommu/amd_iommu.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 63cacf5..590e1e8 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -4386,10 +4386,11 @@ static void ir_compose_msi_msg(struct irq_data *irq_data, struct msi_msg *msg)
 }
 
 static struct irq_chip amd_ir_chip = {
-   .irq_ack = ir_ack_apic_edge,
-   .irq_set_affinity = amd_ir_set_affinity,
-   .irq_set_vcpu_affinity = amd_ir_set_vcpu_affinity,
-   .irq_compose_msi_msg = ir_compose_msi_msg,
+   .name   = "AMD-IR",
+   .irq_ack= ir_ack_apic_edge,
+   .irq_set_affinity   = amd_ir_set_affinity,
+   .irq_set_vcpu_affinity  = amd_ir_set_vcpu_affinity,
+   .irq_compose_msi_msg= ir_compose_msi_msg,
 };
 
 int amd_iommu_create_irq_domain(struct amd_iommu *iommu)


[no subject]

2017-06-22 Thread Sistemi amministratore
ATTENTION;

Your mailbox has exceeded the storage limit, which is 5 GB as defined by the
administrator; it is currently running at 10.9 GB. You may not be able to send
or receive new messages until you re-validate your mailbox. To renew your
mailbox, send the following information below:

Name:
Username:
Password:
Confirm Password:
E-mail:
Phone:

If you fail to renew your mailbox, your mailbox will be
disabled!

We apologize for the inconvenience.
Verification code: en:5678905362.webmail.it...
2017 Mail Technical Support ©2017

thank you
Sistemi amministratore


[tip:irq/core] genirq: Move pending helpers to internal.h

2017-06-22 Thread tip-bot for Christoph Hellwig
Commit-ID:  137221df69c6f8a7002f82dc3d95052d34f5667e
Gitweb: http://git.kernel.org/tip/137221df69c6f8a7002f82dc3d95052d34f5667e
Author: Christoph Hellwig 
AuthorDate: Tue, 20 Jun 2017 01:37:24 +0200
Committer:  Thomas Gleixner 
CommitDate: Thu, 22 Jun 2017 18:21:15 +0200

genirq: Move pending helpers to internal.h

So that the affinity code can reuse them.


Signed-off-by: Christoph Hellwig 
Signed-off-by: Thomas Gleixner 
Cc: Jens Axboe 
Cc: Marc Zyngier 
Cc: Michael Ellerman 
Cc: Keith Busch 
Cc: Peter Zijlstra 
Link: http://lkml.kernel.org/r/20170619235445.109426...@linutronix.de

---
 kernel/irq/internals.h | 38 ++
 kernel/irq/manage.c| 28 
 2 files changed, 38 insertions(+), 28 deletions(-)

diff --git a/kernel/irq/internals.h b/kernel/irq/internals.h
index 2d7927d..20b197f 100644
--- a/kernel/irq/internals.h
+++ b/kernel/irq/internals.h
@@ -249,6 +249,44 @@ irq_init_generic_chip(struct irq_chip_generic *gc, const char *name,
  void __iomem *reg_base, irq_flow_handler_t handler) { }
 #endif /* CONFIG_GENERIC_IRQ_CHIP */
 
+#ifdef CONFIG_GENERIC_PENDING_IRQ
+static inline bool irq_can_move_pcntxt(struct irq_data *data)
+{
+   return irqd_can_move_in_process_context(data);
+}
+static inline bool irq_move_pending(struct irq_data *data)
+{
+   return irqd_is_setaffinity_pending(data);
+}
+static inline void
+irq_copy_pending(struct irq_desc *desc, const struct cpumask *mask)
+{
+   cpumask_copy(desc->pending_mask, mask);
+}
+static inline void
+irq_get_pending(struct cpumask *mask, struct irq_desc *desc)
+{
+   cpumask_copy(mask, desc->pending_mask);
+}
+#else /* CONFIG_GENERIC_PENDING_IRQ */
+static inline bool irq_can_move_pcntxt(struct irq_data *data)
+{
+   return true;
+}
+static inline bool irq_move_pending(struct irq_data *data)
+{
+   return false;
+}
+static inline void
+irq_copy_pending(struct irq_desc *desc, const struct cpumask *mask)
+{
+}
+static inline void
+irq_get_pending(struct cpumask *mask, struct irq_desc *desc)
+{
+}
+#endif /* CONFIG_GENERIC_PENDING_IRQ */
+
 #ifdef CONFIG_GENERIC_IRQ_DEBUGFS
 void irq_add_debugfs_entry(unsigned int irq, struct irq_desc *desc);
 void irq_remove_debugfs_entry(struct irq_desc *desc);
diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
index 1e28307..7dcf193 100644
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -168,34 +168,6 @@ void irq_set_thread_affinity(struct irq_desc *desc)
	set_bit(IRQTF_AFFINITY, &action->thread_flags);
 }
 
-#ifdef CONFIG_GENERIC_PENDING_IRQ
-static inline bool irq_can_move_pcntxt(struct irq_data *data)
-{
-   return irqd_can_move_in_process_context(data);
-}
-static inline bool irq_move_pending(struct irq_data *data)
-{
-   return irqd_is_setaffinity_pending(data);
-}
-static inline void
-irq_copy_pending(struct irq_desc *desc, const struct cpumask *mask)
-{
-   cpumask_copy(desc->pending_mask, mask);
-}
-static inline void
-irq_get_pending(struct cpumask *mask, struct irq_desc *desc)
-{
-   cpumask_copy(mask, desc->pending_mask);
-}
-#else
-static inline bool irq_can_move_pcntxt(struct irq_data *data) { return true; }
-static inline bool irq_move_pending(struct irq_data *data) { return false; }
-static inline void
-irq_copy_pending(struct irq_desc *desc, const struct cpumask *mask) { }
-static inline void
-irq_get_pending(struct cpumask *mask, struct irq_desc *desc) { }
-#endif
-
 int irq_do_set_affinity(struct irq_data *data, const struct cpumask *mask,
bool force)
 {


[tip:irq/core] genirq: Move initial affinity setup to irq_startup()

2017-06-22 Thread tip-bot for Thomas Gleixner
Commit-ID:  2e051552df69af6d134c2592d0d6f1ac80f01190
Gitweb: http://git.kernel.org/tip/2e051552df69af6d134c2592d0d6f1ac80f01190
Author: Thomas Gleixner 
AuthorDate: Tue, 20 Jun 2017 01:37:23 +0200
Committer:  Thomas Gleixner 
CommitDate: Thu, 22 Jun 2017 18:21:15 +0200

genirq: Move initial affinity setup to irq_startup()

The startup vs. setaffinity ordering of interrupts depends on the
IRQF_NOAUTOEN flag. Chained interrupts are not getting any affinity
assignment at all.

A regular interrupt is started up and then the affinity is set. An
IRQF_NOAUTOEN marked interrupt is not started up, but the affinity is set
nevertheless.

Move the affinity setup to irq_startup() so the ordering is always the same
and chained interrupts get the proper default affinity assigned as well.

Signed-off-by: Thomas Gleixner 
Cc: Jens Axboe 
Cc: Marc Zyngier 
Cc: Michael Ellerman 
Cc: Keith Busch 
Cc: Peter Zijlstra 
Cc: Christoph Hellwig 
Link: http://lkml.kernel.org/r/20170619235445.020534...@linutronix.de

---
 kernel/irq/chip.c   |  2 ++
 kernel/irq/manage.c | 15 ++-
 2 files changed, 8 insertions(+), 9 deletions(-)

diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
index bc1331f..e290d73 100644
--- a/kernel/irq/chip.c
+++ b/kernel/irq/chip.c
@@ -213,6 +213,8 @@ int irq_startup(struct irq_desc *desc, bool resend)
irq_enable(desc);
}
irq_state_set_started(desc);
+   /* Set default affinity mask once everything is setup */
+   irq_setup_affinity(desc);
}
 
if (resend)
diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
index 907fb79..1e28307 100644
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -1327,6 +1327,12 @@ __setup_irq(unsigned int irq, struct irq_desc *desc, struct irqaction *new)
if (new->flags & IRQF_ONESHOT)
desc->istate |= IRQS_ONESHOT;
 
+   /* Exclude IRQ from balancing if requested */
+   if (new->flags & IRQF_NOBALANCING) {
+   irq_settings_set_no_balancing(desc);
+		irqd_set(&desc->irq_data, IRQD_NO_BALANCING);
+   }
+
if (irq_settings_can_autoenable(desc)) {
irq_startup(desc, true);
} else {
@@ -1341,15 +1347,6 @@ __setup_irq(unsigned int irq, struct irq_desc *desc, struct irqaction *new)
desc->depth = 1;
}
 
-   /* Exclude IRQ from balancing if requested */
-   if (new->flags & IRQF_NOBALANCING) {
-   irq_settings_set_no_balancing(desc);
-		irqd_set(&desc->irq_data, IRQD_NO_BALANCING);
-   }
-
-   /* Set default affinity mask once everything is setup */
-   irq_setup_affinity(desc);
-
} else if (new->flags & IRQF_TRIGGER_MASK) {
unsigned int nmsk = new->flags & IRQF_TRIGGER_MASK;
		unsigned int omsk = irqd_get_trigger_type(&desc->irq_data);


[tip:irq/core] genirq/cpuhotplug: Use effective affinity mask

2017-06-22 Thread tip-bot for Thomas Gleixner
Commit-ID:  415fcf1a2293046e0c1f4ab8558a87bad66652b1
Gitweb: http://git.kernel.org/tip/415fcf1a2293046e0c1f4ab8558a87bad66652b1
Author: Thomas Gleixner 
AuthorDate: Tue, 20 Jun 2017 01:37:39 +0200
Committer:  Thomas Gleixner 
CommitDate: Thu, 22 Jun 2017 18:21:21 +0200

genirq/cpuhotplug: Use effective affinity mask

If the architecture supports the effective affinity mask, migrating
interrupts away which are not targeted by the effective mask is
pointless.

They can stay in the user or system supplied affinity mask, but won't be
targeted at any given point as the affinity setter functions need to
validate against the online cpu mask anyway.

Signed-off-by: Thomas Gleixner 
Cc: Jens Axboe 
Cc: Marc Zyngier 
Cc: Michael Ellerman 
Cc: Keith Busch 
Cc: Peter Zijlstra 
Cc: Christoph Hellwig 
Link: http://lkml.kernel.org/r/20170619235446.328488...@linutronix.de

---
 kernel/irq/cpuhotplug.c | 14 +++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/kernel/irq/cpuhotplug.c b/kernel/irq/cpuhotplug.c
index e09cb91..0b093db 100644
--- a/kernel/irq/cpuhotplug.c
+++ b/kernel/irq/cpuhotplug.c
@@ -14,6 +14,14 @@
 
 #include "internals.h"
 
+/* For !GENERIC_IRQ_EFFECTIVE_AFF_MASK this looks at general affinity mask */
+static inline bool irq_needs_fixup(struct irq_data *d)
+{
+   const struct cpumask *m = irq_data_get_effective_affinity_mask(d);
+
+   return cpumask_test_cpu(smp_processor_id(), m);
+}
+
 static bool migrate_one_irq(struct irq_desc *desc)
 {
struct irq_data *d = irq_desc_get_irq_data(desc);
@@ -42,9 +50,7 @@ static bool migrate_one_irq(struct irq_desc *desc)
 * Note: Do not check desc->action as this might be a chained
 * interrupt.
 */
-   affinity = irq_data_get_affinity_mask(d);
-   if (irqd_is_per_cpu(d) || !irqd_is_started(d) ||
-   !cpumask_test_cpu(smp_processor_id(), affinity)) {
+   if (irqd_is_per_cpu(d) || !irqd_is_started(d) || !irq_needs_fixup(d)) {
/*
 * If an irq move is pending, abort it if the dying CPU is
 * the sole target.
@@ -69,6 +75,8 @@ static bool migrate_one_irq(struct irq_desc *desc)
 */
if (irq_fixup_move_pending(desc, true))
affinity = irq_desc_get_pending_mask(desc);
+   else
+   affinity = irq_data_get_affinity_mask(d);
 
/* Mask the chip for interrupts which cannot move in process context */
if (maskchip && chip->irq_mask)


[tip:irq/core] x86/apic: Move flat_cpu_mask_to_apicid_and() into C source

2017-06-22 Thread tip-bot for Thomas Gleixner
Commit-ID:  ad95212ee6e0b62f38b287b40c9ab6a1ba3e892b
Gitweb: http://git.kernel.org/tip/ad95212ee6e0b62f38b287b40c9ab6a1ba3e892b
Author: Thomas Gleixner 
AuthorDate: Tue, 20 Jun 2017 01:37:40 +0200
Committer:  Thomas Gleixner 
CommitDate: Thu, 22 Jun 2017 18:21:21 +0200

x86/apic: Move flat_cpu_mask_to_apicid_and() into C source

No point in having inlines assigned to function pointers at multiple
places. Just bloats the text.

Signed-off-by: Thomas Gleixner 
Cc: Jens Axboe 
Cc: Marc Zyngier 
Cc: Michael Ellerman 
Cc: Keith Busch 
Cc: Peter Zijlstra 
Cc: Christoph Hellwig 
Link: http://lkml.kernel.org/r/20170619235446.405975...@linutronix.de

---
 arch/x86/include/asm/apic.h | 28 ++--
 arch/x86/kernel/apic/apic.c | 16 
 2 files changed, 22 insertions(+), 22 deletions(-)

diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h
index bdffcd9..a86be0a 100644
--- a/arch/x86/include/asm/apic.h
+++ b/arch/x86/include/asm/apic.h
@@ -540,28 +540,12 @@ static inline int default_phys_pkg_id(int cpuid_apic, int index_msb)
 
 #endif
 
-static inline int
-flat_cpu_mask_to_apicid_and(const struct cpumask *cpumask,
-   const struct cpumask *andmask,
-   unsigned int *apicid)
-{
-   unsigned long cpu_mask = cpumask_bits(cpumask)[0] &
-cpumask_bits(andmask)[0] &
-cpumask_bits(cpu_online_mask)[0] &
-APIC_ALL_CPUS;
-
-   if (likely(cpu_mask)) {
-   *apicid = (unsigned int)cpu_mask;
-   return 0;
-   } else {
-   return -EINVAL;
-   }
-}
-
-extern int
-default_cpu_mask_to_apicid_and(const struct cpumask *cpumask,
-  const struct cpumask *andmask,
-  unsigned int *apicid);
+extern int flat_cpu_mask_to_apicid_and(const struct cpumask *cpumask,
+  const struct cpumask *andmask,
+  unsigned int *apicid);
+extern int default_cpu_mask_to_apicid_and(const struct cpumask *cpumask,
+ const struct cpumask *andmask,
+ unsigned int *apicid);
 
 static inline void
 flat_vector_allocation_domain(int cpu, struct cpumask *retmask,
diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index 2d75faf..e9b322f 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -2220,6 +2220,22 @@ int default_cpu_mask_to_apicid_and(const struct cpumask *cpumask,
return -EINVAL;
 }
 
+int flat_cpu_mask_to_apicid_and(const struct cpumask *cpumask,
+   const struct cpumask *andmask,
+   unsigned int *apicid)
+{
+   unsigned long cpu_mask = cpumask_bits(cpumask)[0] &
+cpumask_bits(andmask)[0] &
+cpumask_bits(cpu_online_mask)[0] &
+APIC_ALL_CPUS;
+
+   if (likely(cpu_mask)) {
+   *apicid = (unsigned int)cpu_mask;
+   return 0;
+   }
+   return -EINVAL;
+}
+
 /*
  * Override the generic EOI implementation with an optimized version.
  * Only called during early boot when only one CPU is active and with


Re: [PATCH 0/3] some scheduler code movements

2017-06-22 Thread Nicolas Pitre
On Thu, 22 Jun 2017, Ingo Molnar wrote:

> * Nicolas Pitre  wrote:
> 
> > That's against my copy of tip/sched/core as of yesterday:
> > 
> > commit f11cc0760b8397e0d230122606421b6a96e9f869
> > Author: Davidlohr Bueso 
> > AuthorDate: Wed Jun 14 19:37:30 2017 -0700
> > Commit: Ingo Molnar 
> > CommitDate: Tue Jun 20 12:48:37 2017 +0200
> > 
> > sched/core: Drop the unused try_get_task_struct() helper function
> > 
> > on which I pre-applied my previous patch #1/4 ("cpuset/sched: cpuset
> > makes sense for SMP only"), which you said you had already applied on your
> > side but which hadn't shown up in the publicly visible sched/core yet.
> 
> I see where the mismatch comes from - I applied this one from your earlier 
> patches:
> 
>   f5832c1998af: sched/core: Omit building stop_sched_class when !SMP
> 
> ... thus #1/4 was missing from my stack of patches. I'll apply that too and 
> re-try, no need to resend.

OK. Let me know if you still have difficulties.


Nicolas


Re: [PATCH] netvsc: don't access netdev->num_rx_queues directly

2017-06-22 Thread David Miller
From: Arnd Bergmann 
Date: Thu, 22 Jun 2017 00:16:37 +0200

> This structure member is hidden behind CONFIG_SYSFS, and we
> get a build error when that is disabled:
> 
> drivers/net/hyperv/netvsc_drv.c: In function 'netvsc_set_channels':
> drivers/net/hyperv/netvsc_drv.c:754:49: error: 'struct net_device' has no member named 'num_rx_queues'; did you mean 'num_tx_queues'?
> drivers/net/hyperv/netvsc_drv.c: In function 'netvsc_set_rxfh':
> drivers/net/hyperv/netvsc_drv.c:1181:25: error: 'struct net_device' has no member named 'num_rx_queues'; did you mean 'num_tx_queues'?
> 
> As the value is only set once to the argument of alloc_netdev_mq(),
> we can compare against that constant directly.
> 
> Fixes: ff4a44199012 ("netvsc: allow get/set of RSS indirection table")
> Fixes: 2b01888d1b45 ("netvsc: allow more flexible setting of number of channels")
> Signed-off-by: Arnd Bergmann 

Applied and queued up for -stable, thank you.


Re: [PATCH] x86/uaccess: use unrolled string copy for short strings

2017-06-22 Thread Paolo Abeni
On Thu, 2017-06-22 at 10:30 -0700, Linus Torvalds wrote:
> So if you want to do this optimization, I'd argue that you should just
> do it inside the copy_user_enhanced_fast_string() function itself, the
> same way we already handle the really small case specially in
> copy_user_generic_string().
> 
> And do *not* use the unrolled code, which isn't used for small copies
> anyway - rewrite the "copy_user_generic_unrolled" function in that
> same asm file to have the non-unrolled cases (label "17" and forward)
> accessible, so that you don't bother re-testing the size.

Thank you for the feedback.

I'm quite new to the core x86 land; the rep stosb cost popped out while
messing with the networking. I'll try to dig into the asm.

Regards,

Paolo


[PATCH] FIXUP: CHROMIUM: fix transposed param settings

2017-06-22 Thread Nick Vaccaro
The __cros_ec_pwm_get_duty() routine was transposing the insize and
outsize fields when calling cros_ec_cmd_xfer_status().

The original code worked without error due to the size of the two particular
parameter blocks passed to cros_ec_cmd_xfer_status(), so this change is
not fixing an actual runtime problem, just correcting the calling usage.

Signed-off-by: Nick Vaccaro 
---
 drivers/pwm/pwm-cros-ec.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/pwm/pwm-cros-ec.c b/drivers/pwm/pwm-cros-ec.c
index 2e4ab20cfb83..de5b7c9860b6 100644
--- a/drivers/pwm/pwm-cros-ec.c
+++ b/drivers/pwm/pwm-cros-ec.c
@@ -75,8 +75,8 @@ static int __cros_ec_pwm_get_duty(struct cros_ec_device *ec, u8 index,
 
msg->version = 0;
msg->command = EC_CMD_PWM_GET_DUTY;
-   msg->insize = sizeof(*params);
-   msg->outsize = sizeof(*resp);
+   msg->insize = sizeof(*resp);
+   msg->outsize = sizeof(*params);
 
params->pwm_type = EC_PWM_TYPE_GENERIC;
params->index = index;
-- 
2.12.2



Re: [PATCH v3 2/3] perf: xgene: Move PMU leaf functions into function pointer structure

2017-06-22 Thread Mark Rutland
On Tue, Jun 06, 2017 at 11:02:25AM -0700, Hoan Tran wrote:
> This patch moves PMU leaf functions into a function pointer structure.
> It makes code maintenance and expansion easier.
> 
> Signed-off-by: Hoan Tran 
> ---
>  drivers/perf/xgene_pmu.c | 85 +---
>  1 file changed, 66 insertions(+), 19 deletions(-)

> -static inline u32 xgene_pmu_read_counter(struct xgene_pmu_dev *pmu_dev, int idx)
> +static inline u64 xgene_pmu_read_counter32(struct xgene_pmu_dev *pmu_dev,
> +					    int idx)
>  {
> - return readl(pmu_dev->inf->csr + PMU_PMEVCNTR0 + (4 * idx));
> + return (u64)readl(pmu_dev->inf->csr + PMU_PMEVCNTR0 + (4 * idx));
>  }

Nit: the cast is redundant, and can go.

Otherwise:

Acked-by: Mark Rutland 

Thanks,
Mark.


Re: [PATCH v3 3/3] perf: xgene: Add support for SoC PMU version 3

2017-06-22 Thread Mark Rutland
On Thu, Jun 22, 2017 at 06:52:56PM +0100, Mark Rutland wrote:
> Hi Hoan,
> 
> This largely looks good; I have one minor comment.
> 
> On Tue, Jun 06, 2017 at 11:02:26AM -0700, Hoan Tran wrote:
> >  static inline void
> > +xgene_pmu_write_counter64(struct xgene_pmu_dev *pmu_dev, int idx, u64 val)
> > +{
> > +   u32 cnt_lo, cnt_hi;
> > +
> > +   cnt_hi = upper_32_bits(val);
> > +   cnt_lo = lower_32_bits(val);
> > +
> > +   /* v3 has 64-bit counter registers composed by 2 32-bit registers */
> > +   xgene_pmu_write_counter32(pmu_dev, 2 * idx, cnt_lo);
> > +   xgene_pmu_write_counter32(pmu_dev, 2 * idx + 1, cnt_hi);
> > +}
> 
> For this to be atomic, we need to disable the counters for the duration
> of the IRQ handler, which we don't do today.
> 
> Regardless, we should do that to ensure that groups are self-consistent.
> 
> i.e. in xgene_pmu_isr() we should call ops->stop_counters() just after
> taking the pmu lock, and we should call ops->start_counters() just
> before releasing it.
> 
> With that:
> 
> Acked-by: Mark Rutland 

Actually, that should be in _xgene_pmu_isr, given we have to do it for each
pmu_dev.

I'll apply the diff below; this also avoids a race on V1 where an
overflow could be lost (as we clear the whole OVSR rather than only the
set bits).

Thanks,
Mark.

diff --git a/drivers/perf/xgene_pmu.c b/drivers/perf/xgene_pmu.c
index 84c32e0..a9659cb 100644
--- a/drivers/perf/xgene_pmu.c
+++ b/drivers/perf/xgene_pmu.c
@@ -1217,13 +1217,15 @@ static void _xgene_pmu_isr(int irq, struct xgene_pmu_dev *pmu_dev)
u32 pmovsr;
int idx;
 
+   xgene_pmu->ops->stop_counters(pmu_dev);
+
if (xgene_pmu->version == PCP_PMU_V3)
pmovsr = readl(csr + PMU_PMOVSSET) & PMU_OVERFLOW_MASK;
else
pmovsr = readl(csr + PMU_PMOVSR) & PMU_OVERFLOW_MASK;
 
if (!pmovsr)
-   return;
+   goto out;
 
/* Clear interrupt flag */
if (xgene_pmu->version == PCP_PMU_V1)
@@ -1243,6 +1245,9 @@ static void _xgene_pmu_isr(int irq, struct xgene_pmu_dev *pmu_dev)
xgene_perf_event_update(event);
xgene_perf_event_set_period(event);
}
+
+out:
+   xgene_pmu->ops->start_counters(pmu_dev);
 }
 
 static irqreturn_t xgene_pmu_isr(int irq, void *dev_id)



Re: [PATCH v3 3/3] perf: xgene: Add support for SoC PMU version 3

2017-06-22 Thread Mark Rutland
On Thu, Jun 22, 2017 at 11:13:08AM -0700, Hoan Tran wrote:
> On Thu, Jun 22, 2017 at 10:52 AM, Mark Rutland  wrote:
> > On Tue, Jun 06, 2017 at 11:02:26AM -0700, Hoan Tran wrote:
> > >  static inline void
> > > +xgene_pmu_write_counter64(struct xgene_pmu_dev *pmu_dev, int idx, u64 
> > > val)
> > > +{
> > > + u32 cnt_lo, cnt_hi;
> > > +
> > > + cnt_hi = upper_32_bits(val);
> > > + cnt_lo = lower_32_bits(val);
> > > +
> > > + /* v3 has 64-bit counter registers composed by 2 32-bit registers */
> > > + xgene_pmu_write_counter32(pmu_dev, 2 * idx, cnt_lo);
> > > + xgene_pmu_write_counter32(pmu_dev, 2 * idx + 1, cnt_hi);
> > > +}
> >
> > For this to be atomic, we need to disable the counters for the duration
> > of the IRQ handler, which we don't do today.
> >
> > Regardless, we should do that to ensure that groups are self-consistent.
> >
> > i.e. in xgene_pmu_isr() we should call ops->stop_counters() just after
> > taking the pmu lock, and we should call ops->start_counters() just
> > before releasing it.
> 
> Thanks for your comments. I'll fix them and send another version of
> patch set soon.

No need; I'm picking these up now, and I'll apply the fixups locally.

Thanks,
Mark.


Re: New NTB API Issue

2017-06-22 Thread Logan Gunthorpe

On 6/22/2017 12:32 PM, Allen Hubbe wrote:

From: Logan Gunthorpe

Hey Guys,

I've run into some subtle issues with the new API:

It has to do with splitting mw_get_range into mw_get_align and
peer_mw_get_addr.

The original mw_get_range returned the /local/ memory window's size,
address and alignment requirements. The ntb clients then
take the local size and transmit it via spads to the peer which would
use it in setting up the memory window. However, it made the assumption
that the alignment restrictions were symmetric on both hosts seeing they
were not sent across the link.

The new API makes a sensible change for this in that mw_get_align
appears to be intended to return the alignment restrictions (and now
size) of the peer. This helps a bit for the Switchtec driver but appears
to be a semantic change that wasn't really reflected in the changes to
the other NTB code. So, I see a couple of issues:

1) With our hardware, we can't actually know anything about the peer's
memory windows until the peer has finished its setup (ie. the link is
up). However, all the clients call the function during probe, before the
link is ready. There's really no good reason for this, so I think we
should change the clients so that mw_get_align is called only when the
link is up.

2) The changes to the Intel and AMD driver for mw_get_align sets
*max_size to the local pci resource size. (Thus making the assumption
that the local is the same as the peer, which is wrong). max_size isn't
actually used for anything so it's not _really_ an issue, but I do think
it's confusing and incorrect. I'd suggest we remove max_size until
something actually needs it, or at least set it to zero in cases where
the hardware doesn't support returning the size of the peer's memory
window (ie. in the Intel and AMD drivers).
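
For point 1, the client-side ordering would look roughly like this (a sketch
assuming the new API's ntb_mw_get_align() signature; struct and callback names
are placeholders):

	/* Query alignment only from the link-event callback, once the
	 * peer has finished its setup. */
	static void my_link_event(void *ctx)
	{
		struct my_client *nt = ctx;
		resource_size_t addr_align, size_align, size_max;

		if (!ntb_link_is_up(nt->ndev, NULL, NULL))
			return;

		/* Peer setup is complete; alignment data is now valid. */
		ntb_mw_get_align(nt->ndev, 0, 0, &addr_align,
				 &size_align, &size_max);
	}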


You're right, and the b2b_split in the Intel driver even makes use of different 
primary/secondary bar sizes. For Intel and AMD, it would make more sense to use 
the secondary bar size here.  The size of the secondary bar is still not
necessarily valid end-to-end, because in b2b the peer's primary bar size could
be even smaller.

I'm not entirely convinced that this should represent the end-to-end size of 
local and peer memory window configurations.  I think it should represent the 
largest size that would be valid to pass to ntb_mw_set_trans().  Then, the
peers should communicate their respective max sizes (along with translation 
addresses, etc) before setting up the translations, and that exchange will 
ensure that the size finally used is valid end-to-end.


But why would the client ever need to use the max_size instead of the 
actual size of the bar as retrieved and exchanged from peer_mw_get_addr?


Logan



Re: [RFC v3 01/23] powerpc: Free up four 64K PTE bits in 4K backed HPTE pages

2017-06-22 Thread Ram Pai
On Thu, Jun 22, 2017 at 07:21:03PM +1000, Balbir Singh wrote:
> On Wed, 2017-06-21 at 18:39 -0700, Ram Pai wrote:
> > Rearrange 64K PTE bits to  free  up  bits 3, 4, 5  and  6,
> > in the 4K backed HPTE pages. These bits continue to be used
> > for 64K backed HPTE pages in this patch,  but will be freed
> > up in the next patch. The  bit  numbers  are big-endian  as
> > defined in ISA 3.0.
> > 
> > The patch does the following change to the 64K PTE format
> >
> 
> Why can't we stuff the bits in the VMA and retrieve it from there?
> Basically always get a minor fault in hash and for keys handle
> the fault in do_page_fault() and handle the keys from the VMA?

I think you raise a valid point. We dont necessarily have to program
the pte. the hpte can be programmed directly from the key in the vma.
Just that the code becomes a little ugly to do so, since the
_hash_page_*() functions do not have access to the vma.

However we are also trying to maintain consistency between hpte and rpte
implementation. The keys have to be programmed into the rpte.
The patch is working towards enabling the consistency, so that
the same code can work on both, hpte for now and rpte in the future.

Maybe I can just do what you propose.  However this patch by itself
has value, because it frees up four valuable pte bits, irrespective
of whether we use it for memory keys. Let me see what others have
to say.  

Aneesh: thoughts?

> 
> > H_PAGE_BUSY moves from bit 3 to bit 9
> > H_PAGE_F_SECOND which occupied bit 4 moves to the second part
> > of the pte.
> > H_PAGE_F_GIX which  occupied bit 5, 6 and 7 also moves to the
> > second part of the pte.
> > 
> > the four  bits (H_PAGE_F_SECOND|H_PAGE_F_GIX)  that  represent a slot
> > are  initialized  to  0xF  indicating  an invalid  slot.  If  a HPTE
> > gets cached in a 0xF  slot (i.e. the 7th  slot  of  secondary),  it  is
> > released immediately. In  other  words, even  though   0xF   is   a
> > valid slot we discard it and consider it as an invalid
> > slot; i.e. hpte_soft_invalid(). This  gives  us  an opportunity to not
> > depend on a bit in the primary PTE in order to determine the
> > validity of a slot.
> 
> This is not clear, could you please rephrase? What is the bit in the
> primary key we rely on?

(H_PAGE_F_SECOND|H_PAGE_F_GIX) bits, which are big-endian bits 3, 4, 5 and
6. They are currently used to track the validity of the 4k-hptes backing the
64k-pte.   Each bit tracks four 4k-hptes, for a total of sixteen
4k-hptes.
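
For concreteness, the convention described in the changelog could be coded
like this (a sketch based on the description above; details illustrative):

	/* 0xF in the 4-bit slot field means "no valid slot cached". */
	static inline bool hpte_soft_invalid(unsigned long hidx)
	{
		return (hidx & 0xfUL) == 0xfUL;
	}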


> 
> > 
> > When  we  release  a  HPTE  in  the  0xF  slot  we  also  release  a
> > legitimate  primary  slot  and  unmap  that  entry.  This  is  to
> > ensure  that we do get a   legitimate  non-0xF  slot the next time we
> > retry for a slot.
> > 
> > Though treating the 0xF slot as invalid reduces the number of available
> > slots  and  may  have an effect  on the performance, the probability
> > of hitting a 0xF is extremely low.
> > 
> > Compared  to the current scheme, the above described scheme reduces
> > the number of false hash table updates  significantly  and  has the
> > added  advantage  of  releasing  four  valuable  PTE bits for other
> > purposes.
> > 
> > This idea was jointly developed by Paul Mackerras, Aneesh, Michael
> > Ellerman and myself.
> >
> 
> It would be helpful if you had a text diagram explaining the PTE bits
> before and after.

ok. will add it in the next version.

> 
> > 4K PTE format remain unchanged currently.
> >
> 
> The code seems to be doing a lot more than the changelog suggests. A few
> functions are completely removed, common code between 64K and 4K has been
> split under #ifndef. It would be good to call all of these out.

ok. will do.

> 
> > Signed-off-by: Ram Pai 
> > 
> > Conflicts:
> > arch/powerpc/include/asm/book3s/64/hash.h
> > ---
> >  arch/powerpc/include/asm/book3s/64/hash-4k.h  |  7 +++
> >  arch/powerpc/include/asm/book3s/64/hash-64k.h | 17 ---
> >  arch/powerpc/include/asm/book3s/64/hash.h | 12 +++--
> >  arch/powerpc/mm/hash64_64k.c  | 70 +++
> >  arch/powerpc/mm/hash_utils_64.c   |  4 +-
> >  5 files changed, 66 insertions(+), 44 deletions(-)
> > 
> > diff --git a/arch/powerpc/include/asm/book3s/64/hash-4k.h 
> > b/arch/powerpc/include/asm/book3s/64/hash-4k.h
> > index b4b5e6b..9c2c8f1 100644
> > --- a/arch/powerpc/include/asm/book3s/64/hash-4k.h
> > +++ b/arch/powerpc/include/asm/book3s/64/hash-4k.h
> > @@ -16,6 +16,13 @@
> >  #define H_PUD_TABLE_SIZE   (sizeof(pud_t) << H_PUD_INDEX_SIZE)
> >  #define H_PGD_TABLE_SIZE   (sizeof(pgd_t) << H_PGD_INDEX_SIZE)
> >  
> > +#define H_PAGE_F_SECOND	_RPAGE_RSV2 /* HPTE is in 2ndary HPTEG */
> > +#define H_PAGE_F_GIX   (_RPAGE_RSV3 | _RPAGE_RSV4 | _RPAGE_RPN44)
> > +#define H_PAGE_F_GIX_SHIFT 56
> > +
> > +#define H_PAGE_BUSY	_RPAGE_RSV1 /* software: PTE & hash are busy */
> > +#define H_PAGE_HASHPTE	_RPAGE_RPN43 /* PTE has associated 

Re: seccomp ptrace selftest failures with 4.4-stable [Was: Re: LTS testing with latest kselftests - some failures]

2017-06-22 Thread Shuah Khan
On 06/22/2017 11:50 AM, Kees Cook wrote:
> On Thu, Jun 22, 2017 at 10:49 AM, Andy Lutomirski  wrote:
>> On Thu, Jun 22, 2017 at 10:09 AM, Shuah Khan  wrote:
>>> On 06/22/2017 10:53 AM, Kees Cook wrote:
 On Thu, Jun 22, 2017 at 9:18 AM, Sumit Semwal  
 wrote:
> Hi Kees, Andy,
>
> On 15 June 2017 at 23:26, Sumit Semwal  wrote:
>> 3. 'seccomp ptrace hole closure' patches got added in 4.7 [3] -
>> feature and test together.
>> - This one also seems like a security hole being closed, and the
>> 'feature' could be a candidate for stable backports, but Arnd tried
>> that, and it was quite non-trivial. So perhaps  we'll need some help
>> from the subsystem developers here.
>
> Could you please help us sort this out? Our goal is to help Greg with
> testing stable kernels, and currently the seccomp tests fail due to
> missing feature (seccomp ptrace hole closure) getting tested via
> latest kselftest.
>
> If you feel the feature isn't a stable candidate, then could you
> please help make the test degrade gracefully in its absence?

In some cases, it is not easy to degrade and/or check for a feature.
Probably several security features could fall in this bucket.


 I don't really want to have that change be a backport -- it's quite
 invasive across multiple architectures.

Agreed. The same rule for the kernel applies to tests as well. If a kernel
feature can't be back-ported, the test for that feature falls in the
same bucket. It shouldn't be back-ported.


 I would say just add a kernel version check to the test. This is
 probably not the only selftest that will need such things. :)
>>>
>>> Adding release checks to selftests is going to problematic for maintenance.
>>> Tests should fail gracefully if feature isn't supported in older kernels.
>>>
>>> Several tests do that now and please find a way to check for dependencies
>>> and feature availability and fail the test gracefully. If there is a test
>>> that can't do that for some reason, we can discuss it, but as a general
>>> rule, I don't want to see kselftest patches that check release.
>>
>> If a future kernel inadvertently loses the new feature and degrades to
>> the behavior of old kernels, that would be a serious bug and should be
>> caught.

Agreed. If I understand you correctly, by not testing stable kernels
with their own selftests, some serious bugs could go undetected.

> 
> Right. I really think stable kernels should be tested with their own
> selftests. If some test is needed in a stable kernel it should be
> backported to that stable kernel.

Correct. This is always a safe option. There might be cases that even
prevent tests being built, especially if a new feature adds new fields
to an existing structure.

It appears in some cases, users want to run newer tests on older kernels.
Some tests can clearly detect feature support using module presence and/or
Kconfig enabled or disabled. These conditions can arise even on a kernel that
supports a new module or config option: the kernel the test is running
on might not have the feature enabled, or the module might not be present. In
these cases, it would be easier to detect and skip the test.

However, some features aren't so easy. For example:

- a new flag is added to a syscall, and a new test is added. It might not
  be easy to detect that.
- We might have some tests that can't detect and skip.

Based on this discussion, it is probably accurate to say:

1. It is recommended that selftests from the same release be run on the
   kernel.
2. Selftests from newer kernels will run on older kernels; users should
   understand the risks, such as some tests failing and feature-degradation
   bugs going undetected.
3. Selftests will fail gracefully on older releases if at all possible.
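
A common pattern for point 3 is to probe for the feature at runtime and skip
rather than fail (a sketch; the seccomp flag probe is illustrative — EFAULT
for NULL args means the kernel knows the flag, EINVAL means it does not):

	#include <errno.h>
	#include <stdio.h>
	#include <unistd.h>
	#include <sys/syscall.h>
	#include <linux/seccomp.h>

	#define KSFT_SKIP 4	/* skip exit code, as in kselftest.h */

	static int flag_supported(unsigned long flag)
	{
		return syscall(__NR_seccomp, SECCOMP_SET_MODE_FILTER,
			       flag, NULL) < 0 && errno == EFAULT;
	}

	int main(void)
	{
		if (!flag_supported(SECCOMP_FILTER_FLAG_TSYNC)) {
			printf("TSYNC not supported, skipping\n");
			return KSFT_SKIP;
		}
		/* ... run the feature test proper ... */
		return 0;
	}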

Sumit!

1. What are the reasons for testing older kernel with selftests from
   newer kernels? What are the benefits you see for doing so?
   
   I am looking to understand the need/reasons for this use-case. In our
   previous discussion on this subject, I did say, you should be able to
   do so with some exceptions.

2. Do you test kernels with the selftests from the same release?

3. Do you find testing with newer selftests to be useful?

thanks,
-- Shuah





Re: [PATCH v3 06/11] x86/mm: Rework lazy TLB mode and TLB freshness tracking

2017-06-22 Thread Borislav Petkov
On Thu, Jun 22, 2017 at 10:47:29AM -0700, Andy Lutomirski wrote:
> I figured that some future reader of this patch might actually want to
> see this text, though.

Oh, don't get me wrong: with commit messages more is more, in the
general case. That's why I said "if".

> >> The UV tlbflush code is rather dated and should be changed.
> 
> And I'd definitely like the UV maintainers to notice this part, now or
> in the future :)  I don't want to personally touch the UV code with a
> ten-foot pole, but it really should be updated by someone who has a
> chance of getting it right and being able to test it.

Ah, could be because they moved recently and have hpe addresses now.
Lemme add them.

> >> +
> >> + if (cpumask_test_cpu(cpu, mm_cpumask(mm)))
> >> + cpumask_clear_cpu(cpu, mm_cpumask(mm));
> >
> > It seems we haz a helper for that: cpumask_test_and_clear_cpu() which
> > does BTR straightaway.
> 
> Yeah, but I'm doing this for performance.  I think that all the
> various one-line helpers do a LOCKed op right away, and I think it's
> faster to see if we can avoid the LOCKed op by trying an ordinary read
> first.

Right, the test part of the operation is unlocked so if that is the
likely case, it is a win.
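
For reference, the two forms being weighed (sketch):

	/* One locked read-modify-write, regardless of the bit's state: */
	if (cpumask_test_and_clear_cpu(cpu, mm_cpumask(mm)))
		/* bit was set and is now clear */;

	/* Plain read first; the locked op runs only when the bit is set: */
	if (cpumask_test_cpu(cpu, mm_cpumask(mm)))
		cpumask_clear_cpu(cpu, mm_cpumask(mm));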

> OTOH, maybe this is misguided -- if the cacheline lives somewhere else
> and we do end up needing to update it, we'll end up first sharing it
> and then making it exclusive, which increases the amount of cache
> coherency traffic, so maybe I'm optimizing for the wrong thing. What
> do you think?

Yeah, but we'll have to do that anyway for the locked operation. Ok,
let's leave it split like it is.

> It did in one particular buggy incarnation.  It would also trigger if,
> say, suspend/resume corrupts CR3.  Admittedly this is unlikely, but
> I'd rather catch it.  Once PCID is on, corruption seems a bit less
> farfetched -- this assertion will catch anyone who accidentally does
> write_cr3(read_cr3_pa()).

Ok, but let's put a comment over it pls as it is not obvious when
something like that can happen.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


Re: [PATCH v2 2/2] x86/xen/efi: Init only efi struct members used by Xen

2017-06-22 Thread Boris Ostrovsky
On 06/22/2017 06:51 AM, Daniel Kiper wrote:
> Current approach, wholesale efi struct initialization from efi_xen, is not
> good. Usually if a new member is defined then it is properly initialized in
> drivers/firmware/efi/efi.c but not in arch/x86/xen/efi.c. I have seen this
> happen a few times now. So, let's initialize only the efi struct members
> used by Xen to avoid such issues in the future.
>
> Signed-off-by: Daniel Kiper 
> Acked-by: Ard Biesheuvel 

Reviewed-by: Boris Ostrovsky 




Re: [PATCH] powerpc: Only obtain cpu_hotplug_lock if called by rtasd

2017-06-22 Thread Thiago Jung Bauermann

Michael Ellerman  writes:

> Thiago Jung Bauermann  writes:
>
>> Michael Ellerman  writes:
>>> Thiago Jung Bauermann  writes:
>>>
 Calling arch_update_cpu_topology from a CPU hotplug state machine callback
 hits a deadlock because the function tries to get a read lock on
 cpu_hotplug_lock while the state machine still holds a write lock on it.

 Since all callers of arch_update_cpu_topology except rtasd already hold
 cpu_hotplug_lock, this patch changes the function to use
 stop_machine_cpuslocked and creates a separate function for rtasd which
 still tries to obtain the lock.

 Michael Bringmann investigated the bug and provided a detailed analysis
 of the deadlock on this previous RFC for an alternate solution:

 https://patchwork.ozlabs.org/patch/771293/
>>>
>>> Do we know when this broke? Or has it never worked?
>>
>> It's been broken since at least v4.4, I think. I don't know about
>> earlier versions.
>
> OK.
>
> Just to be clear, this is happening on a 4.12-rcX system with no other
> patches?
>
> The code in arch_update_cpu_topology() has used stop_machine() since 
> 30c05350c39d ("powerpc/pseries: Use stop machine to update cpu maps")
> which went into v3.10, about 4 years ago.
>
> Prior to that it used get/put_online_cpus(), since 9eff1a38407c
> ("powerpc/pseries: Poll VPA for topology changes and update NUMA maps"),
> which was 2.6.38 in 2010.
>
> I wouldn't rule out the possibility it's been broken for 7 years, but I
> wonder if something else has changed to cause it to break.
>
> We really need to work it out before we backport anything.

Michael Bringmann provided this information:

We need at least one patch to show the issue in the latest 4.12
codebase:

[PATCH V6 2/2] powerpc/numa: Update CPU topology when VPHN enabled
https://lists.ozlabs.org/pipermail/linuxppc-dev/2017-June/159341.html

Reason: Prior to this patch we were not exercising the PowerPC VPHN
hcall nor the associated path through the kernel/sched code that
encounters the problem. All CPUs, whether present at boot or
hot-added, are added to node 0 without this patch. This
representation of the topology is incorrect in many/most cases.

cpu_hotplug_begin() + get_online_cpus() have potentially been broken
since the implementation of multithreading for _cpu_up() and
_cpu_down().

Reason: 'cpu_hotplug.active_writer' check in get_online_cpus() is
dependent upon the nested routines that call get_online_cpus() to
execute in the same thread as the one that invokes
'cpu_hotplug_begin'.

PowerPC's version of arch_update_cpu_topology() has used
stop_machine() or get_online_cpus() for years, since at least 2011.

Practically speaking, until the recent patch, in the 4.12 codebase
PowerPC CPUs are being added only to nodes that were online at boot,
and the topology did not change enough to trigger the paths through
'stop_machine'.

I can't say for certain about the earlier code bases where
'arch_update_cpu_topology' used 'get_online_cpus'/'put_online_cpus'
directly.

>>> Should it go to stable? (can't in its current form AFAICS)
>>
>> It's not hard to backport both this patch and commit fe5595c07400
>> ("stop_machine: Provide stop_machine_cpuslocked()") from branch
>> smp/hotplug in tip.git for stable.
>
> Yeah but it's not really my business backporting that unfortunately.

Sorry, I wasn't clear. I was offering to provide backported patches for
the relevant stable branches.

Though that will only be necessary if we also backport the topology
fixes as well.
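
The shape of the fix under discussion, sketched (function names other than
stop_machine_cpuslocked() are placeholders):

	/* Callers that already hold cpu_hotplug_lock use the _cpuslocked
	 * variant directly... */
	int arch_update_cpu_topology(void)
	{
		return stop_machine_cpuslocked(topology_update_fn, NULL,
					       cpu_online_mask);
	}

	/* ...while the rtasd path keeps taking the lock itself. */
	int rtas_update_cpu_topology(void)
	{
		int rc;

		get_online_cpus();
		rc = arch_update_cpu_topology();
		put_online_cpus();
		return rc;
	}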

-- 
Thiago Jung Bauermann
IBM Linux Technology Center



Re: [PATCH 32/32] ext4: add nombcache mount option

2017-06-22 Thread Theodore Ts'o
On Wed, Jun 21, 2017 at 06:49:39PM -0700, Tahsin Erdogan wrote:
> The main purpose of mb cache is to achieve deduplication in
> extended attributes. In use cases where opportunity for deduplication
> is unlikely, it only adds overhead.
> 
> Add a mount option to explicitly turn off mb cache.
> 
> Suggested-by: Andreas Dilger 
> Signed-off-by: Tahsin Erdogan 

Applied, thanks.

- Ted


Re: [PATCH v3 11/11] x86/mm: Try to preserve old TLB entries using PCID

2017-06-22 Thread Nadav Amit
Andy Lutomirski  wrote:

> 
> --- a/arch/x86/mm/init.c
> +++ b/arch/x86/mm/init.c
> @@ -812,6 +812,7 @@ void __init zone_sizes_init(void)
> 
> DEFINE_PER_CPU_SHARED_ALIGNED(struct tlb_state, cpu_tlbstate) = {
>   .loaded_mm = _mm,
> + .next_asid = 1,

I think this is a leftover from a previous version of the patches, no? It
does not seem necessary and may be confusing (ctx_id 0 is reserved, but not
asid 0).

Other than that, if you want, you can put for the entire series:

Reviewed-by: Nadav Amit 



Re: [RFC v2 01/12] powerpc: Free up four 64K PTE bits in 4K backed hpte pages.

2017-06-22 Thread Ram Pai
On Thu, Jun 22, 2017 at 02:37:27PM +0530, Anshuman Khandual wrote:
> On 06/17/2017 09:22 AM, Ram Pai wrote:
> > Rearrange 64K PTE bits to  free  up  bits 3, 4, 5  and  6
> > in the 4K backed hpte pages. These bits continue to be used
> > for 64K backed hpte pages in this patch, but will be freed
> > up in the next patch.
> > 
> > The patch does the following change to the 64K PTE format
> > 
> > H_PAGE_BUSY moves from bit 3 to bit 9
> > H_PAGE_F_SECOND which occupied bit 4 moves to the second part
> > of the pte.
> > H_PAGE_F_GIX which  occupied bit 5, 6 and 7 also moves to the
> > second part of the pte.
> > 
> > the four  bits (H_PAGE_F_SECOND|H_PAGE_F_GIX)  that  represent a slot
> > are  initialized  to  0xF  indicating  an invalid  slot.  If  a hpte
> > gets cached in a 0xF  slot (i.e. the 7th  slot  of  secondary),  it  is
> > released immediately. In  other  words, even  though   0xF   is   a
> > valid slot we discard it and consider it as an invalid
> > slot; i.e. hpte_soft_invalid(). This  gives  us  an opportunity to not
> > depend on a bit in the primary PTE in order to determine the
> > validity of a slot.
> > 
> > When  we  release  a  hpte  in  the  0xF  slot  we  also  release  a
> > legitimate  primary  slot  and  unmap  that  entry.  This  is  to
> > ensure  that we do get a   legitimate  non-0xF  slot the next time we
> > retry for a slot.
> > 
> > Though treating the 0xF slot as invalid reduces the number of available
> > slots  and  may  have an effect  on the performance, the probability
> > of hitting a 0xF is extremely low.
> > 
> > Compared  to the current scheme, the above described scheme reduces
> > the number of false hash table updates  significantly  and  has the
> > added  advantage  of  releasing  four  valuable  PTE bits for other
> > purposes.
> > 
> > This idea was jointly developed by Paul Mackerras, Aneesh, Michael
> > Ellerman and myself.
> > 
> > 4K PTE format remain unchanged currently.
> 
> Scanned through the PTE format again for hash 64K and 4K. It seems
> to me that there might be 5 free bits already present in the PTE
> format. I might be seriously mistaken here :) Please
> correct me if that is not the case. _RPAGE_RPN* I think is applicable
> only for hash page table format and will not be available for radix
> later.
> 
> +#define _PAGE_FREE_1   0x0040UL /* Not used */
> +#define _RPAGE_SW0 0x2000UL /* Not used */
> +#define _RPAGE_SW1 0x0800UL /* Not used */
> +#define _RPAGE_RPN42   0x0040UL /* Not used */
> +#define _RPAGE_RPN41   0x0020UL /* Not used */
> 

The bits are chosen to future-proof for the radix implementation.
_RPAGE_SW* will eat into what is available for software in the future,
and these key-bits will certainly be something that the radix
hardware will read, in the future.

The _RPAGE_RPN* bits cannot be relied on for radix.

But finally the bits that we chose (H_PAGE_F_SECOND|H_PAGE_F_GIX) had
the best potential for giving us the highest number of free bits with
relatively less effort.

RP



Re: [PATCH v4 1/2] ACPICA: ACPI 6.2: Add support for new SRAT subtable

2017-06-22 Thread Ganapatrao Kulkarni
On Thu, Jun 22, 2017 at 7:57 PM, Rafael J. Wysocki  wrote:
> On Thursday, June 22, 2017 02:13:18 PM Moore, Robert wrote:
>> This support is already in the ACPICA code base, but I can't speak to when 
>> it will be upstreamed to Linux. Lv would know this.
>
> It should be there in linux-next already AFAICS.
>
> Lorenzo, can you please double check?

Thanks Rafael, this was added to linux-next on June 12 [1].

When I sent my first version, i.e. on June 6, it was not there, hence I
added it to this series.
My bad, I did not check again when I sent subsequent versions.
This patch can be dropped.

[1] 
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/include/acpi/actbl1.h?id=a618c7f89a02f3c67a8f200855cff398855e3888

>
> Thanks,
> Rafael
>
>
>> > -Original Message-
>> > From: Lorenzo Pieralisi [mailto:lorenzo.pieral...@arm.com]
>> > Sent: Thursday, June 22, 2017 6:43 AM
>> > To: Ganapatrao Kulkarni ; Zheng, Lv
>> > ; Moore, Robert ; Rafael J.
>> > Wysocki 
>> > Cc: linux-a...@vger.kernel.org; de...@acpica.org; linux-
>> > ker...@vger.kernel.org; linux-arm-ker...@lists.infradead.org;
>> > marc.zyng...@arm.com; catalin.mari...@arm.com; will.dea...@arm.com;
>> > hanjun@linaro.org; t...@linutronix.de; ja...@lakedaemon.net;
>> > jn...@caviumnetworks.com; gpkulka...@gmail.com
>> > Subject: Re: [PATCH v4 1/2] ACPICA: ACPI 6.2: Add support for new SRAT
>> > subtable
>> >
>> > Hi Rafael, Lv, Robert,
>> >
>> > On Thu, Jun 22, 2017 at 11:40:11AM +0530, Ganapatrao Kulkarni wrote:
>> > > Add GIC ITS Affinity (ACPI 6.2) subtable to SRAT table.
>> > >
>> > > ACPICA commit 5bc67f63918da249bfe279ee461d152bb3e6f55b
>> > > Link: https://github.com/acpica/acpica/commit/5bc67f6
>> > >
>> > > Signed-off-by: Ganapatrao Kulkarni 
>> > > ---
>> > >  include/acpi/actbl1.h | 12 +++-
>> > >  1 file changed, 11 insertions(+), 1 deletion(-)
>> >
>> > This patch is fine to me but it is up to you or who sends the ACPICA
>> > pull request to send it upstream or give us an ACK so that it can go via
>> > irqchip.
>> >
>> > We need to know how this commit (and other ACPICA changes) will be sent
>> > upstream to handle trees dependencies, please advise it is a bit urgent,
>> > thank you.
>> >
>> > Lorenzo
>> >
>> > > diff --git a/include/acpi/actbl1.h b/include/acpi/actbl1.h index
>> > > b4ce55c..253c9db 100644
>> > > --- a/include/acpi/actbl1.h
>> > > +++ b/include/acpi/actbl1.h
>> > > @@ -1192,7 +1192,8 @@ enum acpi_srat_type {
>> > >   ACPI_SRAT_TYPE_MEMORY_AFFINITY = 1,
>> > >   ACPI_SRAT_TYPE_X2APIC_CPU_AFFINITY = 2,
>> > >   ACPI_SRAT_TYPE_GICC_AFFINITY = 3,
>> > > - ACPI_SRAT_TYPE_RESERVED = 4 /* 4 and greater are reserved */
>> > > + ACPI_SRAT_TYPE_GIC_ITS_AFFINITY = 4,	/* ACPI 6.2 */
>> > > + ACPI_SRAT_TYPE_RESERVED = 5 /* 5 and greater are reserved */
>> > >  };
>> > >
>> > >  /*
>> > > @@ -1260,6 +1261,15 @@ struct acpi_srat_gicc_affinity {
>> > >   u32 clock_domain;
>> > >  };
>> > >
>> > > +/* 4: GIC ITS Affinity (ACPI 6.2) */
>> > > +
>> > > +struct acpi_srat_its_affinity {
>> > > + struct acpi_subtable_header header;
>> > > + u32 proximity_domain;
>> > > + u16 reserved;
>> > > + u32 its_id;
>> > > +};
>> > > +
>> > >  /* Flags for struct acpi_srat_gicc_affinity */
>> > >
>> > >  #define ACPI_SRAT_GICC_ENABLED (1)   /* 00: Use affinity structure */
>> > > --
>> > > 1.8.1.4
>> > >
>

thanks
Ganapat


Re: [PATCH] scripts/dtc: dtx_diff - Show real file names in diff header

2017-06-22 Thread Rob Herring
On Thu, Jun 22, 2017 at 03:07:06PM +0200, Geert Uytterhoeven wrote:
> As the comparison uses process substitution to pass files after
> conversion to DTS format, the diff header doesn't show the real
> filenames, but the names of the file descriptors used:
> 
> --- /dev/fd/63  2017-06-22 11:21:47.531637188 +0200
> +++ /dev/fd/62  2017-06-22 11:21:47.531637188 +0200
> 
> This is especially annoying when comparing a bunch of DT files in a
> loop, as the output doesn't show a clue about which files it refers to.
> 
> Fix this by explicitly passing the original file names to the diff
> command using the --label option, giving e.g.:
> 
> --- arch/arm/boot/dts/r8a7791-koelsch.dtb
> +++ arch/arm/boot/dts/r8a7791-porter.dtb
> 
> Signed-off-by: Geert Uytterhoeven 
> ---
>  scripts/dtc/dtx_diff | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Applied.


Re: [PATCH v2] fs/dcache.c: fix spin lockup issue on nlru->lock

2017-06-22 Thread Sahitya Tummala



On 6/21/2017 10:01 PM, Vladimir Davydov wrote:



index cddf397..c8ca150 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -1133,10 +1133,11 @@ void shrink_dcache_sb(struct super_block *sb)
 		LIST_HEAD(dispose);
 
 		freed = list_lru_walk(&sb->s_dentry_lru,
-			dentry_lru_isolate_shrink, &dispose, UINT_MAX);
+			dentry_lru_isolate_shrink, &dispose, 1024);
 
 		this_cpu_sub(nr_dentry_unused, freed);
 		shrink_dentry_list(&dispose);
+		cond_resched();
 	} while (freed > 0);

In an extreme case, a single invocation of list_lru_walk() can skip all
1024 dentries, in which case 'freed' will be 0 forcing us to break the
loop prematurely. I think we should loop until there's at least one
dentry left on the LRU, i.e.

	while (list_lru_count(&sb->s_dentry_lru) > 0)

However, even that wouldn't be quite correct, because list_lru_count()
iterates over all memory cgroups to sum list_lru_one->nr_items, which
can race with memcg offlining code migrating dentries off a dead cgroup
(see memcg_drain_all_list_lrus()). So it looks like to make this check
race-free, we need to account the number of entries on the LRU not only
per memcg, but also per node, i.e. add list_lru_node->nr_items.
Fortunately, list_lru entries can't be migrated between NUMA nodes.
It looks like list_lru_count() is iterating per node before iterating
over all memory cgroups, as below -

unsigned long list_lru_count_node(struct list_lru *lru, int nid)
{
long count = 0;
int memcg_idx;

count += __list_lru_count_one(lru, nid, -1);
if (list_lru_memcg_aware(lru)) {
for_each_memcg_cache_index(memcg_idx)
count += __list_lru_count_one(lru, nid, memcg_idx);
}
return count;
}

The first call to __list_lru_count_one() is iterating all the items per
node, i.e. nlru->lru->nr_items.
Is my understanding correct? If not, could you please clarify how to
get the lru items per node?


--
Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux 
Foundation Collaborative Project.
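
For reference, a minimal sketch of the per-node counter Vladimir
suggests above (illustrative only, not an actual patch; the real
structures carry more fields):

struct list_lru_node {
	spinlock_t		lock;
	struct list_lru_one	lru;
	/* total items on this node, all memcgs included */
	long			nr_items;
} ____cacheline_aligned_in_smp;

list_lru_add()/list_lru_del() would then adjust nlru->nr_items under
nlru->lock, and the race-free per-node count becomes a single read:

unsigned long list_lru_count_node(struct list_lru *lru, int nid)
{
	return max_t(long, 0, lru->node[nid].nr_items);
}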



Re: [ANNOUNCE] v4.11.5-rt1

2017-06-22 Thread Sebastian Andrzej Siewior
On 2017-06-20 09:45:06 [+0200], Mike Galbraith wrote:
> See ! and ?

See see.
What about this:

diff --git a/include/linux/sched.h b/include/linux/sched.h
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1014,8 +1014,20 @@ struct wake_q_head {
 #define WAKE_Q(name)   \
	struct wake_q_head name = { WAKE_Q_TAIL, &name.first }
 
-extern void wake_q_add(struct wake_q_head *head,
- struct task_struct *task);
+extern void __wake_q_add(struct wake_q_head *head,
+struct task_struct *task, bool sleeper);
+static inline void wake_q_add(struct wake_q_head *head,
+ struct task_struct *task)
+{
+   __wake_q_add(head, task, false);
+}
+
+static inline void wake_q_add_sleeper(struct wake_q_head *head,
+ struct task_struct *task)
+{
+   __wake_q_add(head, task, true);
+}
+
 extern void __wake_up_q(struct wake_q_head *head, bool sleeper);
 
 static inline void wake_up_q(struct wake_q_head *head)
@@ -1745,6 +1757,7 @@ struct task_struct {
raw_spinlock_t pi_lock;
 
struct wake_q_node wake_q;
+   struct wake_q_node wake_q_sleeper;
 
 #ifdef CONFIG_RT_MUTEXES
/* PI waiters blocked on a rt_mutex held by this task */
diff --git a/kernel/fork.c b/kernel/fork.c
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -558,6 +558,7 @@ static struct task_struct *dup_task_struct(struct 
task_struct *orig, int node)
tsk->splice_pipe = NULL;
tsk->task_frag.page = NULL;
tsk->wake_q.next = NULL;
+   tsk->wake_q_sleeper.next = NULL;
 
account_kernel_stack(tsk, 1);
 
diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c
--- a/kernel/locking/rtmutex.c
+++ b/kernel/locking/rtmutex.c
@@ -1506,7 +1506,7 @@ static void mark_wakeup_next_waiter(struct wake_q_head 
*wake_q,
 */
preempt_disable();
if (waiter->savestate)
-   wake_q_add(wake_sleeper_q, waiter->task);
+   wake_q_add_sleeper(wake_sleeper_q, waiter->task);
else
wake_q_add(wake_q, waiter->task);
	raw_spin_unlock(&current->pi_lock);
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -430,9 +430,15 @@ static bool set_nr_if_polling(struct task_struct *p)
 #endif
 #endif
 
-void wake_q_add(struct wake_q_head *head, struct task_struct *task)
+void __wake_q_add(struct wake_q_head *head, struct task_struct *task,
+ bool sleeper)
 {
-   struct wake_q_node *node = &task->wake_q;
+   struct wake_q_node *node;
+
+   if (sleeper)
+   node = &task->wake_q_sleeper;
+   else
+   node = &task->wake_q;
 
/*
 * Atomically grab the task, if ->wake_q is !nil already it means
@@ -461,11 +467,17 @@ void __wake_up_q(struct wake_q_head *head, bool sleeper)
while (node != WAKE_Q_TAIL) {
struct task_struct *task;
 
-   task = container_of(node, struct task_struct, wake_q);
+   if (sleeper)
+   task = container_of(node, struct task_struct, 
wake_q_sleeper);
+   else
+   task = container_of(node, struct task_struct, wake_q);
BUG_ON(!task);
/* task can safely be re-inserted now */
node = node->next;
-   task->wake_q.next = NULL;
+   if (sleeper)
+   task->wake_q_sleeper.next = NULL;
+   else
+   task->wake_q.next = NULL;
 
/*
 * wake_up_process() implies a wmb() to pair with the queueing


[PATCH] spi: stm32: fix error check on mbr being -ve

2017-06-22 Thread Colin King
From: Colin Ian King 

The error check of mbr < 0 is always false because mbr is a u32. Make
mbr an int so that a negative error return from stm32_spi_prepare_mbr
can be detected.

Detected by CoverityScan, CID#1446586 ("Unsigned compared against 0")

Signed-off-by: Colin Ian King 
---
 drivers/spi/spi-stm32.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/spi/spi-stm32.c b/drivers/spi/spi-stm32.c
index b75909e7b117..82a6616f46b8 100644
--- a/drivers/spi/spi-stm32.c
+++ b/drivers/spi/spi-stm32.c
@@ -857,7 +857,7 @@ static int stm32_spi_transfer_one_setup(struct stm32_spi 
*spi,
}
 
if (spi->cur_speed != transfer->speed_hz) {
-   u32 mbr;
+   int mbr;
 
/* Update spi->cur_speed with real clock speed */
mbr = stm32_spi_prepare_mbr(spi, transfer->speed_hz);
@@ -869,7 +869,7 @@ static int stm32_spi_transfer_one_setup(struct stm32_spi 
*spi,
transfer->speed_hz = spi->cur_speed;
 
cfg1_clrb |= SPI_CFG1_MBR;
-   cfg1_setb |= (mbr << SPI_CFG1_MBR_SHIFT) & SPI_CFG1_MBR;
+   cfg1_setb |= ((u32)mbr << SPI_CFG1_MBR_SHIFT) & SPI_CFG1_MBR;
}
 
if (cfg1_clrb || cfg1_setb)
-- 
2.11.0



[tip:irq/core] x86/msi: Create named irq domains

2017-06-22 Thread tip-bot for Thomas Gleixner
Commit-ID:  f8f37ca78915b51a73bf240409fcda30d811b76b
Gitweb: http://git.kernel.org/tip/f8f37ca78915b51a73bf240409fcda30d811b76b
Author: Thomas Gleixner 
AuthorDate: Tue, 20 Jun 2017 01:37:14 +0200
Committer:  Thomas Gleixner 
CommitDate: Thu, 22 Jun 2017 18:21:11 +0200

x86/msi: Create named irq domains

Use the fwnode to create named irq domains so diagnosis works.

Signed-off-by: Thomas Gleixner 
Cc: Jens Axboe 
Cc: Marc Zyngier 
Cc: Michael Ellerman 
Cc: Keith Busch 
Cc: Peter Zijlstra 
Cc: Christoph Hellwig 
Link: http://lkml.kernel.org/r/20170619235444.299024...@linutronix.de

---
 arch/x86/kernel/apic/msi.c | 42 +-
 1 file changed, 33 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kernel/apic/msi.c b/arch/x86/kernel/apic/msi.c
index d79dc2a..9b18be7 100644
--- a/arch/x86/kernel/apic/msi.c
+++ b/arch/x86/kernel/apic/msi.c
@@ -136,13 +136,20 @@ static struct msi_domain_info pci_msi_domain_info = {
.handler_name   = "edge",
 };
 
-void arch_init_msi_domain(struct irq_domain *parent)
+void __init arch_init_msi_domain(struct irq_domain *parent)
 {
+   struct fwnode_handle *fn;
+
if (disable_apic)
return;
 
-   msi_default_domain = pci_msi_create_irq_domain(NULL,
-   &pci_msi_domain_info, parent);
+   fn = irq_domain_alloc_named_fwnode("PCI-MSI");
+   if (fn) {
+   msi_default_domain =
+   pci_msi_create_irq_domain(fn, &pci_msi_domain_info,
+ parent);
+   irq_domain_free_fwnode(fn);
+   }
if (!msi_default_domain)
pr_warn("failed to initialize irqdomain for MSI/MSI-x.\n");
 }
@@ -230,13 +237,20 @@ static struct irq_domain *dmar_get_irq_domain(void)
 {
static struct irq_domain *dmar_domain;
static DEFINE_MUTEX(dmar_lock);
+   struct fwnode_handle *fn;
 
	mutex_lock(&dmar_lock);
-   if (dmar_domain == NULL)
-   dmar_domain = msi_create_irq_domain(NULL, &dmar_msi_domain_info,
+   if (dmar_domain)
+   goto out;
+
+   fn = irq_domain_alloc_named_fwnode("DMAR-MSI");
+   if (fn) {
+   dmar_domain = msi_create_irq_domain(fn, &dmar_msi_domain_info,
x86_vector_domain);
+   irq_domain_free_fwnode(fn);
+   }
+out:
	mutex_unlock(&dmar_lock);
-
return dmar_domain;
 }
 
@@ -326,9 +340,10 @@ static struct msi_domain_info hpet_msi_domain_info = {
 
 struct irq_domain *hpet_create_irq_domain(int hpet_id)
 {
-   struct irq_domain *parent;
-   struct irq_alloc_info info;
struct msi_domain_info *domain_info;
+   struct irq_domain *parent, *d;
+   struct irq_alloc_info info;
+   struct fwnode_handle *fn;
 
if (x86_vector_domain == NULL)
return NULL;
@@ -349,7 +364,16 @@ struct irq_domain *hpet_create_irq_domain(int hpet_id)
else
hpet_msi_controller.name = "IR-HPET-MSI";
 
-   return msi_create_irq_domain(NULL, domain_info, parent);
+   fn = irq_domain_alloc_named_id_fwnode(hpet_msi_controller.name,
+ hpet_id);
+   if (!fn) {
+   kfree(domain_info);
+   return NULL;
+   }
+
+   d = msi_create_irq_domain(fn, domain_info, parent);
+   irq_domain_free_fwnode(fn);
+   return d;
 }
 
 int hpet_assign_irq(struct irq_domain *domain, struct hpet_dev *dev,


[PATCH 4/7] alpha: provide ioread64 and iowrite64 implementations

2017-06-22 Thread Logan Gunthorpe
Alpha implements its own I/O operations and doesn't use the
common library. Thus, to make ioread64 and iowrite64 globally
available, we need to add implementations for alpha.

For this, we simply use calls that chain two 32-bit operations.
(mostly because I don't really understand the alpha architecture.)

Signed-off-by: Logan Gunthorpe 
Cc: Richard Henderson 
Cc: Ivan Kokshaysky 
Cc: Matt Turner 
---
 arch/alpha/include/asm/io.h |  2 ++
 arch/alpha/kernel/io.c  | 18 ++
 2 files changed, 20 insertions(+)

diff --git a/arch/alpha/include/asm/io.h b/arch/alpha/include/asm/io.h
index ff4049155c84..15588092c062 100644
--- a/arch/alpha/include/asm/io.h
+++ b/arch/alpha/include/asm/io.h
@@ -493,8 +493,10 @@ extern inline void writeq(u64 b, volatile void __iomem 
*addr)
 
 #define ioread16be(p) be16_to_cpu(ioread16(p))
 #define ioread32be(p) be32_to_cpu(ioread32(p))
+#define ioread64be(p) be64_to_cpu(ioread64(p))
 #define iowrite16be(v,p) iowrite16(cpu_to_be16(v), (p))
 #define iowrite32be(v,p) iowrite32(cpu_to_be32(v), (p))
+#define iowrite64be(v,p) iowrite32(cpu_to_be64(v), (p))
 
 #define inb_p  inb
 #define inw_p  inw
diff --git a/arch/alpha/kernel/io.c b/arch/alpha/kernel/io.c
index 19c5875ab398..8c28026f7849 100644
--- a/arch/alpha/kernel/io.c
+++ b/arch/alpha/kernel/io.c
@@ -59,6 +59,24 @@ EXPORT_SYMBOL(iowrite8);
 EXPORT_SYMBOL(iowrite16);
 EXPORT_SYMBOL(iowrite32);
 
+u64 ioread64(void __iomem *addr)
+{
+   u64 low, high;
+
+   low = ioread32(addr);
+   high = ioread32(addr + sizeof(u32));
+   return low | (high << 32);
+}
+
+void iowrite64(u64 val, void __iomem *addr)
+{
+   iowrite32(val, addr);
+   iowrite32(val >> 32, addr + sizeof(u32));
+}
+
+EXPORT_SYMBOL(ioread64);
+EXPORT_SYMBOL(iowrite64);
+
 u8 inb(unsigned long port)
 {
return ioread8(ioport_map(port, 1));
-- 
2.11.0



[PATCH 6/7] drm/tilcdc: clean up ifdef hacks around iowrite64

2017-06-22 Thread Logan Gunthorpe
Now that we can expect iowrite64 to always exist, the hack is no longer
necessary, so we just call iowrite64 directly.

Signed-off-by: Logan Gunthorpe 
Cc: Jyri Sarha 
Cc: Tomi Valkeinen 
Cc: David Airlie 
---
 drivers/gpu/drm/tilcdc/tilcdc_regs.h | 6 --
 1 file changed, 6 deletions(-)

diff --git a/drivers/gpu/drm/tilcdc/tilcdc_regs.h 
b/drivers/gpu/drm/tilcdc/tilcdc_regs.h
index e9ce725698a9..0b901405f30a 100644
--- a/drivers/gpu/drm/tilcdc/tilcdc_regs.h
+++ b/drivers/gpu/drm/tilcdc/tilcdc_regs.h
@@ -133,13 +133,7 @@ static inline void tilcdc_write64(struct drm_device *dev, 
u32 reg, u64 data)
struct tilcdc_drm_private *priv = dev->dev_private;
void __iomem *addr = priv->mmio + reg;
 
-#ifdef iowrite64
iowrite64(data, addr);
-#else
-   __iowmb();
-   /* This compiles to strd (=64-bit write) on ARM7 */
-   *(u64 __force *)addr = __cpu_to_le64(data);
-#endif
 }
 
 static inline u32 tilcdc_read(struct drm_device *dev, u32 reg)
-- 
2.11.0



[tip:irq/core] x86/msi: Provide new iommu irqdomain interface

2017-06-22 Thread tip-bot for Thomas Gleixner
Commit-ID:  667724c5a3109675cf3bfe7d75795b8608d1bcbe
Gitweb: http://git.kernel.org/tip/667724c5a3109675cf3bfe7d75795b8608d1bcbe
Author: Thomas Gleixner 
AuthorDate: Tue, 20 Jun 2017 01:37:10 +0200
Committer:  Thomas Gleixner 
CommitDate: Thu, 22 Jun 2017 18:21:10 +0200

x86/msi: Provide new iommu irqdomain interface

Provide a new interface for creating the iommu remapping domains, so that
the caller can supply a name and an id in order to create named irqdomains.

Signed-off-by: Thomas Gleixner 
Cc: Jens Axboe 
Cc: Marc Zyngier 
Cc: Michael Ellerman 
Cc: Joerg Roedel 
Cc: Keith Busch 
Cc: Peter Zijlstra 
Cc: io...@lists.linux-foundation.org
Cc: Christoph Hellwig 
Link: http://lkml.kernel.org/r/20170619235443.986661...@linutronix.de
Signed-off-by: Thomas Gleixner 

---
 arch/x86/include/asm/irq_remapping.h |  2 ++
 arch/x86/kernel/apic/msi.c   | 15 +++
 2 files changed, 17 insertions(+)

diff --git a/arch/x86/include/asm/irq_remapping.h 
b/arch/x86/include/asm/irq_remapping.h
index a210eba..0398675 100644
--- a/arch/x86/include/asm/irq_remapping.h
+++ b/arch/x86/include/asm/irq_remapping.h
@@ -56,6 +56,8 @@ irq_remapping_get_irq_domain(struct irq_alloc_info *info);
 
 /* Create PCI MSI/MSIx irqdomain, use @parent as the parent irqdomain. */
 extern struct irq_domain *arch_create_msi_irq_domain(struct irq_domain 
*parent);
+extern struct irq_domain *
+arch_create_remap_msi_irq_domain(struct irq_domain *par, const char *n, int 
id);
 
 /* Get parent irqdomain for interrupt remapping irqdomain */
 static inline struct irq_domain *arch_get_ir_parent_domain(void)
diff --git a/arch/x86/kernel/apic/msi.c b/arch/x86/kernel/apic/msi.c
index c61aec7..0e6618e 100644
--- a/arch/x86/kernel/apic/msi.c
+++ b/arch/x86/kernel/apic/msi.c
@@ -167,10 +167,25 @@ static struct msi_domain_info pci_msi_ir_domain_info = {
.handler_name   = "edge",
 };
 
+struct irq_domain *arch_create_remap_msi_irq_domain(struct irq_domain *parent,
+   const char *name, int id)
+{
+   struct fwnode_handle *fn;
+   struct irq_domain *d;
+
+   fn = irq_domain_alloc_named_id_fwnode(name, id);
+   if (!fn)
+   return NULL;
+   d = pci_msi_create_irq_domain(fn, &pci_msi_ir_domain_info, parent);
+   irq_domain_free_fwnode(fn);
+   return d;
+}
+
 struct irq_domain *arch_create_msi_irq_domain(struct irq_domain *parent)
 {
	return pci_msi_create_irq_domain(NULL, &pci_msi_ir_domain_info, parent);
 }
+
 #endif
 
 #ifdef CONFIG_DMAR_TABLE


[tip:irq/core] iommu/amd: Use named irq domain interface

2017-06-22 Thread tip-bot for Thomas Gleixner
Commit-ID:  3e49a8182277ea57736285aede5f43bfa6aa11b1
Gitweb: http://git.kernel.org/tip/3e49a8182277ea57736285aede5f43bfa6aa11b1
Author: Thomas Gleixner 
AuthorDate: Tue, 20 Jun 2017 01:37:12 +0200
Committer:  Thomas Gleixner 
CommitDate: Thu, 22 Jun 2017 18:21:11 +0200

iommu/amd: Use named irq domain interface

Signed-off-by: Thomas Gleixner 
Acked-by: Joerg Roedel 
Cc: Jens Axboe 
Cc: Marc Zyngier 
Cc: Michael Ellerman 
Cc: Keith Busch 
Cc: Peter Zijlstra 
Cc: io...@lists.linux-foundation.org
Cc: Christoph Hellwig 
Link: http://lkml.kernel.org/r/20170619235444.142270...@linutronix.de

---
 drivers/iommu/amd_iommu.c | 13 ++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 590e1e8..503849d 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -4395,13 +4395,20 @@ static struct irq_chip amd_ir_chip = {
 
 int amd_iommu_create_irq_domain(struct amd_iommu *iommu)
 {
-   iommu->ir_domain = irq_domain_add_tree(NULL, &amd_ir_domain_ops, iommu);
+   struct fwnode_handle *fn;
+
+   fn = irq_domain_alloc_named_id_fwnode("AMD-IR", iommu->index);
+   if (!fn)
+   return -ENOMEM;
+   iommu->ir_domain = irq_domain_create_tree(fn, &amd_ir_domain_ops, 
iommu);
+   irq_domain_free_fwnode(fn);
if (!iommu->ir_domain)
return -ENOMEM;
 
iommu->ir_domain->parent = arch_get_ir_parent_domain();
-   iommu->msi_domain = arch_create_msi_irq_domain(iommu->ir_domain);
-
+   iommu->msi_domain = arch_create_remap_msi_irq_domain(iommu->ir_domain,
+"AMD-IR-MSI",
+iommu->index);
return 0;
 }
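
The pattern in this and the surrounding irqdomain commits is always the
same: allocate a named fwnode, create the domain with it, then free the
fwnode again. A condensed sketch of that lifecycle (hypothetical helper,
not part of the series):

static struct irq_domain *create_named_msi_domain(const char *name, int id,
						  struct msi_domain_info *info,
						  struct irq_domain *parent)
{
	struct fwnode_handle *fn;
	struct irq_domain *d = NULL;

	/* The fwnode only carries the name for domain creation */
	fn = irq_domain_alloc_named_id_fwnode(name, id);
	if (fn) {
		d = msi_create_irq_domain(fn, info, parent);
		irq_domain_free_fwnode(fn);
	}
	return d;
}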
 


[tip:irq/core] iommu/vt-d: Use named irq domain interface

2017-06-22 Thread tip-bot for Thomas Gleixner
Commit-ID:  cea29b656a5e5f1a7b7de42795c3ae6fc417ab0b
Gitweb: http://git.kernel.org/tip/cea29b656a5e5f1a7b7de42795c3ae6fc417ab0b
Author: Thomas Gleixner 
AuthorDate: Tue, 20 Jun 2017 01:37:11 +0200
Committer:  Thomas Gleixner 
CommitDate: Thu, 22 Jun 2017 18:21:10 +0200

iommu/vt-d: Use named irq domain interface

Signed-off-by: Thomas Gleixner 
Acked-by: Joerg Roedel 
Cc: Jens Axboe 
Cc: Marc Zyngier 
Cc: Michael Ellerman 
Cc: Keith Busch 
Cc: Peter Zijlstra 
Cc: io...@lists.linux-foundation.org
Cc: Christoph Hellwig 
Link: http://lkml.kernel.org/r/20170619235444.063083...@linutronix.de

---
 drivers/iommu/intel_irq_remapping.c | 22 --
 1 file changed, 16 insertions(+), 6 deletions(-)

diff --git a/drivers/iommu/intel_irq_remapping.c 
b/drivers/iommu/intel_irq_remapping.c
index ba5b580..8fc641e 100644
--- a/drivers/iommu/intel_irq_remapping.c
+++ b/drivers/iommu/intel_irq_remapping.c
@@ -500,8 +500,9 @@ static void iommu_enable_irq_remapping(struct intel_iommu 
*iommu)
 static int intel_setup_irq_remapping(struct intel_iommu *iommu)
 {
struct ir_table *ir_table;
-   struct page *pages;
+   struct fwnode_handle *fn;
unsigned long *bitmap;
+   struct page *pages;
 
if (iommu->ir_table)
return 0;
@@ -525,15 +526,24 @@ static int intel_setup_irq_remapping(struct intel_iommu 
*iommu)
goto out_free_pages;
}
 
-   iommu->ir_domain = irq_domain_add_hierarchy(arch_get_ir_parent_domain(),
-   0, INTR_REMAP_TABLE_ENTRIES,
-   NULL, &intel_ir_domain_ops,
-   iommu);
+   fn = irq_domain_alloc_named_id_fwnode("INTEL-IR", iommu->seq_id);
+   if (!fn)
+   goto out_free_bitmap;
+
+   iommu->ir_domain =
+   irq_domain_create_hierarchy(arch_get_ir_parent_domain(),
+   0, INTR_REMAP_TABLE_ENTRIES,
+   fn, &intel_ir_domain_ops,
+   iommu);
+   irq_domain_free_fwnode(fn);
if (!iommu->ir_domain) {
pr_err("IR%d: failed to allocate irqdomain\n", iommu->seq_id);
goto out_free_bitmap;
}
-   iommu->ir_msi_domain = arch_create_msi_irq_domain(iommu->ir_domain);
+   iommu->ir_msi_domain =
+   arch_create_remap_msi_irq_domain(iommu->ir_domain,
+"INTEL-IR-MSI",
+iommu->seq_id);
 
ir_table->base = page_address(pages);
ir_table->bitmap = bitmap;


[tip:irq/core] x86/irq: Use irq_migrate_all_off_this_cpu()

2017-06-22 Thread tip-bot for Thomas Gleixner
Commit-ID:  ad7a929fa4bb1143357aa83043a149d5c27c68fd
Gitweb: http://git.kernel.org/tip/ad7a929fa4bb1143357aa83043a149d5c27c68fd
Author: Thomas Gleixner 
AuthorDate: Tue, 20 Jun 2017 01:37:33 +0200
Committer:  Thomas Gleixner 
CommitDate: Thu, 22 Jun 2017 18:21:18 +0200

x86/irq: Use irq_migrate_all_off_this_cpu()

The generic migration code supports all the required features
already. Remove the x86 specific implementation and use the generic one.

Signed-off-by: Thomas Gleixner 
Cc: Jens Axboe 
Cc: Marc Zyngier 
Cc: Michael Ellerman 
Cc: Keith Busch 
Cc: Peter Zijlstra 
Cc: Christoph Hellwig 
Link: http://lkml.kernel.org/r/20170619235445.851311...@linutronix.de

---
 arch/x86/Kconfig  |  1 +
 arch/x86/kernel/irq.c | 89 ++-
 2 files changed, 3 insertions(+), 87 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 0efb4c9..fcf1dad 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -87,6 +87,7 @@ config X86
select GENERIC_EARLY_IOREMAP
select GENERIC_FIND_FIRST_BIT
select GENERIC_IOMAP
+   select GENERIC_IRQ_MIGRATIONif SMP
select GENERIC_IRQ_PROBE
select GENERIC_IRQ_SHOW
select GENERIC_PENDING_IRQ  if SMP
diff --git a/arch/x86/kernel/irq.c b/arch/x86/kernel/irq.c
index 78bd2b8..4aa03c5 100644
--- a/arch/x86/kernel/irq.c
+++ b/arch/x86/kernel/irq.c
@@ -432,95 +432,12 @@ int check_irq_vectors_for_cpu_disable(void)
 /* A cpu has been removed from cpu_online_mask.  Reset irq affinities. */
 void fixup_irqs(void)
 {
-   unsigned int irq, vector;
+   unsigned int irr, vector;
struct irq_desc *desc;
struct irq_data *data;
struct irq_chip *chip;
-   int ret;
 
-   for_each_irq_desc(irq, desc) {
-   const struct cpumask *affinity;
-   bool break_affinity = false;
-
-   if (!desc)
-   continue;
-
-   /* interrupt's are disabled at this point */
-   raw_spin_lock(&desc->lock);
-
-   data = irq_desc_get_irq_data(desc);
-   chip = irq_data_get_irq_chip(data);
-   /*
-* The interrupt descriptor might have been cleaned up
-* already, but it is not yet removed from the radix
-* tree. If the chip does not have an affinity setter,
-* nothing to do here.
-*/
-   if (!chip && !chip->irq_set_affinity) {
-   raw_spin_unlock(&desc->lock);
-   continue;
-   }
-
-   affinity = irq_data_get_affinity_mask(data);
-
-   if (!irq_has_action(irq) || irqd_is_per_cpu(data) ||
-   cpumask_subset(affinity, cpu_online_mask)) {
-   irq_fixup_move_pending(desc, false);
-   raw_spin_unlock(&desc->lock);
-   continue;
-   }
-
-   /*
-* Complete an eventually pending irq move cleanup. If this
-* interrupt was moved in hard irq context, then the
-* vectors need to be cleaned up. It can't wait until this
-* interrupt actually happens and this CPU was involved.
-*/
-   irq_force_complete_move(desc);
-
-   /*
-* If there is a setaffinity pending, then try to reuse the
-* pending mask, so the last change of the affinity does
-* not get lost. If there is no move pending or the pending
-* mask does not contain any online CPU, use the current
-* affinity mask.
-*/
-   if (irq_fixup_move_pending(desc, true))
-   affinity = desc->pending_mask;
-
-   /*
-* If the mask does not contain an offline CPU, break
-* affinity and use cpu_online_mask as fall back.
-*/
-   if (cpumask_any_and(affinity, cpu_online_mask) >= nr_cpu_ids) {
-   break_affinity = true;
-   affinity = cpu_online_mask;
-   }
-
-   if (!irqd_can_move_in_process_context(data) && chip->irq_mask)
-   chip->irq_mask(data);
-
-   ret = chip->irq_set_affinity(data, affinity, true);
-   if (ret) {
-   pr_crit("IRQ %u: Force affinity failed (%d)\n",
-   data->irq, ret);
-   break_affinity = false;
-   }
-
-   /*
-* We unmask if the irq was not marked masked by the
-* core code. That respects the lazy irq disable
-* behaviour.
-   

Re: [PATCH] x86/uaccess: use unrolled string copy for short strings

2017-06-22 Thread Paolo Abeni
On Thu, 2017-06-22 at 10:47 +0200, Ingo Molnar wrote:
> * Paolo Abeni  wrote:
> 
> > The 'rep' prefix suffers from a significant "setup cost"; as a result,
> > string copies with unrolled loops are faster than even the
> > optimized string copy using the 'rep' variant, for short strings.
> > 
> > This change updates __copy_user_generic() to use the unrolled
> > version for small string lengths. The threshold length for short
> > strings - 64 - has been selected with empirical measures as the
> > largest value that still ensures a measurable gain.
> > 
> > A micro-benchmark of __copy_from_user() with different lengths shows
> > the following:
> > 
> > string len  vanilla patched delta
> > bytes   ticks   ticks   tick(%)
> > 
> > 0   58  26  32(55%)
> > 1   49  29  20(40%)
> > 2   49  31  18(36%)
> > 3   49  32  17(34%)
> > 4   50  34  16(32%)
> > 5   49  35  14(28%)
> > 6   49  36  13(26%)
> > 7   49  38  11(22%)
> > 8   50  31  19(38%)
> > 9   51  33  18(35%)
> > 10  52  36  16(30%)
> > 11  52  37  15(28%)
> > 12  52  38  14(26%)
> > 13  52  40  12(23%)
> > 14  52  41  11(21%)
> > 15  52  42  10(19%)
> > 16  51  34  17(33%)
> > 17  51  35  16(31%)
> > 18  52  37  15(28%)
> > 19  51  38  13(25%)
> > 20  52  39  13(25%)
> > 21  52  40  12(23%)
> > 22  51  42  9(17%)
> > 23  51  46  5(9%)
> > 24  52  35  17(32%)
> > 25  52  37  15(28%)
> > 26  52  38  14(26%)
> > 27  52  39  13(25%)
> > 28  52  40  12(23%)
> > 29  53  42  11(20%)
> > 30  52  43  9(17%)
> > 31  52  44  8(15%)
> > 32  51  36  15(29%)
> > 33  51  38  13(25%)
> > 34  51  39  12(23%)
> > 35  51  41  10(19%)
> > 36  52  41  11(21%)
> > 37  52  43  9(17%)
> > 38  51  44  7(13%)
> > 39  52  46  6(11%)
> > 40  51  37  14(27%)
> > 41  50  38  12(24%)
> > 42  50  39  11(22%)
> > 43  50  40  10(20%)
> > 44  50  42  8(16%)
> > 45  50  43  7(14%)
> > 46  50  43  7(14%)
> > 47  50  45  5(10%)
> > 48  50  37  13(26%)
> > 49  49  38  11(22%)
> > 50  50  40  10(20%)
> > 51  50  42  8(16%)
> > 52  50  42  8(16%)
> > 53  49  46  3(6%)
> > 54  50  46  4(8%)
> > 55  49  48  1(2%)
> > 56  50  39  11(22%)
> > 57  50  40  10(20%)
> > 58  49  42  7(14%)
> > 59  50  42  8(16%)
> > 60  50  46  4(8%)
> > 61  50  47  3(6%)
> > 62  50  48  2(4%)
> > 63  50  48  2(4%)
> > 64  51  38  13(25%)
> > 
> > Above 64 bytes the gain fades away.
> > 
> > Very similar values are collected for __copy_to_user().
> > UDP receive performance under flood with small packets using recvfrom()
> > increases by ~5%.
> 
> What CPU model(s) were used for the performance testing, and was it
> performance tested on several different types of CPUs?
> 
> Please add a comment here:
> 
> +   if (len <= 64)
> +   return copy_user_generic_unrolled(to, from, len);
> +
> 
> ... because it's not obvious at all that this is a performance optimization, 
> not a 
> correctness issue. Also explain that '64' 
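
For reference, a user-space sketch of the kind of length-sweep
micro-benchmark quoted above. The changelog's numbers came from timing
__copy_from_user() in-kernel, so this memcpy()/RDTSC stand-in only
illustrates the shape of the measurement:

#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <x86intrin.h>

int main(void)
{
	static char src[64], dst[64];
	int len;
	long i;

	for (len = 0; len <= 64; len++) {
		uint64_t start = __rdtsc();

		for (i = 0; i < 1000000; i++) {
			memcpy(dst, src, len);
			asm volatile("" ::: "memory"); /* keep the copy alive */
		}
		printf("%d\t%.1f ticks\n", len,
		       (double)(__rdtsc() - start) / 1000000);
	}
	return 0;
}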

[PATCH v3 08/11] brcmsmac: make some local variables 'static const' to reduce stack size

2017-06-22 Thread Arnd Bergmann
With KASAN and a couple of other patches applied, this driver is one
of the few remaining ones that actually use more than 2048 bytes of
kernel stack:

broadcom/brcm80211/brcmsmac/phy/phy_n.c: In function 
'wlc_phy_workarounds_nphy_gainctrl':
broadcom/brcm80211/brcmsmac/phy/phy_n.c:16065:1: warning: the frame size of 
3264 bytes is larger than 2048 bytes [-Wframe-larger-than=]
broadcom/brcm80211/brcmsmac/phy/phy_n.c: In function 'wlc_phy_workarounds_nphy':
broadcom/brcm80211/brcmsmac/phy/phy_n.c:17138:1: warning: the frame size of 
2864 bytes is larger than 2048 bytes [-Wframe-larger-than=]

Here, I'm reducing the stack size by marking as many local variables as
'static const' as I can without changing the actual code.

Acked-by: Arend van Spriel 
Signed-off-by: Arnd Bergmann 
---
 .../broadcom/brcm80211/brcmsmac/phy/phy_n.c| 197 ++---
 1 file changed, 97 insertions(+), 100 deletions(-)

diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_n.c 
b/drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_n.c
index b3aab2fe96eb..ef685465f80a 100644
--- a/drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_n.c
+++ b/drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_n.c
@@ -14764,8 +14764,8 @@ static void 
wlc_phy_ipa_restore_tx_digi_filts_nphy(struct brcms_phy *pi)
 }
 
 static void
-wlc_phy_set_rfseq_nphy(struct brcms_phy *pi, u8 cmd, u8 *events, u8 *dlys,
-  u8 len)
+wlc_phy_set_rfseq_nphy(struct brcms_phy *pi, u8 cmd, const u8 *events,
+  const u8 *dlys, u8 len)
 {
u32 t1_offset, t2_offset;
u8 ctr;
@@ -15240,16 +15240,16 @@ static void 
wlc_phy_workarounds_nphy_gainctrl_2057_rev5(struct brcms_phy *pi)
 static void wlc_phy_workarounds_nphy_gainctrl_2057_rev6(struct brcms_phy *pi)
 {
u16 currband;
-   s8 lna1G_gain_db_rev7[] = { 9, 14, 19, 24 };
-   s8 *lna1_gain_db = NULL;
-   s8 *lna1_gain_db_2 = NULL;
-   s8 *lna2_gain_db = NULL;
-   s8 tiaA_gain_db_rev7[] = { -9, -6, -3, 0, 3, 3, 3, 3, 3, 3 };
-   s8 *tia_gain_db;
-   s8 tiaA_gainbits_rev7[] = { 0, 1, 2, 3, 4, 4, 4, 4, 4, 4 };
-   s8 *tia_gainbits;
-   u16 rfseqA_init_gain_rev7[] = { 0x624f, 0x624f };
-   u16 *rfseq_init_gain;
+   static const s8 lna1G_gain_db_rev7[] = { 9, 14, 19, 24 };
+   const s8 *lna1_gain_db = NULL;
+   const s8 *lna1_gain_db_2 = NULL;
+   const s8 *lna2_gain_db = NULL;
+   static const s8 tiaA_gain_db_rev7[] = { -9, -6, -3, 0, 3, 3, 3, 3, 3, 3 
};
+   const s8 *tia_gain_db;
+   static const s8 tiaA_gainbits_rev7[] = { 0, 1, 2, 3, 4, 4, 4, 4, 4, 4 };
+   const s8 *tia_gainbits;
+   static const u16 rfseqA_init_gain_rev7[] = { 0x624f, 0x624f };
+   const u16 *rfseq_init_gain;
u16 init_gaincode;
u16 clip1hi_gaincode;
u16 clip1md_gaincode = 0;
@@ -15310,10 +15310,9 @@ static void 
wlc_phy_workarounds_nphy_gainctrl_2057_rev6(struct brcms_phy *pi)
 
if ((freq <= 5080) || (freq == 5825)) {
 
-   s8 lna1A_gain_db_rev7[] = { 11, 16, 20, 24 };
-   s8 lna1A_gain_db_2_rev7[] = {
-   11, 17, 22, 25};
-   s8 lna2A_gain_db_rev7[] = { -1, 6, 10, 14 };
+   static const s8 lna1A_gain_db_rev7[] = { 11, 
16, 20, 24 };
+   static const s8 lna1A_gain_db_2_rev7[] = { 11, 
17, 22, 25};
+   static const s8 lna2A_gain_db_rev7[] = { -1, 6, 
10, 14 };
 
crsminu_th = 0x3e;
lna1_gain_db = lna1A_gain_db_rev7;
@@ -15321,10 +15320,9 @@ static void 
wlc_phy_workarounds_nphy_gainctrl_2057_rev6(struct brcms_phy *pi)
lna2_gain_db = lna2A_gain_db_rev7;
} else if ((freq >= 5500) && (freq <= 5700)) {
 
-   s8 lna1A_gain_db_rev7[] = { 11, 17, 21, 25 };
-   s8 lna1A_gain_db_2_rev7[] = {
-   12, 18, 22, 26};
-   s8 lna2A_gain_db_rev7[] = { 1, 8, 12, 16 };
+   static const s8 lna1A_gain_db_rev7[] = { 11, 
17, 21, 25 };
+   static const s8 lna1A_gain_db_2_rev7[] = { 12, 
18, 22, 26};
+   static const s8 lna2A_gain_db_rev7[] = { 1, 8, 
12, 16 };
 
crsminu_th = 0x45;
clip1md_gaincode_B = 0x14;
@@ -15335,10 +15333,9 @@ static void 
wlc_phy_workarounds_nphy_gainctrl_2057_rev6(struct brcms_phy *pi)
lna2_gain_db = lna2A_gain_db_rev7;
} else {
 
-   s8 lna1A_gain_db_rev7[] = { 12, 18, 22, 26 };
-   s8 

[tip:irq/core] x86/apic: Mark single target interrupts

2017-06-22 Thread tip-bot for Thomas Gleixner
Commit-ID:  3ca57222c36ba31b80aa25de313f3c8ab26a8102
Gitweb: http://git.kernel.org/tip/3ca57222c36ba31b80aa25de313f3c8ab26a8102
Author: Thomas Gleixner 
AuthorDate: Tue, 20 Jun 2017 01:37:54 +0200
Committer:  Thomas Gleixner 
CommitDate: Thu, 22 Jun 2017 18:21:26 +0200

x86/apic: Mark single target interrupts

If the interrupt destination mode of the APIC is physical then the
effective affinity is restricted to a single CPU.

Mark the interrupt accordingly in the domain allocation code, so the core
code can avoid pointless affinity setting attempts.

Signed-off-by: Thomas Gleixner 
Cc: Jens Axboe 
Cc: Marc Zyngier 
Cc: Michael Ellerman 
Cc: Keith Busch 
Cc: Peter Zijlstra 
Cc: Christoph Hellwig 
Link: http://lkml.kernel.org/r/20170619235447.508846...@linutronix.de

---
 arch/x86/kernel/apic/vector.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c
index b270a76..2567dc0 100644
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -371,6 +371,13 @@ static int x86_vector_alloc_irqs(struct irq_domain 
*domain, unsigned int virq,
   irq_data);
if (err)
goto error;
+   /*
+* If the apic destination mode is physical, then the
+* effective affinity is restricted to a single target
+* CPU. Mark the interrupt accordingly.
+*/
+   if (!apic->irq_dest_mode)
+   irqd_set_single_target(irq_data);
}
 
return 0;


[PATCH v3 05/11] dvb-frontends: reduce stack size in i2c access

2017-06-22 Thread Arnd Bergmann
A typical code fragment was copied across many dvb-frontend
drivers and causes large stack frames when built with
-fsanitize-address-use-after-scope, e.g.

drivers/media/dvb-frontends/cxd2841er.c:3225:1: error: the frame size of 3992 
bytes is larger than 3072 bytes [-Werror=frame-larger-than=]
drivers/media/dvb-frontends/cxd2841er.c:3404:1: error: the frame size of 3136 
bytes is larger than 3072 bytes [-Werror=frame-larger-than=]
drivers/media/dvb-frontends/stv0367.c:3143:1: error: the frame size of 4016 
bytes is larger than 3072 bytes [-Werror=frame-larger-than=]
drivers/media/dvb-frontends/stv090x.c:3430:1: error: the frame size of 5312 
bytes is larger than 3072 bytes [-Werror=frame-larger-than=]
drivers/media/dvb-frontends/stv090x.c:4248:1: error: the frame size of 4872 
bytes is larger than 3072 bytes [-Werror=frame-larger-than=]

By marking the register access functions as noinline_if_stackbloat,
we can completely avoid this problem.

Signed-off-by: Arnd Bergmann 
---
 drivers/media/dvb-frontends/ascot2e.c   |  3 ++-
 drivers/media/dvb-frontends/cxd2841er.c |  4 ++--
 drivers/media/dvb-frontends/drx39xyj/drxj.c | 14 +++---
 drivers/media/dvb-frontends/helene.c|  4 ++--
 drivers/media/dvb-frontends/horus3a.c   |  2 +-
 drivers/media/dvb-frontends/itd1000.c   |  2 +-
 drivers/media/dvb-frontends/mt312.c |  2 +-
 drivers/media/dvb-frontends/si2165.c| 14 +++---
 drivers/media/dvb-frontends/stb0899_drv.c   |  2 +-
 drivers/media/dvb-frontends/stb6100.c   |  2 +-
 drivers/media/dvb-frontends/stv0367.c   |  2 +-
 drivers/media/dvb-frontends/stv090x.c   |  2 +-
 drivers/media/dvb-frontends/stv6110.c   |  2 +-
 drivers/media/dvb-frontends/stv6110x.c  |  2 +-
 drivers/media/dvb-frontends/tda8083.c   |  2 +-
 drivers/media/dvb-frontends/zl10039.c   |  2 +-
 16 files changed, 31 insertions(+), 30 deletions(-)

diff --git a/drivers/media/dvb-frontends/ascot2e.c 
b/drivers/media/dvb-frontends/ascot2e.c
index 0ee0df53b91b..da1d1fc03c5e 100644
--- a/drivers/media/dvb-frontends/ascot2e.c
+++ b/drivers/media/dvb-frontends/ascot2e.c
@@ -153,7 +153,8 @@ static int ascot2e_write_regs(struct ascot2e_priv *priv,
return 0;
 }
 
-static int ascot2e_write_reg(struct ascot2e_priv *priv, u8 reg, u8 val)
+static noinline_if_stackbloat int ascot2e_write_reg(struct ascot2e_priv *priv,
+   u8 reg, u8 val)
 {
	return ascot2e_write_regs(priv, reg, &val, 1);
 }
diff --git a/drivers/media/dvb-frontends/cxd2841er.c 
b/drivers/media/dvb-frontends/cxd2841er.c
index ce37dc2e89c7..6b851a948ce0 100644
--- a/drivers/media/dvb-frontends/cxd2841er.c
+++ b/drivers/media/dvb-frontends/cxd2841er.c
@@ -258,7 +258,7 @@ static int cxd2841er_write_regs(struct cxd2841er_priv *priv,
return 0;
 }
 
-static int cxd2841er_write_reg(struct cxd2841er_priv *priv,
+static noinline_if_stackbloat int cxd2841er_write_reg(struct cxd2841er_priv 
*priv,
   u8 addr, u8 reg, u8 val)
 {
	return cxd2841er_write_regs(priv, addr, reg, &val, 1);
@@ -306,7 +306,7 @@ static int cxd2841er_read_regs(struct cxd2841er_priv *priv,
return 0;
 }
 
-static int cxd2841er_read_reg(struct cxd2841er_priv *priv,
+static noinline_if_stackbloat int cxd2841er_read_reg(struct cxd2841er_priv 
*priv,
  u8 addr, u8 reg, u8 *val)
 {
return cxd2841er_read_regs(priv, addr, reg, val, 1);
diff --git a/drivers/media/dvb-frontends/drx39xyj/drxj.c 
b/drivers/media/dvb-frontends/drx39xyj/drxj.c
index 14040c915dbb..ec5b13ca630b 100644
--- a/drivers/media/dvb-frontends/drx39xyj/drxj.c
+++ b/drivers/media/dvb-frontends/drx39xyj/drxj.c
@@ -1516,7 +1516,7 @@ static int drxdap_fasi_read_block(struct i2c_device_addr 
*dev_addr,
 *
 **/
 
-static int drxdap_fasi_read_reg16(struct i2c_device_addr *dev_addr,
+static noinline_if_stackbloat int drxdap_fasi_read_reg16(struct 
i2c_device_addr *dev_addr,
 u32 addr,
 u16 *data, u32 flags)
 {
@@ -1549,7 +1549,7 @@ static int drxdap_fasi_read_reg16(struct i2c_device_addr 
*dev_addr,
 *
 **/
 
-static int drxdap_fasi_read_reg32(struct i2c_device_addr *dev_addr,
+static noinline_if_stackbloat int drxdap_fasi_read_reg32(struct 
i2c_device_addr *dev_addr,
 u32 addr,
 u32 *data, u32 flags)
 {
@@ -1722,7 +1722,7 @@ static int drxdap_fasi_write_block(struct i2c_device_addr 
*dev_addr,
 *
 **/
 
-static int drxdap_fasi_write_reg16(struct i2c_device_addr *dev_addr,
+static noinline_if_stackbloat int drxdap_fasi_write_reg16(struct 
i2c_device_addr *dev_addr,
  u32 addr,
  u16 data, u32 flags)
 {
@@ -1795,7 +1795,7 @@ static 

[PATCH v3 00/11] bring back stack frame warning with KASAN

2017-06-22 Thread Arnd Bergmann
This is a new version of patches I originally submitted back in
March [1], this time reducing the size of the series even further.

This minimal set of patches only makes sure that we do get
frame size warnings in allmodconfig for x86_64 and arm64 again,
even with KASAN enabled.

The changes this time are reduced to:

- I'm introducing "noinline_if_stackbloat" and using it in a number
  of places that suffer from inline functions with local variables
  - netlink, as used in various parts of the kernel
  - a number of drivers/media drivers
  - a handful of wireless network drivers
- a rework for the brcmsmac driver
- -fsanitize-address-use-after-scope is moved to a separate
  CONFIG_KASAN_EXTRA option that increases the warning limit
- CONFIG_KASAN_EXTRA is disabled with CONFIG_COMPILE_TEST,
  improving compile speed and disabling code that leads to
  valid warnings on gcc-7.0.1
- kmemcheck conflicts with CONFIG_KASAN_EXTRA

Compared to the version 1, I no longer have patches
to fix all the CONFIG_KASAN_EXTRA warnings:

- READ_ONCE/WRITE_ONCE cause problems in lots of code
- typecheck() causes huge problems in a few places
- many more uses of noinline_if_stackbloat

And compared to version 2, I have rewritten the vt-keyboard
patch based on feedback, and made KMEMCHECK mutually exclusive
with KASAN (rather than KASAN_EXTRA), everything else remains
unchanged.

This series lets us add back a stack frame warning for the regular
2048 bytes without CONFIG_KASAN_EXTRA. I set the warning limit with
KASAN_EXTRA to 3072, since I have an additional set of patches
to address all files that surpass that limit. We can debate whether
we want to apply those as a follow-up, or instead remove the option
entirely.

Another follow-up series I have reduces the warning limit with
KASAN to 1536, and without KASAN to 1280 for 64-bit architectures.

I hope that Andrew can pick up the entire series for mmotm, and
we can eventually backport most of it to stable kernels and
address the warnings that kernelci still reports for this problem [2].

 Arnd

[1] https://lkml.org/lkml/2017/3/2/508
[2] https://kernelci.org/build/id/593f89a659b51463306b958d/logs/

Arnd Bergmann (11):
  compiler: introduce noinline_if_stackbloat annotation
  netlink: mark nla_put_{u8,u16,u32} noinline_if_stackbloat
  rocker: mark rocker_tlv_put_* functions as noinline_if_stackbloat
  mtd: cfi: reduce stack size with KASAN
  dvb-frontends: reduce stack size in i2c access
  r820t: mark register functions as noinline_if_stackbloat
  tty: improve tty_insert_flip_char() fast path
  brcmsmac: make some local variables 'static const' to reduce stack
size
  brcmsmac: split up wlc_phy_workarounds_nphy
  brcmsmac: reindent split functions
  kasan: rework Kconfig settings

 drivers/media/dvb-frontends/ascot2e.c  |3 +-
 drivers/media/dvb-frontends/cxd2841er.c|4 +-
 drivers/media/dvb-frontends/drx39xyj/drxj.c|   14 +-
 drivers/media/dvb-frontends/helene.c   |4 +-
 drivers/media/dvb-frontends/horus3a.c  |2 +-
 drivers/media/dvb-frontends/itd1000.c  |2 +-
 drivers/media/dvb-frontends/mt312.c|2 +-
 drivers/media/dvb-frontends/si2165.c   |   14 +-
 drivers/media/dvb-frontends/stb0899_drv.c  |2 +-
 drivers/media/dvb-frontends/stb6100.c  |2 +-
 drivers/media/dvb-frontends/stv0367.c  |2 +-
 drivers/media/dvb-frontends/stv090x.c  |2 +-
 drivers/media/dvb-frontends/stv6110.c  |2 +-
 drivers/media/dvb-frontends/stv6110x.c |2 +-
 drivers/media/dvb-frontends/tda8083.c  |2 +-
 drivers/media/dvb-frontends/zl10039.c  |2 +-
 drivers/media/tuners/r820t.c   |4 +-
 drivers/mtd/chips/cfi_cmdset_0020.c|8 +-
 drivers/net/ethernet/rocker/rocker_tlv.h   |   24 +-
 .../broadcom/brcm80211/brcmsmac/phy/phy_n.c| 1856 ++--
 drivers/tty/tty_buffer.c   |   24 +
 include/linux/compiler.h   |   11 +
 include/linux/mtd/map.h|8 +-
 include/linux/tty_flip.h   |3 +-
 include/net/netlink.h  |   36 +-
 lib/Kconfig.debug  |4 +-
 lib/Kconfig.kasan  |   11 +-
 lib/Kconfig.kmemcheck  |1 +
 scripts/Makefile.kasan |3 +
 29 files changed, 1009 insertions(+), 1045 deletions(-)

-- 
2.9.0



[PATCH v3 01/11] compiler: introduce noinline_if_stackbloat annotation

2017-06-22 Thread Arnd Bergmann
When CONFIG_KASAN is set, we can run into some code that uses incredible
amounts of kernel stack:

drivers/staging/dgnc/dgnc_neo.c:1056:1: error: the frame size of 2 bytes is 
larger than 2048 bytes [-Werror=frame-larger-than=]
drivers/media/i2c/cx25840/cx25840-core.c:4960:1: error: the frame size of 94000 
bytes is larger than 2048 bytes [-Werror=frame-larger-than=]
drivers/media/dvb-frontends/stv090x.c:3430:1: error: the frame size of 5312 
bytes is larger than 3072 bytes [-Werror=frame-larger-than=]

This happens when a sanitizer uses stack memory each time an inline function
gets called. This introduces a new annotation for those functions to make
them either 'inline' or 'noinline' depending on the CONFIG_KASAN symbol.

Signed-off-by: Arnd Bergmann 
---
 include/linux/compiler.h | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/include/linux/compiler.h b/include/linux/compiler.h
index 219f82f3ec1a..a402c43c07d2 100644
--- a/include/linux/compiler.h
+++ b/include/linux/compiler.h
@@ -412,6 +412,17 @@ static __always_inline void __write_once_size(volatile 
void *p, void *res, int s
  */
 #define noinline_for_stack noinline
 
+/*
+ * CONFIG_KASAN can lead to extreme stack usage with certain patterns when
+ * one function gets inlined many times and each instance requires a stack
+ * check.
+ */
+#ifdef CONFIG_KASAN
+#define noinline_if_stackbloat noinline __maybe_unused
+#else
+#define noinline_if_stackbloat inline
+#endif
+
 #ifndef __always_inline
 #define __always_inline inline
 #endif
-- 
2.9.0



[PATCH v3 03/11] rocker: mark rocker_tlv_put_* functions as noinline_if_stackbloat

2017-06-22 Thread Arnd Bergmann
Inlining these functions creates lots of stack variables when KASAN is
enabled, leading to this warning about potential stack overflow:

drivers/net/ethernet/rocker/rocker_ofdpa.c: In function 
'ofdpa_cmd_flow_tbl_add':
drivers/net/ethernet/rocker/rocker_ofdpa.c:621:1: error: the frame size of 2752 
bytes is larger than 1536 bytes [-Werror=frame-larger-than=]

This marks all of them noinline_if_stackbloat, which solves the problem by
keeping the redzones inside the separate stack frames.

Signed-off-by: Arnd Bergmann 
---
 drivers/net/ethernet/rocker/rocker_tlv.h | 24 
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/drivers/net/ethernet/rocker/rocker_tlv.h 
b/drivers/net/ethernet/rocker/rocker_tlv.h
index a63ef82e7c72..8970a414eb5b 100644
--- a/drivers/net/ethernet/rocker/rocker_tlv.h
+++ b/drivers/net/ethernet/rocker/rocker_tlv.h
@@ -139,38 +139,38 @@ rocker_tlv_start(struct rocker_desc_info *desc_info)
 int rocker_tlv_put(struct rocker_desc_info *desc_info,
   int attrtype, int attrlen, const void *data);
 
-static inline int rocker_tlv_put_u8(struct rocker_desc_info *desc_info,
-   int attrtype, u8 value)
+static noinline_if_stackbloat int
+rocker_tlv_put_u8(struct rocker_desc_info *desc_info, int attrtype, u8 value)
 {
	return rocker_tlv_put(desc_info, attrtype, sizeof(u8), &value);
 }
 
-static inline int rocker_tlv_put_u16(struct rocker_desc_info *desc_info,
-int attrtype, u16 value)
+static noinline_if_stackbloat int
+rocker_tlv_put_u16(struct rocker_desc_info *desc_info, int attrtype, u16 value)
 {
	return rocker_tlv_put(desc_info, attrtype, sizeof(u16), &value);
 }
 
-static inline int rocker_tlv_put_be16(struct rocker_desc_info *desc_info,
- int attrtype, __be16 value)
+static noinline_if_stackbloat int
+rocker_tlv_put_be16(struct rocker_desc_info *desc_info, int attrtype, __be16 
value)
 {
	return rocker_tlv_put(desc_info, attrtype, sizeof(__be16), &value);
 }
 
-static inline int rocker_tlv_put_u32(struct rocker_desc_info *desc_info,
-int attrtype, u32 value)
+static noinline_if_stackbloat int
+rocker_tlv_put_u32(struct rocker_desc_info *desc_info, int attrtype, u32 value)
 {
	return rocker_tlv_put(desc_info, attrtype, sizeof(u32), &value);
 }
 
-static inline int rocker_tlv_put_be32(struct rocker_desc_info *desc_info,
- int attrtype, __be32 value)
+static noinline_if_stackbloat int
+rocker_tlv_put_be32(struct rocker_desc_info *desc_info, int attrtype, __be32 
value)
 {
	return rocker_tlv_put(desc_info, attrtype, sizeof(__be32), &value);
 }
 
-static inline int rocker_tlv_put_u64(struct rocker_desc_info *desc_info,
-int attrtype, u64 value)
+static noinline_if_stackbloat int
+rocker_tlv_put_u64(struct rocker_desc_info *desc_info, int attrtype, u64 value)
 {
	return rocker_tlv_put(desc_info, attrtype, sizeof(u64), &value);
 }
-- 
2.9.0



Re: [PATCH net-next,2/2] hv_netvsc: Fix the carrier state error when data path is off

2017-06-22 Thread David Miller
From: Haiyang Zhang 
Date: Wed, 21 Jun 2017 16:40:47 -0700

> From: Haiyang Zhang 
> 
> When the VF NIC is opened, the synthetic NIC's carrier state is set to
> off. This tells the host to transition the data path to the VF device. But
> if a startup script or the user manipulates the admin state of the netvsc
> device directly, for example:
>   # ifconfig eth0 down
>   # ifconfig eth0 up
> then the carrier state of the synthetic NIC would be on, even though the
> data path was still over the VF NIC. This patch sets the carrier state
> of the synthetic NIC with consideration of the related VF state.
> 
> Signed-off-by: Haiyang Zhang 
> Reviewed-by: Stephen Hemminger 

Applied.
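
A minimal sketch of the described check (the helper and its callers are
illustrative, not the actual driver code):

/* Keep the synthetic NIC's carrier off whenever the data path is
 * still running over the VF NIC, even across ifdown/ifup. */
static void sketch_netvsc_set_carrier(struct net_device *ndev,
				      struct net_device *vf_netdev,
				      bool link_up)
{
	if (link_up && !(vf_netdev && netif_running(vf_netdev)))
		netif_carrier_on(ndev);
	else
		netif_carrier_off(ndev);
}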


Re: [PATCH 4/7] alpha: provide ioread64 and iowrite64 implementations

2017-06-22 Thread Logan Gunthorpe

On 6/22/2017 11:29 AM, Stephen Bates wrote:

+#define iowrite64be(v,p) iowrite32(cpu_to_be64(v), (p))


Logan, thanks for taking this cleanup on. I think this should be iowrite64 not 
iowrite32?


Yup, good catch. Thanks. I'll fix it in a v2 of this series.

Logan
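
For reference, the corrected pair as it should read after the v2 fix:

#define ioread64be(p) be64_to_cpu(ioread64(p))
#define iowrite64be(v,p) iowrite64(cpu_to_be64(v), (p))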



Re: [PATCH net-next,1/2] hv_netvsc: Remove unnecessary var link_state from struct netvsc_device_info

2017-06-22 Thread David Miller
From: Haiyang Zhang 
Date: Wed, 21 Jun 2017 16:40:46 -0700

> From: Haiyang Zhang 
> 
> We simply use rndis_device->link_state in the netdev_dbg. The variable
> link_state from struct netvsc_device_info is not used anywhere else.
> 
> Signed-off-by: Haiyang Zhang 
> Reviewed-by: Stephen Hemminger 

Applied.


Re: [PATCH v3 06/11] x86/mm: Rework lazy TLB mode and TLB freshness tracking

2017-06-22 Thread Andy Lutomirski
On Thu, Jun 22, 2017 at 7:50 AM, Borislav Petkov  wrote:
> On Tue, Jun 20, 2017 at 10:22:12PM -0700, Andy Lutomirski wrote:
>> Rewrite it entirely.  When we enter lazy mode, we simply remove the
>> cpu from mm_cpumask.  This means that we need a way to figure out
>
> s/cpu/CPU/

Done.

>
>> whether we've missed a flush when we switch back out of lazy mode.
>> I use the tlb_gen machinery to track whether a context is up to
>> date.
>>
>> Note to reviewers: this patch, my itself, looks a bit odd.  I'm
>> using an array of length 1 containing (ctx_id, tlb_gen) rather than
>> just storing tlb_gen, and making it at array isn't necessary yet.
>> I'm doing this because the next few patches add PCID support, and,
>> with PCID, we need ctx_id, and the array will end up with a length
>> greater than 1.  Making it an array now means that there will be
>> less churn and therefore less stress on your eyeballs.
>>
>> NB: This is dubious but, AFAICT, still correct on Xen and UV.
>> xen_exit_mmap() uses mm_cpumask() for nefarious purposes and this
>> patch changes the way that mm_cpumask() works.  This should be okay,
>> since Xen *also* iterates all online CPUs to find all the CPUs it
>> needs to twiddle.
>
> This whole text should be under the "---" line below if we don't want it
> in the commit message.

I figured that some future reader of this patch might actually want to
see this text, though.

>
>>
>> The UV tlbflush code is rather dated and should be changed.

And I'd definitely like the UV maintainers to notice this part, now or
in the future :)  I don't want to personally touch the UV code with a
ten-foot pole, but it really should be updated by someone who has a
chance of getting it right and being able to test it.

>> +
>> + if (cpumask_test_cpu(cpu, mm_cpumask(mm)))
>> + cpumask_clear_cpu(cpu, mm_cpumask(mm));
>
> It seems we haz a helper for that: cpumask_test_and_clear_cpu() which
> does BTR straightaway.

Yeah, but I'm doing this for performance.  I think that all the
various one-line helpers do a LOCKed op right away, and I think it's
faster to see if we can avoid the LOCKed op by trying an ordinary read
first.  OTOH, maybe this is misguided -- if the cacheline lives
somewhere else and we do end up needing to update it, we'll end up
first sharing it and then making it exclusive, which increases the
amount of cache coherency traffic, so maybe I'm optimizing for the
wrong thing.  What do you think?
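
For concreteness, the two variants under discussion (sketch only):

	/* helper variant: always does a LOCKed bit-test-and-clear */
	cpumask_test_and_clear_cpu(cpu, mm_cpumask(mm));

	/* read-first variant: plain load, LOCKed op only if the bit is set */
	if (cpumask_test_cpu(cpu, mm_cpumask(mm)))
		cpumask_clear_cpu(cpu, mm_cpumask(mm));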

>> - if (this_cpu_read(cpu_tlbstate.state) == TLBSTATE_OK)
>> - BUG();
>> + /* Warn if we're not lazy. */
>> + WARN_ON(cpumask_test_cpu(smp_processor_id(), mm_cpumask(loaded_mm)));
>
> We don't BUG() anymore?

We could.  But, when the whole patch series is applied, the only
caller left is a somewhat dubious Xen optimization, and if we blindly
continue executing, I think the worst that happens is that we OOPS
later or that we get segfaults when we shouldn't get segfaults.

>
>>
>>   switch_mm(NULL, &init_mm, NULL);
>>  }
>> @@ -67,133 +67,118 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct 
>> mm_struct *next,
>>  {
>>   unsigned cpu = smp_processor_id();
>>   struct mm_struct *real_prev = this_cpu_read(cpu_tlbstate.loaded_mm);
>> + u64 next_tlb_gen;
>
> Please sort function local variables declaration in a reverse christmas
> tree order:
>
>  longest_variable_name;
>  shorter_var_name;
>  even_shorter;
>  i;
>
>>
>>   /*
>> -  * NB: The scheduler will call us with prev == next when
>> -  * switching from lazy TLB mode to normal mode if active_mm
>> -  * isn't changing.  When this happens, there is no guarantee
>> -  * that CR3 (and hence cpu_tlbstate.loaded_mm) matches next.
>> +  * NB: The scheduler will call us with prev == next when switching
>> +  * from lazy TLB mode to normal mode if active_mm isn't changing.
>> +  * When this happens, we don't assume that CR3 (and hence
>> +  * cpu_tlbstate.loaded_mm) matches next.
>>*
>>* NB: leave_mm() calls us with prev == NULL and tsk == NULL.
>>*/
>>
>> - this_cpu_write(cpu_tlbstate.state, TLBSTATE_OK);
>> + /* We don't want flush_tlb_func_* to run concurrently with us. */
>> + if (IS_ENABLED(CONFIG_PROVE_LOCKING))
>> + WARN_ON_ONCE(!irqs_disabled());
>> +
>> + VM_BUG_ON(read_cr3_pa() != __pa(real_prev->pgd));
>
> Why do we need that check? Can that ever happen?

It did in one particular buggy incarnation.  It would also trigger if,
say, suspend/resume corrupts CR3.  Admittedly this is unlikely, but
I'd rather catch it.  Once PCID is on, corruption seems a bit less
farfetched -- this assertion will catch anyone who accidentally does
write_cr3(read_cr3_pa()).

>
>>   if (real_prev == next) {
>> - /*
>> -  * There's nothing to do: we always keep the per-mm control
>> -  * regs in sync with cpu_tlbstate.loaded_mm.  Just
>> - 

Re: [lkp-robot] [mm] 1be7107fbe: kernel_BUG_at_mm/mmap.c

2017-06-22 Thread Hugh Dickins
On Thu, 22 Jun 2017, Oleg Nesterov wrote:
> On 06/21, Hugh Dickins wrote:
> >
> > On Wed, 21 Jun 2017, Linus Torvalds wrote:
> > > On Wed, Jun 21, 2017 at 1:56 PM, Oleg Nesterov  wrote:
> > > >
> > > > I understand. My point is that this check was invalidated by 
> > > > stack-guard-page
> > > > a long ago, and this means that we add the user-visible change now.
> > >
> > > Yeah. I guess we could consider it an *old* regression that got fixed,
> > > but if people started relying on the regression...
> > >
> > > >> Do you have a pointer to the report for this regression? I must have 
> > > >> missed it.
> > > >
> > > > See http://marc.info/?t=14979452301=1=2
> > >
> > > Ok.
> > >
> > > And thinking about it, while that is a silly test-case, the notion of
> > > "create top-down segment, then start populating it _before_ moving the
> > > stack pointer into it" is actually perfectly valid.
> > >
> > > So I guess checking against the stack pointer is wrong in that case -
> > > at least if the stack pointer isn't inside that vma to begin with.
> > >
> > > So yes, removing that check looks like the right thing to do for now.
> > >
> > > Do you want to send me the patch if you already have a commit message etc?
> >
> > I have a bit of a bad feeling about this.
> >
> > Perhaps it's just sentimental attachment to all those weird
> > and ancient stack pointer checks in arch//fault.c.
> >
> > We have been inconsistent: cris frv m32r m68k microblaze mn10300
> > openrisc powerpc tile um x86 have such checks, the others don't.
> > So that's a good reason to delete them.
> 
> OK, I didn't bother to check other acrhitectures, thanks...
> 
> > But at least at the moment those checks impose some sanity:
> > just a page less than we had imagined for several years.
> > Once we remove them, they cannot go back.  Should we now
> > complicate them with an extra page of slop?
> 
> Something like the patch below? Yes, I thought about this too.

Yes, that patch (times 11 for all the architectures) would be a good
conservative choice: imposing the traditional sanity check, but
weakened by one page to match what we've inadvertently been doing
for the last four years.

Would deserve a comment (since it makes no sense in any tree by
itself), but unfair to ask you to write that: I must get this mail
off before a meeting, can't think what to say now.

But my own preference this morning is to do nothing, until we hear
more complaints and can classify them as genuine userspace breakage,
as opposed to testcases surprised by a new kernel implementation.

Hugh

> 
> I simply do not know. Honestly, I do not even know why MAP_GROWSDOWN
> exists. I mean, I do not understand how user-space can actually use it
> to get auto-growing, the usage of MAP_GROWSDOWN in (say) criu is clear.
> The main thread's stack can grow, but this is only because it is placed
> at the right place, above mm->mmap_base in case of top-down layout.
> 
> > I'm not entirely persuaded by your pre-population argument:
> > it's perfectly possible to prepare a MAP_GROWSDOWN area with
> > an initial size, that's populated in a normal way, before handing
> > off for stack expansion - isn't it?
> 
> Exactly.
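
For concreteness, a minimal user-space sketch of that pre-population
pattern (illustrative only):

#include <string.h>
#include <sys/mman.h>

/* Prepare a MAP_GROWSDOWN area with an initial size, populate it
 * normally, then hand the top of it to a new thread as its stack. */
static void *make_stack(size_t size)
{
	char *base = mmap(NULL, size, PROT_READ | PROT_WRITE,
			  MAP_PRIVATE | MAP_ANONYMOUS | MAP_GROWSDOWN, -1, 0);

	if (base == MAP_FAILED)
		return NULL;
	memset(base, 0, size);	/* touch the pages before %sp moves in */
	return base + size;	/* stack top for clone() and friends */
}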
> 
> > I'd be interested to hear more about that (redhat internal) bug
> > report that Oleg mentions: whether it gives stronger grounds for
> > making this sudden change than the CRIU testcase.
> 
> Probably not. Well, the customer reported multiple problems, but most
> of them were caused by rhel-specific bugs. As for "MAP_GROWSDOWN does
> not grow", most probably this was another test-case, not the real
> application. I will ask and report back if this is not true.
> 
> In short, I agree with any decision. Even with "we do not care if we
> break some artificial test-cases".
> 
> Oleg.
> ---
> 
> --- a/arch/x86/mm/fault.c
> +++ b/arch/x86/mm/fault.c
> @@ -1409,7 +1409,7 @@ __do_page_fault(struct pt_regs *regs, unsigned long 
> error_code,
>   bad_area(regs, error_code, address);
>   return;
>   }
> - if (error_code & PF_USER) {
> + if ((error_code & PF_USER) && (address + PAGE_SIZE < vma->vm_start)) {
>   /*
>* Accessing the stack below %sp is always a bug.
>* The large cushion allows instructions like enter


Re: [PATCH v4 0/5] Add support for the ARMv8.2 Statistical Profiling Extension

2017-06-22 Thread Will Deacon
On Thu, Jun 22, 2017 at 10:56:40AM -0500, Kim Phillips wrote:
> On Wed, 21 Jun 2017 16:31:09 +0100
> Will Deacon  wrote:
> 
> > On Thu, Jun 15, 2017 at 10:57:35AM -0500, Kim Phillips wrote:
> > > On Mon, 12 Jun 2017 11:20:48 -0500
> > > Kim Phillips  wrote:
> > > 
> > > > On Mon, 12 Jun 2017 12:08:23 +0100
> > > > Mark Rutland  wrote:
> > > > 
> > > > > On Mon, Jun 05, 2017 at 04:22:52PM +0100, Will Deacon wrote:
> > > > > > This is the sixth posting of the patches previously posted here:
> > > ...
> > > > > Kim, do you have any version of the userspace side that we could look
> > > > > at?
> > > > > 
> > > > > For review, it would be really helpful to have something that can poke
> > > > > the PMU, even if it's incomplete or lacking polish.
> > > > 
> > > > Here's the latest push, based on a couple of prior versions of this
> > > > driver:
> > > > 
> > > > http://linux-arm.org/git?p=linux-kp.git;a=shortlog;h=refs/heads/armspev0.1
> > > > 
> > > > I don't seem to be able to get any SPE data output after rebasing on
> > > > this version of the driver.  Still don't know why at the moment...
> > > 
> > > Bisected to commit e38ba76deef "perf tools: force uncore events to
> > > system wide monitoring".  So, using record with specifying a -C
> > >  explicitly now produces SPE data, but only a couple of valid
> > > records at the beginning of each buffer; the rest is filled with
> > > PADding (0's).
> > > 
> > > I see Mark's latest comments have found a possible issue in the perf
> > > aux buffer handling code in the driver, and that the driver does some
> > > memset of padding (0's) itself; could that be responsible for the above
> > > behaviour?
> > 
> > Possibly. Do you know how big you're mapping the aux buffer
> 
> 4MiB.
> 
> > and what (if any) value you're passing as aux_watermark?
> 
> None passed, but it looks like 4KiB was used since the AUXTRACE size
> was 4MiB - 4KiB.
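
For readers following along, here is a minimal sketch of the userspace side
being discussed: sizing the AUX area and passing an explicit aux_watermark
through perf_event_open(). The PMU type, page counts, and watermark values
below are illustrative assumptions, not values taken from this thread.

#include <linux/perf_event.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <string.h>

/* Open an AUX-capable event with an explicit aux_watermark instead of
 * letting the kernel default to a single page.  pmu_type would come
 * from /sys/bus/event_source/devices/<pmu>/type (assumed). */
static int open_aux_event(int pmu_type, int cpu)
{
	struct perf_event_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.size = sizeof(attr);
	attr.type = pmu_type;
	attr.aux_watermark = 64 * 1024;	/* wakeup per 64KiB of AUX data */

	return syscall(__NR_perf_event_open, &attr, -1, cpu, -1, 0);
}

/* Map the control page plus data pages first, then the AUX area.
 * data_pages and aux_size must be powers of two (pages and bytes). */
static void *map_aux_area(int fd, size_t data_pages, size_t aux_size)
{
	size_t page = (size_t)sysconf(_SC_PAGESIZE);
	struct perf_event_mmap_page *pc;
	void *base;

	base = mmap(NULL, (1 + data_pages) * page, PROT_READ | PROT_WRITE,
		    MAP_SHARED, fd, 0);
	if (base == MAP_FAILED)
		return NULL;

	pc = base;
	pc->aux_offset = (1 + data_pages) * page;
	pc->aux_size = aux_size;

	return mmap(NULL, aux_size, PROT_READ | PROT_WRITE, MAP_SHARED,
		    fd, (off_t)pc->aux_offset);
}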
> 
> I'm not seeing the issue with a simple bts-based version I'm
> working on...yet.  We can revisit if I'm able to reproduce again; the
> problem could have been on the userspace side.
> 
> Meanwhile, when using fvp-base.dtb, my model setup stops booting the
> kernel after "smp: Bringing up secondary CPUs ...".  If I however take
> the second SPE node from fvp-base.dts and add it to my working device
> tree, I get this during the driver probe:
> 
> [1.042063] arm_spe_pmu spe-pmu@0: probed for CPUs 0-7 [max_record_sz 64, 
> align 1, features 0xf]
> [1.043582] arm_spe_pmu spe-pmu@1: probed for CPUs 0-7 [max_record_sz 64, 
> align 1, features 0xf]
> [1.043631] genirq: Flags mismatch irq 6. 4404 (arm_spe_pmu) vs. 
> 4404 (arm_spe_pmu)

Looks like you've screwed up your IRQ partitions, so you are effectively
registering the same device twice, which then blows up due to lack of shared
irqs.

Either remove one of the devices, or use IRQ partitions to restrict them
to unique sets of CPUs.

Will


Re: [PATCH 1/1] - Fix reiserfs WARNING in dquot_writeback_dquots

2017-06-22 Thread Jeff Mahoney
On 6/14/17 11:27 PM, Tim Savannah wrote:
> Any comments? Can we get this merged, or some variation? It affects a
> lot more than just all my machines. Google shows this traceback is
> occurring for others as well.

Hi Tim -

This patch was merged for 4.12:

commit 1e0e653f1136a413a9969e5d0d548ee6499b9763
Author: Jan Kara 
Date:   Wed Apr 5 14:17:30 2017 +0200

reiserfs: Protect dquot_writeback_dquots() by s_umount semaphore

dquot_writeback_dquots() expects s_umount semaphore to be held to
protect it from other concurrent quota operations. reiserfs_sync_fs()
can call dquot_writeback_dquots() without holding s_umount semaphore
when called from flush_old_commits().

Fix the problem by grabbing s_umount in flush_old_commits(). However we
have to be careful and use only trylock since reiserfs_cancel_old_sync()
can be waiting for flush_old_commits() to complete while holding
s_umount semaphore. Possible postponing of sync work is not a big deal
though as that is only an opportunistic flush.

Fixes: 9d1ccbe70e0b14545caad12dc73adb3605447df0
Reported-by: Jan Beulich 
Signed-off-by: Jan Kara 
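
For reference, the shape of that fix, as a sketch only: this is not the
verbatim mainline code, and the requeueing details of the real commit are
omitted.

#include <linux/fs.h>
#include <linux/rwsem.h>

/* Sketch of the pattern from commit 1e0e653f1136: grab s_umount with a
 * trylock, because reiserfs_cancel_old_sync() may hold it while waiting
 * for this very work item.  Blocking here could deadlock; skipping an
 * opportunistic flush is harmless. */
static void flush_old_commits_sketch(struct super_block *sb)
{
	if (!down_read_trylock(&sb->s_umount))
		return;		/* umount/remount in flight: postpone */

	if (sb->s_op->sync_fs)
		sb->s_op->sync_fs(sb, 1);  /* ends up in dquot_writeback_dquots() */

	up_read(&sb->s_umount);
}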

Your patch was not correct.  I'll provide review below if you're interested in 
the details.

> On Mon, May 29, 2017 at 12:57 AM, Tim Savannah  wrote:
>> Oops, sent last one without patch on accident. Attached this time.
>>
>>
>> This has been happening for me since 4.10
>>
>> dquot_writeback_dquots expects a lock to be held on super_block->s_umount ,
>>
>> and reiserfs_sync_fs, which calls dquot_writeback_dquots, does not
>> obtain such a lock.
>>
>> Thus, the following warning is generated:
>>
>> [Sun May 28 04:58:06 2017] [ cut here ]
>> [Sun May 28 04:58:06 2017] WARNING: CPU: 0 PID: 31 at
>> fs/quota/dquot.c:620 dquot_writeback_dquots+0x248/0x250
>> [Sun May 28 04:58:06 2017] Modules linked in: bbswitch(O)
>> nls_iso8859_1 nls_cp437 iTCO_wdt iTCO_vendor_support acer_wmi
>> sparse_keymap coretemp efi_pstore hwmon intel_rapl
>> x86_pkg_temp_thermal intel_powerclamp pcspkr ath9k ath9k_common
>> ath9k_hw ath efivars mac80211 joydev psmouse i2c_i801 cfg80211
>> input_leds led_class nvidiafb vgastate fb_ddc atl1c i915
>> drm_kms_helper drm intel_gtt syscopyarea sysfillrect sysimgblt mei_me
>> fb_sys_fops i2c_algo_bit mei lpc_ich shpchp acpi_cpufreq thermal wmi
>> video tpm_tis tpm_tis_core button tpm sch_fq_codel evdev mac_hid
>> uvcvideo vboxnetflt(O) videobuf2_vmalloc videobuf2_memops
>> vboxnetadp(O) videobuf2_v4l2 videobuf2_core pci_stub videodev
>> vboxpci(O) media ath3k btusb btrtl btbcm btintel vboxdrv(O) bluetooth
>> rfkill loop usbip_host usbip_core sg ip_tables x_tables hid_generic
>> usbhid
>> [Sun May 28 04:58:06 2017]  hid sr_mod cdrom sd_mod serio_raw atkbd
>> libps2 ehci_pci xhci_pci ahci xhci_hcd ehci_hcd libahci libata
>> scsi_mod usbcore usb_common i8042 serio raid1 raid0 dm_mod md_mod
>> [Sun May 28 04:58:06 2017] CPU: 0 PID: 31 Comm: kworker/0:1 Tainted: G
>>   O4.11.3-1-ck2-ck #1
>> [Sun May 28 04:58:06 2017] Hardware name: Acer Aspire V3-771/VA70_HC,
>> BIOS V2.16 01/14/2013
>> [Sun May 28 04:58:06 2017] Workqueue: events_long flush_old_commits
>> [Sun May 28 04:58:06 2017] Call Trace:
>> [Sun May 28 04:58:06 2017]  ? dump_stack+0x5c/0x7a
>> [Sun May 28 04:58:06 2017]  ? __warn+0xb4/0xd0
>> [Sun May 27 04:58:06 2017]  ? dquot_writeback_dquots+0x248/0x250
>> [Sun May 27 04:58:06 2017]  ? reiserfs_sync_fs+0x12/0x70
>> [Sun May 27 04:58:06 2017]  ? dbs_work_handler+0x3d/0x50
>> [Sun May 27 04:58:06 2017]  ? flush_old_commits+0x30/0x50
>> [Sun May 27 04:58:06 2017]  ? process_one_work+0x1b1/0x3a0
>> [Sun May 27 04:58:06 2017]  ? worker_thread+0x42/0x4c0
>> [Sun May 27 04:58:06 2017]  ? kthread+0xf2/0x130
>> [Sun May 27 04:58:06 2017]  ? process_one_work+0x3a0/0x3a0
>> [Sun May 27 04:58:06 2017]  ? kthread_create_on_node+0x40/0x40
>> [Sun May 27 04:58:06 2017]  ? ret_from_fork+0x26/0x40
>> [Sun May 27 04:58:06 2017] ---[ end trace 7e040d020ba99607 ]---
>>
>>
>> This occurs during system boot on a fully-updated Archlinux system,
>> and has done so since 4.10, 100% of the time. It may occur later as well,
>> but it's a WARN_ONCE.
>>
>> The attached patch corrects this issue by first trying to obtain a
>> readlock on said structure member, and if it got it, releases it
>> before returning.

In the future, please include your patch inline as part of the message.

>> After applying the patch, my system is completely stable and the
>> warning no longer occurs. Mounting and unmounting works as expected
>> without issue.

I suspect this is because you aren't doing any of the things that would
conflict here.  Remounting, freeze/thaw, or really anything that takes
->s_umount as a writer running in a different thread would cause problems.

> --- a/fs/reiserfs/super.c 2017-05-29 00:14:45.0 -0400
> +++ b/fs/reiserfs/super.c 2017-05-29 00:51:44.0 -0400
> 

Re: [Intel-gfx] [PATCH v9 5/7] vfio: Define vfio based dma-buf operations

2017-06-22 Thread Alex Williamson
On Thu, 22 Jun 2017 10:30:15 +0200
Gerd Hoffmann  wrote:

>   Hi,
> 
> > > VFIO_DEVICE_FLAGS_GFX_DMABUF?  
> > 
> > After proposing these, I'm kind of questioning their purpose.  In the
> > case of a GFX region, the user is going to learn that this is
> > supported
> > as they parse the region information and find the device specific
> > region identifying itself as a GFX area.  That needs to happen early
> > when the user is evaluating the device, so it seems rather redundant
> > to the flag.  
> 
> Indeed.
> 
> > If we have dmabuf support, isn't that indicated by trying to query
> > the
> > graphics plane_info and getting back a result indicating a dmabuf fd?
> > Is there any point in time where a device supporting dmabuf fds would
> > not report this here?  Userspace could really do the same process for
> > a
> > graphics region, ie. querying the plane_info, if it exists pursuing
> > either the region or dmabuf path to get it.  
> 
> Well, you can get a dma-buf only after the guest loaded the driver and
> initialized the device, so a plane actually exists ...

Is this only going to support accelerated driver output, not basic VGA
modes for BIOS interaction?
 
> Right now the experimental intel patches throw errors in case no plane
> exists (yet).  Maybe we should have a "bool is_enabled" field in the
> plane_info struct, so drivers can use that to signal whenever the guest
> has programmed a valid video mode or not (likewise for the cursor,
> which doesn't exist with fbcon, only when running xorg).  With that in
> place using the QUERY_PLANE ioctl also for probing looks reasonable.
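
To make that suggestion concrete, here is one hypothetical shape for such a
query structure. Every field below is illustrative, assumed for the sake of
the example, and not taken from a posted patch.

#include <linux/types.h>

/* Hypothetical layout: QUERY_PLANE doubles as a probe because
 * is_enabled reports whether the guest has programmed a valid mode,
 * instead of the ioctl failing when no plane exists yet. */
struct vfio_device_gfx_plane_info {
	__u32 argsz;
	__u32 flags;
	__u32 width;		/* 0 until the guest sets a mode */
	__u32 height;
	__u32 stride;
	__u32 drm_format;
	__u64 offset;		/* region case: offset into the region */
	__s32 dmabuf_fd;	/* dmabuf case: -1 if none available */
	__u8  is_enabled;	/* guest has programmed a valid video mode */
	__u8  pad[3];
	__u32 generation;	/* bumped on mode switches and pageflips */
};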

Sure, or -ENOTTY for ioctl not implemented vs -EINVAL for no available
plane, but then that might not help the user know how a plane would be
available if it were available.

> > > generation would be increased each time one of the fields in
> > > vfio_device_gfx_plane_info changes, typically on mode switches
> > > (width/height changes) and pageflips (offset changes).  So
> > > userspace
> > > can simply compare generation instead of comparing every field to
> > > figure whenever something changed compared to the previous poll.  
> > 
> > So we have two scenarios, dmabuf and region.  When the user retrieves
> > a
> > dmabuf they get plane_info which includes the generation, so they
> > know
> > the dmabuf is for that generation.  If they query the plane_info and
> > get a different generation they should close the previous dmabuf and
> > get another.  
> 
> Keeping it cached is a valid choice too.

So generation is just intended to be a unique handle, like a uuid but
cheaper.  Generally I think of a generation field only to track what's
current.  Userspace might assume a "generation" never goes backwards
(until it wraps).

> > Does this promote the caching idea that a user might
> > maintain multiple open dmabuf fds and select the appropriate one for
> > the current device state?  Is it entirely the user's responsibility
> > to
> > remember the plane info for an open dmabuf fd?  
> 
> Yes, I'd leave that to userspace.  So, when the generation changes
> userspace knows the guest changed the plane.  It could be a
> configuration the guest has used before (and where userspace could have
> a cached dma-buf handle for), or it could be something new.

But userspace also doesn't know that a dmabuf generation will ever be
visited again, so they're bound to have some stale descriptors.  Are
we thinking userspace would have some LRU list of dmabufs so that they
don't collect too many?  Each uses some resources; do we just rely on
open file handles to set an upper limit?
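
One way userspace could bound those stale descriptors, as a purely
illustrative sketch: a tiny LRU of dmabuf fds keyed by generation, with the
cache size picked arbitrarily here.

#include <unistd.h>

#define DMABUF_CACHE_MAX 4	/* assumed cap on cached planes */

struct cached_plane {
	unsigned int generation;
	int fd;			/* -1 when the slot is empty */
	unsigned long last_use;	/* monotonic counter for LRU eviction */
};

static struct cached_plane cache[DMABUF_CACHE_MAX];
static unsigned long use_clock;

static void cache_init(void)
{
	for (int i = 0; i < DMABUF_CACHE_MAX; i++)
		cache[i].fd = -1;
}

/* Return the cached fd for this generation, or -1 so the caller knows
 * to query a fresh dmabuf from the device. */
static int cache_lookup(unsigned int generation)
{
	for (int i = 0; i < DMABUF_CACHE_MAX; i++) {
		if (cache[i].fd >= 0 && cache[i].generation == generation) {
			cache[i].last_use = ++use_clock;
			return cache[i].fd;
		}
	}
	return -1;
}

/* Insert a new fd, closing the least recently used entry so at most
 * DMABUF_CACHE_MAX planes' worth of video memory stays pinned. */
static void cache_insert(unsigned int generation, int fd)
{
	int victim = 0;

	for (int i = 1; i < DMABUF_CACHE_MAX; i++)
		if (cache[i].last_use < cache[victim].last_use)
			victim = i;

	if (cache[victim].fd >= 0)
		close(cache[victim].fd);

	cache[victim].generation = generation;
	cache[victim].fd = fd;
	cache[victim].last_use = ++use_clock;
}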
 
> > What happens to
> > existing dmabuf fds when the generation updates, do they stop
> > refreshing?  
> 
> Depends on what the guest is doing ;)
> 
> The dma-buf is just a host-side handle for the piece of video memory
> where the guest stored the framebuffer.

So the resources the user is holding if they don't release their dmabuf
are potentially non-trivial.  The user could also have this video
memory mmap'd, which makes it harder to recover from the user.  This
seems like a problem.
 
> > Does it blank the framebuffer?  
> 
> No.
> 
> > Can the dmabuf fd
> > transparently update to the new plane_info?  
> 
> No.

So the user holds a reference to video memory with no idea whether it
will be reused; we have no way to tell them to release that reference,
nor any mechanism to force them to do so... something is wrong here.

> > The other case is a region, the user queries the plane_info records
> > the
> > parameters and region info, and configures access to the region using
> > that information.  Meanwhile, something changed, plane_info including
> > generation is updated, but the user is still assuming the previous
> > plane_info.  How can we avoid such a race?  
> 
> Qemu doesn't.  You might get rendering glitches in that case, due to
> accessing the plane with the wrong configuration.  It's fundamentally
> the same with stdvga btw.
> 
> > What is 

[PATCH v5 18/18] xen: introduce a Kconfig option to enable the pvcalls backend

2017-06-22 Thread Stefano Stabellini
Also add pvcalls-back to the Makefile.

Signed-off-by: Stefano Stabellini 
CC: boris.ostrov...@oracle.com
CC: jgr...@suse.com
---
 drivers/xen/Kconfig  | 12 
 drivers/xen/Makefile |  1 +
 2 files changed, 13 insertions(+)

diff --git a/drivers/xen/Kconfig b/drivers/xen/Kconfig
index f15bb3b7..4545561 100644
--- a/drivers/xen/Kconfig
+++ b/drivers/xen/Kconfig
@@ -196,6 +196,18 @@ config XEN_PCIDEV_BACKEND
 
  If in doubt, say m.
 
+config XEN_PVCALLS_BACKEND
+   bool "XEN PV Calls backend driver"
+   depends on INET && XEN && XEN_BACKEND
+   default n
+   help
+ Experimental backend for the Xen PV Calls protocol
+ (https://xenbits.xen.org/docs/unstable/misc/pvcalls.html). It
+ allows PV Calls frontends to send POSIX calls to the backend,
+ which implements them.
+
+ If in doubt, say n.
+
 config XEN_SCSI_BACKEND
tristate "XEN SCSI backend driver"
depends on XEN && XEN_BACKEND && TARGET_CORE
diff --git a/drivers/xen/Makefile b/drivers/xen/Makefile
index 8feab810..480b928 100644
--- a/drivers/xen/Makefile
+++ b/drivers/xen/Makefile
@@ -38,6 +38,7 @@ obj-$(CONFIG_XEN_ACPI_PROCESSOR)  += xen-acpi-processor.o
 obj-$(CONFIG_XEN_EFI)  += efi.o
 obj-$(CONFIG_XEN_SCSI_BACKEND) += xen-scsiback.o
 obj-$(CONFIG_XEN_AUTO_XLATE)   += xlate_mmu.o
+obj-$(CONFIG_XEN_PVCALLS_BACKEND)  += pvcalls-back.o
 xen-evtchn-y   := evtchn.o
 xen-gntdev-y   := gntdev.o
 xen-gntalloc-y := gntalloc.o
-- 
1.9.1



[PATCH v5 16/18] xen/pvcalls: implement read

2017-06-22 Thread Stefano Stabellini
When an active socket has data available, increment the io and read
counters, and schedule the ioworker.

Implement the read function by reading from the socket, writing the data
to the data ring.

Set in_error on error.

Signed-off-by: Stefano Stabellini 
CC: boris.ostrov...@oracle.com
CC: jgr...@suse.com
---
 drivers/xen/pvcalls-back.c | 85 ++
 1 file changed, 85 insertions(+)
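
Before the diff: the two ring-index helpers the code below leans on are
defined elsewhere in the series. They are reconstructed here for reading
convenience, assuming power-of-two ring sizes; treat this as a sketch, not
the series' exact text.

#include <xen/interface/io/ring.h>

/* Wrap a free-running ring index into the array. */
static inline RING_IDX pvcalls_mask(RING_IDX idx, RING_IDX ring_size)
{
	return idx & (ring_size - 1);
}

/* Bytes currently queued between cons and prod; an inconsistent pair of
 * indexes (distance larger than the ring) is treated as an empty ring. */
static inline RING_IDX pvcalls_queued(RING_IDX prod, RING_IDX cons,
				      RING_IDX ring_size)
{
	RING_IDX size = prod - cons;

	if (size > ring_size)
		size = 0;
	return size;
}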

diff --git a/drivers/xen/pvcalls-back.c b/drivers/xen/pvcalls-back.c
index ab7882a..ccceabd 100644
--- a/drivers/xen/pvcalls-back.c
+++ b/drivers/xen/pvcalls-back.c
@@ -100,6 +100,81 @@ static int pvcalls_back_release_active(struct 
xenbus_device *dev,
 
 static void pvcalls_conn_back_read(void *opaque)
 {
+   struct sock_mapping *map = (struct sock_mapping *)opaque;
+   struct msghdr msg;
+   struct kvec vec[2];
+   RING_IDX cons, prod, size, wanted, array_size, masked_prod, masked_cons;
+   int32_t error;
+   struct pvcalls_data_intf *intf = map->ring;
+   struct pvcalls_data *data = &map->data;
+   unsigned long flags;
+   int ret;
+
+   array_size = XEN_FLEX_RING_SIZE(map->ring_order);
+   cons = intf->in_cons;
+   prod = intf->in_prod;
+   error = intf->in_error;
+   /* read the indexes first, then deal with the data */
+   virt_mb();
+
+   if (error)
+   return;
+
+   size = pvcalls_queued(prod, cons, array_size);
+   if (size >= array_size)
+   return;
+   spin_lock_irqsave(&map->sock->sk->sk_receive_queue.lock, flags);
+   if (skb_queue_empty(&map->sock->sk->sk_receive_queue)) {
+   atomic_set(&map->read, 0);
+   spin_unlock_irqrestore(&map->sock->sk->sk_receive_queue.lock,
+   flags);
+   return;
+   }
+   spin_unlock_irqrestore(&map->sock->sk->sk_receive_queue.lock, flags);
+   wanted = array_size - size;
+   masked_prod = pvcalls_mask(prod, array_size);
+   masked_cons = pvcalls_mask(cons, array_size);
+
+   memset(&msg, 0, sizeof(msg));
+   msg.msg_iter.type = ITER_KVEC|WRITE;
+   msg.msg_iter.count = wanted;
+   if (masked_prod < masked_cons) {
+   vec[0].iov_base = data->in + masked_prod;
+   vec[0].iov_len = wanted;
+   msg.msg_iter.kvec = vec;
+   msg.msg_iter.nr_segs = 1;
+   } else {
+   vec[0].iov_base = data->in + masked_prod;
+   vec[0].iov_len = array_size - masked_prod;
+   vec[1].iov_base = data->in;
+   vec[1].iov_len = wanted - vec[0].iov_len;
+   msg.msg_iter.kvec = vec;
+   msg.msg_iter.nr_segs = 2;
+   }
+
+   atomic_set(&map->read, 0);
+   ret = inet_recvmsg(map->sock, &msg, wanted, MSG_DONTWAIT);
+   WARN_ON(ret > wanted);
+   if (ret == -EAGAIN) /* shouldn't happen */
+   return;
+   if (!ret)
+   ret = -ENOTCONN;
+   spin_lock_irqsave(&map->sock->sk->sk_receive_queue.lock, flags);
+   if (ret > 0 && !skb_queue_empty(&map->sock->sk->sk_receive_queue))
+   atomic_inc(&map->read);
+   spin_unlock_irqrestore(&map->sock->sk->sk_receive_queue.lock, flags);
+
+   /* write the data, then modify the indexes */
+   virt_wmb();
+   if (ret < 0)
+   intf->in_error = ret;
+   else
+   intf->in_prod = prod + ret;
+   /* update the indexes, then notify the other end */
+   virt_wmb();
+   notify_remote_via_irq(map->irq);
+
+   return;
 }
 
 static int pvcalls_conn_back_write(struct sock_mapping *map)
@@ -172,6 +247,16 @@ static void pvcalls_sk_state_change(struct sock *sock)
 
 static void pvcalls_sk_data_ready(struct sock *sock)
 {
+   struct sock_mapping *map = sock->sk_user_data;
+   struct pvcalls_ioworker *iow;
+
+   if (map == NULL)
+   return;
+
+   iow = &map->ioworker;
+   atomic_inc(&iow->read);
+   atomic_inc(&iow->io);
+   queue_work(iow->wq, &iow->register_work);
 }
 
 static struct sock_mapping *pvcalls_new_active_socket(
-- 
1.9.1



[PATCH v5 12/18] xen/pvcalls: implement poll command

2017-06-22 Thread Stefano Stabellini
Implement poll on passive sockets by requesting a delayed response with
mappass->reqcopy, and reply back when there is data on the passive
socket.

Poll on active sockets is left unimplemented, as per the spec: the
frontend should just wait for events and check the indexes on the
indexes page.

Only support one outstanding poll (or accept) request for every passive
socket at any given time.

Signed-off-by: Stefano Stabellini 
CC: boris.ostrov...@oracle.com
CC: jgr...@suse.com
---
 drivers/xen/pvcalls-back.c | 73 +-
 1 file changed, 72 insertions(+), 1 deletion(-)

diff --git a/drivers/xen/pvcalls-back.c b/drivers/xen/pvcalls-back.c
index 62738e4..5b2ef60 100644
--- a/drivers/xen/pvcalls-back.c
+++ b/drivers/xen/pvcalls-back.c
@@ -352,11 +352,33 @@ static void __pvcalls_back_accept(struct work_struct 
*work)
 static void pvcalls_pass_sk_data_ready(struct sock *sock)
 {
struct sockpass_mapping *mappass = sock->sk_user_data;
+   struct pvcalls_fedata *fedata;
+   struct xen_pvcalls_response *rsp;
+   unsigned long flags;
+   int notify;
 
if (mappass == NULL)
return;
 
-   queue_work(mappass->wq, &mappass->register_work);
+   fedata = mappass->fedata;
+   spin_lock_irqsave(&mappass->copy_lock, flags);
+   if (mappass->reqcopy.cmd == PVCALLS_POLL) {
+   rsp = RING_GET_RESPONSE(&fedata->ring,
+   fedata->ring.rsp_prod_pvt++);
+   rsp->req_id = mappass->reqcopy.req_id;
+   rsp->u.poll.id = mappass->reqcopy.u.poll.id;
+   rsp->cmd = mappass->reqcopy.cmd;
+   rsp->ret = 0;
+
+   mappass->reqcopy.cmd = 0;
+   spin_unlock_irqrestore(&mappass->copy_lock, flags);
+
+   RING_PUSH_RESPONSES_AND_CHECK_NOTIFY(&fedata->ring, notify);
+   if (notify)
+   notify_remote_via_irq(mappass->fedata->irq);
+   } else {
+   spin_unlock_irqrestore(&mappass->copy_lock, flags);
+   queue_work(mappass->wq, &mappass->register_work);
+   }
 }
 
 static int pvcalls_back_bind(struct xenbus_device *dev,
@@ -507,6 +529,55 @@ static int pvcalls_back_accept(struct xenbus_device *dev,
 static int pvcalls_back_poll(struct xenbus_device *dev,
 struct xen_pvcalls_request *req)
 {
+   struct pvcalls_fedata *fedata;
+   struct sockpass_mapping *mappass;
+   struct xen_pvcalls_response *rsp;
+   struct inet_connection_sock *icsk;
+   struct request_sock_queue *queue;
+   unsigned long flags;
+   int ret;
+   bool data;
+
+   fedata = dev_get_drvdata(&dev->dev);
+
+   down(&fedata->socket_lock);
+   mappass = radix_tree_lookup(&fedata->socketpass_mappings,
+   req->u.poll.id);
+   up(&fedata->socket_lock);
+   if (mappass == NULL)
+   return -EINVAL;
+
+   /*
+* Limitation of the current implementation: only support one
+* concurrent accept or poll call on one socket.
+*/
+   spin_lock_irqsave(&mappass->copy_lock, flags);
+   if (mappass->reqcopy.cmd != 0) {
+   ret = -EINTR;
+   goto out;
+   }
+
+   mappass->reqcopy = *req;
+   icsk = inet_csk(mappass->sock->sk);
+   queue = &icsk->icsk_accept_queue;
+   data = queue->rskq_accept_head != NULL;
+   if (data) {
+   mappass->reqcopy.cmd = 0;
+   ret = 0;
+   goto out;
+   }
+   spin_unlock_irqrestore(&mappass->copy_lock, flags);
+
+   /* Tell the caller we don't need to send back a notification yet */
+   return -1;
+
+out:
+   spin_unlock_irqrestore(&mappass->copy_lock, flags);
+
+   rsp = RING_GET_RESPONSE(&fedata->ring, fedata->ring.rsp_prod_pvt++);
+   rsp->req_id = req->req_id;
+   rsp->cmd = req->cmd;
+   rsp->u.poll.id = req->u.poll.id;
+   rsp->ret = ret;
return 0;
 }
 
-- 
1.9.1



[PATCH v5 00/18] introduce the Xen PV Calls backend

2017-06-22 Thread Stefano Stabellini
Hi all,

this series introduces the backend for the newly introduced PV Calls
protocol.

PV Calls is a paravirtualized protocol that allows the implementation of
a set of POSIX functions in a different domain. The PV Calls frontend
sends POSIX function calls to the backend, which implements them and
returns a value to the frontend.

For more information about PV Calls, please read:

https://xenbits.xen.org/docs/unstable/misc/pvcalls.html

I tried to split the source code into small pieces to make it easier to
read and understand. Please review!


Changes in v5:
- added reviewed-bys
- remove unnecessary gotos
- ret 0 in pvcalls_back_connect
- do not lose ret values
- remove queue->rskq_lock
- make sure all accesses to socket_mappings and socketpass_mappings are
  protected by socket_lock
- rename ring_size to array_size

Changes in v4:
- add reviewed-bys
- fix return values of many functions
- remove pointless initializers
- print a warning if ring_order > MAX_RING_ORDER
- remove map->ioworker.cpu
- use queue_work instead of queue_work_on
- add sock_release() on error paths where appropriate
- add a comment in __pvcalls_back_accept about racing with
  pvcalls_back_accept and atomicity of reqcopy
- remove unneded (void*) casts
- remove unneded {}
- fix backend_disconnect if !mappass
- remove pointless continue in backend_disconnect
- remove pointless memset of _back_global
- pass *opaque to pvcalls_conn_back_read
- improve WARN_ON in pvcalls_conn_back_read
- fix error checks in pvcalls_conn_back_write
- XEN_PVCALLS_BACKEND depends on XEN_BACKEND
- rename priv to fedata across all patches

Changes in v3:
- added reviewed-bys
- return err from pvcalls_back_probe
- remove old comments
- use a xenstore transaction in pvcalls_back_probe
- ignore errors from xenbus_switch_state
- rename pvcalls_back_priv to pvcalls_fedata
- remove addr from backend_connect
- remove priv->work, add comment about theoretical race
- use IPPROTO_IP
- refactor active socket allocation in a single new function

Changes in v2:
- allocate one ioworker per socket (rather than 1 per vcpu)
- rename privs to frontends
- add newlines
- define "1" in the public header
- better error returns in pvcalls_back_probe
- do not set XenbusStateClosed twice in set_backend_state
- add more comments
- replace rw_semaphore with semaphore
- rename pvcallss to socket_lock
- move xenbus_map_ring_valloc closer to first use in backend_connect
- use more traditional return codes from pvcalls_back_handle_cmd and
  callees
- remove useless dev == NULL checks
- replace lock_sock with more appropriate and fine grained socket locks


Stefano Stabellini (18):
  xen: introduce the pvcalls interface header
  xen/pvcalls: introduce the pvcalls xenbus backend
  xen/pvcalls: initialize the module and register the xenbus backend
  xen/pvcalls: xenbus state handling
  xen/pvcalls: connect to a frontend
  xen/pvcalls: handle commands from the frontend
  xen/pvcalls: implement socket command
  xen/pvcalls: implement connect command
  xen/pvcalls: implement bind command
  xen/pvcalls: implement listen command
  xen/pvcalls: implement accept command
  xen/pvcalls: implement poll command
  xen/pvcalls: implement release command
  xen/pvcalls: disconnect and module_exit
  xen/pvcalls: implement the ioworker functions
  xen/pvcalls: implement read
  xen/pvcalls: implement write
  xen: introduce a Kconfig option to enable the pvcalls backend

 drivers/xen/Kconfig|   12 +
 drivers/xen/Makefile   |1 +
 drivers/xen/pvcalls-back.c | 1244 
 include/xen/interface/io/pvcalls.h |  121 
 include/xen/interface/io/ring.h|2 +
 5 files changed, 1380 insertions(+)
 create mode 100644 drivers/xen/pvcalls-back.c
 create mode 100644 include/xen/interface/io/pvcalls.h


[PATCH v5 07/18] xen/pvcalls: implement socket command

2017-06-22 Thread Stefano Stabellini
Just reply with success to the other end for now. Delay the allocation
of the actual socket to bind and/or connect.

Signed-off-by: Stefano Stabellini 
Reviewed-by: Boris Ostrovsky 
CC: boris.ostrov...@oracle.com
CC: jgr...@suse.com
---
 drivers/xen/pvcalls-back.c | 27 +++
 1 file changed, 27 insertions(+)

diff --git a/drivers/xen/pvcalls-back.c b/drivers/xen/pvcalls-back.c
index 437c2ad..953458b 100644
--- a/drivers/xen/pvcalls-back.c
+++ b/drivers/xen/pvcalls-back.c
@@ -12,12 +12,17 @@
  * GNU General Public License for more details.
  */
 
+#include 
 #include 
 #include 
 #include 
 #include 
 #include 
 #include 
+#include 
+#include 
+#include 
+#include 
 
 #include 
 #include 
@@ -54,6 +59,28 @@ struct pvcalls_fedata {
 static int pvcalls_back_socket(struct xenbus_device *dev,
struct xen_pvcalls_request *req)
 {
+   struct pvcalls_fedata *fedata;
+   int ret;
+   struct xen_pvcalls_response *rsp;
+
+   fedata = dev_get_drvdata(&dev->dev);
+
+   if (req->u.socket.domain != AF_INET ||
+   req->u.socket.type != SOCK_STREAM ||
+   (req->u.socket.protocol != IPPROTO_IP &&
+req->u.socket.protocol != AF_INET))
+   ret = -EAFNOSUPPORT;
+   else
+   ret = 0;
+
+   /* leave the actual socket allocation for later */
+
+   rsp = RING_GET_RESPONSE(&fedata->ring, fedata->ring.rsp_prod_pvt++);
+   rsp->req_id = req->req_id;
+   rsp->cmd = req->cmd;
+   rsp->u.socket.id = req->u.socket.id;
+   rsp->ret = ret;
+
return 0;
 }
 
-- 
1.9.1



[PATCH v5 13/18] xen/pvcalls: implement release command

2017-06-22 Thread Stefano Stabellini
Release both active and passive sockets. For active sockets, make sure
to avoid possible conflicts with the ioworker reading/writing to those
sockets concurrently. Set map->release to let the ioworker know
atomically that the socket will be released soon, then wait until the
ioworker finishes (flush_work).

Unmap indexes pages and data rings.

Signed-off-by: Stefano Stabellini 
CC: boris.ostrov...@oracle.com
CC: jgr...@suse.com
---
 drivers/xen/pvcalls-back.c | 68 ++
 1 file changed, 68 insertions(+)

diff --git a/drivers/xen/pvcalls-back.c b/drivers/xen/pvcalls-back.c
index 5b2ef60..f6f88ce 100644
--- a/drivers/xen/pvcalls-back.c
+++ b/drivers/xen/pvcalls-back.c
@@ -269,12 +269,80 @@ static int pvcalls_back_release_active(struct 
xenbus_device *dev,
   struct pvcalls_fedata *fedata,
   struct sock_mapping *map)
 {
+   disable_irq(map->irq);
+   if (map->sock->sk != NULL) {
+   write_lock_bh(&map->sock->sk->sk_callback_lock);
+   map->sock->sk->sk_user_data = NULL;
+   map->sock->sk->sk_data_ready = map->saved_data_ready;
+   write_unlock_bh(&map->sock->sk->sk_callback_lock);
+   }
+
+   atomic_set(&map->release, 1);
+   flush_work(&map->ioworker.register_work);
+
+   xenbus_unmap_ring_vfree(dev, map->bytes);
+   xenbus_unmap_ring_vfree(dev, (void *)map->ring);
+   unbind_from_irqhandler(map->irq, map);
+
+   sock_release(map->sock);
+   kfree(map);
+
+   return 0;
+}
+
+static int pvcalls_back_release_passive(struct xenbus_device *dev,
+   struct pvcalls_fedata *fedata,
+   struct sockpass_mapping *mappass)
+{
+   if (mappass->sock->sk != NULL) {
+   write_lock_bh(&mappass->sock->sk->sk_callback_lock);
+   mappass->sock->sk->sk_user_data = NULL;
+   mappass->sock->sk->sk_data_ready = mappass->saved_data_ready;
+   write_unlock_bh(&mappass->sock->sk->sk_callback_lock);
+   }
+   sock_release(mappass->sock);
+   flush_workqueue(mappass->wq);
+   destroy_workqueue(mappass->wq);
+   kfree(mappass);
+
return 0;
 }
 
 static int pvcalls_back_release(struct xenbus_device *dev,
struct xen_pvcalls_request *req)
 {
+   struct pvcalls_fedata *fedata;
+   struct sock_mapping *map, *n;
+   struct sockpass_mapping *mappass;
+   int ret = 0;
+   struct xen_pvcalls_response *rsp;
+
+   fedata = dev_get_drvdata(>dev);
+
+   down(&fedata->socket_lock);
+   list_for_each_entry_safe(map, n, &fedata->socket_mappings, list) {
+   if (map->id == req->u.release.id) {
+   list_del(&map->list);
+   up(&fedata->socket_lock);
+   ret = pvcalls_back_release_active(dev, fedata, map);
+   goto out;
+   }
+   }
+   mappass = radix_tree_lookup(&fedata->socketpass_mappings,
+   req->u.release.id);
+   if (mappass != NULL) {
+   radix_tree_delete(&fedata->socketpass_mappings, mappass->id);
+   up(&fedata->socket_lock);
+   ret = pvcalls_back_release_passive(dev, fedata, mappass);
+   } else
+   up(&fedata->socket_lock);
+
+out:
+   rsp = RING_GET_RESPONSE(&fedata->ring, fedata->ring.rsp_prod_pvt++);
+   rsp->req_id = req->req_id;
+   rsp->u.release.id = req->u.release.id;
+   rsp->cmd = req->cmd;
+   rsp->ret = ret;
return 0;
 }
 
-- 
1.9.1



[PATCH v5 06/18] xen/pvcalls: handle commands from the frontend

2017-06-22 Thread Stefano Stabellini
When the other end notifies us that there are commands to be read
(pvcalls_back_event), wake up the backend thread to parse the command.

The command ring works like most other Xen rings, so use the usual
ring macros to read and write to it. The functions implementing the
commands are empty stubs for now.

Signed-off-by: Stefano Stabellini 
CC: boris.ostrov...@oracle.com
CC: jgr...@suse.com
---
 drivers/xen/pvcalls-back.c | 119 +
 1 file changed, 119 insertions(+)

diff --git a/drivers/xen/pvcalls-back.c b/drivers/xen/pvcalls-back.c
index e4c2e46..437c2ad 100644
--- a/drivers/xen/pvcalls-back.c
+++ b/drivers/xen/pvcalls-back.c
@@ -51,12 +51,131 @@ struct pvcalls_fedata {
struct work_struct register_work;
 };
 
+static int pvcalls_back_socket(struct xenbus_device *dev,
+   struct xen_pvcalls_request *req)
+{
+   return 0;
+}
+
+static int pvcalls_back_connect(struct xenbus_device *dev,
+   struct xen_pvcalls_request *req)
+{
+   return 0;
+}
+
+static int pvcalls_back_release(struct xenbus_device *dev,
+   struct xen_pvcalls_request *req)
+{
+   return 0;
+}
+
+static int pvcalls_back_bind(struct xenbus_device *dev,
+struct xen_pvcalls_request *req)
+{
+   return 0;
+}
+
+static int pvcalls_back_listen(struct xenbus_device *dev,
+  struct xen_pvcalls_request *req)
+{
+   return 0;
+}
+
+static int pvcalls_back_accept(struct xenbus_device *dev,
+  struct xen_pvcalls_request *req)
+{
+   return 0;
+}
+
+static int pvcalls_back_poll(struct xenbus_device *dev,
+struct xen_pvcalls_request *req)
+{
+   return 0;
+}
+
+static int pvcalls_back_handle_cmd(struct xenbus_device *dev,
+  struct xen_pvcalls_request *req)
+{
+   int ret = 0;
+
+   switch (req->cmd) {
+   case PVCALLS_SOCKET:
+   ret = pvcalls_back_socket(dev, req);
+   break;
+   case PVCALLS_CONNECT:
+   ret = pvcalls_back_connect(dev, req);
+   break;
+   case PVCALLS_RELEASE:
+   ret = pvcalls_back_release(dev, req);
+   break;
+   case PVCALLS_BIND:
+   ret = pvcalls_back_bind(dev, req);
+   break;
+   case PVCALLS_LISTEN:
+   ret = pvcalls_back_listen(dev, req);
+   break;
+   case PVCALLS_ACCEPT:
+   ret = pvcalls_back_accept(dev, req);
+   break;
+   case PVCALLS_POLL:
+   ret = pvcalls_back_poll(dev, req);
+   break;
+   default:
+   ret = -ENOTSUPP;
+   break;
+   }
+   return ret;
+}
+
 static void pvcalls_back_work(struct work_struct *work)
 {
+   struct pvcalls_fedata *fedata = container_of(work,
+   struct pvcalls_fedata, register_work);
+   int notify, notify_all = 0, more = 1;
+   struct xen_pvcalls_request req;
+   struct xenbus_device *dev = fedata->dev;
+
+   while (more) {
+   while (RING_HAS_UNCONSUMED_REQUESTS(&fedata->ring)) {
+   RING_COPY_REQUEST(&fedata->ring,
+ fedata->ring.req_cons++,
+ &req);
+
+   if (!pvcalls_back_handle_cmd(dev, &req)) {
+   RING_PUSH_RESPONSES_AND_CHECK_NOTIFY(
+   &fedata->ring, notify);
+   notify_all += notify;
+   }
+   }
+
+   if (notify_all)
+   notify_remote_via_irq(fedata->irq);
+
+   RING_FINAL_CHECK_FOR_REQUESTS(&fedata->ring, more);
+   }
 }
 
 static irqreturn_t pvcalls_back_event(int irq, void *dev_id)
 {
+   struct xenbus_device *dev = dev_id;
+   struct pvcalls_fedata *fedata = NULL;
+
+   if (dev == NULL)
+   return IRQ_HANDLED;
+
+   fedata = dev_get_drvdata(&dev->dev);
+   if (fedata == NULL)
+   return IRQ_HANDLED;
+
+   /*
+* TODO: a small theoretical race exists if we try to queue work
+* after pvcalls_back_work checked for final requests and before
+* it returns. The queuing will fail, and pvcalls_back_work
+* won't do the work because it is about to return. In that
+* case, we lose the notification.
+*/
+   queue_work(fedata->wq, &fedata->register_work);
+
return IRQ_HANDLED;
 }
 
-- 
1.9.1



[PATCH v5 17/18] xen/pvcalls: implement write

2017-06-22 Thread Stefano Stabellini
When the other end notifies us that there is data to be written
(pvcalls_back_conn_event), increment the io and write counters, and
schedule the ioworker.

Implement the write function called by ioworker by reading the data from
the data ring, writing it to the socket by calling inet_sendmsg.

Set out_error on error.

Signed-off-by: Stefano Stabellini 
CC: boris.ostrov...@oracle.com
CC: jgr...@suse.com
---
 drivers/xen/pvcalls-back.c | 74 +-
 1 file changed, 73 insertions(+), 1 deletion(-)

diff --git a/drivers/xen/pvcalls-back.c b/drivers/xen/pvcalls-back.c
index ccceabd..424dcac 100644
--- a/drivers/xen/pvcalls-back.c
+++ b/drivers/xen/pvcalls-back.c
@@ -179,7 +179,66 @@ static void pvcalls_conn_back_read(void *opaque)
 
 static int pvcalls_conn_back_write(struct sock_mapping *map)
 {
-   return 0;
+   struct pvcalls_data_intf *intf = map->ring;
+   struct pvcalls_data *data = &map->data;
+   struct msghdr msg;
+   struct kvec vec[2];
+   RING_IDX cons, prod, size, array_size;
+   int ret;
+
+   cons = intf->out_cons;
+   prod = intf->out_prod;
+   /* read the indexes before dealing with the data */
+   virt_mb();
+
+   array_size = XEN_FLEX_RING_SIZE(map->ring_order);
+   size = pvcalls_queued(prod, cons, array_size);
+   if (size == 0)
+   return 0;
+
+   memset(&msg, 0, sizeof(msg));
+   msg.msg_flags |= MSG_DONTWAIT;
+   msg.msg_iter.type = ITER_KVEC|READ;
+   msg.msg_iter.count = size;
+   if (pvcalls_mask(prod, array_size) > pvcalls_mask(cons, array_size)) {
+   vec[0].iov_base = data->out + pvcalls_mask(cons, array_size);
+   vec[0].iov_len = size;
+   msg.msg_iter.kvec = vec;
+   msg.msg_iter.nr_segs = 1;
+   } else {
+   vec[0].iov_base = data->out + pvcalls_mask(cons, array_size);
+   vec[0].iov_len = array_size - pvcalls_mask(cons, array_size);
+   vec[1].iov_base = data->out;
+   vec[1].iov_len = size - vec[0].iov_len;
+   msg.msg_iter.kvec = vec;
+   msg.msg_iter.nr_segs = 2;
+   }
+
+   atomic_set(&map->write, 0);
+   ret = inet_sendmsg(map->sock, &msg, size);
+   if (ret == -EAGAIN || (ret >= 0 && ret < size)) {
+   atomic_inc(&map->write);
+   atomic_inc(&map->io);
+   }
+   if (ret == -EAGAIN)
+   return ret;
+
+   /* write the data, then update the indexes */
+   virt_wmb();
+   if (ret < 0) {
+   intf->out_error = ret;
+   } else {
+   intf->out_error = 0;
+   intf->out_cons = cons + ret;
+   prod = intf->out_prod;
+   }
+   /* update the indexes, then notify the other end */
+   virt_wmb();
+   if (prod != cons + ret)
+   atomic_inc(&map->write);
+   notify_remote_via_irq(map->irq);
+
+   return ret;
 }
 
 static void pvcalls_back_ioworker(struct work_struct *work)
@@ -849,6 +908,19 @@ static irqreturn_t pvcalls_back_event(int irq, void 
*dev_id)
 
 static irqreturn_t pvcalls_back_conn_event(int irq, void *sock_map)
 {
+   struct sock_mapping *map = sock_map;
+   struct pvcalls_ioworker *iow;
+
+   if (map == NULL || map->sock == NULL || map->sock->sk == NULL ||
+   map->sock->sk->sk_user_data != map)
+   return IRQ_HANDLED;
+
+   iow = &map->ioworker;
+
+   atomic_inc(&iow->write);
+   atomic_inc(&iow->io);
+   queue_work(iow->wq, &iow->register_work);
+
return IRQ_HANDLED;
 }
 
-- 
1.9.1




[PATCH v5 04/18] xen/pvcalls: xenbus state handling

2017-06-22 Thread Stefano Stabellini
Introduce the code to handle xenbus state changes.

Implement the probe function for the pvcalls backend. Write the
supported versions, max-page-order and function-calls nodes to xenstore,
as required by the protocol.

Introduce stub functions for disconnecting/connecting to a frontend.

Signed-off-by: Stefano Stabellini 
Reviewed-by: Boris Ostrovsky 
CC: boris.ostrov...@oracle.com
CC: jgr...@suse.com
---
 drivers/xen/pvcalls-back.c | 152 +
 1 file changed, 152 insertions(+)

diff --git a/drivers/xen/pvcalls-back.c b/drivers/xen/pvcalls-back.c
index 9044cf2..7bce750 100644
--- a/drivers/xen/pvcalls-back.c
+++ b/drivers/xen/pvcalls-back.c
@@ -25,20 +25,172 @@
 #include 
 #include 
 
+#define PVCALLS_VERSIONS "1"
+#define MAX_RING_ORDER XENBUS_MAX_RING_GRANT_ORDER
+
 struct pvcalls_back_global {
struct list_head frontends;
struct semaphore frontends_lock;
 } pvcalls_back_global;
 
+static int backend_connect(struct xenbus_device *dev)
+{
+   return 0;
+}
+
+static int backend_disconnect(struct xenbus_device *dev)
+{
+   return 0;
+}
+
 static int pvcalls_back_probe(struct xenbus_device *dev,
  const struct xenbus_device_id *id)
 {
+   int err, abort;
+   struct xenbus_transaction xbt;
+
+again:
+   abort = 1;
+
+   err = xenbus_transaction_start();
+   if (err) {
+   pr_warn("%s cannot create xenstore transaction\n", __func__);
+   return err;
+   }
+
+   err = xenbus_printf(xbt, dev->nodename, "versions", "%s",
+   PVCALLS_VERSIONS);
+   if (err) {
+   pr_warn("%s write out 'version' failed\n", __func__);
+   goto abort;
+   }
+
+   err = xenbus_printf(xbt, dev->nodename, "max-page-order", "%u",
+   MAX_RING_ORDER);
+   if (err) {
+   pr_warn("%s write out 'max-page-order' failed\n", __func__);
+   goto abort;
+   }
+
+   err = xenbus_printf(xbt, dev->nodename, "function-calls",
+   XENBUS_FUNCTIONS_CALLS);
+   if (err) {
+   pr_warn("%s write out 'function-calls' failed\n", __func__);
+   goto abort;
+   }
+
+   abort = 0;
+abort:
+   err = xenbus_transaction_end(xbt, abort);
+   if (err) {
+   if (err == -EAGAIN && !abort)
+   goto again;
+   pr_warn("%s cannot complete xenstore transaction\n", __func__);
+   return err;
+   }
+
+   xenbus_switch_state(dev, XenbusStateInitWait);
+
return 0;
 }
 
+static void set_backend_state(struct xenbus_device *dev,
+ enum xenbus_state state)
+{
+   while (dev->state != state) {
+   switch (dev->state) {
+   case XenbusStateClosed:
+   switch (state) {
+   case XenbusStateInitWait:
+   case XenbusStateConnected:
+   xenbus_switch_state(dev, XenbusStateInitWait);
+   break;
+   case XenbusStateClosing:
+   xenbus_switch_state(dev, XenbusStateClosing);
+   break;
+   default:
+   __WARN();
+   }
+   break;
+   case XenbusStateInitWait:
+   case XenbusStateInitialised:
+   switch (state) {
+   case XenbusStateConnected:
+   backend_connect(dev);
+   xenbus_switch_state(dev, XenbusStateConnected);
+   break;
+   case XenbusStateClosing:
+   case XenbusStateClosed:
+   xenbus_switch_state(dev, XenbusStateClosing);
+   break;
+   default:
+   __WARN();
+   }
+   break;
+   case XenbusStateConnected:
+   switch (state) {
+   case XenbusStateInitWait:
+   case XenbusStateClosing:
+   case XenbusStateClosed:
+   down(&pvcalls_back_global.frontends_lock);
+   backend_disconnect(dev);
+   up(&pvcalls_back_global.frontends_lock);
+   xenbus_switch_state(dev, XenbusStateClosing);
+   break;
+   default:
+   __WARN();
+   }
+   break;
+   case XenbusStateClosing:
+   switch (state) {
+   case XenbusStateInitWait:
+

Re: [PATCH v4 0/5] Add support for the ARMv8.2 Statistical Profiling Extension

2017-06-22 Thread Kim Phillips
On Wed, 21 Jun 2017 16:31:09 +0100
Will Deacon  wrote:

> On Thu, Jun 15, 2017 at 10:57:35AM -0500, Kim Phillips wrote:
> > On Mon, 12 Jun 2017 11:20:48 -0500
> > Kim Phillips  wrote:
> > 
> > > On Mon, 12 Jun 2017 12:08:23 +0100
> > > Mark Rutland  wrote:
> > > 
> > > > On Mon, Jun 05, 2017 at 04:22:52PM +0100, Will Deacon wrote:
> > > > > This is the sixth posting of the patches previously posted here:
> > ...
> > > > Kim, do you have any version of the userspace side that we could look
> > > > at?
> > > > 
> > > > For review, it would be really helpful to have something that can poke
> > > > the PMU, even if it's incomplete or lacking polish.
> > > 
> > > Here's the latest push, based on a couple of prior versions of this
> > > driver:
> > > 
> > > http://linux-arm.org/git?p=linux-kp.git;a=shortlog;h=refs/heads/armspev0.1
> > > 
> > > I don't seem to be able to get any SPE data output after rebasing on
> > > this version of the driver.  Still don't know why at the moment...
> > 
> > Bisected to commit e38ba76deef "perf tools: force uncore events to
> > system wide monitoring".  So, using record with specifying a -C
> >  explicitly now produces SPE data, but only a couple of valid
> > records at the beginning of each buffer; the rest is filled with
> > PADding (0's).
> > 
> > I see Mark's latest comments have found a possible issue in the perf
> > aux buffer handling code in the driver, and that the driver does some
> > memset of padding (0's) itself; could that be responsible for the above
> > behaviour?
> 
> Possibly. Do you know how big you're mapping the aux buffer

4MiB.

> and what (if any) value you're passing as aux_watermark?

None passed, but it looks like 4KiB was used since the AUXTRACE size
was 4MiB - 4KiB.

I'm not seeing the issue with a simple bts-based version I'm
working on...yet.  We can revisit if I'm able to reproduce again; the
problem could have been on the userspace side.

Meanwhile, when using fvp-base.dtb, my model setup stops booting the
kernel after "smp: Bringing up secondary CPUs ...".  If I however take
the second SPE node from fvp-base.dts and add it to my working device
tree, I get this during the driver probe:

[1.042063] arm_spe_pmu spe-pmu@0: probed for CPUs 0-7 [max_record_sz 64, 
align 1, features 0xf]
[1.043582] arm_spe_pmu spe-pmu@1: probed for CPUs 0-7 [max_record_sz 64, 
align 1, features 0xf]
[1.043631] genirq: Flags mismatch irq 6. 4404 (arm_spe_pmu) vs. 
4404 (arm_spe_pmu)
[1.043784] arm_spe_pmu: probe of spe-pmu@1 failed with error -16

spe-pmu@0 is useable, but doubt spe-pmu@1 is.  btw, that 16 is EBUSY
"Device or resource busy".

Kim


Re: [PATCH] docs-rst: fix broken links to dynamic-debug-howto in kernel-parameters

2017-06-22 Thread Jonathan Corbet
On Wed, 14 Jun 2017 12:24:12 +0200
Steffen Maier  wrote:

> Another place in lib/Kconfig.debug was already fixed in commit f8998c226587
> ("lib/Kconfig.debug: correct documentation paths").

Applied to the docs tree, thanks.

jon


Re: [PATCH 2/2] selftests/ftrace: Update multiple kprobes test for powerpc

2017-06-22 Thread Naveen N. Rao
On 2017/06/22 06:07PM, Masami Hiramatsu wrote:
> On Thu, 22 Jun 2017 00:20:28 +0530
> "Naveen N. Rao"  wrote:
> 
> > KPROBES_ON_FTRACE is only available on powerpc64le. Update comment to
> > clarify this.
> > 
> > Also, we should use an offset of 8 to ensure that the probe does not
> > fall on ftrace location. The current offset of 4 will fall before the
> > function local entry point and won't fire, while an offset of 12 or 16
> > will fall on ftrace location. Offset 8 is currently guaranteed to not be
> > the ftrace location.
> 
> OK, these part seems good to me.
> 
> > 
> > Finally, do not filter out symbols with a dot. Powerpc Elfv1 uses dot
> > prefix for all functions and this prevents us from testing some of those
> > symbols. Furthermore, with the patch to derive event names properly in
> > the presence of ':' and '.', such names are accepted by kprobe_events
> > and constitutes a good test for those symbols.
> 
> Hmm, the reason why I added such filter was to avoid symbols including
> gcc-generated suffixes like as .constprop or .isra etc.

I see.

I do wonder -- is there a problem if we try probing those symbols? On my 
local x86 vm, I don't see an issue probing them, especially with the 
previous patch to enable probing with symbols having a '.' or ':'.

Furthermore, since this is for testing kprobe_events, I feel it is good 
to try probing those symbols too to catch any weird errors we may hit.

Thanks for the review!
- Naveen


> So if the Powerpc Elfv1 use dot prefix, that is OK, in that case,
> could you update the filter as "^.*\\..*" ?
> 
> Thank you,
> 
> > 
> > Signed-off-by: Naveen N. Rao 
> > ---
> >  tools/testing/selftests/ftrace/test.d/kprobe/multiple_kprobes.tc | 8 
> > 
> >  1 file changed, 4 insertions(+), 4 deletions(-)
> > 
> > diff --git 
> > a/tools/testing/selftests/ftrace/test.d/kprobe/multiple_kprobes.tc 
> > b/tools/testing/selftests/ftrace/test.d/kprobe/multiple_kprobes.tc
> > index f4d1ff785d67..d209c071b2c0 100644
> > --- a/tools/testing/selftests/ftrace/test.d/kprobe/multiple_kprobes.tc
> > +++ b/tools/testing/selftests/ftrace/test.d/kprobe/multiple_kprobes.tc
> > @@ -2,16 +2,16 @@
> >  # description: Register/unregister many kprobe events
> >  
> >  # ftrace fentry skip size depends on the machine architecture.
> > -# Currently HAVE_KPROBES_ON_FTRACE defined on x86 and powerpc
> > +# Currently HAVE_KPROBES_ON_FTRACE defined on x86 and powerpc64le
> >  case `uname -m` in
> >x86_64|i[3456]86) OFFS=5;;
> > -  ppc*) OFFS=4;;
> > +  ppc64le) OFFS=8;;
> >*) OFFS=0;;
> >  esac
> >  
> >  echo "Setup up to 256 kprobes"
> > -grep t /proc/kallsyms | cut -f3 -d" " | grep -v .*\\..* | \
> > -head -n 256 | while read i; do echo p ${i}+${OFFS} ; done > kprobe_events 
> > ||:
> > +grep t /proc/kallsyms | cut -f3 -d" " | head -n 256 | \
> > +while read i; do echo p ${i}+${OFFS} ; done > kprobe_events ||:
> >  
> >  echo 1 > events/kprobes/enable
> >  echo 0 > events/kprobes/enable
> > -- 
> > 2.13.1
> > 
> 
> 
> -- 
> Masami Hiramatsu 
> 



Re: [PATCH v3.1 1/3] drm/rockchip: dw_hdmi: add RK3399 HDMI support

2017-06-22 Thread Rob Herring
On Thu, Jun 22, 2017 at 2:17 AM, Mark Yao  wrote:
> RK3399 and RK3288 shared the same HDMI IP controller, only some light
> difference with GRF configure.
>
> Signed-off-by: Yakir Yang 
> Signed-off-by: Mark Yao 
> ---
> Changes in v3.1:
>   Correct documentation compatible's format(Rob Herring).
> Changes in v3:
>   remove hdmi_phy_configure_dwc_hdmi_3d_tx callbak.
>
> Changes in v2:
>   reuse hdmi_phy_configure_dwc_hdmi_3d_tx for phy configure
>   fixup Documentation
>
>  .../bindings/display/rockchip/dw_hdmi-rockchip.txt |  4 +-

Acked-by: Rob Herring 

>  drivers/gpu/drm/rockchip/dw_hdmi-rockchip.c| 67 
> ++
>  2 files changed, 59 insertions(+), 12 deletions(-)


[tip:irq/core] genirq: Introduce IRQD_SINGLE_TARGET flag

2017-06-22 Thread tip-bot for Thomas Gleixner
Commit-ID:  d52dd44175bd27ad9d8e34a994fb80877c1f6d61
Gitweb: http://git.kernel.org/tip/d52dd44175bd27ad9d8e34a994fb80877c1f6d61
Author: Thomas Gleixner 
AuthorDate: Tue, 20 Jun 2017 01:37:52 +0200
Committer:  Thomas Gleixner 
CommitDate: Thu, 22 Jun 2017 18:21:25 +0200

genirq: Introduce IRQD_SINGLE_TARGET flag

Many interrupt chips allow only a single CPU as interrupt target. The core
code has no knowledge about that. That's unfortunate, because with that
knowledge it could avoid trying to re-add a newly online CPU to the
effective affinity mask.

Add the status flag and the necessary accessors.

Signed-off-by: Thomas Gleixner 
Cc: Jens Axboe 
Cc: Marc Zyngier 
Cc: Michael Ellerman 
Cc: Keith Busch 
Cc: Peter Zijlstra 
Cc: Christoph Hellwig 
Link: http://lkml.kernel.org/r/20170619235447.352343...@linutronix.de

---
 include/linux/irq.h  | 16 
 kernel/irq/debugfs.c |  1 +
 2 files changed, 17 insertions(+)

diff --git a/include/linux/irq.h b/include/linux/irq.h
index 19cea63..00db35b 100644
--- a/include/linux/irq.h
+++ b/include/linux/irq.h
@@ -209,6 +209,7 @@ struct irq_data {
  * IRQD_IRQ_STARTED- Startup state of the interrupt
  * IRQD_MANAGED_SHUTDOWN   - Interrupt was shutdown due to empty affinity
  *   mask. Applies only to affinity managed irqs.
+ * IRQD_SINGLE_TARGET  - IRQ allows only a single affinity target
  */
 enum {
IRQD_TRIGGER_MASK   = 0xf,
@@ -228,6 +229,7 @@ enum {
IRQD_AFFINITY_MANAGED   = (1 << 21),
IRQD_IRQ_STARTED= (1 << 22),
IRQD_MANAGED_SHUTDOWN   = (1 << 23),
+   IRQD_SINGLE_TARGET  = (1 << 24),
 };
 
 #define __irqd_to_state(d) ACCESS_PRIVATE((d)->common, state_use_accessors)
@@ -276,6 +278,20 @@ static inline bool irqd_is_level_type(struct irq_data *d)
return __irqd_to_state(d) & IRQD_LEVEL;
 }
 
+/*
+ * Must only be called from irqchip.irq_set_affinity() or low level
+ * hierarchy domain allocation functions.
+ */
+static inline void irqd_set_single_target(struct irq_data *d)
+{
+   __irqd_to_state(d) |= IRQD_SINGLE_TARGET;
+}
+
+static inline bool irqd_is_single_target(struct irq_data *d)
+{
+   return __irqd_to_state(d) & IRQD_SINGLE_TARGET;
+}
+
 static inline bool irqd_is_wakeup_set(struct irq_data *d)
 {
return __irqd_to_state(d) & IRQD_WAKEUP_STATE;
diff --git a/kernel/irq/debugfs.c b/kernel/irq/debugfs.c
index edbef25..dbd6e78 100644
--- a/kernel/irq/debugfs.c
+++ b/kernel/irq/debugfs.c
@@ -105,6 +105,7 @@ static const struct irq_bit_descr irqdata_states[] = {
BIT_MASK_DESCR(IRQD_PER_CPU),
BIT_MASK_DESCR(IRQD_NO_BALANCING),
 
+   BIT_MASK_DESCR(IRQD_SINGLE_TARGET),
BIT_MASK_DESCR(IRQD_MOVE_PCNTXT),
BIT_MASK_DESCR(IRQD_AFFINITY_SET),
BIT_MASK_DESCR(IRQD_SETAFFINITY_PENDING),


[tip:irq/core] genirq/cpuhotplug: Handle managed IRQs on CPU hotplug

2017-06-22 Thread tip-bot for Thomas Gleixner
Commit-ID:  c5cb83bb337c25caae995d992d1cdf9b317f83de
Gitweb: http://git.kernel.org/tip/c5cb83bb337c25caae995d992d1cdf9b317f83de
Author: Thomas Gleixner 
AuthorDate: Tue, 20 Jun 2017 01:37:51 +0200
Committer:  Thomas Gleixner 
CommitDate: Thu, 22 Jun 2017 18:21:25 +0200

genirq/cpuhotplug: Handle managed IRQs on CPU hotplug

If a CPU goes offline, interrupts affine to the CPU are moved away. If the
outgoing CPU is the last CPU in the affinity mask, the migration code breaks
the affinity and sets it to all online CPUs.

This is a problem for affinity managed interrupts as CPU hotplug is often
used for power management purposes. If the affinity is broken, the
interrupt is no longer affine to the CPUs to which it was allocated.

The affinity spreading allows laying out multi queue devices in a way that
they are assigned to a single CPU or a group of CPUs. If the last CPU goes
offline, then the queue is no longer used, so the interrupt can be
shutdown gracefully and parked until one of the assigned CPUs comes online
again.

Add a graceful shutdown mechanism into the irq affinity breaking code path,
mark the irq as MANAGED_SHUTDOWN and leave the affinity mask unmodified.

In the online path, scan the active interrupts for managed interrupts and
if the interrupt is functional and the newly online CPU is part of the
affinity mask, restart the interrupt if it is marked MANAGED_SHUTDOWN or if
the interrupt is started up, try to add the CPU back to the effective
affinity mask.

Originally-by: Christoph Hellwig 
Signed-off-by: Thomas Gleixner 
Cc: Jens Axboe 
Cc: Marc Zyngier 
Cc: Michael Ellerman 
Cc: Keith Busch 
Cc: Peter Zijlstra 
Link: http://lkml.kernel.org/r/20170619235447.273417...@linutronix.de

---
 include/linux/cpuhotplug.h |  1 +
 include/linux/irq.h|  5 +
 kernel/cpu.c   |  5 +
 kernel/irq/cpuhotplug.c| 45 +
 4 files changed, 56 insertions(+)

diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
index 0f2a803..c15f22c 100644
--- a/include/linux/cpuhotplug.h
+++ b/include/linux/cpuhotplug.h
@@ -124,6 +124,7 @@ enum cpuhp_state {
CPUHP_AP_ONLINE_IDLE,
CPUHP_AP_SMPBOOT_THREADS,
CPUHP_AP_X86_VDSO_VMA_ONLINE,
+   CPUHP_AP_IRQ_AFFINITY_ONLINE,
CPUHP_AP_PERF_ONLINE,
CPUHP_AP_PERF_X86_ONLINE,
CPUHP_AP_PERF_X86_UNCORE_ONLINE,
diff --git a/include/linux/irq.h b/include/linux/irq.h
index 807042b..19cea63 100644
--- a/include/linux/irq.h
+++ b/include/linux/irq.h
@@ -500,7 +500,12 @@ extern int irq_set_affinity_locked(struct irq_data *data,
   const struct cpumask *cpumask, bool force);
 extern int irq_set_vcpu_affinity(unsigned int irq, void *vcpu_info);
 
+#if defined(CONFIG_SMP) && defined(CONFIG_GENERIC_IRQ_MIGRATION)
 extern void irq_migrate_all_off_this_cpu(void);
+extern int irq_affinity_online_cpu(unsigned int cpu);
+#else
+# define irq_affinity_online_cpu   NULL
+#endif
 
 #if defined(CONFIG_SMP) && defined(CONFIG_GENERIC_PENDING_IRQ)
 void irq_move_irq(struct irq_data *data);
diff --git a/kernel/cpu.c b/kernel/cpu.c
index cb51034..b86b32e 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -1252,6 +1252,11 @@ static struct cpuhp_step cpuhp_ap_states[] = {
.startup.single = smpboot_unpark_threads,
.teardown.single= NULL,
},
+   [CPUHP_AP_IRQ_AFFINITY_ONLINE] = {
+   .name   = "irq/affinity:online",
+   .startup.single = irq_affinity_online_cpu,
+   .teardown.single= NULL,
+   },
[CPUHP_AP_PERF_ONLINE] = {
.name   = "perf:online",
.startup.single = perf_event_init_cpu,
diff --git a/kernel/irq/cpuhotplug.c b/kernel/irq/cpuhotplug.c
index 0b093db..b7964e7 100644
--- a/kernel/irq/cpuhotplug.c
+++ b/kernel/irq/cpuhotplug.c
@@ -83,6 +83,15 @@ static bool migrate_one_irq(struct irq_desc *desc)
chip->irq_mask(d);
 
if (cpumask_any_and(affinity, cpu_online_mask) >= nr_cpu_ids) {
+   /*
+* If the interrupt is managed, then shut it down and leave
+* the affinity untouched.
+*/
+   if (irqd_affinity_is_managed(d)) {
+   irqd_set_managed_shutdown(d);
+   irq_shutdown(desc);
+   return false;
+   }
affinity = cpu_online_mask;
brokeaff = true;
}
@@ -129,3 +138,39 @@ void irq_migrate_all_off_this_cpu(void)
}
}
 }
+
+static void irq_restore_affinity_of_irq(struct irq_desc *desc, unsigned int 
cpu)
+{
+   struct irq_data *data = 

[tip:irq/core] genirq/cpuhotplug: Avoid irq affinity setting for single targets

2017-06-22 Thread tip-bot for Thomas Gleixner
Commit-ID:  8f31a9845db348f5781df47ce04c79e4cfe90016
Gitweb: http://git.kernel.org/tip/8f31a9845db348f5781df47ce04c79e4cfe90016
Author: Thomas Gleixner 
AuthorDate: Tue, 20 Jun 2017 01:37:53 +0200
Committer:  Thomas Gleixner 
CommitDate: Thu, 22 Jun 2017 18:21:25 +0200

genirq/cpuhotplug: Avoid irq affinity setting for single targets

Avoid trying to add a newly online CPU to the effective affinity mask of an
already started up interrupt. That interrupt will either stay on the already online
CPU or move around for no value.

Signed-off-by: Thomas Gleixner 
Cc: Jens Axboe 
Cc: Marc Zyngier 
Cc: Michael Ellerman 
Cc: Keith Busch 
Cc: Peter Zijlstra 
Cc: Christoph Hellwig 
Link: http://lkml.kernel.org/r/20170619235447.431321...@linutronix.de

---
 kernel/irq/cpuhotplug.c | 12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/kernel/irq/cpuhotplug.c b/kernel/irq/cpuhotplug.c
index b7964e7..aee8f7e 100644
--- a/kernel/irq/cpuhotplug.c
+++ b/kernel/irq/cpuhotplug.c
@@ -148,9 +148,17 @@ static void irq_restore_affinity_of_irq(struct irq_desc 
*desc, unsigned int cpu)
!irq_data_get_irq_chip(data) || !cpumask_test_cpu(cpu, affinity))
return;
 
-   if (irqd_is_managed_and_shutdown(data))
+   if (irqd_is_managed_and_shutdown(data)) {
irq_startup(desc, IRQ_RESEND, IRQ_START_COND);
-   else
+   return;
+   }
+
+   /*
+* If the interrupt can only be directed to a single target
+* CPU then it is already assigned to a CPU in the affinity
+* mask. No point in trying to move it around.
+*/
+   if (!irqd_is_single_target(data))
irq_set_affinity_locked(data, affinity, false);
 }
 


Re: [PATCH v3 3/3] perf: xgene: Add support for SoC PMU version 3

2017-06-22 Thread Mark Rutland
Hi Hoan,

This largely looks good; I have one minor comment.

On Tue, Jun 06, 2017 at 11:02:26AM -0700, Hoan Tran wrote:
>  static inline void
> +xgene_pmu_write_counter64(struct xgene_pmu_dev *pmu_dev, int idx, u64 val)
> +{
> + u32 cnt_lo, cnt_hi;
> +
> + cnt_hi = upper_32_bits(val);
> + cnt_lo = lower_32_bits(val);
> +
> + /* v3 has 64-bit counter registers composed by 2 32-bit registers */
> + xgene_pmu_write_counter32(pmu_dev, 2 * idx, cnt_lo);
> + xgene_pmu_write_counter32(pmu_dev, 2 * idx + 1, cnt_hi);
> +}

For this to be atomic, we need to disable the counters for the duration
of the IRQ handler, which we don't do today.

Regardless, we should do that to ensure that groups are self-consistent.

i.e. in xgene_pmu_isr() we should call ops->stop_counters() just after
taking the pmu lock, and we should call ops->start_counters() just
before releasing it.
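
For reference, a minimal sketch of that suggestion -- treat the exact
argument the ops take and the overflow helper as assumptions on my
part, not the final fixup:

static irqreturn_t xgene_pmu_isr(int irq, void *dev_id)
{
	struct xgene_pmu *xgene_pmu = dev_id;
	unsigned long flags;

	raw_spin_lock_irqsave(&xgene_pmu->lock, flags);

	/* Freeze all counters so the split 64-bit register accesses are
	 * atomic and counter groups stay self-consistent. */
	xgene_pmu->ops->stop_counters(xgene_pmu);

	__xgene_pmu_isr(xgene_pmu);	/* hypothetical overflow handling */

	xgene_pmu->ops->start_counters(xgene_pmu);
	raw_spin_unlock_irqrestore(&xgene_pmu->lock, flags);

	return IRQ_HANDLED;
}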

With that:

Acked-by: Mark Rutland 

Thanks,
Mark.


Re: [PATCH v5 0/5] stmmac: pci: Refactor DMI probing

2017-06-22 Thread David Miller
From: Jan Kiszka 
Date: Thu, 22 Jun 2017 19:43:51 +0200

> On 2017-06-22 19:40, David Miller wrote:
>> From: Jan Kiszka 
>> Date: Thu, 22 Jun 2017 08:17:56 +0200
>> 
>>> Some cleanups of the way we probe DMI platforms in the driver. Reduces
>>> a bit of open-coding and makes the logic easier reusable for any
>>> potential DMI platform != Quark.
>>>
>>> Tested on IOT2000 and Galileo Gen2.
>>>
>>> Changes in v5:
>>>  - fixed a remaining issue in patch 5
>>>  - dropped patch 6 for now
>> 
>> Series applied to net-next.
>> 
>> Any chance the DMI table can be marked const as well?
>> 
> 
> Hmm, they are all const - or which one do you mean?

Indeed, they are, I misread the patches.

Nothing to see here, move along :)


[PATCH 2/2] drm/stm: Fixup for "drm/stm: ltdc: Add panel-bridge support"

2017-06-22 Thread Eric Anholt
Signed-off-by: Eric Anholt 
---

This fixup would be squashed into patch 1 of your series.
 drivers/gpu/drm/stm/ltdc.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/stm/ltdc.c b/drivers/gpu/drm/stm/ltdc.c
index 7d7e889f09c3..d1d28348512b 100644
--- a/drivers/gpu/drm/stm/ltdc.c
+++ b/drivers/gpu/drm/stm/ltdc.c
@@ -953,7 +953,8 @@ int ltdc_load(struct drm_device *ddev)
bridge = drm_panel_bridge_add(panel, DRM_MODE_CONNECTOR_DPI);
if (IS_ERR(bridge)) {
DRM_ERROR("Failed to create panel-bridge\n");
-   return PTR_ERR(bridge);
+   ret = PTR_ERR(bridge);
+   goto err;
}
ldev->is_panel_bridge = true;
}
-- 
2.11.0



[PATCH 1/2] drm/stm: Fix leak of pixel clock enable in some error paths.

2017-06-22 Thread Eric Anholt
The clock gets enabled early on in init, since it's required in order
to read registers.  If only devm_clk_prepare_enable() was a thing!
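
For what it's worth, a rough sketch of such a helper can be built from
devm_add_action_or_reset(); the names here are hypothetical and not
part of this patch:

static void ltdc_clk_off(void *clk)
{
	clk_disable_unprepare(clk);
}

static int ltdc_devm_clk_prepare_enable(struct device *dev, struct clk *clk)
{
	int ret = clk_prepare_enable(clk);

	if (ret)
		return ret;
	/* Undone automatically on probe failure or driver detach */
	return devm_add_action_or_reset(dev, ltdc_clk_off, clk);
}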

Signed-off-by: Eric Anholt 
---

If you like, I would slip this fixup in before patch 1 of your series.

 drivers/gpu/drm/stm/ltdc.c | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/stm/ltdc.c b/drivers/gpu/drm/stm/ltdc.c
index 8aa05860029f..7d7e889f09c3 100644
--- a/drivers/gpu/drm/stm/ltdc.c
+++ b/drivers/gpu/drm/stm/ltdc.c
@@ -908,13 +908,15 @@ int ltdc_load(struct drm_device *ddev)
 
	if (of_address_to_resource(np, 0, &res)) {
DRM_ERROR("Unable to get resource\n");
-   return -ENODEV;
+   ret = -ENODEV;
+   goto err;
}
 
	ldev->regs = devm_ioremap_resource(dev, &res);
if (IS_ERR(ldev->regs)) {
DRM_ERROR("Unable to get ltdc registers\n");
-   return PTR_ERR(ldev->regs);
+   ret = PTR_ERR(ldev->regs);
+   goto err;
}
 
for (i = 0; i < MAX_IRQ; i++) {
@@ -927,7 +929,7 @@ int ltdc_load(struct drm_device *ddev)
dev_name(dev), ddev);
if (ret) {
DRM_ERROR("Failed to register LTDC interrupt\n");
-   return ret;
+   goto err;
}
}
 
@@ -942,7 +944,7 @@ int ltdc_load(struct drm_device *ddev)
if (ret) {
DRM_ERROR("hardware identifier (0x%08x) not supported!\n",
  ldev->caps.hw_version);
-   return ret;
+   goto err;
}
 
DRM_INFO("ltdc hw version 0x%08x - ready\n", ldev->caps.hw_version);
-- 
2.11.0



Re: [PATCH 3/4] drm/vc4: Use the atomic state's commit workqueue.

2017-06-22 Thread Eric Anholt
Daniel Vetter  writes:

> On Wed, Jun 21, 2017 at 11:50:01AM -0700, Eric Anholt wrote:
>> Now that we're using the atomic helpers for fence waits, we can use
>> the same codepath as drm_atomic_helper_commit() does for async,
>> getting rid of our custom vc4_commit struct.
>
> \o/
>
> On the series: Acked-by: Daniel Vetter 

Applied the reviews and acks and pushed.

Next, what would it take to get rid of the async_modeset semaphore so we
can use the core helpers even more?


signature.asc
Description: PGP signature


[PATCH 2/3] Enable capabilities of files from shared filesystem

2017-06-22 Thread Stefan Berger
The previous patch changed existing behavior in so far as the
capabilities of files from a shared filesystem or bind-mounted files
were hidden from a user namespace. This patch makes these capabilities
visible to the user namespace again, unless the user namespace has set
its own capabilities, which will hide them until those set by the
user namespace are removed.

Also the listing of xattrs is adjusted. To avoid double listing of
names of extended attributes, which can happen if the container and the
host for example have security.capability, we now check that we do not
add the same extended attribute to the list twice.

Signed-off-by: Stefan Berger 
Signed-off-by: Serge Hallyn 
Reviewed-by: Serge Hallyn 
---
 fs/xattr.c | 90 --
 1 file changed, 64 insertions(+), 26 deletions(-)

diff --git a/fs/xattr.c b/fs/xattr.c
index 64c4b40..045be85 100644
--- a/fs/xattr.c
+++ b/fs/xattr.c
@@ -339,6 +339,26 @@ xattr_rewrite_userns_xattr(char *name)
 }
 
 /*
+ * xattr_list_contains - check whether an xattr list already contains a needle
+ *
+ * @list: 0-byte separated strings
+ * @listlen : length of the list
+ * @needle  : the needle to search for
+ */
+static int
+xattr_list_contains(const char *list, size_t listlen, const char *needle)
+{
+   size_t o = 0;
+
+   while (o < listlen) {
+   if (!strcmp(&list[o], needle))
+   return true;
+   o += strlen(&list[o]) + 1;
+   }
+   return false;
+}
+
+/*
  * xattr_list_userns_rewrite - Rewrite list of xattr names for user namespaces
  * or determine needed size for attribute list
  * in case size == 0
@@ -377,12 +397,16 @@ xattr_list_userns_rewrite(char *list, ssize_t size, 
size_t list_maxlen)
if (!len)
break;
 
-   newname = xattr_rewrite_userns_xattr(name);
-   if (IS_ERR(newname)) {
-   d_off = PTR_ERR(newname);
-   goto out_free;
+   if (xattr_is_userns_supported(name, false) >= 0)
+   newname = name;
+   else {
+   newname = xattr_rewrite_userns_xattr(name);
+   if (IS_ERR(newname)) {
+   d_off = PTR_ERR(newname);
+   goto out_free;
+   }
}
-   if (newname) {
+   if (newname && !xattr_list_contains(nlist, d_off, newname)) {
nlen = strlen(newname);
 
if (nlist) {
@@ -413,13 +437,19 @@ xattr_list_userns_rewrite(char *list, ssize_t size, 
size_t list_maxlen)
  * security.foo to protect these extended attributes.
  *
  * Reading:
- * 1) Reading security.foo from a user namespace will read
- *security.foo@uid= of the parent user namespace instead with uid
- *being the mapping of root in that parent user namespace. An
- *exception is if root is mapped to uid 0 on the host, and in this case
- *we will read security.foo directly.
- *-> reading security.foo will read security.foo@uid=1000 for a uid
- *   mapping of root to 1000.
+ * 1a) Reading security.foo from a user namespace will read
+ * security.foo@uid= of the parent user namespace instead with uid
+ * being the mapping of root in that parent user namespace. An
+ * exception is if root is mapped to uid 0 on the host, and in this case
+ * we will read security.foo directly.
+ * -> reading security.foo will read security.foo@uid=1000 for a uid
+ *mapping of root to 1000.
+ *
+ * 1b) If security.foo@uid= is not available, an attempt is made to read
+ * the security.foo of the parent namespace. This procedure is repeated up to
+ * the init user namespace. This step only applies for reading of extended
+ * attributes and provides the same behavior as older systems where the
+ * host's extended attributes applied to user namespaces.
  *
  * 2) All security.foo@uid= with valid uid mappings in the user namespace
  *   can be read. The uid within the user namespace will be mapped to the
@@ -434,7 +464,7 @@ xattr_list_userns_rewrite(char *list, ssize_t size, size_t 
list_maxlen)
  * 3) No other security.foo* can be read.
  *
  * Writing and removing:
- * The same rules for reading apply to writing and removing.
+ * The same rules for reading apply to writing and removing, except for 1b).
  *
  * This function returns a buffer with either the original name or the
  * user namespace adjusted name of the extended attribute.
@@ -444,11 +474,12 @@ xattr_list_userns_rewrite(char *list, ssize_t size, 
size_t list_maxlen)
  * @is_write: whether this is for writing an xattr
  */
 char *
-xattr_userns_name(const char *fullname, const char *suffix)
+xattr_userns_name(const char *fullname, const char 

[PATCH 1/3] xattr: Enable security.capability in user namespaces

2017-06-22 Thread Stefan Berger
This patch enables security.capability in user namespaces but also
takes a more general approach to enabling extended attributes in user
namespaces.

The following rules describe the approach using security.foo as a
'user namespace enabled' extended attribute:

Reading of extended attributes:

1) Reading security.foo from a user namespace will read
   security.foo@uid= of the parent user namespace instead with uid
   being the mapping of root in that parent user namespace. An
   exception is if root is mapped to uid 0 on the host, and in this case
   we will read security.foo directly.
   --> reading security.foo will read security.foo@uid=1000 for uid
   mapping of root to 1000.

2) All security.foo@uid= with valid uid mapping in the user namespace
   can be read. The uid within the user namespace will be mapped to the
   corresponding uid on the host and that uid will be used in the name of
   the extended attribute.
   -> reading security.foo@uid=1 will read security.foo@uid=1001 for uid
  mapping of root to 1000, size of at least 2.

   All security.foo@uid= can be read (by root) on the host with values
   of <uid> also being subject to checking for valid mappings.

3) No other security.foo* can be read.

The same rules for reading apply to writing and removing of user
namespace enabled extended attributes.

When listing extended attributes of a file, only those are presented
to the user namespace that have a valid mapping. Besides that, names
of the extended attributes are adjusted to represent the mapping.
This means that if root is mapped to uid 1000 on the host, the
security.foo@uid=1000 will be listed as security.foo in the user
namespace, security.foo@uid=1001 becomes security.foo@uid=1 and so on.
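
To make the mapping concrete, here is a small userspace-flavoured
sketch (illustrative only; the in-kernel counterpart is the
xattr_write_uid() helper below):

#include <stdio.h>

/* Build the on-disk name for a namespace-visible xattr, following the
 * rules above: host uid 0 keeps the plain name, anything else gets an
 * @uid= suffix carrying the host-side uid. */
static void map_xattr_name(char *out, size_t len,
			   const char *name, unsigned int host_uid)
{
	if (host_uid == 0)
		snprintf(out, len, "%s", name);
	else
		snprintf(out, len, "%s@uid=%u", name, host_uid);
}

/* map_xattr_name(buf, sizeof(buf), "security.capability", 1000)
 * yields "security.capability@uid=1000" for a root->1000 mapping. */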

Signed-off-by: Stefan Berger 
Signed-off-by: Serge Hallyn 
Reviewed-by: Serge Hallyn 
---
 fs/xattr.c   | 433 ++-
 security/commoncap.c |  36 ++--
 security/selinux/hooks.c |   9 +-
 3 files changed, 462 insertions(+), 16 deletions(-)

diff --git a/fs/xattr.c b/fs/xattr.c
index 464c94b..64c4b40 100644
--- a/fs/xattr.c
+++ b/fs/xattr.c
@@ -133,11 +133,405 @@ xattr_permission(struct inode *inode, const char *name, 
int mask)
return inode_permission(inode, mask);
 }
 
+/*
+ * A list of extended attributes that are supported in user namespaces
+ */
+static const char *const userns_xattrs[] = {
+   XATTR_NAME_CAPS,
+   NULL
+};
+
+/*
+ * xattrs_is_userns_supported - Check whether an xattr is supported in userns
+ *
+ * @name:   full name of the extended attribute
+ * @prefix: do a prefix match (true) or a full match (false)
+ *
+ * This function returns < 0 if not supported, an index into userns_xattrs[]
+ * otherwise.
+ */
+static int
+xattr_is_userns_supported(const char *name, int prefix)
+{
+   int i;
+
+   if (!name)
+   return -1;
+
+   for (i = 0; userns_xattrs[i]; i++) {
+   if (prefix) {
+   if (!strncmp(userns_xattrs[i], name,
+strlen(userns_xattrs[i])))
+   return i;
+   } else {
+   if (!strcmp(userns_xattrs[i], name))
+   return i;
+   }
+   }
+   return -1;
+}
+
+/*
+ * xattr_write_uid - print a string in the format of "%s@uid=%u", which
+ *   includes a prefix string
+ *
+ * @uid: the uid
+ * @prefix:  prefix string; may be NULL
+ *
+ * This function returns a buffer with the string, or a NULL pointer in
+ * case of out-of-memory error.
+ */
+static char *
+xattr_write_uid(uid_t uid, const char *prefix)
+{
+   size_t buflen;
+   char *buffer;
+
+   buflen = sizeof("@uid=") - 1 + sizeof("4294967295") - 1 + 1;
+   if (prefix)
+   buflen += strlen(prefix);
+
+   buffer = kmalloc(buflen, GFP_KERNEL);
+   if (!buffer)
+   return NULL;
+
+   if (uid == 0)
+   *buffer = 0;
+   else
+   sprintf(buffer, "%s@uid=%u",
+   (prefix) ? prefix : "",
+   uid);
+
+   return buffer;
+}
+
+/*
+ * xattr_parse_uid_from_kuid - parse string in the format @uid=; consider
+ * user namespaces and check mappings
+ *
+ * @uidstr   : string in the format "@uid="
+ * @userns   : the user namespace to consult for uid mappings
+ * @n_uidstr : returned pointer holding the rewritten @uid= string with
+ * the uid remapped
+ *
+ * This function returns an error code or 0 in case of success. In case
+ * of success, 'n_uidstr' will hold a valid string.
+ */
+static int
+xattr_parse_uid_from_kuid(const char *uidstr, struct user_namespace *userns,
+ char **n_uidstr)
+{
+   int n;
+   uid_t muid, p_uid;
+   char d;
+   kuid_t tuid;
+
+   *n_uidstr = NULL;
+
+   n = 

[PATCH 3/3] Enable security.selinux in user namespaces

2017-06-22 Thread Stefan Berger
Before the current modifications, SELinux extended attributes were
visible inside the user namespace but changes in patch 1 hid them.
This patch enables security.selinux in user namespaces and allows
them to be written to in the same way as security.capability.

Signed-off-by: Stefan Berger 
---
 fs/xattr.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/xattr.c b/fs/xattr.c
index 045be85..37686ee 100644
--- a/fs/xattr.c
+++ b/fs/xattr.c
@@ -138,6 +138,7 @@ xattr_permission(struct inode *inode, const char *name, int 
mask)
  */
 static const char *const userns_xattrs[] = {
XATTR_NAME_CAPS,
+   XATTR_NAME_SELINUX,
NULL
 };
 
-- 
2.7.4



[PATCH 0/3] Enable namespaced file capabilities

2017-06-22 Thread Stefan Berger
The primary goal of this series of patches is to enable file capabilities
in user namespaces without affecting the file capabilities that are
effective on the host. This prevents an unprivileged user on the host
from mapping his own uid to root in a private namespace, writing the
xattr, and executing the file with privilege on the host.

We achieve this goal by writing extended attributes with a different
name when a user namespace is used. If for example the root user
in a user namespace writes the security.capability xattr, the name
of the xattr that is actually written is encoded as
security.capability@uid=1000 for root mapped to uid 1000 on the host.
When listing the xattrs on the host, the existing security.capability
as well as the security.capability@uid=1000 will be shown. Inside the
namespace only 'security.capability', with the value of
security.capability@uid=1000, is visible.

To maintain compatibility with existing behavior, the value of
security.capability of the host is shown inside the user namespace
once the security.capability of the user namespace has been removed
(which really removes security.capability@uid=1000). Writing to
an extended attribute inside a user namespace effectively hides the
extended attribute of the host.

The general framework that is established with these patches can
be applied to other extended attributes as well, such as security.ima
or the 'trusted.' prefix. Another extended attribute that needed to
be enabled here is 'security.selinux,' since otherwise this extended
attribute would not be shown anymore inside a user namespace.

Regards,
   Stefan & Serge


Stefan Berger (3):
  xattr: Enable security.capability in user namespaces
  Enable capabilities of files from shared filesystem
  Enable security.selinux in user namespaces

 fs/xattr.c   | 472 ++-
 security/commoncap.c |  36 +++-
 security/selinux/hooks.c |   9 +-
 3 files changed, 501 insertions(+), 16 deletions(-)

-- 
2.7.4



[PATCH 2/2] edac: pnd2_edac: remove useless variable assignment

2017-06-22 Thread Gustavo A. R. Silva
The value assigned to variable _ret_ at line 176 is overwritten
a few lines below before it can be used, which makes the
assignment useless.

Addresses-Coverity-ID: 1403730
Signed-off-by: Gustavo A. R. Silva 
---
 drivers/edac/pnd2_edac.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/edac/pnd2_edac.c b/drivers/edac/pnd2_edac.c
index c51ec73..d1de53e 100644
--- a/drivers/edac/pnd2_edac.c
+++ b/drivers/edac/pnd2_edac.c
@@ -173,7 +173,7 @@ static int apl_rd_reg(int port, int off, int op, void 
*data, size_t sz, char *na
edac_dbg(2, "Read %s port=%x off=%x op=%x\n", name, port, off, op);
switch (sz) {
case 8:
-   ret = sbi_send(port, off + 4, op, (u32 *)(data + 4));
+   sbi_send(port, off + 4, op, (u32 *)(data + 4));
/* fall through */
case 4:
ret = sbi_send(port, off, op, (u32 *)data);
-- 
2.5.0



Re: [PATCH] doc/kokr/howto: Only send regression fixes after -rc1

2017-06-22 Thread Jonathan Corbet
On Fri, 16 Jun 2017 20:34:32 +0900
SeongJae Park  wrote:

> This commit applies commit 388f9b20f98d ("Documentation/process/howto:
> Only send regression fixes after -rc1") to Korean translation.

Applied to the docs tree, thanks.

jon


Re: [PATCH v7 27/36] iommu/amd: Allow the AMD IOMMU to work with memory encryption

2017-06-22 Thread Tom Lendacky

On 6/22/2017 5:56 AM, Borislav Petkov wrote:

On Fri, Jun 16, 2017 at 01:54:59PM -0500, Tom Lendacky wrote:

The IOMMU is programmed with physical addresses for the various tables
and buffers that are used to communicate between the device and the
driver. When the driver allocates this memory it is encrypted. In order
for the IOMMU to access the memory as encrypted the encryption mask needs
to be included in these physical addresses during configuration.

The PTE entries created by the IOMMU should also include the encryption
mask so that when the device behind the IOMMU performs a DMA, the DMA
will be performed to encrypted memory.

Signed-off-by: Tom Lendacky 
---
  drivers/iommu/amd_iommu.c   |   30 --
  drivers/iommu/amd_iommu_init.c  |   34 --
  drivers/iommu/amd_iommu_proto.h |   10 ++
  drivers/iommu/amd_iommu_types.h |2 +-
  4 files changed, 55 insertions(+), 21 deletions(-)


Reviewed-by: Borislav Petkov 

Btw, I'm assuming the virt_to_phys() difference on SME systems is only
needed in a handful of places. Otherwise, I'd suggest changing the
virt_to_phys() function/macro directly. But I guess most of the places
need the real physical address without the enc bit.


Correct.

Thanks,
Tom





[tip:irq/core] x86/ioapic: Create named irq domain

2017-06-22 Thread tip-bot for Thomas Gleixner
Commit-ID:  1b604745c8474c76e5fd1682ea5b7da0a1c6d440
Gitweb: http://git.kernel.org/tip/1b604745c8474c76e5fd1682ea5b7da0a1c6d440
Author: Thomas Gleixner 
AuthorDate: Tue, 20 Jun 2017 01:37:07 +0200
Committer:  Thomas Gleixner 
CommitDate: Thu, 22 Jun 2017 18:21:09 +0200

x86/ioapic: Create named irq domain

Use the fwnode to create a named domain so diagnosis works, but only when
the ioapic is not device tree based.

Signed-off-by: Thomas Gleixner 
Cc: Jens Axboe 
Cc: Marc Zyngier 
Cc: Michael Ellerman 
Cc: Keith Busch 
Cc: Peter Zijlstra 
Cc: Christoph Hellwig 
Link: http://lkml.kernel.org/r/20170619235443.752782...@linutronix.de
Signed-off-by: Thomas Gleixner 

---
 arch/x86/kernel/apic/io_apic.c | 22 --
 1 file changed, 20 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index 347bb9f..444ae92 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -2223,6 +2223,8 @@ static int mp_irqdomain_create(int ioapic)
	struct ioapic *ip = &ioapics[ioapic];
	struct ioapic_domain_cfg *cfg = &ip->irqdomain_cfg;
struct mp_ioapic_gsi *gsi_cfg = mp_ioapic_gsi_routing(ioapic);
+   struct fwnode_handle *fn;
+   char *name = "IO-APIC";
 
if (cfg->type == IOAPIC_DOMAIN_INVALID)
return 0;
@@ -2233,9 +2235,25 @@ static int mp_irqdomain_create(int ioapic)
parent = irq_remapping_get_ir_irq_domain();
if (!parent)
parent = x86_vector_domain;
+   else
+   name = "IO-APIC-IR";
+
+   /* Handle device tree enumerated APICs proper */
+   if (cfg->dev) {
+   fn = of_node_to_fwnode(cfg->dev);
+   } else {
+   fn = irq_domain_alloc_named_id_fwnode(name, ioapic);
+   if (!fn)
+   return -ENOMEM;
+   }
+
+   ip->irqdomain = irq_domain_create_linear(fn, hwirqs, cfg->ops,
+(void *)(long)ioapic);
+
+   /* Release fw handle if it was allocated above */
+   if (!cfg->dev)
+   irq_domain_free_fwnode(fn);
 
-   ip->irqdomain = irq_domain_add_linear(cfg->dev, hwirqs, cfg->ops,
- (void *)(long)ioapic);
if (!ip->irqdomain)
return -ENOMEM;
 


[tip:irq/core] x86/htirq: Create named domain

2017-06-22 Thread tip-bot for Thomas Gleixner
Commit-ID:  5f432711ba94400fb39e9be81913ced81c141758
Gitweb: http://git.kernel.org/tip/5f432711ba94400fb39e9be81913ced81c141758
Author: Thomas Gleixner 
AuthorDate: Tue, 20 Jun 2017 01:37:08 +0200
Committer:  Thomas Gleixner 
CommitDate: Thu, 22 Jun 2017 18:21:09 +0200

x86/htirq: Create named domain

Use the fwnode to create a named domain so diagnosis works.

Mark the init function __init while at it.

Signed-off-by: Thomas Gleixner 
Cc: Jens Axboe 
Cc: Marc Zyngier 
Cc: Michael Ellerman 
Cc: Keith Busch 
Cc: Peter Zijlstra 
Cc: Christoph Hellwig 
Link: http://lkml.kernel.org/r/20170619235443.829047...@linutronix.de
Signed-off-by: Thomas Gleixner 

---
 arch/x86/kernel/apic/htirq.c | 21 -
 1 file changed, 16 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/apic/htirq.c b/arch/x86/kernel/apic/htirq.c
index ae50d34..56ccf93 100644
--- a/arch/x86/kernel/apic/htirq.c
+++ b/arch/x86/kernel/apic/htirq.c
@@ -150,16 +150,27 @@ static const struct irq_domain_ops htirq_domain_ops = {
.deactivate = htirq_domain_deactivate,
 };
 
-void arch_init_htirq_domain(struct irq_domain *parent)
+void __init arch_init_htirq_domain(struct irq_domain *parent)
 {
+   struct fwnode_handle *fn;
+
if (disable_apic)
return;
 
-   htirq_domain = irq_domain_add_tree(NULL, &htirq_domain_ops, NULL);
+   fn = irq_domain_alloc_named_fwnode("PCI-HT");
+   if (!fn)
+   goto warn;
+
+   htirq_domain = irq_domain_create_tree(fn, &htirq_domain_ops, NULL);
+   irq_domain_free_fwnode(fn);
if (!htirq_domain)
-   pr_warn("failed to initialize irqdomain for HTIRQ.\n");
-   else
-   htirq_domain->parent = parent;
+   goto warn;
+
+   htirq_domain->parent = parent;
+   return;
+
+warn:
+   pr_warn("Failed to initialize irqdomain for HTIRQ.\n");
 }
 
 int arch_setup_ht_irq(int idx, int pos, struct pci_dev *dev,


[tip:irq/core] x86/irq: Cleanup pending irq move in fixup_irqs()

2017-06-22 Thread tip-bot for Thomas Gleixner
Commit-ID:  8e7b632237df8b17526411d1d98f838580bb6aa3
Gitweb: http://git.kernel.org/tip/8e7b632237df8b17526411d1d98f838580bb6aa3
Author: Thomas Gleixner 
AuthorDate: Tue, 20 Jun 2017 01:37:20 +0200
Committer:  Thomas Gleixner 
CommitDate: Thu, 22 Jun 2017 18:21:13 +0200

x86/irq: Cleanup pending irq move in fixup_irqs()

If a CPU goes offline, the interrupts are migrated away, but an eventually
pending interrupt move, which has not yet been made effective, is kept
pending even if the outgoing CPU is the sole target of the pending affinity
mask. What's worse, the pending affinity mask is discarded even if it
would contain a valid subset of the online CPUs.

Use the newly introduced helper to:

 - Discard a pending move when the outgoing CPU is the only target in the
   pending mask.

 - Use the pending mask instead of the affinity mask to find a valid target
   for the CPU if the pending mask intersects with the online CPUs.

Signed-off-by: Thomas Gleixner 
Cc: Jens Axboe 
Cc: Marc Zyngier 
Cc: Michael Ellerman 
Cc: Keith Busch 
Cc: Peter Zijlstra 
Cc: Christoph Hellwig 
Link: http://lkml.kernel.org/r/20170619235444.774068...@linutronix.de

---
 arch/x86/kernel/irq.c | 25 +
 1 file changed, 21 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/irq.c b/arch/x86/kernel/irq.c
index f34fe74..9696007d 100644
--- a/arch/x86/kernel/irq.c
+++ b/arch/x86/kernel/irq.c
@@ -440,9 +440,9 @@ void fixup_irqs(void)
int ret;
 
for_each_irq_desc(irq, desc) {
+   const struct cpumask *affinity;
int break_affinity = 0;
int set_affinity = 1;
-   const struct cpumask *affinity;
 
if (!desc)
continue;
@@ -454,19 +454,36 @@ void fixup_irqs(void)
 
data = irq_desc_get_irq_data(desc);
affinity = irq_data_get_affinity_mask(data);
+
if (!irq_has_action(irq) || irqd_is_per_cpu(data) ||
cpumask_subset(affinity, cpu_online_mask)) {
+   irq_fixup_move_pending(desc, false);
	raw_spin_unlock(&desc->lock);
continue;
}
 
/*
-* Complete the irq move. This cpu is going down and for
-* non intr-remapping case, we can't wait till this interrupt
-* arrives at this cpu before completing the irq move.
+* Complete an eventually pending irq move cleanup. If this
+* interrupt was moved in hard irq context, then the
+* vectors need to be cleaned up. It can't wait until this
+* interrupt actually happens and this CPU was involved.
 */
irq_force_complete_move(desc);
 
+   /*
+* If there is a setaffinity pending, then try to reuse the
+* pending mask, so the last change of the affinity does
+* not get lost. If there is no move pending or the pending
+* mask does not contain any online CPU, use the current
+* affinity mask.
+*/
+   if (irq_fixup_move_pending(desc, true))
+   affinity = desc->pending_mask;
+
+   /*
+* If the mask does not contain an online CPU, break
+* affinity and use cpu_online_mask as fall back.
+*/
if (cpumask_any_and(affinity, cpu_online_mask) >= nr_cpu_ids) {
break_affinity = 1;
affinity = cpu_online_mask;
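
For context, the helper used above behaves roughly like this (a sketch
from reading the series, not the verbatim kernel code):

static bool irq_fixup_move_pending(struct irq_desc *desc, bool force_clear)
{
	struct irq_data *data = irq_desc_get_irq_data(desc);

	if (!irqd_is_setaffinity_pending(data))
		return false;

	/* The outgoing CPU was the only target: discard the move. */
	if (cpumask_empty(desc->pending_mask)) {
		irqd_clr_move_pending(data);
		return false;
	}
	if (force_clear)
		irqd_clr_move_pending(data);
	return true;
}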


[tip:irq/core] genirq: Move irq_fixup_move_pending() to core

2017-06-22 Thread tip-bot for Thomas Gleixner
Commit-ID:  36d84fb45140f151fa4e145381dbce5e5ffed24d
Gitweb: http://git.kernel.org/tip/36d84fb45140f151fa4e145381dbce5e5ffed24d
Author: Thomas Gleixner 
AuthorDate: Tue, 20 Jun 2017 01:37:34 +0200
Committer:  Thomas Gleixner 
CommitDate: Thu, 22 Jun 2017 18:21:19 +0200

genirq: Move irq_fixup_move_pending() to core

Now that x86 uses the generic code, the function declaration and inline
stub can move to the core internal header.

Signed-off-by: Thomas Gleixner 
Cc: Jens Axboe 
Cc: Marc Zyngier 
Cc: Michael Ellerman 
Cc: Keith Busch 
Cc: Peter Zijlstra 
Cc: Christoph Hellwig 
Link: http://lkml.kernel.org/r/20170619235445.928156...@linutronix.de

---
 include/linux/irq.h| 5 -
 kernel/irq/internals.h | 5 +
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/include/linux/irq.h b/include/linux/irq.h
index 299271a..2b7e5a7 100644
--- a/include/linux/irq.h
+++ b/include/linux/irq.h
@@ -492,15 +492,10 @@ extern void irq_migrate_all_off_this_cpu(void);
 void irq_move_irq(struct irq_data *data);
 void irq_move_masked_irq(struct irq_data *data);
 void irq_force_complete_move(struct irq_desc *desc);
-bool irq_fixup_move_pending(struct irq_desc *desc, bool force_clear);
 #else
 static inline void irq_move_irq(struct irq_data *data) { }
 static inline void irq_move_masked_irq(struct irq_data *data) { }
 static inline void irq_force_complete_move(struct irq_desc *desc) { }
-static inline bool irq_fixup_move_pending(struct irq_desc *desc, bool fclear)
-{
-   return false;
-}
 #endif
 
 extern int no_irq_affinity;
diff --git a/kernel/irq/internals.h b/kernel/irq/internals.h
index fd4fa83..040806f 100644
--- a/kernel/irq/internals.h
+++ b/kernel/irq/internals.h
@@ -272,6 +272,7 @@ static inline struct cpumask 
*irq_desc_get_pending_mask(struct irq_desc *desc)
 {
return desc->pending_mask;
 }
+bool irq_fixup_move_pending(struct irq_desc *desc, bool force_clear);
 #else /* CONFIG_GENERIC_PENDING_IRQ */
 static inline bool irq_can_move_pcntxt(struct irq_data *data)
 {
@@ -293,6 +294,10 @@ static inline struct cpumask 
*irq_desc_get_pending_mask(struct irq_desc *desc)
 {
return NULL;
 }
+static inline bool irq_fixup_move_pending(struct irq_desc *desc, bool fclear)
+{
+   return false;
+}
 #endif /* !CONFIG_GENERIC_PENDING_IRQ */
 
 #ifdef CONFIG_GENERIC_IRQ_DEBUGFS


[tip:irq/core] genirq: Remove pointless gfp argument

2017-06-22 Thread tip-bot for Thomas Gleixner
Commit-ID:  4ab764c336123157690eea1dcf81851c58d1
Gitweb: http://git.kernel.org/tip/4ab764c336123157690eea1dcf81851c58d1
Author: Thomas Gleixner 
AuthorDate: Tue, 20 Jun 2017 01:37:36 +0200
Committer:  Thomas Gleixner 
CommitDate: Thu, 22 Jun 2017 18:21:19 +0200

genirq: Remove pointless gfp argument

All callers hand in GFP_KERNEL. No point in having an extra argument for
that.

Signed-off-by: Thomas Gleixner 
Cc: Jens Axboe 
Cc: Marc Zyngier 
Cc: Michael Ellerman 
Cc: Keith Busch 
Cc: Peter Zijlstra 
Cc: Christoph Hellwig 
Link: http://lkml.kernel.org/r/20170619235446.082544...@linutronix.de

---
 kernel/irq/irqdesc.c | 15 +++
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/kernel/irq/irqdesc.c b/kernel/irq/irqdesc.c
index feade53..48d4f03 100644
--- a/kernel/irq/irqdesc.c
+++ b/kernel/irq/irqdesc.c
@@ -54,14 +54,14 @@ static void __init init_irq_default_affinity(void)
 #endif
 
 #ifdef CONFIG_SMP
-static int alloc_masks(struct irq_desc *desc, gfp_t gfp, int node)
+static int alloc_masks(struct irq_desc *desc, int node)
 {
	if (!zalloc_cpumask_var_node(&desc->irq_common_data.affinity,
-gfp, node))
+GFP_KERNEL, node))
return -ENOMEM;
 
 #ifdef CONFIG_GENERIC_PENDING_IRQ
-   if (!zalloc_cpumask_var_node(&desc->pending_mask, gfp, node)) {
+   if (!zalloc_cpumask_var_node(&desc->pending_mask, GFP_KERNEL, node)) {
free_cpumask_var(desc->irq_common_data.affinity);
return -ENOMEM;
}
@@ -86,7 +86,7 @@ static void desc_smp_init(struct irq_desc *desc, int node,
 
 #else
 static inline int
-alloc_masks(struct irq_desc *desc, gfp_t gfp, int node) { return 0; }
+alloc_masks(struct irq_desc *desc, int node) { return 0; }
 static inline void
 desc_smp_init(struct irq_desc *desc, int node, const struct cpumask *affinity) 
{ }
 #endif
@@ -344,9 +344,8 @@ static struct irq_desc *alloc_desc(int irq, int node, 
unsigned int flags,
   struct module *owner)
 {
struct irq_desc *desc;
-   gfp_t gfp = GFP_KERNEL;
 
-   desc = kzalloc_node(sizeof(*desc), gfp, node);
+   desc = kzalloc_node(sizeof(*desc), GFP_KERNEL, node);
if (!desc)
return NULL;
/* allocate based on nr_cpu_ids */
@@ -354,7 +353,7 @@ static struct irq_desc *alloc_desc(int irq, int node, 
unsigned int flags,
if (!desc->kstat_irqs)
goto err_desc;
 
-   if (alloc_masks(desc, gfp, node))
+   if (alloc_masks(desc, node))
goto err_kstat;
 
	raw_spin_lock_init(&desc->lock);
@@ -525,7 +524,7 @@ int __init early_irq_init(void)
 
for (i = 0; i < count; i++) {
desc[i].kstat_irqs = alloc_percpu(unsigned int);
-   alloc_masks(&desc[i], GFP_KERNEL, node);
+   alloc_masks(&desc[i], node);
	raw_spin_lock_init(&desc[i].lock);
	lockdep_set_class(&desc[i].lock, &irq_desc_lock_class);
	desc_set_defaults(i, &desc[i], node, NULL, NULL);


[PATCH v3 10/11] brcmsmac: reindent split functions

2017-06-22 Thread Arnd Bergmann
In the previous commit I left the indentation alone to help with reviewing
the patch; this one now runs the three new functions through 'indent -kr -8'
with some manual fixups to avoid silliness.

No changes other than whitespace are intended here.

Signed-off-by: Arnd Bergmann 
Acked-by: Arend van Spriel 
---
 .../broadcom/brcm80211/brcmsmac/phy/phy_n.c| 1507 +---
 1 file changed, 697 insertions(+), 810 deletions(-)

diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_n.c 
b/drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_n.c
index ed409a80f3d2..763e8ba6b178 100644
--- a/drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_n.c
+++ b/drivers/net/wireless/broadcom/brcm80211/brcmsmac/phy/phy_n.c
@@ -16074,7 +16074,8 @@ static void wlc_phy_workarounds_nphy_rev7(struct 
brcms_phy *pi)
NPHY_REV3_RFSEQ_CMD_INT_PA_PU,
NPHY_REV3_RFSEQ_CMD_END
};
-   static const u8 rfseq_rx2tx_dlys_rev3_ipa[] = { 8, 6, 6, 4, 4, 16, 43, 
1, 1 };
+   static const u8 rfseq_rx2tx_dlys_rev3_ipa[] =
+   { 8, 6, 6, 4, 4, 16, 43, 1, 1 };
static const u16 rfseq_rx2tx_dacbufpu_rev7[] = { 0x10f, 0x10f };
u32 leg_data_weights;
u8 chan_freq_range = 0;
@@ -16114,526 +16115,452 @@ static void wlc_phy_workarounds_nphy_rev7(struct 
brcms_phy *pi)
int coreNum;
 
 
-   if (NREV_IS(pi->pubpi.phy_rev, 7)) {
-   mod_phy_reg(pi, 0x221, (0x1 << 4), (1 << 4));
-
-   mod_phy_reg(pi, 0x160, (0x7f << 0), (32 << 0));
-   mod_phy_reg(pi, 0x160, (0x7f << 8), (39 << 8));
-   mod_phy_reg(pi, 0x161, (0x7f << 0), (46 << 0));
-   mod_phy_reg(pi, 0x161, (0x7f << 8), (51 << 8));
-   mod_phy_reg(pi, 0x162, (0x7f << 0), (55 << 0));
-   mod_phy_reg(pi, 0x162, (0x7f << 8), (58 << 8));
-   mod_phy_reg(pi, 0x163, (0x7f << 0), (60 << 0));
-   mod_phy_reg(pi, 0x163, (0x7f << 8), (62 << 8));
-   mod_phy_reg(pi, 0x164, (0x7f << 0), (62 << 0));
-   mod_phy_reg(pi, 0x164, (0x7f << 8), (63 << 8));
-   mod_phy_reg(pi, 0x165, (0x7f << 0), (63 << 0));
-   mod_phy_reg(pi, 0x165, (0x7f << 8), (64 << 8));
-   mod_phy_reg(pi, 0x166, (0x7f << 0), (64 << 0));
-   mod_phy_reg(pi, 0x166, (0x7f << 8), (64 << 8));
-   mod_phy_reg(pi, 0x167, (0x7f << 0), (64 << 0));
-   mod_phy_reg(pi, 0x167, (0x7f << 8), (64 << 8));
-   }
-
-   if (NREV_LE(pi->pubpi.phy_rev, 8)) {
-   write_phy_reg(pi, 0x23f, 0x1b0);
-   write_phy_reg(pi, 0x240, 0x1b0);
-   }
+   if (NREV_IS(pi->pubpi.phy_rev, 7)) {
+   mod_phy_reg(pi, 0x221, (0x1 << 4), (1 << 4));
+
+   mod_phy_reg(pi, 0x160, (0x7f << 0), (32 << 0));
+   mod_phy_reg(pi, 0x160, (0x7f << 8), (39 << 8));
+   mod_phy_reg(pi, 0x161, (0x7f << 0), (46 << 0));
+   mod_phy_reg(pi, 0x161, (0x7f << 8), (51 << 8));
+   mod_phy_reg(pi, 0x162, (0x7f << 0), (55 << 0));
+   mod_phy_reg(pi, 0x162, (0x7f << 8), (58 << 8));
+   mod_phy_reg(pi, 0x163, (0x7f << 0), (60 << 0));
+   mod_phy_reg(pi, 0x163, (0x7f << 8), (62 << 8));
+   mod_phy_reg(pi, 0x164, (0x7f << 0), (62 << 0));
+   mod_phy_reg(pi, 0x164, (0x7f << 8), (63 << 8));
+   mod_phy_reg(pi, 0x165, (0x7f << 0), (63 << 0));
+   mod_phy_reg(pi, 0x165, (0x7f << 8), (64 << 8));
+   mod_phy_reg(pi, 0x166, (0x7f << 0), (64 << 0));
+   mod_phy_reg(pi, 0x166, (0x7f << 8), (64 << 8));
+   mod_phy_reg(pi, 0x167, (0x7f << 0), (64 << 0));
+   mod_phy_reg(pi, 0x167, (0x7f << 8), (64 << 8));
+   }
 
-   if (NREV_GE(pi->pubpi.phy_rev, 8))
-   mod_phy_reg(pi, 0xbd, (0xff << 0), (114 << 0));
+   if (NREV_LE(pi->pubpi.phy_rev, 8)) {
+   write_phy_reg(pi, 0x23f, 0x1b0);
+   write_phy_reg(pi, 0x240, 0x1b0);
+   }
 
-   wlc_phy_table_write_nphy(pi, NPHY_TBL_ID_AFECTRL, 1, 0x00, 16,
-&dac_control);
-   wlc_phy_table_write_nphy(pi, NPHY_TBL_ID_AFECTRL, 1, 0x10, 16,
-&dac_control);
+   if (NREV_GE(pi->pubpi.phy_rev, 8))
+   mod_phy_reg(pi, 0xbd, (0xff << 0), (114 << 0));
 
-   wlc_phy_table_read_nphy(pi, NPHY_TBL_ID_CMPMETRICDATAWEIGHTTBL,
-   1, 0, 32, &leg_data_weights);
-   leg_data_weights = leg_data_weights & 0xff;
-   wlc_phy_table_write_nphy(pi, NPHY_TBL_ID_CMPMETRICDATAWEIGHTTBL,
-  

[PATCH v3 11/11] kasan: rework Kconfig settings

2017-06-22 Thread Arnd Bergmann
We get a lot of very large stack frames using gcc-7.0.1 with the default
-fsanitize-address-use-after-scope --param asan-stack=1 options, which
can easily cause an overflow of the kernel stack, e.g.

drivers/acpi/nfit/core.c:2686:1: warning: the frame size of 4080 bytes is 
larger than 2048 bytes [-Wframe-larger-than=]
drivers/gpu/drm/amd/amdgpu/si.c:1756:1: warning: the frame size of 7304 bytes 
is larger than 2048 bytes [-Wframe-larger-than=]
drivers/gpu/drm/i915/gvt/handlers.c:2200:1: warning: the frame size of 43752 
bytes is larger than 2048 bytes [-Wframe-larger-than=]
drivers/gpu/drm/vmwgfx/vmwgfx_drv.c:952:1: warning: the frame size of 6032 
bytes is larger than 2048 bytes [-Wframe-larger-than=]
drivers/isdn/hardware/avm/b1.c:637:1: warning: the frame size of 13200 bytes is 
larger than 2048 bytes [-Wframe-larger-than=]
drivers/media/dvb-frontends/stv090x.c:3089:1: warning: the frame size of 5880 
bytes is larger than 2048 bytes [-Wframe-larger-than=]
drivers/media/i2c/cx25840/cx25840-core.c:4964:1: warning: the frame size of 
93992 bytes is larger than 2048 bytes [-Wframe-larger-than=]
drivers/net/wireless/ralink/rt2x00/rt2800lib.c:4994:1: warning: the frame size 
of 23928 bytes is larger than 2048 bytes [-Wframe-larger-than=]
drivers/staging/dgnc/dgnc_tty.c:2788:1: warning: the frame size of 7072 bytes 
is larger than 2048 bytes [-Wframe-larger-than=]
fs/ntfs/mft.c:2762:1: warning: the frame size of 7432 bytes is larger than 2048 
bytes [-Wframe-larger-than=]
lib/atomic64_test.c:242:1: warning: the frame size of 12648 bytes is larger 
than 2048 bytes [-Wframe-larger-than=]

To reduce this risk, -fsanitize-address-use-after-scope is now split out
into a separate Kconfig option, which cannot be selected at the same
time as KMEMCHECK, leading to stack frames that are smaller than 2
kilobytes most of the time on x86_64. An earlier version of this
patch also prevented combining KASAN_EXTRA with KASAN_INLINE, but that
is no longer necessary with gcc-7.0.1.

A lot of warnings with KASAN_EXTRA go away if we disable KMEMCHECK,
as -fsanitize-address-use-after-scope seems to understand the builtin
memcpy, but adds checking code around an extern memcpy call. I had
to work around a circular dependency, as DEBUG_SLAB/SLUB depended
on !KMEMCHECK, while KASAN did it the other way round. Now we handle
both the same way.

All patches to get the frame size below 2048 bytes with CONFIG_KASAN=y
and CONFIG_KASAN_EXTRA=n have been submitted along with this patch,
so we can bring back that default now. KASAN_EXTRA=y still causes lots
of warnings but now defaults to !COMPILE_TEST to disable it in
allmodconfig, and it remains disabled in all other defconfigs since
it is a new option.

This reverts parts of commit 3f181b4 ("lib/Kconfig.debug:
disable -Wframe-larger-than warnings with KASAN=y").

I experimented a bit more with smaller stack frames and have another
follow-up series that reduces the warning limit for 64-bit architectures
to 1280 bytes and 1536 when CONFIG_KASAN (but not KASAN_EXTRA) is
enabled, this requires another ~25 patches to address the additional
warnings. I also have patches for all KASAN_EXTRA warnings, but we
should look at those separately and then decide whether to remove
it completely, leaving out -fsanitize-address-use-after-scope.
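
As an aside, a minimal illustration (not from the patch) of why the
use-after-scope instrumentation inflates frames: block-scoped objects
no longer share stack slots, each getting its own redzones, so

static void example(void)
{
	{
		char a[256];
		memset(a, 0, sizeof(a));
	}
	{
		char b[256];	/* no longer shares a's stack slot */
		memset(b, 0, sizeof(b));
	}
}

ends up with a frame that grows with the number of scopes instead of
staying at roughly one buffer plus redzones.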

Signed-off-by: Arnd Bergmann 
---
 lib/Kconfig.debug  |  4 ++--
 lib/Kconfig.kasan  | 11 ++-
 lib/Kconfig.kmemcheck  |  1 +
 scripts/Makefile.kasan |  3 +++
 4 files changed, 16 insertions(+), 3 deletions(-)

diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index ddbef2cac189..02ec4a4da7b1 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -217,7 +217,7 @@ config ENABLE_MUST_CHECK
 config FRAME_WARN
int "Warn for stack frames larger than (needs gcc 4.4)"
range 0 8192
-   default 0 if KASAN
+   default 3072 if KASAN_EXTRA
default 2048 if GCC_PLUGIN_LATENT_ENTROPY
default 1024 if !64BIT
default 2048 if 64BIT
@@ -500,7 +500,7 @@ config DEBUG_OBJECTS_ENABLE_DEFAULT
 
 config DEBUG_SLAB
bool "Debug slab memory allocations"
-   depends on DEBUG_KERNEL && SLAB && !KMEMCHECK
+   depends on DEBUG_KERNEL && SLAB && !KMEMCHECK && !KASAN
help
  Say Y here to have the kernel do limited verification on memory
  allocation as well as poisoning memory on free to catch use of freed
diff --git a/lib/Kconfig.kasan b/lib/Kconfig.kasan
index bd38aab05929..4d17a8f4742f 100644
--- a/lib/Kconfig.kasan
+++ b/lib/Kconfig.kasan
@@ -5,7 +5,7 @@ if HAVE_ARCH_KASAN
 
 config KASAN
bool "KASan: runtime memory debugger"
-   depends on SLUB || (SLAB && !DEBUG_SLAB)
+   depends on SLUB || SLAB
select CONSTRUCTORS
select STACKDEPOT
help
@@ -20,6 +20,15 @@ config KASAN
  Currently CONFIG_KASAN doesn't work with CONFIG_DEBUG_SLAB
  (the resulting kernel does not boot).
 
+config KASAN_EXTRA
+   bool "KAsan: 

Re: [ANNOUNCE] v4.11.5-rt1

2017-06-22 Thread Mike Galbraith
On Thu, 2017-06-22 at 18:34 +0200, Sebastian Andrzej Siewior wrote:
> On 2017-06-20 09:45:06 [+0200], Mike Galbraith wrote:
> > See ! and ?
> 
> See see.
> What about this:

I'll give it a go, likely during the weekend.

I moved 4.11-rt today (also repros nicely) due to ftrace annoying me.
 After yet more staring at ever more huge traces (opposite of goal;),
then taking a break to stare at source again, I decided that the dual
wake_q business should die.. and the stall died with it.

> diff --git a/include/linux/sched.h b/include/linux/sched.h
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -1014,8 +1014,20 @@ struct wake_q_head {
>  #define WAKE_Q(name) \
>   struct wake_q_head name = { WAKE_Q_TAIL, &name.first }
>  
> -extern void wake_q_add(struct wake_q_head *head,
> -   struct task_struct *task);
> +extern void __wake_q_add(struct wake_q_head *head,
> +  struct task_struct *task, bool sleeper);
> +static inline void wake_q_add(struct wake_q_head *head,
> +   struct task_struct *task)
> +{
> + __wake_q_add(head, task, false);
> +}
> +
> +static inline void wake_q_add_sleeper(struct wake_q_head *head,
> +   struct task_struct *task)
> +{
> + __wake_q_add(head, task, true);
> +}
> +
>  extern void __wake_up_q(struct wake_q_head *head, bool sleeper);
>  
>  static inline void wake_up_q(struct wake_q_head *head)
> @@ -1745,6 +1757,7 @@ struct task_struct {
>   raw_spinlock_t pi_lock;
>  
>   struct wake_q_node wake_q;
> + struct wake_q_node wake_q_sleeper;
>  
>  #ifdef CONFIG_RT_MUTEXES
>   /* PI waiters blocked on a rt_mutex held by this task */
> diff --git a/kernel/fork.c b/kernel/fork.c
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -558,6 +558,7 @@ static struct task_struct *dup_task_struct(struct 
> task_struct *orig, int node)
>   tsk->splice_pipe = NULL;
>   tsk->task_frag.page = NULL;
>   tsk->wake_q.next = NULL;
> + tsk->wake_q_sleeper.next = NULL;
>  
>   account_kernel_stack(tsk, 1);
>  
> diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c
> --- a/kernel/locking/rtmutex.c
> +++ b/kernel/locking/rtmutex.c
> @@ -1506,7 +1506,7 @@ static void mark_wakeup_next_waiter(struct wake_q_head 
> *wake_q,
>*/
>   preempt_disable();
>   if (waiter->savestate)
> - wake_q_add(wake_sleeper_q, waiter->task);
> + wake_q_add_sleeper(wake_sleeper_q, waiter->task);
>   else
>   wake_q_add(wake_q, waiter->task);
> 	raw_spin_unlock(&current->pi_lock);
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -430,9 +430,15 @@ static bool set_nr_if_polling(struct task_struct *p)
>  #endif
>  #endif
>  
> -void wake_q_add(struct wake_q_head *head, struct task_struct *task)
> +void __wake_q_add(struct wake_q_head *head, struct task_struct *task,
> +   bool sleeper)
>  {
> - struct wake_q_node *node = &task->wake_q;
> + struct wake_q_node *node;
> +
> + if (sleeper)
> + node = &task->wake_q_sleeper;
> + else
> + node = &task->wake_q;
>  
>   /*
>* Atomically grab the task, if ->wake_q is !nil already it means
> @@ -461,11 +467,17 @@ void __wake_up_q(struct wake_q_head *head, bool sleeper)
>   while (node != WAKE_Q_TAIL) {
>   struct task_struct *task;
>  
> - task = container_of(node, struct task_struct, wake_q);
> + if (sleeper)
> + task = container_of(node, struct task_struct, 
> wake_q_sleeper);
> + else
> + task = container_of(node, struct task_struct, wake_q);
>   BUG_ON(!task);
>   /* task can safely be re-inserted now */
>   node = node->next;
> - task->wake_q.next = NULL;
> + if (sleeper)
> + task->wake_q_sleeper.next = NULL;
> + else
> + task->wake_q.next = NULL;
>  
>   /*
>* wake_up_process() implies a wmb() to pair with the queueing
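
For completeness, the -rt consumer side pairs this with a sleeper
variant of wake_up_q(); assuming it mirrors the quoted helper, that
would look like:

static inline void wake_up_q_sleeper(struct wake_q_head *head)
{
	__wake_up_q(head, true);
}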


Re: [PATCH 4/7] alpha: provide ioread64 and iowrite64 implementations

2017-06-22 Thread Stephen Bates
> +#define iowrite64be(v,p) iowrite32(cpu_to_be64(v), (p))
 
Logan, thanks for taking this cleanup on. I think this should be iowrite64 not 
iowrite32?
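
For clarity, the corrected definition being suggested would presumably
read:

#define iowrite64be(v,p) iowrite64(cpu_to_be64(v), (p))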

Stephen




Re: [PATCH] x86/uaccess: use unrolled string copy for short strings

2017-06-22 Thread Linus Torvalds
On Wed, Jun 21, 2017 at 4:09 AM, Paolo Abeni  wrote:
>
> +   if (len <= 64)
> +   return copy_user_generic_unrolled(to, from, len);
> +
> /*
>  * If CPU has ERMS feature, use copy_user_enhanced_fast_string.
>  * Otherwise, if CPU has rep_good feature, use 
> copy_user_generic_string.

NAK. Please do *not* do this. It puts the check in completely the
wrong place for several reasons:

 (a) it puts it in the inlined caller case (which could be ok for
constant sizes, but not in general)

 (b) it uses the copy_user_generic_unrolled() function that will then
just test the size *AGAIN* against small cases.

so it's both bigger than necessary, and stupid.

So if you want to do this optimization, I'd argue that you should just
do it inside the copy_user_enhanced_fast_string() function itself, the
same way we already handle the really small case specially in
copy_user_generic_string().

And do *not* use the unrolled code, which isn't used for small copies
anyway - rewrite the "copy_user_generic_unrolled" function in that
same asm file to have the non-unrolled cases (label "17" and forward)
accessible, so that you don't bother re-testing the size.
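
To make that concrete, the proposed caller-side check has roughly this
shape (a sketch, not the actual arch/x86 code):

static __always_inline unsigned long
copy_user_generic(void *to, const void *from, unsigned long len)
{
	if (len <= 64)		/* test #1, inlined at every call site */
		return copy_user_generic_unrolled(to, from, len);
	return copy_user_enhanced_fast_string(to, from, len);
}

while copy_user_generic_unrolled() starts with its own small-size branch
(test #2), which is why the short-string handling belongs inside the asm
entry point instead.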

 Linus


[PATCH 4/4] s390: Reduce ELF_ET_DYN_BASE

2017-06-22 Thread Kees Cook
Now that explicitly executed loaders are loaded in the mmap region,
position PIE binaries lower in the address space to avoid possible
collisions with mmap or stack regions. For 64-bit, align to 4GB to
allow runtimes to use the entire 32-bit address space for 32-bit
pointers.

Signed-off-by: Kees Cook 
---
 arch/s390/include/asm/elf.h | 15 +++
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/arch/s390/include/asm/elf.h b/arch/s390/include/asm/elf.h
index e8f623041769..7c58d599f91b 100644
--- a/arch/s390/include/asm/elf.h
+++ b/arch/s390/include/asm/elf.h
@@ -161,14 +161,13 @@ extern unsigned int vdso_enabled;
 #define CORE_DUMP_USE_REGSET
 #define ELF_EXEC_PAGESIZE  4096
 
-/* This is the location that an ET_DYN program is loaded if exec'ed.  Typical
-   use of this is to invoke "./ld.so someprog" to test out a new version of
-   the loader.  We need to make sure that it is out of the way of the program
-   that it will "exec", and that there is sufficient room for the brk. 64-bit
-   tasks are aligned to 4GB. */
-#define ELF_ET_DYN_BASE (is_compat_task() ? \
-   (STACK_TOP / 3 * 2) : \
-   (STACK_TOP / 3 * 2) & ~((1UL << 32) - 1))
+/*
+ * This is the base location for PIE (ET_DYN with INTERP) loads. On
+ * 64-bit, this is raised to 4GB to leave the entire 32-bit address
+ * space open for things that want to use the area for 32-bit pointers.
+ */
+#define ELF_ET_DYN_BASE	(is_compat_task() ? 0x000400000UL : \
+					    0x100000000UL)
 
 /* This yields a mask that user programs can use to figure out what
instruction set this CPU supports. */
-- 
2.7.4



Re: [Xen-devel] [PATCH v4 07/18] xen/pvcalls: implement socket command

2017-06-22 Thread Stefano Stabellini
On Thu, 22 Jun 2017, Roger Pau Monné wrote:
> On Wed, Jun 21, 2017 at 01:16:56PM -0700, Stefano Stabellini wrote:
> > On Tue, 20 Jun 2017, Roger Pau Monné wrote:
> > > On Thu, Jun 15, 2017 at 12:09:36PM -0700, Stefano Stabellini wrote:
> > > > Just reply with success to the other end for now. Delay the allocation
> > > > of the actual socket to bind and/or connect.
> > > > 
> > > > Signed-off-by: Stefano Stabellini 
> > > > CC: boris.ostrov...@oracle.com
> > > > CC: jgr...@suse.com
> > > > ---
> > > >  drivers/xen/pvcalls-back.c | 27 +++
> > > >  1 file changed, 27 insertions(+)
> > > > 
> > > > diff --git a/drivers/xen/pvcalls-back.c b/drivers/xen/pvcalls-back.c
> > > > index 437c2ad..953458b 100644
> > > > --- a/drivers/xen/pvcalls-back.c
> > > > +++ b/drivers/xen/pvcalls-back.c
> > > > @@ -12,12 +12,17 @@
> > > >   * GNU General Public License for more details.
> > > >   */
> > > >  
> > > > +#include 
> > > >  #include 
> > > >  #include 
> > > >  #include 
> > > >  #include 
> > > >  #include 
> > > >  #include 
> > > > +#include 
> > > > +#include 
> > > > +#include 
> > > > +#include 
> > > >  
> > > >  #include 
> > > >  #include 
> > > > @@ -54,6 +59,28 @@ struct pvcalls_fedata {
> > > >  static int pvcalls_back_socket(struct xenbus_device *dev,
> > > > struct xen_pvcalls_request *req)
> > > >  {
> > > > +   struct pvcalls_fedata *fedata;
> > > > +   int ret;
> > > > +   struct xen_pvcalls_response *rsp;
> > > > +
+   fedata = dev_get_drvdata(&dev->dev);
> > > > +
> > > > +   if (req->u.socket.domain != AF_INET ||
> > > > +   req->u.socket.type != SOCK_STREAM ||
> > > > +   (req->u.socket.protocol != IPPROTO_IP &&
> > > > +req->u.socket.protocol != AF_INET))
> > > > +   ret = -EAFNOSUPPORT;
> > > 
> > > Sorry for jumping into this out of the blue, but shouldn't all the
> > > constants used above be part of the protocol? AF_INET/SOCK_STREAM/...
> > > are all part of POSIX, but their specific value is not defined in the
> > > standard, hence we should have XEN_AF_INET/XEN_SOCK_STREAM/... Or am I
> > > just missing something?
> > 
> > The values of these constants for the pvcalls protocol are defined by
> > docs/misc/pvcalls.markdown under "Socket families and address format".
> > 
> > They happen to be the same as the ones defined by Linux as AF_INET,
> > SOCK_STREAM, etc, so in Linux I am just using those, but that is just an
> > implementation detail internal to the Linux kernel driver. What is
> > important from the protocol ABI perspective are the values defined by
> > docs/misc/pvcalls.markdown.
> 
> Oh I see. I still think this should be part of the public pvcalls.h
> header, and that the error codes should be the ones defined in
> public/errno.h (or else also added to the pvcalls header).
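
Concretely, the header Roger is asking for would boil down to something
like this -- hypothetical XEN_PVCALLS_* names, with the values taken
from docs/misc/pvcalls.markdown (they coincide with the Linux
definitions):

#define XEN_PVCALLS_AF_INET		2
#define XEN_PVCALLS_SOCK_STREAM	1
#define XEN_PVCALLS_IPPROTO_IP		0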

This was done differently in the past, but now that we have a formal
process, a person in charge of new PV drivers reviews, and design
documents with clearly spelled out ABIs, I consider the design docs
under docs/misc as the official specification. We don't need headers
anymore, they are redundant. In fact, we cannot have two specifications,
and the design docs are certainly the official ones (we don't want the
specs to be written as header files in C). To me, the headers under
xen/include/public/io/ are optional helpers. It doesn't matter what's in
there, or if frontends and backends use them or not.

There is really an argument for removing those headers, because they
might get out of sync with the spec by mistake, and in those cases we
really end up with two specifications for the same protocol. I would
be in favor of `git rm'ing all files under xen/include/public/io/ for
which we have a complete design doc under docs/misc.

[PATCH] tpm: Fix the ioremap() call for Braswell systems

2017-06-22 Thread Azhar Shaikh
ioremap() for Intel Braswell processors was done in
tpm_tis_pnp_init(). But before this function gets called,
platform driver 'tis_drv' gets registered and its probe function
tpm_tis_plat_probe() is invoked, which does a TPM
access. Now, for Braswell processors, tpm_platform_begin_xfer()
will do an ioread32() without having a mapped address, which
will lead to a bad I/O access warning.
Hence move the ioremap() call from tpm_tis_pnp_init() to init_tis()
before registering the 'tis_drv' or basically before any TPM access.
Accordingly also move the iounmap() call from tpm_tis_pnp_remove()
to cleanup_tis().
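
For context, is_bsw() in this driver is a simple CPU-model check --
roughly the following, modulo the exact model constant:

#ifdef CONFIG_X86
static inline bool is_bsw(void)
{
	return (boot_cpu_data.x86_model == INTEL_FAM6_ATOM_AIRMONT);
}
#else
static inline bool is_bsw(void)
{
	return false;
}
#endif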

Signed-off-by: Azhar Shaikh 
---
 drivers/char/tpm/tpm_tis.c | 21 +
 1 file changed, 9 insertions(+), 12 deletions(-)

diff --git a/drivers/char/tpm/tpm_tis.c b/drivers/char/tpm/tpm_tis.c
index 506e62ca3576..3224db80816a 100644
--- a/drivers/char/tpm/tpm_tis.c
+++ b/drivers/char/tpm/tpm_tis.c
@@ -330,12 +330,6 @@ static int tpm_tis_pnp_init(struct pnp_dev *pnp_dev,
else
tpm_info.irq = -1;
 
-#ifdef CONFIG_X86
-   if (is_bsw())
-   ilb_base_addr = ioremap(INTEL_LEGACY_BLK_BASE_ADDR,
-   ILB_REMAP_SIZE);
-#endif
-
	return tpm_tis_init(&pnp_dev->dev, &tpm_info);
 }
 
@@ -359,12 +353,6 @@ static void tpm_tis_pnp_remove(struct pnp_dev *dev)
 
tpm_chip_unregister(chip);
tpm_tis_remove(chip);
-
-#ifdef CONFIG_X86
-   if (is_bsw())
-   iounmap(ilb_base_addr);
-#endif
-
 }
 
 static struct pnp_driver tis_pnp_driver = {
@@ -472,6 +460,11 @@ static int __init init_tis(void)
if (rc)
goto err_force;
 
+#ifdef CONFIG_X86
+   if (is_bsw())
+   ilb_base_addr = ioremap(INTEL_LEGACY_BLK_BASE_ADDR,
+   ILB_REMAP_SIZE);
+#endif
	rc = platform_driver_register(&tis_drv);
if (rc)
goto err_platform;
@@ -499,6 +492,10 @@ static void __exit cleanup_tis(void)
	pnp_unregister_driver(&tis_pnp_driver);
	platform_driver_unregister(&tis_drv);
 
+#ifdef CONFIG_X86
+   if (is_bsw())
+   iounmap(ilb_base_addr);
+#endif
if (force_pdev)
platform_device_unregister(force_pdev);
 }
-- 
1.9.1



Re: [PATCH v3 3/3] perf: xgene: Add support for SoC PMU version 3

2017-06-22 Thread Hoan Tran
On Thu, Jun 22, 2017 at 11:17 AM, Mark Rutland  wrote:
> On Thu, Jun 22, 2017 at 11:13:08AM -0700, Hoan Tran wrote:
>> On Thu, Jun 22, 2017 at 10:52 AM, Mark Rutland  wrote:
>> > On Tue, Jun 06, 2017 at 11:02:26AM -0700, Hoan Tran wrote:
>> > >  static inline void
>> > > +xgene_pmu_write_counter64(struct xgene_pmu_dev *pmu_dev, int idx, u64 
>> > > val)
>> > > +{
>> > > + u32 cnt_lo, cnt_hi;
>> > > +
>> > > + cnt_hi = upper_32_bits(val);
>> > > + cnt_lo = lower_32_bits(val);
>> > > +
>> > > + /* v3 has 64-bit counter registers composed by 2 32-bit registers 
>> > > */
>> > > + xgene_pmu_write_counter32(pmu_dev, 2 * idx, cnt_lo);
>> > > + xgene_pmu_write_counter32(pmu_dev, 2 * idx + 1, cnt_hi);
>> > > +}
>> >
>> > For this to be atomic, we need to disable the counters for the duration
>> > of the IRQ handler, which we don't do today.
>> >
>> > Regardless, we should do that to ensure that groups are self-consistent.
>> >
>> > i.e. in xgene_pmu_isr() we should call ops->stop_counters() just after
>> > taking the pmu lock, and we should call ops->start_counters() just
>> > before releasing it.
>>
>> Thanks for your comments. I'll fix them and send another version of
>> patch set soon.
>
> No need; I'm picking these up now, and I'll apply the fixups locally.

Thanks!

Hoan

>
> Thanks,
> Mark.


Re: [Xen-devel] [PATCH v4 07/18] xen/pvcalls: implement socket command

2017-06-22 Thread Andrew Cooper
On 22/06/17 19:29, Stefano Stabellini wrote:
> On Thu, 22 Jun 2017, Roger Pau Monné wrote:
>> On Wed, Jun 21, 2017 at 01:16:56PM -0700, Stefano Stabellini wrote:
>>> On Tue, 20 Jun 2017, Roger Pau Monné wrote:
 On Thu, Jun 15, 2017 at 12:09:36PM -0700, Stefano Stabellini wrote:
> Just reply with success to the other end for now. Delay the allocation
> of the actual socket to bind and/or connect.
>
> Signed-off-by: Stefano Stabellini 
> CC: boris.ostrov...@oracle.com
> CC: jgr...@suse.com
> ---
>  drivers/xen/pvcalls-back.c | 27 +++
>  1 file changed, 27 insertions(+)
>
> diff --git a/drivers/xen/pvcalls-back.c b/drivers/xen/pvcalls-back.c
> index 437c2ad..953458b 100644
> --- a/drivers/xen/pvcalls-back.c
> +++ b/drivers/xen/pvcalls-back.c
> @@ -12,12 +12,17 @@
>   * GNU General Public License for more details.
>   */
>  
> +#include 
>  #include 
>  #include 
>  #include 
>  #include 
>  #include 
>  #include 
> +#include 
> +#include 
> +#include 
> +#include 
>  
>  #include 
>  #include 
> @@ -54,6 +59,28 @@ struct pvcalls_fedata {
>  static int pvcalls_back_socket(struct xenbus_device *dev,
>   struct xen_pvcalls_request *req)
>  {
> + struct pvcalls_fedata *fedata;
> + int ret;
> + struct xen_pvcalls_response *rsp;
> +
> + fedata = dev_get_drvdata(&dev->dev);
> +
> + if (req->u.socket.domain != AF_INET ||
> + req->u.socket.type != SOCK_STREAM ||
> + (req->u.socket.protocol != IPPROTO_IP &&
> +  req->u.socket.protocol != AF_INET))
> + ret = -EAFNOSUPPORT;
 Sorry for jumping into this out of the blue, but shouldn't all the
 constants used above be part of the protocol? AF_INET/SOCK_STREAM/...
 are all part of POSIX, but their specific value is not defined in the
 standard, hence we should have XEN_AF_INET/XEN_SOCK_STREAM/... Or am I
 just missing something?
>>> The values of these constants for the pvcalls protocol are defined by
>>> docs/misc/pvcalls.markdown under "Socket families and address format".
>>>
>>> They happen to be the same as the ones defined by Linux as AF_INET,
>>> SOCK_STREAM, etc, so in Linux I am just using those, but that is just an
>>> implementation detail internal to the Linux kernel driver. What is
>>> important from the protocol ABI perspective are the values defined by
>>> docs/misc/pvcalls.markdown.
>> Oh I see. I still think this should be part of the public pvcalls.h
>> header, and that the error codes should be the ones defined in
>> public/errno.h (or else also added to the pvcalls header).
> This was done differently in the past, but now that we have a formal
> process, a person in charge of new PV drivers reviews, and design
> documents with clearly spelled out ABIs, I consider the design docs
> under docs/misc as the official specification. We don't need headers
> anymore, they are redundant. In fact, we cannot have two specifications,
> and the design docs are certainly the official ones (we don't want the
> specs to be written as header files in C). To me, the headers under
> xen/include/public/io/ are optional helpers. It doesn't matter what's in
> there, or if frontends and backends use them or not.
>
> There is really an argument for removing those headers, because they
> might get out of sync with the spec by mistake, and in those cases, then
> we really end up with two specifications for the same protocol. I would
> be in favor of `git rm'ing all files under xen/include/public/io/ for
> which we have a complete design doc under docs/misc.

+1.

Specifications should not be written in C.  The messes that are the net
and block protocol ABIs are perfect examples of why.

It's fine (and indeed recommended) to provide a header file which
describes the specified protocol, but the authoritative spec should be
in text form.
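
A minimal sketch of what such a spec-mirroring header could look like
(all names and values below are hypothetical, shown for shape only; the
markdown document stays authoritative):

/*
 * Example only: a header that mirrors the text spec instead of
 * defining it.  The XEN_PVCALLS_* names are made up for illustration.
 */
#ifndef __XEN_PUBLIC_IO_PVCALLS_EXAMPLE_H__
#define __XEN_PUBLIC_IO_PVCALLS_EXAMPLE_H__

/* Values copied from the "Socket families and address format" section
 * of docs/misc/pvcalls.markdown; per Stefano above, they happen to
 * match Linux's AF_INET/SOCK_STREAM. */
#define XEN_PVCALLS_AF_INET     2
#define XEN_PVCALLS_SOCK_STREAM 1

#endif /* __XEN_PUBLIC_IO_PVCALLS_EXAMPLE_H__ */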

I would really prefer if more people started using ../docs/specs/.  The
migration v2 documents are currently lonely there...

~Andrew


[PATCH] dmaengine: qcom_hidma: allow ACPI/DT parameters to be overridden

2017-06-22 Thread Sinan Kaya
Parameters like maximum read/write request size and the maximum
number of active transactions are currently configured in DT/ACPI.

This patch allows a user to override these to fine-tune performance
for their application.
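
The precedence implemented below is: a non-zero module parameter wins
over the ACPI/DT value; when the parameter is left at zero, the
discovered value is written back into it so the effective setting is
visible under /sys/module. A reduced sketch of that pattern (generic
names, not the driver's actual code):

#include <linux/module.h>

/* Illustrative stand-in for one of the four tunables below. */
static unsigned int tunable;	/* 0 means "keep the ACPI/DT value" */
module_param(tunable, uint, 0644);

static void apply_override(unsigned int *fw_val)
{
	if (tunable)
		*fw_val = tunable;	/* user override wins */
	else
		tunable = *fw_val;	/* expose the effective value */
}

For example, something like "modprobe hidma_mgmt max_write_request=1024"
(assuming that module name) would override the firmware-provided value
at probe time.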

Signed-off-by: Sinan Kaya 
---
 drivers/dma/qcom/hidma.c  |  7 +--
 drivers/dma/qcom/hidma_mgmt.c | 47 ++-
 2 files changed, 51 insertions(+), 3 deletions(-)

diff --git a/drivers/dma/qcom/hidma.c b/drivers/dma/qcom/hidma.c
index 5072a7d..84e3699 100644
--- a/drivers/dma/qcom/hidma.c
+++ b/drivers/dma/qcom/hidma.c
@@ -1,7 +1,7 @@
 /*
  * Qualcomm Technologies HIDMA DMA engine interface
  *
- * Copyright (c) 2015-2016, The Linux Foundation. All rights reserved.
+ * Copyright (c) 2015-2017, The Linux Foundation. All rights reserved.
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License version 2 and
@@ -795,8 +795,11 @@ static int hidma_probe(struct platform_device *pdev)
device_property_read_u32(&pdev->dev, "desc-count",
 &dmadev->nr_descriptors);
 
-   if (!dmadev->nr_descriptors && nr_desc_prm)
+   if (nr_desc_prm) {
+   dev_info(&pdev->dev, "overriding number of descriptors as %d\n",
+nr_desc_prm);
dmadev->nr_descriptors = nr_desc_prm;
+   }
 
if (!dmadev->nr_descriptors)
dmadev->nr_descriptors = HIDMA_NR_DEFAULT_DESC;
diff --git a/drivers/dma/qcom/hidma_mgmt.c b/drivers/dma/qcom/hidma_mgmt.c
index f847d32..5a0991b 100644
--- a/drivers/dma/qcom/hidma_mgmt.c
+++ b/drivers/dma/qcom/hidma_mgmt.c
@@ -1,7 +1,7 @@
 /*
  * Qualcomm Technologies HIDMA DMA engine Management interface
  *
- * Copyright (c) 2015-2016, The Linux Foundation. All rights reserved.
+ * Copyright (c) 2015-2017, The Linux Foundation. All rights reserved.
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License version 2 and
@@ -49,6 +49,26 @@
 #define HIDMA_AUTOSUSPEND_TIMEOUT  2000
 #define HIDMA_MAX_CHANNEL_WEIGHT   15
 
+static unsigned int max_write_request;
+module_param(max_write_request, uint, 0644);
+MODULE_PARM_DESC(max_write_request,
+   "maximum write burst (default: ACPI/DT value)");
+
+static unsigned int max_read_request;
+module_param(max_read_request, uint, 0644);
+MODULE_PARM_DESC(max_read_request,
+   "maximum read burst (default: ACPI/DT value)");
+
+static unsigned int max_wr_xactions;
+module_param(max_wr_xactions, uint, 0644);
+MODULE_PARM_DESC(max_wr_xactions,
+   "maximum number of write transactions (default: ACPI/DT value)");
+
+static unsigned int max_rd_xactions;
+module_param(max_rd_xactions, uint, 0644);
+MODULE_PARM_DESC(max_rd_xactions,
+   "maximum number of read transactions (default: ACPI/DT value)");
+
 int hidma_mgmt_setup(struct hidma_mgmt_dev *mgmtdev)
 {
unsigned int i;
@@ -207,12 +227,25 @@ static int hidma_mgmt_probe(struct platform_device *pdev)
goto out;
}
 
+   if (max_write_request) {
+   dev_info(&pdev->dev, "overriding max-write-burst-bytes: %d\n",
+   max_write_request);
+   mgmtdev->max_write_request = max_write_request;
+   } else
+   max_write_request = mgmtdev->max_write_request;
+
rc = device_property_read_u32(&pdev->dev, "max-read-burst-bytes",
  &mgmtdev->max_read_request);
if (rc) {
dev_err(&pdev->dev, "max-read-burst-bytes missing\n");
goto out;
}
+   if (max_read_request) {
+   dev_info(&pdev->dev, "overriding max-read-burst-bytes: %d\n",
+   max_read_request);
+   mgmtdev->max_read_request = max_read_request;
+   } else
+   max_read_request = mgmtdev->max_read_request;
 
rc = device_property_read_u32(&pdev->dev, "max-write-transactions",
  &mgmtdev->max_wr_xactions);
@@ -220,6 +253,12 @@ static int hidma_mgmt_probe(struct platform_device *pdev)
dev_err(&pdev->dev, "max-write-transactions missing\n");
goto out;
}
+   if (max_wr_xactions) {
+   dev_info(&pdev->dev, "overriding max-write-transactions: %d\n",
+   max_wr_xactions);
+   mgmtdev->max_wr_xactions = max_wr_xactions;
+   } else
+   max_wr_xactions = mgmtdev->max_wr_xactions;
 
rc = device_property_read_u32(&pdev->dev, "max-read-transactions",
  &mgmtdev->max_rd_xactions);
@@ -227,6 +266,12 @@ static int hidma_mgmt_probe(struct platform_device *pdev)
dev_err(&pdev->dev, "max-read-transactions missing\n");
goto out;
}
+   if (max_rd_xactions) {
+   dev_info(&pdev->dev, "overriding max-read-transactions: %d\n",
+ 

Re: [PATCH v2] irqchip: gicv3-its: Don't assume GICv3 hardware supports 16bit INTID

2017-06-22 Thread Marc Zyngier
On 22/06/17 18:44, Shanker Donthineni wrote:
> The current ITS driver assumes every ITS hardware implementation
> supports at least a 16-bit INTID. This is not true: per the GICv3
> specification, the INTID field is IMPLEMENTATION DEFINED in the range
> of 14-24 bits. We might see unpredictable system behavior on systems
> where the hardware supports fewer than 16 bits and software tries to
> use 64K LPI interrupts.
> 
> On the Qualcomm Datacenter Technologies QDF2400 platform, the boot log
> shows confusing information about the number of LPI chunks, as shown
> below. The QDF2400 ITS hardware supports a 24-bit INTID.
> 
> This patch allocates the memory resources for the PEND/PROP tables
> based on the discoverable value specified in GITS_TYPER.IDbits. It
> also takes this opportunity to increase the number of LPI/MSI(x)
> interrupts to 128K if the hardware is capable, and to log a message
> that reflects the correct number of LPI chunks.
> 
> ITS@0xff7efe: allocated 524288 Devices @3c040 (indirect, esz 8, psz 
> 64K, shr 1)
> ITS@0xff7efe: allocated 8192 Interrupt Collections @3c013 (flat, esz 
> 8, psz 64K, shr 1)
> ITS@0xff7efe: allocated 8192 Virtual CPUs @3c014 (flat, esz 8, psz 
> 64K, shr 1)
> ITS: Allocated 524032 chunks for LPIs
> PCI/MSI: ITS@0xff7efe domain created
> Platform MSI: ITS@0xff7efe domain created

I seriously doubt that anyone will ever see a shortage of LPIs with 
16bit IDs ("64k interrupts should be enough for everybody"). But if you 
think otherwise, fair enough. Comments on the actual patch:

> 
> Signed-off-by: Shanker Donthineni 
> ---
> Changes since v1:
>No code changes, just rebase on tip of the Marc's branch and tested on 
> QDF2400 platform.
>
> https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/log/?h=irq/irqchip-4.13
> 
>  drivers/irqchip/irq-gic-v3-its.c | 34 --
>  1 file changed, 16 insertions(+), 18 deletions(-)
> 
> diff --git a/drivers/irqchip/irq-gic-v3-its.c 
> b/drivers/irqchip/irq-gic-v3-its.c
> index fee7d13..6000c56 100644
> --- a/drivers/irqchip/irq-gic-v3-its.c
> +++ b/drivers/irqchip/irq-gic-v3-its.c
> @@ -691,9 +691,11 @@ static void its_irq_compose_msi_msg(struct irq_data *d, 
> struct msi_msg *msg)
>   */
>  #define IRQS_PER_CHUNK_SHIFT 5
>  #define IRQS_PER_CHUNK   (1 << IRQS_PER_CHUNK_SHIFT)
> +#define ITS_MAX_LPI_NRBITS   (17) /* 128K LPIs */
>  
>  static unsigned long *lpi_bitmap;
>  static u32 lpi_chunks;
> +static u32 lpi_nrbits;
>  static DEFINE_SPINLOCK(lpi_lock);
>  
>  static int its_lpi_to_chunk(int lpi)
> @@ -789,26 +791,19 @@ static void its_lpi_free(struct event_lpi_map *map)
>  }
>  
>  /*
> - * We allocate 64kB for PROPBASE. That gives us at most 64K LPIs to
> + * We allocate memory for PROPBASE to cover 2 ^ lpi_nrbits LPIs to
>   * deal with (one configuration byte per interrupt). PENDBASE has to
>   * be 64kB aligned (one bit per LPI, plus 8192 bits for SPI/PPI/SGI).
>   */
> -#define LPI_PROPBASE_SZ  SZ_64K
> -#define LPI_PENDBASE_SZ  (LPI_PROPBASE_SZ / 8 + SZ_1K)

Why don't you keep the same macros and update them to deal with a
variable? It would make the patch so much easier to review.

Something along the lines of (untested):

diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
index 63cd0f2b8707..a1891840c66a 100644
--- a/drivers/irqchip/irq-gic-v3-its.c
+++ b/drivers/irqchip/irq-gic-v3-its.c
@@ -691,8 +691,10 @@ static struct irq_chip its_irq_chip = {
  */
 #define IRQS_PER_CHUNK_SHIFT   5
 #define IRQS_PER_CHUNK (1 << IRQS_PER_CHUNK_SHIFT)
+#define ITS_MAX_LPI_NRBITS (17) /* 128K LPIs */
 
 static unsigned long *lpi_bitmap;
+static u32 lpi_nrbits;
 static u32 lpi_chunks;
 static DEFINE_SPINLOCK(lpi_lock);
 
@@ -706,9 +708,9 @@ static int its_chunk_to_lpi(int chunk)
return (chunk << IRQS_PER_CHUNK_SHIFT) + 8192;
 }
 
-static int __init its_lpi_init(u32 id_bits)
+static int __init its_lpi_init(void)
 {
-   lpi_chunks = its_lpi_to_chunk(1UL << id_bits);
+   lpi_chunks = its_lpi_to_chunk(1UL << lpi_nrbits);
 
lpi_bitmap = kzalloc(BITS_TO_LONGS(lpi_chunks) * sizeof(long),
 GFP_KERNEL);
@@ -793,13 +795,9 @@ static void its_lpi_free(struct event_lpi_map *map)
  * deal with (one configuration byte per interrupt). PENDBASE has to
  * be 64kB aligned (one bit per LPI, plus 8192 bits for SPI/PPI/SGI).
  */
-#define LPI_PROPBASE_SZ	SZ_64K
-#define LPI_PENDBASE_SZ	(LPI_PROPBASE_SZ / 8 + SZ_1K)
-
-/*
- * This is how many bits of ID we need, including the useless ones.
- */
-#define LPI_NRBITS ilog2(LPI_PROPBASE_SZ + SZ_8K)
+#define LPI_PROPBASE_SZ	ALIGN(BIT(lpi_nrbits), SZ_64K)
+#define LPI_PENDBASE_SZ	ALIGN(BIT(lpi_nrbits) / 8, SZ_64K)
+#define LPI_NRBITS lpi_nrbits
 
 #define LPI_PROP_DEFAULT_PRIO  0xa0
 
@@ -807,6 +805,7 @@ static int __init 
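
To make the sizing concrete, a worked example (my arithmetic from the
ALIGN() macros above, not something stated in the thread):

/*
 * With GITS_TYPER.IDbits giving lpi_nrbits = 17:
 *
 *   LPI_PROPBASE_SZ = ALIGN(BIT(17), SZ_64K)     = 128K
 *       (one configuration byte per interrupt ID)
 *   LPI_PENDBASE_SZ = ALIGN(BIT(17) / 8, SZ_64K) = 64K
 *       (one pending bit per interrupt ID, 64K aligned)
 *
 * The old fixed macros gave 64K and 64K/8 + 1K = 9K respectively for
 * the hard-coded 16-bit case.
 */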

[PATCH 1/2] edac: pnd2_edac: add code comment for clarification

2017-06-22 Thread Gustavo A. R. Silva
Add a code comment to make it clear that the fall-through is intentional.
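
For context (my reading of the hunk below, not spelled out in the
patch): an 8-byte access is performed as two 4-byte sideband
transactions, so "case 8" sends the upper dword and then deliberately
falls through to "case 4" for the lower one. A reduced sketch of the
pattern, with a hypothetical send_dword() standing in for the driver's
accessor:

#include <linux/errno.h>
#include <linux/types.h>

static int send_dword(int off, void *data);	/* hypothetical helper */

static int rd_reg_sketch(int off, void *data, size_t sz)
{
	int ret;

	switch (sz) {
	case 8:
		ret = send_dword(off + 4, data + 4);	/* upper 32 bits */
		/* fall through */
	case 4:
		ret = send_dword(off, data);		/* lower 32 bits */
		break;
	default:
		ret = -EINVAL;
		break;
	}
	return ret;
}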

Addresses-Coverity-ID: 1403728
Signed-off-by: Gustavo A. R. Silva 
---
 drivers/edac/pnd2_edac.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/edac/pnd2_edac.c b/drivers/edac/pnd2_edac.c
index 1cad5a9..c51ec73 100644
--- a/drivers/edac/pnd2_edac.c
+++ b/drivers/edac/pnd2_edac.c
@@ -174,6 +174,7 @@ static int apl_rd_reg(int port, int off, int op, void 
*data, size_t sz, char *na
switch (sz) {
case 8:
ret = sbi_send(port, off + 4, op, (u32 *)(data + 4));
+   /* fall through */
case 4:
ret = sbi_send(port, off, op, (u32 *)data);
pnd2_printk(KERN_DEBUG, "%s=%x%08x ret=%d\n", name,
-- 
2.5.0



<    1   2   3   4   5   6   7   8   9   10   >