Re: [PATCH V3 2/4] arm64/mm: Hold memory hotplug lock while walking for kernel page table dump

2019-05-28 Thread Mark Rutland
On Mon, May 27, 2019 at 09:20:01AM +0200, Michal Hocko wrote:
> On Wed 22-05-19 17:42:13, Mark Rutland wrote:
> > On Thu, May 16, 2019 at 01:05:29PM +0200, Michal Hocko wrote:
> > > On Thu 16-05-19 11:23:54, Mark Rutland wrote:
> > > > On Wed, May 15, 2019 at 06:58:47PM +0200, Michal Hocko wrote:
> > > > > On Tue 14-05-19 14:30:05, Anshuman Khandual wrote:

> > I don't think that it's reasonable for this code to bring down the
> > kernel unless the kernel page tables are already corrupt. I agree we
> > should minimize the impact on other code, and I'm happy to penalize
> > ptdump so long as it's functional and safe.
> > 
> > I would like it to be possible to use the ptdump code to debug
> > hot-remove, so I'd rather not make the two mutually exclusive. I'd also
> > like it to be possible to use this in-the-field, and for that asking an
> > admin to potentially crash their system isn't likely to fly.
> 
> OK, fair enough.
> 
> > > > > I am asking because I would really love to make mem hotplug locking
> > > > > less scattered outside of the core MM than more. Most users simply
> > > > > shouldn't care. Pfn walkers should rely on pfn_to_online_page.
> > 
> > Just to check, is your plan to limit access to the hotplug lock, or to
> > redesign the locking scheme?
> 
> To change the locking to lock hotpluged ranges rather than having a
> global lock as the operation is inherently pfn range scoped.

Ok. That sounds like something we could adapt the ptdump code to handle
without too much pain (modulo how much of that you want to expose
outside of the core mm code).

> > > > I'm not sure if that would help us here; IIUC pfn_to_online_page() alone
> > > > doesn't ensure that the page remains online. Is there a way to achieve
> > > > that other than get_online_mems()?
> > > 
> > > You have to pin the page to make sure the hotplug is not going to
> > > offline it.
> > 
> > I'm not exactly sure how pinning works -- is there a particular set of
> > functions I should look at for that?
> 
> Pinning (get_page) on any page of the range will defer the hotremove
> operation and therefore the page tables cannot go away as well.
> 
> That being said, I thought the API is mostly for debugging and "you
> should better know what you are doing" kinda thing (based on debugfs
> being used here). If this is really useful in its current form and
> should be used also while the hotremove is in progress then ok.
> Once we actually get to rework the locking then we will have another
> spot to handle but that's the life.

Great.

FWIW, I'm happy to rework the ptdump code to help with that.
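
For my own notes, the shape that suggests is something like the below:
an untested sketch using get_page()/put_page() as you describe, where
ptdump_pin_pfn()/ptdump_unpin_pfn() are made-up names rather than
anything that exists today.

	#include <linux/memory_hotplug.h>
	#include <linux/mm.h>

	/* Pin one page of a hotpluggable range so hot-remove is deferred. */
	static struct page *ptdump_pin_pfn(unsigned long pfn)
	{
		struct page *page = pfn_to_online_page(pfn);

		if (!page)
			return NULL;	/* section is offline; skip it */

		get_page(page);		/* holding a reference defers hot-remove */
		return page;
	}

	static void ptdump_unpin_pfn(struct page *page)
	{
		if (page)
			put_page(page);
	}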

Thanks,
Mark.


Re: [PATCH 4/8] arm64: Basic Branch Target Identification support

2019-05-24 Thread Mark Rutland
On Fri, May 24, 2019 at 05:12:40PM +0100, Dave Martin wrote:
> On Fri, May 24, 2019 at 04:38:48PM +0100, Mark Rutland wrote:
> > On Fri, May 24, 2019 at 03:53:06PM +0100, Dave Martin wrote:
> > > On Fri, May 24, 2019 at 02:02:17PM +0100, Mark Rutland wrote:
> > > > On Fri, May 24, 2019 at 11:25:29AM +0100, Dave Martin wrote:
> > > > >  /* Additional SPSR bits not exposed in the UABI */
> > > > >  #define PSR_IL_BIT   (1 << 20)
> > > > > +#define PSR_BTYPE_CALL   (2 << 10)
> > > > 
> > > > I thought BTYPE was a 2-bit field, so isn't there at least one other
> > > > value to have a mnemonic for?
> > > > 
> > > > Is it an enumeration or a bitmask?
> > > 
> > > It's a 2-bit enumeration, and for now this is the only value that the
> > > kernel uses: this determines the types of BTI landing pad permitted at
> > > signal handler entry points in BTI guarded pages.
> > > 
> > > Possibly it would be clearer to write it
> > > 
> > > #define PSR_BTYPE_CALL   (0b10 << 10)
> > > 
> > > but we don't write other ptrace.h constants this way.  In UAPI headers
> > > we should avoid GCC-isms, but here it's OK since we already rely on this
> > > syntax internally.
> > > 
> > > I can change it if you prefer, though my preference is to leave it.
> > 
> > I have no issue with the (2 << 10) form, but could we add mnemonics for
> > the other values now, even if we're not using them at this instant?
> 
> Can do.  How about.
> 
>   PSR_BTYPE_NONE  (0 << 10)
>   PSR_BTYPE_JC    (1 << 10)
>   PSR_BTYPE_C     (2 << 10)
>   PSR_BTYPE_J     (3 << 10)
> 
> That matches the way I decode PSTATE for splats.
> 
> The architecture does not define mnemonics so these are my invention,
> but anyway this is just for the kernel.

That looks good to me!
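
(i.e. I'd expect the header to end up looking something like the below,
with a shift/mask alongside -- names illustrative rather than a demand:)

	#define PSR_BTYPE_SHIFT		10
	#define PSR_BTYPE_MASK		(0x3 << PSR_BTYPE_SHIFT)

	#define PSR_BTYPE_NONE		(0 << PSR_BTYPE_SHIFT)
	#define PSR_BTYPE_JC		(1 << PSR_BTYPE_SHIFT)
	#define PSR_BTYPE_C		(2 << PSR_BTYPE_SHIFT)
	#define PSR_BTYPE_J		(3 << PSR_BTYPE_SHIFT)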

[...]

> > > > > @@ -741,6 +741,11 @@ static void setup_return(struct pt_regs *regs, 
> > > > > struct k_sigaction *ka,
> > > > >   regs->regs[29] = (unsigned long)&user->next_frame->fp;
> > > > >   regs->pc = (unsigned long)ka->sa.sa_handler;
> > > > >  
> > > > > + if (system_supports_bti()) {
> > > > > + regs->pstate &= ~(regs->pstate & PSR_BTYPE_MASK);
> > > > 
> > > > Nit: that can be:
> > > > 
> > > > regs->pstate &= ~PSR_BTYPE_MASK;
> > > 
> > > x & ~y is sensitive to the type of y and can clobber high bits, so I
> > > prefer not to write it.  GCC generates the same code either way.
> > 
> > Ah, I thought this might be for type promotion.
> > 
> > > However, this will also trip us up elsewhere when the time comes, so
> > > maybe it's a waste of time working around it here.
> > > 
> > > If you feel strongly, I'm happy to change it.
> > 
> > I'd rather we followed the same pattern as elsewhere, as having this
> > special case is confusing, and we'd still have the same bug elsewhere.
> > 
> > My concern here is consistency, so if you want to fix up all instances
> > to preserve the upper 32 bits of regs->pstate, I'd be happy. :)
> > 
> > I also think there are nicer/clearer ways to fix the type promotion
> > issue, like using UL in the field definitions, using explicit casts, or
> > adding helpers to set/clear bits with appropriate promotion.
> 
> Sure, I'll change it to be more consistent.
> 
> Wrapping this idiom up in a clear_bits() wrapper might be an idea, 
> but in advance of that it makes little sense to do it just in this one
> place.
> 
> I don't really like annotating header #defines with arbitrary types
> until it's really necessary, so I'll leave it for now.

Sure, that's fine by me.
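
For the record, here's a standalone illustration of the hazard (plain
userspace C, with a made-up mask value rather than the real PSR layout,
just to show the type promotion):

	#include <stdint.h>
	#include <stdio.h>

	#define MASK_U32	0xc00u		/* 32-bit unsigned mask */
	#define MASK_U64	0xc00ull	/* same mask, 64-bit type */

	int main(void)
	{
		uint64_t pstate = 0xffffffff00000c00ull;
		uint64_t a = pstate, b = pstate, c = pstate;

		/* ~MASK_U32 is only 32 bits wide, so this clears bits 63:32 too */
		a &= ~MASK_U32;

		/* Dave's idiom: the & happens at 64 bits, so the ~ does too */
		b &= ~(b & MASK_U32);

		/* Equivalent fix: make the mask's type wide enough up front */
		c &= ~MASK_U64;

		printf("%016llx %016llx %016llx\n",
		       (unsigned long long)a,
		       (unsigned long long)b,
		       (unsigned long long)c);
		return 0;
	}

That prints 0000000000000000, ffffffff00000000, ffffffff00000000
respectively, which is why the mask's type (or a cast, or a helper)
starts to matter once the register is wider than the #define.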

[...]

> > > > > diff --git a/arch/arm64/kernel/syscall.c b/arch/arm64/kernel/syscall.c
> > > > > index 5610ac0..85b456b 100644
> > > > > --- a/arch/arm64/kernel/syscall.c
> > > > > +++ b/arch/arm64/kernel/syscall.c
> > > > > @@ -66,6 +66,7 @@ static void el0_svc_common(struct pt_regs *regs, 
> > > > > int scno, int sc_nr,
> > > > >   unsigned long flags = current_thread_info()->flags;
> > > > >  
> > > > >   regs->orig_x0 = regs->regs[0];
> > > > > + regs->pstate &= ~(regs->pstate & PSR_BTYPE_MASK);
> > > > 
> > >

Re: [PATCH 4/8] arm64: Basic Branch Target Identification support

2019-05-24 Thread Mark Rutland
On Fri, May 24, 2019 at 03:53:06PM +0100, Dave Martin wrote:
> On Fri, May 24, 2019 at 02:02:17PM +0100, Mark Rutland wrote:
> > On Fri, May 24, 2019 at 11:25:29AM +0100, Dave Martin wrote:
> > > +#define arch_calc_vm_prot_bits(prot, pkey) arm64_calc_vm_prot_bits(prot)
> > > +static inline unsigned long arm64_calc_vm_prot_bits(unsigned long prot)
> > > +{
> > > + if (system_supports_bti() && (prot & PROT_BTI_GUARDED))
> > > + return VM_ARM64_GP;
> > > +
> > > + return 0;
> > > +}
> > > +
> > > +#define arch_vm_get_page_prot(vm_flags) arm64_vm_get_page_prot(vm_flags)
> > > +static inline pgprot_t arm64_vm_get_page_prot(unsigned long vm_flags)
> > > +{
> > > + return (vm_flags & VM_ARM64_GP) ? __pgprot(PTE_GP) : __pgprot(0);
> > > +}
> > 
> > While the architectural name for the PTE bit is GP, it might make more
> > sense to call the vm flag VM_ARM64_BTI, since people are more likely to
> > recognise BTI than GP as a mnemonic.
> > 
> > Not a big deal either way, though.
> 
> I'm happy to change it.  It's a kernel internal flag used in
> approximately zero places.  So whatever name is most intuitive for
> kernel maintainers is fine.  Nobody else needs to look at it.

Sure thing; I just know that I'm going to remember what BTI is much more
easily than I'll remember what GP is.

> > > diff --git a/arch/arm64/include/asm/ptrace.h 
> > > b/arch/arm64/include/asm/ptrace.h
> > > index b2de329..b868ef11 100644
> > > --- a/arch/arm64/include/asm/ptrace.h
> > > +++ b/arch/arm64/include/asm/ptrace.h
> > > @@ -41,6 +41,7 @@
> > >  
> > >  /* Additional SPSR bits not exposed in the UABI */
> > >  #define PSR_IL_BIT   (1 << 20)
> > > +#define PSR_BTYPE_CALL   (2 << 10)
> > 
> > I thought BTYPE was a 2-bit field, so isn't there at least one other
> > value to have a mnemonic for?
> > 
> > Is it an enumeration or a bitmask?
> 
> It's a 2-bit enumeration, and for now this is the only value that the
> kernel uses: this determines the types of BTI landing pad permitted at
> signal handler entry points in BTI guarded pages.
> 
> Possibly it would be clearer to write it
> 
> #define PSR_BTYPE_CALL   (0b10 << 10)
> 
> but we don't write other ptrace.h constants this way.  In UAPI headers
> we should avoid GCC-isms, but here it's OK since we already rely on this
> syntax internally.
> 
> I can change it if you prefer, though my preference is to leave it.

I have no issue with the (2 << 10) form, but could we add mnemonics for
the other values now, even if we're not using them at this instant?

> > >  #endif /* _UAPI__ASM_HWCAP_H */
> > > diff --git a/arch/arm64/include/uapi/asm/mman.h 
> > > b/arch/arm64/include/uapi/asm/mman.h
> > > new file mode 100644
> > > index 000..4776b43
> > > --- /dev/null
> > > +++ b/arch/arm64/include/uapi/asm/mman.h
> > > @@ -0,0 +1,9 @@
> > > +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
> > > +#ifndef _UAPI__ASM_MMAN_H
> > > +#define _UAPI__ASM_MMAN_H
> > > +
> > > +#include 
> > > +
> > > +#define PROT_BTI_GUARDED 0x10/* BTI guarded page */
> > 
> > From prior discussions, I thought this would be PROT_BTI, without the
> > _GUARDED suffix. Do we really need that?
> > 
> > AFAICT, all other PROT_* definitions only have a single underscore, and
> > the existing arch-specific flags are PROT_ADI on sparc, and PROT_SAO on
> > powerpc.
> 
> No strong opinon.  I was trying to make the name less obscure, but I'm
> equally happy with PROT_BTI if people prefer that.

My personal opinion is that PROT_BTI is preferable, but I'll let others
chime in.
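
For context, consumer code would end up looking something like the
below (userspace sketch; the flag's final name and value are exactly
what's being discussed here, so both are assumptions):

	#include <stddef.h>
	#include <sys/mman.h>

	#ifndef PROT_BTI
	#define PROT_BTI	0x10	/* value from the proposed uapi header */
	#endif

	/* Map an anonymous executable buffer whose pages are BTI guarded. */
	static void *map_guarded_code(size_t len)
	{
		void *p = mmap(NULL, len,
			       PROT_READ | PROT_WRITE | PROT_EXEC | PROT_BTI,
			       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

		return p == MAP_FAILED ? NULL : p;
	}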

> > > diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c
> > > index b82e0a9..3717b06 100644
> > > --- a/arch/arm64/kernel/ptrace.c
> > > +++ b/arch/arm64/kernel/ptrace.c
> > > @@ -1860,7 +1860,7 @@ void syscall_trace_exit(struct pt_regs *regs)
> > >   */
> > >  #define SPSR_EL1_AARCH64_RES0_BITS \
> > >   (GENMASK_ULL(63, 32) | GENMASK_ULL(27, 25) | GENMASK_ULL(23, 22) | \
> > > -  GENMASK_ULL(20, 13) | GENMASK_ULL(11, 10) | GENMASK_ULL(5, 5))
> > > +  GENMASK_ULL(20, 13) | GENMASK_ULL(5, 5))
> > >  #define SPSR_EL1_AARCH32_RES0_BITS \
> > >   (GENMASK_ULL(63, 32) | GENMASK_ULL(22, 22) | GENMASK_ULL(20, 20))
> > 
> > Phew; I was worried this would be missed!
> 
> It was.  I had fun debuggin

Re: [PATCH v1 3/5] dt-bindings: Add depends-on property

2019-05-24 Thread Mark Rutland
On Thu, May 23, 2019 at 06:01:14PM -0700, Saravana Kannan wrote:
> The depends-on property is used to list the mandatory functional
> dependencies of a consumer device on zero or more supplier devices.
> 
> Signed-off-by: Saravana Kannan 
> ---
>  .../devicetree/bindings/depends-on.txt| 26 +++
>  1 file changed, 26 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/depends-on.txt
> 
> diff --git a/Documentation/devicetree/bindings/depends-on.txt 
> b/Documentation/devicetree/bindings/depends-on.txt
> new file mode 100644
> index ..1cbddd11cf17
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/depends-on.txt
> @@ -0,0 +1,26 @@
> +Functional dependency linking
> +=
> +
> +Apart from parent-child relationships, devices (consumers) often have
> +functional dependencies on other devices (suppliers). Common examples of
> +suppliers are clock, regulators, pinctrl, etc. However not all of them are
> +dependencies with well defined devicetree bindings. 

For clocks, regulators, and pinctrl, that dependency is already implicit
in the consumer node's properties. We should be able to derive those
dependencies within the kernel.

Can you give an example of where a dependency is not implicit in an
existing binding?

> Also, not all functional
> +dependencies are mandatory as the device might be able to operate in a 
> limited
> +mode without some of the dependencies.

Whether something is a mandatory dependency will depend on the driver
and dynamic runtime details more than it will depend on the hardware.

For example, assume I have an IP block that functions as both a
clocksource and a watchdog that can reset the system, with those two
functions derived from separate input clocks.

I could use the device as just a clocksource, or as just a watchdog, and
neither feature in isolation is necessarily mandatory for the device to
be somewhat useful to the OS.

We need better ways of dynamically providing and managing this
information. For example, if a driver could register its dynamic
dependencies at probe (or some new pre-probe callback), we'd be able to
notify it immediately when its dependencies are available.
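
To make that concrete, the sort of thing I have in mind is roughly the
below -- an entirely hypothetical API, invented here just to illustrate
the shape, not something that exists today:

	/*
	 * A driver describes the dependencies it needs for *this*
	 * configuration, and the core notifies it when they become
	 * available, rather than a static depends-on list in the DT.
	 */
	struct dyn_dep {
		const char *name;	/* e.g. "clk-clocksource" */
		bool mandatory;		/* needed for this use, not in general */
	};

	static const struct dyn_dep foo_deps[] = {
		{ .name = "clk-clocksource", .mandatory = true  },
		{ .name = "clk-watchdog",    .mandatory = false },
	};

	static int foo_pre_probe(struct device *dev)
	{
		/* register_dynamic_deps() is hypothetical */
		return register_dynamic_deps(dev, foo_deps, ARRAY_SIZE(foo_deps));
	}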

Thanks,
Mark.


Re: [PATCH 4/8] arm64: Basic Branch Target Identification support

2019-05-24 Thread Mark Rutland
Hi Dave,

This generally looks good, but I have a few comments below.

On Fri, May 24, 2019 at 11:25:29AM +0100, Dave Martin wrote:
> +#define arch_calc_vm_prot_bits(prot, pkey) arm64_calc_vm_prot_bits(prot)
> +static inline unsigned long arm64_calc_vm_prot_bits(unsigned long prot)
> +{
> + if (system_supports_bti() && (prot & PROT_BTI_GUARDED))
> + return VM_ARM64_GP;
> +
> + return 0;
> +}
> +
> +#define arch_vm_get_page_prot(vm_flags) arm64_vm_get_page_prot(vm_flags)
> +static inline pgprot_t arm64_vm_get_page_prot(unsigned long vm_flags)
> +{
> + return (vm_flags & VM_ARM64_GP) ? __pgprot(PTE_GP) : __pgprot(0);
> +}

While the architectural name for the PTE bit is GP, it might make more
sense to call the vm flag VM_ARM64_BTI, since people are more likely to
recognise BTI than GP as a mnemonic.

Not a big deal either way, though.

[...]

> diff --git a/arch/arm64/include/asm/ptrace.h b/arch/arm64/include/asm/ptrace.h
> index b2de329..b868ef11 100644
> --- a/arch/arm64/include/asm/ptrace.h
> +++ b/arch/arm64/include/asm/ptrace.h
> @@ -41,6 +41,7 @@
>  
>  /* Additional SPSR bits not exposed in the UABI */
>  #define PSR_IL_BIT   (1 << 20)
> +#define PSR_BTYPE_CALL   (2 << 10)

I thought BTYPE was a 2-bit field, so isn't there at least one other
value to have a mnemonic for?

Is it an enumeration or a bitmask?

>  #endif /* _UAPI__ASM_HWCAP_H */
> diff --git a/arch/arm64/include/uapi/asm/mman.h 
> b/arch/arm64/include/uapi/asm/mman.h
> new file mode 100644
> index 000..4776b43
> --- /dev/null
> +++ b/arch/arm64/include/uapi/asm/mman.h
> @@ -0,0 +1,9 @@
> +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
> +#ifndef _UAPI__ASM_MMAN_H
> +#define _UAPI__ASM_MMAN_H
> +
> +#include 
> +
> +#define PROT_BTI_GUARDED 0x10/* BTI guarded page */

>From prior discussions, I thought this would be PROT_BTI, without the
_GUARDED suffix. Do we really need that?

AFAICT, all other PROT_* definitions only have a single underscore, and
the existing arch-specific flags are PROT_ADI on sparc, and PROT_SAO on
powerpc.

[...]

> diff --git a/arch/arm64/kernel/ptrace.c b/arch/arm64/kernel/ptrace.c
> index b82e0a9..3717b06 100644
> --- a/arch/arm64/kernel/ptrace.c
> +++ b/arch/arm64/kernel/ptrace.c
> @@ -1860,7 +1860,7 @@ void syscall_trace_exit(struct pt_regs *regs)
>   */
>  #define SPSR_EL1_AARCH64_RES0_BITS \
>   (GENMASK_ULL(63, 32) | GENMASK_ULL(27, 25) | GENMASK_ULL(23, 22) | \
> -  GENMASK_ULL(20, 13) | GENMASK_ULL(11, 10) | GENMASK_ULL(5, 5))
> +  GENMASK_ULL(20, 13) | GENMASK_ULL(5, 5))
>  #define SPSR_EL1_AARCH32_RES0_BITS \
>   (GENMASK_ULL(63, 32) | GENMASK_ULL(22, 22) | GENMASK_ULL(20, 20))

Phew; I was worried this would be missed!

[...]

> @@ -741,6 +741,11 @@ static void setup_return(struct pt_regs *regs, struct 
> k_sigaction *ka,
>   regs->regs[29] = (unsigned long)&user->next_frame->fp;
>   regs->pc = (unsigned long)ka->sa.sa_handler;
>  
> + if (system_supports_bti()) {
> + regs->pstate &= ~(regs->pstate & PSR_BTYPE_MASK);

Nit: that can be:

regs->pstate &= ~PSR_BTYPE_MASK;

> diff --git a/arch/arm64/kernel/syscall.c b/arch/arm64/kernel/syscall.c
> index 5610ac0..85b456b 100644
> --- a/arch/arm64/kernel/syscall.c
> +++ b/arch/arm64/kernel/syscall.c
> @@ -66,6 +66,7 @@ static void el0_svc_common(struct pt_regs *regs, int scno, 
> int sc_nr,
>   unsigned long flags = current_thread_info()->flags;
>  
>   regs->orig_x0 = regs->regs[0];
> + regs->pstate &= ~(regs->pstate & PSR_BTYPE_MASK);

Likewise:

regs->pstate &= ~PSR_BTYPE_MASK;

... though I don't understand why that would matter to syscalls, nor how
those bits could ever be set given we had to execute an SVC to get here.

What am I missing?

Thanks,
Mark.


Re: [PATCH 2/2] compiler: Prevent evaluation of WRITE_ONCE()

2019-05-24 Thread Mark Rutland


This would be better titled as:

  compiler: don't return a value from WRITE_ONCE()

... since we do want the WRITE_ONCE() itself to be evaluated.

On Fri, May 24, 2019 at 12:35:36PM +0200, Andrea Parri wrote:
> Now that there's no single use of the value of WRITE_ONCE(), change
> the implementation to eliminate it.

I hope that's the case, but it's possible that some macros might be
relying on this, so it's probably worth waiting to see if the kbuild
test robot screams.

Otherwise, I agree that WRITE_ONCE() returning a value is surprising,
and unnecessary. IIRC you said that trying to support that in other
implementations was painful, so aligning on a non-returning version
sounds reasonable to me.
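
Concretely, the only pattern the do { ... } while (0) form forbids is
using the write as an expression, e.g. (a contrived example, not taken
from the tree):

	/* Fine before and after this patch: */
	WRITE_ONCE(p->flags, new_flags);

	/* Relies on ({ ... }) evaluating to the stored value, and no
	 * longer builds after this patch: */
	if (WRITE_ONCE(p->flags, new_flags) & FLAG_FOO)
		do_something();

so anything the robot does find should be a mechanical fixup of the
latter into the former plus a separate read.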

> 
> Suggested-by: Mark Rutland 
> Signed-off-by: Andrea Parri 
> Cc: Arnd Bergmann 
> Cc: Greg Kroah-Hartman 
> Cc: Jorgen Hansen 
> Cc: Peter Zijlstra 
> Cc: Will Deacon 
> Cc: Mark Rutland 
> Cc: "Paul E. McKenney" 
> ---
>  include/linux/compiler.h | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
> 
> diff --git a/include/linux/compiler.h b/include/linux/compiler.h
> index 8aaf7cd026b06..4024c809a6c63 100644
> --- a/include/linux/compiler.h
> +++ b/include/linux/compiler.h
> @@ -277,12 +277,11 @@ unsigned long read_word_at_a_time(const void *addr)
>  }
>  
>  #define WRITE_ONCE(x, val) \
> -({   \
> +do { \
>   union { typeof(x) __val; char __c[1]; } __u =   \
>   { .__val = (__force typeof(x)) (val) }; \
>   __write_once_size(&(x), __u.__c, sizeof(x));\
> - __u.__val;  \
> -})
> +} while (0)

With the title fixed, and assuming that the kbuild test robot doesn't
find uses we've missed:

Acked-by: Mark Rutland 

Thanks,
Mark.


>  
>  #endif /* __KERNEL__ */
>  
> -- 
> 2.7.4
> 


Re: [PATCH] arm64: break while loop if task had been rescheduled

2019-05-24 Thread Mark Rutland
On Tue, May 21, 2019 at 05:20:04PM +0800, Tengfei Fan wrote:
> While printing a task's backtrace, if the task isn't the
> current task, it is possible that the task's fp and fp+8
> have the same value, so the while loop cannot break.
> Break out of the while loop if the task has been
> rescheduled while printing its backtrace.

There are a few cases where backtracing can get stuck in an infinite
loop. I'd attempted to address that more generally in my
arm64/robust-stacktrace branch [1].

Looking at tsk->state here is inherently racy, and doesn't solve the
general case, so I'd prefer to avoid that.
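
For anyone following along, the usual way to make the walk robust
without looking at the task's state is to bound it structurally: each
new frame record must be suitably aligned, on a valid stack, and at a
strictly higher address than the previous one, so a self-referential
record terminates the walk. A rough sketch, with simplified/hypothetical
helper and field names rather than the exact code in that branch:

	static int unwind_next(struct task_struct *tsk, struct stackframe *frame)
	{
		unsigned long fp = frame->fp;

		if (fp & 0xf)
			return -EINVAL;
		if (!on_accessible_stack(tsk, fp))
			return -EINVAL;
		/* frame records must strictly increase within a stack */
		if (fp <= frame->prev_fp)
			return -EINVAL;

		frame->prev_fp = fp;
		frame->fp = READ_ONCE_NOCHECK(*(unsigned long *)fp);
		frame->pc = READ_ONCE_NOCHECK(*(unsigned long *)(fp + 8));

		return 0;
	}

(The real thing also has to allow for transitions between the task,
IRQ, and overflow stacks, which is where most of the complexity lives.)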

Do my patches help you here? If so, I'm happy to rebase those to
v5.2-rc1 and repost.

Thanks,
Mark.

[1] 
https://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git/log/?h=arm64/robust-stacktrace

> 
> Signed-off-by: Tengfei Fan 
> ---
>  arch/arm64/kernel/traps.c | 23 +++
>  1 file changed, 23 insertions(+)
> 
> diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
> index 2975598..9df6e02 100644
> --- a/arch/arm64/kernel/traps.c
> +++ b/arch/arm64/kernel/traps.c
> @@ -103,6 +103,9 @@ void dump_backtrace(struct pt_regs *regs, struct 
> task_struct *tsk)
>  {
>   struct stackframe frame;
>   int skip = 0;
> + long cur_state = 0;
> + unsigned long cur_sp = 0;
> + unsigned long cur_fp = 0;
>  
>   pr_debug("%s(regs = %p tsk = %p)\n", __func__, regs, tsk);
>  
> @@ -127,6 +130,9 @@ void dump_backtrace(struct pt_regs *regs, struct 
> task_struct *tsk)
>*/
>   frame.fp = thread_saved_fp(tsk);
>   frame.pc = thread_saved_pc(tsk);
> + cur_state = tsk->state;
> + cur_sp = thread_saved_sp(tsk);
> + cur_fp = frame.fp;
>   }
>  #ifdef CONFIG_FUNCTION_GRAPH_TRACER
>   frame.graph = 0;
> @@ -134,6 +140,23 @@ void dump_backtrace(struct pt_regs *regs, struct 
> task_struct *tsk)
>  
>   printk("Call trace:\n");
>   do {
> + if (tsk != current && (cur_state != tsk->state
> + /*
> +  * We would not be printing backtrace for the task
> +  * that has changed state from uninterruptible to
> +  * running before hitting the do-while loop but after
> +  * saving the current state. If task is in running
> +  * state before saving the state, then we may print
> +  * wrong call trace or end up in infinite while loop
> +  * if *(fp) and *(fp+8) are same. While the situation
> +  * will stop print when that task schedule out.
> +  */
> + || cur_sp != thread_saved_sp(tsk)
> + || cur_fp != thread_saved_fp(tsk))) {
> + printk("The task:%s had been rescheduled!\n",
> + tsk->comm);
> + break;
> + }
>   /* skip until specified stack frame */
>   if (!skip) {
>   dump_backtrace_entry(frame.pc);
> -- 
> The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
> a Linux Foundation Collaborative Project
> 


Re: [PATCH] arm64: break while loop if task had been rescheduled

2019-05-24 Thread Mark Rutland
Hi,

This appears to be a bizarrely formatted reply to Anshuman's questions
[1] on the first posting [2] of this patch, and as it stands, it isn't
possible to follow.

Please follow the usual mailing list etiquette, and reply inline to
questions.

I am not going to reply further to this post, but I'll comment on the
first post.

Thanks,
Mark.

[1] 
https://lore.kernel.org/lkml/1558430404-4840-1-git-send-email-tengf...@codeaurora.org/T/#m415174aacdd100f9386113ed3ae9f427a2255f8a
[2] 
https://lore.kernel.org/lkml/1558430404-4840-1-git-send-email-tengf...@codeaurora.org/T/#u

On Fri, May 24, 2019 at 11:16:16AM +0800, tengf...@codeaurora.org wrote:
> When the task isn't the current task, its state can
> change while we are printing its backtrace, so it is
> possible that the task's fp and fp+8 have the same
> value and the while loop cannot break. To fix this,
> we first save the task's state, sp and fp, then
> re-read the task's current state, sp and fp on each
> loop iteration. We stop printing the backtrace if any
> of the values differ from what we saved.
> 
> /answer
> question**/
> This is very confusing. IIUC it suggests that while printing
> the backtrace for non-current tasks the do/while loop does not
> exit because fp and fp+8 might have the same value ? When would
> this happen ? Even in that case the commit message here does not
> properly match the change in this patch.

So

> 
> In our issue, we got fp=pc=0xFF8025A13BA0, so cannot exit while
> loop in dump_basktrace().
> After analyze our issue's dump, we found one task(such as: task A)
> is exiting via invoke do_exit() during another task is showing task
> A's dumptask. In kernel code, do_exit() and exit_notify are defined
> as follows:
> void noreturn do_exit(long code)
> {
>  ..
>  exit_notify(tsk, group_dead);
>  ..
> }
> static void exit_notify(struct task_struct *tsk, int group_dead)
> {
>  ..
> }
> Because of exit_notify() is a static function, so it is inlined to
> do_exit() when compile kernel, so we can get partial assembly code
> of do_exit() as follows:
> ……
> {
> bool autoreap;
> struct task_struct *p, *n;
> LIST_HEAD(dead);
> 
> write_lock_irq(_lock);
>  c10:   9000adrpx0, 0 
>  c14:   910003e8mov x8, sp
>  c18:   9100add x0, x0, #0x0
> */
> static void exit_notify(struct task_struct *tsk, int group_dead)
> {
> bool autoreap;
> struct task_struct *p, *n;
> LIST_HEAD(dead);
>  c1c:   a90023e8stp x8, x8, [sp]
> 
> write_lock_irq(_lock);
>  c20:   9400bl  0 <_raw_write_lock_irq>
>  c24:   f9435268ldr x8, [x19,#1696]
> ……
> From the code "c14:" and "c1c:", we will find sp's addr value is stored
> in sp and sp+8, so sp's vaule equal (sp+8)'s value.
> In our issue, there is a chance of fp point sp, so there will be fp=pc=fp's
> addr value,so code cannot break from while loop in dump_backtrace().
> 
> /answer
> question**/
> 
> /answer
> question**/
> This patch tries to stop printing the stack for non-current tasks
> if their state change while there is one dump_backtrace() trying
> to print back trace. Dont we have any lock preventing a task in
> this situation (while dumping it's backtrace) from running again
> or changing state.
> I haven't found any lock preventing a task in this situation, and I think we
> shouldn't
> prevent task running if this task is scheduled.
> /answer
> question**/
> 
> Signed-off-by: Tengfei Fan 
> ---
>  arch/arm64/kernel/traps.c | 23 +++
>  1 file changed, 23 insertions(+)
> 
> diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
> index 2975598..9df6e02 100644
> --- a/arch/arm64/kernel/traps.c
> +++ b/arch/arm64/kernel/traps.c
> @@ -103,6 +103,9 @@ void dump_backtrace(struct pt_regs *regs, struct
> task_struct *tsk)
>  {
>  struct stackframe frame;
>  int skip = 0;
> +long cur_state = 0;
> +unsigned long cur_sp = 0;
> +unsigned long cur_fp = 0;
> 
>  pr_debug("%s(regs = %p tsk = %p)\n", __func__, regs, tsk);
> 
> @@ -127,6 +130,9 @@ void dump_backtrace(struct pt_regs *regs, struct
> task_struct *tsk)
>   */
>  frame.fp = thread_saved_fp(tsk);
>  frame.pc = thread_saved_pc(tsk);
> +cur_state = tsk->state;
> +cur_sp = thread_saved_sp(tsk);
> +cur_fp = frame.fp;
> 
> /answer
> question**/
> Should 'saved_state|sp|fp' instead as its applicable to non-current
> tasks only.
> 'saved_state|sp|fp' only applies to non-current tasks.

Re: [PATCH v3 1/2] dt-bindings: edac: arm-dmc520.txt

2019-05-23 Thread Mark Rutland
On Thu, May 16, 2019 at 02:35:47AM +, Lei Wang wrote:
> From: Lei Wang 
> 
> This is the device tree bindings for new EDAC driver dmc520_edac.c.
> 
> Signed-off-by: Lei Wang 
> ---
>  .../devicetree/bindings/edac/arm-dmc520.txt| 26 
> ++
>  1 file changed, 26 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/edac/arm-dmc520.txt
> 
> diff --git a/Documentation/devicetree/bindings/edac/arm-dmc520.txt 
> b/Documentation/devicetree/bindings/edac/arm-dmc520.txt
> new file mode 100644
> index 000..71e7aa3
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/edac/arm-dmc520.txt
> @@ -0,0 +1,26 @@
> +* ARM DMC-520 EDAC node
> +
> +Required properties:
> +- compatible : "brcm,dmc-520", "arm,dmc-520".
> +- reg: Address range of the DMC-520 registers.
> +- interrupts : DMC-520 interrupt numbers. The example below specifies
> +   two interrupt lines for dram_ecc_errc_int and
> +   dram_ecc_errd_int.
> +- interrupt-config   : This is an array of interrupt masks. For each of the
> +   above interrupt line, add one interrupt mask element 
> to
> +   it. That is, there is a 1:1 mapping from each 
> interrupt
> +   line to an interrupt mask. An interrupt mask can 
> represent
> +   multiple interrupts being enabled. Refer to 
> interrupt_control
> +   register in DMC-520 TRM for interrupt mapping. In the 
> example
> +   below, the interrupt configuration enables 
> dram_ecc_errc_int
> +   and dram_ecc_errd_int. And each interrupt is 
> connected to
> +   a separate interrupt line.

Generally we use interrupt-names to distinguish interrupts.

Do you really have arbitrary subsets of lines muxed together?

Thanks,
Mark.

> +
> +Example:
> +
> +dmc0: dmc@20 {
> + compatible = "brcm,dmc-520", "arm,dmc-520";
> + reg = <0x20 0x8>;
> + interrupts = <0x0 0x349 0x4>, <0x0 0x34B 0x4>;
> + interrupt-config = <0x4>, <0x8>;
> +};
> -- 
> 2.7.4
> 


Re: [RFC PATCH] rcu: Make 'rcu_assign_pointer(p, v)' of type 'typeof(p)'

2019-05-23 Thread Mark Rutland
On Thu, May 23, 2019 at 06:50:13AM -0700, Paul E. McKenney wrote:
> On Thu, May 23, 2019 at 03:32:20PM +0200, Andrea Parri wrote:
> > The expression
> > 
> >   rcu_assign_pointer(p, typeof(p) v)
> > 
> > is reported to be of type 'typeof(p)' in the documentation (c.f., e.g.,
> > Documentation/RCU/whatisRCU.txt) but this is not the case: for example,
> > the following snippet
> > 
> >   int **y;
> >   int *x;
> >   int *r0;
> > 
> >   ...
> > 
> >   r0 = rcu_assign_pointer(*y, x);
> > 
> > can currently result in the compiler warning
> > 
> >   warning: assignment to ‘int *’ from ‘uintptr_t’ {aka ‘long unsigned int’} 
> > makes pointer from integer without a cast [-Wint-conversion]
> > 
> > Cast the uintptr_t value to a typeof(p) value.
> > 
> > Signed-off-by: Andrea Parri 
> > Cc: "Paul E. McKenney" 
> > Cc: Josh Triplett 
> > Cc: Steven Rostedt 
> > Cc: Mathieu Desnoyers 
> > Cc: Lai Jiangshan 
> > Cc: Joel Fernandes 
> > Cc: r...@vger.kernel.org
> > Cc: Peter Zijlstra 
> > Cc: Will Deacon 
> > Cc: Mark Rutland 
> > ---
> > NOTE:
> > 
> > TBH, I'm not sure this is 'the right patch' (hence the RFC...): in
> > fact, I'm currently missing the motivations for allowing assignments
> > such as the "r0 = ..." assignment above in generic code.  (BTW, it's
> > not currently possible to use such assignments in litmus tests...)
> 
> Given that a quick (and perhaps error-prone) search of the uses of
> rcu_assign_pointer() in v5.1 didn't find a single use of the return
> value, let's please instead change the documentation and implementation
> to eliminate the return value.

FWIW, I completely agree, and for similar reasons I'd say we should do
the same to WRITE_ONCE(), where this 'cool feature' has been inherited
from.

For WRITE_ONCE() there's at least one user that needs to be cleaned up
first (relying on non-portable implementation details of atomic*_set()),
but I suspect rcu_assign_pointer() isn't used as much as a building
block for low-level macros.

Thanks,
Mark.


Re: [PATCH 00/18] locking/atomic: atomic64 type cleanup

2019-05-23 Thread Mark Rutland
On Wed, May 22, 2019 at 11:18:59PM +0200, Arnd Bergmann wrote:
> On Wed, May 22, 2019 at 3:23 PM Mark Rutland  wrote:
> >
> > Currently architectures return inconsistent types for atomic64 ops. Some
> > return long (e.g. powerpc), some return long long (e.g. arc), and some
> > return s64 (e.g. x86).
> >
> > This is a bit messy, and causes unnecessary pain (e.g. as values must be
> > cast before they can be printed [1]).
> >
> > This series reworks all the atomic64 implementations to use s64 as the
> > base type for atomic64_t (as discussed [2]), and to ensure that this type
> > is consistently used for parameters and return values in the API, avoiding
> > further problems in this area.
> >
> > This series (based on v5.1-rc1) can also be found in my
> > atomics/type-cleanup branch [3] on kernel.org.
> 
> Nice cleanup!
> 
> I've provided an explicit Ack for the asm-generic patch if someone wants
> to pick up the entire series, but I can also put it all into my asm-generic
> tree if you want, after more people have had a chance to take a look.

Thanks!

I had assumed that this would go through the tip tree, as previous
atomic rework had, but I have no preference as to how this gets merged.

I'm not sure what the policy is, so I'll leave it to Peter and Will to
say.

Mark.


Re: [PATCH 12/18] locking/atomic: riscv: use s64 for atomic64

2019-05-23 Thread Mark Rutland
On Wed, May 22, 2019 at 12:06:31PM -0700, Palmer Dabbelt wrote:
> On Wed, 22 May 2019 06:22:44 PDT (-0700), mark.rutl...@arm.com wrote:
> > As a step towards making the atomic64 API use consistent types treewide,
> > let's have the s390 atomic64 implementation use s64 as the underlying
> 
> and apparently the RISC-V one as well? :)

Heh. You can guess which commit message I wrote first...

> Reviwed-by: Palmer Dabbelt 

Cheers! I'll add an extra 'e' when I fold this in. :)

Thanks,
Mark.


Re: [PATCH 00/18] locking/atomic: atomic64 type cleanup

2019-05-23 Thread Mark Rutland
On Thu, May 23, 2019 at 10:30:13AM +0200, Andrea Parri wrote:
> Hi Mark,

Hi Andrea,

> On Wed, May 22, 2019 at 02:22:32PM +0100, Mark Rutland wrote:
> > Currently architectures return inconsistent types for atomic64 ops. Some
> > return long (e.g. powerpc), some return long long (e.g. arc), and some
> > return s64 (e.g. x86).
> 
> (only partially related, but probably worth asking:)
> 
> While reading the series, I realized that the following expression:
> 
>   atomic64_t v;
> ...
>   typeof(v.counter) my_val = atomic64_set(&v, VAL);
> 
> is a valid expression on some architectures (in part., on architectures
> which #define atomic64_set() to WRITE_ONCE()) but is invalid on others.
> (This is due to the fact that WRITE_ONCE() can be used as an rvalue in
> the above assignment; TBH, I don't know the reason for having such an
> rvalue?)
> 
> IIUC, similar considerations hold for atomic_set().
> 
> The question is whether this is a known/"expected" inconsistency in the
> implementation of atomic64_set() or if this would also need to be fixed
> /addressed (say in a different patchset)?

In either case, I don't think the intent is that they should be used that way,
and from a quick scan, I can only find a single relevant instance today:

[mark@lakrids:~/src/linux]% git grep '\(return\|=\)\s\+atomic\(64\)\?_set'
include/linux/vmw_vmci_defs.h:  return atomic_set((atomic_t *)var, 
(u32)new_val);
include/linux/vmw_vmci_defs.h:  return atomic64_set(var, new_val);


[mark@lakrids:~/src/linux]% git grep '=\s+atomic_set' | wc -l
0
[mark@lakrids:~/src/linux]% git grep '=\s+atomic64_set' | wc -l
0

Any architectures implementing arch_atomic_* will have both of these functions
returning void. Currently that's x86 and arm64, but (time permitting) I intend
to migrate other architectures, so I guess we'll have to fix the above up as
required.

I think it's best to avoid the construct above.
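
i.e. portable callers should treat atomic64_set() as returning void, and
read the value back explicitly if they really need it -- something like
the below sketch rather than assigning from the set:

	atomic64_t v = ATOMIC64_INIT(0);
	s64 val = 42;
	s64 my_val;

	atomic64_set(&v, val);		/* treat the return as void */
	my_val = atomic64_read(&v);	/* read back explicitly if needed */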

Thanks,
Mark.


Re: [PATCH V3 2/4] arm64/mm: Hold memory hotplug lock while walking for kernel page table dump

2019-05-22 Thread Mark Rutland
On Thu, May 16, 2019 at 01:05:29PM +0200, Michal Hocko wrote:
> On Thu 16-05-19 11:23:54, Mark Rutland wrote:
> > Hi Michal,
> > 
> > On Wed, May 15, 2019 at 06:58:47PM +0200, Michal Hocko wrote:
> > > On Tue 14-05-19 14:30:05, Anshuman Khandual wrote:
> > > > The arm64 pagetable dump code can race with concurrent modification of
> > > > the kernel page tables. When leaf entries are modified concurrently, the
> > > > dump code may log stale or inconsistent information for a VA range, but
> > > > this is otherwise not harmful.
> > > > 
> > > > When intermediate levels of table are freed, the dump code will continue
> > > > to use memory which has been freed and potentially reallocated for another
> > > > purpose. In such cases, the dump code may dereference bogus addresses,
> > > > leading to a number of potential problems.
> > > > 
> > > > Intermediate levels of table may be freed during memory hot-remove, or
> > > > when installing a huge mapping in the vmalloc region. To avoid racing with
> > > > these cases, take the memory hotplug lock when walking the kernel page
> > > > table.
> > > 
> > > Why is this a problem only on arm64 
> > 
> > It looks like it's not -- I think we're just the first to realise this.
> > 
> > AFAICT x86's debugfs ptdump has the same issue if run concurrently with
> > memory hot remove. If 32-bit arm supported hot-remove, its ptdump code
> > would have the same issue.
> > 
> > > and why do we even care for debugfs? Does anybody rely on this thing
> > > to be reliable? Do we even need it? Who is using the file?
> > 
> > The debugfs part is used intermittently by a few people working on the
> > arm64 kernel page tables. We use that both to sanity-check that kernel
> > page tables are created/updated correctly after changes to the arm64 mmu
> > code, and also to debug issues if/when we encounter issues that appear
> > to be the result of kernel page table corruption.
> 
> OK, I see. Thanks for the clarification.
> 
> > So while it's rare to need it, it's really useful to have when we do
> > need it, and I'd rather not remove it. I'd also rather that it didn't
> > have latent issues where we can accidentally crash the kernel when using
> > it, which is what this patch is addressing.
> 
> While I agree, do we rather want to document that you shouldn't be using
> the debugging tool while the hotplug is ongoing because you might get a
> garbage or crash the kernel in the worst case? In other words is the
> absolute correctness worth the additional maint. burden wrt. to future
> hotplug changes?

I don't think that it's reasonable for this code to bring down the
kernel unless the kernel page tables are already corrupt. I agree we
should minimize the impact on other code, and I'm happy to penalize
ptdump so long as it's functional and safe.

I would like it to be possible to use the ptdump code to debug
hot-remove, so I'd rather not make the two mutually exclusive. I'd also
like it to be possible to use this in-the-field, and for that asking an
admin to potentially crash their system isn't likely to fly.

> > > I am asking because I would really love to make mem hotplug locking less
> > > scattered outside of the core MM than more. Most users simply shouldn't
> > > care. Pfn walkers should rely on pfn_to_online_page.

Just to check, is your plan to limit access to the hotplug lock, or to
redesign the locking scheme?

> > I'm not sure if that would help us here; IIUC pfn_to_online_page() alone
> > doesn't ensure that the page remains online. Is there a way to achieve
> > that other than get_online_mems()?
> 
> You have to pin the page to make sure the hotplug is not going to
> offline it.

I'm not exactly sure how pinning works -- is there a particular set of
functions I should look at for that?

I guess that if/when we allocate the vmemmap from hotpluggable memory
that will require the pinning code to take the hotplug lock internally
to ensure that the struct page is accessible while we attempt to pin it?

Thanks,
Mark.


Re: [PATCH v5 1/2] platform/mellanox: Add bootctl driver for Mellanox BlueField Soc

2019-05-22 Thread Mark Rutland
Hi Liming,

On Fri, May 17, 2019 at 01:49:04PM -0400, Liming Sun wrote:
> This commit adds the bootctl platform driver for Mellanox BlueField
> Soc, which controls the eMMC boot partition swapping and sends SMC
> calls to ATF running at exception level EL3 to program some system
> register. This register is persistent during warm reset and is only
> accessible in secure code which is used to enable the watchdog after
> reboot.

I'm concerned by this. AFAICT this part must boot using EFI (in order
for the ACPI bits to work), and these interfaces significantly overlap
with or contradict the UEFI specification w.r.t. boot management.

How exactly does this SoC boot?

How do these interfaces interact with the regular UEFI bootflow?

Thanks,
Mark.

> 
> Below are the sequences of typical use case.
> 
>   1. User-space tool upgrades one eMMC boot partition and requests
>  the boot partition swapping;
> 
>   2. The bootctl driver handles this request and sends SMC call
>  to ATF. ATF programs register BREADCRUMB0 which has value
>  preserved during warm reset. It also programs eMMC to swap
>  the boot partition;
> 
>   3. After software reset (rebooting), ATF BL1 (BootRom) checks
>  register BREADCRUMB0 to enable watchdog if configured;
> 
>   4. If booting fails, the watchdog timer will trigger rebooting.
>  In such case, ATF Boot ROM will switch the boot partition
>  back to the previous one.
> 
> Reviewed-by: Vadim Pasternak 
> Signed-off-by: Liming Sun 
> ---
> v4->v5:
> Fixes for comments from Andy:
> - Added ABI documentation;
> - Remove the extra 'const' in mlxbf_bootctl_svc_uuid_str definition;
> - Remove the mlxbf_bootctl_smc_call0() MACRO to use function
>   call directly;
> - Return more descriptive string ('invalid action') in
>   mlxbf_bootctl_reset_action_to_string();
> - Check return value of the mlxbf_bootctl_smc() in function
>   post_reset_wdog_show() and reset_action_show();
> - Revise the sprintf() line in reset_action_show() into one line;
> - Move the 'action = ' assignment out of the declarations
>   in reset_action_store() and check return value of
>   mlxbf_bootctl_reset_action_to_val() in this function;
> - Removed Redundant parens in reset_action_store();
> - Check return code of the smc_call in second_reset_action_show(),
>   merge the 'name = ' assignment into the sprintf() directly;
> - Move the 'action = ' assignment out of the declaration block
>   in second_reset_action_store();
> - Remove redundant blank line and parents in lifecycle_state_show();
> - Split declaration and assignment in secure_boot_fuse_state_show();
> - use BIT() in secure_boot_fuse_state_show() and simplify code in
>   this function to split the logic of message choice and flag flip;
>   Also removed the 'key !=0 ' check since it's not needed;
> - Added comma in mlxbf_bootctl_attr_group definition.
> - Removed un-needed '& 0xFF' logic in mlxbf_bootctl_guid_match();
> - Use temp variable for the result of the smc call in
>   mlxbf_bootctl_probe();
> - Use dev_warn() instead of dev_err() in mlxbf_bootctl_probe();
> - Removed the ACPI_PTR usage in the mlxbf_bootctl_driver definition;
> Fixes for comments from Vadim:
> - Move mlxbf-bootctl.o to the top of the Makefile;
> - Fix to use the 'ret' value instead of another mlxbf_bootctl_smc() call;
> - Fix the 'declaration inside the block' in secure_boot_fuse_state_show();
> - Use ATTRIBUTE_GROUPS() for the attribute definition;
> - Use single line for a comment in mlxbf-bootctl.h
> - Use KernelVersion 5.3 instead of 5.2.2 in the ABI doc;
> - Use common utility function for the show and store of reset_action
>   and second_reset_action.
> New changes:
> - Rename mlxbf_bootctl_smc_call1() to mlxbf_bootctl_smc() to
>   shorten the name so some statement could be make into one line;
> - Fixed the MODULE_LICENSE();
> - Added more comments in secure_boot_fuse_state_show();
> v4: Update Kconfig for the dependency on ACPI.
> Fixed a typo which caused build error for kernel module.
> v2->v3:
> Fixes for comments from Andy:
> - More coding style fixes;
> - Revised the uuid matching code.
> v2: Fix the Kconfig and coding styles, propagate errors to caller.
> v1: Initial version.
> ---
>  drivers/platform/mellanox/Kconfig |  12 ++
>  drivers/platform/mellanox/Makefile|   1 +
>  drivers/platform/mellanox/mlxbf-bootctl.c | 311 
> ++
>  drivers/platform/mellanox/mlxbf-bootctl.h | 103 ++
>  4 files changed, 427 insertions(+)
>  create mode 100644 drivers/platform/mellanox/mlxbf-bootctl.c
>  create mode 100644 drivers/platform/mellanox/mlxbf-bootctl.h
> 
> diff --git a/drivers/platform/mellanox/Kconfig 
> b/drivers/platform/mellanox/Kconfig
> index 530fe7e..386336d 100644
> --- a/drivers/platform/mellanox/Kconfig
> +++ 

[PATCH 17/18] locking/atomic: crypto: nx: remove redundant casts

2019-05-22 Thread Mark Rutland
Now that atomic64_read() returns s64 consistently, we don't need to
explicitly cast its return value. Drop the redundant casts.

Signed-off-by: Mark Rutland 
Cc: Herbert Xu 
Cc: Peter Zijlstra 
Cc: Will Deacon 
---
 drivers/crypto/nx/nx-842-pseries.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/crypto/nx/nx-842-pseries.c 
b/drivers/crypto/nx/nx-842-pseries.c
index 9432e9e42afe..5cf77729a438 100644
--- a/drivers/crypto/nx/nx-842-pseries.c
+++ b/drivers/crypto/nx/nx-842-pseries.c
@@ -870,7 +870,7 @@ static ssize_t nx842_##_name##_show(struct device *dev, 
\
local_devdata = rcu_dereference(devdata);   \
if (local_devdata)  \
p = snprintf(buf, PAGE_SIZE, "%lld\n",  \
-		      (s64)atomic64_read(&local_devdata->counters->_name)); \
+		      atomic64_read(&local_devdata->counters->_name)); \
rcu_read_unlock();  \
return p;   \
 }
@@ -924,7 +924,7 @@ static ssize_t nx842_timehist_show(struct device *dev,
for (i = 0; i < (NX842_HIST_SLOTS - 2); i++) {
bytes = snprintf(p, bytes_remain, "%u-%uus:\t%lld\n",
   i ? (2<<(i-1)) : 0, (2<

[PATCH 18/18] locking/atomic: s390/pci: remove redundant casts

2019-05-22 Thread Mark Rutland
Now that atomic64_read() returns s64 consistently, we don't need to
explicitly cast its return value. Drop the redundant casts.

Signed-off-by: Mark Rutland 
Cc: Heiko Carstens 
Cc: Peter Zijlstra 
Cc: Will Deacon 
---
 arch/s390/pci/pci_debug.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/s390/pci/pci_debug.c b/arch/s390/pci/pci_debug.c
index 45eccf79e990..3408c0df3ebf 100644
--- a/arch/s390/pci/pci_debug.c
+++ b/arch/s390/pci/pci_debug.c
@@ -75,7 +75,7 @@ static void pci_sw_counter_show(struct seq_file *m)
 
for (i = 0; i < ARRAY_SIZE(pci_sw_names); i++, counter++)
seq_printf(m, "%26s:\t%llu\n", pci_sw_names[i],
-  (s64)atomic64_read(counter));
+  atomic64_read(counter));
 }
 
 static int pci_perf_show(struct seq_file *m, void *v)
-- 
2.11.0



[PATCH 16/18] locking/atomic: use s64 for atomic64_t on 64-bit

2019-05-22 Thread Mark Rutland
Now that all architectures use s64 consistently as the base type for the
atomic64 API, let's have the CONFIG_64BIT definition of atomic64_t use
s64 as the underlying type for atomic64_t, rather than long, matching
the generated headers.

On architectures where atomic64_read(v) is READ_ONCE(v->counter), this
patch will cause the return type of atomic64_read() to be s64.

As of this patch, the atomic64 API can be relied upon to consistently
return s64 where a value rather than boolean condition is returned. This
should make code more robust, and simpler, allowing for the removal of
casts previously required to ensure consistent types.

Signed-off-by: Mark Rutland 
Cc: Peter Zijlstra 
Cc: Will Deacon 
---
 include/linux/types.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/types.h b/include/linux/types.h
index 231114ae38f4..05030f608be3 100644
--- a/include/linux/types.h
+++ b/include/linux/types.h
@@ -174,7 +174,7 @@ typedef struct {
 
 #ifdef CONFIG_64BIT
 typedef struct {
-   long counter;
+   s64 counter;
 } atomic64_t;
 #endif
 
-- 
2.11.0



[PATCH 15/18] locking/atomic: x86: use s64 for atomic64

2019-05-22 Thread Mark Rutland
As a step towards making the atomic64 API use consistent types treewide,
let's have the x86 atomic64 implementation use s64 as the underlying
type for atomic64_t, rather than long or long long, matching the
generated headers.

Note that the x86 arch_atomic64 implementation is already wrapped by the
generic instrumented atomic64 implementation, which uses s64
consistently.

Otherwise, there should be no functional change as a result of this
patch.

Signed-off-by: Mark Rutland 
Cc: Borislav Petkov 
Cc: Ingo Molnar 
Cc: Peter Zijlstra 
Cc: Russell King 
Cc: Thomas Gleixner 
Cc: Will Deacon 
---
 arch/x86/include/asm/atomic64_32.h | 66 ++
 arch/x86/include/asm/atomic64_64.h | 38 +++---
 2 files changed, 51 insertions(+), 53 deletions(-)

diff --git a/arch/x86/include/asm/atomic64_32.h 
b/arch/x86/include/asm/atomic64_32.h
index 6a5b0ec460da..52cfaecb13f9 100644
--- a/arch/x86/include/asm/atomic64_32.h
+++ b/arch/x86/include/asm/atomic64_32.h
@@ -9,7 +9,7 @@
 /* An 64bit atomic type */
 
 typedef struct {
-   u64 __aligned(8) counter;
+   s64 __aligned(8) counter;
 } atomic64_t;
 
 #define ATOMIC64_INIT(val) { (val) }
@@ -71,8 +71,7 @@ ATOMIC64_DECL(add_unless);
  * the old value.
  */
 
-static inline long long arch_atomic64_cmpxchg(atomic64_t *v, long long o,
- long long n)
+static inline s64 arch_atomic64_cmpxchg(atomic64_t *v, s64 o, s64 n)
 {
	return arch_cmpxchg64(&v->counter, o, n);
 }
@@ -85,9 +84,9 @@ static inline long long arch_atomic64_cmpxchg(atomic64_t *v, 
long long o,
  * Atomically xchgs the value of @v to @n and returns
  * the old value.
  */
-static inline long long arch_atomic64_xchg(atomic64_t *v, long long n)
+static inline s64 arch_atomic64_xchg(atomic64_t *v, s64 n)
 {
-   long long o;
+   s64 o;
unsigned high = (unsigned)(n >> 32);
unsigned low = (unsigned)n;
alternative_atomic64(xchg, "=" (o),
@@ -103,7 +102,7 @@ static inline long long arch_atomic64_xchg(atomic64_t *v, 
long long n)
  *
  * Atomically sets the value of @v to @n.
  */
-static inline void arch_atomic64_set(atomic64_t *v, long long i)
+static inline void arch_atomic64_set(atomic64_t *v, s64 i)
 {
unsigned high = (unsigned)(i >> 32);
unsigned low = (unsigned)i;
@@ -118,9 +117,9 @@ static inline void arch_atomic64_set(atomic64_t *v, long 
long i)
  *
  * Atomically reads the value of @v and returns it.
  */
-static inline long long arch_atomic64_read(const atomic64_t *v)
+static inline s64 arch_atomic64_read(const atomic64_t *v)
 {
-   long long r;
+   s64 r;
alternative_atomic64(read, "=" (r), "c" (v) : "memory");
return r;
 }
@@ -132,7 +131,7 @@ static inline long long arch_atomic64_read(const atomic64_t 
*v)
  *
  * Atomically adds @i to @v and returns @i + *@v
  */
-static inline long long arch_atomic64_add_return(long long i, atomic64_t *v)
+static inline s64 arch_atomic64_add_return(s64 i, atomic64_t *v)
 {
alternative_atomic64(add_return,
 ASM_OUTPUT2("+A" (i), "+c" (v)),
@@ -143,7 +142,7 @@ static inline long long arch_atomic64_add_return(long long 
i, atomic64_t *v)
 /*
  * Other variants with different arithmetic operators:
  */
-static inline long long arch_atomic64_sub_return(long long i, atomic64_t *v)
+static inline s64 arch_atomic64_sub_return(s64 i, atomic64_t *v)
 {
alternative_atomic64(sub_return,
 ASM_OUTPUT2("+A" (i), "+c" (v)),
@@ -151,18 +150,18 @@ static inline long long arch_atomic64_sub_return(long 
long i, atomic64_t *v)
return i;
 }
 
-static inline long long arch_atomic64_inc_return(atomic64_t *v)
+static inline s64 arch_atomic64_inc_return(atomic64_t *v)
 {
-   long long a;
+   s64 a;
alternative_atomic64(inc_return, "=" (a),
 "S" (v) : "memory", "ecx");
return a;
 }
 #define arch_atomic64_inc_return arch_atomic64_inc_return
 
-static inline long long arch_atomic64_dec_return(atomic64_t *v)
+static inline s64 arch_atomic64_dec_return(atomic64_t *v)
 {
-   long long a;
+   s64 a;
alternative_atomic64(dec_return, "=" (a),
 "S" (v) : "memory", "ecx");
return a;
@@ -176,7 +175,7 @@ static inline long long arch_atomic64_dec_return(atomic64_t 
*v)
  *
  * Atomically adds @i to @v.
  */
-static inline long long arch_atomic64_add(long long i, atomic64_t *v)
+static inline s64 arch_atomic64_add(s64 i, atomic64_t *v)
 {
__alternative_atomic64(add, add_return,
   ASM_OUTPUT2("+A" (i), "+c" (v)),
@@ -191,7 +190,7 @@ static inline long long arch_atomic64_add(long long i, 
atomic64_t *v)
  *
  * Atomically subtracts @

[PATCH 12/18] locking/atomic: riscv: use s64 for atomic64

2019-05-22 Thread Mark Rutland
As a step towards making the atomic64 API use consistent types treewide,
let's have the s390 atomic64 implementation use s64 as the underlying
type for atomic64_t, rather than long, matching the generated headers.

As atomic64_read() depends on the generic definition of atomic64_t, this
still returns long on 64-bit. This will be converted in a subsequent
patch.

Otherwise, there should be no functional change as a result of this patch.

Signed-off-by: Mark Rutland 
Cc: Albert Ou 
Cc: Palmer Dabbelt 
Cc: Peter Zijlstra 
Cc: Will Deacon 
---
 arch/riscv/include/asm/atomic.h | 44 +
 1 file changed, 23 insertions(+), 21 deletions(-)

diff --git a/arch/riscv/include/asm/atomic.h b/arch/riscv/include/asm/atomic.h
index c9e18289d65c..bffebc57357d 100644
--- a/arch/riscv/include/asm/atomic.h
+++ b/arch/riscv/include/asm/atomic.h
@@ -42,11 +42,11 @@ static __always_inline void atomic_set(atomic_t *v, int i)
 
 #ifndef CONFIG_GENERIC_ATOMIC64
 #define ATOMIC64_INIT(i) { (i) }
-static __always_inline long atomic64_read(const atomic64_t *v)
+static __always_inline s64 atomic64_read(const atomic64_t *v)
 {
return READ_ONCE(v->counter);
 }
-static __always_inline void atomic64_set(atomic64_t *v, long i)
+static __always_inline void atomic64_set(atomic64_t *v, s64 i)
 {
WRITE_ONCE(v->counter, i);
 }
@@ -70,11 +70,11 @@ void atomic##prefix##_##op(c_type i, atomic##prefix##_t *v) 
\
 
 #ifdef CONFIG_GENERIC_ATOMIC64
 #define ATOMIC_OPS(op, asm_op, I)  \
-ATOMIC_OP (op, asm_op, I, w,  int,   )
+ATOMIC_OP (op, asm_op, I, w, int,   )
 #else
 #define ATOMIC_OPS(op, asm_op, I)  \
-ATOMIC_OP (op, asm_op, I, w,  int,   ) \
-ATOMIC_OP (op, asm_op, I, d, long, 64)
+ATOMIC_OP (op, asm_op, I, w, int,   )  \
+ATOMIC_OP (op, asm_op, I, d, s64, 64)
 #endif
 
 ATOMIC_OPS(add, add,  i)
@@ -131,14 +131,14 @@ c_type atomic##prefix##_##op##_return(c_type i, 
atomic##prefix##_t *v)\
 
 #ifdef CONFIG_GENERIC_ATOMIC64
 #define ATOMIC_OPS(op, asm_op, c_op, I)
\
-ATOMIC_FETCH_OP( op, asm_op,   I, w,  int,   ) \
-ATOMIC_OP_RETURN(op, asm_op, c_op, I, w,  int,   )
+ATOMIC_FETCH_OP( op, asm_op,   I, w, int,   )  \
+ATOMIC_OP_RETURN(op, asm_op, c_op, I, w, int,   )
 #else
 #define ATOMIC_OPS(op, asm_op, c_op, I)
\
-ATOMIC_FETCH_OP( op, asm_op,   I, w,  int,   ) \
-ATOMIC_OP_RETURN(op, asm_op, c_op, I, w,  int,   ) \
-ATOMIC_FETCH_OP( op, asm_op,   I, d, long, 64) \
-ATOMIC_OP_RETURN(op, asm_op, c_op, I, d, long, 64)
+ATOMIC_FETCH_OP( op, asm_op,   I, w, int,   )  \
+ATOMIC_OP_RETURN(op, asm_op, c_op, I, w, int,   )  \
+ATOMIC_FETCH_OP( op, asm_op,   I, d, s64, 64)  \
+ATOMIC_OP_RETURN(op, asm_op, c_op, I, d, s64, 64)
 #endif
 
 ATOMIC_OPS(add, add, +,  i)
@@ -170,11 +170,11 @@ ATOMIC_OPS(sub, add, +, -i)
 
 #ifdef CONFIG_GENERIC_ATOMIC64
 #define ATOMIC_OPS(op, asm_op, I)  \
-ATOMIC_FETCH_OP(op, asm_op, I, w,  int,   )
+ATOMIC_FETCH_OP(op, asm_op, I, w, int,   )
 #else
 #define ATOMIC_OPS(op, asm_op, I)  \
-ATOMIC_FETCH_OP(op, asm_op, I, w,  int,   )\
-ATOMIC_FETCH_OP(op, asm_op, I, d, long, 64)
+ATOMIC_FETCH_OP(op, asm_op, I, w, int,   ) \
+ATOMIC_FETCH_OP(op, asm_op, I, d, s64, 64)
 #endif
 
 ATOMIC_OPS(and, and, i)
@@ -223,9 +223,10 @@ static __always_inline int 
atomic_fetch_add_unless(atomic_t *v, int a, int u)
 #define atomic_fetch_add_unless atomic_fetch_add_unless
 
 #ifndef CONFIG_GENERIC_ATOMIC64
-static __always_inline long atomic64_fetch_add_unless(atomic64_t *v, long a, 
long u)
+static __always_inline s64 atomic64_fetch_add_unless(atomic64_t *v, s64 a, s64 
u)
 {
-   long prev, rc;
+   s64 prev;
+   long rc;
 
__asm__ __volatile__ (
"0: lr.d %[p],  %[c]\n"
@@ -294,11 +295,11 @@ c_t atomic##prefix##_cmpxchg(atomic##prefix##_t *v, c_t 
o, c_t n) \
 
 #ifdef CONFIG_GENERIC_ATOMIC64
 #define ATOMIC_OPS()   \
-   ATOMIC_OP( int,   , 4)
+   ATOMIC_OP(int,   , 4)
 #else
 #define ATOMIC_OPS()   \
-   ATOMIC_OP( int,   , 4)  \
-   ATOMIC_OP(long, 64, 8)
+   ATOMIC_OP(int,   , 4)   \
+   ATOMIC_OP(s64, 64, 8)
 #endif
 
 ATOMIC_OPS()
@@ -336,9 +337,10 @@ static __always_inline int atomic_sub_if_positive(atomic_t

[PATCH 11/18] locking/atomic: riscv: fix atomic64_sub_if_positive() offset argument

2019-05-22 Thread Mark Rutland
Presently the riscv implementation of atomic64_sub_if_positive() takes
a 32-bit offset value rather than a 64-bit offset value as it should do.
Thus, if called with a 64-bit offset, the value will be unexpectedly
truncated to 32 bits.

Fix this by taking the offset as a long rather than an int.

Signed-off-by: Mark Rutland 
Cc: Albert Ou 
Cc: Palmer Dabbelt 
Cc: Peter Zijlstra 
Cc: Will Deacon 
Cc: sta...@vger.kernel.org
---
 arch/riscv/include/asm/atomic.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/riscv/include/asm/atomic.h b/arch/riscv/include/asm/atomic.h
index 93826771b616..c9e18289d65c 100644
--- a/arch/riscv/include/asm/atomic.h
+++ b/arch/riscv/include/asm/atomic.h
@@ -336,7 +336,7 @@ static __always_inline int atomic_sub_if_positive(atomic_t 
*v, int offset)
 #define atomic_dec_if_positive(v)  atomic_sub_if_positive(v, 1)
 
 #ifndef CONFIG_GENERIC_ATOMIC64
-static __always_inline long atomic64_sub_if_positive(atomic64_t *v, int offset)
+static __always_inline long atomic64_sub_if_positive(atomic64_t *v, long 
offset)
 {
long prev, rc;
 
-- 
2.11.0



[PATCH 14/18] locking/atomic: sparc: use s64 for atomic64

2019-05-22 Thread Mark Rutland
As a step towards making the atomic64 API use consistent types treewide,
let's have the sparc atomic64 implementation use s64 as the underlying
type for atomic64_t, rather than long, matching the generated headers.

As atomic64_read() depends on the generic definition of atomic64_t, this
still returns long. This will be converted in a subsequent patch.

Otherwise, there should be no functional change as a result of this patch.

Signed-off-by: Mark Rutland 
Cc: David S. Miller 
Cc: Peter Zijlstra 
Cc: Will Deacon 
---
 arch/sparc/include/asm/atomic_64.h | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/sparc/include/asm/atomic_64.h 
b/arch/sparc/include/asm/atomic_64.h
index 6963482c81d8..b60448397d4f 100644
--- a/arch/sparc/include/asm/atomic_64.h
+++ b/arch/sparc/include/asm/atomic_64.h
@@ -23,15 +23,15 @@
 
 #define ATOMIC_OP(op)  \
 void atomic_##op(int, atomic_t *); \
-void atomic64_##op(long, atomic64_t *);
+void atomic64_##op(s64, atomic64_t *);
 
 #define ATOMIC_OP_RETURN(op)   \
 int atomic_##op##_return(int, atomic_t *); \
-long atomic64_##op##_return(long, atomic64_t *);
+s64 atomic64_##op##_return(s64, atomic64_t *);
 
 #define ATOMIC_FETCH_OP(op)\
 int atomic_fetch_##op(int, atomic_t *);
\
-long atomic64_fetch_##op(long, atomic64_t *);
+s64 atomic64_fetch_##op(s64, atomic64_t *);
 
 #define ATOMIC_OPS(op) ATOMIC_OP(op) ATOMIC_OP_RETURN(op) ATOMIC_FETCH_OP(op)
 
@@ -61,7 +61,7 @@ static inline int atomic_xchg(atomic_t *v, int new)
((__typeof__((v)->counter))cmpxchg(&((v)->counter), (o), (n)))
 #define atomic64_xchg(v, new) (xchg(&((v)->counter), new))
 
-long atomic64_dec_if_positive(atomic64_t *v);
+s64 atomic64_dec_if_positive(atomic64_t *v);
 #define atomic64_dec_if_positive atomic64_dec_if_positive
 
 #endif /* !(__ARCH_SPARC64_ATOMIC__) */
-- 
2.11.0



[PATCH 13/18] locking/atomic: s390: use s64 for atomic64

2019-05-22 Thread Mark Rutland
As a step towards making the atomic64 API use consistent types treewide,
let's have the s390 atomic64 implementation use s64 as the underlying
type for atomic64_t, rather than long, matching the generated headers.

As atomic64_read() depends on the generic definition of atomic64_t, this
still returns long. This will be converted in a subsequent patch.

The s390-internal __atomic64_*() ops are also used by the s390 bitops,
and expect pointers to long. Since atomic64_t::counter will be converted
to s64 in a subsequent patch, pointes to this are explicitly cast to
pointers to long when passed to __atomic64_*() ops.

Otherwise, there should be no functional change as a result of this
patch.

Signed-off-by: Mark Rutland 
Cc: Heiko Carstens 
Cc: Peter Zijlstra 
Cc: Will Deacon 
---
 arch/s390/include/asm/atomic.h | 38 +++---
 1 file changed, 19 insertions(+), 19 deletions(-)

diff --git a/arch/s390/include/asm/atomic.h b/arch/s390/include/asm/atomic.h
index fd20ab5d4cf7..491ad53a0d4e 100644
--- a/arch/s390/include/asm/atomic.h
+++ b/arch/s390/include/asm/atomic.h
@@ -84,9 +84,9 @@ static inline int atomic_cmpxchg(atomic_t *v, int old, int 
new)
 
 #define ATOMIC64_INIT(i)  { (i) }
 
-static inline long atomic64_read(const atomic64_t *v)
+static inline s64 atomic64_read(const atomic64_t *v)
 {
-   long c;
+   s64 c;
 
asm volatile(
"   lg  %0,%1\n"
@@ -94,49 +94,49 @@ static inline long atomic64_read(const atomic64_t *v)
return c;
 }
 
-static inline void atomic64_set(atomic64_t *v, long i)
+static inline void atomic64_set(atomic64_t *v, s64 i)
 {
asm volatile(
"   stg %1,%0\n"
: "=Q" (v->counter) : "d" (i));
 }
 
-static inline long atomic64_add_return(long i, atomic64_t *v)
+static inline s64 atomic64_add_return(s64 i, atomic64_t *v)
 {
-   return __atomic64_add_barrier(i, &v->counter) + i;
+   return __atomic64_add_barrier(i, (long *)&v->counter) + i;
 }
 
-static inline long atomic64_fetch_add(long i, atomic64_t *v)
+static inline s64 atomic64_fetch_add(s64 i, atomic64_t *v)
 {
-   return __atomic64_add_barrier(i, &v->counter);
+   return __atomic64_add_barrier(i, (long *)&v->counter);
 }
 
-static inline void atomic64_add(long i, atomic64_t *v)
+static inline void atomic64_add(s64 i, atomic64_t *v)
 {
 #ifdef CONFIG_HAVE_MARCH_Z196_FEATURES
if (__builtin_constant_p(i) && (i > -129) && (i < 128)) {
-   __atomic64_add_const(i, &v->counter);
+   __atomic64_add_const(i, (long *)&v->counter);
return;
}
 #endif
-   __atomic64_add(i, &v->counter);
+   __atomic64_add(i, (long *)&v->counter);
 }
 
 #define atomic64_xchg(v, new) (xchg(&((v)->counter), new))
 
-static inline long atomic64_cmpxchg(atomic64_t *v, long old, long new)
+static inline s64 atomic64_cmpxchg(atomic64_t *v, s64 old, s64 new)
 {
-   return __atomic64_cmpxchg(&v->counter, old, new);
+   return __atomic64_cmpxchg((long *)&v->counter, old, new);
 }
 
 #define ATOMIC64_OPS(op)   \
-static inline void atomic64_##op(long i, atomic64_t *v)
\
+static inline void atomic64_##op(s64 i, atomic64_t *v) \
 {  \
-   __atomic64_##op(i, &v->counter);\
+   __atomic64_##op(i, (long *)&v->counter);\
 }  \
-static inline long atomic64_fetch_##op(long i, atomic64_t *v)  \
+static inline long atomic64_fetch_##op(s64 i, atomic64_t *v)   \
 {  \
-   return __atomic64_##op##_barrier(i, &v->counter);   \
+   return __atomic64_##op##_barrier(i, (long *)&v->counter);   \
 }
 
 ATOMIC64_OPS(and)
@@ -145,8 +145,8 @@ ATOMIC64_OPS(xor)
 
 #undef ATOMIC64_OPS
 
-#define atomic64_sub_return(_i, _v)atomic64_add_return(-(long)(_i), _v)
-#define atomic64_fetch_sub(_i, _v) atomic64_fetch_add(-(long)(_i), _v)
-#define atomic64_sub(_i, _v)   atomic64_add(-(long)(_i), _v)
+#define atomic64_sub_return(_i, _v)atomic64_add_return(-(s64)(_i), _v)
+#define atomic64_fetch_sub(_i, _v) atomic64_fetch_add(-(s64)(_i), _v)
+#define atomic64_sub(_i, _v)   atomic64_add(-(s64)(_i), _v)
 
 #endif /* __ARCH_S390_ATOMIC__  */
-- 
2.11.0



[PATCH 06/18] locking/atomic: arm: use s64 for atomic64

2019-05-22 Thread Mark Rutland
As a step towards making the atomic64 API use consistent types treewide,
let's have the arm atomic64 implementation use s64 as the underlying
type for atomic64_t, rather than long long, matching the generated
headers.

Otherwise, there should be no functional change as a result of this
patch.

Signed-off-by: Mark Rutland 
Cc: Peter Zijlstra 
Cc: Russell King 
Cc: Will Deacon 
---
 arch/arm/include/asm/atomic.h | 50 +--
 1 file changed, 24 insertions(+), 26 deletions(-)

diff --git a/arch/arm/include/asm/atomic.h b/arch/arm/include/asm/atomic.h
index f74756641410..d45c41f6f69c 100644
--- a/arch/arm/include/asm/atomic.h
+++ b/arch/arm/include/asm/atomic.h
@@ -249,15 +249,15 @@ ATOMIC_OPS(xor, ^=, eor)
 
 #ifndef CONFIG_GENERIC_ATOMIC64
 typedef struct {
-   long long counter;
+   s64 counter;
 } atomic64_t;
 
 #define ATOMIC64_INIT(i) { (i) }
 
 #ifdef CONFIG_ARM_LPAE
-static inline long long atomic64_read(const atomic64_t *v)
+static inline s64 atomic64_read(const atomic64_t *v)
 {
-   long long result;
+   s64 result;
 
__asm__ __volatile__("@ atomic64_read\n"
 "  ldrd%0, %H0, [%1]"
@@ -268,7 +268,7 @@ static inline long long atomic64_read(const atomic64_t *v)
return result;
 }
 
-static inline void atomic64_set(atomic64_t *v, long long i)
+static inline void atomic64_set(atomic64_t *v, s64 i)
 {
__asm__ __volatile__("@ atomic64_set\n"
 "  strd%2, %H2, [%1]"
@@ -277,9 +277,9 @@ static inline void atomic64_set(atomic64_t *v, long long i)
);
 }
 #else
-static inline long long atomic64_read(const atomic64_t *v)
+static inline s64 atomic64_read(const atomic64_t *v)
 {
-   long long result;
+   s64 result;
 
__asm__ __volatile__("@ atomic64_read\n"
 "  ldrexd  %0, %H0, [%1]"
@@ -290,9 +290,9 @@ static inline long long atomic64_read(const atomic64_t *v)
return result;
 }
 
-static inline void atomic64_set(atomic64_t *v, long long i)
+static inline void atomic64_set(atomic64_t *v, s64 i)
 {
-   long long tmp;
+   s64 tmp;
 
	prefetchw(&v->counter);
__asm__ __volatile__("@ atomic64_set\n"
@@ -307,9 +307,9 @@ static inline void atomic64_set(atomic64_t *v, long long i)
 #endif
 
 #define ATOMIC64_OP(op, op1, op2)  \
-static inline void atomic64_##op(long long i, atomic64_t *v)   \
+static inline void atomic64_##op(s64 i, atomic64_t *v) \
 {  \
-   long long result;   \
+   s64 result; \
unsigned long tmp;  \
\
	prefetchw(&v->counter); \
@@ -326,10 +326,10 @@ static inline void atomic64_##op(long long i, atomic64_t 
*v)  \
 }  \
 
 #define ATOMIC64_OP_RETURN(op, op1, op2)   \
-static inline long long
\
-atomic64_##op##_return_relaxed(long long i, atomic64_t *v) \
+static inline s64  \
+atomic64_##op##_return_relaxed(s64 i, atomic64_t *v)   \
 {  \
-   long long result;   \
+   s64 result; \
unsigned long tmp;  \
\
	prefetchw(&v->counter); \
@@ -349,10 +349,10 @@ atomic64_##op##_return_relaxed(long long i, atomic64_t 
*v)\
 }
 
 #define ATOMIC64_FETCH_OP(op, op1, op2)
\
-static inline long long
\
-atomic64_fetch_##op##_relaxed(long long i, atomic64_t *v)  \
+static inline s64  \
+atomic64_fetch_##op##_relaxed(s64 i, atomic64_t *v)\
 {  \
-   long long result, val;  \
+   s64 result, val;\
unsigned long tmp;  \
\
	prefetchw(&v->counter); \
@@ -406,10 +406,9 @@ ATOMIC64_OPS(xor, eor, eor)
 #undef ATOMIC64_

[PATCH 07/18] locking/atomic: arm64: use s64 for atomic64

2019-05-22 Thread Mark Rutland
As a step towards making the atomic64 API use consistent types treewide,
let's have the arm64 atomic64 implementation use s64 as the underlying
type for atomic64_t, rather than long, matching the generated headers.

As atomic64_read() depends on the generic definition of atomic64_t, this
still returns long. This will be converted in a subsequent patch.

Note that in arch_atomic64_dec_if_positive(), the x0 variable is left as
long, as this variable is also used to hold the pointer to the
atomic64_t.

Otherwise, there should be no functional change as a result of this patch.

Signed-off-by: Mark Rutland 
Cc: Catalin Marinas 
Cc: Peter Zijlstra 
Cc: Will Deacon 
---
 arch/arm64/include/asm/atomic_ll_sc.h | 20 ++--
 arch/arm64/include/asm/atomic_lse.h   | 34 +-
 2 files changed, 27 insertions(+), 27 deletions(-)

diff --git a/arch/arm64/include/asm/atomic_ll_sc.h 
b/arch/arm64/include/asm/atomic_ll_sc.h
index e321293e0c89..f3b12d7f431f 100644
--- a/arch/arm64/include/asm/atomic_ll_sc.h
+++ b/arch/arm64/include/asm/atomic_ll_sc.h
@@ -133,9 +133,9 @@ ATOMIC_OPS(xor, eor)
 
 #define ATOMIC64_OP(op, asm_op)
\
 __LL_SC_INLINE void\
-__LL_SC_PREFIX(arch_atomic64_##op(long i, atomic64_t *v))  \
+__LL_SC_PREFIX(arch_atomic64_##op(s64 i, atomic64_t *v))   \
 {  \
-   long result;\
+   s64 result; \
unsigned long tmp;  \
\
asm volatile("// atomic64_" #op "\n"\
@@ -150,10 +150,10 @@ __LL_SC_PREFIX(arch_atomic64_##op(long i, atomic64_t *v)) 
\
 __LL_SC_EXPORT(arch_atomic64_##op);
 
 #define ATOMIC64_OP_RETURN(name, mb, acq, rel, cl, op, asm_op) \
-__LL_SC_INLINE long\
-__LL_SC_PREFIX(arch_atomic64_##op##_return##name(long i, atomic64_t *v))\
+__LL_SC_INLINE s64 \
+__LL_SC_PREFIX(arch_atomic64_##op##_return##name(s64 i, atomic64_t *v))\
 {  \
-   long result;\
+   s64 result; \
unsigned long tmp;  \
\
asm volatile("// atomic64_" #op "_return" #name "\n"\
@@ -172,10 +172,10 @@ __LL_SC_PREFIX(arch_atomic64_##op##_return##name(long i, 
atomic64_t *v))\
 __LL_SC_EXPORT(arch_atomic64_##op##_return##name);
 
 #define ATOMIC64_FETCH_OP(name, mb, acq, rel, cl, op, asm_op)  \
-__LL_SC_INLINE long\
-__LL_SC_PREFIX(arch_atomic64_fetch_##op##name(long i, atomic64_t *v))  \
+__LL_SC_INLINE s64 \
+__LL_SC_PREFIX(arch_atomic64_fetch_##op##name(s64 i, atomic64_t *v))   \
 {  \
-   long result, val;   \
+   s64 result, val;\
unsigned long tmp;  \
\
asm volatile("// atomic64_fetch_" #op #name "\n"\
@@ -225,10 +225,10 @@ ATOMIC64_OPS(xor, eor)
 #undef ATOMIC64_OP_RETURN
 #undef ATOMIC64_OP
 
-__LL_SC_INLINE long
+__LL_SC_INLINE s64
 __LL_SC_PREFIX(arch_atomic64_dec_if_positive(atomic64_t *v))
 {
-   long result;
+   s64 result;
unsigned long tmp;
 
asm volatile("// atomic64_dec_if_positive\n"
diff --git a/arch/arm64/include/asm/atomic_lse.h 
b/arch/arm64/include/asm/atomic_lse.h
index 9256a3921e4b..c53832b08af7 100644
--- a/arch/arm64/include/asm/atomic_lse.h
+++ b/arch/arm64/include/asm/atomic_lse.h
@@ -224,9 +224,9 @@ ATOMIC_FETCH_OP_SUB(, al, "memory")
 
 #define __LL_SC_ATOMIC64(op)   __LL_SC_CALL(arch_atomic64_##op)
 #define ATOMIC64_OP(op, asm_op)
\
-static inline void arch_atomic64_##op(long i, atomic64_t *v)   \
+static inline void arch_atomic64_##op(s64 i, atomic64_t *v)\
 {  \
-   register long x0 asm ("x0") = i;\
+   register s64

[PATCH 10/18] locking/atomic: powerpc: use s64 for atomic64

2019-05-22 Thread Mark Rutland
As a step towards making the atomic64 API use consistent types treewide,
let's have the powerpc atomic64 implementation use s64 as the underlying
type for atomic64_t, rather than long, matching the generated headers.

As atomic64_read() depends on the generic definition of atomic64_t, this
still returns long on 64-bit. This will be converted in a subsequent
patch.

Otherwise, there should be no functional change as a result of this
patch.

Signed-off-by: Mark Rutland 
Cc: Michael Ellerman 
Cc: Paul Mackerras 
Cc: Peter Zijlstra 
Cc: Will Deacon 
---
 arch/powerpc/include/asm/atomic.h | 44 +++
 1 file changed, 22 insertions(+), 22 deletions(-)

diff --git a/arch/powerpc/include/asm/atomic.h 
b/arch/powerpc/include/asm/atomic.h
index 52eafaf74054..31c231ea56b7 100644
--- a/arch/powerpc/include/asm/atomic.h
+++ b/arch/powerpc/include/asm/atomic.h
@@ -297,24 +297,24 @@ static __inline__ int atomic_dec_if_positive(atomic_t *v)
 
 #define ATOMIC64_INIT(i)   { (i) }
 
-static __inline__ long atomic64_read(const atomic64_t *v)
+static __inline__ s64 atomic64_read(const atomic64_t *v)
 {
-   long t;
+   s64 t;
 
__asm__ __volatile__("ld%U1%X1 %0,%1" : "=r"(t) : "m"(v->counter));
 
return t;
 }
 
-static __inline__ void atomic64_set(atomic64_t *v, long i)
+static __inline__ void atomic64_set(atomic64_t *v, s64 i)
 {
__asm__ __volatile__("std%U0%X0 %1,%0" : "=m"(v->counter) : "r"(i));
 }
 
 #define ATOMIC64_OP(op, asm_op)
\
-static __inline__ void atomic64_##op(long a, atomic64_t *v)\
+static __inline__ void atomic64_##op(s64 a, atomic64_t *v) \
 {  \
-   long t; \
+   s64 t;  \
\
__asm__ __volatile__(   \
 "1:ldarx   %0,0,%3 # atomic64_" #op "\n"   \
@@ -327,10 +327,10 @@ static __inline__ void atomic64_##op(long a, atomic64_t 
*v)   \
 }
 
 #define ATOMIC64_OP_RETURN_RELAXED(op, asm_op) \
-static inline long \
-atomic64_##op##_return_relaxed(long a, atomic64_t *v)  \
+static inline s64  \
+atomic64_##op##_return_relaxed(s64 a, atomic64_t *v)   \
 {  \
-   long t; \
+   s64 t;  \
\
__asm__ __volatile__(   \
 "1:ldarx   %0,0,%3 # atomic64_" #op "_return_relaxed\n"\
@@ -345,10 +345,10 @@ atomic64_##op##_return_relaxed(long a, atomic64_t *v) 
\
 }
 
 #define ATOMIC64_FETCH_OP_RELAXED(op, asm_op)  \
-static inline long \
-atomic64_fetch_##op##_relaxed(long a, atomic64_t *v)   \
+static inline s64  \
+atomic64_fetch_##op##_relaxed(s64 a, atomic64_t *v)\
 {  \
-   long res, t;\
+   s64 res, t; \
\
__asm__ __volatile__(   \
 "1:ldarx   %0,0,%4 # atomic64_fetch_" #op "_relaxed\n" \
@@ -396,7 +396,7 @@ ATOMIC64_OPS(xor, xor)
 
 static __inline__ void atomic64_inc(atomic64_t *v)
 {
-   long t;
+   s64 t;
 
__asm__ __volatile__(
 "1:ldarx   %0,0,%2 # atomic64_inc\n\
@@ -409,9 +409,9 @@ static __inline__ void atomic64_inc(atomic64_t *v)
 }
 #define atomic64_inc atomic64_inc
 
-static __inline__ long atomic64_inc_return_relaxed(atomic64_t *v)
+static __inline__ s64 atomic64_inc_return_relaxed(atomic64_t *v)
 {
-   long t;
+   s64 t;
 
__asm__ __volatile__(
 "1:ldarx   %0,0,%2 # atomic64_inc_return_relaxed\n"
@@ -427,7 +427,7 @@ static __inline__ long 
atomic64_inc_return_relaxed(atomic64_t *v)
 
 static __inline__ void atomic64_dec(atomic64_t *v)
 {
-   long t;
+   s64 t;
 
__asm__ __volatile__(
 "1:ldarx   %0,0,%2 # atomic64_dec\n\
@@ -440,9 +440,9

[PATCH 09/18] locking/atomic: mips: use s64 for atomic64

2019-05-22 Thread Mark Rutland
As a step towards making the atomic64 API use consistent types treewide,
let's have the mips atomic64 implementation use s64 as the underlying
type for atomic64_t, rather than long or __s64, matching the generated
headers.

As atomic64_read() depends on the generic definition of atomic64_t, this
still returns long on 64-bit. This will be converted in a subsequent
patch.

Otherwise, there should be no functional change as a result of this
patch.

Signed-off-by: Mark Rutland 
Cc: James Hogan 
Cc: Paul Burton 
Cc: Peter Zijlstra 
Cc: Ralf Baechle 
Cc: Will Deacon 
---
 arch/mips/include/asm/atomic.h | 22 +++---
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/arch/mips/include/asm/atomic.h b/arch/mips/include/asm/atomic.h
index 94096299fc56..9a82dd11c0e9 100644
--- a/arch/mips/include/asm/atomic.h
+++ b/arch/mips/include/asm/atomic.h
@@ -254,10 +254,10 @@ static __inline__ int atomic_sub_if_positive(int i, 
atomic_t * v)
 #define atomic64_set(v, i) WRITE_ONCE((v)->counter, (i))
 
 #define ATOMIC64_OP(op, c_op, asm_op)\
-static __inline__ void atomic64_##op(long i, atomic64_t * v) \
+static __inline__ void atomic64_##op(s64 i, atomic64_t * v)  \
 {\
if (kernel_uses_llsc) {   \
-   long temp;\
+   s64 temp; \
  \
loongson_llsc_mb();   \
__asm__ __volatile__( \
@@ -280,12 +280,12 @@ static __inline__ void atomic64_##op(long i, atomic64_t * 
v)\
 }
 
 #define ATOMIC64_OP_RETURN(op, c_op, asm_op) \
-static __inline__ long atomic64_##op##_return_relaxed(long i, atomic64_t * v) \
+static __inline__ s64 atomic64_##op##_return_relaxed(s64 i, atomic64_t * v)   \
 {\
-   long result;  \
+   s64 result;   \
  \
if (kernel_uses_llsc) {   \
-   long temp;\
+   s64 temp; \
  \
loongson_llsc_mb();   \
__asm__ __volatile__( \
@@ -314,12 +314,12 @@ static __inline__ long 
atomic64_##op##_return_relaxed(long i, atomic64_t * v) \
 }
 
 #define ATOMIC64_FETCH_OP(op, c_op, asm_op)  \
-static __inline__ long atomic64_fetch_##op##_relaxed(long i, atomic64_t * v)  \
+static __inline__ s64 atomic64_fetch_##op##_relaxed(s64 i, atomic64_t * v)\
 {\
-   long result;  \
+   s64 result;   \
  \
if (kernel_uses_llsc) {   \
-   long temp;\
+   s64 temp; \
  \
loongson_llsc_mb();   \
__asm__ __volatile__( \
@@ -386,14 +386,14 @@ ATOMIC64_OPS(xor, ^=, xor)
  * Atomically test @v and subtract @i if @v is greater or equal than @i.
  * The function returns the old value of @v minus @i.
  */
-static __inline__ long atomic64_sub_if_positive(long i, atomic64_t * v)
+static __inline__ s64 atomic64_sub_if_positive(s64 i, atomic64_t * v)
 {
-   long result;
+   s64 result;
 
smp_mb__before_llsc();
 
if (kernel_uses_llsc) {
-   long temp;
+   s64 temp;
 
__asm__ __volatile__(
"   .setpush\n"
-- 
2.11.0



[PATCH 08/18] locking/atomic: ia64: use s64 for atomic64

2019-05-22 Thread Mark Rutland
As a step towards making the atomic64 API use consistent types treewide,
let's have the ia64 atomic64 implementation use s64 as the underlying
type for atomic64_t, rather than long or __s64, matching the generated
headers.

As atomic64_read() depends on the generic definition of atomic64_t, this
still returns long. This will be converted in a subsequent patch.

Otherwise, there should be no functional change as a result of this
patch.

Signed-off-by: Mark Rutland 
Cc: Fenghua Yu 
Cc: Peter Zijlstra 
Cc: Tony Luck 
Cc: Will Deacon 
---
 arch/ia64/include/asm/atomic.h | 20 ++--
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/arch/ia64/include/asm/atomic.h b/arch/ia64/include/asm/atomic.h
index 206530d0751b..50440f3ddc43 100644
--- a/arch/ia64/include/asm/atomic.h
+++ b/arch/ia64/include/asm/atomic.h
@@ -124,10 +124,10 @@ ATOMIC_FETCH_OP(xor, ^)
 #undef ATOMIC_OP
 
 #define ATOMIC64_OP(op, c_op)  \
-static __inline__ long \
-ia64_atomic64_##op (__s64 i, atomic64_t *v)\
+static __inline__ s64  \
+ia64_atomic64_##op (s64 i, atomic64_t *v)  \
 {  \
-   __s64 old, new; \
+   s64 old, new;   \
CMPXCHG_BUGCHECK_DECL   \
\
do {\
@@ -139,10 +139,10 @@ ia64_atomic64_##op (__s64 i, atomic64_t *v)   
\
 }
 
 #define ATOMIC64_FETCH_OP(op, c_op)\
-static __inline__ long \
-ia64_atomic64_fetch_##op (__s64 i, atomic64_t *v)  \
+static __inline__ s64  \
+ia64_atomic64_fetch_##op (s64 i, atomic64_t *v)
\
 {  \
-   __s64 old, new; \
+   s64 old, new;   \
CMPXCHG_BUGCHECK_DECL   \
\
do {\
@@ -162,7 +162,7 @@ ATOMIC64_OPS(sub, -)
 
 #define atomic64_add_return(i,v)   \
 ({ \
-   long __ia64_aar_i = (i);\
+   s64 __ia64_aar_i = (i); \
__ia64_atomic_const(i)  \
? ia64_fetch_and_add(__ia64_aar_i, &(v)->counter)   \
: ia64_atomic64_add(__ia64_aar_i, v);   \
@@ -170,7 +170,7 @@ ATOMIC64_OPS(sub, -)
 
 #define atomic64_sub_return(i,v)   \
 ({ \
-   long __ia64_asr_i = (i);\
+   s64 __ia64_asr_i = (i); \
__ia64_atomic_const(i)  \
? ia64_fetch_and_add(-__ia64_asr_i, &(v)->counter)  \
: ia64_atomic64_sub(__ia64_asr_i, v);   \
@@ -178,7 +178,7 @@ ATOMIC64_OPS(sub, -)
 
 #define atomic64_fetch_add(i,v)
\
 ({ \
-   long __ia64_aar_i = (i);\
+   s64 __ia64_aar_i = (i); \
__ia64_atomic_const(i)  \
? ia64_fetchadd(__ia64_aar_i, &(v)->counter, acq)   \
: ia64_atomic64_fetch_add(__ia64_aar_i, v); \
@@ -186,7 +186,7 @@ ATOMIC64_OPS(sub, -)
 
 #define atomic64_fetch_sub(i,v)
\
 ({ \
-   long __ia64_asr_i = (i);\
+   s64 __ia64_asr_i = (i); \
__ia64_atomic_const(i)  \
? ia64_fetchadd(-__ia64_asr_i, &(v)->counter, acq)  \
: ia64_atomic64_fetch_sub(__ia64_asr_i, v); \
-- 
2.11.0



[PATCH 03/18] locking/atomic: generic: use s64 for atomic64

2019-05-22 Thread Mark Rutland
As a step towards making the atomic64 API use consistent types treewide,
let's have the generic atomic64 implementation use s64 as the underlying
type for atomic64_t, rather than long long, matching the generated
headers.

Otherwise, there should be no functional change as a result of this
patch.

Signed-off-by: Mark Rutland 
Cc: Peter Zijlstra 
Cc: Will Deacon 
---
 include/asm-generic/atomic64.h | 20 ++--
 lib/atomic64.c | 32 
 2 files changed, 26 insertions(+), 26 deletions(-)

diff --git a/include/asm-generic/atomic64.h b/include/asm-generic/atomic64.h
index 97b28b7f1f29..fc7b831ed632 100644
--- a/include/asm-generic/atomic64.h
+++ b/include/asm-generic/atomic64.h
@@ -14,24 +14,24 @@
 #include 
 
 typedef struct {
-   long long counter;
+   s64 counter;
 } atomic64_t;
 
 #define ATOMIC64_INIT(i)   { (i) }
 
-extern long long atomic64_read(const atomic64_t *v);
-extern void atomic64_set(atomic64_t *v, long long i);
+extern s64 atomic64_read(const atomic64_t *v);
+extern void atomic64_set(atomic64_t *v, s64 i);
 
 #define atomic64_set_release(v, i) atomic64_set((v), (i))
 
 #define ATOMIC64_OP(op)
\
-extern void atomic64_##op(long long a, atomic64_t *v);
+extern void atomic64_##op(s64 a, atomic64_t *v);
 
 #define ATOMIC64_OP_RETURN(op) \
-extern long long atomic64_##op##_return(long long a, atomic64_t *v);
+extern s64 atomic64_##op##_return(s64 a, atomic64_t *v);
 
 #define ATOMIC64_FETCH_OP(op)  \
-extern long long atomic64_fetch_##op(long long a, atomic64_t *v);
+extern s64 atomic64_fetch_##op(s64 a, atomic64_t *v);
 
 #define ATOMIC64_OPS(op)   ATOMIC64_OP(op) ATOMIC64_OP_RETURN(op) 
ATOMIC64_FETCH_OP(op)
 
@@ -50,11 +50,11 @@ ATOMIC64_OPS(xor)
 #undef ATOMIC64_OP_RETURN
 #undef ATOMIC64_OP
 
-extern long long atomic64_dec_if_positive(atomic64_t *v);
+extern s64 atomic64_dec_if_positive(atomic64_t *v);
 #define atomic64_dec_if_positive atomic64_dec_if_positive
-extern long long atomic64_cmpxchg(atomic64_t *v, long long o, long long n);
-extern long long atomic64_xchg(atomic64_t *v, long long new);
-extern long long atomic64_fetch_add_unless(atomic64_t *v, long long a, long 
long u);
+extern s64 atomic64_cmpxchg(atomic64_t *v, s64 o, s64 n);
+extern s64 atomic64_xchg(atomic64_t *v, s64 new);
+extern s64 atomic64_fetch_add_unless(atomic64_t *v, s64 a, s64 u);
 #define atomic64_fetch_add_unless atomic64_fetch_add_unless
 
 #endif  /*  _ASM_GENERIC_ATOMIC64_H  */
diff --git a/lib/atomic64.c b/lib/atomic64.c
index 1d91e31eceec..62f218bf50a0 100644
--- a/lib/atomic64.c
+++ b/lib/atomic64.c
@@ -46,11 +46,11 @@ static inline raw_spinlock_t *lock_addr(const atomic64_t *v)
	return &atomic64_lock[addr & (NR_LOCKS - 1)].lock;
 }
 
-long long atomic64_read(const atomic64_t *v)
+s64 atomic64_read(const atomic64_t *v)
 {
unsigned long flags;
raw_spinlock_t *lock = lock_addr(v);
-   long long val;
+   s64 val;
 
raw_spin_lock_irqsave(lock, flags);
val = v->counter;
@@ -59,7 +59,7 @@ long long atomic64_read(const atomic64_t *v)
 }
 EXPORT_SYMBOL(atomic64_read);
 
-void atomic64_set(atomic64_t *v, long long i)
+void atomic64_set(atomic64_t *v, s64 i)
 {
unsigned long flags;
raw_spinlock_t *lock = lock_addr(v);
@@ -71,7 +71,7 @@ void atomic64_set(atomic64_t *v, long long i)
 EXPORT_SYMBOL(atomic64_set);
 
 #define ATOMIC64_OP(op, c_op)  \
-void atomic64_##op(long long a, atomic64_t *v) \
+void atomic64_##op(s64 a, atomic64_t *v)   \
 {  \
unsigned long flags;\
raw_spinlock_t *lock = lock_addr(v);\
@@ -83,11 +83,11 @@ void atomic64_##op(long long a, atomic64_t *v)  
\
 EXPORT_SYMBOL(atomic64_##op);
 
 #define ATOMIC64_OP_RETURN(op, c_op)   \
-long long atomic64_##op##_return(long long a, atomic64_t *v)   \
+s64 atomic64_##op##_return(s64 a, atomic64_t *v)   \
 {  \
unsigned long flags;\
raw_spinlock_t *lock = lock_addr(v);\
-   long long val;  \
+   s64 val;\
\
raw_spin_lock_irqsave(lock, flags); \
val = (v->counter c_op a);  \
@@ -97,11 +97,11 @@ long long atomic64_##op##_return

[PATCH 04/18] locking/atomic: alpha: use s64 for atomic64

2019-05-22 Thread Mark Rutland
As a step towards making the atomic64 API use consistent types treewide,
let's have the alpha atomic64 implementation use s64 as the underlying
type for atomic64_t, rather than long, matching the generated headers.

As atomic64_read() depends on the generic definition of atomic64_t, this
still returns long. This will be converted in a subsequent patch.

Otherwise, there should be no functional change as a result of this
patch.

Signed-off-by: Mark Rutland 
Cc: Ivan Kokshaysky 
Cc: Matt Turner 
Cc: Peter Zijlstra 
Cc: Richard Henderson 
Cc: Will Deacon 
---
 arch/alpha/include/asm/atomic.h | 20 ++--
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/arch/alpha/include/asm/atomic.h b/arch/alpha/include/asm/atomic.h
index 150a1c5d6a2c..2144530d1428 100644
--- a/arch/alpha/include/asm/atomic.h
+++ b/arch/alpha/include/asm/atomic.h
@@ -93,9 +93,9 @@ static inline int atomic_fetch_##op##_relaxed(int i, atomic_t 
*v) \
 }
 
 #define ATOMIC64_OP(op, asm_op)
\
-static __inline__ void atomic64_##op(long i, atomic64_t * v)   \
+static __inline__ void atomic64_##op(s64 i, atomic64_t * v)\
 {  \
-   unsigned long temp; \
+   s64 temp;   \
__asm__ __volatile__(   \
"1: ldq_l %0,%1\n"  \
"   " #asm_op " %0,%2,%0\n" \
@@ -109,9 +109,9 @@ static __inline__ void atomic64_##op(long i, atomic64_t * 
v)\
 }  \
 
 #define ATOMIC64_OP_RETURN(op, asm_op) \
-static __inline__ long atomic64_##op##_return_relaxed(long i, atomic64_t * v)  
\
+static __inline__ s64 atomic64_##op##_return_relaxed(s64 i, atomic64_t * v)
\
 {  \
-   long temp, result;  \
+   s64 temp, result;   \
__asm__ __volatile__(   \
"1: ldq_l %0,%1\n"  \
"   " #asm_op " %0,%3,%2\n" \
@@ -128,9 +128,9 @@ static __inline__ long atomic64_##op##_return_relaxed(long 
i, atomic64_t * v)   \
 }
 
 #define ATOMIC64_FETCH_OP(op, asm_op)  \
-static __inline__ long atomic64_fetch_##op##_relaxed(long i, atomic64_t * v)   
\
+static __inline__ s64 atomic64_fetch_##op##_relaxed(s64 i, atomic64_t * v) 
\
 {  \
-   long temp, result;  \
+   s64 temp, result;   \
__asm__ __volatile__(   \
"1: ldq_l %2,%1\n"  \
"   " #asm_op " %2,%3,%0\n" \
@@ -246,9 +246,9 @@ static __inline__ int atomic_fetch_add_unless(atomic_t *v, 
int a, int u)
  * Atomically adds @a to @v, so long as it was not @u.
  * Returns the old value of @v.
  */
-static __inline__ long atomic64_fetch_add_unless(atomic64_t *v, long a, long u)
+static __inline__ s64 atomic64_fetch_add_unless(atomic64_t *v, s64 a, s64 u)
 {
-   long c, new, old;
+   s64 c, new, old;
smp_mb();
__asm__ __volatile__(
"1: ldq_l   %[old],%[mem]\n"
@@ -276,9 +276,9 @@ static __inline__ long atomic64_fetch_add_unless(atomic64_t 
*v, long a, long u)
  * The function returns the old value of *v minus 1, even if
  * the atomic variable, v, was not decremented.
  */
-static inline long atomic64_dec_if_positive(atomic64_t *v)
+static inline s64 atomic64_dec_if_positive(atomic64_t *v)
 {
-   long old, tmp;
+   s64 old, tmp;
smp_mb();
__asm__ __volatile__(
"1: ldq_l   %[old],%[mem]\n"
-- 
2.11.0



[PATCH 05/18] locking/atomic: arc: use s64 for atomic64

2019-05-22 Thread Mark Rutland
As a step towards making the atomic64 API use consistent types treewide,
let's have the arc atomic64 implementation use s64 as the underlying
type for atomic64_t, rather than u64, matching the generated headers.

Otherwise, there should be no functional change as a result of this
patch.

Signed-off-by: Mark Rutland 
Cc: Peter Zijlstra 
Cc: Vineet Gupta 
Cc: Will Deacon 
---
 arch/arc/include/asm/atomic.h | 41 -
 1 file changed, 20 insertions(+), 21 deletions(-)

diff --git a/arch/arc/include/asm/atomic.h b/arch/arc/include/asm/atomic.h
index 158af079838d..2c75df55d0d2 100644
--- a/arch/arc/include/asm/atomic.h
+++ b/arch/arc/include/asm/atomic.h
@@ -324,14 +324,14 @@ ATOMIC_OPS(xor, ^=, CTOP_INST_AXOR_DI_R2_R2_R3)
  */
 
 typedef struct {
-   aligned_u64 counter;
+   s64 __aligned(8) counter;
 } atomic64_t;
 
 #define ATOMIC64_INIT(a) { (a) }
 
-static inline long long atomic64_read(const atomic64_t *v)
+static inline s64 atomic64_read(const atomic64_t *v)
 {
-   unsigned long long val;
+   s64 val;
 
__asm__ __volatile__(
"   ldd   %0, [%1]  \n"
@@ -341,7 +341,7 @@ static inline long long atomic64_read(const atomic64_t *v)
return val;
 }
 
-static inline void atomic64_set(atomic64_t *v, long long a)
+static inline void atomic64_set(atomic64_t *v, s64 a)
 {
/*
 * This could have been a simple assignment in "C" but would need
@@ -362,9 +362,9 @@ static inline void atomic64_set(atomic64_t *v, long long a)
 }
 
 #define ATOMIC64_OP(op, op1, op2)  \
-static inline void atomic64_##op(long long a, atomic64_t *v)   \
+static inline void atomic64_##op(s64 a, atomic64_t *v) \
 {  \
-   unsigned long long val; \
+   s64 val;\
\
__asm__ __volatile__(   \
"1: \n" \
@@ -375,13 +375,13 @@ static inline void atomic64_##op(long long a, atomic64_t 
*v)  \
"   bnz 1b  \n" \
	: "=&r"(val)\
	: "r"(&v->counter), "ir"(a) \
-   : "cc");\
+   : "cc");\
 }  \
 
 #define ATOMIC64_OP_RETURN(op, op1, op2)   \
-static inline long long atomic64_##op##_return(long long a, atomic64_t *v) 
\
+static inline s64 atomic64_##op##_return(s64 a, atomic64_t *v) \
 {  \
-   unsigned long long val; \
+   s64 val;\
\
smp_mb();   \
\
@@ -402,9 +402,9 @@ static inline long long atomic64_##op##_return(long long a, 
atomic64_t *v)  \
 }
 
 #define ATOMIC64_FETCH_OP(op, op1, op2)
\
-static inline long long atomic64_fetch_##op(long long a, atomic64_t *v)
\
+static inline s64 atomic64_fetch_##op(s64 a, atomic64_t *v)\
 {  \
-   unsigned long long val, orig;   \
+   s64 val, orig;  \
\
smp_mb();   \
\
@@ -444,10 +444,10 @@ ATOMIC64_OPS(xor, xor, xor)
 #undef ATOMIC64_OP_RETURN
 #undef ATOMIC64_OP
 
-static inline long long
-atomic64_cmpxchg(atomic64_t *ptr, long long expected, long long new)
+static inline s64
+atomic64_cmpxchg(atomic64_t *ptr, s64 expected, s64 new)
 {
-   long long prev;
+   s64 prev;
 
smp_mb();
 
@@ -467,9 +467,9 @@ atomic64_cmpxchg(atomic64_t *ptr, long long expected, long 
long new)
return prev;
 }
 
-static inline long long atomic64_xchg(atomic64_t *ptr, long long new)
+static inline s64 atomic64_xchg(atomic64_t *ptr, s64 new)
 {
-   long long prev;
+   s64 prev;
 
smp_mb();
 
@@ -495,9 +495,9 @@ static inline long long atomic64_xchg(ato

[PATCH 01/18] locking/atomic: crypto: nx: prepare for atomic64_read() conversion

2019-05-22 Thread Mark Rutland
The return type of atomic64_read() varies by architecture. It may return
long (e.g. powerpc), long long (e.g. arm), or s64 (e.g. x86_64). This is
somewhat painful, and mandates the use of explicit casts in some cases
(e.g. when printing the return value).

To ameliorate matters, subsequent patches will make the atomic64 API
consistently use s64.

As a preparatory step, this patch updates the nx-842 code to treat the
return value of atomic64_read() as s64, using explicit casts. These
casts will be removed once the s64 conversion is complete.

Signed-off-by: Mark Rutland 
Cc: Herbert Xu 
Cc: Peter Zijlstra 
Cc: Will Deacon 
---
 drivers/crypto/nx/nx-842-pseries.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/crypto/nx/nx-842-pseries.c 
b/drivers/crypto/nx/nx-842-pseries.c
index 57932848361b..9432e9e42afe 100644
--- a/drivers/crypto/nx/nx-842-pseries.c
+++ b/drivers/crypto/nx/nx-842-pseries.c
@@ -869,8 +869,8 @@ static ssize_t nx842_##_name##_show(struct device *dev, 
\
rcu_read_lock();\
local_devdata = rcu_dereference(devdata);   \
if (local_devdata)  \
-   p = snprintf(buf, PAGE_SIZE, "%ld\n",   \
-  atomic64_read(&local_devdata->counters->_name)); \
+   p = snprintf(buf, PAGE_SIZE, "%lld\n",  \
+  (s64)atomic64_read(&local_devdata->counters->_name));
\
rcu_read_unlock();  \
return p;   \
 }
@@ -922,17 +922,17 @@ static ssize_t nx842_timehist_show(struct device *dev,
}
 
for (i = 0; i < (NX842_HIST_SLOTS - 2); i++) {
-   bytes = snprintf(p, bytes_remain, "%u-%uus:\t%ld\n",
+   bytes = snprintf(p, bytes_remain, "%u-%uus:\t%lld\n",
   i ? (2<<(i-1)) : 0, (2<

[PATCH 02/18] locking/atomic: s390/pci: prepare for atomic64_read() conversion

2019-05-22 Thread Mark Rutland
The return type of atomic64_read() varies by architecture. It may return
long (e.g. powerpc), long long (e.g. arm), or s64 (e.g. x86_64). This is
somewhat painful, and mandates the use of explicit casts in some cases
(e.g. when printing the return value).

To ameliorate matters, subsequent patches will make the atomic64 API
consistently use s64.

As a preparatory step, this patch updates the s390 pci debug code to
treat the return value of atomic64_read() as s64, using an explicit
cast. This cast will be removed once the s64 conversion is complete.

Signed-off-by: Mark Rutland 
Cc: Heiko Carstens 
Cc: Peter Zijlstra 
Cc: Will Deacon 
---
 arch/s390/pci/pci_debug.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/s390/pci/pci_debug.c b/arch/s390/pci/pci_debug.c
index 6b48ca7760a7..45eccf79e990 100644
--- a/arch/s390/pci/pci_debug.c
+++ b/arch/s390/pci/pci_debug.c
@@ -74,8 +74,8 @@ static void pci_sw_counter_show(struct seq_file *m)
int i;
 
for (i = 0; i < ARRAY_SIZE(pci_sw_names); i++, counter++)
-   seq_printf(m, "%26s:\t%lu\n", pci_sw_names[i],
-  atomic64_read(counter));
+   seq_printf(m, "%26s:\t%llu\n", pci_sw_names[i],
+  (s64)atomic64_read(counter));
 }
 
 static int pci_perf_show(struct seq_file *m, void *v)
-- 
2.11.0



[PATCH 00/18] locking/atomic: atomic64 type cleanup

2019-05-22 Thread Mark Rutland
Currently architectures return inconsistent types for atomic64 ops. Some return
long (e.g. powerpc), some return long long (e.g. arc), and some return s64
(e.g. x86).

This is a bit messy, and causes unnecessary pain (e.g. as values must be cast
before they can be printed [1]).

This series reworks all the atomic64 implementations to use s64 as the base
type for atomic64_t (as discussed [2]), and to ensure that this type is
consistently used for parameters and return values in the API, avoiding further
problems in this area.
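
As a rough sketch of the pain point (the counter below is made up rather than
taken from any of these patches), a caller that wants to print an atomic64_t
portably has to cast today, and can drop the cast once atomic64_read()
consistently returns s64:

	static atomic64_t nr_widgets = ATOMIC64_INIT(0);

	static void report_widgets(void)
	{
		/* today: the return type may be long, long long, or s64 */
		pr_info("widgets: %lld\n", (s64)atomic64_read(&nr_widgets));

		/* after this series: atomic64_read() always returns s64 */
		pr_info("widgets: %lld\n", atomic64_read(&nr_widgets));
	}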

This series (based on v5.1-rc1) can also be found in my atomics/type-cleanup
branch [3] on kernel.org.

Thanks,
Mark.

[1] https://lkml.kernel.org/r/20190310183051.87303-1-...@lca.pw
[2] 
https://lkml.kernel.org/r/20190313091844.ga24...@hirez.programming.kicks-ass.net
[3] git://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git 
atomics/type-cleanup

Mark Rutland (18):
  locking/atomic: crypto: nx: prepare for atomic64_read() conversion
  locking/atomic: s390/pci: prepare for atomic64_read() conversion
  locking/atomic: generic: use s64 for atomic64
  locking/atomic: alpha: use s64 for atomic64
  locking/atomic: arc: use s64 for atomic64
  locking/atomic: arm: use s64 for atomic64
  locking/atomic: arm64: use s64 for atomic64
  locking/atomic: ia64: use s64 for atomic64
  locking/atomic: mips: use s64 for atomic64
  locking/atomic: powerpc: use s64 for atomic64
  locking/atomic: riscv: fix atomic64_sub_if_positive() offset argument
  locking/atomic: riscv: use s64 for atomic64
  locking/atomic: s390: use s64 for atomic64
  locking/atomic: sparc: use s64 for atomic64
  locking/atomic: x86: use s64 for atomic64
  locking/atomic: use s64 for atomic64_t on 64-bit
  locking/atomic: crypto: nx: remove redundant casts
  locking/atomic: s390/pci: remove redundant casts

 arch/alpha/include/asm/atomic.h   | 20 +--
 arch/arc/include/asm/atomic.h | 41 +++---
 arch/arm/include/asm/atomic.h | 50 +-
 arch/arm64/include/asm/atomic_ll_sc.h | 20 +--
 arch/arm64/include/asm/atomic_lse.h   | 34 +-
 arch/ia64/include/asm/atomic.h| 20 +--
 arch/mips/include/asm/atomic.h| 22 ++--
 arch/powerpc/include/asm/atomic.h | 44 +++
 arch/riscv/include/asm/atomic.h   | 44 ---
 arch/s390/include/asm/atomic.h| 38 ++--
 arch/s390/pci/pci_debug.c |  2 +-
 arch/sparc/include/asm/atomic_64.h|  8 ++---
 arch/x86/include/asm/atomic64_32.h| 66 +--
 arch/x86/include/asm/atomic64_64.h| 38 ++--
 drivers/crypto/nx/nx-842-pseries.c|  6 ++--
 include/asm-generic/atomic64.h| 20 +--
 include/linux/types.h |  2 +-
 lib/atomic64.c| 32 -
 18 files changed, 252 insertions(+), 255 deletions(-)

-- 
2.11.0



Re: [PATCH 3/6] arm64: pmu: Add function implementation to update event index in userpage.

2019-05-17 Thread Mark Rutland
On Thu, May 16, 2019 at 02:21:45PM +0100, Raphael Gault wrote:
> In order to be able to access the counter directly for userspace,
> we need to provide the index of the counter using the userpage.
> We thus need to override the event_idx function to retrieve and
> convert the perf_event index to armv8 hardware index.

It would be worth noting that since the arm_pmu framework can be used
with other versions of the PMU architecture which may not permit safe
userspace access, we allow the arch code to opt-in with the
ARMPMU_EL0_RD_CNTR flag.

> Signed-off-by: Raphael Gault 
> ---
>  arch/arm64/kernel/perf_event.c |  4 
>  drivers/perf/arm_pmu.c | 10 ++
>  include/linux/perf/arm_pmu.h   |  2 ++
>  3 files changed, 16 insertions(+)
> 
> diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
> index 6164d389eed6..e6316f99f66b 100644
> --- a/arch/arm64/kernel/perf_event.c
> +++ b/arch/arm64/kernel/perf_event.c
> @@ -890,6 +890,8 @@ static int __armv8_pmuv3_map_event(struct perf_event 
> *event,
>   if (armv8pmu_event_is_64bit(event))
>   event->hw.flags |= ARMPMU_EVT_64BIT;
>  
> + event->hw.flags |= ARMPMU_EL0_RD_CNTR;
> +
>   /* Only expose micro/arch events supported by this PMU */
>   if ((hw_event_id > 0) && (hw_event_id < ARMV8_PMUV3_MAX_COMMON_EVENTS)
>   && test_bit(hw_event_id, armpmu->pmceid_bitmap)) {
> @@ -1188,6 +1190,8 @@ void arch_perf_update_userpage(struct perf_event *event,
>*/
>   freq = arch_timer_get_rate();
>   userpg->cap_user_time = 1;
> + userpg->cap_user_rdpmc =
> + !!(event->hw.flags & ARMPMU_EL0_RD_CNTR);

This is under 80 columns when placed on a single line, so it doesn't
need to be split here.

>  
>   clocks_calc_mult_shift(&userpg->time_mult, &shift, freq,
>   NSEC_PER_SEC, 0);
> diff --git a/drivers/perf/arm_pmu.c b/drivers/perf/arm_pmu.c
> index eec75b97e7ea..3f4c2ec7ff89 100644
> --- a/drivers/perf/arm_pmu.c
> +++ b/drivers/perf/arm_pmu.c
> @@ -777,6 +777,15 @@ static void cpu_pmu_destroy(struct arm_pmu *cpu_pmu)
>  &cpu_pmu->node);
>  }
>  
> +
> +static int armpmu_event_idx(struct perf_event *event)
> +{
> + if (!(event->hw.flags & ARMPMU_EL0_RD_CNTR))
> + return 0;
> +
> + return event->hw.idx;

I think this needs to remap ARMV8_IDX_CYCLE_COUNTER to 32, to match the
offset applied to the rest of the counter indices.
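
Something along these lines (a rough, untested sketch; referencing the
arm64-specific ARMV8_IDX_CYCLE_COUNTER from the generic arm_pmu code here is
purely for illustration, and the remap would need to live in arch code):

	static int armpmu_event_idx(struct perf_event *event)
	{
		if (!(event->hw.flags & ARMPMU_EL0_RD_CNTR))
			return 0;

		/*
		 * Userspace subtracts 1 from the reported index; remapping
		 * the cycle counter to 32 keeps it clear of the event
		 * counters and matches the offset applied to their indices.
		 */
		if (event->hw.idx == ARMV8_IDX_CYCLE_COUNTER)
			return 32;

		return event->hw.idx;
	}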

Otherwise, this looks good to me.

Thanks,
Mark.

> +}
> +
>  static struct arm_pmu *__armpmu_alloc(gfp_t flags)
>  {
>   struct arm_pmu *pmu;
> @@ -803,6 +812,7 @@ static struct arm_pmu *__armpmu_alloc(gfp_t flags)
>   .start  = armpmu_start,
>   .stop   = armpmu_stop,
>   .read   = armpmu_read,
> + .event_idx  = armpmu_event_idx,
>   .filter_match   = armpmu_filter_match,
>   .attr_groups= pmu->attr_groups,
>   /*
> diff --git a/include/linux/perf/arm_pmu.h b/include/linux/perf/arm_pmu.h
> index 4641e850b204..3bef390c1069 100644
> --- a/include/linux/perf/arm_pmu.h
> +++ b/include/linux/perf/arm_pmu.h
> @@ -30,6 +30,8 @@
>   */
>  /* Event uses a 64bit counter */
>  #define ARMPMU_EVT_64BIT 1
> +/* Allow access to hardware counter from userspace */
> +#define ARMPMU_EL0_RD_CNTR   2
>  
>  #define HW_OP_UNSUPPORTED0x
>  #define C(_x)PERF_COUNT_HW_CACHE_##_x
> -- 
> 2.17.1
> 


Re: [PATCH 4/6] arm64: pmu: Add hook to handle pmu-related undefined instructions

2019-05-17 Thread Mark Rutland
On Fri, May 17, 2019 at 09:10:18AM +0200, Peter Zijlstra wrote:
> On Thu, May 16, 2019 at 02:21:46PM +0100, Raphael Gault wrote:
> > In order to prevent the userspace processes which are trying to access
> > the pmu registers on a big.LITTLE environment, we
> > introduce a hook to handle undefined instructions.
> > 
> > The goal here is to prevent the process from being interrupted by a signal
> > when the error is caused by the task being scheduled while accessing
> > a counter, causing the counter access to be invalid. As we are not able
> > to know efficiently the number of counters available physically on both
> > pmu in that context we consider that any faulting access to a counter
> > which is architecturally correct should not cause a SIGILL signal if
> > the permissions are set accordingly.
> 
> The other approach is using rseq for this; with that you can guarantee
> it will never issue the instruction on a wrong CPU.
> 
> That said; emulating the thing isn't horrible either.

Yup. Attempting to use rseq is on the todo list.

> > +   /*
> > +* We put 0 in the target register if we
> > +* are reading from pmu register. If we are
> > +* writing, we do nothing.
> > +*/
> 
> Wait _what_ ?!? userspace can _WRITE_ to these registers?

Remember that this is in an undefined (trap) handler.

If userspace _attempts_ to write to the registers, the CPU will trap to the
kernel. The comment is perhaps misleading; when we "do nothing", the common
trap handling code will send a SIGILL to userspace.

It would probably be better to say something like:

/*
 * If userspace tries to read a counter that doesn't exist on this
 * CPU, we emulate it as reading as zero. This happens if userspace is
 * preempted between reading the idx and actually reading the counter,
 * and the seqlock and idx have already changed, so it's as-if the
 * counter has been reprogrammed with a different event.
 *
 * We don't permit userspace to write to these registers, and will
 * inject a SIGILL.
 */
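
For reference, the usual self-monitoring read loop on the userspace side looks
roughly like the below (a sketch following the perf_event_mmap_page
documentation; read_hw_counter() and barrier() stand in for the arch-specific
counter read and a compiler barrier):

	u64 self_monitor_read(struct perf_event_mmap_page *pc)
	{
		u32 seq, idx;
		u64 count;

		do {
			seq = pc->lock;
			barrier();

			idx = pc->index;
			count = pc->offset;
			if (pc->cap_user_rdpmc && idx)
				count += read_hw_counter(idx - 1);

			barrier();
		} while (pc->lock != seq);

		return count;
	}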

There is one caveat: userspace can write to PMSELR without trapping, so we will
have to context-switch PMSELR_EL0 along with the task. That only affects indirect addressing of
PMU registers, and doesn't have a functional effect on the behaviour of the
PMU, so that's benign from the PoV of perf.
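
Roughly the following (a sketch only: the thread_struct field is hypothetical,
and where exactly this hooks into the arm64 switch path is to be decided):

	static void pmselr_thread_switch(struct task_struct *next)
	{
		/* save the outgoing task's selector, install the incoming one */
		current->thread.pmselr_el0 = read_sysreg(pmselr_el0);
		write_sysreg(next->thread.pmselr_el0, pmselr_el0);
	}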

Thanks,
Mark.


Re: [PATCH v3 2/3] arm64: implement update_fdt_pgprot()

2019-05-16 Thread Mark Rutland
On Thu, May 16, 2019 at 09:37:05AM -0500, Rob Herring wrote:
> On Thu, May 16, 2019 at 5:28 AM Hsin-Yi Wang  wrote:
> >
> > Basically does similar things like __fixmap_remap_fdt(). It's supposed
> > to be called after fixmap_remap_fdt() is called at least once, so region
> > checking can be skipped. Since it needs to know dt physical address, make
> > a copy of the value of __fdt_pointer.
> >
> > Signed-off-by: Hsin-Yi Wang 
> > ---
> >  arch/arm64/kernel/setup.c |  2 ++
> >  arch/arm64/mm/mmu.c   | 17 +
> >  2 files changed, 19 insertions(+)
> 
> Why not just map the FDT R/W at the start and change it to RO just
> before calling unflatten_device_tree? Then all the FDT scanning
> functions or any future fixups we need can just assume R/W. That is
> essentially what Stephen suggested. However, there's no need for a
> weak function as it can all be done within the arch code.
> 
> However, I'm still wondering why the FDT needs to be RO in the first place.

We want to preserve the original FDT in a pristine form for kexec (and
when exposed to userspace), and mapping it RO was the easiest way to
catch it being randomly modified (e.g. without fixups applied).

I'd prefer to keep it RO once we've removed/cleared certain properties
from the chosen node that don't make sense to pass on for kexec

Thanks,
Mark.


Re: Bad virt_to_phys since commit 54c7a8916a887f35

2019-05-16 Thread Mark Rutland
On Thu, May 16, 2019 at 03:20:59PM +0100, Steven Price wrote:
> On 16/05/2019 15:16, Mark Rutland wrote:
> > On Thu, May 16, 2019 at 03:05:31PM +0100, Steven Price wrote:
> >> I suspect the following is sufficient to fix the problem:
> >>
> >> 8<-
> >> diff --git a/init/initramfs.c b/init/initramfs.c
> >> index 435a428c2af1..178130fd61c2 100644
> >> --- a/init/initramfs.c
> >> +++ b/init/initramfs.c
> >> @@ -669,7 +669,7 @@ static int __init populate_rootfs(void)
> >> * If the initrd region is overlapped with crashkernel reserved region,
> >> * free only memory that is not part of crashkernel region.
> >> */
> >> -  if (!do_retain_initrd && !kexec_free_initrd())
> >> +  if (!do_retain_initrd && initrd_start && !kexec_free_initrd())
> >>free_initrd_mem(initrd_start, initrd_end);
> >>initrd_start = 0;
> >>initrd_end = 0;
> > 
> > That works for me. If you spin this as a real patch:
> > 
> > Tested-by: Mark Rutland 
> > 
> > As I mentioned, initrd_start has not been initialized at all, so I
> > suspect we should also update its declaration in init/do_mounts_initrd.c
> > such that it is guaranteed to be initialized to zero. We get away with
> > that today, but that won't necessarily hold with LTO and so on...
> 
> Well it's a global variable, so the C standard says it should be
> initialised to 0...

For some reason I was under the impression that wasn't guaranteed, but I
see that it is. Sorry for the noise.

> I'll spin a real patch and add your Tested-by

Great; thanks!

Mark.


Re: Bad virt_to_phys since commit 54c7a8916a887f35

2019-05-16 Thread Mark Rutland
On Thu, May 16, 2019 at 05:13:14PM +0300, Mike Rapoport wrote:
> On Thu, May 16, 2019 at 02:41:06PM +0100, Mark Rutland wrote:
> > On Thu, May 16, 2019 at 02:38:20PM +0100, Mark Rutland wrote:
> > > Hi,
> > > 
> > > Since commit:
> > > 
> > >   54c7a8916a887f35 ("initramfs: free initrd memory if opening 
> > > /initrd.image fails")
> > 
> > Ugh, I dropped a paragraph here.
> > 
> > Since that commit, I'm seeing a boot-time splat on arm64 when using
> > CONFIG_DEBUG_VIRTUAL. I'm running an arm64 syzkaller instance, and this
> > kills the VM, preventing further testing, which is unfortunate.
> > 
> > Mark.
> > 
> > > IIUC prior to that commit, we'd only attempt to free an initrd if we had
> > > one, whereas now we do so unconditionally. AFAICT, in this case
> > > initrd_start has not been initialized (I'm not using an initrd or
> > > initramfs on my system), so we end up trying virt_to_phys() on a bogus
> > > VA in free_initrd_mem().
> > > 
> > > Any ideas on the right way to fix this?
> 
> If I remember correctly, initrd_start would be 0 unless explicitly set by
> the arch setup code, so something like this could work:
> 
> diff --git a/init/initramfs.c b/init/initramfs.c
> index 435a428c2af1..05fe60437796 100644
> --- a/init/initramfs.c
> +++ b/init/initramfs.c
> @@ -529,6 +529,9 @@ extern unsigned long __initramfs_size;
>  
>  void __weak free_initrd_mem(unsigned long start, unsigned long end)
>  {
> +   if (!start)
> +   return;
> +
> free_reserved_area((void *)start, (void *)end, POISON_FREE_INITMEM,
> "initrd");
>  }

I think this should work, given Steven's patch checks the same thing.

I don't have a preference as to which patch should be taken, so I'll
leave that to Christoph.

Thanks,
Mark.


Re: Bad virt_to_phys since commit 54c7a8916a887f35

2019-05-16 Thread Mark Rutland
On Thu, May 16, 2019 at 03:05:31PM +0100, Steven Price wrote:
> On 16/05/2019 14:41, Mark Rutland wrote:
> > On Thu, May 16, 2019 at 02:38:20PM +0100, Mark Rutland wrote:
> >> Hi,
> >>
> >> Since commit:
> >>
> >>   54c7a8916a887f35 ("initramfs: free initrd memory if opening 
> >> /initrd.image fails")
> > 
> > Ugh, I dropped a paragraph here.
> > 
> > Since that commit, I'm seeing a boot-time splat on arm64 when using
> > CONFIG_DEBUG_VIRTUAL. I'm running an arm64 syzkaller instance, and this
> > kills the VM, preventing further testing, which is unfortunate.
> > 
> > Mark.
> > 
> >> IIUC prior to that commit, we'd only attempt to free an initrd if we had
> >> one, whereas now we do so unconditionally. AFAICT, in this case
> >> initrd_start has not been initialized (I'm not using an initrd or
> >> initramfs on my system), so we end up trying virt_to_phys() on a bogus
> >> VA in free_initrd_mem().
> >>
> >> Any ideas on the right way to fix this?
> 
> Your analysis looks right to me. In my review I'd managed to spot the
> change in behaviour when CONFIG_INITRAMFS_FORCE is set (the initrd is
> freed), but I'd overlooked what happens if initrd_start == 0 (the
> non-existent initrd is attempted to be freed).
> 
> I suspect the following is sufficient to fix the problem:
> 
> 8<-
> diff --git a/init/initramfs.c b/init/initramfs.c
> index 435a428c2af1..178130fd61c2 100644
> --- a/init/initramfs.c
> +++ b/init/initramfs.c
> @@ -669,7 +669,7 @@ static int __init populate_rootfs(void)
>* If the initrd region is overlapped with crashkernel reserved region,
>* free only memory that is not part of crashkernel region.
>*/
> - if (!do_retain_initrd && !kexec_free_initrd())
> + if (!do_retain_initrd && initrd_start && !kexec_free_initrd())
>   free_initrd_mem(initrd_start, initrd_end);
>   initrd_start = 0;
>   initrd_end = 0;

That works for me. If you spin this as a real patch:

Tested-by: Mark Rutland 

As I mentioned, initrd_start has not been initialized at all, so I
suspect we should also update its declaration in init/do_mounts_initrd.c
such that it is guaranteed to be initialized to zero. We get away with
that today, but that won't necessarily hold with LTO and so on...

Thanks,
Mark.


Re: Bad virt_to_phys since commit 54c7a8916a887f35

2019-05-16 Thread Mark Rutland
On Thu, May 16, 2019 at 02:38:20PM +0100, Mark Rutland wrote:
> Hi,
> 
> Since commit:
> 
>   54c7a8916a887f35 ("initramfs: free initrd memory if opening /initrd.image 
> fails")

Ugh, I dropped a paragraph here.

Since that commit, I'm seeing a boot-time splat on arm64 when using
CONFIG_DEBUG_VIRTUAL. I'm running an arm64 syzkaller instance, and this
kills the VM, preventing further testing, which is unfortunate.

Mark.

> IIUC prior to that commit, we'd only attempt to free an initrd if we had
> one, whereas now we do so unconditionally. AFAICT, in this case
> initrd_start has not been initialized (I'm not using an initrd or
> initramfs on my system), so we end up trying virt_to_phys() on a bogus
> VA in free_initrd_mem().
> 
> Any ideas on the right way to fix this?
> 
> Thanks,
> Mark.
> 
> [5.251023][T1] [ cut here ]
> [5.252465][T1] virt_to_phys used for non-linear address: 
> (ptrval) (0x0)
> [5.254388][T1] WARNING: CPU: 0 PID: 1 at arch/arm64/mm/physaddr.c:15 
> __virt_to_phys+0x88/0xb8
> [5.256473][T1] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 
> 5.1.0-11058-g83f3ef3 #4
> [5.258513][T1] Hardware name: linux,dummy-virt (DT)
> [5.259923][T1] pstate: 8045 (Nzcv daif +PAN -UAO)
> [5.261375][T1] pc : __virt_to_phys+0x88/0xb8
> [5.262623][T1] lr : __virt_to_phys+0x88/0xb8
> [5.263879][T1] sp : 8be4fc60
> [5.264941][T1] x29: 8be4fc60 x28: 4000 
> [5.266522][T1] x27: 200015445000 x26:  
> [5.268112][T1] x25:  x24: 2000163e 
> [5.269691][T1] x23: 2000163e0440 x22: 2000163e 
> [5.271270][T1] x21: 2000163e0400 x20:  
> [5.272860][T1] x19:  x18: 200016aa0f80 
> [5.274430][T1] x17: 2000153a x16: f200 
> [5.276018][T1] x15: 1fffe40002d5560d x14: 17716109 
> [5.277596][T1] x13: 200016e17000 x12: 040002a83fd9 
> [5.279179][T1] x11: 1fffe40002a83fd8 x10: 040002a83fd8 
> [5.280765][T1] x9 : 1fffe40002a83fd8 x8 : dfff2000 
> [5.282343][T1] x7 : 040002a83fd9 x6 : 20001541fec0 
> [5.283929][T1] x5 : 80003b8b0040 x4 :  
> [5.285509][T1] x3 : 2000102c6504 x2 : 117c9f54 
> [5.287091][T1] x1 : 15eab2dadba38000 x0 :  
> [5.288678][T1] Call trace:
> [5.289532][T1]  __virt_to_phys+0x88/0xb8
> [5.290701][T1]  free_initrd_mem+0x3c/0x50
> [5.291894][T1]  populate_rootfs+0x2f4/0x358
> [5.293123][T1]  do_one_initcall+0x568/0xb94
> [5.294349][T1]  kernel_init_freeable+0xd44/0xe2c
> [5.295695][T1]  kernel_init+0x14/0x1c0
> [5.296814][T1]  ret_from_fork+0x10/0x1c
> [5.297947][T1] irq event stamp: 288672
> [5.299069][T1] hardirqs last  enabled at (288671): 
> [] console_unlock+0x89c/0xe50
> [5.301521][T1] hardirqs last disabled at (288672): 
> [] do_debug_exception+0x268/0x3e0
> [5.304061][T1] softirqs last  enabled at (288668): 
> [] __do_softirq+0xa38/0xf48
> [5.306457][T1] softirqs last disabled at (288653): 
> [] irq_exit+0x2a4/0x318
> [5.308777][T1] ---[ end trace 3cf83e3c184a4d3e ]---


Bad virt_to_phys since commit 54c7a8916a887f35

2019-05-16 Thread Mark Rutland
Hi,

Since commit:

  54c7a8916a887f35 ("initramfs: free initrd memory if opening /initrd.image 
fails")

IIUC prior to that commit, we'd only attempt to free an initrd if we had
one, whereas now we do so unconditionally. AFAICT, in this case
initrd_start has not been initialized (I'm not using an initrd or
initramfs on my system), so we end up trying virt_to_phys() on a bogus
VA in free_initrd_mem().

Any ideas on the right way to fix this?

Thanks,
Mark.

[5.251023][T1] [ cut here ]
[5.252465][T1] virt_to_phys used for non-linear address: 
(ptrval) (0x0)
[5.254388][T1] WARNING: CPU: 0 PID: 1 at arch/arm64/mm/physaddr.c:15 
__virt_to_phys+0x88/0xb8
[5.256473][T1] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 
5.1.0-11058-g83f3ef3 #4
[5.258513][T1] Hardware name: linux,dummy-virt (DT)
[5.259923][T1] pstate: 8045 (Nzcv daif +PAN -UAO)
[5.261375][T1] pc : __virt_to_phys+0x88/0xb8
[5.262623][T1] lr : __virt_to_phys+0x88/0xb8
[5.263879][T1] sp : 8be4fc60
[5.264941][T1] x29: 8be4fc60 x28: 4000 
[5.266522][T1] x27: 200015445000 x26:  
[5.268112][T1] x25:  x24: 2000163e 
[5.269691][T1] x23: 2000163e0440 x22: 2000163e 
[5.271270][T1] x21: 2000163e0400 x20:  
[5.272860][T1] x19:  x18: 200016aa0f80 
[5.274430][T1] x17: 2000153a x16: f200 
[5.276018][T1] x15: 1fffe40002d5560d x14: 17716109 
[5.277596][T1] x13: 200016e17000 x12: 040002a83fd9 
[5.279179][T1] x11: 1fffe40002a83fd8 x10: 040002a83fd8 
[5.280765][T1] x9 : 1fffe40002a83fd8 x8 : dfff2000 
[5.282343][T1] x7 : 040002a83fd9 x6 : 20001541fec0 
[5.283929][T1] x5 : 80003b8b0040 x4 :  
[5.285509][T1] x3 : 2000102c6504 x2 : 117c9f54 
[5.287091][T1] x1 : 15eab2dadba38000 x0 :  
[5.288678][T1] Call trace:
[5.289532][T1]  __virt_to_phys+0x88/0xb8
[5.290701][T1]  free_initrd_mem+0x3c/0x50
[5.291894][T1]  populate_rootfs+0x2f4/0x358
[5.293123][T1]  do_one_initcall+0x568/0xb94
[5.294349][T1]  kernel_init_freeable+0xd44/0xe2c
[5.295695][T1]  kernel_init+0x14/0x1c0
[5.296814][T1]  ret_from_fork+0x10/0x1c
[5.297947][T1] irq event stamp: 288672
[5.299069][T1] hardirqs last  enabled at (288671): [] 
console_unlock+0x89c/0xe50
[5.301521][T1] hardirqs last disabled at (288672): [] 
do_debug_exception+0x268/0x3e0
[5.304061][T1] softirqs last  enabled at (288668): [] 
__do_softirq+0xa38/0xf48
[5.306457][T1] softirqs last disabled at (288653): [] 
irq_exit+0x2a4/0x318
[5.308777][T1] ---[ end trace 3cf83e3c184a4d3e ]---


Re: [PATCH V3 4/4] arm64/mm: Enable memory hot remove

2019-05-16 Thread Mark Rutland
On Thu, May 16, 2019 at 11:04:48AM +0530, Anshuman Khandual wrote:
> On 05/15/2019 05:19 PM, Mark Rutland wrote:
> > On Tue, May 14, 2019 at 02:30:07PM +0530, Anshuman Khandual wrote:
> >> Memory removal from an arch perspective involves tearing down two different
> >> kernel based mappings i.e vmemmap and linear while releasing related page
> >> table and any mapped pages allocated for given physical memory range to be
> >> removed.
> >>
> >> Define a common kernel page table tear down helper remove_pagetable() which
> >> can be used to unmap a given kernel virtual address range. In effect it can
> >> tear down either vmemmap or kernel linear mappings. This new helper is called
> >> from both vmemmap_free() and ___remove_pgd_mapping() during memory removal.
> >>
> >> For linear mapping there are no actual allocated pages which are mapped to
> >> create the translation. Any pfn on a given entry is derived from physical
> >> address (__va(PA) --> PA) whose linear translation is to be created. They
> >> need not be freed as they were never allocated in the first place. But for
> >> vmemmap which is a real virtual mapping (like vmalloc) physical pages are
> >> allocated either from buddy or memblock which get mapped in the kernel page
> >> table. These allocated and mapped pages need to be freed during translation
> >> tear down. But page table pages need to be freed in both these cases.
> > 
> > As previously discussed, we should only hot-remove memory which was
> > hot-added, so we shouldn't encounter memory allocated from memblock.
> 
> Right, not applicable any more. Will drop this word.
> 
> >> These mappings need to be differentiated while deciding if a mapped page at
> >> any level i.e [pte|pmd|pud]_page() should be freed or not. Callers for the
> >> mapping tear down process should pass on 'sparse_vmap' variable identifying
> >> kernel vmemmap mappings.
> > 
> > I think that you can simplify the paragraphs above down to:
> > 
> >   The arch code for hot-remove must tear down portions of the linear map
> >   and vmemmap corresponding to memory being removed. In both cases the
> >   page tables mapping these regions must be freed, and when sparse
> >   vmemmap is in use the memory backing the vmemmap must also be freed.
> > 
> >   This patch adds a new remove_pagetable() helper which can be used to
> >   tear down either region, and calls it from vmemmap_free() and
> >   ___remove_pgd_mapping(). The sparse_vmap argument determines whether
> >   the backing memory will be freed.
> 
> The current one is a bit more descriptive on detail. Anyway, will replace with
> the above writeup if that is preferred.

I would prefer the suggested form above, as it's easier to extract the
necessary details from it.

[...]

> >> +static void
> >> +remove_pagetable(unsigned long start, unsigned long end, bool sparse_vmap)
> >> +{
> >> +  unsigned long addr, next;
> >> +  pud_t *pudp_base;
> >> +  pgd_t *pgdp;
> >> +
> >> +  spin_lock(&init_mm.page_table_lock);
> > 
> > It would be good to explain why we need to take the ptl here.
> 
> Will update both commit message and add an in-code comment here.
> 
> > 
> > IIUC that shouldn't be necessary for the linear map. Am I mistaken?
> 
> It's not absolutely necessary for the linear map right now because both memory hot
> plug & ptdump which modifies or walks the page table ranges respectively take
> memory hotplug lock. That apart, no other callers creates or destroys linear
> mapping at runtime.
> 
> > 
> > Is there a specific race when tearing down the vmemmap?
> 
> This is trickier than linear map. vmemmap additions would be protected with
> memory hotplug lock but this can potentially collide with vmalloc/IO regions.
> Even if they don't right now, that will be because they don't share intermediate
> page table levels.

Sure; if we could just state something like:

  The vmemmap region may share levels of table with the vmalloc region.
  Take the ptl so that we can safely free potentially-shared tables.

... I think that would be sufficient.
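
i.e. in remove_pagetable(), placing that as a block comment immediately
above the lock (loop body elided):

	/*
	 * The vmemmap region may share levels of table with the vmalloc
	 * region. Take the ptl so that we can safely free
	 * potentially-shared tables.
	 */
	spin_lock(&init_mm.page_table_lock);

	...

	spin_unlock(&init_mm.page_table_lock);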

Thanks,
Mark.


Re: [PATCH V3 2/4] arm64/mm: Hold memory hotplug lock while walking for kernel page table dump

2019-05-16 Thread Mark Rutland
Hi Michal,

On Wed, May 15, 2019 at 06:58:47PM +0200, Michal Hocko wrote:
> On Tue 14-05-19 14:30:05, Anshuman Khandual wrote:
> > The arm64 pagetable dump code can race with concurrent modification of the
> > kernel page tables. When leaf entries are modified concurrently, the dump
> > code may log stale or inconsistent information for a VA range, but this is
> > otherwise not harmful.
> > 
> > When intermediate levels of table are freed, the dump code will continue to
> > use memory which has been freed and potentially reallocated for another
> > purpose. In such cases, the dump code may dereference bogus addresses,
> > leading to a number of potential problems.
> > 
> > Intermediate levels of table may be freed during memory hot-remove, or when
> > installing a huge mapping in the vmalloc region. To avoid racing with these
> > cases, take the memory hotplug lock when walking the kernel page table.
> 
> Why is this a problem only on arm64 

It looks like it's not -- I think we're just the first to realise this.

AFAICT x86's debugfs ptdump has the same issue if run concurrently with
memory hot remove. If 32-bit arm supported hot-remove, its ptdump code
would have the same issue.

> and why do we even care for debugfs? Does anybody rely on this thing
> to be reliable? Do we even need it? Who is using the file?

The debugfs part is used intermittently by a few people working on the
arm64 kernel page tables. We use that both to sanity-check that kernel
page tables are created/updated correctly after changes to the arm64 mmu
code, and also to debug if/when we encounter issues that appear
to be the result of kernel page table corruption.

So while it's rare to need it, it's really useful to have when we do
need it, and I'd rather not remove it. I'd also rather that it didn't
have latent issues where we can accidentally crash the kernel when using
it, which is what this patch is addressing.

> I am asking because I would really love to make mem hotplug locking less
> scattered outside of the core MM than more. Most users simply shouldn't
> care. Pfn walkers should rely on pfn_to_online_page.

I'm not sure if that would help us here; IIUC pfn_to_online_page() alone
doesn't ensure that the page remains online. Is there a way to achieve
that other than get_online_mems()?

The big problem for the ptdump code is when tables are freed, since the
pages can be reused elsewhere (or hot-removed), causing the ptdump code
to explode.

Thanks,
Mark.


Re: [PATCH V3 4/4] arm64/mm: Enable memory hot remove

2019-05-15 Thread Mark Rutland
Hi Anshuman,

On Tue, May 14, 2019 at 02:30:07PM +0530, Anshuman Khandual wrote:
> Memory removal from an arch perspective involves tearing down two different
> kernel based mappings i.e vmemmap and linear while releasing related page
> table and any mapped pages allocated for given physical memory range to be
> removed.
> 
> Define a common kernel page table tear down helper remove_pagetable() which
> can be used to unmap given kernel virtual address range. In effect it can
> tear down both vmemap or kernel linear mappings. This new helper is called
> from both vmemamp_free() and ___remove_pgd_mapping() during memory removal.
> 
> For linear mapping there are no actual allocated pages which are mapped to
> create the translation. Any pfn on a given entry is derived from physical
> address (__va(PA) --> PA) whose linear translation is to be created. They
> need not be freed as they were never allocated in the first place. But for
> vmemmap which is a real virtual mapping (like vmalloc) physical pages are
> allocated either from buddy or memblock which get mapped in the kernel page
> table. These allocated and mapped pages need to be freed during translation
> tear down. But page table pages need to be freed in both these cases.

As previously discussed, we should only hot-remove memory which was
hot-added, so we shouldn't encounter memory allocated from memblock.

> These mappings need to be differentiated while deciding if a mapped page at
> any level i.e [pte|pmd|pud]_page() should be freed or not. Callers for the
> mapping tear down process should pass on 'sparse_vmap' variable identifying
> kernel vmemmap mappings.

I think that you can simplify the paragraphs above down to:

  The arch code for hot-remove must tear down portions of the linear map
  and vmemmap corresponding to memory being removed. In both cases the
  page tables mapping these regions must be freed, and when sparse
  vmemmap is in use the memory backing the vmemmap must also be freed.

  This patch adds a new remove_pagetable() helper which can be used to
  tear down either region, and calls it from vmemmap_free() and
  ___remove_pgd_mapping(). The sparse_vmap argument determines whether
  the backing memory will be freed.

Could you add a paragraph describing when we can encounter partial
tables (for which we need the p??_none() checks)? IIUC that's not just
for cleaning up a failed hot-add, and it would be good to call that out.

> While here update arch_add_mempory() to handle __add_pages() failures by
> just unmapping recently added kernel linear mapping. Now enable memory hot
> remove on arm64 platforms by default with ARCH_ENABLE_MEMORY_HOTREMOVE.

Nit: s/arch_add_mempory/arch_add_memory/.

[...]

> +#if (CONFIG_PGTABLE_LEVELS > 2)
> +static void free_pmd_table(pmd_t *pmdp, pud_t *pudp, unsigned long addr)
> +{
> + struct page *page;
> + int i;
> +
> + for (i = 0; i < PTRS_PER_PMD; i++) {
> + if (!pmd_none(pmdp[i]))
> + return;
> + }
> +
> + page = pud_page(*pudp);
> + pud_clear(pudp);
> + __flush_tlb_kernel_pgtable(addr);
> + free_hotplug_pgtable_page(page);
> +}
> +#else
> +static void free_pmd_table(pmd_t *pmdp, pud_t *pudp, unsigned long addr) { }
> +#endif

Can we fold the check in and remove the ifdeferry? e.g.

static void free_pmd_table(pmd_t *pmdp, pud_t *pudp, unsigned long addr)
{
	struct page *page;
	int i;

	if (CONFIG_PGTABLE_LEVELS <= 2)
		return;

	...
}

... that would ensure that we always got build coverage here, and
minimize duplication. We do similar in map_kernel() and
early_fixmap_init() today.

Likewise for the other levels.

For arm64, the general policy is to use READ_ONCE() when reading a page
table entry (even if not strictly necessary), so please do so
consistently.
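
For example, the pud level could look something like the below (untested,
mirroring the pmd case above with READ_ONCE() applied):

static void free_pud_table(pud_t *pudp, pgd_t *pgdp, unsigned long addr)
{
	struct page *page;
	int i;

	if (CONFIG_PGTABLE_LEVELS <= 3)
		return;

	for (i = 0; i < PTRS_PER_PUD; i++) {
		if (!pud_none(READ_ONCE(pudp[i])))
			return;
	}

	page = pgd_page(READ_ONCE(*pgdp));
	pgd_clear(pgdp);
	__flush_tlb_kernel_pgtable(addr);
	free_hotplug_pgtable_page(page);
}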

[...]

> +static void
> +remove_pte_table(pmd_t *pmdp, unsigned long addr,
> + unsigned long end, bool sparse_vmap)
> +{
> + struct page *page;
> + pte_t *ptep;
> + unsigned long start = addr;
> +
> + for (; addr < end; addr += PAGE_SIZE) {
> + ptep = pte_offset_kernel(pmdp, addr);
> + if (!pte_present(*ptep))
> + continue;
> +
> + if (sparse_vmap) {
> + page = pte_page(READ_ONCE(*ptep));
> + free_hotplug_page_range(page, PAGE_SIZE);
> + }
> + pte_clear(&init_mm, addr, ptep);
> + }
> + flush_tlb_kernel_range(start, end);
> +}

Please use a temporary pte variable here, e.g.

static void remove_pte_table(pmd_t *pmdp, unsigned long addr,
			     unsigned long end, bool sparse_vmap)
{
	unsigned long start = addr;
	struct page *page;
	pte_t *ptep, pte;

	for (; addr < end; addr += PAGE_SIZE) {
		ptep = pte_offset_kernel(pmdp, addr);
		pte = READ_ONCE(*ptep);

		if (!pte_present(pte))
			continue;

		if (sparse_vmap) {
			page = pte_page(pte);
			free_hotplug_page_range(page, PAGE_SIZE);
		}
		pte_clear(&init_mm, addr, ptep);
	}
	flush_tlb_kernel_range(start, end);
}

Re: [PATCH V3 2/4] arm64/mm: Hold memory hotplug lock while walking for kernel page table dump

2019-05-14 Thread Mark Rutland
On Tue, May 14, 2019 at 02:30:05PM +0530, Anshuman Khandual wrote:
> The arm64 pagetable dump code can race with concurrent modification of the
> > kernel page tables. When leaf entries are modified concurrently, the dump
> code may log stale or inconsistent information for a VA range, but this is
> otherwise not harmful.
> 
> When intermediate levels of table are freed, the dump code will continue to
> use memory which has been freed and potentially reallocated for another
> > purpose. In such cases, the dump code may dereference bogus addresses,
> leading to a number of potential problems.
> 
> > Intermediate levels of table may be freed during memory hot-remove, or when
> installing a huge mapping in the vmalloc region. To avoid racing with these
> cases, take the memory hotplug lock when walking the kernel page table.
> 
> Signed-off-by: Anshuman Khandual 

Can we please move this after the next patch (which addresses the huge
vmap case), and change the last paragraph to:

  Intermediate levels of table may be freed during memory hot-remove,
  which will be enabled by a subsequent patch. To avoid racing with
  this, take the memory hotplug lock when walking the kernel page table.

With that, this looks good to me.

Thanks,
Mark.

> ---
>  arch/arm64/mm/ptdump_debugfs.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/arch/arm64/mm/ptdump_debugfs.c b/arch/arm64/mm/ptdump_debugfs.c
> index 064163f..80171d1 100644
> --- a/arch/arm64/mm/ptdump_debugfs.c
> +++ b/arch/arm64/mm/ptdump_debugfs.c
> @@ -7,7 +7,10 @@
>  static int ptdump_show(struct seq_file *m, void *v)
>  {
>   struct ptdump_info *info = m->private;
> +
> + get_online_mems();
>   ptdump_walk_pgd(m, info);
> + put_online_mems();
>   return 0;
>  }
>  DEFINE_SHOW_ATTRIBUTE(ptdump);
> -- 
> 2.7.4
> 


Re: icache_is_aliasing and big.LITTLE

2019-05-10 Thread Mark Rutland
On Thu, May 09, 2019 at 09:50:04AM +0100, Catalin Marinas wrote:
> Hi,
> 
> On Wed, May 08, 2019 at 11:45:03AM -0700, Salman Qazi wrote:
> > What is the intention behind icache_is_aliasing on big.LITTLE systems
> > where some icaches are VIPT and others are PIPT? Is it meant to be
> > conservative in some sense or should it be made per-CPU?
> 
> It needs to cover the worst case scenario across all CPUs, i.e. aliasing
> VIPT if one of the CPUs has this. We can't make it per-CPU because a
> thread performing cache maintenance might be migrated to another CPU
> with different cache policy (e.g. sync_icache_aliases()).

It's slightly more subtle than that -- for broadcast maintenance the
policy of the CPU receiving the broadcast matters.

So even if all i-cache maintenance were performed on a thread pinned to
a CPU with PIPT caches, to correctly affect any VIPT i-caches in the
system it would be necessary to perform maintenance as-if the CPU
performing the maintenance had VIPT i-caches.

Thanks,
Mark.


Re: [PATCH v2 3/3] arm64: use the correct function type for __arm64_sys_ni_syscall

2019-05-07 Thread Mark Rutland
On Fri, May 03, 2019 at 12:12:25PM -0700, Sami Tolvanen wrote:
> Calling sys_ni_syscall through a syscall_fn_t pointer trips indirect
> call Control-Flow Integrity checking due to a function type
> mismatch. Use SYSCALL_DEFINE0 for __arm64_sys_ni_syscall instead and
> remove the now unnecessary casts.
> 
> Signed-off-by: Sami Tolvanen 
> ---
>  arch/arm64/kernel/sys.c   | 14 +-
>  arch/arm64/kernel/sys32.c | 12 
>  2 files changed, 17 insertions(+), 9 deletions(-)
> 
> diff --git a/arch/arm64/kernel/sys.c b/arch/arm64/kernel/sys.c
> index b44065fb16160..4f8e8a7237a85 100644
> --- a/arch/arm64/kernel/sys.c
> +++ b/arch/arm64/kernel/sys.c
> @@ -47,22 +47,26 @@ SYSCALL_DEFINE1(arm64_personality, unsigned int, 
> personality)
>   return ksys_personality(personality);
>  }
>  
> +asmlinkage long sys_ni_syscall(void);
> +
> +SYSCALL_DEFINE0(ni_syscall)
> +{
> + return sys_ni_syscall();
> +}

I strongly think that we want to fix up the common definition in
kernel/sys_ni.c rather than having a point-hack in arm64. Other
architectures (e.g. x86, s390) will want the same for CFI, and I'd like
to ensure that our approaches don't diverge.

I took a quick look, and it looks like it's messy but possible to fix
up the core.

I also suspect that using SYSCALL_DEFINE0() as it currently stands isn't
a great idea, since it'll allow fault injection for unimplemented
syscalls, which sounds dubious to me.
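
Roughly, I'd hope the common code could end up with something like the
below (entirely untested; the pt_regs-taking name is illustrative and
would really come from the arch syscall wrapper machinery):

/* kernel/sys_ni.c -- sketch only */
asmlinkage long sys_ni_syscall(void)
{
	return -ENOSYS;
}

/*
 * On wrapper-using architectures, also emit a variant whose prototype
 * matches syscall_fn_t, without ALLOW_ERROR_INJECTION().
 */
asmlinkage long __sys_ni_syscall(const struct pt_regs *__unused)
{
	return -ENOSYS;
}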

Thanks,
Mark.

> +
>  /*
>   * Wrappers to pass the pt_regs argument.
>   */
>  #define sys_personality  sys_arm64_personality
>  
> -asmlinkage long sys_ni_syscall(const struct pt_regs *);
> -#define __arm64_sys_ni_syscall   sys_ni_syscall
> -
>  #undef __SYSCALL
>  #define __SYSCALL(nr, sym)   asmlinkage long __arm64_##sym(const struct 
> pt_regs *);
>  #include 
>  
>  #undef __SYSCALL
> -#define __SYSCALL(nr, sym)   [nr] = (syscall_fn_t)__arm64_##sym,
> +#define __SYSCALL(nr, sym)   [nr] = __arm64_##sym,
>  
>  const syscall_fn_t sys_call_table[__NR_syscalls] = {
> - [0 ... __NR_syscalls - 1] = (syscall_fn_t)sys_ni_syscall,
> + [0 ... __NR_syscalls - 1] = __arm64_sys_ni_syscall,
>  #include 
>  };
> diff --git a/arch/arm64/kernel/sys32.c b/arch/arm64/kernel/sys32.c
> index 0f8bcb7de7008..f8f6c26cfd326 100644
> --- a/arch/arm64/kernel/sys32.c
> +++ b/arch/arm64/kernel/sys32.c
> @@ -133,17 +133,21 @@ COMPAT_SYSCALL_DEFINE6(aarch32_fallocate, int, fd, int, 
> mode,
>   return ksys_fallocate(fd, mode, arg_u64(offset), arg_u64(len));
>  }
>  
> -asmlinkage long sys_ni_syscall(const struct pt_regs *);
> -#define __arm64_sys_ni_syscall   sys_ni_syscall
> +asmlinkage long sys_ni_syscall(void);
> +
> +COMPAT_SYSCALL_DEFINE0(ni_syscall)
> +{
> + return sys_ni_syscall();
> +}
>  
>  #undef __SYSCALL
>  #define __SYSCALL(nr, sym)   asmlinkage long __arm64_##sym(const struct 
> pt_regs *);
>  #include 
>  
>  #undef __SYSCALL
> -#define __SYSCALL(nr, sym)   [nr] = (syscall_fn_t)__arm64_##sym,
> +#define __SYSCALL(nr, sym)   [nr] = __arm64_##sym,
>  
>  const syscall_fn_t compat_sys_call_table[__NR_compat_syscalls] = {
> - [0 ... __NR_compat_syscalls - 1] = (syscall_fn_t)sys_ni_syscall,
> + [0 ... __NR_compat_syscalls - 1] = __arm64_sys_ni_syscall,
>  #include 
>  };
> -- 
> 2.21.0.1020.gf2820cf01a-goog
> 


Re: [PATCH 2/2] arm64: use the correct function type in SYSCALL_DEFINE0

2019-05-03 Thread Mark Rutland
On Wed, May 01, 2019 at 01:04:51PM -0700, Sami Tolvanen wrote:
> Although a syscall defined using SYSCALL_DEFINE0 doesn't accept
> parameters, use the correct function type to avoid indirect call
> type mismatches with Control-Flow Integrity checking.

Generally, this makes sense, but I'm not sure that this is complete.

IIUC this introduces a new type mismatch with sys_ni_syscall() in some
cases. We probably need that to use SYSCALL_DEFINE0(), and maybe have a
ksys_ni_syscall() for in-kernel wrappers.

Thanks,
Mark.

> 
> Signed-off-by: Sami Tolvanen 
> ---
>  arch/arm64/include/asm/syscall_wrapper.h | 18 +-
>  1 file changed, 9 insertions(+), 9 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/syscall_wrapper.h 
> b/arch/arm64/include/asm/syscall_wrapper.h
> index a4477e515b798..507d0ee6bc690 100644
> --- a/arch/arm64/include/asm/syscall_wrapper.h
> +++ b/arch/arm64/include/asm/syscall_wrapper.h
> @@ -30,10 +30,10 @@
>   }   
> \
>   static inline long __do_compat_sys##name(__MAP(x,__SC_DECL,__VA_ARGS__))
>  
> -#define COMPAT_SYSCALL_DEFINE0(sname)
> \
> - asmlinkage long __arm64_compat_sys_##sname(void);   \
> - ALLOW_ERROR_INJECTION(__arm64_compat_sys_##sname, ERRNO);   \
> - asmlinkage long __arm64_compat_sys_##sname(void)
> +#define COMPAT_SYSCALL_DEFINE0(sname)
> \
> + asmlinkage long __arm64_compat_sys_##sname(const struct pt_regs 
> *__unused); \
> + ALLOW_ERROR_INJECTION(__arm64_compat_sys_##sname, ERRNO);   
> \
> + asmlinkage long __arm64_compat_sys_##sname(const struct pt_regs 
> *__unused)
>  
>  #define COND_SYSCALL_COMPAT(name) \
>   cond_syscall(__arm64_compat_sys_##name);
> @@ -62,11 +62,11 @@
>   static inline long __do_sys##name(__MAP(x,__SC_DECL,__VA_ARGS__))
>  
>  #ifndef SYSCALL_DEFINE0
> -#define SYSCALL_DEFINE0(sname)   \
> - SYSCALL_METADATA(_##sname, 0);  \
> - asmlinkage long __arm64_sys_##sname(void);  \
> - ALLOW_ERROR_INJECTION(__arm64_sys_##sname, ERRNO);  \
> - asmlinkage long __arm64_sys_##sname(void)
> +#define SYSCALL_DEFINE0(sname)   
> \
> + SYSCALL_METADATA(_##sname, 0);  
> \
> + asmlinkage long __arm64_sys_##sname(const struct pt_regs *__unused);
> \
> + ALLOW_ERROR_INJECTION(__arm64_sys_##sname, ERRNO);  
> \
> + asmlinkage long __arm64_sys_##sname(const struct pt_regs *__unused)
>  #endif
>  
>  #ifndef COND_SYSCALL
> -- 
> 2.21.0.593.g511ec345e18-goog
> 


Re: [PATCH 1/2] arm64: fix syscall_fn_t type

2019-05-03 Thread Mark Rutland
On Wed, May 01, 2019 at 01:04:50PM -0700, Sami Tolvanen wrote:
> Use const struct pt_regs * instead of struct pt_regs * as
> the argument type to fix indirect call type mismatches with
> Control-Flow Integrity checking.

It's probably worth noting that in <asm/syscall_wrapper.h> all syscall
wrappers take a const struct pt_regs *, which is where the mismatch
comes from.

> 
> Signed-off-by: Sami Tolvanen 
> ---
>  arch/arm64/include/asm/syscall.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/arm64/include/asm/syscall.h 
> b/arch/arm64/include/asm/syscall.h
> index a179df3674a1a..6206ab9bfcfc5 100644
> --- a/arch/arm64/include/asm/syscall.h
> +++ b/arch/arm64/include/asm/syscall.h
> @@ -20,7 +20,7 @@
>  #include 
>  #include 
>  
> -typedef long (*syscall_fn_t)(struct pt_regs *regs);
> +typedef long (*syscall_fn_t)(const struct pt_regs *regs);

For a second I was worried that we modify the regs to assign the return
value, but I see we do that in the syscall.c wrapper, where the pt_regs
argument isn't const.

We certainly shouldn't need to modify the regs when acquiring the
arguments, and as above this matches <asm/syscall_wrapper.h>, so this
looks sound to me.
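
For reference, the flow in arch/arm64/kernel/syscall.c is roughly the
below (heavily simplified, from memory):

	long ret;

	if (scno < sc_nr) {
		syscall_fn_t syscall_fn = syscall_table[scno];

		ret = syscall_fn(regs);	/* regs only read via the const pointer */
	} else {
		ret = -ENOSYS;
	}

	regs->regs[0] = ret;		/* return value written via the non-const regs */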

FWIW:

Reviewed-by: Mark Rutland 

Thanks,
Mark.


Re: [PATCH] RISC-V: Add an Image header that boot loader can parse.

2019-05-01 Thread Mark Rutland
On Wed, May 01, 2019 at 10:41:52PM +0530, Anup Patel wrote:
> On Wed, May 1, 2019 at 10:30 PM Mark Rutland  wrote:
> >
> > On Mon, Apr 29, 2019 at 10:42:40PM -0700, Atish Patra wrote:
> > > On 4/29/19 4:40 PM, Palmer Dabbelt wrote:
> > > > On Tue, 23 Apr 2019 16:25:06 PDT (-0700), atish.pa...@wdc.com wrote:
> > > > > Currently, last stage boot loaders such as U-Boot can accept only
> > > > > uImage which is an unnecessary additional step in automating boot 
> > > > > flows.
> > > > >
> > > > > Add a simple image header that boot loaders can parse and directly
> > > > > load kernel flat Image. The existing booting methods will continue to
> > > > > work as it is.
> > > > >
> > > > > Tested on both QEMU and HiFive Unleashed using OpenSBI + U-Boot + 
> > > > > Linux.
> > > > >
> > > > > Signed-off-by: Atish Patra 
> > > > > ---
> > > > >   arch/riscv/include/asm/image.h | 32 
> > > > >   arch/riscv/kernel/head.S   | 28 
> > > > >   2 files changed, 60 insertions(+)
> > > > >   create mode 100644 arch/riscv/include/asm/image.h
> > > > >
> > > > > diff --git a/arch/riscv/include/asm/image.h 
> > > > > b/arch/riscv/include/asm/image.h
> > > > > new file mode 100644
> > > > > index ..76a7e0d4068a
> > > > > --- /dev/null
> > > > > +++ b/arch/riscv/include/asm/image.h
> > > > > @@ -0,0 +1,32 @@
> > > > > +/* SPDX-License-Identifier: GPL-2.0 */
> > > > > +
> > > > > +#ifndef __ASM_IMAGE_H
> > > > > +#define __ASM_IMAGE_H
> > > > > +
> > > > > +#define RISCV_IMAGE_MAGIC"RISCV"
> > > > > +
> > > > > +#ifndef __ASSEMBLY__
> > > > > +/*
> > > > > + * struct riscv_image_header - riscv kernel image header
> > > > > + *
> > > > > + * @code0:   Executable code
> > > > > + * @code1:   Executable code
> > > > > + * @text_offset: Image load offset
> > > > > + * @image_size:  Effective Image size
> > > > > + * @reserved:reserved
> > > > > + * @magic:   Magic number
> > > > > + * @reserved:reserved
> > > > > + */
> > > > > +
> > > > > +struct riscv_image_header {
> > > > > + u32 code0;
> > > > > + u32 code1;
> > > > > + u64 text_offset;
> > > > > + u64 image_size;
> > > > > + u64 res1;
> > > > > + u64 magic;
> > > > > + u32 res2;
> > > > > + u32 res3;
> > > > > +};
> > > >
> > > > I don't want to invent our own file format.  Is there a reason we can't 
> > > > just
> > > > use something standard?  Off the top of my head I can think of ELF 
> > > > files and
> > > > multiboot.
> > >
> > > Additional header is required to accommodate PE header format. Currently,
> > > this is only used for booti command but it will be reused for EFI headers 
> > > as
> > > well. Linux kernel Image can pretend as an EFI application if PE/COFF 
> > > header
> > > is present. This removes the need of an explicit EFI boot loader and EFI
> > > firmware can directly load Linux (obviously after EFI stub implementation
> > > for RISC-V).
> >
> > Adding the EFI stub on arm64 required very careful consideration of our
> > Image header and the EFI spec, along with the PE/COFF spec.
> >
> > For example, to be a compliant PE/COFF header, the first two bytes of
> > your kernel image need to be "MZ" in ASCII. On arm64 we happened to find
> > a valid instruction that we could rely upon that met this requirement...
> 
> The "MZ" ASCII (i.e. 0x5a4d) is "li s4,-13" instruction in RISC-V so this
> modifies "s4" register which is pretty harmless from Linux RISC-V booting
> perspective.
> 
> Of course, we should only add "MZ" ASCII in Linux RISC-V image header
> when CONFIG_EFI is enabled (just like Linux ARM64).

Great. It would probably be worth just mentioning that in the commit
message, so that it's clear that has been considered.

Thanks,
Mark.


Re: [PATCH] RISC-V: Add an Image header that boot loader can parse.

2019-05-01 Thread Mark Rutland
On Mon, Apr 29, 2019 at 10:42:40PM -0700, Atish Patra wrote:
> On 4/29/19 4:40 PM, Palmer Dabbelt wrote:
> > On Tue, 23 Apr 2019 16:25:06 PDT (-0700), atish.pa...@wdc.com wrote:
> > > Currently, last stage boot loaders such as U-Boot can accept only
> > > uImage which is an unnecessary additional step in automating boot flows.
> > > 
> > > Add a simple image header that boot loaders can parse and directly
> > > load kernel flat Image. The existing booting methods will continue to
> > > work as it is.
> > > 
> > > Tested on both QEMU and HiFive Unleashed using OpenSBI + U-Boot + Linux.
> > > 
> > > Signed-off-by: Atish Patra 
> > > ---
> > >   arch/riscv/include/asm/image.h | 32 
> > >   arch/riscv/kernel/head.S   | 28 
> > >   2 files changed, 60 insertions(+)
> > >   create mode 100644 arch/riscv/include/asm/image.h
> > > 
> > > diff --git a/arch/riscv/include/asm/image.h 
> > > b/arch/riscv/include/asm/image.h
> > > new file mode 100644
> > > index ..76a7e0d4068a
> > > --- /dev/null
> > > +++ b/arch/riscv/include/asm/image.h
> > > @@ -0,0 +1,32 @@
> > > +/* SPDX-License-Identifier: GPL-2.0 */
> > > +
> > > +#ifndef __ASM_IMAGE_H
> > > +#define __ASM_IMAGE_H
> > > +
> > > +#define RISCV_IMAGE_MAGIC"RISCV"
> > > +
> > > +#ifndef __ASSEMBLY__
> > > +/*
> > > + * struct riscv_image_header - riscv kernel image header
> > > + *
> > > + * @code0:   Executable code
> > > + * @code1:   Executable code
> > > + * @text_offset: Image load offset
> > > + * @image_size:  Effective Image size
> > > + * @reserved:reserved
> > > + * @magic:   Magic number
> > > + * @reserved:reserved
> > > + */
> > > +
> > > +struct riscv_image_header {
> > > + u32 code0;
> > > + u32 code1;
> > > + u64 text_offset;
> > > + u64 image_size;
> > > + u64 res1;
> > > + u64 magic;
> > > + u32 res2;
> > > + u32 res3;
> > > +};
> > 
> > I don't want to invent our own file format.  Is there a reason we can't just
> > use something standard?  Off the top of my head I can think of ELF files and
> > multiboot.
> 
> Additional header is required to accommodate PE header format. Currently,
> this is only used for booti command but it will be reused for EFI headers as
> well. Linux kernel Image can pretend as an EFI application if PE/COFF header
> is present. This removes the need of an explicit EFI boot loader and EFI
> firmware can directly load Linux (obviously after EFI stub implementation
> for RISC-V).

Adding the EFI stub on arm64 required very careful consideration of our
Image header and the EFI spec, along with the PE/COFF spec.

For example, to be a compliant PE/COFF header, the first two bytes of
your kernel image need to be "MZ" in ASCII. On arm64 we happened to find
a valid instruction that we could rely upon that met this requirement...

> > >   __INIT
> > >   ENTRY(_start)
> > > + /*
> > > +  * Image header expected by Linux boot-loaders. The image header data
> > > +  * structure is described in asm/image.h.
> > > +  * Do not modify it without modifying the structure and all bootloaders
> > > +  * that expects this header format!!
> > > +  */
> > > + /* jump to start kernel */
> > > + j _start_kernel

... but it's not clear to me if this instruction meets that requirement.

I would strongly encourage you to consider what you actually need for a
compliant EFI header before you set the rest of this ABI in stone.

On arm64 we also had issues with endianness, and I would strongly
recommend that you define how big/little endian will work ahead of time.
e.g. whether fields are always in a fixed endianness.
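
For comparison, arm64 ended up documenting its header fields as
little-endian regardless of kernel endianness; for a header like the
above that would look something like (illustrative only):

struct riscv_image_header {
	u32	code0;		/* Executable code */
	u32	code1;		/* Executable code */
	__le64	text_offset;	/* Image load offset, little endian */
	__le64	image_size;	/* Effective Image size, little endian */
	__le64	res1;		/* reserved */
	__le64	magic;		/* Magic number, little endian */
	__le32	res2;		/* reserved */
	__le32	res3;		/* reserved */
};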

Thanks,
Mark.


Re: [PATCH] io_uring: avoid page allocation warnings

2019-05-01 Thread Mark Rutland
On Wed, May 01, 2019 at 09:29:25AM -0600, Jens Axboe wrote:
> On 5/1/19 9:09 AM, Mark Rutland wrote:
> > I've manually minimized that to C below. AFAICT, that hits a leak, which
> > is what's triggering the OOM after the program is run a number of times
> > with the previously posted kvmalloc patch.
> > 
> > Per /proc/meminfo, that memory isn't accounted anywhere.
> > 
> >> Patch looks fine to me. Note
> >> that buffer registration is under the protection of RLIMIT_MEMLOCK.
> >> That's usually very limited for non-root, as root you can of course
> >> consume as much as you want and OOM the system.
> > 
> > Sure.
> > 
> > As above, it looks like there's a leak, regardless.
> 
> The leak is that we're not releasing imu->bvec in case of error. I fixed
> a missing kfree -> kvfree as well in your patch, with this rolled up
> version it works for me.

That works for me too.

I'll fold that into v2, and send that out momentarily.

Thanks,
Mark.


Re: arm64: Fix size of __early_cpu_boot_status

2019-05-01 Thread Mark Rutland
On Tue, Apr 30, 2019 at 04:05:04PM +0530, Arun KS wrote:
> __early_cpu_boot_status is of type long. Use quad
> assembler directive to allocate proper size.
> 
> Signed-off-by: Arun KS 
> ---
>  arch/arm64/kernel/head.S | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/arm64/kernel/head.S b/arch/arm64/kernel/head.S
> index eecf792..115f332 100644
> --- a/arch/arm64/kernel/head.S
> +++ b/arch/arm64/kernel/head.S
> @@ -684,7 +684,7 @@ ENTRY(__boot_cpu_mode)
>   * with MMU turned off.
>   */
>  ENTRY(__early_cpu_boot_status)
> - .long   0
> + .quad   0

This is the last element in .mmuoff.data.write, which is padded to 2K,
so luckily we don't clobber anything else (and don't need a backport).

For consistency with __boot_cpu_mode, we could instead change the c
declaration to int, and fix up the calls to
update_early_cpu_boot_status, to use a w register, but either way:

Acked-by: Mark Rutland 

Mark.

>  
>   .popsection
>  
> -- 
> 1.9.1
> 
> 
> ___
> linux-arm-kernel mailing list
> linux-arm-ker...@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel


Re: [PATCH] clocksource/arm_arch_timer: extract elf_hwcap use to arch-helper

2019-05-01 Thread Mark Rutland
On Tue, Apr 30, 2019 at 02:14:13PM +0100, Andrew Murray wrote:
> Different mechanisms are used to test and set elf_hwcaps between ARM
> and ARM64, this results in the use of #ifdef's in this file when

Nit: greengrocer's apostrophe -- you can say "use of ifdeferry" to clean
that up.

> setting/testing for the EVTSTRM hwcap.
> 
> Let's improve readability by extracting this to an arch helper.
> 
> Signed-off-by: Andrew Murray 
> ---
>  arch/arm/include/asm/arch_timer.h| 13 +
>  arch/arm64/include/asm/arch_timer.h  | 13 +
>  drivers/clocksource/arm_arch_timer.c | 15 ++-
>  3 files changed, 28 insertions(+), 13 deletions(-)
> 
> diff --git a/arch/arm/include/asm/arch_timer.h 
> b/arch/arm/include/asm/arch_timer.h
> index 0a8d7bba2cb0..f21e038dc9f3 100644
> --- a/arch/arm/include/asm/arch_timer.h
> +++ b/arch/arm/include/asm/arch_timer.h
> @@ -4,6 +4,7 @@
>  
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -110,6 +111,18 @@ static inline void arch_timer_set_cntkctl(u32 cntkctl)
>   isb();
>  }
>  
> +static inline bool arch_timer_set_evtstrm_feature(void)
> +{
> + elf_hwcap |= HWCAP_EVTSTRM;
> +#ifdef CONFIG_COMPAT
> + compat_elf_hwcap |= COMPAT_HWCAP_EVTSTRM;
> +#endif

This can go; 32-bit arm doesn't have COMPAT.
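
i.e. for 32-bit arm this can simply be (untested; returning void to
match the arm64 version):

static inline void arch_timer_set_evtstrm_feature(void)
{
	elf_hwcap |= HWCAP_EVTSTRM;
}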

Otherwise, this looks good to me, so with those changes:

Acked-by: Mark Rutland 

Thanks,
Mark.

> +}
> +
> +static inline bool arch_timer_have_evtstrm_feature(void)
> +{
> + return elf_hwcap & HWCAP_EVTSTRM;
> +}
>  #endif
>  
>  #endif
> diff --git a/arch/arm64/include/asm/arch_timer.h 
> b/arch/arm64/include/asm/arch_timer.h
> index f2a234d6516c..f371d4a94f06 100644
> --- a/arch/arm64/include/asm/arch_timer.h
> +++ b/arch/arm64/include/asm/arch_timer.h
> @@ -20,6 +20,7 @@
>  #define __ASM_ARCH_TIMER_H
>  
>  #include 
> +#include 
>  #include 
>  
>  #include 
> @@ -165,4 +166,16 @@ static inline int arch_timer_arch_init(void)
>   return 0;
>  }
>  
> +static inline void arch_timer_set_evtstrm_feature(void)
> +{
> + cpu_set_named_feature(EVTSTRM);
> +#ifdef CONFIG_COMPAT
> + compat_elf_hwcap |= COMPAT_HWCAP_EVTSTRM;
> +#endif
> +}
> +
> +static inline bool arch_timer_have_evtstrm_feature(void)
> +{
> + return cpu_have_named_feature(EVTSTRM);
> +}
>  #endif
> diff --git a/drivers/clocksource/arm_arch_timer.c 
> b/drivers/clocksource/arm_arch_timer.c
> index 6cc8aff83805..5e507e81515f 100644
> --- a/drivers/clocksource/arm_arch_timer.c
> +++ b/drivers/clocksource/arm_arch_timer.c
> @@ -833,14 +833,7 @@ static void arch_timer_evtstrm_enable(int divider)
>   cntkctl |= (divider << ARCH_TIMER_EVT_TRIGGER_SHIFT)
>   | ARCH_TIMER_VIRT_EVT_EN;
>   arch_timer_set_cntkctl(cntkctl);
> -#ifdef CONFIG_ARM64
> - cpu_set_named_feature(EVTSTRM);
> -#else
> - elf_hwcap |= HWCAP_EVTSTRM;
> -#endif
> -#ifdef CONFIG_COMPAT
> - compat_elf_hwcap |= COMPAT_HWCAP_EVTSTRM;
> -#endif
> + arch_timer_set_evtstrm_feature();
>   cpumask_set_cpu(smp_processor_id(), &evtstrm_available);
>  }
>  
> @@ -1059,11 +1052,7 @@ static int arch_timer_cpu_pm_notify(struct 
> notifier_block *self,
>   } else if (action == CPU_PM_ENTER_FAILED || action == CPU_PM_EXIT) {
>   arch_timer_set_cntkctl(__this_cpu_read(saved_cntkctl));
>  
> -#ifdef CONFIG_ARM64
> - if (cpu_have_named_feature(EVTSTRM))
> -#else
> - if (elf_hwcap & HWCAP_EVTSTRM)
> -#endif
> + if (arch_timer_have_evtstrm_feature())
>   cpumask_set_cpu(smp_processor_id(), &evtstrm_available);
>   }
>   return NOTIFY_OK;
> -- 
> 2.21.0
> 


Re: [PATCH] arm64: Demote boot and shutdown messages to pr_debug

2019-05-01 Thread Mark Rutland
On Tue, Apr 30, 2019 at 03:38:31PM -0700, Florian Fainelli wrote:
> Similar to commits c68b0274fb3cf ("ARM: reduce "Booted secondary
> processor" message to debug level") and 035e787543de7 ("ARM: 8644/1: Reduce 
> "CPU:
> shutdown" message to debug level"), demote the secondary_start_kernel()
> and __cpu_die() messages from info, respectively notice to debug. While
> we are at it, also do this for cpu_psci_cpu_kill() which is redundant
> with __cpu_die().
> 
> This helps improve the amount of possible hotplug cycles by around +50%
> on ARCH_BRCMSTB.

Could you elaborate on why that matters? 

e.g. is this just for testing, or does this matter in some shutdown or
hibernate scenario?

> Signed-off-by: Florian Fainelli 
> ---
>  arch/arm64/kernel/psci.c | 2 +-
>  arch/arm64/kernel/smp.c  | 4 ++--
>  2 files changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/arm64/kernel/psci.c b/arch/arm64/kernel/psci.c
> index 8cdaf25e99cd..a78581046c80 100644
> --- a/arch/arm64/kernel/psci.c
> +++ b/arch/arm64/kernel/psci.c
> @@ -96,7 +96,7 @@ static int cpu_psci_cpu_kill(unsigned int cpu)
>   for (i = 0; i < 10; i++) {
>   err = psci_ops.affinity_info(cpu_logical_map(cpu), 0);
>   if (err == PSCI_0_2_AFFINITY_LEVEL_OFF) {
> - pr_info("CPU%d killed.\n", cpu);
> + pr_debug("CPU%d killed.\n", cpu);
>   return 0;
>   }
>  
> diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
> index 824de7038967..71fd2b5a3f0e 100644
> --- a/arch/arm64/kernel/smp.c
> +++ b/arch/arm64/kernel/smp.c
> @@ -259,7 +259,7 @@ asmlinkage notrace void secondary_start_kernel(void)
>* the CPU migration code to notice that the CPU is online
>* before we continue.
>*/
> - pr_info("CPU%u: Booted secondary processor 0x%010lx [0x%08x]\n",
> + pr_debug("CPU%u: Booted secondary processor 0x%010lx [0x%08x]\n",
>cpu, (unsigned long)mpidr,
>read_cpuid_id());

I generally agree that we don't need to be verbose, and demoting these
to debug is fine, but it's a shame that these won't be accessible in
defconfig.

I wonder if we should enable DYNAMIC_DEBUG so that we can turn these on
from the kernel command line, or if we should have something like a
verbose_hotplug option specifically for these messages.
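
e.g. if we went the explicit-option route, something like the below
(sketch only; the parameter and macro names are illustrative):

/* in arch/arm64/kernel/smp.c */
static bool verbose_hotplug;
core_param(verbose_hotplug, verbose_hotplug, bool, 0644);

#define hotplug_info(fmt, ...)						\
do {									\
	if (verbose_hotplug)						\
		pr_info(fmt, ##__VA_ARGS__);				\
	else								\
		pr_debug(fmt, ##__VA_ARGS__);				\
} while (0)

... with the messages above becoming hotplug_info() calls, so they can
still be enabled at runtime on a defconfig kernel.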

Thanks,
Mark.

>   update_cpu_boot_status(CPU_BOOT_SUCCESS);
> @@ -348,7 +348,7 @@ void __cpu_die(unsigned int cpu)
>   pr_crit("CPU%u: cpu didn't die\n", cpu);
>   return;
>   }
> - pr_notice("CPU%u: shutdown\n", cpu);
> + pr_debug("CPU%u: shutdown\n", cpu);
>  
>   /*
>* Now that the dying CPU is beyond the point of no return w.r.t.
> -- 
> 2.17.1
> 


Re: [PATCH] io_uring: free allocated io_memory once

2019-04-30 Thread Mark Rutland
On Tue, Apr 30, 2019 at 10:22:23AM -0600, Jens Axboe wrote:
> On 4/30/19 10:20 AM, Mark Rutland wrote:
> > -   io_mem_free(ctx->sq_ring);
> > -   io_mem_free(ctx->sq_sqes);
> > -   io_mem_free(ctx->cq_ring);
> > +   if (ctx->sq_ring)
> > +   io_mem_free(ctx->sq_ring);
> > +   if (ctx->sq_sqes)
> > +   io_mem_free(ctx->sq_sqes);
> > +   if (ctx->cq_ring)
> > +   io_mem_free(ctx->cq_ring);
> 
> Please just make io_mem_free() handle a NULL pointer so we don't need
> these checks.

Sure; I'll spin a v2 momentarily.

Thanks,
Mark.


Re: [PATCH 2/4] perf: Add filter_match() as a parameter for pinned/flexible_sched_in()

2019-04-29 Thread Mark Rutland
On Mon, Apr 29, 2019 at 11:31:26AM -0400, Liang, Kan wrote:
> 
> 
> On 4/29/2019 11:12 AM, Mark Rutland wrote:
> > On Mon, Apr 29, 2019 at 07:44:03AM -0700, kan.li...@linux.intel.com wrote:
> > > From: Kan Liang 
> > > 
> > > A fast path will be introduced in the following patches to speed up the
> > > cgroup events sched in, which only needs a simpler filter_match().
> > > 
> > > Add filter_match() as a parameter for pinned/flexible_sched_in().
> > > 
> > > No functional change.
> > 
> > I suspect that the cost you're trying to avoid is pmu_filter_match()
> > iterating over the entire group, which arm systems rely upon for correct
> > behaviour on big.LITTLE systems.
> > 
> > Is that the case?
> 
> No. In X86, we don't use pmu_filter_match(). The fast path still keeps this
> filter.
> perf_cgroup_match() is the one I want to avoid.

Understood; sorry for the silly question, and thanks for confirming! :)

Thanks,
Mark.


Re: [PATCH 2/4] perf: Add filter_match() as a parameter for pinned/flexible_sched_in()

2019-04-29 Thread Mark Rutland
On Mon, Apr 29, 2019 at 07:44:03AM -0700, kan.li...@linux.intel.com wrote:
> From: Kan Liang 
> 
> A fast path will be introduced in the following patches to speed up the
> cgroup events sched in, which only needs a simpler filter_match().
> 
> Add filter_match() as a parameter for pinned/flexible_sched_in().
> 
> No functional change.

I suspect that the cost you're trying to avoid is pmu_filter_match()
iterating over the entire group, which arm systems rely upon for correct
behaviour on big.LITTLE systems.

Is that the case?

Thanks,
Mark.

> 
> Signed-off-by: Kan Liang 
> ---
>  kernel/events/core.c | 15 +--
>  1 file changed, 9 insertions(+), 6 deletions(-)
> 
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 388dd42..782fd86 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -3251,7 +3251,8 @@ static void cpu_ctx_sched_out(struct perf_cpu_context 
> *cpuctx,
>  }
>  
>  static int visit_groups_merge(struct perf_event_groups *groups, int cpu,
> -   int (*func)(struct perf_event *, void *), void 
> *data)
> +   int (*func)(struct perf_event *, void *, int 
> (*)(struct perf_event *)),
> +   void *data)
>  {
>   struct perf_event **evt, *evt1, *evt2;
>   int ret;
> @@ -3271,7 +3272,7 @@ static int visit_groups_merge(struct perf_event_groups 
> *groups, int cpu,
>   evt = &evt2;
>   }
>  
> - ret = func(*evt, data);
> + ret = func(*evt, data, event_filter_match);
>   if (ret)
>   return ret;
>  
> @@ -3287,7 +3288,8 @@ struct sched_in_data {
>   int can_add_hw;
>  };
>  
> -static int pinned_sched_in(struct perf_event *event, void *data)
> +static int pinned_sched_in(struct perf_event *event, void *data,
> +int (*filter_match)(struct perf_event *))
>  {
>   struct sched_in_data *sid = data;
>  
> @@ -3300,7 +3302,7 @@ static int pinned_sched_in(struct perf_event *event, 
> void *data)
>   return 0;
>  #endif
>  
> - if (!event_filter_match(event))
> + if (!filter_match(event))
>   return 0;
>  
>   if (group_can_go_on(event, sid->cpuctx, sid->can_add_hw)) {
> @@ -3318,7 +3320,8 @@ static int pinned_sched_in(struct perf_event *event, 
> void *data)
>   return 0;
>  }
>  
> -static int flexible_sched_in(struct perf_event *event, void *data)
> +static int flexible_sched_in(struct perf_event *event, void *data,
> +  int (*filter_match)(struct perf_event *))
>  {
>   struct sched_in_data *sid = data;
>  
> @@ -3331,7 +3334,7 @@ static int flexible_sched_in(struct perf_event *event, 
> void *data)
>   return 0;
>  #endif
>  
> - if (!event_filter_match(event))
> + if (!filter_match(event))
>   return 0;
>  
>   if (group_can_go_on(event, sid->cpuctx, sid->can_add_hw)) {
> -- 
> 2.7.4
> 


Re: [PATCH 1/4] perf: Fix system-wide events miscounting during cgroup monitoring

2019-04-29 Thread Mark Rutland
On Mon, Apr 29, 2019 at 07:44:02AM -0700, kan.li...@linux.intel.com wrote:
> From: Kan Liang 
> 
> When counting system-wide events and cgroup events simultaneously, the
> value of system-wide events are miscounting. For example,
> 
> perf stat -e cycles,instructions -e cycles,instructions -G
> cgroup1,cgroup1,cgroup2,cgroup2 -a -e cycles,instructions -I 1000
> 
>  1.096265502 12,375,398,872  cycles  cgroup1
>  1.096265502  8,396,184,503  instructionscgroup1
>  #0.10  insn per cycle
>  1.096265502109,609,027,112  cycles  cgroup2
>  1.096265502 11,533,690,148  instructionscgroup2
>  #0.14  insn per cycle
>  1.096265502121,672,937,058  cycles
>  1.096265502 19,331,496,727  instructions   #
> 0.24  insn per cycle
> 
> The events are identical events for system-wide and cgroup. The
> value of system-wide events is less than the sum of cgroup events,
> which is wrong.
> 
> Both system-wide and cgroup are per-cpu. They share the same cpuctx
> groups, cpuctx->flexible_groups/pinned_groups.
> In context switch, cgroup switch tries to schedule all the events in
> the cpuctx groups. The unmatched cgroup events can be filtered by its
> event->cgrp. However, system-wide events, which event->cgrp is NULL, are
> unconditionally switched, which causes miscounting.

Why exactly does that cause mis-counting?

Are the system-wide events not switched back in? Or filtered
erroneously?

> Introduce cgrp_switch in cpuctx to indicate the cgroup context switch.
> If it's system-wide event in context switch, don't try to switch it.
> 
> Fixes: e5d1367f17ba ("perf: Add cgroup support")
> Signed-off-by: Kan Liang 
> ---
>  include/linux/perf_event.h |  1 +
>  kernel/events/core.c   | 30 --
>  2 files changed, 29 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
> index e47ef76..039e2f2 100644
> --- a/include/linux/perf_event.h
> +++ b/include/linux/perf_event.h
> @@ -795,6 +795,7 @@ struct perf_cpu_context {
>  #ifdef CONFIG_CGROUP_PERF
>   struct perf_cgroup  *cgrp;
>   struct list_headcgrp_cpuctx_entry;
> + unsigned intcgrp_switch :1;
>  #endif
>  
>   struct list_headsched_cb_entry;
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index dc7dead..388dd42 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -809,6 +809,7 @@ static void perf_cgroup_switch(struct task_struct *task, 
> int mode)
>  
>   perf_ctx_lock(cpuctx, cpuctx->task_ctx);
>   perf_pmu_disable(cpuctx->ctx.pmu);
> + cpuctx->cgrp_switch = true;
>  
>   if (mode & PERF_CGROUP_SWOUT) {
>   cpu_ctx_sched_out(cpuctx, EVENT_ALL);
> @@ -832,6 +833,7 @@ static void perf_cgroup_switch(struct task_struct *task, 
> int mode)
>      &cpuctx->ctx);
>   cpu_ctx_sched_in(cpuctx, EVENT_ALL, task);
>   }
> + cpuctx->cgrp_switch = false;
>   perf_pmu_enable(cpuctx->ctx.pmu);
>   perf_ctx_unlock(cpuctx, cpuctx->task_ctx);
>   }
> @@ -2944,13 +2946,25 @@ static void ctx_sched_out(struct perf_event_context 
> *ctx,
>  
>   perf_pmu_disable(ctx->pmu);
>   if (is_active & EVENT_PINNED) {
> - list_for_each_entry_safe(event, tmp, &ctx->pinned_active, 
> active_list)
> + list_for_each_entry_safe(event, tmp, &ctx->pinned_active, 
> active_list) {
> +#ifdef CONFIG_CGROUP_PERF
> + /* Don't sched system-wide event when cgroup context 
> switch */
> + if (cpuctx->cgrp_switch && !event->cgrp)
> + continue;
> +#endif

This pattern is duplicated several times, and should probably be a
helper like:

static bool skip_cgroup_switch(struct perf_cpu_context *cpuctx,
			       struct perf_event *event)
{
	return IS_ENABLED(CONFIG_CGROUP_PERF) &&
	       cpuctx->cgrp_switch &&
	       !event->cgrp;
}

... allowing the above to be an unconditional:

	if (skip_cgroup_switch(cpuctx, event))
		continue;

... and likewise for the other cases.

Thanks,
Mark.

>   group_sched_out(event, cpuctx, ctx);
> + }
>   }
>  
>   if (is_active & EVENT_FLEXIBLE) {
> - list_for_each_entry_safe(event, tmp, &ctx->flexible_active, 
> active_list)
> + list_for_each_entry_safe(event, tmp, &ctx->flexible_active, 
> active_list) {
> +#ifdef CONFIG_CGROUP_PERF
> + /* Don't sched system-wide event when cgroup context 
> switch */
> + if (cpuctx->cgrp_switch && !event->cgrp)
> + continue;
> +#endif
>   

Re: [PATCH v5] arm64: sysreg: Make mrs_s and msr_s macros work with Clang and LTO

2019-04-25 Thread Mark Rutland
On Wed, Apr 24, 2019 at 09:55:37AM -0700, Kees Cook wrote:
> Clang's integrated assembler does not allow assembly macros defined
> in one inline asm block using the .macro directive to be used across
> separate asm blocks. LLVM developers consider this a feature and not a
> bug, recommending code refactoring:
> 
>   https://bugs.llvm.org/show_bug.cgi?id=19749
> 
> As binutils doesn't allow macros to be redefined, this change uses
> UNDEFINE_MRS_S and UNDEFINE_MSR_S to define corresponding macros
> in-place and workaround gcc and clang limitations on redefining macros
> across different assembler blocks.
> 
> Specifically, the current state after preprocessing looks like this:
> 
> asm volatile(".macro mXX_s ... .endm");
> void f()
> {
>   asm volatile("mXX_s a, b");
> }
> 
> With GCC, it gives macro redefinition error because sysreg.h is included
> in multiple source files, and assembler code for all of them is later
> combined for LTO (I've seen an intermediate file with hundreds of
> identical definitions).
> 
> With clang, it gives macro undefined error because clang doesn't allow
> sharing macros between inline asm statements.
> 
> I also seem to remember catching another sort of undefined error with
> GCC due to reordering of macro definition asm statement and generated
> asm code for function that uses the macro.
> 
> The solution with defining and undefining for each use, while certainly
> not elegant, satisfies both GCC and clang, LTO and non-LTO.
> 
> Co-developed-by: Alex Matveev 
> Co-developed-by: Yury Norov 
> Co-developed-by: Sami Tolvanen 
> Signed-off-by: Kees Cook 

I've given this a spin with the kernel.org crosstool GCC 8.1.0
toolchain, and I can confirm defconfig compiles cleanly and works as
expected. AFAICT, all usage in inline assembly has been updated, and the
removal of explicit stringification makes usage a bit easier to read,
which is a nice bonus!

I don't think we can make this any cleaner short of not having to
manually assemble the instructions, so FWIW:

Reviewed-by: Mark Rutland 

I'll leave this to Catalin and Will.

Thanks,
Mark.

> ---
> v5: include register declaration in macro (rutland)
> v4: move to using preprocessor entirely, split constraints for "rZ" case.
> v3: added more uses in irqflags, updated commit log, based on
> discussion in https://lore.kernel.org/patchwork/patch/851580/
> ---
>  arch/arm64/include/asm/irqflags.h |  8 +++---
>  arch/arm64/include/asm/kvm_hyp.h  |  4 +--
>  arch/arm64/include/asm/sysreg.h   | 45 ++-
>  3 files changed, 38 insertions(+), 19 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/irqflags.h 
> b/arch/arm64/include/asm/irqflags.h
> index 43d8366c1e87..629963189085 100644
> --- a/arch/arm64/include/asm/irqflags.h
> +++ b/arch/arm64/include/asm/irqflags.h
> @@ -43,7 +43,7 @@ static inline void arch_local_irq_enable(void)
>   asm volatile(ALTERNATIVE(
>   "msrdaifclr, #2 // arch_local_irq_enable\n"
>   "nop",
> - "msr_s  " __stringify(SYS_ICC_PMR_EL1) ",%0\n"
> + __msr_s(SYS_ICC_PMR_EL1, "%0")
>   "dsbsy",
>   ARM64_HAS_IRQ_PRIO_MASKING)
>   :
> @@ -55,7 +55,7 @@ static inline void arch_local_irq_disable(void)
>  {
>   asm volatile(ALTERNATIVE(
>   "msrdaifset, #2 // arch_local_irq_disable",
> - "msr_s  " __stringify(SYS_ICC_PMR_EL1) ", %0",
> + __msr_s(SYS_ICC_PMR_EL1, "%0"),
>   ARM64_HAS_IRQ_PRIO_MASKING)
>   :
>   : "r" ((unsigned long) GIC_PRIO_IRQOFF)
> @@ -86,7 +86,7 @@ static inline unsigned long arch_local_save_flags(void)
>   "mov%0, %1\n"
>   "nop\n"
>   "nop",
> - "mrs_s  %0, " __stringify(SYS_ICC_PMR_EL1) "\n"
> + __mrs_s("%0", SYS_ICC_PMR_EL1)
>   "ands   %1, %1, " __stringify(PSR_I_BIT) "\n"
>   "csel   %0, %0, %2, eq",
>   ARM64_HAS_IRQ_PRIO_MASKING)
> @@ -116,7 +116,7 @@ static inline void arch_local_irq_restore(unsigned long 
> flags)
>   asm volatile(ALTERNATIVE(
>   "msrdaif, %0\n"
>   "nop",
> - "msr_s  " __stringify(SYS_ICC_PMR_EL1) ", %0\n"
> + __msr_s(SYS_ICC_PMR_EL1, "%0")
>  

Re: [RFC PATCH for 5.2 08/10] rseq/selftests: aarch64 code signature: handle big-endian environment

2019-04-24 Thread Mark Rutland
On Wed, Apr 24, 2019 at 05:45:38PM +0100, Will Deacon wrote:
> On Wed, Apr 24, 2019 at 11:25:00AM -0400, Mathieu Desnoyers wrote:
> > +/*
> > + * aarch64 -mbig-endian generates mixed endianness code vs data:
> > + * little-endian code and big-endian data. Ensure the RSEQ_SIG signature
> > + * matches code endianness.
> > + */
> > +#define RSEQ_SIG_CODE  0xd428bc00  /* BRK #0x45E0.  */
> > +
> > +#ifdef __AARCH64EB__
> > +#define RSEQ_SIG_DATA  0x00bc28d4  /* BRK #0x45E0.  */
> 
> It would be neater to implement swab32 and use that with RSEQ_SIG_CODE,

If possible, marginally neater than that would be using
le32_to_cpu(RSEQ_SIG_CODE), without any ifdeffery necessary.

It looks like that's defined in tools/include/linux/kernel.h, but I'm
not sure if that gets pulled into your include path.
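
i.e. something like (untested; assumes the tools/ copy of linux/kernel.h
is visible to the selftest build):

/*
 * The code is always emitted little-endian, so derive the data-endian
 * signature from the instruction encoding.
 */
#define RSEQ_SIG_CODE	0xd428bc00	/* BRK #0x45E0 */
#define RSEQ_SIG	le32_to_cpu(RSEQ_SIG_CODE)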

Thanks,
Mark.


Re: [RFC PATCH for 5.2 08/10] rseq/selftests: aarch64 code signature: handle big-endian environment

2019-04-24 Thread Mark Rutland
On Wed, Apr 24, 2019 at 05:40:33PM +0100, Mark Rutland wrote:
> On Wed, Apr 24, 2019 at 11:25:00AM -0400, Mathieu Desnoyers wrote:
> > +/*
> > + * aarch64 -mbig-endian generates mixed endianness code vs data:
> > + * little-endian code and big-endian data. Ensure the RSEQ_SIG signature
> > + * matches code endianness.
> > + */
> > +#define RSEQ_SIG_CODE  0xd428bc00  /* BRK #0x45E0.  */
> > +
> > +#ifdef __AARCH64EB__
> > +#define RSEQ_SIG_DATA  0x00bc28d4  /* BRK #0x45E0.  */
> > +#else
> > +#define RSEQ_SIG_DATA  RSEQ_SIG_CODE
> > +#endif
> > +
> > +#define RSEQ_SIG   RSEQ_SIG_DATA
> >  
> >  #define rseq_smp_mb()  __asm__ __volatile__ ("dmb ish" ::: "memory")
> >  #define rseq_smp_rmb() __asm__ __volatile__ ("dmb ishld" ::: "memory")
> > @@ -121,7 +134,7 @@ do {
> > \
> >  
> >  #define RSEQ_ASM_DEFINE_ABORT(label, abort_label)  
> > \
> > "   b   222f\n" 
> > \
> > -   "   .inst   "   __rseq_str(RSEQ_SIG) "\n"   
> > \
> > +   "   .inst   "   __rseq_str(RSEQ_SIG_CODE) "\n"  
> > \
> 
> I don't think this is right; the .inst directive _should_ emit the value
> in the instruction stream endianness (i.e. LE, regardless of the data
> endianness).

Now I see the RSEQ_SIG_CODE value isn't endian-swapped as the RSEQ_SIG_DATA
value is, so the code above is fine.

Sorry for the noise.

Thanks,
Mark.


Re: [RFC PATCH for 5.2 08/10] rseq/selftests: aarch64 code signature: handle big-endian environment

2019-04-24 Thread Mark Rutland
On Wed, Apr 24, 2019 at 11:25:00AM -0400, Mathieu Desnoyers wrote:
> Handle compiling with -mbig-endian on aarch64, which generates binaries
> with mixed code vs data endianness (little endian code, big endian
> data).
> 
> Else mismatch between code endianness for the generated signatures and
> data endianness for the RSEQ_SIG parameter passed to the rseq
> registration will trigger application segmentation faults when the
> kernel try to abort rseq critical sections.
> 
> Signed-off-by: Mathieu Desnoyers 
> CC: Peter Zijlstra 
> CC: Thomas Gleixner 
> CC: Joel Fernandes 
> CC: Catalin Marinas 
> CC: Dave Watson 
> CC: Will Deacon 
> CC: Shuah Khan 
> CC: Andi Kleen 
> CC: linux-kselft...@vger.kernel.org
> CC: "H . Peter Anvin" 
> CC: Chris Lameter 
> CC: Russell King 
> CC: Michael Kerrisk 
> CC: "Paul E . McKenney" 
> CC: Paul Turner 
> CC: Boqun Feng 
> CC: Josh Triplett 
> CC: Steven Rostedt 
> CC: Ben Maurer 
> CC: linux-...@vger.kernel.org
> CC: Andy Lutomirski 
> CC: Andrew Morton 
> CC: Linus Torvalds 
> ---
>  tools/testing/selftests/rseq/rseq-arm64.h | 17 +++--
>  1 file changed, 15 insertions(+), 2 deletions(-)
> 
> diff --git a/tools/testing/selftests/rseq/rseq-arm64.h 
> b/tools/testing/selftests/rseq/rseq-arm64.h
> index b41a2a48e965..200dae9e4208 100644
> --- a/tools/testing/selftests/rseq/rseq-arm64.h
> +++ b/tools/testing/selftests/rseq/rseq-arm64.h
> @@ -6,7 +6,20 @@
>   * (C) Copyright 2018 - Will Deacon 
>   */
>  
> -#define RSEQ_SIG 0xd428bc00  /* BRK #0x45E0 */
> +/*
> + * aarch64 -mbig-endian generates mixed endianness code vs data:
> + * little-endian code and big-endian data. Ensure the RSEQ_SIG signature
> + * matches code endianness.
> + */
> +#define RSEQ_SIG_CODE0xd428bc00  /* BRK #0x45E0.  */
> +
> +#ifdef __AARCH64EB__
> +#define RSEQ_SIG_DATA0x00bc28d4  /* BRK #0x45E0.  */
> +#else
> +#define RSEQ_SIG_DATARSEQ_SIG_CODE
> +#endif
> +
> +#define RSEQ_SIG RSEQ_SIG_DATA
>  
>  #define rseq_smp_mb()__asm__ __volatile__ ("dmb ish" ::: "memory")
>  #define rseq_smp_rmb()   __asm__ __volatile__ ("dmb ishld" ::: "memory")
> @@ -121,7 +134,7 @@ do {  
> \
>  
>  #define RSEQ_ASM_DEFINE_ABORT(label, abort_label)
> \
>   "   b   222f\n" 
> \
> - "   .inst   "   __rseq_str(RSEQ_SIG) "\n"   
> \
> + "   .inst   "   __rseq_str(RSEQ_SIG_CODE) "\n"  
> \

I don't think this is right; the .inst directive _should_ emit the value
in the instruction stream endianness (i.e. LE, regardless of the data
endianness).

That's certainly the case with the kernel.org crosstool GCC:

[mark@lakrids:/mnt/data/tests/inst-test]% cat test.c
   
void func(void)
{
asm volatile(".inst 0xd401");
}
[mark@lakrids:/mnt/data/tests/inst-test]% usekorg 8.1.0 aarch64-linux-gcc -c 
test.c
[mark@lakrids:/mnt/data/tests/inst-test]% usekorg 8.1.0 aarch64-linux-objdump 
-d test.o

test.o: file format elf64-littleaarch64


Disassembly of section .text:

 :
   0:   d401svc #0x0
   4:   d503201fnop
   8:   d65f03c0ret
[mark@lakrids:/mnt/data/tests/inst-test]% usekorg 8.1.0 aarch64-linux-gcc 
-mbig-endian -c test.c
[mark@lakrids:/mnt/data/tests/inst-test]% usekorg 8.1.0 aarch64-linux-objdump 
-d test.o 

test.o: file format elf64-bigaarch64


Disassembly of section .text:

 :
   0:   d401svc #0x0
   4:   d503201fnop
   8:   d65f03c0ret



Have you tested this? Is there some toolchain that doesn't get this
right?

Thanks,
Mark.


Re: [PATCH] lkdtm: fix potential use after free

2019-04-24 Thread Mark Rutland
On Wed, Apr 24, 2019 at 06:21:03PM +0800, Weikang shi wrote:
> From: swkhack 
> 
> The function lkdtm_WRITE_AFTER_FREE calls kfree(base) to free the memory
> of base. However, following kfree(base),
> it writes to the memory which base points to via base[offset] = 0x0abcdef0. This 
> may result in a
> use-after-free bug. This patch moves kfree(base) after the write.

As with lkdtm_READ_AFTER_FREE, this is deliberate, and we should not
make this change.

Thanks,
Mark.

> 
> Signed-off-by: swkhack 
> ---
>  drivers/misc/lkdtm/heap.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/misc/lkdtm/heap.c b/drivers/misc/lkdtm/heap.c
> index 65026d7de..0b9141525 100644
> --- a/drivers/misc/lkdtm/heap.c
> +++ b/drivers/misc/lkdtm/heap.c
> @@ -40,8 +40,8 @@ void lkdtm_WRITE_AFTER_FREE(void)
>   pr_info("Allocated memory %p-%p\n", base, [offset * 2]);
>   pr_info("Attempting bad write to freed memory at %p\n",
>   &base[offset]);
> - kfree(base);
>   base[offset] = 0x0abcdef0;
> + kfree(base);
>   /* Attempt to notice the overwrite. */
>   again = kmalloc(len, GFP_KERNEL);
>   kfree(again);
> -- 
> 2.17.1
> 


Re: [PATCH] lkdtm: fix potential use after free

2019-04-24 Thread Mark Rutland
Hi,

On Wed, Apr 24, 2019 at 05:59:52PM +0800, Weikang shi wrote:
> From: swkhack 
> 
> The function lkdtm_READ_AFTER_FREE calls kfree(base) to free the memory
> of base. However, following kfree(base),
> it accesses the memory which base points to via base[offset]. This may result in 
> a
> use-after-free bug. 

This is deliberate; the test is designed to provoke a use-after-free to
verify that this can be detected by tools such as KASAN.

> This patch moves kfree(base) after the dereference.

Doing this would break the test, so we should not make this change.

Was this something you spotted with a static analysis tool?

> Signed-off-by: swkhack 

A signed-off-by line should have your real name rather than an alias.

Thanks,
Mark.

> ---
>  drivers/misc/lkdtm/heap.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/misc/lkdtm/heap.c b/drivers/misc/lkdtm/heap.c
> index 65026d7de..3e2f1580a 100644
> --- a/drivers/misc/lkdtm/heap.c
> +++ b/drivers/misc/lkdtm/heap.c
> @@ -77,7 +77,6 @@ void lkdtm_READ_AFTER_FREE(void)
>   base[offset] = *val;
>   pr_info("Value in memory before free: %x\n", base[offset]);
>  
> - kfree(base);
>  
>   pr_info("Attempting bad read from freed memory\n");
>   saw = base[offset];
> @@ -88,6 +87,7 @@ void lkdtm_READ_AFTER_FREE(void)
>   }
>   pr_info("Memory was not poisoned\n");
>  
> + kfree(base);
>   kfree(val);
>  }
>  
> -- 
> 2.17.1
> 


Re: [PATCH V2 2/2] arm64/mm: Enable memory hot remove

2019-04-24 Thread Mark Rutland
On Wed, Apr 24, 2019 at 11:29:28AM +0530, Anshuman Khandual wrote:
> On 04/23/2019 09:35 PM, Mark Rutland wrote:
> > On Tue, Apr 23, 2019 at 01:01:58PM +0530, Anshuman Khandual wrote:
> >> Generic usage for init_mm.pagetable_lock
> >>
> >> Unless I have missed something else these are the generic init_mm kernel 
> >> page table
> >> modifiers at runtime (at least which uses init_mm.page_table_lock)
> >>
> >>1. ioremap_page_range() /* Mapped I/O memory area */
> >>2. apply_to_page_range()/* Change existing kernel linear map */
> >>3. vmap_page_range()/* Vmalloc area */
> > 
> > Internally, those all use the __p??_alloc() functions to handle racy
> > additions by transiently taking the PTL when installing a new table, but
> > otherwise walk kernel tables _without_ the PTL held. Note that none of
> > these ever free an intermediate level of table.
> 
> Right they dont free intermediate level page table but I was curious about the
> only the leaf level modifications.

Sure thing; I just wanted to point that out explicitly for everyone else's
benefit. :)

> > I believe that the idea is that operations on separate VMAs should never
> 
> I guess you meant kernel virtual range with 'VMA' but not the actual VMA 
> which is
> vm_area_struct applicable only for the user space not the kernel.

Sure. In the kernel we'd reserve a kernel VA range with a vm_struct via
get_vm_area() or similar.

The key point is that we reserve page-granular VA ranges which cannot overlap
at the leaf level (but may share intermediate levels of table). Whoever owns
that area is in charge of necessary mutual exclusion for the leaf entries.

> > conflict at the leaf level, and operations on the same VMA should be
> > serialised somehow w.r.t. that VMA.
> 
> AFAICT see there is nothing other than hotplug lock i.e mem_hotplug_lock which
> prevents concurrent init_mm modifications and the current situation is only 
> safe
> because some how these VA areas dont overlap with respect to intermediate page
> table level spans.

Here I was ignoring hotplug to describe the general principle (which I've
expanded upon above).

> > AFAICT, these functions are _never_ called on the linear/direct map or
> > vmemmap VA ranges, and whether or not these can conflict with hot-remove
> > is entirely dependent on whether those ranges can share a level of table
> > with the vmalloc region.
> 
> Right but all these VA ranges (linear, vmemmap, vmalloc) are wired in on 
> init_mm
> hence wondering if it is prudent to assume layout scheme which varies a lot 
> based
> on different architectures while deciding possible race protections.

One thing to consider is that we could turn this implicit assumption into a
requirement, if this isn't too invasive.

> Wondering why these user should not call [get|put]_online_mems() to prevent
> race with hotplug.

I suspect that this is because they were written before memory hotplug was
added, and they were never reconsidered in the presence of hot-remove.

> Will try this out.
> 
> Unless generic MM expects these VA ranges (linear, vmemmap, vmalloc) layout 
> to be
> in certain manner from the platform guaranteeing non-overlap at intermediate 
> level
> page table spans. Only then would we not need a lock.

I think that we might be able to make this a requirement for hot-remove. I
suspect this happens to be the case in practice by chance, even if it isn't
strictly guaranteed.

> > Do you know how likely that is to occur? e.g. what proportion of the
> 
> TBH I dont know.
> 
> > vmalloc region may share a level of table with the linear or vmemmap
> > regions in a typical arm64 or x86 configuration? Can we deliberately
> > provoke this failure case?
> 
> I have not enumerated those yet but there are multiple configs on arm64 and
> probably on x86 which decides kernel VA space layout causing these potential
> races. But regardless its not right to assume on vmalloc range span and not
> take a lock.
> 
> Not sure how to provoke this failure case from user space with simple hotplug
> because vmalloc physical allocation normally cannot be controlled without a
> hacked kernel change.

I believe that we can write a module which:

 - Looks at the various memory ranges, and determines whether they may share an
   intermediate level of table.
 - reserves a potentially-conflicting region with __get_vm_area_node()
 - in a loop, maps/unmaps something in that range

... while in parallel, adding/removing a potentially-conflicting region of
memory.
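
A rough, untested sketch of the map/unmap half (names invented for
illustration; it assumes 'area' was reserved beforehand as described above,
and uses today's map_vm_area()/unmap_kernel_range() interfaces). The
hot-add/remove side would be driven in parallel, e.g. via the memory hotplug
sysfs interface:

static struct vm_struct *area;	/* reserved via __get_vm_area() as above */

static int map_unmap_thread(void *unused)
{
	struct page *page = alloc_page(GFP_KERNEL);

	if (!page)
		return -ENOMEM;

	while (!kthread_should_stop()) {
		struct page *pages[] = { page };

		/* repeatedly install and tear down a leaf entry */
		if (!map_vm_area(area, PAGE_KERNEL, pages))
			unmap_kernel_range((unsigned long)area->addr, PAGE_SIZE);
		cond_resched();
	}

	__free_page(page);
	return 0;
}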

So long as we have the same sort of serialization we'd have for a regular
vmalloc(), ioremap() or vmap(), that would be sufficient to demonstrate that
this is 

Re: [PATCH v4] arm64: sysreg: Make mrs_s and msr_s macros work with Clang and LTO

2019-04-24 Thread Mark Rutland
Hi Kees,

On Tue, Apr 23, 2019 at 04:26:22PM -0700, Kees Cook wrote:
> Clang's integrated assembler does not allow assembly macros defined
> in one inline asm block using the .macro directive to be used across
> separate asm blocks. LLVM developers consider this a feature and not a
> bug, recommending code refactoring:
> 
>   https://bugs.llvm.org/show_bug.cgi?id=19749
> 
> As binutils doesn't allow macros to be redefined, this change uses
> UNDEFINE_MRS_S and UNDEFINE_MSR_S to define corresponding macros
> in-place and workaround gcc and clang limitations on redefining macros
> across different assembler blocks.

[...]

> diff --git a/arch/arm64/include/asm/irqflags.h 
> b/arch/arm64/include/asm/irqflags.h
> index 43d8366c1e87..d359f28765cb 100644
> --- a/arch/arm64/include/asm/irqflags.h
> +++ b/arch/arm64/include/asm/irqflags.h
> @@ -43,7 +43,7 @@ static inline void arch_local_irq_enable(void)
>   asm volatile(ALTERNATIVE(
>   "msrdaifclr, #2 // arch_local_irq_enable\n"
>   "nop",
> - "msr_s  " __stringify(SYS_ICC_PMR_EL1) ",%0\n"
> + __msr_s(SYS_ICC_PMR_EL1)

Seeing the __stringify() go is nice!

Our assembly happens to use argument 0 by coincidence rather than by deliberate
design, so please have the macro take the template, e.g.

__msr_s(SYS_ICC_PMR_EL1, "%x0")

... that way it's obvious which asm parameter is used, and this won't
complicate refactoring of the assembly (which may shuffle or rename the
parameters).

Likewise for __mrs_s(), e.g.

__mrs_s("%x[va]", SYS_WHATEVER);

[...]

> +#define __mrs_s(r)   \
> + DEFINE_MRS_S\
> +"mrs_s %0, " __stringify(r) "\n" \
> + UNDEFINE_MRS_S
> +
> +#define __msr_s(r)   \
> + DEFINE_MSR_S\
> +"msr_s " __stringify(r) ", %0\n" \
> + UNDEFINE_MSR_S

These can be:

#define __mrs_s(v, r)   \
DEFINE_MRS_S\
"   mrs_s " v ", " __stringify(r) "\n"  \
UNDEFINE_MRS_S


#define __msr_s(r, v)   \
DEFINE_MSR_S\
"   msr_s " __stringify(r) ", " v "\n"  \
UNDEFINE_MSR_S


> +
> +/* For use with "rZ" constraints. */
> +#define __msr_s_x0(r)\
> + DEFINE_MSR_S\
> +"msr_s " __stringify(r) ", %x0\n"\
> + UNDEFINE_MSR_S

I don't think we need this. If we pass the template in, this is solved by the
caller, and we could consistently use the x form regardless.
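
For illustration, the wrappers built on top (e.g. read_sysreg_s() and
write_sysreg_s()) could then pass the template explicitly, along the lines
of:

#define read_sysreg_s(r)						\
({									\
	u64 __val;							\
	asm volatile(__mrs_s("%x0", r) : "=r" (__val));			\
	__val;								\
})

#define write_sysreg_s(v, r) do {					\
	u64 __val = (u64)(v);						\
	asm volatile(__msr_s(r, "%x0") : : "rZ" (__val));		\
} while (0)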

Otherwise, I think this is as clean as we can make this short of bumping our
minimum binutils version and using the S_<...> names.

Thanks,
Mark.


Re: [PATCH] The patch solves the type error of the parameter “off” in syscall mmap on the ARM64 platform.

2019-04-23 Thread Mark Rutland
On Tue, Apr 23, 2019 at 06:04:45PM +0100, Will Deacon wrote:
> On Thu, Apr 18, 2019 at 12:28:18PM +0100, Mark Rutland wrote:
> > [adding linux-arch and relevant folk]
> > 
> > On Wed, Apr 17, 2019 at 08:35:25PM +0800, Boyang Zhou wrote:
> > > The error information is that “offset value too large for defined data 
> > > type”.
> > > Reason:
> > > On the X86 platform, the data type of “off" is unsigned long; but on the 
> > > ARM64 platform, the data type is defined as off_t, and off_t is by type 
> > > long instead of unsigned long.
> > > When the off right shifts in the function “sys_mmap_pgoff(addr, len, 
> > > prot, flags, fd, off >> PAGE_SHIFT)"on ARM64, high address of off is 
> > > filled with sign bit 1 instead of 0.
> > > In our case, we mmap GPU doorbell on both platform. On the x86 platform, 
> > > the value of off is f009c000, after shift the value becomes 
> > > f009c; while on the ARM64, the value of off changes from 
> > > ed35c000 to fffed35c. This value is treated as unsigned 
> > > long in later functions. So it is too big for off and the error happened.
> > > We have tested the patchs in Huawei ARM64 server with a couples of AMD 
> > > GPUs.
> > 
> > It looks like the generic mmap uses unsigned long, as do sparc and x86.
> > 
> > However, arm64, microblaze, powerpc and riscv all use off_t.
> > 
> > Should those all be using unsigned long? If so, that seems like it
> > should be a treewide cleanup.
> 
> This is more than just a cleanup, since it changes the behaviour of the
> system call and I'd much rather each architecture made this choice
> themselves. I also don't think we should change the type of the parameter,
> so something like the following instead:
> 
> diff --git a/arch/arm64/kernel/sys.c b/arch/arm64/kernel/sys.c
> index b44065fb1616..77dc5f006bc4 100644
> --- a/arch/arm64/kernel/sys.c
> +++ b/arch/arm64/kernel/sys.c
> @@ -31,8 +31,10 @@
>  
>  SYSCALL_DEFINE6(mmap, unsigned long, addr, unsigned long, len,
>   unsigned long, prot, unsigned long, flags,
> - unsigned long, fd, off_t, off)
> + unsigned long, fd, off_t, pgoff)
>  {
> + unsigned long off = pgoff;
> +
>   if (offset_in_page(off) != 0)
>   return -EINVAL;

To be honest, I think changing the parameter itself would be cleaner,
but I'm not the maintainer. :)

> > Similar applies to pgoff for mmap2.
> 
> Does it? Looks like unsigned long is used consistently there afaict. Did
> you spot something I missed?

Sorry, I meant tree-wide where csky and riscv use off_t, and all others
use unsigned long.

Thanks,
Mark.


Re: [PATCH V2 2/2] arm64/mm: Enable memory hot remove

2019-04-23 Thread Mark Rutland
On Tue, Apr 23, 2019 at 01:01:58PM +0530, Anshuman Khandual wrote:
> Generic usage for init_mm.pagetable_lock
> 
> Unless I have missed something else these are the generic init_mm kernel page 
> table
> modifiers at runtime (at least which uses init_mm.page_table_lock)
> 
>   1. ioremap_page_range() /* Mapped I/O memory area */
>   2. apply_to_page_range()/* Change existing kernel linear map */
>   3. vmap_page_range()/* Vmalloc area */

Internally, those all use the __p??_alloc() functions to handle racy
additions by transiently taking the PTL when installing a new table, but
otherwise walk kernel tables _without_ the PTL held. Note that none of
these ever free an intermediate level of table.
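
For reference, that pattern is roughly the following (a simplified sketch of
the core mm behaviour, not the exact code):

static pte_t *example_pte_alloc_kernel(pmd_t *pmdp, unsigned long addr)
{
	if (pmd_none(*pmdp)) {
		pte_t *new = pte_alloc_one_kernel(&init_mm);

		if (!new)
			return NULL;

		spin_lock(&init_mm.page_table_lock);
		if (likely(pmd_none(*pmdp)))	/* re-check under the lock */
			pmd_populate_kernel(&init_mm, pmdp, new);
		else				/* lost the race; discard */
			pte_free_kernel(&init_mm, new);
		spin_unlock(&init_mm.page_table_lock);
	}

	return pte_offset_kernel(pmdp, addr);
}

... i.e. the PTL only serializes installation of a missing table; the walk
down to the leaf, and the leaf writes themselves, happen without it.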

I believe that the idea is that operations on separate VMAs should never
conflict at the leaf level, and operations on the same VMA should be
serialised somehow w.r.t. that VMA.

AFAICT, these functions are _never_ called on the linear/direct map or
vmemmap VA ranges, and whether or not these can conflict with hot-remove
is entirely dependent on whether those ranges can share a level of table
with the vmalloc region.

Do you know how likely that is to occur? e.g. what proportion of the
vmalloc region may share a level of table with the linear or vmemmap
regions in a typical arm64 or x86 configuration? Can we deliberately
provoke this failure case?

[...]

> In all of the above.
> 
> - Page table pages [p4d|pud|pmd|pte]_alloc_[kernel] settings are
>   protected with init_mm.page_table_lock

Racy addition is protected in this manner.

> - Should not it require init_mm.page_table_lock for all leaf level
>   (PUD|PMD|PTE) modification as well ?

As above, I believe that the PTL is assumed to not be necessary there
since other mutual exclusion should be in effect to prevent racy
modification of leaf entries.

> - Should not this require init_mm.page_table_lock for page table walk
>   itself ?
> 
> Not taking an overall lock for all these three operations will
> potentially race with an ongoing memory hot remove operation which
> takes an overall lock as proposed. Wondering if this has this been
> safe till now ?

I suspect that the answer is that hot-remove is not thoroughly
stress-tested today, and conflicts are possible but rare.

As above, can we figure out how likely conflicts are, and try to come up
with a stress test?

Is it possible to avoid these specific conflicts (ignoring ptdump) by
aligning VA regions such that they cannot share intermediate levels of
table?

Thanks,
Mark.


Re: [PATCH v3] arm64: sysreg: make mrs_s and msr_s macros work with Clang and LTO

2019-04-23 Thread Mark Rutland
On Mon, Apr 22, 2019 at 10:44:10AM -0700, Nick Desaulniers wrote:
> On Tue, Apr 16, 2019 at 9:03 PM Kees Cook  wrote:
> >
> > On Tue, Apr 16, 2019 at 12:08 PM Mark Rutland  wrote:
> > >
> > > On Mon, Apr 15, 2019 at 10:22:27AM -0700, Nick Desaulniers wrote:
> > > > On Mon, Apr 15, 2019 at 10:06 AM Mark Rutland  
> > > > wrote:
> > > > > It would be nice if we could simply rely on a more recent binutils 
> > > > > these
> > > > > days, which supports the generic S<op0>_<op1>_<Cn>_<Cm>_<op2> sysreg
> > > > > definition. That would mean we could get rid of the whole msr_s/mrs_s
> > > > > hack by turning that into a CPP macro which built that name.
> 
> Mark, can you give me a test case for this? I'd like to check if
> clang's integrated assembler supports this or not, so we can file an
> issue and fix it if not.

The below is a smoke test. The entire op0:op1:cn:cm:op2 space is 14
bits, so it should be feasible to generate and run an exhaustive test for
all encodings.

I've included SYS and SYSL here too, since I strongly suspect we'll need
those going forward.

Thanks
Mark.

>8
/*
 * ${CC} -c test.c
 *
 * OLD_* encodings already exist, NEW_* encodings haven't yet been
 * allocated (per ARM DDI 0487D.a), but should assemble. All encodings
 * chosen arbitrarily.
 */

// OSDTRRX_EL1
#define OLD_DBG_REG "s2_0_c0_c0_2"
#define NEW_DBG_REG "s2_7_c0_c15_0"

// MIDR_EL1
#define OLD_SYS_REG "s3_0_c0_c0_0"
#define NEW_SYS_REG "s3_6_c1_c0_7"

#define REG(r)  \
asm volatile("msr " r ", xzr\n");   \
asm volatile("mrs xzr, " r "\n")

// DC IVAC, XZR
#define OLD_SYS "#0, c7, c6, #1"
#define NEW_SYS "#7, c15, c15, #7"

#define NEW_SYSL_MIN"#0, c0, c0, #0"
#define NEW_SYSL_MAX"#7, c15, c15, #7"

#define SYS(s)  asm volatile("sys " s ", xzr\n")
#define SYSL(s) asm volatile("sysl xzr, " s "\n")

void test(void)
{
REG(OLD_DBG_REG);
REG(NEW_DBG_REG);

REG(OLD_SYS_REG);
REG(NEW_SYS_REG);

SYS(OLD_SYS);
SYS(NEW_SYS);

SYSL(NEW_SYSL_MIN);
SYSL(NEW_SYSL_MAX);
}


Re: Linux Testing Microconference at LPC

2019-04-23 Thread Mark Rutland
On Thu, Apr 11, 2019 at 10:37:51AM -0700, Dhaval Giani wrote:
> Hi Folks,
> 
> This is a call for participation for the Linux Testing microconference
> at LPC this year.
> 
> For those who were at LPC last year, as the closing panel mentioned,
> testing is probably the next big push needed to improve quality. From
> getting more selftests in, to regression testing to ensure we don't
> break realtime as more of PREEMPT_RT comes in, to more stable distros,
> we need more testing around the kernel.
> 
> We have talked about different efforts around testing, such as fuzzing
> (using syzkaller and trinity), automating fuzzing with syzbot, 0day
> testing, test frameworks such as ktests, smatch to find bugs in the
> past. We want to push this discussion further this year and are
> interested in hearing from you what you want to talk about, and where
> kernel testing needs to go next.

I'd be interested to discuss what we could do with annotations and
compiler instrumentation to make the kernel more amenable to static and
dynamic analysis (and to some extent, documenting implicit
requirements).

One idea that I'd like to explore in the context of RT is to annotate
function signatures with their required IRQ/preempt context, such that
we could dynamically check whether those requirements were violated
(even if it didn't happen to cause a problem at that point in time), and
static analysis would be able to find some obviously broken usage. I had
some rough ideas of how to do the dynamic part atop/within ftrace. Maybe
there are similar problems elsewhere.
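
As a strawman for the dynamic side, an annotation could boil down to
something like the below (names invented; lockdep_assert_irqs_disabled()
already covers part of this today):

/* strawman: a real version might hook into ftrace and/or lockdep */
#define ASSERT_CALLED_WITH_IRQS_DISABLED()				\
	WARN_ONCE(!irqs_disabled(),					\
		  "%s() called with IRQs enabled\n", __func__)

static void example_requires_irqs_disabled(void)
{
	ASSERT_CALLED_WITH_IRQS_DISABLED();
	/* ... code which relies on IRQs being masked ... */
}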

I know that some clang folk were interested in similar stuff. IIRC Nick
Desaulniers was interested in whether clang's thread safety analysis
tooling could be applied to the kernel (e.g. based on lockdep
annotations).

Thanks,
Mark.


Re: [PATCH] The patch solves the type error of the parameter “off” in syscall mmap on the ARM64 platform.

2019-04-18 Thread Mark Rutland
[adding linux-arch and relevant folk]

On Wed, Apr 17, 2019 at 08:35:25PM +0800, Boyang Zhou wrote:
> The error information is that “offset value too large for defined data type”.
> Reason:
> On the X86 platform, the data type of “off" is unsigned long; but on the 
> ARM64 platform, the data type is defined as off_t, and off_t is by type long 
> instead of unsigned long.
> When the off right shifts in the function “sys_mmap_pgoff(addr, len, prot, 
> flags, fd, off >> PAGE_SHIFT)"on ARM64, high address of off is filled with 
> sign bit 1 instead of 0.
> In our case, we mmap GPU doorbell on both platform. On the x86 platform, the 
> value of off is f009c000, after shift the value becomes 
> f009c; while on the ARM64, the value of off changes from 
> ed35c000 to fffed35c. This value is treated as unsigned long 
> in later functions. So it is too big for off and the error happened.
> We have tested the patchs in Huawei ARM64 server with a couples of AMD GPUs.

It looks like the generic mmap uses unsigned long, as do sparc and x86.

However, arm64, microblaze, powerpc and riscv all use off_t.

Should those all be using unsigned long? If so, that seems like it
should be a treewide cleanup.

Similar applies to pgoff for mmap2.
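
For illustration, the difference is just an arithmetic vs logical shift; a
standalone userspace example (the value below is made up, loosely based on
the one quoted above):

#include <stdio.h>

int main(void)
{
	unsigned long off_u = 0xffffffffed35c000UL;	/* bit 63 set */
	long off_s = (long)off_u;

	/* logical shift:    000ffffffffed35c */
	printf("unsigned >> 12: %016lx\n", off_u >> 12);
	/* arithmetic shift (GCC/Clang): fffffffffffed35c */
	printf("signed   >> 12: %016lx\n", (unsigned long)(off_s >> 12));

	return 0;
}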

Thanks,
Mark.

> 
> Signed-off-by: Boyang Zhou 
> ---
>  arch/arm64/kernel/sys.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/arm64/kernel/sys.c b/arch/arm64/kernel/sys.c
> index b44065f..6f91e81 100644
> --- a/arch/arm64/kernel/sys.c
> +++ b/arch/arm64/kernel/sys.c
> @@ -31,7 +31,7 @@
>  
>  SYSCALL_DEFINE6(mmap, unsigned long, addr, unsigned long, len,
>   unsigned long, prot, unsigned long, flags,
> - unsigned long, fd, off_t, off)
> + unsigned long, fd, unsigned long, off)
>  {
>   if (offset_in_page(off) != 0)
>   return -EINVAL;
> -- 
> 2.7.4
> 


Re: [PATCH V2 2/2] arm64/mm: Enable memory hot remove

2019-04-17 Thread Mark Rutland
On Wed, Apr 17, 2019 at 10:15:35PM +0530, Anshuman Khandual wrote:
> On 04/17/2019 07:51 PM, Mark Rutland wrote:
> > On Wed, Apr 17, 2019 at 03:28:18PM +0530, Anshuman Khandual wrote:
> >> On 04/15/2019 07:18 PM, Mark Rutland wrote:
> >>> On Sun, Apr 14, 2019 at 11:29:13AM +0530, Anshuman Khandual wrote:

> >>>> +	spin_unlock(&init_mm.page_table_lock);
> >>>
> >>> What precisely is the page_table_lock intended to protect?
> >>
> >> Concurrent modification to kernel page table (init_mm) while clearing 
> >> entries.
> > 
> > Concurrent modification by what code?
> > 
> > If something else can *modify* the portion of the table that we're
> > manipulating, then I don't see how we can safely walk the table up to
> > this point without holding the lock, nor how we can safely add memory.
> > 
> > Even if this is to protect something else which *reads* the tables,
> > other code in arm64 which modifies the kernel page tables doesn't take
> > the lock.
> > 
> > Usually, if you can do a lockless walk you have to verify that things
> > didn't change once you've taken the lock, but we don't follow that
> > pattern here.
> > 
> > As things stand it's not clear to me whether this is necessary or
> > sufficient.
> 
> Hence lets take more conservative approach and wrap the entire process of
> remove_pagetable() under init_mm.page_table_lock which looks safe unless
> in the worst case when free_pages() gets stuck for some reason in which
> case we have bigger memory problem to deal with than a soft lock up.

Sorry, but I'm not happy with _any_ solution until we understand where
and why we need to take the init_mm ptl, and have made some effort to
ensure that the kernel correctly does so elsewhere. It is not sufficient
to consider this code in isolation.

IIUC, before this patch we never clear non-leaf entries in the kernel
page tables, so readers don't presently need to take the ptl in order to
safely walk down to a leaf entry.

For example, the arm64 ptdump code never takes the ptl, and as of this
patch it will blow up if it races with a hot-remove, regardless of
whether the hot-remove code itself holds the ptl.

Note that the same applies to the x86 ptdump code; we cannot assume that
just because x86 does something that it happens to be correct.

I strongly suspect there are other cases that would fall afoul of this,
in both arm64 and generic code.

Thanks,
Mark.


Re: [PATCH V2 2/2] arm64/mm: Enable memory hot remove

2019-04-17 Thread Mark Rutland
On Wed, Apr 17, 2019 at 03:28:18PM +0530, Anshuman Khandual wrote:
> On 04/15/2019 07:18 PM, Mark Rutland wrote:
> > On Sun, Apr 14, 2019 at 11:29:13AM +0530, Anshuman Khandual wrote:
> >> Memory removal from an arch perspective involves tearing down two different
> >> kernel based mappings i.e vmemmap and linear while releasing related page
> >> table pages allocated for the physical memory range to be removed.
> >>
> >> Define a common kernel page table tear down helper remove_pagetable() which
> >> can be used to unmap given kernel virtual address range. In effect it can
> >> tear down both vmemap or kernel linear mappings. This new helper is called
> >> from both vmemamp_free() and ___remove_pgd_mapping() during memory removal.
> >> The argument 'direct' here identifies kernel linear mappings.
> > 
> > Can you please explain why we need to treat these differently? I thought
> > the next paragraph was going to do that, but as per my comment there it
> > doesn't seem to be relevant. :/
> 
> For linear mapping there is no actual allocated page which is mapped. Its the
> pfn derived from physical address (from __va(PA)-->PA translation) which is
> there in the page table entry and need not be freed any where during tear 
> down.
> 
> But in case of vmemmap (struct page mapping for a given range) which is a real
> virtual mapping (like vmalloc) real pages are allocated (buddy or memblock) 
> and
> are mapped in it's page table entries to effect the translation. These pages
> need to be freed while tearing down the translation. But for both mappings
> (linear and vmemmap) their page table pages need to be freed.
> 
> This differentiation is needed while deciding if [pte|pmd|pud]_page() at any
> given level needs to be freed or not. Will update the commit message with this
> explanation if required.

Ok. I think you just need to say:

  When removing a vmemmap pagetable range, we must also free the pages
  used to back this range of the vmemmap.

> >> While here update arch_add_mempory() to handle __add_pages() failures by
> >> just unmapping recently added kernel linear mapping. 
> > 
> > Is this a latent bug?
> 
> Did not get it. __add_pages() could fail because of __add_section() in which
> case we should remove the linear mapping added previously in the first step.
> Is there any concern here ?

That is the question.

If that were to fail _before_ this series were applied, does that permit
anything bad to happen? e.g. is it permitted that when arch_add_memory()
fails, the relevant memory can be physically removed?

If so, that could result in a number of problems, and would be a latent
bug...

[...]

> >> +#ifdef CONFIG_MEMORY_HOTPLUG
> >> +static void free_pagetable(struct page *page, int order)
> > 
> > On arm64, all of the stage-1 page tables other than the PGD are always
> > PAGE_SIZE. We shouldn't need to pass an order around in order to free
> > page tables.
> > 
> > It looks like this function is misnamed, and is used to free vmemmap
> > backing pages in addition to page tables used to map them. It would be
> > nicer to come up with a better naming scheme.
> 
> free_pagetable() is being used both for freeing page table pages as well
> mapped entries at various level (for vmemmap). As you rightly mentioned
> page table pages are invariably PAGE_SIZE (other than pgd) but theses
> mapped pages size can vary at various level. free_pagetable() is a very
> generic helper which can accommodate pages allocated from buddy as well
> as memblock. But I agree that the naming is misleading.
> 
> Will something like this will be better ?
> 
> void free_pagetable_mapped_page(struct page *page, int order)
> {
>   ...
>   ...
> }
> 
> void free_pagetable_page(struct page *page)
> {
>   free_pagetable_mapped_page(page, 0);
> }
> 
> - Call free_pgtable_page() while freeing pagetable pages
> - Call free_pgtable_mapped_page() while freeing mapped pages

I think the "pgtable" naming isn't necessary. These functions are passed
the relevant page, and free that page (or a range starting at that
page).

I think it would be better to have something like:

static void free_hotplug_page_range(struct page *page, unsigned long size)
{
int order = get_order(size);
int nr_pages = 1 << order;

...
}

static void free_hotplug_page(struct page *page)
{
free_hotplug_page_range(page, PAGE_SIZE);
}

... which avoids having to place get_order() in all the callers, and
makes things a bit easier to read.

> > 
> >> +{
> >> +  unsigned long magic;
> >> +  unsigned in

Re: [PATCH v3] arm64: sysreg: make mrs_s and msr_s macros work with Clang and LTO

2019-04-16 Thread Mark Rutland
On Mon, Apr 15, 2019 at 10:22:27AM -0700, Nick Desaulniers wrote:
> On Mon, Apr 15, 2019 at 10:06 AM Mark Rutland  wrote:
> > On Mon, Apr 15, 2019 at 07:38:21AM -0700, Kees Cook wrote:
> > > diff --git a/arch/arm64/include/asm/irqflags.h 
> > > b/arch/arm64/include/asm/irqflags.h
> > > index 43d8366c1e87..06d3987d1546 100644
> > > --- a/arch/arm64/include/asm/irqflags.h
> > > +++ b/arch/arm64/include/asm/irqflags.h
> > > @@ -43,7 +43,9 @@ static inline void arch_local_irq_enable(void)
> > >   asm volatile(ALTERNATIVE(
> > >   "msrdaifclr, #2 // arch_local_irq_enable\n"
> > >   "nop",
> > > + DEFINE_MSR_S
> > >   "msr_s  " __stringify(SYS_ICC_PMR_EL1) ",%0\n"
> > > + UNDEFINE_MSR_S
> >
> > If we do need this, can we wrap this in a larger CPP macro that does the
> > whole sequence of defining, using, and undefining the asm macros?
> 
> I agree that would be cleaner.  For example, __mrs_s/__msr_s can
> ALMOST do this (be used in arch/arm64/include/asm/irqflags.h) except
> for the input/output constraints.  Maybe the constraints can be
> removed from the cpp macro definition, then the constraints be left to
> the macro expansion sites?  That way the DEFINE_MXX_S/UNDEFINE_MXX_S
> can be hidden in the cpp macro itself.

If those took the register template explicitly, i.e. something like:

__mrs_s(SYS_ICC_PMR_EL1, "%x0") 

__msr_s("%[whatever]", SYS_FOO)

... that would be more generally useful and clear, as they could be used
in larger asm sequences.

Thanks,
Mark.


Re: [PATCH v3] arm64: sysreg: make mrs_s and msr_s macros work with Clang and LTO

2019-04-16 Thread Mark Rutland
On Mon, Apr 15, 2019 at 10:22:27AM -0700, Nick Desaulniers wrote:
> On Mon, Apr 15, 2019 at 10:06 AM Mark Rutland  wrote:
> > It would be nice if we could simply rely on a more recent binutils these
> > days, which supports the generic S sysreg
> > definition. That would mean we could get rid of the whole msr_s/mrs_s
> > hack by turning that into a CPP macro which built that name.
> >
> > It looks like binutils has been able to do that since September 2014...
> >
> > Are folk using toolchains older than that to compile kernels?
> 
> Do you have a link to a commit?  If we can pinpoint the binutils
> version, that might help.

IIUC any version of binutils with commit:

  df7b4545b2b49572 ("[PATCH/AArch64] Generic support for all system registers 
using mrs and msr")

... should be sufficient.

That's on gitweb at:

  
https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;a=commit;h=df7b4545b2b49572ab63690c130df973af109615

> Also, I look forward to this patch for use of Clang's integrated
> assembler (regardless of LTO).  I remember getting frustrated trying
> to figure out how to resolve this for both assemblers, and I had
> forgotten this solution existed.

Is this the only blocker for the integrated assembler?

Thanks,
Mark.


Re: [PATCH v3] arm64: sysreg: make mrs_s and msr_s macros work with Clang and LTO

2019-04-15 Thread Mark Rutland
On Mon, Apr 15, 2019 at 07:38:21AM -0700, Kees Cook wrote:
> From: Alex Matveev 
> 
> Clang's integrated assembler does not allow assembly macros defined
> in one inline asm block using the .macro directive to be used across
> separate asm blocks. LLVM developers consider this a feature and not a
> bug, recommending code refactoring:
> 
>   https://bugs.llvm.org/show_bug.cgi?id=19749
> 
> As binutils doesn't allow macros to be redefined, this change uses
> UNDEFINE_MRS_S and UNDEFINE_MSR_S to define corresponding macros
> in-place and workaround gcc and clang limitations on redefining macros
> across different assembler blocks.
> 
> Specifically, the current state after preprocessing looks like this:
> 
> asm volatile(".macro mXX_s ... .endm");
> void f()
> {
>   asm volatile("mXX_s a, b");
> }
> 
> With GCC, it gives macro redefinition error because sysreg.h is included
> in multiple source files, and assembler code for all of them is later
> combined for LTO (I've seen an intermediate file with hundreds of
> identical definitions).
> 
> With clang, it gives macro undefined error because clang doesn't allow
> sharing macros between inline asm statements.
> 
> I also seem to remember catching another sort of undefined error with
> GCC due to reordering of macro definition asm statement and generated
> asm code for function that uses the macro.
> 
> The solution with defining and undefining for each use, while certainly
> not elegant, satisfies both GCC and clang, LTO and non-LTO.
> 
> Signed-off-by: Alex Matveev 
> Signed-off-by: Yury Norov 
> Signed-off-by: Sami Tolvanen 
> Signed-off-by: Kees Cook 
> ---
> v3: split out patch as stand-alone, added more uses in irqflags,
> updated commit log, based on discussion in
> https://lore.kernel.org/patchwork/patch/851580/
> ---
>  arch/arm64/include/asm/irqflags.h | 12 +--
>  arch/arm64/include/asm/kvm_hyp.h  |  8 +++--
>  arch/arm64/include/asm/sysreg.h   | 55 +--
>  3 files changed, 53 insertions(+), 22 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/irqflags.h 
> b/arch/arm64/include/asm/irqflags.h
> index 43d8366c1e87..06d3987d1546 100644
> --- a/arch/arm64/include/asm/irqflags.h
> +++ b/arch/arm64/include/asm/irqflags.h
> @@ -43,7 +43,9 @@ static inline void arch_local_irq_enable(void)
>   asm volatile(ALTERNATIVE(
>   "msrdaifclr, #2 // arch_local_irq_enable\n"
>   "nop",
> + DEFINE_MSR_S
>   "msr_s  " __stringify(SYS_ICC_PMR_EL1) ",%0\n"
> + UNDEFINE_MSR_S

If we do need this, can we wrap this in a larger CPP macro that does the
whole sequence of defining, using, and undefining the asm macros?

It would be nice if we could simply rely on a more recent binutils these
days, which supports the generic S<op0>_<op1>_<Cn>_<Cm>_<op2> sysreg
definition. That would mean we could get rid of the whole msr_s/mrs_s
hack by turning that into a CPP macro which built that name.
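
For illustration, the sort of CPP macro meant here might look like the below
(a sketch only, with made-up names, assuming literal encoding values and a
binutils which accepts the generic s<op0>_<op1>_c<n>_c<m>_<op2> form):

#define __sysreg_name(op0, op1, crn, crm, op2) \
	"s" #op0 "_" #op1 "_c" #crn "_c" #crm "_" #op2

/* e.g. __sysreg_name(3, 0, 0, 0, 5) expands to "s3_0_c0_c0_5" (MPIDR_EL1) */
#define read_sysreg_generic(op0, op1, crn, crm, op2)			\
({									\
	u64 __val;							\
	asm volatile("mrs %0, " __sysreg_name(op0, op1, crn, crm, op2)	\
		     : "=r" (__val));					\
	__val;								\
})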

It looks like binutils has been able to do that since September 2014...

Are folk using toolchains older than that to compile kernels?

Thanks,
Mark.


Re: [PATCH] kcov: improve CONFIG_ARCH_HAS_KCOV help text

2019-04-15 Thread Mark Rutland
On Fri, Apr 12, 2019 at 12:31:10PM +0200, Dmitry Vyukov wrote:
> On Fri, Apr 12, 2019 at 12:27 PM Mark Rutland  wrote:
> >
> > The help text for CONFIG_ARCH_HAS_KCOV is stale, and describes the
> > feature as being enabled only for x86_64, when it is now enabled for
> > several architectures, including arm, arm64, powerpc, and s390.
> >
> > Let's remove that stale help text, and update it along the lines of that
> > for ARCH_HAS_FORTIFY_SOURCE, better describing when an architecture
> > should select CONFIG_ARCH_HAS_KCOV.
> >
> > Signed-off-by: Mark Rutland 
> > Cc: Andrew Morton 
> > Cc: Dmitry Vyukov 
> > Cc: Kees Cook 
> > ---
> >  lib/Kconfig.debug | 6 +++---
> >  1 file changed, 3 insertions(+), 3 deletions(-)
> >
> > diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> > index 0d9e81779e37..00dbcdbc9a0d 100644
> > --- a/lib/Kconfig.debug
> > +++ b/lib/Kconfig.debug
> > @@ -753,9 +753,9 @@ endmenu # "Memory Debugging"
> >  config ARCH_HAS_KCOV
> > bool
> > help
> > - KCOV does not have any arch-specific code, but currently it is 
> > enabled
> > - only for x86_64. KCOV requires testing on other archs, and most 
> > likely
> > - disabling of instrumentation for some early boot code.
> > + An architecture should select this when it can successfully
> > + build and run with CONFIG_KCOV. This typically requires
> > + disabling instrumentation for some early boot code.
> >
> >  config CC_HAS_SANCOV_TRACE_PC
> > def_bool $(cc-option,-fsanitize-coverage=trace-pc)
> > --
> > 2.11.0
> 
> 
> Acked-by: Dmitry Vyukov 

Thanks!

Andrew, are you happy to pick this up?

Mark.


Re: [PATCH V2 2/2] arm64/mm: Enable memory hot remove

2019-04-15 Thread Mark Rutland
Hi Anshuman,

On Sun, Apr 14, 2019 at 11:29:13AM +0530, Anshuman Khandual wrote:
> Memory removal from an arch perspective involves tearing down two different
> kernel based mappings i.e vmemmap and linear while releasing related page
> table pages allocated for the physical memory range to be removed.
> 
> Define a common kernel page table tear down helper remove_pagetable() which
> can be used to unmap given kernel virtual address range. In effect it can
> tear down both vmemap or kernel linear mappings. This new helper is called
> from both vmemamp_free() and ___remove_pgd_mapping() during memory removal.
> The argument 'direct' here identifies kernel linear mappings.

Can you please explain why we need to treat these differently? I thought
the next paragraph was going to do that, but as per my comment there it
doesn't seem to be relevant. :/

> Vmemmap mappings page table pages are allocated through sparse mem helper
> functions like vmemmap_alloc_block() which does not cycle the pages through
> pgtable_page_ctor() constructs. Hence while removing it skips corresponding
> destructor construct pgtable_page_dtor().

I thought the ctor/dtor dance wasn't necessary for any init_mm tables,
so why do we need to mention it here specifically for the vmemmap
tables?

> While here update arch_add_mempory() to handle __add_pages() failures by
> just unmapping recently added kernel linear mapping. 

Is this a latent bug?

> Now enable memory hot remove on arm64 platforms by default with
> ARCH_ENABLE_MEMORY_HOTREMOVE.
> 
> This implementation is overall inspired from kernel page table tear down
> procedure on X86 architecture.
> 
> Signed-off-by: Anshuman Khandual 
> ---
>  arch/arm64/Kconfig   |   3 +
>  arch/arm64/include/asm/pgtable.h |   2 +
>  arch/arm64/mm/mmu.c  | 221 
> ++-
>  3 files changed, 224 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index c383625..a870eb2 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -267,6 +267,9 @@ config HAVE_GENERIC_GUP
>  config ARCH_ENABLE_MEMORY_HOTPLUG
>   def_bool y
>  
> +config ARCH_ENABLE_MEMORY_HOTREMOVE
> + def_bool y
> +
>  config SMP
>   def_bool y
>  
> diff --git a/arch/arm64/include/asm/pgtable.h 
> b/arch/arm64/include/asm/pgtable.h
> index de70c1e..1ee22ff 100644
> --- a/arch/arm64/include/asm/pgtable.h
> +++ b/arch/arm64/include/asm/pgtable.h
> @@ -555,6 +555,7 @@ static inline phys_addr_t pud_page_paddr(pud_t pud)
>  
>  #else
>  
> +#define pmd_index(addr) 0
>  #define pud_page_paddr(pud)  ({ BUILD_BUG(); 0; })
>  
>  /* Match pmd_offset folding in  */
> @@ -612,6 +613,7 @@ static inline phys_addr_t pgd_page_paddr(pgd_t pgd)
>  
>  #else
>  
> +#define pud_index(adrr)  0
>  #define pgd_page_paddr(pgd)  ({ BUILD_BUG(); 0;})
>  
>  /* Match pud_offset folding in  */
> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> index ef82312..a4750fe 100644
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -733,6 +733,194 @@ int kern_addr_valid(unsigned long addr)
>  
>   return pfn_valid(pte_pfn(pte));
>  }
> +
> +#ifdef CONFIG_MEMORY_HOTPLUG
> +static void free_pagetable(struct page *page, int order)

On arm64, all of the stage-1 page tables other than the PGD are always
PAGE_SIZE. We shouldn't need to pass an order around in order to free
page tables.

It looks like this function is misnamed, and is used to free vmemmap
backing pages in addition to page tables used to map them. It would be
nicer to come up with a better naming scheme.

> +{
> + unsigned long magic;
> + unsigned int nr_pages = 1 << order;
> +
> + if (PageReserved(page)) {
> + __ClearPageReserved(page);
> +
> + magic = (unsigned long)page->freelist;
> + if (magic == SECTION_INFO || magic == MIX_SECTION_INFO) {

Not a new problem, but it's unfortunate that the core code reuses the
page::freelist field for this, given it also uses page::private for the
section number. Using fields from different parts of the union doesn't
seem robust.

It would seem nicer to have a private2 field in the struct for anonymous
pages.

> + while (nr_pages--)
> + put_page_bootmem(page++);
> + } else {
> + while (nr_pages--)
> + free_reserved_page(page++);
> + }
> + } else {
> + free_pages((unsigned long)page_address(page), order);
> + }
> +}
> +
> +#if (CONFIG_PGTABLE_LEVELS > 2)
> +static void free_pte_table(pte_t *pte_start, pmd_t *pmd)

As a general note, for arm64 please append a 'p' for pointers to
entries, i.e. these should be ptep and pmdp.

> +{
> + pte_t *pte;
> + int i;
> +
> + for (i = 0; i < PTRS_PER_PTE; i++) {
> + pte = pte_start + i;
> + if (!pte_none(*pte))
> + return;
> + }

You could get rid of the pte 

Re: [PATCH v2] arm64: dts: qcom: Add Lenovo Miix 630

2019-04-12 Thread Mark Rutland
On Fri, Apr 12, 2019 at 09:19:18AM -0600, Jeffrey Hugo wrote:
> > > + keyboard@3a {
> > > + /* QTEC0001 is the ACPI HID, which could be used for quirks 
> > > */
> > > + compatible = "QTEC0001", "hid-over-i2c";
> >
> > As mentioned last time, please drop the ACPI HID, and allocate a real
> > compatible string.
> 
> So, I'm in a quandary with this device.  As far as I can tell, its an
> off the shelf component, the device adheres to the "PNP0C50" spec (HID
> over I2C), and can be driven by the full "hid-over-i2c" driver (which
> is just a DT shim over the PNP0C50 ACPI driver).  However, the device
> itself identifies itself as an ELAN 400 device, which is an ID that is
> also used for standalone touchpad devices.  Per my understanding of
> the Linux drivers, there is a separate ELAN driver for the standalone
> touchpad devices as its been discovered though trial and error that
> the Linux PNP0C50 driver cannot drive those devices.  To handle this,
> there is a quirk in hid-quirks which rejects ELAN 400 devices, except
> those which are "QTEC0001".
> 
> We need that quirk bypass for this device because the ELAN driver
> cannot handle this device.

This is useful context; thanks for writing this up!

> I'd much rather have a single identifier to quirk on, rather than
> having one for DT and one for ACPI, and its not looking feasible to
> get the vendor to update the ACPI, so it seems like using the ACPI
> identifier is just simpler.
>
> So, if you want a different compatible string, I'll need to go put DT
> in a driver that is primarily ACPI.  I'm not sure what the HID folks
> will think of that. 

My objection is that an ACPI HID is _not_ a DT compatible string, and
the two should be treated separately. Munging the two together opens the
door for other pain.

The driver in question has a DT probe function, i2c_hid_of_probe, so
there's certainly a place to wire up that quirk.

> I'll propose it, but what do you view is a
> "proper" compatible string?  "ELAN0400-msm8998-clamshell"?

A proper compatible string has a vendor-prefix, and is documented
somewhere in Documentation/devicetree/bindings.

e.g. you could allocate something like:

"qcom,msm8998-clamshell-hid-over-i2c"

Thanks,
Mark.


Re: [PATCH v2] arm64: dts: qcom: Add Lenovo Miix 630

2019-04-12 Thread Mark Rutland
On Thu, Apr 11, 2019 at 01:51:44PM -0700, Jeffrey Hugo wrote:
> This adds the initial DT for the Lenovo Miix 630 laptop.  Supported
> functionality includes USB (host), microSD-card, keyboard, and trackpad.
> 
> Signed-off-by: Jeffrey Hugo 
> ---
> 
> v2:
> -Changed "cls" to "clam" since feedback indicated "cls" is too opaque, but
> "clamshell" is a mouthfull.  "clam" seems to be a happy medium.

Please use "clamshell". The extra 5 characters aren't going to hurt
anyone, and it avoids having yet another short name that just confuses
matters further.

[...]

> + keyboard@3a {
> + /* QTEC0001 is the ACPI HID, which could be used for quirks */
> + compatible = "QTEC0001", "hid-over-i2c";

As mentioned last time, please drop the ACPI HID, and allocate a real
compatible string.

Thanks,
Mark.


[PATCH] kcov: improve CONFIG_ARCH_HAS_KCOV help text

2019-04-12 Thread Mark Rutland
The help text for CONFIG_ARCH_HAS_KCOV is stale, and describes the
feature as being enabled only for x86_64, when it is now enabled for
several architectures, including arm, arm64, powerpc, and s390.

Let's remove that stale help text, and update it along the lines of that
for ARCH_HAS_FORTIFY_SOURCE, better describing when an architecture
should select CONFIG_ARCH_HAS_KCOV.

Signed-off-by: Mark Rutland 
Cc: Andrew Morton 
Cc: Dmitry Vyukov 
Cc: Kees Cook 
---
 lib/Kconfig.debug | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 0d9e81779e37..00dbcdbc9a0d 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -753,9 +753,9 @@ endmenu # "Memory Debugging"
 config ARCH_HAS_KCOV
bool
help
- KCOV does not have any arch-specific code, but currently it is enabled
- only for x86_64. KCOV requires testing on other archs, and most likely
- disabling of instrumentation for some early boot code.
+ An architecture should select this when it can successfully
+ build and run with CONFIG_KCOV. This typically requires
+ disabling instrumentation for some early boot code.
 
 config CC_HAS_SANCOV_TRACE_PC
def_bool $(cc-option,-fsanitize-coverage=trace-pc)
-- 
2.11.0



Re: [RESEND][PATCH v2] firmware/psci: add support for SYSTEM_RESET2

2019-04-11 Thread Mark Rutland
On Thu, Apr 11, 2019 at 11:33:46AM +0100, Sudeep Holla wrote:
> PSCI v1.1 introduced SYSTEM_RESET2 to allow both architectural resets
> where the semantics are described by the PSCI specification itself as
> well as vendor-specific resets. Currently only system warm reset
> semantics is defined as part of architectural resets by the specification.
> 
> This patch implements support for SYSTEM_RESET2 by making using of
> reboot_mode passed by the reboot infrastructure in the kernel.
> 
> Cc: Mark Rutland 
> Cc: Lorenzo Pieralisi 
> Signed-off-by: Sudeep Holla 
> ---
>  drivers/firmware/psci.c   | 21 +
>  include/uapi/linux/psci.h |  2 ++
>  2 files changed, 23 insertions(+)
> 
> Resending [1] based on the request. I hope to get some testing this time.
> Last time Xilinx asked multiple times but never got any review or testing
> https://lore.kernel.org/lkml/1525257003-8608-1-git-send-email-sudeep.ho...@arm.com/
> 
> diff --git a/drivers/firmware/psci.c b/drivers/firmware/psci.c
> index c80ec1d03274..91748725534e 100644
> --- a/drivers/firmware/psci.c
> +++ b/drivers/firmware/psci.c
> @@ -88,6 +88,7 @@ static u32 psci_function_id[PSCI_FN_MAX];
>   PSCI_1_0_EXT_POWER_STATE_TYPE_MASK)
> 
>  static u32 psci_cpu_suspend_feature;
> +static bool psci_system_reset2_supported;
> 
>  static inline bool psci_has_ext_power_state(void)
>  {
> @@ -253,6 +254,15 @@ static int get_set_conduit_method(struct device_node *np)
> 
>  static void psci_sys_reset(enum reboot_mode reboot_mode, const char *cmd)
>  {
> + if ((reboot_mode == REBOOT_WARM || reboot_mode == REBOOT_SOFT) &&
> + psci_system_reset2_supported)
> + /*
> +  * reset_type[31] = 0 (architectural)
> +  * reset_type[30:0] = 0 (SYSTEM_WARM_RESET)
> +  * cookie = 0 (ignored by the implementation)
> +  */
> + invoke_psci_fn(PSCI_FN_NATIVE(1_1, SYSTEM_RESET2), 0, 0, 0);

Since the comment and invocation span multiple lines, could we please
wrap them in braces?
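
i.e. with the hunk above becoming something like:

	if ((reboot_mode == REBOOT_WARM || reboot_mode == REBOOT_SOFT) &&
	    psci_system_reset2_supported) {
		/*
		 * reset_type[31] = 0 (architectural)
		 * reset_type[30:0] = 0 (SYSTEM_WARM_RESET)
		 * cookie = 0 (ignored by the implementation)
		 */
		invoke_psci_fn(PSCI_FN_NATIVE(1_1, SYSTEM_RESET2), 0, 0, 0);
	}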

Other than that, this looks good to me, so:

Acked-by: Mark Rutland 

... I assume that Aaro will give this some testing.

Thanks,
Mark.

> +
>   invoke_psci_fn(PSCI_0_2_FN_SYSTEM_RESET, 0, 0, 0);
>  }
> 
> @@ -451,6 +461,16 @@ static const struct platform_suspend_ops 
> psci_suspend_ops = {
>   .enter  = psci_system_suspend_enter,
>  };
> 
> +static void __init psci_init_system_reset2(void)
> +{
> + int ret;
> +
> + ret = psci_features(PSCI_FN_NATIVE(1_1, SYSTEM_RESET2));
> +
> + if (ret != PSCI_RET_NOT_SUPPORTED)
> + psci_system_reset2_supported = true;
> +}
> +
>  static void __init psci_init_system_suspend(void)
>  {
>   int ret;
> @@ -588,6 +608,7 @@ static int __init psci_probe(void)
>   psci_init_smccc();
>   psci_init_cpu_suspend();
>   psci_init_system_suspend();
> + psci_init_system_reset2();
>   }
> 
>   return 0;
> diff --git a/include/uapi/linux/psci.h b/include/uapi/linux/psci.h
> index b3bcabe380da..5b0ba0062541 100644
> --- a/include/uapi/linux/psci.h
> +++ b/include/uapi/linux/psci.h
> @@ -49,8 +49,10 @@
> 
>  #define PSCI_1_0_FN_PSCI_FEATURESPSCI_0_2_FN(10)
>  #define PSCI_1_0_FN_SYSTEM_SUSPEND   PSCI_0_2_FN(14)
> +#define PSCI_1_1_FN_SYSTEM_RESET2PSCI_0_2_FN(18)
> 
>  #define PSCI_1_0_FN64_SYSTEM_SUSPEND PSCI_0_2_FN64(14)
> +#define PSCI_1_1_FN64_SYSTEM_RESET2  PSCI_0_2_FN64(18)
> 
>  /* PSCI v0.2 power state encoding for CPU_SUSPEND function */
>  #define PSCI_0_2_POWER_STATE_ID_MASK 0xffff
> --
> 2.17.1
> 


Re: [RFC 3/6] objtool: arm64: Adapt the stack frame checks and the section analysis for the arm architecture

2019-04-09 Thread Mark Rutland
On Tue, Apr 09, 2019 at 06:12:04PM +0200, Peter Zijlstra wrote:
> 
> I'm just doing my initial read-through,.. however
> 
> On Tue, Apr 09, 2019 at 02:52:40PM +0100, Raphael Gault wrote:
> > +   if (!(sec->sh.sh_flags & SHF_EXECINSTR)
> > +   && (strcmp(sec->name, ".altinstr_replacement") || 
> > !IGNORE_SHF_EXEC_FLAG))
> > continue;
> 
> could you please not format code like that. Operators go at the end of
> the line, and continuation should match the indentation of the opening
> paren. So the above would look like:
> 
> > +   if (!(sec->sh.sh_flags & SHF_EXECINSTR) &&
> > +   (strcmp(sec->name, ".altinstr_replacement") || 
> > !IGNORE_SHF_EXEC_FLAG))
> > continue;
> 
> You appear to be doing that quit consistently, and it is against style.

Raphael, as a heads-up, ./scripts/checkpatch.pl can catch issues like
this. You can run it over a list of patches, so for a patch series you
can run:

 $ ./scripts/checkpatch.pl *.patch

... and hopefully most of the output will be reasonable.

Thanks,
Mark.


Re: WARN_ON_ONCE() hit at kernel/events/core.c:330

2019-04-09 Thread Mark Rutland
On Mon, Apr 08, 2019 at 11:50:31AM +0200, Peter Zijlstra wrote:
> On Mon, Apr 08, 2019 at 10:22:29AM +0200, Peter Zijlstra wrote:
> > On Mon, Apr 08, 2019 at 09:12:28AM +0200, Thomas-Mich Richter wrote:
> 
> > > very good news, your fix ran over the weekend without any hit!!!
> > > 
> > > Thanks very much for your help. Do you submit this patch to the kernel 
> > > mailing list?
> > 
> > Most excellent, let me go write a Changelog.
> 
> Hi Thomas, find below.
> 
> Sadly, while writing the Changelog I ended up with a 'completely'
> differet patch again, could I bother you to test this one too?
> 
> ---
> Subject: perf: Fix perf_event_disable_inatomic()
> From: Peter Zijlstra 
> Date: Thu, 4 Apr 2019 15:03:00 +0200
> 
> Thomas-Mich Richter reported he triggered a WARN from event_function_local()
> on his s390. The problem boils down to:
> 
>   CPU-A   CPU-B
> 
>   perf_event_overflow()
> perf_event_disable_inatomic()
>   @pending_disable = 1
>   irq_work_queue();
> 
>   sched-out
> event_sched_out()
>   @pending_disable = 0
> 
>   sched-in
>   perf_event_overflow()
> perf_event_disable_inatomic()
>   @pending_disable = 1;
>   irq_work_queue(); // FAILS
> 
>   irq_work_run()
> perf_pending_event()
>   if (@pending_disable)
> perf_event_disable_local(); // WHOOPS
> 
> The problem exists in generic, but s390 is particularly sensitive
> because it doesn't implement arch_irq_work_raise(), nor does it call
> irq_work_run() from it's PMU interrupt handler (nor would that be
> sufficient in this case, because s390 also generates
> perf_event_overflow() from pmu::stop). Add to that the fact that s390
> is a virtual architecture and (virtual) CPU-A can stall long enough
> for the above race to happen, even if it would self-IPI.
> 
> Adding an irq_work_syn() to event_sched_in() would work for all hardare
> PMUs that properly use irq_work_run() but fails for software PMUs.

Typo: s/syn/sync/

> 
> Instead encode the CPU number in @pending_disable, such that we can
> tell which CPU requested the disable. This then allows us to detect
> the above scenario and even redirect the IPI to make up for the failed
> queue.
> 
> Cc: Heiko Carstens 
> Cc: Kees Cook 
> Cc: Hendrik Brueckner 
> Cc: a...@redhat.com
> Cc: Martin Schwidefsky 
> Cc: Mark Rutland 
> Cc: Jiri Olsa 
> Reported-by: Thomas-Mich Richter 
> Signed-off-by: Peter Zijlstra (Intel) 

I can't think of a nicer way of handling this, so FWIW:

Acked-by: Mark Rutland 

Mark.

> ---
>  kernel/events/core.c |   52 
> ++-
>  1 file changed, 43 insertions(+), 9 deletions(-)
> 
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -2009,8 +2009,8 @@ event_sched_out(struct perf_event *event
>   event->pmu->del(event, 0);
>   event->oncpu = -1;
>  
> - if (event->pending_disable) {
> - event->pending_disable = 0;
> + if (READ_ONCE(event->pending_disable) >= 0) {
> + WRITE_ONCE(event->pending_disable, -1);
>   state = PERF_EVENT_STATE_OFF;
>   }
>   perf_event_set_state(event, state);
> @@ -2198,7 +2198,8 @@ EXPORT_SYMBOL_GPL(perf_event_disable);
>  
>  void perf_event_disable_inatomic(struct perf_event *event)
>  {
> - event->pending_disable = 1;
> + WRITE_ONCE(event->pending_disable, smp_processor_id());
> + /* can fail, see perf_pending_event_disable() */
> +	irq_work_queue(&event->pending);
>  }
>  
> @@ -5810,10 +5811,45 @@ void perf_event_wakeup(struct perf_event
>   }
>  }
>  
> +static void perf_pending_event_disable(struct perf_event *event)
> +{
> + int cpu = READ_ONCE(event->pending_disable);
> +
> + if (cpu < 0)
> + return;
> +
> + if (cpu == smp_processor_id()) {
> + WRITE_ONCE(event->pending_disable, -1);
> + perf_event_disable_local(event);
> + return;
> + }
> +
> + /*
> +  *  CPU-A   CPU-B
> +  *
> +  *  perf_event_disable_inatomic()
> +  *@pending_disable = CPU-A;
> +  *irq_work_queue();
> +  *
> +  *  sched-out
> +  *@pending_disable = -1;
> +  *
> +  *  sched-in
> +  *  perf_event_disable_inatomic()
> +  *@pending_disable 

Re: [PATCH 7/7] clocksource/arm_arch_timer: Use arch_timer_read_counter to access stable counters

2019-04-08 Thread Mark Rutland
On Mon, Apr 08, 2019 at 04:49:07PM +0100, Marc Zyngier wrote:
> Instead of always going via arch_counter_get_cntvct_stable to
> access the counter workaround, let's have arch_timer_read_counter
> to point to the right method.

Nit: s/to point/point/

> For that, we need to track whether any CPU in the system has a
> workaround for the counter. This is done by having an atomic
> variable tracking this.
> 
> Signed-off-by: Marc Zyngier 

Acked-by: Mark Rutland 

Mark.

> ---
>  arch/arm/include/asm/arch_timer.h| 14 ++--
>  arch/arm64/include/asm/arch_timer.h  | 16 --
>  drivers/clocksource/arm_arch_timer.c | 48 +---
>  3 files changed, 70 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/arm/include/asm/arch_timer.h 
> b/arch/arm/include/asm/arch_timer.h
> index 3f0a0191f763..4b66ecd6be99 100644
> --- a/arch/arm/include/asm/arch_timer.h
> +++ b/arch/arm/include/asm/arch_timer.h
> @@ -83,7 +83,7 @@ static inline u32 arch_timer_get_cntfrq(void)
>   return val;
>  }
>  
> -static inline u64 arch_counter_get_cntpct(void)
> +static inline u64 __arch_counter_get_cntpct(void)
>  {
>   u64 cval;
>  
> @@ -92,7 +92,12 @@ static inline u64 arch_counter_get_cntpct(void)
>   return cval;
>  }
>  
> -static inline u64 arch_counter_get_cntvct(void)
> +static inline u64 __arch_counter_get_cntpct_stable(void)
> +{
> + return __arch_counter_get_cntpct();
> +}
> +
> +static inline u64 __arch_counter_get_cntvct(void)
>  {
>   u64 cval;
>  
> @@ -101,6 +106,11 @@ static inline u64 arch_counter_get_cntvct(void)
>   return cval;
>  }
>  
> +static inline u64 __arch_counter_get_cntvct_stable(void)
> +{
> + return __arch_counter_get_cntvct();
> +}
> +
>  static inline u32 arch_timer_get_cntkctl(void)
>  {
>   u32 cntkctl;
> diff --git a/arch/arm64/include/asm/arch_timer.h 
> b/arch/arm64/include/asm/arch_timer.h
> index 5502ea049b63..48b2100f4aaa 100644
> --- a/arch/arm64/include/asm/arch_timer.h
> +++ b/arch/arm64/include/asm/arch_timer.h
> @@ -174,18 +174,30 @@ static inline void arch_timer_set_cntkctl(u32 cntkctl)
>   isb();
>  }
>  
> -static inline u64 arch_counter_get_cntpct(void)
> +static inline u64 __arch_counter_get_cntpct_stable(void)
>  {
>   isb();
>   return arch_timer_reg_read_stable(cntpct_el0);
>  }
>  
> -static inline u64 arch_counter_get_cntvct(void)
> +static inline u64 __arch_counter_get_cntpct(void)
> +{
> + isb();
> + return read_sysreg(cntpct_el0);
> +}
> +
> +static inline u64 __arch_counter_get_cntvct_stable(void)
>  {
>   isb();
>   return arch_timer_reg_read_stable(cntvct_el0);
>  }
>  
> +static inline u64 __arch_counter_get_cntvct(void)
> +{
> + isb();
> + return read_sysreg(cntvct_el0);
> +}
> +
>  static inline int arch_timer_arch_init(void)
>  {
>   return 0;
> diff --git a/drivers/clocksource/arm_arch_timer.c 
> b/drivers/clocksource/arm_arch_timer.c
> index da487fbfada3..5f467868 100644
> --- a/drivers/clocksource/arm_arch_timer.c
> +++ b/drivers/clocksource/arm_arch_timer.c
> @@ -152,6 +152,26 @@ u32 arch_timer_reg_read(int access, enum arch_timer_reg 
> reg,
>   return val;
>  }
>  
> +static u64 arch_counter_get_cntpct_stable(void)
> +{
> + return __arch_counter_get_cntpct_stable();
> +}
> +
> +static u64 arch_counter_get_cntpct(void)
> +{
> + return __arch_counter_get_cntpct();
> +}
> +
> +static u64 arch_counter_get_cntvct_stable(void)
> +{
> + return __arch_counter_get_cntvct_stable();
> +}
> +
> +static u64 arch_counter_get_cntvct(void)
> +{
> + return __arch_counter_get_cntvct();
> +}
> +
>  /*
>   * Default to cp15 based access because arm64 uses this function for
>   * sched_clock() before DT is probed and the cp15 method is guaranteed
> @@ -372,6 +392,7 @@ static u32 notrace sun50i_a64_read_cntv_tval_el0(void)
>  DEFINE_PER_CPU(const struct arch_timer_erratum_workaround *, 
> timer_unstable_counter_workaround);
>  EXPORT_SYMBOL_GPL(timer_unstable_counter_workaround);
>  
> +static atomic_t timer_unstable_counter_workaround_in_use = ATOMIC_INIT(0);
>  
>  static void erratum_set_next_event_tval_generic(const int access, unsigned 
> long evt,
>   struct clock_event_device *clk)
> @@ -550,6 +571,9 @@ void arch_timer_enable_workaround(const struct 
> arch_timer_erratum_workaround *wa
>   per_cpu(timer_unstable_counter_workaround, i) = wa;
>   }
>  
> + if (wa->read_cntvct_el0 || wa->read_cntpct_el0)
> + atomic_set(_unstable_counter_

Re: [PATCH 6/7] clocksource/arm_arch_timer: Remove use of workaround static key

2019-04-08 Thread Mark Rutland
On Mon, Apr 08, 2019 at 04:49:06PM +0100, Marc Zyngier wrote:
> The use of a static key in a hotplug path has proved to be a real
> nightmare, and makes it impossible to have scream-free lockdep
> kernel.
> 
> Let's remove the static key altogether, and focus on something saner.
> 
> Signed-off-by: Marc Zyngier 

Acked-by: Mark Rutland 

Mark.

> ---
>  arch/arm64/include/asm/arch_timer.h  |  4 
>  drivers/clocksource/arm_arch_timer.c | 25 +++--
>  2 files changed, 7 insertions(+), 22 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/arch_timer.h 
> b/arch/arm64/include/asm/arch_timer.h
> index 4a06d46def7e..5502ea049b63 100644
> --- a/arch/arm64/include/asm/arch_timer.h
> +++ b/arch/arm64/include/asm/arch_timer.h
> @@ -45,13 +45,9 @@
>   (__wa && __wa->h) ? __wa->h : arch_timer_##h;   \
>   })
>  
> -extern struct static_key_false arch_timer_read_ool_enabled;
> -#define needs_unstable_timer_counter_workaround() \
> - static_branch_unlikely(&arch_timer_read_ool_enabled)
>  #else
>  #define has_erratum_handler(h)  false
>  #define erratum_handler(h)  (arch_timer_##h)
> -#define needs_unstable_timer_counter_workaround()  false
>  #endif
>  
>  enum arch_timer_erratum_match_type {
> diff --git a/drivers/clocksource/arm_arch_timer.c 
> b/drivers/clocksource/arm_arch_timer.c
> index c7f5b66d893c..da487fbfada3 100644
> --- a/drivers/clocksource/arm_arch_timer.c
> +++ b/drivers/clocksource/arm_arch_timer.c
> @@ -372,8 +372,6 @@ static u32 notrace sun50i_a64_read_cntv_tval_el0(void)
>  DEFINE_PER_CPU(const struct arch_timer_erratum_workaround *, 
> timer_unstable_counter_workaround);
>  EXPORT_SYMBOL_GPL(timer_unstable_counter_workaround);
>  
> -DEFINE_STATIC_KEY_FALSE(arch_timer_read_ool_enabled);
> -EXPORT_SYMBOL_GPL(arch_timer_read_ool_enabled);
>  
>  static void erratum_set_next_event_tval_generic(const int access, unsigned 
> long evt,
>   struct clock_event_device *clk)
> @@ -552,12 +550,6 @@ void arch_timer_enable_workaround(const struct 
> arch_timer_erratum_workaround *wa
>   per_cpu(timer_unstable_counter_workaround, i) = wa;
>   }
>  
> - /*
> -  * Use the locked version, as we're called from the CPU
> -  * hotplug framework. Otherwise, we end-up in deadlock-land.
> -  */
> - static_branch_enable_cpuslocked(&arch_timer_read_ool_enabled);
> -
>   /*
>* Don't use the vdso fastpath if errata require using the
>* out-of-line counter accessor. We may change our mind pretty
> @@ -573,7 +565,7 @@ void arch_timer_enable_workaround(const struct 
> arch_timer_erratum_workaround *wa
>  static void arch_timer_check_ool_workaround(enum 
> arch_timer_erratum_match_type type,
>   void *arg)
>  {
> - const struct arch_timer_erratum_workaround *wa;
> + const struct arch_timer_erratum_workaround *wa, *__wa;
>   ate_match_fn_t match_fn = NULL;
>   bool local = false;
>  
> @@ -597,16 +589,13 @@ static void arch_timer_check_ool_workaround(enum 
> arch_timer_erratum_match_type t
>   if (!wa)
>   return;
>  
> - if (needs_unstable_timer_counter_workaround()) {
> - const struct arch_timer_erratum_workaround *__wa;
> - __wa = __this_cpu_read(timer_unstable_counter_workaround);
> - if (__wa && wa != __wa)
> - pr_warn("Can't enable workaround for %s (clashes with 
> %s\n)",
> - wa->desc, __wa->desc);
> + __wa = __this_cpu_read(timer_unstable_counter_workaround);
> + if (__wa && wa != __wa)
> + pr_warn("Can't enable workaround for %s (clashes with %s\n)",
> + wa->desc, __wa->desc);
>  
> - if (__wa)
> - return;
> - }
> + if (__wa)
> + return;
>  
>   arch_timer_enable_workaround(wa, local);
>   pr_info("Enabling %s workaround for %s\n",
> -- 
> 2.20.1
> 


Re: [PATCH 5/7] clocksource/arm_arch_timer: Drop use of static key in arch_timer_reg_read_stable

2019-04-08 Thread Mark Rutland
On Mon, Apr 08, 2019 at 04:49:05PM +0100, Marc Zyngier wrote:
> Let's start with the removal of the arch_timer_read_ool_enabled
> static key in arch_timer_reg_read_stable. IT is not a fast path,

Nit: s/IT/It/

> and we can simplify things a bit.
> 
> Signed-off-by: Marc Zyngier 

Acked-by: Mark Rutland 

Mark.

> ---
>  arch/arm64/include/asm/arch_timer.h | 42 +++--
>  1 file changed, 28 insertions(+), 14 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/arch_timer.h 
> b/arch/arm64/include/asm/arch_timer.h
> index c3762ffcc933..4a06d46def7e 100644
> --- a/arch/arm64/include/asm/arch_timer.h
> +++ b/arch/arm64/include/asm/arch_timer.h
> @@ -77,23 +77,37 @@ struct arch_timer_erratum_workaround {
>  DECLARE_PER_CPU(const struct arch_timer_erratum_workaround *,
>   timer_unstable_counter_workaround);
>  
> +/* inline sysreg accessors that make erratum_handler() work */
> +static inline notrace u32 arch_timer_read_cntp_tval_el0(void)
> +{
> + return read_sysreg(cntp_tval_el0);
> +}
> +
> +static inline notrace u32 arch_timer_read_cntv_tval_el0(void)
> +{
> + return read_sysreg(cntv_tval_el0);
> +}
> +
> +static inline notrace u64 arch_timer_read_cntpct_el0(void)
> +{
> + return read_sysreg(cntpct_el0);
> +}
> +
> +static inline notrace u64 arch_timer_read_cntvct_el0(void)
> +{
> + return read_sysreg(cntvct_el0);
> +}
> +
>  #define arch_timer_reg_read_stable(reg)  
> \
> -({   \
> - u64 _val;   \
> - if (needs_unstable_timer_counter_workaround()) {\
> - const struct arch_timer_erratum_workaround *wa; \
> + ({  \
> + u64 _val;   \
> + \
>   preempt_disable_notrace();  \
> - wa = __this_cpu_read(timer_unstable_counter_workaround); \
> - if (wa && wa->read_##reg)   \
> - _val = wa->read_##reg();\
> - else\
> - _val = read_sysreg(reg);\
> + _val = erratum_handler(read_ ## reg)(); \
>   preempt_enable_notrace();   \
> - } else {\
> - _val = read_sysreg(reg);\
> - }   \
> - _val;   \
> -})
> + \
> + _val;   \
> + })
>  
>  /*
>   * These register accessors are marked inline so the compiler can
> -- 
> 2.20.1
> 


Re: [PATCH 4/7] clocksource/arm_arch_timer: Direcly assign set_next_event workaround

2019-04-08 Thread Mark Rutland
On Mon, Apr 08, 2019 at 04:49:04PM +0100, Marc Zyngier wrote:
> When a given timer is affected by an erratum and requires an
> alternative implementation of set_next_event, we do a rather
> complicated dance to detect and call the workaround on each
> set_next_event call.
> 
> This is clearly idiotic, as we can perfectly detect whether
> this CPU requires a workaround while setting up the clock event
> device.
> 
> This only requires the CPU-specific detection to be done a bit
> earlier, and we can then safely override the set_next_event pointer
> if we have a workaround associated to that CPU.
> 
> Signed-off-by: Marc Zyngier 

Acked-by: Mark Rutland 

Mark.

> ---
>  arch/arm/include/asm/arch_timer.h|  4 +++
>  arch/arm64/include/asm/arch_timer.h  | 16 ++
>  drivers/clocksource/arm_arch_timer.c | 46 +---
>  3 files changed, 28 insertions(+), 38 deletions(-)
> 
> diff --git a/arch/arm/include/asm/arch_timer.h 
> b/arch/arm/include/asm/arch_timer.h
> index 0a8d7bba2cb0..3f0a0191f763 100644
> --- a/arch/arm/include/asm/arch_timer.h
> +++ b/arch/arm/include/asm/arch_timer.h
> @@ -11,6 +11,10 @@
>  #include 
>  
>  #ifdef CONFIG_ARM_ARCH_TIMER
> +/* 32bit ARM doesn't know anything about timer errata... */
> +#define has_erratum_handler(h)   (false)
> +#define erratum_handler(h)   (arch_timer_##h)
> +
>  int arch_timer_arch_init(void);
>  
>  /*
> diff --git a/arch/arm64/include/asm/arch_timer.h 
> b/arch/arm64/include/asm/arch_timer.h
> index f2a234d6516c..c3762ffcc933 100644
> --- a/arch/arm64/include/asm/arch_timer.h
> +++ b/arch/arm64/include/asm/arch_timer.h
> @@ -31,10 +31,26 @@
>  #include 
>  
>  #if IS_ENABLED(CONFIG_ARM_ARCH_TIMER_OOL_WORKAROUND)
> +#define has_erratum_handler(h)   
> \
> + ({  \
> + const struct arch_timer_erratum_workaround *__wa;   \
> + __wa = __this_cpu_read(timer_unstable_counter_workaround); \
> + (__wa && __wa->h);  \
> + })
> +
> +#define erratum_handler(h)   \
> + ({  \
> + const struct arch_timer_erratum_workaround *__wa;   \
> + __wa = __this_cpu_read(timer_unstable_counter_workaround); \
> + (__wa && __wa->h) ? __wa->h : arch_timer_##h;   \
> + })
> +
>  extern struct static_key_false arch_timer_read_ool_enabled;
>  #define needs_unstable_timer_counter_workaround() \
> + static_branch_unlikely(&arch_timer_read_ool_enabled)
>  #else
> +#define has_erratum_handler(h)  false
> +#define erratum_handler(h)  (arch_timer_##h)
>  #define needs_unstable_timer_counter_workaround()  false
>  #endif
>  
> diff --git a/drivers/clocksource/arm_arch_timer.c 
> b/drivers/clocksource/arm_arch_timer.c
> index aa4ec53281ce..c7f5b66d893c 100644
> --- a/drivers/clocksource/arm_arch_timer.c
> +++ b/drivers/clocksource/arm_arch_timer.c
> @@ -613,36 +613,12 @@ static void arch_timer_check_ool_workaround(enum 
> arch_timer_erratum_match_type t
>   local ? "local" : "global", wa->desc);
>  }
>  
> -#define erratum_handler(fn, r, ...)  \
> -({   \
> - bool __val; \
> - if (needs_unstable_timer_counter_workaround()) {\
> - const struct arch_timer_erratum_workaround *__wa;   \
> - __wa = __this_cpu_read(timer_unstable_counter_workaround); \
> - if (__wa && __wa->fn) { \
> - r = __wa->fn(__VA_ARGS__);  \
> - __val = true;   \
> - } else {\
> - __val = false;  \
> - }   \
> - } else {\
> - __val = false;  \
> - }   \
> - __val;  \
> -})
> -
>  static bool arch_timer_this_cpu_has_cntvct_wa(void)
>  {
> - const struct arch_timer_erratum_workaround *wa;
> -

Re: [PATCH 3/7] arm64: Use arch_timer_read_counter instead of arch_counter_get_cntvct

2019-04-08 Thread Mark Rutland
On Mon, Apr 08, 2019 at 04:49:03PM +0100, Marc Zyngier wrote:
> Only arch_timer_read_counter will guarantee that workarounds are
> applied. So let's use this one instead of arch_counter_get_cntvct.
> 
> Signed-off-by: Marc Zyngier 

Acked-by: Mark Rutland 

Mark.

> ---
>  arch/arm64/kernel/traps.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
> index 8ad119c3f665..6190a60388cf 100644
> --- a/arch/arm64/kernel/traps.c
> +++ b/arch/arm64/kernel/traps.c
> @@ -493,7 +493,7 @@ static void cntvct_read_handler(unsigned int esr, struct 
> pt_regs *regs)
>  {
>   int rt = ESR_ELx_SYS64_ISS_RT(esr);
>  
> - pt_regs_write_reg(regs, rt, arch_counter_get_cntvct());
> + pt_regs_write_reg(regs, rt, arch_timer_read_counter());
>   arm64_skip_faulting_instruction(regs, AARCH64_INSN_SIZE);
>  }
>  
> @@ -665,7 +665,7 @@ static void compat_cntvct_read_handler(unsigned int esr, 
> struct pt_regs *regs)
>  {
>   int rt = (esr & ESR_ELx_CP15_64_ISS_RT_MASK) >> 
> ESR_ELx_CP15_64_ISS_RT_SHIFT;
>   int rt2 = (esr & ESR_ELx_CP15_64_ISS_RT2_MASK) >> 
> ESR_ELx_CP15_64_ISS_RT2_SHIFT;
> - u64 val = arch_counter_get_cntvct();
> + u64 val = arch_timer_read_counter();
>  
>   pt_regs_write_reg(regs, rt, lower_32_bits(val));
>   pt_regs_write_reg(regs, rt2, upper_32_bits(val));
> -- 
> 2.20.1
> 


Re: [PATCH 2/7] watchdog/sbsa: Use arch_timer_read_counter instead of arch_counter_get_cntvct

2019-04-08 Thread Mark Rutland
On Mon, Apr 08, 2019 at 04:49:02PM +0100, Marc Zyngier wrote:
> Only arch_timer_read_counter will guarantee that workarounds are
> applied. So let's use this one instead of arch_counter_get_cntvct.
> 
> Signed-off-by: Marc Zyngier 

Acked-by: Mark Rutland 

Mark.

> ---
>  drivers/watchdog/sbsa_gwdt.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/watchdog/sbsa_gwdt.c b/drivers/watchdog/sbsa_gwdt.c
> index e8bd9887c566..e221e47396ab 100644
> --- a/drivers/watchdog/sbsa_gwdt.c
> +++ b/drivers/watchdog/sbsa_gwdt.c
> @@ -161,7 +161,7 @@ static unsigned int sbsa_gwdt_get_timeleft(struct 
> watchdog_device *wdd)
>   timeleft += readl(gwdt->control_base + SBSA_GWDT_WOR);
>  
>   timeleft += lo_hi_readq(gwdt->control_base + SBSA_GWDT_WCV) -
> - arch_counter_get_cntvct();
> + arch_timer_read_counter();
>  
>   do_div(timeleft, gwdt->clk);
>  
> -- 
> 2.20.1
> 


Re: [PATCH 1/7] ARM: vdso: Remove dependency with the arch_timer driver internals

2019-04-08 Thread Mark Rutland
On Mon, Apr 08, 2019 at 04:49:01PM +0100, Marc Zyngier wrote:
> THe VDSO code uses the kernel helper that was originally designed

Nit: s/THe/The/

> to abstract the access between 32 and 64bit systems. It worked so
> far because this function is declared as 'inline'.
> 
> As we're about to revamp that part of the code, the VDSO would
> break. Let's fix it by doing what should have been done from
> the start, a proper system register access.
> 
> Signed-off-by: Marc Zyngier 
> ---
>  arch/arm/include/asm/cp15.h   | 2 ++
>  arch/arm/vdso/vgettimeofday.c | 5 +++--
>  2 files changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm/include/asm/cp15.h b/arch/arm/include/asm/cp15.h
> index 07e27f212dc7..d2453e2d3f1f 100644
> --- a/arch/arm/include/asm/cp15.h
> +++ b/arch/arm/include/asm/cp15.h
> @@ -68,6 +68,8 @@
>  #define BPIALL   __ACCESS_CP15(c7, 0, c5, 6)
>  #define ICIALLU  __ACCESS_CP15(c7, 0, c5, 0)
>  
> +#define CNTVCT   __ACCESS_CP15_64(1, c14)

This encoding looks right to me per ARM DDI 0406C.d section B4.1.34. The
rest also looks sound, so with that typo fixed:

Reviewed-by: Mark Rutland 

Mark.

> +
>  extern unsigned long cr_alignment;   /* defined in entry-armv.S */
>  
>  static inline unsigned long get_cr(void)
> diff --git a/arch/arm/vdso/vgettimeofday.c b/arch/arm/vdso/vgettimeofday.c
> index a9dd619c6c29..7bdbf5d5c47d 100644
> --- a/arch/arm/vdso/vgettimeofday.c
> +++ b/arch/arm/vdso/vgettimeofday.c
> @@ -18,9 +18,9 @@
>  #include 
>  #include 
>  #include 
> -#include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -123,7 +123,8 @@ static notrace u64 get_ns(struct vdso_data *vdata)
>   u64 cycle_now;
>   u64 nsec;
>  
> - cycle_now = arch_counter_get_cntvct();
> + isb();
> + cycle_now = read_sysreg(CNTVCT);
>  
>   cycle_delta = (cycle_now - vdata->cs_cycle_last) & vdata->cs_mask;
>  
> -- 
> 2.20.1
> 


Re: [PATCH v8 0/5] arm64: ftrace with regs

2019-04-08 Thread Mark Rutland
On Mon, Mar 11, 2019 at 12:49:46PM +0100, Torsten Duwe wrote:
> On Wed, Feb 13, 2019 at 11:11:04AM +, Julien Thierry wrote:
> > Hi Torsten,
> > 
> > On 08/02/2019 15:08, Torsten Duwe wrote:
> > > Patch series v8, as discussed.
> > > The whole series applies cleanly on 5.0-rc5
> 
> So what's the status now? Besides debatable minor style
> issues there were no more objections to v8. Would this
> go through the ARM repo or via the ftrace repo?

Sorry again for the delay on this. I'm now back in the office and in
front of a computer daily, so I can spend a bit more time on this.

Regardless of anything else, I think that we should queue the first
three patches now. I've poked the relevant maintainers for their acks so
that those can be taken via the arm64 tree.

I'm happy to do the trivial cleanups on the last couple of patches (e.g.
s/lr/x30), and I'm actively looking at the API rework I requested.

Thanks,
Mark.


Re: [PATCH v8 3/5] arm64: replace -pg with CC_FLAGS_FTRACE in mm/kasan Makefile

2019-04-08 Thread Mark Rutland
On Fri, Feb 08, 2019 at 04:10:14PM +0100, Torsten Duwe wrote:
>   In preparation for arm64 supporting ftrace built on other compiler
>   options, let's have makefiles remove the $(CC_FLAGS_FTRACE)
>   flags, whatever these may be, rather than assuming '-pg'.
> 
>   There should be no functional change as a result of this patch.
> 
> Signed-off-by: Torsten Duwe 

Andrey, would you be happy if we were to take this via the arm64 tree?

Assuming so, could we please have your ack?

Thanks,
Mark.

> 
> ---
>  mm/kasan/Makefile |4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> --- a/mm/kasan/Makefile
> +++ b/mm/kasan/Makefile
> @@ -5,8 +5,8 @@ UBSAN_SANITIZE_generic.o := n
>  UBSAN_SANITIZE_tags.o := n
>  KCOV_INSTRUMENT := n
>  
> -CFLAGS_REMOVE_common.o = -pg
> -CFLAGS_REMOVE_generic.o = -pg
> +CFLAGS_REMOVE_common.o = $(CC_FLAGS_FTRACE)
> +CFLAGS_REMOVE_generic.o = $(CC_FLAGS_FTRACE)
>  # Function splitter causes unnecessary splits in __asan_load1/__asan_store1
>  # see: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63533
>  


Re: [PATCH] arm64: dts: qcom: Add Lenovo Miix 630

2019-04-08 Thread Mark Rutland
On Tue, Mar 26, 2019 at 11:55:25AM -0700, Jeffrey Hugo wrote:
> This adds the initial DT for the Lenovo Miix 630 laptop.  Supported
> functionality includes USB (host), microSD-card, keyboard, and trackpad.

I was under the impression that the Miix 630 came with Windows, and the
firmware therefore provided (at least a simulacrum of) ACPI. Why do we
need a DT?

[...]

> +++ b/arch/arm64/boot/dts/qcom/msm8998-cls.dtsi
> @@ -0,0 +1,278 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/* Copyright (c) 2019, Jeffrey Hugo. All rights reserved. */
> +
> +/*
> + * Common include for MSM8998 clamshell (cls) devices, ie the Lenovo Miix 
> 630,
> + * Asus NovaGo TP370QL, and HP Envy x2.  All three devices are basically the
> + * same, with differences in peripherals.
> + */

I think it would be better to consistently use "clamshell" (e.g. rename
this to msm8998-clamshell.dtsi), and mention the "cls" terminology in
the comment if that's necessary.

[...]

> + keyboard@3a {
> + /* QTEC0001 is the ACPI HID, which could be used for quirks */
> + compatible = "QTEC0001", "hid-over-i2c";

Please give this a real compatible string rather than forcing the ACPI
HID in.

Thanks,
Mark.


Re: [PATCH] drivers: firmware: psci: add support for warm reset

2019-04-02 Thread Mark Rutland
On Mon, Apr 01, 2019 at 09:14:43PM +0300, Aaro Koskinen wrote:
> From: Aaro Koskinen 
> 
> Add support for warm reset using SYSTEM_RESET2 introduced in PSCI
> 1.1 specification.

For reference, how do you intend to use this?

> 
> Signed-off-by: Aaro Koskinen 
> ---
>  drivers/firmware/psci.c   | 25 +
>  include/uapi/linux/psci.h |  3 +++
>  2 files changed, 28 insertions(+)
> 
> diff --git a/drivers/firmware/psci.c b/drivers/firmware/psci.c
> index c80ec1d03274..1d95b5f6d403 100644
> --- a/drivers/firmware/psci.c
> +++ b/drivers/firmware/psci.c
> @@ -73,6 +73,7 @@ enum psci_function {
>   PSCI_FN_CPU_ON,
>   PSCI_FN_CPU_OFF,
>   PSCI_FN_MIGRATE,
> + PSCI_FN_SYSTEM_RESET2,
>   PSCI_FN_MAX,
>  };
>  
> @@ -256,6 +257,14 @@ static void psci_sys_reset(enum reboot_mode reboot_mode, 
> const char *cmd)
>   invoke_psci_fn(PSCI_0_2_FN_SYSTEM_RESET, 0, 0, 0);
>  }
>  
> +static void psci_sys_reset2(enum reboot_mode reboot_mode, const char *cmd)
> +{
> + if (reboot_mode == REBOOT_WARM)
> + invoke_psci_fn(psci_function_id[PSCI_FN_SYSTEM_RESET2], 0, 0, 
> 0);
> + else
> + psci_sys_reset(reboot_mode, cmd);
> +}

What happens when RESET2 isn't available, but the user requests a REBOOT_WARM
reset? It looks like we'd invoke a bogus SMC.

Could we please add a preprocessor definition for SYSTEM_RESET_WARM so that
it's clear that we're passing a parameter here?
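
Something along these lines is what I have in mind (sketch only; the define
name is just a strawman, and I'm assuming the architectural warm reset type
of 0 from the spec):

#define PSCI_1_1_RESET_TYPE_SYSTEM_WARM_RESET	0

static void psci_sys_reset2(enum reboot_mode reboot_mode, const char *cmd)
{
	/* Fall back to SYSTEM_RESET if RESET2 wasn't detected at probe time. */
	if (reboot_mode == REBOOT_WARM &&
	    psci_function_id[PSCI_FN_SYSTEM_RESET2])
		invoke_psci_fn(psci_function_id[PSCI_FN_SYSTEM_RESET2],
			       PSCI_1_1_RESET_TYPE_SYSTEM_WARM_RESET, 0, 0);
	else
		psci_sys_reset(reboot_mode, cmd);
}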

> +
>  static void psci_sys_poweroff(void)
>  {
>   invoke_psci_fn(PSCI_0_2_FN_SYSTEM_OFF, 0, 0, 0);
> @@ -564,6 +573,20 @@ static void __init psci_0_2_set_functions(void)
>   pm_power_off = psci_sys_poweroff;
>  }
>  
> +static void __init psci_1_1_set_functions(void)
> +{
> + int sys_reset2;
> + int feature;
> +
> + sys_reset2 = PSCI_FN_NATIVE(1_1, SYSTEM_RESET2);
> + feature = psci_features(sys_reset2);
> +
> + if (feature != PSCI_RET_NOT_SUPPORTED) {
> + psci_function_id[PSCI_FN_SYSTEM_RESET2] = sys_reset2;
> + arm_pm_restart = psci_sys_reset2;
> + }
> +}
> +
>  /*
>   * Probe function for PSCI firmware versions >= 0.2
>   */
> @@ -588,6 +611,8 @@ static int __init psci_probe(void)
>   psci_init_smccc();
>   psci_init_cpu_suspend();
>   psci_init_system_suspend();
> + if (PSCI_VERSION_MINOR(ver) >= 1)
> + psci_1_1_set_functions();
>   }

If we're going to add more things to this chain, can we please refactor this as:

if (ver < PSCI_VERSION(1, 0))
return;

/* PSCI 1.0 setup here */

if (ver < PSCI_VERSION(1, 1))
return;

/* PSCI 1.1 setup here */

... as that saves on indentation and is easier to extend in future.

Thanks,
Mark


Re: [PATCH v7 2/3] arm64: implement ftrace with regs

2019-04-02 Thread Mark Rutland
Hi Torsten,

Sorry for the long delay prior to this reply.

On Fri, Jan 18, 2019 at 05:39:08PM +0100, Torsten Duwe wrote:
> Once gcc8 adds 2 NOPs at the beginning of each function, replace the
> first NOP thus generated with a quick LR saver (move it to scratch reg
> x9), so the 2nd replacement insn, the call to ftrace, does not clobber
> the value. Ftrace will then generate the standard stack frames.
> 
> Note that patchable-function-entry in GCC disables IPA-RA, which means
> ABI register calling conventions are obeyed *and* scratch registers
> such as x9 are available.
> 
> Introduce and handle an ftrace_regs_trampoline for module PLTs, right
> after ftrace_trampoline, and double the size of this special section.
> 
> Signed-off-by: Torsten Duwe 
> 
> ---
> 
> Mark, if you see your ftrace entry macro code being represented correctly
> here, please add your sign-off, As I've initially copied it from your mail.
> 
> ---
>  arch/arm64/include/asm/ftrace.h  |   17 -
>  arch/arm64/include/asm/module.h  |3 
>  arch/arm64/kernel/entry-ftrace.S |  125 
> +--
>  arch/arm64/kernel/ftrace.c   |  114 ++-
>  arch/arm64/kernel/module-plts.c  |3 
>  arch/arm64/kernel/module.c   |2 
>  6 files changed, 227 insertions(+), 37 deletions(-)
> --- a/arch/arm64/include/asm/ftrace.h
> +++ b/arch/arm64/include/asm/ftrace.h
> @@ -14,9 +14,24 @@
>  #include 
>  
>  #define HAVE_FUNCTION_GRAPH_FP_TEST
> -#define MCOUNT_ADDR  ((unsigned long)_mcount)
>  #define MCOUNT_INSN_SIZE AARCH64_INSN_SIZE
>  
> +/*
> + * DYNAMIC_FTRACE_WITH_REGS is implemented by adding 2 NOPs at the beginning
> + * of each function, with the second NOP actually calling ftrace. In contrary
> + * to a classic _mcount call, the call instruction to be modified is thus
> + * the second one, and not the only one.
> + */
> +#ifdef CONFIG_DYNAMIC_FTRACE_WITH_REGS
> +#define ARCH_SUPPORTS_FTRACE_OPS 1
> +#define REC_IP_BRANCH_OFFSET AARCH64_INSN_SIZE
> +/* All we need is some magic value. Simply use "_mCount:" */
> +#define MCOUNT_ADDR  (0x5f6d436f756e743a)

I'm really not keen on having a magic constant, and I'd really like to
see MCOUNT_ADDR disappear entirely when the compiler doesn't generate an
mcount call. I'm concerned that this is confusing and fragile.

I think that it would be better to make the core ftrace API agnostic of
mcount, and to teach it that there's a one-time initialization step for
callsites (which is not necessarily patching a call with a NOP).

We currently have:

ftrace_make_nop(mod, rec, addr)
ftrace_make_call(rec, addr)
ftrace_modify_call(rec, old_addr, new_addr)

... whereas we could have:

ftrace_call_init(mod, rec)
ftrace_call_enable(rec, addr)
ftrace_call_disable(rec, addr)
ftrace_call_modify(rec, old_addr, new_addr)

... so we wouldn't need to special-case anything for initialization (and
wouldn't need MCOUNT_ADDR at all), and it would be clearer as to what was
happening at each stage.
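
To be concrete, here's a rough sketch of what the arm64 init hook could look
like under such a scheme (the hook names above are only a proposal, nothing
here exists today, and this glosses over validation of the first NOP and
module handling):

static int ftrace_call_init(struct module *mod, struct dyn_ftrace *rec)
{
	unsigned long pc = rec->ip + REC_IP_BRANCH_OFFSET;
	u32 nop = aarch64_insn_gen_nop();
	u32 insn;

	/*
	 * With -fpatchable-function-entry there is no mcount call to remove;
	 * both patchable instructions start out as NOPs, so the init step
	 * only has to sanity-check the callsite rather than patch anything.
	 */
	if (aarch64_insn_read((void *)pc, &insn))
		return -EFAULT;

	return insn == nop ? 0 : -EINVAL;
}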

> +#else
> +#define REC_IP_BRANCH_OFFSET 0
> +#define MCOUNT_ADDR  ((unsigned long)_mcount)
> +#endif
> +
>  #ifndef __ASSEMBLY__
>  #include 
>  
> --- a/arch/arm64/kernel/entry-ftrace.S
> +++ b/arch/arm64/kernel/entry-ftrace.S
> @@ -10,6 +10,7 @@
>   */
>  
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -124,6 +125,7 @@ EXPORT_SYMBOL(_mcount)
>  NOKPROBE(_mcount)
>  
>  #else /* CONFIG_DYNAMIC_FTRACE */
> +#ifndef CONFIG_DYNAMIC_FTRACE_WITH_REGS
>  /*
>   * _mcount() is used to build the kernel with -pg option, but all the branch
>   * instructions to _mcount() are replaced to NOP initially at kernel start 
> up,
> @@ -163,11 +165,6 @@ GLOBAL(ftrace_graph_call)// ftrace_gra
>  
>   mcount_exit
>  ENDPROC(ftrace_caller)
> -#endif /* CONFIG_DYNAMIC_FTRACE */
> -
> -ENTRY(ftrace_stub)
> - ret
> -ENDPROC(ftrace_stub)
>  
>  #ifdef CONFIG_FUNCTION_GRAPH_TRACER
>  /*
> @@ -187,7 +184,125 @@ ENTRY(ftrace_graph_caller)
>  
>   mcount_exit
>  ENDPROC(ftrace_graph_caller)
> +#endif /* CONFIG_FUNCTION_GRAPH_TRACER */
> +
> +#else /* CONFIG_DYNAMIC_FTRACE_WITH_REGS */
> +
> + .macro  ftrace_regs_entry, allregs=0
> + /* make room for pt_regs, plus a callee frame */
> + sub sp, sp, #(S_FRAME_SIZE + 16)
> +
> + /* save function arguments */
> + stp x0, x1, [sp, #S_X0]
> + stp x2, x3, [sp, #S_X2]
> + stp x4, x5, [sp, #S_X4]
> + stp x6, x7, [sp, #S_X6]
> + stp x8, x9, [sp, #S_X8]
>  
> + .if \allregs == 1
> + stp x10, x11, [sp, #S_X10]
> + stp x12, x13, [sp, #S_X12]
> + stp x14, x15, [sp, #S_X14]
> + stp x16, x17, [sp, #S_X16]
> + stp x18, x19, [sp, #S_X18]
> + stp x20, x21, [sp, #S_X20]
> + stp x22, x23, [sp, #S_X22]
> + stp x24, x25, [sp, #S_X24]
> + stp x26, x27, [sp, #S_X26]
> + .endif
> +
> + /* Save fp and x28, which is used in this function. */
> + 

Re: [PATCH] perf: Change PMCR write to read-modify-write

2019-04-01 Thread Mark Rutland
On Thu, Mar 21, 2019 at 01:02:27PM -0700, Sodagudi Prasad wrote:
> On 2019-03-21 06:34, Julien Thierry wrote:
> > Hi Prasad,
> > 
> > On 21/03/2019 02:07, Prasad Sodagudi wrote:
> > > Preserves the bitfields of PMCR_EL0(AArch64) during PMU reset.
> > > Reset routine should write a 1 to PMCR.C and PMCR.P fields only
> > > to reset the counters. Other fields should not be changed
> > > as they could be set before PMU initialization and their
> > > value must be preserved even after reset.
> > 
> > Are there any particular bit you are concerned about? Apart from the RO
> > ones and the Res0 ones (to which we are already writing 0), I see:
> > 
> > DP -> irrelevant for non-secure
> > X -> This one is the only potentially interesting, however it resets to
> > an architecturally unknown value, so unless we know for a fact it was
> > set before hand, we probably want to clear it
> > D -> ignored when we have LC set (and we do)
> > E -> Since this is the function we use to reset the PMU on the current
> > CPU, we probably want to set this bit to 0 regardless of its previous
> > value
> > 
> > So, is there any issue this patch is solving?
> 
> Hi Julien,
> 
> Thanks for taking a look into this patch. Yes. On our Qualcomm platforms,
> observed that X bit is getting cleared by kernel.
> This bit is getting set by firmware for Qualcomm use cases and non-secure
> world is resetting without this patch.

Given the kernel reconfigures the PMU dynamically at runtime (e.g. to multiplex
events over counters), I don't follow how this is useful. Could you elaborate
on how specifically you intend to use this?

> I think, changing that register this register modifications to
> read-modify-write style make sense.

I disagree. 

In general, RESx bits may be allocated new meanings in future (and will reset
to UNKNOWN values), so it is not safe to simply preserve the reset value. Given
that, any kernel _must_ explicitly reset registers in order to ensure the
expected behaviour.
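
To put that concretely (sketch only, roughly paraphrasing the driver's reset
path with its own helpers rather than quoting it exactly): an explicit write
leaves every writable bit in a known state, whereas a read-modify-write
blindly carries the UNKNOWN/RESx reset values forward:

	/* Explicit reset: C, P and LC set, everything else in a known state. */
	armv8pmu_pmcr_write(ARMV8_PMU_PMCR_P | ARMV8_PMU_PMCR_C |
			    ARMV8_PMU_PMCR_LC);

	/* Read-modify-write: whatever happened to be in the register survives. */
	armv8pmu_pmcr_write(armv8pmu_pmcr_read() |
			    ARMV8_PMU_PMCR_P | ARMV8_PMU_PMCR_C);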

Thanks
Mark.


Re: [PATCH] arm64/vdso: don't leak kernel addresses

2019-04-01 Thread Mark Rutland
On Sat, Mar 30, 2019 at 07:46:38PM +0100, Matteo Croce wrote:
> Since commit ad67b74d2469d9b8 ("printk: hash addresses printed with %p"),
> two obfuscated kernel pointer are printed at every boot:
> 
> vdso: 2 pages (1 code @ (ptrval), 1 data @ (ptrval))
> 
> Remove the addresses from the print, which turns into a more discrete:
> 
> vdso: 2 pages (1 code, 1 data)
> 
> Fixes: ad67b74d2469d9b8 ("printk: hash addresses printed with %p")
> Signed-off-by: Matteo Croce 
> ---
>  arch/arm64/kernel/vdso.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm64/kernel/vdso.c b/arch/arm64/kernel/vdso.c
> index 2d419006ad43..fdfee0ef5bc5 100644
> --- a/arch/arm64/kernel/vdso.c
> +++ b/arch/arm64/kernel/vdso.c
> @@ -146,8 +146,8 @@ static int __init vdso_init(void)
>   }
>  
>   vdso_pages = (vdso_end - vdso_start) >> PAGE_SHIFT;
> - pr_info("vdso: %ld pages (%ld code @ %p, %ld data @ %p)\n",
> - vdso_pages + 1, vdso_pages, vdso_start, 1L, vdso_data);
> + pr_info("vdso: %ld pages (%ld code, %ld data)\n",
> + vdso_pages + 1, vdso_pages, 1L);

It's probably better to drop this pr_info() entirely. The number of data and
code pages hasn't changed since this was upstreamed, and the pointers were the
useful part for debugging.

If you respin this to delete the pr_info() entirely, then feel free to add:

Acked-by: Mark Rutland 

Mark.

