Re: [PATCH v2] 9p/xen: increase XEN_9PFS_RING_ORDER

2020-05-21 Thread Dominique Martinet
Stefano Stabellini wrote on Thu, May 21, 2020:
> From: Stefano Stabellini 
> 
> Increase XEN_9PFS_RING_ORDER to 9 for performance reasons. Order 9 is the
> max allowed by the protocol.
> 
> We can't assume that all backends will support order 9. The xenstore
> property max-ring-page-order specifies the max order supported by the
> backend. We'll use max-ring-page-order for the size of the ring.
> 
> This means that the size of the ring is not static
> (XEN_FLEX_RING_SIZE(9)) anymore. Change XEN_9PFS_RING_SIZE to take an
> argument and base the calculation on the order chosen at setup time.
> 
> Finally, modify p9_xen_trans.maxsize to be divided by 4 compared to the
> original value. We need to divide it by 2 because we have two rings
> coming off the same order allocation: the in and out rings. This was a
> mistake in the original code. Also divide it further by 2 because we
> don't want a single request/reply to fill up the entire ring. There can
> be multiple requests/replies outstanding at any given time and if we use
> the full ring with one, we risk forcing the backend to wait for the
> client to read back more replies before continuing, which is not
> performant.
> 
> Signed-off-by: Stefano Stabellini 

LGTM, I'll try to find some time to test this by the end of next week or
will trust you if I can't make it -- ping me around June 1st if I haven't
replied again by then...

Cheers,
-- 
Dominique
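
The sizing rules from the commit message can be sketched in plain C. This is
only an illustration, not the driver code: every sketch_* name and the 4 KiB
page size are assumptions made for the example. It takes the whole allocation
of 2^order pages, gives half to each of the in and out rings, and limits a
single request or reply to half of one ring.

```c
#include <stddef.h>

#define SKETCH_PAGE_SHIFT 12u /* assumption: 4 KiB pages */
#define SKETCH_MAX_ORDER   9u /* protocol maximum named in the patch */

/* Use the backend's advertised max-ring-page-order, capped at the maximum. */
static unsigned int sketch_ring_order(unsigned int backend_max_order)
{
	return backend_max_order < SKETCH_MAX_ORDER ? backend_max_order
						      : SKETCH_MAX_ORDER;
}

/* Bytes in the whole allocation of 2^order pages. */
static size_t sketch_alloc_bytes(unsigned int order)
{
	return (size_t)1 << (order + SKETCH_PAGE_SHIFT);
}

/* The in and out rings share one allocation, so each gets half of it. */
static size_t sketch_ring_bytes(unsigned int order)
{
	return sketch_alloc_bytes(order) / 2;
}

/* A single request or reply may use at most half of one ring. */
static size_t sketch_maxsize(unsigned int order)
{
	return sketch_ring_bytes(order) / 2;
}
```

With these assumptions, a backend advertising order 6 yields order 6, while a
backend advertising more than 9 is capped at 9.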


linux-next: manual merge of the devicetree tree with the pci tree

2020-05-21 Thread Stephen Rothwell
Hi all,

Today's linux-next merge of the devicetree tree got a conflict in:

  Documentation/devicetree/bindings/pci/cdns-pcie.yaml

between commit:

  fb5f8f3ca5f8 ("dt-bindings: PCI: cadence: Deprecate inbound/outbound specific bindings")

from the pci tree and commit:

  3d21a4609335 ("dt-bindings: Remove cases of 'allOf' containing a '$ref'")

from the devicetree tree.

I fixed it up (the former removed the section modified by the latter,
so I just did that) and can carry the fix as necessary. This is now fixed
as far as linux-next is concerned, but any non trivial conflicts should
be mentioned to your upstream maintainer when your tree is submitted for
merging.  You may also want to consider cooperating with the maintainer
of the conflicting tree to minimise any particularly complex conflicts.

-- 
Cheers,
Stephen Rothwell




Re: [PATCH -V2] swap: Reduce lock contention on swap cache from swap slots allocation

2020-05-21 Thread Huang, Ying
Daniel Jordan  writes:

> On Wed, May 20, 2020 at 11:15:02AM +0800, Huang Ying wrote:
>> @@ -2827,6 +2865,11 @@ static struct swap_info_struct *alloc_swap_info(void)
>>  p = kvzalloc(struct_size(p, avail_lists, nr_node_ids), GFP_KERNEL);
>>  if (!p)
>>  return ERR_PTR(-ENOMEM);
>> +p->cluster_next_cpu = alloc_percpu(unsigned int);
>> +if (!p->cluster_next_cpu) {
>> +kvfree(p);
>> +return ERR_PTR(-ENOMEM);
>> +}
>
> There should be free_percpu()s at two places after this, but I think the
> allocation really belongs right...
>
>> @@ -3202,7 +3245,10 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags)
>>   * select a random position to start with to help wear leveling
>>   * SSD
>>   */
>> -p->cluster_next = 1 + prandom_u32_max(p->highest_bit);
>
> ...here because then it's only allocated when it's actually used.

Good catch!  And yes, this is the better place to allocate memory.  I
will fix this in the new version!  Thanks a lot!

Best Regards,
Huang, Ying

>> +for_each_possible_cpu(cpu) {
>> +per_cpu(*p->cluster_next_cpu, cpu) =
>> +1 + prandom_u32_max(p->highest_bit);
>> +}
>>  nr_cluster = DIV_ROUND_UP(maxpages, SWAPFILE_CLUSTER);
>>  
>>  cluster_info = kvcalloc(nr_cluster, sizeof(*cluster_info),
>> -- 
>> 2.26.2
>> 
>> 


[PATCH mmotm] mm/swap: fix livelock in __read_swap_cache_async()

2020-05-21 Thread Hugh Dickins
I've only seen this livelock on one machine (repeatably, but not to
order), and not fully analyzed it - two processes seen looping around
getting -EEXIST from swapcache_prepare(), I guess a third (at lower
priority? but wanting the same cpu as one of the loopers? preemption
or cond_resched() not enough to let it back in?) set SWAP_HAS_CACHE,
then went off into direct reclaim, scheduled away, and somehow could
not get back to add the page to swap cache and let them all complete.

Restore the page allocation in __read_swap_cache_async() to before
the swapcache_prepare() call: "mm: memcontrol: charge swapin pages
on instantiation" moved it outside the loop, which indeed looks much
nicer, but exposed this weakness.  We used to allocate new_page once
and then keep it across all iterations of the loop: but I think that
just optimizes for a rare case, and complicates the flow, so go with
the new simpler structure, with allocate+free each time around (which
is more considerate use of the memory too).

Fix the comment on the looping case, which has long been inaccurate:
it's not a racing get_swap_page() that's the problem here.

Fix the add_to_swap_cache() and mem_cgroup_charge() error recovery:
not swap_free(), but put_swap_page() to undo SWAP_HAS_CACHE, as was
done before; but delete_from_swap_cache() already includes it.

And one more nit: I don't think it makes any difference in practice,
but remove the "& GFP_KERNEL" mask from the mem_cgroup_charge() call:
add_to_swap_cache() needs that, to convert gfp_mask from user and page
cache allocation (e.g. highmem) to radix node allocation (lowmem), but
we don't need or usually apply that mask when charging mem_cgroup.

Signed-off-by: Hugh Dickins 
---
Mostly fixing mm-memcontrol-charge-swapin-pages-on-instantiation.patch
but now I see that mm-memcontrol-delete-unused-lrucare-handling.patch
made a further change here (took an arg off the mem_cgroup_charge call):
as is, this patch is diffed to go on top of both of them, and better
that I get it out now for Johannes to look at; but could be rediffed for
folding into blah-instantiation.patch later.

Earlier in the day I promised two patches to __read_swap_cache_async(),
but find now that I cannot quite justify the second patch: it makes a
slight adjustment in swapcache_prepare(), then removes the redundant
__swp_swapcount() && swap_slot_cache_enabled business from blah_async().

I'd still like to do that, but this patch here brings back the
alloc_page_vma() in between them, and I don't have any evidence to
reassure us that I'm not then pessimizing a readahead case by doing
unnecessary allocation and free. Leave it for some other time perhaps.
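
The ordering described in the commit message (allocate first, then try to
claim the entry, and free the page again on every failed attempt) can be
sketched in userspace C. This is a minimal sketch under stated assumptions,
not the mm code: the sketch_* names are hypothetical, an atomic flag stands in
for SWAP_HAS_CACHE, malloc() stands in for alloc_page_vma(), and max_tries is
an artificial bound so the example always terminates.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for one swap_map entry. */
struct sketch_slot {
	atomic_flag has_cache; /* plays the role of SWAP_HAS_CACHE */
};

/* Stand-in for swapcache_prepare(): 0 if we claimed the slot, else -EEXIST. */
static int sketch_prepare(struct sketch_slot *s)
{
	return atomic_flag_test_and_set(&s->has_cache) ? -EEXIST : 0;
}

/* Stand-in for whoever clears the flag once the entry is done with. */
static void sketch_release(struct sketch_slot *s)
{
	atomic_flag_clear(&s->has_cache);
}

static void *sketch_read_async(struct sketch_slot *s, int max_tries)
{
	for (int i = 0; i < max_tries; i++) {
		/* Allocate before claiming, so the claim holder never blocks. */
		void *page = malloc(4096);

		if (!page)
			return NULL;
		if (sketch_prepare(s) == 0)
			return page; /* the slot is ours; the page is ready */
		free(page); /* lost the race: give the page back and retry */
	}
	return NULL;
}
```

The point of the ordering is that no allocation, and so no reclaim, happens
between setting the flag and being able to finish the work.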

 mm/swap_state.c |   52 +-
 1 file changed, 29 insertions(+), 23 deletions(-)

--- 5.7-rc6-mm1/mm/swap_state.c 2020-05-20 12:21:56.149694170 -0700
+++ linux/mm/swap_state.c   2020-05-21 20:17:50.188773901 -0700
@@ -392,56 +392,62 @@ struct page *__read_swap_cache_async(swp
return NULL;
 
/*
+* Get a new page to read into from swap.  Allocate it now,
+* before marking swap_map SWAP_HAS_CACHE, when -EEXIST will
+* cause any racers to loop around until we add it to cache.
+*/
+   page = alloc_page_vma(gfp_mask, vma, addr);
+   if (!page)
+   return NULL;
+
+   /*
 * Swap entry may have been freed since our caller observed it.
 */
err = swapcache_prepare(entry);
if (!err)
break;
 
-   if (err == -EEXIST) {
-   /*
-* We might race against get_swap_page() and stumble
-* across a SWAP_HAS_CACHE swap_map entry whose page
-* has not been brought into the swapcache yet.
-*/
-   cond_resched();
-   continue;
-   }
+   put_page(page);
+   if (err != -EEXIST)
+   return NULL;
 
-   return NULL;
+   /*
+* We might race against __delete_from_swap_cache(), and
+* stumble across a swap_map entry whose SWAP_HAS_CACHE
+* has not yet been cleared.  Or race against another
+* __read_swap_cache_async(), which has set SWAP_HAS_CACHE
+* in swap_map, but not yet added its page to swap cache.
+*/
+   cond_resched();
}
 
/*
-* The swap entry is ours to swap in. Prepare a new page.
+* The swap entry is ours to swap in. Prepare the new page.
 */
 
-   page = alloc_page_vma(gfp_mask, vma, addr);
-   if (!page)
-   goto fail_free;
-
__SetPageLocked(page);
__SetPageSwapBacked(page);
 
/* May 

[PATCH] capabilities: Introduce CAP_RESTORE

2020-05-21 Thread Adrian Reber
This enables CRIU to checkpoint and restore a process as non-root.

Over the last years CRIU upstream has been asked a couple of times if it
is possible to checkpoint and restore a process as non-root. The answer
usually was: 'almost'.

The main blocker to restore a process was that selecting the PID of the
restored process, which is necessary for CRIU, is guarded by CAP_SYS_ADMIN.

In the last two years the questions about checkpoint/restore as non-root
have increased and especially in the last few months we have seen
multiple people inventing workarounds.

The use-cases so far and their workarounds:

 * Checkpoint/Restore in an HPC environment in combination with
   a resource manager distributing jobs. Users are always running
   as non root, but there was the desire to provide a way to
   checkpoint and restore long running jobs.
   Workaround: setuid wrapper that lets a non-root user start CRIU as root
   https://github.com/FredHutch/slurm-examples/blob/master/checkpointer/lib/checkpointer/checkpointer-suid.c
 * Another use case to checkpoint/restore processes as non-root
   uses as workaround a non privileged process which cycles through
   PIDs by calling fork() as fast as possible with a rate of
   100,000 pids/s instead of writing to ns_last_pid
   https://github.com/twosigma/set_ns_last_pid
 * Fast Java startup using checkpoint/restore.
   We have been in contact with JVM developers who are integrating
   CRIU into a JVM to decrease the startup time.
   Workaround so far: patch out CAP_SYS_ADMIN checks in the kernel
 * Container migration as non root. There are people already
   using CRIU to migrate containers as non-root. The solution
   there is to run it in a user namespace. So if you are able
   to carefully setup your environment with the namespaces
   it is already possible to restore a container/process as non-root.
   Unfortunately it is not always possible to setup an environment
   in such a way and for easier access to non-root based container
   migration this patch is also required.

There are probably a few more things guarded by CAP_SYS_ADMIN required
to run checkpoint/restore as non-root, but by applying this patch I can
already checkpoint and restore processes as non-root. As there are
already multiple workarounds I would prefer to do it correctly in the
kernel to avoid that CRIU users are starting to invent more workarounds.

I have used the following tests to verify that this change works as
expected by setting the new capability CAP_RESTORE on the two resulting
test binaries:

$ cat ns_last_pid.c
 // http://efiop-notes.blogspot.com/2014/06/how-to-set-pid-using-nslastpid.html
 #include <fcntl.h>
 #include <stdio.h>
 #include <stdlib.h>
 #include <string.h>
 #include <sys/file.h>
 #include <unistd.h>

int main(int argc, char *argv[])
{
	pid_t pid, new_pid;
	char buf[32];
	int fd;

	if (argc != 2)
		return 1;

	printf("Opening ns_last_pid...\n");
	fd = open("/proc/sys/kernel/ns_last_pid", O_RDWR | O_CREAT, 0644);
	if (fd < 0) {
		perror("Cannot open ns_last_pid");
		return 1;
	}

	printf("Locking ns_last_pid...\n");
	if (flock(fd, LOCK_EX)) {
		close(fd);
		printf("Cannot lock ns_last_pid\n");
		return 1;
	}

	/* The next fork() gets ns_last_pid + 1, so write the wanted pid - 1. */
	pid = atoi(argv[1]);
	snprintf(buf, sizeof(buf), "%d", pid - 1);
	printf("Writing pid-1 to ns_last_pid...\n");
	if (write(fd, buf, strlen(buf)) != strlen(buf)) {
		printf("Cannot write to buf\n");
		return 1;
	}

	printf("Forking...\n");
	new_pid = fork();
	if (new_pid == 0) {
		printf("I am the child!\n");
		exit(0);
	} else if (new_pid == pid)
		printf("I am the parent. My child got the pid %d!\n", new_pid);
	else
		printf("pid (%d) does not match expected pid (%d)\n", new_pid, pid);

	printf("Cleaning up...\n");
	if (flock(fd, LOCK_UN))
		printf("Cannot unlock\n");
	close(fd);
	return 0;
}
$ id -u; /home/libcap/ns_last_pid 30
1001
Opening ns_last_pid...
Locking ns_last_pid...
Writing pid-1 to ns_last_pid...
Forking...
I am the parent. My child got the pid 30!
I am the child!
Cleaning up...

For the clone3() based approach:
$ cat clone3_set_tid.c
 #define _GNU_SOURCE
 #include <linux/sched.h>
 #include <linux/types.h>
 #include <stdint.h>
 #include <stdio.h>
 #include <stdlib.h>
 #include <sys/syscall.h>
 #include <sys/types.h>
 #include <unistd.h>

 #define ptr_to_u64(ptr) ((__u64)((uintptr_t)(ptr)))

int main(int argc, char *argv[])
{
	struct clone_args c_args = { };
	pid_t pid, new_pid;

	if (argc != 2)
		return 1;

	/* set_tid points at an array of requested PIDs; here it has one entry. */
	pid = atoi(argv[1]);
	c_args.set_tid = ptr_to_u64(&pid);
	c_args.set_tid_size = 1;

	printf("Forking...\n");
	new_pid = syscall(__NR_clone3, &c_args, sizeof(c_args));
	if (new_pid == 0) {
		printf("I am the child!\n");
		exit(0);
	} else if 

Re: [PATCH 5/5] dt-bindings: timer: Add CLINT bindings

2020-05-21 Thread Anup Patel
On Fri, May 22, 2020 at 1:35 AM Sean Anderson  wrote:
>
> On 5/21/20 9:45 AM, Anup Patel wrote:
> > We add DT bindings documentation for CLINT device.
> >
> > Signed-off-by: Anup Patel 
> > ---
> >  .../bindings/timer/sifive,clint.txt   | 33 +++
> >  1 file changed, 33 insertions(+)
> >  create mode 100644 Documentation/devicetree/bindings/timer/sifive,clint.txt
> >
> > diff --git a/Documentation/devicetree/bindings/timer/sifive,clint.txt b/Documentation/devicetree/bindings/timer/sifive,clint.txt
> > new file mode 100644
> > index ..cae2dad1223a
> > --- /dev/null
> > +++ b/Documentation/devicetree/bindings/timer/sifive,clint.txt
> > @@ -0,0 +1,33 @@
> > +SiFive Core Local Interruptor (CLINT)
> > +-
> > +
> > +SiFive (and other RISC-V) SOCs include an implementation of the SiFive Core
> > +Local Interruptor (CLINT) for M-mode timer and inter-processor interrupts.
> > +
> > +It directly connects to the timer and inter-processor interrupt lines of
> > +various HARTs (or CPUs) so RISC-V per-HART (or per-CPU) local interrupt
> > +controller is the parent interrupt controller for CLINT device.
> > +
> > +The clock frequency of CLINT is specified via "timebase-frequency" DT
> > +property of "/cpus" DT node. The "timebase-frequency" DT property is
> > +described in: Documentation/devicetree/bindings/riscv/cpus.yaml
> > +
> > +Required properties:
> > +- compatible : "sifive,clint-1.0.0" and a string identifying the actual
> > +  detailed implementation in case that specific bugs need to be worked around.
>
> Should the "riscv,clint0" compatible string be documented here? This

Yes, I forgot to add this compatible string. I will add in v2.

> peripheral is not really specific to sifive, as it is present in most
> rocket-chip cores.

I agree that CLINT is present in a lot of non-SiFive RISC-V SOCs and
FPGAs but this IP is only documented as part of SiFive FU540 SOC.
(Refer, https://static.dev.sifive.com/FU540-C000-v1.0.pdf)

The RISC-V foundation should host the CLINT spec independently
under https://github.com/riscv and make CLINT spec totally open.

For now, I have documented it just like PLIC DT bindings found at:
Documentation/devicetree/bindings/interrupt-controller/sifive,plic-1.0.0.txt

If RISC-V maintainers agree then I will document it as "RISC-V CLINT".

@Palmer ?? @Paul ??

>
> > +- reg : Should contain 1 register range (address and length).
> > +- interrupts-extended : Specifies which HARTs (or CPUs) are connected to
> > +  the CLINT.  Each node pointed to should be a riscv,cpu-intc node, which
> > +  has a riscv node as parent.
> > +
> > +Example:
> > +
> > + clint@200 {
> > + compatible = "sifive,clint-1.0.0", "sifive,fu540-c000-clint";
> > + interrupts-extended = <
> > +  3  7
> > +  3  7
> > +  3  7
> > +  3  7>;
> > + reg = <0x200 0x400>;
> > + };
> >
>
> --Sean

Regards,
Anup


Re: [PATCH 01/19] dt-bindings: PCI: Endpoint: Add DT bindings for PCI EPF NTB Device

2020-05-21 Thread Kishon Vijay Abraham I
Hi RobH,

On 5/14/2020 8:29 PM, Kishon Vijay Abraham I wrote:
> Add device tree schema for PCI endpoint function bus to which
> endpoint function devices should be attached. Then add device tree
> schema for PCI endpoint function device to include bindings that are
> generic to all endpoint functions. Finally add device tree schema
> for PCI endpoint NTB function device by including the generic
> device tree schema for PCIe endpoint function.
> 
> Signed-off-by: Kishon Vijay Abraham I 
> ---
>  .../bindings/pci/endpoint/pci-epf-bus.yaml| 42 +++
>  .../bindings/pci/endpoint/pci-epf-device.yaml | 69 +++
>  .../bindings/pci/endpoint/pci-epf-ntb.yaml| 68 ++
>  3 files changed, 179 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/pci/endpoint/pci-epf-bus.yaml
>  create mode 100644 Documentation/devicetree/bindings/pci/endpoint/pci-epf-device.yaml
>  create mode 100644 Documentation/devicetree/bindings/pci/endpoint/pci-epf-ntb.yaml
> 
> diff --git a/Documentation/devicetree/bindings/pci/endpoint/pci-epf-bus.yaml b/Documentation/devicetree/bindings/pci/endpoint/pci-epf-bus.yaml
> new file mode 100644
> index ..1c504f2e85e4
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/pci/endpoint/pci-epf-bus.yaml
> @@ -0,0 +1,42 @@
> +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
> +# Copyright (C) 2020 Texas Instruments Incorporated - http://www.ti.com/
> +%YAML 1.2
> +---
> +$id: http://devicetree.org/schemas/pci/endpoint/pci-epf-bus.yaml#
> +$schema: http://devicetree.org/meta-schemas/core.yaml#
> +
> +title: PCI Endpoint Function Bus
> +
> +maintainers:
> +  - Kishon Vijay Abraham I 
> +
> +properties:
> +  compatible:
> +const: pci-epf-bus
> +
> +patternProperties:
> +  "^func@[0-9a-f]+$":
> +type: object
> +description: |
> +  PCI Endpoint Function Bus node should have subnodes for each of
> +  the implemented endpoint function. It should follow the bindings
> +  specified for endpoint function in
> +  Documentation/devicetree/bindings/pci/endpoint/
> +
> +examples:
> +  - |
> +epf_bus {
> +  compatible = "pci-epf-bus";
> +
> +  func@0 {
> +compatible = "pci-epf-ntb";
> +epcs = <_ep>, <_ep>;
> +epc-names = "primary", "secondary";
> +reg = <0>;

I'm not sure how to represent the "reg" property properly for cases like this
where it represents an ID and not a memory resource. I seem to get a warning
for "reg_format" even after adding the address-cells and size-cells properties
in epf_bus. Can you give some hints here please?

> +epf,vendor-id = /bits/ 16 <0x104c>;

I want to make vendor-id and device-id 16 bits wide from the beginning, at least
for PCIe endpoint. So I'm prefixing these properties with "epf,". However I get
this "do not match any of the regexes:" warning. Can we add "epf" as a standard
prefix?

Thanks
Kishon
> +epf,device-id = /bits/ 16 <0xb00d>;
> +num-mws = <4>;
> +mws-size = <0x0 0x10>, <0x0 0x10>, <0x0 0x10>, <0x0 0x10>;
> +  };
> +};
> +...
> diff --git a/Documentation/devicetree/bindings/pci/endpoint/pci-epf-device.yaml b/Documentation/devicetree/bindings/pci/endpoint/pci-epf-device.yaml
> new file mode 100644
> index ..cee72864c8ca
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/pci/endpoint/pci-epf-device.yaml
> @@ -0,0 +1,69 @@
> +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
> +# Copyright (C) 2020 Texas Instruments Incorporated - http://www.ti.com/
> +%YAML 1.2
> +---
> +$id: http://devicetree.org/schemas/pci/endpoint/pci-epf-device.yaml#
> +$schema: http://devicetree.org/meta-schemas/core.yaml#
> +
> +title: PCI Endpoint Function Device
> +
> +maintainers:
> +  - Kishon Vijay Abraham I 
> +
> +properties:
> +  compatible:
> +const: pci-epf-bus
> +
> +properties:
> +  $nodename:
> +pattern: "^func@"
> +
> +  epcs:
> +description:
> +  Phandle to the endpoint controller device. Should have "2" entries for
> +  NTB endpoint function and "1" entry for others.
> +minItems: 1
> +maxItems: 2
> +
> +  epc-names:
> +description:
> +  Must contain an entry for each entry in "epcs" when "epcs" have more than
> +  one entry.
> +
> +  reg:
> +maxItems: 0
> +description: Must contain the index number of the function.
> +
> +  epf,vendor-id:
> +description:
> +  The PCI vendor ID
> +allOf:
> +  - $ref: /schemas/types.yaml#/definitions/uint16
> +
> +  epf,device-id:
> +description:
> +  The PCI device ID
> +allOf:
> +  - $ref: /schemas/types.yaml#/definitions/uint16
> +
> +  epf,baseclass-code:
> +description: Code to classify the type of operation the function performs
> +allOf:
> +  - $ref: /schemas/types.yaml#/definitions/uint8
> +
> +  epf,subclass-code:
> +description:
> +  Specifies a base class sub-class, which identifies more specifically the
> +  

Re: [PATCH 03/11] mm/hugetlb: introduce alloc_control structure to simplify migration target allocation APIs

2020-05-21 Thread Joonsoo Kim
On Fri, May 22, 2020 at 3:57 AM, Mike Kravetz wrote:
>
> On 5/17/20 6:20 PM, js1...@gmail.com wrote:
> > From: Joonsoo Kim 
> >
> > Currently, the page allocation functions for migration require several
> > arguments. Worse, in the following patch, more arguments will be needed to
> > unify the similar functions. To simplify them, this patch introduces a
> > unified data structure that controls allocation behaviour.
>
> As a followup to Roman's question and your answer about adding a suffix/prefix
> to the new structure.  It 'may' be a bit confusing as alloc_context is already
> defined and *ac is passed around for page allocations.  Perhaps, this new
> structure could somehow have migrate in the name as it is all about allocating
> migrate targets?

I have considered that but I cannot find an appropriate prefix. In the hugetlb
code, struct alloc_control is passed to an internal function which is not
fully dedicated to migration, so 'migrate' would not be an appropriate prefix.

alloc_context is used by the page allocation core and alloc_control would be
used outside of it, so I think that we can endure it. If there is a good
suggestion, I will happily change the name.

> >
> > For clean-up, function declarations are re-ordered.
> >
> > Note that, gfp_mask handling on alloc_huge_page_(node|nodemask) is
> > slightly changed, from ASSIGN to OR. It's safe since caller of these
> > functions doesn't pass extra gfp_mask except htlb_alloc_mask().
> >
> > Signed-off-by: Joonsoo Kim 
>
> Patch makes sense.

Thanks!

> > diff --git a/mm/migrate.c b/mm/migrate.c
> > index a298a8c..94d2386 100644
> > --- a/mm/migrate.c
> > +++ b/mm/migrate.c
> > @@ -1526,10 +1526,15 @@ struct page *new_page_nodemask(struct page *page,
> >   unsigned int order = 0;
> >   struct page *new_page = NULL;
> >
> > - if (PageHuge(page))
> > - return alloc_huge_page_nodemask(
> > - page_hstate(compound_head(page)),
> > - preferred_nid, nodemask);
> > + if (PageHuge(page)) {
> > + struct hstate *h = page_hstate(page);
>
> I assume the removal of compound_head(page) was intentional?  Just asking
> because PageHuge will look at head page while page_hstate will not.  So,
> if passed a non-head page things could go bad.

I was thinking that page_hstate() could handle a tail page, but it seems that
it cannot. Thanks for the correction. I will change it in the next version.

Thanks.


[PATCH] [v3] usb: musb: Fix runtime PM imbalance on error

2020-05-21 Thread Dinghao Liu
When copy_from_user() returns an error code, there
is a runtime PM usage counter imbalance.

Fix this by moving copy_from_user() to the beginning
of this function.

Signed-off-by: Dinghao Liu 
---

Changelog:

v2: - Move copy_from_user() to the beginning rather
  than adding pm_runtime_put_autosuspend().

v3: - Add missing changelog information.
---
 drivers/usb/musb/musb_debugfs.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/usb/musb/musb_debugfs.c b/drivers/usb/musb/musb_debugfs.c
index 7b6281ab62ed..30a89aa8a3e7 100644
--- a/drivers/usb/musb/musb_debugfs.c
+++ b/drivers/usb/musb/musb_debugfs.c
@@ -168,6 +168,11 @@ static ssize_t musb_test_mode_write(struct file *file,
u8  test;
char    buf[24];
 
+   memset(buf, 0x00, sizeof(buf));
+
+   if (copy_from_user(buf, ubuf, min_t(size_t, sizeof(buf) - 1, count)))
+   return -EFAULT;
+
pm_runtime_get_sync(musb->controller);
test = musb_readb(musb->mregs, MUSB_TESTMODE);
if (test) {
@@ -176,11 +181,6 @@ static ssize_t musb_test_mode_write(struct file *file,
goto ret;
}
 
-   memset(buf, 0x00, sizeof(buf));
-
-   if (copy_from_user(buf, ubuf, min_t(size_t, sizeof(buf) - 1, count)))
-   return -EFAULT;
-
if (strstarts(buf, "force host full-speed"))
test = MUSB_TEST_FORCE_HOST | MUSB_TEST_FORCE_FS;
 
-- 
2.17.1
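
The idea behind the reordering can be sketched in userspace C: do the copy
that can fail before taking the runtime PM reference, so that the early
return leaves the usage counter untouched. This is only an illustration under
stated assumptions. The sketch_* names are hypothetical, a plain integer
stands in for the usage counter, and a NULL source pointer stands in for a
faulting user buffer.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

static int sketch_usage_count; /* stand-in for the runtime PM usage counter */

static void sketch_pm_get(void) { sketch_usage_count++; }
static void sketch_pm_put(void) { sketch_usage_count--; }

/* Stand-in for copy_from_user(): fails when the source pointer is NULL. */
static int sketch_copy_in(char *dst, const char *src, size_t n)
{
	if (!src)
		return -EFAULT;
	memcpy(dst, src, n);
	return 0;
}

static int sketch_test_mode_write(const char *ubuf, size_t count)
{
	char buf[24];
	size_t n = count < sizeof(buf) - 1 ? count : sizeof(buf) - 1;

	memset(buf, 0, sizeof(buf));

	/* Fallible copy first: returning here cannot unbalance the counter. */
	if (sketch_copy_in(buf, ubuf, n))
		return -EFAULT;

	sketch_pm_get();
	/* ... the hardware would be programmed from buf here ... */
	sketch_pm_put();

	return (int)count;
}
```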



Re: [PATCH v2] xfrm: policy: Fix xfrm policy match

2020-05-21 Thread Xin Long
On Fri, May 22, 2020 at 9:45 AM Yuehaibing  wrote:
>
> On 2020/5/21 14:49, Xin Long wrote:
> > On Tue, May 19, 2020 at 4:53 PM Steffen Klassert
> >  wrote:
> >>
> >> On Fri, May 15, 2020 at 04:39:57PM +0800, Yuehaibing wrote:
> >>>
> >>> Friendly ping...
> >>>
> >>> Any plan for this issue?
> >>
> >> There was still no consensus between you and Xin on how
> >> to fix this issue. Once this happens, I consider applying
> >> a fix.
> >>
> > Sorry, Yuehaibing, I can't really accept to do: (A->mark.m & A->mark.v)
> > I'm thinking to change to:
> >
> >  static bool xfrm_policy_mark_match(struct xfrm_policy *policy,
> >struct xfrm_policy *pol)
> >  {
> > -   u32 mark = policy->mark.v & policy->mark.m;
> > -
> > -   if (policy->mark.v == pol->mark.v && policy->mark.m == pol->mark.m)
> > -   return true;
> > -
> > -   if ((mark & pol->mark.m) == pol->mark.v &&
> > -   policy->priority == pol->priority)
> > +   if (policy->mark.v == pol->mark.v &&
> > +   (policy->mark.m == pol->mark.m ||
> > +policy->priority == pol->priority))
> > return true;
> >
> > return false;
> >
> > which means we consider (the same value and mask) or
> > (the same value and priority) as the same one. This will
> > cover both problems.
>
>   policy A (mark.v = 0x1011, mark.m = 0x1011, priority = 1)
>   policy B (mark.v = 0x1001, mark.m = 0x1001, priority = 1)
I'd think these are 2 different policies.

>
>   when fl->flowi_mark == 0x12341011, in xfrm_policy_match() do check like this:
>
> (fl->flowi_mark & pol->mark.m) != pol->mark.v
>
> 0x12341011 & 0x1011 == 0x1011
> 0x12341011 & 0x1001 == 0x1001
>
>  This also match different policy depends on the order of policy inserting.
Yes, this may happen when a user adds two policies like that.
But I think this is a configuration problem on the user's side:
'priority' should be set.
It also cannot be avoided in other cases, such as:

   policy A (mark.v = 0xff00, mark.m = 0x1000, priority = 1)
   policy B (mark.v = 0x00ff, mark.m = 0x0011, priority = 1)

   try with 0x12341011

So be it; let users decide.
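
The two checks discussed in this thread can be tried out in a small C sketch.
It is an illustration only: struct sketch_policy and the sketch_* functions
are hypothetical stand-ins, not the xfrm structures. The first function is the
lookup-time test quoted above, the second is the insert-time comparison
proposed in the diff above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the fields of a policy used in this thread. */
struct sketch_policy {
	uint32_t mark_v;
	uint32_t mark_m;
	uint32_t priority;
};

/* Lookup-time test: mask the flow mark and compare it with the value. */
static bool sketch_flow_matches(uint32_t flow_mark,
				const struct sketch_policy *pol)
{
	return (flow_mark & pol->mark_m) == pol->mark_v;
}

/* Insert-time test: same value, and either the same mask or priority. */
static bool sketch_same_policy(const struct sketch_policy *a,
			       const struct sketch_policy *b)
{
	return a->mark_v == b->mark_v &&
	       (a->mark_m == b->mark_m || a->priority == b->priority);
}
```

With policy A (0x1011/0x1011, priority 1) and policy B (0x1001/0x1001,
priority 1), a flow mark of 0x12341011 passes the lookup test for both, while
the insert-time test treats them as two different policies.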


[PATCH] drm/i915: fix a memory leak bug.

2020-05-21 Thread wu000273
From: Qiushi Wu 

In intel_gtt_setup_scratch_page(), pointer "page" is not released if
pci_dma_mapping_error() returns an error, leading to a memory leak bug.
Fix this issue by freeing "page" before returning.

Fixes: 0e87d2b06cb46 ("intel-gtt: initialize our own scratch page")
Signed-off-by: Qiushi Wu 
---
 drivers/char/agp/intel-gtt.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/char/agp/intel-gtt.c b/drivers/char/agp/intel-gtt.c
index 66a62d17a3f5..a56a50f2740f 100644
--- a/drivers/char/agp/intel-gtt.c
+++ b/drivers/char/agp/intel-gtt.c
@@ -304,8 +304,10 @@ static int intel_gtt_setup_scratch_page(void)
if (intel_private.needs_dmar) {
dma_addr = pci_map_page(intel_private.pcidev, page, 0,
PAGE_SIZE, PCI_DMA_BIDIRECTIONAL);
-   if (pci_dma_mapping_error(intel_private.pcidev, dma_addr))
+   if (pci_dma_mapping_error(intel_private.pcidev, dma_addr)) {
+   __free_page(page);
return -EINVAL;
+   }
 
intel_private.scratch_page_dma = dma_addr;
} else
-- 
2.17.1



Re: Re: [PATCH] [v2] usb: musb: Fix runtime PM imbalance on error

2020-05-21 Thread Greg Kroah-Hartman
On Fri, May 22, 2020 at 01:22:24PM +0800, dinghao@zju.edu.cn wrote:
> Sorry, it's my carelessness. In v1 I added pm_runtime_put_autosuspend()
> after copy_from_user() to fix this problem. Since copy_from_user() is
> moved to the beginning now, we do not need to add a PM decrement.

That's fine, please put that information in a v3 and resend it.

thanks,

greg k-h


[PATCH] dmaengine: dw-axi-dmac: Fix runtime PM imbalance on error

2020-05-21 Thread Dinghao Liu
When axi_dma_resume() returns an error code, a pairing
runtime PM usage counter decrement is needed to keep the
counter balanced.

Signed-off-by: Dinghao Liu 
---
 drivers/dma/dw-axi-dmac/dw-axi-dmac-platform.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/dma/dw-axi-dmac/dw-axi-dmac-platform.c b/drivers/dma/dw-axi-dmac/dw-axi-dmac-platform.c
index 14c1ac26f866..a368d01170f1 100644
--- a/drivers/dma/dw-axi-dmac/dw-axi-dmac-platform.c
+++ b/drivers/dma/dw-axi-dmac/dw-axi-dmac-platform.c
@@ -924,8 +924,10 @@ static int dw_probe(struct platform_device *pdev)
 */
pm_runtime_get_noresume(chip->dev);
ret = axi_dma_resume(chip);
-   if (ret < 0)
+   if (ret < 0) {
+   pm_runtime_put(chip->dev);
goto err_pm_disable;
+   }
 
axi_dma_hw_init(chip);
 
-- 
2.17.1
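
The balancing rule behind this fix can be sketched in userspace C: every
error exit taken after the "get" must be paired with a "put". This is a
minimal sketch under stated assumptions. The sketch_* names are hypothetical,
a plain integer stands in for the usage counter, and the success path of this
sketch simply drops its reference once setup is done.

```c
#include <errno.h>

static int sketch_usage_count; /* stand-in for the runtime PM usage counter */

static void sketch_pm_get_noresume(void) { sketch_usage_count++; }
static void sketch_pm_put(void) { sketch_usage_count--; }

/* Stand-in for the resume step; 'fail' forces the error path. */
static int sketch_resume(int fail)
{
	return fail ? -EIO : 0;
}

static int sketch_probe(int fail)
{
	int ret;

	sketch_pm_get_noresume();
	ret = sketch_resume(fail);
	if (ret < 0) {
		/* Pair the get above before bailing out. */
		sketch_pm_put();
		return ret;
	}

	/* ... hardware init would happen here ... */
	sketch_pm_put();
	return 0;
}
```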



Re: [PATCH v2 10/15] soc: qcom: ipa: use new module_firmware_crashed()

2020-05-21 Thread Luis Chamberlain
On Tue, May 19, 2020 at 05:34:13PM -0500, Alex Elder wrote:
> On 5/15/20 4:28 PM, Luis Chamberlain wrote:
> > This makes use of the new module_firmware_crashed() to help
> > annotate when firmware for device drivers crash. When firmware
> > crashes devices can sometimes become unresponsive, and recovery
> > sometimes requires a driver unload / reload and in the worst cases
> > a reboot.
> > 
> > Using a taint flag allows us to annotate when this happens clearly.
> 
> I don't fully understand what this is meant to do, so I can't
> fully assess whether it's the right thing to do.

It is meant to taint the kernel to make it clear that something
critically bad has happened with the device firmware: it crashed, and
we cannot be 100% certain whether recovery will happen.
> 
> But in this particular place in the IPA code, the *modem* has
> crashed.  And the IPA driver is not responsible for modem
> firmware, remoteproc is.

Oi vei. So the device it depends on has crashed.

> The IPA driver *can* be responsible for loading some other
> firmware, but even in that case, it only happens on initial
> boot, and it's basically assumed to never crash.

OK, is this an issue which we can recover from? If this can affect users
even in the slightest, it is something we should inform them about.

This patch set is missing uevents for these issues, but I just added
support for this.

> So regardless of whether this module_firmware_crashed() call is
> appropriate in some places, I believe it should not be used here.

OK thanks. Can the user be affected by this crash? If so, how? Can
we recover? Is that always guaranteed?

  Luis


[PATCH] net/mlx4_core: fix a memory leak bug.

2020-05-21 Thread wu000273
From: Qiushi Wu 

In function mlx4_opreq_action(), pointer "mailbox" is not released
when mlx4_cmd_box() returns an error, causing a memory leak bug.
Fix this issue by going to the "out" label, where mlx4_free_cmd_mailbox()
frees this pointer.

Fixes: fe6f700d6cbb7 ("Respond to operation request by firmware")
Signed-off-by: Qiushi Wu 
---
 drivers/net/ethernet/mellanox/mlx4/fw.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/fw.c b/drivers/net/ethernet/mellanox/mlx4/fw.c
index 6e501af0e532..f6ff9620a137 100644
--- a/drivers/net/ethernet/mellanox/mlx4/fw.c
+++ b/drivers/net/ethernet/mellanox/mlx4/fw.c
@@ -2734,7 +2734,7 @@ void mlx4_opreq_action(struct work_struct *work)
if (err) {
mlx4_err(dev, "Failed to retrieve required operation: %d\n",
 err);
-   return;
+   goto out;
}
MLX4_GET(modifier, outbox, GET_OP_REQ_MODIFIER_OFFSET);
MLX4_GET(token, outbox, GET_OP_REQ_TOKEN_OFFSET);
-- 
2.17.1



Re: [PATCH v2 12/15] ath10k: use new module_firmware_crashed()

2020-05-21 Thread Luis Chamberlain
On Fri, May 22, 2020 at 08:12:59AM +0300, Emmanuel Grumbach wrote:
> >
> > On Tue, May 19, 2020 at 10:37 PM Emmanuel Grumbach  
> > wrote:
> > > So I believe we already have this uevent, it is the devcoredump. All
> > > we need is to add the unique id.
> >
> > I think there are a few reasons that devcoredump doesn't satisfy what
> > either Luis or I want.
> >
> > 1) it can be disabled entirely [1], for good reasons (e.g., think of
> > non-${CHIP_VENDOR} folks, who can't (and don't want to) do anything
> > with the opaque dumps provided by closed-source firmware)
> 
> Ok, if all you're interested in is the information that this event
> happened (as opposed to reporting a bug and providing the data), then I
> agree.

I've now again hit a firmware crash with ath10k with the latest firmware
and kernel and the *only* thing that helped recovery was a full reboot,
so that is a crystal clear case that this needs to taint the kernel, and
yes we do want to inform users too, so I've just added uevent support
for a few panic / taint events in the kernel now and rolled into my
series. I'll run some final tests and then post this as a follow up.

devlink didn't cut it; it's networking specific.

  Luis


Re: piix4-poweroff.c I/O BAR usage

2020-05-21 Thread Paul Burton
Hello,

On Thu, May 21, 2020 at 6:04 PM Maciej W. Rozycki  wrote:
>  Paul may or may not be reachable anymore, so I'll step in.

I'm reachable but lacking free time & with no access to Malta hardware
I can't claim to be too useful here, so thanks for responding :)

Before being moved to a driver (which was mostly driven by a desire to
migrate Malta to a multi-platform/generic kernel using DT) this code
was part of arch/mips/mti-malta/ where I added it in commit
b6911bba598f ("MIPS: Malta: add suspend state entry code"). My main
motivation at the time was to make QEMU exit after running poweroff,
but I did ensure it worked on real Malta boards too (at least Malta-R
with CoreFPGA6). Over the years since then it shocked a couple of
hardware people to see software power off a Malta - if the original
hardware designers had intended that to work then the knowledge had
been lost over time :)

I suspect the code was based on visws_machine_power_off():

  
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/x86/platform/visws/visws_quirks.c?h=v3.10#n125

> > pci_request_region() takes a BAR number (0-5), but here we're passing
> > PCI_BRIDGE_RESOURCES (13 if CONFIG_PCI_IOV, or 7 otherwise), which is
> > the bridge I/O window.
> >
> > I don't think this device ([8086:7113]) is a bridge, so that resource
> > should be empty.
>
>  Hmm, isn't the resource actually set up by `quirk_piix4_acpi' though?

I agree that the region used is meant to match that set up by
quirk_piix4_acpi(), which also refers to it using the
PCI_BRIDGE_RESOURCES macro.

Thanks,
Paul


Re: Re: [PATCH] [v2] usb: musb: Fix runtime PM imbalance on error

2020-05-21 Thread dinghao . liu
Sorry, that was my carelessness. In v1 I added pm_runtime_put_autosuspend()
after copy_from_user() to fix this problem. Since copy_from_user() has now
been moved to the beginning, we no longer need to add a PM decrement.
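
The reordering can be modelled in user space roughly as follows. This is
a sketch only, not the musb code: struct mock_dev, mock_copy_in() and
debugfs_write() are hypothetical names, and the usage counter merely
stands in for the runtime PM one.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of a device with a runtime-PM style usage counter. */
struct mock_dev {
	int usage;
};

/* Stand-in for copy_from_user(): returns nonzero on failure. */
static int mock_copy_in(char *dst, const char *src, size_t len)
{
	if (!src)
		return -1;
	memcpy(dst, src, len);
	return 0;
}

static int debugfs_write(struct mock_dev *dev, const char *ubuf, size_t len)
{
	char buf[16];

	if (len > sizeof(buf))
		return -1;
	/* Fallible step first: an early return here holds no reference. */
	if (mock_copy_in(buf, ubuf, len))
		return -1;

	dev->usage++;		/* models pm_runtime_get_sync() */
	/* ... act on buf while the device is held active ... */
	dev->usage--;		/* models pm_runtime_put_autosuspend() */
	return 0;
}
```

Because the copy happens before the counter is touched, the failure path
has nothing to undo.
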

Regards,
Dinghao

> On Fri, May 22, 2020 at 10:59:02AM +0800, Dinghao Liu wrote:
> > When copy_from_user() returns an error code, there
> > is a runtime PM usage counter imbalance.
> > 
> > Fix this by moving copy_from_user() to the beginning
> > of this function.
> > 
> > Signed-off-by: Dinghao Liu 
> > ---
> >  drivers/usb/musb/musb_debugfs.c | 10 +-
> >  1 file changed, 5 insertions(+), 5 deletions(-)
> 
> What changed from v1?  Always show that below the --- line as the
> documentation says to.
> 
> thanks,
> 
> greg k-h


Re: Panic related to perf (bisected)

2020-05-21 Thread Ian Rogers
On Thu, May 21, 2020 at 10:14 PM Tibor Billes  wrote:
>
> Hi,
>
> On Mon, 18 May 2020, Ian Rogers wrote:
>
> > On Sat, May 16, 2020 at 6:36 AM Billes Tibor  wrote:
> > >
> > > Hi,
> > >
> > > I've been hitting a freeze on my laptop since 5.3, but haven't got the
> > > time to finish bisecting it. Now
> > > I had, and here is what I found:
> > >
> > > - 5.2 series works correctly (tested 5.2.9 and 5.2.15)
> > > - 5.3 series and newer kernels freeze. The newest I tested is 5.6.10
> > > (which also freezes). There will
> > >be a complete bisect log at the end of the mail.
> > >
> > > There are several circumstances to reproduce the freeze. At least this
> > > is what I found relevant:
> > > - The freeze does not occur after a fresh boot, it needs a sleep-wakeup
> > > cycle.
> > > - Run `perf stat -a --topdown -- sleep 8s` every minute (It is part of
> > > some metrics I collect using Zabbix)
> > > - Some workload (building the kernel or gaming) increases the chance of
> > > freezing, but it can occur
> > >without user interaction too.
> > >
> > > The freeze usually comes within a few minutes after wakeup. The longest
> > > was about an hour. (For
> > > comparison, if I don't do a sleep-wakeup, the machine works fine for 8+
> > > hours).
> > >
> > > First, there is a warning in syslog, then I took pictures of the actual
> > > panic.
> > >
> > > The warning:
> > > May 16 13:28:46 serpens kernel: [33128.086217] [ cut here
> > > ]
> > > May 16 13:28:46 serpens kernel: [33128.086222] WARNING: CPU: 0 PID: 0 at
> > > arch/x86/events/core.c:1506 x86_pmu_del+0x140/0x160
> > > May 16 13:28:46 serpens kernel: [33128.086223] Modules linked in:
> > > nouveau mxm_wmi ttm ipt_REJECT nf_reject_ipv4 xt_tcpudp bridge stp llc
> > > iptable_filter binfmt_misc essiv authenc uvcvideo iwlmvm
> > > videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videodev mac80211
> > > videobuf2_common snd_hda_codec_realtek snd_hda_codec_generic libarc4
> > > snd_hda_intel snd_intel_dspcfg snd_hda_codec snd_hwdep intel_rapl_msr
> > > iwlwifi snd_hda_core intel_rapl_common snd_pcm x86_pkg_temp_thermal
> > > snd_seq mei_me cfg80211 snd_seq_device snd_timer mei intel_powerclamp
> > > snd soundcore ideapad_laptop coretemp sparse_keymap ip_tables x_tables
> > > dm_crypt hid_generic usbhid hid i915 intel_gtt i2c_algo_bit
> > > drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm
> > > aesni_intel glue_helper crypto_simd r8169 ahci cryptd psmouse libahci
> > > realtek drm_panel_orientation_quirks video wmi
> > > May 16 13:28:46 serpens kernel: [33128.086247] CPU: 0 PID: 0 Comm:
> > > swapper/0 Not tainted 5.6.10 #52
> > > May 16 13:28:46 serpens kernel: [33128.086247] Hardware name: LENOVO
> > > 20378/Lenovo Y50-70, BIOS 9ECN36WW(V2.00) 01/12/2015
> > > May 16 13:28:46 serpens kernel: [33128.086249] RIP:
> > > 0010:x86_pmu_del+0x140/0x160
> > > May 16 13:28:46 serpens kernel: [33128.086250] Code: 63 d8 48 c7 84 dd
> > > 20 07 00 00 00 00 00 00 4c 89 e7 89 85 14 02 00 00 e8 fe 27 16 00 e9 ef
> > > fe ff ff 44 8d 6b 01 e9 5d ff ff ff <0f> 0b 5b 5d 41 5c 41 5d c3 31 db
> > > e9 41 ff ff ff 41 bd 01 00 00 00
> > > May 16 13:28:46 serpens kernel: [33128.086251] RSP:
> > > 0018:a2a380003e40 EFLAGS: 00010046
> > > May 16 13:28:46 serpens kernel: [33128.086252] RAX: 0005
> > > RBX: 0005 RCX: 0010
> > > May 16 13:28:46 serpens kernel: [33128.086253] RDX: 0005
> > > RSI: 0005 RDI: 907c4d4b1000
> > > May 16 13:28:46 serpens kernel: [33128.086254] RBP: 907d932125a0
> > > R08: 0002 R09: 00029f80
> > > May 16 13:28:46 serpens kernel: [33128.086254] R10: a2a380003eb0
> > > R11:  R12: 907c4d4b1000
> > > May 16 13:28:46 serpens kernel: [33128.086255] R13: 0006
> > > R14: 907d9326fc0c R15: 907d9326fb00
> > > May 16 13:28:46 serpens kernel: [33128.086256] FS:
> > > () GS:907d9320() knlGS:
> > > May 16 13:28:46 serpens kernel: [33128.086256] CS:  0010 DS:  ES:
> > >  CR0: 80050033
> > > May 16 13:28:46 serpens kernel: [33128.086257] CR2: 7f713c5b44b0
> > > CR3: 8300a001 CR4: 001606f0
> > > May 16 13:28:46 serpens kernel: [33128.086257] Call Trace:
> > > May 16 13:28:46 serpens kernel: [33128.086259]  
> > > May 16 13:28:46 serpens kernel: [33128.086263]
> > > event_sched_out.isra.116+0x89/0x1f0
> > > May 16 13:28:46 serpens kernel: [33128.086264]
> > > group_sched_out.part.118+0x55/0xd0
> > > May 16 13:28:46 serpens kernel: [33128.086265] ctx_sched_out+0x207/0x240
> > > May 16 13:28:46 serpens kernel: [33128.086267]
> > > perf_mux_hrtimer_handler+0x267/0x310
> > > May 16 13:28:46 serpens kernel: [33128.086269]  ?
> > > __perf_install_in_context+0x220/0x220
> > > May 16 13:28:46 serpens kernel: [33128.086270]
> > > __hrtimer_run_queues+0xfa/0x260
> > > May 16 13:28:46 serpens kernel: [33128.086272] 
> > > 

Re: [RFC 1/2] devlink: add simple fw crash helpers

2020-05-21 Thread Luis Chamberlain
On Tue, May 19, 2020 at 02:15:30PM -0700, Jakub Kicinski wrote:
> Add infra for creating devlink instances for a device to report

Thanks for doing this series as a PoC, as a counterproposal to the
module_firmware_crashed() call which I proposed, which taints both the
kernel and the module with a firmware crash flag.

For those not familiar with devlink:

https://lwn.net/Articles/677967/
https://www.kernel.org/doc/html/latest/networking/devlink/index.html

The github page also is now 404 as Jiri merged that stuff into iproute2:

git://git.kernel.org/pub/scm/network/iproute2/iproute2.git

> fw crashes. This patch expects the devlink instance to be registered
> at probe time. I believe that to be the cleanest. We can also add a devm
> version of the helpers, so that we don't have to do the clean up.
> Or we can go even further and register the devlink instance only
> once error has happened (for the first time, then we can just
> find out if already registered by traversing the list like we
> do here).
> 
> With the patch applied and a sample driver converted we get:
> 
> $ devlink dev
> pci/:07:00.0
> 
> Then monitor for errors:
> 
> $ devlink mon health
> [health,status] pci/:07:00.0:
>   reporter fw
> state error error 1 recover 0
> [health,status] pci/:07:00.0:
>   reporter fw
> state error error 2 recover 0
> 
> These are the events I triggered on purpose. One can also inspect
> the health of all devices capable of reporting fw errors:
> 
> $ devlink health
> pci/:07:00.0:
>   reporter fw
> state error error 7 recover 0
> 
> Obviously drivers may upgrade to the full devlink health API
> which includes state dump, state dump auto-collect and automatic
> error recovery control.
> 
> Signed-off-by: Jakub Kicinski 
> ---
>  include/linux/devlink.h   |  11 +++
>  net/core/Makefile |   2 +-
>  net/core/devlink_simple_fw_reporter.c | 101 ++
>  3 files changed, 113 insertions(+), 1 deletion(-)
>  create mode 100644 include/linux/devlink.h
>  create mode 100644 net/core/devlink_simple_fw_reporter.c
> 
> diff --git a/include/linux/devlink.h b/include/linux/devlink.h
> new file mode 100644
> index ..2b73987eefca
> --- /dev/null
> +++ b/include/linux/devlink.h
> @@ -0,0 +1,11 @@
> +/* SPDX-License-Identifier: GPL-2.0-or-later */
> +#ifndef _LINUX_DEVLINK_H_
> +#define _LINUX_DEVLINK_H_
> +
> +struct device;
> +
> +void devlink_simple_fw_reporter_prepare(struct device *dev);
> +void devlink_simple_fw_reporter_cleanup(struct device *dev);
> +void devlink_simple_fw_reporter_report_crash(struct device *dev);
> +
> +#endif
> diff --git a/net/core/Makefile b/net/core/Makefile
> index 3e2c378e5f31..6f1513781c17 100644
> --- a/net/core/Makefile
> +++ b/net/core/Makefile
> @@ -31,7 +31,7 @@ obj-$(CONFIG_LWTUNNEL_BPF) += lwt_bpf.o
>  obj-$(CONFIG_BPF_STREAM_PARSER) += sock_map.o
>  obj-$(CONFIG_DST_CACHE) += dst_cache.o
>  obj-$(CONFIG_HWBM) += hwbm.o
> -obj-$(CONFIG_NET_DEVLINK) += devlink.o
> +obj-$(CONFIG_NET_DEVLINK) += devlink.o devlink_simple_fw_reporter.o

This was looking super sexy up to here. This is networking specific.
We want something generic for *anything* that requests firmware.

I'm afraid this won't work for something generic. I don't think it's
throw-away work though, the idea to provide a generic interface to
dump firmware through netlink might be nice for networking, or other
things.

But I have a feeling we'll want something still more generic than this.

So networking may want to be aware that a firmware crash happened as
part of this network device health thing, but firmware crashing is a
generic thing.

I have now extended my patch set to include uevents, and I am more
convinced than ever that we need the taint.

  Luis

>  obj-$(CONFIG_GRO_CELLS) += gro_cells.o
>  obj-$(CONFIG_FAILOVER) += failover.o
>  obj-$(CONFIG_BPF_SYSCALL) += bpf_sk_storage.o
> diff --git a/net/core/devlink_simple_fw_reporter.c 
> b/net/core/devlink_simple_fw_reporter.c
> new file mode 100644
> index ..48dde9123c3c
> --- /dev/null
> +++ b/net/core/devlink_simple_fw_reporter.c
> @@ -0,0 +1,101 @@
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +struct devlink_simple_fw_reporter {
> + struct list_head list;
> + struct devlink_health_reporter *reporter;
> +};
> +
> +
> +static LIST_HEAD(devlink_simple_fw_reporters);
> +static DEFINE_MUTEX(devlink_simple_fw_reporters_mutex);
> +
> +static const struct devlink_health_reporter_ops simple_devlink_health = {
> + .name = "fw",
> +};
> +
> +static const struct devlink_ops simple_devlink_ops = {
> +};
> +
> +static struct devlink_simple_fw_reporter *
> +devlink_simple_fw_reporter_find_for_dev(struct device *dev)
> +{
> + struct devlink_simple_fw_reporter *simple_devlink, *ret = NULL;
> + struct devlink *devlink;
> +
> + mutex_lock(&devlink_simple_fw_reporters_mutex);
> + list_for_each_entry(simple_devlink, &devlink_simple_fw_reporters,
> + list) {
> +

[PATCH v2 0/4] mm/gup, drm/i915: refactor gup_fast, convert to pin_user_pages()

2020-05-21 Thread John Hubbard
The purpose of posting this series is to launch a test in the
intel-gfx-ci tree. (The patches have already been merged into Andrew's
linux-mm tree.)

This applies to today's linux.git (note the base-commit tag at the
bottom).

Changes since V1:

* Fixed a bug in the refactoring patch: added FOLL_FAST_ONLY to the
  list of gup_flags *not* to WARN() on. This led to a failure in the
  first intel-gfx-ci test run [1].

[1] 
https://lore.kernel.org/r/159008745422.32320.5724805750977048...@build.alporthouse.com

Original cover letter:

This needs to go through Andrew's -mm tree, due to adding a new gup.c
routine. However, I would really love to have some testing from the
drm/i915 folks, because I haven't been able to run-time test that part
of it.

Otherwise, though, the series has passed my basic run time testing:
some LTP tests, some xfs and ext4 non-destructive xfstests, and an
assortment of other smaller ones: vm selftests, io_uring_register, a
few more. But that's only on one particular machine. Also, cross-compile
tests for half a dozen arches all pass.

Details:

In order to convert the drm/i915 driver from get_user_pages() to
pin_user_pages(), a FOLL_PIN equivalent of __get_user_pages_fast() was
required. That led to refactoring __get_user_pages_fast(), with the
following goals:

1) As above: provide a pin_user_pages*() routine for drm/i915 to call,
   in place of __get_user_pages_fast(),

2) Get rid of the gup.c duplicate code for walking page tables with
   interrupts disabled. This duplicate code is a minor maintenance
   problem anyway.

3) Make it easy for an upcoming patch from Souptick, which aims to
   convert __get_user_pages_fast() to use a gup_flags argument, instead
   of a bool writeable arg.  Also, if this series looks good, we can
   ask Souptick to change the name as well, to whatever the consensus
   is. My initial recommendation is: get_user_pages_fast_only(), to
   match the new pin_user_pages_fast_only().

John Hubbard (4):
  mm/gup: move __get_user_pages_fast() down a few lines in gup.c
  mm/gup: refactor and de-duplicate gup_fast() code
  mm/gup: introduce pin_user_pages_fast_only()
  drm/i915: convert get_user_pages() --> pin_user_pages()

 drivers/gpu/drm/i915/gem/i915_gem_userptr.c |  22 +--
 include/linux/mm.h  |   3 +
 mm/gup.c| 153 
 3 files changed, 109 insertions(+), 69 deletions(-)


base-commit: 051143e1602d90ea71887d92363edd539d411de5
-- 
2.26.2



[PATCH v2 1/4] mm/gup: move __get_user_pages_fast() down a few lines in gup.c

2020-05-21 Thread John Hubbard
This is in order to avoid a forward declaration of
internal_get_user_pages_fast(), in the next patch.

This is code movement only--all generated code should
be identical.

Signed-off-by: John Hubbard 
---
 mm/gup.c | 112 +++
 1 file changed, 56 insertions(+), 56 deletions(-)

diff --git a/mm/gup.c b/mm/gup.c
index 50cd9323efff..4502846d57f9 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -2666,62 +2666,6 @@ static bool gup_fast_permitted(unsigned long start, unsigned long end)
 }
 #endif
 
-/*
- * Like get_user_pages_fast() except it's IRQ-safe in that it won't fall back to
- * the regular GUP.
- * Note a difference with get_user_pages_fast: this always returns the
- * number of pages pinned, 0 if no pages were pinned.
- *
- * If the architecture does not support this function, simply return with no
- * pages pinned.
- */
-int __get_user_pages_fast(unsigned long start, int nr_pages, int write,
- struct page **pages)
-{
-   unsigned long len, end;
-   unsigned long flags;
-   int nr_pinned = 0;
-   /*
-* Internally (within mm/gup.c), gup fast variants must set FOLL_GET,
-* because gup fast is always a "pin with a +1 page refcount" request.
-*/
-   unsigned int gup_flags = FOLL_GET;
-
-   if (write)
-   gup_flags |= FOLL_WRITE;
-
-   start = untagged_addr(start) & PAGE_MASK;
-   len = (unsigned long) nr_pages << PAGE_SHIFT;
-   end = start + len;
-
-   if (end <= start)
-   return 0;
-   if (unlikely(!access_ok((void __user *)start, len)))
-   return 0;
-
-   /*
-* Disable interrupts.  We use the nested form as we can already have
-* interrupts disabled by get_futex_key.
-*
-* With interrupts disabled, we block page table pages from being
-* freed from under us. See struct mmu_table_batch comments in
-* include/asm-generic/tlb.h for more details.
-*
-* We do not adopt an rcu_read_lock(.) here as we also want to
-* block IPIs that come from THPs splitting.
-*/
-
-   if (IS_ENABLED(CONFIG_HAVE_FAST_GUP) &&
-   gup_fast_permitted(start, end)) {
-   local_irq_save(flags);
-   gup_pgd_range(start, end, gup_flags, pages, &nr_pinned);
-   local_irq_restore(flags);
-   }
-
-   return nr_pinned;
-}
-EXPORT_SYMBOL_GPL(__get_user_pages_fast);
-
 static int __gup_longterm_unlocked(unsigned long start, int nr_pages,
   unsigned int gup_flags, struct page **pages)
 {
@@ -2794,6 +2738,62 @@ static int internal_get_user_pages_fast(unsigned long start, int nr_pages,
return ret;
 }
 
+/*
+ * Like get_user_pages_fast() except it's IRQ-safe in that it won't fall back to
+ * the regular GUP.
+ * Note a difference with get_user_pages_fast: this always returns the
+ * number of pages pinned, 0 if no pages were pinned.
+ *
+ * If the architecture does not support this function, simply return with no
+ * pages pinned.
+ */
+int __get_user_pages_fast(unsigned long start, int nr_pages, int write,
+ struct page **pages)
+{
+   unsigned long len, end;
+   unsigned long flags;
+   int nr_pinned = 0;
+   /*
+* Internally (within mm/gup.c), gup fast variants must set FOLL_GET,
+* because gup fast is always a "pin with a +1 page refcount" request.
+*/
+   unsigned int gup_flags = FOLL_GET;
+
+   if (write)
+   gup_flags |= FOLL_WRITE;
+
+   start = untagged_addr(start) & PAGE_MASK;
+   len = (unsigned long) nr_pages << PAGE_SHIFT;
+   end = start + len;
+
+   if (end <= start)
+   return 0;
+   if (unlikely(!access_ok((void __user *)start, len)))
+   return 0;
+
+   /*
+* Disable interrupts.  We use the nested form as we can already have
+* interrupts disabled by get_futex_key.
+*
+* With interrupts disabled, we block page table pages from being
+* freed from under us. See struct mmu_table_batch comments in
+* include/asm-generic/tlb.h for more details.
+*
+* We do not adopt an rcu_read_lock(.) here as we also want to
+* block IPIs that come from THPs splitting.
+*/
+
+   if (IS_ENABLED(CONFIG_HAVE_FAST_GUP) &&
+   gup_fast_permitted(start, end)) {
+   local_irq_save(flags);
+   gup_pgd_range(start, end, gup_flags, pages, &nr_pinned);
+   local_irq_restore(flags);
+   }
+
+   return nr_pinned;
+}
+EXPORT_SYMBOL_GPL(__get_user_pages_fast);
+
 /**
  * get_user_pages_fast() - pin user pages in memory
  * @start:  starting user address
-- 
2.26.2



[PATCH v2 2/4] mm/gup: refactor and de-duplicate gup_fast() code

2020-05-21 Thread John Hubbard
There were two nearly identical sets of code for gup_fast()
style of walking the page tables with interrupts disabled.
This has led to the usual maintenance problems that arise from
having duplicated code.

There is already a core internal routine in gup.c for gup_fast(),
so just enhance it very slightly: allow skipping the fall-back
to "slow" (regular) get_user_pages(), via the new FOLL_FAST_ONLY
flag. Then, just call internal_get_user_pages_fast() from
__get_user_pages_fast(), and adjust the API to match pre-existing
API behavior.

There is a change in behavior from this refactoring: the nested
form of interrupt disabling is used in all gup_fast() variants
now. That's because there is only one place that interrupt disabling
for page walking is done, and so the safer form is required. This
should, if anything, eliminate possible (rare) bugs, because the
non-nested form of enabling interrupts was fragile at best.

Signed-off-by: John Hubbard 
---
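
To illustrate why the nested (save/restore) form is the safer choice, here
is a toy user-space model of the interrupt-enable flag. It is a sketch
under stated assumptions, not kernel code: the sim_irq_*() helpers and the
two walk_*() functions are hypothetical names that only mimic the shape of
local_irq_disable()/local_irq_enable() and
local_irq_save()/local_irq_restore().

```c
#include <assert.h>

/* Toy model of the CPU interrupt-enable flag; 1 means enabled. */
static int irqs_enabled = 1;

static void sim_irq_disable(void) { irqs_enabled = 0; }
static void sim_irq_enable(void)  { irqs_enabled = 1; }

static void sim_irq_save(unsigned long *flags)
{
	*flags = (unsigned long)irqs_enabled;
	irqs_enabled = 0;
}

static void sim_irq_restore(unsigned long flags)
{
	irqs_enabled = (int)flags;
}

/* Non-nested form: unconditionally re-enables on the way out. */
static void walk_non_nested(void)
{
	sim_irq_disable();
	/* ... page table walk ... */
	sim_irq_enable();
}

/* Nested form: puts back whatever state the caller had. */
static void walk_nested(void)
{
	unsigned long flags;

	sim_irq_save(&flags);
	/* ... page table walk ... */
	sim_irq_restore(flags);
}
```

If a caller enters with interrupts already disabled, walk_non_nested()
hands them back enabled, while walk_nested() leaves the caller's state
untouched.
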
 include/linux/mm.h |  1 +
 mm/gup.c   | 63 ++
 2 files changed, 31 insertions(+), 33 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index a5594ac9ebe3..84b601cab699 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2782,6 +2782,7 @@ struct page *follow_page(struct vm_area_struct *vma, unsigned long address,
 #define FOLL_LONGTERM  0x10000 /* mapping lifetime is indefinite: see below */
 #define FOLL_SPLIT_PMD 0x20000 /* split huge pmd before returning */
 #define FOLL_PIN   0x40000 /* pages must be released via unpin_user_page */
+#define FOLL_FAST_ONLY 0x80000 /* gup_fast: prevent fall-back to slow gup */
 
 /*
  * FOLL_PIN and FOLL_LONGTERM may be used in various combinations with each
diff --git a/mm/gup.c b/mm/gup.c
index 4502846d57f9..4564b0dc7d0b 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -2694,10 +2694,12 @@ static int internal_get_user_pages_fast(unsigned long start, int nr_pages,
struct page **pages)
 {
unsigned long addr, len, end;
+   unsigned long flags;
int nr_pinned = 0, ret = 0;
 
if (WARN_ON_ONCE(gup_flags & ~(FOLL_WRITE | FOLL_LONGTERM |
-  FOLL_FORCE | FOLL_PIN | FOLL_GET)))
+  FOLL_FORCE | FOLL_PIN | FOLL_GET |
+  FOLL_FAST_ONLY)))
return -EINVAL;
 
start = untagged_addr(start) & PAGE_MASK;
@@ -2710,15 +2712,26 @@ static int internal_get_user_pages_fast(unsigned long start, int nr_pages,
if (unlikely(!access_ok((void __user *)start, len)))
return -EFAULT;
 
+   /*
+* Disable interrupts. The nested form is used, in order to allow full,
+* general purpose use of this routine.
+*
+* With interrupts disabled, we block page table pages from being
+* freed from under us. See struct mmu_table_batch comments in
+* include/asm-generic/tlb.h for more details.
+*
+* We do not adopt an rcu_read_lock(.) here as we also want to
+* block IPIs that come from THPs splitting.
+*/
if (IS_ENABLED(CONFIG_HAVE_FAST_GUP) &&
gup_fast_permitted(start, end)) {
-   local_irq_disable();
+   local_irq_save(flags);
gup_pgd_range(addr, end, gup_flags, pages, &nr_pinned);
-   local_irq_enable();
+   local_irq_restore(flags);
ret = nr_pinned;
}
 
-   if (nr_pinned < nr_pages) {
+   if (nr_pinned < nr_pages && !(gup_flags & FOLL_FAST_ONLY)) {
/* Try to get the remaining pages with get_user_pages */
start += nr_pinned << PAGE_SHIFT;
pages += nr_pinned;
@@ -2750,45 +2763,29 @@ static int internal_get_user_pages_fast(unsigned long start, int nr_pages,
 int __get_user_pages_fast(unsigned long start, int nr_pages, int write,
  struct page **pages)
 {
-   unsigned long len, end;
-   unsigned long flags;
-   int nr_pinned = 0;
+   int nr_pinned;
/*
 * Internally (within mm/gup.c), gup fast variants must set FOLL_GET,
 * because gup fast is always a "pin with a +1 page refcount" request.
+*
+* FOLL_FAST_ONLY is required in order to match the API description of
+* this routine: no fall back to regular ("slow") GUP.
 */
-   unsigned int gup_flags = FOLL_GET;
+   unsigned int gup_flags = FOLL_GET | FOLL_FAST_ONLY;
 
if (write)
gup_flags |= FOLL_WRITE;
 
-   start = untagged_addr(start) & PAGE_MASK;
-   len = (unsigned long) nr_pages << PAGE_SHIFT;
-   end = start + len;
-
-   if (end <= start)
-   return 0;
-   if (unlikely(!access_ok((void __user *)start, len)))
-   return 0;
-
+   nr_pinned = internal_get_user_pages_fast(start, nr_pages, gup_flags,
+ 

[PATCH v2 4/4] drm/i915: convert get_user_pages() --> pin_user_pages()

2020-05-21 Thread John Hubbard
This code was using get_user_pages*(), in a "Case 2" scenario
(DMA/RDMA), using the categorization from [1]. That means that it's
time to convert the get_user_pages*() + put_page() calls to
pin_user_pages*() + unpin_user_pages() calls.

There is some helpful background in [2]: basically, this is a small
part of fixing a long-standing disconnect between pinning pages, and
file systems' use of those pages.

[1] Documentation/core-api/pin_user_pages.rst

[2] "Explicit pinning of user-space pages":
https://lwn.net/Articles/807108/

Signed-off-by: John Hubbard 
---
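
As a rough illustration of why the acquire and release calls have to be
converted together, here is a simplified user-space model of reference
counting with a pin bias. It is a sketch, not the kernel implementation:
struct mock_page, the mock_*() helpers and the MOCK_PIN_BIAS value are
assumptions made for this example.

```c
#include <assert.h>

/* Assumed bias for illustration; chosen to be much larger than 1. */
#define MOCK_PIN_BIAS 1024

struct mock_page {
	int refcount;
};

static void mock_get_page(struct mock_page *p)   { p->refcount += 1; }
static void mock_put_page(struct mock_page *p)   { p->refcount -= 1; }
static void mock_pin_page(struct mock_page *p)   { p->refcount += MOCK_PIN_BIAS; }
static void mock_unpin_page(struct mock_page *p) { p->refcount -= MOCK_PIN_BIAS; }

/* A page "looks pinned" when its count is at least one bias unit. */
static int mock_page_maybe_pinned(const struct mock_page *p)
{
	return p->refcount >= MOCK_PIN_BIAS;
}
```

In this model, releasing a pinned page with mock_put_page() instead of
mock_unpin_page() leaves most of the bias behind, so the page keeps
looking pinned indefinitely.
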
 drivers/gpu/drm/i915/gem/i915_gem_userptr.c | 22 -
 1 file changed, 13 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c 
b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
index 7ffd7afeb7a5..b55ac7563189 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
@@ -471,7 +471,7 @@ __i915_gem_userptr_get_pages_worker(struct work_struct *_work)
down_read(&mm->mmap_sem);
locked = 1;
}
-   ret = get_user_pages_remote
+   ret = pin_user_pages_remote
(work->task, mm,
 obj->userptr.ptr + pinned * PAGE_SIZE,
 npages - pinned,
@@ -507,7 +507,7 @@ __i915_gem_userptr_get_pages_worker(struct work_struct *_work)
}
mutex_unlock(&obj->mm.lock);
 
-   release_pages(pvec, pinned);
+   unpin_user_pages(pvec, pinned);
kvfree(pvec);
 
i915_gem_object_put(obj);
@@ -564,6 +564,7 @@ static int i915_gem_userptr_get_pages(struct drm_i915_gem_object *obj)
struct sg_table *pages;
bool active;
int pinned;
+   unsigned int gup_flags = 0;
 
/* If userspace should engineer that these pages are replaced in
 * the vma between us binding this page into the GTT and completion
@@ -598,11 +599,14 @@ static int i915_gem_userptr_get_pages(struct drm_i915_gem_object *obj)
  GFP_KERNEL |
  __GFP_NORETRY |
  __GFP_NOWARN);
-   if (pvec) /* defer to worker if malloc fails */
-   pinned = __get_user_pages_fast(obj->userptr.ptr,
-  num_pages,
-  !i915_gem_object_is_readonly(obj),
-  pvec);
+   /* defer to worker if malloc fails */
+   if (pvec) {
+   if (!i915_gem_object_is_readonly(obj))
+   gup_flags |= FOLL_WRITE;
+   pinned = pin_user_pages_fast_only(obj->userptr.ptr,
+ num_pages, gup_flags,
+ pvec);
+   }
}
 
active = false;
@@ -620,7 +624,7 @@ static int i915_gem_userptr_get_pages(struct drm_i915_gem_object *obj)
__i915_gem_userptr_set_active(obj, true);
 
if (IS_ERR(pages))
-   release_pages(pvec, pinned);
+   unpin_user_pages(pvec, pinned);
kvfree(pvec);
 
return PTR_ERR_OR_ZERO(pages);
@@ -675,7 +679,7 @@ i915_gem_userptr_put_pages(struct drm_i915_gem_object *obj,
}
 
mark_page_accessed(page);
-   put_page(page);
+   unpin_user_page(page);
}
obj->mm.dirty = false;
 
-- 
2.26.2



Re: [PATCH] PM: runtime: clk: Fix clk_pm_runtime_get() error path

2020-05-21 Thread Marek Szyprowski
Hi Rafael,

On 21.05.2020 19:08, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki 
>
> clk_pm_runtime_get() assumes that the PM-runtime usage counter will
> be dropped by pm_runtime_get_sync() on errors, which is not the case,
> so PM-runtime references to devices acquired by the former are leaked
> on errors returned by the latter.
>
> Fix this by modifying clk_pm_runtime_get() to drop the reference if
> pm_runtime_get_sync() returns an error.
>
> Fixes: 9a34b45397e5 clk: Add support for runtime PM
> Cc: 4.15+  # 4.15+
> Signed-off-by: Rafael J. Wysocki 

Frankly, I would rather fix pm_runtime_get_sync() itself instead of
fixing the error path everywhere in the kernel. The current behavior of
pm_runtime_get_sync() is completely counter-intuitive. I bet that 99% of
the places where it is called assume that no special fixup is needed in
case of failure. This is one of the most common runtime PM functions,
and it is a really common pattern in drivers to call:

pm_runtime_get_sync()

do something with the hardware

pm_runtime_put()

Do you really want to fix the error paths of all such calls?
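
To make the imbalance concrete, here is a small user-space model of the
behavior described in the patch. It is only a sketch: struct mock_dev,
mock_get_sync(), mock_put_noidle(), get_leaky() and get_balanced() are
hypothetical names that mimic the counter handling, not the runtime PM
API itself.

```c
#include <assert.h>

struct mock_dev {
	int usage;	/* models the runtime-PM usage counter */
	int resume_err;	/* nonzero makes the mock resume fail */
};

/* Models pm_runtime_get_sync(): the counter is bumped even on failure. */
static int mock_get_sync(struct mock_dev *dev)
{
	dev->usage++;
	return dev->resume_err ? -1 : 0;
}

/* Models pm_runtime_put_noidle(): drop the counter, nothing else. */
static void mock_put_noidle(struct mock_dev *dev)
{
	dev->usage--;
}

/* Caller that assumes failure leaves nothing to undo. */
static int get_leaky(struct mock_dev *dev)
{
	int ret = mock_get_sync(dev);

	return ret < 0 ? ret : 0;
}

/* Caller that balances the counter on the error path. */
static int get_balanced(struct mock_dev *dev)
{
	int ret = mock_get_sync(dev);

	if (ret < 0) {
		mock_put_noidle(dev);
		return ret;
	}
	return 0;
}
```

After a failed resume, get_leaky() leaves the counter at 1 while
get_balanced() brings it back to 0, which is the difference the patch
below addresses for clk_pm_runtime_get().
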


> ---
>   drivers/clk/clk.c |6 +-
>   1 file changed, 5 insertions(+), 1 deletion(-)
>
> Index: linux-pm/drivers/clk/clk.c
> ===
> --- linux-pm.orig/drivers/clk/clk.c
> +++ linux-pm/drivers/clk/clk.c
> @@ -114,7 +114,11 @@ static int clk_pm_runtime_get(struct clk
>   return 0;
>   
>   ret = pm_runtime_get_sync(core->dev);
> - return ret < 0 ? ret : 0;
> + if (ret < 0) {
> + pm_runtime_put_noidle(core->dev);
> + return ret;
> + }
> + return 0;
>   }
>   
>   static void clk_pm_runtime_put(struct clk_core *core)
>
>
>
>
Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland



[PATCH v2 3/4] mm/gup: introduce pin_user_pages_fast_only()

2020-05-21 Thread John Hubbard
This is the FOLL_PIN equivalent of __get_user_pages_fast(),
except with a more descriptive name, and gup_flags instead of
a boolean "write" in the argument list.

Signed-off-by: John Hubbard 
---
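
The return-value contract of the new wrapper can be sketched in user
space as follows. This is an illustration only: the MOCK_FOLL_* bit
values, mock_internal_fast() and mock_pin_fast_only() are hypothetical,
and the "fail" parameter exists purely to simulate an error from the
core routine.

```c
#include <assert.h>

/* Hypothetical flag bits for this sketch only; not the kernel's values. */
#define MOCK_FOLL_GET        0x1u
#define MOCK_FOLL_PIN        0x2u
#define MOCK_FOLL_FAST_ONLY  0x4u

static unsigned int last_flags;	/* records what the core routine was given */

/* Stand-in for the internal routine: may return a count or a negative errno. */
static int mock_internal_fast(int nr_pages, unsigned int gup_flags, int fail)
{
	last_flags = gup_flags;
	return fail ? -14 : nr_pages;
}

static int mock_pin_fast_only(int nr_pages, unsigned int gup_flags, int fail)
{
	int nr_pinned;

	/* The two reference styles are mutually exclusive: report 0 pages. */
	if (gup_flags & MOCK_FOLL_GET)
		return 0;

	gup_flags |= MOCK_FOLL_PIN | MOCK_FOLL_FAST_ONLY;
	nr_pinned = mock_internal_fast(nr_pages, gup_flags, fail);

	/* Callers only ever see a page count, never a negative errno. */
	return nr_pinned < 0 ? 0 : nr_pinned;
}
```

The wrapper always adds the pin and fast-only flags itself, so callers
pass only the access flags they care about and never have to handle a
negative return.
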
 include/linux/mm.h |  2 ++
 mm/gup.c   | 36 
 2 files changed, 38 insertions(+)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 84b601cab699..98be7289d7e9 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1820,6 +1820,8 @@ extern int mprotect_fixup(struct vm_area_struct *vma,
  */
 int __get_user_pages_fast(unsigned long start, int nr_pages, int write,
  struct page **pages);
+int pin_user_pages_fast_only(unsigned long start, int nr_pages,
+unsigned int gup_flags, struct page **pages);
 /*
  * per-process(per-mm_struct) statistics.
  */
diff --git a/mm/gup.c b/mm/gup.c
index 4564b0dc7d0b..6fa9b2016a53 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -2859,6 +2859,42 @@ int pin_user_pages_fast(unsigned long start, int nr_pages,
 }
 EXPORT_SYMBOL_GPL(pin_user_pages_fast);
 
+/*
+ * This is the FOLL_PIN equivalent of __get_user_pages_fast(). Behavior is the
+ * same, except that this one sets FOLL_PIN instead of FOLL_GET.
+ *
+ * The API rules are the same, too: no negative values may be returned.
+ */
+int pin_user_pages_fast_only(unsigned long start, int nr_pages,
+unsigned int gup_flags, struct page **pages)
+{
+   int nr_pinned;
+
+   /*
+* FOLL_GET and FOLL_PIN are mutually exclusive. Note that the API
+* rules require returning 0, rather than -errno:
+*/
+   if (WARN_ON_ONCE(gup_flags & FOLL_GET))
+   return 0;
+   /*
+* FOLL_FAST_ONLY is required in order to match the API description of
+* this routine: no fall back to regular ("slow") GUP.
+*/
+   gup_flags |= (FOLL_PIN | FOLL_FAST_ONLY);
+   nr_pinned = internal_get_user_pages_fast(start, nr_pages, gup_flags,
+pages);
+   /*
+* This routine is not allowed to return negative values. However,
+* internal_get_user_pages_fast() *can* return -errno. Therefore,
+* correct for that here:
+*/
+   if (nr_pinned < 0)
+   nr_pinned = 0;
+
+   return nr_pinned;
+}
+EXPORT_SYMBOL_GPL(pin_user_pages_fast_only);
+
 /**
  * pin_user_pages_remote() - pin pages of a remote process (task != current)
  *
-- 
2.26.2



linux-next: manual merge of the device-mapper tree with the block tree

2020-05-21 Thread Stephen Rothwell
Hi all,

Today's linux-next merge of the device-mapper tree got a conflict in:

  drivers/md/dm-zoned-metadata.c

between commit:

  c64644ce363b ("block: remove the error_sector argument to blkdev_issue_flush")

from the block tree and commit:

  bf28a3ba0986 ("dm zoned: store device in struct dmz_sb")

from the device-mapper tree.

I fixed it up (see below) and can carry the fix as necessary. This
is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging.  You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

-- 
Cheers,
Stephen Rothwell

diff --cc drivers/md/dm-zoned-metadata.c
index bf2245370305,db0dc2b5d44d..
--- a/drivers/md/dm-zoned-metadata.c
+++ b/drivers/md/dm-zoned-metadata.c
@@@ -659,9 -816,10 +816,10 @@@ static int dmz_write_sb(struct dmz_meta
sb->crc = 0;
sb->crc = cpu_to_le32(crc32_le(sb_gen, (unsigned char *)sb, 
DMZ_BLOCK_SIZE));
  
-   ret = dmz_rdwr_block(zmd, REQ_OP_WRITE, block, mblk->page);
+   ret = dmz_rdwr_block(dev, REQ_OP_WRITE, zmd->sb[set].block,
+mblk->page);
if (ret == 0)
-   ret = blkdev_issue_flush(zmd->dev->bdev, GFP_NOIO);
 -  ret = blkdev_issue_flush(dev->bdev, GFP_NOIO, NULL);
++  ret = blkdev_issue_flush(dev->bdev, GFP_NOIO);
  
return ret;
  }
@@@ -703,7 -862,7 +862,7 @@@ static int dmz_write_dirty_mblocks(stru
  
/* Flush drive cache (this will also sync data) */
if (ret == 0)
-   ret = blkdev_issue_flush(zmd->dev->bdev, GFP_NOIO);
 -  ret = blkdev_issue_flush(dev->bdev, GFP_NOIO, NULL);
++  ret = blkdev_issue_flush(dev->bdev, GFP_NOIO);
  
return ret;
  }
@@@ -772,7 -933,7 +933,7 @@@ int dmz_flush_metadata(struct dmz_metad
  
/* If there are no dirty metadata blocks, just flush the device cache */
if (list_empty(_list)) {
-   ret = blkdev_issue_flush(zmd->dev->bdev, GFP_NOIO);
 -  ret = blkdev_issue_flush(dev->bdev, GFP_NOIO, NULL);
++  ret = blkdev_issue_flush(dev->bdev, GFP_NOIO);
goto err;
}
  




Re: [PATCH v2 01/15] taint: add module firmware crash taint support

2020-05-21 Thread Luis Chamberlain
On Tue, May 19, 2020 at 06:42:31PM +0200, Jessica Yu wrote:
> +++ Luis Chamberlain [15/05/20 21:28 +]:
> > Device driver firmware can crash, and sometimes, this can leave your
> > system in a state which makes the device or subsystem completely
> > useless. Detecting this by inspecting /proc/sys/kernel/tainted instead
> > of scraping some magical words from the kernel log, which is driver
> > specific, is much easier. So instead provide a helper which lets drivers
> > annotate this.
> > 
> > Once this happens, scrapers can easily look for modules taint flags
> > for a firmware crash. This will taint both the kernel and respective
> > calling module.
> > 
> > The new helper module_firmware_crashed() uses LOCKDEP_STILL_OK as this
> > fact should in no way shape or form affect lockdep. This taint is device
> > driver specific.
> > 
> > Signed-off-by: Luis Chamberlain 
> > ---
> > Documentation/admin-guide/tainted-kernels.rst |  6 ++
> > include/linux/kernel.h|  3 ++-
> > include/linux/module.h| 13 +
> > include/trace/events/module.h |  3 ++-
> > kernel/module.c   |  5 +++--
> > kernel/panic.c|  1 +
> > tools/debugging/kernel-chktaint   |  7 +++
> > 7 files changed, 34 insertions(+), 4 deletions(-)
> > 
> > diff --git a/Documentation/admin-guide/tainted-kernels.rst 
> > b/Documentation/admin-guide/tainted-kernels.rst
> > index 71e9184a9079..92530f1d60ae 100644
> > --- a/Documentation/admin-guide/tainted-kernels.rst
> > +++ b/Documentation/admin-guide/tainted-kernels.rst
> > @@ -100,6 +100,7 @@ Bit  Log  Number  Reason that got the kernel tainted
> >  15  _/K   32768  kernel has been live patched
> >  16  _/X   65536  auxiliary taint, defined for and used by distros
> >  17  _/T  131072  kernel was built with the struct randomization plugin
> > + 18  _/Q  262144  driver firmware crash annotation
> > ===  ===  ==  
> > 
> > Note: The character ``_`` is representing a blank in this table to make 
> > reading
> > @@ -162,3 +163,8 @@ More detailed explanation for tainting
> >  produce extremely unusual kernel structure layouts (even performance
> >  pathological ones), which is important to know when debugging. Set at
> >  build time.
> > +
> > + 18) ``Q`` used by device drivers to annotate that the device driver's 
> > firmware
> > + has crashed and the device's operation has been severely affected. The
> > + device may be left in a crippled state, requiring full driver removal 
> > /
> > + addition, system reboot, or it is unclear how long recovery will take.
> > diff --git a/include/linux/kernel.h b/include/linux/kernel.h
> > index 04a5885cec1b..19e1541c82c7 100644
> > --- a/include/linux/kernel.h
> > +++ b/include/linux/kernel.h
> > @@ -601,7 +601,8 @@ extern enum system_states {
> > #define TAINT_LIVEPATCH 15
> > #define TAINT_AUX   16
> > #define TAINT_RANDSTRUCT17
> > -#define TAINT_FLAGS_COUNT  18
> > +#define TAINT_FIRMWARE_CRASH   18
> > +#define TAINT_FLAGS_COUNT  19
> > 
> > struct taint_flag {
> > char c_true;/* character printed when tainted */
> > diff --git a/include/linux/module.h b/include/linux/module.h
> > index 2c2e988bcf10..221200078180 100644
> > --- a/include/linux/module.h
> > +++ b/include/linux/module.h
> > @@ -697,6 +697,14 @@ static inline bool is_livepatch_module(struct module 
> > *mod)
> > bool is_module_sig_enforced(void);
> > void set_module_sig_enforced(void);
> > 
> > +void add_taint_module(struct module *mod, unsigned flag,
> > + enum lockdep_ok lockdep_ok);
> > +
> > +static inline void module_firmware_crashed(void)
> > +{
> > +   add_taint_module(THIS_MODULE, TAINT_FIRMWARE_CRASH, LOCKDEP_STILL_OK);
> > +}
> 
> Just a nit: I think module_firmware_crashed() is a confusing name - it
> doesn't really tell me what it's doing, and it's not really related to
> the rest of the module_* symbols, which mostly have to do with module
> loader/module specifics. Especially since a driver can be built-in, too.
> How about taint_firmware_crashed() or something similar?

Sure.

> Also, I think we might crash in add_taint_module() if a driver is
> built into the kernel, because THIS_MODULE will be null and there is
> no null pointer check in add_taint_module(). We could unify the
> CONFIG_MODULES and !CONFIG_MODULES stubs and either add an `if (mod)`
> check in add_taint_module() or add an #ifdef MODULE check in the stub
> itself to call add_taint() or add_taint_module() as appropriate. Hope
> that makes sense.

I had to do something a bit different but I think you'll agree with it.
Will include it in my next iteration.
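
For reference, the `if (mod)` variant Jessica describes could look roughly like the user-space mock below. This is only a sketch of that one suggestion, not the different approach Luis mentions. `struct module`, `kernel_taint`, `mock_add_taint_module()` and the `this_module` parameter are stand-ins invented for illustration, and the lockdep argument is left out.

```c
#include <stddef.h>

/* Hypothetical stand-ins for kernel definitions; not the real ones. */
#define TAINT_FIRMWARE_CRASH 18

struct module {
	unsigned long taints;
};

static unsigned long kernel_taint;	/* mock of the global taint mask */

/* Always taint the kernel; taint the module only when there is one. */
static void mock_add_taint_module(struct module *mod, unsigned int flag)
{
	kernel_taint |= 1UL << flag;
	if (mod)			/* NULL when the driver is built in */
		mod->taints |= 1UL << flag;
}

/* Mock helper; the caller passes what THIS_MODULE would evaluate to. */
static void taint_firmware_crashed(struct module *this_module)
{
	mock_add_taint_module(this_module, TAINT_FIRMWARE_CRASH);
}
```

With the check in place, a built-in driver (NULL module pointer) still taints the kernel without dereferencing anything.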

  Luis


Re: Panic related to perf (bisected)

2020-05-21 Thread Tibor Billes
Hi,

On Mon, 18 May 2020, Ian Rogers wrote:

> On Sat, May 16, 2020 at 6:36 AM Billes Tibor  wrote:
> >
> > Hi,
> >
> > I've been hitting a freeze on my laptop since 5.3, but haven't got the
> > time to finish bisecting it. Now
> > I had, and here is what I found:
> >
> > - 5.2 series works correctly (tested 5.2.9 and 5.2.15)
> > - 5.3 series and newer kernels freeze. The newest I tested is 5.6.10
> > (which also freezes). There will
> >be a complete bisect log at the end of the mail.
> >
> > There are several circumstances to reproduce the freeze. At least this
> > is what I found relevant:
> > - The freeze does not occur after a fresh boot, it needs a sleep-wakeup
> > cycle.
> > - Run `perf stat -a --topdown -- sleep 8s` every minute (It is part of
> > some metrics I collect using Zabbix)
> > - Some workload (building the kernel or gaming) increases the chance of
> > freezing, but it can occur
> >without user interaction too.
> >
> > The freeze usually comes within a few minutes after wakeup. The longest
> > was about an hour. (For
> > comparison, if I don't do a sleep-wakeup, the machine works fine for 8+
> > hours).
> >
> > First, there is a warning in syslog, then I took pictures of the actual
> > panic.
> >
> > The warning:
> > May 16 13:28:46 serpens kernel: [33128.086217] [ cut here
> > ]
> > May 16 13:28:46 serpens kernel: [33128.086222] WARNING: CPU: 0 PID: 0 at
> > arch/x86/events/core.c:1506 x86_pmu_del+0x140/0x160
> > May 16 13:28:46 serpens kernel: [33128.086223] Modules linked in:
> > nouveau mxm_wmi ttm ipt_REJECT nf_reject_ipv4 xt_tcpudp bridge stp llc
> > iptable_filter binfmt_misc essiv authenc uvcvideo iwlmvm
> > videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videodev mac80211
> > videobuf2_common snd_hda_codec_realtek snd_hda_codec_generic libarc4
> > snd_hda_intel snd_intel_dspcfg snd_hda_codec snd_hwdep intel_rapl_msr
> > iwlwifi snd_hda_core intel_rapl_common snd_pcm x86_pkg_temp_thermal
> > snd_seq mei_me cfg80211 snd_seq_device snd_timer mei intel_powerclamp
> > snd soundcore ideapad_laptop coretemp sparse_keymap ip_tables x_tables
> > dm_crypt hid_generic usbhid hid i915 intel_gtt i2c_algo_bit
> > drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm
> > aesni_intel glue_helper crypto_simd r8169 ahci cryptd psmouse libahci
> > realtek drm_panel_orientation_quirks video wmi
> > May 16 13:28:46 serpens kernel: [33128.086247] CPU: 0 PID: 0 Comm:
> > swapper/0 Not tainted 5.6.10 #52
> > May 16 13:28:46 serpens kernel: [33128.086247] Hardware name: LENOVO
> > 20378/Lenovo Y50-70, BIOS 9ECN36WW(V2.00) 01/12/2015
> > May 16 13:28:46 serpens kernel: [33128.086249] RIP:
> > 0010:x86_pmu_del+0x140/0x160
> > May 16 13:28:46 serpens kernel: [33128.086250] Code: 63 d8 48 c7 84 dd
> > 20 07 00 00 00 00 00 00 4c 89 e7 89 85 14 02 00 00 e8 fe 27 16 00 e9 ef
> > fe ff ff 44 8d 6b 01 e9 5d ff ff ff <0f> 0b 5b 5d 41 5c 41 5d c3 31 db
> > e9 41 ff ff ff 41 bd 01 00 00 00
> > May 16 13:28:46 serpens kernel: [33128.086251] RSP:
> > 0018:a2a380003e40 EFLAGS: 00010046
> > May 16 13:28:46 serpens kernel: [33128.086252] RAX: 0005
> > RBX: 0005 RCX: 0010
> > May 16 13:28:46 serpens kernel: [33128.086253] RDX: 0005
> > RSI: 0005 RDI: 907c4d4b1000
> > May 16 13:28:46 serpens kernel: [33128.086254] RBP: 907d932125a0
> > R08: 0002 R09: 00029f80
> > May 16 13:28:46 serpens kernel: [33128.086254] R10: a2a380003eb0
> > R11:  R12: 907c4d4b1000
> > May 16 13:28:46 serpens kernel: [33128.086255] R13: 0006
> > R14: 907d9326fc0c R15: 907d9326fb00
> > May 16 13:28:46 serpens kernel: [33128.086256] FS:
> > () GS:907d9320() knlGS:
> > May 16 13:28:46 serpens kernel: [33128.086256] CS:  0010 DS:  ES:
> >  CR0: 80050033
> > May 16 13:28:46 serpens kernel: [33128.086257] CR2: 7f713c5b44b0
> > CR3: 8300a001 CR4: 001606f0
> > May 16 13:28:46 serpens kernel: [33128.086257] Call Trace:
> > May 16 13:28:46 serpens kernel: [33128.086259]  
> > May 16 13:28:46 serpens kernel: [33128.086263]
> > event_sched_out.isra.116+0x89/0x1f0
> > May 16 13:28:46 serpens kernel: [33128.086264]
> > group_sched_out.part.118+0x55/0xd0
> > May 16 13:28:46 serpens kernel: [33128.086265] ctx_sched_out+0x207/0x240
> > May 16 13:28:46 serpens kernel: [33128.086267]
> > perf_mux_hrtimer_handler+0x267/0x310
> > May 16 13:28:46 serpens kernel: [33128.086269]  ?
> > __perf_install_in_context+0x220/0x220
> > May 16 13:28:46 serpens kernel: [33128.086270]
> > __hrtimer_run_queues+0xfa/0x260
> > May 16 13:28:46 serpens kernel: [33128.086272] hrtimer_interrupt+0xe5/0x240
> > May 16 13:28:46 serpens kernel: [33128.086275]  ?
> > recalibrate_cpu_khz+0x10/0x10
> > May 16 13:28:46 serpens kernel: [33128.086278]
> > smp_apic_timer_interrupt+0x62/0x120
> > May 16 13:28:46 serpens kernel: [33128.086280] 

Re: [PATCH v2 12/15] ath10k: use new module_firmware_crashed()

2020-05-21 Thread Emmanuel Grumbach
>
> On Tue, May 19, 2020 at 10:37 PM Emmanuel Grumbach  
> wrote:
> > So I believe we already have this uevent, it is the devcoredump. All
> > we need is to add the unique id.
>
> I think there are a few reasons that devcoredump doesn't satisfy what
> either Luis or I want.
>
> 1) it can be disabled entirely [1], for good reasons (e.g., think of
> non-${CHIP_VENDOR} folks, who can't (and don't want to) do anything
> with the opaque dumps provided by closed-source firmware)

Ok, if all you're interested in is the information that this event
happened (as opposed to reporting a bug and providing the data), then I
agree. True, not everybody wants to or can enable devcoredump. I am just a
bit concerned that we may end up with two interfaces that notify
basically the same event. The ideal would maybe be to be able to
optionally reduce the content of the devcoredump to nothing more than
what is already in the dmesg output. But then, that is not what it is
meant to be: namely, a core dump.

> 2) not all drivers necessarily have a useful dump to provide when
> there's a crash; look at the rest of Luis's series to see the kinds of
> drivers-with-firmware that are crashing, some of which aren't dumping
> anything

Fair enough.

> 3) for those that do support devcoredump, it may be used for purposes
> that are not "crashes" -- e.g., some provide debugfs or other knobs to
> initiate dumps, for diagnostic or debugging purposes

Not sure I really think we need to care about those cases, but you
already have 2 good arguments :)

>
> Brian
>
> [1] devcd_disabled
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/base/devcoredump.c?h=v5.6#n22


Re: [PATCH] [v2] usb: musb: Fix runtime PM imbalance on error

2020-05-21 Thread Greg Kroah-Hartman
On Fri, May 22, 2020 at 10:59:02AM +0800, Dinghao Liu wrote:
> When copy_from_user() returns an error code, there
> is a runtime PM usage counter imbalance.
> 
> Fix this by moving copy_from_user() to the beginning
> of this function.
> 
> Signed-off-by: Dinghao Liu 
> ---
>  drivers/usb/musb/musb_debugfs.c | 10 +-
>  1 file changed, 5 insertions(+), 5 deletions(-)

What changed from v1?  Always show that below the --- line as the
documentation says to.

thanks,

greg k-h


[PATCH] rxrpc: fix a memory leak bug.

2020-05-21 Thread wu000273
From: Qiushi Wu 

In function rxkad_verify_response(), pointer "ticket" is not released,
when function rxkad_decrypt_ticket() returns an error, causing a
memory leak bug.

Fixes: 8c2f826dc3631 ("rxrpc: Don't put crypto buffers on the stack")
Signed-off-by: Qiushi Wu 
---
 net/rxrpc/rxkad.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/net/rxrpc/rxkad.c b/net/rxrpc/rxkad.c
index 098f1f9ec53b..52a24d4ef5d8 100644
--- a/net/rxrpc/rxkad.c
+++ b/net/rxrpc/rxkad.c
@@ -1148,7 +1148,7 @@ static int rxkad_verify_response(struct rxrpc_connection 
*conn,
ret = rxkad_decrypt_ticket(conn, skb, ticket, ticket_len, _key,
   , _abort_code);
if (ret < 0)
-   goto temporary_error_free_resp;
+   goto temporary_error_free_ticket;
 
/* use the session key from inside the ticket to decrypt the
 * response */
@@ -1230,7 +1230,6 @@ static int rxkad_verify_response(struct rxrpc_connection 
*conn,
 
 temporary_error_free_ticket:
kfree(ticket);
-temporary_error_free_resp:
kfree(response);
 temporary_error:
/* Ignore the response packet if we got a temporary error such as
-- 
2.17.1
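
The label ordering this patch relies on can be sketched in plain C as follows. This is a simplified illustration, not the rxkad code: `verify()`, `track_alloc()`, `track_free()` and `live_allocs` are hypothetical names, and the allocation counter exists only to make a leak observable.

```c
#include <stdlib.h>

static int live_allocs;		/* buffers allocated but not yet freed */

static void *track_alloc(size_t n)
{
	void *p = malloc(n);

	if (p)
		live_allocs++;
	return p;
}

static void track_free(void *p)
{
	if (p)
		live_allocs--;
	free(p);
}

/* decrypt_ok == 0 simulates the decrypt step returning an error. */
static int verify(int decrypt_ok)
{
	int ret = -1;
	void *response, *ticket;

	response = track_alloc(32);
	if (!response)
		goto out;
	ticket = track_alloc(64);
	if (!ticket)
		goto free_response;
	if (!decrypt_ok)
		goto free_ticket;	/* free_response here would leak ticket */
	ret = 0;

free_ticket:
	track_free(ticket);
free_response:
	track_free(response);
out:
	return ret;
}
```

Labels are unwound in reverse order of allocation, so a failure after both buffers exist has to jump to the label that frees the most recently allocated one.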



[PATCH] scsi: ufs-bsg: Fix runtime PM imbalance on error

2020-05-21 Thread Dinghao Liu
When ufs_bsg_alloc_desc_buffer() returns an error code,
a pairing runtime PM usage counter decrement is needed
to keep the counter balanced.

Signed-off-by: Dinghao Liu 
---
 drivers/scsi/ufs/ufs_bsg.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/ufs/ufs_bsg.c b/drivers/scsi/ufs/ufs_bsg.c
index 53dd87628cbe..516a7f573942 100644
--- a/drivers/scsi/ufs/ufs_bsg.c
+++ b/drivers/scsi/ufs/ufs_bsg.c
@@ -106,8 +106,10 @@ static int ufs_bsg_request(struct bsg_job *job)
desc_op = bsg_request->upiu_req.qr.opcode;
ret = ufs_bsg_alloc_desc_buffer(hba, job, _buff,
_len, desc_op);
-   if (ret)
+   if (ret) {
+   pm_runtime_put_sync(hba->dev);
goto out;
+   }
 
/* fall through */
case UPIU_TRANSACTION_NOP_OUT:
-- 
2.17.1



linux-next: manual merge of the block tree with the djw-vfs tree

2020-05-21 Thread Stephen Rothwell
Hi all,

Today's linux-next merge of the block tree got a conflict in:

  drivers/block/loop.c

between commit:

  efbe3c2493d2 ("fs: Remove unneeded IS_DAX() check in io_is_direct()")

from the djw-vfs tree and commit:

  3448914e8cc5 ("loop: Add LOOP_CONFIGURE ioctl")

from the block tree.

I fixed it up (see below) and can carry the fix as necessary. This
is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging.  You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

-- 
Cheers,
Stephen Rothwell

diff --cc drivers/block/loop.c
index 14372df0f354,a565c5aafa52..
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@@ -1022,21 -1146,20 +1146,21 @@@ static int loop_configure(struct loop_d
lo->old_gfp_mask = mapping_gfp_mask(mapping);
mapping_set_gfp_mask(mapping, lo->old_gfp_mask & ~(__GFP_IO|__GFP_FS));
  
-   if (!(lo_flags & LO_FLAGS_READ_ONLY) && file->f_op->fsync)
+   if (!(lo->lo_flags & LO_FLAGS_READ_ONLY) && file->f_op->fsync)
blk_queue_write_cache(lo->lo_queue, true, false);
  
-   if ((lo->lo_backing_file->f_flags & O_DIRECT) && inode->i_sb->s_bdev) {
+   if (config->block_size)
+   bsize = config->block_size;
 -  else if (io_is_direct(lo->lo_backing_file) && inode->i_sb->s_bdev)
++  else if ((lo->lo_backing_file->f_flags & O_DIRECT) &&
++   inode->i_sb->s_bdev)
/* In case of direct I/O, match underlying block size */
-   unsigned short bsize = bdev_logical_block_size(
-   inode->i_sb->s_bdev);
+   bsize = bdev_logical_block_size(inode->i_sb->s_bdev);
+   else
+   bsize = 512;
  
-   blk_queue_logical_block_size(lo->lo_queue, bsize);
-   blk_queue_physical_block_size(lo->lo_queue, bsize);
-   blk_queue_io_min(lo->lo_queue, bsize);
-   }
+   blk_queue_logical_block_size(lo->lo_queue, bsize);
+   blk_queue_physical_block_size(lo->lo_queue, bsize);
+   blk_queue_io_min(lo->lo_queue, bsize);
  
loop_update_rotational(lo);
loop_update_dio(lo);




[PATCH] scsi: ufs: Fix runtime PM imbalance on error

2020-05-21 Thread Dinghao Liu
When devm_clk_get() returns an error code, a pairing
runtime PM usage counter decrement is needed to keep
the counter balanced.

Signed-off-by: Dinghao Liu 
---
 drivers/scsi/ufs/ti-j721e-ufs.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/scsi/ufs/ti-j721e-ufs.c b/drivers/scsi/ufs/ti-j721e-ufs.c
index 5216d228cdd9..f3f212f6f9a9 100644
--- a/drivers/scsi/ufs/ti-j721e-ufs.c
+++ b/drivers/scsi/ufs/ti-j721e-ufs.c
@@ -39,6 +39,7 @@ static int ti_j721e_ufs_probe(struct platform_device *pdev)
clk = devm_clk_get(dev, NULL);
if (IS_ERR(clk)) {
dev_err(dev, "Cannot claim MPHY clock.\n");
+   pm_runtime_put_sync(dev);
return PTR_ERR(clk);
}
clk_rate = clk_get_rate(clk);
-- 
2.17.1



Re: [PATCH v2 7/8] exec: Generic execfd support

2020-05-21 Thread Rob Landley
On 5/21/20 10:28 PM, Eric W. Biederman wrote:
> 
> Rob Landley  writes:
> 
>> On 5/20/20 11:05 AM, Eric W. Biederman wrote:
> 
>> Toybox would _like_ proc mounted, but can't assume it. I'm writing a new
>> bash-compatible shell with nommu support, which means in order to do subshell
>> and background tasks if (!CONFIG_FORK) I need to create a pipe pair, vfork(),
>> have the child exec itself to unblock the parent, and then read the context 
>> data
>> that just got discarded through the pipe from the parent. ("Wheee." And you 
>> can
>> quote me on that.)
> 
> Do you have clone(CLONE_VM) ?  If my quick skim of the kernel sources is
> correct that should be the same as vfork except without causing the
> parent to wait for you.  Which I think would remove the need to reexec
> yourself.

As with perpetual motion, that only seems like it would work if you don't
understand what's going on.

A nommu system uses physical addresses, not virtual ones, so every process sees
the same addresses. So if I allocate a new block of memory and memcpy the
contents of the old one into the new one, any pointers in the copy point back
into the ORIGINAL block of memory. Trying to adjust the pointers in the copy is
the exact same problem as trying to do garbage collection in C: it's an AI
complete problem.

Any attempt to "implement a full fork" on nommu hits this problem: copying an
existing mapping to a new address range means any address values in the new
mapping point into the OLD mapping. Things like fdpic fix this up at exec time
(traversing elf tables and relocating), but not at runtime. If you can solve the
"relocate at runtime all addresses within an existing mapping, and all other
mappings that might point to this mapping, including local variables on the
stack that point to a structure member or halfway into a string rather than the
start of an allocation, without adjusting unrelated values coincidentally within
RANGE of a mapping" problem, THEN you can fork on a nommu system.
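
To make the stale-pointer problem concrete, here is a small user-space sketch. The names `struct block`, `points_into()` and `naive_copy()` are made up for the illustration; it only shows that a byte-for-byte copy keeps pointing into the original.

```c
#include <stdint.h>
#include <string.h>

/* A block of memory that contains a pointer into itself. */
struct block {
	char text[16];
	char *cursor;		/* points somewhere inside text[] */
};

/* Returns 1 if p points inside b->text, 0 otherwise. */
static int points_into(const struct block *b, const char *p)
{
	uintptr_t lo = (uintptr_t)b->text;
	uintptr_t at = (uintptr_t)p;

	return at >= lo && at < lo + sizeof(b->text);
}

/* Byte-for-byte copy, the way a naive "fork" would duplicate a mapping. */
static void naive_copy(struct block *dst, const struct block *src)
{
	memcpy(dst, src, sizeof(*dst));
}
```

After `naive_copy(&b, &a)`, `b.cursor` still points into `a.text`. Nothing in the copied bytes says which words are addresses that need adjusting, which is the relocation problem described above.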

What vfork() does is pause the parent and have the child continue AS the parent
for a bit (with the system call returning 0). The child starts with all the same
memory mappings the parent has (usually not even a new stack). The child has a
new PID and new resources like its own file descriptor table so close() and
open() don't affect the parent, but if you change a global that's visible to the
parent when it resumes (and often local variables too: don't return from the
function that called vfork() because if you DON'T have a new stack it'll stomp
the return address the parent needs when IT does it). If the child calls
malloc() the parent needs to free it because it's the same heap (because same
mapping of the same physical memory).

Then when the child is ready to discard all those mappings (due to calling
either execve() or _exit(), those are the only two options), the parent resumes
from where it left off with the PID of the child as the system call return 
value.

The reason the child pauses the parent is so only one process is ever using
those mappings at a given time. Otherwise they're acting like threads without
locking, and usually both are sharing a stack.
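
A minimal sketch of the pattern described above, assuming a POSIX system with `vfork()` available. `run_vfork()` is a hypothetical helper name; the child does nothing except `execv()` or `_exit()`.

```c
#define _DEFAULT_SOURCE		/* for vfork() under strict -std= modes */
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Run argv[0] (an absolute path) via vfork() and return its exit status,
 * or -1 if the child could not be created or did not exit normally.
 */
static int run_vfork(char *const argv[])
{
	int status;
	pid_t pid = vfork();

	if (pid < 0)
		return -1;
	if (pid == 0) {
		/* Child: running on the parent's memory, so touch nothing. */
		execv(argv[0], argv);
		_exit(127);	/* exec failed; never return from here */
	}
	/* Parent resumes only once the child has exec'd or exited. */
	if (waitpid(pid, &status, 0) < 0)
		return -1;
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, `run_vfork((char *[]){ "/bin/sh", "-c", "exit 3", NULL })` reports the shell's exit status. Note the child uses `_exit()` rather than `return`, for the stack reasons given above.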

P.S. You can use threads _instead_ of fork for some stuff on nommu, but that's
its own can of worms. You still need to vfork() when you do create a child
process you're going to exec, so it doesn't go away, you're just requiring
multiple techniques simultaneously to handle a special case.

P.P.S. vfork() is useful on mmu systems to solve the "don't fork from a thread"
problem. You can vfork() from a thread cheaply and reliably and it only pauses
the one thread you forked from, not every thread in the whole process. If you
fork() from a heavily threaded process you can cause a multi-millisecond latency
spike because even with an mmu the copy on write "keep track of what's shared by
what" generally can't handle the "threads AND processes sharing mappings" case,
so it just gives up and copies it all at fork time, in one go, holding a big
lock while doing so. This causes a large latency spike which vfork() avoids.
(And can cause a large wasteful allocation and memory dirtying which is
immediately freed.)

>>> The file descriptor is stored in mm->exe_file.
>>> Probably the most straight forward implementation is to allow
>>> execveat(AT_EXE_FILE, ...).
>>
>> Cool, that works.
>>
>>> You can look at binfmt_misc for how to reopen an open file descriptor.
>>
>> Added to the todo heap.
> 
> Yes I don't think it would be a lot of code.
> 
> I think you might be better served with clone(CLONE_VM) as it doesn't
> block so you don't need to feed yourself your context over a pipe.

Except that doesn't fix it.

Yes I could use threads instead, but the cure is worse than the disease and the
result is your shell background processes are threads rather than independent
processes (is $$ reporting PID or TID, I really don't want to go there).

> Eric

Rob


[PATCH] wlcore: fix runtime pm imbalance in wlcore_irq_locked

2020-05-21 Thread Dinghao Liu
When wlcore_fw_status() returns an error code, a pairing
runtime PM usage counter decrement is needed to keep the
counter balanced. It's the same for all error paths after
wlcore_fw_status().

Signed-off-by: Dinghao Liu 
---
 drivers/net/wireless/ti/wlcore/main.c | 17 +
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/drivers/net/wireless/ti/wlcore/main.c 
b/drivers/net/wireless/ti/wlcore/main.c
index f140f7d7f553..fd3608223f64 100644
--- a/drivers/net/wireless/ti/wlcore/main.c
+++ b/drivers/net/wireless/ti/wlcore/main.c
@@ -548,7 +548,7 @@ static int wlcore_irq_locked(struct wl1271 *wl)
 
ret = wlcore_fw_status(wl, wl->fw_status);
if (ret < 0)
-   goto out;
+   goto err_ret;
 
wlcore_hw_tx_immediate_compl(wl);
 
@@ -565,7 +565,7 @@ static int wlcore_irq_locked(struct wl1271 *wl)
ret = -EIO;
 
/* restarting the chip. ignore any other interrupt. */
-   goto out;
+   goto err_ret;
}
 
if (unlikely(intr & WL1271_ACX_SW_INTR_WATCHDOG)) {
@@ -575,7 +575,7 @@ static int wlcore_irq_locked(struct wl1271 *wl)
ret = -EIO;
 
/* restarting the chip. ignore any other interrupt. */
-   goto out;
+   goto err_ret;
}
 
if (likely(intr & WL1271_ACX_INTR_DATA)) {
@@ -583,7 +583,7 @@ static int wlcore_irq_locked(struct wl1271 *wl)
 
ret = wlcore_rx(wl, wl->fw_status);
if (ret < 0)
-   goto out;
+   goto err_ret;
 
/* Check if any tx blocks were freed */
spin_lock_irqsave(&wl->wl_lock, flags);
@@ -596,7 +596,7 @@ static int wlcore_irq_locked(struct wl1271 *wl)
 */
ret = wlcore_tx_work_locked(wl);
if (ret < 0)
-   goto out;
+   goto err_ret;
} else {
spin_unlock_irqrestore(&wl->wl_lock, flags);
}
@@ -604,7 +604,7 @@ static int wlcore_irq_locked(struct wl1271 *wl)
/* check for tx results */
ret = wlcore_hw_tx_delayed_compl(wl);
if (ret < 0)
-   goto out;
+   goto err_ret;
 
/* Make sure the deferred queues don't get too long */
defer_count = skb_queue_len(&wl->deferred_tx_queue) +
@@ -617,14 +617,14 @@ static int wlcore_irq_locked(struct wl1271 *wl)
wl1271_debug(DEBUG_IRQ, "WL1271_ACX_INTR_EVENT_A");
ret = wl1271_event_handle(wl, 0);
if (ret < 0)
-   goto out;
+   goto err_ret;
}
 
if (intr & WL1271_ACX_INTR_EVENT_B) {
wl1271_debug(DEBUG_IRQ, "WL1271_ACX_INTR_EVENT_B");
ret = wl1271_event_handle(wl, 1);
if (ret < 0)
-   goto out;
+   goto err_ret;
}
 
if (intr & WL1271_ACX_INTR_INIT_COMPLETE)
@@ -635,6 +635,7 @@ static int wlcore_irq_locked(struct wl1271 *wl)
wl1271_debug(DEBUG_IRQ, "WL1271_ACX_INTR_HW_AVAILABLE");
}
 
+err_ret:
pm_runtime_mark_last_busy(wl->dev);
pm_runtime_put_autosuspend(wl->dev);
 
-- 
2.17.1



Re: [PATCH 10/14] docs: move locking-specific documenta to locking/ directory

2020-05-21 Thread Mauro Carvalho Chehab
On Fri, 15 May 2020 12:06:07 -0600
Jonathan Corbet  wrote:

> On Fri,  1 May 2020 17:37:54 +0200
> Mauro Carvalho Chehab  wrote:
> 
> > Several files under Documentation/*.txt describe some type of
> > locking API. Move them to locking/ subdir and add to the
> > locking/index.rst index file.
> > 
> > Signed-off-by: Mauro Carvalho Chehab   
> 
> I've applied this, but it really seems like this belongs in the core-api
> manual someday.

Makes sense.

Well, right now, it is at the same level as core-api, just below it:

Kernel API documentation


These books get into the details of how specific kernel subsystems work
from the point of view of a kernel developer.  Much of the information 
here
is taken directly from the kernel source, with supplemental material 
added
as needed (or at least as we managed to add it — probably *not* all 
that is
needed).

.. toctree::
   :maxdepth: 2

   driver-api/index
   core-api/index
   locking/index

Not too bad.

Btw, there are other doc sets that could also fit into the core-api, like:

...
   accounting/index
...
   security/index
...
   bpf/index
...
   scheduler/index

while most of the rest should likely be inside driver-api.

Some care should be taken when moving stuff, though: there is a
reason why they weren't moved to driver-api in the first place:
they may contain stuff for the admin guide mixed there.

Thanks,
Mauro


Re: Re: [PATCH] [v2] PCI: tegra194: Fix runtime PM imbalance on error

2020-05-21 Thread dinghao . liu
Hi Bjorn,

In fact, most usage of pm_runtime_get_sync() is correct. I made 
a static analysis tool to check this imbalance in kernel and 
found about 80 bugs in drivers. Some of my patches have been 
accepted and I'm trying to patch the rest as soon as possible.
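
The imbalance itself can be illustrated with a user-space mock. This is not the kernel's runtime PM code: `struct mock_dev`, `mock_get_sync()` and `mock_put()` are hypothetical stand-ins that only model the one property stated in the patch description quoted below, namely that the usage counter is incremented even when the call returns an error.

```c
/* Hypothetical user-space mock; not the kernel's runtime PM code. */
struct mock_dev {
	int usage_count;
	int resume_fails;	/* make the mock "resume" return an error */
};

/* Models pm_runtime_get_sync(): the count goes up even on failure. */
static int mock_get_sync(struct mock_dev *dev)
{
	dev->usage_count++;
	return dev->resume_fails ? -1 : 0;
}

static void mock_put(struct mock_dev *dev)
{
	dev->usage_count--;
}

/* Buggy pattern: the error path returns without dropping the count. */
static int probe_leaky(struct mock_dev *dev)
{
	int ret = mock_get_sync(dev);

	if (ret < 0)
		return ret;
	mock_put(dev);
	return 0;
}

/* Balanced pattern: every get is paired with a put, error or not. */
static int probe_balanced(struct mock_dev *dev)
{
	int ret = mock_get_sync(dev);

	mock_put(dev);
	return ret < 0 ? ret : 0;
}
```

When the mock resume fails, `probe_leaky()` leaves `usage_count` at 1 while `probe_balanced()` returns it to 0.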

Regards,
Dinghao 

> [+cc Rafael, linux-pm]
> 
> On Thu, May 21, 2020 at 11:13:49AM +0800, Dinghao Liu wrote:
> > pm_runtime_get_sync() increments the runtime PM usage counter even
> > when it returns an error code. Thus a pairing decrement is needed on
> > the error handling path to keep the counter balanced.
> 
> I didn't realize there were so many drivers with the exact same issue.
> Can we just squash these all into a single patch so we can see them
> all together?
> 
> Hmm.  There are over 1300 callers of pm_runtime_get_sync(), and it
> looks like many of them have similar issues, i.e., they have a pattern
> like this
> 
>   ret = pm_runtime_get_sync(dev);
>   if (ret < 0)
> return;
> 
>   pm_runtime_put(dev);
> 
> where there is not a pm_runtime_put() to match every
> pm_runtime_get_sync().  Random sample:
> 
>   nds32_pmu_reserve_hardware
>   sata_rcar_probe
>   exynos_trng_probe
>   ks_sa_rng_probe
>   omap_aes_probe
>   sun8i_ss_probe
>   omap_aes_probe
>   zynq_gpio_probe
>   amdgpu_hwmon_show_power_avg
>   mtk_crtc_ddp_hw_init
>   ...
> 
> Surely I'm missing something and these aren't all broken, right?
> 
> Maybe we could put together a coccinelle script to scan the tree for
> this issue?
> 
> > Signed-off-by: Dinghao Liu 
> > ---
> >  drivers/pci/controller/dwc/pcie-tegra194.c | 5 ++---
> >  1 file changed, 2 insertions(+), 3 deletions(-)
> > 
> > diff --git a/drivers/pci/controller/dwc/pcie-tegra194.c 
> > b/drivers/pci/controller/dwc/pcie-tegra194.c
> > index ae30a2fd3716..2c0d2ce16b47 100644
> > --- a/drivers/pci/controller/dwc/pcie-tegra194.c
> > +++ b/drivers/pci/controller/dwc/pcie-tegra194.c
> > @@ -1623,7 +1623,7 @@ static int tegra_pcie_config_rp(struct tegra_pcie_dw 
> > *pcie)
> > ret = pinctrl_pm_select_default_state(dev);
> > if (ret < 0) {
> > dev_err(dev, "Failed to configure sideband pins: %d\n", ret);
> > -   goto fail_pinctrl;
> > +   goto fail_pm_get_sync;
> > }
> >  
> > tegra_pcie_init_controller(pcie);
> > @@ -1650,9 +1650,8 @@ static int tegra_pcie_config_rp(struct tegra_pcie_dw 
> > *pcie)
> >  
> >  fail_host_init:
> > tegra_pcie_deinit_controller(pcie);
> > -fail_pinctrl:
> > -   pm_runtime_put_sync(dev);
> >  fail_pm_get_sync:
> > +   pm_runtime_put_sync(dev);
> > pm_runtime_disable(dev);
> > return ret;
> >  }
> > -- 
> > 2.17.1
> > 


Re: [PATCH 06/14] docs: debugging-via-ohci1394.txt: add it to the core-api book

2020-05-21 Thread Mauro Carvalho Chehab
On Fri, 15 May 2020 12:00:16 -0600
Jonathan Corbet  wrote:

> On Fri,  1 May 2020 17:37:50 +0200
> Mauro Carvalho Chehab  wrote:
> 
> > There is a special chapter inside the core-api book about
> > some debug infrastructure like tracepoints and debug objects.
> > 
> > It sounded to me that this is the best place to add a chapter
> > explaining how to use a FireWire controller to do remote
> > kernel debugging, as explained on this document.
> > 
> > Signed-off-by: Mauro Carvalho Chehab   
> 
> I've applied this, but core-api really seems like the wrong place for
> this.  It would be good to rethink our layout a bit at some point in the
> near future...

Yeah, agreed. Debug functionality should likely deserve a separate
chapter outside core-api.

Now that we'll have all docs converted, it should be easier to view
the whole picture and re-design the doc organization.


Thanks,
Mauro


Re: [PATCH 02/14] docs: add bus-virt-phys-mapping.txt to core-api

2020-05-21 Thread Mauro Carvalho Chehab
On Fri, 15 May 2020 11:53:21 -0600
Jonathan Corbet  wrote:

> On Fri,  1 May 2020 17:37:46 +0200
> Mauro Carvalho Chehab  wrote:
> 
> > This describes an old interface used prior to the new DMA-API
> > interfaces. Add it to the core-api guide, just after the
> > DMA stuff.
> > 
> > Signed-off-by: Mauro Carvalho Chehab 
> > ---
> >  .../bus-virt-phys-mapping.rst}   | 0
> >  Documentation/core-api/index.rst | 1 +
> >  2 files changed, 1 insertion(+)
> >  rename Documentation/{bus-virt-phys-mapping.txt => 
> > core-api/bus-virt-phys-mapping.rst} (100%)  
> 
> For this one, I think we should maybe just delete the file.  It contains a
> warning from *20 years ago* saying not to use it, and talks about
> functions like isa_readl() that haven't existed in the kernel for some
> time.  Is there any reason to keep dragging it around?

Except for "keeping it for historical reasons" (as mentioned in the
file), I don't see any reason to keep it.

It might be useful if someone wants to port some OOT code based on
a legacy kernel.

Yet, if you prefer to just trash it, I'm ok with that.

Thanks,
Mauro


Re: [PATCH v2 3/3] vfio-pci: Invalidate mmaps and block MMIO access on disabled memory

2020-05-21 Thread Alex Williamson
On Thu, 21 May 2020 22:39:06 -0400
Qian Cai  wrote:

> On Tue, May 05, 2020 at 03:55:02PM -0600, Alex Williamson wrote:
> []
> vfio_pci_mmap_fault(struct vm_fault *vmf)
> >  {
> > struct vm_area_struct *vma = vmf->vma;
> > struct vfio_pci_device *vdev = vma->vm_private_data;
> > +   vm_fault_t ret = VM_FAULT_NOPAGE;
> >  
> > -   if (vfio_pci_add_vma(vdev, vma))
> > -   return VM_FAULT_OOM;
> > +   mutex_lock(&vdev->vma_lock);
> > +   down_read(&vdev->memory_lock);
> 
> This lock here will trigger,
> 
> [17368.321363][T3614103] 
> ==
> [17368.321375][T3614103] WARNING: possible circular locking dependency 
> detected
> [17368.321399][T3614103] 5.7.0-rc6-next-20200521+ #116 Tainted: GW
> 
> [17368.321410][T3614103] 
> --
> [17368.321433][T3614103] qemu-kvm/3614103 is trying to acquire lock:
> [17368.321443][T3614103] c000200fb2328968 (>lock){+.+.}-{3:3}, at: 
> kvmppc_irq_bypass_add_producer_hv+0xd4/0x3b0 [kvm_hv]
> [17368.321488][T3614103] 
> [17368.321488][T3614103] but task is already holding lock:
> [17368.321533][T3614103] c16f4dc8 (lock#7){+.+.}-{3:3}, at: 
> irq_bypass_register_producer+0x80/0x1d0
> [17368.321564][T3614103] 
> [17368.321564][T3614103] which lock already depends on the new lock.
> [17368.321564][T3614103] 
> [17368.321590][T3614103] 
> [17368.321590][T3614103] the existing dependency chain (in reverse order) is:
> [17368.321625][T3614103] 
> [17368.321625][T3614103] -> #4 (lock#7){+.+.}-{3:3}:
> [17368.321662][T3614103]__mutex_lock+0xdc/0xb80
> [17368.321683][T3614103]irq_bypass_register_producer+0x80/0x1d0
> [17368.321706][T3614103]vfio_msi_set_vector_signal+0x1d8/0x350 
> [vfio_pci]
> [17368.321719][T3614103]vfio_msi_set_block+0xb0/0x1e0 [vfio_pci]
> [17368.321752][T3614103]vfio_pci_set_msi_trigger+0x13c/0x3e0 
> [vfio_pci]
> [17368.321787][T3614103]vfio_pci_set_irqs_ioctl+0x134/0x2c0 [vfio_pci]
> [17368.321821][T3614103]vfio_pci_ioctl+0xe10/0x1460 [vfio_pci]
> [17368.321855][T3614103]vfio_device_fops_unl_ioctl+0x44/0x70 [vfio]
> [17368.321879][T3614103]ksys_ioctl+0xd8/0x130
> [17368.321888][T3614103]sys_ioctl+0x28/0x40
> [17368.321910][T3614103]system_call_exception+0x108/0x1d0
> [17368.321932][T3614103]system_call_common+0xf0/0x278
> [17368.321951][T3614103] 
> [17368.321951][T3614103] -> #3 (>memory_lock){}-{3:3}:
> [17368.321988][T3614103]lock_release+0x190/0x5e0
> [17368.322009][T3614103]__mutex_unlock_slowpath+0x68/0x410
> [17368.322042][T3614103]vfio_pci_mmap_fault+0xe8/0x1f0 [vfio_pci]
> vfio_pci_mmap_fault at drivers/vfio/pci/vfio_pci.c:1534
> [17368.322066][T3614103]__do_fault+0x64/0x220
> [17368.322086][T3614103]handle_mm_fault+0x12f0/0x19e0
> [17368.322107][T3614103]__do_page_fault+0x284/0xf70
> [17368.322116][T3614103]handle_page_fault+0x10/0x2c
> [17368.322136][T3614103] 
> [17368.322136][T3614103] -> #2 (>mmap_sem){}-{3:3}:
> [17368.322160][T3614103]__might_fault+0x84/0xe0
> [17368.322182][T3614103]_copy_to_user+0x3c/0x120
> [17368.322206][T3614103]kvm_vcpu_ioctl+0x1ec/0xac0 [kvm]
> [17368.322239][T3614103]ksys_ioctl+0xd8/0x130
> [17368.322270][T3614103]sys_ioctl+0x28/0x40
> [17368.322301][T3614103]system_call_exception+0x108/0x1d0
> [17368.322334][T3614103]system_call_common+0xf0/0x278
> [17368.322375][T3614103] 
> [17368.322375][T3614103] -> #1 (>mutex){+.+.}-{3:3}:
> [17368.322411][T3614103]__mutex_lock+0xdc/0xb80
> [17368.322446][T3614103]kvmppc_xive_release+0xd8/0x260 [kvm]
> [17368.322484][T3614103]kvm_device_release+0xc4/0x110 [kvm]
> [17368.322518][T3614103]__fput+0x154/0x3b0
> [17368.322562][T3614103]task_work_run+0xd8/0x170
> [17368.322583][T3614103]do_exit+0x4f8/0xeb0
> [17368.322604][T3614103]do_group_exit+0x78/0x160
> [17368.322625][T3614103]get_signal+0x230/0x1440
> [17368.322657][T3614103]do_notify_resume+0x130/0x3e0
> [17368.322677][T3614103]syscall_exit_prepare+0x1a4/0x280
> [17368.322687][T3614103]system_call_common+0xf8/0x278
> [17368.322718][T3614103] 
> [17368.322718][T3614103] -> #0 (>lock){+.+.}-{3:3}:
> [17368.322753][T3614103]__lock_acquire+0x1fe4/0x3190
> [17368.322774][T3614103]lock_acquire+0x140/0x9a0
> [17368.322805][T3614103]__mutex_lock+0xdc/0xb80
> [17368.322838][T3614103]kvmppc_irq_bypass_add_producer_hv+0xd4/0x3b0 
>

[PATCH 1/2] video: fbdev: fix error handling for get_user_pages_fast()

2020-05-21 Thread John Hubbard
Dealing with the return value of get_user_pages*() variants has a few
classic pitfalls, and this driver found one of them: the return value
might be zero, positive, or -errno. And if positive, it might be fewer
pages than were requested. And if fewer pages than requested, then
the caller should return (via put_page()) the pages that *were*
pinned.

This driver was doing that *except* that it had a problem with the
-errno case, which was being stored in an unsigned int, and which
would cause an interesting mess if it ever happened: nr_pages would be
interpreted as a spectacularly huge unsigned value, rather than a
small negative value. Also, it was unnecessarily overriding a
potentially informative -errno, with -EINVAL, in some cases.

Instead: clamp the nr_pages to zero or positive, so that the error
handling works. And return the -errno value from get_user_pages*(),
unchanged, if we get one. And explain this with comments, seeing as
how it is error-prone.
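
For illustration only, the pitfall can be reproduced in userspace. The
sketch below is not the driver code: fake_gup() is a hypothetical
stand-in for a get_user_pages*() call, naive_count() shows what happens
when a negative return lands in an unsigned counter, and pin_checked()
follows the same clamping idea as the fix.

```c
#include <errno.h>

/*
 * Hypothetical stand-in for a get_user_pages*() call: returns the number
 * of pages "pinned" (possibly fewer than requested) or a negative errno.
 */
static int fake_gup(int requested, int outcome)
{
	if (outcome < 0)
		return outcome;			/* e.g. -EFAULT */
	return outcome < requested ? outcome : requested;
}

/* The pitfall: a negative errno stored in an unsigned page count. */
static unsigned int naive_count(int ret)
{
	return (unsigned int)ret;		/* -EFAULT becomes a huge value */
}

/*
 * The clamping idea: on return, *nr_pages says how many pages must be
 * released (never a wrapped-around negative value), and the return value
 * is 0 on success or a negative errno.
 */
static int pin_checked(unsigned int *nr_pages, int outcome)
{
	int ret = fake_gup((int)*nr_pages, outcome);

	/* Cast so that the comparison is done on signed values. */
	if (ret < (int)*nr_pages) {
		if (ret < 0) {
			*nr_pages = 0;		/* nothing was pinned */
		} else {
			*nr_pages = (unsigned int)ret;	/* partial pin */
			ret = -EINVAL;
		}
		return ret;
	}
	return 0;
}
```

With this shape, a cleanup loop bounded by *nr_pages releases exactly
the pages that were pinned, and the caller still sees the original
-errno when there is one.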

Cc: Bartlomiej Zolnierkiewicz 
Cc: Arnd Bergmann 
Cc: Daniel Vetter 
Cc: Gustavo A. R. Silva 
Cc: Jani Nikula 
Cc: dri-de...@lists.freedesktop.org
Cc: linux-fb...@vger.kernel.org
Signed-off-by: John Hubbard 
---
 drivers/video/fbdev/pvr2fb.c | 18 --
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/drivers/video/fbdev/pvr2fb.c b/drivers/video/fbdev/pvr2fb.c
index f18d457175d9..ceb6ef590597 100644
--- a/drivers/video/fbdev/pvr2fb.c
+++ b/drivers/video/fbdev/pvr2fb.c
@@ -654,8 +654,22 @@ static ssize_t pvr2fb_write(struct fb_info *info, const 
char *buf,
 
ret = get_user_pages_fast((unsigned long)buf, nr_pages, FOLL_WRITE, 
pages);
if (ret < nr_pages) {
-   nr_pages = ret;
-   ret = -EINVAL;
+   if (ret < 0) {
+   /*
+*  Clamp the unsigned nr_pages to zero so that the
+*  error handling works. And leave ret at whatever
+*  -errno value was returned from GUP.
+*/
+   nr_pages = 0;
+   } else {
+   nr_pages = ret;
+   /*
+* Use -EINVAL to represent a mildly desperate guess at
+* why we got fewer pages (maybe even zero pages) than
+* requested.
+*/
+   ret = -EINVAL;
+   }
goto out_unmap;
}
 
-- 
2.26.2



[PATCH 0/2] video: fbdev: fix error handling, convert to pin_user_pages*()

2020-05-21 Thread John Hubbard
Hi,

Note that I have only compile-tested this series, although that does
also include cross-compiling for a few other arches. I'm hoping that
this posting will lead to some run-time testing.

Also: the proposed fix does not have a "Fixes:" tag, nor does it
Cc stable. That's because the issue has been there since the dawn of
git history for the kernel. If it's gone unnoticed this long, then
there is clearly no need for the relatively fast track of putting it
into stable, IMHO. But please correct me if that's wrong.

Cc: Bartlomiej Zolnierkiewicz 
Cc: Arnd Bergmann 
Cc: Daniel Vetter 
Cc: Gustavo A. R. Silva 
Cc: Jani Nikula 
Cc: dri-de...@lists.freedesktop.org
Cc: linux-fb...@vger.kernel.org

John Hubbard (2):
  video: fbdev: fix error handling for get_user_pages_fast()
  video: fbdev: convert get_user_pages() --> pin_user_pages()

 drivers/video/fbdev/pvr2fb.c | 24 ++--
 1 file changed, 18 insertions(+), 6 deletions(-)


base-commit: 051143e1602d90ea71887d92363edd539d411de5
-- 
2.26.2



[PATCH 2/2] video: fbdev: convert get_user_pages() --> pin_user_pages()

2020-05-21 Thread John Hubbard
This code was using get_user_pages*(), in a "Case 2" scenario
(DMA/RDMA), using the categorization from [1]. That means that it's
time to convert the get_user_pages*() + put_page() calls to
pin_user_pages*() + unpin_user_pages() calls.

There is some helpful background in [2]: basically, this is a small
part of fixing a long-standing disconnect between pinning pages, and
file systems' use of those pages.

[1] Documentation/core-api/pin_user_pages.rst

[2] "Explicit pinning of user-space pages":
https://lwn.net/Articles/807108/

Cc: Bartlomiej Zolnierkiewicz 
Cc: Arnd Bergmann 
Cc: Daniel Vetter 
Cc: Gustavo A. R. Silva 
Cc: Jani Nikula 
Cc: dri-de...@lists.freedesktop.org
Cc: linux-fb...@vger.kernel.org
Signed-off-by: John Hubbard 
---
 drivers/video/fbdev/pvr2fb.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/video/fbdev/pvr2fb.c b/drivers/video/fbdev/pvr2fb.c
index ceb6ef590597..2d9f69b93392 100644
--- a/drivers/video/fbdev/pvr2fb.c
+++ b/drivers/video/fbdev/pvr2fb.c
@@ -652,7 +652,7 @@ static ssize_t pvr2fb_write(struct fb_info *info, const 
char *buf,
if (!pages)
return -ENOMEM;
 
-   ret = get_user_pages_fast((unsigned long)buf, nr_pages, FOLL_WRITE, 
pages);
+   ret = pin_user_pages_fast((unsigned long)buf, nr_pages, FOLL_WRITE, 
pages);
if (ret < nr_pages) {
if (ret < 0) {
/*
@@ -712,9 +712,7 @@ static ssize_t pvr2fb_write(struct fb_info *info, const 
char *buf,
ret = count;
 
 out_unmap:
-   for (i = 0; i < nr_pages; i++)
-   put_page(pages[i]);
-
+   unpin_user_pages(pages, nr_pages);
kfree(pages);
 
return ret;
-- 
2.26.2



[PATCH 2/3] gpio: pxa: Fix return value of pxa_gpio_probe()

2020-05-21 Thread Tiezhu Yang
When calling devm_platform_ioremap_resource(), we should use IS_ERR()
to check the return value and return PTR_ERR() on failure.
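
As a rough userspace illustration of why a NULL check is not enough
here: the helpers below only imitate the error-pointer convention, and
every sketch_* name and both probe_* functions are hypothetical, not
kernel APIs. An error code is encoded in the top page of the address
space, so a failed call returns a non-NULL pointer.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_ERRNO 4095

/* Userspace imitations of the error-pointer helpers (assumed names). */
static void *sketch_err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

static long sketch_ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static int sketch_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-SKETCH_MAX_ERRNO;
}

/* Hypothetical mapping helper: fails with an encoded -ENOMEM on request. */
static void *sketch_ioremap(int fail)
{
	static char fake_regs[16];

	return fail ? sketch_err_ptr(-ENOMEM) : (void *)fake_regs;
}

/* Buggy pattern: an error pointer is not NULL, so the failure is missed. */
static int probe_with_null_check(int fail)
{
	void *base = sketch_ioremap(fail);

	if (!base)
		return -EINVAL;
	return 0;
}

/* Fixed pattern: detect the error pointer and propagate its errno. */
static int probe_with_is_err(int fail)
{
	void *base = sketch_ioremap(fail);

	if (sketch_is_err(base))
		return (int)sketch_ptr_err(base);
	return 0;
}
```

In this sketch probe_with_null_check(1) reports success even though the
mapping failed, while probe_with_is_err(1) returns -ENOMEM.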

Fixes: 542c25b7a209 ("drivers: gpio: pxa: use devm_platform_ioremap_resource()")
Signed-off-by: Tiezhu Yang 
---
 drivers/gpio/gpio-pxa.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpio/gpio-pxa.c b/drivers/gpio/gpio-pxa.c
index 1361270..0cb6600 100644
--- a/drivers/gpio/gpio-pxa.c
+++ b/drivers/gpio/gpio-pxa.c
@@ -660,8 +660,8 @@ static int pxa_gpio_probe(struct platform_device *pdev)
pchip->irq1 = irq1;
 
gpio_reg_base = devm_platform_ioremap_resource(pdev, 0);
-   if (!gpio_reg_base)
-   return -EINVAL;
+   if (IS_ERR(gpio_reg_base))
+   return PTR_ERR(gpio_reg_base);
 
clk = clk_get(>dev, NULL);
if (IS_ERR(clk)) {
-- 
2.1.0



[PATCH 3/3] gpio: pxa: Add COMPILE_TEST support

2020-05-21 Thread Tiezhu Yang
Add COMPILE_TEST support to the PXA GPIO driver for better compile
testing coverage.

Signed-off-by: Tiezhu Yang 
---
 drivers/gpio/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpio/Kconfig b/drivers/gpio/Kconfig
index 03c01f4..5e90aad 100644
--- a/drivers/gpio/Kconfig
+++ b/drivers/gpio/Kconfig
@@ -439,7 +439,7 @@ config GPIO_PMIC_EIC_SPRD
 
 config GPIO_PXA
bool "PXA GPIO support"
-   depends on ARCH_PXA || ARCH_MMP
+   depends on ARCH_PXA || ARCH_MMP || COMPILE_TEST
help
  Say yes here to support the PXA GPIO device
 
-- 
2.1.0



[PATCH 1/3] gpio: bcm-kona: Fix return value of bcm_kona_gpio_probe()

2020-05-21 Thread Tiezhu Yang
When calling devm_platform_ioremap_resource(), we should use IS_ERR()
to check the return value and return PTR_ERR() on failure.

Fixes: 72d8cb715477 ("drivers: gpio: bcm-kona: use 
devm_platform_ioremap_resource()")
Signed-off-by: Tiezhu Yang 
---
 drivers/gpio/gpio-bcm-kona.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpio/gpio-bcm-kona.c b/drivers/gpio/gpio-bcm-kona.c
index baee8c3..cf3687a 100644
--- a/drivers/gpio/gpio-bcm-kona.c
+++ b/drivers/gpio/gpio-bcm-kona.c
@@ -625,7 +625,7 @@ static int bcm_kona_gpio_probe(struct platform_device *pdev)
 
kona_gpio->reg_base = devm_platform_ioremap_resource(pdev, 0);
if (IS_ERR(kona_gpio->reg_base)) {
-   ret = -ENXIO;
+   ret = PTR_ERR(kona_gpio->reg_base);
goto err_irq_domain;
}
 
-- 
2.1.0



Re: [v4,0/7] Add Mediatek thermal dirver and dtsi

2020-05-21 Thread Michael Kao
On Thu, 2020-05-21 at 14:51 +0200, Matthias Brugger wrote:
> Hi Michael,
> 
> On 23/03/2020 13:15, Michael Kao wrote:
> > This patchset supports for MT8183 chip to mtk_thermal.c.
> > Add thermal zone of all the thermal sensor in SoC for
> > another get temperature. They don't need to thermal throttle.
> > And we bind coolers for thermal zone nodes of cpu_thermal.
> > 
> > Rebase to kernel-5.6-rc1.
> > 
> > Update content:
> > 
> > [1/7]
> > - Squash thermal zone settings in the dtsi from [v3,5/8]
> >   arm64: dts: mt8183: Increase polling frequency for CPU thermal zone
> > 
> > - Remove the property of interrupts and mediatek,hw-reset-temp
> > 
> > [2/7]
> > - Correct commit message
> > 
> > [4/7]
> > - Change the target temperature to the 80C and change the commit message
> > 
> > [6/7]
> > - Adjust newline alignment
> > 
> > - Fix the judgement on the return value of registering thermal zone
> > 
> > This patch series base on these patches [1].
> > 
> > [v7,3/3] PM / AVS: SVS: Introduce SVS engine 
> > (https://patchwork.kernel.org/patch/11439829/)
> > 
> > Matthias Kaehlcke (1):
> >   arm64: dts: mt8183: Configure CPU cooling
> > 
> > Michael Kao (6):
> >   arm64: dts: mt8183: add thermal zone node
> >   arm64: dts: mt8183: add dynamic power coefficients
> >   arm64: dts: mt8183: Add #cooling-cells to CPU nodes
> >   thermal: mediatek: mt8183: fix bank number settings
> 
> Do I understand correctly that we need to fix the bank number before we can 
> add
> the device tree changes. And that the last two patches are enhancements for 
> the
> driver but needed to get a working version?
> 
> Regards,
> Matthias
> 
Hi Matthias,

There is one bank setting in the mt8183 config.
If the device tree is merged first, I worry that it will crash when the
thermal zone reads the temperature, because it will access an invalid
bank index.
So please wait until the patch "fix bank number settings" is merged first.
Thanks!

/* MT8183 thermal sensor data */
static const int mt8183_bank_data[MT8183_NUM_SENSORS] = {
MT8183_TS1, MT8183_TS2, MT8183_TS3, MT8183_TS4, MT8183_TS5,
MT8183_TSABB
}; 

Best Regards,
Michael


> >   thermal: mediatek: add another get_temp ops for thermal sensors
> >   thermal: mediatek: use spinlock to protect PTPCORESEL
> > 
> >  arch/arm64/boot/dts/mediatek/mt8183.dtsi | 156 +++
> >  drivers/thermal/mtk_thermal.c|  88 +++--
> >  2 files changed, 231 insertions(+), 13 deletions(-)
> > 



Re: [v4,0/7] Add Mediatek thermal dirver and dtsi

2020-05-21 Thread Michael Kao
On Thu, 2020-05-21 at 14:51 +0200, Matthias Brugger wrote:
> Hi Michael,
> 
> On 23/03/2020 13:15, Michael Kao wrote:
> > This patchset supports for MT8183 chip to mtk_thermal.c.
> > Add thermal zone of all the thermal sensor in SoC for
> > another get temperature. They don't need to thermal throttle.
> > And we bind coolers for thermal zone nodes of cpu_thermal.
> > 
> > Rebase to kernel-5.6-rc1.
> > 
> > Update content:
> > 
> > [1/7]
> > - Squash thermal zone settings in the dtsi from [v3,5/8]
> >   arm64: dts: mt8183: Increase polling frequency for CPU thermal zone
> > 
> > - Remove the property of interrupts and mediatek,hw-reset-temp
> > 
> > [2/7]
> > - Correct commit message
> > 
> > [4/7]
> > - Change the target temperature to the 80C and change the commit message
> > 
> > [6/7]
> > - Adjust newline alignment
> > 
> > - Fix the judgement on the return value of registering thermal zone
> > 
> > This patch series base on these patches [1].
> > 
> > [v7,3/3] PM / AVS: SVS: Introduce SVS engine 
> > (https://patchwork.kernel.org/patch/11439829/)
> > 
> > Matthias Kaehlcke (1):
> >   arm64: dts: mt8183: Configure CPU cooling
> > 
> > Michael Kao (6):
> >   arm64: dts: mt8183: add thermal zone node
> >   arm64: dts: mt8183: add dynamic power coefficients
> >   arm64: dts: mt8183: Add #cooling-cells to CPU nodes
> >   thermal: mediatek: mt8183: fix bank number settings
> 
> Do I understand correctly that we need to fix the bank number before we can 
> add
> the device tree changes. And that the last two patches are enhancements for 
> the
> driver but needed to get a working version?
> 
> Regards,
> Matthias

Hi Matthias,

There is one bank setting in the mt8183 config.
If the device tree is merged first, I worry that it will crash when the
thermal zone reads the temperature, because it will access an invalid
bank index.
So please add the patch "fix bank number settings" first.

> 
> >   thermal: mediatek: add another get_temp ops for thermal sensors
> >   thermal: mediatek: use spinlock to protect PTPCORESEL
> > 
> >  arch/arm64/boot/dts/mediatek/mt8183.dtsi | 156 +++
> >  drivers/thermal/mtk_thermal.c|  88 +++--
> >  2 files changed, 231 insertions(+), 13 deletions(-)
> > 



Re: [PATCH V3 0/3] arm64: Enable vmemmap mapping from device memory

2020-05-21 Thread Jia He

Hi

On 2020/3/31 13:09, Anshuman Khandual wrote:

This series enables vmemmap backing memory allocation from device memory
ranges on arm64. But before that, it enables vmemmap_populate_basepages()
and vmemmap_alloc_block_buf() to accommodate struct vmem_altmap based
allocation requests.


I verified no obvious regression after this patch series.

Host: ThunderX2(armv8a server), kernel v5.4

qemu:v3.1, -M virt \

-object 
memory-backend-file,id=mem1,share=on,mem-path=/tmp2/nvdimm.img,size=4G,align=2M \


-device nvdimm,id=nvdimm1,memdev=mem1,label-size=2M

Guest: kernel v5.7.0-rc5 with this patch series.

Tested case:

- 4K PAGESIZE, boot, mount w/ -o dax, mount w/o -o dax, basic io

- 64K PAGESIZE,boot, mount w/ -o dax, mount w/o -o dax, basic io

Not tested:

- 16K pagesize due to my hardware limitation (can't run 16K pgsz kernel)

- hot-add/remove of nvdimm device from qemu, since arm64 qemu does not
fully support it yet

- Host nvdimm device hotplug

Hence from above result,

Tested-by: Jia He 


This series applies after latest (v14) arm64 memory hot remove series
(https://lkml.org/lkml/2020/3/3/1746) on Linux 5.6.

Pending Question:

altmap_alloc_block_buf() does not have any other remaining users in the
tree after this change. Should it be converted into a static function and
its declaration be dropped from the header (include/linux/mm.h). Avoided
doing so because I was not sure if there are any off-tree users or not.

Changes in V3:

- Dropped comment from free_hotplug_page_range() per Robin
- Modified comment in unmap_hotplug_range() per Robin
- Enabled altmap support in vmemmap_alloc_block_buf() per Robin

Changes in V2: (https://lkml.org/lkml/2020/3/4/475)

- Rebased on latest hot-remove series (v14) adding P4D page table support

Changes in V1: (https://lkml.org/lkml/2020/1/23/12)

- Added a WARN_ON() in unmap_hotplug_range() when altmap is
   provided without the page table backing memory being freed

Changes in RFC V2: (https://lkml.org/lkml/2019/10/21/11)

- Changed the commit message on 1/2 patch per Will
- Changed the commit message on 2/2 patch as well
- Rebased on arm64 memory hot remove series (v10)

RFC V1: (https://lkml.org/lkml/2019/6/28/32)

Cc: Catalin Marinas
Cc: Will Deacon
Cc: Mark Rutland
Cc: Paul Walmsley
Cc: Palmer Dabbelt
Cc: Tony Luck
Cc: Fenghua Yu
Cc: Dave Hansen
Cc: Andy Lutomirski
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: Ingo Molnar
Cc: David Hildenbrand
Cc: Mike Rapoport
Cc: Michal Hocko
Cc: "Matthew Wilcox (Oracle)"
Cc: "Kirill A. Shutemov"
Cc: Andrew Morton
Cc: Dan Williams
Cc: Pavel Tatashin
Cc: Benjamin Herrenschmidt
Cc: Paul Mackerras
Cc: Michael Ellerman
Cc:linux-arm-ker...@lists.infradead.org
Cc:linux-i...@vger.kernel.org
Cc:linux-ri...@lists.infradead.org
Cc:x...@kernel.org
Cc:linuxppc-...@lists.ozlabs.org
Cc:linux...@kvack.org
Cc:linux-kernel@vger.kernel.org

Anshuman Khandual (3):
   mm/sparsemem: Enable vmem_altmap support in vmemmap_populate_basepages()
   mm/sparsemem: Enable vmem_altmap support in vmemmap_alloc_block_buf()
   arm64/mm: Enable vmem_altmap support for vmemmap mappings

  arch/arm64/mm/mmu.c   | 59 ++-
  arch/ia64/mm/discontig.c  |  2 +-
  arch/powerpc/mm/init_64.c | 10 +++
  arch/riscv/mm/init.c  |  2 +-
  arch/x86/mm/init_64.c | 12 
  include/linux/mm.h|  8 --
  mm/sparse-vmemmap.c   | 38 -
  7 files changed, 87 insertions(+), 44 deletions(-)


--

---
Cheers,
Justin (Jia He)



Re: [PATCH 01/10] swiotlb-xen: use vmalloc_to_page on vmalloc virt addresses

2020-05-21 Thread Stefano Stabellini
On Thu, 21 May 2020, Julien Grall wrote:
> Hi,
> 
> On 21/05/2020 00:45, Stefano Stabellini wrote:
> > From: Boris Ostrovsky 
> > 
> > Don't just assume that virt_to_page works on all virtual addresses.
> > Instead add a is_vmalloc_addr check and use vmalloc_to_page on vmalloc
> > virt addresses.
> 
> Can you provide an example where swiotlb is used with vmalloc()?

The issue was reported here, happening on the Raspberry Pi 4:
https://marc.info/?l=xen-devel=158862573216800

If you are asking where in the Linux codebase the vmalloc is happening
specifically, I don't know for sure, my information is limited to the
stack trace that you see in the link (I don't have a Raspberry Pi 4 yet
but I shall have one soon.)


> > Signed-off-by: Boris Ostrovsky 
> > Signed-off-by: Stefano Stabellini 
> > ---
> >   drivers/xen/swiotlb-xen.c | 5 -
> >   1 file changed, 4 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
> > index b6d27762c6f8..a42129cba36e 100644
> > --- a/drivers/xen/swiotlb-xen.c
> > +++ b/drivers/xen/swiotlb-xen.c
> > @@ -335,6 +335,7 @@ xen_swiotlb_free_coherent(struct device *hwdev, size_t
> > size, void *vaddr,
> > int order = get_order(size);
> > phys_addr_t phys;
> > u64 dma_mask = DMA_BIT_MASK(32);
> > +   struct page *pg;
> > if (hwdev && hwdev->coherent_dma_mask)
> > dma_mask = hwdev->coherent_dma_mask;
> > @@ -346,9 +347,11 @@ xen_swiotlb_free_coherent(struct device *hwdev, size_t
> > size, void *vaddr,
> > /* Convert the size to actually allocated. */
> > size = 1UL << (order + XEN_PAGE_SHIFT);
> >   + pg = is_vmalloc_addr(vaddr) ? vmalloc_to_page(vaddr) :
> > + virt_to_page(vaddr);
> 
> Common DMA code seems to protect this check with CONFIG_DMA_REMAP. Is it
> something we want to do it here as well? Or is there any other condition where
> vmalloc can happen?

I can see it in dma_direct_free_pages:

if (IS_ENABLED(CONFIG_DMA_REMAP) && is_vmalloc_addr(cpu_addr))
vunmap(cpu_addr);

I wonder why the common DMA code does that. is_vmalloc_addr should work
regardless of CONFIG_DMA_REMAP. Maybe just for efficiency?


Re: [PATCH 02/10] swiotlb-xen: remove start_dma_addr

2020-05-21 Thread Stefano Stabellini
On Thu, 21 May 2020, Julien Grall wrote:
> Hi,
> 
> On 21/05/2020 00:45, Stefano Stabellini wrote:
> > From: Stefano Stabellini 
> > 
> > It is not strictly needed. Call virt_to_phys on xen_io_tlb_start
> > instead. It will be useful not to have a start_dma_addr around with the
> > next patches.
> > 
> > Signed-off-by: Stefano Stabellini 
> > ---
> >   drivers/xen/swiotlb-xen.c | 5 +
> >   1 file changed, 1 insertion(+), 4 deletions(-)
> > 
> > diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
> > index a42129cba36e..b5e0492b07b9 100644
> > --- a/drivers/xen/swiotlb-xen.c
> > +++ b/drivers/xen/swiotlb-xen.c
> > @@ -52,8 +52,6 @@ static unsigned long xen_io_tlb_nslabs;
> >* Quick lookup value of the bus address of the IOTLB.
> >*/
> >   -static u64 start_dma_addr;
> > -
> >   /*
> >* Both of these functions should avoid XEN_PFN_PHYS because phys_addr_t
> >* can be 32bit when dma_addr_t is 64bit leading to a loss in
> > @@ -241,7 +239,6 @@ int __ref xen_swiotlb_init(int verbose, bool early)
> > m_ret = XEN_SWIOTLB_EFIXUP;
> > goto error;
> > }
> > -   start_dma_addr = xen_virt_to_bus(xen_io_tlb_start);
> > if (early) {
> > if (swiotlb_init_with_tbl(xen_io_tlb_start, xen_io_tlb_nslabs,
> >  verbose))
> > @@ -389,7 +386,7 @@ static dma_addr_t xen_swiotlb_map_page(struct device
> > *dev, struct page *page,
> >  */
> > trace_swiotlb_bounced(dev, dev_addr, size, swiotlb_force);
> >   - map = swiotlb_tbl_map_single(dev, start_dma_addr, phys,
> > +   map = swiotlb_tbl_map_single(dev, virt_to_phys(xen_io_tlb_start),
> > phys,
> 
> xen_virt_to_bus() is implemented as xen_phys_to_bus(virt_to_phys()). Can you
> explain how the two are equivalent?

They are not equivalent. Looking at what swiotlb_tbl_map_single expects,
and also the implementation of swiotlb_init_with_tbl, I think
virt_to_phys is actually the one we want.

swiotlb_tbl_map_single compares the argument with __pa(tlb) which is
__pa(xen_io_tlb_start) which is virt_to_phys(xen_io_tlb_start).


Re: [PATCH v4 0/2] add SW BOOST support for CPPC

2020-05-21 Thread Viresh Kumar
On 22-05-20, 11:34, Xiongfeng Wang wrote:
> ACPI spec 6.2 section 8.4.7.1 provide the following two CPC registers.
> 
> "Highest performance is the absolute maximum performance an individual
> processor may reach, assuming ideal conditions. This performance level
> may not be sustainable for long durations, and may only be achievable if
> other platform components are in a specific state; for example, it may
> require other processors be in an idle state.
> 
> Nominal Performance is the maximum sustained performance level of the
> processor, assuming ideal operating conditions. In absence of an
> external constraint (power, thermal, etc.) this is the performance level
> the platform is expected to be able to maintain continuously. All
> processors are expected to be able to sustain their nominal performance
> state simultaneously."
> 
> We can use Highest Performance as the max performance in boost mode and
> Nominal Performance as the max performance in non-boost mode. If the
> Highest Performance is greater than the Nominal Performance, we assume
> SW BOOST is supported.
> 
> v3->v4:
>   run 'boost_set_msr_each' for each CPU in the policy rather than
>   each CPU in the system for 'acpi-cpufreq'
>   add 'Suggested-by'
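
For readers following along, the rule quoted above can be written down
as a small sketch. This is not the patch itself: struct perf_caps_sketch
and both helpers are hypothetical names, and the values stand for
whatever the Highest Performance and Nominal Performance registers
report.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the two values discussed above. */
struct perf_caps_sketch {
	uint32_t highest_perf;	/* absolute maximum, ideal conditions */
	uint32_t nominal_perf;	/* maximum sustained level */
};

/* Boost is assumed to exist only if highest exceeds nominal. */
static bool boost_supported(const struct perf_caps_sketch *caps)
{
	return caps->highest_perf > caps->nominal_perf;
}

/* Upper limit to use, depending on whether boost is switched on. */
static uint32_t max_perf(const struct perf_caps_sketch *caps,
			 bool boost_enabled)
{
	if (boost_enabled && boost_supported(caps))
		return caps->highest_perf;
	return caps->nominal_perf;
}
```

When both values are equal, the sketch treats boost as unsupported and
the limit stays at the nominal level regardless of the boost setting.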

Acked-by: Viresh Kumar 

-- 
viresh


Re: [PATCH v4 2/2] mailbox: sprd: Add Spreadtrum mailbox driver

2020-05-21 Thread Jassi Brar
On Thu, May 21, 2020 at 7:24 AM Baolin Wang  wrote:
>
> Hi Jassi,
>
> On Wed, May 13, 2020 at 2:32 PM Baolin Wang  wrote:
> >
> > On Wed, May 13, 2020 at 2:05 PM Jassi Brar  wrote:
> > >
> > > On Tue, May 12, 2020 at 11:14 PM Baolin Wang  
> > > wrote:
> > > >
> > > > Hi Jassi,
> > > >
> > > > On Thu, May 7, 2020 at 11:23 AM Baolin Wang  
> > > > wrote:
> > > > >
> > > > > Hi Jassi,
> > > > >
> > > > > On Thu, May 7, 2020 at 7:25 AM Jassi Brar  
> > > > > wrote:
> > > > > >
> > > > > > On Wed, May 6, 2020 at 8:29 AM Baolin Wang  
> > > > > > wrote:
> > > > > > >
> > > > > > > Hi Jassi,
> > > > > > >
> > > > > > > On Tue, Apr 28, 2020 at 11:10 AM Baolin Wang 
> > > > > > >  wrote:
> > > > > > > >
> > > > > > > > From: Baolin Wang 
> > > > > > > >
> > > > > > > > The Spreadtrum mailbox controller supports 8 channels to 
> > > > > > > > communicate
> > > > > > > > with MCUs, and it contains 2 different parts: inbox and outbox, 
> > > > > > > > which
> > > > > > > > are used to send and receive messages by IRQ mode.
> > > > > > > >
> > > > > > > > Signed-off-by: Baolin Wang 
> > > > > > > > Signed-off-by: Baolin Wang 
> > > > > > > > ---
> > > > > > > > Changes from v3:
> > > > > > > >  - Save the id in mbox_chan.con_priv and remove the 
> > > > > > > > 'sprd_mbox_chan'
> > > > > > > >
> > > > > > > > Changes from v2:
> > > > > > > >  - None.
> > > > > > > >
> > > > > > > > Changes from v1:
> > > > > > > >  - None
> > > > > > >
> > > > > > > Gentle ping, do you have any other comments? Thanks.
> > > > > > >
> > > > > > Yea, I am still not sure about the error returned in send_data().  
> > > > > > It
> > > > > > will either never hit or there will be no easy recovery from it. The
> > > > > > api expects the driver to tell it the last-tx was done only when it
> > > > > > can send the next message. (There may be case like sending depend on
> > > > > > remote, which can't be ensured before hand).
> > > > >
> > > > > Actually this is an unusual case, suppose the remote target did not
> > > > > > fetch the message as soon as possible, which will cause the FIFO
> > > > > overflow, so in this case we  can not send messages to the remote
> > > > > target any more, otherwise messages will be lost. Thus we can return
> > > > > errors to users to indicate that something wrong with the remote
> > > > > target need to be checked.
> > > > >
> > > > > So this validation in send_data() is mostly for debugging for this
> > > > > abnormal case and we will not trigger this issue if the remote target
> > > > > works well. So I think it is useful to keep this validation in
> > > > > send_data(). Thanks.
> > > >
> > > > Any comments? Thanks.
> > > >
> > > Same as my last post.
> >
> > I think I've explained the reason why we need add this validation in
> > my previous email, I am not sure how do you think? You still want to
> > remove this validation?
>
> Gentle ping.
>
> As I explained in previous email, this validation is for an unusual
> case, suppose the remote target did not fetch the message as soon as
> possible, which will cause the FIFO overflow, so in this case we  can
> not send messages to the remote
> target any more, otherwise messages will be lost. Thus we can return
> errors to users to indicate that something wrong with the remote
> target need to be checked.
>
> So this validation in send_data() is mostly for debugging for this
> abnormal case and we will not trigger this issue if the remote target
> works well. So I think it is useful to keep this validation in
> send_data(). What do you think? Thanks.
>
I still think the same as before.
You should do this check before you call mbox_chan_txdone() and wait
if busy ... which is exactly the purpose of txdone().
It seems harmless to be paranoid and place a block of code in
practically "if 0", but that sets a bad precedent for other drivers. So
please move the check before txdone().

thanks.


[PATCH -next] mt76: mt7915: Fix build error

2020-05-21 Thread YueHaibing
In file included from ./include/linux/firmware.h:6:0,
 from drivers/net/wireless/mediatek/mt76/mt7915/mcu.c:4:
In function ‘__mt7915_mcu_msg_send’,
inlined from ‘mt7915_mcu_send_message’ at 
drivers/net/wireless/mediatek/mt76/mt7915/mcu.c:370:6:
./include/linux/compiler.h:396:38: error: call to ‘__compiletime_assert_545’ 
declared with attribute error: BUILD_BUG_ON failed: cmd == 
MCU_EXT_CMD_EFUSE_ACCESS && mcu_txd->set_query != MCU_Q_QUERY
  _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
  ^
./include/linux/compiler.h:377:4: note: in definition of macro 
‘__compiletime_assert’
prefix ## suffix();\
^~
./include/linux/compiler.h:396:2: note: in expansion of macro 
‘_compiletime_assert’
  _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
  ^~~
./include/linux/build_bug.h:39:37: note: in expansion of macro 
‘compiletime_assert’
 #define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg)
 ^~
./include/linux/build_bug.h:50:2: note: in expansion of macro ‘BUILD_BUG_ON_MSG’
  BUILD_BUG_ON_MSG(condition, "BUILD_BUG_ON failed: " #condition)
  ^~~~
drivers/net/wireless/mediatek/mt76/mt7915/mcu.c:280:2: note: in expansion of 
macro ‘BUILD_BUG_ON’
  BUILD_BUG_ON(cmd == MCU_EXT_CMD_EFUSE_ACCESS &&
  ^~~~

BUILD_BUG_ON is meaningless here; change it to WARN_ON.
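
As a general illustration (a userspace sketch with made-up names, not
the mt7915 code): a compile-time assertion only makes sense when the
compiler can evaluate the condition, while a condition built from
values filled in at run time needs a run-time check. Here
_Static_assert stands in for the compile-time case and
warn_on_sketch() for the run-time one.

```c
#include <stdio.h>

/* Hypothetical command and query-type constants for the sketch. */
enum {
	CMD_EFUSE_SKETCH = 1,
	Q_QUERY_SKETCH = 0,
	Q_SET_SKETCH = 1,
};

/* Compile-time check: every operand is a constant expression. */
_Static_assert(Q_QUERY_SKETCH != Q_SET_SKETCH, "query and set must differ");

/* Run-time check: prints a warning and returns the condition. */
static int warn_on_sketch(int cond, const char *what)
{
	if (cond)
		fprintf(stderr, "WARNING: %s\n", what);
	return cond;
}

/*
 * cmd and set_query are function arguments, so in general they are only
 * known at run time and cannot be tested with a static assertion.
 * Returns 1 if the warning fired, 0 otherwise.
 */
static int send_cmd_sketch(int cmd, int set_query)
{
	return warn_on_sketch(cmd == CMD_EFUSE_SKETCH &&
			      set_query != Q_QUERY_SKETCH,
			      "EFUSE access must be a query");
}
```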

Fixes: e57b7901469f ("mt76: add mac80211 driver for MT7915 PCIe-based chipsets")
Signed-off-by: YueHaibing 
---
 drivers/net/wireless/mediatek/mt76/mt7915/mcu.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/wireless/mediatek/mt76/mt7915/mcu.c 
b/drivers/net/wireless/mediatek/mt76/mt7915/mcu.c
index f00ad2b66761..99eeea42478f 100644
--- a/drivers/net/wireless/mediatek/mt76/mt7915/mcu.c
+++ b/drivers/net/wireless/mediatek/mt76/mt7915/mcu.c
@@ -277,8 +277,8 @@ static int __mt7915_mcu_msg_send(struct mt7915_dev *dev, 
struct sk_buff *skb,
}
 
mcu_txd->s2d_index = MCU_S2D_H2N;
-   BUILD_BUG_ON(cmd == MCU_EXT_CMD_EFUSE_ACCESS &&
-mcu_txd->set_query != MCU_Q_QUERY);
+   WARN_ON(cmd == MCU_EXT_CMD_EFUSE_ACCESS &&
+   mcu_txd->set_query != MCU_Q_QUERY);
 
 exit:
if (wait_seq)
-- 
2.17.1




mmotm 2020-05-21-20-42 uploaded

2020-05-21 Thread Andrew Morton
The mm-of-the-moment snapshot 2020-05-21-20-42 has been uploaded to

   http://www.ozlabs.org/~akpm/mmotm/

mmotm-readme.txt says

README for mm-of-the-moment:

http://www.ozlabs.org/~akpm/mmotm/

This is a snapshot of my -mm patch queue.  Uploaded at random hopefully
more than once a week.

You will need quilt to apply these patches to the latest Linus release (5.x
or 5.x-rcY).  The series file is in broken-out.tar.gz and is duplicated in
http://ozlabs.org/~akpm/mmotm/series

The file broken-out.tar.gz contains two datestamp files: .DATE and
.DATE-yyyy-mm-dd-hh-mm-ss.  Both contain the string yyyy-mm-dd-hh-mm-ss,
followed by the base kernel version against which this patch series is to
be applied.

This tree is partially included in linux-next.  To see which patches are
included in linux-next, consult the `series' file.  Only the patches
within the #NEXT_PATCHES_START/#NEXT_PATCHES_END markers are included in
linux-next.


A full copy of the full kernel tree with the linux-next and mmotm patches
already applied is available through git within an hour of the mmotm
release.  Individual mmotm releases are tagged.  The master branch always
points to the latest release, so it's constantly rebasing.

https://github.com/hnaz/linux-mm

The directory http://www.ozlabs.org/~akpm/mmots/ (mm-of-the-second)
contains daily snapshots of the -mm tree.  It is updated more frequently
than mmotm, and is untested.

A git copy of this tree is also available at

https://github.com/hnaz/linux-mm



This mmotm tree contains the following patches against 5.7-rc6:
(patches marked "*" will be included in linux-next)

  origin.patch
* checkpatch-test-git_dir-changes.patch
* proc-kpageflags-prevent-an-integer-overflow-in-stable_page_flags.patch
* proc-kpageflags-do-not-use-uninitialized-struct-pages.patch
* kcov-cleanup-debug-messages.patch
* kcov-fix-potential-use-after-free-in-kcov_remote_start.patch
* kcov-move-t-kcov-assignments-into-kcov_start-stop.patch
* kcov-move-t-kcov_sequence-assignment.patch
* kcov-use-t-kcov_mode-as-enabled-indicator.patch
* kcov-collect-coverage-from-interrupts.patch
* usb-core-kcov-collect-coverage-from-usb-complete-callback.patch
* memcg-optimize-memorynuma_stat-like-memorystat.patch
* lib-lzo-fix-ambiguous-encoding-bug-in-lzo-rle.patch
* device-dax-dont-leak-kernel-memory-to-user-space-after-unloading-kmem.patch
* x86-bitops-fix-build-regression.patch
* mm-compaction-avoid-vm_bug_onpageslab-in-page_mapcount.patch
* rapidio-fix-an-error-in-get_user_pages_fast-error-handling.patch
* selftests-vm-gitignore-add-mremap_dontunmap.patch
* selftests-vm-write_to_hugetlbfsc-fix-unused-variable-warning.patch
* kasan-disable-branch-tracing-for-core-runtime.patch
* sh-include-linux-time_typesh-for-sockios.patch
* maintainers-update-email-address-for-naoya-horiguchi.patch
* sparc32-use-pud-rather-than-pgd-to-get-pmd-in-srmmu_nocache_init.patch
* z3fold-fix-use-after-free-when-freeing-handles.patch
* maintainers-add-files-related-to-kdump.patch
* x86-mm-ptdump-calculate-effective-permissions-correctly.patch
* mm-ptdump-expand-type-of-val-in-note_page.patch
* squashfs-migrate-from-ll_rw_block-usage-to-bio.patch
* squashfs-migrate-from-ll_rw_block-usage-to-bio-fix.patch
* ocfs2-add-missing-annotation-for-dlm_empty_lockres.patch
* ocfs2-mount-shared-volume-without-ha-stack.patch
* arch-parisc-include-asm-pgtableh-remove-unused-old_pte.patch
* drivers-tty-serial-sh-scic-suppress-uninitialized-var-warning.patch
* ramfs-support-o_tmpfile.patch
* vfs-track-per-sb-writeback-errors-and-report-them-to-syncfs.patch
* buffer-record-blockdev-write-errors-in-super_block-that-it-backs.patch
* kernel-watchdog-flush-all-printk-nmi-buffers-when-hardlockup-detected.patch
  mm.patch
* usercopy-mark-dma-kmalloc-caches-as-usercopy-caches.patch
* mm-slub-fix-corrupted-freechain-in-deactivate_slab.patch
* mm-slub-fix-corrupted-freechain-in-deactivate_slab-fix.patch
* slub-remove-userspace-notifier-for-cache-add-remove.patch
* slub-remove-kmalloc-under-list_lock-from-list_slab_objects.patch
* mm-slub-fix-stack-overruns-with-slub_stats.patch
* mm-slub-add-panic_on_error-to-the-debug-facilities-fix.patch
* mm-dump_page-do-not-crash-with-invalid-mapping-pointer.patch
* mm-move-readahead-prototypes-from-mmh.patch
* mm-return-void-from-various-readahead-functions.patch
* mm-ignore-return-value-of-readpages.patch
* mm-move-readahead-nr_pages-check-into-read_pages.patch
* mm-add-new-readahead_control-api.patch
* mm-use-readahead_control-to-pass-arguments.patch
* mm-rename-various-offset-parameters-to-index.patch
* mm-rename-readahead-loop-variable-to-i.patch
* mm-remove-page_offset-from-readahead-loop.patch
* mm-put-readahead-pages-in-cache-earlier.patch
* mm-add-readahead-address-space-operation.patch
* mm-move-end_index-check-out-of-readahead-loop.patch
* mm-add-page_cache_readahead_unbounded.patch
* mm-document-why-we-dont-set-pagereadahead.patch
* mm-use-memalloc_nofs_save-in-readahead-path.patch
* 

Re: [RFC PATCH 07/13] sched: Add core wide task selection and scheduling.

2020-05-21 Thread Aaron Lu
On Thu, May 21, 2020 at 10:35:56PM -0400, Joel Fernandes wrote:
> Discussed a lot with Vineeth. Below is an improved version of the pick_task()
> similification.
> 
> It also handles the following "bug" in the existing code as well that Vineeth
> brought up in OSPM: Suppose 2 siblings of a core: rq 1 and rq 2.
> 
> In priority order (high to low), say we have the tasks:
> A - untagged  (rq 1)
> B - tagged(rq 2)
> C - untagged  (rq 2)
> 
> Say, B and C are in the same scheduling class.
> 
> When the pick_next_task() loop runs, it looks at rq 1 and max is A, A is
> tenantively selected for rq 1. Then it looks at rq 2 and the class_pick is B.
> But that's not compatible with A. So rq 2 gets forced idle.
> 
> In reality, rq 2 could have run C instead of idle. The fix is to add C to the
> tag tree as Peter suggested in OSPM.

I like the idea of adding untagged task to the core tree.
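
The scenario can be sketched in plain C, outside the scheduler. This is
only an illustration under simplifying assumptions: struct task,
pick_compatible() and the linear scan are hypothetical, whereas the patch
keeps tasks in an rbtree. The point is that if untagged tasks (cookie 0)
can be looked up the same way as tagged ones, rq 2 can find C once A has
been selected, instead of being forced idle.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified task: a higher prio value means higher priority. */
struct task {
	const char *name;
	int prio;
	unsigned long cookie;	/* 0 = untagged */
};

/*
 * Pick the highest-priority task on a sibling's queue whose cookie
 * matches the cookie already chosen for the core. A NULL result means
 * the sibling would be forced idle.
 */
static const struct task *pick_compatible(const struct task *q, size_t n,
					   unsigned long core_cookie)
{
	const struct task *best = NULL;
	size_t i;

	assert(q != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (q[i].cookie != core_cookie)
			continue;
		if (!best || q[i].prio > best->prio)
			best = &q[i];
	}
	return best;
}
```

With rq 2 holding B (tagged) and C (untagged) and the core cookie set to
0 by A, the lookup returns C; with only B queued it returns NULL.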

> Updated diff below:
> 
> ---8<---
> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 005d7f7323e2d..625377f393ed3 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -182,9 +182,6 @@ static void sched_core_enqueue(struct rq *rq, struct 
> task_struct *p)
>  
>   rq->core->core_task_seq++;
>  
> - if (!p->core_cookie)
> - return;
> -
>   node = >core_tree.rb_node;
>   parent = *node;
>  
> @@ -215,7 +212,7 @@ static void sched_core_dequeue(struct rq *rq, struct 
> task_struct *p)
>  
>  void sched_core_add(struct rq *rq, struct task_struct *p)
>  {
> - if (p->core_cookie && task_on_rq_queued(p))
> + if (task_on_rq_queued(p))
>   sched_core_enqueue(rq, p);
>  }

It appears there are other call sites of sched_core_enqueue() where
core_cookie is checked: cpu_cgroup_fork() and __sched_write_tag().


[PATCH v4 0/2] add SW BOOST support for CPPC

2020-05-21 Thread Xiongfeng Wang
ACPI spec 6.2 section 8.4.7.1 provides the following two CPC registers.

"Highest performance is the absolute maximum performance an individual
processor may reach, assuming ideal conditions. This performance level
may not be sustainable for long durations, and may only be achievable if
other platform components are in a specific state; for example, it may
require other processors be in an idle state.

Nominal Performance is the maximum sustained performance level of the
processor, assuming ideal operating conditions. In absence of an
external constraint (power, thermal, etc.) this is the performance level
the platform is expected to be able to maintain continuously. All
processors are expected to be able to sustain their nominal performance
state simultaneously."

We can use Highest Performance as the max performance in boost mode and
Nominal Performance as the max performance in non-boost mode. If the
Highest Performance is greater than the Nominal Performance, we assume
SW BOOST is supported.

v3->v4:
run 'boost_set_msr_each' for each CPU in the policy rather than
each CPU in the system for 'acpi-cpufreq'
add 'Suggested-by'

Xiongfeng Wang (2):
  cpufreq: change '.set_boost' to act on only one policy
  CPPC: add support for SW BOOST

 drivers/cpufreq/acpi-cpufreq.c | 10 
 drivers/cpufreq/cppc_cpufreq.c | 39 +--
 drivers/cpufreq/cpufreq.c  | 53 +-
 include/linux/cpufreq.h|  2 +-
 4 files changed, 71 insertions(+), 33 deletions(-)

-- 
1.7.12.4



[PATCH v4 2/2] CPPC: add support for SW BOOST

2020-05-21 Thread Xiongfeng Wang
To add SW BOOST support for CPPC, we need to get the max frequency of
boost mode and non-boost mode. ACPI spec 6.2 section 8.4.7.1 describes
the following two CPC registers.

"Highest performance is the absolute maximum performance an individual
processor may reach, assuming ideal conditions. This performance level
may not be sustainable for long durations, and may only be achievable if
other platform components are in a specific state; for example, it may
require other processors be in an idle state.

Nominal Performance is the maximum sustained performance level of the
processor, assuming ideal operating conditions. In absence of an
external constraint (power, thermal, etc.) this is the performance level
the platform is expected to be able to maintain continuously. All
processors are expected to be able to sustain their nominal performance
state simultaneously."

To add SW BOOST support for CPPC, we can use Highest Performance as the
max performance in boost mode and Nominal Performance as the max
performance in non-boost mode. If the Highest Performance is greater
than the Nominal Performance, we assume SW BOOST is supported.

The current CPPC driver does not support SW BOOST and uses 'Highest
Performance' as the max performance the CPU can achieve. 'Nominal
Performance' is used to convert 'performance' to 'frequency'. That
means that if firmware enables boost and provides a value for Highest
Performance that is greater than Nominal Performance, the boost feature
is enabled by default.

Because SW BOOST is disabled by default, after this patch the boost
feature is disabled by default even if boost is enabled by firmware.

Signed-off-by: Xiongfeng Wang 
Suggested-by: Viresh Kumar 
---
 drivers/cpufreq/cppc_cpufreq.c | 39 +--
 1 file changed, 37 insertions(+), 2 deletions(-)

diff --git a/drivers/cpufreq/cppc_cpufreq.c b/drivers/cpufreq/cppc_cpufreq.c
index bda0b24..257d726 100644
--- a/drivers/cpufreq/cppc_cpufreq.c
+++ b/drivers/cpufreq/cppc_cpufreq.c
@@ -37,6 +37,7 @@
  * requested etc.
  */
 static struct cppc_cpudata **all_cpu_data;
+static bool boost_supported;
 
 struct cppc_workaround_oem_info {
char oem_id[ACPI_OEM_ID_SIZE + 1];
@@ -310,7 +311,7 @@ static int cppc_cpufreq_cpu_init(struct cpufreq_policy 
*policy)
 * Section 8.4.7.1.1.5 of ACPI 6.1 spec)
 */
policy->min = cppc_cpufreq_perf_to_khz(cpu, 
cpu->perf_caps.lowest_nonlinear_perf);
-   policy->max = cppc_cpufreq_perf_to_khz(cpu, 
cpu->perf_caps.highest_perf);
+   policy->max = cppc_cpufreq_perf_to_khz(cpu, 
cpu->perf_caps.nominal_perf);
 
/*
 * Set cpuinfo.min_freq to Lowest to make the full range of performance
@@ -318,7 +319,7 @@ static int cppc_cpufreq_cpu_init(struct cpufreq_policy 
*policy)
 * nonlinear perf
 */
policy->cpuinfo.min_freq = cppc_cpufreq_perf_to_khz(cpu, 
cpu->perf_caps.lowest_perf);
-   policy->cpuinfo.max_freq = cppc_cpufreq_perf_to_khz(cpu, 
cpu->perf_caps.highest_perf);
+   policy->cpuinfo.max_freq = cppc_cpufreq_perf_to_khz(cpu, 
cpu->perf_caps.nominal_perf);
 
policy->transition_delay_us = 
cppc_cpufreq_get_transition_delay_us(cpu_num);
policy->shared_type = cpu->shared_type;
@@ -343,6 +344,13 @@ static int cppc_cpufreq_cpu_init(struct cpufreq_policy 
*policy)
 
cpu->cur_policy = policy;
 
+   /*
+* If 'highest_perf' is greater than 'nominal_perf', we assume CPU Boost
+* is supported.
+*/
+   if (cpu->perf_caps.highest_perf > cpu->perf_caps.nominal_perf)
+   boost_supported = true;
+
/* Set policy->cur to max now. The governors will adjust later. */
policy->cur = cppc_cpufreq_perf_to_khz(cpu,
cpu->perf_caps.highest_perf);
@@ -410,6 +418,32 @@ static unsigned int cppc_cpufreq_get_rate(unsigned int 
cpunum)
return cppc_get_rate_from_fbctrs(cpu, fb_ctrs_t0, fb_ctrs_t1);
 }
 
+static int cppc_cpufreq_set_boost(struct cpufreq_policy *policy, int state)
+{
+   struct cppc_cpudata *cpudata;
+   int ret;
+
+   if (!boost_supported) {
+   pr_err("BOOST not supported by CPU or firmware\n");
+   return -EINVAL;
+   }
+
+   cpudata = all_cpu_data[policy->cpu];
+   if (state)
+   policy->max = cppc_cpufreq_perf_to_khz(cpudata,
+   cpudata->perf_caps.highest_perf);
+   else
+   policy->max = cppc_cpufreq_perf_to_khz(cpudata,
+   cpudata->perf_caps.nominal_perf);
+   policy->cpuinfo.max_freq = policy->max;
+
+   ret = freq_qos_update_request(policy->max_freq_req, policy->max);
+   if (ret < 0)
+   return ret;
+
+   return 0;
+}
+
 static struct cpufreq_driver cppc_cpufreq_driver = {
.flags = CPUFREQ_CONST_LOOPS,
.verify = cppc_verify_policy,
@@ -417,6 +451,7 @@ static unsigned int 

[PATCH v4 1/2] cpufreq: change '.set_boost' to act on only one policy

2020-05-21 Thread Xiongfeng Wang
The macro 'for_each_active_policy()' is defined internally. To avoid
cpufreq drivers needing this macro to iterate over all the policies in
the '.set_boost' callback, redefine '.set_boost' to act on only one
policy and pass the policy as an argument.
'cpufreq_boost_trigger_state()' iterates over all the policies to set
boost for the system. This is preparation for adding SW BOOST support
for CPPC.
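
The resulting split of responsibilities can be sketched in standalone C.
This is only an illustration: struct policy, set_boost_fn,
demo_set_boost() and trigger_boost() are hypothetical names, and the
locking done by the real code is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, minimal stand-in for struct cpufreq_policy. */
struct policy {
	int id;
	int boosted;
};

/* Per-policy callback, analogous to the redefined '.set_boost'. */
typedef int (*set_boost_fn)(struct policy *policy, int state);

/* Stand-in for cpufreq_driver->boost_enabled. */
static int boost_enabled;

/* Example callback: fails for policies with a negative id. */
static int demo_set_boost(struct policy *policy, int state)
{
	if (policy->id < 0)
		return -1;
	policy->boosted = state;
	return 0;
}

/*
 * The core owns the iteration: it calls the callback once per policy
 * and resets the global flag on the first failure. As in the patch,
 * policies that were already updated are not rolled back.
 */
static int trigger_boost(struct policy *policies, size_t n,
			 set_boost_fn set_boost, int state)
{
	size_t i;
	int ret;

	assert(set_boost != NULL);
	boost_enabled = state;
	for (i = 0; i < n; i++) {
		ret = set_boost(&policies[i], state);
		if (ret) {
			boost_enabled = !state;
			return ret;
		}
	}
	return 0;
}
```

A driver then only has to supply the per-policy callback and never needs
to walk the policy list itself.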

Signed-off-by: Xiongfeng Wang 
Suggested-by: Viresh Kumar 
---
 drivers/cpufreq/acpi-cpufreq.c | 10 
 drivers/cpufreq/cpufreq.c  | 53 +-
 include/linux/cpufreq.h|  2 +-
 3 files changed, 34 insertions(+), 31 deletions(-)

diff --git a/drivers/cpufreq/acpi-cpufreq.c b/drivers/cpufreq/acpi-cpufreq.c
index 289e8ce..813aabf 100644
--- a/drivers/cpufreq/acpi-cpufreq.c
+++ b/drivers/cpufreq/acpi-cpufreq.c
@@ -126,12 +126,14 @@ static void boost_set_msr_each(void *p_en)
boost_set_msr(enable);
 }
 
-static int set_boost(int val)
+static int set_boost(struct cpufreq_policy *policy, int val)
 {
get_online_cpus();
-   on_each_cpu(boost_set_msr_each, (void *)(long)val, 1);
+   on_each_cpu_mask(policy->cpus, boost_set_msr_each,
+(void *)(long)val, 1);
put_online_cpus();
-   pr_debug("Core Boosting %sabled.\n", val ? "en" : "dis");
+   pr_debug("CPU %*pbl: Core Boosting %sabled.\n",
+cpumask_pr_args(policy->cpus), val ? "en" : "dis");
 
return 0;
 }
@@ -162,7 +164,7 @@ static ssize_t store_cpb(struct cpufreq_policy *policy, 
const char *buf,
if (ret || val > 1)
return -EINVAL;
 
-   set_boost(val);
+   set_boost(policy, val);
 
return count;
 }
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index d03f250..d0d86b1 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -2532,34 +2532,29 @@ void cpufreq_update_limits(unsigned int cpu)
 /*
  *   BOOST  *
  */
-static int cpufreq_boost_set_sw(int state)
+static int cpufreq_boost_set_sw(struct cpufreq_policy *policy, int state)
 {
-   struct cpufreq_policy *policy;
-
-   for_each_active_policy(policy) {
-   int ret;
-
-   if (!policy->freq_table)
-   return -ENXIO;
+   int ret;
 
-   ret = cpufreq_frequency_table_cpuinfo(policy,
- policy->freq_table);
-   if (ret) {
-   pr_err("%s: Policy frequency update failed\n",
-  __func__);
-   return ret;
-   }
+   if (!policy->freq_table)
+   return -ENXIO;
 
-   ret = freq_qos_update_request(policy->max_freq_req, 
policy->max);
-   if (ret < 0)
-   return ret;
+   ret = cpufreq_frequency_table_cpuinfo(policy, policy->freq_table);
+   if (ret) {
+   pr_err("%s: Policy frequency update failed\n", __func__);
+   return ret;
}
 
+   ret = freq_qos_update_request(policy->max_freq_req, policy->max);
+   if (ret < 0)
+   return ret;
+
return 0;
 }
 
 int cpufreq_boost_trigger_state(int state)
 {
+   struct cpufreq_policy *policy;
unsigned long flags;
int ret = 0;
 
@@ -2570,16 +2565,22 @@ int cpufreq_boost_trigger_state(int state)
cpufreq_driver->boost_enabled = state;
write_unlock_irqrestore(&cpufreq_driver_lock, flags);
 
-   ret = cpufreq_driver->set_boost(state);
-   if (ret) {
-   write_lock_irqsave(&cpufreq_driver_lock, flags);
-   cpufreq_driver->boost_enabled = !state;
-   write_unlock_irqrestore(&cpufreq_driver_lock, flags);
-
-   pr_err("%s: Cannot %s BOOST\n",
-  __func__, state ? "enable" : "disable");
+   for_each_active_policy(policy) {
+   ret = cpufreq_driver->set_boost(policy, state);
+   if (ret)
+   goto err_reset_state;
}
 
+   return 0;
+
+err_reset_state:
+   write_lock_irqsave(&cpufreq_driver_lock, flags);
+   cpufreq_driver->boost_enabled = !state;
+   write_unlock_irqrestore(&cpufreq_driver_lock, flags);
+
+   pr_err("%s: Cannot %s BOOST\n",
+  __func__, state ? "enable" : "disable");
+
return ret;
 }
 
diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
index 67d5950..3494f67 100644
--- a/include/linux/cpufreq.h
+++ b/include/linux/cpufreq.h
@@ -367,7 +367,7 @@ struct cpufreq_driver {
 
/* platform specific boost support code */
boolboost_enabled;
-   int (*set_boost)(int state);
+   int (*set_boost)(struct cpufreq_policy *policy, int state);
 

[PATCH v5 08/14] PCI: cadence: Fix updating Vendor ID and Subsystem Vendor ID register

2020-05-21 Thread Kishon Vijay Abraham I
Commit 1b79c5284439 ("PCI: cadence: Add host driver for Cadence PCIe
controller") wrote directly to the PCI_VENDOR_ID register in order to
update the Vendor ID. However, PCI_VENDOR_ID in the root port
configuration space is a read-only register, and writing to it has no
effect. Use the local management register to configure Vendor ID and
Subsystem Vendor ID instead.
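
How the register value is composed can be sketched as below. The bit
layout is an assumption made for illustration (Vendor ID in bits 15:0,
Subsystem Vendor ID in bits 31:16); LM_ID_VENDOR(), LM_ID_SUBSYS() and
lm_id_value() are hypothetical stand-ins for the driver's macros, and
the 0xffff check assumes the usual PCI "no device" vendor ID.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: vendor ID in bits 15:0, subsystem vendor ID in 31:16. */
#define LM_ID_VENDOR(vid)	((uint32_t)(vid) & 0xffffu)
#define LM_ID_SUBSYS(sub)	(((uint32_t)(sub) & 0xffffu) << 16)

/*
 * Build the 32-bit value written to the local management ID register.
 * As in the patch, the same ID is used for both fields.
 */
static uint32_t lm_id_value(uint16_t vendor_id)
{
	assert(vendor_id != 0xffff);	/* 0xffff is treated as "not set" */
	return LM_ID_VENDOR(vendor_id) | LM_ID_SUBSYS(vendor_id);
}
```

For TI's vendor ID 0x104c this yields 0x104c104c under the assumed layout.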

Fixes: 1b79c5284439 ("PCI: cadence: Add host driver for Cadence PCIe 
controller")
Reviewed-by: Rob Herring 
Signed-off-by: Kishon Vijay Abraham I 
---
 drivers/pci/controller/cadence/pcie-cadence-host.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/controller/cadence/pcie-cadence-host.c 
b/drivers/pci/controller/cadence/pcie-cadence-host.c
index 3003fafa3bfa..7ee9e06f1285 100644
--- a/drivers/pci/controller/cadence/pcie-cadence-host.c
+++ b/drivers/pci/controller/cadence/pcie-cadence-host.c
@@ -76,6 +76,7 @@ static int cdns_pcie_host_init_root_port(struct cdns_pcie_rc 
*rc)
 {
struct cdns_pcie *pcie = &rc->pcie;
u32 value, ctrl;
+   u32 id;
 
/*
 * Set the root complex BAR configuration register:
@@ -95,8 +96,12 @@ static int cdns_pcie_host_init_root_port(struct cdns_pcie_rc 
*rc)
cdns_pcie_writel(pcie, CDNS_PCIE_LM_RC_BAR_CFG, value);
 
/* Set root port configuration space */
-   if (rc->vendor_id != 0xffff)
-   cdns_pcie_rp_writew(pcie, PCI_VENDOR_ID, rc->vendor_id);
+   if (rc->vendor_id != 0xffff) {
+   id = CDNS_PCIE_LM_ID_VENDOR(rc->vendor_id) |
+   CDNS_PCIE_LM_ID_SUBSYS(rc->vendor_id);
+   cdns_pcie_writel(pcie, CDNS_PCIE_LM_ID, id);
+   }
+
if (rc->device_id != 0xffff)
cdns_pcie_rp_writew(pcie, PCI_DEVICE_ID, rc->device_id);
 
-- 
2.17.1



[PATCH v5 14/14] MAINTAINERS: Add Kishon Vijay Abraham I for TI J721E SoC PCIe

2020-05-21 Thread Kishon Vijay Abraham I
Add Kishon Vijay Abraham I as MAINTAINER for TI J721E SoC PCIe.

Acked-by: Rob Herring 
Signed-off-by: Kishon Vijay Abraham I 
---
 MAINTAINERS | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 2926327e4976..9d40e1318f7c 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -12978,12 +12978,14 @@ S:Maintained
 F: Documentation/devicetree/bindings/pci/designware-pcie.txt
 F: drivers/pci/controller/dwc/*designware*
 
-PCI DRIVER FOR TI DRA7XX
+PCI DRIVER FOR TI DRA7XX/J721E
 M: Kishon Vijay Abraham I 
 L: linux-o...@vger.kernel.org
 L: linux-...@vger.kernel.org
+L: linux-arm-ker...@lists.infradead.org
 S: Supported
 F: Documentation/devicetree/bindings/pci/ti-pci.txt
+F: drivers/pci/controller/cadence/pci-j721e.c
 F: drivers/pci/controller/dwc/pci-dra7xx.c
 
 PCI DRIVER FOR TI KEYSTONE
-- 
2.17.1



[PATCH v5 12/14] PCI: j721e: Add TI J721E PCIe driver

2020-05-21 Thread Kishon Vijay Abraham I
Add support for PCIe controller in J721E SoC. The controller uses the
Cadence PCIe core programmed by pcie-cadence*.c. The PCIe controller
will work in both host mode and device mode.
Some of the features of the controller are:
  *) Supports both RC mode and EP mode
  *) Supports MSI and MSI-X
  *) Supports up to GEN3 speed mode
  *) Supports SR-IOV capability
  *) Ability to route all transactions via SMMU (support will be added
 in a later patch).

Signed-off-by: Kishon Vijay Abraham I 
---
 drivers/pci/controller/cadence/Kconfig|  23 +
 drivers/pci/controller/cadence/Makefile   |   1 +
 drivers/pci/controller/cadence/pci-j721e.c| 490 ++
 .../controller/cadence/pcie-cadence-host.c|   4 +-
 drivers/pci/controller/cadence/pcie-cadence.h |   8 +
 5 files changed, 524 insertions(+), 2 deletions(-)
 create mode 100644 drivers/pci/controller/cadence/pci-j721e.c

diff --git a/drivers/pci/controller/cadence/Kconfig 
b/drivers/pci/controller/cadence/Kconfig
index b76b3cf55ce5..5d30564190e1 100644
--- a/drivers/pci/controller/cadence/Kconfig
+++ b/drivers/pci/controller/cadence/Kconfig
@@ -42,4 +42,27 @@ config PCIE_CADENCE_PLAT_EP
  endpoint mode. This PCIe controller may be embedded into many
  different vendors SoCs.
 
+config PCI_J721E
+   bool
+
+config PCI_J721E_HOST
+   bool "TI J721E PCIe platform host controller"
+   depends on OF
+   select PCIE_CADENCE_HOST
+   select PCI_J721E
+   help
+ Say Y here if you want to support the TI J721E PCIe platform
+ controller in host mode. TI J721E PCIe controller uses Cadence PCIe
+ core.
+
+config PCI_J721E_EP
+   bool "TI J721E PCIe platform endpoint controller"
+   depends on OF
+   depends on PCI_ENDPOINT
+   select PCIE_CADENCE_EP
+   select PCI_J721E
+   help
+ Say Y here if you want to support the TI J721E PCIe platform
+ controller in endpoint mode. TI J721E PCIe controller uses Cadence 
PCIe
+ core.
 endmenu
diff --git a/drivers/pci/controller/cadence/Makefile 
b/drivers/pci/controller/cadence/Makefile
index 232a3f20876a..9bac5fb2f13d 100644
--- a/drivers/pci/controller/cadence/Makefile
+++ b/drivers/pci/controller/cadence/Makefile
@@ -3,3 +3,4 @@ obj-$(CONFIG_PCIE_CADENCE) += pcie-cadence.o
 obj-$(CONFIG_PCIE_CADENCE_HOST) += pcie-cadence-host.o
 obj-$(CONFIG_PCIE_CADENCE_EP) += pcie-cadence-ep.o
 obj-$(CONFIG_PCIE_CADENCE_PLAT) += pcie-cadence-plat.o
+obj-$(CONFIG_PCI_J721E) += pci-j721e.o
diff --git a/drivers/pci/controller/cadence/pci-j721e.c 
b/drivers/pci/controller/cadence/pci-j721e.c
new file mode 100644
index 000000000000..9b3ab880a3c5
--- /dev/null
+++ b/drivers/pci/controller/cadence/pci-j721e.c
@@ -0,0 +1,490 @@
+// SPDX-License-Identifier: GPL-2.0
+/**
+ * pci-j721e - PCIe controller driver for TI's J721E SoCs
+ *
+ * Copyright (C) 2020 Texas Instruments Incorporated - http://www.ti.com
+ * Author: Kishon Vijay Abraham I 
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "../../pci.h"
+#include "pcie-cadence.h"
+
+#define ENABLE_REG_SYS_2   0x108
+#define STATUS_REG_SYS_2   0x508
+#define STATUS_CLR_REG_SYS_2   0x708
+#define LINK_DOWN  BIT(1)
+
+#define J721E_PCIE_USER_CMD_STATUS 0x4
+#define LINK_TRAINING_ENABLE   BIT(0)
+
+#define J721E_PCIE_USER_LINKSTATUS 0x14
+#define LINK_STATUSGENMASK(1, 0)
+
+enum link_status {
+   NO_RECEIVERS_DETECTED,
+   LINK_TRAINING_IN_PROGRESS,
+   LINK_UP_DL_IN_PROGRESS,
+   LINK_UP_DL_COMPLETED,
+};
+
+#define J721E_MODE_RC  BIT(7)
+#define LANE_COUNT_MASKBIT(8)
+#define LANE_COUNT(n)  ((n) << 8)
+
+#define GENERATION_SEL_MASKGENMASK(1, 0)
+
+#define MAX_LANES  2
+
+struct j721e_pcie {
+   struct device   *dev;
+   u32 mode;
+   u32 num_lanes;
+   struct cdns_pcie*cdns_pcie;
+   void __iomem*user_cfg_base;
+   void __iomem*intd_cfg_base;
+};
+
+enum j721e_pcie_mode {
+   PCI_MODE_RC,
+   PCI_MODE_EP,
+};
+
+struct j721e_pcie_data {
+   enum j721e_pcie_modemode;
+};
+
+static inline u32 j721e_pcie_user_readl(struct j721e_pcie *pcie, u32 offset)
+{
+   return readl(pcie->user_cfg_base + offset);
+}
+
+static inline void j721e_pcie_user_writel(struct j721e_pcie *pcie, u32 offset,
+ u32 value)
+{
+   writel(value, pcie->user_cfg_base + offset);
+}
+
+static inline u32 j721e_pcie_intd_readl(struct j721e_pcie *pcie, u32 offset)
+{
+   return readl(pcie->intd_cfg_base + offset);
+}
+
+static inline void j721e_pcie_intd_writel(struct j721e_pcie *pcie, u32 offset,
+ u32 value)
+{
+   writel(value, 

[PATCH v5 09/14] PCI: cadence: Add MSI-X support to Endpoint driver

2020-05-21 Thread Kishon Vijay Abraham I
From: Alan Douglas 

Implement ->set_msix() and ->get_msix() callback functions in order
to configure MSIX capability in the PCIe endpoint controller.

Add cdns_pcie_ep_send_msix_irq() to send MSIX interrupts to Host.
cdns_pcie_ep_send_msix_irq() gets the MSIX table address (virtual
address) from "struct cdns_pcie_epf" that gets initialized in the
->set_bar() callback function.

Signed-off-by: Alan Douglas 
[kis...@ti.com: Re-implement MSIX support in accordance with the
 re-designed core MSI-X interfaces]
Signed-off-by: Kishon Vijay Abraham I 
---
 .../pci/controller/cadence/pcie-cadence-ep.c  | 108 +-
 drivers/pci/controller/cadence/pcie-cadence.h |  10 ++
 2 files changed, 117 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/controller/cadence/pcie-cadence-ep.c 
b/drivers/pci/controller/cadence/pcie-cadence-ep.c
index 14021d760482..c5696274d81f 100644
--- a/drivers/pci/controller/cadence/pcie-cadence-ep.c
+++ b/drivers/pci/controller/cadence/pcie-cadence-ep.c
@@ -51,6 +51,7 @@ static int cdns_pcie_ep_set_bar(struct pci_epc *epc, u8 fn,
struct pci_epf_bar *epf_bar)
 {
struct cdns_pcie_ep *ep = epc_get_drvdata(epc);
+   struct cdns_pcie_epf *epf = &ep->epf[fn];
struct cdns_pcie *pcie = &ep->pcie;
dma_addr_t bar_phys = epf_bar->phys_addr;
enum pci_barno bar = epf_bar->barno;
@@ -111,6 +112,8 @@ static int cdns_pcie_ep_set_bar(struct pci_epc *epc, u8 fn,
CDNS_PCIE_LM_EP_FUNC_BAR_CFG_BAR_CTRL(b, ctrl));
cdns_pcie_writel(pcie, reg, cfg);
 
+   epf->epf_bar[bar] = epf_bar;
+
return 0;
 }
 
@@ -118,6 +121,7 @@ static void cdns_pcie_ep_clear_bar(struct pci_epc *epc, u8 
fn,
   struct pci_epf_bar *epf_bar)
 {
struct cdns_pcie_ep *ep = epc_get_drvdata(epc);
+   struct cdns_pcie_epf *epf = &ep->epf[fn];
struct cdns_pcie *pcie = &ep->pcie;
enum pci_barno bar = epf_bar->barno;
u32 reg, cfg, b, ctrl;
@@ -139,6 +143,8 @@ static void cdns_pcie_ep_clear_bar(struct pci_epc *epc, u8 
fn,
 
cdns_pcie_writel(pcie, CDNS_PCIE_AT_IB_EP_FUNC_BAR_ADDR0(fn, bar), 0);
cdns_pcie_writel(pcie, CDNS_PCIE_AT_IB_EP_FUNC_BAR_ADDR1(fn, bar), 0);
+
+   epf->epf_bar[bar] = NULL;
 }
 
 static int cdns_pcie_ep_map_addr(struct pci_epc *epc, u8 fn, phys_addr_t addr,
@@ -224,6 +230,50 @@ static int cdns_pcie_ep_get_msi(struct pci_epc *epc, u8 fn)
return mme;
 }
 
+static int cdns_pcie_ep_get_msix(struct pci_epc *epc, u8 func_no)
+{
+   struct cdns_pcie_ep *ep = epc_get_drvdata(epc);
+   struct cdns_pcie *pcie = &ep->pcie;
+   u32 cap = CDNS_PCIE_EP_FUNC_MSIX_CAP_OFFSET;
+   u32 val, reg;
+
+   reg = cap + PCI_MSIX_FLAGS;
+   val = cdns_pcie_ep_fn_readw(pcie, func_no, reg);
+   if (!(val & PCI_MSIX_FLAGS_ENABLE))
+   return -EINVAL;
+
+   val &= PCI_MSIX_FLAGS_QSIZE;
+
+   return val;
+}
+
+static int cdns_pcie_ep_set_msix(struct pci_epc *epc, u8 fn, u16 interrupts,
+enum pci_barno bir, u32 offset)
+{
+   struct cdns_pcie_ep *ep = epc_get_drvdata(epc);
+   struct cdns_pcie *pcie = &ep->pcie;
+   u32 cap = CDNS_PCIE_EP_FUNC_MSIX_CAP_OFFSET;
+   u32 val, reg;
+
+   reg = cap + PCI_MSIX_FLAGS;
+   val = cdns_pcie_ep_fn_readw(pcie, fn, reg);
+   val &= ~PCI_MSIX_FLAGS_QSIZE;
+   val |= interrupts;
+   cdns_pcie_ep_fn_writew(pcie, fn, reg, val);
+
+   /* Set MSIX BAR and offset */
+   reg = cap + PCI_MSIX_TABLE;
+   val = offset | bir;
+   cdns_pcie_ep_fn_writel(pcie, fn, reg, val);
+
+   /* Set PBA BAR and offset.  BAR must match MSIX BAR */
+   reg = cap + PCI_MSIX_PBA;
+   val = (offset + (interrupts * PCI_MSIX_ENTRY_SIZE)) | bir;
+   cdns_pcie_ep_fn_writel(pcie, fn, reg, val);
+
+   return 0;
+}
+
 static void cdns_pcie_ep_assert_intx(struct cdns_pcie_ep *ep, u8 fn,
 u8 intx, bool is_asserted)
 {
@@ -330,6 +380,52 @@ static int cdns_pcie_ep_send_msi_irq(struct cdns_pcie_ep 
*ep, u8 fn,
return 0;
 }
 
+static int cdns_pcie_ep_send_msix_irq(struct cdns_pcie_ep *ep, u8 fn,
+ u16 interrupt_num)
+{
+   u32 cap = CDNS_PCIE_EP_FUNC_MSIX_CAP_OFFSET;
+   u32 tbl_offset, msg_data, reg, vec_ctrl;
+   struct cdns_pcie *pcie = &ep->pcie;
+   struct pci_epf_msix_tbl *msix_tbl;
+   struct cdns_pcie_epf *epf;
+   u64 pci_addr_mask = 0xff;
+   u64 msg_addr;
+   u16 flags;
+   u8 bir;
+
+   /* Check whether the MSI-X feature has been enabled by the PCI host. */
+   flags = cdns_pcie_ep_fn_readw(pcie, fn, cap + PCI_MSIX_FLAGS);
+   if (!(flags & PCI_MSIX_FLAGS_ENABLE))
+   return -EINVAL;
+
+   reg = cap + PCI_MSIX_TABLE;
+   tbl_offset = cdns_pcie_ep_fn_readl(pcie, fn, reg);
+   bir = tbl_offset & PCI_MSIX_TABLE_BIR;
+   tbl_offset &= PCI_MSIX_TABLE_OFFSET;
+
+   epf = 

[PATCH] kernel/hung_task: Use task_pid_nr function to get pid

2020-05-21 Thread qiang.zhang
From: Zhang Qiang 

Use the task_pid_nr(t) function instead of t->pid when printing the
task pid.

Signed-off-by: Zhang Qiang 
---
 kernel/hung_task.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/hung_task.c b/kernel/hung_task.c
index 14a625c16cb3..f397beb8c9e1 100644
--- a/kernel/hung_task.c
+++ b/kernel/hung_task.c
@@ -128,7 +128,7 @@ static void check_hung_task(struct task_struct *t, unsigned 
long timeout)
if (sysctl_hung_task_warnings > 0)
sysctl_hung_task_warnings--;
pr_err("INFO: task %s:%d blocked for more than %ld seconds.\n",
-  t->comm, t->pid, (jiffies - t->last_switch_time) / HZ);
+  t->comm, task_pid_nr(t), (jiffies - t->last_switch_time) 
/ HZ);
pr_err("  %s %s %.*s\n",
print_tainted(), init_utsname()->release,
(int)strcspn(init_utsname()->version, " "),
-- 
2.24.1



[PATCH v5 13/14] misc: pci_endpoint_test: Add J721E in pci_device_id table

2020-05-21 Thread Kishon Vijay Abraham I
Add J721E in pci_device_id table so that pci-epf-test can be used
for testing PCIe EP in J721E.

Reviewed-by: Rob Herring 
Signed-off-by: Kishon Vijay Abraham I 
---
 drivers/misc/pci_endpoint_test.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/drivers/misc/pci_endpoint_test.c b/drivers/misc/pci_endpoint_test.c
index ef5a1af6bab7..a70b17e5dd9a 100644
--- a/drivers/misc/pci_endpoint_test.c
+++ b/drivers/misc/pci_endpoint_test.c
@@ -68,6 +68,7 @@
 #define PCI_ENDPOINT_TEST_FLAGS0x2c
 #define FLAG_USE_DMA   BIT(0)
 
+#define PCI_DEVICE_ID_TI_J721E 0xb00d
 #define PCI_DEVICE_ID_TI_AM654 0xb00c
 
 #define is_am654_pci_dev(pdev) \
@@ -930,6 +931,11 @@ static const struct pci_endpoint_test_data am654_data = {
.irq_type = IRQ_TYPE_MSI,
 };
 
+static const struct pci_endpoint_test_data j721e_data = {
+   .alignment = 256,
+   .irq_type = IRQ_TYPE_MSI,
+};
+
 static const struct pci_device_id pci_endpoint_test_tbl[] = {
{ PCI_DEVICE(PCI_VENDOR_ID_TI, PCI_DEVICE_ID_TI_DRA74x),
  .driver_data = (kernel_ulong_t)&default_data,
@@ -942,6 +948,9 @@ static const struct pci_device_id pci_endpoint_test_tbl[] = 
{
{ PCI_DEVICE(PCI_VENDOR_ID_TI, PCI_DEVICE_ID_TI_AM654),
  .driver_data = (kernel_ulong_t)&am654_data
},
+   { PCI_DEVICE(PCI_VENDOR_ID_TI, PCI_DEVICE_ID_TI_J721E),
+ .driver_data = (kernel_ulong_t)&j721e_data,
+   },
{ }
 };
 MODULE_DEVICE_TABLE(pci, pci_endpoint_test_tbl);
-- 
2.17.1



[PATCH v5 10/14] dt-bindings: PCI: Add host mode dt-bindings for TI's J721E SoC

2020-05-21 Thread Kishon Vijay Abraham I
Add host mode dt-bindings for TI's J721E SoC.

Signed-off-by: Kishon Vijay Abraham I 
Reviewed-by: Rob Herring 
---
 .../bindings/pci/ti,j721e-pci-host.yaml   | 113 ++
 1 file changed, 113 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/pci/ti,j721e-pci-host.yaml

diff --git a/Documentation/devicetree/bindings/pci/ti,j721e-pci-host.yaml b/Documentation/devicetree/bindings/pci/ti,j721e-pci-host.yaml
new file mode 100644
index ..d7b60487c6c3
--- /dev/null
+++ b/Documentation/devicetree/bindings/pci/ti,j721e-pci-host.yaml
@@ -0,0 +1,113 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+# Copyright (C) 2019 Texas Instruments Incorporated - http://www.ti.com/
+%YAML 1.2
+---
+$id: "http://devicetree.org/schemas/pci/ti,j721e-pci-host.yaml#"
+$schema: "http://devicetree.org/meta-schemas/core.yaml#"
+
+title: TI J721E PCI Host (PCIe Wrapper)
+
+maintainers:
+  - Kishon Vijay Abraham I 
+
+allOf:
+  - $ref: "cdns-pcie-host.yaml#"
+
+properties:
+  compatible:
+enum:
+  - ti,j721e-pcie-host
+
+  reg:
+maxItems: 4
+
+  reg-names:
+items:
+  - const: intd_cfg
+  - const: user_cfg
+  - const: reg
+  - const: cfg
+
+  ti,syscon-pcie-ctrl:
+description: Phandle to the SYSCON entry required for configuring PCIe mode
+  and link speed.
+allOf:
+  - $ref: /schemas/types.yaml#/definitions/phandle
+
+  power-domains:
+maxItems: 1
+
+  clocks:
+maxItems: 1
+description: clock-specifier to represent input to the PCIe
+
+  clock-names:
+items:
+  - const: fck
+
+  vendor-id:
+const: 0x104c
+
+  device-id:
+const: 0xb00d
+
+  msi-map: true
+
+required:
+  - compatible
+  - reg
+  - reg-names
+  - ti,syscon-pcie-ctrl
+  - max-link-speed
+  - num-lanes
+  - power-domains
+  - clocks
+  - clock-names
+  - vendor-id
+  - device-id
+  - msi-map
+  - dma-coherent
+  - dma-ranges
+  - ranges
+  - reset-gpios
+  - phys
+  - phy-names
+
+examples:
+  - |
+#include 
+#include 
+
+bus {
+#address-cells = <2>;
+#size-cells = <2>;
+
+pcie0_rc: pcie@290 {
+compatible = "ti,j721e-pcie-host";
+reg = <0x00 0x0290 0x00 0x1000>,
+  <0x00 0x02907000 0x00 0x400>,
+  <0x00 0x0d00 0x00 0x0080>,
+  <0x00 0x1000 0x00 0x1000>;
+reg-names = "intd_cfg", "user_cfg", "reg", "cfg";
+ti,syscon-pcie-ctrl = <_ctrl>;
+max-link-speed = <3>;
+num-lanes = <2>;
+power-domains = <_pds 239 TI_SCI_PD_EXCLUSIVE>;
+clocks = <_clks 239 1>;
+clock-names = "fck";
+device_type = "pci";
+#address-cells = <3>;
+#size-cells = <2>;
+bus-range = <0x0 0xf>;
+vendor-id = <0x104c>;
+device-id = <0xb00d>;
+msi-map = <0x0 _its 0x0 0x1>;
+dma-coherent;
+reset-gpios = < 6 GPIO_ACTIVE_HIGH>;
+phys = <_pcie_link>;
+phy-names = "pcie-phy";
+ranges = <0x0100 0x0 0x10001000  0x00 0x10001000  0x0 0x001>,
+ <0x0200 0x0 0x10011000  0x00 0x10011000  0x0 0x7fef000>;
+dma-ranges = <0x0200 0x0 0x0 0x0 0x0 0x1 0x0>;
+};
+};
-- 
2.17.1



[PATCH v5 11/14] dt-bindings: PCI: Add EP mode dt-bindings for TI's J721E SoC

2020-05-21 Thread Kishon Vijay Abraham I
Add PCIe EP mode dt-bindings for TI's J721E SoC.

Signed-off-by: Kishon Vijay Abraham I 
Reviewed-by: Rob Herring 
---
 .../bindings/pci/ti,j721e-pci-ep.yaml | 89 +++
 1 file changed, 89 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/pci/ti,j721e-pci-ep.yaml

diff --git a/Documentation/devicetree/bindings/pci/ti,j721e-pci-ep.yaml b/Documentation/devicetree/bindings/pci/ti,j721e-pci-ep.yaml
new file mode 100644
index ..c09d25b2c1b2
--- /dev/null
+++ b/Documentation/devicetree/bindings/pci/ti,j721e-pci-ep.yaml
@@ -0,0 +1,89 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+# Copyright (C) 2020 Texas Instruments Incorporated - http://www.ti.com/
+%YAML 1.2
+---
+$id: "http://devicetree.org/schemas/pci/ti,j721e-pci-ep.yaml#"
+$schema: "http://devicetree.org/meta-schemas/core.yaml#"
+
+title: TI J721E PCI EP (PCIe Wrapper)
+
+maintainers:
+  - Kishon Vijay Abraham I 
+
+allOf:
+  - $ref: "cdns-pcie-ep.yaml#"
+
+properties:
+  compatible:
+enum:
+  - ti,j721e-pcie-ep
+
+  reg:
+maxItems: 4
+
+  reg-names:
+items:
+  - const: intd_cfg
+  - const: user_cfg
+  - const: reg
+  - const: mem
+
+  ti,syscon-pcie-ctrl:
+description: Phandle to the SYSCON entry required for configuring PCIe mode
+ and link speed.
+allOf:
+  - $ref: /schemas/types.yaml#/definitions/phandle
+
+  power-domains:
+maxItems: 1
+
+  clocks:
+maxItems: 1
+description: clock-specifier to represent input to the PCIe
+
+  clock-names:
+items:
+  - const: fck
+
+  dma-coherent:
+description: Indicates that the PCIe IP block can ensure the coherency
+
+required:
+  - compatible
+  - reg
+  - reg-names
+  - ti,syscon-pcie-ctrl
+  - max-link-speed
+  - num-lanes
+  - power-domains
+  - clocks
+  - clock-names
+  - cdns,max-outbound-regions
+  - dma-coherent
+  - max-functions
+  - phys
+  - phy-names
+
+examples:
+  - |
+#include 
+
+ pcie0_ep: pcie-ep@d00 {
+compatible = "ti,j721e-pcie-ep";
+reg = <0x00 0x0290 0x00 0x1000>,
+  <0x00 0x02907000 0x00 0x400>,
+  <0x00 0x0d00 0x00 0x0080>,
+  <0x00 0x1000 0x00 0x0800>;
+reg-names = "intd_cfg", "user_cfg", "reg", "mem";
+ti,syscon-pcie-ctrl = <_ctrl>;
+max-link-speed = <3>;
+num-lanes = <2>;
+power-domains = <_pds 239 TI_SCI_PD_EXCLUSIVE>;
+clocks = <_clks 239 1>;
+clock-names = "fck";
+cdns,max-outbound-regions = <16>;
+max-functions = /bits/ 8 <6>;
+dma-coherent;
+phys = <_pcie_link>;
+phy-names = "pcie-phy";
+};
-- 
2.17.1



[PATCH v5 00/14] Add PCIe support to TI's J721E SoC

2020-05-21 Thread Kishon Vijay Abraham I
TI's J721E SoC uses Cadence PCIe core to implement both RC mode
and EP mode.

The high level features are:
  *) Supports Legacy, MSI and MSI-X interrupts
  *) Supports up to GEN4 speed mode
  *) Supports SR-IOV
  *) Supports multiple physical functions
  *) Ability to route all transactions via SMMU

This patch series
  *) Add support in Cadence PCIe core to be used for TI's J721E SoC
  *) Add a driver for J721E PCIe wrapper

v1 of the series can be found @ [1]
v2 of the series can be found @ [2]
v3 of the series can be found @ [5]
v4 of the series can be found @ [6]

Changes from v4:
1) Added Reviewed-by: & Acked-by: tags from RobH
2) Removed unused accessors from pcie-cadence.h and dropped the ops
   for read/write accessors
3) Updated cdns,cdns-pcie-host.yaml to remove "mem" from reg

Changes from v3:
1) Changed the order of files in MAINTAINERS file to fix Joe's comments
2) Fixed indentation and added Reviewed-by: Rob Herring 
3) Cleaned up computing msix_tbl
4) Fixed RobH's comment on J721E driver

Changes from v2:
1) Converting Cadence binding to YAML schema was done as a
   separate series [3] & [4]. [3] is merged and [4] is
   pending.
2) Included MSI-X support in this series
3) Added link down interrupt handling (only error message)
4) Rebased to latest 5.7-rc1
5) Adapted TI J721E binding to [3] & [4]

Changes from v1:
1) Added DT schemas cdns-pcie-host.yaml, cdns-pcie-ep.yaml and
   cdns-pcie.yaml for Cadence PCIe core and included it in
   TI's PCIe DT schema.
2) Added cpu_addr_fixup() for Cadence Platform driver.
3) Fixed subject/description/renamed functions as commented by
   Andrew Murray.

[1] -> http://lore.kernel.org/r/20191209092147.22901-1-kis...@ti.com
[2] -> http://lore.kernel.org/r/20200106102058.19183-1-kis...@ti.com
[3] -> http://lore.kernel.org/r/20200305103017.16706-1-kis...@ti.com
[4] -> http://lore.kernel.org/r/20200417114322.3-1-kis...@ti.com
[5] -> http://lore.kernel.org/r/20200417125753.13021-1-kis...@ti.com
[6] -> http://lore.kernel.org/r/20200506151429.12255-1-kis...@ti.com

Alan Douglas (1):
  PCI: cadence: Add MSI-X support to Endpoint driver

Kishon Vijay Abraham I (13):
  PCI: cadence: Fix cdns_pcie_{host|ep}_setup() error path
  linux/kernel.h: Add PTR_ALIGN_DOWN macro
  PCI: cadence: Convert all r/w accessors to perform only 32-bit
accesses
  PCI: cadence: Add support to start link and verify link status
  PCI: cadence: Allow pci_host_bridge to have custom pci_ops
  dt-bindings: PCI: cadence: Remove "mem" from reg binding
  PCI: cadence: Add new *ops* for CPU addr fixup
  PCI: cadence: Fix updating Vendor ID and Subsystem Vendor ID register
  dt-bindings: PCI: Add host mode dt-bindings for TI's J721E SoC
  dt-bindings: PCI: Add EP mode dt-bindings for TI's J721E SoC
  PCI: j721e: Add TI J721E PCIe driver
  misc: pci_endpoint_test: Add J721E in pci_device_id table
  MAINTAINERS: Add Kishon Vijay Abraham I for TI J721E SoC PCIe

 .../bindings/pci/cdns,cdns-pcie-host.yaml |   8 +-
 .../bindings/pci/ti,j721e-pci-ep.yaml |  89 
 .../bindings/pci/ti,j721e-pci-host.yaml   | 113 
 MAINTAINERS   |   4 +-
 drivers/misc/pci_endpoint_test.c  |   9 +
 drivers/pci/controller/cadence/Kconfig|  23 +
 drivers/pci/controller/cadence/Makefile   |   1 +
 drivers/pci/controller/cadence/pci-j721e.c| 490 ++
 .../pci/controller/cadence/pcie-cadence-ep.c  | 125 -
 .../controller/cadence/pcie-cadence-host.c|  59 ++-
 .../controller/cadence/pcie-cadence-plat.c|  13 +
 drivers/pci/controller/cadence/pcie-cadence.c |   8 +-
 drivers/pci/controller/cadence/pcie-cadence.h | 127 -
 include/linux/kernel.h|   1 +
 14 files changed, 1017 insertions(+), 53 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/pci/ti,j721e-pci-ep.yaml
 create mode 100644 Documentation/devicetree/bindings/pci/ti,j721e-pci-host.yaml
 create mode 100644 drivers/pci/controller/cadence/pci-j721e.c

-- 
2.17.1



[PATCH v5 02/14] linux/kernel.h: Add PTR_ALIGN_DOWN macro

2020-05-21 Thread Kishon Vijay Abraham I
Add a macro for aligning down a pointer. This is useful to get an
aligned register address when a device allows only word access and
doesn't allow half word or byte access.
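
For illustration only, here is a minimal userspace sketch of the same
idea. The helper names ptr_align_down() and ptr_offset_in_word() are
hypothetical stand-ins and are not part of this patch; the kernel macro
itself is the one-line PTR_ALIGN_DOWN() added below. The sketch assumes
the alignment is a power of two.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical userspace stand-in for PTR_ALIGN_DOWN(): round a pointer
 * down to the previous multiple of 'a'. 'a' must be a power of two.
 */
static inline void *ptr_align_down(const void *p, uintptr_t a)
{
	assert(a != 0 && (a & (a - 1)) == 0);
	return (void *)((uintptr_t)p & ~(a - 1));
}

/* Byte offset of 'p' inside its 'a'-aligned word (same power-of-two rule). */
static inline unsigned int ptr_offset_in_word(const void *p, uintptr_t a)
{
	assert(a != 0 && (a & (a - 1)) == 0);
	return (unsigned int)((uintptr_t)p & (a - 1));
}
```

With a 4-byte alignment, an address ending in 0x3 maps to the word
address ending in 0x0 with offset 3, which is what a word-only device
needs in order to locate a byte inside a register.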

Acked-by: Rob Herring 
Signed-off-by: Kishon Vijay Abraham I 
---
 include/linux/kernel.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index 9b7a8d74a9d6..c3b361b5be54 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -34,6 +34,7 @@
 #define ALIGN_DOWN(x, a)   __ALIGN_KERNEL((x) - ((a) - 1), (a))
 #define __ALIGN_MASK(x, mask)  __ALIGN_KERNEL_MASK((x), (mask))
 #define PTR_ALIGN(p, a)        ((typeof(p))ALIGN((unsigned long)(p), (a)))
+#define PTR_ALIGN_DOWN(p, a)   ((typeof(p))ALIGN_DOWN((unsigned long)(p), (a)))
 #define IS_ALIGNED(x, a)   (((x) & ((typeof(x))(a) - 1)) == 0)
 
 /* generic data direction definitions */
-- 
2.17.1



[PATCH v5 07/14] PCI: cadence: Add new *ops* for CPU addr fixup

2020-05-21 Thread Kishon Vijay Abraham I
The Cadence driver uses the "mem" memory resource to obtain the offset of
the configuration space address region, memory space address region and
message space address region. The obtained offset is used to program
the Address Translation Unit (ATU). However, certain platforms like TI's
J721E SoC require the absolute address to be programmed in the ATU and not
just the offset.
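
As a rough userspace sketch of the optional-callback pattern (not the
driver code): a platform that needs only an offset installs a masking
fixup, while a platform that needs the absolute address installs none.
The struct names, atu_cpu_addr() and the mask value
EXAMPLE_CPU_TO_BUS_MASK are assumptions made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct pcie_ctx;

/* Hypothetical ops table with a single optional callback. */
struct pcie_ops {
	uint64_t (*cpu_addr_fixup)(struct pcie_ctx *pcie, uint64_t cpu_addr);
};

struct pcie_ctx {
	const struct pcie_ops *ops;
};

/* Example mask only; the real value is platform specific. */
#define EXAMPLE_CPU_TO_BUS_MASK 0x0FFFFFFFull

/* Fixup for a platform whose ATU expects an offset, not a full address. */
static uint64_t mask_fixup(struct pcie_ctx *pcie, uint64_t cpu_addr)
{
	(void)pcie;
	return cpu_addr & EXAMPLE_CPU_TO_BUS_MASK;
}

/* Address to program into the ATU: fixed up if a callback exists. */
static uint64_t atu_cpu_addr(struct pcie_ctx *pcie, uint64_t cpu_addr)
{
	assert(pcie != NULL);
	if (pcie->ops && pcie->ops->cpu_addr_fixup)
		return pcie->ops->cpu_addr_fixup(pcie, cpu_addr);
	return cpu_addr; /* absolute address, unchanged */
}
```

Unlike the patch, the sketch also tolerates a NULL ops pointer; that is
a choice made for the example, not a statement about the driver.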

Signed-off-by: Kishon Vijay Abraham I 
---
 .../pci/controller/cadence/pcie-cadence-host.c| 15 ---
 .../pci/controller/cadence/pcie-cadence-plat.c| 13 +
 drivers/pci/controller/cadence/pcie-cadence.c |  8 ++--
 drivers/pci/controller/cadence/pcie-cadence.h |  1 +
 4 files changed, 24 insertions(+), 13 deletions(-)

diff --git a/drivers/pci/controller/cadence/pcie-cadence-host.c b/drivers/pci/controller/cadence/pcie-cadence-host.c
index 62796791f02c..3003fafa3bfa 100644
--- a/drivers/pci/controller/cadence/pcie-cadence-host.c
+++ b/drivers/pci/controller/cadence/pcie-cadence-host.c
@@ -330,15 +330,14 @@ static int cdns_pcie_host_map_dma_ranges(struct cdns_pcie_rc *rc)
 static int cdns_pcie_host_init_address_translation(struct cdns_pcie_rc *rc)
 {
struct cdns_pcie *pcie = &rc->pcie;
-   struct resource *mem_res = pcie->mem_res;
struct resource *bus_range = rc->bus_range;
struct resource *cfg_res = rc->cfg_res;
struct device *dev = pcie->dev;
struct device_node *np = dev->of_node;
struct of_pci_range_parser parser;
+   u64 cpu_addr = cfg_res->start;
struct of_pci_range range;
u32 addr0, addr1, desc1;
-   u64 cpu_addr;
int r, err;
 
/*
@@ -351,7 +350,9 @@ static int cdns_pcie_host_init_address_translation(struct cdns_pcie_rc *rc)
cdns_pcie_writel(pcie, CDNS_PCIE_AT_OB_REGION_PCI_ADDR1(0), addr1);
cdns_pcie_writel(pcie, CDNS_PCIE_AT_OB_REGION_DESC1(0), desc1);
 
-   cpu_addr = cfg_res->start - mem_res->start;
+   if (pcie->ops->cpu_addr_fixup)
+   cpu_addr = pcie->ops->cpu_addr_fixup(pcie, cpu_addr);
+
addr0 = CDNS_PCIE_AT_OB_REGION_CPU_ADDR0_NBITS(12) |
(lower_32_bits(cpu_addr) & GENMASK(31, 8));
addr1 = upper_32_bits(cpu_addr);
@@ -480,14 +481,6 @@ int cdns_pcie_host_setup(struct cdns_pcie_rc *rc)
}
rc->cfg_res = res;
 
-   res = platform_get_resource_byname(pdev, IORESOURCE_MEM, "mem");
-   if (!res) {
-   dev_err(dev, "missing \"mem\"\n");
-   return -EINVAL;
-   }
-
-   pcie->mem_res = res;
-
ret = cdns_pcie_start_link(pcie);
if (ret) {
dev_err(dev, "Failed to start link\n");
diff --git a/drivers/pci/controller/cadence/pcie-cadence-plat.c b/drivers/pci/controller/cadence/pcie-cadence-plat.c
index f5c6bf6dfcb8..6f5f07b3eed1 100644
--- a/drivers/pci/controller/cadence/pcie-cadence-plat.c
+++ b/drivers/pci/controller/cadence/pcie-cadence-plat.c
@@ -13,6 +13,8 @@
 #include 
 #include "pcie-cadence.h"
 
+#define CDNS_PLAT_CPU_TO_BUS_ADDR  0x0FFF
+
 /**
  * struct cdns_plat_pcie - private data for this PCIe platform driver
  * @pcie: Cadence PCIe controller
@@ -30,6 +32,15 @@ struct cdns_plat_pcie_of_data {
 
 static const struct of_device_id cdns_plat_pcie_of_match[];
 
+static u64 cdns_plat_cpu_addr_fixup(struct cdns_pcie *pcie, u64 cpu_addr)
+{
+   return cpu_addr & CDNS_PLAT_CPU_TO_BUS_ADDR;
+}
+
+static const struct cdns_pcie_ops cdns_plat_ops = {
+   .cpu_addr_fixup = cdns_plat_cpu_addr_fixup,
+};
+
 static int cdns_plat_pcie_probe(struct platform_device *pdev)
 {
const struct cdns_plat_pcie_of_data *data;
@@ -66,6 +77,7 @@ static int cdns_plat_pcie_probe(struct platform_device *pdev)
 
rc = pci_host_bridge_priv(bridge);
rc->pcie.dev = dev;
+   rc->pcie.ops = &cdns_plat_ops;
cdns_plat_pcie->pcie = &rc->pcie;
cdns_plat_pcie->is_rc = is_rc;
 
@@ -93,6 +105,7 @@ static int cdns_plat_pcie_probe(struct platform_device *pdev)
return -ENOMEM;
 
ep->pcie.dev = dev;
+   ep->pcie.ops = &cdns_plat_ops;
cdns_plat_pcie->pcie = &ep->pcie;
cdns_plat_pcie->is_rc = is_rc;
 
diff --git a/drivers/pci/controller/cadence/pcie-cadence.c b/drivers/pci/controller/cadence/pcie-cadence.c
index cd795f6fc1e2..8a02981fd456 100644
--- a/drivers/pci/controller/cadence/pcie-cadence.c
+++ b/drivers/pci/controller/cadence/pcie-cadence.c
@@ -73,7 +73,9 @@ void cdns_pcie_set_outbound_region(struct cdns_pcie *pcie, u8 fn,
cdns_pcie_writel(pcie, CDNS_PCIE_AT_OB_REGION_DESC1(r), desc1);
 
/* Set the CPU address */
-   cpu_addr -= pcie->mem_res->start;
+   if (pcie->ops->cpu_addr_fixup)
+   cpu_addr = pcie->ops->cpu_addr_fixup(pcie, cpu_addr);
+
addr0 = CDNS_PCIE_AT_OB_REGION_CPU_ADDR0_NBITS(nbits) |
(lower_32_bits(cpu_addr) & GENMASK(31, 8));
addr1 = 

[PATCH v5 01/14] PCI: cadence: Fix cdns_pcie_{host|ep}_setup() error path

2020-05-21 Thread Kishon Vijay Abraham I
Commit bd22885aa188 ("PCI: cadence: Refactor driver to use as a core
library"), while refactoring the Cadence PCIe driver to be used as a
library, removed pm_runtime_get_sync() from cdns_pcie_ep_setup()
and cdns_pcie_host_setup() but did not remove the corresponding
pm_runtime_put_sync() in the error path. Fix it here.

Fixes: bd22885aa188 ("PCI: cadence: Refactor driver to use as a core library")
Reviewed-by: Rob Herring 
Signed-off-by: Kishon Vijay Abraham I 
---
 drivers/pci/controller/cadence/pcie-cadence-ep.c   | 9 ++---
 drivers/pci/controller/cadence/pcie-cadence-host.c | 6 +-
 2 files changed, 3 insertions(+), 12 deletions(-)

diff --git a/drivers/pci/controller/cadence/pcie-cadence-ep.c b/drivers/pci/controller/cadence/pcie-cadence-ep.c
index 1c173dad67d1..1fdae37843ef 100644
--- a/drivers/pci/controller/cadence/pcie-cadence-ep.c
+++ b/drivers/pci/controller/cadence/pcie-cadence-ep.c
@@ -8,7 +8,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 
 #include "pcie-cadence.h"
@@ -440,8 +439,7 @@ int cdns_pcie_ep_setup(struct cdns_pcie_ep *ep)
epc = devm_pci_epc_create(dev, _pcie_epc_ops);
if (IS_ERR(epc)) {
dev_err(dev, "failed to create epc device\n");
-   ret = PTR_ERR(epc);
-   goto err_init;
+   return PTR_ERR(epc);
}
 
epc_set_drvdata(epc, ep);
@@ -453,7 +451,7 @@ int cdns_pcie_ep_setup(struct cdns_pcie_ep *ep)
   resource_size(pcie->mem_res));
if (ret < 0) {
dev_err(dev, "failed to initialize the memory space\n");
-   goto err_init;
+   return ret;
}
 
ep->irq_cpu_addr = pci_epc_mem_alloc_addr(epc, >irq_phys_addr,
@@ -472,8 +470,5 @@ int cdns_pcie_ep_setup(struct cdns_pcie_ep *ep)
  free_epc_mem:
pci_epc_mem_exit(epc);
 
- err_init:
-   pm_runtime_put_sync(dev);
-
return ret;
 }
diff --git a/drivers/pci/controller/cadence/pcie-cadence-host.c b/drivers/pci/controller/cadence/pcie-cadence-host.c
index 70e0eaa15bf9..8e73a680b567 100644
--- a/drivers/pci/controller/cadence/pcie-cadence-host.c
+++ b/drivers/pci/controller/cadence/pcie-cadence-host.c
@@ -7,7 +7,6 @@
 #include 
 #include 
 #include 
-#include 
 
 #include "pcie-cadence.h"
 
@@ -476,7 +475,7 @@ int cdns_pcie_host_setup(struct cdns_pcie_rc *rc)
 
ret = cdns_pcie_host_init(dev, , rc);
if (ret)
-   goto err_init;
+   return ret;
 
list_splice_init(, >windows);
bridge->dev.parent = dev;
@@ -494,8 +493,5 @@ int cdns_pcie_host_setup(struct cdns_pcie_rc *rc)
  err_host_probe:
pci_free_resource_list();
 
- err_init:
-   pm_runtime_put_sync(dev);
-
return ret;
 }
-- 
2.17.1



[PATCH v5 03/14] PCI: cadence: Convert all r/w accessors to perform only 32-bit accesses

2020-05-21 Thread Kishon Vijay Abraham I
Certain platforms like TI's J721E using Cadence PCIe IP can perform only
32-bit accesses for reading or writing to Cadence registers. Convert all
read and write accesses to 32-bit in Cadence PCIe driver in preparation
for adding PCIe support in TI's J721E SoC.
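
To make the approach concrete, below is a small userspace sketch of
byte and half-word accesses built from 32-bit loads and stores only. It
models the register file as a plain array of uint32_t and is not the
driver code: read_sz32() and write_sz32() are hypothetical names, and
masking the written value to the field width is a choice made for the
sketch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Read 'size' bytes (1, 2 or 4) at byte offset 'off' from a register
 * file that only supports 32-bit accesses. 'off' must be aligned to
 * 'size'.
 */
static uint32_t read_sz32(const volatile uint32_t *regs, uint32_t off, int size)
{
	unsigned int shift = 8 * (off & 0x3);
	uint32_t val;

	assert(size == 1 || size == 2 || size == 4);
	assert(off % (uint32_t)size == 0);

	val = regs[off / 4];		/* the only load: one full word */
	if (size == 4)
		return val;

	return (val >> shift) & ((1u << (8 * size)) - 1);
}

/* Write 'size' bytes at 'off' via read-modify-write of the whole word. */
static void write_sz32(volatile uint32_t *regs, uint32_t off, int size,
		       uint32_t value)
{
	unsigned int shift = 8 * (off & 0x3);
	uint32_t field, val;

	assert(size == 1 || size == 2 || size == 4);
	assert(off % (uint32_t)size == 0);

	if (size == 4) {
		regs[off / 4] = value;
		return;
	}

	field = (1u << (8 * size)) - 1;		/* 0xff or 0xffff */
	val = regs[off / 4] & ~(field << shift);	/* clear the target bytes */
	val |= (value & field) << shift;		/* merge in the new bytes */
	regs[off / 4] = val;			/* the only store: one full word */
}
```

One consequence of the read-modify-write is that a sub-word write also
reads and rewrites the neighbouring bytes of the same word, so the
sequence is not atomic and re-writes whatever was read from those
bytes.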

Signed-off-by: Kishon Vijay Abraham I 
---
 drivers/pci/controller/cadence/pcie-cadence.h | 71 ++-
 1 file changed, 53 insertions(+), 18 deletions(-)

diff --git a/drivers/pci/controller/cadence/pcie-cadence.h b/drivers/pci/controller/cadence/pcie-cadence.h
index bc49c22e48a9..737e9561092b 100644
--- a/drivers/pci/controller/cadence/pcie-cadence.h
+++ b/drivers/pci/controller/cadence/pcie-cadence.h
@@ -319,50 +319,88 @@ struct cdns_pcie_ep {
 
 
 /* Register access */
-static inline void cdns_pcie_writeb(struct cdns_pcie *pcie, u32 reg, u8 value)
+static inline void cdns_pcie_writel(struct cdns_pcie *pcie, u32 reg, u32 value)
 {
-   writeb(value, pcie->reg_base + reg);
+   writel(value, pcie->reg_base + reg);
 }
 
-static inline void cdns_pcie_writew(struct cdns_pcie *pcie, u32 reg, u16 value)
+static inline u32 cdns_pcie_readl(struct cdns_pcie *pcie, u32 reg)
 {
-   writew(value, pcie->reg_base + reg);
+   return readl(pcie->reg_base + reg);
 }
 
-static inline void cdns_pcie_writel(struct cdns_pcie *pcie, u32 reg, u32 value)
+static inline u32 cdns_pcie_read_sz(void __iomem *addr, int size)
 {
-   writel(value, pcie->reg_base + reg);
+   void __iomem *aligned_addr = PTR_ALIGN_DOWN(addr, 0x4);
+   unsigned int offset = (unsigned long)addr & 0x3;
+   u32 val = readl(aligned_addr);
+
+   if (!IS_ALIGNED((uintptr_t)addr, size)) {
+   WARN(1, "Address %p and size %d are not aligned\n", addr, size);
+   return 0;
+   }
+
+   if (size > 2)
+   return val;
+
+   return (val >> (8 * offset)) & ((1 << (size * 8)) - 1);
 }
 
-static inline u32 cdns_pcie_readl(struct cdns_pcie *pcie, u32 reg)
+static inline void cdns_pcie_write_sz(void __iomem *addr, int size, u32 value)
 {
-   return readl(pcie->reg_base + reg);
+   void __iomem *aligned_addr = PTR_ALIGN_DOWN(addr, 0x4);
+   unsigned int offset = (unsigned long)addr & 0x3;
+   u32 mask;
+   u32 val;
+
+   if (!IS_ALIGNED((uintptr_t)addr, size)) {
+   WARN(1, "Address %p and size %d are not aligned\n", addr, size);
+   return;
+   }
+
+   if (size > 2) {
+   writel(value, addr);
+   return;
+   }
+
+   mask = ~(((1 << (size * 8)) - 1) << (offset * 8));
+   val = readl(aligned_addr) & mask;
+   val |= value << (offset * 8);
+   writel(val, aligned_addr);
 }
 
 /* Root Port register access */
 static inline void cdns_pcie_rp_writeb(struct cdns_pcie *pcie,
   u32 reg, u8 value)
 {
-   writeb(value, pcie->reg_base + CDNS_PCIE_RP_BASE + reg);
+   void __iomem *addr = pcie->reg_base + CDNS_PCIE_RP_BASE + reg;
+
+   cdns_pcie_write_sz(addr, 0x1, value);
 }
 
 static inline void cdns_pcie_rp_writew(struct cdns_pcie *pcie,
   u32 reg, u16 value)
 {
-   writew(value, pcie->reg_base + CDNS_PCIE_RP_BASE + reg);
+   void __iomem *addr = pcie->reg_base + CDNS_PCIE_RP_BASE + reg;
+
+   cdns_pcie_write_sz(addr, 0x2, value);
 }
 
 /* Endpoint Function register access */
 static inline void cdns_pcie_ep_fn_writeb(struct cdns_pcie *pcie, u8 fn,
  u32 reg, u8 value)
 {
-   writeb(value, pcie->reg_base + CDNS_PCIE_EP_FUNC_BASE(fn) + reg);
+   void __iomem *addr = pcie->reg_base + CDNS_PCIE_EP_FUNC_BASE(fn) + reg;
+
+   cdns_pcie_write_sz(addr, 0x1, value);
 }
 
 static inline void cdns_pcie_ep_fn_writew(struct cdns_pcie *pcie, u8 fn,
  u32 reg, u16 value)
 {
-   writew(value, pcie->reg_base + CDNS_PCIE_EP_FUNC_BASE(fn) + reg);
+   void __iomem *addr = pcie->reg_base + CDNS_PCIE_EP_FUNC_BASE(fn) + reg;
+
+   cdns_pcie_write_sz(addr, 0x2, value);
 }
 
 static inline void cdns_pcie_ep_fn_writel(struct cdns_pcie *pcie, u8 fn,
@@ -371,14 +409,11 @@ static inline void cdns_pcie_ep_fn_writel(struct cdns_pcie *pcie, u8 fn,
writel(value, pcie->reg_base + CDNS_PCIE_EP_FUNC_BASE(fn) + reg);
 }
 
-static inline u8 cdns_pcie_ep_fn_readb(struct cdns_pcie *pcie, u8 fn, u32 reg)
-{
-   return readb(pcie->reg_base + CDNS_PCIE_EP_FUNC_BASE(fn) + reg);
-}
-
 static inline u16 cdns_pcie_ep_fn_readw(struct cdns_pcie *pcie, u8 fn, u32 reg)
 {
-   return readw(pcie->reg_base + CDNS_PCIE_EP_FUNC_BASE(fn) + reg);
+   void __iomem *addr = pcie->reg_base + CDNS_PCIE_EP_FUNC_BASE(fn) + reg;
+
+   return cdns_pcie_read_sz(addr, 0x2);
 }
 
 static inline u32 cdns_pcie_ep_fn_readl(struct cdns_pcie *pcie, u8 fn, u32 reg)
-- 
2.17.1



[PATCH v5 05/14] PCI: cadence: Allow pci_host_bridge to have custom pci_ops

2020-05-21 Thread Kishon Vijay Abraham I
Certain platforms like TI's J721E allow only 32-bit configuration
space accesses. In such cases pci_generic_config_read and
pci_generic_config_write cannot be used. Add support in the Cadence core
to let pci_host_bridge have custom pci_ops.

Signed-off-by: Kishon Vijay Abraham I 
---
 drivers/pci/controller/cadence/pcie-cadence-host.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/controller/cadence/pcie-cadence-host.c b/drivers/pci/controller/cadence/pcie-cadence-host.c
index 93a9414932a9..62796791f02c 100644
--- a/drivers/pci/controller/cadence/pcie-cadence-host.c
+++ b/drivers/pci/controller/cadence/pcie-cadence-host.c
@@ -508,7 +508,8 @@ int cdns_pcie_host_setup(struct cdns_pcie_rc *rc)
list_splice_init(, >windows);
bridge->dev.parent = dev;
bridge->busnr = pcie->bus;
-   bridge->ops = _pcie_host_ops;
+   if (!bridge->ops)
+   bridge->ops = _pcie_host_ops;
bridge->map_irq = of_irq_parse_and_map_pci;
bridge->swizzle_irq = pci_common_swizzle;
 
-- 
2.17.1



[PATCH v5 04/14] PCI: cadence: Add support to start link and verify link status

2020-05-21 Thread Kishon Vijay Abraham I
Add cdns_pcie_ops to start the link and verify link status. The registers
to start the link and to check link status are in the platform-specific PCIe
wrapper. Add support for platform-specific drivers to add callback
functions for the PCIe Cadence core to start the link and verify link status.
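
The callback-plus-polling flow can be sketched in userspace roughly as
follows. This is an illustration under assumptions, not the patch: the
struct names, the fake platform callbacks and EXAMPLE_MAX_RETRIES are
invented for the example, and treating a missing link_up callback as
"link is up" is an assumption of the sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct link_ctx;

/* Hypothetical platform callbacks, both optional. */
struct link_ops {
	int  (*start_link)(struct link_ctx *ctx);
	bool (*link_up)(struct link_ctx *ctx);
};

struct link_ctx {
	const struct link_ops *ops;
	int polls_until_up;	/* state used only by the fake platform below */
};

#define EXAMPLE_MAX_RETRIES 10

/* Core-side wrappers: call the platform hook only if it is provided. */
static int start_link(struct link_ctx *ctx)
{
	assert(ctx != NULL);
	if (ctx->ops && ctx->ops->start_link)
		return ctx->ops->start_link(ctx);
	return 0;
}

static bool link_is_up(struct link_ctx *ctx)
{
	assert(ctx != NULL);
	if (ctx->ops && ctx->ops->link_up)
		return ctx->ops->link_up(ctx);
	return true;	/* assumption: no hook means nothing to wait for */
}

/* Poll link status a bounded number of times, sleeping between polls. */
static int wait_for_link(struct link_ctx *ctx, void (*delay)(void))
{
	int retries;

	for (retries = 0; retries < EXAMPLE_MAX_RETRIES; retries++) {
		if (link_is_up(ctx))
			return 0;
		if (delay)
			delay();
	}
	return -ETIMEDOUT;
}

/* Fake platform: link training succeeds after a few polls. */
static int fake_start_link(struct link_ctx *ctx)
{
	(void)ctx;
	return 0;
}

static bool fake_link_up(struct link_ctx *ctx)
{
	if (ctx->polls_until_up > 0) {
		ctx->polls_until_up--;
		return false;
	}
	return true;
}

static const struct link_ops fake_ops = {
	.start_link = fake_start_link,
	.link_up = fake_link_up,
};
```

In the host-mode hunk below, a timeout from the wait is only reported
with dev_dbg() and setup continues, so a caller of this sketch would
likewise be free to treat -ETIMEDOUT as non-fatal.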

Signed-off-by: Kishon Vijay Abraham I 
Reviewed-by: Rob Herring 
---
 .../pci/controller/cadence/pcie-cadence-ep.c  |  8 
 .../controller/cadence/pcie-cadence-host.c| 28 ++
 drivers/pci/controller/cadence/pcie-cadence.h | 37 ++-
 3 files changed, 72 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/controller/cadence/pcie-cadence-ep.c b/drivers/pci/controller/cadence/pcie-cadence-ep.c
index 1fdae37843ef..14021d760482 100644
--- a/drivers/pci/controller/cadence/pcie-cadence-ep.c
+++ b/drivers/pci/controller/cadence/pcie-cadence-ep.c
@@ -354,8 +354,10 @@ static int cdns_pcie_ep_start(struct pci_epc *epc)
 {
struct cdns_pcie_ep *ep = epc_get_drvdata(epc);
struct cdns_pcie *pcie = >pcie;
+   struct device *dev = pcie->dev;
struct pci_epf *epf;
u32 cfg;
+   int ret;
 
/*
 * BIT(0) is hardwired to 1, hence function 0 is always enabled
@@ -366,6 +368,12 @@ static int cdns_pcie_ep_start(struct pci_epc *epc)
cfg |= BIT(epf->func_no);
cdns_pcie_writel(pcie, CDNS_PCIE_LM_EP_FUNC_CFG, cfg);
 
+   ret = cdns_pcie_start_link(pcie);
+   if (ret) {
+   dev_err(dev, "Failed to start link\n");
+   return ret;
+   }
+
return 0;
 }
 
diff --git a/drivers/pci/controller/cadence/pcie-cadence-host.c b/drivers/pci/controller/cadence/pcie-cadence-host.c
index 8e73a680b567..93a9414932a9 100644
--- a/drivers/pci/controller/cadence/pcie-cadence-host.c
+++ b/drivers/pci/controller/cadence/pcie-cadence-host.c
@@ -3,6 +3,7 @@
 // Cadence PCIe host controller driver.
 // Author: Cyrille Pitchen 
 
+#include 
 #include 
 #include 
 #include 
@@ -422,6 +423,23 @@ static int cdns_pcie_host_init(struct device *dev,
return err;
 }
 
+static int cdns_pcie_host_wait_for_link(struct cdns_pcie *pcie)
+{
+   struct device *dev = pcie->dev;
+   int retries;
+
+   /* Check if the link is up or not */
+   for (retries = 0; retries < LINK_WAIT_MAX_RETRIES; retries++) {
+   if (cdns_pcie_link_up(pcie)) {
+   dev_info(dev, "Link up\n");
+   return 0;
+   }
+   usleep_range(LINK_WAIT_USLEEP_MIN, LINK_WAIT_USLEEP_MAX);
+   }
+
+   return -ETIMEDOUT;
+}
+
 int cdns_pcie_host_setup(struct cdns_pcie_rc *rc)
 {
struct device *dev = rc->pcie.dev;
@@ -470,6 +488,16 @@ int cdns_pcie_host_setup(struct cdns_pcie_rc *rc)
 
pcie->mem_res = res;
 
+   ret = cdns_pcie_start_link(pcie);
+   if (ret) {
+   dev_err(dev, "Failed to start link\n");
+   return ret;
+   }
+
+   ret = cdns_pcie_host_wait_for_link(pcie);
+   if (ret)
+   dev_dbg(dev, "PCIe link never came up\n");
+
for (bar = RP_BAR0; bar <= RP_NO_BAR; bar++)
rc->avail_ib_bar[bar] = true;
 
diff --git a/drivers/pci/controller/cadence/pcie-cadence.h b/drivers/pci/controller/cadence/pcie-cadence.h
index 737e9561092b..c013e629e9fa 100644
--- a/drivers/pci/controller/cadence/pcie-cadence.h
+++ b/drivers/pci/controller/cadence/pcie-cadence.h
@@ -10,6 +10,11 @@
 #include 
 #include 
 
+/* Parameters for the waiting for link up routine */
+#define LINK_WAIT_MAX_RETRIES  10
+#define LINK_WAIT_USLEEP_MIN   9
+#define LINK_WAIT_USLEEP_MAX   10
+
 /*
  * Local Management Registers
  */
@@ -245,12 +250,20 @@ enum cdns_pcie_msg_routing {
MSG_ROUTING_GATHER,
 };
 
+struct cdns_pcie_ops {
+   int (*start_link)(struct cdns_pcie *pcie);
+   void(*stop_link)(struct cdns_pcie *pcie);
+   bool(*link_up)(struct cdns_pcie *pcie);
+};
+
 /**
  * struct cdns_pcie - private data for Cadence PCIe controller drivers
  * @reg_base: IO mapped register base
  * @mem_res: start/end offsets in the physical system memory to map PCI accesses
  * @is_rc: tell whether the PCIe controller mode is Root Complex or Endpoint.
  * @bus: In Root Complex mode, the bus number
+ * @ops: Platform specific ops to control various inputs from Cadence PCIe
+ *   wrapper
  */
 struct cdns_pcie {
void __iomem*reg_base;
@@ -261,7 +274,7 @@ struct cdns_pcie {
int phy_count;
struct phy  **phy;
struct device_link  **link;
-   const struct cdns_pcie_common_ops *ops;
+   const struct cdns_pcie_ops *ops;
 };
 
 /**
@@ -421,6 +434,28 @@ static inline u32 cdns_pcie_ep_fn_readl(struct cdns_pcie *pcie, u8 fn, u32 reg)
return readl(pcie->reg_base + CDNS_PCIE_EP_FUNC_BASE(fn) + reg);
 }
 
+static inline int cdns_pcie_start_link(struct cdns_pcie *pcie)
+{
+   if (pcie->ops->start_link)
+  

[PATCH v5 06/14] dt-bindings: PCI: cadence: Remove "mem" from reg binding

2020-05-21 Thread Kishon Vijay Abraham I
"mem" is not a memory resource and it overlaps with PCIe config space
and memory region. Remove "mem" from reg binding.

Signed-off-by: Kishon Vijay Abraham I 
---
 .../devicetree/bindings/pci/cdns,cdns-pcie-host.yaml  | 8 +++-
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/Documentation/devicetree/bindings/pci/cdns,cdns-pcie-host.yaml b/Documentation/devicetree/bindings/pci/cdns,cdns-pcie-host.yaml
index 84a8f095d031..6d67067843bf 100644
--- a/Documentation/devicetree/bindings/pci/cdns,cdns-pcie-host.yaml
+++ b/Documentation/devicetree/bindings/pci/cdns,cdns-pcie-host.yaml
@@ -18,13 +18,12 @@ properties:
 const: cdns,cdns-pcie-host
 
   reg:
-maxItems: 3
+maxItems: 2
 
   reg-names:
 items:
   - const: reg
   - const: cfg
-  - const: mem
 
   msi-parent: true
 
@@ -49,9 +48,8 @@ examples:
 device-id = <0x0200>;
 
 reg = <0x0 0xfb00  0x0 0x0100>,
-  <0x0 0x4100  0x0 0x1000>,
-  <0x0 0x4000  0x0 0x0400>;
-reg-names = "reg", "cfg", "mem";
+  <0x0 0x4100  0x0 0x1000>;
+reg-names = "reg", "cfg";
 
 ranges = <0x0200 0x0 0x4200  0x0 0x4200  0x0 0x100>,
  <0x0100 0x0 0x4300  0x0 0x4300  0x0 0x001>;
-- 
2.17.1



Re: [PATCH v4 03/14] PCI: cadence: Add support to use custom read and write accessors

2020-05-21 Thread Kishon Vijay Abraham I
Hi Rob,

On 5/22/2020 3:47 AM, Rob Herring wrote:
> On Thu, May 21, 2020 at 7:33 AM Kishon Vijay Abraham I  wrote:
>>
>> Hi Rob,
>>
>> On 5/21/2020 3:37 AM, Rob Herring wrote:
>>> On Wed, May 06, 2020 at 08:44:18PM +0530, Kishon Vijay Abraham I wrote:
 Add support to use custom read and write accessors. Platforms that
 don't support half word or byte access or any other constraint
 while accessing registers can use this feature to populate custom
 read and write accessors. These custom accessors are used for both
 standard register access and configuration space register access of
 the PCIe host bridge.

 Signed-off-by: Kishon Vijay Abraham I 
 ---
  drivers/pci/controller/cadence/pcie-cadence.h | 107 +++---
  1 file changed, 94 insertions(+), 13 deletions(-)
>>>
>>> Actually, take back my R-by...
>>>

 diff --git a/drivers/pci/controller/cadence/pcie-cadence.h 
 b/drivers/pci/controller/cadence/pcie-cadence.h
 index df14ad002fe9..70b6b25153e8 100644
 --- a/drivers/pci/controller/cadence/pcie-cadence.h
 +++ b/drivers/pci/controller/cadence/pcie-cadence.h
 @@ -223,6 +223,11 @@ enum cdns_pcie_msg_routing {
  MSG_ROUTING_GATHER,
  };

 +struct cdns_pcie_ops {
 +u32 (*read)(void __iomem *addr, int size);
 +void(*write)(void __iomem *addr, int size, u32 value);
 +};
 +
  /**
   * struct cdns_pcie - private data for Cadence PCIe controller drivers
   * @reg_base: IO mapped register base
 @@ -239,7 +244,7 @@ struct cdns_pcie {
  int phy_count;
  struct phy  **phy;
  struct device_link  **link;
 -const struct cdns_pcie_common_ops *ops;
 +const struct cdns_pcie_ops *ops;
  };

  /**
 @@ -299,69 +304,145 @@ struct cdns_pcie_ep {
  /* Register access */
  static inline void cdns_pcie_writeb(struct cdns_pcie *pcie, u32 reg, u8 
 value)
  {
 -writeb(value, pcie->reg_base + reg);
 +void __iomem *addr = pcie->reg_base + reg;
 +
 +if (pcie->ops && pcie->ops->write) {
 +pcie->ops->write(addr, 0x1, value);
 +return;
 +}
 +
 +writeb(value, addr);
  }

  static inline void cdns_pcie_writew(struct cdns_pcie *pcie, u32 reg, u16 
 value)
  {
 -writew(value, pcie->reg_base + reg);
 +void __iomem *addr = pcie->reg_base + reg;
 +
 +if (pcie->ops && pcie->ops->write) {
 +pcie->ops->write(addr, 0x2, value);
 +return;
 +}
 +
 +writew(value, addr);
  }
>>>
>>> cdns_pcie_writeb and cdns_pcie_writew are unused, so remove them.
>>>

  static inline void cdns_pcie_writel(struct cdns_pcie *pcie, u32 reg, u32 
 value)
  {
 -writel(value, pcie->reg_base + reg);
 +void __iomem *addr = pcie->reg_base + reg;
 +
 +if (pcie->ops && pcie->ops->write) {
 +pcie->ops->write(addr, 0x4, value);
 +return;
 +}
 +
 +writel(value, addr);
>>>
>>> writel isn't broken for you, so you don't need this either.
>>>
  }

  static inline u32 cdns_pcie_readl(struct cdns_pcie *pcie, u32 reg)
  {
 -return readl(pcie->reg_base + reg);
 +void __iomem *addr = pcie->reg_base + reg;
 +
 +if (pcie->ops && pcie->ops->read)
 +return pcie->ops->read(addr, 0x4);
 +
 +return readl(addr);
>>>
>>> And neither is readl.
>>
>> Sure, I'll remove all the unused functions and avoid using ops for readl and
>> writel.
>>>
  }

  /* Root Port register access */
  static inline void cdns_pcie_rp_writeb(struct cdns_pcie *pcie,
 u32 reg, u8 value)
  {
 -writeb(value, pcie->reg_base + CDNS_PCIE_RP_BASE + reg);
 +void __iomem *addr = pcie->reg_base + CDNS_PCIE_RP_BASE + reg;
 +
 +if (pcie->ops && pcie->ops->write) {
 +pcie->ops->write(addr, 0x1, value);
 +return;
 +}
 +
 +writeb(value, addr);
  }

  static inline void cdns_pcie_rp_writew(struct cdns_pcie *pcie,
 u32 reg, u16 value)
  {
 -writew(value, pcie->reg_base + CDNS_PCIE_RP_BASE + reg);
 +void __iomem *addr = pcie->reg_base + CDNS_PCIE_RP_BASE + reg;
 +
 +if (pcie->ops && pcie->ops->write) {
 +pcie->ops->write(addr, 0x2, value);
 +return;
 +}
 +
 +writew(value, addr);
>>>
>>> You removed 2 out of 3 calls to this. I think I'd just make the root
>>> port writes always be 32-bit. It is all just one time init stuff
>>> anyways.
>>>
>>> Either rework the calls to assemble the data into 32-bits or keep these
>>> functions and do the RMW here.
>>
>> The problem with 
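
The read-modify-write (RMW) option suggested above can be sketched in userspace as follows. This is only a minimal illustration, assuming hardware that tolerates nothing but aligned 32-bit accesses and a little-endian register layout. The register file is a plain array, and `fake_regs`, `reg32_read`, `reg32_write` and `rmw_write16` are hypothetical names, not part of the Cadence driver.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register file standing in for an MMIO window that only
 * tolerates aligned 32-bit accesses. */
static uint32_t fake_regs[4];

static uint32_t reg32_read(uint32_t off)
{
	assert((off & 0x3) == 0 && off / 4 < 4);
	return fake_regs[off / 4];
}

static void reg32_write(uint32_t off, uint32_t val)
{
	assert((off & 0x3) == 0 && off / 4 < 4);
	fake_regs[off / 4] = val;
}

/* Emulate a 16-bit write at a 2-byte aligned offset using only 32-bit
 * accesses: read the containing word, replace one half, write it back. */
static void rmw_write16(uint32_t off, uint16_t val)
{
	uint32_t aligned = off & ~0x3u;
	uint32_t shift = (off & 0x2u) * 8;	/* 0 or 16 (little-endian layout) */
	uint32_t mask = 0xffffu << shift;
	uint32_t word;

	assert((off & 0x1) == 0);
	word = reg32_read(aligned);
	word = (word & ~mask) | ((uint32_t)val << shift);
	reg32_write(aligned, word);
}
```

The other half of the 32-bit word is preserved, which is the point of doing the RMW in the accessor rather than in every caller.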

[PATCH] mm/vmstat: Add events for PMD based THP migration without split

2020-05-21 Thread Anshuman Khandual
This adds the following two new VM events which will help in validating PMD
based THP migration without split. Statistics reported through these events
will help in performance debugging.

1. THP_PMD_MIGRATION_SUCCESS
2. THP_PMD_MIGRATION_FAILURE

Cc: Naoya Horiguchi 
Cc: Zi Yan 
Cc: John Hubbard 
Cc: Andrew Morton 
Cc: linux...@kvack.org
Cc: linux-kernel@vger.kernel.org
[hughd: fixed oops on NULL newpage]
Signed-off-by: Anshuman Khandual 
---
Changes in V1:

- Changed function name as thp_pmd_migration_success() per John
- Folded in a fix (https://patchwork.kernel.org/patch/11563009/) from Hugh

Changes in RFC V2: (https://patchwork.kernel.org/patch/11554861/)

- Decoupled and renamed VM events from their implementation per Zi and John
- Added THP_PMD_MIGRATION_FAILURE VM event upon allocation failure and split

Changes in RFC V1: (https://patchwork.kernel.org/patch/11542055/)

 include/linux/vm_event_item.h |  4 
 mm/migrate.c  | 23 +++
 mm/vmstat.c   |  4 
 3 files changed, 31 insertions(+)

diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h
index ffef0f279747..23d8f9884c2b 100644
--- a/include/linux/vm_event_item.h
+++ b/include/linux/vm_event_item.h
@@ -91,6 +91,10 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
THP_ZERO_PAGE_ALLOC_FAILED,
THP_SWPOUT,
THP_SWPOUT_FALLBACK,
+#ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION
+   THP_PMD_MIGRATION_SUCCESS,
+   THP_PMD_MIGRATION_FAILURE,
+#endif
 #endif
 #ifdef CONFIG_MEMORY_BALLOON
BALLOON_INFLATE,
diff --git a/mm/migrate.c b/mm/migrate.c
index 7160c1556f79..37f30bcfd628 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1170,6 +1170,20 @@ static int __unmap_and_move(struct page *page, struct 
page *newpage,
 #define ICE_noinline
 #endif
 
+#ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION
+static inline void thp_pmd_migration_success(bool success)
+{
+   if (success)
+   count_vm_event(THP_PMD_MIGRATION_SUCCESS);
+   else
+   count_vm_event(THP_PMD_MIGRATION_FAILURE);
+}
+#else
+static inline void thp_pmd_migration_success(bool success)
+{
+}
+#endif
+
 /*
  * Obtain the lock on page, remove all ptes and migrate the page
  * to the newly allocated page in newpage.
@@ -1232,6 +1246,14 @@ static ICE_noinline int unmap_and_move(new_page_t 
get_new_page,
 * we want to retry.
 */
if (rc == MIGRATEPAGE_SUCCESS) {
+   /*
+* When the page to be migrated has been freed from under
+* us, that is considered a MIGRATEPAGE_SUCCESS, but no
+* newpage has been allocated. It should not be counted
+* as a successful THP migration.
+*/
+   if (newpage && PageTransHuge(newpage))
+   thp_pmd_migration_success(true);
put_page(page);
if (reason == MR_MEMORY_FAILURE) {
/*
@@ -1474,6 +1496,7 @@ int migrate_pages(struct list_head *from, new_page_t 
get_new_page,
unlock_page(page);
if (!rc) {
list_safe_reset_next(page, 
page2, lru);
+   
thp_pmd_migration_success(false);
goto retry;
}
}
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 96d21a792b57..e258c782fd3a 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1274,6 +1274,10 @@ const char * const vmstat_text[] = {
"thp_zero_page_alloc_failed",
"thp_swpout",
"thp_swpout_fallback",
+#ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION
+   "thp_pmd_migration_success",
+   "thp_pmd_migration_failure",
+#endif
 #endif
 #ifdef CONFIG_MEMORY_BALLOON
"balloon_inflate",
-- 
2.20.1
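
The helper added to mm/migrate.c above follows the common pattern of a config-gated function with an empty stub for the disabled case, so callers need no #ifdef of their own. A minimal userspace sketch of that pattern is below. All `demo_*` and `DEMO_*` names are hypothetical stand-ins (DEMO_THP_MIGRATION plays the role of CONFIG_ARCH_ENABLE_THP_MIGRATION, and the array replaces the kernel's per-CPU event counters).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical build switch standing in for the kernel config option. */
#define DEMO_THP_MIGRATION 1

enum demo_event {
	DEMO_MIGRATION_SUCCESS,
	DEMO_MIGRATION_FAILURE,
	DEMO_NR_EVENTS
};

/* Plain counters; the kernel uses per-CPU counters instead. */
static unsigned long demo_events[DEMO_NR_EVENTS];

#if DEMO_THP_MIGRATION
static inline void demo_count_migration(bool success)
{
	if (success)
		demo_events[DEMO_MIGRATION_SUCCESS]++;
	else
		demo_events[DEMO_MIGRATION_FAILURE]++;
}
#else
/* Empty stub: call sites compile unchanged when the feature is off. */
static inline void demo_count_migration(bool success)
{
	(void)success;
}
#endif

static unsigned long demo_read_event(enum demo_event e)
{
	assert(e < DEMO_NR_EVENTS);
	return demo_events[e];
}
```

With the switch set to 0 the calls still compile, and both counters simply stay at zero.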



Re: [PATCH v30 10/20] x86/sgx: Linux Enclave Driver

2020-05-21 Thread Sean Christopherson
On Fri, May 15, 2020 at 03:44:00AM +0300, Jarkko Sakkinen wrote:
> +static int sgx_open(struct inode *inode, struct file *file)
> +{
> + struct sgx_encl *encl;
> + int ret;
> +
> + encl = kzalloc(sizeof(*encl), GFP_KERNEL);
> + if (!encl)
> + return -ENOMEM;
> +
> + atomic_set(&encl->flags, 0);
> + kref_init(&encl->refcount);
> + INIT_RADIX_TREE(&encl->page_tree, GFP_KERNEL);
> + mutex_init(&encl->lock);
> + INIT_LIST_HEAD(&encl->mm_list);
> + spin_lock_init(&encl->mm_lock);
> +
> + ret = init_srcu_struct(&encl->srcu);

We're leaking a wee bit of memory here; enough to burn through 14gb in a few
minutes with my newly resurrected EPC cgroup test.  The possibility for
failure should have been a dead giveaway that this allocates memory, but the
"init" name threw me off. :-/

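The leak pattern described above (an "init" call that allocates behind the scenes and therefore needs a matching cleanup in the release path) can be modelled in userspace like this. It is only a sketch: `demo_srcu`, `demo_encl` and the allocation counter are hypothetical stand-ins, not the SGX or SRCU code.

```c
#include <assert.h>
#include <stdlib.h>

static int demo_live_allocs;	/* outstanding allocations, demo only */

struct demo_srcu {		/* hypothetical stand-in for srcu_struct */
	int *percpu;
};

struct demo_encl {
	struct demo_srcu srcu;
};

/* An "init" that can fail is a hint that it allocates memory. */
static int demo_srcu_init(struct demo_srcu *s)
{
	s->percpu = calloc(4, sizeof(*s->percpu));
	if (!s->percpu)
		return -1;
	demo_live_allocs++;
	return 0;
}

static void demo_srcu_cleanup(struct demo_srcu *s)
{
	free(s->percpu);
	s->percpu = NULL;
	demo_live_allocs--;
}

static struct demo_encl *demo_encl_open(void)
{
	struct demo_encl *encl = calloc(1, sizeof(*encl));

	if (!encl)
		return NULL;
	demo_live_allocs++;
	if (demo_srcu_init(&encl->srcu)) {
		free(encl);
		demo_live_allocs--;
		return NULL;
	}
	return encl;
}

static void demo_encl_release(struct demo_encl *encl)
{
	assert(encl);
	demo_srcu_cleanup(&encl->srcu);	/* the call a leaky version omits */
	free(encl);
	demo_live_allocs--;
}
```

Dropping the `demo_srcu_cleanup()` call leaves the counter at 1 after release, which is the userspace analogue of the leak.
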
> + if (ret) {
> + kfree(encl);
> + return ret;
> + }
> +
> + file->private_data = encl;
> +
> + return 0;
> +}

...

> +/**
> + * sgx_encl_release - Destroy an enclave instance
> + * @kref:address of a kref inside &sgx_encl
> + *
> + * Used together with kref_put(). Frees all the resources associated with the
> + * enclave and the instance itself.
> + */
> +void sgx_encl_release(struct kref *ref)
> +{
> + struct sgx_encl *encl = container_of(ref, struct sgx_encl, refcount);
> +
> + sgx_encl_destroy(encl);
> +
> + if (encl->backing)
> + fput(encl->backing);

The above mem leak can be fixed by adding

cleanup_srcu_struct(&encl->srcu);
> +
> + WARN_ON_ONCE(!list_empty(&encl->mm_list));
> +
> + /* Detect EPC page leak's. */
> + WARN_ON_ONCE(encl->secs_child_cnt);
> + WARN_ON_ONCE(encl->secs.epc_page);
> +
> + kfree(encl);
> +}

...

> +static int sgx_encl_add_page(struct sgx_encl *encl, unsigned long src,
> +  unsigned long offset, unsigned long length,
> +  struct sgx_secinfo *secinfo, unsigned long flags)
> +{

...

> +err_out:
> + radix_tree_delete(&encl_page->encl->page_tree,
> +   PFN_DOWN(encl_page->desc));
> +
> +err_out_unlock:
> + mutex_unlock(&encl->lock);
> + up_read(&current->mm->mmap_sem);
> +
> +err_out_free:
> + sgx_free_page(epc_page);
> + kfree(encl_page);
> +
> + /*
> +  * Destroy enclave on ENCLS failure as this means that EPC has been
> +  * invalidated.
> +  */
> + if (ret == -EIO)
> + sgx_encl_destroy(encl);

This needs to be called with encl->lock held to prevent racing with the
reclaimer, e.g. sgx_encl_destroy() and sgx_reclaimer_write() can combine to
corrupt secs_child_cnt, among other badness.

It's probably worth adding a lockdep assert in sgx_encl_destroy() as well.

We can either keep the lock across the above frees or retake the lock.  I
like retaking the lock to avoid inverting the ordering between encl->lock
and mmap_sem (even though it's benign).  This is an extremely rare path,
no need to shave cycles.
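
A minimal userspace sketch of the "retake the lock" option, together with the suggested held-lock assertion, might look like the following. The toy lock only records ownership so the rule can be checked with assert(). Every `demo_*` name and DEMO_EIO are hypothetical and stand in for the real enclave structures and -EIO.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_EIO 5	/* stand-in for the kernel's EIO */

/* Toy lock that only records ownership so the locking rule can be asserted. */
struct demo_lock {
	bool held;
};

static void demo_lock_acquire(struct demo_lock *l)
{
	assert(!l->held);
	l->held = true;
}

static void demo_lock_release(struct demo_lock *l)
{
	assert(l->held);
	l->held = false;
}

struct demo_encl {
	struct demo_lock lock;
	int child_cnt;
	bool dead;
};

/* Must be called with encl->lock held (the "lockdep assert" idea). */
static void demo_encl_destroy(struct demo_encl *encl)
{
	assert(encl->lock.held);
	encl->child_cnt = 0;
	encl->dead = true;
}

/* Error path entered with the lock held, as at err_out_unlock. */
static int demo_add_page_error_path(struct demo_encl *encl, int ret)
{
	demo_lock_release(&encl->lock);

	/* ... the page frees would happen here, outside the lock ... */

	if (ret == -DEMO_EIO) {
		demo_lock_acquire(&encl->lock);	/* retake before destroying */
		demo_encl_destroy(encl);
		demo_lock_release(&encl->lock);
	}
	return ret;
}
```

Calling `demo_encl_destroy()` without the lock trips the assertion, which is the kind of misuse a lockdep assert would flag in the kernel.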

> +
> + return ret;
> +}


Re: [PATCH v2 7/8] exec: Generic execfd support

2020-05-21 Thread Eric W. Biederman


Rob Landley  writes:

> On 5/20/20 11:05 AM, Eric W. Biederman wrote:

> Toybox would _like_ proc mounted, but can't assume it. I'm writing a new
> bash-compatible shell with nommu support, which means in order to do subshell
> and background tasks if (!CONFIG_FORK) I need to create a pipe pair, vfork(),
> have the child exec itself to unblock the parent, and then read the context 
> data
> that just got discarded through the pipe from the parent. ("Wheee." And you 
> can
> quote me on that.)

Do you have clone(CLONE_VM) ?  If my quick skim of the kernel sources is
correct that should be the same as vfork except without causing the
parent to wait for you.  Which I think would remove the need to reexec
yourself.
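
For illustration, a minimal sketch of clone(CLONE_VM) using the glibc wrapper on a regular Linux system is shown below. It has not been tried on nommu, and the `demo_*` names and the stack size are assumptions made for the example. The child shares the parent's address space, and the parent is not suspended the way it would be with vfork().

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

#define DEMO_STACK_SIZE (64 * 1024)	/* arbitrary size for the demo */

static int demo_child(void *arg)
{
	*(volatile int *)arg = 42;	/* visible to the parent: same address space */
	return 0;
}

/* Returns the value the child stored, or -1 on failure. */
static int demo_run_shared_child(void)
{
	volatile int shared = 0;
	char *stack = malloc(DEMO_STACK_SIZE);
	int status;
	pid_t pid;

	if (!stack)
		return -1;

	/* The stack grows down on most architectures, so pass its top. */
	pid = clone(demo_child, stack + DEMO_STACK_SIZE, CLONE_VM | SIGCHLD,
		    (void *)&shared);
	if (pid < 0) {
		free(stack);
		return -1;
	}

	/* Unlike vfork(), the parent keeps running here; wait explicitly. */
	if (waitpid(pid, &status, 0) != pid) {
		free(stack);
		return -1;
	}
	free(stack);
	assert(WIFEXITED(status));
	return shared;
}
```

Because memory is shared, the child can read the parent's state directly, so nothing has to be fed back through a pipe.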

>> The file descriptor is stored in mm->exe_file.
>> Probably the most straight forward implementation is to allow
>> execveat(AT_EXE_FILE, ...).
>
> Cool, that works.
>
>> You can look at binfmt_misc for how to reopen an open file descriptor.
>
> Added to the todo heap.

Yes I don't think it would be a lot of code.

I think you might be better served with clone(CLONE_VM) as it doesn't
block so you don't need to feed yourself your context over a pipe.

Eric


Re: [PATCH v2] netprio_cgroup: Fix unlimited memory leak of v2 cgroups

2020-05-21 Thread Zefan Li
On 2020/5/22 5:14, John Fastabend wrote:
> Jakub Kicinski wrote:
>> On Fri, 8 May 2020 22:58:29 -0700 Jakub Kicinski wrote:
>>> On Sat, 9 May 2020 11:32:10 +0800 Zefan Li wrote:
>>>> If systemd is configured to use hybrid mode which enables the use of
>>>> both cgroup v1 and v2, systemd will create new cgroup on both the default
>>>> root (v2) and netprio_cgroup hierarchy (v1) for a new session and attach
>>>> task to the two cgroups. If the task does some network thing then the v2
>>>> cgroup can never be freed after the session exited.
>>>>
>>>> One of our machines ran into OOM due to this memory leak.
>>>>
>>>> In the scenario described above when sk_alloc() is called cgroup_sk_alloc()
>>>> thought it's in v2 mode, so it stores the cgroup pointer in sk->sk_cgrp_data
>>>> and increments the cgroup refcnt, but then sock_update_netprioidx() thought
>>>> it's in v1 mode, so it stores netprioidx value in sk->sk_cgrp_data, so the
>>>> cgroup refcnt will never be freed.
>>>>
>>>> Currently we do the mode switch when someone writes to the ifpriomap cgroup
>>>> control file. The easiest fix is to also do the switch when a task is attached
>>>> to a new cgroup.
>>>>
>>>> Fixes: bd1060a1d671("sock, cgroup: add sock->sk_cgroup")
>>>
>>>  ^ space missing here
>>>
>>>> Reported-by: Yang Yingliang 
>>>> Tested-by: Yang Yingliang 
>>>> Signed-off-by: Zefan Li 
>>
>> Fixed up the commit message and applied, thank you.
> 
> Hi Zefan, Tejun,
> 
> This is causing a regression where previously cgroupv2 bpf sockops programs
> could be attached and would run even if netprio_cgroup was enabled as long
> as  the netprio cgroup had not been configured. After this the bpf sockops
> programs can still be attached but only programs attached to the root cgroup
> will be run. For example I hit this when I ran bpf selftests on a box that
> also happened to have netprio cgroup enabled, tests started failing after
> bumping kernel to rc5.
> 
> I'm a bit on the fence here if it needs to be reverted. For my case it's just
> a test box and easy enough to work around. Also all the production cases I
> have already have to be aware of this to avoid the configured error. So it
> may be fine but worth noting at least. Added Alexei to see if he has any
> opinion and/or uses net_prio+cgroupv2. I only looked it over briefly but
> didn't see any simple rc6 worthy fixes that would fix the issue above and
> also keep the original behavior.
> 

Me neither. If we really want to keep the original behavior we probably need
to do something similar to what netclassid cgroup does, which is to iterate
all the tasks in the cgroup to update netprioidx when netprio cgroup is
configured, and we also need to not update netprioidx when a task is attached
to a new cgroup.

> And then while reviewing I also wonder do we have the same issue described
> here in netclassid_cgroup.c with the cgrp_attach()? It would be best to keep
> netcls and netprio in sync in this regard imo. At least netcls calls
> cgroup_sk_alloc_disable in the write_classid() hook so I suspect it makes
> sense to also add that to the attach hook?
> 

Fortunately we don't have this issue in netclassid cgroup. :)

Because task_cls_classid() remains 0 as long as netclassid cgroup is not
configured.


Re: [PATCH v8 5/5] dt-bindings: chosen: Document linux,low-memory-range for arm64 kdump

2020-05-21 Thread chenzhou
Hi Rob,

On 2020/5/21 21:29, Rob Herring wrote:
> On Thu, May 21, 2020 at 3:35 AM Chen Zhou  wrote:
>> Add documentation for DT property used by arm64 kdump:
>> linux,low-memory-range.
>> "linux,low-memory-range" is an another memory region used for crash
>> dump kernel devices.
>>
>> Signed-off-by: Chen Zhou 
>> ---
>>  Documentation/devicetree/bindings/chosen.txt | 25 
>>  1 file changed, 25 insertions(+)
> chosen is now a schema documented here[1].
OK, so I don't need to modify the doc in the kernel, and should just create a
pull request on GitHub [1]?

>
>> diff --git a/Documentation/devicetree/bindings/chosen.txt 
>> b/Documentation/devicetree/bindings/chosen.txt
>> index 45e79172a646..bfe6fb6976e6 100644
>> --- a/Documentation/devicetree/bindings/chosen.txt
>> +++ b/Documentation/devicetree/bindings/chosen.txt
>> @@ -103,6 +103,31 @@ While this property does not represent a real hardware, 
>> the address
>>  and the size are expressed in #address-cells and #size-cells,
>>  respectively, of the root node.
>>
>> +linux,low-memory-range
>> +--
>> +This property (arm64 only) holds a base address and size, describing a
>> +limited region below 4G. Similar to "linux,usable-memory-range", it is
>> +an another memory range which may be considered available for use by the
>> +kernel.
> Why can't you just add a range to "linux,usable-memory-range"? It
> shouldn't be hard to figure out which part is below 4G.
I did it this way in earlier versions, such as v5. After discussing it with
James, I changed it to the current approach.

We think the existing behavior should stay unchanged, which helps keep
compatibility with existing user space and older kdump kernels.

The comments from James:
> linux,usable-memory-range = .
Won't this break if your kdump kernel doesn't know what the extra parameters 
are?
Or if it expects two ranges, but only gets one? These DT properties should be 
treated as
ABI between kernel versions, we can't really change it like this.

I think the 'low' region is an optional-extra, that is never mapped by the 
first kernel. I
think the simplest thing to do is to add an 'linux,low-memory-range' that we
memblock_add() after memblock_cap_memory_range() has been called.
If its missing, or the new kernel doesn't know what its for, everything keeps 
working.
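
The compatibility argument above (the new property is an optional extra, and nothing changes when it is absent) can be illustrated with a toy model. This is only a sketch with hypothetical names (`demo_range`, `demo_build_ranges`). It does not use the real memblock API, and the addresses used with it are made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct demo_range {
	uint64_t base;
	uint64_t size;
};

/*
 * Build the memory list a crash kernel would use: start from the mandatory
 * usable range, then append the optional low range if one was provided.
 * Returns the number of entries written to out[] (at most 2).
 */
static size_t demo_build_ranges(const struct demo_range *usable,
				const struct demo_range *low,
				struct demo_range out[2])
{
	size_t n = 0;

	assert(usable && usable->size);
	out[n++] = *usable;		/* existing behaviour, unchanged */
	if (low && low->size)		/* new property is optional */
		out[n++] = *low;
	return n;
}
```

A kernel that does not know about the low range behaves like the `low == NULL` case, so older kdump kernels keep working.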

Previous discussions:
https://lkml.org/lkml/2019/6/5/674
https://lkml.org/lkml/2019/6/13/229

Thanks,
Chen Zhou

>
> Rob
>
> [1] 
> https://github.com/devicetree-org/dt-schema/blob/master/schemas/chosen.yaml
>
> .
>




Re: [PATCH] e1000e: Relax condition to trigger reset for ME workaround

2020-05-21 Thread Punit Agrawal
Hi Aaron,

"Brown, Aaron F"  writes:

>> From: netdev-ow...@vger.kernel.org  On
>> Behalf Of Punit Agrawal
>> Sent: Thursday, May 14, 2020 9:31 PM
>> To: Kirsher, Jeffrey T 
>> Cc: daniel.sangor...@toshiba.co.jp; Punit Agrawal
>> ; Alexander Duyck
>> ; David S. Miller ;
>> intel-wired-...@lists.osuosl.org; net...@vger.kernel.org; linux-
>> ker...@vger.kernel.org
>> Subject: [PATCH] e1000e: Relax condition to trigger reset for ME workaround
>> 
>> It's an error if the value of the RX/TX tail descriptor does not match
>> what was written. The error condition is true regardless the duration
>> of the interference from ME. But the driver only performs the reset if
>> E1000_ICH_FWSM_PCIM2PCI_COUNT (2000) iterations of 50us delay have
>> transpired. The extra condition can lead to inconsistency between the
>> state of hardware as expected by the driver.
>> 
>> Fix this by dropping the check for number of delay iterations.
>> 
>> While at it, also make __ew32_prepare() static as it's not used
>> anywhere else.
>> 
>> Signed-off-by: Punit Agrawal 
>> Reviewed-by: Alexander Duyck 
>> Cc: Jeff Kirsher 
>> Cc: "David S. Miller" 
>> Cc: intel-wired-...@lists.osuosl.org
>> Cc: net...@vger.kernel.org
>> Cc: linux-kernel@vger.kernel.org
>> ---
>> Hi Jeff,
>> 
>> If there are no further comments please consider merging the patch.
>> 
>> Also, should it be marked for backport to stable?
>> 
>> Thanks,
>> Punit
>> 
>> RFC[0] -> v1:
>> * Dropped return value for __ew32_prepare() as it's not used
>> * Made __ew32_prepare() static
>> * Added tags
>> 
>> [0] https://lkml.org/lkml/2020/5/12/20
>> 
>>  drivers/net/ethernet/intel/e1000e/e1000.h  |  1 -
>>  drivers/net/ethernet/intel/e1000e/netdev.c | 12 +---
>>  2 files changed, 5 insertions(+), 8 deletions(-)
>> 
> Tested-by: Aaron Brown 

Thanks for taking the patch for a spin.

Jeff, let me know if you're okay to apply the tag or want me to send a
new version.

Thanks,
Punit



[PATCH 2/4] pinctrl: sunxi: add support for the Allwinner A100 pin controller

2020-05-21 Thread Frank Lee
This commit introduces support for the pin controller on A100.

Signed-off-by: Frank Lee 
---
 drivers/pinctrl/sunxi/Kconfig |  10 +
 drivers/pinctrl/sunxi/Makefile|   2 +
 drivers/pinctrl/sunxi/pinctrl-sun50i-a100-r.c | 105 +++
 drivers/pinctrl/sunxi/pinctrl-sun50i-a100.c   | 710 ++
 4 files changed, 827 insertions(+)
 create mode 100644 drivers/pinctrl/sunxi/pinctrl-sun50i-a100-r.c
 create mode 100644 drivers/pinctrl/sunxi/pinctrl-sun50i-a100.c

diff --git a/drivers/pinctrl/sunxi/Kconfig b/drivers/pinctrl/sunxi/Kconfig
index f7aae200ee15..593293584ecc 100644
--- a/drivers/pinctrl/sunxi/Kconfig
+++ b/drivers/pinctrl/sunxi/Kconfig
@@ -94,6 +94,16 @@ config PINCTRL_SUN50I_A64_R
default ARM64 && ARCH_SUNXI
select PINCTRL_SUNXI
 
+config PINCTRL_SUN50I_A100
+   bool "Support for the Allwinner A100 PIO"
+   default ARM64 && ARCH_SUNXI
+   select PINCTRL_SUNXI
+
+config PINCTRL_SUN50I_A100_R
+   bool "Support for the Allwinner A100 R-PIO"
+   default ARM64 && ARCH_SUNXI
+   select PINCTRL_SUNXI
+
 config PINCTRL_SUN50I_H5
bool "Support for the Allwinner H5 PIO"
default ARM64 && ARCH_SUNXI
diff --git a/drivers/pinctrl/sunxi/Makefile b/drivers/pinctrl/sunxi/Makefile
index fafcdae8134f..8b7ff0dc3bdf 100644
--- a/drivers/pinctrl/sunxi/Makefile
+++ b/drivers/pinctrl/sunxi/Makefile
@@ -13,6 +13,8 @@ obj-$(CONFIG_PINCTRL_SUN8I_A23_R) += pinctrl-sun8i-a23-r.o
 obj-$(CONFIG_PINCTRL_SUN8I_A33)+= pinctrl-sun8i-a33.o
 obj-$(CONFIG_PINCTRL_SUN50I_A64)   += pinctrl-sun50i-a64.o
 obj-$(CONFIG_PINCTRL_SUN50I_A64_R) += pinctrl-sun50i-a64-r.o
+obj-$(CONFIG_PINCTRL_SUN50I_A100)  += pinctrl-sun50i-a100.o
+obj-$(CONFIG_PINCTRL_SUN50I_A100_R)+= pinctrl-sun50i-a100-r.o
 obj-$(CONFIG_PINCTRL_SUN8I_A83T)   += pinctrl-sun8i-a83t.o
 obj-$(CONFIG_PINCTRL_SUN8I_A83T_R) += pinctrl-sun8i-a83t-r.o
 obj-$(CONFIG_PINCTRL_SUN8I_H3) += pinctrl-sun8i-h3.o
diff --git a/drivers/pinctrl/sunxi/pinctrl-sun50i-a100-r.c 
b/drivers/pinctrl/sunxi/pinctrl-sun50i-a100-r.c
new file mode 100644
index ..d38d8770c9da
--- /dev/null
+++ b/drivers/pinctrl/sunxi/pinctrl-sun50i-a100-r.c
@@ -0,0 +1,105 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2020 Frank Lee 
+ *
+ * Based on:
+ * huangshuosheng 
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "pinctrl-sunxi.h"
+
+static const struct sunxi_desc_pin a100_r_pins[] = {
+   SUNXI_PIN(SUNXI_PINCTRL_PIN(L, 0),
+   SUNXI_FUNCTION(0x0, "gpio_in"),
+   SUNXI_FUNCTION(0x1, "gpio_out"),
+   SUNXI_FUNCTION(0x2, "s_i2c0"),  /* SCK */
+   SUNXI_FUNCTION_IRQ_BANK(0x6, 0, 0)),
+   SUNXI_PIN(SUNXI_PINCTRL_PIN(L, 1),
+   SUNXI_FUNCTION(0x0, "gpio_in"),
+   SUNXI_FUNCTION(0x1, "gpio_out"),
+   SUNXI_FUNCTION(0x2, "s_i2c0"),  /* SDA */
+   SUNXI_FUNCTION_IRQ_BANK(0x6, 0, 1)),
+   SUNXI_PIN(SUNXI_PINCTRL_PIN(L, 2),
+   SUNXI_FUNCTION(0x0, "gpio_in"),
+   SUNXI_FUNCTION(0x1, "gpio_out"),
+   SUNXI_FUNCTION(0x2, "s_uart0"), /* TX */
+   SUNXI_FUNCTION_IRQ_BANK(0x6, 0, 2)),
+   SUNXI_PIN(SUNXI_PINCTRL_PIN(L, 3),
+   SUNXI_FUNCTION(0x0, "gpio_in"),
+   SUNXI_FUNCTION(0x1, "gpio_out"),
+   SUNXI_FUNCTION(0x2, "s_uart0"), /* RX */
+   SUNXI_FUNCTION_IRQ_BANK(0x6, 0, 3)),
+   SUNXI_PIN(SUNXI_PINCTRL_PIN(L, 4),
+   SUNXI_FUNCTION(0x0, "gpio_in"),
+   SUNXI_FUNCTION(0x1, "gpio_out"),
+   SUNXI_FUNCTION(0x2, "s_jtag"),  /* MS */
+   SUNXI_FUNCTION_IRQ_BANK(0x6, 0, 4)),
+   SUNXI_PIN(SUNXI_PINCTRL_PIN(L, 5),
+   SUNXI_FUNCTION(0x0, "gpio_in"),
+   SUNXI_FUNCTION(0x1, "gpio_out"),
+   SUNXI_FUNCTION(0x2, "s_jtag"),  /* CK */
+   SUNXI_FUNCTION_IRQ_BANK(0x6, 0, 5)),
+   SUNXI_PIN(SUNXI_PINCTRL_PIN(L, 6),
+   SUNXI_FUNCTION(0x0, "gpio_in"),
+   SUNXI_FUNCTION(0x1, "gpio_out"),
+   SUNXI_FUNCTION(0x2, "s_jtag"),  /* DO */
+   SUNXI_FUNCTION_IRQ_BANK(0x6, 0, 6)),
+   SUNXI_PIN(SUNXI_PINCTRL_PIN(L, 7),
+   SUNXI_FUNCTION(0x0, "gpio_in"),
+   SUNXI_FUNCTION(0x1, "gpio_out"),
+   SUNXI_FUNCTION(0x2, "s_jtag"),  /* DI */
+   SUNXI_FUNCTION_IRQ_BANK(0x6, 0, 7)),
+   SUNXI_PIN(SUNXI_PINCTRL_PIN(L, 8),
+   SUNXI_FUNCTION(0x0, "gpio_in"),
+   SUNXI_FUNCTION(0x1, "gpio_out"),
+   SUNXI_FUNCTION(0x2, "s_i2c1"),  /* SCK */
+   SUNXI_FUNCTION_IRQ_BANK(0x6, 0, 8)),
+   SUNXI_PIN(SUNXI_PINCTRL_PIN(L, 9),
+   SUNXI_FUNCTION(0x0, "gpio_in"),
+   SUNXI_FUNCTION(0x1, "gpio_out"),
+   

Re: [PATCH v7 0/3] perf arm-spe: Add support for synthetic events

2020-05-21 Thread Leo Yan
Hi,

On Mon, May 04, 2020 at 07:56:22PM +0800, Leo Yan wrote:
> This patch set adds support for synthetic events by enabling the Arm SPE
> decoder.  Xiaojun Tan (Hisilicon) and James Clark (Arm) have already
> contributed much to this task, so this patch set is based on their
> previous work and polishes it for version 7.
> 
> The main work in this version is polishing the core patch "perf
> arm-spe: Support synthetic events", e.g. rewriting the code that calculates
> ip and the packet generation for multiple types (L1 data cache, last level
> cache, TLB, remote access, etc).  It also heavily refactors the data
> structures and program flow, removing unused structure fields and
> tidying the flow to keep the code as neat as possible.
> 
> This patch set has been checked with checkpatch.pl. It still leaves
> several warnings, which are deliberately kept after review.  Some
> warnings ask to add a maintainer (so far not necessary), and some
> complain about the text format in patch 02 "perf auxtrace: Add four
> itrace options"; that patch keeps the format consistent with the
> existing source code, which is why the checkpatch warnings remain.

Gentle ping ...

It would be appreciated if I could get some review of this patch set.

Thanks,
Leo

> Tan Xiaojun (3):
>   perf tools: Move arm-spe-pkt-decoder.h/c to the new dir
>   perf auxtrace: Add four itrace options
>   perf arm-spe: Support synthetic events
> 
>  tools/perf/Documentation/itrace.txt   |   6 +-
>  tools/perf/util/Build |   2 +-
>  tools/perf/util/arm-spe-decoder/Build |   1 +
>  .../util/arm-spe-decoder/arm-spe-decoder.c| 219 +
>  .../util/arm-spe-decoder/arm-spe-decoder.h|  82 ++
>  .../arm-spe-pkt-decoder.c |   0
>  .../arm-spe-pkt-decoder.h |  16 +
>  tools/perf/util/arm-spe.c | 823 +-
>  tools/perf/util/auxtrace.c|  17 +
>  tools/perf/util/auxtrace.h|  15 +-
>  10 files changed, 1135 insertions(+), 46 deletions(-)
>  create mode 100644 tools/perf/util/arm-spe-decoder/Build
>  create mode 100644 tools/perf/util/arm-spe-decoder/arm-spe-decoder.c
>  create mode 100644 tools/perf/util/arm-spe-decoder/arm-spe-decoder.h
>  rename tools/perf/util/{ => arm-spe-decoder}/arm-spe-pkt-decoder.c (100%)
>  rename tools/perf/util/{ => arm-spe-decoder}/arm-spe-pkt-decoder.h (64%)
> 
> -- 
> 2.17.1
> 


[PATCH 1/4] clk: sunxi-ng: add support for the Allwinner A100 CCU

2020-05-21 Thread Frank Lee
Add support for the A100 in the sunxi-ng CCU framework.

Signed-off-by: Frank Lee 
---
 drivers/clk/sunxi-ng/Kconfig  |   10 +
 drivers/clk/sunxi-ng/Makefile |2 +
 drivers/clk/sunxi-ng/ccu-sun50i-a100-r.c  |  206 +++
 drivers/clk/sunxi-ng/ccu-sun50i-a100-r.h  |   14 +
 drivers/clk/sunxi-ng/ccu-sun50i-a100.c| 1255 +
 drivers/clk/sunxi-ng/ccu-sun50i-a100.h|   14 +
 include/dt-bindings/clock/sun50i-a100-ccu.h   |  141 ++
 include/dt-bindings/clock/sun50i-a100-r-ccu.h |   25 +
 include/dt-bindings/reset/sun50i-a100-ccu.h   |   68 +
 include/dt-bindings/reset/sun50i-a100-r-ccu.h |   18 +
 10 files changed, 1753 insertions(+)
 create mode 100644 drivers/clk/sunxi-ng/ccu-sun50i-a100-r.c
 create mode 100644 drivers/clk/sunxi-ng/ccu-sun50i-a100-r.h
 create mode 100644 drivers/clk/sunxi-ng/ccu-sun50i-a100.c
 create mode 100644 drivers/clk/sunxi-ng/ccu-sun50i-a100.h
 create mode 100644 include/dt-bindings/clock/sun50i-a100-ccu.h
 create mode 100644 include/dt-bindings/clock/sun50i-a100-r-ccu.h
 create mode 100644 include/dt-bindings/reset/sun50i-a100-ccu.h
 create mode 100644 include/dt-bindings/reset/sun50i-a100-r-ccu.h

diff --git a/drivers/clk/sunxi-ng/Kconfig b/drivers/clk/sunxi-ng/Kconfig
index cdf333003c30..ce5f5847d5d3 100644
--- a/drivers/clk/sunxi-ng/Kconfig
+++ b/drivers/clk/sunxi-ng/Kconfig
@@ -17,6 +17,16 @@ config SUN50I_A64_CCU
default ARM64 && ARCH_SUNXI
depends on (ARM64 && ARCH_SUNXI) || COMPILE_TEST
 
+config SUN50I_A100_CCU
+   bool "Support for the Allwinner A100 CCU"
+   default ARM64 && ARCH_SUNXI
+   depends on (ARM64 && ARCH_SUNXI) || COMPILE_TEST
+
+config SUN50I_A100_R_CCU
+   bool "Support for the Allwinner A100 PRCM CCU"
+   default ARM64 && ARCH_SUNXI
+   depends on (ARM64 && ARCH_SUNXI) || COMPILE_TEST
+
 config SUN50I_H6_CCU
bool "Support for the Allwinner H6 CCU"
default ARM64 && ARCH_SUNXI
diff --git a/drivers/clk/sunxi-ng/Makefile b/drivers/clk/sunxi-ng/Makefile
index 4c7bee883f2f..3eb5cff40eac 100644
--- a/drivers/clk/sunxi-ng/Makefile
+++ b/drivers/clk/sunxi-ng/Makefile
@@ -23,6 +23,8 @@ obj-y += ccu_mp.o
 # SoC support
 obj-$(CONFIG_SUNIV_F1C100S_CCU)+= ccu-suniv-f1c100s.o
 obj-$(CONFIG_SUN50I_A64_CCU)   += ccu-sun50i-a64.o
+obj-$(CONFIG_SUN50I_A100_CCU)  += ccu-sun50i-a100.o
+obj-$(CONFIG_SUN50I_A100_R_CCU)+= ccu-sun50i-a100-r.o
 obj-$(CONFIG_SUN50I_H6_CCU)+= ccu-sun50i-h6.o
 obj-$(CONFIG_SUN50I_H6_R_CCU)  += ccu-sun50i-h6-r.o
 obj-$(CONFIG_SUN4I_A10_CCU)+= ccu-sun4i-a10.o
diff --git a/drivers/clk/sunxi-ng/ccu-sun50i-a100-r.c 
b/drivers/clk/sunxi-ng/ccu-sun50i-a100-r.c
new file mode 100644
index ..31875269ef90
--- /dev/null
+++ b/drivers/clk/sunxi-ng/ccu-sun50i-a100-r.c
@@ -0,0 +1,206 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2020 Frank Lee 
+ */
+
+#include 
+#include 
+#include 
+#include 
+
+#include "ccu_common.h"
+#include "ccu_reset.h"
+
+#include "ccu_div.h"
+#include "ccu_gate.h"
+#include "ccu_mp.h"
+#include "ccu_nm.h"
+
+#include "ccu-sun50i-a100-r.h"
+
+static const char * const cpus_r_apb2_parents[] = { "dcxo24M", "osc32k",
+"iosc", "pll-periph0" };
+static const struct ccu_mux_var_prediv cpus_r_apb2_predivs[] = {
+   { .index = 3, .shift = 0, .width = 5 },
+};
+
+static struct ccu_div cpus_clk = {
+   .div= _SUNXI_CCU_DIV_FLAGS(8, 2, CLK_DIVIDER_POWER_OF_TWO),
+
+   .mux= {
+   .shift  = 24,
+   .width  = 2,
+
+   .var_predivs= cpus_r_apb2_predivs,
+   .n_var_predivs  = ARRAY_SIZE(cpus_r_apb2_predivs),
+   },
+
+   .common = {
+   .reg= 0x000,
+   .features   = CCU_FEATURE_VARIABLE_PREDIV,
+   .hw.init= CLK_HW_INIT_PARENTS("cpus",
+ cpus_r_apb2_parents,
+ &ccu_div_ops,
+ 0),
+   },
+};
+
+static CLK_FIXED_FACTOR_HW(r_ahb_clk, "r-ahb", &cpus_clk.common.hw, 1, 1, 0);
+
+static struct ccu_div r_apb1_clk = {
+   .div= _SUNXI_CCU_DIV(0, 2),
+
+   .common = {
+   .reg= 0x00c,
+   .hw.init= CLK_HW_INIT("r-apb1",
+ "r-ahb",
+ &ccu_div_ops,
+ 0),
+   },
+};
+
+static struct ccu_div r_apb2_clk = {
+   .div= _SUNXI_CCU_DIV_FLAGS(8, 2, CLK_DIVIDER_POWER_OF_TWO),
+
+   .mux= {
+   .shift  = 24,
+   .width  = 2,
+
+   .var_predivs= cpus_r_apb2_predivs,
+   .n_var_predivs  = ARRAY_SIZE(cpus_r_apb2_predivs),
+   },
+
+   

[PATCH 3/4] arm64: allwinner: A100: add the basical Allwinner A100 DTSI file

2020-05-21 Thread Frank Lee
Allwinner A100 is a new SoC with Cortex-A53 cores. This commit adds
the basic DTSI file for it, including clock, pin and UART support.

Signed-off-by: Frank Lee 
---
 .../arm64/boot/dts/allwinner/sun50i-a100.dtsi | 173 ++
 1 file changed, 173 insertions(+)
 create mode 100644 arch/arm64/boot/dts/allwinner/sun50i-a100.dtsi

diff --git a/arch/arm64/boot/dts/allwinner/sun50i-a100.dtsi 
b/arch/arm64/boot/dts/allwinner/sun50i-a100.dtsi
new file mode 100644
index ..bd9bf9b2f832
--- /dev/null
+++ b/arch/arm64/boot/dts/allwinner/sun50i-a100.dtsi
@@ -0,0 +1,173 @@
+// SPDX-License-Identifier: (GPL-2.0+ or MIT)
+/*
+ * Copyright (c) 2020 Frank Lee 
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+/ {
+   interrupt-parent = <&gic>;
+   #address-cells = <2>;
+   #size-cells = <2>;
+
+   cpus {
+   #address-cells = <1>;
+   #size-cells = <0>;
+
+   cpu0: cpu@0 {
+   compatible = "arm,armv8";
+   device_type = "cpu";
+   reg = <0x0>;
+   enable-method = "psci";
+   };
+
+   cpu@1 {
+   compatible = "arm,armv8";
+   device_type = "cpu";
+   reg = <0x1>;
+   enable-method = "psci";
+   };
+
+   cpu@2 {
+   compatible = "arm,armv8";
+   device_type = "cpu";
+   reg = <0x2>;
+   enable-method = "psci";
+   };
+
+   cpu@3 {
+   compatible = "arm,armv8";
+   device_type = "cpu";
+   reg = <0x3>;
+   enable-method = "psci";
+   };
+   };
+
+   psci {
+   compatible = "arm,psci-1.0";
+   method = "smc";
+   };
+
+   iosc: internal-osc-clk {
+   compatible = "fixed-clock";
+   clock-frequency = <1600>;
+   clock-accuracy = <3>;
+   clock-output-names = "iosc";
+   #clock-cells = <0>;
+   };
+
+   dcxo24M: dcxo24M_clk {
+   compatible = "fixed-clock";
+   clock-frequency = <2400>;
+   clock-output-names = "dcxo24M";
+   #clock-cells = <0>;
+   };
+
+   osc32k: osc32k_clk {
+   compatible = "fixed-clock";
+   clock-frequency = <32768>;
+   clock-output-names = "osc32k";
+   #clock-cells = <0>;
+   };
+
+   timer {
+   compatible = "arm,armv8-timer";
+   interrupts = ,
+,
+,
+;
+   };
+
+   soc: soc {
+   compatible = "simple-bus";
+   #address-cells = <2>;
+   #size-cells = <2>;
+   ranges;
+
+   ccu: clock@3001000 {
+   compatible = "allwinner,sun50i-a100-ccu";
+   reg = <0x0 0x03001000 0x0 0x1000>;
+   clocks = <&dcxo24M>, <&osc32k>, <&iosc>;
+   clock-names = "hosc", "losc", "iosc";
+   #clock-cells = <1>;
+   #reset-cells = <1>;
+   };
+
+   gic: interrupt-controller@3021000 {
+   compatible = "arm,gic-400";
+   reg = <0x0 0x03021000 0x0 0x1000>,
+ <0x0 0x03022000 0x0 0x2000>,
+ <0x0 0x03024000 0x0 0x2000>,
+ <0x0 0x03026000 0x0 0x2000>;
+   interrupts = ;
+   interrupt-controller;
+   #interrupt-cells = <3>;
+   };
+
+   pio: pinctrl@300b000 {
+   compatible = "allwinner,sun50i-a100-pinctrl";
+   reg = <0x0 0x0300b000 0x0 0x400>;
+   interrupts = ,
+,
+,
+,
+,
+,
+,
+,
+;
+   clocks = <&ccu CLK_APB1>, <&dcxo24M>, <&osc32k>;
+   clock-names = "apb", "hosc", "losc";
+   gpio-controller;
+   #gpio-cells = <3>;
+   interrupt-controller;
+   #interrupt-cells = <3>;
+
+   uart0_pb_pins: uart0-pb-pins {
+   pins = "PB9", "PB10";
+   function = "uart0";
+   };
+   };
+
+   uart0: serial@500 {
+   compatible = "snps,dw-apb-uart";

[PATCH 4/4] arm64: allwinner: A100: add support for Allwinner Perf1 board

2020-05-21 Thread Frank Lee
A100 perf1 is an Allwinner A100-based SBC, with the following features:

- 1GiB DDR3 DRAM
- AXP803 PMIC
- 2 USB 2.0 ports
- MicroSD slot and on-board eMMC module
- on-board Nand flash
- ···

Adds initial support for it, including the UART.

Signed-off-by: Frank Lee 
---
 arch/arm64/boot/dts/allwinner/Makefile|  1 +
 .../allwinner/sun50i-a100-allwinner-perf1.dts | 27 +++
 2 files changed, 28 insertions(+)
 create mode 100644 
arch/arm64/boot/dts/allwinner/sun50i-a100-allwinner-perf1.dts

diff --git a/arch/arm64/boot/dts/allwinner/Makefile 
b/arch/arm64/boot/dts/allwinner/Makefile
index e4d3cd0ac5bb..ab780dbdd17b 100644
--- a/arch/arm64/boot/dts/allwinner/Makefile
+++ b/arch/arm64/boot/dts/allwinner/Makefile
@@ -14,6 +14,7 @@ dtb-$(CONFIG_ARCH_SUNXI) += sun50i-a64-pinephone-1.1.dtb
 dtb-$(CONFIG_ARCH_SUNXI) += sun50i-a64-pinetab.dtb
 dtb-$(CONFIG_ARCH_SUNXI) += sun50i-a64-sopine-baseboard.dtb
 dtb-$(CONFIG_ARCH_SUNXI) += sun50i-a64-teres-i.dtb
+dtb-$(CONFIG_ARCH_SUNXI) += sun50i-a100-allwinner-perf1.dtb
 dtb-$(CONFIG_ARCH_SUNXI) += sun50i-h5-bananapi-m2-plus.dtb
 dtb-$(CONFIG_ARCH_SUNXI) += sun50i-h5-bananapi-m2-plus-v1.2.dtb
 dtb-$(CONFIG_ARCH_SUNXI) += sun50i-h5-emlid-neutis-n5-devboard.dtb
diff --git a/arch/arm64/boot/dts/allwinner/sun50i-a100-allwinner-perf1.dts b/arch/arm64/boot/dts/allwinner/sun50i-a100-allwinner-perf1.dts
new file mode 100644
index ..32c9986920ed
--- /dev/null
+++ b/arch/arm64/boot/dts/allwinner/sun50i-a100-allwinner-perf1.dts
@@ -0,0 +1,27 @@
+// SPDX-License-Identifier: (GPL-2.0+ or MIT)
+/*
+ * Copyright (c) 2020 Frank Lee 
+ */
+
+/dts-v1/;
+
+#include "sun50i-a100.dtsi"
+
+/{
+   model = "A100 perf1";
+   compatible = "allwinner,a100-perf1", "allwinner,sun50i-a100";
+
+   aliases {
+   serial0 = &uart0;
+   };
+
+   chosen {
+   stdout-path = "serial0:115200n8";
+   };
+};
+
+&uart0 {
+   pinctrl-names = "default";
+   pinctrl-0 = <&uart0_pb_pins>;
+   status = "okay";
+};
-- 
2.24.0



[PATCH 0/4] Allwinner A100 Initial support

2020-05-21 Thread Frank Lee
This patch set adds initial support for the Allwinner A100 SoC,
which is a 64-bit tablet chip.

Frank Lee (4):
  clk: sunxi-ng: add support for the Allwinner A100 CCU
  pinctrl: sunxi: add support for the Allwinner A100 pin controller
  arm64: allwinner: A100: add the basical Allwinner A100 DTSI file
  arm64: allwinner: A100: add support for Allwinner Perf1 board

 arch/arm64/boot/dts/allwinner/Makefile|1 +
 .../allwinner/sun50i-a100-allwinner-perf1.dts |   27 +
 .../arm64/boot/dts/allwinner/sun50i-a100.dtsi |  173 +++
 drivers/clk/sunxi-ng/Kconfig  |   10 +
 drivers/clk/sunxi-ng/Makefile |2 +
 drivers/clk/sunxi-ng/ccu-sun50i-a100-r.c  |  206 +++
 drivers/clk/sunxi-ng/ccu-sun50i-a100-r.h  |   14 +
 drivers/clk/sunxi-ng/ccu-sun50i-a100.c| 1255 +
 drivers/clk/sunxi-ng/ccu-sun50i-a100.h|   14 +
 drivers/pinctrl/sunxi/Kconfig |   10 +
 drivers/pinctrl/sunxi/Makefile|2 +
 drivers/pinctrl/sunxi/pinctrl-sun50i-a100-r.c |  105 ++
 drivers/pinctrl/sunxi/pinctrl-sun50i-a100.c   |  710 ++
 include/dt-bindings/clock/sun50i-a100-ccu.h   |  141 ++
 include/dt-bindings/clock/sun50i-a100-r-ccu.h |   25 +
 include/dt-bindings/reset/sun50i-a100-ccu.h   |   68 +
 include/dt-bindings/reset/sun50i-a100-r-ccu.h |   18 +
 17 files changed, 2781 insertions(+)
 create mode 100644 arch/arm64/boot/dts/allwinner/sun50i-a100-allwinner-perf1.dts
 create mode 100644 arch/arm64/boot/dts/allwinner/sun50i-a100.dtsi
 create mode 100644 drivers/clk/sunxi-ng/ccu-sun50i-a100-r.c
 create mode 100644 drivers/clk/sunxi-ng/ccu-sun50i-a100-r.h
 create mode 100644 drivers/clk/sunxi-ng/ccu-sun50i-a100.c
 create mode 100644 drivers/clk/sunxi-ng/ccu-sun50i-a100.h
 create mode 100644 drivers/pinctrl/sunxi/pinctrl-sun50i-a100-r.c
 create mode 100644 drivers/pinctrl/sunxi/pinctrl-sun50i-a100.c
 create mode 100644 include/dt-bindings/clock/sun50i-a100-ccu.h
 create mode 100644 include/dt-bindings/clock/sun50i-a100-r-ccu.h
 create mode 100644 include/dt-bindings/reset/sun50i-a100-ccu.h
 create mode 100644 include/dt-bindings/reset/sun50i-a100-r-ccu.h

-- 
2.24.0



linux-next: manual merge of the jc_docs tree with the ext4 tree

2020-05-21 Thread Stephen Rothwell
Hi all,

Today's linux-next merge of the jc_docs tree got a conflict in:

  Documentation/filesystems/fiemap.rst

between commit:

  469581d9e5c9 ("fs: move fiemap range validation into the file systems instances")

from the ext4 tree and commit:

  e6f7df74ec1a ("docs: filesystems: convert fiemap.txt to ReST")

from the jc_docs tree.

I fixed it up (see below) and can carry the fix as necessary. This
is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging.  You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

-- 
Cheers,
Stephen Rothwell

diff --cc Documentation/filesystems/fiemap.rst
index 35c8571eccb6,2a572e7edc08..
--- a/Documentation/filesystems/fiemap.rst
+++ b/Documentation/filesystems/fiemap.rst
@@@ -203,10 -206,9 +206,10 @@@ EINTR once fatal signal received
  
  
  Flag checking should be done at the beginning of the ->fiemap callback via the
- fiemap_prep() helper:
 -fiemap_check_flags() helper::
++fiemap_prep() helper::
  
- int fiemap_prep(struct inode *inode, struct fiemap_extent_info *fieinfo,
-   u64 start, u64 *len, u32 supported_flags);
 -  int fiemap_check_flags(struct fiemap_extent_info *fieinfo, u32 fs_flags);
++  int fiemap_prep(struct inode *inode, struct fiemap_extent_info *fieinfo,
++u64 start, u64 *len, u32 supported_flags);
  
  The struct fieinfo should be passed in as received from ioctl_fiemap(). The
  set of fiemap flags which the fs understands should be passed via fs_flags. If




Re: [PATCH v4 00/36] Large pages in the page cache

2020-05-21 Thread Matthew Wilcox
On Fri, May 22, 2020 at 12:57:51PM +1000, Dave Chinner wrote:
> On Thu, May 21, 2020 at 05:04:11PM -0700, Matthew Wilcox wrote:
> > On Fri, May 22, 2020 at 08:49:06AM +1000, Dave Chinner wrote:
> > > Ok, so the main issue I have with the filesystem/iomap side of
> > > things is that it appears to be adding "transparent huge page"
> > > awareness to the filesystem code, not "large page support".
> > > 
> > > For people that aren't aware of the difference between the
> > > transparent huge page and a normal compound page (e.g. I have no idea
> > > what the difference is), this is likely to cause problems,
> > > especially as you haven't explained at all in this description why
> > > transparent huge pages are being used rather than bog standard
> > > compound pages.
> > 
> > The primary reason to use a different name from compound_*
> > is so that it can be compiled out for systems that don't enable
> > CONFIG_TRANSPARENT_HUGEPAGE.  So THPs are compound pages, as they always
> > have been, but for a filesystem, using thp_size() will compile to either
> > page_size() or PAGE_SIZE depending on CONFIG_TRANSPARENT_HUGEPAGE.
> 
> Again, why is this dependent on THP? We can allocate compound pages
> without using THP, so why only allow the page cache to use larger
> pages when THP is configured?

We have too many CONFIG options.  My brain can't cope with adding
CONFIG_LARGE_PAGES because then we might have neither THP nor LP, LP and
not THP, THP and not LP or both THP and LP.  And of course HUGETLBFS,
which has its own special set of issues that one has to think about when
dealing with the page cache.

So, either large pages becomes part of the base kernel and you
always get them, or there's a CONFIG option to enable them and it's
CONFIG_TRANSPARENT_HUGEPAGE.  I chose the latter.

I suppose what I'm saying is that a transparent hugepage can now be any
size [1], not just PMD size.

[1] power of two that isn't 1 because we use the third page for
something-or-other.
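
The compile-time selection described above can be sketched in plain
userspace C. This is only an illustration under stated assumptions, not
the kernel's implementation: SKETCH_CONFIG_THP, struct sketch_page,
sketch_page_size() and sketch_thp_size() are hypothetical stand-ins, and a
4096-byte base page is assumed.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 4 KiB base pages; the real value depends on the architecture. */
#define SKETCH_PAGE_SIZE 4096UL

/* Hypothetical stand-in for a (possibly compound) page; order 0 is a base page. */
struct sketch_page {
	unsigned int order;
};

/* Size of a compound page of the given order: base page size << order. */
static inline unsigned long sketch_page_size(const struct sketch_page *page)
{
	return SKETCH_PAGE_SIZE << page->order;
}

/*
 * Hypothetical analogue of thp_size(). With SKETCH_CONFIG_THP defined it
 * returns the per-page size; without it, it folds to a constant so callers
 * carry no large-page code at all.
 */
#ifdef SKETCH_CONFIG_THP
static inline unsigned long sketch_thp_size(const struct sketch_page *page)
{
	return sketch_page_size(page);
}
#else
static inline unsigned long sketch_thp_size(const struct sketch_page *page)
{
	(void)page;	/* unused when large pages are compiled out */
	return SKETCH_PAGE_SIZE;
}
#endif
```

Built without -DSKETCH_CONFIG_THP, sketch_thp_size() always yields the base
page size, which mirrors the "compile to either page_size() or PAGE_SIZE"
behaviour described in the message.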


Nice to Meet you

2020-05-21 Thread info
How are you and your family? my name is Prashant Wong Lin,
i am a native of Hong Kong but resides and a citizen of United Kingdom.

I work with an Oil and Gas Company here in London, tell me about you? Your work 
etc




Re: [PATCH v4] /dev/mem: Revoke mappings when a driver claims the region

2020-05-21 Thread Kees Cook
On Thu, May 21, 2020 at 02:06:17PM -0700, Dan Williams wrote:
> The typical usage of unmap_mapping_range() is part of
> truncate_pagecache() to punch a hole in a file, but in this case the
> implementation is only doing the "first half" of a hole punch. Namely it
> is just evacuating current established mappings of the "hole", and it
> relies on the fact that /dev/mem establishes mappings in terms of
> absolute physical address offsets. Once existing mmap users are
> invalidated they can attempt to re-establish the mapping, or attempt to
> continue issuing read(2) / write(2) to the invalidated extent, but they
> will then be subject to the CONFIG_IO_STRICT_DEVMEM checking that can
> block those subsequent accesses.

Nice!

Reviewed-by: Kees Cook 

And a thread hijack...   ;)

I think this is very close to providing a way to solve another issue
I've had with /dev/mem, which is to zero the view of the first 1MB of
/dev/mem via mmap. I only fixed the read/write accesses:
a4866aa81251 ("mm: Tighten x86 /dev/mem with zeroing reads")
I.e. the low 1MB range should be considered allowed, but any reads will see
zeros.

> + unmap_mapping_range(inode->i_mapping, res->start, resource_size(res), 1);

Is unmap_mapping_range() sufficient for this? Would it need to happen
once during open_port() or something more special during mmap_mem()?

-- 
Kees Cook


[PATCH] [v2] usb: musb: Fix runtime PM imbalance on error

2020-05-21 Thread Dinghao Liu
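When copy_from_user() fails, musb_test_mode_write() returns -EFAULT
directly after pm_runtime_get_sync() has already been called, which
leaves the runtime PM usage counter unbalanced.

Fix this by moving copy_from_user() to the beginning of the function,
before the usage counter is incremented.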

Signed-off-by: Dinghao Liu 
---
 drivers/usb/musb/musb_debugfs.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/usb/musb/musb_debugfs.c b/drivers/usb/musb/musb_debugfs.c
index 7b6281ab62ed..30a89aa8a3e7 100644
--- a/drivers/usb/musb/musb_debugfs.c
+++ b/drivers/usb/musb/musb_debugfs.c
@@ -168,6 +168,11 @@ static ssize_t musb_test_mode_write(struct file *file,
u8  test;
char buf[24];
 
+   memset(buf, 0x00, sizeof(buf));
+
+   if (copy_from_user(buf, ubuf, min_t(size_t, sizeof(buf) - 1, count)))
+   return -EFAULT;
+
pm_runtime_get_sync(musb->controller);
test = musb_readb(musb->mregs, MUSB_TESTMODE);
if (test) {
@@ -176,11 +181,6 @@ static ssize_t musb_test_mode_write(struct file *file,
goto ret;
}
 
-   memset(buf, 0x00, sizeof(buf));
-
-   if (copy_from_user(buf, ubuf, min_t(size_t, sizeof(buf) - 1, count)))
-   return -EFAULT;
-
if (strstarts(buf, "force host full-speed"))
test = MUSB_TEST_FORCE_HOST | MUSB_TEST_FORCE_FS;
 
-- 
2.17.1
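
The effect of the reordering can be sketched in userspace C. This is a
minimal model under stated assumptions, not the driver code: the counter
and every sketch_* helper are hypothetical stand-ins for the runtime PM
usage counter, pm_runtime_get_sync(), its matching put, and
copy_from_user().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the device's runtime PM usage counter. */
static int sketch_usage_count;

static void sketch_pm_get(void) { sketch_usage_count++; }
static void sketch_pm_put(void) { sketch_usage_count--; }

/* Hypothetical stand-in for copy_from_user(): non-zero means failure. */
static int sketch_copy(char *dst, const char *src, size_t len)
{
	if (!src)
		return 1;
	memcpy(dst, src, len);
	return 0;
}

/* Old ordering: the early return skips the put and leaks one reference. */
static int sketch_write_old(const char *ubuf, size_t count)
{
	char buf[24];
	size_t len = count < sizeof(buf) - 1 ? count : sizeof(buf) - 1;

	sketch_pm_get();
	memset(buf, 0, sizeof(buf));
	if (sketch_copy(buf, ubuf, len))
		return -EFAULT;
	sketch_pm_put();
	return (int)count;
}

/* New ordering: copy first, so a failure returns before any get. */
static int sketch_write_new(const char *ubuf, size_t count)
{
	char buf[24];
	size_t len = count < sizeof(buf) - 1 ? count : sizeof(buf) - 1;

	memset(buf, 0, sizeof(buf));
	if (sketch_copy(buf, ubuf, len))
		return -EFAULT;
	sketch_pm_get();
	/* ... the hardware access would happen here ... */
	sketch_pm_put();
	return (int)count;
}
```

With a failing copy (a NULL source in this model), sketch_write_old()
leaves the counter at 1 while sketch_write_new() leaves it at 0.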



Re: [PATCH v4 00/36] Large pages in the page cache

2020-05-21 Thread Dave Chinner
On Thu, May 21, 2020 at 05:04:11PM -0700, Matthew Wilcox wrote:
> On Fri, May 22, 2020 at 08:49:06AM +1000, Dave Chinner wrote:
> > Ok, so the main issue I have with the filesystem/iomap side of
> > things is that it appears to be adding "transparent huge page"
> > awareness to the filesystem code, not "large page support".
> > 
> > For people that aren't aware of the difference between the
> > transparent huge page and a normal compound page (e.g. I have no idea
> > what the difference is), this is likely to cause problems,
> > especially as you haven't explained at all in this description why
> > transparent huge pages are being used rather than bog standard
> > compound pages.
> 
> The primary reason to use a different name from compound_*
> is so that it can be compiled out for systems that don't enable
> CONFIG_TRANSPARENT_HUGEPAGE.  So THPs are compound pages, as they always
> have been, but for a filesystem, using thp_size() will compile to either
> page_size() or PAGE_SIZE depending on CONFIG_TRANSPARENT_HUGEPAGE.

Again, why is this dependent on THP? We can allocate compound pages
without using THP, so why only allow the page cache to use larger
pages when THP is configured?

i.e. I don't know why this is dependent on THP because you haven't
explained why this only works for THP and not just plain old
compound pages.

> Now, maybe thp_size() is the wrong name, but then you need to suggest
> a better name ;-)

First you need to explain why THP is a requirement for large pages in
the page cache when most of the code changes I see only care if the
page is a compound page or not.

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com


Re: [PATCH] mm/gup: fixup gup.c for "mm/gup: refactor and de-duplicate gup_fast() code"

2020-05-21 Thread John Hubbard

On 2020-05-21 19:46, Chris Wilson wrote:

Quoting John Hubbard (2020-05-22 00:38:41)

Include FOLL_FAST_ONLY in the list of flags to *not* WARN()
on, in internal_get_user_pages_fast().

Cc: Chris Wilson 
Cc: Daniel Vetter 
Cc: David Airlie 
Cc: Jani Nikula 
Cc: "Joonas Lahtinen" 
Cc: Matthew Auld 
Cc: Matthew Wilcox 
Cc: Rodrigo Vivi 
Cc: Souptick Joarder 
Cc: Tvrtko Ursulin 
Signed-off-by: John Hubbard 
---

Hi Andrew, Chris,

Andrew: This is a fixup that applies to today's (20200521) linux-next.
In that tree, this fixes up:

commit dfb8dfe80808 ("mm/gup: refactor and de-duplicate gup_fast() code")

Chris: I'd like to request another CI run for the drm/i915 changes, so
for that, would you prefer that I post a v2 of the series [1], or
is it easier for you to just apply this patch here, on top of [2]?


If you post your series again with this patch included to intel-gfx, CI
will pick it up. Or I'll do that in the morning.
-Chris



OK, perfect. I'll post a version for linux.git in a moment here.


thanks,
--
John Hubbard
NVIDIA


Re: [PATCH v1 1/1] PCI/ERR: Handle fatal error recovery for non-hotplug capable devices

2020-05-21 Thread Yicong Yang



On 2020/5/22 3:31, Kuppuswamy, Sathyanarayanan wrote:
>
>
> On 5/21/20 3:58 AM, Yicong Yang wrote:
>> On 2020/5/21 1:04, Kuppuswamy, Sathyanarayanan wrote:
>>>
>>>
>>> On 5/20/20 1:28 AM, Yicong Yang wrote:
>>>> On 2020/5/7 11:32, sathyanarayanan.kuppusw...@linux.intel.com wrote:
> From: Kuppuswamy Sathyanarayanan 
> 
>
> If there are non-hotplug capable devices connected to a given
> port, then during the fatal error recovery(triggered by DPC or
> AER), after calling reset_link() function, we cannot rely on
> hotplug handler to detach and re-enumerate the device drivers
> in the affected bus. Instead, we will have to let the error
> recovery handler call report_slot_reset() for all devices in
> the bus to notify about the reset operation. Although this is
> only required for non hot-plug capable devices, doing it for
> hotplug capable devices should not affect the functionality.
>
> Along with above issue, this fix also applicable to following
> issue.
>
> Commit 6d2c89441571 ("PCI/ERR: Update error status after
> reset_link()") added support to store status of reset_link()
> call. Although this fixed the error recovery issue observed if
> the initial value of error status is PCI_ERS_RESULT_DISCONNECT
> or PCI_ERS_RESULT_NO_AER_DRIVER, it also discarded the status
> result from report_frozen_detected. This can cause a failure to
> recover if _NEED_RESET is returned by report_frozen_detected and
> report_slot_reset is not invoked.
>
> Such an event can be induced for testing purposes by reducing the
> Max_Payload_Size of a PCIe bridge to less than that of a device
> downstream from the bridge, and then initiating I/O through the
> device, resulting in oversize transactions.  In the presence of DPC,
> this results in a containment event and attempted reset and recovery
> via pcie_do_recovery.  After 6d2c89441571 report_slot_reset is not
> invoked, and the device does not recover.
>
> [original patch is from jay.vosbu...@canonical.com]
> [original patch link https://lore.kernel.org/linux-pci/18609.1588812972@famine/]
> Fixes: 6d2c89441571 ("PCI/ERR: Update error status after reset_link()")
> Signed-off-by: Jay Vosburgh 
> Signed-off-by: Kuppuswamy Sathyanarayanan 
> 
> ---
>drivers/pci/pcie/err.c | 19 +++
>1 file changed, 15 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
> index 14bb8f54723e..db80e1ecb2dc 100644
> --- a/drivers/pci/pcie/err.c
> +++ b/drivers/pci/pcie/err.c
> @@ -165,13 +165,24 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
>pci_dbg(dev, "broadcast error_detected message\n");
>if (state == pci_channel_io_frozen) {
>pci_walk_bus(bus, report_frozen_detected, &status);
> -status = reset_link(dev);
> -if (status != PCI_ERS_RESULT_RECOVERED) {
> +status = PCI_ERS_RESULT_NEED_RESET;
> +} else {
> +pci_walk_bus(bus, report_normal_detected, &status);
> +}
> +
> +if (status == PCI_ERS_RESULT_NEED_RESET) {
> +if (reset_link) {
> +if (reset_link(dev) != PCI_ERS_RESULT_RECOVERED)

>>>> we'll call reset_link() only if link is frozen. so it may have problem here.
>>> you mean before this change right?
>>> After this change, reset_link() will be called as long as status is
>>> PCI_ERS_RESULT_NEED_RESET.
>>
>> Yes. I think we should reset the link only if the io is blocked as before.
>> There's no reason to reset a normal link.
> Currently, only AER and DPC driver uses pcie_do_recovery() call. So the
> possible reset_link options are dpc_reset_link() and aer_root_reset().
>
> In dpc_reset_link() case, the link is already disabled and hence we
> don't need to do another reset. In case of aer_root_reset() it
> uses pci_bus_error_reset() to reset the slot.

Not exactly. In pci_bus_error_reset(), we call pci_slot_reset() only if the
slot is hotpluggable, but we always call pci_bus_reset() to perform a
secondary bus reset on the bridge. That is what I think is unnecessary for a
normal link, and that is what reset_link() would have us do. The slot reset
was introduced into the process only to solve side effects (c4eed62a2143,
"PCI/ERR: Use slot reset if available").

PCI_ERS_RESULT_NEED_RESET indicates that the driver wants a
platform-dependent slot reset and then wants its ->slot_reset() method to be
called. I don't think that is the same as the slot reset mentioned above,
which is only for hotpluggable slots.

Previously, if the link was normal and the driver reported
PCI_ERS_RESULT_NEED_RESET, we only called ->slot_reset(), without the slot
reset done in reset_link(). Maybe it's better to behave as before.

Thanks.
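
To make the disagreement concrete, here is a rough userspace sketch of
when the link gets reset under each approach. It is only a model of the
logic discussed in this thread, not the code in drivers/pci/pcie/err.c:
the enums and both sketch_* functions are hypothetical, and the "patched"
variant is inferred from the quoted hunk alone.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the channel state and the drivers' vote. */
enum sketch_state { SKETCH_IO_NORMAL, SKETCH_IO_FROZEN };
enum sketch_result { SKETCH_CAN_RECOVER, SKETCH_NEED_RESET };

/*
 * Quoted patch: a frozen channel forces the status to "need reset", and
 * the link is reset whenever the status is "need reset".
 */
static bool sketch_patched_resets_link(enum sketch_state state,
				       enum sketch_result vote)
{
	enum sketch_result status =
		(state == SKETCH_IO_FROZEN) ? SKETCH_NEED_RESET : vote;

	return status == SKETCH_NEED_RESET;
}

/*
 * Suggestion in this reply: reset the link only when the channel is
 * frozen, whatever the drivers voted.
 */
static bool sketch_suggested_resets_link(enum sketch_state state,
					 enum sketch_result vote)
{
	(void)vote;	/* the vote does not influence the link reset here */
	return state == SKETCH_IO_FROZEN;
}
```

The two variants differ only for a normal (non-frozen) link whose driver
asks for a reset: the patched flow resets the link, the suggested flow
does not.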


>>
>> Furthermore, PCI_ERS_RESULT_NEED_RESET means device driver requires a slot 
>> reset rather
>> 

[PATCH net-next 5/5] net: hns3: add a print for initializing CMDQ when reset pending

2020-05-21 Thread Huazhong Tan
When initializing CMDQ fails because a reset is pending,
there is no hint for debugging, so add a log for it.

Signed-off-by: Huazhong Tan 
---
 drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.c
index 7f509ef..64a1d0bd 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.c
@@ -426,6 +426,9 @@ int hclge_cmd_init(struct hclge_dev *hdev)
 * reset may happen when lower level reset is being processed.
 */
if ((hclge_is_reset_pending(hdev))) {
+   dev_err(&hdev->pdev->dev,
+   "failed to init cmd since reset %#lx pending\n",
+   hdev->reset_pending);
ret = -EBUSY;
goto err_cmd_init;
}
-- 
2.7.4


