Re: [PATCH 5.2 000/135] 5.2.10-stable review

2019-08-23 Thread Sasha Levin

On Fri, Aug 23, 2019 at 07:32:58PM -0700, Greg KH wrote:

On Fri, Aug 23, 2019 at 09:18:05PM -0400, Sasha Levin wrote:

On Fri, Aug 23, 2019 at 10:36:27AM -0700, Greg KH wrote:
> On Fri, Aug 23, 2019 at 02:28:53AM -0400, Sasha Levin wrote:
> > On Fri, Aug 23, 2019 at 02:42:48AM +0200, Stefan Lippers-Hollmann wrote:
> > > Hi
> > >
> > > On 2019-08-22, Greg KH wrote:
> > > > On Fri, Aug 23, 2019 at 12:05:27AM +0200, Stefan Lippers-Hollmann wrote:
> > > > > On 2019-08-22, Greg KH wrote:
> > > > > > On Thu, Aug 22, 2019 at 01:05:56PM -0400, Sasha Levin wrote:
> > > [...]
> > > > > It might be down to kernel.org mirroring, but the patch file doesn't
> > > > > seem to be available yet (404), both in the wrong location listed
> > > > > above - and the expected one under
> > > > >
> > > > >
https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.2.10-rc1.gz
> > > [...]
> > > > Ah, no, it's not a mirroring problem, Sasha and I didn't know if anyone
> > > > was actually using the patch files anymore, so it was simpler to do a
> > > > release without them to see what happens. :)
> > > >
> > > > Do you rely on these, or can you use the -rc git tree or the quilt
> > > > series?  If you do rely on them, we will work to fix this, it just
> > > > involves some scripting that we didn't get done this morning.
> > >
> > > "Rely" is a strong word, I can adapt if they're going away, but
> > > I've been using them so far, as in (slightly simplified):
> > >
> > > $ cd patches/upstream/
> > > $ wget https://cdn.kernel.org/pub/linux/kernel/v5.x/patch-5.2.9.xz
> > > $ xz -d patch-5.2.9.xz
> > > $ wget 
https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.2.10-rc1.gz
> > > $ gunzip patch-5.2.10-rc1.gz
> > > $ vim ../series
> > > $ quilt ...
> > >
> > > I can switch to importing the quilt queue with some sed magic (and I
> > > already do that, if interesting or just a larger amounts of patches are
> > > queuing up for more than a day or two), but using the -rc patches has
> > > been convenient in that semi-manual workflow, also to make sure to really
> > > get and test the formal -rc patch, rather than something inbetween.
> >
> > An easy way to generate a patch is to just use the git.kernel.org web
> > interface. A patch for 5.2.10-rc1 would be:
> > 
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git/patch/?id=linux-5.2.y=v5.2.9
> >
> > Personally this patch upload story sounded to me like a pre-git era
> > artifact...
>
> Given that we no longer do patches for Linus's -rc releases for the past
> few years, maybe it is time to move to do the same for the stable
> releases to be consistent.

Or tarballs? Why do we generate tarballs (and go through kup)?
git.kernel.org already does it for us.


As I mentioned yesterday, but writing it down here for posterity,
there's a number of reasons.

First off, the release process doesn't require kup for when a "real"
release happens, that's all now donw on git.kernel.org with a process
involving a signed note and some other fun backend stuff.  We are
working on expanding that in the future with some other signature
validation as well, to make it easier to verify tarballs are "correct"
as what we do today is a bit different than other projects.


I think that I made it read like I want to remove tarballs altogether.
That's not the case: I just want to get rid of the magical signed note
process.

The way I understand it, we generate tarballs twice: once during the
magical signed note process, and once by the git interface. I'm just
suggesting we reduce that down to happen once.

Right now you can fetch tarballs from two different links on kernel.org.
For example, a 5.2.9 tarball is available at:

- https://mirrors.edge.kernel.org/pub/linux/kernel/v5.x/linux-5.2.9.tar.xz
- 
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/snapshot/linux-5.2.9.tar.gz

Can't we symlink one to the other?


As for the tarball itself, it's still needed for the same reasons we do
so on Linus's releases:
- distros use these.  Don't want all Gentoo users hammering on
  git.kernel.org for their updated builds, that's a huge waste.


We can just place git.kernel.org generated tarballs (for some repos) on
the CDN, no?


- mirroring works _so_ much better around the world for tarballs


Doing the above should solve this.


- legal reasons.  git is not yet "recognized" as being something
  that properly is reflective of a specific point in time while
  as online tarballs that are mirrored and stored around the
  world are finally almost properly recognized for this.


We still keep the tarballs.


- historical, people are used to using them, and workflows are
  built up around them.  People don't like rewriting scripts, as
  can be seen in my monstrosity of a mess that I use for
  releases :)


Right, this shouldn't require changing any scripts.

--
Thanks,
Sasha


RE: [PATCH v2 0/2] Simplify mtty driver and mdev core

2019-08-23 Thread Parav Pandit



> -Original Message-
> From: Alex Williamson 
> Sent: Saturday, August 24, 2019 10:29 AM
> To: Parav Pandit 
> Cc: Jiri Pirko ; Jiri Pirko ; David S .
> Miller ; Kirti Wankhede ;
> Cornelia Huck ; k...@vger.kernel.org; linux-
> ker...@vger.kernel.org; cjia ; net...@vger.kernel.org
> Subject: Re: [PATCH v2 0/2] Simplify mtty driver and mdev core
> 
> On Sat, 24 Aug 2019 03:56:08 +
> Parav Pandit  wrote:
> 
> > > -Original Message-
> > > From: Alex Williamson 
> > > Sent: Saturday, August 24, 2019 1:14 AM
> > > To: Parav Pandit 
> > > Cc: Jiri Pirko ; Jiri Pirko ;
> > > David S . Miller ; Kirti Wankhede
> > > ; Cornelia Huck ;
> > > k...@vger.kernel.org; linux- ker...@vger.kernel.org; cjia
> > > ; net...@vger.kernel.org
> > > Subject: Re: [PATCH v2 0/2] Simplify mtty driver and mdev core
> > >
> > > On Fri, 23 Aug 2019 18:00:30 +
> > > Parav Pandit  wrote:
> > >
> > > > > -Original Message-
> > > > > From: Alex Williamson 
> > > > > Sent: Friday, August 23, 2019 10:47 PM
> > > > > To: Parav Pandit 
> > > > > Cc: Jiri Pirko ; Jiri Pirko
> > > > > ; David S . Miller ;
> > > > > Kirti Wankhede ; Cornelia Huck
> > > > > ; k...@vger.kernel.org; linux-
> > > > > ker...@vger.kernel.org; cjia ;
> > > > > net...@vger.kernel.org
> > > > > Subject: Re: [PATCH v2 0/2] Simplify mtty driver and mdev core
> > > > >
> > > > > On Fri, 23 Aug 2019 16:14:04 + Parav Pandit
> > > > >  wrote:
> > > > >
> > > > > > > > Idea is to have mdev alias as optional.
> > > > > > > > Each mdev_parent says whether it wants mdev_core to
> > > > > > > > generate an alias or not. So only networking device drivers
> would set it to true.
> > > > > > > > For rest, alias won't be generated, and won't be compared
> > > > > > > > either during creation time. User continue to provide only uuid.
> > > > > > >
> > > > > > > Ok
> > > > > > >
> > > > > > > > I am tempted to have alias collision detection only within
> > > > > > > > children mdevs of the same parent, but doing so will
> > > > > > > > always mandate to prefix in netdev name. And currently we
> > > > > > > > are left with only 3 characters to prefix it, so that may not be
> good either.
> > > > > > > > Hence, I think mdev core wide alias is better with 12 
> > > > > > > > characters.
> > > > > > >
> > > > > > > I suppose it depends on the API, if the vendor driver can
> > > > > > > ask the mdev core for an alias as part of the device
> > > > > > > creation process, then it could manage the netdev namespace
> > > > > > > for all its devices, choosing how many characters to use,
> > > > > > > and fail the creation if it can't meet a uniqueness
> > > > > > > requirement.  IOW, mdev-core would always provide a full
> > > > > > > sha1 and therefore gets itself out of the uniqueness/collision
> aspects.
> > > > > > >
> > > > > > This doesn't work. At mdev core level 20 bytes sha1 are
> > > > > > unique, so mdev core allowed to create a mdev.
> > > > >
> > > > > The mdev vendor driver has the opportunity to fail the device
> > > > > creation in mdev_parent_ops.create().
> > > > >
> > > > That is not helpful for below reasons.
> > > > 1. vendor driver doesn't have visibility in other vendor's alias.
> > > > 2. Even for single vendor, it needs to maintain global list of
> > > > devices to see
> > > collision.
> > > > 3. multiple vendors needs to implement same scheme.
> > > >
> > > > Mdev core should be the owner. Shifting ownership from one layer
> > > > to a lower layer in vendor driver doesn't solve the problem (if
> > > > there is one, which I think doesn't exist).
> > > >
> > > > > > And then devlink core chooses
> > > > > > only 6 bytes (12 characters) and there is collision. Things
> > > > > > fall apart. Since mdev provides unique uuid based scheme, it's
> > > > > > the mdev core's ownership to provide unique aliases.
> > > > >
> > > > > You're suggesting/contemplating multiple solutions here, 3-char
> > > > > prefix + 12- char sha1 vs  + ?-char sha1.  Also,
> > > > > the 15-char total limit is imposed by an external subsystem,
> > > > > where the vendor driver is the gateway between that subsystem
> > > > > and mdev.  How would mdev integrate with another subsystem that
> > > > > maybe only has 9-chars available?  Would the vendor driver API
> > > > > specify "I need an alias" or would it specify "I need an X-char length
> alias"?
> > > > Yes, Vendor driver should say how long the alias it wants.
> > > > However before we implement that, I suggest let such
> > > > vendor/user/driver arrive which needs that. Such variable length
> > > > alias can be added at that time and even with that alias collision
> > > > can be detected by single mdev module.
> > >
> > > If we agree that different alias lengths are possible, then I would
> > > request that minimally an mdev sample driver be modified to request
> > > an alias with a length that can be adjusted without recompiling in order
> to exercise the collision path.
> > >
> > Yes. this can be done. But I 

Re: [RFC PATCH v2 00/19] RDMA/FS DAX truncate proposal V1,000,002 ;-)

2019-08-23 Thread Ira Weiny
On Sat, Aug 24, 2019 at 10:11:24AM +1000, Dave Chinner wrote:
> On Fri, Aug 23, 2019 at 09:04:29AM -0300, Jason Gunthorpe wrote:
> > On Fri, Aug 23, 2019 at 01:23:45PM +1000, Dave Chinner wrote:
> > 
> > > > But the fact that RDMA, and potentially others, can "pass the
> > > > pins" to other processes is something I spent a lot of time trying to 
> > > > work out.
> > > 
> > > There's nothing in file layout lease architecture that says you
> > > can't "pass the pins" to another process.  All the file layout lease
> > > requirements say is that if you are going to pass a resource for
> > > which the layout lease guarantees access for to another process,
> > > then the destination process already have a valid, active layout
> > > lease that covers the range of the pins being passed to it via the
> > > RDMA handle.
> > 
> > How would the kernel detect and enforce this? There are many ways to
> > pass a FD.
> 
> AFAIC, that's not really a kernel problem. It's more of an
> application design constraint than anything else. i.e. if the app
> passes the IB context to another process without a lease, then the
> original process is still responsible for recalling the lease and
> has to tell that other process to release the IB handle and it's
> resources.
> 
> > IMHO it is wrong to try and create a model where the file lease exists
> > independently from the kernel object relying on it. In other words the
> > IB MR object itself should hold a reference to the lease it relies
> > upon to function properly.
> 
> That still doesn't work. Leases are not individually trackable or
> reference counted objects objects - they are attached to a struct
> file bUt, in reality, they are far more restricted than a struct
> file.
> 
> That is, a lease specifically tracks the pid and the _open fd_ it
> was obtained for, so it is essentially owned by a specific process
> context.  Hence a lease is not able to be passed to a separate
> process context and have it still work correctly for lease break
> notifications.  i.e. the layout break signal gets delivered to
> original process that created the struct file, if it still exists
> and has the original fd still open. It does not get sent to the
> process that currently holds a reference to the IB context.
>

The fcntl man page says:

"Leases are associated with an open file description (see open(2)).  This means
that duplicate file descriptors (created by, for example, fork(2) or dup(2))
refer to the same lease, and this lease may be modified or released using any
of these descriptors.  Furthermore,  the lease is released by either an
explicit F_UNLCK operation on any of these duplicate file descriptors, or when
all such file descriptors have been closed."

>From this I took it that the child process FD would have the lease as well
_and_ could release it.  I _assumed_ that applied to SCM_RIGHTS but it does not
seem to work the same way as dup() so I'm not so sure.

Ira

> 
> So while a struct file passed to another process might still have
> an active lease, and you can change the owner of the struct file
> via fcntl(F_SETOWN), you can't associate the existing lease with a
> the new fd in the new process and so layout break signals can't be
> directed at the lease fd
> 
> This really means that a lease can only be owned by a single process
> context - it can't be shared across multiple processes (so I was
> wrong about dup/pass as being a possible way of passing them)
> because there's only one process that can "own" a struct file, and
> that where signals are sent when the lease needs to be broken.
> 
> So, fundamentally, if you want to pass a resource that pins a file
> layout between processes, both processes need to hold a layout lease
> on that file range. And that means exclusive leases and passing
> layouts between processes are fundamentally incompatible because you
> can't hold two exclusive leases on the same file range
> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> da...@fromorbit.com


Re: [PATCH v2 0/2] Simplify mtty driver and mdev core

2019-08-23 Thread Alex Williamson
On Sat, 24 Aug 2019 03:56:08 +
Parav Pandit  wrote:

> > -Original Message-
> > From: Alex Williamson 
> > Sent: Saturday, August 24, 2019 1:14 AM
> > To: Parav Pandit 
> > Cc: Jiri Pirko ; Jiri Pirko ; David S 
> > . Miller
> > ; Kirti Wankhede ; Cornelia
> > Huck ; k...@vger.kernel.org; linux-
> > ker...@vger.kernel.org; cjia ; net...@vger.kernel.org
> > Subject: Re: [PATCH v2 0/2] Simplify mtty driver and mdev core
> > 
> > On Fri, 23 Aug 2019 18:00:30 +
> > Parav Pandit  wrote:
> >   
> > > > -Original Message-
> > > > From: Alex Williamson 
> > > > Sent: Friday, August 23, 2019 10:47 PM
> > > > To: Parav Pandit 
> > > > Cc: Jiri Pirko ; Jiri Pirko ;
> > > > David S . Miller ; Kirti Wankhede
> > > > ; Cornelia Huck ;
> > > > k...@vger.kernel.org; linux- ker...@vger.kernel.org; cjia
> > > > ; net...@vger.kernel.org
> > > > Subject: Re: [PATCH v2 0/2] Simplify mtty driver and mdev core
> > > >
> > > > On Fri, 23 Aug 2019 16:14:04 +
> > > > Parav Pandit  wrote:
> > > >  
> > > > > > > Idea is to have mdev alias as optional.
> > > > > > > Each mdev_parent says whether it wants mdev_core to generate
> > > > > > > an alias or not. So only networking device drivers would set it 
> > > > > > > to true.
> > > > > > > For rest, alias won't be generated, and won't be compared
> > > > > > > either during creation time. User continue to provide only uuid.  
> > > > > >
> > > > > > Ok
> > > > > >  
> > > > > > > I am tempted to have alias collision detection only within
> > > > > > > children mdevs of the same parent, but doing so will always
> > > > > > > mandate to prefix in netdev name. And currently we are left
> > > > > > > with only 3 characters to prefix it, so that may not be good 
> > > > > > > either.
> > > > > > > Hence, I think mdev core wide alias is better with 12 characters. 
> > > > > > >  
> > > > > >
> > > > > > I suppose it depends on the API, if the vendor driver can ask
> > > > > > the mdev core for an alias as part of the device creation
> > > > > > process, then it could manage the netdev namespace for all its
> > > > > > devices, choosing how many characters to use, and fail the
> > > > > > creation if it can't meet a uniqueness requirement.  IOW,
> > > > > > mdev-core would always provide a full
> > > > > > sha1 and therefore gets itself out of the uniqueness/collision 
> > > > > > aspects.
> > > > > >  
> > > > > This doesn't work. At mdev core level 20 bytes sha1 are unique, so
> > > > > mdev core allowed to create a mdev.  
> > > >
> > > > The mdev vendor driver has the opportunity to fail the device
> > > > creation in mdev_parent_ops.create().
> > > >  
> > > That is not helpful for below reasons.
> > > 1. vendor driver doesn't have visibility in other vendor's alias.
> > > 2. Even for single vendor, it needs to maintain global list of devices to 
> > > see  
> > collision.  
> > > 3. multiple vendors needs to implement same scheme.
> > >
> > > Mdev core should be the owner. Shifting ownership from one layer to a
> > > lower layer in vendor driver doesn't solve the problem (if there is
> > > one, which I think doesn't exist).
> > >  
> > > > > And then devlink core chooses
> > > > > only 6 bytes (12 characters) and there is collision. Things fall
> > > > > apart. Since mdev provides unique uuid based scheme, it's the mdev
> > > > > core's ownership to provide unique aliases.  
> > > >
> > > > You're suggesting/contemplating multiple solutions here, 3-char
> > > > prefix + 12- char sha1 vs  + ?-char sha1.  Also, the
> > > > 15-char total limit is imposed by an external subsystem, where the
> > > > vendor driver is the gateway between that subsystem and mdev.  How
> > > > would mdev integrate with another subsystem that maybe only has
> > > > 9-chars available?  Would the vendor driver API specify "I need an
> > > > alias" or would it specify "I need an X-char length alias"?  
> > > Yes, Vendor driver should say how long the alias it wants.
> > > However before we implement that, I suggest let such
> > > vendor/user/driver arrive which needs that. Such variable length alias
> > > can be added at that time and even with that alias collision can be
> > > detected by single mdev module.  
> > 
> > If we agree that different alias lengths are possible, then I would request 
> > that
> > minimally an mdev sample driver be modified to request an alias with a 
> > length
> > that can be adjusted without recompiling in order to exercise the collision 
> > path.
> >   
> Yes. this can be done. But I fail to understand the need to do so.
> It is not the responsibility of the mdev core to show case sha1
> collision efficiency/deficiency. So why do you insist exercise it?

I don't understand what you're trying to imply with "show case sha1
collision efficiency/deficiency".  Are you suggesting that I'm asking
for this feature to experimentally test the probability of collisions
at different character lengths?  We can use shell scripts for that.
I'm simply observing that 

Re: [RFC PATCH v2 00/19] RDMA/FS DAX truncate proposal V1,000,002 ;-)

2019-08-23 Thread Ira Weiny
On Fri, Aug 23, 2019 at 01:23:45PM +1000, Dave Chinner wrote:
> On Wed, Aug 21, 2019 at 01:44:21PM -0700, Ira Weiny wrote:
> > On Wed, Aug 21, 2019 at 04:48:10PM -0300, Jason Gunthorpe wrote:
> > > On Wed, Aug 21, 2019 at 11:57:03AM -0700, Ira Weiny wrote:
> > > 
> > > > > Oh, I didn't think we were talking about that. Hanging the close of
> > > > > the datafile fd contingent on some other FD's closure is a recipe for
> > > > > deadlock..
> > > > 
> > > > The discussion between Jan and Dave was concerning what happens when a 
> > > > user
> > > > calls
> > > > 
> > > > fd = open()
> > > > fnctl(...getlease...)
> > > > addr = mmap(fd...)
> > > > ib_reg_mr() 
> > > > munmap(addr...)
> > > > close(fd)
> > > 
> > > I don't see how blocking close(fd) could work.
> > 
> > Well Dave was saying this _could_ work. FWIW I'm not 100% sure it will but I
> > can't prove it won't..
> 
> Right, I proposed it as a possible way of making sure application
> developers don't do this. It _could_ be made to work (e.g. recording
> longterm page pins on the vma->file), but this is tangential to 
> the discussion of requiring active references to all resources
> covered by the layout lease.
> 
> I think allowing applications to behave like the above is simply
> poor system level design, regardless of the interaction with
> filesystems and layout leases.
> 
> > Maybe we are all just touching a different part of this
> > elephant[1] but the above scenario or one without munmap is very reasonably
> > something a user would do.  So we can either allow the close to complete (my
> > current patches) or try to make it block like Dave is suggesting.

My belief when writing the current series was that hanging the close would
cause deadlock.  But it seems I was wrong because of the delayed __fput().

So far, I have not been able to get RDMA to have an issue like Jason suggested
would happen (or used to happen).  So from that perspective it may be ok to
hang the close.

> > 
> > I don't disagree with Dave with the semantics being nice and clean for the
> > filesystem.
> 
> I'm not trying to make it "nice and clean for the filesystem".
> 
> The problem is not just RDMA/DAX - anything that is directly
> accessing the block device under the filesystem has the same set of
> issues. That is, the filesystem controls the life cycle of the
> blocks in the block device, so direct access to the blocks by any
> means needs to be co-ordinated with the filesystem. Pinning direct
> access to a file via page pins attached to a hardware context that
> the filesystem knows nothing about is not an access model that the
> filesystems can support.
> 
> IOWs, anyone looking at this problem just from the RDMA POV of page
> pins is not seeing all the other direct storage access mechainsms
> that we need to support in the filesystems. RDMA on DAX is just one
> of them.  pNFS is another. Remote acces via NVMeOF is another. XDP
> -> DAX (direct file data placement from the network hardware) is
> another. There are /lots/ of different direct storage access
> mechanisms that filesystems need to support and we sure as hell do
> not want to have to support special case semantics for every single
> one of them.

My use of struct file was based on the fact that FDs are a primary interface
for linux and my thought was that they would be more universal than having file
pin information stored in an RDMA specific structure.

XDP is not as direct; it uses sockets.  But sockets also have a struct file
which I believe could be used in a similar manner.  I'm not 100% sure of the
xdp_umem lifetime yet but it seems that my choice of using struct file was a
good one in this respect.

> 
> Hence if we don't start with a sane model for arbitrating direct
> access to the storage at the filesystem level we'll never get this
> stuff to work reliably, let alone work together coherently.  An
> application that wants a direct data path to storage should have a
> single API that enables then to safely access the storage,
> regardless of how they are accessing the storage.
> 
> From that perspective, what we are talking about here with RDMA
> doing "mmap, page pin, unmap, close" and "pass page pins via
> SCM_RIGHTS" are fundamentally unworkable from the filesystem
> perspective. They are use-after-free situations from the filesystem
> perspective - they do not hold direct references to anything in the
> filesystem, and so the filesytem is completely unaware of them.

I see your point of view but looking at it from a different point of view I
don't see this as a "use after free".

The user has explicitly registered this memory (and layout) with another direct
access subsystem (RDMA for example) so why do they need to keep the FD around?

> 
> The filesystem needs to be aware of /all users/ of it's resources if
> it's going to manage them sanely.

>From the way I look at it the underlying filesystem _is_ aware of the leases
with my patch set.  And so to is the user.  It is just not through the 

RE: [PATCH v2 0/2] Simplify mtty driver and mdev core

2019-08-23 Thread Parav Pandit
Hi Alex,

> -Original Message-
> From: linux-kernel-ow...@vger.kernel.org  ow...@vger.kernel.org> On Behalf Of Parav Pandit
> Sent: Saturday, August 24, 2019 9:26 AM
> To: Alex Williamson 
> Cc: Jiri Pirko ; Jiri Pirko ; David S .
> Miller ; Kirti Wankhede ;
> Cornelia Huck ; k...@vger.kernel.org; linux-
> ker...@vger.kernel.org; cjia ; net...@vger.kernel.org
> Subject: RE: [PATCH v2 0/2] Simplify mtty driver and mdev core
> > I don't understand this logic.  I'm simply asking that we have a way
> > to test the collision behavior without changing the binary.  The path
> > we're driving towards seems to be making this easier and easier.  If
> > the vendor can request an alias of a specific length, then a sample
> > driver with a module option to set the desired alias length to 1-char makes
> it trivially easy to induce a collision.
> Sure it is easy to test collision, but my point is - mdev core is not sha1 
> test
> module.
> Hence adding functionality of variable alias length to test collision doesn't
> make sense.
> When the actual user arrives who needs small alias, we will be able to add
> additional pieces very easily.

My initial thoughts to add parent_ops to have bool flag to generate alias or 
not.
However, instead of bool, keeping it unsigned int to say, zero to skip alias 
and non-zero length to convey generate alias.
This will serve both the purpose with trivial handling.




RE: [PATCH v2 0/2] Simplify mtty driver and mdev core

2019-08-23 Thread Parav Pandit



> -Original Message-
> From: Alex Williamson 
> Sent: Saturday, August 24, 2019 1:14 AM
> To: Parav Pandit 
> Cc: Jiri Pirko ; Jiri Pirko ; David S . 
> Miller
> ; Kirti Wankhede ; Cornelia
> Huck ; k...@vger.kernel.org; linux-
> ker...@vger.kernel.org; cjia ; net...@vger.kernel.org
> Subject: Re: [PATCH v2 0/2] Simplify mtty driver and mdev core
> 
> On Fri, 23 Aug 2019 18:00:30 +
> Parav Pandit  wrote:
> 
> > > -Original Message-
> > > From: Alex Williamson 
> > > Sent: Friday, August 23, 2019 10:47 PM
> > > To: Parav Pandit 
> > > Cc: Jiri Pirko ; Jiri Pirko ;
> > > David S . Miller ; Kirti Wankhede
> > > ; Cornelia Huck ;
> > > k...@vger.kernel.org; linux- ker...@vger.kernel.org; cjia
> > > ; net...@vger.kernel.org
> > > Subject: Re: [PATCH v2 0/2] Simplify mtty driver and mdev core
> > >
> > > On Fri, 23 Aug 2019 16:14:04 +
> > > Parav Pandit  wrote:
> > >
> > > > > > Idea is to have mdev alias as optional.
> > > > > > Each mdev_parent says whether it wants mdev_core to generate
> > > > > > an alias or not. So only networking device drivers would set it to 
> > > > > > true.
> > > > > > For rest, alias won't be generated, and won't be compared
> > > > > > either during creation time. User continue to provide only uuid.
> > > > >
> > > > > Ok
> > > > >
> > > > > > I am tempted to have alias collision detection only within
> > > > > > children mdevs of the same parent, but doing so will always
> > > > > > mandate to prefix in netdev name. And currently we are left
> > > > > > with only 3 characters to prefix it, so that may not be good either.
> > > > > > Hence, I think mdev core wide alias is better with 12 characters.
> > > > >
> > > > > I suppose it depends on the API, if the vendor driver can ask
> > > > > the mdev core for an alias as part of the device creation
> > > > > process, then it could manage the netdev namespace for all its
> > > > > devices, choosing how many characters to use, and fail the
> > > > > creation if it can't meet a uniqueness requirement.  IOW,
> > > > > mdev-core would always provide a full
> > > > > sha1 and therefore gets itself out of the uniqueness/collision 
> > > > > aspects.
> > > > >
> > > > This doesn't work. At mdev core level 20 bytes sha1 are unique, so
> > > > mdev core allowed to create a mdev.
> > >
> > > The mdev vendor driver has the opportunity to fail the device
> > > creation in mdev_parent_ops.create().
> > >
> > That is not helpful for below reasons.
> > 1. vendor driver doesn't have visibility in other vendor's alias.
> > 2. Even for single vendor, it needs to maintain global list of devices to 
> > see
> collision.
> > 3. multiple vendors needs to implement same scheme.
> >
> > Mdev core should be the owner. Shifting ownership from one layer to a
> > lower layer in vendor driver doesn't solve the problem (if there is
> > one, which I think doesn't exist).
> >
> > > > And then devlink core chooses
> > > > only 6 bytes (12 characters) and there is collision. Things fall
> > > > apart. Since mdev provides unique uuid based scheme, it's the mdev
> > > > core's ownership to provide unique aliases.
> > >
> > > You're suggesting/contemplating multiple solutions here, 3-char
> > > prefix + 12- char sha1 vs  + ?-char sha1.  Also, the
> > > 15-char total limit is imposed by an external subsystem, where the
> > > vendor driver is the gateway between that subsystem and mdev.  How
> > > would mdev integrate with another subsystem that maybe only has
> > > 9-chars available?  Would the vendor driver API specify "I need an
> > > alias" or would it specify "I need an X-char length alias"?
> > Yes, Vendor driver should say how long the alias it wants.
> > However before we implement that, I suggest let such
> > vendor/user/driver arrive which needs that. Such variable length alias
> > can be added at that time and even with that alias collision can be
> > detected by single mdev module.
> 
> If we agree that different alias lengths are possible, then I would request 
> that
> minimally an mdev sample driver be modified to request an alias with a length
> that can be adjusted without recompiling in order to exercise the collision 
> path.
> 
Yes. this can be done. But I fail to understand the need to do so.
It is not the responsibility of the mdev core to show case sha1 collision 
efficiency/deficiency.
So why do you insist exercise it?

> If mdev-core is guaranteeing uniqueness, does this indicate that each alias
> length constitutes a separate namespace?  ie. strictly a strcmp(), not a
> strncmp() to the shorter alias.
> 
Yes.


> > > Does it make sense that mdev-core would fail creation of a device if
> > > there's a collision in the 12-char address space between different
> > > subsystems?  For example, does enm0123456789ab really
> > > collide with xyz0123456789ab?
> > I think so, because at mdev level its 12-char alias matters.
> > Choosing the prefix not adding prefix is really a user space choice.
> >
> > >  So if
> > > mdev were to 

Re: [PATCH 4/5] mtd: spi-nor: Move clear_sr_bp() to 'struct spi_nor_flash_parameter'

2019-08-23 Thread Tudor.Ambarus


On 08/23/2019 06:53 PM, Tudor Ambarus - M18064 wrote:
> +  * configuration register Quad Enable bit is one, only the the
^duplication


Re: [PATCH v1] [semaphore] Removed redundant code from semaphore's down family of function

2019-08-23 Thread Satendra Singh Thakur
On Thu, 22 Aug 2019 17:51:12 +0200, Peter Zijlstra wrote:
> On Mon, Aug 12, 2019 at 07:18:59PM +0530, Satendra Singh Thakur wrote:
> > -The semaphore code has four funcs
> > down,
> > down_interruptible,
> > down_killable,
> > down_timeout
> > -These four funcs have almost similar code except that
> > they all call lower level function __down_xyz.
> > -This lower level func in-turn call inline func
> > __down_common with appropriate arguments.
> > -This patch creates a common macro for above family of funcs
> > so that duplicate code is eliminated.
> > -Also, __down_common has been made noinline so that code is
> > functionally similar to previous one
> > -For example, earlier down_killable would call __down_killable
> > , which in-turn would call inline func __down_common
> > Now, down_killable calls noinline __down_common directly
> > through a macro
> > -The funcs __down_interruptible, __down_killable etc have been
> > removed as they were just wrapper to __down_common
>
> The above is unreadable and seems to lack a reason for this change.
Hi Mr Peter,
Thanks for the comments.
I will try to explain it further:

The semaphore has four functions named down*.
The call flow of the functions is

down* > __down* > inline __down_common

The code of down* and __down* is redundant/duplicate except that
the __down_common is called with different arguments from __down*
functions.

This patch defines a macro down_common which contain this common
code of all down* functions.

new call flow is

down* > noinline __down_common (through a macro down_common).

> AFAICT from the actual patch, you're destroying the explicit
> instantiation of the __down*() functions
> through constant propagation into __down_common().
Intead of instantiation of __down* functions, we are instaintiating
__down_common, is it a problem ?

Thanks
Satendra


Re: [PATCH] Partially revert "mm/memcontrol.c: keep local VM counters in sync with the hierarchical ones"

2019-08-23 Thread Yafang Shao
On Sat, Aug 24, 2019 at 6:33 AM Roman Gushchin  wrote:
>
> On Fri, Aug 16, 2019 at 05:47:26PM -0700, Roman Gushchin wrote:
> > Commit 766a4c19d880 ("mm/memcontrol.c: keep local VM counters in sync
> > with the hierarchical ones") effectively decreased the precision of
> > per-memcg vmstats_local and per-memcg-per-node lruvec percpu counters.
> >
> > That's good for displaying in memory.stat, but brings a serious regression
> > into the reclaim process.
> >
> > One issue I've discovered and debugged is the following:
> > lruvec_lru_size() can return 0 instead of the actual number of pages
> > in the lru list, preventing the kernel to reclaim last remaining
> > pages. Result is yet another dying memory cgroups flooding.
> > The opposite is also happening: scanning an empty lru list
> > is the waste of cpu time.
> >
> > Also, inactive_list_is_low() can return incorrect values, preventing
> > the active lru from being scanned and freed. It can fail both because
> > the size of active and inactive lists are inaccurate, and because
> > the number of workingset refaults isn't precise. In other words,
> > the result is pretty random.
> >
> > I'm not sure, if using the approximate number of slab pages in
> > count_shadow_number() is acceptable, but issues described above
> > are enough to partially revert the patch.
> >
> > Let's keep per-memcg vmstat_local batched (they are only used for
> > displaying stats to the userspace), but keep lruvec stats precise.
> > This change fixes the dead memcg flooding on my setup.
> >
> > Fixes: 766a4c19d880 ("mm/memcontrol.c: keep local VM counters in sync with 
> > the hierarchical ones")
> > Signed-off-by: Roman Gushchin 
> > Cc: Yafang Shao 
> > Cc: Johannes Weiner 
>
> Any other concerns/comments here?
>
> I'd prefer to fix the regression: we're likely leaking several pages
> of memory for each created and destroyed memory cgroup. Plus
> all internal structures, which are measured in hundreds of kb.
>

Hi Roman,

As it really introduces issues, I agree with you that we should fix it first.

So for your fix,
Acked-by: Yafang Shao 

Thanks
Yafang


Re: WARNINGs in set_task_reclaim_state with memory cgroup and full memory usage

2019-08-23 Thread Yafang Shao
On Sat, Aug 24, 2019 at 10:57 AM Hillf Danton  wrote:
>
>
> On Fri, 23 Aug 2019 18:00:15 -0400 Adric Blake wrote:
> > Synopsis:
> > A WARN_ON_ONCE is hit twice in set_task_reclaim_state under the
> > following conditions:
> > - a memory cgroup has been created and a task assigned it it
> > - memory.limit_in_bytes has been set
> > - memory has filled up, likely from cache
> >
> Thanks for report.
>
> > In my usage, I create a cgroup under the current session scope and
> > assign a task to it. I then set memory.limit_in_bytes and
> > memory.soft_limit_in_bytes for the cgroup to reasonable values, say
> > 1G/512M. The program accesses large files frequently and gradually
> > fills memory with the page cache. The warnings appears when the
> > entirety of the system memory is filled, presumably from other
> > programs.
> >
> > If I wait until the program has filled the entirety of system memory
> > with cache and then assign a memory limit, the warnings appear
> > immediately.
> >
> > I am building the linux git. I first noticed this issue with the
> > drm-tip 5.3rc3 and 5.3rc4 kernels, and tested linux master after
> > 5.3rc5 to confirm the bug more resoundingly.
> >
> > Here are the warnings.
> >
> > [38491.963105] WARNING: CPU: 7 PID: 175 at mm/vmscan.c:245 
> > set_task_reclaim_state+0x1e/0x40
> > [38491.963106] Modules linked in: iwlmvm mac80211 libarc4 iwlwifi
> > cfg80211 xt_comment nls_iso8859_1 nls_cp437 vfat fat xfs jfs btrfs xor
> > raid6_pq libcrc32c ccm tun rfcomm fuse xt_tcpudp ip6t_REJECT
> > nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_multiport xt_owner
> > snd_hda_codec_hdmi ip6table_filter ip6_tables iptable_filter bnep ext4
> > crc32c_generic mbcache jbd2 snd_hda_codec_realtek
> > snd_hda_codec_generic snd_soc_skl snd_soc_hdac_hda snd_hda_ext_core
> > snd_soc_skl_ipc x86_pkg_temp_thermal intel_powerclamp snd_soc_sst_ipc
> > coretemp snd_soc_sst_dsp snd_soc_acpi_intel_match kvm_intel
> > snd_soc_acpi i915 snd_soc_core kvm snd_compress ac97_bus
> > snd_pcm_dmaengine snd_hda_intel i2c_algo_bit btusb irqbypass
> > drm_kms_helper btrtl snd_hda_codec dell_laptop btbcm crct10dif_pclmul
> > snd_hda_core crc32c_intel btintel iTCO_wdt ghash_clmulni_intel drm
> > ledtrig_audio aesni_intel iTCO_vendor_support snd_hwdep dell_wmi
> > rtsx_usb_ms r8169 dell_smbios aes_x86_64 mei_hdcp crypto_simd
> > intel_gtt bluetooth snd_pcm cryptd dcdbas
> > [38491.963155]  wmi_bmof dell_wmi_descriptor intel_rapl_msr
> > glue_helper snd_timer joydev intel_cstate snd realtek memstick
> > dell_smm_hwmon mousedev psmouse input_leds libphy intel_uncore
> > ecdh_generic ecc crc16 rfkill intel_rapl_perf soundcore i2c_i801
> > agpgart mei_me tpm_crb syscopyarea sysfillrect sysimgblt mei
> > intel_xhci_usb_role_switch fb_sys_fops idma64 tpm_tis roles
> > processor_thermal_device intel_rapl_common i2c_hid tpm_tis_core
> > int3403_thermal intel_soc_dts_iosf battery wmi intel_lpss_pci
> > intel_lpss intel_pch_thermal tpm int3400_thermal int3402_thermal
> > acpi_thermal_rel int340x_thermal_zone rng_core intel_hid ac
> > sparse_keymap evdev mac_hid crypto_user ip_tables x_tables
> > hid_multitouch rtsx_usb_sdmmc mmc_core rtsx_usb hid_logitech_hidpp
> > sr_mod cdrom sd_mod uas usb_storage hid_logitech_dj hid_generic usbhid
> > hid ahci serio_raw libahci atkbd libps2 libata xhci_pci scsi_mod
> > xhci_hcd crc32_pclmul i8042 serio f2fs [last unloaded: cfg80211]
> > [38491.963221] CPU: 7 PID: 175 Comm: kswapd0 Not tainted 
> > 5.3.0-rc5+149+gbb7ba8069de9 #1
> > [38491.963222] Hardware name: Dell Inc. Inspiron 5570/09YTN7, BIOS 1.2.3 
> > 05/15/2019
> > [38491.963226] RIP: 0010:set_task_reclaim_state+0x1e/0x40
> > [38491.963228] Code: 78 a9 e7 ff 0f 1f 84 00 00 00 00 00 0f 1f 44 00
> > 00 55 48 89 f5 53 48 89 fb 48 85 ed 48 8b 83 08 08 00 00 74 11 48 85
> > c0 74 02 <0f> 0b 48 89 ab 08 08 00 00 5b 5d c3 48 85 c0 75 f1 0f 0b 48
> > 89 ab
> > [38491.963229] RSP: 0018:8c898031fc60 EFLAGS: 00010286
> > [38491.963230] RAX: 8c898031fe28 RBX: 892aa04ddc40 RCX: 
> > 
> > [38491.963231] RDX: 8c898031fc60 RSI: 8c898031fcd0 RDI: 
> > 892aa04ddc40
> > [38491.963233] RBP: 8c898031fcd0 R08: 8c898031fd48 R09: 
> > 89279674b800
> > [38491.963234] R10:  R11:  R12: 
> > 8c898031fd48
> > [38491.963235] R13: 892a842ef000 R14: 892aaf7fc000 R15: 
> > 
> > [38491.963236] FS:  () GS:892aa33c() 
> > knlGS:
> > [38491.963238] CS:  0010 DS:  ES:  CR0: 80050033
> > [38491.963239] CR2: 7f90628fa000 CR3: 00027ee0a002 CR4: 
> > 003606e0
> > [38491.963239] Call Trace:
> > [38491.963246]  mem_cgroup_shrink_node+0x9b/0x1d0
> > [38491.963250]  mem_cgroup_soft_limit_reclaim+0x10c/0x3a0
> > [38491.963254]  balance_pgdat+0x276/0x540
> > [38491.963258]  kswapd+0x200/0x3f0
> > [38491.963261]  ? wait_woken+0x80/0x80
> > [38491.963265]  kthread+0xfd/0x130
> > [38491.963267]  ? 

Re: [PATCH v2] i2c: mediatek: disable zero-length transfers for mt8183

2019-08-23 Thread Qii Wang
On Fri, 2019-08-23 at 16:13 +0800, Hsin-Yi Wang wrote:
> On Fri, Aug 23, 2019 at 4:09 PM Qii Wang  wrote:
> 
> > >
> > >  static u32 mtk_i2c_functionality(struct i2c_adapter *adap)
> > >  {
> > > - return I2C_FUNC_I2C | I2C_FUNC_SMBUS_EMUL;
> > > + if (adap->quirks->flags & I2C_AQ_NO_ZERO_LEN)
> > > + return I2C_FUNC_I2C |
> > > + (I2C_FUNC_SMBUS_EMUL & ~I2C_FUNC_SMBUS_QUICK);
> > > + else
> > > + return I2C_FUNC_I2C | I2C_FUNC_SMBUS_EMUL;
> >
> > It can be removed?
> See previous discussion: https://patchwork.kernel.org/patch/10814391/#22484435
> but not all SoC's quirks has I2C_AQ_NO_ZERO_LEN.
ok, it looks good for me, thanks.
Reviewed-by: Qii Wang 




Re: [PATCH RT v2 2/3] sched: migrate_enable: Use sleeping_lock to indicate involuntary sleep

2019-08-23 Thread Joel Fernandes
On Fri, Aug 23, 2019 at 02:28:46PM -0500, Scott Wood wrote:
> On Fri, 2019-08-23 at 18:20 +0200, Sebastian Andrzej Siewior wrote:
> > On 2019-08-21 18:19:05 [-0500], Scott Wood wrote:
> > > Without this, rcu_note_context_switch() will complain if an RCU read
> > > lock is held when migrate_enable() calls stop_one_cpu().
> > > 
> > > Signed-off-by: Scott Wood 
> > > ---
> > > v2: Added comment.
> > > 
> > > If my migrate disable changes aren't taken, then pin_current_cpu()
> > > will also need to use sleeping_lock_inc() because calling
> > > __read_rt_lock() bypasses the usual place it's done.
> > > 
> > >  include/linux/sched.h| 4 ++--
> > >  kernel/rcu/tree_plugin.h | 2 +-
> > >  kernel/sched/core.c  | 8 
> > >  3 files changed, 11 insertions(+), 3 deletions(-)
> > > 
> > > --- a/kernel/sched/core.c
> > > +++ b/kernel/sched/core.c
> > > @@ -7405,7 +7405,15 @@ void migrate_enable(void)
> > >   unpin_current_cpu();
> > >   preempt_lazy_enable();
> > >   preempt_enable();
> > > +
> > > + /*
> > > +  * sleeping_lock_inc suppresses a debug check for
> > > +  * sleeping inside an RCU read side critical section
> > > +  */
> > > + sleeping_lock_inc();
> > >   stop_one_cpu(task_cpu(p), migration_cpu_stop, );
> > > + sleeping_lock_dec();
> > 
> > this looks like an ugly hack. This sleeping_lock_inc() is used where we
> > actually hold a sleeping lock and schedule() which is okay. But this
> > would mean we hold a RCU lock and schedule() anyway. Is that okay?
> 
> Perhaps the name should be changed, but the concept is the same -- RT-
> specific sleeping which should be considered involuntary for the purpose of
> debug checks.  Voluntary sleeping is not allowed in an RCU critical section
> because it will break the critical section on certain flavors of RCU, but
> that doesn't apply to the flavor used on RT.  Sleeping for a long time in an
> RCU critical section would also be a bad thing, but that also doesn't apply
> here.

I think the name should definitely be changed. At best, it is super confusing to
call it "sleeping_lock" for this scenario. In fact here, you are not even
blocking on a lock.

Maybe "sleeping_allowed" or some such.

thanks,

 - Joel



Re: [PATCH 5.2 000/135] 5.2.10-stable review

2019-08-23 Thread Greg KH
On Fri, Aug 23, 2019 at 12:41:03PM -0600, shuah wrote:
> On 8/22/19 11:05 AM, Sasha Levin wrote:
> > 
> > This is the start of the stable review cycle for the 5.2.10 release.
> > There are 135 patches in this series, all will be posted as a response
> > to this one.  If anyone has any issues with these being applied, please
> > let me know.
> > 
> > Responses should be made by Sat 24 Aug 2019 05:07:10 PM UTC.
> > Anything received after that time might be too late.
> > 
> > The whole patch series can be found in one patch at:
> >  
> > https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-5.2.10-rc1.gz
> 
> I am seeing "Sorry I can't find your kernels". Is this posted?

Ah, Sasha didn't generate the patch but it was still listed here, oops.
He copied my format and we didn't notice this, sorry about that.

As the thread shows, we didn't generate this file this time to see what
would happen.  If your test process requires it, we can generate it as I
don't want to break it.

thanks,

greg k-h


Re: linux-next: Tree for Aug 23

2019-08-23 Thread John Hubbard
On 8/23/19 2:26 AM, Stephen Rothwell wrote:
> Hi all,
> 
> Changes since 20190822:
> 
> The thermal tree gained a conflict against the jc_docs tree.
> 
> The rdma tree gained a conflict against the rdma-fixes tree.
> 
> The net-next tree gained conflicts against the pci tree.
> 
> The crypto tree gained a conflict against Linus' tree.
> 
> The drm tree gained a conflict against the drm-fixes tree.

Hi,

Even though I saw email proposing fixes for one (maybe both) of the 
following warnings, I'm still seeing them in this linux-next:

WARNING: "ahci_em_messages" [drivers/ata/libahci] is a static EXPORT_SYMBOL_GPL
WARNING: "ftrace_set_clr_event" [vmlinux] is a static EXPORT_SYMBOL_GPL

...and obviously these can be trivially fixed by:

diff --git a/drivers/ata/libahci.c b/drivers/ata/libahci.c
index e4c45d3cca79..bff369d9a1a7 100644
--- a/drivers/ata/libahci.c
+++ b/drivers/ata/libahci.c
@@ -175,7 +175,6 @@ struct ata_port_operations ahci_pmp_retry_srst_ops = {
 EXPORT_SYMBOL_GPL(ahci_pmp_retry_srst_ops);
 
 static bool ahci_em_messages __read_mostly = true;
-EXPORT_SYMBOL_GPL(ahci_em_messages);
 module_param(ahci_em_messages, bool, 0444);
 /* add other LED protocol types when they become supported */
 MODULE_PARM_DESC(ahci_em_messages,
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index c7506bc81b75..648930823b57 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -787,7 +787,7 @@ static int __ftrace_set_clr_event(struct trace_array *tr, 
const char *match,
return ret;
 }
 
-static int ftrace_set_clr_event(struct trace_array *tr, char *buf, int set)
+int ftrace_set_clr_event(struct trace_array *tr, char *buf, int set)
 {
char *event = NULL, *sub = NULL, *match;
int ret;



...which I didn't create patches for, because I expect they are already
in flight. But if those somehow got lost or skipped, then here's an early
warning that these fixes still need to be applied.


thanks,
-- 
John Hubbard
NVIDIA


Re: [PATCH 5.2 000/135] 5.2.10-stable review

2019-08-23 Thread Greg KH
On Fri, Aug 23, 2019 at 09:18:05PM -0400, Sasha Levin wrote:
> On Fri, Aug 23, 2019 at 10:36:27AM -0700, Greg KH wrote:
> > On Fri, Aug 23, 2019 at 02:28:53AM -0400, Sasha Levin wrote:
> > > On Fri, Aug 23, 2019 at 02:42:48AM +0200, Stefan Lippers-Hollmann wrote:
> > > > Hi
> > > >
> > > > On 2019-08-22, Greg KH wrote:
> > > > > On Fri, Aug 23, 2019 at 12:05:27AM +0200, Stefan Lippers-Hollmann 
> > > > > wrote:
> > > > > > On 2019-08-22, Greg KH wrote:
> > > > > > > On Thu, Aug 22, 2019 at 01:05:56PM -0400, Sasha Levin wrote:
> > > > [...]
> > > > > > It might be down to kernel.org mirroring, but the patch file doesn't
> > > > > > seem to be available yet (404), both in the wrong location listed
> > > > > > above - and the expected one under
> > > > > >
> > > > > > 
> > > > > > https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.2.10-rc1.gz
> > > > [...]
> > > > > Ah, no, it's not a mirroring problem, Sasha and I didn't know if 
> > > > > anyone
> > > > > was actually using the patch files anymore, so it was simpler to do a
> > > > > release without them to see what happens. :)
> > > > >
> > > > > Do you rely on these, or can you use the -rc git tree or the quilt
> > > > > series?  If you do rely on them, we will work to fix this, it just
> > > > > involves some scripting that we didn't get done this morning.
> > > >
> > > > "Rely" is a strong word, I can adapt if they're going away, but
> > > > I've been using them so far, as in (slightly simplified):
> > > >
> > > > $ cd patches/upstream/
> > > > $ wget https://cdn.kernel.org/pub/linux/kernel/v5.x/patch-5.2.9.xz
> > > > $ xz -d patch-5.2.9.xz
> > > > $ wget 
> > > > https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.2.10-rc1.gz
> > > > $ gunzip patch-5.2.10-rc1.gz
> > > > $ vim ../series
> > > > $ quilt ...
> > > >
> > > > I can switch to importing the quilt queue with some sed magic (and I
> > > > already do that, if interesting or just a larger amounts of patches are
> > > > queuing up for more than a day or two), but using the -rc patches has
> > > > been convenient in that semi-manual workflow, also to make sure to 
> > > > really
> > > > get and test the formal -rc patch, rather than something inbetween.
> > > 
> > > An easy way to generate a patch is to just use the git.kernel.org web
> > > interface. A patch for 5.2.10-rc1 would be:
> > > https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git/patch/?id=linux-5.2.y=v5.2.9
> > > 
> > > Personally this patch upload story sounded to me like a pre-git era
> > > artifact...
> > 
> > Given that we no longer do patches for Linus's -rc releases for the past
> > few years, maybe it is time to move to do the same for the stable
> > releases to be consistent.
> 
> Or tarballs? Why do we generate tarballs (and go through kup)?
> git.kernel.org already does it for us.

As I mentioned yesterday, but writing it down here for posterity,
there's a number of reasons.

First off, the release process doesn't require kup for when a "real"
release happens, that's all now donw on git.kernel.org with a process
involving a signed note and some other fun backend stuff.  We are
working on expanding that in the future with some other signature
validation as well, to make it easier to verify tarballs are "correct"
as what we do today is a bit different than other projects.

As for the tarball itself, it's still needed for the same reasons we do
so on Linus's releases:
- distros use these.  Don't want all Gentoo users hammering on
  git.kernel.org for their updated builds, that's a huge waste.
- mirroring works _so_ much better around the world for tarballs
- legal reasons.  git is not yet "recognized" as being something
  that properly is reflective of a specific point in time while
  as online tarballs that are mirrored and stored around the
  world are finally almost properly recognized for this.
- historical, people are used to using them, and workflows are
  built up around them.  People don't like rewriting scripts, as
  can be seen in my monstrosity of a mess that I use for
  releases :)

there are probably others, I know it came up when Linus stopped doing it
for the -rc releases and it was considered to do the same for the "real"
releases.

thanks,

greg k-h


Re: [PATCH] ARM: dts: vf610-zii-scu4-aib: Configure IRQ line for GPIO expander

2019-08-23 Thread Chris Healy
On Fri, Aug 23, 2019 at 5:27 PM Andrey Smirnov  wrote:
>
> Configure IRQ line for SX1503 GPIO expander. We already have
> appropriate pinmux entry and all that is missing is "interrupt-parent"
> and "interrupts" properties. Add them.
>
> Signed-off-by: Andrey Smirnov 
> Cc: Shawn Guo 
> Cc: Chris Healy 
> Cc: Cory Tusar 
> Cc: Fabio Estevam 
> Cc: linux-arm-ker...@lists.infradead.org
> Cc: linux-kernel@vger.kernel.org
> ---
>  arch/arm/boot/dts/vf610-zii-scu4-aib.dts | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/arch/arm/boot/dts/vf610-zii-scu4-aib.dts 
> b/arch/arm/boot/dts/vf610-zii-scu4-aib.dts
> index e6c3621079e0..45a978defbdc 100644
> --- a/arch/arm/boot/dts/vf610-zii-scu4-aib.dts
> +++ b/arch/arm/boot/dts/vf610-zii-scu4-aib.dts
> @@ -570,6 +570,8 @@
> #gpio-cells = <2>;
> reg = <0x20>;
> gpio-controller;
> +   interrupt-parent = <>;
> +   interrupts = <31 IRQ_TYPE_EDGE_FALLING>;
> };
>
> lm75@4e {
> --

Tested-by: Chris Healy 


Re: [PATCH v2] x86/mm/pti: in pti_clone_pgtable(), increase addr properly

2019-08-23 Thread Song Liu



> On Aug 21, 2019, at 3:30 AM, Peter Zijlstra  wrote:
> 
> On Wed, Aug 21, 2019 at 12:10:08PM +0200, Peter Zijlstra wrote:
>> On Tue, Aug 20, 2019 at 01:23:14PM -0700, Song Liu wrote:
> 
>>> host-5.2-after # grep "x  pmd" /sys/kernel/debug/page_tables/dump_pid
>>> 0x0060-0x00e0   8M USR ro PSE   
>>>   x  pmd
>>> 0x8100-0x81e0  14M ro PSE 
>>> GLB x  pmd
>>> 
>>> So after this patch, the 5.2 based kernel has 7 PMDs instead of 1 PMD
>>> in 4.16 kernel.
>> 
>> This basically gives rise to more questions than it provides answers.
>> You seem to have 'forgotten' to provide the equivalent mappings on the
>> two older kernels. The fact that they're not PMD is evident, but it
>> would be very good to know what is mapped, and what -- if anything --
>> lives in the holes we've (accidentally) created.
>> 
>> Can you please provide more complete mappings? Basically provide the
>> whole cpu_entry_area mapping.
> 
> I tried on my local machine and:
> 
>  cat /debug/page_tables/kernel | awk '/^---/ { p=0 } /CPU entry/ { p=1 } { if 
> (p) print $0 }' > ~/cea-{before,after}.txt
> 
> resulted in _identical_ files ?!?!
> 
> Can you share your before and after dumps?

I was really dumb on this. The actual issue this that kprobe on 
CONFIG_KPROBES_ON_FTRACE splits kernel text PMDs (0x8100-). 

I will dig more into this. 

Sorry for being silent, somehow I didn't see this email until just now.

Song

Re: [PATCH V3 0/3] riscv: Add perf callchain support

2019-08-23 Thread Paul Walmsley
On Mon, 19 Aug 2019, Mao Han wrote:

> PS: I got some compile error while compiling glibc 2.30 with linux
> v5.3-rc4 header. vfork.S include linux/sched.h(./include/uapi/linux/sched.h)
> which has a struct clone_args inside, added by
> 7f192e3cd316ba58c88dfa26796cf77789dd9872.

Noticed that also.  Probably the sched.h uapi kernel header file needs an 
"#ifndef __ASSEMBLY__" around the struct clone_args...


- Paul


Re: [PATCH] PCI: Add missing link delays required by the PCIe spec

2019-08-23 Thread Bjorn Helgaas
Hi Mika,

I'm trying to figure out specifically why we need this and where it
should go.  Questions below.

On Wed, Aug 21, 2019 at 03:45:19PM +0300, Mika Westerberg wrote:
> Currently Linux does not follow PCIe spec regarding the required delays
> after reset. A concrete example is a Thunderbolt add-in-card that
> consists of a PCIe switch and two PCIe endpoints:
> 
>   +-1b.0-[01-6b]00.0-[02-6b]--+-00.0-[03]00.0 TBT controller
>   +-01.0-[04-36]-- DS hotplug port
>   +-02.0-[37]00.0 xHCI controller
>   \-04.0-[38-6b]-- DS hotplug port
> 
> The root port (1b.0) and the PCIe switch downstream ports are all PCIe
> gen3 so they support 8GT/s link speeds.
> 
> We wait for the PCIe hierarchy to enter D3cold (runtime):
> 
>   pcieport :00:1b.0: power state changed by ACPI to D3cold
> 
> When it wakes up from D3cold, according to the PCIe 4.0 section 5.8 the
> PCIe switch is put to reset and its power is re-applied. This means that
> we must follow the rules in PCIe 4.0 section 6.6.1.
> 
> For the PCIe gen3 ports we are dealing with here, the following applies:
> 
>   With a Downstream Port that supports Link speeds greater than 5.0
>   GT/s, software must wait a minimum of 100 ms after Link training
>   completes before sending a Configuration Request to the device
>   immediately below that Port. Software can determine when Link training
>   completes by polling the Data Link Layer Link Active bit or by setting
>   up an associated interrupt (see Section 6.7.3.3).
> 
> Translating this into the above topology we would need to do this (DLLLA
> stands for Data Link Layer Link Active):
> 
>   pcieport :00:1b.0: wait for 100ms after DLLLA is set before access to 
> :01:00.0
>   pcieport :02:00.0: wait for 100ms after DLLLA is set before access to 
> :03:00.0
>   pcieport :02:02.0: wait for 100ms after DLLLA is set before access to 
> :37:00.0
> 
> I've instrumented the kernel with additional logging so we can see the
> actual delays the kernel performs:
> 
>   pcieport :00:1b.0: power state changed by ACPI to D0
>   pcieport :00:1b.0: waiting for D3cold delay of 100 ms
>   pcieport :00:1b.0: waking up bus
>   pcieport :00:1b.0: waiting for D3hot delay of 10 ms
>   pcieport :00:1b.0: restoring config space at offset 0x2c (was 0x60, 
> writing 0x60)
>   ...
>   pcieport :00:1b.0: PME# disabled
>   pcieport :01:00.0: restoring config space at offset 0x3c (was 0x1ff, 
> writing 0x201ff)
>   ...
>   pcieport :01:00.0: PME# disabled
>   pcieport :02:00.0: restoring config space at offset 0x3c (was 0x1ff, 
> writing 0x201ff)
>   ...
>   pcieport :02:00.0: PME# disabled
>   pcieport :02:01.0: restoring config space at offset 0x3c (was 0x1ff, 
> writing 0x201ff)
>   ...
>   pcieport :02:01.0: restoring config space at offset 0x4 (was 0x10, 
> writing 0x100407)
>   pcieport :02:01.0: PME# disabled
>   pcieport :02:02.0: restoring config space at offset 0x3c (was 0x1ff, 
> writing 0x201ff)
>   ...
>   pcieport :02:02.0: PME# disabled
>   pcieport :02:04.0: restoring config space at offset 0x3c (was 0x1ff, 
> writing 0x201ff)
>   ...
>   pcieport :02:04.0: PME# disabled
>   pcieport :02:01.0: PME# enabled
>   pcieport :02:01.0: waiting for D3hot delay of 10 ms
>   pcieport :02:04.0: PME# enabled
>   pcieport :02:04.0: waiting for D3hot delay of 10 ms
>   thunderbolt :03:00.0: restoring config space at offset 0x14 (was 0x0, 
> writing 0x8a04)
>   ...
>   thunderbolt :03:00.0: PME# disabled
>   xhci_hcd :37:00.0: restoring config space at offset 0x10 (was 0x0, 
> writing 0x73f0)
>   ...
>   xhci_hcd :37:00.0: PME# disabled
> 
> For the switch upstream port (01:00.0) we wait for 100ms but not taking
> into account the DLLLA requirement. We then wait 10ms for D3hot -> D0
> transition of the root port and the two downstream hotplug ports. This
> means that we deviate from what the spec requires.
> 
> Performing the same check for system sleep (s2idle) transitions we can
> see following when resuming from s2idle:
> 
>   pcieport :00:1b.0: power state changed by ACPI to D0
>   pcieport :00:1b.0: restoring config space at offset 0x2c (was 0x60, 
> writing 0x60)
>   ...
>   pcieport :01:00.0: restoring config space at offset 0x3c (was 0x1ff, 
> writing 0x201ff)
>   ...

I think the important thing in all the above logging is that it
doesn't show any delay, right?  If that's the case, you can just say
that in one line; I trust you even without 40 lines of config space
restore debug output :)

>   xhci_hcd :37:00.0: restoring config space at offset 0x10 (was 0x0, 
> writing 0x73f0)
>   ...
>   thunderbolt :03:00.0: restoring config space at offset 0x14 (was 0x0, 
> writing 0x8a04)
> 
> This is even worse. None of the mandatory delays are performed. If this
> would 

Re: [PATCH v2] x86/mm/pti: in pti_clone_pgtable(), increase addr properly

2019-08-23 Thread Song Liu



> On Aug 23, 2019, at 5:59 PM, Thomas Gleixner  wrote:
> 
> On Wed, 21 Aug 2019, Thomas Gleixner wrote:
>> On Wed, 21 Aug 2019, Song Liu wrote:
 On Aug 20, 2019, at 1:23 PM, Song Liu  wrote:
 
 Before 32-bit support, pti_clone_pmds() always adds PMD_SIZE to addr.
 This behavior changes after the 32-bit support:  pti_clone_pgtable()
 increases addr by PUD_SIZE for pud_none(*pud) case, and increases addr by
 PMD_SIZE for pmd_none(*pmd) case. However, this is not accurate because
 addr may not be PUD_SIZE/PMD_SIZE aligned.
 
 Fix this issue by properly rounding up addr to next PUD_SIZE/PMD_SIZE
 in these two cases.
>>> 
>>> After poking around more, I found the following doesn't really make 
>>> sense. 
>> 
>> I'm glad you figured that out yourself. Was about to write up something to
>> that effect.
>> 
>> Still interesting questions remain:
>> 
>>  1) How did you end up feeding an unaligned address into that which points
>> to a 0 PUD?
>> 
>>  2) Is this related to Facebook specific changes and unlikely to affect any
>> regular kernel? I can't come up with a way to trigger that in mainline
>> 
>>  3) As this is a user page table and the missing mapping is related to
>> mappings required by PTI, how is the machine going in/out of user
>> space in the first place? Or did I just trip over what you called
>> nonsense?
> 
> And just because this ended in silence I looked at it myself after Peter
> told me that this was on a kernel with PTI disabled. Aside of that my built
> in distrust for debug war stories combined with fairy tale changelogs
> triggered my curiousity anyway.

I am really sorry that I was silent. Somehow I didn't see this in my inbox
(or it didn't show up until just now?). 

For this patch, I really messed up this with something else. The issue we
are seeing is that kprobe on CONFIG_KPROBES_ON_FTRACE splits PMD located 
at 0x81a0. I sent another patch last night, but that might not
be the right fix either. 

I haven't started testing our PTI enabled kernel, so I am not sure whether
there is really an issue with the PTI code. 

Thanks,
Song





Re: [PATCH 00/14] per memcg lru_lock

2019-08-23 Thread Hugh Dickins
On Wed, 21 Aug 2019, Alex Shi wrote:
> 在 2019/8/21 上午2:24, Hugh Dickins 写道:
> > I'll set aside what I'm doing, and switch to rebasing ours to v5.3-rc
> > and/or mmotm.  Then compare with what Alex has, to see if there's any
> > good reason to prefer one to the other: if no good reason to prefer ours,
> > I doubt we shall bother to repost, but just use it as basis for helping
> > to review or improve Alex's.
> 
> For your review, my patchset are pretty straight and simple.
> It just use per lruvec lru_lock to replace necessary pgdat lru_lock.
> just this.  We could talk more after I back to work. :)

Sorry to be bearer of bad news, Alex, but when you said "straight and
simple", I feared that your patchset would turn out to be fundamentally
too simple.

And that is so. I have only to see the
lruvec = mem_cgroup_page_lruvec(page, pgdat);
line in isolate_migratepages_block() in mm/compaction.c, and check
that mem_cgroup_page_lruvec() is little changed in mm/mempolicy.c.

The central problem with per-memcg lru_lock is that you do not know
for sure what lock to take (which memcg a page belongs to) until you
have taken the expected lock, and then checked whether page->memcg
is still the same - backing out and trying again if not.

Fix that central problem, and you end up with a more complicated
patchset, much like ours.  It's true that when ours was first developed,
the memcg situation was more complicated in several ways, and perhaps
some aspects of our patchset could be simplified now (though I've not
identified any).  Johannes in particular has done a great deal of
simplifying work in memcg over the last few years, but there are still
situations in which a page's memcg can change (move_charge_at_immigrate
and swapin readahead spring to mind - or perhaps the latter is only an
issue when MEMCG_SWAP is not enabled, I forget; and I often wonder if
reparenting will be brought back one day).

I did not review your patchset in detail, and wasn't able to get very
far in testing it.  At first I was put off by set_task_reclaim_state
warnings from mm/vmscan.c, but those turned out to be in v5.3-rc5
itself, not from your patchset or mine (but I've not yet investigated
what's responsible for them).  Within a minute of starting swapping
load, kcompactd compact_lock_irqsave() in isolate_migratepages_block()
would deadlock, and I did not get further.  (Though I did also notice
that booting the CONFIG_MEMCG=y kernel with "cgroup_disable=memory"
froze in booting - tiresomely, one has to keep both the memcg and
no-memcg locking to cope with that case, and I guess you had not.)

Rather than duplicating effort, I would advise you to give our patchset
a try, and if it works for you, help towards getting that one merged:
but of course, it's up to you.

I've attached a tarfile of it rebased to v5.3-rc5: I do not want to
spam the list with patches yet, because I do not have any stats or
argument in support of the series, as Andrew asked for years ago and
Michal asks again now.  But aside from that I consider it ready, and
will let Shakeel take it over from here, while I get back to what I
diverted from (but of course I'll try to answer questions on it).

Thanks,
Hugh

lrulock503.tar
Description: per-memcg lru_lock on v5.3-rc5


Re: [PATCH v2] riscv: add support for SECCOMP and SECCOMP_FILTER

2019-08-23 Thread Paul Walmsley
On Fri, 23 Aug 2019, David Abdurachmanov wrote:

> On Fri, Aug 23, 2019 at 5:30 PM Paul Walmsley  
> wrote:
> >
> > On Thu, 22 Aug 2019, David Abdurachmanov wrote:
> >
> > > There is one failing kernel selftest: global.user_notification_signal
> >
> > Is this the only failing test?  Or are the rest of the selftests skipped
> > when this test fails, and no further tests are run, as seems to be shown
> > here:
> >
> >   
> > https://lore.kernel.org/linux-riscv/cadnnuqcmdmre1f+3jg8spr6jrrnbsy8vvd70vbkem0nqyeo...@mail.gmail.com/
> 
> Yes, it's a single test failing. After removing 
> global.user_notification_signal
> test everything else pass and you get the results printed.

OK.

> Well the code states ".. and hope that it doesn't break when there
> is actually a signal :)". Maybe we are just unlucky. I don't have results
> from other architectures to compare.
> 
> I found that Linaro is running selftests, but SECCOMP is disabled
> and thus it's failing. Is there another CI which tracks selftests?

0day runs the kselftests, and at least on some architectures/Kconfigs, 
it's succeeding:

https://lore.kernel.org/lkml/20190726083740.GG22106@shao2-debian/

https://lore.kernel.org/lkml/20190712064850.GC20848@shao2-debian/

https://lore.kernel.org/lkml/20190311074115.GC10839@shao2-debian/

etc.


- Paul


Re: [PATCH 5.2 000/135] 5.2.10-stable review

2019-08-23 Thread Sasha Levin

On Fri, Aug 23, 2019 at 10:36:27AM -0700, Greg KH wrote:

On Fri, Aug 23, 2019 at 02:28:53AM -0400, Sasha Levin wrote:

On Fri, Aug 23, 2019 at 02:42:48AM +0200, Stefan Lippers-Hollmann wrote:
> Hi
>
> On 2019-08-22, Greg KH wrote:
> > On Fri, Aug 23, 2019 at 12:05:27AM +0200, Stefan Lippers-Hollmann wrote:
> > > On 2019-08-22, Greg KH wrote:
> > > > On Thu, Aug 22, 2019 at 01:05:56PM -0400, Sasha Levin wrote:
> [...]
> > > It might be down to kernel.org mirroring, but the patch file doesn't
> > > seem to be available yet (404), both in the wrong location listed
> > > above - and the expected one under
> > >
> > >  
https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.2.10-rc1.gz
> [...]
> > Ah, no, it's not a mirroring problem, Sasha and I didn't know if anyone
> > was actually using the patch files anymore, so it was simpler to do a
> > release without them to see what happens. :)
> >
> > Do you rely on these, or can you use the -rc git tree or the quilt
> > series?  If you do rely on them, we will work to fix this, it just
> > involves some scripting that we didn't get done this morning.
>
> "Rely" is a strong word, I can adapt if they're going away, but
> I've been using them so far, as in (slightly simplified):
>
> $ cd patches/upstream/
> $ wget https://cdn.kernel.org/pub/linux/kernel/v5.x/patch-5.2.9.xz
> $ xz -d patch-5.2.9.xz
> $ wget 
https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.2.10-rc1.gz
> $ gunzip patch-5.2.10-rc1.gz
> $ vim ../series
> $ quilt ...
>
> I can switch to importing the quilt queue with some sed magic (and I
> already do that, if interesting or just a larger amounts of patches are
> queuing up for more than a day or two), but using the -rc patches has
> been convenient in that semi-manual workflow, also to make sure to really
> get and test the formal -rc patch, rather than something inbetween.

An easy way to generate a patch is to just use the git.kernel.org web
interface. A patch for 5.2.10-rc1 would be:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git/patch/?id=linux-5.2.y=v5.2.9

Personally this patch upload story sounded to me like a pre-git era
artifact...


Given that we no longer do patches for Linus's -rc releases for the past
few years, maybe it is time to move to do the same for the stable
releases to be consistent.


Or tarballs? Why do we generate tarballs (and go through kup)?
git.kernel.org already does it for us.

--
Thanks,
Sasha


Re: [PATCH 11/15] sched,fair: flatten hierarchical runqueues

2019-08-23 Thread Rik van Riel
On Fri, 2019-08-23 at 20:14 +0200, Dietmar Eggemann wrote:
> 
> Looks like with the se->depth related code gone here in
> pick_next_task_fair() and the call to find_matching_se() in
> check_preempt_wakeup() you could remove se->depth entirely.
> 
> [...]

I've just done that in a separate patch in my series,
in case we need it again.  If we don't, diffstat tells
us we're getting this:

 include/linux/sched.h |  1 -
 kernel/sched/fair.c   | 50 ++-
---
 2 files changed, 2 insertions(+), 49 deletions(-)

Thank you!

-- 
All Rights Reversed.


signature.asc
Description: This is a digitally signed message part


Re: [PATCH v2] riscv: add support for SECCOMP and SECCOMP_FILTER

2019-08-23 Thread David Abdurachmanov
On Fri, Aug 23, 2019 at 6:04 PM David Abdurachmanov
 wrote:
>
> On Fri, Aug 23, 2019 at 5:30 PM Paul Walmsley  
> wrote:
> >
> > On Thu, 22 Aug 2019, David Abdurachmanov wrote:
> >
> > > There is one failing kernel selftest: global.user_notification_signal
> >
> > Is this the only failing test?  Or are the rest of the selftests skipped
> > when this test fails, and no further tests are run, as seems to be shown
> > here:
> >
> >   
> > https://lore.kernel.org/linux-riscv/cadnnuqcmdmre1f+3jg8spr6jrrnbsy8vvd70vbkem0nqyeo...@mail.gmail.com/
>
> Yes, it's a single test failing. After removing 
> global.user_notification_signal
> test everything else pass and you get the results printed.
>
> >
> > For example, looking at the source, I'd naively expect to see the
> > user_notification_closed_listener test result -- which follows right
> > after the failing test in the selftest source.  But there aren't any
> > results?
>
> Yes, it hangs at this point. You have to manually terminate it.
>
> >
> > Also - could you follow up with the author of this failing test to see if
> > we can get some more clarity about what might be going wrong here?  It
> > appears that the failing test was added in commit 6a21cc50f0c7f ("seccomp:
> > add a return code to trap to userspace") by Tycho Andersen
> > .
>
> Well the code states ".. and hope that it doesn't break when there
> is actually a signal :)". Maybe we are just unlucky. I don't have results
> from other architectures to compare.
>
> I found that Linaro is running selftests, but SECCOMP is disabled
> and thus it's failing. Is there another CI which tracks selftests?
>
> https://qa-reports.linaro.org/lkft/linux-next-oe/tests/kselftest/seccomp_seccomp_bpf?top=next-20190823

Actually it seems that seccomp is enabled in kernel, but not in
systemd, and somehow seccomp_bpf is missing on all arches thus
causing automatic failure.

> >
> >
> > - Paul


Re: [PATCH v2] riscv: add support for SECCOMP and SECCOMP_FILTER

2019-08-23 Thread David Abdurachmanov
On Fri, Aug 23, 2019 at 5:30 PM Paul Walmsley  wrote:
>
> On Thu, 22 Aug 2019, David Abdurachmanov wrote:
>
> > There is one failing kernel selftest: global.user_notification_signal
>
> Is this the only failing test?  Or are the rest of the selftests skipped
> when this test fails, and no further tests are run, as seems to be shown
> here:
>
>   
> https://lore.kernel.org/linux-riscv/cadnnuqcmdmre1f+3jg8spr6jrrnbsy8vvd70vbkem0nqyeo...@mail.gmail.com/

Yes, it's a single test failing. After removing global.user_notification_signal
test everything else pass and you get the results printed.

>
> For example, looking at the source, I'd naively expect to see the
> user_notification_closed_listener test result -- which follows right
> after the failing test in the selftest source.  But there aren't any
> results?

Yes, it hangs at this point. You have to manually terminate it.

>
> Also - could you follow up with the author of this failing test to see if
> we can get some more clarity about what might be going wrong here?  It
> appears that the failing test was added in commit 6a21cc50f0c7f ("seccomp:
> add a return code to trap to userspace") by Tycho Andersen
> .

Well the code states ".. and hope that it doesn't break when there
is actually a signal :)". Maybe we are just unlucky. I don't have results
from other architectures to compare.

I found that Linaro is running selftests, but SECCOMP is disabled
and thus it's failing. Is there another CI which tracks selftests?

https://qa-reports.linaro.org/lkft/linux-next-oe/tests/kselftest/seccomp_seccomp_bpf?top=next-20190823

>
>
> - Paul


Re: WARNINGs in set_task_reclaim_state with memory cgroup and full memory usage

2019-08-23 Thread Yang Shi




On 8/23/19 3:00 PM, Adric Blake wrote:

Synopsis:
A WARN_ON_ONCE is hit twice in set_task_reclaim_state under the
following conditions:
- a memory cgroup has been created and a task assigned it it
- memory.limit_in_bytes has been set
- memory has filled up, likely from cache

In my usage, I create a cgroup under the current session scope and
assign a task to it. I then set memory.limit_in_bytes and
memory.soft_limit_in_bytes for the cgroup to reasonable values, say
1G/512M. The program accesses large files frequently and gradually
fills memory with the page cache. The warnings appears when the
entirety of the system memory is filled, presumably from other
programs.

If I wait until the program has filled the entirety of system memory
with cache and then assign a memory limit, the warnings appear
immediately.


It looks the warning is triggered because kswapd set reclaim_state then 
the memcg soft limit reclaim in the same kswapd set it again.


But, kswapd and memcg soft limit uses different reclaim_state from 
different scan control. It sounds not correct, they should use the same 
reclaim_state if they come from the same context if my understanding is 
correct.




I am building the linux git. I first noticed this issue with the
drm-tip 5.3rc3 and 5.3rc4 kernels, and tested linux master after
5.3rc5 to confirm the bug more resoundingly.

Here are the warnings.

[38491.963105] WARNING: CPU: 7 PID: 175 at mm/vmscan.c:245
set_task_reclaim_state+0x1e/0x40
[38491.963106] Modules linked in: iwlmvm mac80211 libarc4 iwlwifi
cfg80211 xt_comment nls_iso8859_1 nls_cp437 vfat fat xfs jfs btrfs xor
raid6_pq libcrc32c ccm tun rfcomm fuse xt_tcpudp ip6t_REJECT
nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_multiport xt_owner
snd_hda_codec_hdmi ip6table_filter ip6_tables iptable_filter bnep ext4
crc32c_generic mbcache jbd2 snd_hda_codec_realtek
snd_hda_codec_generic snd_soc_skl snd_soc_hdac_hda snd_hda_ext_core
snd_soc_skl_ipc x86_pkg_temp_thermal intel_powerclamp snd_soc_sst_ipc
coretemp snd_soc_sst_dsp snd_soc_acpi_intel_match kvm_intel
snd_soc_acpi i915 snd_soc_core kvm snd_compress ac97_bus
snd_pcm_dmaengine snd_hda_intel i2c_algo_bit btusb irqbypass
drm_kms_helper btrtl snd_hda_codec dell_laptop btbcm crct10dif_pclmul
snd_hda_core crc32c_intel btintel iTCO_wdt ghash_clmulni_intel drm
ledtrig_audio aesni_intel iTCO_vendor_support snd_hwdep dell_wmi
rtsx_usb_ms r8169 dell_smbios aes_x86_64 mei_hdcp crypto_simd
intel_gtt bluetooth snd_pcm cryptd dcdbas
[38491.963155]  wmi_bmof dell_wmi_descriptor intel_rapl_msr
glue_helper snd_timer joydev intel_cstate snd realtek memstick
dell_smm_hwmon mousedev psmouse input_leds libphy intel_uncore
ecdh_generic ecc crc16 rfkill intel_rapl_perf soundcore i2c_i801
agpgart mei_me tpm_crb syscopyarea sysfillrect sysimgblt mei
intel_xhci_usb_role_switch fb_sys_fops idma64 tpm_tis roles
processor_thermal_device intel_rapl_common i2c_hid tpm_tis_core
int3403_thermal intel_soc_dts_iosf battery wmi intel_lpss_pci
intel_lpss intel_pch_thermal tpm int3400_thermal int3402_thermal
acpi_thermal_rel int340x_thermal_zone rng_core intel_hid ac
sparse_keymap evdev mac_hid crypto_user ip_tables x_tables
hid_multitouch rtsx_usb_sdmmc mmc_core rtsx_usb hid_logitech_hidpp
sr_mod cdrom sd_mod uas usb_storage hid_logitech_dj hid_generic usbhid
hid ahci serio_raw libahci atkbd libps2 libata xhci_pci scsi_mod
xhci_hcd crc32_pclmul i8042 serio f2fs [last unloaded: cfg80211]
[38491.963221] CPU: 7 PID: 175 Comm: kswapd0 Not tainted
5.3.0-rc5+149+gbb7ba8069de9 #1
[38491.963222] Hardware name: Dell Inc. Inspiron 5570/09YTN7, BIOS
1.2.3 05/15/2019
[38491.963226] RIP: 0010:set_task_reclaim_state+0x1e/0x40
[38491.963228] Code: 78 a9 e7 ff 0f 1f 84 00 00 00 00 00 0f 1f 44 00
00 55 48 89 f5 53 48 89 fb 48 85 ed 48 8b 83 08 08 00 00 74 11 48 85
c0 74 02 <0f> 0b 48 89 ab 08 08 00 00 5b 5d c3 48 85 c0 75 f1 0f 0b 48
89 ab
[38491.963229] RSP: 0018:8c898031fc60 EFLAGS: 00010286
[38491.963230] RAX: 8c898031fe28 RBX: 892aa04ddc40 RCX: 
[38491.963231] RDX: 8c898031fc60 RSI: 8c898031fcd0 RDI: 892aa04ddc40
[38491.963233] RBP: 8c898031fcd0 R08: 8c898031fd48 R09: 89279674b800
[38491.963234] R10:  R11:  R12: 8c898031fd48
[38491.963235] R13: 892a842ef000 R14: 892aaf7fc000 R15: 
[38491.963236] FS:  () GS:892aa33c()
knlGS:
[38491.963238] CS:  0010 DS:  ES:  CR0: 80050033
[38491.963239] CR2: 7f90628fa000 CR3: 00027ee0a002 CR4: 003606e0
[38491.963239] Call Trace:
[38491.963246]  mem_cgroup_shrink_node+0x9b/0x1d0
[38491.963250]  mem_cgroup_soft_limit_reclaim+0x10c/0x3a0
[38491.963254]  balance_pgdat+0x276/0x540
[38491.963258]  kswapd+0x200/0x3f0
[38491.963261]  ? wait_woken+0x80/0x80
[38491.963265]  kthread+0xfd/0x130
[38491.963267]  ? balance_pgdat+0x540/0x540
[38491.963269]  ? kthread_park+0x80/0x80
[38491.963273]  ret_from_fork+0x35/0x40

Re: [PATCH v2] x86/mm/pti: in pti_clone_pgtable(), increase addr properly

2019-08-23 Thread Thomas Gleixner
On Wed, 21 Aug 2019, Thomas Gleixner wrote:
> On Wed, 21 Aug 2019, Song Liu wrote:
> > > On Aug 20, 2019, at 1:23 PM, Song Liu  wrote:
> > > 
> > > Before 32-bit support, pti_clone_pmds() always adds PMD_SIZE to addr.
> > > This behavior changes after the 32-bit support:  pti_clone_pgtable()
> > > increases addr by PUD_SIZE for pud_none(*pud) case, and increases addr by
> > > PMD_SIZE for pmd_none(*pmd) case. However, this is not accurate because
> > > addr may not be PUD_SIZE/PMD_SIZE aligned.
> > > 
> > > Fix this issue by properly rounding up addr to next PUD_SIZE/PMD_SIZE
> > > in these two cases.
> > 
> > After poking around more, I found the following doesn't really make 
> > sense. 
> 
> I'm glad you figured that out yourself. Was about to write up something to
> that effect.
> 
> Still interesting questions remain:
> 
>   1) How did you end up feeding an unaligned address into that which points
>  to a 0 PUD?
> 
>   2) Is this related to Facebook specific changes and unlikely to affect any
>  regular kernel? I can't come up with a way to trigger that in mainline
> 
>   3) As this is a user page table and the missing mapping is related to
>  mappings required by PTI, how is the machine going in/out of user
>  space in the first place? Or did I just trip over what you called
>  nonsense?

And just because this ended in silence I looked at it myself after Peter
told me that this was on a kernel with PTI disabled. Aside of that my built
in distrust for debug war stories combined with fairy tale changelogs
triggered my curiousity anyway.

So that cannot be a kernel with PTI disabled compile time because in that
case the functions are not available, unless it's FB hackery which I do not
care about.

So the only way this can be reached is when PTI is configured but disabled
at boot time via pti=off or nopti.

For some silly reason and that goes back to before the 32bit support and
Joern did not notice either when he introduced pti_finalize() this results
in the following functions being called unconditionallY:

 pti_clone_entry_text()
 pti_clone_kernel_text()

pti_clone_kernel_text() was called unconditionally before the 32bit support
already and the only reason why it did not have any effect in that
situation is that it invokes pti_kernel_image_global_ok() and that returns
false when PTI is disabled on the kernel command line. Oh well. It
obviously never checked whether X86_FEATURE_PTI was disabled or enabled in
the first place.

Now 32bit moved that around into pti_finalize() and added the call to
pti_clone_entry_text() which just runs unconditionally.

Now there is still the interesting question why this matters. The to be
cloned page table entries are mapped and the start address even if
unaligned never points to something unmapped. The unmapped case only covers
holes and holes are obviously aligned at the upper levels even if the
address of the hole is unaligned.

So colour me still confused what's wrong there but the proper fix is the
trivial:

--- a/arch/x86/mm/pti.c
+++ b/arch/x86/mm/pti.c
@@ -666,6 +666,8 @@ void __init pti_init(void)
  */
 void pti_finalize(void)
 {
+   if (!boot_cpu_has(X86_FEATURE_PTI))
+   return;
/*
 * We need to clone everything (again) that maps parts of the
 * kernel image.

Hmm?

I'm going to look whether that makes any difference in the page tables
tomorrow with brain awake, but I wanted to share this before the .us
vanishes into the weekend :)

Thanks,

tglx



Re: [PATCH V5 1/3] riscv: Add perf callchain support

2019-08-23 Thread Guo Ren
Please check CONFIG_FRAME_POINTER

1 *frame = *((struct stackframe *)frame->fp - 1);
This code is origionally from riscv/kernel/stacktrace.c: walk_stackframe

In linux/Makefile it'll involve the options for gcc to definitely
store ra & prev_fp in fp pointer.
ifdef CONFIG_FRAME_POINTER
KBUILD_CFLAGS += -fno-omit-frame-pointer -fno-optimize-sibling-calls

So --call-graph fp need depends on CONFIG_FRAME_POINTER.

On Fri, Aug 23, 2019 at 4:56 PM Greentime Hu  wrote:
>
> Hi Mao,
>
> Mao Han  於 2019年8月23日 週五 下午2:16寫道:
>
> >
> > This patch add support for perf callchain sampling on riscv platform.
> > The return address of leaf function is retrieved from pt_regs as
> > it is not saved in the outmost frame.
> >
> > Signed-off-by: Mao Han 
> > Cc: Paul Walmsley 
> > Cc: Greentime Hu 
> > Cc: Palmer Dabbelt 
> > Cc: linux-riscv 
> > Cc: Christoph Hellwig 
> > Cc: Guo Ren 
> > ---
> >  arch/riscv/Makefile|   3 +
> >  arch/riscv/kernel/Makefile |   3 +-
> >  arch/riscv/kernel/perf_callchain.c | 115 
> > +
> >  3 files changed, 120 insertions(+), 1 deletion(-)
> >  create mode 100644 arch/riscv/kernel/perf_callchain.c
>
> I just tested "./perf record -e cpu-clock --call-graph fp ls" on
> Unleashed board and I got this failure.
> I take a look at it. It seem failed in here. Do you have any idea?
> It seems fine in Qemu.
>
> 1 *frame = *((struct stackframe *)frame->fp - 1);
> ffe0001a198c: 00863a83 ld s5,8(a2)
> ffe0001a1990: ff093903 ld s2,-16(s2)
>
> ./perf record -e cpu-clock --call-graph fp ls
> [ 9619.423884] hrtimer: interrupt took 733000 ns
> [ 9619.977017] Unable to handle kernel paging request at virtual
> address ff94
> [ 9620.214736] Oops [#1]
> [ 9620.289893] Modules linked in:
> [ 9620.391378] CPU: 0 PID: 264 Comm: ls Not tainted
> 5.3.0-rc5-3-gb008f6bcd67c #4
> [ 9620.640176] sepc: ffe0001a198c ra : ffe0001a199a sp :
> ffe93720
> [ 9620.880366] gp : ffe00097dad8 tp : ffe82e40 t0 : 
> 00046000
> [ 9621.120564] t1 : 0002 t2 : 0007 s0 : 
> ffe93760
> [ 9621.360768] s1 : ffe93788 a0 : 0003 a1 : 
> 
> [ 9621.600991] a2 : ff8c a3 : 1fa0 a4 : 
> 0010
> [ 9621.841181] a5 : 0002 a6 : 0001 a7 : 
> ffe079b34e10
> [ 9622.081400] s2 : ff9c s3 : ffe0 s4 : 
> 1ff8
> [ 9622.321618] s5 : ffe93da0 s6 : ffe00097d540 s7 : 
> ffe07a1517a0
> [ 9622.561811] s8 : 08bf01c7ff60 s9 : ffe000235b2a s10: 
> 00020120
> [ 9622.802015] s11: 0001 t3 : ffe079b34e00 t4 : 
> 0001
> [ 9623.042194] t5 : 0008 t6 : ffe0009208d0
> [ 9623.218785] sstatus: 00020100 sbadaddr: ff94
> scause: 000d
> [ 9623.490850] ---[ end trace 49043f28e856d84d ]---
> [ 9623.644217] Kernel panic - not syncing: Fatal exception in interrupt
> [ 9623.855470] SMP: stopping secondary CPUs
> [ 9623.985955] ---[ end Kernel panic - not syncing: Fatal exception in
> interrupt ]---



-- 
Best Regards
 Guo Ren

ML: https://lore.kernel.org/linux-csky/


Re: [PATCH v5 12/13] pwm: mediatek: remove a property "has-clock"

2019-08-23 Thread Uwe Kleine-König
On Thu, Aug 22, 2019 at 02:58:42PM +0800, Sam Shih wrote:
> Due to we added clock-frequency property to fix
> mt7628 pwm during configure from userspace.
> We can alos use this property to determine whether
> the complex clock tree exists in the SoC or not.
> So we can safety remove has-clock property in the
> driver specific data.

Some suggestions in short form:

s/Due/Since/
s/alos/also/

Also please use more horizontal space, up to 76 chars per line is fine.

Other than that I suggest to first address the feedback for the earlier
patches as the needed changes there has influence on this patch.

Best regards
Uwe

-- 
Pengutronix e.K.   | Uwe Kleine-König|
Industrial Linux Solutions | http://www.pengutronix.de/  |


Re: [PATCH v5 11/13] dt-bindings: pwm: update bindings for MT7629 SoC

2019-08-23 Thread Uwe Kleine-König
On Thu, Aug 22, 2019 at 02:58:41PM +0800, Sam Shih wrote:
> From: Ryder Lee 
> 
> This updates bindings for MT7629 pwm controller.
> 
> This patch is the same as
> https://patchwork.kernel.org/patch/10769381/
> and it has a Reviewed-by tag in v1

This paragraph doesn't belong in the commit log.
> 
> Signed-off-by: Ryder Lee 
> Signed-off-by: Sam Shih 
> Reviewed-by: Matthias Brugger 

Thanks
Uwe

-- 
Pengutronix e.K.   | Uwe Kleine-König|
Industrial Linux Solutions | http://www.pengutronix.de/  |


Re: [PATCH v5 10/13] arm: dts: mt7623: add a property "num-pwms" for PWM

2019-08-23 Thread Uwe Kleine-König
On Thu, Aug 22, 2019 at 02:58:40PM +0800, Sam Shih wrote:
> From: Ryder Lee 
> 
> This adds a property "num-pwms" for PWM controller.
> 
> Signed-off-by: Ryder Lee 
> Signed-off-by: Sam Shih 
> ---
>  arch/arm/boot/dts/mt7623.dtsi | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/arch/arm/boot/dts/mt7623.dtsi b/arch/arm/boot/dts/mt7623.dtsi
> index a79f0b6c3429..208e0d19a575 100644
> --- a/arch/arm/boot/dts/mt7623.dtsi
> +++ b/arch/arm/boot/dts/mt7623.dtsi
> @@ -452,6 +452,7 @@
>< CLK_PERI_PWM5>;
>   clock-names = "top", "main", "pwm1", "pwm2",
> "pwm3", "pwm4", "pwm5";
> + num-pwms = <5>;
>   status = "disabled";
>   };

FTR: The matching change to the binding is patch 7 in this series and
didn't get an Ack from the dt people yet.

Best regards
Uwe


-- 
Pengutronix e.K.   | Uwe Kleine-König|
Industrial Linux Solutions | http://www.pengutronix.de/  |


Re: [PATCH v5 09/13] arm64: dts: mt7622: add a property "num-pwms" for PWM

2019-08-23 Thread Uwe Kleine-König
On Thu, Aug 22, 2019 at 02:58:39PM +0800, Sam Shih wrote:
> From: Ryder Lee 
> 
> This adds a property "num-pwms" for PWM controller.
> 
> Signed-off-by: Ryder Lee 
> Signed-off-by: Sam Shih 
> ---
>  arch/arm64/boot/dts/mediatek/mt7622.dtsi | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/arch/arm64/boot/dts/mediatek/mt7622.dtsi 
> b/arch/arm64/boot/dts/mediatek/mt7622.dtsi
> index d1e13d340e26..9a043938881f 100644
> --- a/arch/arm64/boot/dts/mediatek/mt7622.dtsi
> +++ b/arch/arm64/boot/dts/mediatek/mt7622.dtsi
> @@ -439,6 +439,7 @@
>< CLK_PERI_PWM6_PD>;
>   clock-names = "top", "main", "pwm1", "pwm2", "pwm3", "pwm4",
> "pwm5", "pwm6";
> + num-pwms = <6>;
>   status = "disabled";
>   };

FTR: The matching change to the binding is patch 7 in this series and
didn't get an Ack from the dt people yet.

Best regards
Uwe

-- 
Pengutronix e.K.   | Uwe Kleine-König|
Industrial Linux Solutions | http://www.pengutronix.de/  |


Re: [PATCH v5 08/13] dt-bindings: pwm: update bindings for MT7628 SoC

2019-08-23 Thread Uwe Kleine-König
On Thu, Aug 22, 2019 at 04:12:20PM +0800, Yingjoe Chen wrote:
> On Thu, 2019-08-22 at 14:58 +0800, Sam Shih wrote:
> > This updates bindings for MT7628 pwm controller.
> > 
> > Signed-off-by: Sam Shih 
> > ---
> >  Documentation/devicetree/bindings/pwm/pwm-mediatek.txt | 4 
> >  1 file changed, 4 insertions(+)
> > 
> > diff --git a/Documentation/devicetree/bindings/pwm/pwm-mediatek.txt 
> > b/Documentation/devicetree/bindings/pwm/pwm-mediatek.txt
> > index ea95b490a913..93980e3da261 100644
> > --- a/Documentation/devicetree/bindings/pwm/pwm-mediatek.txt
> > +++ b/Documentation/devicetree/bindings/pwm/pwm-mediatek.txt
> > @@ -21,6 +21,10 @@ Required properties:
> > See pinctrl/pinctrl-bindings.txt for details of the property values.
> >   - num-pwms: the number of PWM channels.
> > +
> > + Optional properties:
> > + - clock-frequency: fix clock frequency, this is only used in MT7628 SoC
> > +for period calculation. This SoC has no complex clock 
> > tree.
> 
> I'm sorry if this has been discussed before. 
> 
> Would it be simpler if you just provide a fixed-clock as clock in device
> tree? This way you don't need this optional properties and don't need to
> special handle it in driver code. 
> 
> After all, the hw is still connected to a simple clock tree.

This is what I just wrote in reply to patch 3 (which implements handling
the clock-frequency property) even before reading your feedback. So I
fully agree.

Best regards
Uwe

-- 
Pengutronix e.K.   | Uwe Kleine-König|
Industrial Linux Solutions | http://www.pengutronix.de/  |


Re: [PATCH v5 06/13] pwm: mediatek: update license and switch to SPDX tag

2019-08-23 Thread Uwe Kleine-König
On Thu, Aug 22, 2019 at 02:58:36PM +0800, Sam Shih wrote:
> Add SPDX identifiers to pwm-mediatek.c
> Update license to GNU General Public License v2.0
> 
> Signed-off-by: Ryder Lee 
> Signed-off-by: Sam Shih 

Reviewed-by: Uwe Kleine-König 

Thanks
Uwe

-- 
Pengutronix e.K.   | Uwe Kleine-König|
Industrial Linux Solutions | http://www.pengutronix.de/  |


Re: [PATCH v5 05/13] pwm: mediatek: use pwm_mediatek as common prefix

2019-08-23 Thread Uwe Kleine-König
On Thu, Aug 22, 2019 at 02:58:35PM +0800, Sam Shih wrote:
> Use pwm_mediatek as common prefix to match the filename.
> No functional change intended.
> 
> Signed-off-by: Ryder Lee 
> Signed-off-by: Sam Shih 
> ---
> Changes since v5:
> - Follow reviewers's comments
> The license stuff is a separate change

this is a nice cleanup, I like it.

Acked-by: Uwe Kleine-König 

Thanks
Uwe

-- 
Pengutronix e.K.   | Uwe Kleine-König|
Industrial Linux Solutions | http://www.pengutronix.de/  |


Re: [PATCH v5 04/13] pwm: mediatek: allocate the clks array dynamically

2019-08-23 Thread Uwe Kleine-König
On Thu, Aug 22, 2019 at 02:58:34PM +0800, Sam Shih wrote:
> Instead of using fixed size of arrays, allocate the memory for them
> based on the information we get from the DT.
> 
> Also remove the check for num_pwms, due to dynamically allocate pwm
> should not cause array index out of bound.
> 
> Signed-off-by: Ryder Lee 
> Signed-off-by: Sam Shih 

Reviewed-by: Uwe Kleine-König 

Thanks
Uwe

-- 
Pengutronix e.K.   | Uwe Kleine-König|
Industrial Linux Solutions | http://www.pengutronix.de/  |


Re: [PATCH v5 03/13] pwm: mediatek: add a property "clock-frequency"

2019-08-23 Thread Uwe Kleine-König
Hello Sam,

On Thu, Aug 22, 2019 at 02:58:33PM +0800, Sam Shih wrote:
> This fix mt7628 pwm during configure from userspace. The SoC
> is legacy MIPS and has no complex clock tree. This patch add property
> clock-frequency to the SoC specific data and legacy MIPS SoC need to
> configure it in DT. This property is use for period calculation.
> 
> We will improve this fix by droping has-clks attribute and using
> clock-frequency to do the same thing in a new patch.
> 
> Signed-off-by: Ryder Lee 
> Signed-off-by: Sam Shih 

I wonder if instead the platform could provide some dummy and optional
clocks.

You could add a fixed-clock for the clk that is used to determine the
clock rate and switch to devm_clk_get_optional for the others.

Best regards
Uwe

-- 
Pengutronix e.K.   | Uwe Kleine-König|
Industrial Linux Solutions | http://www.pengutronix.de/  |


[CFP] Real-Time Summit 2019 Call for Presentations

2019-08-23 Thread Daniel Bristot de Oliveira
The Real-Time Summit is organized by the Linux Foundation Real-Time Linux (RTL)
collaborative project. The event is intended to gather developers and users of
Linux as a Real-Time Operating System. The main intent is to provide room for
discussion between developers, tooling experts, and users.

The summit will take place alongside the Open Source Summit + Embedded Linux
Conference Europe 2019 in Lyon, France. The summit is planned the day after the
main conference, Thursday, October 31st, 2019, from 8:00 to 17:00 at the
conference venue. If you are already considering your travel arrangements for
the Open Source Summit + Embedded Linux Conference Europe 2019 in Lyon, France,
and you have a general interest in this topic, please extend your travel by one
day to be in Lyon on Thursday, 31st.

If you are interested to present, please submit a proposal [1] before September
14th, 2019, at 23:59 EST. Please provide a title, an abstract describing the
proposed talk (900 characters maximum), a short biography (900 characters
maximum), and describe the targeted audience (900 characters maximum). Please
indicate the slot length you are aiming for: The format is a single track with
presentation slots of 30, 45 or 60 minutes long. Considering that the
presentation should use at most half of the slot time, leaving the rest of the
slot reserved for discussion. The focus of this event is the discussion.

We are welcoming presentations from both end-users and developers, on topics
covering, but not limited to:

 - Real-time Linux development
 - Real-time Linux evaluation
 - Real-time Linux use cases (Success and failures)
 - Real-time Linux tooling (tracing, configuration, …)
 - Real-time Linux academic work, already presented or under development, for
   direct feedback from practitioners community.

Those can cover recently available technologies, ongoing work, and new ideas.

Important Notes for Speakers:

 - All speakers are required to adhere to the Linux Foundation events’ Code of
   Conduct. We also highly recommend that speakers take the Linux Foundation
   online Inclusive Speaker Orientation Course.

 - Avoid sales or marketing pitches and discussing unlicensed or potentially
   closed-source technologies when preparing your proposal; these talks are
   almost always rejected due to the fact that they take away from the integrity
   of our events, and are rarely well-received by conference attendees.

 - All accepted speakers are required to submit their slides prior to the
event.

Submission must be received by 11:59 pm PST on September 14th, 2019

[1] Submission page: https://forms.gle/yQeqyrtJYezM5VRJA

Important Dates:

  - CFP Close: Saturday, September 14th, 2019, 11:59PM PST
  - Speaker notification: September 21st, 2019
  - Conference: Thursday, October 31st, 2019

Questions on submitting a proposal?
Email Daniel Bristot de Oliveira 

Thanks!
-- Daniel


Re: [PATCH v2] riscv: add support for SECCOMP and SECCOMP_FILTER

2019-08-23 Thread Paul Walmsley
On Thu, 22 Aug 2019, David Abdurachmanov wrote:

> There is one failing kernel selftest: global.user_notification_signal

Is this the only failing test?  Or are the rest of the selftests skipped 
when this test fails, and no further tests are run, as seems to be shown 
here:

  
https://lore.kernel.org/linux-riscv/cadnnuqcmdmre1f+3jg8spr6jrrnbsy8vvd70vbkem0nqyeo...@mail.gmail.com/

For example, looking at the source, I'd naively expect to see the 
user_notification_closed_listener test result -- which follows right 
after the failing test in the selftest source.  But there aren't any 
results?

Also - could you follow up with the author of this failing test to see if 
we can get some more clarity about what might be going wrong here?  It 
appears that the failing test was added in commit 6a21cc50f0c7f ("seccomp: 
add a return code to trap to userspace") by Tycho Andersen 
.


- Paul


Re: [PATCH 0/2] coresight: Add barrier packet when moving offset forward

2019-08-23 Thread Yabin Cui
Thanks for fixing this problem. I didn't realize it because I usually use a
buffer size >= the default ETR buffer size, which is harder to reproduce the
problem.
The patches LGTM, maybe you also want to fix the problem commented by Leo Yan.
I tested the patches by recording etm data with a buffer size smaller than the
default ETR buffer size. Then I saw barrier packets when decoding with OpenCSD.
And I could decode successfully without error message.


Re: [PATCH v5 01/13] pwm: mediatek: add a property "num-pwms"

2019-08-23 Thread Uwe Kleine-König
On Thu, Aug 22, 2019 at 02:58:31PM +0800, Sam Shih wrote:
> From: Ryder Lee 
> 
> This adds a property "num-pwms" to avoid having an endless
> list of compatibles with no differences for the same driver.
> 
> Signed-off-by: Ryder Lee 
> Signed-off-by: Sam Shih 

Reviewed-by: Uwe Kleine-König 

Thanks
Uwe

-- 
Pengutronix e.K.   | Uwe Kleine-König|
Industrial Linux Solutions | http://www.pengutronix.de/  |


Re: [PATCH v5 02/13] pwm: mediatek: droping the check for of_device_get_match_data

2019-08-23 Thread Uwe Kleine-König
On Thu, Aug 22, 2019 at 02:58:32PM +0800, Sam Shih wrote:
> This patch drop the check for of_device_get_match_data.
> Due to the only way call driver probe is compatible match.
> The .data pointer which point to the SoC specify data is
> directly set by driver, and it should not be NULL in our case.
> We can safety remove the check for of_device_get_match_data.
> 
> Signed-off-by: Ryder Lee 
> Signed-off-by: Sam Shih 

Acked-by: Uwe Kleine-König 

Thanks
Uwe

-- 
Pengutronix e.K.   | Uwe Kleine-König|
Industrial Linux Solutions | http://www.pengutronix.de/  |


[PATCH 1/2] media: imx: Move capture device init to registered

2019-08-23 Thread Steve Longerbeam
If the CSI is unregistered and then registered again without the
driver being removed and re-probed (which will happen when the media
device is removed and re-probed without also removing/re-probing the
CSI), the result is the kobject error and backtrace "tried to init an
initialized object". This is because the video device is left in an
initialized state after being unregistered, thus the video device's
underlying kobject is also left in an initialized state when the device
is registered again.

Fix this by moving imx_media_capture_device_init() and _remove()
into csi_registered() and csi_unregistered(). This will create a new
un-initialized video device when the CSI is re-registered. Do this for
all the subdevices that register a capture device.

Reported-by: Russell King 
Signed-off-by: Steve Longerbeam 
---
 drivers/staging/media/imx/imx-ic-prpencvf.c | 24 -
 drivers/staging/media/imx/imx-media-csi.c   | 20 ++---
 drivers/staging/media/imx/imx7-media-csi.c  | 22 +--
 3 files changed, 37 insertions(+), 29 deletions(-)

diff --git a/drivers/staging/media/imx/imx-ic-prpencvf.c 
b/drivers/staging/media/imx/imx-ic-prpencvf.c
index 67ffa46a8e96..301f5fce53c0 100644
--- a/drivers/staging/media/imx/imx-ic-prpencvf.c
+++ b/drivers/staging/media/imx/imx-ic-prpencvf.c
@@ -1271,17 +1271,26 @@ static int prp_registered(struct v4l2_subdev *sd)
if (ret)
return ret;
 
+   priv->vdev = imx_media_capture_device_init(ic_priv->ipu_dev,
+  _priv->sd,
+  PRPENCVF_SRC_PAD);
+   if (IS_ERR(priv->vdev))
+   return PTR_ERR(priv->vdev);
+
ret = imx_media_capture_device_register(priv->vdev);
if (ret)
-   return ret;
+   goto remove_vdev;
 
ret = prp_init_controls(priv);
if (ret)
-   goto unreg;
+   goto unreg_vdev;
 
return 0;
-unreg:
+
+unreg_vdev:
imx_media_capture_device_unregister(priv->vdev);
+remove_vdev:
+   imx_media_capture_device_remove(priv->vdev);
return ret;
 }
 
@@ -1290,6 +1299,8 @@ static void prp_unregistered(struct v4l2_subdev *sd)
struct prp_priv *priv = sd_to_priv(sd);
 
imx_media_capture_device_unregister(priv->vdev);
+   imx_media_capture_device_remove(priv->vdev);
+
v4l2_ctrl_handler_free(>ctrl_hdlr);
 }
 
@@ -1336,12 +1347,6 @@ static int prp_init(struct imx_ic_priv *ic_priv)
spin_lock_init(>irqlock);
timer_setup(>eof_timeout_timer, prp_eof_timeout, 0);
 
-   priv->vdev = imx_media_capture_device_init(ic_priv->ipu_dev,
-  _priv->sd,
-  PRPENCVF_SRC_PAD);
-   if (IS_ERR(priv->vdev))
-   return PTR_ERR(priv->vdev);
-
mutex_init(>lock);
 
return 0;
@@ -1352,7 +1357,6 @@ static void prp_remove(struct imx_ic_priv *ic_priv)
struct prp_priv *priv = ic_priv->task_priv;
 
mutex_destroy(>lock);
-   imx_media_capture_device_remove(priv->vdev);
 }
 
 struct imx_ic_ops imx_ic_prpencvf_ops = {
diff --git a/drivers/staging/media/imx/imx-media-csi.c 
b/drivers/staging/media/imx/imx-media-csi.c
index 367e39f5b382..b3f1cf08a102 100644
--- a/drivers/staging/media/imx/imx-media-csi.c
+++ b/drivers/staging/media/imx/imx-media-csi.c
@@ -1797,12 +1797,22 @@ static int csi_registered(struct v4l2_subdev *sd)
if (ret)
goto free_fim;
 
+   priv->vdev = imx_media_capture_device_init(priv->sd.dev,
+  >sd,
+  CSI_SRC_PAD_IDMAC);
+   if (IS_ERR(priv->vdev)) {
+   ret = PTR_ERR(priv->vdev);
+   goto free_fim;
+   }
+
ret = imx_media_capture_device_register(priv->vdev);
if (ret)
-   goto free_fim;
+   goto remove_vdev;
 
return 0;
 
+remove_vdev:
+   imx_media_capture_device_remove(priv->vdev);
 free_fim:
if (priv->fim)
imx_media_fim_free(priv->fim);
@@ -1816,6 +1826,7 @@ static void csi_unregistered(struct v4l2_subdev *sd)
struct csi_priv *priv = v4l2_get_subdevdata(sd);
 
imx_media_capture_device_unregister(priv->vdev);
+   imx_media_capture_device_remove(priv->vdev);
 
if (priv->fim)
imx_media_fim_free(priv->fim);
@@ -1963,11 +1974,6 @@ static int imx_csi_probe(struct platform_device *pdev)
imx_media_grp_id_to_sd_name(priv->sd.name, sizeof(priv->sd.name),
priv->sd.grp_id, ipu_get_num(priv->ipu));
 
-   priv->vdev = imx_media_capture_device_init(priv->sd.dev, >sd,
-  CSI_SRC_PAD_IDMAC);
-   if (IS_ERR(priv->vdev))
-   return PTR_ERR(priv->vdev);
-

[PATCH 2/2] media: imx: Move pads init to probe

2019-08-23 Thread Steve Longerbeam
If a subdevice is unregistered and then registered again without the
driver being removed and re-probed (which will happen when the media
device is removed and re-probed without also removing/re-probing the
subdevice), media_device_register_entity() is called with a non-zero
entity->num_pads, and then the subdevice's .registered callback calls
media_entity_pads_init(). Thus the subdevice's pad objects are added
to the media device pad list twice, causing list corruption.

One way to fix this would be to create media_entity_pads_destroy(),
and call it in the subdevice's .unregistered callback. But calling
media_entity_pads_init() in the .registered callbacks was done for
legacy reasons and is no longer necessary, so move the call to
media_entity_pads_init() into the subdevice's probe functions. This
fixes the duplicate pad obejcts in the media device pad list.

Signed-off-by: Steve Longerbeam 
---
 drivers/staging/media/imx/imx-ic-prp.c| 25 ++--
 drivers/staging/media/imx/imx-ic-prpencvf.c   | 29 ++-
 drivers/staging/media/imx/imx-media-capture.c | 15 +-
 drivers/staging/media/imx/imx-media-csi.c | 21 +++---
 drivers/staging/media/imx/imx-media-vdic.c| 27 +++--
 drivers/staging/media/imx/imx6-mipi-csi2.c| 27 -
 drivers/staging/media/imx/imx7-media-csi.c| 18 +++-
 drivers/staging/media/imx/imx7-mipi-csis.c| 23 +--
 8 files changed, 82 insertions(+), 103 deletions(-)

diff --git a/drivers/staging/media/imx/imx-ic-prp.c 
b/drivers/staging/media/imx/imx-ic-prp.c
index 35e60a120dc1..2a4f77e83ed3 100644
--- a/drivers/staging/media/imx/imx-ic-prp.c
+++ b/drivers/staging/media/imx/imx-ic-prp.c
@@ -428,32 +428,19 @@ static int prp_s_frame_interval(struct v4l2_subdev *sd,
return 0;
 }
 
-/*
- * retrieve our pads parsed from the OF graph by the media device
- */
 static int prp_registered(struct v4l2_subdev *sd)
 {
struct prp_priv *priv = sd_to_priv(sd);
-   int i, ret;
u32 code;
 
-   for (i = 0; i < PRP_NUM_PADS; i++) {
-   priv->pad[i].flags = (i == PRP_SINK_PAD) ?
-   MEDIA_PAD_FL_SINK : MEDIA_PAD_FL_SOURCE;
-   }
-
/* init default frame interval */
priv->frame_interval.numerator = 1;
priv->frame_interval.denominator = 30;
 
/* set a default mbus format  */
imx_media_enum_ipu_format(, 0, CS_SEL_YUV);
-   ret = imx_media_init_mbus_fmt(>format_mbus, 640, 480, code,
- V4L2_FIELD_NONE, NULL);
-   if (ret)
-   return ret;
-
-   return media_entity_pads_init(>entity, PRP_NUM_PADS, priv->pad);
+   return imx_media_init_mbus_fmt(>format_mbus, 640, 480, code,
+  V4L2_FIELD_NONE, NULL);
 }
 
 static const struct v4l2_subdev_pad_ops prp_pad_ops = {
@@ -487,6 +474,7 @@ static const struct v4l2_subdev_internal_ops 
prp_internal_ops = {
 static int prp_init(struct imx_ic_priv *ic_priv)
 {
struct prp_priv *priv;
+   int i;
 
priv = devm_kzalloc(ic_priv->ipu_dev, sizeof(*priv), GFP_KERNEL);
if (!priv)
@@ -496,7 +484,12 @@ static int prp_init(struct imx_ic_priv *ic_priv)
ic_priv->task_priv = priv;
priv->ic_priv = ic_priv;
 
-   return 0;
+   for (i = 0; i < PRP_NUM_PADS; i++)
+   priv->pad[i].flags = (i == PRP_SINK_PAD) ?
+   MEDIA_PAD_FL_SINK : MEDIA_PAD_FL_SOURCE;
+
+   return media_entity_pads_init(_priv->sd.entity, PRP_NUM_PADS,
+ priv->pad);
 }
 
 static void prp_remove(struct imx_ic_priv *ic_priv)
diff --git a/drivers/staging/media/imx/imx-ic-prpencvf.c 
b/drivers/staging/media/imx/imx-ic-prpencvf.c
index 301f5fce53c0..09c4e3f33807 100644
--- a/drivers/staging/media/imx/imx-ic-prpencvf.c
+++ b/drivers/staging/media/imx/imx-ic-prpencvf.c
@@ -1240,21 +1240,16 @@ static int prp_s_frame_interval(struct v4l2_subdev *sd,
return 0;
 }
 
-/*
- * retrieve our pads parsed from the OF graph by the media device
- */
 static int prp_registered(struct v4l2_subdev *sd)
 {
struct prp_priv *priv = sd_to_priv(sd);
+   struct imx_ic_priv *ic_priv = priv->ic_priv;
int i, ret;
u32 code;
 
+   /* set a default mbus format  */
+   imx_media_enum_ipu_format(, 0, CS_SEL_YUV);
for (i = 0; i < PRPENCVF_NUM_PADS; i++) {
-   priv->pad[i].flags = (i == PRPENCVF_SINK_PAD) ?
-   MEDIA_PAD_FL_SINK : MEDIA_PAD_FL_SOURCE;
-
-   /* set a default mbus format  */
-   imx_media_enum_ipu_format(, 0, CS_SEL_YUV);
ret = imx_media_init_mbus_fmt(>format_mbus[i],
  640, 480, code, V4L2_FIELD_NONE,
  >cc[i]);
@@ -1266,11 +1261,6 @@ static int prp_registered(struct v4l2_subdev *sd)

[PATCH] ARM: dts: vf610-zii-scu4-aib: Use generic names for DT nodes

2019-08-23 Thread Andrey Smirnov
The devicetree specification recommends using generic node names.

Some ZII dts files already follow such recommendation, but some don't,
so use generic node names for consistency among the ZII dts files.

Signed-off-by: Andrey Smirnov 
Cc: Shawn Guo 
Cc: Chris Healy 
Cc: Cory Tusar 
Cc: Fabio Estevam 
Cc: linux-arm-ker...@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
---
 arch/arm/boot/dts/vf610-zii-scu4-aib.dts | 24 
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/arch/arm/boot/dts/vf610-zii-scu4-aib.dts 
b/arch/arm/boot/dts/vf610-zii-scu4-aib.dts
index 45a978defbdc..6edd682dbd29 100644
--- a/arch/arm/boot/dts/vf610-zii-scu4-aib.dts
+++ b/arch/arm/boot/dts/vf610-zii-scu4-aib.dts
@@ -417,7 +417,7 @@
pinctrl-0 = <_dspi1>;
status = "okay";
 
-   spi-flash@0 {
+   flash@0 {
#address-cells = <1>;
#size-cells = <1>;
compatible = "jedec,spi-nor";
@@ -430,7 +430,7 @@
};
};
 
-   spi-flash@1 {
+   flash@1 {
#address-cells = <1>;
#size-cells = <1>;
compatible = "jedec,spi-nor";
@@ -519,7 +519,7 @@
#gpio-cells = <2>;
};
 
-   lm75@48 {
+   temp-sensor@48 {
compatible = "national,lm75";
reg = <0x48>;
};
@@ -534,7 +534,7 @@
reg = <0x52>;
};
 
-   ds1682@6b {
+   elapsed-time-recorder@6b {
compatible = "dallas,ds1682";
reg = <0x6b>;
};
@@ -551,7 +551,7 @@
reg = <0x38>;
};
 
-   adt7411@4a {
+   adc@4a {
compatible = "adi,adt7411";
reg = <0x4a>;
};
@@ -563,7 +563,7 @@
pinctrl-0 = <_i2c2>;
status = "okay";
 
-   gpio9: sx1503q@20 {
+   gpio9: io-expander@20 {
compatible = "semtech,sx1503q";
pinctrl-names = "default";
pinctrl-0 = <_sx1503_20>;
@@ -574,12 +574,12 @@
interrupts = <31 IRQ_TYPE_EDGE_FALLING>;
};
 
-   lm75@4e {
+   temp-sensor@4e {
compatible = "national,lm75";
reg = <0x4e>;
};
 
-   lm75@4f {
+   temp-sensor@4f {
compatible = "national,lm75";
reg = <0x4f>;
};
@@ -591,17 +591,17 @@
reg = <0x23>;
};
 
-   adt7411@4a {
+   adc@4a {
compatible = "adi,adt7411";
reg = <0x4a>;
};
 
-   at24c08@54 {
+   eeprom@54 {
compatible = "atmel,24c08";
reg = <0x54>;
};
 
-   tca9548@70 {
+   i2c-mux@70 {
compatible = "nxp,pca9548";
pinctrl-names = "default";
#address-cells = <1>;
@@ -639,7 +639,7 @@
};
};
 
-   tca9548@71 {
+   i2c-mux@71 {
compatible = "nxp,pca9548";
pinctrl-names = "default";
reg = <0x71>;
-- 
2.21.0



Re: [PATCH] pwm: mxs: use devm_platform_ioremap_resource() to simplify code

2019-08-23 Thread Uwe Kleine-König
Hello,

On Tue, Aug 20, 2019 at 05:56:40AM +, Anson Huang wrote:
> Gentle ping...

My impression[1] is that Thierry collects patches in bulk once or twice
per release cycle. The last two such bulks were between 5.2-rc6 and
5.2-rc7 and in the 5.2 merge window. So given we're at v5.3-rc5 now I
expect some action soon :-)

> > > From: anson.hu...@nxp.com 
> > > Sent: Thursday, July 18, 2019 9:32 AM
> > >
> > > Use the new helper devm_platform_ioremap_resource() which wraps the
> > > platform_get_resource() and devm_ioremap_resource() together, to
> > > simplify the code.
> > >
> > > Signed-off-by: Anson Huang 
> > 
> > Reviewed-by: Dong Aisheng 

Acked-by: Uwe Kleine-König 

Best regards
Uwe

[1] from git log --committer=Thierry --format=%ci drivers/pwm | cut -d\  -f1 | 
uniq -c
-- 
Pengutronix e.K.   | Uwe Kleine-König|
Industrial Linux Solutions | http://www.pengutronix.de/  |


[PATCH] ARM: dts: vf610-zii-scu4-aib: Configure IRQ line for GPIO expander

2019-08-23 Thread Andrey Smirnov
Configure IRQ line for SX1503 GPIO expander. We already have
appropriate pinmux entry and all that is missing is "interrupt-parent"
and "interrupts" properties. Add them.

Signed-off-by: Andrey Smirnov 
Cc: Shawn Guo 
Cc: Chris Healy 
Cc: Cory Tusar 
Cc: Fabio Estevam 
Cc: linux-arm-ker...@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
---
 arch/arm/boot/dts/vf610-zii-scu4-aib.dts | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/arm/boot/dts/vf610-zii-scu4-aib.dts 
b/arch/arm/boot/dts/vf610-zii-scu4-aib.dts
index e6c3621079e0..45a978defbdc 100644
--- a/arch/arm/boot/dts/vf610-zii-scu4-aib.dts
+++ b/arch/arm/boot/dts/vf610-zii-scu4-aib.dts
@@ -570,6 +570,8 @@
#gpio-cells = <2>;
reg = <0x20>;
gpio-controller;
+   interrupt-parent = <>;
+   interrupts = <31 IRQ_TYPE_EDGE_FALLING>;
};
 
lm75@4e {
-- 
2.21.0



Re: [PATCH] uprobes/x86: fix detection of 32-bit user mode

2019-08-23 Thread Thomas Gleixner
On Fri, 23 Aug 2019, Andy Lutomirski wrote:
> > On Aug 23, 2019, at 5:03 PM, Thomas Gleixner  wrote:
> > 
> >> On Sat, 24 Aug 2019, Thomas Gleixner wrote:
> >> On Fri, 23 Aug 2019, Andy Lutomirski wrote:
>  On Aug 23, 2019, at 4:44 PM, Thomas Gleixner  wrote:
>  
> >> On Sat, 24 Aug 2019, Thomas Gleixner wrote:
> >> On Sun, 28 Jul 2019, Sebastian Mayr wrote:
> >> 
> >> -static inline int sizeof_long(void)
> >> +static inline int sizeof_long(struct pt_regs *regs)
> >> {
> >> -return in_ia32_syscall() ? 4 : 8;
> > 
> > This wants a comment.
> > 
> >> +return user_64bit_mode(regs) ? 8 : 4;
>  
>  The more simpler one liner is to check
>  
>    test_thread_flag(TIF_IA32)
> >>> 
> >>> I still want to finish killing TIF_IA32 some day.  Let’s please not add 
> >>> new users.
> >> 
> >> Well, yes and no. This needs to be backported 
> > 
> > And TBH the magic in user_64bit_mode() is not pretty either.
> > 
> It’s only magic on Xen. I should probably stick a
> cpu_feature_enabled(X86_FEATURE_XENPV) in there instead.

For backporting sake I really prefer the TIF version. One usage site more
is not the end of the world. We can add the user_64bit_mode() variant from
Sebastian on top as a cleanup right away so mainline is clean.

Thanks,

tglx

RE: [PATCH v2 03/10] PCI: designware-ep: Move the function of getting MSI capability forward

2019-08-23 Thread Xiaowei Bao


> -Original Message-
> From: Andrew Murray 
> Sent: 2019年8月23日 21:39
> To: Xiaowei Bao 
> Cc: bhelg...@google.com; robh...@kernel.org; mark.rutl...@arm.com;
> shawn...@kernel.org; Leo Li ; kis...@ti.com;
> lorenzo.pieral...@arm.co; a...@arndb.de; gre...@linuxfoundation.org; M.h.
> Lian ; Mingkai Hu ; Roy
> Zang ; jingooh...@gmail.com;
> gustavo.pimen...@synopsys.com; linux-...@vger.kernel.org;
> devicet...@vger.kernel.org; linux-kernel@vger.kernel.org;
> linux-arm-ker...@lists.infradead.org; linuxppc-...@lists.ozlabs.org
> Subject: Re: [PATCH v2 03/10] PCI: designware-ep: Move the function of
> getting MSI capability forward
> 
> On Thu, Aug 22, 2019 at 07:22:35PM +0800, Xiaowei Bao wrote:
> > Move the function of getting MSI capability to the front of init
> > function, because the init function of the EP platform driver will use
> > the return value by the function of getting MSI capability.
> >
> > Signed-off-by: Xiaowei Bao 
> 
> Reviewed-by: Andrew Murray 

Thanks a lot, I think move this to ep_init is better.

> 
> > ---
> > v2:
> >  - No change.
> >
> >  drivers/pci/controller/dwc/pcie-designware-ep.c | 7 ---
> >  1 file changed, 4 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/pci/controller/dwc/pcie-designware-ep.c
> > b/drivers/pci/controller/dwc/pcie-designware-ep.c
> > index b8388f8..0a6c199 100644
> > --- a/drivers/pci/controller/dwc/pcie-designware-ep.c
> > +++ b/drivers/pci/controller/dwc/pcie-designware-ep.c
> > @@ -656,6 +656,10 @@ int dw_pcie_ep_init(struct dw_pcie_ep *ep)
> > if (ret < 0)
> > epc->max_functions = 1;
> >
> > +   ep->msi_cap = dw_pcie_ep_find_capability(pci, PCI_CAP_ID_MSI);
> > +
> > +   ep->msix_cap = dw_pcie_ep_find_capability(pci, PCI_CAP_ID_MSIX);
> > +
> > if (ep->ops->ep_init)
> > ep->ops->ep_init(ep);
> >
> > @@ -672,9 +676,6 @@ int dw_pcie_ep_init(struct dw_pcie_ep *ep)
> > dev_err(dev, "Failed to reserve memory for MSI/MSI-X\n");
> > return -ENOMEM;
> > }
> > -   ep->msi_cap = dw_pcie_ep_find_capability(pci, PCI_CAP_ID_MSI);
> > -
> > -   ep->msix_cap = dw_pcie_ep_find_capability(pci, PCI_CAP_ID_MSIX);
> >
> > offset = dw_pcie_ep_find_ext_capability(pci, PCI_EXT_CAP_ID_REBAR);
> > if (offset) {
> > --
> > 2.9.5
> >


Re: [RFC PATCH v2 00/19] RDMA/FS DAX truncate proposal V1,000,002 ;-)

2019-08-23 Thread Dave Chinner
On Fri, Aug 23, 2019 at 10:15:04AM -0700, Ira Weiny wrote:
> On Fri, Aug 23, 2019 at 10:59:14AM +1000, Dave Chinner wrote:
> > On Wed, Aug 21, 2019 at 11:02:00AM -0700, Ira Weiny wrote:
> > > On Tue, Aug 20, 2019 at 08:55:15AM -0300, Jason Gunthorpe wrote:
> > > > On Tue, Aug 20, 2019 at 11:12:10AM +1000, Dave Chinner wrote:
> > > > > On Mon, Aug 19, 2019 at 09:38:41AM -0300, Jason Gunthorpe wrote:
> > > > > > On Mon, Aug 19, 2019 at 07:24:09PM +1000, Dave Chinner wrote:
> > > > > > 
> > > > > > > So that leaves just the normal close() syscall exit case, where 
> > > > > > > the
> > > > > > > application has full control of the order in which resources are
> > > > > > > released. We've already established that we can block in this
> > > > > > > context.  Blocking in an interruptible state will allow fatal 
> > > > > > > signal
> > > > > > > delivery to wake us, and then we fall into the
> > > > > > > fatal_signal_pending() case if we get a SIGKILL while blocking.
> > > > > > 
> > > > > > The major problem with RDMA is that it doesn't always wait on 
> > > > > > close() for the
> > > > > > MR holding the page pins to be destoyed. This is done to avoid a
> > > > > > deadlock of the form:
> > > > > > 
> > > > > >uverbs_destroy_ufile_hw()
> > > > > >   mutex_lock()
> > > > > >[..]
> > > > > > mmput()
> > > > > >  exit_mmap()
> > > > > >   remove_vma()
> > > > > >fput();
> > > > > > file_operations->release()
> > > > > 
> > > > > I think this is wrong, and I'm pretty sure it's an example of why
> > > > > the final __fput() call is moved out of line.
> > > > 
> > > > Yes, I think so too, all I can say is this *used* to happen, as we
> > > > have special code avoiding it, which is the code that is messing up
> > > > Ira's lifetime model.
> > > > 
> > > > Ira, you could try unraveling the special locking, that solves your
> > > > lifetime issues?
> > > 
> > > Yes I will try to prove this out...  But I'm still not sure this fully 
> > > solves
> > > the problem.
> > > 
> > > This only ensures that the process which has the RDMA context (RDMA FD) 
> > > is safe
> > > with regard to hanging the close for the "data file FD" (the file which 
> > > has
> > > pinned pages) in that _same_ process.  But what about the scenario.
> > > 
> > > Process A has the RDMA context FD and data file FD (with lease) open.
> > > 
> > > Process A uses SCM_RIGHTS to pass the RDMA context FD to Process B.
> > 
> > Passing the RDMA context dependent on a file layout lease to another
> > process that doesn't have a file layout lease or a reference to the
> > original lease should be considered a violation of the layout lease.
> > Process B does not have an active layout lease, and so by the rules
> > of layout leases, it is not allowed to pin the layout of the file.
> > 
> 
> I don't disagree with the semantics of this.  I just don't know how to enforce
> it.
> 
> > > Process A attempts to exit (hangs because data file FD is pinned).
> > > 
> > > Admin kills process A.  kill works because we have allowed for it...
> > > 
> > > Process B _still_ has the RDMA context FD open _and_ therefore still 
> > > holds the
> > > file pins.
> > > 
> > > Truncation still fails.
> > > 
> > > Admin does not know which process is holding the pin.
> > > 
> > > What am I missing?
> > 
> > Application does not hold the correct file layout lease references.
> > Passing the fd via SCM_RIGHTS to a process without a layout lease
> > is equivalent to not using layout leases in the first place.
> 
> Ok, So If I understand you correctly you would support a failure of SCM_RIGHTS
> in this case?  I'm ok with that but not sure how to implement it right now.
> 
> To that end, I would like to simplify this slightly because I'm not convinced
> that SCM_RIGHTS is a problem we need to solve right now.  ie I don't know of a
> user who wants to do this.

I don't think we can support it, let alone want to. SCM_RIGHTS was a
mistake made years ago that has been causing bugs and complexity to
try and avoid those bugs ever since.  I'm only taking about it
because someone else raised it and I asummed they raised it because
they want it to "work".

> Right now duplication via SCM_RIGHTS could fail if _any_ file pins (and by
> definition leases) exist underneath the "RDMA FD" (or other direct access FD,
> like XDP etc) being duplicated.

Sounds like a fine idea to me.

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com


RE: [PATCH v2 08/10] PCI: layerscape: Add EP mode support for ls1088a and ls2088a

2019-08-23 Thread Xiaowei Bao


> -Original Message-
> From: Andrew Murray 
> Sent: 2019年8月23日 22:28
> To: Xiaowei Bao 
> Cc: bhelg...@google.com; robh...@kernel.org; mark.rutl...@arm.com;
> shawn...@kernel.org; Leo Li ; kis...@ti.com;
> lorenzo.pieral...@arm.co; a...@arndb.de; gre...@linuxfoundation.org; M.h.
> Lian ; Mingkai Hu ; Roy
> Zang ; jingooh...@gmail.com;
> gustavo.pimen...@synopsys.com; linux-...@vger.kernel.org;
> devicet...@vger.kernel.org; linux-kernel@vger.kernel.org;
> linux-arm-ker...@lists.infradead.org; linuxppc-...@lists.ozlabs.org
> Subject: Re: [PATCH v2 08/10] PCI: layerscape: Add EP mode support for
> ls1088a and ls2088a
> 
> On Thu, Aug 22, 2019 at 07:22:40PM +0800, Xiaowei Bao wrote:
> > Add PCIe EP mode support for ls1088a and ls2088a, there are some
> > difference between LS1 and LS2 platform, so refactor the code of the
> > EP driver.
> >
> > Signed-off-by: Xiaowei Bao 
> > ---
> > v2:
> >  - New mechanism for layerscape EP driver.
> 
> Was there a v1 of this patch?

Yes, but I don't know how to comments, ^_^

> 
> >
> >  drivers/pci/controller/dwc/pci-layerscape-ep.c | 76
> > --
> >  1 file changed, 58 insertions(+), 18 deletions(-)
> >
> > diff --git a/drivers/pci/controller/dwc/pci-layerscape-ep.c
> > b/drivers/pci/controller/dwc/pci-layerscape-ep.c
> > index 7ca5fe8..2a66f07 100644
> > --- a/drivers/pci/controller/dwc/pci-layerscape-ep.c
> > +++ b/drivers/pci/controller/dwc/pci-layerscape-ep.c
> > @@ -20,27 +20,29 @@
> >
> >  #define PCIE_DBI2_OFFSET   0x1000  /* DBI2 base address*/
> >
> > -struct ls_pcie_ep {
> > -   struct dw_pcie  *pci;
> > -   struct pci_epc_features *ls_epc;
> > +#define to_ls_pcie_ep(x)   dev_get_drvdata((x)->dev)
> > +
> > +struct ls_pcie_ep_drvdata {
> > +   u32 func_offset;
> > +   const struct dw_pcie_ep_ops *ops;
> > +   const struct dw_pcie_ops*dw_pcie_ops;
> >  };
> >
> > -#define to_ls_pcie_ep(x)   dev_get_drvdata((x)->dev)
> > +struct ls_pcie_ep {
> > +   struct dw_pcie  *pci;
> > +   struct pci_epc_features *ls_epc;
> > +   const struct ls_pcie_ep_drvdata *drvdata; };
> >
> >  static int ls_pcie_establish_link(struct dw_pcie *pci)  {
> > return 0;
> >  }
> >
> > -static const struct dw_pcie_ops ls_pcie_ep_ops = {
> > +static const struct dw_pcie_ops dw_ls_pcie_ep_ops = {
> > .start_link = ls_pcie_establish_link,  };
> >
> > -static const struct of_device_id ls_pcie_ep_of_match[] = {
> > -   { .compatible = "fsl,ls-pcie-ep",},
> > -   { },
> > -};
> > -
> >  static const struct pci_epc_features*  ls_pcie_ep_get_features(struct
> > dw_pcie_ep *ep)  { @@ -82,10 +84,44 @@ static int
> > ls_pcie_ep_raise_irq(struct dw_pcie_ep *ep, u8 func_no,
> > }
> >  }
> >
> > -static const struct dw_pcie_ep_ops pcie_ep_ops = {
> > +static unsigned int ls_pcie_ep_func_conf_select(struct dw_pcie_ep *ep,
> > +   u8 func_no)
> > +{
> > +   struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
> > +   struct ls_pcie_ep *pcie = to_ls_pcie_ep(pci);
> > +   u8 header_type;
> > +
> > +   header_type = ioread8(pci->dbi_base + PCI_HEADER_TYPE);
> > +
> > +   if (header_type & (1 << 7))
> > +   return pcie->drvdata->func_offset * func_no;
> > +   else
> > +   return 0;
> 
> It looks like there isn't a PCI define for multi function, the nearest I 
> could find
> was PCI_HEADER_TYPE_MULTIDEVICE in hotplug/ibmphp.h. A comment
> above the test might be helpful to explain the test.

Yes, I have not find the PCI_HEADER_TYPE_MULTIDEVICE define. OK, I will add
The comments in next version patch.

> 
> As the ls_pcie_ep_drvdata structures are static, the unset .func_offset will 
> be
> initialised to 0, so you could just drop the test above.

OK, thanks

> 
> However something to the effect of the following may help spot
> misconfiguration:
> 
> WARN_ON(func_no && !pcie->drvdata->func_offset); return
> pcie->drvdata->func_offset * func_no;

Thanks a lot, this looks better.

> 
> The WARN is probably quite useful as if you are attempting to use non-zero
> functions and func_offset isn't set - then things may appear to work normally
> but actually will break horribly.

got it, thanks.

> 
> Thanks,
> 
> Andrew Murray
> 
> > +}
> > +
> > +static const struct dw_pcie_ep_ops ls_pcie_ep_ops = {
> > .ep_init = ls_pcie_ep_init,
> > .raise_irq = ls_pcie_ep_raise_irq,
> > .get_features = ls_pcie_ep_get_features,
> > +   .func_conf_select = ls_pcie_ep_func_conf_select, };
> > +
> > +static const struct ls_pcie_ep_drvdata ls1_ep_drvdata = {
> > +   .ops = _pcie_ep_ops,
> > +   .dw_pcie_ops = _ls_pcie_ep_ops,
> > +};
> > +
> > +static const struct ls_pcie_ep_drvdata ls2_ep_drvdata = {
> > +   .func_offset = 0x2,
> > +   .ops = _pcie_ep_ops,
> > +   .dw_pcie_ops = _ls_pcie_ep_ops,
> > +};
> > +
> > +static const struct of_device_id ls_pcie_ep_of_match[] = {
> > +   { .compatible = "fsl,ls1046a-pcie-ep", .data = _ep_drvdata },
> > +   { 

RE: [PATCH 3/3] nvme: complete request in work queue on CPU with flooded interrupts

2019-08-23 Thread Long Li
>>>Subject: Re: [PATCH 3/3] nvme: complete request in work queue on CPU
>>>with flooded interrupts
>>>
>>>
 Sagi,

 Here are the test results.

 Benchmark command:
 fio --bs=4k --ioengine=libaio --iodepth=64
 --
>>>filename=/dev/nvme0n1:/dev/nvme1n1:/dev/nvme2n1:/dev/nvme3n1:/d
>>>ev/nv

>>>me4n1:/dev/nvme5n1:/dev/nvme6n1:/dev/nvme7n1:/dev/nvme8n1:/dev
>>>/nvme9n1
 --direct=1 --runtime=90 --numjobs=80 --rw=randread --name=test
 --group_reporting --gtod_reduce=1

 With your patch: 1720k IOPS
 With threaded interrupts: 1320k IOPS
 With just interrupts: 3720k IOPS

 Interrupts are the fastest but we need to find a way to throttle it.
>>>
>>>This is the workload that generates the flood?
>>>If so I did not expect that this would be the perf difference..
>>>
>>>If completions keep pounding on the cpu, I would expect irqpoll to simply
>>>keep running forever and poll the cqs. There is no fundamental reason why
>>>polling would be faster in an interrupt, what matters could be:
>>>1. we reschedule more than we need to
>>>2. we don't reap enough completions in every poll round, which will trigger
>>>rearming the interrupt and then when it fires reschedule another softirq...
>>>

Yes I think it's the rescheduling that takes some. With the patch there are 
lots of ksoftirqd activities. (compared to nearly none without the patch)
A 90 seconds FIO run shows a big difference of context switches on all CPUs:
With patch: 5755849
Without patch: 1462931

>>>Maybe we need to take care of some irq_poll optimizations?
>>>
>>>Does this (untested) patch make any difference?
>>>--
>>>diff --git a/lib/irq_poll.c b/lib/irq_poll.c index 2f17b488d58e..0e94183eba15
>>>100644
>>>--- a/lib/irq_poll.c
>>>+++ b/lib/irq_poll.c
>>>@@ -12,7 +12,8 @@
>>>  #include 
>>>  #include 
>>>
>>>-static unsigned int irq_poll_budget __read_mostly = 256;
>>>+static unsigned int irq_poll_budget __read_mostly = 3000; unsigned int
>>>+__read_mostly irqpoll_budget_usecs = 2000;
>>>
>>>  static DEFINE_PER_CPU(struct list_head, blk_cpu_iopoll);
>>>
>>>@@ -77,32 +78,26 @@ EXPORT_SYMBOL(irq_poll_complete);
>>>
>>>  static void __latent_entropy irq_poll_softirq(struct softirq_action *h)
>>>  {
>>>-   struct list_head *list = this_cpu_ptr(_cpu_iopoll);
>>>-   int rearm = 0, budget = irq_poll_budget;
>>>-   unsigned long start_time = jiffies;
>>>+   struct list_head *irqpoll_list = this_cpu_ptr(_cpu_iopoll);
>>>+   unsigned int budget = irq_poll_budget;
>>>+   unsigned long time_limit =
>>>+   jiffies + usecs_to_jiffies(irqpoll_budget_usecs);
>>>+   LIST_HEAD(list);
>>>
>>> local_irq_disable();
>>>+   list_splice_init(irqpoll_list, );
>>>+   local_irq_enable();
>>>
>>>-   while (!list_empty(list)) {
>>>+   while (!list_empty()) {
>>> struct irq_poll *iop;
>>> int work, weight;
>>>
>>>-   /*
>>>-* If softirq window is exhausted then punt.
>>>-*/
>>>-   if (budget <= 0 || time_after(jiffies, start_time)) {
>>>-   rearm = 1;
>>>-   break;
>>>-   }
>>>-
>>>-   local_irq_enable();
>>>-
>>> /* Even though interrupts have been re-enabled, this
>>>  * access is safe because interrupts can only add new
>>>  * entries to the tail of this list, and only ->poll()
>>>  * calls can remove this head entry from the list.
>>>  */
>>>-   iop = list_entry(list->next, struct irq_poll, list);
>>>+   iop = list_first_entry(, struct irq_poll, list);
>>>
>>> weight = iop->weight;
>>> work = 0;
>>>@@ -111,8 +106,6 @@ static void __latent_entropy irq_poll_softirq(struct
>>>softirq_action *h)
>>>
>>> budget -= work;
>>>
>>>-   local_irq_disable();
>>>-
>>> /*
>>>  * Drivers must not modify the iopoll state, if they
>>>  * consume their assigned weight (or more, some drivers 
>>> can't @@
>>>-125,11 +118,21 @@ static void __latent_entropy irq_poll_softirq(struct
>>>softirq_action *h)
>>> if (test_bit(IRQ_POLL_F_DISABLE, >state))
>>> __irq_poll_complete(iop);
>>> else
>>>-   list_move_tail(>list, list);
>>>+   list_move_tail(>list, );
>>> }
>>>+
>>>+   /*
>>>+* If softirq window is exhausted then punt.
>>>+*/
>>>+   if (budget <= 0 || time_after_eq(jiffies, time_limit))
>>>+   break;
>>> }
>>>
>>>-   if (rearm)
>>>+   local_irq_disable();
>>>+
>>>+   list_splice_tail_init(irqpoll_list, );
>>>+   list_splice(, irqpoll_list);
>>>+   if 

Re: [PATCH] uprobes/x86: fix detection of 32-bit user mode

2019-08-23 Thread Andy Lutomirski



> On Aug 23, 2019, at 5:03 PM, Thomas Gleixner  wrote:
> 
>> On Sat, 24 Aug 2019, Thomas Gleixner wrote:
>> On Fri, 23 Aug 2019, Andy Lutomirski wrote:
 On Aug 23, 2019, at 4:44 PM, Thomas Gleixner  wrote:
 
>> On Sat, 24 Aug 2019, Thomas Gleixner wrote:
>> On Sun, 28 Jul 2019, Sebastian Mayr wrote:
>> 
>> -static inline int sizeof_long(void)
>> +static inline int sizeof_long(struct pt_regs *regs)
>> {
>> -return in_ia32_syscall() ? 4 : 8;
> 
> This wants a comment.
> 
>> +return user_64bit_mode(regs) ? 8 : 4;
 
 The more simpler one liner is to check
 
   test_thread_flag(TIF_IA32)
>>> 
>>> I still want to finish killing TIF_IA32 some day.  Let’s please not add new 
>>> users.
>> 
>> Well, yes and no. This needs to be backported 
> 
> And TBH the magic in user_64bit_mode() is not pretty either.
> 
> 

It’s only magic on Xen. I should probably stick a 
cpu_feature_enabled(X86_FEATURE_XENPV) in there instead.

Re: [RFC PATCH v2 00/19] RDMA/FS DAX truncate proposal V1,000,002 ;-)

2019-08-23 Thread Dave Chinner
On Fri, Aug 23, 2019 at 09:04:29AM -0300, Jason Gunthorpe wrote:
> On Fri, Aug 23, 2019 at 01:23:45PM +1000, Dave Chinner wrote:
> 
> > > But the fact that RDMA, and potentially others, can "pass the
> > > pins" to other processes is something I spent a lot of time trying to 
> > > work out.
> > 
> > There's nothing in file layout lease architecture that says you
> > can't "pass the pins" to another process.  All the file layout lease
> > requirements say is that if you are going to pass a resource for
> > which the layout lease guarantees access for to another process,
> > then the destination process already have a valid, active layout
> > lease that covers the range of the pins being passed to it via the
> > RDMA handle.
> 
> How would the kernel detect and enforce this? There are many ways to
> pass a FD.

AFAIC, that's not really a kernel problem. It's more of an
application design constraint than anything else. i.e. if the app
passes the IB context to another process without a lease, then the
original process is still responsible for recalling the lease and
has to tell that other process to release the IB handle and it's
resources.

> IMHO it is wrong to try and create a model where the file lease exists
> independently from the kernel object relying on it. In other words the
> IB MR object itself should hold a reference to the lease it relies
> upon to function properly.

That still doesn't work. Leases are not individually trackable or
reference counted objects objects - they are attached to a struct
file bUt, in reality, they are far more restricted than a struct
file.

That is, a lease specifically tracks the pid and the _open fd_ it
was obtained for, so it is essentially owned by a specific process
context. Hence a lease is not able to be passed to a separate
process context and have it still work correctly for lease break
notifications.  i.e. the layout break signal gets delivered to
original process that created the struct file, if it still exists
and has the original fd still open. It does not get sent to the
process that currently holds a reference to the IB context.

So while a struct file passed to another process might still have
an active lease, and you can change the owner of the struct file
via fcntl(F_SETOWN), you can't associate the existing lease with a
the new fd in the new process and so layout break signals can't be
directed at the lease fd

This really means that a lease can only be owned by a single process
context - it can't be shared across multiple processes (so I was
wrong about dup/pass as being a possible way of passing them)
because there's only one process that can "own" a struct file, and
that where signals are sent when the lease needs to be broken.

So, fundamentally, if you want to pass a resource that pins a file
layout between processes, both processes need to hold a layout lease
on that file range. And that means exclusive leases and passing
layouts between processes are fundamentally incompatible because you
can't hold two exclusive leases on the same file range

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com


RE: [PATCH v2 07/10] PCI: layerscape: Modify the MSIX to the doorbell way

2019-08-23 Thread Xiaowei Bao


> -Original Message-
> From: Andrew Murray 
> Sent: 2019年8月23日 21:58
> To: Xiaowei Bao 
> Cc: bhelg...@google.com; robh...@kernel.org; mark.rutl...@arm.com;
> shawn...@kernel.org; Leo Li ; kis...@ti.com;
> lorenzo.pieral...@arm.co; a...@arndb.de; gre...@linuxfoundation.org; M.h.
> Lian ; Mingkai Hu ; Roy
> Zang ; jingooh...@gmail.com;
> gustavo.pimen...@synopsys.com; linux-...@vger.kernel.org;
> devicet...@vger.kernel.org; linux-kernel@vger.kernel.org;
> linux-arm-ker...@lists.infradead.org; linuxppc-...@lists.ozlabs.org
> Subject: Re: [PATCH v2 07/10] PCI: layerscape: Modify the MSIX to the
> doorbell way
> 
> On Thu, Aug 22, 2019 at 07:22:39PM +0800, Xiaowei Bao wrote:
> > The layerscape platform use the doorbell way to trigger MSIX interrupt
> > in EP mode.
> >
> 
> I have no problems with this patch, however...
> 
> Are you able to add to this message a reason for why you are making this
> change? Did dw_pcie_ep_raise_msix_irq not work when func_no != 0? Or did
> it work yet dw_pcie_ep_raise_msix_irq_doorbell is more efficient?

The fact is that, this driver is verified in ls1046a platform of NXP before, 
and ls1046a don't
support MSIX feature, so I set the msix_capable of pci_epc_features struct is 
false,
but in other platform, e.g. ls1088a, it support the MSIX feature, I verified 
the MSIX
feature in ls1088a, it is not OK, so I changed to another way. Thanks.

> 
> Thanks,
> 
> Andrew Murray
> 
> > Signed-off-by: Xiaowei Bao 
> > ---
> > v2:
> >  - No change.
> >
> >  drivers/pci/controller/dwc/pci-layerscape-ep.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/pci/controller/dwc/pci-layerscape-ep.c
> > b/drivers/pci/controller/dwc/pci-layerscape-ep.c
> > index 8461f62..7ca5fe8 100644
> > --- a/drivers/pci/controller/dwc/pci-layerscape-ep.c
> > +++ b/drivers/pci/controller/dwc/pci-layerscape-ep.c
> > @@ -74,7 +74,8 @@ static int ls_pcie_ep_raise_irq(struct dw_pcie_ep *ep,
> u8 func_no,
> > case PCI_EPC_IRQ_MSI:
> > return dw_pcie_ep_raise_msi_irq(ep, func_no, interrupt_num);
> > case PCI_EPC_IRQ_MSIX:
> > -   return dw_pcie_ep_raise_msix_irq(ep, func_no, interrupt_num);
> > +   return dw_pcie_ep_raise_msix_irq_doorbell(ep, func_no,
> > + interrupt_num);
> > default:
> > dev_err(pci->dev, "UNKNOWN IRQ type\n");
> > return -EINVAL;
> > --
> > 2.9.5
> >


Re: [PATCH 01/15] sched: introduce task_se_h_load helper

2019-08-23 Thread Rik van Riel
On Fri, 2019-08-23 at 20:13 +0200, Dietmar Eggemann wrote:
> 
> 
> > @@ -1668,7 +1668,7 @@ static void task_numa_compare(struct
> > task_numa_env *env,
> > /*
> >  * In the overloaded case, try and keep the load balanced.
> >  */
> > -   load = task_h_load(env->p) - task_h_load(cur);
> > +   load = task_se_h_load(env->p->se) - task_se_h_load(cur->se);
> 
> Shouldn't this be:
> 
> load = task_se_h_load(>p->se) - task_se_h_load(>se);

Yes indeed. Sorry I forgot to fix these after you
pointed them out last time.

They are now fixed in my tree.

-- 
All Rights Reversed.


signature.asc
Description: This is a digitally signed message part


Re: [PATCH] uprobes/x86: fix detection of 32-bit user mode

2019-08-23 Thread Thomas Gleixner
On Sat, 24 Aug 2019, Thomas Gleixner wrote:
> On Fri, 23 Aug 2019, Andy Lutomirski wrote:
> > > On Aug 23, 2019, at 4:44 PM, Thomas Gleixner  wrote:
> > > 
> > >> On Sat, 24 Aug 2019, Thomas Gleixner wrote:
> > >>> On Sun, 28 Jul 2019, Sebastian Mayr wrote:
> > >>> 
> > >>> -static inline int sizeof_long(void)
> > >>> +static inline int sizeof_long(struct pt_regs *regs)
> > >>> {
> > >>> -return in_ia32_syscall() ? 4 : 8;
> > >> 
> > >>  This wants a comment.
> > >> 
> > >>> +return user_64bit_mode(regs) ? 8 : 4;
> > > 
> > > The more simpler one liner is to check
> > > 
> > >test_thread_flag(TIF_IA32)
> > 
> > I still want to finish killing TIF_IA32 some day.  Let’s please not add new 
> > users.
> 
> Well, yes and no. This needs to be backported 

And TBH the magic in user_64bit_mode() is not pretty either.

Thanks,

tglx

Re: [PATCH] uprobes/x86: fix detection of 32-bit user mode

2019-08-23 Thread Thomas Gleixner
On Fri, 23 Aug 2019, Andy Lutomirski wrote:
> > On Aug 23, 2019, at 4:44 PM, Thomas Gleixner  wrote:
> > 
> >> On Sat, 24 Aug 2019, Thomas Gleixner wrote:
> >>> On Sun, 28 Jul 2019, Sebastian Mayr wrote:
> >>> 
> >>> -static inline int sizeof_long(void)
> >>> +static inline int sizeof_long(struct pt_regs *regs)
> >>> {
> >>> -return in_ia32_syscall() ? 4 : 8;
> >> 
> >>  This wants a comment.
> >> 
> >>> +return user_64bit_mode(regs) ? 8 : 4;
> > 
> > The more simpler one liner is to check
> > 
> >test_thread_flag(TIF_IA32)
> 
> I still want to finish killing TIF_IA32 some day.  Let’s please not add new 
> users.

Well, yes and no. This needs to be backported 

RE: [PATCH v2 05/10] PCI: layerscape: Fix some format issue of the code

2019-08-23 Thread Xiaowei Bao


> -Original Message-
> From: Andrew Murray 
> Sent: 2019年8月23日 21:45
> To: Xiaowei Bao 
> Cc: bhelg...@google.com; robh...@kernel.org; mark.rutl...@arm.com;
> shawn...@kernel.org; Leo Li ; kis...@ti.com;
> lorenzo.pieral...@arm.co; a...@arndb.de; gre...@linuxfoundation.org; M.h.
> Lian ; Mingkai Hu ; Roy
> Zang ; jingooh...@gmail.com;
> gustavo.pimen...@synopsys.com; linux-...@vger.kernel.org;
> devicet...@vger.kernel.org; linux-kernel@vger.kernel.org;
> linux-arm-ker...@lists.infradead.org; linuxppc-...@lists.ozlabs.org
> Subject: Re: [PATCH v2 05/10] PCI: layerscape: Fix some format issue of the
> code
> 
> On Thu, Aug 22, 2019 at 07:22:37PM +0800, Xiaowei Bao wrote:
> > Fix some format issue of the code in EP driver.
> >
> > Signed-off-by: Xiaowei Bao 
> 
> Reviewed-by: Andrew Murray 

Thanks.

> 
> 
> > ---
> > v2:
> >  - No change.
> >
> >  drivers/pci/controller/dwc/pci-layerscape-ep.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/pci/controller/dwc/pci-layerscape-ep.c
> > b/drivers/pci/controller/dwc/pci-layerscape-ep.c
> > index be61d96..4e92a95 100644
> > --- a/drivers/pci/controller/dwc/pci-layerscape-ep.c
> > +++ b/drivers/pci/controller/dwc/pci-layerscape-ep.c
> > @@ -62,7 +62,7 @@ static void ls_pcie_ep_init(struct dw_pcie_ep *ep)
> > }
> >
> >  static int ls_pcie_ep_raise_irq(struct dw_pcie_ep *ep, u8 func_no,
> > - enum pci_epc_irq_type type, u16 interrupt_num)
> > +   enum pci_epc_irq_type type, u16 interrupt_num)
> >  {
> > struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
> >
> > @@ -86,7 +86,7 @@ static const struct dw_pcie_ep_ops pcie_ep_ops = {
> > };
> >
> >  static int __init ls_add_pcie_ep(struct ls_pcie_ep *pcie,
> > -   struct platform_device *pdev)
> > +struct platform_device *pdev)
> >  {
> > struct dw_pcie *pci = pcie->pci;
> > struct device *dev = pci->dev;
> > --
> > 2.9.5
> >


Re: [PATCH] uprobes/x86: fix detection of 32-bit user mode

2019-08-23 Thread Andy Lutomirski



> On Aug 23, 2019, at 4:44 PM, Thomas Gleixner  wrote:
> 
>> On Sat, 24 Aug 2019, Thomas Gleixner wrote:
>>> On Sun, 28 Jul 2019, Sebastian Mayr wrote:
>>> 
>>> -static inline int sizeof_long(void)
>>> +static inline int sizeof_long(struct pt_regs *regs)
>>> {
>>> -return in_ia32_syscall() ? 4 : 8;
>> 
>>  This wants a comment.
>> 
>>> +return user_64bit_mode(regs) ? 8 : 4;
> 
> The more simpler one liner is to check
> 
>test_thread_flag(TIF_IA32)

I still want to finish killing TIF_IA32 some day.  Let’s please not add new 
users.



RE: [PATCH v2 02/10] PCI: designware-ep: Add the doorbell mode of MSI-X in EP mode

2019-08-23 Thread Xiaowei Bao


> -Original Message-
> From: Andrew Murray 
> Sent: 2019年8月23日 21:36
> To: Xiaowei Bao 
> Cc: bhelg...@google.com; robh...@kernel.org; mark.rutl...@arm.com;
> shawn...@kernel.org; Leo Li ; kis...@ti.com;
> lorenzo.pieral...@arm.co; a...@arndb.de; gre...@linuxfoundation.org; M.h.
> Lian ; Mingkai Hu ; Roy
> Zang ; jingooh...@gmail.com;
> gustavo.pimen...@synopsys.com; linux-...@vger.kernel.org;
> devicet...@vger.kernel.org; linux-kernel@vger.kernel.org;
> linux-arm-ker...@lists.infradead.org; linuxppc-...@lists.ozlabs.org
> Subject: Re: [PATCH v2 02/10] PCI: designware-ep: Add the doorbell mode of
> MSI-X in EP mode
> 
> On Thu, Aug 22, 2019 at 07:22:34PM +0800, Xiaowei Bao wrote:
> > Add the doorbell mode of MSI-X in EP mode.
> >
> > Signed-off-by: Xiaowei Bao 
> > ---
> > v2:
> >  - Remove the macro of no used.
> >
> >  drivers/pci/controller/dwc/pcie-designware-ep.c | 14 ++
> >  drivers/pci/controller/dwc/pcie-designware.h| 12 
> >  2 files changed, 26 insertions(+)
> >
> > diff --git a/drivers/pci/controller/dwc/pcie-designware-ep.c
> > b/drivers/pci/controller/dwc/pcie-designware-ep.c
> > index 3e2b740..b8388f8 100644
> > --- a/drivers/pci/controller/dwc/pcie-designware-ep.c
> > +++ b/drivers/pci/controller/dwc/pcie-designware-ep.c
> > @@ -480,6 +480,20 @@ int dw_pcie_ep_raise_msi_irq(struct dw_pcie_ep
> *ep, u8 func_no,
> > return 0;
> >  }
> >
> > +int dw_pcie_ep_raise_msix_irq_doorbell(struct dw_pcie_ep *ep, u8
> func_no,
> > +  u16 interrupt_num)
> > +{
> > +   struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
> > +   u32 msg_data;
> > +
> > +   msg_data = (func_no << PCIE_MSIX_DOORBELL_PF_SHIFT) |
> > +  (interrupt_num - 1);
> > +
> > +   dw_pcie_writel_dbi(pci, PCIE_MSIX_DOORBELL, msg_data);
> > +
> > +   return 0;
> > +}
> > +
> >  int dw_pcie_ep_raise_msix_irq(struct dw_pcie_ep *ep, u8 func_no,
> >   u16 interrupt_num)
> >  {
> > diff --git a/drivers/pci/controller/dwc/pcie-designware.h
> > b/drivers/pci/controller/dwc/pcie-designware.h
> > index a0fdbf7..895a9ef 100644
> > --- a/drivers/pci/controller/dwc/pcie-designware.h
> > +++ b/drivers/pci/controller/dwc/pcie-designware.h
> > @@ -88,6 +88,9 @@
> >  #define PCIE_MISC_CONTROL_1_OFF0x8BC
> >  #define PCIE_DBI_RO_WR_EN  BIT(0)
> >
> > +#define PCIE_MSIX_DOORBELL 0x948
> > +#define PCIE_MSIX_DOORBELL_PF_SHIFT24
> > +
> >  /*
> >   * iATU Unroll-specific register definitions
> >   * From 4.80 core version the address translation will be made by
> > unroll @@ -400,6 +403,8 @@ int dw_pcie_ep_raise_msi_irq(struct
> dw_pcie_ep *ep, u8 func_no,
> >  u8 interrupt_num);
> >  int dw_pcie_ep_raise_msix_irq(struct dw_pcie_ep *ep, u8 func_no,
> >  u16 interrupt_num);
> > +int dw_pcie_ep_raise_msix_irq_doorbell(struct dw_pcie_ep *ep, u8
> func_no,
> > +  u16 interrupt_num);
> >  void dw_pcie_ep_reset_bar(struct dw_pcie *pci, enum pci_barno bar);
> > #else  static inline void dw_pcie_ep_linkup(struct dw_pcie_ep *ep) @@
> > -432,6 +437,13 @@ static inline int dw_pcie_ep_raise_msix_irq(struct
> dw_pcie_ep *ep, u8 func_no,
> > return 0;
> >  }
> >
> > +static inline int dw_pcie_ep_raise_msix_irq_doorbell(struct dw_pcie_ep
> *ep,
> > +u8 func_no,
> > +u16 interrupt_num)
> > +{
> > +   return 0;
> > +}
> > +
> 
> Looks OK to me.
> 
> Reviewed-by: Andrew Murray 

Thanks a lot.

> 
> >  static inline void dw_pcie_ep_reset_bar(struct dw_pcie *pci, enum
> > pci_barno bar)  {  }
> > --
> > 2.9.5
> >


RE: [PATCH v2 01/10] PCI: designware-ep: Add multiple PFs support for DWC

2019-08-23 Thread Xiaowei Bao


> -Original Message-
> From: Andrew Murray 
> Sent: 2019年8月23日 21:25
> To: Xiaowei Bao 
> Cc: bhelg...@google.com; robh...@kernel.org; mark.rutl...@arm.com;
> shawn...@kernel.org; Leo Li ; kis...@ti.com;
> lorenzo.pieral...@arm.co; a...@arndb.de; gre...@linuxfoundation.org; M.h.
> Lian ; Mingkai Hu ; Roy
> Zang ; jingooh...@gmail.com;
> gustavo.pimen...@synopsys.com; linux-...@vger.kernel.org;
> devicet...@vger.kernel.org; linux-kernel@vger.kernel.org;
> linux-arm-ker...@lists.infradead.org; linuxppc-...@lists.ozlabs.org
> Subject: Re: [PATCH v2 01/10] PCI: designware-ep: Add multiple PFs support
> for DWC
> 
> On Thu, Aug 22, 2019 at 07:22:33PM +0800, Xiaowei Bao wrote:
> > Add multiple PFs support for DWC, different PF have different config
> > space we use pf-offset property which get from the DTS to access the
> > different pF config space.
> 
> It looks like you're missing a --cover-letter again.
> 
> >
> > Signed-off-by: Xiaowei Bao 
> > ---
> > v2:
> >  - Remove duplicate redundant code.
> >  - Reimplement the PF config space access way.
> >
> >  drivers/pci/controller/dwc/pcie-designware-ep.c | 122
> 
> >  drivers/pci/controller/dwc/pcie-designware.c|  59 
> >  drivers/pci/controller/dwc/pcie-designware.h|  11 ++-
> >  3 files changed, 134 insertions(+), 58 deletions(-)
> >
> > diff --git a/drivers/pci/controller/dwc/pcie-designware-ep.c
> > b/drivers/pci/controller/dwc/pcie-designware-ep.c
> > index 2bf5a35..3e2b740 100644
> > --- a/drivers/pci/controller/dwc/pcie-designware-ep.c
> > +++ b/drivers/pci/controller/dwc/pcie-designware-ep.c
> > @@ -19,12 +19,17 @@ void dw_pcie_ep_linkup(struct dw_pcie_ep *ep)
> > pci_epc_linkup(epc);
> >  }
> >
> > -static void __dw_pcie_ep_reset_bar(struct dw_pcie *pci, enum pci_barno
> bar,
> > -  int flags)
> > +static void __dw_pcie_ep_reset_bar(struct dw_pcie *pci, u8 func_no,
> > +  enum pci_barno bar, int flags)
> >  {
> > u32 reg;
> > +   unsigned int func_offset = 0;
> > +   struct dw_pcie_ep *ep = >ep;
> >
> > -   reg = PCI_BASE_ADDRESS_0 + (4 * bar);
> > +   if (ep->ops->func_conf_select)
> > +   func_offset = ep->ops->func_conf_select(ep, func_no);
> > +
> > +   reg = func_offset + PCI_BASE_ADDRESS_0 + (4 * bar);
> 
> This pattern of checking if func_conf_select exists and using it to get an 
> offset
> is repeated a lot throughout this file. You could move this functionality 
> into a
> new function (similar to dw_pcie_read_dbi etc). Or perhaps a new variant of
> dw_pcie_writel_ should be created that writes takes a func_no argument.

Thanks for your comments, I thought about this method before, but there is a 
issue
about the method of access the different func config space, due to our platform 
use
this method that different func have different offset from dbi_base to access 
the
different config space, but others platform maybe use the way that write a 
register
to implement different func config space access, so I think reserve a callback 
function 
to different platform to implement the own method, my point is that, if use 
register 
method they can implement the code in this function and return offset is 0, if 
use 
offset method, they can return the offset value which can be use by dw_pcie_ep 
driver.
 
> 
> 
> > dw_pcie_dbi_ro_wr_en(pci);
> > dw_pcie_writel_dbi2(pci, reg, 0x0);
> > dw_pcie_writel_dbi(pci, reg, 0x0);
> 
> 
> > @@ -235,7 +257,7 @@ static int dw_pcie_ep_map_addr(struct pci_epc
> *epc, u8 func_no,
> > struct dw_pcie_ep *ep = epc_get_drvdata(epc);
> > struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
> >
> > -   ret = dw_pcie_ep_outbound_atu(ep, addr, pci_addr, size);
> > +   ret = dw_pcie_ep_outbound_atu(ep, func_no, addr, pci_addr, size);
> > if (ret) {
> > dev_err(pci->dev, "Failed to enable address\n");
> > return ret;
> > @@ -249,11 +271,15 @@ static int dw_pcie_ep_get_msi(struct pci_epc
> *epc, u8 func_no)
> > struct dw_pcie_ep *ep = epc_get_drvdata(epc);
> > struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
> > u32 val, reg;
> > +   unsigned int func_offset = 0;
> > +
> > +   if (ep->ops->func_conf_select)
> > +   func_offset = ep->ops->func_conf_select(ep, func_no);
> >
> > if (!ep->msi_cap)
> > return -EINVAL;
> >
> > -   reg = ep->msi_cap + PCI_MSI_FLAGS;
> > +   reg = ep->msi_cap + func_offset + PCI_MSI_FLAGS;
> 
> This makes me nervous.
> 
> From a PCI viewpoint, each function has it's own capability structure and
> within each function there may exist a MSI capability. Yet what we're doing
> here is using dw_pcie_ep_find_capability to get the list of capabilities for
> function 0, and then applying offsets from that for subsequent functions. I.e.
> we're applying DW specific knowledge to find the correct capability, rather
> than following the general PCI approach.
> 
> I think the above hunk shouldn't be required - but 

Re: [PATCH] uprobes/x86: fix detection of 32-bit user mode

2019-08-23 Thread Thomas Gleixner
On Sat, 24 Aug 2019, Thomas Gleixner wrote:
> On Sun, 28 Jul 2019, Sebastian Mayr wrote:
> 
> > -static inline int sizeof_long(void)
> > +static inline int sizeof_long(struct pt_regs *regs)
> >  {
> > -   return in_ia32_syscall() ? 4 : 8;
> 
>   This wants a comment.
> 
> > +   return user_64bit_mode(regs) ? 8 : 4;

The more simpler one liner is to check

test_thread_flag(TIF_IA32)

which is only true for IA32 and independent of syscalls, exceptions ...

Thanks,

tglx


Re: [PATCH] uprobes/x86: fix detection of 32-bit user mode

2019-08-23 Thread Thomas Gleixner
Sebastian,

On Sun, 28 Jul 2019, Sebastian Mayr wrote:

sorry for the delay..

> 32-bit processes running on a 64-bit kernel are not always detected
> correctly, causing the process to crash when uretprobes are installed.
> The reason for the crash is that in_ia32_syscall() is used to determine
> the process's mode, which only works correctly when called from a
> syscall. In the case of uretprobes, however, the function is called from
> a software interrupt and always returns 'false' (on a 64-bit kernel). In

s/software interrupt/exception/

> consequence this leads to corruption of the process's return address.

Nice detective work and well written changelog!

> This can be fixed by using user_64bit_mode(), which should always be
> correct.

This wants to be:

  Fix this by using user_64bit_mode() which is always correct.

Be imperative, even if not entirely sure. You nicely put a disclaimer into
the discard section.

This also wants a Fixes tag and cc stable. Interestingly enough this should
have been detected by the rename with

  abfb9498ee13 ("x86/entry: Rename is_{ia32,x32}_task() to 
in_{ia32,x32}_syscall()")

which states in the changelog:

The is_ia32_task()/is_x32_task() function names are a big misnomer: they
suggests that the compat-ness of a system call is a task property, which
is not true, the compatness of a system call purely depends on how it
was invoked through the system call layer.
.

and then it went and blindly renamed every call site 

And sadly this was already mentioned here:

   8faaed1b9f50 ("uprobes/x86: Introduce sizeof_long(), cleanup 
adjust_ret_addr() and arch_uretprobe_hijack_return_addr()")

where the changelog says:

   TODO: is_ia32_task() is not what we actually want, TS_COMPAT does
   not necessarily mean 32bit. Fortunately syscall-like insns can't be
   probed so it actually works, but it would be better to rename and
   use is_ia32_frame().

and goes all the way back to:

0326f5a94dde ("uprobes/core: Handle breakpoint and singlestep exceptions")

Oh well. 7 years until someone actually tried a uretprobe on a 32bit
process on a 64bit kernel

> -static inline int sizeof_long(void)
> +static inline int sizeof_long(struct pt_regs *regs)
>  {
> - return in_ia32_syscall() ? 4 : 8;

  This wants a comment.

> + return user_64bit_mode(regs) ? 8 : 4;
>  }

No need to resend, I'll fix this up while applying.

Thank you very much for digging into this!

  tglx




Re: [PATCH] Partially revert "mm/memcontrol.c: keep local VM counters in sync with the hierarchical ones"

2019-08-23 Thread Roman Gushchin
On Fri, Aug 16, 2019 at 05:47:26PM -0700, Roman Gushchin wrote:
> Commit 766a4c19d880 ("mm/memcontrol.c: keep local VM counters in sync
> with the hierarchical ones") effectively decreased the precision of
> per-memcg vmstats_local and per-memcg-per-node lruvec percpu counters.
> 
> That's good for displaying in memory.stat, but brings a serious regression
> into the reclaim process.
> 
> One issue I've discovered and debugged is the following:
> lruvec_lru_size() can return 0 instead of the actual number of pages
> in the lru list, preventing the kernel to reclaim last remaining
> pages. Result is yet another dying memory cgroups flooding.
> The opposite is also happening: scanning an empty lru list
> is the waste of cpu time.
> 
> Also, inactive_list_is_low() can return incorrect values, preventing
> the active lru from being scanned and freed. It can fail both because
> the size of active and inactive lists are inaccurate, and because
> the number of workingset refaults isn't precise. In other words,
> the result is pretty random.
> 
> I'm not sure, if using the approximate number of slab pages in
> count_shadow_number() is acceptable, but issues described above
> are enough to partially revert the patch.
> 
> Let's keep per-memcg vmstat_local batched (they are only used for
> displaying stats to the userspace), but keep lruvec stats precise.
> This change fixes the dead memcg flooding on my setup.
> 
> Fixes: 766a4c19d880 ("mm/memcontrol.c: keep local VM counters in sync with 
> the hierarchical ones")
> Signed-off-by: Roman Gushchin 
> Cc: Yafang Shao 
> Cc: Johannes Weiner 

Any other concerns/comments here?

I'd prefer to fix the regression: we're likely leaking several pages
of memory for each created and destroyed memory cgroup. Plus
all internal structures, which are measured in hundreds of kb.

Thanks!


[PATCH v7 4/8] PCI/DPC: Add Error Disconnect Recover (EDR) support

2019-08-23 Thread sathyanarayanan . kuppuswamy
From: Kuppuswamy Sathyanarayanan 

As per ACPI specification r6.3, sec 5.6.6, when firmware owns Downstream
Port Containment (DPC), its expected to use the "Error Disconnect
Recover" (EDR) notification to alert OSPM of a DPC event and if OS
supports EDR, its expected to handle the software state invalidation and
port recovery in OS, and also let firmware know the recovery status via
_OST ACPI call. Related _OST status codes can be found in ACPI
specification r6.3, sec 6.3.5.2.

Also, as per PCI firmware specification r3.2 Downstream Port Containment
Related Enhancements ECN, sec 4.5.1, table 4-6, If DPC is controlled by
firmware (firmware first mode), firmware is responsible for
configuring the DPC and OS is responsible for error recovery. Also, OS
is allowed to modify DPC registers only during the EDR notification
window. So with EDR support, OS should provide DPC port services even in
firmware first mode.

Cc: Keith Busch 
Signed-off-by: Kuppuswamy Sathyanarayanan 

Acked-by: Keith Busch 
Tested-by: Huong Nguyen 
Tested-by: Austin Bolen 
---
 drivers/pci/pci-acpi.c   |  91 
 drivers/pci/pcie/Kconfig |  10 
 drivers/pci/pcie/dpc.c   | 125 ++-
 include/linux/pci-acpi.h |  11 
 4 files changed, 236 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/pci-acpi.c b/drivers/pci/pci-acpi.c
index 45049f558860..82abab25daf2 100644
--- a/drivers/pci/pci-acpi.c
+++ b/drivers/pci/pci-acpi.c
@@ -104,6 +104,97 @@ int acpi_get_rc_resources(struct device *dev, const char 
*hid, u16 segment,
 }
 #endif
 
+#if defined(CONFIG_PCIE_DPC) && defined(CONFIG_ACPI)
+
+/*
+ * _DSM wrapper function to enable/disable DPC port.
+ * @dpc   : DPC device structure
+ * @enable: status of DPC port (0 or 1).
+ *
+ * returns 0 on success or errno on failure.
+ */
+int acpi_enable_dpc_port(struct pci_dev *pdev, acpi_handle handle, bool enable)
+{
+   union acpi_object *obj;
+   int status = 0;
+   union acpi_object argv4;
+
+   argv4.type = ACPI_TYPE_INTEGER;
+   argv4.integer.value = enable;
+
+   obj = acpi_evaluate_dsm(handle, _acpi_dsm_guid, 1,
+   EDR_PORT_ENABLE_DSM, );
+   if (!obj)
+   return -EIO;
+
+   if (obj->type != ACPI_TYPE_INTEGER || obj->integer.value != enable)
+   status = -EIO;
+
+   ACPI_FREE(obj);
+
+   return status;
+}
+
+/*
+ * _DSM wrapper function to locate DPC port.
+ * @dpc   : DPC device structure
+ *
+ * returns pci_dev or NULL.
+ */
+struct pci_dev *acpi_locate_dpc_port(struct pci_dev *pdev, acpi_handle handle)
+{
+   union acpi_object *obj;
+   u16 port;
+
+   obj = acpi_evaluate_dsm(handle, _acpi_dsm_guid, 1,
+   EDR_PORT_LOCATE_DSM, NULL);
+   if (!obj)
+   return pci_dev_get(pdev);
+
+   if (obj->type != ACPI_TYPE_INTEGER) {
+   ACPI_FREE(obj);
+   return NULL;
+   }
+
+   /*
+* Firmware returns DPC port BDF details in following format:
+*  15:8 = bus
+*  7:3 = device
+*  2:0 = function
+*/
+   port = obj->integer.value;
+
+   ACPI_FREE(obj);
+
+   return pci_get_domain_bus_and_slot(pci_domain_nr(pdev->bus),
+  PCI_BUS_NUM(port), port & 0xff);
+}
+
+/*
+ * _OST wrapper function to let firmware know the status of EDR event.
+ * @dpc   : DPC device structure.
+ * @status: Status of EDR event.
+ */
+int acpi_send_edr_status(struct pci_dev *pdev,  acpi_handle handle, u16 status)
+{
+   u32 ost_status;
+
+   pci_dbg(pdev, "Sending EDR status :%x\n", status);
+
+   ost_status =  PCI_DEVID(pdev->bus->number, pdev->devfn);
+   ost_status = (ost_status << 16) | status;
+
+   status = acpi_evaluate_ost(handle,
+  ACPI_NOTIFY_DISCONNECT_RECOVER,
+  ost_status, NULL);
+   if (ACPI_FAILURE(status))
+   return -EINVAL;
+
+   return 0;
+}
+
+#endif /* CONFIG_PCIE_DPC && CONFIG_ACPI */
+
 phys_addr_t acpi_pci_root_get_mcfg_addr(acpi_handle handle)
 {
acpi_status status = AE_NOT_EXIST;
diff --git a/drivers/pci/pcie/Kconfig b/drivers/pci/pcie/Kconfig
index 362eb8cfa53b..9a05015af7cd 100644
--- a/drivers/pci/pcie/Kconfig
+++ b/drivers/pci/pcie/Kconfig
@@ -150,3 +150,13 @@ config PCIE_BW
  This enables PCI Express Bandwidth Change Notification.  If
  you know link width or rate changes occur only to correct
  unreliable links, you may answer Y.
+
+config PCIE_EDR
+   bool "PCI Express Error Disconnect Recover support"
+   default n
+   depends on PCIE_DPC && ACPI
+   help
+ This options adds Error Disconnect Recover support as specified
+ in PCI firmware specification v3.2 Downstream Port Containment
+ Related Enhancements ECN. Enable this if you want to support hybrid
+ DPC model which 

[PATCH v7 2/8] PCI/DPC: Allow dpc_probe() even if firmware first mode is enabled

2019-08-23 Thread sathyanarayanan . kuppuswamy
From: Kuppuswamy Sathyanarayanan 

As per ACPI specification v6.3, sec 5.6.6, Error Disconnect Recover
(EDR) notification used by firmware to let OS know about the DPC event
and permit OS to perform error recovery when processing the EDR
notification. Also, as per PCI firmware specification r3.2 Downstream
Port Containment Related Enhancements ECN, sec 4.5.1, table 4-6, if DPC
is controlled by firmware (firmware first mode), it's responsible for
initializing Downstream Port Containment Extended Capability Structures
per firmware policy. And, OS is permitted to read or write DPC Control
and Status registers of a port while processing an Error Disconnect
Recover (EDR) notification from firmware on that port.

Currently, if firmware controls DPC (firmware first mode), OS will not
create/enumerate DPC PCIe port services. But, if OS supports EDR
feature, then as mentioned in above spec references, it should permit
enumeration of DPC driver and also support handling ACPI EDR
notification. So as first step, allow dpc_probe() to continue even if
firmware first mode is enabled. Also add appropriate checks to ensure
device registers are not modified outside EDR notification window in
firmware first mode. This is a preparatory patch for adding EDR support.

Signed-off-by: Kuppuswamy Sathyanarayanan 

Acked-by: Keith Busch 
---
 drivers/pci/pcie/dpc.c | 49 +++---
 1 file changed, 36 insertions(+), 13 deletions(-)

diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c
index a32ec3487a8d..9717fda012f8 100644
--- a/drivers/pci/pcie/dpc.c
+++ b/drivers/pci/pcie/dpc.c
@@ -22,6 +22,8 @@ struct dpc_dev {
u16 cap_pos;
boolrp_extensions;
u8  rp_log_size;
+   /* Set True if DPC is controlled by firmware */
+   boolfirmware_dpc;
 };
 
 static const char * const rp_pio_error_string[] = {
@@ -69,6 +71,9 @@ void pci_save_dpc_state(struct pci_dev *dev)
if (!dpc)
return;
 
+   if (dpc->firmware_dpc)
+   return;
+
save_state = pci_find_saved_ext_cap(dev, PCI_EXT_CAP_ID_DPC);
if (!save_state)
return;
@@ -90,6 +95,9 @@ void pci_restore_dpc_state(struct pci_dev *dev)
if (!dpc)
return;
 
+   if (dpc->firmware_dpc)
+   return;
+
save_state = pci_find_saved_ext_cap(dev, PCI_EXT_CAP_ID_DPC);
if (!save_state)
return;
@@ -291,9 +299,6 @@ static int dpc_probe(struct pcie_device *dev)
int status;
u16 ctl, cap;
 
-   if (pcie_aer_get_firmware_first(pdev))
-   return -ENOTSUPP;
-
dpc = devm_kzalloc(device, sizeof(*dpc), GFP_KERNEL);
if (!dpc)
return -ENOMEM;
@@ -302,13 +307,25 @@ static int dpc_probe(struct pcie_device *dev)
dpc->dev = dev;
set_service_data(dev, dpc);
 
-   status = devm_request_threaded_irq(device, dev->irq, dpc_irq,
-  dpc_handler, IRQF_SHARED,
-  "pcie-dpc", dpc);
-   if (status) {
-   pci_warn(pdev, "request IRQ%d failed: %d\n", dev->irq,
-status);
-   return status;
+   if (pcie_aer_get_firmware_first(pdev))
+   dpc->firmware_dpc = 1;
+
+   /*
+* If DPC is handled in firmware and ACPI support is not enabled
+* in OS, skip probe and return error.
+*/
+   if (dpc->firmware_dpc && !IS_ENABLED(CONFIG_ACPI))
+   return -ENODEV;
+
+   if (!dpc->firmware_dpc) {
+   status = devm_request_threaded_irq(device, dev->irq, dpc_irq,
+  dpc_handler, IRQF_SHARED,
+  "pcie-dpc", dpc);
+   if (status) {
+   pci_warn(pdev, "request IRQ%d failed: %d\n", dev->irq,
+status);
+   return status;
+   }
}
 
pci_read_config_word(pdev, dpc->cap_pos + PCI_EXP_DPC_CAP, );
@@ -323,9 +340,12 @@ static int dpc_probe(struct pcie_device *dev)
dpc->rp_log_size = 0;
}
}
-
-   ctl = (ctl & 0xfff4) | PCI_EXP_DPC_CTL_EN_FATAL | 
PCI_EXP_DPC_CTL_INT_EN;
-   pci_write_config_word(pdev, dpc->cap_pos + PCI_EXP_DPC_CTL, ctl);
+   if (!dpc->firmware_dpc) {
+   ctl = (ctl & 0xfff4) |
+   (PCI_EXP_DPC_CTL_EN_FATAL | PCI_EXP_DPC_CTL_INT_EN);
+   pci_write_config_word(pdev, dpc->cap_pos + PCI_EXP_DPC_CTL,
+ ctl);
+   }
 
pci_info(pdev, "error containment capabilities: Int Msg #%d, RPExt%c 
PoisonedTLP%c SwTrigger%c RP PIO Log %d, DL_ActiveErr%c\n",
 cap & PCI_EXP_DPC_IRQ, FLAG(cap, PCI_EXP_DPC_CAP_RP_EXT),
@@ -343,6 +363,9 @@ 

[PATCH v7 3/8] PCI/DPC: Add dpc_process_error() wrapper function

2019-08-23 Thread sathyanarayanan . kuppuswamy
From: Kuppuswamy Sathyanarayanan 

With Error Disconnect Recover (EDR) support, we need to support
processing DPC event either from DPC IRQ or ACPI EDR event. So create
a wrapper function dpc_process_error() and move common error handling
code in to it. It will be used to process the DPC event in both DPC IRQ
and EDR ACPI event contexts.

Signed-off-by: Kuppuswamy Sathyanarayanan 

Acked-by: Keith Busch 
---
 drivers/pci/pcie/dpc.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c
index 9717fda012f8..4137ec7b48cc 100644
--- a/drivers/pci/pcie/dpc.c
+++ b/drivers/pci/pcie/dpc.c
@@ -232,10 +232,9 @@ static int dpc_get_aer_uncorrect_severity(struct pci_dev 
*dev,
return 1;
 }
 
-static irqreturn_t dpc_handler(int irq, void *context)
+static void dpc_process_error(struct dpc_dev *dpc)
 {
struct aer_err_info info;
-   struct dpc_dev *dpc = context;
struct pci_dev *pdev = dpc->dev->port;
u16 cap = dpc->cap_pos, status, source, reason, ext_reason;
 
@@ -268,6 +267,13 @@ static irqreturn_t dpc_handler(int irq, void *context)
 
/* We configure DPC so it only triggers on ERR_FATAL */
pcie_do_recovery(pdev, pci_channel_io_frozen, PCIE_PORT_SERVICE_DPC);
+}
+
+static irqreturn_t dpc_handler(int irq, void *context)
+{
+   struct dpc_dev *dpc = context;
+
+   dpc_process_error(dpc);
 
return IRQ_HANDLED;
 }
-- 
2.21.0



[PATCH v7 5/8] PCI/AER: Allow clearing Error Status Register in FF mode

2019-08-23 Thread sathyanarayanan . kuppuswamy
From: Kuppuswamy Sathyanarayanan 

As per PCI firmware specification r3.2 Downstream Port Containment
Related Enhancements ECN, sec 4.5.1, table 4-6, Error Disconnect
Recover (EDR) support allows OS to handle error recovery and
clearing Error Registers even in FF mode. So remove FF mode checks in
pci_cleanup_aer_uncorrect_error_status(), pci_aer_clear_fatal_status()
and pci_cleanup_aer_error_status_regs() functions.

Signed-off-by: Kuppuswamy Sathyanarayanan 

Acked-by: Keith Busch 
---
 drivers/pci/pcie/aer.c | 12 ++--
 1 file changed, 2 insertions(+), 10 deletions(-)

diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index b45bc47d04fe..e1509bb900c5 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -383,9 +383,6 @@ int pci_cleanup_aer_uncorrect_error_status(struct pci_dev 
*dev)
if (!pos)
return -EIO;
 
-   if (pcie_aer_get_firmware_first(dev))
-   return -EIO;
-
/* Clear status bits for ERR_NONFATAL errors only */
pci_read_config_dword(dev, pos + PCI_ERR_UNCOR_STATUS, );
pci_read_config_dword(dev, pos + PCI_ERR_UNCOR_SEVER, );
@@ -406,9 +403,6 @@ void pci_aer_clear_fatal_status(struct pci_dev *dev)
if (!pos)
return;
 
-   if (pcie_aer_get_firmware_first(dev))
-   return;
-
/* Clear status bits for ERR_FATAL errors only */
pci_read_config_dword(dev, pos + PCI_ERR_UNCOR_STATUS, );
pci_read_config_dword(dev, pos + PCI_ERR_UNCOR_SEVER, );
@@ -430,9 +424,6 @@ int pci_cleanup_aer_error_status_regs(struct pci_dev *dev)
if (!pos)
return -EIO;
 
-   if (pcie_aer_get_firmware_first(dev))
-   return -EIO;
-
port_type = pci_pcie_type(dev);
if (port_type == PCI_EXP_TYPE_ROOT_PORT) {
pci_read_config_dword(dev, pos + PCI_ERR_ROOT_STATUS, );
@@ -455,7 +446,8 @@ void pci_aer_init(struct pci_dev *dev)
if (dev->aer_cap)
dev->aer_stats = kzalloc(sizeof(struct aer_stats), GFP_KERNEL);
 
-   pci_cleanup_aer_error_status_regs(dev);
+   if (!pcie_aer_get_firmware_first(dev))
+   pci_cleanup_aer_error_status_regs(dev);
 }
 
 void pci_aer_exit(struct pci_dev *dev)
-- 
2.21.0



[PATCH v7 7/8] PCI/DPC: Clear AER registers in EDR mode

2019-08-23 Thread sathyanarayanan . kuppuswamy
From: Kuppuswamy Sathyanarayanan 

As per PCI firmware specification r3.2 Downstream Port Containment
Related Enhancements ECN, OS is responsible for clearing the AER
registers in EDR mode. So clear AER registers in dpc_process_error()
function.

Signed-off-by: Kuppuswamy Sathyanarayanan 

Acked-by: Keith Busch 
---
 drivers/pci/pcie/dpc.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c
index fafc55c00fe0..de2d892bc7c4 100644
--- a/drivers/pci/pcie/dpc.c
+++ b/drivers/pci/pcie/dpc.c
@@ -275,6 +275,10 @@ static void dpc_process_error(struct dpc_dev *dpc)
pci_aer_clear_fatal_status(pdev);
}
 
+   /* In EDR mode, OS is responsible for clearing AER registers */
+   if (dpc->firmware_dpc)
+   pci_cleanup_aer_error_status_regs(pdev);
+
/*
 * Irrespective of whether the DPC event is triggered by
 * ERR_FATAL or ERR_NONFATAL, since the link is already down,
-- 
2.21.0



[PATCH v7 6/8] PCI/DPC: Update comments related to DPC recovery on NON_FATAL errors

2019-08-23 Thread sathyanarayanan . kuppuswamy
From: Kuppuswamy Sathyanarayanan 

Currently, in native mode, DPC driver is configured to trigger DPC only
for FATAL errors and hence it only supports port recovery for FATAL
errors. But with Error Disconnect Recover (EDR) support, DPC
configuration is done by firmware, and hence we should expect DPC
triggered for both FATAL/NON_FATAL errors. So update comments and add
details about how NON_FATAL dpc recovery is handled.

Signed-off-by: Kuppuswamy Sathyanarayanan 

Acked-by: Keith Busch 
---
 drivers/pci/pcie/dpc.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c
index a8b634e05694..fafc55c00fe0 100644
--- a/drivers/pci/pcie/dpc.c
+++ b/drivers/pci/pcie/dpc.c
@@ -275,7 +275,11 @@ static void dpc_process_error(struct dpc_dev *dpc)
pci_aer_clear_fatal_status(pdev);
}
 
-   /* We configure DPC so it only triggers on ERR_FATAL */
+   /*
+* Irrespective of whether the DPC event is triggered by
+* ERR_FATAL or ERR_NONFATAL, since the link is already down,
+* use the FATAL error recovery path for both cases.
+*/
pcie_do_recovery(pdev, pci_channel_io_frozen, PCIE_PORT_SERVICE_DPC);
 }
 
-- 
2.21.0



[PATCH v7 8/8] PCI/ACPI: Enable EDR support

2019-08-23 Thread sathyanarayanan . kuppuswamy
From: Kuppuswamy Sathyanarayanan 

As per PCI firmware specification r3.2 Downstream Port Containment
Related Enhancements ECN, sec 4.5.1, OS must implement following steps
to enable/use EDR feature.

1. OS can use bit 7 of _OSC Control Field to negotiate control over
Downstream Port Containment (DPC) configuration of PCIe port. After _OSC
negotiation, firmware will Set this bit to grant OS control over PCIe
DPC configuration and Clear it if this feature was requested and denied,
or was not requested.

2. Also, if OS supports EDR, it should expose its support to BIOS by
setting bit 7 of _OSC Support Field. And if OS sets bit 7 of _OSC
Control Field it must also expose support for EDR by setting bit 7 of
_OSC Support Field.

Cc: Bjorn Helgaas 
Cc: "Rafael J. Wysocki" 
Cc: Len Brown 
Signed-off-by: Kuppuswamy Sathyanarayanan 

Acked-by: Keith Busch 
Tested-by: Huong Nguyen 
Tested-by: Austin Bolen 
---
 drivers/acpi/pci_root.c | 9 +
 drivers/pci/pcie/portdrv_core.c | 8 +++-
 drivers/pci/probe.c | 1 +
 include/linux/acpi.h| 6 --
 include/linux/pci.h | 3 ++-
 5 files changed, 23 insertions(+), 4 deletions(-)

diff --git a/drivers/acpi/pci_root.c b/drivers/acpi/pci_root.c
index 314a187ed572..988d09d788b6 100644
--- a/drivers/acpi/pci_root.c
+++ b/drivers/acpi/pci_root.c
@@ -132,6 +132,7 @@ static struct pci_osc_bit_struct pci_osc_support_bit[] = {
{ OSC_PCI_CLOCK_PM_SUPPORT, "ClockPM" },
{ OSC_PCI_SEGMENT_GROUPS_SUPPORT, "Segments" },
{ OSC_PCI_MSI_SUPPORT, "MSI" },
+   { OSC_PCI_EDR_SUPPORT, "EDR" },
{ OSC_PCI_HPX_TYPE_3_SUPPORT, "HPX-Type3" },
 };
 
@@ -142,6 +143,7 @@ static struct pci_osc_bit_struct pci_osc_control_bit[] = {
{ OSC_PCI_EXPRESS_AER_CONTROL, "AER" },
{ OSC_PCI_EXPRESS_CAPABILITY_CONTROL, "PCIeCapability" },
{ OSC_PCI_EXPRESS_LTR_CONTROL, "LTR" },
+   { OSC_PCI_EXPRESS_DPC_CONTROL, "DPC" },
 };
 
 static void decode_osc_bits(struct acpi_pci_root *root, char *msg, u32 word,
@@ -441,6 +443,8 @@ static void negotiate_os_control(struct acpi_pci_root 
*root, int *no_aspm,
support |= OSC_PCI_ASPM_SUPPORT | OSC_PCI_CLOCK_PM_SUPPORT;
if (pci_msi_enabled())
support |= OSC_PCI_MSI_SUPPORT;
+   if (IS_ENABLED(CONFIG_PCIE_EDR))
+   support |= OSC_PCI_EDR_SUPPORT;
 
decode_osc_support(root, "OS supports", support);
status = acpi_pci_osc_support(root, support);
@@ -488,6 +492,9 @@ static void negotiate_os_control(struct acpi_pci_root 
*root, int *no_aspm,
control |= OSC_PCI_EXPRESS_AER_CONTROL;
}
 
+   if (IS_ENABLED(CONFIG_PCIE_DPC))
+   control |= OSC_PCI_EXPRESS_DPC_CONTROL;
+
requested = control;
status = acpi_pci_osc_control_set(handle, ,
  OSC_PCI_EXPRESS_CAPABILITY_CONTROL);
@@ -917,6 +924,8 @@ struct pci_bus *acpi_pci_root_create(struct acpi_pci_root 
*root,
host_bridge->native_pme = 0;
if (!(root->osc_control_set & OSC_PCI_EXPRESS_LTR_CONTROL))
host_bridge->native_ltr = 0;
+   if (!(root->osc_control_set & OSC_PCI_EXPRESS_DPC_CONTROL))
+   host_bridge->native_dpc = 0;
 
/*
 * Evaluate the "PCI Boot Configuration" _DSM Function.  If it
diff --git a/drivers/pci/pcie/portdrv_core.c b/drivers/pci/pcie/portdrv_core.c
index 1b330129089f..1b54a39df795 100644
--- a/drivers/pci/pcie/portdrv_core.c
+++ b/drivers/pci/pcie/portdrv_core.c
@@ -250,8 +250,14 @@ static int get_port_device_capability(struct pci_dev *dev)
pcie_pme_interrupt_enable(dev, false);
}
 
+   /*
+* If EDR support is enabled in OS, then even if AER is not handled in
+* OS, DPC service can be enabled.
+*/
if (pci_find_ext_capability(dev, PCI_EXT_CAP_ID_DPC) &&
-   pci_aer_available() && services & PCIE_PORT_SERVICE_AER)
+   ((IS_ENABLED(CONFIG_PCIE_EDR) && !host->native_dpc) ||
+   (pci_aer_available() && services & PCIE_PORT_SERVICE_AER &&
+   (pcie_ports_native || host->native_dpc
services |= PCIE_PORT_SERVICE_DPC;
 
if (pci_pcie_type(dev) == PCI_EXP_TYPE_DOWNSTREAM ||
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index a3c7338fad86..cf8acdd62089 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -601,6 +601,7 @@ static void pci_init_host_bridge(struct pci_host_bridge 
*bridge)
bridge->native_shpc_hotplug = 1;
bridge->native_pme = 1;
bridge->native_ltr = 1;
+   bridge->native_dpc = 1;
 }
 
 struct pci_host_bridge *pci_alloc_host_bridge(size_t priv)
diff --git a/include/linux/acpi.h b/include/linux/acpi.h
index 9426b9aaed86..7b29c640a9db 100644
--- a/include/linux/acpi.h
+++ b/include/linux/acpi.h
@@ -515,8 +515,9 @@ extern bool osc_pc_lpi_support_confirmed;
 #define OSC_PCI_CLOCK_PM_SUPPORT   

[PATCH v7 0/8] Add Error Disconnect Recover (EDR) support

2019-08-23 Thread sathyanarayanan . kuppuswamy
From: Kuppuswamy Sathyanarayanan 

This patchset adds support for following features:

1. Error Disconnect Recover (EDR) support.
2. _OSC based negotiation support for DPC.

You can find EDR spec in the following link.

https://members.pcisig.com/wg/PCI-SIG/document/12614

Changes since v6:
 * Modified the order of patches to enable EDR only after all necessary support 
is added in kernel.
 * Addressed Bjorn comments.

Changes since v5:
 * Addressed Keith's comments.
 * Added additional check for FF mode in pci_aer_init().
 * Updated commit history of "PCI/DPC: Add support for DPC recovery on 
NON_FATAL errors" patch.

Changes since v4:
 * Rebased on top of v5.3-rc1
 * Fixed lock/unlock issue in edr_handle_event().
 * Merged "Update error status after reset_link()" patch into this patchset.

Changes since v3:
 * Moved EDR related ACPI functions/definitions to pci-acpi.c
 * Modified commit history in few patches to include spec reference.
 * Added support to handle DPC triggered by NON_FATAL errors.
 * Added edr_lock to protect PCI device receiving duplicate EDR notifications.
 * Addressed Bjorn comments.

Changes since v2:
 * Split EDR support patch into multiple patches.
 * Addressed Bjorn comments.

Changes since v1:
 * Rebased on top of v5.1-rc1

Kuppuswamy Sathyanarayanan (8):
  PCI/ERR: Update error status after reset_link()
  PCI/DPC: Allow dpc_probe() even if firmware first mode is enabled
  PCI/DPC: Add dpc_process_error() wrapper function
  PCI/DPC: Add Error Disconnect Recover (EDR) support
  PCI/AER: Allow clearing Error Status Register in FF mode
  PCI/DPC: Update comments related to DPC recovery on NON_FATAL errors
  PCI/DPC: Clear AER registers in EDR mode
  PCI/ACPI: Enable EDR support

 drivers/acpi/pci_root.c |   9 ++
 drivers/pci/pci-acpi.c  |  91 +++
 drivers/pci/pcie/Kconfig|  10 ++
 drivers/pci/pcie/aer.c  |  12 +-
 drivers/pci/pcie/dpc.c  | 194 +---
 drivers/pci/pcie/err.c  |  10 +-
 drivers/pci/pcie/portdrv_core.c |   8 +-
 drivers/pci/probe.c |   1 +
 include/linux/acpi.h|   6 +-
 include/linux/pci-acpi.h|  11 ++
 include/linux/pci.h |   3 +-
 11 files changed, 321 insertions(+), 34 deletions(-)

-- 
2.21.0



[PATCH v7 1/8] PCI/ERR: Update error status after reset_link()

2019-08-23 Thread sathyanarayanan . kuppuswamy
From: Kuppuswamy Sathyanarayanan 

Commit bdb5ac85777d ("PCI/ERR: Handle fatal error recovery") uses
reset_link() to recover from fatal errors. But, if the reset is
successful there is no need to continue the rest of the error recovery
checks. Also, during fatal error recovery, if the initial value of error
status is PCI_ERS_RESULT_DISCONNECT or PCI_ERS_RESULT_NO_AER_DRIVER then
even after successful recovery (using reset_link()) pcie_do_recovery()
will report the recovery result as failure. So update the status of
error after reset_link().

Fixes: bdb5ac85777d ("PCI/ERR: Handle fatal error recovery")
Cc: Ashok Raj 
Cc: Keith Busch 
Signed-off-by: Kuppuswamy Sathyanarayanan 

Acked-by: Keith Busch 
---
 drivers/pci/pcie/err.c | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
index 773197a12568..c3e883d47675 100644
--- a/drivers/pci/pcie/err.c
+++ b/drivers/pci/pcie/err.c
@@ -204,9 +204,12 @@ void pcie_do_recovery(struct pci_dev *dev, enum 
pci_channel_state state,
else
pci_walk_bus(bus, report_normal_detected, );
 
-   if (state == pci_channel_io_frozen &&
-   reset_link(dev, service) != PCI_ERS_RESULT_RECOVERED)
-   goto failed;
+   if (state == pci_channel_io_frozen) {
+   status = reset_link(dev, service);
+   if (status != PCI_ERS_RESULT_RECOVERED)
+   goto failed;
+   goto done;
+   }
 
if (status == PCI_ERS_RESULT_CAN_RECOVER) {
status = PCI_ERS_RESULT_RECOVERED;
@@ -228,6 +231,7 @@ void pcie_do_recovery(struct pci_dev *dev, enum 
pci_channel_state state,
if (status != PCI_ERS_RESULT_RECOVERED)
goto failed;
 
+done:
pci_dbg(dev, "broadcast resume message\n");
pci_walk_bus(bus, report_resume, );
 
-- 
2.21.0



Re: [PATCH] sched/fair: don't assign runtime for throttled cfs_rq

2019-08-23 Thread Valentin Schneider
On 23/08/2019 21:00, bseg...@google.com wrote:
[...]
> Could you mention in the message that this a throttled cfs_rq can have
> account_cfs_rq_runtime called on it because it is throttled before
> idle_balance, and the idle_balance calls update_rq_clock to add time
> that is accounted to the task.
> 

Mayhaps even a comment for the extra condition.

> I think this solution is less risky than unthrottling
> in this area, so other than that:
> 
> Reviewed-by: Ben Segall 
> 

If you don't mind squashing this in:

-8<-
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index b1d9cec9b1ed..b47b0bcf56bc 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4630,6 +4630,10 @@ static u64 distribute_cfs_runtime(struct cfs_bandwidth 
*cfs_b, u64 remaining)
if (!cfs_rq_throttled(cfs_rq))
goto next;
 
+   /* By the above check, this should never be true */
+   WARN_ON(cfs_rq->runtime_remaining > 0);
+
+   /* Pick the minimum amount to return to a positive quota state 
*/
runtime = -cfs_rq->runtime_remaining + 1;
if (runtime > remaining)
runtime = remaining;
->8-

I'm not adamant about the extra comment, but the WARN_ON would be nice IMO.


@Ben, do you reckon we want to strap

Cc: 
Fixes: ec12cb7f31e2 ("sched: Accumulate per-cfs_rq cpu usage and charge against 
bandwidth")

to the thing? AFAICT the pick_next_task_fair() + idle_balance() dance you
described should still be possible on that commit.


Other than that,

Reviewed-by: Valentin Schneider 

[...]


Re: [PATCH v2] riscv: add support for SECCOMP and SECCOMP_FILTER

2019-08-23 Thread Carlos Eduardo de Paula
On Thu, Aug 22, 2019 at 5:56 PM David Abdurachmanov
 wrote:
>
> This patch was extensively tested on Fedora/RISCV (applied by default on
> top of 5.2-rc7 kernel for <2 months). The patch was also tested with 5.3-rc
> on QEMU and SiFive Unleashed board.
>
> libseccomp (userspace) was rebased:
> https://github.com/seccomp/libseccomp/pull/134
>
> Fully passes libseccomp regression testing (simulation and live).
>
> There is one failing kernel selftest: global.user_notification_signal
>
> v1 -> v2:
>   - return immediatly if secure_computing(NULL) returns -1
>   - fixed whitespace issues
>   - add missing seccomp.h
>   - remove patch #2 (solved now)
>   - add riscv to seccomp kernel selftest
>
> Cc: keesc...@chromium.org
> Cc: m...@carlosedp.com
>
> Signed-off-by: David Abdurachmanov 
> Tested-by: Carlos de Paula 
> ---
>  arch/riscv/Kconfig| 14 ++
>  arch/riscv/include/asm/seccomp.h  | 10 +++
>  arch/riscv/include/asm/thread_info.h  |  5 +++-
>  arch/riscv/kernel/entry.S | 27 +--
>  arch/riscv/kernel/ptrace.c| 10 +++
>  tools/testing/selftests/seccomp/seccomp_bpf.c |  8 +-
>  6 files changed, 70 insertions(+), 4 deletions(-)
>  create mode 100644 arch/riscv/include/asm/seccomp.h
>
> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> index 59a4727ecd6c..441e63ff5adc 100644
> --- a/arch/riscv/Kconfig
> +++ b/arch/riscv/Kconfig
> @@ -31,6 +31,7 @@ config RISCV
> select GENERIC_SMP_IDLE_THREAD
> select GENERIC_ATOMIC64 if !64BIT
> select HAVE_ARCH_AUDITSYSCALL
> +   select HAVE_ARCH_SECCOMP_FILTER
> select HAVE_MEMBLOCK_NODE_MAP
> select HAVE_DMA_CONTIGUOUS
> select HAVE_FUTEX_CMPXCHG if FUTEX
> @@ -235,6 +236,19 @@ menu "Kernel features"
>
>  source "kernel/Kconfig.hz"
>
> +config SECCOMP
> +   bool "Enable seccomp to safely compute untrusted bytecode"
> +   help
> + This kernel feature is useful for number crunching applications
> + that may need to compute untrusted bytecode during their
> + execution. By using pipes or other transports made available to
> + the process as file descriptors supporting the read/write
> + syscalls, it's possible to isolate those applications in
> + their own address space using seccomp. Once seccomp is
> + enabled via prctl(PR_SET_SECCOMP), it cannot be disabled
> + and the task is only allowed to execute a few safe syscalls
> + defined by each seccomp mode.
> +
>  endmenu
>
>  menu "Boot options"
> diff --git a/arch/riscv/include/asm/seccomp.h 
> b/arch/riscv/include/asm/seccomp.h
> new file mode 100644
> index ..bf7744ee3b3d
> --- /dev/null
> +++ b/arch/riscv/include/asm/seccomp.h
> @@ -0,0 +1,10 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#ifndef _ASM_SECCOMP_H
> +#define _ASM_SECCOMP_H
> +
> +#include 
> +
> +#include 
> +
> +#endif /* _ASM_SECCOMP_H */
> diff --git a/arch/riscv/include/asm/thread_info.h 
> b/arch/riscv/include/asm/thread_info.h
> index 905372d7eeb8..a0b2a29a0da1 100644
> --- a/arch/riscv/include/asm/thread_info.h
> +++ b/arch/riscv/include/asm/thread_info.h
> @@ -75,6 +75,7 @@ struct thread_info {
>  #define TIF_MEMDIE 5   /* is terminating due to OOM killer */
>  #define TIF_SYSCALL_TRACEPOINT  6   /* syscall tracepoint 
> instrumentation */
>  #define TIF_SYSCALL_AUDIT  7   /* syscall auditing */
> +#define TIF_SECCOMP8   /* syscall secure computing */
>
>  #define _TIF_SYSCALL_TRACE (1 << TIF_SYSCALL_TRACE)
>  #define _TIF_NOTIFY_RESUME (1 << TIF_NOTIFY_RESUME)
> @@ -82,11 +83,13 @@ struct thread_info {
>  #define _TIF_NEED_RESCHED  (1 << TIF_NEED_RESCHED)
>  #define _TIF_SYSCALL_TRACEPOINT(1 << TIF_SYSCALL_TRACEPOINT)
>  #define _TIF_SYSCALL_AUDIT (1 << TIF_SYSCALL_AUDIT)
> +#define _TIF_SECCOMP   (1 << TIF_SECCOMP)
>
>  #define _TIF_WORK_MASK \
> (_TIF_NOTIFY_RESUME | _TIF_SIGPENDING | _TIF_NEED_RESCHED)
>
>  #define _TIF_SYSCALL_WORK \
> -   (_TIF_SYSCALL_TRACE | _TIF_SYSCALL_TRACEPOINT | _TIF_SYSCALL_AUDIT)
> +   (_TIF_SYSCALL_TRACE | _TIF_SYSCALL_TRACEPOINT | _TIF_SYSCALL_AUDIT | \
> +_TIF_SECCOMP )
>
>  #endif /* _ASM_RISCV_THREAD_INFO_H */
> diff --git a/arch/riscv/kernel/entry.S b/arch/riscv/kernel/entry.S
> index bc7a56e1ca6f..0bbedfa3e47d 100644
> --- a/arch/riscv/kernel/entry.S
> +++ b/arch/riscv/kernel/entry.S
> @@ -203,8 +203,25 @@ check_syscall_nr:
> /* Check to make sure we don't jump to a bogus syscall number. */
> li t0, __NR_syscalls
> la s0, sys_ni_syscall
> -   /* Syscall number held in a7 */
> -   bgeu a7, t0, 1f
> +   /*
> +* The tracer can change syscall number to valid/invalid value.
> +* We use syscall_set_nr helper in syscall_trace_enter thus we
> +* cannot trust the current value in a7 and have 

Re: [PATCH v2] riscv: add support for SECCOMP and SECCOMP_FILTER

2019-08-23 Thread Carlos Eduardo de Paula
On Thu, Aug 22, 2019 at 5:56 PM David Abdurachmanov
 wrote:
>
> This patch was extensively tested on Fedora/RISCV (applied by default on
> top of 5.2-rc7 kernel for <2 months). The patch was also tested with 5.3-rc
> on QEMU and SiFive Unleashed board.
>
> libseccomp (userspace) was rebased:
> https://github.com/seccomp/libseccomp/pull/134
>
> Fully passes libseccomp regression testing (simulation and live).
>
> There is one failing kernel selftest: global.user_notification_signal
>
> v1 -> v2:
>   - return immediatly if secure_computing(NULL) returns -1
>   - fixed whitespace issues
>   - add missing seccomp.h
>   - remove patch #2 (solved now)
>   - add riscv to seccomp kernel selftest
>
> Cc: keesc...@chromium.org
> Cc: m...@carlosedp.com
>
> Signed-off-by: David Abdurachmanov 
> ---
>  arch/riscv/Kconfig| 14 ++
>  arch/riscv/include/asm/seccomp.h  | 10 +++
>  arch/riscv/include/asm/thread_info.h  |  5 +++-
>  arch/riscv/kernel/entry.S | 27 +--
>  arch/riscv/kernel/ptrace.c| 10 +++
>  tools/testing/selftests/seccomp/seccomp_bpf.c |  8 +-
>  6 files changed, 70 insertions(+), 4 deletions(-)
>  create mode 100644 arch/riscv/include/asm/seccomp.h
>
> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> index 59a4727ecd6c..441e63ff5adc 100644
> --- a/arch/riscv/Kconfig
> +++ b/arch/riscv/Kconfig
> @@ -31,6 +31,7 @@ config RISCV
> select GENERIC_SMP_IDLE_THREAD
> select GENERIC_ATOMIC64 if !64BIT
> select HAVE_ARCH_AUDITSYSCALL
> +   select HAVE_ARCH_SECCOMP_FILTER
> select HAVE_MEMBLOCK_NODE_MAP
> select HAVE_DMA_CONTIGUOUS
> select HAVE_FUTEX_CMPXCHG if FUTEX
> @@ -235,6 +236,19 @@ menu "Kernel features"
>
>  source "kernel/Kconfig.hz"
>
> +config SECCOMP
> +   bool "Enable seccomp to safely compute untrusted bytecode"
> +   help
> + This kernel feature is useful for number crunching applications
> + that may need to compute untrusted bytecode during their
> + execution. By using pipes or other transports made available to
> + the process as file descriptors supporting the read/write
> + syscalls, it's possible to isolate those applications in
> + their own address space using seccomp. Once seccomp is
> + enabled via prctl(PR_SET_SECCOMP), it cannot be disabled
> + and the task is only allowed to execute a few safe syscalls
> + defined by each seccomp mode.
> +
>  endmenu
>
>  menu "Boot options"
> diff --git a/arch/riscv/include/asm/seccomp.h 
> b/arch/riscv/include/asm/seccomp.h
> new file mode 100644
> index ..bf7744ee3b3d
> --- /dev/null
> +++ b/arch/riscv/include/asm/seccomp.h
> @@ -0,0 +1,10 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#ifndef _ASM_SECCOMP_H
> +#define _ASM_SECCOMP_H
> +
> +#include 
> +
> +#include 
> +
> +#endif /* _ASM_SECCOMP_H */
> diff --git a/arch/riscv/include/asm/thread_info.h 
> b/arch/riscv/include/asm/thread_info.h
> index 905372d7eeb8..a0b2a29a0da1 100644
> --- a/arch/riscv/include/asm/thread_info.h
> +++ b/arch/riscv/include/asm/thread_info.h
> @@ -75,6 +75,7 @@ struct thread_info {
>  #define TIF_MEMDIE 5   /* is terminating due to OOM killer */
>  #define TIF_SYSCALL_TRACEPOINT  6   /* syscall tracepoint 
> instrumentation */
>  #define TIF_SYSCALL_AUDIT  7   /* syscall auditing */
> +#define TIF_SECCOMP8   /* syscall secure computing */
>
>  #define _TIF_SYSCALL_TRACE (1 << TIF_SYSCALL_TRACE)
>  #define _TIF_NOTIFY_RESUME (1 << TIF_NOTIFY_RESUME)
> @@ -82,11 +83,13 @@ struct thread_info {
>  #define _TIF_NEED_RESCHED  (1 << TIF_NEED_RESCHED)
>  #define _TIF_SYSCALL_TRACEPOINT(1 << TIF_SYSCALL_TRACEPOINT)
>  #define _TIF_SYSCALL_AUDIT (1 << TIF_SYSCALL_AUDIT)
> +#define _TIF_SECCOMP   (1 << TIF_SECCOMP)
>
>  #define _TIF_WORK_MASK \
> (_TIF_NOTIFY_RESUME | _TIF_SIGPENDING | _TIF_NEED_RESCHED)
>
>  #define _TIF_SYSCALL_WORK \
> -   (_TIF_SYSCALL_TRACE | _TIF_SYSCALL_TRACEPOINT | _TIF_SYSCALL_AUDIT)
> +   (_TIF_SYSCALL_TRACE | _TIF_SYSCALL_TRACEPOINT | _TIF_SYSCALL_AUDIT | \
> +_TIF_SECCOMP )
>
>  #endif /* _ASM_RISCV_THREAD_INFO_H */
> diff --git a/arch/riscv/kernel/entry.S b/arch/riscv/kernel/entry.S
> index bc7a56e1ca6f..0bbedfa3e47d 100644
> --- a/arch/riscv/kernel/entry.S
> +++ b/arch/riscv/kernel/entry.S
> @@ -203,8 +203,25 @@ check_syscall_nr:
> /* Check to make sure we don't jump to a bogus syscall number. */
> li t0, __NR_syscalls
> la s0, sys_ni_syscall
> -   /* Syscall number held in a7 */
> -   bgeu a7, t0, 1f
> +   /*
> +* The tracer can change syscall number to valid/invalid value.
> +* We use syscall_set_nr helper in syscall_trace_enter thus we
> +* cannot trust the current value in a7 and have to reload from
> +* the 

Re: [patch V2 27/38] posix-cpu-timers: Remove cputime_expires

2019-08-23 Thread Frederic Weisbecker
On Wed, Aug 21, 2019 at 09:09:14PM +0200, Thomas Gleixner wrote:
> The last users of the magic struct cputime based expiry cache are
> gone. Remove the leftovers.
> 
> Signed-off-by: Thomas Gleixner 

Reviewed-by: Frederic Weisbecker 


Re: [PATCH] KVM: x86: Don't update RIP or do single-step on faulting emulation

2019-08-23 Thread Andy Lutomirski
On Fri, Aug 23, 2019 at 1:55 PM Sean Christopherson
 wrote:
>
> Don't advance RIP or inject a single-step #DB if emulation signals a
> fault.  This logic applies to all state updates that are conditional on
> clean retirement of the emulation instruction, e.g. updating RFLAGS was
> previously handled by commit 38827dbd3fb85 ("KVM: x86: Do not update
> EFLAGS on faulting emulation").
>
> Not advancing RIP is likely a nop, i.e. ctxt->eip isn't updated with
> ctxt->_eip until emulation "retires" anyways.  Skipping #DB injection
> fixes a bug reported by Andy Lutomirski where a #UD on SYSCALL due to
> invalid state with RFLAGS.RF=1 would loop indefinitely due to emulation

EFLAGS.TF=1


Re: [patch V2 26/38] posix-cpu-timers: Make expiry checks array based

2019-08-23 Thread Frederic Weisbecker
On Wed, Aug 21, 2019 at 09:09:13PM +0200, Thomas Gleixner wrote:
> The expiry cache is an array indexed by clock ids. The new sample functions
> allow to retrieve a corresponding array of samples.
> 
> Convert the fastpath expiry checks to make use of the new sample functions
> and do the comparisons on the sample and the expiry array.
> 
> Make the check for the expiry array being zero array based as well.
> 
> Signed-off-by: Thomas Gleixner 

Reviewed-by: Frederic Weisbecker 

Just a few boring neats:

>  /**
> - * task_cputime_expired - Compare two task_cputime entities.
> + * task_cputimers_expired - Compare two task_cputime entities.

s/task_cputime/clock arrays ?

>   *
> - * @sample:  The task_cputime structure to be checked for expiration.
> - * @expires: Expiration times, against which @sample will be checked.
> + * @samples: Array of current samples for the CPUCLOCK clocks
> + * @expiries:Array of expiry values for the CPUCLOCK clocks
>   *
> - * Checks @sample against @expires to see if any field of @sample has 
> expired.
> - * Returns true if any field of the former is greater than the corresponding
> - * field of the latter if the latter field is set.  Otherwise returns false.
> + * Returns true if any mmember of @samples is greater than the corresponding

s/mmember/member

> + * member of @expiries if that member is non zero. False otherwise
>   */
> -static inline int task_cputime_expired(const struct task_cputime *sample,
> - const struct task_cputime *expires)
> +static inline bool task_cputimers_expired(const u64 *sample, const u64 
> *expiries)

s/sample/samples to agree with above documented parameters.

Thanks.


Re: [PATCH] net/mlx5: fix a -Wstringop-truncation warning

2019-08-23 Thread David Miller


Saeed, I assume I'll get this from you.


Re: [patch V2 25/38] posix-cpu-timers: Provide array based sample functions

2019-08-23 Thread Frederic Weisbecker
On Wed, Aug 21, 2019 at 09:09:12PM +0200, Thomas Gleixner wrote:
> Instead of using task_cputime and doing the addition of utime and stime at
> all call sites, it's way simpler to have a sample array which allows
> indexed based checks against the expiry cache array.
> 
> Signed-off-by: Thomas Gleixner 

Reviewed-by: Frederic Weisbecker 


Re: [PATCH 5.2 000/135] 5.2.10-stable review

2019-08-23 Thread Sasha Levin

On Fri, Aug 23, 2019 at 12:41:03PM -0600, shuah wrote:

On 8/22/19 11:05 AM, Sasha Levin wrote:


This is the start of the stable review cycle for the 5.2.10 release.
There are 135 patches in this series, all will be posted as a response
to this one.  If anyone has any issues with these being applied, please
let me know.

Responses should be made by Sat 24 Aug 2019 05:07:10 PM UTC.
Anything received after that time might be too late.

The whole patch series can be found in one patch at:

https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-5.2.10-rc1.gz


I am seeing "Sorry I can't find your kernels". Is this posted?


I proposed that we stop uploading the patch to see if anyone is actually
using it.

An alternative would be to use the git web interface instead, so for
example a patch file for 5.2.10-rc1 can be generated at:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git/patch/?id=linux-5.2.y=v5.2.9

--
Thanks,
Sasha


WARNINGs in set_task_reclaim_state with memory cgroup and full memory usage

2019-08-23 Thread Adric Blake
Synopsis:
A WARN_ON_ONCE is hit twice in set_task_reclaim_state under the
following conditions:
- a memory cgroup has been created and a task assigned it it
- memory.limit_in_bytes has been set
- memory has filled up, likely from cache

In my usage, I create a cgroup under the current session scope and
assign a task to it. I then set memory.limit_in_bytes and
memory.soft_limit_in_bytes for the cgroup to reasonable values, say
1G/512M. The program accesses large files frequently and gradually
fills memory with the page cache. The warnings appears when the
entirety of the system memory is filled, presumably from other
programs.

If I wait until the program has filled the entirety of system memory
with cache and then assign a memory limit, the warnings appear
immediately.

I am building the linux git. I first noticed this issue with the
drm-tip 5.3rc3 and 5.3rc4 kernels, and tested linux master after
5.3rc5 to confirm the bug more resoundingly.

Here are the warnings.

[38491.963105] WARNING: CPU: 7 PID: 175 at mm/vmscan.c:245
set_task_reclaim_state+0x1e/0x40
[38491.963106] Modules linked in: iwlmvm mac80211 libarc4 iwlwifi
cfg80211 xt_comment nls_iso8859_1 nls_cp437 vfat fat xfs jfs btrfs xor
raid6_pq libcrc32c ccm tun rfcomm fuse xt_tcpudp ip6t_REJECT
nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_multiport xt_owner
snd_hda_codec_hdmi ip6table_filter ip6_tables iptable_filter bnep ext4
crc32c_generic mbcache jbd2 snd_hda_codec_realtek
snd_hda_codec_generic snd_soc_skl snd_soc_hdac_hda snd_hda_ext_core
snd_soc_skl_ipc x86_pkg_temp_thermal intel_powerclamp snd_soc_sst_ipc
coretemp snd_soc_sst_dsp snd_soc_acpi_intel_match kvm_intel
snd_soc_acpi i915 snd_soc_core kvm snd_compress ac97_bus
snd_pcm_dmaengine snd_hda_intel i2c_algo_bit btusb irqbypass
drm_kms_helper btrtl snd_hda_codec dell_laptop btbcm crct10dif_pclmul
snd_hda_core crc32c_intel btintel iTCO_wdt ghash_clmulni_intel drm
ledtrig_audio aesni_intel iTCO_vendor_support snd_hwdep dell_wmi
rtsx_usb_ms r8169 dell_smbios aes_x86_64 mei_hdcp crypto_simd
intel_gtt bluetooth snd_pcm cryptd dcdbas
[38491.963155]  wmi_bmof dell_wmi_descriptor intel_rapl_msr
glue_helper snd_timer joydev intel_cstate snd realtek memstick
dell_smm_hwmon mousedev psmouse input_leds libphy intel_uncore
ecdh_generic ecc crc16 rfkill intel_rapl_perf soundcore i2c_i801
agpgart mei_me tpm_crb syscopyarea sysfillrect sysimgblt mei
intel_xhci_usb_role_switch fb_sys_fops idma64 tpm_tis roles
processor_thermal_device intel_rapl_common i2c_hid tpm_tis_core
int3403_thermal intel_soc_dts_iosf battery wmi intel_lpss_pci
intel_lpss intel_pch_thermal tpm int3400_thermal int3402_thermal
acpi_thermal_rel int340x_thermal_zone rng_core intel_hid ac
sparse_keymap evdev mac_hid crypto_user ip_tables x_tables
hid_multitouch rtsx_usb_sdmmc mmc_core rtsx_usb hid_logitech_hidpp
sr_mod cdrom sd_mod uas usb_storage hid_logitech_dj hid_generic usbhid
hid ahci serio_raw libahci atkbd libps2 libata xhci_pci scsi_mod
xhci_hcd crc32_pclmul i8042 serio f2fs [last unloaded: cfg80211]
[38491.963221] CPU: 7 PID: 175 Comm: kswapd0 Not tainted
5.3.0-rc5+149+gbb7ba8069de9 #1
[38491.963222] Hardware name: Dell Inc. Inspiron 5570/09YTN7, BIOS
1.2.3 05/15/2019
[38491.963226] RIP: 0010:set_task_reclaim_state+0x1e/0x40
[38491.963228] Code: 78 a9 e7 ff 0f 1f 84 00 00 00 00 00 0f 1f 44 00
00 55 48 89 f5 53 48 89 fb 48 85 ed 48 8b 83 08 08 00 00 74 11 48 85
c0 74 02 <0f> 0b 48 89 ab 08 08 00 00 5b 5d c3 48 85 c0 75 f1 0f 0b 48
89 ab
[38491.963229] RSP: 0018:8c898031fc60 EFLAGS: 00010286
[38491.963230] RAX: 8c898031fe28 RBX: 892aa04ddc40 RCX: 
[38491.963231] RDX: 8c898031fc60 RSI: 8c898031fcd0 RDI: 892aa04ddc40
[38491.963233] RBP: 8c898031fcd0 R08: 8c898031fd48 R09: 89279674b800
[38491.963234] R10:  R11:  R12: 8c898031fd48
[38491.963235] R13: 892a842ef000 R14: 892aaf7fc000 R15: 
[38491.963236] FS:  () GS:892aa33c()
knlGS:
[38491.963238] CS:  0010 DS:  ES:  CR0: 80050033
[38491.963239] CR2: 7f90628fa000 CR3: 00027ee0a002 CR4: 003606e0
[38491.963239] Call Trace:
[38491.963246]  mem_cgroup_shrink_node+0x9b/0x1d0
[38491.963250]  mem_cgroup_soft_limit_reclaim+0x10c/0x3a0
[38491.963254]  balance_pgdat+0x276/0x540
[38491.963258]  kswapd+0x200/0x3f0
[38491.963261]  ? wait_woken+0x80/0x80
[38491.963265]  kthread+0xfd/0x130
[38491.963267]  ? balance_pgdat+0x540/0x540
[38491.963269]  ? kthread_park+0x80/0x80
[38491.963273]  ret_from_fork+0x35/0x40
[38491.963276] ---[ end trace 727343df67b2398a ]---
[38492.129877] WARNING: CPU: 7 PID: 175 at mm/vmscan.c:248
set_task_reclaim_state+0x2f/0x40
[38492.129879] Modules linked in: iwlmvm mac80211 libarc4 iwlwifi
cfg80211 xt_comment nls_iso8859_1 nls_cp437 vfat fat xfs jfs btrfs xor
raid6_pq libcrc32c ccm tun rfcomm fuse xt_tcpudp ip6t_REJECT
nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_multiport xt_owner

Re: [RESEND PATCH 04/13] KVM: x86: Drop EMULTYPE_NO_UD_ON_FAIL as a standalone type

2019-08-23 Thread Sean Christopherson
On Fri, Aug 23, 2019 at 04:32:05PM +0300, Liran Alon wrote:
> 
> > On 23 Aug 2019, at 16:21, Liran Alon  wrote:
> > 
> >> On 23 Aug 2019, at 4:07, Sean Christopherson 
> >>  wrote:
> >> 
> >> The "no #UD on fail" is used only in the VMWare case, and for the VMWare
> >> scenario it really means "#GP instead of #UD on fail".  Remove the flag
> >> in preparation for moving all fault injection into the emulation flow
> >> itself, which in turn will allow eliminating EMULATE_DONE and company.
> >> 
> >> Signed-off-by: Sean Christopherson 
> > 
> > When I created the commit which introduced this e23661712005 ("KVM: x86:
> > Add emulation_type to not raise #UD on emulation failure") I intentionally
> > introduced a new flag to emulation_type instead of using EMULTYPE_VMWARE as
> > I thought it’s weird to couple this behaviour specifically with VMware
> > emulation.  As it made sense to me that there could be more scenarios in
> > which some VMExit handler would like to use the x86 emulator but in case of
> > failure want to decide what would be the failure handling from the outside.
> > I also didn’t want the x86 emulator to be aware of VMware interception
> > internals.
> > 
> > Having said that, one could argue that the x86 emulator already knows about
> > the VMware interception internals because of how x86_emulate_instruction()
> > use is_vmware_backdoor_opcode() and from the mere existence of
> > EMULTYPE_VMWARE. So I think it’s legit to decide that we will just move all
> > the VMware interception logic into the x86 emulator. Including handling
> > emulation failures. But then, I would make this patch of yours to also
> > modify handle_emulation_failure() to queue #GP to guest directly instead of
> > #GP intercept in VMX/SVM to do so.  I see you do it in a later patch "KVM:
> > x86: Move #GP injection for VMware into x86_emulate_instruction()" but I
> > think this should just be squashed with this patch to make sense.
> > 
> > To sum-up, I agree with your approach but I recommend you squash this patch
> > and patch 6 of the series to one and change commit message to explain that
> > you just move entire handling of VMware interception into the x86 emulator.
> > Instead of providing explanations such as VMware emulation is the only one
> > that use “no #UD on fail”.
> 
> After reading patch 5 as-well, I would recommend to first apply patch 5
> (filter out #GP with error-code != 0) and only then apply 4+6.

Works for me.


[PATCH] gpiolib: acpi: Add gpiolib_acpi_run_edge_events_on_boot option and blacklist

2019-08-23 Thread Hans de Goede
Another day; another DSDT bug we need to workaround...

Since commit ca876c7483b6 ("gpiolib-acpi: make sure we trigger edge events
at least once on boot") we call _AEI edge handlers at boot.

In some rare cases this causes problems. One example of this is the Minix
Neo Z83-4 mini PC, this device has a clear DSDT bug where it has some copy
and pasted code for dealing with Micro USB-B connector host/device role
switching, while the mini PC does not even have a micro-USB connector.
This code, which should not be there, messes with the DDC data pin from
the HDMI connector (switching it to GPIO mode) breaking HDMI support.

To avoid problems like this, this commit adds a new
gpiolib_acpi_run_edge_events_on_boot kernel commandline option which
can be "on", "off", or "auto" (default).

In auto mode the default is on and a DMI based blacklist is used,
the initial version of this blacklist contains the Minix Neo Z83-4
fixing the HDMI being broken on this device.

Cc: sta...@vger.kernel.org
Cc: Daniel Drake 
Cc: Ian W MORRISON 
Reported-by: Ian W MORRISON 
Suggested-by: Ian W MORRISON 
Fixes: ca876c7483b6 ("gpiolib-acpi: make sure we trigger edge events at least 
once on boot")
Signed-off-by: Hans de Goede 
---
 drivers/gpio/gpiolib-acpi.c | 52 ++---
 1 file changed, 48 insertions(+), 4 deletions(-)

diff --git a/drivers/gpio/gpiolib-acpi.c b/drivers/gpio/gpiolib-acpi.c
index 39f2f9035c11..546dc2c1f3f1 100644
--- a/drivers/gpio/gpiolib-acpi.c
+++ b/drivers/gpio/gpiolib-acpi.c
@@ -7,6 +7,7 @@
  *  Mika Westerberg 
  */
 
+#include 
 #include 
 #include 
 #include 
@@ -19,6 +20,23 @@
 
 #include "gpiolib.h"
 
+static int gpiolib_acpi_run_edge_events_on_boot = -1;
+
+static int __init gpiolib_acpi_run_edge_events_on_boot_setup(char *arg)
+{
+   if (!strcmp(arg, "on"))
+   gpiolib_acpi_run_edge_events_on_boot = 1;
+   else if (!strcmp(arg, "off"))
+   gpiolib_acpi_run_edge_events_on_boot = 0;
+   else if (!strcmp(arg, "auto"))
+   gpiolib_acpi_run_edge_events_on_boot = -1;
+
+   return 1;
+}
+
+__setup("gpiolib_acpi_run_edge_events_on_boot=",
+gpiolib_acpi_run_edge_events_on_boot_setup);
+
 /**
  * struct acpi_gpio_event - ACPI GPIO event handler data
  *
@@ -150,6 +168,29 @@ bool acpi_gpio_get_irq_resource(struct acpi_resource *ares,
 }
 EXPORT_SYMBOL_GPL(acpi_gpio_get_irq_resource);
 
+static const struct dmi_system_id run_edge_events_on_boot_blacklist[] =
+{
+   {
+   .matches = {
+   DMI_MATCH(DMI_SYS_VENDOR, "MINIX"),
+   DMI_MATCH(DMI_PRODUCT_NAME, "Z83-4"),
+   }
+   },
+   {} /* Terminating entry */
+};
+
+static bool acpi_gpiochip_run_edge_events_on_boot(void)
+{
+   if (gpiolib_acpi_run_edge_events_on_boot == -1) {
+   if (dmi_check_system(run_edge_events_on_boot_blacklist))
+   gpiolib_acpi_run_edge_events_on_boot = 0;
+   else
+   gpiolib_acpi_run_edge_events_on_boot = 1;
+   }
+
+   return gpiolib_acpi_run_edge_events_on_boot;
+}
+
 static void acpi_gpiochip_request_irq(struct acpi_gpio_chip *acpi_gpio,
  struct acpi_gpio_event *event)
 {
@@ -170,10 +211,13 @@ static void acpi_gpiochip_request_irq(struct 
acpi_gpio_chip *acpi_gpio,
event->irq_requested = true;
 
/* Make sure we trigger the initial state of edge-triggered IRQs */
-   value = gpiod_get_raw_value_cansleep(event->desc);
-   if (((event->irqflags & IRQF_TRIGGER_RISING) && value == 1) ||
-   ((event->irqflags & IRQF_TRIGGER_FALLING) && value == 0))
-   event->handler(event->irq, event);
+   if (acpi_gpiochip_run_edge_events_on_boot() &&
+   (event->irqflags & (IRQF_TRIGGER_RISING | IRQF_TRIGGER_FALLING))) {
+   value = gpiod_get_raw_value_cansleep(event->desc);
+   if (((event->irqflags & IRQF_TRIGGER_RISING) && value == 1) ||
+   ((event->irqflags & IRQF_TRIGGER_FALLING) && value == 0))
+   event->handler(event->irq, event);
+   }
 }
 
 static void acpi_gpiochip_request_irqs(struct acpi_gpio_chip *acpi_gpio)
-- 
2.22.0



Re: [PATCH] tcp: fix tcp_rtx_queue_tail in case of empty retransmit queue

2019-08-23 Thread David Miller


All networking patches must be sent CC:'d to net...@vger.kernel.org

Thank you.


Re: [PATCH] KVM: x86: Don't update RIP or do single-step on faulting emulation

2019-08-23 Thread Nadav Amit
> On Aug 23, 2019, at 1:55 PM, Sean Christopherson 
>  wrote:
> 
> Don't advance RIP or inject a single-step #DB if emulation signals a
> fault.  This logic applies to all state updates that are conditional on
> clean retirement of the emulation instruction, e.g. updating RFLAGS was
> previously handled by commit 38827dbd3fb85 ("KVM: x86: Do not update
> EFLAGS on faulting emulation").
> 
> Not advancing RIP is likely a nop, i.e. ctxt->eip isn't updated with
> ctxt->_eip until emulation "retires" anyways.  Skipping #DB injection
> fixes a bug reported by Andy Lutomirski where a #UD on SYSCALL due to
> invalid state with RFLAGS.RF=1 would loop indefinitely due to emulation
> overwriting the #UD with #DB and thus restarting the bad SYSCALL over
> and over.
> 
> Cc: Nadav Amit 
> Cc: sta...@vger.kernel.org
> Reported-by: Andy Lutomirski 
> Fixes: 663f4c61b803 ("KVM: x86: handle singlestep during emulation")
> Signed-off-by: Sean Christopherson 

Seems fine. I guess I should’ve found it before…

Consider running the relevant self-tests (e.g., single_test_syscall) to
avoid regressions.



Re: [PATCH net 2/2] r8152: avoid using napi_disable after netif_napi_del.

2019-08-23 Thread David Miller
From: Hayes Wang 
Date: Fri, 23 Aug 2019 16:53:02 +0800

> Exchange netif_napi_del() and unregister_netdev() in rtl8152_disconnect()
> to avoid using napi_disable() after netif_napi_del().
> 
> Signed-off-by: Hayes Wang 
> ---
>  drivers/net/usb/r8152.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/net/usb/r8152.c b/drivers/net/usb/r8152.c
> index 690a24d1ef82..29390eda5251 100644
> --- a/drivers/net/usb/r8152.c
> +++ b/drivers/net/usb/r8152.c
> @@ -5364,8 +5364,8 @@ static void rtl8152_disconnect(struct usb_interface 
> *intf)
>   if (tp) {
>   rtl_set_unplug(tp);
>  
> - netif_napi_del(>napi);
>   unregister_netdev(tp->netdev);
> + netif_napi_del(>napi);
>   cancel_delayed_work_sync(>hw_phy_work);
>   tp->rtl_ops.unload(tp);
>   free_netdev(tp->netdev);

This is completely redundant because free_netdev() will perform all of
the necessary netif_napi_del() calls.


[PATCH] KVM: VMX: Handle single-step #DB for EMULTYPE_SKIP on EPT misconfig

2019-08-23 Thread Sean Christopherson
VMX's EPT misconfig flow to handle fast-MMIO path falls back to decoding
the instruction to determine the instruction length when running as a
guest (Hyper-V doesn't fill VMCS.VM_EXIT_INSTRUCTION_LEN because it's
technically not defined for EPT misconfigs).  Rather than implement the
slow skip in VMX's generic skip_emulated_instruction(),
handle_ept_misconfig() directly calls kvm_emulate_instruction() with
EMULTYPE_SKIP, which intentionally doesn't do single-step detection, and
so handle_ept_misconfig() misses a single-step #DB.

Rework the EPT misconfig fallback case to route it through
kvm_skip_emulated_instruction() so that single-step #DBs and interrupt
shadow updates are handled automatically.  I.e. make VMX's slow skip
logic match SVM's and have the SVM flow not intentionally avoid the
shadow update.

Alternatively, the handle_ept_misconfig() could manually handle single-
step detection, but that results in EMULTYPE_SKIP having split logic for
the interrupt shadow vs. single-step #DBs, and split emulator logic is
largely what led to this mess in the first place.

Modifying SVM to mirror VMX flow isn't really an option as SVM's case
isn't limited to a specific exit reason, i.e. handling the slow skip in
skip_emulated_instruction() is mandatory for all intents and purposes.

Drop VMX's skip_emulated_instruction() wrapper since it can now fail,
and instead WARN if it fails unexpectedly, e.g. if exit_reason somehow
becomes corrupted.

Cc: Vitaly Kuznetsov 
Fixes: d391f12070672 ("x86/kvm/vmx: do not use vm-exit instruction length for 
fast MMIO when running nested")
Signed-off-by: Sean Christopherson 
---

*** LOOK HERE ***

This patch applies on top my recent emulation cleanup[1][2] as it has
non-trivial conflicts, dealing with those seemed like a waste of time,
and this doesn't seem like a candidate for stable.  Let me know if you'd
prefer it to be respun without the dependency.

Sadly/ironically, this unwinds some of the logic that was recently
added by Vitaly at my suggestion.  Hindsight is 20/20 and all that...

[1] 
https://lkml.kernel.org/r/20190823010709.24879-1-sean.j.christopher...@intel.com
[2] https://patchwork.kernel.org/cover/0331/

 arch/x86/kvm/svm.c | 17 +++---
 arch/x86/kvm/vmx/vmx.c | 52 ++
 arch/x86/kvm/x86.c |  6 -
 3 files changed, 36 insertions(+), 39 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index f374f11358b7..93137d5f71f8 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -777,14 +777,15 @@ static int skip_emulated_instruction(struct kvm_vcpu 
*vcpu)
svm->next_rip = svm->vmcb->control.next_rip;
}
 
-   if (!svm->next_rip)
-   return kvm_emulate_instruction(vcpu, EMULTYPE_SKIP);
-
-   if (svm->next_rip - kvm_rip_read(vcpu) > MAX_INST_SIZE)
-   printk(KERN_ERR "%s: ip 0x%lx next 0x%llx\n",
-  __func__, kvm_rip_read(vcpu), svm->next_rip);
-
-   kvm_rip_write(vcpu, svm->next_rip);
+   if (!svm->next_rip) {
+   if (!kvm_emulate_instruction(vcpu, EMULTYPE_SKIP))
+   return 0;
+   } else {
+   if (svm->next_rip - kvm_rip_read(vcpu) > MAX_INST_SIZE)
+   pr_err("%s: ip 0x%lx next 0x%llx\n",
+  __func__, kvm_rip_read(vcpu), svm->next_rip);
+   kvm_rip_write(vcpu, svm->next_rip);
+   }
svm_set_interrupt_shadow(vcpu, 0);
 
return 1;
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 44d868c49301..4ee1572e1a7e 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -1472,17 +1472,27 @@ static int vmx_rtit_ctl_check(struct kvm_vcpu *vcpu, 
u64 data)
return 0;
 }
 
-/*
- * Returns an int to be compatible with SVM implementation (which can fail).
- * Do not use directly, use skip_emulated_instruction() instead.
- */
-static int __skip_emulated_instruction(struct kvm_vcpu *vcpu)
+static int skip_emulated_instruction(struct kvm_vcpu *vcpu)
 {
unsigned long rip;
 
-   rip = kvm_rip_read(vcpu);
-   rip += vmcs_read32(VM_EXIT_INSTRUCTION_LEN);
-   kvm_rip_write(vcpu, rip);
+   /*
+* Using VMCS.VM_EXIT_INSTRUCTION_LEN on EPT misconfig depends on
+* undefined behavior: Intel's SDM doesn't mandate the VMCS field be
+* set when EPT misconfig occurs.  In practice, real hardware updates
+* VM_EXIT_INSTRUCTION_LEN on EPT misconfig, but other hypervisors
+* (namely Hyper-V) don't set it due to it being undefined behavior,
+* i.e. we end up advancing IP with some random value.
+*/
+   if (!static_cpu_has(X86_FEATURE_HYPERVISOR) ||
+   to_vmx(vcpu)->exit_reason != EXIT_REASON_EPT_MISCONFIG) {
+   rip = kvm_rip_read(vcpu);
+   rip += vmcs_read32(VM_EXIT_INSTRUCTION_LEN);
+   kvm_rip_write(vcpu, rip);
+   } else {
+   if 

Re: [PATCH net-next v4 0/2] r8152: save EEE

2019-08-23 Thread David Miller
From: Hayes Wang 
Date: Fri, 23 Aug 2019 15:33:39 +0800

> v4:
> For patch #2, remove redundant calling of "ocp_reg_write(tp, OCP_EEE_ADV, 0)".
> 
> v3:
> For patch #2, fix the mistake caused by copying and pasting.
> 
> v2:
> Adjust patch #1. The EEE has been disabled in the beginning of
> r8153_hw_phy_cfg() and r8153b_hw_phy_cfg(), so only check if
> it is necessary to enable EEE.
> 
> Add the patch #2 for the helper function.
> 
> v1:
> Saving the settings of EEE to avoid they become the default settings
> after reset_resume().

Series applied.


Re: [PATCH] wimax/i2400m: fix calculation of index, remove sizeof

2019-08-23 Thread David Miller
From: Colin Ian King 
Date: Fri, 23 Aug 2019 12:27:00 +0100

> On 23/08/2019 12:23, Dan Carpenter wrote:
>> On Fri, Aug 23, 2019 at 09:52:30AM +0100, Colin King wrote:
>>> From: Colin Ian King 
>>>
>>> The subtraction of the two pointers is automatically scaled by the
>>> size of the size of the object the pointers point to, so the division
>>> by sizeof(*i2400m->barker) is incorrect.  Fix this by removing the
>>> division.  Also make index an unsigned int to clean up a checkpatch
>>> warning.
>>>
>>> Addresses-Coverity: ("Extra sizeof expression")
>>> Fixes: aba3792ac2d7 ("wimax/i2400m: rework bootrom initialization to be 
>>> more flexible")
>>> Signed-off-by: Colin Ian King 
>>> ---
>>>  drivers/net/wimax/i2400m/fw.c | 3 +--
>>>  1 file changed, 1 insertion(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/net/wimax/i2400m/fw.c b/drivers/net/wimax/i2400m/fw.c
>>> index 489cba9b284d..599a703af6eb 100644
>>> --- a/drivers/net/wimax/i2400m/fw.c
>>> +++ b/drivers/net/wimax/i2400m/fw.c
>>> @@ -399,8 +399,7 @@ int i2400m_is_boot_barker(struct i2400m *i2400m,
>>>  * associated with the device. */
>>> if (i2400m->barker
>>> && !memcmp(buf, i2400m->barker, sizeof(i2400m->barker->data))) {
>>> -   unsigned index = (i2400m->barker - i2400m_barker_db)
>>> -   / sizeof(*i2400m->barker);
>>> +   unsigned int index = i2400m->barker - i2400m_barker_db;
>>> d_printf(2, dev, "boot barker cache-confirmed #%u/%08x\n",
>>>  index, le32_to_cpu(i2400m->barker->data[0]));
>> 
>> It's only used for this debug output.  You may as well just delete it.
>> 
>>> return 0;
> 
> Deleting wrong debug code vs fixing debug code? I'd rather go for the
> latter.

It's been wrong since day one, so it's been useful for absolutely nobody.

This is also an ancient driver for hardware no longer in production.

Dan is right, just remove this stuff, thanks.


Re: [PATCH 1/7] fs: introduce kernel_pread_file* support

2019-08-23 Thread Luis Chamberlain
On Fri, Aug 23, 2019 at 12:55:30PM -0700, Scott Branden wrote:
> Hi Takashi
> 
> On 2019-08-23 5:29 a.m., Takashi Iwai wrote:
> > On Thu, 22 Aug 2019 21:24:45 +0200,
> > Scott Branden wrote:
> > > Add kernel_pread_file* support to kernel to allow for partial read
> > > of files with an offset into the file.  Existing kernel_read_file
> > > functions call new kernel_pread_file functions with offset=0 and
> > > flags=KERNEL_PREAD_FLAG_WHOLE.
> > Would this change passes the security check like ima?
> > I thought security_kernel_post_read_file() checks the whole content
> > for calculating the hash...
> 
> It passes the fw_run_tests.sh.  How do you test the firmware loader passes
> this security check?

Its not a security check per code, its an audit of the code, to ensure
that no new cases are not covered and its why I had CC'd Mimi. The
question lies in *if* the approach exposes a new interface which cannot
be attested. Its unclear to me if we can attest currently through
security modules the fallback interface, as there are not APIs with a
respective callback yet.

  Luis


Re: [PATCH net 0/9] rxrpc: Fix use of skb_cow_data()

2019-08-23 Thread David Miller
From: David Howells 
Date: Fri, 23 Aug 2019 09:52:28 +0100

> Question for you: how likely is a newly received buffer, through a UDP socket,
> to be 'cloned'?

Very unlikely, I'd say.


Re: [PATCH v2 2/5] spi: spi-fsl-dspi: Exit the ISR with IRQ_NONE when it's not ours

2019-08-23 Thread Mark Brown
On Fri, Aug 23, 2019 at 03:06:52PM +0300, Vladimir Oltean wrote:

> - You left change requests in the initial patchset I submitted, but
> you partially applied the series anyway. You didn't give me a chance
> to respin the whole series and put the shared IRQ fix on top, so it
> applies on old trees as well. No problem, I sent two versions of the
> patch.

Right, and this is fine.  A big part of this is that it's just
generally bad practice to not have fixes at the front of the
series, I'd flag this up as a problem even if the code was all
new and there was no question of applying as a bug fix.  It's
something that's noticable just at the level of looking at the
shape of the series without even looking at the contents of the
patches, if the fix is actually a good one or anything like that.
In the context of this it made it look like the reason you'd had
to do two versions.

> So I didn't put any target version in the patch titles this time,
> although arguably it would have been clearer to you that there's a
> patch for-5.4 and another version of it for-4.20 (which i *think* is
> how I should submit a fix, I don't see any branch for inclusion in
> stable trees per se).

Not for 4.20, for v5.3 - we basically only fix Linus' tree
directly, anything else gets backported from there unless it's
super important.  I don't think anyone is updating v4.20 at all
these days, the version number change from v4 to v5 was totally
arbatrary.

> Yes, I did send a cover letter for a single patch. I thought it's
> harder to miss than a note hidden under patch 2/5 of one series, and
> in the note section of the other's. I think you could have also made

If you're sending a multi-patch series it's of course good to
send a cover letter, it's just single patches where it's adding
overhead.

> No problem, you missed the link between the two. I sent you a link to
> the lkml archive. You said "I'm not online enough to readily follow
> that link right now". Please teach me - I really don't know - how can

It's not that I missed the link between them, it's that what I'd
expected to see was the fix being the first patch in the series
for -next and for that fix to look substantially the same with at
most some context difference.  I wasn't expecting to see a
completely different patch that wasn't at the start of the
series, had the fix been at the start of the series it'd have
been fairly clear what was going on but the refactoring patch
looked like the main reason you'd needed different versions (it's
certainly why they don't visually resemble each other).

In other words it looked like you'd sent a different fix because
the fix you'd done for -next was based on the first patch in the
series rather than there also being some context changes.

> I make links between patchsets easier for you to follow, if you don't
> read cover letters and you can't access lkml? I promise I'll use that
> method next time.

Like I said include a plain text description of what you're
linking to (eg, the subject line from a mail).

> > I do frequently catch up on my mail on flights or while otherwise
> > travelling so this is even more pressing for me than just being about
> > making things a bit easier to read.

> Maybe you simply should do something else while traveling, just saying.

I could also add in the coffee shop I sometimes work from which
doesn't have WiFi or mobile coverage.  Besides, like that part of
the text does say it's also a usability thing, having to fire up
a web browser to figure out what's being described is a stumbling
block.


signature.asc
Description: PGP signature


[PATCH 1/2] perf report: Use timestamp__scnprintf_nsec for time sort key

2019-08-23 Thread Andi Kleen
From: Andi Kleen 

Use timestamp__scnprintf_nsec to print nanoseconds for the time
sort key, instead of open coding.

Signed-off-by: Andi Kleen 
---
 tools/perf/util/sort.c | 10 ++
 1 file changed, 2 insertions(+), 8 deletions(-)

diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c
index f9a38a1dd4d1..0985e9072db0 100644
--- a/tools/perf/util/sort.c
+++ b/tools/perf/util/sort.c
@@ -668,17 +668,11 @@ sort__time_cmp(struct hist_entry *left, struct hist_entry 
*right)
 static int hist_entry__time_snprintf(struct hist_entry *he, char *bf,
size_t size, unsigned int width)
 {
-   unsigned long secs;
-   unsigned long long nsecs;
char he_time[32];
 
-   nsecs = he->time;
-   secs = nsecs / NSEC_PER_SEC;
-   nsecs -= secs * NSEC_PER_SEC;
-
if (symbol_conf.nanosecs)
-   snprintf(he_time, sizeof he_time, "%5lu.%09llu: ",
-secs, nsecs);
+   timestamp__scnprintf_nsec(he->time, he_time,
+ sizeof(he_time));
else
timestamp__scnprintf_usec(he->time, he_time,
  sizeof(he_time));
-- 
2.20.1



[PATCH 2/2] perf report: Fix --ns time sort key output

2019-08-23 Thread Andi Kleen
From: Andi Kleen 

If the user specified --ns, the column to print the sort time stamp
wasn't wide enough to actually print the full nanoseconds.

Widen the time key column width when --ns is specified.

Before:

% perf record -a sleep 1
% perf report --sort time,overhead,symbol --stdio --ns
...
 2.39%  187851.1  [k] smp_call_function_single  
  -  -
 1.53%  187851.1  [k] intel_idle
  -  -
 0.59%  187851.1  [.] __wcscmp_ifunc
  -  -
 0.33%  187851.1  [.]   
  -  -
 0.28%  187851.1  [k] cpuidle_enter_state   
  -  -

After:

% perf report --sort time,overhead,symbol --stdio --ns
...
 2.39%  187851.1  [k] smp_call_function_single  
  -  -
 1.53%  187851.1  [k] intel_idle
  -  -
 0.59%  187851.1  [.] __wcscmp_ifunc
  -  -
 0.33%  187851.1  [.]   
  -  -
 0.28%  187851.1  [k] cpuidle_enter_state   
  -  -

Signed-off-by: Andi Kleen 
---
 tools/perf/util/hist.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index 8efbf58dc3d0..33702675073c 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -193,7 +193,10 @@ void hists__calc_col_len(struct hists *hists, struct 
hist_entry *h)
hists__new_col_len(hists, HISTC_MEM_LVL, 21 + 3);
hists__new_col_len(hists, HISTC_LOCAL_WEIGHT, 12);
hists__new_col_len(hists, HISTC_GLOBAL_WEIGHT, 12);
-   hists__new_col_len(hists, HISTC_TIME, 12);
+   if (symbol_conf.nanosecs)
+   hists__new_col_len(hists, HISTC_TIME, 16);
+   else
+   hists__new_col_len(hists, HISTC_TIME, 12);
 
if (h->srcline) {
len = MAX(strlen(h->srcline), strlen(sort_srcline.se_header));
-- 
2.20.1



[PATCH] KVM: x86: Don't update RIP or do single-step on faulting emulation

2019-08-23 Thread Sean Christopherson
Don't advance RIP or inject a single-step #DB if emulation signals a
fault.  This logic applies to all state updates that are conditional on
clean retirement of the emulation instruction, e.g. updating RFLAGS was
previously handled by commit 38827dbd3fb85 ("KVM: x86: Do not update
EFLAGS on faulting emulation").

Not advancing RIP is likely a nop, i.e. ctxt->eip isn't updated with
ctxt->_eip until emulation "retires" anyways.  Skipping #DB injection
fixes a bug reported by Andy Lutomirski where a #UD on SYSCALL due to
invalid state with RFLAGS.RF=1 would loop indefinitely due to emulation
overwriting the #UD with #DB and thus restarting the bad SYSCALL over
and over.

Cc: Nadav Amit 
Cc: sta...@vger.kernel.org
Reported-by: Andy Lutomirski 
Fixes: 663f4c61b803 ("KVM: x86: handle singlestep during emulation")
Signed-off-by: Sean Christopherson 
---

Note, this has minor conflict with my recent series to cleanup the
emulator return flows[*].  The end result should look something like:

if (!ctxt->have_exception ||
exception_type(ctxt->exception.vector) == EXCPT_TRAP) {
kvm_rip_write(vcpu, ctxt->eip);
if (r && ctxt->tf)
r = kvm_vcpu_do_singlestep(vcpu);
__kvm_set_rflags(vcpu, ctxt->eflags);
}

[*] 
https://lkml.kernel.org/r/20190823010709.24879-1-sean.j.christopher...@intel.com

 arch/x86/kvm/x86.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index b4cfd786d0b6..d2962671c3d3 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -6611,12 +6611,13 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu,
unsigned long rflags = kvm_x86_ops->get_rflags(vcpu);
toggle_interruptibility(vcpu, ctxt->interruptibility);
vcpu->arch.emulate_regs_need_sync_to_vcpu = false;
-   kvm_rip_write(vcpu, ctxt->eip);
-   if (r == EMULATE_DONE && ctxt->tf)
-   kvm_vcpu_do_singlestep(vcpu, );
if (!ctxt->have_exception ||
-   exception_type(ctxt->exception.vector) == EXCPT_TRAP)
+   exception_type(ctxt->exception.vector) == EXCPT_TRAP) {
+   kvm_rip_write(vcpu, ctxt->eip);
+   if (r == EMULATE_DONE && ctxt->tf)
+   kvm_vcpu_do_singlestep(vcpu, );
__kvm_set_rflags(vcpu, ctxt->eflags);
+   }
 
/*
 * For STI, interrupts are shadowed; so KVM_REQ_EVENT will
-- 
2.22.0



  1   2   3   4   5   6   7   8   9   >