Re: [Intel-gfx] [RFC PATCH 00/20] Initial Xe driver submission

2023-01-17 Thread Jason Ekstrand
On Thu, Jan 12, 2023 at 11:17 AM Matthew Brost 
wrote:

> On Thu, Jan 12, 2023 at 10:54:25AM +0100, Lucas De Marchi wrote:
> > On Thu, Jan 05, 2023 at 09:27:57PM +, Matthew Brost wrote:
> > > On Tue, Jan 03, 2023 at 12:21:08PM +, Tvrtko Ursulin wrote:
> > > >
> > > > On 22/12/2022 22:21, Matthew Brost wrote:
> > > > > Hello,
> > > > >
> > > > > This is a submission for Xe, a new driver for Intel GPUs that
> supports both
> > > > > integrated and discrete platforms starting with Tiger Lake (first
> platform with
> > > > > Intel Xe Architecture). The intention of this new driver is to
> have a fresh base
> > > > > to work from that is unencumbered by older platforms, whilst also
> taking the
> > > > > opportunity to rearchitect our driver to increase sharing across
> the drm
> > > > > subsystem, both leveraging and allowing us to contribute more
> towards other
> > > > > shared components like TTM and drm/scheduler. The memory model is
> based on VM
> > > > > bind which is similar to the i915 implementation. Likewise the
> execbuf
> > > > > implementation for Xe is very similar to execbuf3 in the i915 [1].
> > > > >
> > > > > The code is at a stage where it is already functional and has
> experimental
> > > > > support for multiple platforms starting from Tiger Lake, with
> initial support
> > > > > implemented in Mesa (for Iris and Anv, our OpenGL and Vulkan
> drivers), as well
> > > > > as in NEO (for OpenCL and Level0). A Mesa MR has been posted [2]
> and the NEO
> > > > > implementation will be released publicly early next year. We also
> have a suite
> > > > > of IGTs for XE that will appear on the IGT list shortly.
> > > > >
> > > > > It has been built with the assumption of supporting multiple
> architectures from
> > > > > the get-go, right now with tests running both on X86 and ARM
> hosts. And we
> > > > > intend to continue working on it and improving on it as part of
> the kernel
> > > > > community upstream.
> > > > >
> > > > > The new Xe driver leverages a lot from i915 and work on i915
> continues as we
> > > > > ready Xe for production throughout 2023.
> > > > >
> > > > > As for display, the intent is to share the display code with the
> i915 driver so
> > > > > that there is maximum reuse there. Currently this is being done by
> compiling the
> > > > > display code twice, but alternatives to that are under
> consideration and we want
> > > > > to have more discussion on what the best final solution will look
> like over the
> > > > > next few months. Right now, work is ongoing in refactoring the
> display codebase
> > > > > to remove as much as possible any unnecessary dependencies on
> i915-specific data
> > > > > structures there.
> > > > >
> > > > > We currently have 2 submission backends, execlists and GuC. The
> execlists backend is
> > > > > meant mostly for testing and is not fully functional, while the GuC
> backend is fully
> > > > > functional. As with GuC submission in the i915, in Xe the GuC
> firmware is
> > > > > required and should be placed in /lib/firmware/xe.
> > > >
> > > > What is the plan going forward for the execlists backend? I think it
> would
> > > > be preferable to not upstream something semi-functional and so to
> carry
> > > > technical debt in the brand new code base, from the very start. If
> it is for
> > > > Tigerlake, which is the starting platform for Xe, could it be made
> GuC-only on
> > > > Tigerlake, for instance?
> > > >
> > >
> > > A little background here. In the original PoC written by Jason and
> Dave,
> > > the execlist backend was the only one present and it was in
> semi-working
> > > state. As soon as myself and a few others started working on Xe, we went
> > > all in on the GuC backend. We left the execlist backend basically in
> > > the state it was in. We left it in place for 2 reasons.
> > >
> > > 1. Having 2 backends from the start ensured we layered our code
> > > correctly. The layering was a complete disaster in the i915 so we really
> > > wanted to avoid that.
> > > 2. The thought was it might be needed for early product bring-up one
> > > day.
> > >
> > > As I think about this a bit more, we will likely just delete the execlist
> backend
> > > before merging this upstream and perhaps just carry 1 large patch
> > > internally with this implementation that we can use as needed. Final
> > > decision TBD though.
> >
> > but that might regress after some time on the "let's keep 2 backends so we
> > layer the code correctly" front. Leaving the additional backend behind
> > CONFIG_BROKEN or XE_EXPERIMENTAL, or something like that, not
> > enabled by distros, but enabled in CI would be a good idea IMO.
> >
> > Carrying a large patch out of tree would make things harder for new
> > platforms. A perfect backend split would make it possible, but like I
> > said, we are likely not to have it if we delete the second backend.
> >
>
> Good points here Lucas. One thing that we absolutely have wrong is
> falling back to execlists if the GuC firmware is missing. We definitely
> should not be 
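(A minimal sketch of the behaviour being argued for here: fail the probe when
the GuC firmware is absent instead of silently falling back to execlists. It
assumes only the standard request_firmware() interface; the xe_guc_fw_load()
name, device layout, and firmware path are illustrative, not the actual Xe
code.)

/* Sketch: missing firmware is a hard probe error, no execlist fallback. */
static int xe_guc_fw_load(struct xe_device *xe)
{
        const struct firmware *fw;
        int err;

        err = request_firmware(&fw, "xe/tgl_guc.bin", xe->drm.dev);
        if (err) {
                drm_err(&xe->drm, "GuC firmware is required: %d\n", err);
                return err;     /* abort probe, do not fall back */
        }

        /* ... authenticate and DMA the image to the GuC here ... */

        release_firmware(fw);
        return 0;
}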

Re: [Intel-gfx] [RFC PATCH 00/20] Initial Xe driver submission

2023-01-17 Thread Jason Ekstrand
On Thu, Dec 22, 2022 at 4:29 PM Matthew Brost 
wrote:

> Hello,
>
> This is a submission for Xe, a new driver for Intel GPUs that supports both
> integrated and discrete platforms starting with Tiger Lake (first platform
> with
> Intel Xe Architecture). The intention of this new driver is to have a
> fresh base
> to work from that is unencumbered by older platforms, whilst also taking
> the
> opportunity to rearchitect our driver to increase sharing across the drm
> subsystem, both leveraging and allowing us to contribute more towards other
> shared components like TTM and drm/scheduler. The memory model is based on
> VM
> bind which is similar to the i915 implementation. Likewise the execbuf
> implementation for Xe is very similar to execbuf3 in the i915 [1].
>
> The code is at a stage where it is already functional and has experimental
> support for multiple platforms starting from Tiger Lake, with initial
> support
> implemented in Mesa (for Iris and Anv, our OpenGL and Vulkan drivers), as
> well
> as in NEO (for OpenCL and Level0). A Mesa MR has been posted [2] and the
> NEO implementation will be released publicly early next year. We also have a
> suite
> of IGTs for XE that will appear on the IGT list shortly.
>
> It has been built with the assumption of supporting multiple architectures
> from
> the get-go, right now with tests running both on X86 and ARM hosts. And we
> intend to continue working on it and improving on it as part of the kernel
> community upstream.
>
> The new Xe driver leverages a lot from i915 and work on i915 continues as
> we
> ready Xe for production throughout 2023.
>
> As for display, the intent is to share the display code with the i915
> driver so
> that there is maximum reuse there. Currently this is being done by
> compiling the
> display code twice, but alternatives to that are under consideration and
> we want
> to have more discussion on what the best final solution will look like
> over the
> next few months. Right now, work is ongoing in refactoring the display
> codebase
> to remove as much as possible any unnecessary dependencies on
> i915-specific data
> structures there.
>
> We currently have 2 submission backends, execlists and GuC. The execlists
> backend is meant mostly for testing and is not fully functional, while the
> GuC backend is fully
> functional. As with GuC submission in the i915, in Xe the GuC firmware is
> required and should be placed in /lib/firmware/xe.
>
> The GuC firmware can be found in the below location:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/tree/i915
>
> The easiest way to setup firmware is:
> cp -r /lib/firmware/i915 /lib/firmware/xe
>
> The code has been organized such that we have all patches that touch areas
> outside of drm/xe first for review, and then the actual new driver in a
> separate
> commit. The code which is outside of drm/xe is included in this RFC while
> drm/xe is not due to the size of the commit. The drm/xe code is
> available in
> a public repo listed below.
>
> Xe driver commit:
>
> https://cgit.freedesktop.org/drm/drm-xe/commit/?h=drm-xe-next&id=9cb016ebbb6a275f57b1cb512b95d5a842391ad7


Drive-by comment here because I don't see any actual xe patches on the list:

You probably want to drop DRM_XE_SYNC_DMA_BUF from the uAPI.  Now that
we've landed the new dma-buf ioctls for sync_file import/export, there's
really no reason to have it as part of submit.  Dropping it should also
make locking a tiny bit easier.
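
For reference, the sync_file route looks roughly like this from userspace (a
minimal sketch against the DMA_BUF_IOCTL_EXPORT_SYNC_FILE /
DMA_BUF_IOCTL_IMPORT_SYNC_FILE uapi in <linux/dma-buf.h>; error handling
omitted):

#include <sys/ioctl.h>
#include <linux/dma-buf.h>

/* Pull the dma-buf's current fences out as a sync_file fd... */
int export_dmabuf_fences(int dmabuf_fd)
{
        struct dma_buf_export_sync_file args = {
                .flags = DMA_BUF_SYNC_READ | DMA_BUF_SYNC_WRITE,
        };

        if (ioctl(dmabuf_fd, DMA_BUF_IOCTL_EXPORT_SYNC_FILE, &args))
                return -1;
        return args.fd;
}

/* ...and push a sync_file's fence back into the dma-buf afterwards. */
int import_dmabuf_fence(int dmabuf_fd, int sync_file_fd)
{
        struct dma_buf_import_sync_file args = {
                .flags = DMA_BUF_SYNC_WRITE,
                .fd = sync_file_fd,
        };

        return ioctl(dmabuf_fd, DMA_BUF_IOCTL_IMPORT_SYNC_FILE, &args);
}

With those two, a submit ioctl only ever has to deal in sync_files or
syncobjs and never needs a dma-buf-specific sync flag.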

--Jason



> Xe kernel repo:
> https://cgit.freedesktop.org/drm/drm-xe/
>
> There's a lot of work still to happen on Xe but we're very excited about
> it and
> wanted to share it early and welcome feedback and discussion.
>
> Cheers,
> Matthew Brost
>
> [1] https://patchwork.freedesktop.org/series/105879/
> [2] https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20418
>
> Maarten Lankhorst (12):
>   drm/amd: Convert amdgpu to use suballocation helper.
>   drm/radeon: Use the drm suballocation manager implementation.
>   drm/i915: Remove gem and overlay frontbuffer tracking
>   drm/i915/display: Neuter frontbuffer tracking harder
>   drm/i915/display: Add more macros to remove all direct calls to uncore
>   drm/i915/display: Remove all uncore mmio accesses in favor of intel_de
>   drm/i915: Rename find_section to find_bdb_section
>   drm/i915/regs: Set DISPLAY_MMIO_BASE to 0 for xe
>   drm/i915/display: Fix a use-after-free when intel_edp_init_connector
> fails
>   drm/i915/display: Remaining changes to make xe compile
>   sound/hda: Allow XE as i915 replacement for sound
>   mei/hdcp: Also enable for XE
>
> Matthew Brost (5):
>   drm/sched: Convert drm scheduler to use a work queue rather than
> kthread
>   drm/sched: Add generic scheduler message interface
>   drm/sched: Start run wq before TDR in drm_sched_start
>   drm/sched: Submit job before starting TDR
>   drm/sched: Add helper to set TDR timeout
>
> Thomas Hellström (3):
>   drm/suballoc: Introduce a generic 

Re: [Intel-gfx] [RFC PATCH 04/20] drm/sched: Convert drm scheduler to use a work queue rather than kthread

2023-01-11 Thread Jason Ekstrand
On Wed, Jan 11, 2023 at 4:32 PM Matthew Brost 
wrote:

> On Wed, Jan 11, 2023 at 04:18:01PM -0600, Jason Ekstrand wrote:
> > On Wed, Jan 11, 2023 at 2:50 AM Tvrtko Ursulin <
> > tvrtko.ursu...@linux.intel.com> wrote:
> >
> > >
> > > On 10/01/2023 14:08, Jason Ekstrand wrote:
> > > > On Tue, Jan 10, 2023 at 5:28 AM Tvrtko Ursulin
> > > >  tvrtko.ursu...@linux.intel.com>>
> > >
> > > > wrote:
> > > >
> > > >
> > > >
> > > > On 09/01/2023 17:27, Jason Ekstrand wrote:
> > > >
> > > > [snip]
> > > >
> > > >  >  >>> AFAICT it proposes to have 1:1 between *userspace*
> > > created
> > > >  > contexts (per
> > > >  >  >>> context _and_ engine) and drm_sched. I am not sure
> > > avoiding
> > > >  > invasive changes
> > > >  >  >>> to the shared code is in the spirit of the overall
> idea
> > > > and instead
> > > >  >  >>> opportunity should be used to look at way to
> > > > refactor/improve
> > > >  > drm_sched.
> > > >  >
> > > >  >
> > > >  > Maybe?  I'm not convinced that what Xe is doing is an abuse at
> > > > all or
> > > >  > really needs to drive a re-factor.  (More on that later.)
> > > > There's only
> > > >  > one real issue which is that it fires off potentially a lot of
> > > > kthreads.
> > > >  > Even that's not that bad given that kthreads are pretty light
> and
> > > > you're
> > > >  > not likely to have more kthreads than userspace threads which
> are
> > > > much
> > > >  > heavier.  Not ideal, but not the end of the world either.
> > > > Definitely
> > > >  > something we can/should optimize but if we went through with
> Xe
> > > > without
> > > >  > this patch, it would probably be mostly ok.
> > > >  >
> > > >  >  >> Yes, it is 1:1 *userspace* engines and drm_sched.
> > > >  >  >>
> > > >  >  >> I'm not really prepared to make large changes to DRM
> > > > scheduler
> > > >  > at the
> > > >  >  >> moment for Xe as they are not really required nor does
> > > Boris
> > > >  > seem they
> > > >  >  >> will be required for his work either. I am interested
> to
> > > see
> > > >  > what Boris
> > > >  >  >> comes up with.
> > > >  >  >>
> > > >  >  >>> Even on the low level, the idea to replace drm_sched
> > > threads
> > > >  > with workers
> > > >  >  >>> has a few problems.
> > > >  >  >>>
> > > >  >  >>> To start with, the pattern of:
> > > >  >  >>>
> > > >  >  >>>while (not_stopped) {
> > > >  >  >>> keep picking jobs
> > > >  >  >>>}
> > > >  >  >>>
> > > >  >  >>> Feels fundamentally in disagreement with workers
> (while
> > > >  > obviously fits
> > > >  >  >>> perfectly with the current kthread design).
> > > >  >  >>
> > > >  >  >> The while loop breaks and the worker exits if no jobs are
> > > ready.
> > > >  >
> > > >  >
> > > >  > I'm not very familiar with workqueues. What are you saying
> would
> > > fit
> > > >  > better? One scheduling job per work item rather than one big
> work
> > > > item
> > > >  > which handles all available jobs?
> > > >
> > > > Yes and no, it indeed IMO does not fit to have a work item which
> is
> > > > potentially unbounded in runtime. But it is a bit of a moot conceptual
> > > > mismatch
> > > > because it is a worst case / theoretical, and I think the objection is
> > > > due to more fundamental concerns.
> > > >

Re: [Intel-gfx] [RFC PATCH 04/20] drm/sched: Convert drm scheduler to use a work queue rather than kthread

2023-01-11 Thread Jason Ekstrand
On Wed, Jan 11, 2023 at 2:50 AM Tvrtko Ursulin <
tvrtko.ursu...@linux.intel.com> wrote:

>
> On 10/01/2023 14:08, Jason Ekstrand wrote:
> > On Tue, Jan 10, 2023 at 5:28 AM Tvrtko Ursulin
> > mailto:tvrtko.ursu...@linux.intel.com>>
>
> > wrote:
> >
> >
> >
> > On 09/01/2023 17:27, Jason Ekstrand wrote:
> >
> > [snip]
> >
> >  >  >>> AFAICT it proposes to have 1:1 between *userspace*
> created
> >  > contexts (per
> >  >  >>> context _and_ engine) and drm_sched. I am not sure
> avoiding
> >  > invasive changes
> >  >  >>> to the shared code is in the spirit of the overall idea
> > and instead
> >  >  >>> opportunity should be used to look at way to
> > refactor/improve
> >  > drm_sched.
> >  >
> >  >
> >  > Maybe?  I'm not convinced that what Xe is doing is an abuse at
> > all or
> >  > really needs to drive a re-factor.  (More on that later.)
> > There's only
> >  > one real issue which is that it fires off potentially a lot of
> > kthreads.
> >  > Even that's not that bad given that kthreads are pretty light and
> > you're
> >  > not likely to have more kthreads than userspace threads which are
> > much
> >  > heavier.  Not ideal, but not the end of the world either.
> > Definitely
> >  > something we can/should optimize but if we went through with Xe
> > without
> >  > this patch, it would probably be mostly ok.
> >  >
> >  >  >> Yes, it is 1:1 *userspace* engines and drm_sched.
> >  >  >>
> >  >  >> I'm not really prepared to make large changes to DRM
> > scheduler
> >  > at the
> >  >  >> moment for Xe as they are not really required nor does
> Boris
> >  > seem they
> >  >  >> will be required for his work either. I am interested to
> see
> >  > what Boris
> >  >  >> comes up with.
> >  >  >>
> >  >  >>> Even on the low level, the idea to replace drm_sched
> threads
> >  > with workers
> >  >  >>> has a few problems.
> >  >  >>>
> >  >  >>> To start with, the pattern of:
> >  >  >>>
> >  >  >>>while (not_stopped) {
> >  >  >>> keep picking jobs
> >  >  >>>}
> >  >  >>>
> >  >  >>> Feels fundamentally in disagreement with workers (while
> >  > obviously fits
> >  >  >>> perfectly with the current kthread design).
> >  >  >>
> >  >  >> The while loop breaks and the worker exits if no jobs are
> ready.
> >  >
> >  >
> >  > I'm not very familiar with workqueues. What are you saying would
> fit
> >  > better? One scheduling job per work item rather than one big work
> > item
> >  > which handles all available jobs?
> >
> > Yes and no, it indeed IMO does not fit to have a work item which is
> > potentially unbounded in runtime. But it is a bit of a moot conceptual
> > mismatch
> > because it is a worst case / theoretical, and I think the objection is
> > due to more fundamental concerns.
> >
> > If we have to go back to the low level side of things, I've picked
> this
> > random spot to consolidate what I have already mentioned and perhaps
> > expand.
> >
> > To start with, let me pull out some thoughts from workqueue.rst:
> >
> > """
> > Generally, work items are not expected to hog a CPU and consume many
> > cycles. That means maintaining just enough concurrency to prevent
> work
> > processing from stalling should be optimal.
> > """
> >
> > For unbound queues:
> > """
> > The responsibility of regulating concurrency level is on the users.
> > """
> >
> > Given the unbound queues will be spawned on demand to service all
> > queued
> > work items (more interesting when mixing up with the
> > system_unbound_wq),
> > in the proposed design the number of in

Re: [Intel-gfx] [RFC PATCH 04/20] drm/sched: Convert drm scheduler to use a work queue rather than kthread

2023-01-10 Thread Jason Ekstrand
On Tue, Jan 10, 2023 at 5:28 AM Tvrtko Ursulin <
tvrtko.ursu...@linux.intel.com> wrote:

>
>
> On 09/01/2023 17:27, Jason Ekstrand wrote:
>
> [snip]
>
> >  >>> AFAICT it proposes to have 1:1 between *userspace* created
> > contexts (per
> >  >>> context _and_ engine) and drm_sched. I am not sure avoiding
> > invasive changes
> >  >>> to the shared code is in the spirit of the overall idea and
> instead
> >  >>> opportunity should be used to look at way to refactor/improve
> > drm_sched.
> >
> >
> > Maybe?  I'm not convinced that what Xe is doing is an abuse at all or
> > really needs to drive a re-factor.  (More on that later.)  There's only
> > one real issue which is that it fires off potentially a lot of kthreads.
> > Even that's not that bad given that kthreads are pretty light and you're
> > not likely to have more kthreads than userspace threads which are much
> > heavier.  Not ideal, but not the end of the world either.  Definitely
> > something we can/should optimize but if we went through with Xe without
> > this patch, it would probably be mostly ok.
> >
> >  >> Yes, it is 1:1 *userspace* engines and drm_sched.
> >  >>
> >  >> I'm not really prepared to make large changes to DRM scheduler
> > at the
> >  >> moment for Xe as they are not really required nor does Boris
> > seem they
> >  >> will be required for his work either. I am interested to see
> > what Boris
> >  >> comes up with.
> >  >>
> >  >>> Even on the low level, the idea to replace drm_sched threads
> > with workers
> >  >>> has a few problems.
> >  >>>
> >  >>> To start with, the pattern of:
> >  >>>
> >  >>>while (not_stopped) {
> >  >>> keep picking jobs
> >  >>>}
> >  >>>
> >  >>> Feels fundamentally in disagreement with workers (while
> > obviously fits
> >  >>> perfectly with the current kthread design).
> >  >>
> >  >  >> The while loop breaks and the worker exits if no jobs are ready.
> >
> >
> > I'm not very familiar with workqueues. What are you saying would fit
> > better? One scheduling job per work item rather than one big work item
> > which handles all available jobs?
>
> Yes and no, it indeed IMO does not fit to have a work item which is
> potentially unbounded in runtime. But it is a bit of a moot conceptual mismatch
> because it is a worst case / theoretical, and I think the objection is due
> to more fundamental concerns.
>
> If we have to go back to the low level side of things, I've picked this
> random spot to consolidate what I have already mentioned and perhaps
> expand.
>
> To start with, let me pull out some thoughts from workqueue.rst:
>
> """
> Generally, work items are not expected to hog a CPU and consume many
> cycles. That means maintaining just enough concurrency to prevent work
> processing from stalling should be optimal.
> """
>
> For unbound queues:
> """
> The responsibility of regulating concurrency level is on the users.
> """
>
> Given the unbound queues will be spawned on demand to service all queued
> work items (more interesting when mixing up with the system_unbound_wq),
> in the proposed design the number of instantiated worker threads does
> not correspond to the number of user threads (as you have elsewhere
> stated), but pessimistically to the number of active user contexts.


Those are pretty much the same in practice.  Rather, the number of user
threads is typically an upper bound on the number of contexts.  Yes, a single user
thread could have a bunch of contexts but basically nothing does that
except IGT.  In real-world usage, it's at most one context per user thread.


> That
> is the number which drives the maximum number of not-runnable jobs that
> can become runnable at once, and hence spawn that many work items, and
> in turn unbound worker threads.
>
> Several problems there.
>
> It is fundamentally pointless to have potentially that many more threads
> than the number of CPU cores - it simply creates a scheduling storm.
>
> Unbound workers have no CPU / cache locality either and no connection
> with the CPU scheduler to optimize scheduling patterns. This may matter
> either on large systems or on small ones. Whereas the current design
> allows for scheduler to notice userspace CPU thread keeps waking up th

Re: [Intel-gfx] [RFC PATCH 04/20] drm/sched: Convert drm scheduler to use a work queue rather than kthread

2023-01-09 Thread Jason Ekstrand
On Mon, Jan 9, 2023 at 7:46 AM Tvrtko Ursulin <
tvrtko.ursu...@linux.intel.com> wrote:

>
> On 06/01/2023 23:52, Matthew Brost wrote:
> > On Thu, Jan 05, 2023 at 09:43:41PM +, Matthew Brost wrote:
> >> On Tue, Jan 03, 2023 at 01:02:15PM +, Tvrtko Ursulin wrote:
> >>>
> >>> On 02/01/2023 07:30, Boris Brezillon wrote:
>  On Fri, 30 Dec 2022 12:55:08 +0100
>  Boris Brezillon  wrote:
> 
> > On Fri, 30 Dec 2022 11:20:42 +0100
> > Boris Brezillon  wrote:
> >
> >> Hello Matthew,
> >>
> >> On Thu, 22 Dec 2022 14:21:11 -0800
> >> Matthew Brost  wrote:
> >>> In XE, the new Intel GPU driver, a choice has been made to have a 1 to 1
> >>> mapping between a drm_gpu_scheduler and drm_sched_entity. At first
> this
> >>> seems a bit odd but let us explain the reasoning below.
> >>>
> >>> 1. In XE the submission order from multiple drm_sched_entity is not
> >>> guaranteed to match completion order even if targeting the same
> hardware
> >>> engine. This is because in XE we have a firmware scheduler, the
> GuC,
> >>> which is allowed to reorder, timeslice, and preempt submissions. If
> using a
> >>> shared drm_gpu_scheduler across multiple drm_sched_entity, the TDR
> falls
> >>> apart as the TDR expects submission order == completion order.
> Using a
> >>> dedicated drm_gpu_scheduler per drm_sched_entity solves this
> problem.
> >>
> >> Oh, that's interesting. I've been trying to solve the same sort of
> >> issues to support Arm's new Mali GPU which is relying on a
> FW-assisted
> >> scheduling scheme (you give the FW N streams to execute, and it does
> >> the scheduling between those N command streams, the kernel driver
> >> does timeslice scheduling to update the command streams passed to
> the
> >> FW). I must admit I gave up on using drm_sched at some point, mostly
> >> because the integration with drm_sched was painful, but also
> because I
> >> felt trying to bend drm_sched to make it interact with a
> >> timeslice-oriented scheduling model wasn't really future proof.
> Giving
> >> drm_sched_entity exclusive access to a drm_gpu_scheduler probably
> might
> >> help for a few things (didn't think it through yet), but I feel it's
> >> coming short on other aspects we have to deal with on Arm GPUs.
> >
> > Ok, so I just had a quick look at the Xe driver and how it
> > instantiates the drm_sched_entity and drm_gpu_scheduler, and I think
> I
> > have a better understanding of how you get away with using drm_sched
> > while still controlling how scheduling is really done. Here
> > drm_gpu_scheduler is just a dummy abstraction that lets you use the
> > drm_sched job queuing/dep/tracking mechanism. The whole run-queue
> > selection is dumb because there's only one entity ever bound to the
> > scheduler (the one that's part of the xe_guc_engine object which also
> > contains the drm_gpu_scheduler instance). I guess the main issue we'd
> > have on Arm is the fact that the stream doesn't necessarily get
> > scheduled when ->run_job() is called, it can be placed in the
> runnable
> > queue and be picked later by the kernel-side scheduler when a FW slot
> > gets released. That can probably be sorted out by manually disabling
> the
> > job timer and re-enabling it when the stream gets picked by the
> > scheduler. But my main concern remains, we're basically abusing
> > drm_sched here.
> >
> > For the Arm driver, that means turning the following sequence
> >
> > 1. wait for job deps
> > 2. queue job to ringbuf and push the stream to the runnable
> >  queue (if it wasn't queued already). Wakeup the timeslice
> scheduler
> >  to re-evaluate (if the stream is not on a FW slot already)
> > 3. stream gets picked by the timeslice scheduler and sent to the FW
> for
> >  execution
> >
> > into
> >
> > 1. queue job to entity which takes care of waiting for job deps for
> >  us
> > 2. schedule a drm_sched_main iteration
> > 3. the only available entity is picked, and the first job from this
> >  entity is dequeued. ->run_job() is called: the job is queued to
> the
> >  ringbuf and the stream is pushed to the runnable queue (if it
> wasn't
> >  queued already). Wakeup the timeslice scheduler to re-evaluate
> (if
> >  the stream is not on a FW slot already)
> > 4. stream gets picked by the timeslice scheduler and sent to the FW
> for
> >  execution
> >
> > That's one extra step we don't really need. To sum-up, yes, all the
> > job/entity tracking might be interesting to share/re-use, but I
> wonder
> > if we couldn't have that without pulling out the scheduling part of
> > drm_sched, or maybe I'm missing something, and there's something in
> > drm_gpu_scheduler you really need.
> 
>  On second 

Re: [Intel-gfx] [RFC PATCH 04/20] drm/sched: Convert drm scheduler to use a work queue rather than kthread

2023-01-09 Thread Jason Ekstrand
On Thu, Jan 5, 2023 at 1:40 PM Matthew Brost 
wrote:

> On Mon, Jan 02, 2023 at 08:30:19AM +0100, Boris Brezillon wrote:
> > On Fri, 30 Dec 2022 12:55:08 +0100
> > Boris Brezillon  wrote:
> >
> > > On Fri, 30 Dec 2022 11:20:42 +0100
> > > Boris Brezillon  wrote:
> > >
> > > > Hello Matthew,
> > > >
> > > > On Thu, 22 Dec 2022 14:21:11 -0800
> > > > Matthew Brost  wrote:
> > > >
> > > > > In XE, the new Intel GPU driver, a choice has been made to have a 1 to 1
> > > > > mapping between a drm_gpu_scheduler and drm_sched_entity. At first
> this
> > > > > seems a bit odd but let us explain the reasoning below.
> > > > >
> > > > > 1. In XE the submission order from multiple drm_sched_entity is not
> > > > > guaranteed to match completion order even if targeting the same
> hardware
> > > > > engine. This is because in XE we have a firmware scheduler, the
> GuC,
> > > > > which is allowed to reorder, timeslice, and preempt submissions. If
> using a
> > > > > shared drm_gpu_scheduler across multiple drm_sched_entity, the TDR
> falls
> > > > > apart as the TDR expects submission order == completion order.
> Using a
> > > > > dedicated drm_gpu_scheduler per drm_sched_entity solves this
> problem.
> > > >
> > > > Oh, that's interesting. I've been trying to solve the same sort of
> > > > issues to support Arm's new Mali GPU which is relying on a
> FW-assisted
> > > > scheduling scheme (you give the FW N streams to execute, and it does
> > > > the scheduling between those N command streams, the kernel driver
> > > > does timeslice scheduling to update the command streams passed to the
> > > > FW). I must admit I gave up on using drm_sched at some point, mostly
> > > > because the integration with drm_sched was painful, but also because
> I
> > > > felt trying to bend drm_sched to make it interact with a
> > > > timeslice-oriented scheduling model wasn't really future proof.
> Giving
> > > > drm_sched_entity exclusive access to a drm_gpu_scheduler probably
> might
> > > > help for a few things (didn't think it through yet), but I feel it's
> > > > coming short on other aspects we have to deal with on Arm GPUs.
> > >
> > > Ok, so I just had a quick look at the Xe driver and how it
> > > instantiates the drm_sched_entity and drm_gpu_scheduler, and I think I
> > > have a better understanding of how you get away with using drm_sched
> > > while still controlling how scheduling is really done. Here
> > > drm_gpu_scheduler is just a dummy abstraction that lets you use the
> > > drm_sched job queuing/dep/tracking mechanism. The whole run-queue
>
> You nailed it here, we use the DRM scheduler for queuing jobs,
> dependency tracking and releasing jobs to be scheduled when dependencies
> are met, and lastly a tracking mechanism for in-flight jobs that need to
> be cleaned up if an error occurs. It doesn't actually do any scheduling
> aside from the most basic level of not overflowing the submission ring
> buffer. In this sense, a 1 to 1 relationship between entity and
> scheduler fits quite well.
>
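
To make the 1:1 arrangement concrete, the setup boils down to something like
this (a sketch only: the xe_engine container is illustrative, and the
drm_sched_init()/drm_sched_entity_init() signatures shown are from roughly
this kernel era and have changed over time):

/* One drm_gpu_scheduler and one drm_sched_entity per userspace engine.
 * drm_sched does queuing and dependency tracking; the GuC does the
 * actual scheduling. Names here are illustrative.
 */
struct xe_engine {
        struct drm_gpu_scheduler sched;
        struct drm_sched_entity entity;
};

static int xe_engine_sched_init(struct xe_engine *e,
                                const struct drm_sched_backend_ops *ops)
{
        struct drm_gpu_scheduler *sched_list[] = { &e->sched };
        int err;

        err = drm_sched_init(&e->sched, ops, 64 /* ring depth */,
                             0 /* hang_limit */, msecs_to_jiffies(500),
                             NULL /* timeout_wq */, NULL /* score */,
                             "xe", NULL /* dev */);
        if (err)
                return err;

        /* The only entity this scheduler will ever serve. */
        return drm_sched_entity_init(&e->entity, DRM_SCHED_PRIORITY_NORMAL,
                                     sched_list, 1, NULL);
}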

Yeah, I think there's an annoying difference between what AMD/NVIDIA/Intel
want here and what you need for Arm thanks to the number of FW queues
available. I don't remember the exact number of GuC queues but it's at
least 1k. This puts it in an entirely different class from what you have on
Mali. Roughly, there's about three categories here:

 1. Hardware where the kernel is placing jobs on actual HW rings. This is
old Mali, Intel Haswell and earlier, and probably a bunch of others.
(Intel BDW+ with execlists is a weird case that doesn't fit in this
categorization.)

 2. Hardware (or firmware) with a very limited number of queues where
you're going to have to juggle in the kernel in order to run desktop Linux.

 3. Firmware scheduling with a high queue count. In this case, you don't
want the kernel scheduling anything. Just throw it at the firmware and let
it go brr.  If we ever run out of queues (unlikely), the kernel can
temporarily pause some low-priority contexts and do some juggling or,
frankly, just fail userspace queue creation and tell the user to close some
windows.

The existence of this 2nd class is a bit annoying but it's where we are. I
think it's worth recognizing that Xe and panfrost are in different places
here and will require different designs. For Xe, we really are just using
drm/scheduler as a front-end and the firmware does all the real scheduling.

How do we deal with class 2? That's an interesting question.  We may
eventually want to break that off into a separate discussion and not litter
the Xe thread but let's keep going here for a bit.  I think there are some
pretty reasonable solutions but they're going to look a bit different.

The way I did this for Xe with execlists was to keep the 1:1:1 mapping
between drm_gpu_scheduler, drm_sched_entity, and userspace xe_engine.
Instead of feeding a GuC ring, though, it would feed a fixed-size execlist
ring and then there was a tiny kernel which operated entirely in IRQ

Re: [PATCH] drm/fourcc: Document open source user waiver

2022-12-01 Thread Jason Ekstrand
Acked-by: Jason Ekstrand 

On Thu, Dec 1, 2022 at 4:22 AM Daniel Vetter  wrote:

> On Thu, 1 Dec 2022 at 11:07, Daniel Vetter  wrote:
> >
> > On Wed, Nov 23, 2022 at 08:24:37PM +0100, Daniel Vetter wrote:
> > > It's a bit a FAQ, and we really can't claim to be the authoritative
> > > source for allocating these numbers used in many standard extensions
> > > if we tell closed source or vendor stacks in general to go away.
> > >
> > > Iirc this was already clarified in some vulkan discussions, but I
> > > can't find that anywhere anymore. At least not in a public link.
> > >
> > > Cc: Maarten Lankhorst 
> > > Cc: Maxime Ripard 
> > > Cc: Thomas Zimmermann 
> > > Cc: David Airlie 
> > > Cc: Daniel Vetter 
> > > Cc: Alex Deucher 
> > > Cc: Daniel Stone 
> > > Cc: Bas Nieuwenhuizen 
> > > Cc: Jason Ekstrand 
> > > Cc: Neil Trevett 
> > > Signed-off-by: Daniel Vetter 
> >
> > From irc:
> >
> >  danvet: ack from me
>
> Also from irc:
>
>  danvet: Acked
>
> -Daniel
>
> > > ---
> > >  include/uapi/drm/drm_fourcc.h | 12 
> > >  1 file changed, 12 insertions(+)
> > >
> > > diff --git a/include/uapi/drm/drm_fourcc.h
> b/include/uapi/drm/drm_fourcc.h
> > > index bc056f2d537d..de703c6be969 100644
> > > --- a/include/uapi/drm/drm_fourcc.h
> > > +++ b/include/uapi/drm/drm_fourcc.h
> > > @@ -88,6 +88,18 @@ extern "C" {
> > >   *
> > >   * The authoritative list of format modifier codes is found in
> > >   * `include/uapi/drm/drm_fourcc.h`
> > > + *
> > > + * Open Source User Waiver
> > > + * -----------------------
> > > + *
> > > + * Because this is the authoritative source for pixel formats and
> modifiers
> > > + * referenced by GL, Vulkan extensions and other standards and hence
> used both
> > > + * by open source and closed source driver stacks, the usual
> requirement for an
> > > + * upstream in-kernel or open source userspace user does not apply.
> > > + *
> > > + * To ensure, as much as feasible, compatibility across stacks and
> avoid
> > > + * confusion with incompatible enumerations, stakeholders for all
> relevant driver
> > > + * stacks should approve additions.
> > >   */
> > >
> > >  #define fourcc_code(a, b, c, d) ((__u32)(a) | ((__u32)(b) << 8) | \
> > > --
> > > 2.37.2
> > >
> >
> > --
> > Daniel Vetter
> > Software Engineer, Intel Corporation
> > http://blog.ffwll.ch
>
>
>
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch
>


Re: vm binding interfaces and parallel with mmap

2022-08-24 Thread Jason Ekstrand
On Mon, Aug 22, 2022 at 8:27 AM Christian König 
wrote:

> Am 22.08.22 um 10:34 schrieb Bas Nieuwenhuizen:
> > On Mon, Aug 22, 2022 at 9:28 AM Dave Airlie  wrote:
> >> On Mon, 22 Aug 2022 at 17:05, Dave Airlie  wrote:
> >>> Hey,
> >>>
> >>> I've just been looking at the vm bind type interfaces and wanted to at
> >>> least document how we think the unmapping API should work. I know I've
> >>> talked on irc before about this, but wanted to solidify things a bit
> >>> more around what is required vs what is a nice to have.
> >>>
> >>> My main concerns/thoughts are around the unbind interfaces and how
> >>> close to munmap they should be.
> >>>
> >>> I think the mapping operation is mostly consistent
> >>> MAP(bo handle, offset into bo, range, VM offset, VM flags)
> >>> which puts the range inside the bo at the offset in the current VM
> >>> (maybe take an optional vm_id).
> >>>
> >>> now the simplest unmap I can see is one that parallels munmap
> >>> UNMAP(vmaddr, range);
> >>>
> >>> But it begs the question on then how much the kernel needs to deal
> >>> with here, if we support random vmaddr,range then we really need to be
> >>> able to do everything munmap does for CPU VMA, which means splitting
> >>> ranges, joining ranges etc.
> >>>
> >>> like
> >>> MAP(1, 0, 0x8000, 0xc0000)
> >>> UNMAP(0xc1000, 0x1000)
> >>> should that be possible?
> >>>
> >>> Do we have any API usage (across Vulkan/CL/CUDA/ROCm etc) that
> >>> requires this sort of control, or should we be fine with only
> >>> unmapping objects exactly like how they were mapped in the first
> >>> place, and not have any splitting/joining?
> > Vulkan allows for this, though I haven't checked to what extent apps use
> it.
>
> This is massively used for partial resident textures under OpenGL as far
> as I know.
>
> E.g. you map a range like 1->10 as PRT and then then map real textures
> at 2, 5 and 7 or something like that.
>
> Saying that a functionality to map/enable PRT for a range is necessary
> as well. On amdgpu we have a special flag for that and in this case the
> BO to map can be NULL.
>

NVIDIA has similar bits.  I don't remember if Intel does or not.  Handling
this as a map with BO=0 and a PRT flag of some sort seems like a perfectly
reasonable way to handle it.
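
A sketch of what that could look like in a VM_BIND-style uapi (all struct,
field, and flag names here are hypothetical, merely mirroring the amdgpu
behaviour described above):

/* Hypothetical sketch: a NULL/PRT-style bind carries no BO and makes
 * the whole range safe to access until real pages are mapped over it.
 */
#define VM_BIND_FLAG_NULL       (1 << 0)        /* no backing BO */

struct drm_vm_bind {
        __u32 vm_id;
        __u32 bo_handle;        /* must be 0 when VM_BIND_FLAG_NULL is set */
        __u64 bo_offset;
        __u64 range;
        __u64 vm_addr;
        __u64 flags;
};

/* 1. Reserve the sparse range with no backing store:
 *        bind(vm, bo = 0, vm_addr = BASE, range = 10 * TILE, FLAG_NULL)
 * 2. Map real textures over pieces of it as they become resident:
 *        bind(vm, bo = tex, vm_addr = BASE + 2 * TILE, range = TILE, 0)
 */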


> > We could technically split all mapping/unmapping to be per single tile
> > in the userspace driver, which avoids the need for splitting/merging,
> > but that could very much be a pessimization.
>
> That would be pretty much a NAK from my side. A couple of hardware
> optimizations require mappings to be as large as possible.
>
> Otherwise we wouldn't be able to use huge/giant (2MiB, 1GiB) pages,
> power of two TLB reach optimizations (8KiB, 16KiB, 32KiB, ...) as well
> as texture fetcher optimizations.
>

Agreed.  Intel has such optimizations as well and they really do matter.
IDK about nvidia but I'd be surprised if they don't at least have a 2M
variant or something.  Reducing page-table depth matters a lot for latency.
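
To put rough numbers on the latency point: with 4KiB pages, a TLB miss on a
typical 4-level x86-64-style page walk costs up to four dependent memory
accesses, and each TLB entry covers only 4KiB. A 2MiB mapping resolves one
level higher in the tree, so a miss costs one fewer dependent access and a
single TLB entry covers 512 times the address space; a 1GiB mapping saves yet
another level. None of that is available unless the kernel keeps the
underlying GPU mappings large, aligned, and contiguous instead of splitting
them into per-tile bindings.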


> >> I suppose it also asks the question around paralleling
> >>
> >> fd = open()
> >> ptr = mmap(fd,)
> >> close(fd)
> >> the mapping is still valid.
> >>
> >> I suppose our equiv is
> >> handle = bo_alloc()
> >> gpu_addr = vm_bind(handle,)
> >> gem_close(handle)
> >> is the gpu_addr still valid? Does the VM hold a reference on the kernel
> >> bo internally?
> > For Vulkan it looks like this is undefined and the above is not
> necessary:
> >
> > "It is important to note that freeing a VkDeviceMemory object with
> > vkFreeMemory will not cause resources (or resource regions) bound to
> > the memory object to become unbound. Applications must not access
> > resources bound to memory that has been freed."
> > (32.7.6)
>

I'm not sure about this particular question.  We need to be sure that maps
get cleaned up eventually.  On the one hand, I think it's probably a valid
API implementation to have each mapped page hold a reference similar to
mmap and have vkDestroyImage or vkDestroyBuffer do an unmap to clean up the
range.  However, clients may be surprised when they destroy a large memory
object and can't reap the memory because of extra BO references they don't
know about.  If BOs unmap themselves on close or if we had a way to take a
VM+BO and say "unmap this BO from everywhere, please", we can clean up the
memory pretty easily.  Without that, it's a giant PITA to do entirely
inside the userspace driver because it requires us to globally track every
mapping and that means data structures and locks.  Yes, such an ioctl would
require the kernel to track things but the kernel is already tracking
everything that's bound, so hopefully it doesn't add much.
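
Such an ioctl would not need to carry much (a hypothetical sketch of the
idea only; the struct and field names are illustrative):

/* Hypothetical "unmap this BO from everywhere in this VM" sketch. The
 * kernel already tracks every VMA pointing at the BO, so userspace
 * only has to name the pair instead of replaying each mapping.
 */
struct drm_gem_vm_unbind_all {
        __u32 vm_id;
        __u32 bo_handle;        /* all mappings of this BO in vm_id go away */
};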

--Jason


> In addition to what was discussed here so far, we need arrays of in and
> out drm_syncobj for both map as well as unmap.
>
> E.g. when the mapping/unmapping should happen and when it is completed
> etc...
>
> Christian.
>
> >
> >
> >> Dave.
> >>> Dave.
>
>


Re: Rust in our code base

2022-08-24 Thread Jason Ekstrand
+mesa-dev and my jlekstrand.net e-mail

On Sun, 2022-08-21 at 20:44 +0200, Karol Herbst wrote:
> On Sun, Aug 21, 2022 at 8:34 PM Rob Clark 
> wrote:
> > 
> > On Sun, Aug 21, 2022 at 10:45 AM Karol Herbst 
> > wrote:
> > > 
> > > On Sun, Aug 21, 2022 at 7:43 PM Karol Herbst 
> > > wrote:
> > > > 
> > > > On Sun, Aug 21, 2022 at 5:46 PM Rob Clark 
> > > > wrote:
> > > > > 
> > > > > On Sat, Aug 20, 2022 at 5:23 AM Karol Herbst
> > > > >  wrote:
> > > > > > 
> > > > > > Hey everybody,
> > > > > > 
> > > > > > so I think it's time to have this discussion for real.
> > > > > > 
> > > > > > I am working on Rusticl
> > > > > > (https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15439)
> > > > > > which I would like to merge quite soon.
> > > > > > 
> > > > > > Others might also plan on starting kernel drivers written
> > > > > > in Rust (and
> > > > > > if people feel comfortable to discuss this as well, they
> > > > > > might reply
> > > > > > here)
> > > > > > 
> > > > > > The overall implication of that is: if we are doing this,
> > > > > > people (that
> > > > > > is we) have to accept that touching Rust code will be part
> > > > > > of our
> > > > > > development process. There is no other sane way of doing
> > > > > > it.
> > > > > > 
> > > > > > I am not willing to wrap things in Rusticl so changing
> > > > > > gallium APIs
> > > > > > won't involve touching Rust code, and we also can't expect
> > > > > > people to
> > > > > > design their kernel drivers in weird ways "just because
> > > > > > somebody
> > > > > > doesn't want to deal with Rust"
> > > > > > 
> > > > > > If we are going to do this, we have to do it for real,
> > > > > > which means,
> > > > > > Rust code will call C APIs directly and a change in those
> > > > > > APIs will
> > > > > > also require changes in Rust code making use of those APIs.
> > > > > > 
> > > > > > I am so explicit on this very point, because we had some
> > > > > > discussion on
> > > > > > IRC where this was seen as a no-go at least from some
> > > > > > people, which
> > > > > > makes me think we have to find a mutual agreement on how it
> > > > > > should be
> > > > > > going forward.
> > > > > > 
> > > > > > And I want to be very explicit here about the future of
> > > > > > Rusticl as
> > > > > > well: if the agreement is that people don't want to have to
> > > > > > deal with
> > > > > > Rust changing e.g. gallium, Rusticl is a dead project. I am
> > > > > > not
> > > > > > willing to come up with some trashy external-internal API
> > > > > > just to
> > > > > > maintain Rusticl outside of the mesa git repo.
> > > > > > And doing it on a kernel level is even more of a no-go.
> > > > > > 
> > > > > > So what are we all thinking about Rust in our core repos?
> > > > > 
> > > > > I think there has to be willingness on the part of rust folks
> > > > > to help
> > > > > others who aren't so familiar with rust with these sorts of
> > > > > API
> > > > > changes.  You can't completely impose the burden on others
> > > > > who have
> > > > > never touched rust before.  That said, I expect a lot of API
> > > > > changes
> > > > > over time are simple enough that other devs could figure out
> > > > > the
> > > > > related rust side changes.
> > > > > 
> > > > 
> > > > yeah, I agree here. I wouldn't say it's all the responsibility
> > > > of
> > > > developers changing APIs to also know how to change the code.
> > > > So e.g.
> > > > if an MR fails to compile and it's because of rusticl, I will
> > > > help out
> > > > and do the changes myself if necessary. But long term we have
> > > > to
> > > > accept that API changes also come with the implication of also
> > > > having
> > > > to touch Rust code.
> > > > 
> > > > Short term it might be a learning opportunity for some/most,
> > > > but long
> > > > term it has to be accepted as a part of development to deal
> > > > with Rust.
> > > > 
> > > > > As long as folks who want to start introducing rust in mesa
> > > > > and drm
> > > > > realize they are also signing up to play the role of rust
> > > > > tutor and
> > > > > technical assistance, I don't see a problem.  But if they
> > > > > aren't
> > > > > around and willing to help, I could see this going badly.
> > > > > 
> > > > 
> > > > Yep, I fully agree here. This is also the main reason I am
> > > > bringing
> > > > this up. Nobody should be left alone with having to deal with
> > > > changing
> > > > the code. On the other hand a "not having to touch Rust code
> > > > when
> > > > changing APIs" guarantee is something which is simply
> > > > impossible to
> > > > have in any sane architecture. So we should figure out under
> > > > which
> > > > circumstances it will be okay for everybody.
> > 
> > Yeah, this sounds fine to me.
> > 
> > > > At least I don't see a way how I can structure Rusticl so that
> > > > somebody working on gallium won't have to also deal with
> > > > rusticl. One
> > > > possibility would be to have a 

Re: [PATCH] dma-buf/dma-resv: check if the new fence is really later

2022-08-24 Thread Jason Ekstrand
On Wed, Aug 10, 2022 at 12:26 PM Christian König <
ckoenig.leichtzumer...@gmail.com> wrote:

> Previously when we added a fence to a dma_resv object we always
> assumed that the new fence is newer than all the existing fences.
>
> With Jason's work to add a UAPI for explicit export/import that's not
> necessarily the case any more. So without this check we would allow
> userspace to force the kernel into a use-after-free error.
>
> Since the change is very small and defensive it's probably a good
> idea to backport this to stable kernels as well just in case others
> are using the dma_resv object in the same way.
>

Especially in the new world of dma_resv being a "bag of fences", I think
this makes a lot of sense.
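
The failure mode being defended against is easy to sketch (a commentary
sketch of the sequence, not driver code):

/* Within one fence context, higher seqno signals later.
 *
 * 1. The driver adds fence A (context 7, seqno 2) to the dma_resv.
 * 2. Userspace imports an old sync_file containing fence B
 *    (context 7, seqno 1) via DMA_BUF_IOCTL_IMPORT_SYNC_FILE.
 * 3. Without the dma_fence_is_later(fence, old) check, B replaces A
 *    and A's reference is dropped. Once B signals, the object looks
 *    idle while A's work is still running -> use-after-free.
 */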

Reviewed-by: Jason Ekstrand 


>
> Signed-off-by: Christian König 
> ---
>  drivers/dma-buf/dma-resv.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/dma-buf/dma-resv.c b/drivers/dma-buf/dma-resv.c
> index 205acb2c744d..e3885c90a3ac 100644
> --- a/drivers/dma-buf/dma-resv.c
> +++ b/drivers/dma-buf/dma-resv.c
> @@ -295,7 +295,8 @@ void dma_resv_add_fence(struct dma_resv *obj, struct dma_fence *fence,
> enum dma_resv_usage old_usage;
>
> dma_resv_list_entry(fobj, i, obj, &old, &old_usage);
> -   if ((old->context == fence->context && old_usage >= usage) ||
> +   if ((old->context == fence->context && old_usage >= usage &&
> +dma_fence_is_later(fence, old)) ||
> dma_fence_is_signaled(old)) {
> dma_resv_list_set(fobj, i, fence, usage);
> dma_fence_put(old);
> --
> 2.25.1
>
>


Re: [PATCH] dma-buf: Use dma_fence_unwrap_for_each when importing fences

2022-08-24 Thread Jason Ekstrand
On Mon, Aug 8, 2022 at 11:39 AM Jason Ekstrand 
wrote:

> On Sun, 2022-08-07 at 18:35 +0200, Christian König wrote:
> > Am 02.08.22 um 23:01 schrieb Jason Ekstrand:
> > > Ever since 68129f431faa ("dma-buf: warn about containers in
> > > dma_resv object"),
> > > dma_resv_add_shared_fence will warn if you attempt to add a
> > > container fence.
> > > While most drivers were fine, fences can also be added to a
> > > dma_resv via the
> > > recently added DMA_BUF_IOCTL_IMPORT_SYNC_FILE.  Use
> > > dma_fence_unwrap_for_each
> > > to add each fence one at a time.
> > >
> > > Fixes: 594740497e99 ("dma-buf: Add an API for importing sync files
> > > (v10)")
> > > Signed-off-by: Jason Ekstrand 
> > > Reported-by: Sarah Walker 
> > > Cc: Christian König 
> > > ---
> > >   drivers/dma-buf/dma-buf.c | 23 +--
> > >   1 file changed, 17 insertions(+), 6 deletions(-)
> > >
> > > diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
> > > index 630133284e2b..8d5d45112f52 100644
> > > --- a/drivers/dma-buf/dma-buf.c
> > > +++ b/drivers/dma-buf/dma-buf.c
> > > @@ -15,6 +15,7 @@
> > >   #include 
> > >   #include 
> > >   #include 
> > > +#include <linux/dma-fence-unwrap.h>
> > >   #include 
> > >   #include 
> > >   #include 
> > > @@ -391,8 +392,10 @@ static long dma_buf_import_sync_file(struct
> > > dma_buf *dmabuf,
> > >  const void __user *user_data)
> > >   {
> > > struct dma_buf_import_sync_file arg;
> > > -   struct dma_fence *fence;
> > > +   struct dma_fence *fence, *f;
> > > enum dma_resv_usage usage;
> > > +   struct dma_fence_unwrap iter;
> > > +   unsigned int num_fences;
> > > int ret = 0;
> > >
> > > if (copy_from_user(&arg, user_data, sizeof(arg)))
> > > @@ -411,13 +414,21 @@ static long dma_buf_import_sync_file(struct
> > > dma_buf *dmabuf,
> > > usage = (arg.flags & DMA_BUF_SYNC_WRITE) ?
> > > DMA_RESV_USAGE_WRITE :
> > >
> > > DMA_RESV_USAGE_READ;
> > >
> > > -   dma_resv_lock(dmabuf->resv, NULL);
> > > +   num_fences = 0;
> > > +   dma_fence_unwrap_for_each(f, &iter, fence)
> > > +   ++num_fences;
> > >
> > > -   ret = dma_resv_reserve_fences(dmabuf->resv, 1);
> > > -   if (!ret)
> > > -   dma_resv_add_fence(dmabuf->resv, fence, usage);
> > > +   if (num_fences > 0) {
> > > +   dma_resv_lock(dmabuf->resv, NULL);
> > >
> > > -   dma_resv_unlock(dmabuf->resv);
> > > +   ret = dma_resv_reserve_fences(dmabuf->resv, num_fences);
> >
> > That looks like it is misplaced.
> >
> > You *must* only lock the reservation object once and then add all
> > fences
> > in one go.
>
> That's what I'm doing.  Lock, reserve, add a bunch, unlock.  I am
> assuming that the iterator won't suddenly want to iterate more fences
> between my initial count and when I go to add them but I think that
> assumption is ok.
>

Bump.  This has been sitting here for a couple of weeks.  I still don't see
the problem.

--Jason


> --Jason
>
>
> > Thinking now about it we probably had a bug around that before as
> > well.
> > Going to double check tomorrow.
> >
> > Regards,
> > Christian.
> >
> > > +   if (!ret) {
> > > +   dma_fence_unwrap_for_each(f, &iter, fence)
> > > +   dma_resv_add_fence(dmabuf->resv, f,
> > > usage);
> > > +   }
> > > +
> > > +   dma_resv_unlock(dmabuf->resv);
> > > +   }
> > >
> > > dma_fence_put(fence);
> > >
> >
>
>


Re: [PATCH 0/1] [RFC] drm/fourcc: Add new unsigned R16_UINT/RG1616_UINT formats

2022-08-09 Thread Jason Ekstrand
On Fri, 2022-07-15 at 11:20 +0100, Dennis Tsiang wrote:
> On 30/06/2022 08:47, Pekka Paalanen wrote:
> > On Wed, 29 Jun 2022 14:53:49 +
> > Simon Ser  wrote:
> > 
> > > On Wednesday, June 29th, 2022 at 16:46, Dennis Tsiang
> > >  wrote:
> > > 
> > > > Thanks for your comments. This is not intended to be used for
> > > > KMS, where
> > > > indeed there would be no difference. This proposal is for other
> > > > Graphics
> > > > APIs such as Vulkan, which requires the application to be
> > > > explicit
> > > > upfront about how they will interpret the data, whether that be
> > > > UNORM,
> > > > UINT, etc. We want to be able to import dma_bufs which create a
> > > > VkImage
> > > > with a "_UINT" VkFormat. However there is currently no explicit
> > > > mapping
> > > > between the DRM fourccs + modifiers combos to "_UINT"
> > > > VkFormats. One
> > > > solution is to encode that into the fourccs, which is what this
> > > > RFC is
> > > > proposing.
> > > 
> > > As a general comment, I don't think it's reasonable to encode all
> > > of the
> > > VkFormat information inside DRM FourCC. For instance, VkFormat
> > > has SRGB/UNORM
> > > variants which describe whether pixel values are electrical or
> > > optical
> > > (IOW, EOTF-encoded or not). Moreover, other APIs may encode
> > > different
> > > information in their format enums.
> > 
> > Yeah, do not add any of that information to the DRM pixel format
> > codes.
> > 
> > There is *so much* other stuff you also need to define than what's
> > already mentioned, and which bits you need for the API at hand
> > depends
> > totally on the API at hand. After the API has defined some parts of
> > the
> > metadata, the API user has to take care of the remaining parts of
> > the
> > metadata in other ways, like dynamic range or color space.
> > 
> > Besides, when you deal with dmabuf, you already need to pass a lot
> > of
> > metadata explicitly, like the pixel format, width, height, stride,
> > modifier, etc. so it's better to add more of those (like we will be
> > doing in Wayland, and not specific to dmabuf even) than to try make
> > pixel formats a huge mess through combinatorial explosion and
> > sometimes
> > partial and sometimes conflicting image metadata.
> > 
> > You might be able to get a glimpse of what all metadata there could
> > be
> > by reading
> > https://gitlab.freedesktop.org/pq/color-and-hdr/-/blob/main/doc/pixels_color.md
> > .
> > 
> > Compare Vulkan formats to e.g.
> > https://docs.microsoft.com/en-us/windows/win32/api/dxgicommon/ne-dxgicommon-dxgi_color_space_type
> > and you'll see that while DXGI color space enumeration is mostly
> > about
> > other stuff, it also has overlap with Vulkan formats I think, at
> > least
> > the SRGB vs. not part.
> > 
> > Btw. practically all buffers you see used, especially if they are 8
> > bpc, they are almost guaranteed to be "SRGB" non-linearly encoded,
> > but
> > do you ever see that fact being explicitly communicated?
> > 
> > Then there is the question that if you have an SRGB-encoded buffer,
> > do
> > you want to read out SRGB-encoded or linear values? That depends on
> > what you are doing with the buffer, so if you always mapped dmabuf
> > to
> > Vulkan SRGB formats (or always to non-SRGB formats), then you need
> > some
> > other way in Vulkan for the app to say whether to sample encoded or
> > linear (electrical or optical) values. And whether texture
> > filtering is
> > done in encoded or linear space, because that makes a difference
> > too.
> > 
> > IOW, there are cases where the format mapping depends on the user
> > of the
> > buffer and not only on the contents of the buffer.
> > 
> > Therefore you simply cannot create a static mapping table between
> > two
> > format definition systems when the two systems are fundamentally
> > different, like Vulkan and DRM fourcc.
> > 
> > 
> > Thanks,
> > pq
> 
> Thanks all for your comments. We'll look into an alternative approach
> to
> achieve what we need.

I mostly agree with Pekka here.  The fourcc formats as they currently
are defined only specify a bit pattern and channel order, not an
interpretation.  Vulkan formats, on the other hand, have everything you
need in order to know how to convert float vec4s to/from that format in
a shader.  That's not the same as knowing what the data represents
(colorspace, white values, etc.) but it's a lot more than fourcc.

That said, the Vulkan APIs for querying modifier support will give you
much more fine-grained information about exactly the Vulkan formats you
request.  So if you ask for modifier support for VK_FORMAT_R16G16_UINT,
that's what you'll get.  I *think* it should be fine to use
VK_FORMAT_R16G16_UINT with DRM_FORMAT_GR1616.  Of course, the API on
the other side will also need a more precise format specifier than
fourcc if it's to know the difference between R16G16_UINT and
R16G16_UNORM.
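
For completeness, that per-format modifier query is the usual two-call
pattern from VK_EXT_image_drm_format_modifier (a minimal sketch, error
handling omitted):

#include <stdlib.h>
#include <vulkan/vulkan.h>

/* Ask the implementation which DRM format modifiers it supports for
 * VK_FORMAT_R16G16_UINT specifically, rather than for a fourcc code.
 */
static void query_r16g16_uint_modifiers(VkPhysicalDevice pdev)
{
        VkDrmFormatModifierPropertiesListEXT mods = {
                .sType = VK_STRUCTURE_TYPE_DRM_FORMAT_MODIFIER_PROPERTIES_LIST_EXT,
        };
        VkFormatProperties2 props = {
                .sType = VK_STRUCTURE_TYPE_FORMAT_PROPERTIES_2,
                .pNext = &mods,
        };

        /* First call fills in the modifier count... */
        vkGetPhysicalDeviceFormatProperties2(pdev, VK_FORMAT_R16G16_UINT, &props);

        mods.pDrmFormatModifierProperties =
                calloc(mods.drmFormatModifierCount,
                       sizeof(*mods.pDrmFormatModifierProperties));

        /* ...the second call fills in the modifiers themselves. */
        vkGetPhysicalDeviceFormatProperties2(pdev, VK_FORMAT_R16G16_UINT, &props);

        free(mods.pDrmFormatModifierProperties);
}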

--Jason



Re: [PATCH] dma-buf: Use dma_fence_unwrap_for_each when importing fences

2022-08-08 Thread Jason Ekstrand
On Sun, 2022-08-07 at 18:35 +0200, Christian König wrote:
> Am 02.08.22 um 23:01 schrieb Jason Ekstrand:
> > Ever since 68129f431faa ("dma-buf: warn about containers in
> > dma_resv object"),
> > dma_resv_add_shared_fence will warn if you attempt to add a
> > container fence.
> > While most drivers were fine, fences can also be added to a
> > dma_resv via the
> > recently added DMA_BUF_IOCTL_IMPORT_SYNC_FILE.  Use
> > dma_fence_unwrap_for_each
> > to add each fence one at a time.
> > 
> > Fixes: 594740497e99 ("dma-buf: Add an API for importing sync files
> > (v10)")
> > Signed-off-by: Jason Ekstrand 
> > Reported-by: Sarah Walker 
> > Cc: Christian König 
> > ---
> >   drivers/dma-buf/dma-buf.c | 23 +--
> >   1 file changed, 17 insertions(+), 6 deletions(-)
> > 
> > diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
> > index 630133284e2b..8d5d45112f52 100644
> > --- a/drivers/dma-buf/dma-buf.c
> > +++ b/drivers/dma-buf/dma-buf.c
> > @@ -15,6 +15,7 @@
> >   #include 
> >   #include 
> >   #include 
> > +#include <linux/dma-fence-unwrap.h>
> >   #include 
> >   #include 
> >   #include 
> > @@ -391,8 +392,10 @@ static long dma_buf_import_sync_file(struct
> > dma_buf *dmabuf,
> >  const void __user *user_data)
> >   {
> > struct dma_buf_import_sync_file arg;
> > -   struct dma_fence *fence;
> > +   struct dma_fence *fence, *f;
> > enum dma_resv_usage usage;
> > +   struct dma_fence_unwrap iter;
> > +   unsigned int num_fences;
> > int ret = 0;
> >   
> > if (copy_from_user(&arg, user_data, sizeof(arg)))
> > @@ -411,13 +414,21 @@ static long dma_buf_import_sync_file(struct
> > dma_buf *dmabuf,
> > usage = (arg.flags & DMA_BUF_SYNC_WRITE) ?
> > DMA_RESV_USAGE_WRITE :
> >   
> > DMA_RESV_USAGE_READ;
> >   
> > -   dma_resv_lock(dmabuf->resv, NULL);
> > +   num_fences = 0;
> > +   dma_fence_unwrap_for_each(f, &iter, fence)
> > +   ++num_fences;
> >   
> > -   ret = dma_resv_reserve_fences(dmabuf->resv, 1);
> > -   if (!ret)
> > -   dma_resv_add_fence(dmabuf->resv, fence, usage);
> > +   if (num_fences > 0) {
> > +   dma_resv_lock(dmabuf->resv, NULL);
> >   
> > -   dma_resv_unlock(dmabuf->resv);
> > +   ret = dma_resv_reserve_fences(dmabuf->resv, num_fences);
> 
> That looks like it is misplaced.
> 
> You *must* only lock the reservation object once and then add all
> fences 
> in one go.

That's what I'm doing.  Lock, reserve, add a bunch, unlock.  I am
assuming that the iterator won't suddenly want to iterate more fences
between my initial count and when I go to add them but I think that
assumption is ok.

--Jason


> Thinking now about it we probably had a bug around that before as
> well. 
> Going to double check tomorrow.
> 
> Regards,
> Christian.
> 
> > +   if (!ret) {
> > +   dma_fence_unwrap_for_each(f, &iter, fence)
> > +   dma_resv_add_fence(dmabuf->resv, f,
> > usage);
> > +   }
> > +
> > +   dma_resv_unlock(dmabuf->resv);
> > +   }
> >   
> > dma_fence_put(fence);
> >   
> 



[PATCH] dma-buf: Use dma_fence_unwrap_for_each when importing fences

2022-08-02 Thread Jason Ekstrand
Ever since 68129f431faa ("dma-buf: warn about containers in dma_resv object"),
dma_resv_add_shared_fence will warn if you attempt to add a container fence.
While most drivers were fine, fences can also be added to a dma_resv via the
recently added DMA_BUF_IOCTL_IMPORT_SYNC_FILE.  Use dma_fence_unwrap_for_each
to add each fence one at a time.

Fixes: 594740497e99 ("dma-buf: Add an API for importing sync files (v10)")
Signed-off-by: Jason Ekstrand 
Reported-by: Sarah Walker 
Cc: Christian König 
---
 drivers/dma-buf/dma-buf.c | 23 +--
 1 file changed, 17 insertions(+), 6 deletions(-)

diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index 630133284e2b..8d5d45112f52 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -15,6 +15,7 @@
 #include 
 #include 
 #include 
+#include <linux/dma-fence-unwrap.h>
 #include 
 #include 
 #include 
@@ -391,8 +392,10 @@ static long dma_buf_import_sync_file(struct dma_buf *dmabuf,
 const void __user *user_data)
 {
struct dma_buf_import_sync_file arg;
-   struct dma_fence *fence;
+   struct dma_fence *fence, *f;
enum dma_resv_usage usage;
+   struct dma_fence_unwrap iter;
+   unsigned int num_fences;
int ret = 0;
 
if (copy_from_user(&arg, user_data, sizeof(arg)))
@@ -411,13 +414,21 @@ static long dma_buf_import_sync_file(struct dma_buf *dmabuf,
usage = (arg.flags & DMA_BUF_SYNC_WRITE) ? DMA_RESV_USAGE_WRITE :
   DMA_RESV_USAGE_READ;
 
-   dma_resv_lock(dmabuf->resv, NULL);
+   num_fences = 0;
+   dma_fence_unwrap_for_each(f, &iter, fence)
+   ++num_fences;
 
-   ret = dma_resv_reserve_fences(dmabuf->resv, 1);
-   if (!ret)
-   dma_resv_add_fence(dmabuf->resv, fence, usage);
+   if (num_fences > 0) {
+   dma_resv_lock(dmabuf->resv, NULL);
 
-   dma_resv_unlock(dmabuf->resv);
+   ret = dma_resv_reserve_fences(dmabuf->resv, num_fences);
+   if (!ret) {
+   dma_fence_unwrap_for_each(f, &iter, fence)
+   dma_resv_add_fence(dmabuf->resv, f, usage);
+   }
+
+   dma_resv_unlock(dmabuf->resv);
+   }
 
dma_fence_put(fence);
 
-- 
2.36.1



Re: [PATCH v6 3/3] drm/doc/rfc: VM_BIND uapi definition

2022-06-30 Thread Jason Ekstrand
On Sat, Jun 25, 2022 at 8:49 PM Niranjana Vishwanathapura <
niranjana.vishwanathap...@intel.com> wrote:

> VM_BIND and related uapi definitions
>
> v2: Reduce the scope to simple Mesa use case.
> v3: Expand VM_UNBIND documentation and add
> I915_GEM_VM_BIND/UNBIND_FENCE_VALID
> and I915_GEM_VM_BIND_TLB_FLUSH flags.
> v4: Remove I915_GEM_VM_BIND_TLB_FLUSH flag and add additional
> documentation for vm_bind/unbind.
> v5: Remove TLB flush requirement on VM_UNBIND.
> Add version support to stage implementation.
> v6: Define and use drm_i915_gem_timeline_fence structure for
> all timeline fences.
> v7: Rename I915_PARAM_HAS_VM_BIND to I915_PARAM_VM_BIND_VERSION.
> Update documentation on async vm_bind/unbind and versioning.
> Remove redundant vm_bind/unbind FENCE_VALID flag, execbuf3
> batch_count field and I915_EXEC3_SECURE flag.
>
> Signed-off-by: Niranjana Vishwanathapura <
> niranjana.vishwanathap...@intel.com>
> Reviewed-by: Daniel Vetter 
> ---
>  Documentation/gpu/rfc/i915_vm_bind.h | 280 +++
>  1 file changed, 280 insertions(+)
>  create mode 100644 Documentation/gpu/rfc/i915_vm_bind.h
>
> diff --git a/Documentation/gpu/rfc/i915_vm_bind.h
> b/Documentation/gpu/rfc/i915_vm_bind.h
> new file mode 100644
> index ..a93e08bceee6
> --- /dev/null
> +++ b/Documentation/gpu/rfc/i915_vm_bind.h
> @@ -0,0 +1,280 @@
> +/* SPDX-License-Identifier: MIT */
> +/*
> + * Copyright © 2022 Intel Corporation
> + */
> +
> +/**
> + * DOC: I915_PARAM_VM_BIND_VERSION
> + *
> + * VM_BIND feature version supported.
> + * See typedef drm_i915_getparam_t param.
> + *
> + * Specifies the VM_BIND feature version supported.
> + * The following versions of VM_BIND have been defined:
> + *
> + * 0: No VM_BIND support.
> + *
> + * 1: In VM_UNBIND calls, the UMD must specify the exact mappings created
> + *previously with VM_BIND, the ioctl will not support unbinding
> multiple
> + *mappings or splitting them. Similarly, VM_BIND calls will not
> replace
> + *any existing mappings.
> + *
> + * 2: The restrictions on unbinding partial or multiple mappings is
> + *lifted, Similarly, binding will replace any mappings in the given
> range.
> + *
> + * See struct drm_i915_gem_vm_bind and struct drm_i915_gem_vm_unbind.
> + */
> +#define I915_PARAM_VM_BIND_VERSION 57
> +
> +/**
> + * DOC: I915_VM_CREATE_FLAGS_USE_VM_BIND
> + *
> + * Flag to opt-in for VM_BIND mode of binding during VM creation.
> + * See struct drm_i915_gem_vm_control flags.
> + *
> + * The older execbuf2 ioctl will not support VM_BIND mode of operation.
> + * For VM_BIND mode, we have new execbuf3 ioctl which will not accept any
> + * execlist (See struct drm_i915_gem_execbuffer3 for more details).
> + */
> +#define I915_VM_CREATE_FLAGS_USE_VM_BIND   (1 << 0)
> +
> +/* VM_BIND related ioctls */
> +#define DRM_I915_GEM_VM_BIND   0x3d
> +#define DRM_I915_GEM_VM_UNBIND 0x3e
> +#define DRM_I915_GEM_EXECBUFFER3   0x3f
> +
> +#define DRM_IOCTL_I915_GEM_VM_BIND DRM_IOWR(DRM_COMMAND_BASE
> + DRM_I915_GEM_VM_BIND, struct drm_i915_gem_vm_bind)
> +#define DRM_IOCTL_I915_GEM_VM_UNBIND   DRM_IOWR(DRM_COMMAND_BASE
> + DRM_I915_GEM_VM_UNBIND, struct drm_i915_gem_vm_bind)
> +#define DRM_IOCTL_I915_GEM_EXECBUFFER3 DRM_IOWR(DRM_COMMAND_BASE
> + DRM_I915_GEM_EXECBUFFER3, struct drm_i915_gem_execbuffer3)
> +
> +/**
> + * struct drm_i915_gem_timeline_fence - An input or output timeline fence.
> + *
> + * The operation will wait for input fence to signal.
> + *
> + * The returned output fence will be signaled after the completion of the
> + * operation.
> + */
> +struct drm_i915_gem_timeline_fence {
> +   /** @handle: User's handle for a drm_syncobj to wait on or signal.
> */
> +   __u32 handle;
> +
> +   /**
> +* @flags: Supported flags are:
> +*
> +* I915_TIMELINE_FENCE_WAIT:
> +* Wait for the input fence before the operation.
> +*
> +* I915_TIMELINE_FENCE_SIGNAL:
> +* Return operation completion fence as output.
> +*/
> +   __u32 flags;
> +#define I915_TIMELINE_FENCE_WAIT(1 << 0)
> +#define I915_TIMELINE_FENCE_SIGNAL  (1 << 1)
> +#define __I915_TIMELINE_FENCE_UNKNOWN_FLAGS (-(I915_TIMELINE_FENCE_SIGNAL
> << 1))
> +
> +   /**
> +* @value: A point in the timeline.
> +* Value must be 0 for a binary drm_syncobj. A Value of 0 for a
> +* timeline drm_syncobj is invalid as it turns a drm_syncobj into a
> +* binary one.
> +*/
> +   __u64 value;
> +};
> +
> +/**
> + * struct drm_i915_gem_vm_bind - VA to object mapping to bind.
> + *
> + * This structure is passed to VM_BIND ioctl and specifies the mapping of
> GPU
> + * virtual address (VA) range to the section of an object that should be
> bound
> + * in the device page table of the specified address space (VM).
> + * The VA range specified must be unique (ie., not 

Re: [PATCH v6 3/3] drm/doc/rfc: VM_BIND uapi definition

2022-06-30 Thread Jason Ekstrand
On Thu, Jun 30, 2022 at 10:14 AM Matthew Auld 
wrote:

> On 30/06/2022 06:11, Jason Ekstrand wrote:
> > On Sat, Jun 25, 2022 at 8:49 PM Niranjana Vishwanathapura
> >  > <mailto:niranjana.vishwanathap...@intel.com>> wrote:
> >
> > VM_BIND and related uapi definitions
> >
> > v2: Reduce the scope to simple Mesa use case.
> > v3: Expand VM_UNBIND documentation and add
> >  I915_GEM_VM_BIND/UNBIND_FENCE_VALID
> >  and I915_GEM_VM_BIND_TLB_FLUSH flags.
> > v4: Remove I915_GEM_VM_BIND_TLB_FLUSH flag and add additional
> >  documentation for vm_bind/unbind.
> > v5: Remove TLB flush requirement on VM_UNBIND.
> >  Add version support to stage implementation.
> > v6: Define and use drm_i915_gem_timeline_fence structure for
> >  all timeline fences.
> > v7: Rename I915_PARAM_HAS_VM_BIND to I915_PARAM_VM_BIND_VERSION.
> >  Update documentation on async vm_bind/unbind and versioning.
> >  Remove redundant vm_bind/unbind FENCE_VALID flag, execbuf3
> >  batch_count field and I915_EXEC3_SECURE flag.
> >
> > Signed-off-by: Niranjana Vishwanathapura
> >  > <mailto:niranjana.vishwanathap...@intel.com>>
> > Reviewed-by: Daniel Vetter  > <mailto:daniel.vet...@ffwll.ch>>
> > ---
> >   Documentation/gpu/rfc/i915_vm_bind.h | 280
> +++
> >   1 file changed, 280 insertions(+)
> >   create mode 100644 Documentation/gpu/rfc/i915_vm_bind.h
> >
> > diff --git a/Documentation/gpu/rfc/i915_vm_bind.h
> > b/Documentation/gpu/rfc/i915_vm_bind.h
> > new file mode 100644
> > index ..a93e08bceee6
> > --- /dev/null
> > +++ b/Documentation/gpu/rfc/i915_vm_bind.h
> > @@ -0,0 +1,280 @@
> > +/* SPDX-License-Identifier: MIT */
> > +/*
> > + * Copyright © 2022 Intel Corporation
> > + */
> > +
> > +/**
> > + * DOC: I915_PARAM_VM_BIND_VERSION
> > + *
> > + * VM_BIND feature version supported.
> > + * See typedef drm_i915_getparam_t param.
> > + *
> > + * Specifies the VM_BIND feature version supported.
> > + * The following versions of VM_BIND have been defined:
> > + *
> > + * 0: No VM_BIND support.
> > + *
> > + * 1: In VM_UNBIND calls, the UMD must specify the exact mappings
> > created
> > + *previously with VM_BIND, the ioctl will not support unbinding
> > multiple
> > + *mappings or splitting them. Similarly, VM_BIND calls will not
> > replace
> > + *any existing mappings.
> > + *
> > + * 2: The restrictions on unbinding partial or multiple mappings is
> > + *lifted, Similarly, binding will replace any mappings in the
> > given range.
> > + *
> > + * See struct drm_i915_gem_vm_bind and struct
> drm_i915_gem_vm_unbind.
> > + */
> > +#define I915_PARAM_VM_BIND_VERSION 57
> > +
> > +/**
> > + * DOC: I915_VM_CREATE_FLAGS_USE_VM_BIND
> > + *
> > + * Flag to opt-in for VM_BIND mode of binding during VM creation.
> > + * See struct drm_i915_gem_vm_control flags.
> > + *
> > + * The older execbuf2 ioctl will not support VM_BIND mode of
> operation.
> > + * For VM_BIND mode, we have new execbuf3 ioctl which will not
> > accept any
> > + * execlist (See struct drm_i915_gem_execbuffer3 for more details).
> > + */
> > +#define I915_VM_CREATE_FLAGS_USE_VM_BIND   (1 << 0)
> > +
> > +/* VM_BIND related ioctls */
> > +#define DRM_I915_GEM_VM_BIND   0x3d
> > +#define DRM_I915_GEM_VM_UNBIND 0x3e
> > +#define DRM_I915_GEM_EXECBUFFER3   0x3f
> > +
> > +#define DRM_IOCTL_I915_GEM_VM_BIND
> >   DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_VM_BIND, struct
> > drm_i915_gem_vm_bind)
> > +#define DRM_IOCTL_I915_GEM_VM_UNBIND
> >   DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_VM_UNBIND, struct
> > drm_i915_gem_vm_bind)
> > +#define DRM_IOCTL_I915_GEM_EXECBUFFER3
> >   DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_EXECBUFFER3, struct
> > drm_i915_gem_execbuffer3)
> > +
> > +/**
> > + * struct drm_i915_gem_timeline_fence - An input or output timeline
> > fence.
> > + *
> > + * The operation will wait for input fence to

Re: [PATCH v6 3/3] drm/doc/rfc: VM_BIND uapi definition

2022-06-29 Thread Jason Ekstrand
On Sat, Jun 25, 2022 at 8:49 PM Niranjana Vishwanathapura <
niranjana.vishwanathap...@intel.com> wrote:

> VM_BIND and related uapi definitions
>
> v2: Reduce the scope to simple Mesa use case.
> v3: Expand VM_UNBIND documentation and add
> I915_GEM_VM_BIND/UNBIND_FENCE_VALID
> and I915_GEM_VM_BIND_TLB_FLUSH flags.
> v4: Remove I915_GEM_VM_BIND_TLB_FLUSH flag and add additional
> documentation for vm_bind/unbind.
> v5: Remove TLB flush requirement on VM_UNBIND.
> Add version support to stage implementation.
> v6: Define and use drm_i915_gem_timeline_fence structure for
> all timeline fences.
> v7: Rename I915_PARAM_HAS_VM_BIND to I915_PARAM_VM_BIND_VERSION.
> Update documentation on async vm_bind/unbind and versioning.
> Remove redundant vm_bind/unbind FENCE_VALID flag, execbuf3
> batch_count field and I915_EXEC3_SECURE flag.
>
> Signed-off-by: Niranjana Vishwanathapura <
> niranjana.vishwanathap...@intel.com>
> Reviewed-by: Daniel Vetter 
> ---
>  Documentation/gpu/rfc/i915_vm_bind.h | 280 +++
>  1 file changed, 280 insertions(+)
>  create mode 100644 Documentation/gpu/rfc/i915_vm_bind.h
>
> diff --git a/Documentation/gpu/rfc/i915_vm_bind.h
> b/Documentation/gpu/rfc/i915_vm_bind.h
> new file mode 100644
> index ..a93e08bceee6
> --- /dev/null
> +++ b/Documentation/gpu/rfc/i915_vm_bind.h
> @@ -0,0 +1,280 @@
> +/* SPDX-License-Identifier: MIT */
> +/*
> + * Copyright © 2022 Intel Corporation
> + */
> +
> +/**
> + * DOC: I915_PARAM_VM_BIND_VERSION
> + *
> + * VM_BIND feature version supported.
> + * See typedef drm_i915_getparam_t param.
> + *
> + * Specifies the VM_BIND feature version supported.
> + * The following versions of VM_BIND have been defined:
> + *
> + * 0: No VM_BIND support.
> + *
> + * 1: In VM_UNBIND calls, the UMD must specify the exact mappings created
> + *previously with VM_BIND, the ioctl will not support unbinding
> multiple
> + *mappings or splitting them. Similarly, VM_BIND calls will not
> replace
> + *any existing mappings.
> + *
> + * 2: The restrictions on unbinding partial or multiple mappings is
> + *lifted, Similarly, binding will replace any mappings in the given
> range.
> + *
> + * See struct drm_i915_gem_vm_bind and struct drm_i915_gem_vm_unbind.
> + */
> +#define I915_PARAM_VM_BIND_VERSION 57
> +
> +/**
> + * DOC: I915_VM_CREATE_FLAGS_USE_VM_BIND
> + *
> + * Flag to opt-in for VM_BIND mode of binding during VM creation.
> + * See struct drm_i915_gem_vm_control flags.
> + *
> + * The older execbuf2 ioctl will not support VM_BIND mode of operation.
> + * For VM_BIND mode, we have new execbuf3 ioctl which will not accept any
> + * execlist (See struct drm_i915_gem_execbuffer3 for more details).
> + */
> +#define I915_VM_CREATE_FLAGS_USE_VM_BIND   (1 << 0)
> +
> +/* VM_BIND related ioctls */
> +#define DRM_I915_GEM_VM_BIND   0x3d
> +#define DRM_I915_GEM_VM_UNBIND 0x3e
> +#define DRM_I915_GEM_EXECBUFFER3   0x3f
> +
> +#define DRM_IOCTL_I915_GEM_VM_BIND DRM_IOWR(DRM_COMMAND_BASE
> + DRM_I915_GEM_VM_BIND, struct drm_i915_gem_vm_bind)
> +#define DRM_IOCTL_I915_GEM_VM_UNBIND   DRM_IOWR(DRM_COMMAND_BASE
> + DRM_I915_GEM_VM_UNBIND, struct drm_i915_gem_vm_bind)
> +#define DRM_IOCTL_I915_GEM_EXECBUFFER3 DRM_IOWR(DRM_COMMAND_BASE
> + DRM_I915_GEM_EXECBUFFER3, struct drm_i915_gem_execbuffer3)
> +
> +/**
> + * struct drm_i915_gem_timeline_fence - An input or output timeline fence.
> + *
> + * The operation will wait for input fence to signal.
> + *
> + * The returned output fence will be signaled after the completion of the
> + * operation.
> + */
> +struct drm_i915_gem_timeline_fence {
> +   /** @handle: User's handle for a drm_syncobj to wait on or signal.
> */
> +   __u32 handle;
> +
> +   /**
> +* @flags: Supported flags are:
> +*
> +* I915_TIMELINE_FENCE_WAIT:
> +* Wait for the input fence before the operation.
> +*
> +* I915_TIMELINE_FENCE_SIGNAL:
> +* Return operation completion fence as output.
> +*/
> +   __u32 flags;
> +#define I915_TIMELINE_FENCE_WAIT(1 << 0)
> +#define I915_TIMELINE_FENCE_SIGNAL  (1 << 1)
> +#define __I915_TIMELINE_FENCE_UNKNOWN_FLAGS (-(I915_TIMELINE_FENCE_SIGNAL
> << 1))
> +
> +   /**
> +* @value: A point in the timeline.
> +* Value must be 0 for a binary drm_syncobj. A Value of 0 for a
> +* timeline drm_syncobj is invalid as it turns a drm_syncobj into a
> +* binary one.
> +*/
> +   __u64 value;
> +};
> +
> +/**
> + * struct drm_i915_gem_vm_bind - VA to object mapping to bind.
> + *
> + * This structure is passed to VM_BIND ioctl and specifies the mapping of
> GPU
> + * virtual address (VA) range to the section of an object that should be
> bound
> + * in the device page table of the specified address space (VM).
> + * The VA range specified must be unique (ie., not 

Re: [Intel-gfx] [RFC v3 1/3] drm/doc/rfc: VM_BIND feature design document

2022-06-08 Thread Jason Ekstrand
On Wed, Jun 8, 2022 at 4:44 PM Niranjana Vishwanathapura <
niranjana.vishwanathap...@intel.com> wrote:

> On Wed, Jun 08, 2022 at 08:33:25AM +0100, Tvrtko Ursulin wrote:
> >
> >
> >On 07/06/2022 22:32, Niranjana Vishwanathapura wrote:
> >>On Tue, Jun 07, 2022 at 11:18:11AM -0700, Niranjana Vishwanathapura
> wrote:
> >>>On Tue, Jun 07, 2022 at 12:12:03PM -0500, Jason Ekstrand wrote:
> >>>> On Fri, Jun 3, 2022 at 6:52 PM Niranjana Vishwanathapura
> >>>>  wrote:
> >>>>
> >>>>   On Fri, Jun 03, 2022 at 10:20:25AM +0300, Lionel Landwerlin wrote:
> >>>>   >   On 02/06/2022 23:35, Jason Ekstrand wrote:
> >>>>   >
> >>>>   > On Thu, Jun 2, 2022 at 3:11 PM Niranjana Vishwanathapura
> >>>>   >  wrote:
> >>>>   >
> >>>>   >   On Wed, Jun 01, 2022 at 01:28:36PM -0700, Matthew
> >>>>Brost wrote:
> >>>>   >   >On Wed, Jun 01, 2022 at 05:25:49PM +0300, Lionel Landwerlin
> >>>>   wrote:
> >>>>   >   >> On 17/05/2022 21:32, Niranjana Vishwanathapura wrote:
> >>>>   >   >> > +VM_BIND/UNBIND ioctl will immediately start
> >>>>   binding/unbinding
> >>>>   >   the mapping in an
> >>>>   >   >> > +async worker. The binding and unbinding will
> >>>>work like a
> >>>>   special
> >>>>   >   GPU engine.
> >>>>   >   >> > +The binding and unbinding operations are serialized
> and
> >>>>   will
> >>>>   >   wait on specified
> >>>>   >   >> > +input fences before the operation and will signal the
> >>>>   output
> >>>>   >   fences upon the
> >>>>   >   >> > +completion of the operation. Due to serialization,
> >>>>   completion of
> >>>>   >   an operation
> >>>>   >   >> > +will also indicate that all previous operations
> >>>>are also
> >>>>   >   complete.
> >>>>   >   >>
> >>>>   >   >> I guess we should avoid saying "will immediately start
> >>>>   >   binding/unbinding" if
> >>>>   >   >> there are fences involved.
> >>>>   >   >>
> >>>>   >   >> And the fact that it's happening in an async
> >>>>worker seem to
> >>>>   imply
> >>>>   >   it's not
> >>>>   >   >> immediate.
> >>>>   >   >>
> >>>>   >
> >>>>   >   Ok, will fix.
> >>>>   >   This was added because in earlier design binding was
> deferred
> >>>>   until
> >>>>   >   next execbuff.
> >>>>   >   But now it is non-deferred (immediate in that sense).
> >>>>But yah,
> >>>>   this is
> >>>>   >   confusing
> >>>>   >   and will fix it.
> >>>>   >
> >>>>   >   >>
> >>>>   >   >> I have a question on the behavior of the bind
> >>>>operation when
> >>>>   no
> >>>>   >   input fence
> >>>>   >   >> is provided. Let say I do :
> >>>>   >   >>
> >>>>   >   >> VM_BIND (out_fence=fence1)
> >>>>   >   >>
> >>>>   >   >> VM_BIND (out_fence=fence2)
> >>>>   >   >>
> >>>>   >   >> VM_BIND (out_fence=fence3)
> >>>>   >   >>
> >>>>   >   >>
> >>>>   >   >> In what order are the fences going to be signaled?
> >>>>   >   >>
> >>>>   >   >> In the order of VM_BIND ioctls? Or out of order?
> >>>>   >   >>
> >>>>   >   >> Because you wrote "serialized I assume it's : in order
> >>>>   >   >>
> >>>>   >
> >>>>   >   Yes, in the order of VM_BIND/UNBIND ioctls. Note that
> >>>>bind and
> >>>>   unbind
> >>>>   >   will use
> >>>>   >   

[PATCH 2/2] dma-buf: Add an API for importing sync files (v10)

2022-06-08 Thread Jason Ekstrand
This patch is analogous to the previous sync file export patch in that
it allows you to import a sync_file into a dma-buf.  Unlike the previous
patch, however, this does add genuinely new functionality to dma-buf.
Without this, the only way to attach a sync_file to a dma-buf is to
submit a batch to your driver of choice which waits on the sync_file and
claims to write to the dma-buf.  Even if said batch is a no-op, a submit
is typically way more overhead than just attaching a fence.  A submit
may also imply extra synchronization with other work because it happens
on a hardware queue.

In the Vulkan world, this is useful for dealing with the out-fence from
vkQueuePresent.  Current Linux window-systems (X11, Wayland, etc.) all
rely on dma-buf implicit sync.  Since Vulkan is an explicit sync API, we
get a set of fences (VkSemaphores) in vkQueuePresent and have to stash
those as an exclusive (write) fence on the dma-buf.  We handle it in
Mesa today with the above mentioned dummy submit trick.  This ioctl
would allow us to set it directly without the dummy submit.

This may also open up possibilities for GPU drivers to move away from
implicit sync for their kernel driver uAPI and instead provide sync
files and rely on dma-buf import/export for communicating with other
implicit sync clients.

We make the explicit choice here to only allow setting RW fences which
translates to an exclusive fence on the dma_resv.  There's no use for
read-only fences for communicating with other implicit sync userspace
and any such attempts are likely to be racy at best.  When we go to
insert the RW fence, the actual fence we set as the new exclusive fence
is a combination of the sync_file provided by the user and all the other
fences on the dma_resv.  This ensures that the newly added exclusive
fence will never signal before the old one would have and ensures that
we don't break any dma_resv contracts.  We require userspace to specify
RW in the flags for symmetry with the export ioctl and in case we ever
want to support read fences in the future.

There is one downside here that's worth documenting:  If two clients
writing to the same dma-buf using this API race with each other, their
actions on the dma-buf may happen in parallel or in an undefined order.
Both with and without this API, the pattern is the same:  Collect all
the fences on dma-buf, submit work which depends on said fences, and
then set a new exclusive (write) fence on the dma-buf which depends on
said work.  The difference is that, when it's all handled by the GPU
driver's submit ioctl, the three operations happen atomically under the
dma_resv lock.  If two userspace submits race, one will happen before
the other.  You aren't guaranteed which but you are guaranteed that
they're strictly ordered.  If userspace manages the fences itself, then
these three operations happen separately and the two render operations
may happen genuinely in parallel or get interleaved.  However, this is a
case of userspace racing with itself.  As long as we ensure userspace
can't back the kernel into a corner, it should be fine.
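
Spelled out as code, the racy pattern looks like this (a sketch;
export_sync_file(), submit_work_waiting_on() and import_sync_file() are
hypothetical helpers wrapping the export/import ioctls and a driver
submit):

    /* 1. Collect the current fences on the dma-buf. */
    int wait_fd = export_sync_file(dmabuf_fd, DMA_BUF_SYNC_RW);

    /* 2. Submit work which depends on those fences. */
    int done_fd = submit_work_waiting_on(wait_fd);

    /* 3. Publish that work as the new write fence.  Two clients racing
     * here may interleave steps 1-3, which is the race described above. */
    import_sync_file(dmabuf_fd, done_fd);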

v2 (Jason Ekstrand):
 - Use a wrapper dma_fence_array of all fences including the new one
   when importing an exclusive fence.

v3 (Jason Ekstrand):
 - Lock around setting shared fences as well as exclusive
 - Mark SIGNAL_SYNC_FILE as a read-write ioctl.
 - Initialize ret to 0 in dma_buf_wait_sync_file

v4 (Jason Ekstrand):
 - Use the new dma_resv_get_singleton helper

v5 (Jason Ekstrand):
 - Rename the IOCTLs to import/export rather than wait/signal
 - Drop the WRITE flag and always get/set the exclusive fence

v6 (Jason Ekstrand):
 - Split import and export into separate patches
 - New commit message

v7 (Daniel Vetter):
 - Fix the uapi header to use the right struct in the ioctl
 - Use a separate dma_buf_import_sync_file struct
 - Add kerneldoc for dma_buf_import_sync_file

v8 (Jason Ekstrand):
 - Rebase on Christian König's fence rework

v9 (Daniel Vetter):
 - Fix -EINVAL checks for the flags parameter
 - Add documentation about read/write fences
 - Add documentation about the expected usage of import/export and
   specifically call out the possible userspace race.

v10 (Simon Ser):
 - Fix a typo in the docs

Signed-off-by: Jason Ekstrand 
Signed-off-by: Jason Ekstrand 
Signed-off-by: Jason Ekstrand 
Reviewed-by: Christian König 
Reviewed-by: Daniel Vetter 
Cc: Sumit Semwal 
Cc: Maarten Lankhorst 
---
 drivers/dma-buf/dma-buf.c| 39 
 include/uapi/linux/dma-buf.h | 49 
 2 files changed, 88 insertions(+)

diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index 6ff54f7e6119..f23f1482eb38 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -386,6 +386,43 @@ static long dma_buf_export_sync_file(struct dma_buf *dmabuf,
put_unused_fd(fd);
return ret;
 }
+
+static long dma_buf_import_sync_file(struct dma_buf *dmabuf

[PATCH 1/2] dma-buf: Add an API for exporting sync files (v14)

2022-06-08 Thread Jason Ekstrand
Modern userspace APIs like Vulkan are built on an explicit
synchronization model.  This doesn't always play nicely with the
implicit synchronization used in the kernel and assumed by X11 and
Wayland.  The client -> compositor half of the synchronization isn't too
bad, at least on intel, because we can control whether or not i915
synchronizes on the buffer and whether or not it's considered written.

The harder part is the compositor -> client synchronization when we get
the buffer back from the compositor.  We're required to be able to
provide the client with a VkSemaphore and VkFence representing the point
in time where the window system (compositor and/or display) finished
using the buffer.  With current APIs, it's very hard to do this in such
a way that we don't get confused by the Vulkan driver's access of the
buffer.  In particular, once we tell the kernel that we're rendering to
the buffer again, any CPU waits on the buffer or GPU dependencies will
wait on some of the client rendering and not just the compositor.

This new IOCTL solves this problem by allowing us to get a snapshot of
the implicit synchronization state of a given dma-buf in the form of a
sync file.  It's effectively the same as a poll() or I915_GEM_WAIT only,
instead of CPU waiting directly, it encapsulates the wait operation, at
the current moment in time, in a sync_file so we can check/wait on it
later.  As long as the Vulkan driver does the sync_file export from the
dma-buf before we re-introduce it for rendering, it will only contain
fences from the compositor or display.  This allows us to accurately turn
it into a VkFence or VkSemaphore without any over-synchronization.

By making this an ioctl on the dma-buf itself, it allows this new
functionality to be used in an entirely driver-agnostic way without
having access to a DRM fd. This makes it ideal for use in driver-generic
code in Mesa or in a client such as a compositor where the DRM fd may be
hard to reach.
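
A minimal sketch of the export side from userspace (struct and ioctl
names as added by this series; error handling omitted):

    struct dma_buf_export_sync_file arg = {
            .flags = DMA_BUF_SYNC_READ | DMA_BUF_SYNC_WRITE,
            .fd = -1,
    };
    ioctl(dmabuf_fd, DMA_BUF_IOCTL_EXPORT_SYNC_FILE, &arg);
    /* arg.fd now holds a sync_file snapshotting the dma-buf's fences
     * at this moment in time; we can poll() it or wait on it later. */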

v2 (Jason Ekstrand):
 - Use a wrapper dma_fence_array of all fences including the new one
   when importing an exclusive fence.

v3 (Jason Ekstrand):
 - Lock around setting shared fences as well as exclusive
 - Mark SIGNAL_SYNC_FILE as a read-write ioctl.
 - Initialize ret to 0 in dma_buf_wait_sync_file

v4 (Jason Ekstrand):
 - Use the new dma_resv_get_singleton helper

v5 (Jason Ekstrand):
 - Rename the IOCTLs to import/export rather than wait/signal
 - Drop the WRITE flag and always get/set the exclusive fence

v6 (Jason Ekstrand):
 - Drop the sync_file import as it was all-around sketchy and not nearly
   as useful as import.
 - Re-introduce READ/WRITE flag support for export
 - Rework the commit message

v7 (Jason Ekstrand):
 - Require at least one sync flag
 - Fix a refcounting bug: dma_resv_get_excl() doesn't take a reference
 - Use _rcu helpers since we're accessing the dma_resv read-only

v8 (Jason Ekstrand):
 - Return -ENOMEM if the sync_file_create fails
 - Predicate support on IS_ENABLED(CONFIG_SYNC_FILE)

v9 (Jason Ekstrand):
 - Add documentation for the new ioctl

v10 (Jason Ekstrand):
 - Go back to dma_buf_sync_file as the ioctl struct name

v11 (Daniel Vetter):
 - Go back to dma_buf_export_sync_file as the ioctl struct name
 - Better kerneldoc describing what the read/write flags do

v12 (Christian König):
 - Document why we chose to make it an ioctl on dma-buf

v13 (Jason Ekstrand):
 - Rebase on Christian König's fence rework

v14 (Daniel Vetter & Christian König):
 - Use dma_resv_usage_rw to get the properly inverted usage to pass to
   dma_resv_get_singleton()
 - Clean up the sync_file and fd if copy_to_user() fails

Signed-off-by: Jason Ekstrand 
Signed-off-by: Jason Ekstrand 
Signed-off-by: Jason Ekstrand 
Acked-by: Simon Ser 
Reviewed-by: Christian König 
Reviewed-by: Daniel Vetter 
Cc: Sumit Semwal 
Cc: Maarten Lankhorst 
---
 drivers/dma-buf/dma-buf.c| 67 
 include/uapi/linux/dma-buf.h | 35 +++
 2 files changed, 102 insertions(+)

diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index 79795857be3e..6ff54f7e6119 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -20,6 +20,7 @@
 #include 
 #include 
 #include 
+#include <linux/sync_file.h>
 #include 
 #include 
 #include 
@@ -192,6 +193,9 @@ static loff_t dma_buf_llseek(struct file *file, loff_t offset, int whence)
  * Note that this only signals the completion of the respective fences, i.e. the
  * DMA transfers are complete. Cache flushing and any other necessary
  * preparations before CPU access can begin still need to happen.
+ *
+ * As an alternative to poll(), the set of fences on DMA buffer can be
+ * exported as a &sync_file using &dma_buf_sync_file_export.
  */
 
 static void dma_buf_poll_cb(struct dma_fence *fence, struct dma_fence_cb *cb)
@@ -326,6 +330,64 @@ static long dma_buf_set_name(struct dma_buf *dmabuf, const char __user *buf)
return 0;
 }
 
+#if IS_ENABLED(CONFIG_SYNC_FILE)
+s

[PATCH 0/2] dma-buf: Add an API for exporting sync files (v15)

2022-06-08 Thread Jason Ekstrand
Modern userspace APIs like Vulkan are built on an explicit synchronization
model.  This doesn't always play nicely with the implicit synchronization used
in the kernel and assumed by X11 and Wayland.  The client -> compositor half
of the synchronization isn't too bad, at least on intel, because we can
control whether or not i915 synchronizes on the buffer and whether or not it's
considered written.

The harder part is the compositor -> client synchronization when we get the
buffer back from the compositor.  We're required to be able to provide the
client with a VkSemaphore and VkFence representing the point in time where the
window system (compositor and/or display) finished using the buffer.  With
current APIs, it's very hard to do this in such a way that we don't get
confused by the Vulkan driver's access of the buffer.  In particular, once we
tell the kernel that we're rendering to the buffer again, any CPU waits on the
buffer or GPU dependencies will wait on some of the client rendering and not
just the compositor.

This new IOCTL solves this problem by allowing us to get a snapshot of the
implicit synchronization state of a given dma-buf in the form of a sync file.
It's effectively the same as a poll() or I915_GEM_WAIT only, instead of CPU
waiting directly, it encapsulates the wait operation, at the current moment in
time, in a sync_file so we can check/wait on it later.  As long as the Vulkan
driver does the sync_file export from the dma-buf before we re-introduce it
for rendering, it will only contain fences from the compositor or display.
This allows us to accurately turn it into a VkFence or VkSemaphore without any
over-synchronization.
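
For example, once exported, the sync_file can be turned into a binary
VkSemaphore through the standard VK_KHR_external_semaphore_fd path (a
sketch; the import consumes the fd):

    VkImportSemaphoreFdInfoKHR import = {
        .sType = VK_STRUCTURE_TYPE_IMPORT_SEMAPHORE_FD_INFO_KHR,
        .semaphore = sem,
        .flags = VK_SEMAPHORE_IMPORT_TEMPORARY_BIT,
        .handleType = VK_EXTERNAL_SEMAPHORE_HANDLE_TYPE_SYNC_FD_BIT,
        .fd = sync_file_fd, /* from DMA_BUF_IOCTL_EXPORT_SYNC_FILE */
    };
    vkImportSemaphoreFdKHR(device, &import);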

This patch series actually contains two new ioctls.  There is the export one
mentioned above as well as an RFC for an import ioctl which provides the other
half.  The intention is to land the export ioctl since it seems like there's
no real disagreement on that one.  The import ioctl, however, has a lot of
debate around it so it's intended to be RFC-only for now.

Mesa MR: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4037
IGT tests: https://patchwork.freedesktop.org/series/90490/

v10 (Jason Ekstrand, Daniel Vetter):
 - Add reviews/acks
 - Add a patch to rename _rcu to _unlocked
 - Split things better so import is clearly RFC status

v11 (Daniel Vetter):
 - Add more CCs to try and get maintainers
 - Add a patch to document DMA_BUF_IOCTL_SYNC
 - Generally better docs
 - Use separate structs for import/export (easier to document)
 - Fix an issue in the import patch

v12 (Daniel Vetter):
 - Better docs for DMA_BUF_IOCTL_SYNC

v12 (Christian König):
 - Drop the rename patch in favor of Christian's series
 - Add a comment to the commit message for the dma-buf sync_file export
   ioctl saying why we made it an ioctl on dma-buf

v13 (Jason Ekstrand):
 - Rebase on Christian König's fence rework

v14 (Daniel Vetter and Christian König):
 - Use dma_resv_usage_rw to get the properly inverted usage to pass to
   dma_resv_get_singleton() for export
 - Clean up the sync_file and fd if copy_to_user() fails
 - Fix -EINVAL checks for the flags parameter in import
 - Add documentation about read/write fences for import
 - Add documentation about the expected usage of import/export and
   specifically call out the possible userspace race.

v15 (Simon Ser):
 - Fix a typo in the docs
 - Collect RB tags

Jason Ekstrand (2):
  dma-buf: Add an API for exporting sync files (v14)
  dma-buf: Add an API for importing sync files (v10)

 drivers/dma-buf/dma-buf.c| 106 +++
 include/uapi/linux/dma-buf.h |  84 +++
 2 files changed, 190 insertions(+)

-- 
2.36.1



Re: [PATCH v4] dma-buf: Add a capabilities directory

2022-06-08 Thread Jason Ekstrand
On Tue, 2022-06-07 at 12:55 +0200, Greg KH wrote:
> On Thu, Jun 02, 2022 at 08:47:56AM +0200, Daniel Vetter wrote:
> > On Thu, 2 Jun 2022 at 08:34, Simon Ser  wrote:
> > > 
> > > On Thursday, June 2nd, 2022 at 08:25, Greg KH 
> > > wrote:
> > > 
> > > > On Thu, Jun 02, 2022 at 06:17:31AM +, Simon Ser wrote:
> > > > 
> > > > > On Thursday, June 2nd, 2022 at 07:40, Greg KH
> > > > > g...@kroah.com wrote:
> > > > > 
> > > > > > On Wed, Jun 01, 2022 at 04:13:14PM +, Simon Ser wrote:
> > > > > > 
> > > > > > > To discover support for new DMA-BUF IOCTLs, user-space
> > > > > > > has no
> > > > > > > choice but to try to perform the IOCTL on an existing
> > > > > > > DMA-BUF.
> > > > > > 
> > > > > > Which is correct and how all kernel features work (sorry I
> > > > > > missed the
> > > > > > main goal of this patch earlier and focused only on the
> > > > > > sysfs stuff).
> > > > > > 
> > > > > > > However, user-space may want to figure out whether or not
> > > > > > > the
> > > > > > > IOCTL is available before it has a DMA-BUF at hand, e.g.
> > > > > > > at
> > > > > > > initialization time in a Wayland compositor.
> > > > > > 
> > > > > > Why not just do the ioctl in a test way? That's how we
> > > > > > determine kernel
> > > > > > features, we do not poke around in sysfs to determine what
> > > > > > is, or is
> > > > > > not, present at runtime.
> > > > > > 
> > > > > > > Add a /sys/kernel/dmabuf/caps directory which allows the
> > > > > > > DMA-BUF
> > > > > > > subsystem to advertise supported features. Add a
> > > > > > > sync_file_import_export entry which indicates that
> > > > > > > importing and
> > > > > > > exporting sync_files from/to DMA-BUFs is supported.
> > > > > > 
> > > > > > No, sorry, this is not a sustainable thing to do for all
> > > > > > kernel features
> > > > > > over time. Please just do the ioctl and go from there.
> > > > > > sysfs is not
> > > > > > for advertising what is and is not enabled/present in a
> > > > > > kernel with
> > > > > > regards to functionality or capabilities of the system.
> > > > > > 
> > > > > > If sysfs were to export this type of thing, it would have
> > > > > > to do it for
> > > > > > everything, not just some random tiny thing of one kernel
> > > > > > driver.
> > > > > 
> > > > > I'd argue that DMA-BUF is a special case here.
> > > > 
> > > > So this is special and unique just like everything else? :)
> > > > 
> > > > > To check whether the import/export IOCTLs are available,
> > > > > user-space
> > > > > needs a DMA-BUF to try to perform the IOCTL. To get a DMA-
> > > > > BUF,
> > > > > user-space needs to enumerate GPUs, pick one at random, load
> > > > > GBM or
> > > > > Vulkan, use that heavy-weight API to allocate a "fake" buffer
> > > > > on the
> > > > > GPU, export that buffer into a DMA-BUF, try the IOCTL, then
> > > > > teardown
> > > > > all of this. There is no other way.
> > > > > 
> > > > > This sounds like a roundabout way to answer the simple
> > > > > question "is the
> > > > > IOCTL available?". Do you have another suggestion to address
> > > > > this
> > > > > problem?
> > > > 
> > > > What does userspace do differently if the ioctl is present or
> > > > not?
> > > 
> > > Globally enable a synchronization API for Wayland clients, for
> > > instance
> > > in the case of a Wayland compositor.
> > > 
> > > > And why is this somehow more special than of the tens of
> > > > thousands of
> > > > other ioctl calls where you have to do exactly the same thing
> > > > you list
> > > > above to determine if it is present or not?
> > > 
> > > For other IOCTLs it's not as complicated to obtain a FD to do the
> > > test
> > > with.
> > 
> > To expand on this:
> > 
> > - compositor opens the drm render /dev node
> > - compositor initializes the opengl or vulkan userspace driver on
> > top of that
> > - compositor asks that userspace driver to allocate some buffer,
> > which
> > can be pretty expensive
> > - compositor asks the userspace driver to export that buffer into a
> > dma-buf
> > - compositor can finally do the test ioctl, realizes support isn't
> > there and tosses the entire thing
> > 
> > read() on a sysfs file is so much more reasonable it's not even
> > funny.
> 
> I agree it seems trivial and "simple", but that is NOT how to
> determine
> what is, and is not, a valid ioctl command for a device node.
> 
> The only sane way to do this is like we have been doing for the past
> 30+
> years, make the ioctl and look at the return value.
> 
> Now if we want to come up with a new generic "here's the
> capabilities/ioctls/whatever" that the kernel currently supports at
> this
> point in time api, wonderful, but PLEASE do not overload sysfs to do
> something like this as that is not what it is for at this moment in
> time.
> 
> Don't just do this for one specific ioctl as there really is nothing
> special about it at all ("it's special and unique just like all other
> ioctls...")
> 
> > Plan B we discussed is to add a getparam to 

Re: [Intel-gfx] [RFC v3 1/3] drm/doc/rfc: VM_BIND feature design document

2022-06-07 Thread Jason Ekstrand
On Fri, Jun 3, 2022 at 6:52 PM Niranjana Vishwanathapura <
niranjana.vishwanathap...@intel.com> wrote:

> On Fri, Jun 03, 2022 at 10:20:25AM +0300, Lionel Landwerlin wrote:
> >   On 02/06/2022 23:35, Jason Ekstrand wrote:
> >
> > On Thu, Jun 2, 2022 at 3:11 PM Niranjana Vishwanathapura
> >  wrote:
> >
> >   On Wed, Jun 01, 2022 at 01:28:36PM -0700, Matthew Brost wrote:
> >   >On Wed, Jun 01, 2022 at 05:25:49PM +0300, Lionel Landwerlin wrote:
> >   >> On 17/05/2022 21:32, Niranjana Vishwanathapura wrote:
> >   >> > +VM_BIND/UNBIND ioctl will immediately start binding/unbinding
> >   the mapping in an
> >   >> > +async worker. The binding and unbinding will work like a
> special
> >   GPU engine.
> >   >> > +The binding and unbinding operations are serialized and will
> >   wait on specified
> >   >> > +input fences before the operation and will signal the output
> >   fences upon the
> >   >> > +completion of the operation. Due to serialization,
> completion of
> >   an operation
> >   >> > +will also indicate that all previous operations are also
> >   complete.
> >   >>
> >   >> I guess we should avoid saying "will immediately start
> >   binding/unbinding" if
> >   >> there are fences involved.
> >   >>
> >   >> And the fact that it's happening in an async worker seem to
> imply
> >   it's not
> >   >> immediate.
> >   >>
> >
> >   Ok, will fix.
> >   This was added because in earlier design binding was deferred until
> >   next execbuff.
> >   But now it is non-deferred (immediate in that sense). But yah,
> this is
> >   confusing
> >   and will fix it.
> >
> >   >>
> >   >> I have a question on the behavior of the bind operation when no
> >   input fence
> >   >> is provided. Let say I do :
> >   >>
> >   >> VM_BIND (out_fence=fence1)
> >   >>
> >   >> VM_BIND (out_fence=fence2)
> >   >>
> >   >> VM_BIND (out_fence=fence3)
> >   >>
> >   >>
> >   >> In what order are the fences going to be signaled?
> >   >>
> >   >> In the order of VM_BIND ioctls? Or out of order?
> >   >>
> >   >> Because you wrote "serialized I assume it's : in order
> >   >>
> >
> >   Yes, in the order of VM_BIND/UNBIND ioctls. Note that bind and
> unbind
> >   will use
> >   the same queue and hence are ordered.
> >
> >   >>
> >   >> One thing I didn't realize is that because we only get one
> >   "VM_BIND" engine,
> >   >> there is a disconnect from the Vulkan specification.
> >   >>
> >   >> In Vulkan VM_BIND operations are serialized but per engine.
> >   >>
> >   >> So you could have something like this :
> >   >>
> >   >> VM_BIND (engine=rcs0, in_fence=fence1, out_fence=fence2)
> >   >>
> >   >> VM_BIND (engine=ccs0, in_fence=fence3, out_fence=fence4)
> >   >>
> >   >>
> >   >> fence1 is not signaled
> >   >>
> >   >> fence3 is signaled
> >   >>
> >   >> So the second VM_BIND will proceed before the first VM_BIND.
> >   >>
> >   >>
> >   >> I guess we can deal with that scenario in userspace by doing the
> >   wait
> >   >> ourselves in one thread per engines.
> >   >>
> >   >> But then it makes the VM_BIND input fences useless.
> >   >>
> >   >>
> >   >> Daniel : what do you think? Should be rework this or just deal
> with
> >   wait
> >   >> fences in userspace?
> >   >>
> >   >
> >   >My opinion is rework this but make the ordering via an engine
> param
> >   optional.
> >   >
> >   >e.g. A VM can be configured so all binds are ordered within the VM
> >   >
> >   >e.g. A VM can be configured so all binds accept an engine argument
> >   (in
> >   >the case of the i915 likely this is a gem context h

Re: [Intel-gfx] [RFC v3 1/3] drm/doc/rfc: VM_BIND feature design document

2022-06-02 Thread Jason Ekstrand
On Thu, Jun 2, 2022 at 3:11 PM Niranjana Vishwanathapura <
niranjana.vishwanathap...@intel.com> wrote:

> On Wed, Jun 01, 2022 at 01:28:36PM -0700, Matthew Brost wrote:
> >On Wed, Jun 01, 2022 at 05:25:49PM +0300, Lionel Landwerlin wrote:
> >> On 17/05/2022 21:32, Niranjana Vishwanathapura wrote:
> >> > +VM_BIND/UNBIND ioctl will immediately start binding/unbinding the
> mapping in an
> >> > +async worker. The binding and unbinding will work like a special GPU
> engine.
> >> > +The binding and unbinding operations are serialized and will wait on
> specified
> >> > +input fences before the operation and will signal the output fences
> upon the
> >> > +completion of the operation. Due to serialization, completion of an
> operation
> >> > +will also indicate that all previous operations are also complete.
> >>
> >> I guess we should avoid saying "will immediately start
> binding/unbinding" if
> >> there are fences involved.
> >>
> >> And the fact that it's happening in an async worker seem to imply it's
> not
> >> immediate.
> >>
>
> Ok, will fix.
> This was added because in earlier design binding was deferred until next
> execbuff.
> But now it is non-deferred (immediate in that sense). But yah, this is
> confusing
> and will fix it.
>
> >>
> >> I have a question on the behavior of the bind operation when no input
> fence
> >> is provided. Let say I do :
> >>
> >> VM_BIND (out_fence=fence1)
> >>
> >> VM_BIND (out_fence=fence2)
> >>
> >> VM_BIND (out_fence=fence3)
> >>
> >>
> >> In what order are the fences going to be signaled?
> >>
> >> In the order of VM_BIND ioctls? Or out of order?
> >>
> >> Because you wrote "serialized I assume it's : in order
> >>
>
> Yes, in the order of VM_BIND/UNBIND ioctls. Note that bind and unbind will
> use
> the same queue and hence are ordered.
>
> >>
> >> One thing I didn't realize is that because we only get one "VM_BIND"
> engine,
> >> there is a disconnect from the Vulkan specification.
> >>
> >> In Vulkan VM_BIND operations are serialized but per engine.
> >>
> >> So you could have something like this :
> >>
> >> VM_BIND (engine=rcs0, in_fence=fence1, out_fence=fence2)
> >>
> >> VM_BIND (engine=ccs0, in_fence=fence3, out_fence=fence4)
> >>
> >>
> >> fence1 is not signaled
> >>
> >> fence3 is signaled
> >>
> >> So the second VM_BIND will proceed before the first VM_BIND.
> >>
> >>
> >> I guess we can deal with that scenario in userspace by doing the wait
> >> ourselves in one thread per engines.
> >>
> >> But then it makes the VM_BIND input fences useless.
> >>
> >>
> >> Daniel : what do you think? Should we rework this or just deal with wait
> >> fences in userspace?
> >>
> >
> >My opinion is rework this but make the ordering via an engine param
> optional.
> >
> >e.g. A VM can be configured so all binds are ordered within the VM
> >
> >e.g. A VM can be configured so all binds accept an engine argument (in
> >the case of the i915 likely this is a gem context handle) and binds
> >ordered with respect to that engine.
> >
> >This gives UMDs options as the later likely consumes more KMD resources
> >so if a different UMD can live with binds being ordered within the VM
> >they can use a mode consuming less resources.
> >
>
> I think we need to be careful here if we are looking for some out of
> (submission) order completion of vm_bind/unbind.
> In-order completion means, in a batch of binds and unbinds to be
> completed in-order, user only needs to specify in-fence for the
> first bind/unbind call and the our-fence for the last bind/unbind
> call. Also, the VA released by an unbind call can be re-used by
> any subsequent bind call in that in-order batch.
>
> These things will break if binding/unbinding were to be allowed to
> go out of order (of submission) and user need to be extra careful
> not to run into pre-mature triggereing of out-fence and bind failing
> as VA is still in use etc.
>
> Also, VM_BIND binds the provided mapping on the specified address space
> (VM). So, the uapi is not engine/context specific.
>
> We can however add a 'queue' to the uapi which can be one from the
> pre-defined queues,
> I915_VM_BIND_QUEUE_0
> I915_VM_BIND_QUEUE_1
> ...
> I915_VM_BIND_QUEUE_(N-1)
>
> KMD will spawn an async work queue for each queue which will only
> bind the mappings on that queue in the order of submission.
> User can assign the queue to per engine or anything like that.
>
> But again here, user need to be careful and not deadlock these
> queues with circular dependency of fences.
>
> I prefer adding this later an as extension based on whether it
> is really helping with the implementation.
>

I can tell you right now that having everything on a single in-order queue
will not get us the perf we want.  What Vulkan really wants is one of two
things:

 1. No implicit ordering of VM_BIND ops.  They just happen in whatever
order their dependencies are resolved, and we ensure ordering ourselves by
having a syncobj in the VkQueue.

 2. The ability to create multiple VM_BIND queues.  
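
For reference, this mirrors how Vulkan itself expresses sparse binding:
each bind is a queue operation ordered only by the semaphores the
application supplies (a sketch; wait_sem, signal_sem and buf_bind are
assumed to be set up elsewhere):

    VkBindSparseInfo bind = {
        .sType = VK_STRUCTURE_TYPE_BIND_SPARSE_INFO,
        .waitSemaphoreCount = 1,
        .pWaitSemaphores = &wait_sem,
        .bufferBindCount = 1,
        .pBufferBinds = &buf_bind,
        .signalSemaphoreCount = 1,
        .pSignalSemaphores = &signal_sem,
    };
    /* Binds on different queues may complete in any order. */
    vkQueueBindSparse(queue, 1, &bind, VK_NULL_HANDLE);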

Re: [PATCH v4] dma-buf: Add a capabilities directory

2022-06-02 Thread Jason Ekstrand
v4 looks good to me as well.

--Jason


On Wed, 2022-06-01 at 16:13 +, Simon Ser wrote:
> To discover support for new DMA-BUF IOCTLs, user-space has no
> choice but to try to perform the IOCTL on an existing DMA-BUF.
> However, user-space may want to figure out whether or not the
> IOCTL is available before it has a DMA-BUF at hand, e.g. at
> initialization time in a Wayland compositor.
> 
> Add a /sys/kernel/dmabuf/caps directory which allows the DMA-BUF
> subsystem to advertise supported features. Add a
> sync_file_import_export entry which indicates that importing and
> exporting sync_files from/to DMA-BUFs is supported.
> 
> v2: Add missing files lost in a rebase
> 
> v3:
> - Create separate file in Documentation/ABI/testing/, add it to
>   MAINTAINERS
> - Fix kernel version (Daniel)
> - Remove unnecessary brackets (Jason)
> - Fix SPDX comment style
> 
> v4: improve sysfs code (Greg)
> 
> Signed-off-by: Simon Ser 
> Reviewed-by: Jason Ekstrand 
> Cc: Daniel Vetter 
> Cc: Bas Nieuwenhuizen 
> Cc: Christian König 
> Cc: Greg KH 
> ---
>  .../ABI/testing/sysfs-kernel-dmabuf-caps  | 13 ++
>  MAINTAINERS   |  1 +
>  drivers/dma-buf/Makefile  |  2 +-
>  drivers/dma-buf/dma-buf-sysfs-caps.c  | 31 +
>  drivers/dma-buf/dma-buf-sysfs-caps.h  | 15 +++
>  drivers/dma-buf/dma-buf-sysfs-stats.c | 16 ++-
>  drivers/dma-buf/dma-buf-sysfs-stats.h |  6 ++-
>  drivers/dma-buf/dma-buf.c | 45
> +--
>  include/uapi/linux/dma-buf.h  |  6 +++
>  9 files changed, 115 insertions(+), 20 deletions(-)
>  create mode 100644 Documentation/ABI/testing/sysfs-kernel-dmabuf-
> caps
>  create mode 100644 drivers/dma-buf/dma-buf-sysfs-caps.c
>  create mode 100644 drivers/dma-buf/dma-buf-sysfs-caps.h
> 
> diff --git a/Documentation/ABI/testing/sysfs-kernel-dmabuf-caps
> b/Documentation/ABI/testing/sysfs-kernel-dmabuf-caps
> new file mode 100644
> index ..f83af422fd18
> --- /dev/null
> +++ b/Documentation/ABI/testing/sysfs-kernel-dmabuf-caps
> @@ -0,0 +1,13 @@
> +What:  /sys/kernel/dmabuf/caps
> +Date:  May 2022
> +KernelVersion: v5.20
> +Contact:   Simon Ser 
> +Description:   This directory advertises DMA-BUF capabilities
> supported by the
> +   kernel.
> +
> +What:  /sys/kernel/dmabuf/caps/sync_file_import_export
> +Date:  May 2022
> +KernelVersion: v5.20
> +Contact:   Simon Ser 
> +Description:   This file is read-only and advertises support for
> importing and
> +   exporting sync_files from/to DMA-BUFs.
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 11da16bfa123..8966686f7231 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -5871,6 +5871,7 @@ L:dri-devel@lists.freedesktop.org
>  L: linaro-mm-...@lists.linaro.org (moderated for non-
> subscribers)
>  S: Maintained
>  T: git git://anongit.freedesktop.org/drm/drm-misc
> +F: Documentation/ABI/testing/sysfs-kernel-dmabuf-caps
>  F: Documentation/driver-api/dma-buf.rst
>  F: drivers/dma-buf/
>  F: include/linux/*fence.h
> diff --git a/drivers/dma-buf/Makefile b/drivers/dma-buf/Makefile
> index 4c9eb53ba3f8..afc874272710 100644
> --- a/drivers/dma-buf/Makefile
> +++ b/drivers/dma-buf/Makefile
> @@ -1,6 +1,6 @@
>  # SPDX-License-Identifier: GPL-2.0-only
>  obj-y := dma-buf.o dma-fence.o dma-fence-array.o dma-fence-chain.o \
> -    dma-resv.o
> +    dma-resv.o dma-buf-sysfs-caps.o
>  obj-$(CONFIG_DMABUF_HEAPS) += dma-heap.o
>  obj-$(CONFIG_DMABUF_HEAPS) += heaps/
>  obj-$(CONFIG_SYNC_FILE)+= sync_file.o
> diff --git a/drivers/dma-buf/dma-buf-sysfs-caps.c b/drivers/dma-
> buf/dma-buf-sysfs-caps.c
> new file mode 100644
> index ..6eb27b81469f
> --- /dev/null
> +++ b/drivers/dma-buf/dma-buf-sysfs-caps.c
> @@ -0,0 +1,31 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * DMA-BUF sysfs capabilities.
> + *
> + * Copyright (C) 2022 Simon Ser
> + */
> +
> +#include 
> +#include 
> +
> +#include "dma-buf-sysfs-caps.h"
> +
> +static ssize_t sync_file_import_export_show(struct kobject *kobj,
> +   struct kobj_attribute *attr,
> +   char *buf)
> +{
> +   return sysfs_emit(buf, "1\n");
> +}
> +
> +static struct kobj_attribute dma_buf_sync_file_import_export_attr =
> +   __ATTR_RO(sync_file_import_export);
> +
> +static struct attribute *dma_buf_caps_attrs[] = {
> +   &dma_buf_sync_file_import_export

Re: [PATCH v3] dma-buf: Add a capabilities directory

2022-05-31 Thread Jason Ekstrand
On Mon, 2022-05-30 at 10:26 +0200, Greg KH wrote:
> On Mon, May 30, 2022 at 08:15:04AM +, Simon Ser wrote:
> > On Monday, May 30th, 2022 at 09:20, Greg KH
> >  wrote:
> > 
> > > > > +static struct attribute *dma_buf_caps_attrs[] = {
> > > > > +   &dma_buf_sync_file_import_export_attr.attr,
> > > > > +   NULL,
> > > > > +};
> > > > > +
> > > > > +static const struct attribute_group dma_buf_caps_attr_group = {
> > > > > +   .attrs = dma_buf_caps_attrs,
> > > > > +};
> > > > 
> > > > Didn't we had macros for those? I think I have seen something
> > > > for that.
> > > 
> > > Yes, please use ATTRIBUTE_GROUPS()
> > 
> > This doesn't allow the user to set a group name, and creates an
> > unused
> > "_groups" variable, causing warnings.
> 
> Then set a group name.
> 
> But you really want to almost always be using lists of groups, which
> is
> why that macro works that way.

I think I see the confusion here.  The ATTRIBUTE_GROUPS() macro is
intended for device drivers and to be used with device_add().  However,
this is dma-buf so there is no device and no device_add() call to hook.
Unless there are other magic macros to use in this case, I think we're
stuck doing it manually.
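
For illustration, the named-group alternative under discussion would
look roughly like this (a sketch, not the actual patch):

    /* A named attribute_group creates the "caps" subdirectory itself,
     * so no separate kobject_create_and_add() is needed. */
    static const struct attribute_group dma_buf_caps_attr_group = {
            .name  = "caps",
            .attrs = dma_buf_caps_attrs,
    };

    ret = sysfs_create_group(&kset->kobj, &dma_buf_caps_attr_group);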

--Jason


> > 
> > > > > +
> > > > > +static struct kobject *dma_buf_caps_kobj;
> > > > > +
> > > > > +int dma_buf_init_sysfs_capabilities(struct kset *kset)
> > > > > +{
> > > > > +   int ret;
> > > > > +
> > > > > +   dma_buf_caps_kobj = kobject_create_and_add("caps",
> > > > > &kset->kobj);
> > > > > +   if (!dma_buf_caps_kobj)
> > > > > +   return -ENOMEM;
> > > > > +
> > > > > +   ret = sysfs_create_group(dma_buf_caps_kobj,
> > > > > &dma_buf_caps_attr_group);
> > > 
> > > Why do we have "raw" kobjects here?
> > > 
> > > A group can have a name, which puts it in the subdirectory of the
> > > object
> > > it is attached to.  Please do that and do not create a new
> > > kobject.
> > 
> > I see, I'll switch to sysfs_create_group with a group name in the
> > next version.
> 
> No, do not do that, add it to the list of groups for the original
> kobject.
> 
> thanks,
> 
> greg k-h



Re: [RFC PATCH v2] dma-buf: Add a capabilities directory

2022-05-26 Thread Jason Ekstrand
On Thu, May 26, 2022 at 12:40 PM Simon Ser  wrote:

> To discover support for new DMA-BUF IOCTLs, user-space has no
> choice but to try to perform the IOCTL on an existing DMA-BUF.
> However, user-space may want to figure out whether or not the
> IOCTL is available before it has a DMA-BUF at hand, e.g. at
> initialization time in a Wayland compositor.
>
> Add a /sys/kernel/dmabuf/caps directory which allows the DMA-BUF
> subsystem to advertise supported features. Add a
> sync_file_import_export entry which indicates that importing and
> exporting sync_files from/to DMA-BUFs is supported.
>
> Signed-off-by: Simon Ser 
> Cc: Jason Ekstrand 
> Cc: Daniel Vetter 
> Cc: Bas Nieuwenhuizen 
> Cc: Christian König 
> ---
>
> Oops, I forgot to check in new files after splitting a commit.
> Fixed.
>
> This depends on:
> https://patchwork.freedesktop.org/series/103715/
>
>  .../ABI/testing/sysfs-kernel-dmabuf-buffers   | 14 +
>  drivers/dma-buf/Makefile  |  2 +-
>  drivers/dma-buf/dma-buf-sysfs-caps.c  | 51 +++
>  drivers/dma-buf/dma-buf-sysfs-caps.h  | 16 ++
>  drivers/dma-buf/dma-buf-sysfs-stats.c | 13 +
>  drivers/dma-buf/dma-buf-sysfs-stats.h |  6 ++-
>  drivers/dma-buf/dma-buf.c | 43 ++--
>  include/uapi/linux/dma-buf.h  |  6 +++
>  8 files changed, 133 insertions(+), 18 deletions(-)
>  create mode 100644 drivers/dma-buf/dma-buf-sysfs-caps.c
>  create mode 100644 drivers/dma-buf/dma-buf-sysfs-caps.h
>
> diff --git a/Documentation/ABI/testing/sysfs-kernel-dmabuf-buffers
> b/Documentation/ABI/testing/sysfs-kernel-dmabuf-buffers
> index 5d3bc997dc64..682d313689d8 100644
> --- a/Documentation/ABI/testing/sysfs-kernel-dmabuf-buffers
> +++ b/Documentation/ABI/testing/sysfs-kernel-dmabuf-buffers
> @@ -22,3 +22,17 @@ KernelVersion:   v5.13
>  Contact:   Hridya Valsaraju 
>  Description:   This file is read-only and specifies the size of the
> DMA-BUF in
> bytes.
> +
> +What:  /sys/kernel/dmabuf/caps
> +Date:  May 2022
> +KernelVersion: v5.19
> +Contact:   Simon Ser 
> +Description:   This directory advertises DMA-BUF capabilities supported
> by the
> +   kernel.
> +
> +What:  /sys/kernel/dmabuf/caps/sync_file_import_export
> +Date:  May 2022
> +KernelVersion: v5.19
> +Contact:   Simon Ser 
> +Description:   This file is read-only and advertises support for
> importing and
> +   exporting sync_files from/to DMA-BUFs.
> diff --git a/drivers/dma-buf/Makefile b/drivers/dma-buf/Makefile
> index 4c9eb53ba3f8..afc874272710 100644
> --- a/drivers/dma-buf/Makefile
> +++ b/drivers/dma-buf/Makefile
> @@ -1,6 +1,6 @@
>  # SPDX-License-Identifier: GPL-2.0-only
>  obj-y := dma-buf.o dma-fence.o dma-fence-array.o dma-fence-chain.o \
> -dma-resv.o
> +dma-resv.o dma-buf-sysfs-caps.o
>  obj-$(CONFIG_DMABUF_HEAPS) += dma-heap.o
>  obj-$(CONFIG_DMABUF_HEAPS) += heaps/
>  obj-$(CONFIG_SYNC_FILE)+= sync_file.o
> diff --git a/drivers/dma-buf/dma-buf-sysfs-caps.c
> b/drivers/dma-buf/dma-buf-sysfs-caps.c
> new file mode 100644
> index ..c760e55353bc
> --- /dev/null
> +++ b/drivers/dma-buf/dma-buf-sysfs-caps.c
> @@ -0,0 +1,51 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * DMA-BUF sysfs capabilities.
> + *
> + * Copyright (C) 2022 Simon Ser
> + */
> +
> +#include 
> +#include 
> +
> +#include "dma-buf-sysfs-caps.h"
> +
> +static ssize_t sync_file_import_export_show(struct kobject *kobj,
> +   struct kobj_attribute *attr,
> +   char *buf)
> +{
> +   return sysfs_emit(buf, "1\n");
> +}
> +
> +static struct kobj_attribute dma_buf_sync_file_import_export_attr =
> +   __ATTR_RO(sync_file_import_export);
> +
> +static struct attribute *dma_buf_caps_attrs[] = {
> +   &dma_buf_sync_file_import_export_attr.attr,
> +   NULL,
> +};
> +
> +static const struct attribute_group dma_buf_caps_attr_group = {
> +   .attrs = dma_buf_caps_attrs,
> +};
> +
> +static struct kobject *dma_buf_caps_kobj;
> +
> +int dma_buf_init_sysfs_capabilities(struct kset *kset)
> +{
> +   int ret;
> +
> +   dma_buf_caps_kobj = kobject_create_and_add("caps", &kset->kobj);
> +   if (!dma_buf_caps_kobj)
> +   return -ENOMEM;
> +
> +   ret = sysfs_create_group(dma_buf_caps_kobj, &dma_buf_caps_attr_group);
> +   if (ret)
> +   kobject_put(dma_buf_caps_kobj);
> +   return ret;
> +}

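As a usage note: a minimal userspace sketch for probing the new capability
file before relying on the sync_file ioctls. The sysfs path is the one from
the patch above; the helper name and everything else here is illustrative:

  #include <stdio.h>
  #include <string.h>

  /* Returns 1 if the kernel advertises dma-buf sync_file import/export. */
  static int dmabuf_has_sync_file_import_export(void)
  {
          char buf[4] = "";
          FILE *f = fopen("/sys/kernel/dmabuf/caps/sync_file_import_export", "r");

          if (!f)
                  return 0; /* older kernel: caps directory or file absent */
          if (!fgets(buf, sizeof(buf), f))
                  buf[0] = '\0';
          fclose(f);
          return strncmp(buf, "1", 1) == 0;
  }
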
Re: [PATCH 0/2] dma-buf: Add an API for exporting sync files (v14)

2022-05-26 Thread Jason Ekstrand
On Wed, May 25, 2022 at 5:02 AM Daniel Stone  wrote:

> On Sat, 7 May 2022 at 14:18, Jason Ekstrand  wrote:
> > This patch series actually contains two new ioctls.  There is the export
> one
> > mentioned above as well as an RFC for an import ioctl which provides the
> other
> > half.  The intention is to land the export ioctl since it seems like
> there's
> > no real disagreement on that one.  The import ioctl, however, has a lot
> of
> > debate around it so it's intended to be RFC-only for now.
>
> Errr, I think we're good with this one now right?
>

Yeah, I dropped the RFC from the patch, just not the description in the
cover letter, apparently.


> From the uAPI point of view, having looked through the Mesa MR, both are:
> Acked-by: Daniel Stone 
>

For reference:
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4037

Yes, I agree it's gotten sufficient review at this point that I think we
can call the uAPI reviewed.  I'm good with landing now.  Sorry that took so
long but the original version I had only used half of the new API and I
wanted to make sure both halves got good testing.

--Jason



> Cheers,
> Daniel
>


Re: [PATCH 1/2] dma-buf: Add an API for exporting sync files (v14)

2022-05-26 Thread Jason Ekstrand
On Wed, May 25, 2022 at 8:20 AM Daniel Vetter  wrote:

> On Mon, May 09, 2022 at 07:54:19AM +0200, Christian König wrote:
> > Reviewed-by: Christian König  for the series.
> >
> > I assume you have the userspace part ready as well? If yes let's push
> this
> > to drm-misc-next asap.
>
> Hopefully I'm not too late, but I think all my review has also been
> addressed. On the series:
>
> Reviewed-by: Daniel Vetter 
>

Thanks!  If Christian hasn't already, can we get this in drm-misc-next
please?  I don't have access AFAIK.

--Jason



> >
> > Christian.
> >
> > On 06.05.22 at 20:02, Jason Ekstrand wrote:
> > > Modern userspace APIs like Vulkan are built on an explicit
> > > synchronization model.  This doesn't always play nicely with the
> > > implicit synchronization used in the kernel and assumed by X11 and
> > > Wayland.  The client -> compositor half of the synchronization isn't
> too
> > > bad, at least on intel, because we can control whether or not i915
> > > synchronizes on the buffer and whether or not it's considered written.
> > >
> > > The harder part is the compositor -> client synchronization when we get
> > > the buffer back from the compositor.  We're required to be able to
> > > provide the client with a VkSemaphore and VkFence representing the
> point
> > > in time where the window system (compositor and/or display) finished
> > > using the buffer.  With current APIs, it's very hard to do this in such
> > > a way that we don't get confused by the Vulkan driver's access of the
> > > buffer.  In particular, once we tell the kernel that we're rendering to
> > > the buffer again, any CPU waits on the buffer or GPU dependencies will
> > > wait on some of the client rendering and not just the compositor.
> > >
> > > This new IOCTL solves this problem by allowing us to get a snapshot of
> > > the implicit synchronization state of a given dma-buf in the form of a
> > > sync file.  It's effectively the same as a poll() or I915_GEM_WAIT
> only,
> > > instead of CPU waiting directly, it encapsulates the wait operation, at
> > > the current moment in time, in a sync_file so we can check/wait on it
> > > later.  As long as the Vulkan driver does the sync_file export from the
> > > dma-buf before we re-introduce it for rendering, it will only contain
> > > fences from the compositor or display.  This allows us to accurately turn
> > > it into a VkFence or VkSemaphore without any over-synchronization.
> > >
> > > By making this an ioctl on the dma-buf itself, it allows this new
> > > functionality to be used in an entirely driver-agnostic way without
> > > having access to a DRM fd. This makes it ideal for use in
> driver-generic
> > > code in Mesa or in a client such as a compositor where the DRM fd may
> be
> > > hard to reach.
> > >
> > > v2 (Jason Ekstrand):
> > >   - Use a wrapper dma_fence_array of all fences including the new one
> > > when importing an exclusive fence.
> > >
> > > v3 (Jason Ekstrand):
> > >   - Lock around setting shared fences as well as exclusive
> > >   - Mark SIGNAL_SYNC_FILE as a read-write ioctl.
> > >   - Initialize ret to 0 in dma_buf_wait_sync_file
> > >
> > > v4 (Jason Ekstrand):
> > >   - Use the new dma_resv_get_singleton helper
> > >
> > > v5 (Jason Ekstrand):
> > >   - Rename the IOCTLs to import/export rather than wait/signal
> > >   - Drop the WRITE flag and always get/set the exclusive fence
> > >
> > > v6 (Jason Ekstrand):
> > >   - Drop the sync_file import as it was all-around sketchy and not
> nearly
> > > as useful as export.
> > >   - Re-introduce READ/WRITE flag support for export
> > >   - Rework the commit message
> > >
> > > v7 (Jason Ekstrand):
> > >   - Require at least one sync flag
> > >   - Fix a refcounting bug: dma_resv_get_excl() doesn't take a reference
> > >   - Use _rcu helpers since we're accessing the dma_resv read-only
> > >
> > > v8 (Jason Ekstrand):
> > >   - Return -ENOMEM if the sync_file_create fails
> > >   - Predicate support on IS_ENABLED(CONFIG_SYNC_FILE)
> > >
> > > v9 (Jason Ekstrand):
> > >   - Add documentation for the new ioctl
> > >
> > > v10 (Jason Ekstrand):
> > >   - Go back to dma_buf_sync_file as the ioctl struct name
> > >
> > > v11 (Daniel Vetter):
> > >   - Go back to dma_buf_export_sync_file as the ioctl struct name

Re: [PATCH 1/2] dma-buf: Add an API for exporting sync files (v14)

2022-05-26 Thread Jason Ekstrand
On Wed, May 25, 2022 at 10:43 AM Daniel Vetter  wrote:

> On Wed, May 25, 2022 at 10:35:47AM -0500, Jason Ekstrand wrote:
> > On Wed, May 25, 2022 at 8:20 AM Daniel Vetter  wrote:
> >
> > > On Mon, May 09, 2022 at 07:54:19AM +0200, Christian König wrote:
> > > > Reviewed-by: Christian König  for the
> series.
> > > >
> > > > I assume you have the userspace part ready as well? If yes let's push
> > > this
> > > > to drm-misc-next asap.
> > >
> > > Hopefully I'm not too late, but I think all my review has also been
> > > addressed. On the series:
> > >
> > > Reviewed-by: Daniel Vetter 
> > >
> >
> > Thanks!  If Christian hasn't already, can we get this in drm-misc-next
> > please?  I don't have access AFAIK.
>
> We need to fix this?
>

I don't do enough kernel dev to be worth giving access, I don't think.
It's infrequent enough that I'm going to have to ask someone else how to
use the tools to push stuff every time anyway.

--Jason



> -Daniel
> >
> > --Jason
> >
> >
> >
> > > >
> > > > Christian.
> > > >
> > > > On 06.05.22 at 20:02, Jason Ekstrand wrote:
> > > > > Modern userspace APIs like Vulkan are built on an explicit
> > > > > synchronization model.  This doesn't always play nicely with the
> > > > > implicit synchronization used in the kernel and assumed by X11 and
> > > > > Wayland.  The client -> compositor half of the synchronization
> isn't
> > > too
> > > > > bad, at least on intel, because we can control whether or not i915
> > > > > synchronizes on the buffer and whether or not it's considered
> written.
> > > > >
> > > > > The harder part is the compositor -> client synchronization when
> we get
> > > > > the buffer back from the compositor.  We're required to be able to
> > > > > provide the client with a VkSemaphore and VkFence representing the
> > > point
> > > > > in time where the window system (compositor and/or display)
> finished
> > > > > using the buffer.  With current APIs, it's very hard to do this in
> such
> > > > > a way that we don't get confused by the Vulkan driver's access of
> the
> > > > > buffer.  In particular, once we tell the kernel that we're
> rendering to
> > > > > the buffer again, any CPU waits on the buffer or GPU dependencies
> will
> > > > > wait on some of the client rendering and not just the compositor.
> > > > >
> > > > > This new IOCTL solves this problem by allowing us to get a
> snapshot of
> > > > > the implicit synchronization state of a given dma-buf in the form
> of a
> > > > > sync file.  It's effectively the same as a poll() or I915_GEM_WAIT
> > > only,
> > > > > instead of CPU waiting directly, it encapsulates the wait
> operation, at
> > > > > the current moment in time, in a sync_file so we can check/wait on
> it
> > > > > later.  As long as the Vulkan driver does the sync_file export
> from the
> > > > > dma-buf before we re-introduce it for rendering, it will only
> contain
> > > > > fences from the compositor or display.  This allows us to accurately
> turn
> > > > > it into a VkFence or VkSemaphore without any over-synchronization.
> > > > >
> > > > > By making this an ioctl on the dma-buf itself, it allows this new
> > > > > functionality to be used in an entirely driver-agnostic way without
> > > > > having access to a DRM fd. This makes it ideal for use in
> > > driver-generic
> > > > > code in Mesa or in a client such as a compositor where the DRM fd
> may
> > > be
> > > > > hard to reach.
> > > > >
> > > > > v2 (Jason Ekstrand):
> > > > >   - Use a wrapper dma_fence_array of all fences including the new
> one
> > > > > when importing an exclusive fence.
> > > > >
> > > > > v3 (Jason Ekstrand):
> > > > >   - Lock around setting shared fences as well as exclusive
> > > > >   - Mark SIGNAL_SYNC_FILE as a read-write ioctl.
> > > > >   - Initialize ret to 0 in dma_buf_wait_sync_file
> > > > >
> > > > > v4 (Jason Ekstrand):
> > > > >   - Use the new dma_resv_get_singleton helper
> > > > >
> > > > > v5 (Jason Ekstrand):

[PATCH 1/2] dma-buf: Add an API for exporting sync files (v14)

2022-05-07 Thread Jason Ekstrand
Modern userspace APIs like Vulkan are built on an explicit
synchronization model.  This doesn't always play nicely with the
implicit synchronization used in the kernel and assumed by X11 and
Wayland.  The client -> compositor half of the synchronization isn't too
bad, at least on intel, because we can control whether or not i915
synchronizes on the buffer and whether or not it's considered written.

The harder part is the compositor -> client synchronization when we get
the buffer back from the compositor.  We're required to be able to
provide the client with a VkSemaphore and VkFence representing the point
in time where the window system (compositor and/or display) finished
using the buffer.  With current APIs, it's very hard to do this in such
a way that we don't get confused by the Vulkan driver's access of the
buffer.  In particular, once we tell the kernel that we're rendering to
the buffer again, any CPU waits on the buffer or GPU dependencies will
wait on some of the client rendering and not just the compositor.

This new IOCTL solves this problem by allowing us to get a snapshot of
the implicit synchronization state of a given dma-buf in the form of a
sync file.  It's effectively the same as a poll() or I915_GEM_WAIT only,
instead of CPU waiting directly, it encapsulates the wait operation, at
the current moment in time, in a sync_file so we can check/wait on it
later.  As long as the Vulkan driver does the sync_file export from the
dma-buf before we re-introduce it for rendering, it will only contain
fences from the compositor or display.  This allows us to accurately turn
it into a VkFence or VkSemaphore without any over-synchronization.

By making this an ioctl on the dma-buf itself, it allows this new
functionality to be used in an entirely driver-agnostic way without
having access to a DRM fd. This makes it ideal for use in driver-generic
code in Mesa or in a client such as a compositor where the DRM fd may be
hard to reach.
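
To make the intended flow concrete, here is a sketch of the userspace side,
assuming the struct and ioctl names from this series' uapi header (the
helper name is mine, and error handling is trimmed):

  #include <sys/ioctl.h>
  #include <linux/dma-buf.h>

  /*
   * Snapshot the dma-buf's current implicit fences as a sync_file.
   * DMA_BUF_SYNC_READ asks for a fence covering pending writes, i.e.
   * "when is it safe to read this buffer".
   */
  static int dmabuf_export_sync_file(int dmabuf_fd)
  {
          struct dma_buf_export_sync_file arg = {
                  .flags = DMA_BUF_SYNC_READ,
                  .fd = -1,
          };

          if (ioctl(dmabuf_fd, DMA_BUF_IOCTL_EXPORT_SYNC_FILE, &arg) < 0)
                  return -1;
          return arg.fd; /* sync_file fd; close() it when done */
  }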

v2 (Jason Ekstrand):
 - Use a wrapper dma_fence_array of all fences including the new one
   when importing an exclusive fence.

v3 (Jason Ekstrand):
 - Lock around setting shared fences as well as exclusive
 - Mark SIGNAL_SYNC_FILE as a read-write ioctl.
 - Initialize ret to 0 in dma_buf_wait_sync_file

v4 (Jason Ekstrand):
 - Use the new dma_resv_get_singleton helper

v5 (Jason Ekstrand):
 - Rename the IOCTLs to import/export rather than wait/signal
 - Drop the WRITE flag and always get/set the exclusive fence

v6 (Jason Ekstrand):
 - Drop the sync_file import as it was all-around sketchy and not nearly
    as useful as export.
 - Re-introduce READ/WRITE flag support for export
 - Rework the commit message

v7 (Jason Ekstrand):
 - Require at least one sync flag
 - Fix a refcounting bug: dma_resv_get_excl() doesn't take a reference
 - Use _rcu helpers since we're accessing the dma_resv read-only

v8 (Jason Ekstrand):
 - Return -ENOMEM if the sync_file_create fails
 - Predicate support on IS_ENABLED(CONFIG_SYNC_FILE)

v9 (Jason Ekstrand):
 - Add documentation for the new ioctl

v10 (Jason Ekstrand):
 - Go back to dma_buf_sync_file as the ioctl struct name

v11 (Daniel Vetter):
 - Go back to dma_buf_export_sync_file as the ioctl struct name
 - Better kerneldoc describing what the read/write flags do

v12 (Christian König):
 - Document why we chose to make it an ioctl on dma-buf

v13 (Jason Ekstrand):
 - Rebase on Christian König's fence rework

v14 (Daniel Vetter & Christian König):
 - Use dma_resv_usage_rw to get the properly inverted usage to pass to
   dma_resv_get_singleton()
 - Clean up the sync_file and fd if copy_to_user() fails

Signed-off-by: Jason Ekstrand 
Signed-off-by: Jason Ekstrand 
Signed-off-by: Jason Ekstrand 
Acked-by: Simon Ser 
Acked-by: Christian König 
Reviewed-by: Daniel Vetter 
Cc: Sumit Semwal 
Cc: Maarten Lankhorst 
---
 drivers/dma-buf/dma-buf.c| 67 
 include/uapi/linux/dma-buf.h | 35 +++
 2 files changed, 102 insertions(+)

diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index 79795857be3e..6ff54f7e6119 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -20,6 +20,7 @@
 #include 
 #include 
 #include 
+#include <linux/sync_file.h>
 #include 
 #include 
 #include 
@@ -192,6 +193,9 @@ static loff_t dma_buf_llseek(struct file *file, loff_t offset, int whence)
  * Note that this only signals the completion of the respective fences, i.e. the
  * DMA transfers are complete. Cache flushing and any other necessary
  * preparations before CPU access can begin still need to happen.
+ *
+ * As an alternative to poll(), the set of fences on a DMA buffer can be
+ * exported as a &sync_file using &dma_buf_sync_file_export.
  */
 
 static void dma_buf_poll_cb(struct dma_fence *fence, struct dma_fence_cb *cb)
@@ -326,6 +330,64 @@ static long dma_buf_set_name(struct dma_buf *dmabuf, const char __user *buf)
return 0;
 }
 
+#if IS_ENABLED(CONFIG_SYNC_FILE)
+static long dma_buf_export_sync_file(struct dma_buf *dmabuf,
+				     void __user *user_data)

[PATCH 0/2] dma-buf: Add an API for exporting sync files (v14)

2022-05-07 Thread Jason Ekstrand
Modern userspace APIs like Vulkan are built on an explicit synchronization
model.  This doesn't always play nicely with the implicit synchronization used
in the kernel and assumed by X11 and Wayland.  The client -> compositor half
of the synchronization isn't too bad, at least on intel, because we can
control whether or not i915 synchronizes on the buffer and whether or not it's
considered written.

The harder part is the compositor -> client synchronization when we get the
buffer back from the compositor.  We're required to be able to provide the
client with a VkSemaphore and VkFence representing the point in time where the
window system (compositor and/or display) finished using the buffer.  With
current APIs, it's very hard to do this in such a way that we don't get
confused by the Vulkan driver's access of the buffer.  In particular, once we
tell the kernel that we're rendering to the buffer again, any CPU waits on the
buffer or GPU dependencies will wait on some of the client rendering and not
just the compositor.

This new IOCTL solves this problem by allowing us to get a snapshot of the
implicit synchronization state of a given dma-buf in the form of a sync file.
It's effectively the same as a poll() or I915_GEM_WAIT only, instead of CPU
waiting directly, it encapsulates the wait operation, at the current moment in
time, in a sync_file so we can check/wait on it later.  As long as the Vulkan
driver does the sync_file export from the dma-buf before we re-introduce it
for rendering, it will only contain fences from the compositor or display.
This allows us to accurately turn it into a VkFence or VkSemaphore without any
over-synchronization.
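
For illustration, once the export ioctl lands, the Vulkan side of the
compositor -> client flow above is roughly standard VK_KHR_external_fence_fd
usage (the helper name here is illustrative):

  #include <vulkan/vulkan.h>

  /* Wrap a sync_file exported from the dma-buf as a VkFence. */
  static VkResult fence_from_sync_file(VkDevice dev, VkFence fence, int sync_fd)
  {
          const VkImportFenceFdInfoKHR info = {
                  .sType = VK_STRUCTURE_TYPE_IMPORT_FENCE_FD_INFO_KHR,
                  .fence = fence,
                  .flags = VK_FENCE_IMPORT_TEMPORARY_BIT,
                  .handleType = VK_EXTERNAL_FENCE_HANDLE_TYPE_SYNC_FD_BIT,
                  .fd = sync_fd, /* driver takes ownership on success */
          };
          PFN_vkImportFenceFdKHR import = (PFN_vkImportFenceFdKHR)
                  vkGetDeviceProcAddr(dev, "vkImportFenceFdKHR");

          if (!import)
                  return VK_ERROR_EXTENSION_NOT_PRESENT;
          return import(dev, &info);
  }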

This patch series actually contains two new ioctls.  There is the export one
mentioned above as well as an RFC for an import ioctl which provides the other
half.  The intention is to land the export ioctl since it seems like there's
no real disagreement on that one.  The import ioctl, however, has a lot of
debate around it so it's intended to be RFC-only for now.

Mesa MR: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4037
IGT tests: https://patchwork.freedesktop.org/series/90490/

v10 (Jason Ekstrand, Daniel Vetter):
 - Add reviews/acks
 - Add a patch to rename _rcu to _unlocked
 - Split things better so import is clearly RFC status

v11 (Daniel Vetter):
 - Add more CCs to try and get maintainers
 - Add a patch to document DMA_BUF_IOCTL_SYNC
 - Generally better docs
 - Use separate structs for import/export (easier to document)
 - Fix an issue in the import patch

v12 (Daniel Vetter):
 - Better docs for DMA_BUF_IOCTL_SYNC

v12 (Christian König):
 - Drop the rename patch in favor of Christian's series
 - Add a comment to the commit message for the dma-buf sync_file export
   ioctl saying why we made it an ioctl on dma-buf

v13 (Jason Ekstrand):
 - Rebase on Christian König's fence rework

v14 (Daniel Vetter and Christian König):
 - Use dma_resv_usage_rw to get the properly inverted usage to pass to
   dma_resv_get_singleton() for export
 - Clean up the sync_file and fd if copy_to_user() fails
 - Fix -EINVAL checks for the flags parameter in import
 - Add documentation about read/write fences for import
 - Add documentation about the expected usage of import/export and
   specifically call out the possible userspace race.

Jason Ekstrand (2):
  dma-buf: Add an API for exporting sync files (v14)
  dma-buf: Add an API for importing sync files (v9)

 drivers/dma-buf/dma-buf.c| 106 +++
 include/uapi/linux/dma-buf.h |  84 +++
 2 files changed, 190 insertions(+)

-- 
2.36.0



[PATCH 2/2] dma-buf: Add an API for importing sync files (v9)

2022-05-07 Thread Jason Ekstrand
This patch is analogous to the previous sync file export patch in that
it allows you to import a sync_file into a dma-buf.  Unlike the previous
patch, however, this does add genuinely new functionality to dma-buf.
Without this, the only way to attach a sync_file to a dma-buf is to
submit a batch to your driver of choice which waits on the sync_file and
claims to write to the dma-buf.  Even if said batch is a no-op, a submit
is typically way more overhead than just attaching a fence.  A submit
may also imply extra synchronization with other work because it happens
on a hardware queue.

In the Vulkan world, this is useful for dealing with the out-fence from
vkQueuePresent.  Current Linux window-systems (X11, Wayland, etc.) all
rely on dma-buf implicit sync.  Since Vulkan is an explicit sync API, we
get a set of fences (VkSemaphores) in vkQueuePresent and have to stash
those as an exclusive (write) fence on the dma-buf.  We handle it in
Mesa today with the above mentioned dummy submit trick.  This ioctl
would allow us to set it directly without the dummy submit.

This may also open up possibilities for GPU drivers to move away from
implicit sync for their kernel driver uAPI and instead provide sync
files and rely on dma-buf import/export for communicating with other
implicit sync clients.

We make the explicit choice here to only allow setting RW fences which
translates to an exclusive fence on the dma_resv.  There's no use for
read-only fences for communicating with other implicit sync userspace
and any such attempts are likely to be racy at best.  When we go to
insert the RW fence, the actual fence we set as the new exclusive fence
is a combination of the sync_file provided by the user and all the other
fences on the dma_resv.  This ensures that the newly added exclusive
fence will never signal before the old one would have and ensures that
we don't break any dma_resv contracts.  We require userspace to specify
RW in the flags for symmetry with the export ioctl and in case we ever
want to support read fences in the future.

There is one downside here that's worth documenting:  If two clients
writing to the same dma-buf using this API race with each other, their
actions on the dma-buf may happen in parallel or in an undefined order.
Both with and without this API, the pattern is the same:  Collect all
the fences on dma-buf, submit work which depends on said fences, and
then set a new exclusive (write) fence on the dma-buf which depends on
said work.  The difference is that, when it's all handled by the GPU
driver's submit ioctl, the three operations happen atomically under the
dma_resv lock.  If two userspace submits race, one will happen before
the other.  You aren't guaranteed which but you are guaranteed that
they're strictly ordered.  If userspace manages the fences itself, then
these three operations happen separately and the two render operations
may happen genuinely in parallel or get interleaved.  However, this is a
case of userspace racing with itself.  As long as we ensure userspace
can't back the kernel into a corner, it should be fine.
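
To make the vkQueuePresent case concrete, the userspace side is roughly as
follows, assuming the struct and ioctl names from this patch's uapi changes
(the helper name is mine):

  #include <sys/ioctl.h>
  #include <linux/dma-buf.h>

  /*
   * Attach a sync_file (e.g. the vkQueuePresent out-fence) to the dma-buf
   * as a write fence.  Only DMA_BUF_SYNC_RW is accepted, as explained above.
   */
  static int dmabuf_import_sync_file(int dmabuf_fd, int sync_file_fd)
  {
          struct dma_buf_import_sync_file arg = {
                  .flags = DMA_BUF_SYNC_RW,
                  .fd = sync_file_fd,
          };

          return ioctl(dmabuf_fd, DMA_BUF_IOCTL_IMPORT_SYNC_FILE, &arg);
  }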

v2 (Jason Ekstrand):
 - Use a wrapper dma_fence_array of all fences including the new one
   when importing an exclusive fence.

v3 (Jason Ekstrand):
 - Lock around setting shared fences as well as exclusive
 - Mark SIGNAL_SYNC_FILE as a read-write ioctl.
 - Initialize ret to 0 in dma_buf_wait_sync_file

v4 (Jason Ekstrand):
 - Use the new dma_resv_get_singleton helper

v5 (Jason Ekstrand):
 - Rename the IOCTLs to import/export rather than wait/signal
 - Drop the WRITE flag and always get/set the exclusive fence

v6 (Jason Ekstrand):
 - Split import and export into separate patches
 - New commit message

v7 (Daniel Vetter):
 - Fix the uapi header to use the right struct in the ioctl
 - Use a separate dma_buf_import_sync_file struct
 - Add kerneldoc for dma_buf_import_sync_file

v8 (Jason Ekstrand):
 - Rebase on Christian König's fence rework

v9 (Daniel Vetter):
 - Fix -EINVAL checks for the flags parameter
 - Add documentation about read/write fences
 - Add documentation about the expected usage of import/export and
   specifically call out the possible userspace race.

Signed-off-by: Jason Ekstrand 
Signed-off-by: Jason Ekstrand 
Signed-off-by: Jason Ekstrand 
Cc: Christian König 
Cc: Daniel Vetter 
Cc: Sumit Semwal 
Cc: Maarten Lankhorst 
---
 drivers/dma-buf/dma-buf.c| 39 
 include/uapi/linux/dma-buf.h | 49 
 2 files changed, 88 insertions(+)

diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index 6ff54f7e6119..f23f1482eb38 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -386,6 +386,43 @@ static long dma_buf_export_sync_file(struct dma_buf *dmabuf,
put_unused_fd(fd);
return ret;
 }
+
+static long dma_buf_import_sync_file(struct dma_buf *dmabuf,
+const void __user *user_data)
+{
+   struct dma_buf_import_sync_file arg;

Re: [PATCH 1/2] dma-buf: Add an API for exporting sync files (v13)

2022-05-06 Thread Jason Ekstrand
On Thu, May 5, 2022 at 3:23 AM Daniel Vetter  wrote:

> On Thu, May 05, 2022 at 03:05:44AM -0500, Jason Ekstrand wrote:
> > On Wed, May 4, 2022 at 5:49 PM Daniel Vetter  wrote:
> >
> > > On Wed, May 04, 2022 at 03:34:03PM -0500, Jason Ekstrand wrote:
> > > > Modern userspace APIs like Vulkan are built on an explicit
> > > > synchronization model.  This doesn't always play nicely with the
> > > > implicit synchronization used in the kernel and assumed by X11 and
> > > > Wayland.  The client -> compositor half of the synchronization isn't
> too
> > > > bad, at least on intel, because we can control whether or not i915
> > > > synchronizes on the buffer and whether or not it's considered
> written.
> > > >
> > > > The harder part is the compositor -> client synchronization when we
> get
> > > > the buffer back from the compositor.  We're required to be able to
> > > > provide the client with a VkSemaphore and VkFence representing the
> point
> > > > in time where the window system (compositor and/or display) finished
> > > > using the buffer.  With current APIs, it's very hard to do this in
> such
> > > > a way that we don't get confused by the Vulkan driver's access of the
> > > > buffer.  In particular, once we tell the kernel that we're rendering
> to
> > > > the buffer again, any CPU waits on the buffer or GPU dependencies
> will
> > > > wait on some of the client rendering and not just the compositor.
> > > >
> > > > This new IOCTL solves this problem by allowing us to get a snapshot
> of
> > > > the implicit synchronization state of a given dma-buf in the form of
> a
> > > > sync file.  It's effectively the same as a poll() or I915_GEM_WAIT
> only,
> > > > instead of CPU waiting directly, it encapsulates the wait operation,
> at
> > > > the current moment in time, in a sync_file so we can check/wait on it
> > > > later.  As long as the Vulkan driver does the sync_file export from
> the
> > > > dma-buf before we re-introduce it for rendering, it will only contain
> > > > fences from the compositor or display.  This allows us to accurately
> turn
> > > > it into a VkFence or VkSemaphore without any over-synchronization.
> > > >
> > > > By making this an ioctl on the dma-buf itself, it allows this new
> > > > functionality to be used in an entirely driver-agnostic way without
> > > > having access to a DRM fd. This makes it ideal for use in
> driver-generic
> > > > code in Mesa or in a client such as a compositor where the DRM fd
> may be
> > > > hard to reach.
> > > >
> > > > v2 (Jason Ekstrand):
> > > >  - Use a wrapper dma_fence_array of all fences including the new one
> > > >when importing an exclusive fence.
> > > >
> > > > v3 (Jason Ekstrand):
> > > >  - Lock around setting shared fences as well as exclusive
> > > >  - Mark SIGNAL_SYNC_FILE as a read-write ioctl.
> > > >  - Initialize ret to 0 in dma_buf_wait_sync_file
> > > >
> > > > v4 (Jason Ekstrand):
> > > >  - Use the new dma_resv_get_singleton helper
> > > >
> > > > v5 (Jason Ekstrand):
> > > >  - Rename the IOCTLs to import/export rather than wait/signal
> > > >  - Drop the WRITE flag and always get/set the exclusive fence
> > > >
> > > > v6 (Jason Ekstrand):
> > > >  - Drop the sync_file import as it was all-around sketchy and not
> nearly
> > > >as useful as export.
> > > >  - Re-introduce READ/WRITE flag support for export
> > > >  - Rework the commit message
> > > >
> > > > v7 (Jason Ekstrand):
> > > >  - Require at least one sync flag
> > > >  - Fix a refcounting bug: dma_resv_get_excl() doesn't take a
> reference
> > > >  - Use _rcu helpers since we're accessing the dma_resv read-only
> > > >
> > > > v8 (Jason Ekstrand):
> > > >  - Return -ENOMEM if the sync_file_create fails
> > > >  - Predicate support on IS_ENABLED(CONFIG_SYNC_FILE)
> > > >
> > > > v9 (Jason Ekstrand):
> > > >  - Add documentation for the new ioctl
> > > >
> > > > v10 (Jason Ekstrand):
> > > >  - Go back to dma_buf_sync_file as the ioctl struct name
> > > >
> > > > v11 (Daniel Vetter):
> > > >  - Go back to dma_buf_export_sync_file as the ioctl struct name

Re: [PATCH 2/2] dma-buf: Add an API for importing sync files (v8)

2022-05-06 Thread Jason Ekstrand
On Wed, May 4, 2022 at 5:53 PM Daniel Vetter  wrote:

> On Wed, May 04, 2022 at 03:34:04PM -0500, Jason Ekstrand wrote:
> > This patch is analogous to the previous sync file export patch in that
> > it allows you to import a sync_file into a dma-buf.  Unlike the previous
> > patch, however, this does add genuinely new functionality to dma-buf.
> > Without this, the only way to attach a sync_file to a dma-buf is to
> > submit a batch to your driver of choice which waits on the sync_file and
> > claims to write to the dma-buf.  Even if said batch is a no-op, a submit
> > is typically way more overhead than just attaching a fence.  A submit
> > may also imply extra synchronization with other work because it happens
> > on a hardware queue.
> >
> > In the Vulkan world, this is useful for dealing with the out-fence from
> > vkQueuePresent.  Current Linux window-systems (X11, Wayland, etc.) all
> > rely on dma-buf implicit sync.  Since Vulkan is an explicit sync API, we
> > get a set of fences (VkSemaphores) in vkQueuePresent and have to stash
> > those as an exclusive (write) fence on the dma-buf.  We handle it in
> > Mesa today with the above mentioned dummy submit trick.  This ioctl
> > would allow us to set it directly without the dummy submit.
> >
> > This may also open up possibilities for GPU drivers to move away from
> > implicit sync for their kernel driver uAPI and instead provide sync
> > files and rely on dma-buf import/export for communicating with other
> > implicit sync clients.
> >
> > We make the explicit choice here to only allow setting RW fences which
> > translates to an exclusive fence on the dma_resv.  There's no use for
> > read-only fences for communicating with other implicit sync userspace
> > and any such attempts are likely to be racy at best.  When we go to
> > insert the RW fence, the actual fence we set as the new exclusive fence
> > is a combination of the sync_file provided by the user and all the other
> > fences on the dma_resv.  This ensures that the newly added exclusive
> > fence will never signal before the old one would have and ensures that
> > we don't break any dma_resv contracts.  We require userspace to specify
> > RW in the flags for symmetry with the export ioctl and in case we ever
> > want to support read fences in the future.
> >
> > There is one downside here that's worth documenting:  If two clients
> > writing to the same dma-buf using this API race with each other, their
> > actions on the dma-buf may happen in parallel or in an undefined order.
> > Both with and without this API, the pattern is the same:  Collect all
> > the fences on dma-buf, submit work which depends on said fences, and
> > then set a new exclusive (write) fence on the dma-buf which depends on
> > said work.  The difference is that, when it's all handled by the GPU
> > driver's submit ioctl, the three operations happen atomically under the
> > dma_resv lock.  If two userspace submits race, one will happen before
> > the other.  You aren't guaranteed which but you are guaranteed that
> > they're strictly ordered.  If userspace manages the fences itself, then
> > these three operations happen separately and the two render operations
> > may happen genuinely in parallel or get interleaved.  However, this is a
> > case of userspace racing with itself.  As long as we ensure userspace
> > can't back the kernel into a corner, it should be fine.
> >
> > v2 (Jason Ekstrand):
> >  - Use a wrapper dma_fence_array of all fences including the new one
> >    when importing an exclusive fence.
> >
> > v3 (Jason Ekstrand):
> >  - Lock around setting shared fences as well as exclusive
> >  - Mark SIGNAL_SYNC_FILE as a read-write ioctl.
> >  - Initialize ret to 0 in dma_buf_wait_sync_file
> >
> > v4 (Jason Ekstrand):
> >  - Use the new dma_resv_get_singleton helper
> >
> > v5 (Jason Ekstrand):
> >  - Rename the IOCTLs to import/export rather than wait/signal
> >  - Drop the WRITE flag and always get/set the exclusive fence
> >
> > v6 (Jason Ekstrand):
> >  - Split import and export into separate patches
> >  - New commit message
> >
> > v7 (Daniel Vetter):
> >  - Fix the uapi header to use the right struct in the ioctl
> >  - Use a separate dma_buf_import_sync_file struct
> >  - Add kerneldoc for dma_buf_import_sync_file
> >
> > v8 (Jason Ekstrand):
> >  - Rebase on Christian König's fence rework
> >
> > Signed-off-by: Jason Ekstrand 
> > Cc: Christian König 
> > Cc: Daniel Vetter 
> > Cc: Sumit Semwal 

Re: [PATCH 1/2] dma-buf: Add an API for exporting sync files (v13)

2022-05-06 Thread Jason Ekstrand
On Thu, May 5, 2022 at 1:25 AM Christian König 
wrote:

> On 04.05.22 at 22:34, Jason Ekstrand wrote:
> > Modern userspace APIs like Vulkan are built on an explicit
> > synchronization model.  This doesn't always play nicely with the
> > implicit synchronization used in the kernel and assumed by X11 and
> > Wayland.  The client -> compositor half of the synchronization isn't too
> > bad, at least on intel, because we can control whether or not i915
> > synchronizes on the buffer and whether or not it's considered written.
> >
> > The harder part is the compositor -> client synchronization when we get
> > the buffer back from the compositor.  We're required to be able to
> > provide the client with a VkSemaphore and VkFence representing the point
> > in time where the window system (compositor and/or display) finished
> > using the buffer.  With current APIs, it's very hard to do this in such
> > a way that we don't get confused by the Vulkan driver's access of the
> > buffer.  In particular, once we tell the kernel that we're rendering to
> > the buffer again, any CPU waits on the buffer or GPU dependencies will
> > wait on some of the client rendering and not just the compositor.
> >
> > This new IOCTL solves this problem by allowing us to get a snapshot of
> > the implicit synchronization state of a given dma-buf in the form of a
> > sync file.  It's effectively the same as a poll() or I915_GEM_WAIT only,
> > instead of CPU waiting directly, it encapsulates the wait operation, at
> > the current moment in time, in a sync_file so we can check/wait on it
> > later.  As long as the Vulkan driver does the sync_file export from the
> > dma-buf before we re-introduce it for rendering, it will only contain
> > fences from the compositor or display.  This allows us to accurately turn
> > it into a VkFence or VkSemaphore without any over-synchronization.
> >
> > By making this an ioctl on the dma-buf itself, it allows this new
> > functionality to be used in an entirely driver-agnostic way without
> > having access to a DRM fd. This makes it ideal for use in driver-generic
> > code in Mesa or in a client such as a compositor where the DRM fd may be
> > hard to reach.
> >
> > v2 (Jason Ekstrand):
> >   - Use a wrapper dma_fence_array of all fences including the new one
> > when importing an exclusive fence.
> >
> > v3 (Jason Ekstrand):
> >   - Lock around setting shared fences as well as exclusive
> >   - Mark SIGNAL_SYNC_FILE as a read-write ioctl.
> >   - Initialize ret to 0 in dma_buf_wait_sync_file
> >
> > v4 (Jason Ekstrand):
> >   - Use the new dma_resv_get_singleton helper
> >
> > v5 (Jason Ekstrand):
> >   - Rename the IOCTLs to import/export rather than wait/signal
> >   - Drop the WRITE flag and always get/set the exclusive fence
> >
> > v6 (Jason Ekstrand):
> >   - Drop the sync_file import as it was all-around sketchy and not nearly
> > as useful as export.
> >   - Re-introduce READ/WRITE flag support for export
> >   - Rework the commit message
> >
> > v7 (Jason Ekstrand):
> >   - Require at least one sync flag
> >   - Fix a refcounting bug: dma_resv_get_excl() doesn't take a reference
> >   - Use _rcu helpers since we're accessing the dma_resv read-only
> >
> > v8 (Jason Ekstrand):
> >   - Return -ENOMEM if the sync_file_create fails
> >   - Predicate support on IS_ENABLED(CONFIG_SYNC_FILE)
> >
> > v9 (Jason Ekstrand):
> >   - Add documentation for the new ioctl
> >
> > v10 (Jason Ekstrand):
> >   - Go back to dma_buf_sync_file as the ioctl struct name
> >
> > v11 (Daniel Vetter):
> >   - Go back to dma_buf_export_sync_file as the ioctl struct name
> >   - Better kerneldoc describing what the read/write flags do
> >
> > v12 (Christian König):
> >   - Document why we chose to make it an ioctl on dma-buf
> >
> > v13 (Jason Ekstrand):
> >   - Rebase on Christian König's fence rework
> >
> > Signed-off-by: Jason Ekstrand 
> > Acked-by: Simon Ser 
> > Acked-by: Christian König 
> > Reviewed-by: Daniel Vetter 
> > Cc: Sumit Semwal 
> > Cc: Maarten Lankhorst 
> > ---
> >   drivers/dma-buf/dma-buf.c| 64 
> >   include/uapi/linux/dma-buf.h | 35 
> >   2 files changed, 99 insertions(+)
> >
> > diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
> > index 79795857be3e..529e0611e53b 100644
> > --- a/drivers/dma-buf/dma-buf.c
> > +++ b/drivers/dma-buf/dma-buf.c
>

Re: [PATCH 1/2] dma-buf: Add an API for exporting sync files (v13)

2022-05-06 Thread Jason Ekstrand
On Wed, May 4, 2022 at 5:49 PM Daniel Vetter  wrote:

> On Wed, May 04, 2022 at 03:34:03PM -0500, Jason Ekstrand wrote:
> > Modern userspace APIs like Vulkan are built on an explicit
> > synchronization model.  This doesn't always play nicely with the
> > implicit synchronization used in the kernel and assumed by X11 and
> > Wayland.  The client -> compositor half of the synchronization isn't too
> > bad, at least on intel, because we can control whether or not i915
> > synchronizes on the buffer and whether or not it's considered written.
> >
> > The harder part is the compositor -> client synchronization when we get
> > the buffer back from the compositor.  We're required to be able to
> > provide the client with a VkSemaphore and VkFence representing the point
> > in time where the window system (compositor and/or display) finished
> > using the buffer.  With current APIs, it's very hard to do this in such
> > a way that we don't get confused by the Vulkan driver's access of the
> > buffer.  In particular, once we tell the kernel that we're rendering to
> > the buffer again, any CPU waits on the buffer or GPU dependencies will
> > wait on some of the client rendering and not just the compositor.
> >
> > This new IOCTL solves this problem by allowing us to get a snapshot of
> > the implicit synchronization state of a given dma-buf in the form of a
> > sync file.  It's effectively the same as a poll() or I915_GEM_WAIT only,
> > instead of CPU waiting directly, it encapsulates the wait operation, at
> > the current moment in time, in a sync_file so we can check/wait on it
> > later.  As long as the Vulkan driver does the sync_file export from the
> > dma-buf before we re-introduce it for rendering, it will only contain
> > fences from the compositor or display.  This allows us to accurately turn
> > it into a VkFence or VkSemaphore without any over-synchronization.
> >
> > By making this an ioctl on the dma-buf itself, it allows this new
> > functionality to be used in an entirely driver-agnostic way without
> > having access to a DRM fd. This makes it ideal for use in driver-generic
> > code in Mesa or in a client such as a compositor where the DRM fd may be
> > hard to reach.
> >
> > v2 (Jason Ekstrand):
> >  - Use a wrapper dma_fence_array of all fences including the new one
> >when importing an exclusive fence.
> >
> > v3 (Jason Ekstrand):
> >  - Lock around setting shared fences as well as exclusive
> >  - Mark SIGNAL_SYNC_FILE as a read-write ioctl.
> >  - Initialize ret to 0 in dma_buf_wait_sync_file
> >
> > v4 (Jason Ekstrand):
> >  - Use the new dma_resv_get_singleton helper
> >
> > v5 (Jason Ekstrand):
> >  - Rename the IOCTLs to import/export rather than wait/signal
> >  - Drop the WRITE flag and always get/set the exclusive fence
> >
> > v6 (Jason Ekstrand):
> >  - Drop the sync_file import as it was all-around sketchy and not nearly
> >as useful as export.
> >  - Re-introduce READ/WRITE flag support for export
> >  - Rework the commit message
> >
> > v7 (Jason Ekstrand):
> >  - Require at least one sync flag
> >  - Fix a refcounting bug: dma_resv_get_excl() doesn't take a reference
> >  - Use _rcu helpers since we're accessing the dma_resv read-only
> >
> > v8 (Jason Ekstrand):
> >  - Return -ENOMEM if the sync_file_create fails
> >  - Predicate support on IS_ENABLED(CONFIG_SYNC_FILE)
> >
> > v9 (Jason Ekstrand):
> >  - Add documentation for the new ioctl
> >
> > v10 (Jason Ekstrand):
> >  - Go back to dma_buf_sync_file as the ioctl struct name
> >
> > v11 (Daniel Vetter):
> >  - Go back to dma_buf_export_sync_file as the ioctl struct name
> >  - Better kerneldoc describing what the read/write flags do
> >
> > v12 (Christian König):
> >  - Document why we chose to make it an ioctl on dma-buf
> >
> > v13 (Jason Ekstrand):
> >  - Rebase on Christian König's fence rework
> >
> > Signed-off-by: Jason Ekstrand 
> > Acked-by: Simon Ser 
> > Acked-by: Christian König 
> > Reviewed-by: Daniel Vetter 
>
> Not sure which version it was that I reviewed, but with dma_resv_usage
> this all looks neat and tidy. One nit below.
>
> > Cc: Sumit Semwal 
> > Cc: Maarten Lankhorst 
> > ---
> >  drivers/dma-buf/dma-buf.c| 64 
> >  include/uapi/linux/dma-buf.h | 35 
> >  2 files changed, 99 insertions(+)
> >
> > diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
>

[PATCH 0/2] dma-buf: Add an API for exporting sync files (v13)

2022-05-05 Thread Jason Ekstrand
Modern userspace APIs like Vulkan are built on an explicit synchronization
model.  This doesn't always play nicely with the implicit synchronization used
in the kernel and assumed by X11 and Wayland.  The client -> compositor half
of the synchronization isn't too bad, at least on intel, because we can
control whether or not i915 synchronizes on the buffer and whether or not it's
considered written.

The harder part is the compositor -> client synchronization when we get the
buffer back from the compositor.  We're required to be able to provide the
client with a VkSemaphore and VkFence representing the point in time where the
window system (compositor and/or display) finished using the buffer.  With
current APIs, it's very hard to do this in such a way that we don't get
confused by the Vulkan driver's access of the buffer.  In particular, once we
tell the kernel that we're rendering to the buffer again, any CPU waits on the
buffer or GPU dependencies will wait on some of the client rendering and not
just the compositor.

This new IOCTL solves this problem by allowing us to get a snapshot of the
implicit synchronization state of a given dma-buf in the form of a sync file.
It's effectively the same as a poll() or I915_GEM_WAIT only, instead of CPU
waiting directly, it encapsulates the wait operation, at the current moment in
time, in a sync_file so we can check/wait on it later.  As long as the Vulkan
driver does the sync_file export from the dma-buf before we re-introduce it
for rendering, it will only contain fences from the compositor or display.
This allows us to accurately turn it into a VkFence or VkSemaphore without any
over-synchronization.

This patch series actually contains two new ioctls.  There is the export one
mentioned above as well as an RFC for an import ioctl which provides the other
half.  The intention is to land the export ioctl since it seems like there's
no real disagreement on that one.  The import ioctl, however, has a lot of
debate around it so it's intended to be RFC-only for now.

Mesa MR: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4037
IGT tests: https://patchwork.freedesktop.org/series/90490/

v10 (Jason Ekstrand, Daniel Vetter):
 - Add reviews/acks
 - Add a patch to rename _rcu to _unlocked
 - Split things better so import is clearly RFC status

v11 (Daniel Vetter):
 - Add more CCs to try and get maintainers
 - Add a patch to document DMA_BUF_IOCTL_SYNC
 - Generally better docs
 - Use separate structs for import/export (easier to document)
 - Fix an issue in the import patch

v12 (Daniel Vetter):
 - Better docs for DMA_BUF_IOCTL_SYNC

v12 (Christian König):
 - Drop the rename patch in favor of Christian's series
 - Add a comment to the commit message for the dma-buf sync_file export
   ioctl saying why we made it an ioctl on dma-buf

v13 (Jason Ekstrand):
 - Rebase on Christian König's fence rework

Cc: Christian König 
Cc: Michel Dänzer 
Cc: Dave Airlie 
Cc: Bas Nieuwenhuizen 
Cc: Daniel Stone 
Cc: mesa-...@lists.freedesktop.org
Cc: wayland-de...@lists.freedesktop.org

Jason Ekstrand (2):
  dma-buf: Add an API for exporting sync files (v13)
  dma-buf: Add an API for importing sync files (v8)

 drivers/dma-buf/dma-buf.c| 100 +++
 include/uapi/linux/dma-buf.h |  57 
 2 files changed, 157 insertions(+)

-- 
2.36.0



[PATCH 2/2] dma-buf: Add an API for importing sync files (v8)

2022-05-05 Thread Jason Ekstrand
This patch is analogous to the previous sync file export patch in that
it allows you to import a sync_file into a dma-buf.  Unlike the previous
patch, however, this does add genuinely new functionality to dma-buf.
Without this, the only way to attach a sync_file to a dma-buf is to
submit a batch to your driver of choice which waits on the sync_file and
claims to write to the dma-buf.  Even if said batch is a no-op, a submit
is typically way more overhead than just attaching a fence.  A submit
may also imply extra synchronization with other work because it happens
on a hardware queue.

In the Vulkan world, this is useful for dealing with the out-fence from
vkQueuePresent.  Current Linux window-systems (X11, Wayland, etc.) all
rely on dma-buf implicit sync.  Since Vulkan is an explicit sync API, we
get a set of fences (VkSemaphores) in vkQueuePresent and have to stash
those as an exclusive (write) fence on the dma-buf.  We handle it in
Mesa today with the above mentioned dummy submit trick.  This ioctl
would allow us to set it directly without the dummy submit.

This may also open up possibilities for GPU drivers to move away from
implicit sync for their kernel driver uAPI and instead provide sync
files and rely on dma-buf import/export for communicating with other
implicit sync clients.

We make the explicit choice here to only allow setting RW fences which
translates to an exclusive fence on the dma_resv.  There's no use for
read-only fences for communicating with other implicit sync userspace
and any such attempts are likely to be racy at best.  When we go to
insert the RW fence, the actual fence we set as the new exclusive fence
is a combination of the sync_file provided by the user and all the other
fences on the dma_resv.  This ensures that the newly added exclusive
fence will never signal before the old one would have and ensures that
we don't break any dma_resv contracts.  We require userspace to specify
RW in the flags for symmetry with the export ioctl and in case we ever
want to support read fences in the future.

There is one downside here that's worth documenting:  If two clients
writing to the same dma-buf using this API race with each other, their
actions on the dma-buf may happen in parallel or in an undefined order.
Both with and without this API, the pattern is the same:  Collect all
the fences on dma-buf, submit work which depends on said fences, and
then set a new exclusive (write) fence on the dma-buf which depends on
said work.  The difference is that, when it's all handled by the GPU
driver's submit ioctl, the three operations happen atomically under the
dma_resv lock.  If two userspace submits race, one will happen before
the other.  You aren't guaranteed which but you are guaranteed that
they're strictly ordered.  If userspace manages the fences itself, then
these three operations happen separately and the two render operations
may happen genuinely in parallel or get interleaved.  However, this is a
case of userspace racing with itself.  As long as we ensure userspace
can't back the kernel into a corner, it should be fine.

v2 (Jason Ekstrand):
 - Use a wrapper dma_fence_array of all fences including the new one
   when importing an exclusive fence.

v3 (Jason Ekstrand):
 - Lock around setting shared fences as well as exclusive
 - Mark SIGNAL_SYNC_FILE as a read-write ioctl.
 - Initialize ret to 0 in dma_buf_wait_sync_file

v4 (Jason Ekstrand):
 - Use the new dma_resv_get_singleton helper

v5 (Jason Ekstrand):
 - Rename the IOCTLs to import/export rather than wait/signal
 - Drop the WRITE flag and always get/set the exclusive fence

v6 (Jason Ekstrand):
 - Split import and export into separate patches
 - New commit message

v7 (Daniel Vetter):
 - Fix the uapi header to use the right struct in the ioctl
 - Use a separate dma_buf_import_sync_file struct
 - Add kerneldoc for dma_buf_import_sync_file

v8 (Jason Ekstrand):
 - Rebase on Christian König's fence rework

Signed-off-by: Jason Ekstrand 
Cc: Christian König 
Cc: Daniel Vetter 
Cc: Sumit Semwal 
Cc: Maarten Lankhorst 
---
 drivers/dma-buf/dma-buf.c| 36 
 include/uapi/linux/dma-buf.h | 22 ++
 2 files changed, 58 insertions(+)

diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index 529e0611e53b..68aac6f694f9 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -383,6 +383,40 @@ static long dma_buf_export_sync_file(struct dma_buf *dmabuf,
put_unused_fd(fd);
return ret;
 }
+
+static long dma_buf_import_sync_file(struct dma_buf *dmabuf,
+const void __user *user_data)
+{
+   struct dma_buf_import_sync_file arg;
+   struct dma_fence *fence;
+   enum dma_resv_usage usage;
+   int ret = 0;
+
+   if (copy_from_user(&arg, user_data, sizeof(arg)))
+   return -EFAULT;
+
+   if (arg.flags != DMA_BUF_SYNC_RW)
+   return -EINVAL;
+
+   fence = sync_file_get_fence(arg.fd);

[PATCH 0/2] *dma-buf: Add an API for exporting sync files (v13)

2022-05-05 Thread Jason Ekstrand
Modern userspace APIs like Vulkan are built on an explicit synchronization
model.  This doesn't always play nicely with the implicit synchronization used
in the kernel and assumed by X11 and Wayland.  The client -> compositor half
of the synchronization isn't too bad, at least on intel, because we can
control whether or not i915 synchronizes on the buffer and whether or not it's
considered written.

The harder part is the compositor -> client synchronization when we get the
buffer back from the compositor.  We're required to be able to provide the
client with a VkSemaphore and VkFence representing the point in time where the
window system (compositor and/or display) finished using the buffer.  With
current APIs, it's very hard to do this in such a way that we don't get
confused by the Vulkan driver's access of the buffer.  In particular, once we
tell the kernel that we're rendering to the buffer again, any CPU waits on the
buffer or GPU dependencies will wait on some of the client rendering and not
just the compositor.

This new IOCTL solves this problem by allowing us to get a snapshot of the
implicit synchronization state of a given dma-buf in the form of a sync file.
It's effectively the same as a poll() or I915_GEM_WAIT only, instead of CPU
waiting directly, it encapsulates the wait operation, at the current moment in
time, in a sync_file so we can check/wait on it later.  As long as the Vulkan
driver does the sync_file export from the dma-buf before we re-introduce it
for rendering, it will only contain fences from the compositor or display.
This allows to accurately turn it into a VkFence or VkSemaphore without any
over-synchronization.

This patch series actually contains two new ioctls.  There is the export one
mentioned above as well as an RFC for an import ioctl which provides the other
half.  The intention is to land the export ioctl since it seems like there's
no real disagreement on that one.  The import ioctl, however, has a lot of
debate around it so it's intended to be RFC-only for now.

Mesa MR: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4037
IGT tests: https://patchwork.freedesktop.org/series/90490/

v10 (Jason Ekstrand, Daniel Vetter):
 - Add reviews/acks
 - Add a patch to rename _rcu to _unlocked
 - Split things better so import is clearly RFC status

v11 (Daniel Vetter):
 - Add more CCs to try and get maintainers
 - Add a patch to document DMA_BUF_IOCTL_SYNC
 - Generally better docs
 - Use separate structs for import/export (easier to document)
 - Fix an issue in the import patch

v12 (Daniel Vetter):
 - Better docs for DMA_BUF_IOCTL_SYNC

v12 (Christian König):
 - Drop the rename patch in favor of Christian's series
 - Add a comment to the commit message for the dma-buf sync_file export
   ioctl saying why we made it an ioctl on dma-buf

v13 (Jason Ekstrand):
 - Rebase on Christian König's fence rework

Cc: Christian König 
Cc: Michel Dänzer 
Cc: Dave Airlie 
Cc: Bas Nieuwenhuizen 
Cc: Daniel Stone 
Cc: mesa-...@lists.freedesktop.org
Cc: wayland-de...@lists.freedesktop.org

Jason Ekstrand (2):
  dma-buf: Add an API for exporting sync files (v13)
  dma-buf: Add an API for importing sync files (v8)

 drivers/dma-buf/dma-buf.c| 100 +++
 include/uapi/linux/dma-buf.h |  57 
 2 files changed, 157 insertions(+)

-- 
2.36.0



[PATCH 2/2] dma-buf: Add an API for importing sync files (v8)

2022-05-05 Thread Jason Ekstrand
This patch is analogous to the previous sync file export patch in that
it allows you to import a sync_file into a dma-buf.  Unlike the previous
patch, however, this does add genuinely new functionality to dma-buf.
Without this, the only way to attach a sync_file to a dma-buf is to
submit a batch to your driver of choice which waits on the sync_file and
claims to write to the dma-buf.  Even if said batch is a no-op, a submit
is typically way more overhead than just attaching a fence.  A submit
may also imply extra synchronization with other work because it happens
on a hardware queue.

In the Vulkan world, this is useful for dealing with the out-fence from
vkQueuePresent.  Current Linux window-systems (X11, Wayland, etc.) all
rely on dma-buf implicit sync.  Since Vulkan is an explicit sync API, we
get a set of fences (VkSemaphores) in vkQueuePresent and have to stash
those as an exclusive (write) fence on the dma-buf.  We handle it in
Mesa today with the above mentioned dummy submit trick.  This ioctl
would allow us to set it directly without the dummy submit.

This may also open up possibilities for GPU drivers to move away from
implicit sync for their kernel driver uAPI and instead provide sync
files and rely on dma-buf import/export for communicating with other
implicit sync clients.

We make the explicit choice here to only allow setting RW fences which
translates to an exclusive fence on the dma_resv.  There's no use for
read-only fences for communicating with other implicit sync userspace
and any such attempts are likely to be racy at best.  When we got to
insert the RW fence, the actual fence we set as the new exclusive fence
is a combination of the sync_file provided by the user and all the other
fences on the dma_resv.  This ensures that the newly added exclusive
fence will never signal before the old one would have and ensures that
we don't break any dma_resv contracts.  We require userspace to specify
RW in the flags for symmetry with the export ioctl and in case we ever
want to support read fences in the future.

There is one downside here that's worth documenting:  If two clients
writing to the same dma-buf using this API race with each other, their
actions on the dma-buf may happen in parallel or in an undefined order.
Both with and without this API, the pattern is the same:  Collect all
the fences on dma-buf, submit work which depends on said fences, and
then set a new exclusive (write) fence on the dma-buf which depends on
said work.  The difference is that, when it's all handled by the GPU
driver's submit ioctl, the three operations happen atomically under the
dma_resv lock.  If two userspace submits race, one will happen before
the other.  You aren't guaranteed which but you are guaranteed that
they're strictly ordered.  If userspace manages the fences itself, then
these three operations happen separately and the two render operations
may happen genuinely in parallel or get interleaved.  However, this is a
case of userspace racing with itself.  As long as we ensure userspace
can't back the kernel into a corner, it should be fine.

v2 (Jason Ekstrand):
 - Use a wrapper dma_fence_array of all fences including the new one
   when importing an exclusive fence.

v3 (Jason Ekstrand):
 - Lock around setting shared fences as well as exclusive
 - Mark SIGNAL_SYNC_FILE as a read-write ioctl.
 - Initialize ret to 0 in dma_buf_wait_sync_file

v4 (Jason Ekstrand):
 - Use the new dma_resv_get_singleton helper

v5 (Jason Ekstrand):
 - Rename the IOCTLs to import/export rather than wait/signal
 - Drop the WRITE flag and always get/set the exclusive fence

v6 (Jason Ekstrand):
 - Split import and export into separate patches
 - New commit message

v7 (Daniel Vetter):
 - Fix the uapi header to use the right struct in the ioctl
 - Use a separate dma_buf_import_sync_file struct
 - Add kerneldoc for dma_buf_import_sync_file

v8 (Jason Ekstrand):
 - Rebase on Christian König's fence rework

Signed-off-by: Jason Ekstrand 
Cc: Christian König 
Cc: Daniel Vetter 
Cc: Sumit Semwal 
Cc: Maarten Lankhorst 
---
 drivers/dma-buf/dma-buf.c| 36 
 include/uapi/linux/dma-buf.h | 22 ++
 2 files changed, 58 insertions(+)

diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index 529e0611e53b..68aac6f694f9 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -383,6 +383,40 @@ static long dma_buf_export_sync_file(struct dma_buf 
*dmabuf,
put_unused_fd(fd);
return ret;
 }
+
+static long dma_buf_import_sync_file(struct dma_buf *dmabuf,
+const void __user *user_data)
+{
+   struct dma_buf_import_sync_file arg;
+   struct dma_fence *fence;
+   enum dma_resv_usage usage;
+   int ret = 0;
+
+   if (copy_from_user(&arg, user_data, sizeof(arg)))
+   return -EFAULT;
+
+   if (arg.flags != DMA_BUF_SYNC_RW)
+   return -EINVAL;
+
+   fence

[PATCH 1/2] dma-buf: Add an API for exporting sync files (v13)

2022-05-05 Thread Jason Ekstrand
Modern userspace APIs like Vulkan are built on an explicit
synchronization model.  This doesn't always play nicely with the
implicit synchronization used in the kernel and assumed by X11 and
Wayland.  The client -> compositor half of the synchronization isn't too
bad, at least on intel, because we can control whether or not i915
synchronizes on the buffer and whether or not it's considered written.

The harder part is the compositor -> client synchronization when we get
the buffer back from the compositor.  We're required to be able to
provide the client with a VkSemaphore and VkFence representing the point
in time where the window system (compositor and/or display) finished
using the buffer.  With current APIs, it's very hard to do this in such
a way that we don't get confused by the Vulkan driver's access of the
buffer.  In particular, once we tell the kernel that we're rendering to
the buffer again, any CPU waits on the buffer or GPU dependencies will
wait on some of the client rendering and not just the compositor.

This new IOCTL solves this problem by allowing us to get a snapshot of
the implicit synchronization state of a given dma-buf in the form of a
sync file.  It's effectively the same as a poll() or I915_GEM_WAIT only,
instead of CPU waiting directly, it encapsulates the wait operation, at
the current moment in time, in a sync_file so we can check/wait on it
later.  As long as the Vulkan driver does the sync_file export from the
dma-buf before we re-introduce it for rendering, it will only contain
fences from the compositor or display.  This allows us to accurately turn
it into a VkFence or VkSemaphore without any over-synchronization.

By making this an ioctl on the dma-buf itself, it allows this new
functionality to be used in an entirely driver-agnostic way without
having access to a DRM fd. This makes it ideal for use in driver-generic
code in Mesa or in a client such as a compositor where the DRM fd may be
hard to reach.
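
A minimal sketch of that driver-agnostic use, again assuming the uapi
names from the patch below; the returned fd is what would back a
VkSemaphore/VkFence import in Mesa:

    #include <sys/ioctl.h>
    #include <linux/dma-buf.h>

    /* Snapshot the window system's fences before the client re-uses
     * the buffer for rendering.
     */
    static int export_wsi_fences(int dmabuf_fd)
    {
            struct dma_buf_export_sync_file arg = {
                    /* We are about to write, so wait for readers too. */
                    .flags = DMA_BUF_SYNC_WRITE,
            };

            if (ioctl(dmabuf_fd, DMA_BUF_IOCTL_EXPORT_SYNC_FILE, &arg) < 0)
                    return -1;

            return arg.fd; /* a sync_file fd */
    }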

v2 (Jason Ekstrand):
 - Use a wrapper dma_fence_array of all fences including the new one
   when importing an exclusive fence.

v3 (Jason Ekstrand):
 - Lock around setting shared fences as well as exclusive
 - Mark SIGNAL_SYNC_FILE as a read-write ioctl.
 - Initialize ret to 0 in dma_buf_wait_sync_file

v4 (Jason Ekstrand):
 - Use the new dma_resv_get_singleton helper

v5 (Jason Ekstrand):
 - Rename the IOCTLs to import/export rather than wait/signal
 - Drop the WRITE flag and always get/set the exclusive fence

v6 (Jason Ekstrand):
 - Drop the sync_file import as it was all-around sketchy and not nearly
   as useful as export.
 - Re-introduce READ/WRITE flag support for export
 - Rework the commit message

v7 (Jason Ekstrand):
 - Require at least one sync flag
 - Fix a refcounting bug: dma_resv_get_excl() doesn't take a reference
 - Use _rcu helpers since we're accessing the dma_resv read-only

v8 (Jason Ekstrand):
 - Return -ENOMEM if the sync_file_create fails
 - Predicate support on IS_ENABLED(CONFIG_SYNC_FILE)

v9 (Jason Ekstrand):
 - Add documentation for the new ioctl

v10 (Jason Ekstrand):
 - Go back to dma_buf_sync_file as the ioctl struct name

v11 (Daniel Vetter):
 - Go back to dma_buf_export_sync_file as the ioctl struct name
 - Better kerneldoc describing what the read/write flags do

v12 (Christian König):
 - Document why we chose to make it an ioctl on dma-buf

v13 (Jason Ekstrand):
 - Rebase on Christian König's fence rework

Signed-off-by: Jason Ekstrand 
Acked-by: Simon Ser 
Acked-by: Christian König 
Reviewed-by: Daniel Vetter 
Cc: Sumit Semwal 
Cc: Maarten Lankhorst 
---
 drivers/dma-buf/dma-buf.c| 64 
 include/uapi/linux/dma-buf.h | 35 
 2 files changed, 99 insertions(+)

diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index 79795857be3e..529e0611e53b 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -20,6 +20,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -192,6 +193,9 @@ static loff_t dma_buf_llseek(struct file *file, loff_t 
offset, int whence)
  * Note that this only signals the completion of the respective fences, i.e. 
the
  * DMA transfers are complete. Cache flushing and any other necessary
  * preparations before CPU access can begin still need to happen.
+ *
+ * As an alternative to poll(), the set of fences on DMA buffer can be
+ * exported as a &sync_file using &dma_buf_sync_file_export.
  */
 
 static void dma_buf_poll_cb(struct dma_fence *fence, struct dma_fence_cb *cb)
@@ -326,6 +330,61 @@ static long dma_buf_set_name(struct dma_buf *dmabuf, const 
char __user *buf)
return 0;
 }
 
+#if IS_ENABLED(CONFIG_SYNC_FILE)
+static long dma_buf_export_sync_file(struct dma_buf *dmabuf,
+void __user *user_data)
+{
+   struct dma_buf_export_sync_file arg;
+   enum dma_resv_usage usage;
+   struct dma_fence *fence = NULL;
+   struct sync_file *syn

Re: [PATCH 18/24] dma-buf: add enum dma_resv_usage v3

2022-03-03 Thread Jason Ekstrand
On Wed, Dec 22, 2021 at 4:00 PM Daniel Vetter  wrote:

> On Tue, Dec 07, 2021 at 01:34:05PM +0100, Christian König wrote:
> > This change adds the dma_resv_usage enum and allows us to specify why a
> > dma_resv object is queried for its containing fences.
> >
> > Additional to that a dma_resv_usage_rw() helper function is added to aid
> > retrieving the fences for a read or write userspace submission.
> >
> > This is then deployed to the different query functions of the dma_resv
> > object and all of their users. When the write parameter was previously
> > true we now use DMA_RESV_USAGE_WRITE and DMA_RESV_USAGE_READ otherwise.
> >
> > v2: add KERNEL/OTHER in separate patch
> > v3: some kerneldoc suggestions by Daniel
> >
> > Signed-off-by: Christian König 
>
> Just commenting on the kerneldoc here.
>
> > diff --git a/drivers/dma-buf/dma-resv.c b/drivers/dma-buf/dma-resv.c
> > index ecb2ff606bac..33a17db89fb4 100644
> > --- a/drivers/dma-buf/dma-resv.c
> > +++ b/drivers/dma-buf/dma-resv.c
> > @@ -408,7 +408,7 @@ static void dma_resv_iter_restart_unlocked(struct
> dma_resv_iter *cursor)
> > cursor->seq = read_seqcount_begin(&cursor->obj->seq);
> > cursor->index = -1;
> > cursor->shared_count = 0;
> > -   if (cursor->all_fences) {
> > +   if (cursor->usage >= DMA_RESV_USAGE_READ) {
>

If we're going to do this


> > cursor->fences = dma_resv_shared_list(cursor->obj);
> > if (cursor->fences)
> > cursor->shared_count =
> cursor->fences->shared_count;


> > diff --git a/include/linux/dma-resv.h b/include/linux/dma-resv.h
> > index 40ac9d486f8f..d96d8ca9af56 100644
> > --- a/include/linux/dma-resv.h
> > +++ b/include/linux/dma-resv.h
> > @@ -49,6 +49,49 @@ extern struct ww_class reservation_ww_class;
> >
> >  struct dma_resv_list;
> >
> > +/**
> > + * enum dma_resv_usage - how the fences from a dma_resv obj are used
> > + *
>

We probably want a note in here about the ordering of this enum.  I'm not
even sure that comparing enum values is good or that all values will have a
strict ordering that can be useful.  It would definitely make me nervous if
anything outside dma-resv.c is doing comparisons on these values.
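
To make the worry concrete, the hunk above bakes the declaration order
into a comparison; a sketch of the same pattern as a helper:

    /* Only correct while READ stays the highest usage value; any
     * enumerant later inserted after READ silently changes which
     * fences the iterator walks.
     */
    static bool wants_all_fences(enum dma_resv_usage usage)
    {
            return usage >= DMA_RESV_USAGE_READ;
    }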

--Jason


> > + * This enum describes the different use cases for a dma_resv object and
> > + * controls which fences are returned when queried.
>
> We need to link here to both dma_buf.resv and from there to here.
>
> Also we had a fair amount of text in the old dma_resv fields which should
> probably be included here.
>
> > + */
> > +enum dma_resv_usage {
> > + /**
> > +  * @DMA_RESV_USAGE_WRITE: Implicit write synchronization.
> > +  *
> > +  * This should only be used for userspace command submissions
> which add
> > +  * an implicit write dependency.
> > +  */
> > + DMA_RESV_USAGE_WRITE,
> > +
> > + /**
> > +  * @DMA_RESV_USAGE_READ: Implicit read synchronization.
> > +  *
> > +  * This should only be used for userspace command submissions
> which add
> > +  * an implicit read dependency.
>
> I think the above would benefit from at least a link each to &dma_buf.resv
> for further discusion.
>
> Plus the READ flag needs a huge warning that in general it does _not_
> guarantee that neither there's no writes possible, nor that the writes can
> be assumed mistakes and dropped (on buffer moves e.g.).
>
> Drivers can only make further assumptions for driver-internal dma_resv
> objects (e.g. on vm/pagetables) or when the fences are all fences of the
> same driver (e.g. the special sync rules amd has that takes the fence
> owner into account).
>
> We have this documented in the dma_buf.resv rules, but since it came up
> again in a discussion with Thomas H. somewhere, it's better to hammer this
> in a few more time. Specically in generally ignoring READ fences for
> buffer moves (well the copy job, memory freeing still has to wait for all
> of them) is a correctness bug.
>
> Maybe include a big warning that really the difference between READ and
> WRITE should only matter for implicit sync, and _not_ for anything else
> the kernel does.
>
> I'm assuming the actual replacement is all mechanical, so I skipped that
> one for now, that's for next year :-)
> -Daniel
>
> > +  */
> > + DMA_RESV_USAGE_READ,
> > +};
> > +
> > +/**
> > + * dma_resv_usage_rw - helper for implicit sync
> > + * @write: true if we create a new implicit sync write
> > + *
> > + * This returns the implicit synchronization usage for write or read
> accesses,
> > + * see enum dma_resv_usage.
> > + */
> > +static inline enum dma_resv_usage dma_resv_usage_rw(bool write)
> > +{
> > + /* This looks confusing at first sight, but is indeed correct.
> > +  *
> > +  * The rationale is that new write operations need to wait for the
> > +  * existing read and write operations to finish.
> > +  * But a new read operation only needs to wait for the existing
> write
> > +  * operations to 
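
The helper body is truncated in the quote above; a sketch consistent
with that comment (and with what appears to have landed upstream) is
simply the inversion:

    static inline enum dma_resv_usage dma_resv_usage_rw(bool write)
    {
            /* A new write must wait for existing reads and writes, so
             * query with READ (which, per the ordering, includes
             * WRITE); a new read only waits for existing writes.
             */
            return write ? DMA_RESV_USAGE_READ : DMA_RESV_USAGE_WRITE;
    }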

Re: [PATCH 04/24] dma-buf: add dma_resv_get_singleton v2

2022-03-03 Thread Jason Ekstrand
On Mon, Jan 17, 2022 at 5:26 AM Christian König <
ckoenig.leichtzumer...@gmail.com> wrote:

> > On 14.01.22 at 17:31, Daniel Vetter wrote:
> > On Mon, Jan 03, 2022 at 12:13:41PM +0100, Christian König wrote:
> >> On 22.12.21 at 22:21, Daniel Vetter wrote:
> >>> On Tue, Dec 07, 2021 at 01:33:51PM +0100, Christian König wrote:
>  Add a function to simplify getting a single fence for all the fences
> in
>  the dma_resv object.
> 
>  v2: fix ref leak in error handling
> 
>  Signed-off-by: Christian König 
>  ---
> drivers/dma-buf/dma-resv.c | 52
> ++
> include/linux/dma-resv.h   |  2 ++
> 2 files changed, 54 insertions(+)
> 
>  diff --git a/drivers/dma-buf/dma-resv.c b/drivers/dma-buf/dma-resv.c
>  index 480c305554a1..694716a3d66d 100644
>  --- a/drivers/dma-buf/dma-resv.c
>  +++ b/drivers/dma-buf/dma-resv.c
>  @@ -34,6 +34,7 @@
>  */
> #include 
>  +#include 
> #include 
> #include 
> #include 
>  @@ -657,6 +658,57 @@ int dma_resv_get_fences(struct dma_resv *obj,
> bool write,
> }
> EXPORT_SYMBOL_GPL(dma_resv_get_fences);
>  +/**
>  + * dma_resv_get_singleton - Get a single fence for all the fences
>  + * @obj: the reservation object
>  + * @write: true if we should return all fences
>  + * @fence: the resulting fence
>  + *
>  + * Get a single fence representing all the fences inside the resv
> object.
>  + * Returns either 0 for success or -ENOMEM.
>  + *
>  + * Warning: This can't be used like this when adding the fence back
> to the resv
>  + * object since that can lead to stack corruption when finalizing the
>  + * dma_fence_array.
> >>> Uh I don't get this one? I thought the only problem with nested fences
> is
> >>> the signalling recursion, which we work around with the irq_work?
> >> Nope, the main problem is finalizing the dma_fence_array.
> >>
> >> E.g. imagine that you build up a chain of dma_fence_array objects like
> this:
> >> a<-b<-c<-d<-e<-f.
> >>
> >> With each one referencing the previous dma_fence_array and then you call
> >> dma_fence_put() on the last one. That in turn will cause calling
> >> dma_fence_put() on the previous one, which in turn will cause
> >> dma_fence_put() one the one before the previous one etc
> >>
> >> In other words you recurse because each dma_fence_array instance drops
> the
> >> last reference of its predecessor.
> >>
> >> What we could do is to delegate dropping the reference to the containing
> >> fences in a dma_fence_array as well, but that would require some
> changes to
> the irq_work_run_list() function to be halfway efficient.
> >>
> >>> Also if there's really an issue with dma_fence_array fences, then that
> >>> warning should be on the dma_resv kerneldoc, not somewhere hidden like
> >>> this. And finally I really don't see what can go wrong, sure we'll end
> up
> >>> with the same fence once in the dma_resv_list and then once more in the
> >>> fence array. But they're all refcounted, so really shouldn't matter.
> >>>
> >>> The code itself looks correct, but me not understanding what even goes
> >>> wrong here freaks me out a bit.
> >> Yeah, IIRC we already discussed that with Jason at length as well.
> >>
> >> Essentially what you can't do is to put a dma_fence_array into another
> >> dma_fence_array without causing issues.
> >>
> >> So I think we should maybe just add a WARN_ON() into
> dma_fence_array_init()
> >> to make sure that this never happens.
> > Yeah I think this would be much clearer instead of sprinkling half the
> > story as a scary warning over all kinds of users which
> > internally use dma fence arrays.
>

Agreed.  WARN_ON in dma_fence_array_init() would be better for everyone, I
think.
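
A minimal sketch of that guard, assuming it sits at the top of
dma_fence_array_init() (dma_fence_is_array() already exists for
exactly this kind of check):

    /* Reject nesting up front: a dma_fence_array holding another
     * dma_fence_array is what sets up the recursive dma_fence_put()
     * chain described above.
     */
    for (i = 0; i < num_fences; ++i)
            WARN_ON(dma_fence_is_array(fences[i]));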


> > And then if it goes boom I guess we could fix it internally in
> > dma_fence_array_init by flattening fences down again. But only if
> actually
> > needed.
>
> Ok, going to do that first then.
>

Sounds good.  This patch looks pretty reasonable to me.  I do have a bit of
a concern with how it's being used to replace calls to
dma_resv_excl_fence() in later patches, though.  In particular, this may
allocate memory whereas dma_resv_excl_fence() does not so we need to be
really careful in each of the replacements that doing so is safe.  That's a
job for the per-driver reviewers but I thought I'd drop a note here so
we're all aware of and watching for it.

--Jason


> >
> > What confused me is why dma_resv is special, and from your reply it
> sounds
> > like it really isn't.
>
> Well, it isn't special in any way. It's just something very obvious
> which could go wrong.
>
> Regards,
> Christian.
>
> > -Daniel
> >
> >
> >> Regards,
> >> Christian.
> >>
> >>> I guess something to figure out next year, I kinda hoped I could
> squeeze a
> >>> review in before I disappear :-/
> >>> -Daniel
> >>>
>  + */
>  +int 

Re: [PATCH 20/24] dma-buf: add DMA_RESV_USAGE_KERNEL

2022-03-03 Thread Jason Ekstrand
On Wed, Dec 22, 2021 at 4:05 PM Daniel Vetter  wrote:

> On Tue, Dec 07, 2021 at 01:34:07PM +0100, Christian König wrote:
> > Add a usage for kernel submissions. Waiting for those
> > is mandatory for dynamic DMA-bufs.
> >
> > Signed-off-by: Christian König 
>
> Again just skipping to the doc bikeshedding, maybe with more cc others
> help with some code review too.
>
> >  EXPORT_SYMBOL(ib_umem_dmabuf_map_pages);
> > diff --git a/include/linux/dma-resv.h b/include/linux/dma-resv.h
> > index 4f3a6abf43c4..29d71496 100644
> > --- a/include/linux/dma-resv.h
> > +++ b/include/linux/dma-resv.h
> > @@ -54,8 +54,30 @@ struct dma_resv_list;
> >   *
> >   * This enum describes the different use cases for a dma_resv object and
> >   * controls which fences are returned when queried.
> > + *
> > + * An important fact is that there is the order KERNEL<WRITE<READ and
> > + * when the dma_resv object is asked for fences for one use case the
> > + * fences for the lower use case are returned as well.
> > + *
> > + * For example when asking for WRITE fences then the KERNEL fences are
> > + * returned as well. Similar when asked for READ fences then both WRITE
> > + * and KERNEL fences are returned as well.
> >   */
> >  enum dma_resv_usage {
> > + /**
> > +  * @DMA_RESV_USAGE_KERNEL: For in kernel memory management only.
> > +  *
> > +  * This should only be used for things like copying or clearing
> memory
> > +  * with a DMA hardware engine for the purpose of kernel memory
> > +  * management.
> > +  *
> > + * Drivers *always* need to wait for those fences before
> accessing the
>

super-nit: Your whitespace is wrong here.


> s/need to/must/ to stay with usual RFC wording. It's a hard requirement or
> there's a security bug somewhere.
>

Yeah, probably.  I like *must* but that's because that's what we use in the
VK spec.  Do whatever's usual for kernel docs.

Not sure where to put this comment but I feel like the way things are
framed is a bit the wrong way around.  Specifically, I don't think we
should be talking about what fences you must wait on so much as what fences
you can safely skip.  In the previous model, the exclusive fence had to be
waited on at all times and the shared fences could be skipped unless you
were doing something that would result in a new exclusive fence.  In this
new world of "it's just a bucket of fences", we need to be very sure the
waiting is happening on the right things.  It sounds (I could be wrong)
like USAGE_KERNEL is the new exclusive fence.  If so, we need to make it
virtually impossible to ignore.

Sorry if that's a bit of a ramble.  I think what I'm saying is this:  In
whatever helpers or iterators we have, be that get_singleton or iter_begin
or whatever, we need to be sure we specify things in terms of exclusion and
not inclusion.  "Give me everything except implicit sync read fences"
rather than "give me implicit sync write fences".  If having a single,
well-ordered enum is sufficient for that, great.  If we think we'll ever
end up with something other than a strict ordering, we may need to re-think
a bit.
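
In code terms, the difference between the two framings might look like
this (both helpers are hypothetical, purely to illustrate the point):

    /* Inclusive: under-waits if a new usage class shows up later. */
    fences = get_fences_matching(resv, DMA_RESV_USAGE_WRITE);

    /* Exclusive: waits on everything except what is provably safe to
     * skip, so unknown future usage classes fail safe.
     */
    fences = get_fences_excluding(resv, DMA_RESV_USAGE_READ);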

Concerning well-ordering... I'm a bit surprised to only see three values
here.  I expected 4:

 - kernel exclusive, used for memory moves and the like
 - kernel shared, used for "I'm using this right now, don't yank it out
from under me" which may not have any implicit sync implications whatsoever
 - implicit sync write
 - implicit sync read

If we had those four, I don't think the strict ordering works anymore.
From the POV of implicit sync, they would look at the implicit sync
read/write fences and maybe not even kernel exclusive.  From the POV of
some doing a BO move, they'd look at all of them.  From the POV of holding
on to memory while Vulkan is using it, you want to set a kernel shared
fence but it doesn't need to interact with implicit sync at all.  Am I
missing something obvious here?
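
Spelled out, that hypothetical four-value layout (not what the series
proposes) would be something like:

    enum dma_resv_usage {
            DMA_RESV_USAGE_KERNEL_EXCL,   /* memory moves and the like */
            DMA_RESV_USAGE_KERNEL_SHARED, /* "don't yank this out from
                                           * under me"; no implicit sync */
            DMA_RESV_USAGE_WRITE,         /* implicit sync write */
            DMA_RESV_USAGE_READ,          /* implicit sync read */
    };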

--Jason



> > +  * resource protected by the dma_resv object. The only exception
> for
> > +  * that is when the resource is known to be locked down in place by
> > +  * pinning it previously.
>
> Is this true? This sounds more confusing than helpful, because afaik in
> general our pin interfaces do not block for any kernel fences. dma_buf_pin
> doesn't do that for sure. And I don't think ttm does that either.
>
> I think the only safe thing here is to state that it's safe if a) the
> resource is pinned down and b) the callers has previously waited for the
> kernel fences.
>
> I also think we should put that wait for kernel fences into dma_buf_pin(),
> but that's maybe a later patch.
> -Daniel
>
>
>
> > +  */
> > + DMA_RESV_USAGE_KERNEL,
> > +
> >   /**
> >* @DMA_RESV_USAGE_WRITE: Implicit write synchronization.
> >*
> > --
> > 2.25.1
> >
>
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch
>


Re: [PATCH 07/27] Revert "drm/i915/gt: Propagate change in error status to children on unhold"

2021-08-20 Thread Jason Ekstrand
On Thu, Aug 19, 2021 at 1:22 AM Matthew Brost  wrote:
>
> Propagating errors to dependent fences is wrong, don't do it. A selftest
> in the following exposed the propagating of an error to a dependent
> fence after an engine reset.

I feel like we could still have a bit of a better message.  Maybe
something like this:

Propagating errors to dependent fences is broken and can lead to
errors from one client ending up in another.  In 3761baae908a (Revert
"drm/i915: Propagate errors on awaiting already signaled fences"), we
attempted to get rid of fence error propagation but missed the case
added in 8e9f84cf5cac ("drm/i915/gt: Propagate change in error status
to children on unhold").  Revert that one too.  This error was found
by an up-and-coming selftest which .

Otherwise, looks good to me.

--Jason

>
> This reverts commit 8e9f84cf5cac248a1c6a5daa4942879c8b765058.
>
> v2:
>  (Daniel Vetter)
>   - Use revert
>
> References: 3761baae908a (Revert "drm/i915: Propagate errors on awaiting 
> already signaled fences")
> Signed-off-by: Matthew Brost 
> ---
>  drivers/gpu/drm/i915/gt/intel_execlists_submission.c | 4 
>  1 file changed, 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c 
> b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> index de5f9c86b9a4..cafb0608ffb4 100644
> --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> @@ -2140,10 +2140,6 @@ static void __execlists_unhold(struct i915_request *rq)
> if (p->flags & I915_DEPENDENCY_WEAK)
> continue;
>
> -   /* Propagate any change in error status */
> -   if (rq->fence.error)
> -   i915_request_set_error_once(w, 
> rq->fence.error);
> -
> if (w->engine != rq->engine)
> continue;
>
> --
> 2.32.0
>


Re: [PATCH] drm: Fix drm.h uapi header for Windows

2021-08-17 Thread Jason Ekstrand
I'd really like this for Mesa so we can pull drm_fourcc.h into common
WSI code.  Why has it stalled?

--Jason

On Tue, Dec 8, 2020 at 2:33 AM James Park  wrote:
>
> I updated the patch earlier today incorporating the suggestions. I also had 
> to bring back "#include " to drm.h because there's some sanity 
> check that fails, as if it doesn't scan past the first level of #includes.
>
> - James
>
> On Mon, Dec 7, 2020 at 3:14 AM Pekka Paalanen  wrote:
>>
>> On Mon, 07 Dec 2020 10:53:49 +
>> Simon Ser  wrote:
>>
>> > On Monday, December 7th, 2020 at 11:49 AM, James Park 
>> >  wrote:
>> >
>> > > That would work, but that's kind of an annoying requirement. I would
>> > > prefer the header to be self-sufficient.
>> >
>> > I don't want to make it more confusing than before, but here Pekka (I
>> > think) suggests to replace this:
>> >
>> > diff --git a/include/uapi/drm/drm_fourcc.h b/include/uapi/drm/drm_fourcc.h
>> > index 82f3278..5eb07a5 100644
>> > --- a/include/uapi/drm/drm_fourcc.h
>> > +++ b/include/uapi/drm/drm_fourcc.h
>> > @@ -24,7 +24,11 @@
>> >  #ifndef DRM_FOURCC_H
>> >  #define DRM_FOURCC_H
>> >
>> > +#ifdef DRM_FOURCC_STANDALONE
>> > +#include "drm_basic_types.h"
>> > +#else
>> >  #include "drm.h"
>> > +#endif
>> >
>> >  #if defined(__cplusplus)
>> >  extern "C" {
>> >
>> > With this:
>> >
>> > diff --git a/include/uapi/drm/drm_fourcc.h b/include/uapi/drm/drm_fourcc.h
>> > index 82f3278..5eb07a5 100644
>> > --- a/include/uapi/drm/drm_fourcc.h
>> > +++ b/include/uapi/drm/drm_fourcc.h
>> > @@ -24,7 +24,11 @@
>> >  #ifndef DRM_FOURCC_H
>> >  #define DRM_FOURCC_H
>> >
>> > +#include "drm_basic_types.h"
>> > +
>> > +#ifndef DRM_FOURCC_STANDALONE
>> >  #include "drm.h"
>> > +#endif
>> >
>> >  #if defined(__cplusplus)
>> >  extern "C" {
>> >
>> > That wouldn't change whether the header is self-sufficient or not,
>> > would it?
>>
>> Exactly this.
>>
>> This communicates properly that DRM_FOURCC_STANDALONE only affects
>> whether drm.h gets pulled in or not, and there are no other effects.
>>
>> This also makes testing better: when you unconditionally include
>> drm_basic_types.h, you are more likely to catch breakage there.
>>
>> For functionality, it makes no difference. Whether userspace does
>>
>> #include "drm.h"
>> #define DRM_FOURCC_STANDALONE
>> #include "drm_fourcc.h"
>>
>> or
>>
>> #define DRM_FOURCC_STANDALONE
>> #include "drm_fourcc.h"
>> #include "drm.h"
>>
>> the result must always be good.
>>
>>
>> Thanks,
>> pq
>
> ___
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH 1/1] drm: ttm: Don't bail from ttm_global_init if debugfs_create_dir fails

2021-08-16 Thread Jason Ekstrand
Makes sense

Reviewed-by: Jason Ekstrand 

On Mon, Aug 16, 2021 at 2:40 AM Christian König
 wrote:
>
> On 10.08.21 at 21:59, Dan Moulding wrote:
> > In 69de4421bb4c ("drm/ttm: Initialize debugfs from
> > ttm_global_init()"), ttm_global_init was changed so that if creation
> > of the debugfs global root directory fails, ttm_global_init will bail
> > out early and return an error, leading to initialization failure of
> > DRM drivers. However, not every system will be using debugfs. On such
> > a system, debugfs directory creation can be expected to fail, but DRM
> > drivers must still be usable. This changes it so that if creation of
> > TTM's debugfs root directory fails, then no biggie: keep calm and
> > carry on.
> >
> > Fixes: 69de4421bb4c ("drm/ttm: Initialize debugfs from ttm_global_init()")
> > Signed-off-by: Dan Moulding 
>
> Good point, patch is Reviewed-by: Christian König
> .
>
> Going to pick that up later today.
>
> Regards,
> Christian.
>
> > ---
> >   drivers/gpu/drm/ttm/ttm_device.c | 2 --
> >   1 file changed, 2 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/ttm/ttm_device.c 
> > b/drivers/gpu/drm/ttm/ttm_device.c
> > index 74e3b460132b..2df59b3c2ea1 100644
> > --- a/drivers/gpu/drm/ttm/ttm_device.c
> > +++ b/drivers/gpu/drm/ttm/ttm_device.c
> > @@ -78,9 +78,7 @@ static int ttm_global_init(void)
> >
> >   ttm_debugfs_root = debugfs_create_dir("ttm", NULL);
> >   if (IS_ERR(ttm_debugfs_root)) {
> > - ret = PTR_ERR(ttm_debugfs_root);
> >   ttm_debugfs_root = NULL;
> > - goto out;
> >   }
> >
> >   /* Limit the number of pages in the pool to about 50% of the total
>


Re: [PATCH 2/2] drm/i915: Add pci ids and uapi for DG1

2021-08-12 Thread Jason Ekstrand
On Thu, Aug 12, 2021 at 9:49 AM Daniel Vetter  wrote:
>
> On Thu, Aug 12, 2021 at 2:44 PM Maarten Lankhorst
>  wrote:
> >
> > DG1 has support for local memory, which requires the usage of the
> > lmem placement extension for creating bo's, and memregion queries
> > to obtain the size. Because of this, those parts of the uapi are
> > no longer guarded behind FAKE_LMEM.
> >
> > According to the pull request referenced below, mesa should be mostly
> > ready for DG1. VK_EXT_memory_budget is not hooked up yet, but we
> > should definitely just enable the uapi parts by default.
> >
> > Signed-off-by: Maarten Lankhorst 
> > References: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11584
> > Cc: Jordan Justen jordan.l.jus...@intel.com
> > Cc: Jason Ekstrand ja...@jlekstrand.net
>
> Acked-by: Daniel Vetter 

Acked-by: Jason Ekstrand 

>
> > ---
> >  drivers/gpu/drm/i915/gem/i915_gem_create.c | 3 ---
> >  drivers/gpu/drm/i915/i915_pci.c| 1 +
> >  drivers/gpu/drm/i915/i915_query.c  | 3 ---
> >  3 files changed, 1 insertion(+), 6 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_create.c 
> > b/drivers/gpu/drm/i915/gem/i915_gem_create.c
> > index 23fee13a3384..1d341b8c47c0 100644
> > --- a/drivers/gpu/drm/i915/gem/i915_gem_create.c
> > +++ b/drivers/gpu/drm/i915/gem/i915_gem_create.c
> > @@ -347,9 +347,6 @@ static int ext_set_placements(struct 
> > i915_user_extension __user *base,
> >  {
> > struct drm_i915_gem_create_ext_memory_regions ext;
> >
> > -   if (!IS_ENABLED(CONFIG_DRM_I915_UNSTABLE_FAKE_LMEM))
> > -   return -ENODEV;
> > -
> > if (copy_from_user(&ext, base, sizeof(ext)))
> > return -EFAULT;
> >
> > diff --git a/drivers/gpu/drm/i915/i915_pci.c 
> > b/drivers/gpu/drm/i915/i915_pci.c
> > index 1bbd09ad5287..93ccdc6bbd03 100644
> > --- a/drivers/gpu/drm/i915/i915_pci.c
> > +++ b/drivers/gpu/drm/i915/i915_pci.c
> > @@ -1115,6 +1115,7 @@ static const struct pci_device_id pciidlist[] = {
> > INTEL_RKL_IDS(_info),
> > INTEL_ADLS_IDS(_s_info),
> > INTEL_ADLP_IDS(_p_info),
> > +   INTEL_DG1_IDS(_info),
> > {0, 0, 0}
> >  };
> >  MODULE_DEVICE_TABLE(pci, pciidlist);
> > diff --git a/drivers/gpu/drm/i915/i915_query.c 
> > b/drivers/gpu/drm/i915/i915_query.c
> > index e49da36c62fb..5e2b909827f4 100644
> > --- a/drivers/gpu/drm/i915/i915_query.c
> > +++ b/drivers/gpu/drm/i915/i915_query.c
> > @@ -432,9 +432,6 @@ static int query_memregion_info(struct drm_i915_private 
> > *i915,
> > u32 total_length;
> > int ret, id, i;
> >
> > -   if (!IS_ENABLED(CONFIG_DRM_I915_UNSTABLE_FAKE_LMEM))
> > -   return -ENODEV;
> > -
> > if (query_item->flags != 0)
> > return -EINVAL;
> >
> > --
> > 2.32.0
> >
>
>
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch


[PATCH 2/2] drm/ttm: Include pagemap.h from ttm_tt.h

2021-08-12 Thread Jason Ekstrand
It's needed for pgprot_t which is used in the header.

Signed-off-by: Jason Ekstrand 
Cc: Christian König 
---
 drivers/gpu/drm/ttm/ttm_tt.c | 1 -
 include/drm/ttm/ttm_tt.h | 1 +
 2 files changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c
index 24031a8acd2d..d5cd8b5dc0bf 100644
--- a/drivers/gpu/drm/ttm/ttm_tt.c
+++ b/drivers/gpu/drm/ttm/ttm_tt.c
@@ -32,7 +32,6 @@
 #define pr_fmt(fmt) "[TTM] " fmt
 
 #include 
-#include <linux/pagemap.h>
 #include 
 #include 
 #include 
diff --git a/include/drm/ttm/ttm_tt.h b/include/drm/ttm/ttm_tt.h
index 0d97967bf955..b20e89d321b0 100644
--- a/include/drm/ttm/ttm_tt.h
+++ b/include/drm/ttm/ttm_tt.h
@@ -27,6 +27,7 @@
 #ifndef _TTM_TT_H_
 #define _TTM_TT_H_
 
+#include <linux/pagemap.h>
 #include 
 #include 
 #include 
-- 
2.31.1



[PATCH 1/2] drm/ttm: ttm_bo_device is now ttm_device

2021-08-12 Thread Jason Ekstrand
These names were changed in

commit 8af8a109b34fa88b8b91f25d11485b37d37549c3
Author: Christian König 
Date:   Thu Oct 1 14:51:40 2020 +0200

drm/ttm: device naming cleanup

But he missed a couple of them.

Signed-off-by: Jason Ekstrand 
Cc: Christian König 
Fixes: 8af8a109b34f ("drm/ttm: device naming cleanup")
---
 Documentation/gpu/drm-mm.rst | 2 +-
 include/drm/ttm/ttm_tt.h | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/Documentation/gpu/drm-mm.rst b/Documentation/gpu/drm-mm.rst
index d5a73fa2c9ef..8126beadc7df 100644
--- a/Documentation/gpu/drm-mm.rst
+++ b/Documentation/gpu/drm-mm.rst
@@ -37,7 +37,7 @@ TTM initialization
 This section is outdated.
 
 Drivers wishing to support TTM must pass a filled :c:type:`ttm_bo_driver
-` structure to ttm_bo_device_init, together with an
+` structure to ttm_device_init, together with an
 initialized global reference to the memory manager.  The ttm_bo_driver
 structure contains several fields with function pointers for
 initializing the TTM, allocating and freeing memory, waiting for command
diff --git a/include/drm/ttm/ttm_tt.h b/include/drm/ttm/ttm_tt.h
index 818680c6a8ed..0d97967bf955 100644
--- a/include/drm/ttm/ttm_tt.h
+++ b/include/drm/ttm/ttm_tt.h
@@ -31,7 +31,7 @@
 #include 
 #include 
 
-struct ttm_bo_device;
+struct ttm_device;
 struct ttm_tt;
 struct ttm_resource;
 struct ttm_buffer_object;
-- 
2.31.1



Re: [PATCH] drm/i915: Use locked access to ctx->engines in set_priority

2021-08-12 Thread Jason Ekstrand
On Tue, Aug 10, 2021 at 8:05 AM Daniel Vetter  wrote:
>
> This essentially reverts
>
> commit 89ff76bf9b3b0b86e6bbe344bd6378d8661303fc
> Author: Chris Wilson 
> Date:   Thu Apr 2 13:42:18 2020 +0100
>
> drm/i915/gem: Utilize rcu iteration of context engines
>
> Note that the other uses of __context_engines_await have disappeared in
> the following commits:
>
> ccbc1b97948a ("drm/i915/gem: Don't allow changing the VM on running contexts 
> (v4)")
> c7a71fc8ee04 ("drm/i915: Drop getparam support for 
> I915_CONTEXT_PARAM_ENGINES")
> 4a766ae40ec8 ("drm/i915: Drop the CONTEXT_CLONE API (v2)")
>
> None of these have any business to optimize their engine lookup with
> rcu, unless extremely convincing benchmark data and a solid analysis
> why we can't make that workload (whatever it is that does) faster with
> a proper design fix.
>
> Also since there's only one caller of context_apply_all left and it's
> really just a loop, inline it and then inline the loop body too. This
> is how all other callers that take the engine lock loop over engines,
> it's much simpler.
>
> Signed-off-by: Daniel Vetter 
> Cc: Chris Wilson 
> Cc: Mika Kuoppala 
> Cc: Daniel Vetter 
> Cc: Jason Ekstrand 
> Cc: Tvrtko Ursulin 
> Cc: Joonas Lahtinen 
> Cc: Matthew Brost 
> ---
>  drivers/gpu/drm/i915/gem/i915_gem_context.c | 72 -
>  1 file changed, 14 insertions(+), 58 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c 
> b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> index dbaeb924a437..fd169cf2f75a 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> @@ -1284,49 +1284,6 @@ static int __context_set_persistence(struct 
> i915_gem_context *ctx, bool state)
> return 0;
>  }
>
> -static inline struct i915_gem_engines *
> -__context_engines_await(const struct i915_gem_context *ctx,
> -   bool *user_engines)
> -{
> -   struct i915_gem_engines *engines;
> -
> -   rcu_read_lock();
> -   do {
> -   engines = rcu_dereference(ctx->engines);
> -   GEM_BUG_ON(!engines);
> -
> -   if (user_engines)
> -   *user_engines = i915_gem_context_user_engines(ctx);
> -
> -   /* successful await => strong mb */
> -   if (unlikely(!i915_sw_fence_await(&engines->fence)))

Ugh... The first time I looked at this I thought the SW fence meant it
was actually waiting on something.  But, no, it's just making sure the
engines object still exists.  *sigh*  Burn it!

Reviewed-by: Jason Ekstrand 

> -   continue;
> -
> -   if (likely(engines == rcu_access_pointer(ctx->engines)))
> -   break;
> -
> -   i915_sw_fence_complete(&engines->fence);
> -   } while (1);
> -   rcu_read_unlock();
> -
> -   return engines;
> -}
> -
> -static void
> -context_apply_all(struct i915_gem_context *ctx,
> - void (*fn)(struct intel_context *ce, void *data),
> - void *data)
> -{
> -   struct i915_gem_engines_iter it;
> -   struct i915_gem_engines *e;
> -   struct intel_context *ce;
> -
> -   e = __context_engines_await(ctx, NULL);
> -   for_each_gem_engine(ce, e, it)
> -   fn(ce, data);
> -   i915_sw_fence_complete(&e->fence);
> -}
> -
>  static struct i915_gem_context *
>  i915_gem_create_context(struct drm_i915_private *i915,
> const struct i915_gem_proto_context *pc)
> @@ -1776,23 +1733,11 @@ set_persistence(struct i915_gem_context *ctx,
> return __context_set_persistence(ctx, args->value);
>  }
>
> -static void __apply_priority(struct intel_context *ce, void *arg)
> -{
> -   struct i915_gem_context *ctx = arg;
> -
> -   if (!intel_engine_has_timeslices(ce->engine))
> -   return;
> -
> -   if (ctx->sched.priority >= I915_PRIORITY_NORMAL &&
> -   intel_engine_has_semaphores(ce->engine))
> -   intel_context_set_use_semaphores(ce);
> -   else
> -   intel_context_clear_use_semaphores(ce);
> -}
> -
>  static int set_priority(struct i915_gem_context *ctx,
> const struct drm_i915_gem_context_param *args)
>  {
> +   struct i915_gem_engines_iter it;
> +   struct intel_context *ce;
> int err;
>
> err = validate_priority(ctx->i915, args);
> @@ -1800,7 +1745,18 @@ static int set_priority(struct i915_gem_context *ctx,
> return err;
>
> ctx->sched.pri

Re: [PATCH] drm/doc/rfc: drop lmem uapi section

2021-08-10 Thread Jason Ekstrand
Acked-by: Jason Ekstrand 

On Tue, Aug 10, 2021 at 7:34 AM Daniel Vetter  wrote:
>
> We still have quite a bit more work to do with overall reworking of
> the ttm-based dg1 code, but the uapi stuff is now finalized with the
> latest pull. So remove that.
>
> This also fixes kerneldoc build warnings because we've included the
> same headers in two places, resulting in sphinx complaining about
> duplicated symbols. This regression has been created when we moved the
> uapi definitions to the real include/uapi/ folder in 727ecd99a4c9
> ("drm/doc/rfc: drop the i915_gem_lmem.h header")
>
> Reported-by: Stephen Rothwell 
> Cc: Stephen Rothwell 
> Fixes: 727ecd99a4c9 ("drm/doc/rfc: drop the i915_gem_lmem.h header")
> Cc: Matthew Auld 
> Signed-off-by: Daniel Vetter 
> ---
>  Documentation/gpu/rfc/i915_gem_lmem.rst | 107 
>  1 file changed, 107 deletions(-)
>
> diff --git a/Documentation/gpu/rfc/i915_gem_lmem.rst 
> b/Documentation/gpu/rfc/i915_gem_lmem.rst
> index 675ba8620d66..91be041e68cc 100644
> --- a/Documentation/gpu/rfc/i915_gem_lmem.rst
> +++ b/Documentation/gpu/rfc/i915_gem_lmem.rst
> @@ -22,110 +22,3 @@ real, with all the uAPI bits is:
>  * SET/GET ioctl caching(see `I915 SET/GET CACHING`_)
>  * Send RFC(with mesa-dev on cc) for final sign off on the uAPI
>  * Add pciid for DG1 and turn on uAPI for real
> -
> -New object placement and region query uAPI
> -==
> -Starting from DG1 we need to give userspace the ability to allocate buffers 
> from
> -device local-memory. Currently the driver supports gem_create, which can 
> place
> -buffers in system memory via shmem, and the usual assortment of other
> -interfaces, like dumb buffers and userptr.
> -
> -To support this new capability, while also providing a uAPI which will work
> -beyond just DG1, we propose to offer three new bits of uAPI:
> -
> -DRM_I915_QUERY_MEMORY_REGIONS
> --
> -New query ID which allows userspace to discover the list of supported memory
> -regions(like system-memory and local-memory) for a given device. We identify
> -each region with a class and instance pair, which should be unique. The class
> -here would be DEVICE or SYSTEM, and the instance would be zero, on platforms
> -like DG1.
> -
> -Side note: The class/instance design is borrowed from our existing engine 
> uAPI,
> -where we describe every physical engine in terms of its class, and the
> -particular instance, since we can have more than one per class.
> -
> -In the future we also want to expose more information which can further
> -describe the capabilities of a region.
> -
> -.. kernel-doc:: include/uapi/drm/i915_drm.h
> -:functions: drm_i915_gem_memory_class 
> drm_i915_gem_memory_class_instance drm_i915_memory_region_info 
> drm_i915_query_memory_regions
> -
> -GEM_CREATE_EXT
> ---
> -New ioctl which is basically just gem_create but now allows userspace to 
> provide
> -a chain of possible extensions. Note that if we don't provide any extensions 
> and
> -set flags=0 then we get the exact same behaviour as gem_create.
> -
> -Side note: We also need to support PXP[1] in the near future, which is also
> -applicable to integrated platforms, and adds its own gem_create_ext 
> extension,
> -which basically lets userspace mark a buffer as "protected".
> -
> -.. kernel-doc:: include/uapi/drm/i915_drm.h
> -:functions: drm_i915_gem_create_ext
> -
> -I915_GEM_CREATE_EXT_MEMORY_REGIONS
> ---
> -Implemented as an extension for gem_create_ext, we would now allow userspace 
> to
> -optionally provide an immutable list of preferred placements at creation 
> time,
> -in priority order, for a given buffer object.  For the placements we expect
> -them each to use the class/instance encoding, as per the output of the 
> regions
> -query. Having the list in priority order will be useful in the future when
> -placing an object, say during eviction.
> -
> -.. kernel-doc:: include/uapi/drm/i915_drm.h
> -:functions: drm_i915_gem_create_ext_memory_regions
> -
> -One fair criticism here is that this seems a little over-engineered[2]. If we
> -just consider DG1 then yes, a simple gem_create.flags or something is totally
> -all that's needed to tell the kernel to allocate the buffer in local-memory 
> or
> -whatever. However looking to the future we need uAPI which can also support
> -upcoming Xe HP multi-tile architecture in a sane way, where there can be
> -multiple local-memory instances for a given device, and so using both class 
> and
> -instance in our uAPI to describe regions is desirable, a

Re: [PATCH] drm/i915: Release ctx->syncobj on final put, not on ctx close

2021-08-07 Thread Jason Ekstrand

On August 6, 2021 15:18:59 Daniel Vetter  wrote:


gem context refcounting is another exercise in least locking design it
seems, where most things get destroyed upon context closure (which can
race with anything really). Only the actual memory allocation and the
locks survive while holding a reference.

This tripped up Jason when reimplementing the single timeline feature
in

commit 00dae4d3d35d4f526929633b76e00b0ab4d3970d
Author: Jason Ekstrand 
Date:   Thu Jul 8 10:48:12 2021 -0500

   drm/i915: Implement SINGLE_TIMELINE with a syncobj (v4)

We could fix the bug by holding ctx->mutex, but it's cleaner to just


What bug is this fixing, exactly?

--Jason



make the context object actually invariant over its _entire_ lifetime.

Signed-off-by: Daniel Vetter 
Fixes: 00dae4d3d35d ("drm/i915: Implement SINGLE_TIMELINE with a syncobj (v4)")
Cc: Jason Ekstrand 
Cc: Chris Wilson 
Cc: Tvrtko Ursulin 
Cc: Joonas Lahtinen 
Cc: Matthew Brost 
Cc: Matthew Auld 
Cc: Maarten Lankhorst 
Cc: "Thomas Hellström" 
Cc: Lionel Landwerlin 
Cc: Dave Airlie 
---
drivers/gpu/drm/i915/gem/i915_gem_context.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c 
b/drivers/gpu/drm/i915/gem/i915_gem_context.c

index 754b9b8d4981..93ba0197d70a 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -940,6 +940,9 @@ void i915_gem_context_release(struct kref *ref)
 trace_i915_context_free(ctx);
 GEM_BUG_ON(!i915_gem_context_is_closed(ctx));

+ if (ctx->syncobj)
+ drm_syncobj_put(ctx->syncobj);
+
 mutex_destroy(&ctx->engines_mutex);
 mutex_destroy(&ctx->lut_mutex);

@@ -1159,9 +1162,6 @@ static void context_close(struct i915_gem_context *ctx)
 if (vm)
 i915_vm_close(vm);

- if (ctx->syncobj)
- drm_syncobj_put(ctx->syncobj);
-
 ctx->file_priv = ERR_PTR(-EBADF);

 /*
--
2.32.0




Re: [PATCH -next] drm/i915: fix i915_globals_exit() section mismatch error

2021-08-04 Thread Jason Ekstrand
On Wed, Aug 4, 2021 at 3:41 PM Randy Dunlap  wrote:
>
> Fix modpost Section mismatch error in i915_globals_exit().
> Since both an __init function and an __exit function can call
> i915_globals_exit(), any function that i915_globals_exit() calls
> should not be marked as __init or __exit. I.e., it needs to be
> available for either of them.
>
> WARNING: modpost: vmlinux.o(.text+0x8b796a): Section mismatch in reference 
> from the function i915_globals_exit() to the function 
> .exit.text:__i915_globals_flush()
> The function i915_globals_exit() references a function in an exit section.
> Often the function __i915_globals_flush() has valid usage outside the exit 
> section
> and the fix is to remove the __exit annotation of __i915_globals_flush.
>
> ERROR: modpost: Section mismatches detected.
> Set CONFIG_SECTION_MISMATCH_WARN_ONLY=y to allow them.

My gut says we actually want to back-port
https://lore.kernel.org/dri-devel/YPk3OCMrhg7UlU6T@phenom.ffwll.local/
instead.  Daniel, thoughts?

--Jason

>
> Fixes: 1354d830cb8f ("drm/i915: Call i915_globals_exit() if 
> pci_register_device() fails")
> Signed-off-by: Randy Dunlap 
> Cc: Jason Ekstrand 
> Cc: Daniel Vetter 
> Cc: Rodrigo Vivi 
> Cc: Jani Nikula 
> Cc: Joonas Lahtinen 
> Cc: intel-...@lists.freedesktop.org
> Cc: dri-devel@lists.freedesktop.org
> ---
>  drivers/gpu/drm/i915/i915_globals.c |2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> --- linext-2021-0804.orig/drivers/gpu/drm/i915/i915_globals.c
> +++ linext-2021-0804/drivers/gpu/drm/i915/i915_globals.c
> @@ -138,7 +138,7 @@ void i915_globals_unpark(void)
> atomic_inc();
>  }
>
> -static void __exit __i915_globals_flush(void)
> +static void  __i915_globals_flush(void)
>  {
> atomic_inc(); /* skip shrinking */
>
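
For anyone unfamiliar with this class of error, a minimal (unrelated)
illustration of the mismatch being fixed:

    #include <linux/init.h>

    static void __exit helper(void) { }    /* placed in .exit.text */

    static int __init my_init(void)
    {
            helper();  /* modpost: .init.text references .exit.text;
                        * .exit.text may be discarded entirely for
                        * built-in code, so the call could jump into
                        * freed/absent memory.
                        */
            return 0;
    }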


[PATCH] docs/drm: Add a new bullet to the uAPI requirements (v2)

2021-08-04 Thread Jason Ekstrand
While tracking down various bits of i915 uAPI, it's been difficult to
find the userspace much of the time because no one bothers to mention it
in commit messages.  Require the kernel patch to be a one-stop shop for
finding the various bits which were used to justify the new uAPI.

v2 (Daniel Vetter):
 - Minor wording tweaks

Signed-off-by: Jason Ekstrand 
Acked-by: Daniel Vetter 
Cc: Dave Airlie 
---
 Documentation/gpu/drm-uapi.rst | 5 +
 1 file changed, 5 insertions(+)

diff --git a/Documentation/gpu/drm-uapi.rst b/Documentation/gpu/drm-uapi.rst
index 199afb503ab1..7b398c6fadc6 100644
--- a/Documentation/gpu/drm-uapi.rst
+++ b/Documentation/gpu/drm-uapi.rst
@@ -109,6 +109,11 @@ leads to a few additional requirements:
   userspace patches land. uAPI always flows from the kernel, doing things the
   other way round risks divergence of the uAPI definitions and header files.
 
+- The kernel patch which adds the new uAPI **must** reference the patch series
+  or merge requests in the userspaces projects which demonstrate the use of the
+  new uAPI and against which the review was done so that future developers can
+  find all of the pieces which tie together.
+
 These are fairly steep requirements, but have grown out from years of shared
 pain and experience with uAPI added hastily, and almost always regretted about
 just as fast. GFX devices change really fast, requiring a paradigm shift and
-- 
2.31.1



Re: [PATCH] docs/drm: Add a new bullet to the uAPI requirements

2021-08-04 Thread Jason Ekstrand
On Wed, Aug 4, 2021 at 1:48 PM Jason Ekstrand  wrote:
>
> While tracking down various bits of i915 uAPI, it's been difficult to
> find the userspace much of the time because no one bothers to mention it
> in commit messages.  Require the kernel patch to be a one-stop shop for
> finding the various bits which were used to justify the new uAPI.
>
> Signed-off-by: Jason Ekstrand 
> Cc: Daniel Vetter 
> Cc: Dave Airlie 
> ---
>  Documentation/gpu/drm-uapi.rst | 5 +
>  1 file changed, 5 insertions(+)
>
> diff --git a/Documentation/gpu/drm-uapi.rst b/Documentation/gpu/drm-uapi.rst
> index 199afb503ab1..82f780bc3fce 100644
> --- a/Documentation/gpu/drm-uapi.rst
> +++ b/Documentation/gpu/drm-uapi.rst
> @@ -109,6 +109,11 @@ leads to a few additional requirements:
>userspace patches land. uAPI always flows from the kernel, doing things the
>other way round risks divergence of the uAPI definitions and header files.
>
> +- The kernel patch which adds the new uAPI **must** reference the patch 
> series
> +  or merge requests in the userspaces project which use the new uAPI and

Locally, I've done s/project which use/projects which demonstrate/

--Jason

> +  against which the review was done so that future developers can find all of
> +  the pieces which tie together.
> +
>  These are fairly steep requirements, but have grown out from years of shared
>  pain and experience with uAPI added hastily, and almost always regretted 
> about
>  just as fast. GFX devices change really fast, requiring a paradigm shift and
> --
> 2.31.1
>


[PATCH] docs/drm: Add a new bullet to the uAPI requirements

2021-08-04 Thread Jason Ekstrand
While tracking down various bits of i915 uAPI, it's been difficult to
find the userspace much of the time because no one bothers to mention it
in commit messages.  Require the kernel patch to be a one-stop shop for
finding the various bits which were used to justify the new uAPI.

Signed-off-by: Jason Ekstrand 
Cc: Daniel Vetter 
Cc: Dave Airlie 
---
 Documentation/gpu/drm-uapi.rst | 5 +
 1 file changed, 5 insertions(+)

diff --git a/Documentation/gpu/drm-uapi.rst b/Documentation/gpu/drm-uapi.rst
index 199afb503ab1..82f780bc3fce 100644
--- a/Documentation/gpu/drm-uapi.rst
+++ b/Documentation/gpu/drm-uapi.rst
@@ -109,6 +109,11 @@ leads to a few additional requirements:
   userspace patches land. uAPI always flows from the kernel, doing things the
   other way round risks divergence of the uAPI definitions and header files.
 
+- The kernel patch which adds the new uAPI **must** reference the patch series
+  or merge requests in the userspaces project which use the new uAPI and
+  against which the review was done so that future developers can find all of
+  the pieces which tie together.
+
 These are fairly steep requirements, but have grown out from years of shared
 pain and experience with uAPI added hastily, and almost always regretted about
 just as fast. GFX devices change really fast, requiring a paradigm shift and
-- 
2.31.1



Re: [PATCH 2/2] drm/i915: delete gpu reloc code

2021-08-03 Thread Jason Ekstrand
Both are

Reviewed-by: Jason Ekstrand 

On Tue, Aug 3, 2021 at 7:49 AM Daniel Vetter  wrote:
>
> It's already removed, this just garbage collects it all.
>
> v2: Rebase over s/GEN/GRAPHICS_VER/
>
> v3: Also ditch eb.reloc_pool and eb.reloc_context (Maarten)
>
> Signed-off-by: Daniel Vetter 
> Cc: Jon Bloomfield 
> Cc: Chris Wilson 
> Cc: Maarten Lankhorst 
> Cc: Daniel Vetter 
> Cc: Joonas Lahtinen 
> Cc: "Thomas Hellström" 
> Cc: Matthew Auld 
> Cc: Lionel Landwerlin 
> Cc: Dave Airlie 
> Cc: Jason Ekstrand 
> ---
>  .../gpu/drm/i915/gem/i915_gem_execbuffer.c| 360 +-
>  .../drm/i915/selftests/i915_live_selftests.h  |   1 -
>  2 files changed, 1 insertion(+), 360 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c 
> b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> index e4dc4c3b4df3..98e25efffb59 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> @@ -277,16 +277,8 @@ struct i915_execbuffer {
> bool has_llc : 1;
> bool has_fence : 1;
> bool needs_unfenced : 1;
> -
> -   struct i915_request *rq;
> -   u32 *rq_cmd;
> -   unsigned int rq_size;
> -   struct intel_gt_buffer_pool_node *pool;
> } reloc_cache;
>
> -   struct intel_gt_buffer_pool_node *reloc_pool; /** relocation pool for 
> -EDEADLK handling */
> -   struct intel_context *reloc_context;
> -
> u64 invalid_flags; /** Set of execobj.flags that are invalid */
>
> u64 batch_len; /** Length of batch within object */
> @@ -1035,8 +1027,6 @@ static void eb_release_vmas(struct i915_execbuffer *eb, 
> bool final)
>
>  static void eb_destroy(const struct i915_execbuffer *eb)
>  {
> -   GEM_BUG_ON(eb->reloc_cache.rq);
> -
> if (eb->lut_size > 0)
> kfree(eb->buckets);
>  }
> @@ -1048,14 +1038,6 @@ relocation_target(const struct 
> drm_i915_gem_relocation_entry *reloc,
> return gen8_canonical_addr((int)reloc->delta + target->node.start);
>  }
>
> -static void reloc_cache_clear(struct reloc_cache *cache)
> -{
> -   cache->rq = NULL;
> -   cache->rq_cmd = NULL;
> -   cache->pool = NULL;
> -   cache->rq_size = 0;
> -}
> -
>  static void reloc_cache_init(struct reloc_cache *cache,
>  struct drm_i915_private *i915)
>  {
> @@ -1068,7 +1050,6 @@ static void reloc_cache_init(struct reloc_cache *cache,
> cache->has_fence = cache->graphics_ver < 4;
> cache->needs_unfenced = INTEL_INFO(i915)->unfenced_needs_alignment;
> cache->node.flags = 0;
> -   reloc_cache_clear(cache);
>  }
>
>  static inline void *unmask_page(unsigned long p)
> @@ -1090,48 +1071,10 @@ static inline struct i915_ggtt *cache_to_ggtt(struct 
> reloc_cache *cache)
> return &i915->ggtt;
>  }
>
> -static void reloc_cache_put_pool(struct i915_execbuffer *eb, struct 
> reloc_cache *cache)
> -{
> -   if (!cache->pool)
> -   return;
> -
> -   /*
> -* This is a bit nasty, normally we keep objects locked until the end
> -* of execbuffer, but we already submit this, and have to unlock 
> before
> -* dropping the reference. Fortunately we can only hold 1 pool node at
> -* a time, so this should be harmless.
> -*/
> -   i915_gem_ww_unlock_single(cache->pool->obj);
> -   intel_gt_buffer_pool_put(cache->pool);
> -   cache->pool = NULL;
> -}
> -
> -static void reloc_gpu_flush(struct i915_execbuffer *eb, struct reloc_cache 
> *cache)
> -{
> -   struct drm_i915_gem_object *obj = cache->rq->batch->obj;
> -
> -   GEM_BUG_ON(cache->rq_size >= obj->base.size / sizeof(u32));
> -   cache->rq_cmd[cache->rq_size] = MI_BATCH_BUFFER_END;
> -
> -   i915_gem_object_flush_map(obj);
> -   i915_gem_object_unpin_map(obj);
> -
> -   intel_gt_chipset_flush(cache->rq->engine->gt);
> -
> -   i915_request_add(cache->rq);
> -   reloc_cache_put_pool(eb, cache);
> -   reloc_cache_clear(cache);
> -
> -   eb->reloc_pool = NULL;
> -}
> -
>  static void reloc_cache_reset(struct reloc_cache *cache, struct 
> i915_execbuffer *eb)
>  {
> void *vaddr;
>
> -   if (cache->rq)
> -   reloc_gpu_flush(eb, cache);
> -
> if (!cache->vaddr)
> return;
>
> @@ -1313,295 +1256,6 @@ static void clflush_write32(u32 *addr, u32 value, 
> unsigne

Re: [Intel-gfx] [PATCH] drm/i915/userptr: Probe existence of backing struct pages upon creation

2021-08-03 Thread Jason Ekstrand
On Tue, Aug 3, 2021 at 10:09 AM Daniel Vetter  wrote:
> On Wed, Jul 28, 2021 at 4:22 PM Matthew Auld
>  wrote:
> >
> > On Mon, 26 Jul 2021 at 17:10, Tvrtko Ursulin
> >  wrote:
> > >
> > >
> > > On 26/07/2021 16:14, Jason Ekstrand wrote:
> > > > On Mon, Jul 26, 2021 at 3:31 AM Maarten Lankhorst
> > > >  wrote:
> > > >>
> > > >> Op 23-07-2021 om 13:34 schreef Matthew Auld:
> > > >>> From: Chris Wilson 
> > > >>>
> > > >>> Jason Ekstrand requested a more efficient method than 
> > > >>> userptr+set-domain
> > > >>> to determine if the userptr object was backed by a complete set of 
> > > >>> pages
> > > >>> upon creation. To be more efficient than simply populating the userptr
> > > >>> using get_user_pages() (as done by the call to set-domain or execbuf),
> > > >>> we can walk the tree of vm_area_struct and check for gaps or vma not
> > > >>> backed by struct page (VM_PFNMAP). The question is how to handle
> > > >>> VM_MIXEDMAP which may be either struct page or pfn backed...
> > > >>>
> > > >>> With discrete we are going to drop support for set_domain(), so 
> > > >>> offering
> > > >>> a way to probe the pages, without having to resort to dummy batches 
> > > >>> has
> > > >>> been requested.
> > > >>>
> > > >>> v2:
> > > >>> - add new query param for the PROBE flag, so userspace can easily
> > > >>>check if the kernel supports it (Jason).
> > > >>> - use mmap_read_{lock, unlock}.
> > > >>> - add some kernel-doc.
> > > >>> v3:
> > > >>> - In the docs also mention that PROBE doesn't guarantee that the pages
> > > >>>will remain valid by the time they are actually used (Tvrtko).
> > > >>> - Add a small comment for the hole finding logic (Jason).
> > > >>> - Move the param next to all the other params which just return true.
> > > >>>
> > > >>> Testcase: igt/gem_userptr_blits/probe
> > > >>> Signed-off-by: Chris Wilson 
> > > >>> Signed-off-by: Matthew Auld 
> > > >>> Cc: Thomas Hellström 
> > > >>> Cc: Maarten Lankhorst 
> > > >>> Cc: Tvrtko Ursulin 
> > > >>> Cc: Jordan Justen 
> > > >>> Cc: Kenneth Graunke 
> > > >>> Cc: Jason Ekstrand 
> > > >>> Cc: Daniel Vetter 
> > > >>> Cc: Ramalingam C 
> > > >>> Reviewed-by: Tvrtko Ursulin 
> > > >>> Acked-by: Kenneth Graunke 
> > > >>> Reviewed-by: Jason Ekstrand 
> > > >>> ---
> > > >>>   drivers/gpu/drm/i915/gem/i915_gem_userptr.c | 41 
> > > >>> -
> > > >>>   drivers/gpu/drm/i915/i915_getparam.c|  1 +
> > > >>>   include/uapi/drm/i915_drm.h | 20 ++
> > > >>>   3 files changed, 61 insertions(+), 1 deletion(-)
> > > >>>
> > > >>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c 
> > > >>> b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
> > > >>> index 56edfeff8c02..468a7a617fbf 100644
> > > >>> --- a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
> > > >>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
> > > >>> @@ -422,6 +422,34 @@ static const struct drm_i915_gem_object_ops 
> > > >>> i915_gem_userptr_ops = {
> > > >>>
> > > >>>   #endif
> > > >>>
> > > >>> +static int
> > > >>> +probe_range(struct mm_struct *mm, unsigned long addr, unsigned long 
> > > >>> len)
> > > >>> +{
> > > >>> + const unsigned long end = addr + len;
> > > >>> + struct vm_area_struct *vma;
> > > >>> + int ret = -EFAULT;
> > > >>> +
> > > >>> + mmap_read_lock(mm);
> > > >>> + for (vma = find_vma(mm, addr); vma; vma = vma->vm_next) {
> > > >>> + /* Check for holes, note that we also update the addr 
> > > >>> below */
> > > >>> + if (vma->vm_start > addr)
> > >
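
The quoted message is cut off just after the hole check. A minimal sketch of
how the rest of probe_range() plausibly reads, following the commit message
(fail on holes, fail on vma that may not be struct-page backed, succeed once
the whole range is covered), not necessarily the exact final patch:

                if (vma->vm_start > addr)
                        break;          /* hole in the range: -EFAULT */

                /* May be pfn rather than struct-page backed */
                if (vma->vm_flags & (VM_PFNMAP | VM_MIXEDMAP))
                        break;

                if (vma->vm_end >= end) {
                        ret = 0;        /* whole range is backed */
                        break;
                }

                /* Continue scanning from the end of this vma */
                addr = vma->vm_end;
        }
        mmap_read_unlock(mm);

        return ret;
}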

Re: [PATCH] drm/i915/selftests: prefer the create_user helper

2021-07-28 Thread Jason Ekstrand

On July 28, 2021 10:57:23 Matthew Auld  wrote:


No need to hand roll the set_placements stuff, now that we have a
helper for this. Also no need to handle the -ENODEV case here, since
NULL mr implies missing device support, where the for_each_memory_region
helper will always skip over such regions.

Signed-off-by: Matthew Auld 
Cc: Jason Ekstrand 


Reviewed-by: Jason Ekstrand 



---
.../drm/i915/gem/selftests/i915_gem_mman.c| 46 ++-
1 file changed, 4 insertions(+), 42 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c 
b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c

index 0b2b73d8a364..eed1c2c64e75 100644
--- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
+++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
@@ -860,24 +860,6 @@ static bool can_mmap(struct drm_i915_gem_object *obj, 
enum i915_mmap_type type)

 return !no_map;
}

-static void object_set_placements(struct drm_i915_gem_object *obj,
-  struct intel_memory_region **placements,
-  unsigned int n_placements)
-{
- GEM_BUG_ON(!n_placements);
-
- if (n_placements == 1) {
- struct drm_i915_private *i915 = to_i915(obj->base.dev);
- struct intel_memory_region *mr = placements[0];
-
- obj->mm.placements = &i915->mm.regions[mr->id];
- obj->mm.n_placements = 1;
- } else {
- obj->mm.placements = placements;
- obj->mm.n_placements = n_placements;
- }
-}
-
#define expand32(x) (((x) << 0) | ((x) << 8) | ((x) << 16) | ((x) << 24))
static int __igt_mmap(struct drm_i915_private *i915,
  struct drm_i915_gem_object *obj,
@@ -972,15 +954,10 @@ static int igt_mmap(void *arg)
 struct drm_i915_gem_object *obj;
 int err;

- obj = i915_gem_object_create_region(mr, sizes[i], 0, I915_BO_ALLOC_USER);
- if (obj == ERR_PTR(-ENODEV))
- continue;
-
+ obj = __i915_gem_object_create_user(i915, sizes[i], &mr, 1);
 if (IS_ERR(obj))
 return PTR_ERR(obj);

- object_set_placements(obj, &mr, 1);
-
 err = __igt_mmap(i915, obj, I915_MMAP_TYPE_GTT);
 if (err == 0)
 err = __igt_mmap(i915, obj, I915_MMAP_TYPE_WC);
@@ -1101,15 +1078,10 @@ static int igt_mmap_access(void *arg)
 struct drm_i915_gem_object *obj;
 int err;

- obj = i915_gem_object_create_region(mr, PAGE_SIZE, 0, I915_BO_ALLOC_USER);
- if (obj == ERR_PTR(-ENODEV))
- continue;
-
+ obj = __i915_gem_object_create_user(i915, PAGE_SIZE, &mr, 1);
 if (IS_ERR(obj))
 return PTR_ERR(obj);

- object_set_placements(obj, &mr, 1);
-
 err = __igt_mmap_access(i915, obj, I915_MMAP_TYPE_GTT);
 if (err == 0)
 err = __igt_mmap_access(i915, obj, I915_MMAP_TYPE_WB);
@@ -1248,15 +1220,10 @@ static int igt_mmap_gpu(void *arg)
 struct drm_i915_gem_object *obj;
 int err;

- obj = i915_gem_object_create_region(mr, PAGE_SIZE, 0, I915_BO_ALLOC_USER);
- if (obj == ERR_PTR(-ENODEV))
- continue;
-
+ obj = __i915_gem_object_create_user(i915, PAGE_SIZE, &mr, 1);
 if (IS_ERR(obj))
 return PTR_ERR(obj);

- object_set_placements(obj, &mr, 1);
-
 err = __igt_mmap_gpu(i915, obj, I915_MMAP_TYPE_GTT);
 if (err == 0)
 err = __igt_mmap_gpu(i915, obj, I915_MMAP_TYPE_WC);
@@ -1405,15 +1372,10 @@ static int igt_mmap_revoke(void *arg)
 struct drm_i915_gem_object *obj;
 int err;

- obj = i915_gem_object_create_region(mr, PAGE_SIZE, 0, I915_BO_ALLOC_USER);
- if (obj == ERR_PTR(-ENODEV))
- continue;
-
+ obj = __i915_gem_object_create_user(i915, PAGE_SIZE, &mr, 1);
 if (IS_ERR(obj))
 return PTR_ERR(obj);

- object_set_placements(obj, &mr, 1);
-
 err = __igt_mmap_revoke(i915, obj, I915_MMAP_TYPE_GTT);
 if (err == 0)
 err = __igt_mmap_revoke(i915, obj, I915_MMAP_TYPE_WC);
--
2.26.3
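
Every selftest conversion above collapses to the same pattern; a minimal
sketch of it (the helper's signature is inferred from the call sites in this
patch, not quoted from a header):

        struct drm_i915_gem_object *obj;
        struct intel_memory_region *mr;
        int id;

        for_each_memory_region(mr, i915, id) {
                /* Regions without device support never reach here, the
                 * iterator skips them, hence no explicit -ENODEV check. */
                obj = __i915_gem_object_create_user(i915, PAGE_SIZE, &mr, 1);
                if (IS_ERR(obj))
                        return PTR_ERR(obj);

                /* ... exercise the mmap paths against obj ... */
        }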




Re: [PATCH v2 11/11] drm/i915: Extract i915_module.c

2021-07-27 Thread Jason Ekstrand
On Tue, Jul 27, 2021 at 9:44 AM Tvrtko Ursulin
 wrote:
>
>
> On 27/07/2021 13:10, Daniel Vetter wrote:
> > The module init code is somewhat misplaced in i915_pci.c, since it
> > needs to pull in init/exit functions from every part of the driver and
> > pollutes the include list a lot.
> >
> > Extract an i915_module.c file which pulls all the bits together, and
> > allows us to massively trim the include list of i915_pci.c.
> >
> > The downside is that have to drop the error path check Jason added to
> > catch when we set up the pci driver too early. I think that risk is
> > acceptable for this pretty nice include.
>
> i915_module.c is an improvement and the rest for me is not extremely
> objectionable by the end of this incarnation, but I also do not see it
> as an improvement really.

It's not a big improvement to be sure, but I think there are a few
ways this is nicer:

 1. One less level of indirection to sort through.
 2. The init/exit table is generally simpler than the i915_global interface.
 3. It's easy to forget to call i915_global_register, whereas forgetting to
put an _exit function in the module init table is a lot more obvious.

None of those are deal-breakers but they're kind-of nice.  Anyway,
this one is also

Reviewed-by: Jason Ekstrand 

--Jason

> There was a bug to fix relating to mock tests, but that is where the
> exercise should have stopped for now. After that it IMHO spiraled out of
> control, not least the unjustifiably expedited removal of cache
> shrinking. On balance for me it is too churny and boils down to two
> extremely capable people spending time on kind of really unimportant
> side fiddles. And I do not intend to prescribe you what to do, just
> expressing my bewilderment. FWIW... I can only say my opinion as it is, not
> that it matters a lot.
>
> Regards,
>
> Tvrtko
>
> > Cc: Jason Ekstrand 
> > Cc: Tvrtko Ursulin 
> > Signed-off-by: Daniel Vetter 
> > ---
> >   drivers/gpu/drm/i915/Makefile  |   1 +
> >   drivers/gpu/drm/i915/i915_module.c | 113 
> >   drivers/gpu/drm/i915/i915_pci.c| 117 +
> >   drivers/gpu/drm/i915/i915_pci.h|   8 ++
> >   4 files changed, 125 insertions(+), 114 deletions(-)
> >   create mode 100644 drivers/gpu/drm/i915/i915_module.c
> >   create mode 100644 drivers/gpu/drm/i915/i915_pci.h
> >
> > diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
> > index 9022dc638ed6..4ebd9f417ddb 100644
> > --- a/drivers/gpu/drm/i915/Makefile
> > +++ b/drivers/gpu/drm/i915/Makefile
> > @@ -38,6 +38,7 @@ i915-y += i915_drv.o \
> > i915_irq.o \
> > i915_getparam.o \
> > i915_mitigations.o \
> > +   i915_module.o \
> > i915_params.o \
> > i915_pci.o \
> > i915_scatterlist.o \
> > diff --git a/drivers/gpu/drm/i915/i915_module.c 
> > b/drivers/gpu/drm/i915/i915_module.c
> > new file mode 100644
> > index 000000000000..c578ea8f56a0
> > --- /dev/null
> > +++ b/drivers/gpu/drm/i915/i915_module.c
> > @@ -0,0 +1,113 @@
> > +/*
> > + * SPDX-License-Identifier: MIT
> > + *
> > + * Copyright © 2021 Intel Corporation
> > + */
> > +
> > +#include 
> > +
> > +#include "gem/i915_gem_context.h"
> > +#include "gem/i915_gem_object.h"
> > +#include "i915_active.h"
> > +#include "i915_buddy.h"
> > +#include "i915_params.h"
> > +#include "i915_pci.h"
> > +#include "i915_perf.h"
> > +#include "i915_request.h"
> > +#include "i915_scheduler.h"
> > +#include "i915_selftest.h"
> > +#include "i915_vma.h"
> > +
> > +static int i915_check_nomodeset(void)
> > +{
> > + bool use_kms = true;
> > +
> > + /*
> > +  * Enable KMS by default, unless explicitly overridden by
> > +  * either the i915.modeset parameter or by the
> > +  * vga_text_mode_force boot option.
> > +  */
> > +
> > + if (i915_modparams.modeset == 0)
> > + use_kms = false;
> > +
> > + if (vgacon_text_force() && i915_modparams.modeset == -1)
> > + use_kms = false;
> > +
> > + if (!use_kms) {
> > + /* Silently fail loading to not upset userspace. */
> > + DRM_DEBUG_DRIVER("KMS disabled.\n");
> > + return 1;
> > + }
> > +
> > + return 0;
> > +}
> > +
> > +static const struct {
> > +   int 
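
The quoted patch is truncated right at the init table. A sketch of the
table-driven pattern it introduces, with names taken from the hunks above
and the error unwinding implied by the discussion (the exact body of
i915_module.c may differ):

        static const struct {
                int (*init)(void);
                void (*exit)(void);
        } init_funcs[] = {
                { i915_check_nomodeset, NULL },
                /* ... one { init, exit } pair per subsystem, in order ... */
                { i915_register_pci_driver, i915_unregister_pci_driver },
        };

        static int __init i915_init(void)
        {
                int err, i;

                for (i = 0; i < ARRAY_SIZE(init_funcs); i++) {
                        err = init_funcs[i].init();
                        if (err < 0) {
                                /* Unwind the constructors that ran. */
                                while (i--)
                                        if (init_funcs[i].exit)
                                                init_funcs[i].exit();
                                return err;
                        } else if (err > 0) {
                                /* e.g. i915_check_nomodeset() returning 1:
                                 * silently abort loading, no error. */
                                return 0;
                        }
                }

                return 0;
        }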

Re: [Intel-gfx] [PATCH 04/10] drm/i915: move intel_context slab to direct module init/exit

2021-07-26 Thread Jason Ekstrand
On Mon, Jul 26, 2021 at 11:31 AM Tvrtko Ursulin
 wrote:
>
>
> On 26/07/2021 17:20, Jason Ekstrand wrote:
> > On Mon, Jul 26, 2021 at 11:08 AM Tvrtko Ursulin
> >  wrote:
> >> On 26/07/2021 16:42, Jason Ekstrand wrote:
> >>> On Mon, Jul 26, 2021 at 10:30 AM Jason Ekstrand  
> >>> wrote:
> >>>>
> >>>> On Mon, Jul 26, 2021 at 3:35 AM Tvrtko Ursulin
> >>>>  wrote:
> >>>>>
> >>>>>
> >>>>> On 23/07/2021 20:29, Daniel Vetter wrote:
> >>>>>> With the global kmem_cache shrink infrastructure gone there's nothing
> >>>>>> special and we can convert them over.
> >>>>>>
> >>>>>> I'm doing this split up into each patch because there's quite a bit of
> >>>>>> noise with removing the static global.slab_ce to just a
> >>>>>> slab_ce.
> >>>>>>
> >>>>>> Cc: Jason Ekstrand 
> >>>>>> Signed-off-by: Daniel Vetter 
> >>>>>> ---
> >>>>>> drivers/gpu/drm/i915/gt/intel_context.c | 25 
> >>>>>> -
> >>>>>> drivers/gpu/drm/i915/gt/intel_context.h |  3 +++
> >>>>>> drivers/gpu/drm/i915/i915_globals.c |  2 --
> >>>>>> drivers/gpu/drm/i915/i915_globals.h |  1 -
> >>>>>> drivers/gpu/drm/i915/i915_pci.c |  2 ++
> >>>>>> 5 files changed, 13 insertions(+), 20 deletions(-)
> >>>>>>
> >>>>>> diff --git a/drivers/gpu/drm/i915/gt/intel_context.c 
> >>>>>> b/drivers/gpu/drm/i915/gt/intel_context.c
> >>>>>> index baa05fddd690..283382549a6f 100644
> >>>>>> --- a/drivers/gpu/drm/i915/gt/intel_context.c
> >>>>>> +++ b/drivers/gpu/drm/i915/gt/intel_context.c
> >>>>>> @@ -7,7 +7,6 @@
> >>>>>> #include "gem/i915_gem_pm.h"
> >>>>>>
> >>>>>> #include "i915_drv.h"
> >>>>>> -#include "i915_globals.h"
> >>>>>> #include "i915_trace.h"
> >>>>>>
> >>>>>> #include "intel_context.h"
> >>>>>> @@ -15,14 +14,11 @@
> >>>>>> #include "intel_engine_pm.h"
> >>>>>> #include "intel_ring.h"
> >>>>>>
> >>>>>> -static struct i915_global_context {
> >>>>>> - struct i915_global base;
> >>>>>> - struct kmem_cache *slab_ce;
> >>>>>> -} global;
> >>>>>> +struct kmem_cache *slab_ce;
> >>>>
> >>>> Static?  With that,
> >>>>
> >>>> Reviewed-by: Jason Ekstrand 
> >>>>
> >>>>>>
> >>>>>> static struct intel_context *intel_context_alloc(void)
> >>>>>> {
> >>>>>> - return kmem_cache_zalloc(global.slab_ce, GFP_KERNEL);
> >>>>>> + return kmem_cache_zalloc(slab_ce, GFP_KERNEL);
> >>>>>> }
> >>>>>>
> >>>>>> static void rcu_context_free(struct rcu_head *rcu)
> >>>>>> @@ -30,7 +26,7 @@ static void rcu_context_free(struct rcu_head *rcu)
> >>>>>> struct intel_context *ce = container_of(rcu, typeof(*ce), rcu);
> >>>>>>
> >>>>>> trace_intel_context_free(ce);
> >>>>>> - kmem_cache_free(global.slab_ce, ce);
> >>>>>> + kmem_cache_free(slab_ce, ce);
> >>>>>> }
> >>>>>>
> >>>>>> void intel_context_free(struct intel_context *ce)
> >>>>>> @@ -410,22 +406,17 @@ void intel_context_fini(struct intel_context *ce)
> >>>>>> i915_active_fini(&ce->active);
> >>>>>> }
> >>>>>>
> >>>>>> -static void i915_global_context_exit(void)
> >>>>>> +void i915_context_module_exit(void)
> >>>>>> {
> >>>>>> - kmem_cache_destroy(global.slab_ce);
> >>>>>> + kmem_cache_destroy(slab_ce);
> >>>>>> }
> >>>>>>
> >>>>>> -static struct i9
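
Distilled, every slab conversion in this series has the same shape; the end
state for intel_context.c, with the static qualifier requested above applied,
is essentially:

        /* File-local cache, no registration list needed any more. */
        static struct kmem_cache *slab_ce;

        int __init i915_context_module_init(void)
        {
                slab_ce = KMEM_CACHE(intel_context, SLAB_HWCACHE_ALIGN);
                if (!slab_ce)
                        return -ENOMEM;

                return 0;
        }

        void i915_context_module_exit(void)
        {
                kmem_cache_destroy(slab_ce);
        }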

Re: [Intel-gfx] [PATCH 04/10] drm/i915: move intel_context slab to direct module init/exit

2021-07-26 Thread Jason Ekstrand
On Mon, Jul 26, 2021 at 11:08 AM Tvrtko Ursulin
 wrote:
> On 26/07/2021 16:42, Jason Ekstrand wrote:
> > On Mon, Jul 26, 2021 at 10:30 AM Jason Ekstrand  
> > wrote:
> >>
> >> On Mon, Jul 26, 2021 at 3:35 AM Tvrtko Ursulin
> >>  wrote:
> >>>
> >>>
> >>> On 23/07/2021 20:29, Daniel Vetter wrote:
> >>>> With the global kmem_cache shrink infrastructure gone there's nothing
> >>>> special and we can convert them over.
> >>>>
> >>>> I'm doing this split up into each patch because there's quite a bit of
> >>>> noise with removing the static global.slab_ce to just a
> >>>> slab_ce.
> >>>>
> >>>> Cc: Jason Ekstrand 
> >>>> Signed-off-by: Daniel Vetter 
> >>>> ---
> >>>>drivers/gpu/drm/i915/gt/intel_context.c | 25 -
> >>>>drivers/gpu/drm/i915/gt/intel_context.h |  3 +++
> >>>>drivers/gpu/drm/i915/i915_globals.c |  2 --
> >>>>drivers/gpu/drm/i915/i915_globals.h |  1 -
> >>>>drivers/gpu/drm/i915/i915_pci.c |  2 ++
> >>>>5 files changed, 13 insertions(+), 20 deletions(-)
> >>>>
> >>>> diff --git a/drivers/gpu/drm/i915/gt/intel_context.c 
> >>>> b/drivers/gpu/drm/i915/gt/intel_context.c
> >>>> index baa05fddd690..283382549a6f 100644
> >>>> --- a/drivers/gpu/drm/i915/gt/intel_context.c
> >>>> +++ b/drivers/gpu/drm/i915/gt/intel_context.c
> >>>> @@ -7,7 +7,6 @@
> >>>>#include "gem/i915_gem_pm.h"
> >>>>
> >>>>    #include "i915_drv.h"
> >>>> -#include "i915_globals.h"
> >>>>#include "i915_trace.h"
> >>>>
> >>>>#include "intel_context.h"
> >>>> @@ -15,14 +14,11 @@
> >>>>#include "intel_engine_pm.h"
> >>>>#include "intel_ring.h"
> >>>>
> >>>> -static struct i915_global_context {
> >>>> - struct i915_global base;
> >>>> - struct kmem_cache *slab_ce;
> >>>> -} global;
> >>>> +struct kmem_cache *slab_ce;
> >>
> >> Static?  With that,
> >>
> >> Reviewed-by: Jason Ekstrand 
> >>
> >>>>
> >>>>static struct intel_context *intel_context_alloc(void)
> >>>>{
> >>>> - return kmem_cache_zalloc(global.slab_ce, GFP_KERNEL);
> >>>> + return kmem_cache_zalloc(slab_ce, GFP_KERNEL);
> >>>>}
> >>>>
> >>>>static void rcu_context_free(struct rcu_head *rcu)
> >>>> @@ -30,7 +26,7 @@ static void rcu_context_free(struct rcu_head *rcu)
> >>>>struct intel_context *ce = container_of(rcu, typeof(*ce), rcu);
> >>>>
> >>>>trace_intel_context_free(ce);
> >>>> - kmem_cache_free(global.slab_ce, ce);
> >>>> + kmem_cache_free(slab_ce, ce);
> >>>>}
> >>>>
> >>>>void intel_context_free(struct intel_context *ce)
> >>>> @@ -410,22 +406,17 @@ void intel_context_fini(struct intel_context *ce)
> >>>>i915_active_fini(&ce->active);
> >>>>}
> >>>>
> >>>> -static void i915_global_context_exit(void)
> >>>> +void i915_context_module_exit(void)
> >>>>{
> >>>> - kmem_cache_destroy(global.slab_ce);
> >>>> + kmem_cache_destroy(slab_ce);
> >>>>}
> >>>>
> >>>> -static struct i915_global_context global = { {
> >>>> - .exit = i915_global_context_exit,
> >>>> -} };
> >>>> -
> >>>> -int __init i915_global_context_init(void)
> >>>> +int __init i915_context_module_init(void)
> >>>>{
> >>>> - global.slab_ce = KMEM_CACHE(intel_context, SLAB_HWCACHE_ALIGN);
> >>>> - if (!global.slab_ce)
> >>>> + slab_ce = KMEM_CACHE(intel_context, SLAB_HWCACHE_ALIGN);
> >>>> + if (!slab_ce)
> >>>>return -ENOMEM;
> >>>>
> >>>> - i915_global_register(&global.base);
> >>>>return 0;
> >>>>}
> >>>>
> >>>> diff --git a/drivers/gpu/drm/i915/gt/intel

Re: [PATCH 10/10] drm/i915: Remove i915_globals

2021-07-26 Thread Jason Ekstrand
On Fri, Jul 23, 2021 at 2:29 PM Daniel Vetter  wrote:
>
> No longer used.
>
> Cc: Jason Ekstrand 
> Signed-off-by: Daniel Vetter 

Reviewed-by: Jason Ekstrand 

But, also, tvrtko is right that dumping all that stuff in i915_pci.c
isn't great.  Mind typing a quick follow-on that moves i915_init/exit
to i915_drv.c?

--Jason

> ---
>  drivers/gpu/drm/i915/Makefile |  1 -
>  drivers/gpu/drm/i915/gt/intel_gt_pm.c |  1 -
>  drivers/gpu/drm/i915/i915_globals.c   | 53 ---
>  drivers/gpu/drm/i915/i915_globals.h   | 25 -
>  drivers/gpu/drm/i915/i915_pci.c   |  2 -
>  5 files changed, 82 deletions(-)
>  delete mode 100644 drivers/gpu/drm/i915/i915_globals.c
>  delete mode 100644 drivers/gpu/drm/i915/i915_globals.h
>
> diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
> index 10b3bb6207ba..9022dc638ed6 100644
> --- a/drivers/gpu/drm/i915/Makefile
> +++ b/drivers/gpu/drm/i915/Makefile
> @@ -166,7 +166,6 @@ i915-y += \
>   i915_gem_gtt.o \
>   i915_gem_ww.o \
>   i915_gem.o \
> - i915_globals.o \
>   i915_query.o \
>   i915_request.o \
>   i915_scheduler.o \
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt_pm.c 
> b/drivers/gpu/drm/i915/gt/intel_gt_pm.c
> index d86825437516..943c1d416ec0 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt_pm.c
> +++ b/drivers/gpu/drm/i915/gt/intel_gt_pm.c
> @@ -6,7 +6,6 @@
>  #include 
>
>  #include "i915_drv.h"
> -#include "i915_globals.h"
>  #include "i915_params.h"
>  #include "intel_context.h"
>  #include "intel_engine_pm.h"
> diff --git a/drivers/gpu/drm/i915/i915_globals.c 
> b/drivers/gpu/drm/i915/i915_globals.c
> deleted file mode 100644
> index 04979789e7be..000000000000
> --- a/drivers/gpu/drm/i915/i915_globals.c
> +++ /dev/null
> @@ -1,53 +0,0 @@
> -/*
> - * SPDX-License-Identifier: MIT
> - *
> - * Copyright © 2019 Intel Corporation
> - */
> -
> -#include 
> -#include 
> -
> -#include "i915_globals.h"
> -#include "i915_drv.h"
> -
> -static LIST_HEAD(globals);
> -
> -void __init i915_global_register(struct i915_global *global)
> -{
> -   GEM_BUG_ON(!global->exit);
> -
> -   list_add_tail(&global->link, &globals);
> -}
> -
> -static void __i915_globals_cleanup(void)
> -{
> -   struct i915_global *global, *next;
> -
> -   list_for_each_entry_safe_reverse(global, next, &globals, link)
> -   global->exit();
> -}
> -
> -static __initconst int (* const initfn[])(void) = {
> -};
> -
> -int __init i915_globals_init(void)
> -{
> -   int i;
> -
> -   for (i = 0; i < ARRAY_SIZE(initfn); i++) {
> -   int err;
> -
> -   err = initfn[i]();
> -   if (err) {
> -   __i915_globals_cleanup();
> -   return err;
> -   }
> -   }
> -
> -   return 0;
> -}
> -
> -void i915_globals_exit(void)
> -{
> -   __i915_globals_cleanup();
> -}
> diff --git a/drivers/gpu/drm/i915/i915_globals.h 
> b/drivers/gpu/drm/i915/i915_globals.h
> deleted file mode 100644
> index 57d2998bba45..000000000000
> --- a/drivers/gpu/drm/i915/i915_globals.h
> +++ /dev/null
> @@ -1,25 +0,0 @@
> -/*
> - * SPDX-License-Identifier: MIT
> - *
> - * Copyright © 2019 Intel Corporation
> - */
> -
> -#ifndef _I915_GLOBALS_H_
> -#define _I915_GLOBALS_H_
> -
> -#include 
> -
> -typedef void (*i915_global_func_t)(void);
> -
> -struct i915_global {
> -   struct list_head link;
> -
> -   i915_global_func_t exit;
> -};
> -
> -void i915_global_register(struct i915_global *global);
> -
> -int i915_globals_init(void);
> -void i915_globals_exit(void);
> -
> -#endif /* _I915_GLOBALS_H_ */
> diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c
> index 0affcf33a211..ed72bcb58331 100644
> --- a/drivers/gpu/drm/i915/i915_pci.c
> +++ b/drivers/gpu/drm/i915/i915_pci.c
> @@ -37,7 +37,6 @@
>  #include "gem/i915_gem_object.h"
>  #include "i915_request.h"
>  #include "i915_perf.h"
> -#include "i915_globals.h"
>  #include "i915_selftest.h"
>  #include "i915_scheduler.h"
>  #include "i915_vma.h"
> @@ -1308,7 +1307,6 @@ static const struct {
> { i915_request_module_init, i915_request_module_exit },
> { i915_scheduler_module_init, i915_scheduler_module_exit },
> { i915_vma_module_init, i915_vma_module_exit },
> -   { i915_globals_init, i915_globals_exit },
> { i915_mock_selftests, NULL },
> { i915_pmu_init, i915_pmu_exit },
> { i915_register_pci_driver, i915_unregister_pci_driver },
> --
> 2.32.0
>


Re: [PATCH 09/10] drm/i915: move vma slab to direct module init/exit

2021-07-26 Thread Jason Ekstrand
On Fri, Jul 23, 2021 at 2:29 PM Daniel Vetter  wrote:
>
> With the global kmem_cache shrink infrastructure gone there's nothing
> special and we can convert them over.
>
> I'm doing this split up into each patch because there's quite a bit of
> noise with removing the static global.slab_vmas to just a
> slab_vmas.
>
> We have to keep i915_drv.h include in i915_globals otherwise there's
> nothing anymore that pulls in GEM_BUG_ON.
>
> Cc: Jason Ekstrand 
> Signed-off-by: Daniel Vetter 
> ---
>  drivers/gpu/drm/i915/i915_globals.c |  3 +--
>  drivers/gpu/drm/i915/i915_globals.h |  3 ---
>  drivers/gpu/drm/i915/i915_pci.c |  2 ++
>  drivers/gpu/drm/i915/i915_vma.c | 25 -
>  drivers/gpu/drm/i915/i915_vma.h |  3 +++
>  5 files changed, 14 insertions(+), 22 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_globals.c 
> b/drivers/gpu/drm/i915/i915_globals.c
> index 8923589057ab..04979789e7be 100644
> --- a/drivers/gpu/drm/i915/i915_globals.c
> +++ b/drivers/gpu/drm/i915/i915_globals.c
> @@ -8,7 +8,7 @@
>  #include 
>
>  #include "i915_globals.h"
> -#include "i915_vma.h"
> +#include "i915_drv.h"
>
>  static LIST_HEAD(globals);
>
> @@ -28,7 +28,6 @@ static void __i915_globals_cleanup(void)
>  }
>
>  static __initconst int (* const initfn[])(void) = {
> -   i915_global_vma_init,
>  };
>
>  int __init i915_globals_init(void)
> diff --git a/drivers/gpu/drm/i915/i915_globals.h 
> b/drivers/gpu/drm/i915/i915_globals.h
> index 7a57bce1da05..57d2998bba45 100644
> --- a/drivers/gpu/drm/i915/i915_globals.h
> +++ b/drivers/gpu/drm/i915/i915_globals.h
> @@ -22,7 +22,4 @@ void i915_global_register(struct i915_global *global);
>  int i915_globals_init(void);
>  void i915_globals_exit(void);
>
> -/* constructors */
> -int i915_global_vma_init(void);
> -
>  #endif /* _I915_GLOBALS_H_ */
> diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c
> index a44318519977..0affcf33a211 100644
> --- a/drivers/gpu/drm/i915/i915_pci.c
> +++ b/drivers/gpu/drm/i915/i915_pci.c
> @@ -40,6 +40,7 @@
>  #include "i915_globals.h"
>  #include "i915_selftest.h"
>  #include "i915_scheduler.h"
> +#include "i915_vma.h"
>
>  #define PLATFORM(x) .platform = (x)
>  #define GEN(x) \
> @@ -1306,6 +1307,7 @@ static const struct {
> { i915_objects_module_init, i915_objects_module_exit },
> { i915_request_module_init, i915_request_module_exit },
> { i915_scheduler_module_init, i915_scheduler_module_exit },
> +   { i915_vma_module_init, i915_vma_module_exit },
> { i915_globals_init, i915_globals_exit },
> { i915_mock_selftests, NULL },
> { i915_pmu_init, i915_pmu_exit },
> diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
> index 09a7c47926f7..d094e2016b93 100644
> --- a/drivers/gpu/drm/i915/i915_vma.c
> +++ b/drivers/gpu/drm/i915/i915_vma.c
> @@ -34,24 +34,20 @@
>  #include "gt/intel_gt_requests.h"
>
>  #include "i915_drv.h"
> -#include "i915_globals.h"
>  #include "i915_sw_fence_work.h"
>  #include "i915_trace.h"
>  #include "i915_vma.h"
>
> -static struct i915_global_vma {
> -   struct i915_global base;
> -   struct kmem_cache *slab_vmas;
> -} global;
> +struct kmem_cache *slab_vmas;

static.  With that,

Reviewed-by: Jason Ekstrand 

>
>  struct i915_vma *i915_vma_alloc(void)
>  {
> -   return kmem_cache_zalloc(global.slab_vmas, GFP_KERNEL);
> +   return kmem_cache_zalloc(slab_vmas, GFP_KERNEL);
>  }
>
>  void i915_vma_free(struct i915_vma *vma)
>  {
> -   return kmem_cache_free(global.slab_vmas, vma);
> +   return kmem_cache_free(slab_vmas, vma);
>  }
>
>  #if IS_ENABLED(CONFIG_DRM_I915_ERRLOG_GEM) && IS_ENABLED(CONFIG_DRM_DEBUG_MM)
> @@ -1414,21 +1410,16 @@ void i915_vma_make_purgeable(struct i915_vma *vma)
>  #include "selftests/i915_vma.c"
>  #endif
>
> -static void i915_global_vma_exit(void)
> +void i915_vma_module_exit(void)
>  {
> -   kmem_cache_destroy(global.slab_vmas);
> +   kmem_cache_destroy(slab_vmas);
>  }
>
> -static struct i915_global_vma global = { {
> -   .exit = i915_global_vma_exit,
> -} };
> -
> -int __init i915_global_vma_init(void)
> +int __init i915_vma_module_init(void)
>  {
> -   global.slab_vmas = KMEM_CACHE(i915_vma, SLAB_HWCACHE_ALIGN);
> -   if (!global.slab_vmas)
> +   slab_vmas = KMEM_CACHE(i915_vma, SLAB_HWCACHE_ALIGN);
> +   if (!slab_vmas)
> return -ENOMEM;
>
> -   i915_global_register(&global.base);
> return 0;
>  }
> diff --git a/drivers/gpu/drm/i915/i915_vma.h b/drivers/gpu/drm/i915/i915_vma.h
> index eca452a9851f..ed69f66c7ab0 100644
> --- a/drivers/gpu/drm/i915/i915_vma.h
> +++ b/drivers/gpu/drm/i915/i915_vma.h
> @@ -426,4 +426,7 @@ static inline int i915_vma_sync(struct i915_vma *vma)
> return i915_active_wait(&vma->active);
>  }
>
> +void i915_vma_module_exit(void);
> +int i915_vma_module_init(void);
> +
>  #endif
> --
> 2.32.0
>


Re: [PATCH 08/10] drm/i915: move scheduler slabs to direct module init/exit

2021-07-26 Thread Jason Ekstrand
On Fri, Jul 23, 2021 at 2:29 PM Daniel Vetter  wrote:
>
> With the global kmem_cache shrink infrastructure gone there's nothing
> special and we can convert them over.
>
> I'm doing this split up into each patch because there's quite a bit of
> noise with removing the static global.slab_dependencies|priorities to just a
> slab_dependencies|priorities.
>
> Cc: Jason Ekstrand 
> Signed-off-by: Daniel Vetter 
> ---
>  drivers/gpu/drm/i915/i915_globals.c   |  2 --
>  drivers/gpu/drm/i915/i915_globals.h   |  2 --
>  drivers/gpu/drm/i915/i915_pci.c   |  2 ++
>  drivers/gpu/drm/i915/i915_scheduler.c | 39 +++
>  drivers/gpu/drm/i915/i915_scheduler.h |  3 +++
>  5 files changed, 20 insertions(+), 28 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_globals.c 
> b/drivers/gpu/drm/i915/i915_globals.c
> index 8fffa8d93bc5..8923589057ab 100644
> --- a/drivers/gpu/drm/i915/i915_globals.c
> +++ b/drivers/gpu/drm/i915/i915_globals.c
> @@ -8,7 +8,6 @@
>  #include 
>
>  #include "i915_globals.h"
> -#include "i915_scheduler.h"
>  #include "i915_vma.h"
>
>  static LIST_HEAD(globals);
> @@ -29,7 +28,6 @@ static void __i915_globals_cleanup(void)
>  }
>
>  static __initconst int (* const initfn[])(void) = {
> -   i915_global_scheduler_init,
> i915_global_vma_init,
>  };
>
> diff --git a/drivers/gpu/drm/i915/i915_globals.h 
> b/drivers/gpu/drm/i915/i915_globals.h
> index 9734740708f4..7a57bce1da05 100644
> --- a/drivers/gpu/drm/i915/i915_globals.h
> +++ b/drivers/gpu/drm/i915/i915_globals.h
> @@ -23,8 +23,6 @@ int i915_globals_init(void);
>  void i915_globals_exit(void);
>
>  /* constructors */
> -int i915_global_request_init(void);
> -int i915_global_scheduler_init(void);
>  int i915_global_vma_init(void);
>
>  #endif /* _I915_GLOBALS_H_ */
> diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c
> index bb2bd12fb8c2..a44318519977 100644
> --- a/drivers/gpu/drm/i915/i915_pci.c
> +++ b/drivers/gpu/drm/i915/i915_pci.c
> @@ -39,6 +39,7 @@
>  #include "i915_perf.h"
>  #include "i915_globals.h"
>  #include "i915_selftest.h"
> +#include "i915_scheduler.h"
>
>  #define PLATFORM(x) .platform = (x)
>  #define GEN(x) \
> @@ -1304,6 +1305,7 @@ static const struct {
> { i915_gem_context_module_init, i915_gem_context_module_exit },
> { i915_objects_module_init, i915_objects_module_exit },
> { i915_request_module_init, i915_request_module_exit },
> +   { i915_scheduler_module_init, i915_scheduler_module_exit },
> { i915_globals_init, i915_globals_exit },
> { i915_mock_selftests, NULL },
> { i915_pmu_init, i915_pmu_exit },
> diff --git a/drivers/gpu/drm/i915/i915_scheduler.c 
> b/drivers/gpu/drm/i915/i915_scheduler.c
> index 561c649e59f7..02d90d239ff5 100644
> --- a/drivers/gpu/drm/i915/i915_scheduler.c
> +++ b/drivers/gpu/drm/i915/i915_scheduler.c
> @@ -7,15 +7,11 @@
>  #include 
>
>  #include "i915_drv.h"
> -#include "i915_globals.h"
>  #include "i915_request.h"
>  #include "i915_scheduler.h"
>
> -static struct i915_global_scheduler {
> -   struct i915_global base;
> -   struct kmem_cache *slab_dependencies;
> -   struct kmem_cache *slab_priorities;
> -} global;
> +struct kmem_cache *slab_dependencies;

static

> +struct kmem_cache *slab_priorities;

static

>
>  static DEFINE_SPINLOCK(schedule_lock);
>
> @@ -93,7 +89,7 @@ i915_sched_lookup_priolist(struct i915_sched_engine 
> *sched_engine, int prio)
> if (prio == I915_PRIORITY_NORMAL) {
> p = &sched_engine->default_priolist;
> } else {
> -   p = kmem_cache_alloc(global.slab_priorities, GFP_ATOMIC);
> +   p = kmem_cache_alloc(slab_priorities, GFP_ATOMIC);
> /* Convert an allocation failure to a priority bump */
> if (unlikely(!p)) {
> prio = I915_PRIORITY_NORMAL; /* recurses just once */
> @@ -122,7 +118,7 @@ i915_sched_lookup_priolist(struct i915_sched_engine 
> *sched_engine, int prio)
>
>  void __i915_priolist_free(struct i915_priolist *p)
>  {
> -   kmem_cache_free(global.slab_priorities, p);
> +   kmem_cache_free(slab_priorities, p);
>  }
>
>  struct sched_cache {
> @@ -313,13 +309,13 @@ void i915_sched_node_reinit(struct i915_sched_node 
> *node)
>  static struct i915_dependency *
>  i915_dependency_alloc(void)
>  {
> -   return kmem_cache_alloc(global.slab_dependencies, GFP_KERNEL);
> +   return kmem_cache_alloc(slab_dependenc
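
The hunk is cut off above, but the "convert an allocation failure to a
priority bump" comment deserves spelling out; a sketch of that fallback in
i915_sched_lookup_priolist(), with the rbtree bookkeeping elided:

        find_priolist:
                /* ... rbtree lookup for an existing priolist elided ... */

                if (prio == I915_PRIORITY_NORMAL) {
                        p = &sched_engine->default_priolist;
                } else {
                        p = kmem_cache_alloc(slab_priorities, GFP_ATOMIC);
                        /* Rather than failing the submission when the
                         * GFP_ATOMIC allocation fails, fall back to the
                         * embedded NORMAL-priority list. */
                        if (unlikely(!p)) {
                                prio = I915_PRIORITY_NORMAL; /* recurses just once */
                                goto find_priolist;
                        }
                }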

Re: [PATCH 07/10] drm/i915: move request slabs to direct module init/exit

2021-07-26 Thread Jason Ekstrand
On Fri, Jul 23, 2021 at 2:29 PM Daniel Vetter  wrote:
>
> With the global kmem_cache shrink infrastructure gone there's nothing
> special and we can convert them over.
>
> I'm doing this split up into each patch because there's quite a bit of
> noise with removing the static global.slab_requests|execute_cbs to just a
> slab_requests|execute_cbs.
>
> Cc: Jason Ekstrand 
> Signed-off-by: Daniel Vetter 
> ---
>  drivers/gpu/drm/i915/i915_globals.c |  2 --
>  drivers/gpu/drm/i915/i915_pci.c |  2 ++
>  drivers/gpu/drm/i915/i915_request.c | 47 -
>  drivers/gpu/drm/i915/i915_request.h |  3 ++
>  4 files changed, 24 insertions(+), 30 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_globals.c 
> b/drivers/gpu/drm/i915/i915_globals.c
> index 40a592fbc3e0..8fffa8d93bc5 100644
> --- a/drivers/gpu/drm/i915/i915_globals.c
> +++ b/drivers/gpu/drm/i915/i915_globals.c
> @@ -8,7 +8,6 @@
>  #include 
>
>  #include "i915_globals.h"
> -#include "i915_request.h"
>  #include "i915_scheduler.h"
>  #include "i915_vma.h"
>
> @@ -30,7 +29,6 @@ static void __i915_globals_cleanup(void)
>  }
>
>  static __initconst int (* const initfn[])(void) = {
> -   i915_global_request_init,
> i915_global_scheduler_init,
> i915_global_vma_init,
>  };
> diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c
> index 2334eb3e9abb..bb2bd12fb8c2 100644
> --- a/drivers/gpu/drm/i915/i915_pci.c
> +++ b/drivers/gpu/drm/i915/i915_pci.c
> @@ -35,6 +35,7 @@
>  #include "i915_drv.h"
>  #include "gem/i915_gem_context.h"
>  #include "gem/i915_gem_object.h"
> +#include "i915_request.h"
>  #include "i915_perf.h"
>  #include "i915_globals.h"
>  #include "i915_selftest.h"
> @@ -1302,6 +1303,7 @@ static const struct {
> { i915_context_module_init, i915_context_module_exit },
> { i915_gem_context_module_init, i915_gem_context_module_exit },
> { i915_objects_module_init, i915_objects_module_exit },
> +   { i915_request_module_init, i915_request_module_exit },
> { i915_globals_init, i915_globals_exit },
> { i915_mock_selftests, NULL },
> { i915_pmu_init, i915_pmu_exit },
> diff --git a/drivers/gpu/drm/i915/i915_request.c 
> b/drivers/gpu/drm/i915/i915_request.c
> index 6594cb2f8ebd..69152369ea00 100644
> --- a/drivers/gpu/drm/i915/i915_request.c
> +++ b/drivers/gpu/drm/i915/i915_request.c
> @@ -42,7 +42,6 @@
>
>  #include "i915_active.h"
>  #include "i915_drv.h"
> -#include "i915_globals.h"
>  #include "i915_trace.h"
>  #include "intel_pm.h"
>
> @@ -52,11 +51,8 @@ struct execute_cb {
> struct i915_request *signal;
>  };
>
> -static struct i915_global_request {
> -   struct i915_global base;
> -   struct kmem_cache *slab_requests;
> -   struct kmem_cache *slab_execute_cbs;
> -} global;
> +struct kmem_cache *slab_requests;

static

> +struct kmem_cache *slab_execute_cbs;

static

Am I tired of typing this?  Yes, I am!  Will I keep typing it?  Probably. :-P

>
>  static const char *i915_fence_get_driver_name(struct dma_fence *fence)
>  {
> @@ -107,7 +103,7 @@ static signed long i915_fence_wait(struct dma_fence 
> *fence,
>
>  struct kmem_cache *i915_request_slab_cache(void)
>  {
> -   return global.slab_requests;
> +   return slab_requests;
>  }
>
>  static void i915_fence_release(struct dma_fence *fence)
> @@ -159,7 +155,7 @@ static void i915_fence_release(struct dma_fence *fence)
> !cmpxchg(&rq->engine->request_pool, NULL, rq))
> return;
>
> -   kmem_cache_free(global.slab_requests, rq);
> +   kmem_cache_free(slab_requests, rq);
>  }
>
>  const struct dma_fence_ops i915_fence_ops = {
> @@ -176,7 +172,7 @@ static void irq_execute_cb(struct irq_work *wrk)
> struct execute_cb *cb = container_of(wrk, typeof(*cb), work);
>
> i915_sw_fence_complete(cb->fence);
> -   kmem_cache_free(global.slab_execute_cbs, cb);
> +   kmem_cache_free(slab_execute_cbs, cb);
>  }
>
>  static __always_inline void
> @@ -514,7 +510,7 @@ __await_execution(struct i915_request *rq,
> if (i915_request_is_active(signal))
> return 0;
>
> -   cb = kmem_cache_alloc(global.slab_execute_cbs, gfp);
> +   cb = kmem_cache_alloc(slab_execute_cbs, gfp);
> if (!cb)
> return -ENOMEM;
>
> @@ -868,7 +864,7 @@ request_alloc_slow(struct intel_timeline *tl,
> rq = list_first_entry(&tl->requests, typeof(*rq), 
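
The message is truncated here. One detail worth pulling out of the
i915_fence_release() hunk above: the free path parks at most one request per
engine as a reserve for request_alloc_slow(), claiming the single pool slot
with a lock-free cmpxchg. A sketch (the guard conditions in front of the
cmpxchg are elided in the hunk, so only the visible parts are shown):

        /* Park at most one request per engine for reuse under memory
         * pressure; the single slot is claimed without taking a lock. */
        if (!cmpxchg(&rq->engine->request_pool, NULL, rq))
                return;

        kmem_cache_free(slab_requests, rq);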

Re: [Intel-gfx] [PATCH 04/10] drm/i915: move intel_context slab to direct module init/exit

2021-07-26 Thread Jason Ekstrand
On Mon, Jul 26, 2021 at 10:30 AM Jason Ekstrand  wrote:
>
> On Mon, Jul 26, 2021 at 3:35 AM Tvrtko Ursulin
>  wrote:
> >
> >
> > On 23/07/2021 20:29, Daniel Vetter wrote:
> > > With the global kmem_cache shrink infrastructure gone there's nothing
> > > special and we can convert them over.
> > >
> > > I'm doing this split up into each patch because there's quite a bit of
> > > noise with removing the static global.slab_ce to just a
> > > slab_ce.
> > >
> > > Cc: Jason Ekstrand 
> > > Signed-off-by: Daniel Vetter 
> > > ---
> > >   drivers/gpu/drm/i915/gt/intel_context.c | 25 -
> > >   drivers/gpu/drm/i915/gt/intel_context.h |  3 +++
> > >   drivers/gpu/drm/i915/i915_globals.c |  2 --
> > >   drivers/gpu/drm/i915/i915_globals.h |  1 -
> > >   drivers/gpu/drm/i915/i915_pci.c |  2 ++
> > >   5 files changed, 13 insertions(+), 20 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/i915/gt/intel_context.c 
> > > b/drivers/gpu/drm/i915/gt/intel_context.c
> > > index baa05fddd690..283382549a6f 100644
> > > --- a/drivers/gpu/drm/i915/gt/intel_context.c
> > > +++ b/drivers/gpu/drm/i915/gt/intel_context.c
> > > @@ -7,7 +7,6 @@
> > >   #include "gem/i915_gem_pm.h"
> > >
> > >   #include "i915_drv.h"
> > > -#include "i915_globals.h"
> > >   #include "i915_trace.h"
> > >
> > >   #include "intel_context.h"
> > > @@ -15,14 +14,11 @@
> > >   #include "intel_engine_pm.h"
> > >   #include "intel_ring.h"
> > >
> > > -static struct i915_global_context {
> > > - struct i915_global base;
> > > - struct kmem_cache *slab_ce;
> > > -} global;
> > > +struct kmem_cache *slab_ce;
>
> Static?  With that,
>
> Reviewed-by: Jason Ekstrand 
>
> > >
> > >   static struct intel_context *intel_context_alloc(void)
> > >   {
> > > - return kmem_cache_zalloc(global.slab_ce, GFP_KERNEL);
> > > + return kmem_cache_zalloc(slab_ce, GFP_KERNEL);
> > >   }
> > >
> > >   static void rcu_context_free(struct rcu_head *rcu)
> > > @@ -30,7 +26,7 @@ static void rcu_context_free(struct rcu_head *rcu)
> > >   struct intel_context *ce = container_of(rcu, typeof(*ce), rcu);
> > >
> > >   trace_intel_context_free(ce);
> > > - kmem_cache_free(global.slab_ce, ce);
> > > + kmem_cache_free(slab_ce, ce);
> > >   }
> > >
> > >   void intel_context_free(struct intel_context *ce)
> > > @@ -410,22 +406,17 @@ void intel_context_fini(struct intel_context *ce)
> > >   i915_active_fini(&ce->active);
> > >   }
> > >
> > > -static void i915_global_context_exit(void)
> > > +void i915_context_module_exit(void)
> > >   {
> > > - kmem_cache_destroy(global.slab_ce);
> > > + kmem_cache_destroy(slab_ce);
> > >   }
> > >
> > > -static struct i915_global_context global = { {
> > > - .exit = i915_global_context_exit,
> > > -} };
> > > -
> > > -int __init i915_global_context_init(void)
> > > +int __init i915_context_module_init(void)
> > >   {
> > > - global.slab_ce = KMEM_CACHE(intel_context, SLAB_HWCACHE_ALIGN);
> > > - if (!global.slab_ce)
> > > + slab_ce = KMEM_CACHE(intel_context, SLAB_HWCACHE_ALIGN);
> > > + if (!slab_ce)
> > >   return -ENOMEM;
> > >
> > > - i915_global_register(&global.base);
> > >   return 0;
> > >   }
> > >
> > > diff --git a/drivers/gpu/drm/i915/gt/intel_context.h 
> > > b/drivers/gpu/drm/i915/gt/intel_context.h
> > > index 974ef85320c2..a0ca82e3c40d 100644
> > > --- a/drivers/gpu/drm/i915/gt/intel_context.h
> > > +++ b/drivers/gpu/drm/i915/gt/intel_context.h
> > > @@ -30,6 +30,9 @@ void intel_context_init(struct intel_context *ce,
> > >   struct intel_engine_cs *engine);
> > >   void intel_context_fini(struct intel_context *ce);
> > >
> > > +void i915_context_module_exit(void);
> > > +int i915_context_module_init(void);
> > > +
> > >   struct intel_context *
> > >   intel_context_create(struct intel_engine_cs *engine);
> > >
> > > diff --git a/drivers/gpu/drm/i915/i915_globals.c 
> > > b/drivers/gpu/drm/i915/i915_g

Re: [PATCH 06/10] drm/i915: move gem_objects slab to direct module init/exit

2021-07-26 Thread Jason Ekstrand
On Fri, Jul 23, 2021 at 2:29 PM Daniel Vetter  wrote:
>
> With the global kmem_cache shrink infrastructure gone there's nothing
> special and we can convert them over.
>
> I'm doing this split up into each patch because there's quite a bit of
> noise with removing the static global.slab_objects to just a
> slab_objects.
>
> Cc: Jason Ekstrand 
> Signed-off-by: Daniel Vetter 
> ---
>  drivers/gpu/drm/i915/gem/i915_gem_object.c | 26 +++---
>  drivers/gpu/drm/i915/gem/i915_gem_object.h |  3 +++
>  drivers/gpu/drm/i915/i915_globals.c|  1 -
>  drivers/gpu/drm/i915/i915_globals.h|  1 -
>  drivers/gpu/drm/i915/i915_pci.c|  1 +
>  5 files changed, 12 insertions(+), 20 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c 
> b/drivers/gpu/drm/i915/gem/i915_gem_object.c
> index 5c21cff33199..53156250d283 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
> @@ -30,14 +30,10 @@
>  #include "i915_gem_context.h"
>  #include "i915_gem_mman.h"
>  #include "i915_gem_object.h"
> -#include "i915_globals.h"
>  #include "i915_memcpy.h"
>  #include "i915_trace.h"
>
> -static struct i915_global_object {
> -   struct i915_global base;
> -   struct kmem_cache *slab_objects;
> -} global;
> +struct kmem_cache *slab_objects;

static

With that,

Reviewed-by: Jason Ekstrand 

>  static const struct drm_gem_object_funcs i915_gem_object_funcs;
>
> @@ -45,7 +41,7 @@ struct drm_i915_gem_object *i915_gem_object_alloc(void)
>  {
> struct drm_i915_gem_object *obj;
>
> -   obj = kmem_cache_zalloc(global.slab_objects, GFP_KERNEL);
> +   obj = kmem_cache_zalloc(slab_objects, GFP_KERNEL);
> if (!obj)
> return NULL;
> obj->base.funcs = &i915_gem_object_funcs;
> @@ -55,7 +51,7 @@ struct drm_i915_gem_object *i915_gem_object_alloc(void)
>
>  void i915_gem_object_free(struct drm_i915_gem_object *obj)
>  {
> -   return kmem_cache_free(global.slab_objects, obj);
> +   return kmem_cache_free(slab_objects, obj);
>  }
>
>  void i915_gem_object_init(struct drm_i915_gem_object *obj,
> @@ -664,23 +660,17 @@ void i915_gem_init__objects(struct drm_i915_private 
> *i915)
> INIT_WORK(&i915->mm.free_work, __i915_gem_free_work);
>  }
>
> -static void i915_global_objects_exit(void)
> +void i915_objects_module_exit(void)
>  {
> -   kmem_cache_destroy(global.slab_objects);
> +   kmem_cache_destroy(slab_objects);
>  }
>
> -static struct i915_global_object global = { {
> -   .exit = i915_global_objects_exit,
> -} };
> -
> -int __init i915_global_objects_init(void)
> +int __init i915_objects_module_init(void)
>  {
> -   global.slab_objects =
> -   KMEM_CACHE(drm_i915_gem_object, SLAB_HWCACHE_ALIGN);
> -   if (!global.slab_objects)
> +   slab_objects = KMEM_CACHE(drm_i915_gem_object, SLAB_HWCACHE_ALIGN);
> +   if (!slab_objects)
> return -ENOMEM;
>
> -   i915_global_register(&global.base);
> return 0;
>  }
>
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.h 
> b/drivers/gpu/drm/i915/gem/i915_gem_object.h
> index f3ede43282dc..6d8ea62a372f 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_object.h
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.h
> @@ -48,6 +48,9 @@ static inline bool i915_gem_object_size_2big(u64 size)
>
>  void i915_gem_init__objects(struct drm_i915_private *i915);
>
> +void i915_objects_module_exit(void);
> +int i915_objects_module_init(void);
> +
>  struct drm_i915_gem_object *i915_gem_object_alloc(void);
>  void i915_gem_object_free(struct drm_i915_gem_object *obj);
>
> diff --git a/drivers/gpu/drm/i915/i915_globals.c 
> b/drivers/gpu/drm/i915/i915_globals.c
> index dbb3d81eeea7..40a592fbc3e0 100644
> --- a/drivers/gpu/drm/i915/i915_globals.c
> +++ b/drivers/gpu/drm/i915/i915_globals.c
> @@ -30,7 +30,6 @@ static void __i915_globals_cleanup(void)
>  }
>
>  static __initconst int (* const initfn[])(void) = {
> -   i915_global_objects_init,
> i915_global_request_init,
> i915_global_scheduler_init,
> i915_global_vma_init,
> diff --git a/drivers/gpu/drm/i915/i915_globals.h 
> b/drivers/gpu/drm/i915/i915_globals.h
> index f16752dbbdbf..9734740708f4 100644
> --- a/drivers/gpu/drm/i915/i915_globals.h
> +++ b/drivers/gpu/drm/i915/i915_globals.h
> @@ -23,7 +23,6 @@ int i915_globals_init(void);
>  void i915_globals_exit(void);
>
>  /* constructors */
> -int i915_global_objects_init(void);
>  int i915_global_request_init(void);
>  in

Re: [PATCH 05/10] drm/i915: move gem_context slab to direct module init/exit

2021-07-26 Thread Jason Ekstrand
On Fri, Jul 23, 2021 at 2:29 PM Daniel Vetter  wrote:
>
> With the global kmem_cache shrink infrastructure gone there's nothing
> special and we can convert them over.
>
> I'm doing this split up into each patch because there's quite a bit of
> noise with removing the static global.slab_luts to just a
> slab_luts.
>
> Cc: Jason Ekstrand 
> Signed-off-by: Daniel Vetter 
> ---
>  drivers/gpu/drm/i915/gem/i915_gem_context.c | 25 +++--
>  drivers/gpu/drm/i915/gem/i915_gem_context.h |  3 +++
>  drivers/gpu/drm/i915/i915_globals.c |  2 --
>  drivers/gpu/drm/i915/i915_globals.h |  1 -
>  drivers/gpu/drm/i915/i915_pci.c |  2 ++
>  5 files changed, 13 insertions(+), 20 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c 
> b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> index 89ca401bf9ae..c17c28af1e57 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> @@ -79,25 +79,21 @@
>  #include "gt/intel_ring.h"
>
>  #include "i915_gem_context.h"
> -#include "i915_globals.h"
>  #include "i915_trace.h"
>  #include "i915_user_extensions.h"
>
>  #define ALL_L3_SLICES(dev) (1 << NUM_L3_SLICES(dev)) - 1
>
> -static struct i915_global_gem_context {
> -   struct i915_global base;
> -   struct kmem_cache *slab_luts;
> -} global;
> +struct kmem_cache *slab_luts;

static.

With that,

Reviewed-by: Jason Ekstrand 

>  struct i915_lut_handle *i915_lut_handle_alloc(void)
>  {
> -   return kmem_cache_alloc(global.slab_luts, GFP_KERNEL);
> +   return kmem_cache_alloc(slab_luts, GFP_KERNEL);
>  }
>
>  void i915_lut_handle_free(struct i915_lut_handle *lut)
>  {
> -   return kmem_cache_free(global.slab_luts, lut);
> +   return kmem_cache_free(slab_luts, lut);
>  }
>
>  static void lut_close(struct i915_gem_context *ctx)
> @@ -2282,21 +2278,16 @@ i915_gem_engines_iter_next(struct 
> i915_gem_engines_iter *it)
>  #include "selftests/i915_gem_context.c"
>  #endif
>
> -static void i915_global_gem_context_exit(void)
> +void i915_gem_context_module_exit(void)
>  {
> -   kmem_cache_destroy(global.slab_luts);
> +   kmem_cache_destroy(slab_luts);
>  }
>
> -static struct i915_global_gem_context global = { {
> -   .exit = i915_global_gem_context_exit,
> -} };
> -
> -int __init i915_global_gem_context_init(void)
> +int __init i915_gem_context_module_init(void)
>  {
> -   global.slab_luts = KMEM_CACHE(i915_lut_handle, 0);
> -   if (!global.slab_luts)
> +   slab_luts = KMEM_CACHE(i915_lut_handle, 0);
> +   if (!slab_luts)
> return -ENOMEM;
>
> -   i915_global_register(&global.base);
> return 0;
>  }
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.h 
> b/drivers/gpu/drm/i915/gem/i915_gem_context.h
> index 20411db84914..18060536b0c2 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_context.h
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.h
> @@ -224,6 +224,9 @@ i915_gem_engines_iter_next(struct i915_gem_engines_iter 
> *it);
> for (i915_gem_engines_iter_init(&(it), (engines)); \
>  ((ce) = i915_gem_engines_iter_next(&(it)));)
>
> +void i915_gem_context_module_exit(void);
> +int i915_gem_context_module_init(void);
> +
>  struct i915_lut_handle *i915_lut_handle_alloc(void);
>  void i915_lut_handle_free(struct i915_lut_handle *lut);
>
> diff --git a/drivers/gpu/drm/i915/i915_globals.c 
> b/drivers/gpu/drm/i915/i915_globals.c
> index d36eb7dc40aa..dbb3d81eeea7 100644
> --- a/drivers/gpu/drm/i915/i915_globals.c
> +++ b/drivers/gpu/drm/i915/i915_globals.c
> @@ -7,7 +7,6 @@
>  #include 
>  #include 
>
> -#include "gem/i915_gem_object.h"
>  #include "i915_globals.h"
>  #include "i915_request.h"
>  #include "i915_scheduler.h"
> @@ -31,7 +30,6 @@ static void __i915_globals_cleanup(void)
>  }
>
>  static __initconst int (* const initfn[])(void) = {
> -   i915_global_gem_context_init,
> i915_global_objects_init,
> i915_global_request_init,
> i915_global_scheduler_init,
> diff --git a/drivers/gpu/drm/i915/i915_globals.h 
> b/drivers/gpu/drm/i915/i915_globals.h
> index 60daa738a188..f16752dbbdbf 100644
> --- a/drivers/gpu/drm/i915/i915_globals.h
> +++ b/drivers/gpu/drm/i915/i915_globals.h
> @@ -23,7 +23,6 @@ int i915_globals_init(void);
>  void i915_globals_exit(void);
>
>  /* constructors */
> -int i915_global_gem_context_init(void);
>  int i915_global_objects_init(void);
>  int i915_global_request_init(void);
>  int i915_glob

Re: [Intel-gfx] [PATCH 0/8] drm/i915: Migrate memory to SMEM when imported cross-device (v8)

2021-07-26 Thread Jason Ekstrand
On Mon, Jul 26, 2021 at 10:29 AM Matthew Auld
 wrote:
>
> On Mon, 26 Jul 2021 at 16:11, Jason Ekstrand  wrote:
> >
> > On Mon, Jul 26, 2021 at 3:12 AM Matthew Auld
> >  wrote:
> > >
> > > On Fri, 23 Jul 2021 at 18:21, Jason Ekstrand  wrote:
> > > >
> > > > This patch series fixes an issue with discrete graphics on Intel where 
> > > > we
> > > > allowed dma-buf import while leaving the object in local memory.  This
> > > > breaks down pretty badly if the import happened on a different physical
> > > > device.
> > > >
> > > > v7:
> > > >  - Drop "drm/i915/gem/ttm: Place new BOs in the requested region"
> > > >  - Add a new "drm/i915/gem: Call i915_gem_flush_free_objects() in 
> > > > i915_gem_dumb_create()"
> > > >  - Misc. review feedback from Matthew Auld
> > > > v8:
> > > >  - Misc. review feedback from Matthew Auld
> > > > v9:
> > > >  - Replace the i915/ttm patch with two that are hopefully more correct
> > > >
> > > > Jason Ekstrand (6):
> > > >   drm/i915/gem: Check object_can_migrate from object_migrate
> > > >   drm/i915/gem: Refactor placement setup for i915_gem_object_create*
> > > > (v2)
> > > >   drm/i915/gem: Call i915_gem_flush_free_objects() in
> > > > i915_gem_dumb_create()
> > > >   drm/i915/gem: Unify user object creation (v3)
> > > >   drm/i915/gem/ttm: Only call __i915_gem_object_set_pages if needed
> > > >   drm/i915/gem: Always call obj->ops->migrate unless can_migrate fails
> > > >
> > > > Thomas Hellström (2):
> > > >   drm/i915/gem: Correct the locking and pin pattern for dma-buf (v8)
> > > >   drm/i915/gem: Migrate to system at dma-buf attach time (v7)
> > >
> > > Should I push the series?
> >
> > Yes, please.  Do we have a solid testing plan for things like this
> > that touch discrete?  I tested with mesa+glxgears on my DG1 but
> > haven't run anything more stressful.
>
> I think all we really have are the migration related selftests, and CI
> is not even running them on DG1 due to other breakage. Assuming you
> ran these locally, I think we just merge the series?

Works for me.  Yes, I ran them on my TGL+DG1 box.  I've also tested
both GL and Vulkan PRIME support with the client running on DG1 and
the compositor running on TGL with this series and everything works
smooth.

--Jason


> >
> > --Jason
> >
> >
> > > >
> > > >  drivers/gpu/drm/i915/gem/i915_gem_create.c| 177 
> > > >  drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c|  58 --
> > > >  drivers/gpu/drm/i915/gem/i915_gem_object.c|  20 +-
> > > >  drivers/gpu/drm/i915/gem/i915_gem_object.h|   4 +
> > > >  drivers/gpu/drm/i915/gem/i915_gem_ttm.c   |  13 +-
> > > >  .../drm/i915/gem/selftests/i915_gem_dmabuf.c  | 190 +-
> > > >  .../drm/i915/gem/selftests/i915_gem_migrate.c |  15 --
> > > >  7 files changed, 341 insertions(+), 136 deletions(-)
> > > >
> > > > --
> > > > 2.31.1
> > > >
> > > > ___
> > > > Intel-gfx mailing list
> > > > intel-...@lists.freedesktop.org
> > > > https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] [PATCH 04/10] drm/i915: move intel_context slab to direct module init/exit

2021-07-26 Thread Jason Ekstrand
On Mon, Jul 26, 2021 at 3:35 AM Tvrtko Ursulin
 wrote:
>
>
> On 23/07/2021 20:29, Daniel Vetter wrote:
> > With the global kmem_cache shrink infrastructure gone there's nothing
> > special and we can convert them over.
> >
> > I'm doing this split up into each patch because there's quite a bit of
> > noise with removing the static global.slab_ce to just a
> > slab_ce.
> >
> > Cc: Jason Ekstrand 
> > Signed-off-by: Daniel Vetter 
> > ---
> >   drivers/gpu/drm/i915/gt/intel_context.c | 25 -
> >   drivers/gpu/drm/i915/gt/intel_context.h |  3 +++
> >   drivers/gpu/drm/i915/i915_globals.c |  2 --
> >   drivers/gpu/drm/i915/i915_globals.h |  1 -
> >   drivers/gpu/drm/i915/i915_pci.c |  2 ++
> >   5 files changed, 13 insertions(+), 20 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/gt/intel_context.c 
> > b/drivers/gpu/drm/i915/gt/intel_context.c
> > index baa05fddd690..283382549a6f 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_context.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_context.c
> > @@ -7,7 +7,6 @@
> >   #include "gem/i915_gem_pm.h"
> >
> >   #include "i915_drv.h"
> > -#include "i915_globals.h"
> >   #include "i915_trace.h"
> >
> >   #include "intel_context.h"
> > @@ -15,14 +14,11 @@
> >   #include "intel_engine_pm.h"
> >   #include "intel_ring.h"
> >
> > -static struct i915_global_context {
> > - struct i915_global base;
> > - struct kmem_cache *slab_ce;
> > -} global;
> > +struct kmem_cache *slab_ce;

Static?  With that,

Reviewed-by: Jason Ekstrand 

> >
> >   static struct intel_context *intel_context_alloc(void)
> >   {
> > - return kmem_cache_zalloc(global.slab_ce, GFP_KERNEL);
> > + return kmem_cache_zalloc(slab_ce, GFP_KERNEL);
> >   }
> >
> >   static void rcu_context_free(struct rcu_head *rcu)
> > @@ -30,7 +26,7 @@ static void rcu_context_free(struct rcu_head *rcu)
> >   struct intel_context *ce = container_of(rcu, typeof(*ce), rcu);
> >
> >   trace_intel_context_free(ce);
> > - kmem_cache_free(global.slab_ce, ce);
> > + kmem_cache_free(slab_ce, ce);
> >   }
> >
> >   void intel_context_free(struct intel_context *ce)
> > @@ -410,22 +406,17 @@ void intel_context_fini(struct intel_context *ce)
> >   i915_active_fini(&ce->active);
> >   }
> >
> > -static void i915_global_context_exit(void)
> > +void i915_context_module_exit(void)
> >   {
> > - kmem_cache_destroy(global.slab_ce);
> > + kmem_cache_destroy(slab_ce);
> >   }
> >
> > -static struct i915_global_context global = { {
> > - .exit = i915_global_context_exit,
> > -} };
> > -
> > -int __init i915_global_context_init(void)
> > +int __init i915_context_module_init(void)
> >   {
> > - global.slab_ce = KMEM_CACHE(intel_context, SLAB_HWCACHE_ALIGN);
> > - if (!global.slab_ce)
> > + slab_ce = KMEM_CACHE(intel_context, SLAB_HWCACHE_ALIGN);
> > + if (!slab_ce)
> >   return -ENOMEM;
> >
> > - i915_global_register(&global.base);
> >   return 0;
> >   }
> >
> > diff --git a/drivers/gpu/drm/i915/gt/intel_context.h 
> > b/drivers/gpu/drm/i915/gt/intel_context.h
> > index 974ef85320c2..a0ca82e3c40d 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_context.h
> > +++ b/drivers/gpu/drm/i915/gt/intel_context.h
> > @@ -30,6 +30,9 @@ void intel_context_init(struct intel_context *ce,
> >   struct intel_engine_cs *engine);
> >   void intel_context_fini(struct intel_context *ce);
> >
> > +void i915_context_module_exit(void);
> > +int i915_context_module_init(void);
> > +
> >   struct intel_context *
> >   intel_context_create(struct intel_engine_cs *engine);
> >
> > diff --git a/drivers/gpu/drm/i915/i915_globals.c 
> > b/drivers/gpu/drm/i915/i915_globals.c
> > index 3de7cf22ec76..d36eb7dc40aa 100644
> > --- a/drivers/gpu/drm/i915/i915_globals.c
> > +++ b/drivers/gpu/drm/i915/i915_globals.c
> > @@ -7,7 +7,6 @@
> >   #include 
> >   #include 
> >
> > -#include "gem/i915_gem_context.h"
> >   #include "gem/i915_gem_object.h"
> >   #include "i915_globals.h"
> >   #include "i915_request.h"
> > @@ -32,7 +31,6 @@ static void __i915_globals_cleanup(void)
> >   }
> >
> >   static __initconst int (* const initfn[])(void) = {
> > -

Re: [PATCH 03/10] drm/i915: move i915_buddy slab to direct module init/exit

2021-07-26 Thread Jason Ekstrand
On Fri, Jul 23, 2021 at 2:29 PM Daniel Vetter  wrote:
>
> With the global kmem_cache shrink infrastructure gone, there's nothing
> special left and we can convert them over.
>
> I'm doing this as a separate patch for each slab because there's quite
> a bit of noise in converting the static global.slab_blocks to just
> slab_blocks.
>
> Cc: Jason Ekstrand 
> Signed-off-by: Daniel Vetter 
> ---
>  drivers/gpu/drm/i915/i915_buddy.c   | 25 -
>  drivers/gpu/drm/i915/i915_buddy.h   |  3 ++-
>  drivers/gpu/drm/i915/i915_globals.c |  2 --
>  drivers/gpu/drm/i915/i915_pci.c |  2 ++
>  4 files changed, 12 insertions(+), 20 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_buddy.c 
> b/drivers/gpu/drm/i915/i915_buddy.c
> index caabcaea3be7..045d00c43b4c 100644
> --- a/drivers/gpu/drm/i915/i915_buddy.c
> +++ b/drivers/gpu/drm/i915/i915_buddy.c
> @@ -8,13 +8,9 @@
>  #include "i915_buddy.h"
>
>  #include "i915_gem.h"
> -#include "i915_globals.h"
>  #include "i915_utils.h"
>
> -static struct i915_global_buddy {
> -   struct i915_global base;
> -   struct kmem_cache *slab_blocks;
> -} global;
> +struct kmem_cache *slab_blocks;

static?  With that fixed,

Reviewed-by: Jason Ekstrand 

>
>  static struct i915_buddy_block *i915_block_alloc(struct i915_buddy_mm *mm,
>  struct i915_buddy_block 
> *parent,
> @@ -25,7 +21,7 @@ static struct i915_buddy_block *i915_block_alloc(struct 
> i915_buddy_mm *mm,
>
> GEM_BUG_ON(order > I915_BUDDY_MAX_ORDER);
>
> -   block = kmem_cache_zalloc(global.slab_blocks, GFP_KERNEL);
> +   block = kmem_cache_zalloc(slab_blocks, GFP_KERNEL);
> if (!block)
> return NULL;
>
> @@ -40,7 +36,7 @@ static struct i915_buddy_block *i915_block_alloc(struct 
> i915_buddy_mm *mm,
>  static void i915_block_free(struct i915_buddy_mm *mm,
> struct i915_buddy_block *block)
>  {
> -   kmem_cache_free(global.slab_blocks, block);
> +   kmem_cache_free(slab_blocks, block);
>  }
>
>  static void mark_allocated(struct i915_buddy_block *block)
> @@ -410,21 +406,16 @@ int i915_buddy_alloc_range(struct i915_buddy_mm *mm,
>  #include "selftests/i915_buddy.c"
>  #endif
>
> -static void i915_global_buddy_exit(void)
> +void i915_buddy_module_exit(void)
>  {
> -   kmem_cache_destroy(global.slab_blocks);
> +   kmem_cache_destroy(slab_blocks);
>  }
>
> -static struct i915_global_buddy global = { {
> -   .exit = i915_global_buddy_exit,
> -} };
> -
> -int __init i915_global_buddy_init(void)
> +int __init i915_buddy_module_init(void)
>  {
> -   global.slab_blocks = KMEM_CACHE(i915_buddy_block, 0);
> -   if (!global.slab_blocks)
> +   slab_blocks = KMEM_CACHE(i915_buddy_block, 0);
> +   if (!slab_blocks)
> return -ENOMEM;
>
> -   i915_global_register();
> return 0;
>  }
> diff --git a/drivers/gpu/drm/i915/i915_buddy.h 
> b/drivers/gpu/drm/i915/i915_buddy.h
> index d8f26706de52..3940d632f208 100644
> --- a/drivers/gpu/drm/i915/i915_buddy.h
> +++ b/drivers/gpu/drm/i915/i915_buddy.h
> @@ -129,6 +129,7 @@ void i915_buddy_free(struct i915_buddy_mm *mm, struct 
> i915_buddy_block *block);
>
>  void i915_buddy_free_list(struct i915_buddy_mm *mm, struct list_head 
> *objects);
>
> -int i915_global_buddy_init(void);
> +void i915_buddy_module_exit(void);
> +int i915_buddy_module_init(void);
>
>  #endif
> diff --git a/drivers/gpu/drm/i915/i915_globals.c 
> b/drivers/gpu/drm/i915/i915_globals.c
> index a53135ee831d..3de7cf22ec76 100644
> --- a/drivers/gpu/drm/i915/i915_globals.c
> +++ b/drivers/gpu/drm/i915/i915_globals.c
> @@ -7,7 +7,6 @@
>  #include 
>  #include 
>
> -#include "i915_buddy.h"
>  #include "gem/i915_gem_context.h"
>  #include "gem/i915_gem_object.h"
>  #include "i915_globals.h"
> @@ -33,7 +32,6 @@ static void __i915_globals_cleanup(void)
>  }
>
>  static __initconst int (* const initfn[])(void) = {
> -   i915_global_buddy_init,
> i915_global_context_init,
> i915_global_gem_context_init,
> i915_global_objects_init,
> diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c
> index 6ee77a8f43d6..f9527269e30a 100644
> --- a/drivers/gpu/drm/i915/i915_pci.c
> +++ b/drivers/gpu/drm/i915/i915_pci.c
> @@ -31,6 +31,7 @@
>  #include "display/intel_fbdev.h"
>
>  #include "i915_active.h"
> +#include "i915_buddy.h"
>  #include "i915_drv.h"
>  #include "i915_perf.h"
>  #include "i915_globals.h"
> @@ -1295,6 +1296,7 @@ static const struct {
>  } init_funcs[] = {
> { i915_check_nomodeset, NULL },
> { i915_active_module_init, i915_active_module_exit },
> +   { i915_buddy_module_init, i915_buddy_module_exit },
> { i915_globals_init, i915_globals_exit },
> { i915_mock_selftests, NULL },
> { i915_pmu_init, i915_pmu_exit },
> --
> 2.32.0
>


Re: [PATCH 02/10] drm/i915: move i915_active slab to direct module init/exit

2021-07-26 Thread Jason Ekstrand
On Fri, Jul 23, 2021 at 2:29 PM Daniel Vetter  wrote:
>
> With the global kmem_cache shrink infrastructure gone, there's nothing
> special left and we can convert them over.
>
> I'm doing this as a separate patch for each slab because there's quite
> a bit of noise in converting the static global.slab_cache to just slab_cache.
>
> Cc: Jason Ekstrand 
> Signed-off-by: Daniel Vetter 
> ---
>  drivers/gpu/drm/i915/i915_active.c  | 31 ++---
>  drivers/gpu/drm/i915/i915_active.h  |  3 +++
>  drivers/gpu/drm/i915/i915_globals.c |  2 --
>  drivers/gpu/drm/i915/i915_globals.h |  1 -
>  drivers/gpu/drm/i915/i915_pci.c |  2 ++
>  5 files changed, 16 insertions(+), 23 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_active.c 
> b/drivers/gpu/drm/i915/i915_active.c
> index 91723123ae9f..9ffeb77eb5bb 100644
> --- a/drivers/gpu/drm/i915/i915_active.c
> +++ b/drivers/gpu/drm/i915/i915_active.c
> @@ -13,7 +13,6 @@
>
>  #include "i915_drv.h"
>  #include "i915_active.h"
> -#include "i915_globals.h"
>
>  /*
>   * Active refs memory management
> @@ -22,10 +21,7 @@
>   * they idle (when we know the active requests are inactive) and allocate the
>   * nodes from a local slab cache to hopefully reduce the fragmentation.
>   */
> -static struct i915_global_active {
> -   struct i915_global base;
> -   struct kmem_cache *slab_cache;
> -} global;
> +struct kmem_cache *slab_cache;

static?  Or were you planning to expose it somehow?  With that fixed,

Reviewed-by: Jason Ekstrand 

>
>  struct active_node {
> struct rb_node node;
> @@ -174,7 +170,7 @@ __active_retire(struct i915_active *ref)
> /* Finally free the discarded timeline tree  */
> rbtree_postorder_for_each_entry_safe(it, n, &root, node) {
> GEM_BUG_ON(i915_active_fence_isset(>base));
> -   kmem_cache_free(global.slab_cache, it);
> +   kmem_cache_free(slab_cache, it);
> }
>  }
>
> @@ -322,7 +318,7 @@ active_instance(struct i915_active *ref, u64 idx)
>  * XXX: We should preallocate this before i915_active_ref() is ever
>  *  called, but we cannot call into fs_reclaim() anyway, so use 
> GFP_ATOMIC.
>  */
> -   node = kmem_cache_alloc(global.slab_cache, GFP_ATOMIC);
> +   node = kmem_cache_alloc(slab_cache, GFP_ATOMIC);
> if (!node)
> goto out;
>
> @@ -788,7 +784,7 @@ void i915_active_fini(struct i915_active *ref)
> mutex_destroy(&ref->mutex);
>
> if (ref->cache)
> -   kmem_cache_free(global.slab_cache, ref->cache);
> +   kmem_cache_free(slab_cache, ref->cache);
>  }
>
>  static inline bool is_idle_barrier(struct active_node *node, u64 idx)
> @@ -908,7 +904,7 @@ int i915_active_acquire_preallocate_barrier(struct 
> i915_active *ref,
> node = reuse_idle_barrier(ref, idx);
> rcu_read_unlock();
> if (!node) {
> -   node = kmem_cache_alloc(global.slab_cache, 
> GFP_KERNEL);
> +   node = kmem_cache_alloc(slab_cache, GFP_KERNEL);
> if (!node)
> goto unwind;
>
> @@ -956,7 +952,7 @@ int i915_active_acquire_preallocate_barrier(struct 
> i915_active *ref,
> atomic_dec(&ref->count);
> intel_engine_pm_put(barrier_to_engine(node));
>
> -   kmem_cache_free(global.slab_cache, node);
> +   kmem_cache_free(slab_cache, node);
> }
> return -ENOMEM;
>  }
> @@ -1176,21 +1172,16 @@ struct i915_active *i915_active_create(void)
>  #include "selftests/i915_active.c"
>  #endif
>
> -static void i915_global_active_exit(void)
> +void i915_active_module_exit(void)
>  {
> -   kmem_cache_destroy(global.slab_cache);
> +   kmem_cache_destroy(slab_cache);
>  }
>
> -static struct i915_global_active global = { {
> -   .exit = i915_global_active_exit,
> -} };
> -
> -int __init i915_global_active_init(void)
> +int __init i915_active_module_init(void)
>  {
> -   global.slab_cache = KMEM_CACHE(active_node, SLAB_HWCACHE_ALIGN);
> -   if (!global.slab_cache)
> +   slab_cache = KMEM_CACHE(active_node, SLAB_HWCACHE_ALIGN);
> +   if (!slab_cache)
> return -ENOMEM;
>
> -   i915_global_register();
> return 0;
>  }
> diff --git a/drivers/gpu/drm/i915/i915_active.h 
> b/drivers/gpu/drm/i915/i915_active.h
> index d0feda68b874..5fcdb0e2bc9e 100644
> --- a/drivers/gpu/drm/i915/i915_active.h
> +++ b/drivers/gpu/drm/i915/i915_active.h
> @@ -247,4 +247,
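
The three conversions above (intel_context, i915_buddy, i915_active) all
land on the same shape: a file-local slab plus a pair of module-level
init/exit hooks that get called directly instead of going through
i915_globals. A minimal sketch of that shape, with a hypothetical "foo"
object standing in for the real i915 structures:

/* Sketch only: "foo" and these names are illustrative, not i915 code. */
#include <linux/errno.h>
#include <linux/init.h>
#include <linux/slab.h>

struct foo {
	int payload;
};

/* File-local slab; per the review comments above it wants to be static. */
static struct kmem_cache *slab_foo;

static struct foo *foo_alloc(void)
{
	return kmem_cache_zalloc(slab_foo, GFP_KERNEL);
}

static void foo_free(struct foo *f)
{
	kmem_cache_free(slab_foo, f);
}

/* Called directly from the module-wide init/exit table. */
int __init foo_module_init(void)
{
	slab_foo = KMEM_CACHE(foo, SLAB_HWCACHE_ALIGN);
	if (!slab_foo)
		return -ENOMEM;
	return 0;
}

void foo_module_exit(void)
{
	kmem_cache_destroy(slab_foo);
}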

Re: [PATCH 01/10] drm/i915: Check for nomodeset in i915_init() first

2021-07-26 Thread Jason Ekstrand
On Fri, Jul 23, 2021 at 2:29 PM Daniel Vetter  wrote:
>
> When modesetting (aka the full pci driver; this has nothing to do with
> the disable_display option, which just gives you the full pci driver
> without the display driver) is disabled, we load nothing and do
> nothing.
>
> So move that check first, for a bit of orderliness. With Jason's
> module init/exit table this now becomes trivial.
>
> Cc: Jason Ekstrand 
> Signed-off-by: Daniel Vetter 

Reviewed-by: Jason Ekstrand 

> ---
>  drivers/gpu/drm/i915/i915_pci.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c
> index 48ea23dd3b5b..0deaeeba2347 100644
> --- a/drivers/gpu/drm/i915/i915_pci.c
> +++ b/drivers/gpu/drm/i915/i915_pci.c
> @@ -1292,9 +1292,9 @@ static const struct {
> int (*init)(void);
> void (*exit)(void);
>  } init_funcs[] = {
> +   { i915_check_nomodeset, NULL },
> { i915_globals_init, i915_globals_exit },
> { i915_mock_selftests, NULL },
> -   { i915_check_nomodeset, NULL },
> { i915_pmu_init, i915_pmu_exit },
> { i915_register_pci_driver, i915_unregister_pci_driver },
> { i915_perf_sysctl_register, i915_perf_sysctl_unregister },
> --
> 2.32.0
>
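
For context, the init_funcs[] table touched here is walked in order at
module load, and the exit() hooks of entries that already succeeded are
used to unwind when a later init() fails. A minimal userspace sketch of
that driver loop (names are illustrative; the real loop lives in
i915_init()/i915_exit()):

#include <stdio.h>

static int a_init(void) { puts("a_init"); return 0; }
static void a_exit(void) { puts("a_exit"); }
static int b_init(void) { puts("b_init (simulated failure)"); return -1; }
static void b_exit(void) { puts("b_exit"); }

static const struct {
	int (*init)(void);
	void (*exit)(void);
} init_funcs[] = {
	{ a_init, a_exit },
	{ b_init, b_exit },
};

int main(void)
{
	int i, err = 0;
	const int n = sizeof(init_funcs) / sizeof(init_funcs[0]);

	for (i = 0; i < n; i++) {
		err = init_funcs[i].init();
		if (err)
			break;
	}
	if (err) {
		/* Unwind only what already succeeded, in reverse order. */
		while (i--) {
			if (init_funcs[i].exit)
				init_funcs[i].exit();
		}
	}
	return err ? 1 : 0;
}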


Re: [PATCH] drm/i915/userptr: Probe existence of backing struct pages upon creation

2021-07-26 Thread Jason Ekstrand
On Mon, Jul 26, 2021 at 3:31 AM Maarten Lankhorst
 wrote:
>
> Op 23-07-2021 om 13:34 schreef Matthew Auld:
> > From: Chris Wilson 
> >
> > Jason Ekstrand requested a more efficient method than userptr+set-domain
> > to determine if the userptr object was backed by a complete set of pages
> > upon creation. To be more efficient than simply populating the userptr
> > using get_user_pages() (as done by the call to set-domain or execbuf),
> > we can walk the tree of vm_area_struct and check for gaps or vma not
> > backed by struct page (VM_PFNMAP). The question is how to handle
> > VM_MIXEDMAP which may be either struct page or pfn backed...
> >
> > With discrete we are going to drop support for set_domain(), so offering
> > a way to probe the pages, without having to resort to dummy batches has
> > been requested.
> >
> > v2:
> > - add new query param for the PROBE flag, so userspace can easily
> >   check if the kernel supports it (Jason).
> > - use mmap_read_{lock, unlock}.
> > - add some kernel-doc.
> > v3:
> > - In the docs also mention that PROBE doesn't guarantee that the pages
> >   will remain valid by the time they are actually used (Tvrtko).
> > - Add a small comment for the hole finding logic (Jason).
> > - Move the param next to all the other params which just return true.
> >
> > Testcase: igt/gem_userptr_blits/probe
> > Signed-off-by: Chris Wilson 
> > Signed-off-by: Matthew Auld 
> > Cc: Thomas Hellström 
> > Cc: Maarten Lankhorst 
> > Cc: Tvrtko Ursulin 
> > Cc: Jordan Justen 
> > Cc: Kenneth Graunke 
> > Cc: Jason Ekstrand 
> > Cc: Daniel Vetter 
> > Cc: Ramalingam C 
> > Reviewed-by: Tvrtko Ursulin 
> > Acked-by: Kenneth Graunke 
> > Reviewed-by: Jason Ekstrand 
> > ---
> >  drivers/gpu/drm/i915/gem/i915_gem_userptr.c | 41 -
> >  drivers/gpu/drm/i915/i915_getparam.c|  1 +
> >  include/uapi/drm/i915_drm.h | 20 ++
> >  3 files changed, 61 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c 
> > b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
> > index 56edfeff8c02..468a7a617fbf 100644
> > --- a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
> > +++ b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
> > @@ -422,6 +422,34 @@ static const struct drm_i915_gem_object_ops 
> > i915_gem_userptr_ops = {
> >
> >  #endif
> >
> > +static int
> > +probe_range(struct mm_struct *mm, unsigned long addr, unsigned long len)
> > +{
> > + const unsigned long end = addr + len;
> > + struct vm_area_struct *vma;
> > + int ret = -EFAULT;
> > +
> > + mmap_read_lock(mm);
> > + for (vma = find_vma(mm, addr); vma; vma = vma->vm_next) {
> > + /* Check for holes, note that we also update the addr below */
> > + if (vma->vm_start > addr)
> > + break;
> > +
> > + if (vma->vm_flags & (VM_PFNMAP | VM_MIXEDMAP))
> > + break;
> > +
> > + if (vma->vm_end >= end) {
> > + ret = 0;
> > + break;
> > + }
> > +
> > + addr = vma->vm_end;
> > + }
> > + mmap_read_unlock(mm);
> > +
> > + return ret;
> > +}
> > +
> >  /*
> >   * Creates a new mm object that wraps some normal memory from the process
> >   * context - user memory.
> > @@ -477,7 +505,8 @@ i915_gem_userptr_ioctl(struct drm_device *dev,
> >   }
> >
> >   if (args->flags & ~(I915_USERPTR_READ_ONLY |
> > - I915_USERPTR_UNSYNCHRONIZED))
> > + I915_USERPTR_UNSYNCHRONIZED |
> > + I915_USERPTR_PROBE))
> >   return -EINVAL;
> >
> >   if (i915_gem_object_size_2big(args->user_size))
> > @@ -504,6 +533,16 @@ i915_gem_userptr_ioctl(struct drm_device *dev,
> >   return -ENODEV;
> >   }
> >
> > + if (args->flags & I915_USERPTR_PROBE) {
> > + /*
> > +  * Check that the range pointed to represents real struct
> > +  * pages and not iomappings (at this moment in time!)
> > +  */
> > + ret = probe_range(current->mm, args->user_ptr, 
> > args->user_size);
> > + if (ret)
> > + return ret;
> > + }
> > +
>
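
As a usage sketch, the new flag is passed straight into the userptr
ioctl, and creation then fails up front with -EFAULT when the range is
not fully backed by struct pages. The flag name below comes from this
patch; anyone copying this should double-check the final i915_drm.h uapi
header:

#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <drm/i915_drm.h>

/* Create a userptr BO and ask the kernel to probe the backing pages at
 * creation time. Note the probe says nothing about the pages staying
 * valid by the time they are actually used. */
static int userptr_create_probed(int drm_fd, void *ptr, uint64_t size,
				 uint32_t *handle)
{
	struct drm_i915_gem_userptr arg;

	memset(&arg, 0, sizeof(arg));
	arg.user_ptr = (uintptr_t)ptr;
	arg.user_size = size;
	arg.flags = I915_USERPTR_PROBE;

	if (ioctl(drm_fd, DRM_IOCTL_I915_GEM_USERPTR, &arg))
		return -1; /* errno holds EFAULT if the probe failed */

	*handle = arg.handle;
	return 0;
}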

Re: [Intel-gfx] [PATCH] drm/i915/userptr: Probe existence of backing struct pages upon creation

2021-07-26 Thread Jason Ekstrand
On Mon, Jul 26, 2021 at 3:06 AM Matthew Auld
 wrote:
>
> On Fri, 23 Jul 2021 at 18:48, Jason Ekstrand  wrote:
> >
> > https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12044
>
> Cool, is that ready to go? i.e can we start merging the kernel + IGT side.

Yes, it's all reviewed.  Though, it sounds like Maarten had a comment
so we should settle on that before landing.

> >
> > On Fri, Jul 23, 2021 at 6:35 AM Matthew Auld  wrote:
> > >
> > > From: Chris Wilson 
> > >
> > > Jason Ekstrand requested a more efficient method than userptr+set-domain
> > > to determine if the userptr object was backed by a complete set of pages
> > > upon creation. To be more efficient than simply populating the userptr
> > > using get_user_pages() (as done by the call to set-domain or execbuf),
> > > we can walk the tree of vm_area_struct and check for gaps or vma not
> > > backed by struct page (VM_PFNMAP). The question is how to handle
> > > VM_MIXEDMAP which may be either struct page or pfn backed...
> > >
> > > With discrete we are going to drop support for set_domain(), so offering
> > > a way to probe the pages, without having to resort to dummy batches has
> > > been requested.
> > >
> > > v2:
> > > - add new query param for the PROBE flag, so userspace can easily
> > >   check if the kernel supports it (Jason).
> > > - use mmap_read_{lock, unlock}.
> > > - add some kernel-doc.
> > > v3:
> > > - In the docs also mention that PROBE doesn't guarantee that the pages
> > >   will remain valid by the time they are actually used (Tvrtko).
> > > - Add a small comment for the hole finding logic (Jason).
> > > - Move the param next to all the other params which just return true.
> > >
> > > Testcase: igt/gem_userptr_blits/probe
> > > Signed-off-by: Chris Wilson 
> > > Signed-off-by: Matthew Auld 
> > > Cc: Thomas Hellström 
> > > Cc: Maarten Lankhorst 
> > > Cc: Tvrtko Ursulin 
> > > Cc: Jordan Justen 
> > > Cc: Kenneth Graunke 
> > > Cc: Jason Ekstrand 
> > > Cc: Daniel Vetter 
> > > Cc: Ramalingam C 
> > > Reviewed-by: Tvrtko Ursulin 
> > > Acked-by: Kenneth Graunke 
> > > Reviewed-by: Jason Ekstrand 
> > > ---
> > >  drivers/gpu/drm/i915/gem/i915_gem_userptr.c | 41 -
> > >  drivers/gpu/drm/i915/i915_getparam.c|  1 +
> > >  include/uapi/drm/i915_drm.h | 20 ++
> > >  3 files changed, 61 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c 
> > > b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
> > > index 56edfeff8c02..468a7a617fbf 100644
> > > --- a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
> > > +++ b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
> > > @@ -422,6 +422,34 @@ static const struct drm_i915_gem_object_ops 
> > > i915_gem_userptr_ops = {
> > >
> > >  #endif
> > >
> > > +static int
> > > +probe_range(struct mm_struct *mm, unsigned long addr, unsigned long len)
> > > +{
> > > +   const unsigned long end = addr + len;
> > > +   struct vm_area_struct *vma;
> > > +   int ret = -EFAULT;
> > > +
> > > +   mmap_read_lock(mm);
> > > +   for (vma = find_vma(mm, addr); vma; vma = vma->vm_next) {
> > > +   /* Check for holes, note that we also update the addr 
> > > below */
> > > +   if (vma->vm_start > addr)
> > > +   break;
> > > +
> > > +   if (vma->vm_flags & (VM_PFNMAP | VM_MIXEDMAP))
> > > +   break;
> > > +
> > > +   if (vma->vm_end >= end) {
> > > +   ret = 0;
> > > +   break;
> > > +   }
> > > +
> > > +   addr = vma->vm_end;
> > > +   }
> > > +   mmap_read_unlock(mm);
> > > +
> > > +   return ret;
> > > +}
> > > +
> > >  /*
> > >   * Creates a new mm object that wraps some normal memory from the process
> > >   * context - user memory.
> > > @@ -477,7 +505,8 @@ i915_gem_userptr_ioctl(struct drm_device *dev,
> > > }
> > >
> > > if (args->flags & ~(I915_USERPTR_READ_ONLY |
> > > -   I915_USERPTR_UNSYNC

Re: [Intel-gfx] [PATCH 0/8] drm/i915: Migrate memory to SMEM when imported cross-device (v8)

2021-07-26 Thread Jason Ekstrand
On Mon, Jul 26, 2021 at 3:12 AM Matthew Auld
 wrote:
>
> On Fri, 23 Jul 2021 at 18:21, Jason Ekstrand  wrote:
> >
> > This patch series fixes an issue with discrete graphics on Intel where we
> > allowed dma-buf import while leaving the object in local memory.  This
> > breaks down pretty badly if the import happened on a different physical
> > device.
> >
> > v7:
> >  - Drop "drm/i915/gem/ttm: Place new BOs in the requested region"
> >  - Add a new "drm/i915/gem: Call i915_gem_flush_free_objects() in 
> > i915_gem_dumb_create()"
> >  - Misc. review feedback from Matthew Auld
> > v8:
> >  - Misc. review feedback from Matthew Auld
> > v9:
> >  - Replace the i915/ttm patch with two that are hopefully more correct
> >
> > Jason Ekstrand (6):
> >   drm/i915/gem: Check object_can_migrate from object_migrate
> >   drm/i915/gem: Refactor placement setup for i915_gem_object_create*
> > (v2)
> >   drm/i915/gem: Call i915_gem_flush_free_objects() in
> > i915_gem_dumb_create()
> >   drm/i915/gem: Unify user object creation (v3)
> >   drm/i915/gem/ttm: Only call __i915_gem_object_set_pages if needed
> >   drm/i915/gem: Always call obj->ops->migrate unless can_migrate fails
> >
> > Thomas Hellström (2):
> >   drm/i915/gem: Correct the locking and pin pattern for dma-buf (v8)
> >   drm/i915/gem: Migrate to system at dma-buf attach time (v7)
>
> Should I push the series?

Yes, please.  Do we have a solid testing plan for things like this
that touch discrete?  I tested with mesa+glxgears on my DG1 but
haven't run anything more stressful.

--Jason


> >
> >  drivers/gpu/drm/i915/gem/i915_gem_create.c| 177 
> >  drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c|  58 --
> >  drivers/gpu/drm/i915/gem/i915_gem_object.c|  20 +-
> >  drivers/gpu/drm/i915/gem/i915_gem_object.h|   4 +
> >  drivers/gpu/drm/i915/gem/i915_gem_ttm.c   |  13 +-
> >  .../drm/i915/gem/selftests/i915_gem_dmabuf.c  | 190 +-
> >  .../drm/i915/gem/selftests/i915_gem_migrate.c |  15 --
> >  7 files changed, 341 insertions(+), 136 deletions(-)
> >
> > --
> > 2.31.1
> >
> > ___
> > Intel-gfx mailing list
> > intel-...@lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [PATCH 00/30] Remove CNL support

2021-07-23 Thread Jason Ekstrand

Generally a big fan. 

--Jason

On July 23, 2021 19:11:34 Lucas De Marchi  wrote:


Patches 1 and 2 are already being reviewed elsewhere. Discussion on the
2nd patch made me revive something I started after a comment from Ville
at 
https://patchwork.freedesktop.org/patch/428168/?series=88988=1#comment_768918


This removes CNL completely from the driver, while trying to rename
functions and macros where appropriate (usually to GLK when dealing with
display or with ICL otherwise). It starts with display, which is more
straightforward, and then proceed to the rest of i915.

A diff stat removing 1600 lines of dead code seems to make the pain of
doing this worth it.


Lucas De Marchi (30):
 drm/i915: fix not reading DSC disable fuse in GLK
 drm/i915/display: split DISPLAY_VER 9 and 10 in intel_setup_outputs()
 drm/i915/display: remove PORT_F workaround for CNL
 drm/i915/display: remove explicit CNL handling from intel_cdclk.c
 drm/i915/display: remove explicit CNL handling from intel_color.c
 drm/i915/display: remove explicit CNL handling from intel_combo_phy.c
 drm/i915/display: remove explicit CNL handling from intel_crtc.c
 drm/i915/display: remove explicit CNL handling from intel_ddi.c
 drm/i915/display: remove explicit CNL handling from
   intel_display_debugfs.c
 drm/i915/display: remove explicit CNL handling from intel_dmc.c
 drm/i915/display: remove explicit CNL handling from intel_dp.c
 drm/i915/display: remove explicit CNL handling from intel_dpll_mgr.c
 drm/i915/display: remove explicit CNL handling from intel_vdsc.c
 drm/i915/display: remove explicit CNL handling from
   skl_universal_plane.c
 drm/i915/display: remove explicit CNL handling from
   intel_display_power.c
 drm/i915/display: remove CNL ddi buf translation tables
 drm/i915/display: rename CNL references in skl_scaler.c
 drm/i915: remove explicit CNL handling from i915_irq.c
 drm/i915: remove explicit CNL handling from intel_pm.c
 drm/i915: remove explicit CNL handling from intel_mocs.c
 drm/i915: remove explicit CNL handling from intel_pch.c
 drm/i915: remove explicit CNL handling from intel_wopcm.c
 drm/i915/gt: remove explicit CNL handling from intel_sseu.c
 drm/i915: rename CNL references in intel_dram.c
 drm/i915/gt: rename CNL references in intel_engine.h
 drm/i915: finish removal of CNL
 drm/i915: remove GRAPHICS_VER == 10
 drm/i915: rename/remove CNL registers
 drm/i915: replace random CNL comments
 drm/i915: switch num_scalers/num_sprites to consider DISPLAY_VER

drivers/gpu/drm/i915/display/intel_bios.c |   8 +-
drivers/gpu/drm/i915/display/intel_cdclk.c|  72 +-
drivers/gpu/drm/i915/display/intel_color.c|   5 +-
.../gpu/drm/i915/display/intel_combo_phy.c| 106 +--
drivers/gpu/drm/i915/display/intel_crtc.c |   2 +-
drivers/gpu/drm/i915/display/intel_ddi.c  | 266 +---
.../drm/i915/display/intel_ddi_buf_trans.c| 616 +-
.../drm/i915/display/intel_ddi_buf_trans.h|   4 +-
drivers/gpu/drm/i915/display/intel_display.c  |   3 +-
.../drm/i915/display/intel_display_debugfs.c  |   2 +-
.../drm/i915/display/intel_display_power.c| 289 
.../drm/i915/display/intel_display_power.h|   2 -
drivers/gpu/drm/i915/display/intel_dmc.c  |   9 -
drivers/gpu/drm/i915/display/intel_dp.c   |  35 +-
drivers/gpu/drm/i915/display/intel_dp_aux.c   |   1 -
drivers/gpu/drm/i915/display/intel_dpll_mgr.c | 586 +++--
drivers/gpu/drm/i915/display/intel_dpll_mgr.h |   1 -
drivers/gpu/drm/i915/display/intel_vbt_defs.h |   2 +-
drivers/gpu/drm/i915/display/intel_vdsc.c |   5 +-
drivers/gpu/drm/i915/display/skl_scaler.c |  10 +-
.../drm/i915/display/skl_universal_plane.c|  14 +-
drivers/gpu/drm/i915/gem/i915_gem_stolen.c|   1 -
drivers/gpu/drm/i915/gt/debugfs_gt_pm.c   |  10 +-
drivers/gpu/drm/i915/gt/intel_engine.h|   2 +-
drivers/gpu/drm/i915/gt/intel_engine_cs.c |   3 -
drivers/gpu/drm/i915/gt/intel_ggtt.c  |   4 +-
.../gpu/drm/i915/gt/intel_gt_clock_utils.c|  10 +-
drivers/gpu/drm/i915/gt/intel_gtt.c   |   6 +-
drivers/gpu/drm/i915/gt/intel_lrc.c   |  42 +-
drivers/gpu/drm/i915/gt/intel_mocs.c  |   2 +-
drivers/gpu/drm/i915/gt/intel_rc6.c   |   2 +-
drivers/gpu/drm/i915/gt/intel_rps.c   |   4 +-
drivers/gpu/drm/i915/gt/intel_sseu.c  |  79 ---
drivers/gpu/drm/i915/gt/intel_sseu.h  |   2 +-
drivers/gpu/drm/i915/gt/intel_sseu_debugfs.c  |   6 +-
drivers/gpu/drm/i915/gvt/gtt.c|   2 +-
drivers/gpu/drm/i915/i915_debugfs.c   |   6 +-
drivers/gpu/drm/i915/i915_drv.h   |  13 +-
drivers/gpu/drm/i915/i915_irq.c   |   7 +-
drivers/gpu/drm/i915/i915_pci.c   |  23 +-
drivers/gpu/drm/i915/i915_perf.c  |  22 +-
drivers/gpu/drm/i915/i915_reg.h   | 245 ++-
drivers/gpu/drm/i915/intel_device_info.c  |  23 +-
drivers/gpu/drm/i915/intel_device_info.h  |   4 +-
drivers/gpu/drm/i915/intel_dram.c |  32 +-

Re: [PATCH] drm/i915/userptr: Probe existence of backing struct pages upon creation

2021-07-23 Thread Jason Ekstrand
Are there IGTs for this anywhere?

On Fri, Jul 23, 2021 at 12:47 PM Jason Ekstrand  wrote:
>
> https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12044
>
> On Fri, Jul 23, 2021 at 6:35 AM Matthew Auld  wrote:
> >
> > From: Chris Wilson 
> >
> > Jason Ekstrand requested a more efficient method than userptr+set-domain
> > to determine if the userptr object was backed by a complete set of pages
> > upon creation. To be more efficient than simply populating the userptr
> > using get_user_pages() (as done by the call to set-domain or execbuf),
> > we can walk the tree of vm_area_struct and check for gaps or vma not
> > backed by struct page (VM_PFNMAP). The question is how to handle
> > VM_MIXEDMAP which may be either struct page or pfn backed...
> >
> > With discrete we are going to drop support for set_domain(), so offering
> > a way to probe the pages, without having to resort to dummy batches has
> > been requested.
> >
> > v2:
> > - add new query param for the PROBE flag, so userspace can easily
> >   check if the kernel supports it (Jason).
> > - use mmap_read_{lock, unlock}.
> > - add some kernel-doc.
> > v3:
> > - In the docs also mention that PROBE doesn't guarantee that the pages
> >   will remain valid by the time they are actually used (Tvrtko).
> > - Add a small comment for the hole finding logic (Jason).
> > - Move the param next to all the other params which just return true.
> >
> > Testcase: igt/gem_userptr_blits/probe
> > Signed-off-by: Chris Wilson 
> > Signed-off-by: Matthew Auld 
> > Cc: Thomas Hellström 
> > Cc: Maarten Lankhorst 
> > Cc: Tvrtko Ursulin 
> > Cc: Jordan Justen 
> > Cc: Kenneth Graunke 
> > Cc: Jason Ekstrand 
> > Cc: Daniel Vetter 
> > Cc: Ramalingam C 
> > Reviewed-by: Tvrtko Ursulin 
> > Acked-by: Kenneth Graunke 
> > Reviewed-by: Jason Ekstrand 
> > ---
> >  drivers/gpu/drm/i915/gem/i915_gem_userptr.c | 41 -
> >  drivers/gpu/drm/i915/i915_getparam.c|  1 +
> >  include/uapi/drm/i915_drm.h | 20 ++
> >  3 files changed, 61 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c 
> > b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
> > index 56edfeff8c02..468a7a617fbf 100644
> > --- a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
> > +++ b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
> > @@ -422,6 +422,34 @@ static const struct drm_i915_gem_object_ops 
> > i915_gem_userptr_ops = {
> >
> >  #endif
> >
> > +static int
> > +probe_range(struct mm_struct *mm, unsigned long addr, unsigned long len)
> > +{
> > +   const unsigned long end = addr + len;
> > +   struct vm_area_struct *vma;
> > +   int ret = -EFAULT;
> > +
> > +   mmap_read_lock(mm);
> > +   for (vma = find_vma(mm, addr); vma; vma = vma->vm_next) {
> > +   /* Check for holes, note that we also update the addr below 
> > */
> > +   if (vma->vm_start > addr)
> > +   break;
> > +
> > +   if (vma->vm_flags & (VM_PFNMAP | VM_MIXEDMAP))
> > +   break;
> > +
> > +   if (vma->vm_end >= end) {
> > +   ret = 0;
> > +   break;
> > +   }
> > +
> > +   addr = vma->vm_end;
> > +   }
> > +   mmap_read_unlock(mm);
> > +
> > +   return ret;
> > +}
> > +
> >  /*
> >   * Creates a new mm object that wraps some normal memory from the process
> >   * context - user memory.
> > @@ -477,7 +505,8 @@ i915_gem_userptr_ioctl(struct drm_device *dev,
> > }
> >
> > if (args->flags & ~(I915_USERPTR_READ_ONLY |
> > -   I915_USERPTR_UNSYNCHRONIZED))
> > +   I915_USERPTR_UNSYNCHRONIZED |
> > +   I915_USERPTR_PROBE))
> > return -EINVAL;
> >
> > if (i915_gem_object_size_2big(args->user_size))
> > @@ -504,6 +533,16 @@ i915_gem_userptr_ioctl(struct drm_device *dev,
> > return -ENODEV;
> > }
> >
> > +   if (args->flags & I915_USERPTR_PROBE) {
> > +   /*
> > +* Check that the range pointed to represents real struct
> > +* pages and not iomappings (at this moment in time!)
> > +*/
> > +

Re: [PATCH] drm/i915/userptr: Probe existence of backing struct pages upon creation

2021-07-23 Thread Jason Ekstrand
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12044

On Fri, Jul 23, 2021 at 6:35 AM Matthew Auld  wrote:
>
> From: Chris Wilson 
>
> Jason Ekstrand requested a more efficient method than userptr+set-domain
> to determine if the userptr object was backed by a complete set of pages
> upon creation. To be more efficient than simply populating the userptr
> using get_user_pages() (as done by the call to set-domain or execbuf),
> we can walk the tree of vm_area_struct and check for gaps or vma not
> backed by struct page (VM_PFNMAP). The question is how to handle
> VM_MIXEDMAP which may be either struct page or pfn backed...
>
> With discrete we are going to drop support for set_domain(), so offering
> a way to probe the pages, without having to resort to dummy batches has
> been requested.
>
> v2:
> - add new query param for the PROBE flag, so userspace can easily
>   check if the kernel supports it (Jason).
> - use mmap_read_{lock, unlock}.
> - add some kernel-doc.
> v3:
> - In the docs also mention that PROBE doesn't guarantee that the pages
>   will remain valid by the time they are actually used (Tvrtko).
> - Add a small comment for the hole finding logic (Jason).
> - Move the param next to all the other params which just return true.
>
> Testcase: igt/gem_userptr_blits/probe
> Signed-off-by: Chris Wilson 
> Signed-off-by: Matthew Auld 
> Cc: Thomas Hellström 
> Cc: Maarten Lankhorst 
> Cc: Tvrtko Ursulin 
> Cc: Jordan Justen 
> Cc: Kenneth Graunke 
> Cc: Jason Ekstrand 
> Cc: Daniel Vetter 
> Cc: Ramalingam C 
> Reviewed-by: Tvrtko Ursulin 
> Acked-by: Kenneth Graunke 
> Reviewed-by: Jason Ekstrand 
> ---
>  drivers/gpu/drm/i915/gem/i915_gem_userptr.c | 41 -
>  drivers/gpu/drm/i915/i915_getparam.c|  1 +
>  include/uapi/drm/i915_drm.h | 20 ++
>  3 files changed, 61 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c 
> b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
> index 56edfeff8c02..468a7a617fbf 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
> @@ -422,6 +422,34 @@ static const struct drm_i915_gem_object_ops 
> i915_gem_userptr_ops = {
>
>  #endif
>
> +static int
> +probe_range(struct mm_struct *mm, unsigned long addr, unsigned long len)
> +{
> +   const unsigned long end = addr + len;
> +   struct vm_area_struct *vma;
> +   int ret = -EFAULT;
> +
> +   mmap_read_lock(mm);
> +   for (vma = find_vma(mm, addr); vma; vma = vma->vm_next) {
> +   /* Check for holes, note that we also update the addr below */
> +   if (vma->vm_start > addr)
> +   break;
> +
> +   if (vma->vm_flags & (VM_PFNMAP | VM_MIXEDMAP))
> +   break;
> +
> +   if (vma->vm_end >= end) {
> +   ret = 0;
> +   break;
> +   }
> +
> +   addr = vma->vm_end;
> +   }
> +   mmap_read_unlock(mm);
> +
> +   return ret;
> +}
> +
>  /*
>   * Creates a new mm object that wraps some normal memory from the process
>   * context - user memory.
> @@ -477,7 +505,8 @@ i915_gem_userptr_ioctl(struct drm_device *dev,
> }
>
> if (args->flags & ~(I915_USERPTR_READ_ONLY |
> -   I915_USERPTR_UNSYNCHRONIZED))
> +   I915_USERPTR_UNSYNCHRONIZED |
> +   I915_USERPTR_PROBE))
> return -EINVAL;
>
> if (i915_gem_object_size_2big(args->user_size))
> @@ -504,6 +533,16 @@ i915_gem_userptr_ioctl(struct drm_device *dev,
> return -ENODEV;
> }
>
> +   if (args->flags & I915_USERPTR_PROBE) {
> +   /*
> +* Check that the range pointed to represents real struct
> +* pages and not iomappings (at this moment in time!)
> +*/
> +   ret = probe_range(current->mm, args->user_ptr, 
> args->user_size);
> +   if (ret)
> +   return ret;
> +   }
> +
>  #ifdef CONFIG_MMU_NOTIFIER
> obj = i915_gem_object_alloc();
> if (obj == NULL)
> diff --git a/drivers/gpu/drm/i915/i915_getparam.c 
> b/drivers/gpu/drm/i915/i915_getparam.c
> index 24e18219eb50..bbb7cac43eb4 100644
> --- a/drivers/gpu/drm/i915/i915_getparam.c
> +++ b/drivers/gpu/drm/i915/i915_getparam.c
> @@ -134,6 +134,7 @@ int i915_getparam_ioctl(struct drm_device *dev, void 
> *data,
> case I915_PARAM_HAS_EXEC_FEN
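
The loop at the heart of probe_range() is an interval walk: advance a
cursor through the sorted VMAs and bail on the first hole or PFN-backed
mapping. The same logic over a plain sorted range list, as a
self-contained illustration rather than kernel code:

#include <stdbool.h>
#include <stddef.h>

struct range {
	unsigned long start, end; /* [start, end), sorted, non-overlapping */
	bool pfn_backed;          /* stand-in for VM_PFNMAP / VM_MIXEDMAP */
};

/* Returns true iff [addr, addr + len) is fully covered by ranges that
 * are not pfn-backed; mirrors the break conditions in probe_range(). */
static bool covered(const struct range *r, size_t n,
		    unsigned long addr, unsigned long len)
{
	const unsigned long end = addr + len;
	size_t i;

	for (i = 0; i < n; i++) {
		if (r[i].end <= addr)
			continue;	/* entirely before the cursor */
		if (r[i].start > addr)
			return false;	/* hole at the cursor */
		if (r[i].pfn_backed)
			return false;	/* wrong kind of backing */
		if (r[i].end >= end)
			return true;	/* range covers the tail */
		addr = r[i].end;	/* advance past this range */
	}
	return false;			/* trailing hole */
}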

[PATCH 7/8] drm/i915/gem: Correct the locking and pin pattern for dma-buf (v8)

2021-07-23 Thread Jason Ekstrand
From: Thomas Hellström 

If our exported dma-bufs are imported by another instance of our driver,
that instance will typically have the imported dma-bufs locked during
dma_buf_map_attachment(). But the exporter also locks the same reservation
object in the map_dma_buf() callback, which leads to recursive locking.

So taking the lock inside _pin_pages_unlocked() is incorrect.

Additionally, the current pinning code path is contrary to the defined
way that pinning should occur.

Remove the explicit pin/unpin from the map/unmap functions and move them
to the attach/detach allowing correct locking to occur, and to match
the static dma-buf drm_prime pattern.

Add a live selftest to exercise both dynamic and non-dynamic
exports.

v2:
- Extend the selftest with a fake dynamic importer.
- Provide real pin and unpin callbacks to not abuse the interface.
v3: (ruhl)
- Remove the dynamic export support and move the pinning into the
  attach/detach path.
v4: (ruhl)
- Put pages does not need to assert on the dma-resv
v5: (jason)
- Lock around dma_buf_unmap_attachment() when emulating a dynamic
  importer in the subtests.
- Use pin_pages_unlocked
v6: (jason)
- Use dma_buf_attach instead of dma_buf_attach_dynamic in the selftests
v7: (mauld)
- Use __i915_gem_object_get_pages (2 __underscores) instead of the
  4 underscore version in the selftests
v8: (mauld)
- Drop the kernel doc from the static i915_gem_dmabuf_attach function
- Add missing "err = PTR_ERR()" to a bunch of selftest error cases

Reported-by: Michael J. Ruhl 
Signed-off-by: Thomas Hellström 
Signed-off-by: Michael J. Ruhl 
Signed-off-by: Jason Ekstrand 
Reviewed-by: Jason Ekstrand 
---
 drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c|  37 --
 .../drm/i915/gem/selftests/i915_gem_dmabuf.c  | 109 +-
 2 files changed, 132 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c 
b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
index 616c3a2f1baf0..59dc56ae14d6b 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
@@ -12,6 +12,8 @@
 #include "i915_gem_object.h"
 #include "i915_scatterlist.h"
 
+I915_SELFTEST_DECLARE(static bool force_different_devices;)
+
 static struct drm_i915_gem_object *dma_buf_to_obj(struct dma_buf *buf)
 {
return to_intel_bo(buf->priv);
@@ -25,15 +27,11 @@ static struct sg_table *i915_gem_map_dma_buf(struct 
dma_buf_attachment *attachme
struct scatterlist *src, *dst;
int ret, i;
 
-   ret = i915_gem_object_pin_pages_unlocked(obj);
-   if (ret)
-   goto err;
-
/* Copy sg so that we make an independent mapping */
st = kmalloc(sizeof(struct sg_table), GFP_KERNEL);
if (st == NULL) {
ret = -ENOMEM;
-   goto err_unpin_pages;
+   goto err;
}
 
ret = sg_alloc_table(st, obj->mm.pages->nents, GFP_KERNEL);
@@ -58,8 +56,6 @@ static struct sg_table *i915_gem_map_dma_buf(struct 
dma_buf_attachment *attachme
sg_free_table(st);
 err_free:
kfree(st);
-err_unpin_pages:
-   i915_gem_object_unpin_pages(obj);
 err:
return ERR_PTR(ret);
 }
@@ -68,13 +64,9 @@ static void i915_gem_unmap_dma_buf(struct dma_buf_attachment 
*attachment,
   struct sg_table *sg,
   enum dma_data_direction dir)
 {
-   struct drm_i915_gem_object *obj = dma_buf_to_obj(attachment->dmabuf);
-
dma_unmap_sgtable(attachment->dev, sg, dir, DMA_ATTR_SKIP_CPU_SYNC);
sg_free_table(sg);
kfree(sg);
-
-   i915_gem_object_unpin_pages(obj);
 }
 
 static int i915_gem_dmabuf_vmap(struct dma_buf *dma_buf, struct dma_buf_map 
*map)
@@ -168,7 +160,25 @@ static int i915_gem_end_cpu_access(struct dma_buf 
*dma_buf, enum dma_data_direct
return err;
 }
 
+static int i915_gem_dmabuf_attach(struct dma_buf *dmabuf,
+ struct dma_buf_attachment *attach)
+{
+   struct drm_i915_gem_object *obj = dma_buf_to_obj(dmabuf);
+
+   return i915_gem_object_pin_pages_unlocked(obj);
+}
+
+static void i915_gem_dmabuf_detach(struct dma_buf *dmabuf,
+  struct dma_buf_attachment *attach)
+{
+   struct drm_i915_gem_object *obj = dma_buf_to_obj(dmabuf);
+
+   i915_gem_object_unpin_pages(obj);
+}
+
 static const struct dma_buf_ops i915_dmabuf_ops =  {
+   .attach = i915_gem_dmabuf_attach,
+   .detach = i915_gem_dmabuf_detach,
.map_dma_buf = i915_gem_map_dma_buf,
.unmap_dma_buf = i915_gem_unmap_dma_buf,
.release = drm_gem_dmabuf_release,
@@ -204,6 +214,8 @@ static int i915_gem_object_get_pages_dmabuf(struct 
drm_i915_gem_object *obj)
struct sg_table *pages;
unsigned int sg_page_sizes;
 
+   assert_object_held(obj);
+
pages = dma_buf_map_attachment(obj->base.import_attach,
 

[PATCH 8/8] drm/i915/gem: Migrate to system at dma-buf attach time (v7)

2021-07-23 Thread Jason Ekstrand
From: Thomas Hellström 

Until we support p2p dma or as a complement to that, migrate data
to system memory at dma-buf attach time if possible.

v2:
- Rebase on dynamic exporter. Update the igt_dmabuf_import_same_driver
  selftest to migrate if we are LMEM capable.
v3:
- Migrate also in the pin() callback.
v4:
- Migrate in attach
v5: (jason)
- Lock around the migration
v6: (jason)
- Move the can_migrate check outside the lock
- Rework the selftests to test more migration conditions.  In
  particular, SMEM, LMEM, and LMEM+SMEM are all checked.
v7: (mauld)
- Misc style nits

Signed-off-by: Thomas Hellström 
Signed-off-by: Michael J. Ruhl 
Reported-by: kernel test robot 
Signed-off-by: Jason Ekstrand 
Reviewed-by: Jason Ekstrand 
---
 drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c| 23 -
 .../drm/i915/gem/selftests/i915_gem_dmabuf.c  | 87 ++-
 2 files changed, 106 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c 
b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
index 59dc56ae14d6b..afa34111de02e 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
@@ -164,8 +164,29 @@ static int i915_gem_dmabuf_attach(struct dma_buf *dmabuf,
  struct dma_buf_attachment *attach)
 {
struct drm_i915_gem_object *obj = dma_buf_to_obj(dmabuf);
+   struct i915_gem_ww_ctx ww;
+   int err;
+
+   if (!i915_gem_object_can_migrate(obj, INTEL_REGION_SMEM))
+   return -EOPNOTSUPP;
+
+   for_i915_gem_ww(&ww, err, true) {
+   err = i915_gem_object_lock(obj, &ww);
+   if (err)
+   continue;
+
+   err = i915_gem_object_migrate(obj, &ww, INTEL_REGION_SMEM);
+   if (err)
+   continue;
 
-   return i915_gem_object_pin_pages_unlocked(obj);
+   err = i915_gem_object_wait_migration(obj, 0);
+   if (err)
+   continue;
+
+   err = i915_gem_object_pin_pages(obj);
+   }
+
+   return err;
 }
 
 static void i915_gem_dmabuf_detach(struct dma_buf *dmabuf,
diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_dmabuf.c 
b/drivers/gpu/drm/i915/gem/selftests/i915_gem_dmabuf.c
index d4ce01e6ee854..ffae7df5e4d7d 100644
--- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_dmabuf.c
+++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_dmabuf.c
@@ -85,9 +85,63 @@ static int igt_dmabuf_import_self(void *arg)
return err;
 }
 
-static int igt_dmabuf_import_same_driver(void *arg)
+static int igt_dmabuf_import_same_driver_lmem(void *arg)
 {
struct drm_i915_private *i915 = arg;
+   struct intel_memory_region *lmem = i915->mm.regions[INTEL_REGION_LMEM];
+   struct drm_i915_gem_object *obj;
+   struct drm_gem_object *import;
+   struct dma_buf *dmabuf;
+   int err;
+
+   if (!lmem)
+   return 0;
+
+   force_different_devices = true;
+
+   obj = __i915_gem_object_create_user(i915, PAGE_SIZE, &lmem, 1);
+   if (IS_ERR(obj)) {
+   pr_err("__i915_gem_object_create_user failed with err=%ld\n",
+  PTR_ERR(obj));
+   err = PTR_ERR(obj);
+   goto out_ret;
+   }
+
+   dmabuf = i915_gem_prime_export(&obj->base, 0);
+   if (IS_ERR(dmabuf)) {
+   pr_err("i915_gem_prime_export failed with err=%ld\n",
+  PTR_ERR(dmabuf));
+   err = PTR_ERR(dmabuf);
+   goto out;
+   }
+
+   /*
+* We expect an import of an LMEM-only object to fail with
+* -EOPNOTSUPP because it can't be migrated to SMEM.
+*/
+   import = i915_gem_prime_import(&i915->drm, dmabuf);
+   if (!IS_ERR(import)) {
+   drm_gem_object_put(import);
+   pr_err("i915_gem_prime_import succeeded when it shouldn't 
have\n");
+   err = -EINVAL;
+   } else if (PTR_ERR(import) != -EOPNOTSUPP) {
+   pr_err("i915_gem_prime_import failed with the wrong err=%ld\n",
+  PTR_ERR(import));
+   err = PTR_ERR(import);
+   }
+
+   dma_buf_put(dmabuf);
+out:
+   i915_gem_object_put(obj);
+out_ret:
+   force_different_devices = false;
+   return err;
+}
+
+static int igt_dmabuf_import_same_driver(struct drm_i915_private *i915,
+struct intel_memory_region **regions,
+unsigned int num_regions)
+{
struct drm_i915_gem_object *obj, *import_obj;
struct drm_gem_object *import;
struct dma_buf *dmabuf;
@@ -97,8 +151,12 @@ static int igt_dmabuf_import_same_driver(void *arg)
int err;
 
force_different_devices = true;
-   obj = i915_gem_object_create_shmem(i915, PAGE_SIZE);
+
+   obj = __i915_gem_object_create_user(i915, PAGE_SIZE,
+   

[PATCH 6/8] drm/i915/gem: Always call obj->ops->migrate unless can_migrate fails

2021-07-23 Thread Jason Ekstrand
Without TTM, we have no such hook, so we exit early, but this is fine
because we use TTM on all LMEM platforms and, on integrated platforms,
there is no real migration.  If we do have the hook, it's better to just
let TTM handle the migration because it knows where things are actually
placed.

This fixes a bug where i915_gem_object_migrate fails to migrate newly
created LMEM objects.  In that scenario, the object has obj->mm.region
set to LMEM but TTM has it in SMEM because that's where all new objects
are placed prior to getting actual pages.  When we invoke
i915_gem_object_migrate, it exits early because, from the point of view
of the GEM object, it's already in LMEM and no migration is needed.
Then, when we try to pin the pages, __i915_ttm_get_pages is called
which, unaware of our failed attempt at a migration, places the object
in SMEM.  This only happens on newly created objects because they have
this weird state where TTM thinks they're in SMEM, GEM thinks they're in
LMEM, and the reality is that they don't exist at all.

It's better if GEM just always calls into TTM and lets TTM handle
things.  That way the lies stay better contained.  Once the migration is
complete, the object will have pages, obj->mm.region will be correct,
and we're done lying.

Signed-off-by: Jason Ekstrand 
---
 drivers/gpu/drm/i915/gem/i915_gem_object.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c 
b/drivers/gpu/drm/i915/gem/i915_gem_object.c
index d09bd9bdb38ac..9d3497e1235a0 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
@@ -607,12 +607,15 @@ int i915_gem_object_migrate(struct drm_i915_gem_object 
*obj,
mr = i915->mm.regions[id];
GEM_BUG_ON(!mr);
 
-   if (obj->mm.region == mr)
-   return 0;
-
if (!i915_gem_object_can_migrate(obj, id))
return -EINVAL;
 
+   if (!obj->ops->migrate) {
+   if (GEM_WARN_ON(obj->mm.region != mr))
+   return -EINVAL;
+   return 0;
+   }
+
return obj->ops->migrate(obj, mr);
 }
 
-- 
2.31.1



[PATCH 5/8] drm/i915/gem/ttm: Only call __i915_gem_object_set_pages if needed

2021-07-23 Thread Jason Ekstrand
__i915_ttm_get_pages does two things.  First, it calls ttm_bo_validate()
to check the given placement and migrate the BO if needed.  Then, it
updates the GEM object to match, in case the object was migrated.  If
no migration occurred, however, we might still have pages on the GEM
object in which case we don't need to fetch them from TTM and call
__i915_gem_object_set_pages.  This hasn't been a problem before because
the primary user of __i915_ttm_get_pages is __i915_gem_object_get_pages
which only calls it if the GEM object doesn't have pages.

However, i915_ttm_migrate also uses __i915_ttm_get_pages to do the
migration so this meant it was unsafe to call on an already populated
object.  This patch checks i915_gem_object_has_pages() before trying to
__i915_gem_object_set_pages so i915_ttm_migrate is safe to call, even on
populated objects.

Signed-off-by: Jason Ekstrand 
---
 drivers/gpu/drm/i915/gem/i915_gem_ttm.c | 13 +++--
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c 
b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
index f253b11e9e367..771eb2963123f 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
@@ -662,13 +662,14 @@ static int __i915_ttm_get_pages(struct 
drm_i915_gem_object *obj,
i915_ttm_adjust_gem_after_move(obj);
}
 
-   GEM_WARN_ON(obj->mm.pages);
-   /* Object either has a page vector or is an iomem object */
-   st = bo->ttm ? i915_ttm_tt_get_st(bo->ttm) : obj->ttm.cached_io_st;
-   if (IS_ERR(st))
-   return PTR_ERR(st);
+   if (!i915_gem_object_has_pages(obj)) {
+   /* Object either has a page vector or is an iomem object */
+   st = bo->ttm ? i915_ttm_tt_get_st(bo->ttm) : 
obj->ttm.cached_io_st;
+   if (IS_ERR(st))
+   return PTR_ERR(st);
 
-   __i915_gem_object_set_pages(obj, st, i915_sg_dma_sizes(st->sgl));
+   __i915_gem_object_set_pages(obj, st, 
i915_sg_dma_sizes(st->sgl));
+   }
 
return ret;
 }
-- 
2.31.1



[PATCH 4/8] drm/i915/gem: Unify user object creation (v3)

2021-07-23 Thread Jason Ekstrand
Instead of hand-rolling the same three calls in each function, pull them
into an i915_gem_object_create_user helper.  Apart from the re-ordering
of the placements array ENOMEM check, there should be no functional change.

v2 (Matthew Auld):
 - Add the call to i915_gem_flush_free_objects() from
   i915_gem_dumb_create() in a separate patch
 - Move i915_gem_object_alloc() below the simple error checks
v3 (Matthew Auld):
 - Add __ to i915_gem_object_create_user and kerneldoc which warns the
   caller that it's not validating anything.

Signed-off-by: Jason Ekstrand 
Reviewed-by: Matthew Auld 
---
 drivers/gpu/drm/i915/gem/i915_gem_create.c | 119 ++---
 drivers/gpu/drm/i915/gem/i915_gem_object.h |   4 +
 2 files changed, 58 insertions(+), 65 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_create.c 
b/drivers/gpu/drm/i915/gem/i915_gem_create.c
index adcce37c04b8d..23fee13a33844 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_create.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_create.c
@@ -11,13 +11,14 @@
 #include "i915_trace.h"
 #include "i915_user_extensions.h"
 
-static u32 object_max_page_size(struct drm_i915_gem_object *obj)
+static u32 object_max_page_size(struct intel_memory_region **placements,
+   unsigned int n_placements)
 {
u32 max_page_size = 0;
int i;
 
-   for (i = 0; i < obj->mm.n_placements; i++) {
-   struct intel_memory_region *mr = obj->mm.placements[i];
+   for (i = 0; i < n_placements; i++) {
+   struct intel_memory_region *mr = placements[i];
 
GEM_BUG_ON(!is_power_of_2(mr->min_page_size));
max_page_size = max_t(u32, max_page_size, mr->min_page_size);
@@ -81,22 +82,46 @@ static int i915_gem_publish(struct drm_i915_gem_object *obj,
return 0;
 }
 
-static int
-i915_gem_setup(struct drm_i915_gem_object *obj, u64 size)
+/**
+ * Creates a new object using the same path as DRM_I915_GEM_CREATE_EXT
+ * @i915: i915 private
+ * @size: size of the buffer, in bytes
+ * @placements: possible placement regions, in priority order
+ * @n_placements: number of possible placement regions
+ *
+ * This function is exposed primarily for selftests and does very little
+ * error checking.  It is assumed that the set of placement regions has
+ * already been verified to be valid.
+ */
+struct drm_i915_gem_object *
+__i915_gem_object_create_user(struct drm_i915_private *i915, u64 size,
+ struct intel_memory_region **placements,
+ unsigned int n_placements)
 {
-   struct intel_memory_region *mr = obj->mm.placements[0];
+   struct intel_memory_region *mr = placements[0];
+   struct drm_i915_gem_object *obj;
unsigned int flags;
int ret;
 
-   size = round_up(size, object_max_page_size(obj));
+   i915_gem_flush_free_objects(i915);
+
+   size = round_up(size, object_max_page_size(placements, n_placements));
if (size == 0)
-   return -EINVAL;
+   return ERR_PTR(-EINVAL);
 
/* For most of the ABI (e.g. mmap) we think in system pages */
GEM_BUG_ON(!IS_ALIGNED(size, PAGE_SIZE));
 
if (i915_gem_object_size_2big(size))
-   return -E2BIG;
+   return ERR_PTR(-E2BIG);
+
+   obj = i915_gem_object_alloc();
+   if (!obj)
+   return ERR_PTR(-ENOMEM);
+
+   ret = object_set_placements(obj, placements, n_placements);
+   if (ret)
+   goto object_free;
 
/*
 * I915_BO_ALLOC_USER will make sure the object is cleared before
@@ -106,12 +131,18 @@ i915_gem_setup(struct drm_i915_gem_object *obj, u64 size)
 
ret = mr->ops->init_object(mr, obj, size, 0, flags);
if (ret)
-   return ret;
+   goto object_free;
 
GEM_BUG_ON(size != obj->base.size);
 
trace_i915_gem_object_create(obj);
-   return 0;
+   return obj;
+
+object_free:
+   if (obj->mm.n_placements > 1)
+   kfree(obj->mm.placements);
+   i915_gem_object_free(obj);
+   return ERR_PTR(ret);
 }
 
 int
@@ -124,7 +155,6 @@ i915_gem_dumb_create(struct drm_file *file,
enum intel_memory_type mem_type;
int cpp = DIV_ROUND_UP(args->bpp, 8);
u32 format;
-   int ret;
 
switch (cpp) {
case 1:
@@ -151,32 +181,19 @@ i915_gem_dumb_create(struct drm_file *file,
if (args->pitch < args->width)
return -EINVAL;
 
-   i915_gem_flush_free_objects(i915);
-
args->size = mul_u32_u32(args->pitch, args->height);
 
mem_type = INTEL_MEMORY_SYSTEM;
if (HAS_LMEM(to_i915(dev)))
mem_type = INTEL_MEMORY_LOCAL;
 
-   obj = i915_gem_object_alloc();
-   if (!obj)
-   return -ENOMEM;
-
mr = intel_memory_region_by_type(to_i915(dev), mem_type);
-   

[PATCH 3/8] drm/i915/gem: Call i915_gem_flush_free_objects() in i915_gem_dumb_create()

2021-07-23 Thread Jason Ekstrand
This doesn't really fix anything serious since the chances of a client
creating and destroying a mass of dumb BOs are pretty low.  However, it
is called by the other two create IOCTLs to garbage collect old objects.
Call it here too for consistency.

Signed-off-by: Jason Ekstrand 
Reviewed-by: Matthew Auld 
---
 drivers/gpu/drm/i915/gem/i915_gem_create.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_create.c 
b/drivers/gpu/drm/i915/gem/i915_gem_create.c
index aa687b10dcd45..adcce37c04b8d 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_create.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_create.c
@@ -151,6 +151,8 @@ i915_gem_dumb_create(struct drm_file *file,
if (args->pitch < args->width)
return -EINVAL;
 
+   i915_gem_flush_free_objects(i915);
+
args->size = mul_u32_u32(args->pitch, args->height);
 
mem_type = INTEL_MEMORY_SYSTEM;
-- 
2.31.1



[PATCH 1/8] drm/i915/gem: Check object_can_migrate from object_migrate

2021-07-23 Thread Jason Ekstrand
We don't roll them together entirely because there are still a couple of
cases where we want a separate can_migrate check.  For instance, the
display code checks that you can migrate a buffer to LMEM before it
accepts it in fb_create.  The dma-buf import code also uses it to do an
early check and return a different error code if someone tries to attach
a LMEM-only dma-buf to another driver.

However, no one actually wants to call object_migrate when can_migrate
has failed.  The stated intention is for self-tests but none of those
actually take advantage of this unsafe migration.

Signed-off-by: Jason Ekstrand 
Cc: Daniel Vetter 
Reviewed-by: Matthew Auld 
---
 drivers/gpu/drm/i915/gem/i915_gem_object.c| 13 ++---
 .../gpu/drm/i915/gem/selftests/i915_gem_migrate.c | 15 ---
 2 files changed, 2 insertions(+), 26 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c 
b/drivers/gpu/drm/i915/gem/i915_gem_object.c
index 5c21cff33199e..d09bd9bdb38ac 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
@@ -584,12 +584,6 @@ bool i915_gem_object_can_migrate(struct 
drm_i915_gem_object *obj,
  * completed yet, and to accomplish that, i915_gem_object_wait_migration()
  * must be called.
  *
- * This function is a bit more permissive than i915_gem_object_can_migrate()
- * to allow for migrating objects where the caller knows exactly what is
- * happening. For example within selftests. More specifically this
- * function allows migrating I915_BO_ALLOC_USER objects to regions
- * that are not in the list of allowable regions.
- *
  * Note: the @ww parameter is not used yet, but included to make sure
  * callers put some effort into obtaining a valid ww ctx if one is
  * available.
@@ -616,11 +610,8 @@ int i915_gem_object_migrate(struct drm_i915_gem_object 
*obj,
if (obj->mm.region == mr)
return 0;
 
-   if (!i915_gem_object_evictable(obj))
-   return -EBUSY;
-
-   if (!obj->ops->migrate)
-   return -EOPNOTSUPP;
+   if (!i915_gem_object_can_migrate(obj, id))
+   return -EINVAL;
 
return obj->ops->migrate(obj, mr);
 }
diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_migrate.c 
b/drivers/gpu/drm/i915/gem/selftests/i915_gem_migrate.c
index 0b7144d2991ca..28a700f08b49a 100644
--- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_migrate.c
+++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_migrate.c
@@ -61,11 +61,6 @@ static int igt_create_migrate(struct intel_gt *gt, enum 
intel_region_id src,
if (err)
continue;
 
-   if (!i915_gem_object_can_migrate(obj, dst)) {
-   err = -EINVAL;
-   continue;
-   }
-
err = i915_gem_object_migrate(obj, &ww, dst);
if (err)
continue;
@@ -114,11 +109,6 @@ static int lmem_pages_migrate_one(struct i915_gem_ww_ctx 
*ww,
return err;
 
if (i915_gem_object_is_lmem(obj)) {
-   if (!i915_gem_object_can_migrate(obj, INTEL_REGION_SMEM)) {
-   pr_err("object can't migrate to smem.\n");
-   return -EINVAL;
-   }
-
err = i915_gem_object_migrate(obj, ww, INTEL_REGION_SMEM);
if (err) {
pr_err("Object failed migration to smem\n");
@@ -137,11 +127,6 @@ static int lmem_pages_migrate_one(struct i915_gem_ww_ctx 
*ww,
}
 
} else {
-   if (!i915_gem_object_can_migrate(obj, INTEL_REGION_LMEM)) {
-   pr_err("object can't migrate to lmem.\n");
-   return -EINVAL;
-   }
-
err = i915_gem_object_migrate(obj, ww, INTEL_REGION_LMEM);
if (err) {
pr_err("Object failed migration to lmem\n");
-- 
2.31.1



[PATCH 2/8] drm/i915/gem: Refactor placement setup for i915_gem_object_create* (v2)

2021-07-23 Thread Jason Ekstrand
Since we don't allow changing the set of regions after creation, we can
make ext_set_placements() build up the region set directly in the
create_ext and assign it to the object later.  This is similar to what
we did for contexts with the proto-context only simpler because there's
no funny object shuffling.  This will be used in the next patch to allow
us to de-duplicate a bunch of code.  Also, since we know the maximum
number of regions up-front, we can use a fixed-size temporary array for
the regions.  This simplifies memory management a bit for this new
delayed approach.

v2 (Matthew Auld):
 - Get rid of MAX_N_PLACEMENTS
 - Drop kfree(placements) from set_placements()
v3 (Matthew Auld):
 - Properly set ext_data->n_placements

Signed-off-by: Jason Ekstrand 
Reviewed-by: Matthew Auld 
---
 drivers/gpu/drm/i915/gem/i915_gem_create.c | 82 --
 1 file changed, 46 insertions(+), 36 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_create.c 
b/drivers/gpu/drm/i915/gem/i915_gem_create.c
index 51f92e4b1a69d..aa687b10dcd45 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_create.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_create.c
@@ -27,10 +27,13 @@ static u32 object_max_page_size(struct drm_i915_gem_object *obj)
return max_page_size;
 }
 
-static void object_set_placements(struct drm_i915_gem_object *obj,
- struct intel_memory_region **placements,
- unsigned int n_placements)
+static int object_set_placements(struct drm_i915_gem_object *obj,
+struct intel_memory_region **placements,
+unsigned int n_placements)
 {
+   struct intel_memory_region **arr;
+   unsigned int i;
+
GEM_BUG_ON(!n_placements);
 
/*
@@ -44,9 +47,20 @@ static void object_set_placements(struct drm_i915_gem_object *obj,
obj->mm.placements = &i915->mm.regions[mr->id];
obj->mm.n_placements = 1;
} else {
-   obj->mm.placements = placements;
+   arr = kmalloc_array(n_placements,
+   sizeof(struct intel_memory_region *),
+   GFP_KERNEL);
+   if (!arr)
+   return -ENOMEM;
+
+   for (i = 0; i < n_placements; i++)
+   arr[i] = placements[i];
+
+   obj->mm.placements = arr;
obj->mm.n_placements = n_placements;
}
+
+   return 0;
 }
 
 static int i915_gem_publish(struct drm_i915_gem_object *obj,
@@ -148,7 +162,9 @@ i915_gem_dumb_create(struct drm_file *file,
return -ENOMEM;
 
mr = intel_memory_region_by_type(to_i915(dev), mem_type);
-   object_set_placements(obj, &mr, 1);
+   ret = object_set_placements(obj, &mr, 1);
+   if (ret)
+   goto object_free;
 
ret = i915_gem_setup(obj, args->size);
if (ret)
@@ -184,7 +200,9 @@ i915_gem_create_ioctl(struct drm_device *dev, void *data,
return -ENOMEM;
 
mr = intel_memory_region_by_type(i915, INTEL_MEMORY_SYSTEM);
-   object_set_placements(obj, &mr, 1);
+   ret = object_set_placements(obj, &mr, 1);
+   if (ret)
+   goto object_free;
 
ret = i915_gem_setup(obj, args->size);
if (ret)
@@ -199,7 +217,8 @@ i915_gem_create_ioctl(struct drm_device *dev, void *data,
 
 struct create_ext {
struct drm_i915_private *i915;
-   struct drm_i915_gem_object *vanilla_object;
+   struct intel_memory_region *placements[INTEL_REGION_UNKNOWN];
+   unsigned int n_placements;
 };
 
 static void repr_placements(char *buf, size_t size,
@@ -230,8 +249,7 @@ static int set_placements(struct drm_i915_gem_create_ext_memory_regions *args,
struct drm_i915_private *i915 = ext_data->i915;
struct drm_i915_gem_memory_class_instance __user *uregions =
u64_to_user_ptr(args->regions);
-   struct drm_i915_gem_object *obj = ext_data->vanilla_object;
-   struct intel_memory_region **placements;
+   struct intel_memory_region *placements[INTEL_REGION_UNKNOWN];
u32 mask;
int i, ret = 0;
 
@@ -245,6 +263,8 @@ static int set_placements(struct drm_i915_gem_create_ext_memory_regions *args,
ret = -EINVAL;
}
 
+   BUILD_BUG_ON(ARRAY_SIZE(i915->mm.regions) != ARRAY_SIZE(placements));
+   BUILD_BUG_ON(ARRAY_SIZE(ext_data->placements) != ARRAY_SIZE(placements));
if (args->num_regions > ARRAY_SIZE(i915->mm.regions)) {
drm_dbg(&i915->drm, "num_regions is too large\n");
ret = -EINVAL;
@@ -253,21 +273,13 @@ static int set_placements(struct drm_i915_gem_create_ext_memory_regions *args,
if (ret)
return ret;
 
-   placements = kmalloc_array(args->num_regions,
-  sizeof(str
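
For context, the uapi this feeds can be driven from userspace roughly as
below. A hedged sketch, not part of the patch: the two-region placement
and the raw ioctl() call are illustrative only.

/* Hypothetical userspace usage: create an object that may be placed in
 * either device-local or system memory, via the memory-regions
 * extension that set_placements() above parses. */
struct drm_i915_gem_memory_class_instance regions[2] = {
	{ .memory_class = I915_MEMORY_CLASS_DEVICE, .memory_instance = 0 },
	{ .memory_class = I915_MEMORY_CLASS_SYSTEM, .memory_instance = 0 },
};
struct drm_i915_gem_create_ext_memory_regions mext = {
	.base.name = I915_GEM_CREATE_EXT_MEMORY_REGIONS,
	.num_regions = 2,
	.regions = (uintptr_t)regions,
};
struct drm_i915_gem_create_ext create = {
	.size = 4096,
	.extensions = (uintptr_t)&mext,
};

if (ioctl(fd, DRM_IOCTL_I915_GEM_CREATE_EXT, &create) == 0)
	/* create.handle now names the new object */;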

[PATCH 0/8] drm/i915: Migrate memory to SMEM when imported cross-device (v8)

2021-07-23 Thread Jason Ekstrand
This patch series fixes an issue with discrete graphics on Intel where we
allowed dma-buf import while leaving the object in local memory.  This
breaks down pretty badly if the import happened on a different physical
device.

v7:
 - Drop "drm/i915/gem/ttm: Place new BOs in the requested region"
 - Add a new "drm/i915/gem: Call i915_gem_flush_free_objects() in 
i915_gem_dumb_create()"
 - Misc. review feedback from Matthew Auld
v8:
 - Misc. review feedback from Matthew Auld
v9:
 - Replace the i915/ttm patch with two that are hopefully more correct

Jason Ekstrand (6):
  drm/i915/gem: Check object_can_migrate from object_migrate
  drm/i915/gem: Refactor placement setup for i915_gem_object_create*
(v2)
  drm/i915/gem: Call i915_gem_flush_free_objects() in
i915_gem_dumb_create()
  drm/i915/gem: Unify user object creation (v3)
  drm/i915/gem/ttm: Only call __i915_gem_object_set_pages if needed
  drm/i915/gem: Always call obj->ops->migrate unless can_migrate fails

Thomas Hellström (2):
  drm/i915/gem: Correct the locking and pin pattern for dma-buf (v8)
  drm/i915/gem: Migrate to system at dma-buf attach time (v7)

 drivers/gpu/drm/i915/gem/i915_gem_create.c| 177 
 drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c|  58 --
 drivers/gpu/drm/i915/gem/i915_gem_object.c|  20 +-
 drivers/gpu/drm/i915/gem/i915_gem_object.h|   4 +
 drivers/gpu/drm/i915/gem/i915_gem_ttm.c   |  13 +-
 .../drm/i915/gem/selftests/i915_gem_dmabuf.c  | 190 +-
 .../drm/i915/gem/selftests/i915_gem_migrate.c |  15 --
 7 files changed, 341 insertions(+), 136 deletions(-)

-- 
2.31.1



Re: [Intel-gfx] [PATCH] drm/i915: Ditch i915 globals shrink infrastructure

2021-07-22 Thread Jason Ekstrand
On Thu, Jul 22, 2021 at 5:34 AM Tvrtko Ursulin
 wrote:
> On 22/07/2021 11:16, Daniel Vetter wrote:
> > On Thu, Jul 22, 2021 at 11:02:55AM +0100, Tvrtko Ursulin wrote:
> >> On 21/07/2021 19:32, Daniel Vetter wrote:
> >>> This essentially reverts
> >>>
> >>> commit 84a1074920523430f9dc30ff907f4801b4820072
> >>> Author: Chris Wilson 
> >>> Date:   Wed Jan 24 11:36:08 2018 +
> >>>
> >>>   drm/i915: Shrink the GEM kmem_caches upon idling
> >>>
> >>> mm/vmscan.c:do_shrink_slab() is a thing, if there's an issue with it
> >>> then we need to fix that there, not hand-roll our own slab shrinking
> >>> code in i915.
> >>
> >> This is a somewhat incomplete statement which ignores a couple of angles,
> >> so I wish there was a bit more time to respond before steamrolling it in. :(
> >>
> >> The removed code was not a hand-rolled shrinker, but about managing slab
> >> sizes in the face of bursty workloads. Core code does not know when i915 is
> >> active and when it is idle, so calling kmem_cache_shrink() after going idle
> >> was supposed to help with housekeeping by doing that housekeeping work
> >> outside of the latency-sensitive phase.
> >>
> >> To "fix" (improve, really) it in core as you suggest would need some method
> >> of signaling when a slab user feels it is an opportune moment to do this
> >> housekeeping. And kmem_cache_shrink() is just that, so I don't see the problem.
> >>
> >> Granted, the argument that kmem_cache_shrink() is not much used is a valid
> >> one, so the discussion overall is definitely valid. Because at the higher
> >> level we could definitely talk about which workloads actually benefit from
> >> this code and by how much, which probably no one knows at this point.

Pardon me for being a bit curt here, but that discussion should have
happened 3.5 years ago when this landed.  The entire justification we
have on record for this change is, "When we finally decide the gpu is
idle, that is a good time to shrink our kmem_caches."  We have no
record of any workloads which benefit from this and no recorded way to
reproduce any supposed benefits, even if it requires a microbenchmark.
But we added over 100 lines of code for it anyway, including a bunch
of hand-rolled RCU juggling.  Ripping out unjustified complexity is
almost always justified, IMO.  The burden of proof here isn't on
Daniel to show he isn't regressing anything but it was on you and
Chris to show that complexity was worth something back in 2018 when
this landed.

--Jason


> >> But in general I think you needed to leave more time for discussion. 12
> >> hours is way too short.
> >
> > It's 500+ users of kmem_cache_create vs i915 doing kmem_cache_shrink. And
>
> There are two other callers for the record. ;)
>
> > I guarantee you there's slab users that churn through more allocations
> > than we do, and are more bursty.
>
> I wasn't disputing that.
>
> > An extraordinary claim like this needs extraordinary evidence. And then a
> > discussion with core mm/ folks so that we can figure out how to solve the
> > discovered problem best for the other 500+ users of slabs in-tree, so that
> > everyone benefits. Not just i915 gpu workloads.
>
> Yep, not disputing that either. Noticed I wrote it was a valid argument?
>
> But discussion with mm folks could also have happened before you steam
> rolled the "revert" in though. Perhaps they would have said
> kmem_cache_shrink is the way. Or maybe it isn't. Or maybe they would
> have said meh. I just don't see how the rush was justified given the
> code in question.
>
> Regards,
>
> Tvrtko
>
> > -Daniel
> >
> >>> Noticed while reviewing a patch set from Jason to fix up some issues
> >>> in our i915_init() and i915_exit() module load/cleanup code. Now that
> >>> i915_globals.c isn't any different than normal init/exit functions, we
> >>> should convert them over to one unified table and remove
> >>> i915_globals.[hc] entirely.
> >>>
> >>> Cc: David Airlie 
> >>> Cc: Jason Ekstrand 
> >>> Signed-off-by: Daniel Vetter 
> >>> ---
> >>>drivers/gpu/drm/i915/gem/i915_gem_context.c |  6 --
> >>>drivers/gpu/drm/i915/gem/i915_gem_object.c  |  6 --
> >>>drivers/gpu/drm/i915/gt/intel_context.c |  6 --
> >>>drivers/gpu/drm/i915/gt/intel_gt_pm.c   |  4 -
> >>>drivers/gpu/drm/i915/i915_active.c   

Re: [Intel-gfx] [PATCH 3/4] drm/i915/userptr: Probe existence of backing struct pages upon creation

2021-07-22 Thread Jason Ekstrand
On Thu, Jul 22, 2021 at 3:44 AM Matthew Auld
 wrote:
>
> On Wed, 21 Jul 2021 at 21:28, Jason Ekstrand  wrote:
> >
> > On Thu, Jul 15, 2021 at 5:16 AM Matthew Auld  wrote:
> > >
> > > From: Chris Wilson 
> > >
> > > Jason Ekstrand requested a more efficient method than userptr+set-domain
> > > to determine if the userptr object was backed by a complete set of pages
> > > upon creation. To be more efficient than simply populating the userptr
> > > using get_user_pages() (as done by the call to set-domain or execbuf),
> > > we can walk the tree of vm_area_struct and check for gaps or vma not
> > > backed by struct page (VM_PFNMAP). The question is how to handle
> > > VM_MIXEDMAP which may be either struct page or pfn backed...
> > >
> > > With discrete we are going to drop support for set_domain(), so offering a
> > > way to probe the pages, without having to resort to dummy batches has
> > > been requested.
> > >
> > > v2:
> > > - add new query param for the PROBE flag, so userspace can easily
> > >   check if the kernel supports it (Jason).
> > > - use mmap_read_{lock, unlock}.
> > > - add some kernel-doc.
> > >
> > > Testcase: igt/gem_userptr_blits/probe
> > > Signed-off-by: Chris Wilson 
> > > Signed-off-by: Matthew Auld 
> > > Cc: Thomas Hellström 
> > > Cc: Maarten Lankhorst 
> > > Cc: Tvrtko Ursulin 
> > > Cc: Jordan Justen 
> > > Cc: Kenneth Graunke 
> > > Cc: Jason Ekstrand 
> > > Cc: Daniel Vetter 
> > > Cc: Ramalingam C 
> > > ---
> > >  drivers/gpu/drm/i915/gem/i915_gem_userptr.c | 40 -
> > >  drivers/gpu/drm/i915/i915_getparam.c|  3 ++
> > >  include/uapi/drm/i915_drm.h | 18 ++
> > >  3 files changed, 60 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
> > > index 56edfeff8c02..fd6880328596 100644
> > > --- a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
> > > +++ b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
> > > @@ -422,6 +422,33 @@ static const struct drm_i915_gem_object_ops i915_gem_userptr_ops = {
> > >
> > >  #endif
> > >
> > > +static int
> > > +probe_range(struct mm_struct *mm, unsigned long addr, unsigned long len)
> > > +{
> > > +   const unsigned long end = addr + len;
> > > +   struct vm_area_struct *vma;
> > > +   int ret = -EFAULT;
> > > +
> > > +   mmap_read_lock(mm);
> > > +   for (vma = find_vma(mm, addr); vma; vma = vma->vm_next) {
> > > +   if (vma->vm_start > addr)
> >
> > Why isn't this > end?  Are we somehow guaranteed that one vma covers
> > the entire range?
>
> AFAIK we are just making sure we don't have a hole (note that we also
> update addr below), for example the user might have done a partial
> munmap. There could be multiple vma's if the kernel was unable to
> merge them. If we reach the vm_end >= end, then we know we have a
> "valid" range.

Ok.  That wasn't obvious to me but I see the addr update now.  Makes
sense.  Might be worth a one-line comment for the next guy.  Either
way,

Reviewed-by: Jason Ekstrand 

Thanks for wiring this up!

--Jason

> >
> > > +   break;
> > > +
> > > +   if (vma->vm_flags & (VM_PFNMAP | VM_MIXEDMAP))
> > > +   break;
> > > +
> > > +   if (vma->vm_end >= end) {
> > > +   ret = 0;
> > > +   break;
> > > +   }
> > > +
> > > +   addr = vma->vm_end;
> > > +   }
> > > +   mmap_read_unlock(mm);
> > > +
> > > +   return ret;
> > > +}
> > > +
> > >  /*
> > >   * Creates a new mm object that wraps some normal memory from the process
> > >   * context - user memory.
> > > @@ -477,7 +504,8 @@ i915_gem_userptr_ioctl(struct drm_device *dev,
> > > }
> > >
> > > if (args->flags & ~(I915_USERPTR_READ_ONLY |
> > > -   I915_USERPTR_UNSYNCHRONIZED))
> > > +   I915_USERPTR_UNSYNCHRONIZED |
> > > +   I915_USERPTR_PROBE))
> > > return -EINVAL;
> > >
&
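
The "one-line comment" Jason asks for could read something like this
(wording assumed, not from the patch):

/*
 * Loop invariant: every vma seen so far started at or below the
 * running 'addr', and 'addr' is advanced to vma->vm_end on each
 * iteration, so reaching vma->vm_end >= end proves the whole range
 * is covered by page-backed vmas with no holes.
 */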

Re: [PATCH 3/4] drm/i915/userptr: Probe existence of backing struct pages upon creation

2021-07-21 Thread Jason Ekstrand
On Thu, Jul 15, 2021 at 5:16 AM Matthew Auld  wrote:
>
> From: Chris Wilson 
>
> Jason Ekstrand requested a more efficient method than userptr+set-domain
> to determine if the userptr object was backed by a complete set of pages
> upon creation. To be more efficient than simply populating the userptr
> using get_user_pages() (as done by the call to set-domain or execbuf),
> we can walk the tree of vm_area_struct and check for gaps or vma not
> backed by struct page (VM_PFNMAP). The question is how to handle
> VM_MIXEDMAP which may be either struct page or pfn backed...
>
> With discrete we are going to drop support for set_domain(), so offering a
> way to probe the pages, without having to resort to dummy batches has
> been requested.
>
> v2:
> - add new query param for the PROBE flag, so userspace can easily
>   check if the kernel supports it (Jason).
> - use mmap_read_{lock, unlock}.
> - add some kernel-doc.
>
> Testcase: igt/gem_userptr_blits/probe
> Signed-off-by: Chris Wilson 
> Signed-off-by: Matthew Auld 
> Cc: Thomas Hellström 
> Cc: Maarten Lankhorst 
> Cc: Tvrtko Ursulin 
> Cc: Jordan Justen 
> Cc: Kenneth Graunke 
> Cc: Jason Ekstrand 
> Cc: Daniel Vetter 
> Cc: Ramalingam C 
> ---
>  drivers/gpu/drm/i915/gem/i915_gem_userptr.c | 40 -
>  drivers/gpu/drm/i915/i915_getparam.c|  3 ++
>  include/uapi/drm/i915_drm.h | 18 ++
>  3 files changed, 60 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
> index 56edfeff8c02..fd6880328596 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
> @@ -422,6 +422,33 @@ static const struct drm_i915_gem_object_ops i915_gem_userptr_ops = {
>
>  #endif
>
> +static int
> +probe_range(struct mm_struct *mm, unsigned long addr, unsigned long len)
> +{
> +   const unsigned long end = addr + len;
> +   struct vm_area_struct *vma;
> +   int ret = -EFAULT;
> +
> +   mmap_read_lock(mm);
> +   for (vma = find_vma(mm, addr); vma; vma = vma->vm_next) {
> +   if (vma->vm_start > addr)

Why isn't this > end?  Are we somehow guaranteed that one vma covers
the entire range?

> +   break;
> +
> +   if (vma->vm_flags & (VM_PFNMAP | VM_MIXEDMAP))
> +   break;
> +
> +   if (vma->vm_end >= end) {
> +   ret = 0;
> +   break;
> +   }
> +
> +   addr = vma->vm_end;
> +   }
> +   mmap_read_unlock(mm);
> +
> +   return ret;
> +}
> +
>  /*
>   * Creates a new mm object that wraps some normal memory from the process
>   * context - user memory.
> @@ -477,7 +504,8 @@ i915_gem_userptr_ioctl(struct drm_device *dev,
> }
>
> if (args->flags & ~(I915_USERPTR_READ_ONLY |
> -   I915_USERPTR_UNSYNCHRONIZED))
> +   I915_USERPTR_UNSYNCHRONIZED |
> +   I915_USERPTR_PROBE))
> return -EINVAL;
>
> if (i915_gem_object_size_2big(args->user_size))
> @@ -504,6 +532,16 @@ i915_gem_userptr_ioctl(struct drm_device *dev,
> return -ENODEV;
> }
>
> +   if (args->flags & I915_USERPTR_PROBE) {
> +   /*
> +* Check that the range pointed to represents real struct
> +* pages and not iomappings (at this moment in time!)
> +*/
> +   ret = probe_range(current->mm, args->user_ptr, args->user_size);
> +   if (ret)
> +   return ret;
> +   }
> +
>  #ifdef CONFIG_MMU_NOTIFIER
> obj = i915_gem_object_alloc();
> if (obj == NULL)
> diff --git a/drivers/gpu/drm/i915/i915_getparam.c b/drivers/gpu/drm/i915/i915_getparam.c
> index 24e18219eb50..d6d2e1a10d14 100644
> --- a/drivers/gpu/drm/i915/i915_getparam.c
> +++ b/drivers/gpu/drm/i915/i915_getparam.c
> @@ -163,6 +163,9 @@ int i915_getparam_ioctl(struct drm_device *dev, void *data,
> case I915_PARAM_PERF_REVISION:
> value = i915_perf_ioctl_version();
> break;
> +   case I915_PARAM_HAS_USERPTR_PROBE:
> +   value = true;
> +   break;
> default:
> DRM_DEBUG("Unknown parameter %d\n", param->param);
> return -EINVAL;
> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
> index e20eeeca7a1c..2e4112b
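
From userspace the probe flag would be used roughly as follows; a hedged
sketch (the surrounding setup and error handling are illustrative, not
from the patch). Support can first be queried via the
I915_PARAM_HAS_USERPTR_PROBE getparam added above.

/* Ask the kernel to verify the whole range is backed by real struct
 * pages at creation time, instead of forcing population with a dummy
 * batch or a set-domain call. */
struct drm_i915_gem_userptr arg = {
	.user_ptr = (uintptr_t)ptr,	/* page-aligned user allocation */
	.user_size = size,
	.flags = I915_USERPTR_PROBE,
};

if (ioctl(fd, DRM_IOCTL_I915_GEM_USERPTR, &arg) != 0)
	/* -EFAULT: a hole or pfn-backed mapping was found */
	return -1;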

Re: [PATCH] drm/i915: Ditch i915 globals shrink infrastructure

2021-07-21 Thread Jason Ekstrand
On Wed, Jul 21, 2021 at 1:32 PM Daniel Vetter  wrote:
>
> This essentially reverts
>
> commit 84a1074920523430f9dc30ff907f4801b4820072
> Author: Chris Wilson 
> Date:   Wed Jan 24 11:36:08 2018 +
>
> drm/i915: Shrink the GEM kmem_caches upon idling
>
> mm/vmscan.c:do_shrink_slab() is a thing, if there's an issue with it
> then we need to fix that there, not hand-roll our own slab shrinking
> code in i915.
>
> Noticed while reviewing a patch set from Jason to fix up some issues
> in our i915_init() and i915_exit() module load/cleanup code. Now that
> i915_globals.c isn't any different than normal init/exit functions, we
> should convert them over to one unified table and remove
> i915_globals.[hc] entirely.

Mind throwing in a comment somewhere about how i915 is one of only two
users of kmem_cache_shrink() in the entire kernel?  That also seems to
be pretty good evidence that it's not useful.

Reviewed-by: Jason Ekstrand 

Feel free to land at-will and I'll deal with merge conflicts on my end.

> Cc: David Airlie 
> Cc: Jason Ekstrand 
> Signed-off-by: Daniel Vetter 
> ---
>  drivers/gpu/drm/i915/gem/i915_gem_context.c |  6 --
>  drivers/gpu/drm/i915/gem/i915_gem_object.c  |  6 --
>  drivers/gpu/drm/i915/gt/intel_context.c |  6 --
>  drivers/gpu/drm/i915/gt/intel_gt_pm.c   |  4 -
>  drivers/gpu/drm/i915/i915_active.c  |  6 --
>  drivers/gpu/drm/i915/i915_globals.c | 95 -
>  drivers/gpu/drm/i915/i915_globals.h |  3 -
>  drivers/gpu/drm/i915/i915_request.c |  7 --
>  drivers/gpu/drm/i915/i915_scheduler.c   |  7 --
>  drivers/gpu/drm/i915/i915_vma.c |  6 --
>  10 files changed, 146 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> index 7d6f52d8a801..bf2a2319353a 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> @@ -2280,18 +2280,12 @@ i915_gem_engines_iter_next(struct i915_gem_engines_iter *it)
>  #include "selftests/i915_gem_context.c"
>  #endif
>
> -static void i915_global_gem_context_shrink(void)
> -{
> -   kmem_cache_shrink(global.slab_luts);
> -}
> -
>  static void i915_global_gem_context_exit(void)
>  {
> kmem_cache_destroy(global.slab_luts);
>  }
>
>  static struct i915_global_gem_context global = { {
> -   .shrink = i915_global_gem_context_shrink,
> .exit = i915_global_gem_context_exit,
>  } };
>
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c b/drivers/gpu/drm/i915/gem/i915_gem_object.c
> index 9da7b288b7ed..5c21cff33199 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
> @@ -664,18 +664,12 @@ void i915_gem_init__objects(struct drm_i915_private *i915)
> INIT_WORK(&i915->mm.free_work, __i915_gem_free_work);
>  }
>
> -static void i915_global_objects_shrink(void)
> -{
> -   kmem_cache_shrink(global.slab_objects);
> -}
> -
>  static void i915_global_objects_exit(void)
>  {
> kmem_cache_destroy(global.slab_objects);
>  }
>
>  static struct i915_global_object global = { {
> -   .shrink = i915_global_objects_shrink,
> .exit = i915_global_objects_exit,
>  } };
>
> diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c
> index bd63813c8a80..c1338441cc1d 100644
> --- a/drivers/gpu/drm/i915/gt/intel_context.c
> +++ b/drivers/gpu/drm/i915/gt/intel_context.c
> @@ -398,18 +398,12 @@ void intel_context_fini(struct intel_context *ce)
> i915_active_fini(&ce->active);
>  }
>
> -static void i915_global_context_shrink(void)
> -{
> -   kmem_cache_shrink(global.slab_ce);
> -}
> -
>  static void i915_global_context_exit(void)
>  {
> kmem_cache_destroy(global.slab_ce);
>  }
>
>  static struct i915_global_context global = { {
> -   .shrink = i915_global_context_shrink,
> .exit = i915_global_context_exit,
>  } };
>
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt_pm.c b/drivers/gpu/drm/i915/gt/intel_gt_pm.c
> index aef3084e8b16..d86825437516 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt_pm.c
> +++ b/drivers/gpu/drm/i915/gt/intel_gt_pm.c
> @@ -67,8 +67,6 @@ static int __gt_unpark(struct intel_wakeref *wf)
>
> GT_TRACE(gt, "\n");
>
> -   i915_globals_unpark();
> -
> /*
>  * It seems that the DMC likes to transition between the DC states a lot
>  * when there are no connected displays (no active power domains) during
> @@ -116,8 +114,6 @@ static int __gt_park(struct intel_wa

Re: [Intel-gfx] [PATCH 6/6] drm/i915: Make the kmem slab for i915_buddy_block a global

2021-07-21 Thread Jason Ekstrand
On Wed, Jul 21, 2021 at 1:56 PM Daniel Vetter  wrote:
>
> On Wed, Jul 21, 2021 at 05:25:41PM +0100, Matthew Auld wrote:
> > On 21/07/2021 16:23, Jason Ekstrand wrote:
> > > There's no reason that I can tell why this should be per-i915_buddy_mm
> > > and doing so causes KMEM_CACHE to throw dmesg warnings because it tries
> > > to create a debugfs entry with the name i915_buddy_block multiple times.
> > > We could handle this by carefully giving each slab its own name but that
> > > brings its own pain because then we have to store that string somewhere
> > > and manage the lifetimes of the different slabs.  The most likely
> > > outcome would be a global atomic which we increment to get a new name or
> > > something like that.
> > >
> > > The much easier solution is to use the i915_globals system like we do
> > > for every other slab in i915.  This ensures that we have exactly one of
> > > them for each i915 driver load and it gets neatly created on module load
> > > and destroyed on module unload.  Using the globals system also means
> > > that its now tied into the shrink handler so we can properly respond to
> > > low-memory situations.
> > >
> > > Signed-off-by: Jason Ekstrand 
> > > Fixes: 88be9a0a06b7 ("drm/i915/ttm: add ttm_buddy_man")
> > > Cc: Matthew Auld 
> > > Cc: Christian König 
> >
> > It was intentionally ripped out with the idea that we would be moving the
> > buddy stuff into ttm, and so part of that was trying to get rid of some
> > of the i915 specifics, like this globals thing.
> >
> > Reviewed-by: Matthew Auld 
>
> I just sent out a patch to put i915_globals on a diet, so maybe we can
> hold this patch here a bit when there's other reasons for why this is
> special?

This is required to get rid of the dmesg warnings.

> Or at least not make this use the i915_globals stuff and instead just link
> up the init/exit function calls directly into Jason's new table, so that
> we don't have a merge conflict here?

I'm happy to deal with merge conflicts however they land.

--Jason

> -Daniel
>
> >
> > > ---
> > >   drivers/gpu/drm/i915/i915_buddy.c   | 44 ++---
> > >   drivers/gpu/drm/i915/i915_buddy.h   |  3 +-
> > >   drivers/gpu/drm/i915/i915_globals.c |  2 ++
> > >   3 files changed, 38 insertions(+), 11 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/i915/i915_buddy.c b/drivers/gpu/drm/i915/i915_buddy.c
> > > index 29dd7d0310c1f..911feedad4513 100644
> > > --- a/drivers/gpu/drm/i915/i915_buddy.c
> > > +++ b/drivers/gpu/drm/i915/i915_buddy.c
> > > @@ -8,8 +8,14 @@
> > >   #include "i915_buddy.h"
> > >   #include "i915_gem.h"
> > > +#include "i915_globals.h"
> > >   #include "i915_utils.h"
> > > +static struct i915_global_buddy {
> > > +   struct i915_global base;
> > > +   struct kmem_cache *slab_blocks;
> > > +} global;
> > > +
> > >   static struct i915_buddy_block *i915_block_alloc(struct i915_buddy_mm *mm,
> > >  struct i915_buddy_block *parent,
> > >  unsigned int order,
> > > @@ -19,7 +25,7 @@ static struct i915_buddy_block *i915_block_alloc(struct i915_buddy_mm *mm,
> > > GEM_BUG_ON(order > I915_BUDDY_MAX_ORDER);
> > > -   block = kmem_cache_zalloc(mm->slab_blocks, GFP_KERNEL);
> > > +   block = kmem_cache_zalloc(global.slab_blocks, GFP_KERNEL);
> > > if (!block)
> > > return NULL;
> > > @@ -34,7 +40,7 @@ static struct i915_buddy_block *i915_block_alloc(struct i915_buddy_mm *mm,
> > >   static void i915_block_free(struct i915_buddy_mm *mm,
> > > struct i915_buddy_block *block)
> > >   {
> > > -   kmem_cache_free(mm->slab_blocks, block);
> > > +   kmem_cache_free(global.slab_blocks, block);
> > >   }
> > >   static void mark_allocated(struct i915_buddy_block *block)
> > > @@ -85,15 +91,11 @@ int i915_buddy_init(struct i915_buddy_mm *mm, u64 size, u64 chunk_size)
> > > GEM_BUG_ON(mm->max_order > I915_BUDDY_MAX_ORDER);
> > > -   mm->slab_blocks = KMEM_CACHE(i915_buddy_block, SLAB_HWCACHE_ALIGN);
> > > -   if (!mm->slab_blocks)
> > > -   return -ENOMEM;
> > > -
> > > mm->free_l
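
The registration this moves to follows the usual i915_globals shape; a
sketch assumed from the other i915_globals users quoted in the thread
(the exact init function name and wiring in the patch may differ, and
'global' is the static declared in the hunk above):

static void i915_global_buddy_exit(void)
{
	kmem_cache_destroy(global.slab_blocks);
}

int __init i915_global_buddy_init(void)
{
	/* Created once per module load, so the KMEM_CACHE debugfs name
	 * is registered exactly once. */
	global.slab_blocks = KMEM_CACHE(i915_buddy_block, SLAB_HWCACHE_ALIGN);
	if (!global.slab_blocks)
		return -ENOMEM;

	global.base.exit = i915_global_buddy_exit;
	i915_global_register(&global.base);
	return 0;
}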

[PATCH 7/7] drm/i915/gem: Migrate to system at dma-buf attach time (v7)

2021-07-21 Thread Jason Ekstrand
From: Thomas Hellström 

Until we support p2p dma or as a complement to that, migrate data
to system memory at dma-buf attach time if possible.

v2:
- Rebase on dynamic exporter. Update the igt_dmabuf_import_same_driver
  selftest to migrate if we are LMEM capable.
v3:
- Migrate also in the pin() callback.
v4:
- Migrate in attach
v5: (jason)
- Lock around the migration
v6: (jason)
- Move the can_migrate check outside the lock
- Rework the selftests to test more migration conditions.  In
  particular, SMEM, LMEM, and LMEM+SMEM are all checked.
v7: (mauld)
- Misc style nits

Signed-off-by: Thomas Hellström 
Signed-off-by: Michael J. Ruhl 
Reported-by: kernel test robot 
Signed-off-by: Jason Ekstrand 
Reviewed-by: Jason Ekstrand 
---
 drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c| 23 -
 .../drm/i915/gem/selftests/i915_gem_dmabuf.c  | 87 ++-
 2 files changed, 106 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
index 59dc56ae14d6b..afa34111de02e 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
@@ -164,8 +164,29 @@ static int i915_gem_dmabuf_attach(struct dma_buf *dmabuf,
  struct dma_buf_attachment *attach)
 {
struct drm_i915_gem_object *obj = dma_buf_to_obj(dmabuf);
+   struct i915_gem_ww_ctx ww;
+   int err;
+
+   if (!i915_gem_object_can_migrate(obj, INTEL_REGION_SMEM))
+   return -EOPNOTSUPP;
+
+   for_i915_gem_ww(&ww, err, true) {
+   err = i915_gem_object_lock(obj, &ww);
+   if (err)
+   continue;
+
+   err = i915_gem_object_migrate(obj, &ww, INTEL_REGION_SMEM);
+   if (err)
+   continue;
 
-   return i915_gem_object_pin_pages_unlocked(obj);
+   err = i915_gem_object_wait_migration(obj, 0);
+   if (err)
+   continue;
+
+   err = i915_gem_object_pin_pages(obj);
+   }
+
+   return err;
 }
 
 static void i915_gem_dmabuf_detach(struct dma_buf *dmabuf,
diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_dmabuf.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_dmabuf.c
index d4ce01e6ee854..ffae7df5e4d7d 100644
--- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_dmabuf.c
+++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_dmabuf.c
@@ -85,9 +85,63 @@ static int igt_dmabuf_import_self(void *arg)
return err;
 }
 
-static int igt_dmabuf_import_same_driver(void *arg)
+static int igt_dmabuf_import_same_driver_lmem(void *arg)
 {
struct drm_i915_private *i915 = arg;
+   struct intel_memory_region *lmem = i915->mm.regions[INTEL_REGION_LMEM];
+   struct drm_i915_gem_object *obj;
+   struct drm_gem_object *import;
+   struct dma_buf *dmabuf;
+   int err;
+
+   if (!lmem)
+   return 0;
+
+   force_different_devices = true;
+
+   obj = __i915_gem_object_create_user(i915, PAGE_SIZE, &lmem, 1);
+   if (IS_ERR(obj)) {
+   pr_err("__i915_gem_object_create_user failed with err=%ld\n",
+  PTR_ERR(obj));
+   err = PTR_ERR(obj);
+   goto out_ret;
+   }
+
+   dmabuf = i915_gem_prime_export(&obj->base, 0);
+   if (IS_ERR(dmabuf)) {
+   pr_err("i915_gem_prime_export failed with err=%ld\n",
+  PTR_ERR(dmabuf));
+   err = PTR_ERR(dmabuf);
+   goto out;
+   }
+
+   /*
+* We expect an import of an LMEM-only object to fail with
+* -EOPNOTSUPP because it can't be migrated to SMEM.
+*/
+   import = i915_gem_prime_import(&i915->drm, dmabuf);
+   if (!IS_ERR(import)) {
+   drm_gem_object_put(import);
+   pr_err("i915_gem_prime_import succeeded when it shouldn't have\n");
+   err = -EINVAL;
+   } else if (PTR_ERR(import) != -EOPNOTSUPP) {
+   pr_err("i915_gem_prime_import failed with the wrong err=%ld\n",
+  PTR_ERR(import));
+   err = PTR_ERR(import);
+   }
+
+   dma_buf_put(dmabuf);
+out:
+   i915_gem_object_put(obj);
+out_ret:
+   force_different_devices = false;
+   return err;
+}
+
+static int igt_dmabuf_import_same_driver(struct drm_i915_private *i915,
+struct intel_memory_region **regions,
+unsigned int num_regions)
+{
struct drm_i915_gem_object *obj, *import_obj;
struct drm_gem_object *import;
struct dma_buf *dmabuf;
@@ -97,8 +151,12 @@ static int igt_dmabuf_import_same_driver(void *arg)
int err;
 
force_different_devices = true;
-   obj = i915_gem_object_create_shmem(i915, PAGE_SIZE);
+
+   obj = __i915_gem_object_create_user(i915, PAGE_SIZE,
+   
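
The for_i915_gem_ww() loop used in the attach hunk above is the i915
ww-mutex retry idiom: on -EDEADLK the body backs off and re-runs, and
any other err value (or success) terminates the loop. A usage sketch,
with do_locked_work() as a hypothetical stand-in:

struct i915_gem_ww_ctx ww;
int err;

for_i915_gem_ww(&ww, err, true) {	/* true => interruptible */
	err = i915_gem_object_lock(obj, &ww);
	if (err)
		continue;	/* -EDEADLK: unwind and retry the body */

	err = do_locked_work(obj);	/* hypothetical locked work */
}
return err;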

[PATCH 6/7] drm/i915/gem: Correct the locking and pin pattern for dma-buf (v8)

2021-07-21 Thread Jason Ekstrand
From: Thomas Hellström 

If our exported dma-bufs are imported by another instance of our driver,
that instance will typically have the imported dma-bufs locked during
dma_buf_map_attachment(). But the exporter also locks the same reservation
object in the map_dma_buf() callback, which leads to recursive locking.

So taking the lock inside _pin_pages_unlocked() is incorrect.

Additionally, the current pinning code path is contrary to the defined
way that pinning should occur.

Remove the explicit pin/unpin from the map/umap functions and move them
to the attach/detach allowing correct locking to occur, and to match
the static dma-buf drm_prime pattern.

Add a live selftest to exercise both dynamic and non-dynamic
exports.

v2:
- Extend the selftest with a fake dynamic importer.
- Provide real pin and unpin callbacks to not abuse the interface.
v3: (ruhl)
- Remove the dynamic export support and move the pinning into the
  attach/detach path.
v4: (ruhl)
- Put pages does not need to assert on the dma-resv
v5: (jason)
- Lock around dma_buf_unmap_attachment() when emulating a dynamic
  importer in the subtests.
- Use pin_pages_unlocked
v6: (jason)
- Use dma_buf_attach instead of dma_buf_attach_dynamic in the selftests
v7: (mauld)
- Use __i915_gem_object_get_pages (2 __underscores) instead of the
  4 underscore version in the selftests
v8: (mauld)
- Drop the kernel doc from the static i915_gem_dmabuf_attach function
- Add missing "err = PTR_ERR()" to a bunch of selftest error cases

Reported-by: Michael J. Ruhl 
Signed-off-by: Thomas Hellström 
Signed-off-by: Michael J. Ruhl 
Signed-off-by: Jason Ekstrand 
Reviewed-by: Jason Ekstrand 
---
 drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c|  37 --
 .../drm/i915/gem/selftests/i915_gem_dmabuf.c  | 109 +-
 2 files changed, 132 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
index 616c3a2f1baf0..59dc56ae14d6b 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
@@ -12,6 +12,8 @@
 #include "i915_gem_object.h"
 #include "i915_scatterlist.h"
 
+I915_SELFTEST_DECLARE(static bool force_different_devices;)
+
 static struct drm_i915_gem_object *dma_buf_to_obj(struct dma_buf *buf)
 {
return to_intel_bo(buf->priv);
@@ -25,15 +27,11 @@ static struct sg_table *i915_gem_map_dma_buf(struct dma_buf_attachment *attachme
struct scatterlist *src, *dst;
int ret, i;
 
-   ret = i915_gem_object_pin_pages_unlocked(obj);
-   if (ret)
-   goto err;
-
/* Copy sg so that we make an independent mapping */
st = kmalloc(sizeof(struct sg_table), GFP_KERNEL);
if (st == NULL) {
ret = -ENOMEM;
-   goto err_unpin_pages;
+   goto err;
}
 
ret = sg_alloc_table(st, obj->mm.pages->nents, GFP_KERNEL);
@@ -58,8 +56,6 @@ static struct sg_table *i915_gem_map_dma_buf(struct dma_buf_attachment *attachme
sg_free_table(st);
 err_free:
kfree(st);
-err_unpin_pages:
-   i915_gem_object_unpin_pages(obj);
 err:
return ERR_PTR(ret);
 }
@@ -68,13 +64,9 @@ static void i915_gem_unmap_dma_buf(struct dma_buf_attachment *attachment,
   struct sg_table *sg,
   enum dma_data_direction dir)
 {
-   struct drm_i915_gem_object *obj = dma_buf_to_obj(attachment->dmabuf);
-
dma_unmap_sgtable(attachment->dev, sg, dir, DMA_ATTR_SKIP_CPU_SYNC);
sg_free_table(sg);
kfree(sg);
-
-   i915_gem_object_unpin_pages(obj);
 }
 
static int i915_gem_dmabuf_vmap(struct dma_buf *dma_buf, struct dma_buf_map *map)
@@ -168,7 +160,25 @@ static int i915_gem_end_cpu_access(struct dma_buf *dma_buf, enum dma_data_direct
return err;
 }
 
+static int i915_gem_dmabuf_attach(struct dma_buf *dmabuf,
+ struct dma_buf_attachment *attach)
+{
+   struct drm_i915_gem_object *obj = dma_buf_to_obj(dmabuf);
+
+   return i915_gem_object_pin_pages_unlocked(obj);
+}
+
+static void i915_gem_dmabuf_detach(struct dma_buf *dmabuf,
+  struct dma_buf_attachment *attach)
+{
+   struct drm_i915_gem_object *obj = dma_buf_to_obj(dmabuf);
+
+   i915_gem_object_unpin_pages(obj);
+}
+
 static const struct dma_buf_ops i915_dmabuf_ops =  {
+   .attach = i915_gem_dmabuf_attach,
+   .detach = i915_gem_dmabuf_detach,
.map_dma_buf = i915_gem_map_dma_buf,
.unmap_dma_buf = i915_gem_unmap_dma_buf,
.release = drm_gem_dmabuf_release,
@@ -204,6 +214,8 @@ static int i915_gem_object_get_pages_dmabuf(struct drm_i915_gem_object *obj)
struct sg_table *pages;
unsigned int sg_page_sizes;
 
+   assert_object_held(obj);
+
pages = dma_buf_map_attachment(obj->base.import_attach,
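
The locking hazard this patch removes can be pictured as follows
(illustrative comments only, not from the patch):

/*
 * importer:  dma_resv_lock(obj resv)      <- importer holds the lock
 *            dma_buf_map_attachment()
 * exporter:    .map_dma_buf callback
 *                pin_pages_unlocked()     <- tries to take the same
 *                                            reservation lock again
 *
 * Pinning in .attach/.detach instead happens before the importer
 * takes the reservation lock, matching the static drm_prime pattern.
 */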
 
