On 11/29/2017 04:09 PM, Miguel Angel Vico wrote:


On Wed, 29 Nov 2017 16:28:15 -0500
Rob Clark <[email protected]> wrote:

On Wed, Nov 29, 2017 at 2:41 PM, Miguel Angel Vico <[email protected]> wrote:
Many of you may already know, but James is going to be out for a few
weeks and I'll be taking over this in the meantime.

Sorry for the unfortunate timing. I am indeed on paternity leave at the moment. Some quick comments below. I'll be trying to follow the discussion as time allows while I'm out.

See inline for comments.

On Wed, 29 Nov 2017 09:33:29 -0800
Jason Ekstrand <[email protected]> wrote:
On Sat, Nov 25, 2017 at 1:20 PM, Rob Clark <[email protected]> wrote:
On Sat, Nov 25, 2017 at 12:46 PM, Jason Ekstrand <[email protected]>
wrote:
On November 24, 2017 09:29:43 Rob Clark <[email protected]> wrote:


On Mon, Nov 20, 2017 at 8:11 PM, James Jones <[email protected]>
wrote:

As many here know at this point, I've been working on solving issues
related to DMA-capable memory allocation for various devices for some time
now. I'd like to take this opportunity to apologize for the way I handled
the EGL stream proposals.  I understand now that the development process
followed there was unacceptable to the community and likely offended many
great engineers.

Moving forward, I attempted to reboot talks in a more constructive manner
with the generic allocator library proposals & discussion forum at XDC
2016. Some great design ideas came out of that, and I've since been
prototyping some code to prove them out before bringing them back as
official proposals. Again, I understand some people are growing concerned
that I've been doing this off on the side in a github project that has
primarily NVIDIA contributors.  My goal was only to avoid wasting
everyone's time with unproven ideas.  The intent was never to dump the
prototype code as-is on the community and presume acceptance. It's just a
public research project.

Now the prototyping is nearing completion, and I'd like to renew
discussion on whether and how the new mechanisms can be integrated with
the Linux graphics stack.

I'd be interested to know if more work is needed to demonstrate the
usefulness of the new mechanisms, or whether people think they have value
at this point.

After talking with people on the hallway track at XDC this year, I've
heard several proposals for incorporating the new mechanisms:

-Include ideas from the generic allocator design into GBM.  This could
take the form of designing a "GBM 2.0" API, or incrementally adding to the
existing GBM API.

-Develop a library to replace GBM.  The allocator prototype code could be
massaged into something production worthy to jump start this process.

-Develop a library that sits beside or on top of GBM, using GBM for
low-level graphics buffer allocation, while supporting non-graphics kernel
APIs directly.  The additional cross-device negotiation and sorting of
capabilities would be handled in this slightly higher-level API before
handing off to GBM and other APIs for actual allocation somehow.
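
To make the "cross-device negotiation and sorting of capabilities" step more concrete, here is a minimal sketch in C. It is not taken from the prototype: the CAP_* bits, struct device_caps, and negotiate() are hypothetical names invented for illustration, and the only constraint modeled is a power-of-two pitch alignment. The idea shown is simply to intersect what every device can handle and keep the strictest constraint before handing off to whatever does the actual allocation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout capabilities, one bit each (not from any real API). */
enum {
    CAP_LINEAR     = 1u << 0,
    CAP_TILED      = 1u << 1,
    CAP_COMPRESSED = 1u << 2,
};

/* What one device reports for a given usage (assumed shape). */
struct device_caps {
    uint32_t layouts;         /* bitmask of CAP_* values the device handles */
    uint32_t pitch_alignment; /* in bytes, assumed to be a power of two */
};

/*
 * Intersect the layouts of all devices and keep the strictest alignment.
 * Returns 0 and fills *out on success, or -1 if the devices share no layout.
 */
static int negotiate(const struct device_caps *devs, size_t count,
                     struct device_caps *out)
{
    assert(devs != NULL && out != NULL);

    if (count == 0)
        return -1;

    uint32_t layouts = ~0u;
    uint32_t alignment = 1;

    for (size_t i = 0; i < count; i++) {
        layouts &= devs[i].layouts;
        /* For powers of two, the largest alignment satisfies all devices. */
        if (devs[i].pitch_alignment > alignment)
            alignment = devs[i].pitch_alignment;
    }

    if (layouts == 0)
        return -1; /* nothing every device can share */

    out->layouts = layouts;
    out->pitch_alignment = alignment;
    return 0;
}
```

For example, a GPU reporting linear, tiled and compressed layouts with 64-byte alignment and a display engine reporting linear and tiled with 256-byte alignment would negotiate down to linear or tiled with 256-byte alignment. Choosing among the surviving layouts (the "sorting" part) is left out of this sketch.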


tbh, I kinda see GBM and $new_thing sitting side by side.. GBM is
still the "winsys" for running on "bare metal" (ie. kms).  And we
don't want to saddle $new_thing with aspects of that, but rather have
it focus on being the thing that in multiple-"device"[1] scenarios
figures out what sort of buffer can be allocated by who for sharing.
Ie $new_thing should really not care about winsys level things like
cursors or surfaces.. only buffers.

The mesa implementation of $new_thing could sit on top of GBM,
although it could also just sit on top of the same internal APIs that
GBM sits on top of.  That is an implementation detail.  It could be
that GBM grows an API to return an instance of $new_thing for
use-cases that involve sharing a buffer with the GPU.  Or perhaps that
is exposed via some sort of EGL extension.  (We probably also need a
way to get an instance from libdrm (?) for display-only KMS drivers,
to cover cases like etnaviv sharing a buffer with a separate display
driver.)

[1] where "devices" could be multiple GPUs or multiple APIs for one or
more GPUs, but also includes non-GPU devices like camera, video
decoder, "image processor" (which may or may not be part of camera),
etc, etc


I'm not quite sure what I think about this.  I think I would like to
see $new_thing at least replace the guts of GBM. Whether GBM becomes a
wrapper around $new_thing or $new_thing implements the GBM API, I'm not
sure.  What I don't think I want is to see GBM development continuing on
its own so we have two competing solutions.

I don't really view them as competing.. there is *some* overlap, ie.
allocating a buffer.. but even if you are using GBM w/out $new_thing
you could allocate a buffer externally and import it.  I don't see
$new_thing as that much different from GBM PoV.

But things like surfaces (aka swap chains) seem a bit out of place
when you are thinking about implementing $new_thing for non-gpu
devices.  Plus EGL<->GBM tie-ins that seem out of place when talking
about a (for ex.) camera.  I kinda don't want to throw out the baby
with the bathwater here.

Agreed.  GBM is very EGLish and we don't want the new allocator to be that.

*maybe* GBM could be partially implemented on top of $new_thing.  I
don't quite see how that would work.  Possibly we could deprecate
parts of GBM that are no longer needed?  idk..  Either way, I fully
expect that GBM and mesa's implementation of $new_thing could perhaps
sit on top of some of the same set of internal APIs.  The public
interface can be decoupled from the internal implementation.

Maybe I should restate things a bit.  My real point was that modifiers +
$new_thing + Kernel blob should be a complete and more powerful replacement
for GBM.  I don't know that we really can implement GBM on top of it
because GBM has lots of wishy-washy concepts such as "cursor plane" which
may not map well, at least not without querying the kernel about specific
display planes.  In particular, I don't want someone to feel like they need
to use $new_thing and GBM at the same time or together.  Ideally, I'd like
them to never do that unless we decide gbm_bo is a useful abstraction for
$new_thing.

I'm not really familiar with GBM guts, so I don't know how easy it
would be to make GBM rely on the allocator for the buffer allocations.
Maybe that's something worth exploring. What I wouldn't like is
$new_thing to fall short because we are trying to shove it under GBM's
hood.

yeah, I think we should consider functionality of $new_thing
independent of GBM.. how to go from individual buffers allocated via
$new_thing to EGL surface/swapchain is I think out of scope for
$new_thing.

It seems to me that $new_thing should grow as a separate thing whether
it ends up replacing GBM or GBM internals are somewhat rewritten on top
of it. If I'm reading you both correctly, you agree with that, so in
order to move forward, should we go ahead and create a project in fd.o?

Before filing the new project request though, we should find an
appropriate name for $new_thing. Creativity isn't one of my strengths,
but I'll go ahead and start the bikeshedding with "Generic Device
Memory Allocator" or "Generic Device Memory Manager".

liballoc - Generic Device Memory Allocator ... seems reasonable to me..

Cool. If there aren't better suggestions, we can go with that. We
should also namespace all APIs and structures. Is 'galloc' distinctive
enough to be used as namespace? Being an 'r' away from gralloc maybe
it's a bit confusing?


I think it is reasonable to live on github until we figure out how
transitions work.. or in particular are there any thread restrictions
or interactions w/ gl context if transitions are done on the gpu or
anything like that?  Or can we just make it more vulkan like w/
explicit ctx ptr, and pass around fence fd's to synchronize everyone??
  I haven't thought about the transition part too much but I guess we
should have a reasonable idea for how that should work before we start
getting too many non-toy users, lest we find big API changes are
needed..

Seems fine, but I would like to get other people other than NVIDIANs
involved giving feedback on the design as we move forward with the
prototype.

Due to lack of a better list, is it okay to start sending patches to
mesa-dev? If that's too broad an audience, should I just CC specific
individuals that have somewhat contributed to the project?


Do we need to define both in-place and copy transitions?  Ie. what if
GPU is still reading a tiled or compressed texture (ie. sampling from
previous frame for some reason), but we need to untile/uncompress for
display.. or maybe there are some other cases like that we should
think about..

Maybe you already have some thoughts about that?

This is the next thing I'll be working on. I haven't given it much
thought myself so far, but I think James might have had some insights.
I'll read through some of his notes to double-check.

A couple of notes on usage transitions:

While chatting about transitions, a few assertions were made by others that I've come to accept, despite the fact that they reduce the generality of the allocator mechanisms:

-GPUs are the only things that actually need usage transitions as far as I know thus far. Other engines either share the GPU representations of data, or use more limited representations; the latter being the reason non-GPU usage transitions are a useful thing.

-It's reasonable to assume that a GPU is required to perform a usage transition. This follows from the above postulate. If only GPUs are using more advanced representations, you don't need any transitions unless you have a GPU available.

From that, I derived the rough API proposal for transitions presented on my XDC 2017 slides. Transition "metadata" is queried from the allocator given a pair of usages (which may refer to more than one device), but the realization of the transition is left to existing GPU APIs. I think I put Vulkan-like pseudo-code in the slides, but the GL external objects extensions (GL_EXT_memory_object and GL_EXT_semaphore) would work as well.

Regarding in-place vs. copy: To me a transition is something that happens in-place, at least semantically. If you need to make copies, that's a format conversion blit, not a transition, and graphics APIs are already capable of expressing that without any special transitions or help from the allocator. However, I understand some chipsets perform transitions using something that looks kind of like a blit using on-chip caches and constrained usage semantics. There's probably some work to do to see whether those need to be accommodated as conversion blits or usage transitions.

For our hardware's purposes, transitions are just various levels of decompression or compression reconfiguration and potentially cache flushing/invalidation, so our transition metadata will just be some bits signaling which compression operation is needed, if any. That's the sort of operation I modeled the API around, so if things are much more exotic than that for others, it will probably require some adjustments.
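
As a rough illustration of "transition metadata as some bits", queried for a pair of usages, here is a minimal sketch in C. Everything in it is an assumption made for the example: the usage names, the per-usage properties, the TRANSITION_* bits, and query_transition() are hypothetical and do not come from the prototype or from any driver. The point is only that the allocator side can return a small bitmask, while actually performing the operation is left to an existing GPU API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical usages; the prototype's real list may differ. */
enum usage { USAGE_TEXTURE, USAGE_RENDER, USAGE_DISPLAY, USAGE_COUNT };

/* Hypothetical metadata bits describing what a transition must do. */
enum {
    TRANSITION_DECOMPRESS       = 1u << 0,
    TRANSITION_CACHE_FLUSH      = 1u << 1,
    TRANSITION_CACHE_INVALIDATE = 1u << 2,
};

/* Assumed per-usage properties, invented purely for this example. */
struct usage_props {
    bool allows_compression; /* usage can consume compressed data */
    bool uses_gpu_cache;     /* usage accesses memory through GPU caches */
};

static const struct usage_props props[USAGE_COUNT] = {
    [USAGE_TEXTURE] = { true,  true  },
    [USAGE_RENDER]  = { true,  true  },
    [USAGE_DISPLAY] = { false, false },
};

/*
 * Return the metadata bits for moving a buffer from one usage to another.
 * A result of 0 means no transition work is needed. Realizing the
 * transition (for example with Vulkan or the GL external objects
 * extensions) is deliberately outside this function.
 */
static uint32_t query_transition(enum usage from, enum usage to)
{
    assert((unsigned)from < USAGE_COUNT && (unsigned)to < USAGE_COUNT);

    uint32_t bits = 0;

    if (props[from].allows_compression && !props[to].allows_compression)
        bits |= TRANSITION_DECOMPRESS;
    if (props[from].uses_gpu_cache && !props[to].uses_gpu_cache)
        bits |= TRANSITION_CACHE_FLUSH;
    if (!props[from].uses_gpu_cache && props[to].uses_gpu_cache)
        bits |= TRANSITION_CACHE_INVALIDATE;

    return bits;
}
```

Under these made-up rules, going from render to display asks for a decompress plus a cache flush, while going from render to texture asks for nothing. Hardware with more exotic transitions would need richer metadata than a bitmask.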

Thanks,
Miguel.


Once we agree upon something, I can take care of filing the request,
but I'm unclear what the initial list of approvers should be.
Looking at the main contributors of both the initial draft of
$new_thing and git repository, does the following list of people seem
reasonable?

  * Rob Clark
  * Jason Ekstrand
  * James Jones
  * Chad Versace
  * Miguel A Vico

I never started a project in fd.o, so any useful advice will be
appreciated.

fwiw, https://www.freedesktop.org/wiki/NewProject/

BR,
-R

Thanks,
Miguel.
I *think* I like the idea of having $new_thing implement GBM as a
deprecated legacy API.  Whether that means we start by pulling GBM out into
its own project or we start over, I don't know.  My feeling is that the
current dri_interface is *not* what we want which is why starting with GBM
makes me nervous.

/me expects if we pull GBM out of mesa, the interface between GBM and
mesa (or other GL drivers) is 'struct gbm_device'.. so "GBM the
project" is just a thin shim plus some 'struct gbm_device' versioning.

BR,
-R
I need to go read through your code before I can provide a stronger or
more nuanced opinion.  That's not going to happen before the end of the
year.

I hope you and others, especially those of you who seem to already have some well-formed ideas about end-goals for this project, do get a chance to go through the prototype code and simple kmscube example at some point. A code review is worth a thousand high-level design discussions IMHO, and it really isn't that much code at this point. Of course, I understand everyone's busy this time of year.

-I have also heard some general comments that regardless of the
relationship between GBM and the new allocator mechanisms, it might be time
to move GBM out of Mesa so it can be developed as a stand-alone project.
I'd be interested in what others think about that, as it would be something
worth coordinating with any other new development based on or inside of
GBM.


+1

We already have at least a couple different non-mesa implementations
of GBM (which afaict tend to lag behind mesa's GBM and cause
headaches).

The extracted part probably isn't much more than a header and shim.
But probably does need to grow some versioning for the backend to know
if, for example, gbm->bo_map() is supported.. at least it could
provide stubs that return an error, rather than having link-time fail
if building something w/ $vendor's old gbm implementation.
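
One way the versioning idea above could look, as a minimal sketch in C: the shim checks a version field before touching a newer entry point and returns an error instead of failing at link time. The struct backend layout, BACKEND_VERSION_BO_MAP, shim_bo_map(), and the fake backend are all hypothetical and are not the real 'struct gbm_device' or GBM API.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical backend vtable; the real struct gbm_device differs. */
struct backend {
    unsigned version; /* interface version the backend implements */
    int (*bo_create)(struct backend *be, unsigned width, unsigned height);
    /* Assumed to be added in version 2; older backends lack this field. */
    void *(*bo_map)(struct backend *be, int bo);
};

#define BACKEND_VERSION_BO_MAP 2u

/*
 * Shim entry point: always present in the library, so applications link
 * even against an old backend. The version is checked first so the
 * bo_map field is never read from a backend too old to have it.
 */
static void *shim_bo_map(struct backend *be, int bo)
{
    assert(be != NULL);

    if (be->version < BACKEND_VERSION_BO_MAP || be->bo_map == NULL) {
        errno = ENOSYS; /* not supported by this backend */
        return NULL;
    }
    return be->bo_map(be, bo);
}

/* Stand-in backend implementation, used only to exercise the shim. */
static char fake_storage[16];

static void *fake_bo_map(struct backend *be, int bo)
{
    (void)be;
    (void)bo;
    return fake_storage;
}
```

With a version 1 backend, shim_bo_map() returns NULL and sets errno to ENOSYS; with a version 2 backend that fills in bo_map, the call is forwarded. Using ENOSYS as the error is a choice made for this sketch, not something the thread specifies.
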
And of course I'm open to any other ideas for integration.  Beyond just
where this code would live, there is much to debate about the mechanisms
themselves and all the implementation details.  I was just hoping to kick
things off with something high level to start.


My $0.02, is that the place where devel happens and place to go for
releases could be different.  Either way, I would like to see git tree
for tagged release versions live on fd.o and use the common release
process[2] for generating/uploading release tarballs that distros can
use.


Agreed.  I think fd.o is the right place for such a project to live.  We
can have mirrors on GitHub and other places but fd.o is where Linux
graphics stack development currently happens.
[2] https://cgit.freedesktop.org/xorg/util/modular/tree/release.sh
For reference, the code Miguel and I have been developing for the
prototype is here:

    https://github.com/cubanismo/allocator

And we've posted a port of kmscube that uses the new interfaces as a
demonstration here:

    https://github.com/cubanismo/kmscube

There are still some proposed mechanisms (usage transitions mainly) that
aren't prototyped, but I think it makes sense to start discussing
integration while prototyping continues.


btw, I think a nice end goal would be a gralloc implementation using
this new API for sharing buffers in various use-cases.  That could
mean converting gbm-gralloc, or perhaps it means something new.

AOSP has support for mesa + upstream kernel for some devices which
also have upstream camera and/or video decoder in addition to just
GPU.. and this is where you start hitting the limits of a GBM based
gralloc.  In a lot of ways, I view $new_thing as what gralloc *should*
have been, but at least it provides a way to implement a generic
gralloc.


+100

Maybe that is getting a step ahead, there is a lot we can prototype
with kmscube.  But gralloc gets us into interesting real-world
use-cases that involve more than just GPUs.  Possibly this would be
something that linaro might be interested in getting involved with?

Gralloc-on-$new_thing, as well as hwcomposer-on-$new_thing is one of my primary goals. However, it's a pretty heavy thing to prototype. If someone has the time though, I think it would be a great experiment. It would help flesh out the paltry list of usages, constraints, and capabilities in the existing prototype codebase. The kmscube example really should have added at least a "render" usage, but I got lazy and just re-used texture for now. That won't actually work on our HW in all cases, but it's good enough for kmscube.

Thanks,
-James

BR,
-R
In addition, I'd like to note that NVIDIA is committed to providing open
source driver implementations of these mechanisms for our hardware, in
addition to support in our proprietary drivers.  In other words, wherever
modifications to the nouveau kernel & userspace drivers are needed to
implement the improved allocator mechanisms, we'll be contributing patches
if no one beats us to it.

Thanks in advance for any feedback!

-James Jones
_______________________________________________
mesa-dev mailing list
[email protected]
https://lists.freedesktop.org/mailman/listinfo/mesa-dev





--
Miguel


