On 8/12/20 10:10 AM, Karol Herbst wrote:
On Wed, Aug 12, 2020 at 7:03 PM James Jones <jajo...@nvidia.com> wrote:

On 8/12/20 5:37 AM, Ilia Mirkin wrote:
On Wed, Aug 12, 2020 at 8:24 AM Karol Herbst <kher...@redhat.com> wrote:

On Wed, Aug 12, 2020 at 12:43 PM Karol Herbst <kher...@redhat.com> wrote:

On Wed, Aug 12, 2020 at 12:27 PM Karol Herbst <kher...@redhat.com> wrote:

On Wed, Aug 12, 2020 at 2:19 AM James Jones <jajo...@nvidia.com> wrote:

Sorry for the slow reply here as well.  I've been in the process of
rebasing and reworking the userspace patches.  I'm not clear my changes
will address the Jetson Nano issue, but if you'd like to try them, the
latest userspace changes are available here:

     https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3724

And the tegra-drm kernel patches are here:


https://patchwork.ozlabs.org/project/linux-tegra/patch/20191217005205.2573-1-jajo...@nvidia.com/

Those + the kernel changes addressed in this thread are everything I had
outstanding.


I don't know if that's caused by your changes or not, but now the
assert I hit is a different one pointing out that
nvc0_miptree_select_best_modifier fails in a certain case and returns
MOD_INVALID... anyway, it seems like with your patches applied it's
now way easier to debug and figure out what's going wrong, so maybe I
can figure it out now :)


collected some information which might help to track it down.

src/gallium/frontends/dri/dri2.c:648 is the assert hit: assert(*zsbuf)

templ is {reference = {count = 0}, width0 = 300, height0 = 300, depth0
= 1, array_size = 1, format = PIPE_FORMAT_Z24X8_UNORM, target =
PIPE_TEXTURE_2D, last_level = 0, nr_samples = 0, nr_storage_samples =
0, usage = 0, bind = 1, flags = 0, next = 0x0, screen = 0x0}

inside tegra_screen_resource_create modifier says
DRM_FORMAT_MOD_INVALID as template->bind is 1

and nvc0_miptree_select_best_modifier returns DRM_FORMAT_MOD_INVALID,
so the call just returns NULL leading to the assert.

Btw, this is on Xorg-1.20.8-1.fc32.aarch64 with glxgears.


So I digged a bit deeper and here is what tripps it of:

when the context gets made current, the normal framebuffer validation
and render buffer allocation is done, but we end up inside
tegra_screen_resource_create at some point with PIPE_BIND_SCANOUT set
in template->bind. Now the tegra driver forces the
DRM_FORMAT_MOD_LINEAR modifier and calls into
resource_create_with_modifiers.

If it wouldn't do that, nouveau would allocate a tiled buffer, with
that it's linear and we at some point end up with an assert about a
depth_stencil buffer being there even though it shouldn't. If I always
use DRM_FORMAT_MOD_INVALID in tegra_screen_resource_create, things
just work.

That's kind of the cause I pinpointed the issue down to. But I have no
idea what's supposed to happen and what the actual bug is.

Yeah, the bug with tegra has always been "trying to render to linear
color + tiled depth", which the hardware plain doesn't support. (And
linear depth isn't a thing.)

Question is whether what it's doing necessary. PIPE_BIND_SCANOUT
(/linear) requirements are needed for DRI2 to work (well, maybe not in
theory, but at least in practice the nouveau ddx expects linear
buffers). However tegra operates on a more DRI3-like basis, so with
"client" allocations, tiled should work OK as long as there's
something in tegra to copy it to linear when necessary?

I can confirm the above: Our hardware can't render to linear depth
buffers, nor can it mix linear color buffers with block linear depth
buffers.

I think there's a misunderstanding on expected behavior of
resource_create_with_modifiers() here too:
tegra_screen_resource_create() is passing DRM_FORMAT_MOD_INVALID as the
only modifier in non-scanout cases.  Previously, I believe nouveau may
have treated that as "no modifiers specified.  Fall back to internal
layout selection logic", but in my patches I "fixed" it to match other
drivers' behavior, in that allocation will fail if that is the only
modifier in the list, since it is equivalent to passing in a list
containing only unsupported modifiers.  To get fallback behavior,
tegra_screen_resource_create() should pass in (NULL, 0) for (modifiers,
count), or just call resource_create() on the underlying screen instead.

...and in merging my code with Alyssa's new panfrost format modifier support, I see panfrost does the opposite of this and treats a format modifier list of only INVALID as "don't care". I modeled the new nouveau behavior on the Iris driver. Now I'm not sure which is correct :-(

Thanks,
-James

Beyond that, I can only offer my thoughts based on analysis of the code
referenced here so far:

While I've learned from the origins of this thread applications/things
external to Mesa in general shouldn't be querying format modifiers of
buffers created without format modifiers, tegra is a Mesa internal
component that already has some intimate knowledge of how the nouveau
driver it sits on top of works.  Nouveau will always be able to
construct and return a valid format modifier for unorm single sampled
color buffers (and hopefully, anything that can scan out going forward),
both before and after my patches I believe, regardless of how they were
allocated.  After my patches, it should even work for things that can't
scan out in theory.  Hence, looking at this without knowledge of what
motivated the original changes, it seems like
tegra_screen_resource_create should just naively forward the
resource_create() call, relying on nouveau to select a layout and
provide a valid modifier when queried for import.  As Karol notes, this
works fine for at least this simple test case, and it's what nouveau
itself would be doing with an equivalent callstack, excepting the
modifier query, so I find it hard to believe it breaks some application
behavior.  It'll also end up being equivalent (in end result, not quite
semantically) to what dri3_alloc_render_buffer() was doing prior to the
patch mentioned that broke things for Karol, so certainly for the DRI3
usage it's the right behavior.

Ilia, what in the nouveau DDX (As in Xfree86 DDX?) assumes linear
buffers?  It sounds like you don't think it will interact poorly with
this path regardless?  Thierry, do you recall what motivated the
force-linear code here?

As to why this works for Thierry and not Karol, that's confusing.  Are
you both using the same X11 DDX (modesetting I assume?) and X server
versions?  Could it be a difference in client-side DRI library code somehow?


it's X. 1.20.99.1 works, 1.20.8 is broken.

Thanks,
-James

    -ilia



_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Reply via email to