Hi,

On 11-08-15 14:21, Ilia Mirkin wrote:
On Mon, Aug 10, 2015 at 8:47 AM, Hans de Goede <[email protected]> wrote:
Hi,


On 03-08-15 20:09, Ilia Mirkin wrote:

On Mon, Aug 3, 2015 at 1:31 PM, Hans de Goede <[email protected]> wrote:

Hi,


On 03-08-15 17:36, Ilia Mirkin wrote:


On Mon, Aug 3, 2015 at 9:02 AM, Hans de Goede <[email protected]>
wrote:


Hi,

On 30-07-15 16:09, Ilia Mirkin wrote:



FWIW this is a fail on nv50+ as well. See for example
https://bugs.freedesktop.org/show_bug.cgi?id=91445

My suspicion is that this is due to the lack of PUSH_KICK in the *Done
exa handlers -- works fine with DRI2, but DRI3 has no synchronization
and so the commands never get flushed out. Easily verified by sticking
PUSH_KICK's everywhere.




I do not believe that that is the problem, in my case it clearly
seems to be a pitch / swizzle problem rather then a synchronizarion
problem, here is what my desktop with gnome shell looks like when
using DRI2:

https://fedorapeople.org/~jwrdegoede/nv46-gnome-shell-good.jpg

And this is what it looks like when using DRI3:

https://fedorapeople.org/~jwrdegoede/nv46-gnome-shell-bad.jpg

The DRI2 screenshot is made with Mario's 2 patches on top of
current master:

http://lists.freedesktop.org/archives/nouveau/2015-July/021740.html
http://lists.freedesktop.org/archives/nouveau/2015-July/021741.html

And then adding Option "DRI" "2" to xorg.conf.



His patches should have defaulted it to DRI 2 I think, so this is
unnecessary. In fact you should have had to say "DRI" "3" to get DRI3
with his patches.
    --



I've also tried disabling EXA using Option "AccelMethod" "none",
but that seems to also automatically disable all DRI, leading to
software rendering.

I discussed this with Ben this morning and he suggested that this
is likely a Mesa issue since with DRI3 mesa rather then the ddx
allocs the surfaces. I've tried disabling swizzling in the
mesa code by forcing nv30_miptree_create() to always take
the code path for linear textures, but that leads to the exact
same result as before that change.



Ah yes. Very different problem indeed. I actually suspect it has to do
with swizzling. Look at the white pattern of the moon -- it's all in a
line. That means that it expected some locality and instead it got
drawn all on a line. If it were merely a stride problem, I'd expect to
see strips of the moon below and offset from one another.

So... take a look at nv30_miptree_from_handle -- I wonder if it can
now receive swizzled textures where it couldn't before.



Ok, that does go in the direction I am expecting the problem to be,
but I'm afraid I'm going to need a bit more guidance, what exactly
am I looking for in that function / which "knobs" should I try to
vary / play with to maybe fix this ?


Unfortunately this is playing near (or past) the limits of my
knowledge as well. My understanding is that DRI3 passes pixmaps around
with dma-buf, aka "bo_from_handle". DRI2 uses some other mechanism
which is not that (I think it just copies stuff around).

Now on nv50+, bo's have "tile flags" (and memtype and probably other
annoyances). The tile flags indicate the specific tiling mechanism
used on that bo (i.e. do you do 32x32 tiles? 32x64? etc). Take a look
at the nouveau_bo_new() call in nv50_miptree.c -- note how it takes a
"bo config" argument. This bo config can then later be retrieved using
some other syscall.

However on nv30 there appears to not be any such thing. The
nouveau_bo_new call just passes in NULL for creating the bo, which
means that there's no way to recover the "are you swizzled"
information after-the-fact.

Presumably you should create a "nv04" bo config section in the union,


That already exists, and indeed gets set by the nouveau_allocate_surface
function from src/nv_accel_common.c from the ddx,

and just pass the single "swizzled" bit through. I'm not sure what, if
anything, is required on the kernel side for that. I don't think
there's any optionality in how the swizzling is done for pre-nv50.

Note that in the nv30_miptree logic, mt->swizzled implies that
mt->uniform_pitch = 0, but the level pitch is set "properly" (again,
see nv30_miptree_create).

Hope this sheds some light and doesn't cause you to go in the wrong
direction -- please take everything I say with a grain of salt -- I'm
often a bit off on some of the details.


Thanks this was helpful, I do feel we are getting somewhere, but I do
need a bit more help.

I've added some debug printf's to nv30_miptree.c, nv30_miptree_create
and nv30_miptree_from_handle, where the latter is only used when using
dri2 (e.g. in the working case).

Doing a diff between a log of starting gnome-shell with dri vs dri3
results in this:

--- mesa.log.dri2       2015-08-10 14:18:03.182712022 +0200
+++ mesa.log.dri3       2015-08-10 14:18:33.233336338 +0200
@@ -1,8 +1,8 @@
  nv30_miptree_create 512x32 uniform_pitch 0 usage 0 flags 0
-nv30_miptree_from_handle 1x1 uniform_pitch 1024 usage 0 flags 0
+nv30_miptree_create 1x1 uniform_pitch 0 usage 0 flags 0
  nv30_miptree_create 1x1 uniform_pitch 0 usage 0 flags 0
  nv30_miptree_create 512x32 uniform_pitch 0 usage 0 flags 0
-nv30_miptree_from_handle 1x1 uniform_pitch 1024 usage 0 flags 0
+nv30_miptree_create 1x1 uniform_pitch 0 usage 0 flags 0
  nv30_miptree_create 1x1 uniform_pitch 0 usage 0 flags 0
  nv30_miptree_create 1x1 uniform_pitch 0 usage 0 flags 0
  nv30_miptree_create 1x1 uniform_pitch 0 usage 0 flags 0
@@ -12,6 +12,7 @@
  nv30_miptree_create 256x256 uniform_pitch 0 usage 0 flags 0
  nv30_miptree_create 512x512 uniform_pitch 0 usage 0 flags 0
  nv30_miptree_create 48x48 uniform_pitch 192 usage 0 flags 0
+nv30_miptree_create 16x16 uniform_pitch 0 usage 0 flags 0
  nv30_miptree_create 24x24 uniform_pitch 128 usage 0 flags 0
  nv30_miptree_create 24x24 uniform_pitch 128 usage 0 flags 0
  nv30_miptree_create 24x24 uniform_pitch 128 usage 0 flags 0
@@ -20,29 +21,24 @@
  nv30_miptree_create 24x24 uniform_pitch 128 usage 0 flags 0
  nv30_miptree_create 24x24 uniform_pitch 128 usage 0 flags 0
  nv30_miptree_create 24x24 uniform_pitch 128 usage 0 flags 0
-nv30_miptree_create 16x16 uniform_pitch 0 usage 0 flags 0
  nv30_miptree_create 24x24 uniform_pitch 128 usage 0 flags 0
  nv30_miptree_create 16x16 uniform_pitch 0 usage 0 flags 0
  nv30_miptree_create 1920x1200 uniform_pitch 7680 usage 0 flags 0
+nv30_miptree_create 24x24 uniform_pitch 128 usage 0 flags 0
  nv30_miptree_create 48x48 uniform_pitch 192 usage 0 flags 0
  nv30_miptree_create 16x16 uniform_pitch 0 usage 0 flags 0
-nv30_miptree_create 24x24 uniform_pitch 128 usage 0 flags 0
-nv30_miptree_create 18x18 uniform_pitch 64 usage 0 flags 0
-nv30_miptree_from_handle 1920x1080 uniform_pitch 8192 usage 0 flags 0
+nv30_miptree_create 1920x1080 uniform_pitch 7680 usage 0 flags 0
  nv30_miptree_create 1920x1080 uniform_pitch 7680 usage 0 flags 0
  nv30_miptree_create 256x256 uniform_pitch 0 usage 0 flags 0
  nv30_miptree_create 256x256 uniform_pitch 0 usage 0 flags 0
  nv30_miptree_create 1x1 uniform_pitch 0 usage 0 flags 0
-nv30_miptree_from_handle 1920x1080 uniform_pitch 8192 usage 0 flags 0
+nv30_miptree_create 18x18 uniform_pitch 64 usage 0 flags 0
+nv30_miptree_create 1920x1080 uniform_pitch 7680 usage 0 flags 0
  nv30_miptree_create 48x48 uniform_pitch 192 usage 0 flags 0
-nv30_miptree_from_handle 1920x1080 uniform_pitch 8192 usage 0 flags 0
  nv30_miptree_create 16x16 uniform_pitch 0 usage 0 flags 0
-nv30_miptree_from_handle 1920x1080 uniform_pitch 8192 usage 0 flags 0
  nv30_miptree_create 1920x1080 uniform_pitch 7680 usage 0 flags 0
  nv30_miptree_create 1920x1080 uniform_pitch 7680 usage 0 flags 0
  nv30_miptree_create 1x1 uniform_pitch 0 usage 0 flags 0
  nv30_miptree_create 6x8 uniform_pitch 64 usage 0 flags 0
  nv30_miptree_create 6x8 uniform_pitch 64 usage 0 flags 0
-nv30_miptree_from_handle 1920x1080 uniform_pitch 8192 usage 0 flags 0
  nv30_miptree_create 24x24 uniform_pitch 128 usage 0 flags 0
-nv30_miptree_from_handle 1920x1080 uniform_pitch 8192 usage 0 flags 0

I've also added logging of the surface creation in the ddx, here
is one such log line:

  Surface alloc 1920x1080@32 pitch 8192, cfg.nv04.surf_pitch 8192 .surf_flags
2

What stands out to me is:

1) In the dri3 case there are no nv30_miptree_from_handle calls, these are
  replaced by nv30_miptree_create calls

I bet it becomes the DDX that consumes these now though? And perhaps
it's not ready to take swizzled textures (i.e. pixmaps)?

Correct, it turns out these are scanout buffers, see below for a fix.

2) These replacement calls use a different pitch for the 1920x... surfaces,
8192 vs 7680

Looks like it gets upgraded to a POT value (i.e. 2048 x whatever).
nv30_miptree will create an unswizzled surface for a non-POT-sized
texture, but I guess something somewhere is aligning it up in the DDX?
And *still* creating it as a linear texture? Weird.

Correct, both on the aligning and on them still being linear, maybe
the scanout buffers always need to be linear on nv3x / nv4x ?

3) The surface creation in the ddx passes in cfg.nv04.surf_... when calling
nouveau_bo_new, where as nv30_miptree_create passes in NULL

Right. That's the bit I was talking about earlier -- you need to pass
the information that it will be swizzled. Otherwise it's lost in the
nv30_miptree should probably be setting those surf_flags as well on BO
creation. Apparently there are performance implications from setting
these.

OK, things do seem to work fine with both dri2 / dri3 without the nv30_miptree
code passing these in. But if you think it is useful I can write a patch
setting these in the non-linear case, and see what happens.

4) nv30_miptree_from_handle always sets uniform_pitch in the nv30_miptree,
iow
it always assumes non-swizzled ?

That's right.

Which works because the ddx is always creating linear buffers, right ?

The thing I would like to investigate first is 2), it seems to me that
nv30_miptree_create should not be doing pitch alignment itself, so if we
want to have the same pitch for these surfaces as in the dri2 case, we
need to fix this in the caller. Which leads to the question where is
this code being called from in the dri3 case. And this is where I'm
stuck and could use your or Benjamin's input. So do you know where the
caller is and/or how to figure out where the caller is?

Well, a quick look at the DDX, looks like it never swizzles its
textures. See NV40EXAPictTexture for example -- it always puts
TEX_FORMAT_LINEAR in. Why, you ask? No clue, that seems incredibly
inefficient and silly.

Instead of doing that, you could "guess" whether it's swizzled or not
based on whether the w/h are POT. If both are, then don't put that
linear thing in. It's obviously possible to create POT-sized
non-swizzled textures, but you can see the nv30_miptree logic for when
it does and when it doesn't. I'm only suggesting this for debugging,
not for "real". The real solution to this will be to include the
swizzled-ness in the bo_info ioctl, i.e. stick it somewhere in the
bo_config and have the kernel hold on to it.

This assumes that it is useful / possible to use swizzled buffers
in combination with the scanout engine on nv3x/nv4x ...

Also given the age of the hardware I wonder whether it is worth the
trouble to do this.

###

Thanks for all the info, I've been working on this myself today, and
I've come up with the following fix for the immediate problem at
hand:

--- a/src/gallium/drivers/nouveau/nv30/nv30_miptree.c
+++ b/src/gallium/drivers/nouveau/nv30/nv30_miptree.c
@@ -28,6 +28,7 @@
 #include "util/u_surface.h"

 #include "nv_m2mf.xml.h"
+#include "nv_object.xml.h"
 #include "nv30/nv30_screen.h"
 #include "nv30/nv30_context.h"
 #include "nv30/nv30_resource.h"
@@ -362,6 +363,7 @@ nv30_miptree_create(struct pipe_screen *pscreen,
    blocksz = util_format_get_blocksize(pt->format);

    if ((pt->target == PIPE_TEXTURE_RECT) ||
+       (pt->bind & PIPE_BIND_SCANOUT) ||
        !util_is_power_of_two(pt->width0) ||
        !util_is_power_of_two(pt->height0) ||
        !util_is_power_of_two(pt->depth0) ||
@@ -369,6 +371,14 @@ nv30_miptree_create(struct pipe_screen *pscreen,
        util_format_is_float(pt->format) || mt->ms_mode) {
       mt->uniform_pitch = util_format_get_nblocksx(pt->format, w) * blocksz;
       mt->uniform_pitch = align(mt->uniform_pitch, 64);
+      if (pt->bind & PIPE_BIND_SCANOUT) {
+         struct nv30_screen *screen = nv30_screen(pscreen);
+         /* This assumes a tiled framebuffer */
+         int pitch_align = MAX2(
+               screen->eng3d->oclass >= NV40_3D_CLASS ? 1024 : 256,
+               /* round_down_pow2(mt->uniform_pitch / 4) */
+               1 << (util_last_bit(mt->uniform_pitch / 4) - 1));
+         mt->uniform_pitch = align(mt->uniform_pitch, pitch_align);
+      }
    }

    if (!mt->uniform_pitch)


What I believe is happening is that with dri3 the buffer which becomes
the buffer scanned-out by the hardware is allocated by a dri3 client,
rather then by the x-server/ddx, but the actual video-mode set is still
done by the ddx, so these scanout buffers must have a layout matching
the one expected by the video-mode as setup by the ddx.

This patch seems to fix the immediate problem at hand, it has one problem
though, that it assumes that pNv->tiled_layout in the ddx always is true,
I see no way to get to pNv->tiled_layout from the mesa code, so I think
that it may be best to fix this problem by simply always making
pNv->tiled_layout true when dri3 is active in the ddx.

Regards,

Hans
_______________________________________________
Nouveau mailing list
[email protected]
http://lists.freedesktop.org/mailman/listinfo/nouveau

Reply via email to