On Mon, Aug 10, 2015 at 8:47 AM, Hans de Goede <hdego...@redhat.com> wrote: > Hi, > > > On 03-08-15 20:09, Ilia Mirkin wrote: >> >> On Mon, Aug 3, 2015 at 1:31 PM, Hans de Goede <hdego...@redhat.com> wrote: >>> >>> Hi, >>> >>> >>> On 03-08-15 17:36, Ilia Mirkin wrote: >>>> >>>> >>>> On Mon, Aug 3, 2015 at 9:02 AM, Hans de Goede <hdego...@redhat.com> >>>> wrote: >>>>> >>>>> >>>>> Hi, >>>>> >>>>> On 30-07-15 16:09, Ilia Mirkin wrote: >>>>>> >>>>>> >>>>>> >>>>>> FWIW this is a fail on nv50+ as well. See for example >>>>>> https://bugs.freedesktop.org/show_bug.cgi?id=91445 >>>>>> >>>>>> My suspicion is that this is due to the lack of PUSH_KICK in the *Done >>>>>> exa handlers -- works fine with DRI2, but DRI3 has no synchronization >>>>>> and so the commands never get flushed out. Easily verified by sticking >>>>>> PUSH_KICK's everywhere. >>>>> >>>>> >>>>> >>>>> >>>>> I do not believe that that is the problem, in my case it clearly >>>>> seems to be a pitch / swizzle problem rather then a synchronizarion >>>>> problem, here is what my desktop with gnome shell looks like when >>>>> using DRI2: >>>>> >>>>> https://fedorapeople.org/~jwrdegoede/nv46-gnome-shell-good.jpg >>>>> >>>>> And this is what it looks like when using DRI3: >>>>> >>>>> https://fedorapeople.org/~jwrdegoede/nv46-gnome-shell-bad.jpg >>>>> >>>>> The DRI2 screenshot is made with Mario's 2 patches on top of >>>>> current master: >>>>> >>>>> http://lists.freedesktop.org/archives/nouveau/2015-July/021740.html >>>>> http://lists.freedesktop.org/archives/nouveau/2015-July/021741.html >>>>> >>>>> And then adding Option "DRI" "2" to xorg.conf. >>>> >>>> >>>> >>>> His patches should have defaulted it to DRI 2 I think, so this is >>>> unnecessary. In fact you should have had to say "DRI" "3" to get DRI3 >>>> with his patches. >>>> -- >>>>> >>>>> >>>>> >>>>> I've also tried disabling EXA using Option "AccelMethod" "none", >>>>> but that seems to also automatically disable all DRI, leading to >>>>> software rendering. >>>>> >>>>> I discussed this with Ben this morning and he suggested that this >>>>> is likely a Mesa issue since with DRI3 mesa rather then the ddx >>>>> allocs the surfaces. I've tried disabling swizzling in the >>>>> mesa code by forcing nv30_miptree_create() to always take >>>>> the code path for linear textures, but that leads to the exact >>>>> same result as before that change. >>>> >>>> >>>> >>>> Ah yes. Very different problem indeed. I actually suspect it has to do >>>> with swizzling. Look at the white pattern of the moon -- it's all in a >>>> line. That means that it expected some locality and instead it got >>>> drawn all on a line. If it were merely a stride problem, I'd expect to >>>> see strips of the moon below and offset from one another. >>>> >>>> So... take a look at nv30_miptree_from_handle -- I wonder if it can >>>> now receive swizzled textures where it couldn't before. >>> >>> >>> >>> Ok, that does go in the direction I am expecting the problem to be, >>> but I'm afraid I'm going to need a bit more guidance, what exactly >>> am I looking for in that function / which "knobs" should I try to >>> vary / play with to maybe fix this ? >> >> >> Unfortunately this is playing near (or past) the limits of my >> knowledge as well. My understanding is that DRI3 passes pixmaps around >> with dma-buf, aka "bo_from_handle". DRI2 uses some other mechanism >> which is not that (I think it just copies stuff around). >> >> Now on nv50+, bo's have "tile flags" (and memtype and probably other >> annoyances). The tile flags indicate the specific tiling mechanism >> used on that bo (i.e. do you do 32x32 tiles? 32x64? etc). Take a look >> at the nouveau_bo_new() call in nv50_miptree.c -- note how it takes a >> "bo config" argument. This bo config can then later be retrieved using >> some other syscall. >> >> However on nv30 there appears to not be any such thing. The >> nouveau_bo_new call just passes in NULL for creating the bo, which >> means that there's no way to recover the "are you swizzled" >> information after-the-fact. >> >> Presumably you should create a "nv04" bo config section in the union, > > > That already exists, and indeed gets set by the nouveau_allocate_surface > function from src/nv_accel_common.c from the ddx, > >> and just pass the single "swizzled" bit through. I'm not sure what, if >> anything, is required on the kernel side for that. I don't think >> there's any optionality in how the swizzling is done for pre-nv50. >> >> Note that in the nv30_miptree logic, mt->swizzled implies that >> mt->uniform_pitch = 0, but the level pitch is set "properly" (again, >> see nv30_miptree_create). >> >> Hope this sheds some light and doesn't cause you to go in the wrong >> direction -- please take everything I say with a grain of salt -- I'm >> often a bit off on some of the details. > > > Thanks this was helpful, I do feel we are getting somewhere, but I do > need a bit more help. > > I've added some debug printf's to nv30_miptree.c, nv30_miptree_create > and nv30_miptree_from_handle, where the latter is only used when using > dri2 (e.g. in the working case). > > Doing a diff between a log of starting gnome-shell with dri vs dri3 > results in this: > > --- mesa.log.dri2 2015-08-10 14:18:03.182712022 +0200 > +++ mesa.log.dri3 2015-08-10 14:18:33.233336338 +0200 > @@ -1,8 +1,8 @@ > nv30_miptree_create 512x32 uniform_pitch 0 usage 0 flags 0 > -nv30_miptree_from_handle 1x1 uniform_pitch 1024 usage 0 flags 0 > +nv30_miptree_create 1x1 uniform_pitch 0 usage 0 flags 0 > nv30_miptree_create 1x1 uniform_pitch 0 usage 0 flags 0 > nv30_miptree_create 512x32 uniform_pitch 0 usage 0 flags 0 > -nv30_miptree_from_handle 1x1 uniform_pitch 1024 usage 0 flags 0 > +nv30_miptree_create 1x1 uniform_pitch 0 usage 0 flags 0 > nv30_miptree_create 1x1 uniform_pitch 0 usage 0 flags 0 > nv30_miptree_create 1x1 uniform_pitch 0 usage 0 flags 0 > nv30_miptree_create 1x1 uniform_pitch 0 usage 0 flags 0 > @@ -12,6 +12,7 @@ > nv30_miptree_create 256x256 uniform_pitch 0 usage 0 flags 0 > nv30_miptree_create 512x512 uniform_pitch 0 usage 0 flags 0 > nv30_miptree_create 48x48 uniform_pitch 192 usage 0 flags 0 > +nv30_miptree_create 16x16 uniform_pitch 0 usage 0 flags 0 > nv30_miptree_create 24x24 uniform_pitch 128 usage 0 flags 0 > nv30_miptree_create 24x24 uniform_pitch 128 usage 0 flags 0 > nv30_miptree_create 24x24 uniform_pitch 128 usage 0 flags 0 > @@ -20,29 +21,24 @@ > nv30_miptree_create 24x24 uniform_pitch 128 usage 0 flags 0 > nv30_miptree_create 24x24 uniform_pitch 128 usage 0 flags 0 > nv30_miptree_create 24x24 uniform_pitch 128 usage 0 flags 0 > -nv30_miptree_create 16x16 uniform_pitch 0 usage 0 flags 0 > nv30_miptree_create 24x24 uniform_pitch 128 usage 0 flags 0 > nv30_miptree_create 16x16 uniform_pitch 0 usage 0 flags 0 > nv30_miptree_create 1920x1200 uniform_pitch 7680 usage 0 flags 0 > +nv30_miptree_create 24x24 uniform_pitch 128 usage 0 flags 0 > nv30_miptree_create 48x48 uniform_pitch 192 usage 0 flags 0 > nv30_miptree_create 16x16 uniform_pitch 0 usage 0 flags 0 > -nv30_miptree_create 24x24 uniform_pitch 128 usage 0 flags 0 > -nv30_miptree_create 18x18 uniform_pitch 64 usage 0 flags 0 > -nv30_miptree_from_handle 1920x1080 uniform_pitch 8192 usage 0 flags 0 > +nv30_miptree_create 1920x1080 uniform_pitch 7680 usage 0 flags 0 > nv30_miptree_create 1920x1080 uniform_pitch 7680 usage 0 flags 0 > nv30_miptree_create 256x256 uniform_pitch 0 usage 0 flags 0 > nv30_miptree_create 256x256 uniform_pitch 0 usage 0 flags 0 > nv30_miptree_create 1x1 uniform_pitch 0 usage 0 flags 0 > -nv30_miptree_from_handle 1920x1080 uniform_pitch 8192 usage 0 flags 0 > +nv30_miptree_create 18x18 uniform_pitch 64 usage 0 flags 0 > +nv30_miptree_create 1920x1080 uniform_pitch 7680 usage 0 flags 0 > nv30_miptree_create 48x48 uniform_pitch 192 usage 0 flags 0 > -nv30_miptree_from_handle 1920x1080 uniform_pitch 8192 usage 0 flags 0 > nv30_miptree_create 16x16 uniform_pitch 0 usage 0 flags 0 > -nv30_miptree_from_handle 1920x1080 uniform_pitch 8192 usage 0 flags 0 > nv30_miptree_create 1920x1080 uniform_pitch 7680 usage 0 flags 0 > nv30_miptree_create 1920x1080 uniform_pitch 7680 usage 0 flags 0 > nv30_miptree_create 1x1 uniform_pitch 0 usage 0 flags 0 > nv30_miptree_create 6x8 uniform_pitch 64 usage 0 flags 0 > nv30_miptree_create 6x8 uniform_pitch 64 usage 0 flags 0 > -nv30_miptree_from_handle 1920x1080 uniform_pitch 8192 usage 0 flags 0 > nv30_miptree_create 24x24 uniform_pitch 128 usage 0 flags 0 > -nv30_miptree_from_handle 1920x1080 uniform_pitch 8192 usage 0 flags 0 > > I've also added logging of the surface creation in the ddx, here > is one such log line: > > Surface alloc 1920x1080@32 pitch 8192, cfg.nv04.surf_pitch 8192 .surf_flags > 2 > > What stands out to me is: > > 1) In the dri3 case there are no nv30_miptree_from_handle calls, these are > replaced by nv30_miptree_create calls
I bet it becomes the DDX that consumes these now though? And perhaps it's not ready to take swizzled textures (i.e. pixmaps)? > > 2) These replacement calls use a different pitch for the 1920x... surfaces, > 8192 vs 7680 Looks like it gets upgraded to a POT value (i.e. 2048 x whatever). nv30_miptree will create an unswizzled surface for a non-POT-sized texture, but I guess something somewhere is aligning it up in the DDX? And *still* creating it as a linear texture? Weird. > > 3) The surface creation in the ddx passes in cfg.nv04.surf_... when calling > nouveau_bo_new, where as nv30_miptree_create passes in NULL Right. That's the bit I was talking about earlier -- you need to pass the information that it will be swizzled. Otherwise it's lost in the nv30_miptree should probably be setting those surf_flags as well on BO creation. Apparently there are performance implications from setting these. > > 4) nv30_miptree_from_handle always sets uniform_pitch in the nv30_miptree, > iow > it always assumes non-swizzled ? That's right. > > The thing I would like to investigate first is 2), it seems to me that > nv30_miptree_create should not be doing pitch alignment itself, so if we > want to have the same pitch for these surfaces as in the dri2 case, we > need to fix this in the caller. Which leads to the question where is > this code being called from in the dri3 case. And this is where I'm > stuck and could use your or Benjamin's input. So do you know where the > caller is and/or how to figure out where the caller is? Well, a quick look at the DDX, looks like it never swizzles its textures. See NV40EXAPictTexture for example -- it always puts TEX_FORMAT_LINEAR in. Why, you ask? No clue, that seems incredibly inefficient and silly. Instead of doing that, you could "guess" whether it's swizzled or not based on whether the w/h are POT. If both are, then don't put that linear thing in. It's obviously possible to create POT-sized non-swizzled textures, but you can see the nv30_miptree logic for when it does and when it doesn't. I'm only suggesting this for debugging, not for "real". The real solution to this will be to include the swizzled-ness in the bo_info ioctl, i.e. stick it somewhere in the bo_config and have the kernel hold on to it. -ilia _______________________________________________ Nouveau mailing list Nouveau@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/nouveau