On 08/18/2016 04:32 AM, Michel Dänzer wrote:
> On 18/08/16 08:51 AM, Mario Kleiner wrote:
>>
>> That's what the ati-ddx/amdgpu-ddx does at the moment, as it detects the
>> mismatch in tiling flags and uses the DRI3/Present copy path instead of
>> the pageflip path. The problem is that the servers Present
>> implementation doesn't request a vsync'ed start of the copy operation [...]
>
> It waits for vblank before starting the copy.
>
Yes, a vblank event triggers present_execute in the server. But the latency from vblank event dispatch until the copy command packet hits the GPU is still far too high to avoid tearing. I tried again and couldn't find a single Intel/AMD/NVIDIA GPU here that doesn't tear, more or less badly depending on load, with DRI3/Present copy swaps. Even TearFree wouldn't be good enough for my kind of applications, as crucial timing/timestamps could still frequently be off by at least one frame.

>
>> There is this other approach from NVidia's Alex Goins for their
>> proprietary driver, whose patches landed in the X-Server 1.19 master
>> branch a couple of weeks ago. I haven't read his patches in detail yet,
>> and i so far couldn't successfully test them with the reference
>> implementation in modesetting ddx 1.19. Afaik there the display gpu
>> exports a pair of scanout friendly, page flipping compatible dmabufs (i
>> assume linear, contiguous, accessible by the display engines),
>
> FWIW, that wouldn't be possible with our "older" GPUs which can't scan
> out from GTT: A BO can be either shared with another GPU or scanout
> friendly, not both at the same time.
>

OK, good to know.

>
>> and the offload gpu imports those and renders into them. That saves
>> one extra copy, so should be somewhat more efficient.
>
> Using two shared buffers actually isn't as efficient as possible wrt
> inter-GPU bandwidth.
>

Out of interest, why? Wouldn't you have only one detiling copy, VRAM -> RAM? Or is it that switching some kind of GTT mappings between two buffers is inefficient?

>
>> Setting it up seems to be more involved and less flexible though. So far
>> i couldn't make it work here for testing. Maybe bugs, maybe mistakes on
>> my side, maybe i just have the wrong hardware for it.
>
> Yeah, my impression has been it's a rather complicated solution geared
> towards the Intel iGPU + proprietary nVidia use case.
>
>

Setting up output source/output sink is not fun, as I've now learned; it is rather clumsy and complex compared to render offload. I hope the real thing will come with some foolproof one-click setup GUI; otherwise I don't have great hopes, given the technical skill level of my users. I still haven't managed to get it working, not even with the new NVIDIA proprietary beta drivers on a real Optimus laptop.

-mario