On Wednesday 1 February 2012, Michel Dänzer <mic...@daenzer.net> wrote:
> On Mit, 2012-02-01 at 15:01 +0000, Simon Farnsworth wrote:
> > +	if (sleep_bo) {
> > +		unsigned reloc_index;
> > +		/* Create a dummy BO so that fence_finish without a timeout
> > +		 * can sleep waiting for completion */
> > +		*sleep_bo = ctx->ws->buffer_create(ctx->ws, 1, 1,
> > +						   PIPE_BIND_CUSTOM,
> > +						   RADEON_DOMAIN_GTT);
> > +		/* Add the fence as a dummy relocation. */
> > +		reloc_index = ctx->ws->cs_add_reloc(ctx->cs,
> > +						    ctx->ws->buffer_get_cs_handle(*sleep_bo),
> > +						    RADEON_USAGE_READWRITE,
> > +						    RADEON_DOMAIN_GTT);
> > +		if (reloc_index >= ctx->creloc)
> > +			ctx->creloc = reloc_index + 1;
> > +	}
>
> Is there a point in making sleep_bo optional?
>
I can't think of a reason to make it optional; I'll remove that in v2.

> > diff --git a/src/gallium/drivers/r600/r600_pipe.c b/src/gallium/drivers/r600/r600_pipe.c
> > index c38fbc5..71e31b1 100644
> > --- a/src/gallium/drivers/r600/r600_pipe.c
> > +++ b/src/gallium/drivers/r600/r600_pipe.c
> > @@ -605,6 +605,14 @@ static boolean r600_fence_finish(struct pipe_screen *pscreen,
> >  	}
> >
> >  	while (rscreen->fences.data[rfence->index] == 0) {
> > +		/* Special-case infinite timeout */
> > +		if (timeout == PIPE_TIMEOUT_INFINITE &&
> > +		    rfence->sleep_bo) {
> > +			rscreen->ws->buffer_wait(rfence->sleep_bo, RADEON_USAGE_READWRITE);
> > +			pb_reference(&rfence->sleep_bo, NULL);
> > +			continue;
> > +		}
>
> I think rfence->sleep_bo should only be unreferenced in
> r600_fence_reference() when the fence is recycled, otherwise it'll be
> leaked if r600_fence_finish() is never called for some reason.
>
I'll fix this in v2.
> If r600_fence_finish() only ever called os_time_sleep(), never
> sched_yield() (like r300_fence_finish()), would that avoid your problem
> even with a finite timeout?
>
I experimented with that - depending on the specific workload, I need the
timeout to vary; otherwise I can see the impact of the loop in terms of bad
latency behaviour (resulting in occasional dropped frames). For the
workloads I tried, I needed the sleep to vary between 1 usec (for
low-complexity workloads) and 100 usec (for high-complexity workloads).
Recompiling Mesa for each workload is obviously not an option.

I did try an adaptive spin - essentially removing the
"if (spins++ % 256) continue", and adding:

	if (spins < 40)
		os_time_sleep(1);
	else if (spins < 100)
		os_time_sleep(10);
	else
		os_time_sleep(100);

But I felt this was ugly, when the core problem is that I want to sleep
until completion, the hardware has support for sleeping until completion,
and the only reason I can't is deficiencies in the driver stack.

Fundamentally, I suspect that the reason I'm seeing pain from this and
other people aren't is that I'm comparing an AMD E-350 to an Intel Atom
D510, and I've tuned my software stack on the D510 to within an inch of
its life. My expectation is that the better GPU in the E-350 will make my
2D graphics-intensive workload (OpenGL compositing of 2D movies) perform
about as well as it did on the D510 - sleep-based waiting for fence
completion gets in the way, as the D510 has slightly more CPU power than
the E-350, and I'm not (yet) fully exploiting the E-350's GPU.

-- 
Simon Farnsworth
Software Engineer
ONELAN Limited
http://www.onelan.com/