From: Rob Clark <[email protected]>

So, I know there were a couple concerns voiced over the idea of re-ordering rendering in a gallium shim pipe driver layer. For me, the main concern was whether the overhead of an extra layer, queueing and replaying state updates, draws, etc, would be prohibitive. So I implemented it enough that I could do some benchmarking ;-)
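To make the queue-and-replay idea concrete, here is a minimal sketch in Python. It is not the rsq_* code from this series: `RecordingBackend`, `QueueingShim`, and their `set_state`/`draw`/`flush` methods are hypothetical stand-ins for a pipe driver and the shim in front of it. The one point it tries to show is where the extra overhead comes from: each state update has to be copied when it is queued, since the caller is free to reuse its own struct before the replay happens.

```python
import copy


class RecordingBackend:
    """Hypothetical stand-in for a real pipe driver; it only logs calls."""

    def __init__(self):
        self.log = []

    def set_state(self, name, value):
        self.log.append(("set_state", name, value))

    def draw(self, count):
        self.log.append(("draw", count))


class QueueingShim:
    """Hypothetical shim layer: queues calls, replays them on flush()."""

    def __init__(self, backend):
        self.backend = backend
        self.queue = []

    def set_state(self, name, value):
        # Snapshot the state now; this copy is the per-call overhead
        # that the extra layer adds.
        snapshot = copy.deepcopy(value)
        self.queue.append((self.backend.set_state, (name, snapshot)))

    def draw(self, count):
        self.queue.append((self.backend.draw, (count,)))

    def flush(self):
        # Replay in submission order. A re-ordering layer would sort or
        # split the queue here, based on dependency tracking.
        pending, self.queue = self.queue, []
        for fn, args in pending:
            fn(*args)
        return len(pending)


if __name__ == "__main__":
    backend = RecordingBackend()
    shim = QueueingShim(backend)
    blend = {"enable": True}
    shim.set_state("blend", blend)
    blend["enable"] = False  # caller reuses its struct before the flush
    shim.draw(3)
    print(shim.flush(), backend.log)
```

Nothing reaches the backend until `flush()`, and the backend then sees the state as it was at the time of the `set_state()` call rather than the caller's later modification.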
The first 9 patches are just some general API cleanups, which I found to be convenient (since the resequencer layer is generating most of the state handling with python + mako, so the cleanups to improve consistency help minimize the state which required special handling). But regardless of the outcome of the resequencer layer, I think these patches make sense on their own. (Note: auto-generating some of the other wrapper layers might be an interesting future cleanup.. at least it should be trivial for noop ;-))

As far as overhead, I've been benchmarking (mostly glmark2 + stk + gfxbench), and in the current state (without actually having the dependency tracking implemented) it doesn't seem to cause more than a couple percent overhead. From here on out, the remaining overhead added to implement the dependency tracking and re-ordering would be the same as the additional overhead required to implement it in the driver backend. And a couple percent overhead is small compared to the expected gains for games which benefit.. ie. 8MiB for a 1080p rgb frame, avoiding copying that from tile to memory and back once or twice quickly dwarfs an extra copy of some 10's of kb of state.. and even more so for (for ex.) f32f32f32f32 intermediate buffers.

Queries are still missing, but I expect what would be required to implement them is the same as the logic that would be needed in the driver backend otherwise.

Basically, the only concern I have, compared to the approach of implementing the dependency tracking in each driver backend, is pipe_constant_buffer::user_buffer. Currently both freedreno and vc4 want non-UBO constant buffers to be emitted in cmdstream. In the adreno case, it looks like a3xx/a4xx should also support the non-user_buffer case, although in fact this appears to be broken (at least on a4xx) and I've never seen the blob driver use this. At the moment I'm doing a hack in freedreno to map the backing fd_bo and then memcpy it into cmdstream. Which is a bit silly (since it is a write-combine buffer I'm copying from). But in glmark I had trouble even measuring the overhead of this extra copy. Although possibly I need to find something to measure which emits more non-UBO constant state.

btw, if someone has some requests for benchmarks to try (provided they are available for arm/linux) I'd be happy to try some other things.

The plus side of doing this in a separate layer is that we only implement the dependency tracking and resource shadowing once, instead of both in vc4 and freedreno (and who knows, maybe someday someone gets around to writing a lima gallium driver).

Plus, I envision this to be something that mesa/st wraps the pipe_screen with if driconf tells it to, and pscreen->rsq_funcs is populated (we at least need a callback to know if a resource is still busy). This way we can turn it on for games/apps that are known to benefit, and leave it off with zero additional overhead for better written things (or rather, things written with tilers in mind).
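The bandwidth comparison above (a 1080p frame versus some 10's of kb of state) is easy to check with a few lines of Python. This is only a back-of-the-envelope sketch: it assumes 4 bytes per pixel for the "rgb" frame (which is what gives the roughly 8MiB figure), 16 bytes per pixel for f32f32f32f32, and an illustrative 32 KiB of queued state, a number picked for the example rather than measured.

```python
MIB = 1024 * 1024


def frame_bytes(width, height, bytes_per_pixel):
    """Size of one full render target in bytes."""
    return width * height * bytes_per_pixel


def round_trip_bytes(width, height, bytes_per_pixel, round_trips=1):
    """Memory traffic for resolving tiles out to memory and restoring them.

    Each round trip is one write (tile -> memory) plus one read
    (memory -> tile) of the whole frame.
    """
    return 2 * round_trips * frame_bytes(width, height, bytes_per_pixel)


if __name__ == "__main__":
    state_copy = 32 * 1024  # assumed size of the copied state, illustrative

    for name, bpp in (("rgba8", 4), ("f32f32f32f32", 16)):
        size = frame_bytes(1920, 1080, bpp)
        traffic = round_trip_bytes(1920, 1080, bpp)
        print(f"{name}: frame {size / MIB:.2f} MiB, "
              f"one avoided round trip is {traffic / state_copy:.0f}x "
              f"the state copy")
```

Under these assumptions a single avoided tile-to-memory round trip is worth several hundred copies of the queued state, and four times that again for the f32f32f32f32 case.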
Rob Clark (10):
  gallium: cleanup set_tess_state
  gallium: make shader_buffers const
  gallium: make constant_buffer const
  gallium: make image_view const
  gallium: change end_query() to return boolean
  gallium/util: add util_copy_index_buffer() helper
  gallium/util: add util_copy_shader_buffer() helper
  gallium/util: add util_copy_vertex_buffer helper
  gallium/util: make util_copy_framebuffer_state(src=NULL) work
  RFC: gallium: add resequencer driver (INCOMPLETE)

 configure.ac                                     |   1 +
 src/gallium/auxiliary/util/u_framebuffer.c       |  37 +-
 src/gallium/auxiliary/util/u_helpers.c           |  15 -
 src/gallium/auxiliary/util/u_helpers.h           |   3 -
 src/gallium/auxiliary/util/u_inlines.h           |  49 ++
 src/gallium/drivers/ddebug/dd_context.c          |  15 +-
 src/gallium/drivers/freedreno/freedreno_query.c  |   2 +-
 src/gallium/drivers/freedreno/freedreno_state.c  |  13 +-
 src/gallium/drivers/i915/i915_query.c            |   2 +-
 src/gallium/drivers/i915/i915_state.c            |   8 +-
 src/gallium/drivers/ilo/ilo_query.c              |   2 +-
 src/gallium/drivers/ilo/ilo_state.c              |  14 +-
 src/gallium/drivers/llvmpipe/lp_query.c          |   2 +-
 src/gallium/drivers/llvmpipe/lp_state_fs.c       |   2 +-
 src/gallium/drivers/llvmpipe/lp_state_vertex.c   |   6 +-
 src/gallium/drivers/noop/noop_pipe.c             |   2 +-
 src/gallium/drivers/noop/noop_state.c            |   2 +-
 src/gallium/drivers/nouveau/nv30/nv30_query.c    |   2 +-
 src/gallium/drivers/nouveau/nv30/nv30_state.c    |  13 +-
 src/gallium/drivers/nouveau/nv50/nv50_query.c    |   2 +-
 src/gallium/drivers/nouveau/nv50/nv50_state.c    |   2 +-
 src/gallium/drivers/nouveau/nvc0/nvc0_state.c    |  25 +-
 src/gallium/drivers/r300/r300_query.c            |   4 +-
 src/gallium/drivers/r300/r300_state.c            |  10 +-
 src/gallium/drivers/r600/evergreen_state.c       |   7 +-
 src/gallium/drivers/r600/r600_state_common.c     |   7 +-
 src/gallium/drivers/radeon/r600_query.c          |   2 +-
 src/gallium/drivers/radeonsi/si_descriptors.c    |  16 +-
 src/gallium/drivers/radeonsi/si_state.c          |  13 +-
 src/gallium/drivers/radeonsi/si_state.h          |   3 +-
 src/gallium/drivers/rbug/rbug_context.c          |   6 +-
 src/gallium/drivers/resequencer/.gitignore       |   2 +
 src/gallium/drivers/resequencer/Makefile.am      |  44 ++
 src/gallium/drivers/resequencer/Makefile.sources |  23 +
 src/gallium/drivers/resequencer/rsq_batch.c      | 144 +++++
 src/gallium/drivers/resequencer/rsq_batch.h      |  71 +++
 src/gallium/drivers/resequencer/rsq_context.c    | 457 ++++++++++++++++
 src/gallium/drivers/resequencer/rsq_context.h    |  84 +++
 src/gallium/drivers/resequencer/rsq_draw.c       | 230 ++++++++
 src/gallium/drivers/resequencer/rsq_draw.h       |  40 ++
 src/gallium/drivers/resequencer/rsq_fence.c      |  48 ++
 src/gallium/drivers/resequencer/rsq_fence.h      |  43 ++
 src/gallium/drivers/resequencer/rsq_public.h     |  68 +++
 src/gallium/drivers/resequencer/rsq_query.c      | 148 +++++
 src/gallium/drivers/resequencer/rsq_query.h      |  32 ++
 src/gallium/drivers/resequencer/rsq_resource.c   | 222 ++++++++
 src/gallium/drivers/resequencer/rsq_resource.h   |  60 ++
 src/gallium/drivers/resequencer/rsq_screen.c     | 186 +++++++
 src/gallium/drivers/resequencer/rsq_screen.h     |  50 ++
 src/gallium/drivers/resequencer/rsq_state.py     | 607 +++++++++++++++++++++
 .../drivers/resequencer/rsq_state_helpers.h      | 219 ++++++++
 src/gallium/drivers/resequencer/rsq_surface.c    | 107 ++++
 src/gallium/drivers/resequencer/rsq_surface.h    |  72 +++
 src/gallium/drivers/softpipe/sp_query.c          |   2 +-
 src/gallium/drivers/softpipe/sp_state_image.c    |  10 +-
 src/gallium/drivers/softpipe/sp_state_shader.c   |   2 +-
 src/gallium/drivers/softpipe/sp_state_vertex.c   |   6 +-
 src/gallium/drivers/svga/svga_pipe_constants.c   |   2 +-
 src/gallium/drivers/svga/svga_pipe_query.c       |   2 +-
 src/gallium/drivers/svga/svga_pipe_vertex.c      |   2 +-
 src/gallium/drivers/swr/swr_query.cpp            |   2 +-
 src/gallium/drivers/swr/swr_state.cpp            |   9 +-
 src/gallium/drivers/trace/tr_context.c           |  15 +-
 src/gallium/drivers/vc4/vc4_query.c              |   2 +-
 src/gallium/drivers/vc4/vc4_state.c              |  13 +-
 src/gallium/drivers/virgl/virgl_context.c        |  10 +-
 src/gallium/drivers/virgl/virgl_query.c          |   4 +-
 src/gallium/include/pipe/p_context.h             |  12 +-
 src/gallium/include/pipe/p_state.h               |   8 +
 src/mesa/state_tracker/st_atom_tess.c            |  13 +-
 70 files changed, 3148 insertions(+), 210 deletions(-)
 create mode 100644 src/gallium/drivers/resequencer/.gitignore
 create mode 100644 src/gallium/drivers/resequencer/Makefile.am
 create mode 100644 src/gallium/drivers/resequencer/Makefile.sources
 create mode 100644 src/gallium/drivers/resequencer/rsq_batch.c
 create mode 100644 src/gallium/drivers/resequencer/rsq_batch.h
 create mode 100644 src/gallium/drivers/resequencer/rsq_context.c
 create mode 100644 src/gallium/drivers/resequencer/rsq_context.h
 create mode 100644 src/gallium/drivers/resequencer/rsq_draw.c
 create mode 100644 src/gallium/drivers/resequencer/rsq_draw.h
 create mode 100644 src/gallium/drivers/resequencer/rsq_fence.c
 create mode 100644 src/gallium/drivers/resequencer/rsq_fence.h
 create mode 100644 src/gallium/drivers/resequencer/rsq_public.h
 create mode 100644 src/gallium/drivers/resequencer/rsq_query.c
 create mode 100644 src/gallium/drivers/resequencer/rsq_query.h
 create mode 100644 src/gallium/drivers/resequencer/rsq_resource.c
 create mode 100644 src/gallium/drivers/resequencer/rsq_resource.h
 create mode 100644 src/gallium/drivers/resequencer/rsq_screen.c
 create mode 100644 src/gallium/drivers/resequencer/rsq_screen.h
 create mode 100644 src/gallium/drivers/resequencer/rsq_state.py
 create mode 100644 src/gallium/drivers/resequencer/rsq_state_helpers.h
 create mode 100644 src/gallium/drivers/resequencer/rsq_surface.c
 create mode 100644 src/gallium/drivers/resequencer/rsq_surface.h

-- 
2.5.5

_______________________________________________
mesa-dev mailing list
[email protected]
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
