On Thu, Feb 21, 2019 at 1:19 AM Alyssa Rosenzweig <[email protected]> wrote: > > For reasons that are still unclear (speculation included in the comment > added in this patch), the tiler? metadata has a fast path that we were > not enabling; there looks to be a possible time/memory tradeoff, but the > details remain unclear. > > Regardless, this patch improves performance dramatically. Particular > wins are for geometry-heavy scenes. For instance, glmark2-es2's > Phong-shaded bunny, rendering at fullscreen (2400x1600) via GBM, jumped > from ~20fps to hitting vsync cap at 60fps. Gains are even more obvious > when vsync is disabled, as in glmark2-es2-wayland. > > With this patch, on GLES 2.0 samples not involving FBOs, it appears > performance is converging with (and sometimes surpassing) the blob. > > Signed-off-by: Alyssa Rosenzweig <[email protected]> > --- > src/gallium/drivers/panfrost/pan_context.c | 42 +++++++++++++++++++--- > 1 file changed, 38 insertions(+), 4 deletions(-) > > diff --git a/src/gallium/drivers/panfrost/pan_context.c > b/src/gallium/drivers/panfrost/pan_context.c > index 822b5a0dfef..2996a9c1e09 100644 > --- a/src/gallium/drivers/panfrost/pan_context.c > +++ b/src/gallium/drivers/panfrost/pan_context.c > @@ -256,7 +256,28 @@ static struct bifrost_framebuffer > panfrost_emit_mfbd(struct panfrost_context *ctx) > { > struct bifrost_framebuffer framebuffer = { > - .tiler_meta = 0xf00000c600, > + /* It is not yet clear what tiler_meta means or how it's > + * calculated, but we can tell the lower 32-bits are a > + * (monotonically increasing?) function of tile count and > + * geometry complexity; I suspect it defines a memory size of > + * some kind? for the tiler. It's really unclear at the > + * moment... but to add to the confusion, the hardware is > happy > + * enough to accept a zero in this field, so we don't even > have > + * to worry about it right now.
*somewhere* the result of VS (or binning shader if VS is split in two) needs to be saved for use during the per-tile rendering. Presumably you have to give the hw a buffer of appropriate/matching size somewhere, or with enough geometry (and too small a buffer) things will overflow and go badly. I guess if you exceed the size given in .tiler_meta, then hw falls back to running VS per tile for all geometry (not just geom visible in the tile), hence big diff in perf for large tile counts and lotsa geometry. It does sound a bit strange that it would depend on tile count. I'd expect it would be a function strictly of amount of geometry (and possibly effected by whether or not gl_PointSize is written?).. and possibly amount of varyings if VS isn't split into two parts. BR, -R > + * The byte (just after the 32-bit mark) is much more > + * interesting. The higher nibble I've only ever seen as 0xF, > + * but the lower one I've seen as 0x0 or 0xF, and it's not > + * obvious what the difference is. But what -is- obvious is > + * that when the lower nibble is zero, performance is > severely > + * degraded compared to when the lower nibble is set. > + * Evidently, that nibble enables some sort of fast path, > + * perhaps relating to caching or tile flush? Regardless, at > + * this point there's no clear reason not to set it, aside > from > + * substantially increased memory requirements (of the misc_0 > + * buffer) */ > + > + .tiler_meta = ((uint64_t) 0xff << 32) | 0x0, > > .width1 = MALI_POSITIVE(ctx->pipe_framebuffer.width), > .height1 = MALI_POSITIVE(ctx->pipe_framebuffer.height), > @@ -271,10 +292,23 @@ panfrost_emit_mfbd(struct panfrost_context *ctx) > > .unknown2 = 0x1f, > > - /* Presumably corresponds to unknown_address_X of SFBD */ > + /* Corresponds to unknown_address_X of SFBD */ > .scratchpad = ctx->scratchpad.gpu, > .tiler_scratch_start = ctx->misc_0.gpu, > - .tiler_scratch_middle = ctx->misc_0.gpu + > /*ctx->misc_0.size*/40960, /* Size depends on the size of the framebuffer and > the number of vertices */ > + > + /* The constant added here is, like the lower word of > + * tiler_meta, (loosely) another product of framebuffer size > + * and geometry complexity. It must be sufficiently large for > + * the tiler_meta fast path to work; if it's too small, there > + * will be DATA_INVALID_FAULTs. Conversely, it must be less > + * than the total size of misc_0, or else there's no room. > It's > + * possible this constant configures a partition between two > + * parts of misc_0? We haven't investigated the > functionality, > + * as these buffers are internally used by the hardware > + * (presumably by the tiler) but not seemingly touched by > the driver > + */ > + > + .tiler_scratch_middle = ctx->misc_0.gpu + 0xf0000, > > .tiler_heap_start = ctx->tiler_heap.gpu, > .tiler_heap_end = ctx->tiler_heap.gpu + ctx->tiler_heap.size, > @@ -2696,7 +2730,7 @@ panfrost_setup_hardware(struct panfrost_context *ctx) > screen->driver->allocate_slab(screen, &ctx->varying_mem, 16384, > false, 0, 0, 0); > screen->driver->allocate_slab(screen, &ctx->shaders, 4096, true, > PAN_ALLOCATE_EXECUTE, 0, 0); > screen->driver->allocate_slab(screen, &ctx->tiler_heap, 32768, > false, PAN_ALLOCATE_GROWABLE, 1, 128); > - screen->driver->allocate_slab(screen, &ctx->misc_0, 128, false, > PAN_ALLOCATE_GROWABLE, 1, 128); > + screen->driver->allocate_slab(screen, &ctx->misc_0, 128*128, false, > PAN_ALLOCATE_GROWABLE, 1, 128); > > } > > -- > 2.20.1 > > _______________________________________________ > mesa-dev mailing list > [email protected] > https://lists.freedesktop.org/mailman/listinfo/mesa-dev _______________________________________________ mesa-dev mailing list [email protected] https://lists.freedesktop.org/mailman/listinfo/mesa-dev
