I guess fast-by-default. I imagine that more apps care about performance than care about the granularity of their derivatives.
After a bit more thought -- In HLSL shader model 5 there's both
ddx_coarse() and ddx_fine(), which give the shader author the choice
between roughly these options. In a *very* quick look I haven't found
anything equivalent -- but I might just be being blind.

CC'ing Ian -- any opinion? Is there any conformance issue here?

-- Chris

On Thu, Sep 12, 2013 at 8:41 PM, Chia-I Wu <olva...@gmail.com> wrote:
> On Thu, Sep 12, 2013 at 2:06 PM, Chris Forbes <chr...@ijw.co.nz> wrote:
>> Can we make this approximation conditional on an image-quality control
>> in driconf [or somewhere else]?
> Sure. What would be the default behavior?
>
>> On Thu, Sep 12, 2013 at 5:00 PM, Chia-I Wu <olva...@gmail.com> wrote:
>>> From: Chia-I Wu <o...@lunarg.com>
>>>
>>> Replicate the gradient of the top-left pixel to the other three pixels in
>>> the subspan, as how DDY is implemented. Before, different gradients were
>>> used for pixels in the top row and pixels in the bottom row.
>>>
>>> This change results in a less accurate approximation. However, it improves
>>> the performance of Xonotic with Ultra settings by 24.3879% +/- 0.832202% (at
>>> 95.0% confidence) on Haswell. No noticeable image quality difference
>>> observed.
>>>
>>> No piglit gpu.tests regressions.
>>>
>>> I failed to come up with an explanation for the performance difference. The
>>> change does not make a difference on Ivy Bridge either. If anyone has the
>>> insight, please kindly enlighten me. Performance differences may also be
>>> observed on other games that call textureGrad and dFdx.
>>>
>>> Signed-off-by: Chia-I Wu <o...@lunarg.com>
>>> ---
>>>  src/mesa/drivers/dri/i965/brw_fs_emit.cpp | 17 +++++++++++++----
>>>  1 file changed, 13 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/src/mesa/drivers/dri/i965/brw_fs_emit.cpp b/src/mesa/drivers/dri/i965/brw_fs_emit.cpp
>>> index bfb3d33..c0d24a0 100644
>>> --- a/src/mesa/drivers/dri/i965/brw_fs_emit.cpp
>>> +++ b/src/mesa/drivers/dri/i965/brw_fs_emit.cpp
>>> @@ -564,16 +564,25 @@ fs_generator::generate_tex(fs_inst *inst, struct brw_reg dst, struct brw_reg src
>>>  void
>>>  fs_generator::generate_ddx(fs_inst *inst, struct brw_reg dst, struct brw_reg src)
>>>  {
>>> +   /* approximate with ((ss0.tr - ss0.tl)x4 (ss1.tr - ss1.tl)x4) on Haswell,
>>> +    * which gives much better performance when the result is used with
>>> +    * sample_d
>>> +    */
>>> +   unsigned vstride = (brw->is_haswell) ? BRW_VERTICAL_STRIDE_4 :
>>> +                                          BRW_VERTICAL_STRIDE_2;
>>> +   unsigned width = (brw->is_haswell) ? BRW_WIDTH_4 :
>>> +                                        BRW_WIDTH_2;
>>> +
>>>     struct brw_reg src0 = brw_reg(src.file, src.nr, 1,
>>>                                   BRW_REGISTER_TYPE_F,
>>> -                                 BRW_VERTICAL_STRIDE_2,
>>> -                                 BRW_WIDTH_2,
>>> +                                 vstride,
>>> +                                 width,
>>>                                   BRW_HORIZONTAL_STRIDE_0,
>>>                                   BRW_SWIZZLE_XYZW, WRITEMASK_XYZW);
>>>     struct brw_reg src1 = brw_reg(src.file, src.nr, 0,
>>>                                   BRW_REGISTER_TYPE_F,
>>> -                                 BRW_VERTICAL_STRIDE_2,
>>> -                                 BRW_WIDTH_2,
>>> +                                 vstride,
>>> +                                 width,
>>>                                   BRW_HORIZONTAL_STRIDE_0,
>>>                                   BRW_SWIZZLE_XYZW, WRITEMASK_XYZW);
>>>     brw_ADD(p, dst, src0, negate(src1));
>>> --
>>> 1.8.3.1
>>>
>>> _______________________________________________
>>> mesa-dev mailing list
>>> mesa-dev@lists.freedesktop.org
>>> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
>
>
>
> --
> o...@lunarg.com
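
For anyone following along, the two DDX variants in the patch can be modelled on the CPU. The sketch below is not driver code. It assumes one SIMD8 register holds two 2x2 subspans laid out as tl, tr, bl, br, and that a <vstride; width, hstride> region reads channel i from subnr + (i / width) * vstride + (i % width) * hstride. The names Reg, read_region, ddx_per_row and ddx_top_left_replicated are made up for illustration.

```cpp
#include <array>
#include <cstddef>

// Hypothetical model of one SIMD8 register holding two 2x2 subspans:
// ss0.tl, ss0.tr, ss0.bl, ss0.br, ss1.tl, ss1.tr, ss1.bl, ss1.br
using Reg = std::array<float, 8>;

// Read channel i through a <vstride; width, hstride> region that starts
// at sub-register subnr (assumed addressing model, see the text above).
static float read_region(const Reg &r, std::size_t subnr, std::size_t vstride,
                         std::size_t width, std::size_t hstride, std::size_t i)
{
    return r[subnr + (i / width) * vstride + (i % width) * hstride];
}

// DDX as "region starting at element 1" minus "region starting at element 0",
// both with horizontal stride 0, mirroring src0 and src1 in the patch.
static Reg ddx(const Reg &src, std::size_t vstride, std::size_t width)
{
    Reg dst{};
    for (std::size_t i = 0; i < dst.size(); ++i) {
        const float right = read_region(src, 1, vstride, width, 0, i);
        const float left  = read_region(src, 0, vstride, width, 0, i);
        dst[i] = right - left;
    }
    return dst;
}

// Old behaviour, <2;2,0>: the top row uses (tr - tl), the bottom row (br - bl).
static Reg ddx_per_row(const Reg &src) { return ddx(src, 2, 2); }

// Patched Haswell behaviour, <4;4,0>: all four pixels of a subspan use (tr - tl).
static Reg ddx_top_left_replicated(const Reg &src) { return ddx(src, 4, 4); }
```

With src = {0, 1, 10, 14, 100, 102, 200, 208}, the per-row variant yields {1, 1, 4, 4, 2, 2, 8, 8}, while the replicated variant yields {1, 1, 1, 1, 2, 2, 2, 2}. The bottom-row gradients (4 and 8) are what the approximation throws away.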