On 09/06/2013 05:05 AM, Chia-I Wu wrote: > On Thu, Sep 5, 2013 at 9:57 PM, Chia-I Wu <olva...@gmail.com> wrote: >> On Thu, Sep 5, 2013 at 5:12 PM, Chris Forbes <chr...@ijw.co.nz> wrote: >>> A possible explanation for the perf change is that Xonotic uses >>> anisotropic filtering at this quality level. Lowering to txl defeats >>> it. >> I had a look at that. gl_sampler->MaxAnisotropy is never greater than >> 1.0 in gen7_update_sampler_state() so there is no anisotropic >> filtering in this case. >> >> It makes sense to me that avoiding punting to SIMD8 helps the >> performance. But it is not clear to me why >10% performance change >> can still be observed when INTEL_DEBUG=no16 is specified. A >> reasonable explanation is that the image quality is degraded in some >> way, which is why I am still nervous about the change. > With INTEL_DEBUG=no16 set, the same trick hurts the performance on > Haswell by about 5%. That is, sample_d on Haswell is faster than the > one emulated with sample_l.
What is the delta if sample_d is used for just SIMD8 shaders on HSW? Even when the shader can go SIMD16, some fragments will use the SIMD8 path. > But since the trick makes SIMD16 possible, it gains 5% more fps when > INTEL_DEBUG=no16 is not set. > >> An alternative approach to avoid punting seems to emulate SIMD16 >> sample_d with two SIMD8 sample_d. It will take longer to implement >> given my familiarity with the code, and may be less performant. BUt >> that would allow things like anisotropic filtering to be honored. >> >> >>> It would be worth doing an image quality comparison before and after the >>> change. >> Yeah, that is worth doing. I will do that. >> >>> >>> -- Chris >>> >>> On Thu, Sep 5, 2013 at 8:35 PM, Chia-I Wu <olva...@gmail.com> wrote: >>>> sample_d is slower than the lowered version on gen7. For gen7, this >>>> improves >>>> Xonotic benchmark with Ultimate effects by as much as 25%: >>>> >>>> before the change: 40.06 fps >>>> after the change: 51.10 fps >>>> after the change with INTEL_DEBUG=no16: 44.46 fps >>>> >>>> As sample_d is not allowed in SIMD16 mode, I firstly thought the difference >>>> was from SIMD8 versus SIMD16. If that was the case, we would want to apply >>>> brw_lower_texture_gradients() only on fragment shaders in SIMD16 mode. >>>> >>>> But, as the numbers show, there is still 10% improvement when SIMD16 is >>>> forced >>>> off after the change. Thus textureGrad() is lowered unconditionally for >>>> now. >>>> Due to this and that I haven't tried it on Haswell, this is still RFC. >>>> >>>> No piglit regressions. >>>> >>>> Signed-off-by: Chia-I Wu <olva...@gmail.com> >>>> --- >>>> .../dri/i965/brw_lower_texture_gradients.cpp | 54 >>>> ++++++++++++++-------- >>>> 1 file changed, 36 insertions(+), 18 deletions(-) >>>> >>>> diff --git a/src/mesa/drivers/dri/i965/brw_lower_texture_gradients.cpp >>>> b/src/mesa/drivers/dri/i965/brw_lower_texture_gradients.cpp >>>> index 1589a20..f3fcb56 100644 >>>> --- a/src/mesa/drivers/dri/i965/brw_lower_texture_gradients.cpp >>>> +++ b/src/mesa/drivers/dri/i965/brw_lower_texture_gradients.cpp >>>> @@ -34,8 +34,8 @@ using namespace ir_builder; >>>> >>>> class lower_texture_grad_visitor : public ir_hierarchical_visitor { >>>> public: >>>> - lower_texture_grad_visitor(bool has_sample_d_c) >>>> - : has_sample_d_c(has_sample_d_c) >>>> + lower_texture_grad_visitor(bool has_sample_d, bool has_sample_d_c) >>>> + : has_sample_d(has_sample_d), has_sample_d_c(has_sample_d_c) >>>> { >>>> progress = false; >>>> } >>>> @@ -44,6 +44,7 @@ public: >>>> >>>> >>>> bool progress; >>>> + bool has_sample_d; >>>> bool has_sample_d_c; >>>> >>>> private: >>>> @@ -90,22 +91,33 @@ txs_type(const glsl_type *type) >>>> ir_visitor_status >>>> lower_texture_grad_visitor::visit_leave(ir_texture *ir) >>>> { >>>> - /* Only lower textureGrad with shadow samplers */ >>>> - if (ir->op != ir_txd || !ir->shadow_comparitor) >>>> + if (ir->op != ir_txd) >>>> return visit_continue; >>>> >>>> - /* Lower textureGrad() with samplerCubeShadow even if we have the >>>> sample_d_c >>>> - * message. GLSL provides gradients for the 'r' coordinate. >>>> Unfortunately: >>>> - * >>>> - * From the Ivybridge PRM, Volume 4, Part 1, sample_d message >>>> description: >>>> - * "The r coordinate contains the faceid, and the r gradients are >>>> ignored >>>> - * by hardware." >>>> - * >>>> - * We likely need to do a similar treatment for samplerCube and >>>> - * samplerCubeArray, but we have insufficient testing for that at the >>>> moment. >>>> - */ >>>> - bool need_lowering = !has_sample_d_c || >>>> - ir->sampler->type->sampler_dimensionality == GLSL_SAMPLER_DIM_CUBE; >>>> + bool need_lowering = false; >>>> + >>>> + if (ir->shadow_comparitor) { >>>> + /* Lower textureGrad() with samplerCubeShadow even if we have the >>>> + * sample_d_c message. GLSL provides gradients for the 'r' >>>> coordinate. >>>> + * Unfortunately: >>>> + * >>>> + * From the Ivybridge PRM, Volume 4, Part 1, sample_d message >>>> + * description: "The r coordinate contains the faceid, and the r >>>> + * gradients are ignored by hardware." >>>> + */ >>>> + if (ir->sampler->type->sampler_dimensionality == >>>> GLSL_SAMPLER_DIM_CUBE) >>>> + need_lowering = true; >>>> + else if (!has_sample_d_c) >>>> + need_lowering = true; >>>> + } >>>> + else { >>>> + /* We likely need to do a similar treatment for samplerCube and >>>> + * samplerCubeArray, but we have insufficient testing for that at >>>> the >>>> + * moment. >>>> + */ >>>> + if (!has_sample_d) >>>> + need_lowering = true; >>>> + } >>>> >>>> if (!need_lowering) >>>> return visit_continue; >>>> @@ -154,7 +166,9 @@ lower_texture_grad_visitor::visit_leave(ir_texture *ir) >>>> expr(ir_unop_sqrt, dot(dPdy, dPdy))); >>>> } >>>> >>>> - /* lambda_base = log2(rho). We're ignoring GL state biases for now. */ >>>> + /* lambda_base = log2(rho). It will be biased and clamped by values >>>> + * defined in SAMPLER_STATE to get the final lambda. >>>> + */ >>>> ir->op = ir_txl; >>>> ir->lod_info.lod = expr(ir_unop_log2, rho); >>>> >>>> @@ -168,8 +182,12 @@ bool >>>> brw_lower_texture_gradients(struct brw_context *brw, >>>> struct exec_list *instructions) >>>> { >>>> + /* sample_d is slower than the lowered version on gen7, and is not >>>> allowed >>>> + * in SIMD16 mode. Treating it as unsupported improves the >>>> performance. >>>> + */ >>>> + bool has_sample_d = brw->gen != 7; >>>> bool has_sample_d_c = brw->gen >= 8 || brw->is_haswell; >>>> - lower_texture_grad_visitor v(has_sample_d_c); >>>> + lower_texture_grad_visitor v(has_sample_d, has_sample_d_c); >>>> >>>> visit_list_elements(&v, instructions); >>>> >>>> -- >>>> 1.8.3.1 >>>> >>>> _______________________________________________ >>>> mesa-dev mailing list >>>> mesa-dev@lists.freedesktop.org >>>> http://lists.freedesktop.org/mailman/listinfo/mesa-dev >> >> -- >> o...@lunarg.com _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev