On Tue, Sep 10, 2013 at 1:37 PM, Chia-I Wu <olva...@gmail.com> wrote: > On Tue, Sep 10, 2013 at 4:01 AM, Ian Romanick <i...@freedesktop.org> wrote: >> On 09/06/2013 05:05 AM, Chia-I Wu wrote: >>> On Thu, Sep 5, 2013 at 9:57 PM, Chia-I Wu <olva...@gmail.com> wrote: >>>> On Thu, Sep 5, 2013 at 5:12 PM, Chris Forbes <chr...@ijw.co.nz> wrote: >>>>> A possible explanation for the perf change is that Xonotic uses >>>>> anisotropic filtering at this quality level. Lowering to txl defeats >>>>> it. >>>> I had a look at that. gl_sampler->MaxAnisotropy is never greater than >>>> 1.0 in gen7_update_sampler_state() so there is no anisotropic >>>> filtering in this case. >>>> >>>> It makes sense to me that avoiding punting to SIMD8 helps the >>>> performance. But it is not clear to me why >10% performance change >>>> can still be observed when INTEL_DEBUG=no16 is specified. A >>>> reasonable explanation is that the image quality is degraded in some >>>> way, which is why I am still nervous about the change. >>> With INTEL_DEBUG=no16 set, the same trick hurts the performance on >>> Haswell by about 5%. That is, sample_d on Haswell is faster than the >>> one emulated with sample_l. >> >> What is the delta if sample_d is used for just SIMD8 shaders on HSW? >> Even when the shader can go SIMD16, some fragments will use the SIMD8 path. > brw_lower_texture_gradients applies on the IR so it is hard to > selectively apply it only for SIMD16 fs. I will see if I can work > something out here to get the numbers you need. I could clone the original IR list, run all but brw_lower_texture_gradients passes on it, and use the cloned list to generate SIMD8 code. This is to get the numbers, not for the final code.
But I sent another patch that should speed up sample_d. With it, we do not want to lower sample_d to sample_l at all. I will see how the patch goes first. > > >>> But since the trick makes SIMD16 possible, it gains 5% more fps when >>> INTEL_DEBUG=no16 is not set. >>> >>>> An alternative approach to avoid punting seems to emulate SIMD16 >>>> sample_d with two SIMD8 sample_d. It will take longer to implement >>>> given my familiarity with the code, and may be less performant. BUt >>>> that would allow things like anisotropic filtering to be honored. And we will need to do this to enable SIMD16. >>>> >>>> >>>>> It would be worth doing an image quality comparison before and after the >>>>> change. >>>> Yeah, that is worth doing. I will do that. >>>> >>>>> >>>>> -- Chris >>>>> >>>>> On Thu, Sep 5, 2013 at 8:35 PM, Chia-I Wu <olva...@gmail.com> wrote: >>>>>> sample_d is slower than the lowered version on gen7. For gen7, this >>>>>> improves >>>>>> Xonotic benchmark with Ultimate effects by as much as 25%: >>>>>> >>>>>> before the change: 40.06 fps >>>>>> after the change: 51.10 fps >>>>>> after the change with INTEL_DEBUG=no16: 44.46 fps >>>>>> >>>>>> As sample_d is not allowed in SIMD16 mode, I firstly thought the >>>>>> difference >>>>>> was from SIMD8 versus SIMD16. If that was the case, we would want to >>>>>> apply >>>>>> brw_lower_texture_gradients() only on fragment shaders in SIMD16 mode. >>>>>> >>>>>> But, as the numbers show, there is still 10% improvement when SIMD16 is >>>>>> forced >>>>>> off after the change. Thus textureGrad() is lowered unconditionally for >>>>>> now. >>>>>> Due to this and that I haven't tried it on Haswell, this is still RFC. >>>>>> >>>>>> No piglit regressions. >>>>>> >>>>>> Signed-off-by: Chia-I Wu <olva...@gmail.com> >>>>>> --- >>>>>> .../dri/i965/brw_lower_texture_gradients.cpp | 54 >>>>>> ++++++++++++++-------- >>>>>> 1 file changed, 36 insertions(+), 18 deletions(-) >>>>>> >>>>>> diff --git a/src/mesa/drivers/dri/i965/brw_lower_texture_gradients.cpp >>>>>> b/src/mesa/drivers/dri/i965/brw_lower_texture_gradients.cpp >>>>>> index 1589a20..f3fcb56 100644 >>>>>> --- a/src/mesa/drivers/dri/i965/brw_lower_texture_gradients.cpp >>>>>> +++ b/src/mesa/drivers/dri/i965/brw_lower_texture_gradients.cpp >>>>>> @@ -34,8 +34,8 @@ using namespace ir_builder; >>>>>> >>>>>> class lower_texture_grad_visitor : public ir_hierarchical_visitor { >>>>>> public: >>>>>> - lower_texture_grad_visitor(bool has_sample_d_c) >>>>>> - : has_sample_d_c(has_sample_d_c) >>>>>> + lower_texture_grad_visitor(bool has_sample_d, bool has_sample_d_c) >>>>>> + : has_sample_d(has_sample_d), has_sample_d_c(has_sample_d_c) >>>>>> { >>>>>> progress = false; >>>>>> } >>>>>> @@ -44,6 +44,7 @@ public: >>>>>> >>>>>> >>>>>> bool progress; >>>>>> + bool has_sample_d; >>>>>> bool has_sample_d_c; >>>>>> >>>>>> private: >>>>>> @@ -90,22 +91,33 @@ txs_type(const glsl_type *type) >>>>>> ir_visitor_status >>>>>> lower_texture_grad_visitor::visit_leave(ir_texture *ir) >>>>>> { >>>>>> - /* Only lower textureGrad with shadow samplers */ >>>>>> - if (ir->op != ir_txd || !ir->shadow_comparitor) >>>>>> + if (ir->op != ir_txd) >>>>>> return visit_continue; >>>>>> >>>>>> - /* Lower textureGrad() with samplerCubeShadow even if we have the >>>>>> sample_d_c >>>>>> - * message. GLSL provides gradients for the 'r' coordinate. >>>>>> Unfortunately: >>>>>> - * >>>>>> - * From the Ivybridge PRM, Volume 4, Part 1, sample_d message >>>>>> description: >>>>>> - * "The r coordinate contains the faceid, and the r gradients are >>>>>> ignored >>>>>> - * by hardware." >>>>>> - * >>>>>> - * We likely need to do a similar treatment for samplerCube and >>>>>> - * samplerCubeArray, but we have insufficient testing for that at >>>>>> the moment. >>>>>> - */ >>>>>> - bool need_lowering = !has_sample_d_c || >>>>>> - ir->sampler->type->sampler_dimensionality == >>>>>> GLSL_SAMPLER_DIM_CUBE; >>>>>> + bool need_lowering = false; >>>>>> + >>>>>> + if (ir->shadow_comparitor) { >>>>>> + /* Lower textureGrad() with samplerCubeShadow even if we have the >>>>>> + * sample_d_c message. GLSL provides gradients for the 'r' >>>>>> coordinate. >>>>>> + * Unfortunately: >>>>>> + * >>>>>> + * From the Ivybridge PRM, Volume 4, Part 1, sample_d message >>>>>> + * description: "The r coordinate contains the faceid, and the r >>>>>> + * gradients are ignored by hardware." >>>>>> + */ >>>>>> + if (ir->sampler->type->sampler_dimensionality == >>>>>> GLSL_SAMPLER_DIM_CUBE) >>>>>> + need_lowering = true; >>>>>> + else if (!has_sample_d_c) >>>>>> + need_lowering = true; >>>>>> + } >>>>>> + else { >>>>>> + /* We likely need to do a similar treatment for samplerCube and >>>>>> + * samplerCubeArray, but we have insufficient testing for that at >>>>>> the >>>>>> + * moment. >>>>>> + */ >>>>>> + if (!has_sample_d) >>>>>> + need_lowering = true; >>>>>> + } >>>>>> >>>>>> if (!need_lowering) >>>>>> return visit_continue; >>>>>> @@ -154,7 +166,9 @@ lower_texture_grad_visitor::visit_leave(ir_texture >>>>>> *ir) >>>>>> expr(ir_unop_sqrt, dot(dPdy, dPdy))); >>>>>> } >>>>>> >>>>>> - /* lambda_base = log2(rho). We're ignoring GL state biases for now. >>>>>> */ >>>>>> + /* lambda_base = log2(rho). It will be biased and clamped by values >>>>>> + * defined in SAMPLER_STATE to get the final lambda. >>>>>> + */ >>>>>> ir->op = ir_txl; >>>>>> ir->lod_info.lod = expr(ir_unop_log2, rho); >>>>>> >>>>>> @@ -168,8 +182,12 @@ bool >>>>>> brw_lower_texture_gradients(struct brw_context *brw, >>>>>> struct exec_list *instructions) >>>>>> { >>>>>> + /* sample_d is slower than the lowered version on gen7, and is not >>>>>> allowed >>>>>> + * in SIMD16 mode. Treating it as unsupported improves the >>>>>> performance. >>>>>> + */ >>>>>> + bool has_sample_d = brw->gen != 7; >>>>>> bool has_sample_d_c = brw->gen >= 8 || brw->is_haswell; >>>>>> - lower_texture_grad_visitor v(has_sample_d_c); >>>>>> + lower_texture_grad_visitor v(has_sample_d, has_sample_d_c); >>>>>> >>>>>> visit_list_elements(&v, instructions); >>>>>> >>>>>> -- >>>>>> 1.8.3.1 >>>>>> >>>>>> _______________________________________________ >>>>>> mesa-dev mailing list >>>>>> mesa-dev@lists.freedesktop.org >>>>>> http://lists.freedesktop.org/mailman/listinfo/mesa-dev >>>> >>>> -- >>>> o...@lunarg.com >> > > > > -- > o...@lunarg.com -- o...@lunarg.com _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev