[Mesa-dev] [PATCH] egl/android: config id increase one by one

2017-01-11 Thread Liu Zhiquan
When dri2_add_config is called, the driver config may be added to an
existing dri2_conf; the config ID should not increase in this case.
In the code, when ConfigID equals config_count + 1 a new config was
created, so config_count is incremented. Otherwise it is an existing config.

Signed-off-by: Liu Zhiquan 
Signed-off-by: Long, Zhifang 
---
 src/egl/drivers/dri2/platform_android.c | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/src/egl/drivers/dri2/platform_android.c 
b/src/egl/drivers/dri2/platform_android.c
index 1c880f9..5bf6fd5 100644
--- a/src/egl/drivers/dri2/platform_android.c
+++ b/src/egl/drivers/dri2/platform_android.c
@@ -950,9 +950,9 @@ droid_add_configs_for_visuals(_EGLDriver *drv, _EGLDisplay 
*dpy)
  EGL_NONE
};
unsigned int format_count[ARRAY_SIZE(visuals)] = { 0 };
-   int count, i, j;
+   int config_count, i, j;
 
-   count = 0;
+   config_count = 0;
for (i = 0; dri2_dpy->driver_configs[i]; i++) {
   const EGLint surface_type = EGL_WINDOW_BIT | EGL_PBUFFER_BIT;
   struct dri2_egl_config *dri2_conf;
@@ -962,9 +962,10 @@ droid_add_configs_for_visuals(_EGLDriver *drv, _EGLDisplay 
*dpy)
  config_attrs[3] = visuals[j].format;
 
  dri2_conf = dri2_add_config(dpy, dri2_dpy->driver_configs[i],
-   count + 1, surface_type, config_attrs, visuals[j].rgba_masks);
+   config_count + 1, surface_type, config_attrs, visuals[j].rgba_masks);
  if (dri2_conf) {
-count++;
+if (dri2_conf->base.ConfigID == (config_count + 1))
+   config_count++;
 format_count[j]++;
  }
   }
@@ -977,7 +978,7 @@ droid_add_configs_for_visuals(_EGLDriver *drv, _EGLDisplay 
*dpy)
   }
}
 
-   return (count != 0);
+   return (config_count != 0);
 }
 
 static int
-- 
1.9.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
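
The counting rule from the config id patch above can be tried in isolation.
What follows is only a minimal sketch, not the Mesa code: struct fake_config
and fake_add_config() are hypothetical stand-ins for dri2_egl_config and
dri2_add_config(), and merging on an equal "key" is an assumption made purely
for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct dri2_egl_config. */
struct fake_config {
   int config_id;
   int key;        /* configs with equal keys are merged into one entry */
};

#define MAX_CONFIGS 16
static struct fake_config table[MAX_CONFIGS];
static int table_len;

/* Hypothetical stand-in for dri2_add_config(): returns the existing entry
 * when one with the same key is present, otherwise creates a new entry
 * that takes the proposed id. Returns NULL when the table is full. */
static struct fake_config *
fake_add_config(int key, int proposed_id)
{
   for (int i = 0; i < table_len; i++) {
      if (table[i].key == key)
         return &table[i];
   }
   if (table_len == MAX_CONFIGS)
      return NULL;
   table[table_len].config_id = proposed_id;
   table[table_len].key = key;
   return &table[table_len++];
}

/* Count configs the way the patch does: bump the counter only when the
 * returned config actually took the proposed id. */
static int
count_new_configs(const int *keys, size_t n)
{
   int config_count = 0;

   table_len = 0;
   for (size_t i = 0; i < n; i++) {
      struct fake_config *conf = fake_add_config(keys[i], config_count + 1);
      if (conf && conf->config_id == config_count + 1)
         config_count++;
   }
   return config_count;
}
```

With an unconditional increment, every merged config would leave a gap in the
id sequence; with the check, ids stay contiguous.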


[Mesa-dev] [PATCH 1/2] Revert "egl: stop claiming support for pbuffer + msaa"

2017-01-11 Thread Liu Zhiquan
This reverts commit 4d6d55deef291b489af4d7870c6f5eb223c8da5d.

The underlying problem is that SurfaceType gained EGL_PBUFFER_BIT even when
dri_single_config is null. That is fixed by the "egl: correct surface_type
when add config" patch.
---
 src/egl/drivers/dri2/egl_dri2.c | 9 -
 1 file changed, 9 deletions(-)

diff --git a/src/egl/drivers/dri2/egl_dri2.c b/src/egl/drivers/dri2/egl_dri2.c
index 52fbdff..ac231d0 100644
--- a/src/egl/drivers/dri2/egl_dri2.c
+++ b/src/egl/drivers/dri2/egl_dri2.c
@@ -329,15 +329,6 @@ dri2_add_config(_EGLDisplay *disp, const __DRIconfig 
*dri_config, int id,
   surface_type &= ~EGL_PIXMAP_BIT;
}
 
-   /* No support for pbuffer + MSAA for now.
-*
-* XXX TODO: pbuffer + MSAA does not work and causes crashes.
-* See QT bugreport: https://bugreports.qt.io/browse/QTBUG-47509
-*/
-   if (base.Samples) {
-  surface_type &= ~EGL_PBUFFER_BIT;
-   }
-
conf->base.SurfaceType |= surface_type;
 
return conf;
-- 
1.9.1



[Mesa-dev] [PATCH 2/2] egl: correct surface_type when add config

2017-01-11 Thread Liu Zhiquan
When adding a config, dri_config is either double-buffered or
single-buffered. Only EGL_WINDOW_BIT should be added to surface_type for a
double-buffered dri_config, and only EGL_PBUFFER_BIT and EGL_PIXMAP_BIT for
a single-buffered dri_config. This avoids crashes when operating on a
surface type for which dri_double_config or dri_single_config is null.

Signed-off-by: Liu Zhiquan 
Signed-off-by: Long, Zhifang 
---
 src/egl/drivers/dri2/egl_dri2.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/src/egl/drivers/dri2/egl_dri2.c b/src/egl/drivers/dri2/egl_dri2.c
index ac231d0..60b24ad 100644
--- a/src/egl/drivers/dri2/egl_dri2.c
+++ b/src/egl/drivers/dri2/egl_dri2.c
@@ -287,6 +287,11 @@ dri2_add_config(_EGLDisplay *disp, const __DRIconfig 
*dri_config, int id,
   return NULL;
}
 
+   if (surface_type & (double_buffer ? EGL_WINDOW_BIT : (EGL_PBUFFER_BIT | EGL_PIXMAP_BIT)))
+  surface_type &= ~(!double_buffer ? EGL_WINDOW_BIT : (EGL_PBUFFER_BIT | EGL_PIXMAP_BIT));
+   else
+  return NULL;
+
config_id = base.ConfigID;
base.ConfigID= EGL_DONT_CARE;
base.SurfaceType = EGL_DONT_CARE;
@@ -325,10 +330,6 @@ dri2_add_config(_EGLDisplay *disp, const __DRIconfig 
*dri_config, int id,
   return NULL;
}
 
-   if (double_buffer) {
-  surface_type &= ~EGL_PIXMAP_BIT;
-   }
-
conf->base.SurfaceType |= surface_type;
 
return conf;
-- 
1.9.1
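
The filtering done by the new hunk in dri2_add_config() can be sketched as a
standalone function. This is an illustration under assumptions, not the Mesa
implementation: filter_surface_type() is a hypothetical name, the SKETCH_*
constants copy the values of the corresponding EGL bits so the snippet builds
without EGL headers, and a return value of 0 stands in for the "return NULL"
branch.

```c
#include <assert.h>
#include <stdbool.h>

/* Values mirror EGL_PBUFFER_BIT, EGL_PIXMAP_BIT and EGL_WINDOW_BIT from
 * <EGL/egl.h>; redefined here so the sketch has no EGL dependency. */
#define SKETCH_PBUFFER_BIT 0x0001
#define SKETCH_PIXMAP_BIT  0x0002
#define SKETCH_WINDOW_BIT  0x0004

/* Keep only the surface types that a config of the given buffering mode
 * can back. Returns 0 when none of the requested types is usable. */
static int
filter_surface_type(bool double_buffer, int surface_type)
{
   const int single_bits = SKETCH_PBUFFER_BIT | SKETCH_PIXMAP_BIT;
   const int allowed    = double_buffer ? SKETCH_WINDOW_BIT : single_bits;
   const int disallowed = double_buffer ? single_bits : SKETCH_WINDOW_BIT;

   if (!(surface_type & allowed))
      return 0;                      /* nothing usable: reject the config */

   return surface_type & ~disallowed;
}
```

Clearing the disallowed bits, rather than masking with the allowed ones, keeps
any unrelated bits in surface_type untouched, which matches the shape of the
patch.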



Re: [Mesa-dev] [PATCH 00/22] anv: Reduce HiZ Resolves

2017-01-11 Thread Jason Ekstrand
I'm done reviewing things for the evening.  I've got a little more looking
to do but it all looks fantastic so far.  All of my comments have been
pretty cosmetic.  I'll finish tomorrow and we can go over things in the
office if you'd like.

--Jason

On Wed, Jan 11, 2017 at 6:01 PM, Nanley Chery  wrote:

> On Wed, Jan 11, 2017 at 05:54:46PM -0800, Nanley Chery wrote:
> > In my testing, this series completely removes HiZ resolves for the
> > following Vulkan applications: Dota 2, Talos Principle, and the Sascha
> > Willems Vulkan examples and demos. This is accomplished with two major
> > changes. The first change is to transition the current HiZ resolving
> > algorithm from resolving on attachment load/store ops to resolving on
> > image layout transitions. The second change is to enable sampling from
> > HiZ on BDW+.
> >
> > There are some notable additional changes. To support performing layout
> > transitions outside of a render pass we implement the HiZ sequence in
> > BLORP which can emit depth stencil state outside of a render pass.
> >
> > Performance data was collected at different points in this series. These
> > tests were run on a SKL GT4, with a monitor resolution of 1440x900. For
> > Dota 2 and Talos Principle, the average of three fullscreen runs was
> > taken. At least one warm-up run was performed between driver builds. The
> > Talos Principle runs are omitted as no significant changes were
> > measured. No warm-up was performed for the Vulkan examples and the demo
> > resolution was the default window size on startup.
>
> Here are the results:
>
> shadowmapping (Vulkan example) - visual measurement of min-max:
> * HiZ disabled           - ~579-593
> * HiZ load/store         - ~602-655
> * HiZ layouts            - ~628-673
> * HiZ layouts + sampling - ~766-806
>
> Dota 2 demo benchmark:
> * HiZ disabled           - 46.9
> * HiZ load/store         - 43.4
> * HiZ layouts            - 51.3
> * HiZ layouts + sampling - 51.5
>
> -Nanley
>
> >
> > Nanley Chery (22):
> >   intel/blorp: Add the BDW+ optimized HZ_OP sequence to BLORP
> >   intel/blorp_blit: Handle ISL_AUX_USAGE_HIZ
> >   anv: Replace anv_image_has_hiz() with ISL_AUX_USAGE_HIZ
> >   anv: Use ::anv_attachment_state for toggling HiZ per subpass
> >   anv: Enable HiZ support for multiple subpasses
> >   intel/blorp_clear: Add gen8 HiZ clearing functions
> >   anv: Use gen8 BLORP HiZ clearing functions
> >   anv/blorp: Add a gen8 HiZ op resolve function
> >   anv: Use the gen8 BLORP HiZ resolving function
> >   anv: Delete anv's HiZ op emit function
> >   anv: Add helpers to handle depth buffer layout transitions
> >   anv: Store depth stencil layouts
> >   anv: Prepare for transitioning to the requested final layout
> >   anv: Avoid resolves incurred by fast depth clears
> >   anv: Disable HiZ for input attachments
> >   anv/image: Disable HiZ for storage images
> >   anv: Perform HiZ resolves only on layout transitions
> >   isl/surface_state: Handle ISL_AUX_USAGE_HIZ
> >   anv: Add a helper to determine sampling with HiZ
> >   anv/blorp: Don't fast depth clear samplable HiZ buffers on BDW
> >   anv: Enable sampling from HiZ
> >   anv: Avoid some resolves for samplable HiZ buffers
> >
> >  src/intel/blorp/blorp.h|  12 ++
> >  src/intel/blorp/blorp_blit.c   |   2 +
> >  src/intel/blorp/blorp_clear.c  |  80 +
> >  src/intel/blorp/blorp_genX_exec.h  |  87 ++
> >  src/intel/isl/isl_surface_state.c  |  38 ++-
> >  src/intel/vulkan/TODO  |   3 +-
> >  src/intel/vulkan/anv_blorp.c   | 100 -
> >  src/intel/vulkan/anv_genX.h|   3 -
> >  src/intel/vulkan/anv_image.c   |  46 +++-
> >  src/intel/vulkan/anv_pass.c|   8 ++
> >  src/intel/vulkan/anv_private.h |  51 +++--
> >  src/intel/vulkan/gen7_cmd_buffer.c |   7 --
> >  src/intel/vulkan/gen8_cmd_buffer.c | 224 --
> ---
> >  src/intel/vulkan/genX_cmd_buffer.c | 168 
> >  14 files changed, 548 insertions(+), 281 deletions(-)
> >
> > --
> > 2.11.0
> >
>


Re: [Mesa-dev] [PATCH 20/22] anv/blorp: Don't fast depth clear samplable HiZ buffers on BDW

2017-01-11 Thread Jason Ekstrand
On Wed, Jan 11, 2017 at 5:55 PM, Nanley Chery  wrote:

> Avoid the resolves that would be required if fast depth clears were
> allowed for such buffers.
>
> Signed-off-by: Nanley Chery 
> ---
>  src/intel/vulkan/anv_blorp.c | 9 +
>  1 file changed, 9 insertions(+)
>
> diff --git a/src/intel/vulkan/anv_blorp.c b/src/intel/vulkan/anv_blorp.c
> index 5d410f7d86..4649ffd9db 100644
> --- a/src/intel/vulkan/anv_blorp.c
> +++ b/src/intel/vulkan/anv_blorp.c
> @@ -1270,6 +1270,15 @@ anv_cmd_buffer_clear_subpass(struct anv_cmd_buffer
> *cmd_buffer)
>  * ANV_HZ_FC_VAL.
>  */
> clear_with_hiz = false;
> +} else if (gen == 8 &&
> +   anv_can_sample_with_hiz(cmd_buffer->device->info.gen,
> +   iview->image->samples)) {
> +   /* Only gen9+ supports returning ANV_HZ_FC_VAL when
> sampling a
> +* fast-cleared portion of a HiZ buffer. Testing has
> revealed
> +* that Gen8 only supports returning 0.0f. Gens prior to
> gen8 do
> +* not support this feature at all.
> +*/
> +   clear_with_hiz = false;
>

Doesn't this mean that we should use a clear value of 0 on BDW and 1 on
SKL+?  I'm confused by this comment.


>  }
>   }
>
> --
> 2.11.0
>


Re: [Mesa-dev] [PATCH] nir: switch fmul to increase chance of optimising it away

2017-01-11 Thread Timothy Arceri
On Thu, 2017-01-12 at 15:20 +1100, Timothy Arceri wrote:
> If one of the inputs to the multiplication in ffma is the result of
> an fmul there is a chance that we can reuse the result of that
> fmul in other ffma calls if we do the multiplication in the right
> order.
> 
> For example it is a fairly common pattern for shaders to do something
> similar to this:
> 
>   const float a = 0.5;
>   in vec4 b;
>   in float c;
> 
>   ...
> 
>   b.x = b.x * c;
>   b.y = b.y * c;
> 
>   ...
> 
>   b.x = b.x * a + a;
>   b.y = b.y * a + a;
> 
> So by simply detecting that constant a is part of the multiplication
> in ffma and switching it with previous fmul that updates b we end up
> with:
> 
>   ...
> 
>   c = a * c;
> 
>   ...
> 
>   b.x = b.x * c + a;
>   b.y = b.y * c + a;
> 
> shader-db results BDW:
> 
> total instructions in shared programs: 13056473 -> 13038614 (-0.14%)
> instructions in affected programs: 2433641 -> 2415782 (-0.73%)
> helped: 10114
> HURT: 330
> 
> total cycles in shared programs: 256515402 -> 256331740 (-0.07%)
> cycles in affected programs: 137476650 -> 137292988 (-0.13%)
> helped: 10802
> HURT: 3871
> 
> total spills in shared programs: 14923 -> 14675 (-1.66%)
> spills in affected programs: 7976 -> 7728 (-3.11%)
> helped: 280
> HURT: 27
> 
> total fills in shared programs: 20141 -> 19691 (-2.23%)
> fills in affected programs: 11408 -> 10958 (-3.94%)
> helped: 282
> HURT: 27
> 
> LOST:   6
> GAINED: 1
> ---
>  src/compiler/nir/nir_opt_algebraic.py |  2 ++
>  src/compiler/nir/nir_search_helpers.h | 11 +++
>  2 files changed, 13 insertions(+)
> 
> diff --git a/src/compiler/nir/nir_opt_algebraic.py
> b/src/compiler/nir/nir_opt_algebraic.py
> index b5974a7..3cddda3 100644
> --- a/src/compiler/nir/nir_opt_algebraic.py
> +++ b/src/compiler/nir/nir_opt_algebraic.py
> @@ -331,6 +331,8 @@ optimizations = [
> (('~fadd', '#a', ('fadd', b, '#c')), ('fadd', ('fadd', a, c),
> b)),
> (('iadd', '#a', ('iadd', b, '#c')), ('iadd', ('iadd', a, c), b)),
>  
> +   (('fadd', ('fmul(is_used_once)', ('fmul(is_used_once)',
> 'a(is_not_const)', 'b(is_not_const)'), '#c'), d), ('fadd', ('fmul',
> ('fmul', a, '#c'), b), d)),
> +
> # Misc. lowering
> (('fmod@32', a, b), ('fsub', a, ('fmul', b, ('ffloor', ('fdiv',
> a, b, 'options->lower_fmod32'),
> (('fmod@64', a, b), ('fsub', a, ('fmul', b, ('ffloor', ('fdiv',
> a, b, 'options->lower_fmod64'),
> diff --git a/src/compiler/nir/nir_search_helpers.h
> b/src/compiler/nir/nir_search_helpers.h
> index ddaff52..f89e417 100644
> --- a/src/compiler/nir/nir_search_helpers.h
> +++ b/src/compiler/nir/nir_search_helpers.h
> @@ -115,6 +115,17 @@ is_zero_to_one(nir_alu_instr *instr, unsigned
> src, unsigned num_components,
>  }
>  
>  static inline bool
> +is_not_const(nir_alu_instr *instr, unsigned src, unsigned
> num_components)

I'm surprised this compiled and ran. There should be an extra param
here:

 const uint8_t *swizzle

Fixed locally.

> +{
> +   nir_const_value *val = nir_src_as_const_value(instr->src[src].src);
> +
> +   if (val)
> +  return false;
> +
> +   return true;
> +}
> +
> +static inline bool
>  is_used_more_than_once(nir_alu_instr *instr)
>  {
> bool zero_if_use = list_empty(&instr->dest.dest.ssa.if_uses);


Re: [Mesa-dev] [PATCH 18/22] isl/surface_state: Handle ISL_AUX_USAGE_HIZ

2017-01-11 Thread Jason Ekstrand
On Wed, Jan 11, 2017 at 5:55 PM, Nanley Chery  wrote:

> Signed-off-by: Nanley Chery 
> ---
>  src/intel/isl/isl_surface_state.c | 38 ++
> +---
>  1 file changed, 35 insertions(+), 3 deletions(-)
>
> diff --git a/src/intel/isl/isl_surface_state.c
> b/src/intel/isl/isl_surface_state.c
> index b9093cc951..54e48eb5da 100644
> --- a/src/intel/isl/isl_surface_state.c
> +++ b/src/intel/isl/isl_surface_state.c
> @@ -498,11 +498,14 @@ isl_genX(surf_fill_state_s)(const struct isl_device
> *dev, void *state,
> assert(info->y_offset_sa % y_div == 0);
> s.XOffset = info->x_offset_sa / x_div;
> s.YOffset = info->y_offset_sa / y_div;
> -#else
> -   assert(info->x_offset_sa == 0);
> -   assert(info->y_offset_sa == 0);
>  #endif
>
> +   /* If Auxiliary Surface Mode is not AUX_NONE, this field must be zero.
> */
> +   if ((GEN_GEN == 4 && !GEN_IS_G4X) || info->aux_usage !=
> ISL_AUX_USAGE_NONE) {
> +  assert(info->x_offset_sa == 0);
> +  assert(info->y_offset_sa == 0);
>

I believe we already handle this higher up.


> +   }
> +
>  #if GEN_GEN >= 7
> if (info->aux_surf && info->aux_usage != ISL_AUX_USAGE_NONE) {
>struct isl_tile_info tile_info;
> @@ -520,6 +523,26 @@ isl_genX(surf_fill_state_s)(const struct isl_device
> *dev, void *state,
>s.AuxiliarySurfaceQPitch =
>   isl_surf_get_array_pitch_sa_rows(info->aux_surf) >> 2;
>s.AuxiliarySurfaceBaseAddress = info->aux_address;
> +
> +  if (info->aux_usage == ISL_AUX_USAGE_HIZ) {
> + /* The number of samples must be 1 */
> + assert(info->surf->samples == 1);
> +
> + /* The dimension must not be 3D */
> + assert(info->surf->dim != ISL_SURF_DIM_3D);
> +
> + /* The format must be one of the following: */
> + switch (info->view->format) {
>

How about

assert(info->view->format == ISL_FORMAT_R32_FLOAT ||
info->view->format...

Mostly cosmetic.  Doesn't really matter.
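
For illustration, the single-expression form of that format check could look
roughly like the sketch below. It is not taken from the series: enum
sketch_format and hiz_sampling_format_ok() are hypothetical names standing in
for the isl_format values listed in the patch, so that the snippet compiles on
its own.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the isl_format values named in the patch. */
enum sketch_format {
   SKETCH_FORMAT_R32_FLOAT,
   SKETCH_FORMAT_R24_UNORM_X8_TYPELESS,
   SKETCH_FORMAT_R16_UNORM,
   SKETCH_FORMAT_R8G8B8A8_UNORM,   /* example of a format that is rejected */
};

/* True for the three view formats the patch accepts for HiZ sampling. */
static bool
hiz_sampling_format_ok(enum sketch_format format)
{
   return format == SKETCH_FORMAT_R32_FLOAT ||
          format == SKETCH_FORMAT_R24_UNORM_X8_TYPELESS ||
          format == SKETCH_FORMAT_R16_UNORM;
}
```

A caller would then write assert(hiz_sampling_format_ok(format)) in place of
the switch statement; the behaviour is the same, only the shape differs.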


> + case ISL_FORMAT_R32_FLOAT:
> + case ISL_FORMAT_R24_UNORM_X8_TYPELESS:
> + case ISL_FORMAT_R16_UNORM:
> +break;
> + default:
> +assert(!"Incompatible HiZ Sampling format");
> +break;
> + }
> +  }
> +
>s.AuxiliarySurfaceMode = isl_to_gen_aux_mode[info->aux_usage];
>  #else
>assert(info->aux_usage == ISL_AUX_USAGE_MCS ||
> @@ -548,6 +571,15 @@ isl_genX(surf_fill_state_s)(const struct isl_device
> *dev, void *state,
>   s.SamplerL2BypassModeDisable = true;
>   break;
>default:
> + /* From the SKL PRM, Programming Note under Sampler Output
> Channel
> +  * Mapping:
> +  *
> +  *If a surface has an associated HiZ Auxilliary surface, the
> +  *Sampler L2 Bypass Mode Disable field in the
> RENDER_SURFACE_STATE
> +  *must be set.
> +  */
> + if (GEN_GEN >= 9 && info->aux_usage == ISL_AUX_USAGE_HIZ)
> +s.SamplerL2BypassModeDisable = true;
>   break;
>}
> }
> --
> 2.11.0
>


Re: [Mesa-dev] [PATCH 17/22] anv: Perform HiZ resolves only on layout transitions

2017-01-11 Thread Jason Ekstrand
On Wed, Jan 11, 2017 at 5:55 PM, Nanley Chery  wrote:

> This is a better mapping to the Vulkan API and improves performance in
> all tested workloads.
>
> Signed-off-by: Nanley Chery 
> ---
>  src/intel/vulkan/anv_blorp.c   | 48 ++---
>  src/intel/vulkan/genX_cmd_buffer.c | 54 ++
> +---
>  2 files changed, 46 insertions(+), 56 deletions(-)
>
> diff --git a/src/intel/vulkan/anv_blorp.c b/src/intel/vulkan/anv_blorp.c
> index 9919ac7ea0..5d410f7d86 100644
> --- a/src/intel/vulkan/anv_blorp.c
> +++ b/src/intel/vulkan/anv_blorp.c
> @@ -1579,52 +1579,8 @@ anv_gen8_hiz_op_resolve(struct anv_cmd_buffer
> *cmd_buffer,
> image->aux_usage != ISL_AUX_USAGE_HIZ)
>return;
>
> -   const struct anv_cmd_state *cmd_state = &cmd_buffer->state;
> -   const uint32_t ds = cmd_state->subpass->depth_stencil_attachment;
> -
> -   /* Section 7.4. of the Vulkan 1.0.27 spec states:
> -*
> -*   "The render area must be contained within the framebuffer
> dimensions."
> -*
> -* Therefore, the only way the extent of the render area can match
> that of
> -* the image view is if the render area offset equals (0, 0).
> -*/
> -   const bool full_surface_op =
> - cmd_state->render_area.extent.width == image->extent.width
> &&
> - cmd_state->render_area.extent.height ==
> image->extent.height;
> -   if (full_surface_op)
> -  assert(cmd_state->render_area.offset.x == 0 &&
> - cmd_state->render_area.offset.y == 0);
> -
> -   /* Check the subpass index to determine if skipping a resolve is
> allowed */
> -   const uint32_t subpass_idx = cmd_state->subpass -
> cmd_state->pass->subpasses;
> -   switch (op) {
> -   case BLORP_HIZ_OP_DEPTH_RESOLVE:
> -  if (cmd_buffer->state.pass->attachments[ds].store_op !=
> -  VK_ATTACHMENT_STORE_OP_STORE &&
> -  subpass_idx == cmd_state->pass->subpass_count - 1)
> - return;
> -  break;
> -   case BLORP_HIZ_OP_HIZ_RESOLVE:
> -  /* If the render area covers the entire surface *and* load_op is
> either
> -   * CLEAR or DONT_CARE then the previous contents of the depth buffer
> -   * will be entirely discarded.  In this case, we can skip the HiZ
> -   * resolve.
> -   *
> -   * If the render area is not the full surface, we need to do
> -   * the resolve because otherwise data outside the render area may
> get
> -   * garbled by the resolve at the end of the render pass.
> -   */
> -  if (full_surface_op &&
> -  cmd_buffer->state.pass->attachments[ds].load_op !=
> -  VK_ATTACHMENT_LOAD_OP_LOAD && subpass_idx == 0)
> - return;
> -  break;
> -   case BLORP_HIZ_OP_DEPTH_CLEAR:
> -   case BLORP_HIZ_OP_NONE:
> -  unreachable("Invalid HiZ OP");
> -   }
> -
> +   assert(op == BLORP_HIZ_OP_HIZ_RESOLVE ||
> +  op == BLORP_HIZ_OP_DEPTH_RESOLVE);
>
> struct blorp_batch batch;
> blorp_batch_init(&cmd_buffer->device->blorp, &batch, cmd_buffer, 0);
> diff --git a/src/intel/vulkan/genX_cmd_buffer.c
> b/src/intel/vulkan/genX_cmd_buffer.c
> index 1793c4df26..447baa08b2 100644
> --- a/src/intel/vulkan/genX_cmd_buffer.c
> +++ b/src/intel/vulkan/genX_cmd_buffer.c
> @@ -510,7 +510,13 @@ genX(cmd_buffer_setup_attachments)(struct
> anv_cmd_buffer *cmd_buffer,
>state->attachments[i].aux_usage,
>state->attachments[i].color_rt_state);
>   } else {
> -state->attachments[i].aux_usage = iview->image->aux_usage;
> +if (iview->image->aux_usage == ISL_AUX_USAGE_HIZ &&
> +iview->aspect_mask & VK_IMAGE_ASPECT_DEPTH_BIT) {
> +   state->attachments[i].aux_usage =
> +  layout_to_hiz_usage(att->initial_layout);
> +} else {
> +   state->attachments[i].aux_usage = ISL_AUX_USAGE_NONE;
> +}
>  state->attachments[i].input_aux_usage = ISL_AUX_USAGE_NONE;
>   }
>
> @@ -915,6 +921,13 @@ void genX(CmdPipelineBarrier)(
> for (uint32_t i = 0; i < imageMemoryBarrierCount; i++) {
>src_flags |= pImageMemoryBarriers[i].srcAccessMask;
>dst_flags |= pImageMemoryBarriers[i].dstAccessMask;
> +  ANV_FROM_HANDLE(anv_image, image, pImageMemoryBarriers[i].image);
> +  if (pImageMemoryBarriers[i].subresourceRange.aspectMask &
> +  VK_IMAGE_ASPECT_DEPTH_BIT) {
> + transition_depth_buffer(cmd_buffer, image,
> + pImageMemoryBarriers[i].oldLayout,
> + pImageMemoryBarriers[i].newLayout);
> +  }
> }
>
> enum anv_pipe_bits pipe_bits = 0;
> @@ -2297,9 +2310,16 @@ genX(cmd_buffer_set_subpass)(struct anv_cmd_buffer
> *cmd_buffer,
> const struct anv_image_view *iview =
>anv_cmd_buffer_get_depth_stencil_view(cmd_buffer);
>
> -   if (iview) {
> -  
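
The idea of resolving on layout transitions rather than on load/store ops can
be sketched as follows. The bodies of layout_to_hiz_usage() and
transition_depth_buffer() are not part of this excerpt, so everything below is
an assumption: the enums and function names are hypothetical, and the sketch
assumes HiZ is only enabled in the depth/stencil attachment-optimal layout.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of image layouts. */
enum sketch_layout {
   SKETCH_LAYOUT_GENERAL,
   SKETCH_LAYOUT_DEPTH_ATTACHMENT_OPTIMAL,
   SKETCH_LAYOUT_SHADER_READ_ONLY,
};

/* Hypothetical resolve operations. */
enum sketch_hiz_op {
   SKETCH_HIZ_OP_NONE,
   SKETCH_HIZ_OP_DEPTH_RESOLVE,   /* make the depth buffer valid from HiZ */
   SKETCH_HIZ_OP_HIZ_RESOLVE,     /* rebuild HiZ from the depth buffer */
};

/* Assumption: HiZ is only in use for the attachment-optimal layout. */
static bool
layout_uses_hiz(enum sketch_layout layout)
{
   return layout == SKETCH_LAYOUT_DEPTH_ATTACHMENT_OPTIMAL;
}

/* Pick the resolve needed when an image moves between two layouts. */
static enum sketch_hiz_op
resolve_for_transition(enum sketch_layout old_layout,
                       enum sketch_layout new_layout)
{
   const bool hiz_before = layout_uses_hiz(old_layout);
   const bool hiz_after  = layout_uses_hiz(new_layout);

   if (hiz_before && !hiz_after)
      return SKETCH_HIZ_OP_DEPTH_RESOLVE;
   if (!hiz_before && hiz_after)
      return SKETCH_HIZ_OP_HIZ_RESOLVE;
   return SKETCH_HIZ_OP_NONE;
}
```

Under these assumptions, a transition between two layouts that agree on HiZ
usage needs no resolve at all, which is where the savings would come from.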

Re: [Mesa-dev] [PATCH] nir: shuffle fmuls to allow const evaluation

2017-01-11 Thread Matt Turner
Reviewed-by: Matt Turner 


Re: [Mesa-dev] [PATCH] nir: switch fmul to increase chance of optimising it away

2017-01-11 Thread Timothy Arceri
I can change this to use a symbol e.g. '!' for non const but since we
are moving towards helpers I did it this way for now.

On Thu, 2017-01-12 at 15:20 +1100, Timothy Arceri wrote:
> If one of the inputs to the multiplication in ffma is the result of
> an fmul there is a chance that we can reuse the result of that
> fmul in other ffma calls if we do the multiplication in the right
> order.
> 
> For example it is a fairly common pattern for shaders to do something
> similar to this:
> 
>   const float a = 0.5;
>   in vec4 b;
>   in float c;
> 
>   ...
> 
>   b.x = b.x * c;
>   b.y = b.y * c;
> 
>   ...
> 
>   b.x = b.x * a + a;
>   b.y = b.y * a + a;
> 
> So by simply detecting that constant a is part of the multiplication
> in ffma and switching it with previous fmul that updates b we end up
> with:
> 
>   ...
> 
>   c = a * c;
> 
>   ...
> 
>   b.x = b.x * c + a;
>   b.y = b.y * c + a;
> 
> shader-db results BDW:
> 
> total instructions in shared programs: 13056473 -> 13038614 (-0.14%)
> instructions in affected programs: 2433641 -> 2415782 (-0.73%)
> helped: 10114
> HURT: 330
> 
> total cycles in shared programs: 256515402 -> 256331740 (-0.07%)
> cycles in affected programs: 137476650 -> 137292988 (-0.13%)
> helped: 10802
> HURT: 3871
> 
> total spills in shared programs: 14923 -> 14675 (-1.66%)
> spills in affected programs: 7976 -> 7728 (-3.11%)
> helped: 280
> HURT: 27
> 
> total fills in shared programs: 20141 -> 19691 (-2.23%)
> fills in affected programs: 11408 -> 10958 (-3.94%)
> helped: 282
> HURT: 27
> 
> LOST:   6
> GAINED: 1
> ---
>  src/compiler/nir/nir_opt_algebraic.py |  2 ++
>  src/compiler/nir/nir_search_helpers.h | 11 +++
>  2 files changed, 13 insertions(+)
> 
> diff --git a/src/compiler/nir/nir_opt_algebraic.py
> b/src/compiler/nir/nir_opt_algebraic.py
> index b5974a7..3cddda3 100644
> --- a/src/compiler/nir/nir_opt_algebraic.py
> +++ b/src/compiler/nir/nir_opt_algebraic.py
> @@ -331,6 +331,8 @@ optimizations = [
> (('~fadd', '#a', ('fadd', b, '#c')), ('fadd', ('fadd', a, c),
> b)),
> (('iadd', '#a', ('iadd', b, '#c')), ('iadd', ('iadd', a, c), b)),
>  
> +   (('fadd', ('fmul(is_used_once)', ('fmul(is_used_once)',
> 'a(is_not_const)', 'b(is_not_const)'), '#c'), d), ('fadd', ('fmul',
> ('fmul', a, '#c'), b), d)),
> +
> # Misc. lowering
> (('fmod@32', a, b), ('fsub', a, ('fmul', b, ('ffloor', ('fdiv',
> a, b, 'options->lower_fmod32'),
> (('fmod@64', a, b), ('fsub', a, ('fmul', b, ('ffloor', ('fdiv',
> a, b, 'options->lower_fmod64'),
> diff --git a/src/compiler/nir/nir_search_helpers.h
> b/src/compiler/nir/nir_search_helpers.h
> index ddaff52..f89e417 100644
> --- a/src/compiler/nir/nir_search_helpers.h
> +++ b/src/compiler/nir/nir_search_helpers.h
> @@ -115,6 +115,17 @@ is_zero_to_one(nir_alu_instr *instr, unsigned
> src, unsigned num_components,
>  }
>  
>  static inline bool
> +is_not_const(nir_alu_instr *instr, unsigned src, unsigned
> num_components)
> +{
> +   nir_const_value *val = nir_src_as_const_value(instr->src[src].src);
> +
> +   if (val)
> +  return false;
> +
> +   return true;
> +}
> +
> +static inline bool
>  is_used_more_than_once(nir_alu_instr *instr)
>  {
> bool zero_if_use = list_empty(&instr->dest.dest.ssa.if_uses);


[Mesa-dev] [PATCH] nir: switch fmul to increase chance of optimising it away

2017-01-11 Thread Timothy Arceri
If one of the inputs to the multiplication in ffma is the result of
an fmul there is a chance that we can reuse the result of that
fmul in other ffma calls if we do the multiplication in the right
order.

For example it is a fairly common pattern for shaders to do something
similar to this:

  const float a = 0.5;
  in vec4 b;
  in float c;

  ...

  b.x = b.x * c;
  b.y = b.y * c;

  ...

  b.x = b.x * a + a;
  b.y = b.y * a + a;

So by simply detecting that constant a is part of the multiplication
in ffma and switching it with previous fmul that updates b we end up
with:

  ...

  c = a * c;

  ...

  b.x = b.x * c + a;
  b.y = b.y * c + a;

shader-db results BDW:

total instructions in shared programs: 13056473 -> 13038614 (-0.14%)
instructions in affected programs: 2433641 -> 2415782 (-0.73%)
helped: 10114
HURT: 330

total cycles in shared programs: 256515402 -> 256331740 (-0.07%)
cycles in affected programs: 137476650 -> 137292988 (-0.13%)
helped: 10802
HURT: 3871

total spills in shared programs: 14923 -> 14675 (-1.66%)
spills in affected programs: 7976 -> 7728 (-3.11%)
helped: 280
HURT: 27

total fills in shared programs: 20141 -> 19691 (-2.23%)
fills in affected programs: 11408 -> 10958 (-3.94%)
helped: 282
HURT: 27

LOST:   6
GAINED: 1
---
 src/compiler/nir/nir_opt_algebraic.py |  2 ++
 src/compiler/nir/nir_search_helpers.h | 11 +++
 2 files changed, 13 insertions(+)

diff --git a/src/compiler/nir/nir_opt_algebraic.py 
b/src/compiler/nir/nir_opt_algebraic.py
index b5974a7..3cddda3 100644
--- a/src/compiler/nir/nir_opt_algebraic.py
+++ b/src/compiler/nir/nir_opt_algebraic.py
@@ -331,6 +331,8 @@ optimizations = [
(('~fadd', '#a', ('fadd', b, '#c')), ('fadd', ('fadd', a, c), b)),
(('iadd', '#a', ('iadd', b, '#c')), ('iadd', ('iadd', a, c), b)),
 
+   (('fadd', ('fmul(is_used_once)', ('fmul(is_used_once)', 'a(is_not_const)', 
'b(is_not_const)'), '#c'), d), ('fadd', ('fmul', ('fmul', a, '#c'), b), d)),
+
# Misc. lowering
(('fmod@32', a, b), ('fsub', a, ('fmul', b, ('ffloor', ('fdiv', a, b, 
'options->lower_fmod32'),
(('fmod@64', a, b), ('fsub', a, ('fmul', b, ('ffloor', ('fdiv', a, b, 
'options->lower_fmod64'),
diff --git a/src/compiler/nir/nir_search_helpers.h 
b/src/compiler/nir/nir_search_helpers.h
index ddaff52..f89e417 100644
--- a/src/compiler/nir/nir_search_helpers.h
+++ b/src/compiler/nir/nir_search_helpers.h
@@ -115,6 +115,17 @@ is_zero_to_one(nir_alu_instr *instr, unsigned src, 
unsigned num_components,
 }
 
 static inline bool
+is_not_const(nir_alu_instr *instr, unsigned src, unsigned num_components)
+{
+   nir_const_value *val = nir_src_as_const_value(instr->src[src].src);
+
+   if (val)
+  return false;
+
+   return true;
+}
+
+static inline bool
 is_used_more_than_once(nir_alu_instr *instr)
 {
bool zero_if_use = list_empty(&instr->dest.dest.ssa.if_uses);
-- 
2.9.3
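
The effect of the reordering in the commit message can be checked numerically
with a small C sketch. This is not NIR code: scale_before() and scale_after()
are hypothetical helpers that mirror the two shader shapes. Floating-point
multiplication is not associative in general, so the two forms can round
differently; the values used here are chosen so that every intermediate
result is exactly representable.

```c
#include <assert.h>

/* Original shape: b.x = b.x * c, and later b.x = b.x * a + a.
 * The constant a is multiplied in last, once per component. */
static float
scale_before(float bx, float c, float a)
{
   float t = bx * c;
   return t * a + a;
}

/* Rewritten shape: ac = a * c is computed once by the caller and shared
 * by every component, leaving a single multiply-add per component. */
static float
scale_after(float bx, float ac, float a)
{
   return bx * ac + a;
}
```

With a = 0.5 and c = 1.5, the product ac = 0.75 is computed once and reused
for b.x and b.y, which is the saving the shader-db numbers are measuring.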



Re: [Mesa-dev] [PATCH 16/22] anv/image: Disable HiZ for storage images

2017-01-11 Thread Jason Ekstrand
On Wed, Jan 11, 2017 at 5:55 PM, Nanley Chery  wrote:

> Signed-off-by: Nanley Chery 
> ---
>  src/intel/vulkan/anv_image.c | 6 ++
>  1 file changed, 6 insertions(+)
>
> diff --git a/src/intel/vulkan/anv_image.c b/src/intel/vulkan/anv_image.c
> index f8a21c2982..7d5beeabbe 100644
> --- a/src/intel/vulkan/anv_image.c
> +++ b/src/intel/vulkan/anv_image.c
> @@ -190,6 +190,12 @@ make_surface(const struct anv_device *dev,
>* input attachments.
>*/
>   anv_finishme("Implement HiZ for input attachments");
> +  } else if (image->usage & VK_IMAGE_USAGE_STORAGE_BIT) {
> + /* Storage images must be in the VK_IMAGE_LAYOUT_GENERAL layout
> for
> +  * load and store operations. For the same reasons as above,
> disable
> +  * HiZ for now.
>

I don't think you can have depth storage images.


> +  */
> + anv_finishme("Implement HiZ for storage images");
>} else if (!env_var_as_boolean("INTEL_VK_HIZ", dev->info.gen >=
> 8)) {
>   anv_finishme("Implement gen7 HiZ");
>} else if (vk_info->mipLevels > 1) {
> --
> 2.11.0
>


Re: [Mesa-dev] [PATCH 15/22] anv: Disable HiZ for input attachments

2017-01-11 Thread Jason Ekstrand
On Wed, Jan 11, 2017 at 8:05 PM, Jason Ekstrand 
wrote:

> On Wed, Jan 11, 2017 at 5:55 PM, Nanley Chery 
> wrote:
>
>> Signed-off-by: Nanley Chery 
>> ---
>>  src/intel/vulkan/anv_image.c   | 17 +
>>  src/intel/vulkan/genX_cmd_buffer.c | 10 --
>>  2 files changed, 17 insertions(+), 10 deletions(-)
>>
>> diff --git a/src/intel/vulkan/anv_image.c b/src/intel/vulkan/anv_image.c
>> index d821629191..f8a21c2982 100644
>> --- a/src/intel/vulkan/anv_image.c
>> +++ b/src/intel/vulkan/anv_image.c
>> @@ -182,6 +182,14 @@ make_surface(const struct anv_device *dev,
>> */
>>if (!(image->usage & VK_IMAGE_USAGE_DEPTH_STENCIL_ATTACHMENT_BIT))
>> {
>>   /* It will never be used as an attachment, HiZ is pointless. */
>> +  } else if (image->usage & VK_IMAGE_USAGE_INPUT_ATTACHMENT_BIT) {
>> + /* It will never have a layout of
>> +  * VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL, so HiZ is
>> +  * currently pointless. If transfer operations learn to use the
>> HiZ
>> +  * buffer, we can enable HiZ for VK_IMAGE_LAYOUT_GENERAL and
>> support
>> +  * input attachments.
>> +  */
>>
>
> From the 1.0.37 spec:
>
> "An attachment used as an input attachment and depth/stencil attachment
> must be in either VK_IMAGE_LAYOUT_GENERAL or
> VK_IMAGE_LAYOUT_DEPTH_STENCIL_READ_ONLY_OPTIMAL."
>
> So it can happen.  Since gen8 can texture from HiZ, this shouldn't be a
> problem.  I think we'll need this for gen7 though.
>
>
>> + anv_finishme("Implement HiZ for input attachments");
>>} else if (!env_var_as_boolean("INTEL_VK_HIZ", dev->info.gen >=
>> 8)) {
>>   anv_finishme("Implement gen7 HiZ");
>>} else if (vk_info->mipLevels > 1) {
>> @@ -529,14 +537,15 @@ anv_CreateImageView(VkDevice _device,
>> if (surf_usage == ISL_AUX_USAGE_HIZ)
>>surf_usage = ISL_AUX_USAGE_NONE;
>>
>> -   /* Input attachment surfaces for color or depth are allocated and
>> filled
>> +   /* Input attachment surfaces for color are allocated and filled
>>  * out at BeginRenderPass time because they need compression
>> information.
>> -* Stencil image do not support compression so we just use the texture
>> -* surface from the image view.
>> +* Compression is not yet enabled for depth textures and stencil
>> doesn't
>> +* allow compression so we can just use the texture surface state
>> from the
>> +* view.
>>  */
>> if (image->usage & VK_IMAGE_USAGE_SAMPLED_BIT ||
>> (image->usage & VK_IMAGE_USAGE_INPUT_ATTACHMENT_BIT &&
>> -(iview->aspect_mask & VK_IMAGE_ASPECT_STENCIL_BIT))) {
>> +!(iview->aspect_mask & VK_IMAGE_ASPECT_COLOR_BIT))) {
>>iview->sampler_surface_state = alloc_surface_state(device);
>>
>>struct isl_view view = iview->isl;
>> diff --git a/src/intel/vulkan/genX_cmd_buffer.c
>> b/src/intel/vulkan/genX_cmd_buffer.c
>> index baa932e517..1793c4df26 100644
>> --- a/src/intel/vulkan/genX_cmd_buffer.c
>> +++ b/src/intel/vulkan/genX_cmd_buffer.c
>> @@ -303,11 +303,11 @@ need_input_attachment_state(const struct
>> anv_render_pass_attachment *att)
>> if (!(att->usage & VK_IMAGE_USAGE_INPUT_ATTACHMENT_BIT))
>>return false;
>>
>> -   /* We only allocate input attachment states for color and depth
>> surfaces.
>> -* Stencil doesn't allow compression so we can just use the texture
>> surface
>> -* state from the view
>> +   /* We only allocate input attachment states for color surfaces.
>> Compression
>> +* is not yet enabled for depth textures and stencil doesn't allow
>> +* compression so we can just use the texture surface state from the
>> view.
>>  */
>> -   return vk_format_is_color(att->format) ||
>> vk_format_has_depth(att->format);
>> +   return vk_format_is_color(att->format);
>>  }
>>
>>  static enum isl_aux_usage
>> @@ -518,8 +518,6 @@ genX(cmd_buffer_setup_attachments)(struct
>> anv_cmd_buffer *cmd_buffer,
>>  const struct isl_surf *surf;
>>  if (att_aspects == VK_IMAGE_ASPECT_COLOR_BIT) {
>> surf = >image->color_surface.isl;
>> -} else {
>> -   surf = >image->depth_surface.isl;
>>  }
>>
>
I think we can just drop the conditional altogether.


>
>>  struct isl_view view = iview->isl;
>> --
>> 2.11.0
>>
>> ___
>> mesa-dev mailing list
>> mesa-dev@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>>
>
>


Re: [Mesa-dev] [PATCH 15/22] anv: Disable HiZ for input attachments

2017-01-11 Thread Jason Ekstrand
On Wed, Jan 11, 2017 at 5:55 PM, Nanley Chery  wrote:

> Signed-off-by: Nanley Chery 
> ---
>  src/intel/vulkan/anv_image.c   | 17 +
>  src/intel/vulkan/genX_cmd_buffer.c | 10 --
>  2 files changed, 17 insertions(+), 10 deletions(-)
>
> diff --git a/src/intel/vulkan/anv_image.c b/src/intel/vulkan/anv_image.c
> index d821629191..f8a21c2982 100644
> --- a/src/intel/vulkan/anv_image.c
> +++ b/src/intel/vulkan/anv_image.c
> @@ -182,6 +182,14 @@ make_surface(const struct anv_device *dev,
> */
>if (!(image->usage & VK_IMAGE_USAGE_DEPTH_STENCIL_ATTACHMENT_BIT))
> {
>   /* It will never be used as an attachment, HiZ is pointless. */
> +  } else if (image->usage & VK_IMAGE_USAGE_INPUT_ATTACHMENT_BIT) {
> + /* It will never have a layout of
> +  * VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL, so HiZ is
> +  * currently pointless. If transfer operations learn to use the
> HiZ
> +  * buffer, we can enable HiZ for VK_IMAGE_LAYOUT_GENERAL and
> support
> +  * input attachments.
> +  */
>

From the 1.0.37 spec:

"An attachment used as an input attachment and depth/stencil attachment must
be in either VK_IMAGE_LAYOUT_GENERAL or
VK_IMAGE_LAYOUT_DEPTH_STENCIL_READ_ONLY_OPTIMAL."

So it can happen.  Since gen8 can texture from HiZ, this shouldn't be a
problem.  I think we'll need this for gen7 though.


> + anv_finishme("Implement HiZ for input attachments");
>} else if (!env_var_as_boolean("INTEL_VK_HIZ", dev->info.gen >=
> 8)) {
>   anv_finishme("Implement gen7 HiZ");
>} else if (vk_info->mipLevels > 1) {
> @@ -529,14 +537,15 @@ anv_CreateImageView(VkDevice _device,
> if (surf_usage == ISL_AUX_USAGE_HIZ)
>surf_usage = ISL_AUX_USAGE_NONE;
>
> -   /* Input attachment surfaces for color or depth are allocated and
> filled
> +   /* Input attachment surfaces for color are allocated and filled
>  * out at BeginRenderPass time because they need compression
> information.
> -* Stencil image do not support compression so we just use the texture
> -* surface from the image view.
> +* Compression is not yet enabled for depth textures and stencil
> doesn't
> +* allow compression so we can just use the texture surface state from
> the
> +* view.
>  */
> if (image->usage & VK_IMAGE_USAGE_SAMPLED_BIT ||
> (image->usage & VK_IMAGE_USAGE_INPUT_ATTACHMENT_BIT &&
> -(iview->aspect_mask & VK_IMAGE_ASPECT_STENCIL_BIT))) {
> +!(iview->aspect_mask & VK_IMAGE_ASPECT_COLOR_BIT))) {
>iview->sampler_surface_state = alloc_surface_state(device);
>
>struct isl_view view = iview->isl;
> diff --git a/src/intel/vulkan/genX_cmd_buffer.c
> b/src/intel/vulkan/genX_cmd_buffer.c
> index baa932e517..1793c4df26 100644
> --- a/src/intel/vulkan/genX_cmd_buffer.c
> +++ b/src/intel/vulkan/genX_cmd_buffer.c
> @@ -303,11 +303,11 @@ need_input_attachment_state(const struct
> anv_render_pass_attachment *att)
> if (!(att->usage & VK_IMAGE_USAGE_INPUT_ATTACHMENT_BIT))
>return false;
>
> -   /* We only allocate input attachment states for color and depth
> surfaces.
> -* Stencil doesn't allow compression so we can just use the texture
> surface
> -* state from the view
> +   /* We only allocate input attachment states for color surfaces.
> Compression
> +* is not yet enabled for depth textures and stencil doesn't allow
> +* compression so we can just use the texture surface state from the
> view.
>  */
> -   return vk_format_is_color(att->format) || vk_format_has_depth(att->
> format);
> +   return vk_format_is_color(att->format);
>  }
>
>  static enum isl_aux_usage
> @@ -518,8 +518,6 @@ genX(cmd_buffer_setup_attachments)(struct
> anv_cmd_buffer *cmd_buffer,
>  const struct isl_surf *surf;
>  if (att_aspects == VK_IMAGE_ASPECT_COLOR_BIT) {
> surf = >image->color_surface.isl;
> -} else {
> -   surf = >image->depth_surface.isl;
>  }
>
>  struct isl_view view = iview->isl;
> --
> 2.11.0
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>


Re: [Mesa-dev] [PATCH 14/22] anv: Avoid resolves incurred by fast depth clears

2017-01-11 Thread Jason Ekstrand
On Wed, Jan 11, 2017 at 5:55 PM, Nanley Chery  wrote:

> Signed-off-by: Nanley Chery 
> ---
>  src/intel/vulkan/anv_blorp.c   |  9 +++--
>  src/intel/vulkan/anv_private.h | 15 +++
>  src/intel/vulkan/genX_cmd_buffer.c |  5 +
>  3 files changed, 23 insertions(+), 6 deletions(-)
>
> diff --git a/src/intel/vulkan/anv_blorp.c b/src/intel/vulkan/anv_blorp.c
> index 433e82f938..9919ac7ea0 100644
> --- a/src/intel/vulkan/anv_blorp.c
> +++ b/src/intel/vulkan/anv_blorp.c
> @@ -1264,6 +1264,12 @@ anv_cmd_buffer_clear_subpass(struct anv_cmd_buffer
> *cmd_buffer)
> render_area.offset.y +
> render_area.extent.height)) {
> clear_with_hiz = false;
> +} else if (clear_att.clearValue.depthStencil.depth !=
> +   ANV_HZ_FC_VAL) {
> +   /* Don't enable fast depth clears for any color not equal
> to
> +* ANV_HZ_FC_VAL.
> +*/
> +   clear_with_hiz = false;
>  }
>   }
>
> @@ -1626,8 +1632,7 @@ anv_gen8_hiz_op_resolve(struct anv_cmd_buffer
> *cmd_buffer,
> struct blorp_surf surf;
> get_blorp_surf_for_anv_image(image, VK_IMAGE_ASPECT_DEPTH_BIT,
>  ISL_AUX_USAGE_HIZ, );
> -   surf.clear_color.u32[0] = (uint32_t)
> -  cmd_state->attachments[ds].clear_value.depthStencil.depth;
> +   surf.clear_color.u32[0] = (uint32_t) ANV_HZ_FC_VAL;
>

Ugh... We should really fix this ugly corner of blorp.


>
> blorp_gen6_hiz_op(, , 0, 0, op);
> blorp_batch_finish();
> diff --git a/src/intel/vulkan/anv_private.h b/src/intel/vulkan/anv_
> private.h
> index 237308fb3e..98692b5913 100644
> --- a/src/intel/vulkan/anv_private.h
> +++ b/src/intel/vulkan/anv_private.h
> @@ -72,6 +72,21 @@ struct gen_l3_config;
>  extern "C" {
>  #endif
>
> +/* Allowing different clear colors requires us to perform a depth resolve
> at
> + * the end of certain render passes. This is because while slow clears
> store
> + * the clear color in the HiZ buffer, fast clears (without a resolve)
> don't.
> + * See the PRMs for examples describing when additional resolves would be
> + * necessary. To enable fast clears without requiring extra resolves, we
> set
> + * the clear value to a globally-defined one. We could allow different
> values
> + * if the user doesn't expect coherent data during or after a render
> pass
> + * (VK_ATTACHMENT_STORE_OP_DONT_CARE), but such users (aside from the
> CTS)
> + * don't seem to exist yet. In almost all Vulkan applications tested thus
> far,
> + * 1.0f seems to be the only value used. The only application that
> doesn't set
> + * this value does so through the usage of a seemingly uninitialized
> clear
> + * value.
> + */
> +#define ANV_HZ_FC_VAL 1.0f
> +
>  #define MAX_VBS 32
>  #define MAX_SETS 8
>  #define MAX_RTS  8
> diff --git a/src/intel/vulkan/genX_cmd_buffer.c
> b/src/intel/vulkan/genX_cmd_buffer.c
> index 95d0cfc983..baa932e517 100644
> --- a/src/intel/vulkan/genX_cmd_buffer.c
> +++ b/src/intel/vulkan/genX_cmd_buffer.c
> @@ -2283,10 +2283,7 @@ cmd_buffer_emit_depth_stencil(struct
> anv_cmd_buffer *cmd_buffer)
> anv_batch_emit(_buffer->batch, GENX(3DSTATE_CLEAR_PARAMS), cp) {
>if (has_hiz) {
>   cp.DepthClearValueValid = true;
> - const uint32_t ds =
> -cmd_buffer->state.subpass->depth_stencil_attachment;
> - cp.DepthClearValue =
> -cmd_buffer->state.attachments[ds].clear_value.depthStencil.
> depth;
> + cp.DepthClearValue = ANV_HZ_FC_VAL;
>}
> }
>  }
> --
> 2.11.0
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>
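
For readers following the thread, the rule the patch adds can be modelled outside the driver. This is only a sketch under stated assumptions: ANV_HZ_FC_VAL and its value 1.0f come from the patch, while can_fast_clear_depth() and its other_checks_passed parameter are hypothetical names that stand in for the earlier conditions of the same if/else chain.

```c
#include <assert.h>
#include <stdbool.h>

/* Global fast-clear depth value, as defined by the patch. */
#define ANV_HZ_FC_VAL 1.0f

/* Hypothetical helper, not a Mesa function. other_checks_passed stands in
 * for the earlier conditions of the if/else chain in the patch. */
bool can_fast_clear_depth(bool other_checks_passed, float clear_depth)
{
   if (!other_checks_passed)
      return false;

   /* Only the globally defined value may be fast-cleared, so that no
    * extra depth resolve is needed at the end of the render pass. */
   return clear_depth == ANV_HZ_FC_VAL;
}

/* Small self-check of the three interesting cases. */
void check_can_fast_clear_depth(void)
{
   assert(can_fast_clear_depth(true, 1.0f));
   assert(!can_fast_clear_depth(true, 0.0f));
   assert(!can_fast_clear_depth(false, 1.0f));
}
```

Because the clear value is now a compile-time constant, the value programmed into 3DSTATE_CLEAR_PARAMS no longer depends on the attachment state, which is what the last hunk of the patch relies on.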


[Mesa-dev] [PATCH] nir: shuffle fmuls to allow const evaluation

2017-01-11 Thread Timothy Arceri
Shader-db results BDW:

total instructions in shared programs: 13059905 -> 13059274 (-0.00%)
instructions in affected programs: 88407 -> 87776 (-0.71%)
helped: 329
HURT: 0

total cycles in shared programs: 256570054 -> 256548062 (-0.01%)
cycles in affected programs: 2308020 -> 2286028 (-0.95%)
helped: 242
HURT: 69

LOST:   1
GAINED: 0
---
 src/compiler/nir/nir_opt_algebraic.py |  3 ++-
 src/compiler/nir/nir_search_helpers.h | 22 ++
 2 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/src/compiler/nir/nir_opt_algebraic.py 
b/src/compiler/nir/nir_opt_algebraic.py
index a557f7b..b5974a7 100644
--- a/src/compiler/nir/nir_opt_algebraic.py
+++ b/src/compiler/nir/nir_opt_algebraic.py
@@ -323,9 +323,10 @@ optimizations = [
(('imul', ('ineg', a), b), ('ineg', ('imul', a, b))),
 
# Reassociate constants in add/mul chains so they can be folded together.
-   # For now, we only handle cases where the constants are separated by
+   # For now, we mostly only handle cases where the constants are separated by
# a single non-constant.  We could do better eventually.
(('~fmul', '#a', ('fmul', b, '#c')), ('fmul', ('fmul', a, c), b)),
+   (('~fmul', '#a', ('fmul(is_used_once)', b, ('fmul(is_used_once)', c, 
'#d'))), ('fmul', ('fmul', b, ('fmul', a, d)), c)),
(('imul', '#a', ('imul', b, '#c')), ('imul', ('imul', a, c), b)),
(('~fadd', '#a', ('fadd', b, '#c')), ('fadd', ('fadd', a, c), b)),
(('iadd', '#a', ('iadd', b, '#c')), ('iadd', ('iadd', a, c), b)),
diff --git a/src/compiler/nir/nir_search_helpers.h 
b/src/compiler/nir/nir_search_helpers.h
index e925a2b..ddaff52 100644
--- a/src/compiler/nir/nir_search_helpers.h
+++ b/src/compiler/nir/nir_search_helpers.h
@@ -131,6 +131,28 @@ is_used_more_than_once(nir_alu_instr *instr)
 }
 
 static inline bool
+is_used_once(nir_alu_instr *instr)
+{
+   bool zero_if_use = list_empty(&instr->dest.dest.ssa.if_uses);
+   bool zero_use = list_empty(&instr->dest.dest.ssa.uses);
+
+   if (zero_if_use && zero_use)
+      return false;
+
+   if (!zero_if_use && list_is_singular(&instr->dest.dest.ssa.uses))
+      return false;
+
+   if (!zero_use && list_is_singular(&instr->dest.dest.ssa.if_uses))
+      return false;
+
+   if (!list_is_singular(&instr->dest.dest.ssa.if_uses) &&
+       !list_is_singular(&instr->dest.dest.ssa.uses))
+  return false;
+
+   return true;
+}
+
+static inline bool
 is_not_used_by_if(nir_alu_instr *instr)
 {
    return list_empty(&instr->dest.dest.ssa.if_uses);
-- 
2.9.3
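
As a reading aid, the branch structure of is_used_once() above can be checked against a shorter formulation. The sketch below is not Mesa code: it assumes the two use lists can be modelled by plain counts, and count_is_empty(), count_is_singular() and both is_used_once_* functions are hypothetical stand-ins introduced only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins for list_empty() and list_is_singular(), modelled on plain
 * use counts instead of struct list_head (an assumption of this sketch). */
bool count_is_empty(unsigned n)    { return n == 0; }
bool count_is_singular(unsigned n) { return n == 1; }

/* Same branch structure as is_used_once() in the patch. */
bool is_used_once_model(unsigned uses, unsigned if_uses)
{
   bool zero_if_use = count_is_empty(if_uses);
   bool zero_use = count_is_empty(uses);

   if (zero_if_use && zero_use)
      return false;

   if (!zero_if_use && count_is_singular(uses))
      return false;

   if (!zero_use && count_is_singular(if_uses))
      return false;

   if (!count_is_singular(if_uses) && !count_is_singular(uses))
      return false;

   return true;
}

/* Compact formulation: exactly one use across both lists. */
bool is_used_once_compact(unsigned uses, unsigned if_uses)
{
   return uses + if_uses == 1;
}

/* Compare both formulations for all small use counts. */
void check_is_used_once_equivalence(void)
{
   for (unsigned uses = 0; uses < 4; uses++)
      for (unsigned if_uses = 0; if_uses < 4; if_uses++)
         assert(is_used_once_model(uses, if_uses) ==
                is_used_once_compact(uses, if_uses));
}
```

Under that modelling assumption the helper accepts a value only when it has exactly one use in total, counting both regular uses and uses as an if-condition.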



Re: [Mesa-dev] Mesa 13.1.0 release plan

2017-01-11 Thread Timothy Arceri
On Wed, 2017-01-11 at 13:19 +, Emil Velikov wrote:
> On 30 November 2016 at 20:23, Emil Velikov 
> wrote:
> > Hi all,
> > 
> > With holidays not far off, it might be a nice idea to consider the
> > branchpoint/release schedule for the next release.
> > 
> > I will be having limited internet access during 20 Dec - 7 Jan,
> > thus
> > I'm leaning towards the following:
> >  Jan 13 2017 - Feature freeze/Release candidate 1
> >  Jan 20 2017 - Release candidate 2
> >  Jan 27 2017 - Release candidate 3
> >  Feb 03 2017 - Release candidate 4/final release
> > 
> 
> Friendly reminder that the above schedule is in place, meaning that
> we
> have ~2 days until the feature freeze/branchpoint.
> 
> Noticeable work for inclusion:
>  - Etnaviv driver: Looking good and ready for merging.
>  - On-disk shader cache: There are still patches lacking review.

It seems unlikely at this point. The clean-up prep series has almost
fully landed. Any takers for reviewing 4 (the big one), 6-9, and 22-25?
[1] There is a fresh rebase in my lastest_cleanups branch.

If those get a review very soon I'll send out a fresh rebase of the 50+
shader cache series and we will see how we go from there.

[1] https://patchwork.freedesktop.org/series/17671/
> 
> If people have outstanding patches which need work, are not reviewed,
> other, I suggest getting on it.
> Do coordinate with others (irc/email/etc.) if your work seems to be
> missing some love. We're human and might have missed/forgotten about
> some patches.
> 
> On a related note:
> As suggested by Marek (et al.), the Mesa versioning scheme will be undergoing
> a small update. Namely, the major number will be bumped at the
> beginning
> of each year and shall reflect the last two digits of the respective
> year. I would like to thank Marek for the suggestion and everyone who
> took part in the related discussion.
> 
> -Emil
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 02/22] intel/blorp_blit: Handle ISL_AUX_USAGE_HIZ

2017-01-11 Thread Jason Ekstrand
I would rather not...  Those asserts exist precisely to prevent someone
from doing a blorp copy with HiZ enabled when it's not actually supported.
The right thing to do is to set aux_usage to ISL_AUX_USAGE_NONE if you want
blorp to ignore aux.
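
To make the suggestion concrete, here is a minimal sketch of the caller-side approach. It is not driver code: the enum is a stand-in for isl_aux_usage, and blorp_copy_accepts_aux() and aux_usage_for_blorp_copy() are hypothetical names. The accepted set mirrors the existing asserts in blorp_copy().

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for the isl_aux_usage values mentioned in this thread. */
enum sketch_aux_usage {
   SKETCH_AUX_USAGE_NONE,
   SKETCH_AUX_USAGE_HIZ,
   SKETCH_AUX_USAGE_MCS,
   SKETCH_AUX_USAGE_CCS_E,
};

/* Mirrors the existing asserts in blorp_copy(): HiZ is not accepted. */
bool blorp_copy_accepts_aux(enum sketch_aux_usage usage)
{
   return usage == SKETCH_AUX_USAGE_NONE ||
          usage == SKETCH_AUX_USAGE_MCS ||
          usage == SKETCH_AUX_USAGE_CCS_E;
}

/* Hypothetical caller-side helper: drop HiZ before handing the surface
 * to a copy instead of loosening the asserts. */
enum sketch_aux_usage aux_usage_for_blorp_copy(enum sketch_aux_usage image_usage)
{
   if (image_usage == SKETCH_AUX_USAGE_HIZ)
      return SKETCH_AUX_USAGE_NONE;
   return image_usage;
}

/* The asserts stay strict and every caller-provided value passes them. */
void check_aux_usage_for_blorp_copy(void)
{
   assert(!blorp_copy_accepts_aux(SKETCH_AUX_USAGE_HIZ));
   assert(blorp_copy_accepts_aux(
             aux_usage_for_blorp_copy(SKETCH_AUX_USAGE_HIZ)));
   assert(aux_usage_for_blorp_copy(SKETCH_AUX_USAGE_CCS_E) ==
          SKETCH_AUX_USAGE_CCS_E);
}
```

Keeping the override in the caller means the asserts continue to catch any path that reaches a copy with HiZ still enabled.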

On Wed, Jan 11, 2017 at 5:54 PM, Nanley Chery  wrote:

> Prevent assert failures that would occur in the next patch. BLORP
> ignores this flag internally, so no protections are lost here.
>
> Signed-off-by: Nanley Chery 
> ---
>  src/intel/blorp/blorp_blit.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/src/intel/blorp/blorp_blit.c b/src/intel/blorp/blorp_blit.c
> index 1cbd9403c9..aa389dbcc1 100644
> --- a/src/intel/blorp/blorp_blit.c
> +++ b/src/intel/blorp/blorp_blit.c
> @@ -2291,9 +2291,11 @@ blorp_copy(struct blorp_batch *batch,
>isl_format_get_layout(params.dst.surf.format);
>
> assert(params.src.aux_usage == ISL_AUX_USAGE_NONE ||
> +  params.src.aux_usage == ISL_AUX_USAGE_HIZ ||
>params.src.aux_usage == ISL_AUX_USAGE_MCS ||
>params.src.aux_usage == ISL_AUX_USAGE_CCS_E);
> assert(params.dst.aux_usage == ISL_AUX_USAGE_NONE ||
> +  params.dst.aux_usage == ISL_AUX_USAGE_HIZ ||
>params.dst.aux_usage == ISL_AUX_USAGE_MCS ||
>params.dst.aux_usage == ISL_AUX_USAGE_CCS_E);
>
> --
> 2.11.0
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>


Re: [Mesa-dev] [PATCH 03/22] anv: Replace anv_image_has_hiz() with ISL_AUX_USAGE_HIZ

2017-01-11 Thread Jason Ekstrand
On Wed, Jan 11, 2017 at 5:54 PM, Nanley Chery  wrote:

> The helper doesn't provide additional functionality over the current
> infrastructure.
>

Actually... It does.  The point of anv_image::aux_usage is to say how the
aux surface is supposed to be used outside of a render pass.  Just because
an image has a HiZ surface allocated doesn't mean that it should always be
used.  The intention (and the way it's used for CCS) was that you could
have an aux surface allocated but where it was only used inside the render
pass and not outside.

Unfortunately, that mental model doesn't really work when you're using
layout transitions.  I'll have to think about this as I read the rest of
the patches.


> Signed-off-by: Nanley Chery 
> ---
>  src/intel/vulkan/anv_blorp.c   |  2 +-
>  src/intel/vulkan/anv_image.c   | 10 --
>  src/intel/vulkan/anv_private.h | 10 --
>  src/intel/vulkan/gen8_cmd_buffer.c |  2 +-
>  src/intel/vulkan/genX_cmd_buffer.c |  2 +-
>  5 files changed, 11 insertions(+), 15 deletions(-)
>
> diff --git a/src/intel/vulkan/anv_blorp.c b/src/intel/vulkan/anv_blorp.c
> index b431d6af48..13d5b5f9d7 100644
> --- a/src/intel/vulkan/anv_blorp.c
> +++ b/src/intel/vulkan/anv_blorp.c
> @@ -857,7 +857,7 @@ void anv_CmdClearDepthStencilImage(
> struct blorp_surf depth, stencil;
> if (image->aspects & VK_IMAGE_ASPECT_DEPTH_BIT) {
>get_blorp_surf_for_anv_image(image, VK_IMAGE_ASPECT_DEPTH_BIT,
> -   image->aux_usage, );
> +   ISL_AUX_USAGE_NONE, );
> } else {
>memset(, 0, sizeof(depth));
> }
> diff --git a/src/intel/vulkan/anv_image.c b/src/intel/vulkan/anv_image.c
> index f262d8a524..d821629191 100644
> --- a/src/intel/vulkan/anv_image.c
> +++ b/src/intel/vulkan/anv_image.c
> @@ -195,6 +195,7 @@ make_surface(const struct anv_device *dev,
>   isl_surf_get_hiz_surf(>isl_dev, >depth_surface.isl,
> >aux_surface.isl);
>   add_surface(image, >aux_surface);
> + image->aux_usage = ISL_AUX_USAGE_HIZ;
>}
> } else if (aspect == VK_IMAGE_ASPECT_COLOR_BIT && vk_info->samples ==
> 1) {
>if (!unlikely(INTEL_DEBUG & DEBUG_NO_RBC)) {
> @@ -523,6 +524,11 @@ anv_CreateImageView(VkDevice _device,
>iview->isl.usage = 0;
> }
>
> +   /* Sampling from HiZ is not yet enabled */
> +   enum isl_aux_usage surf_usage = image->aux_usage;
> +   if (surf_usage == ISL_AUX_USAGE_HIZ)
> +  surf_usage = ISL_AUX_USAGE_NONE;
> +
> /* Input attachment surfaces for color or depth are allocated and
> filled
>  * out at BeginRenderPass time because they need compression
> information.
>  * Stencil image do not support compression so we just use the texture
> @@ -540,7 +546,7 @@ anv_CreateImageView(VkDevice _device,
>.surf = >isl,
>.view = ,
>.aux_surf = >aux_surface.isl,
> -  .aux_usage = image->aux_usage,
> +  .aux_usage = surf_usage,
>.mocs = device->default_mocs);
>
>if (!device->info.has_llc)
> @@ -564,7 +570,7 @@ anv_CreateImageView(VkDevice _device,
>   .surf = >isl,
>   .view = ,
>   .aux_surf = >aux_surface.isl,
> - .aux_usage = image->aux_usage,
> + .aux_usage = surf_usage,
>   .mocs = device->default_mocs);
>} else {
>   anv_fill_buffer_surface_state(device,
> iview->storage_surface_state,
> diff --git a/src/intel/vulkan/anv_private.h b/src/intel/vulkan/anv_
> private.h
> index dbc8c3cf68..4b2cac5c48 100644
> --- a/src/intel/vulkan/anv_private.h
> +++ b/src/intel/vulkan/anv_private.h
> @@ -1642,16 +1642,6 @@ const struct anv_surface *
>  anv_image_get_surface_for_aspect_mask(const struct anv_image *image,
>VkImageAspectFlags aspect_mask);
>
> -static inline bool
> -anv_image_has_hiz(const struct anv_image *image)
> -{
> -   /* We must check the aspect because anv_image::aux_surface may be used
> for
> -* any type of auxiliary surface, not just HiZ.
> -*/
> -   return (image->aspects & VK_IMAGE_ASPECT_DEPTH_BIT) &&
> -  image->aux_surface.isl.size > 0;
> -}
> -
>  struct anv_buffer_view {
> enum isl_format format; /**< VkBufferViewCreateInfo::format */
> struct anv_bo *bo;
> diff --git a/src/intel/vulkan/gen8_cmd_buffer.c
> b/src/intel/vulkan/gen8_cmd_buffer.c
> index 3e4aa9bc62..892a035304 100644
> --- a/src/intel/vulkan/gen8_cmd_buffer.c
> +++ b/src/intel/vulkan/gen8_cmd_buffer.c
> @@ -337,7 +337,7 @@ genX(cmd_buffer_emit_hz_op)(struct anv_cmd_buffer
> *cmd_buffer,
> const struct anv_image_view *iview =
>

Re: [Mesa-dev] [PATCH] spirv: Handle patch decorations up-front

2017-01-11 Thread Kenneth Graunke
On Wednesday, January 11, 2017 6:32:34 PM PST Jason Ekstrand wrote:
> Once again, SPIR-V is insane... It allows you to place "patch"
> decorations on structure members.  Presumably, this is so that you can
> do something such as
> 
> out struct S {
>layout(location = 0) patch vec4 thing1;
>layout(location = 0) vec4 thing2;
> } str;
> 
> And have your I/O "nicely" organized.  While this is a bit silly, it's
> allowed and well-defined so whatever.  Where it really gets interesting
> is when you have an array of struct.  SPIR-V says nothing about not
> allowing you to have those qualifiers on the members of a struct that's
> inside an array and GLSLang does this.  Specifically, if you have
> 
> layout(location = 0) out patch struct S {
>vec4 thing1;
>vec4 thing2;
> } str[2];
> 
> then GLSLang will place the "patch" decorations on the struct members.
> This is ridiculous; there is no way that having some of them be patch and
> some not would be well-defined given that patch and non-patch outputs
> are in effectively different storage classes.  This commit moves around
> the way we handle the "patch" decoration so that we can detect even the
> crazy cases and handle them.
> 
> Fixes: dEQP-VK.tessellation.user_defined_io.per_patch_block_array.*
> ---
>  src/compiler/spirv/vtn_variables.c | 23 +++
>  1 file changed, 23 insertions(+)
> 
> diff --git a/src/compiler/spirv/vtn_variables.c 
> b/src/compiler/spirv/vtn_variables.c
> index 3ecb54f..91e5b13 100644
> --- a/src/compiler/spirv/vtn_variables.c
> +++ b/src/compiler/spirv/vtn_variables.c
> @@ -1359,8 +1359,29 @@ vtn_handle_variables(struct vtn_builder *b, SpvOp 
> opcode,
>  
>case vtn_variable_mode_input:
>case vtn_variable_mode_output: {
> + /* In order to know whether or not we're a per-vertex inout, we need
> +  * the patch qualifier.  This means walking the variable decorations
> +  * early before we actually create any variables.  Not a big deal.
> +  *
> +  * GLSLang really likes to place decorations in the most interior
> +  * thing it possibly can.  In particular, if you have a struct, it
> +  * will place the patch decorations on the struct members.  This
> +  * should be handled by the variable splitting below just fine.
> +  *
> +  * If you have an array-of-struct, things get even more wierd as it

weird

> +  * will place the patch decorations on the struct even though it's
> +  * inside an array and some of the members being patch and others 
> not
> +  * makes no sense whatsoever.  Since the only sensible thing is for
> +  * it to be all or nothing, we'll call it patch if any of the 
> members
> +  * are declared patch.
> +  */

I wonder if we could emit a warning if they don't match.  That would be
nice, although not essential...

It would be nice if SPIR-V or glslang were less crazy in this regard
but I don't expect either of those to realistically happen.  I think
your approach is reasonable.

Thank you so much for fixing this for me.

Reviewed-by: Kenneth Graunke 

>   var->patch = false;
>   vtn_foreach_decoration(b, val, var_is_patch_cb, &var->patch);
> + if (glsl_type_is_array(var->type->type) &&
> + glsl_type_is_struct(without_array->type)) {
> +vtn_foreach_decoration(b, without_array->val,
> +   var_is_patch_cb, &var->patch);
> + }
>  
>   /* For inputs and outputs, we immediately split structures.  This
>* is for a couple of reasons.  For one, builtins may all come in
> @@ -1400,6 +1421,7 @@ vtn_handle_variables(struct vtn_builder *b, SpvOp 
> opcode,
> var->members[i]->interface_type =
>interface_type->members[i]->type;
> var->members[i]->data.mode = nir_mode;
> +   var->members[i]->data.patch = var->patch;
>  }
>   } else {
>  var->var = rzalloc(b->shader, nir_variable);
> @@ -1407,6 +1429,7 @@ vtn_handle_variables(struct vtn_builder *b, SpvOp 
> opcode,
>  var->var->type = var->type->type;
>  var->var->interface_type = interface_type->type;
>  var->var->data.mode = nir_mode;
> +var->var->data.patch = var->patch;
>   }
>  
>   /* For inputs and outputs, we need to grab locations and builtin
> 





[Mesa-dev] [PATCH] spirv: Handle patch decorations up-front

2017-01-11 Thread Jason Ekstrand
Once again, SPIR-V is insane... It allows you to place "patch"
decorations on structure members.  Presumably, this is so that you can
do something such as

out struct S {
   layout(location = 0) patch vec4 thing1;
   layout(location = 0) vec4 thing2;
} str;

And have your I/O "nicely" organized.  While this is a bit silly, it's
allowed and well-defined so whatever.  Where it really gets interesting
is when you have an array of struct.  SPIR-V says nothing about not
allowing you to have those qualifiers on the members of a struct that's
inside an array and GLSLang does this.  Specifically, if you have

layout(location = 0) out patch struct S {
   vec4 thing1;
   vec4 thing2;
} str[2];

then GLSLang will place the "patch" decorations on the struct members.
This is ridiculous; there is no way that having some of them be patch and
some not would be well-defined given that patch and non-patch outputs
are in effectively different storage classes.  This commit moves around
the way we handle the "patch" decoration so that we can detect even the
crazy cases and handle them.

Fixes: dEQP-VK.tessellation.user_defined_io.per_patch_block_array.*
---
 src/compiler/spirv/vtn_variables.c | 23 +++
 1 file changed, 23 insertions(+)

diff --git a/src/compiler/spirv/vtn_variables.c 
b/src/compiler/spirv/vtn_variables.c
index 3ecb54f..91e5b13 100644
--- a/src/compiler/spirv/vtn_variables.c
+++ b/src/compiler/spirv/vtn_variables.c
@@ -1359,8 +1359,29 @@ vtn_handle_variables(struct vtn_builder *b, SpvOp opcode,
 
   case vtn_variable_mode_input:
   case vtn_variable_mode_output: {
+ /* In order to know whether or not we're a per-vertex inout, we need
+  * the patch qualifier.  This means walking the variable decorations
+  * early before we actually create any variables.  Not a big deal.
+  *
+  * GLSLang really likes to place decorations in the most interior
+  * thing it possibly can.  In particular, if you have a struct, it
+  * will place the patch decorations on the struct members.  This
+  * should be handled by the variable splitting below just fine.
+  *
+  * If you have an array-of-struct, things get even more wierd as it
+  * will place the patch decorations on the struct even though it's
+  * inside an array and some of the members being patch and others not
+  * makes no sense whatsoever.  Since the only sensible thing is for
+  * it to be all or nothing, we'll call it patch if any of the members
+  * are declared patch.
+  */
  var->patch = false;
  vtn_foreach_decoration(b, val, var_is_patch_cb, &var->patch);
+ if (glsl_type_is_array(var->type->type) &&
+ glsl_type_is_struct(without_array->type)) {
+vtn_foreach_decoration(b, without_array->val,
+   var_is_patch_cb, &var->patch);
+ }
 
  /* For inputs and outputs, we immediately split structures.  This
   * is for a couple of reasons.  For one, builtins may all come in
@@ -1400,6 +1421,7 @@ vtn_handle_variables(struct vtn_builder *b, SpvOp opcode,
var->members[i]->interface_type =
   interface_type->members[i]->type;
var->members[i]->data.mode = nir_mode;
+   var->members[i]->data.patch = var->patch;
 }
  } else {
 var->var = rzalloc(b->shader, nir_variable);
@@ -1407,6 +1429,7 @@ vtn_handle_variables(struct vtn_builder *b, SpvOp opcode,
 var->var->type = var->type->type;
 var->var->interface_type = interface_type->type;
 var->var->data.mode = nir_mode;
+var->var->data.patch = var->patch;
  }
 
  /* For inputs and outputs, we need to grab locations and builtin
-- 
2.5.0.400.gff86faf
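
The "all or nothing" rule described in the comment above can be sketched in isolation. This is an illustration only, not the vtn code: struct member_decorations and variable_is_patch() are hypothetical names, and the real patch walks decorations with vtn_foreach_decoration() rather than iterating over an array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the decorations found on one struct member. */
struct member_decorations {
   bool patch;
};

/* Treat the whole variable as per-patch if the variable itself or any of
 * its members carries the patch decoration. */
bool variable_is_patch(bool var_decorated,
                       const struct member_decorations *members,
                       size_t num_members)
{
   bool patch = var_decorated;

   for (size_t i = 0; i < num_members; i++)
      patch = patch || members[i].patch;

   return patch;
}

/* One decorated member is enough to mark the whole variable. */
void check_variable_is_patch(void)
{
   const struct member_decorations mixed[2] = { { false }, { true } };
   const struct member_decorations plain[2] = { { false }, { false } };

   assert(variable_is_patch(false, mixed, 2));
   assert(!variable_is_patch(false, plain, 2));
   assert(variable_is_patch(true, plain, 2));
}
```

The resulting flag is then copied to every split member, which corresponds to the data.patch assignments in the two later hunks.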



Re: [Mesa-dev] [PATCH 00/22] anv: Reduce HiZ Resolves

2017-01-11 Thread Nanley Chery
On Wed, Jan 11, 2017 at 05:54:46PM -0800, Nanley Chery wrote:
> In my testing, this series completely removes HiZ resolves for the
> following Vulkan applications: Dota 2, Talos Principle, and the Sascha
> Willems Vulkan examples and demos. This is accomplished with two major
> changes. The first change is to transition the current HiZ resolving
> algorithm from resolving on attachment load/store ops to resolving on
> image layout transitions. The second change is to enable sampling from
> HiZ on BDW+.
> 
> There are some notable additional changes. To support performing layout
> transitions outside of a render pass we implement the HiZ sequence in
> BLORP which can emit depth stencil state outside of a render pass.
> 
> Performance data was collected at different points in this series. These
> tests were run on a SKL GT4, with a monitor resolution of 1440x900. For
> Dota 2 and Talos Principle, the average of three fullscreen runs was
> taken. At least one warm-up run was performed between driver builds. The
> Talos Principle runs are omitted as no significant changes were
> measured. No warm-up was performed for the Vulkan examples and the demo
> resolution was the default window size on startup.

Here are the results:

shadowmapping (Vulkan example) - visual measurement of min-max:
* HiZ disabled           - ~579-593
* HiZ load/store         - ~602-655
* HiZ layouts            - ~628-673
* HiZ layouts + sampling - ~766-806

Dota 2 demo benchmark:
* HiZ disabled           - 46.9
* HiZ load/store         - 43.4
* HiZ layouts            - 51.3
* HiZ layouts + sampling - 51.5

-Nanley
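
For a quick comparison, the Dota 2 figures above can be turned into relative changes against the "HiZ disabled" baseline. The helper below is a hypothetical convenience and is not part of the series. The message does not state the unit of the scores, so none is assumed here.

```c
#include <assert.h>

/* Relative change of value against baseline, in percent. */
double percent_change(double baseline, double value)
{
   return (value - baseline) / baseline * 100.0;
}

/* Dota 2 numbers from the message: 46.9 (HiZ disabled),
 * 43.4 (HiZ load/store) and 51.5 (HiZ layouts + sampling). */
void check_percent_change(void)
{
   double load_store = percent_change(46.9, 43.4);
   double sampling = percent_change(46.9, 51.5);

   assert(load_store > -7.47 && load_store < -7.45);
   assert(sampling > 9.80 && sampling < 9.82);
}
```

In other words, the load/store approach scores roughly 7.5% below the baseline, while layouts plus sampling scores roughly 9.8% above it.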

> 
> Nanley Chery (22):
>   intel/blorp: Add the BDW+ optimized HZ_OP sequence to BLORP
>   intel/blorp_blit: Handle ISL_AUX_USAGE_HIZ
>   anv: Replace anv_image_has_hiz() with ISL_AUX_USAGE_HIZ
>   anv: Use ::anv_attachment_state for toggling HiZ per subpass
>   anv: Enable HiZ support for multiple subpasses
>   intel/blorp_clear: Add gen8 HiZ clearing functions
>   anv: Use gen8 BLORP HiZ clearing functions
>   anv/blorp: Add a gen8 HiZ op resolve function
>   anv: Use the gen8 BLORP HiZ resolving function
>   anv: Delete anv's HiZ op emit function
>   anv: Add helpers to handle depth buffer layout transitions
>   anv: Store depth stencil layouts
>   anv: Prepare for transitioning to the requested final layout
>   anv: Avoid resolves incurred by fast depth clears
>   anv: Disable HiZ for input attachments
>   anv/image: Disable HiZ for storage images
>   anv: Perform HiZ resolves only on layout transitions
>   isl/surface_state: Handle ISL_AUX_USAGE_HIZ
>   anv: Add a helper to determine sampling with HiZ
>   anv/blorp: Don't fast depth clear samplable HiZ buffers on BDW
>   anv: Enable sampling from HiZ
>   anv: Avoid some resolves for samplable HiZ buffers
> 
>  src/intel/blorp/blorp.h|  12 ++
>  src/intel/blorp/blorp_blit.c   |   2 +
>  src/intel/blorp/blorp_clear.c  |  80 +
>  src/intel/blorp/blorp_genX_exec.h  |  87 ++
>  src/intel/isl/isl_surface_state.c  |  38 ++-
>  src/intel/vulkan/TODO  |   3 +-
>  src/intel/vulkan/anv_blorp.c   | 100 -
>  src/intel/vulkan/anv_genX.h|   3 -
>  src/intel/vulkan/anv_image.c   |  46 +++-
>  src/intel/vulkan/anv_pass.c|   8 ++
>  src/intel/vulkan/anv_private.h |  51 +++--
>  src/intel/vulkan/gen7_cmd_buffer.c |   7 --
>  src/intel/vulkan/gen8_cmd_buffer.c | 224 
> -
>  src/intel/vulkan/genX_cmd_buffer.c | 168 
>  14 files changed, 548 insertions(+), 281 deletions(-)
> 
> -- 
> 2.11.0
> 


[Mesa-dev] [PATCH 21/22] anv: Enable sampling from HiZ

2017-01-11 Thread Nanley Chery
Signed-off-by: Nanley Chery 
---
 src/intel/vulkan/TODO|  1 -
 src/intel/vulkan/anv_image.c | 19 ---
 2 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/src/intel/vulkan/TODO b/src/intel/vulkan/TODO
index 37fd16b437..38acc0dd5b 100644
--- a/src/intel/vulkan/TODO
+++ b/src/intel/vulkan/TODO
@@ -8,7 +8,6 @@ Missing Features:
  - Sparse memory
 
 Performance:
- - Sampling from HiZ (Nanley)
  - Multi-{sampled/gen8,LOD} HiZ
  - Compressed multisample support
  - Pushing pieces of UBOs?
diff --git a/src/intel/vulkan/anv_image.c b/src/intel/vulkan/anv_image.c
index 7d5beeabbe..ee563685bb 100644
--- a/src/intel/vulkan/anv_image.c
+++ b/src/intel/vulkan/anv_image.c
@@ -538,10 +538,22 @@ anv_CreateImageView(VkDevice _device,
   iview->isl.usage = 0;
}
 
-   /* Sampling from HiZ is not yet enabled */
+   /* If the HiZ buffer can be sampled from, set the constant clear color.
+* If it cannot, disable the isl aux usage flag.
+*/
+   float red_clear_color = 0.0f;
enum isl_aux_usage surf_usage = image->aux_usage;
-   if (surf_usage == ISL_AUX_USAGE_HIZ)
-  surf_usage = ISL_AUX_USAGE_NONE;
+   if (image->aux_usage == ISL_AUX_USAGE_HIZ) {
+  if (anv_can_sample_with_hiz(device->info.gen, image->samples)) {
+ /* When a HiZ buffer is sampled on gen9+, ensure that
+  * the constant fast clear value is set in the surface state.
+  */
+ if (device->info.gen >= 9)
+red_clear_color = ANV_HZ_FC_VAL;
+  } else {
+ surf_usage = ISL_AUX_USAGE_NONE;
+  }
+   }
 
/* Input attachment surfaces for color are allocated and filled
 * out at BeginRenderPass time because they need compression information.
@@ -560,6 +572,7 @@ anv_CreateImageView(VkDevice _device,
   iview->sampler_surface_state.map,
   .surf = >isl,
   .view = ,
+  .clear_color.f32 = { red_clear_color,},
   .aux_surf = >aux_surface.isl,
   .aux_usage = surf_usage,
   .mocs = device->default_mocs);
-- 
2.11.0
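
Editorial sketch (not part of the patch): the surface-state selection above can be modeled in isolation as follows. All sketch_* names and the SKETCH_* constants are hypothetical stand-ins for the driver's types, and the predicate simply mirrors anv_can_sample_with_hiz() from patch 19; this assumes nothing else influences the choice.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the driver's types; not the real Mesa headers. */
enum sketch_aux_usage { SKETCH_AUX_NONE, SKETCH_AUX_HIZ };

#define SKETCH_HZ_FC_VAL 1.0f   /* mirrors ANV_HZ_FC_VAL from patch 14 */

struct sketch_view_state {
   enum sketch_aux_usage surf_usage;
   float red_clear_color;
};

/* Same predicate as anv_can_sample_with_hiz() in patch 19. */
static bool
sketch_can_sample_with_hiz(uint8_t gen, uint32_t samples)
{
   return gen >= 8 && samples == 1;
}

/* Keep HiZ for the sampler surface only when it can be sampled; on gen9+
 * also program the constant fast-clear value as the red clear color.
 */
static struct sketch_view_state
sketch_pick_sampler_state(uint8_t gen, uint32_t samples,
                          enum sketch_aux_usage image_aux)
{
   struct sketch_view_state s = { image_aux, 0.0f };

   if (image_aux == SKETCH_AUX_HIZ) {
      if (sketch_can_sample_with_hiz(gen, samples)) {
         if (gen >= 9)
            s.red_clear_color = SKETCH_HZ_FC_VAL;
      } else {
         s.surf_usage = SKETCH_AUX_NONE;
      }
   }
   return s;
}
```

Under these assumptions a single-sampled gen9 image keeps HiZ with a clear color of 1.0f, a single-sampled gen8 image keeps HiZ with 0.0f, and anything multisampled or older than gen8 falls back to no auxiliary surface.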



[Mesa-dev] [PATCH 22/22] anv: Avoid some resolves for samplable HiZ buffers

2017-01-11 Thread Nanley Chery
Signed-off-by: Nanley Chery 
---
 src/intel/vulkan/genX_cmd_buffer.c | 54 +-
 1 file changed, 41 insertions(+), 13 deletions(-)

diff --git a/src/intel/vulkan/genX_cmd_buffer.c b/src/intel/vulkan/genX_cmd_buffer.c
index 447baa08b2..11745f8b9e 100644
--- a/src/intel/vulkan/genX_cmd_buffer.c
+++ b/src/intel/vulkan/genX_cmd_buffer.c
@@ -311,11 +311,21 @@ need_input_attachment_state(const struct anv_render_pass_attachment *att)
 }
 
 static enum isl_aux_usage
-layout_to_hiz_usage(VkImageLayout layout)
+layout_to_hiz_usage(VkImageLayout layout, uint8_t samples)
 {
switch (layout) {
case VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL:
   return ISL_AUX_USAGE_HIZ;
+   case VK_IMAGE_LAYOUT_DEPTH_STENCIL_READ_ONLY_OPTIMAL:
+   case VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL:
+  if (anv_can_sample_with_hiz(GEN_GEN, samples))
+ return ISL_AUX_USAGE_HIZ;
+  /* Fall-through */
+   case VK_IMAGE_LAYOUT_GENERAL:
+  /* This buffer could be used as a source or destination in a transfer
+   * operation. Transfer operations currently don't perform HiZ-enabled reads
+   * and writes.
+   */
default:
   return ISL_AUX_USAGE_NONE;
}
@@ -336,26 +346,43 @@ transition_depth_buffer(struct anv_cmd_buffer *cmd_buffer,
if (image->aux_usage != ISL_AUX_USAGE_HIZ)
   return;
 
-   const bool hiz_enabled = layout_to_hiz_usage(initial_layout) ==
+   const bool hiz_enabled = layout_to_hiz_usage(initial_layout, image->samples) ==
 ISL_AUX_USAGE_HIZ;
-   const bool enable_hiz = layout_to_hiz_usage(final_layout) ==
+   const bool enable_hiz = layout_to_hiz_usage(final_layout, image->samples) ==
ISL_AUX_USAGE_HIZ;
 
+   /* Images that have sampling with HiZ enabled cause all shader sampling to
+* load data with the HiZ buffer. Therefore, in the case of transitioning to
+* the general layout - which currently routes all writes to the depth
+* buffer - we must ensure that the HiZ buffer remains consistent with the
+* depth buffer by performing a HIZ resolve after performing the resolve
+* required by this transition (if not already HiZ).
+*/
+   const bool needs_hiz_resolve = final_layout == VK_IMAGE_LAYOUT_GENERAL &&
+  (hiz_enabled || initial_layout == VK_IMAGE_LAYOUT_UNDEFINED) &&
+  anv_can_sample_with_hiz(GEN_GEN, image->samples);
+
/* We've already initialized the aux HiZ buffer at BindImageMemory time,
 * so there's no need to perform a HIZ resolve or clear to avoid GPU hangs.
 * This initial layout indicates that the user doesn't care about the data
-* that's currently in the buffer, so no resolves are necessary.
+* that's currently in the buffer, so resolves are not necessary except for
+* the case mentioned above.
 */
-   if (initial_layout == VK_IMAGE_LAYOUT_UNDEFINED)
+   if (!needs_hiz_resolve && initial_layout == VK_IMAGE_LAYOUT_UNDEFINED)
   return;
 
-   if (hiz_enabled == enable_hiz) {
-  /* The same buffer will be used, no resolves are necessary */
-   } else if (hiz_enabled && !enable_hiz) {
-  anv_gen8_hiz_op_resolve(cmd_buffer, image, BLORP_HIZ_OP_DEPTH_RESOLVE);
+   if (!hiz_enabled && enable_hiz) {
+ anv_gen8_hiz_op_resolve(cmd_buffer, image, BLORP_HIZ_OP_HIZ_RESOLVE);
} else {
-  assert(!hiz_enabled && enable_hiz);
-  anv_gen8_hiz_op_resolve(cmd_buffer, image, BLORP_HIZ_OP_HIZ_RESOLVE);
+  if (hiz_enabled == enable_hiz) {
+ /* If the same buffer will be used, no resolves are necessary except
+  * for the special case noted above.
+  */
+  } else if (hiz_enabled && !enable_hiz) {
+ anv_gen8_hiz_op_resolve(cmd_buffer, image, BLORP_HIZ_OP_DEPTH_RESOLVE);
+  }
+  if (needs_hiz_resolve)
+ anv_gen8_hiz_op_resolve(cmd_buffer, image, BLORP_HIZ_OP_HIZ_RESOLVE);
}
 }
 
@@ -513,7 +540,7 @@ genX(cmd_buffer_setup_attachments)(struct anv_cmd_buffer *cmd_buffer,
 if (iview->image->aux_usage == ISL_AUX_USAGE_HIZ &&
 iview->aspect_mask & VK_IMAGE_ASPECT_DEPTH_BIT) {
state->attachments[i].aux_usage =
-  layout_to_hiz_usage(att->initial_layout);
+  layout_to_hiz_usage(att->initial_layout, iview->image->samples);
 } else {
state->attachments[i].aux_usage = ISL_AUX_USAGE_NONE;
 }
@@ -2319,7 +2346,8 @@ genX(cmd_buffer_set_subpass)(struct anv_cmd_buffer *cmd_buffer,
   cmd_buffer->state.attachments[ds].current_layout =
  cmd_buffer->state.subpass->depth_stencil_layout;
   cmd_buffer->state.attachments[ds].aux_usage =
- layout_to_hiz_usage(cmd_buffer->state.subpass->depth_stencil_layout);
+ layout_to_hiz_usage(cmd_buffer->state.subpass->depth_stencil_layout,
+ iview->image->samples);
}
 
cmd_buffer_emit_depth_stencil(cmd_buffer);
-- 
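
Editorial sketch (not part of the patch): the resolve selection in transition_depth_buffer() above can be modeled as a pure function that returns which resolves would be emitted. The enum, the bit flags and every sketch_* name are hypothetical; the sketch assumes the image already has a HiZ auxiliary buffer and ignores everything outside the two functions touched by this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the VkImageLayout values used by the patch. */
enum sketch_layout {
   SKETCH_LAYOUT_UNDEFINED,
   SKETCH_LAYOUT_GENERAL,
   SKETCH_LAYOUT_DS_ATTACHMENT,
   SKETCH_LAYOUT_DS_READ_ONLY,
   SKETCH_LAYOUT_SHADER_READ_ONLY,
};

#define SKETCH_DEPTH_RESOLVE (1u << 0)
#define SKETCH_HIZ_RESOLVE   (1u << 1)

static bool
sketch_can_sample_with_hiz(uint8_t gen, uint32_t samples)
{
   return gen >= 8 && samples == 1;
}

/* Mirrors layout_to_hiz_usage(): true when HiZ stays enabled in a layout. */
static bool
sketch_layout_uses_hiz(enum sketch_layout layout, uint8_t gen, uint32_t samples)
{
   switch (layout) {
   case SKETCH_LAYOUT_DS_ATTACHMENT:
      return true;
   case SKETCH_LAYOUT_DS_READ_ONLY:
   case SKETCH_LAYOUT_SHADER_READ_ONLY:
      return sketch_can_sample_with_hiz(gen, samples);
   default:
      return false;
   }
}

/* Returns a bitmask of the resolves the transition would perform. */
static unsigned
sketch_plan_transition(enum sketch_layout initial_layout,
                       enum sketch_layout final_layout,
                       uint8_t gen, uint32_t samples)
{
   const bool hiz_enabled = sketch_layout_uses_hiz(initial_layout, gen, samples);
   const bool enable_hiz = sketch_layout_uses_hiz(final_layout, gen, samples);
   const bool needs_hiz_resolve =
      final_layout == SKETCH_LAYOUT_GENERAL &&
      (hiz_enabled || initial_layout == SKETCH_LAYOUT_UNDEFINED) &&
      sketch_can_sample_with_hiz(gen, samples);
   unsigned ops = 0;

   if (!needs_hiz_resolve && initial_layout == SKETCH_LAYOUT_UNDEFINED)
      return 0;

   if (!hiz_enabled && enable_hiz) {
      ops |= SKETCH_HIZ_RESOLVE;
   } else {
      if (hiz_enabled && !enable_hiz)
         ops |= SKETCH_DEPTH_RESOLVE;
      if (needs_hiz_resolve)
         ops |= SKETCH_HIZ_RESOLVE;
   }
   return ops;
}
```

For a single-sampled gen9 image, moving from the depth/stencil attachment layout to the general layout yields both resolves, while moving to a shader read-only layout yields none because HiZ stays enabled on both sides.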

[Mesa-dev] [PATCH 19/22] anv: Add a helper to determine sampling with HiZ

2017-01-11 Thread Nanley Chery
Signed-off-by: Nanley Chery 
---
 src/intel/vulkan/anv_private.h | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/src/intel/vulkan/anv_private.h b/src/intel/vulkan/anv_private.h
index 98692b5913..fc7b6d1ec8 100644
--- a/src/intel/vulkan/anv_private.h
+++ b/src/intel/vulkan/anv_private.h
@@ -1603,6 +1603,13 @@ struct anv_image {
struct anv_surface aux_surface;
 };
 
+/* Returns true if a HiZ-enabled depth buffer can be sampled from. */
+static inline bool
+anv_can_sample_with_hiz(uint8_t gen, uint32_t samples)
+{
+   return gen >= 8 && samples == 1;
+}
+
 void
 anv_gen8_hiz_op_resolve(struct anv_cmd_buffer *cmd_buffer,
 const struct anv_image *image,
-- 
2.11.0



[Mesa-dev] [PATCH 13/22] anv: Prepare for transitioning to the requested final layout

2017-01-11 Thread Nanley Chery
Signed-off-by: Nanley Chery 
---
 src/intel/vulkan/anv_pass.c| 3 +++
 src/intel/vulkan/anv_private.h | 3 +++
 2 files changed, 6 insertions(+)

diff --git a/src/intel/vulkan/anv_pass.c b/src/intel/vulkan/anv_pass.c
index ea86fa9ff2..5df6330c6a 100644
--- a/src/intel/vulkan/anv_pass.c
+++ b/src/intel/vulkan/anv_pass.c
@@ -118,6 +118,7 @@ VkResult anv_CreateRenderPass(
 subpass->input_attachments[j] = a;
 pass->attachments[a].usage |= VK_IMAGE_USAGE_INPUT_ATTACHMENT_BIT;
 pass->attachments[a].subpass_usage[i] |= ANV_SUBPASS_USAGE_INPUT;
+pass->attachments[a].last_subpass_idx = i;
 
 if (desc->pDepthStencilAttachment &&
 a == desc->pDepthStencilAttachment->attachment)
@@ -134,6 +135,7 @@ VkResult anv_CreateRenderPass(
 subpass->color_attachments[j] = a;
 pass->attachments[a].usage |= VK_IMAGE_USAGE_COLOR_ATTACHMENT_BIT;
 pass->attachments[a].subpass_usage[i] |= ANV_SUBPASS_USAGE_DRAW;
+pass->attachments[a].last_subpass_idx = i;
  }
   }
 
@@ -156,6 +158,7 @@ VkResult anv_CreateRenderPass(
   ANV_SUBPASS_USAGE_RESOLVE_SRC;
pass->attachments[a].subpass_usage[i] |=
   ANV_SUBPASS_USAGE_RESOLVE_DST;
+   pass->attachments[a].last_subpass_idx = i;
 }
  }
   }
diff --git a/src/intel/vulkan/anv_private.h b/src/intel/vulkan/anv_private.h
index f4034866a7..237308fb3e 100644
--- a/src/intel/vulkan/anv_private.h
+++ b/src/intel/vulkan/anv_private.h
@@ -1764,6 +1764,9 @@ struct anv_render_pass_attachment {
 
/* An array, indexed by subpass id, of how the attachment will be used. */
enum anv_subpass_usage * subpass_usage;
+
+   /* The subpass id in which the attachment will be used last. */
+   uint32_t last_subpass_idx;
 };
 
 struct anv_render_pass {
-- 
2.11.0
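
Editorial sketch (not part of the patch): because subpasses are visited in ascending order, overwriting last_subpass_idx on every reference leaves the index of the final subpass that uses each attachment. The struct layout, the fixed-size reference array and the UINT32_MAX marker for unused attachments are all assumptions made for this illustration; the patch itself does not say how unused attachments are initialized.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_MAX_REFS 4

/* Hypothetical, flattened view of a subpass: just the attachment indices it
 * references (input, color and resolve attachments are not distinguished).
 */
struct sketch_subpass {
   uint32_t ref_count;
   uint32_t refs[SKETCH_MAX_REFS];
};

/* Record, for every attachment, the last subpass that references it.
 * Attachments never referenced keep UINT32_MAX (an assumption of this sketch).
 */
static void
sketch_compute_last_use(const struct sketch_subpass *subpasses,
                        uint32_t subpass_count,
                        uint32_t *last_subpass_idx,
                        uint32_t attachment_count)
{
   for (uint32_t a = 0; a < attachment_count; a++)
      last_subpass_idx[a] = UINT32_MAX;

   for (uint32_t i = 0; i < subpass_count; i++) {
      for (uint32_t j = 0; j < subpasses[i].ref_count; j++) {
         const uint32_t a = subpasses[i].refs[j];
         if (a < attachment_count)
            last_subpass_idx[a] = i;   /* later subpasses overwrite earlier ones */
      }
   }
}
```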



[Mesa-dev] [PATCH 16/22] anv/image: Disable HiZ for storage images

2017-01-11 Thread Nanley Chery
Signed-off-by: Nanley Chery 
---
 src/intel/vulkan/anv_image.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/src/intel/vulkan/anv_image.c b/src/intel/vulkan/anv_image.c
index f8a21c2982..7d5beeabbe 100644
--- a/src/intel/vulkan/anv_image.c
+++ b/src/intel/vulkan/anv_image.c
@@ -190,6 +190,12 @@ make_surface(const struct anv_device *dev,
   * input attachments.
   */
  anv_finishme("Implement HiZ for input attachments");
+  } else if (image->usage & VK_IMAGE_USAGE_STORAGE_BIT) {
+ /* Storage images must be in the VK_IMAGE_LAYOUT_GENERAL layout for
+  * load and store operations. For the same reasons as above, disable
+  * HiZ for now.
+  */
+ anv_finishme("Implement HiZ for storage images");
   } else if (!env_var_as_boolean("INTEL_VK_HIZ", dev->info.gen >= 8)) {
  anv_finishme("Implement gen7 HiZ");
   } else if (vk_info->mipLevels > 1) {
-- 
2.11.0



[Mesa-dev] [PATCH 18/22] isl/surface_state: Handle ISL_AUX_USAGE_HIZ

2017-01-11 Thread Nanley Chery
Signed-off-by: Nanley Chery 
---
 src/intel/isl/isl_surface_state.c | 38 +++---
 1 file changed, 35 insertions(+), 3 deletions(-)

diff --git a/src/intel/isl/isl_surface_state.c b/src/intel/isl/isl_surface_state.c
index b9093cc951..54e48eb5da 100644
--- a/src/intel/isl/isl_surface_state.c
+++ b/src/intel/isl/isl_surface_state.c
@@ -498,11 +498,14 @@ isl_genX(surf_fill_state_s)(const struct isl_device *dev, void *state,
assert(info->y_offset_sa % y_div == 0);
s.XOffset = info->x_offset_sa / x_div;
s.YOffset = info->y_offset_sa / y_div;
-#else
-   assert(info->x_offset_sa == 0);
-   assert(info->y_offset_sa == 0);
 #endif
 
+   /* If Auxiliary Surface Mode is not AUX_NONE, this field must be zero. */
+   if ((GEN_GEN == 4 && !GEN_IS_G4X) || info->aux_usage != ISL_AUX_USAGE_NONE) {
+  assert(info->x_offset_sa == 0);
+  assert(info->y_offset_sa == 0);
+   }
+
 #if GEN_GEN >= 7
if (info->aux_surf && info->aux_usage != ISL_AUX_USAGE_NONE) {
   struct isl_tile_info tile_info;
@@ -520,6 +523,26 @@ isl_genX(surf_fill_state_s)(const struct isl_device *dev, void *state,
   s.AuxiliarySurfaceQPitch =
  isl_surf_get_array_pitch_sa_rows(info->aux_surf) >> 2;
   s.AuxiliarySurfaceBaseAddress = info->aux_address;
+
+  if (info->aux_usage == ISL_AUX_USAGE_HIZ) {
+ /* The number of samples must be 1 */
+ assert(info->surf->samples == 1);
+
+ /* The dimension must not be 3D */
+ assert(info->surf->dim != ISL_SURF_DIM_3D);
+
+ /* The format must be one of the following: */
+ switch (info->view->format) {
+ case ISL_FORMAT_R32_FLOAT:
+ case ISL_FORMAT_R24_UNORM_X8_TYPELESS:
+ case ISL_FORMAT_R16_UNORM:
+break;
+ default:
+assert(!"Incompatible HiZ Sampling format");
+break;
+ }
+  }
+
   s.AuxiliarySurfaceMode = isl_to_gen_aux_mode[info->aux_usage];
 #else
   assert(info->aux_usage == ISL_AUX_USAGE_MCS ||
@@ -548,6 +571,15 @@ isl_genX(surf_fill_state_s)(const struct isl_device *dev, void *state,
  s.SamplerL2BypassModeDisable = true;
  break;
   default:
+ /* From the SKL PRM, Programming Note under Sampler Output Channel
+  * Mapping:
+  *
+  *If a surface has an associated HiZ Auxilliary surface, the
+  *Sampler L2 Bypass Mode Disable field in the RENDER_SURFACE_STATE
+  *must be set.
+  */
+ if (GEN_GEN >= 9 && info->aux_usage == ISL_AUX_USAGE_HIZ)
+s.SamplerL2BypassModeDisable = true;
  break;
   }
}
-- 
2.11.0
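
Editorial sketch (not part of the patch): the three restrictions asserted above for sampling with HiZ (single sample, non-3D, one of three depth formats) can be written as a predicate instead of assertions. The enums below are hypothetical stand-ins for the isl ones, and SKETCH_FORMAT_OTHER represents any format outside the listed set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for enum isl_format and enum isl_surf_dim. */
enum sketch_format {
   SKETCH_FORMAT_R32_FLOAT,
   SKETCH_FORMAT_R24_UNORM_X8_TYPELESS,
   SKETCH_FORMAT_R16_UNORM,
   SKETCH_FORMAT_OTHER,
};

enum sketch_dim { SKETCH_DIM_1D, SKETCH_DIM_2D, SKETCH_DIM_3D };

/* Returns true when the surface satisfies the restrictions the patch asserts
 * before programming the HiZ auxiliary surface mode.
 */
static bool
sketch_hiz_sampling_ok(uint32_t samples, enum sketch_dim dim,
                       enum sketch_format view_format)
{
   if (samples != 1)
      return false;
   if (dim == SKETCH_DIM_3D)
      return false;

   switch (view_format) {
   case SKETCH_FORMAT_R32_FLOAT:
   case SKETCH_FORMAT_R24_UNORM_X8_TYPELESS:
   case SKETCH_FORMAT_R16_UNORM:
      return true;
   default:
      return false;
   }
}
```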



[Mesa-dev] [PATCH 06/22] intel/blorp_clear: Add gen8 HiZ clearing functions

2017-01-11 Thread Nanley Chery
Add an entry point for the optimized gen8 BLORP HiZ sequence. Commit
c9eaf12de20ac4143fe79d42018bdbb5a391356f fixed a bug that was
unknowingly worked around by forcing additional clear rectangle
alignment restrictions not specified in the PRMs. Now that the bug is no
longer present, omit the additional alignment restrictions.

Signed-off-by: Nanley Chery 
---
 src/intel/blorp/blorp.h   | 12 +++
 src/intel/blorp/blorp_clear.c | 80 +++
 2 files changed, 92 insertions(+)

diff --git a/src/intel/blorp/blorp.h b/src/intel/blorp/blorp.h
index 823475b607..ff60567fc4 100644
--- a/src/intel/blorp/blorp.h
+++ b/src/intel/blorp/blorp.h
@@ -155,8 +155,20 @@ blorp_clear_depth_stencil(struct blorp_batch *batch,
   uint32_t x0, uint32_t y0, uint32_t x1, uint32_t y1,
   bool clear_depth, float depth_value,
   uint8_t stencil_mask, uint8_t stencil_value);
+bool
+blorp_can_hiz_clear_depth(uint8_t gen, enum isl_format format,
+  uint32_t num_samples,
+  uint32_t x0, uint32_t y0,
+  uint32_t x1, uint32_t y1);
 
 void
+blorp_gen8_hiz_clear_attachments(struct blorp_batch *batch,
+ uint32_t num_samples,
+ uint32_t x0, uint32_t y0,
+ uint32_t x1, uint32_t y1,
+ bool clear_depth, bool clear_stencil,
+ uint8_t stencil_value);
+void
 blorp_clear_attachments(struct blorp_batch *batch,
 uint32_t binding_table_offset,
 enum isl_format depth_format,
diff --git a/src/intel/blorp/blorp_clear.c b/src/intel/blorp/blorp_clear.c
index d090408721..b6db6d2c19 100644
--- a/src/intel/blorp/blorp_clear.c
+++ b/src/intel/blorp/blorp_clear.c
@@ -470,6 +470,86 @@ blorp_clear_depth_stencil(struct blorp_batch *batch,
}
 }
 
+bool
+blorp_can_hiz_clear_depth(uint8_t gen, enum isl_format format,
+  uint32_t num_samples,
+  uint32_t x0, uint32_t y0, uint32_t x1, uint32_t y1)
+{
+   /* This function currently doesn't support any gen prior to gen8 */
+   assert(gen >= 8);
+
+   if (gen == 8 && format == ISL_FORMAT_R16_UNORM) {
+  /* Apply the D16 alignment restrictions. On BDW, HiZ has an 8x4 sample
+   * block with the following property: as the number of samples increases,
+   * the number of pixels representable by this block decreases by a factor
+   * of the sample dimensions. Sample dimensions scale following the MSAA
+   * interleaved pattern.
+   *
+   * Sample|Sample|Pixel
+   * Count |Dim   |Dim
+   * ===
+   *1  | 1x1  | 8x4
+   *2  | 2x1  | 4x4
+   *4  | 2x2  | 4x2
+   *8  | 4x2  | 2x2
+   *   16  | 4x4  | 2x1
+   *
+   * Table: Pixel Dimensions in a HiZ Sample Block Pre-SKL
+   */
+  const struct isl_extent2d sa_block_dim =
+ isl_get_interleaved_msaa_px_size_sa(num_samples);
+  const uint8_t align_px_w = 8 / sa_block_dim.w;
+  const uint8_t align_px_h = 4 / sa_block_dim.h;
+
+  /* Fast depth clears clear an entire sample block at a time. As a result,
+   * the rectangle must be aligned to the dimensions of the encompassing
+   * pixel block for a successful operation.
+   *
+   * Fast clears can still work if the upper-left corner is aligned and the
+   * bottom-right corner touches the edge of a depth buffer whose extent
+   * is unaligned. This is because each miplevel in the depth buffer is
+   * padded by the Pixel Dim (similar to a standard compressed texture).
+   * In this case, the clear rectangle could be padded to match the
+   * full depth buffer extent later on in this function but to
+   * support multiple clearing techniques, we chose to be unaware of the
+   * depth buffer's extent and thus don't handle this case.
+   */
+  if (x0 % align_px_w || y0 % align_px_h ||
+  x1 % align_px_w || y1 % align_px_h)
+ return false;
+   }
+   return true;
+}
+
+/* Given a depth stencil attachment, this function performs a fast depth clear
+ * on a depth portion and a regular clear on the stencil portion. When
+ * performing a fast depth clear on the depth portion, the HiZ buffer is simply
+ * tagged as cleared so the depth clear value is not actually needed.
+ */
+void
+blorp_gen8_hiz_clear_attachments(struct blorp_batch *batch,
+ uint32_t num_samples,
+ uint32_t x0, uint32_t y0,
+ uint32_t x1, uint32_t y1,
+ bool clear_depth, bool clear_stencil,
+ uint8_t stencil_value)
+{
+   assert(batch->flags & 
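
Editorial sketch (not part of the patch): the D16 alignment rule for gen8 described in the comment of blorp_can_hiz_clear_depth() can be worked through with a small standalone check. sketch_sample_dim() is a hypothetical replacement for isl_get_interleaved_msaa_px_size_sa() that only encodes the "Sample Dim" column of the table in that comment; the other names are likewise invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical lookup of the "Sample Dim" column from the patch comment.
 * Returns false for sample counts the table does not list.
 */
static bool
sketch_sample_dim(uint32_t num_samples, uint32_t *w, uint32_t *h)
{
   switch (num_samples) {
   case 1:  *w = 1; *h = 1; return true;
   case 2:  *w = 2; *h = 1; return true;
   case 4:  *w = 2; *h = 2; return true;
   case 8:  *w = 4; *h = 2; return true;
   case 16: *w = 4; *h = 4; return true;
   default: return false;
   }
}

/* An 8x4 sample block covers (8 / sample_w) x (4 / sample_h) pixels, and all
 * four rectangle coordinates must be multiples of that pixel block.
 */
static bool
sketch_d16_rect_is_aligned(uint32_t num_samples,
                           uint32_t x0, uint32_t y0,
                           uint32_t x1, uint32_t y1)
{
   uint32_t sa_w, sa_h;
   if (!sketch_sample_dim(num_samples, &sa_w, &sa_h))
      return false;

   const uint32_t align_px_w = 8 / sa_w;
   const uint32_t align_px_h = 4 / sa_h;

   return x0 % align_px_w == 0 && y0 % align_px_h == 0 &&
          x1 % align_px_w == 0 && y1 % align_px_h == 0;
}
```

For example, the rectangle (4, 2) to (12, 6) is rejected at one sample per pixel, where the pixel block is 8x4, but accepted at four samples per pixel, where the block shrinks to 4x2.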

[Mesa-dev] [PATCH 20/22] anv/blorp: Don't fast depth clear samplable HiZ buffers on BDW

2017-01-11 Thread Nanley Chery
Avoid the resolves that would be required if fast depth clears were
allowed for such buffers.

Signed-off-by: Nanley Chery 
---
 src/intel/vulkan/anv_blorp.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/src/intel/vulkan/anv_blorp.c b/src/intel/vulkan/anv_blorp.c
index 5d410f7d86..4649ffd9db 100644
--- a/src/intel/vulkan/anv_blorp.c
+++ b/src/intel/vulkan/anv_blorp.c
@@ -1270,6 +1270,15 @@ anv_cmd_buffer_clear_subpass(struct anv_cmd_buffer *cmd_buffer)
 * ANV_HZ_FC_VAL.
 */
clear_with_hiz = false;
+} else if (gen == 8 &&
+   anv_can_sample_with_hiz(cmd_buffer->device->info.gen,
+   iview->image->samples)) {
+   /* Only gen9+ supports returning ANV_HZ_FC_VAL when sampling a
+* fast-cleared portion of a HiZ buffer. Testing has revealed
+* that Gen8 only supports returning 0.0f. Gens prior to gen8 do
+* not support this feature at all.
+*/
+   clear_with_hiz = false;
 }
  }
 
-- 
2.11.0
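
Editorial sketch (not part of the patch): taken together with the clear-value check from patch 14, the condition added here amounts to the following predicate. It deliberately ignores the render-area and alignment checks that precede it in anv_cmd_buffer_clear_subpass(), and all sketch_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_HZ_FC_VAL 1.0f   /* mirrors ANV_HZ_FC_VAL from patch 14 */

/* Same predicate as anv_can_sample_with_hiz() in patch 19. */
static bool
sketch_can_sample_with_hiz(uint8_t gen, uint32_t samples)
{
   return gen >= 8 && samples == 1;
}

/* Fast depth clears are allowed only for the global clear value, and not on
 * gen8 for buffers that may later be sampled through HiZ, since gen8 was
 * observed to return 0.0f for fast-cleared regions.
 */
static bool
sketch_allow_fast_depth_clear(uint8_t gen, uint32_t samples,
                              float depth_clear_value)
{
   if (depth_clear_value != SKETCH_HZ_FC_VAL)
      return false;
   if (gen == 8 && sketch_can_sample_with_hiz(gen, samples))
      return false;
   return true;
}
```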



[Mesa-dev] [PATCH 17/22] anv: Perform HiZ resolves only on layout transitions

2017-01-11 Thread Nanley Chery
This is a better mapping to the Vulkan API and improves performance in
all tested workloads.

Signed-off-by: Nanley Chery 
---
 src/intel/vulkan/anv_blorp.c   | 48 ++---
 src/intel/vulkan/genX_cmd_buffer.c | 54 +++---
 2 files changed, 46 insertions(+), 56 deletions(-)

diff --git a/src/intel/vulkan/anv_blorp.c b/src/intel/vulkan/anv_blorp.c
index 9919ac7ea0..5d410f7d86 100644
--- a/src/intel/vulkan/anv_blorp.c
+++ b/src/intel/vulkan/anv_blorp.c
@@ -1579,52 +1579,8 @@ anv_gen8_hiz_op_resolve(struct anv_cmd_buffer *cmd_buffer,
image->aux_usage != ISL_AUX_USAGE_HIZ)
   return;
 
-   const struct anv_cmd_state *cmd_state = &cmd_buffer->state;
-   const uint32_t ds = cmd_state->subpass->depth_stencil_attachment;
-
-   /* Section 7.4. of the Vulkan 1.0.27 spec states:
-*
-*   "The render area must be contained within the framebuffer dimensions."
-*
-* Therefore, the only way the extent of the render area can match that of
-* the image view is if the render area offset equals (0, 0).
-*/
-   const bool full_surface_op =
- cmd_state->render_area.extent.width == image->extent.width &&
- cmd_state->render_area.extent.height == image->extent.height;
-   if (full_surface_op)
-  assert(cmd_state->render_area.offset.x == 0 &&
- cmd_state->render_area.offset.y == 0);
-
-   /* Check the subpass index to determine if skipping a resolve is allowed */
-   const uint32_t subpass_idx = cmd_state->subpass - cmd_state->pass->subpasses;
-   switch (op) {
-   case BLORP_HIZ_OP_DEPTH_RESOLVE:
-  if (cmd_buffer->state.pass->attachments[ds].store_op !=
-  VK_ATTACHMENT_STORE_OP_STORE &&
-  subpass_idx == cmd_state->pass->subpass_count - 1)
- return;
-  break;
-   case BLORP_HIZ_OP_HIZ_RESOLVE:
-  /* If the render area covers the entire surface *and* load_op is either
-   * CLEAR or DONT_CARE then the previous contents of the depth buffer
-   * will be entirely discarded.  In this case, we can skip the HiZ
-   * resolve.
-   *
-   * If the render area is not the full surface, we need to do
-   * the resolve because otherwise data outside the render area may get
-   * garbled by the resolve at the end of the render pass.
-   */
-  if (full_surface_op &&
-  cmd_buffer->state.pass->attachments[ds].load_op !=
-  VK_ATTACHMENT_LOAD_OP_LOAD && subpass_idx == 0)
- return;
-  break;
-   case BLORP_HIZ_OP_DEPTH_CLEAR:
-   case BLORP_HIZ_OP_NONE:
-  unreachable("Invalid HiZ OP");
-   }
-
+   assert(op == BLORP_HIZ_OP_HIZ_RESOLVE ||
+  op == BLORP_HIZ_OP_DEPTH_RESOLVE);
 
struct blorp_batch batch;
blorp_batch_init(&cmd_buffer->device->blorp, &batch, cmd_buffer, 0);
diff --git a/src/intel/vulkan/genX_cmd_buffer.c b/src/intel/vulkan/genX_cmd_buffer.c
index 1793c4df26..447baa08b2 100644
--- a/src/intel/vulkan/genX_cmd_buffer.c
+++ b/src/intel/vulkan/genX_cmd_buffer.c
@@ -510,7 +510,13 @@ genX(cmd_buffer_setup_attachments)(struct anv_cmd_buffer *cmd_buffer,
   state->attachments[i].aux_usage,
   state->attachments[i].color_rt_state);
  } else {
-state->attachments[i].aux_usage = iview->image->aux_usage;
+if (iview->image->aux_usage == ISL_AUX_USAGE_HIZ &&
+iview->aspect_mask & VK_IMAGE_ASPECT_DEPTH_BIT) {
+   state->attachments[i].aux_usage =
+  layout_to_hiz_usage(att->initial_layout);
+} else {
+   state->attachments[i].aux_usage = ISL_AUX_USAGE_NONE;
+}
 state->attachments[i].input_aux_usage = ISL_AUX_USAGE_NONE;
  }
 
@@ -915,6 +921,13 @@ void genX(CmdPipelineBarrier)(
for (uint32_t i = 0; i < imageMemoryBarrierCount; i++) {
   src_flags |= pImageMemoryBarriers[i].srcAccessMask;
   dst_flags |= pImageMemoryBarriers[i].dstAccessMask;
+  ANV_FROM_HANDLE(anv_image, image, pImageMemoryBarriers[i].image);
+  if (pImageMemoryBarriers[i].subresourceRange.aspectMask &
+  VK_IMAGE_ASPECT_DEPTH_BIT) {
+ transition_depth_buffer(cmd_buffer, image,
+ pImageMemoryBarriers[i].oldLayout,
+ pImageMemoryBarriers[i].newLayout);
+  }
}
 
enum anv_pipe_bits pipe_bits = 0;
@@ -2297,9 +2310,16 @@ genX(cmd_buffer_set_subpass)(struct anv_cmd_buffer *cmd_buffer,
const struct anv_image_view *iview =
   anv_cmd_buffer_get_depth_stencil_view(cmd_buffer);
 
-   if (iview) {
-  anv_gen8_hiz_op_resolve(cmd_buffer, iview->image,
-  BLORP_HIZ_OP_HIZ_RESOLVE);
+   if (iview && iview->image->aux_usage == ISL_AUX_USAGE_HIZ &&
+   iview->aspect_mask & VK_IMAGE_ASPECT_DEPTH_BIT) {
+  const uint32_t ds = subpass->depth_stencil_attachment;
+  

[Mesa-dev] [PATCH 08/22] anv/blorp: Add a gen8 HiZ op resolve function

2017-01-11 Thread Nanley Chery
Add an entry point for resolving using BLORP's gen8 HiZ op function.

Signed-off-by: Nanley Chery 
---
 src/intel/vulkan/anv_blorp.c   | 74 ++
 src/intel/vulkan/anv_private.h |  5 +++
 2 files changed, 79 insertions(+)

diff --git a/src/intel/vulkan/anv_blorp.c b/src/intel/vulkan/anv_blorp.c
index a77913db25..433e82f938 100644
--- a/src/intel/vulkan/anv_blorp.c
+++ b/src/intel/vulkan/anv_blorp.c
@@ -1558,3 +1558,77 @@ anv_cmd_buffer_resolve_subpass(struct anv_cmd_buffer *cmd_buffer)
 
blorp_batch_finish(&batch);
 }
+
+void
+anv_gen8_hiz_op_resolve(struct anv_cmd_buffer *cmd_buffer,
+const struct anv_image *image,
+enum blorp_hiz_op op)
+{
+   assert(image);
+
+   /* Don't resolve depth buffers without an auxiliary HiZ buffer and
+* don't perform such a resolve on gens that don't support it.
+*/
+   if (cmd_buffer->device->info.gen < 8 ||
+   image->aux_usage != ISL_AUX_USAGE_HIZ)
+  return;
+
+   const struct anv_cmd_state *cmd_state = &cmd_buffer->state;
+   const uint32_t ds = cmd_state->subpass->depth_stencil_attachment;
+
+   /* Section 7.4. of the Vulkan 1.0.27 spec states:
+*
+*   "The render area must be contained within the framebuffer dimensions."
+*
+* Therefore, the only way the extent of the render area can match that of
+* the image view is if the render area offset equals (0, 0).
+*/
+   const bool full_surface_op =
+ cmd_state->render_area.extent.width == image->extent.width &&
+ cmd_state->render_area.extent.height == image->extent.height;
+   if (full_surface_op)
+  assert(cmd_state->render_area.offset.x == 0 &&
+ cmd_state->render_area.offset.y == 0);
+
+   /* Check the subpass index to determine if skipping a resolve is allowed */
+   const uint32_t subpass_idx = cmd_state->subpass - cmd_state->pass->subpasses;
+   switch (op) {
+   case BLORP_HIZ_OP_DEPTH_RESOLVE:
+  if (cmd_buffer->state.pass->attachments[ds].store_op !=
+  VK_ATTACHMENT_STORE_OP_STORE &&
+  subpass_idx == cmd_state->pass->subpass_count - 1)
+ return;
+  break;
+   case BLORP_HIZ_OP_HIZ_RESOLVE:
+  /* If the render area covers the entire surface *and* load_op is either
+   * CLEAR or DONT_CARE then the previous contents of the depth buffer
+   * will be entirely discarded.  In this case, we can skip the HiZ
+   * resolve.
+   *
+   * If the render area is not the full surface, we need to do
+   * the resolve because otherwise data outside the render area may get
+   * garbled by the resolve at the end of the render pass.
+   */
+  if (full_surface_op &&
+  cmd_buffer->state.pass->attachments[ds].load_op !=
+  VK_ATTACHMENT_LOAD_OP_LOAD && subpass_idx == 0)
+ return;
+  break;
+   case BLORP_HIZ_OP_DEPTH_CLEAR:
+   case BLORP_HIZ_OP_NONE:
+  unreachable("Invalid HiZ OP");
+   }
+
+
+   struct blorp_batch batch;
+   blorp_batch_init(&cmd_buffer->device->blorp, &batch, cmd_buffer, 0);
+
+   struct blorp_surf surf;
+   get_blorp_surf_for_anv_image(image, VK_IMAGE_ASPECT_DEPTH_BIT,
+ISL_AUX_USAGE_HIZ, &surf);
+   surf.clear_color.u32[0] = (uint32_t)
+  cmd_state->attachments[ds].clear_value.depthStencil.depth;
+
+   blorp_gen6_hiz_op(&batch, &surf, 0, 0, op);
+   blorp_batch_finish(&batch);
+}
diff --git a/src/intel/vulkan/anv_private.h b/src/intel/vulkan/anv_private.h
index 4b2cac5c48..84cc15a0b5 100644
--- a/src/intel/vulkan/anv_private.h
+++ b/src/intel/vulkan/anv_private.h
@@ -1587,6 +1587,11 @@ struct anv_image {
struct anv_surface aux_surface;
 };
 
+void
+anv_gen8_hiz_op_resolve(struct anv_cmd_buffer *cmd_buffer,
+const struct anv_image *image,
+enum blorp_hiz_op op);
+
 static inline uint32_t
 anv_get_layerCount(const struct anv_image *image,
const VkImageSubresourceRange *range)
-- 
2.11.0
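
Editorial sketch (not part of the patch): the resolve-skipping rules in the switch statement of anv_gen8_hiz_op_resolve() above reduce to the predicate below. The enums and the sketch_pass_state struct are hypothetical simplifications of the command-buffer and render-pass state consulted by the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the blorp and Vulkan enums used by the patch. */
enum sketch_hiz_op { SKETCH_OP_DEPTH_RESOLVE, SKETCH_OP_HIZ_RESOLVE };
enum sketch_load_op { SKETCH_LOAD, SKETCH_LOAD_CLEAR, SKETCH_LOAD_DONT_CARE };
enum sketch_store_op { SKETCH_STORE, SKETCH_STORE_DONT_CARE };

struct sketch_pass_state {
   bool full_surface_op;          /* render area covers the whole image */
   enum sketch_load_op load_op;   /* load op of the depth attachment */
   enum sketch_store_op store_op; /* store op of the depth attachment */
   uint32_t subpass_idx;          /* index of the current subpass */
   uint32_t subpass_count;        /* number of subpasses in the pass */
};

/* Returns true when the requested resolve may be skipped. */
static bool
sketch_can_skip_resolve(enum sketch_hiz_op op,
                        const struct sketch_pass_state *s)
{
   switch (op) {
   case SKETCH_OP_DEPTH_RESOLVE:
      /* Nothing reads the depth data after the last subpass unless stored. */
      return s->store_op != SKETCH_STORE &&
             s->subpass_idx == s->subpass_count - 1;
   case SKETCH_OP_HIZ_RESOLVE:
      /* Previous contents are discarded when the whole surface is cleared
       * or left undefined at the start of the first subpass.
       */
      return s->full_surface_op && s->load_op != SKETCH_LOAD &&
             s->subpass_idx == 0;
   }
   return false;
}
```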



[Mesa-dev] [PATCH 14/22] anv: Avoid resolves incurred by fast depth clears

2017-01-11 Thread Nanley Chery
Signed-off-by: Nanley Chery 
---
 src/intel/vulkan/anv_blorp.c   |  9 +++--
 src/intel/vulkan/anv_private.h | 15 +++
 src/intel/vulkan/genX_cmd_buffer.c |  5 +
 3 files changed, 23 insertions(+), 6 deletions(-)

diff --git a/src/intel/vulkan/anv_blorp.c b/src/intel/vulkan/anv_blorp.c
index 433e82f938..9919ac7ea0 100644
--- a/src/intel/vulkan/anv_blorp.c
+++ b/src/intel/vulkan/anv_blorp.c
@@ -1264,6 +1264,12 @@ anv_cmd_buffer_clear_subpass(struct anv_cmd_buffer *cmd_buffer)
render_area.offset.y +
render_area.extent.height)) {
clear_with_hiz = false;
+} else if (clear_att.clearValue.depthStencil.depth !=
+   ANV_HZ_FC_VAL) {
+   /* Don't enable fast depth clears for any color not equal to
+* ANV_HZ_FC_VAL.
+*/
+   clear_with_hiz = false;
 }
  }
 
@@ -1626,8 +1632,7 @@ anv_gen8_hiz_op_resolve(struct anv_cmd_buffer *cmd_buffer,
struct blorp_surf surf;
get_blorp_surf_for_anv_image(image, VK_IMAGE_ASPECT_DEPTH_BIT,
 ISL_AUX_USAGE_HIZ, &surf);
-   surf.clear_color.u32[0] = (uint32_t)
-  cmd_state->attachments[ds].clear_value.depthStencil.depth;
+   surf.clear_color.u32[0] = (uint32_t) ANV_HZ_FC_VAL;
 
blorp_gen6_hiz_op(&batch, &surf, 0, 0, op);
blorp_batch_finish(&batch);
diff --git a/src/intel/vulkan/anv_private.h b/src/intel/vulkan/anv_private.h
index 237308fb3e..98692b5913 100644
--- a/src/intel/vulkan/anv_private.h
+++ b/src/intel/vulkan/anv_private.h
@@ -72,6 +72,21 @@ struct gen_l3_config;
 extern "C" {
 #endif
 
+/* Allowing different clear colors requires us to perform a depth resolve at
+ * the end of certain render passes. This is because while slow clears store
+ * the clear color in the HiZ buffer, fast clears (without a resolve) don't.
+ * See the PRMs for examples describing when additional resolves would be
+ * necessary. To enable fast clears without requiring extra resolves, we set
+ * the clear value to a globally-defined one. We could allow different values
+ * if the user doesn't expect coherent data during or after a render pass
+ * (VK_ATTACHMENT_STORE_OP_DONT_CARE), but such users (aside from the CTS)
+ * don't seem to exist yet. In almost all Vulkan applications tested thus far,
+ * 1.0f seems to be the only value used. The only application that doesn't set
+ * this value does so through the usage of a seemingly uninitialized clear
+ * value.
+ */
+#define ANV_HZ_FC_VAL 1.0f
+
 #define MAX_VBS 32
 #define MAX_SETS 8
 #define MAX_RTS  8
diff --git a/src/intel/vulkan/genX_cmd_buffer.c b/src/intel/vulkan/genX_cmd_buffer.c
index 95d0cfc983..baa932e517 100644
--- a/src/intel/vulkan/genX_cmd_buffer.c
+++ b/src/intel/vulkan/genX_cmd_buffer.c
@@ -2283,10 +2283,7 @@ cmd_buffer_emit_depth_stencil(struct anv_cmd_buffer *cmd_buffer)
anv_batch_emit(&cmd_buffer->batch, GENX(3DSTATE_CLEAR_PARAMS), cp) {
   if (has_hiz) {
  cp.DepthClearValueValid = true;
- const uint32_t ds =
-cmd_buffer->state.subpass->depth_stencil_attachment;
- cp.DepthClearValue =
-cmd_buffer->state.attachments[ds].clear_value.depthStencil.depth;
+ cp.DepthClearValue = ANV_HZ_FC_VAL;
   }
}
 }
-- 
2.11.0



[Mesa-dev] [PATCH 09/22] anv: Use the gen8 BLORP HiZ resolving function

2017-01-11 Thread Nanley Chery
Signed-off-by: Nanley Chery 
---
 src/intel/vulkan/genX_cmd_buffer.c | 27 ---
 1 file changed, 24 insertions(+), 3 deletions(-)

diff --git a/src/intel/vulkan/genX_cmd_buffer.c b/src/intel/vulkan/genX_cmd_buffer.c
index 63f6be12a8..74369f6ba1 100644
--- a/src/intel/vulkan/genX_cmd_buffer.c
+++ b/src/intel/vulkan/genX_cmd_buffer.c
@@ -2248,8 +2248,15 @@ genX(cmd_buffer_set_subpass)(struct anv_cmd_buffer *cmd_buffer,
 
cmd_buffer->state.dirty |= ANV_CMD_DIRTY_RENDER_TARGETS;
 
+   const struct anv_image_view *iview =
+  anv_cmd_buffer_get_depth_stencil_view(cmd_buffer);
+
+   if (iview) {
+  anv_gen8_hiz_op_resolve(cmd_buffer, iview->image,
+  BLORP_HIZ_OP_HIZ_RESOLVE);
+   }
+
cmd_buffer_emit_depth_stencil(cmd_buffer);
-   genX(cmd_buffer_emit_hz_op)(cmd_buffer, BLORP_HIZ_OP_HIZ_RESOLVE);
 
anv_cmd_buffer_clear_subpass(cmd_buffer);
 }
@@ -2281,7 +2288,14 @@ void genX(CmdNextSubpass)(
 
assert(cmd_buffer->level == VK_COMMAND_BUFFER_LEVEL_PRIMARY);
 
-   genX(cmd_buffer_emit_hz_op)(cmd_buffer, BLORP_HIZ_OP_DEPTH_RESOLVE);
+   const struct anv_image_view *iview =
+  anv_cmd_buffer_get_depth_stencil_view(cmd_buffer);
+
+   if (iview) {
+  anv_gen8_hiz_op_resolve(cmd_buffer, iview->image,
+  BLORP_HIZ_OP_DEPTH_RESOLVE);
+   }
+
anv_cmd_buffer_resolve_subpass(cmd_buffer);
genX(cmd_buffer_set_subpass)(cmd_buffer, cmd_buffer->state.subpass + 1);
 }
@@ -2291,7 +2305,14 @@ void genX(CmdEndRenderPass)(
 {
ANV_FROM_HANDLE(anv_cmd_buffer, cmd_buffer, commandBuffer);
 
-   genX(cmd_buffer_emit_hz_op)(cmd_buffer, BLORP_HIZ_OP_DEPTH_RESOLVE);
+   const struct anv_image_view *iview =
+  anv_cmd_buffer_get_depth_stencil_view(cmd_buffer);
+
+   if (iview) {
+  anv_gen8_hiz_op_resolve(cmd_buffer, iview->image,
+  BLORP_HIZ_OP_DEPTH_RESOLVE);
+   }
+
anv_cmd_buffer_resolve_subpass(cmd_buffer);
 
 #ifndef NDEBUG
-- 
2.11.0



[Mesa-dev] [PATCH 10/22] anv: Delete anv's HiZ op emit function

2017-01-11 Thread Nanley Chery
This is no longer used.

Signed-off-by: Nanley Chery 
---
 src/intel/vulkan/anv_genX.h|   3 -
 src/intel/vulkan/gen7_cmd_buffer.c |   7 --
 src/intel/vulkan/gen8_cmd_buffer.c | 223 -
 3 files changed, 233 deletions(-)

diff --git a/src/intel/vulkan/anv_genX.h b/src/intel/vulkan/anv_genX.h
index 35ee3bb380..d04fe38a51 100644
--- a/src/intel/vulkan/anv_genX.h
+++ b/src/intel/vulkan/anv_genX.h
@@ -61,9 +61,6 @@ genX(emit_urb_setup)(struct anv_device *device, struct 
anv_batch *batch,
  VkShaderStageFlags active_stages,
  const unsigned entry_size[4]);
 
-void genX(cmd_buffer_emit_hz_op)(struct anv_cmd_buffer *cmd_buffer,
-   enum blorp_hiz_op op);
-
 void genX(cmd_buffer_gpu_memcpy)(struct anv_cmd_buffer *cmd_buffer,
  struct anv_bo *dst, uint32_t dst_offset,
  struct anv_bo *src, uint32_t src_offset,
diff --git a/src/intel/vulkan/gen7_cmd_buffer.c 
b/src/intel/vulkan/gen7_cmd_buffer.c
index 38e400b2d1..8d68aba9c9 100644
--- a/src/intel/vulkan/gen7_cmd_buffer.c
+++ b/src/intel/vulkan/gen7_cmd_buffer.c
@@ -256,13 +256,6 @@ genX(cmd_buffer_flush_dynamic_state)(struct anv_cmd_buffer 
*cmd_buffer)
cmd_buffer->state.dirty = 0;
 }
 
-void
-genX(cmd_buffer_emit_hz_op)(struct anv_cmd_buffer *cmd_buffer,
-  enum blorp_hiz_op op)
-{
-   anv_finishme("Implement Gen7 HZ ops");
-}
-
 void genX(CmdSetEvent)(
 VkCommandBuffer commandBuffer,
 VkEvent event,
diff --git a/src/intel/vulkan/gen8_cmd_buffer.c 
b/src/intel/vulkan/gen8_cmd_buffer.c
index 81d7727130..f22037b570 100644
--- a/src/intel/vulkan/gen8_cmd_buffer.c
+++ b/src/intel/vulkan/gen8_cmd_buffer.c
@@ -322,229 +322,6 @@ void genX(CmdBindIndexBuffer)(
cmd_buffer->state.dirty |= ANV_CMD_DIRTY_INDEX_BUFFER;
 }
 
-
-/**
- * Emit the HZ_OP packet in the sequence specified by the BDW PRM section
- * entitled: "Optimized Depth Buffer Clear and/or Stencil Buffer Clear."
- *
- * \todo Enable Stencil Buffer-only clears
- */
-void
-genX(cmd_buffer_emit_hz_op)(struct anv_cmd_buffer *cmd_buffer,
-  enum blorp_hiz_op op)
-{
-   struct anv_cmd_state *cmd_state = &cmd_buffer->state;
-   const struct anv_image_view *iview =
-  anv_cmd_buffer_get_depth_stencil_view(cmd_buffer);
-
-   if (iview == NULL || iview->image->aux_usage != ISL_AUX_USAGE_HIZ)
-  return;
-
-   const uint32_t ds = cmd_state->subpass->depth_stencil_attachment;
-
-   /* Section 7.4. of the Vulkan 1.0.27 spec states:
-*
-*   "The render area must be contained within the framebuffer dimensions."
-*
-* Therefore, the only way the extent of the render area can match that of
-* the image view is if the render area offset equals (0, 0).
-*/
-   const bool full_surface_op =
- cmd_state->render_area.extent.width == iview->extent.width &&
- cmd_state->render_area.extent.height == iview->extent.height;
-   if (full_surface_op)
-  assert(cmd_state->render_area.offset.x == 0 &&
- cmd_state->render_area.offset.y == 0);
-
-   bool depth_clear;
-   bool stencil_clear;
-
-   /* This variable corresponds to the Pixel Dim column in the table below */
-   struct isl_extent2d px_dim;
-
-   const uint32_t subpass_idx = cmd_state->subpass - cmd_state->pass->subpasses;
-
-   /* Validate that we can perform the HZ operation and that it's necessary. */
-   switch (op) {
-   case BLORP_HIZ_OP_DEPTH_CLEAR:
-  stencil_clear = VK_IMAGE_ASPECT_STENCIL_BIT &
-  cmd_state->attachments[ds].pending_clear_aspects;
-  depth_clear = VK_IMAGE_ASPECT_DEPTH_BIT &
-cmd_state->attachments[ds].pending_clear_aspects;
-
-  /* Apply alignment restrictions. Despite the BDW PRM mentioning this is
-   * only needed for a depth buffer surface type of D16_UNORM, testing
-   * showed it to be necessary for other depth formats as well
-   * (e.g., D32_FLOAT).
-   */
-#if GEN_GEN == 8
-  /* Pre-SKL, HiZ has an 8x4 sample block. As the number of samples
-   * increases, the number of pixels representable by this block
-   * decreases by a factor of the sample dimensions. Sample dimensions
-   * scale following the MSAA interleaved pattern.
-   *
-   * Sample|Sample|Pixel
-   * Count |Dim   |Dim
-   * ===
-   *1  | 1x1  | 8x4
-   *2  | 2x1  | 4x4
-   *4  | 2x2  | 4x2
-   *8  | 4x2  | 2x2
-   *   16  | 4x4  | 2x1
-   *
-   * Table: Pixel Dimensions in a HiZ Sample Block Pre-SKL
-   */
-  /* This variable corresponds to the Sample Dim column in the table
-   * above.
-   */
-  const struct isl_extent2d sa_dim =
- isl_get_interleaved_msaa_px_size_sa(iview->image->samples);
-  px_dim.w = 8 / 
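
For reference, the Pixel Dim column in the table above follows from dividing the pre-SKL 8x4 HiZ sample block by the interleaved MSAA sample dimensions. A minimal standalone sketch, assuming hypothetical names (`struct extent2d`, `msaa_sample_dim`, `hiz_block_px_dim`) in place of the ISL types and helpers the driver uses:

```c
#include <assert.h>

/* Hypothetical stand-in for isl_extent2d. */
struct extent2d { unsigned w, h; };

/* Sample dimensions of the interleaved MSAA pattern (the Sample Dim column). */
static struct extent2d
msaa_sample_dim(unsigned samples)
{
   switch (samples) {
   case 1:  return (struct extent2d){ 1, 1 };
   case 2:  return (struct extent2d){ 2, 1 };
   case 4:  return (struct extent2d){ 2, 2 };
   case 8:  return (struct extent2d){ 4, 2 };
   case 16: return (struct extent2d){ 4, 4 };
   default:
      assert(!"unsupported sample count");
      return (struct extent2d){ 1, 1 };
   }
}

/* Pixels covered by one 8x4 HiZ sample block (the Pixel Dim column). */
static struct extent2d
hiz_block_px_dim(unsigned samples)
{
   const struct extent2d sa = msaa_sample_dim(samples);
   return (struct extent2d){ 8 / sa.w, 4 / sa.h };
}
```

Calling `hiz_block_px_dim()` with each sample count from the table reproduces the corresponding row.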

[Mesa-dev] [PATCH 15/22] anv: Disable HiZ for input attachments

2017-01-11 Thread Nanley Chery
Signed-off-by: Nanley Chery 
---
 src/intel/vulkan/anv_image.c   | 17 +
 src/intel/vulkan/genX_cmd_buffer.c | 10 --
 2 files changed, 17 insertions(+), 10 deletions(-)

diff --git a/src/intel/vulkan/anv_image.c b/src/intel/vulkan/anv_image.c
index d821629191..f8a21c2982 100644
--- a/src/intel/vulkan/anv_image.c
+++ b/src/intel/vulkan/anv_image.c
@@ -182,6 +182,14 @@ make_surface(const struct anv_device *dev,
*/
   if (!(image->usage & VK_IMAGE_USAGE_DEPTH_STENCIL_ATTACHMENT_BIT)) {
  /* It will never be used as an attachment, HiZ is pointless. */
+  } else if (image->usage & VK_IMAGE_USAGE_INPUT_ATTACHMENT_BIT) {
+ /* It will never have a layout of
+  * VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL, so HiZ is
+  * currently pointless. If transfer operations learn to use the HiZ
+  * buffer, we can enable HiZ for VK_IMAGE_LAYOUT_GENERAL and support
+  * input attachments.
+  */
+ anv_finishme("Implement HiZ for input attachments");
   } else if (!env_var_as_boolean("INTEL_VK_HIZ", dev->info.gen >= 8)) {
  anv_finishme("Implement gen7 HiZ");
   } else if (vk_info->mipLevels > 1) {
@@ -529,14 +537,15 @@ anv_CreateImageView(VkDevice _device,
if (surf_usage == ISL_AUX_USAGE_HIZ)
   surf_usage = ISL_AUX_USAGE_NONE;
 
-   /* Input attachment surfaces for color or depth are allocated and filled
+   /* Input attachment surfaces for color are allocated and filled
 * out at BeginRenderPass time because they need compression information.
-* Stencil image do not support compression so we just use the texture
-* surface from the image view.
+* Compression is not yet enabled for depth textures and stencil doesn't
+* allow compression so we can just use the texture surface state from the
+* view.
 */
if (image->usage & VK_IMAGE_USAGE_SAMPLED_BIT ||
(image->usage & VK_IMAGE_USAGE_INPUT_ATTACHMENT_BIT &&
-(iview->aspect_mask & VK_IMAGE_ASPECT_STENCIL_BIT))) {
+!(iview->aspect_mask & VK_IMAGE_ASPECT_COLOR_BIT))) {
   iview->sampler_surface_state = alloc_surface_state(device);
 
   struct isl_view view = iview->isl;
diff --git a/src/intel/vulkan/genX_cmd_buffer.c 
b/src/intel/vulkan/genX_cmd_buffer.c
index baa932e517..1793c4df26 100644
--- a/src/intel/vulkan/genX_cmd_buffer.c
+++ b/src/intel/vulkan/genX_cmd_buffer.c
@@ -303,11 +303,11 @@ need_input_attachment_state(const struct 
anv_render_pass_attachment *att)
if (!(att->usage & VK_IMAGE_USAGE_INPUT_ATTACHMENT_BIT))
   return false;
 
-   /* We only allocate input attachment states for color and depth surfaces.
-* Stencil doesn't allow compression so we can just use the texture surface
-* state from the view
+   /* We only allocate input attachment states for color surfaces. Compression
+* is not yet enabled for depth textures and stencil doesn't allow
+* compression so we can just use the texture surface state from the view.
 */
-   return vk_format_is_color(att->format) || vk_format_has_depth(att->format);
+   return vk_format_is_color(att->format);
 }
 
 static enum isl_aux_usage
@@ -518,8 +518,6 @@ genX(cmd_buffer_setup_attachments)(struct anv_cmd_buffer 
*cmd_buffer,
 const struct isl_surf *surf;
 if (att_aspects == VK_IMAGE_ASPECT_COLOR_BIT) {
surf = &iview->image->color_surface.isl;
-} else {
-   surf = &iview->image->depth_surface.isl;
 }
 
 struct isl_view view = iview->isl;
-- 
2.11.0
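
The usage checks this patch adds to make_surface() can be modelled on their own. The sketch below is an illustration only: `usage_allows_hiz` is a hypothetical name, the two constants are local copies of the VkImageUsageFlagBits values so that it builds without the Vulkan headers, and the later checks in make_surface() (gen, mip levels and so on) are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local copies of two VkImageUsageFlagBits values. */
#define USAGE_DEPTH_STENCIL_ATTACHMENT_BIT 0x00000020u
#define USAGE_INPUT_ATTACHMENT_BIT         0x00000080u

/* Hypothetical predicate covering only the two usage checks in the hunk. */
static bool
usage_allows_hiz(uint32_t usage)
{
   if (!(usage & USAGE_DEPTH_STENCIL_ATTACHMENT_BIT))
      return false;   /* never used as a depth attachment: HiZ is pointless */
   if (usage & USAGE_INPUT_ATTACHMENT_BIT)
      return false;   /* input attachment: HiZ is disabled by this patch */
   return true;
}
```

An image created with both the depth/stencil attachment and input attachment usage bits is therefore treated like an image without HiZ.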



[Mesa-dev] [PATCH 12/22] anv: Store depth stencil layouts

2017-01-11 Thread Nanley Chery
Store the current and requested depth stencil layouts so that we can
perform the appropriate HiZ resolves for a given transition while
recording a render pass.

Signed-off-by: Nanley Chery 
---
 src/intel/vulkan/anv_pass.c|  5 +
 src/intel/vulkan/anv_private.h | 11 +++
 src/intel/vulkan/genX_cmd_buffer.c |  1 +
 3 files changed, 17 insertions(+)

diff --git a/src/intel/vulkan/anv_pass.c b/src/intel/vulkan/anv_pass.c
index c1c149b48b..ea86fa9ff2 100644
--- a/src/intel/vulkan/anv_pass.c
+++ b/src/intel/vulkan/anv_pass.c
@@ -74,6 +74,8 @@ VkResult anv_CreateRenderPass(
   att->load_op = pCreateInfo->pAttachments[i].loadOp;
   att->store_op = pCreateInfo->pAttachments[i].storeOp;
   att->stencil_load_op = pCreateInfo->pAttachments[i].stencilLoadOp;
+  att->initial_layout = pCreateInfo->pAttachments[i].initialLayout;
+  att->final_layout = pCreateInfo->pAttachments[i].finalLayout;
   att->subpass_usage = usages;
   usages += pass->subpass_count;
}
@@ -161,6 +163,8 @@ VkResult anv_CreateRenderPass(
   if (desc->pDepthStencilAttachment) {
  uint32_t a = desc->pDepthStencilAttachment->attachment;
  subpass->depth_stencil_attachment = a;
+ subpass->depth_stencil_layout =
+desc->pDepthStencilAttachment->layout;
  if (a != VK_ATTACHMENT_UNUSED) {
 pass->attachments[a].usage |=
VK_IMAGE_USAGE_DEPTH_STENCIL_ATTACHMENT_BIT;
@@ -168,6 +172,7 @@ VkResult anv_CreateRenderPass(
  }
   } else {
  subpass->depth_stencil_attachment = VK_ATTACHMENT_UNUSED;
+ subpass->depth_stencil_layout = VK_IMAGE_LAYOUT_UNDEFINED;
   }
}
 
diff --git a/src/intel/vulkan/anv_private.h b/src/intel/vulkan/anv_private.h
index 84cc15a0b5..f4034866a7 100644
--- a/src/intel/vulkan/anv_private.h
+++ b/src/intel/vulkan/anv_private.h
@@ -1097,6 +1097,7 @@ struct anv_attachment_state {
struct anv_state color_rt_state;
struct anv_state input_att_state;
 
+   VkImageLayoutcurrent_layout;
VkImageAspectFlags   pending_clear_aspects;
bool fast_clear;
VkClearValue clear_value;
@@ -1727,7 +1728,12 @@ struct anv_subpass {
uint32_t color_count;
uint32_t *   color_attachments;
uint32_t *   resolve_attachments;
+
+   /* TODO: Consider storing the depth/stencil VkAttachmentReference
+* instead of its two structure members (below) individually.
+*/
uint32_t depth_stencil_attachment;
+   VkImageLayoutdepth_stencil_layout;
 
/** Subpass has a depth/stencil self-dependency */
bool has_ds_self_dep;
@@ -1744,12 +1750,17 @@ enum anv_subpass_usage {
 };
 
 struct anv_render_pass_attachment {
+   /* TODO: Consider using VkAttachmentDescription instead of storing each of
+* its members individually.
+*/
VkFormat format;
uint32_t samples;
VkImageUsageFlagsusage;
VkAttachmentLoadOp   load_op;
VkAttachmentStoreOp  store_op;
VkAttachmentLoadOp   stencil_load_op;
+   VkImageLayoutinitial_layout;
+   VkImageLayoutfinal_layout;
 
/* An array, indexed by subpass id, of how the attachment will be used. */
enum anv_subpass_usage * subpass_usage;
diff --git a/src/intel/vulkan/genX_cmd_buffer.c 
b/src/intel/vulkan/genX_cmd_buffer.c
index fff9bd37c0..95d0cfc983 100644
--- a/src/intel/vulkan/genX_cmd_buffer.c
+++ b/src/intel/vulkan/genX_cmd_buffer.c
@@ -480,6 +480,7 @@ genX(cmd_buffer_setup_attachments)(struct anv_cmd_buffer 
*cmd_buffer,
 }
  }
 
+ state->attachments[i].current_layout = att->initial_layout;
  state->attachments[i].pending_clear_aspects = clear_aspects;
  if (clear_aspects)
 state->attachments[i].clear_value = begin->pClearValues[i];
-- 
2.11.0



[Mesa-dev] [PATCH 11/22] anv: Add helpers to handle depth buffer layout transitions

2017-01-11 Thread Nanley Chery
Signed-off-by: Nanley Chery 
---
 src/intel/vulkan/genX_cmd_buffer.c | 50 ++
 1 file changed, 50 insertions(+)

diff --git a/src/intel/vulkan/genX_cmd_buffer.c 
b/src/intel/vulkan/genX_cmd_buffer.c
index 74369f6ba1..fff9bd37c0 100644
--- a/src/intel/vulkan/genX_cmd_buffer.c
+++ b/src/intel/vulkan/genX_cmd_buffer.c
@@ -310,6 +310,56 @@ need_input_attachment_state(const struct 
anv_render_pass_attachment *att)
return vk_format_is_color(att->format) || vk_format_has_depth(att->format);
 }
 
+static enum isl_aux_usage
+layout_to_hiz_usage(VkImageLayout layout)
+{
+   switch (layout) {
+   case VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL:
+  return ISL_AUX_USAGE_HIZ;
+   default:
+  return ISL_AUX_USAGE_NONE;
+   }
+}
+
+/* Transitions a HiZ-enabled depth buffer from one layout to another. Unless
+ * the initial layout is undefined, the HiZ buffer and depth buffer will
+ * represent the same data at the end of this operation.
+ */
+static void
+transition_depth_buffer(struct anv_cmd_buffer *cmd_buffer,
+const struct anv_image *image,
+VkImageLayout initial_layout,
+VkImageLayout final_layout)
+{
+   assert(image);
+
+   if (image->aux_usage != ISL_AUX_USAGE_HIZ)
+  return;
+
+   const bool hiz_enabled = layout_to_hiz_usage(initial_layout) ==
+ISL_AUX_USAGE_HIZ;
+   const bool enable_hiz = layout_to_hiz_usage(final_layout) ==
+   ISL_AUX_USAGE_HIZ;
+
+   /* We've already initialized the aux HiZ buffer at BindImageMemory time,
+* so there's no need to perform a HIZ resolve or clear to avoid GPU hangs.
+* This initial layout indicates that the user doesn't care about the data
+* that's currently in the buffer, so no resolves are necessary.
+*/
+   if (initial_layout == VK_IMAGE_LAYOUT_UNDEFINED)
+  return;
+
+   if (hiz_enabled == enable_hiz) {
+  /* The same buffer will be used, no resolves are necessary */
+   } else if (hiz_enabled && !enable_hiz) {
+  anv_gen8_hiz_op_resolve(cmd_buffer, image, BLORP_HIZ_OP_DEPTH_RESOLVE);
+   } else {
+  assert(!hiz_enabled && enable_hiz);
+  anv_gen8_hiz_op_resolve(cmd_buffer, image, BLORP_HIZ_OP_HIZ_RESOLVE);
+   }
+}
+
+
 /**
  * Setup anv_cmd_state::attachments for vkCmdBeginRenderPass.
  */
-- 
2.11.0



[Mesa-dev] [PATCH 01/22] intel/blorp: Add the BDW+ optimized HZ_OP sequence to BLORP

2017-01-11 Thread Nanley Chery
We'll be switching to layout-transition based resolves which can occur
outside of a render pass. Add this sequence to BLORP, as using BLORP
will enable emitting depth stencil state outside of a render pass (among
other benefits). The depth buffer extent is ignored to enable eventual
usage of VkCmdClearAttachments().

Signed-off-by: Nanley Chery 
---
 src/intel/blorp/blorp_genX_exec.h | 87 +++
 1 file changed, 87 insertions(+)

diff --git a/src/intel/blorp/blorp_genX_exec.h 
b/src/intel/blorp/blorp_genX_exec.h
index 66906fabbc..a673ab8141 100644
--- a/src/intel/blorp/blorp_genX_exec.h
+++ b/src/intel/blorp/blorp_genX_exec.h
@@ -1237,6 +1237,86 @@ blorp_emit_3dstate_multisample(struct blorp_batch *batch,
}
 }
 
+#if GEN_GEN >= 8
+/* Emits the Optimized HiZ sequence specified in the BDW+ PRMs. The
+ * depth/stencil buffer extents are ignored to handle APIs which perform
+ * clearing operations without such information.
+ * */
+static void
+blorp_emit_gen8_hiz_op(struct blorp_batch *batch,
+   const struct blorp_params *params)
+{
+   /* We should be performing an operation on a depth or stencil buffer.
+*/
+   assert(params->depth.enabled || params->stencil.enabled);
+
+   /* The stencil buffer should only be enabled if a fast clear operation is
+* requested.
+*/
+   if (params->stencil.enabled)
+  assert(params->hiz_op == BLORP_HIZ_OP_DEPTH_CLEAR);
+
+   /* If we can't alter the depth stencil config and multiple layers are
+* involved, the HiZ op will fail. This is because the op requires that a
+* new config is emitted for each additional layer.
+*/
+   if (batch->flags & BLORP_BATCH_NO_EMIT_DEPTH_STENCIL) {
+  assert(params->num_layers <= 1);
+   } else {
+  blorp_emit_depth_stencil_config(batch, params);
+   }
+
+   blorp_emit(batch, GENX(3DSTATE_WM_HZ_OP), hzp) {
+  switch (params->hiz_op) {
+  case BLORP_HIZ_OP_DEPTH_CLEAR:
+ hzp.StencilBufferClearEnable = params->stencil.enabled;
+ hzp.DepthBufferClearEnable = params->depth.enabled;
+ hzp.StencilClearValue = params->stencil_ref;
+ break;
+  case BLORP_HIZ_OP_DEPTH_RESOLVE:
+ hzp.DepthBufferResolveEnable = true;
+ break;
+  case BLORP_HIZ_OP_HIZ_RESOLVE:
+ hzp.HierarchicalDepthBufferResolveEnable = true;
+ break;
+  case BLORP_HIZ_OP_NONE:
+ unreachable("Invalid HIZ op");
+  }
+
+  hzp.NumberofMultisamples = ffs(params->num_samples) - 1;
+  hzp.SampleMask = 0xFFFF;
+
+  /* Due to a hardware issue, this bit MBZ */
+  assert(hzp.ScissorRectangleEnable == false);
+
+  /* Contrary to the HW docs both fields are inclusive */
+  hzp.ClearRectangleXMin = params->x0;
+  hzp.ClearRectangleYMin = params->y0;
+
+  /* Contrary to the HW docs both fields are exclusive */
+  hzp.ClearRectangleXMax = params->x1;
+  hzp.ClearRectangleYMax = params->y1;
+   }
+
+   /* PIPE_CONTROL w/ all bits clear except for “Post-Sync Operation” must set
+* to “Write Immediate Data” enabled.
+*/
+   blorp_emit(batch, GENX(PIPE_CONTROL), pc) {
+  pc.PostSyncOperation = WriteImmediateData;
+   }
+
+   blorp_emit(batch, GENX(3DSTATE_WM_HZ_OP), hzp);
+
+   /* Perform depth clear specific flushing */
+   if (params->hiz_op == BLORP_HIZ_OP_DEPTH_CLEAR && params->depth.enabled) {
+  blorp_emit(batch, GENX(PIPE_CONTROL), pc) {
+ pc.DepthStallEnable = true;
+ pc.DepthCacheFlushEnable = true;
+  }
+   }
+}
+#endif
+
 /* 3DSTATE_VIEWPORT_STATE_POINTERS */
 static void
 blorp_emit_viewport_state(struct blorp_batch *batch,
@@ -1283,6 +1363,13 @@ blorp_exec(struct blorp_batch *batch, const struct 
blorp_params *params)
uint32_t color_calc_state_offset = 0;
uint32_t depth_stencil_state_offset;
 
+#if GEN_GEN >= 8
+   if (params->hiz_op != BLORP_HIZ_OP_NONE) {
+  blorp_emit_gen8_hiz_op(batch, params);
+  return;
+   }
+#endif
+
blorp_emit_vertex_buffers(batch, params);
blorp_emit_vertex_elements(batch, params);
 
-- 
2.11.0
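
The expression `ffs(params->num_samples) - 1` in the hunk above yields log2 of a power-of-two sample count. A small sketch of that encoding, assuming a hypothetical helper name (`encode_num_multisamples`) and the POSIX `ffs()` from `<strings.h>`:

```c
#include <assert.h>
#include <strings.h>   /* ffs() (POSIX) */

/* Hypothetical helper: log2 encoding of a power-of-two sample count. */
static unsigned
encode_num_multisamples(unsigned num_samples)
{
   /* The encoding is only meaningful for non-zero powers of two. */
   assert(num_samples != 0 && (num_samples & (num_samples - 1)) == 0);

   /* ffs() returns the 1-based index of the lowest set bit. */
   return (unsigned)ffs((int)num_samples) - 1;
}
```

With this, sample counts of 1, 2, 4, 8 and 16 map to 0, 1, 2, 3 and 4 respectively.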



[Mesa-dev] [PATCH 05/22] anv: Enable HiZ support for multiple subpasses

2017-01-11 Thread Nanley Chery
We'll be using layout transitions later on in the series which can occur
within and between subpasses. Turn this on now to simplify the change
later.

Signed-off-by: Nanley Chery 
---
 src/intel/vulkan/TODO  |  2 +-
 src/intel/vulkan/gen8_cmd_buffer.c | 11 +--
 src/intel/vulkan/genX_cmd_buffer.c |  8 ++--
 3 files changed, 8 insertions(+), 13 deletions(-)

diff --git a/src/intel/vulkan/TODO b/src/intel/vulkan/TODO
index 5c33725700..37fd16b437 100644
--- a/src/intel/vulkan/TODO
+++ b/src/intel/vulkan/TODO
@@ -9,7 +9,7 @@ Missing Features:
 
 Performance:
  - Sampling from HiZ (Nanley)
- - Multi-{sampled/gen8,LOD,subpass} HiZ
+ - Multi-{sampled/gen8,LOD} HiZ
  - Compressed multisample support
  - Pushing pieces of UBOs?
  - Enable guardband clipping
diff --git a/src/intel/vulkan/gen8_cmd_buffer.c 
b/src/intel/vulkan/gen8_cmd_buffer.c
index 892a035304..81d7727130 100644
--- a/src/intel/vulkan/gen8_cmd_buffer.c
+++ b/src/intel/vulkan/gen8_cmd_buffer.c
@@ -340,10 +340,6 @@ genX(cmd_buffer_emit_hz_op)(struct anv_cmd_buffer 
*cmd_buffer,
if (iview == NULL || iview->image->aux_usage != ISL_AUX_USAGE_HIZ)
   return;
 
-   /* FINISHME: Implement multi-subpass HiZ */
-   if (cmd_buffer->state.pass->subpass_count > 1)
-  return;
-
const uint32_t ds = cmd_state->subpass->depth_stencil_attachment;
 
/* Section 7.4. of the Vulkan 1.0.27 spec states:
@@ -366,6 +362,8 @@ genX(cmd_buffer_emit_hz_op)(struct anv_cmd_buffer 
*cmd_buffer,
/* This variable corresponds to the Pixel Dim column in the table below */
struct isl_extent2d px_dim;
 
+   const uint32_t subpass_idx = cmd_state->subpass - cmd_state->pass->subpasses;
+
/* Validate that we can perform the HZ operation and that it's necessary. */
switch (op) {
case BLORP_HIZ_OP_DEPTH_CLEAR:
@@ -446,7 +444,8 @@ genX(cmd_buffer_emit_hz_op)(struct anv_cmd_buffer 
*cmd_buffer,
   break;
case BLORP_HIZ_OP_DEPTH_RESOLVE:
   if (cmd_buffer->state.pass->attachments[ds].store_op !=
-  VK_ATTACHMENT_STORE_OP_STORE)
+  VK_ATTACHMENT_STORE_OP_STORE &&
+  subpass_idx == cmd_state->pass->subpass_count - 1)
  return;
   break;
case BLORP_HIZ_OP_HIZ_RESOLVE:
@@ -461,7 +460,7 @@ genX(cmd_buffer_emit_hz_op)(struct anv_cmd_buffer 
*cmd_buffer,
*/
   if (full_surface_op &&
   cmd_buffer->state.pass->attachments[ds].load_op !=
-  VK_ATTACHMENT_LOAD_OP_LOAD)
+  VK_ATTACHMENT_LOAD_OP_LOAD && subpass_idx == 0)
  return;
   break;
case BLORP_HIZ_OP_NONE:
diff --git a/src/intel/vulkan/genX_cmd_buffer.c 
b/src/intel/vulkan/genX_cmd_buffer.c
index 47d3322e48..b670d00e2d 100644
--- a/src/intel/vulkan/genX_cmd_buffer.c
+++ b/src/intel/vulkan/genX_cmd_buffer.c
@@ -2105,12 +2105,7 @@ cmd_buffer_emit_depth_stencil(struct anv_cmd_buffer 
*cmd_buffer)
 depth_stencil_surface_type(image->depth_surface.isl.dim);
  db.DepthWriteEnable  = true;
  db.StencilWriteEnable= has_stencil;
-
- if (cmd_buffer->state.pass->subpass_count == 1) {
-db.HierarchicalDepthBufferEnable = has_hiz;
- } else {
-anv_finishme("Multiple-subpass HiZ not implemented");
- }
+ db.HierarchicalDepthBufferEnable = has_hiz;
 
  db.SurfaceFormat = isl_surf_get_depth_format(&device->isl_dev,
                                               &image->depth_surface.isl);
@@ -2287,6 +2282,7 @@ void genX(CmdNextSubpass)(
 
assert(cmd_buffer->level == VK_COMMAND_BUFFER_LEVEL_PRIMARY);
 
+   genX(cmd_buffer_emit_hz_op)(cmd_buffer, BLORP_HIZ_OP_DEPTH_RESOLVE);
anv_cmd_buffer_resolve_subpass(cmd_buffer);
genX(cmd_buffer_set_subpass)(cmd_buffer, cmd_buffer->state.subpass + 1);
 }
-- 
2.11.0
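
The subpass index introduced above is a pointer difference into the pass's subpass array, and the two early returns are now limited to the first and last subpass. A standalone sketch of that logic, assuming hypothetical stand-in types and helper names (`struct subpass`, `struct render_pass`, `subpass_index`, `can_skip_depth_resolve`, `can_skip_hiz_resolve`) rather than the anv structures:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct subpass { int unused; };     /* hypothetical stand-in for anv_subpass */

struct render_pass {                /* hypothetical stand-in for anv_render_pass */
   uint32_t subpass_count;
   struct subpass *subpasses;
};

/* Index of `current` within pass->subpasses, obtained by pointer subtraction. */
static uint32_t
subpass_index(const struct render_pass *pass, const struct subpass *current)
{
   const ptrdiff_t idx = current - pass->subpasses;
   assert(idx >= 0 && (uint32_t)idx < pass->subpass_count);
   return (uint32_t)idx;
}

/* The depth resolve is skipped only on the last subpass, and only when the
 * attachment contents are not stored. */
static bool
can_skip_depth_resolve(bool store_op_is_store, uint32_t idx, uint32_t count)
{
   return !store_op_is_store && idx == count - 1;
}

/* The HiZ resolve is skipped only on the first subpass, and only for a
 * full-surface operation whose previous contents are not loaded. */
static bool
can_skip_hiz_resolve(bool full_surface_op, bool load_op_is_load, uint32_t idx)
{
   return full_surface_op && !load_op_is_load && idx == 0;
}
```

In a three-subpass render pass, for example, the depth resolve between subpass 0 and subpass 1 is always emitted under these conditions, whatever the store op is.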



[Mesa-dev] [PATCH 04/22] anv: Use ::anv_attachment_state for toggling HiZ per subpass

2017-01-11 Thread Nanley Chery
We're about to enable HiZ support for multiple subpasses. Use this field
to keep track of whether or not subpass operations should treat the
depth buffer as having an auxiliary HiZ buffer.

Signed-off-by: Nanley Chery 
---
 src/intel/vulkan/genX_cmd_buffer.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/src/intel/vulkan/genX_cmd_buffer.c 
b/src/intel/vulkan/genX_cmd_buffer.c
index a372e6420f..47d3322e48 100644
--- a/src/intel/vulkan/genX_cmd_buffer.c
+++ b/src/intel/vulkan/genX_cmd_buffer.c
@@ -459,7 +459,7 @@ genX(cmd_buffer_setup_attachments)(struct anv_cmd_buffer 
*cmd_buffer,
   state->attachments[i].aux_usage,
   state->attachments[i].color_rt_state);
  } else {
-state->attachments[i].aux_usage = ISL_AUX_USAGE_NONE;
+state->attachments[i].aux_usage = iview->image->aux_usage;
 state->attachments[i].input_aux_usage = ISL_AUX_USAGE_NONE;
  }
 
@@ -2087,7 +2087,9 @@ cmd_buffer_emit_depth_stencil(struct anv_cmd_buffer 
*cmd_buffer)
   anv_cmd_buffer_get_depth_stencil_view(cmd_buffer);
const struct anv_image *image = iview ? iview->image : NULL;
const bool has_depth = image && (image->aspects & VK_IMAGE_ASPECT_DEPTH_BIT);
-   const bool has_hiz = image != NULL && image->aux_usage == ISL_AUX_USAGE_HIZ;
+   const uint32_t ds = cmd_buffer->state.subpass->depth_stencil_attachment;
+   const bool has_hiz = image != NULL &&
+  cmd_buffer->state.attachments[ds].aux_usage == ISL_AUX_USAGE_HIZ;
const bool has_stencil =
   image && (image->aspects & VK_IMAGE_ASPECT_STENCIL_BIT);
 
-- 
2.11.0



[Mesa-dev] [PATCH 02/22] intel/blorp_blit: Handle ISL_AUX_USAGE_HIZ

2017-01-11 Thread Nanley Chery
Prevent assert failures that would occur in the next patch. BLORP
ignores this flag internally, so no protections are lost here.

Signed-off-by: Nanley Chery 
---
 src/intel/blorp/blorp_blit.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/intel/blorp/blorp_blit.c b/src/intel/blorp/blorp_blit.c
index 1cbd9403c9..aa389dbcc1 100644
--- a/src/intel/blorp/blorp_blit.c
+++ b/src/intel/blorp/blorp_blit.c
@@ -2291,9 +2291,11 @@ blorp_copy(struct blorp_batch *batch,
   isl_format_get_layout(params.dst.surf.format);
 
assert(params.src.aux_usage == ISL_AUX_USAGE_NONE ||
+  params.src.aux_usage == ISL_AUX_USAGE_HIZ ||
   params.src.aux_usage == ISL_AUX_USAGE_MCS ||
   params.src.aux_usage == ISL_AUX_USAGE_CCS_E);
assert(params.dst.aux_usage == ISL_AUX_USAGE_NONE ||
+  params.dst.aux_usage == ISL_AUX_USAGE_HIZ ||
   params.dst.aux_usage == ISL_AUX_USAGE_MCS ||
   params.dst.aux_usage == ISL_AUX_USAGE_CCS_E);
 
-- 
2.11.0



[Mesa-dev] [PATCH 03/22] anv: Replace anv_image_has_hiz() with ISL_AUX_USAGE_HIZ

2017-01-11 Thread Nanley Chery
The helper doesn't provide additional functionality over the current
infrastructure.

Signed-off-by: Nanley Chery 
---
 src/intel/vulkan/anv_blorp.c   |  2 +-
 src/intel/vulkan/anv_image.c   | 10 --
 src/intel/vulkan/anv_private.h | 10 --
 src/intel/vulkan/gen8_cmd_buffer.c |  2 +-
 src/intel/vulkan/genX_cmd_buffer.c |  2 +-
 5 files changed, 11 insertions(+), 15 deletions(-)

diff --git a/src/intel/vulkan/anv_blorp.c b/src/intel/vulkan/anv_blorp.c
index b431d6af48..13d5b5f9d7 100644
--- a/src/intel/vulkan/anv_blorp.c
+++ b/src/intel/vulkan/anv_blorp.c
@@ -857,7 +857,7 @@ void anv_CmdClearDepthStencilImage(
struct blorp_surf depth, stencil;
if (image->aspects & VK_IMAGE_ASPECT_DEPTH_BIT) {
   get_blorp_surf_for_anv_image(image, VK_IMAGE_ASPECT_DEPTH_BIT,
-   image->aux_usage, &depth);
+   ISL_AUX_USAGE_NONE, &depth);
} else {
   memset(&depth, 0, sizeof(depth));
}
diff --git a/src/intel/vulkan/anv_image.c b/src/intel/vulkan/anv_image.c
index f262d8a524..d821629191 100644
--- a/src/intel/vulkan/anv_image.c
+++ b/src/intel/vulkan/anv_image.c
@@ -195,6 +195,7 @@ make_surface(const struct anv_device *dev,
  isl_surf_get_hiz_surf(&dev->isl_dev, &image->depth_surface.isl,
&image->aux_surface.isl);
  add_surface(image, &image->aux_surface);
+ image->aux_usage = ISL_AUX_USAGE_HIZ;
   }
} else if (aspect == VK_IMAGE_ASPECT_COLOR_BIT && vk_info->samples == 1) {
   if (!unlikely(INTEL_DEBUG & DEBUG_NO_RBC)) {
@@ -523,6 +524,11 @@ anv_CreateImageView(VkDevice _device,
   iview->isl.usage = 0;
}
 
+   /* Sampling from HiZ is not yet enabled */
+   enum isl_aux_usage surf_usage = image->aux_usage;
+   if (surf_usage == ISL_AUX_USAGE_HIZ)
+  surf_usage = ISL_AUX_USAGE_NONE;
+
/* Input attachment surfaces for color or depth are allocated and filled
 * out at BeginRenderPass time because they need compression information.
 * Stencil image do not support compression so we just use the texture
@@ -540,7 +546,7 @@ anv_CreateImageView(VkDevice _device,
   .surf = &surface->isl,
   .view = &view,
   .aux_surf = &image->aux_surface.isl,
-  .aux_usage = image->aux_usage,
+  .aux_usage = surf_usage,
   .mocs = device->default_mocs);
 
   if (!device->info.has_llc)
@@ -564,7 +570,7 @@ anv_CreateImageView(VkDevice _device,
  .surf = &surface->isl,
  .view = &view,
  .aux_surf = &image->aux_surface.isl,
- .aux_usage = image->aux_usage,
+ .aux_usage = surf_usage,
  .mocs = device->default_mocs);
   } else {
  anv_fill_buffer_surface_state(device, iview->storage_surface_state,
diff --git a/src/intel/vulkan/anv_private.h b/src/intel/vulkan/anv_private.h
index dbc8c3cf68..4b2cac5c48 100644
--- a/src/intel/vulkan/anv_private.h
+++ b/src/intel/vulkan/anv_private.h
@@ -1642,16 +1642,6 @@ const struct anv_surface *
 anv_image_get_surface_for_aspect_mask(const struct anv_image *image,
   VkImageAspectFlags aspect_mask);
 
-static inline bool
-anv_image_has_hiz(const struct anv_image *image)
-{
-   /* We must check the aspect because anv_image::aux_surface may be used for
-* any type of auxiliary surface, not just HiZ.
-*/
-   return (image->aspects & VK_IMAGE_ASPECT_DEPTH_BIT) &&
-  image->aux_surface.isl.size > 0;
-}
-
 struct anv_buffer_view {
enum isl_format format; /**< VkBufferViewCreateInfo::format */
struct anv_bo *bo;
diff --git a/src/intel/vulkan/gen8_cmd_buffer.c 
b/src/intel/vulkan/gen8_cmd_buffer.c
index 3e4aa9bc62..892a035304 100644
--- a/src/intel/vulkan/gen8_cmd_buffer.c
+++ b/src/intel/vulkan/gen8_cmd_buffer.c
@@ -337,7 +337,7 @@ genX(cmd_buffer_emit_hz_op)(struct anv_cmd_buffer 
*cmd_buffer,
const struct anv_image_view *iview =
   anv_cmd_buffer_get_depth_stencil_view(cmd_buffer);
 
-   if (iview == NULL || !anv_image_has_hiz(iview->image))
+   if (iview == NULL || iview->image->aux_usage != ISL_AUX_USAGE_HIZ)
   return;
 
/* FINISHME: Implement multi-subpass HiZ */
diff --git a/src/intel/vulkan/genX_cmd_buffer.c 
b/src/intel/vulkan/genX_cmd_buffer.c
index 7ff0d3ebba..a372e6420f 100644
--- a/src/intel/vulkan/genX_cmd_buffer.c
+++ b/src/intel/vulkan/genX_cmd_buffer.c
@@ -2087,7 +2087,7 @@ cmd_buffer_emit_depth_stencil(struct anv_cmd_buffer 
*cmd_buffer)
   anv_cmd_buffer_get_depth_stencil_view(cmd_buffer);
const struct anv_image *image = iview ? iview->image : NULL;
const bool has_depth = image && (image->aspects & VK_IMAGE_ASPECT_DEPTH_BIT);
-   const bool has_hiz = image != NULL && anv_image_has_hiz(image);
+   const bool has_hiz = image != NULL && image->aux_usage == 

[Mesa-dev] [PATCH 00/22] anv: Reduce HiZ Resolves

2017-01-11 Thread Nanley Chery
In my testing, this series completely removes HiZ resolves for the
following Vulkan applications: Dota 2, Talos Principle, and the Sascha
Willems Vulkan examples and demos. This is accomplished with two major
changes. The first change is to transition the current HiZ resolving
algorithm from resolving on attachment load/store ops to resolving on
image layout transitions. The second change is to enable sampling from
HiZ on BDW+.
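
As a rough illustration of the first change, the resolve needed for a layout transition can be modelled as a pure function of the old and new layouts. The sketch below is a simplified model based on the helper added in patch 11, before sampling from HiZ is enabled later in the series; the enum and function names (`enum layout`, `enum resolve_op`, `layout_uses_hiz`, `resolve_for_transition`) are hypothetical and the layout set is reduced.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, reduced set of layouts; the driver uses VkImageLayout. */
enum layout {
   LAYOUT_UNDEFINED,
   LAYOUT_GENERAL,
   LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL,
   LAYOUT_SHADER_READ_ONLY_OPTIMAL,
};

enum resolve_op { RESOLVE_NONE, RESOLVE_DEPTH, RESOLVE_HIZ };

/* In this model only the depth/stencil attachment layout uses HiZ. */
static bool
layout_uses_hiz(enum layout l)
{
   return l == LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL;
}

static enum resolve_op
resolve_for_transition(enum layout from, enum layout to)
{
   /* An undefined initial layout means the contents don't matter. */
   if (from == LAYOUT_UNDEFINED)
      return RESOLVE_NONE;

   const bool hiz_before = layout_uses_hiz(from);
   const bool hiz_after = layout_uses_hiz(to);

   if (hiz_before == hiz_after)
      return RESOLVE_NONE;       /* the same buffer stays authoritative */

   /* Leaving HiZ: bring the depth buffer up to date.
    * Entering HiZ: bring the HiZ buffer up to date. */
   return hiz_before ? RESOLVE_DEPTH : RESOLVE_HIZ;
}
```

Under this model a render pass that keeps its depth attachment in the attachment-optimal layout throughout needs no resolves at all.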

There are some notable additional changes. To support performing layout
transitions outside of a render pass we implement the HiZ sequence in
BLORP which can emit depth stencil state outside of a render pass.

Performance data was collected at different points in this series. These
tests were run on a SKL GT4, with a monitor resolution of 1440x900. For
Dota 2 and Talos Principle, the average of three fullscreen runs was
taken. At least one warm-up run was performed between driver builds. The
Talos Principle runs are omitted as no significant changes were
measured. No warm-up was performed for the Vulkan examples and the demo
resolution was the default window size on startup.

Nanley Chery (22):
  intel/blorp: Add the BDW+ optimized HZ_OP sequence to BLORP
  intel/blorp_blit: Handle ISL_AUX_USAGE_HIZ
  anv: Replace anv_image_has_hiz() with ISL_AUX_USAGE_HIZ
  anv: Use ::anv_attachment_state for toggling HiZ per subpass
  anv: Enable HiZ support for multiple subpasses
  intel/blorp_clear: Add gen8 HiZ clearing functions
  anv: Use gen8 BLORP HiZ clearing functions
  anv/blorp: Add a gen8 HiZ op resolve function
  anv: Use the gen8 BLORP HiZ resolving function
  anv: Delete anv's HiZ op emit function
  anv: Add helpers to handle depth buffer layout transitions
  anv: Store depth stencil layouts
  anv: Prepare for transitioning to the requested final layout
  anv: Avoid resolves incurred by fast depth clears
  anv: Disable HiZ for input attachments
  anv/image: Disable HiZ for storage images
  anv: Perform HiZ resolves only on layout transitions
  isl/surface_state: Handle ISL_AUX_USAGE_HIZ
  anv: Add a helper to determine sampling with HiZ
  anv/blorp: Don't fast depth clear samplable HiZ buffers on BDW
  anv: Enable sampling from HiZ
  anv: Avoid some resolves for samplable HiZ buffers

 src/intel/blorp/blorp.h|  12 ++
 src/intel/blorp/blorp_blit.c   |   2 +
 src/intel/blorp/blorp_clear.c  |  80 +
 src/intel/blorp/blorp_genX_exec.h  |  87 ++
 src/intel/isl/isl_surface_state.c  |  38 ++-
 src/intel/vulkan/TODO  |   3 +-
 src/intel/vulkan/anv_blorp.c   | 100 -
 src/intel/vulkan/anv_genX.h|   3 -
 src/intel/vulkan/anv_image.c   |  46 +++-
 src/intel/vulkan/anv_pass.c|   8 ++
 src/intel/vulkan/anv_private.h |  51 +++--
 src/intel/vulkan/gen7_cmd_buffer.c |   7 --
 src/intel/vulkan/gen8_cmd_buffer.c | 224 -
 src/intel/vulkan/genX_cmd_buffer.c | 168 
 14 files changed, 548 insertions(+), 281 deletions(-)

-- 
2.11.0



[Mesa-dev] [PATCH 07/22] anv: Use gen8 BLORP HiZ clearing functions

2017-01-11 Thread Nanley Chery
Signed-off-by: Nanley Chery 
---
 src/intel/vulkan/anv_blorp.c   | 54 +++---
 src/intel/vulkan/genX_cmd_buffer.c |  1 -
 2 files changed, 50 insertions(+), 5 deletions(-)

diff --git a/src/intel/vulkan/anv_blorp.c b/src/intel/vulkan/anv_blorp.c
index 13d5b5f9d7..a77913db25 100644
--- a/src/intel/vulkan/anv_blorp.c
+++ b/src/intel/vulkan/anv_blorp.c
@@ -1162,6 +1162,8 @@ void
 anv_cmd_buffer_clear_subpass(struct anv_cmd_buffer *cmd_buffer)
 {
const struct anv_cmd_state *cmd_state = &cmd_buffer->state;
+   const VkRect2D render_area = cmd_buffer->state.render_area;
+
 
if (!subpass_needs_clear(cmd_buffer))
   return;
@@ -1196,8 +1198,6 @@ anv_cmd_buffer_clear_subpass(struct anv_cmd_buffer 
*cmd_buffer)
att_state->aux_usage, &surf);
   surf.clear_color = vk_to_isl_color(att_state->clear_value.color);
 
-  const VkRect2D render_area = cmd_buffer->state.render_area;
-
   if (att_state->fast_clear) {
  blorp_fast_clear(&batch, &surf, iview->isl.format,
   iview->isl.base_level,
@@ -1237,8 +1237,54 @@ anv_cmd_buffer_clear_subpass(struct anv_cmd_buffer 
*cmd_buffer)
  .clearValue = cmd_state->attachments[ds].clear_value,
   };
 
-  clear_depth_stencil_attachment(cmd_buffer, &batch,
- &clear_att, 1, &clear_rect);
+
+  const uint8_t gen = cmd_buffer->device->info.gen;
+  bool clear_with_hiz = gen >= 8 && cmd_state->attachments[ds].aux_usage ==
+ISL_AUX_USAGE_HIZ;
+  const struct anv_image_view *iview = fb->attachments[ds];
+
+  if (clear_with_hiz) {
+ const bool clear_depth = clear_att.aspectMask &
+  VK_IMAGE_ASPECT_DEPTH_BIT;
+ const bool clear_stencil = clear_att.aspectMask &
+VK_IMAGE_ASPECT_STENCIL_BIT;
+
+ /* Check against restrictions for depth buffer clearing. A great GPU
+  * performance benefit isn't expected when using the HZ sequence for
+  * stencil-only clears. Therefore, we don't emit a HZ op sequence for
+  * a stencil clear in addition to using the BLORP-fallback for depth.
+  */
+ if (clear_depth) {
+if (!blorp_can_hiz_clear_depth(gen, iview->isl.format,
+   iview->image->samples,
+   render_area.offset.x,
+   render_area.offset.y,
+   render_area.offset.x +
+   render_area.extent.width,
+   render_area.offset.y +
+   render_area.extent.height)) {
+   clear_with_hiz = false;
+}
+ }
+
+ if (clear_with_hiz) {
+blorp_gen8_hiz_clear_attachments(&batch, iview->image->samples,
+ render_area.offset.x,
+ render_area.offset.y,
+ render_area.offset.x +
+ render_area.extent.width,
+ render_area.offset.y +
+ render_area.extent.height,
+ clear_depth, clear_stencil,
+ clear_att.clearValue.
+depthStencil.stencil);
+ }
+  }
+
+  if (!clear_with_hiz) {
+ clear_depth_stencil_attachment(cmd_buffer, &batch,
+&clear_att, 1, &clear_rect);
+  }
 
   cmd_state->attachments[ds].pending_clear_aspects = 0;
}
diff --git a/src/intel/vulkan/genX_cmd_buffer.c 
b/src/intel/vulkan/genX_cmd_buffer.c
index b670d00e2d..63f6be12a8 100644
--- a/src/intel/vulkan/genX_cmd_buffer.c
+++ b/src/intel/vulkan/genX_cmd_buffer.c
@@ -2250,7 +2250,6 @@ genX(cmd_buffer_set_subpass)(struct anv_cmd_buffer 
*cmd_buffer,
 
cmd_buffer_emit_depth_stencil(cmd_buffer);
genX(cmd_buffer_emit_hz_op)(cmd_buffer, BLORP_HIZ_OP_HIZ_RESOLVE);
-   genX(cmd_buffer_emit_hz_op)(cmd_buffer, BLORP_HIZ_OP_DEPTH_CLEAR);
 
anv_cmd_buffer_clear_subpass(cmd_buffer);
 }
-- 
2.11.0



Re: [Mesa-dev] [PATCH] anv: Move nir_lower_wpos_center after dead variable elimination.

2017-01-11 Thread Jason Ekstrand
Reviewed-by: Jason Ekstrand 

On Wed, Jan 11, 2017 at 5:25 PM, Timothy Arceri <
timothy.arc...@collabora.com> wrote:

> On Wed, 2017-01-11 at 16:09 -0800, Kenneth Graunke wrote:
> > When multiple shader stages exist in the same SPIR-V module, we
> > compile
> > all entry points and their inputs/outputs, then dead code eliminate
> > the
> > ones not related to the specific entry point later.
> >
> > nir_lower_wpos_center was being run prior to eliminating those random
> > other variables, which made it trip up, thinking it found
> > gl_FragCoord
> > when it actually found something else like gl_PerVertex[3].
> >
> > Fixes dEQP-VK.spirv_assembly.instruction.graphics.module.same_module.
> >
> > Signed-off-by: Kenneth Graunke 
>
> Reviewed-by: Timothy Arceri 
>
> > ---
> >  src/intel/vulkan/anv_pipeline.c | 6 +++---
> >  1 file changed, 3 insertions(+), 3 deletions(-)
> >
> > diff --git a/src/intel/vulkan/anv_pipeline.c
> > b/src/intel/vulkan/anv_pipeline.c
> > index 6c939b071da..7d939ebabe9 100644
> > --- a/src/intel/vulkan/anv_pipeline.c
> > +++ b/src/intel/vulkan/anv_pipeline.c
> > @@ -139,9 +139,6 @@ anv_shader_compile_to_nir(struct anv_device
> > *device,
> >
> > free(spec_entries);
> >
> > -   if (stage == MESA_SHADER_FRAGMENT)
> > -  NIR_PASS_V(nir, nir_lower_wpos_center);
> > -
> > /* We have to lower away local constant initializers right before
> > we
> >  * inline functions.  That way they get properly initialized at
> > the top
> >  * of the function and not at the top of its caller.
> > @@ -161,6 +158,9 @@ anv_shader_compile_to_nir(struct anv_device
> > *device,
> > NIR_PASS_V(nir, nir_remove_dead_variables,
> >nir_var_shader_in | nir_var_shader_out |
> > nir_var_system_value);
> >
> > +   if (stage == MESA_SHADER_FRAGMENT)
> > +  NIR_PASS_V(nir, nir_lower_wpos_center);
> > +
> > /* Now that we've deleted all but the main function, we can go
> > ahead and
> >  * lower the rest of the constant initializers.
> >  */


Re: [Mesa-dev] [PATCH] anv: Move nir_lower_wpos_center after dead variable elimination.

2017-01-11 Thread Timothy Arceri
On Wed, 2017-01-11 at 16:09 -0800, Kenneth Graunke wrote:
> When multiple shader stages exist in the same SPIR-V module, we
> compile
> all entry points and their inputs/outputs, then dead code eliminate
> the
> ones not related to the specific entry point later.
> 
> nir_lower_wpos_center was being run prior to eliminating those random
> other variables, which made it trip up, thinking it found
> gl_FragCoord
> when it actually found something else like gl_PerVertex[3].
> 
> Fixes dEQP-VK.spirv_assembly.instruction.graphics.module.same_module.
> 
> Signed-off-by: Kenneth Graunke 

Reviewed-by: Timothy Arceri 

> ---
>  src/intel/vulkan/anv_pipeline.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/src/intel/vulkan/anv_pipeline.c
> b/src/intel/vulkan/anv_pipeline.c
> index 6c939b071da..7d939ebabe9 100644
> --- a/src/intel/vulkan/anv_pipeline.c
> +++ b/src/intel/vulkan/anv_pipeline.c
> @@ -139,9 +139,6 @@ anv_shader_compile_to_nir(struct anv_device
> *device,
>  
> free(spec_entries);
>  
> -   if (stage == MESA_SHADER_FRAGMENT)
> -  NIR_PASS_V(nir, nir_lower_wpos_center);
> -
> /* We have to lower away local constant initializers right before
> we
>  * inline functions.  That way they get properly initialized at
> the top
>  * of the function and not at the top of its caller.
> @@ -161,6 +158,9 @@ anv_shader_compile_to_nir(struct anv_device
> *device,
> NIR_PASS_V(nir, nir_remove_dead_variables,
>    nir_var_shader_in | nir_var_shader_out |
> nir_var_system_value);
>  
> +   if (stage == MESA_SHADER_FRAGMENT)
> +  NIR_PASS_V(nir, nir_lower_wpos_center);
> +
> /* Now that we've deleted all but the main function, we can go
> ahead and
>  * lower the rest of the constant initializers.
>  */


Re: [Mesa-dev] [PATCH] [swr] Always defer memory free in swr_resource_destroy

2017-01-11 Thread Cherniak, Bruce
Reviewed-by: Bruce Cherniak 

> On Jan 11, 2017, at 5:19 PM, George Kyriazis  
> wrote:
> 
> Defer delete on regular resources.  This ensures that any work being done
> on the resource is completed before freeing up the resource's memory.
> ---
> src/gallium/drivers/swr/swr_screen.cpp | 17 +
> 1 file changed, 5 insertions(+), 12 deletions(-)
> 
> diff --git a/src/gallium/drivers/swr/swr_screen.cpp 
> b/src/gallium/drivers/swr/swr_screen.cpp
> index cc8030e..5012388 100644
> --- a/src/gallium/drivers/swr/swr_screen.cpp
> +++ b/src/gallium/drivers/swr/swr_screen.cpp
> @@ -880,18 +880,11 @@ swr_resource_destroy(struct pipe_screen *p_screen, 
> struct pipe_resource *pt)
>   winsys->displaytarget_destroy(winsys, spr->display_target);
> 
>} else {
> -  /* For regular resources, if the resource is being used, defer deletion
> -   * (use aligned-free) */
> -  if (pipe && spr->status) {
> - swr_resource_unused(pt);
> - swr_fence_work_free(screen->flush_fence,
> - spr->swr.pBaseAddress, true);
> - swr_fence_work_free(screen->flush_fence, 
> - spr->secondary.pBaseAddress, true);
> -  } else {
> - AlignedFree(spr->swr.pBaseAddress);
> - AlignedFree(spr->secondary.pBaseAddress);
> -  }
> +  /* For regular resources, defer deletion */
> +  swr_resource_unused(pt);
> +  swr_fence_work_free(screen->flush_fence, spr->swr.pBaseAddress, true);
> +  swr_fence_work_free(screen->flush_fence,
> +  spr->secondary.pBaseAddress, true);
>}
> 
>FREE(spr);
> -- 
> 2.10.0.windows.1
> 


Re: [Mesa-dev] [PATCH] radeonsi: enable 32-bit denormals on VI+

2017-01-11 Thread Marek Olšák
On Thu, Jan 12, 2017 at 12:33 AM, Ilia Mirkin  wrote:
> On Wed, Jan 11, 2017 at 4:00 PM, Roland Scheidegger  
> wrote:
>> Am 11.01.2017 um 21:08 schrieb Samuel Pitoiset:
>>>
>>>
>>> On 01/11/2017 07:00 PM, Roland Scheidegger wrote:
>>>> I don't think there's any glsl, es or otherwise, specification which
>>>> would require denorms (since obviously lots of hw can't do it, d3d10
>>>> forbids them), with any precision qualifier. Hence these look like bugs
>>>> of the test suite to me?
>>>> (Irrespective if it's a good idea or not to enable denormals, which I
>>>> don't really know.)
>>>
>>> That test works on NVIDIA hw (both with blob and nouveau) and IIRC it
>>> also works on Intel hw. I don't think it's buggy there.
>> The question then is why it needs denorms on radeons...
>
> I spent some time with Samuel looking at this. So, this is pretty
> funny... (or at least feels that way after staring at floating point
> for a while)
>
> dEQP is, in fact, feeding denorms to the min/max functions. But it's
> smart enough to know that flushing denorms to 0 is OK, and so it
> treats a 0 as a pass. (And obviously it treats the "right answer" as a
> pass.) So that's why enabling denorm processing fixes it - that causes
> the hw to return the proper correct answer and all is well.
>
> However the issue is that without denorm processing, the hw is
> returning the *wrong* answer. At first I thought that max was being
> lowered into something like
>
> if (a > b) { x = a; } else { x = b; }
>
> which would end up with potentially wrong results if a and b are being
> flushed as inputs into the comparison but not into the assignments.
> But that's not (explicitly) what's happening - the v_max_f32_e32
> instruction is being used. Perhaps that's what it does internally? If
> so, that means that results of affected float functions in LLVM need
> explicit flushing before being stored into results.
>
> FWIW the specific values triggering the issue are:
>
> in0=-0x0.02p-126, in1=-0x0.fep-126, out0=-0x0.fep-126 -> FAIL
>
> With denorm processing, it correctly reports out0=-0x0.02p-126,
> while nouveau with denorm flushing enabled reports out0=0.0 which also
> passes.

The denorm configuration has 2 bits:
- flush (0) or allow (1) input denorms
- flush (0) or allow (1) output denorms

In the case of v_max, it looks like output denorms are not flushed and
it behaves almost like you said:

if (a >= b) { x = a; } else { x = b; }

Marek


Re: [Mesa-dev] [PATCH] docs: document INTEL_PRECISE_TRIG envvar

2017-01-11 Thread Jordan Justen
On 2017-01-11 09:53:15, Juan A. Suarez Romero wrote:
> ---
>  docs/envvars.html | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/docs/envvars.html b/docs/envvars.html
> index 9eee8db..4f05d7f 100644
> --- a/docs/envvars.html
> +++ b/docs/envvars.html
> @@ -187,6 +187,7 @@ See the Xlib software driver 
> page for details.
> do32 - generate compute shader SIMD32 programs even if workgroup size 
> doesn't exceed the SIMD16 limit
> norbc - disable single sampled render buffer compression
>  
> +INTEL_PRECISE_TRIG - if set to 1 in gen<10, it fixes results out of 
> [-1.0, 1.0] range for a small set of values.

It can also be set to "true" or "yes".

I think the description could be more generic, like the precise_trig
in src/mesa/drivers/dri/common/xmlpool/t_options.h.

INTEL_PRECISE_TRIG - if set to 1, true or yes, then the driver
prefers accuracy over performance in trig functions.
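
To illustrate the "1, true or yes" wording above, here is a minimal sketch of how such an environment variable could be interpreted. The helper names parse_bool_string() and precise_trig_enabled() are made up for this example and are not Mesa's actual code; the exact-match, lower-case comparison is also an assumption.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not Mesa's implementation: returns true when the
 * string is one of the accepted "on" spellings ("1", "true" or "yes").
 * An unset variable (NULL) yields the supplied default.
 */
static bool
parse_bool_string(const char *str, bool default_value)
{
   if (str == NULL)
      return default_value;

   return strcmp(str, "1") == 0 ||
          strcmp(str, "true") == 0 ||
          strcmp(str, "yes") == 0;
}

/* Hypothetical wrapper: the option is off unless the variable enables it. */
static bool
precise_trig_enabled(void)
{
   return parse_bool_string(getenv("INTEL_PRECISE_TRIG"), false);
}
```

With this reading, running an application as INTEL_PRECISE_TRIG=yes ./app would enable the option, while leaving the variable unset keeps the default.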

Reviewed-by: Jordan Justen 

>  
>  
>  
> -- 
> 2.9.3
> 


[Mesa-dev] [PATCH] android: ac/debug: move sid_tables.h generation and IB decode to amd/common

2017-01-11 Thread Mauro Rossi
Hi,

This patch adapts the Android build to recent changes in radeonsi and
amd/common, fixing Android build errors caused by files that were moved to
amd/common.

It was implemented as a single merge of the two ports so that the build
could be tested in full.

Please note that the patches fixing the LLVMInitializeAMDGPU*
declarations, which are in the final stages of review, need to be
upstreamed first for this one to apply correctly.

Mauro


From d6614af2763ea96940e16571005d4ab06e86304c Mon Sep 17 00:00:00 2001
From: Mauro Rossi 
Date: Thu, 12 Jan 2017 00:35:06 +0100
Subject: [PATCH] android: ac/debug: move sid_tables.h generation and IB decode
 to amd/common

This patch is the porting to android of the following commits:

b838f64 "ac/debug: Move sid_tables.h generation to common code."
0ef1b4d "ac/debug: Move IB decode to common code."

Fixes android building errors due to sid_tables.h
and ac_debug.c, ac_debug.h moved to amd/common

Tested by building nougat-x86
---
 src/amd/Android.common.mk   | 16 +++-
 src/gallium/drivers/radeonsi/Android.mk | 15 +++
 2 files changed, 18 insertions(+), 13 deletions(-)

diff --git a/src/amd/Android.common.mk b/src/amd/Android.common.mk
index c7fd193..58c3267 100644
--- a/src/amd/Android.common.mk
+++ b/src/amd/Android.common.mk
@@ -28,16 +28,30 @@ include $(CLEAR_VARS)

 LOCAL_MODULE := libmesa_amd_common

-LOCAL_SRC_FILES := $(AMD_COMPILER_FILES)
+LOCAL_SRC_FILES := \
+ $(AMD_COMPILER_FILES) \
+ $(AMD_DEBUG_FILES)

 LOCAL_CFLAGS += -DFORCE_BUILD_AMDGPU   # to enable TARGET_LLVM(AMDGPU) LLVMInitialize* prototypes

+# generate sources
+LOCAL_MODULE_CLASS := STATIC_LIBRARIES
+intermediates := $(call local-generated-sources-dir)
+LOCAL_GENERATED_SOURCES := $(addprefix $(intermediates)/, $(AMD_GENERATED_FILES))
+
+$(LOCAL_GENERATED_SOURCES): PRIVATE_PYTHON := $(MESA_PYTHON2)
+$(LOCAL_GENERATED_SOURCES): PRIVATE_CUSTOM_TOOL = $(PRIVATE_PYTHON) $^ > $@
+
+$(intermediates)/common/sid_tables.h: $(LOCAL_PATH)/common/sid_tables.py $(MESA_TOP)/src/amd/common/sid.h
+ $(transform-generated-source)
+
 LOCAL_C_INCLUDES := \
  $(MESA_TOP)/include \
  $(MESA_TOP)/src \
  $(MESA_TOP)/src/amd/common \
  $(MESA_TOP)/src/gallium/include \
  $(MESA_TOP)/src/gallium/auxiliary \
+ $(intermediates)/common \
  external/llvm/include \
  external/llvm/device/include \
  external/libcxx/include \
diff --git a/src/gallium/drivers/radeonsi/Android.mk
b/src/gallium/drivers/radeonsi/Android.mk
index ee4e229..3625675 100644
--- a/src/gallium/drivers/radeonsi/Android.mk
+++ b/src/gallium/drivers/radeonsi/Android.mk
@@ -32,21 +32,12 @@ LOCAL_SRC_FILES := $(C_SOURCES)

 LOCAL_CFLAGS += -DFORCE_BUILD_AMDGPU   # to enable TARGET_LLVM(AMDGPU) LLVMInitialize* prototypes

-LOCAL_C_INCLUDES := $(MESA_TOP)/src/amd/common
+LOCAL_C_INCLUDES := \
+ $(MESA_TOP)/src/amd/common \
+ $(call intermediates-dir-for,STATIC_LIBRARIES,libmesa_amd_common)/common

 LOCAL_SHARED_LIBRARIES := libdrm_radeon
 LOCAL_MODULE := libmesa_pipe_radeonsi

-# generate sources
-LOCAL_MODULE_CLASS := STATIC_LIBRARIES
-intermediates := $(call local-generated-sources-dir)
-LOCAL_GENERATED_SOURCES := $(addprefix $(intermediates)/, $(GENERATED_SOURCES))
-
-$(LOCAL_GENERATED_SOURCES): PRIVATE_PYTHON := $(MESA_PYTHON2)
-$(LOCAL_GENERATED_SOURCES): PRIVATE_CUSTOM_TOOL = $(PRIVATE_PYTHON) $^ > $@
-
-$(intermediates)/sid_tables.h:  $(intermediates)/%.h: $(LOCAL_PATH)/%.py $(MESA_TOP)/src/amd/common/sid.h
- $(transform-generated-source)
-
 include $(GALLIUM_COMMON_MK)
 include $(BUILD_STATIC_LIBRARY)
-- 
2.9.3

[Mesa-dev] [PATCH] anv: Move nir_lower_wpos_center after dead variable elimination.

2017-01-11 Thread Kenneth Graunke
When multiple shader stages exist in the same SPIR-V module, we compile
all entry points and their inputs/outputs, then dead code eliminate the
ones not related to the specific entry point later.

nir_lower_wpos_center was being run prior to eliminating those random
other variables, which made it trip up, thinking it found gl_FragCoord
when it actually found something else like gl_PerVertex[3].

Fixes dEQP-VK.spirv_assembly.instruction.graphics.module.same_module.

Signed-off-by: Kenneth Graunke 
---
 src/intel/vulkan/anv_pipeline.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/intel/vulkan/anv_pipeline.c b/src/intel/vulkan/anv_pipeline.c
index 6c939b071da..7d939ebabe9 100644
--- a/src/intel/vulkan/anv_pipeline.c
+++ b/src/intel/vulkan/anv_pipeline.c
@@ -139,9 +139,6 @@ anv_shader_compile_to_nir(struct anv_device *device,
 
free(spec_entries);
 
-   if (stage == MESA_SHADER_FRAGMENT)
-  NIR_PASS_V(nir, nir_lower_wpos_center);
-
/* We have to lower away local constant initializers right before we
 * inline functions.  That way they get properly initialized at the top
 * of the function and not at the top of its caller.
@@ -161,6 +158,9 @@ anv_shader_compile_to_nir(struct anv_device *device,
NIR_PASS_V(nir, nir_remove_dead_variables,
   nir_var_shader_in | nir_var_shader_out | nir_var_system_value);
 
+   if (stage == MESA_SHADER_FRAGMENT)
+  NIR_PASS_V(nir, nir_lower_wpos_center);
+
/* Now that we've deleted all but the main function, we can go ahead and
 * lower the rest of the constant initializers.
 */
-- 
2.11.0



[Mesa-dev] [PATCH] ac/debug: move .gitignore for sid_tables.h too

2017-01-11 Thread Grazvydas Ignotas
b838f642 "ac/debug: Move sid_tables.h generation to common code." moved
sid_tables.h but forgot the corresponding .gitignore.

Signed-off-by: Grazvydas Ignotas 
---
no commit access

 src/amd/common/.gitignore   | 1 +
 src/gallium/drivers/radeonsi/.gitignore | 1 -
 2 files changed, 1 insertion(+), 1 deletion(-)
 create mode 100644 src/amd/common/.gitignore
 delete mode 100644 src/gallium/drivers/radeonsi/.gitignore

diff --git a/src/amd/common/.gitignore b/src/amd/common/.gitignore
new file mode 100644
index 0000000..e0ee798
--- /dev/null
+++ b/src/amd/common/.gitignore
@@ -0,0 +1 @@
+sid_tables.h
diff --git a/src/gallium/drivers/radeonsi/.gitignore 
b/src/gallium/drivers/radeonsi/.gitignore
deleted file mode 100644
index e0ee798..0000000
--- a/src/gallium/drivers/radeonsi/.gitignore
+++ /dev/null
@@ -1 +0,0 @@
-sid_tables.h
-- 
2.7.4



Re: [Mesa-dev] [PATCH] radeonsi: enable 32-bit denormals on VI+

2017-01-11 Thread Ilia Mirkin
On Wed, Jan 11, 2017 at 4:00 PM, Roland Scheidegger  wrote:
> Am 11.01.2017 um 21:08 schrieb Samuel Pitoiset:
>>
>>
>> On 01/11/2017 07:00 PM, Roland Scheidegger wrote:
>>> I don't think there's any glsl, es or otherwise, specification which
>>> would require denorms (since obviously lots of hw can't do it, d3d10
>>> forbids them), with any precision qualifier. Hence these look like bugs
>>> of the test suite to me?
>>> (Irrespective if it's a good idea or not to enable denormals, which I
>>> don't really know.)
>>
>> That test works on NVIDIA hw (both with blob and nouveau) and IIRC it
>> also works on Intel hw. I don't think it's buggy there.
> The question then is why it needs denorms on radeons...

I spent some time with Samuel looking at this. So, this is pretty
funny... (or at least feels that way after staring at floating point
for a while)

dEQP is, in fact, feeding denorms to the min/max functions. But it's
smart enough to know that flushing denorms to 0 is OK, and so it
treats a 0 as a pass. (And obviously it treats the "right answer" as a
pass.) So that's why enabling denorm processing fixes it - that causes
the hw to return the proper correct answer and all is well.

However the issue is that without denorm processing, the hw is
returning the *wrong* answer. At first I thought that max was being
lowered into something like

if (a > b) { x = a; } else { x = b; }

which would end up with potentially wrong results if a and b are being
flushed as inputs into the comparison but not into the assignments.
But that's not (explicitly) what's happening - the v_max_f32_e32
instruction is being used. Perhaps that's what it does internally? If
so, that means that results of affected float functions in LLVM need
explicit flushing before being stored into results.

FWIW the specific values triggering the issue are:

in0=-0x0.02p-126, in1=-0x0.fep-126, out0=-0x0.fep-126 -> FAIL

With denorm processing, it correctly reports out0=-0x0.02p-126,
while nouveau with denorm flushing enabled reports out0=0.0 which also
passes.
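
To make the hypothesis above concrete, here is a small host-side model of it. This is only a sketch of the suspected behaviour (comparison done on flushed inputs, selected operand written out unflushed), not a description of what the hardware really does; all function names are made up for the example, and the input values are used exactly as printed above.

```c
#include <math.h>
#include <stdbool.h>

/* Flush a denormal to a zero of the same sign; other values pass through. */
static float
flush_denorm(float x)
{
   if (fpclassify(x) == FP_SUBNORMAL)
      return signbit(x) ? -0.0f : 0.0f;
   return x;
}

/* Hypothetical model: the comparison sees flushed inputs, but the operand
 * that gets selected is returned without being flushed.
 */
static float
max_flush_inputs_only(float a, float b)
{
   if (flush_denorm(a) > flush_denorm(b))
      return a;
   else
      return b;
}

/* What the test accepts for max(): the exact maximum, or a zero. */
static bool
result_is_acceptable(float result, float a, float b)
{
   float exact = a > b ? a : b;

   return result == exact || result == 0.0f;
}
```

Calling max_flush_inputs_only(-0x0.02p-126f, -0x0.fep-126f) flushes both inputs to -0.0 for the comparison, so the "greater than" test is false and the second operand is returned unflushed. That is neither the exact maximum nor a zero, which matches the failure. Flushing the result as well, or honouring denormals throughout, gives an acceptable value.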

Cheers,

  -ilia


Re: [Mesa-dev] [PATCH] fixup! EGL: Implement the libglvnd interface for EGL (v2)

2017-01-11 Thread Timo Aaltonen
On 05.01.2017 23:29, Kyle Brenneman wrote:
> ---
>  src/egl/generate/eglFunctionList.py | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/src/egl/generate/eglFunctionList.py 
> b/src/egl/generate/eglFunctionList.py
> index b19b5f7..80cb834 100644
> --- a/src/egl/generate/eglFunctionList.py
> +++ b/src/egl/generate/eglFunctionList.py
> @@ -53,12 +53,14 @@ method values:
>  Select the vendor that owns the current context.
>  """
>  
> -def _eglFunc(name, method, static=False, public=False, inheader=None, 
> prefix="", extension=None, retval=None):
> +def _eglFunc(name, method, static=None, public=False, inheader=None, 
> prefix="dispatch_", extension=None, retval=None):
>  """
>  A convenience function to define an entry in the EGL function list.
>  """
> +if static is None:
> +static = (not public and method != "custom")
>  if inheader is None:
> -inheader = (not public)
> +inheader = (not static)
>  values = {
>  "method" : method,
>  "prefix" : prefix,

You probably need to send a v3 with this added?



-- 
t


[Mesa-dev] [PATCH] [swr] Always defer memory free in swr_resource_destroy

2017-01-11 Thread George Kyriazis
Defer delete on regular resources.  This ensures that any work being done
on the resource is completed before freeing up the resource's memory.
---
 src/gallium/drivers/swr/swr_screen.cpp | 17 +
 1 file changed, 5 insertions(+), 12 deletions(-)

diff --git a/src/gallium/drivers/swr/swr_screen.cpp 
b/src/gallium/drivers/swr/swr_screen.cpp
index cc8030e..5012388 100644
--- a/src/gallium/drivers/swr/swr_screen.cpp
+++ b/src/gallium/drivers/swr/swr_screen.cpp
@@ -880,18 +880,11 @@ swr_resource_destroy(struct pipe_screen *p_screen, struct 
pipe_resource *pt)
   winsys->displaytarget_destroy(winsys, spr->display_target);
 
} else {
-  /* For regular resources, if the resource is being used, defer deletion
-   * (use aligned-free) */
-  if (pipe && spr->status) {
- swr_resource_unused(pt);
- swr_fence_work_free(screen->flush_fence,
- spr->swr.pBaseAddress, true);
- swr_fence_work_free(screen->flush_fence, 
- spr->secondary.pBaseAddress, true);
-  } else {
- AlignedFree(spr->swr.pBaseAddress);
- AlignedFree(spr->secondary.pBaseAddress);
-  }
+  /* For regular resources, defer deletion */
+  swr_resource_unused(pt);
+  swr_fence_work_free(screen->flush_fence, spr->swr.pBaseAddress, true);
+  swr_fence_work_free(screen->flush_fence,
+  spr->secondary.pBaseAddress, true);
}
 
FREE(spr);
-- 
2.10.0.windows.1
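
For readers unfamiliar with the pattern, the idea of deferring a free until a fence signals can be sketched as below. This is a simplified illustration under assumed names (struct fence, fence_defer_free(), fence_signal()); it is not the swr driver's fence implementation and it ignores threading and aligned allocations.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins; these are not the swr driver's types. */
struct deferred_free {
   void *ptr;
   struct deferred_free *next;
};

struct fence {
   bool signaled;                 /* set once all prior work has finished */
   struct deferred_free *pending; /* frees to run when the fence signals */
};

/* Queue ptr to be freed once the fence signals.  Returns false on OOM. */
static bool
fence_defer_free(struct fence *f, void *ptr)
{
   struct deferred_free *work;

   if (ptr == NULL)
      return true;   /* nothing to free */

   work = malloc(sizeof(*work));
   if (work == NULL)
      return false;

   work->ptr = ptr;
   work->next = f->pending;
   f->pending = work;
   return true;
}

/* Mark the fence signaled and run every queued free.  Returns the number
 * of allocations that were released.
 */
static unsigned
fence_signal(struct fence *f)
{
   unsigned count = 0;

   f->signaled = true;
   while (f->pending != NULL) {
      struct deferred_free *work = f->pending;

      f->pending = work->next;
      free(work->ptr);
      free(work);
      count++;
   }
   return count;
}
```

The point of always taking the deferred path is that the memory stays valid for any work still in flight; the cost is that the free happens a little later, when the fence completes.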



[Mesa-dev] gputool: a tool for debugging AMD GPUs

2017-01-11 Thread Andres Rodriguez

Hey Everyone,

I started a small project called gputool to help me debug issues on AMD 
platforms. I think it may be useful to other driver devs, so I've made 
it available on github:


https://github.com/lostgoat/gputool

Mainly I wanted something that could read and decode registers, so that 
is what is currently supported.


Example operation:

gputool> read CB_HW_CONTROL
CB_HW_CONTROL: 0x7208
FORCE_NEEDS_DST: 0x0
DISABLE_BLEND_OPT_WHEN_DISABLED_SRCALPHA_IS_USED: 0x0
DISABLE_BLEND_OPT_DISCARD_PIXEL: 0x0
DISABLE_BLEND_OPT_DONT_RD_DST: 0x0
PRIORITIZE_FC_EVICT_OVER_FOP_RD_ON_BANK_CONFLICT: 0x0
DISABLE_FULL_WRITE_MASK: 0x0
DISABLE_BLEND_OPT_RESULT_EQ_DEST: 0x0
DISABLE_INTNORM_LE11BPC_CLAMPING: 0x0
DISABLE_PIXEL_IN_QUAD_FIX_FOR_LINEAR_SURFACE: 0x0
ALLOW_MRT_WITH_DUAL_SOURCE: 0x0
DISABLE_CC_IB_SERIALIZER_STATE_OPT: 0x0
PRIORITIZE_FC_WR_OVER_FC_RD_ON_CMASK_CONFLICT: 0x0
FORCE_ALWAYS_TOGGLE: 0x0
FC_CACHE_EVICT_POINT: 0x8
CC_CACHE_EVICT_POINT: 0x7
DISABLE_RESOLVE_OPT_FOR_SINGLE_FRAG: 0x0
DISABLE_BLEND_OPT_BYPASS: 0x0
CM_CACHE_EVICT_POINT: 0x8

gputool> read PA_SC_RASTER_CONFIG
PA_SC_RASTER_CONFIG: 0x1612
RB_XSEL: 0x0
SC_YSEL: 0x0
SC_MAP: 0x0
SC_XSEL: 0x0
SE_YSEL: 0x1
SE_XSEL: 0x1
PKR_YSEL: 0x0
RB_YSEL: 0x0
PKR_XSEL: 0x0
RB_XSEL2: 0x1
PKR_XSEL2: 0x0
SE_MAP: 0x2
RB_MAP_PKR0: 0x2
RB_MAP_PKR1: 0x0
PKR_MAP: 0x0


One of the other things I'm considering adding is register write 
support, and also dumping textures from vram for inspection.


Regards,
Andres







Re: [Mesa-dev] [PATCH v2] main/buffers: update error handling on DrawBuffers for 4.5

2017-01-11 Thread Anuj Phogat
On Wed, Jan 11, 2017 at 2:00 PM, Alejandro Piñeiro  wrote:
> Before 4.5, GL_BACK was not allowed as a value of bufs. Since 4.5 it
> is allowed under some circumstances:
>
> From the OpenGL 4.5 specification, Section 17.4.1 "Selecting Buffers
> for Writing", page 493 (page 515 of the PDF):
>  "An INVALID_ENUM error is generated if any value in bufs is FRONT ,
>   LEFT , RIGHT , or FRONT_AND_BACK . This restriction applies to both
>   the default framebuffer and framebuffer objects, and exists
>   because these constants may themselves refer to multiple buffers, as
>   shown in table 17.4."
>
> And on page 492 (page 514 of the PDF):
>  "If the default framebuffer is affected, then each of the constants
>   must be one of the values listed in table 17.6 or the special value
>   BACK . When BACK is used, n must be 1 and color values are written
>   into the left buffer for single-buffered contexts, or into the back
>   left buffer for double-buffered contexts."
>
> This patch keeps the existing behaviour for OpenGL versions below 4.0. We
> assume that the new behaviour is what was intended for 4.x, so this is a
> fix there, while for 3.x the behaviour already in place is the intended one.
>
> Part of the fix for:
> GL45-CTS.direct_state_access.framebuffers_draw_read_buffers_errors
>
> v2: remove forgotten printf
> ---
>
> Previous version has a debug printf that I forgot to remove. Sorry for
> the noise.
>
>  src/mesa/main/buffers.c | 46 +-
>  1 file changed, 33 insertions(+), 13 deletions(-)
>
> diff --git a/src/mesa/main/buffers.c b/src/mesa/main/buffers.c
> index 2b24e5a..46ba4fd 100644
> --- a/src/mesa/main/buffers.c
> +++ b/src/mesa/main/buffers.c
> @@ -343,7 +343,9 @@ _mesa_NamedFramebufferDrawBuffer(GLuint framebuffer, 
> GLenum buf)
>   * \param n  number of outputs
>   * \param buffers  array [n] of renderbuffer names.  Unlike glDrawBuffer, the
>   * names cannot specify more than one buffer.  For example,
> - * GL_FRONT_AND_BACK is illegal.
> + * GL_FRONT_AND_BACK is illegal. The only exception is 
> GL_BACK
> + * that is considered special and allowed as far as n is one
> + * since 4.5.
>   */
>  static void
>  draw_buffers(struct gl_context *ctx, struct gl_framebuffer *fb,
> @@ -401,20 +403,38 @@ draw_buffers(struct gl_context *ctx, struct 
> gl_framebuffer *fb,
>   return;
>}
>
> -  /* From the OpenGL 4.0 specification, page 256:
> -   * "For both the default framebuffer and framebuffer objects, the
> -   *  constants FRONT, BACK, LEFT, RIGHT, and FRONT_AND_BACK are not
> -   *  valid in the bufs array passed to DrawBuffers, and will result in
> -   *  the error INVALID_ENUM. This restriction is because these
> -   *  constants may themselves refer to multiple buffers, as shown in
> -   *  table 4.4."
> -   *  Previous versions of the OpenGL specification say 
> INVALID_OPERATION,
> -   *  but the Khronos conformance tests expect INVALID_ENUM.
> +  /* From the OpenGL 4.5 specification, page 493 (page 515 of the PDF)
> +   * "An INVALID_ENUM error is generated if any value in bufs is FRONT
> +   *  , LEFT , RIGHT , or FRONT_AND_BACK . This restriction applies to
Need no space before comma. Move it after the comma.
> +   *  both the de- fault framebuffer and framebuffer objects, and
> +   *  exists because these constants may themselves refer to multiple
> +   *  buffers, as shown in table 17.4."
> +   *
> +   * And on page 492 (page 514 of the PDF):
> +   * "If the default framebuffer is affected, then each of the
> +   *  constants must be one of the values listed in table 17.6 or the
> +   *  special value BACK . When BACK is used, n must be 1 and color
> +   *  values are written into the left buffer for single-buffered
> +   *  contexts, or into the back left buffer for double-buffered
> +   *  contexts."
> +   *
> +   * Note "special value BACK". GL_BACK also refers to multiple buffers,
> +   * but it is consider a special case here. This is a change on 4.5. For
> +   * OpenGL 4.x we check that behaviour. For any previous version we keep
> +   * considering it wrong (as INVALID_ENUM).
> */
>if (_mesa_bitcount(destMask[output]) > 1) {
> - _mesa_error(ctx, GL_INVALID_ENUM, "%s(invalid buffer %s)",
> - caller, _mesa_enum_to_string(buffers[output]));
> - return;
> + if (_mesa_is_winsys_fbo(fb) && ctx->Version >= 40 && 
> buffers[output] == GL_BACK) {
Split this into two lines; it exceeds the 80-character limit.
> +if (n != 1) {
> +   _mesa_error(ctx, GL_INVALID_OPERATION, "%s(with GL_BACK n 
> must be 1)",
> +   caller);
> +   return;
> +}
> + } else {
> +_mesa_error(ctx, GL_INVALID_ENUM, "%s(invalid buffer %s)",
> +

Re: [Mesa-dev] [PATCH] docs: document INTEL_PRECISE_TRIG envvar

2017-01-11 Thread Jason Ekstrand
On Wed, Jan 11, 2017 at 2:18 PM, Jordan Justen 
wrote:

> On 2017-01-11 12:31:28, Kenneth Graunke wrote:
> > On Wednesday, January 11, 2017 10:48:53 AM PST Jason Ekstrand wrote:
> > > On Wed, Jan 11, 2017 at 10:34 AM, Jordan Justen <
> jordan.l.jus...@intel.com>
> > > wrote:
> > >
> > > > On 2017-01-11 09:53:15, Juan A. Suarez Romero wrote:
> > > > > +INTEL_PRECISE_TRIG - if set to 1 in gen<10, it fixes results
> out of
> > > > [-1.0, 1.0] range for a small set of values.
> > > >
> > > > Since we now have the precise_trig driconf option (d9546b0c5d1a),
> > > > should we deprecate INTEL_PRECISE_TRIG?
> > > >
> > >
> > > No.  Vulkan doesn't do driconf.
> > >
> >
> > See commit d9546b0c5d1a5136a92276cdd7c14883f0c62737 where we tried that
> > and commit be32a2132785fbc119f17e62070e007ee7d17af7 where we abandoned
> > that idea.
>
> Ok, answered ... twice.
>
> I guess there's no plans for something driconf-like for vk?
>

 Plans, yes, but nothing has materialized yet.  We haven't really had a
reason for it yet though.


Re: [Mesa-dev] [PATCH] docs: document INTEL_PRECISE_TRIG envvar

2017-01-11 Thread Jordan Justen
On 2017-01-11 12:31:28, Kenneth Graunke wrote:
> On Wednesday, January 11, 2017 10:48:53 AM PST Jason Ekstrand wrote:
> > On Wed, Jan 11, 2017 at 10:34 AM, Jordan Justen 
> > wrote:
> > 
> > > On 2017-01-11 09:53:15, Juan A. Suarez Romero wrote:
> > > > +INTEL_PRECISE_TRIG - if set to 1 in gen<10, it fixes results out of
> > > [-1.0, 1.0] range for a small set of values.
> > >
> > > Since we now have the precise_trig driconf option (d9546b0c5d1a),
> > > should we deprecate INTEL_PRECISE_TRIG?
> > >
> > 
> > No.  Vulkan doesn't do driconf.
> > 
> 
> See commit d9546b0c5d1a5136a92276cdd7c14883f0c62737 where we tried that
> and commit be32a2132785fbc119f17e62070e007ee7d17af7 where we abandoned
> that idea.

Ok, answered ... twice.

I guess there's no plans for something driconf-like for vk?

-Jordan
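
For illustration, an environment-variable toggle of the kind discussed in
this thread could be sketched as below. This is a minimal sketch, not the
i965 or Vulkan driver code: sk_flag_enabled, sk_precise_trig_from_env and
sk_fixup_trig are hypothetical helpers, and the clamp merely stands in for
whatever correction the driver applies. Only the variable name
INTEL_PRECISE_TRIG and the [-1.0, 1.0] range come from the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: the option counts as enabled only for the value "1". */
static bool
sk_flag_enabled(const char *value)
{
   return value != NULL && strcmp(value, "1") == 0;
}

/* Hypothetical helper: read the toggle from the process environment. */
static bool
sk_precise_trig_from_env(void)
{
   return sk_flag_enabled(getenv("INTEL_PRECISE_TRIG"));
}

/* Hypothetical fix-up: when the option is on, force a sin/cos result back
 * into [-1.0, 1.0]; when it is off, pass the raw result through. */
static float
sk_fixup_trig(float raw, bool precise)
{
   assert(raw == raw); /* the sketch does not handle NaN inputs */

   if (!precise)
      return raw;
   if (raw > 1.0f)
      return 1.0f;
   if (raw < -1.0f)
      return -1.0f;
   return raw;
}
```

An environment variable works the same way for any API, which matches the
point made above that Vulkan has no driconf to rely on.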


[Mesa-dev] [PATCH v2] main/buffers: update error handling on DrawBuffers for 4.5

2017-01-11 Thread Alejandro Piñeiro
Before 4.5, GL_BACK was not allowed as a value of bufs. Since 4.5 it
is allowed under some circumstances:

From the OpenGL 4.5 specification, Section 17.4.1 "Selecting Buffers
for Writing", page 493 (page 515 of the PDF):
 "An INVALID_ENUM error is generated if any value in bufs is FRONT,
  LEFT, RIGHT, or FRONT_AND_BACK. This restriction applies to both
  the default framebuffer and framebuffer objects, and exists
  because these constants may themselves refer to multiple buffers, as
  shown in table 17.4."

And on page 492 (page 514 of the PDF):
 "If the default framebuffer is affected, then each of the constants
  must be one of the values listed in table 17.6 or the special value
  BACK. When BACK is used, n must be 1 and color values are written
  into the left buffer for single-buffered contexts, or into the back
  left buffer for double-buffered contexts."

This patch keeps the same behaviour if the OpenGL version is < 4. We
assume that for 4.x this is the intended behaviour, so this is a fix, but
for 3.x the intended behaviour is the one already in place.

Part of the fix for:
GL45-CTS.direct_state_access.framebuffers_draw_read_buffers_errors

v2: remove forgotten debug printf
---

The previous version had a debug printf that I forgot to remove. Sorry
for the noise.

 src/mesa/main/buffers.c | 46 +-
 1 file changed, 33 insertions(+), 13 deletions(-)

diff --git a/src/mesa/main/buffers.c b/src/mesa/main/buffers.c
index 2b24e5a..46ba4fd 100644
--- a/src/mesa/main/buffers.c
+++ b/src/mesa/main/buffers.c
@@ -343,7 +343,9 @@ _mesa_NamedFramebufferDrawBuffer(GLuint framebuffer, GLenum buf)
  * \param n  number of outputs
  * \param buffers  array [n] of renderbuffer names.  Unlike glDrawBuffer, the
  * names cannot specify more than one buffer.  For example,
- * GL_FRONT_AND_BACK is illegal.
+ * GL_FRONT_AND_BACK is illegal. The only exception is GL_BACK
+ * that is considered special and allowed as far as n is one
+ * since 4.5.
  */
 static void
 draw_buffers(struct gl_context *ctx, struct gl_framebuffer *fb,
@@ -401,20 +403,38 @@ draw_buffers(struct gl_context *ctx, struct gl_framebuffer *fb,
  return;
   }
 
-  /* From the OpenGL 4.0 specification, page 256:
-   * "For both the default framebuffer and framebuffer objects, the
-   *  constants FRONT, BACK, LEFT, RIGHT, and FRONT_AND_BACK are not
-   *  valid in the bufs array passed to DrawBuffers, and will result in
-   *  the error INVALID_ENUM. This restriction is because these
-   *  constants may themselves refer to multiple buffers, as shown in
-   *  table 4.4."
-   *  Previous versions of the OpenGL specification say INVALID_OPERATION,
-   *  but the Khronos conformance tests expect INVALID_ENUM.
+  /* From the OpenGL 4.5 specification, page 493 (page 515 of the PDF)
+   * "An INVALID_ENUM error is generated if any value in bufs is FRONT,
+   *  LEFT, RIGHT, or FRONT_AND_BACK. This restriction applies to
+   *  both the default framebuffer and framebuffer objects, and
+   *  exists because these constants may themselves refer to multiple
+   *  buffers, as shown in table 17.4."
+   *
+   * And on page 492 (page 514 of the PDF):
+   * "If the default framebuffer is affected, then each of the
+   *  constants must be one of the values listed in table 17.6 or the
+   *  special value BACK. When BACK is used, n must be 1 and color
+   *  values are written into the left buffer for single-buffered
+   *  contexts, or into the back left buffer for double-buffered
+   *  contexts."
+   *
+   * Note "special value BACK". GL_BACK also refers to multiple buffers,
+   * but it is considered a special case here. This is a change in 4.5. For
+   * OpenGL 4.x we check that behaviour. For any previous version we keep
+   * considering it wrong (as INVALID_ENUM).
*/
   if (_mesa_bitcount(destMask[output]) > 1) {
- _mesa_error(ctx, GL_INVALID_ENUM, "%s(invalid buffer %s)",
- caller, _mesa_enum_to_string(buffers[output]));
- return;
+ if (_mesa_is_winsys_fbo(fb) && ctx->Version >= 40 && buffers[output] == GL_BACK) {
+if (n != 1) {
+   _mesa_error(ctx, GL_INVALID_OPERATION, "%s(with GL_BACK n must be 1)",
+   caller);
+   return;
+}
+ } else {
+_mesa_error(ctx, GL_INVALID_ENUM, "%s(invalid buffer %s)",
+caller, _mesa_enum_to_string(buffers[output]));
+return;
+ }
   }
 
   /* Section 4.2 (Whole Framebuffer Operations) of the OpenGL ES 3.0
-- 
2.9.3

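
The error rules quoted in the commit message can be sketched as a small
stand-alone check. This is a simplified illustration rather than the Mesa
code: sk_check_draw_buffer and the SK_* names are hypothetical, the numeric
values mirror the standard GL headers but are defined locally so the sketch
compiles without them, and the fixed list of multi-buffer names stands in
for the destMask bit count used by the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Error codes, with the same values as the GL_* constants in gl.h. */
enum {
   SK_NO_ERROR = 0,
   SK_INVALID_ENUM = 0x0500,
   SK_INVALID_OPERATION = 0x0502
};

/* Buffer names, with the same values as the GL_* constants in gl.h. */
enum {
   SK_BACK_LEFT = 0x0402,
   SK_FRONT = 0x0404,
   SK_BACK = 0x0405,
   SK_LEFT = 0x0406,
   SK_RIGHT = 0x0407,
   SK_FRONT_AND_BACK = 0x0408
};

/* Simplification: names that may refer to more than one color buffer. */
static bool
sk_names_multiple_buffers(unsigned buf)
{
   return buf == SK_FRONT || buf == SK_BACK || buf == SK_LEFT ||
          buf == SK_RIGHT || buf == SK_FRONT_AND_BACK;
}

/* Returns the error DrawBuffers would raise for one entry of bufs.
 * gl_version follows the ctx->Version convention (45 means 4.5). */
static unsigned
sk_check_draw_buffer(unsigned buf, int n, bool is_winsys_fbo, int gl_version)
{
   assert(n >= 1);

   if (!sk_names_multiple_buffers(buf))
      return SK_NO_ERROR;

   /* Special value BACK: default framebuffer only, and n must be 1. */
   if (is_winsys_fbo && gl_version >= 40 && buf == SK_BACK)
      return n == 1 ? SK_NO_ERROR : SK_INVALID_OPERATION;

   return SK_INVALID_ENUM;
}
```

With these rules, BACK on a user framebuffer object, or on any context
older than 4.0, still reports INVALID_ENUM, as before the patch.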


[Mesa-dev] [PATCH 2/2] main/buffers: take into account FRONT_AND_BACK on ReadBuffer

2017-01-11 Thread Alejandro Piñeiro
From the OpenGL 3.1 spec, section 4.3.1 "Reading Pixels", page 190 (203 PDF)

  "When READ_FRAMEBUFFER_BINDING is zero, i.e. the default
   framebuffer, src must be one of the values listed in table 4.4,
   including NONE. FRONT_AND_BACK, FRONT, and LEFT refer to the
   front left buffer."

There is equivalent text in the OpenGL 4.5 spec, section 18.2.1
"Selecting Buffers for Reading", page 502 (524 PDF), so the behaviour
is still the same.

Part of the fix for:
GL45-CTS.direct_state_access.framebuffers_draw_read_buffers_errors
---

Note that this functionality was not tested by piglit. I have just sent
a test to the piglit list that will fail until this patch gets
accepted.

 src/mesa/main/buffers.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/mesa/main/buffers.c b/src/mesa/main/buffers.c
index 92c1839..a5655f6 100644
--- a/src/mesa/main/buffers.c
+++ b/src/mesa/main/buffers.c
@@ -204,6 +204,8 @@ read_buffer_enum_to_index(const struct gl_context *ctx, GLenum buffer)
  return BUFFER_FRONT_LEFT;
   case GL_AUX0:
  return BUFFER_AUX0;
+  case GL_FRONT_AND_BACK:
+ return BUFFER_FRONT_LEFT;
   case GL_AUX1:
   case GL_AUX2:
   case GL_AUX3:
-- 
2.9.3

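
As a rough illustration of the mapping the spec text describes, the
following sketch resolves a ReadBuffer source on the default framebuffer.
It is not the Mesa function: sk_read_buffer_to_index and the SK_* names are
hypothetical, the enum values mirror the standard GL headers, and only the
sources named in the quoted paragraph are handled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Source names, with the same values as the GL_* constants in gl.h. */
enum {
   SK_FRONT_LEFT = 0x0400,
   SK_FRONT = 0x0404,
   SK_LEFT = 0x0406,
   SK_FRONT_AND_BACK = 0x0408
};

/* Hypothetical buffer index; only the front left buffer is needed here. */
enum { SK_BUFFER_FRONT_LEFT = 0 };

/* Stores the buffer index for src and returns true, or returns false for
 * sources this sketch does not cover. */
static bool
sk_read_buffer_to_index(unsigned src, int *index_out)
{
   assert(index_out != NULL);

   switch (src) {
   case SK_FRONT_LEFT:
   case SK_FRONT:
   case SK_LEFT:
   case SK_FRONT_AND_BACK:
      /* All of these read from the front left buffer. */
      *index_out = SK_BUFFER_FRONT_LEFT;
      return true;
   default:
      return false;
   }
}
```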


[Mesa-dev] [PATCH 1/2] main/buffers: update error handling on DrawBuffers for 4.5

2017-01-11 Thread Alejandro Piñeiro
Before 4.5, GL_BACK was not allowed as a value of bufs. Since 4.5 it
is allowed under some circumstances:

From the OpenGL 4.5 specification, Section 17.4.1 "Selecting Buffers
for Writing", page 493 (page 515 of the PDF):
 "An INVALID_ENUM error is generated if any value in bufs is FRONT,
  LEFT, RIGHT, or FRONT_AND_BACK. This restriction applies to both
  the default framebuffer and framebuffer objects, and exists
  because these constants may themselves refer to multiple buffers, as
  shown in table 17.4."

And on page 492 (page 514 of the PDF):
 "If the default framebuffer is affected, then each of the constants
  must be one of the values listed in table 17.6 or the special value
  BACK. When BACK is used, n must be 1 and color values are written
  into the left buffer for single-buffered contexts, or into the back
  left buffer for double-buffered contexts."

This patch keeps the same behaviour if the OpenGL version is < 4. We
assume that for 4.x this is the intended behaviour, so this is a fix, but
for 3.x the intended behaviour is the one already in place.

Part of the fix for:
GL45-CTS.direct_state_access.framebuffers_draw_read_buffers_errors
---

Note that this change would make the piglit test gl-3.1-draw-buffers fail.

I have just sent patches to update that test, and a gl-4.5 equivalent
(using NamedFramebufferDrawBuffers) to the piglit list.

 src/mesa/main/buffers.c | 47 ++-
 1 file changed, 34 insertions(+), 13 deletions(-)

diff --git a/src/mesa/main/buffers.c b/src/mesa/main/buffers.c
index 2b24e5a..92c1839 100644
--- a/src/mesa/main/buffers.c
+++ b/src/mesa/main/buffers.c
@@ -343,7 +343,9 @@ _mesa_NamedFramebufferDrawBuffer(GLuint framebuffer, GLenum buf)
  * \param n  number of outputs
  * \param buffers  array [n] of renderbuffer names.  Unlike glDrawBuffer, the
  * names cannot specify more than one buffer.  For example,
- * GL_FRONT_AND_BACK is illegal.
+ * GL_FRONT_AND_BACK is illegal. The only exception is GL_BACK
+ * that is considered special and allowed as far as n is one
+ * since 4.5.
  */
 static void
 draw_buffers(struct gl_context *ctx, struct gl_framebuffer *fb,
@@ -401,20 +403,39 @@ draw_buffers(struct gl_context *ctx, struct gl_framebuffer *fb,
  return;
   }
 
-  /* From the OpenGL 4.0 specification, page 256:
-   * "For both the default framebuffer and framebuffer objects, the
-   *  constants FRONT, BACK, LEFT, RIGHT, and FRONT_AND_BACK are not
-   *  valid in the bufs array passed to DrawBuffers, and will result in
-   *  the error INVALID_ENUM. This restriction is because these
-   *  constants may themselves refer to multiple buffers, as shown in
-   *  table 4.4."
-   *  Previous versions of the OpenGL specification say INVALID_OPERATION,
-   *  but the Khronos conformance tests expect INVALID_ENUM.
+  /* From the OpenGL 4.5 specification, page 493 (page 515 of the PDF)
+   * "An INVALID_ENUM error is generated if any value in bufs is FRONT,
+   *  LEFT, RIGHT, or FRONT_AND_BACK. This restriction applies to
+   *  both the default framebuffer and framebuffer objects, and
+   *  exists because these constants may themselves refer to multiple
+   *  buffers, as shown in table 17.4."
+   *
+   * And on page 492 (page 514 of the PDF):
+   * "If the default framebuffer is affected, then each of the
+   *  constants must be one of the values listed in table 17.6 or the
+   *  special value BACK. When BACK is used, n must be 1 and color
+   *  values are written into the left buffer for single-buffered
+   *  contexts, or into the back left buffer for double-buffered
+   *  contexts."
+   *
+   * Note "special value BACK". GL_BACK also refers to multiple buffers,
+   * but it is considered a special case here. This is a change in 4.5. For
+   * OpenGL 4.x we check that behaviour. For any previous version we keep
+   * considering it wrong (as INVALID_ENUM).
*/
   if (_mesa_bitcount(destMask[output]) > 1) {
- _mesa_error(ctx, GL_INVALID_ENUM, "%s(invalid buffer %s)",
- caller, _mesa_enum_to_string(buffers[output]));
- return;
+ if (_mesa_is_winsys_fbo(fb) && ctx->Version >= 40 && buffers[output] == GL_BACK) {
+printf("default framebuffer! with buffer %s\n", _mesa_enum_to_string(buffers[output]));
+if (n != 1) {
+   _mesa_error(ctx, GL_INVALID_OPERATION, "%s(with GL_BACK n must be 1)",
+   caller);
+   return;
+}
+ } else {
+_mesa_error(ctx, GL_INVALID_ENUM, "%s(invalid buffer %s)",
+caller, _mesa_enum_to_string(buffers[output]));
+return;
+ }
   }
 
   /* Section 4.2 (Whole Framebuffer Operations) of the OpenGL ES 3.0
-- 

Re: [Mesa-dev] [PATCH] radeonsi: enable 32-bit denormals on VI+

2017-01-11 Thread Roland Scheidegger
On 11.01.2017 at 21:08, Samuel Pitoiset wrote:
> 
> 
> On 01/11/2017 07:00 PM, Roland Scheidegger wrote:
>> I don't think there's any glsl, es or otherwise, specification which
>> would require denorms (since obviously lots of hw can't do it, d3d10
>> forbids them), with any precision qualifier. Hence these look like bugs
>> of the test suite to me?
>> (Irrespective of whether it's a good idea to enable denormals, which I
>> don't really know.)
> 
> That test works on NVIDIA hw (both with blob and nouveau) and IIRC it
> also works on Intel hw. I don't think it's buggy there.
The question then is why it needs denorms on radeons...

Roland


> 
>>
>> Roland
>>
>>
>> On 11.01.2017 at 18:29, Samuel Pitoiset wrote:
>>> Only VI can do 32-bit denormals at full rate while previous
>>> generations can do it only for 64-bit and 16-bit.
>>>
>>> This fixes some dEQP tests with the highp type qualifier.
>>>
>>> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99343
>>> Signed-off-by: Samuel Pitoiset 
>>> ---
>>>  src/gallium/drivers/radeonsi/si_shader.c | 11 ---
>>>  1 file changed, 8 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/src/gallium/drivers/radeonsi/si_shader.c
>>> b/src/gallium/drivers/radeonsi/si_shader.c
>>> index 5dfbd6603a..e9cb11883f 100644
>>> --- a/src/gallium/drivers/radeonsi/si_shader.c
>>> +++ b/src/gallium/drivers/radeonsi/si_shader.c
>>> @@ -6361,8 +6361,10 @@ int si_compile_llvm(struct si_screen *sscreen,
>>>
>>>  si_shader_binary_read_config(binary, conf, 0);
>>>
>>> -/* Enable 64-bit and 16-bit denormals, because there is no
>>> performance
>>> - * cost.
>>> +/* Enable denormals when there is no performance cost.
>>> + *
>>> + * Only VI can do 32-bit denormals at full rate while previous
>>> + * generations can do it only for 64-bit and 16-bit.
>>>   *
>>>   * If denormals are enabled, all floating-point output modifiers
>>> are
>>>   * ignored.
>>> @@ -6373,7 +6375,10 @@ int si_compile_llvm(struct si_screen *sscreen,
>>>   *   have to stop using those.
>>>   * - SI & CI would be very slow.
>>>   */
>>> -conf->float_mode |= V_00B028_FP_64_DENORMS;
>>> +if (sscreen->b.chip_class >= VI)
>>> +conf->float_mode |= V_00B028_FP_ALL_DENORMS;
>>> +else
>>> +conf->float_mode |= V_00B028_FP_64_DENORMS;
>>>
>>>  FREE(binary->config);
>>>  FREE(binary->global_symbol_offsets);
>>>
>>

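
The per-generation selection made by the quoted patch can be sketched in
isolation as follows. This is only an illustration under stated
assumptions: sk_enable_denorms, the sk_chip_class enum and the SK_FP_* bit
values are hypothetical placeholders and do not reproduce the real
V_00B028_* register encodings.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical chip generations, ordered oldest to newest. */
enum sk_chip_class { SK_SI, SK_CI, SK_VI };

/* Placeholder flag bits, not the hardware register values. */
#define SK_FP_32_DENORMS    0x1u
#define SK_FP_64_16_DENORMS 0x2u
#define SK_FP_ALL_DENORMS   (SK_FP_32_DENORMS | SK_FP_64_16_DENORMS)

/* Enable only the denormal modes that run at full rate on this chip:
 * every mode on VI and later, 64-bit and 16-bit only before that. */
static void
sk_enable_denorms(unsigned *float_mode, enum sk_chip_class chip)
{
   assert(float_mode != NULL);

   if (chip >= SK_VI)
      *float_mode |= SK_FP_ALL_DENORMS;
   else
      *float_mode |= SK_FP_64_16_DENORMS;
}
```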


Re: [Mesa-dev] [PATCH 24/25] mesa/glsl: set and get cs layouts to and from shader_info

2017-01-11 Thread Timothy Arceri
On Wed, 2017-01-11 at 12:54 +, Lionel Landwerlin wrote:
> On 09/01/17 05:13, Timothy Arceri wrote:
> > ---
> >   src/compiler/glsl/linker.cpp | 35 +++--
> > --
> >   src/mesa/main/mtypes.h   | 10 --
> >   src/mesa/main/shaderapi.c|  6 ++
> >   src/mesa/main/shaderobj.c|  2 --
> >   4 files changed, 17 insertions(+), 36 deletions(-)
> > 
> > diff --git a/src/compiler/glsl/linker.cpp
> > b/src/compiler/glsl/linker.cpp
> > index 53ee7e6..f822778 100644
> > --- a/src/compiler/glsl/linker.cpp
> > +++ b/src/compiler/glsl/linker.cpp
> > @@ -2005,21 +2005,21 @@ link_gs_inout_layout_qualifiers(struct
> > gl_shader_program *prog,
> >    */
> >   static void
> >   link_cs_input_layout_qualifiers(struct gl_shader_program *prog,
> > -struct gl_linked_shader
> > *linked_shader,
> > +struct gl_program *gl_prog,
> >   struct gl_shader **shader_list,
> >   unsigned num_shaders)
> >   {
> > -   for (int i = 0; i < 3; i++)
> > -  linked_shader->info.Comp.LocalSize[i] = 0;
> > -
> > -   linked_shader->info.Comp.LocalSizeVariable = false;
> > -
> >  /* This function is called for all shader stages, but it only
> > has an effect
> >   * for compute shaders.
> >   */
> > -   if (linked_shader->Stage != MESA_SHADER_COMPUTE)
> > +   if (gl_prog->info.stage != MESA_SHADER_COMPUTE)
> > return;
> >   
> > +   for (int i = 0; i < 3; i++)
> > +  gl_prog->info.cs.local_size[i] = 0;
> > +
> > +   gl_prog->info.cs.local_size_variable = false;
> > +
> >  /* From the ARB_compute_shader spec, in the section describing
> > local size
> >   * declarations:
> >   *
> > @@ -2034,9 +2034,9 @@ link_cs_input_layout_qualifiers(struct
> > gl_shader_program *prog,
> > struct gl_shader *shader = shader_list[sh];
> >   
> > if (shader->info.Comp.LocalSize[0] != 0) {
> > - if (linked_shader->info.Comp.LocalSize[0] != 0) {
> > + if (gl_prog->info.cs.local_size[0] != 0) {
> >   for (int i = 0; i < 3; i++) {
> > -   if (linked_shader->info.Comp.LocalSize[i] !=
> > +   if (gl_prog->info.cs.local_size[i] !=
> >  shader->info.Comp.LocalSize[i]) {
> > linker_error(prog, "compute shader defined with
> > conflicting "
> >  "local sizes\n");
> > @@ -2045,11 +2045,11 @@ link_cs_input_layout_qualifiers(struct
> > gl_shader_program *prog,
> >   }
> >    }
> >    for (int i = 0; i < 3; i++) {
> > -linked_shader->info.Comp.LocalSize[i] =
> > +gl_prog->info.cs.local_size[i] =
> >  shader->info.Comp.LocalSize[i];
> >    }
> > } else if (shader->info.Comp.LocalSizeVariable) {
> > - if (linked_shader->info.Comp.LocalSize[0] != 0) {
> > + if (gl_prog->info.cs.local_size[0] != 0) {
> >   /* The ARB_compute_variable_group_size spec says:
> >    *
> >    * If one compute shader attached to a program
> > declares a
> > @@ -2061,7 +2061,7 @@ link_cs_input_layout_qualifiers(struct
> > gl_shader_program *prog,
> >    "variable local group size\n");
> >   return;
> >    }
> > - linked_shader->info.Comp.LocalSizeVariable = true;
> > + gl_prog->info.cs.local_size_variable = true;
> > }
> >  }
> >   
> > @@ -2069,17 +2069,12 @@ link_cs_input_layout_qualifiers(struct
> > gl_shader_program *prog,
> >   * since we already know we're in the right type of shader
> > program
> >   * for doing it.
> >   */
> > -   if (linked_shader->info.Comp.LocalSize[0] == 0 &&
> > -   !linked_shader->info.Comp.LocalSizeVariable) {
> > +   if (gl_prog->info.cs.local_size[0] == 0 &&
> > +   !gl_prog->info.cs.local_size_variable) {
> > linker_error(prog, "compute shader must contain a fixed or
> > a variable "
> >    "local group size\n");
> > return;
> >  }
> > -   for (int i = 0; i < 3; i++)
> > -  prog->Comp.LocalSize[i] = linked_shader-
> > >info.Comp.LocalSize[i];
> > -
> > -   prog->Comp.LocalSizeVariable =
> > -  linked_shader->info.Comp.LocalSizeVariable;
> >   }
> >   
> >   
> > @@ -2209,7 +2204,7 @@ link_intrastage_shaders(void *mem_ctx,
> >  link_tcs_out_layout_qualifiers(prog, gl_prog, shader_list,
> > num_shaders);
> >  link_tes_in_layout_qualifiers(prog, gl_prog, shader_list,
> > num_shaders);
> >  link_gs_inout_layout_qualifiers(prog, gl_prog, shader_list,
> > num_shaders);
> > -   link_cs_input_layout_qualifiers(prog, linked, shader_list,
> > num_shaders);
> > +   link_cs_input_layout_qualifiers(prog, gl_prog, shader_list,
> > num_shaders);
> >  link_xfb_stride_layout_qualifiers(ctx, prog, linked,
> > shader_list,
> >   
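
The checks visible in the quoted hunks of link_cs_input_layout_qualifiers
can be summarised with a stand-alone sketch. It is an illustration, not the
linker code: sk_cs_layout, sk_link_status and sk_merge_cs_layouts are
hypothetical names, error reporting is reduced to a return code, and only
the conditions that appear in the quoted diff are reproduced.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-shader compute layout. */
struct sk_cs_layout {
   unsigned local_size[3];
   bool local_size_variable;
};

enum sk_link_status {
   SK_LINK_OK,
   SK_LINK_CONFLICTING_SIZES,  /* two shaders declare different fixed sizes */
   SK_LINK_FIXED_AND_VARIABLE, /* variable size declared after a fixed one */
   SK_LINK_NO_SIZE             /* neither a fixed nor a variable size */
};

/* Merge the layouts of all attached compute shaders into *out. */
static enum sk_link_status
sk_merge_cs_layouts(const struct sk_cs_layout *shaders, size_t count,
                    struct sk_cs_layout *out)
{
   assert(out != NULL);

   for (int i = 0; i < 3; i++)
      out->local_size[i] = 0;
   out->local_size_variable = false;

   for (size_t sh = 0; sh < count; sh++) {
      const struct sk_cs_layout *s = &shaders[sh];

      if (s->local_size[0] != 0) {
         /* A fixed size must agree with any size seen earlier. */
         if (out->local_size[0] != 0) {
            for (int i = 0; i < 3; i++) {
               if (out->local_size[i] != s->local_size[i])
                  return SK_LINK_CONFLICTING_SIZES;
            }
         }
         for (int i = 0; i < 3; i++)
            out->local_size[i] = s->local_size[i];
      } else if (s->local_size_variable) {
         if (out->local_size[0] != 0)
            return SK_LINK_FIXED_AND_VARIABLE;
         out->local_size_variable = true;
      }
   }

   if (out->local_size[0] == 0 && !out->local_size_variable)
      return SK_LINK_NO_SIZE;

   return SK_LINK_OK;
}
```

Writing the merged result into a single output structure mirrors the
direction of the patch, which stores the linked layout in one place
instead of copying it between the linked shader and the program.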

Re: [Mesa-dev] [PATCH 03/27] gbm: Export a plane getter function

2017-01-11 Thread Jason Ekstrand
On Tue, Jan 10, 2017 at 2:46 AM, Daniel Stone  wrote:

> Hi,
>
> On 10 January 2017 at 05:49, Ben Widawsky 
> wrote:
> > On 17-01-09 11:56:04, Jason Ekstrand wrote:
> >> Do we need to do any error checking here?  Do we need to check for the
> >> right dri image extension version?  Do we need to check queryImage !=
> >> NULL?  Do we need to check a return value?
> >>
> >> I ask because I genuinely don't know how this stuff is supposed to work.
> >> Returning a default of 1 seems reasonable.
> >
> > I'm not entirely sure about a reasonable default, 1 seemed right to me, but
> > Eric E seemed to think we should return 0, it's hard to test, so I'll defer
> > to anyone that claims to be the expert.
>
> Drivers which don't support the new interfaces can by definition only
> export single-plane images. So, 1.
>

An expert has spoken!
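
The checks asked about in this thread (extension version, a non-NULL query
hook, the query's return value) and the agreed default of 1 can be sketched
as below. Everything here is hypothetical: sk_image_ext,
SK_MIN_PLANE_QUERY_VERSION and sk_get_plane_count are not the gbm or DRI
image interfaces, and returning -1 when the query itself fails is an
assumption that the thread does not settle.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical minimum extension version that offers a plane query. */
#define SK_MIN_PLANE_QUERY_VERSION 2

/* Hypothetical driver interface; the hook returns nonzero on success. */
struct sk_image_ext {
   int version;
   int (*query_num_planes)(const void *image, int *value);
};

/* Drivers without the newer interface can only export single-plane
 * images, so the fallback is 1. A failed query is reported as -1. */
static int
sk_get_plane_count(const struct sk_image_ext *ext, const void *image)
{
   int planes = 0;

   assert(image != NULL);

   if (ext == NULL || ext->version < SK_MIN_PLANE_QUERY_VERSION ||
       ext->query_num_planes == NULL)
      return 1;

   if (!ext->query_num_planes(image, &planes) || planes < 1)
      return -1;

   return planes;
}

/* Two stand-in drivers used only to exercise the sketch. */
static int
sk_query_two_planes(const void *image, int *value)
{
   (void)image;
   *value = 2;
   return 1;
}

static int
sk_query_fails(const void *image, int *value)
{
   (void)image;
   (void)value;
   return 0;
}
```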


Re: [Mesa-dev] [PATCH] glsl: always do sqrt(abs()) and inversesqrt(abs())

2017-01-11 Thread Jason Ekstrand
On Wed, Jan 11, 2017 at 1:12 PM, Erik Faye-Lund  wrote:

> On Wed, Jan 11, 2017 at 9:49 PM, Erik Faye-Lund 
> wrote:
> > On Wed, Jan 11, 2017 at 9:42 PM, Erik Faye-Lund 
> wrote:
> >> On Wed, Jan 11, 2017 at 9:22 PM, Samuel Pitoiset
> >>  wrote:
> >>>
> >>>
> >>> On 01/11/2017 07:34 PM, Erik Faye-Lund wrote:
> 
>  On Wed, Jan 11, 2017 at 7:33 PM, Jason Ekstrand  >
>  wrote:
> >
> > On Wed, Jan 11, 2017 at 10:31 AM, Erik Faye-Lund <
> kusmab...@gmail.com>
> > wrote:
> >>
> >>
> >> On Wed, Jan 11, 2017 at 7:22 PM, Marek Olšák 
> wrote:
> >>>
> >>> On Wed, Jan 11, 2017 at 7:09 PM, Jason Ekstrand <
> ja...@jlekstrand.net>
> >>> wrote:
> 
>  On Wed, Jan 11, 2017 at 9:32 AM, Samuel Pitoiset
>  
>  wrote:
> >
> >
> >
> >
> > On 01/11/2017 05:32 PM, Marek Olšák wrote:
> >>
> >>
> >> On Wed, Jan 11, 2017 at 4:33 PM, Erik Faye-Lund
> >> 
> >> wrote:
> >>>
> >>>
> >>> On Wed, Jan 11, 2017 at 4:14 PM, Nicolai Hähnle
> >>> 
> >>> wrote:
> 
> 
>  On 11.01.2017 13:17, Marek Olšák wrote:
> >
> >
> >
> > On Tue, Jan 10, 2017 at 6:48 PM, Jason Ekstrand
> > 
> > wrote:
> >>
> >>
> >>
> >> I'll be honest, I'm not a fan... Given that D3D10 has one defined
> >> behavior, D3D9 has another, and GL doesn't specify, I don't really
> >> think we should be making a global change to all drivers to do the
> >> D3D9 behavior just to fix one app.  Sure, other apps probably have
> >> the same bug, but are we going to have apps that expect the D3D10
> >> behavior that we've now explicitly made not work?
> >>
> >> If we're going to hack around an app bug, I would really rather see
> >> it behind a driconf option rather than a global change to driver
> >> behavior.  Even better, it'd be cool if we could see the app get
> >> fixed.  (Yes, I know that's not likely).
> >
> >
> >
> >
> > I think we are not in a position to refuse this workaround, or put
> > more precisely, to have a different behavior from everybody else. By
> > "we", I mean i965, radeonsi, svga. All closed drivers use abs. Many
> > Mesa drivers also use abs internally (r300, r600, nv30, nv50/nvc0).
> > This is not really a workaround for a specific application, even
> > though it's strongly motivated by that. It's a fix to align the few
> > remaining drivers with all others.
> >
> > We talked with the publisher about this a very long time ago. While I
> > don't remember the details (Nicolai?), I think they refused to fix it
> > because radeonsi appeared to be the only driver not doing abs.
> 
> 
> 
> 
>  If I remember correctly, it wasn't so much a refusal as a lack of
>  follow-through. They even had an option in their framework to add the
>  abs(...) when translating shaders, but somehow didn't turn it on
>  unconditionally for some reason...
> >>>
> >>>
> >>>
> >>> VP even says so here:
> >>> https://github.com/virtual-programming/specops-linux/issues/20
> >>>
> >>> They recommend against patching mesa to do abs, though.
> >>
> >>
> >>
> >> We should still patch Mesa to align the behavior with closed drivers
> >> and gallium drivers like r600g and nouveau. In other words, it's too
> >> late to tell us not to patch Mesa, because r600g and nouveau have
> >> been "patched" since the beginning.
> >>
> >> We only need to decide whether we should do it in the GLSL compiler
> >> or radeonsi, i.e. whether we should exclude i965 and svga.
> >
> >
> >
> > I do 
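
To make the proposal under discussion concrete, here is a toy lowering that
wraps the operand of every sqrt and inversesqrt in abs. It is a minimal
sketch on a hypothetical unary expression chain, not the GLSL IR or the
actual patch: sk_expr, sk_new and sk_lower_sqrt_abs are invented for the
example, and nodes are never freed to keep it short.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical unary expression chain: each node has at most one operand. */
enum sk_op { SK_VAR, SK_ABS, SK_SQRT, SK_RSQ };

struct sk_expr {
   enum sk_op op;
   struct sk_expr *operand;
};

static struct sk_expr *
sk_new(enum sk_op op, struct sk_expr *operand)
{
   struct sk_expr *e = malloc(sizeof(*e));

   assert(e != NULL);
   e->op = op;
   e->operand = operand;
   return e;
}

/* Rewrite sqrt(x) to sqrt(abs(x)) and inversesqrt(x) to
 * inversesqrt(abs(x)), skipping operands that are already an abs. */
static void
sk_lower_sqrt_abs(struct sk_expr *e)
{
   for (; e != NULL; e = e->operand) {
      int is_root = e->op == SK_SQRT || e->op == SK_RSQ;

      if (is_root && e->operand != NULL && e->operand->op != SK_ABS)
         e->operand = sk_new(SK_ABS, e->operand);
   }
}
```

Doing the rewrite once in a shared compiler pass changes every driver,
which is the trade-off debated in this thread against a driver-specific
change or a driconf option.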

Re: [Mesa-dev] [PATCH] glsl: always do sqrt(abs()) and inversesqrt(abs())

2017-01-11 Thread Erik Faye-Lund
On Wed, Jan 11, 2017 at 9:49 PM, Erik Faye-Lund  wrote:
> On Wed, Jan 11, 2017 at 9:42 PM, Erik Faye-Lund  wrote:
>> On Wed, Jan 11, 2017 at 9:22 PM, Samuel Pitoiset
>>  wrote:
>>>
>>>
>>> On 01/11/2017 07:34 PM, Erik Faye-Lund wrote:

 On Wed, Jan 11, 2017 at 7:33 PM, Jason Ekstrand 
 wrote:
>
> On Wed, Jan 11, 2017 at 10:31 AM, Erik Faye-Lund 
> wrote:
>>
>>
>> On Wed, Jan 11, 2017 at 7:22 PM, Marek Olšák  wrote:
>>>
>>> On Wed, Jan 11, 2017 at 7:09 PM, Jason Ekstrand 
>>> wrote:

 On Wed, Jan 11, 2017 at 9:32 AM, Samuel Pitoiset
 
 wrote:
>
>
>
>
> On 01/11/2017 05:32 PM, Marek Olšák wrote:
>>
>>
>> On Wed, Jan 11, 2017 at 4:33 PM, Erik Faye-Lund
>> 
>> wrote:
>>>
>>>
>>> On Wed, Jan 11, 2017 at 4:14 PM, Nicolai Hähnle
>>> 
>>> wrote:


 On 11.01.2017 13:17, Marek Olšák wrote:
>
>
>
> On Tue, Jan 10, 2017 at 6:48 PM, Jason Ekstrand
> 
> wrote:
>>
>>
>>
>> I'll be honest, I'm not a fan... Given that D3D10 has one defined
>> behavior, D3D9 has another, and GL doesn't specify, I don't really
>> think we should be making a global change to all drivers to do the
>> D3D9 behavior just to fix one app.  Sure, other apps probably have
>> the same bug, but are we going to have apps that expect the D3D10
>> behavior that we've now explicitly made not work?
>>
>> If we're going to hack around an app bug, I would really rather see
>> it behind a driconf option rather than a global change to driver
>> behavior.  Even better, it'd be cool if we could see the app get
>> fixed.  (Yes, I know that's not likely).
>
>
>
>
> I think we are not in a position to refuse this workaround, or put
> more precisely, to have a different behavior from everybody else. By
> "we", I mean i965, radeonsi, svga. All closed drivers use abs. Many
> Mesa drivers also use abs internally (r300, r600, nv30, nv50/nvc0).
> This is not really a workaround for a specific application, even
> though it's strongly motivated by that. It's a fix to align the few
> remaining drivers with all others.
>
> We talked with the publisher about this a very long time ago. While I
> don't remember the details (Nicolai?), I think they refused to fix it
> because radeonsi appeared to be the only driver not doing abs.




 If I remember correctly, it wasn't so much a refusal as a lack of
 follow-through. They even had an option in their framework to add the
 abs(...) when translating shaders, but somehow didn't turn it on
 unconditionally for some reason...
>>>
>>>
>>>
>>> VP even says so here:
>>> https://github.com/virtual-programming/specops-linux/issues/20
>>>
>>> They recommend against patching mesa to do abs, though.
>>
>>
>>
>> We should still patch Mesa to align the behavior with closed drivers
>> and gallium drivers like r600g and nouveau. In other words, it's too
>> late to tell us not to patch Mesa, because r600g and nouveau have
>> been "patched" since the beginning.
>>
>> We only need to decide whether we should do it in the GLSL compiler
>> or radeonsi, i.e. whether we should exclude i965 and svga.
>
>
>
> I do agree with that.



 I tend to disagree but I've come to the conclusion that I won't stand in
 the way either.  If both of the other desktop vendors do it and we've
 already decided that no implementation we care about will have its
 performance impacted, it seems like a valid spec-compliant thing to do.
 I would
 

Re: [Mesa-dev] [PATCH] radeonsi: enable 32-bit denormals on VI+

2017-01-11 Thread Jason Ekstrand
On Wed, Jan 11, 2017 at 12:08 PM, Samuel Pitoiset  wrote:

>
>
> On 01/11/2017 07:00 PM, Roland Scheidegger wrote:
>
>> I don't think there's any glsl, es or otherwise, specification which
>> would require denorms (since obviously lots of hw can't do it, d3d10
>> forbids them), with any precision qualifier. Hence these look like bugs
>> of the test suite to me?
>> (Irrespective of whether it's a good idea to enable denormals, which I
>> don't really know.)
>>
>
> That test works on NVIDIA hw (both with blob and nouveau) and IIRC it also
> works on Intel hw. I don't think it's buggy there.
>

Intel HW has full denorm support.  Just because the test works for us
doesn't mean it's valid.


>
>
>> Roland
>>
>>
>> On 11.01.2017 at 18:29, Samuel Pitoiset wrote:
>>
>>> Only VI can do 32-bit denormals at full rate while previous
>>> generations can do it only for 64-bit and 16-bit.
>>>
>>> This fixes some dEQP tests with the highp type qualifier.
>>>
>>> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99343
>>> Signed-off-by: Samuel Pitoiset 
>>> ---
>>>  src/gallium/drivers/radeonsi/si_shader.c | 11 ---
>>>  1 file changed, 8 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/src/gallium/drivers/radeonsi/si_shader.c
>>> b/src/gallium/drivers/radeonsi/si_shader.c
>>> index 5dfbd6603a..e9cb11883f 100644
>>> --- a/src/gallium/drivers/radeonsi/si_shader.c
>>> +++ b/src/gallium/drivers/radeonsi/si_shader.c
>>> @@ -6361,8 +6361,10 @@ int si_compile_llvm(struct si_screen *sscreen,
>>>
>>> si_shader_binary_read_config(binary, conf, 0);
>>>
>>> -   /* Enable 64-bit and 16-bit denormals, because there is no
>>> performance
>>> -* cost.
>>> +   /* Enable denormals when there is no performance cost.
>>> +*
>>> +* Only VI can do 32-bit denormals at full rate while previous
>>> +* generations can do it only for 64-bit and 16-bit.
>>>  *
>>>  * If denormals are enabled, all floating-point output modifiers
>>> are
>>>  * ignored.
>>> @@ -6373,7 +6375,10 @@ int si_compile_llvm(struct si_screen *sscreen,
>>>  *   have to stop using those.
>>>  * - SI & CI would be very slow.
>>>  */
>>> -   conf->float_mode |= V_00B028_FP_64_DENORMS;
>>> +   if (sscreen->b.chip_class >= VI)
>>> +   conf->float_mode |= V_00B028_FP_ALL_DENORMS;
>>> +   else
>>> +   conf->float_mode |= V_00B028_FP_64_DENORMS;
>>>
>>> FREE(binary->config);
>>> FREE(binary->global_symbol_offsets);
>>>
>>>


Re: [Mesa-dev] [PATCH 4/9] nir: Introduce a nir_opt_move_comparisons() pass.

2017-01-11 Thread Jason Ekstrand
On Wed, Jan 11, 2017 at 12:49 PM, Jason Ekstrand 
wrote:

> I just sent three squash-in patches that I think improve this pass a bit
> and make things more clear.  With those (or a good reason why not),
>

FYI, they've already been run through Jenkins. :-)


> Reviewed-by: Jason Ekstrand 
>
> On Tue, Jan 10, 2017 at 1:41 AM, Timothy Arceri <
> timothy.arc...@collabora.com> wrote:
>
>> From: Kenneth Graunke 
>>
>> This tries to move comparisons (a common source of boolean values)
>> closer to their first use.  For GPUs which use condition codes,
>> this can eliminate a lot of temporary booleans and comparisons
>> which reload the condition code register based on a boolean.
>>
>> V2: (Timothy Arceri)
>>  - fix move comparison for phis so we don't end up with:
>>
>> vec1 32 ssa_227 = phi block_34: ssa_1, block_38: ssa_240
>> vec1 32 ssa_235 = feq ssa_227, ssa_1
>> vec1 32 ssa_230 = phi block_34: ssa_221, block_38: ssa_235
>>
>>  - add nir_op_i2b/nir_op_f2b to the list of comparisons.
>>
>> V3: (Timothy Arceri)
>>  - tidy up suggested by Jason.
>>  - add inot/fnot to move comparison list
>>
>> Signed-off-by: Kenneth Graunke 
>> Reviewed-by: Ian Romanick  [v1]
>> ---
>>  src/compiler/Makefile.sources   |   1 +
>>  src/compiler/nir/nir.h  |   2 +
>>  src/compiler/nir/nir_opt_move_comparisons.c | 179
>> 
>>  3 files changed, 182 insertions(+)
>>  create mode 100644 src/compiler/nir/nir_opt_move_comparisons.c
>>
>> diff --git a/src/compiler/Makefile.sources b/src/compiler/Makefile.sources
>> index 52f6e54..6da854e 100644
>> --- a/src/compiler/Makefile.sources
>> +++ b/src/compiler/Makefile.sources
>> @@ -245,6 +245,7 @@ NIR_FILES = \
>> nir/nir_opt_global_to_local.c \
>> nir/nir_opt_if.c \
>> nir/nir_opt_loop_unroll.c \
>> +   nir/nir_opt_move_comparisons.c \
>> nir/nir_opt_peephole_select.c \
>> nir/nir_opt_remove_phis.c \
>> nir/nir_opt_trivial_continues.c \
>> diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h
>> index c9226e9..2bff0b5 100644
>> --- a/src/compiler/nir/nir.h
>> +++ b/src/compiler/nir/nir.h
>> @@ -2589,6 +2589,8 @@ bool nir_opt_if(nir_shader *shader);
>>
>>  bool nir_opt_loop_unroll(nir_shader *shader, nir_variable_mode
>> indirect_mask);
>>
>> +bool nir_opt_move_comparisons(nir_shader *shader);
>> +
>>  bool nir_opt_peephole_select(nir_shader *shader, unsigned limit);
>>
>>  bool nir_opt_remove_phis(nir_shader *shader);
>> diff --git a/src/compiler/nir/nir_opt_move_comparisons.c
>> b/src/compiler/nir/nir_opt_move_comparisons.c
>> new file mode 100644
>> index 000..651b937
>> --- /dev/null
>> +++ b/src/compiler/nir/nir_opt_move_comparisons.c
>> @@ -0,0 +1,179 @@
>> +/*
>> + * Copyright © 2016 Intel Corporation
>> + *
>> + * Permission is hereby granted, free of charge, to any person obtaining
>> a
>> + * copy of this software and associated documentation files (the
>> "Software"),
>> + * to deal in the Software without restriction, including without
>> limitation
>> + * the rights to use, copy, modify, merge, publish, distribute,
>> sublicense,
>> + * and/or sell copies of the Software, and to permit persons to whom the
>> + * Software is furnished to do so, subject to the following conditions:
>> + *
>> + * The above copyright notice and this permission notice (including the
>> next
>> + * paragraph) shall be included in all copies or substantial portions of
>> the
>> + * Software.
>> + *
>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
>> EXPRESS OR
>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
>> MERCHANTABILITY,
>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT
>> SHALL
>> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
>> OTHER
>> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
>> ARISING
>> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
>> DEALINGS
>> + * IN THE SOFTWARE.
>> + */
>> +
>> +#include "nir.h"
>> +
>> +/**
>> + * \file nir_opt_move_comparisons.c
>> + *
>> + * This pass moves ALU comparison operations just before their first use.
>> + *
>> + * It only moves instructions within a single basic block; cross-block
>> + * movement is left to global code motion.
>> + *
>> + * Many GPUs generate condition codes for comparisons, and use
>> predication
>> + * for conditional selects and control flow.  In a sequence such as:
>> + *
>> + * vec1 32 ssa_1 = flt a b
>> + * 
>> + * vec1 32 ssa_2 = bcsel ssa_1 c d
>> + *
>> + * the backend would likely do the comparison, producing condition codes,
>> + * then save those to a boolean value.  The intervening operations might
>> + * trash the condition codes.  Then, in order to do the bcsel, it would
>> + * need to re-populate the condition code 

Re: [Mesa-dev] [PATCH] glsl: always do sqrt(abs()) and inversesqrt(abs())

2017-01-11 Thread Erik Faye-Lund
On Wed, Jan 11, 2017 at 9:42 PM, Erik Faye-Lund  wrote:
> On Wed, Jan 11, 2017 at 9:22 PM, Samuel Pitoiset
>  wrote:
>>
>>
>> On 01/11/2017 07:34 PM, Erik Faye-Lund wrote:
>>>
>>> On Wed, Jan 11, 2017 at 7:33 PM, Jason Ekstrand 
>>> wrote:

 On Wed, Jan 11, 2017 at 10:31 AM, Erik Faye-Lund 
 wrote:
>
>
> On Wed, Jan 11, 2017 at 7:22 PM, Marek Olšák  wrote:
>>
>> On Wed, Jan 11, 2017 at 7:09 PM, Jason Ekstrand 
>> wrote:
>>>
>>> On Wed, Jan 11, 2017 at 9:32 AM, Samuel Pitoiset
>>> 
>>> wrote:




 On 01/11/2017 05:32 PM, Marek Olšák wrote:
>
>
> On Wed, Jan 11, 2017 at 4:33 PM, Erik Faye-Lund
> 
> wrote:
>>
>>
>> On Wed, Jan 11, 2017 at 4:14 PM, Nicolai Hähnle
>> 
>> wrote:
>>>
>>>
>>> On 11.01.2017 13:17, Marek Olšák wrote:



 On Tue, Jan 10, 2017 at 6:48 PM, Jason Ekstrand
 
 wrote:
>
>
>
> I'll be honest, I'm not a fan... Given that D3D10 has one
> defined
> behavior,
> D3D9 has another, and GL doesn't specify, I don't really think
> we
> should
> be
> making a global change to all drivers to do the D3D9 behavior
> just to
> fix
> one app.  Sure, other apps probably have the same bug, but are
> we
> going
> to
> have apps that expect the D3D10 behavior that we've now
> explicitly
> made
> not
> work?
>
> If we're going to hack around an app bug, I would really rather
> see
> it
> behind a driconf option rather than a global change to driver
> behavior.
> Even better, it'd be cool if we could see the app get fixed.
> (Yes, I
> know
> that's not likely).




 I think we are not in a position to refuse this workaround, or
 put
 more precisely, to have a different behavior from everybody else.
 By
 "we", I mean i965, radeonsi, svga. All closed drivers use abs.
 Many
 Mesa drivers also use abs internally (r300, r600, nv30,
 nv50/nvc0).
 This is not really a workaround for a specific application, even
 though it's strongly motivated by that. It's a fix to align the
 few
 remaining drivers with all others.

 We talked with the publisher about this a very long time ago.
 While I
 don't remember the details (Nicolai?), I think they refused to
 fix
 it
 because radeonsi appeared to be the only driver not doing abs.
>>>
>>>
>>>
>>>
>>> If I remember correctly, it wasn't so much a refusal as a lack of
>>> follow-through. They even had an option in their framework to add
>>> the
>>> abs(...) when translating shaders, but somehow didn't turn it on
>>> unconditionally for some reason...
>>
>>
>>
>> VP even says so here:
>> https://github.com/virtual-programming/specops-linux/issues/20
>>
>> They recommend against patching mesa to do abs, though.
>
>
>
> We should still patch Mesa to align the behavior with closed drivers
> and gallium drivers like r600g and nouveau. In other words, it's too
> late to tell us not to patch Mesa, because r600g and nouveau have
> been
> "patched" since the beginning.
>
> We only need to decide whether we should do it in the GLSL compiler
> or
> radeonsi, i.e. whether we should exclude i965 and svga.



 I do agree with that.
>>>
>>>
>>>
>>> I tend to disagree but I've come to the conclusion that I won't stand
>>> in the
>>> way either.  If both of the other desktop vendors do it and we've
>>> already
>>> decided that no implementation we care about will have its performance
>>> impacted, it seems like a valid spec-compliant thing to do.  I would
>>> prefer
>>> it to be behind a driconf option, but if it's unconditional, oh well.
>>> My
>>> disagreement is mostly philosophical.
>>>
>>> Over the last two years of working on Vulkan, I've been fighting

Re: [Mesa-dev] [PATCH 4/9] nir: Introduce a nir_opt_move_comparisons() pass.

2017-01-11 Thread Jason Ekstrand
I just sent three squash-in patches that I think improve this pass a bit
and make things more clear.  With those (or a good reason why not),

Reviewed-by: Jason Ekstrand 

On Tue, Jan 10, 2017 at 1:41 AM, Timothy Arceri <
timothy.arc...@collabora.com> wrote:

> From: Kenneth Graunke 
>
> This tries to move comparisons (a common source of boolean values)
> closer to their first use.  For GPUs which use condition codes,
> this can eliminate a lot of temporary booleans and comparisons
> which reload the condition code register based on a boolean.
>
> V2: (Timothy Arceri)
>  - fix move comparison for phis so we don't end up with:
>
> vec1 32 ssa_227 = phi block_34: ssa_1, block_38: ssa_240
> vec1 32 ssa_235 = feq ssa_227, ssa_1
> vec1 32 ssa_230 = phi block_34: ssa_221, block_38: ssa_235
>
>  - add nir_op_i2b/nir_op_f2b to the list of comparisons.
>
> V3: (Timothy Arceri)
>  - tidy up suggested by Jason.
>  - add inot/fnot to move comparison list
>
> Signed-off-by: Kenneth Graunke 
> Reviewed-by: Ian Romanick  [v1]
> ---
>  src/compiler/Makefile.sources   |   1 +
>  src/compiler/nir/nir.h  |   2 +
>  src/compiler/nir/nir_opt_move_comparisons.c | 179
> 
>  3 files changed, 182 insertions(+)
>  create mode 100644 src/compiler/nir/nir_opt_move_comparisons.c
>
> diff --git a/src/compiler/Makefile.sources b/src/compiler/Makefile.sources
> index 52f6e54..6da854e 100644
> --- a/src/compiler/Makefile.sources
> +++ b/src/compiler/Makefile.sources
> @@ -245,6 +245,7 @@ NIR_FILES = \
> nir/nir_opt_global_to_local.c \
> nir/nir_opt_if.c \
> nir/nir_opt_loop_unroll.c \
> +   nir/nir_opt_move_comparisons.c \
> nir/nir_opt_peephole_select.c \
> nir/nir_opt_remove_phis.c \
> nir/nir_opt_trivial_continues.c \
> diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h
> index c9226e9..2bff0b5 100644
> --- a/src/compiler/nir/nir.h
> +++ b/src/compiler/nir/nir.h
> @@ -2589,6 +2589,8 @@ bool nir_opt_if(nir_shader *shader);
>
>  bool nir_opt_loop_unroll(nir_shader *shader, nir_variable_mode
> indirect_mask);
>
> +bool nir_opt_move_comparisons(nir_shader *shader);
> +
>  bool nir_opt_peephole_select(nir_shader *shader, unsigned limit);
>
>  bool nir_opt_remove_phis(nir_shader *shader);
> diff --git a/src/compiler/nir/nir_opt_move_comparisons.c
> b/src/compiler/nir/nir_opt_move_comparisons.c
> new file mode 100644
> index 000..651b937
> --- /dev/null
> +++ b/src/compiler/nir/nir_opt_move_comparisons.c
> @@ -0,0 +1,179 @@
> +/*
> + * Copyright © 2016 Intel Corporation
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the
> "Software"),
> + * to deal in the Software without restriction, including without
> limitation
> + * the rights to use, copy, modify, merge, publish, distribute,
> sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice (including the
> next
> + * paragraph) shall be included in all copies or substantial portions of
> the
> + * Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
> EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
> MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT
> SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
> OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
> DEALINGS
> + * IN THE SOFTWARE.
> + */
> +
> +#include "nir.h"
> +
> +/**
> + * \file nir_opt_move_comparisons.c
> + *
> + * This pass moves ALU comparison operations just before their first use.
> + *
> + * It only moves instructions within a single basic block; cross-block
> + * movement is left to global code motion.
> + *
> + * Many GPUs generate condition codes for comparisons, and use predication
> + * for conditional selects and control flow.  In a sequence such as:
> + *
> + * vec1 32 ssa_1 = flt a b
> + * 
> + * vec1 32 ssa_2 = bcsel ssa_1 c d
> + *
> + * the backend would likely do the comparison, producing condition codes,
> + * then save those to a boolean value.  The intervening operations might
> + * trash the condition codes.  Then, in order to do the bcsel, it would
> + * need to re-populate the condition code register based on the boolean.
> + *
> + * By moving the comparison just before the bcsel, the condition codes
> could
> + * be used directly.  This eliminates the need to reload them from the
> boolean
> + * (generally eliminating an instruction).  It may also eliminate the
> need 

[Mesa-dev] [PATCH 3/3] SQUASH/i965/opt_move_comparisons: rework phi handling

2017-01-11 Thread Jason Ekstrand
---
 src/compiler/nir/nir_opt_move_comparisons.c | 16 +---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/src/compiler/nir/nir_opt_move_comparisons.c 
b/src/compiler/nir/nir_opt_move_comparisons.c
index 2bfd940..617c2ca 100644
--- a/src/compiler/nir/nir_opt_move_comparisons.c
+++ b/src/compiler/nir/nir_opt_move_comparisons.c
@@ -84,8 +84,7 @@ move_comparison_source(nir_src *src, nir_block *block, 
nir_instr *before)
 
if (src_instr->block == block &&
src_instr->type == nir_instr_type_alu &&
-   is_comparison(nir_instr_as_alu(src_instr)->op) &&
-   (!before || before->type != nir_instr_type_phi)) {
+   is_comparison(nir_instr_as_alu(src_instr)->op)) {
 
       exec_node_remove(&src_instr->node);
 
@@ -135,7 +134,18 @@ move_comparisons(nir_block *block)
}
 
nir_foreach_instr_reverse(instr, block) {
-  if (instr->type == nir_instr_type_alu) {
+  /* The sources of phi instructions happen after the predecessor block
+   * but before this block.  (Yes, that's between blocks).  This means
+   * that we don't need to move them in order for them to be correct.
+   * We could move them to encourage comparisons that are used in a phi to
+   * the end of the block, doing so correctly would make the pass
+   * substantially more complicated and wouldn't gain us anything since
+   * the phi can't use a flag value anyway.
+   */
+  if (instr->type == nir_instr_type_phi) {
+ /* We're going backwards so everything else is a phi too */
+ break;
+  } else if (instr->type == nir_instr_type_alu) {
  /* Walk ALU instruction sources backwards so that bcsel's boolean
   * condition is processed last.
   */
-- 
2.5.0.400.gff86faf
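
For readers following along, here is a minimal sketch of the early-out
that the comment in this patch describes.  It is not the Mesa code: the
enum and the function name are hypothetical stand-ins, and a block is
modelled as a plain array in which all phis sit at the top, as the
patch's own comment assumes.

```c
#include <stddef.h>

/* Hypothetical stand-in for nir_instr_type; not the real NIR enum. */
enum sketch_instr_type { SKETCH_PHI, SKETCH_ALU, SKETCH_OTHER };

/* Walk a block backwards and stop at the first phi.  Because phis are
 * assumed to be grouped at the top of the block, everything that has
 * not been visited yet at that point is a phi as well.  Returns the
 * number of non-phi instructions that were visited.
 */
static size_t
count_visited_before_phis(const enum sketch_instr_type *instrs, size_t count)
{
   size_t visited = 0;

   for (size_t i = count; i-- > 0; ) {
      if (instrs[i] == SKETCH_PHI)
         break; /* going backwards, so the rest are phis too */
      visited++;
   }

   return visited;
}
```

For a block laid out as phi, phi, alu, other, alu the walk visits the
three trailing instructions and then stops.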



[Mesa-dev] [PATCH 2/3] SQUASH/i965/opt_move_comparisons: get rid of the tuple

2017-01-11 Thread Jason Ekstrand
---
 src/compiler/nir/nir_opt_move_comparisons.c | 18 +-
 1 file changed, 5 insertions(+), 13 deletions(-)

diff --git a/src/compiler/nir/nir_opt_move_comparisons.c 
b/src/compiler/nir/nir_opt_move_comparisons.c
index 535009b..2bfd940 100644
--- a/src/compiler/nir/nir_opt_move_comparisons.c
+++ b/src/compiler/nir/nir_opt_move_comparisons.c
@@ -100,20 +100,14 @@ move_comparison_source(nir_src *src, nir_block *block, 
nir_instr *before)
return false;
 }
 
-/* nir_foreach_src callback boilerplate */
-struct nomc_tuple
-{
-   nir_instr *instr;
-   bool progress;
-};
-
 static bool
 move_comparison_source_cb(nir_src *src, void *data)
 {
-   struct nomc_tuple *tuple = data;
+   bool *progress = data;
 
-   if (move_comparison_source(src, tuple->instr->block, tuple->instr))
-  tuple->progress = true;
+   nir_instr *instr = src->parent_instr;
+   if (move_comparison_source(src, instr->block, instr))
+  *progress = true;
 
return true; /* nir_foreach_src should keep going */
 }
@@ -151,9 +145,7 @@ move_comparisons(nir_block *block)
block, instr);
  }
   } else {
- struct nomc_tuple tuple = { instr, false };
- nir_foreach_src(instr, move_comparison_source_cb, &tuple);
- progress |= tuple.progress;
+ nir_foreach_src(instr, move_comparison_source_cb, &progress);
   }
}
 
-- 
2.5.0.400.gff86faf
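
As an illustration of why the tuple can go away, here is a small
self-contained sketch of the same callback shape: an iterator that
passes an opaque data pointer to a callback, with that pointer aimed
directly at a bool.  All names below are hypothetical, and the int
"sources" stand in for nir_src; only the pattern is taken from the
patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical callback type, shaped like a nir_foreach_src callback. */
typedef bool (*sketch_src_cb)(int *src, void *data);

/* Call cb on every source; stop early if the callback returns false. */
static bool
sketch_foreach_src(int *srcs, size_t count, sketch_src_cb cb, void *data)
{
   for (size_t i = 0; i < count; i++) {
      if (!cb(&srcs[i], data))
         return false;
   }
   return true;
}

/* Example callback: clamp negative sources to zero and record progress
 * through the opaque pointer, which the caller points at a plain bool.
 */
static bool
clamp_negative_cb(int *src, void *data)
{
   bool *progress = data;

   if (*src < 0) {
      *src = 0;
      *progress = true;
   }

   return true; /* the iterator should keep going */
}
```

The caller declares "bool progress = false;" and passes its address as
the data argument, so no wrapper struct is needed as long as the
callback can find everything else it needs from the source itself.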



[Mesa-dev] [PATCH 1/3] SQUASH/i965/opt_move_comparisons: clean up move_comparison_source

2017-01-11 Thread Jason Ekstrand
---
 src/compiler/nir/nir_opt_move_comparisons.c | 18 +++---
 1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/src/compiler/nir/nir_opt_move_comparisons.c 
b/src/compiler/nir/nir_opt_move_comparisons.c
index 651b937..535009b 100644
--- a/src/compiler/nir/nir_opt_move_comparisons.c
+++ b/src/compiler/nir/nir_opt_move_comparisons.c
@@ -77,18 +77,22 @@ is_comparison(nir_op op)
 static bool
 move_comparison_source(nir_src *src, nir_block *block, nir_instr *before)
 {
-   if (src->is_ssa && src->ssa->parent_instr->block == block &&
-   src->ssa->parent_instr->type == nir_instr_type_alu &&
-   is_comparison(nir_instr_as_alu(src->ssa->parent_instr)->op) &&
+   if (!src->is_ssa)
+  return false;
+
+   nir_instr *src_instr = src->ssa->parent_instr;
+
+   if (src_instr->block == block &&
+   src_instr->type == nir_instr_type_alu &&
+   is_comparison(nir_instr_as_alu(src_instr)->op) &&
(!before || before->type != nir_instr_type_phi)) {
 
-  struct exec_node *src_node = &src->ssa->parent_instr->node;
-  exec_node_remove(src_node);
+  exec_node_remove(&src_instr->node);
 
   if (before)
- exec_node_insert_node_before(&before->node, src_node);
+ exec_node_insert_node_before(&before->node, &src_instr->node);
   else
- exec_list_push_tail(&block->instr_list, src_node);
+ exec_list_push_tail(&block->instr_list, &src_instr->node);
 
   return true;
}
-- 
2.5.0.400.gff86faf



Re: [Mesa-dev] [PATCH 1/9] nir: tidy up swizzle handling in nir_search

2017-01-11 Thread Jason Ekstrand
On Tue, Jan 10, 2017 at 1:41 AM, Timothy Arceri <
timothy.arc...@collabora.com> wrote:

> If we just check that we are not dealing with an identity swizzle
> in match_value() before calling match_expression() we can avoid
> a bunch of temp swizzle arrays and the passing it around and
> resetting craziness.
>

I believe I found the confusion.  You're mixing up explicitly sized
destinations and explicitly sized sources.  Explicitly sized destinations
can only have the identity swizzle, but explicitly sized sources we can
handle just fine.  For those, we just reset the swizzle to the identity and
keep matching.
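
To make the "identity swizzle" condition concrete, here is a minimal
sketch of the check, assuming swizzles are stored as up to four uint8_t
component indices as in the quoted code.  The helper name is
hypothetical and is not part of nir_search.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: returns true when the first num_components
 * entries of the swizzle select components 0, 1, 2, ... in order.
 */
static bool
is_identity_swizzle(const uint8_t *swizzle, unsigned num_components)
{
   assert(num_components <= 4);

   for (unsigned i = 0; i < num_components; i++) {
      if (swizzle[i] != i)
         return false;
   }

   return true;
}
```

Only the components that are actually read matter: .xyzz counts as the
identity when three components are read, but not when four are.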


> ---
>  src/compiler/nir/nir_search.c | 89 ++
> -
>  1 file changed, 38 insertions(+), 51 deletions(-)
>
> diff --git a/src/compiler/nir/nir_search.c b/src/compiler/nir/nir_search.c
> index b34b13f..7a84b18 100644
> --- a/src/compiler/nir/nir_search.c
> +++ b/src/compiler/nir/nir_search.c
> @@ -37,8 +37,7 @@ struct match_state {
>
>  static bool
>  match_expression(const nir_search_expression *expr, nir_alu_instr *instr,
> - unsigned num_components, const uint8_t *swizzle,
> - struct match_state *state);
> + unsigned num_components, struct match_state *state);
>
>  static const uint8_t identity_swizzle[] = { 0, 1, 2, 3 };
>
> @@ -93,22 +92,15 @@ src_is_type(nir_src src, nir_alu_type type)
>
>  static bool
>  match_value(const nir_search_value *value, nir_alu_instr *instr, unsigned
> src,
> -unsigned num_components, const uint8_t *swizzle,
> -struct match_state *state)
> +unsigned num_components, struct match_state *state)
>  {
> -   uint8_t new_swizzle[4];
> -
> /* If the source is an explicitly sized source, then we need to reset
> -* both the number of components and the swizzle.
> +* the number of components.
>  */
> if (nir_op_infos[instr->op].input_sizes[src] != 0) {
>num_components = nir_op_infos[instr->op].input_sizes[src];
> -  swizzle = identity_swizzle;
> }
>
> -   for (unsigned i = 0; i < num_components; ++i)
> -  new_swizzle[i] = instr->src[src].swizzle[swizzle[i]];
> -
> /* If the value has a specific bit size and it doesn't match, bail */
> if (value->bit_size &&
> nir_src_bit_size(instr->src[src].src) != value->bit_size)
> @@ -122,9 +114,23 @@ match_value(const nir_search_value *value,
> nir_alu_instr *instr, unsigned src,
>if (instr->src[src].src.ssa->parent_instr->type !=
> nir_instr_type_alu)
>   return false;
>
> +  /* If we have an explicitly sized destination, we can only handle
> the
> +   * identity swizzle.  While dot(vec3(a, b, c).zxy) is a valid
> +   * expression, we don't have the information right now to propagate
> that
> +   * swizzle through.  We can only properly propagate swizzles if the
> +   * instruction is vectorized.
> +   */
> +  nir_alu_instr *alu_instr =
> + nir_instr_as_alu(instr->src[src].src.ssa->parent_instr);
> +  if (nir_op_infos[alu_instr->op].output_size != 0) {
> + for (unsigned i = 0; i < num_components; i++) {
> +if (instr->src[src].swizzle[i] != i)
> +   return false;
> + }
> +  }
> +
>return match_expression(nir_search_value_as_expression(value),
> -  nir_instr_as_alu(instr->src[
> src].src.ssa->parent_instr),
> -  num_components, new_swizzle, state);
> +  alu_instr, num_components, state);
>
> case nir_search_value_variable: {
>nir_search_variable *var = nir_search_value_as_variable(value);
> @@ -138,7 +144,8 @@ match_value(const nir_search_value *value,
> nir_alu_instr *instr, unsigned src,
>   assert(!instr->src[src].abs && !instr->src[src].negate);
>
>   for (unsigned i = 0; i < num_components; ++i) {
> -if (state->variables[var->variable].swizzle[i] !=
> new_swizzle[i])
> +if (state->variables[var->variable].swizzle[i] !=
> +instr->src[src].swizzle[i])
> return false;
>   }
>
> @@ -148,7 +155,8 @@ match_value(const nir_search_value *value,
> nir_alu_instr *instr, unsigned src,
>   instr->src[src].src.ssa->parent_instr->type !=
> nir_instr_type_load_const)
>  return false;
>
> - if (var->cond && !var->cond(instr, src, num_components,
> new_swizzle))
> + if (var->cond && !var->cond(instr, src, num_components,
> + instr->src[src].swizzle))
>  return false;
>
>   if (var->type != nir_type_invalid &&
> @@ -161,9 +169,10 @@ match_value(const nir_search_value *value,
> nir_alu_instr *instr, unsigned src,
>   state->variables[var->variable].negate = false;
>
>   for (unsigned i = 0; i < 4; ++i) {
> -if (i < num_components)
> -   

Re: [Mesa-dev] [PATCH] glsl: always do sqrt(abs()) and inversesqrt(abs())

2017-01-11 Thread Erik Faye-Lund
On Wed, Jan 11, 2017 at 9:22 PM, Samuel Pitoiset
 wrote:
>
>
> On 01/11/2017 07:34 PM, Erik Faye-Lund wrote:
>>
>> On Wed, Jan 11, 2017 at 7:33 PM, Jason Ekstrand 
>> wrote:
>>>
>>> On Wed, Jan 11, 2017 at 10:31 AM, Erik Faye-Lund 
>>> wrote:


 On Wed, Jan 11, 2017 at 7:22 PM, Marek Olšák  wrote:
>
> On Wed, Jan 11, 2017 at 7:09 PM, Jason Ekstrand 
> wrote:
>>
>> On Wed, Jan 11, 2017 at 9:32 AM, Samuel Pitoiset
>> 
>> wrote:
>>>
>>>
>>>
>>>
>>> On 01/11/2017 05:32 PM, Marek Olšák wrote:


 On Wed, Jan 11, 2017 at 4:33 PM, Erik Faye-Lund
 
 wrote:
>
>
> On Wed, Jan 11, 2017 at 4:14 PM, Nicolai Hähnle
> 
> wrote:
>>
>>
>> On 11.01.2017 13:17, Marek Olšák wrote:
>>>
>>>
>>>
>>> On Tue, Jan 10, 2017 at 6:48 PM, Jason Ekstrand
>>> 
>>> wrote:



 I'll be honest, I'm not a fan... Given that D3D10 has one
 defined
 behavior,
 D3D9 has another, and GL doesn't specify, I don't really think
 we
 should
 be
 making a global change to all drivers to do the D3D9 behavior
 just to
 fix
 one app.  Sure, other apps probably have the same bug, but are
 we
 going
 to
 have apps that expect the D3D10 behavior that we've now
 explicitly
 made
 not
 work?

 If we're going to hack around an app bug, I would really rather
 see
 it
 behind a driconf option rather than a global change to driver
 behavior.
 Even better, it'd be cool if we could see the app get fixed.
 (Yes, I
 know
 that's not likely).
>>>
>>>
>>>
>>>
>>> I think we are not in a position to refuse this workaround, or
>>> put
>>> more precisely, to have a different behavior from everybody else.
>>> By
>>> "we", I mean i965, radeonsi, svga. All closed drivers use abs.
>>> Many
>>> Mesa drivers also use abs internally (r300, r600, nv30,
>>> nv50/nvc0).
>>> This is not really a workaround for a specific application, even
>>> though it's strongly motivated by that. It's a fix to align the
>>> few
>>> remaining drivers with all others.
>>>
>>> We talked with the publisher about this a very long time ago.
>>> While I
>>> don't remember the details (Nicolai?), I think they refused to
>>> fix
>>> it
>>> because radeonsi appeared to be the only driver not doing abs.
>>
>>
>>
>>
>> If I remember correctly, it wasn't so much a refusal as a lack of
>> follow-through. They even had an option in their framework to add
>> the
>> abs(...) when translating shaders, but somehow didn't turn it on
>> unconditionally for some reason...
>
>
>
> VP even says so here:
> https://github.com/virtual-programming/specops-linux/issues/20
>
> They recommend against patching mesa to do abs, though.



 We should still patch Mesa to align the behavior with closed drivers
 and gallium drivers like r600g and nouveau. In other words, it's too
 late to tell us not to patch Mesa, because r600g and nouveau have
 been
 "patched" since the beginning.

 We only need to decide whether we should do it in the GLSL compiler
 or
 radeonsi, i.e. whether we should exclude i965 and svga.
>>>
>>>
>>>
>>> I do agree with that.
>>
>>
>>
>> I tend to disagree but I've come to the conclusion that I won't stand
>> in the
>> way either.  If both of the other desktop vendors do it and we've
>> already
>> decided that no implementation we care about will have its performance
>> impacted, it seems like a valid spec-compliant thing to do.  I would
>> prefer
>> it to be behind a driconf option, but if it's unconditional, oh well.
>> My
>> disagreement is mostly philosophical.
>>
>> Over the last two years of working on Vulkan, I've been fighting
>> broken
>> tests and apps left and right.  Vulkan has a huge amount of area
>> where,
>> if
>> an app does something wrong, they get undefined behavior which is up
>> to
>> and
>> including 

Re: [Mesa-dev] [PATCH] docs: document INTEL_PRECISE_TRIG envvar

2017-01-11 Thread Kenneth Graunke
On Wednesday, January 11, 2017 10:48:53 AM PST Jason Ekstrand wrote:
> On Wed, Jan 11, 2017 at 10:34 AM, Jordan Justen 
> wrote:
> 
> > On 2017-01-11 09:53:15, Juan A. Suarez Romero wrote:
> > > ---
> > >  docs/envvars.html | 1 +
> > >  1 file changed, 1 insertion(+)
> > >
> > > diff --git a/docs/envvars.html b/docs/envvars.html
> > > index 9eee8db..4f05d7f 100644
> > > --- a/docs/envvars.html
> > > +++ b/docs/envvars.html
> > > @@ -187,6 +187,7 @@ See the Xlib software
> > driver page for details.
> > > do32 - generate compute shader SIMD32 programs even if workgroup
> > size doesn't exceed the SIMD16 limit
> > > norbc - disable single sampled render buffer compression
> > >  
> > > +INTEL_PRECISE_TRIG - if set to 1 in gen<10, it fixes results out of
> > [-1.0, 1.0] range for a small set of values.
> >
> > Since we now have the precise_trig driconf option (d9546b0c5d1a),
> > should we deprecate INTEL_PRECISE_TRIG?
> >
> 
> No.  Vulkan doesn't do driconf.
> 

See commit d9546b0c5d1a5136a92276cdd7c14883f0c62737 where we tried that
and commit be32a2132785fbc119f17e62070e007ee7d17af7 where we abandoned
that idea.
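
For context, the proposed docs text says the variable takes effect when
it is "set to 1".  A minimal sketch of that kind of check in plain C
could look like the following.  The helper is hypothetical and written
for illustration only; it is not how the driver actually parses the
variable.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: true only when the named environment variable
 * exists and its value is exactly "1".
 */
static bool
env_var_is_one(const char *name)
{
   const char *value = getenv(name);

   return value != NULL && strcmp(value, "1") == 0;
}
```

A caller would test env_var_is_one("INTEL_PRECISE_TRIG") once at
startup.  Because this needs no configuration file, the same check
works for a driver that has no driconf support.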




Re: [Mesa-dev] [PATCH 06/10] amd/common: unify cube map coordinate handling between radeonsi and radv

2017-01-11 Thread Bas Nieuwenhuizen
On Wed, Jan 11, 2017 at 1:45 AM, Grazvydas Ignotas  wrote:
> Unfortunately this one breaks at least (surprise!) texturecubemap
> SaschaWillemsVulkan demo.
> I recommend you try it yourself, there are even precompiled binaries
> available (see README.md):
> https://github.com/SaschaWillems/Vulkan

As far as I can see reverting the offset back to 1.5 (as Nicolai noted
on patch 1) also fixes that vulkan demo.

- Bas
>
> Gražvydas
>
> On Tue, Jan 10, 2017 at 5:12 PM, Nicolai Hähnle  wrote:
>> From: Nicolai Hähnle 
>>
>> Code is taken from a combination of radv (for the more basic functions,
>> to avoid gallivm dependencies) and radeonsi (for the new and improved
>> derivative calculations).
>> ---
>>  src/amd/common/ac_llvm_util.c  | 362 
>> +
>>  src/amd/common/ac_llvm_util.h  |  57 
>>  src/amd/common/ac_nir_to_llvm.c| 204 +---
>>  src/gallium/drivers/radeonsi/si_shader.c   |   6 +-
>>  src/gallium/drivers/radeonsi/si_shader_internal.h  |   2 +
>>  .../drivers/radeonsi/si_shader_tgsi_setup.c|   4 +
>>  6 files changed, 438 insertions(+), 197 deletions(-)
>>
>> diff --git a/src/amd/common/ac_llvm_util.c b/src/amd/common/ac_llvm_util.c
>> index a8408dd..6dd6cfa 100644
>> --- a/src/amd/common/ac_llvm_util.c
>> +++ b/src/amd/common/ac_llvm_util.c
>> @@ -25,20 +25,23 @@
>>  /* based on pieces from si_pipe.c and radeon_llvm_emit.c */
>>  #include "ac_llvm_util.h"
>>
>>  #include 
>>
>>  #include "c11/threads.h"
>>
>>  #include 
>>  #include 
>>
>> +#include "util/bitscan.h"
>> +#include "util/macros.h"
>> +
>>  static void ac_init_llvm_target()
>>  {
>>  #if HAVE_LLVM < 0x0307
>> LLVMInitializeR600TargetInfo();
>> LLVMInitializeR600Target();
>> LLVMInitializeR600TargetMC();
>> LLVMInitializeR600AsmPrinter();
>>  #else
>> LLVMInitializeAMDGPUTargetInfo();
>> LLVMInitializeAMDGPUTarget();
>> @@ -133,10 +136,369 @@ LLVMTargetMachineRef ac_create_target_machine(enum 
>> radeon_family family)
>>  target,
>>  triple,
>>  ac_get_llvm_processor_name(family),
>>  "+DumpCode,+vgpr-spilling",
>>  LLVMCodeGenLevelDefault,
>>  LLVMRelocDefault,
>>  LLVMCodeModelDefault);
>>
>> return tm;
>>  }
>> +
>> +/* Initialize module-independent parts of the context.
>> + *
>> + * The caller is responsible for initializing ctx::module and ctx::builder.
>> + */
>> +void
>> +ac_llvm_context_init(struct ac_llvm_context *ctx, LLVMContextRef context)
>> +{
>> +   LLVMValueRef args[1];
>> +
>> +   ctx->context = context;
>> +   ctx->module = NULL;
>> +   ctx->builder = NULL;
>> +
>> +   ctx->i32 = LLVMIntTypeInContext(ctx->context, 32);
>> +   ctx->f32 = LLVMFloatTypeInContext(ctx->context);
>> +
>> +   ctx->fpmath_md_kind = LLVMGetMDKindIDInContext(ctx->context, 
>> "fpmath", 6);
>> +
>> +   args[0] = LLVMConstReal(ctx->f32, 2.5);
>> +   ctx->fpmath_md_2p5_ulp = LLVMMDNodeInContext(ctx->context, args, 1);
>> +}
>> +
>> +#if HAVE_LLVM < 0x0400
>> +static LLVMAttribute ac_attr_to_llvm_attr(enum ac_func_attr attr)
>> +{
>> +   switch (attr) {
>> +   case AC_FUNC_ATTR_ALWAYSINLINE: return LLVMAlwaysInlineAttribute;
>> +   case AC_FUNC_ATTR_BYVAL: return LLVMByValAttribute;
>> +   case AC_FUNC_ATTR_INREG: return LLVMInRegAttribute;
>> +   case AC_FUNC_ATTR_NOALIAS: return LLVMNoAliasAttribute;
>> +   case AC_FUNC_ATTR_NOUNWIND: return LLVMNoUnwindAttribute;
>> +   case AC_FUNC_ATTR_READNONE: return LLVMReadNoneAttribute;
>> +   case AC_FUNC_ATTR_READONLY: return LLVMReadOnlyAttribute;
>> +   default:
>> +  fprintf(stderr, "Unhandled function attribute: %x\n", attr);
>> +  return 0;
>> +   }
>> +}
>> +
>> +#else
>> +
>> +static const char *attr_to_str(enum ac_func_attr attr)
>> +{
>> +   switch (attr) {
>> +   case AC_FUNC_ATTR_ALWAYSINLINE: return "alwaysinline";
>> +   case AC_FUNC_ATTR_BYVAL: return "byval";
>> +   case AC_FUNC_ATTR_INREG: return "inreg";
>> +   case AC_FUNC_ATTR_NOALIAS: return "noalias";
>> +   case AC_FUNC_ATTR_NOUNWIND: return "nounwind";
>> +   case AC_FUNC_ATTR_READNONE: return "readnone";
>> +   case AC_FUNC_ATTR_READONLY: return "readonly";
>> +   default:
>> +  fprintf(stderr, "Unhandled function attribute: %x\n", attr);
>> +  return 0;
>> +   }
>> +}
>> +
>> +#endif
>> +
>> +void
>> +ac_add_function_attr(LLVMValueRef function,
>> + int attr_idx,
>> + enum ac_func_attr attr)
>> +{
>> +
>> +#if HAVE_LLVM < 0x0400
>> +   LLVMAttribute llvm_attr = ac_attr_to_llvm_attr(attr);
>> +   if (attr_idx == -1) {
>> +  LLVMAddFunctionAttr(function, 

Re: [Mesa-dev] [PATCH 1/2] nir/search: Rework conditions to be a bit simpler and more generic

2017-01-11 Thread Jason Ekstrand
These two patches provide a slightly different approach to what Tim did.  I
think Tim's is fine but I also like the idea of conditions working based on
SSA def, type, and read mask so I consider this a unification/cleanup
rather than a counter-proposal.  Thoughts?
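
For illustration only, here is a minimal sketch of how a read mask can be
derived from a swizzle, mirroring the loop in the quoted patch. The helper
name swizzle_to_read_mask is hypothetical and is not part of NIR.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not a NIR function: collapse the first
 * num_components swizzle entries into a 4-bit component mask. */
static uint8_t
swizzle_to_read_mask(const uint8_t *swizzle, unsigned num_components)
{
   uint8_t read_mask = 0;

   assert(num_components <= 4);
   for (unsigned i = 0; i < num_components; i++) {
      assert(swizzle[i] < 4);
      read_mask |= (uint8_t)(1u << swizzle[i]);
   }

   return read_mask;
}
```

A .zxy swizzle gives 0x7 and a .ww swizzle gives 0x8. The mask drops
ordering and repetition, which should be fine for conditions that only ask
whether every component that is read satisfies some property.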

On Wed, Jan 11, 2017 at 11:13 AM, Jason Ekstrand wrote:

> Instead of passing all of the ALU op information, we just pass what you
> need: The SSA def, the type it's being read as, and a component mask.
> ---
>  src/compiler/nir/nir_search.c | 12 +--
>  src/compiler/nir/nir_search.h |  3 +-
>  src/compiler/nir/nir_search_helpers.h | 62 +++---
> -
>  3 files changed, 44 insertions(+), 33 deletions(-)
>
> diff --git a/src/compiler/nir/nir_search.c b/src/compiler/nir/nir_search.c
> index 10a0941..2f57821 100644
> --- a/src/compiler/nir/nir_search.c
> +++ b/src/compiler/nir/nir_search.c
> @@ -154,8 +154,16 @@ match_value(const nir_search_value *value,
> nir_alu_instr *instr, unsigned src,
>   instr->src[src].src.ssa->parent_instr->type !=
> nir_instr_type_load_const)
>  return false;
>
> - if (var->cond && !var->cond(instr, src, num_components,
> new_swizzle))
> -return false;
> + if (var->cond) {
> +uint8_t read_mask = 0;
> +for (unsigned i = 0; i < num_components; i++)
> +   read_mask |= 1 << new_swizzle[i];
> +
> +if (!var->cond(instr->src[src].src.ssa,
> +   nir_op_infos[instr->op].input_types[src],
> +   read_mask))
> +   return false;
> + }
>
>   if (var->type != nir_type_invalid &&
>   !src_is_type(instr->src[src].src, var->type))
> diff --git a/src/compiler/nir/nir_search.h b/src/compiler/nir/nir_search.h
> index dec19d5..9d25018 100644
> --- a/src/compiler/nir/nir_search.h
> +++ b/src/compiler/nir/nir_search.h
> @@ -76,8 +76,7 @@ typedef struct {
>  * variables to require, for example, power-of-two in order for the
> search
>  * to match.
>  */
> -   bool (*cond)(nir_alu_instr *instr, unsigned src,
> -unsigned num_components, const uint8_t *swizzle);
> +   bool (*cond)(nir_ssa_def *def, nir_alu_type type, uint8_t read_mask);
>  } nir_search_variable;
>
>  typedef struct {
> diff --git a/src/compiler/nir/nir_search_helpers.h
> b/src/compiler/nir/nir_search_helpers.h
> index 20fdae6..85f6c85 100644
> --- a/src/compiler/nir/nir_search_helpers.h
> +++ b/src/compiler/nir/nir_search_helpers.h
> @@ -36,25 +36,26 @@ __is_power_of_two(unsigned int x)
>  }
>
>  static inline bool
> -is_pos_power_of_two(nir_alu_instr *instr, unsigned src, unsigned
> num_components,
> -const uint8_t *swizzle)
> +is_pos_power_of_two(nir_ssa_def *def, nir_alu_type type, uint8_t
> read_mask)
>  {
> -   nir_const_value *val = nir_src_as_const_value(instr->src[src].src);
> -
> -   /* only constant src's: */
> -   if (!val)
> +   if (def->parent_instr->type != nir_instr_type_load_const)
>return false;
>
> -   for (unsigned i = 0; i < num_components; i++) {
> -  switch (nir_op_infos[instr->op].input_types[src]) {
> +   nir_const_value *val = &nir_instr_as_load_const(def->parent_instr)->value;
> +
> +   for (unsigned i = 0; i < 4; i++) {
> +  if (!(read_mask & (1 << i)))
> + continue;
> +
> +  switch (type) {
>case nir_type_int:
> - if (val->i32[swizzle[i]] < 0)
> + if (val->i32[i] < 0)
>  return false;
> - if (!__is_power_of_two(val->i32[swizzle[i]]))
> + if (!__is_power_of_two(val->i32[i]))
>  return false;
>   break;
>case nir_type_uint:
> - if (!__is_power_of_two(val->u32[swizzle[i]]))
> + if (!__is_power_of_two(val->u32[i]))
>  return false;
>   break;
>default:
> @@ -66,21 +67,22 @@ is_pos_power_of_two(nir_alu_instr *instr, unsigned
> src, unsigned num_components,
>  }
>
>  static inline bool
> -is_neg_power_of_two(nir_alu_instr *instr, unsigned src, unsigned
> num_components,
> -const uint8_t *swizzle)
> +is_neg_power_of_two(nir_ssa_def *def, nir_alu_type type, uint8_t
> read_mask)
>  {
> -   nir_const_value *val = nir_src_as_const_value(instr->src[src].src);
> -
> -   /* only constant src's: */
> -   if (!val)
> +   if (def->parent_instr->type != nir_instr_type_load_const)
>return false;
>
> -   for (unsigned i = 0; i < num_components; i++) {
> -  switch (nir_op_infos[instr->op].input_types[src]) {
> +   nir_const_value *val = &nir_instr_as_load_const(def->parent_instr)->value;
> +
> +   for (unsigned i = 0; i < 4; i++) {
> +  if (!(read_mask & (1 << i)))
> + continue;
> +
> +  switch (type) {
>case nir_type_int:
> - if (val->i32[swizzle[i]] > 0)
> + if (val->i32[i] > 0)
>  return false;
> - if 

Re: [Mesa-dev] [PATCH 1/9] nir: tidy up swizzle handling in nir_search

2017-01-11 Thread Jason Ekstrand
This patch appears to cause regressions on Haswell.  I'm not sure why.  The
fact that it's only Haswell strikes me as a bit odd, so it's possible it's
not your patch's fault.  I'm looking into it.
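
For reference, the identity-swizzle test that the quoted patch adds to
match_value() boils down to something like the sketch below. The function
name is_identity_swizzle is hypothetical; the patch open-codes the loop.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if the first num_components entries
 * select components 0, 1, 2, ... in order. */
static bool
is_identity_swizzle(const uint8_t *swizzle, unsigned num_components)
{
   for (unsigned i = 0; i < num_components; i++) {
      if (swizzle[i] != i)
         return false;
   }

   return true;
}
```

Only the first num_components entries are checked, so { 0, 1, 3 } still
counts as identity when two components are read.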

On Tue, Jan 10, 2017 at 1:41 AM, Timothy Arceri <
timothy.arc...@collabora.com> wrote:

> If we just check that we are not dealing with an identity swizzle
> in match_value() before calling match_expression() we can avoid
> a bunch of temp swizzle arrays and the passing it around and
> resetting craziness.
> ---
>  src/compiler/nir/nir_search.c | 89 ++
> -
>  1 file changed, 38 insertions(+), 51 deletions(-)
>
> diff --git a/src/compiler/nir/nir_search.c b/src/compiler/nir/nir_search.c
> index b34b13f..7a84b18 100644
> --- a/src/compiler/nir/nir_search.c
> +++ b/src/compiler/nir/nir_search.c
> @@ -37,8 +37,7 @@ struct match_state {
>
>  static bool
>  match_expression(const nir_search_expression *expr, nir_alu_instr *instr,
> - unsigned num_components, const uint8_t *swizzle,
> - struct match_state *state);
> + unsigned num_components, struct match_state *state);
>
>  static const uint8_t identity_swizzle[] = { 0, 1, 2, 3 };
>
> @@ -93,22 +92,15 @@ src_is_type(nir_src src, nir_alu_type type)
>
>  static bool
>  match_value(const nir_search_value *value, nir_alu_instr *instr, unsigned
> src,
> -unsigned num_components, const uint8_t *swizzle,
> -struct match_state *state)
> +unsigned num_components, struct match_state *state)
>  {
> -   uint8_t new_swizzle[4];
> -
> /* If the source is an explicitly sized source, then we need to reset
> -* both the number of components and the swizzle.
> +* the number of components.
>  */
> if (nir_op_infos[instr->op].input_sizes[src] != 0) {
>num_components = nir_op_infos[instr->op].input_sizes[src];
> -  swizzle = identity_swizzle;
> }
>
> -   for (unsigned i = 0; i < num_components; ++i)
> -  new_swizzle[i] = instr->src[src].swizzle[swizzle[i]];
> -
> /* If the value has a specific bit size and it doesn't match, bail */
> if (value->bit_size &&
> nir_src_bit_size(instr->src[src].src) != value->bit_size)
> @@ -122,9 +114,23 @@ match_value(const nir_search_value *value,
> nir_alu_instr *instr, unsigned src,
>if (instr->src[src].src.ssa->parent_instr->type !=
> nir_instr_type_alu)
>   return false;
>
> +  /* If we have an explicitly sized destination, we can only handle
> the
> +   * identity swizzle.  While dot(vec3(a, b, c).zxy) is a valid
> +   * expression, we don't have the information right now to propagate
> that
> +   * swizzle through.  We can only properly propagate swizzles if the
> +   * instruction is vectorized.
> +   */
> +  nir_alu_instr *alu_instr =
> + nir_instr_as_alu(instr->src[src].src.ssa->parent_instr);
> +  if (nir_op_infos[alu_instr->op].output_size != 0) {
> + for (unsigned i = 0; i < num_components; i++) {
> +if (instr->src[src].swizzle[i] != i)
> +   return false;
> + }
> +  }
> +
>return match_expression(nir_search_value_as_expression(value),
> -  nir_instr_as_alu(instr->src[
> src].src.ssa->parent_instr),
> -  num_components, new_swizzle, state);
> +  alu_instr, num_components, state);
>
> case nir_search_value_variable: {
>nir_search_variable *var = nir_search_value_as_variable(value);
> @@ -138,7 +144,8 @@ match_value(const nir_search_value *value,
> nir_alu_instr *instr, unsigned src,
>   assert(!instr->src[src].abs && !instr->src[src].negate);
>
>   for (unsigned i = 0; i < num_components; ++i) {
> -if (state->variables[var->variable].swizzle[i] !=
> new_swizzle[i])
> +if (state->variables[var->variable].swizzle[i] !=
> +instr->src[src].swizzle[i])
> return false;
>   }
>
> @@ -148,7 +155,8 @@ match_value(const nir_search_value *value,
> nir_alu_instr *instr, unsigned src,
>   instr->src[src].src.ssa->parent_instr->type !=
> nir_instr_type_load_const)
>  return false;
>
> - if (var->cond && !var->cond(instr, src, num_components,
> new_swizzle))
> + if (var->cond && !var->cond(instr, src, num_components,
> + instr->src[src].swizzle))
>  return false;
>
>   if (var->type != nir_type_invalid &&
> @@ -161,9 +169,10 @@ match_value(const nir_search_value *value,
> nir_alu_instr *instr, unsigned src,
>   state->variables[var->variable].negate = false;
>
>   for (unsigned i = 0; i < 4; ++i) {
> -if (i < num_components)
> -   state->variables[var->variable].swizzle[i] =
> new_swizzle[i];
> -else
> +if (i < num_components) {
> +   

Re: [Mesa-dev] [PATCH] glsl: always do sqrt(abs()) and inversesqrt(abs())

2017-01-11 Thread Samuel Pitoiset



On 01/11/2017 07:34 PM, Erik Faye-Lund wrote:
On Wed, Jan 11, 2017 at 7:33 PM, Jason Ekstrand wrote:
On Wed, Jan 11, 2017 at 10:31 AM, Erik Faye-Lund wrote:
On Wed, Jan 11, 2017 at 7:22 PM, Marek Olšák wrote:
On Wed, Jan 11, 2017 at 7:09 PM, Jason Ekstrand wrote:
On Wed, Jan 11, 2017 at 9:32 AM, Samuel Pitoiset wrote:
On 01/11/2017 05:32 PM, Marek Olšák wrote:
On Wed, Jan 11, 2017 at 4:33 PM, Erik Faye-Lund wrote:
On Wed, Jan 11, 2017 at 4:14 PM, Nicolai Hähnle wrote:
On 11.01.2017 13:17, Marek Olšák wrote:
On Tue, Jan 10, 2017 at 6:48 PM, Jason Ekstrand wrote:

I'll be honest, I'm not a fan... Given that D3D10 has one defined
behavior, D3D9 has another, and GL doesn't specify, I don't really think
we should be making a global change to all drivers to do the D3D9
behavior just to fix one app. Sure, other apps probably have the same
bug, but are we going to have apps that expect the D3D10 behavior that
we've now explicitly made not work?

If we're going to hack around an app bug, I would really rather see it
behind a driconf option rather than a global change to driver behavior.
Even better, it'd be cool if we could see the app get fixed. (Yes, I
know that's not likely).




I think we are not in a position to refuse this workaround, or put more
precisely, to have a different behavior from everybody else. By "we", I
mean i965, radeonsi, svga. All closed drivers use abs. Many Mesa drivers
also use abs internally (r300, r600, nv30, nv50/nvc0). This is not
really a workaround for a specific application, even though it's
strongly motivated by that. It's a fix to align the few remaining
drivers with all others.

We talked with the publisher about this a very long time ago. While I
don't remember the details (Nicolai?), I think they refused to fix it
because radeonsi appeared to be the only driver not doing abs.




If I remember correctly, it wasn't so much a refusal as a lack of
follow-through. They even had an option in their framework to add the
abs(...) when translating shaders, but somehow didn't turn it on
unconditionally for some reason...



VP even says so here:
https://github.com/virtual-programming/specops-linux/issues/20

They recommend against patching mesa to do abs, though.



We should still patch Mesa to align the behavior with closed drivers and
gallium drivers like r600g and nouveau. In other words, it's too late to
tell us not to patch Mesa, because r600g and nouveau have been "patched"
since the beginning.

We only need to decide whether we should do it in the GLSL compiler or
radeonsi, i.e. whether we should exclude i965 and svga.



I do agree with that.



I tend to disagree but I've come to the conclusion that I won't stand in
the way either. If both of the other desktop vendors do it and we've
already decided that no implementation we care about will have its
performance impacted, it seems like a valid spec-compliant thing to do.
I would prefer it to be behind a driconf option, but if it's
unconditional, oh well. My disagreement is mostly philosophical.

Over the last two years of working on Vulkan, I've been fighting broken
tests and apps left and right. Vulkan has a huge amount of area where,
if an app does something wrong, they get undefined behavior which is up
to and including program termination. And basically all apps are broken
in some way. Fortunately, the validation layers are finally starting to
catch up to the point where I'm noticing very few bugs that the
validation layers don't catch and things are getting into a better
state. However, I've had more discussions than I can count with people
where I have to explain to them that "No, the app is broken. It needs to
be fixed. It's not my job to make it work." Once you start allowing
brokenness, you can never stop allowing it and you paint yourself into a
corner. Suddenly, you go to make a change, and your design decisions are
not guided by the spec, they're guided by the spec *and* all of the
broken apps that you have to keep working on your driver because you let
something through.

In the world of GLES and OpenGL conformance, we fight the same fight.
When people ask me how conformance is coming, I frequently answer with,
"We've got a bunch of people fixing  so that our driver passes". It's
not that mesa is particularly touchy, it's that a good chunk of the rest
of the industry just hacks around everything inside their driver and
doesn't bother to fix the tests. Sometimes the driver that passes the
conformance suite isn't even the one they ship. If we're going to have a
spec and hardware vendors (or the FOSS community) are going to implement
it and apps are going to write to it, then we all need to agree on what
it means and play by the rules. If an app doesn't play by the rules and
does something with

Re: [Mesa-dev] [PATCH] glsl: always do sqrt(abs()) and inversesqrt(abs())

2017-01-11 Thread Samuel Pitoiset



On 01/11/2017 07:33 PM, Jason Ekstrand wrote:

One trivial request:  If we do land this patch, please include a link
to the mailing list archives in the commit message so that we can
easily track down this discussion if we ever need to in the future.


Sure, I was planning to do it. :-)

But it's unclear whether we should push it or not.



On Fri, Jan 6, 2017 at 1:42 AM, Samuel Pitoiset wrote:

D3D always computes the absolute value while GLSL says that the
result of inversesqrt() is undefined if x <= 0 (and undefined if
x < 0 for sqrt()). But some apps rely on this specific behaviour
which is not clearly defined by OpenGL.

Computing the absolute value before sqrt()/inversesqrt() will
prevent that, especially for apps which have been ported from D3D.
Note that closed drivers seem to also use that quirk.

This gets rid of the NaN values in the "Spec Ops: The Line" game
as well as the black squares with radeonsi. Note that Nouveau is
not affected by this bug because we already take the absolute value
when translating from TGSI to nv50/ir.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97338


Signed-off-by: Samuel Pitoiset
---
 src/compiler/glsl/builtin_functions.cpp | 22 --
 1 file changed, 20 insertions(+), 2 deletions(-)

diff --git a/src/compiler/glsl/builtin_functions.cpp
b/src/compiler/glsl/builtin_functions.cpp
index 797af08b6c..f816f2ff7d 100644
--- a/src/compiler/glsl/builtin_functions.cpp
+++ b/src/compiler/glsl/builtin_functions.cpp
@@ -3623,12 +3623,30 @@ builtin_builder::_pow(const glsl_type *type)
return binop(always_available, ir_binop_pow, type, type, type);
 }

+ir_function_signature *
+builtin_builder::_sqrt(builtin_available_predicate avail,
+   const glsl_type *type)
+{
+   ir_variable *x = in_var(type, "x");
+   MAKE_SIG(type, avail, 1, x);
+   body.emit(ret(expr(ir_unop_sqrt, abs(x))));
+   return sig;
+}
+
+ir_function_signature *
+builtin_builder::_inversesqrt(builtin_available_predicate avail,
+  const glsl_type *type)
+{
+   ir_variable *x = in_var(type, "x");
+   MAKE_SIG(type, avail, 1, x);
+   body.emit(ret(expr(ir_unop_rsq, abs(x))));
+   return sig;
+}
+
 UNOP(exp, ir_unop_exp,  always_available)
 UNOP(log, ir_unop_log,  always_available)
 UNOP(exp2,ir_unop_exp2, always_available)
 UNOP(log2,ir_unop_log2, always_available)
-UNOPA(sqrt,ir_unop_sqrt)
-UNOPA(inversesqrt, ir_unop_rsq)

 /** @} */

--
2.11.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org 
https://lists.freedesktop.org/mailman/listinfo/mesa-dev






Re: [Mesa-dev] [PATCH] radeonsi: enable 32-bit denormals on VI+

2017-01-11 Thread Samuel Pitoiset



On 01/11/2017 07:00 PM, Roland Scheidegger wrote:

I don't think there's any glsl, es or otherwise, specification which
would require denorms (since obviously lots of hw can't do it, d3d10
forbids them), with any precision qualifier. Hence these look like bugs
of the test suite to me?
(Irrespective of whether it's a good idea to enable denormals, which I
don't really know.)


That test works on NVIDIA hw (both with blob and nouveau) and IIRC it 
also works on Intel hw. I don't think it's buggy there.




Roland


Am 11.01.2017 um 18:29 schrieb Samuel Pitoiset:

Only VI can do 32-bit denormals at full rate while previous
generations can do it only for 64-bit and 16-bit.

This fixes some dEQP tests with the highp type qualifier.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99343
Signed-off-by: Samuel Pitoiset 
---
 src/gallium/drivers/radeonsi/si_shader.c | 11 ---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_shader.c 
b/src/gallium/drivers/radeonsi/si_shader.c
index 5dfbd6603a..e9cb11883f 100644
--- a/src/gallium/drivers/radeonsi/si_shader.c
+++ b/src/gallium/drivers/radeonsi/si_shader.c
@@ -6361,8 +6361,10 @@ int si_compile_llvm(struct si_screen *sscreen,

si_shader_binary_read_config(binary, conf, 0);

-   /* Enable 64-bit and 16-bit denormals, because there is no performance
-* cost.
+   /* Enable denormals when there is no performance cost.
+*
+* Only VI can do 32-bit denormals at full rate while previous
+* generations can do it only for 64-bit and 16-bit.
 *
 * If denormals are enabled, all floating-point output modifiers are
 * ignored.
@@ -6373,7 +6375,10 @@ int si_compile_llvm(struct si_screen *sscreen,
 *   have to stop using those.
 * - SI & CI would be very slow.
 */
-   conf->float_mode |= V_00B028_FP_64_DENORMS;
+   if (sscreen->b.chip_class >= VI)
+   conf->float_mode |= V_00B028_FP_ALL_DENORMS;
+   else
+   conf->float_mode |= V_00B028_FP_64_DENORMS;

FREE(binary->config);
FREE(binary->global_symbol_offsets);






Re: [Mesa-dev] [PATCH] radeonsi: enable 32-bit denormals on VI+

2017-01-11 Thread Samuel Pitoiset



On 01/11/2017 07:49 PM, Marek Olšák wrote:

I've realized there is a small problem with this. MAD never supports
denorms, but MUL+ADD do. That can cause issues, because the compiler
assumes that MAD = MUL+ADD.

We need to use MAD for good performance, which means we should
probably never enable FP32 denorms.
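
To make the MAD vs. MUL+ADD concern concrete, here is a small CPU-side
sketch in C. It is only a simplified model under the assumption that a
MAD flushes denormal inputs and results to zero; it does not describe the
actual hardware, and all function names are hypothetical.

```c
#include <float.h>
#include <stdint.h>
#include <string.h>

/* Flush a denormal float to zero (the sign is dropped for brevity). */
static float
flush_denorm(float x)
{
   uint32_t bits;

   memcpy(&bits, &x, sizeof(bits));
   /* Exponent field all zero and mantissa non-zero means denormal. */
   if ((bits & 0x7f800000u) == 0 && (bits & 0x007fffffu) != 0)
      return 0.0f;

   return x;
}

/* MUL followed by ADD, with denormals preserved. */
static float
mul_add_denorm(float a, float b, float c)
{
   volatile float product = a * b; /* keep the rounded intermediate */
   return product + c;
}

/* Simplified model of a MAD that flushes denormal inputs and results. */
static float
mad_flushing(float a, float b, float c)
{
   volatile float product = flush_denorm(a) * flush_denorm(b);
   return flush_denorm(flush_denorm(product) + flush_denorm(c));
}
```

With a = FLT_MIN, b = 0.5f and c = 0.0f, the MUL+ADD path returns a
non-zero denormal while the flushing MAD model returns zero, so a
compiler that treats the two as interchangeable could change results.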


Yeah, this is explained in the code also, but according to your comment 
on Bugzilla, I thought VI+ was not affected by that...




Marek

On Wed, Jan 11, 2017 at 6:29 PM, Samuel Pitoiset
 wrote:

Only VI can do 32-bit denormals at full rate while previous
generations can do it only for 64-bit and 16-bit.

This fixes some dEQP tests with the highp type qualifier.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99343
Signed-off-by: Samuel Pitoiset 
---
 src/gallium/drivers/radeonsi/si_shader.c | 11 ---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_shader.c 
b/src/gallium/drivers/radeonsi/si_shader.c
index 5dfbd6603a..e9cb11883f 100644
--- a/src/gallium/drivers/radeonsi/si_shader.c
+++ b/src/gallium/drivers/radeonsi/si_shader.c
@@ -6361,8 +6361,10 @@ int si_compile_llvm(struct si_screen *sscreen,

si_shader_binary_read_config(binary, conf, 0);

-   /* Enable 64-bit and 16-bit denormals, because there is no performance
-* cost.
+   /* Enable denormals when there is no performance cost.
+*
+* Only VI can do 32-bit denormals at full rate while previous
+* generations can do it only for 64-bit and 16-bit.
 *
 * If denormals are enabled, all floating-point output modifiers are
 * ignored.
@@ -6373,7 +6375,10 @@ int si_compile_llvm(struct si_screen *sscreen,
 *   have to stop using those.
 * - SI & CI would be very slow.
 */
-   conf->float_mode |= V_00B028_FP_64_DENORMS;
+   if (sscreen->b.chip_class >= VI)
+   conf->float_mode |= V_00B028_FP_ALL_DENORMS;
+   else
+   conf->float_mode |= V_00B028_FP_64_DENORMS;

FREE(binary->config);
FREE(binary->global_symbol_offsets);
--
2.11.0



Re: [Mesa-dev] [PATCH] radeonsi: enable 32-bit denormals on VI+

2017-01-11 Thread Samuel Pitoiset



On 01/11/2017 07:18 PM, Ilia Mirkin wrote:

So, I don't know whether this affects more than compute shaders
without reading the code, but I explicitly had to enable denorm
flushing on nvc0 in order to fix some sad artifacts in Unigine Heaven.

Right now nouveau only does denorm flushes on graphics shaders, but
the reason I did that originally was to leave them on for a
hypothetical OpenCL compute situation (and GL compute was far from my
radar at the time). IMHO GL compute shaders probably want denorm
flushing as well.


I think so as well.
FYI,
dEQP-GLES31.functional.shaders.builtin_functions.precision.min.highp_compute.scalar
works on Nouveau without explicitly setting the ftz flag.



Cheers,

  -ilia

On Wed, Jan 11, 2017 at 1:09 PM, Marek Olšák  wrote:

Reviewed-by: Marek Olšák 

Would you please run a GPU-bound benchmark of your choice to make sure
it doesn't affect performance?

Thanks,
Marek

On Wed, Jan 11, 2017 at 6:29 PM, Samuel Pitoiset
 wrote:

Only VI can do 32-bit denormals at full rate while previous
generations can do it only for 64-bit and 16-bit.

This fixes some dEQP tests with the highp type qualifier.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99343
Signed-off-by: Samuel Pitoiset 
---
 src/gallium/drivers/radeonsi/si_shader.c | 11 ---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_shader.c 
b/src/gallium/drivers/radeonsi/si_shader.c
index 5dfbd6603a..e9cb11883f 100644
--- a/src/gallium/drivers/radeonsi/si_shader.c
+++ b/src/gallium/drivers/radeonsi/si_shader.c
@@ -6361,8 +6361,10 @@ int si_compile_llvm(struct si_screen *sscreen,

si_shader_binary_read_config(binary, conf, 0);

-   /* Enable 64-bit and 16-bit denormals, because there is no performance
-* cost.
+   /* Enable denormals when there is no performance cost.
+*
+* Only VI can do 32-bit denormals at full rate while previous
+* generations can do it only for 64-bit and 16-bit.
 *
 * If denormals are enabled, all floating-point output modifiers are
 * ignored.
@@ -6373,7 +6375,10 @@ int si_compile_llvm(struct si_screen *sscreen,
 *   have to stop using those.
 * - SI & CI would be very slow.
 */
-   conf->float_mode |= V_00B028_FP_64_DENORMS;
+   if (sscreen->b.chip_class >= VI)
+   conf->float_mode |= V_00B028_FP_ALL_DENORMS;
+   else
+   conf->float_mode |= V_00B028_FP_64_DENORMS;

FREE(binary->config);
FREE(binary->global_symbol_offsets);
--
2.11.0



Re: [Mesa-dev] [PATCH] radeonsi: enable 32-bit denormals on VI+

2017-01-11 Thread Samuel Pitoiset



On 01/11/2017 07:09 PM, Marek Olšák wrote:

Reviewed-by: Marek Olšák 

Would you please run a GPU-bound benchmark of your choice to make sure
it doesn't affect performance?


I tried Furmark and Pixmark Piano on my RX 480. Over 3 runs before and
after that change, the FPS and the number of points are identical.




Thanks,
Marek

On Wed, Jan 11, 2017 at 6:29 PM, Samuel Pitoiset
 wrote:

Only VI can do 32-bit denormals at full rate while previous
generations can do it only for 64-bit and 16-bit.

This fixes some dEQP tests with the highp type qualifier.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99343
Signed-off-by: Samuel Pitoiset 
---
 src/gallium/drivers/radeonsi/si_shader.c | 11 ---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_shader.c 
b/src/gallium/drivers/radeonsi/si_shader.c
index 5dfbd6603a..e9cb11883f 100644
--- a/src/gallium/drivers/radeonsi/si_shader.c
+++ b/src/gallium/drivers/radeonsi/si_shader.c
@@ -6361,8 +6361,10 @@ int si_compile_llvm(struct si_screen *sscreen,

si_shader_binary_read_config(binary, conf, 0);

-   /* Enable 64-bit and 16-bit denormals, because there is no performance
-* cost.
+   /* Enable denormals when there is no performance cost.
+*
+* Only VI can do 32-bit denormals at full rate while previous
+* generations can do it only for 64-bit and 16-bit.
 *
 * If denormals are enabled, all floating-point output modifiers are
 * ignored.
@@ -6373,7 +6375,10 @@ int si_compile_llvm(struct si_screen *sscreen,
 *   have to stop using those.
 * - SI & CI would be very slow.
 */
-   conf->float_mode |= V_00B028_FP_64_DENORMS;
+   if (sscreen->b.chip_class >= VI)
+   conf->float_mode |= V_00B028_FP_ALL_DENORMS;
+   else
+   conf->float_mode |= V_00B028_FP_64_DENORMS;

FREE(binary->config);
FREE(binary->global_symbol_offsets);
--
2.11.0



Re: [Mesa-dev] [PATCH 3/9] nir/algebraic: add support for conditional helper functions to expressions

2017-01-11 Thread Jason Ekstrand
This patch and 5-9 are

Reviewed-by: Jason Ekstrand 

If others think the "unified" approach is nicer, I'll rebase on top and we
shouldn't need any nir_opt_algebraic.py changes since the syntax won't
change.

On Tue, Jan 10, 2017 at 1:41 AM, Timothy Arceri <
timothy.arc...@collabora.com> wrote:

> ---
>  src/compiler/nir/nir_algebraic.py | 5 -
>  src/compiler/nir/nir_search.c | 3 +++
>  src/compiler/nir/nir_search.h | 8 
>  3 files changed, 15 insertions(+), 1 deletion(-)
>
> diff --git a/src/compiler/nir/nir_algebraic.py b/src/compiler/nir/nir_
> algebraic.py
> index 19ac6ee..b0fa9e7 100644
> --- a/src/compiler/nir/nir_algebraic.py
> +++ b/src/compiler/nir/nir_algebraic.py
> @@ -90,6 +90,7 @@ static const ${val.c_type} ${val.name} = {
> ${'true' if val.inexact else 'false'},
> nir_op_${val.opcode},
> { ${', '.join(src.c_ptr for src in val.sources)} },
> +   ${val.cond if val.cond else 'NULL'},
>  % endif
>  };""")
>
> @@ -185,7 +186,8 @@ class Variable(Value):
>elif self.required_type == 'float':
>   return "nir_type_float"
>
> -_opcode_re = re.compile(r"(?P<inexact>~)?(?P<opcode>\w+)(?:@(?P<bits>\d+))?")
> +_opcode_re = re.compile(r"(?P<inexact>~)?(?P<opcode>\w+)(?:@(?P<bits>\d+))?"
> +r"(?P<cond>\([^\)]+\))?")
>
>  class Expression(Value):
> def __init__(self, expr, name_base, varset):
> @@ -198,6 +200,7 @@ class Expression(Value):
>self.opcode = m.group('opcode')
>self.bit_size = int(m.group('bits')) if m.group('bits') else 0
>self.inexact = m.group('inexact') is not None
> +  self.cond = m.group('cond')
>self.sources = [ Value.create(src, "{0}_{1}".format(name_base, i),
> varset)
> for (i, src) in enumerate(expr[1:]) ]
>
> diff --git a/src/compiler/nir/nir_search.c b/src/compiler/nir/nir_search.c
> index cc17642..0d08614 100644
> --- a/src/compiler/nir/nir_search.c
> +++ b/src/compiler/nir/nir_search.c
> @@ -264,6 +264,9 @@ static bool
>  match_expression(const nir_search_expression *expr, nir_alu_instr *instr,
>   unsigned num_components, struct match_state *state)
>  {
> +   if (expr->cond && !expr->cond(instr))
> +  return false;
> +
> if (instr->op != expr->opcode)
>return false;
>
> diff --git a/src/compiler/nir/nir_search.h b/src/compiler/nir/nir_search.h
> index 357509a..004b61d 100644
> --- a/src/compiler/nir/nir_search.h
> +++ b/src/compiler/nir/nir_search.h
> @@ -102,6 +102,14 @@ typedef struct {
>
> nir_op opcode;
> const nir_search_value *srcs[4];
> +
> +   /** Optional condition fxn ptr
> +*
> +* This allows additional constraints on expression matching, it is
> +* typically used to match an expressions uses such as the number of
> times
> +* the expression is used, and whether its used by an if.
> +*/
> +   bool (*cond)(nir_alu_instr *instr);
>  } nir_search_expression;
>
>  NIR_DEFINE_CAST(nir_search_value_as_variable, nir_search_value,
> --
> 2.9.3
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>


Re: [Mesa-dev] [PATCH] util: fix list_is_singular()

2017-01-11 Thread Jason Ekstrand
Reviewed-by: Jason Ekstrand 

Please CC stable.

On Tue, Jan 10, 2017 at 8:13 PM, Timothy Arceri <
timothy.arc...@collabora.com> wrote:

> Currently it's dependent on the user calling and checking the result
> of list_empty() before using the result of list_is_singular().
> ---
>  src/util/list.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/src/util/list.h b/src/util/list.h
> index e8a99ac..07eb9f3 100644
> --- a/src/util/list.h
> +++ b/src/util/list.h
> @@ -110,7 +110,7 @@ static inline bool list_empty(struct list_head *list)
>   */
>  static inline bool list_is_singular(const struct list_head *list)
>  {
> -   return list->next != NULL && list->next->next == list;
> +   return list->next != NULL && list->next != list && list->next->next ==
> list;
>  }
>
>  static inline unsigned list_length(struct list_head *list)
> --
> 2.9.3
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>
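
A minimal sketch of why the extra check matters, assuming a circular
list where an empty head points at itself. The struct and helpers below
are simplified stand-ins written for this example, not the definitions
from src/util/list.h, and the _old/_new suffixes are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for a circular doubly linked list. */
struct list_head {
   struct list_head *prev;
   struct list_head *next;
};

static void
list_inithead(struct list_head *item)
{
   item->prev = item;
   item->next = item;
}

/* Insert item right after the list head. */
static void
list_add(struct list_head *item, struct list_head *list)
{
   item->prev = list;
   item->next = list->next;
   list->next->prev = item;
   list->next = item;
}

/* Check before the patch: also true for an empty list. */
static bool
list_is_singular_old(const struct list_head *list)
{
   return list->next != NULL && list->next->next == list;
}

/* Check after the patch: rejects the empty list. */
static bool
list_is_singular_new(const struct list_head *list)
{
   return list->next != NULL && list->next != list &&
          list->next->next == list;
}
```

For an empty head, head->next is head itself, so head->next->next == head
holds and the old check reports a single element. The new check only
returns true once exactly one node has been added.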


Re: [Mesa-dev] [PATCH] glsl: always do sqrt(abs()) and inversesqrt(abs())

2017-01-11 Thread Ilia Mirkin
On Wed, Jan 11, 2017 at 2:33 PM, Marek Olšák  wrote:
> On Wed, Jan 11, 2017 at 7:34 PM, Erik Faye-Lund  wrote:
>> On Wed, Jan 11, 2017 at 7:33 PM, Jason Ekstrand  wrote:
>>> On Wed, Jan 11, 2017 at 10:31 AM, Erik Faye-Lund 
>>> wrote:

>>>> On Wed, Jan 11, 2017 at 7:22 PM, Marek Olšák wrote:
>>>> > On Wed, Jan 11, 2017 at 7:09 PM, Jason Ekstrand wrote:
>>>> > I agree with you completely, and I find it unfortunate too that we
>>>> > have to add the workaround to GLSL or radeonsi to align its behavior
>>>> > with closed drivers.
>>>>
>>>> Just for reference, I just tested what NVIDIA does on Windows, and
>>>> they *don't* seem to do inversesqrt(abs(x)) on my HW/driver.
>>>
>>>
>>> What about sqrt()?  Do they do abs for one and not the other?  Because that
>>> would be crazy but also possible.
>>
>> Not for sqrt either, it seems.
>
> I'm open to a drirc-based solution, but I'd also be OK with abs in
> radeonsi or GLSL. It's not like nouveau and r600g are gonna remove the
> ABS modifier

FWIW I'd be happy to drop the modifier in the nouveau backend if
that's the consensus opinion of how to handle this situation properly.

Mostly I'd prefer consistent handling across all drivers.

Cheers,

  -ilia


Re: [Mesa-dev] [PATCH 4/8] android: fix llvmpipe build

2017-01-11 Thread Jose Fonseca

On 10/01/17 15:54, Emil Velikov wrote:

On 6 January 2017 at 17:35, Wu Zhen  wrote:

From: WuZhen 

Since cf410574 ("gallivm: Make MCJIT a runtime option."), llvmpipe assumes
MCJIT is available on x86(_64). This is not the case for Android prior to M.


Wu Zhen, what exactly is the issue you're getting - build or link-time error ?

Looking at the hunk [1] in the offending commit makes me wonder.
 - Why do we call LLVMLinkInJIT() even if one selects MCJIT via the env var.
 - Why do we always call LLVMLinkInMCJIT() regardless of a) whether we've
built against an old LLVM and b) the env var.

Jose, shouldn't we honour the above ? One way that comes to mind is to
have USE_MCJIT always as static variable. Then we can guard the
debug_get_bool_option() override with the current LLVM_VERSION/ARCH
heuristics while preserving original invocation.

if (USE_MCJIT) // use lowercase name since it's not a macro ?
   LLVMLinkInMCJIT();
else
   LLVMLinkInJIT();
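
A rough, self-contained sketch of the idea above, under stated assumptions: link_in_mcjit() and link_in_jit() are stubs standing in for LLVMLinkInMCJIT() and LLVMLinkInJIT(), parse_bool() is a simplified stand-in for debug_get_bool_option(), and the two boolean parameters model the LLVM_VERSION/ARCH heuristics. None of these names are taken from the gallivm code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Which JIT was linked; the stubs record this instead of calling LLVM. */
enum jit_kind { JIT_NONE, JIT_LEGACY, JIT_MCJIT };

static enum jit_kind linked_jit = JIT_NONE;
static bool use_mcjit; /* always a static variable, never a macro */

static void link_in_mcjit(void) { linked_jit = JIT_MCJIT; }
static void link_in_jit(void)   { linked_jit = JIT_LEGACY; }

/* Simplified boolean option parser: unknown or missing values keep the default. */
static bool parse_bool(const char *s, bool dfault)
{
   if (s == NULL)
      return dfault;
   if (strcmp(s, "1") == 0 || strcmp(s, "true") == 0)
      return true;
   if (strcmp(s, "0") == 0 || strcmp(s, "false") == 0)
      return false;
   return dfault;
}

/*
 * heuristic_default:  what the build-time heuristics would pick for MCJIT.
 * allow_env_override: whether the heuristics leave the choice open at runtime.
 * env_value:          value of the GALLIVM_MCJIT variable, or NULL if unset.
 */
static enum jit_kind select_jit(bool heuristic_default, bool allow_env_override,
                                const char *env_value)
{
   use_mcjit = heuristic_default;
   if (allow_env_override)
      use_mcjit = parse_bool(env_value, heuristic_default);

   /* Link exactly one JIT, honouring the final decision. */
   if (use_mcjit)
      link_in_mcjit();
   else
      link_in_jit();

   return linked_jit;
}
```

Taking the environment value as a parameter rather than calling getenv() keeps the sketch deterministic; a real version would read the option once at init time.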


Thanks
Emil

[1]
@@ -385,18 +382,18 @@ lp_build_init(void)
   if (gallivm_initialized)
  return TRUE;

+   LLVMLinkInMCJIT();
+#if !defined(USE_MCJIT)
+   USE_MCJIT = debug_get_bool_option("GALLIVM_MCJIT", 0);
+   LLVMLinkInJIT();
+#endif
+
#ifdef DEBUG
   gallivm_debug = debug_get_option_gallivm_debug();
#endif

   lp_set_target_options();

-#if USE_MCJIT
-   LLVMLinkInMCJIT();
-#else
-   LLVMLinkInJIT();
-#endif
-



USE_MCJIT used to be a statically defined macro, but now it can also be a 
runtime boolean.


We require LLVM 3.3, and MCJIT has been available since then, so there 
was no reason not to link.


Android seems a new beast: it has LLVM 3.3 but not MCJIT??

Jose


Re: [Mesa-dev] [PATCH] glsl: always do sqrt(abs()) and inversesqrt(abs())

2017-01-11 Thread Marek Olšák
On Wed, Jan 11, 2017 at 7:34 PM, Erik Faye-Lund  wrote:
> On Wed, Jan 11, 2017 at 7:33 PM, Jason Ekstrand  wrote:
>> On Wed, Jan 11, 2017 at 10:31 AM, Erik Faye-Lund 
>> wrote:
>>>
>>> On Wed, Jan 11, 2017 at 7:22 PM, Marek Olšák  wrote:
>>> > On Wed, Jan 11, 2017 at 7:09 PM, Jason Ekstrand 
>>> > wrote:
>>> >> On Wed, Jan 11, 2017 at 9:32 AM, Samuel Pitoiset
>>> >> 
>>> >> wrote:
>>> >>>
>>> >>>
>>> >>>
>>> >>> On 01/11/2017 05:32 PM, Marek Olšák wrote:
>>> 
>>>  On Wed, Jan 11, 2017 at 4:33 PM, Erik Faye-Lund 
>>>  wrote:
>>> >
>>> > On Wed, Jan 11, 2017 at 4:14 PM, Nicolai Hähnle 
>>> > wrote:
>>> >>
>>> >> On 11.01.2017 13:17, Marek Olšák wrote:
>>> >>>
>>> >>>
>>> >>> On Tue, Jan 10, 2017 at 6:48 PM, Jason Ekstrand
>>> >>> 
>>> >>> wrote:
>>> 
>>> 
>>>  I'll be honest, I'm not a fan... Given that D3D10 has one defined
>>>  behavior,
>>>  D3D9 has another, and GL doesn't specify, I don't really think we
>>>  should
>>>  be
>>>  making a global change to all drivers to do the D3D9 behavior
>>>  just to
>>>  fix
>>>  one app.  Sure, other apps probably have the same bug, but are we
>>>  going
>>>  to
>>>  have apps that expect the D3D10 behavior that we've now
>>>  explicitly
>>>  made
>>>  not
>>>  work?
>>> 
>>>  If we're going to hack around an app bug, I would really rather
>>>  see
>>>  it
>>>  behind a driconf option rather than a global change to driver
>>>  behavior.
>>>  Even better, it'd be cool if we could see the app get fixed.
>>>  (Yes, I
>>>  know
>>>  that's not likely).
>>> >>>
>>> >>>
>>> >>>
>>> >>> I think we are not in a position to refuse this workaround, or put
>>> >>> more precisely, to have a different behavior from everybody else.
>>> >>> By
>>> >>> "we", I mean i965, radeonsi, svga. All closed drivers use abs.
>>> >>> Many
>>> >>> Mesa drivers also use abs internally (r300, r600, nv30,
>>> >>> nv50/nvc0).
>>> >>> This is not really a workaround for a specific application, even
>>> >>> though it's strongly motivated by that. It's a fix to align the
>>> >>> few
>>> >>> remaining drivers with all others.
>>> >>>
>>> >>> We talked with the publisher about this a very long time ago.
>>> >>> While I
>>> >>> don't remember the details (Nicolai?), I think they refused to fix
>>> >>> it
>>> >>> because radeonsi appeared to be the only driver not doing abs.
>>> >>
>>> >>
>>> >>
>>> >> If I remember correctly, it wasn't so much a refusal as a lack of
>>> >> follow-through. They even had an option in their framework to add
>>> >> the
>>> >> abs(...) when translating shaders, but somehow didn't turn it on
>>> >> unconditionally for some reason...
>>> >
>>> >
>>> > VP even says so here:
>>> > https://github.com/virtual-programming/specops-linux/issues/20
>>> >
>>> > They recommend against patching mesa to do abs, though.
>>> 
>>> 
>>>  We should still patch Mesa to align the behavior with closed drivers
>>>  and gallium drivers like r600g and nouveau. In other words, it's too
>>>  late to tell us not to patch Mesa, because r600g and nouveau have
>>>  been
>>>  "patched" since the beginning.
>>> 
>>>  We only need to decide whether we should do it in the GLSL compiler
>>>  or
>>>  radeonsi, i.e. whether we should exclude i965 and svga.
>>> >>>
>>> >>>
>>> >>> I do agree with that.
>>> >>
>>> >>
>>> >> I tend to disagree but I've come to the conclusion that I won't stand
>>> >> in the
>>> >> way either.  If both of the other desktop vendors do it and we've
>>> >> already
>>> >> decided that no implementation we care about will have its performance
>>> >> impacted, it seems like a valid spec-compliant thing to do.  I would
>>> >> prefer
>>> >> it to be behind a driconf option, but if it's unconditional, oh well.
>>> >> My
>>> >> disagreement is mostly philosophical.
>>> >>
>>> >> Over the last two years of working on Vulkan, I've been fighting broken
>>> >> tests and apps left and right.  Vulkan has a huge amount of area where,
>>> >> if
>>> >> an app does something wrong, they get undefined behavior which is up to
>>> >> and
>>> >> including program termination.  And basically all apps are broken in
>>> >> some
>>> >> way.  Fortunately, the validation layers are finally starting to catch
>>> >> up to
>>> >> the point where I'm noticing very few bugs that the validation layers
>>> >> don't
>>> >> catch and things are getting into a better state.  However, 

Re: [Mesa-dev] [PATCH 1/3] vl/dri3: use external texture as back buffers(v4)

2017-01-11 Thread Harry Wentland

On 2017-01-11 12:50 AM, Michel Dänzer wrote:

On 10/01/17 09:07 PM, Andy Furniss wrote:

Andy Furniss wrote:


Though recent testing shows this is not true with DAL/DC on 3.7 -
todo test DC on new drm-next branch.


todo done, DC for some reason on both amd-staging-4.7 and
amd-staging-drm-next is "slower" = the tear region is 2 to 3 times
larger than non DC kernel with powerplay auto. With high it is smaller
but still present.


This particular issue is because DC uses the GPU's VUPDATE interrupt
instead of the VBLANK interrupt to drive the DRM vblank machinery. The
result is that userspace is only notified of a vertical blank period
when it's already over, so it doesn't get a chance to do anything inside
the vertical blank period.




Adding Tony for comment on why DC behaves the way it does.

Harry


[Mesa-dev] [PATCH 1/2] nir/search: Rework conditions to be a bit simpler and more generic

2017-01-11 Thread Jason Ekstrand
Instead of passing all of the ALU op information, we just pass what you
need: The SSA def, the type it's being read as, and a component mask.
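
As a rough illustration of the component mask described above, here is a standalone sketch. The helper names are hypothetical and not part of the patch, and a plain uint32_t array stands in for the NIR constant value. The swizzle is folded into a bitmask once, and the condition then only looks at the components whose bit is set:

```c
#include <stdint.h>

/* Hypothetical helper: fold a swizzle into a bitmask of the components read. */
static uint8_t swizzle_to_read_mask(const uint8_t *swizzle,
                                    unsigned num_components)
{
   uint8_t read_mask = 0;
   for (unsigned i = 0; i < num_components; i++)
      read_mask |= (uint8_t)(1u << swizzle[i]);
   return read_mask;
}

/*
 * Hypothetical condition in the new style: returns 1 if every component
 * selected by read_mask is a non-zero power of two, 0 otherwise.
 */
static int all_read_components_pow2(const uint32_t *values, uint8_t read_mask)
{
   for (unsigned i = 0; i < 4; i++) {
      if (!(read_mask & (1u << i)))
         continue; /* component is never read, so its value is irrelevant */

      uint32_t v = values[i];
      if (v == 0 || (v & (v - 1)) != 0)
         return 0;
   }
   return 1;
}
```

A swizzle of .zx over two components yields the mask 0x5, so components y and w of the constant are ignored by the condition.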
---
 src/compiler/nir/nir_search.c | 12 +--
 src/compiler/nir/nir_search.h |  3 +-
 src/compiler/nir/nir_search_helpers.h | 62 +++
 3 files changed, 44 insertions(+), 33 deletions(-)

diff --git a/src/compiler/nir/nir_search.c b/src/compiler/nir/nir_search.c
index 10a0941..2f57821 100644
--- a/src/compiler/nir/nir_search.c
+++ b/src/compiler/nir/nir_search.c
@@ -154,8 +154,16 @@ match_value(const nir_search_value *value, nir_alu_instr 
*instr, unsigned src,
  instr->src[src].src.ssa->parent_instr->type != 
nir_instr_type_load_const)
 return false;
 
- if (var->cond && !var->cond(instr, src, num_components, new_swizzle))
-return false;
+ if (var->cond) {
+uint8_t read_mask = 0;
+for (unsigned i = 0; i < num_components; i++)
+   read_mask |= 1 << new_swizzle[i];
+
+if (!var->cond(instr->src[src].src.ssa,
+   nir_op_infos[instr->op].input_types[src],
+   read_mask))
+   return false;
+ }
 
  if (var->type != nir_type_invalid &&
  !src_is_type(instr->src[src].src, var->type))
diff --git a/src/compiler/nir/nir_search.h b/src/compiler/nir/nir_search.h
index dec19d5..9d25018 100644
--- a/src/compiler/nir/nir_search.h
+++ b/src/compiler/nir/nir_search.h
@@ -76,8 +76,7 @@ typedef struct {
 * variables to require, for example, power-of-two in order for the search
 * to match.
 */
-   bool (*cond)(nir_alu_instr *instr, unsigned src,
-unsigned num_components, const uint8_t *swizzle);
+   bool (*cond)(nir_ssa_def *def, nir_alu_type type, uint8_t read_mask);
 } nir_search_variable;
 
 typedef struct {
diff --git a/src/compiler/nir/nir_search_helpers.h 
b/src/compiler/nir/nir_search_helpers.h
index 20fdae6..85f6c85 100644
--- a/src/compiler/nir/nir_search_helpers.h
+++ b/src/compiler/nir/nir_search_helpers.h
@@ -36,25 +36,26 @@ __is_power_of_two(unsigned int x)
 }
 
 static inline bool
-is_pos_power_of_two(nir_alu_instr *instr, unsigned src, unsigned 
num_components,
-const uint8_t *swizzle)
+is_pos_power_of_two(nir_ssa_def *def, nir_alu_type type, uint8_t read_mask)
 {
-   nir_const_value *val = nir_src_as_const_value(instr->src[src].src);
-
-   /* only constant src's: */
-   if (!val)
+   if (def->parent_instr->type != nir_instr_type_load_const)
   return false;
 
-   for (unsigned i = 0; i < num_components; i++) {
-  switch (nir_op_infos[instr->op].input_types[src]) {
+   nir_const_value *val = &nir_instr_as_load_const(def->parent_instr)->value;
+
+   for (unsigned i = 0; i < 4; i++) {
+  if (!(read_mask & (1 << i)))
+ continue;
+
+  switch (type) {
   case nir_type_int:
- if (val->i32[swizzle[i]] < 0)
+ if (val->i32[i] < 0)
 return false;
- if (!__is_power_of_two(val->i32[swizzle[i]]))
+ if (!__is_power_of_two(val->i32[i]))
 return false;
  break;
   case nir_type_uint:
- if (!__is_power_of_two(val->u32[swizzle[i]]))
+ if (!__is_power_of_two(val->u32[i]))
 return false;
  break;
   default:
@@ -66,21 +67,22 @@ is_pos_power_of_two(nir_alu_instr *instr, unsigned src, 
unsigned num_components,
 }
 
 static inline bool
-is_neg_power_of_two(nir_alu_instr *instr, unsigned src, unsigned 
num_components,
-const uint8_t *swizzle)
+is_neg_power_of_two(nir_ssa_def *def, nir_alu_type type, uint8_t read_mask)
 {
-   nir_const_value *val = nir_src_as_const_value(instr->src[src].src);
-
-   /* only constant src's: */
-   if (!val)
+   if (def->parent_instr->type != nir_instr_type_load_const)
   return false;
 
-   for (unsigned i = 0; i < num_components; i++) {
-  switch (nir_op_infos[instr->op].input_types[src]) {
+   nir_const_value *val = &nir_instr_as_load_const(def->parent_instr)->value;
+
+   for (unsigned i = 0; i < 4; i++) {
+  if (!(read_mask & (1 << i)))
+ continue;
+
+  switch (type) {
   case nir_type_int:
- if (val->i32[swizzle[i]] > 0)
+ if (val->i32[i] > 0)
 return false;
- if (!__is_power_of_two(abs(val->i32[swizzle[i]])))
+ if (!__is_power_of_two(abs(val->i32[i])))
 return false;
  break;
   default:
@@ -92,18 +94,20 @@ is_neg_power_of_two(nir_alu_instr *instr, unsigned src, 
unsigned num_components,
 }
 
 static inline bool
-is_zero_to_one(nir_alu_instr *instr, unsigned src, unsigned num_components,
-   const uint8_t *swizzle)
+is_zero_to_one(nir_ssa_def *def, nir_alu_type type, uint8_t read_mask)
 {
-   nir_const_value *val = nir_src_as_const_value(instr->src[src].src);
-
-   if (!val)
+   if (def->parent_instr->type != 

[Mesa-dev] [PATCH 2/2] nir/search: Allow conditions on expressions as well as variables

2017-01-11 Thread Jason Ekstrand
---
 src/compiler/nir/nir_algebraic.py | 10 +++---
 src/compiler/nir/nir_search.c | 28 +---
 src/compiler/nir/nir_search.h | 18 +-
 3 files changed, 33 insertions(+), 23 deletions(-)

diff --git a/src/compiler/nir/nir_algebraic.py 
b/src/compiler/nir/nir_algebraic.py
index 19ac6ee..e70c511 100644
--- a/src/compiler/nir/nir_algebraic.py
+++ b/src/compiler/nir/nir_algebraic.py
@@ -78,14 +78,13 @@ class Value(object):
__template = mako.template.Template("""
 #include "compiler/nir/nir_search_helpers.h"
 static const ${val.c_type} ${val.name} = {
-   { ${val.type_enum}, ${val.bit_size} },
+   { ${val.type_enum}, ${val.bit_size}, ${val.cond if val.cond else 'NULL'} },
 % if isinstance(val, Constant):
${val.type()}, { ${hex(val)} /* ${val.value} */ },
 % elif isinstance(val, Variable):
${val.index}, /* ${val.var_name} */
${'true' if val.is_constant else 'false'},
${val.type() or 'nir_type_invalid' },
-   ${val.cond if val.cond else 'NULL'},
 % elif isinstance(val, Expression):
${'true' if val.inexact else 'false'},
nir_op_${val.opcode},
@@ -121,6 +120,9 @@ class Constant(Value):
def __init__(self, val, name):
   Value.__init__(self, name, "constant")
 
+  # Constants can't have conditions.  They either match or they don't.
+  self.cond = None
+
   if isinstance(val, (str)):
  m = _constant_re.match(val)
  self.value = ast.literal_eval(m.group('value'))
@@ -185,7 +187,8 @@ class Variable(Value):
   elif self.required_type == 'float':
  return "nir_type_float"
 
-_opcode_re = re.compile(r"(?P<inexact>~)?(?P<opcode>\w+)(?:@(?P<bits>\d+))?")
+_opcode_re = re.compile(r"(?P<inexact>~)?(?P<opcode>\w+)(?:@(?P<bits>\d+))?"
+r"(?P<cond>\([^\)]+\))?")
 
 class Expression(Value):
def __init__(self, expr, name_base, varset):
@@ -198,6 +201,7 @@ class Expression(Value):
   self.opcode = m.group('opcode')
   self.bit_size = int(m.group('bits')) if m.group('bits') else 0
   self.inexact = m.group('inexact') is not None
+  self.cond = m.group('cond')
  self.sources = [ Value.create(src, "{0}_{1}".format(name_base, i), varset)
for (i, src) in enumerate(expr[1:]) ]
 
diff --git a/src/compiler/nir/nir_search.c b/src/compiler/nir/nir_search.c
index 2f57821..0148b2f 100644
--- a/src/compiler/nir/nir_search.c
+++ b/src/compiler/nir/nir_search.c
@@ -119,6 +119,17 @@ match_value(const nir_search_value *value, nir_alu_instr 
*instr, unsigned src,
for (unsigned i = 0; i < num_components; ++i)
   new_swizzle[i] = instr->src[src].swizzle[swizzle[i]];
 
+   if (value->cond) {
+  uint8_t read_mask = 0;
+  for (unsigned i = 0; i < num_components; i++)
+ read_mask |= 1 << new_swizzle[i];
+
+  if (!value->cond(instr->src[src].src.ssa,
+   nir_op_infos[instr->op].input_types[src],
+   read_mask))
+ return false;
+   }
+
/* If the value has a specific bit size and it doesn't match, bail */
if (value->bit_size &&
nir_src_bit_size(instr->src[src].src) != value->bit_size)
@@ -154,17 +165,6 @@ match_value(const nir_search_value *value, nir_alu_instr 
*instr, unsigned src,
  instr->src[src].src.ssa->parent_instr->type != 
nir_instr_type_load_const)
 return false;
 
- if (var->cond) {
-uint8_t read_mask = 0;
-for (unsigned i = 0; i < num_components; i++)
-   read_mask |= 1 << new_swizzle[i];
-
-if (!var->cond(instr->src[src].src.ssa,
-   nir_op_infos[instr->op].input_types[src],
-   read_mask))
-   return false;
- }
-
  if (var->type != nir_type_invalid &&
  !src_is_type(instr->src[src].src, var->type))
 return false;
@@ -604,6 +604,12 @@ nir_replace_instr(nir_alu_instr *instr, const 
nir_search_expression *search,
state.has_exact_alu = false;
state.variables_seen = 0;
 
+   if (search->value.cond) {
+      if (!search->value.cond(&instr->dest.dest.ssa, nir_type_invalid,
+  instr->dest.write_mask))
+ return false;
+   }
+
if (!match_expression(search, instr, instr->dest.dest.ssa.num_components,
  swizzle, &state))
   return NULL;
diff --git a/src/compiler/nir/nir_search.h b/src/compiler/nir/nir_search.h
index 9d25018..6399033 100644
--- a/src/compiler/nir/nir_search.h
+++ b/src/compiler/nir/nir_search.h
@@ -42,6 +42,15 @@ typedef struct {
nir_search_value_type type;
 
unsigned bit_size;
+
+   /** Optional condition fxn ptr
+*
+* This is only allowed in search expressions, and allows additional
+* constraints to be placed on the match.  Typically used for 'is_constant'
+* variables to require, for example, power-of-two in order for the search
+* to match.
+*/
+   bool (*cond)(nir_ssa_def *def, nir_alu_type type, uint8_t read_mask);
 } 

Re: [Mesa-dev] [PATCH] radeonsi: enable 32-bit denormals on VI+

2017-01-11 Thread Marek Olšák
I've realized there is a small problem with this. MAD never supports
denorms, but MUL+ADD do. That can cause issues, because the compiler
assumes that MAD = MUL+ADD.

We need to use MAD for good performance, which means we should
probably never enable FP32 denorms.
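
To make the mismatch concrete, here is a small host-side model. It is only a sketch under assumptions: flush_denorm(), mad_flushing() and mul_add_denorm() are illustrative names, the MAD is modelled as flushing denormal inputs and results to zero, and the host CPU is assumed to run with denormals enabled (no fast-math or flush-to-zero mode).

```c
#include <float.h>

/* Flush a denormal (subnormal) value to zero; normal values, infinities and
 * NaN pass through unchanged. The sign of zero is not preserved here. */
static float flush_denorm(float x)
{
   return (x > -FLT_MIN && x < FLT_MIN) ? 0.0f : x;
}

/* Model of a MAD that never supports denormals on inputs or output. */
static float mad_flushing(float a, float b, float c)
{
   /* volatile keeps the compiler from contracting this into a host FMA */
   volatile float product = flush_denorm(a) * flush_denorm(b);
   volatile float sum = product + flush_denorm(c);
   return flush_denorm(sum);
}

/* Model of separate MUL and ADD instructions with denormals enabled. */
static float mul_add_denorm(float a, float b, float c)
{
   volatile float product = a * b;
   return product + c;
}
```

For a = 1e-39f (a denormal), b = 1.0f and c = 0.0f, the flushing MAD returns zero while MUL+ADD keeps the denormal, so a compiler that freely swaps one form for the other changes results. For normal inputs the two models agree.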

Marek

On Wed, Jan 11, 2017 at 6:29 PM, Samuel Pitoiset
 wrote:
> Only VI can do 32-bit denormals at full rate while previous
> generations can do it only for 64-bit and 16-bit.
>
> This fixes some dEQP tests with the highp type qualifier.
>
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99343
> Signed-off-by: Samuel Pitoiset 
> ---
>  src/gallium/drivers/radeonsi/si_shader.c | 11 ---
>  1 file changed, 8 insertions(+), 3 deletions(-)
>
> diff --git a/src/gallium/drivers/radeonsi/si_shader.c 
> b/src/gallium/drivers/radeonsi/si_shader.c
> index 5dfbd6603a..e9cb11883f 100644
> --- a/src/gallium/drivers/radeonsi/si_shader.c
> +++ b/src/gallium/drivers/radeonsi/si_shader.c
> @@ -6361,8 +6361,10 @@ int si_compile_llvm(struct si_screen *sscreen,
>
> si_shader_binary_read_config(binary, conf, 0);
>
> -   /* Enable 64-bit and 16-bit denormals, because there is no performance
> -* cost.
> +   /* Enable denormals when there is no performance cost.
> +*
> +* Only VI can do 32-bit denormals at full rate while previous
> +* generations can do it only for 64-bit and 16-bit.
>  *
>  * If denormals are enabled, all floating-point output modifiers are
>  * ignored.
> @@ -6373,7 +6375,10 @@ int si_compile_llvm(struct si_screen *sscreen,
>  *   have to stop using those.
>  * - SI & CI would be very slow.
>  */
> -   conf->float_mode |= V_00B028_FP_64_DENORMS;
> +   if (sscreen->b.chip_class >= VI)
> +   conf->float_mode |= V_00B028_FP_ALL_DENORMS;
> +   else
> +   conf->float_mode |= V_00B028_FP_64_DENORMS;
>
> FREE(binary->config);
> FREE(binary->global_symbol_offsets);
> --
> 2.11.0
>


Re: [Mesa-dev] [PATCH] docs: document INTEL_PRECISE_TRIG envvar

2017-01-11 Thread Jason Ekstrand
On Wed, Jan 11, 2017 at 10:34 AM, Jordan Justen 
wrote:

> On 2017-01-11 09:53:15, Juan A. Suarez Romero wrote:
> > ---
> >  docs/envvars.html | 1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git a/docs/envvars.html b/docs/envvars.html
> > index 9eee8db..4f05d7f 100644
> > --- a/docs/envvars.html
> > +++ b/docs/envvars.html
> > @@ -187,6 +187,7 @@ See the Xlib software
> driver page for details.
> > do32 - generate compute shader SIMD32 programs even if workgroup
> size doesn't exceed the SIMD16 limit
> > norbc - disable single sampled render buffer compression
> >  
> > +INTEL_PRECISE_TRIG - if set to 1 in gen<10, it fixes results out of
> [-1.0, 1.0] range for a small set of values.
>
> Since we now have the precise_trig driconf option (d9546b0c5d1a),
> should we deprecate INTEL_PRECISE_TRIG?
>

No.  Vulkan doesn't do driconf.


Re: [Mesa-dev] [PATCH] glsl: always do sqrt(abs()) and inversesqrt(abs())

2017-01-11 Thread Erik Faye-Lund
On Wed, Jan 11, 2017 at 7:33 PM, Jason Ekstrand  wrote:
> On Wed, Jan 11, 2017 at 10:31 AM, Erik Faye-Lund 
> wrote:
>>
>> On Wed, Jan 11, 2017 at 7:22 PM, Marek Olšák  wrote:
>> > On Wed, Jan 11, 2017 at 7:09 PM, Jason Ekstrand 
>> > wrote:
>> >> On Wed, Jan 11, 2017 at 9:32 AM, Samuel Pitoiset
>> >> 
>> >> wrote:
>> >>>
>> >>>
>> >>>
>> >>> On 01/11/2017 05:32 PM, Marek Olšák wrote:
>> 
>>  On Wed, Jan 11, 2017 at 4:33 PM, Erik Faye-Lund 
>>  wrote:
>> >
>> > On Wed, Jan 11, 2017 at 4:14 PM, Nicolai Hähnle 
>> > wrote:
>> >>
>> >> On 11.01.2017 13:17, Marek Olšák wrote:
>> >>>
>> >>>
>> >>> On Tue, Jan 10, 2017 at 6:48 PM, Jason Ekstrand
>> >>> 
>> >>> wrote:
>> 
>> 
>>  I'll be honest, I'm not a fan... Given that D3D10 has one defined
>>  behavior,
>>  D3D9 has another, and GL doesn't specify, I don't really think we
>>  should
>>  be
>>  making a global change to all drivers to do the D3D9 behavior
>>  just to
>>  fix
>>  one app.  Sure, other apps probably have the same bug, but are we
>>  going
>>  to
>>  have apps that expect the D3D10 behavior that we've now
>>  explicitly
>>  made
>>  not
>>  work?
>> 
>>  If we're going to hack around an app bug, I would really rather
>>  see
>>  it
>>  behind a driconf option rather than a global change to driver
>>  behavior.
>>  Even better, it'd be cool if we could see the app get fixed.
>>  (Yes, I
>>  know
>>  that's not likely).
>> >>>
>> >>>
>> >>>
>> >>> I think we are not in a position to refuse this workaround, or put
>> >>> more precisely, to have a different behavior from everybody else.
>> >>> By
>> >>> "we", I mean i965, radeonsi, svga. All closed drivers use abs.
>> >>> Many
>> >>> Mesa drivers also use abs internally (r300, r600, nv30,
>> >>> nv50/nvc0).
>> >>> This is not really a workaround for a specific application, even
>> >>> though it's strongly motivated by that. It's a fix to align the
>> >>> few
>> >>> remaining drivers with all others.
>> >>>
>> >>> We talked with the publisher about this a very long time ago.
>> >>> While I
>> >>> don't remember the details (Nicolai?), I think they refused to fix
>> >>> it
>> >>> because radeonsi appeared to be the only driver not doing abs.
>> >>
>> >>
>> >>
>> >> If I remember correctly, it wasn't so much a refusal as a lack of
>> >> follow-through. They even had an option in their framework to add
>> >> the
>> >> abs(...) when translating shaders, but somehow didn't turn it on
>> >> unconditionally for some reason...
>> >
>> >
>> > VP even says so here:
>> > https://github.com/virtual-programming/specops-linux/issues/20
>> >
>> > They recommend against patching mesa to do abs, though.
>> 
>> 
>>  We should still patch Mesa to align the behavior with closed drivers
>>  and gallium drivers like r600g and nouveau. In other words, it's too
>>  late to tell us not to patch Mesa, because r600g and nouveau have
>>  been
>>  "patched" since the beginning.
>> 
>>  We only need to decide whether we should do it in the GLSL compiler
>>  or
>>  radeonsi, i.e. whether we should exclude i965 and svga.
>> >>>
>> >>>
>> >>> I do agree with that.
>> >>
>> >>
>> >> I tend to disagree but I've come to the conclusion that I won't stand
>> >> in the
>> >> way either.  If both of the other desktop vendors do it and we've
>> >> already
>> >> decided that no implementation we care about will have its performance
>> >> impacted, it seems like a valid spec-compliant thing to do.  I would
>> >> prefer
>> >> it to be behind a driconf option, but if it's unconditional, oh well.
>> >> My
>> >> disagreement is mostly philosophical.
>> >>
>> >> Over the last two years of working on Vulkan, I've been fighting broken
>> >> tests and apps left and right.  Vulkan has a huge amount of area where,
>> >> if
>> >> an app does something wrong, they get undefined behavior which is up to
>> >> and
>> >> including program termination.  And basically all apps are broken in
>> >> some
>> >> way.  Fortunately, the validation layers are finally starting to catch
>> >> up to
>> >> the point where I'm noticing very few bugs that the validation layers
>> >> don't
>> >> catch and things are getting into a better state.  However, I've had
>> >> more
>> >> discussions than I can count with people where I have to explain to
>> >> them
>> >> that "No, the app is broken.  It needs to be fixed.  It's not my job to
>> >> make
>> >> it 

Re: [Mesa-dev] [PATCH] docs: document INTEL_PRECISE_TRIG envvar

2017-01-11 Thread Jordan Justen
On 2017-01-11 09:53:15, Juan A. Suarez Romero wrote:
> ---
>  docs/envvars.html | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/docs/envvars.html b/docs/envvars.html
> index 9eee8db..4f05d7f 100644
> --- a/docs/envvars.html
> +++ b/docs/envvars.html
> @@ -187,6 +187,7 @@ See the Xlib software driver 
> page for details.
> do32 - generate compute shader SIMD32 programs even if workgroup size 
> doesn't exceed the SIMD16 limit
> norbc - disable single sampled render buffer compression
>  
> +INTEL_PRECISE_TRIG - if set to 1 in gen<10, it fixes results out of 
> [-1.0, 1.0] range for a small set of values.

Since we now have the precise_trig driconf option (d9546b0c5d1a),
should we deprecate INTEL_PRECISE_TRIG?

-Jordan

>  
>  
>  
> -- 
> 2.9.3
> 


Re: [Mesa-dev] [PATCH] glsl: always do sqrt(abs()) and inversesqrt(abs())

2017-01-11 Thread Jason Ekstrand
One trivial request:  If we do land this patch, please include a link to
the mailing list archives in the commit message so that we can easily track
down this discussion if we ever need to in the future.

On Fri, Jan 6, 2017 at 1:42 AM, Samuel Pitoiset 
wrote:

> D3D always computes the absolute value while GLSL says that the
> result of inversesqrt() is undefined if x <= 0 (and undefined if
> x < 0 for sqrt()). But some apps rely on this specific behaviour
> which is not clearly defined by OpenGL.
>
> Computing the absolute value before sqrt()/inversesqrt() will
> prevent that, especially for apps which have been ported from D3D.
> Note that closed drivers seem to also use that quirk.
>
> This gets rid of the NaN values in the "Spec Ops: The Line" game
> as well as the black squares with radeonsi. Note that Nouveau is
> not affected by this bug because we already take the absolute value
> when translating from TGSI to nv50/ir.
>
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97338
>
> Signed-off-by: Samuel Pitoiset 
> ---
>  src/compiler/glsl/builtin_functions.cpp | 22 --
>  1 file changed, 20 insertions(+), 2 deletions(-)
>
> diff --git a/src/compiler/glsl/builtin_functions.cpp
> b/src/compiler/glsl/builtin_functions.cpp
> index 797af08b6c..f816f2ff7d 100644
> --- a/src/compiler/glsl/builtin_functions.cpp
> +++ b/src/compiler/glsl/builtin_functions.cpp
> @@ -3623,12 +3623,30 @@ builtin_builder::_pow(const glsl_type *type)
> return binop(always_available, ir_binop_pow, type, type, type);
>  }
>
> +ir_function_signature *
> +builtin_builder::_sqrt(builtin_available_predicate avail,
> +   const glsl_type *type)
> +{
> +   ir_variable *x = in_var(type, "x");
> +   MAKE_SIG(type, avail, 1, x);
> +   body.emit(ret(expr(ir_unop_sqrt, abs(x))));
> +   return sig;
> +}
> +
> +ir_function_signature *
> +builtin_builder::_inversesqrt(builtin_available_predicate avail,
> +  const glsl_type *type)
> +{
> +   ir_variable *x = in_var(type, "x");
> +   MAKE_SIG(type, avail, 1, x);
> +   body.emit(ret(expr(ir_unop_rsq, abs(x))));
> +   return sig;
> +}
> +
>  UNOP(exp, ir_unop_exp,  always_available)
>  UNOP(log, ir_unop_log,  always_available)
>  UNOP(exp2,ir_unop_exp2, always_available)
>  UNOP(log2,ir_unop_log2, always_available)
> -UNOPA(sqrt,ir_unop_sqrt)
> -UNOPA(inversesqrt, ir_unop_rsq)
>
>  /** @} */
>
> --
> 2.11.0
>
>

