Re: [Mesa-dev] [PATCH 1/2] radv: disallow 3D images and mipmaps/layers for R32G32B32 linear formats

2018-10-08 Thread Bas Nieuwenhuizen
On Tue, Oct 9, 2018 at 12:43 AM Jason Ekstrand wrote:
>
> On Mon, Oct 8, 2018 at 4:06 PM Bas Nieuwenhuizen wrote:
>>
>> On Mon, Oct 8, 2018 at 2:39 PM Samuel Pitoiset wrote:
>> >
>> > R32G32B32 are weird formats and we are only going to support
>> > some basic operations for now.
>> >
>> > Signed-off-by: Samuel Pitoiset 
>> > ---
>> >  src/amd/vulkan/radv_formats.c | 14 ++
>> >  1 file changed, 14 insertions(+)
>> >
>> > diff --git a/src/amd/vulkan/radv_formats.c b/src/amd/vulkan/radv_formats.c
>> > index ad06c9e996..a7aa819e2b 100644
>> > --- a/src/amd/vulkan/radv_formats.c
>> > +++ b/src/amd/vulkan/radv_formats.c
>> > @@ -1091,6 +1091,20 @@ static VkResult 
>> > radv_get_image_format_properties(struct radv_physical_device *ph
>> > sampleCounts |= VK_SAMPLE_COUNT_2_BIT | 
>> > VK_SAMPLE_COUNT_4_BIT | VK_SAMPLE_COUNT_8_BIT;
>> > }
>> >
>> > +   if (info->tiling == VK_IMAGE_TILING_LINEAR &&
>> > +   (info->format == VK_FORMAT_R32G32B32_SFLOAT ||
>> > +info->format == VK_FORMAT_R32G32B32_SINT ||
>> > +info->format == VK_FORMAT_R32G32B32_UINT)) {
>> Maybe just check if the blocksize is not a power of two?
>
>
> Probably better if you don't support 24 or 48-bit formats.

As far as I can tell this is just a further restriction; we already return
unsupported if the feature flags are 0.
>
>>
>> Either way, this patch  is
>>
>> Reviewed-by: Bas Nieuwenhuizen 
>> > +   /* R32G32B32 is a weird format and the driver currently 
>> > only
>> > +* supports the barely minimum.
>> > +* TODO: Implement more if we really need to.
>> > +*/
>> > +   if (info->type == VK_IMAGE_TYPE_3D)
>> > +   goto unsupported;
>> > +   maxArraySize = 1;
>> > +   maxMipLevels = 1;
>> > +   }
>> > +
>> > if (info->usage & VK_IMAGE_USAGE_SAMPLED_BIT) {
>> > if (!(format_feature_flags & 
>> > VK_FORMAT_FEATURE_SAMPLED_IMAGE_BIT)) {
>> > goto unsupported;
>> > --
>> > 2.19.1
>> >
>> > ___
>> > mesa-dev mailing list
>> > mesa-dev@lists.freedesktop.org
>> > https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 1/2] radv: disallow 3D images and mipmaps/layers for R32G32B32 linear formats

2018-10-08 Thread Jason Ekstrand
On Mon, Oct 8, 2018 at 4:06 PM Bas Nieuwenhuizen wrote:

> On Mon, Oct 8, 2018 at 2:39 PM Samuel Pitoiset wrote:
> >
> > R32G32B32 are weird formats and we are only going to support
> > some basic operations for now.
> >
> > Signed-off-by: Samuel Pitoiset 
> > ---
> >  src/amd/vulkan/radv_formats.c | 14 ++
> >  1 file changed, 14 insertions(+)
> >
> > diff --git a/src/amd/vulkan/radv_formats.c
> b/src/amd/vulkan/radv_formats.c
> > index ad06c9e996..a7aa819e2b 100644
> > --- a/src/amd/vulkan/radv_formats.c
> > +++ b/src/amd/vulkan/radv_formats.c
> > @@ -1091,6 +1091,20 @@ static VkResult
> radv_get_image_format_properties(struct radv_physical_device *ph
> > sampleCounts |= VK_SAMPLE_COUNT_2_BIT |
> VK_SAMPLE_COUNT_4_BIT | VK_SAMPLE_COUNT_8_BIT;
> > }
> >
> > +   if (info->tiling == VK_IMAGE_TILING_LINEAR &&
> > +   (info->format == VK_FORMAT_R32G32B32_SFLOAT ||
> > +info->format == VK_FORMAT_R32G32B32_SINT ||
> > +info->format == VK_FORMAT_R32G32B32_UINT)) {
> Maybe just check if the blocksize is not a power of two?
>

Probably better if you don't support 24 or 48-bit formats.
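
For reference, the power-of-two check suggested above could look roughly like the sketch below. The helper names are hypothetical and are not radv functions, and the block size is assumed to be given in bytes. R32G32B32 has a 12-byte block, so it fails the check; the same test would also match 24-bit (3-byte) and 48-bit (6-byte) formats.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper (not part of radv): true when the block size in
 * bytes is a nonzero power of two. */
static bool
sketch_blocksize_is_pot(uint32_t block_size_bytes)
{
   return block_size_bytes != 0 &&
          (block_size_bytes & (block_size_bytes - 1)) == 0;
}

/* Hypothetical predicate for the condition under review: a linear image
 * whose block size is not a power of two gets the restricted limits
 * (no 3D, one layer, one mip level). */
static bool
sketch_needs_restricted_linear_support(bool is_linear,
                                       uint32_t block_size_bytes)
{
   /* Assumption: every real format has a nonzero block size. */
   assert(block_size_bytes != 0);
   return is_linear && !sketch_blocksize_is_pot(block_size_bytes);
}
```

With this shape the three explicit format comparisons collapse into a single test on the block size, at the cost of catching any other non-power-of-two format as well.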


> Either way, this patch  is
>
> Reviewed-by: Bas Nieuwenhuizen 
> > +   /* R32G32B32 is a weird format and the driver currently
> only
> > +* supports the barely minimum.
> > +* TODO: Implement more if we really need to.
> > +*/
> > +   if (info->type == VK_IMAGE_TYPE_3D)
> > +   goto unsupported;
> > +   maxArraySize = 1;
> > +   maxMipLevels = 1;
> > +   }
> > +
> > if (info->usage & VK_IMAGE_USAGE_SAMPLED_BIT) {
> > if (!(format_feature_flags &
> VK_FORMAT_FEATURE_SAMPLED_IMAGE_BIT)) {
> > goto unsupported;
> > --
> > 2.19.1
> >


[Mesa-dev] [Bug 108275] Breaking out of loop creates broken code on RADV

2018-10-08 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=108275

Bas Nieuwenhuizen  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEEDINFO                    |NEW

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.


Re: [Mesa-dev] [PATCH v2 3/6] radeonsi:optimizing SET_CONTEXT_REG for shaders VS

2018-10-08 Thread Marek Olšák
I didn't see any issue when I was testing Tahiti and Hawaii on amdgpu.
radeon is too unstable for a 16-thread piglit, so I never use it.

You can try to set reg_saved = 0 in si_begin_new_gfx_cs. That should
prevent issues with CLEAR_STATE.

Marek
On Mon, Oct 8, 2018 at 12:30 PM Michel Dänzer wrote:
>
> On 2018-10-03 5:53 p.m., Sonny Jiang wrote:
> > Signed-off-by: Sonny Jiang 
>
> Unfortunately, this change causes GPU hangs with the radeon kernel
> driver on Kaveri, see the attached dmesg excerpt (this might have been
> with later patches from the series still applied, but I've had to revert
> those in addition to this one, or there are conflicts).
>
>
> --
> Earthling Michel Dänzer   |   http://www.amd.com
> Libre software enthusiast | Mesa and X developer


[Mesa-dev] [Bug 108277] Implement tiled copies for AMD -> Intel PRIME

2018-10-08 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=108277

            Bug ID: 108277
           Summary: Implement tiled copies for AMD -> Intel PRIME
           Product: Mesa
           Version: git
          Hardware: Other
                OS: All
            Status: NEW
          Severity: enhancement
          Priority: medium
         Component: Drivers/Vulkan/Common
          Assignee: mesa-dev@lists.freedesktop.org
          Reporter: ja...@jlekstrand.net
                CC: airl...@freedesktop.org, chadvers...@chromium.org,
                    dan...@fooishbar.org, ja...@jlekstrand.net

When running the display server (X11 or Wayland) on an Intel GPU and using an
AMD GPU as the primary, it wouldn't be terribly difficult to exchange X or
Y-tiled images instead of linear ones.  This would certainly improve
performance on the Intel GPU and probably somewhat on the AMD GPU since
anything is better than linear.  The idea would be to replace the
vkCmdCopyImageToBuffer with a compute pipeline that binds the PRIME image as a
buffer and does the swizzling in the shader to get X or Y-tiling.
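
To make the swizzling step concrete, here is a minimal sketch of the address computation such a shader would perform for X-tiling, written as plain C rather than as a compute shader. It assumes the classic X-tile layout of 512 bytes by 8 rows (4 KiB per tile), a pitch that is a multiple of the tile width, and no additional address-bit swizzling; the function and macro names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed X-tile geometry: 512 bytes wide, 8 rows tall, 4 KiB total. */
#define SKETCH_XTILE_WIDTH_BYTES 512u
#define SKETCH_XTILE_HEIGHT_ROWS 8u
#define SKETCH_XTILE_SIZE_BYTES \
   (SKETCH_XTILE_WIDTH_BYTES * SKETCH_XTILE_HEIGHT_ROWS)

/* Hypothetical helper: byte offset inside an X-tiled surface of the byte
 * that sits at (x_bytes, y) in the linear image. */
static size_t
sketch_xtile_offset(uint32_t x_bytes, uint32_t y, uint32_t pitch_bytes)
{
   assert(pitch_bytes % SKETCH_XTILE_WIDTH_BYTES == 0);
   assert(x_bytes < pitch_bytes);

   size_t tiles_per_row = pitch_bytes / SKETCH_XTILE_WIDTH_BYTES;
   size_t tile_x = x_bytes / SKETCH_XTILE_WIDTH_BYTES;
   size_t tile_y = y / SKETCH_XTILE_HEIGHT_ROWS;
   size_t tile_index = tile_y * tiles_per_row + tile_x;

   /* Tiles are stored one after another; rows inside a tile are linear. */
   return tile_index * SKETCH_XTILE_SIZE_BYTES +
          (y % SKETCH_XTILE_HEIGHT_ROWS) * SKETCH_XTILE_WIDTH_BYTES +
          (x_bytes % SKETCH_XTILE_WIDTH_BYTES);
}
```

A compute shader would evaluate the same expression per invocation and write each texel to the computed offset in the PRIME buffer. Y-tiling uses a different tile shape and is not covered here.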

-- 
You are receiving this mail because:
You are the assignee for the bug.


[Mesa-dev] [Bug 99553] Tracker bug for runnning OpenCL applications on Clover

2018-10-08 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=99553

Jan Vesely  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Blocks|108272                      |
         Depends on|                            |108272


Referenced Bugs:

https://bugs.freedesktop.org/show_bug.cgi?id=108272
[Bug 108272] [polaris10] opencl-mesa: Anything using OpenCL segfaults, XFX
Radeon RX 580
-- 
You are receiving this mail because:
You are the assignee for the bug.
You are the QA Contact for the bug.


Re: [Mesa-dev] [PATCH 07/12] nir: Allow [iu]mul_high on non-32-bit types

2018-10-08 Thread Jason Ekstrand
On Mon, Oct 8, 2018 at 3:46 PM Ian Romanick wrote:

> On 10/05/2018 09:10 PM, Jason Ekstrand wrote:
> > ---
> >  src/compiler/nir/nir_constant_expressions.py |  1 +
> >  src/compiler/nir/nir_opcodes.py  | 43 ++--
> >  2 files changed, 40 insertions(+), 4 deletions(-)
> >
> > diff --git a/src/compiler/nir/nir_constant_expressions.py
> b/src/compiler/nir/nir_constant_expressions.py
> > index 118af9f7818..afc0739e8b2 100644
> > --- a/src/compiler/nir/nir_constant_expressions.py
> > +++ b/src/compiler/nir/nir_constant_expressions.py
> > @@ -79,6 +79,7 @@ template = """\
> >  #include 
> >  #include "util/rounding.h" /* for _mesa_roundeven */
> >  #include "util/half_float.h"
> > +#include "util/bigmath.h"
> >  #include "nir_constant_expressions.h"
> >
> >  /**
> > diff --git a/src/compiler/nir/nir_opcodes.py
> b/src/compiler/nir/nir_opcodes.py
> > index 4ef4ecc6f22..209f0c5509b 100644
> > --- a/src/compiler/nir/nir_opcodes.py
> > +++ b/src/compiler/nir/nir_opcodes.py
> > @@ -443,12 +443,47 @@ binop("isub", tint, "", "src0 - src1")
> >  binop("fmul", tfloat, commutative + associative, "src0 * src1")
> >  # low 32-bits of signed/unsigned integer multiply
> >  binop("imul", tint, commutative + associative, "src0 * src1")
> > +
> >  # high 32-bits of signed integer multiply
> > -binop("imul_high", tint32, commutative,
> > -  "(int32_t)(((int64_t) src0 * (int64_t) src1) >> 32)")
> > +binop("imul_high", tint, commutative, """
>
> This will enable imul_high for all integer types (ditto for umul_high
> below).  A later patch adds lowering for 64-bit integer type.  Will the
> backend do the right thing for [iu]mul_high of 16- or 8-bit types?
>

That's a good question.  Looks like lower_integer_multiplication in the
back-end will do nothing whatsoever, and we'll emit an illegal opcode which
will probably hang the GPU.  For 8 and 16, it's easy enough to lower to a
couple of conversions, a N*2-bit multiply, and a shift.  It's also not
obvious where the cut-off point for the optimization is.  Certainly, it's
better in 64-bits than doing the division algorithm in the shader and I
think it's better for 32 but maybe not in 8 and 16?  I'm not sure.  I'm
pretty sure my 32-bit benchmark gave positive results (about 40-50% faster)
but it was very noisy.

I don't think anything allows 8 and 16-bit arithmetic right now.  Still,
should probably fix it...
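
As an illustration of the 8- and 16-bit lowering described above (conversions, a 2N-bit multiply, and a shift), here is a rough sketch in plain C. The function names are hypothetical and this is not the back-end lowering itself. It assumes the signed inputs arrive sign-extended and that `>>` on a negative value is an arithmetic shift, which C leaves implementation-defined.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch: high half of an N-bit unsigned multiply, N = 8 or 16. */
static uint32_t
sketch_umul_high(uint32_t src0, uint32_t src1, unsigned bit_size)
{
   assert(bit_size == 8 || bit_size == 16);
   uint32_t mask = (1u << bit_size) - 1;

   /* Convert (zero-extend), multiply at 2N bits, then shift down. */
   uint32_t wide = (src0 & mask) * (src1 & mask);
   return (wide >> bit_size) & mask;
}

/* Hypothetical sketch: high half of an N-bit signed multiply, N = 8 or 16. */
static int32_t
sketch_imul_high(int32_t src0, int32_t src1, unsigned bit_size)
{
   assert(bit_size == 8 || bit_size == 16);
   int32_t lo = -(1 << (bit_size - 1));
   int32_t hi = (1 << (bit_size - 1)) - 1;
   assert(src0 >= lo && src0 <= hi);
   assert(src1 >= lo && src1 <= hi);

   /* The product of two N-bit values always fits in 2N <= 32 bits. */
   int32_t wide = src0 * src1;
   return wide >> bit_size; /* assumes an arithmetic right shift */
}
```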

--Jason


> > +if (bit_size == 64) {
> > +   /* We need to do a full 128-bit x 128-bit multiply in order for the
> sign
> > +* extension to work properly.  The casts are kind-of annoying but
> needed
> > +* to prevent compiler warnings.
> > +*/
> > +   uint32_t src0_u32[4] = {
> > +  src0,
> > +  (int64_t)src0 >> 32,
> > +  (int64_t)src0 >> 63,
> > +  (int64_t)src0 >> 63,
> > +   };
> > +   uint32_t src1_u32[4] = {
> > +  src1,
> > +  (int64_t)src1 >> 32,
> > +  (int64_t)src1 >> 63,
> > +  (int64_t)src1 >> 63,
> > +   };
> > +   uint32_t prod_u32[4];
> > +   ubm_mul_u32arr(prod_u32, src0_u32, src1_u32);
> > +   dst = (uint64_t)prod_u32[2] | ((uint64_t)prod_u32[3] << 32);
> > +} else {
> > +   dst = ((int64_t)src0 * (int64_t)src1) >> bit_size;
> > +}
> > +""")
> > +
> >  # high 32-bits of unsigned integer multiply
> > -binop("umul_high", tuint32, commutative,
> > -  "(uint32_t)(((uint64_t) src0 * (uint64_t) src1) >> 32)")
> > +binop("umul_high", tuint, commutative, """
> > +if (bit_size == 64) {
> > +   /* The casts are kind-of annoying but needed to prevent compiler
> warnings. */
> > +   uint32_t src0_u32[2] = { src0, (uint64_t)src0 >> 32 };
> > +   uint32_t src1_u32[2] = { src1, (uint64_t)src1 >> 32 };
> > +   uint32_t prod_u32[4];
> > +   ubm_mul_u32arr(prod_u32, src0_u32, src1_u32);
> > +   dst = (uint64_t)prod_u32[2] | ((uint64_t)prod_u32[3] << 32);
> > +} else {
> > +   dst = ((uint64_t)src0 * (uint64_t)src1) >> bit_size;
> > +}
> > +""")
> >
> >  binop("fdiv", tfloat, "", "src0 / src1")
> >  binop("idiv", tint, "", "src1 == 0 ? 0 : (src0 / src1)")
>


[Mesa-dev] [Bug 99553] Tracker bug for runnning OpenCL applications on Clover

2018-10-08 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=99553

Jan Vesely  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Blocks|                            |108272


Referenced Bugs:

https://bugs.freedesktop.org/show_bug.cgi?id=108272
[Bug 108272] [polaris10] opencl-mesa: Anything using OpenCL segfaults, XFX
Radeon RX 580
-- 
You are receiving this mail because:
You are the assignee for the bug.
You are the QA Contact for the bug.


[Mesa-dev] [Bug 108275] Breaking out of loop creates broken code on RADV

2018-10-08 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=108275

--- Comment #3 from mais...@archlinux.us ---
Seems to work just fine on Intel (Anvil).

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.


Re: [Mesa-dev] [PATCH 1/2] radv: disallow 3D images and mipmaps/layers for R32G32B32 linear formats

2018-10-08 Thread Bas Nieuwenhuizen
On Mon, Oct 8, 2018 at 2:39 PM Samuel Pitoiset wrote:
>
> R32G32B32 are weird formats and we are only going to support
> some basic operations for now.
>
> Signed-off-by: Samuel Pitoiset 
> ---
>  src/amd/vulkan/radv_formats.c | 14 ++
>  1 file changed, 14 insertions(+)
>
> diff --git a/src/amd/vulkan/radv_formats.c b/src/amd/vulkan/radv_formats.c
> index ad06c9e996..a7aa819e2b 100644
> --- a/src/amd/vulkan/radv_formats.c
> +++ b/src/amd/vulkan/radv_formats.c
> @@ -1091,6 +1091,20 @@ static VkResult 
> radv_get_image_format_properties(struct radv_physical_device *ph
> sampleCounts |= VK_SAMPLE_COUNT_2_BIT | VK_SAMPLE_COUNT_4_BIT 
> | VK_SAMPLE_COUNT_8_BIT;
> }
>
> +   if (info->tiling == VK_IMAGE_TILING_LINEAR &&
> +   (info->format == VK_FORMAT_R32G32B32_SFLOAT ||
> +info->format == VK_FORMAT_R32G32B32_SINT ||
> +info->format == VK_FORMAT_R32G32B32_UINT)) {
Maybe just check if the blocksize is not a power of two?

Either way, this patch  is

Reviewed-by: Bas Nieuwenhuizen 
> +   /* R32G32B32 is a weird format and the driver currently only
> +* supports the barely minimum.
> +* TODO: Implement more if we really need to.
> +*/
> +   if (info->type == VK_IMAGE_TYPE_3D)
> +   goto unsupported;
> +   maxArraySize = 1;
> +   maxMipLevels = 1;
> +   }
> +
> if (info->usage & VK_IMAGE_USAGE_SAMPLED_BIT) {
> if (!(format_feature_flags & 
> VK_FORMAT_FEATURE_SAMPLED_IMAGE_BIT)) {
> goto unsupported;
> --
> 2.19.1
>


Re: [Mesa-dev] [PATCH 6/9] nir: Add nir_const_value_negative_equal

2018-10-08 Thread Thomas Helland
I really like this one; it's very readable =)

Reviewed-by: Thomas Helland

On Thu, Aug 30, 2018 at 07:37, Ian Romanick wrote:
>
> From: Ian Romanick 
>
> Signed-off-by: Ian Romanick 
> ---
>  src/compiler/nir/meson.build|  12 +
>  src/compiler/nir/nir.h  |   6 +
>  src/compiler/nir/nir_instr_set.c|  98 +
>  src/compiler/nir/tests/negative_equal_tests.cpp | 278 
> 
>  4 files changed, 394 insertions(+)
>  create mode 100644 src/compiler/nir/tests/negative_equal_tests.cpp
>
> diff --git a/src/compiler/nir/meson.build b/src/compiler/nir/meson.build
> index 090aa7a628f..5438c17a8f8 100644
> --- a/src/compiler/nir/meson.build
> +++ b/src/compiler/nir/meson.build
> @@ -245,4 +245,16 @@ if with_tests
>link_with : libmesa_util,
>  )
>)
> +
> +  test(
> +'negative_equal',
> +executable(
> +  'negative_equal',
> +  files('tests/negative_equal_tests.cpp'),
> +  c_args : [c_vis_args, c_msvc_compat_args, no_override_init_args],
> +  include_directories : [inc_common],
> +  dependencies : [dep_thread, idep_gtest, idep_nir],
> +  link_with : libmesa_util,
> +)
> +  )
>  endif
> diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h
> index 9bca6d487e9..f94538e0782 100644
> --- a/src/compiler/nir/nir.h
> +++ b/src/compiler/nir/nir.h
> @@ -955,6 +955,12 @@ nir_ssa_alu_instr_src_components(const nir_alu_instr 
> *instr, unsigned src)
> return instr->dest.dest.ssa.num_components;
>  }
>
> +bool nir_const_value_negative_equal(const nir_const_value *c1,
> +const nir_const_value *c2,
> +unsigned components,
> +nir_alu_type base_type,
> +unsigned bits);
> +
>  bool nir_alu_srcs_equal(const nir_alu_instr *alu1, const nir_alu_instr *alu2,
>  unsigned src1, unsigned src2);
>
> diff --git a/src/compiler/nir/nir_instr_set.c 
> b/src/compiler/nir/nir_instr_set.c
> index 19771fcd9dd..009d9661e60 100644
> --- a/src/compiler/nir/nir_instr_set.c
> +++ b/src/compiler/nir/nir_instr_set.c
> @@ -23,6 +23,7 @@
>
>  #include "nir_instr_set.h"
>  #include "nir_vla.h"
> +#include "util/half_float.h"
>
>  #define HASH(hash, data) _mesa_fnv32_1a_accumulate((hash), (data))
>
> @@ -261,6 +262,103 @@ nir_srcs_equal(nir_src src1, nir_src src2)
> }
>  }
>
> +bool
> +nir_const_value_negative_equal(const nir_const_value *c1,
> +   const nir_const_value *c2,
> +   unsigned components,
> +   nir_alu_type base_type,
> +   unsigned bits)
> +{
> +   assert(base_type == nir_alu_type_get_base_type(base_type));
> +   assert(base_type != nir_type_invalid);
> +
> +   switch (base_type) {
> +   case nir_type_float:
> +  switch (bits) {
> +  case 16:
> + for (unsigned i = 0; i < components; i++) {
> +if (_mesa_half_to_float(c1->u16[i]) !=
> +-_mesa_half_to_float(c2->u16[i])) {
> +   return false;
> +}
> + }
> +
> + return true;
> +
> +  case 32:
> + for (unsigned i = 0; i < components; i++) {
> +if (c1->f32[i] != -c2->f32[i])
> +   return false;
> + }
> +
> + return true;
> +
> +  case 64:
> + for (unsigned i = 0; i < components; i++) {
> +if (c1->f64[i] != -c2->f64[i])
> +   return false;
> + }
> +
> + return true;
> +
> +  default:
> + unreachable("unknown bit size");
> +  }
> +
> +  break;
> +
> +   case nir_type_int:
> +   case nir_type_uint:
> +  switch (bits) {
> +  case 8:
> + for (unsigned i = 0; i < components; i++) {
> +if (c1->i8[i] != -c2->i8[i])
> +   return false;
> + }
> +
> + return true;
> +
> +  case 16:
> + for (unsigned i = 0; i < components; i++) {
> +if (c1->i16[i] != -c2->i16[i])
> +   return false;
> + }
> +
> + return true;
> + break;
> +
> +  case 32:
> + for (unsigned i = 0; i < components; i++) {
> +if (c1->i32[i] != -c2->i32[i])
> +   return false;
> + }
> +
> + return true;
> +
> +  case 64:
> + for (unsigned i = 0; i < components; i++) {
> +if (c1->i64[i] != -c2->i64[i])
> +   return false;
> + }
> +
> + return true;
> +
> +  default:
> + unreachable("unknown bit size");
> +  }
> +
> +  break;
> +
> +   case nir_type_bool:
> +  return false;
> +
> +   default:
> +  break;
> +   }
> +
> +   return false;
> +}
> +
>  bool
>  nir_alu_srcs_equal(const nir_alu_instr *alu1, const nir_alu_instr *alu2,
> unsigned src1, unsigned src2)
> diff 

Re: [Mesa-dev] [PATCH] i965: consider a 'base level' when calculating width0, height0, depth0

2018-10-08 Thread Rafael Antognolli
On Tue, Oct 02, 2018 at 07:16:01PM +0300, asimiklit.w...@gmail.com wrote:
> From: Andrii Simiklit 
> 
> I guess that when we calculating the width0, height0, depth0
> to use for function 'intel_miptree_create' we need to consider
> the 'base level' like it is done in the 'intel_miptree_create_for_teximage'
> function.

Hi Andrii, this makes sense to me. I'm also not familiar with this code,
so I'm not sure this is the right way to solve the issue, but at least
it's a way.
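
To spell out the arithmetic behind the patch: the dimensions queried from the first image belong to the base level, so the level-0 size is recovered by shifting left by that level. A minimal sketch, using hypothetical helper names and assuming a 32-bit `unsigned`:

```c
#include <assert.h>

/* Hypothetical helper: reconstruct a level-0 dimension from the dimension
 * of the image at `level`, as the patch does with `dim << level`. */
static unsigned
sketch_level0_dim(unsigned dim_at_level, unsigned level)
{
   assert(level < 32); /* assumes a 32-bit unsigned */
   return dim_at_level << level;
}

/* Hypothetical helper: the usual mipmap minification, max(1, dim >> level). */
static unsigned
sketch_minify(unsigned level0_dim, unsigned level)
{
   assert(level < 32); /* assumes a 32-bit unsigned */
   unsigned d = level0_dim >> level;
   return d > 0 ? d : 1;
}
```

For example, a 64x32 image at base level 2 implies a 256x128 level 0. Note that minification rounds down, so the shift is exact only when no bits were lost: a level-0 width of 5 gives 2 at level 1, and shifting back yields 4.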

You added a simple test case in the bug, do you think you could make
that a piglit test?

Thanks,
Rafael

> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107987
> Signed-off-by: Andrii Simiklit 
> ---
>  .../drivers/dri/i965/intel_tex_validate.c | 26 ++-
>  1 file changed, 25 insertions(+), 1 deletion(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/intel_tex_validate.c 
> b/src/mesa/drivers/dri/i965/intel_tex_validate.c
> index 72ce83c7ce..37aa8f43ec 100644
> --- a/src/mesa/drivers/dri/i965/intel_tex_validate.c
> +++ b/src/mesa/drivers/dri/i965/intel_tex_validate.c
> @@ -119,8 +119,32 @@ intel_finalize_mipmap_tree(struct brw_context *brw,
> /* May need to create a new tree:
>  */
> if (!intelObj->mt) {
> +  const unsigned level = firstImage->base.Base.Level;
>intel_get_image_dims(&firstImage->base.Base, &width, &height, &depth);
> -
> +  /* Figure out image dimensions at start level. */
> +  switch(intelObj->base.Target) {
> +  case GL_TEXTURE_2D_MULTISAMPLE:
> +  case GL_TEXTURE_2D_MULTISAMPLE_ARRAY:
> +  case GL_TEXTURE_RECTANGLE:
> +  case GL_TEXTURE_EXTERNAL_OES:
> +  assert(level == 0);
> +  break;
> +  case GL_TEXTURE_3D:
> +  depth = depth << level;
> +  /* Fall through */
> +  case GL_TEXTURE_2D:
> +  case GL_TEXTURE_2D_ARRAY:
> +  case GL_TEXTURE_CUBE_MAP:
> +  case GL_TEXTURE_CUBE_MAP_ARRAY:
> +  height = height << level;
> +  /* Fall through */
> +  case GL_TEXTURE_1D:
> +  case GL_TEXTURE_1D_ARRAY:
> +  width = width << level;
> +  break;
> +  default:
> +  unreachable("Unexpected target");
> +  }
>perf_debug("Creating new %s %dx%dx%d %d-level miptree to handle "
>   "finalized texture miptree.\n",
>   _mesa_get_format_name(firstImage->base.Base.TexFormat),
> -- 
> 2.17.1
> 


Re: [Mesa-dev] [PATCH 07/12] nir: Allow [iu]mul_high on non-32-bit types

2018-10-08 Thread Ian Romanick
On 10/05/2018 09:10 PM, Jason Ekstrand wrote:
> ---
>  src/compiler/nir/nir_constant_expressions.py |  1 +
>  src/compiler/nir/nir_opcodes.py  | 43 ++--
>  2 files changed, 40 insertions(+), 4 deletions(-)
> 
> diff --git a/src/compiler/nir/nir_constant_expressions.py 
> b/src/compiler/nir/nir_constant_expressions.py
> index 118af9f7818..afc0739e8b2 100644
> --- a/src/compiler/nir/nir_constant_expressions.py
> +++ b/src/compiler/nir/nir_constant_expressions.py
> @@ -79,6 +79,7 @@ template = """\
>  #include 
>  #include "util/rounding.h" /* for _mesa_roundeven */
>  #include "util/half_float.h"
> +#include "util/bigmath.h"
>  #include "nir_constant_expressions.h"
>  
>  /**
> diff --git a/src/compiler/nir/nir_opcodes.py b/src/compiler/nir/nir_opcodes.py
> index 4ef4ecc6f22..209f0c5509b 100644
> --- a/src/compiler/nir/nir_opcodes.py
> +++ b/src/compiler/nir/nir_opcodes.py
> @@ -443,12 +443,47 @@ binop("isub", tint, "", "src0 - src1")
>  binop("fmul", tfloat, commutative + associative, "src0 * src1")
>  # low 32-bits of signed/unsigned integer multiply
>  binop("imul", tint, commutative + associative, "src0 * src1")
> +
>  # high 32-bits of signed integer multiply
> -binop("imul_high", tint32, commutative,
> -  "(int32_t)(((int64_t) src0 * (int64_t) src1) >> 32)")
> +binop("imul_high", tint, commutative, """

This will enable imul_high for all integer types (ditto for umul_high
below).  A later patch adds lowering for 64-bit integer type.  Will the
backend do the right thing for [iu]mul_high of 16- or 8-bit types?
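
As a reading aid for the 64-bit path quoted below: the patch routes it through ubm_mul_u32arr on 32-bit limbs. The sketch that follows is not that helper; it is a hand-written high multiply built from four 32x32 partial products, with hypothetical function names. The signed variant uses the standard correction of the unsigned result and assumes the usual two's-complement conversion back to int64_t.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch: upper 64 bits of an unsigned 64x64-bit multiply. */
static uint64_t
sketch_umul_high64(uint64_t a, uint64_t b)
{
   uint64_t a_lo = (uint32_t)a, a_hi = a >> 32;
   uint64_t b_lo = (uint32_t)b, b_hi = b >> 32;

   uint64_t lo_lo = a_lo * b_lo;
   uint64_t hi_lo = a_hi * b_lo;
   uint64_t lo_hi = a_lo * b_hi;
   uint64_t hi_hi = a_hi * b_hi;

   /* Sum of everything that lands in bits 32..63, plus its carries.
    * The largest possible value is 2^64 - 1, so this cannot wrap. */
   uint64_t cross = (lo_lo >> 32) + (uint32_t)hi_lo + lo_hi;
   assert(cross >= lo_hi);

   return hi_hi + (hi_lo >> 32) + (cross >> 32);
}

/* Hypothetical sketch: upper 64 bits of a signed 64x64-bit multiply. */
static int64_t
sketch_imul_high64(int64_t a, int64_t b)
{
   uint64_t high = sketch_umul_high64((uint64_t)a, (uint64_t)b);

   /* Reading a negative operand as unsigned adds 2^64 to it, which adds
    * the other operand to the high half; subtract that back out. */
   if (a < 0)
      high -= (uint64_t)b;
   if (b < 0)
      high -= (uint64_t)a;
   return (int64_t)high;
}
```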

> +if (bit_size == 64) {
> +   /* We need to do a full 128-bit x 128-bit multiply in order for the sign
> +* extension to work properly.  The casts are kind-of annoying but needed
> +* to prevent compiler warnings.
> +*/
> +   uint32_t src0_u32[4] = {
> +  src0,
> +  (int64_t)src0 >> 32,
> +  (int64_t)src0 >> 63,
> +  (int64_t)src0 >> 63,
> +   };
> +   uint32_t src1_u32[4] = {
> +  src1,
> +  (int64_t)src1 >> 32,
> +  (int64_t)src1 >> 63,
> +  (int64_t)src1 >> 63,
> +   };
> +   uint32_t prod_u32[4];
> +   ubm_mul_u32arr(prod_u32, src0_u32, src1_u32);
> +   dst = (uint64_t)prod_u32[2] | ((uint64_t)prod_u32[3] << 32);
> +} else {
> +   dst = ((int64_t)src0 * (int64_t)src1) >> bit_size;
> +}
> +""")
> +
>  # high 32-bits of unsigned integer multiply
> -binop("umul_high", tuint32, commutative,
> -  "(uint32_t)(((uint64_t) src0 * (uint64_t) src1) >> 32)")
> +binop("umul_high", tuint, commutative, """
> +if (bit_size == 64) {
> +   /* The casts are kind-of annoying but needed to prevent compiler 
> warnings. */
> +   uint32_t src0_u32[2] = { src0, (uint64_t)src0 >> 32 };
> +   uint32_t src1_u32[2] = { src1, (uint64_t)src1 >> 32 };
> +   uint32_t prod_u32[4];
> +   ubm_mul_u32arr(prod_u32, src0_u32, src1_u32);
> +   dst = (uint64_t)prod_u32[2] | ((uint64_t)prod_u32[3] << 32);
> +} else {
> +   dst = ((uint64_t)src0 * (uint64_t)src1) >> bit_size;
> +}
> +""")
>  
>  binop("fdiv", tfloat, "", "src0 / src1")
>  binop("idiv", tint, "", "src1 == 0 ? 0 : (src0 / src1)")


Re: [Mesa-dev] [PATCH 3/9] nir/opt_peephole_select: Don't peephole_select expensive math instructions

2018-10-08 Thread Thomas Helland
On Thu, Aug 30, 2018 at 07:37, Ian Romanick wrote:
>
> From: Ian Romanick 
>
> On some GPUs, especially older Intel GPUs, some math instructions are
> very expensive.  On those architectures, don't reduce flow control to a
> csel if one of the branches contains one of these expensive math
> instructions.
>
> This prevents a bunch of cycle count regressions on pre-Gen6 platforms
> with a later patch (intel/compiler: More peephole select for pre-Gen6).
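
The gating described in the commit message can be pictured with the simplified sketch below. The opcode enum and function names are hypothetical stand-ins rather than the real nir_op values or the pass's internals, and the instruction limit is reduced to a plain count check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical opcode list for illustration; not the real nir_op enum. */
enum sketch_op {
   SKETCH_OP_MOV,
   SKETCH_OP_FADD,
   SKETCH_OP_FMUL,
   SKETCH_OP_FDIV,
   SKETCH_OP_FSIN,
};

/* Division and transcendentals stand in for the "expensive" group. */
static bool
sketch_op_is_expensive(enum sketch_op op)
{
   switch (op) {
   case SKETCH_OP_FDIV:
   case SKETCH_OP_FSIN:
      return true;
   default:
      return false;
   }
}

/* A branch may be flattened into a select only if it is short enough and,
 * when the driver says expensive ALU is not OK, contains none of it. */
static bool
sketch_block_can_flatten(const enum sketch_op *ops, size_t count,
                         size_t limit, bool expensive_alu_ok)
{
   assert(ops != NULL || count == 0);

   if (count > limit)
      return false;

   for (size_t i = 0; i < count; i++) {
      if (sketch_op_is_expensive(ops[i]) && !expensive_alu_ok)
         return false;
   }
   return true;
}
```

The point of the extra flag is that a select executes both sides unconditionally, so flattening a branch that hides an expensive instruction makes every invocation pay for it.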
>
> Signed-off-by: Ian Romanick 
> ---
>  src/amd/vulkan/radv_shader.c |  2 +-
>  src/broadcom/compiler/nir_to_vir.c   |  2 +-
>  src/compiler/nir/nir.h   |  2 +-
>  src/compiler/nir/nir_opt_peephole_select.c   | 46 
> +++-
>  src/gallium/drivers/freedreno/ir3/ir3_nir.c  |  2 +-
>  src/gallium/drivers/radeonsi/si_shader_nir.c |  2 +-
>  src/gallium/drivers/vc4/vc4_program.c|  2 +-
>  src/intel/compiler/brw_nir.c |  4 +--
>  src/mesa/state_tracker/st_glsl_to_nir.cpp|  2 +-
>  9 files changed, 47 insertions(+), 17 deletions(-)
>
> diff --git a/src/amd/vulkan/radv_shader.c b/src/amd/vulkan/radv_shader.c
> index 632512db09b..c8d502a9e3a 100644
> --- a/src/amd/vulkan/radv_shader.c
> +++ b/src/amd/vulkan/radv_shader.c
> @@ -143,7 +143,7 @@ radv_optimize_nir(struct nir_shader *shader, bool 
> optimize_conservatively)
>  NIR_PASS(progress, shader, nir_opt_if);
>  NIR_PASS(progress, shader, nir_opt_dead_cf);
>  NIR_PASS(progress, shader, nir_opt_cse);
> -NIR_PASS(progress, shader, nir_opt_peephole_select, 8, true);
> +NIR_PASS(progress, shader, nir_opt_peephole_select, 8, true, 
> true);
>  NIR_PASS(progress, shader, nir_opt_algebraic);
>  NIR_PASS(progress, shader, nir_opt_constant_folding);
>  NIR_PASS(progress, shader, nir_opt_undef);
> diff --git a/src/broadcom/compiler/nir_to_vir.c 
> b/src/broadcom/compiler/nir_to_vir.c
> index 0d23cea4d5b..ec0ff4b907a 100644
> --- a/src/broadcom/compiler/nir_to_vir.c
> +++ b/src/broadcom/compiler/nir_to_vir.c
> @@ -1210,7 +1210,7 @@ v3d_optimize_nir(struct nir_shader *s)
>  NIR_PASS(progress, s, nir_opt_dce);
>  NIR_PASS(progress, s, nir_opt_dead_cf);
>  NIR_PASS(progress, s, nir_opt_cse);
> -NIR_PASS(progress, s, nir_opt_peephole_select, 8, true);
> +NIR_PASS(progress, s, nir_opt_peephole_select, 8, true, 
> true);
>  NIR_PASS(progress, s, nir_opt_algebraic);
>  NIR_PASS(progress, s, nir_opt_constant_folding);
>  NIR_PASS(progress, s, nir_opt_undef);
> diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h
> index 67fa46d5557..feb69be6b59 100644
> --- a/src/compiler/nir/nir.h
> +++ b/src/compiler/nir/nir.h
> @@ -3003,7 +3003,7 @@ bool nir_opt_move_comparisons(nir_shader *shader);
>  bool nir_opt_move_load_ubo(nir_shader *shader);
>
>  bool nir_opt_peephole_select(nir_shader *shader, unsigned limit,
> - bool indirect_load_ok);
> + bool indirect_load_ok, bool expensive_alu_ok);
>
>  bool nir_opt_remove_phis_impl(nir_function_impl *impl);
>  bool nir_opt_remove_phis(nir_shader *shader);
> diff --git a/src/compiler/nir/nir_opt_peephole_select.c 
> b/src/compiler/nir/nir_opt_peephole_select.c
> index 6808d3eda6c..09b55f3739e 100644
> --- a/src/compiler/nir/nir_opt_peephole_select.c
> +++ b/src/compiler/nir/nir_opt_peephole_select.c
> @@ -59,7 +59,8 @@
>
>  static bool
>  block_check_for_allowed_instrs(nir_block *block, unsigned *count,
> -   bool alu_ok, bool indirect_load_ok)
> +   bool alu_ok, bool indirect_load_ok,
> +   bool expensive_alu_ok)
>  {
> nir_foreach_instr(instr, block) {
>switch (instr->type) {
> @@ -117,6 +118,25 @@ block_check_for_allowed_instrs(nir_block *block, 
> unsigned *count,
>   case nir_op_vec3:
>   case nir_op_vec4:
>  break;
> +
> + case nir_op_fcos:
> + case nir_op_fdiv:
> + case nir_op_fexp2:
> + case nir_op_flog2:
> + case nir_op_fmod:
> + case nir_op_fpow:
> + case nir_op_frcp:
> + case nir_op_frem:
> + case nir_op_frsq:
> + case nir_op_fsin:
> + case nir_op_idiv:
> + case nir_op_irem:
> + case nir_op_udiv:
> +if (!alu_ok || !expensive_alu_ok)
> +   return false;
> +
> +break;
> +
>   default:
>  if (!alu_ok) {
> /* It must be a move-like operation. */
> @@ -160,7 +180,8 @@ block_check_for_allowed_instrs(nir_block *block, unsigned 
> *count,
>
>  static bool
>  nir_opt_peephole_select_block(nir_block *block, nir_shader *shader,
> -  unsigned limit, bool 

Re: [Mesa-dev] [PATCH 06/12] util: Add tests for fast integer division by constants

2018-10-08 Thread Dylan Baker
Quoting Jason Ekstrand (2018-10-05 21:10:14)
> While I generally trust rediculousfish to have done his homework, we've
> made some adjustments to suite the needs of mesa and it'd be good to
                           ^
                           suit

> test those.  Also, there's no better place than unit tests to clearly
> document the different edge cases of the different methods.
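
For readers unfamiliar with the technique under test, here is a minimal sketch of unsigned division by a constant using a multiply and a shift. It is not the helper from this series; the function names are hypothetical, and the magic numbers are the well-known ones for dividing a 32-bit value by 3 and by 5. Divisors such as 7 need a wider multiplier or an extra add step, which this sketch does not cover.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: n / d computed as (n * magic) >> shift, where
 * magic and shift were precomputed for the constant divisor d. */
static uint32_t
sketch_udiv_magic(uint32_t n, uint32_t magic, unsigned shift)
{
   assert(shift < 64);
   /* A 32x32-bit product always fits in 64 bits. */
   return (uint32_t)(((uint64_t)n * magic) >> shift);
}

/* n / 3 for any 32-bit n: magic = (2^33 + 1) / 3 = 0xAAAAAAAB, shift = 33. */
static uint32_t
sketch_udiv3(uint32_t n)
{
   return sketch_udiv_magic(n, 0xAAAAAAABu, 33);
}

/* n / 5 for any 32-bit n: magic = (2^34 + 1) / 5 = 0xCCCCCCCD, shift = 34. */
static uint32_t
sketch_udiv5(uint32_t n)
{
   return sketch_udiv_magic(n, 0xCCCCCCCDu, 34);
}
```

Unit tests for this kind of code typically compare against the plain `/` operator at the extremes (0, 1, values around multiples of the divisor, and UINT32_MAX), since that is where an off-by-one in the magic number shows up.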
> ---
>  configure.ac  |   1 +
>  src/util/Makefile.am  |   3 +-
>  src/util/meson.build  |   1 +
>  src/util/tests/fast_idiv_by_const/Makefile.am |  43 ++
>  .../fast_idiv_by_const_test.cpp   | 472 ++
>  src/util/tests/fast_idiv_by_const/meson.build |  30 ++
>  6 files changed, 549 insertions(+), 1 deletion(-)
>  create mode 100644 src/util/tests/fast_idiv_by_const/Makefile.am
>  create mode 100644 
> src/util/tests/fast_idiv_by_const/fast_idiv_by_const_test.cpp
>  create mode 100644 src/util/tests/fast_idiv_by_const/meson.build
> 
> diff --git a/configure.ac b/configure.ac
> index 34689826c98..7b0b2b20ba2 100644
> --- a/configure.ac
> +++ b/configure.ac
> @@ -3198,6 +3198,7 @@ AC_CONFIG_FILES([Makefile
>   src/util/tests/hash_table/Makefile
>   src/util/tests/set/Makefile
>   src/util/tests/string_buffer/Makefile
> + src/util/tests/uint_inverse/Makefile
>   src/util/tests/vma/Makefile
>   src/util/xmlpool/Makefile
>   src/vulkan/Makefile])
> diff --git a/src/util/Makefile.am b/src/util/Makefile.am
> index d79f2b320be..9e633bf65d5 100644
> --- a/src/util/Makefile.am
> +++ b/src/util/Makefile.am
> @@ -24,7 +24,8 @@ SUBDIRS = . \
> tests/fast_idiv_by_const \
> tests/hash_table \
> tests/string_buffer \
> -   tests/set
> +   tests/set \
> +   tests/uint_inverse
>  
>  if HAVE_STD_CXX11
>  SUBDIRS += tests/vma
> diff --git a/src/util/meson.build b/src/util/meson.build
> index cdbad98e7cb..49d84c16ebe 100644
> --- a/src/util/meson.build
> +++ b/src/util/meson.build
> @@ -170,6 +170,7 @@ if with_tests
>  )
>)
>  
> +  subdir('tests/fast_idiv_by_const')
>subdir('tests/hash_table')
>subdir('tests/string_buffer')
>subdir('tests/vma')
> diff --git a/src/util/tests/fast_idiv_by_const/Makefile.am 
> b/src/util/tests/fast_idiv_by_const/Makefile.am
> new file mode 100644
> index 000..1ebee09f59b
> --- /dev/null
> +++ b/src/util/tests/fast_idiv_by_const/Makefile.am
> @@ -0,0 +1,43 @@
> +# Copyright © 2018 Intel
> +#
> +#  Permission is hereby granted, free of charge, to any person obtaining a
> +#  copy of this software and associated documentation files (the "Software"),
> +#  to deal in the Software without restriction, including without limitation
> +#  the rights to use, copy, modify, merge, publish, distribute, sublicense,
> +#  and/or sell copies of the Software, and to permit persons to whom the
> +#  Software is furnished to do so, subject to the following conditions:
> +#
> +#  The above copyright notice and this permission notice (including the next
> +#  paragraph) shall be included in all copies or substantial portions of the
> +#  Software.
> +#
> +#  THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> +#  IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> +#  FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> +#  THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> +#  LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> +#  FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER 
> DEALINGS
> +#  IN THE SOFTWARE.
> +
> +AM_CPPFLAGS = \
> +   -I$(top_srcdir)/src \
> +   -I$(top_srcdir)/include \
> +   -I$(top_srcdir)/src/gallium/include \
> +   -I$(top_srcdir)/src/gtest/include \
> +   $(PTHREAD_CFLAGS) \
> +   $(DEFINES)
> +
> +TESTS = fast_idiv_by_const_test
> +
> +check_PROGRAMS = $(TESTS)
> +
> +fast_idiv_by_const_test_SOURCES = \
> +   fast_idiv_by_const_test.cpp
> +
> +fast_idiv_by_const_test_LDADD = \
> +   $(top_builddir)/src/gtest/libgtest.la \
> +   $(top_builddir)/src/util/libmesautil.la \
> +   $(PTHREAD_LIBS) \
> +   $(DLOPEN_LIBS)
> +
> +EXTRA_DIST = meson.build
> diff --git a/src/util/tests/fast_idiv_by_const/fast_idiv_by_const_test.cpp 
> b/src/util/tests/fast_idiv_by_const/fast_idiv_by_const_test.cpp
> new file mode 100644
> index 00000000000..34b149e1c6f
> --- /dev/null
> +++ b/src/util/tests/fast_idiv_by_const/fast_idiv_by_const_test.cpp
> @@ -0,0 +1,472 @@
> +/*
> + * Copyright © 2018 Intel Corporation
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the 

Re: [Mesa-dev] [PATCH 04/11] nir: Add tests for dead write elimination

2018-10-08 Thread Jason Ekstrand
On Fri, Sep 14, 2018 at 10:46 PM Caio Marcelo de Oliveira Filho <
caio.olive...@intel.com> wrote:

> Note at the moment the pass called is nir_opt_copy_prop_vars, because
> dead write elimination is implemented there.
>
> Also added tests that involve identifying dead writes in multiple
> blocks (e.g. the overwrite happens in another block).  Those currently
> fail as expected, so are marked to be skipped.
> ---
>  src/compiler/nir/tests/vars_tests.cpp | 241 ++
>  1 file changed, 241 insertions(+)
>
> diff --git a/src/compiler/nir/tests/vars_tests.cpp
> b/src/compiler/nir/tests/vars_tests.cpp
> index 7fbdb514349..dd913f04429 100644
> --- a/src/compiler/nir/tests/vars_tests.cpp
> +++ b/src/compiler/nir/tests/vars_tests.cpp
> @@ -26,6 +26,9 @@
>  #include "nir.h"
>  #include "nir_builder.h"
>
> +/* This optimization is done together with copy propagation. */
> +#define nir_opt_dead_write_vars nir_opt_copy_prop_vars
> +
>  namespace {
>
>  class nir_vars_test : public ::testing::Test {
> @@ -141,6 +144,7 @@ nir_imm_ivec2(nir_builder *build, int x, int y)
>
>  /* Allow grouping the tests while still sharing the helpers. */
>  class nir_copy_prop_vars_test : public nir_vars_test {};
> +class nir_dead_write_vars_test : public nir_vars_test {};
>
>  } // namespace
>
> @@ -197,3 +201,240 @@ TEST_F(nir_copy_prop_vars_test, simple_store_load)
>EXPECT_EQ(store->src[1].ssa, stored_value);
> }
>  }
> +
> +TEST_F(nir_dead_write_vars_test, no_dead_writes_in_block)
> +{
> +   nir_variable **v = create_many_int(nir_var_shader_storage, "v", 2);
>

Using inputs and outputs is probably a tad bit safer than shader_storage
but I don't think it matters too much.  This is probably fine.


> +
> +   nir_store_var(b, v[0], nir_load_var(b, v[1]), 1);
> +
> +   bool progress = nir_opt_dead_write_vars(b->shader);
> +   ASSERT_FALSE(progress);
> +}
> +
> +TEST_F(nir_dead_write_vars_test,
> no_dead_writes_different_components_in_block)
> +{
> +   nir_variable **v = create_many_ivec2(nir_var_shader_storage, "v", 3);
> +
> +   nir_store_var(b, v[0], nir_load_var(b, v[1]), 1 << 0);
> +   nir_store_var(b, v[0], nir_load_var(b, v[2]), 1 << 1);
> +
> +   bool progress = nir_opt_dead_write_vars(b->shader);
> +   ASSERT_FALSE(progress);
> +}
> +
> +TEST_F(nir_dead_write_vars_test, no_dead_writes_in_if_statement)
> +{
> +   nir_variable **v = create_many_int(nir_var_shader_storage, "v", 6);
> +
> +   nir_store_var(b, v[2], nir_load_var(b, v[0]), 1);
> +   nir_store_var(b, v[3], nir_load_var(b, v[1]), 1);
> +
> +   /* Each arm of the if statement will overwrite one store. */
> +   nir_if *if_stmt = nir_push_if(b, nir_imm_int(b, 0));
>

Maybe nir_push_if(b, nir_load_var(b, v[0])); so that it's not an if with
one dead side.  I doubt this pass will ever get that smart but it's easy
enough to just prevent the possibility.


> +   nir_store_var(b, v[2], nir_load_var(b, v[4]), 1);
> +
> +   nir_push_else(b, if_stmt);
> +   nir_store_var(b, v[3], nir_load_var(b, v[5]), 1);
> +
> +   nir_pop_if(b, if_stmt);
> +
> +   bool progress = nir_opt_dead_write_vars(b->shader);
> +   ASSERT_FALSE(progress);
> +}
> +
> +TEST_F(nir_dead_write_vars_test, no_dead_writes_in_loop_statement)
> +{
> +   nir_variable **v = create_many_int(nir_var_shader_storage, "v", 3);
> +
> +   nir_store_var(b, v[0], nir_load_var(b, v[1]), 1);
> +
> +   /* Loop will write other value.  Since it might not be executed, it
> doesn't
> +* kill the first write.
> +*/
> +   nir_loop *loop = nir_push_loop(b);
> +
> +   nir_if *if_stmt = nir_push_if(b, nir_imm_int(b, 0));
>

Same here, though you'll want to use v[1] instead of v[0] so it doesn't get
copy-propagated.


> +   nir_jump(b, nir_jump_break);
> +   nir_pop_if(b, if_stmt);
> +
> +   nir_store_var(b, v[0], nir_load_var(b, v[2]), 1);
> +   nir_pop_loop(b, loop);
> +
> +   bool progress = nir_opt_dead_write_vars(b->shader);
> +   ASSERT_FALSE(progress);
> +}
> +
> +TEST_F(nir_dead_write_vars_test, dead_write_in_block)
> +{
> +   nir_variable **v = create_many_int(nir_var_shader_storage, "v", 3);
> +
> +   nir_store_var(b, v[0], nir_load_var(b, v[1]), 1);
> +   nir_ssa_def *load_v2 = nir_load_var(b, v[2]);
> +   nir_store_var(b, v[0], load_v2, 1);
> +
> +   bool progress = nir_opt_dead_write_vars(b->shader);
> +   ASSERT_TRUE(progress);
> +
> +   EXPECT_EQ(1, count_intrinsics(nir_intrinsic_store_deref));
> +
> +   nir_intrinsic_instr *store =
> find_next_intrinsic(nir_intrinsic_store_deref, NULL);
> +   ASSERT_TRUE(store->src[1].is_ssa);
> +   EXPECT_EQ(store->src[1].ssa, load_v2);
> +}
> +
> +TEST_F(nir_dead_write_vars_test, dead_write_components_in_block)
> +{
> +   nir_variable **v = create_many_ivec2(nir_var_shader_storage, "v", 3);
> +
> +   nir_store_var(b, v[0], nir_load_var(b, v[1]), 1 << 0);
> +   nir_ssa_def *load_v2 = nir_load_var(b, v[2]);
> +   nir_store_var(b, v[0], load_v2, 1 << 0);
> +
> +   bool progress = nir_opt_dead_write_vars(b->shader);
> +   ASSERT_TRUE(progress);
> +
> + 
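For readers unfamiliar with the transformation under test: a write is dead when later writes to the same variable cover every component it wrote and nothing reads the variable in between. The following is a minimal single-block sketch in plain C, not NIR code; `toy_op` and `count_dead_writes` are hypothetical names invented for illustration, and control flow (the if and loop cases in the tests above) is deliberately ignored.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy IR: each op either stores to or loads from a variable. */
enum toy_kind { TOY_STORE, TOY_LOAD };

struct toy_op {
   enum toy_kind kind;
   int var;            /* variable index */
   unsigned writemask; /* components written; only used by TOY_STORE */
};

/* Count stores whose written components are all overwritten by later stores
 * to the same variable before any load of that variable, within one block.
 */
static unsigned
count_dead_writes(const struct toy_op *ops, size_t n)
{
   unsigned dead = 0;

   assert(ops != NULL || n == 0);

   for (size_t i = 0; i < n; i++) {
      if (ops[i].kind != TOY_STORE)
         continue;

      unsigned live = ops[i].writemask; /* components still observable */
      for (size_t j = i + 1; j < n && live; j++) {
         if (ops[j].var != ops[i].var)
            continue;
         if (ops[j].kind == TOY_LOAD)
            break;                      /* the value is read: not dead */
         live &= ~ops[j].writemask;     /* these components are overwritten */
      }

      /* Components that survive to the end of the block may be read later,
       * so only a fully overwritten store counts as dead.
       */
      if (live == 0)
         dead++;
   }
   return dead;
}
```

With this model, two stores to `v[0]` using writemasks `1 << 0` and `1 << 1` report no dead writes, while two stores using the same writemask report one, which mirrors the `no_dead_writes_different_components_in_block` and `dead_write_components_in_block` cases.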

[Mesa-dev] [Bug 108275] Breaking out of loop creates broken code on RADV

2018-10-08 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=108275

Ian Romanick  changed:

   What|Removed |Added

 Status|NEW |NEEDINFO

--- Comment #2 from Ian Romanick  ---
(In reply to maister from comment #0)
> Created attachment 141940 [details]
> Fossilize dump
> 
> I have a test case where adding a break to a loop creates broken code, and
> Vulkan renders something completely bogus.
> 
> The code comes from spirv-opt and works fine on all other implementations
> I've tested.

Does that include the Intel Vulkan driver in Mesa?  That may help narrow down
the location of the problem.

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 00/11] intel/compiler: Optimize sign(x)*y

2018-10-08 Thread Thomas Helland
On Tue, Sep 11, 2018 at 01:30, Ian Romanick wrote:
>
> This series implements a code-generation optimization for sign(x)*y.  In
> GLSL, sign(x) is defined as:
>
> Returns 1.0 if x > 0, 0.0 if x = 0, or -1.0 if x < 0.
>
> It is silent on the NaN behavior, so I have taken it as "undefined."  I
> don't think the new implementation will produce different results from
> the old.
>
> The optimization is only applied to the scalar backend.  On Skylake,
> there are ~1,000 shaders in VS, TCS, and TES stages helped.  It may be
> worth applying this to the vector backend for Haswell.  I have a couple
> long flights in my near future, so I might work on it then.  We'll see.
> This might also be a good newbie project for someone wanting to get into
> the i965 compiler backend.
>
> There are actually two versions of this series.  The series that I am
> sending to the list includes "i965/fs: Eliminate dead code first".  The
> results of that patch are not good.  The other version of the series
> omits that patch, but it adds a bunch of horror to "i965/fs: Add a scale
> factor to emit_fsign".  Basically, if both the fused and non-fused
> version of the nir_op_fsign are emitted, copy propagation will propagate
> part of the common expressions, but, due to the predicated OR or XOR,
> one extra MOV will be left around.  That single instruction ruins the
> whole optimization.
>
> Both versions are available in my cgit.  List version:
>
> https://cgit.freedesktop.org/~idr/mesa/log/?h=fsign-optimization
>
> That branch includes a few things that I tried, but they did not pan
> out.
>
> Alternate version:
>
> 
> https://cgit.freedesktop.org/~idr/mesa/log/?h=fsign-optimization-emit-no-dead-code
>
> I think the version sent to the list is cleaner, but it's shader-db
> results are not as good.  The difference between the list version and
> the other version on Skylake is shown below.  Other platforms had
> similar shaped results.
>
> total instructions in shared programs: 15090997 -> 15091028 (<.01%)
> instructions in affected programs: 10251 -> 10282 (0.30%)
> helped: 0
> HURT: 26
> HURT stats (abs)   min: 1 max: 4 x̄: 1.19 x̃: 1
> HURT stats (rel)   min: 0.14% max: 1.96% x̄: 0.49% x̃: 0.24%
> 95% mean confidence interval for instructions value: 0.94 1.45
> 95% mean confidence interval for instructions %-change: 0.28% 0.71%
> Instructions are HURT.
>
> total cycles in shared programs: 565827580 -> 565824007 (<.01%)
> cycles in affected programs: 1995745 -> 1992172 (-0.18%)
> helped: 271
> HURT: 248
> helped stats (abs) min: 1 max: 623 x̄: 25.79 x̃: 5
> helped stats (rel) min: 0.02% max: 13.19% x̄: 0.94% x̃: 0.28%
> HURT stats (abs)   min: 1 max: 204 x̄: 13.78 x̃: 4
> HURT stats (rel)   min: 0.01% max: 6.57% x̄: 0.52% x̃: 0.21%
> 95% mean confidence interval for cycles value: -11.25 -2.52
> 95% mean confidence interval for cycles %-change: -0.38% -0.11%
> Cycles are helped.
>
> The version sent to the list saves a couple instructions in 26 shaders,
> but cycles are hurt.  The list version also avoids ~65 lines of ugly
> code.
>
> I also sent a couple tests to the piglit list that exercise a bug that I
> had during development.
>
> https://patchwork.freedesktop.org/patch/247911/
>
> In the alternate version, for an expression like sign(a)*sign(b),
> sign(b) would never get emitted.  When the fused sign(a)*x was emitted,
> it would explode.  The solution was to just bail on the optimization
> when sign(a)*sign(b) is encountered.  I suspect that's the source of the
> 32 shaders with instructions hurt in the alternate version, but I have
> not verified that.
>

I've reviewed some of the NIR patches in this series.
I'll leave the intel backend ones to people with more experience there.
Some of the more complicated NIR patches were left for now;
I might look at them later this week if time allows.

- Thomas
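
As a rough illustration of what fusing sign(x)*y means, here is a minimal sketch in plain C. It is not the i965 code from the series; `glsl_sign`, `sign_mul_reference` and `sign_mul_fused` are hypothetical names, and NaN and infinity inputs are ignored, in line with the cover letter treating NaN behavior as undefined. The fused form flips y's sign bit when x is negative and produces zero when x is zero, instead of materializing sign(x) and multiplying.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* GLSL sign(): 1.0 if x > 0, 0.0 if x == 0, -1.0 if x < 0. */
static float
glsl_sign(float x)
{
   return (x > 0.0f) ? 1.0f : (x < 0.0f) ? -1.0f : 0.0f;
}

/* Unfused form: compute sign(x), then multiply. */
static float
sign_mul_reference(float x, float y)
{
   return glsl_sign(x) * y;
}

/* Fused form (illustrative only): no separate sign(x) value is produced. */
static float
sign_mul_fused(float x, float y)
{
   uint32_t xbits, ybits;
   float result;

   assert(sizeof(xbits) == sizeof(x)); /* the bit copies rely on this */

   if (x == 0.0f)
      return 0.0f;                     /* sign(0) * y is zero for finite y */

   memcpy(&xbits, &x, sizeof(xbits));
   memcpy(&ybits, &y, sizeof(ybits));
   ybits ^= xbits & 0x80000000u;       /* flip y's sign if x is negative */
   memcpy(&result, &ybits, sizeof(result));
   return result;
}
```

For finite, non-NaN inputs the two functions compare equal, which is the property the optimization depends on.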



Re: [Mesa-dev] [PATCH 01/11] nir: Add helper functions to get the instruction that generated a nir_src

2018-10-08 Thread Thomas Helland
On Tue, Sep 11, 2018 at 01:30, Ian Romanick wrote:
>
> From: Ian Romanick 
>

Reviewed-by: Thomas Helland

> Signed-off-by: Ian Romanick 
> ---
>  src/compiler/nir/nir.h | 23 +++
>  1 file changed, 23 insertions(+)
>
> diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h
> index bf4bd916d27..69ca1215644 100644
> --- a/src/compiler/nir/nir.h
> +++ b/src/compiler/nir/nir.h
> @@ -2490,6 +2490,29 @@ bool nir_foreach_dest(nir_instr *instr, 
> nir_foreach_dest_cb cb, void *state);
>  bool nir_foreach_src(nir_instr *instr, nir_foreach_src_cb cb, void *state);
>
>  nir_const_value *nir_src_as_const_value(nir_src src);
> +
> +static inline struct nir_instr *
> +nir_src_instr(const struct nir_src *src)
> +{
> +   return src->is_ssa ? src->ssa->parent_instr : NULL;
> +}
> +
> +#define NIR_SRC_AS_(name, c_type, type_enum, cast_macro)\
> +static inline c_type *  \
> +nir_src_as_ ## name (struct nir_src *src)   \
> +{   \
> +return src->is_ssa && src->ssa->parent_instr->type == type_enum \
> +   ? cast_macro(src->ssa->parent_instr) : NULL; \
> +}   \
> +static inline const c_type *\
> +nir_src_as_ ## name ## _const(const struct nir_src *src)\
> +{   \
> +return src->is_ssa && src->ssa->parent_instr->type == type_enum \
> +   ? cast_macro(src->ssa->parent_instr) : NULL; \
> +}
> +
> +NIR_SRC_AS_(alu_instr, nir_alu_instr, nir_instr_type_alu, nir_instr_as_alu)
> +
>  bool nir_src_is_dynamically_uniform(nir_src src);
>  bool nir_srcs_equal(nir_src src1, nir_src src2);
>  void nir_instr_rewrite_src(nir_instr *instr, nir_src *src, nir_src new_src);
> --
> 2.14.4
>
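
The macro in this patch stamps out typed accessor functions that return the instruction producing a source, or NULL when the source is not SSA or the instruction has a different type. A self-contained sketch of the same pattern follows; every `toy_` and `TOY_` name is a hypothetical stand-in for the real NIR types, and only the shape of the macro mirrors the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the NIR types; only the shape matters here. */
enum toy_instr_type { TOY_INSTR_ALU, TOY_INSTR_TEX };

struct toy_instr { enum toy_instr_type type; };
struct toy_alu_instr { struct toy_instr instr; int op; };
struct toy_ssa_def { struct toy_instr *parent_instr; };
struct toy_src { int is_ssa; struct toy_ssa_def *ssa; };

/* Downcast helper playing the role of nir_instr_as_alu().  Valid because
 * `instr` is the first member of struct toy_alu_instr.
 */
static inline struct toy_alu_instr *
toy_instr_as_alu(struct toy_instr *instr)
{
   return (struct toy_alu_instr *)instr;
}

/* Generates toy_src_as_<name>(), which returns the producing instruction as
 * c_type, or NULL if the source is not SSA or has a different type.
 */
#define TOY_SRC_AS_(name, c_type, type_enum, cast_fn)                   \
static inline c_type *                                                  \
toy_src_as_ ## name(const struct toy_src *src)                          \
{                                                                       \
   assert(src != NULL);                                                 \
   return src->is_ssa && src->ssa->parent_instr->type == type_enum      \
      ? cast_fn(src->ssa->parent_instr) : NULL;                         \
}

TOY_SRC_AS_(alu_instr, struct toy_alu_instr, TOY_INSTR_ALU, toy_instr_as_alu)
```

A caller can then write `struct toy_alu_instr *alu = toy_src_as_alu_instr(&src);` and test the result against NULL instead of checking `is_ssa` and the instruction type by hand.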


Re: [Mesa-dev] [PATCH 04/11] intel/compiler: Don't handle fsign.sat

2018-10-08 Thread Thomas Helland
On Tue, Sep 11, 2018 at 01:30, Ian Romanick wrote:
>
> From: Ian Romanick 
>
> No shader-db or CI changes on any Intel platform.
>

I'm no expert on the intel backend, but this seems trivial enough.

Reviewed-by: Thomas Helland

> Signed-off-by: Ian Romanick 
> ---
>  src/intel/compiler/brw_fs_nir.cpp   | 14 +-
>  src/intel/compiler/brw_vec4_nir.cpp | 12 ++--
>  2 files changed, 3 insertions(+), 23 deletions(-)
>
> diff --git a/src/intel/compiler/brw_fs_nir.cpp 
> b/src/intel/compiler/brw_fs_nir.cpp
> index 7f453d75b64..12b087a5ec0 100644
> --- a/src/intel/compiler/brw_fs_nir.cpp
> +++ b/src/intel/compiler/brw_fs_nir.cpp
> @@ -842,6 +842,7 @@ fs_visitor::nir_emit_alu(const fs_builder &bld, 
> nir_alu_instr *instr)
>break;
>
> case nir_op_fsign: {
> +  assert(!instr->dest.saturate);
>if (op[0].abs) {
>   /* Straightforward since the source can be assumed to be either
>* strictly >= 0 or strictly <= 0 depending on the setting of the
> @@ -854,10 +855,6 @@ fs_visitor::nir_emit_alu(const fs_builder &bld, 
> nir_alu_instr *instr)
>  : bld.MOV(result, brw_imm_f(1.0f));
>
>   set_predicate(BRW_PREDICATE_NORMAL, inst);
> -
> - if (instr->dest.saturate)
> -inst->saturate = true;
> -
>} else if (type_sz(op[0].type) < 8) {
>   /* AND(val, 0x80000000) gives the sign bit.
>*
> @@ -873,10 +870,6 @@ fs_visitor::nir_emit_alu(const fs_builder &bld, 
> nir_alu_instr *instr)
>
>   inst = bld.OR(result_int, result_int, brw_imm_ud(0x3f800000u));
>   inst->predicate = BRW_PREDICATE_NORMAL;
> - if (instr->dest.saturate) {
> -inst = bld.MOV(result, result);
> -inst->saturate = true;
> - }
>} else {
>   /* For doubles we do the same but we need to consider:
>*
> @@ -897,11 +890,6 @@ fs_visitor::nir_emit_alu(const fs_builder &bld, 
> nir_alu_instr *instr)
>
>   set_predicate(BRW_PREDICATE_NORMAL,
> bld.OR(r, r, brw_imm_ud(0x3ff00000u)));
> -
> - if (instr->dest.saturate) {
> -inst = bld.MOV(result, result);
> -inst->saturate = true;
> - }
>}
>break;
> }
> diff --git a/src/intel/compiler/brw_vec4_nir.cpp 
> b/src/intel/compiler/brw_vec4_nir.cpp
> index 124714b59de..eaf1754b006 100644
> --- a/src/intel/compiler/brw_vec4_nir.cpp
> +++ b/src/intel/compiler/brw_vec4_nir.cpp
> @@ -1818,6 +1818,7 @@ vec4_visitor::nir_emit_alu(nir_alu_instr *instr)
>unreachable("not reached: should have been lowered");
>
> case nir_op_fsign:
> +  assert(!instr->dest.saturate);
>if (op[0].abs) {
>   /* Straightforward since the source can be assumed to be either
>* strictly >= 0 or strictly <= 0 depending on the setting of the
> @@ -1830,10 +1831,6 @@ vec4_visitor::nir_emit_alu(nir_alu_instr *instr)
>  ? emit(MOV(dst, brw_imm_f(-1.0f)))
>  : emit(MOV(dst, brw_imm_f(1.0f)));
>   inst->predicate = BRW_PREDICATE_NORMAL;
> -
> - if (instr->dest.saturate)
> -inst->saturate = true;
> -
> } else if (type_sz(op[0].type) < 8) {
>   /* AND(val, 0x80000000) gives the sign bit.
>*
> @@ -1849,11 +1846,6 @@ vec4_visitor::nir_emit_alu(nir_alu_instr *instr)
>   inst = emit(OR(dst, src_reg(dst), brw_imm_ud(0x3f800000u)));
>   inst->predicate = BRW_PREDICATE_NORMAL;
>   dst.type = BRW_REGISTER_TYPE_F;
> -
> - if (instr->dest.saturate) {
> -inst = emit(MOV(dst, src_reg(dst)));
> -inst->saturate = true;
> - }
>} else {
>   /* For doubles we do the same but we need to consider:
>*
> @@ -1886,7 +1878,7 @@ vec4_visitor::nir_emit_alu(nir_alu_instr *instr)
>   /* Now convert the result from float to double */
>   emit_conversion_to_double(dst, retype(src_reg(tmp),
> BRW_REGISTER_TYPE_F),
> -   instr->dest.saturate);
> +   false);
>}
>break;
>
> --
> 2.14.4
>
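
The single-precision path touched by this patch computes fsign with an AND that isolates the sign bit followed by a predicated OR that merges in the bit pattern of 1.0f. The sketch below shows that idea in plain C under the assumption of IEEE 754 binary32 floats; `fsign_bits` is a hypothetical name and the `if` stands in for the hardware predicate, so this is an illustration of the trick rather than the backend's code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* fsign() for 32-bit floats: keep the sign bit, then OR in the bit pattern
 * of 1.0f (0x3f800000) only when the input is non-zero.
 */
static float
fsign_bits(float x)
{
   uint32_t bits;
   float result;

   assert(sizeof(bits) == sizeof(x)); /* the bit copies rely on this */

   memcpy(&bits, &x, sizeof(bits));
   bits &= 0x80000000u;               /* isolate the sign bit */
   if (x != 0.0f)                     /* stands in for the predicated OR */
      bits |= 0x3f800000u;            /* result becomes +1.0f or -1.0f */
   memcpy(&result, &bits, sizeof(result));
   return result;
}
```

Because the result is only ever -1.0, 0.0 or 1.0, a saturate modifier would merely clamp -1.0 to 0.0, which is why the backend can assert that it never sees one once NIR rewrites fsat(fsign(a)) itself.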


Re: [Mesa-dev] [PATCH 03/11] nir/algebraic: Simplify fsat of fsign

2018-10-08 Thread Thomas Helland
On Tue, Sep 11, 2018 at 01:30, Ian Romanick wrote:
>
> From: Ian Romanick 
>
> This allows us to not support fsign.sat in the Intel compiler backend,
> and that will simplify some later changes.
>
> No shader-db changes on any Intel platform.
>

I was a bit skeptical about how this would impact platforms other
than Intel, but I've settled on it being a wash.

Reviewed-by: Thomas Helland

> Signed-off-by: Ian Romanick 
> ---
>  src/compiler/nir/nir_opt_algebraic.py | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/src/compiler/nir/nir_opt_algebraic.py 
> b/src/compiler/nir/nir_opt_algebraic.py
> index 3267e93a583..422a8794d38 100644
> --- a/src/compiler/nir/nir_opt_algebraic.py
> +++ b/src/compiler/nir/nir_opt_algebraic.py
> @@ -329,6 +329,7 @@ optimizations = [
> (('imax', a, ('ineg', a)), ('iabs', a)),
> (('~fmin', ('fmax', a, 0.0), 1.0), ('fsat', a), '!options->lower_fsat'),
> (('~fmax', ('fmin', a, 1.0), 0.0), ('fsat', a), '!options->lower_fsat'),
> +   (('fsat', ('fsign', a)), ('b2f', ('flt', 0.0, a))),
> (('fsat', a), ('fmin', ('fmax', a, 0.0), 1.0), 'options->lower_fsat'),
> (('fsat', ('fsat', a)), ('fsat', a)),
> (('fmin', ('fmax', ('fmin', ('fmax', a, b), c), b), c), ('fmin', ('fmax', 
> a, b), c)),
> --
> 2.14.4
>
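
The new rule rewrites fsat(fsign(a)) as b2f(0.0 < a). A quick way to convince oneself the two agree is to write both sides out; the sketch below does that in plain C. The function names are hypothetical, and NaN inputs are left out of consideration.

```c
/* Clamp to [0, 1], as fsat does. */
static float
toy_fsat(float v)
{
   return v < 0.0f ? 0.0f : (v > 1.0f ? 1.0f : v);
}

/* sign(): 1.0 if a > 0, 0.0 if a == 0, -1.0 if a < 0. */
static float
toy_fsign(float a)
{
   return (a > 0.0f) ? 1.0f : (a < 0.0f) ? -1.0f : 0.0f;
}

/* Left-hand side of the rule: fsat(fsign(a)). */
static float
fsat_fsign_reference(float a)
{
   return toy_fsat(toy_fsign(a));
}

/* Right-hand side of the rule: b2f(0.0 < a). */
static float
fsat_fsign_rewritten(float a)
{
   return (0.0f < a) ? 1.0f : 0.0f;
}
```

Saturating maps -1.0 to 0.0 and leaves 0.0 and 1.0 alone, so the only input class that yields 1.0 is a > 0, which is exactly the comparison on the right-hand side.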


Re: [Mesa-dev] [PATCH 02/11] nir/algebraic: sign(x)*x*x is abs(x)*x

2018-10-08 Thread Thomas Helland
On Tue, Sep 11, 2018 at 01:30, Ian Romanick wrote:
>
> From: Ian Romanick 
>

Reviewed-by: Thomas Helland

> shader-db results:
>
> All Gen7+ platforms had similar results. (Skylake shown)
> total instructions in shared programs: 15106023 -> 15105981 (<.01%)
> instructions in affected programs: 300 -> 258 (-14.00%)
> helped: 6
> HURT: 0
> helped stats (abs) min: 7 max: 7 x̄: 7.00 x̃: 7
> helped stats (rel) min: 14.00% max: 14.00% x̄: 14.00% x̃: 14.00%
> 95% mean confidence interval for instructions value: -7.00 -7.00
> 95% mean confidence interval for instructions %-change: -14.00% -14.00%
> Instructions are helped.
>
> total cycles in shared programs: 566050327 -> 566050075 (<.01%)
> cycles in affected programs: 2826 -> 2574 (-8.92%)
> helped: 6
> HURT: 0
> helped stats (abs) min: 40 max: 44 x̄: 42.00 x̃: 42
> helped stats (rel) min: 8.89% max: 8.94% x̄: 8.92% x̃: 8.92%
> 95% mean confidence interval for cycles value: -44.30 -39.70
> 95% mean confidence interval for cycles %-change: -8.95% -8.88%
> Cycles are helped.
>
> No changes on Gen6 or earlier.
>
> Signed-off-by: Ian Romanick 
> ---
>  src/compiler/nir/nir_opt_algebraic.py | 5 +
>  1 file changed, 5 insertions(+)
>
> diff --git a/src/compiler/nir/nir_opt_algebraic.py 
> b/src/compiler/nir/nir_opt_algebraic.py
> index ae1261f8744..3267e93a583 100644
> --- a/src/compiler/nir/nir_opt_algebraic.py
> +++ b/src/compiler/nir/nir_opt_algebraic.py
> @@ -105,6 +105,11 @@ optimizations = [
> (('imul', a, 1), a),
> (('fmul', a, -1.0), ('fneg', a)),
> (('imul', a, -1), ('ineg', a)),
> +   # If a < 0: fsign(a)*a*a => -1*a*a => -a*a => abs(a)*a
> +   # If a > 0: fsign(a)*a*a => 1*a*a => a*a => abs(a)*a
> +   # If a == 0: fsign(a)*a*a => 0*0*0 => abs(0)*0
> +   (('fmul', ('fsign', a), ('fmul', a, a)), ('fmul', ('fabs', a), a)),
> +   (('fmul', ('fmul', ('fsign', a), a), a), ('fmul', ('fabs', a), a)),
> (('~ffma', 0.0, a, b), b),
> (('~ffma', a, 0.0, b), b),
> (('~ffma', a, b, 0.0), ('fmul', a, b)),
> --
> 2.14.4
>
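
The case analysis in the patch comments can be checked directly. The sketch below writes out both association orders matched by the two new rules alongside the replacement abs(a)*a, in plain C with hypothetical function names; NaN inputs are not considered.

```c
/* sign(): 1.0 if a > 0, 0.0 if a == 0, -1.0 if a < 0. */
static float
toy_sign(float a)
{
   return (a > 0.0f) ? 1.0f : (a < 0.0f) ? -1.0f : 0.0f;
}

/* Absolute value without pulling in the math library. */
static float
toy_abs(float a)
{
   return a < 0.0f ? -a : a;
}

/* First pattern: fsign(a) * (a * a). */
static float
sign_times_square(float a)
{
   return toy_sign(a) * (a * a);
}

/* Second pattern: (fsign(a) * a) * a. */
static float
sign_times_a_times_a(float a)
{
   return (toy_sign(a) * a) * a;
}

/* Replacement: fabs(a) * a. */
static float
abs_times_a(float a)
{
   return toy_abs(a) * a;
}
```

Multiplying by 1.0 or -1.0 is exact, so for non-zero a each form performs one rounded multiplication of the same magnitudes, and for zero every form evaluates to zero.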


[Mesa-dev] [Bug 108275] Breaking out of loop creates broken code on RADV

2018-10-08 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=108275

--- Comment #1 from mais...@archlinux.us ---
To replay the pipeline (for debugging):

./cli/fossilize-replay fossilize.json --filter-compute 3

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.


[Mesa-dev] [Bug 108275] Breaking out of loop creates broken code on RADV

2018-10-08 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=108275

Bug ID: 108275
   Summary: Breaking out of loop creates broken code on RADV
   Product: Mesa
   Version: 18.2
  Hardware: Other
OS: All
Status: NEW
  Severity: normal
  Priority: medium
 Component: Drivers/Vulkan/radeon
  Assignee: mesa-dev@lists.freedesktop.org
  Reporter: mais...@archlinux.us
QA Contact: mesa-dev@lists.freedesktop.org

Created attachment 141940
  --> https://bugs.freedesktop.org/attachment.cgi?id=141940=edit
Fossilize dump

I have a test case where adding a break to a loop creates broken code, and
Vulkan renders something completely bogus.

The code comes from spirv-opt and works fine on all other implementations I've
tested.

The original GLSL looks like this:
https://github.com/Themaister/Granite/blob/master/assets/shaders/ocean/cull_blocks.comp
To work around the issue, I removed the "break" on line 54, which for some
reason fixed the issue.

My hunch is that the SelectionMerge inside the loop is merging to the loop's
continue block, and this is causing some weirdness.

To build Fossilize for repro:

git clone git://github.com/Themaister/Fossilize
cd Fossilize
git submodule update --init
mkdir build
cd build
cmake .. -DCMAKE_BUILD_TYPE=Debug
make -j16

To disassemble the failing SPIR-V pipeline:

GLSL (SPIRV-Cross):
./cli/fossilize-disasm fossilize.json --compute-pipeline 3 --target glsl
SPIR-V asm:
... --target asm
AMD ISA (VK_AMD_shader_info):
... --target amd

For the workaround case, use --compute-pipeline 29 instead.

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are the QA Contact for the bug.


Re: [Mesa-dev] [PATCH 01/12] util: Add a simple big math library

2018-10-08 Thread Ian Romanick
This patch is

Reviewed-by: Ian Romanick 

On 10/05/2018 09:10 PM, Jason Ekstrand wrote:
> ---
>  src/util/Makefile.am  |   1 +
>  src/util/Makefile.sources |   1 +
>  src/util/bigmath.h| 112 ++
>  src/util/meson.build  |   1 +
>  4 files changed, 115 insertions(+)
>  create mode 100644 src/util/bigmath.h
> 
> diff --git a/src/util/Makefile.am b/src/util/Makefile.am
> index efb94caff71..d79f2b320be 100644
> --- a/src/util/Makefile.am
> +++ b/src/util/Makefile.am
> @@ -21,6 +21,7 @@
>  
>  SUBDIRS = . \
>   xmlpool \
> + tests/fast_idiv_by_const \
>   tests/hash_table \
>   tests/string_buffer \
>   tests/set
> diff --git a/src/util/Makefile.sources b/src/util/Makefile.sources
> index b562d6cd6f4..5b1548c733c 100644
> --- a/src/util/Makefile.sources
> +++ b/src/util/Makefile.sources
> @@ -1,4 +1,5 @@
>  MESA_UTIL_FILES := \
> + bigmath.h \
>   bitscan.c \
>   bitscan.h \
>   bitset.h \
> diff --git a/src/util/bigmath.h b/src/util/bigmath.h
> new file mode 100644
> index 00000000000..6339bb6f6ca
> --- /dev/null
> +++ b/src/util/bigmath.h
> @@ -0,0 +1,112 @@
> +/*
> + * Copyright © 2018 Intel Corporation
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice (including the next
> + * paragraph) shall be included in all copies or substantial portions of the
> + * Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER 
> DEALINGS
> + * IN THE SOFTWARE.
> + */
> +
> +#ifndef UTIL_BIGMATH_H
> +#define UTIL_BIGMATH_H
> +
> +#include "macros.h"
> +
> +#include 
> +#include 
> +#include 
> +
> +static inline bool
> +_ubm_add_u32arr(uint32_t *dst, unsigned dst_len,
> +uint32_t *a, unsigned a_len,
> +uint32_t *b, unsigned b_len)
> +{
> +   uint32_t carry = 0;
> +   for (unsigned i = 0; i < dst_len; i++) {
> +  uint64_t sum = carry;
> +  if (i < a_len)
> + sum += a[i];
> +  if (i < b_len)
> + sum += b[i];
> +  dst[i] = sum;
> +  carry = sum >> 32;
> +   }
> +
> +   /* Now compute overflow */
> +
> +   for (unsigned i = dst_len; i < a_len; i++) {
> +  if (a[i])
> + return true;
> +   }
> +
> +   for (unsigned i = dst_len; i < b_len; i++) {
> +  if (b[i])
> + return true;
> +   }
> +
> +   return carry;
> +}
> +#define ubm_add_u32arr(dst, a, b) \
> +   _ubm_add_u32arr(dst, ARRAY_SIZE(dst), a, ARRAY_SIZE(a), b, ARRAY_SIZE(b))
> +
> +static inline bool
> +_ubm_mul_u32arr(uint32_t *dst, unsigned dst_len,
> +uint32_t *a, unsigned a_len,
> +uint32_t *b, unsigned b_len)
> +{
> +   memset(dst, 0, dst_len * sizeof(*dst));
> +
> +   bool overflow = false;
> +
> +   for (unsigned i = 0; i < a_len; i++) {
> +  uint32_t carry = 0;
> +  for (unsigned j = 0; j < b_len; j++) {
> + /* The maximum values of a[i] and b[i] are UINT32_MAX so the maximum
> +  * value of tmp is UINT32_MAX * UINT32_MAX.  The maximum value that
> +  * will fit in tmp is
> +  *
> +  *UINT64_MAX = UINT32_MAX << 32 + UINT32_MAX
> +  *   = UINT32_MAX * (UINT32_MAX + 1) + UINT32_MAX
> +  *   = UINT32_MAX * UINT32_MAX + 2 * UINT32_MAX
> +  *
> +  * so we're guaranteed that we can add in two more 32-bit values
> +  * without overflowing tmp.
> +  */
> + uint64_t tmp = (uint64_t)a[i] * (uint64_t)b[j];
> + tmp += carry;
> + if (i + j < dst_len) {
> +tmp += dst[i + j];
> +dst[i + j] = tmp;
> +carry = tmp >> 32;
> + } else {
> +/* We're trying to write a value that doesn't fit */
> +overflow = overflow || tmp > 0;
> +break;
> + }
> +  }
> +  if (i + b_len < dst_len)
> + dst[i + b_len] = carry;
> +  else
> + overflow = overflow || carry > 0;
> +   }
> +
> +   return overflow;
> +}
> +#define ubm_mul_u32arr(dst, a, b) \
> +   _ubm_mul_u32arr(dst, ARRAY_SIZE(dst), a, ARRAY_SIZE(a), b, ARRAY_SIZE(b))
> +
> 
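
The carry handling and the overflow bound stated in the multiply comment can be tried out in isolation. The sketch below is a simplified, equal-length variant written for illustration, not the patch's functions; `limbs_add` and `mul_add2` are hypothetical names, and limbs are assumed to be stored least significant first, as in the quoted code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Add two arrays of 32-bit limbs (least significant limb first) of equal
 * length.  Returns true if the sum does not fit in `len` limbs.
 */
static bool
limbs_add(uint32_t *dst, const uint32_t *a, const uint32_t *b, unsigned len)
{
   uint32_t carry = 0;

   assert(len > 0);

   for (unsigned i = 0; i < len; i++) {
      uint64_t sum = (uint64_t)a[i] + b[i] + carry;
      dst[i] = (uint32_t)sum;         /* low 32 bits stay in this limb */
      carry = (uint32_t)(sum >> 32);  /* high bits move to the next limb */
   }
   return carry != 0;
}

/* One inner step of schoolbook multiplication: a product of two 32-bit
 * values plus two more 32-bit values always fits in 64 bits.
 */
static uint64_t
mul_add2(uint32_t a, uint32_t b, uint32_t c, uint32_t d)
{
   return (uint64_t)a * b + c + d;
}
```

Calling `mul_add2` with all four arguments set to `UINT32_MAX` yields exactly `UINT64_MAX`, matching the identity UINT32_MAX * UINT32_MAX + 2 * UINT32_MAX = UINT64_MAX in the comment, so the accumulated `tmp` cannot wrap.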

Re: [Mesa-dev] [PATCH] mesa/st: In the precense of integer buffers enable per buffer blending

2018-10-08 Thread Ilia Mirkin
On Wed, Sep 26, 2018 at 3:30 AM Gert Wollny  wrote:
>
> On Tuesday, 2018-09-25 at 10:20 -0400, Ilia Mirkin wrote:
> > I haven't double-checked yet, but doesn't this result in a reduction
> > of functionality for pre-independent-blend GPUs (like the early
> > NVIDIA
> > Tesla series)? Configuring blending for an integer RT does nothing on
> > NVIDIA hardware, so it all works out there just fine...
> Unfortunately I can't test this ...
>
> >
> > Perhaps both patches should just be reverted and keep things as they
> > were, and let drivers worry about this?
> Do as you see fit, my only concern is that there are no regressions,
> and if the two patches together still result in regressions then it is
> indeed better to revert,
>
> best,
> Gert

I had a look at this in more detail. Ken's change basically made
handling of integer buffers be the same as buffers whose colormask was
set to empty, or whose blending had been disabled. This should be
supportable by all GPUs which support EXT_draw_buffers2. The
additional bit is independent blend *functions*
(ARB_draw_buffers_blend), which are only supported on GT215+ (on the
NVIDIA front).

From a quick look through the various GPU capabilities, it seems like
pre-G80, there is no EXT_draw_buffers2 support, but on the bright
side, no integer RTs either, so all is well.

So these changes should be fine, although they do force us onto the
independent blend path unnecessarily. I suspect this happens ~never in
practice.

Cheers,

  -ilia


[Mesa-dev] [Bug 93089] mesa fails to check for gcc atomic primitives before using them

2018-10-08 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=93089

--- Comment #22 from Ian Romanick  ---
(In reply to freedesktop from comment #21)
> This also popped up on archlinux32 when trying to package mesa for i486:
> 
> /usr/bin/ld: src/intel/vulkan/libanv_common.a(anv_allocator.c.o): in
> function `anv_block_pool_alloc_new':

...

There is no GPU supported by the Intel Vulkan driver that can possibly exist
with a 486 CPU.  Don't build that driver with i486 restrictions.  Same goes for
i915_dri.so and i965_dri.so.  None of those can exist with ARM, Alpha, or MIPS
either.  Use the proper options to configure.ac or Meson to exclude those
drivers on those platforms.

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.


[Mesa-dev] [Bug 108118] AMDGPU sometimes hangs forever when running graphical applications

2018-10-08 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=108118

--- Comment #4 from duoora...@gmail.com ---
I have been unable to replicate this problem with vulkan-radeon from the new
Mesa 18.2.2; I think it might have been fixed in the 18.2.1 -> 18.2.2 update.

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are the QA Contact for the bug.


Re: [Mesa-dev] [PATCH] nvc0: handle user buffers without any data

2018-10-08 Thread Ilia Mirkin
I have a recollection of wanting to do something similar in the past.
However I also have a record of the test passing before (circa 2015).
OTOH we do things differently on GM107+, so ... who knows. My guess is
that the splitting out of is_user_buffer changed how the code
behaved somehow - perhaps wrt how it was set in the "there is nothing"
case. Can you do a bit of archaeology to see what's different now?

On Mon, Oct 8, 2018 at 12:20 PM Rhys Perry wrote:
>
> Fixes crash in piglit's gl-3.1-vao-broken-attrib.
>
> Signed-off-by: Rhys Perry 
> ---
>  src/gallium/drivers/nouveau/nvc0/nvc0_state.c | 8 +++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_state.c 
> b/src/gallium/drivers/nouveau/nvc0/nvc0_state.c
> index f2393cb27b..86a8bebc25 100644
> --- a/src/gallium/drivers/nouveau/nvc0/nvc0_state.c
> +++ b/src/gallium/drivers/nouveau/nvc0/nvc0_state.c
> @@ -972,7 +972,13 @@ nvc0_set_vertex_buffers(struct pipe_context *pipe,
>  for (i = 0; i < count; ++i) {
> unsigned dst_index = start_slot + i;
>
> -   if (vb[i].is_user_buffer) {
> +   /* user buffers without any data are treated as resource buffers
> +* without a resource
> +*/
> +   if (vb[i].is_user_buffer && !vb[i].buffer.user)
> +  nvc0->vtxbuf[i].is_user_buffer = false;
> +
> +   if (nvc0->vtxbuf[i].is_user_buffer) {
>nvc0->vbo_user |= 1 << dst_index;
>if (!vb[i].stride && nvc0->screen->eng3d->oclass < GM107_3D_CLASS)
>   nvc0->constant_vbos |= 1 << dst_index;
> --
> 2.17.1
>


Re: [Mesa-dev] [PATCH v2 3/6] radeonsi:optimizing SET_CONTEXT_REG for shaders VS

2018-10-08 Thread Michel Dänzer
On 2018-10-03 5:53 p.m., Sonny Jiang wrote:
> Signed-off-by: Sonny Jiang 

Unfortunately, this change causes GPU hangs with the radeon kernel
driver on Kaveri, see the attached dmesg excerpt (this might have been
with later patches from the series still applied, but I've had to revert
those in addition to this one, or there are conflicts).


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer


kern.log.gz
Description: application/gzip


[Mesa-dev] [PATCH] nvc0: handle user buffers without any data

2018-10-08 Thread Rhys Perry
Fixes crash in piglit's gl-3.1-vao-broken-attrib.

Signed-off-by: Rhys Perry 
---
 src/gallium/drivers/nouveau/nvc0/nvc0_state.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_state.c 
b/src/gallium/drivers/nouveau/nvc0/nvc0_state.c
index f2393cb27b..86a8bebc25 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_state.c
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_state.c
@@ -972,7 +972,13 @@ nvc0_set_vertex_buffers(struct pipe_context *pipe,
 for (i = 0; i < count; ++i) {
unsigned dst_index = start_slot + i;
 
-   if (vb[i].is_user_buffer) {
+   /* user buffers without any data are treated as resource buffers
+* without a resource
+*/
+   if (vb[i].is_user_buffer && !vb[i].buffer.user)
+  nvc0->vtxbuf[i].is_user_buffer = false;
+
+   if (nvc0->vtxbuf[i].is_user_buffer) {
   nvc0->vbo_user |= 1 << dst_index;
   if (!vb[i].stride && nvc0->screen->eng3d->oclass < GM107_3D_CLASS)
  nvc0->constant_vbos |= 1 << dst_index;
-- 
2.17.1



[Mesa-dev] [PATCH] nvc0: handle possible null dereference in nvc0_validate_vertex_buffers

2018-10-08 Thread Rhys Perry
It's valid to have a vertex buffer with a NULL resource.

Signed-off-by: Rhys Perry 
---
 src/gallium/drivers/nouveau/nvc0/nvc0_vbo.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_vbo.c 
b/src/gallium/drivers/nouveau/nvc0/nvc0_vbo.c
index 66de6d9e2f..7871d50ab9 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_vbo.c
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_vbo.c
@@ -346,6 +346,10 @@ nvc0_validate_vertex_buffers(struct nvc0_context *nvc0)
  /* address/value set in nvc0_update_user_vbufs */
  continue;
   }
+  if (!vb->buffer.resource) {
+ IMMED_NVC0(push, NVC0_3D(VERTEX_ARRAY_FETCH(i)), 0);
+ continue;
+  }
   res = nv04_resource(vb->buffer.resource);
   offset = ve->pipe.src_offset + vb->buffer_offset;
   limit = vb->buffer.resource->width0 - 1;
-- 
2.17.1



Re: [Mesa-dev] [PATCH 06/12] util: Add tests for fast integer division by constants

2018-10-08 Thread Jason Ekstrand
On Sat, Oct 6, 2018 at 1:52 PM Marek Olšák  wrote:

> With my comments addressed, patches 2 - 6 are:
>
> Reviewed-by: Marek Olšák 
>

Thanks!  Unfortunately, the tests require patch 1 so it'd be nice if that
got review by someone.  I'd also be happy to pull in someone else's more
vetted code for large integer multiplication but I had trouble finding any
that was liberally licensed.  Maybe I just didn't look hard enough?


> Since I will need to compute the division terms during draw calls, I
> may need to switch the math to uint32_t for my case (e.g. via a C++
> template).
>

There's very little change in the 32 vs. 64-bit versions but if you're
worried about the 64-bit arithmetic, it'd be easy enough to have a version
that does 32-bit arithmetic and only use the 64-bit version when actually
needed.
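
To make the 32-bit case concrete, here is a minimal sketch of unsigned division by a runtime-invariant divisor using only 32-bit operands and a single widening multiply. It follows the classic round-up multiply-and-shift scheme; the names udiv32_info, udiv32_init and udiv32 are hypothetical and are not the interface added by this series, which may compute its terms differently.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names, for illustration only. */
struct udiv32_info {
   uint32_t multiplier;
   unsigned shift1;
   unsigned shift2;
};

/* Precompute the terms for dividing by d (d must be non-zero). */
static struct udiv32_info
udiv32_init(uint32_t d)
{
   assert(d != 0);

   /* l = ceil(log2(d)) */
   unsigned l = 0;
   while (l < 32 && (UINT64_C(1) << l) < d)
      l++;

   struct udiv32_info info;
   /* floor(2^32 * (2^l - d) / d) + 1.  Since 2^l - d < 2^31, the
    * product stays below 2^63 and the result fits in 32 bits.
    */
   info.multiplier =
      (uint32_t)(((UINT64_C(1) << 32) * ((UINT64_C(1) << l) - d)) / d + 1);
   info.shift1 = l < 1 ? l : 1;
   info.shift2 = l > 1 ? l - 1 : 0;
   return info;
}

/* Compute n / d using the precomputed terms: one 32x32->64 multiply,
 * one subtract, one add and two shifts.
 */
static uint32_t
udiv32(uint32_t n, struct udiv32_info info)
{
   uint32_t t = (uint32_t)(((uint64_t)info.multiplier * n) >> 32);
   return (t + ((n - t) >> info.shift1)) >> info.shift2;
}
```

The setup cost is one 64-bit division per divisor, so this only pays off when the same divisor is reused for many numerators.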

--Jason


> Marek
>
> On Sat, Oct 6, 2018 at 12:11 AM Jason Ekstrand 
> wrote:
> > While I generally trust ridiculousfish to have done his homework, we've
> > made some adjustments to suit the needs of mesa and it'd be good to
> > test those.  Also, there's no better place than unit tests to clearly
> > document the different edge cases of the different methods.
> > ---
> >  configure.ac  |   1 +
> >  src/util/Makefile.am  |   3 +-
> >  src/util/meson.build  |   1 +
> >  src/util/tests/fast_idiv_by_const/Makefile.am |  43 ++
> >  .../fast_idiv_by_const_test.cpp   | 472 ++
> >  src/util/tests/fast_idiv_by_const/meson.build |  30 ++
> >  6 files changed, 549 insertions(+), 1 deletion(-)
> >  create mode 100644 src/util/tests/fast_idiv_by_const/Makefile.am
> >  create mode 100644
> src/util/tests/fast_idiv_by_const/fast_idiv_by_const_test.cpp
> >  create mode 100644 src/util/tests/fast_idiv_by_const/meson.build
> >
> > diff --git a/configure.ac b/configure.ac
> > index 34689826c98..7b0b2b20ba2 100644
> > --- a/configure.ac
> > +++ b/configure.ac
> > @@ -3198,6 +3198,7 @@ AC_CONFIG_FILES([Makefile
> >   src/util/tests/hash_table/Makefile
> >   src/util/tests/set/Makefile
> >   src/util/tests/string_buffer/Makefile
> > + src/util/tests/uint_inverse/Makefile
> >   src/util/tests/vma/Makefile
> >   src/util/xmlpool/Makefile
> >   src/vulkan/Makefile])
> > diff --git a/src/util/Makefile.am b/src/util/Makefile.am
> > index d79f2b320be..9e633bf65d5 100644
> > --- a/src/util/Makefile.am
> > +++ b/src/util/Makefile.am
> > @@ -24,7 +24,8 @@ SUBDIRS = . \
> > tests/fast_idiv_by_const \
> > tests/hash_table \
> > tests/string_buffer \
> > -   tests/set
> > +   tests/set \
> > +   tests/uint_inverse
> >
> >  if HAVE_STD_CXX11
> >  SUBDIRS += tests/vma
> > diff --git a/src/util/meson.build b/src/util/meson.build
> > index cdbad98e7cb..49d84c16ebe 100644
> > --- a/src/util/meson.build
> > +++ b/src/util/meson.build
> > @@ -170,6 +170,7 @@ if with_tests
> >  )
> >)
> >
> > +  subdir('tests/fast_idiv_by_const')
> >subdir('tests/hash_table')
> >subdir('tests/string_buffer')
> >subdir('tests/vma')
> > diff --git a/src/util/tests/fast_idiv_by_const/Makefile.am
> b/src/util/tests/fast_idiv_by_const/Makefile.am
> > new file mode 100644
> > index 000..1ebee09f59b
> > --- /dev/null
> > +++ b/src/util/tests/fast_idiv_by_const/Makefile.am
> > @@ -0,0 +1,43 @@
> > +# Copyright © 2018 Intel
> > +#
> > +#  Permission is hereby granted, free of charge, to any person
> obtaining a
> > +#  copy of this software and associated documentation files (the
> "Software"),
> > +#  to deal in the Software without restriction, including without
> limitation
> > +#  the rights to use, copy, modify, merge, publish, distribute,
> sublicense,
> > +#  and/or sell copies of the Software, and to permit persons to whom the
> > +#  Software is furnished to do so, subject to the following conditions:
> > +#
> > +#  The above copyright notice and this permission notice (including the
> next
> > +#  paragraph) shall be included in all copies or substantial portions
> of the
> > +#  Software.
> > +#
> > +#  THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
> EXPRESS OR
> > +#  IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
> MERCHANTABILITY,
> > +#  FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT
> SHALL
> > +#  THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
> OTHER
> > +#  LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
> ARISING
> > +#  FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
> DEALINGS
> > +#  IN THE SOFTWARE.
> > +
> > +AM_CPPFLAGS = \
> > +   -I$(top_srcdir)/src \
> > +   -I$(top_srcdir)/include \
> > +   -I$(top_srcdir)/src/gallium/include \
> > +   -I$(top_srcdir)/src/gtest/include \
> > +   $(PTHREAD_CFLAGS) \
> > +   $(DEFINES)
> > +
> > +TESTS = 

Re: [Mesa-dev] [PATCH] nir: Add a pass for gathering transform feedback info

2018-10-08 Thread Jason Ekstrand
On Mon, Oct 8, 2018 at 10:22 AM Alejandro Piñeiro 
wrote:

> I was not able to finish trying to get ARB_gl_spirv using this pass. The
> major difference is that on ARB_gl_spirv (and afaiu on GLSL too) we are
> merging the info of all the available xfb varyings from all the stages,
> while this pass gathers info from a individual nir shader (so one
> individual stage).
>
> Having said so, while using this pass, I found some issues/questions,
> see below inline.
>
>
> On 05/10/18 16:13, Jason Ekstrand wrote:
> > This is different from the GL_ARB_spirv pass because it generates a much
> > simpler data structure that isn't tied to OpenGL and mtypes.h.
> > ---
> >  src/compiler/Makefile.sources  |   4 +-
> >  src/compiler/nir/meson.build   |   2 +
> >  src/compiler/nir/nir_gather_xfb_info.c | 150 +
> >  src/compiler/nir/nir_xfb_info.h|  59 ++
> >  4 files changed, 214 insertions(+), 1 deletion(-)
> >  create mode 100644 src/compiler/nir/nir_gather_xfb_info.c
> >  create mode 100644 src/compiler/nir/nir_xfb_info.h
> >
> > diff --git a/src/compiler/Makefile.sources
> b/src/compiler/Makefile.sources
> > index d3b06564832..46ed5e47b46 100644
> > --- a/src/compiler/Makefile.sources
> > +++ b/src/compiler/Makefile.sources
> > @@ -216,6 +216,7 @@ NIR_FILES = \
> >   nir/nir_format_convert.h \
> >   nir/nir_from_ssa.c \
> >   nir/nir_gather_info.c \
> > + nir/nir_gather_xfb_info.c \
> >   nir/nir_gs_count_vertices.c \
> >   nir/nir_inline_functions.c \
> >   nir/nir_instr_set.c \
> > @@ -307,7 +308,8 @@ NIR_FILES = \
> >   nir/nir_validate.c \
> >   nir/nir_vla.h \
> >   nir/nir_worklist.c \
> > - nir/nir_worklist.h
> > + nir/nir_worklist.h \
> > + nir/nir_xfb_info.h
> >
> >  SPIRV_GENERATED_FILES = \
> >   spirv/spirv_info.c \
> > diff --git a/src/compiler/nir/meson.build b/src/compiler/nir/meson.build
> > index 090aa7a628f..b416e561eb0 100644
> > --- a/src/compiler/nir/meson.build
> > +++ b/src/compiler/nir/meson.build
> > @@ -100,6 +100,7 @@ files_libnir = files(
> >'nir_format_convert.h',
> >'nir_from_ssa.c',
> >'nir_gather_info.c',
> > +  'nir_gather_xfb_info.c',
> >'nir_gs_count_vertices.c',
> >'nir_inline_functions.c',
> >'nir_instr_set.c',
> > @@ -192,6 +193,7 @@ files_libnir = files(
> >'nir_vla.h',
> >'nir_worklist.c',
> >'nir_worklist.h',
> > +  'nir_xfb_info.h',
> >'../spirv/GLSL.ext.AMD.h',
> >'../spirv/GLSL.std.450.h',
> >'../spirv/gl_spirv.c',
> > diff --git a/src/compiler/nir/nir_gather_xfb_info.c
> b/src/compiler/nir/nir_gather_xfb_info.c
> > new file mode 100644
> > index 000..a53703bb9bf
> > --- /dev/null
> > +++ b/src/compiler/nir/nir_gather_xfb_info.c
> > @@ -0,0 +1,150 @@
> > +/*
> > + * Copyright © 2018 Intel Corporation
> > + *
> > + * Permission is hereby granted, free of charge, to any person
> obtaining a
> > + * copy of this software and associated documentation files (the
> "Software"),
> > + * to deal in the Software without restriction, including without
> limitation
> > + * the rights to use, copy, modify, merge, publish, distribute,
> sublicense,
> > + * and/or sell copies of the Software, and to permit persons to whom the
> > + * Software is furnished to do so, subject to the following conditions:
> > + *
> > + * The above copyright notice and this permission notice (including the
> next
> > + * paragraph) shall be included in all copies or substantial portions
> of the
> > + * Software.
> > + *
> > + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
> EXPRESS OR
> > + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
> MERCHANTABILITY,
> > + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT
> SHALL
> > + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
> OTHER
> > + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
> ARISING
> > + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
> DEALINGS
> > + * IN THE SOFTWARE.
> > + */
> > +
> > +#include "nir_xfb_info.h"
> > +
> > +#include 
> > +
> > +static void
> > +add_var_xfb_outputs(nir_xfb_info *xfb,
> > +nir_variable *var,
> > +unsigned *location,
> > +unsigned *offset,
> > +const struct glsl_type *type)
> > +{
> > +   if (glsl_type_is_array(type) || glsl_type_is_matrix(type)) {
> > +  unsigned length = glsl_get_length(type);
> > +  const struct glsl_type *child_type = glsl_get_array_element(type);
> > +  for (unsigned i = 0; i < length; i++)
> > + add_var_xfb_outputs(xfb, var, location, offset, child_type);
> > +   } else if (glsl_type_is_struct(type)) {
> > +  unsigned length = glsl_get_length(type);
> > +  for (unsigned i = 0; i < length; i++) {
> > + const struct glsl_type *child_type =
> glsl_get_struct_field(type, i);
> > + 

Re: [Mesa-dev] [PATCH mesa] radv: add missing meson c++ visibility arguments

2018-10-08 Thread Dylan Baker
Reviewed-by: Dylan Baker 

You could add radv_flags for future safety, but it's not used currently so this
should fix things.

Quoting Eric Engestrom (2018-10-08 08:25:58)
> Fixes: 6f3aee40f90d725653b6 "radv: using tls to store llvm related info
>  and speed up compiles (v10)"
> Cc: Dave Airlie 
> Signed-off-by: Eric Engestrom 
> ---
>  src/amd/vulkan/meson.build | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/src/amd/vulkan/meson.build b/src/amd/vulkan/meson.build
> index 9ceaeb6f00246f03b14a..87a90ffe849ad2baf346 100644
> --- a/src/amd/vulkan/meson.build
> +++ b/src/amd/vulkan/meson.build
> @@ -144,6 +144,7 @@ libvulkan_radeon = shared_library(
>  idep_nir,
>],
>c_args : [c_vis_args, no_override_init_args, radv_flags],
> +  cpp_args : cpp_vis_args,
>link_args : [ld_args_bsymbolic, ld_args_gc_sections],
>install : true,
>  )
> -- 
> Cheers,
>   Eric
> 


signature.asc
Description: signature


[Mesa-dev] [PATCH mesa] radv: add missing meson c++ visibility arguments

2018-10-08 Thread Eric Engestrom
Fixes: 6f3aee40f90d725653b6 "radv: using tls to store llvm related info
 and speed up compiles (v10)"
Cc: Dave Airlie 
Signed-off-by: Eric Engestrom 
---
 src/amd/vulkan/meson.build | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/amd/vulkan/meson.build b/src/amd/vulkan/meson.build
index 9ceaeb6f00246f03b14a..87a90ffe849ad2baf346 100644
--- a/src/amd/vulkan/meson.build
+++ b/src/amd/vulkan/meson.build
@@ -144,6 +144,7 @@ libvulkan_radeon = shared_library(
 idep_nir,
   ],
   c_args : [c_vis_args, no_override_init_args, radv_flags],
+  cpp_args : cpp_vis_args,
   link_args : [ld_args_bsymbolic, ld_args_gc_sections],
   install : true,
 )
-- 
Cheers,
  Eric



Re: [Mesa-dev] [PATCH] nir: Add a pass for gathering transform feedback info

2018-10-08 Thread Alejandro Piñeiro
I was not able to finish trying to get ARB_gl_spirv using this pass. The
major difference is that on ARB_gl_spirv (and afaiu on GLSL too) we are
merging the info of all the available xfb varyings from all the stages,
while this pass gathers info from a individual nir shader (so one
individual stage).

Having said so, while using this pass, I found some issues/questions,
see below inline.


On 05/10/18 16:13, Jason Ekstrand wrote:
> This is different from the GL_ARB_spirv pass because it generates a much
> simpler data structure that isn't tied to OpenGL and mtypes.h.
> ---
>  src/compiler/Makefile.sources  |   4 +-
>  src/compiler/nir/meson.build   |   2 +
>  src/compiler/nir/nir_gather_xfb_info.c | 150 +
>  src/compiler/nir/nir_xfb_info.h|  59 ++
>  4 files changed, 214 insertions(+), 1 deletion(-)
>  create mode 100644 src/compiler/nir/nir_gather_xfb_info.c
>  create mode 100644 src/compiler/nir/nir_xfb_info.h
>
> diff --git a/src/compiler/Makefile.sources b/src/compiler/Makefile.sources
> index d3b06564832..46ed5e47b46 100644
> --- a/src/compiler/Makefile.sources
> +++ b/src/compiler/Makefile.sources
> @@ -216,6 +216,7 @@ NIR_FILES = \
>   nir/nir_format_convert.h \
>   nir/nir_from_ssa.c \
>   nir/nir_gather_info.c \
> + nir/nir_gather_xfb_info.c \
>   nir/nir_gs_count_vertices.c \
>   nir/nir_inline_functions.c \
>   nir/nir_instr_set.c \
> @@ -307,7 +308,8 @@ NIR_FILES = \
>   nir/nir_validate.c \
>   nir/nir_vla.h \
>   nir/nir_worklist.c \
> - nir/nir_worklist.h
> + nir/nir_worklist.h \
> + nir/nir_xfb_info.h
>  
>  SPIRV_GENERATED_FILES = \
>   spirv/spirv_info.c \
> diff --git a/src/compiler/nir/meson.build b/src/compiler/nir/meson.build
> index 090aa7a628f..b416e561eb0 100644
> --- a/src/compiler/nir/meson.build
> +++ b/src/compiler/nir/meson.build
> @@ -100,6 +100,7 @@ files_libnir = files(
>'nir_format_convert.h',
>'nir_from_ssa.c',
>'nir_gather_info.c',
> +  'nir_gather_xfb_info.c',
>'nir_gs_count_vertices.c',
>'nir_inline_functions.c',
>'nir_instr_set.c',
> @@ -192,6 +193,7 @@ files_libnir = files(
>'nir_vla.h',
>'nir_worklist.c',
>'nir_worklist.h',
> +  'nir_xfb_info.h',
>'../spirv/GLSL.ext.AMD.h',
>'../spirv/GLSL.std.450.h',
>'../spirv/gl_spirv.c',
> diff --git a/src/compiler/nir/nir_gather_xfb_info.c 
> b/src/compiler/nir/nir_gather_xfb_info.c
> new file mode 100644
> index 000..a53703bb9bf
> --- /dev/null
> +++ b/src/compiler/nir/nir_gather_xfb_info.c
> @@ -0,0 +1,150 @@
> +/*
> + * Copyright © 2018 Intel Corporation
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice (including the next
> + * paragraph) shall be included in all copies or substantial portions of the
> + * Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER 
> DEALINGS
> + * IN THE SOFTWARE.
> + */
> +
> +#include "nir_xfb_info.h"
> +
> +#include 
> +
> +static void
> +add_var_xfb_outputs(nir_xfb_info *xfb,
> +nir_variable *var,
> +unsigned *location,
> +unsigned *offset,
> +const struct glsl_type *type)
> +{
> +   if (glsl_type_is_array(type) || glsl_type_is_matrix(type)) {
> +  unsigned length = glsl_get_length(type);
> +  const struct glsl_type *child_type = glsl_get_array_element(type);
> +  for (unsigned i = 0; i < length; i++)
> + add_var_xfb_outputs(xfb, var, location, offset, child_type);
> +   } else if (glsl_type_is_struct(type)) {
> +  unsigned length = glsl_get_length(type);
> +  for (unsigned i = 0; i < length; i++) {
> + const struct glsl_type *child_type = glsl_get_struct_field(type, i);
> + add_var_xfb_outputs(xfb, var, location, offset, child_type);
> +  }
> +   } else {
> +  assert(var->data.xfb_buffer < NIR_MAX_XFB_BUFFERS);
> +  if (xfb->buffers_written & (1 << var->data.xfb_buffer)) {
> + assert(xfb->strides[var->data.xfb_buffer] == var->data.xfb_stride);
> + 

Re: [Mesa-dev] EGL_MESA_query_renderer

2018-10-08 Thread Haehnle, Nicolai
Hi Veluri,

On 07.10.2018 21:31, Veluri Mithun wrote:
> All these days I worked on packaging since I didn't find much time last 
> month in my new academic schedule, I finished it if you wish you may 
> download it 
> here(https://flathub.org/apps/details/br.com.jeanhertel.adriconf). 
> Currently, it can configure drivers in X11 server.

That's awesome :)


> I started to work on this extension and I have few doubts
> 
>  1. 
> https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/glx/dri_glx.c#L254
> this functions gets the driver configs in X11. what is the use of
> this line? what does it do? why we are doing mutex_lock at start and
> mutex_unlock at end? I saw this in many other functions also.

Are you asking what a mutex is? Just look it up on Wikipedia :)

In this particular case, the function is reading and possibly writing a 
global data structure, namely the driver_config_cache.

If multiple threads were to access the structure simultaneously, we'd 
obviously be in serious trouble (e.g., two threads modify the list at 
the same time, leading to memory leaks or, worse, corruption), and the 
mutex protects against this.

Remember, there's nothing preventing two threads from calling that 
function simultaneously.

(In some other cases in GL / GLX, some protections exist. For example, 
two threads cannot simultaneously call a GL function on the same 
context. So a GL function which only accesses per-context data doesn't 
need any locking. However, GL functions that access data which may be 
shared between contexts, such as textures, do need locking.)
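
As a rough illustration of the pattern described above, here is a minimal sketch of a global cache guarded by a mutex. It is not the code from dri_glx.c; the structure and function names (config_cache_entry, get_or_add_config) are made up for this example, and the real cache stores different data.

```c
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical cache entry; the real driver_config_cache differs. */
struct config_cache_entry {
   struct config_cache_entry *next;
   char *driver_name;
   char *config;
};

/* Global state shared by every thread that calls into the library. */
static struct config_cache_entry *config_cache;
static pthread_mutex_t config_cache_mutex = PTHREAD_MUTEX_INITIALIZER;

static char *
copy_string(const char *s)
{
   size_t len = strlen(s) + 1;
   char *copy = malloc(len);
   if (copy)
      memcpy(copy, s, len);
   return copy;
}

/* Return the cached config for driver_name, adding it first if missing.
 * The whole read-modify-write sequence runs with the mutex held, so two
 * threads can never walk and modify the list at the same time.
 */
static const char *
get_or_add_config(const char *driver_name, const char *config)
{
   const char *result = NULL;
   struct config_cache_entry *e;

   pthread_mutex_lock(&config_cache_mutex);

   for (e = config_cache; e; e = e->next) {
      if (strcmp(e->driver_name, driver_name) == 0) {
         result = e->config;
         break;
      }
   }

   if (!result) {
      e = calloc(1, sizeof(*e));
      if (e) {
         e->driver_name = copy_string(driver_name);
         e->config = copy_string(config);
         if (e->driver_name && e->config) {
            e->next = config_cache;
            config_cache = e;
            result = e->config;
         } else {
            free(e->driver_name);
            free(e->config);
            free(e);
         }
      }
   }

   pthread_mutex_unlock(&config_cache_mutex);
   return result;
}
```

Without the lock, two threads that both miss in the cache could each insert an entry for the same driver, or one could read the list head while the other is halfway through updating it.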


> On Wed, Aug 22, 2018 at 11:23 AM Nicolai Hähnle wrote:
> 
> In a separate email, Rob wrote:
> 
>   > so, it was earlier discussed that
>   > glXGetScreenDriver()/glXGetDriverConfig() equivalents could be
> lumped
>   > into this extension, which is I guess not what you have done.
> 
> I'm fairly agnostic on this, but if you do lump it into one extension,
> please make the GetDriverConfig part optional.
> 
> 2. How to make this part optional?

The extension should probably say somewhere that the function must 
exist, but it's correct to return NULL instead of a string containing 
the XML which describes driver config options.

(There are other possibilities, but that one just makes sense to me.)

Cheers,
Nicolai


> 
> 
> There are non-Mesa drivers which implement GLX_MESA_query_renderer, and
> it'd be good if the same were at least possible for
> EGL_MESA_query_renderer as well.
> 
> Cheers,
> Nicolai
> 
> 
> Cheers,
> Veluri.



Re: [Mesa-dev] [PATCH 08/11] freedreno: a2xx: split large draws on a20x

2018-10-08 Thread Jonathan Marek

Hi,

You're right, it would be easy to do. I'll include it in my next submission.

On 10/08/2018 12:13 AM, Ilia Mirkin wrote:

See my feedback from your earlier submission for how to make this work
on more than triangles. Seems easy enough to just do it.

https://patchwork.freedesktop.org/patch/250192/
On Mon, Oct 8, 2018 at 12:07 AM Jonathan Marek  wrote:


a20x can only draw 65535 vertices at once. this fix only applies to
triangles.
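
For reference, one possible way to extend the splitting beyond triangles is sketched below. It only covers list primitives (points, lines, triangles), where a draw can be cut at any primitive boundary; strips and fans would need overlapping vertices and are not handled. All names here (split_list_draw, emit_draw_fn, draw_log, record_draw) are hypothetical and are not part of the freedreno code.

```c
#include <assert.h>
#include <stdint.h>

/* Hardware limit described above: at most 65535 vertices per draw. */
#define MAX_VERTS_PER_DRAW 0xffffu

/* Hypothetical callback that emits one hardware draw. */
typedef void (*emit_draw_fn)(uint32_t start, uint32_t count, void *data);

/* Split a list-primitive draw into chunks that respect the limit.
 * Each chunk size is rounded down to a multiple of verts_per_prim so
 * that no primitive straddles two draws.  Returns the number of draws.
 */
static unsigned
split_list_draw(uint32_t start, uint32_t count, uint32_t verts_per_prim,
                emit_draw_fn emit, void *data)
{
   assert(verts_per_prim >= 1 && verts_per_prim <= MAX_VERTS_PER_DRAW);

   uint32_t chunk = MAX_VERTS_PER_DRAW - (MAX_VERTS_PER_DRAW % verts_per_prim);
   unsigned draws = 0;

   while (count > 0) {
      uint32_t n = count < chunk ? count : chunk;
      emit(start, n, data);
      start += n;
      count -= n;
      draws++;
   }
   return draws;
}

/* Example callback that just records what would have been drawn. */
struct draw_log {
   uint32_t total;
   uint32_t max_count;
   unsigned calls;
};

static void
record_draw(uint32_t start, uint32_t count, void *data)
{
   struct draw_log *log = data;
   (void)start;
   log->total += count;
   if (count > log->max_count)
      log->max_count = count;
   log->calls++;
}
```

For triangles the chunk size stays at 65535, since that is already a multiple of 3; for lines it drops to 65534.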

Signed-off-by: Jonathan Marek 
---
  src/gallium/drivers/freedreno/a2xx/fd2_draw.c | 30 +--
  1 file changed, 28 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/freedreno/a2xx/fd2_draw.c 
b/src/gallium/drivers/freedreno/a2xx/fd2_draw.c
index 1792505808..7ccbee587f 100644
--- a/src/gallium/drivers/freedreno/a2xx/fd2_draw.c
+++ b/src/gallium/drivers/freedreno/a2xx/fd2_draw.c
@@ -171,8 +171,34 @@ fd2_draw_vbo(struct fd_context *ctx, const struct 
pipe_draw_info *pinfo,
 fd2_emit_state(ctx, ctx->batch->draw, ctx->dirty);
 fd2_emit_state(ctx, ctx->batch->binning, ctx->dirty);

-   draw_impl(ctx, pinfo, ctx->batch->draw, index_offset, false);
-   draw_impl(ctx, pinfo, ctx->batch->binning, index_offset, true);
+   /* a20x can only draw 65535 vertices at once... */
+   if (is_a20x(ctx->screen) && pinfo->count > 0xffff) {
+   struct pipe_draw_info info = *pinfo;
+   unsigned count = info.count;
+   unsigned num_vertices = ctx->batch->num_vertices;
+
+   /* other primitives require more work
+* (triangles works because 0xffff is divisible by 3)
+*/
+   if (info.mode != PIPE_PRIM_TRIANGLES)
+   return false;
+
+   for (; count; ) {
+   info.count = MIN2(count, 0xffff);
+
+   draw_impl(ctx, &info, ctx->batch->draw, index_offset, 
false);
+   draw_impl(ctx, &info, ctx->batch->binning, 
index_offset, true);
+
+   info.start += 0xffff;
+   ctx->batch->num_vertices += 0xffff;
+   count -= info.count;
+   }
+   /* changing this value is a hack, restore it */
+   ctx->batch->num_vertices = num_vertices;
+   } else {
+   draw_impl(ctx, pinfo, ctx->batch->draw, index_offset, false);
+   draw_impl(ctx, pinfo, ctx->batch->binning, index_offset, true);
+   }

 fd_context_all_clean(ctx);

--
2.17.1



Re: [Mesa-dev] [PATCH 0/3] sse4 patches

2018-10-08 Thread Tapani Pälli


On 10/8/18 9:23 AM, Kenneth Graunke wrote:

On Monday, September 24, 2018 4:19:36 AM PDT Tapani Pälli wrote:

Hi;

Here's another try to inline sse41 code and get rid of gtt maps
in intel_miptree_map (revert 58fb613a519). To be able to safely
utilize sse41 we separate sse41 functionality as a library and
then choose run time if we want to use it.

Couple of different approaches were tried, this one seems one with
minimal overall changes.

// Tapani

Scott D Phillips (2):
   i965/tiled_memcpy: inline movntdqa loads in tiled_to_linear
   i965/miptree: Use cpu tiling/detiling when mapping

Tapani Pälli (1):
   i965: expose type of memcpy instead of memcpy function itself

  src/mesa/drivers/dri/i965/Android.mk  |  38 
  src/mesa/drivers/dri/i965/Makefile.am |  14 ++
  src/mesa/drivers/dri/i965/Makefile.sources|  10 +-
  src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 110 +-
  src/mesa/drivers/dri/i965/intel_pixel_read.c  |   6 +-
  src/mesa/drivers/dri/i965/intel_tex_image.c   |  14 +-
  .../drivers/dri/i965/intel_tiled_memcpy.c | 192 ++
  .../drivers/dri/i965/intel_tiled_memcpy.h |  86 +++-
  .../dri/i965/intel_tiled_memcpy_normal.c  |  59 ++
  .../dri/i965/intel_tiled_memcpy_sse41.c   |  61 ++
  .../dri/i965/intel_tiled_memcpy_sse41.h   |  59 ++
  src/mesa/drivers/dri/i965/meson.build |  38 +++-
  12 files changed, 579 insertions(+), 108 deletions(-)
  create mode 100644 src/mesa/drivers/dri/i965/intel_tiled_memcpy_normal.c
  create mode 100644 src/mesa/drivers/dri/i965/intel_tiled_memcpy_sse41.c
  create mode 100644 src/mesa/drivers/dri/i965/intel_tiled_memcpy_sse41.h




Thanks a ton for fixing this up, Tapani!

Patches 1 and 3 are:
Reviewed-by: Kenneth Graunke 

Patch 2 is:
Acked-by: Kenneth Graunke 



Thanks Ken! Matt, are you ok with these changes?

// Tapani


[Mesa-dev] [Bug 108113] [vulkancts] r32g32b32 transfer operations not implemented

2018-10-08 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=108113

--- Comment #1 from Samuel Pitoiset  ---
https://patchwork.freedesktop.org/series/50686/

-- 
You are receiving this mail because:
You are on the CC list for the bug.
You are the QA Contact for the bug.


[Mesa-dev] [PATCH 1/2] radv: disallow 3D images and mipmaps/layers for R32G32B32 linear formats

2018-10-08 Thread Samuel Pitoiset
R32G32B32 are weird formats and we are only going to support
some basic operations for now.

Signed-off-by: Samuel Pitoiset 
---
 src/amd/vulkan/radv_formats.c | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/src/amd/vulkan/radv_formats.c b/src/amd/vulkan/radv_formats.c
index ad06c9e996..a7aa819e2b 100644
--- a/src/amd/vulkan/radv_formats.c
+++ b/src/amd/vulkan/radv_formats.c
@@ -1091,6 +1091,20 @@ static VkResult radv_get_image_format_properties(struct 
radv_physical_device *ph
sampleCounts |= VK_SAMPLE_COUNT_2_BIT | VK_SAMPLE_COUNT_4_BIT | 
VK_SAMPLE_COUNT_8_BIT;
}
 
+   if (info->tiling == VK_IMAGE_TILING_LINEAR &&
+   (info->format == VK_FORMAT_R32G32B32_SFLOAT ||
+info->format == VK_FORMAT_R32G32B32_SINT ||
+info->format == VK_FORMAT_R32G32B32_UINT)) {
+   /* R32G32B32 is a weird format and the driver currently only
+* supports the barely minimum.
+* TODO: Implement more if we really need to.
+*/
+   if (info->type == VK_IMAGE_TYPE_3D)
+   goto unsupported;
+   maxArraySize = 1;
+   maxMipLevels = 1;
+   }
+
if (info->usage & VK_IMAGE_USAGE_SAMPLED_BIT) {
if (!(format_feature_flags & 
VK_FORMAT_FEATURE_SAMPLED_IMAGE_BIT)) {
goto unsupported;
-- 
2.19.1



[Mesa-dev] [PATCH 2/2] radv: implement clear operations for R32G32B32

2018-10-08 Thread Samuel Pitoiset
This fixes crashes for some CTS:
dEQP-VK.api.copy_and_blit.core.blit_image.all_formats.color.*.linear_*_*
dEQP-VK.api.copy_and_blit.core.blit_image.all_formats.color.*.*_linear_*

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108113
Signed-off-by: Samuel Pitoiset 
---
 src/amd/vulkan/radv_meta_bufimage.c | 268 
 src/amd/vulkan/radv_meta_clear.c|   5 +-
 src/amd/vulkan/radv_private.h   |   5 +
 3 files changed, 277 insertions(+), 1 deletion(-)

diff --git a/src/amd/vulkan/radv_meta_bufimage.c 
b/src/amd/vulkan/radv_meta_bufimage.c
index b596173fe1..79ffad5fca 100644
--- a/src/amd/vulkan/radv_meta_bufimage.c
+++ b/src/amd/vulkan/radv_meta_bufimage.c
@@ -893,6 +893,164 @@ radv_device_finish_meta_cleari_state(struct radv_device 
*device)
 state->cleari.pipeline_3d, &state->alloc);
 }
 
+/* Special path for clearing R32G32B32 images using a compute shader. */
+static nir_shader *
+build_nir_cleari_r32g32b32_compute_shader(struct radv_device *dev)
+{
+   nir_builder b;
+   const struct glsl_type *img_type = 
glsl_sampler_type(GLSL_SAMPLER_DIM_BUF,
+false,
+false,
+GLSL_TYPE_FLOAT);
+   nir_builder_init_simple_shader(&b, NULL, MESA_SHADER_COMPUTE, NULL);
+   b.shader->info.name = ralloc_strdup(b.shader, 
"meta_cleari_r32g32b32_cs");
+   b.shader->info.cs.local_size[0] = 16;
+   b.shader->info.cs.local_size[1] = 16;
+   b.shader->info.cs.local_size[2] = 1;
+
+   nir_variable *output_img = nir_variable_create(b.shader, 
nir_var_uniform,
+  img_type, "out_img");
+   output_img->data.descriptor_set = 0;
+   output_img->data.binding = 0;
+
+   nir_ssa_def *invoc_id = nir_load_system_value(&b, nir_intrinsic_load_local_invocation_id, 0);
+   nir_ssa_def *wg_id = nir_load_system_value(&b, nir_intrinsic_load_work_group_id, 0);
+   nir_ssa_def *block_size = nir_imm_ivec4(&b,
+   b.shader->info.cs.local_size[0],
+   b.shader->info.cs.local_size[1],
+   b.shader->info.cs.local_size[2], 0);
+
+   nir_ssa_def *global_id = nir_iadd(&b, nir_imul(&b, wg_id, block_size), invoc_id);
+
+   nir_intrinsic_instr *clear_val = nir_intrinsic_instr_create(b.shader, nir_intrinsic_load_push_constant);
+   nir_intrinsic_set_base(clear_val, 0);
+   nir_intrinsic_set_range(clear_val, 16);
+   clear_val->src[0] = nir_src_for_ssa(nir_imm_int(&b, 0));
+   clear_val->num_components = 3;
+   nir_ssa_dest_init(&clear_val->instr, &clear_val->dest, 3, 32, "clear_value");
+   nir_builder_instr_insert(&b, &clear_val->instr);
+
+   nir_intrinsic_instr *stride = nir_intrinsic_instr_create(b.shader, nir_intrinsic_load_push_constant);
+   nir_intrinsic_set_base(stride, 0);
+   nir_intrinsic_set_range(stride, 16);
+   stride->src[0] = nir_src_for_ssa(nir_imm_int(&b, 12));
+   stride->num_components = 1;
+   nir_ssa_dest_init(&stride->instr, &stride->dest, 1, 32, "stride");
+   nir_builder_instr_insert(&b, &stride->instr);
+
+   nir_ssa_def *global_x = nir_channel(&b, global_id, 0);
+   nir_ssa_def *global_y = nir_channel(&b, global_id, 1);
+
+   nir_ssa_def *global_pos =
+   nir_iadd(&b,
+nir_imul(&b, global_y, &stride->dest.ssa),
+nir_imul(&b, global_x, nir_imm_int(&b, 3)));
+
+   for (unsigned chan = 0; chan < 3; chan++) {
+   nir_ssa_def *local_pos =
+   nir_iadd(&b, global_pos, nir_imm_int(&b, chan));
+
+   nir_ssa_def *coord =
+   nir_vec4(&b, local_pos, local_pos, local_pos, local_pos);
+
+   nir_intrinsic_instr *store = nir_intrinsic_instr_create(b.shader, nir_intrinsic_image_deref_store);
+   store->num_components = 1;
+   store->src[0] = nir_src_for_ssa(&nir_build_deref_var(&b, output_img)->dest.ssa);
+   store->src[1] = nir_src_for_ssa(coord);
+   store->src[2] = nir_src_for_ssa(nir_ssa_undef(&b, 1, 32));
+   store->src[3] = nir_src_for_ssa(nir_channel(&b, &clear_val->dest.ssa, chan));
+   nir_builder_instr_insert(&b, &store->instr);
+   }
+
+   return b.shader;
+}
+
+static VkResult
+radv_device_init_meta_cleari_r32g32b32_state(struct radv_device *device)
+{
+   VkResult result;
+   struct radv_shader_module cs = { .nir = NULL };
+
+   cs.nir = build_nir_cleari_r32g32b32_compute_shader(device);
+
+   VkDescriptorSetLayoutCreateInfo ds_create_info = {
+   .sType = VK_STRUCTURE_TYPE_DESCRIPTOR_SET_LAYOUT_CREATE_INFO,
+   .flags = VK_DESCRIPTOR_SET_LAYOUT_CREATE_PUSH_DESCRIPTOR_BIT_KHR,
+   .bindingCount = 1,
+   .pBindings = 

[Mesa-dev] [Bug 93089] mesa fails to check for gcc atomic primitives before using them

2018-10-08 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=93089

--- Comment #21 from freedesk...@eckner.net ---
This also popped up on archlinux32 when trying to package mesa for i486:

/usr/bin/ld: src/intel/vulkan/libanv_common.a(anv_allocator.c.o): in function
`anv_block_pool_alloc_new':
anv_allocator.c:(.text+0x1aa): undefined reference to `__sync_fetch_and_add_8'
/usr/bin/ld: anv_allocator.c:(.text+0x307): undefined reference to
`__sync_lock_test_and_set_8'
/usr/bin/ld: src/intel/vulkan/libanv_common.a(anv_allocator.c.o): in function
`anv_free_list_pop':
anv_allocator.c:(.text+0x440): undefined reference to
`__sync_val_compare_and_swap_8'
/usr/bin/ld: src/intel/vulkan/libanv_common.a(anv_allocator.c.o): in function
`anv_free_list_push':
anv_allocator.c:(.text+0x526): undefined reference to
`__sync_val_compare_and_swap_8'
/usr/bin/ld: src/intel/vulkan/libanv_common.a(anv_allocator.c.o): in function
`anv_state_pool_alloc_no_vg':
anv_allocator.c:(.text+0x755): undefined reference to `__sync_fetch_and_add_8'
/usr/bin/ld: anv_allocator.c:(.text+0x7a3): undefined reference to
`__sync_lock_test_and_set_8'


I seem unable to find a source file (only the mentioned *.a and *.o) which
actually tries to use these atomics. Any idea where I should look, or what I
should try to patch?
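
As a hedged illustration of where such references can come from: the `_8` suffix is what GCC uses for the 8-byte variants of the legacy `__sync_*` builtins. When the target has no native 64-bit atomic instruction (a plain i486 lacks `cmpxchg8b`), GCC emits a call to an out-of-line helper instead of inlining the operation, and nothing provides that helper by default. The object name in the linker output suggests src/intel/vulkan/anv_allocator.c as the place to look, though that mapping is an assumption here. The names `counter` and `bump` below are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 64-bit counter, standing in for any 8-byte state word. */
static uint64_t counter;

/* On x86-64 or i586+ this is inlined as a lock-prefixed instruction
 * sequence.  Built with -march=i486, GCC is expected to emit a call to
 * __sync_fetch_and_add_8 instead, which then fails at link time. */
static uint64_t
bump(uint64_t n)
{
    /* Returns the value the counter held before the addition. */
    return __sync_fetch_and_add(&counter, n);
}
```

Searching the tree for `__sync_` calls whose operand is a 64-bit type is one way to find the call sites behind each undefined symbol.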

> gcc --version
gcc (GCC) 8.2.1 20180831
> uname -a
Linux arch32-i486-bs0 4.17.6-1-ARCH #1 SMP PREEMPT Wed Jul 18 08:34:04 UTC 2018
i486 GNU/Linux

Best Regards,
Erich

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.


[Mesa-dev] [PATCH] nir: Divergence Analysis

2018-10-08 Thread Daniel Schürmann
---
 src/compiler/nir/meson.build   |   1 +
 src/compiler/nir/nir.h |   2 +
 src/compiler/nir/nir_divergence_analysis.c | 333 +
 3 files changed, 336 insertions(+)
 create mode 100644 src/compiler/nir/nir_divergence_analysis.c

diff --git a/src/compiler/nir/meson.build b/src/compiler/nir/meson.build
index 090aa7a628..aabfeee02c 100644
--- a/src/compiler/nir/meson.build
+++ b/src/compiler/nir/meson.build
@@ -96,6 +96,7 @@ files_libnir = files(
   'nir_control_flow_private.h',
   'nir_deref.c',
   'nir_deref.h',
+  'nir_divergence_analysis.c',
   'nir_dominance.c',
   'nir_format_convert.h',
   'nir_from_ssa.c',
diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h
index e0df95c391..374280a1cc 100644
--- a/src/compiler/nir/nir.h
+++ b/src/compiler/nir/nir.h
@@ -3010,6 +3010,8 @@ void nir_convert_loop_to_lcssa(nir_loop *loop);
  */
 bool nir_convert_from_ssa(nir_shader *shader, bool phi_webs_only);
 
+bool* nir_divergence_analysis(nir_shader *shader);
+
 bool nir_lower_phis_to_regs_block(nir_block *block);
 bool nir_lower_ssa_defs_to_regs_block(nir_block *block);
 bool nir_rematerialize_derefs_in_use_blocks_impl(nir_function_impl *impl);
diff --git a/src/compiler/nir/nir_divergence_analysis.c b/src/compiler/nir/nir_divergence_analysis.c
new file mode 100644
index 0000000000..d91f4e55e6
--- /dev/null
+++ b/src/compiler/nir/nir_divergence_analysis.c
@@ -0,0 +1,333 @@
+/*
+ * Copyright © 2018 Valve Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ * Authors:
+ *Daniel Schürmann (daniel.schuerm...@campus.tu-berlin.de)
+ *
+ */
+
+#include "nir.h"
+#include "nir_worklist.h"
+
+/* This pass computes for each ssa definition if it is uniform.
+ * That is, the variable has the same value for all invocations
+ * of the group.
+ *
+ * This algorithm implements "The Simple Divergence Analysis" from
+ * Diogo Sampaio, Rafael De Souza, Sylvain Collange, Fernando Magno Quintão Pereira.
+ * Divergence Analysis.  ACM Transactions on Programming Languages and Systems (TOPLAS),
+ * ACM, 2013, 35 (4), pp.13:1-13:36. <10.1145/2523815>.
+ */
+
+
+static bool alu_src_is_divergent(bool *divergent, nir_alu_src src, unsigned num_input_components)
+{
+   /* If the alu src is swizzled and defined by a vec-instruction,
+* we can check if the originating value is non-divergent. */
+   if (num_input_components == 1 &&
+   src.src.ssa->num_components != 1 &&
+   src.src.parent_instr->type == nir_instr_type_alu) {
+  nir_alu_instr *parent = nir_instr_as_alu(src.src.parent_instr);
+  switch(parent->op) {
+ case nir_op_vec2:
+ case nir_op_vec3:
+ case nir_op_vec4: {
+if (divergent[parent->src[src.swizzle[0]].src.ssa->index])
+   return true;
+return false;
+ }
+ default:
+break;
+  }
+   }
+   return divergent[src.src.ssa->index];
+}
+
+static bool visit_alu(bool *divergent, nir_alu_instr *instr)
+{
+   if (divergent[instr->dest.dest.ssa.index])
+  return false;
+   unsigned num_src = nir_op_infos[instr->op].num_inputs;
+   for (unsigned i = 0; i < num_src; i++) {
+  if (alu_src_is_divergent(divergent, instr->src[i], nir_op_infos[instr->op].input_sizes[i])) {
+ divergent[instr->dest.dest.ssa.index] = true;
+ return true;
+  }
+   }
+   divergent[instr->dest.dest.ssa.index] = false;
+   return false;
+}
+
+static bool visit_intrinsic(bool *divergent, nir_intrinsic_instr *instr)
+{
+   if (!nir_intrinsic_infos[instr->intrinsic].has_dest)
+  return false;
+   if (divergent[instr->dest.ssa.index])
+  return false;
+   bool is_divergent = false;
+   switch (instr->intrinsic) {
+   /* TODO: load_shared_var */
+   /*   load_uniform etc.*/
+   case nir_intrinsic_shader_clock:
+   case nir_intrinsic_ballot:
+   case nir_intrinsic_read_invocation:
+   case 

[Mesa-dev] [RFC] nir: Divergence Analysis

2018-10-08 Thread Daniel Schürmann

This is an RFC for a divergence analysis pass for NIR.

The algorithm implements "The Simple Divergence Analysis" from
Diogo Sampaio, Rafael De Souza, Sylvain Collange, Fernando Magno Quintão Pereira.
Divergence Analysis.

The proposed pass computes, for each SSA definition, whether it is uniform,
that is, whether the value is the same for all invocations of the group.
If the value might differ for some thread, we call it divergent.
It is a worklist algorithm: it starts from the assumption that all values
are uniform (non-divergent) and iterates until convergence.
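
The worklist scheme described above can be sketched on a toy IR. This is an illustrative sketch only, not the code from the patch: `struct toy_def`, `toy_divergence_analysis`, `MAX_DEFS` and `NO_SRC` are hypothetical names, and only data dependencies are modelled, so the control-flow handling that the real pass needs is left out.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_DEFS 16
#define NO_SRC   -1

/* Hypothetical toy SSA definition: up to two sources, plus a flag for values
 * that differ per invocation by construction (e.g. an invocation id). */
struct toy_def {
    int src[2];
    bool divergent_source;
};

/* Sets divergent[i] = true for every def that may differ between
 * invocations.  Data dependencies only; control flow is not modelled. */
static void
toy_divergence_analysis(const struct toy_def *defs, int num_defs,
                        bool *divergent)
{
    int worklist[MAX_DEFS];
    int count = 0;

    assert(num_defs <= MAX_DEFS);

    /* Optimistic start: everything is uniform except the known sources. */
    for (int i = 0; i < num_defs; i++) {
        divergent[i] = defs[i].divergent_source;
        if (divergent[i])
            worklist[count++] = i;
    }

    /* Propagate until nothing changes.  A def only ever flips from uniform
     * to divergent, so each one enters the worklist at most once. */
    while (count > 0) {
        int d = worklist[--count];
        for (int u = 0; u < num_defs; u++) {
            if (divergent[u])
                continue;
            if (defs[u].src[0] == d || defs[u].src[1] == d) {
                divergent[u] = true;
                worklist[count++] = u;
            }
        }
    }
}
```

Because the state of a definition changes in one direction only, the loop is guaranteed to terminate, which is the property the optimistic starting assumption relies on.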

Motivation:
Divergence Analysis can be used for various optimizations:
control flow optimizations such as branch distribution, branch fusion,
branch splitting, loop collapsing, iteration delaying and thread reallocation,
but also memory optimizations like memory coalescing, and work unification.
Not all optimizations are applicable to every backend, and some cannot be
used at all (like thread reallocation).

Implementation State:
This implementation is incomplete, but the difficult part (the control flow
handling) is done.
Some intrinsics, and maybe some special cases, are still missing.
The worklist handling is also still somewhat inefficient.
Currently, the pass returns an array of bools where 'true' corresponds to
'divergent'.
We might want to add a flag directly to the SSA defs instead.

Future Work:
The mentioned paper also contains a more complex divergence analysis, which
tracks dependencies on the thread index, so that a divergent value can
be recomputed from a uniform value and the thread index.
There are a few use cases for this analysis, like simplified address calculation
and rematerialization, but the main benefit should be achievable with the
simple version.

Final Note:
My hope is that this DA helps with some control flow optimizations and improves
global code motion.
Some backends might also use it for work unification on a scalar unit
and/or to save some register space.
I'd be glad to receive any comments.

Kind regards,
Daniel



[Mesa-dev] [Bug 108263] glGetTexImage with PBO is not accelerated on Gallium

2018-10-08 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=108263

soredake  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |fds...@krutt.org

-- 
You are receiving this mail because:
You are the assignee for the bug.


Re: [Mesa-dev] [PATCH 0/3] sse4 patches

2018-10-08 Thread Kenneth Graunke
On Monday, September 24, 2018 4:19:36 AM PDT Tapani Pälli wrote:
> Hi;
> 
> Here's another try to inline sse41 code and get rid of gtt maps
> in intel_miptree_map (revert 58fb613a519). To be able to safely
> utilize sse41 we separate sse41 functionality as a library and
> then choose at run time whether we want to use it.
>
> A couple of different approaches were tried; this one seems to be the one
> with minimal overall changes.
> 
> // Tapani
> 
> Scott D Phillips (2):
>   i965/tiled_memcpy: inline movntdqa loads in tiled_to_linear
>   i965/miptree: Use cpu tiling/detiling when mapping
> 
> Tapani Pälli (1):
>   i965: expose type of memcpy instead of memcpy function itself
> 
>  src/mesa/drivers/dri/i965/Android.mk  |  38 
>  src/mesa/drivers/dri/i965/Makefile.am |  14 ++
>  src/mesa/drivers/dri/i965/Makefile.sources|  10 +-
>  src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 110 +-
>  src/mesa/drivers/dri/i965/intel_pixel_read.c  |   6 +-
>  src/mesa/drivers/dri/i965/intel_tex_image.c   |  14 +-
>  .../drivers/dri/i965/intel_tiled_memcpy.c | 192 ++
>  .../drivers/dri/i965/intel_tiled_memcpy.h |  86 +++-
>  .../dri/i965/intel_tiled_memcpy_normal.c  |  59 ++
>  .../dri/i965/intel_tiled_memcpy_sse41.c   |  61 ++
>  .../dri/i965/intel_tiled_memcpy_sse41.h   |  59 ++
>  src/mesa/drivers/dri/i965/meson.build |  38 +++-
>  12 files changed, 579 insertions(+), 108 deletions(-)
>  create mode 100644 src/mesa/drivers/dri/i965/intel_tiled_memcpy_normal.c
>  create mode 100644 src/mesa/drivers/dri/i965/intel_tiled_memcpy_sse41.c
>  create mode 100644 src/mesa/drivers/dri/i965/intel_tiled_memcpy_sse41.h
> 
> 
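
The run-time selection described in the cover letter can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the series' actual code: `copy_fn`, `copy_sse41`, `copy_generic` and `choose_copy` are hypothetical names, the SSE4.1 variant is a placeholder that would normally live in its own file built with `-msse4.1`, and the GCC/Clang builtin `__builtin_cpu_supports` stands in for whatever CPU detection the driver really uses.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef void *(*copy_fn)(void *dst, const void *src, size_t n);

/* Hypothetical stand-in for a variant built in its own translation unit
 * with -msse4.1; here it simply forwards to memcpy. */
static void *
copy_sse41(void *dst, const void *src, size_t n)
{
    return memcpy(dst, src, n);
}

/* Portable fallback that is safe on any CPU. */
static void *
copy_generic(void *dst, const void *src, size_t n)
{
    return memcpy(dst, src, n);
}

/* Pick the implementation at run time, so that SSE4.1 instructions are
 * only ever executed on CPUs that report support for them. */
static copy_fn
choose_copy(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    if (__builtin_cpu_supports("sse4.1"))
        return copy_sse41;
#endif
    return copy_generic;
}
```

Keeping the SSE4.1 code in a separate unit means the rest of the driver is never compiled with `-msse4.1`, so the compiler cannot emit those instructions on paths that run before the check.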

Thanks a ton for fixing this up, Tapani!

Patches 1 and 3 are:
Reviewed-by: Kenneth Graunke 

Patch 2 is:
Acked-by: Kenneth Graunke 

