Re: [Mesa-dev] [PATCH v3 08/42] intel/compiler: implement 16-bit fsign

2019-01-17 Thread Iago Toral
On Thu, 2019-01-17 at 13:55 -0600, Jason Ekstrand wrote:
> On Tue, Jan 15, 2019 at 7:54 AM Iago Toral Quiroga  > wrote:
> > v2:
> > 
> >  - make 16-bit be its own separate case (Jason)
> > 
> > 
> > 
> > Reviewed-by: Topi Pohjolainen 
> > 
> > ---
> > 
> >  src/intel/compiler/brw_fs_nir.cpp | 18 +-
> > 
> >  1 file changed, 17 insertions(+), 1 deletion(-)
> > 
> > 
> > 
> > diff --git a/src/intel/compiler/brw_fs_nir.cpp
> > b/src/intel/compiler/brw_fs_nir.cpp
> > 
> > index d742f55a957..cf546b8ff09 100644
> > 
> > --- a/src/intel/compiler/brw_fs_nir.cpp
> > 
> > +++ b/src/intel/compiler/brw_fs_nir.cpp
> > 
> > @@ -844,7 +844,22 @@ fs_visitor::nir_emit_alu(const fs_builder
> > , nir_alu_instr *instr)
> > 
> >  : bld.MOV(result, brw_imm_f(1.0f));
> > 
> > 
> > 
> >   set_predicate(BRW_PREDICATE_NORMAL, inst);
> > 
> > -  } else if (type_sz(op[0].type) < 8) {
> > 
> > +  } else if (type_sz(op[0].type) == 2) {
> > 
> > + /* AND(val, 0x8000) gives the sign bit.
> > 
> > +  *
> > 
> > +  * Predicated OR ORs 1.0 (0x3c00) with the sign bit if
> > val is not zero.
> > 
> > +  */
> > 
> > + fs_reg zero = retype(brw_imm_uw(0),
> > BRW_REGISTER_TYPE_HF);
> > 
> > + bld.CMP(bld.null_reg_f(), op[0], zero,
> > BRW_CONDITIONAL_NZ);
> > 
> > +
> > 
> > + fs_reg result_int = retype(result, BRW_REGISTER_TYPE_UW);
> > 
> > + op[0].type = BRW_REGISTER_TYPE_UW;
> > 
> > + result.type = BRW_REGISTER_TYPE_UW;
> 
> Why are you whacking the type on result and also making a result_int
> temp?  I guess you just copied that from the 32-bit case?  

Oh yes, I didn't noticed that.
> If we're going to whack result.type (which is fine), just use result
> for the rest of it.  With that fixed,

Right, while I am on it I guess it makes sense to do this small fix for
the 32-bit case in the same patch unless you prefer that to be a
separate change.
> Reviewed-by: Jason Ekstrand 
>  
> > + bld.AND(result_int, op[0], brw_imm_uw(0x8000u));
> > 
> > +
> > 
> > + inst = bld.OR(result_int, result_int,
> > brw_imm_uw(0x3c00u));
> > 
> > + inst->predicate = BRW_PREDICATE_NORMAL;
> > 
> > +  } else if (type_sz(op[0].type) == 4) {
> > 
> >   /* AND(val, 0x8000) gives the sign bit.
> > 
> >*
> > 
> >* Predicated OR ORs 1.0 (0x3f80) with the sign bit
> > if val is not
> > 
> > @@ -866,6 +881,7 @@ fs_visitor::nir_emit_alu(const fs_builder ,
> > nir_alu_instr *instr)
> > 
> >* - The sign is encoded in the high 32-bit of each DF
> > 
> >* - We need to produce a DF result.
> > 
> >*/
> > 
> > + assert(type_sz(op[0].type) == 8);
> > 
> > 
> > 
> >   fs_reg zero = vgrf(glsl_type::double_type);
> > 
> >   bld.MOV(zero, setup_imm_df(bld, 0.0));
> > 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v3 02/42] intel/compiler: add a NIR pass to lower conversions

2019-01-17 Thread Iago Toral
On Thu, 2019-01-17 at 13:42 -0600, Jason Ekstrand wrote:
> On Tue, Jan 15, 2019 at 7:54 AM Iago Toral Quiroga  > wrote:
> > Some conversions are not directly supported in hardware and need to
> > be
> > 
> > split in two conversion instructions going through an intermediary
> > type.
> > 
> > Doing this at the NIR level simplifies a bit the complexity in the
> > backend.
> > 
> > 
> > 
> > v2:
> > 
> >  - Consider fp16 rounding conversion opcodes
> > 
> >  - Properly handle swizzles on conversion sources.
> > 
> > 
> > 
> > Reviewed-by: Topi Pohjolainen  (v1)
> > 
> > ---
> > 
> >  src/intel/Makefile.sources|   1 +
> > 
> >  src/intel/compiler/brw_nir.c  |   1 +
> > 
> >  src/intel/compiler/brw_nir.h  |   2 +
> > 
> >  .../compiler/brw_nir_lower_conversions.c  | 158
> > ++
> > 
> >  src/intel/compiler/meson.build|   1 +
> > 
> >  5 files changed, 163 insertions(+)
> > 
> >  create mode 100644 src/intel/compiler/brw_nir_lower_conversions.c
> > 
> > 
> > 
> > diff --git a/src/intel/Makefile.sources
> > b/src/intel/Makefile.sources
> > 
> > index 94a28d370e8..9975daa3ad1 100644
> > 
> > --- a/src/intel/Makefile.sources
> > 
> > +++ b/src/intel/Makefile.sources
> > 
> > @@ -83,6 +83,7 @@ COMPILER_FILES = \
> > 
> > compiler/brw_nir_analyze_boolean_resolves.c \
> > 
> > compiler/brw_nir_analyze_ubo_ranges.c \
> > 
> > compiler/brw_nir_attribute_workarounds.c \
> > 
> > +   compiler/brw_nir_lower_conversions.c \
> > 
> > compiler/brw_nir_lower_cs_intrinsics.c \
> > 
> > compiler/brw_nir_lower_image_load_store.c \
> > 
> > compiler/brw_nir_lower_mem_access_bit_sizes.c \
> > 
> > diff --git a/src/intel/compiler/brw_nir.c
> > b/src/intel/compiler/brw_nir.c
> > 
> > index 92d7fe4bede..572ab824a94 100644
> > 
> > --- a/src/intel/compiler/brw_nir.c
> > 
> > +++ b/src/intel/compiler/brw_nir.c
> > 
> > @@ -882,6 +882,7 @@ brw_postprocess_nir(nir_shader *nir, const
> > struct brw_compiler *compiler,
> > 
> > OPT(nir_opt_move_comparisons);
> > 
> > 
> > 
> > OPT(nir_lower_bool_to_int32);
> > 
> > +   OPT(brw_nir_lower_conversions);
> 
> This is *really* late and I don't think you actually want this to run
> after we apply source/destination modifiers.  If you just move it to
> right after nir_opt_algebraic_late, that will fix a multitude of
> issues.

Sure, I'll move it there.
> > 
> > OPT(nir_lower_locals_to_regs);
> > 
> > 
> > 
> > diff --git a/src/intel/compiler/brw_nir.h
> > b/src/intel/compiler/brw_nir.h
> > 
> > index bc81950d47e..662b2627e95 100644
> > 
> > --- a/src/intel/compiler/brw_nir.h
> > 
> > +++ b/src/intel/compiler/brw_nir.h
> > 
> > @@ -114,6 +114,8 @@ void brw_nir_lower_tcs_outputs(nir_shader *nir,
> > const struct brw_vue_map *vue,
> > 
> > GLenum tes_primitive_mode);
> > 
> >  void brw_nir_lower_fs_outputs(nir_shader *nir);
> > 
> > 
> > 
> > +bool brw_nir_lower_conversions(nir_shader *nir);
> > 
> > +
> > 
> >  bool brw_nir_lower_image_load_store(nir_shader *nir,
> > 
> >  const struct gen_device_info
> > *devinfo);
> > 
> >  void brw_nir_rewrite_image_intrinsic(nir_intrinsic_instr *intrin,
> > 
> > diff --git a/src/intel/compiler/brw_nir_lower_conversions.c
> > b/src/intel/compiler/brw_nir_lower_conversions.c
> > 
> > new file mode 100644
> > 
> > index 000..583167c7753
> > 
> > --- /dev/null
> > 
> > +++ b/src/intel/compiler/brw_nir_lower_conversions.c
> > 
> > @@ -0,0 +1,158 @@
> > 
> > +/*
> > 
> > + * Copyright © 2018 Intel Corporation
> > 
> > + *
> > 
> > + * Permission is hereby granted, free of charge, to any person
> > obtaining a
> > 
> > + * copy of this software and associated documentation files (the
> > "Software"),
> > 
> > + * to deal in the Software without restriction, including without
> > limitation
> > 
> > + * the rights to use, copy, modify, merge, publish, distribute,
> > sublicense,
> > 
> > + * and/or sell copies of the Software, and to permit persons to
> > whom the
> > 
> > + * Software is furnished to do so, subject to the following
> > conditions:
> > 
> > + *
> > 
> > + * The above copyright notice and this permission notice
> > (including the next
> > 
> > + * paragraph) shall be included in all copies or substantial
> > portions of the
> > 
> > + * Software.
> > 
> > + *
> > 
> > + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
> > EXPRESS OR
> > 
> > + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
> > MERCHANTABILITY,
> > 
> > + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO
> > EVENT SHALL
> > 
> > + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM,
> > DAMAGES OR OTHER
> > 
> > + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
> > ARISING
> > 
> > + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
> > OTHER DEALINGS
> > 
> > + * IN THE SOFTWARE.
> > 

Re: [Mesa-dev] [PATCH v3 39/42] intel/compiler: remove MAD/LRP algebraic optimizations from the backend

2019-01-17 Thread Iago Toral
On Thu, 2019-01-17 at 22:31 -0600, Jason Ekstrand wrote:
> On Thu, Jan 17, 2019 at 6:42 PM Matt Turner 
> wrote:
> > On Tue, Jan 15, 2019 at 5:54 AM Iago Toral Quiroga <
> > ito...@igalia.com> wrote:
> > 
> > >
> > 
> > > NIR already has these so they are redundant. A run of shader-db
> > confirms
> > 
> > > that the only cases where these backend optimizations are
> > activated
> > 
> > > are some Tomb Raider shaders where the affected variables are
> > qualified
> > 
> > > as "precise", which is why NIR won't apply them and why the
> > backend
> > 
> > > shouldn't either (so it is actually a bug).
> > 
> > 
> > 
> > Which of the six optimizations that you're removing were
> > responsible
> > 
> > for the change? I ask because...
> 
> If it's one of the precise ones, we should port it to NIR...
> 
> > >
> > 
> > > Suggested-by: Jason Ekstrand 
> > 
> > > ---
> > 
> > >  src/intel/compiler/brw_fs.cpp | 37 ---
> > 
> > 
> > >  1 file changed, 37 deletions(-)
> > 
> > >
> > 
> > > diff --git a/src/intel/compiler/brw_fs.cpp
> > b/src/intel/compiler/brw_fs.cpp
> > 
> > > index 77c955ac435..e7f5a8822a3 100644
> > 
> > > --- a/src/intel/compiler/brw_fs.cpp
> > 
> > > +++ b/src/intel/compiler/brw_fs.cpp
> > 
> > > @@ -2568,16 +2568,6 @@ fs_visitor::opt_algebraic()
> > 
> > >  break;
> > 
> > >   }
> > 
> > >   break;
> > 
> > > -  case BRW_OPCODE_LRP:
> > 
> > > - if (inst->src[1].equals(inst->src[2])) {
> > 
> > > -inst->opcode = BRW_OPCODE_MOV;
> > 
> > > -inst->src[0] = inst->src[1];
> > 
> > > -inst->src[1] = reg_undef;
> > 
> > > -inst->src[2] = reg_undef;
> > 
> > > -progress = true;
> > 
> > > -break;
> > 
> > 
> > 
> > I'm not sure whether this is imprecise, and...
> 
> Doesn't work for NaN or either inf, at least not unles inf - inf == 0
> which I don't think it is.

This exists in NIR algebraic and is marked as inexact there (plus it is
not triggered by anything in shader-db)
> > > - }
> > 
> > > - break;
> > 
> > >case BRW_OPCODE_CMP:
> > 
> > >   if ((inst->conditional_mod == BRW_CONDITIONAL_Z ||
> > 
> > >inst->conditional_mod == BRW_CONDITIONAL_NZ) &&
> > 
> > > @@ -2654,33 +2644,6 @@ fs_visitor::opt_algebraic()
> > 
> > >  }
> > 
> > >   }
> > 
> > >   break;
> > 
> > > -  case BRW_OPCODE_MAD:
> > 
> > > - if (inst->src[1].is_zero() || inst->src[2].is_zero()) {
> > 
> > > -inst->opcode = BRW_OPCODE_MOV;
> > 
> > > -inst->src[1] = reg_undef;
> > 
> > > -inst->src[2] = reg_undef;
> > 
> > > -progress = true;
> > 
> > > - } else if (inst->src[0].is_zero()) {
> > 
> > > -inst->opcode = BRW_OPCODE_MUL;
> > 
> > > -inst->src[0] = inst->src[2];
> > 
> > > -inst->src[2] = reg_undef;
> > 
> > > -progress = true;

I believe these were the only cases triggered by the Tomb Raider
shaders. These optimizations exist in NIR and  are marked as imprecise
there too.
> > > - } else if (inst->src[1].is_one()) {
> > 
> > > -inst->opcode = BRW_OPCODE_ADD;
> > 
> > > -inst->src[1] = inst->src[2];
> > 
> > > -inst->src[2] = reg_undef;
> > 
> > > -progress = true;
> > 
> > > - } else if (inst->src[2].is_one()) {
> > 
> > > -inst->opcode = BRW_OPCODE_ADD;
> > 
> > > -inst->src[2] = reg_undef;
> > 
> > > -progress = true;

These also exist in NIR, but I see they are not marked as imprecise
there, not sure why, looks like a bug to me right?
> > > - } else if (inst->src[1].file == IMM && inst-
> > >src[2].file == IMM) {
> > 
> > > -inst->opcode = BRW_OPCODE_ADD;
> > 
> > > -inst->src[1].f *= inst->src[2].f;
> > 
> > > -inst->src[2] = reg_undef;
> > 
> > > -progress = true;
> > 
> > 

This one doesn't exist in NIR but it clearly breaks precise
requirements since it its manually breaking  MAD into MUL + ADD, which
has different precision.
> or this one.
> 
> Yes, it is.  Part of the point of FMA is that it's more precise than
> mul+add because the mul is done with extra precision and added to
> src[0] in high-precision before the final rounding.  This
> optimization explicitly breaks the MAD into mul+add.--Jason
> 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v4] anv/device: fix maximum number of images supported

2019-01-17 Thread Iago Toral
On Thu, 2019-01-17 at 12:57 -0600, Jason Ekstrand wrote:
> Yup, that'll do it.  Gotta watch out for ++...  Assuming it fixes the
> problem, that patch is
> 
> Reviewed-by: Jason Ekstrand 
> 
> 
> On Thu, Jan 17, 2019 at 12:35 PM Lionel Landwerlin <
> lionel.g.landwer...@intel.com> wrote:
> >   
> > 
> >   
> >   
> > Looking at the change the binding table
> >   emission, I think the image++ has been moved such that it
> > doesn't
> >   produce the same tables anymore.
> > Trying this change on CI :
> > 
https://github.com/djdeath/mesa/commit/a6b8eaf1325389d94d1d8a5b3bb952a362125eb2
> > 
> > 
> > 

Yes, this is what the patch that I sent for review did, I just pushed a
previous version, sorry for the mess :-(
Iago


> > On 17/01/2019 18:19, Clayton Craft
> >   wrote:
> > 
> > 
> > 
> > >   Quoting Mark Janes (2019-01-17 10:13:37)
> > > 
> > >   
> > > > Hi Iago,
> > > > It looks like you tested this patch in CI and got the same
> > > > failures thatwe are seeing on master:
> > > > 
http://mesa-ci-results.jf.intel.com/itoral/builds/263/group/63a9f0ea7bb98050796b649e85481845
> > > > 
> > > >   
> > > 
> > >   The correct link is:
> > > https://mesa-ci.01.org/itoral/builds/263/group/63a9f0ea7bb98050796b649e85481845
> > > 
> > > 
> > >   
> > > > Why was this patch pushed?
> > > > -Mark
> > > > Mark Janes  writes:
> > > > 
> > > > 
> > > > 
> > > > >   This patch regresses thousands of tests on BDW and
> > > > > HSW:
> > > > > 
http://mesa-ci-results.jf.intel.com/vulkancts/builds/10035/group/63a9f0ea7bb98050796b649e85481845#fails
> > > > > 
> > > > > 
> > > > 
> > > >   
> > > 
> > >   
> > > https://mesa-ci.01.org/vulkancts/builds/10035/group/63a9f0ea7bb98050796b649e85481845#fails
> > > 
> > > 
> > >   
> > > > 
> > > > >   I'll revert it as soon as my testing completes.
> > > > > Iago Toral Quiroga  writes:
> > > > > 
> > > > > 
> > > > >   
> > > > > > We had defined MAX_IMAGES as 8, which we used
> > > > > > to size the array forimage push constant data. The comment
> > > > > > there stated that this was forgen8, but
> > > > > > anv_nir_apply_pipeline_layout runs for all gens and
> > > > > > writesthat array, asserting that we don't exceed that
> > > > > > number of images,which imposes a limit of MAX_IMAGES on all
> > > > > > gens.
> > > > > > Furthermore, despite this, we are exposing up to 64 images
> > > > > > per shaderstage on all gens, gen8 included.
> > > > > > This patch lowers the number of images we expose in gen8 to
> > > > > > 8 andkeeps 64 images for gen9+ while making sure that only
> > > > > > pre-SKL gensuse push constant space to handle images.
> > > > > > v2: - <= instead of < in the assert (Eric, Lionel) - Change
> > > > > > the way the assertion is written (Eric)
> > > > > > v3: - Revert the way the assertion is written to the form
> > > > > > it had in v1,   the version in v2 was not equivalent and
> > > > > > was incorrect. (Lionel)
> > > > > > v4: - gen9+ doesn't need push constants for images at all
> > > > > > (Jason)---
> > > > > > src/intel/vulkan/anv_device.c |  7 --
> > > > > > .../vulkan/anv_nir_apply_pipeline_layout.c|  4 +--
> > > > > > src/intel/vulkan/anv_private.h|  5 ++--
> > > > > > src/intel/vulkan/genX_cmd_buffer.c| 25
> > > > > > +-- 4 files changed, 28 insertions(+), 13
> > > > > > deletions(-)
> > > > > > diff --git a/src/intel/vulkan/anv_device.c
> > > > > > b/src/intel/vulkan/anv_device.cindex
> > > > > > 523f1483e29..f85458b672e 100644---
> > > > > > a/src/intel/vulkan/anv_device.c+++
> > > > > > b/src/intel/vulkan/anv_device.c@@ -987,9 +987,12 @@ void
> > > > > > anv_GetPhysicalDeviceProperties(const uint32_t
> > > > > > max_samplers = (devinfo->gen >= 8 || devinfo->is_haswell)
> > > > > > ?  128 : 16; +   const
> > > > > > uint32_t max_images = devinfo->gen < 9 ? MAX_GEN8_IMAGES :
> > > > > > MAX_IMAGES;+VkSampleCountFlags sample_counts
> > > > > > =   isl_device_get_sample_counts(>isl_dev);
> > > > > > +VkPhysicalDeviceLimits limits =
> > > > > > {   .maxImageDimension1D  = (1 <<
> > > > > > 14),   .maxImageDimension2D  = (1
> > > > > > << 14),@@ -1009,7 +1012,7 @@ void
> > > > > > anv_GetPhysicalDeviceProperties(   .maxPerStageDescript
> > > > > > orUniformBuffers  =
> > > > > > 64,   .maxPerStageDescriptorStorageBuffers  =
> > > > > > 64,   .maxPerStageDescriptorSampledImages   =
> > > > > > max_samplers,-  .maxPerStageDescriptorStorageImages
> > > > > >= 64,+  .maxPerStageDescriptorStorageImages   =
> > > > > > max_images,   .maxPerStageDescriptorInputAttachments   
> > > > > >  = 64,   .maxPerStageResources =
> > > > > > 250,   .maxDescriptorSetSamplers 

[Mesa-dev] [Bug 106595] [RADV] Rendering distortions only when MSAA is enabled

2019-01-17 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=106595

zefkerri...@gmail.com changed:

   What|Removed |Added

 Status|REOPENED|RESOLVED
 Resolution|--- |FIXED

--- Comment #25 from zefkerri...@gmail.com ---
(In reply to Rhys Perry from comment #24)
> I seem to be able to reproduce the problem (alpha-tested geometry not
> visible) but only when Multi-sample Alpha-Test is enabled and WineD3D is
> used.
> After disabling Multi-sample Alpha-Test or using DXVK, alpha-tested geometry
> seems to render perfectly.
> 
> Are you sure you're using DXVK? If so, are you able to reproduce the problem
> in the character creation screen (which is what I've been looking at)?
> Perhaps it only occurs in some areas (such as the one you've been taking
> screenshots of).

This RADV issue went away with the latest build of Mesa-git. Thank you very
much!
But the WineD3D/RadeonSI issue is unfortunately still here.

-- 
You are receiving this mail because:
You are the QA Contact for the bug.___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v3 39/42] intel/compiler: remove MAD/LRP algebraic optimizations from the backend

2019-01-17 Thread Jason Ekstrand
On Thu, Jan 17, 2019 at 6:42 PM Matt Turner  wrote:

> On Tue, Jan 15, 2019 at 5:54 AM Iago Toral Quiroga 
> wrote:
> >
> > NIR already has these so they are redundant. A run of shader-db confirms
> > that the only cases where these backend optimizations are activated
> > are some Tomb Raider shaders where the affected variables are qualified
> > as "precise", which is why NIR won't apply them and why the backend
> > shouldn't either (so it is actually a bug).
>
> Which of the six optimizations that you're removing were responsible
> for the change? I ask because...
>

If it's one of the precise ones, we should port it to NIR...


> >
> > Suggested-by: Jason Ekstrand 
> > ---
> >  src/intel/compiler/brw_fs.cpp | 37 ---
> >  1 file changed, 37 deletions(-)
> >
> > diff --git a/src/intel/compiler/brw_fs.cpp
> b/src/intel/compiler/brw_fs.cpp
> > index 77c955ac435..e7f5a8822a3 100644
> > --- a/src/intel/compiler/brw_fs.cpp
> > +++ b/src/intel/compiler/brw_fs.cpp
> > @@ -2568,16 +2568,6 @@ fs_visitor::opt_algebraic()
> >  break;
> >   }
> >   break;
> > -  case BRW_OPCODE_LRP:
> > - if (inst->src[1].equals(inst->src[2])) {
> > -inst->opcode = BRW_OPCODE_MOV;
> > -inst->src[0] = inst->src[1];
> > -inst->src[1] = reg_undef;
> > -inst->src[2] = reg_undef;
> > -progress = true;
> > -break;
>
> I'm not sure whether this is imprecise, and...
>

Doesn't work for NaN or either inf, at least not unles inf - inf == 0 which
I don't think it is.


> > - }
> > - break;
> >case BRW_OPCODE_CMP:
> >   if ((inst->conditional_mod == BRW_CONDITIONAL_Z ||
> >inst->conditional_mod == BRW_CONDITIONAL_NZ) &&
> > @@ -2654,33 +2644,6 @@ fs_visitor::opt_algebraic()
> >  }
> >   }
> >   break;
> > -  case BRW_OPCODE_MAD:
> > - if (inst->src[1].is_zero() || inst->src[2].is_zero()) {
> > -inst->opcode = BRW_OPCODE_MOV;
> > -inst->src[1] = reg_undef;
> > -inst->src[2] = reg_undef;
> > -progress = true;
> > - } else if (inst->src[0].is_zero()) {
> > -inst->opcode = BRW_OPCODE_MUL;
> > -inst->src[0] = inst->src[2];
> > -inst->src[2] = reg_undef;
> > -progress = true;
> > - } else if (inst->src[1].is_one()) {
> > -inst->opcode = BRW_OPCODE_ADD;
> > -inst->src[1] = inst->src[2];
> > -inst->src[2] = reg_undef;
> > -progress = true;
> > - } else if (inst->src[2].is_one()) {
> > -inst->opcode = BRW_OPCODE_ADD;
> > -inst->src[2] = reg_undef;
> > -progress = true;
> > - } else if (inst->src[1].file == IMM && inst->src[2].file ==
> IMM) {
> > -inst->opcode = BRW_OPCODE_ADD;
> > -inst->src[1].f *= inst->src[2].f;
> > -inst->src[2] = reg_undef;
> > -progress = true;
>
> or this one.
>

Yes, it is.  Part of the point of FMA is that it's more precise than
mul+add because the mul is done with extra precision and added to src[0] in
high-precision before the final rounding.  This optimization explicitly
breaks the MAD into mul+add.

--Jason
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [Bug 109258] Weston drm-backend.so seems to fail with Mesa master and LIBGL_ALWAYS_SOFTWARE=1

2019-01-17 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=109258

--- Comment #5 from n3rdopolis  ---
Created attachment 143147
  --> https://bugs.freedesktop.org/attachment.cgi?id=143147=edit
Reverse 47273d7312cb5b5b6b0b9faa814d574bbbce1c01

Weston and Gnome shell work with LIBGL_ALWAYS_SOFTWARE with this patch, but all
I did was revert commit 47273d7312cb5b5b6b0b9faa814d574bbbce1c01 and then
changed up the conflicts.

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v3 39/42] intel/compiler: remove MAD/LRP algebraic optimizations from the backend

2019-01-17 Thread Matt Turner
On Tue, Jan 15, 2019 at 5:54 AM Iago Toral Quiroga  wrote:
>
> NIR already has these so they are redundant. A run of shader-db confirms
> that the only cases where these backend optimizations are activated
> are some Tomb Raider shaders where the affected variables are qualified
> as "precise", which is why NIR won't apply them and why the backend
> shouldn't either (so it is actually a bug).

Which of the six optimizations that you're removing were responsible
for the change? I ask because...

>
> Suggested-by: Jason Ekstrand 
> ---
>  src/intel/compiler/brw_fs.cpp | 37 ---
>  1 file changed, 37 deletions(-)
>
> diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp
> index 77c955ac435..e7f5a8822a3 100644
> --- a/src/intel/compiler/brw_fs.cpp
> +++ b/src/intel/compiler/brw_fs.cpp
> @@ -2568,16 +2568,6 @@ fs_visitor::opt_algebraic()
>  break;
>   }
>   break;
> -  case BRW_OPCODE_LRP:
> - if (inst->src[1].equals(inst->src[2])) {
> -inst->opcode = BRW_OPCODE_MOV;
> -inst->src[0] = inst->src[1];
> -inst->src[1] = reg_undef;
> -inst->src[2] = reg_undef;
> -progress = true;
> -break;

I'm not sure whether this is imprecise, and...

> - }
> - break;
>case BRW_OPCODE_CMP:
>   if ((inst->conditional_mod == BRW_CONDITIONAL_Z ||
>inst->conditional_mod == BRW_CONDITIONAL_NZ) &&
> @@ -2654,33 +2644,6 @@ fs_visitor::opt_algebraic()
>  }
>   }
>   break;
> -  case BRW_OPCODE_MAD:
> - if (inst->src[1].is_zero() || inst->src[2].is_zero()) {
> -inst->opcode = BRW_OPCODE_MOV;
> -inst->src[1] = reg_undef;
> -inst->src[2] = reg_undef;
> -progress = true;
> - } else if (inst->src[0].is_zero()) {
> -inst->opcode = BRW_OPCODE_MUL;
> -inst->src[0] = inst->src[2];
> -inst->src[2] = reg_undef;
> -progress = true;
> - } else if (inst->src[1].is_one()) {
> -inst->opcode = BRW_OPCODE_ADD;
> -inst->src[1] = inst->src[2];
> -inst->src[2] = reg_undef;
> -progress = true;
> - } else if (inst->src[2].is_one()) {
> -inst->opcode = BRW_OPCODE_ADD;
> -inst->src[2] = reg_undef;
> -progress = true;
> - } else if (inst->src[1].file == IMM && inst->src[2].file == IMM) {
> -inst->opcode = BRW_OPCODE_ADD;
> -inst->src[1].f *= inst->src[2].f;
> -inst->src[2] = reg_undef;
> -progress = true;

or this one.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v3 00/42] intel: VK_KHR_shader_float16_int8 implementation

2019-01-17 Thread Jason Ekstrand
I'm done for the day but I've read through most of the patches.  I think
I've got 4 or 5 tricky ones left.  By and large, I think things are looking
really good.  I don't know that we'll make 19.0 but there's a possibility.
If not, it'll likely land shortly after.

On Tue, Jan 15, 2019 at 7:54 AM Iago Toral Quiroga 
wrote:

> The changes in this version address review feedback to v2 and, most
> importantly,
> rebase on top of relevant changes in master, specifically Curro's regioning
> lowering pass. This new regioning pass simplifies some of the NIR
> translation
> code (specifically the code for translating regioning restrictions on
> conversions for atom platforms) making some of the previous work in this
> series
> unnecessary. The regioning restrictions for conversions between integer and
> half-float added with this series are are now implemented as part of this
> framework instead of doing it at NIR translation time. This version of the
> series also dropped the SPIR-V compiler patches that have already been
> merged.
>
> As always, a branch for with these patches is available for testing in the
> itoral/VK_KHR_shader_float16_int8 branch of the Igalia Mesa repository at
> https://github.com/Igalia/mesa.
>
> Iago Toral Quiroga (42):
>   intel/compiler: handle conversions between int and half-float on atom
>   intel/compiler: add a NIR pass to lower conversions
>   intel/compiler: split float to 64-bit opcodes from int to 64-bit
>   intel/compiler: handle b2i/b2f with other integer conversion opcodes
>   intel/compiler: assert restrictions on conversions to half-float
>   intel/compiler: lower some 16-bit float operations to 32-bit
>   intel/compiler: lower 16-bit extended math to 32-bit prior to gen9
>   intel/compiler: implement 16-bit fsign
>   intel/compiler: allow extended math functions with HF operands
>   compiler/nir: add lowering option for 16-bit fmod
>   intel/compiler: lower 16-bit fmod
>   compiler/nir: add lowering for 16-bit flrp
>   intel/compiler: lower 16-bit flrp
>   compiler/nir: add lowering for 16-bit ldexp
>   intel/compiler: Extended Math is limited to SIMD8 on half-float
>   intel/compiler: add instruction setters for Src1Type and Src2Type.
>   intel/compiler: add new half-float register type for 3-src
> instructions
>   intel/compiler: add a helper function to query hardware type table
>   intel/compiler: don't compact 3-src instructions with Src1Type or
> Src2Type bits
>   intel/compiler: allow half-float on 3-source instructions since gen8
>   intel/compiler: set correct precision fields for 3-source float
> instructions
>   intel/compiler: don't propagate HF immediates to 3-src instructions
>   intel/compiler: fix ddx and ddy for 16-bit float
>   intel/compiler: fix ddy for half-float in gen8
>   intel/compiler: workaround for SIMD8 half-float MAD in gen8
>   intel/compiler: split is_partial_write() into two variants
>   intel/compiler: activate 16-bit bit-size lowerings also for 8-bit
>   intel/compiler: handle 64-bit float to 8-bit integer conversions
>   intel/compiler: handle conversions between int and half-float on atom
>   intel/compiler: implement isign for int8
>   intel/compiler: ask for an integer type if requesting an 8-bit type
>   intel/eu: force stride of 2 on NULL register for Byte instructions
>   compiler/spirv: add support for Float16 and Int8 capabilities
>   anv/pipeline: support Float16 and Int8 capabilities in gen8+
>   anv/device: expose shaderFloat16 and shaderInt8 in gen8+
>   intel/compiler: implement is_zero, is_one, is_negative_one for
> 8-bit/16-bit
>   intel/compiler: add a brw_reg_type_is_integer helper
>   intel/compiler: fix cmod propagation for non 32-bit types
>   intel/compiler: remove MAD/LRP algebraic optimizations from the
> backend
>   intel/compiler: support half-float in the combine constants pass
>   intel/compiler: fix combine constants for Align16 with half-float
> prior to gen9
>   intel/compiler: allow propagating HF immediates to MAD/LRP
>
>  src/compiler/nir/nir.h|   2 +
>  src/compiler/nir/nir_opt_algebraic.py |  11 +-
>  src/compiler/shader_info.h|   2 +
>  src/compiler/spirv/spirv_to_nir.c |   8 +-
>  src/intel/Makefile.sources|   1 +
>  src/intel/compiler/brw_compiler.c |   2 +
>  src/intel/compiler/brw_eu_compact.c   |   5 +-
>  src/intel/compiler/brw_eu_emit.c  |  36 +++-
>  src/intel/compiler/brw_fs.cpp | 143 ++--
>  src/intel/compiler/brw_fs.h   |   1 +
>  .../compiler/brw_fs_cmod_propagation.cpp  |  34 ++--
>  .../compiler/brw_fs_combine_constants.cpp |  82 +++--
>  .../compiler/brw_fs_copy_propagation.cpp  |  14 +-
>  src/intel/compiler/brw_fs_cse.cpp |   3 +-
>  .../compiler/brw_fs_dead_code_eliminate.cpp   |   2 +-
>  src/intel/compiler/brw_fs_generator.cpp   |  47 +++---
>  

Re: [Mesa-dev] [PATCH v3 40/42] intel/compiler: support half-float in the combine constants pass

2019-01-17 Thread Jason Ekstrand
On Tue, Jan 15, 2019 at 7:55 AM Iago Toral Quiroga 
wrote:

> Reviewed-by: Topi Pohjolainen 
> ---
>  .../compiler/brw_fs_combine_constants.cpp | 60 +++
>  1 file changed, 49 insertions(+), 11 deletions(-)
>
> diff --git a/src/intel/compiler/brw_fs_combine_constants.cpp
> b/src/intel/compiler/brw_fs_combine_constants.cpp
> index 7343f77bb45..54017e5668b 100644
> --- a/src/intel/compiler/brw_fs_combine_constants.cpp
> +++ b/src/intel/compiler/brw_fs_combine_constants.cpp
> @@ -36,6 +36,7 @@
>
>  #include "brw_fs.h"
>  #include "brw_cfg.h"
> +#include "util/half_float.h"
>
>  using namespace brw;
>
> @@ -114,8 +115,9 @@ struct imm {
>  */
> exec_list *uses;
>
> -   /** The immediate value.  We currently only handle floats. */
> +   /** The immediate value.  We currently only handle float and
> half-float. */
> float val;
> +   brw_reg_type type;
>

I had a brief chat with Matt today and I think that this may be going in
the wrong direction.  In particular, I'd like us to eventually (maybe we
can do it now?) generalize the combine_constants pass to more data types;
in particular, integers.  I recently came across a shader where the fact
that we couldn't do combine_constants on integers was causing significant
register pressure problems and spilling.  (The test was doing a bunch of
BFI2/BFE with constant sources.)  It could also be a huge win for 8-bit and
64-bit where we can't put immediates in regular 2-src instructions.

What does this mean for the pass?  I suspect we want a bit size instead of
a type and a simple char[8] for the data and just make it a blob of bits.
We may also want some sort of heuristic so we don't burn constant table
space for things that are only used once or maybe even twice.

Normally, I would say "do it as a fixup" but if we go the direction of
having a float and using _mesa_half_to_float and _mesa_float_to_half, I
suspect it'll be harder to go for the bag-of-bits approach.

Thoughts?

--Jason


>
> /**
>  * The GRF register and subregister number where we've decided to
> store the
> @@ -145,10 +147,10 @@ struct table {
>  };
>
>  static struct imm *
> -find_imm(struct table *table, float val)
> +find_imm(struct table *table, float val, brw_reg_type type)
>  {
> for (int i = 0; i < table->len; i++) {
> -  if (table->imm[i].val == val) {
> +  if (table->imm[i].val == val && table->imm[i].type == type) {
>   return >imm[i];
>}
> }
> @@ -190,6 +192,20 @@ compare(const void *_a, const void *_b)
> return a->first_use_ip - b->first_use_ip;
>  }
>
> +static bool
> +needs_negate(float reg_val, float imm_val, brw_reg_type type)
> +{
> +   /* reg_val represents the immediate value in the register in its
> original
> +* bit-size, while imm_val is always a valid 32-bit float value.
> +*/
> +   if (type == BRW_REGISTER_TYPE_HF) {
> +  uint32_t reg_val_ud = *((uint32_t *) _val);
> +  reg_val = _mesa_half_to_float(reg_val_ud & 0x);
> +   }
> +
> +   return signbit(imm_val) != signbit(reg_val);
> +}
> +
>  bool
>  fs_visitor::opt_combine_constants()
>  {
> @@ -215,12 +231,20 @@ fs_visitor::opt_combine_constants()
>
>for (int i = 0; i < inst->sources; i++) {
>   if (inst->src[i].file != IMM ||
> - inst->src[i].type != BRW_REGISTER_TYPE_F)
> + (inst->src[i].type != BRW_REGISTER_TYPE_F &&
> +  inst->src[i].type != BRW_REGISTER_TYPE_HF))
>  continue;
>
> - float val = !inst->can_do_source_mods(devinfo) ? inst->src[i].f :
> - fabs(inst->src[i].f);
> - struct imm *imm = find_imm(, val);
> + float val;
> + if (inst->src[i].type == BRW_REGISTER_TYPE_F) {
> +val = !inst->can_do_source_mods(devinfo) ? inst->src[i].f :
> +fabs(inst->src[i].f);
> + } else {
> +val = !inst->can_do_source_mods(devinfo) ?
> +   _mesa_half_to_float(inst->src[i].d & 0x) :
> +   fabs(_mesa_half_to_float(inst->src[i].d & 0x));
> + }
> + struct imm *imm = find_imm(, val, inst->src[i].type);
>
>   if (imm) {
>  bblock_t *intersection = cfg_t::intersect(block, imm->block);
> @@ -238,6 +262,7 @@ fs_visitor::opt_combine_constants()
>  imm->uses = new(const_ctx) exec_list();
>  imm->uses->push_tail(link(const_ctx, >src[i]));
>  imm->val = val;
> +imm->type = inst->src[i].type;
>  imm->uses_by_coissue = could_coissue(devinfo, inst);
>  imm->must_promote = must_promote_imm(devinfo, inst);
>  imm->first_use_ip = ip;
> @@ -278,12 +303,23 @@ fs_visitor::opt_combine_constants()
>imm->block->last_non_control_flow_inst()->next);
>const fs_builder ibld = bld.at(imm->block, n).exec_all().group(1,
> 0);
>
> -  ibld.MOV(reg, brw_imm_f(imm->val));
> +  reg = retype(reg, imm->type);
> +  if 

Re: [Mesa-dev] [PATCH v3 39/42] intel/compiler: remove MAD/LRP algebraic optimizations from the backend

2019-01-17 Thread Jason Ekstrand
Reviewed-by: Jason Ekstrand 

On Tue, Jan 15, 2019 at 7:54 AM Iago Toral Quiroga 
wrote:

> NIR already has these so they are redundant. A run of shader-db confirms
> that the only cases where these backend optimizations are activated
> are some Tomb Raider shaders where the affected variables are qualified
> as "precise", which is why NIR won't apply them and why the backend
> shouldn't either (so it is actually a bug).
>
> Suggested-by: Jason Ekstrand 
> ---
>  src/intel/compiler/brw_fs.cpp | 37 ---
>  1 file changed, 37 deletions(-)
>
> diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp
> index 77c955ac435..e7f5a8822a3 100644
> --- a/src/intel/compiler/brw_fs.cpp
> +++ b/src/intel/compiler/brw_fs.cpp
> @@ -2568,16 +2568,6 @@ fs_visitor::opt_algebraic()
>  break;
>   }
>   break;
> -  case BRW_OPCODE_LRP:
> - if (inst->src[1].equals(inst->src[2])) {
> -inst->opcode = BRW_OPCODE_MOV;
> -inst->src[0] = inst->src[1];
> -inst->src[1] = reg_undef;
> -inst->src[2] = reg_undef;
> -progress = true;
> -break;
> - }
> - break;
>case BRW_OPCODE_CMP:
>   if ((inst->conditional_mod == BRW_CONDITIONAL_Z ||
>inst->conditional_mod == BRW_CONDITIONAL_NZ) &&
> @@ -2654,33 +2644,6 @@ fs_visitor::opt_algebraic()
>  }
>   }
>   break;
> -  case BRW_OPCODE_MAD:
> - if (inst->src[1].is_zero() || inst->src[2].is_zero()) {
> -inst->opcode = BRW_OPCODE_MOV;
> -inst->src[1] = reg_undef;
> -inst->src[2] = reg_undef;
> -progress = true;
> - } else if (inst->src[0].is_zero()) {
> -inst->opcode = BRW_OPCODE_MUL;
> -inst->src[0] = inst->src[2];
> -inst->src[2] = reg_undef;
> -progress = true;
> - } else if (inst->src[1].is_one()) {
> -inst->opcode = BRW_OPCODE_ADD;
> -inst->src[1] = inst->src[2];
> -inst->src[2] = reg_undef;
> -progress = true;
> - } else if (inst->src[2].is_one()) {
> -inst->opcode = BRW_OPCODE_ADD;
> -inst->src[2] = reg_undef;
> -progress = true;
> - } else if (inst->src[1].file == IMM && inst->src[2].file == IMM)
> {
> -inst->opcode = BRW_OPCODE_ADD;
> -inst->src[1].f *= inst->src[2].f;
> -inst->src[2] = reg_undef;
> -progress = true;
> - }
> - break;
>case SHADER_OPCODE_BROADCAST:
>   if (is_uniform(inst->src[0])) {
>  inst->opcode = BRW_OPCODE_MOV;
> --
> 2.17.1
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v3 38/42] intel/compiler: fix cmod propagation for non 32-bit types

2019-01-17 Thread Jason Ekstrand
Reviewed-by: Jason Ekstrand 

On Tue, Jan 15, 2019 at 7:55 AM Iago Toral Quiroga 
wrote:

> v2:
>  - Do not propagate if the bit-size changes
> ---
>  src/intel/compiler/brw_fs_cmod_propagation.cpp | 14 +-
>  1 file changed, 9 insertions(+), 5 deletions(-)
>
> diff --git a/src/intel/compiler/brw_fs_cmod_propagation.cpp
> b/src/intel/compiler/brw_fs_cmod_propagation.cpp
> index 7bb5c9afbc9..57d4e645c05 100644
> --- a/src/intel/compiler/brw_fs_cmod_propagation.cpp
> +++ b/src/intel/compiler/brw_fs_cmod_propagation.cpp
> @@ -244,8 +244,7 @@ opt_cmod_propagation_local(const gen_device_info
> *devinfo,
>  /* CMP's result is the same regardless of dest type. */
>  if (inst->conditional_mod == BRW_CONDITIONAL_NZ &&
>  scan_inst->opcode == BRW_OPCODE_CMP &&
> -(inst->dst.type == BRW_REGISTER_TYPE_D ||
> - inst->dst.type == BRW_REGISTER_TYPE_UD)) {
> +brw_reg_type_is_integer(inst->dst.type)) {
> inst->remove(block);
> progress = true;
> break;
> @@ -258,9 +257,14 @@ opt_cmod_propagation_local(const gen_device_info
> *devinfo,
> break;
>
>  /* Comparisons operate differently for ints and floats */
> -if (scan_inst->dst.type != inst->dst.type &&
> -(scan_inst->dst.type == BRW_REGISTER_TYPE_F ||
> - inst->dst.type == BRW_REGISTER_TYPE_F))
> +if (brw_reg_type_is_floating_point(scan_inst->dst.type) !=
> +brw_reg_type_is_floating_point(inst->dst.type))
> +   break;
> +
> +/* Comparison result may be altered if the bit-size changes
> + * since that affects range, denorms, etc
> + */
> +if (type_sz(scan_inst->dst.type) != type_sz(inst->dst.type))
> break;
>
>  /* If the instruction generating inst's source also wrote the
> --
> 2.17.1
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v3 36/42] intel/compiler: implement is_zero, is_one, is_negative_one for 8-bit/16-bit

2019-01-17 Thread Jason Ekstrand
Reviewed-by: Jason Ekstrand 

On Tue, Jan 15, 2019 at 7:55 AM Iago Toral Quiroga 
wrote:

> There are no 8-bit immediates, so assert in that case.
> 16-bit immediates are replicated in each word of a 32-bit immediate, so
> we only need to check the lower 16-bits.
>
> v2:
>  - Fix is_zero with half-float to consider -0 as well (Jason).
>  - Fix is_negative_one for word type.
> ---
>  src/intel/compiler/brw_shader.cpp | 26 ++
>  1 file changed, 26 insertions(+)
>
> diff --git a/src/intel/compiler/brw_shader.cpp
> b/src/intel/compiler/brw_shader.cpp
> index 97966c951a1..3c636c9d3a4 100644
> --- a/src/intel/compiler/brw_shader.cpp
> +++ b/src/intel/compiler/brw_shader.cpp
> @@ -704,11 +704,20 @@ backend_reg::is_zero() const
> if (file != IMM)
>return false;
>
> +   assert(type_sz(type) > 1);
> +
> switch (type) {
> +   case BRW_REGISTER_TYPE_HF:
> +  assert((d & 0x) == ((d >> 16) & 0x));
> +  return (d & 0x) == 0 || (d & 0x) == 0x8000;
> case BRW_REGISTER_TYPE_F:
>return f == 0;
> case BRW_REGISTER_TYPE_DF:
>return df == 0;
> +   case BRW_REGISTER_TYPE_W:
> +   case BRW_REGISTER_TYPE_UW:
> +  assert((d & 0x) == ((d >> 16) & 0x));
> +  return (d & 0x) == 0;
> case BRW_REGISTER_TYPE_D:
> case BRW_REGISTER_TYPE_UD:
>return d == 0;
> @@ -726,11 +735,20 @@ backend_reg::is_one() const
> if (file != IMM)
>return false;
>
> +   assert(type_sz(type) > 1);
> +
> switch (type) {
> +   case BRW_REGISTER_TYPE_HF:
> +  assert((d & 0x) == ((d >> 16) & 0x));
> +  return (d & 0x) == 0x3c00;
> case BRW_REGISTER_TYPE_F:
>return f == 1.0f;
> case BRW_REGISTER_TYPE_DF:
>return df == 1.0;
> +   case BRW_REGISTER_TYPE_W:
> +   case BRW_REGISTER_TYPE_UW:
> +  assert((d & 0x) == ((d >> 16) & 0x));
> +  return (d & 0x) == 1;
> case BRW_REGISTER_TYPE_D:
> case BRW_REGISTER_TYPE_UD:
>return d == 1;
> @@ -748,11 +766,19 @@ backend_reg::is_negative_one() const
> if (file != IMM)
>return false;
>
> +   assert(type_sz(type) > 1);
> +
> switch (type) {
> +   case BRW_REGISTER_TYPE_HF:
> +  assert((d & 0x) == ((d >> 16) & 0x));
> +  return (d & 0x) == 0xbc00;
> case BRW_REGISTER_TYPE_F:
>return f == -1.0;
> case BRW_REGISTER_TYPE_DF:
>return df == -1.0;
> +   case BRW_REGISTER_TYPE_W:
> +  assert((d & 0x) == ((d >> 16) & 0x));
> +  return (d & 0x) == 0x;
> case BRW_REGISTER_TYPE_D:
>return d == -1;
> case BRW_REGISTER_TYPE_Q:
> --
> 2.17.1
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v3 31/42] intel/compiler: ask for an integer type if requesting an 8-bit type

2019-01-17 Thread Jason Ekstrand
On Tue, Jan 15, 2019 at 7:55 AM Iago Toral Quiroga 
wrote:

> ---
>  src/intel/compiler/brw_fs_nir.cpp | 9 +++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/src/intel/compiler/brw_fs_nir.cpp
> b/src/intel/compiler/brw_fs_nir.cpp
> index a3d193b8a44..ccf1891b925 100644
> --- a/src/intel/compiler/brw_fs_nir.cpp
> +++ b/src/intel/compiler/brw_fs_nir.cpp
> @@ -346,7 +346,9 @@ fs_visitor::nir_emit_impl(nir_function_impl *impl)
>   reg->num_array_elems == 0 ? 1 : reg->num_array_elems;
>unsigned size = array_elems * reg->num_components;
>const brw_reg_type reg_type =
> - brw_reg_type_from_bit_size(reg->bit_size, BRW_REGISTER_TYPE_F);
> + brw_reg_type_from_bit_size(reg->bit_size,
> +reg->bit_size == 8 ?
> BRW_REGISTER_TYPE_D :
> +
>  BRW_REGISTER_TYPE_F);
>

If it were me, I'd do

const brw_reg_type reg_type = reg->bit_size == 8 ? BRW_REGISTER_TYPE_B :
   brw_reg_type_for_bit_size(reg->bit_size, BRW_REGISTER_TYPE_F);

I just find that a tiny bit easier to parse.  Either way is fine though.

Reviewed-by: Jason Ekstrand 


>nir_locals[reg->index] = bld.vgrf(reg_type, size);
> }
>
> @@ -4281,7 +4283,10 @@ fs_visitor::nir_emit_intrinsic(const fs_builder
> , nir_intrinsic_instr *instr
>fs_reg value = get_nir_src(instr->src[0]);
>if (instr->intrinsic == nir_intrinsic_vote_feq) {
>   const unsigned bit_size = nir_src_bit_size(instr->src[0]);
> - value.type = brw_reg_type_from_bit_size(bit_size,
> BRW_REGISTER_TYPE_F);
> + value.type =
> +brw_reg_type_from_bit_size(bit_size,
> +   bit_size == 8 ?
> BRW_REGISTER_TYPE_D :
> +
>  BRW_REGISTER_TYPE_F);
>}
>
>fs_reg uniformized = bld.emit_uniformize(value);
> --
> 2.17.1
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] intel/fs: Don't apply the des stride alignment rule to accumulators

2019-01-17 Thread Jason Ekstrand
On Thu, Jan 17, 2019 at 5:11 PM Francisco Jerez 
wrote:

> Subject is still inaccurate.  How about "intel/fs: Don't touch
> accumulator destination while applying regioning alignment rule."
>

Sounds good to me.  I'll fix it.


>
> Jason Ekstrand  writes:
>
> > In some shaders, you can end up with a stride in the source of a
> > SHADER_OPCODE_MULH.  One way this can happen is if the MULH is acting on
> > the top bits of a 64-bit value due to 64-bit integer lowering.  In this
> > case, the compiler will produce something like this:
> >
> > mul(8)acc0<1>UD   g5<8,4,2>UD   0x0004UW  { align1 1Q };
> > mach(8)   g6<1>UD g5<8,4,2>UD   0x0004UD  { align1 1Q
> AccWrEnable };
> >
> > The new region fixup pass looks at the MUL and sees a strided source and
> > unstrided destination and determines that the sequence is illegal.  It
> > then attempts to fix the illegal stride by replacing the destination of
> > the MUL with a temporary and emitting a MOV into the accumulator:
> >
> > mul(8)g9<2>UD g5<8,4,2>UD   0x0004UW  { align1 1Q };
> > mov(8)acc0<1>UD   g9<8,4,2>UD { align1 1Q };
> > mach(8)   g6<1>UD g5<8,4,2>UD   0x0004UD  { align1 1Q
> AccWrEnable };
> >
> > Unfortunately, this new sequence isn't correct because MOV accesses the
> > accumulator with a different precision to MUL and, instead of filling
> > the bottom 32 bits with the source and zeroing the top 32 bits, it
> > leaves the top 32 (or maybe 31) bits alone and full of garbage.  When
> > the MACH comes along and tries to complete the multiplication, the
> > result is correct in the bottom 32 bits (which we throw away) and
> > garbage in the top 32 bits which are actually returned by MACH.
> >
> > This commit does two things:  First, it adds an assert to ensure that we
> > don't try to rewrite accumulator destinations of MUL instructions so we
> > can avoid this precision issue.  Second, it modifies
> > required_dst_byte_stride to require a tightly packed stride so that we
> > fix up the sources instead and the actual code which gets emitted is
> > this:
> >
> > mov(8)g9<1>UD g5<8,4,2>UD { align1 1Q };
> > mul(8)acc0<1>UD   g9<8,8,1>UD   0x0004UW  { align1 1Q };
> > mach(8)   g6<1>UD g5<8,4,2>UD   0x0004UD  { align1 1Q
> AccWrEnable };
> >
> > Fixes: efa4e4bc5fc "intel/fs: Introduce regioning lowering pass"
> > Cc: Francisco Jerez 
> > ---
> >  src/intel/compiler/brw_fs_lower_regioning.cpp | 24 ++-
> >  1 file changed, 23 insertions(+), 1 deletion(-)
> >
> > diff --git a/src/intel/compiler/brw_fs_lower_regioning.cpp
> b/src/intel/compiler/brw_fs_lower_regioning.cpp
> > index cc4163b4c2c..00cb5769ebe 100644
> > --- a/src/intel/compiler/brw_fs_lower_regioning.cpp
> > +++ b/src/intel/compiler/brw_fs_lower_regioning.cpp
> > @@ -53,7 +53,21 @@ namespace {
> > unsigned
> > required_dst_byte_stride(const fs_inst *inst)
> > {
> > -  if (type_sz(inst->dst.type) < get_exec_type_size(inst) &&
> > +  if (inst->dst.is_accumulator()) {
> > + /* If the destination is an accumulator, insist that we leave
> the
> > +  * stride alone.  We cannot "fix" accumulator destinations by
> writing
> > +  * to a temporary and emitting a MOV into the original
> destination.
> > +  * For multiply instructions (our one use of the accumulator),
> the
> > +  * MUL writes the full 66 bits of the accumulator whereas the
> MOV we
> > +  * would emit only writes 33 bits ane leaves the top 33 bits
>
> and*
>
> > +  * undefined.
> > +  *
> > +  * It's safe to just require the original stride here because
> the
> > +  * lowering pass will detect the mismatch in
> required_src_byte_stride
> > +  * just fix up the sources of the multiply instead of the
> destination.
>
> There isn't such a thing as "required_src_byte_stride".  Conjunction
> missing between that sentence and the "just fix up..." one.
>
> Code is still:
>
> Reviewed-by: Francisco Jerez 
>

Thanks!  I'll do one more Jenkins run and push it.

--Jason


> > +  */
> > + return inst->dst.stride * type_sz(inst->dst.type);
> > +  } else if (type_sz(inst->dst.type) < get_exec_type_size(inst) &&
> >!is_byte_raw_mov(inst)) {
> >   return get_exec_type_size(inst);
> >} else {
> > @@ -316,6 +330,14 @@ namespace {
> > bool
> > lower_dst_region(fs_visitor *v, bblock_t *block, fs_inst *inst)
> > {
> > +  /* We cannot replace the result of an integer multiply which
> writes the
> > +   * accumulator because MUL+MACH pairs act on the accumulator as a
> 66-bit
> > +   * value whereas the MOV will act on only 32 or 33 bits of the
> > +   * accumulator.
> > +   */
> > +  assert(inst->opcode != BRW_OPCODE_MUL ||
> !inst->dst.is_accumulator() ||
> > + brw_reg_type_is_floating_point(inst->dst.type));
> > +
> >const fs_builder 

Re: [Mesa-dev] [PATCH] intel/fs: Don't apply the des stride alignment rule to accumulators

2019-01-17 Thread Francisco Jerez
Jason Ekstrand  writes:

> On Thu, Jan 17, 2019 at 3:34 PM Francisco Jerez 
> wrote:
>
>> Jason Ekstrand  writes:
>>
>> > Bah... previous e-mail unfinished.  Please ignore.
>> >
>> > On Thu, Jan 17, 2019 at 4:15 AM Francisco Jerez 
>> > wrote:
>> >
>> >> Jason Ekstrand  writes:
>> >>
>> >> > The pass was discovered to cause problems with the MUL+MACH
>> combination
>> >> > we emit for nir_[iu]mul_high.  In an experimental branch of mine, I
>> ran
>> >> > into issues where the MUL+MACH ended up using a strided source due to
>> >> > working on half of a uint64_t and the new lowering pass helpfully
>> tried
>> >> > to fix the multiply which wrote to an unstriated accumulator.
>> >>
>> >> > Not only did the multiply not need to be fixed
>> >>
>> >> That's far from clear, and inconsistent with what this patch is doing,
>> >> since the fix is still being applied (Wouldn't it make sense to clarify
>> >> that in the commit message since it's slightly misleading about it?).
>> >>
>> >> The original instruction was technically violating the first CHV/BXT
>> >> double-precision regioning restriction before the pass was introduced,
>> >> that's why it made any changes in the first place.  The integer
>> >> multiplication lowering code was just lucky enough that violating the
>> >> restriction didn't matter in this case, but I doubt that the reason for
>> >> that had anything to do with the accumulator being the explicit
>> >> destination...
>> >>
>> >
>> > Explicit, no, but I do suspect that does have to do with it being the
>> > accumulator.  This restriction isn't theoretical; if you violate it
>> > with a GRF, you will get data corruption; I've seen it myself.
>>
>> The BSpec language is vague and frequently inconsistent.  Obviously it
>> was being violated before because the text doesn't name the accumulator
>> as an exemption from that rule.  The fact that you've seen it blow up
>> with corruption before doesn't guarantee it will always blow up under
>> the conditions stated on the hardware spec (because those conditions are
>> a highly imperfect abstraction of the hardware logic rather than the
>> hardware logic itself).  It's because the restriction (as it's
>> enunciated in the BSpec) was purely theoretical that the MULH
>> implementation worked in the first place.
>>
>> > I could see two possible explanations:
>> >
>> >  1. Under the hood the accumulator is written with a Q type and an
>> internal
>> > stride of 8 bytes, hence the restriction does apply but is implicitly
>> > satisfied for D type source strides of 1 and 2.
>> >  2. The data path to the accumulator is a special case in the hardware
>> and
>> > doesn't use the normal general-purpose regioning logic and so doesn't
>> > require the restriction.
>> >
>>
>> I don't see any evidence for any of these explanations.  I believe that
>> the actual reason why the MULH implementation didn't suffer the effects
>> of violating these restrictions is that in fact they don't apply to
>> *any* 32x16-bit integer multiply operations at all despite what the
>> hardware spec says, whether the destination is the accumulator or not.
>>
>> I've verified it by doing a daily CI run on the following patch:
>>
>>
>> https://cgit.freedesktop.org/~currojerez/mesa/commit/?h=jenkins=c1a32c4e1e53d70b0c8f6254f0f53f0230b7e21b
>>
>> It disables legalization of the integer multiply instruction and then
>> adds a hack to lower_integer_multiplication() for it to intentionally
>> break the alignment rule.  No regressions on CHV/BXT/GLK.  My reading of
>> the simulator confirms that 32x16-bit multiplication isn't affected by
>> the restriction.
>>
>
> That explanation makes sense especially when combined with the fact that
> DxD -> Q and DxD -> D was added on gen8 and DxW -> D or DxW -> acc0 has
> been around for a long time.
>
>
>> I'm tempted to send a patch that disables regioning alignment lowering
>> for 32x16-bit integer multiplication strictly for performance.  But
>> that's really an orthogonal change to this patch, since due to the issue
>> of precision loss we still need to make sure not to touch accumulator
>> destinations in instructions that *do* have this restriction.
>>
>
> I just searched through the fs code and implementing MULH is the only use
> of the accumulator in the scalar back-end.  Given that it explicitly only
> does a DxW multiply, changing the lowering pass to only apply to DxD
> multiplies would guarantee correct use of the accumulator.  If we think
> this is the real reason, then I'd be a fan of such an approach.
>

"Guarantee correct" sounds somewhat too strong to me.  Yes, it would
prevent the failure you've seen for the time being and until something
in the compiler changes and introduces additional uses of the
accumulator.  Because nothing in the IR guarantees that the accumulator
won't ever be used in combination with a restricted instruction I'd
argue that we want both patches.

> As a side-note, we really should add a NIR opcode for DxD -> Q 

Re: [Mesa-dev] [PATCH v3 00/42] intel: VK_KHR_shader_float16_int8 implementation

2019-01-17 Thread Jason Ekstrand
On Tue, Jan 15, 2019 at 7:54 AM Iago Toral Quiroga 
wrote:

> The changes in this version address review feedback to v2 and, most
> importantly,
> rebase on top of relevant changes in master, specifically Curro's regioning
> lowering pass. This new regioning pass simplifies some of the NIR
> translation
> code (specifically the code for translating regioning restrictions on
> conversions for atom platforms) making some of the previous work in this
> series
> unnecessary. The regioning restrictions for conversions between integer and
> half-float added with this series are are now implemented as part of this
> framework instead of doing it at NIR translation time. This version of the
> series also dropped the SPIR-V compiler patches that have already been
> merged.
>
> As always, a branch for with these patches is available for testing in the
> itoral/VK_KHR_shader_float16_int8 branch of the Igalia Mesa repository at
> https://github.com/Igalia/mesa.
>
> Iago Toral Quiroga (42):
>   intel/compiler: handle conversions between int and half-float on atom
>   intel/compiler: add a NIR pass to lower conversions
>   intel/compiler: split float to 64-bit opcodes from int to 64-bit
>   intel/compiler: handle b2i/b2f with other integer conversion opcodes
>   intel/compiler: assert restrictions on conversions to half-float
>   intel/compiler: lower some 16-bit float operations to 32-bit
>   intel/compiler: lower 16-bit extended math to 32-bit prior to gen9
>   intel/compiler: implement 16-bit fsign
>   intel/compiler: allow extended math functions with HF operands
>   compiler/nir: add lowering option for 16-bit fmod
>   intel/compiler: lower 16-bit fmod
>   compiler/nir: add lowering for 16-bit flrp
>   intel/compiler: lower 16-bit flrp
>   compiler/nir: add lowering for 16-bit ldexp
>   intel/compiler: Extended Math is limited to SIMD8 on half-float
>   intel/compiler: add instruction setters for Src1Type and Src2Type.
>   intel/compiler: add new half-float register type for 3-src
> instructions
>   intel/compiler: add a helper function to query hardware type table
>   intel/compiler: don't compact 3-src instructions with Src1Type or
> Src2Type bits
>   intel/compiler: allow half-float on 3-source instructions since gen8
>   intel/compiler: set correct precision fields for 3-source float
> instructions
>   intel/compiler: don't propagate HF immediates to 3-src instructions
>   intel/compiler: fix ddx and ddy for 16-bit float
>   intel/compiler: fix ddy for half-float in gen8
>   intel/compiler: workaround for SIMD8 half-float MAD in gen8
>   intel/compiler: split is_partial_write() into two variants
>   intel/compiler: activate 16-bit bit-size lowerings also for 8-bit
>   intel/compiler: handle 64-bit float to 8-bit integer conversions
>   intel/compiler: handle conversions between int and half-float on atom
>

I can't find this patch (29/42) in my e-mail but it's on patchwork.  In any
case, I don't think it's doing anything useful anymore and should just be
dropped.


>   intel/compiler: implement isign for int8
>   intel/compiler: ask for an integer type if requesting an 8-bit type
>   intel/eu: force stride of 2 on NULL register for Byte instructions
>   compiler/spirv: add support for Float16 and Int8 capabilities
>   anv/pipeline: support Float16 and Int8 capabilities in gen8+
>   anv/device: expose shaderFloat16 and shaderInt8 in gen8+
>   intel/compiler: implement is_zero, is_one, is_negative_one for
> 8-bit/16-bit
>   intel/compiler: add a brw_reg_type_is_integer helper
>   intel/compiler: fix cmod propagation for non 32-bit types
>   intel/compiler: remove MAD/LRP algebraic optimizations from the
> backend
>   intel/compiler: support half-float in the combine constants pass
>   intel/compiler: fix combine constants for Align16 with half-float
> prior to gen9
>   intel/compiler: allow propagating HF immediates to MAD/LRP
>
>  src/compiler/nir/nir.h|   2 +
>  src/compiler/nir/nir_opt_algebraic.py |  11 +-
>  src/compiler/shader_info.h|   2 +
>  src/compiler/spirv/spirv_to_nir.c |   8 +-
>  src/intel/Makefile.sources|   1 +
>  src/intel/compiler/brw_compiler.c |   2 +
>  src/intel/compiler/brw_eu_compact.c   |   5 +-
>  src/intel/compiler/brw_eu_emit.c  |  36 +++-
>  src/intel/compiler/brw_fs.cpp | 143 ++--
>  src/intel/compiler/brw_fs.h   |   1 +
>  .../compiler/brw_fs_cmod_propagation.cpp  |  34 ++--
>  .../compiler/brw_fs_combine_constants.cpp |  82 +++--
>  .../compiler/brw_fs_copy_propagation.cpp  |  14 +-
>  src/intel/compiler/brw_fs_cse.cpp |   3 +-
>  .../compiler/brw_fs_dead_code_eliminate.cpp   |   2 +-
>  src/intel/compiler/brw_fs_generator.cpp   |  47 +++---
>  src/intel/compiler/brw_fs_live_variables.cpp  |   2 +-
>  src/intel/compiler/brw_fs_nir.cpp |  85 --
>  

Re: [Mesa-dev] [PATCH v3 28/42] intel/compiler: handle 64-bit float to 8-bit integer conversions

2019-01-17 Thread Jason Ekstrand
This patch doesn't really do what the commit message says.  What it really
does is implement float -> 8-bit converions for *any* size float.

On Tue, Jan 15, 2019 at 7:55 AM Iago Toral Quiroga 
wrote:

> These are not directly supported in hardware and brw_nir_lower_conversions
> should have taken care of that before we get here. Also, while we are
> at it, make sure 64-bit integer to 8-bit are also properly split by
> the same lowering pass, since they have the same hardware restrictions.
>

Now that we have a lowering pass, having separate cases just so one of them
can assert seems silly.  If anything, we should just do

if (result.type == BRW_REGISTER_TYPE_B ||
result.type == BRW_REGISTER_TYPE_UB ||
result.type == BRW_REGISTER_TYPE_HF)
   assert(type_sz(op[0].type) < 8) /* brw_nir_lower_conversions */

and have it all in one big case.  The only special case we need is for
booleans where we need to negate them and fall through.

--Jason


> ---
>  src/intel/compiler/brw_fs_nir.cpp | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/src/intel/compiler/brw_fs_nir.cpp
> b/src/intel/compiler/brw_fs_nir.cpp
> index cf546b8ff09..e454578d99b 100644
> --- a/src/intel/compiler/brw_fs_nir.cpp
> +++ b/src/intel/compiler/brw_fs_nir.cpp
> @@ -786,6 +786,10 @@ fs_visitor::nir_emit_alu(const fs_builder ,
> nir_alu_instr *instr)
> case nir_op_f2f16:
> case nir_op_i2f16:
> case nir_op_u2f16:
> +   case nir_op_i2i8:
> +   case nir_op_u2u8:
> +   case nir_op_f2i8:
> +   case nir_op_f2u8:
>assert(type_sz(op[0].type) < 8); /* brw_nir_lower_conversions */
>inst = bld.MOV(result, op[0]);
>inst->saturate = instr->dest.saturate;
> @@ -824,8 +828,6 @@ fs_visitor::nir_emit_alu(const fs_builder ,
> nir_alu_instr *instr)
> case nir_op_u2u32:
> case nir_op_i2i16:
> case nir_op_u2u16:
> -   case nir_op_i2i8:
> -   case nir_op_u2u8:
>inst = bld.MOV(result, op[0]);
>inst->saturate = instr->dest.saturate;
>break;
> --
> 2.17.1
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] intel/fs: Don't apply the des stride alignment rule to accumulators

2019-01-17 Thread Francisco Jerez
Subject is still inaccurate.  How about "intel/fs: Don't touch
accumulator destination while applying regioning alignment rule."

Jason Ekstrand  writes:

> In some shaders, you can end up with a stride in the source of a
> SHADER_OPCODE_MULH.  One way this can happen is if the MULH is acting on
> the top bits of a 64-bit value due to 64-bit integer lowering.  In this
> case, the compiler will produce something like this:
>
> mul(8)acc0<1>UD   g5<8,4,2>UD   0x0004UW  { align1 1Q };
> mach(8)   g6<1>UD g5<8,4,2>UD   0x0004UD  { align1 1Q AccWrEnable };
>
> The new region fixup pass looks at the MUL and sees a strided source and
> unstrided destination and determines that the sequence is illegal.  It
> then attempts to fix the illegal stride by replacing the destination of
> the MUL with a temporary and emitting a MOV into the accumulator:
>
> mul(8)g9<2>UD g5<8,4,2>UD   0x0004UW  { align1 1Q };
> mov(8)acc0<1>UD   g9<8,4,2>UD { align1 1Q };
> mach(8)   g6<1>UD g5<8,4,2>UD   0x0004UD  { align1 1Q AccWrEnable };
>
> Unfortunately, this new sequence isn't correct because MOV accesses the
> accumulator with a different precision to MUL and, instead of filling
> the bottom 32 bits with the source and zeroing the top 32 bits, it
> leaves the top 32 (or maybe 31) bits alone and full of garbage.  When
> the MACH comes along and tries to complete the multiplication, the
> result is correct in the bottom 32 bits (which we throw away) and
> garbage in the top 32 bits which are actually returned by MACH.
>
> This commit does two things:  First, it adds an assert to ensure that we
> don't try to rewrite accumulator destinations of MUL instructions so we
> can avoid this precision issue.  Second, it modifies
> required_dst_byte_stride to require a tightly packed stride so that we
> fix up the sources instead and the actual code which gets emitted is
> this:
>
> mov(8)g9<1>UD g5<8,4,2>UD { align1 1Q };
> mul(8)acc0<1>UD   g9<8,8,1>UD   0x0004UW  { align1 1Q };
> mach(8)   g6<1>UD g5<8,4,2>UD   0x0004UD  { align1 1Q AccWrEnable };
>
> Fixes: efa4e4bc5fc "intel/fs: Introduce regioning lowering pass"
> Cc: Francisco Jerez 
> ---
>  src/intel/compiler/brw_fs_lower_regioning.cpp | 24 ++-
>  1 file changed, 23 insertions(+), 1 deletion(-)
>
> diff --git a/src/intel/compiler/brw_fs_lower_regioning.cpp 
> b/src/intel/compiler/brw_fs_lower_regioning.cpp
> index cc4163b4c2c..00cb5769ebe 100644
> --- a/src/intel/compiler/brw_fs_lower_regioning.cpp
> +++ b/src/intel/compiler/brw_fs_lower_regioning.cpp
> @@ -53,7 +53,21 @@ namespace {
> unsigned
> required_dst_byte_stride(const fs_inst *inst)
> {
> -  if (type_sz(inst->dst.type) < get_exec_type_size(inst) &&
> +  if (inst->dst.is_accumulator()) {
> + /* If the destination is an accumulator, insist that we leave the
> +  * stride alone.  We cannot "fix" accumulator destinations by 
> writing
> +  * to a temporary and emitting a MOV into the original destination.
> +  * For multiply instructions (our one use of the accumulator), the
> +  * MUL writes the full 66 bits of the accumulator whereas the MOV we
> +  * would emit only writes 33 bits ane leaves the top 33 bits

and*

> +  * undefined.
> +  *
> +  * It's safe to just require the original stride here because the
> +  * lowering pass will detect the mismatch in 
> required_src_byte_stride
> +  * just fix up the sources of the multiply instead of the 
> destination.

There isn't such a thing as "required_src_byte_stride".  Conjunction
missing between that sentence and the "just fix up..." one.

Code is still:

Reviewed-by: Francisco Jerez 

> +  */
> + return inst->dst.stride * type_sz(inst->dst.type);
> +  } else if (type_sz(inst->dst.type) < get_exec_type_size(inst) &&
>!is_byte_raw_mov(inst)) {
>   return get_exec_type_size(inst);
>} else {
> @@ -316,6 +330,14 @@ namespace {
> bool
> lower_dst_region(fs_visitor *v, bblock_t *block, fs_inst *inst)
> {
> +  /* We cannot replace the result of an integer multiply which writes the
> +   * accumulator because MUL+MACH pairs act on the accumulator as a 
> 66-bit
> +   * value whereas the MOV will act on only 32 or 33 bits of the
> +   * accumulator.
> +   */
> +  assert(inst->opcode != BRW_OPCODE_MUL || !inst->dst.is_accumulator() ||
> + brw_reg_type_is_floating_point(inst->dst.type));
> +
>const fs_builder ibld(v, block, inst);
>const unsigned stride = required_dst_byte_stride(inst) /
>type_sz(inst->dst.type);
> -- 
> 2.20.1


signature.asc
Description: PGP signature
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v3 22/42] intel/compiler: don't propagate HF immediates to 3-src instructions

2019-01-17 Thread Jason Ekstrand
Reviewed-by: Jason Ekstrand 

On Tue, Jan 15, 2019 at 7:54 AM Iago Toral Quiroga 
wrote:

> 3-src instructions don't support immediates, but since 36bc5f06dd22,
> we allow them on MAD and LRP relying on the combine constants pass to
> fix it up later. However, that pass is specialized for 32-bit float
> immediates and can't handle HF constants at present, so this patch
> ensures that copy-propagation only does this for 32-bit constants.
>
> Reviewed-by: Topi Pohjolainen 
> ---
>  src/intel/compiler/brw_fs_copy_propagation.cpp | 12 ++--
>  1 file changed, 10 insertions(+), 2 deletions(-)
>
> diff --git a/src/intel/compiler/brw_fs_copy_propagation.cpp
> b/src/intel/compiler/brw_fs_copy_propagation.cpp
> index c23ce1ef426..77f2749ba04 100644
> --- a/src/intel/compiler/brw_fs_copy_propagation.cpp
> +++ b/src/intel/compiler/brw_fs_copy_propagation.cpp
> @@ -772,8 +772,16 @@ fs_visitor::try_constant_propagate(fs_inst *inst,
> acp_entry *entry)
>
>case BRW_OPCODE_MAD:
>case BRW_OPCODE_LRP:
> - inst->src[i] = val;
> - progress = true;
> + /* 3-src instructions can't take IMM registers, however, for
> 32-bit
> +  * floating instructions we rely on the combine constants pass
> to fix
> +  * it up. For anything else, we shouldn't be promoting immediates
> +  * until we can make the pass capable of combining constants of
> +  * different sizes.
> +  */
> + if (val.type == BRW_REGISTER_TYPE_F) {
> +inst->src[i] = val;
> +progress = true;
> + }
>   break;
>
>default:
> --
> 2.17.1
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] intel/fs: Don't apply the des stride alignment rule to accumulators

2019-01-17 Thread Jason Ekstrand
In some shaders, you can end up with a stride in the source of a
SHADER_OPCODE_MULH.  One way this can happen is if the MULH is acting on
the top bits of a 64-bit value due to 64-bit integer lowering.  In this
case, the compiler will produce something like this:

mul(8)acc0<1>UD   g5<8,4,2>UD   0x0004UW  { align1 1Q };
mach(8)   g6<1>UD g5<8,4,2>UD   0x0004UD  { align1 1Q AccWrEnable };

The new region fixup pass looks at the MUL and sees a strided source and
unstrided destination and determines that the sequence is illegal.  It
then attempts to fix the illegal stride by replacing the destination of
the MUL with a temporary and emitting a MOV into the accumulator:

mul(8)g9<2>UD g5<8,4,2>UD   0x0004UW  { align1 1Q };
mov(8)acc0<1>UD   g9<8,4,2>UD { align1 1Q };
mach(8)   g6<1>UD g5<8,4,2>UD   0x0004UD  { align1 1Q AccWrEnable };

Unfortunately, this new sequence isn't correct because MOV accesses the
accumulator with a different precision to MUL and, instead of filling
the bottom 32 bits with the source and zeroing the top 32 bits, it
leaves the top 32 (or maybe 31) bits alone and full of garbage.  When
the MACH comes along and tries to complete the multiplication, the
result is correct in the bottom 32 bits (which we throw away) and
garbage in the top 32 bits which are actually returned by MACH.

This commit does two things:  First, it adds an assert to ensure that we
don't try to rewrite accumulator destinations of MUL instructions so we
can avoid this precision issue.  Second, it modifies
required_dst_byte_stride to require a tightly packed stride so that we
fix up the sources instead and the actual code which gets emitted is
this:

mov(8)g9<1>UD g5<8,4,2>UD { align1 1Q };
mul(8)acc0<1>UD   g9<8,8,1>UD   0x0004UW  { align1 1Q };
mach(8)   g6<1>UD g5<8,4,2>UD   0x0004UD  { align1 1Q AccWrEnable };

Fixes: efa4e4bc5fc "intel/fs: Introduce regioning lowering pass"
Cc: Francisco Jerez 
---
 src/intel/compiler/brw_fs_lower_regioning.cpp | 24 ++-
 1 file changed, 23 insertions(+), 1 deletion(-)

diff --git a/src/intel/compiler/brw_fs_lower_regioning.cpp 
b/src/intel/compiler/brw_fs_lower_regioning.cpp
index cc4163b4c2c..00cb5769ebe 100644
--- a/src/intel/compiler/brw_fs_lower_regioning.cpp
+++ b/src/intel/compiler/brw_fs_lower_regioning.cpp
@@ -53,7 +53,21 @@ namespace {
unsigned
required_dst_byte_stride(const fs_inst *inst)
{
-  if (type_sz(inst->dst.type) < get_exec_type_size(inst) &&
+  if (inst->dst.is_accumulator()) {
+ /* If the destination is an accumulator, insist that we leave the
+  * stride alone.  We cannot "fix" accumulator destinations by writing
+  * to a temporary and emitting a MOV into the original destination.
+  * For multiply instructions (our one use of the accumulator), the
+  * MUL writes the full 66 bits of the accumulator whereas the MOV we
+  * would emit only writes 33 bits ane leaves the top 33 bits
+  * undefined.
+  *
+  * It's safe to just require the original stride here because the
+  * lowering pass will detect the mismatch in required_src_byte_stride
+  * just fix up the sources of the multiply instead of the destination.
+  */
+ return inst->dst.stride * type_sz(inst->dst.type);
+  } else if (type_sz(inst->dst.type) < get_exec_type_size(inst) &&
   !is_byte_raw_mov(inst)) {
  return get_exec_type_size(inst);
   } else {
@@ -316,6 +330,14 @@ namespace {
bool
lower_dst_region(fs_visitor *v, bblock_t *block, fs_inst *inst)
{
+  /* We cannot replace the result of an integer multiply which writes the
+   * accumulator because MUL+MACH pairs act on the accumulator as a 66-bit
+   * value whereas the MOV will act on only 32 or 33 bits of the
+   * accumulator.
+   */
+  assert(inst->opcode != BRW_OPCODE_MUL || !inst->dst.is_accumulator() ||
+ brw_reg_type_is_floating_point(inst->dst.type));
+
   const fs_builder ibld(v, block, inst);
   const unsigned stride = required_dst_byte_stride(inst) /
   type_sz(inst->dst.type);
-- 
2.20.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] intel/fs: Don't apply the des stride alignment rule to accumulators

2019-01-17 Thread Jason Ekstrand
On Thu, Jan 17, 2019 at 3:34 PM Francisco Jerez 
wrote:

> Jason Ekstrand  writes:
>
> > Bah... previous e-mail unfinished.  Please ignore.
> >
> > On Thu, Jan 17, 2019 at 4:15 AM Francisco Jerez 
> > wrote:
> >
> >> Jason Ekstrand  writes:
> >>
> >> > The pass was discovered to cause problems with the MUL+MACH
> combination
> >> > we emit for nir_[iu]mul_high.  In an experimental branch of mine, I
> ran
> >> > into issues where the MUL+MACH ended up using a strided source due to
> >> > working on half of a uint64_t and the new lowering pass helpfully
> tried
> >> > to fix the multiply which wrote to an unstriated accumulator.
> >>
> >> > Not only did the multiply not need to be fixed
> >>
> >> That's far from clear, and inconsistent with what this patch is doing,
> >> since the fix is still being applied (Wouldn't it make sense to clarify
> >> that in the commit message since it's slightly misleading about it?).
> >>
> >> The original instruction was technically violating the first CHV/BXT
> >> double-precision regioning restriction before the pass was introduced,
> >> that's why it made any changes in the first place.  The integer
> >> multiplication lowering code was just lucky enough that violating the
> >> restriction didn't matter in this case, but I doubt that the reason for
> >> that had anything to do with the accumulator being the explicit
> >> destination...
> >>
> >
> > Explicit, no, but I do suspect that does have to do with it being the
> > accumulator.  This restriction isn't theoretical; if you violate it
> > with a GRF, you will get data corruption; I've seen it myself.
>
> The BSpec language is vague and frequently inconsistent.  Obviously it
> was being violated before because the text doesn't name the accumulator
> as an exemption from that rule.  The fact that you've seen it blow up
> with corruption before doesn't guarantee it will always blow up under
> the conditions stated on the hardware spec (because those conditions are
> a highly imperfect abstraction of the hardware logic rather than the
> hardware logic itself).  It's because the restriction (as it's
> enunciated in the BSpec) was purely theoretical that the MULH
> implementation worked in the first place.
>
> > I could see two possible explanations:
> >
> >  1. Under the hood the accumulator is written with a Q type and an
> internal
> > stride of 8 bytes, hence the restriction does apply but is implicitly
> > satisfied for D type source strides of 1 and 2.
> >  2. The data path to the accumulator is a special case in the hardware
> and
> > doesn't use the normal general-purpose regioning logic and so doesn't
> > require the restriction.
> >
>
> I don't see any evidence for any of these explanations.  I believe that
> the actual reason why the MULH implementation didn't suffer the effects
> of violating these restrictions is that in fact they don't apply to
> *any* 32x16-bit integer multiply operations at all despite what the
> hardware spec says, whether the destination is the accumulator or not.
>
> I've verified it by doing a daily CI run on the following patch:
>
>
> https://cgit.freedesktop.org/~currojerez/mesa/commit/?h=jenkins=c1a32c4e1e53d70b0c8f6254f0f53f0230b7e21b
>
> It disables legalization of the integer multiply instruction and then
> adds a hack to lower_integer_multiplication() for it to intentionally
> break the alignment rule.  No regressions on CHV/BXT/GLK.  My reading of
> the simulator confirms that 32x16-bit multiplication isn't affected by
> the restriction.
>

That explanation makes sense especially when combined with the fact that
DxD -> Q and DxD -> D was added on gen8 and DxW -> D or DxW -> acc0 has
been around for a long time.


> I'm tempted to send a patch that disables regioning alignment lowering
> for 32x16-bit integer multiplication strictly for performance.  But
> that's really an orthogonal change to this patch, since due to the issue
> of precision loss we still need to make sure not to touch accumulator
> destinations in instructions that *do* have this restriction.
>

I just searched through the fs code and implementing MULH is the only use
of the accumulator in the scalar back-end.  Given that it explicitly only
does a DxW multiply, changing the lowering pass to only apply to DxD
multiplies would guarantee correct use of the accumulator.  If we think
this is the real reason, then I'd be a fan of such an approach.

As a side-note, we really should add a NIR opcode for DxD -> Q multiply and
use that when lowering 64-bit multiplicatoin.

> I don't find the first one very convincing at all.  Among other things, if
> > it were the true reason, it would imply that we would need to use a
> stride
> > of exactly 2 on D type sources which but empirical evidence suggests that
> > "mul(8) acc0<1> g5<8,8,1>UD g9<16,8,2>UW" works just fine.
> >
> >
> >> > but the "fix" ended up breaking it because a MOV to the accumulator is
> >> > not the same as using it as a multiply destination due to the 

Re: [Mesa-dev] [PATCH] autotools: Deprecate the use of autotools

2019-01-17 Thread Jonathan Gray
On Thu, Jan 17, 2019 at 05:05:41PM -0500, Rob Clark wrote:
> On Thu, Jan 17, 2019 at 3:56 PM Jonathan Gray  wrote:
> >
> > On Mon, Jan 14, 2019 at 03:46:35PM +0200, Eero Tamminen wrote:
> > > Hi,
> > >
> > > On 13.1.2019 9.44, Jonathan Gray wrote:
> > > ...> As we can not depend on python to build Mesa in OpenBSD I will have 
> > > to
> > > > go back to maintaining a local Mesa build system if autotools is 
> > > > removed.
> > >
> > > Mesa already needs python-mako (for code generation) with Autotools.
> > > Why Meson also needing Python is a problem?
> >
> > Python generated files can be generated per release to not depend on
> > python at build time.
> >
> > All of the files generated by python used in a build of Mesa on OpenBSD are
> > already included in Mesa tarballs except for egd_tables.h
> >
> > https://bugs.freedesktop.org/show_bug.cgi?id=103911
> 
> is pre-generating the ninja build files that meson generates an
> option?  (I assume that the generated ninja ends up not being portable
> across different build environments, but maybe for OpenBSD the build
> environment is uniform enough??  idk, just a thought)

I suspect the build files meson/ninja generates would not be portable
across architectures with different toolchains but have not looked.

> 
> Anyways, a lot of other projects have been switching to meson, I guess
> OpenBSD will have to figure out how to deal getting python into build
> tree sooner or later, right?

OpenBSD is split into three repositories, src which contains the kernel,
libc, libssl, posix userland, toolchains (llvm, gcc, binutils) etc. 
xenocara which is the xserver, Mesa, xorg drivers, fonts etc and ports.

The src and xenocara trees are self contained and include all dependencies.
ports are makefiles which describe how to fetch sources and patches to
build binary packages.

meson is in ports and is used to build gnome and related parts but can
not be used in src or xenocara trees.  Python will not be imported into
src.  Either a c/perl implementation of meson is needed or a make based
build system from the OpenBSD perspective.

If Makefile.sources remains after autoconf goes I could use that as part
of a locally maintained not upstreamed make based build system.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] autotools: Deprecate the use of autotools

2019-01-17 Thread Rob Clark
On Thu, Jan 17, 2019 at 3:56 PM Jonathan Gray  wrote:
>
> On Mon, Jan 14, 2019 at 03:46:35PM +0200, Eero Tamminen wrote:
> > Hi,
> >
> > On 13.1.2019 9.44, Jonathan Gray wrote:
> > ...> As we can not depend on python to build Mesa in OpenBSD I will have to
> > > go back to maintaining a local Mesa build system if autotools is removed.
> >
> > Mesa already needs python-mako (for code generation) with Autotools.
> > Why Meson also needing Python is a problem?
>
> Python generated files can be generated per release to not depend on
> python at build time.
>
> All of the files generated by python used in a build of Mesa on OpenBSD are
> already included in Mesa tarballs except for egd_tables.h
>
> https://bugs.freedesktop.org/show_bug.cgi?id=103911

is pre-generating the ninja build files that meson generates an
option?  (I assume that the generated ninja ends up not being portable
across different build environments, but maybe for OpenBSD the build
environment is uniform enough??  idk, just a thought)

Anyways, a lot of other projects have been switching to meson, I guess
OpenBSD will have to figure out how to deal getting python into build
tree sooner or later, right?

BR,
-R
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] Android + tegra + mesa

2019-01-17 Thread Ilia Mirkin
I believe it should work in principle [at least the tegra drm +
nouveau bit, perhaps not the android things], however it's not
well-tested. I have a Jetson TK1, but it dies fairly instantaneously
once any network goes through the NIC, which is a bad situation for a
netboot setup.

If you're interested in doing some debugging, feel free to join
#nouveau on irc.freenode.net and we can probably help a bit. This
presupposes that you're comfortable building various packages,
potentially with packages, and have remote access to the device to be
able to play around with it. Otherwise we may have a tough time
helping.

Cheers,

  -ilia

On Thu, Jan 17, 2019 at 4:28 PM Artyom Bambalov
 wrote:
>
> Hello, guys. I have device with tegra K1(T124). I try to get open source 
> graphical stack. What I use: mesa 3d(18.2-19.0), libdrm(2.4.96), 
> drm_hwcomposer(the latest), gbm_gralloc(the latest), 4.19 kernel by 
> google(tegra drm + nouveau). What I get: normal bootanimatin, but the system 
> UI is not displaying properly(look at pinned picture). I also tested the 
> kernel with linux distr and all works fine. So, I wanted to ask is it 
> possible to get a normal picture on android with nouveau on tegra K1(T124)?
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] intel/fs: Don't apply the des stride alignment rule to accumulators

2019-01-17 Thread Francisco Jerez
Jason Ekstrand  writes:

> Bah... previous e-mail unfinished.  Please ignore.
>
> On Thu, Jan 17, 2019 at 4:15 AM Francisco Jerez 
> wrote:
>
>> Jason Ekstrand  writes:
>>
>> > The pass was discovered to cause problems with the MUL+MACH combination
>> > we emit for nir_[iu]mul_high.  In an experimental branch of mine, I ran
>> > into issues where the MUL+MACH ended up using a strided source due to
>> > working on half of a uint64_t and the new lowering pass helpfully tried
>> > to fix the multiply which wrote to an unstriated accumulator.
>>
>> > Not only did the multiply not need to be fixed
>>
>> That's far from clear, and inconsistent with what this patch is doing,
>> since the fix is still being applied (Wouldn't it make sense to clarify
>> that in the commit message since it's slightly misleading about it?).
>>
>> The original instruction was technically violating the first CHV/BXT
>> double-precision regioning restriction before the pass was introduced,
>> that's why it made any changes in the first place.  The integer
>> multiplication lowering code was just lucky enough that violating the
>> restriction didn't matter in this case, but I doubt that the reason for
>> that had anything to do with the accumulator being the explicit
>> destination...
>>
>
> Explicit, no, but I do suspect that does have to do with it being the
> accumulator.  This restriction isn't theoretical; if you violate it
> with a GRF, you will get data corruption; I've seen it myself.

The BSpec language is vague and frequently inconsistent.  Obviously it
was being violated before because the text doesn't name the accumulator
as an exemption from that rule.  The fact that you've seen it blow up
with corruption before doesn't guarantee it will always blow up under
the conditions stated on the hardware spec (because those conditions are
a highly imperfect abstraction of the hardware logic rather than the
hardware logic itself).  It's because the restriction (as it's
enunciated in the BSpec) was purely theoretical that the MULH
implementation worked in the first place.

> I could see two possible explanations:
>
>  1. Under the hood the accumulator is written with a Q type and an internal
> stride of 8 bytes, hence the restriction does apply but is implicitly
> satisfied for D type source strides of 1 and 2.
>  2. The data path to the accumulator is a special case in the hardware and
> doesn't use the normal general-purpose regioning logic and so doesn't
> require the restriction.
>

I don't see any evidence for any of these explanations.  I believe that
the actual reason why the MULH implementation didn't suffer the effects
of violating these restrictions is that in fact they don't apply to
*any* 32x16-bit integer multiply operations at all despite what the
hardware spec says, whether the destination is the accumulator or not.

I've verified it by doing a daily CI run on the following patch:

https://cgit.freedesktop.org/~currojerez/mesa/commit/?h=jenkins=c1a32c4e1e53d70b0c8f6254f0f53f0230b7e21b

It disables legalization of the integer multiply instruction and then
adds a hack to lower_integer_multiplication() for it to intentionally
break the alignment rule.  No regressions on CHV/BXT/GLK.  My reading of
the simulator confirms that 32x16-bit multiplication isn't affected by
the restriction.

I'm tempted to send a patch that disables regioning alignment lowering
for 32x16-bit integer multiplication strictly for performance.  But
that's really an orthogonal change to this patch, since due to the issue
of precision loss we still need to make sure not to touch accumulator
destinations in instructions that *do* have this restriction.

> I don't find the first one very convincing at all.  Among other things, if
> it were the true reason, it would imply that we would need to use a stride
> of exactly 2 on D type sources which but empirical evidence suggests that
> "mul(8) acc0<1> g5<8,8,1>UD g9<16,8,2>UW" works just fine.
>
>
>> > but the "fix" ended up breaking it because a MOV to the accumulator is
>> > not the same as using it as a multiply destination due to the magic
>> > way the 33/64 bits of the
>>
>> Technically it has 66 bits (it wasn't a typo when I said that to you
>> earlier on IRC).  That's how it can t hold the result of a SIMD16
>> 16x16-bit integer multiplication with 33-bit signed precision per scalar
>> component.
>>
>
> Yes, there are 33 bits available for WxW multiplies but this is dealing
> with a DxD multiply which only has 64 bits according to this bit of bspec
> text:
>
> As there are only 64 bits per channel in DWord mode (D and UD), it is
> sufficient to store the multiplication result of two DWord operands as long
> as the post source modified sources are still within 32 bits. If any one
> source is type UD and is negated, the negated result becomes 33 bits. The
> DWord multiplication result is then 65 bits, bigger than the storage
> capacity of accumulators. Consequently, the results are unpredictable.
>
>


[Mesa-dev] Android + tegra + mesa

2019-01-17 Thread Artyom Bambalov
Hello, guys. I have device with tegra K1(T124). I try to get open source
graphical stack. What I use: mesa 3d(18.2-19.0), libdrm(2.4.96),
drm_hwcomposer(the latest), gbm_gralloc(the latest), 4.19 kernel by
google(tegra drm + nouveau). What I get: normal bootanimatin, but the
system UI is not displaying properly(look at pinned picture). I also tested
the kernel with linux distr and all works fine. So, I wanted to ask is it
possible to get a normal picture on android with nouveau on tegra K1(T124)?
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [MR] nir: check NIR_SKIP to skip passes by name

2019-01-17 Thread Caio Marcelo de Oliveira Filho
"Passes' function names, separated by comma, listed in NIR_SKIP
environment variable will be skipped in debug mode.  The mechanism is
hooked into the _PASS macro, like NIR_PRINT."

https://gitlab.freedesktop.org/mesa/mesa/merge_requests/130


Caio
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] autotools: Deprecate the use of autotools

2019-01-17 Thread Jonathan Gray
On Mon, Jan 14, 2019 at 03:46:35PM +0200, Eero Tamminen wrote:
> Hi,
> 
> On 13.1.2019 9.44, Jonathan Gray wrote:
> ...> As we can not depend on python to build Mesa in OpenBSD I will have to
> > go back to maintaining a local Mesa build system if autotools is removed.
> 
> Mesa already needs python-mako (for code generation) with Autotools.
> Why Meson also needing Python is a problem?

Python generated files can be generated per release to not depend on
python at build time.

All of the files generated by python used in a build of Mesa on OpenBSD are
already included in Mesa tarballs except for egd_tables.h

https://bugs.freedesktop.org/show_bug.cgi?id=103911
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v3 21/42] intel/compiler: set correct precision fields for 3-source float instructions

2019-01-17 Thread Jason Ekstrand
On Tue, Jan 15, 2019 at 7:55 AM Iago Toral Quiroga 
wrote:

> Source0 and Destination extract the floating-point precision automatically
> from the SrcType and DstType instruction fields respectively when they are
> set to types :F or :HF. For Source1 and Source2 operands, we use the new
> 1-bit fields Src1Type and Src2Type, where 0 means normal precision and 1
> means half-precision. Since we always use the type of the destination for
> all operands when we emit 3-source instructions, we only need set Src1Type
> and Src2Type to 1 when we are emitting a half-precision instruction.
>
> v2:
>  - Set the bit separately for each source based on its type so we can
>do mixed floating-point mode in the future (Topi).
>
> Reviewed-by: Topi Pohjolainen 
> ---
>  src/intel/compiler/brw_eu_emit.c | 16 
>  1 file changed, 16 insertions(+)
>
> diff --git a/src/intel/compiler/brw_eu_emit.c
> b/src/intel/compiler/brw_eu_emit.c
> index a785f96b650..2fa89f8a2a3 100644
> --- a/src/intel/compiler/brw_eu_emit.c
> +++ b/src/intel/compiler/brw_eu_emit.c
> @@ -801,6 +801,22 @@ brw_alu3(struct brw_codegen *p, unsigned opcode,
> struct brw_reg dest,
>*/
>   brw_inst_set_3src_a16_src_type(devinfo, inst, dest.type);
>   brw_inst_set_3src_a16_dst_type(devinfo, inst, dest.type);
> +
> + /* From the Bspec: Instruction types
> +  *
> +  * Three source instructions can use operands with mixed-mode
> +  * precision. When SrcType field is set to :f or :hf it defines
> +  * precision for source 0 only, and fields Src1Type and Src2Type
> +  * define precision for other source operands:
> +  *
> +  *   0b = :f. Single precision Float (32-bit).
> +  *   1b = :hf. Half precision Float (16-bit).
> +  */
> + if (src1.type == BRW_REGISTER_TYPE_HF)
> +brw_inst_set_3src_a16_src1_type(devinfo, inst, 1);
>

Maybe worth throwing in an

assert(src0.type == BRW_REGISTER_TYPE_F || src0.type ==
BRW_REGISTER_TYPE_HF);

just to be sure?  Either way, this and patch 20 are

Reviewed-by: Jason Ekstrand 


> +
> + if (src2.type == BRW_REGISTER_TYPE_HF)
> +brw_inst_set_3src_a16_src2_type(devinfo, inst, 1);
>}
> }
>
> --
> 2.17.1
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v3 18/42] intel/compiler: add a helper function to query hardware type table

2019-01-17 Thread Jason Ekstrand
On Tue, Jan 15, 2019 at 7:54 AM Iago Toral Quiroga 
wrote:

> We open coded this in a couple of places, so a helper function is probably
> sensible. Plus it makes it more consistent with the 3src hardware type
> case.
>
> Suggested-by: Topi Pohjolainen 
> ---
>  src/intel/compiler/brw_reg_type.c | 34 ---
>  1 file changed, 18 insertions(+), 16 deletions(-)
>
> diff --git a/src/intel/compiler/brw_reg_type.c
> b/src/intel/compiler/brw_reg_type.c
> index 09b3ea61d4c..0c9f522eca0 100644
> --- a/src/intel/compiler/brw_reg_type.c
> +++ b/src/intel/compiler/brw_reg_type.c
> @@ -193,6 +193,20 @@ static const struct hw_3src_type {
>  #undef E
>  };
>
> +static inline const struct hw_type *
> +get_hw_type_map(const struct gen_device_info *devinfo, uint32_t *size)
> +{
> +   if (devinfo->gen >= 11) {
> +  if (size)
> + *size = ARRAY_SIZE(gen11_hw_type);
>

All of these type tables have the same size because we declare everything
through BRW_REGISTER_TYPE_LAST.  Do we really need to be returning the
array size separately for every table?


> +  return gen11_hw_type;
> +   } else {
> +  if (size)
> + *size = ARRAY_SIZE(gen4_hw_type);
> +  return gen4_hw_type;
> +   }
> +}
> +
>  /**
>   * Convert a brw_reg_type enumeration value into the hardware
> representation.
>   *
> @@ -203,16 +217,10 @@ brw_reg_type_to_hw_type(const struct gen_device_info
> *devinfo,
>  enum brw_reg_file file,
>  enum brw_reg_type type)
>  {
> -   const struct hw_type *table;
> -
> -   if (devinfo->gen >= 11) {
> -  assert(type < ARRAY_SIZE(gen11_hw_type));
> -  table = gen11_hw_type;
> -   } else {
> -  assert(type < ARRAY_SIZE(gen4_hw_type));
> -  table = gen4_hw_type;
> -   }
> +   uint32_t table_size;
> +   const struct hw_type *table = get_hw_type_map(devinfo, _size);
>
> +   assert(type < table_size);
> assert(devinfo->has_64bit_types || brw_reg_type_to_size(type) < 8 ||
>type == BRW_REGISTER_TYPE_NF);
>
> @@ -234,13 +242,7 @@ enum brw_reg_type
>  brw_hw_type_to_reg_type(const struct gen_device_info *devinfo,
>  enum brw_reg_file file, unsigned hw_type)
>  {
> -   const struct hw_type *table;
> -
> -   if (devinfo->gen >= 11) {
> -  table = gen11_hw_type;
> -   } else {
> -  table = gen4_hw_type;
> -   }
> +   const struct hw_type *table = get_hw_type_map(devinfo, NULL);
>
> if (file == BRW_IMMEDIATE_VALUE) {
>for (enum brw_reg_type i = 0; i <= BRW_REGISTER_TYPE_LAST; i++) {
> --
> 2.17.1
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v3 19/42] intel/compiler: don't compact 3-src instructions with Src1Type or Src2Type bits

2019-01-17 Thread Jason Ekstrand
On Tue, Jan 15, 2019 at 7:55 AM Iago Toral Quiroga 
wrote:

> We are now using these bits, so don't assert that they are not set, just
> avoid compaction in that case.
>
> Reviewed-by: Topi Pohjolainen 
> ---
>  src/intel/compiler/brw_eu_compact.c | 5 -
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/src/intel/compiler/brw_eu_compact.c
> b/src/intel/compiler/brw_eu_compact.c
> index ae14ef10ec0..20fed254331 100644
> --- a/src/intel/compiler/brw_eu_compact.c
> +++ b/src/intel/compiler/brw_eu_compact.c
> @@ -928,8 +928,11 @@ has_3src_unmapped_bits(const struct gen_device_info
> *devinfo,
>assert(!brw_inst_bits(src, 127, 126) &&
>   !brw_inst_bits(src, 105, 105) &&
>   !brw_inst_bits(src, 84, 84) &&
> - !brw_inst_bits(src, 36, 35) &&
>   !brw_inst_bits(src, 7,  7));
> +
> +  /* Src1Type and Src2Type, used for mixed-precision floating point */
> +  if (brw_inst_bits(src, 36, 35))
> + return true;
>

You're only doing this in the broadwell case.  What about SKL+ and CHV?
Can we compact mixed-precision stuff there?  Looks like maybe we can but
there should be at least something in the commit message about that.


> }
>
> return false;
> --
> 2.17.1
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v3 15/42] intel/compiler: Extended Math is limited to SIMD8 on half-float

2019-01-17 Thread Jason Ekstrand
Reviewed-by: Jason Ekstrand 

On Tue, Jan 15, 2019 at 7:54 AM Iago Toral Quiroga 
wrote:

> From the Skylake PRM, Extended Math Function:
>
>   "The execution size must be no more than 8 when half-floats
>are used in source or destination operand."
>
> Earlier generations do not support Extended Math with half-float.
>
> v2:
>  - Rewrite the code to make it more readable (Jason).
>
> v3:
>  - Use if-ladders or just if+return exclusively (Topi).
>
> Reviewed-by: Topi Pohjolainen  (v1)
> ---
>  src/intel/compiler/brw_fs.cpp | 27 ++-
>  1 file changed, 18 insertions(+), 9 deletions(-)
>
> diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp
> index 0359eb079f7..0b3ec94e2d2 100644
> --- a/src/intel/compiler/brw_fs.cpp
> +++ b/src/intel/compiler/brw_fs.cpp
> @@ -5493,18 +5493,27 @@ get_lowered_simd_width(const struct
> gen_device_info *devinfo,
> case SHADER_OPCODE_EXP2:
> case SHADER_OPCODE_LOG2:
> case SHADER_OPCODE_SIN:
> -   case SHADER_OPCODE_COS:
> +   case SHADER_OPCODE_COS: {
>/* Unary extended math instructions are limited to SIMD8 on Gen4 and
> -   * Gen6.
> +   * Gen6. Extended Math Function is limited to SIMD8 with half-float.
> */
> -  return (devinfo->gen >= 7 ? MIN2(16, inst->exec_size) :
> -  devinfo->gen == 5 || devinfo->is_g4x ? MIN2(16,
> inst->exec_size) :
> -  MIN2(8, inst->exec_size));
> +  if (devinfo->gen == 6 || (devinfo->gen == 4 && !devinfo->is_g4x))
> + return MIN2(8, inst->exec_size);
> +  if (inst->dst.type == BRW_REGISTER_TYPE_HF)
> + return MIN2(8, inst->exec_size);
> +  return MIN2(16, inst->exec_size);
> +   }
>
> -   case SHADER_OPCODE_POW:
> -  /* SIMD16 is only allowed on Gen7+. */
> -  return (devinfo->gen >= 7 ? MIN2(16, inst->exec_size) :
> -  MIN2(8, inst->exec_size));
> +   case SHADER_OPCODE_POW: {
> +  /* SIMD16 is only allowed on Gen7+. Extended Math Function is
> limited
> +   * to SIMD8 with half-float
> +   */
> +  if (devinfo->gen < 7)
> + return MIN2(8, inst->exec_size);
> +  if (inst->dst.type == BRW_REGISTER_TYPE_HF)
> + return MIN2(8, inst->exec_size);
> +  return MIN2(16, inst->exec_size);
> +   }
>
> case SHADER_OPCODE_INT_QUOTIENT:
> case SHADER_OPCODE_INT_REMAINDER:
> --
> 2.17.1
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v3 13/42] intel/compiler: lower 16-bit flrp

2019-01-17 Thread Jason Ekstrand
Again, please squash with the previous patch.  Splitting stuff this
granular just makes review harder.

On Tue, Jan 15, 2019 at 7:54 AM Iago Toral Quiroga 
wrote:

> Reviewed-by: Jason Ekstrand 
> ---
>  src/intel/compiler/brw_compiler.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/src/intel/compiler/brw_compiler.c
> b/src/intel/compiler/brw_compiler.c
> index f885e79c3e6..04a1a7cac4e 100644
> --- a/src/intel/compiler/brw_compiler.c
> +++ b/src/intel/compiler/brw_compiler.c
> @@ -33,6 +33,7 @@
> .lower_sub = true,
>  \
> .lower_fdiv = true,
> \
> .lower_scmp = true,
> \
> +   .lower_flrp16 = true,
> \
> .lower_fmod16 = true,
> \
> .lower_fmod32 = true,
> \
> .lower_fmod64 = false,
>  \
> --
> 2.17.1
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v3 11/42] intel/compiler: lower 16-bit fmod

2019-01-17 Thread Jason Ekstrand
Patches 10 and 11 really could be one patch.

On Tue, Jan 15, 2019 at 7:54 AM Iago Toral Quiroga 
wrote:

> Reviewed-by: Jason Ekstrand 
> ---
>  src/intel/compiler/brw_compiler.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/src/intel/compiler/brw_compiler.c
> b/src/intel/compiler/brw_compiler.c
> index fe632c5badc..f885e79c3e6 100644
> --- a/src/intel/compiler/brw_compiler.c
> +++ b/src/intel/compiler/brw_compiler.c
> @@ -33,6 +33,7 @@
> .lower_sub = true,
>  \
> .lower_fdiv = true,
> \
> .lower_scmp = true,
> \
> +   .lower_fmod16 = true,
> \
> .lower_fmod32 = true,
> \
> .lower_fmod64 = false,
>  \
> .lower_bitfield_extract = true,
> \
> --
> 2.17.1
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v3 09/42] intel/compiler: allow extended math functions with HF operands

2019-01-17 Thread Jason Ekstrand
This should probably be squashed into patch 7 so that it's clear from one
patch that it's being properly handed on both sides of the gen9 boundary.
I'd also squash in patch 15 and just give the whole thing the title "Handle
extended math restrictions for half-float" with a detailed message
describing what all we have to handle.  If you want to keep it as two
patches, that's fine but unnecessary.

On Tue, Jan 15, 2019 at 7:54 AM Iago Toral Quiroga 
wrote:

> The PRM states that half-float operands are supported since gen9.
>
> Reviewed-by: Topi Pohjolainen 
> ---
>  src/intel/compiler/brw_eu_emit.c | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/src/intel/compiler/brw_eu_emit.c
> b/src/intel/compiler/brw_eu_emit.c
> index 45e2552783b..e21df4624b3 100644
> --- a/src/intel/compiler/brw_eu_emit.c
> +++ b/src/intel/compiler/brw_eu_emit.c
> @@ -1874,8 +1874,10 @@ void gen6_math(struct brw_codegen *p,
>assert(src1.file == BRW_GENERAL_REGISTER_FILE ||
>   (devinfo->gen >= 8 && src1.file == BRW_IMMEDIATE_VALUE));
> } else {
> -  assert(src0.type == BRW_REGISTER_TYPE_F);
> -  assert(src1.type == BRW_REGISTER_TYPE_F);
> +  assert(src0.type == BRW_REGISTER_TYPE_F ||
> + (src0.type == BRW_REGISTER_TYPE_HF && devinfo->gen >= 9));
> +  assert(src1.type == BRW_REGISTER_TYPE_F ||
> + (src1.type == BRW_REGISTER_TYPE_HF && devinfo->gen >= 9));
> }
>
> /* Source modifiers are ignored for extended math instructions on
> Gen6. */
> --
> 2.17.1
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v3 08/42] intel/compiler: implement 16-bit fsign

2019-01-17 Thread Jason Ekstrand
On Tue, Jan 15, 2019 at 7:54 AM Iago Toral Quiroga 
wrote:

> v2:
>  - make 16-bit be its own separate case (Jason)
>
> Reviewed-by: Topi Pohjolainen 
> ---
>  src/intel/compiler/brw_fs_nir.cpp | 18 +-
>  1 file changed, 17 insertions(+), 1 deletion(-)
>
> diff --git a/src/intel/compiler/brw_fs_nir.cpp
> b/src/intel/compiler/brw_fs_nir.cpp
> index d742f55a957..cf546b8ff09 100644
> --- a/src/intel/compiler/brw_fs_nir.cpp
> +++ b/src/intel/compiler/brw_fs_nir.cpp
> @@ -844,7 +844,22 @@ fs_visitor::nir_emit_alu(const fs_builder ,
> nir_alu_instr *instr)
>  : bld.MOV(result, brw_imm_f(1.0f));
>
>   set_predicate(BRW_PREDICATE_NORMAL, inst);
> -  } else if (type_sz(op[0].type) < 8) {
> +  } else if (type_sz(op[0].type) == 2) {
> + /* AND(val, 0x8000) gives the sign bit.
> +  *
> +  * Predicated OR ORs 1.0 (0x3c00) with the sign bit if val is
> not zero.
> +  */
> + fs_reg zero = retype(brw_imm_uw(0), BRW_REGISTER_TYPE_HF);
> + bld.CMP(bld.null_reg_f(), op[0], zero, BRW_CONDITIONAL_NZ);
> +
> + fs_reg result_int = retype(result, BRW_REGISTER_TYPE_UW);
> + op[0].type = BRW_REGISTER_TYPE_UW;
> + result.type = BRW_REGISTER_TYPE_UW;
>

Why are you whacking the type on result and also making a result_int temp?
I guess you just copied that from the 32-bit case?  If we're going to whack
result.type (which is fine), just use result for the rest of it.  With that
fixed,

Reviewed-by: Jason Ekstrand 


> + bld.AND(result_int, op[0], brw_imm_uw(0x8000u));
> +
> + inst = bld.OR(result_int, result_int, brw_imm_uw(0x3c00u));
> + inst->predicate = BRW_PREDICATE_NORMAL;
> +  } else if (type_sz(op[0].type) == 4) {
>   /* AND(val, 0x8000) gives the sign bit.
>*
>* Predicated OR ORs 1.0 (0x3f80) with the sign bit if val
> is not
> @@ -866,6 +881,7 @@ fs_visitor::nir_emit_alu(const fs_builder ,
> nir_alu_instr *instr)
>* - The sign is encoded in the high 32-bit of each DF
>* - We need to produce a DF result.
>*/
> + assert(type_sz(op[0].type) == 8);
>
>   fs_reg zero = vgrf(glsl_type::double_type);
>   bld.MOV(zero, setup_imm_df(bld, 0.0));
> --
> 2.17.1
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v3 05/42] intel/compiler: assert restrictions on conversions to half-float

2019-01-17 Thread Jason Ekstrand
Reviewed-by: Jason Ekstrand 

On Tue, Jan 15, 2019 at 7:54 AM Iago Toral Quiroga 
wrote:

> There are some hardware restrictions that brw_nir_lower_conversions should
> have taken care of before we get here.
>
> v2:
>  - rebased on top of regioning lowering pass
>
> Reviewed-by: Topi Pohjolainen  (v1)
> ---
>  src/intel/compiler/brw_fs_nir.cpp | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/src/intel/compiler/brw_fs_nir.cpp
> b/src/intel/compiler/brw_fs_nir.cpp
> index e1d0e318b35..d742f55a957 100644
> --- a/src/intel/compiler/brw_fs_nir.cpp
> +++ b/src/intel/compiler/brw_fs_nir.cpp
> @@ -784,6 +784,9 @@ fs_visitor::nir_emit_alu(const fs_builder ,
> nir_alu_instr *instr)
> */
>
> case nir_op_f2f16:
> +   case nir_op_i2f16:
> +   case nir_op_u2f16:
> +  assert(type_sz(op[0].type) < 8); /* brw_nir_lower_conversions */
>inst = bld.MOV(result, op[0]);
>inst->saturate = instr->dest.saturate;
>break;
> @@ -821,8 +824,6 @@ fs_visitor::nir_emit_alu(const fs_builder ,
> nir_alu_instr *instr)
> case nir_op_u2u32:
> case nir_op_i2i16:
> case nir_op_u2u16:
> -   case nir_op_i2f16:
> -   case nir_op_u2f16:
> case nir_op_i2i8:
> case nir_op_u2u8:
>inst = bld.MOV(result, op[0]);
> --
> 2.17.1
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v3 03/42] intel/compiler: split float to 64-bit opcodes from int to 64-bit

2019-01-17 Thread Jason Ekstrand
Reviewed-by: Jason Ekstrand 

On Tue, Jan 15, 2019 at 7:55 AM Iago Toral Quiroga 
wrote:

> Going forward having these split is a bit more convenient since these two
> groups have different restrictions.
>
> v2:
>  - Rebased on top of new regioning lowering pass.
>
> Reviewed-by: Topi Pohjolainen  (v1)
> ---
>  src/intel/compiler/brw_fs_nir.cpp | 7 +++
>  1 file changed, 7 insertions(+)
>
> diff --git a/src/intel/compiler/brw_fs_nir.cpp
> b/src/intel/compiler/brw_fs_nir.cpp
> index bdc883e5364..a59debf2b78 100644
> --- a/src/intel/compiler/brw_fs_nir.cpp
> +++ b/src/intel/compiler/brw_fs_nir.cpp
> @@ -801,10 +801,17 @@ fs_visitor::nir_emit_alu(const fs_builder ,
> nir_alu_instr *instr)
> case nir_op_f2f64:
> case nir_op_f2i64:
> case nir_op_f2u64:
> +  assert(type_sz(op[0].type) > 2); /* brw_nir_lower_conversions */
> +  inst = bld.MOV(result, op[0]);
> +  inst->saturate = instr->dest.saturate;
> +  break;
> +
> case nir_op_i2f64:
> case nir_op_i2i64:
> case nir_op_u2f64:
> case nir_op_u2u64:
> +  assert(type_sz(op[0].type) > 1); /* brw_nir_lower_conversions */
> +  /* fallthrough */
> case nir_op_f2f32:
> case nir_op_f2i32:
> case nir_op_f2u32:
> --
> 2.17.1
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v3 02/42] intel/compiler: add a NIR pass to lower conversions

2019-01-17 Thread Jason Ekstrand
On Tue, Jan 15, 2019 at 7:54 AM Iago Toral Quiroga 
wrote:

> Some conversions are not directly supported in hardware and need to be
> split in two conversion instructions going through an intermediary type.
> Doing this at the NIR level simplifies a bit the complexity in the backend.
>
> v2:
>  - Consider fp16 rounding conversion opcodes
>  - Properly handle swizzles on conversion sources.
>
> Reviewed-by: Topi Pohjolainen  (v1)
> ---
>  src/intel/Makefile.sources|   1 +
>  src/intel/compiler/brw_nir.c  |   1 +
>  src/intel/compiler/brw_nir.h  |   2 +
>  .../compiler/brw_nir_lower_conversions.c  | 158 ++
>  src/intel/compiler/meson.build|   1 +
>  5 files changed, 163 insertions(+)
>  create mode 100644 src/intel/compiler/brw_nir_lower_conversions.c
>
> diff --git a/src/intel/Makefile.sources b/src/intel/Makefile.sources
> index 94a28d370e8..9975daa3ad1 100644
> --- a/src/intel/Makefile.sources
> +++ b/src/intel/Makefile.sources
> @@ -83,6 +83,7 @@ COMPILER_FILES = \
> compiler/brw_nir_analyze_boolean_resolves.c \
> compiler/brw_nir_analyze_ubo_ranges.c \
> compiler/brw_nir_attribute_workarounds.c \
> +   compiler/brw_nir_lower_conversions.c \
> compiler/brw_nir_lower_cs_intrinsics.c \
> compiler/brw_nir_lower_image_load_store.c \
> compiler/brw_nir_lower_mem_access_bit_sizes.c \
> diff --git a/src/intel/compiler/brw_nir.c b/src/intel/compiler/brw_nir.c
> index 92d7fe4bede..572ab824a94 100644
> --- a/src/intel/compiler/brw_nir.c
> +++ b/src/intel/compiler/brw_nir.c
> @@ -882,6 +882,7 @@ brw_postprocess_nir(nir_shader *nir, const struct
> brw_compiler *compiler,
> OPT(nir_opt_move_comparisons);
>
> OPT(nir_lower_bool_to_int32);
> +   OPT(brw_nir_lower_conversions);
>

This is *really* late and I don't think you actually want this to run after
we apply source/destination modifiers.  If you just move it to right after
nir_opt_algebraic_late, that will fix a multitude of issues.


>
> OPT(nir_lower_locals_to_regs);
>
> diff --git a/src/intel/compiler/brw_nir.h b/src/intel/compiler/brw_nir.h
> index bc81950d47e..662b2627e95 100644
> --- a/src/intel/compiler/brw_nir.h
> +++ b/src/intel/compiler/brw_nir.h
> @@ -114,6 +114,8 @@ void brw_nir_lower_tcs_outputs(nir_shader *nir, const
> struct brw_vue_map *vue,
> GLenum tes_primitive_mode);
>  void brw_nir_lower_fs_outputs(nir_shader *nir);
>
> +bool brw_nir_lower_conversions(nir_shader *nir);
> +
>  bool brw_nir_lower_image_load_store(nir_shader *nir,
>  const struct gen_device_info
> *devinfo);
>  void brw_nir_rewrite_image_intrinsic(nir_intrinsic_instr *intrin,
> diff --git a/src/intel/compiler/brw_nir_lower_conversions.c
> b/src/intel/compiler/brw_nir_lower_conversions.c
> new file mode 100644
> index 000..583167c7753
> --- /dev/null
> +++ b/src/intel/compiler/brw_nir_lower_conversions.c
> @@ -0,0 +1,158 @@
> +/*
> + * Copyright © 2018 Intel Corporation
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the
> "Software"),
> + * to deal in the Software without restriction, including without
> limitation
> + * the rights to use, copy, modify, merge, publish, distribute,
> sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice (including the
> next
> + * paragraph) shall be included in all copies or substantial portions of
> the
> + * Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
> EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
> MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT
> SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
> OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
> DEALINGS
> + * IN THE SOFTWARE.
> + */
> +
> +#include "brw_nir.h"
> +#include "compiler/nir/nir_builder.h"
> +
> +static nir_op
> +get_conversion_op(nir_alu_type src_type,
> +  unsigned src_bit_size,
> +  nir_alu_type dst_type,
> +  unsigned dst_bit_size,
> +  nir_rounding_mode rounding_mode)
> +{
> +   nir_alu_type src_full_type = (nir_alu_type) (src_type | src_bit_size);
> +   nir_alu_type dst_full_type = (nir_alu_type) (dst_type | dst_bit_size);
> +
> +   return nir_type_conversion_op(src_full_type, dst_full_type,
> rounding_mode);
> +}
> +
> +static nir_rounding_mode
> +get_opcode_rounding_mode(nir_op op)
> +{
> +   switch (op) {
> +   case nir_op_f2f16_rtz:
> +  return 

Re: [Mesa-dev] Thoughts after hitting 100 merge requests?

2019-01-17 Thread Daniel Stone
Hi,

On Thu, 17 Jan 2019 at 16:35, Jason Ekstrand  wrote:
> On January 17, 2019 08:58:03 Erik Faye-Lund  
> wrote:
> > Whoops! I meant to say something like "we'd need to be able to
> > distinguis between CI steps that are triggered due to new MRs versus
> > updated MRs, or pushes to existing branches".
> >
> >> Anyway, Jason did actually write that hook, and it's something I'm
> >> happy to host on existing fd.o machines. I just haven't got to doing
> >> it, since I ended up taking my sabbatical a lot more seriously than I
> >> expected, and now I'm back to work I've got a bit of a backlog. But
> >> we
> >> can definitely do it, and pretty soon.
> >
> > Cool, then I won't worry about it, and just assume it'll appear
> > magically soon :)
>
> My script was a total hack. It's probably massively insecure and doesn't
> include any code to provide a diffstat which has been requested by several
> people. Someone taking it a bit more seriously would probably be good
> before we deploy anything.

With the caveat that I can no longer see the script because it's been
expired out of the pastebin (why not make a GitLab repo or at least
upload it to a snippet?) ...

I had the same assumption when you posted it, but came to the
conclusion it was actually OK, or at least would be with very minimal
work. We can configure Apache and GitLab pretty easily so it can only
be triggered with a secret token which is buried in the repo config
and/or accessible only to admins. It calls back into GitLab to get the
changes, so there's no danger of it sending completely arbitrary
content even if someone does figure out how to trigger it when they
shouldn't. It also has GitLab project -> email destination hardcoded
in the script, so there's no danger of it being used to spam arbitrary
addresses either.

Even without that, given that people currently only need to sign up to
Bugzilla (without a captcha) in order to send email with arbitrary
content to mesa-dev@, 'less spam-prone than the status quo' is an
embarrassingly low bar anyway.

Whoever wants to see this happen should ping Jason to get the script
and his suggested changes, get it in a GitLab repo, then file an issue
on https://gitlab.freedesktop.org/freedesktop/freedesktop/issues/new/
and I'll get it deployed.

Cheers,
Daniel
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v4] anv/device: fix maximum number of images supported

2019-01-17 Thread Jason Ekstrand
Yup, that'll do it.  Gotta watch out for ++...  Assuming it fixes the
problem, that patch is

Reviewed-by: Jason Ekstrand 

On Thu, Jan 17, 2019 at 12:35 PM Lionel Landwerlin <
lionel.g.landwer...@intel.com> wrote:

> Looking at the change the binding table emission, I think the image++ has
> been moved such that it doesn't produce the same tables anymore.
> Trying this change on CI :
> https://github.com/djdeath/mesa/commit/a6b8eaf1325389d94d1d8a5b3bb952a362125eb2
>
> On 17/01/2019 18:19, Clayton Craft wrote:
>
> Quoting Mark Janes (2019-01-17 10:13:37)
>
> Hi Iago,
>
> It looks like you tested this patch in CI and got the same failures that
> we are seeing on master:
> http://mesa-ci-results.jf.intel.com/itoral/builds/263/group/63a9f0ea7bb98050796b649e85481845
>
> The correct link 
> is:https://mesa-ci.01.org/itoral/builds/263/group/63a9f0ea7bb98050796b649e85481845
>
> Why was this patch pushed?
>
> -Mark
>
> Mark Janes   writes:
>
>
> This patch regresses thousands of tests on BDW and HSW:
> http://mesa-ci-results.jf.intel.com/vulkancts/builds/10035/group/63a9f0ea7bb98050796b649e85481845#fails
>
> https://mesa-ci.01.org/vulkancts/builds/10035/group/63a9f0ea7bb98050796b649e85481845#fails
>
> I'll revert it as soon as my testing completes.
>
> Iago Toral Quiroga   writes:
>
>
> We had defined MAX_IMAGES as 8, which we used to size the array for
> image push constant data. The comment there stated that this was for
> gen8, but anv_nir_apply_pipeline_layout runs for all gens and writes
> that array, asserting that we don't exceed that number of images,
> which imposes a limit of MAX_IMAGES on all gens.
>
> Furthermore, despite this, we are exposing up to 64 images per shader
> stage on all gens, gen8 included.
>
> This patch lowers the number of images we expose in gen8 to 8 and
> keeps 64 images for gen9+ while making sure that only pre-SKL gens
> use push constant space to handle images.
>
> v2:
>  - <= instead of < in the assert (Eric, Lionel)
>  - Change the way the assertion is written (Eric)
>
> v3:
>  - Revert the way the assertion is written to the form it had in v1,
>the version in v2 was not equivalent and was incorrect. (Lionel)
>
> v4:
>  - gen9+ doesn't need push constants for images at all (Jason)
> ---
>  src/intel/vulkan/anv_device.c |  7 --
>  .../vulkan/anv_nir_apply_pipeline_layout.c|  4 +--
>  src/intel/vulkan/anv_private.h|  5 ++--
>  src/intel/vulkan/genX_cmd_buffer.c| 25 +--
>  4 files changed, 28 insertions(+), 13 deletions(-)
>
> diff --git a/src/intel/vulkan/anv_device.c b/src/intel/vulkan/anv_device.c
> index 523f1483e29..f85458b672e 100644
> --- a/src/intel/vulkan/anv_device.c
> +++ b/src/intel/vulkan/anv_device.c
> @@ -987,9 +987,12 @@ void anv_GetPhysicalDeviceProperties(
> const uint32_t max_samplers = (devinfo->gen >= 8 || devinfo->is_haswell) ?
>   128 : 16;
>
> +   const uint32_t max_images = devinfo->gen < 9 ? MAX_GEN8_IMAGES : 
> MAX_IMAGES;
> +
> VkSampleCountFlags sample_counts =
>isl_device_get_sample_counts(>isl_dev);
>
> +
> VkPhysicalDeviceLimits limits = {
>.maxImageDimension1D  = (1 << 14),
>.maxImageDimension2D  = (1 << 14),
> @@ -1009,7 +1012,7 @@ void anv_GetPhysicalDeviceProperties(
>.maxPerStageDescriptorUniformBuffers  = 64,
>.maxPerStageDescriptorStorageBuffers  = 64,
>.maxPerStageDescriptorSampledImages   = max_samplers,
> -  .maxPerStageDescriptorStorageImages   = 64,
> +  .maxPerStageDescriptorStorageImages   = max_images,
>.maxPerStageDescriptorInputAttachments= 64,
>.maxPerStageResources = 250,
>.maxDescriptorSetSamplers = 6 * max_samplers, /* 
> number of stages * maxPerStageDescriptorSamplers */
> @@ -1018,7 +1021,7 @@ void anv_GetPhysicalDeviceProperties(
>.maxDescriptorSetStorageBuffers   = 6 * 64,   /* 
> number of stages * maxPerStageDescriptorStorageBuffers */
>.maxDescriptorSetStorageBuffersDynamic= MAX_DYNAMIC_BUFFERS / 2,
>.maxDescriptorSetSampledImages= 6 * max_samplers, /* 
> number of stages * maxPerStageDescriptorSampledImages */
> -  .maxDescriptorSetStorageImages= 6 * 64,   /* 
> number of stages * maxPerStageDescriptorStorageImages */
> +  .maxDescriptorSetStorageImages= 6 * max_images,   /* 
> number of stages * maxPerStageDescriptorStorageImages */
>.maxDescriptorSetInputAttachments = 256,
>.maxVertexInputAttributes = MAX_VBS,
>.maxVertexInputBindings   = MAX_VBS,
> diff --git a/src/intel/vulkan/anv_nir_apply_pipeline_layout.c 
> b/src/intel/vulkan/anv_nir_apply_pipeline_layout.c
> index b3daf702bc0..623984b0f8c 100644
> --- a/src/intel/vulkan/anv_nir_apply_pipeline_layout.c
> +++ 

Re: [Mesa-dev] [PATCH] intel/fs: Promote execution type to 32-bit when any half-float conversion is needed.

2019-01-17 Thread Jason Ekstrand
Reviewed-by: Jason Ekstrand 

On Wed, Jan 16, 2019 at 1:32 AM Francisco Jerez 
wrote:

> The docs are fairly incomplete and inconsistent about it, but this
> seems to be the reason why half-float destinations are required to be
> DWORD-aligned on BDW+ projects.  This way the regioning lowering pass
> will make sure that the destination components of W to HF and HF to W
> conversions are aligned like the corresponding conversion operation
> with 32-bit execution data type.
> ---
>  src/intel/compiler/brw_ir_fs.h | 21 +
>  1 file changed, 21 insertions(+)
>
> diff --git a/src/intel/compiler/brw_ir_fs.h
> b/src/intel/compiler/brw_ir_fs.h
> index 3c23fb375e4..08e3d83d910 100644
> --- a/src/intel/compiler/brw_ir_fs.h
> +++ b/src/intel/compiler/brw_ir_fs.h
> @@ -477,6 +477,27 @@ get_exec_type(const fs_inst *inst)
>
> assert(exec_type != BRW_REGISTER_TYPE_B);
>
> +   /* Promotion of the execution type to 32-bit for conversions from or to
> +* half-float seems to be consistent with the following text from the
> +* Cherryview PRM Vol. 7, "Execution Data Type":
> +*
> +* "When single precision and half precision floats are mixed between
> +*  source operands or between source and destination operand [..]
> single
> +*  precision float is the execution datatype."
> +*
> +* and from "Register Region Restrictions":
> +*
> +* "Conversion between Integer and HF (Half Float) must be DWord
> aligned
> +*  and strided by a DWord on the destination."
> +*/
> +   if (type_sz(exec_type) == 2 &&
> +   inst->dst.type != exec_type) {
> +  if (exec_type == BRW_REGISTER_TYPE_HF)
> + exec_type = BRW_REGISTER_TYPE_F;
> +  else if (inst->dst.type == BRW_REGISTER_TYPE_HF)
> + exec_type = BRW_REGISTER_TYPE_D;
> +   }
> +
> return exec_type;
>  }
>
> --
> 2.19.2
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v4] anv/device: fix maximum number of images supported

2019-01-17 Thread Lionel Landwerlin
Looking at the change the binding table emission, I think the image++ 
has been moved such that it doesn't produce the same tables anymore.
Trying this change on CI : 
https://github.com/djdeath/mesa/commit/a6b8eaf1325389d94d1d8a5b3bb952a362125eb2


On 17/01/2019 18:19, Clayton Craft wrote:

Quoting Mark Janes (2019-01-17 10:13:37)

Hi Iago,

It looks like you tested this patch in CI and got the same failures that
we are seeing on master:

http://mesa-ci-results.jf.intel.com/itoral/builds/263/group/63a9f0ea7bb98050796b649e85481845

The correct link is:
https://mesa-ci.01.org/itoral/builds/263/group/63a9f0ea7bb98050796b649e85481845


Why was this patch pushed?

-Mark

Mark Janes  writes:


This patch regresses thousands of tests on BDW and HSW:

http://mesa-ci-results.jf.intel.com/vulkancts/builds/10035/group/63a9f0ea7bb98050796b649e85481845#fails

https://mesa-ci.01.org/vulkancts/builds/10035/group/63a9f0ea7bb98050796b649e85481845#fails


I'll revert it as soon as my testing completes.

Iago Toral Quiroga  writes:


We had defined MAX_IMAGES as 8, which we used to size the array for
image push constant data. The comment there stated that this was for
gen8, but anv_nir_apply_pipeline_layout runs for all gens and writes
that array, asserting that we don't exceed that number of images,
which imposes a limit of MAX_IMAGES on all gens.

Furthermore, despite this, we are exposing up to 64 images per shader
stage on all gens, gen8 included.

This patch lowers the number of images we expose in gen8 to 8 and
keeps 64 images for gen9+ while making sure that only pre-SKL gens
use push constant space to handle images.

v2:
  - <= instead of < in the assert (Eric, Lionel)
  - Change the way the assertion is written (Eric)

v3:
  - Revert the way the assertion is written to the form it had in v1,
the version in v2 was not equivalent and was incorrect. (Lionel)

v4:
  - gen9+ doesn't need push constants for images at all (Jason)
---
  src/intel/vulkan/anv_device.c |  7 --
  .../vulkan/anv_nir_apply_pipeline_layout.c|  4 +--
  src/intel/vulkan/anv_private.h|  5 ++--
  src/intel/vulkan/genX_cmd_buffer.c| 25 +--
  4 files changed, 28 insertions(+), 13 deletions(-)

diff --git a/src/intel/vulkan/anv_device.c b/src/intel/vulkan/anv_device.c
index 523f1483e29..f85458b672e 100644
--- a/src/intel/vulkan/anv_device.c
+++ b/src/intel/vulkan/anv_device.c
@@ -987,9 +987,12 @@ void anv_GetPhysicalDeviceProperties(
 const uint32_t max_samplers = (devinfo->gen >= 8 || devinfo->is_haswell) ?
   128 : 16;
  
+   const uint32_t max_images = devinfo->gen < 9 ? MAX_GEN8_IMAGES : MAX_IMAGES;

+
 VkSampleCountFlags sample_counts =
isl_device_get_sample_counts(>isl_dev);
  
+

 VkPhysicalDeviceLimits limits = {
.maxImageDimension1D  = (1 << 14),
.maxImageDimension2D  = (1 << 14),
@@ -1009,7 +1012,7 @@ void anv_GetPhysicalDeviceProperties(
.maxPerStageDescriptorUniformBuffers  = 64,
.maxPerStageDescriptorStorageBuffers  = 64,
.maxPerStageDescriptorSampledImages   = max_samplers,
-  .maxPerStageDescriptorStorageImages   = 64,
+  .maxPerStageDescriptorStorageImages   = max_images,
.maxPerStageDescriptorInputAttachments= 64,
.maxPerStageResources = 250,
.maxDescriptorSetSamplers = 6 * max_samplers, /* number 
of stages * maxPerStageDescriptorSamplers */
@@ -1018,7 +1021,7 @@ void anv_GetPhysicalDeviceProperties(
.maxDescriptorSetStorageBuffers   = 6 * 64,   /* number 
of stages * maxPerStageDescriptorStorageBuffers */
.maxDescriptorSetStorageBuffersDynamic= MAX_DYNAMIC_BUFFERS / 2,
.maxDescriptorSetSampledImages= 6 * max_samplers, /* number 
of stages * maxPerStageDescriptorSampledImages */
-  .maxDescriptorSetStorageImages= 6 * 64,   /* number 
of stages * maxPerStageDescriptorStorageImages */
+  .maxDescriptorSetStorageImages= 6 * max_images,   /* number 
of stages * maxPerStageDescriptorStorageImages */
.maxDescriptorSetInputAttachments = 256,
.maxVertexInputAttributes = MAX_VBS,
.maxVertexInputBindings   = MAX_VBS,
diff --git a/src/intel/vulkan/anv_nir_apply_pipeline_layout.c 
b/src/intel/vulkan/anv_nir_apply_pipeline_layout.c
index b3daf702bc0..623984b0f8c 100644
--- a/src/intel/vulkan/anv_nir_apply_pipeline_layout.c
+++ b/src/intel/vulkan/anv_nir_apply_pipeline_layout.c
@@ -528,8 +528,8 @@ anv_nir_apply_pipeline_layout(const struct 
anv_physical_device *pdevice,
}
 }
  
-   if (map->image_count > 0) {

-  assert(map->image_count <= MAX_IMAGES);
+   if (map->image_count > 0 && pdevice->compiler->devinfo->gen < 9) {
+  assert(map->image_count <= MAX_GEN8_IMAGES);

Re: [Mesa-dev] [PATCH v4] anv/device: fix maximum number of images supported

2019-01-17 Thread Clayton Craft
Quoting Mark Janes (2019-01-17 10:13:37)
> Hi Iago,
> 
> It looks like you tested this patch in CI and got the same failures that
> we are seeing on master:
> 
> http://mesa-ci-results.jf.intel.com/itoral/builds/263/group/63a9f0ea7bb98050796b649e85481845

The correct link is:
https://mesa-ci.01.org/itoral/builds/263/group/63a9f0ea7bb98050796b649e85481845

> 
> Why was this patch pushed?
> 
> -Mark
> 
> Mark Janes  writes:
> 
> > This patch regresses thousands of tests on BDW and HSW:
> >
> > http://mesa-ci-results.jf.intel.com/vulkancts/builds/10035/group/63a9f0ea7bb98050796b649e85481845#fails

https://mesa-ci.01.org/vulkancts/builds/10035/group/63a9f0ea7bb98050796b649e85481845#fails

> >
> > I'll revert it as soon as my testing completes.
> >
> > Iago Toral Quiroga  writes:
> >
> >> We had defined MAX_IMAGES as 8, which we used to size the array for
> >> image push constant data. The comment there stated that this was for
> >> gen8, but anv_nir_apply_pipeline_layout runs for all gens and writes
> >> that array, asserting that we don't exceed that number of images,
> >> which imposes a limit of MAX_IMAGES on all gens.
> >>
> >> Furthermore, despite this, we are exposing up to 64 images per shader
> >> stage on all gens, gen8 included.
> >>
> >> This patch lowers the number of images we expose in gen8 to 8 and
> >> keeps 64 images for gen9+ while making sure that only pre-SKL gens
> >> use push constant space to handle images.
> >>
> >> v2:
> >>  - <= instead of < in the assert (Eric, Lionel)
> >>  - Change the way the assertion is written (Eric)
> >>
> >> v3:
> >>  - Revert the way the assertion is written to the form it had in v1,
> >>the version in v2 was not equivalent and was incorrect. (Lionel)
> >>
> >> v4:
> >>  - gen9+ doesn't need push constants for images at all (Jason)
> >> ---
> >>  src/intel/vulkan/anv_device.c |  7 --
> >>  .../vulkan/anv_nir_apply_pipeline_layout.c|  4 +--
> >>  src/intel/vulkan/anv_private.h|  5 ++--
> >>  src/intel/vulkan/genX_cmd_buffer.c| 25 +--
> >>  4 files changed, 28 insertions(+), 13 deletions(-)
> >>
> >> diff --git a/src/intel/vulkan/anv_device.c b/src/intel/vulkan/anv_device.c
> >> index 523f1483e29..f85458b672e 100644
> >> --- a/src/intel/vulkan/anv_device.c
> >> +++ b/src/intel/vulkan/anv_device.c
> >> @@ -987,9 +987,12 @@ void anv_GetPhysicalDeviceProperties(
> >> const uint32_t max_samplers = (devinfo->gen >= 8 || 
> >> devinfo->is_haswell) ?
> >>   128 : 16;
> >>  
> >> +   const uint32_t max_images = devinfo->gen < 9 ? MAX_GEN8_IMAGES : 
> >> MAX_IMAGES;
> >> +
> >> VkSampleCountFlags sample_counts =
> >>isl_device_get_sample_counts(>isl_dev);
> >>  
> >> +
> >> VkPhysicalDeviceLimits limits = {
> >>.maxImageDimension1D  = (1 << 14),
> >>.maxImageDimension2D  = (1 << 14),
> >> @@ -1009,7 +1012,7 @@ void anv_GetPhysicalDeviceProperties(
> >>.maxPerStageDescriptorUniformBuffers  = 64,
> >>.maxPerStageDescriptorStorageBuffers  = 64,
> >>.maxPerStageDescriptorSampledImages   = max_samplers,
> >> -  .maxPerStageDescriptorStorageImages   = 64,
> >> +  .maxPerStageDescriptorStorageImages   = max_images,
> >>.maxPerStageDescriptorInputAttachments= 64,
> >>.maxPerStageResources = 250,
> >>.maxDescriptorSetSamplers = 6 * max_samplers, /* 
> >> number of stages * maxPerStageDescriptorSamplers */
> >> @@ -1018,7 +1021,7 @@ void anv_GetPhysicalDeviceProperties(
> >>.maxDescriptorSetStorageBuffers   = 6 * 64,   /* 
> >> number of stages * maxPerStageDescriptorStorageBuffers */
> >>.maxDescriptorSetStorageBuffersDynamic= MAX_DYNAMIC_BUFFERS / 2,
> >>.maxDescriptorSetSampledImages= 6 * max_samplers, /* 
> >> number of stages * maxPerStageDescriptorSampledImages */
> >> -  .maxDescriptorSetStorageImages= 6 * 64,   /* 
> >> number of stages * maxPerStageDescriptorStorageImages */
> >> +  .maxDescriptorSetStorageImages= 6 * max_images,   /* 
> >> number of stages * maxPerStageDescriptorStorageImages */
> >>.maxDescriptorSetInputAttachments = 256,
> >>.maxVertexInputAttributes = MAX_VBS,
> >>.maxVertexInputBindings   = MAX_VBS,
> >> diff --git a/src/intel/vulkan/anv_nir_apply_pipeline_layout.c 
> >> b/src/intel/vulkan/anv_nir_apply_pipeline_layout.c
> >> index b3daf702bc0..623984b0f8c 100644
> >> --- a/src/intel/vulkan/anv_nir_apply_pipeline_layout.c
> >> +++ b/src/intel/vulkan/anv_nir_apply_pipeline_layout.c
> >> @@ -528,8 +528,8 @@ anv_nir_apply_pipeline_layout(const struct 
> >> anv_physical_device *pdevice,
> >>}
> >> }
> >>  
> >> -   if (map->image_count > 0) {
> >> -  assert(map->image_count <= 

Re: [Mesa-dev] [PATCH v4] anv/device: fix maximum number of images supported

2019-01-17 Thread Mark Janes
Hi Iago,

It looks like you tested this patch in CI and got the same failures that
we are seeing on master:

http://mesa-ci-results.jf.intel.com/itoral/builds/263/group/63a9f0ea7bb98050796b649e85481845

Why was this patch pushed?

-Mark

Mark Janes  writes:

> This patch regresses thousands of tests on BDW and HSW:
>
> http://mesa-ci-results.jf.intel.com/vulkancts/builds/10035/group/63a9f0ea7bb98050796b649e85481845#fails
>
> I'll revert it as soon as my testing completes.
>
> Iago Toral Quiroga  writes:
>
>> We had defined MAX_IMAGES as 8, which we used to size the array for
>> image push constant data. The comment there stated that this was for
>> gen8, but anv_nir_apply_pipeline_layout runs for all gens and writes
>> that array, asserting that we don't exceed that number of images,
>> which imposes a limit of MAX_IMAGES on all gens.
>>
>> Furthermore, despite this, we are exposing up to 64 images per shader
>> stage on all gens, gen8 included.
>>
>> This patch lowers the number of images we expose in gen8 to 8 and
>> keeps 64 images for gen9+ while making sure that only pre-SKL gens
>> use push constant space to handle images.
>>
>> v2:
>>  - <= instead of < in the assert (Eric, Lionel)
>>  - Change the way the assertion is written (Eric)
>>
>> v3:
>>  - Revert the way the assertion is written to the form it had in v1,
>>the version in v2 was not equivalent and was incorrect. (Lionel)
>>
>> v4:
>>  - gen9+ doesn't need push constants for images at all (Jason)
>> ---
>>  src/intel/vulkan/anv_device.c |  7 --
>>  .../vulkan/anv_nir_apply_pipeline_layout.c|  4 +--
>>  src/intel/vulkan/anv_private.h|  5 ++--
>>  src/intel/vulkan/genX_cmd_buffer.c| 25 +--
>>  4 files changed, 28 insertions(+), 13 deletions(-)
>>
>> diff --git a/src/intel/vulkan/anv_device.c b/src/intel/vulkan/anv_device.c
>> index 523f1483e29..f85458b672e 100644
>> --- a/src/intel/vulkan/anv_device.c
>> +++ b/src/intel/vulkan/anv_device.c
>> @@ -987,9 +987,12 @@ void anv_GetPhysicalDeviceProperties(
>> const uint32_t max_samplers = (devinfo->gen >= 8 || devinfo->is_haswell) 
>> ?
>>   128 : 16;
>>  
>> +   const uint32_t max_images = devinfo->gen < 9 ? MAX_GEN8_IMAGES : 
>> MAX_IMAGES;
>> +
>> VkSampleCountFlags sample_counts =
>>isl_device_get_sample_counts(>isl_dev);
>>  
>> +
>> VkPhysicalDeviceLimits limits = {
>>.maxImageDimension1D  = (1 << 14),
>>.maxImageDimension2D  = (1 << 14),
>> @@ -1009,7 +1012,7 @@ void anv_GetPhysicalDeviceProperties(
>>.maxPerStageDescriptorUniformBuffers  = 64,
>>.maxPerStageDescriptorStorageBuffers  = 64,
>>.maxPerStageDescriptorSampledImages   = max_samplers,
>> -  .maxPerStageDescriptorStorageImages   = 64,
>> +  .maxPerStageDescriptorStorageImages   = max_images,
>>.maxPerStageDescriptorInputAttachments= 64,
>>.maxPerStageResources = 250,
>>.maxDescriptorSetSamplers = 6 * max_samplers, /* 
>> number of stages * maxPerStageDescriptorSamplers */
>> @@ -1018,7 +1021,7 @@ void anv_GetPhysicalDeviceProperties(
>>.maxDescriptorSetStorageBuffers   = 6 * 64,   /* 
>> number of stages * maxPerStageDescriptorStorageBuffers */
>>.maxDescriptorSetStorageBuffersDynamic= MAX_DYNAMIC_BUFFERS / 2,
>>.maxDescriptorSetSampledImages= 6 * max_samplers, /* 
>> number of stages * maxPerStageDescriptorSampledImages */
>> -  .maxDescriptorSetStorageImages= 6 * 64,   /* 
>> number of stages * maxPerStageDescriptorStorageImages */
>> +  .maxDescriptorSetStorageImages= 6 * max_images,   /* 
>> number of stages * maxPerStageDescriptorStorageImages */
>>.maxDescriptorSetInputAttachments = 256,
>>.maxVertexInputAttributes = MAX_VBS,
>>.maxVertexInputBindings   = MAX_VBS,
>> diff --git a/src/intel/vulkan/anv_nir_apply_pipeline_layout.c 
>> b/src/intel/vulkan/anv_nir_apply_pipeline_layout.c
>> index b3daf702bc0..623984b0f8c 100644
>> --- a/src/intel/vulkan/anv_nir_apply_pipeline_layout.c
>> +++ b/src/intel/vulkan/anv_nir_apply_pipeline_layout.c
>> @@ -528,8 +528,8 @@ anv_nir_apply_pipeline_layout(const struct 
>> anv_physical_device *pdevice,
>>}
>> }
>>  
>> -   if (map->image_count > 0) {
>> -  assert(map->image_count <= MAX_IMAGES);
>> +   if (map->image_count > 0 && pdevice->compiler->devinfo->gen < 9) {
>> +  assert(map->image_count <= MAX_GEN8_IMAGES);
>>assert(shader->num_uniforms == prog_data->nr_params * 4);
>>state.first_image_uniform = shader->num_uniforms;
>>uint32_t *param = brw_stage_prog_data_add_params(prog_data,
>> diff --git a/src/intel/vulkan/anv_private.h b/src/intel/vulkan/anv_private.h
>> index 770254e93ea..47878adb066 

Re: [Mesa-dev] [PATCH v4] anv/device: fix maximum number of images supported

2019-01-17 Thread Mark Janes
This patch regresses thousands of tests on BDW and HSW:

http://mesa-ci-results.jf.intel.com/vulkancts/builds/10035/group/63a9f0ea7bb98050796b649e85481845#fails

I'll revert it as soon as my testing completes.

Iago Toral Quiroga  writes:

> We had defined MAX_IMAGES as 8, which we used to size the array for
> image push constant data. The comment there stated that this was for
> gen8, but anv_nir_apply_pipeline_layout runs for all gens and writes
> that array, asserting that we don't exceed that number of images,
> which imposes a limit of MAX_IMAGES on all gens.
>
> Furthermore, despite this, we are exposing up to 64 images per shader
> stage on all gens, gen8 included.
>
> This patch lowers the number of images we expose in gen8 to 8 and
> keeps 64 images for gen9+ while making sure that only pre-SKL gens
> use push constant space to handle images.
>
> v2:
>  - <= instead of < in the assert (Eric, Lionel)
>  - Change the way the assertion is written (Eric)
>
> v3:
>  - Revert the way the assertion is written to the form it had in v1,
>the version in v2 was not equivalent and was incorrect. (Lionel)
>
> v4:
>  - gen9+ doesn't need push constants for images at all (Jason)
> ---
>  src/intel/vulkan/anv_device.c |  7 --
>  .../vulkan/anv_nir_apply_pipeline_layout.c|  4 +--
>  src/intel/vulkan/anv_private.h|  5 ++--
>  src/intel/vulkan/genX_cmd_buffer.c| 25 +--
>  4 files changed, 28 insertions(+), 13 deletions(-)
>
> diff --git a/src/intel/vulkan/anv_device.c b/src/intel/vulkan/anv_device.c
> index 523f1483e29..f85458b672e 100644
> --- a/src/intel/vulkan/anv_device.c
> +++ b/src/intel/vulkan/anv_device.c
> @@ -987,9 +987,12 @@ void anv_GetPhysicalDeviceProperties(
> const uint32_t max_samplers = (devinfo->gen >= 8 || devinfo->is_haswell) ?
>   128 : 16;
>  
> +   const uint32_t max_images = devinfo->gen < 9 ? MAX_GEN8_IMAGES : 
> MAX_IMAGES;
> +
> VkSampleCountFlags sample_counts =
>isl_device_get_sample_counts(>isl_dev);
>  
> +
> VkPhysicalDeviceLimits limits = {
>.maxImageDimension1D  = (1 << 14),
>.maxImageDimension2D  = (1 << 14),
> @@ -1009,7 +1012,7 @@ void anv_GetPhysicalDeviceProperties(
>.maxPerStageDescriptorUniformBuffers  = 64,
>.maxPerStageDescriptorStorageBuffers  = 64,
>.maxPerStageDescriptorSampledImages   = max_samplers,
> -  .maxPerStageDescriptorStorageImages   = 64,
> +  .maxPerStageDescriptorStorageImages   = max_images,
>.maxPerStageDescriptorInputAttachments= 64,
>.maxPerStageResources = 250,
>.maxDescriptorSetSamplers = 6 * max_samplers, /* 
> number of stages * maxPerStageDescriptorSamplers */
> @@ -1018,7 +1021,7 @@ void anv_GetPhysicalDeviceProperties(
>.maxDescriptorSetStorageBuffers   = 6 * 64,   /* 
> number of stages * maxPerStageDescriptorStorageBuffers */
>.maxDescriptorSetStorageBuffersDynamic= MAX_DYNAMIC_BUFFERS / 2,
>.maxDescriptorSetSampledImages= 6 * max_samplers, /* 
> number of stages * maxPerStageDescriptorSampledImages */
> -  .maxDescriptorSetStorageImages= 6 * 64,   /* 
> number of stages * maxPerStageDescriptorStorageImages */
> +  .maxDescriptorSetStorageImages= 6 * max_images,   /* 
> number of stages * maxPerStageDescriptorStorageImages */
>.maxDescriptorSetInputAttachments = 256,
>.maxVertexInputAttributes = MAX_VBS,
>.maxVertexInputBindings   = MAX_VBS,
> diff --git a/src/intel/vulkan/anv_nir_apply_pipeline_layout.c 
> b/src/intel/vulkan/anv_nir_apply_pipeline_layout.c
> index b3daf702bc0..623984b0f8c 100644
> --- a/src/intel/vulkan/anv_nir_apply_pipeline_layout.c
> +++ b/src/intel/vulkan/anv_nir_apply_pipeline_layout.c
> @@ -528,8 +528,8 @@ anv_nir_apply_pipeline_layout(const struct 
> anv_physical_device *pdevice,
>}
> }
>  
> -   if (map->image_count > 0) {
> -  assert(map->image_count <= MAX_IMAGES);
> +   if (map->image_count > 0 && pdevice->compiler->devinfo->gen < 9) {
> +  assert(map->image_count <= MAX_GEN8_IMAGES);
>assert(shader->num_uniforms == prog_data->nr_params * 4);
>state.first_image_uniform = shader->num_uniforms;
>uint32_t *param = brw_stage_prog_data_add_params(prog_data,
> diff --git a/src/intel/vulkan/anv_private.h b/src/intel/vulkan/anv_private.h
> index 770254e93ea..47878adb066 100644
> --- a/src/intel/vulkan/anv_private.h
> +++ b/src/intel/vulkan/anv_private.h
> @@ -157,7 +157,8 @@ struct gen_l3_config;
>  #define MAX_SCISSORS16
>  #define MAX_PUSH_CONSTANTS_SIZE 128
>  #define MAX_DYNAMIC_BUFFERS 16
> -#define MAX_IMAGES 8
> +#define MAX_IMAGES 64
> +#define MAX_GEN8_IMAGES 8
>  #define MAX_PUSH_DESCRIPTORS 32 /* Minimum requirement 

[Mesa-dev] [Bug 109323] [TRACKER] mesa: Remove autotools

2019-01-17 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=109323

Dylan Baker  changed:

   What|Removed |Added

   Assignee|mesa-dev@lists.freedesktop. |baker.dyla...@gmail.com
   |org |

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are the QA Contact for the bug.___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] gallium: add pipe_grid_info::last_block

2019-01-17 Thread Marek Olšák
On Wed, Jan 16, 2019 at 2:23 PM Eric Anholt  wrote:

> Marek Olšák  writes:
>
> > From: "Jiang, Sonny" 
> >
> > and add radeonsi support. This will be used by radeonsi internally.
> >
> > Signed-off-by: Sonny Jiang 
>
> It's still strange to me to be proposing changes to gallium core with no
> users of the change.
>

I don't know yet but we might use it for multimedia state trackers.

Marek
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v1] nir: Length of boolean vtn_value now is 1

2019-01-17 Thread Jason Ekstrand
I think this is probably mostly ok.  There's still some question about
exactly what gets stored there and who is storing it.

On Thu, Jan 17, 2019 at 4:34 AM apinheiro  wrote:

> I was waiting Jason to chime in, but just in case he is too busy I took
> a look in detail to the patch. It LGTM, so assuming it passes intel CI:
>
> Reviewed-by: Alejandro Piñeiro 
>
> On 15/1/19 12:08, Sergii Romantsov wrote:
> > During conversion type-length was lost due to math.
> >
> > CC: Jason Ekstrand 
> > Fixes: 44227453ec03 (nir: Switch to using 1-bit Booleans for almost
> everything)
> > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109353
> > Signed-off-by: Sergii Romantsov 
> > ---
> >   src/compiler/spirv/spirv_to_nir.c | 9 ++---
> >   1 file changed, 6 insertions(+), 3 deletions(-)
> >
> > diff --git a/src/compiler/spirv/spirv_to_nir.c
> b/src/compiler/spirv/spirv_to_nir.c
> > index e3dc619..faad771 100644
> > --- a/src/compiler/spirv/spirv_to_nir.c
> > +++ b/src/compiler/spirv/spirv_to_nir.c
> > @@ -1042,14 +1042,16 @@ vtn_type_layout_std430(struct vtn_builder *b,
> struct vtn_type *type,
> >   {
> >  switch (type->base_type) {
> >  case vtn_base_type_scalar: {
> > -  uint32_t comp_size = glsl_get_bit_size(type->type) / 8;
> > +  uint32_t comp_size = glsl_type_is_boolean(type->type)
> > + ? 1 : glsl_get_bit_size(type->type) / 8;
>

Because we don't know in spirv_to_nir what size the back-end is going to
use for booleans, we should assume they're 32-bit and use 4 here and
below.  Otherwise, we may end up with booleans overlapping other stuff.


> > *size_out = comp_size;
> > *align_out = comp_size;
> > return type;
> >  }
> >
> >  case vtn_base_type_vector: {
> > -  uint32_t comp_size = glsl_get_bit_size(type->type) / 8;
> > +  uint32_t comp_size = glsl_type_is_boolean(type->type)
> > + ? 1 : glsl_get_bit_size(type->type) / 8;
> > unsigned align_comps = type->length == 3 ? 4 : type->length;
> > *size_out = comp_size * type->length,
> > *align_out = comp_size * align_comps;
> > @@ -1168,7 +1170,8 @@ vtn_handle_type(struct vtn_builder *b, SpvOp
> opcode,
> > val->type->base_type = vtn_base_type_vector;
> > val->type->type =
> glsl_vector_type(glsl_get_base_type(base->type), elems);
> > val->type->length = elems;
> > -  val->type->stride = glsl_get_bit_size(base->type) / 8;
> > +  val->type->stride = glsl_type_is_boolean(val->type->type)
> > + ? 1 : glsl_get_bit_size(base->type) / 8;
> > val->type->array_element = base;
> > break;
> >  }
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 3/4] radv: do not write unused descriptors to the per-queue BO

2019-01-17 Thread Samuel Pitoiset
Signed-off-by: Samuel Pitoiset 
---
 src/amd/vulkan/radv_device.c | 252 ++-
 1 file changed, 128 insertions(+), 124 deletions(-)

diff --git a/src/amd/vulkan/radv_device.c b/src/amd/vulkan/radv_device.c
index 488ed0b6225..0bb2dcdcc20 100644
--- a/src/amd/vulkan/radv_device.c
+++ b/src/amd/vulkan/radv_device.c
@@ -1965,134 +1965,138 @@ fill_geom_tess_rings(struct radv_queue *queue,
 uint32_t tess_offchip_ring_size,
 struct radeon_winsys_bo *tess_rings_bo)
 {
-   uint64_t esgs_va = 0, gsvs_va = 0;
-   uint64_t tess_va = 0, tess_offchip_va = 0;
uint32_t *desc = [4];
 
-   if (esgs_ring_bo)
-   esgs_va = radv_buffer_get_va(esgs_ring_bo);
-   if (gsvs_ring_bo)
-   gsvs_va = radv_buffer_get_va(gsvs_ring_bo);
+   if (esgs_ring_bo) {
+   uint64_t esgs_va = radv_buffer_get_va(esgs_ring_bo);
+
+   /* stride 0, num records - size, add tid, swizzle, elsize4,
+  index stride 64 */
+   desc[0] = esgs_va;
+   desc[1] = S_008F04_BASE_ADDRESS_HI(esgs_va >> 32) |
+ S_008F04_STRIDE(0) |
+ S_008F04_SWIZZLE_ENABLE(true);
+   desc[2] = esgs_ring_size;
+   desc[3] = S_008F0C_DST_SEL_X(V_008F0C_SQ_SEL_X) |
+ S_008F0C_DST_SEL_Y(V_008F0C_SQ_SEL_Y) |
+ S_008F0C_DST_SEL_Z(V_008F0C_SQ_SEL_Z) |
+ S_008F0C_DST_SEL_W(V_008F0C_SQ_SEL_W) |
+ S_008F0C_NUM_FORMAT(V_008F0C_BUF_NUM_FORMAT_FLOAT) |
+ S_008F0C_DATA_FORMAT(V_008F0C_BUF_DATA_FORMAT_32) |
+ S_008F0C_ELEMENT_SIZE(1) |
+ S_008F0C_INDEX_STRIDE(3) |
+ S_008F0C_ADD_TID_ENABLE(true);
+
+   /* GS entry for ES->GS ring */
+   /* stride 0, num records - size, elsize0,
+  index stride 0 */
+   desc[4] = esgs_va;
+   desc[5] = S_008F04_BASE_ADDRESS_HI(esgs_va >> 32)|
+ S_008F04_STRIDE(0) |
+ S_008F04_SWIZZLE_ENABLE(false);
+   desc[6] = esgs_ring_size;
+   desc[7] = S_008F0C_DST_SEL_X(V_008F0C_SQ_SEL_X) |
+ S_008F0C_DST_SEL_Y(V_008F0C_SQ_SEL_Y) |
+ S_008F0C_DST_SEL_Z(V_008F0C_SQ_SEL_Z) |
+ S_008F0C_DST_SEL_W(V_008F0C_SQ_SEL_W) |
+ S_008F0C_NUM_FORMAT(V_008F0C_BUF_NUM_FORMAT_FLOAT) |
+ S_008F0C_DATA_FORMAT(V_008F0C_BUF_DATA_FORMAT_32) |
+ S_008F0C_ELEMENT_SIZE(0) |
+ S_008F0C_INDEX_STRIDE(0) |
+ S_008F0C_ADD_TID_ENABLE(false);
+   }
+
+   desc += 8;
+
+   if (gsvs_ring_bo) {
+   uint64_t gsvs_va = radv_buffer_get_va(gsvs_ring_bo);
+
+   /* VS entry for GS->VS ring */
+   /* stride 0, num records - size, elsize0,
+  index stride 0 */
+   desc[0] = gsvs_va;
+   desc[1] = S_008F04_BASE_ADDRESS_HI(gsvs_va >> 32)|
+ S_008F04_STRIDE(0) |
+ S_008F04_SWIZZLE_ENABLE(false);
+   desc[2] = gsvs_ring_size;
+   desc[3] = S_008F0C_DST_SEL_X(V_008F0C_SQ_SEL_X) |
+ S_008F0C_DST_SEL_Y(V_008F0C_SQ_SEL_Y) |
+ S_008F0C_DST_SEL_Z(V_008F0C_SQ_SEL_Z) |
+ S_008F0C_DST_SEL_W(V_008F0C_SQ_SEL_W) |
+ S_008F0C_NUM_FORMAT(V_008F0C_BUF_NUM_FORMAT_FLOAT) |
+ S_008F0C_DATA_FORMAT(V_008F0C_BUF_DATA_FORMAT_32) |
+ S_008F0C_ELEMENT_SIZE(0) |
+ S_008F0C_INDEX_STRIDE(0) |
+ S_008F0C_ADD_TID_ENABLE(false);
+
+   /* stride gsvs_itemsize, num records 64
+  elsize 4, index stride 16 */
+   /* shader will patch stride and desc[2] */
+   desc[4] = gsvs_va;
+   desc[5] = S_008F04_BASE_ADDRESS_HI(gsvs_va >> 32)|
+ S_008F04_STRIDE(0) |
+ S_008F04_SWIZZLE_ENABLE(true);
+   desc[6] = 0;
+   desc[7] = S_008F0C_DST_SEL_X(V_008F0C_SQ_SEL_X) |
+ S_008F0C_DST_SEL_Y(V_008F0C_SQ_SEL_Y) |
+ S_008F0C_DST_SEL_Z(V_008F0C_SQ_SEL_Z) |
+ S_008F0C_DST_SEL_W(V_008F0C_SQ_SEL_W) |
+ S_008F0C_NUM_FORMAT(V_008F0C_BUF_NUM_FORMAT_FLOAT) |
+ S_008F0C_DATA_FORMAT(V_008F0C_BUF_DATA_FORMAT_32) |
+ S_008F0C_ELEMENT_SIZE(1) |
+ S_008F0C_INDEX_STRIDE(1) |
+ S_008F0C_ADD_TID_ENABLE(true);
+   }
+
+   desc += 8;
+
if 

[Mesa-dev] [PATCH 1/4] radv: drop unused code related to 16 sample locations

2019-01-17 Thread Samuel Pitoiset
The driver only supports up to 8 sample locations.

Signed-off-by: Samuel Pitoiset 
---
 src/amd/vulkan/radv_device.c   | 2 --
 src/amd/vulkan/radv_meta_resolve.c | 8 
 src/amd/vulkan/radv_nir_to_llvm.c  | 3 ---
 3 files changed, 13 deletions(-)

diff --git a/src/amd/vulkan/radv_device.c b/src/amd/vulkan/radv_device.c
index 6cadbe722ae..b2112a6ed34 100644
--- a/src/amd/vulkan/radv_device.c
+++ b/src/amd/vulkan/radv_device.c
@@ -2093,8 +2093,6 @@ fill_geom_tess_rings(struct radv_queue *queue,
memcpy(desc, queue->device->sample_locations_4x, 32);
desc += 8;
memcpy(desc, queue->device->sample_locations_8x, 64);
-   desc += 16;
-   memcpy(desc, queue->device->sample_locations_16x, 128);
 }
 
 static unsigned
diff --git a/src/amd/vulkan/radv_meta_resolve.c 
b/src/amd/vulkan/radv_meta_resolve.c
index 7ce36b1df6e..6a7cbfe8468 100644
--- a/src/amd/vulkan/radv_meta_resolve.c
+++ b/src/amd/vulkan/radv_meta_resolve.c
@@ -456,14 +456,6 @@ void radv_CmdResolveImage(
}
assert(dest_image->info.samples == 1);
 
-   if (src_image->info.samples >= 16) {
-   /* See commit aa3f9aaf31e9056a255f9e0472ebdfdaa60abe54 for the
-* glBlitFramebuffer workaround for samples >= 16.
-*/
-   radv_finishme("vkCmdResolveImage: need interpolation workaround 
when "
- "samples >= 16");
-   }
-
if (src_image->info.array_size > 1)
radv_finishme("vkCmdResolveImage: multisample array images");
 
diff --git a/src/amd/vulkan/radv_nir_to_llvm.c 
b/src/amd/vulkan/radv_nir_to_llvm.c
index 9d0aa411528..40812fa7ffb 100644
--- a/src/amd/vulkan/radv_nir_to_llvm.c
+++ b/src/amd/vulkan/radv_nir_to_llvm.c
@@ -1690,9 +1690,6 @@ radv_get_sample_pos_offset(uint32_t num_samples)
case 8:
sample_pos_offset = 7;
break;
-   case 16:
-   sample_pos_offset = 15;
-   break;
default:
break;
}
-- 
2.20.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 4/4] radv: initialize the per-queue descriptor BO only once

2019-01-17 Thread Samuel Pitoiset
Totally useless to write the descriptors inside the loop.

Signed-off-by: Samuel Pitoiset 
---
 src/amd/vulkan/radv_device.c | 47 ++--
 1 file changed, 23 insertions(+), 24 deletions(-)

diff --git a/src/amd/vulkan/radv_device.c b/src/amd/vulkan/radv_device.c
index 0bb2dcdcc20..c2de61c935d 100644
--- a/src/amd/vulkan/radv_device.c
+++ b/src/amd/vulkan/radv_device.c
@@ -2456,6 +2456,29 @@ radv_get_preamble_cs(struct radv_queue *queue,
} else
descriptor_bo = queue->descriptor_bo;
 
+   if (descriptor_bo != queue->descriptor_bo) {
+   uint32_t *map = 
(uint32_t*)queue->device->ws->buffer_map(descriptor_bo);
+
+   if (scratch_bo) {
+   uint64_t scratch_va = radv_buffer_get_va(scratch_bo);
+   uint32_t rsrc1 = S_008F04_BASE_ADDRESS_HI(scratch_va >> 
32) |
+S_008F04_SWIZZLE_ENABLE(1);
+   map[0] = scratch_va;
+   map[1] = rsrc1;
+   }
+
+   if (esgs_ring_bo || gsvs_ring_bo || tess_rings_bo || 
add_sample_positions)
+   fill_geom_tess_rings(queue, map, add_sample_positions,
+esgs_ring_size, esgs_ring_bo,
+gsvs_ring_size, gsvs_ring_bo,
+tess_factor_ring_size,
+tess_offchip_ring_offset,
+tess_offchip_ring_size,
+tess_rings_bo);
+
+   queue->device->ws->buffer_unmap(descriptor_bo);
+   }
+
for(int i = 0; i < 3; ++i) {
struct radeon_cmdbuf *cs = NULL;
cs = queue->device->ws->cs_create(queue->device->ws,
@@ -2480,30 +2503,6 @@ radv_get_preamble_cs(struct radv_queue *queue,
break;
}
 
-   if (descriptor_bo != queue->descriptor_bo) {
-   uint32_t *map = 
(uint32_t*)queue->device->ws->buffer_map(descriptor_bo);
-
-   if (scratch_bo) {
-   uint64_t scratch_va = 
radv_buffer_get_va(scratch_bo);
-   uint32_t rsrc1 = 
S_008F04_BASE_ADDRESS_HI(scratch_va >> 32) |
-S_008F04_SWIZZLE_ENABLE(1);
-   map[0] = scratch_va;
-   map[1] = rsrc1;
-   }
-
-   if (esgs_ring_bo || gsvs_ring_bo || tess_rings_bo ||
-   add_sample_positions)
-   fill_geom_tess_rings(queue, map, 
add_sample_positions,
-esgs_ring_size, 
esgs_ring_bo,
-gsvs_ring_size, 
gsvs_ring_bo,
-tess_factor_ring_size,
-tess_offchip_ring_offset,
-tess_offchip_ring_size,
-tess_rings_bo);
-
-   queue->device->ws->buffer_unmap(descriptor_bo);
-   }
-
if (esgs_ring_bo || gsvs_ring_bo || tess_rings_bo)  {
radeon_emit(cs, PKT3(PKT3_EVENT_WRITE, 0, 0));
radeon_emit(cs, EVENT_TYPE(V_028A90_VS_PARTIAL_FLUSH) | 
EVENT_INDEX(4));
-- 
2.20.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 2/4] radv: reduce size of the per-queue descriptor BO

2019-01-17 Thread Samuel Pitoiset
Signed-off-by: Samuel Pitoiset 
---
 src/amd/vulkan/radv_device.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/amd/vulkan/radv_device.c b/src/amd/vulkan/radv_device.c
index b2112a6ed34..488ed0b6225 100644
--- a/src/amd/vulkan/radv_device.c
+++ b/src/amd/vulkan/radv_device.c
@@ -2435,7 +2435,7 @@ radv_get_preamble_cs(struct radv_queue *queue,
tess_rings_bo || add_sample_positions) {
size = 112; /* 2 dword + 2 padding + 4 dword * 6 */
if (add_sample_positions)
-   size += 256; /* 32+16+8+4+2+1 samples * 4 * 2 = 
248 bytes. */
+   size += 128; /* 64+32+16+8 = 120 bytes */
}
else if (scratch_bo)
size = 8; /* 2 dword */
-- 
2.20.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [Bug 108877] OpenGL CTS gl43 test cases were interrupted due to segment fault

2019-01-17 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=108877

--- Comment #4 from Marek Olšák  ---
This issue should be fixed with these two patches:

https://patchwork.freedesktop.org/patch/276824/
https://patchwork.freedesktop.org/patch/276858/

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] Thoughts after hitting 100 merge requests?

2019-01-17 Thread Jason Ekstrand
On January 17, 2019 08:58:03 Erik Faye-Lund  
wrote:



On Thu, 2019-01-17 at 14:37 +, Daniel Stone wrote:

Hi,

On Thu, 17 Jan 2019 at 07:38, Erik Faye-Lund
 wrote:

1. New MRs should probably get their cover-letter automatically
sent to
the mailing list for incrased visibility.

[...]

I don't think any of these issues are show-stoppers from moving
entirely to MRs, though. Perhaps issue #1 here should be fixed
first,
dunno...

Issue #1 is of course "just" a matter of writing some script that
gets
triggered by a hook. It's just work that needs to be done; any of
us
that complain about it could take up the work, I suppose.

The only question is where to host such a service... perhaps it's
just
a gitlab CI step? That way we'd get some "automatic" cloud
computing
power to do the work, but it doesn't really seem like a perfect
fit;
we'd need to be able to distinguis between CI steps that are
triggered
due to new


Unfinished sentence?


Whoops! I meant to say something like "we'd need to be able to
distinguis between CI steps that are triggered due to new MRs versus
updated MRs, or pushes to existing branches".


Anyway, Jason did actually write that hook, and it's something I'm
happy to host on existing fd.o machines. I just haven't got to doing
it, since I ended up taking my sabbatical a lot more seriously than I
expected, and now I'm back to work I've got a bit of a backlog. But
we
can definitely do it, and pretty soon.


Cool, then I won't worry about it, and just assume it'll appear
magically soon :)


My script was a total hack. It's probably massively insecure and doesn't 
include any code to provide a diffstat which has been requested by several 
people. Someone taking it a bit more seriously would probably be good 
before we deploy anything.


--Jason



___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 2/2] glsl: be much more aggressive when skipping shader compilation

2019-01-17 Thread Marek Olšák
Acked-by: Marek Olšák 

Marek

On Thu, Jan 17, 2019 at 1:17 AM Timothy Arceri 
wrote:

> Currently only add a cache key for a shader once it is linked.
> However games like Team Fortress 2 compile a whole bunch of shaders
> which are never actually linked. These compiled shaders can take
> up a bunch of memory.
>
> This patch changes things so that we add the key for the shader to
> the cache as soon as it is compiled. This means on a warm cache we
> can avoid the wasted memory from these shaders. Worst case scenario
> is we need to compile the shaders at link time but this can happen
> anyway if the shader has been evicted from the cache.
>
> Reduces memory use in Team Fortress 2 from 1.3GB -> 770MB on a
> warm cache from start up to the game menu.
> ---
>  src/compiler/glsl/glsl_parser_extras.cpp | 9 +
>  src/compiler/glsl/shader_cache.cpp   | 7 +--
>  2 files changed, 10 insertions(+), 6 deletions(-)
>
> diff --git a/src/compiler/glsl/glsl_parser_extras.cpp
> b/src/compiler/glsl/glsl_parser_extras.cpp
> index 200df7759b..655399a812 100644
> --- a/src/compiler/glsl/glsl_parser_extras.cpp
> +++ b/src/compiler/glsl/glsl_parser_extras.cpp
> @@ -2155,6 +2155,15 @@ _mesa_glsl_compile_shader(struct gl_context *ctx,
> struct gl_shader *shader,
>
> delete state->symbols;
> ralloc_free(state);
> +
> +   if (ctx->Cache) {
> +  char sha1_buf[41];
> +  disk_cache_put_key(ctx->Cache, shader->sha1);
> +  if (ctx->_Shader->Flags & GLSL_CACHE_INFO) {
> + _mesa_sha1_format(sha1_buf, shader->sha1);
> + fprintf(stderr, "marking shader: %s\n", sha1_buf);
> +  }
> +   }
>  }
>
>  } /* extern "C" */
> diff --git a/src/compiler/glsl/shader_cache.cpp
> b/src/compiler/glsl/shader_cache.cpp
> index 879511a9d7..581098b88f 100644
> --- a/src/compiler/glsl/shader_cache.cpp
> +++ b/src/compiler/glsl/shader_cache.cpp
> @@ -121,20 +121,15 @@ shader_cache_write_program_metadata(struct
> gl_context *ctx,
> if (!cache_item_metadata.keys)
>goto fail;
>
> -   char sha1_buf[41];
> for (unsigned i = 0; i < prog->NumShaders; i++) {
> -  disk_cache_put_key(cache, prog->Shaders[i]->sha1);
>memcpy(cache_item_metadata.keys[i], prog->Shaders[i]->sha1,
>   sizeof(cache_key));
> -  if (ctx->_Shader->Flags & GLSL_CACHE_INFO) {
> - _mesa_sha1_format(sha1_buf, prog->Shaders[i]->sha1);
> - fprintf(stderr, "marking shader: %s\n", sha1_buf);
> -  }
> }
>
> disk_cache_put(cache, prog->data->sha1, metadata.data, metadata.size,
>_item_metadata);
>
> +   char sha1_buf[41];
> if (ctx->_Shader->Flags & GLSL_CACHE_INFO) {
>_mesa_sha1_format(sha1_buf, prog->data->sha1);
>fprintf(stderr, "putting program metadata in cache: %s\n",
> sha1_buf);
> --
> 2.20.1
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 1/2] glsl: don't skip GLSL IR opts on first-time compiles

2019-01-17 Thread Marek Olšák
Reviewed-by: Marek Olšák 

Marek

On Thu, Jan 17, 2019 at 1:17 AM Timothy Arceri 
wrote:

> This basically reverts c2bc0aa7b188.
>
> By running the opts we reduce  memory using in Team Fortress 2
> from 1.5GB -> 1.3GB from start-up to game menu.
>
> This will likely increase Deus Ex start up times as per commit
> c2bc0aa7b188. However currently 32bit games like Team Fortress 2
> can run out of memory on low memory systems, so that seems more
> important.
> ---
>  src/compiler/glsl/glsl_parser_extras.cpp | 16 +---
>  src/compiler/glsl/shader_cache.cpp   | 17 -
>  src/mesa/main/mtypes.h   |  3 +--
>  3 files changed, 2 insertions(+), 34 deletions(-)
>
> diff --git a/src/compiler/glsl/glsl_parser_extras.cpp
> b/src/compiler/glsl/glsl_parser_extras.cpp
> index 2048a7f900..200df7759b 100644
> --- a/src/compiler/glsl/glsl_parser_extras.cpp
> +++ b/src/compiler/glsl/glsl_parser_extras.cpp
> @@ -2090,14 +2090,6 @@ _mesa_glsl_compile_shader(struct gl_context *ctx,
> struct gl_shader *shader,
> */
>if (shader->CompileStatus == COMPILE_SUCCESS)
>   return;
> -
> -  if (shader->CompileStatus == COMPILED_NO_OPTS) {
> - opt_shader_and_create_symbol_table(ctx,
> -NULL, /* source_symbols */
> -shader);
> - shader->CompileStatus = COMPILE_SUCCESS;
> - return;
> -  }
> }
>
> struct _mesa_glsl_parse_state *state =
> @@ -2153,13 +2145,7 @@ _mesa_glsl_compile_shader(struct gl_context *ctx,
> struct gl_shader *shader,
> if (!state->error && !shader->ir->is_empty()) {
>assign_subroutine_indexes(state);
>lower_subroutine(shader->ir, state);
> -
> -  if (!ctx->Cache || force_recompile)
> - opt_shader_and_create_symbol_table(ctx, state->symbols, shader);
> -  else {
> - reparent_ir(shader->ir, shader->ir);
> - shader->CompileStatus = COMPILED_NO_OPTS;
> -  }
> +  opt_shader_and_create_symbol_table(ctx, state->symbols, shader);
> }
>
> if (!force_recompile) {
> diff --git a/src/compiler/glsl/shader_cache.cpp
> b/src/compiler/glsl/shader_cache.cpp
> index 31d0aa6296..879511a9d7 100644
> --- a/src/compiler/glsl/shader_cache.cpp
> +++ b/src/compiler/glsl/shader_cache.cpp
> @@ -264,23 +264,6 @@ shader_cache_read_program_metadata(struct gl_context
> *ctx,
> /* This is used to flag a shader retrieved from cache */
> prog->data->LinkStatus = LINKING_SKIPPED;
>
> -   /* Since the program load was successful, CompileStatus of all shaders
> at
> -* this point should normally be compile_skipped. However because of
> how
> -* the eviction works, it may happen that some of the individual
> shader keys
> -* have been evicted, resulting in unnecessary recompiles on this
> load, so
> -* mark them again to skip such recompiles next time.
> -*/
> -   char sha1_buf[41];
> -   for (unsigned i = 0; i < prog->NumShaders; i++) {
> -  if (prog->Shaders[i]->CompileStatus == COMPILED_NO_OPTS) {
> - disk_cache_put_key(cache, prog->Shaders[i]->sha1);
> - if (ctx->_Shader->Flags & GLSL_CACHE_INFO) {
> -_mesa_sha1_format(sha1_buf, prog->Shaders[i]->sha1);
> -fprintf(stderr, "re-marking shader: %s\n", sha1_buf);
> - }
> -  }
> -   }
> -
> free (buffer);
>
> return true;
> diff --git a/src/mesa/main/mtypes.h b/src/mesa/main/mtypes.h
> index 241c2b92f7..0fdeba4732 100644
> --- a/src/mesa/main/mtypes.h
> +++ b/src/mesa/main/mtypes.h
> @@ -2576,8 +2576,7 @@ enum gl_compile_status
>  {
> COMPILE_FAILURE = 0,
> COMPILE_SUCCESS,
> -   COMPILE_SKIPPED,
> -   COMPILED_NO_OPTS
> +   COMPILE_SKIPPED
>  };
>
>  /**
> --
> 2.20.1
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] Double free error on etnaviv driver.

2019-01-17 Thread Nazarov Sergey
Hi, Lucas!
Here is result of execution standard Qt5 example application mainwindow
built under custom buildroot with gcc-4.9.4, mesa3d-17.3.6.
But we had the same error with latest buildroot and latest supported mesa and 
gcc.
---
# ./mainwindow
QStandardPaths: XDG_RUNTIME_DIR not set, defaulting to '/root/.tmp/runtime-root'
Failed to move cursor on screen LVDS1: -14
*** Error in `./mainwindow': double free or corruption (fasttop): 0x021295b8 ***
Aborted
---
Some from valgrind output:
==1473== Invalid free() / delete / delete[] / realloc()
==1473==at 0x48469F8: free (vg_replace_malloc.c:530)
==1473==by 0x825C61B: _mesa_glsl_release_builtin_functions() (in 
/usr/lib/dri/imx-drm_dri.so)
==1473==by 0x8286113: _mesa_destroy_shader_compiler (in 
/usr/lib/dri/imx-drm_dri.so)
==1473==by 0x808F073: one_time_fini (in /usr/lib/dri/imx-drm_dri.so)
==1473==  Address 0x96f4ee8 is 0 bytes inside a block of size 24 free'd
==1473==at 0x48469F8: free (vg_replace_malloc.c:530)
==1473==by 0x4015CF7: _dl_close_worker (in /lib/ld-2.23.so)
==1473==  Block was alloc'd at
==1473==at 0x48454B0: malloc (vg_replace_malloc.c:299)
==1473==by 0x829EA7F: ralloc_size (in /usr/lib/dri/imx-drm_dri.so)

Regards, Sergey.

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] intel/fs: Don't apply the des stride alignment rule to accumulators

2019-01-17 Thread Jason Ekstrand
Bah... previous e-mail unfinished.  Please ignore.

On Thu, Jan 17, 2019 at 4:15 AM Francisco Jerez 
wrote:

> Jason Ekstrand  writes:
>
> > The pass was discovered to cause problems with the MUL+MACH combination
> > we emit for nir_[iu]mul_high.  In an experimental branch of mine, I ran
> > into issues where the MUL+MACH ended up using a strided source due to
> > working on half of a uint64_t and the new lowering pass helpfully tried
> > to fix the multiply which wrote to an unstriated accumulator.
>
> > Not only did the multiply not need to be fixed
>
> That's far from clear, and inconsistent with what this patch is doing,
> since the fix is still being applied (Wouldn't it make sense to clarify
> that in the commit message since it's slightly misleading about it?).
>
> The original instruction was technically violating the first CHV/BXT
> double-precision regioning restriction before the pass was introduced,
> that's why it made any changes in the first place.  The integer
> multiplication lowering code was just lucky enough that violating the
> restriction didn't matter in this case, but I doubt that the reason for
> that had anything to do with the accumulator being the explicit
> destination...
>

Explicit, no, but I do suspect that does have to do with it being the
accumulator.  This restriction isn't theoretical; if you violate it with a
GRF, you will get data corruption; I've seen it myself.  I could see two
possible explanations:

 1. Under the hood the accumulator is written with a Q type and an internal
stride of 8 bytes, hence the restriction does apply but is implicitly
satisfied for D type source strides of 1 and 2.
 2. The data path to the accumulator is a special case in the hardware and
doesn't use the normal general-purpose regioning logic and so doesn't
require the restriction.

I don't find the first one very convincing at all.  Among other things, if
it were the true reason, it would imply that we would need to use a stride
of exactly 2 on D type sources which but empirical evidence suggests that
"mul(8) acc0<1> g5<8,8,1>UD g9<16,8,2>UW" works just fine.


> > but the "fix" ended up breaking it because a MOV to the accumulator is
> > not the same as using it as a multiply destination due to the magic
> > way the 33/64 bits of the
>
> Technically it has 66 bits (it wasn't a typo when I said that to you
> earlier on IRC).  That's how it can t hold the result of a SIMD16
> 16x16-bit integer multiplication with 33-bit signed precision per scalar
> component.
>

Yes, there are 33 bits available for WxW multiplies but this is dealing
with a DxD multiply which only has 64 bits according to this bit of bspec
text:

As there are only 64 bits per channel in DWord mode (D and UD), it is
sufficient to store the multiplication result of two DWord operands as long
as the post source modified sources are still within 32 bits. If any one
source is type UD and is negated, the negated result becomes 33 bits. The
DWord multiplication result is then 65 bits, bigger than the storage
capacity of accumulators. Consequently, the results are unpredictable.


> > accumulator are handled for different instruction types.
> >
> > Fixes: efa4e4bc5fc "intel/fs: Introduce regioning lowering pass"
> > Cc: Francisco Jerez 
> > ---
> >  src/intel/compiler/brw_fs_lower_regioning.cpp | 16 +++-
> >  1 file changed, 15 insertions(+), 1 deletion(-)
> >
> > diff --git a/src/intel/compiler/brw_fs_lower_regioning.cpp
> b/src/intel/compiler/brw_fs_lower_regioning.cpp
> > index cc4163b4c2c..b8a89e82272 100644
> > --- a/src/intel/compiler/brw_fs_lower_regioning.cpp
> > +++ b/src/intel/compiler/brw_fs_lower_regioning.cpp
> > @@ -53,7 +53,13 @@ namespace {
> > unsigned
> > required_dst_byte_stride(const fs_inst *inst)
> > {
> > -  if (type_sz(inst->dst.type) < get_exec_type_size(inst) &&
> > +  if (inst->dst.is_accumulator()) {
> > + /* Even though it's not explicitly documented in the PRMs or
> the
> > +  * BSpec, writes to the accumulator appear to not need any
> special
> > +  * treatment with respect too their destination stride
> alignment.
> > +  */
>
> The code is not really doing what the comment says.  The
> destination/source stride alignment restriction will still be honored
> for this instruction.  It's just that the destination *has* to be left
> untouched while doing that in the case of an integer MUL/MACH
> instruction (that's the only reason I asked you to return the original
> byte stride of the destination), because splitting off the region into a
> MOV would lead to data loss due to the inconsistent semantics of the
> accumulator destination for integer MUL/MACH (which update the whole 66
> bits) and every other integer arithmetic instruction (which update the
> bottom 33 bits and *apparently* leave the top 33 bits uninitialized) --
> IOW this is only here so that the assert below doesn't fire.
>

Ok, now I'm very confused.  It sounds to me like you 

Re: [Mesa-dev] Thoughts after hitting 100 merge requests?

2019-01-17 Thread Erik Faye-Lund
On Thu, 2019-01-17 at 14:37 +, Daniel Stone wrote:
> Hi,
> 
> On Thu, 17 Jan 2019 at 07:38, Erik Faye-Lund
>  wrote:
> > 1. New MRs should probably get their cover-letter automatically
> > sent to
> > the mailing list for incrased visibility.
> > 
> > [...]
> > 
> > I don't think any of these issues are show-stoppers from moving
> > entirely to MRs, though. Perhaps issue #1 here should be fixed
> > first,
> > dunno...
> > 
> > Issue #1 is of course "just" a matter of writing some script that
> > gets
> > triggered by a hook. It's just work that needs to be done; any of
> > us
> > that complain about it could take up the work, I suppose.
> > 
> > The only question is where to host such a service... perhaps it's
> > just
> > a gitlab CI step? That way we'd get some "automatic" cloud
> > computing
> > power to do the work, but it doesn't really seem like a perfect
> > fit;
> > we'd need to be able to distinguis between CI steps that are
> > triggered
> > due to new
> 
> Unfinished sentence?

Whoops! I meant to say something like "we'd need to be able to
distinguis between CI steps that are triggered due to new MRs versus
updated MRs, or pushes to existing branches".

> Anyway, Jason did actually write that hook, and it's something I'm
> happy to host on existing fd.o machines. I just haven't got to doing
> it, since I ended up taking my sabbatical a lot more seriously than I
> expected, and now I'm back to work I've got a bit of a backlog. But
> we
> can definitely do it, and pretty soon.
> 

Cool, then I won't worry about it, and just assume it'll appear
magically soon :)

Thanks!

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] Thoughts after hitting 100 merge requests?

2019-01-17 Thread Erik Faye-Lund
On Thu, 2019-01-17 at 10:00 +0100, Michel Dänzer wrote:
> On 2019-01-17 8:38 a.m., Erik Faye-Lund wrote:
> > 3. There's some browsing-pain with the commit list. For instance, I
> > always second-guess if the latest commit is at the top or bottom.
> 
> At the top, same as in all Git related tools I've seen.
> 

One would think all tools would have the sense to do this, but sadly
this isn't the case. One major player decided to do this differently;
GitHub. Since I use both Github and Gitlab, this difference keeps
confusing me.

It could have been made obvious for instance by showing the commit-
graph next to the commit-list, decorating it with the branch head in
one end and clear continuation in the other.

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] Thoughts after hitting 100 merge requests?

2019-01-17 Thread Daniel Stone
Hi,

On Thu, 17 Jan 2019 at 07:38, Erik Faye-Lund
 wrote:
> 1. New MRs should probably get their cover-letter automatically sent to
> the mailing list for incrased visibility.
>
> [...]
>
> I don't think any of these issues are show-stoppers from moving
> entirely to MRs, though. Perhaps issue #1 here should be fixed first,
> dunno...
>
> Issue #1 is of course "just" a matter of writing some script that gets
> triggered by a hook. It's just work that needs to be done; any of us
> that complain about it could take up the work, I suppose.
>
> The only question is where to host such a service... perhaps it's just
> a gitlab CI step? That way we'd get some "automatic" cloud computing
> power to do the work, but it doesn't really seem like a perfect fit;
> we'd need to be able to distinguis between CI steps that are triggered
> due to new

Unfinished sentence?

Anyway, Jason did actually write that hook, and it's something I'm
happy to host on existing fd.o machines. I just haven't got to doing
it, since I ended up taking my sabbatical a lot more seriously than I
expected, and now I'm back to work I've got a bit of a backlog. But we
can definitely do it, and pretty soon.

Cheers,
Daniel
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] intel/fs: Don't apply the des stride alignment rule to accumulators

2019-01-17 Thread Jason Ekstrand
On Thu, Jan 17, 2019 at 4:15 AM Francisco Jerez 
wrote:

> Jason Ekstrand  writes:
>
> > The pass was discovered to cause problems with the MUL+MACH combination
> > we emit for nir_[iu]mul_high.  In an experimental branch of mine, I ran
> > into issues where the MUL+MACH ended up using a strided source due to
> > working on half of a uint64_t and the new lowering pass helpfully tried
> > to fix the multiply which wrote to an unstriated accumulator.
>
> > Not only did the multiply not need to be fixed
>
> That's far from clear, and inconsistent with what this patch is doing,
> since the fix is still being applied (Wouldn't it make sense to clarify
> that in the commit message since it's slightly misleading about it?).
>
> The original instruction was technically violating the first CHV/BXT
> double-precision regioning restriction before the pass was introduced,
> that's why it made any changes in the first place.  The integer
> multiplication lowering code was just lucky enough that violating the
> restriction didn't matter in this case, but I doubt that the reason for
> that had anything to do with the accumulator being the explicit
> destination...
>

Explicit, no, but I do suspect that does have to do with it being the
accumulator.  This restriction isn't theoretical; if you violate it with a
GRF, you will get data corruption; I've seen it myself.  I could see two
possible explanations:

   However, accumulators are a special case that does appear to work.


> > but the "fix" ended up breaking it because a MOV to the accumulator is
> > not the same as using it as a multiply destination due to the magic
> > way the 33/64 bits of the
>
> Technically it has 66 bits (it wasn't a typo when I said that to you
> earlier on IRC).  That's how it can t hold the result of a SIMD16
> 16x16-bit integer multiplication with 33-bit signed precision per scalar
> component.
>

Yes, there are 33 bits available for WxW multiplies but this is dealing
with a DxD multiply which only has 64 bits according to this bit of bspec
text:

As there are only 64 bits per channel in DWord mode (D and UD), it is
sufficient to store the multiplication result of two DWord operands as long
as the post source modified sources are still within 32 bits. If any one
source is type UD and is negated, the negated result becomes 33 bits. The
DWord multiplication result is then 65 bits, bigger than the storage
capacity of accumulators. Consequently, the results are unpredictable.


> > accumulator are handled for different instruction types.
> >
> > Fixes: efa4e4bc5fc "intel/fs: Introduce regioning lowering pass"
> > Cc: Francisco Jerez 
> > ---
> >  src/intel/compiler/brw_fs_lower_regioning.cpp | 16 +++-
> >  1 file changed, 15 insertions(+), 1 deletion(-)
> >
> > diff --git a/src/intel/compiler/brw_fs_lower_regioning.cpp
> b/src/intel/compiler/brw_fs_lower_regioning.cpp
> > index cc4163b4c2c..b8a89e82272 100644
> > --- a/src/intel/compiler/brw_fs_lower_regioning.cpp
> > +++ b/src/intel/compiler/brw_fs_lower_regioning.cpp
> > @@ -53,7 +53,13 @@ namespace {
> > unsigned
> > required_dst_byte_stride(const fs_inst *inst)
> > {
> > -  if (type_sz(inst->dst.type) < get_exec_type_size(inst) &&
> > +  if (inst->dst.is_accumulator()) {
> > + /* Even though it's not explicitly documented in the PRMs or
> the
> > +  * BSpec, writes to the accumulator appear to not need any
> special
> > +  * treatment with respect too their destination stride
> alignment.
> > +  */
>
> The code is not really doing what the comment says.  The
> destination/source stride alignment restriction will still be honored
> for this instruction.  It's just that the destination *has* to be left
> untouched while doing that in the case of an integer MUL/MACH
> instruction (that's the only reason I asked you to return the original
> byte stride of the destination), because splitting off the region into a
> MOV would lead to data loss due to the inconsistent semantics of the
> accumulator destination for integer MUL/MACH (which update the whole 66
> bits) and every other integer arithmetic instruction (which update the
> bottom 33 bits and *apparently* leave the top 33 bits uninitialized) --
> IOW this is only here so that the assert below doesn't fire.
>
> > + return inst->dst.stride * type_sz(inst->dst.type);
> > +  } else if (type_sz(inst->dst.type) < get_exec_type_size(inst) &&
>
> The code changes themselves are just as I wished, so this gets my:
>
> Reviewed-by: Francisco Jerez 
>
> assuming that you clarify the commit message and comment above.
>
> >!is_byte_raw_mov(inst)) {
> >   return get_exec_type_size(inst);
> >} else {
> > @@ -316,6 +322,14 @@ namespace {
> > bool
> > lower_dst_region(fs_visitor *v, bblock_t *block, fs_inst *inst)
> > {
> > +  /* We cannot replace the result of an integer multiply which
> writes the
> > +   * accumulator because MUL+MACH 

[Mesa-dev] [Bug 99553] Tracker bug for runnning OpenCL applications on Clover

2019-01-17 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=99553

Jan Vesely  changed:

   What|Removed |Added

 Depends on||109334


Referenced Bugs:

https://bugs.freedesktop.org/show_bug.cgi?id=109334
[Bug 109334] OpenGL-OpenCL interop support
-- 
You are receiving this mail because:
You are the assignee for the bug.
You are the QA Contact for the bug.___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [Bug 109334] OpenGL-OpenCL interop support

2019-01-17 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=109334

Jan Vesely  changed:

   What|Removed |Added

 Blocks||99553


Referenced Bugs:

https://bugs.freedesktop.org/show_bug.cgi?id=99553
[Bug 99553] Tracker bug for runnning OpenCL applications on Clover
-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 2/2] i965/compute: Emit GPGPU_WALKER in genX_state_upload

2019-01-17 Thread Lionel Landwerlin

Put a few nits below, but otherwise looks good.

-
Lionel

On 13/11/2018 00:05, Jordan Justen wrote:

Signed-off-by: Jordan Justen 
---
  src/mesa/drivers/dri/i965/brw_compute.c   | 131 +-
  src/mesa/drivers/dri/i965/brw_context.h   |   2 +
  src/mesa/drivers/dri/i965/genX_state_upload.c | 102 ++
  3 files changed, 105 insertions(+), 130 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_compute.c 
b/src/mesa/drivers/dri/i965/brw_compute.c
index de08fc3ac16..6e4f5b593aa 100644
--- a/src/mesa/drivers/dri/i965/brw_compute.c
+++ b/src/mesa/drivers/dri/i965/brw_compute.c
@@ -34,135 +34,6 @@
  #include "brw_defines.h"
  
  
-static void

-prepare_indirect_gpgpu_walker(struct brw_context *brw)
-{
-   const struct gen_device_info *devinfo = >screen->devinfo;
-   GLintptr indirect_offset = brw->compute.num_work_groups_offset;
-   struct brw_bo *bo = brw->compute.num_work_groups_bo;
-
-   brw_load_register_mem(brw, GEN7_GPGPU_DISPATCHDIMX, bo, indirect_offset + 
0);
-   brw_load_register_mem(brw, GEN7_GPGPU_DISPATCHDIMY, bo, indirect_offset + 
4);
-   brw_load_register_mem(brw, GEN7_GPGPU_DISPATCHDIMZ, bo, indirect_offset + 
8);
-
-   if (devinfo->gen > 7)
-  return;
-
-   /* Clear upper 32-bits of SRC0 and all 64-bits of SRC1 */
-   BEGIN_BATCH(7);
-   OUT_BATCH(MI_LOAD_REGISTER_IMM | (7 - 2));
-   OUT_BATCH(MI_PREDICATE_SRC0 + 4);
-   OUT_BATCH(0u);
-   OUT_BATCH(MI_PREDICATE_SRC1 + 0);
-   OUT_BATCH(0u);
-   OUT_BATCH(MI_PREDICATE_SRC1 + 4);
-   OUT_BATCH(0u);
-   ADVANCE_BATCH();
-
-   /* Load compute_dispatch_indirect_x_size into SRC0 */
-   brw_load_register_mem(brw, MI_PREDICATE_SRC0, bo, indirect_offset + 0);
-
-   /* predicate = (compute_dispatch_indirect_x_size == 0); */
-   BEGIN_BATCH(1);
-   OUT_BATCH(GEN7_MI_PREDICATE |
- MI_PREDICATE_LOADOP_LOAD |
- MI_PREDICATE_COMBINEOP_SET |
- MI_PREDICATE_COMPAREOP_SRCS_EQUAL);
-   ADVANCE_BATCH();
-
-   /* Load compute_dispatch_indirect_y_size into SRC0 */
-   brw_load_register_mem(brw, MI_PREDICATE_SRC0, bo, indirect_offset + 4);
-
-   /* predicate |= (compute_dispatch_indirect_y_size == 0); */
-   BEGIN_BATCH(1);
-   OUT_BATCH(GEN7_MI_PREDICATE |
- MI_PREDICATE_LOADOP_LOAD |
- MI_PREDICATE_COMBINEOP_OR |
- MI_PREDICATE_COMPAREOP_SRCS_EQUAL);
-   ADVANCE_BATCH();
-
-   /* Load compute_dispatch_indirect_z_size into SRC0 */
-   brw_load_register_mem(brw, MI_PREDICATE_SRC0, bo, indirect_offset + 8);
-
-   /* predicate |= (compute_dispatch_indirect_z_size == 0); */
-   BEGIN_BATCH(1);
-   OUT_BATCH(GEN7_MI_PREDICATE |
- MI_PREDICATE_LOADOP_LOAD |
- MI_PREDICATE_COMBINEOP_OR |
- MI_PREDICATE_COMPAREOP_SRCS_EQUAL);
-   ADVANCE_BATCH();
-
-   /* predicate = !predicate; */
-   BEGIN_BATCH(1);
-   OUT_BATCH(GEN7_MI_PREDICATE |
- MI_PREDICATE_LOADOP_LOADINV |
- MI_PREDICATE_COMBINEOP_OR |
- MI_PREDICATE_COMPAREOP_FALSE);
-   ADVANCE_BATCH();
-}
-
-static void
-brw_emit_gpgpu_walker(struct brw_context *brw)
-{
-   const struct gen_device_info *devinfo = >screen->devinfo;
-   const struct brw_cs_prog_data *prog_data =
-  brw_cs_prog_data(brw->cs.base.prog_data);
-
-   const GLuint *num_groups = brw->compute.num_work_groups;
-   uint32_t indirect_flag;
-
-   if (brw->compute.num_work_groups_bo == NULL) {
-  indirect_flag = 0;
-   } else {
-  indirect_flag =
- GEN7_GPGPU_INDIRECT_PARAMETER_ENABLE |
- (devinfo->gen == 7 ? GEN7_GPGPU_PREDICATE_ENABLE : 0);
-  prepare_indirect_gpgpu_walker(brw);
-   }
-
-   const unsigned simd_size = prog_data->simd_size;
-   unsigned group_size = prog_data->local_size[0] *
-  prog_data->local_size[1] * prog_data->local_size[2];
-   unsigned thread_width_max =
-  (group_size + simd_size - 1) / simd_size;
-
-   uint32_t right_mask = 0xu >> (32 - simd_size);
-   const unsigned right_non_aligned = group_size & (simd_size - 1);
-   if (right_non_aligned != 0)
-  right_mask >>= (simd_size - right_non_aligned);
-
-   uint32_t dwords = devinfo->gen < 8 ? 11 : 15;
-   BEGIN_BATCH(dwords);
-   OUT_BATCH(GPGPU_WALKER << 16 | (dwords - 2) | indirect_flag);
-   OUT_BATCH(0);
-   if (devinfo->gen >= 8) {
-  OUT_BATCH(0); /* Indirect Data Length */
-  OUT_BATCH(0); /* Indirect Data Start Address */
-   }
-   assert(thread_width_max <= brw->screen->devinfo.max_cs_threads);
-   OUT_BATCH(SET_FIELD(simd_size / 16, GPGPU_WALKER_SIMD_SIZE) |
- SET_FIELD(thread_width_max - 1, GPGPU_WALKER_THREAD_WIDTH_MAX));
-   OUT_BATCH(0);/* Thread Group ID Starting X */
-   if (devinfo->gen >= 8)
-  OUT_BATCH(0); /* MBZ */
-   OUT_BATCH(num_groups[0]);/* Thread Group ID X Dimension */
-   OUT_BATCH(0);/* Thread Group ID Starting Y */
-   if (devinfo->gen >= 8)
-  OUT_BATCH(0); 

Re: [Mesa-dev] [PATCH 1/2] i965/genX_state: Add register access functions

2019-01-17 Thread Lionel Landwerlin

Sorry, for replying so late, going through my unread emails :(

We already have functions for doing this :

brw_load_register_reg
brw_load_register_imm32/64
brw_load_register_mem

Why not use those?

-
Lionel

On 13/11/2018 00:05, Jordan Justen wrote:

Signed-off-by: Jordan Justen 
---
  src/mesa/drivers/dri/i965/genX_state_upload.c | 31 +++
  1 file changed, 31 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/genX_state_upload.c 
b/src/mesa/drivers/dri/i965/genX_state_upload.c
index 5acd0922922..6495862e700 100644
--- a/src/mesa/drivers/dri/i965/genX_state_upload.c
+++ b/src/mesa/drivers/dri/i965/genX_state_upload.c
@@ -197,6 +197,37 @@ KSP(UNUSED struct brw_context *brw, uint32_t offset)
  _brw_cmd_pack(cmd)(brw, (void *)_dst, ),  \
  _dst = NULL)
  
+#if GEN_GEN >= 7

+static void
+emit_lrm(struct brw_context *brw, uint32_t reg, struct brw_address addr)
+{
+   brw_batch_emit(brw, GENX(MI_LOAD_REGISTER_MEM), lrm) {
+  lrm.RegisterAddress  = reg;
+  lrm.MemoryAddress= addr;
+   }
+}
+#endif
+
+MAYBE_UNUSED static void
+emit_lri(struct brw_context *brw, uint32_t reg, uint32_t imm)
+{
+   brw_batch_emit(brw, GENX(MI_LOAD_REGISTER_IMM), lri) {
+  lri.RegisterOffset   = reg;
+  lri.DataDWord= imm;
+   }
+}
+
+#if GEN_IS_HASWELL || GEN_GEN >= 8
+MAYBE_UNUSED static void
+emit_lrr(struct brw_context *brw, uint32_t dst, uint32_t src)
+{
+   brw_batch_emit(brw, GENX(MI_LOAD_REGISTER_REG), lrr) {
+  lrr.SourceRegisterAddress= src;
+  lrr.DestinationRegisterAddress   = dst;
+   }
+}
+#endif
+
  /**
   * Polygon stipple packet
   */



___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [ANNOUNCE] mesa 18.3.2

2019-01-17 Thread Emil Velikov
Mesa 18.3.2 is now available.

In this release candidate we have added more PCI IDs for AMD Vega devices and
a number of fixes for the RADV Vulkan drivers.

On the Intel side we have a selection ranging from quad swizzles support for
ICL to compiler fixes.

The nine state tracker has also seen some love as do the Broadcom drivers.

To top it all up, we have a healthy mount of build system fixes.


Alex Deucher (3):
  pci_ids: add new vega10 pci ids
  pci_ids: add new vega20 pci id
  pci_ids: add new VegaM pci id

Alexander von Gluck IV (1):
  egl/haiku: Fix reference to disp vs dpy

Andres Gomez (2):
  glsl: correct typo in GLSL compilation error message
  glsl/linker: specify proper direction in location aliasing error

Axel Davy (3):
  st/nine: Fix volumetexture dtor on ctor failure
  st/nine: Bind src not dst in nine_context_box_upload
  st/nine: Add src reference to nine_context_range_upload

Bas Nieuwenhuizen (5):
  radv: Do a cache flush if needed before reading predicates.
  radv: Implement buffer stores with less than 4 components.
  anv/android: Do not reject storage images.
  radv: Fix rasterization precision bits.
  spirv: Fix matrix parameters in function calls.

Caio Marcelo de Oliveira Filho (3):
  nir: properly clear the entry sources in copy_prop_vars
  nir: properly find the entry to keep in copy_prop_vars
  nir: remove dead code from copy_prop_vars

Dave Airlie (2):
  radv/xfb: fix counter buffer bounds checks.
  virgl/vtest: fix front buffer flush with protocol version 0.

Dylan Baker (6):
  meson: Fix ppc64 little endian detection
  meson: Add support for gnu hurd
  meson: Add toggle for glx-direct
  meson: Override C++ standard to gnu++11 when building with altivec on 
ppc64
  meson: Error out if building nouveau and using LLVM without rtti
  autotools: Remove tegra vdpau driver

Emil Velikov (13):
  docs: add sha256 checksums for 18.3.1
  bin/get-pick-list.sh: rework handing of sha nominations
  bin/get-pick-list.sh: warn when commit lists invalid sha
  cherry-ignore: meson: libfreedreno depends upon libdrm (for fence support)
  glx: mandate xf86vidmode only for "drm" dri platforms
  meson: don't require glx/egl/gbm with gallium drivers
  pipe-loader: meson: reference correct library
  TODO: glx: meson: build dri based glx tests, only with -Dglx=dri
  glx: meson: drop includes from a link-only library
  glx: meson: wire up the dispatch-index-check test
  glx/test: meson: assorted include fixes
  Update version to 18.3.2
  docs: add release notes for 18.3.2

Eric Anholt (6):
  v3d: Fix a leak of the transfer helper on screen destroy.
  vc4: Fix a leak of the transfer helper on screen destroy.
  v3d: Fix a leak of the disassembled instruction string during debug dumps.
  v3d: Make sure that a thrsw doesn't split a multop from its umul24.
  v3d: Add missing flagging of SYNCB as a TSY op.
  gallium/ttn: Fix setup of outputs_written.

Erik Faye-Lund (2):
  virgl: wrap vertex element state in a struct
  virgl: work around bad assumptions in virglrenderer

Francisco Jerez (5):
  intel/fs: Handle source modifiers in lower_integer_multiplication().
  intel/fs: Implement quad swizzles on ICL+.
  intel/fs: Fix bug in lower_simd_width while splitting an instruction 
which was already split.
  intel/eu/gen7: Fix brw_MOV() with DF destination and strided source.
  intel/fs: Respect CHV/BXT regioning restrictions in copy propagation pass.

Ian Romanick (2):
  i965/vec4/dce: Don't narrow the write mask if the flags are used
  Revert "nir/lower_indirect: Bail early if modes == 0"

Jan Vesely (1):
  clover: Fix build after clang r348827

Jason Ekstrand (6):
  nir/constant_folding: Fix source bit size logic
  intel/blorp: Be more conservative about copying clear colors
  spirv: Handle any bit size in vector_insert/extract
  anv/apply_pipeline_layout: Set the cursor in lower_res_reindex_intrinsic
  spirv: Sign-extend array indices
  intel/peephole_ffma: Fix swizzle propagation

Karol Herbst (1):
  nv50/ir: fix use-after-free in ConstantFolding::visit

Kirill Burtsev (1):
  loader: free error state, when checking the drawable type

Lionel Landwerlin (5):
  anv: don't do partial resolve on layer > 0
  i965: include draw_params/derived_draw_params for VF cache workaround
  i965: add CS stall on VF invalidation workaround
  anv: explictly specify format for blorp ccs/mcs op
  anv: flush fast clear colors into compressed surfaces

Marek Olšák (1):
  st/mesa: don't leak pipe_surface if pipe_context is not current

Mario Kleiner (1):
  radeonsi: Fix use of 1- or 2- component GL_DOUBLE vbo's.

Nicolai Hähnle (1):
  meson: link LLVM 'native' component when LLVM is available

Rhys Perry (3):
  radv: don't set surf_index for 

Re: [Mesa-dev] Thoughts after hitting 100 merge requests?

2019-01-17 Thread Tapani Pälli



On 1/17/19 12:32 PM, apinheiro wrote:


On 17/1/19 9:39, Kenneth Graunke wrote:

On Wednesday, January 16, 2019 11:38:05 PM PST Erik Faye-Lund wrote:

On Fri, 2019-01-11 at 10:57 -0600, Jason Ekstrand wrote:

All,

The mesa project has now hit 100 merge requests (36 are still open).
I (and I'm sure others) would be curious to hear people's initial
thoughts on the process.  What's working well?  What's not working?
Is it total fail and should we go back to mailing lists?


So, overall I think it works pretty well. I have some things I think
maybe we could do better, some of which has already been pointed out:

1. New MRs should probably get their cover-letter automatically sent to
the mailing list for incrased visibility.

2. Perhaps we should ban sending MRs from the main mesa repo? With
gitlab, it's trivial to make your own fork, and you can delegate
permissions to other users for collaborators. I don't think there's any
reason to clutter up the main mesa repo with all kinds of branches. But
it seems some people send their MRs from the main-repo anyway. Perhaps
we should document that this isn't how to send MRs?

I agree, I would much rather see MRs sent from personal forks.
That's what I've been doing, and it works well.  If you see people
doing that, feel free to say something - I assume they just don't
realize they can do that.



As one of the people doing it wrongly (using a branch of the main repo, 
instead of a personal fork): yes please, let's document that, as I was 
unaware that was a problem.


What Im not sure if it is possible to update a existing MR in order to 
use a different repo/branch. If that is not possible, should I close my 
MR and reopen it with a fork?




I guess it's not a big deal (?) if we just take care of proper cleanup 
(can be done via checkbox while merging). For me this happened by 
accident (with recent ahw-fix branch), I just happen to have remote 
names 'gitlab' and 'gitlab-tpalli' so it got pushed to the wrong one.


// Tapani
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v1] nir: Length of boolean vtn_value now is 1

2019-01-17 Thread apinheiro
I was waiting Jason to chime in, but just in case he is too busy I took 
a look in detail to the patch. It LGTM, so assuming it passes intel CI:


Reviewed-by: Alejandro Piñeiro 

On 15/1/19 12:08, Sergii Romantsov wrote:

During conversion type-length was lost due to math.

CC: Jason Ekstrand 
Fixes: 44227453ec03 (nir: Switch to using 1-bit Booleans for almost everything)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109353
Signed-off-by: Sergii Romantsov 
---
  src/compiler/spirv/spirv_to_nir.c | 9 ++---
  1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/src/compiler/spirv/spirv_to_nir.c 
b/src/compiler/spirv/spirv_to_nir.c
index e3dc619..faad771 100644
--- a/src/compiler/spirv/spirv_to_nir.c
+++ b/src/compiler/spirv/spirv_to_nir.c
@@ -1042,14 +1042,16 @@ vtn_type_layout_std430(struct vtn_builder *b, struct 
vtn_type *type,
  {
 switch (type->base_type) {
 case vtn_base_type_scalar: {
-  uint32_t comp_size = glsl_get_bit_size(type->type) / 8;
+  uint32_t comp_size = glsl_type_is_boolean(type->type)
+ ? 1 : glsl_get_bit_size(type->type) / 8;
*size_out = comp_size;
*align_out = comp_size;
return type;
 }
  
 case vtn_base_type_vector: {

-  uint32_t comp_size = glsl_get_bit_size(type->type) / 8;
+  uint32_t comp_size = glsl_type_is_boolean(type->type)
+ ? 1 : glsl_get_bit_size(type->type) / 8;
unsigned align_comps = type->length == 3 ? 4 : type->length;
*size_out = comp_size * type->length,
*align_out = comp_size * align_comps;
@@ -1168,7 +1170,8 @@ vtn_handle_type(struct vtn_builder *b, SpvOp opcode,
val->type->base_type = vtn_base_type_vector;
val->type->type = glsl_vector_type(glsl_get_base_type(base->type), 
elems);
val->type->length = elems;
-  val->type->stride = glsl_get_bit_size(base->type) / 8;
+  val->type->stride = glsl_type_is_boolean(val->type->type)
+ ? 1 : glsl_get_bit_size(base->type) / 8;
val->type->array_element = base;
break;
 }

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] Thoughts after hitting 100 merge requests?

2019-01-17 Thread apinheiro


On 17/1/19 9:39, Kenneth Graunke wrote:

On Wednesday, January 16, 2019 11:38:05 PM PST Erik Faye-Lund wrote:

On Fri, 2019-01-11 at 10:57 -0600, Jason Ekstrand wrote:

All,

The mesa project has now hit 100 merge requests (36 are still open).
I (and I'm sure others) would be curious to hear people's initial
thoughts on the process.  What's working well?  What's not working?
Is it total fail and should we go back to mailing lists?


So, overall I think it works pretty well. I have some things I think
maybe we could do better, some of which has already been pointed out:

1. New MRs should probably get their cover-letter automatically sent to
the mailing list for incrased visibility.

2. Perhaps we should ban sending MRs from the main mesa repo? With
gitlab, it's trivial to make your own fork, and you can delegate
permissions to other users for collaborators. I don't think there's any
reason to clutter up the main mesa repo with all kinds of branches. But
it seems some people send their MRs from the main-repo anyway. Perhaps
we should document that this isn't how to send MRs?

I agree, I would much rather see MRs sent from personal forks.
That's what I've been doing, and it works well.  If you see people
doing that, feel free to say something - I assume they just don't
realize they can do that.



As one of the people doing it wrongly (using a branch of the main repo, 
instead of a personal fork): yes please, let's document that, as I was 
unaware that was a problem.


What Im not sure if it is possible to update a existing MR in order to 
use a different repo/branch. If that is not possible, should I close my 
MR and reopen it with a fork?





--Ken

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] intel/fs: Don't apply the des stride alignment rule to accumulators

2019-01-17 Thread Francisco Jerez
Jason Ekstrand  writes:

> The pass was discovered to cause problems with the MUL+MACH combination
> we emit for nir_[iu]mul_high.  In an experimental branch of mine, I ran
> into issues where the MUL+MACH ended up using a strided source due to
> working on half of a uint64_t and the new lowering pass helpfully tried
> to fix the multiply which wrote to an unstriated accumulator.

> Not only did the multiply not need to be fixed

That's far from clear, and inconsistent with what this patch is doing,
since the fix is still being applied (Wouldn't it make sense to clarify
that in the commit message since it's slightly misleading about it?).

The original instruction was technically violating the first CHV/BXT
double-precision regioning restriction before the pass was introduced,
that's why it made any changes in the first place.  The integer
multiplication lowering code was just lucky enough that violating the
restriction didn't matter in this case, but I doubt that the reason for
that had anything to do with the accumulator being the explicit
destination...

> but the "fix" ended up breaking it because a MOV to the accumulator is
> not the same as using it as a multiply destination due to the magic
> way the 33/64 bits of the

Technically it has 66 bits (it wasn't a typo when I said that to you
earlier on IRC).  That's how it can t hold the result of a SIMD16
16x16-bit integer multiplication with 33-bit signed precision per scalar
component.

> accumulator are handled for different instruction types.
>
> Fixes: efa4e4bc5fc "intel/fs: Introduce regioning lowering pass"
> Cc: Francisco Jerez 
> ---
>  src/intel/compiler/brw_fs_lower_regioning.cpp | 16 +++-
>  1 file changed, 15 insertions(+), 1 deletion(-)
>
> diff --git a/src/intel/compiler/brw_fs_lower_regioning.cpp 
> b/src/intel/compiler/brw_fs_lower_regioning.cpp
> index cc4163b4c2c..b8a89e82272 100644
> --- a/src/intel/compiler/brw_fs_lower_regioning.cpp
> +++ b/src/intel/compiler/brw_fs_lower_regioning.cpp
> @@ -53,7 +53,13 @@ namespace {
> unsigned
> required_dst_byte_stride(const fs_inst *inst)
> {
> -  if (type_sz(inst->dst.type) < get_exec_type_size(inst) &&
> +  if (inst->dst.is_accumulator()) {
> + /* Even though it's not explicitly documented in the PRMs or the
> +  * BSpec, writes to the accumulator appear to not need any special
> +  * treatment with respect too their destination stride alignment.
> +  */

The code is not really doing what the comment says.  The
destination/source stride alignment restriction will still be honored
for this instruction.  It's just that the destination *has* to be left
untouched while doing that in the case of an integer MUL/MACH
instruction (that's the only reason I asked you to return the original
byte stride of the destination), because splitting off the region into a
MOV would lead to data loss due to the inconsistent semantics of the
accumulator destination for integer MUL/MACH (which update the whole 66
bits) and every other integer arithmetic instruction (which update the
bottom 33 bits and *apparently* leave the top 33 bits uninitialized) --
IOW this is only here so that the assert below doesn't fire.

> + return inst->dst.stride * type_sz(inst->dst.type);
> +  } else if (type_sz(inst->dst.type) < get_exec_type_size(inst) &&

The code changes themselves are just as I wished, so this gets my:

Reviewed-by: Francisco Jerez 

assuming that you clarify the commit message and comment above.

>!is_byte_raw_mov(inst)) {
>   return get_exec_type_size(inst);
>} else {
> @@ -316,6 +322,14 @@ namespace {
> bool
> lower_dst_region(fs_visitor *v, bblock_t *block, fs_inst *inst)
> {
> +  /* We cannot replace the result of an integer multiply which writes the
> +   * accumulator because MUL+MACH pairs act on the accumulator as a 
> 64-bit
> +   * value whereas the MOV will act on only 32 or 33 bits of the
> +   * accumulator.
> +   */
> +  assert(inst->opcode != BRW_OPCODE_MUL || !inst->dst.is_accumulator() ||
> + brw_reg_type_is_floating_point(inst->dst.type));
> +
>const fs_builder ibld(v, block, inst);
>const unsigned stride = required_dst_byte_stride(inst) /
>type_sz(inst->dst.type);
> -- 
> 2.20.1


signature.asc
Description: PGP signature
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] Double free error on etnaviv driver.

2019-01-17 Thread Lucas Stach
Hi Sergey,

Am Mittwoch, den 16.01.2019, 13:34 +0300 schrieb Nazarov Sergey:
> Hello!
> I use mesa+etnaviv Gallium driver on iMX6Q based board.
> I have double free error at the end of any application using mesa.
> I've found that the problem comes from current ARM g++ compilers
> (at least from 4.9.4 up to latest ones) which call static objects destructors 
> before
> on_exit handler. So, static builtin_builder builtins destructor frees 
> dynamically allocated
> internal members memory before call of _mesa_glsl_release_builtin_functions(),
> which do that again. Here is a patch to fix the problem:

Do you have a reproducer for this issue? I haven't experienced any
memory corruption issues with etnaviv in a long time. Also this being
shared code, it should also happen on other ARM drivers too.

I gave some applications a quick run under valgrind, which didn't turn 
up anything bad immediately, so I would be interested in a way to
clearly demonstrate the issue.

Regards,
Lucas

> 
> --- a/src/compiler/glsl/builtin_functions.cpp
> +++ b/src/compiler/glsl/builtin_functions.cpp
> @@ -6268,7 +6268,7 @@ builtin_builder::_vote(const char *intri
>  
> /**/
>  
>  /* The singleton instance of builtin_builder. */
> -static builtin_builder builtins;
> +static builtin_builder *builtins = NULL;
>  static mtx_t builtins_lock = _MTX_INITIALIZER_NP;
>  
>  /**
> @@ -6279,7 +6279,9 @@ void
>  _mesa_glsl_initialize_builtin_functions()
>  {
> mtx_lock(_lock);
> -   builtins.initialize();
> +   if (!builtins)
> + builtins = new builtin_builder;
> +   builtins->initialize();
> mtx_unlock(_lock);
>  }
>  
> @@ -6287,7 +6289,9 @@ void
>  _mesa_glsl_release_builtin_functions()
>  {
> mtx_lock(_lock);
> -   builtins.release();
> +   builtins->release();
> +   delete builtins;
> +   builtins = NULL;
> mtx_unlock(_lock);
>  }
>  
> @@ -6297,7 +6301,7 @@ _mesa_glsl_find_builtin_function(_mesa_g
>  {
> ir_function_signature *s;
> mtx_lock(_lock);
> -   s = builtins.find(state, name, actual_parameters);
> +   s = builtins->find(state, name, actual_parameters);
> mtx_unlock(_lock);
>  
> return s;
> @@ -6309,7 +6313,7 @@ _mesa_glsl_has_builtin_function(_mesa_gl
> ir_function *f;
> bool ret = false;
> mtx_lock(_lock);
> -   f = builtins.shader->symbols->get_function(name);
> +   f = builtins->shader->symbols->get_function(name);
> if (f != NULL) {
>    foreach_in_list(ir_function_signature, sig, >signatures) {
>   if (sig->is_builtin_available(state)) {
> @@ -6326,7 +6330,7 @@ _mesa_glsl_has_builtin_function(_mesa_gl
>  gl_shader *
>  _mesa_glsl_get_builtin_function_shader()
>  {
> -   return builtins.shader;
> +   return builtins ? builtins->shader : NULL;
>  }
> 
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] Thoughts after hitting 100 merge requests?

2019-01-17 Thread Michel Dänzer
On 2019-01-17 8:38 a.m., Erik Faye-Lund wrote:
>
> 3. There's some browsing-pain with the commit list. For instance, I
> always second-guess if the latest commit is at the top or bottom.

At the top, same as in all Git related tools I've seen.


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] Thoughts after hitting 100 merge requests?

2019-01-17 Thread Kenneth Graunke
On Wednesday, January 16, 2019 11:38:05 PM PST Erik Faye-Lund wrote:
> On Fri, 2019-01-11 at 10:57 -0600, Jason Ekstrand wrote:
> > All,
> > 
> > The mesa project has now hit 100 merge requests (36 are still open). 
> > I (and I'm sure others) would be curious to hear people's initial
> > thoughts on the process.  What's working well?  What's not working? 
> > Is it total fail and should we go back to mailing lists?
> > 
> 
> So, overall I think it works pretty well. I have some things I think
> maybe we could do better, some of which has already been pointed out:
> 
> 1. New MRs should probably get their cover-letter automatically sent to
> the mailing list for incrased visibility.
> 
> 2. Perhaps we should ban sending MRs from the main mesa repo? With
> gitlab, it's trivial to make your own fork, and you can delegate
> permissions to other users for collaborators. I don't think there's any
> reason to clutter up the main mesa repo with all kinds of branches. But
> it seems some people send their MRs from the main-repo anyway. Perhaps
> we should document that this isn't how to send MRs?

I agree, I would much rather see MRs sent from personal forks.
That's what I've been doing, and it works well.  If you see people
doing that, feel free to say something - I assume they just don't
realize they can do that.

--Ken


signature.asc
Description: This is a digitally signed message part.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 3/4] radv: compute the GFX9 fence VA at allocation time

2019-01-17 Thread Samuel Pitoiset
Instead of doing every time we emit cache flushes.

Signed-off-by: Samuel Pitoiset 
---
 src/amd/vulkan/radv_cmd_buffer.c | 12 ++--
 src/amd/vulkan/radv_private.h|  3 +--
 src/amd/vulkan/si_cmd_buffer.c   |  2 +-
 3 files changed, 8 insertions(+), 9 deletions(-)

diff --git a/src/amd/vulkan/radv_cmd_buffer.c b/src/amd/vulkan/radv_cmd_buffer.c
index 75acf2a2b7c..bb176033b56 100644
--- a/src/amd/vulkan/radv_cmd_buffer.c
+++ b/src/amd/vulkan/radv_cmd_buffer.c
@@ -335,13 +335,14 @@ radv_reset_cmd_buffer(struct radv_cmd_buffer *cmd_buffer)
if (cmd_buffer->device->physical_device->rad_info.chip_class >= GFX9 &&
cmd_buffer->queue_family_index == RADV_QUEUE_GENERAL) {
unsigned num_db = 
cmd_buffer->device->physical_device->rad_info.num_render_backends;
-   unsigned eop_bug_offset;
+   unsigned fence_offset, eop_bug_offset;
void *fence_ptr;
 
-   radv_cmd_buffer_upload_alloc(cmd_buffer, 8, 0,
-_buffer->gfx9_fence_offset,
+   radv_cmd_buffer_upload_alloc(cmd_buffer, 8, 0, _offset,
 _ptr);
-   cmd_buffer->gfx9_fence_bo = cmd_buffer->upload.upload_bo;
+   cmd_buffer->gfx9_fence_va =
+   radv_buffer_get_va(cmd_buffer->upload.upload_bo);
+   cmd_buffer->gfx9_fence_va += fence_offset;
 
/* Allocate a buffer for the EOP bug on GFX9. */
radv_cmd_buffer_upload_alloc(cmd_buffer, 16 * num_db, 0,
@@ -494,8 +495,7 @@ radv_cmd_buffer_after_draw(struct radv_cmd_buffer 
*cmd_buffer,
RADV_CMD_FLAG_CS_PARTIAL_FLUSH));
 
if (cmd_buffer->device->physical_device->rad_info.chip_class == 
GFX9) {
-   va = radv_buffer_get_va(cmd_buffer->gfx9_fence_bo) +
-cmd_buffer->gfx9_fence_offset;
+   va = cmd_buffer->gfx9_fence_va;
ptr = _buffer->gfx9_fence_idx;
}
 
diff --git a/src/amd/vulkan/radv_private.h b/src/amd/vulkan/radv_private.h
index 57e216ecfca..b7bf469933d 100644
--- a/src/amd/vulkan/radv_private.h
+++ b/src/amd/vulkan/radv_private.h
@@ -1114,8 +1114,7 @@ struct radv_cmd_buffer {
 
VkResult record_result;
 
-   uint32_t gfx9_fence_offset;
-   struct radeon_winsys_bo *gfx9_fence_bo;
+   uint64_t gfx9_fence_va;
uint32_t gfx9_fence_idx;
uint64_t gfx9_eop_bug_va;
 
diff --git a/src/amd/vulkan/si_cmd_buffer.c b/src/amd/vulkan/si_cmd_buffer.c
index 2f32c72fea1..f05096fcdfe 100644
--- a/src/amd/vulkan/si_cmd_buffer.c
+++ b/src/amd/vulkan/si_cmd_buffer.c
@@ -976,7 +976,7 @@ si_emit_cache_flush(struct radv_cmd_buffer *cmd_buffer)
uint32_t *ptr = NULL;
uint64_t va = 0;
if (chip_class == GFX9) {
-   va = radv_buffer_get_va(cmd_buffer->gfx9_fence_bo) + 
cmd_buffer->gfx9_fence_offset;
+   va = cmd_buffer->gfx9_fence_va;
ptr = _buffer->gfx9_fence_idx;
}
si_cs_emit_cache_flush(cmd_buffer->cs,
-- 
2.20.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 4/4] radv: always pass the GFX9 fence data to si_cs_emit_cache_flush()

2019-01-17 Thread Samuel Pitoiset
Remove two useless checks.

Signed-off-by: Samuel Pitoiset 
---
 src/amd/vulkan/radv_cmd_buffer.c | 11 ++-
 src/amd/vulkan/si_cmd_buffer.c   |  9 ++---
 2 files changed, 4 insertions(+), 16 deletions(-)

diff --git a/src/amd/vulkan/radv_cmd_buffer.c b/src/amd/vulkan/radv_cmd_buffer.c
index bb176033b56..b5093bd7d87 100644
--- a/src/amd/vulkan/radv_cmd_buffer.c
+++ b/src/amd/vulkan/radv_cmd_buffer.c
@@ -488,23 +488,16 @@ radv_cmd_buffer_after_draw(struct radv_cmd_buffer 
*cmd_buffer,
   enum radv_cmd_flush_bits flags)
 {
if (cmd_buffer->device->instance->debug_flags & 
RADV_DEBUG_SYNC_SHADERS) {
-   uint32_t *ptr = NULL;
-   uint64_t va = 0;
-
assert(flags & (RADV_CMD_FLAG_PS_PARTIAL_FLUSH |
RADV_CMD_FLAG_CS_PARTIAL_FLUSH));
 
-   if (cmd_buffer->device->physical_device->rad_info.chip_class == 
GFX9) {
-   va = cmd_buffer->gfx9_fence_va;
-   ptr = _buffer->gfx9_fence_idx;
-   }
-
radeon_check_space(cmd_buffer->device->ws, cmd_buffer->cs, 4);
 
/* Force wait for graphics or compute engines to be idle. */
si_cs_emit_cache_flush(cmd_buffer->cs,
   
cmd_buffer->device->physical_device->rad_info.chip_class,
-  ptr, va,
+  _buffer->gfx9_fence_idx,
+  cmd_buffer->gfx9_fence_va,
   radv_cmd_buffer_uses_mec(cmd_buffer),
   flags, cmd_buffer->gfx9_eop_bug_va);
}
diff --git a/src/amd/vulkan/si_cmd_buffer.c b/src/amd/vulkan/si_cmd_buffer.c
index f05096fcdfe..80093ed89ed 100644
--- a/src/amd/vulkan/si_cmd_buffer.c
+++ b/src/amd/vulkan/si_cmd_buffer.c
@@ -973,15 +973,10 @@ si_emit_cache_flush(struct radv_cmd_buffer *cmd_buffer)
enum chip_class chip_class = 
cmd_buffer->device->physical_device->rad_info.chip_class;
radeon_check_space(cmd_buffer->device->ws, cmd_buffer->cs, 128);
 
-   uint32_t *ptr = NULL;
-   uint64_t va = 0;
-   if (chip_class == GFX9) {
-   va = cmd_buffer->gfx9_fence_va;
-   ptr = _buffer->gfx9_fence_idx;
-   }
si_cs_emit_cache_flush(cmd_buffer->cs,
   
cmd_buffer->device->physical_device->rad_info.chip_class,
-  ptr, va,
+  _buffer->gfx9_fence_idx,
+  cmd_buffer->gfx9_fence_va,
   radv_cmd_buffer_uses_mec(cmd_buffer),
   cmd_buffer->state.flush_bits,
   cmd_buffer->gfx9_eop_bug_va);
-- 
2.20.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 2/4] radv: only allocate the GFX9 fence and EOP BOs for the gfx queue

2019-01-17 Thread Samuel Pitoiset
It's invalid to emit a ZPASS_DONE event on the compute queue, and
the fence BO is unused on the compute queue (ie. we don't flush
CB or DB caches).
This saves some space in the upload BO.

Signed-off-by: Samuel Pitoiset 
---
 src/amd/vulkan/radv_cmd_buffer.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/amd/vulkan/radv_cmd_buffer.c b/src/amd/vulkan/radv_cmd_buffer.c
index 950973548af..75acf2a2b7c 100644
--- a/src/amd/vulkan/radv_cmd_buffer.c
+++ b/src/amd/vulkan/radv_cmd_buffer.c
@@ -332,7 +332,8 @@ radv_reset_cmd_buffer(struct radv_cmd_buffer *cmd_buffer)
cmd_buffer->descriptors[i].push_dirty = false;
}
 
-   if (cmd_buffer->device->physical_device->rad_info.chip_class >= GFX9) {
+   if (cmd_buffer->device->physical_device->rad_info.chip_class >= GFX9 &&
+   cmd_buffer->queue_family_index == RADV_QUEUE_GENERAL) {
unsigned num_db = 
cmd_buffer->device->physical_device->rad_info.num_render_backends;
unsigned eop_bug_offset;
void *fence_ptr;
-- 
2.20.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 1/4] radv: remove old_fence parameter from si_cs_emit_write_event_eop()

2019-01-17 Thread Samuel Pitoiset
This parameter is actually useless as the immediate value
can always be zero.

Signed-off-by: Samuel Pitoiset 
---
 src/amd/vulkan/radv_cmd_buffer.c | 2 +-
 src/amd/vulkan/radv_private.h| 1 -
 src/amd/vulkan/radv_query.c  | 4 ++--
 src/amd/vulkan/si_cmd_buffer.c   | 9 -
 4 files changed, 7 insertions(+), 9 deletions(-)

diff --git a/src/amd/vulkan/radv_cmd_buffer.c b/src/amd/vulkan/radv_cmd_buffer.c
index f41d6c0b3e7..950973548af 100644
--- a/src/amd/vulkan/radv_cmd_buffer.c
+++ b/src/amd/vulkan/radv_cmd_buffer.c
@@ -4693,7 +4693,7 @@ static void write_event(struct radv_cmd_buffer 
*cmd_buffer,
   
cmd_buffer->device->physical_device->rad_info.chip_class,
   radv_cmd_buffer_uses_mec(cmd_buffer),
   V_028A90_BOTTOM_OF_PIPE_TS, 0,
-  EOP_DATA_SEL_VALUE_32BIT, va, 2, 
value,
+  EOP_DATA_SEL_VALUE_32BIT, va, value,
   cmd_buffer->gfx9_eop_bug_va);
}
 
diff --git a/src/amd/vulkan/radv_private.h b/src/amd/vulkan/radv_private.h
index 6089ee6a607..57e216ecfca 100644
--- a/src/amd/vulkan/radv_private.h
+++ b/src/amd/vulkan/radv_private.h
@@ -1150,7 +1150,6 @@ void si_cs_emit_write_event_eop(struct radeon_cmdbuf *cs,
unsigned event, unsigned event_flags,
unsigned data_sel,
uint64_t va,
-   uint32_t old_fence,
uint32_t new_fence,
uint64_t gfx9_eop_bug_va);
 
diff --git a/src/amd/vulkan/radv_query.c b/src/amd/vulkan/radv_query.c
index 72429635290..68eca0ea48d 100644
--- a/src/amd/vulkan/radv_query.c
+++ b/src/amd/vulkan/radv_query.c
@@ -1569,7 +1569,7 @@ static void emit_end_query(struct radv_cmd_buffer 
*cmd_buffer,
   radv_cmd_buffer_uses_mec(cmd_buffer),
   V_028A90_BOTTOM_OF_PIPE_TS, 0,
   EOP_DATA_SEL_VALUE_32BIT,
-  avail_va, 0, 1,
+  avail_va, 1,
   cmd_buffer->gfx9_eop_bug_va);
break;
case VK_QUERY_TYPE_TRANSFORM_FEEDBACK_STREAM_EXT:
@@ -1704,7 +1704,7 @@ void radv_CmdWriteTimestamp(
   mec,
   V_028A90_BOTTOM_OF_PIPE_TS, 
0,
   EOP_DATA_SEL_TIMESTAMP,
-  query_va, 0, 0,
+  query_va, 0,
   cmd_buffer->gfx9_eop_bug_va);
break;
}
diff --git a/src/amd/vulkan/si_cmd_buffer.c b/src/amd/vulkan/si_cmd_buffer.c
index 2f57584bf82..2f32c72fea1 100644
--- a/src/amd/vulkan/si_cmd_buffer.c
+++ b/src/amd/vulkan/si_cmd_buffer.c
@@ -660,7 +660,6 @@ void si_cs_emit_write_event_eop(struct radeon_cmdbuf *cs,
unsigned event, unsigned event_flags,
unsigned data_sel,
uint64_t va,
-   uint32_t old_fence,
uint32_t new_fence,
uint64_t gfx9_eop_bug_va)
 {
@@ -707,7 +706,7 @@ void si_cs_emit_write_event_eop(struct radeon_cmdbuf *cs,
radeon_emit(cs, op);
radeon_emit(cs, va);
radeon_emit(cs, ((va >> 32) & 0x) | sel);
-   radeon_emit(cs, old_fence); /* immediate data */
+   radeon_emit(cs, 0); /* immediate data */
radeon_emit(cs, 0); /* unused */
}
 
@@ -801,7 +800,7 @@ si_cs_emit_cache_flush(struct radeon_cmdbuf *cs,
   
V_028A90_FLUSH_AND_INV_CB_DATA_TS,
   0,
   EOP_DATA_SEL_DISCARD,
-  0, 0, 0,
+  0, 0,
   gfx9_eop_bug_va);
}
}
@@ -868,11 +867,11 @@ si_cs_emit_cache_flush(struct radeon_cmdbuf *cs,
 RADV_CMD_FLAG_INV_VMEM_L1);
}
assert(flush_cnt);
-   uint32_t old_fence = (*flush_cnt)++;
+   (*flush_cnt)++;
 
si_cs_emit_write_event_eop(cs, chip_class, false, cb_db_event, 
tc_flags,