I don't remember the specifics of why we ended up interfacing with Clang
this way. What is technically wrong with it, specifically? I don't
have any objection to switching to the Driver and Compilation interface,
nor to translating the "-cl-denorms-are-zero" option to whatever the
current option
Alyssa Rosenzweig writes:
> Hi all,
>
> Recently I've been thinking about the potential for the Rust programming
> language in Mesa. Rust bills itself as a safe systems programming language
> with comparable performance to C [0], which is a natural fit for
> graphics driver development.
>
> Mesa
See "i965/gen9: Optimize slice and subslice load balancing behavior."
for the rationale. According to Jason, this improves Aztec Ruins
performance by 2.7%.
Reviewed-by: Kenneth Graunke (v1)
v2: Undo CPU performance micro-optimization done in i965 and iris due
to lack of data justifying it on
Jason Ekstrand writes:
> On Sat, Aug 10, 2019 at 2:22 PM Francisco Jerez
> wrote:
>
>> Jason Ekstrand writes:
>>
>> > On Fri, Aug 9, 2019 at 7:22 PM Francisco Jerez
>> > wrote:
>> >
>> >> See "i965/gen9: Optimize slice and su
Jason Ekstrand writes:
> On Fri, Aug 9, 2019 at 7:22 PM Francisco Jerez
> wrote:
>
>> See "i965/gen9: Optimize slice and subslice load balancing behavior."
>> for the rationale. Marked optional because no performance evaluation
>> has been done on
See "i965/gen9: Optimize slice and subslice load balancing behavior."
for the rationale.
Reviewed-by: Kenneth Graunke
---
src/gallium/drivers/iris/iris_blorp.c | 6 ++
src/gallium/drivers/iris/iris_context.c | 1 +
src/gallium/drivers/iris/iris_context.h | 3 +
See "i965/gen9: Optimize slice and subslice load balancing behavior."
for the rationale. Marked optional because no performance evaluation
has been done on this commit, it is provided to match the hashing
settings of the Iris driver. Test reports welcome.
---
src/intel/vulkan/anv_genX.h
The default pixel hashing mode settings used for slice and subslice
load balancing are far from optimal under certain conditions (see the
comments below for the gory details). The top-of-the-line GT4 parts
suffer from a particularly severe performance problem currently due to
a subslice load
Reviewed-by: Kenneth Graunke
---
src/intel/genxml/gen9.xml | 17 +
1 file changed, 17 insertions(+)
diff --git a/src/intel/genxml/gen9.xml b/src/intel/genxml/gen9.xml
index 9df7cd82738..0d037489df9 100644
--- a/src/intel/genxml/gen9.xml
+++ b/src/intel/genxml/gen9.xml
@@ -6477,6
Specifically the optimization of a conditional BREAK + WHILE sequence
into a conditional WHILE seems pretty broken. The list of successors
of "earlier_block" (where the conditional BREAK was found) is emptied
and then re-created with the same edges for no apparent reason. On
top of that the list
Karol Herbst writes:
> From: Pierre Moreau
>
> v2 (Karol Herbst):
> silence warnings about unhandled enum values
> ---
> .../clover/spirv/invocation.cpp | 598 ++
> .../clover/spirv/invocation.hpp | 12 +
> 2 files changed, 610 insertions(+)
>
>
> +   // SPIR-V header.
> +   if (length < 20u)
> +      return false;
> +
> +   const uint32_t first_word = binary[0u];
> +   return (first_word == SpvMagicNumber) ||
> +          (util_bswap32(first_word) == SpvMagicNumber);
> +}
> +
This function seems like dead code.
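For readers without the full patch at hand, the check quoted above can be reproduced as a minimal standalone sketch. The names looks_like_spirv and bswap32 are hypothetical stand-ins rather than clover or Mesa API; the only facts assumed are the SPIR-V magic number 0x07230203 and the 20-byte (five-word) header length that the quoted code tests against.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* SPIR-V magic number, as defined by the SPIR-V specification. */
#define SPIRV_MAGIC 0x07230203u

/* Portable 32-bit byte swap (hypothetical stand-in for util_bswap32). */
uint32_t bswap32(uint32_t v)
{
   return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
          ((v << 8) & 0x00ff0000u) | (v << 24);
}

/* Hypothetical helper: returns true if the buffer looks like a SPIR-V
 * module in either byte order.  length_bytes is the buffer size in bytes. */
bool looks_like_spirv(const uint32_t *words, size_t length_bytes)
{
   /* The SPIR-V header is five 32-bit words, i.e. 20 bytes. */
   if (length_bytes < 20u)
      return false;

   const uint32_t first_word = words[0];
   return first_word == SPIRV_MAGIC || bswap32(first_word) == SPIRV_MAGIC;
}
```

Accepting the byte-swapped magic lets a consumer recognise a module produced on a machine of the opposite endianness before deciding whether to convert it.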
nt saying where to find llvm-spirv (Karol Herbst).
> * v3:
> - make SPIRV-Tools and llvm-spirv optional (Francisco Jerez);
> - bump requirement for llvm-spirv to version 0.2
> * v2:
> - Bump the required version of SPIRV-Tools to the latest release;
> - Add a dependency on
Karol Herbst writes:
> We want to use it for other formats as well, so give it a more generic name
>
> Signed-off-by: Karol Herbst
> Reviewed-by: Francisco Jerez
> ---
> src/gallium/drivers/r600/evergreen_compute.c | 2 +-
> src/gallium/drivers/
Francisco Jerez writes:
> Because the "low" temporary needs to be accessed with word type and
> twice the original stride, attempting to preserve the alignment of the
> original destination can potentially lead to instructions with illegal
> destination stride greater than
mption that all operands are assumed to be packed, so we
> + * understand that this might be hinting that there may be an exception
> + * for f32 operands with a vstride of 0, so we don't validate this for
> + * them while we don't have empirical evidence t
Jason Ekstrand writes:
> Quick status check. Mesa 19.1 is supposed to branch in two weeks. Are we
> about ready to land this?
>
Seems pretty close to ready to me...
> On Mon, Mar 25, 2019 at 11:13 AM Juan A. Suarez Romero
> wrote:
>
>> On Fri, 2019-03-22 at 17:53 +0100, Iago Toral wrote:
"Juan A. Suarez Romero" writes:
> On Wed, 2019-04-10 at 17:13 -0700, Francisco Jerez wrote:
>> "Juan A. Suarez Romero" writes:
>>
>> > From: Iago Toral Quiroga
>> >
>> > v2:
>> > - Adapted unit tests to make them c
"Juan A. Suarez Romero" writes:
> From: Iago Toral Quiroga
>
> v2:
> - Adapted unit tests to make them consistent with the changes done
>   to the validation of half-float conversions.
>
> v3 (Curro):
> - Check all the accumulators
> - Constify declarations
> - Do not check src1 type in
ions.
> - Check restriction on src1.
> - Remove invalid test.
Reviewed-by: Francisco Jerez
> ---
> src/intel/compiler/brw_eu_validate.c| 155 +++-
> src/intel/compiler/test_eu_validate.cpp | 116 ++
> 2 files changed, 270 insertions(+),
Iago Toral writes:
> On Mon, 2019-04-08 at 12:00 -0700, Francisco Jerez wrote:
>> "Juan A. Suarez Romero" writes:
>>
>> > From: Iago Toral Quiroga
>> >
>> > v2: f32to16/f16to32 can use a :W destination (Curr
"Juan A. Suarez Romero" writes:
> From: Iago Toral Quiroga
>
> v2: f32to16/f16to32 can use a :W destination (Curro)
> ---
> src/intel/compiler/brw_fs.cpp | 71 +++
> 1 file changed, 71 insertions(+)
>
> diff --git a/src/intel/compiler/brw_fs.cpp
"Juan A. Suarez Romero" writes:
> On Wed, 2019-03-27 at 19:37 -0700, Francisco Jerez wrote:
>> "Juan A. Suarez Romero" writes:
>>
>> > From: Iago Toral Quiroga
>> >
>> > v2:
>> > - Adapted unit tests to make them c
"Juan A. Suarez Romero" writes:
> From: Iago Toral Quiroga
>
> v2:
> - Adapted unit tests to make them consistent with the changes done
>   to the validation of half-float conversions.
> ---
> src/intel/compiler/brw_eu_validate.c| 256 ++
> src/intel/compiler/test_eu_validate.cpp
"Juan A. Suarez Romero" writes:
> From: Iago Toral Quiroga
>
> v2:
> - Consider implicit conversions in 2-src instructions too (Curro)
> - For restrictions that involve destination stride requirements
>   only validate them for Align1, since Align16 always requires
>   packed data.
> -
"Juan A. Suarez Romero" writes:
> From: Iago Toral Quiroga
>
> ---
> src/intel/compiler/brw_fs.cpp | 65 +++
> 1 file changed, 65 insertions(+)
>
> diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp
> index 2fc7793709b..3616a7afc31 100644
Iago Toral writes:
> On Tue, 2019-03-12 at 15:44 -0700, Francisco Jerez wrote:
>> Iago Toral writes:
>>
>> > On Tue, 2019-03-05 at 07:35 +0100, Iago Toral wrote:
>> > > On Mon, 2019-03-04 at 15:36 -0800, Francisco Jerez wrote:
>> > > > Iago
Francisco Jerez writes:
> Iago Toral writes:
>
>> On Tue, 2019-03-05 at 07:35 +0100, Iago Toral wrote:
>>> On Mon, 2019-03-04 at 15:36 -0800, Francisco Jerez wrote:
>>> > Iago Toral writes:
>>> >
>>> > > On Fri, 2019-03-01 at 19:0
Iago Toral writes:
> On Wed, 2019-03-06 at 09:21 +0100, Iago Toral wrote:
>> On Tue, 2019-03-05 at 07:35 +0100, Iago Toral wrote:
>> > On Mon, 2019-03-04 at 15:36 -0800, Francisco Jerez wrote:
>> > > Iago Toral writes:
>> > >
>> > > >
Iago Toral writes:
> On Tue, 2019-03-05 at 07:35 +0100, Iago Toral wrote:
>> On Mon, 2019-03-04 at 15:36 -0800, Francisco Jerez wrote:
>> > Iago Toral writes:
>> >
>> > > On Fri, 2019-03-01 at 19:04 -0800, Francisco Jerez wrote:
>> > > > Ia
Iago Toral writes:
> On Mon, 2019-03-04 at 15:36 -0800, Francisco Jerez wrote:
>> Iago Toral writes:
>>
>> > On Fri, 2019-03-01 at 19:04 -0800, Francisco Jerez wrote:
>> > > Iago Toral writes:
>> > >
>> > > > On Thu, 2019-02-
Iago Toral writes:
> On Fri, 2019-03-01 at 19:04 -0800, Francisco Jerez wrote:
>> Iago Toral writes:
>>
>> > On Thu, 2019-02-28 at 09:54 -0800, Francisco Jerez wrote:
>> > > Iago Toral writes:
>> > >
>> > > > On Wed, 2019-02-
Iago Toral writes:
> On Fri, 2019-03-01 at 09:39 +0100, Iago Toral wrote:
>> On Thu, 2019-02-28 at 09:54 -0800, Francisco Jerez wrote:
>> > Iago Toral writes:
>> >
>> > > On Wed, 2019-02-27 at 13:47 -0800, Francisco Jerez wrote:
>> > > > Ia
Iago Toral writes:
> On Thu, 2019-02-28 at 09:54 -0800, Francisco Jerez wrote:
>> Iago Toral writes:
>>
>> > On Wed, 2019-02-27 at 13:47 -0800, Francisco Jerez wrote:
>> > > Iago Toral writes:
>> > >
>> > > > On Tue, 2019-02-
Iago Toral writes:
> On Wed, 2019-02-27 at 13:47 -0800, Francisco Jerez wrote:
>> Iago Toral writes:
>>
>> > On Tue, 2019-02-26 at 14:54 -0800, Francisco Jerez wrote:
>> > > Iago Toral Quiroga writes:
>> > >
>> > > >
Iago Toral writes:
> On Wed, 2019-02-27 at 15:44 -0800, Francisco Jerez wrote:
>> Iago Toral Quiroga writes:
>>
>> > The section 'Execution Data Types' of 3D Media GPGPU volume, which
>> > describes execution types, is exactly the same in BDW and SKL+.
>
W_REGISTER_TYPE_F;
>
> - assert(src0_exec_type == BRW_REGISTER_TYPE_F);
> - return BRW_REGISTER_TYPE_F;
> + assert(src0_exec_type == BRW_REGISTER_TYPE_HF);
> + return BRW_REGISTER_TYPE_HF;
Not really convinced the function is fully correct, but it should be
strictly better with this patch:
Acked-by: Francisco Jerez
> }
>
> /**
> --
> 2.17.1
signature.asc
Description: PGP signature
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Iago Toral writes:
> On Tue, 2019-02-26 at 14:54 -0800, Francisco Jerez wrote:
>> Iago Toral Quiroga writes:
>>
>> > ---
>> > src/intel/compiler/brw_eu_validate.c| 64 -
>> > src/intel/compiler/test_eu_validate.cpp | 122
>>
Iago Toral Quiroga writes:
> ---
> src/intel/compiler/brw_eu_validate.c| 10 +-
> src/intel/compiler/test_eu_validate.cpp | 46 +
> 2 files changed, 55 insertions(+), 1 deletion(-)
>
> diff --git a/src/intel/compiler/brw_eu_validate.c
>
Iago Toral Quiroga writes:
> ---
> src/intel/compiler/brw_eu_validate.c| 64 -
> src/intel/compiler/test_eu_validate.cpp | 122
> 2 files changed, 185 insertions(+), 1 deletion(-)
>
> diff --git a/src/intel/compiler/brw_eu_validate.c
>
Iago Toral Quiroga writes:
> The section 'Execution Data Types' of 3D Media GPGPU volume, which
> describes execution types, is exactly the same in BDW and SKL+.
>
> Also, this section states that there is a single execution type, so it
> makes sense that this is the wider of the two floating
Jason Ekstrand writes:
> On Fri, Jan 18, 2019 at 6:09 PM Francisco Jerez
> wrote:
>
>> Currently the execution type calculation will return a bogus value in
>> cases like:
>>
>> mov_indirect(8) vgrf0:w, vgrf1:w, vgrf2:ud, 32u
>>
>> Which will be
Jason Ekstrand writes:
> On Fri, Jan 18, 2019 at 6:09 PM Francisco Jerez
> wrote:
>
>> Because the "low" temporary needs to be accessed with word type and
>> twice the original stride, attempting to preserve the alignment of the
>> original destination
Jason Ekstrand writes:
> On Fri, Jan 18, 2019 at 6:09 PM Francisco Jerez
> wrote:
>
>> This is required in combination with the following commit, because
>> otherwise if a source region with an extended 8+ stride is present in
>> the instruction (which we're about to
Jason Ekstrand writes:
> On Fri, Jan 18, 2019 at 6:09 PM Francisco Jerez
> wrote:
>
>> Strides up to 32B can be implemented for the source regions of most
>> instructions by leveraging either the vertical or the horizontal
>> stride of the hardware Align1 r
This fixes a rather astonishing problem that came up while debugging
an issue in the Vulkan CTS. Apparently the Vulkan CTS framework has
the tendency to create multiple VkDevices, each one with a separate
DRM device FD and therefore a disjoint GEM buffer object handle space.
Because the
Iago Toral writes:
> On Mon, 2019-02-04 at 08:50 +0100, Iago Toral wrote:
>> On Fri, 2019-02-01 at 11:23 -0800, Francisco Jerez wrote:
>> > Iago Toral writes:
>> >
>> > > On Fri, 2019-01-25 at 12:54 -0800, Francisco Jerez wrote:
>> > > > Ia
Iago Toral writes:
> On Fri, 2019-01-25 at 12:54 -0800, Francisco Jerez wrote:
>> Iago Toral writes:
>>
>> > On Thu, 2019-01-24 at 11:45 -0800, Francisco Jerez wrote:
>> > > Iago Toral writes:
>> > >
>> > > > On Wed, 2019-01-
Iago Toral writes:
> On Thu, 2019-01-24 at 11:45 -0800, Francisco Jerez wrote:
>> Iago Toral writes:
>>
>> > On Wed, 2019-01-23 at 06:03 -0800, Francisco Jerez wrote:
>> > > Iago Toral Quiroga writes:
>> > >
>> > > >
te.cpp) will be rejected.
Strictly by that rule this patch should be rejected ;). Maybe say
"functional changes" instead of "patches"? Other than that:
Reviewed-by: Francisco Jerez
> */
>
> #include "brw_eu.h"
> --
> 2.19.2
Iago Toral writes:
> On Wed, 2019-01-23 at 06:03 -0800, Francisco Jerez wrote:
>> Iago Toral Quiroga writes:
>>
>> > Commit c84ec70b3a72 implemented execution type promotion to 32-bit
>> > for
>> > conversions involving half-float registe
Iago Toral Quiroga writes:
> Commit c84ec70b3a72 implemented execution type promotion to 32-bit for
> conversions involving half-float registers, which empirical testing suggested
> was required, but it did not incorporate this change into the assembly
> validator logic. This commit adds
h_insn_state(p);
>
> +   /* The flag register is only used on Gen7 in align1 mode, so avoid setting
> +    * unnecessary bits in the instruction words, get the information we need
> +    * and reset the default flag register.
Maybe mention here that this also allows more instructio
Matt Turner writes:
> emit_uniformize() emits SHADER_OPCODE_FIND_LIVE_CHANNEL with its
> flag_subreg set, so that the IR knows which flag is accessed. However
> the flag is only used on Gen7 in Align1 mode.
>
> To avoid setting unnecessary bits in the instruction words, get the
> information we
Matt Turner writes:
> emit_uniformize() emits SHADER_OPCODE_FIND_LIVE_CHANNEL with its
> flag_subreg set, so that the IR knows which flag is accessed. However
> the flag is only used on Gen7 in Align1 mode, and it is used as an
> explicit source and destination.
>
> To avoid setting unnecessary
Currently the execution type calculation will return a bogus value in
cases like:
mov_indirect(8) vgrf0:w, vgrf1:w, vgrf2:ud, 32u
Which will be considered to have a 32-bit integer execution type even
though the actual indirect move operation will be carried out with
16-bit precision.
Even though the hardware spec claims that any "integer DWord multiply"
operation is affected by the regioning restrictions of CHV/BXT/GLK,
this is inconsistent with the behavior of the simulator and with
empirical evidence -- Return false from has_dst_aligned_region_restriction()
for such
Because the "low" temporary needs to be accessed with word type and
twice the original stride, attempting to preserve the alignment of the
original destination can potentially lead to instructions with illegal
destination stride greater than four. Because the CHV/BXT alignment
restrictions are
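The stride arithmetic behind this can be sketched in a few lines. This is only an illustration under stated assumptions: the helper names are hypothetical, and the limit of four elements is taken from the wording above ("illegal destination stride greater than four"), not from a check of the hardware documentation.

```c
#include <stdbool.h>

/* Assumed limit: largest legal destination stride in elements, taken
 * from the commit message wording rather than from the hardware spec. */
#define MAX_DST_STRIDE 4u

/* Stride, in 16-bit words, of a region whose stride was originally
 * expressed in 32-bit elements: each dword spans two words. */
unsigned word_stride_of(unsigned dword_stride)
{
   return 2u * dword_stride;
}

/* Hypothetical check: can the "low" temporary reuse the stride of the
 * original destination once it is accessed with word type? */
bool low_temp_stride_is_legal(unsigned dword_stride)
{
   return word_stride_of(dword_stride) <= MAX_DST_STRIDE;
}
```

Under these assumptions a destination with a dword stride of 1 or 2 stays legal (word stride 2 or 4), while a dword stride of 4 doubles to 8 and is rejected.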
This is required in combination with the following commit, because
otherwise if a source region with an extended 8+ stride is present in
the instruction (which we're about to declare legal) we'll end up
emitting code that attempts to write to such a region, even though
strides greater than four
Strides up to 32B can be implemented for the source regions of most
instructions by leveraging either the vertical or the horizontal
stride of the hardware Align1 region. The main motivation for this is
that currently the lower_integer_multiplication() pass will happily
double the stride of one
Pierre Moreau writes:
> One flag that needs to be tracked is whether a library is allowed to
> receive mathematics optimisations or not, as the authorisation is given
> when creating the library while the optimisations are specified when
> creating the executable.
>
> Reviewed-by: Aaron Watry
Pierre Moreau writes:
> Reviewed-by: Francisco Jerez
>
> Changes since:
> * v5:
> - Drop the `valid_devs` argument to `validate_build_common()`
> (Francisco Jerez)
> - Change `clLinkProgram()` to initialise `prog`’s devices prior to
> calling `validate
Jason Ekstrand writes:
> On Thu, Jan 17, 2019 at 3:34 PM Francisco Jerez
> wrote:
>
>> Jason Ekstrand writes:
>>
>> > Bah... previous e-mail unfinished. Please ignore.
>> >
>> > On Thu, Jan 17, 2019 at 4:15 AM Francisco Jere
>
> mov(8)  g9<1>UD    g5<8,4,2>UD  { align1 1Q };
> mul(8)  acc0<1>UD  g9<8,8,1>UD  0x0004UW  { align1 1Q };
> mach(8) g6<1>UD    g5<8,4,2>UD  0x0004UD  { align1 1Q AccWrEnable };
>
> Fixes: efa4e4bc5fc "int
Jason Ekstrand writes:
> Bah... previous e-mail unfinished. Please ignore.
>
> On Thu, Jan 17, 2019 at 4:15 AM Francisco Jerez
> wrote:
>
>> Jason Ekstrand writes:
>>
>> > The pass was discovered to cause problems with the MUL+MACH combinati
component.
> accumulator are handled for different instruction types.
>
> Fixes: efa4e4bc5fc "intel/fs: Introduce regioning lowering pass"
> Cc: Francisco Jerez
> ---
> src/intel/compiler/brw_fs_lower_regioning.cpp | 16 +++-
> 1 file changed,
The docs are fairly incomplete and inconsistent about it, but this
seems to be the reason why half-float destinations are required to be
DWORD-aligned on BDW+ projects. This way the regioning lowering pass
will make sure that the destination components of W to HF and HF to W
conversions are
Iago Toral Quiroga writes:
> v2: adapted to work with the new regioning lowering pass
>
> Reviewed-by: Topi Pohjolainen (v1)
> ---
> src/intel/compiler/brw_ir_fs.h | 33 ++---
> 1 file changed, 26 insertions(+), 7 deletions(-)
>
> diff --git
Iago Toral writes:
> On Mon, 2019-01-07 at 11:58 -0800, Francisco Jerez wrote:
>> Iago Toral writes:
>>
>> > On Sat, 2018-12-29 at 12:38 -0800, Francisco Jerez wrote:
>> > > This seems to be a problem in combination with the
>> > > lower_reg
Iago Toral writes:
> On Sat, 2018-12-29 at 12:39 -0800, Francisco Jerez wrote:
>> It's redundant with the functionality provided by lower_regioning
>> now.
>> ---
>> src/intel/Makefile.sources| 1 -
>> src/intel/compiler/brw_fs.cpp
Iago Toral writes:
> On Sat, 2019-01-05 at 14:03 -0800, Francisco Jerez wrote:
>> This legalization pass is meant to handle situations where the source
>> or destination regioning controls of an instruction are unsupported
>> by
>> the hardware and need to be
Iago Toral writes:
> On Sat, 2018-12-29 at 12:38 -0800, Francisco Jerez wrote:
>> Currently the visitor attempts to enforce the regioning restrictions
>> that apply to double-precision instructions on CHV/BXT at NIR-to-i965
>> translation time. It is possible though for
Iago Toral writes:
> On Sat, 2018-12-29 at 12:38 -0800, Francisco Jerez wrote:
>> This seems to be a problem in combination with the lower_regioning
>> pass introduced by a future commit, which can modify a SIMD-split
>> instruction causing its execution size to
Iago Toral writes:
> On Sat, 2018-12-29 at 12:38 -0800, Francisco Jerez wrote:
>> Align16 is no longer a thing, so a new implementation is provided
>> using Align1 instead. Not all possible swizzles can be represented
>> as
>> a single Align1 region, but s
This legalization pass is meant to handle situations where the source
or destination regioning controls of an instruction are unsupported by
the hardware and need to be lowered away into separate instructions.
This should be more reliable and future-proof than the current
approach of handling
Francisco Jerez writes:
> Iago Toral writes:
>
>> On Sat, 2018-12-29 at 12:39 -0800, Francisco Jerez wrote:
>>> This legalization pass is meant to handle situations where the source
>>> or destination regioning controls of an instruction are unsupported
>
Iago Toral writes:
> On Wed, 2019-01-02 at 15:00 -0800, Francisco Jerez wrote:
>> Iago Toral Quiroga writes:
>>
>> > There are hardware restrictions to consider that seem to affect
>> > atom platforms
>> > only.
>>
>> Same comment h
Iago Toral writes:
> On Sat, 2018-12-29 at 12:39 -0800, Francisco Jerez wrote:
>> This legalization pass is meant to handle situations where the source
>> or destination regioning controls of an instruction are unsupported
>> by
>> the hardware and need to be
Iago Toral Quiroga writes:
> There are hardware restrictions to consider that seem to affect atom platforms
> only.
Same comment here as for PATCH 13 of this series. This and PATCH 40
shouldn't be necessary anymore with [1] in place. Please drop them.
[1]
This patch is redundant with the regioning lowering pass I sent a few
days ago [1]. The problem with this approach is that on the one hand
it's easy for the back-end compiler to cause code which was legalized at
NIR translation time to become illegal again accidentally, on the other
hand there's
Align16 is no longer a thing, so a new implementation is provided
using Align1 instead. Not all possible swizzles can be represented as
a single Align1 region, but some fast paths are provided for
frequently used swizzles that can be represented efficiently in Align1
mode.
Fixes ~90 subgroup
These are a number of fixes and clean-ups we've been carrying around
for a while in an internal branch. Most of the fixes are required for
conformance of a future platform, but due to their nature some of them
are likely to affect shipping platforms as well -- Especially the
issues addressed by
This legalization pass is meant to handle situations where the source
or destination regioning controls of an instruction are unsupported by
the hardware and need to be lowered away into separate instructions.
This should be more reliable and future-proof than the current
approach of handling
This seems to be a problem in combination with the lower_regioning
pass introduced by a future commit, which can modify a SIMD-split
instruction causing its execution size to become illegal again. A
subsequent call to lower_simd_width() would hit this bug on a future
platform.
Cc:
I triggered this bug while prototyping code for a future platform on
IVB. Could be a problem today though if a strided move is
copy-propagated into a type-converting move with DF destination.
Cc: mesa-sta...@lists.freedesktop.org
---
src/intel/compiler/brw_eu_emit.c | 11 ---
1 file
Currently the visitor attempts to enforce the regioning restrictions
that apply to double-precision instructions on CHV/BXT at NIR-to-i965
translation time. It is possible though for the copy propagation pass
to violate this restriction if a strided move is propagated into one
of the affected
---
src/intel/compiler/brw_fs_builder.h | 68 +-
src/intel/compiler/brw_fs_nir.cpp | 89 +++--
2 files changed, 12 insertions(+), 145 deletions(-)
diff --git a/src/intel/compiler/brw_fs_builder.h
b/src/intel/compiler/brw_fs_builder.h
index
lower_integer_multiplication() implements 32x32-bit multiplication on
some platforms by bit-casting one of the 32-bit sources into two
16-bit unsigned integer portions. This can give incorrect results if
the original instruction specified a source modifier. Fix it by
emitting an additional MOV
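A rough model of the problem in plain C, assuming the source modifier in question is a negate and that the failure comes from the modifier being applied to each 16-bit half separately. Both are assumptions made for illustration, as are all function names; the commit message itself only states that a source modifier can give incorrect results.

```c
#include <stdint.h>

/* Low 32 bits of a * b, computed from the two unsigned 16-bit halves
 * of b, as a 32x32 multiply built from narrower multiplies would. */
uint32_t mul32_via_halves(uint32_t a, uint32_t b)
{
   const uint32_t b_lo = b & 0xffffu;
   const uint32_t b_hi = b >> 16;
   return a * b_lo + ((a * b_hi) << 16);
}

/* Illustration of the bug (assumed mechanism): the negate modifier is
 * applied to each 16-bit half on its own, which is not the same as
 * negating the 32-bit value. */
uint32_t mul32_negate_halves_wrong(uint32_t a, uint32_t b)
{
   const uint32_t b_lo = (uint16_t)(0u - (b & 0xffffu));
   const uint32_t b_hi = (uint16_t)(0u - (b >> 16));
   return a * b_lo + ((a * b_hi) << 16);
}

/* Fix in the spirit of the extra MOV: resolve the modifier into a
 * plain 32-bit value first, then split that value into halves. */
uint32_t mul32_negate_resolved(uint32_t a, uint32_t b)
{
   const uint32_t negated = 0u - b; /* plays the role of the MOV */
   return mul32_via_halves(a, negated);
}
```

For a = 3 and b = 1 the resolved version returns 0xfffffffd, which is 3 * -1 modulo 2^32, while the per-half version returns 0x0002fffd.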
These are broken on a future platform, but it turns out we don't need
to fix them, since they're just type-converting moves with strided
source. Kill them.
---
src/intel/compiler/brw_eu_defines.h | 2 --
src/intel/compiler/brw_fs.cpp | 2 --
src/intel/compiler/brw_fs.h
---
src/intel/compiler/brw_fs.cpp | 2 +-
src/intel/compiler/brw_ir_fs.h | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp
index 4aacc72a1b7..889509badab 100644
--- a/src/intel/compiler/brw_fs.cpp
+++
It's redundant with the functionality provided by lower_regioning now.
---
src/intel/Makefile.sources| 1 -
src/intel/compiler/brw_fs.cpp | 1 -
src/intel/compiler/brw_fs.h | 1 -
.../compiler/brw_fs_lower_conversions.cpp | 132
Jason Ekstrand writes:
> ---
> src/intel/compiler/brw_shader.cpp | 12
> 1 file changed, 12 insertions(+)
>
> diff --git a/src/intel/compiler/brw_shader.cpp
> b/src/intel/compiler/brw_shader.cpp
> index 34b8f3acf93..5cb91e0dce9 100644
> --- a/src/intel/compiler/brw_shader.cpp
>
Anuj Phogat writes:
> The L3 allocation table in the h/w specification recommends using 4 KB
> granularity for programming allocation fields in L3CNTLREG.
>
> Signed-off-by: Anuj Phogat
> Cc: Kenneth Graunke
> Cc: Francisco Jerez
> Cc: Lionel Landwerlin
Reviewed-by: Francisco
Anuj Phogat writes:
> Signed-off-by: Anuj Phogat
> Cc: Kenneth Graunke
> Cc: Francisco Jerez
> Cc: Lionel Landwerlin
Reviewed-by: Francisco Jerez
> ---
> src/intel/common/gen_l3_config.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> di
Anuj Phogat writes:
> The L3 allocation table in the h/w specification recommends using 4 KB
> granularity for programming allocation fields in L3CNTLREG.
>
> Signed-off-by: Anuj Phogat
> Cc: Kenneth Graunke
> Cc: Francisco Jerez
> Cc: Lionel Landwerlin
> ---
>
Anuj Phogat writes:
> Use L3 configuration specified in h/w specification.
>
> V2: Drop configs which under-allocate the L3 cache.
> Bump up the comment above table.
>
> Signed-off-by: Anuj Phogat
> Cc: Kenneth Graunke
> Cc: Francisco Jerez
Reviewed-by: Franci
Anuj Phogat writes:
> On Fri, Nov 16, 2018 at 6:21 AM Eero Tamminen
> wrote:
>>
>> Hi,
>>
>> On 16.11.2018 10.33, Francisco Jerez wrote:
>> > Kenneth Graunke writes:
>> [...]
>> >> Perhaps we'll get both configs working, and then will
Kenneth Graunke writes:
> On Thursday, November 15, 2018 11:16:09 PM PST Francisco Jerez wrote:
>> Kenneth Graunke writes:
>>
>> > On Thursday, November 15, 2018 5:51:18 PM PST Francisco Jerez wrote:
>> >> Anuj Phogat writes:
>> >>
>
Kenneth Graunke writes:
> On Thursday, November 15, 2018 5:51:18 PM PST Francisco Jerez wrote:
>> Anuj Phogat writes:
>>
>> > Use L3 configuration table specified in h/w specification.
>> >
>> > Signed-off-by: Anuj Phogat
>> > Cc: Kenneth
Anuj Phogat writes:
> Use L3 configuration table specified in h/w specification.
>
> Signed-off-by: Anuj Phogat
> Cc: Kenneth Graunke
> Cc: Francisco Jerez
> Cc: Lionel Landwerlin
> ---
> src/intel/common/gen_l3_config.c | 16 ++--
> 1 file changed, 10