Re: [Mesa-dev] [PATCH] i965/vec4: use a temp register to compute offsets for pull loads

2018-02-20 Thread Iago Toral
Yes, I agree, thanks for bringing it up.

Iago

On Tue, 2018-02-20 at 16:38 +0200, Andres Gomez wrote:
> Iago, this looks like a good candidate to nominate for inclusion in
> the
> 17.3 stable queue.
> 
> What do you think?
> 
> On Wed, 2017-11-29 at 11:49 +0100, Iago Toral Quiroga wrote:
> > 64-bit pull loads are implemented by emitting 2 separate
> > 32-bit pull load messages, where the second message loads from
> > an offset at +16B.
> > 
> > That addition of 16B to the original offset should not alter the
> > original offset register used as source for the pull load
> > instruction
> > though, since the compiler might use that same offset register in
> > other
> > instructions (for example, for other pull loads in the shader code
> > that take that same offset as reference).
> > 
> > If the pull load is 32-bit then we only need to emit one message
> > and
> > we don't need to do offset calculations, but in that case the
> > optimizer
> > should be able to drop the redundant MOV.
> > 
> > Fixes the following test on Haswell:
> > KHR-GL45.gpu_shader_fp64.fp64.max_uniform_components
> > ---
> >  src/intel/compiler/brw_vec4_nir.cpp | 4 +++-
> >  1 file changed, 3 insertions(+), 1 deletion(-)
> > 
> > diff --git a/src/intel/compiler/brw_vec4_nir.cpp
> > b/src/intel/compiler/brw_vec4_nir.cpp
> > index 0a1caa9fad..84f5b37a9d 100644
> > --- a/src/intel/compiler/brw_vec4_nir.cpp
> > +++ b/src/intel/compiler/brw_vec4_nir.cpp
> > @@ -888,7 +888,9 @@
> > vec4_visitor::nir_emit_intrinsic(nir_intrinsic_instr *instr)
> >if (const_offset) {
> >   offset_reg = brw_imm_ud(const_offset->u32[0] & ~15);
> >} else {
> > - offset_reg = get_nir_src(instr->src[1], nir_type_uint32,
> > 1);
> > + offset_reg = src_reg(this, glsl_type::uint_type);
> > + emit(MOV(dst_reg(offset_reg),
> > +  get_nir_src(instr->src[1], nir_type_uint32,
> > 1)));
> >}
> >  
> >src_reg packed_consts;
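
For readers following along without the backend source, the hazard described in the commit message can be sketched in plain C. This is only an illustration under simplifying assumptions: `reg_t`, `load_pair_t`, `pull_load_64_in_place()` and `pull_load_64_with_temp()` are hypothetical names, not i965 code, and a register is modeled as a single integer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a virtual register holding a byte offset. */
typedef struct { uint32_t value; } reg_t;

/* The two offsets that a 64-bit pull load would read from. */
typedef struct { uint32_t first; uint32_t second; } load_pair_t;

/* Buggy shape: the +16B for the second message is added to the caller's
 * register, so later users of that register see offset + 16. */
static load_pair_t pull_load_64_in_place(reg_t *offset)
{
   load_pair_t p;
   assert(offset != NULL);
   p.first = offset->value;
   offset->value += 16;
   p.second = offset->value;
   return p;
}

/* Fixed shape: copy the offset into a temporary first (the MOV in the
 * patch) and do the arithmetic on the copy only. */
static load_pair_t pull_load_64_with_temp(const reg_t *offset)
{
   load_pair_t p;
   reg_t tmp;
   assert(offset != NULL);
   tmp = *offset;
   p.first = tmp.value;
   tmp.value += 16;
   p.second = tmp.value;
   return p;
}
```

Both variants load from the same pair of offsets. The difference is what the caller's register holds afterwards: the in-place variant leaves it advanced by 16, the temp variant leaves it untouched.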
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] Update: Vulkan modifiers extension VK_EXT_image_drm_format_modifier

2018-02-20 Thread Chad Versace
As many of you know, I've been writing a Vulkan extension for DRM format
modifiers, named VK_EXT_image_drm_format_modifier.

The extension is very close to completion. I've submitted a branch to
Khronos for review. It's receiving active review inside Khronos from
some non-Mesa closed-source window-system-integration people, and Mesa
people too (namely jekstrand).

You should take a look at the spec if you care about modifiers and Vulkan.
I try to keep up-to-date urls to everything related to this extension at
.

There remain only two unresolved issues from my perspective:

- The exact definition of members of the array
  VkImageDrmFormatModifierExplicitCreateInfoEXT::pPlaneLayouts.
  Should the extension re-use VkSubresourceLayout as the array
  members? Or should it define a new struct with fewer fields than
  VkSubresourceLayout?

- If an image has a modifier that requires an extra plane (such as
  a color-compression plane), should the extension allow such an
  image to be disjoint? Specifically, if a modifier requires an
  extra plane, should the extension allow the modifier's
  drmFormatModifierTilingFeatures to contain
  VK_FORMAT_FEATURE_DISJOINT_KHR?

  I've tentatively concluded "no": images with extra planes must be
  non-disjoint. Though we could lift this restriction in a future
  extension.

Branches
========
I maintain a public branch of the Vulkan spec, branch
1.0-VK_EXT_image_drm_format_modifier, which is synchronized with the
Khronos-internal branch of the same name. I like cgit; other people like
Github; so I keep a mirror at both.

cgit: 
http://git.kiwitree.net/cgit/~chadv/vulkan-spec/log?h=1.0-VK_EXT_image_drm_format_modifier
github: 
https://github.com/chadversary/vulkan-spec/commits/1.0-VK_EXT_image_drm_format_modifier
khronos-internal: 
https://gitlab.khronos.org/vulkan/vulkan/merge_requests/2555


Prebuilt Specs
==============
I maintain a public build of the branch. The built headers and HTML
specification are synchronized with the git branch thanks to the magic
of shell scripts.

vulkan.h: 
http://git.kiwitree.net/cgit/~chadv/vulkan-spec/tree/src/vulkan/vulkan.h?h=1.0-VK_EXT_image_drm_format_modifier
spec appendix: 
http://kiwitree.net/~chadv/vulkan/1.0-VK_EXT_image_drm_format_modifier/html/vkspec.html#VK_EXT_image_drm_format_modifier
full spec: 
http://kiwitree.net/~chadv/vulkan/1.0-VK_EXT_image_drm_format_modifier/html/vkspec.html


Where to start reading
======================
Here's a short reading guide for people unfamiliar with the extension:

- Don't start with the specification. You'll quickly get lost.

- First, read the VK_EXT_external_memory_dma_buf and
  VK_EXT_queue_family_foreign extensions. They're small extensions.
  They're intended to be used with VK_EXT_image_drm_format_modifier.
  Despite that intent, the three extensions are independent from
  the Vulkan specification's perspective.

- Read the appendix chapter for VK_EXT_image_drm_format_modifier.
  I've written an "introduction to modifiers" section in the
  appendix. If you already intimately understand modifiers, then
  you can briefly scan this section, skipping over the boring stuff.

- Read the issue section in the appendix.

- Now for the tofu. Study the new structs and functions. Find them
  under `#define VK_EXT_image_drm_format_modifier` in vulkan.h.
  Also, study the new enum values. Find them by grepping 'DRM.*EXT'
  in vulkan.h.

- Finally, go read the specification text for the new structs,
  functions, and enums.


How to send feedback
====================
I honestly don't know. You _could_ comment on the merge request
in the Khronos-internal Gitlab. But you probably (and rightfully so)
want to keep the discussion public.

You could provide general feedback by replying to this thread.

You could leave comments on my Github branch. I don't like Github, but
I can't think of a better solution, other than...

I could send my specification patches to mesa-dev. If people want that,
say so.

So... yeah. I don't know how you should provide feedback. Just do it,
and we'll iron out any problems as they arise.


Re: [Mesa-dev] [PATCH v3] anv/blorp: multisample resolve all attachment layers

2018-02-20 Thread Iago Toral
Hi Nanley,

thanks for having a look at this, you're right that we should use the
framebuffer dimensions to decide on the number of layers to resolve. 

I'll send a new version with the fix.

Iago
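
The layer-count rule discussed in this thread can be sketched as a small stand-alone C helper. It is not the anv code: `view_layers`, `layer_pair` and `plan_layer_resolves()` are hypothetical names, and the helper only computes which (source, destination) layer pairs would be resolved when the count comes from the framebuffer rather than from the source view.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified view of an image view's layer range. */
struct view_layers {
   uint32_t base_array_layer;
   uint32_t array_len;   /* layers available starting at base_array_layer */
};

/* One planned per-layer resolve. */
struct layer_pair {
   uint32_t src_layer;
   uint32_t dst_layer;
};

/* Plan one resolve per framebuffer layer. Both views only need to cover
 * at least fb_layers layers; they do not need matching array lengths.
 * Returns the number of entries written to out. */
static uint32_t plan_layer_resolves(const struct view_layers *src,
                                    const struct view_layers *dst,
                                    uint32_t fb_layers,
                                    struct layer_pair *out,
                                    uint32_t out_capacity)
{
   assert(fb_layers <= src->array_len);
   assert(fb_layers <= dst->array_len);
   assert(fb_layers <= out_capacity);

   for (uint32_t i = 0; i < fb_layers; i++) {
      out[i].src_layer = src->base_array_layer + i;
      out[i].dst_layer = dst->base_array_layer + i;
   }
   return fb_layers;
}
```

With a six-layer source view, a single-layer destination view and a non-arrayed framebuffer (one layer), this plans exactly one resolve. An assertion comparing the source array length against the destination would reject that case even though it is valid.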

On Tue, 2018-02-20 at 15:18 -0800, Nanley Chery wrote:
> On Thu, Feb 15, 2018 at 09:40:16AM +0100, Iago Toral Quiroga wrote:
> > We were only resolving the first.
> > 
> > v2:
> >   - Do not require that the number of layers on dst and src are an
> > exact match, it is okay if the dst has more layers so long as
> > it has at least the same that we are going to resolve.
> >   - Do not always resolve array_len layers, we should resolve
> > only from base_array_layer to array_len.
> > 
> > v3:
> >   - v2 was assuming that array_len represented the total number of
> > layers in the image, but it represents the number of layers
> > starting at the base array layer.
> > 
> > Fixes new CTS tests for multisampled layered rendering:
> > dEQP-VK.renderpass.multisample_resolve.layers_*
> > ---
> >  src/intel/vulkan/anv_blorp.c | 30 +++---
> >  1 file changed, 19 insertions(+), 11 deletions(-)
> > 
> > diff --git a/src/intel/vulkan/anv_blorp.c
> > b/src/intel/vulkan/anv_blorp.c
> > index d38b343671..df566773a4 100644
> > --- a/src/intel/vulkan/anv_blorp.c
> > +++ b/src/intel/vulkan/anv_blorp.c
> > @@ -1543,25 +1543,33 @@ anv_cmd_buffer_resolve_subpass(struct
> > anv_cmd_buffer *cmd_buffer)
> >   get_blorp_surf_for_anv_image(cmd_buffer->device,
> > dst_iview->image,
> >VK_IMAGE_ASPECT_COLOR_BIT,
> >dst_aux_usage, _surf);
> > +
> > + uint32_t base_src_layer = src_iview-
> > >planes[0].isl.base_array_layer;
> > + uint32_t base_dst_layer = dst_iview-
> > >planes[0].isl.base_array_layer;
> > + uint32_t num_layers = src_iview->planes[0].isl.array_len;
> 
> num_layers should be equal to fb->layers. As seen in the definition
> of
> renderArea in the Vulkan spec, resolve operations are limited to the
> renderArea, which extends to all layers of the framebuffer.
> 
>renderArea is the render area that is affected by the render pass
>instance. The effects of attachment load, store and multisample
> resolve
>operations are restricted to the pixels whose x and y coordinates
> fall
>within the render area on all attachments. The render area extends
> to
>all layers of framebuffer.
> 
> > + assert(num_layers <= dst_iview->planes[0].isl.array_len);
> > +
> 
> This assertion is false. The spec allows having an arrayed
> multisample
> source image view and a non-arrayed single-sampled destination image
> view as long as the framebuffer is non-arrayed.
> 
>Each element of pAttachments must have dimensions at least as
> large as
>the corresponding framebuffer dimension
> 
> -Nanley
> 
> >   anv_cmd_buffer_mark_image_written(cmd_buffer, dst_iview-
> > >image,
> > VK_IMAGE_ASPECT_COLOR_B
> > IT,
> > dst_surf.aux_usage,
> > dst_iview-
> > >planes[0].isl.base_level,
> > -   dst_iview-
> > >planes[0].isl.base_array_layer, 1);
> > +   base_dst_layer,
> > num_layers);
> >  
> >   assert(!src_iview->image->format->can_ycbcr);
> >   assert(!dst_iview->image->format->can_ycbcr);
> >  
> > - resolve_surface(,
> > - _surf,
> > - src_iview->planes[0].isl.base_level,
> > - src_iview-
> > >planes[0].isl.base_array_layer,
> > - _surf,
> > - dst_iview->planes[0].isl.base_level,
> > - dst_iview-
> > >planes[0].isl.base_array_layer,
> > - render_area.offset.x,
> > render_area.offset.y,
> > - render_area.offset.x,
> > render_area.offset.y,
> > - render_area.extent.width,
> > render_area.extent.height);
> > + for (uint32_t i = 0; i < num_layers; i++) {
> > +resolve_surface(,
> > +_surf,
> > +src_iview->planes[0].isl.base_level,
> > +base_src_layer + i,
> > +_surf,
> > +dst_iview->planes[0].isl.base_level,
> > +base_dst_layer + i,
> > +render_area.offset.x,
> > render_area.offset.y,
> > +render_area.offset.x,
> > render_area.offset.y,
> > +render_area.extent.width,
> > render_area.extent.height);
> > + }
> >}
> >  
> >blorp_batch_finish();
> > -- 
> > 2.14.1
> > 

Re: [Mesa-dev] Allocator Nouveau driver, Mesa EXT_external_objects, and DRM metadata import interfaces

2018-02-20 Thread Chad Versace
On Thu 21 Dec 2017, Daniel Vetter wrote:
> On Thu, Dec 21, 2017 at 12:22 AM, Kristian Kristensen  
> wrote:
>> On Wed, Dec 20, 2017 at 12:41 PM, Miguel Angel Vico  
>> wrote:
>>> On Wed, 20 Dec 2017 11:54:10 -0800 Kristian Høgsberg  
>>> wrote:
>>>> I'd like to see concrete examples of actual display controllers
>>>> supporting more format layouts than what can be specified with a 64
>>>> bit modifier.
>>>
>>> The main problem is our tiling and other metadata parameters can't
>>> generally fit in a modifier, so we find passing a blob of metadata a
>>> more suitable mechanism.
>>
>> I understand that you may have n knobs with a total of more than
>> 56 bits that configure your tiling/swizzling for color buffers. What I don't
>> buy is that you need all those combinations when passing buffers around
>> between codecs, cameras and display controllers. Even if you're sharing
>> between the same 3D drivers in different processes, I expect just locking
>> down, say, 64 different combinations (you can add more over time) and
>> assigning each a modifier would be sufficient. I doubt you'd extract
>> meaningful performance gains from going all the way to a blob.

I agree with Kristian above. In my opinion, choosing to encode in
modifiers a precise description of every possible tiling/compression
layout is not technically incorrect, but I believe it misses the point.
The intention behind modifiers is not to exhaustively describe all
possibilities.

I summarized this opinion in VK_EXT_image_drm_format_modifier,
where I wrote an "introduction to modifiers" section. Here's an excerpt:

One goal of modifiers in the Linux ecosystem is to enumerate for each
vendor a reasonably sized set of tiling formats that are appropriate for
images shared across processes, APIs, and/or devices, where each
participating component may possibly be from different vendors.
A non-goal is to enumerate all tiling formats supported by all vendors.
Some tiling formats used internally by vendors are inappropriate for
sharing; no modifiers should be assigned to such tiling formats.

> Tegra just redesigned its modifier space from an ungodly amount of
> bits to just a few layouts. Not even just the ones in use, but simply
> limiting to the ones that make sense (there's dependencies apparently).
> Also note that the modifier alone doesn't need to describe the layout
> precisely, it only makes sense together with a specific pixel format
> and size. E.g. a bunch of the i915 layouts change layout depending
> upon bpp.
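
To make the "small enumerated set" idea concrete, here is a minimal C sketch of packing a vendor id and a layout number into a 64-bit modifier, with the vendor in the top 8 bits and the remaining 56 bits left for the vendor-defined value. The vendor id `EXAMPLE_VENDOR` and the `example_layout` list are made up for illustration; they are not real modifier definitions.

```c
#include <assert.h>
#include <stdint.h>

#define MOD_VENDOR_SHIFT 56
#define MOD_VALUE_MASK   0x00ffffffffffffffULL

/* Hypothetical vendor id, for illustration only. */
#define EXAMPLE_VENDOR 0x7fu

/* Hypothetical short list of layouts considered worth sharing. */
enum example_layout {
   EXAMPLE_LAYOUT_LINEAR_ALIGNED = 1,
   EXAMPLE_LAYOUT_TILED_SMALL    = 2,
   EXAMPLE_LAYOUT_TILED_LARGE    = 3,
   EXAMPLE_LAYOUT_COMPRESSED     = 4
};

/* Pack vendor (top 8 bits) and value (low 56 bits) into one modifier. */
static uint64_t make_modifier(uint8_t vendor, uint64_t value)
{
   assert((value & ~MOD_VALUE_MASK) == 0);
   return ((uint64_t)vendor << MOD_VENDOR_SHIFT) | (value & MOD_VALUE_MASK);
}

static uint8_t modifier_vendor(uint64_t modifier)
{
   return (uint8_t)(modifier >> MOD_VENDOR_SHIFT);
}

static uint64_t modifier_value(uint64_t modifier)
{
   return modifier & MOD_VALUE_MASK;
}
```

The point of the sketch is that each shareable layout gets one small number, rather than every tiling knob getting its own bit field in the 56-bit value.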


Re: [Mesa-dev] [PATCH] nir: remove old assert

2018-02-20 Thread Ian Romanick
That makes sense.  I guess whoever changed that aspect didn't remove the
assert.  I only noticed it because I build with -Wextra, so it's not
surprising that nobody else noticed.

Reviewed-by: Ian Romanick 

On 02/20/2018 07:42 PM, Timothy Arceri wrote:
> This was originally intended to make sure the remap location
> was not -1. However the code has changed a lot since then,
> the location is now never set to -1 and we also handle
> components meaning this old assert has been doing comparisons
> with the pointer to the array of component data.
> 
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105183
> ---
>  src/compiler/nir/nir_linking_helpers.c | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/src/compiler/nir/nir_linking_helpers.c 
> b/src/compiler/nir/nir_linking_helpers.c
> index 6459c6a24d..2b0a2668a3 100644
> --- a/src/compiler/nir/nir_linking_helpers.c
> +++ b/src/compiler/nir/nir_linking_helpers.c
> @@ -283,7 +283,6 @@ remap_slots_and_components(struct exec_list *var_list, 
> gl_shader_stage stage,
>if (var->data.location >= VARYING_SLOT_VAR0 &&
>var->data.location - VARYING_SLOT_VAR0 < 32) {
>   assert(var->data.location - VARYING_SLOT_VAR0 < 32);
> - assert(remap[var->data.location - VARYING_SLOT_VAR0] >= 0);
>  
>   const struct glsl_type *type = var->type;
>   if (nir_is_per_vertex_io(var, stage)) {
> 
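
The failure mode behind the removed assert can be shown with a small stand-alone C sketch. The types here are hypothetical (`remap_entry` is not the NIR structure); the only assumption taken from the commit message is that each remap slot became an array of per-component data, so indexing by slot alone yields an address rather than a value.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-component remap entry; not the actual NIR type. */
struct remap_entry {
   uint8_t location;
   uint8_t component;
};

/* With a two-dimensional table, table[slot] is a whole row and decays to
 * a pointer. A check written as `table[slot] >= 0` therefore tests an
 * address. This helper makes the same mistake explicit: it looks only at
 * the row's address, which is never null for a valid table. */
static int row_address_check(struct remap_entry table[][4], unsigned slot)
{
   return (uintptr_t)(const void *)table[slot] != 0;
}

/* A meaningful check has to read a field of one specific entry. */
static int entry_in_range(struct remap_entry table[][4], unsigned slot,
                          unsigned comp, unsigned num_slots)
{
   assert(comp < 4);
   return table[slot][comp].location < num_slots;
}
```

The address-based check passes no matter what the entries contain, which is why such an assert can linger unnoticed until a compiler warning about comparing a pointer with zero points it out.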



[Mesa-dev] [PATCH 09/17 v2] spirv: Silence compiler warning about undefined srcs[0]

2018-02-20 Thread Eric Anholt
v2: Use assume() at the srcs[] definition instead.

Cc: Jason Ekstrand 
Cc: Ian Romanick 
Cc: Eric Engestrom 
---
 src/compiler/spirv/spirv_to_nir.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/compiler/spirv/spirv_to_nir.c 
b/src/compiler/spirv/spirv_to_nir.c
index c6df764682ec..e22fe25a2e82 100644
--- a/src/compiler/spirv/spirv_to_nir.c
+++ b/src/compiler/spirv/spirv_to_nir.c
@@ -2922,6 +2922,7 @@ vtn_handle_composite(struct vtn_builder *b, SpvOp opcode,
 
case SpvOpCompositeConstruct: {
   unsigned elems = count - 3;
+  assume(elems >= 1);
   if (glsl_type_is_vector_or_scalar(type)) {
  nir_ssa_def *srcs[4];
  for (unsigned i = 0; i < elems; i++)
-- 
2.15.0



[Mesa-dev] [PATCH] radeonsi/nir: collect more accurate output_usagemask

2018-02-20 Thread Timothy Arceri
Fixes assert in glsl-1.50-gs-max-output-components piglit test.

Note that the double handling will only work for doubles that
don't take up multiple slots, i.e. double and dvec2. However,
dual-slot double handling is an existing bug which is made no
worse by this patch.
---
 src/gallium/drivers/radeonsi/si_shader_nir.c | 56 +---
 1 file changed, 43 insertions(+), 13 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_shader_nir.c 
b/src/gallium/drivers/radeonsi/si_shader_nir.c
index 3294019cea..7b10410dd7 100644
--- a/src/gallium/drivers/radeonsi/si_shader_nir.c
+++ b/src/gallium/drivers/radeonsi/si_shader_nir.c
@@ -462,21 +462,35 @@ void si_nir_scan_shader(const struct nir_shader *nir,
}
 
i = variable->data.driver_location;
-   if (processed_outputs & ((uint64_t)1 << i))
-   continue;
-
-   processed_outputs |= ((uint64_t)1 << i);
-   num_outputs++;
-
-   info->output_semantic_name[i] = semantic_name;
-   info->output_semantic_index[i] = semantic_index;
-   info->output_usagemask[i] = TGSI_WRITEMASK_XYZW;
 
unsigned num_components = 4;
unsigned vector_elements = 
glsl_get_vector_elements(glsl_without_array(variable->type));
if (vector_elements)
num_components = vector_elements;
 
+   if (glsl_type_is_64bit(glsl_without_array(variable->type)))
+   num_components = MIN2(num_components * 2, 4);
+
+   ubyte usagemask = 0;
+   for (unsigned j = 0; j < num_components; j++) {
+   switch (j + variable->data.location_frac) {
+   case 0:
+   usagemask |= TGSI_WRITEMASK_X;
+   break;
+   case 1:
+   usagemask |= TGSI_WRITEMASK_Y;
+   break;
+   case 2:
+   usagemask |= TGSI_WRITEMASK_Z;
+   break;
+   case 3:
+   usagemask |= TGSI_WRITEMASK_W;
+   break;
+   default:
+   unreachable("error calculating 
component index");
+   }
+   }
+
unsigned gs_out_streams;
if (variable->data.stream & (1u << 31)) {
gs_out_streams = variable->data.stream & ~(1u << 31);
@@ -492,23 +506,39 @@ void si_nir_scan_shader(const struct nir_shader *nir,
unsigned streamz = (gs_out_streams >> 4) & 3;
unsigned streamw = (gs_out_streams >> 6) & 3;
 
-   if (info->output_usagemask[i] & TGSI_WRITEMASK_X) {
+   if (usagemask & TGSI_WRITEMASK_X) {
+   info->output_usagemask[i] |= TGSI_WRITEMASK_X;
info->output_streams[i] |= streamx;
info->num_stream_output_components[streamx]++;
}
-   if (info->output_usagemask[i] & TGSI_WRITEMASK_Y) {
+   if (usagemask & TGSI_WRITEMASK_Y) {
+   info->output_usagemask[i] |= TGSI_WRITEMASK_Y;
info->output_streams[i] |= streamy << 2;
info->num_stream_output_components[streamy]++;
}
-   if (info->output_usagemask[i] & TGSI_WRITEMASK_Z) {
+   if (usagemask & TGSI_WRITEMASK_Z) {
+   info->output_usagemask[i] |= TGSI_WRITEMASK_Z;
info->output_streams[i] |= streamz << 4;
info->num_stream_output_components[streamz]++;
}
-   if (info->output_usagemask[i] & TGSI_WRITEMASK_W) {
+   if (usagemask & TGSI_WRITEMASK_W) {
+   info->output_usagemask[i] |= TGSI_WRITEMASK_W;
info->output_streams[i] |= streamw << 6;
info->num_stream_output_components[streamw]++;
}
 
+   /* make sure we only count this location once against the
+* num_outputs counter.
+*/
+   if (processed_outputs & ((uint64_t)1 << i))
+   continue;
+
+   processed_outputs |= ((uint64_t)1 << i);
+   num_outputs++;
+
+   info->output_semantic_name[i] = semantic_name;
+   info->output_semantic_index[i] = semantic_index;
+
switch (semantic_name) {
case TGSI_SEMANTIC_PRIMID:
info->writes_primid = true;
-- 
2.14.3
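
As a compact restatement of the mask computation in the patch above, here is a stand-alone C sketch. It is not the radeonsi code: `output_usagemask()` is a hypothetical helper, the `MASK_*` constants stand in for the X/Y/Z/W writemask bits, and the per-component switch is replaced by an equivalent shift.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for the X/Y/Z/W writemask bits. */
enum { MASK_X = 1, MASK_Y = 2, MASK_Z = 4, MASK_W = 8 };

/* Compute which components of a 4-component slot an output touches.
 *   vector_elements: 0 for non-vector types (treated as 4 components)
 *   is_64bit:        nonzero if each element takes two 32-bit components
 *   location_frac:   first component used within the slot */
static uint8_t output_usagemask(unsigned vector_elements, int is_64bit,
                                unsigned location_frac)
{
   unsigned num_components = vector_elements ? vector_elements : 4;

   if (is_64bit) {
      /* Clamp to one slot; dual-slot doubles are not handled here. */
      num_components *= 2;
      if (num_components > 4)
         num_components = 4;
   }

   /* Corresponds to the unreachable() default case in the patch. */
   assert(location_frac + num_components <= 4);

   return (uint8_t)(((1u << num_components) - 1u) << location_frac);
}
```

For example, a vec2 packed at component 2 yields `MASK_Z | MASK_W`, and a single double at component 0 yields `MASK_X | MASK_Y`. Several variables sharing one location each contribute their own bits, which is why the patch ORs the masks together instead of assuming XYZW.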


[Mesa-dev] [PATCH] nvc0: fix writing query results into buffer

2018-02-20 Thread Ilia Mirkin
We need to mark the range as valid, and validate the resource using a
helper to ensure that the buffer status is marked properly.

Fixes some CTS pipeline stats query tests.

Signed-off-by: Ilia Mirkin 
---
 src/gallium/drivers/nouveau/nvc0/nvc0_query_hw.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw.c 
b/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw.c
index 7568eeb94db..ef5f939319a 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw.c
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_query_hw.c
@@ -473,10 +473,10 @@ nvc0_hw_get_query_result_resource(struct nvc0_context 
*nvc0,
PUSH_DATAh(push, buf->address + offset);
PUSH_DATA (push, buf->address + offset);
 
-   if (buf->mm) {
-  nouveau_fence_ref(nvc0->screen->base.fence.current, >fence);
-  nouveau_fence_ref(nvc0->screen->base.fence.current, >fence_wr);
-   }
+   util_range_add(>valid_buffer_range, offset,
+  offset + (result_type >= PIPE_QUERY_TYPE_I64 ? 8 : 4));
+
+   nvc0_resource_validate(buf, NOUVEAU_BO_WR);
 }
 
 static const struct nvc0_query_funcs hw_query_funcs = {
-- 
2.16.1
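
The "mark the range as valid" step can be illustrated with a minimal C sketch of a min/max byte range. This is not the gallium helper itself: `byte_range`, `range_init()`, `range_add()` and `mark_query_result_written()` are hypothetical names, and the only detail taken from the patch is that a 64-bit result covers 8 bytes and a 32-bit result covers 4.

```c
#include <assert.h>

/* Hypothetical valid-range tracker: [start, end) in bytes.
 * The range is empty while start >= end. */
struct byte_range {
   unsigned start;
   unsigned end;
};

static void range_init(struct byte_range *r)
{
   r->start = ~0u;
   r->end = 0;
}

/* Grow the range so that it also covers [start, end). */
static void range_add(struct byte_range *r, unsigned start, unsigned end)
{
   assert(start <= end);
   if (start < r->start)
      r->start = start;
   if (end > r->end)
      r->end = end;
}

/* Record that a query result was written at offset. */
static void mark_query_result_written(struct byte_range *valid,
                                      unsigned offset, int is_64bit)
{
   range_add(valid, offset, offset + (is_64bit ? 8u : 4u));
}
```

Writing a 64-bit result at offset 16 and then a 32-bit result at offset 4 leaves the tracked range at [4, 24), so later users of the buffer know those bytes hold defined data.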



[Mesa-dev] [PATCH 16/17] intel/compiler: Disable Align16 tests on Gen11+

2018-02-20 Thread Matt Turner
Align16 is no more.
---
 src/intel/compiler/test_eu_validate.cpp | 16 
 1 file changed, 16 insertions(+)

diff --git a/src/intel/compiler/test_eu_validate.cpp 
b/src/intel/compiler/test_eu_validate.cpp
index cb2fcd3d40f..f6c2b35625e 100644
--- a/src/intel/compiler/test_eu_validate.cpp
+++ b/src/intel/compiler/test_eu_validate.cpp
@@ -374,6 +374,10 @@ TEST_P(validation_test, dst_horizontal_stride_0)
 
clear_instructions(p);
 
+   /* Align16 does not exist on Gen11+ */
+   if (devinfo.gen >= 11)
+  return;
+
brw_set_default_access_mode(p, BRW_ALIGN_16);
 
brw_ADD(p, g0, g0, g0);
@@ -421,6 +425,10 @@ TEST_P(validation_test, 
must_not_cross_grf_boundary_in_a_width)
 /* Destination Horizontal must be 1 in Align16 */
 TEST_P(validation_test, dst_hstride_on_align16_must_be_1)
 {
+   /* Align16 does not exist on Gen11+ */
+   if (devinfo.gen >= 11)
+  return;
+
brw_set_default_access_mode(p, BRW_ALIGN_16);
 
brw_ADD(p, g0, g0, g0);
@@ -439,6 +447,10 @@ TEST_P(validation_test, dst_hstride_on_align16_must_be_1)
 /* VertStride must be 0 or 4 in Align16 */
 TEST_P(validation_test, vstride_on_align16_must_be_0_or_4)
 {
+   /* Align16 does not exist on Gen11+ */
+   if (devinfo.gen >= 11)
+  return;
+
const struct {
   enum brw_vertical_stride vstride;
   bool expected_result;
@@ -1419,6 +1431,10 @@ TEST_P(validation_test, align16_64_bit_integer)
if (devinfo.gen < 8)
   return;
 
+   /* Align16 does not exist on Gen11+ */
+   if (devinfo.gen >= 11)
+  return;
+
brw_set_default_access_mode(p, BRW_ALIGN_16);
 
for (unsigned i = 0; i < sizeof(inst) / sizeof(inst[0]); i++) {
-- 
2.16.1



[Mesa-dev] [PATCH 17/17] intel/compiler: Add ICL to test_eu_validate.cpp

2018-02-20 Thread Matt Turner
With the Align16 tests now disabled, we can run the rest of the tests in
ICL mode (and see them pass!)
---
 src/intel/compiler/test_eu_validate.cpp | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/intel/compiler/test_eu_validate.cpp 
b/src/intel/compiler/test_eu_validate.cpp
index f6c2b35625e..d987311ef84 100644
--- a/src/intel/compiler/test_eu_validate.cpp
+++ b/src/intel/compiler/test_eu_validate.cpp
@@ -56,6 +56,7 @@ static const struct gen_info {
{ "glk", 9, IS_GLK },
{ "cfl", 9, IS_CFL },
{ "cnl", 10 },
+   { "icl", 11 },
 };
 
 class validation_test: public ::testing::TestWithParam {
-- 
2.16.1



[Mesa-dev] [PATCH 08/17] intel/compiler/fs: Fix application of cmod and saturate to LINE/MAC pair

2018-02-20 Thread Matt Turner
---
 src/intel/compiler/brw_fs_generator.cpp | 17 +
 1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/src/intel/compiler/brw_fs_generator.cpp 
b/src/intel/compiler/brw_fs_generator.cpp
index 0854709b272..f2bdac7d731 100644
--- a/src/intel/compiler/brw_fs_generator.cpp
+++ b/src/intel/compiler/brw_fs_generator.cpp
@@ -673,6 +673,7 @@ fs_generator::generate_linterp(fs_inst *inst,
struct brw_reg delta_x = src[0];
struct brw_reg delta_y = offset(src[0], inst->exec_size / 8);
struct brw_reg interp = src[1];
+   brw_inst *i[2];
 
if (devinfo->gen >= 11) {
   struct brw_reg acc = retype(brw_acc_reg(8), BRW_REGISTER_TYPE_NF);
@@ -727,11 +728,19 @@ fs_generator::generate_linterp(fs_inst *inst,
 
   return false;
} else {
-  brw_LINE(p, brw_null_reg(), interp, delta_x);
-  brw_MAC(p, dst, suboffset(interp, 1), delta_y);
-
-  return true;
+  i[0] = brw_LINE(p, brw_null_reg(), interp, delta_x);
+  i[1] = brw_MAC(p, dst, suboffset(interp, 1), delta_y);
}
+
+   brw_inst_set_cond_modifier(p->devinfo, i[1], inst->conditional_mod);
+
+   /* brw_set_default_saturate() is called before emitting instructions, so the
+* saturate bit is set in each instruction, so we need to unset it on the
+* first instruction.
+*/
+   brw_inst_set_saturate(p->devinfo, i[0], false);
+
+   return true;
 }
 
 void
-- 
2.16.1
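
The fix-up applied to the instruction pair can be sketched in isolation. This is a simplified model, not the brw EU code: `insn`, `COND_NONE` and `fixup_instruction_pair()` are hypothetical, and the model assumes (as the comment in the patch states) that the default saturate state has already been applied to both instructions when they were emitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified instruction state. */
struct insn {
   int  cond_modifier;
   bool saturate;
};

#define COND_NONE 0

/* After emitting two instructions for one logical operation:
 *  - the conditional modifier belongs on the last instruction, which
 *    produces the final result;
 *  - saturate was inherited by both, so clear it on the first one,
 *    presumably so that only the final result is clamped. */
static void fixup_instruction_pair(struct insn *first, struct insn *last,
                                   int cond_modifier)
{
   assert(first != NULL && last != NULL);
   last->cond_modifier = cond_modifier;
   first->saturate = false;
}
```

Starting from two instructions that both inherited saturate, the helper leaves saturate set only on the second and attaches the conditional modifier only there.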



[Mesa-dev] [PATCH 13/17] intel/compiler: Lower flrp32 on Gen11+

2018-02-20 Thread Matt Turner
The LRP instruction is no more.
---
 src/intel/compiler/brw_compiler.c   | 35 +
 src/intel/compiler/brw_fs_builder.h |  2 +-
 src/intel/compiler/brw_fs_generator.cpp |  2 +-
 src/intel/compiler/brw_vec4_builder.h   |  2 +-
 src/intel/compiler/brw_vec4_visitor.cpp |  2 +-
 5 files changed, 26 insertions(+), 17 deletions(-)

diff --git a/src/intel/compiler/brw_compiler.c 
b/src/intel/compiler/brw_compiler.c
index e515559acb6..b651ba14f1b 100644
--- a/src/intel/compiler/brw_compiler.c
+++ b/src/intel/compiler/brw_compiler.c
@@ -45,20 +45,28 @@
.use_interpolated_input_intrinsics = true, \
.vertex_id_zero_based = true
 
+#define COMMON_SCALAR_OPTIONS \
+   .lower_pack_half_2x16 = true,  \
+   .lower_pack_snorm_2x16 = true, \
+   .lower_pack_snorm_4x8 = true,  \
+   .lower_pack_unorm_2x16 = true, \
+   .lower_pack_unorm_4x8 = true,  \
+   .lower_unpack_half_2x16 = true,\
+   .lower_unpack_snorm_2x16 = true,   \
+   .lower_unpack_snorm_4x8 = true,\
+   .lower_unpack_unorm_2x16 = true,   \
+   .lower_unpack_unorm_4x8 = true,\
+   .max_unroll_iterations = 32
+
 static const struct nir_shader_compiler_options scalar_nir_options = {
COMMON_OPTIONS,
-   .lower_pack_half_2x16 = true,
-   .lower_pack_snorm_2x16 = true,
-   .lower_pack_snorm_4x8 = true,
-   .lower_pack_unorm_2x16 = true,
-   .lower_pack_unorm_4x8 = true,
-   .lower_unpack_half_2x16 = true,
-   .lower_unpack_snorm_2x16 = true,
-   .lower_unpack_snorm_4x8 = true,
-   .lower_unpack_unorm_2x16 = true,
-   .lower_unpack_unorm_4x8 = true,
-   .vs_inputs_dual_locations = true,
-   .max_unroll_iterations = 32,
+   COMMON_SCALAR_OPTIONS,
+};
+
+static const struct nir_shader_compiler_options scalar_nir_options_gen11 = {
+   COMMON_OPTIONS,
+   COMMON_SCALAR_OPTIONS,
+   .lower_flrp32 = true,
 };
 
 static const struct nir_shader_compiler_options vector_nir_options = {
@@ -148,7 +156,8 @@ brw_compiler_create(void *mem_ctx, const struct 
gen_device_info *devinfo)
   compiler->glsl_compiler_options[i].OptimizeForAOS = !is_scalar;
 
   if (is_scalar) {
- compiler->glsl_compiler_options[i].NirOptions = _nir_options;
+ compiler->glsl_compiler_options[i].NirOptions =
+devinfo->gen < 11 ? _nir_options : 
_nir_options_gen11;
   } else {
  compiler->glsl_compiler_options[i].NirOptions =
 devinfo->gen < 6 ? _nir_options : _nir_options_gen6;
diff --git a/src/intel/compiler/brw_fs_builder.h 
b/src/intel/compiler/brw_fs_builder.h
index 87394bc17b3..874272b7afd 100644
--- a/src/intel/compiler/brw_fs_builder.h
+++ b/src/intel/compiler/brw_fs_builder.h
@@ -540,7 +540,7 @@ namespace brw {
   LRP(const dst_reg , const src_reg , const src_reg ,
   const src_reg ) const
   {
- if (shader->devinfo->gen >= 6) {
+ if (shader->devinfo->gen >= 6 && shader->devinfo->gen <= 10) {
 /* The LRP instruction actually does op1 * op0 + op2 * (1 - op0), 
so
  * we need to reorder the operands.
  */
diff --git a/src/intel/compiler/brw_fs_generator.cpp 
b/src/intel/compiler/brw_fs_generator.cpp
index ffc46972420..9817e317cb8 100644
--- a/src/intel/compiler/brw_fs_generator.cpp
+++ b/src/intel/compiler/brw_fs_generator.cpp
@@ -1857,7 +1857,7 @@ fs_generator::generate_code(const cfg_t *cfg, int 
dispatch_width)
 break;
 
   case BRW_OPCODE_LRP:
- assert(devinfo->gen >= 6);
+ assert(devinfo->gen >= 6 && devinfo->gen <= 10);
  if (devinfo->gen < 10)
 brw_set_default_access_mode(p, BRW_ALIGN_16);
  brw_LRP(p, dst, src[0], src[1], src[2]);
diff --git a/src/intel/compiler/brw_vec4_builder.h 
b/src/intel/compiler/brw_vec4_builder.h
index 4c3efe8457b..5c880c19f52 100644
--- a/src/intel/compiler/brw_vec4_builder.h
+++ b/src/intel/compiler/brw_vec4_builder.h
@@ -501,7 +501,7 @@ namespace brw {
   LRP(const dst_reg , const src_reg , const src_reg ,
   const src_reg ) const
   {
- if (shader->devinfo->gen >= 6) {
+ if (shader->devinfo->gen >= 6 && shader->devinfo->gen <= 10) {
 /* The LRP instruction actually does op1 * op0 + op2 * (1 - op0), 
so
  * we need to reorder the operands.
  */
diff --git a/src/intel/compiler/brw_vec4_visitor.cpp 
b/src/intel/compiler/brw_vec4_visitor.cpp
index 53f6a5ed546..e683a8c51db 100644
--- a/src/intel/compiler/brw_vec4_visitor.cpp
+++ b/src/intel/compiler/brw_vec4_visitor.cpp
@@ 
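
For reference, the arithmetic involved in lowering a linear interpolation can be written out in plain C. The sketch assumes the usual meaning flrp(x, y, t) = x * (1 - t) + y * t; `flrp_lowered()` shows one possible expansion into multiply and add/subtract and is not claimed to be the exact sequence the compiler emits. `hw_lrp()` mirrors the operand order noted in the builder comment quoted in the patch.

```c
#include <assert.h>

/* Reference meaning: blend from x (t = 0) to y (t = 1). */
static float flrp_reference(float x, float y, float t)
{
   return x * (1.0f - t) + y * t;
}

/* One possible lowered form without a dedicated LRP instruction. */
static float flrp_lowered(float x, float y, float t)
{
   return x + t * (y - x);
}

/* Operand order described in the builder comment:
 * LRP(op0, op1, op2) computes op1 * op0 + op2 * (1 - op0). */
static float hw_lrp(float op0, float op1, float op2)
{
   assert(op0 == op0); /* reject NaN weights in this sketch */
   return op1 * op0 + op2 * (1.0f - op0);
}
```

Under these definitions flrp(x, y, t) corresponds to hw_lrp(t, y, x), which is the operand reordering the builder comment refers to. The two software forms agree mathematically but can differ in rounding for arbitrary inputs.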

[Mesa-dev] [PATCH 12/17] intel/compiler/fs: Implement ddy without using align16 for Gen11+

2018-02-20 Thread Matt Turner
Align16 is no more. We previously generated an align16 ADD instruction
to calculate DDY:

   add(8) g11<1>F  -g10<4>.xyxyF  g10<4>.zwzwF  { align16 1Q };

Without align16, we now implement it as two align1 instructions:

   add(4) g11<2>F   -g10<4,2,0>F    g10.2<4,2,0>F  { align1 1N };
   add(4) g11.1<2>F -g10.1<4,2,0>F  g10.3<4,2,0>F  { align1 1N };
---
 src/intel/compiler/brw_fs_generator.cpp | 70 ++---
 1 file changed, 56 insertions(+), 14 deletions(-)

diff --git a/src/intel/compiler/brw_fs_generator.cpp 
b/src/intel/compiler/brw_fs_generator.cpp
index 013d2c820a0..ffc46972420 100644
--- a/src/intel/compiler/brw_fs_generator.cpp
+++ b/src/intel/compiler/brw_fs_generator.cpp
@@ -1192,23 +1192,65 @@ fs_generator::generate_ddy(const fs_inst *inst,
 {
if (inst->opcode == FS_OPCODE_DDY_FINE) {
   /* produce accurate derivatives */
-  struct brw_reg src0 = src;
-  struct brw_reg src1 = src;
+  if (devinfo->gen >= 11) {
+ struct brw_reg x = src;
+ struct brw_reg y = src;
+ struct brw_reg z = src;
+ struct brw_reg w = src;
+ struct brw_reg dst_e = dst;
+ struct brw_reg dst_o = dst;
+
+ x.vstride = BRW_VERTICAL_STRIDE_4;
+ y.vstride = BRW_VERTICAL_STRIDE_4;
+ z.vstride = BRW_VERTICAL_STRIDE_4;
+ w.vstride = BRW_VERTICAL_STRIDE_4;
+
+ x.width = BRW_WIDTH_2;
+ y.width = BRW_WIDTH_2;
+ z.width = BRW_WIDTH_2;
+ w.width = BRW_WIDTH_2;
+
+ x.hstride = BRW_HORIZONTAL_STRIDE_0;
+ y.hstride = BRW_HORIZONTAL_STRIDE_0;
+ z.hstride = BRW_HORIZONTAL_STRIDE_0;
+ w.hstride = BRW_HORIZONTAL_STRIDE_0;
+
+ x.subnr = 0 * sizeof(float);
+ y.subnr = 1 * sizeof(float);
+ z.subnr = 2 * sizeof(float);
+ w.subnr = 3 * sizeof(float);
+
+ dst_e.hstride = BRW_HORIZONTAL_STRIDE_2;
+ dst_o.hstride = BRW_HORIZONTAL_STRIDE_2;
+ dst_o.subnr = sizeof(float);
 
-  src0.swizzle = BRW_SWIZZLE_XYXY;
-  src0.vstride = BRW_VERTICAL_STRIDE_4;
-  src0.width   = BRW_WIDTH_4;
-  src0.hstride = BRW_HORIZONTAL_STRIDE_1;
+ brw_push_insn_state(p);
+ if (inst->exec_size == 8)
+brw_set_default_exec_size(p, BRW_EXECUTE_4);
+ else
+brw_set_default_exec_size(p, BRW_EXECUTE_8);
+ brw_ADD(p, dst_e, negate(x), z);
+ brw_ADD(p, dst_o, negate(y), w);
+ brw_pop_insn_state(p);
+  } else {
+ struct brw_reg src0 = src;
+ struct brw_reg src1 = src;
 
-  src1.swizzle = BRW_SWIZZLE_ZWZW;
-  src1.vstride = BRW_VERTICAL_STRIDE_4;
-  src1.width   = BRW_WIDTH_4;
-  src1.hstride = BRW_HORIZONTAL_STRIDE_1;
+ src0.swizzle = BRW_SWIZZLE_XYXY;
+ src0.vstride = BRW_VERTICAL_STRIDE_4;
+ src0.width   = BRW_WIDTH_4;
+ src0.hstride = BRW_HORIZONTAL_STRIDE_1;
 
-  brw_push_insn_state(p);
-  brw_set_default_access_mode(p, BRW_ALIGN_16);
-  brw_ADD(p, dst, negate(src0), src1);
-  brw_pop_insn_state(p);
+ src1.swizzle = BRW_SWIZZLE_ZWZW;
+ src1.vstride = BRW_VERTICAL_STRIDE_4;
+ src1.width   = BRW_WIDTH_4;
+ src1.hstride = BRW_HORIZONTAL_STRIDE_1;
+
+ brw_push_insn_state(p);
+ brw_set_default_access_mode(p, BRW_ALIGN_16);
+ brw_ADD(p, dst, negate(src0), src1);
+ brw_pop_insn_state(p);
+  }
} else {
   /* replicate the derivative at the top-left pixel to other pixels */
   struct brw_reg src0 = src;
-- 
2.16.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 07/17] intel/compiler/fs: Return multiple_instructions_emitted from generate_linterp

2018-02-20 Thread Matt Turner
If multiple instructions are emitted, special handling of things like
conditional mod, saturate, and NoDDClr/NoDDChk needs to be performed.

I noticed that conditional mods were misapplied when adding support for
Gen11 (in the previous patch). The next patch fixes the same bug in the
Gen4 LINE/MAC case, though I was not able to trigger it.
---
 src/intel/compiler/brw_fs.h |  2 +-
 src/intel/compiler/brw_fs_generator.cpp | 12 +---
 2 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/src/intel/compiler/brw_fs.h b/src/intel/compiler/brw_fs.h
index 63373580ee4..37106ccb284 100644
--- a/src/intel/compiler/brw_fs.h
+++ b/src/intel/compiler/brw_fs.h
@@ -409,7 +409,7 @@ private:
void generate_urb_write(fs_inst *inst, struct brw_reg payload);
void generate_cs_terminate(fs_inst *inst, struct brw_reg payload);
void generate_barrier(fs_inst *inst, struct brw_reg src);
-   void generate_linterp(fs_inst *inst, struct brw_reg dst,
+   bool generate_linterp(fs_inst *inst, struct brw_reg dst,
 struct brw_reg *src);
void generate_tex(fs_inst *inst, struct brw_reg dst, struct brw_reg src,
  struct brw_reg surface_index,
diff --git a/src/intel/compiler/brw_fs_generator.cpp b/src/intel/compiler/brw_fs_generator.cpp
index 54869bc3ebc..0854709b272 100644
--- a/src/intel/compiler/brw_fs_generator.cpp
+++ b/src/intel/compiler/brw_fs_generator.cpp
@@ -646,9 +646,9 @@ fs_generator::generate_barrier(fs_inst *inst, struct brw_reg src)
brw_WAIT(p);
 }
 
-void
+bool
 fs_generator::generate_linterp(fs_inst *inst,
-struct brw_reg dst, struct brw_reg *src)
+   struct brw_reg dst, struct brw_reg *src)
 {
/* PLN reads:
 *  /   in SIMD16   \
@@ -719,12 +719,18 @@ fs_generator::generate_linterp(fs_inst *inst,
  brw_inst_set_saturate(p->devinfo, i[0], false);
  brw_inst_set_saturate(p->devinfo, i[2], false);
   }
+
+  return true;
} else if (devinfo->has_pln &&
   (devinfo->gen >= 7 || (delta_x.nr & 1) == 0)) {
   brw_PLN(p, dst, interp, delta_x);
+
+  return false;
} else {
   brw_LINE(p, brw_null_reg(), interp, delta_x);
   brw_MAC(p, dst, suboffset(interp, 1), delta_y);
+
+  return true;
}
 }
 
@@ -1999,7 +2005,7 @@ fs_generator::generate_code(const cfg_t *cfg, int dispatch_width)
 brw_MOV(p, dst, src[0]);
 break;
   case FS_OPCODE_LINTERP:
-generate_linterp(inst, dst, src);
+multiple_instructions_emitted = generate_linterp(inst, dst, src);
 break;
   case FS_OPCODE_PIXEL_X:
  assert(src[0].type == BRW_REGISTER_TYPE_UW);
-- 
2.16.1



[Mesa-dev] [PATCH 11/17] intel/compiler/fs: Simplify ddx/ddy code generation

2018-02-20 Thread Matt Turner
The brw_reg() constructor just obfuscates things here, in my opinion.
---
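Editorial note: the region fields set in the ddx hunk below can be read as an
index computation. The following is a minimal sketch, assuming the usual
<vstride;width,hstride> region addressing in units of elements and a SIMD8
register holding two 2x2 quads laid out as TL, TR, BL, BR. The helper names
region_element() and ddx_simd8() are hypothetical and do not exist in Mesa.

```c
#include <assert.h>
#include <stddef.h>

/* Element read by `channel` from a region <vstride;width,hstride> that
 * starts `sub` elements into the register. All units are elements. */
static size_t
region_element(size_t channel, size_t sub,
               size_t vstride, size_t width, size_t hstride)
{
   assert(width > 0);
   return sub + (channel / width) * vstride + (channel % width) * hstride;
}

/* Hypothetical model of the ddx ADD over one SIMD8 register.
 * fine:   <2;2,0> regions, one difference per pixel row
 * coarse: <4;4,0> regions, one difference per quad */
static void
ddx_simd8(const float src[8], float dst[8], int fine)
{
   size_t vstride = fine ? 2 : 4;
   size_t width   = fine ? 2 : 4;

   for (size_t ch = 0; ch < 8; ch++) {
      /* src0 starts one float in (subnr = sizeof(float)), src1 at 0. */
      float src0 = src[region_element(ch, 1, vstride, width, 0)];
      float src1 = src[region_element(ch, 0, vstride, width, 0)];
      dst[ch] = src0 - src1;   /* ADD dst, src0, -src1 */
   }
}
```

With hstride 0 every channel in a row of the region reads the same element,
which is why one difference is replicated across a pixel row (fine) or a
whole quad (coarse).
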
 src/intel/compiler/brw_fs_generator.cpp | 77 +++--
 1 file changed, 35 insertions(+), 42 deletions(-)

diff --git a/src/intel/compiler/brw_fs_generator.cpp b/src/intel/compiler/brw_fs_generator.cpp
index e5a5a76a932..013d2c820a0 100644
--- a/src/intel/compiler/brw_fs_generator.cpp
+++ b/src/intel/compiler/brw_fs_generator.cpp
@@ -1168,20 +1168,17 @@ fs_generator::generate_ddx(const fs_inst *inst,
   width = BRW_WIDTH_4;
}
 
-   struct brw_reg src0 = brw_reg(src.file, src.nr, 1,
- src.negate, src.abs,
-BRW_REGISTER_TYPE_F,
-vstride,
-width,
-BRW_HORIZONTAL_STRIDE_0,
-BRW_SWIZZLE_XYZW, WRITEMASK_XYZW);
-   struct brw_reg src1 = brw_reg(src.file, src.nr, 0,
- src.negate, src.abs,
-BRW_REGISTER_TYPE_F,
-vstride,
-width,
-BRW_HORIZONTAL_STRIDE_0,
-BRW_SWIZZLE_XYZW, WRITEMASK_XYZW);
+   struct brw_reg src0 = src;
+   struct brw_reg src1 = src;
+
+   src0.subnr   = sizeof(float);
+   src0.vstride = vstride;
+   src0.width   = width;
+   src0.hstride = BRW_HORIZONTAL_STRIDE_0;
+   src1.vstride = vstride;
+   src1.width   = width;
+   src1.hstride = BRW_HORIZONTAL_STRIDE_0;
+
brw_ADD(p, dst, src0, negate(src1));
 }
 
@@ -1195,40 +1192,36 @@ fs_generator::generate_ddy(const fs_inst *inst,
 {
if (inst->opcode == FS_OPCODE_DDY_FINE) {
   /* produce accurate derivatives */
-  struct brw_reg src0 = brw_reg(src.file, src.nr, 0,
-src.negate, src.abs,
-BRW_REGISTER_TYPE_F,
-BRW_VERTICAL_STRIDE_4,
-BRW_WIDTH_4,
-BRW_HORIZONTAL_STRIDE_1,
-BRW_SWIZZLE_XYXY, WRITEMASK_XYZW);
-  struct brw_reg src1 = brw_reg(src.file, src.nr, 0,
-src.negate, src.abs,
-BRW_REGISTER_TYPE_F,
-BRW_VERTICAL_STRIDE_4,
-BRW_WIDTH_4,
-BRW_HORIZONTAL_STRIDE_1,
-BRW_SWIZZLE_ZWZW, WRITEMASK_XYZW);
+  struct brw_reg src0 = src;
+  struct brw_reg src1 = src;
+
+  src0.swizzle = BRW_SWIZZLE_XYXY;
+  src0.vstride = BRW_VERTICAL_STRIDE_4;
+  src0.width   = BRW_WIDTH_4;
+  src0.hstride = BRW_HORIZONTAL_STRIDE_1;
+
+  src1.swizzle = BRW_SWIZZLE_ZWZW;
+  src1.vstride = BRW_VERTICAL_STRIDE_4;
+  src1.width   = BRW_WIDTH_4;
+  src1.hstride = BRW_HORIZONTAL_STRIDE_1;
+
   brw_push_insn_state(p);
   brw_set_default_access_mode(p, BRW_ALIGN_16);
   brw_ADD(p, dst, negate(src0), src1);
   brw_pop_insn_state(p);
} else {
   /* replicate the derivative at the top-left pixel to other pixels */
-  struct brw_reg src0 = brw_reg(src.file, src.nr, 0,
-src.negate, src.abs,
-BRW_REGISTER_TYPE_F,
-BRW_VERTICAL_STRIDE_4,
-BRW_WIDTH_4,
-BRW_HORIZONTAL_STRIDE_0,
-BRW_SWIZZLE_XYZW, WRITEMASK_XYZW);
-  struct brw_reg src1 = brw_reg(src.file, src.nr, 2,
-src.negate, src.abs,
-BRW_REGISTER_TYPE_F,
-BRW_VERTICAL_STRIDE_4,
-BRW_WIDTH_4,
-BRW_HORIZONTAL_STRIDE_0,
-BRW_SWIZZLE_XYZW, WRITEMASK_XYZW);
+  struct brw_reg src0 = src;
+  struct brw_reg src1 = src;
+
+  src0.vstride = BRW_VERTICAL_STRIDE_4;
+  src0.width   = BRW_WIDTH_4;
+  src0.hstride = BRW_HORIZONTAL_STRIDE_0;
+  src1.vstride = BRW_VERTICAL_STRIDE_4;
+  src1.width   = BRW_WIDTH_4;
+  src1.hstride = BRW_HORIZONTAL_STRIDE_0;
+  src1.subnr   = 2 * sizeof(float);
+
   brw_ADD(p, dst, negate(src0), src1);
}
 }
-- 
2.16.1



[Mesa-dev] [PATCH 14/17] intel/compiler: Mark line, pln, and lrp as removed on Gen11+

2018-02-20 Thread Matt Turner
---
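Editorial note: the .gens fields changed below are bitmasks with one bit per
generation. A minimal sketch of how such range masks combine follows. The bit
positions for GEN4 through GEN7 and the GEN_LT/GEN_GE/GEN_LE definitions are
assumptions made for illustration (only GEN75 through GEN11 appear in this
patch), and opcode_supported() is a hypothetical helper.

```c
#include <assert.h>
#include <stdbool.h>

enum gen {
   GEN4  = (1 << 0),   /* GEN4..GEN7 positions are assumed */
   GEN45 = (1 << 1),
   GEN5  = (1 << 2),
   GEN6  = (1 << 3),
   GEN7  = (1 << 4),
   GEN75 = (1 << 5),
   GEN8  = (1 << 6),
   GEN9  = (1 << 7),
   GEN10 = (1 << 8),
   GEN11 = (1 << 9),
};

/* Assumed range helpers: all bits below, at-or-above, at-or-below `gen`. */
#define GEN_LT(gen) ((unsigned)(gen) - 1u)
#define GEN_GE(gen) (~GEN_LT(gen))
#define GEN_LE(gen) (GEN_LT(gen) | (unsigned)(gen))

/* Hypothetical helper: is the opcode available on generation `gen`? */
static bool
opcode_supported(unsigned gens, enum gen gen)
{
   assert(gen != 0);
   return (gens & (unsigned)gen) != 0;
}
```

Under these assumptions, GEN_GE(GEN45) & GEN_LE(GEN10) selects the bits from
GEN45 through GEN10 inclusive, so a Gen11 lookup for pln fails.
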
 src/intel/compiler/brw_eu.c | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/src/intel/compiler/brw_eu.c b/src/intel/compiler/brw_eu.c
index bc297a21b32..3646076a8e8 100644
--- a/src/intel/compiler/brw_eu.c
+++ b/src/intel/compiler/brw_eu.c
@@ -384,7 +384,8 @@ enum gen {
GEN75 = (1 << 5),
GEN8  = (1 << 6),
GEN9  = (1 << 7),
-   GEN10  = (1 << 8),
+   GEN10 = (1 << 8),
+   GEN11 = (1 << 9),
GEN_ALL = ~0
 };
 
@@ -628,16 +629,16 @@ static const struct opcode_desc opcode_descs[128] = {
},
/* Reserved 88 */
[BRW_OPCODE_LINE] = {
-  .name = "line",.nsrc = 2, .ndst = 1, .gens = GEN_ALL,
+  .name = "line",.nsrc = 2, .ndst = 1, .gens = GEN_LE(GEN10),
},
[BRW_OPCODE_PLN] = {
-  .name = "pln", .nsrc = 2, .ndst = 1, .gens = GEN_GE(GEN45),
+  .name = "pln", .nsrc = 2, .ndst = 1, .gens = GEN_GE(GEN45) & GEN_LE(GEN10),
},
[BRW_OPCODE_MAD] = {
   .name = "mad", .nsrc = 3, .ndst = 1, .gens = GEN_GE(GEN6),
},
[BRW_OPCODE_LRP] = {
-  .name = "lrp", .nsrc = 3, .ndst = 1, .gens = GEN_GE(GEN6),
+  .name = "lrp", .nsrc = 3, .ndst = 1, .gens = GEN_GE(GEN6) & GEN_LE(GEN10),
},
[93] = {
   .name = "madm",.nsrc = 3, .ndst = 1, .gens = GEN_GE(GEN8),
@@ -662,6 +663,7 @@ gen_from_devinfo(const struct gen_device_info *devinfo)
case 8: return GEN8;
case 9: return GEN9;
case 10: return GEN10;
+   case 11: return GEN11;
default:
   unreachable("not reached");
}
-- 
2.16.1



[Mesa-dev] [PATCH 10/17] intel/compiler/fs: Pass fs_inst to generate_ddx/ddy instead of opcode

2018-02-20 Thread Matt Turner
In a future patch, generate_ddy will want to inspect inst->exec_size.
Change generate_ddx as well for consistency.
---
 src/intel/compiler/brw_fs.h |  6 --
 src/intel/compiler/brw_fs_generator.cpp | 12 ++--
 2 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/src/intel/compiler/brw_fs.h b/src/intel/compiler/brw_fs.h
index 37106ccb284..76ad76e08b7 100644
--- a/src/intel/compiler/brw_fs.h
+++ b/src/intel/compiler/brw_fs.h
@@ -417,8 +417,10 @@ private:
void generate_get_buffer_size(fs_inst *inst, struct brw_reg dst,
  struct brw_reg src,
  struct brw_reg surf_index);
-   void generate_ddx(enum opcode op, struct brw_reg dst, struct brw_reg src);
-   void generate_ddy(enum opcode op, struct brw_reg dst, struct brw_reg src);
+   void generate_ddx(const fs_inst *inst,
+ struct brw_reg dst, struct brw_reg src);
+   void generate_ddy(const fs_inst *inst,
+ struct brw_reg dst, struct brw_reg src);
void generate_scratch_write(fs_inst *inst, struct brw_reg src);
void generate_scratch_read(fs_inst *inst, struct brw_reg dst);
void generate_scratch_read_gen7(fs_inst *inst, struct brw_reg dst);
diff --git a/src/intel/compiler/brw_fs_generator.cpp b/src/intel/compiler/brw_fs_generator.cpp
index f2bdac7d731..e5a5a76a932 100644
--- a/src/intel/compiler/brw_fs_generator.cpp
+++ b/src/intel/compiler/brw_fs_generator.cpp
@@ -1153,12 +1153,12 @@ fs_generator::generate_tex(fs_inst *inst, struct brw_reg dst, struct brw_reg src
  * appropriate swizzling.
  */
 void
-fs_generator::generate_ddx(enum opcode opcode,
+fs_generator::generate_ddx(const fs_inst *inst,
struct brw_reg dst, struct brw_reg src)
 {
unsigned vstride, width;
 
-   if (opcode == FS_OPCODE_DDX_FINE) {
+   if (inst->opcode == FS_OPCODE_DDX_FINE) {
   /* produce accurate derivatives */
   vstride = BRW_VERTICAL_STRIDE_2;
   width = BRW_WIDTH_2;
@@ -1190,10 +1190,10 @@ fs_generator::generate_ddx(enum opcode opcode,
  * left.
  */
 void
-fs_generator::generate_ddy(enum opcode opcode,
+fs_generator::generate_ddy(const fs_inst *inst,
struct brw_reg dst, struct brw_reg src)
 {
-   if (opcode == FS_OPCODE_DDY_FINE) {
+   if (inst->opcode == FS_OPCODE_DDY_FINE) {
   /* produce accurate derivatives */
   struct brw_reg src0 = brw_reg(src.file, src.nr, 0,
 src.negate, src.abs,
@@ -2049,11 +2049,11 @@ fs_generator::generate_code(const cfg_t *cfg, int dispatch_width)
 break;
   case FS_OPCODE_DDX_COARSE:
   case FS_OPCODE_DDX_FINE:
- generate_ddx(inst->opcode, dst, src[0]);
+ generate_ddx(inst, dst, src[0]);
  break;
   case FS_OPCODE_DDY_COARSE:
   case FS_OPCODE_DDY_FINE:
- generate_ddy(inst->opcode, dst, src[0]);
+ generate_ddy(inst, dst, src[0]);
 break;
 
   case SHADER_OPCODE_GEN4_SCRATCH_WRITE:
-- 
2.16.1



[Mesa-dev] [PATCH 09/17] intel/compiler/fs: Don't generate integer DWord multiply on Gen11

2018-02-20 Thread Matt Turner
Like CHV et al., Gen11 does not support 32x32 -> 32/64-bit integer
multiplies.
---
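Editorial note: when has_integer_dword_mul is false the multiply has to be
built from narrower pieces. The sketch below shows only the arithmetic
identity behind that, assuming a 32x16-bit multiply is available. It is not
the instruction sequence lower_integer_multiplication() emits, and
mul32_from_32x16() is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Low 32 bits of a * b, using two multiplies whose second operand is only
 * 16 bits wide:  a * b = a * b[15:0] + ((a * b[31:16]) << 16)  (mod 2^32) */
static uint32_t
mul32_from_32x16(uint32_t a, uint32_t b)
{
   uint32_t low    = a * (b & 0xffffu);   /* a * b[15:0]  */
   uint32_t high   = a * (b >> 16);       /* a * b[31:16] */
   uint32_t result = low + (high << 16);

   /* Debug check against a 64-bit reference product. */
   assert(result == (uint32_t)(((uint64_t)a * b) & 0xffffffffu));
   return result;
}
```
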
 src/intel/common/gen_device_info.c | 4 
 src/intel/common/gen_device_info.h | 1 +
 src/intel/compiler/brw_fs.cpp  | 6 +-
 3 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/src/intel/common/gen_device_info.c b/src/intel/common/gen_device_info.c
index 465d4c783a1..c4b78e032a3 100644
--- a/src/intel/common/gen_device_info.c
+++ b/src/intel/common/gen_device_info.c
@@ -323,6 +323,7 @@ static const struct gen_device_info gen_device_info_hsw_gt3 = {
.has_llc = true, \
.has_sample_with_hiz = false,\
.has_pln = true, \
+   .has_integer_dword_mul = true,   \
.has_64bit_types = true, \
.supports_simd16_3src = true,\
.has_surface_tile_offset = true, \
@@ -405,6 +406,7 @@ static const struct gen_device_info gen_device_info_bdw_gt3 = {
 static const struct gen_device_info gen_device_info_chv = {
GEN8_FEATURES, .is_cherryview = 1, .gt = 1,
.has_llc = false,
+   .has_integer_dword_mul = false,
.num_slices = 1,
.num_subslices = { 2, },
.num_thread_per_eu = 7,
@@ -455,6 +457,7 @@ static const struct gen_device_info gen_device_info_chv = {
 #define GEN9_LP_FEATURES   \
GEN8_FEATURES,  \
GEN9_HW_INFO,   \
+   .has_integer_dword_mul = false, \
.gt = 1,\
.has_llc = false,   \
.has_sample_with_hiz = true,\
@@ -759,6 +762,7 @@ static const struct gen_device_info gen_device_info_cnl_5x8 = {
GEN8_FEATURES,   \
GEN11_HW_INFO,   \
.has_64bit_types = false,\
+   .has_integer_dword_mul = false,  \
.gt = _gt, .num_slices = _slices, .l3_banks = _l3
 
 static const struct gen_device_info gen_device_info_icl_8x8 = {
diff --git a/src/intel/common/gen_device_info.h b/src/intel/common/gen_device_info.h
index 7761eeba7e0..edd910faee7 100644
--- a/src/intel/common/gen_device_info.h
+++ b/src/intel/common/gen_device_info.h
@@ -60,6 +60,7 @@ struct gen_device_info
 
bool has_pln;
bool has_64bit_types;
+   bool has_integer_dword_mul;
bool has_compr4;
bool has_surface_tile_offset;
bool supports_simd16_3src;
diff --git a/src/intel/compiler/brw_fs.cpp b/src/intel/compiler/brw_fs.cpp
index 6fb46e7374c..3b61fe9178c 100644
--- a/src/intel/compiler/brw_fs.cpp
+++ b/src/intel/compiler/brw_fs.cpp
@@ -3549,11 +3549,7 @@ fs_visitor::lower_integer_multiplication()
   inst->dst.type != BRW_REGISTER_TYPE_UD))
 continue;
 
- /* Gen8's MUL instruction can do a 32-bit x 32-bit -> 32-bit
-  * operation directly, but CHV/BXT cannot.
-  */
- if (devinfo->gen >= 8 &&
- !devinfo->is_cherryview && !gen_device_info_is_9lp(devinfo))
+ if (devinfo->has_integer_dword_mul)
 continue;
 
  if (inst->src[1].file == IMM &&
-- 
2.16.1



[Mesa-dev] [PATCH 15/17] intel/compiler: Add instruction compaction support on Gen11

2018-02-20 Thread Matt Turner
Gen11 only differs from SKL+ in that it uses a new datatype index table.
---
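Editorial note: as I understand it, compaction replaces a group of
instruction bits with a small index into a table such as the one added
below, and an instruction whose bits are not in the table stays uncompacted.
A minimal sketch of that lookup, assuming a plain linear search.
find_table_index() and toy_datatype_table are hypothetical, and the table
values are made up rather than taken from the hardware documentation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return the index of `bits` in `table`, or -1 when the pattern is absent
 * and the instruction therefore cannot use the compact encoding. */
static int
find_table_index(const uint32_t *table, size_t len, uint32_t bits)
{
   assert(table != NULL);
   for (size_t i = 0; i < len; i++) {
      if (table[i] == bits)
         return (int)i;
   }
   return -1;
}

/* Toy 4-entry table with made-up values (the real tables have 32 entries). */
static const uint32_t toy_datatype_table[4] = { 0x005, 0x044, 0x045, 0x04d };
```

Selecting a different table per generation, as the switch in this patch
does, changes which bit patterns are compactable without touching the
lookup itself.
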
 src/intel/compiler/brw_eu_compact.c | 42 +
 1 file changed, 42 insertions(+)

diff --git a/src/intel/compiler/brw_eu_compact.c b/src/intel/compiler/brw_eu_compact.c
index 8d33e2adffc..ae14ef10ec0 100644
--- a/src/intel/compiler/brw_eu_compact.c
+++ b/src/intel/compiler/brw_eu_compact.c
@@ -637,6 +637,41 @@ static const uint16_t gen8_src_index_table[32] = {
0b010110001000
 };
 
+static const uint32_t gen11_datatype_table[32] = {
+   0b00101,
+   0b001000100,
+   0b001000101,
+   0b001001101,
+   0b0010101100101,
+   0b0010010100101,
+   0b0010010010101,
+   0b00100100101000101,
+   0b00100100101100101,
+   0b001010101,
+   0b001110100,
+   0b001110101,
+   0b001000101000101000101,
+   0b001000111000101000100,
+   0b001000111000101000101,
+   0b001100100100101100101,
+   0b001100101100100100101,
+   0b001100101100101100100,
+   0b001100101100101100101,
+   0b00110000101100100,
+   0b001001100,
+   0b0010001100101,
+   0b0010101000101,
+   0b001010100,
+   0b001000101000101000100,
+   0b00100011100010100,
+   0b00100100100101001,
+   0b00110100101100101,
+   0b00110000101100101,
+   0b00100001101001100,
+   0b001001001001001001000,
+   0b001001011001001001000,
+};
+
 /* This is actually the control index table for Cherryview (26 bits), but the
  * only difference from Broadwell (24 bits) is that it has two extra 0-bits at
  * the start.
@@ -1450,8 +1485,15 @@ brw_init_compaction_tables(const struct gen_device_info *devinfo)
assert(gen8_datatype_table[ARRAY_SIZE(gen8_datatype_table) - 1] != 0);
assert(gen8_subreg_table[ARRAY_SIZE(gen8_subreg_table) - 1] != 0);
assert(gen8_src_index_table[ARRAY_SIZE(gen8_src_index_table) - 1] != 0);
+   assert(gen11_datatype_table[ARRAY_SIZE(gen11_datatype_table) - 1] != 0);
 
switch (devinfo->gen) {
+   case 11:
+  control_index_table = gen8_control_index_table;
+  datatype_table = gen11_datatype_table;
+  subreg_table = gen8_subreg_table;
+  src_index_table = gen8_src_index_table;
+  break;
case 10:
case 9:
case 8:
-- 
2.16.1



[Mesa-dev] [PATCH 06/17] intel/compiler/fs: Implement FS_OPCODE_LINTERP with MADs on Gen11+

2018-02-20 Thread Matt Turner
The PLN instruction is no more. Its functionality is now implemented
using two MAD instructions with the new native-float type. Instead of

   pln(16) r20.0<1>:F r10.4<0;1,0>:F r4.0<8;8,1>:F

we now have

   mad(8) acc0<1>:NF r10.7<0;1,0>:F r4.0<8;8,1>:F r10.4<0;1,0>:F
   mad(8) r20.0<1>:F acc0<8;8,1>:NF r5.0<8;8,1>:F r10.5<0;1,0>:F
   mad(8) acc0<1>:NF r10.7<0;1,0>:F r6.0<8;8,1>:F r10.4<0;1,0>:F
   mad(8) r21.0<1>:F acc0<8;8,1>:NF r7.0<8;8,1>:F r10.5<0;1,0>:F

... and in the case of SIMD8 only the first pair of MAD instructions is
used.
---
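Editorial note: a scalar model of one SIMD8 MAD pair from the commit message,
as a minimal sketch. It assumes MAD computes src0 + src1 * src2 and uses a
double as a stand-in for the wider :NF accumulator, which is only an
approximation of the hardware format. linterp_simd8() is a hypothetical name;
P, Q and R correspond to dwP, dwQ and dwR in the diff.

```c
#include <assert.h>

/* Per channel: dst = R + P * delta_x + Q * delta_y */
static void
linterp_simd8(float P, float Q, float R,
              const float delta_x[8], const float delta_y[8], float dst[8])
{
   assert(delta_x && delta_y && dst);

   for (int ch = 0; ch < 8; ch++) {
      /* mad(8) acc0:NF  R  delta_x  P */
      double acc = (double)R + (double)delta_x[ch] * (double)P;
      /* mad(8) dst:F  acc0:NF  delta_y  Q */
      dst[ch] = (float)(acc + (double)delta_y[ch] * (double)Q);
   }
}
```

In SIMD16 the same pair runs twice, once per half, which matches the four
MADs listed in the commit message.
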
 src/intel/compiler/brw_eu_emit.c|  2 +-
 src/intel/compiler/brw_fs_generator.cpp | 49 +++--
 2 files changed, 48 insertions(+), 3 deletions(-)

diff --git a/src/intel/compiler/brw_eu_emit.c b/src/intel/compiler/brw_eu_emit.c
index ec871e5aa75..a96fe43556e 100644
--- a/src/intel/compiler/brw_eu_emit.c
+++ b/src/intel/compiler/brw_eu_emit.c
@@ -968,7 +968,7 @@ ALU2(DP4)
 ALU2(DPH)
 ALU2(DP3)
 ALU2(DP2)
-ALU3F(MAD)
+ALU3(MAD)
 ALU3F(LRP)
 ALU1(BFREV)
 ALU3(BFE)
diff --git a/src/intel/compiler/brw_fs_generator.cpp b/src/intel/compiler/brw_fs_generator.cpp
index cd5be054f69..54869bc3ebc 100644
--- a/src/intel/compiler/brw_fs_generator.cpp
+++ b/src/intel/compiler/brw_fs_generator.cpp
@@ -674,8 +674,53 @@ fs_generator::generate_linterp(fs_inst *inst,
struct brw_reg delta_y = offset(src[0], inst->exec_size / 8);
struct brw_reg interp = src[1];
 
-   if (devinfo->has_pln &&
-   (devinfo->gen >= 7 || (delta_x.nr & 1) == 0)) {
+   if (devinfo->gen >= 11) {
+  struct brw_reg acc = retype(brw_acc_reg(8), BRW_REGISTER_TYPE_NF);
+  struct brw_reg dwP = suboffset(interp, 0);
+  struct brw_reg dwQ = suboffset(interp, 1);
+  struct brw_reg dwR = suboffset(interp, 3);
+
+  brw_set_default_access_mode(p, BRW_ALIGN_1);
+  brw_set_default_exec_size(p, BRW_EXECUTE_8);
+
+  if (inst->exec_size == 8) {
+ brw_inst *i[2];
+
+ i[0] = brw_MAD(p,acc, dwR, offset(delta_x, 0), dwP);
+ i[1] = brw_MAD(p, offset(dst, 0), acc, offset(delta_y, 0), dwQ);
+
+ brw_inst_set_cond_modifier(p->devinfo, i[1], inst->conditional_mod);
+
+ /* brw_set_default_saturate() is called before emitting instructions,
+  * so the saturate bit is set in each instruction, so we need to unset
+  * it on the first instruction of each pair.
+  */
+ brw_inst_set_saturate(p->devinfo, i[0], false);
+  } else {
+ brw_inst *i[4];
+
+ brw_set_default_compression_control(p, BRW_COMPRESSION_NONE);
+ i[0] = brw_MAD(p,acc, dwR, offset(delta_x, 0), dwP);
+ i[1] = brw_MAD(p, offset(dst, 0), acc, offset(delta_x, 1), dwQ);
+
+ brw_set_default_compression_control(p, BRW_COMPRESSION_2NDHALF);
+ i[2] = brw_MAD(p,acc, dwR, offset(delta_y, 0), dwP);
+ i[3] = brw_MAD(p, offset(dst, 1), acc, offset(delta_y, 1), dwQ);
+
+ brw_set_default_compression_control(p, BRW_COMPRESSION_COMPRESSED);
+
+ brw_inst_set_cond_modifier(p->devinfo, i[1], inst->conditional_mod);
+ brw_inst_set_cond_modifier(p->devinfo, i[3], inst->conditional_mod);
+
+ /* brw_set_default_saturate() is called before emitting instructions,
+  * so the saturate bit is set in each instruction, so we need to unset
+  * it on the first instruction of each pair.
+  */
+ brw_inst_set_saturate(p->devinfo, i[0], false);
+ brw_inst_set_saturate(p->devinfo, i[2], false);
+  }
+   } else if (devinfo->has_pln &&
+  (devinfo->gen >= 7 || (delta_x.nr & 1) == 0)) {
   brw_PLN(p, dst, interp, delta_x);
} else {
   brw_LINE(p, brw_null_reg(), interp, delta_x);
-- 
2.16.1



[Mesa-dev] [PATCH 00/17] intel/compiler: Ice Lake support

2018-02-20 Thread Matt Turner
[PATCH 01/17] intel: Add a preliminary device for Ice Lake
[PATCH 02/17] intel: Add icl pci id for INTEL_DEVID_OVERRIDE
[PATCH 03/17] intel: Disable 64-bit extensions on platforms without
[PATCH 04/17] intel/compiler: Add Gen11 register types
[PATCH 05/17] intel/compiler: Add Gen11+ native float type
[PATCH 06/17] intel/compiler/fs: Implement FS_OPCODE_LINTERP with
[PATCH 07/17] intel/compiler/fs: Return multiple_instructions_emitted
[PATCH 08/17] intel/compiler/fs: Fix application of cmod and saturate
[PATCH 09/17] intel/compiler/fs: Don't generate integer DWord
[PATCH 10/17] intel/compiler/fs: Pass fs_inst to generate_ddx/ddy
[PATCH 11/17] intel/compiler/fs: Simplify ddx/ddy code generation
[PATCH 12/17] intel/compiler/fs: Implement ddy without using align16
[PATCH 13/17] intel/compiler: Lower flrp32 on Gen11+
[PATCH 14/17] intel/compiler: Mark line, pln, and lrp as removed on
[PATCH 15/17] intel/compiler: Add instruction compaction support on
[PATCH 16/17] intel/compiler: Disable Align16 tests on Gen11+
[PATCH 17/17] intel/compiler: Add ICL to test_eu_validate.cpp


[Mesa-dev] [PATCH 01/17] intel: Add a preliminary device for Ice Lake

2018-02-20 Thread Matt Turner
From: Anuj Phogat 

Signed-off-by: Anuj Phogat 
---
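Editorial note: the max_wm_threads switch added at the end of this patch
boils down to the arithmetic below. A minimal sketch; max_wm_threads() as a
free function is hypothetical, and returning 0 for other generations is an
assumption of the sketch (the patch leaves the existing value untouched in
that case).

```c
#include <assert.h>

/* threads-per-PSD * slices * subslices-per-slice, per the patch comments */
static unsigned
max_wm_threads(int gen, unsigned num_slices)
{
   assert(num_slices > 0);

   switch (gen) {
   case 9:
   case 10:
      return 64 * num_slices * 4;    /* 4 effective subslices per slice */
   case 11:
      return 128 * num_slices * 8;   /* 8 subslices per slice */
   default:
      return 0;                      /* sketch only: "keep existing value" */
   }
}
```

For the single-slice Ice Lake entries in this patch that gives 1024 threads.
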
 include/pci_ids/i965_pci_ids.h |  9 ++
 src/intel/common/gen_device_info.c | 56 +-
 2 files changed, 64 insertions(+), 1 deletion(-)

diff --git a/include/pci_ids/i965_pci_ids.h b/include/pci_ids/i965_pci_ids.h
index feb9c582b19..81c9a5f13fb 100644
--- a/include/pci_ids/i965_pci_ids.h
+++ b/include/pci_ids/i965_pci_ids.h
@@ -196,3 +196,12 @@ CHIPSET(0x5A50, cnl_5x8, "Intel(R) HD Graphics (Cannonlake 5x8 GT2)")
 CHIPSET(0x5A51, cnl_5x8, "Intel(R) HD Graphics (Cannonlake 5x8 GT2)")
 CHIPSET(0x5A52, cnl_5x8, "Intel(R) HD Graphics (Cannonlake 5x8 GT2)")
 CHIPSET(0x5A54, cnl_5x8, "Intel(R) HD Graphics (Cannonlake 5x8 GT2)")
+CHIPSET(0x8A50, icl_8x8, "Intel(R) HD Graphics (Ice Lake 8x8 GT2)")
+CHIPSET(0x8A51, icl_8x8, "Intel(R) HD Graphics (Ice Lake 8x8 GT2)")
+CHIPSET(0x8A52, icl_8x8, "Intel(R) HD Graphics (Ice Lake 8x8 GT2)")
+CHIPSET(0x8A5A, icl_6x8, "Intel(R) HD Graphics (Ice Lake 6x8 GT1.5)")
+CHIPSET(0x8A5B, icl_4x8, "Intel(R) HD Graphics (Ice Lake 4x8 GT1)")
+CHIPSET(0x8A5C, icl_6x8, "Intel(R) HD Graphics (Ice Lake 6x8 GT1.5)")
+CHIPSET(0x8A5D, icl_4x8, "Intel(R) HD Graphics (Ice Lake 4x8 GT1)")
+CHIPSET(0x8A71, icl_1x8, "Intel(R) HD Graphics (Ice Lake 1x8 GT0.5)")
+CHIPSET(0xFF05, icl_8x8, "Intel(R) HD Graphics (Ice Lake Simulation)")
diff --git a/src/intel/common/gen_device_info.c b/src/intel/common/gen_device_info.c
index a08a13a32a4..8bf4b6b9bb0 100644
--- a/src/intel/common/gen_device_info.c
+++ b/src/intel/common/gen_device_info.c
@@ -731,6 +731,49 @@ static const struct gen_device_info gen_device_info_cnl_5x8 = {
.is_cannonlake = true,
 };
 
+#define GEN11_HW_INFO   \
+   .gen = 11,   \
+   .has_pln = false,\
+   .max_vs_threads = 364,   \
+   .max_gs_threads = 224,   \
+   .max_tcs_threads = 224,  \
+   .max_tes_threads = 364,  \
+   .max_cs_threads = 56,\
+   .urb = { \
+  .size = 1024, \
+  .min_entries = {  \
+ [MESA_SHADER_VERTEX]= 64,  \
+ [MESA_SHADER_TESS_EVAL] = 34,  \
+  },\
+  .max_entries = {  \
+ [MESA_SHADER_VERTEX]= 2384,\
+ [MESA_SHADER_TESS_CTRL] = 1032,\
+ [MESA_SHADER_TESS_EVAL] = 2384,\
+ [MESA_SHADER_GEOMETRY]  = 1032,\
+  },\
+   }
+
+#define GEN11_FEATURES(_gt, _slices, _l3)   \
+   GEN8_FEATURES,   \
+   GEN11_HW_INFO,   \
+   .gt = _gt, .num_slices = _slices, .l3_banks = _l3
+
+static const struct gen_device_info gen_device_info_icl_8x8 = {
+   GEN11_FEATURES(2, 1, 8),
+};
+
+static const struct gen_device_info gen_device_info_icl_6x8 = {
+   GEN11_FEATURES(1, 1, 6),
+};
+
+static const struct gen_device_info gen_device_info_icl_4x8 = {
+   GEN11_FEATURES(1, 1, 6),
+};
+
+static const struct gen_device_info gen_device_info_icl_1x8 = {
+   GEN11_FEATURES(1, 1, 6),
+};
+
 bool
 gen_get_device_info(int devid, struct gen_device_info *devinfo)
 {
@@ -757,10 +800,21 @@ gen_get_device_info(int devid, struct gen_device_info *devinfo)
 * Extra padding can be necessary depending how the thread IDs are
 * calculated for a particular shader stage.
 */
-   if (devinfo->gen >= 9) {
+
+   switch(devinfo->gen) {
+   case 9:
+   case 10:
   devinfo->max_wm_threads = 64 /* threads-per-PSD */
   * devinfo->num_slices
   * 4; /* effective subslices per slice */
+  break;
+   case 11:
+  devinfo->max_wm_threads = 128 /* threads-per-PSD */
+  * devinfo->num_slices
+  * 8; /* subslices per slice */
+  break;
+   default:
+  break;
}
 
assert(devinfo->num_slices <= ARRAY_SIZE(devinfo->num_subslices));
-- 
2.16.1



[Mesa-dev] [PATCH 04/17] intel/compiler: Add Gen11 register types

2018-02-20 Thread Matt Turner
The hardware register types' encodings have changed on Gen11. Good thing
we have that superfluous-looking brw_reg_type abstraction lying around!
---
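Editorial note: the abstraction pays off because only the lookup table
changes per generation. A minimal sketch with a three-entry subset of the
types; the encodings are the ones visible in the enums of this patch, while
enum reg_type, the toy_* tables and reg_type_to_hw() are hypothetical names
used only for illustration.

```c
#include <assert.h>

/* Hypothetical subset of the abstract register types. */
enum reg_type { TYPE_DF, TYPE_HF, TYPE_B, TYPE_COUNT };

/* Hardware encodings before Gen11 (DF = 6, HF = 10, B = 5). */
static const unsigned toy_gen4_table[TYPE_COUNT] = {
   [TYPE_DF] = 6, [TYPE_HF] = 10, [TYPE_B] = 5,
};

/* Hardware encodings on Gen11 (DF = 10, HF = 8, B = 5). */
static const unsigned toy_gen11_table[TYPE_COUNT] = {
   [TYPE_DF] = 10, [TYPE_HF] = 8, [TYPE_B] = 5,
};

/* Pick the table by generation, then index it by abstract type. */
static unsigned
reg_type_to_hw(int gen, enum reg_type type)
{
   assert(type < TYPE_COUNT);
   const unsigned *table = gen >= 11 ? toy_gen11_table : toy_gen4_table;
   return table[type];
}
```

Note that the value 10 means HF before Gen11 and DF on Gen11, so callers
that compared raw hardware encodings would silently misbehave.
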
 src/intel/compiler/brw_reg_type.c | 73 ++-
 1 file changed, 65 insertions(+), 8 deletions(-)

diff --git a/src/intel/compiler/brw_reg_type.c b/src/intel/compiler/brw_reg_type.c
index b7fff0867f4..c4f8eedeb4b 100644
--- a/src/intel/compiler/brw_reg_type.c
+++ b/src/intel/compiler/brw_reg_type.c
@@ -40,6 +40,18 @@ enum hw_reg_type {
BRW_HW_REG_TYPE_B   = 5,
GEN7_HW_REG_TYPE_DF = 6,
GEN8_HW_REG_TYPE_HF = 10,
+
+   GEN11_HW_REG_TYPE_UD = 0,
+   GEN11_HW_REG_TYPE_D  = 1,
+   GEN11_HW_REG_TYPE_UW = 2,
+   GEN11_HW_REG_TYPE_W  = 3,
+   GEN11_HW_REG_TYPE_UB = 4,
+   GEN11_HW_REG_TYPE_B  = 5,
+   GEN11_HW_REG_TYPE_UQ = 6,
+   GEN11_HW_REG_TYPE_Q  = 7,
+   GEN11_HW_REG_TYPE_HF = 8,
+   GEN11_HW_REG_TYPE_F  = 9,
+   GEN11_HW_REG_TYPE_DF = 10,
 };
 
 enum hw_imm_type {
@@ -56,9 +68,22 @@ enum hw_imm_type {
BRW_HW_IMM_TYPE_V   = 6,
GEN8_HW_IMM_TYPE_DF = 10,
GEN8_HW_IMM_TYPE_HF = 11,
+
+   GEN11_HW_IMM_TYPE_UD = 0,
+   GEN11_HW_IMM_TYPE_D  = 1,
+   GEN11_HW_IMM_TYPE_UW = 2,
+   GEN11_HW_IMM_TYPE_W  = 3,
+   GEN11_HW_IMM_TYPE_UV = 4,
+   GEN11_HW_IMM_TYPE_V  = 5,
+   GEN11_HW_IMM_TYPE_UQ = 6,
+   GEN11_HW_IMM_TYPE_Q  = 7,
+   GEN11_HW_IMM_TYPE_HF = 8,
+   GEN11_HW_IMM_TYPE_F  = 9,
+   GEN11_HW_IMM_TYPE_DF = 10,
+   GEN11_HW_IMM_TYPE_VF = 11,
 };
 
-static const struct {
+static const struct hw_type {
enum hw_reg_type reg_type;
enum hw_imm_type imm_type;
 } gen4_hw_type[] = {
@@ -77,6 +102,22 @@ static const struct {
[BRW_REGISTER_TYPE_UB] = { BRW_HW_REG_TYPE_UB,  INVALID },
[BRW_REGISTER_TYPE_V]  = { INVALID, BRW_HW_IMM_TYPE_V   },
[BRW_REGISTER_TYPE_UV] = { INVALID, BRW_HW_IMM_TYPE_UV  },
+}, gen11_hw_type[] = {
+   [BRW_REGISTER_TYPE_DF] = { GEN11_HW_REG_TYPE_DF, GEN11_HW_IMM_TYPE_DF },
+   [BRW_REGISTER_TYPE_F]  = { GEN11_HW_REG_TYPE_F,  GEN11_HW_IMM_TYPE_F  },
+   [BRW_REGISTER_TYPE_HF] = { GEN11_HW_REG_TYPE_HF, GEN11_HW_IMM_TYPE_HF },
+   [BRW_REGISTER_TYPE_VF] = { INVALID,  GEN11_HW_IMM_TYPE_VF },
+
+   [BRW_REGISTER_TYPE_Q]  = { GEN11_HW_REG_TYPE_Q,  GEN11_HW_IMM_TYPE_Q  },
+   [BRW_REGISTER_TYPE_UQ] = { GEN11_HW_REG_TYPE_UQ, GEN11_HW_IMM_TYPE_UQ },
+   [BRW_REGISTER_TYPE_D]  = { GEN11_HW_REG_TYPE_D,  GEN11_HW_IMM_TYPE_D  },
+   [BRW_REGISTER_TYPE_UD] = { GEN11_HW_REG_TYPE_UD, GEN11_HW_IMM_TYPE_UD },
+   [BRW_REGISTER_TYPE_W]  = { GEN11_HW_REG_TYPE_W,  GEN11_HW_IMM_TYPE_W  },
+   [BRW_REGISTER_TYPE_UW] = { GEN11_HW_REG_TYPE_UW, GEN11_HW_IMM_TYPE_UW },
+   [BRW_REGISTER_TYPE_B]  = { GEN11_HW_REG_TYPE_B,  INVALID  },
+   [BRW_REGISTER_TYPE_UB] = { GEN11_HW_REG_TYPE_UB, INVALID  },
+   [BRW_REGISTER_TYPE_V]  = { INVALID,  GEN11_HW_IMM_TYPE_V  },
+   [BRW_REGISTER_TYPE_UV] = { INVALID,  GEN11_HW_IMM_TYPE_UV },
 };
 
 /* SNB adds 3-src instructions (MAD and LRP) that only operate on floats, so
@@ -147,14 +188,22 @@ brw_reg_type_to_hw_type(const struct gen_device_info *devinfo,
 enum brw_reg_file file,
 enum brw_reg_type type)
 {
-   assert(type < ARRAY_SIZE(gen4_hw_type));
+   const struct hw_type *table;
+
+   if (devinfo->gen >= 11) {
+  assert(type < ARRAY_SIZE(gen11_hw_type));
+  table = gen11_hw_type;
+   } else {
+  assert(type < ARRAY_SIZE(gen4_hw_type));
+  table = gen4_hw_type;
+   }
 
if (file == BRW_IMMEDIATE_VALUE) {
-  assert(gen4_hw_type[type].imm_type != (enum hw_imm_type)INVALID);
-  return gen4_hw_type[type].imm_type;
+  assert(table[type].imm_type != (enum hw_imm_type)INVALID);
+  return table[type].imm_type;
} else {
-  assert(gen4_hw_type[type].reg_type != (enum hw_reg_type)INVALID);
-  return gen4_hw_type[type].reg_type;
+  assert(table[type].reg_type != (enum hw_reg_type)INVALID);
+  return table[type].reg_type;
}
 }
 
@@ -167,15 +216,23 @@ enum brw_reg_type
 brw_hw_type_to_reg_type(const struct gen_device_info *devinfo,
 enum brw_reg_file file, unsigned hw_type)
 {
+   const struct hw_type *table;
+
+   if (devinfo->gen >= 11) {
+  table = gen11_hw_type;
+   } else {
+  table = gen4_hw_type;
+   }
+
if (file == BRW_IMMEDIATE_VALUE) {
   for (enum brw_reg_type i = 0; i <= BRW_REGISTER_TYPE_LAST; i++) {
- if (gen4_hw_type[i].imm_type == (enum hw_imm_type)hw_type) {
+ if (table[i].imm_type == (enum hw_imm_type)hw_type) {
 return i;
  }
   }
} else {
   for (enum brw_reg_type i = 0; i <= BRW_REGISTER_TYPE_LAST; i++) {
- if (gen4_hw_type[i].reg_type == (enum hw_reg_type)hw_type) {
+ if (table[i].reg_type == (enum hw_reg_type)hw_type) {
 return i;
  }
   }
-- 
2.16.1


[Mesa-dev] [PATCH 05/17] intel/compiler: Add Gen11+ native float type

2018-02-20 Thread Matt Turner
This new type exposes the additional precision offered by the
accumulator register and will be used in the next patch to implement the
functionality of the PLN instruction using a pair of MAD instructions.

One weird thing to note: align1 ternary instructions may only have an
accumulator in the dst or src1 normally, but when src0's type is :NF
the accumulator is read.
---
 src/intel/compiler/brw_disasm.c  |  7 +++
 src/intel/compiler/brw_eu_emit.c | 10 --
 src/intel/compiler/brw_eu_validate.c |  1 +
 src/intel/compiler/brw_reg_type.c|  8 
 src/intel/compiler/brw_reg_type.h|  2 ++
 src/intel/compiler/brw_shader.cpp|  6 ++
 6 files changed, 32 insertions(+), 2 deletions(-)

diff --git a/src/intel/compiler/brw_disasm.c b/src/intel/compiler/brw_disasm.c
index 429ed781404..a9a108f8acd 100644
--- a/src/intel/compiler/brw_disasm.c
+++ b/src/intel/compiler/brw_disasm.c
@@ -1035,6 +1035,12 @@ src0_3src(FILE *file, const struct gen_device_info *devinfo, const brw_inst *ins
  reg_nr = brw_inst_3src_src0_reg_nr(devinfo, inst);
  subreg_nr = brw_inst_3src_a1_src0_subreg_nr(devinfo, inst);
  type = brw_inst_3src_a1_src0_type(devinfo, inst);
+  } else if (brw_inst_3src_a1_src0_type(devinfo, inst) ==
+ BRW_REGISTER_TYPE_NF) {
+ _file = BRW_ARCHITECTURE_REGISTER_FILE;
+ reg_nr = brw_inst_3src_src0_reg_nr(devinfo, inst);
+ subreg_nr = brw_inst_3src_a1_src0_subreg_nr(devinfo, inst);
+ type = brw_inst_3src_a1_src0_type(devinfo, inst);
   } else {
  _file = BRW_IMMEDIATE_VALUE;
  uint16_t imm_val = brw_inst_3src_a1_src0_imm(devinfo, inst);
@@ -1288,6 +1294,7 @@ imm(FILE *file, const struct gen_device_info *devinfo, enum brw_reg_type type,
case BRW_REGISTER_TYPE_HF:
   string(file, "Half Float IMM");
   break;
+   case BRW_REGISTER_TYPE_NF:
case BRW_REGISTER_TYPE_UB:
case BRW_REGISTER_TYPE_B:
   format(file, "*** invalid immediate type %d ", type);
diff --git a/src/intel/compiler/brw_eu_emit.c b/src/intel/compiler/brw_eu_emit.c
index c25d8d6eda0..ec871e5aa75 100644
--- a/src/intel/compiler/brw_eu_emit.c
+++ b/src/intel/compiler/brw_eu_emit.c
@@ -771,7 +771,11 @@ brw_alu3(struct brw_codegen *p, unsigned opcode, struct brw_reg dest,
 to_3src_align1_hstride(src2.hstride));
 
   brw_inst_set_3src_a1_src0_subreg_nr(devinfo, inst, src0.subnr);
-  brw_inst_set_3src_src0_reg_nr(devinfo, inst, src0.nr);
+  if (src0.type == BRW_REGISTER_TYPE_NF) {
+ brw_inst_set_3src_src0_reg_nr(devinfo, inst, BRW_ARF_ACCUMULATOR);
+  } else {
+ brw_inst_set_3src_src0_reg_nr(devinfo, inst, src0.nr);
+  }
   brw_inst_set_3src_src0_abs(devinfo, inst, src0.abs);
   brw_inst_set_3src_src0_negate(devinfo, inst, src0.negate);
 
@@ -790,7 +794,9 @@ brw_alu3(struct brw_codegen *p, unsigned opcode, struct 
brw_reg dest,
   brw_inst_set_3src_src2_negate(devinfo, inst, src2.negate);
 
   assert(src0.file == BRW_GENERAL_REGISTER_FILE ||
- src0.file == BRW_IMMEDIATE_VALUE);
+ src0.file == BRW_IMMEDIATE_VALUE ||
+ (src0.file == BRW_ARCHITECTURE_REGISTER_FILE &&
+  src0.type == BRW_REGISTER_TYPE_NF));
   assert(src1.file == BRW_GENERAL_REGISTER_FILE ||
  src1.file == BRW_ARCHITECTURE_REGISTER_FILE);
   assert(src2.file == BRW_GENERAL_REGISTER_FILE ||
diff --git a/src/intel/compiler/brw_eu_validate.c 
b/src/intel/compiler/brw_eu_validate.c
index 6ee6b4ffbe7..d3189d1ef5e 100644
--- a/src/intel/compiler/brw_eu_validate.c
+++ b/src/intel/compiler/brw_eu_validate.c
@@ -277,6 +277,7 @@ static enum brw_reg_type
 execution_type_for_type(enum brw_reg_type type)
 {
switch (type) {
+   case BRW_REGISTER_TYPE_NF:
case BRW_REGISTER_TYPE_DF:
case BRW_REGISTER_TYPE_F:
case BRW_REGISTER_TYPE_HF:
diff --git a/src/intel/compiler/brw_reg_type.c 
b/src/intel/compiler/brw_reg_type.c
index c4f8eedeb4b..3c82eb0a76f 100644
--- a/src/intel/compiler/brw_reg_type.c
+++ b/src/intel/compiler/brw_reg_type.c
@@ -52,6 +52,7 @@ enum hw_reg_type {
GEN11_HW_REG_TYPE_HF = 8,
GEN11_HW_REG_TYPE_F  = 9,
GEN11_HW_REG_TYPE_DF = 10,
+   GEN11_HW_REG_TYPE_NF = 11,
 };
 
 enum hw_imm_type {
@@ -87,6 +88,8 @@ static const struct hw_type {
enum hw_reg_type reg_type;
enum hw_imm_type imm_type;
 } gen4_hw_type[] = {
+   [0 ... BRW_REGISTER_TYPE_LAST] = { INVALID, INVALID },
+
[BRW_REGISTER_TYPE_DF] = { GEN7_HW_REG_TYPE_DF, GEN8_HW_IMM_TYPE_DF },
[BRW_REGISTER_TYPE_F]  = { BRW_HW_REG_TYPE_F,   BRW_HW_IMM_TYPE_F   },
[BRW_REGISTER_TYPE_HF] = { GEN8_HW_REG_TYPE_HF, GEN8_HW_IMM_TYPE_HF },
@@ -103,6 +106,7 @@ static const struct hw_type {
[BRW_REGISTER_TYPE_V]  = { INVALID, BRW_HW_IMM_TYPE_V   },
[BRW_REGISTER_TYPE_UV] = { INVALID, BRW_HW_IMM_TYPE_UV  },
 }, gen11_hw_type[] = {
+   [BRW_REGISTER_TYPE_NF] 

[Mesa-dev] [PATCH 03/17] intel: Disable 64-bit extensions on platforms without 64-bit types

2018-02-20 Thread Matt Turner
Gen11 does not support DF, Q, UQ types in hardware. As a result, we have
to disable some GL extensions until they can be reimplemented.
---
 src/intel/common/gen_device_info.c   | 3 +++
 src/intel/common/gen_device_info.h   | 1 +
 src/mesa/drivers/dri/i965/intel_extensions.c | 9 +
 3 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/src/intel/common/gen_device_info.c 
b/src/intel/common/gen_device_info.c
index 8bf4b6b9bb0..465d4c783a1 100644
--- a/src/intel/common/gen_device_info.c
+++ b/src/intel/common/gen_device_info.c
@@ -138,6 +138,7 @@ static const struct gen_device_info gen_device_info_snb_gt2 
= {
.must_use_separate_stencil = true,   \
.has_llc = true, \
.has_pln = true, \
+   .has_64bit_types = true, \
.has_surface_tile_offset = true, \
.timestamp_frequency = 1250
 
@@ -322,6 +323,7 @@ static const struct gen_device_info gen_device_info_hsw_gt3 
= {
.has_llc = true, \
.has_sample_with_hiz = false,\
.has_pln = true, \
+   .has_64bit_types = true, \
.supports_simd16_3src = true,\
.has_surface_tile_offset = true, \
.max_vs_threads = 504,   \
@@ -756,6 +758,7 @@ static const struct gen_device_info gen_device_info_cnl_5x8 
= {
 #define GEN11_FEATURES(_gt, _slices, _l3)   \
GEN8_FEATURES,   \
GEN11_HW_INFO,   \
+   .has_64bit_types = false,\
.gt = _gt, .num_slices = _slices, .l3_banks = _l3
 
 static const struct gen_device_info gen_device_info_icl_8x8 = {
diff --git a/src/intel/common/gen_device_info.h 
b/src/intel/common/gen_device_info.h
index fd9c17531db..7761eeba7e0 100644
--- a/src/intel/common/gen_device_info.h
+++ b/src/intel/common/gen_device_info.h
@@ -59,6 +59,7 @@ struct gen_device_info
bool has_llc;
 
bool has_pln;
+   bool has_64bit_types;
bool has_compr4;
bool has_surface_tile_offset;
bool supports_simd16_3src;
diff --git a/src/mesa/drivers/dri/i965/intel_extensions.c 
b/src/mesa/drivers/dri/i965/intel_extensions.c
index cc961e051fd..3f5f4dab411 100644
--- a/src/mesa/drivers/dri/i965/intel_extensions.c
+++ b/src/mesa/drivers/dri/i965/intel_extensions.c
@@ -217,7 +217,7 @@ intelInitExtensions(struct gl_context *ctx)
   ctx->Extensions.ARB_derivative_control = true;
   ctx->Extensions.ARB_framebuffer_no_attachments = true;
   ctx->Extensions.ARB_gpu_shader5 = true;
-  ctx->Extensions.ARB_gpu_shader_fp64 = true;
+  ctx->Extensions.ARB_gpu_shader_fp64 = devinfo->has_64bit_types;
   ctx->Extensions.ARB_shader_atomic_counters = true;
   ctx->Extensions.ARB_shader_atomic_counter_ops = true;
   ctx->Extensions.ARB_shader_clock = true;
@@ -229,7 +229,7 @@ intelInitExtensions(struct gl_context *ctx)
   ctx->Extensions.ARB_texture_compression_bptc = true;
   ctx->Extensions.ARB_texture_view = true;
   ctx->Extensions.ARB_shader_storage_buffer_object = true;
-  ctx->Extensions.ARB_vertex_attrib_64bit = true;
+  ctx->Extensions.ARB_vertex_attrib_64bit = devinfo->has_64bit_types;
   ctx->Extensions.EXT_shader_samples_identical = true;
   ctx->Extensions.OES_primitive_bounding_box = true;
   ctx->Extensions.OES_texture_buffer = true;
@@ -279,8 +279,9 @@ intelInitExtensions(struct gl_context *ctx)
}
 
if (devinfo->gen >= 8) {
-  ctx->Extensions.ARB_gpu_shader_int64 = true;
-  ctx->Extensions.ARB_shader_ballot = true; /* requires ARB_gpu_shader_int64 */
+  ctx->Extensions.ARB_gpu_shader_int64 = devinfo->has_64bit_types;
+  /* requires ARB_gpu_shader_int64 */
+  ctx->Extensions.ARB_shader_ballot = devinfo->has_64bit_types;
   ctx->Extensions.ARB_ES3_2_compatibility = true;
}
 
-- 
2.16.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
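
A minimal sketch of the gating in the patch above, assuming a cut-down device-info struct. The names `sketch_devinfo`, `sketch_extensions` and `sketch_init_extensions` are illustrative and not Mesa API; only the four extension flags touched by the diff are modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, cut-down stand-ins for gen_device_info and the GL
 * extension table; only the fields touched by the patch are modelled. */
struct sketch_devinfo {
   int gen;
   bool has_64bit_types;
};

struct sketch_extensions {
   bool arb_gpu_shader_fp64;
   bool arb_vertex_attrib_64bit;
   bool arb_gpu_shader_int64;
   bool arb_shader_ballot;
};

static struct sketch_extensions
sketch_init_extensions(const struct sketch_devinfo *devinfo)
{
   struct sketch_extensions ext = { false, false, false, false };

   /* In this sketch the fp64 extensions depend only on the new flag. */
   ext.arb_gpu_shader_fp64 = devinfo->has_64bit_types;
   ext.arb_vertex_attrib_64bit = devinfo->has_64bit_types;

   if (devinfo->gen >= 8) {
      ext.arb_gpu_shader_int64 = devinfo->has_64bit_types;
      /* ARB_shader_ballot requires ARB_gpu_shader_int64. */
      ext.arb_shader_ballot = devinfo->has_64bit_types;
   }
   return ext;
}
```

With a Gen11-like entry (`has_64bit_types = false`) all four flags stay off, while a Gen9-like entry keeps them on.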


[Mesa-dev] [PATCH 02/17] intel: Add icl pci id for INTEL_DEVID_OVERRIDE

2018-02-20 Thread Matt Turner
From: Anuj Phogat 

Reviewed-by: Matt Turner 
Signed-off-by: Anuj Phogat 
---
 src/mesa/drivers/dri/i965/intel_screen.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/mesa/drivers/dri/i965/intel_screen.c 
b/src/mesa/drivers/dri/i965/intel_screen.c
index ef5aee894fa..0367feb47c2 100644
--- a/src/mesa/drivers/dri/i965/intel_screen.c
+++ b/src/mesa/drivers/dri/i965/intel_screen.c
@@ -2380,6 +2380,7 @@ parse_devid_override(const char *devid_override)
   { "kbl", 0x5912 },
   { "glk", 0x3185 },
   { "cnl", 0x5a52 },
+  { "icl", 0x8a52 }
};
 
for (unsigned i = 0; i < ARRAY_SIZE(name_map); i++) {
-- 
2.16.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
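
A minimal sketch of a name-to-PCI-ID lookup like the one the patch extends. The table holds only the entries visible in the diff context, and the numeric fallback via `strtol` is an assumption about the surrounding function; all `sketch_` names are illustrative.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical mirror of the name table; only the entries visible in
 * the patch context are listed. */
static const struct {
   const char *name;
   int pci_id;
} sketch_name_map[] = {
   { "kbl", 0x5912 },
   { "glk", 0x3185 },
   { "cnl", 0x5a52 },
   { "icl", 0x8a52 },
};

/* Return the PCI ID for a known platform name; otherwise parse the
 * string as a number (assumed fallback, base auto-detected). */
static int
sketch_parse_devid_override(const char *devid_override)
{
   size_t count = sizeof(sketch_name_map) / sizeof(sketch_name_map[0]);

   for (size_t i = 0; i < count; i++) {
      if (strcmp(sketch_name_map[i].name, devid_override) == 0)
         return sketch_name_map[i].pci_id;
   }
   return (int) strtol(devid_override, NULL, 0);
}
```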


Re: [Mesa-dev] [PATCH] intel/gen9+: Enable object level preemption.

2018-02-20 Thread Ben Widawsky

On 18-02-20 09:15:01, Antognolli, Rafael wrote:

On Tue, Feb 20, 2018 at 08:11:14AM -0800, Rafael Antognolli wrote:

On Fri, Feb 16, 2018 at 06:37:55PM -0800, Ben Widawsky wrote:
> On 18-02-16 13:44:00, Antognolli, Rafael wrote:
> > "This field controls the granularity of the replay mechanism when
> > coming back into a previously preempted context."
> >
> > The kernel disables this bit but whitelists the register, and it's a
> > context register. So enable it and take advantage of finer granularity
> > when preemption is available.
> >
>
> Does the kernel actually disable it? I thought the kernel just doesn't touch it
> (I don't think it's whitelisted by the kernel either, it's just writable).

I'm seeing it being disabled at WaDisable3DMidCmdPreemption, seems to be
in effect since commit 5152defe4a53ad15e6d96c422440152302c8abd7.

And it's whitelisted by WaEnablePreemptionGranularityControlByUMD.
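
The quoted patch below programs the bit through a masked register write (it uses REG_MASK and a ReplayModeMask field). A minimal sketch of that convention, assuming bits 31:16 of the dword act as write enables for bits 15:0; the SKETCH_ names and both helpers are illustrative only, not Mesa or kernel API.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout of a masked register: bits 31:16 are per-bit write
 * enables for bits 15:0. Names are illustrative. */
#define SKETCH_REG_MASK(bits)          ((uint32_t)(bits) << 16)
#define SKETCH_REPLAY_MODE_MIDOBJECT   (1u << 0)

/* Dword to load into the register to set only the replay-mode bit. */
static uint32_t
sketch_replay_mode_dword(void)
{
   return SKETCH_REPLAY_MODE_MIDOBJECT |
          SKETCH_REG_MASK(SKETCH_REPLAY_MODE_MIDOBJECT);
}

/* Model of how a masked write would be applied to the old contents:
 * only bits whose write enable is set are changed. */
static uint32_t
sketch_apply_masked_write(uint32_t old_value, uint32_t dword)
{
   uint32_t mask = dword >> 16;
   return (old_value & ~mask & 0xffffu) | (dword & mask);
}
```

Under this assumption, other bits in the register keep their previous value, which is why no read-modify-write is needed.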

> > Signed-off-by: Rafael Antognolli 
> > Cc: Ben Widawsky 
> > ---
> >
> > This patch still needs more testing (only ran it through CI and also did
> > some basic tests on my machine to make sure it's not breaking anything).
> >
> > src/intel/genxml/gen10.xml   |  8 
> > src/intel/genxml/gen11.xml   |  8 
> > src/intel/genxml/gen9.xml|  8 
> > src/intel/vulkan/genX_state.c| 18 ++
> > src/mesa/drivers/dri/i965/brw_defines.h  |  5 +
> > src/mesa/drivers/dri/i965/brw_state_upload.c | 10 ++
> > 6 files changed, 57 insertions(+)
> >
> > diff --git a/src/intel/genxml/gen10.xml b/src/intel/genxml/gen10.xml
> > index 47c679a3fa9..42ac6e82696 100644
> > --- a/src/intel/genxml/gen10.xml
> > +++ b/src/intel/genxml/gen10.xml
> > @@ -3692,6 +3692,14 @@
> > 
> >   
> >
> > +  
> > +
> > +  
> > +  
> > +
> > +
> > +  
> > +
> >   
> > 
> >   
> > diff --git a/src/intel/genxml/gen11.xml b/src/intel/genxml/gen11.xml
> > index 9a8a2fe21e3..e6ce42b2bfb 100644
> > --- a/src/intel/genxml/gen11.xml
> > +++ b/src/intel/genxml/gen11.xml
> > @@ -3688,6 +3688,14 @@
> > 
> >   
> >
> > +  
> > +
> > +  
> > +  
> > +
> > +
> > +  
> > +
> >   
> > 
> >   
> > diff --git a/src/intel/genxml/gen9.xml b/src/intel/genxml/gen9.xml
> > index 7eef4bee013..45e1fddeb50 100644
> > --- a/src/intel/genxml/gen9.xml
> > +++ b/src/intel/genxml/gen9.xml
> > @@ -3638,6 +3638,14 @@
> > 
> >   
> >
> > +  
> > +
> > +  
> > +  
> > +
> > +
> > +  
> > +
> >   
> > 
> >   
> > diff --git a/src/intel/vulkan/genX_state.c b/src/intel/vulkan/genX_state.c
> > index 54fb8634fdc..83b6c6387f3 100644
> > --- a/src/intel/vulkan/genX_state.c
> > +++ b/src/intel/vulkan/genX_state.c
> > @@ -169,6 +169,24 @@ genX(init_device_state)(struct anv_device *device)
> >gen10_emit_wa_lri_to_cache_mode_zero();
> > #endif
> >
> > +#if GEN_GEN >= 9
> > +   /* A fixed function pipe flush is required before modifying this field */
> > +   anv_batch_emit(&batch, GENX(PIPE_CONTROL), pipe) {
> > +  pipe.PipeControlFlushEnable = true;
> > +   }
> > +
> > +   /* enable object level preemption */
> > +   uint32_t csc1;
> > +
> > +   anv_pack_struct(&csc1, GENX(CS_CHICKEN1),
> > +   .ReplayMode = ObjectLevelPreemption,
> > +   .ReplayModeMask = 1);
> > +   anv_batch_emit(&batch, GENX(MI_LOAD_REGISTER_IMM), lri) {
> > +  lri.RegisterOffset   = GENX(CS_CHICKEN1_num);
> > +  lri.DataDWord= csc1;
> > +   }
> > +#endif
> > +
> >anv_batch_emit(&batch, GENX(MI_BATCH_BUFFER_END), bbe);
> >
> >assert(batch.next <= batch.end);
> > diff --git a/src/mesa/drivers/dri/i965/brw_defines.h 
b/src/mesa/drivers/dri/i965/brw_defines.h
> > index 8bf6f68b67c..f0994d3b139 100644
> > --- a/src/mesa/drivers/dri/i965/brw_defines.h
> > +++ b/src/mesa/drivers/dri/i965/brw_defines.h
> > @@ -1661,4 +1661,9 @@ enum brw_pixel_shader_coverage_mask_mode {
> > # define GLK_SCEC_BARRIER_MODE_3D_HULL (1 << 7)
> > # define GLK_SCEC_BARRIER_MODE_MASK    REG_MASK(1 << 7)
> >
> > +#define CS_CHICKEN1                        0x2580 /* Gen9+ */
> > +# define GEN9_REPLAY_MODE_MIDBUFFER (0 << 0)
> > +# define GEN9_REPLAY_MODE_MIDOBJECT (1 << 0)
> > +# define GEN9_REPLAY_MODE_MASK  REG_MASK(1 << 0)
> > +
> > #endif
> > diff --git a/src/mesa/drivers/dri/i965/brw_state_upload.c 
b/src/mesa/drivers/dri/i965/brw_state_upload.c
> > index 86c12e4d357..a90dc01d87b 100644
> > --- a/src/mesa/drivers/dri/i965/brw_state_upload.c
> > +++ b/src/mesa/drivers/dri/i965/brw_state_upload.c
> > @@ -115,6 +115,16 @@ brw_upload_initial_gpu_state(struct brw_context *brw)
> >   OUT_BATCH(0);
> >   ADVANCE_BATCH();
> >}
> > +
> > +   if (devinfo->gen >= 9) {
> > +  /* A fixed function pipe flush is required before modifying this field */
> > +  brw_emit_pipe_control_flush(brw, 

[Mesa-dev] [PATCH] nv50,nvc0: fix clear buffer acceleration

2018-02-20 Thread Ilia Mirkin
Two things were off:
 - valid range was not updated, which could affect waiting for future
   maps
 - fencing was done manually instead of using the *_resource_validate
   helper, which resulted in a missed dirty buffer flag being set

Fixes: KHR-GL45.direct_state_access.buffers_clear
Signed-off-by: Ilia Mirkin 
---

Untested on pre-kepler paths. Pretty similar overall.

 src/gallium/drivers/nouveau/nv50/nv50_surface.c | 20 
 src/gallium/drivers/nouveau/nvc0/nvc0_surface.c | 25 +
 2 files changed, 17 insertions(+), 28 deletions(-)

diff --git a/src/gallium/drivers/nouveau/nv50/nv50_surface.c 
b/src/gallium/drivers/nouveau/nv50/nv50_surface.c
index 908c534b92e..037e14a4d60 100644
--- a/src/gallium/drivers/nouveau/nv50/nv50_surface.c
+++ b/src/gallium/drivers/nouveau/nv50/nv50_surface.c
@@ -672,10 +672,7 @@ nv50_clear_buffer_push(struct pipe_context *pipe,
   count -= nr;
}
 
-   if (buf->mm) {
-  nouveau_fence_ref(nv50->screen->base.fence.current, &buf->fence);
-  nouveau_fence_ref(nv50->screen->base.fence.current, &buf->fence_wr);
-   }
+   nv50_resource_validate(buf, NOUVEAU_BO_WR);
 
nouveau_bufctx_reset(nv50->bufctx, 0);
 }
@@ -727,6 +724,8 @@ nv50_clear_buffer(struct pipe_context *pipe,
   return;
}
 
+   util_range_add(&buf->valid_buffer_range, offset, offset + size);
+
assert(size % data_size == 0);
 
if (offset & 0xff) {
@@ -747,10 +746,10 @@ nv50_clear_buffer(struct pipe_context *pipe,
assert(width > 0);
 
BEGIN_NV04(push, NV50_3D(CLEAR_COLOR(0)), 4);
-   PUSH_DATAf(push, color.f[0]);
-   PUSH_DATAf(push, color.f[1]);
-   PUSH_DATAf(push, color.f[2]);
-   PUSH_DATAf(push, color.f[3]);
+   PUSH_DATA (push, color.ui[0]);
+   PUSH_DATA (push, color.ui[1]);
+   PUSH_DATA (push, color.ui[2]);
+   PUSH_DATA (push, color.ui[3]);
 
if (nouveau_pushbuf_space(push, 64, 1, 0))
   return;
@@ -796,10 +795,7 @@ nv50_clear_buffer(struct pipe_context *pipe,
BEGIN_NV04(push, NV50_3D(COND_MODE), 1);
PUSH_DATA (push, nv50->cond_condmode);
 
-   if (buf->mm) {
-  nouveau_fence_ref(nv50->screen->base.fence.current, &buf->fence);
-  nouveau_fence_ref(nv50->screen->base.fence.current, &buf->fence_wr);
-   }
+   nv50_resource_validate(buf, NOUVEAU_BO_WR);
 
if (width * height != elements) {
   offset += width * height * data_size;
diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_surface.c 
b/src/gallium/drivers/nouveau/nvc0/nvc0_surface.c
index 9445c05f3ab..0f86c11b7f4 100644
--- a/src/gallium/drivers/nouveau/nvc0/nvc0_surface.c
+++ b/src/gallium/drivers/nouveau/nvc0/nvc0_surface.c
@@ -403,10 +403,7 @@ nvc0_clear_buffer_push_nvc0(struct pipe_context *pipe,
   size -= nr * 4;
}
 
-   if (buf->mm) {
-  nouveau_fence_ref(nvc0->screen->base.fence.current, &buf->fence);
-  nouveau_fence_ref(nvc0->screen->base.fence.current, &buf->fence_wr);
-   }
+   nvc0_resource_validate(buf, NOUVEAU_BO_WR);
 
nouveau_bufctx_reset(nvc0->bufctx, 0);
 }
@@ -453,10 +450,7 @@ nvc0_clear_buffer_push_nve4(struct pipe_context *pipe,
   size -= nr * 4;
}
 
-   if (buf->mm) {
-  nouveau_fence_ref(nvc0->screen->base.fence.current, &buf->fence);
-  nouveau_fence_ref(nvc0->screen->base.fence.current, &buf->fence_wr);
-   }
+   nvc0_resource_validate(buf, NOUVEAU_BO_WR);
 
nouveau_bufctx_reset(nvc0->bufctx, 0);
 }
@@ -540,6 +534,8 @@ nvc0_clear_buffer(struct pipe_context *pipe,
   return;
}
 
+   util_range_add(&buf->valid_buffer_range, offset, offset + size);
+
assert(size % data_size == 0);
 
if (data_size == 12) {
@@ -570,10 +566,10 @@ nvc0_clear_buffer(struct pipe_context *pipe,
PUSH_REFN (push, buf->bo, buf->domain | NOUVEAU_BO_WR);
 
BEGIN_NVC0(push, NVC0_3D(CLEAR_COLOR(0)), 4);
-   PUSH_DATAf(push, color.f[0]);
-   PUSH_DATAf(push, color.f[1]);
-   PUSH_DATAf(push, color.f[2]);
-   PUSH_DATAf(push, color.f[3]);
+   PUSH_DATA (push, color.ui[0]);
+   PUSH_DATA (push, color.ui[1]);
+   PUSH_DATA (push, color.ui[2]);
+   PUSH_DATA (push, color.ui[3]);
BEGIN_NVC0(push, NVC0_3D(SCREEN_SCISSOR_HORIZ), 2);
PUSH_DATA (push, width << 16);
PUSH_DATA (push, height << 16);
@@ -600,10 +596,7 @@ nvc0_clear_buffer(struct pipe_context *pipe,
 
IMMED_NVC0(push, NVC0_3D(COND_MODE), nvc0->cond_condmode);
 
-   if (buf->mm) {
-  nouveau_fence_ref(nvc0->screen->base.fence.current, &buf->fence);
-  nouveau_fence_ref(nvc0->screen->base.fence.current, &buf->fence_wr);
-   }
+   nvc0_resource_validate(buf, NOUVEAU_BO_WR);
 
if (width * height != elements) {
   offset += width * height * data_size;
-- 
2.16.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
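
A minimal sketch of the valid-range bookkeeping the commit message refers to, assuming a range is a simple [start, end) interval that only ever grows. The `sketch_range` type and its helpers are hypothetical stand-ins, not the gallium `util_range` implementation.

```c
#include <assert.h>

/* Hypothetical stand-in for a buffer's valid range; start > end means
 * that nothing has been written yet. */
struct sketch_range {
   unsigned start;
   unsigned end;
};

static void
sketch_range_set_empty(struct sketch_range *range)
{
   range->start = ~0u;
   range->end = 0;
}

/* Grow the range so that it also covers [start, end). */
static void
sketch_range_add(struct sketch_range *range, unsigned start, unsigned end)
{
   if (start < range->start)
      range->start = start;
   if (end > range->end)
      range->end = end;
}

/* Nonzero if [start, end) overlaps data that was marked valid. */
static int
sketch_range_intersects(const struct sketch_range *range,
                        unsigned start, unsigned end)
{
   return start < range->end && end > range->start;
}
```

If a clear forgets to call the add step, a later map of the cleared region would look like a map of untouched memory in this model, which is presumably the kind of missed wait the patch guards against.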


Re: [Mesa-dev] [PATCH v1 0/7] Implement common gralloc_handle_t in libdrm

2018-02-20 Thread Tomasz Figa
On Wed, Feb 21, 2018 at 4:03 AM, Rob Herring  wrote:
> On Tue, Feb 20, 2018 at 4:26 AM, Tomasz Figa  wrote:
>> On Tue, Feb 20, 2018 at 6:51 PM, Robert Foss  
>> wrote:
>>> Hey Tomasz,
>>>
>>> On 02/20/2018 09:55 AM, Tomasz Figa wrote:

 Hi Rob,

 On Fri, Feb 16, 2018 at 11:48 PM, Tomasz Figa  wrote:
>
> On Fri, Feb 16, 2018 at 11:33 PM, Robert Foss 
> wrote:
>>
>> Hey Tomasz,
>>
>>
>> On 02/16/2018 05:10 AM, Tomasz Figa wrote:
>>>
>>>
>>> On Fri, Feb 9, 2018 at 11:06 PM, Rob Herring  wrote:


 On Fri, Feb 9, 2018 at 3:58 AM, Tomasz Figa >>
>>> On Fri, Feb 2, 2018 at 2:01 AM, Tomasz Figa 
>>>
>>> wrote:


 Hi Rob,

 On Tue, Jan 30, 2018 at 9:36 PM, Robert Foss
  wrote:
>>
>>
>> uint32_t (*get_fd)(buffer_handle_t handle, uint32_t
>> plane);
>> uint64_t (*get_modifier)(buffer_handle_t handle,
>> uint32_t
>> plane);
>> uint32_t (*get_offsets)(buffer_handle_t handle,
>> uint32_t
>> plane);
>> uint32_t (*get_stride)(buffer_handle_t handle,
>> uint32_t
>> plane);
>> ...
>> } gralloc_funcs_t;
>>
>>
>>
>>
>> These ones? >
>> Yeah, if we could retrieve such function pointer struct using
>> perform
>> or any equivalent (like the implementation-specific methods in
>> gralloc1, but not sure if that's going to be used in practice
>> anywhere), it could work for us.
>
>
>
>
> So this is where you and Rob Herring lose me, I don't think I
> understand
> quite how the gralloc1 call would be used, and how it would tie
> into
> this
> handle struct. I think I could do with some guidance on this.



 This would be very similar to gralloc0 perform call. gralloc1
 implementations need to provide getFunction() callback [1], which
 returns a pointer to given function. The list of standard
 functions
 is
 defined in the gralloc1.h header [2], but we could take some
 random
 big number and use it for our function that fills in provided
 gralloc_funcs_t struct with necessary pointers.

 [1]

 https://android.googlesource.com/platform/hardware/libhardware/+/master/include/hardware/gralloc1.h#300
 [2]

 https://android.googlesource.com/platform/hardware/libhardware/+/master/include/hardware/gralloc1.h#134
>>>
>>>
>>>
>>> This is a deadend because it won't work with a HIDL based
>>> implementation (aka gralloc 2.0). You can't set function pointers
>>> (or
>>> any pointers) because gralloc runs in a different process. Yes,
>>> currently gralloc is a pass-thru HAL, but AIUI that will go away.
>>
>>
>>
>> Part of it. I can't see IMapper being implemented by a separate
>> process. You can't map a buffer into one process from another
>> process.
>>
>> But anyway, it's a good point, thanks, I almost forgot about its
>> existence. I'll do further investigation.
>
>
>
> Okay, so IMapper indeed breaks the approach I suggested. I'm not sure
> at the moment what we could do about it. (The idea of a dynamic
> library of a pre-defined name, exporting functions we specify, might
> still work, though.)
>
> Note that the DRM_GRALLOC_GET_FD used currently by Mesa will also be
> impossible to implement with IAllocator/IMapper. (Although I still
> think Mesa and Gralloc are free to have separate logic for choosing
> the DRM device to use.)



 I think the need for GET_FD goes away when the render node is used. We
 may still need the card node for s/w rendering (if I can ever get that
 working) though. Of course, if we use the vgem approach like CrOS then
 we wouldn't.
>>>
>>>
>>>
>>> Hmm, if so, then we probably wouldn't have any strict need for these
>>> function pointers anymore. We already have a makeshift format resolve
>>> in place and 

Re: [Mesa-dev] [PATCH v5 01/34] st/glsl_to_nir: run lower_output_reads on !PIPE_CAP_TGSI_CAN_READ_OUTPUTS

2018-02-20 Thread Timothy Arceri

Reviewed-by: Timothy Arceri 

On 21/02/18 08:02, Karol Herbst wrote:

this is required for Nouveau

Signed-off-by: Karol Herbst 
---
  src/mesa/state_tracker/st_glsl_to_nir.cpp | 6 ++
  1 file changed, 6 insertions(+)

diff --git a/src/mesa/state_tracker/st_glsl_to_nir.cpp 
b/src/mesa/state_tracker/st_glsl_to_nir.cpp
index 765c827d93..f6f55afe40 100644
--- a/src/mesa/state_tracker/st_glsl_to_nir.cpp
+++ b/src/mesa/state_tracker/st_glsl_to_nir.cpp
@@ -43,6 +43,7 @@
  #include "compiler/glsl_types.h"
  #include "compiler/glsl/glsl_to_nir.h"
  #include "compiler/glsl/ir.h"
+#include "compiler/glsl/ir_optimization.h"
  #include "compiler/glsl/string_to_uint_map.h"
  
  
@@ -471,6 +472,7 @@ st_nir_get_mesa_program(struct gl_context *ctx,

  struct gl_linked_shader *shader)
  {
 struct st_context *st = st_context(ctx);
+   struct pipe_screen *pscreen = ctx->st->pipe->screen;
 struct gl_program *prog;
  
 validate_ir_tree(shader->ir);

@@ -483,6 +485,10 @@ st_nir_get_mesa_program(struct gl_context *ctx,
 _mesa_generate_parameters_list_for_uniforms(ctx, shader_program, shader,
 prog->Parameters);
  
+   /* Remove reads from output registers. */

+   if (!pscreen->get_param(pscreen, PIPE_CAP_TGSI_CAN_READ_OUTPUTS))
+  lower_output_reads(shader->Stage, shader->ir);
+
 if (ctx->_Shader->Flags & GLSL_DUMP) {
_mesa_log("\n");
_mesa_log("GLSL IR for linked %s program %d:\n",


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 1/2] ac/nir: set the DA field when performing atomics on 3D images

2018-02-20 Thread Timothy Arceri

On 21/02/18 07:29, Samuel Pitoiset wrote:
On VI, 3D images are considered as 2D arrays. RadeonSI sets DA for 
loads/stores/atomics and RADV only for loads/stores, so I guess there is 
a reason for that?


I've changed the nir->llvm code recently in order to fix some piglit 
tests on the radeonsi nir backend.


[1] 
https://cgit.freedesktop.org/mesa/mesa/commit/?id=e68150de263156a3f3d1b609b6506c5649967f61
[2] 
https://cgit.freedesktop.org/mesa/mesa/commit/?id=82adf53308c137ce0dc5f2d5da4e7cc40c5b808c




Anyway, there is a potential issue on the RADV side I think.

On 02/20/2018 04:43 PM, Nicolai Hähnle wrote:

Why? 3D images are not arrays.

On 20.02.2018 11:11, Samuel Pitoiset wrote:

This doesn't fix anything known but it should definitely be set.

Signed-off-by: Samuel Pitoiset 
---
  src/amd/common/ac_nir_to_llvm.c | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/amd/common/ac_nir_to_llvm.c 
b/src/amd/common/ac_nir_to_llvm.c

index dc471de977..9244f8bc7b 100644
--- a/src/amd/common/ac_nir_to_llvm.c
+++ b/src/amd/common/ac_nir_to_llvm.c
@@ -3764,7 +3764,8 @@ static LLVMValueRef visit_image_atomic(struct 
ac_nir_context *ctx,

  char coords_type[8];
  bool da = glsl_sampler_type_is_array(type) ||
-  glsl_get_sampler_dim(type) == GLSL_SAMPLER_DIM_CUBE;
+  glsl_get_sampler_dim(type) == 
GLSL_SAMPLER_DIM_CUBE ||

+  glsl_get_sampler_dim(type) == GLSL_SAMPLER_DIM_3D;
  LLVMValueRef coords = params[param_count++] = 
get_image_coords(ctx, instr);
  params[param_count++] = get_sampler_desc(ctx, 
instr->variables[0], AC_DESC_IMAGE,






___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
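
A minimal sketch of the DA computation as it would look after the proposed change, assuming a reduced set of sampler dimensions. The enum and function names are hypothetical; whether 3D images should set DA at all is exactly what the thread is debating.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of the GLSL sampler dimensions. */
enum sketch_sampler_dim {
   SKETCH_DIM_1D,
   SKETCH_DIM_2D,
   SKETCH_DIM_3D,
   SKETCH_DIM_CUBE,
};

/* DA as computed after the proposed change: arrays, cubes and 3D. */
static bool
sketch_image_atomic_da(enum sketch_sampler_dim dim, bool is_array)
{
   return is_array || dim == SKETCH_DIM_CUBE || dim == SKETCH_DIM_3D;
}
```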



Re: [Mesa-dev] [PATCH 18.0] i965: Disable ARB_get_program_binary for compat profiles

2018-02-20 Thread Timothy Arceri

On 21/02/18 13:21, Ilia Mirkin wrote:

Is this worth doing for st/mesa as well? Some quick grepping suggests
it's enabled on the 18.0 branch there too, but it's behind a
conditional which perhaps is never set.


Yes, the st will need a change too, as it will be enabled for any driver 
that enables the disk cache (which is most drivers). The Qt bug has been 
observed on radeonsi.




On Tue, Feb 20, 2018 at 9:12 PM, Jordan Justen
 wrote:

The QT framework has a bug in their shader program cache, which is
built on GL_ARB_get_program_binary.

In an effort to allow them to fix the bug we don't enable more than 1
binary format for compatibility profiles.

This is only being done on the 18.0 release branch.

Ref: https://bugreports.qt.io/browse/QTBUG-66420
Ref: https://bugs.freedesktop.org/show_bug.cgi?id=105065
Cc: "18.0" 
Cc: Mark Janes 
Cc: Kenneth Graunke 
Cc: Scott D Phillips 
Signed-off-by: Jordan Justen 
---
  docs/relnotes/17.4.0.html   | 2 +-
  src/mesa/drivers/dri/i965/brw_context.c | 9 -
  2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/docs/relnotes/17.4.0.html b/docs/relnotes/17.4.0.html
index 412c0fc455e..fecdfe77969 100644
--- a/docs/relnotes/17.4.0.html
+++ b/docs/relnotes/17.4.0.html
@@ -53,7 +53,7 @@ Note: some of the new features are only available with 
certain drivers.
  GL_ARB_enhanced_layouts on r600/evergreen+
  GL_ARB_bindless_texture on nvc0/kepler
  OpenGL 4.3 on r600/evergreen with hw fp64 support
-Support 1 binary format for GL_ARB_get_program_binary on i965
+Support 1 binary format for GL_ARB_get_program_binary on i965 (except in GL 
compatibility profiles)
  

  Bug fixes
diff --git a/src/mesa/drivers/dri/i965/brw_context.c 
b/src/mesa/drivers/dri/i965/brw_context.c
index e9358b7bc9c..58527d77263 100644
--- a/src/mesa/drivers/dri/i965/brw_context.c
+++ b/src/mesa/drivers/dri/i965/brw_context.c
@@ -704,7 +704,14 @@ brw_initialize_context_constants(struct brw_context *brw)
ctx->Const.AllowMappedBuffersDuringExecution = true;

 /* GL_ARB_get_program_binary */
-   ctx->Const.NumProgramBinaryFormats = 1;
+   /* The QT framework has a bug in their shader program cache, which is built
+* on GL_ARB_get_program_binary. In an effort to allow them to fix the bug
+* we don't enable more than 1 binary format for compatibility profiles.
+* This is only being done on the 18.0 release branch.
+*/
+   if (ctx->API != API_OPENGL_COMPAT) {
+  ctx->Const.NumProgramBinaryFormats = 1;
+   }
  }

  static void
--
2.16.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
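
A minimal sketch of the effect of the i965 change, assuming the format count is zero unless explicitly set. The enum and function are hypothetical stand-ins for the context API field and `NumProgramBinaryFormats`.

```c
#include <assert.h>

/* Hypothetical stand-ins for the GL API kinds a context can have. */
enum sketch_api {
   SKETCH_API_OPENGL_COMPAT,
   SKETCH_API_OPENGL_CORE,
   SKETCH_API_OPENGLES2,
};

/* Number of program binary formats advertised in this sketch:
 * none for compatibility profiles, one everywhere else. */
static unsigned
sketch_num_program_binary_formats(enum sketch_api api)
{
   return api != SKETCH_API_OPENGL_COMPAT ? 1 : 0;
}
```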





[Mesa-dev] [PATCH] nir: remove old assert

2018-02-20 Thread Timothy Arceri
This was originally intended to make sure the remap location
was not -1. However, the code has changed a lot since then:
the location is now never set to -1 and we also handle
components, meaning this old assert has been doing comparisons
with the pointer to the array of component data.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105183
---
 src/compiler/nir/nir_linking_helpers.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/src/compiler/nir/nir_linking_helpers.c 
b/src/compiler/nir/nir_linking_helpers.c
index 6459c6a24d..2b0a2668a3 100644
--- a/src/compiler/nir/nir_linking_helpers.c
+++ b/src/compiler/nir/nir_linking_helpers.c
@@ -283,7 +283,6 @@ remap_slots_and_components(struct exec_list *var_list, 
gl_shader_stage stage,
   if (var->data.location >= VARYING_SLOT_VAR0 &&
   var->data.location - VARYING_SLOT_VAR0 < 32) {
  assert(var->data.location - VARYING_SLOT_VAR0 < 32);
- assert(remap[var->data.location - VARYING_SLOT_VAR0] >= 0);
 
  const struct glsl_type *type = var->type;
  if (nir_is_per_vertex_io(var, stage)) {
-- 
2.14.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
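
A minimal sketch of why the removed assert could not fail, assuming the remap table is a two-dimensional array indexed by slot and component. The struct, its field names and both helpers are hypothetical; they only illustrate that indexing by slot alone yields an array, which decays to a non-NULL pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-component remap entry; field names are illustrative. */
struct sketch_varying_loc {
   uint8_t component;
   uint32_t location;
};

/* What the old check effectively looked at: remap[slot] is an array of
 * four entries, so it decays to a pointer, which is never NULL. */
static bool
sketch_old_check(struct sketch_varying_loc (*remap)[4], unsigned slot)
{
   const void *row = remap[slot];
   return row != NULL;
}

/* Reading actual data has to name both a component and a field. */
static uint32_t
sketch_entry_location(struct sketch_varying_loc (*remap)[4],
                      unsigned slot, unsigned component)
{
   return remap[slot][component].location;
}
```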


Re: [Mesa-dev] [PATCH 5/7] ac/radeonsi: pass bindless bool to load_sampler_desc()

2018-02-20 Thread Marek Olšák
On Wed, Feb 21, 2018 at 2:21 AM, Timothy Arceri  wrote:
>
>
> On 21/02/18 12:10, Marek Olšák wrote:
>>
>> On Wed, Feb 21, 2018 at 12:50 AM, Timothy Arceri 
>> wrote:
>>>
>>> On 21/02/18 10:33, Marek Olšák wrote:


 On Tue, Feb 20, 2018 at 11:51 PM, Timothy Arceri 
 wrote:
>
>
> On 21/02/18 09:46, Marek Olšák wrote:
>>
>>
>>
>> On Tue, Feb 20, 2018 at 11:42 PM, Marek Olšák 
>> wrote:
>>>
>>>
>>>
>>> For patches 1-5:
>>>
>>> Reviewed-by: Marek Olšák 
>>
>>
>>
>>
>> Actually no. Only patches 1, 3, 5 are reviewed by me.
>>
>> Marek
>
>
>
>
> Do you have an issue with patch 4?



 No, I'm just not sure if it's correct. It calls
 st_nir_lookup_parameter_index, but bindless handles are just
 variables. I think it should just visit the whole expression leading
 to the bindless variable in a generic way and not treat it as a
 uniform.
>>>
>>>
>>>
>>> I'm not sure I understand. We use uniform storage for bindless in tgsi
>>> also.
>>
>>
>> A bindless (sampler or buffer) variable is represented as a 64-bit
>> number in the GL API. It can be passed to shaders in many different
>> ways. For example, a bindless sampler2D variable can be a vertex
>> shader input (loaded from a vertex buffer).
>
>
> Right I should have specified this series does not yet handle bindless
> input/output support, that will require more updates to nir itself as those
> shaders currently trip asserts. Patch 4 however is specifically about
> bindless uniforms.

OK. Patch 4 also has my Rb.

Marek
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 2/2] virgl: reduce some default capset limits.

2018-02-20 Thread Stéphane Marchesin
On Tue, Feb 20, 2018 at 5:49 PM, Dave Airlie  wrote:
> From: Dave Airlie 
>
> Since v2 might take a while to rollout, we should reduce
> these inside some gathered minimums and then v2 can increase
> them using host values.
>
> Signed-off-by: Dave Airlie 

Reviewed-by: Stéphane Marchesin 

> ---
>  src/gallium/drivers/virgl/virgl_winsys.h | 16 
>  1 file changed, 8 insertions(+), 8 deletions(-)
>
> diff --git a/src/gallium/drivers/virgl/virgl_winsys.h 
> b/src/gallium/drivers/virgl/virgl_winsys.h
> index d633678597b..95e21a8afde 100644
> --- a/src/gallium/drivers/virgl/virgl_winsys.h
> +++ b/src/gallium/drivers/virgl/virgl_winsys.h
> @@ -114,17 +114,17 @@ struct virgl_winsys {
>   */
>  static inline void virgl_ws_fill_new_caps_defaults(struct virgl_drm_caps 
> *caps)
>  {
> -   caps->caps.v2.min_aliased_point_size = 0.f;
> +   caps->caps.v2.min_aliased_point_size = 1.f;
> caps->caps.v2.max_aliased_point_size = 255.f;
> -   caps->caps.v2.min_smooth_point_size = 0.f;
> -   caps->caps.v2.max_smooth_point_size = 255.f;
> -   caps->caps.v2.min_aliased_line_width = 0.f;
> -   caps->caps.v2.max_aliased_line_width = 255.f;
> +   caps->caps.v2.min_smooth_point_size = 1.f;
> +   caps->caps.v2.max_smooth_point_size = 190.f;
> +   caps->caps.v2.min_aliased_line_width = 1.f;
> +   caps->caps.v2.max_aliased_line_width = 10.f;
> caps->caps.v2.min_smooth_line_width = 0.f;
> -   caps->caps.v2.max_smooth_line_width = 255.f;
> -   caps->caps.v2.max_texture_lod_bias = 16.0f;
> +   caps->caps.v2.max_smooth_line_width = 10.f;
> +   caps->caps.v2.max_texture_lod_bias = 15.0f;
> caps->caps.v2.max_geom_output_vertices = 256;
> -   caps->caps.v2.max_geom_total_output_components = 16384;
> +   caps->caps.v2.max_geom_total_output_components = 1024;
> caps->caps.v2.max_vertex_outputs = 32;
> caps->caps.v2.max_vertex_attribs = 16;
> caps->caps.v2.max_shader_patch_varyings = 0;
> --
> 2.14.3
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
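
A minimal sketch of the defaults-then-override flow described in the two virgl patches, assuming the host values simply replace the defaults when a newer capset is available. The struct is a hypothetical three-field subset and the default values are the reduced ones from the diff; the function names are illustrative.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical subset of the v2 caps. */
struct sketch_caps_v2 {
   float max_aliased_line_width;
   float max_texture_lod_bias;
   unsigned max_geom_total_output_components;
};

/* Conservative defaults, using the reduced values from the patch. */
static void
sketch_fill_defaults(struct sketch_caps_v2 *caps)
{
   caps->max_aliased_line_width = 10.f;
   caps->max_texture_lod_bias = 15.0f;
   caps->max_geom_total_output_components = 1024;
}

/* host may be NULL when only the older capset is available. */
static void
sketch_get_caps(struct sketch_caps_v2 *caps,
                const struct sketch_caps_v2 *host)
{
   sketch_fill_defaults(caps);
   if (host)
      *caps = *host;
}
```

Filling the defaults first means a guest on an older host still sees sane limits, and a newer host can raise them.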


Re: [Mesa-dev] [PATCH 1/2] virgl: handle getting new capsets.

2018-02-20 Thread Stéphane Marchesin
On Tue, Feb 20, 2018 at 5:49 PM, Dave Airlie  wrote:
> From: Dave Airlie 
>
> This checks the kernel api is new enough and asks for the
> larger caps size since the kernel won't mess it up now.
>
> Signed-off-by: Dave Airlie 

Reviewed-by: Stéphane Marchesin 

> ---
>  src/gallium/drivers/virgl/virgl_winsys.h   | 25 ++-
>  src/gallium/winsys/virgl/drm/virgl_drm_winsys.c| 52 
> ++
>  src/gallium/winsys/virgl/drm/virgl_drm_winsys.h|  1 +
>  src/gallium/winsys/virgl/drm/virtgpu_drm.h |  1 +
>  .../winsys/virgl/vtest/virgl_vtest_socket.c|  2 +-
>  .../winsys/virgl/vtest/virgl_vtest_winsys.c|  2 +
>  6 files changed, 52 insertions(+), 31 deletions(-)
>
> diff --git a/src/gallium/drivers/virgl/virgl_winsys.h 
> b/src/gallium/drivers/virgl/virgl_winsys.h
> index ea21f2b6712..d633678597b 100644
> --- a/src/gallium/drivers/virgl/virgl_winsys.h
> +++ b/src/gallium/drivers/virgl/virgl_winsys.h
> @@ -109,5 +109,28 @@ struct virgl_winsys {
>   struct pipe_box *sub_box);
>  };
>
> -
> +/* this defaults all newer caps,
> + * the kernel will overwrite these if newer version is available.
> + */
> +static inline void virgl_ws_fill_new_caps_defaults(struct virgl_drm_caps *caps)
> +{
> +   caps->caps.v2.min_aliased_point_size = 0.f;
> +   caps->caps.v2.max_aliased_point_size = 255.f;
> +   caps->caps.v2.min_smooth_point_size = 0.f;
> +   caps->caps.v2.max_smooth_point_size = 255.f;
> +   caps->caps.v2.min_aliased_line_width = 0.f;
> +   caps->caps.v2.max_aliased_line_width = 255.f;
> +   caps->caps.v2.min_smooth_line_width = 0.f;
> +   caps->caps.v2.max_smooth_line_width = 255.f;
> +   caps->caps.v2.max_texture_lod_bias = 16.0f;
> +   caps->caps.v2.max_geom_output_vertices = 256;
> +   caps->caps.v2.max_geom_total_output_components = 16384;
> +   caps->caps.v2.max_vertex_outputs = 32;
> +   caps->caps.v2.max_vertex_attribs = 16;
> +   caps->caps.v2.max_shader_patch_varyings = 0;
> +   caps->caps.v2.min_texel_offset = -8;
> +   caps->caps.v2.max_texel_offset = 7;
> +   caps->caps.v2.min_texture_gather_offset = -8;
> +   caps->caps.v2.max_texture_gather_offset = 7;
> +}
>  #endif
> diff --git a/src/gallium/winsys/virgl/drm/virgl_drm_winsys.c 
> b/src/gallium/winsys/virgl/drm/virgl_drm_winsys.c
> index fd6ae98a515..77854680e59 100644
> --- a/src/gallium/winsys/virgl/drm/virgl_drm_winsys.c
> +++ b/src/gallium/winsys/virgl/drm/virgl_drm_winsys.c
> @@ -705,46 +705,28 @@ static int virgl_drm_get_caps(struct virgl_winsys *vws,
> struct virgl_drm_winsys *vdws = virgl_drm_winsys(vws);
> struct drm_virtgpu_get_caps args;
> int ret;
> -   bool fill_v2 = false;
>
> -   memset(&args, 0, sizeof(args));
> +   virgl_ws_fill_new_caps_defaults(caps);
>
> -   args.cap_set_id = 1;
> +   memset(&args, 0, sizeof(args));
> +   if (vdws->has_capset_query_fix) {
> +  /* if we have the query fix - try and get cap set id 2 first */
> +  args.cap_set_id = 2;
> +  args.size = sizeof(union virgl_caps);
> +   } else {
> +  args.cap_set_id = 1;
> +  args.size = sizeof(struct virgl_caps_v1);
> +   }
> args.addr = (unsigned long)&caps->caps;
> -   args.size = sizeof(union virgl_caps);
>
> ret = drmIoctl(vdws->fd, DRM_IOCTL_VIRTGPU_GET_CAPS, &args);
> -
> if (ret == -1 && errno == EINVAL) {
>/* Fallback to v1 */
> +  args.cap_set_id = 1;
>args.size = sizeof(struct virgl_caps_v1);
>ret = drmIoctl(vdws->fd, DRM_IOCTL_VIRTGPU_GET_CAPS, &args);
>if (ret == -1)
>return ret;
> -  fill_v2 = true;
> -   }
> -   if (caps->caps.max_version == 1)
> -   fill_v2 = true;
> -
> -   if (fill_v2) {
> -  caps->caps.v2.min_aliased_point_size = 0.f;
> -  caps->caps.v2.max_aliased_point_size = 255.f;
> -  caps->caps.v2.min_smooth_point_size = 0.f;
> -  caps->caps.v2.max_smooth_point_size = 255.f;
> -  caps->caps.v2.min_aliased_line_width = 0.f;
> -  caps->caps.v2.max_aliased_line_width = 255.f;
> -  caps->caps.v2.min_smooth_line_width = 0.f;
> -  caps->caps.v2.max_smooth_line_width = 255.f;
> -  caps->caps.v2.max_texture_lod_bias = 16.0f;
> -  caps->caps.v2.max_geom_output_vertices = 256;
> -  caps->caps.v2.max_geom_total_output_components = 16384;
> -  caps->caps.v2.max_vertex_outputs = 32;
> -  caps->caps.v2.max_vertex_attribs = 16;
> -  caps->caps.v2.max_shader_patch_varyings = 0;
> -  caps->caps.v2.min_texel_offset = -8;
> -  caps->caps.v2.max_texel_offset = 7;
> -  caps->caps.v2.min_texture_gather_offset = -8;
> -  caps->caps.v2.max_texture_gather_offset = 7;
> }
> return ret;
>  }
> @@ -813,6 +795,8 @@ static struct virgl_winsys *
>  virgl_drm_winsys_create(int drmFD)
>  {
> struct virgl_drm_winsys *qdws;
> +   int ret;
> +   struct drm_virtgpu_getparam getparam = {0};
>
> qdws = CALLOC_STRUCT(virgl_drm_winsys);
> if (!qdws)
> @@ 

Re: [Mesa-dev] [PATCH 18.0] i965: Disable ARB_get_program_binary for compat profiles

2018-02-20 Thread Ilia Mirkin
Is this worth doing for st/mesa as well? Some quick grepping suggests
it's enabled on the 18.0 branch there too, but it's behind a
conditional which perhaps is never set.

On Tue, Feb 20, 2018 at 9:12 PM, Jordan Justen
 wrote:
> The QT framework has a bug in their shader program cache, which is
> built on GL_ARB_get_program_binary.
>
> In an effort to allow them to fix the bug we don't enable more than 1
> binary format for compatibility profiles.
>
> This is only being done on the 18.0 release branch.
>
> Ref: https://bugreports.qt.io/browse/QTBUG-66420
> Ref: https://bugs.freedesktop.org/show_bug.cgi?id=105065
> Cc: "18.0" 
> Cc: Mark Janes 
> Cc: Kenneth Graunke 
> Cc: Scott D Phillips 
> Signed-off-by: Jordan Justen 
> ---
>  docs/relnotes/17.4.0.html   | 2 +-
>  src/mesa/drivers/dri/i965/brw_context.c | 9 -
>  2 files changed, 9 insertions(+), 2 deletions(-)
>
> diff --git a/docs/relnotes/17.4.0.html b/docs/relnotes/17.4.0.html
> index 412c0fc455e..fecdfe77969 100644
> --- a/docs/relnotes/17.4.0.html
> +++ b/docs/relnotes/17.4.0.html
> @@ -53,7 +53,7 @@ Note: some of the new features are only available with 
> certain drivers.
>  GL_ARB_enhanced_layouts on r600/evergreen+
>  GL_ARB_bindless_texture on nvc0/kepler
>  OpenGL 4.3 on r600/evergreen with hw fp64 support
> -Support 1 binary format for GL_ARB_get_program_binary on i965
> +Support 1 binary format for GL_ARB_get_program_binary on i965 (except in 
> GL compatibility profiles)
>  
>
>  Bug fixes
> diff --git a/src/mesa/drivers/dri/i965/brw_context.c 
> b/src/mesa/drivers/dri/i965/brw_context.c
> index e9358b7bc9c..58527d77263 100644
> --- a/src/mesa/drivers/dri/i965/brw_context.c
> +++ b/src/mesa/drivers/dri/i965/brw_context.c
> @@ -704,7 +704,14 @@ brw_initialize_context_constants(struct brw_context *brw)
>ctx->Const.AllowMappedBuffersDuringExecution = true;
>
> /* GL_ARB_get_program_binary */
> -   ctx->Const.NumProgramBinaryFormats = 1;
> +   /* The QT framework has a bug in their shader program cache, which is 
> built
> +* on GL_ARB_get_program_binary. In an effort to allow them to fix the bug
> +* we don't enable more than 1 binary format for compatibility profiles.
> +* This is only being done on the 18.0 release branch.
> +*/
> +   if (ctx->API != API_OPENGL_COMPAT) {
> +  ctx->Const.NumProgramBinaryFormats = 1;
> +   }
>  }
>
>  static void
> --
> 2.16.1
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [Bug 105183] Weird assertion in NIR linker

2018-02-20 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=105183

Bug ID: 105183
   Summary: Weird assertion in NIR linker
   Product: Mesa
   Version: git
  Hardware: Other
OS: All
Status: NEW
  Severity: normal
  Priority: medium
 Component: glsl-compiler
  Assignee: mesa-dev@lists.freedesktop.org
  Reporter: i...@freedesktop.org
QA Contact: intel-3d-b...@lists.freedesktop.org

GCC issues the following warning in my build:

In file included from ../../SOURCE/master/src/compiler/glsl_types.h:29:0,
 from ../../SOURCE/master/src/compiler/nir_types.h:36,
 from ../../SOURCE/master/src/compiler/nir/nir.h:39,
 from
../../SOURCE/master/src/compiler/nir/nir_linking_helpers.c:24:
../../SOURCE/master/src/compiler/nir/nir_linking_helpers.c: In function
‘remap_slots_and_components’:
../../SOURCE/master/src/compiler/nir/nir_linking_helpers.c:286:63: warning:
ordered comparison of pointer with integer zero [-Wextra]
  assert(remap[var->data.location - VARYING_SLOT_VAR0] >= 0);
   ^

I looked at the code, and remap is declared as "struct varying_loc
(*remap)[4]".  It is comparing that a pointer to an array of 4 structures is >=
0.  I'm not sure what the original intention was, but this tautology isn't
doing it.
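
To illustrate the point, here is a minimal standalone sketch (not the Mesa
code). The struct fields, NUM_SLOTS, example_table and both helper names are
assumptions made up for the example, and the range check is only a guess at
what the assertion might have been meant to verify.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct varying_loc; the field names are assumed. */
struct varying_loc {
   uint8_t component;
   uint32_t location;
};

#define NUM_SLOTS 32 /* assumed number of rows in the remap table */

/* Example table, only here so the helpers below have something to index. */
static struct varying_loc example_table[NUM_SLOTS][4];

/* remap has the shape quoted in the report: a pointer to rows of 4 structs.
 * remap[slot] is an array, so it decays to the address of its first element.
 * For any valid table that address is never NULL, which is why an ordered
 * comparison of it against 0 can never fail. */
static bool row_address_is_nonnull(struct varying_loc (*remap)[4], unsigned slot)
{
   const struct varying_loc *row = remap[slot];
   return row != NULL;
}

/* One possible intent (an assumption): validate the computed row index
 * before using it to index the table. */
static bool slot_index_in_range(int location, int first_slot)
{
   int idx = location - first_slot;
   return idx >= 0 && idx < NUM_SLOTS;
}
```

The first helper returns true for every row of a valid table, so it carries no
information; the second can actually fail for an out-of-range location.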

-- 
You are receiving this mail because:
You are the assignee for the bug.


[Mesa-dev] [Bug 105183] Weird assertion in NIR linker

2018-02-20 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=105183

Ian Romanick  changed:

   What|Removed |Added

 CC||t_arc...@yahoo.com.au

-- 
You are receiving this mail because:
You are the assignee for the bug.


[Mesa-dev] [PATCH 18.0] i965: Disable ARB_get_program_binary for compat profiles

2018-02-20 Thread Jordan Justen
The QT framework has a bug in their shader program cache, which is
built on GL_ARB_get_program_binary.

In an effort to allow them to fix the bug we don't enable more than 1
binary format for compatibility profiles.

This is only being done on the 18.0 release branch.

Ref: https://bugreports.qt.io/browse/QTBUG-66420
Ref: https://bugs.freedesktop.org/show_bug.cgi?id=105065
Cc: "18.0" 
Cc: Mark Janes 
Cc: Kenneth Graunke 
Cc: Scott D Phillips 
Signed-off-by: Jordan Justen 
---
 docs/relnotes/17.4.0.html   | 2 +-
 src/mesa/drivers/dri/i965/brw_context.c | 9 -
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/docs/relnotes/17.4.0.html b/docs/relnotes/17.4.0.html
index 412c0fc455e..fecdfe77969 100644
--- a/docs/relnotes/17.4.0.html
+++ b/docs/relnotes/17.4.0.html
@@ -53,7 +53,7 @@ Note: some of the new features are only available with 
certain drivers.
 GL_ARB_enhanced_layouts on r600/evergreen+
 GL_ARB_bindless_texture on nvc0/kepler
 OpenGL 4.3 on r600/evergreen with hw fp64 support
-Support 1 binary format for GL_ARB_get_program_binary on i965
+Support 1 binary format for GL_ARB_get_program_binary on i965 (except in 
GL compatibility profiles)
 
 
 Bug fixes
diff --git a/src/mesa/drivers/dri/i965/brw_context.c 
b/src/mesa/drivers/dri/i965/brw_context.c
index e9358b7bc9c..58527d77263 100644
--- a/src/mesa/drivers/dri/i965/brw_context.c
+++ b/src/mesa/drivers/dri/i965/brw_context.c
@@ -704,7 +704,14 @@ brw_initialize_context_constants(struct brw_context *brw)
   ctx->Const.AllowMappedBuffersDuringExecution = true;
 
/* GL_ARB_get_program_binary */
-   ctx->Const.NumProgramBinaryFormats = 1;
+   /* The QT framework has a bug in their shader program cache, which is built
+* on GL_ARB_get_program_binary. In an effort to allow them to fix the bug
+* we don't enable more than 1 binary format for compatibility profiles.
+* This is only being done on the 18.0 release branch.
+*/
+   if (ctx->API != API_OPENGL_COMPAT) {
+  ctx->Const.NumProgramBinaryFormats = 1;
+   }
 }
 
 static void
-- 
2.16.1



[Mesa-dev] [PATCH 2/2] virgl: reduce some default capset limits.

2018-02-20 Thread Dave Airlie
From: Dave Airlie 

Since v2 might take a while to rollout, we should reduce
these inside some gathered minimums and then v2 can increase
them using host values.

Signed-off-by: Dave Airlie 
---
 src/gallium/drivers/virgl/virgl_winsys.h | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/src/gallium/drivers/virgl/virgl_winsys.h 
b/src/gallium/drivers/virgl/virgl_winsys.h
index d633678597b..95e21a8afde 100644
--- a/src/gallium/drivers/virgl/virgl_winsys.h
+++ b/src/gallium/drivers/virgl/virgl_winsys.h
@@ -114,17 +114,17 @@ struct virgl_winsys {
  */
 static inline void virgl_ws_fill_new_caps_defaults(struct virgl_drm_caps *caps)
 {
-   caps->caps.v2.min_aliased_point_size = 0.f;
+   caps->caps.v2.min_aliased_point_size = 1.f;
caps->caps.v2.max_aliased_point_size = 255.f;
-   caps->caps.v2.min_smooth_point_size = 0.f;
-   caps->caps.v2.max_smooth_point_size = 255.f;
-   caps->caps.v2.min_aliased_line_width = 0.f;
-   caps->caps.v2.max_aliased_line_width = 255.f;
+   caps->caps.v2.min_smooth_point_size = 1.f;
+   caps->caps.v2.max_smooth_point_size = 190.f;
+   caps->caps.v2.min_aliased_line_width = 1.f;
+   caps->caps.v2.max_aliased_line_width = 10.f;
caps->caps.v2.min_smooth_line_width = 0.f;
-   caps->caps.v2.max_smooth_line_width = 255.f;
-   caps->caps.v2.max_texture_lod_bias = 16.0f;
+   caps->caps.v2.max_smooth_line_width = 10.f;
+   caps->caps.v2.max_texture_lod_bias = 15.0f;
caps->caps.v2.max_geom_output_vertices = 256;
-   caps->caps.v2.max_geom_total_output_components = 16384;
+   caps->caps.v2.max_geom_total_output_components = 1024;
caps->caps.v2.max_vertex_outputs = 32;
caps->caps.v2.max_vertex_attribs = 16;
caps->caps.v2.max_shader_patch_varyings = 0;
-- 
2.14.3



[Mesa-dev] [PATCH 1/2] virgl: handle getting new capsets.

2018-02-20 Thread Dave Airlie
From: Dave Airlie 

This checks the kernel api is new enough and asks for the
larger caps size since the kernel won't mess it up now.

Signed-off-by: Dave Airlie 
---
 src/gallium/drivers/virgl/virgl_winsys.h   | 25 ++-
 src/gallium/winsys/virgl/drm/virgl_drm_winsys.c| 52 ++
 src/gallium/winsys/virgl/drm/virgl_drm_winsys.h|  1 +
 src/gallium/winsys/virgl/drm/virtgpu_drm.h |  1 +
 .../winsys/virgl/vtest/virgl_vtest_socket.c|  2 +-
 .../winsys/virgl/vtest/virgl_vtest_winsys.c|  2 +
 6 files changed, 52 insertions(+), 31 deletions(-)
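
Reader's note: the query order described in the commit message can be
sketched in isolation as below. This is an illustration, not the driver code.
The callback type, the mock hosts, the enum names and the sizes used are all
hypothetical; the real code issues DRM_IOCTL_VIRTGPU_GET_CAPS through
drmIoctl().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

enum { CAPSET_ID_V1 = 1, CAPSET_ID_V2 = 2 };

/* Hypothetical callback standing in for the GET_CAPS ioctl:
 * returns 0 on success, or -1 with errno set on failure. */
typedef int (*capset_query_fn)(int cap_set_id, size_t size);

/* Ask for capset 2 only when the kernel advertises the query fix, and fall
 * back to capset 1 on EINVAL. Returns the capset id that was fetched, or -1
 * if the query failed. */
static int query_caps_with_fallback(capset_query_fn query, int has_query_fix,
                                    size_t v1_size, size_t v2_size)
{
   int id = has_query_fix ? CAPSET_ID_V2 : CAPSET_ID_V1;
   size_t size = has_query_fix ? v2_size : v1_size;

   int ret = query(id, size);
   if (ret == -1 && errno == EINVAL) {
      /* Fallback to v1, mirroring the retry in the patch. */
      id = CAPSET_ID_V1;
      ret = query(id, v1_size);
   }
   return ret == 0 ? id : -1;
}

/* Mock host that knows both capsets. */
static int mock_host_v2(int cap_set_id, size_t size)
{
   (void)cap_set_id;
   (void)size;
   return 0;
}

/* Mock host that only knows capset 1. */
static int mock_host_v1_only(int cap_set_id, size_t size)
{
   (void)size;
   if (cap_set_id != CAPSET_ID_V1) {
      errno = EINVAL;
      return -1;
   }
   return 0;
}

/* Mock host that rejects every query. */
static int mock_host_always_einval(int cap_set_id, size_t size)
{
   (void)cap_set_id;
   (void)size;
   errno = EINVAL;
   return -1;
}
```

Because the v2 defaults are filled in before the query, a caller that ends up
with capset 1 still sees usable values in the v2 fields.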

diff --git a/src/gallium/drivers/virgl/virgl_winsys.h 
b/src/gallium/drivers/virgl/virgl_winsys.h
index ea21f2b6712..d633678597b 100644
--- a/src/gallium/drivers/virgl/virgl_winsys.h
+++ b/src/gallium/drivers/virgl/virgl_winsys.h
@@ -109,5 +109,28 @@ struct virgl_winsys {
  struct pipe_box *sub_box);
 };
 
-
+/* this defaults all newer caps,
+ * the kernel will overwrite these if newer version is available.
+ */
+static inline void virgl_ws_fill_new_caps_defaults(struct virgl_drm_caps *caps)
+{
+   caps->caps.v2.min_aliased_point_size = 0.f;
+   caps->caps.v2.max_aliased_point_size = 255.f;
+   caps->caps.v2.min_smooth_point_size = 0.f;
+   caps->caps.v2.max_smooth_point_size = 255.f;
+   caps->caps.v2.min_aliased_line_width = 0.f;
+   caps->caps.v2.max_aliased_line_width = 255.f;
+   caps->caps.v2.min_smooth_line_width = 0.f;
+   caps->caps.v2.max_smooth_line_width = 255.f;
+   caps->caps.v2.max_texture_lod_bias = 16.0f;
+   caps->caps.v2.max_geom_output_vertices = 256;
+   caps->caps.v2.max_geom_total_output_components = 16384;
+   caps->caps.v2.max_vertex_outputs = 32;
+   caps->caps.v2.max_vertex_attribs = 16;
+   caps->caps.v2.max_shader_patch_varyings = 0;
+   caps->caps.v2.min_texel_offset = -8;
+   caps->caps.v2.max_texel_offset = 7;
+   caps->caps.v2.min_texture_gather_offset = -8;
+   caps->caps.v2.max_texture_gather_offset = 7;
+}
 #endif
diff --git a/src/gallium/winsys/virgl/drm/virgl_drm_winsys.c 
b/src/gallium/winsys/virgl/drm/virgl_drm_winsys.c
index fd6ae98a515..77854680e59 100644
--- a/src/gallium/winsys/virgl/drm/virgl_drm_winsys.c
+++ b/src/gallium/winsys/virgl/drm/virgl_drm_winsys.c
@@ -705,46 +705,28 @@ static int virgl_drm_get_caps(struct virgl_winsys *vws,
struct virgl_drm_winsys *vdws = virgl_drm_winsys(vws);
struct drm_virtgpu_get_caps args;
int ret;
-   bool fill_v2 = false;
 
-   memset(&args, 0, sizeof(args));
+   virgl_ws_fill_new_caps_defaults(caps);
 
-   args.cap_set_id = 1;
+   memset(&args, 0, sizeof(args));
+   if (vdws->has_capset_query_fix) {
+  /* if we have the query fix - try and get cap set id 2 first */
+  args.cap_set_id = 2;
+  args.size = sizeof(union virgl_caps);
+   } else {
+  args.cap_set_id = 1;
+  args.size = sizeof(struct virgl_caps_v1);
+   }
args.addr = (unsigned long)&caps->caps;
-   args.size = sizeof(union virgl_caps);
 
ret = drmIoctl(vdws->fd, DRM_IOCTL_VIRTGPU_GET_CAPS, &args);
-
if (ret == -1 && errno == EINVAL) {
   /* Fallback to v1 */
+  args.cap_set_id = 1;
   args.size = sizeof(struct virgl_caps_v1);
   ret = drmIoctl(vdws->fd, DRM_IOCTL_VIRTGPU_GET_CAPS, &args);
   if (ret == -1)
   return ret;
-  fill_v2 = true;
-   }
-   if (caps->caps.max_version == 1)
-   fill_v2 = true;
-
-   if (fill_v2) {
-  caps->caps.v2.min_aliased_point_size = 0.f;
-  caps->caps.v2.max_aliased_point_size = 255.f;
-  caps->caps.v2.min_smooth_point_size = 0.f;
-  caps->caps.v2.max_smooth_point_size = 255.f;
-  caps->caps.v2.min_aliased_line_width = 0.f;
-  caps->caps.v2.max_aliased_line_width = 255.f;
-  caps->caps.v2.min_smooth_line_width = 0.f;
-  caps->caps.v2.max_smooth_line_width = 255.f;
-  caps->caps.v2.max_texture_lod_bias = 16.0f;
-  caps->caps.v2.max_geom_output_vertices = 256;
-  caps->caps.v2.max_geom_total_output_components = 16384;
-  caps->caps.v2.max_vertex_outputs = 32;
-  caps->caps.v2.max_vertex_attribs = 16;
-  caps->caps.v2.max_shader_patch_varyings = 0;
-  caps->caps.v2.min_texel_offset = -8;
-  caps->caps.v2.max_texel_offset = 7;
-  caps->caps.v2.min_texture_gather_offset = -8;
-  caps->caps.v2.max_texture_gather_offset = 7;
}
return ret;
 }
@@ -813,6 +795,8 @@ static struct virgl_winsys *
 virgl_drm_winsys_create(int drmFD)
 {
struct virgl_drm_winsys *qdws;
+   int ret;
+   struct drm_virtgpu_getparam getparam = {0};
 
qdws = CALLOC_STRUCT(virgl_drm_winsys);
if (!qdws)
@@ -847,6 +831,16 @@ virgl_drm_winsys_create(int drmFD)
qdws->base.fence_reference = virgl_fence_reference;
 
qdws->base.get_caps = virgl_drm_get_caps;
+
+   uint32_t value;
+   getparam.param = VIRTGPU_PARAM_CAPSET_QUERY_FIX;
+   getparam.value = (uint64_t)(uintptr_t)&value;
+   ret = drmIoctl(qdws->fd, DRM_IOCTL_VIRTGPU_GETPARAM, &getparam);
+   if (ret == 0) {
+  if (value 

[Mesa-dev] [PATCH 12/14] ac/radv: migrate lds size calculations to shader gen.

2018-02-20 Thread Dave Airlie
From: Dave Airlie 

This moves the lds_size calcs into the shader so we have all
the size stuff in one file.

Signed-off-by: Dave Airlie 
---
 src/amd/common/ac_nir_to_llvm.c | 33 +
 src/amd/common/ac_nir_to_llvm.h |  1 +
 src/amd/vulkan/radv_pipeline.c  | 30 --
 3 files changed, 38 insertions(+), 26 deletions(-)
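
Reader's note: the size arithmetic that this patch moves can be checked with
plain integers. The sketch below follows the same steps as
calculate_tess_lds_size() in the diff, but the function name and the flat
parameter list are assumptions for illustration; the real function reads
these values from the shader context.

```c
#include <assert.h>

/* Each input or output slot occupies 16 bytes (one vec4). */
static unsigned tess_lds_size(unsigned num_tcs_inputs,
                              unsigned num_tcs_outputs,
                              unsigned num_tcs_patch_outputs,
                              unsigned num_tcs_input_cp,
                              unsigned num_tcs_output_cp,
                              unsigned num_patches)
{
   unsigned input_vertex_size = num_tcs_inputs * 16;
   unsigned output_vertex_size = num_tcs_outputs * 16;

   /* All input control points of one patch. */
   unsigned input_patch_size = num_tcs_input_cp * input_vertex_size;

   /* Per-vertex outputs of one patch, followed by its per-patch outputs. */
   unsigned pervertex_output_patch_size = num_tcs_output_cp * output_vertex_size;
   unsigned output_patch_size = pervertex_output_patch_size +
                                num_tcs_patch_outputs * 16;

   /* Outputs start after the inputs of every patch. */
   unsigned output_patch0_offset = input_patch_size * num_patches;

   return output_patch0_offset + output_patch_size * num_patches;
}
```

For example, 2 inputs, 3 outputs, 1 patch output, 3 input control points,
4 output control points and 8 patches give 96 * 8 + 208 * 8 = 2432 bytes.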

diff --git a/src/amd/common/ac_nir_to_llvm.c b/src/amd/common/ac_nir_to_llvm.c
index c235c5314be..1cf181ddeba 100644
--- a/src/amd/common/ac_nir_to_llvm.c
+++ b/src/amd/common/ac_nir_to_llvm.c
@@ -357,6 +357,38 @@ get_tcs_num_patches(struct radv_shader_context *ctx)
return num_patches;
 }
 
+static unsigned
+calculate_tess_lds_size(struct radv_shader_context *ctx)
+{
+   unsigned num_tcs_input_cp = ctx->options->key.tcs.input_vertices;
+   unsigned num_tcs_output_cp;
+   unsigned num_tcs_outputs, num_tcs_patch_outputs;
+   unsigned input_vertex_size, output_vertex_size;
+   unsigned input_patch_size, output_patch_size;
+   unsigned pervertex_output_patch_size;
+   unsigned output_patch0_offset;
+   unsigned num_patches;
+   unsigned lds_size;
+
+   num_tcs_output_cp = ctx->tcs_vertices_per_patch;
+   num_tcs_outputs = util_last_bit64(ctx->shader_info->info.tcs.outputs_written);
+   num_tcs_patch_outputs = util_last_bit64(ctx->shader_info->info.tcs.patch_outputs_written);
+
+   input_vertex_size = ctx->tcs_num_inputs * 16;
+   output_vertex_size = num_tcs_outputs * 16;
+
+   input_patch_size = num_tcs_input_cp * input_vertex_size;
+
+   pervertex_output_patch_size = num_tcs_output_cp * output_vertex_size;
+   output_patch_size = pervertex_output_patch_size + num_tcs_patch_outputs * 16;
+
+   num_patches = ctx->tcs_num_patches;
+   output_patch0_offset = input_patch_size * num_patches;
+
+   lds_size = output_patch0_offset + output_patch_size * num_patches;
+   return lds_size;
+}
+
 /* Tessellation shaders pass outputs to the next shader using LDS.
  *
  * LS outputs = TCS inputs
@@ -6981,6 +7013,7 @@ LLVMModuleRef 
ac_translate_nir_to_llvm(LLVMTargetMachineRef tm,
shaders[i]->info.gs.vertices_out;
} else if (shaders[i]->info.stage == MESA_SHADER_TESS_CTRL) {
shader_info->tcs.num_patches = ctx.tcs_num_patches;
+   shader_info->tcs.lds_size = calculate_tess_lds_size(&ctx);
}
}
 
diff --git a/src/amd/common/ac_nir_to_llvm.h b/src/amd/common/ac_nir_to_llvm.h
index 62ce72fb7d2..f4825ef4cd5 100644
--- a/src/amd/common/ac_nir_to_llvm.h
+++ b/src/amd/common/ac_nir_to_llvm.h
@@ -199,6 +199,7 @@ struct ac_shader_variant_info {
unsigned tcs_vertices_out;
/* Which outputs are actually written */
uint32_t num_patches;
+   uint32_t lds_size;
} tcs;
struct {
struct ac_vs_output_info outinfo;
diff --git a/src/amd/vulkan/radv_pipeline.c b/src/amd/vulkan/radv_pipeline.c
index 195daf6abde..e9a9ae975b1 100644
--- a/src/amd/vulkan/radv_pipeline.c
+++ b/src/amd/vulkan/radv_pipeline.c
@@ -1306,39 +1306,17 @@ static struct radv_tessellation_state
 calculate_tess_state(struct radv_pipeline *pipeline,
 const VkGraphicsPipelineCreateInfo *pCreateInfo)
 {
-   unsigned num_tcs_input_cp = 
pCreateInfo->pTessellationState->patchControlPoints;
-   unsigned num_tcs_output_cp, num_tcs_inputs, num_tcs_outputs;
-   unsigned num_tcs_patch_outputs;
-   unsigned input_vertex_size, output_vertex_size, 
pervertex_output_patch_size;
-   unsigned input_patch_size, output_patch_size, output_patch0_offset;
+   unsigned num_tcs_input_cp;
+   unsigned num_tcs_output_cp;
unsigned lds_size;
unsigned num_patches;
struct radv_tessellation_state tess = {0};
 
-   /* This calculates how shader inputs and outputs among VS, TCS, and TES
-* are laid out in LDS. */
-   num_tcs_inputs = 
util_last_bit64(radv_get_vertex_shader(pipeline)->info.info.vs.ls_outputs_written);
-   num_tcs_outputs = 
util_last_bit64(pipeline->shaders[MESA_SHADER_TESS_CTRL]->info.info.tcs.outputs_written);
 //tcs->outputs_written
+   num_tcs_input_cp = pCreateInfo->pTessellationState->patchControlPoints;
num_tcs_output_cp = 
pipeline->shaders[MESA_SHADER_TESS_CTRL]->info.tcs.tcs_vertices_out; //TCS 
VERTICES OUT
-   num_tcs_patch_outputs = 
util_last_bit64(pipeline->shaders[MESA_SHADER_TESS_CTRL]->info.info.tcs.patch_outputs_written);
-
-   /* Ensure that we only need one wave per SIMD so we don't need to check
-* resource usage. Also ensures that the number of tcs in and out
-* vertices per threadgroup are at most 256.
-*/
-   input_vertex_size = num_tcs_inputs * 16;
-   

[Mesa-dev] [PATCH 06/14] radv: drop tcs_out_offsets

2018-02-20 Thread Dave Airlie
From: Dave Airlie 

Move all calculations to shader generation.

Signed-off-by: Dave Airlie 
---
 src/amd/common/ac_nir_to_llvm.c | 39 +++
 src/amd/vulkan/radv_pipeline.c  | 11 +++
 2 files changed, 30 insertions(+), 20 deletions(-)

diff --git a/src/amd/common/ac_nir_to_llvm.c b/src/amd/common/ac_nir_to_llvm.c
index b6c84390a1e..396b98698e6 100644
--- a/src/amd/common/ac_nir_to_llvm.c
+++ b/src/amd/common/ac_nir_to_llvm.c
@@ -82,7 +82,6 @@ struct radv_shader_context {
LLVMValueRef es2gs_offset;
 
LLVMValueRef tcs_offchip_layout;
-   LLVMValueRef tcs_out_offsets;
LLVMValueRef oc_lds;
LLVMValueRef merged_wave_info;
LLVMValueRef tess_factor_offset;
@@ -375,17 +374,37 @@ get_tcs_out_vertex_stride(struct radv_shader_context *ctx)
 static LLVMValueRef
 get_tcs_out_patch0_offset(struct radv_shader_context *ctx)
 {
+   assert (ctx->stage == MESA_SHADER_TESS_CTRL);
+   uint32_t input_vertex_size = ctx->tcs_num_inputs * 16;
+   uint32_t input_patch_size = ctx->options->key.tcs.input_vertices * input_vertex_size;
+   uint32_t output_patch0_offset = input_patch_size;
+   LLVMValueRef num_patches = unpack_param(&ctx->ac, ctx->tcs_offchip_layout, 0, 9);
+
+   output_patch0_offset /= 4;
return LLVMBuildMul(ctx->ac.builder,
-   unpack_param(&ctx->ac, ctx->tcs_out_offsets, 0, 16),
-   LLVMConstInt(ctx->ac.i32, 4, false), "");
+   num_patches,
+   LLVMConstInt(ctx->ac.i32, output_patch0_offset, false), "");
 }
 
 static LLVMValueRef
 get_tcs_out_patch0_patch_data_offset(struct radv_shader_context *ctx)
 {
-   return LLVMBuildMul(ctx->ac.builder,
-   unpack_param(&ctx->ac, ctx->tcs_out_offsets, 16, 16),
-   LLVMConstInt(ctx->ac.i32, 4, false), "");
+   uint32_t input_vertex_size = ctx->tcs_num_inputs * 16;
+   uint32_t input_patch_size = ctx->options->key.tcs.input_vertices * input_vertex_size;
+   uint32_t output_patch0_offset = input_patch_size;
+
+   uint32_t num_tcs_outputs = util_last_bit64(ctx->shader_info->info.tcs.outputs_written);
+   uint32_t output_vertex_size = num_tcs_outputs * 16;
+   uint32_t pervertex_output_patch_size = ctx->tcs_vertices_per_patch * output_vertex_size;
+   LLVMValueRef num_patches = unpack_param(&ctx->ac, ctx->tcs_offchip_layout, 0, 9);
+
+   output_patch0_offset /= 4;
+   LLVMValueRef value = LLVMBuildMul(ctx->ac.builder,
+   num_patches,
+   LLVMConstInt(ctx->ac.i32, output_patch0_offset, false), "");
+   return LLVMBuildAdd(ctx->ac.builder,
+   value,
+   LLVMConstInt(ctx->ac.i32, pervertex_output_patch_size / 4, false), "");
 }
 
 static LLVMValueRef
@@ -541,7 +560,7 @@ static void allocate_user_sgprs(struct radv_shader_context *ctx,
if (previous_stage == MESA_SHADER_VERTEX)
user_sgpr_info->sgpr_count += count_vs_user_sgprs(ctx);
}
-   user_sgpr_info->sgpr_count += 2;
+   user_sgpr_info->sgpr_count += 1;
break;
case MESA_SHADER_TESS_EVAL:
user_sgpr_info->sgpr_count += 1;
@@ -810,8 +829,6 @@ static void create_function(struct radv_shader_context *ctx,
 
add_arg(&args, ARG_SGPR, ctx->ac.i32,
&ctx->tcs_offchip_layout);
-   add_arg(&args, ARG_SGPR, ctx->ac.i32,
-   &ctx->tcs_out_offsets);
if (needs_view_index)
add_arg(&args, ARG_SGPR, ctx->ac.i32,
&ctx->abi.view_index);
@@ -831,8 +848,6 @@ static void create_function(struct radv_shader_context *ctx,
 
add_arg(&args, ARG_SGPR, ctx->ac.i32,
&ctx->tcs_offchip_layout);
-   add_arg(&args, ARG_SGPR, ctx->ac.i32,
-   &ctx->tcs_out_offsets);
if (needs_view_index)
add_arg(&args, ARG_SGPR, ctx->ac.i32,
&ctx->abi.view_index);
@@ -1041,7 +1056,7 @@ static void create_function(struct radv_shader_context *ctx,
case MESA_SHADER_TESS_CTRL:
set_vs_specific_input_locs(ctx, stage, has_previous_stage,
   previous_stage, &user_sgpr_idx);
-   set_loc_shader(ctx, AC_UD_TCS_OFFCHIP_LAYOUT, &user_sgpr_idx, 2);
+   set_loc_shader(ctx, AC_UD_TCS_OFFCHIP_LAYOUT, &user_sgpr_idx, 1);
if (ctx->abi.view_index)
set_loc_shader(ctx, AC_UD_VIEW_INDEX, &user_sgpr_idx, 1);
break;
diff --git a/src/amd/vulkan/radv_pipeline.c b/src/amd/vulkan/radv_pipeline.c

[Mesa-dev] [PATCH 11/14] ac/radv: drop scanning the tess shader in the nir code.

2018-02-20 Thread Dave Airlie
From: Dave Airlie 

This drops the now unneeded scanning and results in favour
of the ones in the info.

Signed-off-by: Dave Airlie 
---
 src/amd/common/ac_nir_to_llvm.c | 28 
 src/amd/common/ac_nir_to_llvm.h |  4 
 src/amd/vulkan/radv_pipeline.c  |  7 +++
 3 files changed, 3 insertions(+), 36 deletions(-)

diff --git a/src/amd/common/ac_nir_to_llvm.c b/src/amd/common/ac_nir_to_llvm.c
index d43c0ab7fe9..c235c5314be 100644
--- a/src/amd/common/ac_nir_to_llvm.c
+++ b/src/amd/common/ac_nir_to_llvm.c
@@ -117,8 +117,6 @@ struct radv_shader_context {
unsigned gs_max_out_vertices;
 
unsigned tes_primitive_mode;
-   uint64_t tess_outputs_written;
-   uint64_t tess_patch_outputs_written;
 
uint32_t tcs_patch_outputs_read;
uint64_t tcs_outputs_read;
@@ -2852,17 +2850,6 @@ static LLVMValueRef 
get_tcs_tes_buffer_address_params(struct radv_shader_context
return get_tcs_tes_buffer_address(ctx, vertex_index, param_index);
 }
 
-static void
-mark_tess_output(struct radv_shader_context *ctx,
-bool is_patch, uint32_t param)
-
-{
-   if (is_patch) {
-   ctx->tess_patch_outputs_written |= (1ull << param);
-   } else
-   ctx->tess_outputs_written |= (1ull << param);
-}
-
 static LLVMValueRef
 get_dw_address(struct radv_shader_context *ctx,
   LLVMValueRef dw_addr,
@@ -2986,8 +2973,6 @@ store_tcs_output(struct ac_shader_abi *abi,
dw_addr = get_tcs_out_current_patch_data_offset(ctx);
}
 
-   mark_tess_output(ctx, is_patch, param);
-
dw_addr = get_dw_address(ctx, dw_addr, param, const_index, is_compact, 
vertex_index, stride,
 param_index);
buf_addr = get_tcs_tes_buffer_address_params(ctx, param, const_index, 
is_compact,
@@ -6273,9 +6258,6 @@ handle_ls_outputs_post(struct radv_shader_context *ctx)
if (i == VARYING_SLOT_CLIP_DIST0)
length = ctx->num_output_clips + ctx->num_output_culls;
int param = shader_io_get_unique_index(i);
-   mark_tess_output(ctx, false, param);
-   if (length > 4)
-   mark_tess_output(ctx, false, param + 1);
LLVMValueRef dw_addr = LLVMBuildAdd(ctx->ac.builder, 
base_dw_addr,
LLVMConstInt(ctx->ac.i32, 
param * 4, false),
"");
@@ -6419,13 +6401,11 @@ write_tess_factors(struct radv_shader_context *ctx)
 
if (inner_comps) {
tess_inner_index = 
shader_io_get_unique_index(VARYING_SLOT_TESS_LEVEL_INNER);
-   mark_tess_output(ctx, true, tess_inner_index);
lds_inner = LLVMBuildAdd(ctx->ac.builder, lds_base,
 LLVMConstInt(ctx->ac.i32, 
tess_inner_index * 4, false), "");
}
 
tess_outer_index = 
shader_io_get_unique_index(VARYING_SLOT_TESS_LEVEL_OUTER);
-   mark_tess_output(ctx, true, tess_outer_index);
lds_outer = LLVMBuildAdd(ctx->ac.builder, lds_base,
 LLVMConstInt(ctx->ac.i32, tess_outer_index * 
4, false), "");
 
@@ -6910,7 +6890,6 @@ LLVMModuleRef 
ac_translate_nir_to_llvm(LLVMTargetMachineRef tm,
for(int i = 0; i < shader_count; ++i) {
ctx.stage = shaders[i]->info.stage;
ctx.output_mask = 0;
-   ctx.tess_outputs_written = 0;
ctx.num_output_clips = 
shaders[i]->info.clip_distance_array_size;
ctx.num_output_culls = 
shaders[i]->info.cull_distance_array_size;
 
@@ -7001,14 +6980,7 @@ LLVMModuleRef 
ac_translate_nir_to_llvm(LLVMTargetMachineRef tm,
shader_info->gs.max_gsvs_emit_size = 
shader_info->gs.gsvs_vertex_size *
shaders[i]->info.gs.vertices_out;
} else if (shaders[i]->info.stage == MESA_SHADER_TESS_CTRL) {
-   shader_info->tcs.outputs_written = ctx.tess_outputs_written;
-   shader_info->tcs.patch_outputs_written = ctx.tess_patch_outputs_written;
shader_info->tcs.num_patches = ctx.tcs_num_patches;
-   assert(ctx.tess_outputs_written == ctx.shader_info->info.tcs.outputs_written);
-   assert(ctx.tess_patch_outputs_written == ctx.shader_info->info.tcs.patch_outputs_written);
-   } else if (shaders[i]->info.stage == MESA_SHADER_VERTEX && ctx.options->key.vs.as_ls) {
-   shader_info->vs.outputs_written = ctx.tess_outputs_written;
-   assert(ctx.tess_outputs_written == ctx.shader_info->info.vs.ls_outputs_written);
}
}
 
diff --git a/src/amd/common/ac_nir_to_llvm.h b/src/amd/common/ac_nir_to_llvm.h
index 48a4a5b2049..62ce72fb7d2 100644
--- 

[Mesa-dev] [PATCH 09/14] radv/tess: remove last chunk of tess sgprs

2018-02-20 Thread Dave Airlie
From: Dave Airlie 

This removes the last TES-specific user sgpr.

Signed-off-by: Dave Airlie 
---
 src/amd/common/ac_nir_to_llvm.c | 51 +
 src/amd/common/ac_nir_to_llvm.h |  4 ++--
 src/amd/vulkan/radv_pipeline.c  | 18 ++-
 3 files changed, 20 insertions(+), 53 deletions(-)

diff --git a/src/amd/common/ac_nir_to_llvm.c b/src/amd/common/ac_nir_to_llvm.c
index 981c5d542a4..d43c0ab7fe9 100644
--- a/src/amd/common/ac_nir_to_llvm.c
+++ b/src/amd/common/ac_nir_to_llvm.c
@@ -81,7 +81,6 @@ struct radv_shader_context {
LLVMValueRef vs_prim_id;
LLVMValueRef es2gs_offset;
 
-   LLVMValueRef tcs_offchip_layout;
LLVMValueRef oc_lds;
LLVMValueRef merged_wave_info;
LLVMValueRef tess_factor_offset;
@@ -601,14 +600,11 @@ static void allocate_user_sgprs(struct 
radv_shader_context *ctx,
}
break;
case MESA_SHADER_TESS_EVAL:
-   user_sgpr_info->sgpr_count += 1;
break;
case MESA_SHADER_GEOMETRY:
if (has_previous_stage) {
if (previous_stage == MESA_SHADER_VERTEX) {
user_sgpr_info->sgpr_count += 
count_vs_user_sgprs(ctx);
-   } else {
-   user_sgpr_info->sgpr_count++;
}
}
user_sgpr_info->sgpr_count += 2;
@@ -900,7 +896,6 @@ static void create_function(struct radv_shader_context *ctx,
   previous_stage, &user_sgpr_info,
   &args, &desc_sets);
 
-   add_arg(&args, ARG_SGPR, ctx->ac.i32, &ctx->tcs_offchip_layout);
if (needs_view_index)
add_arg(&args, ARG_SGPR, ctx->ac.i32,
&ctx->abi.view_index);
@@ -935,10 +930,7 @@ static void create_function(struct radv_shader_context 
*ctx,
   _sgpr_info, ,
   _sets);
 
-   if (previous_stage == MESA_SHADER_TESS_EVAL) {
-   add_arg(, ARG_SGPR, ctx->ac.i32,
-   >tcs_offchip_layout);
-   } else {
+   if (previous_stage != MESA_SHADER_TESS_EVAL) {
declare_vs_specific_input_sgprs(ctx, stage,

has_previous_stage,
previous_stage,
@@ -1094,7 +1086,6 @@ static void create_function(struct radv_shader_context 
*ctx,
set_loc_shader(ctx, AC_UD_VIEW_INDEX, _sgpr_idx, 
1);
break;
case MESA_SHADER_TESS_EVAL:
-   set_loc_shader(ctx, AC_UD_TES_OFFCHIP_LAYOUT, _sgpr_idx, 
1);
if (ctx->abi.view_index)
set_loc_shader(ctx, AC_UD_VIEW_INDEX, _sgpr_idx, 
1);
break;
@@ -1105,9 +1096,6 @@ static void create_function(struct radv_shader_context 
*ctx,
   has_previous_stage,
   previous_stage,
   _sgpr_idx);
-   else
-   set_loc_shader(ctx, AC_UD_TES_OFFCHIP_LAYOUT,
-  _sgpr_idx, 1);
}
set_loc_shader(ctx, AC_UD_GS_VS_RING_STRIDE_ENTRIES,
   _sgpr_idx, 2);
@@ -2783,35 +2771,28 @@ out:
  */
 static LLVMValueRef get_non_vertex_index_offset(struct radv_shader_context 
*ctx)
 {
-   if (ctx->stage == MESA_SHADER_TESS_CTRL) {
-   uint32_t num_tcs_outputs = 
util_last_bit64(ctx->shader_info->info.tcs.outputs_written);
-   uint32_t output_vertex_size = num_tcs_outputs * 16;
-   uint32_t pervertex_output_patch_size = 
ctx->tcs_vertices_per_patch * output_vertex_size;
-   uint32_t num_patches = ctx->tcs_num_patches;
+   uint32_t num_patches = ctx->tcs_num_patches;
+   uint32_t num_tcs_outputs;
 
-   return LLVMConstInt(ctx->ac.i32, pervertex_output_patch_size * 
num_patches, false);
-   } else
-   return unpack_param(>ac, ctx->tcs_offchip_layout, 16, 16);
+   if (ctx->stage == MESA_SHADER_TESS_CTRL)
+   num_tcs_outputs = 
util_last_bit64(ctx->shader_info->info.tcs.outputs_written);
+   else
+   num_tcs_outputs = ctx->options->key.tes.tcs_num_outputs;
+
+   uint32_t output_vertex_size = num_tcs_outputs * 16;
+   uint32_t pervertex_output_patch_size = ctx->tcs_vertices_per_patch * 
output_vertex_size;
+
+   return LLVMConstInt(ctx->ac.i32, pervertex_output_patch_size * 
num_patches, false);
 }
 
 

[Mesa-dev] [PATCH 14/14] ac/radv: drop geometry stride user sgpr.

2018-02-20 Thread Dave Airlie
From: Dave Airlie 

This removes the other geometry specific user sgpr.

Signed-off-by: Dave Airlie 
---
 src/amd/common/ac_nir_to_llvm.c | 37 +++--
 src/amd/common/ac_nir_to_llvm.h |  1 -
 src/amd/vulkan/radv_pipeline.c  |  9 -
 3 files changed, 19 insertions(+), 28 deletions(-)

diff --git a/src/amd/common/ac_nir_to_llvm.c b/src/amd/common/ac_nir_to_llvm.c
index 9e4069b535a..43e13ec91e4 100644
--- a/src/amd/common/ac_nir_to_llvm.c
+++ b/src/amd/common/ac_nir_to_llvm.c
@@ -88,7 +88,6 @@ struct radv_shader_context {
LLVMValueRef tes_u;
LLVMValueRef tes_v;
 
-   LLVMValueRef gsvs_ring_stride;
LLVMValueRef gs2vs_offset;
LLVMValueRef gs_wave_id;
LLVMValueRef gs_vtx_offset[6];
@@ -122,6 +121,8 @@ struct radv_shader_context {
uint32_t tcs_vertices_per_patch;
uint32_t tcs_num_inputs;
uint32_t tcs_num_patches;
+   uint32_t max_gsvs_emit_size;
+   uint32_t gsvs_vertex_size;
 };
 
 static inline struct radv_shader_context *
@@ -636,7 +637,6 @@ static void allocate_user_sgprs(struct radv_shader_context 
*ctx,
user_sgpr_info->sgpr_count += 
count_vs_user_sgprs(ctx);
}
}
-   user_sgpr_info->sgpr_count += 1;
break;
default:
break;
@@ -966,8 +966,6 @@ static void create_function(struct radv_shader_context *ctx,
);
}
 
-   add_arg(, ARG_SGPR, ctx->ac.i32,
-   >gsvs_ring_stride);
if (needs_view_index)
add_arg(, ARG_SGPR, ctx->ac.i32,
>abi.view_index);
@@ -995,8 +993,6 @@ static void create_function(struct radv_shader_context *ctx,
   _sgpr_info, ,
   _sets);
 
-   add_arg(, ARG_SGPR, ctx->ac.i32,
-   >gsvs_ring_stride);
if (needs_view_index)
add_arg(, ARG_SGPR, ctx->ac.i32,
>abi.view_index);
@@ -1122,8 +1118,6 @@ static void create_function(struct radv_shader_context 
*ctx,
   previous_stage,
   _sgpr_idx);
}
-   set_loc_shader(ctx, AC_UD_GS_VS_RING_STRIDE_ENTRIES,
-  _sgpr_idx, 1);
if (ctx->abi.view_index)
set_loc_shader(ctx, AC_UD_VIEW_INDEX, _sgpr_idx, 
1);
break;
@@ -6740,6 +6734,8 @@ ac_setup_rings(struct radv_shader_context *ctx)
if (ctx->stage == MESA_SHADER_GEOMETRY) {
LLVMValueRef tmp;
uint32_t num_entries = 64;
+   LLVMValueRef gsvs_ring_stride = LLVMConstInt(ctx->ac.i32, 
ctx->max_gsvs_emit_size, false);
+   LLVMValueRef gsvs_ring_desc = LLVMConstInt(ctx->ac.i32, 
ctx->max_gsvs_emit_size << 16, false);
ctx->esgs_ring = ac_build_load_to_sgpr(>ac, 
ctx->ring_offsets, LLVMConstInt(ctx->ac.i32, RING_ESGS_GS, false));
ctx->gsvs_ring = ac_build_load_to_sgpr(>ac, 
ctx->ring_offsets, LLVMConstInt(ctx->ac.i32, RING_GSVS_GS, false));
 
@@ -6747,10 +6743,10 @@ ac_setup_rings(struct radv_shader_context *ctx)
 
tmp = LLVMConstInt(ctx->ac.i32, num_entries, false);
if (ctx->options->chip_class >= VI)
-   tmp = LLVMBuildMul(ctx->ac.builder, 
LLVMBuildLShr(ctx->ac.builder, ctx->gsvs_ring_stride, LLVMConstInt(ctx->ac.i32, 
16, false), ""), tmp, "");
+   tmp = LLVMBuildMul(ctx->ac.builder, gsvs_ring_stride, 
tmp, "");
ctx->gsvs_ring = LLVMBuildInsertElement(ctx->ac.builder, 
ctx->gsvs_ring, tmp, LLVMConstInt(ctx->ac.i32, 2, false), "");
tmp = LLVMBuildExtractElement(ctx->ac.builder, ctx->gsvs_ring, 
ctx->ac.i32_1, "");
-   tmp = LLVMBuildOr(ctx->ac.builder, tmp, ctx->gsvs_ring_stride, 
"");
+   tmp = LLVMBuildOr(ctx->ac.builder, tmp, gsvs_ring_desc, "");
ctx->gsvs_ring = LLVMBuildInsertElement(ctx->ac.builder, 
ctx->gsvs_ring, tmp, ctx->ac.i32_1, "");
}
 
@@ -6968,6 +6964,17 @@ LLVMModuleRef 
ac_translate_nir_to_llvm(LLVMTargetMachineRef tm,
if (i)
emit_barrier(, ctx.stage);
 
+   nir_foreach_variable(variable, [i]->outputs)
+   scan_shader_output_decl(, variable, shaders[i], 
shaders[i]->info.stage);
+
+   if (shaders[i]->info.stage == MESA_SHADER_GEOMETRY) {
+   unsigned addclip = 
shaders[i]->info.clip_distance_array_size +

[Mesa-dev] [PATCH 13/14] ac/radv: get rid of geometry user sgpr for num entries.

2018-02-20 Thread Dave Airlie
From: Dave Airlie 

This drops one of the geometry-specific user sgprs; the
number of entries can be worked out at compile time.
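
As a rough illustration of the value in question, the host-side
calculation that this patch removes from radv_pipeline.c can be written
as a small pure function. This is only a sketch: the function name and
the boolean parameter are made up for the example, and only the constant
64 and the multiply on VI and later come from the removed code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not a Mesa function: number of GSVS ring entries.
 * The base count is 64; on VI and later it is scaled by the stride. */
static uint32_t gsvs_num_entries(bool is_vi_or_later, uint32_t stride)
{
    uint32_t num_entries = 64;

    if (is_vi_or_later)
        num_entries *= stride;

    return num_entries;
}
```

Both inputs are fixed once the pipeline is compiled, which is why the
result does not need to travel through a user sgpr.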

Signed-off-by: Dave Airlie 
---
 src/amd/common/ac_nir_to_llvm.c | 15 +++
 src/amd/vulkan/radv_pipeline.c  |  9 +
 2 files changed, 8 insertions(+), 16 deletions(-)

diff --git a/src/amd/common/ac_nir_to_llvm.c b/src/amd/common/ac_nir_to_llvm.c
index 1cf181ddeba..9e4069b535a 100644
--- a/src/amd/common/ac_nir_to_llvm.c
+++ b/src/amd/common/ac_nir_to_llvm.c
@@ -89,7 +89,6 @@ struct radv_shader_context {
LLVMValueRef tes_v;
 
LLVMValueRef gsvs_ring_stride;
-   LLVMValueRef gsvs_num_entries;
LLVMValueRef gs2vs_offset;
LLVMValueRef gs_wave_id;
LLVMValueRef gs_vtx_offset[6];
@@ -637,7 +636,7 @@ static void allocate_user_sgprs(struct radv_shader_context 
*ctx,
user_sgpr_info->sgpr_count += 
count_vs_user_sgprs(ctx);
}
}
-   user_sgpr_info->sgpr_count += 2;
+   user_sgpr_info->sgpr_count += 1;
break;
default:
break;
@@ -969,8 +968,6 @@ static void create_function(struct radv_shader_context *ctx,
 
add_arg(, ARG_SGPR, ctx->ac.i32,
>gsvs_ring_stride);
-   add_arg(, ARG_SGPR, ctx->ac.i32,
-   >gsvs_num_entries);
if (needs_view_index)
add_arg(, ARG_SGPR, ctx->ac.i32,
>abi.view_index);
@@ -1000,8 +997,6 @@ static void create_function(struct radv_shader_context 
*ctx,
 
add_arg(, ARG_SGPR, ctx->ac.i32,
>gsvs_ring_stride);
-   add_arg(, ARG_SGPR, ctx->ac.i32,
-   >gsvs_num_entries);
if (needs_view_index)
add_arg(, ARG_SGPR, ctx->ac.i32,
>abi.view_index);
@@ -1128,7 +1123,7 @@ static void create_function(struct radv_shader_context 
*ctx,
   _sgpr_idx);
}
set_loc_shader(ctx, AC_UD_GS_VS_RING_STRIDE_ENTRIES,
-  _sgpr_idx, 2);
+  _sgpr_idx, 1);
if (ctx->abi.view_index)
set_loc_shader(ctx, AC_UD_VIEW_INDEX, _sgpr_idx, 
1);
break;
@@ -6744,12 +6739,16 @@ ac_setup_rings(struct radv_shader_context *ctx)
}
if (ctx->stage == MESA_SHADER_GEOMETRY) {
LLVMValueRef tmp;
+   uint32_t num_entries = 64;
ctx->esgs_ring = ac_build_load_to_sgpr(>ac, 
ctx->ring_offsets, LLVMConstInt(ctx->ac.i32, RING_ESGS_GS, false));
ctx->gsvs_ring = ac_build_load_to_sgpr(>ac, 
ctx->ring_offsets, LLVMConstInt(ctx->ac.i32, RING_GSVS_GS, false));
 
ctx->gsvs_ring = LLVMBuildBitCast(ctx->ac.builder, 
ctx->gsvs_ring, ctx->ac.v4i32, "");
 
-   ctx->gsvs_ring = LLVMBuildInsertElement(ctx->ac.builder, 
ctx->gsvs_ring, ctx->gsvs_num_entries, LLVMConstInt(ctx->ac.i32, 2, false), "");
+   tmp = LLVMConstInt(ctx->ac.i32, num_entries, false);
+   if (ctx->options->chip_class >= VI)
+   tmp = LLVMBuildMul(ctx->ac.builder, 
LLVMBuildLShr(ctx->ac.builder, ctx->gsvs_ring_stride, LLVMConstInt(ctx->ac.i32, 
16, false), ""), tmp, "");
+   ctx->gsvs_ring = LLVMBuildInsertElement(ctx->ac.builder, 
ctx->gsvs_ring, tmp, LLVMConstInt(ctx->ac.i32, 2, false), "");
tmp = LLVMBuildExtractElement(ctx->ac.builder, ctx->gsvs_ring, 
ctx->ac.i32_1, "");
tmp = LLVMBuildOr(ctx->ac.builder, tmp, ctx->gsvs_ring_stride, 
"");
ctx->gsvs_ring = LLVMBuildInsertElement(ctx->ac.builder, 
ctx->gsvs_ring, tmp, ctx->ac.i32_1, "");
diff --git a/src/amd/vulkan/radv_pipeline.c b/src/amd/vulkan/radv_pipeline.c
index e9a9ae975b1..5d1b5f6e352 100644
--- a/src/amd/vulkan/radv_pipeline.c
+++ b/src/amd/vulkan/radv_pipeline.c
@@ -2622,16 +2622,9 @@ radv_pipeline_generate_geometry_shader(struct 
radeon_winsys_cs *cs,
 
AC_UD_GS_VS_RING_STRIDE_ENTRIES);
if (loc->sgpr_idx != -1) {
uint32_t stride = gs->info.gs.max_gsvs_emit_size;
-   uint32_t num_entries = 64;
-   bool is_vi = 
pipeline->device->physical_device->rad_info.chip_class >= VI;
-
-   if (is_vi)
-   num_entries *= stride;
-
stride = S_008F04_STRIDE(stride);
-   radeon_set_sh_reg_seq(cs, R_00B230_SPI_SHADER_USER_DATA_GS_0 + 
loc->sgpr_idx * 4, 2);
+   radeon_set_sh_reg_seq(cs, R_00B230_SPI_SHADER_USER_DATA_GS_0 + 
loc->sgpr_idx * 4, 1);
 

[Mesa-dev] [PATCH 08/14] radv: pass num_patches to tes from tcs

2018-02-20 Thread Dave Airlie
From: Dave Airlie 

TES needs num_patches to do some of the calculations.

Signed-off-by: Dave Airlie 
---
 src/amd/common/ac_nir_to_llvm.c | 4 +++-
 src/amd/common/ac_nir_to_llvm.h | 3 ++-
 src/amd/vulkan/radv_pipeline.c  | 4 
 3 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/src/amd/common/ac_nir_to_llvm.c b/src/amd/common/ac_nir_to_llvm.c
index 90b27603266..981c5d542a4 100644
--- a/src/amd/common/ac_nir_to_llvm.c
+++ b/src/amd/common/ac_nir_to_llvm.c
@@ -2804,7 +2804,7 @@ static LLVMValueRef calc_param_stride(struct 
radv_shader_context *ctx,
else
param_stride = LLVMConstInt(ctx->ac.i32, 
ctx->tcs_num_patches, false);
} else {
-   LLVMValueRef num_patches = unpack_param(>ac, 
ctx->tcs_offchip_layout, 0, 9);
+   LLVMValueRef num_patches = LLVMConstInt(ctx->ac.i32, 
ctx->tcs_num_patches, false);
LLVMValueRef vertices_per_patch = LLVMConstInt(ctx->ac.i32, 
ctx->tcs_vertices_per_patch, false);
if (vertex_index)
param_stride = LLVMBuildMul(ctx->ac.builder, 
vertices_per_patch,
@@ -6956,6 +6956,7 @@ LLVMModuleRef 
ac_translate_nir_to_llvm(LLVMTargetMachineRef tm,
ctx.abi.load_tess_coord = load_tess_coord;
ctx.abi.load_patch_vertices_in = load_patch_vertices_in;
ctx.tcs_vertices_per_patch = 
shaders[i]->info.tess.tcs_vertices_out;
+   ctx.tcs_num_patches = ctx.options->key.tes.num_patches;
} else if (shaders[i]->info.stage == MESA_SHADER_VERTEX) {
if (shader_info->info.vs.needs_instance_id) {
if (ctx.options->key.vs.as_ls) {
@@ -7021,6 +7022,7 @@ LLVMModuleRef 
ac_translate_nir_to_llvm(LLVMTargetMachineRef tm,
} else if (shaders[i]->info.stage == MESA_SHADER_TESS_CTRL) {
shader_info->tcs.outputs_written = 
ctx.tess_outputs_written;
shader_info->tcs.patch_outputs_written = 
ctx.tess_patch_outputs_written;
+   shader_info->tcs.num_patches = ctx.tcs_num_patches;
assert(ctx.tess_outputs_written == 
ctx.shader_info->info.tcs.outputs_written);
assert(ctx.tess_patch_outputs_written == 
ctx.shader_info->info.tcs.patch_outputs_written);
} else if (shaders[i]->info.stage == MESA_SHADER_VERTEX && 
ctx.options->key.vs.as_ls) {
diff --git a/src/amd/common/ac_nir_to_llvm.h b/src/amd/common/ac_nir_to_llvm.h
index d81123144df..f1348c849b2 100644
--- a/src/amd/common/ac_nir_to_llvm.h
+++ b/src/amd/common/ac_nir_to_llvm.h
@@ -49,6 +49,7 @@ struct ac_vs_variant_key {
 struct ac_tes_variant_key {
uint32_t as_es:1;
uint32_t export_prim_id:1;
+   uint32_t num_patches;
 };
 
 struct ac_tcs_variant_key {
@@ -201,7 +202,7 @@ struct ac_shader_variant_info {
uint64_t outputs_written;
/* Which patch outputs are actually written */
uint32_t patch_outputs_written;
-
+   uint32_t num_patches;
} tcs;
struct {
struct ac_vs_output_info outinfo;
diff --git a/src/amd/vulkan/radv_pipeline.c b/src/amd/vulkan/radv_pipeline.c
index 06b2db8455f..92a8d8c7051 100644
--- a/src/amd/vulkan/radv_pipeline.c
+++ b/src/amd/vulkan/radv_pipeline.c
@@ -1786,6 +1786,7 @@ void radv_create_shaders(struct radv_pipeline *pipeline,

  _sizes[MESA_SHADER_TESS_CTRL]);
}
modules[MESA_SHADER_VERTEX] = NULL;
+   keys[MESA_SHADER_TESS_EVAL].tes.num_patches = 
pipeline->shaders[MESA_SHADER_TESS_CTRL]->info.tcs.num_patches;
}
 
if (device->physical_device->rad_info.chip_class >= GFX9 && 
modules[MESA_SHADER_GEOMETRY]) {
@@ -1805,6 +1806,9 @@ void radv_create_shaders(struct radv_pipeline *pipeline,
if (i == MESA_SHADER_TESS_CTRL) {
keys[MESA_SHADER_TESS_CTRL].tcs.num_inputs = 
util_last_bit64(pipeline->shaders[MESA_SHADER_VERTEX]->info.info.vs.ls_outputs_written);
}
+   if (i == MESA_SHADER_TESS_EVAL) {
+   keys[MESA_SHADER_TESS_EVAL].tes.num_patches = 
pipeline->shaders[MESA_SHADER_TESS_CTRL]->info.tcs.num_patches;
+   }
pipeline->shaders[i] = 
radv_shader_variant_create(device, modules[i], [i], 1,
  
pipeline->layout,
  keys 
+ i, [i],
-- 
2.14.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org

[Mesa-dev] [PATCH 05/14] radv: drop tcs_out_layout

2018-02-20 Thread Dave Airlie
From: Dave Airlie 

Move all calculations to shader generation.
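
A standalone sketch of the two strides that replace the packed
tcs_out_layout word, following the arithmetic in the hunks below
(16 bytes per output slot, result in dwords). The function and parameter
names are illustrative only; in the patch the inputs come from the
shader info and the shader context.

```c
#include <stdint.h>

/* Hypothetical helper: per-vertex output stride in dwords. */
static uint32_t tcs_out_vertex_stride_dw(uint32_t num_tcs_outputs)
{
    uint32_t output_vertex_size = num_tcs_outputs * 16; /* bytes */

    return output_vertex_size / 4;
}

/* Hypothetical helper: per-patch output stride in dwords, covering the
 * per-vertex outputs of every vertex plus the per-patch outputs. */
static uint32_t tcs_out_patch_stride_dw(uint32_t num_tcs_outputs,
                                        uint32_t num_tcs_patch_outputs,
                                        uint32_t vertices_per_patch)
{
    uint32_t output_vertex_size = num_tcs_outputs * 16;
    uint32_t pervertex_output_patch_size =
        vertices_per_patch * output_vertex_size;
    uint32_t output_patch_size =
        pervertex_output_patch_size + num_tcs_patch_outputs * 16;

    return output_patch_size / 4;
}
```

For example, 5 per-vertex outputs, 2 per-patch outputs and 3 vertices
per patch give a vertex stride of 20 dwords and a patch stride of
68 dwords.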

Signed-off-by: Dave Airlie 
---
 src/amd/common/ac_nir_to_llvm.c | 22 +-
 src/amd/vulkan/radv_pipeline.c  |  8 ++--
 2 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/src/amd/common/ac_nir_to_llvm.c b/src/amd/common/ac_nir_to_llvm.c
index 923bfaabb97..b6c84390a1e 100644
--- a/src/amd/common/ac_nir_to_llvm.c
+++ b/src/amd/common/ac_nir_to_llvm.c
@@ -83,7 +83,6 @@ struct radv_shader_context {
 
LLVMValueRef tcs_offchip_layout;
LLVMValueRef tcs_out_offsets;
-   LLVMValueRef tcs_out_layout;
LLVMValueRef oc_lds;
LLVMValueRef merged_wave_info;
LLVMValueRef tess_factor_offset;
@@ -355,13 +354,22 @@ get_tcs_in_patch_stride(struct radv_shader_context *ctx)
 static LLVMValueRef
 get_tcs_out_patch_stride(struct radv_shader_context *ctx)
 {
-   return unpack_param(>ac, ctx->tcs_out_layout, 0, 13);
+   uint32_t num_tcs_outputs = 
util_last_bit64(ctx->shader_info->info.tcs.outputs_written);
+   uint32_t num_tcs_patch_outputs = 
util_last_bit64(ctx->shader_info->info.tcs.patch_outputs_written);
+   uint32_t output_vertex_size = num_tcs_outputs * 16;
+   uint32_t pervertex_output_patch_size = ctx->tcs_vertices_per_patch * 
output_vertex_size;
+   uint32_t output_patch_size = pervertex_output_patch_size + 
num_tcs_patch_outputs * 16;
+   output_patch_size /= 4;
+   return LLVMConstInt(ctx->ac.i32, output_patch_size, false);
 }
 
 static LLVMValueRef
 get_tcs_out_vertex_stride(struct radv_shader_context *ctx)
 {
-   return unpack_param(>ac, ctx->tcs_out_layout, 13, 8);
+   uint32_t num_tcs_outputs = 
util_last_bit64(ctx->shader_info->info.tcs.outputs_written);
+   uint32_t output_vertex_size = num_tcs_outputs * 16;
+   output_vertex_size /= 4;
+   return LLVMConstInt(ctx->ac.i32, output_vertex_size, false);
 }
 
 static LLVMValueRef
@@ -533,7 +541,7 @@ static void allocate_user_sgprs(struct radv_shader_context 
*ctx,
if (previous_stage == MESA_SHADER_VERTEX)
user_sgpr_info->sgpr_count += 
count_vs_user_sgprs(ctx);
}
-   user_sgpr_info->sgpr_count += 3;
+   user_sgpr_info->sgpr_count += 2;
break;
case MESA_SHADER_TESS_EVAL:
user_sgpr_info->sgpr_count += 1;
@@ -804,8 +812,6 @@ static void create_function(struct radv_shader_context *ctx,
>tcs_offchip_layout);
add_arg(, ARG_SGPR, ctx->ac.i32,
>tcs_out_offsets);
-   add_arg(, ARG_SGPR, ctx->ac.i32,
-   >tcs_out_layout);
if (needs_view_index)
add_arg(, ARG_SGPR, ctx->ac.i32,
>abi.view_index);
@@ -827,8 +833,6 @@ static void create_function(struct radv_shader_context *ctx,
>tcs_offchip_layout);
add_arg(, ARG_SGPR, ctx->ac.i32,
>tcs_out_offsets);
-   add_arg(, ARG_SGPR, ctx->ac.i32,
-   >tcs_out_layout);
if (needs_view_index)
add_arg(, ARG_SGPR, ctx->ac.i32,
>abi.view_index);
@@ -1037,7 +1041,7 @@ static void create_function(struct radv_shader_context 
*ctx,
case MESA_SHADER_TESS_CTRL:
set_vs_specific_input_locs(ctx, stage, has_previous_stage,
   previous_stage, _sgpr_idx);
-   set_loc_shader(ctx, AC_UD_TCS_OFFCHIP_LAYOUT, _sgpr_idx, 
3);
+   set_loc_shader(ctx, AC_UD_TCS_OFFCHIP_LAYOUT, _sgpr_idx, 
2);
if (ctx->abi.view_index)
set_loc_shader(ctx, AC_UD_VIEW_INDEX, _sgpr_idx, 
1);
break;
diff --git a/src/amd/vulkan/radv_pipeline.c b/src/amd/vulkan/radv_pipeline.c
index 79a9c1f6c87..d51b191788d 100644
--- a/src/amd/vulkan/radv_pipeline.c
+++ b/src/amd/vulkan/radv_pipeline.c
@@ -62,7 +62,6 @@ struct radv_blend_state {
 
 struct radv_tessellation_state {
uint32_t ls_hs_config;
-   uint32_t tcs_out_layout;
uint32_t tcs_out_offsets;
uint32_t offchip_layout;
unsigned num_patches;
@@ -1382,8 +1381,6 @@ calculate_tess_state(struct radv_pipeline *pipeline,
 
tess.lds_size = lds_size;
 
-   tess.tcs_out_layout = (output_patch_size / 4) |
-   ((output_vertex_size / 4) << 13);
tess.tcs_out_offsets = (output_patch0_offset / 16) |
((perpatch_output_offset / 16) << 16);
tess.offchip_layout = (pervertex_output_patch_size * num_patches << 16) 
|
@@ -2615,12 +2612,11 @@ radv_pipeline_generate_tess_shaders(struct 
radeon_winsys_cs *cs,

[Mesa-dev] [PATCH 07/14] radv: drop tess offchip layout for tcs.

2018-02-20 Thread Dave Airlie
From: Dave Airlie 

This removes the last TCS specific user sgpr.

Signed-off-by: Dave Airlie 
---
 src/amd/common/ac_nir_to_llvm.c | 118 ++--
 src/amd/common/ac_nir_to_llvm.h |   2 +-
 src/amd/vulkan/radv_pipeline.c  |   9 ---
 src/amd/vulkan/radv_shader.c|   2 +-
 4 files changed, 92 insertions(+), 39 deletions(-)

diff --git a/src/amd/common/ac_nir_to_llvm.c b/src/amd/common/ac_nir_to_llvm.c
index 396b98698e6..90b27603266 100644
--- a/src/amd/common/ac_nir_to_llvm.c
+++ b/src/amd/common/ac_nir_to_llvm.c
@@ -125,6 +125,7 @@ struct radv_shader_context {
uint64_t tcs_outputs_read;
uint32_t tcs_vertices_per_patch;
uint32_t tcs_num_inputs;
+   uint32_t tcs_num_patches;
 };
 
 static inline struct radv_shader_context *
@@ -319,6 +320,46 @@ static LLVMValueRef get_rel_patch_id(struct 
radv_shader_context *ctx)
}
 }
 
+static unsigned
+get_tcs_num_patches(struct radv_shader_context *ctx)
+{
+   unsigned num_tcs_input_cp = ctx->options->key.tcs.input_vertices;
+   unsigned num_tcs_output_cp = ctx->tcs_vertices_per_patch;
+   uint32_t input_vertex_size = ctx->tcs_num_inputs * 16;
+   uint32_t input_patch_size = ctx->options->key.tcs.input_vertices * 
input_vertex_size;
+   uint32_t num_tcs_outputs = 
util_last_bit64(ctx->shader_info->info.tcs.outputs_written);
+   uint32_t num_tcs_patch_outputs = 
util_last_bit64(ctx->shader_info->info.tcs.patch_outputs_written);
+   uint32_t output_vertex_size = num_tcs_outputs * 16;
+   uint32_t pervertex_output_patch_size = ctx->tcs_vertices_per_patch * 
output_vertex_size;
+   uint32_t output_patch_size = pervertex_output_patch_size + 
num_tcs_patch_outputs * 16;
+   unsigned num_patches;
+   unsigned hardware_lds_size;
+
+   /* Ensure that we only need one wave per SIMD so we don't need to check
+* resource usage. Also ensures that the number of tcs in and out
+* vertices per threadgroup are at most 256.
+*/
+   num_patches = 64 / MAX2(num_tcs_input_cp, num_tcs_output_cp) * 4;
+   /* Make sure that the data fits in LDS. This assumes the shaders only
+* use LDS for the inputs and outputs.
+*/
+   hardware_lds_size = ctx->options->chip_class >= CIK ? 65536 : 32768;
+   num_patches = MIN2(num_patches, hardware_lds_size / (input_patch_size + 
output_patch_size));
+   /* Make sure the output data fits in the offchip buffer */
+   num_patches = MIN2(num_patches, 
(ctx->options->tess_offchip_block_dw_size * 4) / output_patch_size);
+   /* Not necessary for correctness, but improves performance. The
+* specific value is taken from the proprietary driver.
+*/
+   num_patches = MIN2(num_patches, 40);
+
+   /* SI bug workaround - limit LS-HS threadgroups to only one wave. */
+   if (ctx->options->chip_class == SI) {
+   unsigned one_wave = 64 / MAX2(num_tcs_input_cp, 
num_tcs_output_cp);
+   num_patches = MIN2(num_patches, one_wave);
+   }
+   return num_patches;
+}
+
 /* Tessellation shaders pass outputs to the next shader using LDS.
  *
  * LS outputs = TCS inputs
@@ -378,17 +419,17 @@ get_tcs_out_patch0_offset(struct radv_shader_context *ctx)
uint32_t input_vertex_size = ctx->tcs_num_inputs * 16;
uint32_t input_patch_size = ctx->options->key.tcs.input_vertices * 
input_vertex_size;
uint32_t output_patch0_offset = input_patch_size;
-   LLVMValueRef num_patches = unpack_param(>ac, 
ctx->tcs_offchip_layout, 0, 9);
+   unsigned num_patches = ctx->tcs_num_patches;
 
+   output_patch0_offset *= num_patches;
output_patch0_offset /= 4;
-   return LLVMBuildMul(ctx->ac.builder,
-   num_patches,
-   LLVMConstInt(ctx->ac.i32, output_patch0_offset, 
false), "");
+   return LLVMConstInt(ctx->ac.i32, output_patch0_offset, false);
 }
 
 static LLVMValueRef
 get_tcs_out_patch0_patch_data_offset(struct radv_shader_context *ctx)
 {
+   assert (ctx->stage == MESA_SHADER_TESS_CTRL);
uint32_t input_vertex_size = ctx->tcs_num_inputs * 16;
uint32_t input_patch_size = ctx->options->key.tcs.input_vertices * 
input_vertex_size;
uint32_t output_patch0_offset = input_patch_size;
@@ -396,15 +437,13 @@ get_tcs_out_patch0_patch_data_offset(struct 
radv_shader_context *ctx)
uint32_t num_tcs_outputs = 
util_last_bit64(ctx->shader_info->info.tcs.outputs_written);
uint32_t output_vertex_size = num_tcs_outputs * 16;
uint32_t pervertex_output_patch_size = ctx->tcs_vertices_per_patch * 
output_vertex_size;
-   LLVMValueRef num_patches = unpack_param(>ac, 
ctx->tcs_offchip_layout, 0, 9);
+   unsigned num_patches = ctx->tcs_num_patches;
 
+   output_patch0_offset *= num_patches;
+   output_patch0_offset += pervertex_output_patch_size;
output_patch0_offset /= 4;
- 

[Mesa-dev] [PATCH 10/14] radv: use num_patches output from tcs shader.

2018-02-20 Thread Dave Airlie
From: Dave Airlie 

Instead of recalculating the value, use the shader calculated value.
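
For reference, the clamping sequence that now lives only on the shader
side can be sketched as a pure function. The names and the flat
parameter list are assumptions made for this example; the limits
themselves (64-wide waves, the LDS and offchip buffer fits, the cap
of 40, and the SI one-wave workaround) mirror the code removed below.

```c
#include <assert.h>
#include <stdbool.h>

static unsigned min_u(unsigned a, unsigned b) { return a < b ? a : b; }
static unsigned max_u(unsigned a, unsigned b) { return a > b ? a : b; }

/* Hypothetical standalone version of the num_patches calculation.
 * Sizes are in bytes except offchip_block_dw_size, which is in dwords. */
static unsigned tcs_num_patches(unsigned num_tcs_input_cp,
                                unsigned num_tcs_output_cp,
                                unsigned input_patch_size,
                                unsigned output_patch_size,
                                unsigned hardware_lds_size,
                                unsigned offchip_block_dw_size,
                                bool is_si)
{
    unsigned max_cp = max_u(num_tcs_input_cp, num_tcs_output_cp);
    unsigned num_patches;

    /* Guard the divisions below; the sketch assumes non-empty patches. */
    assert(max_cp > 0 && output_patch_size > 0);

    /* At most one wave per SIMD, i.e. at most 256 vertices per group. */
    num_patches = 64 / max_cp * 4;
    /* Inputs and outputs of all patches must fit in LDS. */
    num_patches = min_u(num_patches,
                        hardware_lds_size /
                        (input_patch_size + output_patch_size));
    /* Outputs must fit in the offchip buffer. */
    num_patches = min_u(num_patches,
                        (offchip_block_dw_size * 4) / output_patch_size);
    /* Performance cap, not needed for correctness. */
    num_patches = min_u(num_patches, 40);

    /* SI workaround: limit LS-HS threadgroups to a single wave. */
    if (is_si)
        num_patches = min_u(num_patches, 64 / max_cp);

    return num_patches;
}
```

Because every input is known when the TCS is compiled, the pipeline can
read the result back from the shader info instead of repeating the
calculation.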

Signed-off-by: Dave Airlie 
---
 src/amd/vulkan/radv_pipeline.c | 30 ++
 1 file changed, 2 insertions(+), 28 deletions(-)

diff --git a/src/amd/vulkan/radv_pipeline.c b/src/amd/vulkan/radv_pipeline.c
index 5fdbce093d0..c7c23a85ac1 100644
--- a/src/amd/vulkan/radv_pipeline.c
+++ b/src/amd/vulkan/radv_pipeline.c
@@ -1311,7 +1311,7 @@ calculate_tess_state(struct radv_pipeline *pipeline,
unsigned num_tcs_patch_outputs;
unsigned input_vertex_size, output_vertex_size, 
pervertex_output_patch_size;
unsigned input_patch_size, output_patch_size, output_patch0_offset;
-   unsigned lds_size, hardware_lds_size;
+   unsigned lds_size;
unsigned num_patches;
struct radv_tessellation_state tess = {0};
 
@@ -1334,34 +1334,8 @@ calculate_tess_state(struct radv_pipeline *pipeline,
 
pervertex_output_patch_size = num_tcs_output_cp * output_vertex_size;
output_patch_size = pervertex_output_patch_size + num_tcs_patch_outputs 
* 16;
-   /* Ensure that we only need one wave per SIMD so we don't need to check
-* resource usage. Also ensures that the number of tcs in and out
-* vertices per threadgroup are at most 256.
-*/
-   num_patches = 64 / MAX2(num_tcs_input_cp, num_tcs_output_cp) * 4;
-
-   /* Make sure that the data fits in LDS. This assumes the shaders only
-* use LDS for the inputs and outputs.
-*/
-   hardware_lds_size = 
pipeline->device->physical_device->rad_info.chip_class >= CIK ? 65536 : 32768;
-   num_patches = MIN2(num_patches, hardware_lds_size / (input_patch_size + 
output_patch_size));
-
-   /* Make sure the output data fits in the offchip buffer */
-   num_patches = MIN2(num_patches,
-   (pipeline->device->tess_offchip_block_dw_size * 4) /
-   output_patch_size);
-
-   /* Not necessary for correctness, but improves performance. The
-* specific value is taken from the proprietary driver.
-*/
-   num_patches = MIN2(num_patches, 40);
-
-   /* SI bug workaround - limit LS-HS threadgroups to only one wave. */
-   if (pipeline->device->physical_device->rad_info.chip_class == SI) {
-   unsigned one_wave = 64 / MAX2(num_tcs_input_cp, 
num_tcs_output_cp);
-   num_patches = MIN2(num_patches, one_wave);
-   }
 
+   num_patches = 
pipeline->shaders[MESA_SHADER_TESS_CTRL]->info.tcs.num_patches;
output_patch0_offset = input_patch_size * num_patches;
/*  perpatch_output_offset = output_patch0_offset + 
pervertex_output_patch_size;*/
 
-- 
2.14.3



[Mesa-dev] [PATCH 02/14] ac/shader_info: start gathering tess output info

2018-02-20 Thread Dave Airlie
From: Dave Airlie 

This gathers the ls outputs written by the vertex shader
and the tcs outputs. These are needed to calculate certain
tcs parameters.

These have to be separate for combined gfx9 shaders.
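
A minimal sketch of the bookkeeping, outside of Mesa: each output
variable sets a run of num_slots bits starting at its unique index, and
later patches in the series derive the output count from the highest set
bit. The helper names are invented for this example, and last_bit64 is a
plain reimplementation with the same contract as Mesa's util_last_bit64,
not the real function.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: mark num_slots consecutive slots starting at
 * param as written and return the updated mask. */
static uint64_t mark_slots(uint64_t written, unsigned param,
                           unsigned num_slots)
{
    /* Shifting a 64-bit value by 64 or more is undefined in C, so the
     * sketch rejects ranges that do not fit. */
    assert(num_slots > 0 && num_slots < 64 && param + num_slots <= 64);

    uint64_t mask = (1ull << num_slots) - 1ull;

    return written | (mask << param);
}

/* Position of the highest set bit counted from 1, or 0 for an empty
 * mask. */
static unsigned last_bit64(uint64_t v)
{
    unsigned n = 0;

    while (v) {
        n++;
        v >>= 1;
    }
    return n;
}
```

Note that the count obtained this way is "highest slot used plus one",
not a population count, so unused slots below the highest one still take
up space in the layout.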

Signed-off-by: Dave Airlie 
---
 src/amd/common/ac_shader_info.c | 48 +++--
 src/amd/common/ac_shader_info.h |  5 +
 2 files changed, 51 insertions(+), 2 deletions(-)

diff --git a/src/amd/common/ac_shader_info.c b/src/amd/common/ac_shader_info.c
index 5ae8a720462..5f2b34e34d0 100644
--- a/src/amd/common/ac_shader_info.c
+++ b/src/amd/common/ac_shader_info.c
@@ -30,6 +30,23 @@ static void mark_sampler_desc(const nir_variable *var,
info->desc_set_used_mask |= (1 << var->data.descriptor_set);
 }
 
+static void mark_ls_output(struct ac_shader_info *info,
+  uint32_t param, int num_slots)
+{
+   uint64_t mask = (1ull << num_slots) - 1ull;
+   info->vs.ls_outputs_written |= (mask << param);
+}
+
+static void mark_tess_output(struct ac_shader_info *info,
+bool is_patch, uint32_t param, int num_slots)
+{
+   uint64_t mask = (1ull << num_slots) - 1ull;
+   if (is_patch)
+   info->tcs.patch_outputs_written |= (mask << param);
+   else
+   info->tcs.outputs_written |= (mask << param);
+}
+
 static void
 gather_intrinsic_info(const nir_shader *nir, const nir_intrinsic_instr *instr,
  struct ac_shader_info *info)
@@ -146,6 +163,18 @@ gather_intrinsic_info(const nir_shader *nir, const 
nir_intrinsic_instr *instr,
}
}
break;
+   case nir_intrinsic_store_var:
+   if (nir->info.stage == MESA_SHADER_TESS_CTRL) {
+   nir_deref_var *dvar = instr->variables[0];
+   nir_variable *var = dvar->var;
+
+   if (var->data.mode == nir_var_shader_out) {
+   unsigned param = 
shader_io_get_unique_index(var->data.location);
+   int num_slots = 
glsl_count_attribute_slots(glsl_without_array(var->type), false);
+   mark_tess_output(info, var->data.patch, param, 
num_slots);
+   }
+   }
+   break;
default:
break;
}
@@ -238,14 +267,29 @@ gather_info_output_decl_ps(const nir_shader *nir, const 
nir_variable *var,
}
 }
 
+static void
+gather_info_output_decl_vs(const nir_shader *nir, const nir_variable *var,
+  struct ac_shader_info *info)
+{
+   int idx = var->data.location;
+   unsigned param = shader_io_get_unique_index(idx);
+   int num_slots = glsl_count_attribute_slots(var->type, false);
+   mark_ls_output(info, param, num_slots);
+}
+
 static void
 gather_info_output_decl(const nir_shader *nir, const nir_variable *var,
-   struct ac_shader_info *info)
+   struct ac_shader_info *info,
+   const struct ac_nir_compiler_options *options)
 {
switch (nir->info.stage) {
case MESA_SHADER_FRAGMENT:
gather_info_output_decl_ps(nir, var, info);
break;
+   case MESA_SHADER_VERTEX:
+   if (options->key.vs.as_ls)
+   gather_info_output_decl_vs(nir, var, info);
+   break;
default:
break;
}
@@ -270,5 +314,5 @@ ac_nir_shader_info_pass(const struct nir_shader *nir,
}
 
nir_foreach_variable(variable, >outputs)
-   gather_info_output_decl(nir, variable, info);
+   gather_info_output_decl(nir, variable, info, options);
 }
diff --git a/src/amd/common/ac_shader_info.h b/src/amd/common/ac_shader_info.h
index 9574380877a..52741f5935c 100644
--- a/src/amd/common/ac_shader_info.h
+++ b/src/amd/common/ac_shader_info.h
@@ -37,6 +37,7 @@ struct ac_shader_info {
bool uses_invocation_id;
bool uses_prim_id;
struct {
+   uint64_t ls_outputs_written;
uint8_t input_usage_mask[VERT_ATTRIB_MAX];
bool has_vertex_buffers; /* needs vertex buffers and base/start 
*/
bool needs_draw_id;
@@ -57,6 +58,10 @@ struct ac_shader_info {
bool uses_thread_id[3];
bool uses_local_invocation_idx;
} cs;
+   struct {
+   uint64_t outputs_written;
+   uint64_t patch_outputs_written;
+   } tcs;
 };
 
 /* A NIR pass to gather all the info needed to optimise the allocation patterns
-- 
2.14.3



[Mesa-dev] [rfc] radv drop all tess/gs specific user sgprs

2018-02-20 Thread Dave Airlie
It seems to be the season for reducing sgpr usage. I was looking
at the tess/gs sgprs on radv when I realised that everything in them
is static pipeline state known at compile time, so there is no need
to pass them to the shader via the user sgprs.
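
To make the "before" picture concrete: a user sgpr in this scheme
carries a packed 32-bit word, and the shader pulls fields out of it with
a shift and a mask. The sketch below does the same on the CPU. The
helper name is invented for the example, and the field positions
(bits 0-8 and bits 16-31) are taken from the unpack_param calls that the
series removes; the values in the usage note are arbitrary.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical CPU-side equivalent of unpacking one field from a packed
 * user sgpr value: take bitwidth bits starting at bit rshift. */
static uint32_t unpack_bits(uint32_t param, unsigned rshift,
                            unsigned bitwidth)
{
    /* Keep both shifts below 32 so they stay well defined in C. */
    assert(bitwidth > 0 && bitwidth < 32 && rshift + bitwidth <= 32);

    return (param >> rshift) & ((1u << bitwidth) - 1u);
}
```

With a word built as 40u | (7680u << 16), unpack_bits(word, 0, 9) yields
40 and unpack_bits(word, 16, 16) yields 7680. When both fields are
already known at shader compile time, they can be emitted as constants
and the sgpr is no longer needed.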

This series removes all the tess/gs-specific user sgprs from radv.

It first adds support to the info gathering pass to get the tess
outputs for vs/tcs. Then it uses that info and drops each sgpr
in turn. It also moves some of the calculation code, such as the
lds size, to the shader side.

This code should only affect radv specific code in the nir->llvm
code.

I've tested on vega/vi with a cts subset + shader demos.

I'm not really sure this gives any speedup on anything, but it
definitely makes the code a lot leaner. I've probably left
the calculations in the ac code a bit verbose to avoid making
mistakes when moving them over from the pipeline; these could
be cleaned up later.

Dave.


[Mesa-dev] [PATCH 01/14] ac: migrate unique index info shader info

2018-02-20 Thread Dave Airlie
From: Dave Airlie 

This just moves this function to an inline so the shader_info
pass can use it.

Signed-off-by: Dave Airlie 
---
 src/amd/common/ac_nir_to_llvm.c | 22 --
 src/amd/common/ac_shader_info.h | 25 +
 2 files changed, 25 insertions(+), 22 deletions(-)

diff --git a/src/amd/common/ac_nir_to_llvm.c b/src/amd/common/ac_nir_to_llvm.c
index 351e6fa9efc..c21a78b1335 100644
--- a/src/amd/common/ac_nir_to_llvm.c
+++ b/src/amd/common/ac_nir_to_llvm.c
@@ -148,28 +148,6 @@ static unsigned radeon_llvm_reg_index_soa(unsigned index, 
unsigned chan)
return (index * 4) + chan;
 }
 
-static unsigned shader_io_get_unique_index(gl_varying_slot slot)
-{
-   /* handle patch indices separate */
-   if (slot == VARYING_SLOT_TESS_LEVEL_OUTER)
-   return 0;
-   if (slot == VARYING_SLOT_TESS_LEVEL_INNER)
-   return 1;
-   if (slot >= VARYING_SLOT_PATCH0 && slot <= VARYING_SLOT_TESS_MAX)
-   return 2 + (slot - VARYING_SLOT_PATCH0);
-
-   if (slot == VARYING_SLOT_POS)
-   return 0;
-   if (slot == VARYING_SLOT_PSIZ)
-   return 1;
-   if (slot == VARYING_SLOT_CLIP_DIST0)
-   return 2;
-   /* 3 is reserved for clip dist as well */
-   if (slot >= VARYING_SLOT_VAR0 && slot <= VARYING_SLOT_VAR31)
-   return 4 + (slot - VARYING_SLOT_VAR0);
-   unreachable("illegal slot in get unique index\n");
-}
-
 static void set_llvm_calling_convention(LLVMValueRef func,
 gl_shader_stage stage)
 {
diff --git a/src/amd/common/ac_shader_info.h b/src/amd/common/ac_shader_info.h
index 7f87582930c..9574380877a 100644
--- a/src/amd/common/ac_shader_info.h
+++ b/src/amd/common/ac_shader_info.h
@@ -25,6 +25,7 @@
 #define AC_SHADER_INFO_H
 
 #include "compiler/shader_enums.h"
+#include "util/macros.h"
 
 struct nir_shader;
 struct ac_nir_compiler_options;
@@ -66,4 +67,28 @@ ac_nir_shader_info_pass(const struct nir_shader *nir,
const struct ac_nir_compiler_options *options,
struct ac_shader_info *info);
 
+
+static __inline__ unsigned shader_io_get_unique_index(gl_varying_slot slot)
+{
+   /* handle patch indices separate */
+   if (slot == VARYING_SLOT_TESS_LEVEL_OUTER)
+   return 0;
+   if (slot == VARYING_SLOT_TESS_LEVEL_INNER)
+   return 1;
+   if (slot >= VARYING_SLOT_PATCH0 && slot <= VARYING_SLOT_TESS_MAX)
+   return 2 + (slot - VARYING_SLOT_PATCH0);
+
+   if (slot == VARYING_SLOT_POS)
+   return 0;
+   if (slot == VARYING_SLOT_PSIZ)
+   return 1;
+   if (slot == VARYING_SLOT_CLIP_DIST0)
+   return 2;
+   /* 3 is reserved for clip dist as well */
+   if (slot >= VARYING_SLOT_VAR0 && slot <= VARYING_SLOT_VAR31)
+   return 4 + (slot - VARYING_SLOT_VAR0);
+   unreachable("illegal slot in get unique index\n");
+}
+
+
 #endif
-- 
2.14.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 04/14] radv/tess: drop tcs_in_layout setting completely.

2018-02-20 Thread Dave Airlie
From: Dave Airlie 

Inline all calcs at shader creation.

Signed-off-by: Dave Airlie 
---
 src/amd/common/ac_nir_to_llvm.c | 27 ++-
 src/amd/common/ac_nir_to_llvm.h |  1 +
 src/amd/vulkan/radv_pipeline.c  | 12 ++--
 3 files changed, 25 insertions(+), 15 deletions(-)

diff --git a/src/amd/common/ac_nir_to_llvm.c b/src/amd/common/ac_nir_to_llvm.c
index ce3679abedc..923bfaabb97 100644
--- a/src/amd/common/ac_nir_to_llvm.c
+++ b/src/amd/common/ac_nir_to_llvm.c
@@ -84,7 +84,6 @@ struct radv_shader_context {
LLVMValueRef tcs_offchip_layout;
LLVMValueRef tcs_out_offsets;
LLVMValueRef tcs_out_layout;
-   LLVMValueRef tcs_in_layout;
LLVMValueRef oc_lds;
LLVMValueRef merged_wave_info;
LLVMValueRef tess_factor_offset;
@@ -127,6 +126,7 @@ struct radv_shader_context {
uint32_t tcs_patch_outputs_read;
uint64_t tcs_outputs_read;
uint32_t tcs_vertices_per_patch;
+   uint32_t tcs_num_inputs;
 };
 
 static inline struct radv_shader_context *
@@ -345,7 +345,11 @@ static LLVMValueRef
 get_tcs_in_patch_stride(struct radv_shader_context *ctx)
 {
assert (ctx->stage == MESA_SHADER_TESS_CTRL);
-   return unpack_param(&ctx->ac, ctx->tcs_in_layout, 0, 13);
+   uint32_t input_vertex_size = ctx->tcs_num_inputs * 16;
+   uint32_t input_patch_size = ctx->options->key.tcs.input_vertices * input_vertex_size;
+
+   input_patch_size /= 4;
+   return LLVMConstInt(ctx->ac.i32, input_patch_size, false);
 }
 
 static LLVMValueRef
@@ -529,7 +533,7 @@ static void allocate_user_sgprs(struct radv_shader_context 
*ctx,
if (previous_stage == MESA_SHADER_VERTEX)
user_sgpr_info->sgpr_count += 
count_vs_user_sgprs(ctx);
}
-   user_sgpr_info->sgpr_count += 4;
+   user_sgpr_info->sgpr_count += 3;
break;
case MESA_SHADER_TESS_EVAL:
user_sgpr_info->sgpr_count += 1;
@@ -802,8 +806,6 @@ static void create_function(struct radv_shader_context *ctx,
&ctx->tcs_out_offsets);
add_arg(&args, ARG_SGPR, ctx->ac.i32,
&ctx->tcs_out_layout);
-   add_arg(&args, ARG_SGPR, ctx->ac.i32,
-   &ctx->tcs_in_layout);
if (needs_view_index)
add_arg(&args, ARG_SGPR, ctx->ac.i32,
&ctx->abi.view_index);
@@ -827,8 +829,6 @@ static void create_function(struct radv_shader_context *ctx,
&ctx->tcs_out_offsets);
add_arg(&args, ARG_SGPR, ctx->ac.i32,
&ctx->tcs_out_layout);
-   add_arg(&args, ARG_SGPR, ctx->ac.i32,
-   &ctx->tcs_in_layout);
if (needs_view_index)
add_arg(&args, ARG_SGPR, ctx->ac.i32,
&ctx->abi.view_index);
@@ -1037,7 +1037,7 @@ static void create_function(struct radv_shader_context 
*ctx,
case MESA_SHADER_TESS_CTRL:
set_vs_specific_input_locs(ctx, stage, has_previous_stage,
   previous_stage, &user_sgpr_idx);
-   set_loc_shader(ctx, AC_UD_TCS_OFFCHIP_LAYOUT, &user_sgpr_idx, 4);
+   set_loc_shader(ctx, AC_UD_TCS_OFFCHIP_LAYOUT, &user_sgpr_idx, 3);
if (ctx->abi.view_index)
set_loc_shader(ctx, AC_UD_VIEW_INDEX, &user_sgpr_idx, 1);
break;
@@ -2857,7 +2857,9 @@ load_tcs_varyings(struct ac_shader_abi *abi,
unsigned param = shader_io_get_unique_index(location);
 
if (load_input) {
-   stride = unpack_param(&ctx->ac, ctx->tcs_in_layout, 13, 8);
+   uint32_t input_vertex_size = ctx->tcs_num_inputs * 16;
+   input_vertex_size /= 4;
+   stride = LLVMConstInt(ctx->ac.i32, input_vertex_size, false);
dw_addr = get_tcs_in_current_patch_offset(ctx);
} else {
if (!is_patch) {
@@ -6863,6 +6865,10 @@ LLVMModuleRef 
ac_translate_nir_to_llvm(LLVMTargetMachineRef tm,
ctx.abi.load_patch_vertices_in = load_patch_vertices_in;
ctx.abi.store_tcs_outputs = store_tcs_output;
ctx.tcs_vertices_per_patch = 
shaders[i]->info.tess.tcs_vertices_out;
+   if (shader_count == 1)
+   ctx.tcs_num_inputs = 
ctx.options->key.tcs.num_inputs;
+   else
+   ctx.tcs_num_inputs = 
util_last_bit64(shader_info->info.vs.ls_outputs_written);
} else if (shaders[i]->info.stage == MESA_SHADER_TESS_EVAL) {
ctx.tes_primitive_mode = 
shaders[i]->info.tess.primitive_mode;
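
As a rough illustration of the arithmetic this patch inlines, here is a minimal standalone sketch. The helper names are hypothetical and not part of the patch; the only things taken from the diff are the 16 bytes per input slot, the multiplication by key.tcs.input_vertices, and the division by 4 to convert bytes to dwords.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers; the patch performs these calculations inline. */

/* Per-vertex stride of the TCS inputs, in dwords.
 * Each input slot occupies 16 bytes (a vec4 of 32-bit values). */
static uint32_t tcs_in_vertex_stride_dw(uint32_t tcs_num_inputs)
{
    uint32_t input_vertex_size = tcs_num_inputs * 16; /* bytes */
    return input_vertex_size / 4;                     /* dwords */
}

/* Per-patch stride of the TCS inputs, in dwords. */
static uint32_t tcs_in_patch_stride_dw(uint32_t tcs_num_inputs,
                                       uint32_t input_vertices)
{
    uint32_t input_patch_size = input_vertices * tcs_num_inputs * 16; /* bytes */
    return input_patch_size / 4;                                      /* dwords */
}
```

Since both values depend only on data known at shader creation, they can be emitted as constants, which appears to be why the diff drops the tcs_in_layout SGPR and lowers the user SGPR count from 4 to 3.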

[Mesa-dev] [PATCH 03/14] radv: drop ls_out_layout const.

2018-02-20 Thread Dave Airlie
From: Dave Airlie 

We can precalculate input_vertex_size at compile time.

Signed-off-by: Dave Airlie 
---
 src/amd/common/ac_nir_to_llvm.c | 30 --
 src/amd/common/ac_nir_to_llvm.h |  1 -
 src/amd/vulkan/radv_pipeline.c  | 10 --
 3 files changed, 4 insertions(+), 37 deletions(-)

diff --git a/src/amd/common/ac_nir_to_llvm.c b/src/amd/common/ac_nir_to_llvm.c
index c21a78b1335..ce3679abedc 100644
--- a/src/amd/common/ac_nir_to_llvm.c
+++ b/src/amd/common/ac_nir_to_llvm.c
@@ -79,7 +79,6 @@ struct radv_shader_context {
LLVMValueRef vertex_buffers;
LLVMValueRef rel_auto_id;
LLVMValueRef vs_prim_id;
-   LLVMValueRef ls_out_layout;
LLVMValueRef es2gs_offset;
 
LLVMValueRef tcs_offchip_layout;
@@ -345,14 +344,8 @@ static LLVMValueRef get_rel_patch_id(struct 
radv_shader_context *ctx)
 static LLVMValueRef
 get_tcs_in_patch_stride(struct radv_shader_context *ctx)
 {
-   if (ctx->stage == MESA_SHADER_VERTEX)
-   return unpack_param(&ctx->ac, ctx->ls_out_layout, 0, 13);
-   else if (ctx->stage == MESA_SHADER_TESS_CTRL)
-   return unpack_param(&ctx->ac, ctx->tcs_in_layout, 0, 13);
-   else {
-   assert(0);
-   return NULL;
-   }
+   assert (ctx->stage == MESA_SHADER_TESS_CTRL);
+   return unpack_param(&ctx->ac, ctx->tcs_in_layout, 0, 13);
 }
 
 static LLVMValueRef
@@ -530,14 +523,11 @@ static void allocate_user_sgprs(struct 
radv_shader_context *ctx,
case MESA_SHADER_VERTEX:
if (!ctx->is_gs_copy_shader)
user_sgpr_info->sgpr_count += count_vs_user_sgprs(ctx);
-   if (ctx->options->key.vs.as_ls)
-   user_sgpr_info->sgpr_count++;
break;
case MESA_SHADER_TESS_CTRL:
if (has_previous_stage) {
if (previous_stage == MESA_SHADER_VERTEX)
user_sgpr_info->sgpr_count += 
count_vs_user_sgprs(ctx);
-   user_sgpr_info->sgpr_count++;
}
user_sgpr_info->sgpr_count += 4;
break;
@@ -781,9 +771,6 @@ static void create_function(struct radv_shader_context *ctx,
if (ctx->options->key.vs.as_es)
add_arg(&args, ARG_SGPR, ctx->ac.i32,
&ctx->es2gs_offset);
-   else if (ctx->options->key.vs.as_ls)
-   add_arg(&args, ARG_SGPR, ctx->ac.i32,
-   &ctx->ls_out_layout);
 
declare_vs_input_vgprs(ctx, &args);
break;
@@ -809,9 +796,6 @@ static void create_function(struct radv_shader_context *ctx,
has_previous_stage,
previous_stage, &args);
 
-   add_arg(&args, ARG_SGPR, ctx->ac.i32,
-   &ctx->ls_out_layout);
-
add_arg(&args, ARG_SGPR, ctx->ac.i32,
&ctx->tcs_offchip_layout);
add_arg(&args, ARG_SGPR, ctx->ac.i32,
@@ -1049,17 +1033,10 @@ static void create_function(struct radv_shader_context *ctx,
   previous_stage, &user_sgpr_idx);
if (ctx->abi.view_index)
set_loc_shader(ctx, AC_UD_VIEW_INDEX, &user_sgpr_idx, 1);
-   if (ctx->options->key.vs.as_ls) {
-   set_loc_shader(ctx, AC_UD_VS_LS_TCS_IN_LAYOUT,
-  &user_sgpr_idx, 1);
-   }
break;
case MESA_SHADER_TESS_CTRL:
set_vs_specific_input_locs(ctx, stage, has_previous_stage,
   previous_stage, &user_sgpr_idx);
-   if (has_previous_stage)
-   set_loc_shader(ctx, AC_UD_VS_LS_TCS_IN_LAYOUT,
-  &user_sgpr_idx, 1);
set_loc_shader(ctx, AC_UD_TCS_OFFCHIP_LAYOUT, &user_sgpr_idx, 4);
if (ctx->abi.view_index)
set_loc_shader(ctx, AC_UD_VIEW_INDEX, &user_sgpr_idx, 1);
@@ -6218,7 +6195,8 @@ static void
 handle_ls_outputs_post(struct radv_shader_context *ctx)
 {
LLVMValueRef vertex_id = ctx->rel_auto_id;
-   LLVMValueRef vertex_dw_stride = unpack_param(&ctx->ac, ctx->ls_out_layout, 13, 8);
+   uint32_t num_tcs_inputs = util_last_bit64(ctx->shader_info->info.vs.ls_outputs_written);
+   LLVMValueRef vertex_dw_stride = LLVMConstInt(ctx->ac.i32, num_tcs_inputs * 4, false);
LLVMValueRef base_dw_addr = LLVMBuildMul(ctx->ac.builder, vertex_id,
 vertex_dw_stride, "");
 
diff --git a/src/amd/common/ac_nir_to_llvm.h b/src/amd/common/ac_nir_to_llvm.h
index 07cf9656f59..b1cc2b742b4 100644
--- a/src/amd/common/ac_nir_to_llvm.h
+++ b/src/amd/common/ac_nir_to_llvm.h
@@ -103,7 +103,6 @@ 
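
For readers who want to see why the LS output stride can be a compile-time constant, the following is a small sketch under stated assumptions. last_bit64() is a hypothetical stand-in that mirrors the documented behaviour of Mesa's util_last_bit64() (1-based index of the highest set bit, 0 when no bit is set); ls_vertex_dw_stride() is likewise an illustrative name, not a function from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for util_last_bit64(): returns the 1-based
 * position of the most significant set bit, or 0 if mask is 0. */
static unsigned last_bit64(uint64_t mask)
{
    unsigned n = 0;
    while (mask) {
        n++;
        mask >>= 1;
    }
    return n;
}

/* Dword stride between consecutive vertices of LS outputs:
 * one vec4 (4 dwords) per output slot up to the highest slot written. */
static uint32_t ls_vertex_dw_stride(uint64_t ls_outputs_written)
{
    uint32_t num_tcs_inputs = last_bit64(ls_outputs_written);
    return num_tcs_inputs * 4;
}
```

Note that the stride is derived from the highest slot written rather than from the number of bits set, so a sparse mask such as 0x5 still reserves three slots.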

[Mesa-dev] [PATCH] radeonsi/nir: disable GLSL IR loop unrolling

2018-02-20 Thread Timothy Arceri
Delaying unrolling and allowing NIR to do it instead has been shown
to result in better code in drivers such as i965. shader-db results
appear to show the same is true for radeonsi.

The other advantage is that using NIR unrolling improves compile
times significantly.

Totals from affected shaders:
SGPRS: 9624 -> 10016 (4.07 %)
VGPRS: 6800 -> 6464 (-4.94 %)
Spilled SGPRs: 0 -> 2 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 359176 -> 332264 (-7.49 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 1355 -> 1432 (5.68 %)
Wait states: 0 -> 0 (0.00 %)
---
 src/gallium/drivers/radeonsi/si_get.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/gallium/drivers/radeonsi/si_get.c 
b/src/gallium/drivers/radeonsi/si_get.c
index ef03a962d1..18d9cec414 100644
--- a/src/gallium/drivers/radeonsi/si_get.c
+++ b/src/gallium/drivers/radeonsi/si_get.c
@@ -437,6 +437,8 @@ static int si_get_shader_param(struct pipe_screen* pscreen,
case PIPE_SHADER_CAP_MAX_SHADER_IMAGES:
return SI_NUM_IMAGES;
case PIPE_SHADER_CAP_MAX_UNROLL_ITERATIONS_HINT:
+   if (sscreen->debug_flags & DBG(NIR))
+   return 0;
return 32;
case PIPE_SHADER_CAP_PREFERRED_IR:
if (sscreen->debug_flags & DBG(NIR))
-- 
2.14.3
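
The percentages in the shader-db totals above can be reproduced with a relative-change calculation. The sketch below is only an illustration; percent_change() is a hypothetical name, and treating a zero baseline as 0.00 % is an assumption made to match the "0 -> 2 (0.00 %)" line, not something confirmed from the shader-db scripts.

```c
#include <assert.h>

/* Relative change from 'before' to 'after', in percent.
 * Assumption: a zero baseline is reported as 0.0, matching the
 * "Spilled SGPRs: 0 -> 2 (0.00 %)" line in the report. */
static double percent_change(double before, double after)
{
    if (before == 0.0)
        return 0.0;
    return (after - before) / before * 100.0;
}
```

For example, the SGPR line works out as (10016 - 9624) / 9624, roughly 4.07 %, and the code size line as (332264 - 359176) / 359176, roughly -7.49 %.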



Re: [Mesa-dev] [PATCH 5/5] anv/image: Support CCS_E for images which may be used for storage

2018-02-20 Thread Jason Ekstrand
Nanley,

At your request, I did a little fact-finding.  I ran all the Sascha demos,
and only one of them ever hits this resolve: computeshader.  The demo only
hits it once for the entire run probably because it does
vkCmdCopyBufferToImage at the beginning and then uses it as a storage
image.  I'll run some other games when I get to the office tomorrow.  I
really doubt this will significantly affect the performance of any
workloads we have today.  It's possible that some game will do a bunch of
rendering and then run the result through a compute shader for after
effects and we'd want the rendering to be compressed in that case.  Really,
my primary motivation was to force more partial resolves in the CTS so that
we can get better testing.

--Jason

On Tue, Feb 20, 2018 at 1:52 PM, Jason Ekstrand 
wrote:

> We have to do resolves whenever we go into the general layout for these
> images.  However, it also means that images which declare the storage
> usage but don't actually need it most of the time will still get
> compression.
> ---
>  src/intel/vulkan/anv_image.c | 18 +++---
>  1 file changed, 15 insertions(+), 3 deletions(-)
>
> diff --git a/src/intel/vulkan/anv_image.c b/src/intel/vulkan/anv_image.c
> index a297cc4..477d167 100644
> --- a/src/intel/vulkan/anv_image.c
> +++ b/src/intel/vulkan/anv_image.c
> @@ -463,8 +463,7 @@ make_surface(const struct anv_device *dev,
>   * a render target.  This means that it's safe to just leave
>   * compression on at all times for these formats.
>   */
> -if (!(vk_info->usage & VK_IMAGE_USAGE_STORAGE_BIT) &&
> -all_formats_ccs_e_compatible(&dev->info, vk_info)) {
> +if (all_formats_ccs_e_compatible(&dev->info, vk_info)) {
> image->planes[plane].aux_usage = ISL_AUX_USAGE_CCS_E;
>  }
>   }
> @@ -799,9 +798,22 @@ anv_layout_to_aux_usage(const struct gen_device_info
> * const devinfo,
>return ISL_AUX_USAGE_NONE;
>
>
> +   case VK_IMAGE_LAYOUT_GENERAL:
> +  if (aspect == VK_IMAGE_ASPECT_DEPTH_BIT) {
> + return ISL_AUX_USAGE_NONE;
> +  } else if (image->usage & VK_IMAGE_USAGE_STORAGE_BIT) {
> + /* If we might be used as a storage image and we're in the
> general
> +  * layout, we have to disable aux because the dataport doesn't
> +  * support CCS.
> +  */
> + return ISL_AUX_USAGE_NONE;
> +  } else {
> + return image->planes[plane].aux_usage;
> +  }
> +
> +
> /* Transfer Layouts
>  */
> -   case VK_IMAGE_LAYOUT_GENERAL:
> case VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL:
> case VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL:
>if (aspect == VK_IMAGE_ASPECT_DEPTH_BIT) {
> --
> 2.5.0.400.gff86faf
>
>
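
The decision the quoted patch adds for the general layout can be summarised in a few lines. This is a minimal sketch, not the driver code: the enum and function names are hypothetical stand-ins for ISL_AUX_USAGE_* and anv_layout_to_aux_usage(), and the three inputs are simplified to booleans and a per-plane aux usage value.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the ISL_AUX_USAGE_* values. */
enum aux_usage {
    AUX_USAGE_NONE,
    AUX_USAGE_CCS_E,
};

/* Aux usage for an image in the general layout, following the
 * three branches of the quoted patch. */
static enum aux_usage
general_layout_aux_usage(bool is_depth_aspect,
                         bool has_storage_usage,
                         enum aux_usage plane_aux_usage)
{
    if (is_depth_aspect)
        return AUX_USAGE_NONE;
    if (has_storage_usage)
        return AUX_USAGE_NONE; /* the dataport doesn't support CCS */
    return plane_aux_usage;
}
```

The consequence described in the commit message follows directly: a storage-capable image keeps its compression in other layouts and only needs a resolve when it transitions into the general layout.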


Re: [Mesa-dev] [PATCH 5/7] ac/radeonsi: pass bindless bool to load_sampler_desc()

2018-02-20 Thread Timothy Arceri



On 21/02/18 12:10, Marek Olšák wrote:
> On Wed, Feb 21, 2018 at 12:50 AM, Timothy Arceri wrote:
>> On 21/02/18 10:33, Marek Olšák wrote:
>>> On Tue, Feb 20, 2018 at 11:51 PM, Timothy Arceri wrote:
>>>> On 21/02/18 09:46, Marek Olšák wrote:
>>>>> On Tue, Feb 20, 2018 at 11:42 PM, Marek Olšák wrote:
>>>>>> For patches 1-5:
>>>>>>
>>>>>> Reviewed-by: Marek Olšák
>>>>>
>>>>> Actually no. Only patches 1, 3, 5 are reviewed by me.
>>>>>
>>>>> Marek
>>>>
>>>> Do you have an issue with patch 4?
>>>
>>> No, I'm just not sure if it's correct. It calls
>>> st_nir_lookup_parameter_index, but bindless handles are just
>>> variables. I think it should just visit the whole expression leading
>>> to the bindless variable in a generic way and not treat it as a
>>> uniform.
>>
>> I'm not sure I understand. We use uniform storage for bindless in tgsi also.
>
> A bindless (sampler or buffer) variable is represented as a 64-bit
> number in the GL API. It can be passed to shaders in many different
> ways. For example, a bindless sampler2D variable can be a vertex
> shader input (loaded from a vertex buffer).

Right, I should have specified that this series does not yet handle
bindless input/output support; that will require more updates to NIR
itself, as those shaders currently trip asserts. Patch 4, however, is
specifically about bindless uniforms.



Re: [Mesa-dev] [PATCH 5/7] ac/radeonsi: pass bindless bool to load_sampler_desc()

2018-02-20 Thread Marek Olšák
On Wed, Feb 21, 2018 at 12:50 AM, Timothy Arceri  wrote:
> On 21/02/18 10:33, Marek Olšák wrote:
>>
>> On Tue, Feb 20, 2018 at 11:51 PM, Timothy Arceri 
>> wrote:
>>>
>>> On 21/02/18 09:46, Marek Olšák wrote:
>>>>
>>>> On Tue, Feb 20, 2018 at 11:42 PM, Marek Olšák wrote:
>>>>>
>>>>> For patches 1-5:
>>>>>
>>>>> Reviewed-by: Marek Olšák
>>>>
>>>> Actually no. Only patches 1, 3, 5 are reviewed by me.
>>>>
>>>> Marek
>>>
>>> Do you have an issue with patch 4?
>>
>> No, I'm just not sure if it's correct. It calls
>> st_nir_lookup_parameter_index, but bindless handles are just
>> variables. I think it should just visit the whole expression leading
>> to the bindless variable in a generic way and not treat it as a
>> uniform.
>
>
> I'm not sure I understand. We use uniform storage for bindless in tgsi also.

A bindless (sampler or buffer) variable is represented as a 64-bit
number in the GL API. It can be passed to shaders in many different
ways. For example, a bindless sampler2D variable can be a vertex
shader input (loaded from a vertex buffer).

Marek


[Mesa-dev] [PATCH shaderdb 3/3] run: shader program file created via GetProgramBinary (v2)

2018-02-20 Thread Dongwon Kim
Extract the linked program binary to a file using glGetProgramBinary.
This file is intended to be loaded with glProgramBinary by the graphics
application running on the target system.

A new option, '--out=', specifies the output file name.

v2: 1. define MAX_LOG_LEN and use it as the size of gl log
2. define MAX_PROG_SIZE and use it as the max size of extracted
   shader_program
3. out_file is now pointer allocated by strdup for the file name

Signed-off-by: Dongwon Kim 
---
 run.c | 57 +
 1 file changed, 53 insertions(+), 4 deletions(-)

diff --git a/run.c b/run.c
index d066567..df466eb 100644
--- a/run.c
+++ b/run.c
@@ -52,6 +52,9 @@
 
 #define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))
 
+#define MAX_LOG_LEN 4096
+#define MAX_PROG_SIZE (10*1024*1024) /* maximum 10MB for shader program */
+
 struct context_info {
 char *extension_string;
 int extension_string_len;
@@ -358,18 +361,20 @@ const struct platform platforms[] = {
 enum
 {
 PCI_ID_OVERRIDE_OPTION = CHAR_MAX + 1,
+OUT_PROGRAM_OPTION,
 };
 
 const struct option const long_options[] =
 {
 {"pciid", required_argument, NULL, PCI_ID_OVERRIDE_OPTION},
+{"out", required_argument, NULL, OUT_PROGRAM_OPTION},
 {NULL, 0, NULL, 0}
 };
 
 void print_usage(const char *prog_name)
 {
 fprintf(stderr,
-"Usage: %s [-d ] [-j ] [-o ] [-p 
] [--pciid=] \n",
+"Usage: %s [-d ] [-j ] [-o ] [-p 
] [--pciid=] [--out=] \n",
 prog_name);
 }
 
@@ -450,6 +455,7 @@ main(int argc, char **argv)
 int opt;
 bool platf_overridden = 0;
 bool pci_id_overridden = 0;
+char *out_file = NULL;
 
 max_threads = omp_get_max_threads();
 
@@ -518,6 +524,14 @@ main(int argc, char **argv)
 setenv("INTEL_DEVID_OVERRIDE", optarg, 1);
 pci_id_overridden = 1;
 break;
+case OUT_PROGRAM_OPTION:
+if (optarg[0] == 0) {
+  fprintf(stderr, "Output file name is empty.\n");
+  return -1;
+}
+out_file = strdup(optarg);
+assert(out_file != NULL);
+break;
 default:
 fprintf(stderr, "Unknown option: %x\n", opt);
 print_usage(argv[0]);
@@ -751,6 +765,8 @@ main(int argc, char **argv)
 EGLContext compat_ctx = create_context(egl_dpy, cfg, TYPE_COMPAT);
 if (compat_ctx == EGL_NO_CONTEXT) {
 fprintf(stderr, "ERROR: eglCreateContext() failed\n");
+if (out_file)
+free(out_file);
 exit(-1);
 }
 
@@ -858,18 +874,18 @@ main(int argc, char **argv)
 }
 } else if (type == TYPE_CORE || type == TYPE_COMPAT || type == 
TYPE_ES) {
 GLuint prog = glCreateProgram();
+GLint param;
 
 for (unsigned i = 0; i < num_shaders; i++) {
 GLuint s = glCreateShader(shader[i].type);
 glShaderSource(s, 1, &shader[i].text, &shader[i].length);
 glCompileShader(s);
 
-GLint param;
 glGetShaderiv(s, GL_COMPILE_STATUS, &param);
 if (unlikely(!param)) {
-GLchar log[4096];
+GLchar log[MAX_LOG_LEN];
 GLsizei length;
-glGetShaderInfoLog(s, 4096, &length, log);
+glGetShaderInfoLog(s, sizeof(log), &length, log);
 
 fprintf(stderr, "ERROR: %s failed to compile:\n%s\n",
 current_shader_name, log);
@@ -879,6 +895,36 @@ main(int argc, char **argv)
 }
 
 glLinkProgram(prog);
+
+glGetProgramiv(prog, GL_LINK_STATUS, &param);
+if (unlikely(!param)) {
+   GLchar log[MAX_LOG_LEN];
+   GLsizei length;
+   glGetProgramInfoLog(prog, sizeof(log), &length, log);
+
+   fprintf(stderr, "ERROR: failed to link program:\n%s\n",
+   log);
+} else {
+   char *prog_buf = (char *)malloc(MAX_PROG_SIZE);
+   GLenum format;
+   GLsizei length;
+   FILE *fp;
+
+   glGetProgramBinary(prog, MAX_PROG_SIZE, &length, &format, prog_buf);
+
+   param = glGetError();
+   if (param != GL_NO_ERROR) {
+  fprintf(stderr, "ERROR: failed to get Program Binary\n");
+   } else {
+  fp = fopen(out_file, "wb");
+  fprintf(stdout, "Binary program is generated (%d Byte).\n", length);
+  fprintf(stdout, "Binary Format is %d\n", format);
+  fprintf(stdout, "Now writing to the file\n");
+  fwrite(prog_buf, sizeof(char), length, fp);
+  
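
The visible part of the hunk calls fopen() and fwrite() without checking their results, and out_file is NULL when --out is not given. A defensive version of the file-writing step might look like the sketch below. write_blob() is a hypothetical helper, not something from the patch, and it covers only the file I/O, not the GL calls.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: write 'len' bytes from 'buf' to 'path'.
 * Returns 0 on success and -1 on any failure, including a NULL
 * path (for example when no output file was requested). */
static int write_blob(const char *path, const void *buf, size_t len)
{
    if (path == NULL || buf == NULL)
        return -1;

    FILE *fp = fopen(path, "wb");
    if (fp == NULL)
        return -1;

    size_t written = fwrite(buf, 1, len, fp);

    /* fclose() can report a deferred write error, so check it too. */
    if (fclose(fp) != 0)
        return -1;

    return written == len ? 0 : -1;
}
```

A caller would pass the buffer and length returned by glGetProgramBinary and report an error when the helper returns -1.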

Re: [Mesa-dev] [PATCH v2 02/12] genxml: Preserve fields that share dword space with addresses.

2018-02-20 Thread Rafael Antognolli
On Wed, Jan 24, 2018 at 11:20:07AM +0200, Pohjolainen, Topi wrote:
> On Fri, Jan 19, 2018 at 11:54:37AM -0800, Rafael Antognolli wrote:
> > Some instructions contain fields that are either an address or a value
> > of some type based on the content of other fields, such as clear color
> > values vs address. That works fine if these fields are in the less
> > significant dword, the lower 32 bits of the address, because they get
> > OR'ed with the address. But if they are in the higher 32 bits, they get
> > discarded.
> > 
> > On Gen10 we have fields that share space with the higher 16 bits of the
> > address too. This commit makes sure those fields don't get discarded.
> > 
> > Signed-off-by: Rafael Antognolli 
> > ---
> >  src/intel/genxml/gen_pack_header.py | 9 +++--
> >  1 file changed, 7 insertions(+), 2 deletions(-)
> > 
> > diff --git a/src/intel/genxml/gen_pack_header.py 
> > b/src/intel/genxml/gen_pack_header.py
> > index e6cea8646ff..e81695e2aea 100644
> > --- a/src/intel/genxml/gen_pack_header.py
> > +++ b/src/intel/genxml/gen_pack_header.py
> > @@ -486,11 +486,16 @@ class Group(object):
> >  v_address = "v%d_address" % index
> >  print("   const uint64_t %s =\n  
> > __gen_combine_address(data, &dw[%d], values->%s, %s);" %
> >(v_address, index, dw.address.name + field.dim, v))
> > -v = v_address
> > -
> > +if len(dw.fields) > address_count:
> > +print("   dw[%d] = %s;" % (index, v_address))
> > +print("   dw[%d] = (%s >> 32) | (%s >> 32);" % (index 
> > + 1, v_address, v))
> > +continue
> > +else:
> > +v = v_address
> >  print("   dw[%d] = %s;" % (index, v))
> >  print("   dw[%d] = %s >> 32;" % (index + 1, v))
> 
> I'm wondering if we could have left the "continue" out and write the
> else-branch directly just like we did if:
> 
>print("   dw[%d] = %s;" % (index, v_address))
>print("   dw[%d] = %s >> 32;" % (index + 1, v_address))

Hi Topi,

I was rebasing the series on top of master and while applying your
suggestion, I just noticed it's not gonna work. Notice that the last 2
lines are executed both when the else branch is taken, or when the outer
"if dw.address:" is not taken. If I do as you suggest, that last case
won't be covered.

I know it looks really ugly, I'll try to think of a better way to do
this, but for now I'll just submit the updated series with this version
again to get reviews on other stuff.

Thanks for the review anyway.
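
To make the generated C easier to picture, here is a minimal sketch of the two dword stores the patched generator emits when non-address fields share the qword with an address. The function names are hypothetical; v_address stands for the result of __gen_combine_address() and v for the OR of the other packed fields, as in the quoted diff.

```c
#include <assert.h>
#include <stdint.h>

/* Lower dword: the combined address already carries the low bits. */
static uint32_t pack_lower_dword(uint64_t v_address)
{
    return (uint32_t)v_address;
}

/* Upper dword: OR in the upper half of the non-address fields so
 * they are not discarded, which is what the patch fixes. */
static uint32_t pack_upper_dword(uint64_t v_address, uint64_t v)
{
    return (uint32_t)((v_address >> 32) | (v >> 32));
}
```

With a 48-bit address and a field living in the top 16 bits, the upper dword then contains both the high address bits and the field, instead of the address bits alone.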

> >  
> > +
> >  class Value(object):
> >  def __init__(self, attrs):
> >  self.name = safe_name(attrs["name"])
> > -- 
> > 2.14.3
> > 


Re: [Mesa-dev] clover: Fix build after llvm r325155 and r325160

2018-02-20 Thread Jan Vesely
On Wed, 2018-02-21 at 00:50 +0100, Dieter Nützel wrote:
> Thank you Jan,
> 
> much appreciated, but now I get this:
> 
> LLVM-CC amdgcn--/lib/math/half_exp.cl.tahiti.bc
> ./amdgcn/lib/workitem/get_global_offset.cl:6:3: error: casting 
> '__attribute__((address_space(4)))
>unsigned char *' to type '__attribute__((address_space(2))) uint 
> *' (aka
>'__attribute__((address_space(2))) unsigned int *') changes 
> address space of pointer
>  (__attribute__((address_space(2))) uint *)
>  ^
> 1 error generated.
> make: *** [Makefile:6309: amdgcn--/lib/workitem/get_global_offset.cl.tahiti.bc] Error 1
> make: *** Waiting for unfinished jobs....
> ./amdgcn/lib/workitem/get_work_dim.cl:6:3: error: casting 
> '__attribute__((address_space(4)))
>unsigned char *' to type '__attribute__((address_space(2))) uint 
> *' (aka
>'__attribute__((address_space(2))) unsigned int *') changes 
> address space of pointer
>  (__attribute__((address_space(2))) uint *)
>  ^
> 1 error generated.
> make: *** [Makefile:6336: amdgcn--/lib/workitem/get_work_dim.cl.tahiti.bc] Error 1
> 
> LLVM git taken some seconds ago 
> (#99eb4ff05c078ba341c99a5da2d003b346bd092c)

amdgcn backend switched GDS and const address space numbers (for
whatever reason, writing sensible and explanatory commit messages is
apparently frowned upon at AMD).
you'll need at least the 3 gcn patches from an earlier build fix
series[0].

Jan

[0] http://lists.llvm.org/pipermail/libclc-dev/2018-February/002796.html

> 
> Dieter
> 
> On 19.02.2018 23:21, Jan Vesely wrote:
> > On Fri, 2018-02-16 at 05:49 +0100, Dieter Nützel wrote:
> > > Hello Jan,
> > > 
> > > something similar is needed for libclc, too.
> > > 
> > > LLVM-CC nvptx64--nvidiacl/lib/geometric/dot.cl.bc
> > > ./utils/prepare-builtins.cpp:108:3: error: no matching function for 
> > > call
> > > to 'WriteBitcodeToFile'
> > >WriteBitcodeToFile(M, Out->os());
> > >^~
> > > /usr/local/include/llvm/Bitcode/BitcodeWriter.h:129:8: note: candidate
> > > function not viable: no known
> > >conversion from 'llvm::Module *' to 'const llvm::Module' for 
> > > 1st
> > > argument; dereference the
> > >argument with *
> > >void WriteBitcodeToFile(const Module &M, raw_ostream &Out,
> > > ^
> > 
> > patch is now posted at:
> > https://lists.llvm.org/pipermail/libclc-dev/2018-February/002800.html
> > 
> > Jan
> > 
> > > 
> > > Greetings,
> > > Dieter

-- 
Jan Vesely 



Re: [Mesa-dev] [PATCH 5/7] ac/radeonsi: pass bindless bool to load_sampler_desc()

2018-02-20 Thread Timothy Arceri

On 21/02/18 10:33, Marek Olšák wrote:
> On Tue, Feb 20, 2018 at 11:51 PM, Timothy Arceri wrote:
>> On 21/02/18 09:46, Marek Olšák wrote:
>>> On Tue, Feb 20, 2018 at 11:42 PM, Marek Olšák wrote:
>>>> For patches 1-5:
>>>>
>>>> Reviewed-by: Marek Olšák
>>>
>>> Actually no. Only patches 1, 3, 5 are reviewed by me.
>>>
>>> Marek
>>
>> Do you have an issue with patch 4?
>
> No, I'm just not sure if it's correct. It calls
> st_nir_lookup_parameter_index, but bindless handles are just
> variables. I think it should just visit the whole expression leading
> to the bindless variable in a generic way and not treat it as a
> uniform.

I'm not sure I understand. We use uniform storage for bindless in tgsi
also.

> Marek




Re: [Mesa-dev] clover: Fix build after llvm r325155 and r325160

2018-02-20 Thread Dieter Nützel

Thank you Jan,

much appreciated, but now I get this:

LLVM-CC amdgcn--/lib/math/half_exp.cl.tahiti.bc
./amdgcn/lib/workitem/get_global_offset.cl:6:3: error: casting 
'__attribute__((address_space(4)))
  unsigned char *' to type '__attribute__((address_space(2))) uint 
*' (aka
  '__attribute__((address_space(2))) unsigned int *') changes 
address space of pointer

(__attribute__((address_space(2))) uint *)
^
1 error generated.
make: *** [Makefile:6309: amdgcn--/lib/workitem/get_global_offset.cl.tahiti.bc] Error 1
make: *** Waiting for unfinished jobs....
./amdgcn/lib/workitem/get_work_dim.cl:6:3: error: casting 
'__attribute__((address_space(4)))
  unsigned char *' to type '__attribute__((address_space(2))) uint 
*' (aka
  '__attribute__((address_space(2))) unsigned int *') changes 
address space of pointer

(__attribute__((address_space(2))) uint *)
^
1 error generated.
make: *** [Makefile:6336: amdgcn--/lib/workitem/get_work_dim.cl.tahiti.bc] Error 1


LLVM git taken some seconds ago 
(#99eb4ff05c078ba341c99a5da2d003b346bd092c)


Dieter

On 19.02.2018 23:21, Jan Vesely wrote:
> On Fri, 2018-02-16 at 05:49 +0100, Dieter Nützel wrote:
> > Hello Jan,
> >
> > something similar is needed for libclc, too.
> >
> > LLVM-CC nvptx64--nvidiacl/lib/geometric/dot.cl.bc
> > ./utils/prepare-builtins.cpp:108:3: error: no matching function for call
> > to 'WriteBitcodeToFile'
> >    WriteBitcodeToFile(M, Out->os());
> >    ^~
> > /usr/local/include/llvm/Bitcode/BitcodeWriter.h:129:8: note: candidate
> > function not viable: no known
> >    conversion from 'llvm::Module *' to 'const llvm::Module' for 1st
> > argument; dereference the
> >    argument with *
> >    void WriteBitcodeToFile(const Module &M, raw_ostream &Out,
> > ^
>
> patch is now posted at:
> https://lists.llvm.org/pipermail/libclc-dev/2018-February/002800.html
>
> Jan
>
> > Greetings,
> > Dieter



[Mesa-dev] [PATCH 2/2] radeonsi/nir: fix tess varying loads for doubles

2018-02-20 Thread Timothy Arceri
Fixes the following piglit tests:

tests/spec/arb_tessellation_shader/execution/double-array-vs-tcs-tes.shader_test
tests/spec/arb_tessellation_shader/execution/double-vs-tcs-tes.shader_test
---
 src/gallium/drivers/radeonsi/si_shader.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/radeonsi/si_shader.c 
b/src/gallium/drivers/radeonsi/si_shader.c
index 33319b249c..9ccae9f18d 100644
--- a/src/gallium/drivers/radeonsi/si_shader.c
+++ b/src/gallium/drivers/radeonsi/si_shader.c
@@ -1273,7 +1273,7 @@ static LLVMValueRef si_nir_load_tcs_varyings(struct 
ac_shader_abi *abi,
 
LLVMValueRef value[4];
for (unsigned i = 0; i < num_components + component; i++) {
-   value[i] = lds_load(bld_base, ctx->i32, i, dw_addr);
+   value[i] = lds_load(bld_base, type, i, dw_addr);
}
 
return ac_build_varying_gather_values(&ctx->ac, value, num_components, component);
@@ -1360,7 +1360,7 @@ LLVMValueRef si_nir_load_input_tes(struct ac_shader_abi 
*abi,
 */
LLVMValueRef value[4];
for (unsigned i = component; i < num_components + component; i++) {
-   value[i] = buffer_load(&ctx->bld_base, ctx->i32, i, buffer, base, addr, true);
+   value[i] = buffer_load(&ctx->bld_base, type, i, buffer, base, addr, true);
}
 
return ac_build_varying_gather_values(&ctx->ac, value, num_components, component);
-- 
2.14.3



[Mesa-dev] [PATCH 1/2] ac/radeonsi: pass type to load_tess_varyings()

2018-02-20 Thread Timothy Arceri
We need this to be able to load 64bit varyings.
---
 src/amd/common/ac_nir_to_llvm.c   | 15 +--
 src/amd/common/ac_shader_abi.h|  1 +
 src/gallium/drivers/radeonsi/si_shader.c  |  2 ++
 src/gallium/drivers/radeonsi/si_shader_internal.h |  1 +
 4 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/src/amd/common/ac_nir_to_llvm.c b/src/amd/common/ac_nir_to_llvm.c
index 9f55be0d45..50f3a4f69e 100644
--- a/src/amd/common/ac_nir_to_llvm.c
+++ b/src/amd/common/ac_nir_to_llvm.c
@@ -2888,6 +2888,7 @@ get_dw_address(struct radv_shader_context *ctx,
 
 static LLVMValueRef
 load_tcs_varyings(struct ac_shader_abi *abi,
+ LLVMTypeRef type,
  LLVMValueRef vertex_index,
  LLVMValueRef indir_index,
  unsigned const_index,
@@ -3011,6 +3012,7 @@ store_tcs_output(struct ac_shader_abi *abi,
 
 static LLVMValueRef
 load_tes_input(struct ac_shader_abi *abi,
+  LLVMTypeRef type,
   LLVMValueRef vertex_index,
   LLVMValueRef param_index,
   unsigned const_index,
@@ -3149,12 +3151,21 @@ static LLVMValueRef load_tess_varyings(struct 
ac_nir_context *ctx,
 false, NULL, is_patch ? NULL : &vertex_index,
 &const_index, &indir_index);
 
-   result = ctx->abi->load_tess_varyings(ctx->abi, vertex_index, indir_index,
+   LLVMTypeRef dest_type = get_def_type(ctx, &instr->dest.ssa);
+
+   LLVMTypeRef src_component_type;
+   if (LLVMGetTypeKind(dest_type) == LLVMVectorTypeKind)
+   src_component_type = LLVMGetElementType(dest_type);
+   else
+   src_component_type = dest_type;
+
+   result = ctx->abi->load_tess_varyings(ctx->abi, src_component_type,
+ vertex_index, indir_index,
  const_index, location, 
driver_location,
  
instr->variables[0]->var->data.location_frac,
  instr->num_components,
  is_patch, is_compact, 
load_inputs);
-   return LLVMBuildBitCast(ctx->ac.builder, result, get_def_type(ctx, &instr->dest.ssa), "");
+   return LLVMBuildBitCast(ctx->ac.builder, result, dest_type, "");
 }
 
 static LLVMValueRef visit_load_var(struct ac_nir_context *ctx,
diff --git a/src/amd/common/ac_shader_abi.h b/src/amd/common/ac_shader_abi.h
index de3034e32f..75fd8ed554 100644
--- a/src/amd/common/ac_shader_abi.h
+++ b/src/amd/common/ac_shader_abi.h
@@ -96,6 +96,7 @@ struct ac_shader_abi {
LLVMTypeRef type);
 
LLVMValueRef (*load_tess_varyings)(struct ac_shader_abi *abi,
+  LLVMTypeRef type,
   LLVMValueRef vertex_index,
   LLVMValueRef param_index,
   unsigned const_index,
diff --git a/src/gallium/drivers/radeonsi/si_shader.c 
b/src/gallium/drivers/radeonsi/si_shader.c
index ec03f537d0..33319b249c 100644
--- a/src/gallium/drivers/radeonsi/si_shader.c
+++ b/src/gallium/drivers/radeonsi/si_shader.c
@@ -1215,6 +1215,7 @@ static LLVMValueRef fetch_input_tcs(
 }
 
 static LLVMValueRef si_nir_load_tcs_varyings(struct ac_shader_abi *abi,
+LLVMTypeRef type,
 LLVMValueRef vertex_index,
 LLVMValueRef param_index,
 unsigned const_index,
@@ -1316,6 +1317,7 @@ static LLVMValueRef fetch_input_tes(
 }
 
 LLVMValueRef si_nir_load_input_tes(struct ac_shader_abi *abi,
+  LLVMTypeRef type,
   LLVMValueRef vertex_index,
   LLVMValueRef param_index,
   unsigned const_index,
diff --git a/src/gallium/drivers/radeonsi/si_shader_internal.h b/src/gallium/drivers/radeonsi/si_shader_internal.h
index 571df55977..42a1b9f107 100644
--- a/src/gallium/drivers/radeonsi/si_shader_internal.h
+++ b/src/gallium/drivers/radeonsi/si_shader_internal.h
@@ -268,6 +268,7 @@ LLVMValueRef si_llvm_emit_fetch(struct lp_build_tgsi_context *bld_base,
unsigned swizzle);
 
 LLVMValueRef si_nir_load_input_tes(struct ac_shader_abi *abi,
+  LLVMTypeRef type,
   LLVMValueRef vertex_index,
   LLVMValueRef param_index,
   unsigned const_index,
-- 
2.14.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 5/7] ac/radeonsi: pass bindless bool to load_sampler_desc()

2018-02-20 Thread Marek Olšák
On Tue, Feb 20, 2018 at 11:51 PM, Timothy Arceri  wrote:
> On 21/02/18 09:46, Marek Olšák wrote:
>>
>> On Tue, Feb 20, 2018 at 11:42 PM, Marek Olšák  wrote:
>>>
>>> For patches 1-5:
>>>
>>> Reviewed-by: Marek Olšák 
>>
>>
>> Actually no. Only patches 1, 3, 5 are reviewed by me.
>>
>> Marek
>
>
> Do you have an issue with patch 4?

No, I'm just not sure if it's correct. It calls
st_nir_lookup_parameter_index, but bindless handles are just
variables. I think it should just visit the whole expression leading
to the bindless variable in a generic way and not treat it as a
uniform.

Marek


Re: [Mesa-dev] [PATCH 1/6] i965/state: Ignore intel_obj->_Format for depth/stencil and ETC2

2018-02-20 Thread Jason Ekstrand
On Mon, Feb 19, 2018 at 10:01 AM, Chad Versace wrote:

> On Wed 24 Jan 2018, Jason Ekstrand wrote:
> > We're about to start letting the intel_obj->_Format be the "real"
> > texture format.  For depth/stencil textures, this may be a combined
> > depth stencil format.  For ETC2 on gen7 and earlier, this will be the
> > actual ETC2 format.  This makes a bit more GL sense but means we have to
> > be careful in state upload.
>
> What is the "real" format? It's not a rhetorical question. Throughout
> Mesa, I never know what's real and what's not. By "real", do you mean
> the untranslated user-specified glTextureView(internalformat) and
> glTexImage2D(internalformat)?  Or do you mean simply "more real than
> before" ;)
>

By "real" format, I mean the one that the core mesa state tracking code
thinks it is.  For texture views, that corresponds directly to an actual GL
internal format.  For textures created through glTexImage2D (not
TexStorage) with an internal format such as GL_RGB, it's something computed
from the internal format and the format used for upload.


> > ---
> >  src/mesa/drivers/dri/i965/brw_wm_surface_state.c | 16 +++-
> >  1 file changed, 15 insertions(+), 1 deletion(-)
> >
> > diff --git a/src/mesa/drivers/dri/i965/brw_wm_surface_state.c b/src/mesa/drivers/dri/i965/brw_wm_surface_state.c
> > index 38af6bc..844c23b 100644
> > --- a/src/mesa/drivers/dri/i965/brw_wm_surface_state.c
> > +++ b/src/mesa/drivers/dri/i965/brw_wm_surface_state.c
> > @@ -507,7 +507,21 @@ brw_update_texture_surface(struct gl_context *ctx,
> >const unsigned swizzle = (unlikely(alpha_depth) ? SWIZZLE_XYZW :
> >  brw_get_texture_swizzle(>ctx,
> obj));
> >
> > -  mesa_format mesa_fmt = plane == 0 ? intel_obj->_Format : mt->format;
> > +  mesa_format mesa_fmt;
> > +  if (firstImage->_BaseFormat == GL_DEPTH_STENCIL ||
> > +  firstImage->_BaseFormat == GL_DEPTH_COMPONENT) {
> > + /* The format from intel_obj may be a combined depth stencil format
> > +  * when we just want depth.  Pull it from the miptree instead.  This
> > +  * is safe because texture views aren't allowed on depth/stencil.
> > +  */
> > + mesa_fmt = mt->format;
> > +  } else if (mt->etc_format != MESA_FORMAT_NONE) {
> > + mesa_fmt = mt->format;
>
> This looks like it would break ETC2 texture views on hw where we decode
> the ETC2 on upload (Ivybridge?), if such views worked. I suspect such
> texture views never worked.
>

I'm pretty sure they've never worked.


> > +  } else if (plane > 0) {
> > + mesa_fmt = mt->format;
> > +  } else {
> > + mesa_fmt = intel_obj->_Format;
> > +  }
> >enum isl_format format = translate_tex_format(brw, mesa_fmt,
> >  for_txf ?
> GL_DECODE_EXT :
> >
> sampler->sRGBDecode);
>
> I want to give a r-b, but want to first hear your reply regarding the
> "real" format.
>


Re: [Mesa-dev] [PATCH v3] anv/blorp: multisample resolve all attachment layers

2018-02-20 Thread Nanley Chery
On Thu, Feb 15, 2018 at 09:40:16AM +0100, Iago Toral Quiroga wrote:
> We were only resolving the first.
> 
> v2:
>   - Do not require that the number of layers on dst and src are an
> exact match, it is okay if the dst has more layers so long as
> it has at least the same that we are going to resolve.
>   - Do not always resolve array_len layers, we should resolve
> only from base_array_layer to array_len.
> 
> v3:
>   - v2 was assuming that array_len represented the total number of
> layers in the image, but it represents the number of layers
> starting at the base array layer.
> 
> Fixes new CTS tests for multisampled layered rendering:
> dEQP-VK.renderpass.multisample_resolve.layers_*
> ---
>  src/intel/vulkan/anv_blorp.c | 30 +++---
>  1 file changed, 19 insertions(+), 11 deletions(-)
> 
> diff --git a/src/intel/vulkan/anv_blorp.c b/src/intel/vulkan/anv_blorp.c
> index d38b343671..df566773a4 100644
> --- a/src/intel/vulkan/anv_blorp.c
> +++ b/src/intel/vulkan/anv_blorp.c
> @@ -1543,25 +1543,33 @@ anv_cmd_buffer_resolve_subpass(struct anv_cmd_buffer *cmd_buffer)
>   get_blorp_surf_for_anv_image(cmd_buffer->device, dst_iview->image,
>VK_IMAGE_ASPECT_COLOR_BIT,
>dst_aux_usage, _surf);
> +
> + uint32_t base_src_layer = src_iview->planes[0].isl.base_array_layer;
> + uint32_t base_dst_layer = dst_iview->planes[0].isl.base_array_layer;
> + uint32_t num_layers = src_iview->planes[0].isl.array_len;

num_layers should be equal to fb->layers. As seen in the definition of
renderArea in the Vulkan spec, resolve operations are limited to the
renderArea, which extends to all layers of the framebuffer.

   renderArea is the render area that is affected by the render pass
   instance. The effects of attachment load, store and multisample resolve
   operations are restricted to the pixels whose x and y coordinates fall
   within the render area on all attachments. The render area extends to
   all layers of framebuffer.

> + assert(num_layers <= dst_iview->planes[0].isl.array_len);
> +

This assertion is false. The spec allows having an arrayed multisample
source image view and a non-arrayed single-sampled destination image
view as long as the framebuffer is non-arrayed.

   Each element of pAttachments must have dimensions at least as large as
   the corresponding framebuffer dimension
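
To make the suggestion concrete, here is a minimal standalone sketch of the
per-layer loop, assuming the layer count is taken from the framebuffer
(fb->layers) rather than from the source view's array_len. It is not anv code:
struct layer_pair and plan_layer_resolves() are made-up names, and each pair
stands in for one resolve_surface() call.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the arguments of one per-layer resolve. */
struct layer_pair {
   uint32_t src_layer;
   uint32_t dst_layer;
};

/* Fill `out` with the (src, dst) layer pairs to resolve and return how many
 * were written.  fb_layers plays the role of fb->layers; the base layers
 * come from the source and destination image views.  max_pairs is the
 * capacity of `out`.
 */
static uint32_t
plan_layer_resolves(uint32_t base_src_layer, uint32_t base_dst_layer,
                    uint32_t fb_layers,
                    struct layer_pair *out, uint32_t max_pairs)
{
   assert(fb_layers <= max_pairs);

   for (uint32_t i = 0; i < fb_layers; i++) {
      out[i].src_layer = base_src_layer + i;
      out[i].dst_layer = base_dst_layer + i;
   }

   return fb_layers;
}
```

With a non-arrayed framebuffer (fb_layers == 1) this yields exactly one pair
regardless of how many layers the source view has, which is the case the
assertion in the patch would wrongly reject.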

-Nanley

>   anv_cmd_buffer_mark_image_written(cmd_buffer, dst_iview->image,
> VK_IMAGE_ASPECT_COLOR_BIT,
> dst_surf.aux_usage,
>  dst_iview->planes[0].isl.base_level,
> -   dst_iview->planes[0].isl.base_array_layer, 1);
> +   base_dst_layer, num_layers);
>  
>   assert(!src_iview->image->format->can_ycbcr);
>   assert(!dst_iview->image->format->can_ycbcr);
>  
> - resolve_surface(,
> - _surf,
> - src_iview->planes[0].isl.base_level,
> - src_iview->planes[0].isl.base_array_layer,
> - _surf,
> - dst_iview->planes[0].isl.base_level,
> - dst_iview->planes[0].isl.base_array_layer,
> - render_area.offset.x, render_area.offset.y,
> - render_area.offset.x, render_area.offset.y,
> - render_area.extent.width, render_area.extent.height);
> + for (uint32_t i = 0; i < num_layers; i++) {
> +resolve_surface(,
> +_surf,
> +src_iview->planes[0].isl.base_level,
> +base_src_layer + i,
> +_surf,
> +dst_iview->planes[0].isl.base_level,
> +base_dst_layer + i,
> +render_area.offset.x, render_area.offset.y,
> +render_area.offset.x, render_area.offset.y,
> +render_area.extent.width, render_area.extent.height);
> + }
>}
>  
>blorp_batch_finish();
> -- 
> 2.14.1
> 


[Mesa-dev] [PATCH 7/9] i965: Use blorp_ccs_op for CCS fast-clears

2018-02-20 Thread Jason Ekstrand
---
 src/mesa/drivers/dri/i965/brw_blorp.c | 12 +---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_blorp.c b/src/mesa/drivers/dri/i965/brw_blorp.c
index 3c4aef9..6a87e54 100644
--- a/src/mesa/drivers/dri/i965/brw_blorp.c
+++ b/src/mesa/drivers/dri/i965/brw_blorp.c
@@ -1254,9 +1254,15 @@ do_single_blorp_clear(struct brw_context *brw, struct gl_framebuffer *fb,
 
   struct blorp_batch batch;
   blorp_batch_init(>blorp, , brw, 0);
-  blorp_fast_clear(, , isl_format,
-   level, irb->mt_layer, num_layers,
-   x0, y0, x1, y1);
+  if (surf.aux_usage == ISL_AUX_USAGE_CCS_E ||
+  surf.aux_usage == ISL_AUX_USAGE_CCS_D) {
+ blorp_ccs_op(, , level, irb->mt_layer, num_layers,
+  isl_format, ISL_AUX_OP_FAST_CLEAR);
+  } else {
+ blorp_fast_clear(, , isl_format,
+  level, irb->mt_layer, num_layers,
+  x0, y0, x1, y1);
+  }
   blorp_batch_finish();
 
   brw_emit_end_of_pipe_sync(brw, PIPE_CONTROL_RENDER_TARGET_FLUSH);
-- 
2.5.0.400.gff86faf



[Mesa-dev] [PATCH 4/9] intel/blorp: Simplify asserts in blorp_ccs_op

2018-02-20 Thread Jason Ekstrand
If we use any invalid CCS ops for a particular platform, we will hit an
unreachable() in the blorp back-end.  The only CCS op not supported
by this function at the moment is fast-clear.
---
 src/intel/blorp/blorp_clear.c | 12 +---
 1 file changed, 1 insertion(+), 11 deletions(-)

diff --git a/src/intel/blorp/blorp_clear.c b/src/intel/blorp/blorp_clear.c
index c7c013a..2027fda 100644
--- a/src/intel/blorp/blorp_clear.c
+++ b/src/intel/blorp/blorp_clear.c
@@ -875,17 +875,7 @@ blorp_ccs_op(struct blorp_batch *batch,
params.x1 = ALIGN(params.x1, x_scaledown) / x_scaledown;
params.y1 = ALIGN(params.y1, y_scaledown) / y_scaledown;
 
-   if (batch->blorp->isl_dev->info->gen >= 10) {
-  assert(ccs_op == ISL_AUX_OP_FULL_RESOLVE ||
- ccs_op == ISL_AUX_OP_PARTIAL_RESOLVE ||
- ccs_op == ISL_AUX_OP_AMBIGUATE);
-   } else if (batch->blorp->isl_dev->info->gen >= 9) {
-  assert(ccs_op == ISL_AUX_OP_FULL_RESOLVE ||
- ccs_op == ISL_AUX_OP_PARTIAL_RESOLVE);
-   } else {
-  /* Broadwell and earlier do not have a partial resolve */
-  assert(ccs_op == ISL_AUX_OP_FULL_RESOLVE);
-   }
+   assert(ccs_op != ISL_AUX_OP_FAST_CLEAR);
params.fast_clear_op = ccs_op;
params.num_layers = num_layers;
 
-- 
2.5.0.400.gff86faf



[Mesa-dev] [PATCH 9/9] intel/blorp: Refactor MCS clears

2018-02-20 Thread Jason Ekstrand
This commit renames blorp_fast_clear to blorp_mcs_clear, pulls in the
fast clear rectangle calculation into the function, and removes the
unneeded level parameter.  We could have also removed the x0, y0, x1,
and y1 parameters because all of the callers only do full-slice clears.
However, partial clears are a thing that we can, in theory, do under the
right conditions.  We may as well keep them for a rainy day.
---
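
For reference, the clear rectangle math that moves into the renamed function
can be modelled standalone roughly as follows. This is a sketch, not the patch
code: the scale-down factors come from the switch in the removed
get_fast_clear_rect(), and the rule that the alignment is twice the scale-down
factor is taken from the closing sentences of the PRM comment there. struct
rect, mcs_x_scaledown() and mcs_clear_rect() are illustration-only names.

```c
#include <assert.h>

struct rect {
   unsigned x0, y0, x1, y1;
};

static unsigned round_down(unsigned v, unsigned a) { return v - (v % a); }
static unsigned round_up(unsigned v, unsigned a) { return round_down(v + a - 1, a); }

/* Horizontal scale-down factor per sample count (2X/4X: 8, 8X: 2, 16X: 1). */
static unsigned
mcs_x_scaledown(unsigned samples)
{
   switch (samples) {
   case 2:
   case 4:
      return 8;
   case 8:
      return 2;
   case 16:
      return 1;
   default:
      assert(!"Unexpected sample count for MCS fast clear");
      return 0;
   }
}

/* Align the rectangle outwards, then scale it down for the clear pass. */
static struct rect
mcs_clear_rect(unsigned samples, struct rect r)
{
   const unsigned x_scaledown = mcs_x_scaledown(samples);
   const unsigned y_scaledown = 2;

   /* Assumption from the comment: alignment is twice the scale-down. */
   const unsigned x_align = x_scaledown * 2;
   const unsigned y_align = y_scaledown * 2;

   struct rect out;
   out.x0 = round_down(r.x0, x_align) / x_scaledown;
   out.y0 = round_down(r.y0, y_align) / y_scaledown;
   out.x1 = round_up(r.x1, x_align) / x_scaledown;
   out.y1 = round_up(r.y1, y_align) / y_scaledown;
   return out;
}
```

For a 4X surface, clearing (0,0)-(100,50) comes out as (0,0)-(14,26) under
these assumptions.
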
 src/intel/blorp/blorp.h   |  13 ++--
 src/intel/blorp/blorp_clear.c | 143 ++
 src/intel/vulkan/anv_blorp.c  |   6 +-
 src/mesa/drivers/dri/i965/brw_blorp.c |   6 +-
 4 files changed, 71 insertions(+), 97 deletions(-)

diff --git a/src/intel/blorp/blorp.h b/src/intel/blorp/blorp.h
index e27ea7e..6a4501d 100644
--- a/src/intel/blorp/blorp.h
+++ b/src/intel/blorp/blorp.h
@@ -142,12 +142,6 @@ blorp_buffer_copy(struct blorp_batch *batch,
   uint64_t size);
 
 void
-blorp_fast_clear(struct blorp_batch *batch,
- const struct blorp_surf *surf, enum isl_format format,
- uint32_t level, uint32_t start_layer, uint32_t num_layers,
- uint32_t x0, uint32_t y0, uint32_t x1, uint32_t y1);
-
-void
 blorp_clear(struct blorp_batch *batch,
 const struct blorp_surf *surf,
 enum isl_format format, struct isl_swizzle swizzle,
@@ -208,6 +202,13 @@ blorp_ccs_op(struct blorp_batch *batch,
  enum isl_aux_op ccs_op);
 
 void
+blorp_mcs_clear(struct blorp_batch *batch,
+const struct blorp_surf *surf,
+enum isl_format format,
+uint32_t start_layer, uint32_t num_layers,
+uint32_t x0, uint32_t y0, uint32_t x1, uint32_t y1);
+
+void
 blorp_mcs_partial_resolve(struct blorp_batch *batch,
   struct blorp_surf *surf,
   enum isl_format format,
diff --git a/src/intel/blorp/blorp_clear.c b/src/intel/blorp/blorp_clear.c
index 8d729a2..6ef5a3b 100644
--- a/src/intel/blorp/blorp_clear.c
+++ b/src/intel/blorp/blorp_clear.c
@@ -179,107 +179,80 @@ blorp_params_get_layer_offset_vs(struct blorp_context *blorp,
return result;
 }
 
-/* The x0, y0, x1, and y1 parameters must already be populated with the render
- * area of the framebuffer to be cleared.
- */
-static void
-get_fast_clear_rect(const struct isl_device *dev,
-const struct isl_surf *aux_surf,
-unsigned *x0, unsigned *y0,
-unsigned *x1, unsigned *y1)
-{
-   unsigned int x_align, y_align;
-   unsigned int x_scaledown, y_scaledown;
-
-   /* Only single sampled surfaces need to (and actually can) be resolved. */
-   if (aux_surf->usage == ISL_SURF_USAGE_CCS_BIT) {
-  unreachable("This function only supports MCS fast-clear");
-   } else {
-  assert(aux_surf->usage == ISL_SURF_USAGE_MCS_BIT);
-
-  /* From the Ivy Bridge PRM, Vol2 Part1 11.7 "MCS Buffer for Render
-   * Target(s)", beneath the "MSAA Compression" bullet (p326):
-   *
-   * Clear pass for this case requires that scaled down primitive
-   * is sent down with upper left co-ordinate to coincide with
-   * actual rectangle being cleared. For MSAA, clear rectangle’s
-   * height and width need to as show in the following table in
-   * terms of (width,height) of the RT.
-   *
-   * MSAA  Width of Clear Rect  Height of Clear Rect
-   *  2X Ceil(1/8*width)  Ceil(1/2*height)
-   *  4X Ceil(1/8*width)  Ceil(1/2*height)
-   *  8X Ceil(1/2*width)  Ceil(1/2*height)
-   * 16X widthCeil(1/2*height)
-   *
-   * The text "with upper left co-ordinate to coincide with actual
-   * rectangle being cleared" is a little confusing--it seems to imply
-   * that to clear a rectangle from (x,y) to (x+w,y+h), one needs to
-   * feed the pipeline using the rectangle (x,y) to
-   * (x+Ceil(w/N),y+Ceil(h/2)), where N is either 2 or 8 depending on
-   * the number of samples.  Experiments indicate that this is not
-   * quite correct; actually, what the hardware appears to do is to
-   * align whatever rectangle is sent down the pipeline to the nearest
-   * multiple of 2x2 blocks, and then scale it up by a factor of N
-   * horizontally and 2 vertically.  So the resulting alignment is 4
-   * vertically and either 4 or 16 horizontally, and the scaledown
-   * factor is 2 vertically and either 2 or 8 horizontally.
-   */
-  switch (aux_surf->format) {
-  case ISL_FORMAT_MCS_2X:
-  case ISL_FORMAT_MCS_4X:
- x_scaledown = 8;
- break;
-  case ISL_FORMAT_MCS_8X:
- x_scaledown = 2;
- break;
-  case ISL_FORMAT_MCS_16X:
- x_scaledown = 1;
- break;
-  default:
- unreachable("Unexpected MCS format for fast clear");
-  }
-  y_scaledown = 2;
- 

[Mesa-dev] [PATCH 8/9] intel/blorp: Handle fast-clear directly in blorp_ccs_op

2018-02-20 Thread Jason Ekstrand
---
 src/intel/blorp/blorp_clear.c | 199 +++---
 1 file changed, 88 insertions(+), 111 deletions(-)

diff --git a/src/intel/blorp/blorp_clear.c b/src/intel/blorp/blorp_clear.c
index 17d47a1..8d729a2 100644
--- a/src/intel/blorp/blorp_clear.c
+++ b/src/intel/blorp/blorp_clear.c
@@ -193,104 +193,7 @@ get_fast_clear_rect(const struct isl_device *dev,
 
/* Only single sampled surfaces need to (and actually can) be resolved. */
if (aux_surf->usage == ISL_SURF_USAGE_CCS_BIT) {
-  /* From the Ivy Bridge PRM, Vol2 Part1 11.7 "MCS Buffer for Render
-   * Target(s)", beneath the "Fast Color Clear" bullet (p327):
-   *
-   * Clear pass must have a clear rectangle that must follow
-   * alignment rules in terms of pixels and lines as shown in the
-   * table below. Further, the clear-rectangle height and width
-   * must be multiple of the following dimensions. If the height
-   * and width of the render target being cleared do not meet these
-   * requirements, an MCS buffer can be created such that it
-   * follows the requirement and covers the RT.
-   *
-   * The alignment size in the table that follows is related to the
-   * alignment size that is baked into the CCS surface format but with X
-   * alignment multiplied by 16 and Y alignment multiplied by 32.
-   */
-  x_align = isl_format_get_layout(aux_surf->format)->bw;
-  y_align = isl_format_get_layout(aux_surf->format)->bh;
-
-  x_align *= 16;
-
-  /* SKL+ line alignment requirement for Y-tiled are half those of the prior
-   * generations.
-   */
-  if (dev->info->gen >= 9)
- y_align *= 16;
-  else
- y_align *= 32;
-
-  /* From the Ivy Bridge PRM, Vol2 Part1 11.7 "MCS Buffer for Render
-   * Target(s)", beneath the "Fast Color Clear" bullet (p327):
-   *
-   * In order to optimize the performance MCS buffer (when bound to
-   * 1X RT) clear similarly to MCS buffer clear for MSRT case,
-   * clear rect is required to be scaled by the following factors
-   * in the horizontal and vertical directions:
-   *
-   * The X and Y scale down factors in the table that follows are each
-   * equal to half the alignment value computed above.
-   */
-  x_scaledown = x_align / 2;
-  y_scaledown = y_align / 2;
-
-  if (ISL_DEV_IS_HASWELL(dev)) {
- /* The following text was added in the Haswell PRM, "3D Media GPGPU
-  * Engine" >> "MCS Buffer for Render Target(s)" >> Table "Color Clear
-  * of Non-MultiSampler Render Target Restrictions":
-  *
-  *"Clear rectangle must be aligned to two times the number of
-  *pixels in the table shown below due to 16X16 hashing across the
-  *slice."
-  *
-  * It has persisted in the documentation for all platforms up until
-  * Cannonlake and possibly even beyond.  However, we believe that it
-  * is only needed on Haswell.
-  *
-  * There are a couple possible explanations for this restriction:
-  *
-  * 1) If you assume that the hardware is writing to the CCS as
-  *bytes, then the x/y_align computed above gives you an alignment
-  *in the CCS of 8x8 bytes and, if 16x16 is needed for hashing, we
-  *need to multiply by 2.
-  *
-  * 2) Haswell is a bit unique in that it's CCS tiling does not line
-  *up with Y-tiling on a cache-line granularity.  Instead, it has
-  *an extra bit of swizzling in bit 9.  Also, bit 6 swizzling
-  *applies to the CCS on Haswell.  This means that Haswell CTS
-  *does not match on a cache-line granularity but it does match on
-  *a 2x2 cache line granularity.
-  *
-  * Clearly, the first explanation seems to follow documentation the
-  * best but they may be related.  In any case, empirical evidence
-  * seems to confirm that it is, indeed required on Haswell.
-  *
-  * On Broadwell things get a bit stickier.  Broadwell adds support
-  * for mip-mapped CCS with an alignment in the CCS of 256x128.  For a
-  * 32bpb main surface, the above computation will yield a x/y_align
-  * of 128x128 for a Y-tiled main surface and 256x64 for X-tiled.  In
-  * either case, if we double the alignment, we will get an alignment
-  * bigger than horizontal and vertical alignment of the CCS and fast
-  * clears of one LOD may leak into others.
-  *
-  * Starting with Skylake, the image alignment for the CCS is only
-  * 128x64 which is exactly the x/h_align computed above if the main
-  * surface has a 32bpb format.  Also, the "Render Target Resolve"
-  * page in the bspec (not the PRM) says, "The Resolve 

[Mesa-dev] [PATCH 0/9] intel/blorp: Refactors, cleanups, and fixes

2018-02-20 Thread Jason Ekstrand
This little series makes a bunch of mostly small changes to blorp.  The end
objective is to get to the point where you just call blorp_ccs_op and hand
it an isl_aux_op instead of having different entrypoints for everything.
This is similar to what we do for HiZ.  For MCS, we still have two
functions: blorp_mcs_clear and blorp_mcs_partial_resolve.  Since those are
the only two MCS operations you can do (and partial resolve isn't an actual
hardware op), that seemed ok.

The difficult patch in here is the first one.  I fairly firmly believe it
to be correct but it's a deviation from the docs so it's a bit hard to say.
Unfortunately, it's one of the worst bits of documentation we have for our
GPUs and, as the giant comment explains, it's actually self-contradictory
once you start doing the math.

Jason Ekstrand (9):
  intel/blorp: Only double the fast-clear rect alignment on HSW
  intel/blorp: Use the hardware op for CCS ambiguate on gen10+
  intel/blorp: Rename blorp_ccs_resolve to blorp_ccs_op
  intel/blorp: Simplify asserts in blorp_ccs_op
  anv/blorp: Use blorp_ccs_op for everything
  intel/blorp: Make blorp_ccs_ambiguate just an internal helper
  i965: Use blorp_ccs_op for CCS fast-clears
  intel/blorp: Handle fast-clear directly in blorp_ccs_op
  intel/blorp: Refactor MCS clears

 src/intel/blorp/blorp.h   |  24 ++-
 src/intel/blorp/blorp_clear.c | 327 ++
 src/intel/blorp/blorp_genX_exec.h |   6 +
 src/intel/vulkan/anv_blorp.c  |  34 +---
 src/mesa/drivers/dri/i965/brw_blorp.c |  18 +-
 5 files changed, 203 insertions(+), 206 deletions(-)

-- 
2.5.0.400.gff86faf



[Mesa-dev] [PATCH 5/9] anv/blorp: Use blorp_ccs_op for everything

2018-02-20 Thread Jason Ekstrand
---
 src/intel/vulkan/anv_blorp.c | 28 ++--
 1 file changed, 2 insertions(+), 26 deletions(-)

diff --git a/src/intel/vulkan/anv_blorp.c b/src/intel/vulkan/anv_blorp.c
index 3a89ea4..d894b6a 100644
--- a/src/intel/vulkan/anv_blorp.c
+++ b/src/intel/vulkan/anv_blorp.c
@@ -1649,12 +1649,6 @@ anv_image_ccs_op(struct anv_cmd_buffer *cmd_buffer,
assert(base_layer + layer_count <=
   anv_image_aux_layers(image, aspect, level));
 
-   uint32_t plane = anv_image_aspect_to_plane(image->aspects, aspect);
-   uint32_t width_div = image->format->planes[plane].denominator_scales[0];
-   uint32_t height_div = image->format->planes[plane].denominator_scales[1];
-   uint32_t level_width = anv_minify(image->extent.width, level) / width_div;
-   uint32_t level_height = anv_minify(image->extent.height, level) / height_div;
-
struct blorp_batch batch;
blorp_batch_init(_buffer->device->blorp, , cmd_buffer,
 predicate ? BLORP_BATCH_PREDICATE_ENABLE : 0);
@@ -1694,26 +1688,8 @@ anv_image_ccs_op(struct anv_cmd_buffer *cmd_buffer,
cmd_buffer->state.pending_pipe_bits |=
   ANV_PIPE_RENDER_TARGET_CACHE_FLUSH_BIT | ANV_PIPE_CS_STALL_BIT;
 
-   switch (ccs_op) {
-   case ISL_AUX_OP_FAST_CLEAR:
-  blorp_fast_clear(, , surf.surf->format,
-   level, base_layer, layer_count,
-   0, 0, level_width, level_height);
-  break;
-   case ISL_AUX_OP_FULL_RESOLVE:
-   case ISL_AUX_OP_PARTIAL_RESOLVE:
-  blorp_ccs_op(, , level, base_layer, layer_count,
-   surf.surf->format, ccs_op);
-  break;
-   case ISL_AUX_OP_AMBIGUATE:
-  for (uint32_t a = 0; a < layer_count; a++) {
- const uint32_t layer = base_layer + a;
- blorp_ccs_ambiguate(, , level, layer);
-  }
-  break;
-   default:
-  unreachable("Unsupported CCS operation");
-   }
+   blorp_ccs_op(, , level, base_layer, layer_count,
+surf.surf->format, ccs_op);
 
cmd_buffer->state.pending_pipe_bits |=
   ANV_PIPE_RENDER_TARGET_CACHE_FLUSH_BIT | ANV_PIPE_CS_STALL_BIT;
-- 
2.5.0.400.gff86faf



[Mesa-dev] [PATCH 1/9] intel/blorp: Only double the fast-clear rect alignment on HSW

2018-02-20 Thread Jason Ekstrand
The data in the commit message is a bit sketchy for Ivybridge.  We don't
run dEQP or any of the CTSs on Ivybridge in CI so all the data we have
is piglit.  On Haswell, piglit didn't catch anything so we don't have
anything to go off of for Ivybridge besides the fact that the restriction
wasn't added until Haswell.
---
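
To make the numbers in the comment below easier to check, here is a small
standalone model of the alignment computation with the doubling restricted to
Haswell. It is only a sketch: ccs_fast_clear_align() and its struct are
made-up names, bw/bh stand for the CCS format block width and height, and the
8x4 and 16x2 block sizes mentioned afterwards are back-computed from the
128x128 and 256x64 figures in the comment rather than read out of isl.

```c
#include <assert.h>
#include <stdbool.h>

struct ccs_clear_align {
   unsigned x_align, y_align;
   unsigned x_scaledown, y_scaledown;
};

/* bw/bh: block width/height of the CCS format (inputs, not looked up here). */
static struct ccs_clear_align
ccs_fast_clear_align(unsigned bw, unsigned bh, unsigned gen, bool is_haswell)
{
   struct ccs_clear_align a;

   assert(bw > 0 && bh > 0);

   /* X alignment is the block width times 16; Y is the block height times
    * 32, or times 16 on gen9+ where the line alignment is halved.
    */
   a.x_align = bw * 16;
   a.y_align = bh * (gen >= 9 ? 16 : 32);

   /* Scale-down factors are half the (undoubled) alignment. */
   a.x_scaledown = a.x_align / 2;
   a.y_scaledown = a.y_align / 2;

   /* Only Haswell doubles the alignment after this patch. */
   if (is_haswell) {
      a.x_align *= 2;
      a.y_align *= 2;
   }

   return a;
}
```

With 8x4 blocks this gives 128x128 on Broadwell and 128x64 on Skylake, and
with 16x2 blocks 256x64 on Broadwell, matching the figures quoted in the
comment. On Haswell the 8x4 case is doubled to 256x256 while the scale-down
stays at 64x64.
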
 src/intel/blorp/blorp_clear.c | 66 ---
 1 file changed, 56 insertions(+), 10 deletions(-)

diff --git a/src/intel/blorp/blorp_clear.c b/src/intel/blorp/blorp_clear.c
index dde116f..36ec185 100644
--- a/src/intel/blorp/blorp_clear.c
+++ b/src/intel/blorp/blorp_clear.c
@@ -235,16 +235,62 @@ get_fast_clear_rect(const struct isl_device *dev,
   x_scaledown = x_align / 2;
   y_scaledown = y_align / 2;
 
-  /* From BSpec: 3D-Media-GPGPU Engine > 3D Pipeline > Pixel > Pixel
-   * Backend > MCS Buffer for Render Target(s) [DevIVB+] > Table "Color
-   * Clear of Non-MultiSampled Render Target Restrictions":
-   *
-   *   Clear rectangle must be aligned to two times the number of
-   *   pixels in the table shown below due to 16x16 hashing across the
-   *   slice.
-   */
-  x_align *= 2;
-  y_align *= 2;
+  if (ISL_DEV_IS_HASWELL(dev)) {
+ /* The following text was added in the Haswell PRM, "3D Media GPGPU
+  * Engine" >> "MCS Buffer for Render Target(s)" >> Table "Color Clear
+  * of Non-MultiSampler Render Target Restrictions":
+  *
+  *"Clear rectangle must be aligned to two times the number of
+  *pixels in the table shown below due to 16X16 hashing across the
+  *slice."
+  *
+  * It has persisted in the documentation for all platforms up until
+  * Cannonlake and possibly even beyond.  However, we believe that it
+  * is only needed on Haswell.
+  *
+  * There are a couple possible explanations for this restriction:
+  *
+  * 1) If you assume that the hardware is writing to the CCS as
+  *bytes, then the x/y_align computed above gives you an alignment
+  *in the CCS of 8x8 bytes and, if 16x16 is needed for hashing, we
+  *need to multiply by 2.
+  *
+  * 2) Haswell is a bit unique in that it's CCS tiling does not line
+  *up with Y-tiling on a cache-line granularity.  Instead, it has
+  *an extra bit of swizzling in bit 9.  Also, bit 6 swizzling
+  *applies to the CCS on Haswell.  This means that Haswell CTS
+  *does not match on a cache-line granularity but it does match on
+  *a 2x2 cache line granularity.
+  *
+  * Clearly, the first explanation seems to follow documentation the
+  * best but they may be related.  In any case, empirical evidence
+  * seems to confirm that it is, indeed required on Haswell.
+  *
+  * On Broadwell things get a bit stickier.  Broadwell adds support
+  * for mip-mapped CCS with an alignment in the CCS of 256x128.  For a
+  * 32bpb main surface, the above computation will yield a x/y_align
+  * of 128x128 for a Y-tiled main surface and 256x64 for X-tiled.  In
+  * either case, if we double the alignment, we will get an alignment
+  * bigger than horizontal and vertical alignment of the CCS and fast
+  * clears of one LOD may leak into others.
+  *
+  * Starting with Skylake, the image alignment for the CCS is only
+  * 128x64 which is exactly the x/h_align computed above if the main
+  * surface has a 32bpb format.  Also, the "Render Target Resolve"
+  * page in the bspec (not the PRM) says, "The Resolve Rectangle size
+  * is same as Clear Rectangle size from SKL+".  The x/y_align
+  * computed above (without doubling) match the resolve rectangle
+  * calculation perfectly.
+  *
+  * Finally, to confirm all this, a full test run was performed on
+  * Feb. 9, 2018 with this doubling removed and the only platform
+  * which seemed to be affected was Haswell.  The run consisted of
+  * piglit, dEQP, the Vulkan CTS 1.0.2, the OpenGL 4.5 CTS, and the
+  * OpenGL ES 3.2 CTS.
+  */
+ x_align *= 2;
+ y_align *= 2;
+  }
} else {
   assert(aux_surf->usage == ISL_SURF_USAGE_MCS_BIT);
 
-- 
2.5.0.400.gff86faf



[Mesa-dev] [PATCH 6/9] intel/blorp: Make blorp_ccs_ambiguate just an internal helper

2018-02-20 Thread Jason Ekstrand
Now that anv uses blorp_ccs_op for everything, we no longer need to
expose the ccs_ambiguate function directly.  It's much better tucked
away as an implementation detail.
---
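
The resulting shape, with one public entrypoint and the pre-gen10 ambiguate
hidden behind it, can be modelled standalone like this. It is a toy sketch
rather than blorp code: struct op_log, ccs_op() and legacy_ccs_ambiguate() are
illustration-only names, and the "ambiguate on gen < 10 goes to the legacy
helper" condition is an assumption based on the gen10 check removed below.

```c
#include <assert.h>
#include <stdint.h>

enum aux_op {
   AUX_OP_FAST_CLEAR,
   AUX_OP_FULL_RESOLVE,
   AUX_OP_PARTIAL_RESOLVE,
   AUX_OP_AMBIGUATE,
};

/* Counts what the model would have emitted. */
struct op_log {
   unsigned hw_ops;
   unsigned legacy_ambiguates;
};

/* Internal helper: one manual ambiguate per (level, layer). */
static void
legacy_ccs_ambiguate(struct op_log *log, uint32_t level, uint32_t layer)
{
   (void)level;
   (void)layer;
   log->legacy_ambiguates++;
}

/* The only entrypoint callers see. */
static void
ccs_op(struct op_log *log, unsigned gen, uint32_t level,
       uint32_t start_layer, uint32_t num_layers, enum aux_op op)
{
   assert(num_layers > 0);

   if (op == AUX_OP_AMBIGUATE && gen < 10) {
      /* No hardware op before gen10: fall back layer by layer. */
      for (uint32_t a = 0; a < num_layers; a++)
         legacy_ccs_ambiguate(log, level, start_layer + a);
      return;
   }

   /* Everything else is a single hardware op covering all layers. */
   log->hw_ops++;
}
```

Callers pass the op and the layer range and never need to know which path is
taken.
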
 src/intel/blorp/blorp.h   |  5 -
 src/intel/blorp/blorp_clear.c | 21 ++---
 2 files changed, 10 insertions(+), 16 deletions(-)

diff --git a/src/intel/blorp/blorp.h b/src/intel/blorp/blorp.h
index 8c775bf..e27ea7e 100644
--- a/src/intel/blorp/blorp.h
+++ b/src/intel/blorp/blorp.h
@@ -208,11 +208,6 @@ blorp_ccs_op(struct blorp_batch *batch,
  enum isl_aux_op ccs_op);
 
 void
-blorp_ccs_ambiguate(struct blorp_batch *batch,
-struct blorp_surf *surf,
-uint32_t level, uint32_t layer);
-
-void
 blorp_mcs_partial_resolve(struct blorp_batch *batch,
   struct blorp_surf *surf,
   enum isl_format format,
diff --git a/src/intel/blorp/blorp_clear.c b/src/intel/blorp/blorp_clear.c
index 2027fda..17d47a1 100644
--- a/src/intel/blorp/blorp_clear.c
+++ b/src/intel/blorp/blorp_clear.c
@@ -814,6 +814,11 @@ blorp_clear_attachments(struct blorp_batch *batch,
batch->blorp->exec(batch, );
 }
 
+static void
+blorp_legacy_ccs_ambiguate(struct blorp_batch *batch,
+   struct blorp_surf *surf,
+   uint32_t level, uint32_t layer);
+
 void
 blorp_ccs_op(struct blorp_batch *batch,
  struct blorp_surf *surf, uint32_t level,
@@ -835,7 +840,7 @@ blorp_ccs_op(struct blorp_batch *batch,
* mess to another function.
*/
   for (uint32_t a = 0; a < num_layers; a++)
- blorp_ccs_ambiguate(batch, surf, level, start_layer + a);
+ blorp_legacy_ccs_ambiguate(batch, surf, level, start_layer + a);
   return;
}
 
@@ -999,17 +1004,11 @@ blorp_mcs_partial_resolve(struct blorp_batch *batch,
  * for a given layer/level of a surface to 0x0 which is the "uncompressed"
  * state which tells the sampler to go look at the main surface.
  */
-void
-blorp_ccs_ambiguate(struct blorp_batch *batch,
-struct blorp_surf *surf,
-uint32_t level, uint32_t layer)
+static void
+blorp_legacy_ccs_ambiguate(struct blorp_batch *batch,
+   struct blorp_surf *surf,
+   uint32_t level, uint32_t layer)
 {
-   if (ISL_DEV_GEN(batch->blorp->isl_dev) >= 10) {
-  /* On gen10 and above, we have a hardware resolve op for this */
-  return blorp_ccs_op(batch, surf, level, layer, 1,
-  surf->surf->format, ISL_AUX_OP_AMBIGUATE);
-   }
-
struct blorp_params params;
blorp_params_init();
 
-- 
2.5.0.400.gff86faf



[Mesa-dev] [PATCH 2/9] intel/blorp: Use the hardware op for CCS ambiguate on gen10+

2018-02-20 Thread Jason Ekstrand
Cannonlake hardware adds a new resolve type in 3DSTATE_PS called
FAST_CLEAR_0 which does an ambiguate.  Now that the hardware can do it
directly, we should use that instead of binding the CCS as a render
target and doing it manually.  This was tested with a full Vulkan CTS
run on Cannonlake.
---
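
As a quick summary of which ops the resolve function accepts after this
change, the asserts in the first hunk below amount to the following table.
This is a standalone sketch with made-up names (enum aux_op,
ccs_resolve_op_allowed()), not the isl enum or any blorp function.

```c
#include <assert.h>
#include <stdbool.h>

enum aux_op {
   AUX_OP_FAST_CLEAR,
   AUX_OP_FULL_RESOLVE,
   AUX_OP_PARTIAL_RESOLVE,
   AUX_OP_AMBIGUATE,
};

/* Mirrors the per-generation asserts: full resolve everywhere, partial
 * resolve on gen9+, ambiguate on gen10+.  Fast-clear is not handled by the
 * resolve function at this point in the series.
 */
static bool
ccs_resolve_op_allowed(unsigned gen, enum aux_op op)
{
   /* Assumption: this discussion only covers gen7 and later. */
   assert(gen >= 7);

   switch (op) {
   case AUX_OP_FULL_RESOLVE:
      return true;
   case AUX_OP_PARTIAL_RESOLVE:
      return gen >= 9;
   case AUX_OP_AMBIGUATE:
      return gen >= 10;
   default:
      return false;
   }
}
```
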
 src/intel/blorp/blorp_clear.c | 12 +++-
 src/intel/blorp/blorp_genX_exec.h |  6 ++
 2 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/src/intel/blorp/blorp_clear.c b/src/intel/blorp/blorp_clear.c
index 36ec185..eea9cee 100644
--- a/src/intel/blorp/blorp_clear.c
+++ b/src/intel/blorp/blorp_clear.c
@@ -857,7 +857,11 @@ blorp_ccs_resolve(struct blorp_batch *batch,
params.x1 = ALIGN(params.x1, x_scaledown) / x_scaledown;
params.y1 = ALIGN(params.y1, y_scaledown) / y_scaledown;
 
-   if (batch->blorp->isl_dev->info->gen >= 9) {
+   if (batch->blorp->isl_dev->info->gen >= 10) {
+  assert(resolve_op == ISL_AUX_OP_FULL_RESOLVE ||
+ resolve_op == ISL_AUX_OP_PARTIAL_RESOLVE ||
+ resolve_op == ISL_AUX_OP_AMBIGUATE);
+   } else if (batch->blorp->isl_dev->info->gen >= 9) {
   assert(resolve_op == ISL_AUX_OP_FULL_RESOLVE ||
  resolve_op == ISL_AUX_OP_PARTIAL_RESOLVE);
} else {
@@ -992,6 +996,12 @@ blorp_ccs_ambiguate(struct blorp_batch *batch,
 struct blorp_surf *surf,
 uint32_t level, uint32_t layer)
 {
+   if (ISL_DEV_GEN(batch->blorp->isl_dev) >= 10) {
+  /* On gen10 and above, we have a hardware resolve op for this */
+  return blorp_ccs_resolve(batch, surf, level, layer, 1,
+   surf->surf->format, ISL_AUX_OP_AMBIGUATE);
+   }
+
struct blorp_params params;
blorp_params_init();
 
diff --git a/src/intel/blorp/blorp_genX_exec.h 
b/src/intel/blorp/blorp_genX_exec.h
index 737720a..234019c 100644
--- a/src/intel/blorp/blorp_genX_exec.h
+++ b/src/intel/blorp/blorp_genX_exec.h
@@ -755,6 +755,12 @@ blorp_emit_ps_config(struct blorp_batch *batch,
   switch (params->fast_clear_op) {
   case ISL_AUX_OP_NONE:
  break;
+#if GEN_GEN >= 10
+  case ISL_AUX_OP_AMBIGUATE:
+ ps.RenderTargetFastClearEnable = true;
+ ps.RenderTargetResolveType = FAST_CLEAR_0;
+ break;
+#endif
 #if GEN_GEN >= 9
   case ISL_AUX_OP_PARTIAL_RESOLVE:
  ps.RenderTargetResolveType = RESOLVE_PARTIAL;
-- 
2.5.0.400.gff86faf
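
For anyone skimming the assertion chain added to blorp_ccs_resolve in this patch, the per-generation rule can be restated as a small standalone predicate. This is only an illustrative sketch: the enum aux_op and the functions ccs_op_has_hw_support and assert_ccs_op_valid are hypothetical stand-ins, not the real enum isl_aux_op or any blorp entry point.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the relevant values of enum isl_aux_op. */
enum aux_op {
   AUX_OP_FULL_RESOLVE,
   AUX_OP_PARTIAL_RESOLVE,
   AUX_OP_AMBIGUATE,
};

/* Mirrors the assertion chain in the patch: returns true if the given
 * op may be handed to the hardware resolve path on the given gen.
 */
static bool
ccs_op_has_hw_support(int gen, enum aux_op op)
{
   if (gen >= 10) {
      return op == AUX_OP_FULL_RESOLVE ||
             op == AUX_OP_PARTIAL_RESOLVE ||
             op == AUX_OP_AMBIGUATE;
   } else if (gen >= 9) {
      return op == AUX_OP_FULL_RESOLVE ||
             op == AUX_OP_PARTIAL_RESOLVE;
   } else {
      /* Broadwell and earlier do not have a partial resolve */
      return op == AUX_OP_FULL_RESOLVE;
   }
}

/* Same check expressed as an assertion, the way the patch uses it. */
static void
assert_ccs_op_valid(int gen, enum aux_op op)
{
   assert(ccs_op_has_hw_support(gen, op));
}
```

With this shape, an ambiguate request on gen9 is rejected by the predicate, which matches the patch keeping the manual render-target path for hardware before Cannonlake.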



[Mesa-dev] [PATCH 3/9] intel/blorp: Rename blorp_ccs_resolve to blorp_ccs_op

2018-02-20 Thread Jason Ekstrand
We also make it capable of handling any aux op including fast-clear and
ambiguate.
---
 src/intel/blorp/blorp.h   | 10 
 src/intel/blorp/blorp_clear.c | 46 ---
 src/intel/vulkan/anv_blorp.c  |  4 +--
 src/mesa/drivers/dri/i965/brw_blorp.c |  6 ++---
 4 files changed, 42 insertions(+), 24 deletions(-)

diff --git a/src/intel/blorp/blorp.h b/src/intel/blorp/blorp.h
index 4626f2f..8c775bf 100644
--- a/src/intel/blorp/blorp.h
+++ b/src/intel/blorp/blorp.h
@@ -201,11 +201,11 @@ blorp_clear_attachments(struct blorp_batch *batch,
 uint8_t stencil_mask, uint8_t stencil_value);
 
 void
-blorp_ccs_resolve(struct blorp_batch *batch,
-  struct blorp_surf *surf, uint32_t level,
-  uint32_t start_layer, uint32_t num_layers,
-  enum isl_format format,
-  enum isl_aux_op resolve_op);
+blorp_ccs_op(struct blorp_batch *batch,
+ struct blorp_surf *surf, uint32_t level,
+ uint32_t start_layer, uint32_t num_layers,
+ enum isl_format format,
+ enum isl_aux_op ccs_op);
 
 void
 blorp_ccs_ambiguate(struct blorp_batch *batch,
diff --git a/src/intel/blorp/blorp_clear.c b/src/intel/blorp/blorp_clear.c
index eea9cee..c7c013a 100644
--- a/src/intel/blorp/blorp_clear.c
+++ b/src/intel/blorp/blorp_clear.c
@@ -815,12 +815,30 @@ blorp_clear_attachments(struct blorp_batch *batch,
 }
 
 void
-blorp_ccs_resolve(struct blorp_batch *batch,
-  struct blorp_surf *surf, uint32_t level,
-  uint32_t start_layer, uint32_t num_layers,
-  enum isl_format format,
-  enum isl_aux_op resolve_op)
+blorp_ccs_op(struct blorp_batch *batch,
+ struct blorp_surf *surf, uint32_t level,
+ uint32_t start_layer, uint32_t num_layers,
+ enum isl_format format,
+ enum isl_aux_op ccs_op)
 {
+   if (ccs_op == ISL_AUX_OP_FAST_CLEAR) {
+  blorp_fast_clear(batch, surf, format, level, start_layer, num_layers,
+   0, 0,
+   minify(surf->surf->logical_level0_px.w, level),
+   minify(surf->surf->logical_level0_px.h, level));
+  return;
+   } else if (ISL_DEV_GEN(batch->blorp->isl_dev) < 10 &&
+  ccs_op == ISL_AUX_OP_AMBIGUATE) {
+  /* Prior to Cannonlake, the ambiguate is not available as a hardware
+   * operation.  Instead, we have to fake it by carefully binding the CCS
+   * as a render target and clearing it to 0.  We leave that complicated
+   * mess to another function.
+   */
+  for (uint32_t a = 0; a < num_layers; a++)
+ blorp_ccs_ambiguate(batch, surf, level, start_layer + a);
+  return;
+   }
+
struct blorp_params params;
 
blorp_params_init();
@@ -858,17 +876,17 @@ blorp_ccs_resolve(struct blorp_batch *batch,
params.y1 = ALIGN(params.y1, y_scaledown) / y_scaledown;
 
if (batch->blorp->isl_dev->info->gen >= 10) {
-  assert(resolve_op == ISL_AUX_OP_FULL_RESOLVE ||
- resolve_op == ISL_AUX_OP_PARTIAL_RESOLVE ||
- resolve_op == ISL_AUX_OP_AMBIGUATE);
+  assert(ccs_op == ISL_AUX_OP_FULL_RESOLVE ||
+ ccs_op == ISL_AUX_OP_PARTIAL_RESOLVE ||
+ ccs_op == ISL_AUX_OP_AMBIGUATE);
} else if (batch->blorp->isl_dev->info->gen >= 9) {
-  assert(resolve_op == ISL_AUX_OP_FULL_RESOLVE ||
- resolve_op == ISL_AUX_OP_PARTIAL_RESOLVE);
+  assert(ccs_op == ISL_AUX_OP_FULL_RESOLVE ||
+ ccs_op == ISL_AUX_OP_PARTIAL_RESOLVE);
} else {
   /* Broadwell and earlier do not have a partial resolve */
-  assert(resolve_op == ISL_AUX_OP_FULL_RESOLVE);
+  assert(ccs_op == ISL_AUX_OP_FULL_RESOLVE);
}
-   params.fast_clear_op = resolve_op;
+   params.fast_clear_op = ccs_op;
params.num_layers = num_layers;
 
/* Note: there is no need to initialize push constants because it doesn't
@@ -998,8 +1016,8 @@ blorp_ccs_ambiguate(struct blorp_batch *batch,
 {
if (ISL_DEV_GEN(batch->blorp->isl_dev) >= 10) {
   /* On gen10 and above, we have a hardware resolve op for this */
-  return blorp_ccs_resolve(batch, surf, level, layer, 1,
-   surf->surf->format, ISL_AUX_OP_AMBIGUATE);
+  return blorp_ccs_op(batch, surf, level, layer, 1,
+  surf->surf->format, ISL_AUX_OP_AMBIGUATE);
}
 
struct blorp_params params;
diff --git a/src/intel/vulkan/anv_blorp.c b/src/intel/vulkan/anv_blorp.c
index bee51e0..3a89ea4 100644
--- a/src/intel/vulkan/anv_blorp.c
+++ b/src/intel/vulkan/anv_blorp.c
@@ -1702,8 +1702,8 @@ anv_image_ccs_op(struct anv_cmd_buffer *cmd_buffer,
   break;
case ISL_AUX_OP_FULL_RESOLVE:
case ISL_AUX_OP_PARTIAL_RESOLVE:
-  blorp_ccs_resolve(, , level, base_layer, layer_count,
-surf.surf->format, ccs_op);
+  blorp_ccs_op(, , level, 

[Mesa-dev] [PATCH] anv: Only copy clear dwords if we're rendering to the first slice

2018-02-20 Thread Jason Ekstrand
---
 src/intel/vulkan/genX_cmd_buffer.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/src/intel/vulkan/genX_cmd_buffer.c 
b/src/intel/vulkan/genX_cmd_buffer.c
index 939a795..8015a42 100644
--- a/src/intel/vulkan/genX_cmd_buffer.c
+++ b/src/intel/vulkan/genX_cmd_buffer.c
@@ -3462,7 +3462,10 @@ cmd_buffer_begin_subpass(struct anv_cmd_buffer 
*cmd_buffer,
  assert(att_state->pending_clear_aspects == 0);
   }
 
-  if (att_state->pending_load_aspects & VK_IMAGE_ASPECT_ANY_COLOR_BIT_ANV) 
{
+  if ((att_state->pending_load_aspects & 
VK_IMAGE_ASPECT_ANY_COLOR_BIT_ANV) &&
+  image->planes[0].aux_surface.isl.size > 0 &&
+  iview->planes[0].isl.base_level == 0 &&
+  iview->planes[0].isl.base_array_layer == 0) {
  if (att_state->aux_usage != ISL_AUX_USAGE_NONE) {
 genX(copy_fast_clear_dwords)(cmd_buffer, att_state->color.state,
  image, VK_IMAGE_ASPECT_COLOR_BIT,
-- 
2.5.0.400.gff86faf



Re: [Mesa-dev] [PATCH 5/7] ac/radeonsi: pass bindless bool to load_sampler_desc()

2018-02-20 Thread Timothy Arceri

On 21/02/18 09:46, Marek Olšák wrote:
> On Tue, Feb 20, 2018 at 11:42 PM, Marek Olšák  wrote:
>> For patches 1-5:
>>
>> Reviewed-by: Marek Olšák
>
> Actually no. Only patches 1, 3, 5 are reviewed by me.
>
> Marek

Do you have an issue with patch 4?


Re: [Mesa-dev] [PATCH 5/7] ac/radeonsi: pass bindless bool to load_sampler_desc()

2018-02-20 Thread Marek Olšák
On Tue, Feb 20, 2018 at 11:42 PM, Marek Olšák  wrote:
> For patches 1-5:
>
> Reviewed-by: Marek Olšák 

Actually no. Only patches 1, 3, 5 are reviewed by me.

Marek


Re: [Mesa-dev] [PATCH 5/7] ac/radeonsi: pass bindless bool to load_sampler_desc()

2018-02-20 Thread Marek Olšák
For patches 1-5:

Reviewed-by: Marek Olšák 

Marek

On Tue, Feb 20, 2018 at 4:42 AM, Timothy Arceri  wrote:
> We also fix the base_index for bindless by using the driver
> location.
> ---
>  src/amd/common/ac_nir_to_llvm.c  | 14 +++---
>  src/amd/common/ac_shader_abi.h   |  3 ++-
>  src/gallium/drivers/radeonsi/si_shader_nir.c |  2 +-
>  3 files changed, 14 insertions(+), 5 deletions(-)
>
> diff --git a/src/amd/common/ac_nir_to_llvm.c b/src/amd/common/ac_nir_to_llvm.c
> index fc89779c12..9f55be0d45 100644
> --- a/src/amd/common/ac_nir_to_llvm.c
> +++ b/src/amd/common/ac_nir_to_llvm.c
> @@ -4664,7 +4664,8 @@ static LLVMValueRef radv_get_sampler_desc(struct 
> ac_shader_abi *abi,
>   unsigned constant_index,
>   LLVMValueRef index,
>   enum ac_descriptor_type desc_type,
> - bool image, bool write)
> + bool image, bool write,
> + bool bindless)
>  {
> struct radv_shader_context *ctx = radv_shader_context_from_abi(abi);
> LLVMValueRef list = ctx->descriptor_sets[descriptor_set];
> @@ -4744,6 +4745,7 @@ static LLVMValueRef get_sampler_desc(struct 
> ac_nir_context *ctx,
> unsigned constant_index = 0;
> unsigned descriptor_set;
> unsigned base_index;
> +   bool bindless = false;
>
> if (!deref) {
> assert(tex_instr && !image);
> @@ -4777,14 +4779,20 @@ static LLVMValueRef get_sampler_desc(struct 
> ac_nir_context *ctx,
> tail = >deref;
> }
> descriptor_set = deref->var->data.descriptor_set;
> -   base_index = deref->var->data.binding;
> +
> +   if (deref->var->data.bindless) {
> +   bindless = deref->var->data.bindless;
> +   base_index = deref->var->data.driver_location;
> +   } else {
> +   base_index = deref->var->data.binding;
> +   }
> }
>
> return ctx->abi->load_sampler_desc(ctx->abi,
>   descriptor_set,
>   base_index,
>   constant_index, index,
> - desc_type, image, write);
> + desc_type, image, write, bindless);
>  }
>
>  static void set_tex_fetch_args(struct ac_llvm_context *ctx,
> diff --git a/src/amd/common/ac_shader_abi.h b/src/amd/common/ac_shader_abi.h
> index 62b8b7a5dc..de3034e32f 100644
> --- a/src/amd/common/ac_shader_abi.h
> +++ b/src/amd/common/ac_shader_abi.h
> @@ -156,7 +156,8 @@ struct ac_shader_abi {
>   unsigned constant_index,
>   LLVMValueRef index,
>   enum ac_descriptor_type desc_type,
> - bool image, bool write);
> + bool image, bool write,
> + bool bindless);
>
> /**
>  * Load a Vulkan-specific resource.
> diff --git a/src/gallium/drivers/radeonsi/si_shader_nir.c 
> b/src/gallium/drivers/radeonsi/si_shader_nir.c
> index 7a5acd3ff1..c2036a1509 100644
> --- a/src/gallium/drivers/radeonsi/si_shader_nir.c
> +++ b/src/gallium/drivers/radeonsi/si_shader_nir.c
> @@ -776,7 +776,7 @@ si_nir_load_sampler_desc(struct ac_shader_abi *abi,
>  unsigned descriptor_set, unsigned base_index,
>  unsigned constant_index, LLVMValueRef dynamic_index,
>  enum ac_descriptor_type desc_type, bool image,
> -bool write)
> +bool write, bool bindless)
>  {
> struct si_shader_context *ctx = si_shader_context_from_abi(abi);
> LLVMBuilderRef builder = ctx->ac.builder;
> --
> 2.14.3
>


Re: [Mesa-dev] [PATCH 6/7] radeonsi/nir: add initial bindless image support

2018-02-20 Thread Marek Olšák
On Tue, Feb 20, 2018 at 4:42 AM, Timothy Arceri  wrote:
> ---
>  src/gallium/drivers/radeonsi/si_shader_nir.c | 41 
> +++-
>  1 file changed, 34 insertions(+), 7 deletions(-)
>
> diff --git a/src/gallium/drivers/radeonsi/si_shader_nir.c 
> b/src/gallium/drivers/radeonsi/si_shader_nir.c
> index c2036a1509..e3e71c6eb6 100644
> --- a/src/gallium/drivers/radeonsi/si_shader_nir.c
> +++ b/src/gallium/drivers/radeonsi/si_shader_nir.c
> @@ -771,6 +771,21 @@ si_nir_lookup_interp_param(struct ac_shader_abi *abi,
> LLVMGetParam(ctx->main_fn, interp_param_idx) : NULL;
>  }
>
> +static LLVMValueRef
> +get_bindless_index(struct ac_shader_abi *abi,
> +   struct si_shader_context *ctx, LLVMValueRef index)
> +{
> +   LLVMValueRef offset =
> +   LLVMBuildMul(ctx->ac.builder, index, LLVMConstInt(ctx->i32, 
> 16, 0), "");
> +
> +   index = abi->load_ubo(abi, ctx->ac.i32_0);
> +
> +   LLVMValueRef ret = ac_build_buffer_load(>ac, index, 1, NULL, 
> offset,
> +   NULL, 0, false, false, true, 
> true);

I don't understand this. At least I think it shouldn't use load_ubo
and ac_build_buffer_load. A bindless variable (index) is a 64-bit
integer, i.e. the same size as vec2.
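
To illustrate the size point only: a 64-bit handle fits exactly in two 32-bit components. A minimal sketch in plain C follows. The helper names handle_to_words and words_to_handle are hypothetical and not part of Mesa or LLVM, and the low-word-first ordering is an assumption made for the example.

```c
#include <assert.h>
#include <stdint.h>

/* A 64-bit handle occupies exactly two 32-bit words. */
_Static_assert(sizeof(uint64_t) == 2 * sizeof(uint32_t),
               "handle must fit in two 32-bit words");

/* Split a 64-bit handle into two 32-bit words (low word first; the
 * ordering is an assumption for this sketch).
 */
static void
handle_to_words(uint64_t handle, uint32_t words[2])
{
   words[0] = (uint32_t)(handle & 0xffffffffu);
   words[1] = (uint32_t)(handle >> 32);
}

/* Rebuild the 64-bit handle from its two 32-bit words. */
static uint64_t
words_to_handle(const uint32_t words[2])
{
   return ((uint64_t)words[1] << 32) | words[0];
}
```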

Marek


Re: [Mesa-dev] [PATCH 2/7] radeonsi/nir: set uses_bindless_images for images

2018-02-20 Thread Samuel Pitoiset



On 02/20/2018 04:42 AM, Timothy Arceri wrote:
> ---
>   src/gallium/drivers/radeonsi/si_shader_nir.c | 6 +-
>   1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/src/gallium/drivers/radeonsi/si_shader_nir.c
> b/src/gallium/drivers/radeonsi/si_shader_nir.c
> index ea9f2076da..974068b88f 100644
> --- a/src/gallium/drivers/radeonsi/si_shader_nir.c
> +++ b/src/gallium/drivers/radeonsi/si_shader_nir.c
> @@ -134,7 +134,11 @@ static void scan_instruction(struct tgsi_shader_info *info,
> case nir_intrinsic_image_atomic_or:
> case nir_intrinsic_image_atomic_xor:
> case nir_intrinsic_image_atomic_exchange:
> -   case nir_intrinsic_image_atomic_comp_swap:
> +   case nir_intrinsic_image_atomic_comp_swap: {
> +   nir_variable *var = intr->variables[0]->var;
> +   if (var->data.bindless)
> +   info->uses_bindless_images = true;
> +   }

How about image loads and image query sizes?

> case nir_intrinsic_store_ssbo:
> case nir_intrinsic_ssbo_atomic_add:
> case nir_intrinsic_ssbo_atomic_imin:



Re: [Mesa-dev] [PATCH 2/2] radv: implement AMD_gcn_shader extension

2018-02-20 Thread Dylan Baker
Quoting Daniel Schürmann (2018-02-20 11:06:37)
> From: Dave Airlie 
> 
> Signed-off-by: Daniel Schürmann 
> ---
>   src/amd/common/ac_nir_to_llvm.c   | 51 +++
>   src/amd/vulkan/radv_extensions.py |  1 +
>   src/compiler/nir/meson.build  |  1 +
>   src/compiler/nir/nir_intrinsics.h |  5 
>   src/compiler/spirv/spirv_to_nir.c |  2 ++
>   src/compiler/spirv/vtn_amd.c  | 63 
> +++
>   src/compiler/spirv/vtn_private.h  |  3 ++
>   7 files changed, 126 insertions(+)
>   create mode 100644 src/compiler/spirv/vtn_amd.c
> 
> diff --git a/src/amd/common/ac_nir_to_llvm.c 
> b/src/amd/common/ac_nir_to_llvm.c
> index 12f097e2b2..251c225676 100644
> --- a/src/amd/common/ac_nir_to_llvm.c
> +++ b/src/amd/common/ac_nir_to_llvm.c
> @@ -4325,6 +4325,47 @@ load_patch_vertices_in(struct ac_shader_abi *abi)
> return LLVMConstInt(ctx->ac.i32, 
> ctx->options->key.tcs.input_vertices, false);
>   }
>   +static LLVMValueRef
> +visit_cube_face_index(struct ac_nir_context *ctx,
> + nir_intrinsic_instr *instr)
> +{
> +   LLVMValueRef result;
> +   LLVMValueRef in[3];
> +   LLVMValueRef src0 = ac_to_float(>ac, get_src(ctx, 
> instr->src[0]));
> +   for (unsigned chan = 0; chan < 3; chan++)
> +   in[chan] = ac_llvm_extract_elem(>ac, src0, chan);
> +
> +   result = ac_build_intrinsic(>ac,  "llvm.amdgcn.cubeid",
> +   ctx->ac.f32, in, 3, 
> AC_FUNC_ATTR_READNONE);
> +   return result;
> +}
> +
> +static LLVMValueRef
> +visit_cube_face_coord(struct ac_nir_context *ctx,
> + nir_intrinsic_instr *instr)
> +{
> +   LLVMValueRef results[2];
> +   LLVMValueRef in[3];
> +   LLVMValueRef src0 = ac_to_float(>ac, get_src(ctx, 
> instr->src[0]));
> +   for (unsigned chan = 0; chan < 3; chan++)
> +   in[chan] = ac_llvm_extract_elem(>ac, src0, chan);
> +
> +   results[0] = ac_build_intrinsic(>ac, "llvm.amdgcn.cubetc",
> +   ctx->ac.f32, in, 3, 
> AC_FUNC_ATTR_READNONE);
> +   results[1] = ac_build_intrinsic(>ac, "llvm.amdgcn.cubesc",
> +   ctx->ac.f32, in, 3, 
> AC_FUNC_ATTR_READNONE);
> +   return ac_build_gather_values(>ac, results, 2);
> +}
> +
> +static LLVMValueRef
> +visit_time(struct ac_nir_context *ctx,
> +nir_intrinsic_instr *instr)
> +{
> +   return ac_build_intrinsic(>ac, "llvm.amdgcn.s.memrealtime",
> + ctx->ac.i64, NULL, 0, 
> AC_FUNC_ATTR_READONLY);
> +
> +}
> +
>   static void visit_intrinsic(struct ac_nir_context *ctx,
>   nir_intrinsic_instr *instr)
>   {
> @@ -4610,6 +4651,16 @@ static void visit_intrinsic(struct ac_nir_context 
> *ctx,
> result = LLVMBuildSExt(ctx->ac.builder, tmp, ctx->ac.i32, "");
> break;
> }
> +   case nir_intrinsic_cube_face_index:
> +   result = visit_cube_face_index(ctx, instr);
> +   break;
> +   case nir_intrinsic_cube_face_coord:
> +   result = visit_cube_face_coord(ctx, instr);
> +   break;
> +   case nir_intrinsic_time:
> +   result = visit_time(ctx, instr);
> +   break;
> +
> default:
> fprintf(stderr, "Unknown intrinsic: ");
> nir_print_instr(>instr, stderr);
> diff --git a/src/amd/vulkan/radv_extensions.py 
> b/src/amd/vulkan/radv_extensions.py
> index d761895d3a..a63e01faae 100644
> --- a/src/amd/vulkan/radv_extensions.py
> +++ b/src/amd/vulkan/radv_extensions.py
> @@ -88,6 +88,7 @@ EXTENSIONS = [
>   Extension('VK_EXT_external_memory_host',  1, 
> 'device->rad_info.has_userptr'),
>   Extension('VK_EXT_global_priority',   1, 
> 'device->rad_info.has_ctx_priority'),
>   Extension('VK_AMD_draw_indirect_count',   1, True),
> +Extension('VK_AMD_gcn_shader',1, True),

the indent here (and a few other places) looks off.

>   Extension('VK_AMD_rasterization_order',   1, 
> 'device->rad_info.chip_class >= VI && device->rad_info.max_se >= 2'),
>   Extension('VK_AMD_shader_info',   1, True),
>   ]
> diff --git a/src/compiler/nir/meson.build b/src/compiler/nir/meson.build
> index 859a0c1e62..e0011a4dc0 100644
> --- a/src/compiler/nir/meson.build
> +++ b/src/compiler/nir/meson.build
> @@ -189,6 +189,7 @@ files_libnir = files(
> '../spirv/spirv_info.h',
> '../spirv/spirv_to_nir.c',
> '../spirv/vtn_alu.c',
> +  '../spirv/vtn_amd.c',
> '../spirv/vtn_cfg.c',
> '../spirv/vtn_glsl450.c',
> '../spirv/vtn_private.h',
> diff --git a/src/compiler/nir/nir_intrinsics.h 
> b/src/compiler/nir/nir_intrinsics.h
> index ede2927787..e3c0620ce8 100644
> --- 

Re: [Mesa-dev] [PATCH 1/7] nir: add bindless to nir data

2018-02-20 Thread Samuel Pitoiset

Reviewed-by: Samuel Pitoiset 

On 02/20/2018 04:42 AM, Timothy Arceri wrote:
> ---
>   src/compiler/glsl/glsl_to_nir.cpp | 1 +
>   src/compiler/nir/nir.h| 6 ++
>   2 files changed, 7 insertions(+)
>
> diff --git a/src/compiler/glsl/glsl_to_nir.cpp
> b/src/compiler/glsl/glsl_to_nir.cpp
> index 7a9d15015e..49d66c173c 100644
> --- a/src/compiler/glsl/glsl_to_nir.cpp
> +++ b/src/compiler/glsl/glsl_to_nir.cpp
> @@ -434,6 +434,7 @@ nir_visitor::visit(ir_variable *ir)
>  var->data.index = ir->data.index;
>  var->data.descriptor_set = 0;
>  var->data.binding = ir->data.binding;
> +   var->data.bindless = ir->data.bindless;
>  var->data.offset = ir->data.offset;
>  var->data.image.read_only = ir->data.memory_read_only;
>  var->data.image.write_only = ir->data.memory_write_only;
> diff --git a/src/compiler/nir/nir.h b/src/compiler/nir/nir.h
> index 2acd9511f5..c6541f0a6f 100644
> --- a/src/compiler/nir/nir.h
> +++ b/src/compiler/nir/nir.h
> @@ -249,6 +249,12 @@ typedef struct nir_variable {
>  */
> unsigned fb_fetch_output:1;
>
> +  /**
> +   * Non-zero if this variable is considered bindless as defined by
> +   * ARB_bindless_texture.
> +   */
> +  unsigned bindless:1;
> +
> /**
>  * \brief Layout qualifier for gl_FragDepth.
>  *



Re: [Mesa-dev] [PATCH shaderdb 3/3] run: shader program file created via GetProgramBinary

2018-02-20 Thread Dongwon Kim
Thanks for the review. I put my comments below yours.

On Wed, Feb 14, 2018 at 10:56:14AM +0200, Eero Tamminen wrote:
> Hi,
> 
> On 13.02.2018 03:26, Dongwon Kim wrote:
> >extraction of linked binary program to a file using glGetProgramBinary.
> >This file is intended to be loaded by glProgramBinary in the graphic
> >application running on the target system.
> >
> >A new option, '--out=' is available to be used for specifying
> >the output file name.
> >
> >Signed-off-by: Dongwon Kim 
> >---
> >  run.c | 46 --
> >  1 file changed, 44 insertions(+), 2 deletions(-)
> >
> >diff --git a/run.c b/run.c
> >index d066567..54575e1 100644
> >--- a/run.c
> >+++ b/run.c
> >@@ -358,18 +358,20 @@ const struct platform platforms[] = {
> >  enum
> >  {
> >  PCI_ID_OVERRIDE_OPTION = CHAR_MAX + 1,
> >+OUT_PROGRAM_OPTION,
> >  };
> >  const struct option const long_options[] =
> >  {
> >  {"pciid", required_argument, NULL, PCI_ID_OVERRIDE_OPTION},
> >+{"out", required_argument, NULL, OUT_PROGRAM_OPTION},
> >  {NULL, 0, NULL, 0}
> >  };
> >  void print_usage(const char *prog_name)
> >  {
> >  fprintf(stderr,
> >-"Usage: %s [-d ] [-j ] [-o ] [-p 
> >] [--pciid=]  >*.shader_test files>\n",
> >+"Usage: %s [-d ] [-j ] [-o ] [-p 
> >] [--pciid=] [--out= >output shader program>] \n",
> >  prog_name);
> >  }
> >@@ -450,6 +452,7 @@ main(int argc, char **argv)
> >  int opt;
> >  bool platf_overridden = 0;
> >  bool pci_id_overridden = 0;
> >+char out_file[64] = {0};
> 
> File names can be potentially thousands of chars long.
> 
> Why not just use:
>   const char *out_file = NULL;
> ?
> 
> >  max_threads = omp_get_max_threads();
> >@@ -518,6 +521,13 @@ main(int argc, char **argv)
> >  setenv("INTEL_DEVID_OVERRIDE", optarg, 1);
> >  pci_id_overridden = 1;
> >  break;
> >+case OUT_PROGRAM_OPTION:
> >+if (optarg[0] == 0) {
> >+  fprintf(stderr, "Output file name is empty.\n");
> >+  return -1;
> >+}
> >+strncpy(out_file, optarg, 64);
> 
> ...and if the pointer cannot be assigned directly, strdup & assert if that fails.

yeah, your proposal sounds better. I will do so.
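
A minimal sketch of what that could look like, assuming the option argument is copied rather than pointed at directly. The helper name dup_out_file_arg is hypothetical and is not taken from run.c; strdup is the POSIX function.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L /* for strdup() under strict ISO C modes */
#endif

#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Returns a heap-allocated copy of the option argument, or NULL if the
 * argument is missing or empty.  The caller owns the result and must
 * free() it.  No fixed-size buffer is involved, so long file names are
 * not truncated.
 */
static char *
dup_out_file_arg(const char *arg)
{
   if (arg == NULL || arg[0] == '\0')
      return NULL;

   char *copy = strdup(arg);
   assert(copy != NULL); /* allocation failure is treated as fatal */
   return copy;
}
```

Since optarg points into argv, which stays valid for the lifetime of the process, plain assignment to a const char * would also work. The copy only matters if the string needs to be owned or modified.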

> 
> 
> >+break;
> >  default:
> >  fprintf(stderr, "Unknown option: %x\n", opt);
> >  print_usage(argv[0]);
> >@@ -858,13 +868,13 @@ main(int argc, char **argv)
> >  }
> >  } else if (type == TYPE_CORE || type == TYPE_COMPAT || type == 
> > TYPE_ES) {
> >  GLuint prog = glCreateProgram();
> >+GLint param;
> >  for (unsigned i = 0; i < num_shaders; i++) {
> >  GLuint s = glCreateShader(shader[i].type);
> >  glShaderSource(s, 1, [i].text, 
> > [i].length);
> >  glCompileShader(s);
> >-GLint param;
> >  glGetShaderiv(s, GL_COMPILE_STATUS, );
> >  if (unlikely(!param)) {
> >  GLchar log[4096];
> >@@ -879,6 +889,38 @@ main(int argc, char **argv)
> >  }
> >  glLinkProgram(prog);
> >+
> >+glGetProgramiv(prog, GL_LINK_STATUS, );
> >+if (unlikely(!param)) {
> >+   GLchar log[4096];
> 
> Maybe add define for log buffer size as it's used in multiple places?

I just followed the existing notation. However, I agree with you on this.
I am going to define a constant and use it instead.

> 
> 
> >+   GLsizei length;
> >+   glGetProgramInfoLog(prog, 4096, , log);
> 
> 4096 -> sizeof(log)
> 
> 
> >+
> >+   fprintf(stderr, "ERROR: failed to link program:\n%s\n",
> >+   log);
> >+} else {
> >+   if (out_file[0] != 0) {
> 
> If changed to pointer, check for NULL.

With assert(outfile != NULL) above, I don't think I actually need this condition
check..

> 
> 
> >+  char *prog_buf = (char *)malloc(10*1024*1024);
> >+  GLenum format;
> >+  GLsizei length;
> >+  FILE *fp;
> >+
> >+  glGetProgramBinary(prog, 10*1024*1024, , 
> >, prog_buf);
> 
> Use a define for size instead of magic value.

Yeah, agreed. I will change this.
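
Roughly along these lines, as a sketch only. The names PROG_BUF_SIZE and write_program_binary are made up for illustration and are not from run.c. The GL calls are left out so the snippet stands alone. The fopen/fwrite/fclose error checks are an addition of this sketch, not something requested in the review.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Assumed name; the value matches the 10 MiB buffer in the patch. */
#define PROG_BUF_SIZE (10 * 1024 * 1024)

/* Write `length` bytes from `buf` to `path`.  Returns false if the
 * length is out of range or any stdio call fails.
 */
static bool
write_program_binary(const char *path, const void *buf, long length)
{
   assert(path != NULL && buf != NULL);

   if (length <= 0 || length > PROG_BUF_SIZE)
      return false;

   FILE *fp = fopen(path, "wb");
   if (fp == NULL)
      return false;

   size_t written = fwrite(buf, 1, (size_t)length, fp);
   bool ok = (written == (size_t)length);

   /* fclose() can report a deferred write error, so check it too. */
   if (fclose(fp) != 0)
      ok = false;

   return ok;
}
```

The same constant would then be used both for the malloc() and for the bufSize argument passed to glGetProgramBinary, so the two cannot drift apart.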

> 
> >+
> >+  param = glGetError();
> >+  if (param != GL_NO_ERROR) {
> >+ fprintf(stderr, "ERROR: failed to get Program 
> >Binary\n");
> >+  } else {
> >+ fp = fopen(out_file, "wb");
> >+ fprintf(stdout, "Binary program is generated (%d 
> >Byte).\n", length);
> >+ fprintf(stdout, "Binary Format is %d\n", format);
> >+

[Mesa-dev] [PATCH] anv: Add WSI support for the I915_FORMAT_MOD_Y_TILED_CCS

2018-02-20 Thread Jason Ekstrand
v2 (Jason Ekstrand):
 - Return the correct enum values from anv_layout_to_fast_clear_type

v3 (Jason Ekstrand):
 - Always return ANV_FAST_CLEAR_NONE and leave doing the right thing for
   the patch which adds a modifier which supports fast-clears.
---
 src/intel/vulkan/anv_formats.c |  9 
 src/intel/vulkan/anv_image.c   | 49 ++
 2 files changed, 39 insertions(+), 19 deletions(-)

diff --git a/src/intel/vulkan/anv_formats.c b/src/intel/vulkan/anv_formats.c
index 9c52ad5..3c17366 100644
--- a/src/intel/vulkan/anv_formats.c
+++ b/src/intel/vulkan/anv_formats.c
@@ -671,9 +671,18 @@ get_wsi_format_modifier_properties_list(const struct 
anv_physical_device *physic
   DRM_FORMAT_MOD_LINEAR,
   I915_FORMAT_MOD_X_TILED,
   I915_FORMAT_MOD_Y_TILED,
+  I915_FORMAT_MOD_Y_TILED_CCS,
};
 
for (uint32_t i = 0; i < ARRAY_SIZE(modifiers); i++) {
+  const struct isl_drm_modifier_info *mod_info =
+ isl_drm_modifier_get_info(modifiers[i]);
+
+  if (mod_info->aux_usage == ISL_AUX_USAGE_CCS_E &&
+  !isl_format_supports_ccs_e(_device->info,
+ anv_format->planes[0].isl_format))
+ continue;
+
   vk_outarray_append(, mod_props) {
  mod_props->modifier = modifiers[i];
  if (isl_drm_modifier_has_aux(modifiers[i]))
diff --git a/src/intel/vulkan/anv_image.c b/src/intel/vulkan/anv_image.c
index a2bae7b..f536459 100644
--- a/src/intel/vulkan/anv_image.c
+++ b/src/intel/vulkan/anv_image.c
@@ -515,6 +515,7 @@ score_drm_format_mod(uint64_t modifier)
case DRM_FORMAT_MOD_LINEAR: return 1;
case I915_FORMAT_MOD_X_TILED: return 2;
case I915_FORMAT_MOD_Y_TILED: return 3;
+   case I915_FORMAT_MOD_Y_TILED_CCS: return 4;
default: unreachable("bad DRM format modifier");
}
 }
@@ -746,8 +747,13 @@ void anv_GetImageSubresourceLayout(
 VkSubresourceLayout*layout)
 {
ANV_FROM_HANDLE(anv_image, image, _image);
-   const struct anv_surface *surface =
-  get_surface(image, subresource->aspectMask);
+
+   const struct anv_surface *surface;
+   if (subresource->aspectMask == VK_IMAGE_ASPECT_PLANE_1_BIT_KHR &&
+   isl_drm_modifier_has_aux(image->drm_format_mod))
+  surface = >planes[0].aux_surface;
+   else
+  surface = get_surface(image, subresource->aspectMask);
 
assert(__builtin_popcount(subresource->aspectMask) == 1);
 
@@ -862,25 +868,20 @@ anv_layout_to_aux_usage(const struct gen_device_info * 
const devinfo,
   }
 
 
-   case VK_IMAGE_LAYOUT_PRESENT_SRC_KHR:
+   case VK_IMAGE_LAYOUT_PRESENT_SRC_KHR: {
   assert(image->aspects == VK_IMAGE_ASPECT_COLOR_BIT);
 
-  /* On SKL+, the render buffer can be decompressed by the presentation
-   * engine. Support for this feature has not yet landed in the wider
-   * ecosystem. TODO: Update this code when support lands.
-   *
-   * From the BDW PRM, Vol 7, Render Target Resolve:
-   *
-   *If the MCS is enabled on a non-multisampled render target, the
-   *render target must be resolved before being used for other
-   *purposes (display, texture, CPU lock) The clear value from
-   *SURFACE_STATE is written into pixels in the render target
-   *indicated as clear in the MCS.
-   *
-   * Pre-SKL, the render buffer must be resolved before being used for
-   * presentation. We can infer that the auxiliary buffer is not used.
+  /* When handing the image off to the presentation engine, we need to
+   * ensure that things are properly resolved.  For images with no
+   * modifier, we assume that they follow the old rules and always need
+   * a full resolve because the PE doesn't understand any form of
+   * compression.  For images with modifiers, we use the aux usage from
+   * the modifier.
*/
-  return ISL_AUX_USAGE_NONE;
+  const struct isl_drm_modifier_info *mod_info =
+ isl_drm_modifier_get_info(image->drm_format_mod);
+  return mod_info ? mod_info->aux_usage : ISL_AUX_USAGE_NONE;
+   }
 
 
/* Rendering Layouts */
@@ -960,8 +961,18 @@ anv_layout_to_fast_clear_type(const struct gen_device_info 
* const devinfo,
case VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL:
   return ANV_FAST_CLEAR_ANY;
 
-   case VK_IMAGE_LAYOUT_PRESENT_SRC_KHR:
+   case VK_IMAGE_LAYOUT_PRESENT_SRC_KHR: {
+  assert(image->aspects == VK_IMAGE_ASPECT_COLOR_BIT);
+#ifndef NDEBUG
+  /* We do not yet support any modifiers which support clear color so we
+   * just always return NONE.  One day, this will change.
+   */
+  const struct isl_drm_modifier_info *mod_info =
+ isl_drm_modifier_get_info(image->drm_format_mod);
+  assert(!mod_info || !mod_info->supports_clear_color);
+#endif
   return ANV_FAST_CLEAR_NONE;
+   }
 
default:
   /* If the image has CCS_E enabled all the time then we can use
-- 
2.5.0.400.gff86faf
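
For context on the score_drm_format_mod change: the patch only shows the ranking, not the code that consumes it. The sketch below assumes the score is used to pick the preferred modifier from a list. The MOD_* enum values are hypothetical stand-ins for the real tokens in drm_fourcc.h, and score_mod and pick_best_mod are illustrative names, not anv functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the DRM modifier tokens; the real values
 * come from drm_fourcc.h and are not reproduced here.
 */
enum {
   MOD_LINEAR,
   MOD_X_TILED,
   MOD_Y_TILED,
   MOD_Y_TILED_CCS,
};

/* Same ranking as score_drm_format_mod() in the patch.  Unknown
 * modifiers score 0 here instead of hitting unreachable().
 */
static uint32_t
score_mod(uint64_t modifier)
{
   switch (modifier) {
   case MOD_LINEAR:      return 1;
   case MOD_X_TILED:     return 2;
   case MOD_Y_TILED:     return 3;
   case MOD_Y_TILED_CCS: return 4;
   default:              return 0;
   }
}

/* Return the highest-scoring modifier in a non-empty list. */
static uint64_t
pick_best_mod(const uint64_t *mods, size_t count)
{
   assert(count > 0);

   uint64_t best = mods[0];
   for (size_t i = 1; i < count; i++) {
      if (score_mod(mods[i]) > score_mod(best))
         best = mods[i];
   }
   return best;
}
```

Under that assumption, giving the CCS modifier the highest score means it wins whenever the other side advertises it, and plain Y-tiling remains the fallback.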


[Mesa-dev] [Bug 105179] DiRT Rally: wrong frames appear during camera transition

2018-02-20 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=105179

Gregor Münch  changed:

   What|Removed |Added

 QA Contact|mesa-dev@lists.freedesktop. |dri-devel@lists.freedesktop
   |org |.org
   Assignee|mesa-dev@lists.freedesktop. |dri-devel@lists.freedesktop
   |org |.org
  Component|Mesa core   |Drivers/Gallium/radeonsi

--- Comment #2 from Gregor Münch  ---
(In reply to Ilia Mirkin from comment #1)
> In any case, those ills are in no way related to radeonsi or any other
> driver.

Thx for your explanation! Sorry for the noise, reassigning to radeon then.

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.


Re: [Mesa-dev] [PATCH] anv: Add WSI support for the I915_FORMAT_MOD_Y_TILED_CCS

2018-02-20 Thread Jason Ekstrand
On Tue, Feb 20, 2018 at 1:57 PM, Nanley Chery  wrote:

> On Tue, Feb 20, 2018 at 11:31:08AM -0800, Jason Ekstrand wrote:
> > On Tue, Feb 20, 2018 at 11:26 AM, Jason Ekstrand 
> > wrote:
> >
> > > On Tue, Feb 20, 2018 at 11:25 AM, Nanley Chery 
> > > wrote:
> > >
> > >> On Fri, Feb 16, 2018 at 09:28:43AM -0800, Jason Ekstrand wrote:
> > >> > ---
> > >> >  src/intel/vulkan/anv_formats.c |  9 +++
> > >> >  src/intel/vulkan/anv_image.c   | 53 ++
> > >> 
> > >> >  2 files changed, 42 insertions(+), 20 deletions(-)
> > >> >
> > >> > diff --git a/src/intel/vulkan/anv_formats.c
> > >> b/src/intel/vulkan/anv_formats.c
> > >> > index 9c52ad5..3c17366 100644
> > >> > --- a/src/intel/vulkan/anv_formats.c
> > >> > +++ b/src/intel/vulkan/anv_formats.c
> > >> > @@ -671,9 +671,18 @@ get_wsi_format_modifier_properties_list(const
> > >> struct anv_physical_device *physic
> > >> >DRM_FORMAT_MOD_LINEAR,
> > >> >I915_FORMAT_MOD_X_TILED,
> > >> >I915_FORMAT_MOD_Y_TILED,
> > >> > +  I915_FORMAT_MOD_Y_TILED_CCS,
> > >> > };
> > >> >
> > >> > for (uint32_t i = 0; i < ARRAY_SIZE(modifiers); i++) {
> > >> > +  const struct isl_drm_modifier_info *mod_info =
> > >> > + isl_drm_modifier_get_info(modifiers[i]);
> > >> > +
> > >> > +  if (mod_info->aux_usage == ISL_AUX_USAGE_CCS_E &&
> > >> > +  !isl_format_supports_ccs_e(_device->info,
> > >> > + anv_format->planes[0].isl_for
> > >> mat))
> > >> > + continue;
> > >> > +
> > >> >vk_outarray_append(, mod_props) {
> > >> >   mod_props->modifier = modifiers[i];
> > >> >   if (isl_drm_modifier_has_aux(modifiers[i]))
> > >> > diff --git a/src/intel/vulkan/anv_image.c
> b/src/intel/vulkan/anv_image.c
> > >> > index a2bae7b..d7c2e55 100644
> > >> > --- a/src/intel/vulkan/anv_image.c
> > >> > +++ b/src/intel/vulkan/anv_image.c
> > >> > @@ -515,6 +515,7 @@ score_drm_format_mod(uint64_t modifier)
> > >> > case DRM_FORMAT_MOD_LINEAR: return 1;
> > >> > case I915_FORMAT_MOD_X_TILED: return 2;
> > >> > case I915_FORMAT_MOD_Y_TILED: return 3;
> > >> > +   case I915_FORMAT_MOD_Y_TILED_CCS: return 4;
> > >> > default: unreachable("bad DRM format modifier");
> > >> > }
> > >> >  }
> > >> > @@ -746,8 +747,13 @@ void anv_GetImageSubresourceLayout(
> > >> >  VkSubresourceLayout*layout)
> > >> >  {
> > >> > ANV_FROM_HANDLE(anv_image, image, _image);
> > >> > -   const struct anv_surface *surface =
> > >> > -  get_surface(image, subresource->aspectMask);
> > >> > +
> > >> > +   const struct anv_surface *surface;
> > >> > +   if (subresource->aspectMask == VK_IMAGE_ASPECT_PLANE_1_BIT_KHR
> &&
> > >> > +   isl_drm_modifier_has_aux(image->drm_format_mod))
> > >> > +  surface = &image->planes[0].aux_surface;
> > >> > +   else
> > >> > +  surface = get_surface(image, subresource->aspectMask);
> > >> >
> > >> > assert(__builtin_popcount(subresource->aspectMask) == 1);
> > >> >
> > >> > @@ -862,25 +868,20 @@ anv_layout_to_aux_usage(const struct
> > >> gen_device_info * const devinfo,
> > >> >}
> > >> >
> > >> >
> > >> > -   case VK_IMAGE_LAYOUT_PRESENT_SRC_KHR:
> > >> > +   case VK_IMAGE_LAYOUT_PRESENT_SRC_KHR: {
> > >> >assert(image->aspects == VK_IMAGE_ASPECT_COLOR_BIT);
> > >> >
> > >> > -  /* On SKL+, the render buffer can be decompressed by the
> > >> presentation
> > >> > -   * engine. Support for this feature has not yet landed in the
> > >> wider
> > >> > -   * ecosystem. TODO: Update this code when support lands.
> > >> > -   *
> > >> > -   * From the BDW PRM, Vol 7, Render Target Resolve:
> > >> > -   *
> > >> > -   *If the MCS is enabled on a non-multisampled render
> target,
> > >> the
> > >> > -   *render target must be resolved before being used for
> other
> > >> > -   *purposes (display, texture, CPU lock) The clear value
> from
> > >> > -   *SURFACE_STATE is written into pixels in the render
> target
> > >> > -   *indicated as clear in the MCS.
> > >> > -   *
> > >> > -   * Pre-SKL, the render buffer must be resolved before being
> used
> > >> for
> > >> > -   * presentation. We can infer that the auxiliary buffer is
> not
> > >> used.
> > >> > +  /* When handing the image off to the presentation engine, we
> > >> need to
> > >> > +   * ensure that things are properly resolved.  For images
> with no
> > >> > +   * modifier, we assume that they follow the old rules and
> always
> > >> need
> > >> > +   * a full resolve because the PE doesn't understand any form
> of
> > >> > +   * compression.  For images with modifiers, we use the aux
> usage
> > >> from
> > >> > +   * the modifier.
> > >> > */
> > >> > -  return ISL_AUX_USAGE_NONE;
> > >> > +  const struct 

Re: [Mesa-dev] [PATCH v5 07/34] nvc0/debug: add env var to make nir default

2018-02-20 Thread Pierre Moreau
Acked-by: Pierre Moreau 

On 2018-02-20 — 22:02, Karol Herbst wrote:
> v2: allow for non debug builds as well
> v3: move reading out env var more global
> disable tg4 with multiple offsets with nir
> disable caps for 64 bit types
> 
> Signed-off-by: Karol Herbst 
> ---
>  src/gallium/drivers/nouveau/nouveau_screen.c   |  4 
>  src/gallium/drivers/nouveau/nouveau_screen.h   |  2 ++
>  src/gallium/drivers/nouveau/nvc0/nvc0_screen.c | 12 
>  3 files changed, 14 insertions(+), 4 deletions(-)
> 
> diff --git a/src/gallium/drivers/nouveau/nouveau_screen.c 
> b/src/gallium/drivers/nouveau/nouveau_screen.c
> index c144b39b2d..6c52f9e40c 100644
> --- a/src/gallium/drivers/nouveau/nouveau_screen.c
> +++ b/src/gallium/drivers/nouveau/nouveau_screen.c
> @@ -175,6 +175,7 @@ nouveau_screen_init(struct nouveau_screen *screen, struct 
> nouveau_device *dev)
> void *data;
> union nouveau_bo_config mm_config;
>  
> +   char *use_nir = getenv("NV50_PROG_USE_NIR");
> char *nv_dbg = getenv("NOUVEAU_MESA_DEBUG");
> if (nv_dbg)
>nouveau_mesa_debug = atoi(nv_dbg);
> @@ -261,6 +262,9 @@ nouveau_screen_init(struct nouveau_screen *screen, struct 
> nouveau_device *dev)
> NOUVEAU_BO_GART | NOUVEAU_BO_MAP,
> &mm_config);
> screen->mm_VRAM = nouveau_mm_create(dev, NOUVEAU_BO_VRAM, &mm_config);
> +
> +   screen->prefer_nir = use_nir && strtol(use_nir, NULL, 0) == 1;
> +
> return 0;
>  }
>  
> diff --git a/src/gallium/drivers/nouveau/nouveau_screen.h 
> b/src/gallium/drivers/nouveau/nouveau_screen.h
> index e4fbae99ca..1229b66b26 100644
> --- a/src/gallium/drivers/nouveau/nouveau_screen.h
> +++ b/src/gallium/drivers/nouveau/nouveau_screen.h
> @@ -62,6 +62,8 @@ struct nouveau_screen {
>  
> struct disk_cache *disk_shader_cache;
>  
> +   bool prefer_nir;
> +
>  #ifdef NOUVEAU_ENABLE_DRIVER_STATISTICS
> union {
>uint64_t v[29];
> diff --git a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c 
> b/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
> index fb5668d726..35fe028039 100644
> --- a/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
> +++ b/src/gallium/drivers/nouveau/nvc0/nvc0_screen.c
> @@ -112,7 +112,8 @@ static int
>  nvc0_screen_get_param(struct pipe_screen *pscreen, enum pipe_cap param)
>  {
> const uint16_t class_3d = nouveau_screen(pscreen)->class_3d;
> -   struct nouveau_device *dev = nouveau_screen(pscreen)->device;
> +   const struct nouveau_screen *screen = nouveau_screen(pscreen);
> +   struct nouveau_device *dev = screen->device;
>  
> switch (param) {
> /* non-boolean caps */
> @@ -216,7 +217,6 @@ nvc0_screen_get_param(struct pipe_screen *pscreen, enum 
> pipe_cap param)
> case PIPE_CAP_USER_VERTEX_BUFFERS:
> case PIPE_CAP_TEXTURE_QUERY_LOD:
> case PIPE_CAP_SAMPLE_SHADING:
> -   case PIPE_CAP_TEXTURE_GATHER_OFFSETS:
> case PIPE_CAP_TEXTURE_GATHER_SM5:
> case PIPE_CAP_TGSI_FS_FINE_DERIVATIVE:
> case PIPE_CAP_CONDITIONAL_RENDER_INVERTED:
> @@ -256,6 +256,9 @@ nvc0_screen_get_param(struct pipe_screen *pscreen, enum 
> pipe_cap param)
> case PIPE_CAP_CAN_BIND_CONST_BUFFER_AS_VERTEX:
> case PIPE_CAP_ALLOW_MAPPED_BUFFERS_DURING_EXECUTION:
>return 1;
> +   case PIPE_CAP_TEXTURE_GATHER_OFFSETS:
> +  /* TODO: nir doesn't support tg4 with multiple offsets */
> +  return screen->prefer_nir ? 0 : 1;
> case PIPE_CAP_PREFER_BLIT_BASED_TEXTURE_TRANSFER:
>return nouveau_screen(pscreen)->vram_domain & NOUVEAU_BO_VRAM ? 1 : 0;
> case PIPE_CAP_TGSI_FS_FBFETCH:
> @@ -338,7 +341,8 @@ nvc0_screen_get_shader_param(struct pipe_screen *pscreen,
>   enum pipe_shader_type shader,
>   enum pipe_shader_cap param)
>  {
> -   const uint16_t class_3d = nouveau_screen(pscreen)->class_3d;
> +   const struct nouveau_screen *screen = nouveau_screen(pscreen);
> +   const uint16_t class_3d = screen->class_3d;
>  
> switch (shader) {
> case PIPE_SHADER_VERTEX:
> @@ -354,7 +358,7 @@ nvc0_screen_get_shader_param(struct pipe_screen *pscreen,
>  
> switch (param) {
> case PIPE_SHADER_CAP_PREFERRED_IR:
> -  return PIPE_SHADER_IR_TGSI;
> +  return screen->prefer_nir ? PIPE_SHADER_IR_NIR : PIPE_SHADER_IR_TGSI;
> case PIPE_SHADER_CAP_SUPPORTED_IRS:
>return 1 << PIPE_SHADER_IR_TGSI |
>   1 << PIPE_SHADER_IR_NIR;
> -- 
> 2.14.3
> 
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-dev


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
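
For reference, the env var check in the patch above boils down to
"set, and strtol() with base 0 yields exactly 1". Below is a minimal
standalone sketch of that rule. The helper name parse_prefer_nir is made up
for illustration and is not part of the patch; only the strtol() expression
is taken from it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical helper mirroring the expression from the patch:
 *    screen->prefer_nir = use_nir && strtol(use_nir, NULL, 0) == 1;
 * 'value' stands in for the result of getenv("NV50_PROG_USE_NIR").
 */
static bool
parse_prefer_nir(const char *value)
{
   /* An unset variable (NULL) never enables NIR. */
   if (value == NULL)
      return false;

   /* Base 0 lets strtol() accept decimal, octal ("01") and hex ("0x1").
    * Text that does not start with a number parses as 0, so "true" or ""
    * leave NIR disabled.
    */
   return strtol(value, NULL, 0) == 1;
}
```

Under this reading, only a value that parses to 1 selects NIR; "2", "yes"
or an empty string all fall back to TGSI.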


Re: [Mesa-dev] radv prep for removing tess specific user sgprs

2018-02-20 Thread Samuel Pitoiset



On 02/20/2018 02:25 AM, Dave Airlie wrote:
> These are just some cleanups that popped out of a series I was working
> on to remove all the tcs/tes user sgprs stuff.
>
> I've got the full patchset working on VI, just need to test on Vega now.

Would be nice to also double-check with a game that needs tess like F1,
and probably RoTR.

Assuming this doesn't regress anything on Vega, series is:

Reviewed-by: Samuel Pitoiset 

> Dave.



Re: [Mesa-dev] [PATCH v5 06/34] nvir/nir: add support for NIR on nvc0

2018-02-20 Thread Pierre Moreau
Acked-by: Pierre Moreau 

On 2018-02-20 — 22:02, Karol Herbst wrote:
> not all those nir options are actually required, it just made the work a
> little easier.
> 
> v2: fix asserts
> parse compute shaders
> don't lower bitfield_insert
> v3: fix memory leak
> v4: don't lower fmod32
> v5: set lower_all_io_to_temps to false
> fix memory leak because we take over ownership of the nir shader
> merge: use the lowering helper
> 
> Signed-off-by: Karol Herbst 
> ---
>  src/gallium/drivers/nouveau/Makefile.sources   |  1 +
>  src/gallium/drivers/nouveau/codegen/nv50_ir.cpp|  3 +
>  src/gallium/drivers/nouveau/codegen/nv50_ir.h  |  1 +
>  .../drivers/nouveau/codegen/nv50_ir_from_nir.cpp   | 74 
> ++
>  src/gallium/drivers/nouveau/meson.build| 10 +--
>  src/gallium/drivers/nouveau/nvc0/nvc0_program.c| 18 +-
>  src/gallium/drivers/nouveau/nvc0/nvc0_screen.c | 41 +++-
>  src/gallium/drivers/nouveau/nvc0/nvc0_state.c  | 27 +++-
>  8 files changed, 166 insertions(+), 9 deletions(-)
>  create mode 100644 src/gallium/drivers/nouveau/codegen/nv50_ir_from_nir.cpp
> 
> diff --git a/src/gallium/drivers/nouveau/Makefile.sources 
> b/src/gallium/drivers/nouveau/Makefile.sources
> index ec344c6316..c6a1aff711 100644
> --- a/src/gallium/drivers/nouveau/Makefile.sources
> +++ b/src/gallium/drivers/nouveau/Makefile.sources
> @@ -117,6 +117,7 @@ NV50_CODEGEN_SOURCES := \
>   codegen/nv50_ir_emit_nv50.cpp \
>   codegen/nv50_ir_from_common.cpp \
>   codegen/nv50_ir_from_common.h \
> + codegen/nv50_ir_from_nir.cpp \
>   codegen/nv50_ir_from_tgsi.cpp \
>   codegen/nv50_ir_graph.cpp \
>   codegen/nv50_ir_graph.h \
> diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir.cpp 
> b/src/gallium/drivers/nouveau/codegen/nv50_ir.cpp
> index 6f12df70a1..b95ba8e4e9 100644
> --- a/src/gallium/drivers/nouveau/codegen/nv50_ir.cpp
> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir.cpp
> @@ -1231,6 +1231,9 @@ nv50_ir_generate_code(struct nv50_ir_prog_info *info)
> prog->optLevel = info->optLevel;
>  
> switch (info->bin.sourceRep) {
> +   case PIPE_SHADER_IR_NIR:
> +  ret = prog->makeFromNIR(info) ? 0 : -2;
> +  break;
> case PIPE_SHADER_IR_TGSI:
>ret = prog->makeFromTGSI(info) ? 0 : -2;
>break;
> diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir.h 
> b/src/gallium/drivers/nouveau/codegen/nv50_ir.h
> index f4f3c70888..e5b4592a61 100644
> --- a/src/gallium/drivers/nouveau/codegen/nv50_ir.h
> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir.h
> @@ -1255,6 +1255,7 @@ public:
> inline void del(Function *fn, int& id) { allFuncs.remove(id); }
> inline void add(Value *rval, int& id) { allRValues.insert(rval, id); }
>  
> +   bool makeFromNIR(struct nv50_ir_prog_info *);
> bool makeFromTGSI(struct nv50_ir_prog_info *);
> bool convertToSSA();
> bool optimizeSSA(int level);
> diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_from_nir.cpp 
> b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_nir.cpp
> new file mode 100644
> index 00..73527d4800
> --- /dev/null
> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_from_nir.cpp
> @@ -0,0 +1,74 @@
> +/*
> + * Copyright 2017 Red Hat Inc.
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
> + * OTHER DEALINGS IN THE SOFTWARE.
> + *
> + * Authors: Karol Herbst 
> + */
> +
> +#include "compiler/nir/nir.h"
> +
> +#include "codegen/nv50_ir.h"
> +#include "codegen/nv50_ir_from_common.h"
> +#include "codegen/nv50_ir_lowering_helper.h"
> +#include "codegen/nv50_ir_util.h"
> +
> +namespace {
> +
> +using namespace nv50_ir;
> +
> +class Converter : public ConverterCommon
> +{
> +public:
> +   Converter(Program *, nir_shader *, nv50_ir_prog_info *);
> +
> +   bool run();
> +private:
> +   nir_shader *nir;
> +};
> +
> +Converter::Converter(Program 

[Mesa-dev] [Bug 105179] DiRT Rally: wrong frames appear during camera transition

2018-02-20 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=105179

--- Comment #1 from Ilia Mirkin  ---
(In reply to Gregor Münch from comment #0)
> https://lists.freedesktop.org/archives/mesa-dev/2018-February/185134.html
> 
> "except I got a few ltc errors in DiRT
> Rally. Unclear if it's related to this patch though. Could be a missing
> flush somewhere."
> 
> ...I think it is a general Mesa issue, unless I'm wrong here.

This is a nouveau-specific problem, in fact, Maxwell+ only. LTC = level-two
cache. Something specific to NVIDIA hw (other GPUs have it too, but it's always
different and controlled by the hw or driver directly - not something that's
API-accessible, or even generic across drivers).

Also note that the errors only happened when I force-enabled bindless textures
in DiRT Rally (via a config option), which is the feature that the patch was
implementing. The "missing flush" is in reference to something missing inside
of nouveau wrt texture management. There's ample evidence that there's
something off somewhere in there.

In any case, those ills are in no way related to radeonsi or any other driver.

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.


Re: [Mesa-dev] [PATCH v5 05/34] nvir: add lowering helper

2018-02-20 Thread Pierre Moreau
With the variables “dt”, “st” and “std” renamed to match existing code, this
patch is

Reviewed-by: Pierre Moreau 

On 2018-02-20 — 22:02, Karol Herbst wrote:
> this is mostly useful for lazy IR converters not wanting to deal with 64 bit
> lowering and other illegal stuff
> 
> v5: also handle SAT
> 
> Signed-off-by: Karol Herbst 
> ---
>  src/gallium/drivers/nouveau/Makefile.sources   |   2 +
>  .../nouveau/codegen/nv50_ir_lowering_helper.cpp| 267 
> +
>  .../nouveau/codegen/nv50_ir_lowering_helper.h  |  53 
>  src/gallium/drivers/nouveau/meson.build|   2 +
>  4 files changed, 324 insertions(+)
>  create mode 100644 
> src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_helper.cpp
>  create mode 100644 
> src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_helper.h
> 
> diff --git a/src/gallium/drivers/nouveau/Makefile.sources 
> b/src/gallium/drivers/nouveau/Makefile.sources
> index fee5e59522..ec344c6316 100644
> --- a/src/gallium/drivers/nouveau/Makefile.sources
> +++ b/src/gallium/drivers/nouveau/Makefile.sources
> @@ -122,6 +122,8 @@ NV50_CODEGEN_SOURCES := \
>   codegen/nv50_ir_graph.h \
>   codegen/nv50_ir.h \
>   codegen/nv50_ir_inlines.h \
> + codegen/nv50_ir_lowering_helper.cpp \
> + codegen/nv50_ir_lowering_helper.h \
>   codegen/nv50_ir_lowering_nv50.cpp \
>   codegen/nv50_ir_peephole.cpp \
>   codegen/nv50_ir_print.cpp \
> diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_helper.cpp 
> b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_helper.cpp
> new file mode 100644
> index 00..680c6d4f38
> --- /dev/null
> +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_lowering_helper.cpp
> @@ -0,0 +1,267 @@
> +/*
> + * Copyright 2018 Red Hat Inc.
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
> + * OTHER DEALINGS IN THE SOFTWARE.
> + *
> + * Authors: Karol Herbst 
> + */
> +
> +#include "codegen/nv50_ir_lowering_helper.h"
> +
> +namespace nv50_ir {
> +
> +bool
> +LoweringHelper::visit(Instruction *insn)
> +{
> +   switch (insn->op) {
> +   case OP_ABS:
> +  return handleABS(insn);
> +   case OP_CVT:
> +  return handleCVT(insn);
> +   case OP_MAX:
> +   case OP_MIN:
> +  return handleMAXMIN(insn);
> +   case OP_MOV:
> +  return handleMOV(insn);
> +   case OP_NEG:
> +  return handleNEG(insn);
> +   case OP_SAT:
> +  return handleSAT(insn);
> +   case OP_SLCT:
> +  return handleSLCT(insn->asCmp());
> +   case OP_AND:
> +   case OP_OR:
> +   case OP_XOR:
> +  return handleLogOp(insn);
> +   default:
> +  return true;
> +   }
> +}
> +
> +bool
> +LoweringHelper::handleABS(Instruction *insn)
> +{
> +   DataType dt = insn->dType;
> +   if (!(dt == TYPE_U64 || dt == TYPE_S64))
> +  return true;
> +
> +   bld.setPosition(insn, false);
> +
> +   Value *neg = bld.getSSA(8);
> +   Value *negComp[2], *srcComp[2];
> +   Value *lo = bld.getSSA(), *hi = bld.getSSA();
> +   bld.mkOp2(OP_SUB, dt, neg, bld.mkImm((uint64_t)0), insn->getSrc(0));
> +   bld.mkSplit(negComp, 4, neg);
> +   bld.mkSplit(srcComp, 4, insn->getSrc(0));
> +   bld.mkCmp(OP_SLCT, CC_LT, TYPE_S32, lo, TYPE_S32, negComp[0], srcComp[0], 
> srcComp[1]);
> +   bld.mkCmp(OP_SLCT, CC_LT, TYPE_S32, hi, TYPE_S32, negComp[1], srcComp[1], 
> srcComp[1]);
> +   insn->op = OP_MERGE;
> +   insn->setSrc(0, lo);
> +   insn->setSrc(1, hi);
> +
> +   return true;
> +}
> +
> +bool
> +LoweringHelper::handleCVT(Instruction *insn)
> +{
> +   DataType dt = insn->dType;
> +   DataType st = insn->sType;
> +
> +   if (typeSizeof(dt) <= 4 && typeSizeof(st) <= 4)
> +  return true;
> +
> +   bld.setPosition(insn, false);
> +
> +   if ((dt == TYPE_S32 && st == TYPE_S64) ||
> +   (dt == TYPE_U32 && st == TYPE_U64)) {
> +  Value *src[2];
> +  bld.mkSplit(src, 4, insn->getSrc(0));
> +  insn->op = OP_MOV;
> +  insn->setSrc(0, 
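
For readers following the 64-bit OP_ABS lowering in the quoted handleABS():
here is a minimal C sketch of the same idea on plain integers. It assumes
that OP_SLCT with CC_LT picks its first source when the third source is
negative, which is how the quoted code appears to use it. The function name
abs64_via_halves is made up for illustration and does not exist in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative only: compute |src| for a 64-bit two's complement value
 * using one 64-bit subtract plus 32-bit selects, as handleABS() does.
 */
static uint64_t
abs64_via_halves(uint64_t src)
{
   /* neg = 0 - src, matching the OP_SUB emitted by the helper. */
   uint64_t neg = UINT64_C(0) - src;

   /* Split both values into 32-bit halves (mkSplit in the patch). */
   uint32_t neg_lo = (uint32_t)neg, neg_hi = (uint32_t)(neg >> 32);
   uint32_t src_lo = (uint32_t)src, src_hi = (uint32_t)(src >> 32);

   /* The sign of the whole value lives in the top bit of the high word,
    * so both selects test the high word of the source.
    */
   bool negative = (src_hi >> 31) != 0;
   uint32_t lo = negative ? neg_lo : src_lo;
   uint32_t hi = negative ? neg_hi : src_hi;

   /* Recombine the halves (OP_MERGE in the patch). */
   return ((uint64_t)hi << 32) | lo;
}
```

As with any two's complement abs, the most negative value maps to itself in
this sketch.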

Re: [Mesa-dev] [PATCH] anv: Add WSI support for the I915_FORMAT_MOD_Y_TILED_CCS

2018-02-20 Thread Nanley Chery
On Tue, Feb 20, 2018 at 11:31:08AM -0800, Jason Ekstrand wrote:
> On Tue, Feb 20, 2018 at 11:26 AM, Jason Ekstrand 
> wrote:
> 
> > On Tue, Feb 20, 2018 at 11:25 AM, Nanley Chery 
> > wrote:
> >
> >> On Fri, Feb 16, 2018 at 09:28:43AM -0800, Jason Ekstrand wrote:
> >> > ---
> >> >  src/intel/vulkan/anv_formats.c |  9 +++
> >> >  src/intel/vulkan/anv_image.c   | 53 ++
> >> 
> >> >  2 files changed, 42 insertions(+), 20 deletions(-)
> >> >
> >> > diff --git a/src/intel/vulkan/anv_formats.c
> >> b/src/intel/vulkan/anv_formats.c
> >> > index 9c52ad5..3c17366 100644
> >> > --- a/src/intel/vulkan/anv_formats.c
> >> > +++ b/src/intel/vulkan/anv_formats.c
> >> > @@ -671,9 +671,18 @@ get_wsi_format_modifier_properties_list(const
> >> struct anv_physical_device *physic
> >> >DRM_FORMAT_MOD_LINEAR,
> >> >I915_FORMAT_MOD_X_TILED,
> >> >I915_FORMAT_MOD_Y_TILED,
> >> > +  I915_FORMAT_MOD_Y_TILED_CCS,
> >> > };
> >> >
> >> > for (uint32_t i = 0; i < ARRAY_SIZE(modifiers); i++) {
> >> > +  const struct isl_drm_modifier_info *mod_info =
> >> > + isl_drm_modifier_get_info(modifiers[i]);
> >> > +
> >> > +  if (mod_info->aux_usage == ISL_AUX_USAGE_CCS_E &&
> >> > +  !isl_format_supports_ccs_e(&physical_device->info,
> >> > + anv_format->planes[0].isl_format))
> >> > + continue;
> >> > +
> >> >vk_outarray_append(, mod_props) {
> >> >   mod_props->modifier = modifiers[i];
> >> >   if (isl_drm_modifier_has_aux(modifiers[i]))
> >> > diff --git a/src/intel/vulkan/anv_image.c b/src/intel/vulkan/anv_image.c
> >> > index a2bae7b..d7c2e55 100644
> >> > --- a/src/intel/vulkan/anv_image.c
> >> > +++ b/src/intel/vulkan/anv_image.c
> >> > @@ -515,6 +515,7 @@ score_drm_format_mod(uint64_t modifier)
> >> > case DRM_FORMAT_MOD_LINEAR: return 1;
> >> > case I915_FORMAT_MOD_X_TILED: return 2;
> >> > case I915_FORMAT_MOD_Y_TILED: return 3;
> >> > +   case I915_FORMAT_MOD_Y_TILED_CCS: return 4;
> >> > default: unreachable("bad DRM format modifier");
> >> > }
> >> >  }
> >> > @@ -746,8 +747,13 @@ void anv_GetImageSubresourceLayout(
> >> >  VkSubresourceLayout*layout)
> >> >  {
> >> > ANV_FROM_HANDLE(anv_image, image, _image);
> >> > -   const struct anv_surface *surface =
> >> > -  get_surface(image, subresource->aspectMask);
> >> > +
> >> > +   const struct anv_surface *surface;
> >> > +   if (subresource->aspectMask == VK_IMAGE_ASPECT_PLANE_1_BIT_KHR &&
> >> > +   isl_drm_modifier_has_aux(image->drm_format_mod))
> >> > +  surface = &image->planes[0].aux_surface;
> >> > +   else
> >> > +  surface = get_surface(image, subresource->aspectMask);
> >> >
> >> > assert(__builtin_popcount(subresource->aspectMask) == 1);
> >> >
> >> > @@ -862,25 +868,20 @@ anv_layout_to_aux_usage(const struct
> >> gen_device_info * const devinfo,
> >> >}
> >> >
> >> >
> >> > -   case VK_IMAGE_LAYOUT_PRESENT_SRC_KHR:
> >> > +   case VK_IMAGE_LAYOUT_PRESENT_SRC_KHR: {
> >> >assert(image->aspects == VK_IMAGE_ASPECT_COLOR_BIT);
> >> >
> >> > -  /* On SKL+, the render buffer can be decompressed by the
> >> presentation
> >> > -   * engine. Support for this feature has not yet landed in the
> >> wider
> >> > -   * ecosystem. TODO: Update this code when support lands.
> >> > -   *
> >> > -   * From the BDW PRM, Vol 7, Render Target Resolve:
> >> > -   *
> >> > -   *If the MCS is enabled on a non-multisampled render target,
> >> the
> >> > -   *render target must be resolved before being used for other
> >> > -   *purposes (display, texture, CPU lock) The clear value from
> >> > -   *SURFACE_STATE is written into pixels in the render target
> >> > -   *indicated as clear in the MCS.
> >> > -   *
> >> > -   * Pre-SKL, the render buffer must be resolved before being used
> >> for
> >> > -   * presentation. We can infer that the auxiliary buffer is not
> >> used.
> >> > +  /* When handing the image off to the presentation engine, we
> >> need to
> >> > +   * ensure that things are properly resolved.  For images with no
> >> > +   * modifier, we assume that they follow the old rules and always
> >> need
> >> > +   * a full resolve because the PE doesn't understand any form of
> >> > +   * compression.  For images with modifiers, we use the aux usage
> >> from
> >> > +   * the modifier.
> >> > */
> >> > -  return ISL_AUX_USAGE_NONE;
> >> > +  const struct isl_drm_modifier_info *mod_info =
> >> > + isl_drm_modifier_get_info(image->drm_format_mod);
> >> > +  return mod_info ? mod_info->aux_usage : ISL_AUX_USAGE_NONE;
> >> > +   }
> >> >
> >> >
> >> > /* Rendering Layouts */
> >> > @@ -960,8 +961,20 @@ anv_layout_to_fast_clear_type(const struct
> >> gen_device_info 

[Mesa-dev] [PATCH 2/5] anv/cmd_buffer: Use layout_to_* helpers in compute_aux_usage

2018-02-20 Thread Jason Ekstrand
---
 src/intel/vulkan/genX_cmd_buffer.c | 53 +-
 1 file changed, 35 insertions(+), 18 deletions(-)

diff --git a/src/intel/vulkan/genX_cmd_buffer.c 
b/src/intel/vulkan/genX_cmd_buffer.c
index 5c36fc7..8bd824b 100644
--- a/src/intel/vulkan/genX_cmd_buffer.c
+++ b/src/intel/vulkan/genX_cmd_buffer.c
@@ -223,16 +223,27 @@ color_attachment_compute_aux_usage(struct anv_device * 
device,
   att_state->input_aux_usage = ISL_AUX_USAGE_NONE;
   att_state->fast_clear = false;
   return;
-   } else if (iview->image->planes[0].aux_usage == ISL_AUX_USAGE_MCS) {
-  att_state->aux_usage = ISL_AUX_USAGE_MCS;
+   }
+
+   att_state->aux_usage =
+  anv_layout_to_aux_usage(&device->info, iview->image,
+  VK_IMAGE_ASPECT_COLOR_BIT,
+  VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL);
+
+   /* If we don't have aux, then we should have returned early in the layer
+* check above.  If we got here, we must have something.
+*/
+   assert(att_state->aux_usage != ISL_AUX_USAGE_NONE);
+
+   if (att_state->aux_usage == ISL_AUX_USAGE_MCS) {
   att_state->input_aux_usage = ISL_AUX_USAGE_MCS;
   att_state->fast_clear = false;
   return;
-   } else if (iview->image->planes[0].aux_usage == ISL_AUX_USAGE_CCS_E) {
-  att_state->aux_usage = ISL_AUX_USAGE_CCS_E;
+   }
+
+   if (att_state->aux_usage == ISL_AUX_USAGE_CCS_E) {
   att_state->input_aux_usage = ISL_AUX_USAGE_CCS_E;
} else {
-  att_state->aux_usage = ISL_AUX_USAGE_CCS_D;
   /* From the Sky Lake PRM, RENDER_SURFACE_STATE::AuxiliarySurfaceMode:
*
*"If Number of Multisamples is MULTISAMPLECOUNT_1, AUX_CCS_D
@@ -286,8 +297,25 @@ color_attachment_compute_aux_usage(struct anv_device * 
device,
   isl_color_value_is_zero(clear_color, iview->planes[0].isl.format);
 
if (att_state->pending_clear_aspects == VK_IMAGE_ASPECT_COLOR_BIT) {
-  /* Start off assuming fast clears are possible */
-  att_state->fast_clear = true;
+  /* Start by getting the fast clear type.  We use the first subpass
+   * layout here because we don't want to fast-clear if the first subpass
+   * to use the attachment can't handle fast-clears.
+   */
+  enum anv_fast_clear_type fast_clear_type =
+ anv_layout_to_fast_clear_type(&device->info, iview->image,
+   VK_IMAGE_ASPECT_COLOR_BIT,
+   
cmd_state->pass->attachments[att].first_subpass_layout);
+  switch (fast_clear_type) {
+  case ANV_FAST_CLEAR_NONE:
+ att_state->fast_clear = false;
+ break;
+  case ANV_FAST_CLEAR_DEFAULT_VALUE:
+ att_state->fast_clear = att_state->clear_color_is_zero;
+ break;
+  case ANV_FAST_CLEAR_ANY:
+ att_state->fast_clear = true;
+ break;
+  }
 
   /* Potentially, we could do partial fast-clears but doing so has crazy
* alignment restrictions.  It's easier to just restrict to full size
@@ -303,17 +331,6 @@ color_attachment_compute_aux_usage(struct anv_device * 
device,
   if (GEN_GEN <= 8 && !att_state->clear_color_is_zero_one)
  att_state->fast_clear = false;
 
-  /* We only allow fast clears in the GENERAL layout if the auxiliary
-   * buffer is always enabled and the fast-clear value is all 0's. See
-   * add_aux_state_tracking_buffer() for more information.
-   */
-  if (cmd_state->pass->attachments[att].first_subpass_layout ==
-  VK_IMAGE_LAYOUT_GENERAL &&
-  (!att_state->clear_color_is_zero ||
-   iview->image->planes[0].aux_usage == ISL_AUX_USAGE_NONE)) {
- att_state->fast_clear = false;
-  }
-
   /* We only allow fast clears to the first slice of an image (level 0,
* layer 0) and only for the entire slice.  This guarantees us that, at
* any given time, there is only one clear color on any given image at
-- 
2.5.0.400.gff86faf
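
The new switch on the fast clear type in the patch above can be read as a
small decision table. The sketch below restates only that first step in
standalone C. The enum and function names are stand-ins chosen for this
example, not the driver's own definitions, and the later restrictions in the
function (render area, gen-specific clear color rules, first slice only) are
left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for enum anv_fast_clear_type; values are illustrative. */
enum fast_clear_type {
   FAST_CLEAR_NONE,          /* layout cannot take fast clears at all */
   FAST_CLEAR_DEFAULT_VALUE, /* only the default (all-zero) clear color */
   FAST_CLEAR_ANY,           /* any clear color is acceptable */
};

/* Initial fast-clear decision before the remaining checks are applied. */
static bool
initial_fast_clear(enum fast_clear_type type, bool clear_color_is_zero)
{
   switch (type) {
   case FAST_CLEAR_NONE:
      return false;
   case FAST_CLEAR_DEFAULT_VALUE:
      return clear_color_is_zero;
   case FAST_CLEAR_ANY:
      return true;
   }
   return false; /* not reached for valid enum values */
}
```

This appears to be what lets the patch drop the explicit
VK_IMAGE_LAYOUT_GENERAL check: the layout-dependent part now comes from the
fast clear type helper.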
