[Mesa-dev] [PATCH] panfrost/midgard: Implement fpow
We have a native op for this, which was just found in a disassembly -- so instead of lowering, use it! Signed-off-by: Alyssa Rosenzweig --- src/gallium/drivers/panfrost/midgard/helpers.h | 1 + src/gallium/drivers/panfrost/midgard/midgard.h | 2 ++ src/gallium/drivers/panfrost/midgard/midgard_compile.c | 1 + src/gallium/drivers/panfrost/midgard/midgard_compile.h | 1 - 4 files changed, 4 insertions(+), 1 deletion(-) diff --git a/src/gallium/drivers/panfrost/midgard/helpers.h b/src/gallium/drivers/panfrost/midgard/helpers.h index b597e516f20..606eb9982e7 100644 --- a/src/gallium/drivers/panfrost/midgard/helpers.h +++ b/src/gallium/drivers/panfrost/midgard/helpers.h @@ -228,6 +228,7 @@ static unsigned alu_opcode_props[256] = { [midgard_alu_op_frcp] = UNIT_VLUT, [midgard_alu_op_frsqrt] = UNIT_VLUT, [midgard_alu_op_fsqrt] = UNIT_VLUT, +[midgard_alu_op_fpow] = UNIT_VLUT, [midgard_alu_op_fexp2] = UNIT_VLUT, [midgard_alu_op_flog2] = UNIT_VLUT, diff --git a/src/gallium/drivers/panfrost/midgard/midgard.h b/src/gallium/drivers/panfrost/midgard/midgard.h index f6b83d30f5d..39b1df5d915 100644 --- a/src/gallium/drivers/panfrost/midgard/midgard.h +++ b/src/gallium/drivers/panfrost/midgard/midgard.h @@ -108,6 +108,7 @@ typedef enum { midgard_alu_op_fcsel = 0xC5, midgard_alu_op_fround = 0xC6, midgard_alu_op_fatan_pt2 = 0xE8, +midgard_alu_op_fpow = 0xEC, midgard_alu_op_frcp = 0xF0, midgard_alu_op_frsqrt = 0xF2, midgard_alu_op_fsqrt = 0xF3, @@ -462,6 +463,7 @@ static char *alu_opcode_names[256] = { [midgard_alu_op_frcp] = "frcp", [midgard_alu_op_frsqrt] = "frsqrt", [midgard_alu_op_fsqrt] = "fsqrt", +[midgard_alu_op_fpow] = "fpow", [midgard_alu_op_fexp2] = "fexp2", [midgard_alu_op_flog2] = "flog2", [midgard_alu_op_fsin] = "fsin", diff --git a/src/gallium/drivers/panfrost/midgard/midgard_compile.c b/src/gallium/drivers/panfrost/midgard/midgard_compile.c index a6c4b4498da..7f3b6997ca0 100644 --- a/src/gallium/drivers/panfrost/midgard/midgard_compile.c +++ b/src/gallium/drivers/panfrost/midgard/midgard_compile.c @@ -990,6 +990,7 @@ emit_alu(compiler_context *ctx, nir_alu_instr *instr) ALU_CASE(frcp, frcp); ALU_CASE(frsq, frsqrt); ALU_CASE(fsqrt, fsqrt); +ALU_CASE(fpow, fpow); ALU_CASE(fexp2, fexp2); ALU_CASE(flog2, flog2); diff --git a/src/gallium/drivers/panfrost/midgard/midgard_compile.h b/src/gallium/drivers/panfrost/midgard/midgard_compile.h index 2fcbc317164..6225d041c07 100644 --- a/src/gallium/drivers/panfrost/midgard/midgard_compile.h +++ b/src/gallium/drivers/panfrost/midgard/midgard_compile.h @@ -62,7 +62,6 @@ midgard_compile_shader_nir(nir_shader *nir, midgard_program *program, bool is_bl static const nir_shader_compiler_options midgard_nir_options = { .lower_ffma = true, .lower_sub = true, -.lower_fpow = true, .lower_scmp = true, .lower_flrp32 = true, .lower_flrp64 = true, -- 2.20.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH v2 9/9] anv: Enabled the VK_EXT_sample_locations extension
Reviewed-by: Sagar Ghuge On 3/12/19 3:34 AM, Eleni Maria Stea wrote: > Enabled the VK_EXT_sample_locations for Intel Gen >= 7. > > v2: Replaced device.info->gen >= 7 with True, as Anv doesn't support > anything below Gen7. (Lionel Landwerlin) > --- > src/intel/vulkan/anv_extensions.py | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/src/intel/vulkan/anv_extensions.py > b/src/intel/vulkan/anv_extensions.py > index 9e4e03e46df..5a30c733c5c 100644 > --- a/src/intel/vulkan/anv_extensions.py > +++ b/src/intel/vulkan/anv_extensions.py > @@ -129,7 +129,7 @@ EXTENSIONS = [ > Extension('VK_EXT_inline_uniform_block', 1, True), > Extension('VK_EXT_pci_bus_info', 2, True), > Extension('VK_EXT_post_depth_coverage', 1, > 'device->info.gen >= 9'), > -Extension('VK_EXT_sample_locations', 1, False), > +Extension('VK_EXT_sample_locations', 1, True), > Extension('VK_EXT_sampler_filter_minmax', 1, > 'device->info.gen >= 9'), > Extension('VK_EXT_scalar_block_layout', 1, True), > Extension('VK_EXT_shader_stencil_export', 1, > 'device->info.gen >= 9'), > ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH v2 1/9] anv: Added the VK_EXT_sample_locations extension to the anv_extensions list
Reviewed-by: Sagar Ghuge On 3/12/19 3:34 AM, Eleni Maria Stea wrote: > Added the VK_EXT_sample_locations to the anv_extensions.py list to > generate the related entrypoints. > --- > src/intel/vulkan/anv_extensions.py | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/src/intel/vulkan/anv_extensions.py > b/src/intel/vulkan/anv_extensions.py > index 6fff293dee4..9e4e03e46df 100644 > --- a/src/intel/vulkan/anv_extensions.py > +++ b/src/intel/vulkan/anv_extensions.py > @@ -129,6 +129,7 @@ EXTENSIONS = [ > Extension('VK_EXT_inline_uniform_block', 1, True), > Extension('VK_EXT_pci_bus_info', 2, True), > Extension('VK_EXT_post_depth_coverage', 1, > 'device->info.gen >= 9'), > +Extension('VK_EXT_sample_locations', 1, False), > Extension('VK_EXT_sampler_filter_minmax', 1, > 'device->info.gen >= 9'), > Extension('VK_EXT_scalar_block_layout', 1, True), > Extension('VK_EXT_shader_stencil_export', 1, > 'device->info.gen >= 9'), > ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] ac/nir_to_llvm: dont add unrequired swizzle to bcsel src
NIR can produce IR that looks like this: vec1 1 ssa_51 = ilt ssa_32, ssa_50 vec4 32 ssa_54 = intrinsic load_deref (ssa_53) (0) vec4 32 ssa_57 = intrinsic load_deref (ssa_56) (0) vec4 32 ssa_61 = bcsel ssa_51., ssa_54, ssa_57 The swizzle on the first bcsel src causes a crash in llvm as it doesn't seem to expect it. Here we add a special case for the first bcsel src. --- src/amd/common/ac_nir_to_llvm.c | 13 ++--- 1 file changed, 10 insertions(+), 3 deletions(-) diff --git a/src/amd/common/ac_nir_to_llvm.c b/src/amd/common/ac_nir_to_llvm.c index 5fb5c8da609..cebf248427c 100644 --- a/src/amd/common/ac_nir_to_llvm.c +++ b/src/amd/common/ac_nir_to_llvm.c @@ -154,13 +154,18 @@ static LLVMBasicBlockRef get_block(struct ac_nir_context *nir, static LLVMValueRef get_alu_src(struct ac_nir_context *ctx, nir_alu_src src, -unsigned num_components) +unsigned num_components, +bool is_first_bcsel_src) { LLVMValueRef value = get_src(ctx, src.src); bool need_swizzle = false; assert(value); unsigned src_components = ac_get_llvm_num_components(value); + + if (is_first_bcsel_src) + num_components = 1; + for (unsigned i = 0; i < num_components; ++i) { assert(src.swizzle[i] < src_components); if (src.swizzle[i] != i) @@ -584,8 +589,10 @@ static void visit_alu(struct ac_nir_context *ctx, const nir_alu_instr *instr) src_components = num_components; break; } - for (unsigned i = 0; i < nir_op_infos[instr->op].num_inputs; i++) - src[i] = get_alu_src(ctx, instr->src[i], src_components); + for (unsigned i = 0; i < nir_op_infos[instr->op].num_inputs; i++) { + src[i] = get_alu_src(ctx, instr->src[i], src_components, + (instr->op == nir_op_b32csel && i == 0)); + } switch (instr->op) { case nir_op_fmov: -- 2.20.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] anv/pass: Flag the need for a depth flush for resolve attachments
Cc: mesa-sta...@lists.freedesktop.org Cc: Nanley Chery --- src/intel/vulkan/anv_pass.c | 18 +- 1 file changed, 17 insertions(+), 1 deletion(-) diff --git a/src/intel/vulkan/anv_pass.c b/src/intel/vulkan/anv_pass.c index 5fac5bbb31c..ec217abfda0 100644 --- a/src/intel/vulkan/anv_pass.c +++ b/src/intel/vulkan/anv_pass.c @@ -178,12 +178,28 @@ anv_render_pass_compile(struct anv_render_pass *pass) * subpasses and checking to see if any of them don't have an external * dependency. Or, we could just be lazy and add a couple extra flushes. * We choose to be lazy. +* +* From the documentation for vkCmdNextSubpass: +* +*"Moving to the next subpass automatically performs any multisample +*resolve operations in the subpass being ended. End-of-subpass +*multisample resolves are treated as color attachment writes for the +*purposes of synchronization. This applies to resolve operations for +*both color and depth/stencil attachments. That is, they are +*considered to execute in the +*VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT pipeline stage and +*their writes are synchronized with +*VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT." +* +* Therefore, the above flags concerning color attachments also apply to +* color and depth/stencil resolve attachments. */ if (all_usage & VK_IMAGE_USAGE_INPUT_ATTACHMENT_BIT) { pass->subpass_flushes[0] |= ANV_PIPE_TEXTURE_CACHE_INVALIDATE_BIT; } - if (all_usage & VK_IMAGE_USAGE_COLOR_ATTACHMENT_BIT) { + if (all_usage & (VK_IMAGE_USAGE_COLOR_ATTACHMENT_BIT | +VK_IMAGE_USAGE_TRANSFER_DST_BIT)) { pass->subpass_flushes[pass->subpass_count] |= ANV_PIPE_RENDER_TARGET_CACHE_FLUSH_BIT; } -- 2.20.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 1/8] mesa: Implement helper functions to map and unmap a VAO.
The series looks great. Just a few minor suggestions below. Reviewed-by: Brian Paul BTW, I happened to look at _mesa_primitive_restart_index(). It seems we could/should add an assertion in there: assert(index_size == 1 || index_size == 2 || index_size == 4); right? -Brian On 03/05/2019 01:13 AM, mathias.froehl...@gmx.net wrote: From: Mathias Fröhlich Provide a set of functions that maps or unmaps all VBOs held in a VAO. The functions will be used in the following patches. Signed-off-by: Mathias Fröhlich --- src/mesa/main/arrayobj.c | 84 src/mesa/main/arrayobj.h | 18 + 2 files changed, 102 insertions(+) diff --git a/src/mesa/main/arrayobj.c b/src/mesa/main/arrayobj.c index 68d30aa9b1f..1f6c6904739 100644 --- a/src/mesa/main/arrayobj.c +++ b/src/mesa/main/arrayobj.c @@ -913,6 +913,90 @@ _mesa_all_buffers_are_unmapped(const struct gl_vertex_array_object *vao) return true; } + +/** + * Map buffer objects used in attribute arrays. + */ +void +_mesa_vao_map_arrays(struct gl_context *ctx, struct gl_vertex_array_object *vao, + GLbitfield access) +{ + GLbitfield mask = vao->Enabled & vao->VertexAttribBufferMask; + while (mask) { + /* Do not use u_bit_scan as we can walk multiple attrib arrays at once */ + const gl_vert_attrib attr = ffs(mask) - 1; + const GLubyte bindex = vao->VertexAttrib[attr].BufferBindingIndex; + struct gl_vertex_buffer_binding *binding = >BufferBinding[bindex]; + mask &= ~binding->_BoundArrays; + + struct gl_buffer_object *bo = binding->BufferObj; + assert(_mesa_is_bufferobj(bo)); + if (_mesa_bufferobj_mapped(bo, MAP_INTERNAL)) + continue; + + ctx->Driver.MapBufferRange(ctx, 0, bo->Size, access, bo, MAP_INTERNAL); + } +} + + +/** + * Map buffer objects used in the vao, + * attribute arrays and index buffer. Combine onto one line. + */ +void +_mesa_vao_map(struct gl_context *ctx, struct gl_vertex_array_object *vao, + GLbitfield access) +{ A comment might be hlpeful here, like /* map the index buffer, if there is one, and not already mapped */ + struct gl_buffer_object *bo = vao->IndexBufferObj; + + if (_mesa_is_bufferobj(bo) && !_mesa_bufferobj_mapped(bo, MAP_INTERNAL)) + ctx->Driver.MapBufferRange(ctx, 0, bo->Size, access, bo, MAP_INTERNAL); + + _mesa_vao_map_arrays(ctx, vao, access); +} + + +/** + * Unmap buffer objects used in attribute arrays. + */ +void +_mesa_vao_unmap_arrays(struct gl_context *ctx, + struct gl_vertex_array_object *vao) +{ + GLbitfield mask = vao->Enabled & vao->VertexAttribBufferMask; + while (mask) { + /* Do not use u_bit_scan as we can walk multiple attrib arrays at once */ + const gl_vert_attrib attr = ffs(mask) - 1; + const GLubyte bindex = vao->VertexAttrib[attr].BufferBindingIndex; + struct gl_vertex_buffer_binding *binding = >BufferBinding[bindex]; + mask &= ~binding->_BoundArrays; + + struct gl_buffer_object *bo = binding->BufferObj; + assert(_mesa_is_bufferobj(bo)); + if (!_mesa_bufferobj_mapped(bo, MAP_INTERNAL)) + continue; + + ctx->Driver.UnmapBuffer(ctx, bo, MAP_INTERNAL); + } +} + + +/** + * Unmap buffer objects used in the vao, + * attribute arrays and index buffer. combine onto one line. + */ +void +_mesa_vao_unmap(struct gl_context *ctx, struct gl_vertex_array_object *vao) +{ + struct gl_buffer_object *bo = vao->IndexBufferObj; + + if (_mesa_is_bufferobj(bo) && _mesa_bufferobj_mapped(bo, MAP_INTERNAL)) + ctx->Driver.UnmapBuffer(ctx, bo, MAP_INTERNAL); + + _mesa_vao_unmap_arrays(ctx, vao); +} + + /**/ /* API Functions */ /**/ diff --git a/src/mesa/main/arrayobj.h b/src/mesa/main/arrayobj.h index ee87b4b6ba5..7516bae9e39 100644 --- a/src/mesa/main/arrayobj.h +++ b/src/mesa/main/arrayobj.h @@ -100,6 +100,24 @@ extern bool _mesa_all_buffers_are_unmapped(const struct gl_vertex_array_object *vao); +extern void +_mesa_vao_map_arrays(struct gl_context *ctx, struct gl_vertex_array_object *vao, + GLbitfield access); + +extern void +_mesa_vao_map(struct gl_context *ctx, struct gl_vertex_array_object *vao, + GLbitfield access); + + +extern void +_mesa_vao_unmap_arrays(struct gl_context *ctx, + struct gl_vertex_array_object *vao); + +extern void +_mesa_vao_unmap(struct gl_context *ctx, +struct gl_vertex_array_object *vao); + + /** * Array to apply the position/generic0 aliasing map to * an attribute value used in vertex processing inputs to an attribute ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org
[Mesa-dev] [AppVeyor] mesa master #10389 completed
Build mesa 10389 completed Commit 3570d15b6d by Kenneth Graunke on 3/12/2019 2:00 AM: intel/fs: Fix opt_peephole_csel to not throw away saturates.\n\nWe were not copying the saturate bit from the original instruction\nto the new replacement instruction. This caused major misrendering\nin DiRT Rally on iris, where comparisons leading to discards failed\ndue to the missing saturate, causing lots of extra garbage pixels to\nbe drawn in text rendering, trees, and so on.\n\nThis did not show up on i965 because st/nir performs a more aggressive\nversion of nir_opt_peephole_select, yielding more b32csel operations.\n\nFixes: 52c7df1643e i965/fs: Merge CMP and SEL into CSEL on Gen8+\n\nReviewed-by: Lionel Landwerlin \nReviewed-by: Ian Romanick Configure your notification preferences ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 109641] GLX swrast driver leaks shared memory segments
https://bugs.freedesktop.org/show_bug.cgi?id=109641 Brian Paul changed: What|Removed |Added Resolution|--- |FIXED Status|NEW |RESOLVED -- You are receiving this mail because: You are the QA Contact for the bug. You are the assignee for the bug.___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] RFC: Workaround for pthread_setaffinity_np() seccomp filtering
On Tue, Mar 12, 2019 at 9:59 AM Marc-André Lureau wrote: > > Hi > > On Fri, Mar 1, 2019 at 12:13 PM Mathias Fröhlich > wrote: > > > > On Friday, 1 March 2019 12:15:08 CET Eero Tamminen wrote: > > > Hi, > > > > > > On 1.3.2019 11.12, Michel Dänzer wrote: > > > > On 2019-02-28 8:41 p.m., Marek Olšák wrote: > > > >>> On Thu, Feb 28, 2019 at 1:37 PM Eero Tamminen > > > >>> > > > Why distro versions of Qemu filter sched_setaffinity() syscall? > > > >>> > > > >>> (https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/1815889) > > > >>> > > > >>> Daniel Berrange (berrange) wrote on 2019-02-27: #19 > > > >>> > > > >>> "IMHO that mesa change is not valid. It is settings its affinity to > > > >>> run on all threads which is definitely *NOT* something we want to be > > > >>> allowed. Management applications want to control which CPUs QEMU runs > > > >>> on, and as such Mesa should honour the CPU placement that the QEMU > > > >>> process has. > > > >>> > > > >>> This is a great example of why QEMU wants to use seccomp to block > > > >>> affinity changes to prevent something silently trying to use more CPUs > > > >>> than are assigned to this QEMU." > > > >>> > > > >> > > > >> Mesa uses thread affinity to optimize memory access performance on some > > > >> CPUs (see util_pin_thread_to_L3). Other places in Mesa need to restore > > > >> the > > > >> original thread affinity for some child threads. Additionally, if games > > > >> limit the thread affinity, Mesa needs to restore the full thread > > > >> affinity > > > >> for some of its child threads. > > > > > > > > The last part sounds like Mesa clearly overstepping its authority. > > > > > > > > > > > >> In essence, the thread affinity should only be considered a hint for > > > >> the > > > >> kernel for optimal performance. There is no reason to kill the process > > > >> if > > > >> it's disallowed. Just ignore the call or modify the thread mask to > > > >> make it > > > >> legal. > > > > > > > > The fundamental issue here is that Mesa is using the thread affinity API > > > > for something else than it's intended for. If there was an API for what > > > > Mesa wants (encouraging certain sets of threads to run on topologically > > > > close cores), there should be no need to block that. > > > > > > Why such process needs to be killed instead the request being masked > > > suitably, is there some program that breaks subtly if affinity request > > > is masked (and that being worse than the program being killed)? > > > > But that is still a situation that could be nicely handled with a > > EPERM error return. Way better than just kill a process. > > That 'badly affected' program still can call abort then. > > But nicely working programs don't get just killed then!! > > > Returning an error seems less secure that prohibiting it completely. > And it may lead to subtle bugs in rarely tested code paths. > > It's legitimate that QEMU and management layers want to prevent > arbitrary code from changing resource allocation etc. > > There are no easy way I can think of for mesa (and other libraries) to > probe the seccomp filters and associated action. > > So we need a way to tell mesa not to call setaffinity() (and other > syscalls). MESA_NO_THREAD_AFFINITY or MESA_NO_SYSCALLS=setaffinity,... > seem like a relatively easy way to go. I strongly believe we should not be going the route of adding another environment variable. Primarily because this is adding a big pitfall for users that have their software not work and have to spend significant efforts to figure out the environment variable and get things work. (as well as associated costs of some users not getting that far and filing bugs or proclaiming on other sites that this thing is buggy for them). As such I'd strongly appreciate it if people look further than the immediate crash and figure out a way to make graceful degradation happen without user intervention. Is there really no way to figure out that calling setaffinity is going to kill the process, and what would be needed to add a method? > > thanks > > > -- > Marc-André Lureau > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [AppVeyor] mesa viewport-jobize #10388 failed
Build mesa 10388 failed Commit 904905e3de by Alyssa Rosenzweig on 3/13/2019 1:50 AM: panfrost: Compute viewport state on the fly\n\nPreviously, we were caching this incorrectly; there's no real reason to\ngiven how variable it is (sensitive to changes in viewport, framebuffer\ndimensions, and scissors) and how cheap it is to recompute. So, just do\nit on the fly each draw.\n\nFixes glmark-es2 -bshadow and -brefract.\n\nSigned-off-by: Alyssa Rosenzweig Configure your notification preferences ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] anv: Ignore VkRenderPassInputAttachementAspectCreateInfo
Pushed. Thanks! On Tue, Mar 12, 2019 at 6:35 PM Sagar Ghuge wrote: > Reviewed-by: Sagar Ghuge > > On 3/12/19 4:21 PM, Jason Ekstrand wrote: > > We don't care about the information but there's no sense in throwing a > > debug warning about it. It's harmless but annoying to users. > > > > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109984 > > Cc: Craig Stout > > --- > > src/intel/vulkan/anv_pass.c | 4 > > 1 file changed, 4 insertions(+) > > > > diff --git a/src/intel/vulkan/anv_pass.c b/src/intel/vulkan/anv_pass.c > > index 02f2be60e02..5fac5bbb31c 100644 > > --- a/src/intel/vulkan/anv_pass.c > > +++ b/src/intel/vulkan/anv_pass.c > > @@ -332,6 +332,10 @@ VkResult anv_CreateRenderPass( > > > > vk_foreach_struct(ext, pCreateInfo->pNext) { > >switch (ext->sType) { > > + case > VK_STRUCTURE_TYPE_RENDER_PASS_INPUT_ATTACHMENT_ASPECT_CREATE_INFO: > > + /* We don't care about this information */ > > + break; > > + > >case VK_STRUCTURE_TYPE_RENDER_PASS_MULTIVIEW_CREATE_INFO: { > > VkRenderPassMultiviewCreateInfo *mv = (void *)ext; > > > > > ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] nir/loop_unroll: Fix out-of-bounds access handling
Pushed! Thanks. On Tue, Mar 12, 2019 at 5:42 PM Timothy Arceri wrote: > > > On 13/3/19 8:30 am, Jason Ekstrand wrote: > > The previous code was completely broken when it came to constructing the > > undef values. I'm not sure how it ever worked. For the case of a copy > > that reads an undefined value, we can just delete the copy because the > > destination is a valid undefined value. This saves us the effort of > > trying to construct a value for an arbitrary copy_deref intrinsic. > > Yes. It turns out this was indeed not required. Thanks for the fix! > > Reviewed-by: Timothy Arceri > > > > > Fixes: e8a8937a04 "nir: add partial loop unrolling support" > > Cc: Timothy Arceri > > --- > > src/compiler/nir/nir_opt_loop_unroll.c | 14 ++ > > 1 file changed, 2 insertions(+), 12 deletions(-) > > > > diff --git a/src/compiler/nir/nir_opt_loop_unroll.c > b/src/compiler/nir/nir_opt_loop_unroll.c > > index 06ec78b8068..b2696d4aadb 100644 > > --- a/src/compiler/nir/nir_opt_loop_unroll.c > > +++ b/src/compiler/nir/nir_opt_loop_unroll.c > > @@ -670,11 +670,9 @@ remove_out_of_bounds_induction_use(nir_shader > *shader, nir_loop *loop, > > if (is_access_out_of_bounds(term, > nir_src_as_deref(intrin->src[0]), > > trip_count)) { > > if (intrin->intrinsic == nir_intrinsic_load_deref) { > > - assert(intrin->src[0].is_ssa); > > - nir_ssa_def *a_ssa = intrin->src[0].ssa; > > nir_ssa_def *undef = > > - nir_ssa_undef(, intrin->num_components, > > - a_ssa->bit_size); > > + nir_ssa_undef(, intrin->dest.ssa.num_components, > > + intrin->dest.ssa.bit_size); > > nir_ssa_def_rewrite_uses(>dest.ssa, > > nir_src_for_ssa(undef)); > > } else { > > @@ -686,14 +684,6 @@ remove_out_of_bounds_induction_use(nir_shader > *shader, nir_loop *loop, > > if (intrin->intrinsic == nir_intrinsic_copy_deref && > > is_access_out_of_bounds(term, > nir_src_as_deref(intrin->src[1]), > > trip_count)) { > > - assert(intrin->src[1].is_ssa); > > - nir_ssa_def *a_ssa = intrin->src[1].ssa; > > - nir_ssa_def *undef = > > - nir_ssa_undef(, intrin->num_components, > a_ssa->bit_size); > > - > > - /* Replace the copy with a store of the undefined value > */ > > - b.cursor = nir_before_instr(instr); > > - nir_store_deref(, nir_src_as_deref(intrin->src[0]), > undef, ~0); > > nir_instr_remove(instr); > > } > >} > > > ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 04/10] panfrost: Minor comment cleanup (version detection)
Signed-off-by: Alyssa Rosenzweig --- src/gallium/drivers/panfrost/pan_context.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/src/gallium/drivers/panfrost/pan_context.c b/src/gallium/drivers/panfrost/pan_context.c index 48e471eace2..cb226cc2220 100644 --- a/src/gallium/drivers/panfrost/pan_context.c +++ b/src/gallium/drivers/panfrost/pan_context.c @@ -2408,8 +2408,9 @@ panfrost_create_context(struct pipe_screen *screen, void *priv, unsigned flags) unsigned gpu_id; gpu_id = pscreen->driver->query_gpu_version(pscreen); -ctx->is_t6xx = gpu_id <= 0x0750; /* For now, this flag means t76x or less */ -ctx->require_sfbd = gpu_id < 0x0750; /* t76x is the first to support MFD */ + +ctx->is_t6xx = gpu_id <= 0x0750; /* For now, this flag means T760 or less */ +ctx->require_sfbd = gpu_id < 0x0750; /* T760 is the first to support MFBD */ gallium->screen = screen; -- 2.20.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 05/10] panfrost/mfbd: Implement linear depth buffers
This removes a clunky hack where the depth buffer was enabled during the *clear*, instead of during depth buffer linking. That said, this does not yet support writeback like AFBC depth buffers. Signed-off-by: Alyssa Rosenzweig --- src/gallium/drivers/panfrost/pan_mfbd.c | 19 +-- 1 file changed, 9 insertions(+), 10 deletions(-) diff --git a/src/gallium/drivers/panfrost/pan_mfbd.c b/src/gallium/drivers/panfrost/pan_mfbd.c index a141fd314c0..b9c7cb221e7 100644 --- a/src/gallium/drivers/panfrost/pan_mfbd.c +++ b/src/gallium/drivers/panfrost/pan_mfbd.c @@ -58,8 +58,6 @@ panfrost_mfbd_clear( struct bifrost_fb_extra *fbx, struct bifrost_render_target *rt) { -struct panfrost_context *ctx = job->ctx; - if (job->clear & PIPE_CLEAR_COLOR) { rt->clear_color_1 = job->clear_color; rt->clear_color_2 = job->clear_color; @@ -74,14 +72,6 @@ panfrost_mfbd_clear( if (job->clear & PIPE_CLEAR_STENCIL) { fb->clear_stencil = job->clear_stencil; } - -if (job->clear & (PIPE_CLEAR_DEPTH | PIPE_CLEAR_STENCIL)) { -/* Setup combined 24/8 depth/stencil */ -fb->unk3 |= MALI_MFBD_EXTRA; -fbx->flags = 0x405; -fbx->ds_linear.depth = ctx->depth_stencil_buffer.gpu; -fbx->ds_linear.depth_stride = ctx->pipe_framebuffer.width * 4; -} } static void @@ -155,6 +145,15 @@ panfrost_mfbd_set_zsbuf( fbx->ds_afbc.padding = 0x1000; fb->unk3 |= MALI_MFBD_DEPTH_WRITE; +} else if (rsrc->bo->layout == PAN_LINEAR) { +fb->unk3 |= MALI_MFBD_EXTRA; +fbx->flags |= MALI_EXTRA_PRESENT | MALI_EXTRA_ZS | 0x1; + +fbx->ds_linear.depth = rsrc->bo->gpu[0]; +fbx->ds_linear.depth_stride = +util_format_get_stride(surf->format, surf->texture->width0); +} else { +assert(0); } } -- 2.20.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 08/10] panfrost: Allocate extra data for depth buffer
It's not clear why the hardware "spills" a little bit, but if we don't do this, we get MMU faults with linear depth buffers. Signed-off-by: Alyssa Rosenzweig --- src/gallium/drivers/panfrost/pan_resource.c | 5 + 1 file changed, 5 insertions(+) diff --git a/src/gallium/drivers/panfrost/pan_resource.c b/src/gallium/drivers/panfrost/pan_resource.c index d647f618ee7..a6e0cfe4687 100644 --- a/src/gallium/drivers/panfrost/pan_resource.c +++ b/src/gallium/drivers/panfrost/pan_resource.c @@ -197,6 +197,11 @@ panfrost_create_bo(struct panfrost_screen *screen, const struct pipe_resource *t if (template->height0) sz *= template->height0; if (template->depth0) sz *= template->depth0; +/* Depth buffers require extra space for unknown reasons */ + +if (template->bind & PIPE_BIND_DEPTH_STENCIL) +sz = sz + sz/256; + /* Based on the usage, figure out what storing will be used. There are * various tradeoffs: * -- 2.20.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 06/10] panfrost/mfbd: Respect per-job depth write flag
While a depth buffer may be supplied, it only needs to be written to if the depth writemask is set for any draw AND if the depth buffer is not immediately invalidated (as is the case for scanout). This refactors panfrost_job to provide a depth write requirement, which is now implemented for MFBD depth buffers. Signed-off-by: Alyssa Rosenzweig --- src/gallium/drivers/panfrost/pan_context.c | 30 -- src/gallium/drivers/panfrost/pan_job.h | 9 +-- src/gallium/drivers/panfrost/pan_mfbd.c| 21 --- src/gallium/drivers/panfrost/pan_sfbd.c| 2 +- 4 files changed, 42 insertions(+), 20 deletions(-) diff --git a/src/gallium/drivers/panfrost/pan_context.c b/src/gallium/drivers/panfrost/pan_context.c index cb226cc2220..a038ea122f7 100644 --- a/src/gallium/drivers/panfrost/pan_context.c +++ b/src/gallium/drivers/panfrost/pan_context.c @@ -50,19 +50,6 @@ extern const char *pan_counters_base; /* Do not actually send anything to the GPU; merely generate the cmdstream as fast as possible. Disables framebuffer writes */ //#define DRY_RUN -/* TODO: Sample size, etc */ - -static void -panfrost_set_framebuffer_msaa(struct panfrost_context *ctx, bool enabled) -{ -struct panfrost_job *job = panfrost_get_job_for_fbo(ctx); - -job->msaa |= enabled; - -SET_BIT(ctx->fragment_shader_core.unknown2_3, MALI_HAS_MSAA, enabled); -SET_BIT(ctx->fragment_shader_core.unknown2_4, MALI_NO_MSAA, !enabled); -} - /* AFBC is enabled on a per-resource basis (AFBC enabling is theoretically * indepdent between color buffers and depth/stencil). To enable, we allocate * the AFBC metadata buffer and mark that it is enabled. We do -not- actually @@ -789,15 +776,30 @@ panfrost_emit_vertex_data(struct panfrost_context *ctx) void panfrost_emit_for_draw(struct panfrost_context *ctx, bool with_vertex_data) { +struct panfrost_job *job = panfrost_get_job_for_fbo(ctx); + if (with_vertex_data) { panfrost_emit_vertex_data(ctx); } +bool msaa = ctx->rasterizer->base.multisample; + if (ctx->dirty & PAN_DIRTY_RASTERIZER) { ctx->payload_tiler.gl_enables = ctx->rasterizer->tiler_gl_enables; -panfrost_set_framebuffer_msaa(ctx, ctx->rasterizer->base.multisample); + +/* TODO: Sample size */ +SET_BIT(ctx->fragment_shader_core.unknown2_3, MALI_HAS_MSAA, msaa); +SET_BIT(ctx->fragment_shader_core.unknown2_4, MALI_NO_MSAA, !msaa); } +/* Enable job requirements at draw-time */ + +if (msaa) +job->requirements |= PAN_REQ_MSAA; + +if (ctx->depth_stencil->depth.writemask) +job->requirements |= PAN_REQ_DEPTH_WRITE; + if (ctx->occlusion_query) { ctx->payload_tiler.gl_enables |= MALI_OCCLUSION_QUERY | MALI_OCCLUSION_PRECISE; ctx->payload_tiler.postfix.occlusion_counter = ctx->occlusion_query->transfer.gpu; diff --git a/src/gallium/drivers/panfrost/pan_job.h b/src/gallium/drivers/panfrost/pan_job.h index 10503d944ac..30f1cf4bd5c 100644 --- a/src/gallium/drivers/panfrost/pan_job.h +++ b/src/gallium/drivers/panfrost/pan_job.h @@ -33,6 +33,9 @@ struct panfrost_job_key { struct pipe_surface *zsbuf; }; +#define PAN_REQ_MSAA(1 << 0) +#define PAN_REQ_DEPTH_WRITE (1 << 1) + /* A panfrost_job corresponds to a bound FBO we're rendering to, * collecting over multiple draws. */ @@ -48,8 +51,10 @@ struct panfrost_job { float clear_depth; unsigned clear_stencil; -/* Whether this job uses MSAA */ -bool msaa; +/* Whether this job uses the corresponding requirement (PAN_REQ_* + * bitmask) */ +unsigned requirements; + }; /* Functions for managing the above */ diff --git a/src/gallium/drivers/panfrost/pan_mfbd.c b/src/gallium/drivers/panfrost/pan_mfbd.c index b9c7cb221e7..68c842981f3 100644 --- a/src/gallium/drivers/panfrost/pan_mfbd.c +++ b/src/gallium/drivers/panfrost/pan_mfbd.c @@ -143,8 +143,6 @@ panfrost_mfbd_set_zsbuf( fbx->ds_afbc.zero1 = 0x10009; fbx->ds_afbc.padding = 0x1000; - -fb->unk3 |= MALI_MFBD_DEPTH_WRITE; } else if (rsrc->bo->layout == PAN_LINEAR) { fb->unk3 |= MALI_MFBD_EXTRA; fbx->flags |= MALI_EXTRA_PRESENT | MALI_EXTRA_ZS | 0x1; @@ -246,7 +244,21 @@ panfrost_mfbd_fragment(struct panfrost_context *ctx, bool flip_y) rts[0].framebuffer_stride = 0; } -if (job->msaa) { +/* When scanning out, the depth buffer is immediately invalidated, so + * we don't need to waste bandwidth writing it out. This can improve + * performance substantially (Z32_UNORM 1080p @ 60fps is 475 MB/s of + * memory bandwidth!). + * + * The exception is ReadPixels, but this is not supported on
[Mesa-dev] [PATCH 07/10] panfrost: Comment spelling fix
Signed-off-by: Alyssa Rosenzweig --- src/gallium/drivers/panfrost/pan_mfbd.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/gallium/drivers/panfrost/pan_mfbd.c b/src/gallium/drivers/panfrost/pan_mfbd.c index 68c842981f3..ac7f9ed665f 100644 --- a/src/gallium/drivers/panfrost/pan_mfbd.c +++ b/src/gallium/drivers/panfrost/pan_mfbd.c @@ -198,7 +198,7 @@ panfrost_mfbd_upload( UPLOAD(m_f_trans, offset, [c], total_sz); } -/* Return pointer suitable for the fragment seciton */ +/* Return pointer suitable for the fragment section */ return m_f_trans.gpu | MALI_MFBD | (has_extra ? 2 : 0); } -- 2.20.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 09/10] panfrost; Disable AFBC for depth buffers
For inexplicable reasons, the depth buffer is faster if kept as linear, whereas the colour buffers are faster if AFBC. Given both code paths are available, we'll choose the faster one of each (which also helps with testing coverage). Signed-off-by: Alyssa Rosenzweig --- src/gallium/drivers/panfrost/pan_context.c | 5 + src/gallium/drivers/panfrost/pan_resource.c | 6 ++ 2 files changed, 7 insertions(+), 4 deletions(-) diff --git a/src/gallium/drivers/panfrost/pan_context.c b/src/gallium/drivers/panfrost/pan_context.c index a038ea122f7..f607a9a8169 100644 --- a/src/gallium/drivers/panfrost/pan_context.c +++ b/src/gallium/drivers/panfrost/pan_context.c @@ -2058,10 +2058,7 @@ panfrost_set_framebuffer_state(struct pipe_context *pctx, panfrost_attach_vt_framebuffer(ctx); panfrost_set_scissor(ctx); -struct panfrost_resource *tex = ((struct panfrost_resource *) ctx->pipe_framebuffer.zsbuf->texture); - -if (tex->bo->layout != PAN_AFBC && !panfrost_is_scanout(ctx)) -panfrost_enable_afbc(ctx, tex, true); +/* Keep the depth FBO linear */ } } } diff --git a/src/gallium/drivers/panfrost/pan_resource.c b/src/gallium/drivers/panfrost/pan_resource.c index a6e0cfe4687..0fe54aa8a1d 100644 --- a/src/gallium/drivers/panfrost/pan_resource.c +++ b/src/gallium/drivers/panfrost/pan_resource.c @@ -220,6 +220,12 @@ panfrost_create_bo(struct panfrost_screen *screen, const struct pipe_resource *t /* Tiling textures is almost always faster, unless we only use it once */ bool should_tile = (template->usage != PIPE_USAGE_STREAM) && (template->bind & PIPE_BIND_SAMPLER_VIEW); +/* For unclear reasons, depth/stencil is faster linear than AFBC, so + * make sure it's linear */ + +if (template->bind & PIPE_BIND_DEPTH_STENCIL) +should_tile = false; + /* Set the layout appropriately */ bo->layout = should_tile ? PAN_TILED : PAN_LINEAR; -- 2.20.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 00/10] Fragment descriptor focused refactor
This patch series can be considered a "v2" of the previous, for the commits that were not merged due to regressions and such. It goes back and fixes quite a bit of these issues the right way. One particular benefit is the removal of the staging fragment framebuffer descriptor, which was a major hack and probably causing issues. Alyssa Rosenzweig (10): panfrost: Break out fragment to SFBD/MFBD files panfrost: Remove staging SFBD for pan_context panfrost: Remove staging MFBD panfrost: Minor comment cleanup (version detection) panfrost/mfbd: Implement linear depth buffers panfrost/mfbd: Respect per-job depth write flag panfrost: Comment spelling fix panfrost: Allocate extra data for depth buffer panfrost; Disable AFBC for depth buffers panfrost: Compute viewport state on the fly src/gallium/drivers/panfrost/meson.build| 3 + src/gallium/drivers/panfrost/pan_context.c | 525 +++- src/gallium/drivers/panfrost/pan_context.h | 36 +- src/gallium/drivers/panfrost/pan_fragment.c | 66 +++ src/gallium/drivers/panfrost/pan_job.h | 8 + src/gallium/drivers/panfrost/pan_mfbd.c | 289 +++ src/gallium/drivers/panfrost/pan_resource.c | 11 + src/gallium/drivers/panfrost/pan_sfbd.c | 139 ++ 8 files changed, 598 insertions(+), 479 deletions(-) create mode 100644 src/gallium/drivers/panfrost/pan_fragment.c create mode 100644 src/gallium/drivers/panfrost/pan_mfbd.c create mode 100644 src/gallium/drivers/panfrost/pan_sfbd.c -- 2.20.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 10/10] panfrost: Compute viewport state on the fly
Previously, we were caching this incorrectly; there's no real reason to given how variable it is (sensitive to changes in viewport, framebuffer dimensions, and scissors) and how cheap it is to recompute. So, just do it on the fly each draw. Fixes glmark-es2 -bshadow and -brefract. Signed-off-by: Alyssa Rosenzweig --- src/gallium/drivers/panfrost/pan_context.c | 108 - src/gallium/drivers/panfrost/pan_context.h | 1 - 2 files changed, 38 insertions(+), 71 deletions(-) diff --git a/src/gallium/drivers/panfrost/pan_context.c b/src/gallium/drivers/panfrost/pan_context.c index f607a9a8169..378a631410b 100644 --- a/src/gallium/drivers/panfrost/pan_context.c +++ b/src/gallium/drivers/panfrost/pan_context.c @@ -311,39 +311,6 @@ panfrost_attach_vt_framebuffer(struct panfrost_context *ctx) ctx->payload_tiler.postfix.framebuffer = framebuffer; } -static void -panfrost_viewport(struct panfrost_context *ctx, - float depth_clip_near, - float depth_clip_far, - int viewport_x0, int viewport_y0, - int viewport_x1, int viewport_y1) -{ -/* Clip bounds are encoded as floats. The viewport itself is encoded as - * (somewhat) asymmetric ints. */ - -struct mali_viewport ret = { -/* By default, do no viewport clipping, i.e. clip to (-inf, - * inf) in each direction. Clipping to the viewport in theory - * should work, but in practice causes issues when we're not - * explicitly trying to scissor */ - -.clip_minx = -inff, -.clip_miny = -inff, -.clip_maxx = inff, -.clip_maxy = inff, - -/* We always perform depth clipping (TODO: Can this be disabled?) */ - -.clip_minz = depth_clip_near, -.clip_maxz = depth_clip_far, - -.viewport0 = { viewport_x0, viewport_y0 }, -.viewport1 = { MALI_POSITIVE(viewport_x1), MALI_POSITIVE(viewport_y1) }, -}; - -memcpy(ctx->viewport, , sizeof(ret)); -} - /* Reset per-frame context, called on context initialisation as well as after * flushing a frame */ @@ -413,11 +380,6 @@ panfrost_emit_tiler_payload(struct panfrost_context *ctx) }, }; -/* Reserve the viewport */ -struct panfrost_transfer t = panfrost_allocate_chunk(ctx, sizeof(struct mali_viewport), HEAP_DESCRIPTOR); -ctx->viewport = (struct mali_viewport *) t.cpu; -payload.postfix.viewport = t.gpu; - memcpy(>payload_tiler, , sizeof(payload)); } @@ -1119,6 +1081,44 @@ panfrost_emit_for_draw(struct panfrost_context *ctx, bool with_vertex_data) } } +/* TODO: Upload the viewport somewhere more appropriate */ + +/* Clip bounds are encoded as floats. The viewport itself is encoded as + * (somewhat) asymmetric ints. */ +const struct pipe_scissor_state *ss = >scissor; + +struct mali_viewport view = { +/* By default, do no viewport clipping, i.e. clip to (-inf, + * inf) in each direction. Clipping to the viewport in theory + * should work, but in practice causes issues when we're not + * explicitly trying to scissor */ + +.clip_minx = -inff, +.clip_miny = -inff, +.clip_maxx = inff, +.clip_maxy = inff, + +.clip_minz = 0.0, +.clip_maxz = 1.0, +}; + +if (ss && ctx->rasterizer && ctx->rasterizer->base.scissor && 0) { +view.viewport0[0] = ss->minx; +view.viewport0[1] = ss->miny; +view.viewport1[0] = MALI_POSITIVE(ss->maxx); +view.viewport1[1] = MALI_POSITIVE(ss->maxy); +} else { +view.viewport0[0] = 0; +view.viewport0[1] = 0; +view.viewport1[0] = MALI_POSITIVE(ctx->pipe_framebuffer.width); +view.viewport1[1] = MALI_POSITIVE(ctx->pipe_framebuffer.height); +} + +ctx->payload_tiler.postfix.viewport = +panfrost_upload_transient(ctx, +, +sizeof(struct mali_viewport)); + ctx->dirty = 0; } @@ -1463,24 +1463,6 @@ panfrost_generic_cso_delete(struct pipe_context *pctx, void *hwcso) free(hwcso); } -static void -panfrost_set_scissor(struct panfrost_context *ctx) -{ -const struct pipe_scissor_state *ss = >scissor; - -if (ss && ctx->rasterizer && ctx->rasterizer->base.scissor && 0) { -ctx->viewport->viewport0[0] = ss->minx; -ctx->viewport->viewport0[1] = ss->miny; -ctx->viewport->viewport1[0] = MALI_POSITIVE(ss->maxx); -ctx->viewport->viewport1[1] = MALI_POSITIVE(ss->maxy); -
[Mesa-dev] [PATCH 02/10] panfrost: Remove staging SFBD for pan_context
The fragment framebuffer descriptor should not be a context entry; rather, it should be constructed only at fragment time to keep analysis tractable. Signed-off-by: Alyssa Rosenzweig --- src/gallium/drivers/panfrost/pan_context.h | 5 +- src/gallium/drivers/panfrost/pan_fragment.c | 7 ++- src/gallium/drivers/panfrost/pan_mfbd.c | 2 +- src/gallium/drivers/panfrost/pan_sfbd.c | 55 + 4 files changed, 30 insertions(+), 39 deletions(-) diff --git a/src/gallium/drivers/panfrost/pan_context.h b/src/gallium/drivers/panfrost/pan_context.h index a304d83e4fe..a3c87199d00 100644 --- a/src/gallium/drivers/panfrost/pan_context.h +++ b/src/gallium/drivers/panfrost/pan_context.h @@ -140,7 +140,6 @@ struct panfrost_context { * e.g. clearing information */ union { -struct mali_single_framebuffer fragment_sfbd; struct { struct bifrost_framebuffer fragment_mfbd; struct bifrost_fb_extra fragment_extra; @@ -371,10 +370,10 @@ bool panfrost_is_scanout(struct panfrost_context *ctx); mali_ptr -panfrost_sfbd_fragment(struct panfrost_context *ctx); +panfrost_sfbd_fragment(struct panfrost_context *ctx, bool flip_y); mali_ptr -panfrost_mfbd_fragment(struct panfrost_context *ctx); +panfrost_mfbd_fragment(struct panfrost_context *ctx, bool flip_y); struct bifrost_framebuffer panfrost_emit_mfbd(struct panfrost_context *ctx); diff --git a/src/gallium/drivers/panfrost/pan_fragment.c b/src/gallium/drivers/panfrost/pan_fragment.c index 16405a4ed21..0dc15c2d235 100644 --- a/src/gallium/drivers/panfrost/pan_fragment.c +++ b/src/gallium/drivers/panfrost/pan_fragment.c @@ -34,9 +34,12 @@ mali_ptr panfrost_fragment_job(struct panfrost_context *ctx) { +/* TODO: Check viewport or something */ +bool flip_y = panfrost_is_scanout(ctx); + mali_ptr framebuffer = ctx->require_sfbd ? -panfrost_sfbd_fragment(ctx) : -panfrost_mfbd_fragment(ctx); +panfrost_sfbd_fragment(ctx, flip_y) : +panfrost_mfbd_fragment(ctx, flip_y); struct mali_job_descriptor_header header = { .job_type = JOB_TYPE_FRAGMENT, diff --git a/src/gallium/drivers/panfrost/pan_mfbd.c b/src/gallium/drivers/panfrost/pan_mfbd.c index dffe13e6713..cb36a374063 100644 --- a/src/gallium/drivers/panfrost/pan_mfbd.c +++ b/src/gallium/drivers/panfrost/pan_mfbd.c @@ -234,7 +234,7 @@ panfrost_mfbd_upload(struct panfrost_context *ctx) /* Creates an MFBD for the FRAGMENT section of the bound framebuffer */ mali_ptr -panfrost_mfbd_fragment(struct panfrost_context *ctx) +panfrost_mfbd_fragment(struct panfrost_context *ctx, bool flip_y) { struct panfrost_job *job = panfrost_get_job_for_fbo(ctx); diff --git a/src/gallium/drivers/panfrost/pan_sfbd.c b/src/gallium/drivers/panfrost/pan_sfbd.c index d04da20e258..0e283bbb082 100644 --- a/src/gallium/drivers/panfrost/pan_sfbd.c +++ b/src/gallium/drivers/panfrost/pan_sfbd.c @@ -36,16 +36,11 @@ panfrost_sfbd_format(struct pipe_surface *surf) } static void -panfrost_sfbd_enable_msaa(struct panfrost_context *ctx) -{ -ctx->fragment_sfbd.format |= MALI_FRAMEBUFFER_MSAA_A | MALI_FRAMEBUFFER_MSAA_B; -} - -static void -panfrost_sfbd_clear(struct panfrost_job *job) +panfrost_sfbd_clear( +struct panfrost_job *job, +struct mali_single_framebuffer *sfbd) { struct panfrost_context *ctx = job->ctx; -struct mali_single_framebuffer *sfbd = >fragment_sfbd; if (job->clear & PIPE_CLEAR_COLOR) { sfbd->clear_color_1 = job->clear_color; @@ -91,60 +86,54 @@ panfrost_sfbd_clear(struct panfrost_job *job) static void panfrost_sfbd_set_cbuf( -struct panfrost_context *ctx, -struct pipe_surface *surf) +struct mali_single_framebuffer *fb, +struct pipe_surface *surf, +bool flip_y) { struct panfrost_resource *rsrc = pan_resource(surf->texture); signed stride = util_format_get_stride(surf->format, surf->texture->width0); -ctx->fragment_sfbd.format = panfrost_sfbd_format(surf); +fb->format = panfrost_sfbd_format(surf); if (rsrc->bo->layout == PAN_LINEAR) { mali_ptr framebuffer = rsrc->bo->gpu[0]; /* The default is upside down from OpenGL's perspective. */ -if (panfrost_is_scanout(ctx)) { +if (flip_y) { framebuffer += stride * (surf->texture->height0 - 1); stride = -stride; } -ctx->fragment_sfbd.framebuffer = framebuffer; -ctx->fragment_sfbd.stride = stride; +fb->framebuffer = framebuffer; +fb->stride = stride; } else { fprintf(stderr, "Invalid render
[Mesa-dev] [PATCH 01/10] panfrost: Break out fragment to SFBD/MFBD files
This substantially cleans up the corresponding logic at the expense of a bit of code duplication; nevertheless, it's a net win since otherwise incompatible hardware code is mixed confusingly. Signed-off-by: Alyssa Rosenzweig --- src/gallium/drivers/panfrost/meson.build| 3 + src/gallium/drivers/panfrost/pan_context.c | 385 +--- src/gallium/drivers/panfrost/pan_context.h | 21 ++ src/gallium/drivers/panfrost/pan_fragment.c | 63 src/gallium/drivers/panfrost/pan_job.h | 3 + src/gallium/drivers/panfrost/pan_mfbd.c | 273 ++ src/gallium/drivers/panfrost/pan_sfbd.c | 150 7 files changed, 520 insertions(+), 378 deletions(-) create mode 100644 src/gallium/drivers/panfrost/pan_fragment.c create mode 100644 src/gallium/drivers/panfrost/pan_mfbd.c create mode 100644 src/gallium/drivers/panfrost/pan_sfbd.c diff --git a/src/gallium/drivers/panfrost/meson.build b/src/gallium/drivers/panfrost/meson.build index 456782355bb..e3569e73468 100644 --- a/src/gallium/drivers/panfrost/meson.build +++ b/src/gallium/drivers/panfrost/meson.build @@ -42,6 +42,9 @@ files_panfrost = files( 'pan_blend_shaders.c', 'pan_wallpaper.c', 'pan_pretty_print.c', + 'pan_fragment.c', + 'pan_sfbd.c', + 'pan_mfbd.c' ) inc_panfrost = [ diff --git a/src/gallium/drivers/panfrost/pan_context.c b/src/gallium/drivers/panfrost/pan_context.c index cbf97daf6e5..48e471eace2 100644 --- a/src/gallium/drivers/panfrost/pan_context.c +++ b/src/gallium/drivers/panfrost/pan_context.c @@ -50,30 +50,17 @@ extern const char *pan_counters_base; /* Do not actually send anything to the GPU; merely generate the cmdstream as fast as possible. Disables framebuffer writes */ //#define DRY_RUN -#define SET_BIT(lval, bit, cond) \ - if (cond) \ - lval |= (bit); \ - else \ - lval &= ~(bit); - /* TODO: Sample size, etc */ static void panfrost_set_framebuffer_msaa(struct panfrost_context *ctx, bool enabled) { -SET_BIT(ctx->fragment_shader_core.unknown2_3, MALI_HAS_MSAA, enabled); -SET_BIT(ctx->fragment_shader_core.unknown2_4, MALI_NO_MSAA, !enabled); - -if (ctx->require_sfbd) { -SET_BIT(ctx->fragment_sfbd.format, MALI_FRAMEBUFFER_MSAA_A | MALI_FRAMEBUFFER_MSAA_B, enabled); -} else { -SET_BIT(ctx->fragment_rts[0].format.flags, MALI_MFBD_FORMAT_MSAA, enabled); +struct panfrost_job *job = panfrost_get_job_for_fbo(ctx); -SET_BIT(ctx->fragment_mfbd.unk1, (1 << 4) | (1 << 1), enabled); +job->msaa |= enabled; -/* XXX */ -ctx->fragment_mfbd.rt_count_2 = enabled ? 4 : 1; -} +SET_BIT(ctx->fragment_shader_core.unknown2_3, MALI_HAS_MSAA, enabled); +SET_BIT(ctx->fragment_shader_core.unknown2_4, MALI_NO_MSAA, !enabled); } /* AFBC is enabled on a per-resource basis (AFBC enabling is theoretically @@ -131,160 +118,6 @@ panfrost_enable_checksum(struct panfrost_context *ctx, struct panfrost_resource rsrc->bo->has_checksum = true; } -static unsigned -panfrost_sfbd_format_for_surface(struct pipe_surface *surf) -{ -/* TODO */ -return 0xb84e0281; /* RGB32, no MSAA */ -} - -static struct mali_rt_format -panfrost_mfbd_format_for_surface(struct pipe_surface *surf) -{ -/* Explode details on the format */ - -const struct util_format_description *desc = -util_format_description(surf->texture->format); - -/* Fill in accordingly */ - -struct mali_rt_format fmt = { -.unk1 = 0x400, -.unk2 = 0x1, -.nr_channels = MALI_POSITIVE(desc->nr_channels), -.flags = 0x444, -.swizzle = panfrost_translate_swizzle_4(desc->swizzle), -.unk4 = 0x8 -}; - -return fmt; -} - -static bool panfrost_is_scanout(struct panfrost_context *ctx); - -/* These routines link a fragment job with the bound surface, accounting for the - * BO layout. This routine runs per-frame */ - -static void -panfrost_set_fragment_target_cbuf( -struct panfrost_context *ctx, -struct pipe_surface *surf, -unsigned cb) -{ -struct panfrost_resource *rsrc = pan_resource(surf->texture); - -signed stride = -util_format_get_stride(surf->format, surf->texture->width0); - -/* First, we set the format bits */ - -if (ctx->require_sfbd) { -ctx->fragment_sfbd.format = -panfrost_sfbd_format_for_surface(surf); -} else { -ctx->fragment_rts[cb].format = -panfrost_mfbd_format_for_surface(surf); -} - -/* Now, we set the layout specific pieces */ - -if (rsrc->bo->layout == PAN_LINEAR) { -mali_ptr framebuffer = rsrc->bo->gpu[0]; - -/* The default is upside
[Mesa-dev] [PATCH 03/10] panfrost: Remove staging MFBD
Same idea as the previous commit, but for the MFBD this time instead of the SFBD. Signed-off-by: Alyssa Rosenzweig --- src/gallium/drivers/panfrost/pan_context.h | 13 -- src/gallium/drivers/panfrost/pan_mfbd.c| 194 +++-- 2 files changed, 98 insertions(+), 109 deletions(-) diff --git a/src/gallium/drivers/panfrost/pan_context.h b/src/gallium/drivers/panfrost/pan_context.h index a3c87199d00..15b9819484f 100644 --- a/src/gallium/drivers/panfrost/pan_context.h +++ b/src/gallium/drivers/panfrost/pan_context.h @@ -134,19 +134,6 @@ struct panfrost_context { struct panfrost_query *occlusion_query; -/* Each render job has multiple framebuffer descriptors associated with - * it, used for various purposes with more or less the same format. The - * most obvious is the fragment framebuffer descriptor, which carries - * e.g. clearing information */ - -union { -struct { -struct bifrost_framebuffer fragment_mfbd; -struct bifrost_fb_extra fragment_extra; -struct bifrost_render_target fragment_rts[4]; -}; -}; - /* Each draw has corresponding vertex and tiler payloads */ struct midgard_payload_vertex_tiler payload_vertex; struct midgard_payload_vertex_tiler payload_tiler; diff --git a/src/gallium/drivers/panfrost/pan_mfbd.c b/src/gallium/drivers/panfrost/pan_mfbd.c index cb36a374063..a141fd314c0 100644 --- a/src/gallium/drivers/panfrost/pan_mfbd.c +++ b/src/gallium/drivers/panfrost/pan_mfbd.c @@ -50,159 +50,135 @@ panfrost_mfbd_format(struct pipe_surface *surf) return fmt; } -static void -panfrost_mfbd_enable_msaa(struct panfrost_context *ctx) -{ -ctx->fragment_rts[0].format.flags |= MALI_MFBD_FORMAT_MSAA; - -/* XXX */ -ctx->fragment_mfbd.unk1 |= (1 << 4) | (1 << 1); -ctx->fragment_mfbd.rt_count_2 = 4; -} static void -panfrost_mfbd_clear(struct panfrost_job *job) +panfrost_mfbd_clear( +struct panfrost_job *job, +struct bifrost_framebuffer *fb, +struct bifrost_fb_extra *fbx, +struct bifrost_render_target *rt) { struct panfrost_context *ctx = job->ctx; -struct bifrost_render_target *buffer_color = >fragment_rts[0]; -struct bifrost_framebuffer *buffer_ds = >fragment_mfbd; if (job->clear & PIPE_CLEAR_COLOR) { -buffer_color->clear_color_1 = job->clear_color; -buffer_color->clear_color_2 = job->clear_color; -buffer_color->clear_color_3 = job->clear_color; -buffer_color->clear_color_4 = job->clear_color; +rt->clear_color_1 = job->clear_color; +rt->clear_color_2 = job->clear_color; +rt->clear_color_3 = job->clear_color; +rt->clear_color_4 = job->clear_color; } if (job->clear & PIPE_CLEAR_DEPTH) { -buffer_ds->clear_depth = job->clear_depth; +fb->clear_depth = job->clear_depth; } if (job->clear & PIPE_CLEAR_STENCIL) { -buffer_ds->clear_stencil = job->clear_stencil; +fb->clear_stencil = job->clear_stencil; } if (job->clear & (PIPE_CLEAR_DEPTH | PIPE_CLEAR_STENCIL)) { /* Setup combined 24/8 depth/stencil */ -ctx->fragment_mfbd.unk3 |= MALI_MFBD_EXTRA; -ctx->fragment_extra.flags = 0x405; -ctx->fragment_extra.ds_linear.depth = ctx->depth_stencil_buffer.gpu; -ctx->fragment_extra.ds_linear.depth_stride = ctx->pipe_framebuffer.width * 4; +fb->unk3 |= MALI_MFBD_EXTRA; +fbx->flags = 0x405; +fbx->ds_linear.depth = ctx->depth_stencil_buffer.gpu; +fbx->ds_linear.depth_stride = ctx->pipe_framebuffer.width * 4; } } static void panfrost_mfbd_set_cbuf( -struct panfrost_context *ctx, +struct bifrost_render_target *rt, struct pipe_surface *surf, -unsigned cb) +bool flip_y) { struct panfrost_resource *rsrc = pan_resource(surf->texture); signed stride = util_format_get_stride(surf->format, surf->texture->width0); -ctx->fragment_rts[cb].format = panfrost_mfbd_format(surf); +rt->format = panfrost_mfbd_format(surf); /* Now, we set the layout specific pieces */ if (rsrc->bo->layout == PAN_LINEAR) { mali_ptr framebuffer = rsrc->bo->gpu[0]; -/* The default is upside down from OpenGL's perspective. */ -if (panfrost_is_scanout(ctx)) { +if (flip_y) { framebuffer += stride * (surf->texture->height0 - 1); stride = -stride;
Re: [Mesa-dev] [PATCH 11/11] ac: use new LLVM 8 intrinsics in ac_build_buffer_store_dword()
This one causes 2000+ piglit tests to fail on radeonsi. For example: ./bin/shader_runner generated_tests/spec/arb_gpu_shader_fp64/execution/conversion/geom-conversion-explicit-bool-double.shader_test -auto -fbo On 13/3/19 3:19 am, Samuel Pitoiset wrote: New buffer intrinsics have a separate soffset parameter. Signed-off-by: Samuel Pitoiset --- src/amd/common/ac_llvm_build.c | 66 ++ 1 file changed, 26 insertions(+), 40 deletions(-) diff --git a/src/amd/common/ac_llvm_build.c b/src/amd/common/ac_llvm_build.c index ce6639d49bf..8ed5199da55 100644 --- a/src/amd/common/ac_llvm_build.c +++ b/src/amd/common/ac_llvm_build.c @@ -1227,59 +1227,45 @@ ac_build_buffer_store_dword(struct ac_llvm_context *ctx, if (!swizzle_enable_hint) { LLVMValueRef offset = soffset; - static const char *types[] = {"f32", "v2f32", "v4f32"}; - if (inst_offset) offset = LLVMBuildAdd(ctx->builder, offset, LLVMConstInt(ctx->i32, inst_offset, 0), ""); - if (voffset) - offset = LLVMBuildAdd(ctx->builder, offset, voffset, ""); - - LLVMValueRef args[] = { - ac_to_float(ctx, vdata), - LLVMBuildBitCast(ctx->builder, rsrc, ctx->v4i32, ""), - ctx->i32_0, - offset, - LLVMConstInt(ctx->i1, glc, 0), - LLVMConstInt(ctx->i1, slc, 0), - }; - - char name[256]; - snprintf(name, sizeof(name), "llvm.amdgcn.buffer.store.%s", -types[CLAMP(num_channels, 1, 3) - 1]); - ac_build_intrinsic(ctx, name, ctx->voidt, - args, ARRAY_SIZE(args), - ac_get_store_intr_attribs(writeonly_memory)); + if (HAVE_LLVM >= 0x800) { + ac_build_llvm8_buffer_store_common(ctx, rsrc, + ac_to_float(ctx, vdata), + ctx->i32_0, + voffset, offset, + num_channels, + glc, slc, + writeonly_memory, + false, true); + } else { + if (voffset) + offset = LLVMBuildAdd(ctx->builder, offset, voffset, ""); + + ac_build_buffer_store_common(ctx, rsrc, +ac_to_float(ctx, vdata), +ctx->i32_0, offset, +num_channels, glc, slc, +writeonly_memory, false); + } return; } - static const unsigned dfmt[] = { + static const unsigned dfmts[] = { V_008F0C_BUF_DATA_FORMAT_32, V_008F0C_BUF_DATA_FORMAT_32_32, V_008F0C_BUF_DATA_FORMAT_32_32_32, V_008F0C_BUF_DATA_FORMAT_32_32_32_32 }; - static const char *types[] = {"i32", "v2i32", "v4i32"}; - LLVMValueRef args[] = { - vdata, - LLVMBuildBitCast(ctx->builder, rsrc, ctx->v4i32, ""), - ctx->i32_0, - voffset ? voffset : ctx->i32_0, - soffset, - LLVMConstInt(ctx->i32, inst_offset, 0), - LLVMConstInt(ctx->i32, dfmt[num_channels - 1], 0), - LLVMConstInt(ctx->i32, V_008F0C_BUF_NUM_FORMAT_UINT, 0), - LLVMConstInt(ctx->i1, glc, 0), - LLVMConstInt(ctx->i1, slc, 0), - }; - char name[256]; - snprintf(name, sizeof(name), "llvm.amdgcn.tbuffer.store.%s", -types[CLAMP(num_channels, 1, 3) - 1]); + unsigned dfmt = dfmts[num_channels - 1]; + unsigned nfmt = V_008F0C_BUF_NUM_FORMAT_UINT; + LLVMValueRef immoffset = LLVMConstInt(ctx->i32, inst_offset, 0); - ac_build_intrinsic(ctx, name, ctx->voidt, - args, ARRAY_SIZE(args), - ac_get_store_intr_attribs(writeonly_memory)); + ac_build_tbuffer_store(ctx, rsrc, vdata, ctx->i32_0, voffset, soffset, + immoffset, num_channels, dfmt, nfmt, glc, slc, + writeonly_memory); } static LLVMValueRef ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 1/2] panfrost: Set bo->size[0] in the DRM backend
Both patches are: Reviewed-by: Alyssa Rosenzweig ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] anv: Ignore VkRenderPassInputAttachementAspectCreateInfo
Reviewed-by: Sagar Ghuge On 3/12/19 4:21 PM, Jason Ekstrand wrote: > We don't care about the information but there's no sense in throwing a > debug warning about it. It's harmless but annoying to users. > > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109984 > Cc: Craig Stout > --- > src/intel/vulkan/anv_pass.c | 4 > 1 file changed, 4 insertions(+) > > diff --git a/src/intel/vulkan/anv_pass.c b/src/intel/vulkan/anv_pass.c > index 02f2be60e02..5fac5bbb31c 100644 > --- a/src/intel/vulkan/anv_pass.c > +++ b/src/intel/vulkan/anv_pass.c > @@ -332,6 +332,10 @@ VkResult anv_CreateRenderPass( > > vk_foreach_struct(ext, pCreateInfo->pNext) { >switch (ext->sType) { > + case VK_STRUCTURE_TYPE_RENDER_PASS_INPUT_ATTACHMENT_ASPECT_CREATE_INFO: > + /* We don't care about this information */ > + break; > + >case VK_STRUCTURE_TYPE_RENDER_PASS_MULTIVIEW_CREATE_INFO: { > VkRenderPassMultiviewCreateInfo *mv = (void *)ext; > > ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] anv: Ignore VkRenderPassInputAttachementAspectCreateInfo
We don't care about the information but there's no sense in throwing a debug warning about it. It's harmless but annoying to users. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109984 Cc: Craig Stout --- src/intel/vulkan/anv_pass.c | 4 1 file changed, 4 insertions(+) diff --git a/src/intel/vulkan/anv_pass.c b/src/intel/vulkan/anv_pass.c index 02f2be60e02..5fac5bbb31c 100644 --- a/src/intel/vulkan/anv_pass.c +++ b/src/intel/vulkan/anv_pass.c @@ -332,6 +332,10 @@ VkResult anv_CreateRenderPass( vk_foreach_struct(ext, pCreateInfo->pNext) { switch (ext->sType) { + case VK_STRUCTURE_TYPE_RENDER_PASS_INPUT_ATTACHMENT_ASPECT_CREATE_INFO: + /* We don't care about this information */ + break; + case VK_STRUCTURE_TYPE_RENDER_PASS_MULTIVIEW_CREATE_INFO: { VkRenderPassMultiviewCreateInfo *mv = (void *)ext; -- 2.20.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] RFC: Workaround for pthread_setaffinity_np() seccomp filtering
The env var workaround is fine. Thread affinity is used for cache topology related optimizations. I think it's a mistake to treat it only as a resource allocation tool. Marek On Tue, Mar 12, 2019, 1:59 AM Marc-André Lureau wrote: > Hi > > On Fri, Mar 1, 2019 at 12:13 PM Mathias Fröhlich > wrote: > > > > On Friday, 1 March 2019 12:15:08 CET Eero Tamminen wrote: > > > Hi, > > > > > > On 1.3.2019 11.12, Michel Dänzer wrote: > > > > On 2019-02-28 8:41 p.m., Marek Olšák wrote: > > > >>> On Thu, Feb 28, 2019 at 1:37 PM Eero Tamminen < > eero.t.tammi...@intel.com> > > > Why distro versions of Qemu filter sched_setaffinity() syscall? > > > >>> > > > >>> (https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/1815889) > > > >>> > > > >>> Daniel Berrange (berrange) wrote on 2019-02-27: #19 > > > >>> > > > >>> "IMHO that mesa change is not valid. It is settings its affinity to > > > >>> run on all threads which is definitely *NOT* something we want to > be > > > >>> allowed. Management applications want to control which CPUs QEMU > runs > > > >>> on, and as such Mesa should honour the CPU placement that the QEMU > > > >>> process has. > > > >>> > > > >>> This is a great example of why QEMU wants to use seccomp to block > > > >>> affinity changes to prevent something silently trying to use more > CPUs > > > >>> than are assigned to this QEMU." > > > >>> > > > >> > > > >> Mesa uses thread affinity to optimize memory access performance on > some > > > >> CPUs (see util_pin_thread_to_L3). Other places in Mesa need to > restore the > > > >> original thread affinity for some child threads. Additionally, if > games > > > >> limit the thread affinity, Mesa needs to restore the full thread > affinity > > > >> for some of its child threads. > > > > > > > > The last part sounds like Mesa clearly overstepping its authority. > > > > > > > > > > > >> In essence, the thread affinity should only be considered a hint > for the > > > >> kernel for optimal performance. There is no reason to kill the > process if > > > >> it's disallowed. Just ignore the call or modify the thread mask to > make it > > > >> legal. > > > > > > > > The fundamental issue here is that Mesa is using the thread affinity > API > > > > for something else than it's intended for. If there was an API for > what > > > > Mesa wants (encouraging certain sets of threads to run on > topologically > > > > close cores), there should be no need to block that. > > > > > > Why such process needs to be killed instead the request being masked > > > suitably, is there some program that breaks subtly if affinity request > > > is masked (and that being worse than the program being killed)? > > > > But that is still a situation that could be nicely handled with a > > EPERM error return. Way better than just kill a process. > > That 'badly affected' program still can call abort then. > > But nicely working programs don't get just killed then!! > > > Returning an error seems less secure that prohibiting it completely. > And it may lead to subtle bugs in rarely tested code paths. > > It's legitimate that QEMU and management layers want to prevent > arbitrary code from changing resource allocation etc. > > There are no easy way I can think of for mesa (and other libraries) to > probe the seccomp filters and associated action. > > So we need a way to tell mesa not to call setaffinity() (and other > syscalls). MESA_NO_THREAD_AFFINITY or MESA_NO_SYSCALLS=setaffinity,... > seem like a relatively easy way to go. > > thanks > > > -- > Marc-André Lureau > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH v4 34/40] intel/compiler: validate region restrictions for half-float conversions
Francisco Jerez writes: > Iago Toral writes: > >> On Tue, 2019-03-05 at 07:35 +0100, Iago Toral wrote: >>> On Mon, 2019-03-04 at 15:36 -0800, Francisco Jerez wrote: >>> > Iago Toral writes: >>> > >>> > > On Fri, 2019-03-01 at 19:04 -0800, Francisco Jerez wrote: >>> > > > Iago Toral writes: >>> > > > >>> > > > > On Thu, 2019-02-28 at 09:54 -0800, Francisco Jerez wrote: >>> > > > > > Iago Toral writes: >>> > > > > > >>> > > > > > > On Wed, 2019-02-27 at 13:47 -0800, Francisco Jerez wrote: >>> > > > > > > > Iago Toral writes: >>> > > > > > > > >>> > > > > > > > > On Tue, 2019-02-26 at 14:54 -0800, Francisco Jerez >>> > > > > > > > > wrote: >>> > > > > > > > > > Iago Toral Quiroga writes: >>> > > > > > > > > > >>> > > > > > > > > > > --- >>> > > > > > > > > > > src/intel/compiler/brw_eu_validate.c| 64 >>> > > > > > > > > > > - >>> > > > > > > > > > > src/intel/compiler/test_eu_validate.cpp | 122 >>> > > > > > > > > > > >>> > > > > > > > > > > 2 files changed, 185 insertions(+), 1 deletion(- >>> > > > > > > > > > > ) >>> > > > > > > > > > > >>> > > > > > > > > > > diff --git a/src/intel/compiler/brw_eu_validate.c >>> > > > > > > > > > > b/src/intel/compiler/brw_eu_validate.c >>> > > > > > > > > > > index 000a05cb6ac..203641fecb9 100644 >>> > > > > > > > > > > --- a/src/intel/compiler/brw_eu_validate.c >>> > > > > > > > > > > +++ b/src/intel/compiler/brw_eu_validate.c >>> > > > > > > > > > > @@ -531,7 +531,69 @@ >>> > > > > > > > > > > general_restrictions_based_on_operand_types(const >>> > > > > > > > > > > struct >>> > > > > > > > > > > gen_device_info *devinf >>> > > > > > > > > > > exec_type_size == 8 && dst_type_size == >>> > > > > > > > > > > 4) >>> > > > > > > > > > >dst_type_size = 8; >>> > > > > > > > > > > >>> > > > > > > > > > > - if (exec_type_size > dst_type_size) { >>> > > > > > > > > > > + /* From the BDW+ PRM: >>> > > > > > > > > > > +* >>> > > > > > > > > > > +*"There is no direct conversion from HF >>> > > > > > > > > > > to >>> > > > > > > > > > > DF >>> > > > > > > > > > > or >>> > > > > > > > > > > DF to >>> > > > > > > > > > > HF. >>> > > > > > > > > > > +* There is no direct conversion from HF >>> > > > > > > > > > > to >>> > > > > > > > > > > Q/UQ or >>> > > > > > > > > > > Q/UQ to >>> > > > > > > > > > > HF." >>> > > > > > > > > > > +*/ >>> > > > > > > > > > > + enum brw_reg_type src0_type = >>> > > > > > > > > > > brw_inst_src0_type(devinfo, >>> > > > > > > > > > > inst); >>> > > > > > > > > > > + ERROR_IF(brw_inst_opcode(devinfo, inst) == >>> > > > > > > > > > > BRW_OPCODE_MOV >>> > > > > > > > > > > && >>> > > > > > > > > > >>> > > > > > > > > > Why is only the MOV instruction handled here and >>> > > > > > > > > > below? Aren't >>> > > > > > > > > > other >>> > > > > > > > > > instructions able to do implicit >>> > > > > > > > > > conversions? Probably >>> > > > > > > > > > means >>> > > > > > > > > > you >>> > > > > > > > > > need >>> > > > > > > > > > to deal with two sources rather than one. >>> > > > > > > > > >>> > > > > > > > > This comes from the programming notes of the MOV >>> > > > > > > > > instruction >>> > > > > > > > > (Volume >>> > > > > > > > > 2a, Command Reference - Instructions - MOV), so it is >>> > > > > > > > > described >>> > > > > > > > > specifically for the MOV instruction. I should >>> > > > > > > > > probably >>> > > > > > > > > have >>> > > > > > > > > made >>> > > > > > > > > this >>> > > > > > > > > clear in the comment. >>> > > > > > > > > >>> > > > > > > > >>> > > > > > > > Maybe the one above is specified in the MOV page only, >>> > > > > > > > probably >>> > > > > > > > due >>> > > > > > > > to >>> > > > > > > > an oversight (If these restrictions were really >>> > > > > > > > specific >>> > > > > > > > to >>> > > > > > > > the >>> > > > > > > > MOV >>> > > > > > > > instruction, what would prevent you from implementing >>> > > > > > > > such >>> > > > > > > > conversions >>> > > > > > > > through a different instruction? E.g. "ADD dst:df, >>> > > > > > > > src:hf, >>> > > > > > > > 0" >>> > > > > > > > which >>> > > > > > > > would be substantially more efficient than what you're >>> > > > > > > > doing >>> > > > > > > > in >>> > > > > > > > PATCH >>> > > > > > > > 02) >>> > > > > > > >>> > > > > > > Instructions that take HF can only be strictly HF or mix >>> > > > > > > F >>> > > > > > > and >>> > > > > > > HF >>> > > > > > > (mixed mode float), with MOV being the only exception. >>> > > > > > > That >>> > > > > > > means >>> > > > > > > that >>> > > > > > > any instruction like the one you use above are invalid. >>> > > > > > > Maybe >>> > > > > > > we >>> > > > > > > should >>> > > > > > > validate explicitly that instructions that use HF are >>> > > > > > > strictly >>> > > > > > > HF >>> > > > > > > or >>> > > > > > > mixed-float mode only? >>> > > > > > > >>> > > > > > >>> > > > > > So you're acknowledging that the
Re: [Mesa-dev] [PATCH v4 34/40] intel/compiler: validate region restrictions for half-float conversions
Iago Toral writes: > On Wed, 2019-03-06 at 09:21 +0100, Iago Toral wrote: >> On Tue, 2019-03-05 at 07:35 +0100, Iago Toral wrote: >> > On Mon, 2019-03-04 at 15:36 -0800, Francisco Jerez wrote: >> > > Iago Toral writes: >> > > >> > > > On Fri, 2019-03-01 at 19:04 -0800, Francisco Jerez wrote: >> > > > > Iago Toral writes: >> > > > > >> > > > > > On Thu, 2019-02-28 at 09:54 -0800, Francisco Jerez wrote: >> > > > > > > Iago Toral writes: >> > > > > > > >> > > > > > > > On Wed, 2019-02-27 at 13:47 -0800, Francisco Jerez >> > > > > > > > wrote: >> > > > > > > > > Iago Toral writes: >> > > > > > > > > >> > > > > > > > > > On Tue, 2019-02-26 at 14:54 -0800, Francisco Jerez >> > > > > > > > > > wrote: >> > > > > > > > > > > Iago Toral Quiroga writes: >> > > > > > > > > > > >> > > > > > > > > > > > --- >> > > > > > > > > > > > src/intel/compiler/brw_eu_validate.c| 64 >> > > > > > > > > > > > - >> > > > > > > > > > > > src/intel/compiler/test_eu_validate.cpp | 122 >> > > > > > > > > > > > >> > > > > > > > > > > > 2 files changed, 185 insertions(+), 1 >> > > > > > > > > > > > deletion(- >> > > > > > > > > > > > ) >> > > > > > > > > > > > >> > > > > > > > > > > > diff --git >> > > > > > > > > > > > a/src/intel/compiler/brw_eu_validate.c >> > > > > > > > > > > > b/src/intel/compiler/brw_eu_validate.c >> > > > > > > > > > > > index 000a05cb6ac..203641fecb9 100644 >> > > > > > > > > > > > --- a/src/intel/compiler/brw_eu_validate.c >> > > > > > > > > > > > +++ b/src/intel/compiler/brw_eu_validate.c >> > > > > > > > > > > > @@ -531,7 +531,69 @@ >> > > > > > > > > > > > general_restrictions_based_on_operand_types(con >> > > > > > > > > > > > st >> > > > > > > > > > > > struct >> > > > > > > > > > > > gen_device_info *devinf >> > > > > > > > > > > > exec_type_size == 8 && dst_type_size == >> > > > > > > > > > > > 4) >> > > > > > > > > > > >dst_type_size = 8; >> > > > > > > > > > > > >> > > > > > > > > > > > - if (exec_type_size > dst_type_size) { >> > > > > > > > > > > > + /* From the BDW+ PRM: >> > > > > > > > > > > > +* >> > > > > > > > > > > > +*"There is no direct conversion from >> > > > > > > > > > > > HF >> > > > > > > > > > > > to >> > > > > > > > > > > > DF >> > > > > > > > > > > > or >> > > > > > > > > > > > DF to >> > > > > > > > > > > > HF. >> > > > > > > > > > > > +* There is no direct conversion from >> > > > > > > > > > > > HF >> > > > > > > > > > > > to >> > > > > > > > > > > > Q/UQ or >> > > > > > > > > > > > Q/UQ to >> > > > > > > > > > > > HF." >> > > > > > > > > > > > +*/ >> > > > > > > > > > > > + enum brw_reg_type src0_type = >> > > > > > > > > > > > brw_inst_src0_type(devinfo, >> > > > > > > > > > > > inst); >> > > > > > > > > > > > + ERROR_IF(brw_inst_opcode(devinfo, inst) == >> > > > > > > > > > > > BRW_OPCODE_MOV >> > > > > > > > > > > > && >> > > > > > > > > > > >> > > > > > > > > > > Why is only the MOV instruction handled here and >> > > > > > > > > > > below? Aren't >> > > > > > > > > > > other >> > > > > > > > > > > instructions able to do implicit >> > > > > > > > > > > conversions? Probably >> > > > > > > > > > > means >> > > > > > > > > > > you >> > > > > > > > > > > need >> > > > > > > > > > > to deal with two sources rather than one. >> > > > > > > > > > >> > > > > > > > > > This comes from the programming notes of the MOV >> > > > > > > > > > instruction >> > > > > > > > > > (Volume >> > > > > > > > > > 2a, Command Reference - Instructions - MOV), so it >> > > > > > > > > > is >> > > > > > > > > > described >> > > > > > > > > > specifically for the MOV instruction. I should >> > > > > > > > > > probably >> > > > > > > > > > have >> > > > > > > > > > made >> > > > > > > > > > this >> > > > > > > > > > clear in the comment. >> > > > > > > > > > >> > > > > > > > > >> > > > > > > > > Maybe the one above is specified in the MOV page >> > > > > > > > > only, >> > > > > > > > > probably >> > > > > > > > > due >> > > > > > > > > to >> > > > > > > > > an oversight (If these restrictions were really >> > > > > > > > > specific >> > > > > > > > > to >> > > > > > > > > the >> > > > > > > > > MOV >> > > > > > > > > instruction, what would prevent you from implementing >> > > > > > > > > such >> > > > > > > > > conversions >> > > > > > > > > through a different instruction? E.g. "ADD dst:df, >> > > > > > > > > src:hf, >> > > > > > > > > 0" >> > > > > > > > > which >> > > > > > > > > would be substantially more efficient than what >> > > > > > > > > you're >> > > > > > > > > doing >> > > > > > > > > in >> > > > > > > > > PATCH >> > > > > > > > > 02) >> > > > > > > > >> > > > > > > > Instructions that take HF can only be strictly HF or >> > > > > > > > mix >> > > > > > > > F >> > > > > > > > and >> > > > > > > > HF >> > > > > > > > (mixed mode float), with MOV being the only exception. >> > > > > > > > That >> > > > > > > > means >> > > > > > > > that >> > > > > > > > any instruction like
Re: [Mesa-dev] [PATCH v4 34/40] intel/compiler: validate region restrictions for half-float conversions
Iago Toral writes: > On Tue, 2019-03-05 at 07:35 +0100, Iago Toral wrote: >> On Mon, 2019-03-04 at 15:36 -0800, Francisco Jerez wrote: >> > Iago Toral writes: >> > >> > > On Fri, 2019-03-01 at 19:04 -0800, Francisco Jerez wrote: >> > > > Iago Toral writes: >> > > > >> > > > > On Thu, 2019-02-28 at 09:54 -0800, Francisco Jerez wrote: >> > > > > > Iago Toral writes: >> > > > > > >> > > > > > > On Wed, 2019-02-27 at 13:47 -0800, Francisco Jerez wrote: >> > > > > > > > Iago Toral writes: >> > > > > > > > >> > > > > > > > > On Tue, 2019-02-26 at 14:54 -0800, Francisco Jerez >> > > > > > > > > wrote: >> > > > > > > > > > Iago Toral Quiroga writes: >> > > > > > > > > > >> > > > > > > > > > > --- >> > > > > > > > > > > src/intel/compiler/brw_eu_validate.c| 64 >> > > > > > > > > > > - >> > > > > > > > > > > src/intel/compiler/test_eu_validate.cpp | 122 >> > > > > > > > > > > >> > > > > > > > > > > 2 files changed, 185 insertions(+), 1 deletion(- >> > > > > > > > > > > ) >> > > > > > > > > > > >> > > > > > > > > > > diff --git a/src/intel/compiler/brw_eu_validate.c >> > > > > > > > > > > b/src/intel/compiler/brw_eu_validate.c >> > > > > > > > > > > index 000a05cb6ac..203641fecb9 100644 >> > > > > > > > > > > --- a/src/intel/compiler/brw_eu_validate.c >> > > > > > > > > > > +++ b/src/intel/compiler/brw_eu_validate.c >> > > > > > > > > > > @@ -531,7 +531,69 @@ >> > > > > > > > > > > general_restrictions_based_on_operand_types(const >> > > > > > > > > > > struct >> > > > > > > > > > > gen_device_info *devinf >> > > > > > > > > > > exec_type_size == 8 && dst_type_size == >> > > > > > > > > > > 4) >> > > > > > > > > > >dst_type_size = 8; >> > > > > > > > > > > >> > > > > > > > > > > - if (exec_type_size > dst_type_size) { >> > > > > > > > > > > + /* From the BDW+ PRM: >> > > > > > > > > > > +* >> > > > > > > > > > > +*"There is no direct conversion from HF >> > > > > > > > > > > to >> > > > > > > > > > > DF >> > > > > > > > > > > or >> > > > > > > > > > > DF to >> > > > > > > > > > > HF. >> > > > > > > > > > > +* There is no direct conversion from HF >> > > > > > > > > > > to >> > > > > > > > > > > Q/UQ or >> > > > > > > > > > > Q/UQ to >> > > > > > > > > > > HF." >> > > > > > > > > > > +*/ >> > > > > > > > > > > + enum brw_reg_type src0_type = >> > > > > > > > > > > brw_inst_src0_type(devinfo, >> > > > > > > > > > > inst); >> > > > > > > > > > > + ERROR_IF(brw_inst_opcode(devinfo, inst) == >> > > > > > > > > > > BRW_OPCODE_MOV >> > > > > > > > > > > && >> > > > > > > > > > >> > > > > > > > > > Why is only the MOV instruction handled here and >> > > > > > > > > > below? Aren't >> > > > > > > > > > other >> > > > > > > > > > instructions able to do implicit >> > > > > > > > > > conversions? Probably >> > > > > > > > > > means >> > > > > > > > > > you >> > > > > > > > > > need >> > > > > > > > > > to deal with two sources rather than one. >> > > > > > > > > >> > > > > > > > > This comes from the programming notes of the MOV >> > > > > > > > > instruction >> > > > > > > > > (Volume >> > > > > > > > > 2a, Command Reference - Instructions - MOV), so it is >> > > > > > > > > described >> > > > > > > > > specifically for the MOV instruction. I should >> > > > > > > > > probably >> > > > > > > > > have >> > > > > > > > > made >> > > > > > > > > this >> > > > > > > > > clear in the comment. >> > > > > > > > > >> > > > > > > > >> > > > > > > > Maybe the one above is specified in the MOV page only, >> > > > > > > > probably >> > > > > > > > due >> > > > > > > > to >> > > > > > > > an oversight (If these restrictions were really >> > > > > > > > specific >> > > > > > > > to >> > > > > > > > the >> > > > > > > > MOV >> > > > > > > > instruction, what would prevent you from implementing >> > > > > > > > such >> > > > > > > > conversions >> > > > > > > > through a different instruction? E.g. "ADD dst:df, >> > > > > > > > src:hf, >> > > > > > > > 0" >> > > > > > > > which >> > > > > > > > would be substantially more efficient than what you're >> > > > > > > > doing >> > > > > > > > in >> > > > > > > > PATCH >> > > > > > > > 02) >> > > > > > > >> > > > > > > Instructions that take HF can only be strictly HF or mix >> > > > > > > F >> > > > > > > and >> > > > > > > HF >> > > > > > > (mixed mode float), with MOV being the only exception. >> > > > > > > That >> > > > > > > means >> > > > > > > that >> > > > > > > any instruction like the one you use above are invalid. >> > > > > > > Maybe >> > > > > > > we >> > > > > > > should >> > > > > > > validate explicitly that instructions that use HF are >> > > > > > > strictly >> > > > > > > HF >> > > > > > > or >> > > > > > > mixed-float mode only? >> > > > > > > >> > > > > > >> > > > > > So you're acknowledging that the conversions checked above >> > > > > > are >> > > > > > illegal >> > > > > > whether the instruction is a MOV or something else. Where >> > > > > > are we >> > >
Re: [Mesa-dev] [PATCH] nir/loop_unroll: Fix out-of-bounds access handling
On 13/3/19 8:30 am, Jason Ekstrand wrote: The previous code was completely broken when it came to constructing the undef values. I'm not sure how it ever worked. For the case of a copy that reads an undefined value, we can just delete the copy because the destination is a valid undefined value. This saves us the effort of trying to construct a value for an arbitrary copy_deref intrinsic. Yes. It turns out this was indeed not required. Thanks for the fix! Reviewed-by: Timothy Arceri Fixes: e8a8937a04 "nir: add partial loop unrolling support" Cc: Timothy Arceri --- src/compiler/nir/nir_opt_loop_unroll.c | 14 ++ 1 file changed, 2 insertions(+), 12 deletions(-) diff --git a/src/compiler/nir/nir_opt_loop_unroll.c b/src/compiler/nir/nir_opt_loop_unroll.c index 06ec78b8068..b2696d4aadb 100644 --- a/src/compiler/nir/nir_opt_loop_unroll.c +++ b/src/compiler/nir/nir_opt_loop_unroll.c @@ -670,11 +670,9 @@ remove_out_of_bounds_induction_use(nir_shader *shader, nir_loop *loop, if (is_access_out_of_bounds(term, nir_src_as_deref(intrin->src[0]), trip_count)) { if (intrin->intrinsic == nir_intrinsic_load_deref) { - assert(intrin->src[0].is_ssa); - nir_ssa_def *a_ssa = intrin->src[0].ssa; nir_ssa_def *undef = - nir_ssa_undef(, intrin->num_components, - a_ssa->bit_size); + nir_ssa_undef(, intrin->dest.ssa.num_components, + intrin->dest.ssa.bit_size); nir_ssa_def_rewrite_uses(>dest.ssa, nir_src_for_ssa(undef)); } else { @@ -686,14 +684,6 @@ remove_out_of_bounds_induction_use(nir_shader *shader, nir_loop *loop, if (intrin->intrinsic == nir_intrinsic_copy_deref && is_access_out_of_bounds(term, nir_src_as_deref(intrin->src[1]), trip_count)) { - assert(intrin->src[1].is_ssa); - nir_ssa_def *a_ssa = intrin->src[1].ssa; - nir_ssa_def *undef = - nir_ssa_undef(, intrin->num_components, a_ssa->bit_size); - - /* Replace the copy with a store of the undefined value */ - b.cursor = nir_before_instr(instr); - nir_store_deref(, nir_src_as_deref(intrin->src[0]), undef, ~0); nir_instr_remove(instr); } } ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH v4 34/40] intel/compiler: validate region restrictions for half-float conversions
Iago Toral writes: > On Mon, 2019-03-04 at 15:36 -0800, Francisco Jerez wrote: >> Iago Toral writes: >> >> > On Fri, 2019-03-01 at 19:04 -0800, Francisco Jerez wrote: >> > > Iago Toral writes: >> > > >> > > > On Thu, 2019-02-28 at 09:54 -0800, Francisco Jerez wrote: >> > > > > Iago Toral writes: >> > > > > >> > > > > > On Wed, 2019-02-27 at 13:47 -0800, Francisco Jerez wrote: >> > > > > > > Iago Toral writes: >> > > > > > > >> > > > > > > > On Tue, 2019-02-26 at 14:54 -0800, Francisco Jerez >> > > > > > > > wrote: >> > > > > > > > > Iago Toral Quiroga writes: >> > > > > > > > > >> > > > > > > > > > --- >> > > > > > > > > > src/intel/compiler/brw_eu_validate.c| 64 >> > > > > > > > > > - >> > > > > > > > > > src/intel/compiler/test_eu_validate.cpp | 122 >> > > > > > > > > > >> > > > > > > > > > 2 files changed, 185 insertions(+), 1 deletion(-) >> > > > > > > > > > >> > > > > > > > > > diff --git a/src/intel/compiler/brw_eu_validate.c >> > > > > > > > > > b/src/intel/compiler/brw_eu_validate.c >> > > > > > > > > > index 000a05cb6ac..203641fecb9 100644 >> > > > > > > > > > --- a/src/intel/compiler/brw_eu_validate.c >> > > > > > > > > > +++ b/src/intel/compiler/brw_eu_validate.c >> > > > > > > > > > @@ -531,7 +531,69 @@ >> > > > > > > > > > general_restrictions_based_on_operand_types(const >> > > > > > > > > > struct >> > > > > > > > > > gen_device_info *devinf >> > > > > > > > > > exec_type_size == 8 && dst_type_size == 4) >> > > > > > > > > >dst_type_size = 8; >> > > > > > > > > > >> > > > > > > > > > - if (exec_type_size > dst_type_size) { >> > > > > > > > > > + /* From the BDW+ PRM: >> > > > > > > > > > +* >> > > > > > > > > > +*"There is no direct conversion from HF to >> > > > > > > > > > DF >> > > > > > > > > > or >> > > > > > > > > > DF to >> > > > > > > > > > HF. >> > > > > > > > > > +* There is no direct conversion from HF to >> > > > > > > > > > Q/UQ or >> > > > > > > > > > Q/UQ to >> > > > > > > > > > HF." >> > > > > > > > > > +*/ >> > > > > > > > > > + enum brw_reg_type src0_type = >> > > > > > > > > > brw_inst_src0_type(devinfo, >> > > > > > > > > > inst); >> > > > > > > > > > + ERROR_IF(brw_inst_opcode(devinfo, inst) == >> > > > > > > > > > BRW_OPCODE_MOV >> > > > > > > > > > && >> > > > > > > > > >> > > > > > > > > Why is only the MOV instruction handled here and >> > > > > > > > > below? Aren't >> > > > > > > > > other >> > > > > > > > > instructions able to do implicit >> > > > > > > > > conversions? Probably >> > > > > > > > > means >> > > > > > > > > you >> > > > > > > > > need >> > > > > > > > > to deal with two sources rather than one. >> > > > > > > > >> > > > > > > > This comes from the programming notes of the MOV >> > > > > > > > instruction >> > > > > > > > (Volume >> > > > > > > > 2a, Command Reference - Instructions - MOV), so it is >> > > > > > > > described >> > > > > > > > specifically for the MOV instruction. I should probably >> > > > > > > > have >> > > > > > > > made >> > > > > > > > this >> > > > > > > > clear in the comment. >> > > > > > > > >> > > > > > > >> > > > > > > Maybe the one above is specified in the MOV page only, >> > > > > > > probably >> > > > > > > due >> > > > > > > to >> > > > > > > an oversight (If these restrictions were really specific >> > > > > > > to >> > > > > > > the >> > > > > > > MOV >> > > > > > > instruction, what would prevent you from implementing >> > > > > > > such >> > > > > > > conversions >> > > > > > > through a different instruction? E.g. "ADD dst:df, >> > > > > > > src:hf, >> > > > > > > 0" >> > > > > > > which >> > > > > > > would be substantially more efficient than what you're >> > > > > > > doing >> > > > > > > in >> > > > > > > PATCH >> > > > > > > 02) >> > > > > > >> > > > > > Instructions that take HF can only be strictly HF or mix F >> > > > > > and >> > > > > > HF >> > > > > > (mixed mode float), with MOV being the only exception. That >> > > > > > means >> > > > > > that >> > > > > > any instruction like the one you use above are invalid. >> > > > > > Maybe >> > > > > > we >> > > > > > should >> > > > > > validate explicitly that instructions that use HF are >> > > > > > strictly >> > > > > > HF >> > > > > > or >> > > > > > mixed-float mode only? >> > > > > > >> > > > > >> > > > > So you're acknowledging that the conversions checked above >> > > > > are >> > > > > illegal >> > > > > whether the instruction is a MOV or something else. Where >> > > > > are we >> > > > > preventing instructions other than MOV with such conversions >> > > > > from >> > > > > being >> > > > > accepted by the validator? >> > > > >> > > > We aren't, because the validator is not checking, in general, >> > > > for >> > > > accepted type combinations for specific instructions anywhere >> > > > as >> > > > far as >> > > > I can see. >> > > >> > > Luckily these type conversion restrictions aren't really specific >> > > to >> > > any
Re: [Mesa-dev] [PATCH 04/11] ac: add ac_build_buffer_store_format() helper
On 3/12/19 5:19 PM, Samuel Pitoiset wrote: Similar to ac_build_buffer_load_format(). Signed-off-by: Samuel Pitoiset --- src/amd/common/ac_llvm_build.c | 100 src/amd/common/ac_llvm_build.h | 11 src/amd/common/ac_nir_to_llvm.c | 29 +++-- 3 files changed, 119 insertions(+), 21 deletions(-) diff --git a/src/amd/common/ac_llvm_build.c b/src/amd/common/ac_llvm_build.c index 253073e52fb..bb8a470ae1d 100644 --- a/src/amd/common/ac_llvm_build.c +++ b/src/amd/common/ac_llvm_build.c @@ -1082,6 +1082,106 @@ LLVMValueRef ac_build_load_to_sgpr_uint_wraparound(struct ac_llvm_context *ctx, return ac_build_load_custom(ctx, base_ptr, index, true, true, false); } +static void +ac_build_buffer_store_common(struct ac_llvm_context *ctx, +LLVMValueRef rsrc, +LLVMValueRef data, +LLVMValueRef vindex, +LLVMValueRef voffset, +unsigned num_channels, +bool glc, +bool slc, +bool writeonly_memory, +bool use_format) +{ + LLVMValueRef args[] = { + data, + LLVMBuildBitCast(ctx->builder, rsrc, ctx->v4i32, ""), + vindex ? vindex : ctx->i32_0, + voffset, + LLVMConstInt(ctx->i1, glc, 0), + LLVMConstInt(ctx->i1, slc, 0) + }; + unsigned func = CLAMP(num_channels, 1, 3) - 1; + + const char *type_names[] = {"f32", "v2f32", "v4f32"}; + char name[256]; + + if (use_format) { + snprintf(name, sizeof(name), "llvm.amdgcn.buffer.store.format.%s", +type_names[func]); + } else { + snprintf(name, sizeof(name), "llvm.amdgcn.buffer.store.%s", +type_names[func]); + } + + ac_build_intrinsic(ctx, name, ctx->voidt, args, ARRAY_SIZE(args), + ac_get_store_intr_attribs(writeonly_memory)); +} + +static void +ac_build_llvm8_buffer_store_common(struct ac_llvm_context *ctx, + LLVMValueRef rsrc, + LLVMValueRef data, + LLVMValueRef vindex, + LLVMValueRef voffset, + LLVMValueRef soffset, + unsigned num_channels, + bool glc, + bool slc, + bool writeonly_memory, + bool use_format, + bool structurized) +{ + LLVMValueRef args[5]; This should be 6. + int idx = 0; + args[idx++] = data; + args[idx++] = LLVMBuildBitCast(ctx->builder, rsrc, ctx->v4i32, ""); + if (structurized) + args[idx++] = vindex ? vindex : ctx->i32_0; + args[idx++] = voffset ? voffset : ctx->i32_0; + args[idx++] = soffset ? soffset : ctx->i32_0; + args[idx++] = LLVMConstInt(ctx->i32, (glc ? 1 : 0) + (slc ? 2 : 0), 0); + unsigned func = CLAMP(num_channels, 1, 3) - 1; + + const char *type_names[] = {"f32", "v2f32", "v4f32"}; + const char *indexing_kind = structurized ? "struct" : "raw"; + char name[256]; + + if (use_format) { + snprintf(name, sizeof(name), "llvm.amdgcn.%s.buffer.store.format.%s", +indexing_kind, type_names[func]); + } else { + snprintf(name, sizeof(name), "llvm.amdgcn.%s.buffer.store.%s", +indexing_kind, type_names[func]); + } + + ac_build_intrinsic(ctx, name, ctx->voidt, args, idx, + ac_get_store_intr_attribs(writeonly_memory)); +} + +void +ac_build_buffer_store_format(struct ac_llvm_context *ctx, +LLVMValueRef rsrc, +LLVMValueRef data, +LLVMValueRef vindex, +LLVMValueRef voffset, +unsigned num_channels, +bool glc, +bool writeonly_memory) +{ + if (HAVE_LLVM >= 0x800) { + ac_build_llvm8_buffer_store_common(ctx, rsrc, data, vindex, + voffset, NULL, num_channels, + glc, false, writeonly_memory, + true, true); + } else { + ac_build_buffer_store_common(ctx, rsrc, data, vindex, voffset, +num_channels, glc, false, +writeonly_memory, true); + } +} + /* TBUFFER_STORE_FORMAT_{X,XY,XYZ,XYZW} <- the suffix is
Re: [Mesa-dev] [PATCH] gallium/docs: clarify set_sampler_views
On Tue, Mar 12, 2019 at 1:59 PM Roland Scheidegger wrote: > > Am 12.03.19 um 16:16 schrieb Rob Clark: > > This previously was not called out clearly, but based on a survey of the > > code, it seems the expected behavior is to release the reference to any > > sampler views beyond the new range being bound. > > That isn't really true. This was designed to work like d3d10, where > other views are unmodified. > The cso code will actually unset all views which previously were set and > are above the num_views in the call (this wouldn't be necessary if the > pipe function itself would work like this). > However, it will only do this for fragment textures, and pass through > the parameters unmodified otherwise. Which means behavior might not be > very consistent for the different stages... hmm, I did notice w/ deqp tests (which aren't so good at resetting/clearing state between tests), that I ended up w/ different # of sampler views bound (without changing freedreno to match the behavior of most of the other drivers).. I didn't really dig in that closely but it seemed like mesa/st wasn't clearing the additional previously bound textures. Maybe I overlooked something, but that seemed wrong. One way or another, I guess we should clarify and change the various drivers to have a common behavior because right now there two different behaviors and I guess it is at least confusing for new gallium driver writers (as it was for me and I've been at it for a while) BR, -R > > > > > > I think radeonsi and freedreno were the only ones not doing this. Which > > could probably temporarily leak a bit of memory by holding on to the > > sampler view reference. > Not sure about other drivers, but llvmpipe will not do this neither. > > Roland > > > > > > Signed-off-by: Rob Clark > > --- > > src/gallium/docs/source/context.rst | 3 +++ > > 1 file changed, 3 insertions(+) > > > > diff --git a/src/gallium/docs/source/context.rst > > b/src/gallium/docs/source/context.rst > > index f89d9e1005e..199d335f8f4 100644 > > --- a/src/gallium/docs/source/context.rst > > +++ b/src/gallium/docs/source/context.rst > > @@ -143,6 +143,9 @@ to the array index which is used for sampling. > >to a respective sampler view and releases a reference to the previous > >sampler view. > > > > + Previously bound samplers with index ``>= num_views`` are unbound rather > > + than unmodified. > > + > > * ``create_sampler_view`` creates a new sampler view. ``texture`` is > > associated > >with the sampler view which results in sampler view holding a reference > >to the texture. Format specified in template must be compatible > > > ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] nir/loop_unroll: Fix out-of-bounds access handling
The previous code was completely broken when it came to constructing the undef values. I'm not sure how it ever worked. For the case of a copy that reads an undefined value, we can just delete the copy because the destination is a valid undefined value. This saves us the effort of trying to construct a value for an arbitrary copy_deref intrinsic. Fixes: e8a8937a04 "nir: add partial loop unrolling support" Cc: Timothy Arceri --- src/compiler/nir/nir_opt_loop_unroll.c | 14 ++ 1 file changed, 2 insertions(+), 12 deletions(-) diff --git a/src/compiler/nir/nir_opt_loop_unroll.c b/src/compiler/nir/nir_opt_loop_unroll.c index 06ec78b8068..b2696d4aadb 100644 --- a/src/compiler/nir/nir_opt_loop_unroll.c +++ b/src/compiler/nir/nir_opt_loop_unroll.c @@ -670,11 +670,9 @@ remove_out_of_bounds_induction_use(nir_shader *shader, nir_loop *loop, if (is_access_out_of_bounds(term, nir_src_as_deref(intrin->src[0]), trip_count)) { if (intrin->intrinsic == nir_intrinsic_load_deref) { - assert(intrin->src[0].is_ssa); - nir_ssa_def *a_ssa = intrin->src[0].ssa; nir_ssa_def *undef = - nir_ssa_undef(, intrin->num_components, - a_ssa->bit_size); + nir_ssa_undef(, intrin->dest.ssa.num_components, + intrin->dest.ssa.bit_size); nir_ssa_def_rewrite_uses(>dest.ssa, nir_src_for_ssa(undef)); } else { @@ -686,14 +684,6 @@ remove_out_of_bounds_induction_use(nir_shader *shader, nir_loop *loop, if (intrin->intrinsic == nir_intrinsic_copy_deref && is_access_out_of_bounds(term, nir_src_as_deref(intrin->src[1]), trip_count)) { - assert(intrin->src[1].is_ssa); - nir_ssa_def *a_ssa = intrin->src[1].ssa; - nir_ssa_def *undef = - nir_ssa_undef(, intrin->num_components, a_ssa->bit_size); - - /* Replace the copy with a store of the undefined value */ - b.cursor = nir_before_instr(instr); - nir_store_deref(, nir_src_as_deref(intrin->src[0]), undef, ~0); nir_instr_remove(instr); } } -- 2.20.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] radv: set the maximum number of IBs per submit to 192
This fixes random SteamVR corruption, see https://github.com/ValveSoftware/SteamVR-for-Linux/issues/181 Fixes: 4d30f2c6f42 ("radv/winsys: remove the max IBs per submit limit for the fallback path") Signed-off-by: Samuel Pitoiset --- src/amd/vulkan/radv_device.c | 2 +- src/amd/vulkan/winsys/amdgpu/radv_amdgpu_winsys_public.h | 7 +++ 2 files changed, 8 insertions(+), 1 deletion(-) diff --git a/src/amd/vulkan/radv_device.c b/src/amd/vulkan/radv_device.c index 83d218fb6bf..9570c15af02 100644 --- a/src/amd/vulkan/radv_device.c +++ b/src/amd/vulkan/radv_device.c @@ -2815,7 +2815,7 @@ VkResult radv_QueueSubmit( struct radeon_winsys_fence *base_fence = fence ? fence->fence : NULL; struct radeon_winsys_ctx *ctx = queue->hw_ctx; int ret; - uint32_t max_cs_submission = queue->device->trace_bo ? 1 : UINT32_MAX; + uint32_t max_cs_submission = queue->device->trace_bo ? 1 : RADV_MAX_IBS_PER_SUBMIT; uint32_t scratch_size = 0; uint32_t compute_scratch_size = 0; uint32_t esgs_ring_size = 0, gsvs_ring_size = 0; diff --git a/src/amd/vulkan/winsys/amdgpu/radv_amdgpu_winsys_public.h b/src/amd/vulkan/winsys/amdgpu/radv_amdgpu_winsys_public.h index 854e216551f..709669b2a57 100644 --- a/src/amd/vulkan/winsys/amdgpu/radv_amdgpu_winsys_public.h +++ b/src/amd/vulkan/winsys/amdgpu/radv_amdgpu_winsys_public.h @@ -29,6 +29,13 @@ #ifndef RADV_AMDGPU_WINSYS_PUBLIC_H #define RADV_AMDGPU_WINSYS_PUBLIC_H +/* The number of IBs per submit isn't infinite, it depends on the ring type + * (ie. some initial setup needed for a submit) and the number of IBs (4 DW). + * This limit is arbitrary but should be safe for now. Ideally, we should get + * this limit from the KMD. +*/ +#define RADV_MAX_IBS_PER_SUBMIT 192 + struct radeon_winsys *radv_amdgpu_winsys_create(int fd, uint64_t debug_flags, uint64_t perftest_flags); -- 2.21.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] radv: Fix driverUUID
On 3/12/19 7:11 PM, Emil Velikov wrote: On Tue, 12 Mar 2019 at 15:12, Samuel Pitoiset wrote: Reviewed-by: Samuel Pitoiset A fixes tag like below would be great ;-) Fixes: 14cad8786a8 ("radv: generate the same driver UUID as radeonsi") Reviewed-by: Emil Velikov Aside: seems like radv_get_driver_uuid() produces "AMD-MESA-DRV" - a not so unique identifier. Perhaps we should fix that at some point? yeah, should probably be fixed. HTH Emil ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] anv: Stop using VK_TRUE/FALSE
We've been fairly inconsistent about this so we should really choose whether we're going to use VK_TRUE/FALSE or the C boolean values. The Vulkan #defines are set to 1 and 0 respectively so it's the same value as C gives you when you cast a boolean expression to an integer. Since there are several places where we set a VkBool32 to a C logical expression, let's just embrace C booleans and stop using the VK defines. --- src/intel/vulkan/anv_device.c | 42 +-- 1 file changed, 21 insertions(+), 21 deletions(-) diff --git a/src/intel/vulkan/anv_device.c b/src/intel/vulkan/anv_device.c index 729cceb3e32..83fa3936c19 100644 --- a/src/intel/vulkan/anv_device.c +++ b/src/intel/vulkan/anv_device.c @@ -833,7 +833,7 @@ VkResult anv_EnumeratePhysicalDeviceGroups( memset(p->physicalDevices, 0, sizeof(p->physicalDevices)); p->physicalDevices[0] = anv_physical_device_to_handle(>physicalDevice); - p->subsetAllocation = VK_FALSE; + p->subsetAllocation = false; vk_foreach_struct(ext, p->pNext) anv_debug_ignored_stype(ext->sType); @@ -967,7 +967,7 @@ void anv_GetPhysicalDeviceFeatures2( case VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_DEPTH_CLIP_ENABLE_FEATURES_EXT: { VkPhysicalDeviceDepthClipEnableFeaturesEXT *features = (VkPhysicalDeviceDepthClipEnableFeaturesEXT *)ext; - features->depthClipEnable = VK_TRUE; + features->depthClipEnable = true; break; } @@ -990,7 +990,7 @@ void anv_GetPhysicalDeviceFeatures2( case VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_PROTECTED_MEMORY_FEATURES: { VkPhysicalDeviceProtectedMemoryFeatures *features = (void *)ext; - features->protectedMemory = VK_FALSE; + features->protectedMemory = false; break; } @@ -1024,23 +1024,23 @@ void anv_GetPhysicalDeviceFeatures2( case VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_TRANSFORM_FEEDBACK_FEATURES_EXT: { VkPhysicalDeviceTransformFeedbackFeaturesEXT *features = (VkPhysicalDeviceTransformFeedbackFeaturesEXT *)ext; - features->transformFeedback = VK_TRUE; - features->geometryStreams = VK_TRUE; + features->transformFeedback = true; + features->geometryStreams = true; break; } case VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_VERTEX_ATTRIBUTE_DIVISOR_FEATURES_EXT: { VkPhysicalDeviceVertexAttributeDivisorFeaturesEXT *features = (VkPhysicalDeviceVertexAttributeDivisorFeaturesEXT *)ext; - features->vertexAttributeInstanceRateDivisor = VK_TRUE; - features->vertexAttributeInstanceRateZeroDivisor = VK_TRUE; + features->vertexAttributeInstanceRateDivisor = true; + features->vertexAttributeInstanceRateZeroDivisor = true; break; } case VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_YCBCR_IMAGE_ARRAYS_FEATURES_EXT: { VkPhysicalDeviceYcbcrImageArraysFeaturesEXT *features = (VkPhysicalDeviceYcbcrImageArraysFeaturesEXT *)ext; - features->ycbcrImageArrays = VK_TRUE; + features->ycbcrImageArrays = true; break; } @@ -1234,8 +1234,8 @@ void anv_GetPhysicalDeviceProperties2( VK_RESOLVE_MODE_MAX_BIT_KHR; } - props->independentResolveNone = VK_TRUE; - props->independentResolve = VK_TRUE; + props->independentResolveNone = true; + props->independentResolve = true; break; } @@ -1372,7 +1372,7 @@ void anv_GetPhysicalDeviceProperties2( VK_SUBGROUP_FEATURE_SHUFFLE_RELATIVE_BIT | VK_SUBGROUP_FEATURE_CLUSTERED_BIT | VK_SUBGROUP_FEATURE_QUAD_BIT; - properties->quadOperationsInAllStages = VK_TRUE; + properties->quadOperationsInAllStages = true; break; } @@ -1386,10 +1386,10 @@ void anv_GetPhysicalDeviceProperties2( props->maxTransformFeedbackStreamDataSize = 128 * 4; props->maxTransformFeedbackBufferDataSize = 128 * 4; props->maxTransformFeedbackBufferDataStride = 2048; - props->transformFeedbackQueries = VK_TRUE; - props->transformFeedbackStreamsLinesTriangles = VK_FALSE; - props->transformFeedbackRasterizationStreamSelect = VK_FALSE; - props->transformFeedbackDraw = VK_TRUE; + props->transformFeedbackQueries = true; + props->transformFeedbackStreamsLinesTriangles = false; + props->transformFeedbackRasterizationStreamSelect = false; + props->transformFeedbackDraw = true; break; } @@ -2961,8 +2961,8 @@ void anv_GetBufferMemoryRequirements2( switch (ext->sType) { case VK_STRUCTURE_TYPE_MEMORY_DEDICATED_REQUIREMENTS: { VkMemoryDedicatedRequirements *requirements = (void *)ext; - requirements->prefersDedicatedAllocation =
[Mesa-dev] [PATCH 0/2] Fix llvmpipe on ARM processors without NEON
Hi, chained to this message are two patches that aim to fix llvmpipe driver on an ARM machine that has no NEON instructions. My machine is a XO-1.75 laptop with a Marvell MMP2 processor: [lkundrak@xo ~]$ LD_SHOW_AUXV=yespls /bin/true |grep HWCAP AT_HWCAP:half thumb fastmult vfp edsp iwmmxt thumbee vfpv3 vfpv3d16 tls idivt AT_HWCAP2: [lkundrak@xo ~]$ With unpatched Mesa: # gdb /usr/libexec/Xorg --ex run ... (II) modeset(0): Initializing kms color map for depth 24, 8 bpc. warning: JITed object file architecture armv7 is not compatible with target architecture iwmmxt. ... Program received signal SIGILL, Illegal instruction. 0xb6fbe028 in ?? () (gdb) x/i $pc => 0xb6fbe028: vld1.32 {d16[]-d17[]}, [r7 :32] (gdb) Seems like defaulting to enabling NEON on armv7 and thus breaking on Marvell and NVidia (and perhaps other) platform was a deliberate decision and as such is not considered a bug by LLVM upstream: https://bugs.llvm.org/show_bug.cgi?id=30842 Lubo ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 1/2] gallivm: guess CPU features also on ARM
getHostCPUFeatures() is also available on ARM, for even longer time than for x86. Use it -- it potentially enables instructions that may speed things up. Signed-off-by: Lubomir Rintel Cc: --- src/gallium/auxiliary/gallivm/lp_bld_misc.cpp | 13 ++--- 1 file changed, 6 insertions(+), 7 deletions(-) diff --git a/src/gallium/auxiliary/gallivm/lp_bld_misc.cpp b/src/gallium/auxiliary/gallivm/lp_bld_misc.cpp index fcbdd5050fe..f3b5784fce7 100644 --- a/src/gallium/auxiliary/gallivm/lp_bld_misc.cpp +++ b/src/gallium/auxiliary/gallivm/lp_bld_misc.cpp @@ -556,11 +556,11 @@ lp_build_create_jit_compiler_for_module(LLVMExecutionEngineRef *OutJIT, llvm::SmallVector MAttrs; -#if defined(PIPE_ARCH_X86) || defined(PIPE_ARCH_X86_64) -#if HAVE_LLVM >= 0x0400 - /* llvm-3.7+ implements sys::getHostCPUFeatures for x86, -* which allows us to enable/disable code generation based -* on the results of cpuid. +#if HAVE_LLVM >= 0x0400 && (defined(PIPE_ARCH_X86) || defined(PIPE_ARCH_X86_64) || defined(PIPE_ARCH_ARM)) + /* llvm-3.3+ implements sys::getHostCPUFeatures for Arm +* and llvm-3.7+ for x86, which allows us to enable/disable +* code generation based on the results of cpuid on these +* architectures. */ llvm::StringMap features; llvm::sys::getHostCPUFeatures(features); @@ -570,7 +570,7 @@ lp_build_create_jit_compiler_for_module(LLVMExecutionEngineRef *OutJIT, ++f) { MAttrs.push_back(((*f).second ? "+" : "-") + (*f).first().str()); } -#else +#elif defined(PIPE_ARCH_X86) || defined(PIPE_ARCH_X86_64) /* * We need to unset attributes because sometimes LLVM mistakenly assumes * certain features are present given the processor name. @@ -625,7 +625,6 @@ lp_build_create_jit_compiler_for_module(LLVMExecutionEngineRef *OutJIT, MAttrs.push_back("-avx512vl"); #endif #endif -#endif #if defined(PIPE_ARCH_PPC) MAttrs.push_back(util_cpu_caps.has_altivec ? "+altivec" : "-altivec"); -- 2.20.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 2/2] gallivm: disable NEON instructions if they are not supported
The LLVM project made some questionable decisions about defaults for armv7 (e.g. they enable NEON that is not there on NVIDIA and Marvell platforms). On top of that, getHostCPUFeatures() doesn't disable missing machine attributes. Finally, -neon alone is not sufficient to disable emmision of NEON instructions. Signed-off-by: Lubomir Rintel Cc: --- src/gallium/auxiliary/gallivm/lp_bld_misc.cpp | 7 +++ 1 file changed, 7 insertions(+) diff --git a/src/gallium/auxiliary/gallivm/lp_bld_misc.cpp b/src/gallium/auxiliary/gallivm/lp_bld_misc.cpp index f3b5784fce7..f307c26d4f7 100644 --- a/src/gallium/auxiliary/gallivm/lp_bld_misc.cpp +++ b/src/gallium/auxiliary/gallivm/lp_bld_misc.cpp @@ -625,6 +625,13 @@ lp_build_create_jit_compiler_for_module(LLVMExecutionEngineRef *OutJIT, MAttrs.push_back("-avx512vl"); #endif #endif +#if defined(PIPE_ARCH_ARM) + if (!util_cpu_caps.has_neon) { + MAttrs.push_back("-neon"); + MAttrs.push_back("-crypto"); + MAttrs.push_back("-vfp2"); + } +#endif #if defined(PIPE_ARCH_PPC) MAttrs.push_back(util_cpu_caps.has_altivec ? "+altivec" : "-altivec"); -- 2.20.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH v2 8/9] anv: Removed unused header file
Reviewed-by: Sagar Ghuge On 3/12/19 3:34 AM, Eleni Maria Stea wrote: > In src/intel/vulkan/genX_blorp_exec.c we included the file: > common/gen_sample_positions.h but not use it. Removed. > --- > src/intel/vulkan/genX_blorp_exec.c | 1 - > 1 file changed, 1 deletion(-) > > diff --git a/src/intel/vulkan/genX_blorp_exec.c > b/src/intel/vulkan/genX_blorp_exec.c > index e9c85d56d5f..0eeefaaa9d6 100644 > --- a/src/intel/vulkan/genX_blorp_exec.c > +++ b/src/intel/vulkan/genX_blorp_exec.c > @@ -31,7 +31,6 @@ > #undef __gen_combine_address > > #include "common/gen_l3_config.h" > -#include "common/gen_sample_positions.h" > #include "blorp/blorp_genX_exec.h" > > static void * > ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 3/3] radv: use typed buffer loads for vertex input fetches
r-b for the series On Tue, Feb 26, 2019 at 1:39 PM Samuel Pitoiset wrote: > > This drastically reduces the number of SGPRs because the driver > now uses descriptors per vertex binding, instead of per vertex > attribute format. > > 29077 shaders in 15096 tests > Totals: > SGPRS: 1354285 -> 1282109 (-5.33 %) > VGPRS: 909896 -> 908800 (-0.12 %) > Spilled SGPRs: 24840 -> 24811 (-0.12 %) > Code Size: 49221144 -> 48986628 (-0.48 %) bytes > Max Waves: 243930 -> 244229 (0.12 %) > > Totals from affected shaders: > SGPRS: 390648 -> 318472 (-18.48 %) > VGPRS: 288432 -> 287336 (-0.38 %) > Spilled SGPRs: 94 -> 65 (-30.85 %) > Code Size: 11548412 -> 11313896 (-2.03 %) bytes > Max Waves: 86460 -> 86759 (0.35 %) > > This gives a really tiny boost. > > Signed-off-by: Samuel Pitoiset > --- > src/amd/vulkan/radv_cmd_buffer.c | 21 +- > src/amd/vulkan/radv_nir_to_llvm.c | 47 +-- > src/amd/vulkan/radv_pipeline.c| 37 ++-- > src/amd/vulkan/radv_private.h | 5 +--- > 4 files changed, 57 insertions(+), 53 deletions(-) > > diff --git a/src/amd/vulkan/radv_cmd_buffer.c > b/src/amd/vulkan/radv_cmd_buffer.c > index ad0b934ddfc..5ab93d11d68 100644 > --- a/src/amd/vulkan/radv_cmd_buffer.c > +++ b/src/amd/vulkan/radv_cmd_buffer.c > @@ -1985,13 +1985,13 @@ radv_flush_vertex_descriptors(struct radv_cmd_buffer > *cmd_buffer, > { > if ((pipeline_is_dirty || > (cmd_buffer->state.dirty & RADV_CMD_DIRTY_VERTEX_BUFFER)) && > - cmd_buffer->state.pipeline->vertex_elements.count && > + cmd_buffer->state.pipeline->num_vertex_bindings && > radv_get_shader(cmd_buffer->state.pipeline, > MESA_SHADER_VERTEX)->info.info.vs.has_vertex_buffers) { > struct radv_vertex_elements_info *velems = > _buffer->state.pipeline->vertex_elements; > unsigned vb_offset; > void *vb_ptr; > uint32_t i = 0; > - uint32_t count = velems->count; > + uint32_t count = > cmd_buffer->state.pipeline->num_vertex_bindings; > uint64_t va; > > /* allocate some descriptor state for vertex buffers */ > @@ -2002,13 +2002,15 @@ radv_flush_vertex_descriptors(struct radv_cmd_buffer > *cmd_buffer, > for (i = 0; i < count; i++) { > uint32_t *desc = &((uint32_t *)vb_ptr)[i * 4]; > uint32_t offset; > - int vb = velems->binding[i]; > - struct radv_buffer *buffer = > cmd_buffer->vertex_bindings[vb].buffer; > - uint32_t stride = > cmd_buffer->state.pipeline->binding_stride[vb]; > + struct radv_buffer *buffer = > cmd_buffer->vertex_bindings[i].buffer; > + uint32_t stride = > cmd_buffer->state.pipeline->binding_stride[i]; > + > + if (!buffer) > + continue; > > va = radv_buffer_get_va(buffer->bo); > > - offset = cmd_buffer->vertex_bindings[vb].offset + > velems->offset[i]; > + offset = cmd_buffer->vertex_bindings[i].offset; > va += offset + buffer->offset; > desc[0] = va; > desc[1] = S_008F04_BASE_ADDRESS_HI(va >> 32) | > S_008F04_STRIDE(stride); > @@ -2016,7 +2018,12 @@ radv_flush_vertex_descriptors(struct radv_cmd_buffer > *cmd_buffer, > desc[2] = (buffer->size - offset - > velems->format_size[i]) / stride + 1; > else > desc[2] = buffer->size - offset; > - desc[3] = velems->rsrc_word3[i]; > + desc[3] = S_008F0C_DST_SEL_X(V_008F0C_SQ_SEL_X) | > + S_008F0C_DST_SEL_Y(V_008F0C_SQ_SEL_Y) | > + S_008F0C_DST_SEL_Z(V_008F0C_SQ_SEL_Z) | > + S_008F0C_DST_SEL_W(V_008F0C_SQ_SEL_W) | > + > S_008F0C_NUM_FORMAT(V_008F0C_BUF_NUM_FORMAT_UINT) | > + > S_008F0C_DATA_FORMAT(V_008F0C_BUF_DATA_FORMAT_32); > } > > va = radv_buffer_get_va(cmd_buffer->upload.upload_bo); > diff --git a/src/amd/vulkan/radv_nir_to_llvm.c > b/src/amd/vulkan/radv_nir_to_llvm.c > index 36f499be212..e6c8f3ecb92 100644 > --- a/src/amd/vulkan/radv_nir_to_llvm.c > +++ b/src/amd/vulkan/radv_nir_to_llvm.c > @@ -2008,6 +2008,8 @@ adjust_vertex_fetch_alpha(struct radv_shader_context > *ctx, > > LLVMValueRef c30 = LLVMConstInt(ctx->ac.i32, 30, 0); > > + alpha = LLVMBuildBitCast(ctx->ac.builder, alpha, ctx->ac.f32, ""); > + > if (adjustment == RADV_ALPHA_ADJUST_SSCALED) > alpha = LLVMBuildFPToUI(ctx->ac.builder, alpha, ctx->ac.i32, > ""); > else > @@ -2035,7 +2037,7 @@
Re: [Mesa-dev] [PATCH] radv: Fix driverUUID
On Tue, 12 Mar 2019 at 15:12, Samuel Pitoiset wrote: > > Reviewed-by: Samuel Pitoiset > A fixes tag like below would be great ;-) Fixes: 14cad8786a8 ("radv: generate the same driver UUID as radeonsi") Reviewed-by: Emil Velikov Aside: seems like radv_get_driver_uuid() produces "AMD-MESA-DRV" - a not so unique identifier. Perhaps we should fix that at some point? HTH Emil ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] gallium/docs: clarify set_sampler_views
Am 12.03.19 um 16:16 schrieb Rob Clark: > This previously was not called out clearly, but based on a survey of the > code, it seems the expected behavior is to release the reference to any > sampler views beyond the new range being bound. That isn't really true. This was designed to work like d3d10, where other views are unmodified. The cso code will actually unset all views which previously were set and are above the num_views in the call (this wouldn't be necessary if the pipe function itself would work like this). However, it will only do this for fragment textures, and pass through the parameters unmodified otherwise. Which means behavior might not be very consistent for the different stages... > > I think radeonsi and freedreno were the only ones not doing this. Which > could probably temporarily leak a bit of memory by holding on to the > sampler view reference. Not sure about other drivers, but llvmpipe will not do this neither. Roland > > Signed-off-by: Rob Clark > --- > src/gallium/docs/source/context.rst | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/src/gallium/docs/source/context.rst > b/src/gallium/docs/source/context.rst > index f89d9e1005e..199d335f8f4 100644 > --- a/src/gallium/docs/source/context.rst > +++ b/src/gallium/docs/source/context.rst > @@ -143,6 +143,9 @@ to the array index which is used for sampling. >to a respective sampler view and releases a reference to the previous >sampler view. > > + Previously bound samplers with index ``>= num_views`` are unbound rather > + than unmodified. > + > * ``create_sampler_view`` creates a new sampler view. ``texture`` is > associated >with the sampler view which results in sampler view holding a reference >to the texture. Format specified in template must be compatible > ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] st/mesa: init hash keys with memset(), not designated initializers
Am 12.03.19 um 18:34 schrieb Eric Anholt: > Ilia Mirkin writes: > >> I believe the distinction here is between initializing all the fields >> and using them individually (thus leaving the padding uninitialized), >> vs using the fields to create a structure to hash as a sequence of >> bytes. For the latter, you really need all the memory initialized, not >> just the "valid" parts of the structure. In at least my mind, it's >> fairly well-established that compilers don't always initialize all of >> a structure's underlying bytes, but I also don't have a specific >> instance of that situation I can point to. >> >> For most usage, foo = {0} is fine since you're not hashing the bytes >> but rather accessing the fields directly. But for hashing, you really >> want all the bits initialized. > > Gah. The commit message even said it was about padding, and I failed to > read. Sorry for the noise, this does seem right. The alternative is to use explicit padding in the struct, so that structure initialization is guaranteed to initialize everything to zero. I believe some places still do that too, but it's not very nice neither (can easily make mistakes there). So - pick your poison... Roland > > > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.freedesktop.org%2Fmailman%2Flistinfo%2Fmesa-devdata=02%7C01%7Csroland%40vmware.com%7Cd6e706866d844beb84a808d6a7110f53%7Cb39138ca3cee4b4aa4d6cd83d9dd62f0%7C0%7C0%7C636880089068822714sdata=AENcqpYCV4d4DJW3q4Bkg9GXIEyU6Au8Yccuthd7Mbk%3Dreserved=0 > ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] st/mesa: init hash keys with memset(), not designated initializers
Ilia Mirkin writes: > I believe the distinction here is between initializing all the fields > and using them individually (thus leaving the padding uninitialized), > vs using the fields to create a structure to hash as a sequence of > bytes. For the latter, you really need all the memory initialized, not > just the "valid" parts of the structure. In at least my mind, it's > fairly well-established that compilers don't always initialize all of > a structure's underlying bytes, but I also don't have a specific > instance of that situation I can point to. > > For most usage, foo = {0} is fine since you're not hashing the bytes > but rather accessing the fields directly. But for hashing, you really > want all the bits initialized. Gah. The commit message even said it was about padding, and I failed to read. Sorry for the noise, this does seem right. signature.asc Description: PGP signature ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] st/mesa: init hash keys with memset(), not designated initializers
I believe the distinction here is between initializing all the fields and using them individually (thus leaving the padding uninitialized), vs using the fields to create a structure to hash as a sequence of bytes. For the latter, you really need all the memory initialized, not just the "valid" parts of the structure. In at least my mind, it's fairly well-established that compilers don't always initialize all of a structure's underlying bytes, but I also don't have a specific instance of that situation I can point to. For most usage, foo = {0} is fine since you're not hashing the bytes but rather accessing the fields directly. But for hashing, you really want all the bits initialized. Cheers, -ilia On Tue, Mar 12, 2019 at 12:31 PM Eric Anholt wrote: > > Brian Paul writes: > > > On 03/11/2019 04:47 PM, Eric Anholt wrote: > >> Brian Paul writes: > >> > >>> Since the compiler may not zero-out padding in the object. > >>> Add a couple comments about this to prevent misunderstandings in > >>> the future. > >>> > >>> Fixes: 67d96816ff5 ("st/mesa: move, clean-up shader variant key > >>> decls/inits") > >>> --- > >>> src/mesa/state_tracker/st_atom_shader.c | 9 +++-- > >>> src/mesa/state_tracker/st_program.c | 13 ++--- > >>> 2 files changed, 17 insertions(+), 5 deletions(-) > >>> > >>> diff --git a/src/mesa/state_tracker/st_atom_shader.c > >>> b/src/mesa/state_tracker/st_atom_shader.c > >>> index ac7a1a5..a4475e2 100644 > >>> --- a/src/mesa/state_tracker/st_atom_shader.c > >>> +++ b/src/mesa/state_tracker/st_atom_shader.c > >>> @@ -112,7 +112,10 @@ st_update_fp( struct st_context *st ) > >>> !stfp->variants->key.bitmap) { > >>> shader = stfp->variants->driver_shader; > >>> } else { > >>> - struct st_fp_variant_key key = {0}; > >>> + struct st_fp_variant_key key; > >>> + > >>> + /* use memset, not an initializer to be sure all memory is zeroed > >>> */ > >>> + memset(, 0, sizeof(key)); > >> > >> Wait, what? We rely on this form of initialization all over, what's > >> changed? > > > > The question is do all compilers, when presented with > > > > struct st_fp_variant_key key = {0}; > > You seem to be saying that not all compilers do, but throughout the > compiler we're relying on all compilers doing so. Can we get some more > information about what compiler you're fixing here? > -BEGIN PGP SIGNATURE- > > iQIzBAEBCgAdFiEE/JuuFDWp9/ZkuCBXtdYpNtH8nugFAlyH3tsACgkQtdYpNtH8 > nujFFxAAkzNBlIJyvZhaUuYZ3B7ugbHxCnyfsi6J9AmAKcXyyJ0oIAi1sXkLwflB > e5WbyksenHiE+NnihdQZICV2fzb0G8WObn5HiBSxNGKqEwATQY3ND3gkzvH2npKT > EGY0AZCPX/yJPA3yci/MRTLLPzLgSYTUstfEsAmTlRlqmOlWDenV6BV7OpOoTz/F > bVdrk6QXufLcQ+ZtQS2OFKEKlXdaUaE5ruwanasc8qzXmJ0TMonvRZp4ZrNUNc4j > p1sUQnAU0SxwuSqIVS6iVmPEH/N2e3u4XdNsZ+45KxFkAnV0wnpIKAbREa1tzOdP > d2nhZ70BUNnHJE5q11I070mGWkburTKk5hbLvFGS59np/2nLjZQLcb3hYNyCPDP2 > sdRwqzPtwbdNLAbHpiQ3x0OuVJ6P5fuxGUjuB/AQfl1ixV03nB77AXogqhlQ/cK4 > WceUn9AXmhBb5tIdhfKK1uzuplmCSpXaKxdPubvgfI4Rey5KMvcZwBFNqztcfe24 > hc3wPFyfmayNAuFHkC+0jd4po20ibJkF1F0kXjoyl7dKMG6IFX4fICJKPsafIgNV > cex67m6VAF9ZalBaSgnXZpcmHXSFbVSHYvES/WxRncefgAzXG/BMuywsLZ+US99A > LfrnLaq0eRDAdpcSfITQmDgpjEvWFThxCDXXBCI5a+pQpYBB6lk= > =4Bvh > -END PGP SIGNATURE-___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] st/mesa: init hash keys with memset(), not designated initializers
Brian Paul writes: > On 03/11/2019 04:47 PM, Eric Anholt wrote: >> Brian Paul writes: >> >>> Since the compiler may not zero-out padding in the object. >>> Add a couple comments about this to prevent misunderstandings in >>> the future. >>> >>> Fixes: 67d96816ff5 ("st/mesa: move, clean-up shader variant key >>> decls/inits") >>> --- >>> src/mesa/state_tracker/st_atom_shader.c | 9 +++-- >>> src/mesa/state_tracker/st_program.c | 13 ++--- >>> 2 files changed, 17 insertions(+), 5 deletions(-) >>> >>> diff --git a/src/mesa/state_tracker/st_atom_shader.c >>> b/src/mesa/state_tracker/st_atom_shader.c >>> index ac7a1a5..a4475e2 100644 >>> --- a/src/mesa/state_tracker/st_atom_shader.c >>> +++ b/src/mesa/state_tracker/st_atom_shader.c >>> @@ -112,7 +112,10 @@ st_update_fp( struct st_context *st ) >>> !stfp->variants->key.bitmap) { >>> shader = stfp->variants->driver_shader; >>> } else { >>> - struct st_fp_variant_key key = {0}; >>> + struct st_fp_variant_key key; >>> + >>> + /* use memset, not an initializer to be sure all memory is zeroed */ >>> + memset(, 0, sizeof(key)); >> >> Wait, what? We rely on this form of initialization all over, what's >> changed? > > The question is do all compilers, when presented with > > struct st_fp_variant_key key = {0}; You seem to be saying that not all compilers do, but throughout the compiler we're relying on all compilers doing so. Can we get some more information about what compiler you're fixing here? signature.asc Description: PGP signature ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 11/11] ac: use new LLVM 8 intrinsics in ac_build_buffer_store_dword()
New buffer intrinsics have a separate soffset parameter. Signed-off-by: Samuel Pitoiset --- src/amd/common/ac_llvm_build.c | 66 ++ 1 file changed, 26 insertions(+), 40 deletions(-) diff --git a/src/amd/common/ac_llvm_build.c b/src/amd/common/ac_llvm_build.c index ce6639d49bf..8ed5199da55 100644 --- a/src/amd/common/ac_llvm_build.c +++ b/src/amd/common/ac_llvm_build.c @@ -1227,59 +1227,45 @@ ac_build_buffer_store_dword(struct ac_llvm_context *ctx, if (!swizzle_enable_hint) { LLVMValueRef offset = soffset; - static const char *types[] = {"f32", "v2f32", "v4f32"}; - if (inst_offset) offset = LLVMBuildAdd(ctx->builder, offset, LLVMConstInt(ctx->i32, inst_offset, 0), ""); - if (voffset) - offset = LLVMBuildAdd(ctx->builder, offset, voffset, ""); - - LLVMValueRef args[] = { - ac_to_float(ctx, vdata), - LLVMBuildBitCast(ctx->builder, rsrc, ctx->v4i32, ""), - ctx->i32_0, - offset, - LLVMConstInt(ctx->i1, glc, 0), - LLVMConstInt(ctx->i1, slc, 0), - }; - - char name[256]; - snprintf(name, sizeof(name), "llvm.amdgcn.buffer.store.%s", -types[CLAMP(num_channels, 1, 3) - 1]); - ac_build_intrinsic(ctx, name, ctx->voidt, - args, ARRAY_SIZE(args), - ac_get_store_intr_attribs(writeonly_memory)); + if (HAVE_LLVM >= 0x800) { + ac_build_llvm8_buffer_store_common(ctx, rsrc, + ac_to_float(ctx, vdata), + ctx->i32_0, + voffset, offset, + num_channels, + glc, slc, + writeonly_memory, + false, true); + } else { + if (voffset) + offset = LLVMBuildAdd(ctx->builder, offset, voffset, ""); + + ac_build_buffer_store_common(ctx, rsrc, +ac_to_float(ctx, vdata), +ctx->i32_0, offset, +num_channels, glc, slc, +writeonly_memory, false); + } return; } - static const unsigned dfmt[] = { + static const unsigned dfmts[] = { V_008F0C_BUF_DATA_FORMAT_32, V_008F0C_BUF_DATA_FORMAT_32_32, V_008F0C_BUF_DATA_FORMAT_32_32_32, V_008F0C_BUF_DATA_FORMAT_32_32_32_32 }; - static const char *types[] = {"i32", "v2i32", "v4i32"}; - LLVMValueRef args[] = { - vdata, - LLVMBuildBitCast(ctx->builder, rsrc, ctx->v4i32, ""), - ctx->i32_0, - voffset ? voffset : ctx->i32_0, - soffset, - LLVMConstInt(ctx->i32, inst_offset, 0), - LLVMConstInt(ctx->i32, dfmt[num_channels - 1], 0), - LLVMConstInt(ctx->i32, V_008F0C_BUF_NUM_FORMAT_UINT, 0), - LLVMConstInt(ctx->i1, glc, 0), - LLVMConstInt(ctx->i1, slc, 0), - }; - char name[256]; - snprintf(name, sizeof(name), "llvm.amdgcn.tbuffer.store.%s", -types[CLAMP(num_channels, 1, 3) - 1]); + unsigned dfmt = dfmts[num_channels - 1]; + unsigned nfmt = V_008F0C_BUF_NUM_FORMAT_UINT; + LLVMValueRef immoffset = LLVMConstInt(ctx->i32, inst_offset, 0); - ac_build_intrinsic(ctx, name, ctx->voidt, - args, ARRAY_SIZE(args), - ac_get_store_intr_attribs(writeonly_memory)); + ac_build_tbuffer_store(ctx, rsrc, vdata, ctx->i32_0, voffset, soffset, + immoffset, num_channels, dfmt, nfmt, glc, slc, + writeonly_memory); } static LLVMValueRef -- 2.21.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 10/11] ac: use new LLVM 8 intrinsic when storing 16-bit values
Signed-off-by: Samuel Pitoiset --- src/amd/common/ac_llvm_build.c | 108 src/amd/common/ac_llvm_build.h | 26 src/amd/common/ac_nir_to_llvm.c | 26 ++-- 3 files changed, 139 insertions(+), 21 deletions(-) diff --git a/src/amd/common/ac_llvm_build.c b/src/amd/common/ac_llvm_build.c index 7aec8154a76..ce6639d49bf 100644 --- a/src/amd/common/ac_llvm_build.c +++ b/src/amd/common/ac_llvm_build.c @@ -1550,6 +1550,114 @@ ac_build_llvm8_tbuffer_load(struct ac_llvm_context *ctx, ac_get_load_intr_attribs(can_speculate)); } +static void +ac_build_llvm8_tbuffer_store(struct ac_llvm_context *ctx, +LLVMValueRef rsrc, +LLVMValueRef vdata, +LLVMValueRef vindex, +LLVMValueRef voffset, +LLVMValueRef soffset, +unsigned num_channels, +unsigned dfmt, +unsigned nfmt, +bool glc, +bool slc, +bool writeonly_memory, +bool structurized) +{ + LLVMValueRef args[7]; + int idx = 0; + args[idx++] = vdata; + args[idx++] = LLVMBuildBitCast(ctx->builder, rsrc, ctx->v4i32, ""); + if (structurized) + args[idx++] = vindex ? vindex : ctx->i32_0; + args[idx++] = voffset ? voffset : ctx->i32_0; + args[idx++] = soffset ? soffset : ctx->i32_0; + args[idx++] = LLVMConstInt(ctx->i32, dfmt | (nfmt << 4), 0); + args[idx++] = LLVMConstInt(ctx->i32, (glc ? 1 : 0) + (slc ? 2 : 0), 0); + unsigned func = CLAMP(num_channels, 1, 3) - 1; + + const char *type_names[] = {"i32", "v2i32", "v4i32"}; + const char *indexing_kind = structurized ? "struct" : "raw"; + char name[256]; + + snprintf(name, sizeof(name), "llvm.amdgcn.%s.tbuffer.store.%s", +indexing_kind, type_names[func]); + + ac_build_intrinsic(ctx, name, ctx->voidt, args, idx, + ac_get_store_intr_attribs(writeonly_memory)); +} + +void +ac_build_tbuffer_store(struct ac_llvm_context *ctx, + LLVMValueRef rsrc, + LLVMValueRef vdata, + LLVMValueRef vindex, + LLVMValueRef voffset, + LLVMValueRef soffset, + LLVMValueRef immoffset, + unsigned num_channels, + unsigned dfmt, + unsigned nfmt, + bool glc, + bool slc, + bool writeonly_memory) +{ + if (HAVE_LLVM >= 0x800) { + voffset = LLVMBuildAdd(ctx->builder, + voffset ? voffset : ctx->i32_0, + immoffset, ""); + + ac_build_llvm8_tbuffer_store(ctx, rsrc, vdata, vindex, voffset, +soffset, num_channels, dfmt, nfmt, +glc, slc, writeonly_memory, true); + } else { + LLVMValueRef params[] = { + vdata, + rsrc, + vindex, + voffset ? voffset : ctx->i32_0, + soffset ? soffset : ctx->i32_0, + immoffset, + LLVMConstInt(ctx->i32, dfmt, false), + LLVMConstInt(ctx->i32, nfmt, false), + LLVMConstInt(ctx->i32, glc, false), + LLVMConstInt(ctx->i32, slc, false), + }; + unsigned func = CLAMP(num_channels, 1, 3) - 1; + const char *type_names[] = {"i32", "v2i32", "v4i32"}; + char name[256]; + + snprintf(name, sizeof(name), "llvm.amdgcn.tbuffer.store.%s", +type_names[func]); + + ac_build_intrinsic(ctx, name, ctx->voidt, params, 10, + ac_get_store_intr_attribs(writeonly_memory)); + } +} + +void +ac_build_tbuffer_store_short(struct ac_llvm_context *ctx, +LLVMValueRef rsrc, +LLVMValueRef vdata, +LLVMValueRef vindex, +LLVMValueRef voffset, +LLVMValueRef soffset, +bool glc, +bool slc, +bool writeonly_memory) +{ + unsigned dfmt = V_008F0C_BUF_DATA_FORMAT_16; + unsigned nfmt = V_008F0C_BUF_NUM_FORMAT_UINT; + + vdata = LLVMBuildBitCast(ctx->builder, vdata, ctx->i16, ""); + vdata = LLVMBuildZExt(ctx->builder, vdata, ctx->i32, ""); + + ac_build_tbuffer_store(ctx, rsrc,
[Mesa-dev] [PATCH 09/11] ac: use new LLVM 8 intrinsics in ac_build_buffer_load()
Signed-off-by: Samuel Pitoiset --- src/amd/common/ac_llvm_build.c | 8 1 file changed, 8 insertions(+) diff --git a/src/amd/common/ac_llvm_build.c b/src/amd/common/ac_llvm_build.c index bb8a470ae1d..7aec8154a76 100644 --- a/src/amd/common/ac_llvm_build.c +++ b/src/amd/common/ac_llvm_build.c @@ -1412,6 +1412,14 @@ ac_build_buffer_load(struct ac_llvm_context *ctx, return ac_build_gather_values(ctx, result, num_channels); } + if (HAVE_LLVM >= 0x0800) { + return ac_build_llvm8_buffer_load_common(ctx, rsrc, vindex, +offset, ctx->i32_0, +num_channels, glc, slc, +can_speculate, false, +true); + } + return ac_build_buffer_load_common(ctx, rsrc, vindex, offset, num_channels, glc, slc, can_speculate, false); -- 2.21.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 04/11] ac: add ac_build_buffer_store_format() helper
Similar to ac_build_buffer_load_format(). Signed-off-by: Samuel Pitoiset --- src/amd/common/ac_llvm_build.c | 100 src/amd/common/ac_llvm_build.h | 11 src/amd/common/ac_nir_to_llvm.c | 29 +++-- 3 files changed, 119 insertions(+), 21 deletions(-) diff --git a/src/amd/common/ac_llvm_build.c b/src/amd/common/ac_llvm_build.c index 253073e52fb..bb8a470ae1d 100644 --- a/src/amd/common/ac_llvm_build.c +++ b/src/amd/common/ac_llvm_build.c @@ -1082,6 +1082,106 @@ LLVMValueRef ac_build_load_to_sgpr_uint_wraparound(struct ac_llvm_context *ctx, return ac_build_load_custom(ctx, base_ptr, index, true, true, false); } +static void +ac_build_buffer_store_common(struct ac_llvm_context *ctx, +LLVMValueRef rsrc, +LLVMValueRef data, +LLVMValueRef vindex, +LLVMValueRef voffset, +unsigned num_channels, +bool glc, +bool slc, +bool writeonly_memory, +bool use_format) +{ + LLVMValueRef args[] = { + data, + LLVMBuildBitCast(ctx->builder, rsrc, ctx->v4i32, ""), + vindex ? vindex : ctx->i32_0, + voffset, + LLVMConstInt(ctx->i1, glc, 0), + LLVMConstInt(ctx->i1, slc, 0) + }; + unsigned func = CLAMP(num_channels, 1, 3) - 1; + + const char *type_names[] = {"f32", "v2f32", "v4f32"}; + char name[256]; + + if (use_format) { + snprintf(name, sizeof(name), "llvm.amdgcn.buffer.store.format.%s", +type_names[func]); + } else { + snprintf(name, sizeof(name), "llvm.amdgcn.buffer.store.%s", +type_names[func]); + } + + ac_build_intrinsic(ctx, name, ctx->voidt, args, ARRAY_SIZE(args), + ac_get_store_intr_attribs(writeonly_memory)); +} + +static void +ac_build_llvm8_buffer_store_common(struct ac_llvm_context *ctx, + LLVMValueRef rsrc, + LLVMValueRef data, + LLVMValueRef vindex, + LLVMValueRef voffset, + LLVMValueRef soffset, + unsigned num_channels, + bool glc, + bool slc, + bool writeonly_memory, + bool use_format, + bool structurized) +{ + LLVMValueRef args[5]; + int idx = 0; + args[idx++] = data; + args[idx++] = LLVMBuildBitCast(ctx->builder, rsrc, ctx->v4i32, ""); + if (structurized) + args[idx++] = vindex ? vindex : ctx->i32_0; + args[idx++] = voffset ? voffset : ctx->i32_0; + args[idx++] = soffset ? soffset : ctx->i32_0; + args[idx++] = LLVMConstInt(ctx->i32, (glc ? 1 : 0) + (slc ? 2 : 0), 0); + unsigned func = CLAMP(num_channels, 1, 3) - 1; + + const char *type_names[] = {"f32", "v2f32", "v4f32"}; + const char *indexing_kind = structurized ? "struct" : "raw"; + char name[256]; + + if (use_format) { + snprintf(name, sizeof(name), "llvm.amdgcn.%s.buffer.store.format.%s", +indexing_kind, type_names[func]); + } else { + snprintf(name, sizeof(name), "llvm.amdgcn.%s.buffer.store.%s", +indexing_kind, type_names[func]); + } + + ac_build_intrinsic(ctx, name, ctx->voidt, args, idx, + ac_get_store_intr_attribs(writeonly_memory)); +} + +void +ac_build_buffer_store_format(struct ac_llvm_context *ctx, +LLVMValueRef rsrc, +LLVMValueRef data, +LLVMValueRef vindex, +LLVMValueRef voffset, +unsigned num_channels, +bool glc, +bool writeonly_memory) +{ + if (HAVE_LLVM >= 0x800) { + ac_build_llvm8_buffer_store_common(ctx, rsrc, data, vindex, + voffset, NULL, num_channels, + glc, false, writeonly_memory, + true, true); + } else { + ac_build_buffer_store_common(ctx, rsrc, data, vindex, voffset, +num_channels, glc, false, +writeonly_memory, true); + } +} + /* TBUFFER_STORE_FORMAT_{X,XY,XYZ,XYZW} <- the suffix is selected by num_channels=1..4. * The type of vdata must be one of i32
[Mesa-dev] [PATCH 05/11] ac/nir: remove one useless check in visit_store_ssbo()
Trivial. Signed-off-by: Samuel Pitoiset --- src/amd/common/ac_nir_to_llvm.c | 9 +++-- 1 file changed, 3 insertions(+), 6 deletions(-) diff --git a/src/amd/common/ac_nir_to_llvm.c b/src/amd/common/ac_nir_to_llvm.c index c10a0cce16f..55d3ce90ce4 100644 --- a/src/amd/common/ac_nir_to_llvm.c +++ b/src/amd/common/ac_nir_to_llvm.c @@ -1567,12 +1567,9 @@ static void visit_store_ssbo(struct ac_nir_context *ctx, } data = extract_vector_range(>ac, base_data, start, count); - if (start == 0) { - offset = base_offset; - } else { - offset = LLVMBuildAdd(ctx->ac.builder, base_offset, - LLVMConstInt(ctx->ac.i32, start * elem_size_bytes, false), ""); - } + offset = LLVMBuildAdd(ctx->ac.builder, base_offset, + LLVMConstInt(ctx->ac.i32, start * elem_size_bytes, false), ""); + if (num_bytes == 2) { store_name = "llvm.amdgcn.tbuffer.store.i32"; data_type = ctx->ac.i32; -- 2.21.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 08/11] ac/nir: use ac_build_buffer_store_dword() for SSBO store operations
Signed-off-by: Samuel Pitoiset --- src/amd/common/ac_nir_to_llvm.c | 23 +-- 1 file changed, 9 insertions(+), 14 deletions(-) diff --git a/src/amd/common/ac_nir_to_llvm.c b/src/amd/common/ac_nir_to_llvm.c index be0dca2d857..47a865de36f 100644 --- a/src/amd/common/ac_nir_to_llvm.c +++ b/src/amd/common/ac_nir_to_llvm.c @@ -1591,34 +1591,29 @@ static void visit_store_ssbo(struct ac_nir_context *ctx, ctx->ac.voidt, tbuffer_params, 10, ac_get_store_intr_attribs(writeonly_memory)); } else { + int num_channels = num_bytes / 4; + switch (num_bytes) { case 16: /* v4f32 */ - store_name = "llvm.amdgcn.buffer.store.v4f32"; data_type = ctx->ac.v4f32; break; case 8: /* v2f32 */ - store_name = "llvm.amdgcn.buffer.store.v2f32"; data_type = ctx->ac.v2f32; break; case 4: /* f32 */ - store_name = "llvm.amdgcn.buffer.store.f32"; data_type = ctx->ac.f32; break; default: unreachable("Malformed vector store."); } data = LLVMBuildBitCast(ctx->ac.builder, data, data_type, ""); - LLVMValueRef params[] = { - data, - rsrc, - ctx->ac.i32_0, /* vindex */ - offset, - glc, - ctx->ac.i1false, /* slc */ - }; - ac_build_intrinsic(>ac, store_name, - ctx->ac.voidt, params, 6, - ac_get_store_intr_attribs(writeonly_memory)); + + ac_build_buffer_store_dword(>ac, rsrc, data, + num_channels, offset, + ctx->ac.i32_0, 0, + cache_policy & ac_glc, + false, writeonly_memory, + false); } } } -- 2.21.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 07/11] ac/nir: use ac_build_buffer_load() for SSBO load operations
Signed-off-by: Samuel Pitoiset --- src/amd/common/ac_nir_to_llvm.c | 35 ++--- 1 file changed, 6 insertions(+), 29 deletions(-) diff --git a/src/amd/common/ac_nir_to_llvm.c b/src/amd/common/ac_nir_to_llvm.c index 922f1063c79..be0dca2d857 100644 --- a/src/amd/common/ac_nir_to_llvm.c +++ b/src/amd/common/ac_nir_to_llvm.c @@ -1703,7 +1703,6 @@ static LLVMValueRef visit_load_buffer(struct ac_nir_context *ctx, int num_components = instr->num_components; enum gl_access_qualifier access = nir_intrinsic_access(instr); unsigned cache_policy = get_cache_policy(ctx, access, false, false); - LLVMValueRef glc = (cache_policy & ac_glc) ? ctx->ac.i1true : ctx->ac.i1false; LLVMValueRef offset = get_src(ctx, instr->src[1]); LLVMValueRef rsrc = ctx->abi->load_ssbo(ctx->abi, @@ -1734,34 +1733,12 @@ static LLVMValueRef visit_load_buffer(struct ac_nir_context *ctx, immoffset, cache_policy & ac_glc); } else { - const char *load_name; - LLVMTypeRef data_type; - switch (load_bytes) { - case 16: - case 12: - load_name = "llvm.amdgcn.buffer.load.v4f32"; - data_type = ctx->ac.v4f32; - break; - case 8: - case 6: - load_name = "llvm.amdgcn.buffer.load.v2f32"; - data_type = ctx->ac.v2f32; - break; - case 4: - load_name = "llvm.amdgcn.buffer.load.f32"; - data_type = ctx->ac.f32; - break; - default: - unreachable("Malformed load buffer."); - } - LLVMValueRef params[] = { - rsrc, - vindex, - LLVMBuildAdd(ctx->ac.builder, offset, immoffset, ""), - glc, - ctx->ac.i1false, - }; - ret = ac_build_intrinsic(>ac, load_name, data_type, params, 5, 0); + int num_channels = util_next_power_of_two(load_bytes) / 4; + + ret = ac_build_buffer_load(>ac, rsrc, num_channels, + vindex, offset, immoffset, 0, + cache_policy & ac_glc, false, + false, false); } LLVMTypeRef byte_vec = LLVMVectorType(ctx->ac.i8, ac_get_type_size(LLVMTypeOf(ret))); -- 2.21.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 01/11] ac: fix glc parameter use for new LLVM 8 typed buffer intrinsics
ac_build_llvm8_tbuffer_load() expects a boolean for glc. Fixes: 2cf5433b99f ("ac: use new LLVM 8 intrinsic when loading 16-bit values") Signed-off-by: Samuel Pitoiset --- src/amd/common/ac_llvm_build.c | 4 ++-- src/amd/common/ac_llvm_build.h | 2 +- src/amd/common/ac_nir_to_llvm.c | 4 ++-- 3 files changed, 5 insertions(+), 5 deletions(-) diff --git a/src/amd/common/ac_llvm_build.c b/src/amd/common/ac_llvm_build.c index bc64f0bb7e3..88ea289a121 100644 --- a/src/amd/common/ac_llvm_build.c +++ b/src/amd/common/ac_llvm_build.c @@ -1376,7 +1376,7 @@ ac_build_tbuffer_load_short(struct ac_llvm_context *ctx, LLVMValueRef voffset, LLVMValueRef soffset, LLVMValueRef immoffset, - LLVMValueRef glc) + bool glc) { unsigned dfmt = V_008F0C_BUF_DATA_FORMAT_16; unsigned nfmt = V_008F0C_BUF_NUM_FORMAT_UINT; @@ -1399,7 +1399,7 @@ ac_build_tbuffer_load_short(struct ac_llvm_context *ctx, immoffset, LLVMConstInt(ctx->i32, dfmt, false), LLVMConstInt(ctx->i32, nfmt, false), - glc, + LLVMConstInt(ctx->i32, glc, false), ctx->i1false, }; res = ac_build_intrinsic(ctx, name, type, params, 9, 0); diff --git a/src/amd/common/ac_llvm_build.h b/src/amd/common/ac_llvm_build.h index fd5c4295abf..0fb3eb52f05 100644 --- a/src/amd/common/ac_llvm_build.h +++ b/src/amd/common/ac_llvm_build.h @@ -304,7 +304,7 @@ ac_build_tbuffer_load_short(struct ac_llvm_context *ctx, LLVMValueRef voffset, LLVMValueRef soffset, LLVMValueRef immoffset, - LLVMValueRef glc); + bool glc); LLVMValueRef ac_build_llvm8_tbuffer_load(struct ac_llvm_context *ctx, diff --git a/src/amd/common/ac_nir_to_llvm.c b/src/amd/common/ac_nir_to_llvm.c index 5fb5c8da609..a7b3fdf64aa 100644 --- a/src/amd/common/ac_nir_to_llvm.c +++ b/src/amd/common/ac_nir_to_llvm.c @@ -1716,7 +1716,7 @@ static LLVMValueRef visit_load_buffer(struct ac_nir_context *ctx, offset, ctx->ac.i32_0, immoffset, - glc); + cache_policy & ac_glc); } else { const char *load_name; LLVMTypeRef data_type; @@ -1787,7 +1787,7 @@ static LLVMValueRef visit_load_ubo_buffer(struct ac_nir_context *ctx, offset, ctx->ac.i32_0, LLVMConstInt(ctx->ac.i32, 2 * i, 0), - ctx->ac.i1false); +false); } ret = ac_build_gather_values(>ac, results, num_components); } else { -- 2.21.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 06/11] ac/nir: use new LLVM 8 intrinsics for SSBO atomic operations
Signed-off-by: Samuel Pitoiset --- src/amd/common/ac_nir_to_llvm.c | 65 + 1 file changed, 41 insertions(+), 24 deletions(-) diff --git a/src/amd/common/ac_nir_to_llvm.c b/src/amd/common/ac_nir_to_llvm.c index 55d3ce90ce4..922f1063c79 100644 --- a/src/amd/common/ac_nir_to_llvm.c +++ b/src/amd/common/ac_nir_to_llvm.c @@ -1626,57 +1626,74 @@ static void visit_store_ssbo(struct ac_nir_context *ctx, static LLVMValueRef visit_atomic_ssbo(struct ac_nir_context *ctx, const nir_intrinsic_instr *instr) { - const char *name; - LLVMValueRef params[6]; + const char *atomic_name; + char intrinsic_name[64]; + LLVMValueRef params[7]; int arg_count = 0; - - if (instr->intrinsic == nir_intrinsic_ssbo_atomic_comp_swap) { - params[arg_count++] = ac_llvm_extract_elem(>ac, get_src(ctx, instr->src[3]), 0); - } - params[arg_count++] = ac_llvm_extract_elem(>ac, get_src(ctx, instr->src[2]), 0); - params[arg_count++] = ctx->abi->load_ssbo(ctx->abi, -get_src(ctx, instr->src[0]), -true); - params[arg_count++] = ctx->ac.i32_0; /* vindex */ - params[arg_count++] = get_src(ctx, instr->src[1]); /* voffset */ - params[arg_count++] = ctx->ac.i1false; /* slc */ + int length; switch (instr->intrinsic) { case nir_intrinsic_ssbo_atomic_add: - name = "llvm.amdgcn.buffer.atomic.add"; + atomic_name = "add"; break; case nir_intrinsic_ssbo_atomic_imin: - name = "llvm.amdgcn.buffer.atomic.smin"; + atomic_name = "smin"; break; case nir_intrinsic_ssbo_atomic_umin: - name = "llvm.amdgcn.buffer.atomic.umin"; + atomic_name = "umin"; break; case nir_intrinsic_ssbo_atomic_imax: - name = "llvm.amdgcn.buffer.atomic.smax"; + atomic_name = "smax"; break; case nir_intrinsic_ssbo_atomic_umax: - name = "llvm.amdgcn.buffer.atomic.umax"; + atomic_name = "umax"; break; case nir_intrinsic_ssbo_atomic_and: - name = "llvm.amdgcn.buffer.atomic.and"; + atomic_name = "and"; break; case nir_intrinsic_ssbo_atomic_or: - name = "llvm.amdgcn.buffer.atomic.or"; + atomic_name = "or"; break; case nir_intrinsic_ssbo_atomic_xor: - name = "llvm.amdgcn.buffer.atomic.xor"; + atomic_name = "xor"; break; case nir_intrinsic_ssbo_atomic_exchange: - name = "llvm.amdgcn.buffer.atomic.swap"; + atomic_name = "swap"; break; case nir_intrinsic_ssbo_atomic_comp_swap: - name = "llvm.amdgcn.buffer.atomic.cmpswap"; + atomic_name = "cmpswap"; break; default: abort(); } - return ac_build_intrinsic(>ac, name, ctx->ac.i32, params, arg_count, 0); + if (instr->intrinsic == nir_intrinsic_ssbo_atomic_comp_swap) { + params[arg_count++] = ac_llvm_extract_elem(>ac, get_src(ctx, instr->src[3]), 0); + } + params[arg_count++] = ac_llvm_extract_elem(>ac, get_src(ctx, instr->src[2]), 0); + params[arg_count++] = ctx->abi->load_ssbo(ctx->abi, +get_src(ctx, instr->src[0]), +true); + params[arg_count++] = ctx->ac.i32_0; /* vindex */ + params[arg_count++] = get_src(ctx, instr->src[1]); /* voffset */ + + if (HAVE_LLVM >= 0x0800) { + params[arg_count++] = ctx->ac.i32_0; /* soffset */ + params[arg_count++] = ctx->ac.i32_0; /* slc */ + + length = snprintf(intrinsic_name, sizeof(intrinsic_name), + "llvm.amdgcn.struct.buffer.atomic.%s.i32", + atomic_name); + } else { + params[arg_count++] = ctx->ac.i1false; /* slc */ + + length = snprintf(intrinsic_name, sizeof(intrinsic_name), + "llvm.amdgcn.buffer.atomic.%s", atomic_name); + } + + assert(length < sizeof(intrinsic_name)); + return ac_build_intrinsic(>ac, intrinsic_name, ctx->ac.i32, + params, arg_count, 0); } static LLVMValueRef visit_load_buffer(struct ac_nir_context *ctx, -- 2.21.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 03/11] ac/nir: set attrib flags for SSBO and image store operations
For consistency regarding other store operations. Signed-off-by: Samuel Pitoiset --- src/amd/common/ac_nir_to_llvm.c | 9 ++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/src/amd/common/ac_nir_to_llvm.c b/src/amd/common/ac_nir_to_llvm.c index a7b3fdf64aa..ff29345ffe5 100644 --- a/src/amd/common/ac_nir_to_llvm.c +++ b/src/amd/common/ac_nir_to_llvm.c @@ -1591,7 +1591,8 @@ static void visit_store_ssbo(struct ac_nir_context *ctx, ctx->ac.i1false, }; ac_build_intrinsic(>ac, store_name, - ctx->ac.voidt, tbuffer_params, 10, 0); + ctx->ac.voidt, tbuffer_params, 10, + ac_get_store_intr_attribs(writeonly_memory)); } else { switch (num_bytes) { case 16: /* v4f32 */ @@ -1619,7 +1620,8 @@ static void visit_store_ssbo(struct ac_nir_context *ctx, ctx->ac.i1false, /* slc */ }; ac_build_intrinsic(>ac, store_name, - ctx->ac.voidt, params, 6, 0); + ctx->ac.voidt, params, 6, + ac_get_store_intr_attribs(writeonly_memory)); } } } @@ -2548,7 +2550,8 @@ static void visit_image_store(struct ac_nir_context *ctx, params[4] = LLVMConstInt(ctx->ac.i1, !!(args.cache_policy & ac_glc), 0); params[5] = ctx->ac.i1false; /* slc */ } - ac_build_intrinsic(>ac, name, ctx->ac.voidt, params, 6, 0); + ac_build_intrinsic(>ac, name, ctx->ac.voidt, params, 6, + ac_get_store_intr_attribs(writeonly_memory)); } else { args.opcode = ac_image_store; args.data[0] = ac_to_float(>ac, get_src(ctx, instr->src[3])); -- 2.21.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 02/11] ac: make use of ac_get_store_intr_attribs() where possible
Signed-off-by: Samuel Pitoiset --- src/amd/common/ac_llvm_build.c | 8 ++-- 1 file changed, 2 insertions(+), 6 deletions(-) diff --git a/src/amd/common/ac_llvm_build.c b/src/amd/common/ac_llvm_build.c index 88ea289a121..253073e52fb 100644 --- a/src/amd/common/ac_llvm_build.c +++ b/src/amd/common/ac_llvm_build.c @@ -1150,9 +1150,7 @@ ac_build_buffer_store_dword(struct ac_llvm_context *ctx, ac_build_intrinsic(ctx, name, ctx->voidt, args, ARRAY_SIZE(args), - writeonly_memory ? - AC_FUNC_ATTR_INACCESSIBLE_MEM_ONLY : - AC_FUNC_ATTR_WRITEONLY); + ac_get_store_intr_attribs(writeonly_memory)); return; } @@ -1181,9 +1179,7 @@ ac_build_buffer_store_dword(struct ac_llvm_context *ctx, ac_build_intrinsic(ctx, name, ctx->voidt, args, ARRAY_SIZE(args), - writeonly_memory ? - AC_FUNC_ATTR_INACCESSIBLE_MEM_ONLY : - AC_FUNC_ATTR_WRITEONLY); + ac_get_store_intr_attribs(writeonly_memory)); } static LLVMValueRef -- 2.21.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 00/11] ac: use LLVM 8 buffer intrinsics everywhere
Hi, This small series makes use of new LLVM 8 buffer intrinsics. No CTS regressions on GFX8 with LLVM 7, 8 and master. Please review, Thanks! Samuel Pitoiset (11): ac: fix glc parameter use for new LLVM 8 typed buffer intrinsics ac: make use of ac_get_store_intr_attribs() where possible ac/nir: set attrib flags for SSBO and image store operations ac: add ac_build_buffer_store_format() helper ac/nir: remove one useless check in visit_store_ssbo() ac/nir: use new LLVM 8 intrinsics for SSBO atomic operations ac/nir: use ac_build_buffer_load() for SSBO load operations ac/nir: use ac_build_buffer_store_dword() for SSBO store operations ac: use new LLVM 8 intrinsics in ac_build_buffer_load() ac: use new LLVM 8 intrinsic when storing 16-bit values ac: use new LLVM 8 intrinsics in ac_build_buffer_store_dword() src/amd/common/ac_llvm_build.c | 290 +++- src/amd/common/ac_llvm_build.h | 39 - src/amd/common/ac_nir_to_llvm.c | 188 - 3 files changed, 356 insertions(+), 161 deletions(-) -- 2.21.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 1/2] panfrost: Set bo->size[0] in the DRM backend
So we can unmap it later. Signed-off-by: Tomeu Vizoso --- include/drm-uapi/panfrost_drm.h| 1 - src/gallium/drivers/panfrost/pan_drm.c | 10 +- 2 files changed, 5 insertions(+), 6 deletions(-) diff --git a/include/drm-uapi/panfrost_drm.h b/include/drm-uapi/panfrost_drm.h index 7618f14f9e26..a0ead4979ccc 100644 --- a/include/drm-uapi/panfrost_drm.h +++ b/include/drm-uapi/panfrost_drm.h @@ -27,7 +27,6 @@ extern "C" { #define DRM_IOCTL_PANFROST_GET_BO_OFFSET DRM_IOWR(DRM_COMMAND_BASE + DRM_PANFROST_GET_BO_OFFSET, struct drm_panfrost_get_bo_offset) #define PANFROST_JD_REQ_FS (1 << 0) - /** * struct drm_panfrost_submit - ioctl argument for submitting commands to the 3D * engine. diff --git a/src/gallium/drivers/panfrost/pan_drm.c b/src/gallium/drivers/panfrost/pan_drm.c index 6d1129ff5f2b..71e8c6ac1f9d 100644 --- a/src/gallium/drivers/panfrost/pan_drm.c +++ b/src/gallium/drivers/panfrost/pan_drm.c @@ -125,7 +125,7 @@ panfrost_drm_import_bo(struct panfrost_screen *screen, struct winsys_handle *wha struct panfrost_drm *drm = (struct panfrost_drm *)screen->driver; struct drm_panfrost_get_bo_offset get_bo_offset = {0,}; struct drm_panfrost_mmap_bo mmap_bo = {0,}; -int ret, size; +int ret; unsigned gem_handle; ret = drmPrimeFDToHandle(drm->fd, whandle->handle, _handle); @@ -146,9 +146,9 @@ panfrost_drm_import_bo(struct panfrost_screen *screen, struct winsys_handle *wha assert(0); } - size = lseek(whandle->handle, 0, SEEK_END); - assert(size > 0); -bo->cpu[0] = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, +bo->size[0] = lseek(whandle->handle, 0, SEEK_END); +assert(bo->size[0] > 0); +bo->cpu[0] = mmap(NULL, bo->size[0], PROT_READ | PROT_WRITE, MAP_SHARED, drm->fd, mmap_bo.offset); if (bo->cpu[0] == MAP_FAILED) { fprintf(stderr, "mmap failed: %p\n", bo->cpu[0]); @@ -156,7 +156,7 @@ panfrost_drm_import_bo(struct panfrost_screen *screen, struct winsys_handle *wha } /* Record the mmap if we're tracing */ -pantrace_mmap(bo->gpu[0], bo->cpu[0], size, NULL); +pantrace_mmap(bo->gpu[0], bo->cpu[0], bo->size[0], NULL); return bo; } -- 2.20.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 2/2] panfrost: Set bo->gem_handle when creating a linear BO
So we can free it later. Signed-off-by: Tomeu Vizoso --- src/gallium/drivers/panfrost/pan_resource.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/src/gallium/drivers/panfrost/pan_resource.c b/src/gallium/drivers/panfrost/pan_resource.c index d647f618ee77..2fa468b177b9 100644 --- a/src/gallium/drivers/panfrost/pan_resource.c +++ b/src/gallium/drivers/panfrost/pan_resource.c @@ -237,6 +237,7 @@ panfrost_create_bo(struct panfrost_screen *screen, const struct pipe_resource *t bo->cpu[0] = mem.cpu; bo->gpu[0] = mem.gpu; +bo->gem_handle = mem.gem_handle; /* TODO: Mipmap */ } @@ -312,7 +313,8 @@ panfrost_destroy_bo(struct panfrost_screen *screen, struct panfrost_bo *pbo) struct panfrost_memory mem = { .cpu = bo->cpu[0], .gpu = bo->gpu[0], -.size = bo->size[0] +.size = bo->size[0], +.gem_handle = bo->gem_handle, }; screen->driver->free_slab(screen, ); -- 2.20.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 109535] [Tracker] Mesa 19.0 release tracker
https://bugs.freedesktop.org/show_bug.cgi?id=109535 Bug 109535 depends on bug 107510, which changed state. Bug 107510 Summary: [GEN8+] up to 10% perf drop on several 3D benchmarks https://bugs.freedesktop.org/show_bug.cgi?id=107510 What|Removed |Added Status|REOPENED|RESOLVED Resolution|--- |FIXED -- You are receiving this mail because: You are the assignee for the bug. You are the QA Contact for the bug.___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] gallium/docs: clarify set_sampler_views
This previously was not called out clearly, but based on a survey of the code, it seems the expected behavior is to release the reference to any sampler views beyond the new range being bound. I think radeonsi and freedreno were the only ones not doing this. Which could probably temporarily leak a bit of memory by holding on to the sampler view reference. Signed-off-by: Rob Clark --- src/gallium/docs/source/context.rst | 3 +++ 1 file changed, 3 insertions(+) diff --git a/src/gallium/docs/source/context.rst b/src/gallium/docs/source/context.rst index f89d9e1005e..199d335f8f4 100644 --- a/src/gallium/docs/source/context.rst +++ b/src/gallium/docs/source/context.rst @@ -143,6 +143,9 @@ to the array index which is used for sampling. to a respective sampler view and releases a reference to the previous sampler view. + Previously bound samplers with index ``>= num_views`` are unbound rather + than unmodified. + * ``create_sampler_view`` creates a new sampler view. ``texture`` is associated with the sampler view which results in sampler view holding a reference to the texture. Format specified in template must be compatible -- 2.20.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 0/3] Fix Bugzilla Bug 109646 - jagginess with video play
Bug 109646 - New video compositor compute shader render glitches mpv Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109646 Fixed Problem 2,3:Jaggines caused by interpolation issue with weave de-interlace James Zhu (3): gallium/auxiliary/vl: Increase shader_params size gallium/auxiliary/vl: Change grid setting gallium/auxiliary/vl: Change weave compute shader implementation src/gallium/auxiliary/vl/vl_compositor.c| 2 +- src/gallium/auxiliary/vl/vl_compositor_cs.c | 97 ++--- 2 files changed, 76 insertions(+), 23 deletions(-) -- 2.7.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] radv: Fix driverUUID
Reviewed-by: Samuel Pitoiset On 3/12/19 4:07 PM, Józef Kucia wrote: --- src/amd/vulkan/radv_device.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/amd/vulkan/radv_device.c b/src/amd/vulkan/radv_device.c index 83d218fb6bf0..6deb69d22d48 100644 --- a/src/amd/vulkan/radv_device.c +++ b/src/amd/vulkan/radv_device.c @@ -337,7 +337,7 @@ radv_physical_device_init(struct radv_physical_device *device, device->rad_info.chip_class > GFX9) fprintf(stderr, "WARNING: radv is not a conformant vulkan implementation, testing use only.\n"); - radv_get_driver_uuid(>device_uuid); + radv_get_driver_uuid(>driver_uuid); radv_get_device_uuid(>rad_info, >device_uuid); if (device->rad_info.family == CHIP_STONEY || ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 2/3] gallium/auxiliary/vl: Change grid setting
Using draw area for grid setting instead of destination buffer size. Signed-off-by: James Zhu Tested-by: Bruno Milreu --- src/gallium/auxiliary/vl/vl_compositor_cs.c | 9 + 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/src/gallium/auxiliary/vl/vl_compositor_cs.c b/src/gallium/auxiliary/vl/vl_compositor_cs.c index 70898be..de0a3c7 100644 --- a/src/gallium/auxiliary/vl/vl_compositor_cs.c +++ b/src/gallium/auxiliary/vl/vl_compositor_cs.c @@ -221,7 +221,8 @@ const char *compute_shader_rgba = static void cs_launch(struct vl_compositor *c, - void *cs) + void *cs, + const struct u_rect *draw_area) { struct pipe_context *ctx = c->pipe; @@ -241,8 +242,8 @@ cs_launch(struct vl_compositor *c, info.block[0] = 8; info.block[1] = 8; info.block[2] = 1; - info.grid[0] = DIV_ROUND_UP(c->fb_state.width, info.block[0]); - info.grid[1] = DIV_ROUND_UP(c->fb_state.height, info.block[1]); + info.grid[0] = DIV_ROUND_UP(draw_area->x1, info.block[0]); + info.grid[1] = DIV_ROUND_UP(draw_area->y1, info.block[1]); info.grid[2] = 1; ctx->launch_grid(ctx, ); @@ -346,7 +347,7 @@ draw_layers(struct vl_compositor *c, c->pipe->set_sampler_views(c->pipe, PIPE_SHADER_COMPUTE, 0, num_sampler_views, samplers); - cs_launch(c, layer->cs); + cs_launch(c, layer->cs, &(drawn.area)); if (dirty) { struct u_rect drawn = calc_drawn_area(s, layer); -- 2.7.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 3/3] gallium/auxiliary/vl: Change weave compute shader implementation
Use 2D_ARRARY instead of RECT to fetch texels for weave compute shader. Problem 2,3: Fixed interpolation issue with weave de-interlace Fixes: 9364d66cb7f7 (Add video compositor compute shader render) Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109646 Signed-off-by: James Zhu Tested-by: Bruno Milreu --- src/gallium/auxiliary/vl/vl_compositor_cs.c | 79 ++--- 1 file changed, 62 insertions(+), 17 deletions(-) diff --git a/src/gallium/auxiliary/vl/vl_compositor_cs.c b/src/gallium/auxiliary/vl/vl_compositor_cs.c index de0a3c7..bad7d5f 100644 --- a/src/gallium/auxiliary/vl/vl_compositor_cs.c +++ b/src/gallium/auxiliary/vl/vl_compositor_cs.c @@ -113,15 +113,16 @@ const char *compute_shader_weave = "DCL SV[1], BLOCK_ID\n" "DCL CONST[0..5]\n" - "DCL SVIEW[0..2], RECT, FLOAT\n" + "DCL SVIEW[0..2], 2D_ARRAY, FLOAT\n" "DCL SAMP[0..2]\n" "DCL IMAGE[0], 2D, WR\n" - "DCL TEMP[0..9]\n" + "DCL TEMP[0..15]\n" "IMM[0] UINT32 { 8, 8, 1, 0}\n" "IMM[1] FLT32 { 1.0, 2.0, 0.0, 0.0}\n" "IMM[2] UINT32 { 1, 2, 4, 0}\n" + "IMM[3] FLT32 { 0.25, 0.5, 0.125, 0.125}\n" "UMAD TEMP[0], SV[1], IMM[0], SV[0]\n" @@ -137,26 +138,70 @@ const char *compute_shader_weave = /* Translate */ "UADD TEMP[2].xy, TEMP[2], -CONST[5].xyxy\n" - /* Texture layer */ - "UMOD TEMP[2].z, TEMP[2]., IMM[2].\n" - "UMOD TEMP[3].z, TEMP[2]., IMM[2].\n" - "USHR TEMP[3].z, TEMP[3]., IMM[2].\n" + /* Top Y */ + "U2F TEMP[2], TEMP[2]\n" + "DIV TEMP[2].y, TEMP[2]., IMM[1].\n" + /* Down Y */ + "MOV TEMP[12], TEMP[2]\n" + + /* Top UV */ + "MOV TEMP[3], TEMP[2]\n" + "DIV TEMP[3].xy, TEMP[3], IMM[1].\n" + /* Down UV */ + "MOV TEMP[13], TEMP[3]\n" + + /* Texture offset */ + "ADD TEMP[2].x, TEMP[2]., IMM[3].\n" + "ADD TEMP[2].y, TEMP[2]., IMM[3].\n" + "ADD TEMP[12].x, TEMP[12]., IMM[3].\n" + "ADD TEMP[12].y, TEMP[12]., IMM[3].\n" + + "ADD TEMP[3].x, TEMP[3]., IMM[3].\n" + "ADD TEMP[3].y, TEMP[3]., IMM[3].\n" + "ADD TEMP[13].x, TEMP[13]., IMM[3].\n" + "ADD TEMP[13].y, TEMP[13]., IMM[3].\n" - "USHR TEMP[2].y, TEMP[2], IMM[2].\n" - "USHR TEMP[3].xy, TEMP[2], IMM[2].\n" + /* Scale */ + "DIV TEMP[2].xy, TEMP[2], CONST[3].zwzw\n" + "DIV TEMP[12].xy, TEMP[12], CONST[3].zwzw\n" + "DIV TEMP[3].xy, TEMP[3], CONST[3].zwzw\n" + "DIV TEMP[13].xy, TEMP[13], CONST[3].zwzw\n" - "U2F TEMP[4], TEMP[2]\n" - "U2F TEMP[5], TEMP[3]\n" + /* Weave offset */ + "ADD TEMP[2].y, TEMP[2]., IMM[3].\n" + "ADD TEMP[12].y, TEMP[12]., -IMM[3].\n" + "ADD TEMP[3].y, TEMP[3]., IMM[3].\n" + "ADD TEMP[13].y, TEMP[13]., -IMM[3].\n" - /* Scale */ - "DIV TEMP[4], TEMP[4], CONST[3].zwzw\n" - "DIV TEMP[5], TEMP[5], CONST[3].zwzw\n" + /* Texture layer */ + "MOV TEMP[14].x, TEMP[2].\n" + "MOV TEMP[14].yz, TEMP[3].\n" + "ROUND TEMP[15], TEMP[14]\n" + "ADD TEMP[14], TEMP[14], -TEMP[15]\n" + "MOV TEMP[14], |TEMP[14]|\n" + "MUL TEMP[14], TEMP[14], IMM[1].\n" + + /* Normalize */ + "DIV TEMP[2].xy, TEMP[2], CONST[5].zwzw\n" + "DIV TEMP[12].xy, TEMP[12], CONST[5].zwzw\n" + "DIV TEMP[15].xy, CONST[5].zwzw, IMM[1].\n" + "DIV TEMP[3].xy, TEMP[3], TEMP[15].xyxy\n" + "DIV TEMP[13].xy, TEMP[13], TEMP[15].xyxy\n" /* Fetch texels */ - "TEX_LZ TEMP[6].x, TEMP[4], SAMP[0], RECT\n" - "TEX_LZ TEMP[6].y, TEMP[5], SAMP[1], RECT\n" - "TEX_LZ TEMP[6].z, TEMP[5], SAMP[2], RECT\n" - + "MOV TEMP[2].z, IMM[1].\n" + "MOV TEMP[3].z, IMM[1].\n" + "TEX_LZ TEMP[10].x, TEMP[2], SAMP[0], 2D_ARRAY\n" + "TEX_LZ TEMP[10].y, TEMP[3], SAMP[1], 2D_ARRAY\n" + "TEX_LZ TEMP[10].z, TEMP[3], SAMP[2], 2D_ARRAY\n" + + "MOV TEMP[12].z, IMM[1].\n" + "MOV TEMP[13].z, IMM[1].\n" + "TEX_LZ TEMP[11].x, TEMP[12], SAMP[0], 2D_ARRAY\n" + "TEX_LZ TEMP[11].y, TEMP[13], SAMP[1], 2D_ARRAY\n" + "TEX_LZ TEMP[11].z, TEMP[13], SAMP[2], 2D_ARRAY\n" + + "LRP TEMP[6], TEMP[14], TEMP[11], TEMP[10]\n" "MOV TEMP[6].w, IMM[1].\n" /* Color Space Conversion */ -- 2.7.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] mesa: Fix GL_NUM_DEVICE_UUIDS_EXT
Cc: mesa-sta...@lists.freedesktop.org --- src/mesa/main/get.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/src/mesa/main/get.c b/src/mesa/main/get.c index ee77c45d03c1..efc9c11f79d3 100644 --- a/src/mesa/main/get.c +++ b/src/mesa/main/get.c @@ -906,6 +906,9 @@ find_custom_value(struct gl_context *ctx, const struct value_desc *d, union valu break; /* GL_EXT_external_objects */ + case GL_NUM_DEVICE_UUIDS_EXT: + v->value_int = 1; + break; case GL_DRIVER_UUID_EXT: _mesa_get_driver_uuid(ctx, v->value_int_4); break; -- 2.19.2 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 1/3] gallium/auxiliary/vl: Increase shader_params size
Increase shader_params size to pass sampler data to compute shader during weave de-interlace. Signed-off-by: James Zhu Tested-by: Bruno Milreu --- src/gallium/auxiliary/vl/vl_compositor.c| 2 +- src/gallium/auxiliary/vl/vl_compositor_cs.c | 9 - 2 files changed, 9 insertions(+), 2 deletions(-) diff --git a/src/gallium/auxiliary/vl/vl_compositor.c b/src/gallium/auxiliary/vl/vl_compositor.c index a8f3620..12c58ff 100644 --- a/src/gallium/auxiliary/vl/vl_compositor.c +++ b/src/gallium/auxiliary/vl/vl_compositor.c @@ -802,7 +802,7 @@ vl_compositor_init_state(struct vl_compositor_state *s, struct pipe_context *pip pipe->screen, PIPE_BIND_CONSTANT_BUFFER, PIPE_USAGE_DEFAULT, - sizeof(csc_matrix) + 4*sizeof(float) + 6*sizeof(int) + sizeof(csc_matrix) + 6*sizeof(float) + 6*sizeof(int) ); if (!s->shader_params) diff --git a/src/gallium/auxiliary/vl/vl_compositor_cs.c b/src/gallium/auxiliary/vl/vl_compositor_cs.c index 0e49c11..70898be 100644 --- a/src/gallium/auxiliary/vl/vl_compositor_cs.c +++ b/src/gallium/auxiliary/vl/vl_compositor_cs.c @@ -38,6 +38,8 @@ struct cs_viewport { struct u_rect area; int translate_x; int translate_y; + float sampler0_w; + float sampler0_h; }; const char *compute_shader_video_buffer = @@ -298,8 +300,11 @@ set_viewport(struct vl_compositor_state *s, *ptr_int++ = drawn->area.x1; *ptr_int++ = drawn->area.y1; *ptr_int++ = drawn->translate_x; - *ptr_int = drawn->translate_y; + *ptr_int++ = drawn->translate_y; + ptr_float = (float *)ptr_int; + *ptr_float++ = drawn->sampler0_w; + *ptr_float = drawn->sampler0_h; pipe_buffer_unmap(s->pipe, buf_transfer); return true; @@ -328,6 +333,8 @@ draw_layers(struct vl_compositor *c, drawn.scale_y = drawn.scale_x; drawn.translate_x = (int)layer->viewport.translate[0]; drawn.translate_y = (int)layer->viewport.translate[1]; + drawn.sampler0_w = (float)layer->sampler_views[0]->texture->width0; + drawn.sampler0_h = (float)layer->sampler_views[0]->texture->height0; if (memcmp(, _drawn, sizeof(struct cs_viewport))) { set_viewport(s, ); -- 2.7.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] radv: Fix driverUUID
--- src/amd/vulkan/radv_device.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/amd/vulkan/radv_device.c b/src/amd/vulkan/radv_device.c index 83d218fb6bf0..6deb69d22d48 100644 --- a/src/amd/vulkan/radv_device.c +++ b/src/amd/vulkan/radv_device.c @@ -337,7 +337,7 @@ radv_physical_device_init(struct radv_physical_device *device, device->rad_info.chip_class > GFX9) fprintf(stderr, "WARNING: radv is not a conformant vulkan implementation, testing use only.\n"); - radv_get_driver_uuid(>device_uuid); + radv_get_driver_uuid(>driver_uuid); radv_get_device_uuid(>rad_info, >device_uuid); if (device->rad_info.family == CHIP_STONEY || -- 2.19.2 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 109929] tgsi_to_nir.c:2111: undefined reference to `gl_nir_lower_samplers_as_deref'
https://bugs.freedesktop.org/show_bug.cgi?id=109929 --- Comment #4 from Timur Kristóf --- Created attachment 143635 --> https://bugs.freedesktop.org/attachment.cgi?id=143635=edit build log of mesa with e582e761b7f49d1c0b100289b62442e6295cefef I tried with the commit before that, e582e761b7f49d1c0b100289b62442e6295cefef and mesa doesn't build with autotools. Attached the log. Am I doing something wrong here? -- You are receiving this mail because: You are the assignee for the bug. You are the QA Contact for the bug.___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 109641] GLX swrast driver leaks shared memory segments
https://bugs.freedesktop.org/show_bug.cgi?id=109641 --- Comment #2 from Brian Paul --- And it appears this patch will be in the Mesa 19.0 release. --- Comment #3 from Marcello Blancasio --- The patch fixed the issue. Thank you. -- You are receiving this mail because: You are the assignee for the bug. You are the QA Contact for the bug.___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] 2019 X.Org Foundation Election Candidates
On Tue, Mar 12, 2019 at 12:24:00AM +, Wentland, Harry wrote: > To all X.Org Foundation Members: > > The election for the X.Org Foundation Board of Directors will begin on > 14 March 2019. We have 6 candidates who are running for 4 seats. They > are (in alphabetical order): > Attached below are the Personal Statements each candidate submitted > for your consideration along with their Statements of Contribution > that they submitted with the membership application. Please review > each of the candidates' statements to help you decide whom to vote for > during the upcoming election. > > If you have questions of the candidates, you should feel free to ask them > here on the mailing list. > > The election committee will provide detailed instructions on how the voting > system will work when the voting period begins. > > Harry, on behalf of the X.Org elections committee Are these candidates the only thing we will be voting on? Luc Verhaegen. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 77449] Tracker bug for all bugs related to Steam titles
https://bugs.freedesktop.org/show_bug.cgi?id=77449 Bug 77449 depends on bug 104602, which changed state. Bug 104602 Summary: [apitrace] Graphical artifacts in Civilization VI on RX Vega https://bugs.freedesktop.org/show_bug.cgi?id=104602 What|Removed |Added Status|REOPENED|RESOLVED Resolution|--- |FIXED -- You are receiving this mail because: You are the assignee for the bug.___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 2/9] anv: Set the values for the VkPhysicalDeviceSampleLocationsPropertiesEXT
On Mon, 11 Mar 2019 11:39:58 -0700 Sagar Ghuge wrote: > On Mon, 2019-03-11 at 17:04 +0200, Eleni Maria Stea wrote: > > The VkPhysicalDeviceSampleLocationPropertiesEXT struct is filled > > with implementation dependent values and according to the table > > from the Vulkan Specification section [36.1. Limit Requirements]: > > > > pname | max | min > > pname:sampleLocationSampleCounts |- > > |ename:VK_SAMPLE_COU NT_4_BIT > > pname:maxSampleLocationGridSize|-|(1, 1) > > pname:sampleLocationCoordinateRange|(0.0, 0.9375)|(0.0, 0.9375) > > pname:sampleLocationSubPixelBits |-|4 > > pname:variableSampleLocations | false |implementation > > dependent > > > > The hardware only supports setting the same sample location for all > > the > > pixels, so we only support 1x1 grids. > > > > Also, variableSampleLocations is set to false because we don't > > support the > > feature. > > --- > > src/intel/vulkan/anv_device.c | 21 + > > src/intel/vulkan/anv_private.h | 3 +++ > > 2 files changed, 24 insertions(+) > > > > diff --git a/src/intel/vulkan/anv_device.c > > b/src/intel/vulkan/anv_device.c > > index 729cceb3e32..1e183b7f4ad 100644 > > --- a/src/intel/vulkan/anv_device.c > > +++ b/src/intel/vulkan/anv_device.c > > @@ -1401,6 +1401,27 @@ void anv_GetPhysicalDeviceProperties2( > > break; > >} > > > > + case > > VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_SAMPLE_LOCATIONS_PROPERTIES_EXT: { > > + VkPhysicalDeviceSampleLocationsPropertiesEXT *props = > > +(VkPhysicalDeviceSampleLocationsPropertiesEXT *)ext; > > + props->sampleLocationSampleCounts = ISL_SAMPLE_COUNT_2_BIT > > | > > + ISL_SAMPLE_COUNT_4_BIT > > | > > + > > ISL_SAMPLE_COUNT_8_BIT; > > + if (pdevice->info.gen >= 9) > > +props->sampleLocationSampleCounts |= > > ISL_SAMPLE_COUNT_16_BIT; > > Hi Eleni, > > Thanks for the series. > > "isl_device_get_sample_counts" method figure out values according to > platform so maybe we can make use of it and ignore > ISL_SAMPLE_COUNT_1_BIT. So that we don't have to take care of values > according to platform here. > > I am not sure about this, so it might be a good idea to consult with > Jason/Lionel once. :) I think that not only you are right here, but on top of that we shouldn't ignore the ISL_SAMPLE_COUNT_1_BIT, as we can still write one user defined location when only 1 sample per pixel is used (at least MULTISAMPLE and SAMPLE_PATTERN commands allow us to do so). So, I've made the change, thank you. :) > > with or without the fix, this patch is: > > Reviewed-by: Sagar Ghuge > Thanks for the review! Eleni ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH v2 8/9] anv: Removed unused header file
In src/intel/vulkan/genX_blorp_exec.c we included the file: common/gen_sample_positions.h but not use it. Removed. --- src/intel/vulkan/genX_blorp_exec.c | 1 - 1 file changed, 1 deletion(-) diff --git a/src/intel/vulkan/genX_blorp_exec.c b/src/intel/vulkan/genX_blorp_exec.c index e9c85d56d5f..0eeefaaa9d6 100644 --- a/src/intel/vulkan/genX_blorp_exec.c +++ b/src/intel/vulkan/genX_blorp_exec.c @@ -31,7 +31,6 @@ #undef __gen_combine_address #include "common/gen_l3_config.h" -#include "common/gen_sample_positions.h" #include "blorp/blorp_genX_exec.h" static void * -- 2.20.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH v2 7/9] anv: Optimized the emission of the default locations on Gen8+
We only emit sample locations when the extension is enabled by the user. In all other cases the default locations are emitted once when the device is initialized to increase performance. --- src/intel/vulkan/anv_genX.h| 3 ++- src/intel/vulkan/genX_cmd_buffer.c | 2 +- src/intel/vulkan/genX_pipeline.c | 11 +++ src/intel/vulkan/genX_state.c | 8 +--- 4 files changed, 15 insertions(+), 9 deletions(-) diff --git a/src/intel/vulkan/anv_genX.h b/src/intel/vulkan/anv_genX.h index e82d83465ef..7f33a2b0a68 100644 --- a/src/intel/vulkan/anv_genX.h +++ b/src/intel/vulkan/anv_genX.h @@ -93,4 +93,5 @@ void genX(emit_ms_state)(struct anv_batch *batch, struct anv_sample *anv_samples, uint32_t num_samples, uint32_t log2_samples, - bool custom_sample_locations); + bool custom_sample_locations, + bool sample_locations_ext_enabled); diff --git a/src/intel/vulkan/genX_cmd_buffer.c b/src/intel/vulkan/genX_cmd_buffer.c index 4752c66f350..ae7c5a80a3c 100644 --- a/src/intel/vulkan/genX_cmd_buffer.c +++ b/src/intel/vulkan/genX_cmd_buffer.c @@ -2654,7 +2654,7 @@ cmd_buffer_emit_sample_locations(struct anv_cmd_buffer *cmd_buffer) anv_samples = cmd_buffer->state.gfx.dynamic.sample_locations.anv_samples; genX(emit_ms_state)(_buffer->batch, anv_samples, samples, - log2_samples, true); + log2_samples, true, true); } void diff --git a/src/intel/vulkan/genX_pipeline.c b/src/intel/vulkan/genX_pipeline.c index 8afc08f0320..12adfa65da8 100644 --- a/src/intel/vulkan/genX_pipeline.c +++ b/src/intel/vulkan/genX_pipeline.c @@ -573,10 +573,12 @@ emit_sample_mask(struct anv_pipeline *pipeline, } static void -emit_ms_state(struct anv_pipeline *pipeline, +emit_ms_state(struct anv_device *device, + struct anv_pipeline *pipeline, const VkPipelineMultisampleStateCreateInfo *info, const VkPipelineDynamicStateCreateInfo *dinfo) { + bool sample_loc_enabled = device->enabled_extensions.EXT_sample_locations; struct anv_sample anv_samples[MAX_SAMPLE_LOCATIONS]; VkSampleLocationsInfoEXT *sl; bool custom_locations = false; @@ -588,7 +590,7 @@ emit_ms_state(struct anv_pipeline *pipeline, if (info) { samples = info->rasterizationSamples; - if (info->pNext) { + if (sample_loc_enabled && info->pNext) { VkPipelineSampleLocationsStateCreateInfoEXT *slinfo = (VkPipelineSampleLocationsStateCreateInfoEXT *)info->pNext; @@ -617,7 +619,7 @@ emit_ms_state(struct anv_pipeline *pipeline, } genX(emit_ms_state)(>batch, anv_samples, samples, log2_samples, - custom_locations); + custom_locations, sample_loc_enabled); } static const uint32_t vk_to_gen_logic_op[] = { @@ -1947,7 +1949,8 @@ genX(graphics_pipeline_create)( assert(pCreateInfo->pRasterizationState); emit_rs_state(pipeline, pCreateInfo->pRasterizationState, pCreateInfo->pMultisampleState, pass, subpass); - emit_ms_state(pipeline, pCreateInfo->pMultisampleState, pCreateInfo->pDynamicState); + emit_ms_state(device, pipeline, pCreateInfo->pMultisampleState, + pCreateInfo->pDynamicState); emit_ds_state(pipeline, pCreateInfo->pDepthStencilState, pass, subpass); emit_cb_state(pipeline, pCreateInfo->pColorBlendState, pCreateInfo->pMultisampleState); diff --git a/src/intel/vulkan/genX_state.c b/src/intel/vulkan/genX_state.c index 804cfab3a56..bc6b5870d8d 100644 --- a/src/intel/vulkan/genX_state.c +++ b/src/intel/vulkan/genX_state.c @@ -552,12 +552,14 @@ genX(emit_ms_state)(struct anv_batch *batch, struct anv_sample *anv_samples, uint32_t num_samples, uint32_t log2_samples, - bool custom_sample_locations) + bool custom_sample_locations, + bool sample_locations_ext_enabled) { emit_multisample(batch, anv_samples, num_samples, log2_samples, custom_sample_locations); #if GEN_GEN >= 8 - emit_sample_locations(batch, anv_samples, num_samples, - custom_sample_locations); + if (sample_locations_ext_enabled) + emit_sample_locations(batch, anv_samples, num_samples, +custom_sample_locations); #endif } -- 2.20.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH v2 6/9] anv: Added support for dynamic and non-dynamic sample locations on Gen7
Allowing setting dynamic and non-dynamic sample locations on Gen7. --- src/intel/vulkan/anv_genX.h| 13 ++--- src/intel/vulkan/genX_cmd_buffer.c | 9 ++-- src/intel/vulkan/genX_pipeline.c | 13 + src/intel/vulkan/genX_state.c | 86 +- 4 files changed, 70 insertions(+), 51 deletions(-) diff --git a/src/intel/vulkan/anv_genX.h b/src/intel/vulkan/anv_genX.h index f84fe457152..e82d83465ef 100644 --- a/src/intel/vulkan/anv_genX.h +++ b/src/intel/vulkan/anv_genX.h @@ -89,11 +89,8 @@ void genX(cmd_buffer_mi_memset)(struct anv_cmd_buffer *cmd_buffer, void genX(blorp_exec)(struct blorp_batch *batch, const struct blorp_params *params); -void genX(emit_multisample)(struct anv_batch *batch, -uint32_t samples, -uint32_t log2_samples); - -void genX(emit_sample_locations)(struct anv_batch *batch, - const struct anv_sample *anv_samples, - uint32_t num_samples, - bool custom_locations); +void genX(emit_ms_state)(struct anv_batch *batch, + struct anv_sample *anv_samples, + uint32_t num_samples, + uint32_t log2_samples, + bool custom_sample_locations); diff --git a/src/intel/vulkan/genX_cmd_buffer.c b/src/intel/vulkan/genX_cmd_buffer.c index 9229df84caa..4752c66f350 100644 --- a/src/intel/vulkan/genX_cmd_buffer.c +++ b/src/intel/vulkan/genX_cmd_buffer.c @@ -2643,8 +2643,7 @@ cmd_buffer_flush_push_constants(struct anv_cmd_buffer *cmd_buffer, static void cmd_buffer_emit_sample_locations(struct anv_cmd_buffer *cmd_buffer) { -#if GEN_GEN >= 8 - const struct anv_sample *anv_samples; + struct anv_sample *anv_samples; uint32_t log2_samples; uint32_t samples; @@ -2654,10 +2653,8 @@ cmd_buffer_emit_sample_locations(struct anv_cmd_buffer *cmd_buffer) log2_samples = __builtin_ffs(samples) - 1; anv_samples = cmd_buffer->state.gfx.dynamic.sample_locations.anv_samples; - genX(emit_multisample)(_buffer->batch, samples, log2_samples); - genX(emit_sample_locations)(_buffer->batch, anv_samples, samples, - true); -#endif + genX(emit_ms_state)(_buffer->batch, anv_samples, samples, + log2_samples, true); } void diff --git a/src/intel/vulkan/genX_pipeline.c b/src/intel/vulkan/genX_pipeline.c index fa42e622077..8afc08f0320 100644 --- a/src/intel/vulkan/genX_pipeline.c +++ b/src/intel/vulkan/genX_pipeline.c @@ -577,12 +577,9 @@ emit_ms_state(struct anv_pipeline *pipeline, const VkPipelineMultisampleStateCreateInfo *info, const VkPipelineDynamicStateCreateInfo *dinfo) { -#if GEN_GEN >= 8 struct anv_sample anv_samples[MAX_SAMPLE_LOCATIONS]; VkSampleLocationsInfoEXT *sl; bool custom_locations = false; -#endif - uint32_t samples = 1; uint32_t log2_samples = 0; @@ -591,7 +588,6 @@ emit_ms_state(struct anv_pipeline *pipeline, if (info) { samples = info->rasterizationSamples; -#if GEN_GEN >= 8 if (info->pNext) { VkPipelineSampleLocationsStateCreateInfoEXT *slinfo = (VkPipelineSampleLocationsStateCreateInfoEXT *)info->pNext; @@ -616,17 +612,12 @@ emit_ms_state(struct anv_pipeline *pipeline, } } } -#endif log2_samples = __builtin_ffs(samples) - 1; } - genX(emit_multisample(>batch, samples, log2_samples)); - -#if GEN_GEN >= 8 - genX(emit_sample_locations)(>batch, anv_samples, samples, - custom_locations); -#endif + genX(emit_ms_state)(>batch, anv_samples, samples, log2_samples, + custom_locations); } static const uint32_t vk_to_gen_logic_op[] = { diff --git a/src/intel/vulkan/genX_state.c b/src/intel/vulkan/genX_state.c index 44cfc925ed5..804cfab3a56 100644 --- a/src/intel/vulkan/genX_state.c +++ b/src/intel/vulkan/genX_state.c @@ -437,10 +437,12 @@ VkResult genX(CreateSampler)( return VK_SUCCESS; } -void -genX(emit_multisample)(struct anv_batch *batch, - uint32_t samples, - uint32_t log2_samples) +static void +emit_multisample(struct anv_batch *batch, + const struct anv_sample *anv_samples, + uint32_t samples, + uint32_t log2_samples, + bool custom_locations) { anv_batch_emit(batch, GENX(3DSTATE_MULTISAMPLE), ms) { ms.NumberofMultisamples = log2_samples; @@ -453,34 +455,52 @@ genX(emit_multisample)(struct anv_batch *batch, */ ms.PixelPositionOffsetEnable = false; #else - switch (samples) { - case 1: - GEN_SAMPLE_POS_1X(ms.Sample); - break; - case 2: - GEN_SAMPLE_POS_2X(ms.Sample); - break; - case 4: - GEN_SAMPLE_POS_4X(ms.Sample); - break; -
[Mesa-dev] [PATCH v2 5/9] anv: Added support for dynamic sample locations on Gen8+
Added support for setting the locations when the pipeline has been created with the dynamic state bit enabled according to the Vulkan Specification section [26.5. Custom Sample Locations] for the function: 'vkCmdSetSampleLocationsEXT' The reason that we preferred to store the boolean valid inside the dynamic state struct for locations instead of using a dirty bit (ANV_CMD_DIRTY_SAMPLE_LOCATIONS for example) is that other functions can modify the value of the dirty bits causing unexpected behavior. --- src/intel/vulkan/anv_cmd_buffer.c | 19 src/intel/vulkan/anv_genX.h| 6 +++- src/intel/vulkan/anv_private.h | 6 src/intel/vulkan/genX_cmd_buffer.c | 27 ++ src/intel/vulkan/genX_pipeline.c | 46 -- src/intel/vulkan/genX_state.c | 41 +++--- 6 files changed, 99 insertions(+), 46 deletions(-) diff --git a/src/intel/vulkan/anv_cmd_buffer.c b/src/intel/vulkan/anv_cmd_buffer.c index 1b34644a434..101c1375430 100644 --- a/src/intel/vulkan/anv_cmd_buffer.c +++ b/src/intel/vulkan/anv_cmd_buffer.c @@ -28,6 +28,7 @@ #include #include "anv_private.h" +#include "anv_sample_locations.h" #include "vk_format_info.h" #include "vk_util.h" @@ -558,6 +559,24 @@ void anv_CmdSetStencilReference( cmd_buffer->state.gfx.dirty |= ANV_CMD_DIRTY_DYNAMIC_STENCIL_REFERENCE; } +void +anv_CmdSetSampleLocationsEXT(VkCommandBuffer commandBuffer, + const VkSampleLocationsInfoEXT *pSampleLocationsInfo) +{ + ANV_FROM_HANDLE(anv_cmd_buffer, cmd_buffer, commandBuffer); + assert(pSampleLocationsInfo); + + struct anv_dynamic_state *dyn_state = _buffer->state.gfx.dynamic; + dyn_state->sample_locations.num_samples = + pSampleLocationsInfo->sampleLocationsPerPixel; + + anv_calc_sample_locations(dyn_state->sample_locations.anv_samples, + dyn_state->sample_locations.num_samples, + pSampleLocationsInfo); + + cmd_buffer->state.gfx.dynamic.sample_locations.valid = true; +} + static void anv_cmd_buffer_bind_descriptor_set(struct anv_cmd_buffer *cmd_buffer, VkPipelineBindPoint bind_point, diff --git a/src/intel/vulkan/anv_genX.h b/src/intel/vulkan/anv_genX.h index 52415c04a45..f84fe457152 100644 --- a/src/intel/vulkan/anv_genX.h +++ b/src/intel/vulkan/anv_genX.h @@ -89,7 +89,11 @@ void genX(cmd_buffer_mi_memset)(struct anv_cmd_buffer *cmd_buffer, void genX(blorp_exec)(struct blorp_batch *batch, const struct blorp_params *params); +void genX(emit_multisample)(struct anv_batch *batch, +uint32_t samples, +uint32_t log2_samples); + void genX(emit_sample_locations)(struct anv_batch *batch, + const struct anv_sample *anv_samples, uint32_t num_samples, - const VkSampleLocationsInfoEXT *sl, bool custom_locations); diff --git a/src/intel/vulkan/anv_private.h b/src/intel/vulkan/anv_private.h index 981956e5706..a2e1756cd99 100644 --- a/src/intel/vulkan/anv_private.h +++ b/src/intel/vulkan/anv_private.h @@ -2135,6 +2135,12 @@ struct anv_dynamic_state { uint32_t front; uint32_t back; } stencil_reference; + + struct { + struct anv_sample anv_samples[MAX_SAMPLE_LOCATIONS]; + uint32_t num_samples; + bool valid; + } sample_locations; }; extern const struct anv_dynamic_state default_dynamic_state; diff --git a/src/intel/vulkan/genX_cmd_buffer.c b/src/intel/vulkan/genX_cmd_buffer.c index 7687507e6b7..9229df84caa 100644 --- a/src/intel/vulkan/genX_cmd_buffer.c +++ b/src/intel/vulkan/genX_cmd_buffer.c @@ -25,11 +25,13 @@ #include #include "anv_private.h" +#include "anv_sample_locations.h" #include "vk_format_info.h" #include "vk_util.h" #include "util/fast_idiv_by_const.h" #include "common/gen_l3_config.h" +#include "common/gen_sample_positions.h" #include "genxml/gen_macros.h" #include "genxml/genX_pack.h" @@ -2638,6 +2640,26 @@ cmd_buffer_flush_push_constants(struct anv_cmd_buffer *cmd_buffer, cmd_buffer->state.push_constants_dirty &= ~flushed; } +static void +cmd_buffer_emit_sample_locations(struct anv_cmd_buffer *cmd_buffer) +{ +#if GEN_GEN >= 8 + const struct anv_sample *anv_samples; + uint32_t log2_samples; + uint32_t samples; + + samples = cmd_buffer->state.gfx.dynamic.sample_locations.num_samples; + assert(samples > 0); + + log2_samples = __builtin_ffs(samples) - 1; + anv_samples = cmd_buffer->state.gfx.dynamic.sample_locations.anv_samples; + + genX(emit_multisample)(_buffer->batch, samples, log2_samples); + genX(emit_sample_locations)(_buffer->batch,
[Mesa-dev] [PATCH v2 2/9] anv: Set the values for the VkPhysicalDeviceSampleLocationsPropertiesEXT
The VkPhysicalDeviceSampleLocationPropertiesEXT struct is filled with implementation dependent values and according to the table from the Vulkan Specification section [36.1. Limit Requirements]: pname | max | min pname:sampleLocationSampleCounts |-|ename:VK_SAMPLE_COUNT_4_BIT pname:maxSampleLocationGridSize|-|(1, 1) pname:sampleLocationCoordinateRange|(0.0, 0.9375)|(0.0, 0.9375) pname:sampleLocationSubPixelBits |-|4 pname:variableSampleLocations | false |implementation dependent The hardware only supports setting the same sample location for all the pixels, so we only support 1x1 grids. Also, variableSampleLocations is set to false because we don't support the feature. v2: 1- Replaced false with VK_FALSE for consistency. (Sagar Ghuge) 2- Used the isl_device_sample_count to take the number of samples per platform to avoid extra checks. (Sagar Ghuge) Reviewed-by: Sagar Ghuge --- src/intel/vulkan/anv_device.c | 19 +++ src/intel/vulkan/anv_private.h | 3 +++ 2 files changed, 22 insertions(+) diff --git a/src/intel/vulkan/anv_device.c b/src/intel/vulkan/anv_device.c index 729cceb3e32..bf6f03ebb1a 100644 --- a/src/intel/vulkan/anv_device.c +++ b/src/intel/vulkan/anv_device.c @@ -1401,6 +1401,25 @@ void anv_GetPhysicalDeviceProperties2( break; } + case VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_SAMPLE_LOCATIONS_PROPERTIES_EXT: { + VkPhysicalDeviceSampleLocationsPropertiesEXT *props = +(VkPhysicalDeviceSampleLocationsPropertiesEXT *)ext; + + props->sampleLocationSampleCounts = +isl_device_get_sample_counts(>isl_dev); + + props->maxSampleLocationGridSize.width = SAMPLE_LOC_GRID_W; + props->maxSampleLocationGridSize.height = SAMPLE_LOC_GRID_H; + + props->sampleLocationCoordinateRange[0] = 0; + props->sampleLocationCoordinateRange[1] = 0.9375; + props->sampleLocationSubPixelBits = 4; + + props->variableSampleLocations = VK_FALSE; + + break; + } + default: anv_debug_ignored_stype(ext->sType); break; diff --git a/src/intel/vulkan/anv_private.h b/src/intel/vulkan/anv_private.h index eed282ff985..5905299e59d 100644 --- a/src/intel/vulkan/anv_private.h +++ b/src/intel/vulkan/anv_private.h @@ -195,6 +195,9 @@ struct gen_l3_config; #define anv_printflike(a, b) __attribute__((__format__(__printf__, a, b))) +#define SAMPLE_LOC_GRID_W 1 +#define SAMPLE_LOC_GRID_H 1 + static inline uint32_t align_down_npot_u32(uint32_t v, uint32_t a) { -- 2.20.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH v2 3/9] anv: Implemented the vkGetPhysicalDeviceMultisamplePropertiesEXT
Implemented the vkGetPhysicalDeviceMultisamplePropertiesEXT according to the Vulkan Specification section [36.2. Additional Multisampling Capabilities]. --- src/intel/Makefile.sources | 1 + src/intel/vulkan/anv_sample_locations.c | 60 + src/intel/vulkan/meson.build| 1 + 3 files changed, 62 insertions(+) create mode 100644 src/intel/vulkan/anv_sample_locations.c diff --git a/src/intel/Makefile.sources b/src/intel/Makefile.sources index a5c8828a6b6..a0873c7ccc2 100644 --- a/src/intel/Makefile.sources +++ b/src/intel/Makefile.sources @@ -251,6 +251,7 @@ VULKAN_FILES := \ vulkan/anv_pipeline_cache.c \ vulkan/anv_private.h \ vulkan/anv_queue.c \ + vulkan/anv_sample_locations.c \ vulkan/anv_util.c \ vulkan/anv_wsi.c \ vulkan/vk_format_info.h diff --git a/src/intel/vulkan/anv_sample_locations.c b/src/intel/vulkan/anv_sample_locations.c new file mode 100644 index 000..1ebf280e05b --- /dev/null +++ b/src/intel/vulkan/anv_sample_locations.c @@ -0,0 +1,60 @@ +/* + * Copyright © 2019 Intel Corporation + * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the "Software"), + * to deal in the Software without restriction, including without limitation + * the rights to use, copy, modify, merge, publish, distribute, sublicense, + * and/or sell copies of the Software, and to permit persons to whom the + * Software is furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice (including the next + * paragraph) shall be included in all copies or substantial portions of the + * Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS + * IN THE SOFTWARE. + */ + +#include "anv_private.h" + +void +anv_GetPhysicalDeviceMultisamplePropertiesEXT(VkPhysicalDevice physicalDevice, + VkSampleCountFlagBits samples, + VkMultisamplePropertiesEXT + *pMultisampleProperties) +{ + ANV_FROM_HANDLE(anv_physical_device, physical_device, physicalDevice); + const struct gen_device_info *devinfo = _device->info; + + VkExtent2D grid_size; + switch (samples) { + case VK_SAMPLE_COUNT_2_BIT: + case VK_SAMPLE_COUNT_4_BIT: + case VK_SAMPLE_COUNT_8_BIT: + grid_size.width = SAMPLE_LOC_GRID_W; + grid_size.height = SAMPLE_LOC_GRID_H; + break; + + case VK_SAMPLE_COUNT_16_BIT: + if (devinfo->gen >= 9) { + grid_size.width = SAMPLE_LOC_GRID_W; + grid_size.height = SAMPLE_LOC_GRID_H; + break; + } + default: + grid_size.width = grid_size.height = 0; + break; + }; + + *pMultisampleProperties = (VkMultisamplePropertiesEXT) { + .sType = VK_STRUCTURE_TYPE_MULTISAMPLE_PROPERTIES_EXT, + .pNext = NULL, + .maxSampleLocationGridSize = grid_size + }; +} diff --git a/src/intel/vulkan/meson.build b/src/intel/vulkan/meson.build index 7fa43a6ad79..3f78757c774 100644 --- a/src/intel/vulkan/meson.build +++ b/src/intel/vulkan/meson.build @@ -135,6 +135,7 @@ libanv_files = files( 'anv_pipeline_cache.c', 'anv_private.h', 'anv_queue.c', + 'anv_sample_locations.c', 'anv_util.c', 'anv_wsi.c', 'vk_format_info.h', -- 2.20.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH v2 9/9] anv: Enabled the VK_EXT_sample_locations extension
Enabled the VK_EXT_sample_locations for Intel Gen >= 7. v2: Replaced device.info->gen >= 7 with True, as Anv doesn't support anything below Gen7. (Lionel Landwerlin) --- src/intel/vulkan/anv_extensions.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/intel/vulkan/anv_extensions.py b/src/intel/vulkan/anv_extensions.py index 9e4e03e46df..5a30c733c5c 100644 --- a/src/intel/vulkan/anv_extensions.py +++ b/src/intel/vulkan/anv_extensions.py @@ -129,7 +129,7 @@ EXTENSIONS = [ Extension('VK_EXT_inline_uniform_block', 1, True), Extension('VK_EXT_pci_bus_info', 2, True), Extension('VK_EXT_post_depth_coverage', 1, 'device->info.gen >= 9'), -Extension('VK_EXT_sample_locations', 1, False), +Extension('VK_EXT_sample_locations', 1, True), Extension('VK_EXT_sampler_filter_minmax', 1, 'device->info.gen >= 9'), Extension('VK_EXT_scalar_block_layout', 1, True), Extension('VK_EXT_shader_stencil_export', 1, 'device->info.gen >= 9'), -- 2.20.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH v2 0/9] Implementation of the VK_EXT_sample_locations
Implemented the requirements from the VK_EXT_sample_locations extension specification to allow setting custom sample locations on Intel Gen >= 7. Some decisions explained: The grid size was set to 1x1 because the hardware only supports a single set of sample locations for the whole framebuffer. The user can only set custom sample locations per pipeline by filling the extension provided structs or dynamically the way it is described in the sections 26.5, 36.1, 36.2 of the Vulkan specification. Sections 6.7.3 and 7.4 describe how to use sample locations with images when a layout transition is about to take place. These sections were ignored as currently we aren't using sample locations with images in the driver. Variable sample locations aren't required and have not been implemented. We have 754 vk-gl-cts tests for this extension: The 690 pass on Gen >= 9 (where we can support 16 samples). The remaining 64 tests aren't supported because they test the variable sample locations. Eleni Maria Stea (9): anv: Added the VK_EXT_sample_locations extension to the anv_extensions list anv: Set the values for the VkPhysicalDeviceSampleLocationsPropertiesEXT anv: Implemented the vkGetPhysicalDeviceMultisamplePropertiesEXT anv: Added support for non-dynamic sample locations on Gen8+ anv: Added support for dynamic sample locations on Gen8+ anv: Added support for dynamic and non-dynamic sample locations on Gen7 anv: Optimized the emission of the default locations on Gen8+ anv: Removed unused header file anv: Enabled the VK_EXT_sample_locations extension src/intel/Makefile.sources | 1 + src/intel/common/gen_sample_positions.h | 53 ++ src/intel/vulkan/anv_cmd_buffer.c | 19 src/intel/vulkan/anv_device.c | 21 src/intel/vulkan/anv_extensions.py | 1 + src/intel/vulkan/anv_genX.h | 7 ++ src/intel/vulkan/anv_private.h | 18 src/intel/vulkan/anv_sample_locations.c | 96 ++ src/intel/vulkan/anv_sample_locations.h | 29 ++ src/intel/vulkan/genX_blorp_exec.c | 1 - src/intel/vulkan/genX_cmd_buffer.c | 24 + src/intel/vulkan/genX_pipeline.c| 92 + src/intel/vulkan/genX_state.c | 128 src/intel/vulkan/meson.build| 1 + 14 files changed, 450 insertions(+), 41 deletions(-) create mode 100644 src/intel/vulkan/anv_sample_locations.c create mode 100644 src/intel/vulkan/anv_sample_locations.h -- 2.20.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH v2 4/9] anv: Added support for non-dynamic sample locations on Gen8+
Allowing the user to set custom sample locations non-dynamically, by filling the extension structs and chaining them to the pipeline structs according to the Vulkan specification section [26.5. Custom Sample Locations] for the following structures: 'VkPipelineSampleLocationsStateCreateInfoEXT' 'VkSampleLocationsInfoEXT' 'VkSampleLocationEXT' Once custom locations are used, the default locations are lost and need to be re-emitted again in the next pipeline creation. For that, we emit the 3DSTATE_SAMPLE_PATTERN at every pipeline creation. --- src/intel/common/gen_sample_positions.h | 53 src/intel/vulkan/anv_genX.h | 5 ++ src/intel/vulkan/anv_private.h | 9 +++ src/intel/vulkan/anv_sample_locations.c | 38 +++- src/intel/vulkan/anv_sample_locations.h | 29 + src/intel/vulkan/genX_pipeline.c| 80 + src/intel/vulkan/genX_state.c | 59 ++ 7 files changed, 259 insertions(+), 14 deletions(-) create mode 100644 src/intel/vulkan/anv_sample_locations.h diff --git a/src/intel/common/gen_sample_positions.h b/src/intel/common/gen_sample_positions.h index da48dcb5ed0..e8af2a552dc 100644 --- a/src/intel/common/gen_sample_positions.h +++ b/src/intel/common/gen_sample_positions.h @@ -160,4 +160,57 @@ prefix##14YOffset = 0.9375; \ prefix##15XOffset = 0.0625; \ prefix##15YOffset = 0.; +/* Examples: + * in case of GEN_GEN < 8: + * SET_SAMPLE_POS(ms.Sample, 0); expands to: + *ms.Sample0XOffset = anv_samples[0].offs_x; + *ms.Sample0YOffset = anv_samples[0].offs_y; + * + * in case of GEN_GEN >= 8: + * SET_SAMPLE_POS(sp._16xSample, 0); expands to: + *sp._16xSample0XOffset = anv_samples[0].offs_x; + *sp._16xSample0YOffset = anv_samples[0].offs_y; + */ +#define SET_SAMPLE_POS(prefix, sample_idx) \ +prefix##sample_idx##XOffset = anv_samples[sample_idx].offs_x; \ +prefix##sample_idx##YOffset = anv_samples[sample_idx].offs_y; + +#define SET_SAMPLE_POS_2X(prefix) \ +SET_SAMPLE_POS(prefix, 0); \ +SET_SAMPLE_POS(prefix, 1); + +#define SET_SAMPLE_POS_4X(prefix) \ +SET_SAMPLE_POS(prefix, 0); \ +SET_SAMPLE_POS(prefix, 1); \ +SET_SAMPLE_POS(prefix, 2); \ +SET_SAMPLE_POS(prefix, 3); + +#define SET_SAMPLE_POS_8X(prefix) \ +SET_SAMPLE_POS(prefix, 0); \ +SET_SAMPLE_POS(prefix, 1); \ +SET_SAMPLE_POS(prefix, 2); \ +SET_SAMPLE_POS(prefix, 3); \ +SET_SAMPLE_POS(prefix, 4); \ +SET_SAMPLE_POS(prefix, 5); \ +SET_SAMPLE_POS(prefix, 6); \ +SET_SAMPLE_POS(prefix, 7); + +#define SET_SAMPLE_POS_16X(prefix) \ +SET_SAMPLE_POS(prefix, 0); \ +SET_SAMPLE_POS(prefix, 1); \ +SET_SAMPLE_POS(prefix, 2); \ +SET_SAMPLE_POS(prefix, 3); \ +SET_SAMPLE_POS(prefix, 4); \ +SET_SAMPLE_POS(prefix, 5); \ +SET_SAMPLE_POS(prefix, 6); \ +SET_SAMPLE_POS(prefix, 7); \ +SET_SAMPLE_POS(prefix, 8); \ +SET_SAMPLE_POS(prefix, 9); \ +SET_SAMPLE_POS(prefix, 10); \ +SET_SAMPLE_POS(prefix, 11); \ +SET_SAMPLE_POS(prefix, 12); \ +SET_SAMPLE_POS(prefix, 13); \ +SET_SAMPLE_POS(prefix, 14); \ +SET_SAMPLE_POS(prefix, 15); + #endif /* GEN_SAMPLE_POSITIONS_H */ diff --git a/src/intel/vulkan/anv_genX.h b/src/intel/vulkan/anv_genX.h index 8fd32cabf1e..52415c04a45 100644 --- a/src/intel/vulkan/anv_genX.h +++ b/src/intel/vulkan/anv_genX.h @@ -88,3 +88,8 @@ void genX(cmd_buffer_mi_memset)(struct anv_cmd_buffer *cmd_buffer, void genX(blorp_exec)(struct blorp_batch *batch, const struct blorp_params *params); + +void genX(emit_sample_locations)(struct anv_batch *batch, + uint32_t num_samples, + const VkSampleLocationsInfoEXT *sl, + bool custom_locations); diff --git a/src/intel/vulkan/anv_private.h b/src/intel/vulkan/anv_private.h index 5905299e59d..981956e5706 100644 --- a/src/intel/vulkan/anv_private.h +++ b/src/intel/vulkan/anv_private.h @@ -71,6 +71,7 @@ struct anv_buffer; struct anv_buffer_view; struct anv_image_view; struct anv_instance; +struct anv_sample; struct gen_l3_config; @@ -165,6 +166,7 @@ struct gen_l3_config; #define MAX_PUSH_DESCRIPTORS 32 /* Minimum requirement */ #define MAX_INLINE_UNIFORM_BLOCK_SIZE 4096 #define MAX_INLINE_UNIFORM_BLOCK_DESCRIPTORS 32 +#define MAX_SAMPLE_LOCATIONS 16 /* The kernel relocation API has a limitation of a 32-bit delta value * applied to the address before it is written which, in spite of it being @@ -2086,6 +2088,13 @@ struct anv_push_constants { struct brw_image_param images[MAX_GEN8_IMAGES]; }; +struct +anv_sample { + float offs_x; + float offs_y; + float radius; +}; + struct anv_dynamic_state { struct { uint32_t count; diff --git a/src/intel/vulkan/anv_sample_locations.c b/src/intel/vulkan/anv_sample_locations.c index 1ebf280e05b..c660cb5ae84 100644 --- a/src/intel/vulkan/anv_sample_locations.c +++ b/src/intel/vulkan/anv_sample_locations.c @@ -21,7 +21,7 @@ * IN THE SOFTWARE. */ -#include "anv_private.h"
[Mesa-dev] [PATCH v2 1/9] anv: Added the VK_EXT_sample_locations extension to the anv_extensions list
Added the VK_EXT_sample_locations to the anv_extensions.py list to generate the related entrypoints. --- src/intel/vulkan/anv_extensions.py | 1 + 1 file changed, 1 insertion(+) diff --git a/src/intel/vulkan/anv_extensions.py b/src/intel/vulkan/anv_extensions.py index 6fff293dee4..9e4e03e46df 100644 --- a/src/intel/vulkan/anv_extensions.py +++ b/src/intel/vulkan/anv_extensions.py @@ -129,6 +129,7 @@ EXTENSIONS = [ Extension('VK_EXT_inline_uniform_block', 1, True), Extension('VK_EXT_pci_bus_info', 2, True), Extension('VK_EXT_post_depth_coverage', 1, 'device->info.gen >= 9'), +Extension('VK_EXT_sample_locations', 1, False), Extension('VK_EXT_sampler_filter_minmax', 1, 'device->info.gen >= 9'), Extension('VK_EXT_scalar_block_layout', 1, True), Extension('VK_EXT_shader_stencil_export', 1, 'device->info.gen >= 9'), -- 2.20.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] RFC: Workaround for pthread_setaffinity_np() seccomp filtering
Hi On Fri, Mar 1, 2019 at 12:13 PM Mathias Fröhlich wrote: > > On Friday, 1 March 2019 12:15:08 CET Eero Tamminen wrote: > > Hi, > > > > On 1.3.2019 11.12, Michel Dänzer wrote: > > > On 2019-02-28 8:41 p.m., Marek Olšák wrote: > > >>> On Thu, Feb 28, 2019 at 1:37 PM Eero Tamminen > > >>> > > Why distro versions of Qemu filter sched_setaffinity() syscall? > > >>> > > >>> (https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/1815889) > > >>> > > >>> Daniel Berrange (berrange) wrote on 2019-02-27: #19 > > >>> > > >>> "IMHO that mesa change is not valid. It is settings its affinity to > > >>> run on all threads which is definitely *NOT* something we want to be > > >>> allowed. Management applications want to control which CPUs QEMU runs > > >>> on, and as such Mesa should honour the CPU placement that the QEMU > > >>> process has. > > >>> > > >>> This is a great example of why QEMU wants to use seccomp to block > > >>> affinity changes to prevent something silently trying to use more CPUs > > >>> than are assigned to this QEMU." > > >>> > > >> > > >> Mesa uses thread affinity to optimize memory access performance on some > > >> CPUs (see util_pin_thread_to_L3). Other places in Mesa need to restore > > >> the > > >> original thread affinity for some child threads. Additionally, if games > > >> limit the thread affinity, Mesa needs to restore the full thread affinity > > >> for some of its child threads. > > > > > > The last part sounds like Mesa clearly overstepping its authority. > > > > > > > > >> In essence, the thread affinity should only be considered a hint for the > > >> kernel for optimal performance. There is no reason to kill the process if > > >> it's disallowed. Just ignore the call or modify the thread mask to make > > >> it > > >> legal. > > > > > > The fundamental issue here is that Mesa is using the thread affinity API > > > for something else than it's intended for. If there was an API for what > > > Mesa wants (encouraging certain sets of threads to run on topologically > > > close cores), there should be no need to block that. > > > > Why such process needs to be killed instead the request being masked > > suitably, is there some program that breaks subtly if affinity request > > is masked (and that being worse than the program being killed)? > > But that is still a situation that could be nicely handled with a > EPERM error return. Way better than just kill a process. > That 'badly affected' program still can call abort then. > But nicely working programs don't get just killed then!! Returning an error seems less secure that prohibiting it completely. And it may lead to subtle bugs in rarely tested code paths. It's legitimate that QEMU and management layers want to prevent arbitrary code from changing resource allocation etc. There are no easy way I can think of for mesa (and other libraries) to probe the seccomp filters and associated action. So we need a way to tell mesa not to call setaffinity() (and other syscalls). MESA_NO_THREAD_AFFINITY or MESA_NO_SYSCALLS=setaffinity,... seem like a relatively easy way to go. thanks -- Marc-André Lureau ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev