On Fri, Mar 23, 2018 at 10:07 PM, Jason Ekstrand <ja...@jlekstrand.net> wrote:
> +list
>
> On Fri, Mar 23, 2018 at 1:45 PM, Karol Herbst <kher...@redhat.com> wrote:
>>
>> On Fri, Mar 23, 2018 at 9:30 PM, Jason Ekstrand <ja...@jlekstrand.net> wrote:
>> > As I've been rewriting core NIR deref handling, I've been thinking about
>> > this problem quite a bit. One objective I have is to actually make UBO and
>> > SSBO access go through derefs instead of just being an offset and index so
>> > that the compiler can better reason about them. In particular, I want to be
>> > able to start doing load/store elimination on SSBOs, SLM, and whatever CL
>> > has, which would be great for everyone's compute performance (GL, Vulkan,
>> > CL, etc.).
>> >
>> > I would be lying if I said I had a full plan, but I do have part of a plan.
>> > In my patch which adds the deref instructions, I add a new "cast" deref type
>> > which takes an arbitrary value as its source and kicks out a deref with a
>> > type. Whenever we discover that the source of the cast is actually another
>> > deref which is compatible (same type etc.), copy propagation gets rid of the
>> > cast for you. The idea is that, instead of doing a load_raw(raw_ptr), you
>> > would do a load((type *)raw_ptr).
>> >
>> > Right now, most of the core NIR optimizations will throw a fit if they ever
>> > see a cast. This is intentional because it requires us to manually go
>> > through and handle casts. This would mean that, at the moment, you would
>> > have to lower to load_raw intrinsics almost immediately after coming out of
>> > SPIR-V.
>> >
>>
>> Well, it gets more fun with OpenCL 2.0, where you can have generic pointers
>> for which you only know the type at creation time. You can also declare
>> generic pointers as function inputs in a way that you never actually know
>> where you have to load from if you only have that one function. So the
>> actual load operation depends on where you create the initial pointer
>> variable (you can cast from X to generic, but not the other way around).
>>
>> Which in the end means you can end up with load(generic_ptr), and only by
>> following the chain up to its creation (with function inlining in mind) do
>> you know the actual memory target.
>
>
> Yup. And there will always be crazy cases where you can't actually follow
> it and you have to emit a pile of code to load different ways depending on
> some bits somewhere that tell you how to load it. I'm well aware of the
> insanity. :-) This is part of the reason why I'm glad I'm not trying to
> write an OpenCL 2.0 driver.
>
> This insanity is exactly why I'm suggesting the pointer casting. Sure, you
> may not know the data type until the actual load. In that case, you end up
> with the cast being right before the load. If you don't know the storage
> class, maybe you have to switch and do multiple casts based on some bits.
> Alternatively, if you don't know the storage class, we can just let the
> deref mode be 0 for "I don't know", or maybe multiple bits for "these are
> the things it might be". In any case, I think we can handle it.
>
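to make the generic pointer situation a bit more concrete, here is a
hypothetical OpenCL 2.0 kernel (purely illustrative, not from this series;
the names sum()/k() are made up). The unqualified "int *" parameter of sum()
is a generic pointer in CL 2.0, so the load inside sum() has no fixed
address space when you look at that function in isolation:

   /* hypothetical CL 2.0 example, not part of the patch */
   int sum(int n, int *p)      /* unqualified pointer == generic in CL 2.0 */
   {
      int s = 0;
      for (int i = 0; i < n; i++)
         s += p[i];            /* storage class unknown at this point */
      return s;
   }

   kernel void k(global int *in, global int *out, int n)
   {
      local int tmp[64];
      tmp[0] = n;
      /* the very same load in sum() reads __global memory for the first
       * call and __local memory for the second one */
      out[0] = sum(n, in);
      out[1] = sum(1, tmp);
   }
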
there shouldn't be a situation where we don't know, except when you don't inline all functions. I think Rob had the idea of fat pointers where a pointer is a vec2 and the 2nd component contains the actual pointer type and you end up with a switch over the type to get the correct storage class. And if the compiler inlines all functions, it should be able to optimize that switch away. > It's insane but we need some sort of structure to be able to reason about > the insanity. Immediately lowering everything to load_raw is a good way to > get a driver off the ground. What it's not so good for is making an > optimizing compiler that can reason about these crazy pointers and actually > optimize them. Lest I sound too negative, I'm 100% fine with taking a short > path to getting something working now so long as it doesn't cloud up our > ability to do better in the future. > >> >> And I think the issue here is not that it is some kind of raw pointer >> in the patch, but more like an unbound/physical pointer, which doesn't >> relate to any variable. It is just a value like any other int/long as >> well. >> >> > On Fri, Mar 23, 2018 at 12:33 PM, Karol Herbst <kher...@redhat.com> >> > wrote: >> >> >> >> From: Rob Clark <robdcl...@gmail.com> >> >> >> >> An attempt to add physical pointer support to vtn. I'm not totally >> >> happy about the handling of logical pointers vs physical pointers. >> >> So this is really more of an RFS (request for suggestions) >> >> >> >> v2: treat vec3 types as vec4 when dereferencing >> >> >> >> Signed-off-by: Karol Herbst <kher...@redhat.com> >> >> --- >> >> src/compiler/spirv/spirv_to_nir.c | 87 ++++++++--- >> >> src/compiler/spirv/vtn_private.h | 20 ++- >> >> src/compiler/spirv/vtn_variables.c | 300 >> >> ++++++++++++++++++++++++++++++++----- >> >> 3 files changed, 347 insertions(+), 60 deletions(-) >> >> >> >> diff --git a/src/compiler/spirv/spirv_to_nir.c >> >> b/src/compiler/spirv/spirv_to_nir.c >> >> index 334bcab9a82..d58a68f80ef 100644 >> >> --- a/src/compiler/spirv/spirv_to_nir.c >> >> +++ b/src/compiler/spirv/spirv_to_nir.c >> >> @@ -572,6 +572,7 @@ vtn_types_compatible(struct vtn_builder *b, >> >> vtn_types_compatible(b, t1->array_element, >> >> t2->array_element); >> >> >> >> case vtn_base_type_pointer: >> >> + case vtn_base_type_raw_pointer: >> >> return vtn_types_compatible(b, t1->deref, t2->deref); >> >> >> >> case vtn_base_type_struct: >> >> @@ -609,6 +610,7 @@ vtn_type_copy(struct vtn_builder *b, struct >> >> vtn_type >> >> *src) >> >> case vtn_base_type_matrix: >> >> case vtn_base_type_array: >> >> case vtn_base_type_pointer: >> >> + case vtn_base_type_raw_pointer: >> >> case vtn_base_type_image: >> >> case vtn_base_type_sampler: >> >> case vtn_base_type_sampled_image: >> >> @@ -939,6 +941,14 @@ vtn_type_layout_std430(struct vtn_builder *b, >> >> struct >> >> vtn_type *type, >> >> return type; >> >> } >> >> >> >> + case vtn_base_type_raw_pointer: { >> >> + uint32_t comp_size = b->ptr_size / 8; >> >> + vtn_assert(comp_size); >> >> + *size_out = comp_size; >> >> + *align_out = comp_size; >> >> + return type; >> >> + } >> >> + >> >> case vtn_base_type_vector: { >> >> uint32_t comp_size = glsl_get_bit_size(type->type) / 8; >> >> assert(type->length > 0 && type->length <= 4); >> >> @@ -1003,6 +1013,7 @@ vtn_handle_type(struct vtn_builder *b, SpvOp >> >> opcode, >> >> val->type->base_type = vtn_base_type_scalar; >> >> val->type->type = glsl_bool_type(); >> >> val->type->length = 1; >> >> + val->type->stride = 4; >> >> break; >> >> case SpvOpTypeInt: { >> >> 
int bit_size = w[2]; >> >> @@ -1025,6 +1036,7 @@ vtn_handle_type(struct vtn_builder *b, SpvOp >> >> opcode, >> >> vtn_fail("Invalid int bit size"); >> >> } >> >> val->type->length = 1; >> >> + val->type->stride = bit_size / 8; >> >> break; >> >> } >> >> >> >> @@ -1045,6 +1057,7 @@ vtn_handle_type(struct vtn_builder *b, SpvOp >> >> opcode, >> >> vtn_fail("Invalid float bit size"); >> >> } >> >> val->type->length = 1; >> >> + val->type->stride = bit_size / 8; >> >> break; >> >> } >> >> >> >> @@ -1061,6 +1074,10 @@ vtn_handle_type(struct vtn_builder *b, SpvOp >> >> opcode, >> >> val->type->type = >> >> glsl_vector_type(glsl_get_base_type(base->type), >> >> elems); >> >> val->type->length = elems; >> >> val->type->stride = glsl_get_bit_size(base->type) / 8; >> >> + /* special case: vec3 is aligned to vec4 */ >> >> + if (elems == 3) >> >> + elems = 4; >> >> + val->type->stride *= elems; >> >> val->type->array_element = base; >> >> break; >> >> } >> >> @@ -1138,7 +1155,11 @@ vtn_handle_type(struct vtn_builder *b, SpvOp >> >> opcode, >> >> >> >> const char *name = val->name ? val->name : "struct"; >> >> >> >> - val->type->type = glsl_struct_type(fields, num_fields, name, >> >> false); >> >> + val->type->type = glsl_struct_type(fields, num_fields, name, >> >> + val->type->packed); >> >> + // TODO stride for a struct only matters for kernel shaders, >> >> where >> >> + // cl_size is the right thing.. but still a bit ugly to >> >> hard-code. >> >> + val->type->stride = glsl_get_cl_size(val->type->type); >> >> break; >> >> } >> >> >> >> @@ -1167,25 +1188,47 @@ vtn_handle_type(struct vtn_builder *b, SpvOp >> >> opcode, >> >> val->type->storage_class = storage_class; >> >> val->type->deref = deref_type; >> >> >> >> - if (storage_class == SpvStorageClassUniform || >> >> - storage_class == SpvStorageClassStorageBuffer) { >> >> - /* These can actually be stored to nir_variables and used as >> >> SSA >> >> - * values so they need a real glsl_type. >> >> - */ >> >> - val->type->type = glsl_vector_type(GLSL_TYPE_UINT, 2); >> >> - } >> >> - >> >> - if (storage_class == SpvStorageClassWorkgroup && >> >> - b->options->lower_workgroup_access_to_offsets) { >> >> + // XXX handling the "fake" glsl pointers vs "raw" pointers in >> >> kernel >> >> + // is a bit ugly.. need to understand how "pointers" are used >> >> in >> >> vk >> >> + // and figure out something better >> >> + if (storage_class == SpvStorageClassFunction || >> >> + storage_class == SpvStorageClassUniformConstant || >> >> + storage_class == SpvStorageClassWorkgroup || >> >> + !b->kernel_mode) { >> >> + if (storage_class == SpvStorageClassUniform || >> >> + storage_class == SpvStorageClassStorageBuffer) { >> >> + /* These can actually be stored to nir_variables and used >> >> as >> >> SSA >> >> + * values so they need a real glsl_type. >> >> + */ >> >> + val->type->type = glsl_vector_type(GLSL_TYPE_UINT, 2); >> >> + } else if (storage_class == SpvStorageClassWorkgroup && >> >> + b->options->lower_workgroup_access_to_offsets) { >> >> + uint32_t size, align; >> >> + val->type->deref = vtn_type_layout_std430(b, >> >> val->type->deref, >> >> + &size, &align); >> >> + val->type->length = size; >> >> + val->type->align = align; >> >> + /* These can actually be stored to nir_variables and used >> >> as >> >> SSA >> >> + * values so they need a real glsl_type. 
>> >> + */ >> >> + val->type->type = glsl_uint_type(); >> >> + } >> >> + } else { >> >> + vtn_assert(storage_class == SpvStorageClassCrossWorkgroup || >> >> + storage_class == SpvStorageClassInput); >> >> uint32_t size, align; >> >> + if (b->ptr_size == 64) { >> >> + val->type->type = glsl_uint64_t_type(); >> >> + } else { >> >> + val->type->type = glsl_uint_type(); >> >> + } >> >> + val->type->base_type = vtn_base_type_raw_pointer; >> >> + /* pointers can be accessed as array, so set the stride as >> >> size: >> >> */ >> >> val->type->deref = vtn_type_layout_std430(b, >> >> val->type->deref, >> >> &size, &align); >> >> - val->type->length = size; >> >> +#define ALIGN(_v, _d) (((_v) + ((_d) - 1)) & ~((_d) - 1)) >> >> + val->type->stride = ALIGN(size, align); >> >> val->type->align = align; >> >> - /* These can actually be stored to nir_variables and used as >> >> SSA >> >> - * values so they need a real glsl_type. >> >> - */ >> >> - val->type->type = glsl_uint_type(); >> >> } >> >> break; >> >> } >> >> @@ -3391,9 +3434,16 @@ vtn_handle_preamble_instruction(struct >> >> vtn_builder >> >> *b, SpvOp opcode, >> >> break; >> >> >> >> case SpvOpMemoryModel: >> >> - vtn_assert(w[1] == SpvAddressingModelLogical); >> >> + vtn_assert(w[1] == SpvAddressingModelLogical || >> >> + w[1] == SpvAddressingModelPhysical32 || >> >> + w[1] == SpvAddressingModelPhysical64); >> >> vtn_assert(w[2] == SpvMemoryModelSimple || >> >> - w[2] == SpvMemoryModelGLSL450); >> >> + w[2] == SpvMemoryModelGLSL450 || >> >> + w[2] == SpvMemoryModelOpenCL); >> >> + if (w[1] == SpvAddressingModelPhysical32) >> >> + b->ptr_size = 32; >> >> + else if (w[1] == SpvAddressingModelPhysical64) >> >> + b->ptr_size = 64; >> >> break; >> >> >> >> case SpvOpEntryPoint: { >> >> @@ -3780,6 +3830,7 @@ vtn_handle_body_instruction(struct vtn_builder >> >> *b, >> >> SpvOp opcode, >> >> sel_type = glsl_vector_type(GLSL_TYPE_BOOL, >> >> res_val->type->length); >> >> break; >> >> case vtn_base_type_pointer: >> >> + case vtn_base_type_raw_pointer: >> >> /* We need to have actual storage for pointer types */ >> >> vtn_fail_if(res_val->type->type == NULL, >> >> "Invalid pointer result type for OpSelect"); >> >> diff --git a/src/compiler/spirv/vtn_private.h >> >> b/src/compiler/spirv/vtn_private.h >> >> index bd273122703..dbfe9eab58a 100644 >> >> --- a/src/compiler/spirv/vtn_private.h >> >> +++ b/src/compiler/spirv/vtn_private.h >> >> @@ -262,7 +262,8 @@ enum vtn_base_type { >> >> vtn_base_type_matrix, >> >> vtn_base_type_array, >> >> vtn_base_type_struct, >> >> - vtn_base_type_pointer, >> >> + vtn_base_type_pointer, /* a logical pointer */ >> >> + vtn_base_type_raw_pointer, /* a physical pointer */ >> >> vtn_base_type_image, >> >> vtn_base_type_sampler, >> >> vtn_base_type_sampled_image, >> >> @@ -407,12 +408,14 @@ enum vtn_variable_mode { >> >> vtn_variable_mode_local, >> >> vtn_variable_mode_global, >> >> vtn_variable_mode_param, >> >> + vtn_variable_mode_const, >> >> vtn_variable_mode_ubo, >> >> vtn_variable_mode_ssbo, >> >> vtn_variable_mode_push_constant, >> >> vtn_variable_mode_image, >> >> vtn_variable_mode_sampler, >> >> vtn_variable_mode_workgroup, >> >> + vtn_variable_mode_cross_workgroup, >> >> vtn_variable_mode_input, >> >> vtn_variable_mode_output, >> >> }; >> >> @@ -588,6 +591,13 @@ struct vtn_builder { >> >> >> >> bool has_loop_continue; >> >> >> >> + /* pointer size is: >> >> + * AddressingModelLogical: 0 (default) >> >> + * AddressingModelPhysical32: 32 >> >> + * AddressingModelPhysical64: 64 >> >> + */ >> >> + unsigned ptr_size; >> 
>> + >> >> bool kernel_mode; >> >> }; >> >> >> >> @@ -624,7 +634,8 @@ vtn_push_ssa(struct vtn_builder *b, uint32_t >> >> value_id, >> >> struct vtn_type *type, struct vtn_ssa_value *ssa) >> >> { >> >> struct vtn_value *val; >> >> - if (type->base_type == vtn_base_type_pointer) { >> >> + if (type->base_type == vtn_base_type_pointer || >> >> + type->base_type == vtn_base_type_raw_pointer) { >> >> val = vtn_push_value(b, value_id, vtn_value_type_pointer); >> >> val->pointer = vtn_pointer_from_ssa(b, ssa->def, type); >> >> } else { >> >> @@ -688,6 +699,11 @@ struct vtn_ssa_value *vtn_local_load(struct >> >> vtn_builder *b, nir_deref_var *src); >> >> void vtn_local_store(struct vtn_builder *b, struct vtn_ssa_value *src, >> >> nir_deref_var *dest); >> >> >> >> +struct vtn_ssa_value *vtn_pointer_load(struct vtn_builder *b, >> >> + struct vtn_pointer *ptr); >> >> +void vtn_pointer_store(struct vtn_builder *b, struct vtn_ssa_value >> >> *src, >> >> + struct vtn_pointer *ptr); >> >> + >> >> struct vtn_ssa_value * >> >> vtn_variable_load(struct vtn_builder *b, struct vtn_pointer *src); >> >> >> >> diff --git a/src/compiler/spirv/vtn_variables.c >> >> b/src/compiler/spirv/vtn_variables.c >> >> index b2897407fb1..af9222d6f4e 100644 >> >> --- a/src/compiler/spirv/vtn_variables.c >> >> +++ b/src/compiler/spirv/vtn_variables.c >> >> @@ -51,6 +51,9 @@ vtn_access_chain_extend(struct vtn_builder *b, struct >> >> vtn_access_chain *old, >> >> unsigned old_len = old ? old->length : 0; >> >> chain = vtn_access_chain_create(b, old_len + new_ids); >> >> >> >> + if (old) >> >> + chain->ptr_as_array = old->ptr_as_array; >> >> + >> >> for (unsigned i = 0; i < old_len; i++) >> >> chain->link[i] = old->link[i]; >> >> >> >> @@ -88,11 +91,12 @@ vtn_access_chain_pointer_dereference(struct >> >> vtn_builder *b, >> >> vtn_access_chain_extend(b, base->chain, deref_chain->length); >> >> struct vtn_type *type = base->type; >> >> >> >> - /* OpPtrAccessChain is only allowed on things which support >> >> variable >> >> - * pointers. For everything else, the client is expected to just >> >> pass >> >> us >> >> - * the right access chain. >> >> + /* We can get an *PtrAccessChain with cl kernels, with first >> >> element >> >> + * of deref chain being literal zero. But we can't really cope w/ >> >> + * tacking on another level of ptr_as_array. >> >> */ >> >> - vtn_assert(!deref_chain->ptr_as_array); >> >> + vtn_assert(!(chain->ptr_as_array && deref_chain->ptr_as_array)); >> >> + chain->ptr_as_array = deref_chain->ptr_as_array; >> >> >> >> unsigned start = base->chain ? 
base->chain->length : 0; >> >> for (unsigned i = 0; i < deref_chain->length; i++) { >> >> @@ -135,6 +139,21 @@ vtn_access_link_as_ssa(struct vtn_builder *b, >> >> struct >> >> vtn_access_link link, >> >> } >> >> } >> >> >> >> +static struct vtn_access_link >> >> +vtn_to_access_link(struct vtn_builder *b, uint32_t link_id) >> >> +{ >> >> + struct vtn_value *link_val = vtn_untyped_value(b, link_id); >> >> + static struct vtn_access_link link; >> >> + if (link_val->value_type == vtn_value_type_constant) { >> >> + link.mode = vtn_access_mode_literal; >> >> + link.id = link_val->constant->values[0].u32[0]; >> >> + } else { >> >> + link.mode = vtn_access_mode_id; >> >> + link.id = link_id; >> >> + } >> >> + return link; >> >> +} >> >> + >> >> static nir_ssa_def * >> >> vtn_variable_resource_index(struct vtn_builder *b, struct vtn_variable >> >> *var, >> >> nir_ssa_def *desc_array_index) >> >> @@ -207,6 +226,12 @@ vtn_ssa_offset_pointer_dereference(struct >> >> vtn_builder >> >> *b, >> >> /* You can't have a zero-length OpPtrAccessChain */ >> >> vtn_assert(deref_chain->length >= 1); >> >> desc_arr_idx = vtn_access_link_as_ssa(b, >> >> deref_chain->link[0], 1); >> >> + } else if (!glsl_type_is_struct(type->type)) { >> >> + /* if we have something that is ptr to simple type, like >> >> an >> >> + * int ptr, then just treat that like an array of length >> >> 1, >> >> + * we just want to deref the zero'th element: >> >> + */ >> >> + desc_arr_idx = nir_imm_int(&b->nb, 0); >> >> } else { >> >> /* We have a regular non-array SSBO. */ >> >> desc_arr_idx = NULL; >> >> @@ -304,7 +329,9 @@ vtn_ssa_offset_pointer_dereference(struct >> >> vtn_builder >> >> *b, >> >> case GLSL_TYPE_FLOAT: >> >> case GLSL_TYPE_FLOAT16: >> >> case GLSL_TYPE_DOUBLE: >> >> - case GLSL_TYPE_BOOL: >> >> + case GLSL_TYPE_BOOL: { >> >> + break; >> >> + } >> >> case GLSL_TYPE_ARRAY: { >> >> nir_ssa_def *elem_offset = >> >> vtn_access_link_as_ssa(b, deref_chain->link[idx], >> >> type->stride); >> >> @@ -336,6 +363,98 @@ vtn_ssa_offset_pointer_dereference(struct >> >> vtn_builder >> >> *b, >> >> return ptr; >> >> } >> >> >> >> +static struct vtn_pointer * >> >> +vtn_ssa_raw_pointer_dereference(struct vtn_builder *b, >> >> + struct vtn_pointer *base, >> >> + struct vtn_access_chain *deref_chain) >> >> +{ >> >> + nir_ssa_def *offset = nir_imm_int(&b->nb, 0); >> >> + struct vtn_type *type = base->type; >> >> + >> >> + >> >> + vtn_assert(type); >> >> + >> >> + unsigned idx = 0; >> >> + >> >> + /* For the *PtrAccessChain deref instructions, the first entry in >> >> + * the deref chain is deref'ing the ptr as an array, ie. if you >> >> + * have "struct foo *f" then it can be deref'd as either "f->blah", >> >> + * which is equiv to "f[0].blah", or "f[n].blah". >> >> + */ >> >> + if (deref_chain->ptr_as_array) { >> >> + nir_ssa_def *elem_offset = >> >> + vtn_access_link_as_ssa(b, deref_chain->link[idx], >> >> type->stride); >> >> + offset = nir_iadd(&b->nb, offset, elem_offset); >> >> + idx++; >> >> + } >> >> + >> >> + /* TODO this can probably be refactored out into helper? Nearly >> >> + * the same as in vtn_ssa_offset_pointer_dereference() and perhaps >> >> + * elsewhere.. >> >> + * >> >> + * Note that the type here in the switch is of what the pointer >> >> + * points to, ie. if the base type is uint, we are dereferencing >> >> + * (possibly as an array) a uint* pointer. 
>> >> + */ >> >> + for (; idx < deref_chain->length; idx++) { >> >> + switch (glsl_get_base_type(type->type)) { >> >> + case GLSL_TYPE_UINT: >> >> + case GLSL_TYPE_INT: >> >> + case GLSL_TYPE_UINT16: >> >> + case GLSL_TYPE_INT16: >> >> + case GLSL_TYPE_UINT8: >> >> + case GLSL_TYPE_INT8: >> >> + case GLSL_TYPE_UINT64: >> >> + case GLSL_TYPE_INT64: >> >> + case GLSL_TYPE_FLOAT: >> >> + case GLSL_TYPE_FLOAT16: >> >> + case GLSL_TYPE_DOUBLE: >> >> + case GLSL_TYPE_BOOL: >> >> + case GLSL_TYPE_ARRAY: { >> >> + nir_ssa_def *elem_offset = >> >> + vtn_access_link_as_ssa(b, deref_chain->link[idx], >> >> type->stride); >> >> + offset = nir_iadd(&b->nb, offset, elem_offset); >> >> + if (type->array_element) { >> >> + type = type->array_element; >> >> + } else { >> >> + vtn_assert(idx == (deref_chain->length - 1)); >> >> + } >> >> + break; >> >> + } >> >> + >> >> + case GLSL_TYPE_STRUCT: { >> >> + vtn_assert(deref_chain->link[idx].mode == >> >> vtn_access_mode_literal); >> >> + unsigned member = deref_chain->link[idx].id; >> >> + nir_ssa_def *mem_offset = nir_imm_int(&b->nb, >> >> type->offsets[member]); >> >> + offset = nir_iadd(&b->nb, offset, mem_offset); >> >> + type = type->members[member]; >> >> + break; >> >> + } >> >> + >> >> + default: >> >> + vtn_fail("Invalid type for deref"); >> >> + } >> >> + } >> >> + >> >> + /* at this point, offset is 32b, but pointer can be either 32b or >> >> 64b >> >> + * depending on memory model: >> >> + */ >> >> + if (b->ptr_size == 64) { >> >> + offset = nir_u2u64(&b->nb, offset); >> >> + } >> >> + >> >> + /* add pointer address to calculated offset: */ >> >> + if (base->offset) >> >> + offset = nir_iadd(&b->nb, offset, base->offset); >> >> + >> >> + struct vtn_pointer *ptr = rzalloc(b, struct vtn_pointer); >> >> + ptr->mode = base->mode; >> >> + ptr->type = base->type; >> >> + ptr->offset = offset; >> >> + >> >> + return ptr; >> >> +} >> >> + >> >> /* Dereference the given base pointer by the access chain */ >> >> static struct vtn_pointer * >> >> vtn_pointer_dereference(struct vtn_builder *b, >> >> @@ -344,6 +463,8 @@ vtn_pointer_dereference(struct vtn_builder *b, >> >> { >> >> if (vtn_pointer_uses_ssa_offset(b, base)) { >> >> return vtn_ssa_offset_pointer_dereference(b, base, deref_chain); >> >> + } else if (base->ptr_type->base_type == vtn_base_type_raw_pointer) >> >> { >> >> + return vtn_ssa_raw_pointer_dereference(b, base, deref_chain); >> >> } else { >> >> return vtn_access_chain_pointer_dereference(b, base, >> >> deref_chain); >> >> } >> >> @@ -373,7 +494,8 @@ vtn_pointer_for_variable(struct vtn_builder *b, >> >> >> >> pointer->mode = var->mode; >> >> pointer->type = var->type; >> >> - vtn_assert(ptr_type->base_type == vtn_base_type_pointer); >> >> + vtn_assert(ptr_type->base_type == vtn_base_type_pointer || >> >> + ptr_type->base_type == vtn_base_type_raw_pointer); >> >> vtn_assert(ptr_type->deref->type == var->type->type); >> >> pointer->ptr_type = ptr_type; >> >> pointer->var = var; >> >> @@ -384,6 +506,12 @@ vtn_pointer_for_variable(struct vtn_builder *b, >> >> nir_deref_var * >> >> vtn_pointer_to_deref(struct vtn_builder *b, struct vtn_pointer *ptr) >> >> { >> >> + /* once you've chased a pointer, you no longer really have a var, >> >> + * so this case is handled differently >> >> + */ >> >> + if (!ptr->var) >> >> + return NULL; >> >> + >> >> /* Do on-the-fly copy propagation for samplers. 
*/ >> >> if (ptr->var->copy_prop_sampler) >> >> return vtn_pointer_to_deref(b, ptr->var->copy_prop_sampler); >> >> @@ -408,7 +536,19 @@ vtn_pointer_to_deref(struct vtn_builder *b, struct >> >> vtn_pointer *ptr) >> >> nir_deref *tail = &deref_var->deref; >> >> nir_variable **members = ptr->var->members; >> >> >> >> - for (unsigned i = 0; i < chain->length; i++) { >> >> + unsigned i = 0; >> >> + if (chain->ptr_as_array) { >> >> + /* We can't currently handle this in a nir deref chain. Perhaps >> >> + * a new type of deref node is needed? But we can at least >> >> handle >> >> + * the case where the first deref link is literal zero by >> >> skipping >> >> + * it. >> >> + */ >> >> + vtn_assert(chain->link[0].mode == vtn_access_mode_literal); >> >> + vtn_assert(chain->link[0].id == 0); >> >> + i++; >> >> + } >> >> + >> >> + for (; i < chain->length; i++) { >> >> enum glsl_base_type base_type = >> >> glsl_get_base_type(deref_type->type); >> >> switch (base_type) { >> >> case GLSL_TYPE_UINT: >> >> @@ -598,6 +738,49 @@ vtn_local_store(struct vtn_builder *b, struct >> >> vtn_ssa_value *src, >> >> } >> >> } >> >> >> >> +struct vtn_ssa_value * >> >> +vtn_pointer_load(struct vtn_builder *b, struct vtn_pointer *ptr) >> >> +{ >> >> + const struct glsl_type *type = ptr->type->type; >> >> + struct vtn_ssa_value *val = vtn_create_ssa_value(b, type); >> >> + nir_intrinsic_op op = nir_intrinsic_load_global; >> >> + >> >> + vtn_assert(ptr->offset); >> >> + >> >> + nir_intrinsic_instr *intrin = nir_intrinsic_instr_create(b->shader, >> >> op); >> >> + intrin->num_components = glsl_get_vector_elements(type); >> >> + intrin->src[0] = nir_src_for_ssa(ptr->offset); >> >> + >> >> + nir_ssa_dest_init(&intrin->instr, &intrin->dest, >> >> + intrin->num_components, >> >> + glsl_get_bit_size(type), >> >> + NULL); >> >> + val->def = &intrin->dest.ssa; >> >> + >> >> + nir_builder_instr_insert(&b->nb, &intrin->instr); >> >> + >> >> + return val; >> >> +} >> >> + >> >> +void >> >> +vtn_pointer_store(struct vtn_builder *b, struct vtn_ssa_value *src, >> >> + struct vtn_pointer *ptr) >> >> +{ >> >> + const struct glsl_type *type = ptr->type->type; >> >> + nir_intrinsic_op op = nir_intrinsic_store_global; >> >> + >> >> + vtn_assert(ptr->offset); >> >> + >> >> + nir_intrinsic_instr *intrin = nir_intrinsic_instr_create(b->shader, >> >> op); >> >> + intrin->num_components = MAX2(1, glsl_get_vector_elements(type)); >> >> + intrin->src[0] = nir_src_for_ssa(src->def); >> >> + intrin->src[1] = nir_src_for_ssa(ptr->offset); >> >> + >> >> + nir_intrinsic_set_write_mask(intrin, (1 << intrin->num_components) >> >> - >> >> 1); >> >> + >> >> + nir_builder_instr_insert(&b->nb, &intrin->instr); >> >> +} >> >> + >> >> nir_ssa_def * >> >> vtn_pointer_to_offset(struct vtn_builder *b, struct vtn_pointer *ptr, >> >> nir_ssa_def **index_out, unsigned *end_idx_out) >> >> @@ -1004,20 +1187,33 @@ _vtn_variable_load_store(struct vtn_builder *b, >> >> bool load, >> >> case GLSL_TYPE_FLOAT: >> >> case GLSL_TYPE_FLOAT16: >> >> case GLSL_TYPE_BOOL: >> >> - case GLSL_TYPE_DOUBLE: >> >> - /* At this point, we have a scalar, vector, or matrix so we know >> >> that >> >> - * there cannot be any structure splitting still in the way. By >> >> - * stopping at the matrix level rather than the vector level, we >> >> - * ensure that matrices get loaded in the optimal way even if >> >> they >> >> - * are storred row-major in a UBO. 
>> >> - */ >> >> - if (load) { >> >> - *inout = vtn_local_load(b, vtn_pointer_to_deref(b, ptr)); >> >> + case GLSL_TYPE_DOUBLE: { >> >> + nir_deref_var *deref_var = vtn_pointer_to_deref(b, ptr); >> >> + >> >> + if (deref_var) { >> >> + /* At this point, we have a scalar, vector, or matrix so we >> >> know >> >> that >> >> + * there cannot be any structure splitting still in the way. >> >> By >> >> + * stopping at the matrix level rather than the vector level, >> >> we >> >> + * ensure that matrices get loaded in the optimal way even if >> >> they >> >> + * are storred row-major in a UBO. >> >> + */ >> >> + if (load) { >> >> + *inout = vtn_local_load(b, deref_var); >> >> + } else { >> >> + vtn_local_store(b, *inout, deref_var); >> >> + } >> >> } else { >> >> - vtn_local_store(b, *inout, vtn_pointer_to_deref(b, ptr)); >> >> + /* If deref'ing a raw pointer, we don't have a variable to go >> >> along >> >> + * with it. Just directly generate load/store_global >> >> intrinsics: >> >> + */ >> >> + if (load) { >> >> + *inout = vtn_pointer_load(b, ptr); >> >> + } else { >> >> + vtn_pointer_store(b, *inout, ptr); >> >> + } >> >> } >> >> return; >> >> - >> >> + } >> >> case GLSL_TYPE_ARRAY: >> >> case GLSL_TYPE_STRUCT: { >> >> unsigned elems = glsl_get_length(ptr->type->type); >> >> @@ -1052,6 +1248,8 @@ vtn_variable_load(struct vtn_builder *b, struct >> >> vtn_pointer *src) >> >> { >> >> if (vtn_pointer_is_external_block(b, src)) { >> >> return vtn_block_load(b, src); >> >> + } else if (!src->var) { >> >> + return vtn_pointer_load(b, src); >> >> } else { >> >> struct vtn_ssa_value *val = NULL; >> >> _vtn_variable_load_store(b, true, src, &val); >> >> @@ -1065,8 +1263,11 @@ vtn_variable_store(struct vtn_builder *b, struct >> >> vtn_ssa_value *src, >> >> { >> >> if (vtn_pointer_is_external_block(b, dest)) { >> >> vtn_assert(dest->mode == vtn_variable_mode_ssbo || >> >> - dest->mode == vtn_variable_mode_workgroup); >> >> + dest->mode == vtn_variable_mode_workgroup || >> >> + dest->mode == vtn_variable_mode_param); >> >> vtn_block_store(b, src, dest); >> >> + } else if (!dest->var) { >> >> + vtn_pointer_store(b, src, dest); >> >> } else { >> >> _vtn_variable_load_store(b, false, dest, &src); >> >> } >> >> @@ -1608,6 +1809,7 @@ static enum vtn_variable_mode >> >> vtn_storage_class_to_mode(struct vtn_builder *b, >> >> SpvStorageClass class, >> >> struct vtn_type *interface_type, >> >> + bool has_initializer, >> >> nir_variable_mode *nir_mode_out) >> >> { >> >> enum vtn_variable_mode mode; >> >> @@ -1635,8 +1837,12 @@ vtn_storage_class_to_mode(struct vtn_builder *b, >> >> } else if (glsl_type_is_sampler(interface_type->type)) { >> >> mode = vtn_variable_mode_sampler; >> >> nir_mode = nir_var_uniform; >> >> + } else if (has_initializer) { >> >> + mode = vtn_variable_mode_const; >> >> + nir_mode = nir_var_global; >> >> } else { >> >> - vtn_fail("Invalid uniform constant variable type"); >> >> + mode = vtn_variable_mode_push_constant; >> >> + nir_mode = nir_var_uniform; >> >> } >> >> break; >> >> case SpvStorageClassPushConstant: >> >> @@ -1664,6 +1870,10 @@ vtn_storage_class_to_mode(struct vtn_builder *b, >> >> nir_mode = nir_var_shared; >> >> break; >> >> case SpvStorageClassCrossWorkgroup: >> >> + mode = vtn_variable_mode_cross_workgroup; >> >> + // TODO do we want a new nir_mode for raw pointers? 
>> >> + nir_mode = nir_var_global; >> >> + break; >> >> case SpvStorageClassGeneric: >> >> case SpvStorageClassAtomicCounter: >> >> default: >> >> @@ -1692,7 +1902,14 @@ vtn_pointer_to_ssa(struct vtn_builder *b, struct >> >> vtn_pointer *ptr) >> >> struct vtn_access_chain chain = { >> >> .length = 0, >> >> }; >> >> - ptr = vtn_ssa_offset_pointer_dereference(b, ptr, &chain); >> >> + >> >> + if (vtn_pointer_uses_ssa_offset(b, ptr)) { >> >> + ptr = vtn_ssa_offset_pointer_dereference(b, ptr, &chain); >> >> + } else if (ptr->ptr_type->base_type == >> >> vtn_base_type_raw_pointer) { >> >> + ptr = vtn_ssa_raw_pointer_dereference(b, ptr, &chain); >> >> + } else { >> >> + vtn_fail("unhandled"); >> >> + } >> >> } >> >> >> >> vtn_assert(ptr->offset); >> >> @@ -1710,27 +1927,34 @@ struct vtn_pointer * >> >> vtn_pointer_from_ssa(struct vtn_builder *b, nir_ssa_def *ssa, >> >> struct vtn_type *ptr_type) >> >> { >> >> - vtn_assert(ssa->num_components <= 2 && ssa->bit_size == 32); >> >> - vtn_assert(ptr_type->base_type == vtn_base_type_pointer); >> >> + vtn_assert(ptr_type->base_type == vtn_base_type_pointer || >> >> + ptr_type->base_type == vtn_base_type_raw_pointer); >> >> vtn_assert(ptr_type->deref->base_type != vtn_base_type_pointer); >> >> /* This pointer type needs to have actual storage */ >> >> vtn_assert(ptr_type->type); >> >> >> >> struct vtn_pointer *ptr = rzalloc(b, struct vtn_pointer); >> >> ptr->mode = vtn_storage_class_to_mode(b, ptr_type->storage_class, >> >> - ptr_type, NULL); >> >> + ptr_type, false, NULL); >> >> + >> >> + if (ptr_type->base_type == vtn_base_type_raw_pointer) { >> >> + vtn_assert(ssa->num_components == 1 && ssa->bit_size == >> >> b->ptr_size); >> >> + } else { >> >> + vtn_assert(ssa->num_components <= 2 && ssa->bit_size == 32); >> >> + } >> >> + >> >> ptr->type = ptr_type->deref; >> >> ptr->ptr_type = ptr_type; >> >> >> >> if (ssa->num_components > 1) { >> >> vtn_assert(ssa->num_components == 2); >> >> vtn_assert(ptr->mode == vtn_variable_mode_ubo || >> >> - ptr->mode == vtn_variable_mode_ssbo); >> >> + ptr->mode == vtn_variable_mode_ssbo || >> >> + ptr->mode == vtn_variable_mode_global); >> >> ptr->block_index = nir_channel(&b->nb, ssa, 0); >> >> ptr->offset = nir_channel(&b->nb, ssa, 1); >> >> } else { >> >> vtn_assert(ssa->num_components == 1); >> >> - vtn_assert(ptr->mode == vtn_variable_mode_workgroup); >> >> ptr->block_index = NULL; >> >> ptr->offset = ssa; >> >> } >> >> @@ -1761,7 +1985,8 @@ vtn_create_variable(struct vtn_builder *b, struct >> >> vtn_value *val, >> >> struct vtn_type *ptr_type, SpvStorageClass >> >> storage_class, >> >> nir_constant *initializer) >> >> { >> >> - vtn_assert(ptr_type->base_type == vtn_base_type_pointer); >> >> + vtn_assert(ptr_type->base_type == vtn_base_type_pointer || >> >> + ptr_type->base_type == vtn_base_type_raw_pointer); >> >> struct vtn_type *type = ptr_type->deref; >> >> >> >> struct vtn_type *without_array = type; >> >> @@ -1770,7 +1995,8 @@ vtn_create_variable(struct vtn_builder *b, struct >> >> vtn_value *val, >> >> >> >> enum vtn_variable_mode mode; >> >> nir_variable_mode nir_mode; >> >> - mode = vtn_storage_class_to_mode(b, storage_class, without_array, >> >> &nir_mode); >> >> + mode = vtn_storage_class_to_mode(b, storage_class, without_array, >> >> + !!initializer, &nir_mode); >> >> >> >> switch (mode) { >> >> case vtn_variable_mode_ubo: >> >> @@ -1933,6 +2159,7 @@ vtn_create_variable(struct vtn_builder *b, struct >> >> vtn_value *val, >> >> case vtn_variable_mode_ubo: >> >> case vtn_variable_mode_ssbo: >> >> case 
vtn_variable_mode_push_constant: >> >> + case vtn_variable_mode_cross_workgroup: >> >> /* These don't need actual variables. */ >> >> break; >> >> } >> >> @@ -2031,25 +2258,18 @@ vtn_handle_variables(struct vtn_builder *b, >> >> SpvOp >> >> opcode, >> >> case SpvOpAccessChain: >> >> case SpvOpPtrAccessChain: >> >> case SpvOpInBoundsAccessChain: { >> >> + struct vtn_type *ptr_type = vtn_value(b, w[1], >> >> vtn_value_type_type)->type; >> >> + struct vtn_value *base_val = vtn_untyped_value(b, w[3]); >> >> struct vtn_access_chain *chain = vtn_access_chain_create(b, >> >> count - >> >> 4); >> >> - chain->ptr_as_array = (opcode == SpvOpPtrAccessChain); >> >> + chain->ptr_as_array = (opcode == SpvOpPtrAccessChain) || >> >> + (opcode == SpvOpInBoundsPtrAccessChain); >> >> >> >> unsigned idx = 0; >> >> for (int i = 4; i < count; i++) { >> >> - struct vtn_value *link_val = vtn_untyped_value(b, w[i]); >> >> - if (link_val->value_type == vtn_value_type_constant) { >> >> - chain->link[idx].mode = vtn_access_mode_literal; >> >> - chain->link[idx].id = >> >> link_val->constant->values[0].u32[0]; >> >> - } else { >> >> - chain->link[idx].mode = vtn_access_mode_id; >> >> - chain->link[idx].id = w[i]; >> >> - >> >> - } >> >> + chain->link[idx] = vtn_to_access_link(b, w[i]); >> >> idx++; >> >> } >> >> >> >> - struct vtn_type *ptr_type = vtn_value(b, w[1], >> >> vtn_value_type_type)->type; >> >> - struct vtn_value *base_val = vtn_untyped_value(b, w[3]); >> >> if (base_val->value_type == vtn_value_type_sampled_image) { >> >> /* This is rather insane. SPIR-V allows you to use >> >> OpSampledImage >> >> * to combine an array of images with a single sampler to get >> >> an >> >> -- >> >> 2.14.3 >> >> >> >> _______________________________________________ >> >> mesa-dev mailing list >> >> mesa-dev@lists.freedesktop.org >> >> https://lists.freedesktop.org/mailman/listinfo/mesa-dev >> > >> > > > _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev