Paul Berry <stereotype...@gmail.com> writes: > On 21 October 2013 17:48, Eric Anholt <e...@anholt.net> wrote: > >> Things blew up when I enabled the debug register spill code without >> disabling 16-wide, so I decided to just fix 16-wide spilling. >> >> We still don't generate 16-wide when register spilling happens as part of >> allocation (since we expect it to be slower), but now we can experiment >> with allowing it in some cases in the future. >> > > Also, we'll need the ability to spill in 16-wide when we get to doing > compute shaders, because there are certain conditions in compute shaders > where SIMD8 isn't an option (basically, if a compute shader uses barriers > or shared variables, all the threads within a thread group need to run > simultaneously on one subslice; with the maximum possible size thread group > (1024), that's only possible if we use SIMD16. See Graphics BSpec: > 3D-Media-GPGPU Engine > Media GPGPU Pipeline > Media GPGPU Pipeline > > Programming the GPGPU Pipeline IVB+ > GPGPU Thread Limits [DevHSW+]). > > >> --- >> src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 8 ++++---- >> src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp | 15 ++++++++------- >> 2 files changed, 12 insertions(+), 11 deletions(-) >> >> diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp >> b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp >> index fa15f7b..6c8fb76 100644 >> --- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp >> +++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp >> @@ -718,8 +718,8 @@ fs_generator::generate_spill(fs_inst *inst, struct >> brw_reg src) >> brw_MOV(p, >> retype(brw_message_reg(inst->base_mrf + 1), >> BRW_REGISTER_TYPE_UD), >> retype(src, BRW_REGISTER_TYPE_UD)); >> - brw_oword_block_write_scratch(p, brw_message_reg(inst->base_mrf), 1, >> - inst->offset); >> + brw_oword_block_write_scratch(p, brw_message_reg(inst->base_mrf), >> + inst->mlen, inst->offset); >> } >> >> void >> @@ -727,8 +727,8 @@ fs_generator::generate_unspill(fs_inst *inst, struct >> brw_reg dst) >> { >> assert(inst->mlen != 0); >> >> - brw_oword_block_read_scratch(p, dst, brw_message_reg(inst->base_mrf), >> 1, >> - inst->offset); >> + brw_oword_block_read_scratch(p, dst, brw_message_reg(inst->base_mrf), >> + dispatch_width / 8, inst->offset); >> } >> >> void >> diff --git a/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp >> b/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp >> index 7826cd4..ed0ce0d 100644 >> --- a/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp >> +++ b/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp >> @@ -540,7 +540,7 @@ fs_visitor::emit_unspill(fs_inst *inst, fs_reg dst, >> uint32_t spill_offset, >> inst->insert_before(unspill_inst); >> >> dst.reg_offset++; >> > > I had to do a lot of digging to reassure myself that this didn't need to > change to "dst.reg_offset += dispatch_width / 8;". Would you mind > clarifying the comment above the declaration of reg_offset in brw_fs.h? > Currently it says: > > /** > * For virtual registers, this is a hardware register offset from > * the start of the register block (for example, a constant index > * in an array access). > */ > > which seems to imply that it's measured in increments of a single 256-bit > register. But it appears that for virtual registers, it's actually in > increments of (dispatch_width/8) 256-bit registers. > > >> - spill_offset += REG_SIZE; >> + spill_offset += dispatch_width * sizeof(float); >> } >> } >> >> @@ -624,10 +624,11 @@ fs_visitor::choose_spill_reg(struct ra_graph *g) >> void >> fs_visitor::spill_reg(int spill_reg) >> { >> + int reg_size = dispatch_width * sizeof(float); >> int size = virtual_grf_sizes[spill_reg]; >> unsigned int spill_offset = c->last_scratch; >> assert(ALIGN(spill_offset, 16) == spill_offset); /* oword read/write >> req. */ >> - c->last_scratch += size * REG_SIZE; >> + c->last_scratch += size * reg_size; >> >> /* Generate spill/unspill instructions for the objects being >> * spilled. Right now, we spill or unspill the whole thing to a >> @@ -646,7 +647,7 @@ fs_visitor::spill_reg(int spill_reg) >> inst->src[i].reg_offset = 0; >> >> emit_unspill(inst, inst->src[i], >> - spill_offset + REG_SIZE * >> inst->src[i].reg_offset, >> + spill_offset + reg_size * >> inst->src[i].reg_offset, >> regs_read); >> } >> } >> @@ -654,7 +655,7 @@ fs_visitor::spill_reg(int spill_reg) >> if (inst->dst.file == GRF && >> inst->dst.reg == spill_reg) { >> int subset_spill_offset = (spill_offset + >> - REG_SIZE * inst->dst.reg_offset); >> + reg_size * inst->dst.reg_offset); >> inst->dst.reg = virtual_grf_alloc(inst->regs_written); >> inst->dst.reg_offset = 0; >> >> @@ -677,11 +678,11 @@ fs_visitor::spill_reg(int spill_reg) >> fs_inst *spill_inst = new(mem_ctx) fs_inst(FS_OPCODE_SPILL, >> reg_null_f, >> spill_src); >> spill_src.reg_offset++; >> - spill_inst->offset = subset_spill_offset + chan * REG_SIZE; >> + spill_inst->offset = subset_spill_offset + chan * reg_size; >> spill_inst->ir = inst->ir; >> spill_inst->annotation = inst->annotation; >> - spill_inst->base_mrf = 14; >> - spill_inst->mlen = 2; /* header, value */ >> + spill_inst->mlen = 1 + dispatch_width / 8; /* header, value */ >> + spill_inst->base_mrf = 16 - spill_inst->mlen; >> > > This means that for SIMD16 spills, we'll use MRFs 13-15 (previously we used > MRFs 14-15). Do we know for sure that MRF 13 isn't used by any > non-spill-related code? If so, would you mind adding a quick comment to > clarify that?
Thanks for catching this. Not only do we not know that m13 isn't used, we don't even know that m14 isn't used in 16-wide! I think pre-regalloc I need to look at the set of used MRFs (like we do for the hack range currently), and if we spill while m13-15 are used then just fail() out.
pgpaiXvmnXDNE.pgp
Description: PGP signature
_______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev