On Tue, 2016-08-23 at 12:58 -0700, Francisco Jerez wrote: > Iago Toral Quiroga <ito...@igalia.com> writes: > > > > > So we can access it in the vec4 backend to handle byte offsets into > > registers. > This change has deep implications in the meaning of the vec4 register > objects because the representation of register offsets is now split > between 'reg_offset' and 'subreg_offset', and there are a *lot* of > places that directly rely on the current representation of register > offsets, just grep the VEC4 back-end for 'reg_offset' -- Every > occurrence is now suspect of being inaccurate because the offset of a > register is no longer guaranteed to be 32B-aligned, so comparing > reg_offsets alone to find out whether two registers are equivalent or > whether they overlap is no longer sufficient. > > Do you *really* need to make this representation change for FP64?
No, in fact the fp64 series I sent did not need this at all. I only did this because in the review of the simd lowering pass you mentioned that we should use a horiz_offset() helper that worked in terms of channels and I assumed that also meant that you wanted offset() to work in similar fashion, like we have in the FS backend. I am actually relieved to see that you did not mean this :) That said, to implement horiz_offset() we need to use something like subnr or subreg_offset, right? so right after the simd splitting pass runs we are again in the situation where offsets are split between two fields. You mentioned in another e-mail that we should probably postpone simd splitting after the main opt loop, so I guess that should remove the problem for the most part... > Or is > there some way around it? ISTR that in the big VEC4 FP64 series you > ended up using backend_reg::subnr for this (which is just an > undercover > version of subreg_offset so the same caveat applies :P) when you had > to > address both halves of a single-precision register during SIMD > lowering Right, I thought this use of subnr there was maybe a bit of an abuse since it seems that before fp64 we mostly only used it at codegen time so I thought that having a different field for this in the IR (subreg_offset, like we do in the FS backend) was probably a better idea, but maybe it doesn't really matter much. If you have no issues with using subnr instead of subreg_offset for this I am fine too. The thing is that in the simd splitting pass I sent for review we assumed that we are only splitting DF instructions (because that's really the only case at the moment) and that the split DF dst and src regions are going to be exactly one SIMD register long (which is really what we end up with in all the cases where we need to split). Because of that, we can use the offset() helper to compute the start of the src/dst regions for DF operands (and subnr if any of the operands in the instruction is 32-bit) and because of that we added this assert: /* We always split so that each lowered instruction writes exactly to * one register. */ assert(inst->regs_written == inst->exec_size / lowered_width); If that assert is not met by any DF instruction that we need to split it means that we are splitting things in smaller chunks than that and using offset() when computing the split regions won't do what we need for that scenario, so that's why I added the assert. We don't have any such cases at the moment, but in the review process you suggested that we changed the implementation to use a horiz_offset() and remove that assertion, so I understood that you wanted the splitting pass to deal with regions that would not be register aligned, and that requires that we have a way to express offsets that are not register aligned too, and that's how I ended up doing this. I suppose I misunderstood your comments? > (Can you remind me what the exact use-case was for that?). We use subnr in two scenarios: 1. To select the second half of a DF register (ZW), but we only do this right before codegen, in the IR we work with the ZW wizzles. 2. To generate the second half of split instructions that use single- precision sources that are not uniforms, since these would not be register aligned after the split (they need to be 16-byte aligned). > To address > the ZW components of a double-precision vector this seems less of a > requirement because you can just use the logical (i.e. 64-bit-based) > swizzles at the IR level and only translate to physical > swizzles+offsets+strides during codegen time. That's exactly what we are doing, but as I said, we also need to use subnr when we split a DF instruction that has single-precision sources. Off the top of my head this can happen, for example, with some virtual opcodes like {pick/set}_{low/high}_32bit that we need for things like packDouble2x32 and similar situations (basically when we need to produce code that operates on the low/high 32-bit of a DF). There might be other cases of DF operations that take a single-precision source, I would have to check to confirm. > If the answer is that you definitely need this change, I hope the answer is that we don't need this... we have the subnr situation in split instructions that use single-precision sources. I don't think we can avoid that, but as long as we only call the simd splitting pass after the optimization loop we should be mostly fine, right? I am still a bit concerned with the changes you suggested to the simd splitting pass to deal with the possibility over dst/src overlap (basically that we make split instructions write to temps and the copy from that to the actual result), because we would like to run copy- propagation right after to clean up the temporaries if they are not really needed but I guess we can avoid that situation by making the pass responsible for not producing temporaries if there is no hazardous overlap, even if that requires additional complexity. > I think I'm going > to help you out with this because the change is going to be massive > and > involve auditing the whole VEC4 back-end. I think there are two > lessons > to learn from the FS back-end: > > - Having the register offset split between two variables has been an > endless source of bugs, because it's just too tempting to only > take > one of them into account and ignore the other. > > - It wouldn't be substantially more difficult to change reg_offset > to > be expressed in byte units instead, even if it involves going > through > every occurrence of reg_offset in the back-end, because adding a > separate subreg_offset field invalidates every ocurrence of > reg_offset in the back-end anyway. Yes, I agree. Iago > > > > --- > > src/mesa/drivers/dri/i965/brw_ir_fs.h | 6 ------ > > src/mesa/drivers/dri/i965/brw_shader.h | 6 ++++++ > > 2 files changed, 6 insertions(+), 6 deletions(-) > > > > diff --git a/src/mesa/drivers/dri/i965/brw_ir_fs.h > > b/src/mesa/drivers/dri/i965/brw_ir_fs.h > > index f214483..00fbace 100644 > > --- a/src/mesa/drivers/dri/i965/brw_ir_fs.h > > +++ b/src/mesa/drivers/dri/i965/brw_ir_fs.h > > @@ -52,12 +52,6 @@ public: > > /** Smear a channel of the reg to all channels. */ > > fs_reg &set_smear(unsigned subreg); > > > > - /** > > - * Offset in bytes from the start of the register. Values up > > to a > > - * backend_reg::reg_offset unit are valid. > > - */ > > - int subreg_offset; > > - > > /** Register region horizontal stride */ > > uint8_t stride; > > }; > > diff --git a/src/mesa/drivers/dri/i965/brw_shader.h > > b/src/mesa/drivers/dri/i965/brw_shader.h > > index e61c080..ae23830 100644 > > --- a/src/mesa/drivers/dri/i965/brw_shader.h > > +++ b/src/mesa/drivers/dri/i965/brw_shader.h > > @@ -75,6 +75,12 @@ struct backend_reg : private brw_reg > > */ > > uint16_t reg_offset; > > > > + /** > > + * Offset in bytes from the start of the register. Values up > > to a > > + * backend_reg::reg_offset unit are valid. > > + */ > > + uint16_t subreg_offset; > > + > > using brw_reg::type; > > using brw_reg::file; > > using brw_reg::negate; _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev