On Thu, 2017-02-16 at 08:11 +0100, Samuel Iglesias Gonsálvez wrote: > On Wed, 2017-02-15 at 12:08 -0800, Francisco Jerez wrote: > > Samuel Iglesias Gonsálvez <sigles...@igalia.com> writes: > > > > > From: "Juan A. Suarez Romero" <jasua...@igalia.com> > > > > > > When converting a DF to 32-bit conversions, we set dst stride to > > > 2, > > > to fulfill alignment restrictions because the upper Dword of > > > every > > > Qword will be written with undefined value. > > > > > > But in IVB/BYT, this is not necessary, as each DF conversion > > > already > > > writes 2, the first one the real value, and the second one a 0. > > > That is, IVB/BYT already set stride = 2 implicitly, so we must > > > set > > > it to > > > 1 explicitly to avoid ending up with stride = 4. > > > > > > v2: > > > - Fix typo (Matt) > > > > > > v3: > > > - Fix stride in the destination's brw_reg, don't modity IR > > > (Curro) > > > > > > Signed-off-by: Juan A. Suarez Romero <jasua...@igalia.com> > > > Signed-off-by: Samuel Iglesias Gonsálvez <sigles...@igalia.com> > > > --- > > > src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 76 > > > +++++++++++++++----------- > > > 1 file changed, 45 insertions(+), 31 deletions(-) > > > > > > diff --git a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp > > > b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp > > > index 2f60ddd8706..e0a80d73b70 100644 > > > --- a/src/mesa/drivers/dri/i965/brw_fs_generator.cpp > > > +++ b/src/mesa/drivers/dri/i965/brw_fs_generator.cpp > > > @@ -55,7 +55,7 @@ brw_file_from_reg(fs_reg *reg) > > > > > > static struct brw_reg > > > brw_reg_from_fs_reg(const struct gen_device_info *devinfo, > > > fs_inst > > > *inst, > > > - fs_reg *reg, bool compressed) > > > + fs_reg *reg, bool is_dst, bool compressed) > > > { > > > struct brw_reg brw_reg; > > > > > > @@ -94,34 +94,48 @@ brw_reg_from_fs_reg(const struct > > > gen_device_info *devinfo, fs_inst *inst, > > > const unsigned width = MIN2(reg_width, phys_width); > > > brw_reg = brw_vecn_reg(width, brw_file_from_reg(reg), > > > reg->nr, 0); > > > brw_reg = stride(brw_reg, width * reg->stride, width, > > > reg->stride); > > > - /* From the IvyBridge PRM (EU Changes by Processor > > > Generation, page 13): > > > - * "Each DF (Double Float) operand uses an element > > > size > > > of 4 rather > > > - * than 8 and all regioning parameters are twice what > > > the values > > > - * would be based on the true element size: ExecSize, > > > Width, > > > - * HorzStride, and VertStride. Each DF operand uses a > > > pair of > > > - * channels and all masking and swizzing should be > > > adjusted > > > - * appropriately." > > > - * > > > - * From the IvyBridge PRM (Special Requirements for > > > Handling Double > > > - * Precision Data Types, page 71): > > > - * "In Align1 mode, all regioning parameters like > > > stride, execution > > > - * size, and width must use the syntax of a pair of > > > packed > > > - * floats. The offsets for these data types must be > > > 64- > > > bit > > > - * aligned. The execution size and regioning > > > parameters > > > are in terms > > > - * of floats." > > > - * > > > - * Summarized: when handling DF-typed arguments, > > > ExecSize, > > > - * VertStride, and Width must be doubled, and > > > HorzStride > > > must be > > > - * doubled when the region is not scalar. > > > - * > > > - * It applies to BayTrail too. > > > - */ > > > - if (devinfo->gen == 7 && !devinfo->is_haswell && > > > - type_sz(reg->type) == 8) { > > > - brw_reg.width++; > > > - if (brw_reg.vstride > 0) > > > - brw_reg.vstride++; > > > - assert(brw_reg.hstride == BRW_HORIZONTAL_STRIDE_1); > > > + > > > + if (devinfo->gen == 7 && !devinfo->is_haswell) { > > > + /* From the IvyBridge PRM (EU Changes by Processor > > > Generation, page 13): > > > + * "Each DF (Double Float) operand uses an element > > > size of 4 rather > > > + * than 8 and all regioning parameters are twice > > > what the values > > > + * would be based on the true element size: > > > ExecSize, Width, > > > + * HorzStride, and VertStride. Each DF operand > > > uses > > > a pair of > > > + * channels and all masking and swizzing should be > > > adjusted > > > + * appropriately." > > > + * > > > + * From the IvyBridge PRM (Special Requirements for > > > Handling Double > > > + * Precision Data Types, page 71): > > > + * "In Align1 mode, all regioning parameters like > > > stride, execution > > > + * size, and width must use the syntax of a pair > > > of > > > packed > > > + * floats. The offsets for these data types must > > > be > > > 64-bit > > > + * aligned. The execution size and regioning > > > parameters are in terms > > > + * of floats." > > > + * > > > + * Summarized: when handling DF-typed arguments, > > > ExecSize, > > > + * VertStride, and Width must be doubled, and > > > HorzStride must be > > > + * doubled when the region is not scalar. > > > + * > > > + * It applies to BayTrail too. > > > + */ > > > + if (type_sz(reg->type) == 8) { > > > + brw_reg.width++; > > > + if (brw_reg.vstride > 0) > > > + brw_reg.vstride++; > > > + assert(brw_reg.hstride == > > > BRW_HORIZONTAL_STRIDE_1); > > > + } > > > + > > > + /* When converting from DF->F, we set destination's > > > stride as 2 as an > > > + * because each d2f conversion implicitly writes 2 > > > F, > > > + * being the first one the converted value. IVB/BYT > > > already set > > > + * stride = 2 implicitly, so we must set it to 1 > > > explicitly to avoid > > > + * ending up with stride = 4. > > > + */ > > > > The sentence starting from "IVB/BYT already set..." seems kind of > > misleading, because the hardware's behavior is really not the same > > as > > a > > stride=2 region -- The hardware actually writes two F components > > per > > SIMD channel, and every other component is filled with > > garbage. Stride > > behaves as usual, it's just that the hardware outputs twice as many > > components as you'd expect. > > > > OK > > > > + if (is_dst && get_exec_type_size(inst) == 8 && > > > + type_sz(reg->type) < 8) { > > > + assert(brw_reg.hstride == > > > BRW_HORIZONTAL_STRIDE_2); > > > + brw_reg.hstride = BRW_HORIZONTAL_STRIDE_1; > > > > You could change this line to 'brw_reg.hstride--' and relax the > > assertion for it to handle arbitrary destination strides. > > > > > + } > > > } > > > } > > > > > > @@ -1626,7 +1640,7 @@ fs_generator::generate_code(const cfg_t > > > *cfg, > > > int dispatch_width) > > > > > > for (unsigned int i = 0; i < inst->sources; i++) { > > > src[i] = brw_reg_from_fs_reg(devinfo, inst, > > > - &inst->src[i], > > > compressed); > > > + &inst->src[i], false, > > > compressed); > > > > Multiple boolean arguments make your code rather hard to read (what > > is > > this false literal referring to again?) and increases the > > likelihood > > of > > mistaking one of the boolean arguments for the other, since the > > type > > checker won't be able to notice the difference. Instead you could > > drop > > the argument altogether and replace the is_dst check in > > brw_reg_from_fs_reg() with a 'reg == &inst->dst' comparison, or > > take > > the > > destination region hstride munging out of brw_reg_from_fs_reg() and > > do > > it below instead, right after the 'dst = ...' assignment. > > >
By the way, Do you grant your R-b to this patch with the aforementioned changes? Sam > Right. I will change it. > > Thanks, > > Sam > > > > /* The accumulator result appears to get used for the > > > * conditional modifier generation. When negating a UD > > > * value, there is a 33rd bit generated for the sign in > > > the > > > @@ -1638,7 +1652,7 @@ fs_generator::generate_code(const cfg_t > > > *cfg, > > > int dispatch_width) > > > !inst->src[i].negate); > > > } > > > dst = brw_reg_from_fs_reg(devinfo, inst, > > > - &inst->dst, compressed); > > > + &inst->dst, true, compressed); > > > > > > brw_set_default_access_mode(p, BRW_ALIGN_1); > > > brw_set_default_predicate_control(p, inst->predicate); > > > -- > > > 2.11.0 > > > > > > _______________________________________________ > > > mesa-dev mailing list > > > mesa-dev@lists.freedesktop.org > > > https://lists.freedesktop.org/mailman/listinfo/mesa-dev > > _______________________________________________ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/mesa-dev
signature.asc
Description: This is a digitally signed message part
_______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev