On Fri, 2019-02-01 at 11:23 -0800, Francisco Jerez wrote: > Iago Toral <ito...@igalia.com> writes: > > > On Fri, 2019-01-25 at 12:54 -0800, Francisco Jerez wrote: > > > Iago Toral <ito...@igalia.com> writes: > > > > > > > On Thu, 2019-01-24 at 11:45 -0800, Francisco Jerez wrote: > > > > > Iago Toral <ito...@igalia.com> writes: > > > > > > > > > > > On Wed, 2019-01-23 at 06:03 -0800, Francisco Jerez wrote: > > > > > > > Iago Toral Quiroga <ito...@igalia.com> writes: > > > > > > > > > > > > > > > Commit c84ec70b3a72 implemented execution type > > > > > > > > promotion to > > > > > > > > 32- > > > > > > > > bit > > > > > > > > for > > > > > > > > conversions involving half-float registers, which > > > > > > > > empirical > > > > > > > > testing > > > > > > > > suggested > > > > > > > > was required, but it did not incorporate this change > > > > > > > > into > > > > > > > > the > > > > > > > > assembly validator > > > > > > > > logic. This commits adds that, preventing validation > > > > > > > > errors > > > > > > > > like > > > > > > > > this: > > > > > > > > > > > > > > > > > > > > > > I don't think we should be validating empirical > > > > > > > assumptions > > > > > > > in > > > > > > > the EU > > > > > > > validator. > > > > > > > > > > > > I am not sure I get your point, isn't c84ec70b3a72 also > > > > > > based > > > > > > on > > > > > > empirical testing after all? > > > > > > > > > > > > > > > > To some extent, but it doesn't attempt to enforce ISA > > > > > restrictions > > > > > based > > > > > on information obtained empirically. > > > > > > > > > > > > > > > > > > > mov(16) g9<4>B g3<16,8,2>HF { align1 1H }; > > > > > > > > ERROR: Destination stride must be equal to the ratio of > > > > > > > > the > > > > > > > > sizes > > > > > > > > of the > > > > > > > > execution data type to the destination type > > > > > > > > > > > > > > > > Fixes: c84ec70b3a72 "intel/fs: Promote execution type > > > > > > > > to > > > > > > > > 32-bit > > > > > > > > when any half-float conversion is needed." > > > > > > > > > > > > > > I don't think this "fixes" anything that ever worked. > > > > > > > > > > > > It is true that the code in that trace above is not > > > > > > something > > > > > > we > > > > > > can > > > > > > produce right now, because it is a conversion from HF to B > > > > > > and > > > > > > that > > > > > > should only happen within the context of > > > > > > VK_KHR_shader_float16_int8, > > > > > > however, this is a consequence of the fact that since > > > > > > c84ec70b3a72 > > > > > > there is an inconsistency between what we do at the IR > > > > > > level > > > > > > regarding > > > > > > execution size of HF conversions and what the EU validator > > > > > > is > > > > > > doing, > > > > > > and from that perspective this is really fixing an > > > > > > inconsistency > > > > > > that > > > > > > didn't exist before, and I thought we would want to address > > > > > > that > > > > > > sooner > > > > > > rather than later and track it down to the original change > > > > > > that > > > > > > introduced that inconsistency so we know where this is > > > > > > coming > > > > > > from. > > > > > > > > > > > > > > > > The "inconsistency" between the IR's get_exec_type() and the > > > > > EU > > > > > validator's execution_type() has existed ever since > > > > > a05b6f25bf4bfad7 > > > > > removed the HF assert from get_exec_type() without actually > > > > > implementing > > > > > the code required to handle HF operands (which is what my > > > > > commit > > > > > c84ec70b3a72 did). > > > > > > > > I agree with the fact that since a05b6f25bf4bfad7 the validator > > > > could > > > > reject valid code and that had nothing to do with your patch, > > > > > > The validator rejected the same valid HF code since it was > > > written, > > > that > > > had nothing to do with neither a05b6f25bf4bfad7 nor with my > > > patch, > > > and > > > it is the real problem this patch was working around. > > > > > > > but the inconsistency I am talking about here, that this patch > > > > fixes, > > > > is the one about get_exec_type() in the IR and execution_type() > > > > in > > > > the > > > > validator doing different things for HF instructions, which > > > > only > > > > exists since your patch and which you discuss below. > > > > > > > > > > The "inconsistency" exists ever since get_exec_type() was > > > introduced > > > without correct handling of HF types (even though > > > execution_type() > > > already attempted to handle it). And I disagree that it's a real > > > inconsistency except due to the fact that the validator is > > > incorrectly > > > attempting to validate the alignment of the destination region > > > according > > > to a rule that doesn't apply to HF types. > > > > > > > > > Anyway, that was my rationale for the Fixes tag, but if you > > > > > > think > > > > > > this > > > > > > is not useful I am happy to drop this patch and just > > > > > > include it > > > > > > as > > > > > > part > > > > > > of my series without the tag. > > > > > > > > > > > > > > > > I'd like to see the actual regioning restrictions for HF > > > > > types > > > > > implemented in the EU validator as part of your series. > > > > > > > > Ok, let's see if we can agree on what restrictions should we > > > > implement > > > > then. I can implement this restriction as documented: > > > > > > > > "Conversion between Integer and HF (Half Float) must be DWord- > > > > aligned > > > > and strided by a DWord on the destination" > > > > > > > > Instead of trying to apply the general one that is currently > > > > breaking. > > > > That will fix the assertion issue. I guess my issue with it was > > > > that > > > > going by your analysis this restriction is not telling the full > > > > picture, this is what you had to say about this restriction: > > > > > > > > "I have a feeling that the reason for this may be that the 16- > > > > bit > > > > pipeline lacks the ability to handle conversions from or to > > > > half- > > > > float, > > > > so the execution type is implicitly promoted to the matching > > > > (integer > > > > or floating-point) 32-bit type where any HF conversion would be > > > > needed. And on those the usual alignment restriction of the > > > > destination to a larger execution type applies." > > > > > > > > But I guess your point is that we can ignore these details at > > > > the > > > > validator level and just focus on validating strictly what the > > > > PRM > > > > says. Fair enough. > > > > > > > > Unfortunatley, you also mentioned that this restriction *seems* > > > > to > > > > be > > > > overriden by this one on all platforms but BDW (even though the > > > > both > > > > are listed, at least I see both in my SKL docs): > > > > > > > > "There is a relaxed alignment rule for word destinations. When > > > > the > > > > destination type is word (UW, W, HF), destination data types > > > > can be > > > > aligned to either the lowest word or the second lowest word of > > > > the > > > > execution channel. This means the destination data words can be > > > > either > > > > all in the even word locations or all in the odd word > > > > locations." > > > > > > > > > > I'd implement this one since it seems like the most specific, > > > except > > > on > > > BDW where only the previous (almost equivalent) restriction seems > > > to > > > apply. > > > > I have just realized that this restriction could be at odds with > > this > > other aspect of the spec: > > > > "There is no direct conversion from B/UB to DF or DF to B/UB. Use > > two > > instructions and a word or DWord intermediate type." > > > > So it is okay to convert from B to W and then from W to DF/Q (it > > has > > the same text from conversions between HF and 64-bit types). We > > actually emit these conversions and they work fine going by the > > existing CTS tests, but these conversions are not aligned to 32-bit > > like the restriction states, instead they are aligned to 64-bit > > (the > > regioning lowering pass does this): > > > > mov(8) g14<4>W g5<4,4,1>Q { align1 2Q }; > > > > So I think this is a bit inconsistent again and I guess we need to > > choose what we want to do. I can think of 4 options: > > > > 1. We could assume that the restriction is correctly formulated and > > split the instructions to respect the restriction as-is even if we > > have > > not found any evidence that this doesn't work > > > > 2. We could assume that the restriction is correctly formulated and > > the > > regioning lowering pass need to be fixed to generate a conversion > > with > > a stride of 2 even if the execution size if 64-bit (haven't tried > > this > > since it sounds sketchy to me). > > > > 3. We could understand that the restriction is strictly formulated > > for > > conversions from a 32-bit execution types, and decide that it does > > not > > apply to conversions from 64-bit sources. > > > > 4. We can assume that the restriction is formulated incorrectly and > > instead of saying that destination words should all be in even or > > odd > > slots, assuming a conversion from a 32-bit type, it should just say > > that they should be aligned to the execution type to also > > accomodate > > conversions from 64-bit types. > > > > 4. is basically what the back-end is assuming right now, but it is > indeed not literally equivalent to the regioning restrictions as > they're > formulated in the B-Spec. I'd suggest not validating the DWORD > alignment rule for HF instructions whenever the execution type is > greater than 32 bits, since the B-Spec language about those is rather > inconsistent.
Ok, I will do that. Just one thing: I guess that when you say HF instructions you mean 16-bit destinations, since that is the target of the restriction. > > And indepedently of how we decide to interpret the restriction in > > the > > end, given that we have discussed here that we should only validate > > things as they are described in the PRM, I think we also need to > > decide > > what we implement in the validator (if anything) if we decide to go > > with either 3) or 4) (were we would not be exactly following the > > PRM to > > the letter). > > > > I kind of think that 4) might be what is going on here... > > what do you think? > > > > > There are shitloads of other restrictions we're missing for HF > > > types BTW, check out the section "Special Requirements for > > > Handling > > > Mixed Mode Float Operations" of the hardware spec. > > > > > > > Which you discussed didn't make sense to you and I agreed, if > > > > only > > > > because my own experiments suggested that the implications that > > > > one > > > > would derive from a strict interpretation of this (that packed > > > > 16- > > > > bit > > > > data is not supported) are not quite true going by the fact > > > > that we > > > > are > > > > emitting packed instructions on all platforms without any > > > > indication > > > > that this doesn't work. So what should we do about this one? If > > > > we > > > > decide that we want to implement this then we have to re-think > > > > the > > > > implementation we have. > > > > > > > > > > The implication on packed HF instructions is likely accidental, I > > > don't > > > think we should enforce the rule except for conversion > > > operations. > > > > > > > Should we just validate the one about the integer/float > > > > conversions? > > > > Should we do that only on BDW or on all platforms? > > > > > > > > > I don't see the > > > > > need to introduce any empirical assumptions into the EU > > > > > validator's > > > > > execution_type() in order to achieve that, even if that means > > > > > that > > > > > get_exec_type() and execution_type() don't do the exact same > > > > > calculation > > > > > -- What you call an inconsistency is the consequence of > > > > > execution_type() > > > > > being the hardware spec's opinion on what the execution type > > > > > is, > > > > > which > > > > > we assume is what we need to use while enforcing a regioning > > > > > restriction > > > > > that refers to the execution type of the instruction. > > > > > > > > Mmm... I guess my question is: is the hardware really assuming > > > > that > > > > execution type in all cases that involve HF instructions? What > > > > I > > > > got > > > > from your analysis of the whole situation is that the hardware > > > > is > > > > actually promoting the execution size to 32-bit in some cases, > > > > and > > > > if > > > > that is the case, then does it make sense to implement code in > > > > the > > > > validator that ignores that fact? > > > > > > If the hardware spec chose to ignore that fact and instead gave > > > us a > > > different set of regioning restrictions for us to use on HF > > > restrictions > > > that don't rely on the definition of execution type, we should > > > probably > > > have the validator match the hardware spec literally. > > > > > > > I understand that you think this doesn't have any significance > > > > once > > > > we > > > > make sure that we only apply the restrictions specifically > > > > documented > > > > for HF from the PRM, and you are probably right, however I have > > > > to > > > > admit that it feels a bit odd to me that we do that when we > > > > know > > > > that > > > > under the hood there is much more going on and we are producing > > > > FS > > > > IR > > > > code following different rules after all. But maybe it is just > > > > me > > > > being too paranoid about this... > > > > > > > > > > Wouldn't it be even odder to have the hardware restriction > > > validator > > > diverge from the documentation it's supposedly validating? > > > > > > > > > > > > > > The validator is > > > > > > > still missing an implementation of the quirky HF > > > > > > > restrictions, > > > > > > > and it > > > > > > > wasn't the purpose of c84ec70b3a72 to do such a thing. > > > > > > > > > > > > While this is true in general, the EU validator does > > > > > > consider > > > > > > the > > > > > > execution type of the instruction to validate general rules > > > > > > such as > > > > > > the > > > > > > one I mentioned in the commit message in this patch. And > > > > > > that > > > > > > part > > > > > > of > > > > > > the validator is inconsistent with c84ec70b3a72. > > > > > > > > > > That part of the validator was also inconsistent with the > > > > > code > > > > > generated > > > > > by your original VK_KHR_shader_float16_int8 series even > > > > > before I > > > > > committed c84ec70b3a72. The reason is that it is trying to > > > > > validate > > > > > a > > > > > restriction that rejects working code, because the "general" > > > > > rule > > > > > it's > > > > > trying to enforce isn't supposed to apply to instructions > > > > > with HF > > > > > operands, which is the real bug. > > > > > > > > I agree with that, and I'll fix that in my series by preventing > > > > that > > > > rule to apply to these instructions (pending to agree on the > > > > rules > > > > that > > > > we should validate instead), but as I have already explained > > > > above, > > > > that was not the inconsistency I was referring to. > > > > > > > > > > It's the only actual inconsistency this was fixing. > > > > > > > > > In fact, the EU validator is accounting for execution size > > > > > > promotion > > > > > > of HF instructions to 32-bit in SKL+ and CHV only, for > > > > > > conversions > > > > > > from HF->F and mixed float mode instructions... which is > > > > > > part > > > > > > of > > > > > > what > > > > > > c84ec70b3a72 addresses at the IR level, which it actually > > > > > > does > > > > > > for > > > > > > all > > > > > > hardware platforms and in more cases. > > > > > > > > > > > > > > > > I'm fine with fixing execution_type() to do the right thing > > > > > in > > > > > more > > > > > cases and platforms, but I don't think that should involve > > > > > making > > > > > empirical assumptions in the validator. > > > > > > > > Just to be clear, in the particular case we are discussing > > > > here, I > > > > understand this means that we should leave the validator's > > > > execution_type() as-is. > > > > > > > > > > That's not what I was saying. Last time I looked at > > > execution_type() > > > it > > > did seem somewhat sketchy regarding HF types. I wouldn't be > > > surprised > > > if it needs some fixes to match the hardware spec (*not* our > > > empirical > > > knowledge of the hardware) in addition to the above. > > > > > > > > > > You *should* > > > > > > > definitely implement those restrictions (as they're > > > > > > > stated in > > > > > > > the > > > > > > > hardware spec, without empirical assumptions) in the > > > > > > > validator as > > > > > > > part > > > > > > > of your VK_KHR_shader_float16_int8 series, > > > > > > > > > > > > Again, I am not sure what you mean by "without empirical > > > > > > assumptions". > > > > > > > > > > I was just paraphrasing your comment. If you feel the need > > > > > to > > > > > write > > > > > a > > > > > comment justifying the existence of some code in the > > > > > validator > > > > > based > > > > > on > > > > > empirical testing, it probably doesn't belong in the > > > > > validator. > > > > > > > > > > > According to the commit message in c84ec70b3a72 "the docs > > > > > > are > > > > > > fairly > > > > > > imcomplete and inconsistent" and you explained here that > > > > > > you > > > > > > had to > > > > > > do > > > > > > some experimentation of your own using the simulator (where > > > > > > you > > > > > > found > > > > > > its results to also be inconsistent with the hardware docs) > > > > > > to > > > > > > try > > > > > > and > > > > > > guess what is happening. That's why I was using the term > > > > > > "empirical" > > > > > > here, since the hardware docs alone aren't clear or correct > > > > > > enough > > > > > > to > > > > > > understand what is really happening, when and in what > > > > > > platforms. > > > > > > > > > > > > > > > > If some hardware restriction is described in a contradictory > > > > > or > > > > > ambiguous way by the hardware spec we should probably avoid > > > > > enforcing > > > > > it > > > > > altogether in the EU validator. > > > > > > > > Okay, in that case I guess you would rule the restriction about > > > > the > > > > relaxed alignment rule for Word destinations as such, but not > > > > the > > > > one > > > > about half-float/integer conversions. Is that correct? > > > > > > > > > > We probably need to enforce both since they apply to different > > > sets > > > of > > > hardware. > > > > > > > > > Anyway, if you don't like the term "empirical" to refer to > > > > > > all > > > > > > this, > > > > > > that's fine by me, but what I need to know is if we agree > > > > > > that > > > > > > we > > > > > > need > > > > > > to implement the same type promotion rules in the validator > > > > > > that we > > > > > > are > > > > > > implementing in the IR, indepedently of whether refer to > > > > > > them > > > > > > as > > > > > > "based > > > > > > on empirical testing" or not. > > > > > > > > > > > > > > > > I don't think we agree on that. IMHO you want to enforce the > > > > > regioning > > > > > restrictions that the hardware spec describes for HF types > > > > > rather > > > > > than > > > > > the current (incorrect and incomplete) generalization of the > > > > > standard > > > > > restrictions. With those in place the apparent inconsistency > > > > > between > > > > > get_exec_type() and the hardware spec's execution type would > > > > > be > > > > > harmless. > > > > > > > > I agree with that. I still feel uneasy about having two > > > > different > > > > implementations of the execution type in FS IR and the > > > > validator, > > > > even > > > > if that doesn't lead to any issues when we do what you say > > > > here... > > > > > > > > > > I feel uneasy about the validator not matching the hardware spec. > > > Luckily its only purpose is validating the rest of the compiler, > > > so > > > if > > > there is an actual inconsistency as you seem to think, it will > > > come > > > up. > > > > > > > At the very least I think that deserves a comment in the > > > > validator > > > > explaining why we think that is okay. > > > > > > > > > > A quote of the hardware spec should be enough of a > > > comment. Right? And > > > there is already a comment in get_exec_type() explaining why its > > > HF > > > type > > > handling logic is consistent with the hardware spec restrictions > > > for > > > HF > > > types. > > > > > > > > > > if anything because currently > > > > > > > it will reject working code that uses HF types. > > > > > > > > > > > > Just for the sake of clarity, since that sentence above > > > > > > could > > > > > > be > > > > > > interpreted as if all HF code would be rejected: we have > > > > > > been > > > > > > using > > > > > > HF > > > > > > types since we landed VK_KHR_16bit_storage, which includes > > > > > > conversions > > > > > > between HF and F and the EU validator is not complaining > > > > > > about > > > > > > any > > > > > > of > > > > > > that. It is currently a problem only with conversions to > > > > > > smaller > > > > > > types > > > > > > (so only conversions to Byte) because that's where we check > > > > > > for > > > > > > that > > > > > > restriction on the stride, which is based on the execution > > > > > > type > > > > > > of > > > > > > the > > > > > > instruction. > > > > > > > > > > > > > > > > > > > > > --- > > > > > > > > src/intel/compiler/brw_eu_validate.c | 27 > > > > > > > > ++++++++++++++ > > > > > > > > ---- > > > > > > > > ---- > > > > > > > > ----- > > > > > > > > 1 file changed, 14 insertions(+), 13 deletions(-) > > > > > > > > > > > > > > > > diff --git a/src/intel/compiler/brw_eu_validate.c > > > > > > > > b/src/intel/compiler/brw_eu_validate.c > > > > > > > > index a25010b225c..3bb37677672 100644 > > > > > > > > --- a/src/intel/compiler/brw_eu_validate.c > > > > > > > > +++ b/src/intel/compiler/brw_eu_validate.c > > > > > > > > @@ -325,17 +325,20 @@ execution_type(const struct > > > > > > > > gen_device_info > > > > > > > > *devinfo, const brw_inst *inst) > > > > > > > > unsigned num_sources = > > > > > > > > num_sources_from_inst(devinfo, > > > > > > > > inst); > > > > > > > > enum brw_reg_type src0_exec_type, src1_exec_type; > > > > > > > > > > > > > > > > - /* Execution data type is independent of > > > > > > > > destination > > > > > > > > data > > > > > > > > type, > > > > > > > > except in > > > > > > > > - * mixed F/HF instructions on CHV and SKL+. > > > > > > > > + /* Empirical testing suggests that type conversions > > > > > > > > involving > > > > > > > > half-float > > > > > > > > + * promote execution type to 32-bit. See > > > > > > > > get_exec_type() in > > > > > > > > brw_ir_fs.h. > > > > > > > > */ > > > > > > > > enum brw_reg_type dst_exec_type = > > > > > > > > brw_inst_dst_type(devinfo, > > > > > > > > inst); > > > > > > > > > > > > > > > > src0_exec_type = > > > > > > > > execution_type_for_type(brw_inst_src0_type(devinfo, > > > > > > > > inst)); > > > > > > > > if (num_sources == 1) { > > > > > > > > - if ((devinfo->gen >= 9 || devinfo- > > > > > > > > >is_cherryview) && > > > > > > > > - src0_exec_type == BRW_REGISTER_TYPE_HF) { > > > > > > > > - return dst_exec_type; > > > > > > > > + if (type_sz(src0_exec_type) == 2 && > > > > > > > > dst_exec_type != > > > > > > > > src0_exec_type) { > > > > > > > > + if (src0_exec_type == BRW_REGISTER_TYPE_HF) > > > > > > > > + return BRW_REGISTER_TYPE_F; > > > > > > > > + else if (dst_exec_type == > > > > > > > > BRW_REGISTER_TYPE_HF) > > > > > > > > + return BRW_REGISTER_TYPE_D; > > > > > > > > } > > > > > > > > + > > > > > > > > return src0_exec_type; > > > > > > > > } > > > > > > > > > > > > > > > > @@ -367,14 +370,12 @@ execution_type(const struct > > > > > > > > gen_device_info > > > > > > > > *devinfo, const brw_inst *inst) > > > > > > > > src1_exec_type == BRW_REGISTER_TYPE_DF) > > > > > > > > return BRW_REGISTER_TYPE_DF; > > > > > > > > > > > > > > > > - if (devinfo->gen >= 9 || devinfo->is_cherryview) { > > > > > > > > - if (dst_exec_type == BRW_REGISTER_TYPE_F || > > > > > > > > - src0_exec_type == BRW_REGISTER_TYPE_F || > > > > > > > > - src1_exec_type == BRW_REGISTER_TYPE_F) { > > > > > > > > - return BRW_REGISTER_TYPE_F; > > > > > > > > - } else { > > > > > > > > - return BRW_REGISTER_TYPE_HF; > > > > > > > > - } > > > > > > > > + if (dst_exec_type == BRW_REGISTER_TYPE_F || > > > > > > > > + src0_exec_type == BRW_REGISTER_TYPE_F || > > > > > > > > + src1_exec_type == BRW_REGISTER_TYPE_F) { > > > > > > > > + return BRW_REGISTER_TYPE_F; > > > > > > > > + } else { > > > > > > > > + return BRW_REGISTER_TYPE_HF; > > > > > > > > } > > > > > > > > > > > > > > > > assert(src0_exec_type == BRW_REGISTER_TYPE_F); > > > > > > > > -- > > > > > > > > 2.17.1 _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev