Chema Casanova <jmcasan...@igalia.com> writes: > El 26/07/18 a las 20:02, Francisco Jerez escribió: >> Chema Casanova <jmcasan...@igalia.com> writes: >> >>> El 20/07/18 a las 22:10, Francisco Jerez escribió: >>>> Chema Casanova <jmcasan...@igalia.com> writes: >>>> >>>>> El 20/07/18 a las 00:34, Francisco Jerez escribió: >>>>>> Chema Casanova <jmcasan...@igalia.com> writes: >>>>>> >>>>>>> El 14/07/18 a las 00:14, Francisco Jerez escribió: >>>>>>>> Jose Maria Casanova Crespo <jmcasan...@igalia.com> writes: >>>>>>>> >>>>>>>>> For a register source/destination of an instruction the function >>>>>>>>> returns >>>>>>>>> the read/write byte pattern of a 32-byte registers as a unsigned int. >>>>>>>>> >>>>>>>>> The returned pattern takes into account the exec_size of the >>>>>>>>> instruction, >>>>>>>>> the type bitsize, the stride and if the register is source or >>>>>>>>> destination. >>>>>>>>> >>>>>>>>> The objective of the functions if to help to know the read/written >>>>>>>>> bytes >>>>>>>>> of the instructions to improve the liveness analysis for partial >>>>>>>>> read/writes. >>>>>>>>> >>>>>>>>> We manage special cases for SHADER_OPCODE_BYTE_SCATTERED_WRITE_LOGICAL >>>>>>>>> and SHADER_OPCODE_BYTE_SCATTERED_WRITE because depending of the >>>>>>>>> bitsize >>>>>>>>> parameter they have a different read pattern. >>>>>>>>> --- >>>>>>>>> src/intel/compiler/brw_fs.cpp | 183 >>>>>>>>> +++++++++++++++++++++++++++++++++ >>>>>>>>> src/intel/compiler/brw_ir_fs.h | 1 + >>>>>>>>> 2 files changed, 184 insertions(+) >>>>>>>>> >>>>>>>>> diff --git a/src/intel/compiler/brw_fs.cpp >>>>>>>>> b/src/intel/compiler/brw_fs.cpp >>>>>>>>> index 2b8363ca362..f3045c4ff6c 100644 >>>>>>>>> --- a/src/intel/compiler/brw_fs.cpp >>>>>>>>> +++ b/src/intel/compiler/brw_fs.cpp >>>>>>>>> @@ -687,6 +687,189 @@ fs_inst::is_partial_write() const >>>>>>>>> this->dst.offset % REG_SIZE != 0); >>>>>>>>> } >>>>>>>>> >>>>>>>>> +/** >>>>>>>>> + * Returns a 32-bit uint whose bits represent if the associated >>>>>>>>> register byte >>>>>>>>> + * has been read/written by the instruction. The returned pattern >>>>>>>>> takes into >>>>>>>>> + * account the exec_size of the instruction, the type bitsize and >>>>>>>>> the register >>>>>>>>> + * stride and the register is source or destination for the >>>>>>>>> instruction. >>>>>>>>> + * >>>>>>>>> + * The objective of this function is to identify which parts of the >>>>>>>>> register >>>>>>>>> + * are read or written for operations that don't read/write a full >>>>>>>>> register. >>>>>>>>> + * So we can identify in live range variable analysis if a partial >>>>>>>>> write has >>>>>>>>> + * completelly defined the part of the register used by a partial >>>>>>>>> read. So we >>>>>>>>> + * avoid extending the liveness range because all data read was >>>>>>>>> already >>>>>>>>> + * defined although the wasn't completely written. >>>>>>>>> + */ >>>>>>>>> +unsigned >>>>>>>>> +fs_inst::register_byte_use_pattern(const fs_reg &r, boolean is_dst) >>>>>>>>> const >>>>>>>>> +{ >>>>>>>>> + if (is_dst) { >>>>>>> >>>>>>>> Please split into two functions (like fs_inst::src_read and >>>>>>>> ::src_written) since that would make the call-sites of this method more >>>>>>>> self-documenting than a boolean parameter. You should be able to share >>>>>>>> code by refactoring the common logic into a separate function (see >>>>>>>> below >>>>>>>> for some suggestions on how that could be achieved). >>>>>>> >>>>>>> Sure, it would improve readability and simplifies the logic, I've chosen >>>>>>> dst_write_pattern and src_read_pattern. >>>>>>> >>>>>>>> >>>>>>>>> + /* We don't know what is written so we return the worts case */ >>>>>>>> >>>>>>>> "worst" >>>>>>> >>>>>>> Fixed. >>>>>>> >>>>>>>>> + if (this->predicate && this->opcode != BRW_OPCODE_SEL) >>>>>>>>> + return 0; >>>>>>>>> + /* We assume that send destinations are completely written */ >>>>>>>>> + if (this->is_send_from_grf()) >>>>>>>>> + return ~0u; >>>>>>>> >>>>>>>> Some send-like instructions won't be caught by this condition, you >>>>>>>> should check for this->mlen != 0 in addition. >>>>>>> >>>>>>> Would it be enough to check for (this->mlen > 0) and forget about >>>>>>> is_send_from_grf? I am using this approach in v2 I am sending. >>>>>>> >>>>>> >>>>>> I don't think the mlen > 0 condition would catch all cases either... >>>>>> E.g. FS_OPCODE_UNIFORM_PULL_CONSTANT_LOAD IIRC. You probably need both >>>>>> conditions. Sucks... >>>>> >>>>> That is true, so now we have the: >>>>> (this->is_send_from_grf() || this->mlen != 0) >>>>> >>>>>>>>> + } else { >>>>>>>>> + /* byte_scattered_write_logical pattern of src[1] is 32-bit >>>>>>>>> aligned >>>>>>>>> + * so the read pattern depends on the bitsize stored at src[4] >>>>>>>>> + */ >>>>>>>>> + if (this->opcode == SHADER_OPCODE_BYTE_SCATTERED_WRITE_LOGICAL >>>>>>>>> && >>>>>>>>> + this->src[1].nr == r.nr) { >>>>>>> >>>>>>>> I feel uncomfortable about attempting to guess the source the caller is >>>>>>>> referring to by comparing the registers for equality. E.g. you could >>>>>>>> potentially end up with two sources that compare equal but have >>>>>>>> different semantics (e.g. as a result of CSE) which might cause it to >>>>>>>> get the wrong answer. It would probably be better to pass a source >>>>>>>> index and a byte offset as argument instead of an fs_reg. >>>>>>> >>>>>>> I've didn't thought about CSE, I'm now receiving the number of source >>>>>>> and the reg_offset. I'm using reg_offset instead of byte offsets as it >>>>>>> simplifies the logic. Now we are using always the base src register to >>>>>>> do all the calculation >>>>>>>>> + switch (this->src[4].ud) { >>>>>>>>> + case 32: >>>>>>>>> + return ~0u; >>>>>>>>> + case 16: >>>>>>>>> + return 0x33333333; >>>>>>>>> + case 8: >>>>>>>>> + return 0x11111111; >>>>>>>>> + default: >>>>>>>>> + unreachable("Unsupported bitsize at >>>>>>>>> byte_scattered_write_logical"); >>>>>>>>> + } >>>>>>>> >>>>>>>> Replace the above switch statement with a call to "periodic_mask(8, 4, >>>>>>>> this->src[4].ud / 8)" (see below for the definition). >>>>>>> >>>>>>> Ok. >>>>>>> >>>>>>>>> + } >>>>>>>>> + /* As for byte_scattered_write_logical but we need to take >>>>>>>>> into account >>>>>>>>> + * that data written are in the payload offset 32 with SIMD8 >>>>>>>>> and offset >>>>>>>>> + * 64 with SIMD16. >>>>>>>>> + */ >>>>>>>>> + if (this->opcode == SHADER_OPCODE_BYTE_SCATTERED_WRITE && >>>>>>>>> + this->src[0].nr == r.nr) { >>>>>>>>> + fs_reg payload = this->src[0]; >>>>>>>>> + payload.offset = REG_SIZE * this->exec_size / 8; >>>>>>>> >>>>>>>> byte_offset() is your friend. >>>>>>> >>>>>>> I've removed the overlap logic, and I'm just checking if we are in the >>>>>>> reg_offset 1 on SIMD8 or reg_offset 2-3 in SIMD16. >>>>>>> >>>>>>>>> + if (regions_overlap(r, REG_SIZE, >>>>>>>>> + payload, REG_SIZE * this->exec_size / >>>>>>>>> 8)) { >>>>>>>>> + switch (this->src[2].ud) { >>>>>>>>> + case 32: >>>>>>>>> + return ~0u; >>>>>>>>> + case 16: >>>>>>>>> + return 0x33333333; >>>>>>>>> + case 8: >>>>>>>>> + return 0x11111111; >>>>>>>>> + default: >>>>>>>>> + unreachable("Unsupported bitsize at >>>>>>>>> byte_scattered_write"); >>>>>>>>> + } >>>>>>>> >>>>>>>> Replace the above switch statement with a call to "periodic_mask(8, 4, >>>>>>>> this->src[2].ud / 8)". >>>>>>> >>>>>>> Ok. >>>>>>> >>>>>>>>> + } else { >>>>>>>>> + return ~0u; >>>>>>>>> + } >>>>>>>>> + } >>>>>>>>> + } >>>>>>>>> + >>>>>>>>> + /* We define the most conservative value in order to calculate >>>>>>>>> liveness >>>>>>>>> + * range. If it is a destination nothing is defined and if is a >>>>>>>>> source >>>>>>>>> + * all the bytes of the register could be read. So for release >>>>>>>>> builds >>>>>>>>> + * the unreachables would have always safe return value. */ >>>>>>>>> + unsigned pattern = is_dst ? 0 : ~0u; >>>>>>>>> + >>>>>>>>> + /* In the general case we calculate the pattern for a specific >>>>>>>>> register >>>>>>>>> + * on base of the type_size and stride. We calculate the SIMD8 >>>>>>>>> pattern >>>>>>>>> + * and then we adjust the patter if needed for different >>>>>>>>> exec_sizes >>>>>>>>> + * and offset >>>>>>>>> + */ >>>>>>>>> + switch (type_sz(r.type)){ >>>>>>>>> + case 1: >>>>>>>>> + switch (r.stride) { >>>>>>>>> + case 0: >>>>>>>>> + pattern = 0X1; >>>>>>>>> + break; >>>>>>>>> + case 1: >>>>>>>>> + pattern = 0xff; >>>>>>>>> + break; >>>>>>>>> + case 2: >>>>>>>>> + pattern = 0x5555; >>>>>>>>> + break; >>>>>>>>> + case 4: >>>>>>>>> + pattern = 0x11111111; >>>>>>>>> + break; >>>>>>>>> + case 8: >>>>>>>>> + pattern = 0x01010101; >>>>>>>>> + break; >>>>>>>>> + default: >>>>>>>>> + unreachable("Unknown pattern unsupported 8-bit stride"); >>>>>>>>> + } >>>>>>>>> + break; >>>>>>>>> + case 2: >>>>>>>>> + switch (r.stride) { >>>>>>>>> + case 0: >>>>>>>>> + pattern = 0X3; >>>>>>>>> + break; >>>>>>>>> + case 1: >>>>>>>>> + pattern = 0xffff; >>>>>>>>> + break; >>>>>>>>> + case 2: >>>>>>>>> + pattern = 0x33333333; >>>>>>>>> + break; >>>>>>>>> + case 4: >>>>>>>>> + pattern = 0x03030303; >>>>>>>>> + break; >>>>>>>>> + case 8: >>>>>>>>> + pattern = 0x00030003; >>>>>>>>> + break; >>>>>>>>> + default: >>>>>>>>> + unreachable("Unknown pattern unsupported 16-bit stride"); >>>>>>>>> + } >>>>>>>>> + break; >>>>>>>>> + case 4: >>>>>>>>> + switch (r.stride) { >>>>>>>>> + case 0: >>>>>>>>> + pattern = 0Xf; >>>>>>>>> + break; >>>>>>>>> + case 1: >>>>>>>>> + pattern = ~0u; >>>>>>>>> + break; >>>>>>>>> + case 2: >>>>>>>>> + pattern = 0x0f0f0f0f; >>>>>>>>> + break; >>>>>>>>> + case 4: >>>>>>>>> + pattern = 0x000f000f; >>>>>>>>> + break; >>>>>>>>> + default: >>>>>>>>> + unreachable("Unknown pattern unsupported 32-bit stride"); >>>>>>>>> + } >>>>>>>>> + break; >>>>>>>>> + case 8: >>>>>>>>> + switch (r.stride) { >>>>>>>>> + case 0: >>>>>>>>> + pattern = 0Xff; >>>>>>>>> + break; >>>>>>>>> + case 1: >>>>>>>>> + pattern = ~0u; >>>>>>>>> + break; >>>>>>>>> + case 2: >>>>>>>>> + pattern = 0x00ff00ff; >>>>>>>>> + break; >>>>>>>>> + case 4: >>>>>>>>> + pattern = 0xff; >>>>>>>>> + break; >>>>>>>>> + default: >>>>>>>>> + unreachable("Unknown pattern unsupported 64-bit stride"); >>>>>>>>> + } >>>>>>>>> + break; >>>>>>>>> + default: >>>>>>>>> + unreachable("Unknown pattern for unsupported bitsize "); >>>>>>>>> + } >>>>>>>>> + >>>>>>>>> + if (this->exec_size > 8 && r.stride * type_sz(r.type) * 8 < >>>>>>>>> REG_SIZE) { >>>>>>>>> + /* For exec_size greater than SIMD8 we repeat the pattern >>>>>>>>> until it >>>>>>>>> + * represents a full register already represent a full >>>>>>>>> register */ >>>>>>>>> + pattern = pattern | (pattern << (8 * r.stride * >>>>>>>>> type_sz(r.type))); >>>>>>>>> + if (this->exec_size > 16 && r.stride * type_sz(r.type) * 16 < >>>>>>>>> REG_SIZE) >>>>>>>>> + pattern = pattern | (pattern << (16 * r.stride * >>>>>>>>> type_sz(r.type))); >>>>>>>>> + } else if (this->exec_size < 8 && >>>>>>>>> + r.stride * type_sz(r.type) * this->exec_size < >>>>>>>>> REG_SIZE) { >>>>>>>>> + /* For exec_size smaller than SIMD8 we reduce the pattern if >>>>>>>>> its size >>>>>>>>> + * is smaller than a full register. */ >>>>>>>>> + pattern = pattern >> (MIN2(REG_SIZE, 8 * type_sz(r.type) * >>>>>>>>> r.stride) - >>>>>>>>> + this->exec_size * type_sz(r.type) * >>>>>>>>> r.stride); >>>>>>>>> + } >>>>>>>>> + >>>>>>> >>>>>>>> This seems really mad, no clue whether it's correct. Why not replace >>>>>>>> the above ~110 lines with a call to the following (fully >>>>>>>> untested) 5-LOC function: >>>>>>> >>>>>>> Your suggestion seems to work perfectly fine, my original approach was >>>>>>> trying to avoid the loop of creating the read/write pattern but after >>>>>>> testing my v2 I wasn't able to notice any performance difference running >>>>>>> shader-db and having the same results. I was originally trying that >>>>>>> SIMD8 patterns were already constants. Sorry for the added complexity. >>>>>>> >>>>>> >>>>>> The loop runs for a logarithmic number of iterations though, so it has >>>>>> the exact same run-time complexity as your original patch, roughly the >>>>>> same amount of branches at run-time (but possibly less indirect >>>>>> branches!), and it should be compiled into a substantially lower number >>>>>> of instructions, which may actually cause it to perform better due to >>>>>> more favourable caching. It's hard to tell though which one will >>>>>> perform better in practice without benchmarking them, and as you >>>>>> probably realized this is so far from being a bottleneck that whatever >>>>>> the difference was is likely lost in the noise. So it really doesn't >>>>>> matter which one performs better... >>>>>> >>>>>>>> >>>>>>>> | uint32_t >>>>>>>> | periodic_mask(unsigned count, unsigned step, unsigned bits) >>>>>>>> | { >>>>>>>> | uint32_t m = (count ? (1 << bits) - 1 : 0); >>>>>>>> | const unsigined max = MIN2(count * step, sizeof(m) * CHAR_BITS); >>>>>>>> | >>>>>>>> | for (unsigned shift = step; shift < max; shift *= 2) >>>>>>>> | m |= m << shift; >>>>>>>> | >>>>>>>> | return m; >>>>>>>> | } >>>>>>> >>>>>>> I've used your function just changing the sizeof(m) * CHAR_BITS for the >>>>>>> REG_SIZE, to not include the limits.h. >>>>>> >>>>>> My intention was to make the function as agnostic to IR details as >>>>>> possible: the only reason there is a limit of 32 bits is because that's >>>>>> the size of the type used to hold the return value. Using sizeof makes >>>>>> sure that e.g. extending the code to 64 bits is as simple as changing >>>>>> the datatype to uint64_t. >>>>>> >>>>>>> And I've also included an offset parameter that allows us to shift the >>>>>>> bits of the pattern when we have an offset inside the register. >>>>>>> >>>>>> >>>>>> That sounds fine. >>>>>> >>>>>>>>> + /* We adjust the pattern to the byte_offset of the register */ >>>>>>>>> + pattern = pattern << (r.offset % REG_SIZE); >>>>>>>>> + >>>>>>>> >>>>>>>> This doesn't really work except for r equal to the first GRF read by >>>>>>>> the >>>>>>>> source. Regions with non-zero sub-GRF offset that straddle multiple >>>>>>>> GRFs are not really very common at the IR level, but they're allowed by >>>>>>>> the hardware with some restrictions, so it would make sense to at least >>>>>>>> handle them conservatively in order to keep things from breaking >>>>>>>> silently. >>>>>>> >>>>>>> >>>>>>> I included an assert in the periodic_mask function to detect if the >>>>>>> combination of offset and mask goes over the REG_SIZE, so we would >>>>>>> detect an straddle at this level of the IR. >>>>>>> >>>>>>> assert(offset + max - (step - bits) <= REG_SIZE); >>>>> >>>>>> Uhm... What's the definition of "offset" here? You seem to be passing >>>>>> the offset relative to the start of the VGRF modulo REG_SIZE but that >>>>>> doesn't really make sense to me whenever "reg_offset" is non-zero. >>>>> >>>>> Yes I agree on removing this assert in the periodic_mask as it a generic >>>>> function. I was to focus on seen the integer as the register :) >>>>> >>>>> The motivation was because of the following cases that happen with >>>>> 8/16-bit more usually: >>>>> >>>>> (1) mov(16) g12<2>HF g1<16,8,2>HF { align1 1H }; >>>>> (2) mov(16) g12.1<2>HF g2<16,8,2>HF { align1 1H }; >>>>> >>>>> In the dst_write_pattern, at point of calling the general periodic_mask >>>>> case we know that instruction is not expected to be a send message >>>>> destination. So the pattern is the same for reg_offset = 0 and >>>>> reg_offset = 1. >>>>> >>>> >>>> In this specific example, yes, but in general the pattern will not be >>>> the same for multiple registers for arbitrary fs_reg::offset values. >>> >>> I've been running some tests. I've found that the three opcodes reach >>> this situation in shader-db (MOV, INDIRECT_MOV and OR). >>> >>> If we just manage the MOV opcode that we agree that the same pattern is >>> repeated for the same source/dst used registers have the same spilling >>> reduction that with my original patch. >>> >>>>> 0x3333333 for case (1) and 0xccccccc for case (2). But without repeating >>>>> the pattern offset we would get 0xcccccccc for reg_offset=0 but >>>>> 0x333333333 for reg_offset=1 witch would be incorrect. >>>>> >>>>>> I think you want to pass "src[i].offset % REG_SIZE - reg_offset * >>>>>> REG_SIZE" as a signed integer in order to get the offset of the first >>>>>> byte actually written by the instruction relative to the first byte of >>>>>> the GRF window of the pattern. You don't really need to assert-fail >>>>>> when the offset is greater or equal to 32 (which shouldn't actually >>>>>> happen in practice), "return 0" gives you the correct behavior. For >>>>>> negative offsets (which means the pattern starts after the first byte >>>>>> written by the instruction) you can just return ~0u conservatively >>>>>> whenever the current logic wouldn't work [assuming you don't feel like >>>>>> implementing the code to handle that case accurately ;)]. >>>>> >>>>> Maybe you feel more comfortable with the following approach for >>>>> src_read_pattern.: >>>>> >>>>> For read sources we also assume that SENDs sources are completely >>>>> read so return ~0u except the byte_scattered_write source exceptions. >>>>> >>>>> if (this->is_send_from_grf() || this->mlen != 0) >>>>> return ~0u; >>>>> >>>>> So after this I think that we can assume that the following condition >>>>> should be always correct where source operand must reside within two >>>>> adjacent 256-bit registers so the pattern would be periodic for >>>>> all reg_offsets and we can use the (this->src[i].offset % REG_SIZE). >>>>> >>>> >>>> That's not a particularly futureproof assumption. I'm okay with >>>> handling non-reg_offset-periodic cases inaccurately for the moment for >>>> the sake of simplicity, but there is no reason to have the code blow up >>>> in such cases if you could just return ~0u. >>> >>> I've added some modifications in the v3, we return ~0u for reads and 0u >>> for writes by default, but i added the case of !this->is_partial_write() >>> that will return ~0u for writes, that won't we reached with current use, >>> but to be coherent with any future uses of this function. >>> >>>>> assert(reg_offset < DIV_ROUND_UP(this->src[i].stride ? >>>>> this->src[i].stride * type_sz(this->src[i].type) * this->exec_size : >>>>> type_sz(this->src[i].type), REG_SIZE)); >>>>> >>> >>>> I don't think it's useful to assert-fail on this condition, since it >>>> doesn't handle all cases where the code below will be broken for a >>>> certain valid IR, and the condition it does protect against can be >>>> handled trivially in periodic_mask(), by returning zero when the 32-byte >>>> pattern window falls outside the region read or written by the >>>> instruction -- But currently you can't even know whether that's the >>>> case, because periodic_mask() is unaware where the 32-byte window starts >>>> relative to the source region, which is *all* it cares about, instead >>>> you're passing an offset to it that is equal to that in some special >>>> cases -- In all other cases the result of the function will be bogus. >>> >>> I understand your concerns, so in my V3, I am using only the >>> periodic_mask function when reg_offset is 0, and for the MOV special >>> case that I understand we agree that the internal offset for >>> reg_offset=1 is generally reg.offset % REG_SIZE. With that two cases >>> covered we solve the register_pressure on current supported 64/16/8 bit >>> cases. >>> > >> I don't understand. The MOV opcode is not substantially different from >> any other ALU opcode with regards to regioning. The assumptions this is >> based on are invalid for general Align1 regions, whether the opcode is a >> MOV or not. > > Let's try step by step to find my misunderstanding :-) > > My original understanding was that for a general ALU operations to > define witch parts of the VGRF are written/read we need to use 4 > different variables in the scalar backend for src/dst registers. > > - The type size. > - The horizontal stride. > - The execution size. > - The initial byte offset of the first register of src[i]/dst > > If stride==1 && type_size * execution_size>=REG_SIZE && byte_offset== 0 > we have a full/write of one or two registers. In that case the solution > is easy everything is written ~0u. >
Not necessarily, if "reg_offset * REG_SIZE >= byte_offset + type_size * MAX2(1, stride * exec_size)" the correct answer is 0. > If we have a partial write/read: > > I understood that you my initial patter proposal would only be ok for > the first GRF of src[i]/dst (reg_offset == 0) > > periodic_mask(this->exec_size, /* count */ > this->src[i].stride * type_sz(this->src[i].type), /*step */ > type_sz(this->src[i].type), /* bits */ > this->src[i].offset % REG_SIZE); /* offset */ > > In the case we manage only reg_offset == 0 we get a huge improvement > reducing all problems many of the register_pressure we have now on all > SIMD8 shaders with 8/16bits test cases. > > I understood that you didn't agree that for cases where src/destination > use more than 1 GRF (reg_offset == 1) we can not guarantee that we can > apply the same internal offset (this->src[i].offset % REG_SIZE) as the > base register to calculate a patter. So It would be better to return ~0u > on reads or 0u in writes. > Yes, but you could easily determine whether the mask is going to be invariant with respect to reg_offset (where reg_offset is within bounds) and in that case return the periodic_mask() expression above, otherwise return 0/~0u depending on whether reg_offset is within bounds. > That is the point were I couldn't found a counter-case with the search I > did to identify the different register region that appear in shaderdb > generated MOV instructions and 16/8 bit CTS tests, that used two > registers with a region pattern that implied a different byte_pattern, > for the first and the second register. But it should exist :) > I don't see how special-casing MOV could help you in any way except by luck. If your assumptions were correct you would be able to apply them to all ALU instructions with Align1 regions. > If my bad understanding is before this point, I should back to re-review > the Regioning PRM section. Or maybe I messing the IR level and the > emission level that is another option with different semantics :-? > > Thank you for your time. > > Chema > >>>>> So we could call in this scenario. >>>>> >>>>> return periodic_mask(this->exec_size, >>>>> this->src[i].stride * type_sz(this->src[i].type), >>>>> type_sz(this->src[i].type), >>>>> this->src[i].offset % REG_SIZE); >>>>> >>>>> The only issue I could think about that could generate issues would a >>>>> case that I didn't found in our current code where a source is defined >>>>> like r5.0<1;8;2>:w explained at PRM KBL vol07 Page 790 "A 16-element >>>>> register region with interleaved rows (r5.0<1;8,2>:w)". >>>>> >>>> >>>> You couldn't really handle that even if you wanted to, because there is >>>> no way to represent such a thing at the IR level, since the horizontal >>>> and vertical strides cannot be controlled independently, only the stride >>>> member is meaningful for VGRFs. >>> >>> Good, that simplifies things. >>> >>> Thanks again. >>> >>> Chema >>> >>>>> What do you think? >>>>> >>>>> Chema >>>>> >>>>> >>>>>>> Thanks for the review, I think that v2 has better shape. >>>>>>> >>>>>> >>>>>> No problem. >>>>>> >>>>>>> Chema >>>>>>> >>>>>>>>> + assert(pattern); >>>>>>>>> + >>>>>>>>> + return pattern; >>>>>>>>> +} >>>>>>>>> + >>>>>>>>> + >>>>>>>>> unsigned >>>>>>>>> fs_inst::components_read(unsigned i) const >>>>>>>>> { >>>>>>>>> diff --git a/src/intel/compiler/brw_ir_fs.h >>>>>>>>> b/src/intel/compiler/brw_ir_fs.h >>>>>>>>> index 92dad269a34..5ea6294b8ad 100644 >>>>>>>>> --- a/src/intel/compiler/brw_ir_fs.h >>>>>>>>> +++ b/src/intel/compiler/brw_ir_fs.h >>>>>>>>> @@ -350,6 +350,7 @@ public: >>>>>>>>> bool equals(fs_inst *inst) const; >>>>>>>>> bool is_send_from_grf() const; >>>>>>>>> bool is_partial_write() const; >>>>>>>>> + unsigned register_byte_use_pattern(const fs_reg &r, boolean >>>>>>>>> is_dst) const; >>>>>>>>> bool is_copy_payload(const brw::simple_allocator &grf_alloc) >>>>>>>>> const; >>>>>>>>> unsigned components_read(unsigned i) const; >>>>>>>>> unsigned size_read(int arg) const; >>>>>>>>> -- >>>>>>>>> 2.17.1 >>>>>>>>> >>>>>>>>> _______________________________________________ >>>>>>>>> mesa-dev mailing list >>>>>>>>> mesa-dev@lists.freedesktop.org >>>>>>>>> https://lists.freedesktop.org/mailman/listinfo/mesa-dev >>>>>>>>> >>>>>>>>> >>>>>>>>> _______________________________________________ >>>>>>>>> mesa-dev mailing list >>>>>>>>> mesa-dev@lists.freedesktop.org >>>>>>>>> https://lists.freedesktop.org/mailman/listinfo/mesa-dev >>>>>>>>> >>>>>>>>> >>>>>>>>> _______________________________________________ >>>>>>>>> mesa-dev mailing list >>>>>>>>> mesa-dev@lists.freedesktop.org >>>>>>>>> https://lists.freedesktop.org/mailman/listinfo/mesa-dev >>>>>>>>> >>>>>>>>> >>>>>>>>> _______________________________________________ >>>>>>>>> mesa-dev mailing list >>>>>>>>> mesa-dev@lists.freedesktop.org >>>>>>>>> https://lists.freedesktop.org/mailman/listinfo/mesa-dev
signature.asc
Description: PGP signature
_______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev