On Wed, Feb 26, 2014 at 11:26 AM, Kenneth Graunke <kenn...@whitecape.org> wrote: > On 02/26/2014 08:35 AM, Ian Romanick wrote: >> On 02/25/2014 05:06 PM, Kenneth Graunke wrote: >>> On 02/25/2014 09:38 AM, Eric Anholt wrote: >>>> Matt Turner <matts...@gmail.com> writes: >>>> >>>>> On Mon, Feb 24, 2014 at 10:15 AM, Eric Anholt <e...@anholt.net> wrote: >>>>>> I think we would do better by emitting >>>>>> ADD(y_minus_x, y, negate(x)) >>>>>> MAC(dst, x, y_minus_x, a) >>>>> >>>>> MAC only takes two arguments, so >>>>> - if you meant MAD, there's no MAD on platforms that don't have LRP >>>>> - if you meant MAC(dst, ...) I don't see a way of doing it only two >>>>> instructions, but we could do >>>>> >>>>> MOV(acc, x) >>>>> ADD(y_minus_x, y, negate(x) >>>>> MAC(dst, y_minus_x, a) >>>> >>>> Oops, yeah, I was still thinking in terms of MAD. This should still be >>>> better I think, while being an obvious translation of the LRP >>>> instruction: >>>> >>>> ADD one_minus_a, negate(a), 1.0f >>>> MUL null, y, a >>>> MAC dst, x, one_minus_a >>>> >>>> (multiplying y * a first to slightly reduce the stall pressure from >>>> one_minus_a) >>> >>> Nice. I agree this is better, but it's harder than you think. We would >>> have to: >>> >>> 1. Create a MAC() emitter. >>> 2. Add BRW_OPCODE_MAC to vec4_generator. >>> 3. Add a new "enable accumulator writes" flag to vec4_instruction >>> and make vec4_generator respect that. (The MUL needs this.) >>> 4. Fix up dead code elimination and other things to know about implicit >>> accumulator writes. >> >> Can you write a slightly expanded description of what needs doing? Don't >> take more than 10 minutes. This is exactly the sort of task that I'd >> like to take with me to Finland. :) > > Part 1: Adding arbitrary accumulator write support. > --------------------------------------------------- > > i965 hardware has an "accumulator" register, which can be used to store > intermediate results across multiple instructions. It is higher > precision than ordinary registers. > > Many assembly instructions support the "AccWrEn" flag to write a value > to the accumulator in addition to their destination. (This may be a > different value. For example, addc stores the addition result in dst, > but the carry result in the accumulator.) > > Some instructions read from the accumulator implicitly, while others can > use it as an explicit source register. (See the ISA reference for > restrictions on various instructions.) > > Currently, the i965 Vec4 backend uses the accumulator only for a few > instructions (ADDC, SUBB, MACH) where it's absolutely necessary. We > would like to support it more generally. > > 1. Create a new flag in vec4_instruction to represent AccWrEn: > bool write_accumulator; > > 2. Update the instruction creators for ADDC, SUBB, and MACH to set it. > > brw_vec4_visitor.cpp defines a number of visitor methods that create > instructions: MUL(), ADD(), and so on. Since the majority of these are > identical (other than the opcode), they're implemented via macros: ALU1, > ALU2, and ALU3. > > - Create a new ALU2_ACC macro that is identical to ALU2, but which sets > the "write_accumulator" flag after allocating the new instruction. > > - Change ADDC/SUBB/MACH to be implemented with ALU2_ACC() instead of ALU2(). > > 3. Update the dead code elimination pass to consider the new flag. > > Normally, if nothing uses an instruction's destination register, we can > eliminate that instruction. However, instructions that implicitly write > to the accumulator produce additional data which may still be used. So, > dead code elimination instead simply changes the destination register to > the null register to free up that register. > > In vec4_visitor::dead_code_eliminate(), replace the switch statement on > opcode with: > > if (inst->write_accumulator) > inst->dst = dst_reg(retype(brw_null_reg(), inst->dst.type)); > else > inst->remove(); > > (Since we set the flag on ADDC/SUBB/MACH, this should be equivalent, but > will also handle any new instructions that implicitly write to the > accumulator.) > > 5. Make the flag actually affect assembly output. > > The vec4_generator class is what translates this IR (list of > vec4_instructions) to the actual assembly code. At a lower level, it > uses the brw_eu_emit.c infrastructure to emit that code. > > The brw_eu_emit.c code allows you to set the "default state" for > subsequent instructions. > > In vec4_generator::generate_code(), find this block of code: > > brw_set_conditionalmod(p, inst->conditional_mod); > brw_set_predicate_control(p, inst->predicate); > brw_set_predicate_inverse(p, inst->predicate_inverse); > brw_set_saturate(p, inst->saturate); > brw_set_mask_control(p, inst->force_writemask_all); > > This sets up the default state according to the flags. Note how the > next call is generate_vec4_instruction(), which generates assembly > instruction(s) from the IR. You'll want to add: > > brw_set_acc_write_control(p, inst->write_accumulator); > > With this in place, the brw_set_acc_write_control calls in > generate_vec4_instruction's MACH, ADDC, SUBB cases are unnecessary. > Remove them. > > You should now should regression test your code using Piglit.
I think instruction scheduling will have to be updated too. _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev