vec4: Handle ir_triop_lrp on Gen4-5 as well.

Matt Turner Wed, 26 Feb 2014 12:44:01 -0800

On Wed, Feb 26, 2014 at 11:26 AM, Kenneth Graunke <kenn...@whitecape.org> wrote:
> On 02/26/2014 08:35 AM, Ian Romanick wrote:
>> On 02/25/2014 05:06 PM, Kenneth Graunke wrote:
>>> On 02/25/2014 09:38 AM, Eric Anholt wrote:
>>>> Matt Turner <matts...@gmail.com> writes:
>>>>
>>>>> On Mon, Feb 24, 2014 at 10:15 AM, Eric Anholt <e...@anholt.net> wrote:
>>>>>> I think we would do better by emitting
>>>>>> ADD(y_minus_x, y, negate(x))
>>>>>> MAC(dst, x, y_minus_x, a)
>>>>>
>>>>> MAC only takes two arguments, so
>>>>>   - if you meant MAD, there's no MAD on platforms that don't have LRP
>>>>>   - if you meant MAC(dst, ...) I don't see a way of doing it only two
>>>>> instructions, but we could do
>>>>>
>>>>> MOV(acc, x)
>>>>> ADD(y_minus_x, y, negate(x)
>>>>> MAC(dst, y_minus_x, a)
>>>>
>>>> Oops, yeah, I was still thinking in terms of MAD.  This should still be
>>>> better I think, while being an obvious translation of the LRP
>>>> instruction:
>>>>
>>>> ADD one_minus_a, negate(a), 1.0f
>>>> MUL null, y, a
>>>> MAC dst, x, one_minus_a
>>>>
>>>> (multiplying y * a first to slightly reduce the stall pressure from
>>>> one_minus_a)
>>>
>>> Nice.  I agree this is better, but it's harder than you think.  We would
>>> have to:
>>>
>>> 1. Create a MAC() emitter.
>>> 2. Add BRW_OPCODE_MAC to vec4_generator.
>>> 3. Add a new "enable accumulator writes" flag to vec4_instruction
>>>     and make vec4_generator respect that.  (The MUL needs this.)
>>> 4. Fix up dead code elimination and other things to know about implicit
>>> accumulator writes.
>>
>> Can you write a slightly expanded description of what needs doing? Don't
>> take more than 10 minutes.  This is exactly the sort of task that I'd
>> like to take with me to Finland. :)
>
> Part 1: Adding arbitrary accumulator write support.
> ---------------------------------------------------
>
> i965 hardware has an "accumulator" register, which can be used to store
> intermediate results across multiple instructions.  It is higher
> precision than ordinary registers.
>
> Many assembly instructions support the "AccWrEn" flag to write a value
> to the accumulator in addition to their destination.  (This may be a
> different value.  For example, addc stores the addition result in dst,
> but the carry result in the accumulator.)
>
> Some instructions read from the accumulator implicitly, while others can
> use it as an explicit source register.  (See the ISA reference for
> restrictions on various instructions.)
>
> Currently, the i965 Vec4 backend uses the accumulator only for a few
> instructions (ADDC, SUBB, MACH) where it's absolutely necessary.  We
> would like to support it more generally.
>
> 1. Create a new flag in vec4_instruction to represent AccWrEn:
>    bool write_accumulator;
>
> 2. Update the instruction creators for ADDC, SUBB, and MACH to set it.
>
> brw_vec4_visitor.cpp defines a number of visitor methods that create
> instructions: MUL(), ADD(), and so on.  Since the majority of these are
> identical (other than the opcode), they're implemented via macros: ALU1,
> ALU2, and ALU3.
>
> - Create a new ALU2_ACC macro that is identical to ALU2, but which sets
> the "write_accumulator" flag after allocating the new instruction.
>
> - Change ADDC/SUBB/MACH to be implemented with ALU2_ACC() instead of ALU2().
>
> 3. Update the dead code elimination pass to consider the new flag.
>
> Normally, if nothing uses an instruction's destination register, we can
> eliminate that instruction.  However, instructions that implicitly write
> to the accumulator produce additional data which may still be used.  So,
> dead code elimination instead simply changes the destination register to
> the null register to free up that register.
>
> In vec4_visitor::dead_code_eliminate(), replace the switch statement on
> opcode with:
>
> if (inst->write_accumulator)
>    inst->dst = dst_reg(retype(brw_null_reg(), inst->dst.type));
> else
>    inst->remove();
>
> (Since we set the flag on ADDC/SUBB/MACH, this should be equivalent, but
> will also handle any new instructions that implicitly write to the
> accumulator.)
>
> 5. Make the flag actually affect assembly output.
>
> The vec4_generator class is what translates this IR (list of
> vec4_instructions) to the actual assembly code.  At a lower level, it
> uses the brw_eu_emit.c infrastructure to emit that code.
>
> The brw_eu_emit.c code allows you to set the "default state" for
> subsequent instructions.
>
> In vec4_generator::generate_code(), find this block of code:
>
>       brw_set_conditionalmod(p, inst->conditional_mod);
>       brw_set_predicate_control(p, inst->predicate);
>       brw_set_predicate_inverse(p, inst->predicate_inverse);
>       brw_set_saturate(p, inst->saturate);
>       brw_set_mask_control(p, inst->force_writemask_all);
>
> This sets up the default state according to the flags.  Note how the
> next call is generate_vec4_instruction(), which generates assembly
> instruction(s) from the IR.  You'll want to add:
>
>       brw_set_acc_write_control(p, inst->write_accumulator);
>
> With this in place, the brw_set_acc_write_control calls in
> generate_vec4_instruction's MACH, ADDC, SUBB cases are unnecessary.
> Remove them.
>
> You should now should regression test your code using Piglit.


I think instruction scheduling will have to be updated too.
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [Mesa-stable] [PATCH 2/4] i965/vec4: Handle ir_triop_lrp on Gen4-5 as well.

Reply via email to