On Thu, 28 Nov 2024 at 15:13, Andre Vieira (lists)
<andre.simoesdiasvie...@arm.com> wrote:
>
> Hi Christophe,
>
> On 28/11/2024 10:22, Christophe Lyon wrote:
> > The VCTP instruction creates a Vector Tail Predicate in VPR.P0, based
> > on the input value, but also constrained by a VPT block (if present),
> > or if used within a DLSTP/LETP loop.
> >
> > Therefore we need to inform the compiler that this intrinsic reads the
> > FPCXT register, otherwise it could make incorrect assumptions: for
> > instance in test7() from gcc.target/arm/mve/dlstp-compile-asm-2.c it
> > would hoist p1 = vctp32q (g) outside of the loop.
>
> We chatted about this offlist but it's good to share here for others
> too. I do not believe the transformation gcc is doing here is wrong. The
> transformation we do for test 7, along with some others in the
> testsuite, relies on analysis we do to check whether masks, that are not
> the loop predicate mask, used within the loop have a side effect. In
> other words, any instruction that is not predicated by the loop
> predicate, be that unpredicated or predicated by another mask, triggers
> an analysis to check whether the results are used in a safe way. Check
> the comments above 'arm_mve_impl_predicated_p' in arm.cc
>
> For test7 the non-loop predicate 'p1' is used to predicate a load inside
> the loop, when dlstp'ed that load will be masked by 'p & p1' instead,
> which means it could be loading less than initially intended, however,
> the results of that load are used in a vadd predicated by 'p' which
> means any values that it would have loaded if not masked by 'p' would
> have been discarded in the add, so it has no relevant effect.
>
> Furthermore, I also believe the compiler is already aware that VCTP
> writes P0, given it has an input operand with the predicate
> 'vpr_register_operand' and the register constraint '=Up'. During DLSTP
> transformation we rely on reads and writes to such operands to do our
> transformation and it should also provide other backend passes with
> enough information.
>
> So I don't think this patch is needed.
>
Indeed.
I managed to wrongly convince myself that p1 = vctp32q (g) should not
be hoisted...

Let me drop this patch, but dlstp-compile-asm-2.c still needs fixing.

Thanks,

Christophe

Reply via email to