On Thu, 28 Nov 2024 at 15:13, Andre Vieira (lists) <andre.simoesdiasvie...@arm.com> wrote: > > Hi Christophe, > > On 28/11/2024 10:22, Christophe Lyon wrote: > > The VCTP instruction creates a Vector Tail Predicate in VPR.P0, based > > on the input value, but also constrained by a VPT block (if present), > > or if used within a DLSTP/LETP loop. > > > > Therefore we need to inform the compiler that this intrinsic reads the > > FPCXT register, otherwise it could make incorrect assumptions: for > > instance in test7() from gcc.target/arm/mve/dlstp-compile-asm-2.c it > > would hoist p1 = vctp32q (g) outside of the loop. > > We chatted about this offlist but it's good to share here for others > too. I do not believe the transformation gcc is doing here is wrong. The > transformation we do for test 7, along with some others in the > testsuite, relies on analysis we do to check whether masks, that are not > the loop predicate mask, used within the loop have a side effect. In > other words, any instruction that is not predicated by the loop > predicate, be that unpredicated or predicated by another mask, triggers > an analysis to check whether the results are used in a safe way. Check > the comments above 'arm_mve_impl_predicated_p' in arm.cc > > For test7 the non-loop predicate 'p1' is used to predicate a load inside > the loop, when dlstp'ed that load will be masked by 'p & p1' instead, > which means it could be loading less than initially intended, however, > the results of that load are used in a vadd predicated by 'p' which > means any values that it would have loaded if not masked by 'p' would > have been discarded in the add, so it has no relevant effect. > > Furthermore, I also believe the compiler is already aware that VCTP > writes P0, given it has an input operand with the predicate > 'vpr_register_operand' and the register constraint '=Up'. During DLSTP > transformation we rely on reads and writes to such operands to do our > transformation and it should also provide other backend passes with > enough information. > > So I don't think this patch is needed. > Indeed. I managed to wrongly convince myself that p1 = vctp32q (g) should not be hoisted...
Let me drop this patch, but dlstp-compile-asm-2.c still needs fixing. Thanks, Christophe