Hi,

> -----Original Message-----
> From: Richard Sandiford [mailto:richard.sandif...@arm.com]
> Sent: Sunday, May 31, 2020 12:01 AM
> To: Yangfei (Felix) <felix.y...@huawei.com>
> Cc: gcc-patches@gcc.gnu.org; Uros Bizjak <ubiz...@gmail.com>; Jakub
> Jelinek <ja...@redhat.com>; Hongtao Liu <crazy...@gmail.com>; H.J. Lu
> <hjl.to...@gmail.com>
> Subject: Re: [PATCH PR95254] aarch64: gcc generate inefficient code with
> fixed sve vector length
>
Snip...

> >
> > The v5 patch attached addressed this issue.
> >
> > There are two added changes compared with the v4 patch:
> > 1. In candidate_mem_p, mov_optab for innermode should be available.
> >    In this case, mov_optab for SDmode is not there and subregs are added
> >    back by emit_move_insn_1.  So we won't get the benefit with the patch.
>
> I agree we should have this check.  I think the rule applies to all of the
> transforms though, not just the mem one, so we should add the check to the
> register and constant cases too.

OK.  I changed the patch to make this an extra condition for calculating
x_inner & y_inner.

> > 2. Instead of using adjust_address, I changed to use adjust_address_nv to
> >    avoid emitting invalid insn 13.
> >    The later call to validize_mem() in emit_move_insn will take care of
> >    the address for us.
>
> The validation performed by validize_mem is the same as that performed by
> adjust_address, so the only case this should make a difference is for
> push_operands:

True.

>   /* If X or Y are memory references, verify that their addresses are valid
>      for the machine.  */
>   if (MEM_P (x)
>       && (! memory_address_addr_space_p (GET_MODE (x), XEXP (x, 0),
>                                          MEM_ADDR_SPACE (x))
>           && ! push_operand (x, GET_MODE (x))))
>     x = validize_mem (x);
>
>   if (MEM_P (y)
>       && ! memory_address_addr_space_p (GET_MODE (y), XEXP (y, 0),
>                                         MEM_ADDR_SPACE (y)))
>     y = validize_mem (y);
>
> So I think the fix is to punt on push_operands instead (and continue to use
> adjust_address rather than adjust_address_nv).

I'm not sure I understand this correctly.
Do you mean excluding push_operand in candidate_mem_p?  Like:

  auto candidate_mem_p = [&](machine_mode innermode, rtx mem) {
    return !targetm.can_change_mode_class (innermode, GET_MODE (mem), ALL_REGS)
           && !push_operand (mem, GET_MODE (mem))
           /* Not a candidate if innermode requires too much alignment.  */
           && (MEM_ALIGN (mem) >= GET_MODE_ALIGNMENT (innermode)
               || targetm.slow_unaligned_access (GET_MODE (mem),
                                                 MEM_ALIGN (mem))
               || !targetm.slow_unaligned_access (innermode, MEM_ALIGN (mem)));
  };

Thanks,
Felix
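
For anyone who wants to check the truth table of the proposed predicate outside GCC, here is a rough standalone sketch. It is not GCC code: MemModel, TargetModel and check_examples are hypothetical names made up for illustration, and the target hooks and RTL queries are replaced by plain booleans and integers, so it only models the boolean structure of the condition.

```cpp
#include <cassert>

// Hypothetical stand-ins, not GCC types.  Each field models the result of
// the GCC query named in its comment.
struct MemModel {
  unsigned align_bits;      // models MEM_ALIGN (mem)
  bool is_push_operand;     // models push_operand (mem, GET_MODE (mem))
};

struct TargetModel {
  bool can_change_mode_class;  // models targetm.can_change_mode_class (...)
  bool slow_unaligned_outer;   // models targetm.slow_unaligned_access (GET_MODE (mem), ...)
  bool slow_unaligned_inner;   // models targetm.slow_unaligned_access (innermode, ...)
};

// Same boolean structure as the candidate_mem_p lambda quoted in the mail;
// inner_align_bits models GET_MODE_ALIGNMENT (innermode).
bool candidate_mem_p(const TargetModel &t, const MemModel &mem,
                     unsigned inner_align_bits) {
  return !t.can_change_mode_class
         && !mem.is_push_operand
         // Not a candidate if innermode requires too much alignment.
         && (mem.align_bits >= inner_align_bits
             || t.slow_unaligned_outer
             || !t.slow_unaligned_inner);
}

// A few sample cases, assuming a target where only the inner mode is slow
// for unaligned access.
void check_examples() {
  TargetModel t{false, false, true};
  assert(candidate_mem_p(t, MemModel{64, false}, 64));   // aligned, not a push
  assert(!candidate_mem_p(t, MemModel{64, true}, 64));   // push operand rejected
  assert(!candidate_mem_p(t, MemModel{32, false}, 64));  // under-aligned rejected
}
```

With the extra push_operand term, a push destination is rejected regardless of alignment, so in this model the transform never reaches the address adjustment for such operands.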