On Thu, Apr 12, 2018 at 05:17:29PM +0200, Richard Biener wrote:
> >For -Os that is easily measurable regression, for -O2 it depends on the
> >relative speed of memcpy vs. mempcpy and whether one or both of them
> >are in I-cache or not.
>
> Well, then simply unconditionally not generate a libcall from the move
> expander?

We need to generate the libcall for many callers, and in fact we have
neither a mode nor a way to tell the caller that we haven't emitted
anything.

What we could do is add another enumerator to enum block_op_methods, say
BLOCK_OP_NO_LIBCALL_LOOP, that would be like BLOCK_OP_NO_LIBCALL but would
not fall back to emit_block_move_via_loop if neither move_by_pieces nor
emit_block_move_via_movmem can be used. Instead it would return const0_rtx
or pc_rtx, or use some other way to tell the caller that it hasn't emitted
anything. Then in expand_builtin_memory_copy_args, for
endp == 1 && target != const0_rtx, pass that new BLOCK_OP_NO_LIBCALL_LOOP
to emit_block_move_hints and return 0 if dest_addr is const0_rtx (or pc_rtx
or whatever is chosen).

	Jakub
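
To make the control flow of the suggestion above concrete, here is a rough,
self-contained toy model in C. It is not GCC code: the toy_* functions, the
toy_result enum and the boolean "can_*" inputs are invented for illustration,
the real enum block_op_methods has more enumerators than shown, and
TOY_NOTHING merely stands in for whichever sentinel (const0_rtx, pc_rtx, ...)
would be chosen.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy subset of the enum; BLOCK_OP_NO_LIBCALL_LOOP is the proposed
   enumerator from the mail, not something assumed to exist in GCC.  */
enum block_op_methods {
  BLOCK_OP_NORMAL,
  BLOCK_OP_NO_LIBCALL,
  BLOCK_OP_NO_LIBCALL_LOOP
};

/* Hypothetical outcome of the toy expander.  */
enum toy_result {
  TOY_INLINE,   /* move_by_pieces or the movmem pattern was usable */
  TOY_LOOP,     /* byte loop fallback (emit_block_move_via_loop) */
  TOY_LIBCALL,  /* memcpy libcall */
  TOY_NOTHING   /* nothing emitted; models the const0_rtx/pc_rtx sentinel */
};

/* Models the strategy choice sketched for emit_block_move_hints.  */
enum toy_result
toy_emit_block_move (bool can_by_pieces, bool can_movmem,
                     enum block_op_methods method)
{
  if (can_by_pieces || can_movmem)
    return TOY_INLINE;

  switch (method)
    {
    case BLOCK_OP_NO_LIBCALL:
      return TOY_LOOP;
    case BLOCK_OP_NO_LIBCALL_LOOP:
      return TOY_NOTHING;       /* neither libcall nor loop */
    default:
      return TOY_LIBCALL;
    }
}

/* Models the caller side: endp == 1 is the mempcpy case, and result_used
   plays the role of target != const0_rtx.  Returns false when nothing was
   expanded, i.e. the "return 0" case where the original call is kept.  */
bool
toy_expand_memory_copy (int endp, bool result_used,
                        bool can_by_pieces, bool can_movmem)
{
  assert (endp >= 0 && endp <= 2);

  enum block_op_methods method
    = (endp == 1 && result_used) ? BLOCK_OP_NO_LIBCALL_LOOP
                                 : BLOCK_OP_NORMAL;

  return toy_emit_block_move (can_by_pieces, can_movmem, method)
         != TOY_NOTHING;
}
```

In this model a mempcpy whose result is used is either expanded inline or
left alone as a mempcpy call, while every other caller still gets the memcpy
libcall as before.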