On Thu, 1 Dec 2016, Prathamesh Kulkarni wrote: > On 1 December 2016 at 18:38, Richard Biener <rguent...@suse.de> wrote: > > On Thu, 1 Dec 2016, Prathamesh Kulkarni wrote: > > > >> On 1 December 2016 at 18:26, Richard Biener <rguent...@suse.de> wrote: > >> > On Thu, 1 Dec 2016, Prathamesh Kulkarni wrote: > >> > > >> >> On 1 December 2016 at 17:40, Richard Biener <rguent...@suse.de> wrote: > >> >> > On Thu, 1 Dec 2016, Prathamesh Kulkarni wrote: > >> >> > > >> >> >> On 25 November 2016 at 21:17, Jeff Law <l...@redhat.com> wrote: > >> >> >> > On 11/25/2016 01:07 AM, Richard Biener wrote: > >> >> >> > > >> >> >> >>> For the tail-call, issue should we artificially create a lhs and > >> >> >> >>> use that > >> >> >> >>> as return value (perhaps by a separate pass before tailcall) ? > >> >> >> >>> > >> >> >> >>> __builtin_memcpy (a1, a2, a3); > >> >> >> >>> return a1; > >> >> >> >>> > >> >> >> >>> gets transformed to: > >> >> >> >>> _1 = __builtin_memcpy (a1, a2, a3) > >> >> >> >>> return _1; > >> >> >> >>> > >> >> >> >>> So tail-call optimization pass would see the IL in it's expected > >> >> >> >>> form. > >> >> >> >> > >> >> >> >> > >> >> >> >> As said, a RTL expert needs to chime in here. Iff then tail-call > >> >> >> >> itself should do this rewrite. But if this form is required to > >> >> >> >> make > >> >> >> >> things work (I suppose you checked it _does_ actually work?) then > >> >> >> >> we'd need to make sure later passes do not undo it. So it looks > >> >> >> >> fragile to me. OTOH I seem to remember that the flags we set on > >> >> >> >> GIMPLE are merely a hint to RTL expansion and the tailcalling is > >> >> >> >> verified again there? > >> >> >> > > >> >> >> > So tail calling actually sits on the border between trees and RTL. > >> >> >> > Essentially it's an expand-time decision as we use information > >> >> >> > from trees as > >> >> >> > well as low level target information. > >> >> >> > > >> >> >> > I would not expect the former sequence to tail call. The tail > >> >> >> > calling code > >> >> >> > does not know that the return value from memcpy will be a1. Thus > >> >> >> > the tail > >> >> >> > calling code has to assume that it'll have to copy a1 into the > >> >> >> > return > >> >> >> > register after returning from memcpy, which obviously can't be > >> >> >> > done if we > >> >> >> > tail called memcpy. > >> >> >> > > >> >> >> > The second form is much more likely to turn into a tail call > >> >> >> > sequence > >> >> >> > because the return value from memcpy will be sitting in the proper > >> >> >> > register. > >> >> >> > This form out to work for most calling conventions that allow tail > >> >> >> > calls. > >> >> >> > > >> >> >> > We could (in theory) try and exploit the fact that memcpy returns > >> >> >> > its first > >> >> >> > argument as a return value, but that would only be helpful on a > >> >> >> > target where > >> >> >> > the first argument and return value use the same register. So I'd > >> >> >> > have a > >> >> >> > slight preference to rewriting per Prathamesh's suggestion above > >> >> >> > since it's > >> >> >> > more general. > >> >> >> Thanks for the suggestion. The attached patch creates artificial lhs, > >> >> >> and returns it if the function returns it's argument and that > >> >> >> argument > >> >> >> is used as return-value. > >> >> >> > >> >> >> eg: > >> >> >> f (void * a1, void * a2, long unsigned int a3) > >> >> >> { > >> >> >> <bb 2> [0.0%]: > >> >> >> # .MEM_5 = VDEF <.MEM_1(D)> > >> >> >> __builtin_memcpy (a1_2(D), a2_3(D), a3_4(D)); > >> >> >> # VUSE <.MEM_5> > >> >> >> return a1_2(D); > >> >> >> > >> >> >> } > >> >> >> > >> >> >> is transformed to: > >> >> >> f (void * a1, void * a2, long unsigned int a3) > >> >> >> { > >> >> >> void * _6; > >> >> >> > >> >> >> <bb 2> [0.0%]: > >> >> >> # .MEM_5 = VDEF <.MEM_1(D)> > >> >> >> _6 = __builtin_memcpy (a1_2(D), a2_3(D), a3_4(D)); > >> >> >> # VUSE <.MEM_5> > >> >> >> return _6; > >> >> >> > >> >> >> } > >> >> >> > >> >> >> While testing, I came across an issue with function f() defined > >> >> >> intail-padding1.C: > >> >> >> struct X > >> >> >> { > >> >> >> ~X() {} > >> >> >> int n; > >> >> >> char d; > >> >> >> }; > >> >> >> > >> >> >> X f() > >> >> >> { > >> >> >> X nrvo; > >> >> >> __builtin_memset (&nrvo, 0, sizeof(X)); > >> >> >> return nrvo; > >> >> >> } > >> >> >> > >> >> >> input to the pass: > >> >> >> X f() () > >> >> >> { > >> >> >> <bb 2> [0.0%]: > >> >> >> # .MEM_3 = VDEF <.MEM_1(D)> > >> >> >> __builtin_memset (nrvo_2(D), 0, 8); > >> >> >> # VUSE <.MEM_3> > >> >> >> return nrvo_2(D); > >> >> >> > >> >> >> } > >> >> >> > >> >> >> verify_gimple_return failed with: > >> >> >> tail-padding1.C:13:1: error: invalid conversion in return statement > >> >> >> } > >> >> >> ^ > >> >> >> struct X > >> >> >> > >> >> >> struct X & > >> >> >> > >> >> >> # VUSE <.MEM_3> > >> >> >> return _4; > >> >> >> > >> >> >> It seems the return type of function (struct X) differs with the type > >> >> >> of return value (struct X&). > >> >> >> Not sure how this is possible ? > >> >> > > >> >> > You need to honor DECL_BY_REFERENCE of DECL_RESULT. > >> >> Thanks! Gating on !DECL_BY_REFERENCE (DECL_RESULT (cfun->decl)) > >> >> resolved the error. > >> >> Does the attached version look OK ? > >> > > >> > + ass_var = make_ssa_name (TREE_TYPE (arg)); > >> > > >> > can you try > >> > > >> > ass_var = copy_ssa_name (arg); > >> > > >> > instead? That way the underlying decl should make sure the > >> > DECL_BY_REFERENCE check in the IL verification works. > >> Done in the attached version and verified tail-padding1.C passes with > >> the change. > >> Does it look OK ? > > > > + if (!ass_var) > > + { > > + /* Check if function returns one if it's arguments > > + and that argument is used as return value. > > + In that case create an artificial lhs to call_stmt, > > + and set it as the return value. */ > > + > > + unsigned rf = gimple_call_return_flags (call); > > + if (rf & ERF_RETURNS_ARG) > > + { > > + unsigned argnum = rf & ERF_RETURN_ARG_MASK; > > + if (argnum < gimple_call_num_args (call) > > + && ret_stmt) > > > > check && ret_stmt above together with !ass_var. > Oops, sorry about that. > > > > + { > > + tree arg = gimple_call_arg (call, argnum); > > + tree retval = gimple_return_retval (ret_stmt); > > + if (retval > > + && TREE_CODE (retval) == SSA_NAME > > + && operand_equal_p (retval, arg, 0) > > + && !DECL_BY_REFERENCE (DECL_RESULT > > (cfun->decl))) > > > > the DECL_BY_REFERENCE check can be removed now (I think). > Well after removing DECL_BY_REFERENCE, verify_gimple still fails but > differently: > > tail-padding1.C:13:1: error: RESULT_DECL should be read only when > DECL_BY_REFERENCE is set > } > ^ > while verifying SSA_NAME nrvo_4 in statement > # .MEM_3 = VDEF <.MEM_1(D)> > nrvo_4 = __builtin_memset (nrvo_2(D), 0, 8); > tail-padding1.C:13:1: internal compiler error: verify_ssa failed

## Advertising

Hmm, ok. Not sure why we enforce this. Note that in the end this patch looks fishy -- iff we really need the LHS on the assignment for correctness if we have the tailcall flag set then what guarantees that later passes do not remove it again? So anybody removing a LHS would need to unset the tailcall flag? Saying again that I don't know enough about the RTL part of tailcall expansion. Richard. > Thanks, > Prathamesh > > > > Richard. > > > >> Bootstrap+test in progress on x86_64-unknown-linux-gnu. > >> > >> Thanks, > >> Prathamesh > >> > > >> > Thanks, > >> > Richard. > >> > > >> > > >> >> Validation in progress. > >> >> > >> >> Thanks, > >> >> Prathamesh > >> >> > > >> >> >> To work around that, I guarded the transform on: > >> >> >> useless_type_conversion_p (TREE_TYPE (TREE_TYPE (cfun->decl)), > >> >> >> TREE_TYPE (retval))) > >> >> >> > >> >> >> in the patch. Does that look OK ? > >> >> >> > >> >> >> Bootstrap+tested on x86_64-unknown-linux-gnu with > >> >> >> --enable-languages=all,ada. > >> >> >> Cross-tested on arm*-*-*, aarch64*-*-*. > >> >> >> > >> >> >> Thanks, > >> >> >> Prathamesh > >> >> >> > > >> >> >> > > >> >> >> > Jeff > >> >> >> > >> >> > > >> >> > -- > >> >> > Richard Biener <rguent...@suse.de> > >> >> > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham > >> >> > Norton, HRB 21284 (AG Nuernberg) > >> >> > >> > > >> > -- > >> > Richard Biener <rguent...@suse.de> > >> > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, > >> > HRB 21284 (AG Nuernberg) > >> > > > > -- > > Richard Biener <rguent...@suse.de> > > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB > > 21284 (AG Nuernberg) > > -- Richard Biener <rguent...@suse.de> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)