https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77621
--- Comment #16 from rguenther at suse dot de <rguenther at suse dot de> ---
On Tue, 20 Sep 2016, ubizjak at gmail dot com wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77621
>
> --- Comment #15 from Uroš Bizjak <ubizjak at gmail dot com> ---
> (In reply to Richard Biener from comment #12)
>
> > If V2DFmode moves are fine(?) then maybe not do this for the load/store
> > kinds - this means only handling vector_stmt this way (and maybe
> > vect_promote_demote?) - at least make sure to not handle scalar_*
> > (not sure if vectype is always NULL for those -- docs say only
> > memory ops may depend on vectype).
>
> Moves are fine, V2DFmode vector arithmetic insns (addpd, subpd, mulpd) have
> much higher latencies (e.g. 6 for addpd, 9 for mulpd), compared to their
> {SF,DF}mode (or V4SFmode) versions (1 for addps, 2 for mulps).
>
> > Instead of += 20 I'd have done *= <factor> to
> > make it more independent of the absolute value of the cost numbers.
>
> IMO, having no other data at hand than Agner Fog's instruction tables, it
> looks that penalizing vector_stmt cost with a factor of 5 should be OK for
> a start.
>
> > If you'd do the cost adjustment in ix86_add_stmt_cost you have more control
> > over the details (there's also similar offsetting for silvermont)
>
> ix86_builtin_vectorization_cost is also called from there.  OTOH,
> ix86_add_stmt_cost uses some other arguments (e.g. location), which I think
> are irrelevant to the insn type cost adjustment.

At least you won't get called for the scalar loop copy and you have
definite access to vectype.
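
To make the += versus *= point concrete, here is a minimal standalone
sketch. It is not the i386 backend code: StmtKind, penalize_additive,
penalize_scaled and adjusted_cost are hypothetical names, and the is_v2df
flag is an assumed stand-in for checking the mode of vectype. The factor
of 5 used in the note after the code is the starting value proposed in
comment #15.

```cpp
#include <cassert>

// Hypothetical stand-in for the vectorizer's statement kinds; this is
// NOT GCC's real enum, only enough of it to show the idea.
enum class StmtKind { ScalarStmt, VectorStmt, VectorLoad, VectorStore };

// Additive penalty: a fixed offset, so its relative weight depends on
// the absolute scale of the cost table.
int penalize_additive(int base_cost, int offset) {
    return base_cost + offset;
}

// Multiplicative penalty: keeps the same relative weight if the whole
// cost table is rescaled.
int penalize_scaled(int base_cost, int factor) {
    assert(factor > 0);
    return base_cost * factor;
}

// Penalize only vector arithmetic on V2DF; loads, stores and scalar
// statements keep their base cost (moves are fine per comment #15).
int adjusted_cost(StmtKind kind, bool is_v2df, int base_cost, int factor) {
    if (kind == StmtKind::VectorStmt && is_v2df)
        return penalize_scaled(base_cost, factor);
    return base_cost;
}
```

With a base cost of 1, "+= 20" yields 21 while "*= 5" yields 5. If the
cost table were rescaled so that the base cost became 4, the additive
form would give 24 (a 6x penalty instead of 21x), whereas the scaled
form would give 20 and stay at 5x.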