http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56935



--- Comment #5 from rguenther at suse dot de <rguenther at suse dot de> 2013-04-16 07:48:47 UTC ---

On Mon, 15 Apr 2013, ysrumyan at gmail dot com wrote:

> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56935
>
> --- Comment #4 from Yuri Rumyantsev <ysrumyan at gmail dot com> 2013-04-15 14:54:50 UTC ---
> Richard,
>
> both subq's access the same cache line, which means that after the 1st
> store the 2nd load will stall until the data cache has finished updating
> (this is not an exact explanation, but if you'd like I can find a stronger
> and more correct definition of the memory conflict). As a result the
> non-vectorizable code will run much slower; we saw such a slowdown on
> 253.perl from CPU2000.



I fear this is beyond the scope of the vectorizer cost model in
its current form.  Clearly what it computes is correct if the
cost is defined as a sum of individual stmt costs (which is
how the scalar cost is computed).  The vectorizer cost model
now gives the target the power to look at the whole vectorized
sequence and compute something better than the sum of the individual
vectorized stmt costs, but currently the x86 target does not
use this power.
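To make the distinction concrete, here is a minimal, self-contained sketch in C. It contrasts a cost defined as a plain sum of per-statement costs with a whole-sequence cost that can add a penalty when a load immediately follows a store to the same cache line, which is the store-forwarding situation described above. Everything in it is hypothetical: the `stmt` struct, the `stmt_kind` enum, `sum_cost`, `sequence_cost` and the `stall_penalty` parameter are illustrative names only. They are not GCC's vectorizer cost hooks or the x86 target's cost tables.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical statement kinds; NOT GCC's vect_cost_for_stmt enum. */
enum stmt_kind { STMT_LOAD, STMT_STORE, STMT_ARITH };

struct stmt {
    enum stmt_kind kind;
    int cost;        /* hypothetical per-statement cost */
    long cache_line; /* cache line touched by a load/store; ignored otherwise */
};

/* Cost as a sum of individual statement costs (how the scalar cost is
   described as being computed). */
int sum_cost(const struct stmt *seq, size_t n)
{
    int total = 0;
    for (size_t i = 0; i < n; i++)
        total += seq[i].cost;
    return total;
}

/* Hypothetical whole-sequence cost: start from the plain sum, then add
   stall_penalty for every load that is immediately preceded by a store
   to the same cache line (a possible store-forwarding stall). Only
   adjacent pairs are checked; a real model would need more context. */
int sequence_cost(const struct stmt *seq, size_t n, int stall_penalty)
{
    assert(stall_penalty >= 0);
    int total = sum_cost(seq, n);
    for (size_t i = 1; i < n; i++) {
        if (seq[i].kind == STMT_LOAD
            && seq[i - 1].kind == STMT_STORE
            && seq[i].cache_line == seq[i - 1].cache_line)
            total += stall_penalty;
    }
    return total;
}
```

For a store to line 10 followed by a load from line 10 and one arithmetic statement, each costing 1, `sum_cost` gives 3, while `sequence_cost` with a penalty of 5 gives 8. If the load touches a different line, both functions agree. The point of the sketch is only that the extra term depends on neighbouring statements, so it cannot be expressed as a sum of independent per-statement costs.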



Factoring in the instruction cache, it's still not clear that a possible
extra cache miss for ifetch is worth avoiding the "stall" due to
the store forwarding issue.  (btw, your explanation looks odd:
there is no dependency between the two.  Yes, if they share the
same slot at the store buffer's granularity, then maybe a failed
store forward (due to _no_ dependency) may cause that issue?)



Note that for basic-block vectorization we want to keep an even
closer eye on code size, and the same cost model is used for
basic-block and loop SLP.



Richard.
