On Wed, 11 Apr 2012, Richard Guenther wrote:
> > But it would possibly be an interesting experiment already to do such
> > transformation generally (without profiling) and see what it gives on
> > some benchmarks. Just to get a feel what's on the plate.
> The question is, of course, why on earth is a modulo operation in the
> loop setup so expensive that avoiding it improves the performance of the
> overall routine so much ...
Because in most cases in protein the loop actually runs only once or
twice, or not at all, so the loop setup is more expensive than the loop
itself.
> did you expect the code-gen difference of your patch?
Which code-gen difference? I expected that in the protein case the
division isn't executed, if that was what your question was about. If it
wasn't, please reformulate :)
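
To make the setup-cost point concrete, here is a minimal sketch in C. It
is not the patch itself, and the function names are made up for
illustration. It assumes the cheap case is a unit step (+1 or -1), which
this mail does not spell out. The general trip-count computation needs a
division; the guarded version skips it when the step is +1 or -1.

```c
#include <assert.h>

/* Hypothetical helper: number of iterations of a counted loop running
   from start to end (inclusive) in increments of step.  The general
   form needs a division in the loop setup. */
static long trip_count_general(long start, long end, long step)
{
    assert(step != 0);              /* a zero step is not a valid loop */
    if (step > 0)
        return end < start ? 0 : (end - start) / step + 1;
    return end > start ? 0 : (start - end) / (-step) + 1;
}

/* Hypothetical variant: same result, but the division is only reached
   when the step is neither +1 nor -1 (assumed to be the common case). */
static long trip_count_guarded(long start, long end, long step)
{
    if (step == 1)
        return end < start ? 0 : end - start + 1;
    if (step == -1)
        return end > start ? 0 : start - end + 1;
    return trip_count_general(start, end, step);
}
```

If the loop body then runs only once or twice, or not at all, the
division in trip_count_general can easily cost more than the whole
loop, so skipping it in the guarded path is where any gain would come
from.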