Re: poor optimisation case

Tristan Wibberley Sun, 05 Aug 2007 16:39:36 -0700

On Sun, 2007-08-05 at 16:58 -0400, Tim Prince wrote: 
> [EMAIL PROTECTED] wrote:


[snip]

> > The loop unfortunately can't always be written as in -DOLD as the
> > implementation of an iterator adapter might use ?: to special case the
> > first element of a sequence and when used in a generic algorithm which
> > just has the simple loop of -DNEW it ought to be optimised like -DOLD if
> > inlining occurs.
> > 
> I don't see why you special case the first iteration of a loop with ? 
> inside the loop.  Simply write the first iteration separately, and begin 
> the loop with the next iteration.  It should be a lot clearer both to us 
> and to the compiler what is your intention.
> Doesn't this belong on gcc-help?  Better peeling optimization needs more 
> justification than this.

As above: when C++ is used as it's supposed to be used, this isn't
possible. If I've got a sequence of values and I want to sum them, there
is a generic sum algorithm that I'm supposed to use (and which I
*should* use, to avoid the unmaintainable spaghetti code that soon turns
up when everything has to be hand-coded to be fast - I've read
"Numerical Recipes in C" and nearly killed myself by the end of it).

If I've got a 1,000,000-element sequence and want (on one thread) to
see what a generic algorithm (such as std::accumulate) would produce if
the first element were doubled and (on a second thread) what it produces
with the sequence as it is, I'm not supposed to be required to write two
versions of a large piece of code - I'm supposed to be able to just
write an iterator that returns a different value for the first element.

In the case of returning a different value for the first element, I'd
use something in operator* that, after inlining, ends up equivalent to
d?d:1. The compiler should optimise that whenever it can and whenever
doing so would reduce runtime by a massive 33% and text size by quite a
bit. Both of those are true in this case.
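
To make that concrete, here is a minimal sketch of such an adapter. The
names first_scaled_iterator and sum_with_first_scaled are hypothetical
and made up for illustration; they are not from my real code, and the
sketch assumes a C++11 compiler.

```cpp
#include <cassert>
#include <iterator>
#include <numeric>
#include <vector>

// Hypothetical adapter: yields `scale * value` for the first element of
// the sequence and the plain value for every other element.
template <typename It>
class first_scaled_iterator {
public:
    using iterator_category = std::input_iterator_tag;
    using value_type = typename std::iterator_traits<It>::value_type;
    using difference_type = typename std::iterator_traits<It>::difference_type;
    using pointer = const value_type*;
    using reference = value_type;

    first_scaled_iterator(It first, It pos, value_type scale)
        : first_(first), pos_(pos), scale_(scale) {}

    // The special case under discussion: once inlined into a generic
    // loop, this ?: is evaluated on every iteration.
    value_type operator*() const {
        return pos_ == first_ ? *pos_ * scale_ : *pos_;
    }

    first_scaled_iterator& operator++() { ++pos_; return *this; }
    first_scaled_iterator operator++(int) {
        first_scaled_iterator tmp = *this;
        ++pos_;
        return tmp;
    }

    friend bool operator==(const first_scaled_iterator& a,
                           const first_scaled_iterator& b) {
        return a.pos_ == b.pos_;
    }
    friend bool operator!=(const first_scaled_iterator& a,
                           const first_scaled_iterator& b) {
        return !(a == b);
    }

private:
    It first_;         // start of the underlying sequence
    It pos_;           // current position
    value_type scale_; // factor applied to the first element only
};

// Sums `v` with its first element multiplied by `scale`, reusing the
// generic algorithm instead of hand-writing a second loop.
template <typename T>
T sum_with_first_scaled(const std::vector<T>& v, T scale) {
    using Iter = first_scaled_iterator<typename std::vector<T>::const_iterator>;
    return std::accumulate(Iter(v.begin(), v.begin(), scale),
                           Iter(v.begin(), v.end(), scale), T());
}
```

The calling code stays a single generic call; whether the test on the
first element gets peeled out of the loop is then entirely up to the
optimiser.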

Basically, writing two versions of the loop is not an option in real
life because developing on spaghetti C code is costly while elegant C++
code is cheap.

For example, my real (big) case is that I've got an iterator that moves
around an image or video thusly (or a 4-, 5- or 6-dimensional equivalent):

  it = it[1] + 6; // move 6 rows along the 2nd dimension

When I move in all dimensions at once I quite sensibly have a loop from
dimension zero to the top dimension adding
(amount-to-move*stride-of-dimension) to the pointer that the iterator is
implemented with for each one. The strides of the dimensions are
previously recorded in an array that the iterator holds a reference to -
except the first (because when you have a one dimensional image - a
signal - nothing needs to be stored) which is always 1. So everything
gets reduced down correctly by g++, except that this special case is not
moved out of the loop.
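
Roughly, the offset computation looks like the sketch below. This is a
simplified illustration, not my real iterator: the function names and
the layout of the stride array (higher_strides[d - 1] holding the stride
of dimension d, with dimension 0 not stored) are assumptions made for
the example. offset_generic is the loop as it falls out of the generic
code; offset_peeled is what I'd hope the optimiser could turn it into.

```cpp
#include <cassert>
#include <cstddef>

// Assumed layout: higher_strides[d - 1] is the stride of dimension d for
// d >= 1. Dimension 0 always has stride 1, so it is not stored.

// The loop as written generically: the special case for dimension 0 is
// tested on every iteration.
inline std::ptrdiff_t offset_generic(const std::ptrdiff_t* move,
                                     const std::ptrdiff_t* higher_strides,
                                     std::size_t dims) {
    assert(move != nullptr || dims == 0);
    std::ptrdiff_t offset = 0;
    for (std::size_t d = 0; d < dims; ++d) {
        const std::ptrdiff_t stride = d ? higher_strides[d - 1] : 1;
        offset += move[d] * stride;
    }
    return offset;
}

// The same computation with the first iteration peeled off by hand, so
// the loop body no longer contains the test.
inline std::ptrdiff_t offset_peeled(const std::ptrdiff_t* move,
                                    const std::ptrdiff_t* higher_strides,
                                    std::size_t dims) {
    assert(move != nullptr || dims == 0);
    if (dims == 0) {
        return 0;
    }
    std::ptrdiff_t offset = move[0]; // dimension 0: stride is always 1
    for (std::size_t d = 1; d < dims; ++d) {
        offset += move[d] * higher_strides[d - 1];
    }
    return offset;
}
```

Both functions return the same offset for the same inputs; the only
difference is where the dimension-zero test lives.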

So this is not a programming problem that I need help with (I know how
to micro-optimise with C-style ugly-stuff). I'm just reporting a
significant missed optimisation opportunity that will help C++
developers even if not C or Fortran developers. The only reason I didn't
reflect the C++ simple-vs-spaghetti code issue in my example is because
I wanted to keep it simple to target one of the problems very
specifically.

-- 
Tristan Wibberley

Any opinion expressed is mine (or else I'm playing devil's advocate for
the sake of a good argument). My employer had nothing to do with this
communication.
