Re: [JAVA3D] Grafic card question

Dipl. Ing. Paul Szawlowski Wed, 09 May 2001 06:05:52 -0700
Thanks for the explanation.
There's one point where I disagree:
I think the loop of Justin is correct because he uses --i. That means before
the first comparison i is decremented by one. It would be wrong if he would
have used i --.
That's one of the reasons I don't like using the ++, -- operators in
statements more comple than a simple assignment.

I strongly agree with the cache argument.

regards
Paul


Bo Nygaard Bai schrieb:

> I appologise if you see this twice.
>
> Hi Paul and Justin,
>
> Dipl. Ing. Paul Szawlowski wrote:
>
> > Hi Justin,
> >
> >> A simple example: Why is this loop
> >>
> >> for(int i = count; --i >= 0 ; )
> >>
> >> faster than this loop
> >>
> >> for(int i = 0; i < count - 1; i++ )
> >>
> >> given exactly the same loop content?
> >
> >
> I would have to disagree with you on that Justin. But only because
> the first loop runs count times and the second runs count-1 times.
> So I'll assume you meant:
>
> for (int i = 0; i < count; i++)
>
> Old hackers makes mistakes too, ehh [:-)]
>
> >
> > Ok, here is my coming out. I do programming now but don't have a solid
> > programmer's education background. So I can't tell you why the first
> > loop is
> > faster. But I really would like to know more about what's going on
> > under the
> > hood. Could you please explain to me
>
> The reason for this is the way your program is changed into simple
> instructions for your CPU. I will illustrate this with some
> simplified maschine code examples. No CPU has exactly these
> instructions, but they all look a bit like this. It should be
> quite simle to follow, even if you have never programmed at
> machine level.
>
> Lets first look at the "slow" example:
>
> for (int i = 0; i < count; i++)
>
> A direct translation into a simplified CPU code would look like this:
>
> LD  A,0     - load 0 into register A
> :::loop     - label marking the start of the loop (we jump to here)
> CMP A,count - compare A to count
> BGE done    - branch (on greater og equal) to label done
> .
> . [code inside the loop]
> .
> INC A       - increment register A by 1
> JMP loop    - jump to label loop (round again)
> :::done     - label marking the code after the loop
>
> Each time round the loop we do:
>
> compare, branch, ... increment, jump
>
> The key to the trick is the way the compare (CMP) and branch (BGE)
> instructions are tied together. The branch looks at some flags inside
> the CPU to know if it should jump. The CMP instruction sets these
> flags depending on the result of the comparison.
>
> Most instructions sets the CPU flags when "something special happens"
> like a result is zero or a walue changed sign (+/-).
>
> Each time around the loop we increment the value value of A, but
> nothing special happens when we reach count, so we have to compare
> A to count to see if we are done.
>
> The trick is to arrange it so that "something special happens" when
> we are done. To do this we count down to zero in stead:
>
> for (int i = count; --i >= 0; )
>
> Now when the value of "i" (register A) changes sign to become -1
> we should  branch out of the loop. Changing sign is
> "something special" and sets a flag in the CPU. How nice!
> We don't need the compare instruction to set the flags anymore.
>
> LD  A,count - load count into register A
> :::loop     - label marking the start of the loop
> DEC A       - decrement register A by 1 (also sets flags)
> BNE done    - branch (if the result was negative) to label done
> .
> . [code inside the loop]
> .
> JMP loop    - jump to label loop (round again)
> :::done:
>
> Each time round the loop we do:
>
> decrement, branch, ... jump
>
> We have saved a compare each time round the loop.
>
> >
> > The only thing what I can see is the count - 1 statement. Is it
> > calculated
> > after every loop step ? Will the compiler only optimize this if count is
> > declared const(final) ?
> >
> > Is this equivalent to the first loop ?
> > const int tempCount = count - 1;
> > for( int i = 0; i < tempCount; i ++)
>
> You can trust your compiler to do that for you.
>
> Any modern compiler will find instructions that can be moved
> outside the loop and will generaly do so "behind your back".
>
> In fact, almost all of the old hacker tricks are performed by
> optimizing compilers now, and you will see no change in
> performance (or even in the maschine code) if you use them.
>
> Justins example is still valid because changing the direction
> of the count i often not a simple optimization. The compiler
> must be sure it does not change the result of the program.
> Some optimizing compilers **will** actually change the code
> to count down in stead of up if "i" is not used in the loop.
>
> In the old days of simpler CPU's the direction of loops could
> have a big impact on a programs running time. This is still
> true for signal processors. For the PC class system of today,
> it is not important unless you are writing low level libraries.
> If you do that kind of optimizations. Run tests!
> Brach prediction and the like can hold a few surprises.
>
> My advice is to:
>
> * Write clear and readable code
> * Use a good optimizing compiler
> * Do code opmizations by hand in your spare time (for fun)
>
> The kind of optimization that you realy **must** considder
> is how you access memory. In a PC class system you have
> a very fast cache and some very slow memory. There can be
> an order of mangnitudes difference in running time between
> "the right way" and "the wrong way" of using the cache.
>
> /Bo Bai

===========================================================================
To unsubscribe, send email to [EMAIL PROTECTED] and include in the body
of the message "signoff JAVA3D-INTEREST".  For general help, send email to
[EMAIL PROTECTED] and include in the body of the message "help".
Re: [JAVA3D] Grafic card question

Reply via email to