Hello Kai.  Thank you for the detailed response.  It takes someone brave 
to wade into a post that long.

To respond to your points:
> I disagree here.
> The cite from http://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html
Well, it's good to know that we are (literally) on the same page about 
where the behavior here is defined.
> Therefore the use of volatile is a way to minimize the number of
> constrains.  And for sure it doesn't hurt.  To optimize constrains is
> of course welcome, but then we need to be sure all required are
> provided.
However, you and I seem to have drawn very different conclusions from 
what we read.  As many of your responses were based on this assertion, 
I'm going to start here.

Since you are the longtime expert in this area and I am the newbie (at 
least to gcc), it might be tempting to assume that I simply have this 
wrong.  And I am certainly willing to entertain that possibility.  
However, I may be right, and I hope you are prepared to consider that 
possibility as well.  That said...

I'm going to take aim directly at your statements that "And for sure it 
doesn't hurt" and "To avoid that[,] volatile is IMHO the better choice 
to make things easier to read."  I believe that it does hurt, and that 
this is more than a readability issue.  Here is my proof that your 
statements are incorrect:

#define NDEBUG

#include <stdio.h>
#include <stdlib.h>   /* atoi */
#include <assert.h>

typedef unsigned char BOOLEAN;
typedef unsigned long DWORD;

     __CRT_INLINE BOOLEAN _BitScanForward(DWORD *Index, DWORD Mask)
     {
       __asm__ __volatile__ ("bsfl %1,%0"
          : "=r" (*Index)
          : "r" (Mask)
          : "cc");
       return Mask!=0;
     }

int __cdecl main(int argc, char* argv[])
{
    DWORD a = atoi(argv[1]);

    DWORD b;
    _BitScanForward(&b, a);

    assert(b < 10);

    printf("hello world\n");
}

Consider this bit of code.  In debug builds, the variable b is compared 
with the constant 10, and the assertion fires if the comparison fails.  
Now, how about NDEBUG builds?  In that case, b is never referenced after 
it is computed.  Since it is never used, there is no reason to compute 
it at all.  Normally the optimizer would simply discard this useless 
computation.  However, by including the __volatile__ qualifier, we are 
FORCING the optimizer to keep the asm code.

You can examine the compiler output for yourself: if you remove the 
volatile qualifier (as I'm recommending), the optimizer is able to 
detect that none of the outputs from _BitScanForward are needed, and it 
(correctly) deletes the call.

So, including the volatile keyword (when it isn't needed) *does* hurt.  
It forces the optimizer to produce unneeded code.
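
For comparison, here is a rough sketch of the wrapper without the 
qualifier.  The name bsf_no_volatile, the use of uint32_t in place of 
DWORD (so the 32-bit bsfl operand size also holds where long is 64 
bits), the plain "static inline" in place of __CRT_INLINE, and the 
non-x86 fallback are all my own assumptions for illustration, not 
anything from the mingw-w64 headers:

```c
#include <stdint.h>

typedef unsigned char BOOLEAN;

/* Hypothetical name: same logic as _BitScanForward above, minus
   __volatile__.  As with the original, *Index is not meaningful
   when Mask is 0. */
static inline BOOLEAN bsf_no_volatile(uint32_t *Index, uint32_t Mask)
{
#if defined(__i386__) || defined(__x86_64__)
    /* No __volatile__ here: if the caller never reads *Index, the
       optimizer is free to delete this asm statement. */
    __asm__ ("bsfl %1,%0"
        : "=r" (*Index)
        : "r" (Mask)
        : "cc");
#else
    /* Assumption: fallback so the sketch still builds on non-x86
       targets with GCC or Clang. */
    if (Mask != 0)
        *Index = (uint32_t)__builtin_ctz(Mask);
#endif
    return Mask != 0;
}
```

When the result is actually used, this should behave the same as the 
volatile version; the difference should only show up in the generated 
code when the output is dead, as in the NDEBUG build above.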

Which brings us to your statement about constraints: "we need to be sure 
all required are provided."  That's absolutely true.  And for people who 
are unsure about how this qualifier works, adding it is undoubtedly the 
*safest* course.  Having it when you don't need it produces inefficient 
code, as I have just demonstrated, whereas missing it when you do need 
it produces incorrectly functioning code.

But if you *do* know how it works, omitting it when it isn't needed 
(which I believe is the case here) can allow the optimizer to produce 
more efficient code.

Looking forward to your code example proving that I have this all wrong...

dw

_______________________________________________
Mingw-w64-public mailing list
Mingw-w64-public@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/mingw-w64-public
