Re: recent troubles with float vectors bitwise ops

2007-08-25 Thread Paolo Bonzini

given that we know
that the processor supports bitwise-or on floating point values, using
an instruction different from that for bitwise-or on integer values,
then it is fair to ask why we don't support vector | vector for
floating point vectors.


Because processors may add weird instructions for internal reasons, 
especially in an area where you want to extract every little bit of 
performance.  It is up to the back-end to ensure that the instruction is 
generated whenever appropriate.


Paolo


Re: recent troubles with float vectors bitwise ops

2007-08-24 Thread Paolo Bonzini



Let's assume that the recent change is what we want, i.e., that the
answer to (1) is No, these operations should not be part of the vector
extensions because they are not valid scalar extensions.  So, that
means we need to answer (2).

We still have the problem that users now can't write machine-independent
code to do this operation.  Assuming the operations are useful for
something (and, if they weren't, why would Intel want to have instructions
for them, and why would tbp want to use them?),


I'm not sure that it is *so* useful for a user to have access to it,
except for specialized cases:

1) neg, abs and copysign operations on vectors.  These we can make
available via builtins (for - of course you don't need it); we already
support them in many back-ends.

abs:
 cmpeqps xmm1, xmm1   ; xmm1 = all-ones
 pslld   xmm1, 31     ; xmm1 = all 1000... (sign-bit mask)
 andnps  xmm1, xmm2   ; xmm1 = abs(xmm2)

neg:
 cmpeqps xmm1, xmm1   ; xmm1 = all-ones
 pslld   xmm1, 31     ; xmm1 = all 1000... (sign-bit mask)
 xorps   xmm2, xmm1   ; xmm2 = -xmm2

copysign:
 cmpeqps xmm1, xmm1   ; xmm1 = all-ones
 pslld   xmm1, 31     ; xmm1 = all 1000... (sign-bit mask)
 andps   xmm3, xmm1   ; xmm3 = signbit(xmm3)
 andnps  xmm1, xmm2   ; xmm1 = abs(xmm2)
 orps    xmm1, xmm3   ; xmm1 = copysign(xmm2, xmm3)


2) selection operations on vectors, kind of (v1 <= v2 ? v3 : v4).  These
can be written for example like this:

 cmpleps xmm1, xmm2   ; xmm1 = (xmm1 <= xmm2) ? all-ones : 0
 andps   xmm3, xmm1   ; xmm3 = (xmm1 <= xmm2) ? xmm3 : 0
 andnps  xmm1, xmm4   ; xmm1 = (xmm1 <= xmm2) ? 0 : xmm4
 orps    xmm1, xmm3   ; xmm1 = (xmm1 <= xmm2) ? xmm3 : xmm4

And we can add this as an extension to our vector arithmetic set; such
operations are already supported as VEC_COND_EXPR by the middle-end.


For other cases, which do not come to mind at the moment, introducing a
couple of casts is not a big deal IMNSHO, especially if we make sure
that the generated code is good.  Right now, we have good code for SSE,
and a prototype patch was posted for SSE2 and up.

If we have a good extension for vector arithmetic, we should aim at
improving it consistently rather than extending it in unpredictable
ways.  For example, another useful extension would be the ability to
access vectors by item using x[n] (at least with constant expressions).


What are these operations used for?  Can someone give an example of a
kernel that benefits from this kind of thing?


See above.

Paolo




Re: recent troubles with float vectors bitwise ops

2007-08-24 Thread Tim Prince

Paolo Bonzini wrote:


2) selection operations on vectors, kind of (v1 <= v2 ? v3 : v4).  These
can be written for example like this:

 cmpleps xmm1, xmm2   ; xmm1 = (xmm1 <= xmm2) ? all-ones : 0
 andps   xmm3, xmm1   ; xmm3 = (xmm1 <= xmm2) ? xmm3 : 0
 andnps  xmm1, xmm4   ; xmm1 = (xmm1 <= xmm2) ? 0 : xmm4
 orps    xmm1, xmm3   ; xmm1 = (xmm1 <= xmm2) ? xmm3 : xmm4
SSE4 introduces specific instruction support, with a shorter sequence 
for this purpose.  It seems to be quite difficult to persuade gcc to use it.


Re: recent troubles with float vectors bitwise ops

2007-08-24 Thread tbp

Mark Mitchell wrote:

One option is for the user to use intrinsics.  It's been claimed that
results in worse code.  There doesn't seem any obvious reason for that,
but, if true, we should try to fix it; we don't want to penalize people
who are using the intrinsics.  So, let's assume using intrinsics is just
as efficient, either because it already is, or because we make it so.
I maintain that empirical claim; if I compare what a simple SOA hybrid
3-coordinates something gives when implemented via intrinsics, builtins and
the vector extension, used as the basic component of a raytracer kernel, I
get as many codegen variations: register allocations differ, stack
footprints differ, branches & code organization differ, etc... so it's not
that surprising performance also differs. It appears the vector & builtin
(which isn't using __m128 but straight v4sf) implementations are mostly
on par while the intrinsic-based version is slightly slower.
Then you factor in how convenient it is, well... was, to use that vector
extension to write such a thing...


Another issue is that for MSVC and ICC, __m128 is a class, but not for
gcc, so you need more wrapping in C++; but then you know you can let some
naked v4sf escape, because the compiler always does the right thing with
them.


Now while there are some subtleties (and annoying 'features'), I should
state that gcc 4.3, if you're careful, generates mostly excellent SSE
code (especially on x86-64, even more so if compared to icc).



We still have the problem that users now can't write machine-independent
code to do this operation.  Assuming the operations are useful for
That, and writing, say, a generic int/float/double something takes much,
much more work.



What are these operations used for?  Can someone give an example of a
kernel that benefits from this kind of thing?
There's of course what Paolo Bonzini described, but also all kinds of
tricks that knowing such operations are extremely efficient encourages.
While it would be nice to have such builtins also operate on vectors, if 
only because they are so common, it's not quite the same as having full 
freedom and hardware features exposed.





Re: recent troubles with float vectors bitwise ops

2007-08-24 Thread tbp

Paolo Bonzini wrote:

I'm not sure that it is *so* useful for a user to have access to it,
except for specialized cases:
As there are other means, it may not be that useful, but it is for sure
extremely convenient.



2) selection operations on vectors, kind of (v1 <= v2 ? v3 : v4).  These
can be written for example like this:

 cmpleps xmm1, xmm2   ; xmm1 = (xmm1 <= xmm2) ? all-ones : 0
 andps   xmm3, xmm1   ; xmm3 = (xmm1 <= xmm2) ? xmm3 : 0
 andnps  xmm1, xmm4   ; xmm1 = (xmm1 <= xmm2) ? 0 : xmm4
 orps    xmm1, xmm3   ; xmm1 = (xmm1 <= xmm2) ? xmm3 : xmm4
I suppose you'll find such a variant of the conditional move pattern in
every piece of SSE code.


But you can't condense bitwise-on-float usage to a few patterns, because
when writing SSE the efficiency of those operations is taken for granted.




If we have a good extension for vector arithmetic, we should aim at
improving it consistently rather than extending it in unpredictable
ways.  For example, another useful extension would be the ability to
access vectors by item using x[n] (at least with constant expressions).

Yes, yes and yes.





Re: recent troubles with float vectors bitwise ops

2007-08-24 Thread Ian Lance Taylor
Paolo Bonzini [EMAIL PROTECTED] writes:

 1) neg, abs and copysign operations on vectors.  These we can make
 available via builtins (for - of course you don't need it); we already
 support them in many back-ends.

Here is my point of view.  People using the vector extensions are
already writing inherently machine specific code, and they are
(ideally) familiar with the instruction set of their processor.  I see
no significant disadvantage to gcc in granting them easy access to the
capabilities of their processor.  Saying that these capabilities are
available in other ways just amounts to putting an obstacle in their
path.  If there is a reason to put in that obstacle--e.g., because we
are implementing a language standard and the language standard forbids
it--then fine.  But citing a PowerPC specific standard to forbid code
appropriate for the x86 does not count as a sufficient reason in my
book.

Permitting this extension continues the preexisting behaviour, and it
helps programmers and helps existing code.  Who does it hurt to permit
this extension?  Who does it help to forbid this extension?

Ian


Re: recent troubles with float vectors bitwise ops

2007-08-24 Thread Paul Brook
On Friday 24 August 2007, Ian Lance Taylor wrote:
 Paolo Bonzini [EMAIL PROTECTED] writes:
  1) neg, abs and copysign operations on vectors.  These we can make
  available via builtins (for - of course you don't need it); we already
  support them in many back-ends.

 Here is my point of view.  People using the vector extensions are
 already writing inherently machine specific code, and they are
 (ideally) familiar with the instruction set of their processor.

By the same argument, if you're already writing machine specific code then 
there shouldn't be a problem using machine specific intrinsics. I admit I've 
never been convinced that the generic vector support was sufficient to write 
useful code without resorting to machine specific intrinsics.

 Permitting this extension continues the preexisting behaviour, and it
 helps programmers and helps existing code.  Who does it hurt to permit
 this extension?  Who does it help to forbid this extension?

I'm partly worried about cross-platform compatibility, and what this implies
for other SIMD targets.
At minimum we need to fix the internals documentation to say how to support
this extension. The current docs are unclear on whether (ior:V2SF ...) is
valid RTL.
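
For concreteness, the kind of back-end pattern in question would look something like the following define_insn sketch, modeled loosely on the existing i386 SSE patterns (the pattern name, constraints and condition here are illustrative, not copied from sse.md); whether the (ior:V4SF ...) RTL it contains is valid is exactly the documentation question above:

```
(define_insn "*iorv4sf3"
  [(set (match_operand:V4SF 0 "register_operand" "=x")
        (ior:V4SF (match_operand:V4SF 1 "register_operand" "%0")
                  (match_operand:V4SF 2 "nonimmediate_operand" "xm")))]
  "TARGET_SSE"
  "orps\t{%2, %0|%0, %2}")
```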

Paul


Re: recent troubles with float vectors bitwise ops

2007-08-24 Thread Chris Lattner


On Aug 24, 2007, at 8:02 AM, Ian Lance Taylor wrote:

Permitting this extension continues the preexisting behaviour, and it
helps programmers and helps existing code.  Who does it hurt to permit
this extension?  Who does it help to forbid this extension?


Aren't builtins the designated way to access processor-specific  
features like this?  Why do there have to be C operators for  
obscure features like this?


Wouldn't it be better to fix the code generator to do the right thing  
regardless of how the user presents it?  There is a lot of code that  
uses casts (including the builtin implementations themselves) - it  
seems worthwhile to generate instructions for the right domain for  
this code as well.


-Chris


Re: recent troubles with float vectors bitwise ops

2007-08-24 Thread Ian Lance Taylor
Chris Lattner [EMAIL PROTECTED] writes:

 On Aug 24, 2007, at 8:02 AM, Ian Lance Taylor wrote:
  Permitting this extension continues the preexisting behaviour, and it
  helps programmers and helps existing code.  Who does it hurt to permit
  this extension?  Who does it help to forbid this extension?
 
 Aren't builtins the designated way to access processor-specific
 features like this?  Why does there have to be C operators for
 obscure features like this?

A fair question, but we've already decided to support vector + vector
and such operations, and we've decided that that is one valid way to
generate vector instructions.  That decision may itself have been a
mistake.  But once we accept that decision, then, given that we know
that the processor supports bitwise-or on floating point values, using
an instruction different from that for bitwise-or on integer values,
then it is fair to ask why we don't support vector | vector for
floating point vectors.

 Wouldn't it be better to fix the code generator to do the right thing
 regardless of how the user presents it?  There is a lot of code that
 uses casts (including the builtin implementations themselves) - it
 seems worthwhile to generate instructions for the right domain for
 this code as well.

I completely agree.

Ian


Re: recent troubles with float vectors bitwise ops

2007-08-24 Thread Mark Mitchell
Paul Brook wrote:
 On Friday 24 August 2007, Ian Lance Taylor wrote:
 Paolo Bonzini [EMAIL PROTECTED] writes:
 1) neg, abs and copysign operations on vectors.  These we can make
 available via builtins (for - of course you don't need it); we already
 support them in many back-ends.
 Here is my point of view.  People using the vector extensions are
 already writing inherently machine specific code, and they are
 (ideally) familiar with the instruction set of their processor.
 
 By the same argument, if you're already writing machine specific code then 
 there shouldn't be a problem using machine specific intrinsics. I admit I've 
 never been convinced that the generic vector support was sufficient to write 
 useful code without resorting to machine specific intrinsics.

Our VSIPL++ team is using it for some things.  My guess is that it's
probably not sufficient for all things, but probably is sufficient for
many things.  Also, I expect some users get (say) a 4x speedup over C
code easily by using the vector extension, and could get an 8x speedup
by using intrinsics, but with a lot more work.  So, the vector
extensions give them a sweet spot on the performance/effort/portability
curve.

 I'm partly worried about cross-platform compatibility, and what this implies 
 for other SIMD targets.

Yes.  Here's a proposed definition:

Let a and b be floating-point operands of type F, where F is a
floating-point type.  Let N be the number of bytes in F.  Then, a | b
is defined as:

  ({ union fi { F f; char bytes[N]; };
     union fi au;
     union fi bu;
     int i;
     au.f = a;
     bu.f = b;
     for (i = 0; i < N; ++i)
       au.bytes[i] |= bu.bytes[i];
     au.f; })

If the resulting floating-point value is denormal, NaN, etc., whether or
not exceptions are raised is unspecified.

-- 
Mark Mitchell
CodeSourcery
[EMAIL PROTECTED]
(650) 331-3385 x713


Re: recent troubles with float vectors bitwise ops

2007-08-24 Thread Chris Lattner


On Aug 24, 2007, at 8:37 AM, Ian Lance Taylor wrote:


Chris Lattner [EMAIL PROTECTED] writes:


On Aug 24, 2007, at 8:02 AM, Ian Lance Taylor wrote:
Permitting this extension continues the preexisting behaviour,  
and it
helps programmers and helps existing code.  Who does it hurt to  
permit

this extension?  Who does it help to forbid this extension?


Aren't builtins the designated way to access processor-specific
features like this?  Why does there have to be C operators for
obscure features like this?


A fair question, but we've already decided to support vector + vector
and such operations, and we've decided that that is one valid way to
generate vector instructions.  That decision may itself have been a
mistake.  But once we accept that decision, then, given that we know
that the processor supports bitwise-or on floating point values, using
an instruction different from that for bitwise-or on integer values,
then it is fair to ask why we don't support vector | vector for
floating point vectors.


My personal opinion is that the grammar and type rules of the  
language should be defined independently of the target.  + is  
allowed on all generic vectors for all targets.  Allowing &, ^ and | to be  
used on FP vectors on some targets but not others seems extremely  
inconsistent (generic vectors are supposed to provide some amount of  
portability after all).  Allowing these operators on all targets also  
seems strange to me, but is a better solution than allowing them on  
some targets but not others.


I consider pollution of the IR to be a significant problem.  If you  
allow this, you suddenly have tree nodes and RTL nodes for logical  
operations that have to handle operands that are FP vectors.  I  
imagine that this will result in either 1) subtle bugs in various  
transformations that work on these or 2) special case code to handle  
this in various cases, spread through the optimizer.


-Chris


Re: recent troubles with float vectors bitwise ops

2007-08-24 Thread Andrew Pinski
On 8/24/07, Mark Mitchell [EMAIL PROTECTED] wrote:
 Let a and b be floating-point operands of type F, where F is a
 floating-point type.  Let N be the number of bytes in F.  Then, a | b
 is defined as:

Yes that makes sense, not.  Since most of the time, you have a mask
and that is what is being used.  Like masking the sign bit or
doing a selection.  The mask is most likely a NaN anyway, so having
that undefined just does not make sense.  So is this going to be on
scalars?  If not, then we should still not accept it on vectors.

-- Pinski


Re: recent troubles with float vectors bitwise ops

2007-08-24 Thread Mark Mitchell
Andrew Pinski wrote:
 On 8/24/07, Mark Mitchell [EMAIL PROTECTED] wrote:
 Let a and b be floating-point operands of type F, where F is a
 floating-point type.  Let N be the number of bytes in F.  Then, a | b
 is defined as:
 
 Yes that makes sense, not.  

I'm not following.  Are you agreeing or disagreeing?

 Since most of the time, you have a mask
 and that is what is being used.  Like masking the sign bit or
 doing a selection.  The mask is most likely a NaN anyways so having
 that undefined just does not make sense.  

I'm not following.  What I meant was that if the result was a NaN,
whether or not floating-point exceptions were signalled was unspecified.
 Where does undefined come into it, and what does that have to do with
the mask?  If we think that no hardware will ever signal an exception in
this case, then we can say that the operation never signals an
exception.  But, I was afraid that might be too strong a constraint.

 So is this going to be on
 scalars?  If not, then we should still not accept it on vectors.

Yes, from a language-design point of view, it should be for both scalars
and vectors, so I wrote the strawman definition in terms of scalars.  Of
course, if where it's actually useful is vectors, then implementing it
for vectors is the important case, and whether or not we get around to
doing it on scalars is secondary.

-- 
Mark Mitchell
CodeSourcery
[EMAIL PROTECTED]
(650) 331-3385 x713


RE: recent troubles with float vectors bitwise ops

2007-08-24 Thread Dave Korn
On 24 August 2007 17:04, Andrew Pinski wrote:

 On 8/24/07, Mark Mitchell [EMAIL PROTECTED] wrote:
 Let a and b be floating-point operands of type F, where F is a
 floating-point type.  Let N be the number of bytes in F.  Then, a | b
 is defined as:
 
 Yes that makes sense, not.  Since most of the time, you have a mask
 and that is what is being used.

  http://en.wikipedia.org/wiki/Weasel_word.

  Like masking the sign bit or
 doing a selection.  The mask is most likely a NaN anyways so having
 that undefined just does not make sense.  

  What are you talking about?  I can't even parse this rant.


cheers,
  DaveK
-- 
Can't think of a witty .sigline today



Re: recent troubles with float vectors bitwise ops

2007-08-24 Thread Paul Brook
  I'm partly worried about cross-platform compatibility, and what this
  implies for other SIMD targets.

 Yes.  Here's a proposed definition:

snip

I agree this is the only sane definition.

I probably wasn't clear: My main concern is that if we do support this 
extension the internals should be implemented and documented in such a way 
that target maintainers (i.e. me) can figure out how to make it work on their 
favourite target. We should not just quietly flip some bit in the x86 
backend.

Paul


Re: recent troubles with float vectors bitwise ops

2007-08-24 Thread Mark Mitchell
Paul Brook wrote:

 I probably wasn't clear: My main concern is that if we do support this 
 extension the internals should be implemented and documented in such a way 
 that target maintainers (i.e. me) can figure out how to make it work on their 
 favourite target. We should not just quietly flip some bit in the x86 
 backend.

Totally agreed.

-- 
Mark Mitchell
CodeSourcery
[EMAIL PROTECTED]
(650) 331-3385 x713


Re: recent troubles with float vectors bitwise ops

2007-08-24 Thread Paolo Bonzini

If there is a reason to put in that obstacle--e.g., because we
are implementing a language standard and the language standard forbids
it--then fine.  But citing a PowerPC specific standard to forbid code
appropriate for the x86 does not count as a sufficient reason in my
book.


The code I want to forbid is actually appropriate not only for the x86;
the exact same code is appropriate for PowerPC, because the same kind of
masking operations can be used there.  However, for some reason, the
PowerPC spec chose *not* to allow vector float bitwise operations, and
I agree with it; the reason I want to avoid this is that it goes
against our guideline for vector extensions (i.e. valarray).


Users can also achieve the same effect with casts, and in addition I
would like to trade this lost ability for two gained abilities.  First,
I want GCC to produce the exact same code with and without casts.
Second, I want GCC to have builtins supporting most common uses of the 
idiom, so that users can actually do without casts *and* bitwise 
operations 99% of the time.


Paolo


Re: recent troubles with float vectors bitwise ops

2007-08-24 Thread Joe Buck
On Fri, Aug 24, 2007 at 02:34:27PM -0400, Ross Ridge wrote:
 Mark Mitchell writes:
 Let's assume that the recent change is what we want, i.e., that the
 answer to (1) is No, these operations should not be part of the vector
 extensions because they are not valid scalar extensions.  
 
 I don't think we should assume that.  If we were to, we'd also have
 to change vector casts to work like scalar casts and actually convert
 the values.  (Or like valarray, disallow them completely.)  That would
 force a solution like Paolo Bonzini's to use unions instead of casts,
 making it even more cumbersome.

In C++, you could use reinterpret_cast (meaning that
values are not converted, just reinterpreted as integers of the same
size).  That would avoid the need for unions, you'd just cast.  But this
solution doesn't work for C.

 Using vector casts that behave differently than
 scalar casts has a lot more potential to generate confusion than allowing
 bitwise operations on vector floats does.

I suppose you could have an appropriately named intrinsic for doing
a reinterpret_cast in C (that is, the type would be reinterpreted but it
would be a no-op at the machine level).  Then, to do a masking operation
you could write

ovec = __as_float_vector(MASK | __as_int_vector(ivec));


Re: recent troubles with float vectors bitwise ops

2007-08-24 Thread Ross Ridge
Mark Mitchell writes:
Let's assume that the recent change is what we want, i.e., that the
answer to (1) is No, these operations should not be part of the vector
extensions because they are not valid scalar extensions.  

I don't think we should assume that.  If we were to, we'd also have
to change vector casts to work like scalar casts and actually convert
the values.  (Or like valarray, disallow them completely.)  That would
force a solution like Paolo Bonzini's to use unions instead of casts,
making it even more cumbersome.

If you look at what these bitwise operations are doing, they're taking
a floating point vector and applying an operation (eg. negation) to
certain members of the vector according to a (normally) constant mask.
They're really unary floating-point vector operations.  I don't think
it's unreasonable to want to express these operations using floating-point
vector types directly.  Using vector casts that behave differently than
scalar casts has a lot more potential to generate confusion than allowing
bitwise operations on vector floats does.

As I see it, there are two ways you can express these kinds of operations
without using casts, which are both cumbersome and misleading.  The easy
way would be to just revert the change, and allow bitwise operations on
vector floats.  This is essentially an old-school programmer-knows-best
solution where the compiler provides operators that represent the sort
of operations generally supported by CPUs.  Even on Altivec these bitwise
operations on vector floats are meaningful and useful.

The other way is to provide a complete set of operations that would
make using the bitwise operators pretty much unnecessary, as it is
with scalar floats.  For example, you can express masked negation by
multiplying with a constant vector of -1.0 and 1.0 elements.  It shouldn't
be too hard for GCC to optimize this into an appropriate bitwise
instruction for the target.  For other operations the solution isn't
as nice.  You could implement a set of builtin functions easily enough,
but it wouldn't be much better than using target specific intrinsics.
Chances are though that operations are going to be missed.  For example,
I doubt anyone unfamiliar with 3D programming would've seen the need
for only negating part of a vector.

(A more concise way to eliminate the need for the bitwise operations on
vector floats would be to implement either the swizzles used in 3D
shaders or array indexing on vectors.  It would require a lot of work
to implement properly, so I don't see it happening.)

Ross Ridge



Re: recent troubles with float vectors bitwise ops

2007-08-23 Thread Paolo Bonzini



The IA-32 instruction set does distinguish between integer and
floating point bitwise operations.  In addition to the single-precision
floating-point bitwise instructions that tbp mentioned (ORPS, ANDPS,
ANDNPS and XORPS) there are both distinct double-precision floating-point
bitwise instructions (ORPD, ANDPD, ANDNPD and XORPD) and integer bitwise
instructions (POR, PAND, PANDN and PXOR).  While these operations all do
the same thing, they can differ in performance depending on the context.


Oops, I only remembered PS vs. PD (I remembered POR as an MMX instruction
only).  I believe that optimizing this should be a task for the x86
machine dependent reorg.


Paolo


Re: recent troubles with float vectors bitwise ops

2007-08-23 Thread Paolo Bonzini



Why did Intel split up these instructions in the first place? Is it
because they wanted to have separate vector units in some cases?
I don't know and I don't care that much. 


To some extent I agree with Andrew Pinski here.  Saying that you need 
support in a generic vector extension for vector float | vector float 
in order to generate ANDPS and not PXOR, is just wrong.  That should be 
done by the back-end.


Paolo



Re: recent troubles with float vectors bitwise ops

2007-08-23 Thread tbp

Ross Ridge wrote:

If I were tbp, I'd just code all his vector operatations using intrinsics.
The other responses in this thread have made it clear that GCC's vector
arithemetic operations are really only designed to be used with the Cell
Broadband Engine and other Power PC processors.
Thing is, my main use for that extension is a specialization (made on
a rainy day out of boredom) of a basic something re-used all over in my
code; the default implementation uses intrinsics.
It turns out, when benchmarked, that I get better code with the
specialization. So it's more convenient and faster; win/win.


I'm unsure why the code is better in the end; perhaps because of the
may_alias attribute of __m128, perhaps because some builtins which are
used to implement those intrinsics are mistyped (ie v4si
__builtin_ia32_cmpltps (v4sf, v4sf))... I don't know, I'd need to try a
builtin-based specialization.


In any case, that vector extension is now totally useless on x86 and
conflicts with the documentation.


Re: recent troubles with float vectors bitwise ops

2007-08-23 Thread tbp

Andrew Pinski wrote:

Which hardware (remember GCC is a generic compiler)?  VMX/Altivec and
SPU actually do not have different instructions for bitwise
and/ior/xor for different vector types (it is all the same
instruction).  I have run into ICEs with even bitwise ops on vector
float/double on x86 also in the past, which is the other reason why I
disabled them.  Since this is an extension, it would be nice if it were
a nicely defined extension, which means disabling them for vector
float/double.

It *was* neatly defined:
The types defined in this manner can be used with a subset of normal
C operations. Currently, GCC will allow using the following operators on
these types: +, -, *, /, unary minus, ^, |, &, ~.



So can you, pretty please, also patch the documentation, and maybe point
to the Altivec spec, as it's obviously the only one relevant no matter
what platform you're on?


Re: recent troubles with float vectors bitwise ops

2007-08-23 Thread Paolo Bonzini


The types defined in this manner can be used with a subset of normal C
operations. Currently, GCC will allow using the following operators on
these types: +, -, *, /, unary minus, ^, |, &, ~.


What was missing is "when allowed by the base type".  E.g. & is not
supported on vector float.


Paolo


Re: recent troubles with float vectors bitwise ops

2007-08-23 Thread tbp

Paolo Bonzini wrote:
To some extent I agree with Andrew Pinski here.  Saying that you need 
support in a generic vector extension for vector float | vector float 
in order to generate ANDPS and not PXOR, is just wrong.  That should be 
done by the back-end.
I guess I fail to grasp the logic mandating that the intended source
level, strictly typed 'vector float | vector float' should be mangled
into an int op with frantic casts to magically emerge out from the
backend as the original 'vector float | vector float', but I'm not a
compiler maintainer: for me it smells like a regression.


Re: recent troubles with float vectors bitwise ops

2007-08-23 Thread Paolo Bonzini

tbp wrote:

Paolo Bonzini wrote:
To some extent I agree with Andrew Pinski here.  Saying that you need 
support in a generic vector extension for vector float | vector 
float in order to generate ANDPS and not PXOR, is just wrong.  That 
should be done by the back-end.


I guess I fail to grasp the logic mandating that the intended source
level, strictly typed 'vector float | vector float' should be mangled
into an int op with frantic casts to magically emerge out from the
backend as the original 'vector float | vector float', but I'm not a
compiler maintainer: for me it smells like a regression.


Because it's *not* strictly typed.  Strict typing means that you accept 
the same things accepted for the element type.  So it's not a 
regression, it's a bug fix.


Paolo


Re: recent troubles with float vectors bitwise ops

2007-08-23 Thread Paolo Bonzini

GCC makes the problem even worse if only SSE and not SSE2 instructions
are enabled.  Since the integer bitwise instructions are only available
with SSE2, using casts instead of intrinsics causes GCC to expand the
operation into a long series of instructions.


This was also a bug and a patch for this has been posted and approved.

Paolo


Re: recent troubles with float vectors bitwise ops

2007-08-23 Thread tbp

Paolo Bonzini wrote:
Because it's *not* strictly typed.  Strict typing means that you accept 
the same things accepted for the element type.  So it's not a 
regression, it's a bug fix.

# cat regressionorbugfix.cc
typedef float v4sf_t __attribute__ ((__vector_size__ (16)));
typedef int v4si_t __attribute__ ((__vector_size__ (16)));
v4sf_t foo(v4sf_t a, v4sf_t b, v4sf_t c) {
return a + (b | c);
}
v4sf_t bar(v4sf_t a, v4sf_t b, v4sf_t c) {
return a + (v4sf_t) ((v4si_t) b | (v4si_t) c);
}
int main() { return 0; }

00400a30 <foo(float __vector, float __vector, float __vector)>:
  400a30:   orps   %xmm2,%xmm1
  400a33:   addps  %xmm1,%xmm0
  400a36:   retq

00400a40 <bar(float __vector, float __vector, float __vector)>:
  400a40:   por%xmm2,%xmm1
  400a44:   addps  %xmm1,%xmm0
  400a47:   retq

I'm surely not qualified to argue about typing, but you'd need a rather
strong distortion field not to characterize that as a regression.




Re: recent troubles with float vectors bitwise ops

2007-08-23 Thread Paolo Bonzini



# cat regressionorbugfix.cc
typedef float v4sf_t __attribute__ ((__vector_size__ (16)));
typedef int v4si_t __attribute__ ((__vector_size__ (16)));
v4sf_t foo(v4sf_t a, v4sf_t b, v4sf_t c) {
return a + (b | c);
}
v4sf_t bar(v4sf_t a, v4sf_t b, v4sf_t c) {
return a + (v4sf_t) ((v4si_t) b | (v4si_t) c);
}
int main() { return 0; }

00400a30 <foo(float __vector, float __vector, float __vector)>:
  400a30:   orps   %xmm2,%xmm1
  400a33:   addps  %xmm1,%xmm0
  400a36:   retq

00400a40 bar(float __vector, float __vector, float __vector):
  400a40:   por%xmm2,%xmm1
  400a44:   addps  %xmm1,%xmm0
  400a47:   retq

I'm surely not qualified to argue about typing, but you'd need a rather 
strong distortion field to not characterize that as a regression.


I've added 5 minutes ago an XFAILed test for exactly this code.  OTOH, I 
have also committed a fix that will avoid producing tons of shuffle and 
unpacking instructions when function bar is compiled with -msse but 
without -msse2.


I'm also going to file a missed optimization bug soon.

I'm curious, does ICC support vector arithmetic like this? Do both 
functions compile? What code does it produce for bar?


Paolo


Re: recent troubles with float vectors bitwise ops

2007-08-23 Thread tbp
On 8/23/07, Paolo Bonzini [EMAIL PROTECTED] wrote:
 I've added 5 minutes ago an XFAILed test for exactly this code.  OTOH, I
 have also committed a fix that will avoid producing tons of shuffle and
 unpacking instructions when function bar is compiled with -msse but
 without -msse2.
Thanks.

 I'm also going to file a missed optimization bug soon.
Ditto.

 I'm curious, does ICC support vector arithmetic like this? Do both
 functions compile? What code does it produce for bar?
No, icc9/10 only provide basic support for that extension (and then
only on linux i think)
# /opt/intel/cce/9.1.051/bin/icpc regressionorbugfix.cc
regressionorbugfix.cc(5): error: no operator | matches these operands
operand types are: v4sf_t | v4sf_t
return a + (b | c);
  ^

regressionorbugfix.cc(8): error: no operator | matches these operands
operand types are: v4si_t | v4si_t
return a + (v4sf_t) ((v4si_t) b | (v4si_t) c);
^

but then it's more aggressive about intrinsics than gcc.
Like i said somewhere i got slightly better results when using that
extension than intrinsics with gcc 4.3 but haven't checked if i could
get the same result with builtins yet.


Re: recent troubles with float vectors bitwise ops

2007-08-23 Thread Tim Prince

Paolo Bonzini wrote:


I'm curious, does ICC support vector arithmetic like this? 
The primary icc/icl use of SSE/SSE2 masking operations, of course, is in 
the auto-vectorization of fabs[f] and conditional operations:


 sum = 0.f;
 i__2 = *n;
 for (i__ = 1; i__ <= i__2; ++i__)
 if (a[i__] > 0.f)
 sum += a[i__];
 (Windows/intel asm syntax)
  pxor  xmm2, xmm2
  cmpltps   xmm2, xmm3
  andps xmm3, xmm2
  addps xmm0, xmm3
...



Re: recent troubles with float vectors bitwise ops

2007-08-23 Thread tbp
On 8/23/07, Tim Prince [EMAIL PROTECTED] wrote:
 The primary icc/icl use of SSE/SSE2 masking operations, of course, is in
 the auto-vectorization of fabs[f] and conditional operations:

   sum = 0.f;
   i__2 = *n;
   for (i__ = 1; i__ <= i__2; ++i__)
   if (a[i__] > 0.f)
   sum += a[i__];
  (Windows/intel asm syntax)
pxor  xmm2, xmm2
cmpltps   xmm2, xmm3
andps xmm3, xmm2
addps xmm0, xmm3
 ...
Note that icc9 has a strong bias for the pentium4, which had no stall
penalty for mistyped fp vectors (for Intel, that penalty came with the
pentium M line), so you see a pxor even when generating code for the core2.
# cat autoicc.cc
float foo(const float *a, int n) {
float sum = 0.f;
for (int i = 0; i < n; ++i)
if (a[i] > 0.f)
sum += a[i];
return sum;
}
int main() { return 0; }
# /opt/intel/cce/9.1.051/bin/icpc -O3 -xT autoicc.cc
autoicc.cc(3) : (col. 2) remark: LOOP WAS VECTORIZED.
  4007a9:   pxor   %xmm4,%xmm4
  4007ad:   cmpltps %xmm3,%xmm4
  4007b1:   andps  %xmm3,%xmm4
# /opt/intel/cce/10.0.023/bin/icpc -O3 -xT autoicc.cc
autoicc.cc(3): (col. 2) remark: LOOP WAS VECTORIZED.
  400b50:   xorps  %xmm3,%xmm3
  400b53:   cmpltps %xmm4,%xmm3
  400b57:   andps  %xmm3,%xmm4


Re: recent troubles with float vectors bitwise ops

2007-08-23 Thread Ian Lance Taylor
Paolo Bonzini [EMAIL PROTECTED] writes:

  The types defined in this manner can be used with a subset of
  normal C operations. Currently, GCC will allow using the following
  operators on these types: +, -, *, /, unary minus, ^, |, &, ~.
 
 What was missing is "when allowed by the base type".  E.g. << is not
 supported.

I think we should revert the patch, and continue permitting the
bitwise operations on vector float.

There seem to be solid reasons to permit this, and no very strong ones
to prohibit it.  We can consider it to be a GNU extension for vectors.
Vectors are of course themselves an extension already.

Ian


Re: recent troubles with float vectors bitwise ops

2007-08-23 Thread Paolo Bonzini



The types defined in this manner can be used with a subset of
normal C operations. Currently, GCC will allow using the following
operators on these types: +, -, *, /, unary minus, ^, |, &, ~.

What was missing is "when allowed by the base type".  E.g. << is not
supported.


I think we should revert the patch, and continue permitting the
bitwise operations on vector float.

There seem to be solid reasons to permit this, and no very strong ones
to prohibit it.


I'm not sure.  I think it's better if we improve the compiler to 
generate better code for the version with casts.  So we get no 
pessimization, and better typechecking.


I think that, in an ideal world, intrinsics would be implemented using 
the generic vector extensions, and they would generate as good code as 
with builtins -- better because of simplifications that can be done 
after inlining.  We should move towards that direction.


Paolo


Re: recent troubles with float vectors bitwise ops

2007-08-23 Thread Paul Brook
 There seem to be solid reasons to permit this, and no very strong ones
 to prohibit it.  We can consider it to be a GNU extension for vectors.
 Vectors are of course themselves an extension already.

How are you suggesting it be implemented?

Will the front/middle-end convert it to (vNsf)((vNsi)a | (vNsi)b), or do all 
vector backends need to lie about having float vector bitwise operations?

Paul


Re: recent troubles with float vectors bitwise ops

2007-08-23 Thread Michael Matz
Hi,

On Thu, 23 Aug 2007, Paul Brook wrote:

  There seem to be solid reasons to permit this, and no very strong ones
  to prohibit it.  We can consider it to be a GNU extension for vectors.
  Vectors are of course themselves an extension already.
 
 How are you suggesting it be implemented?
 
 Will the front/middle-end convert it to (vNsf)((vNsi)a | (vNsi)b), or do all 
 vector backends need to lie about having float vector bitwise operations?

optabs and open-coding on expand when unavailable?  Like other constructs?


Ciao,
Michael.


Re: recent troubles with float vectors bitwise ops

2007-08-23 Thread Joseph S. Myers
On Thu, 23 Aug 2007, Ian Lance Taylor wrote:

 I think we should revert the patch, and continue permitting the
 bitwise operations on vector float.
 
 There seem to be solid reasons to permit this, and no very strong ones
 to prohibit it.  We can consider it to be a GNU extension for vectors.
 Vectors are of course themselves an extension already.

We decided long ago that the extension would be based on what's permitted 
by C++ valarray rather than by a particular CPU's vector intrinsics.  So 
unless C++ valarray allows this operation, I think we should leave it 
prohibited and ensure that the compiler can generate appropriate code for 
these bitwise operations in the presence of casts (the particular integer 
element type used should of course not affect the code for these 
operations either.)

-- 
Joseph S. Myers
[EMAIL PROTECTED]


Re: recent troubles with float vectors bitwise ops

2007-08-23 Thread Andrew Pinski
On 8/23/07, Joseph S. Myers [EMAIL PROTECTED] wrote:
 On Thu, 23 Aug 2007, Ian Lance Taylor wrote:

  I think we should revert the patch, and continue permitting the
  bitwise operations on vector float.
 
  There seem to be solid reasons to permit this, and no very strong ones
  to prohibit it.  We can consider it to be a GNU extension for vectors.
  Vectors are of course themselves an extension already.

 We decided long ago that the extension would be based on what's permitted
 by C++ valarray rather than by a particular CPU's vector intrinsics.  So
 unless C++ valarray allows this operation, I think we should leave it
 prohibited

And it is not supported by valarray.
Testcase:
#include <valarray>

using std::valarray;
valarray<float> a, b;

int f(void)
{
  a = a | b;
}
--- cut --
Error messages:
/usr/include/c++/4.0.0/bits/valarray_before.h: In member function '_Tp
std::__bitwise_or::operator()(const _Tp&, const _Tp&) const [with _Tp
= float]':
/usr/include/c++/4.0.0/bits/valarray_before.h:527:   instantiated from
'typename std::__fun<_Oper, typename _Arg::value_type>::result_type
std::_BinBase<_Oper, _FirstArg, _SecondArg>::operator[](size_t) const
[with _Oper = std::__bitwise_or, _FirstArg = std::valarray<float>,
_SecondArg = std::valarray<float>]'
/usr/include/c++/4.0.0/bits/valarray_after.h:220:   instantiated from
'_Tp std::_Expr<_Clos, _Tp>::operator[](size_t) const [with _Clos =
std::_BinClos<std::__bitwise_or, std::_ValArray, std::_ValArray,
float, float>, _Tp = float]'
/usr/include/c++/4.0.0/bits/valarray_array.tcc:149:   instantiated
from 'void std::__valarray_copy(const std::_Expr<_Dom, _Tp>&, size_t,
std::_Array<_Tp>) [with _Tp = float, _Dom =
std::_BinClos<std::__bitwise_or, std::_ValArray, std::_ValArray,
float, float>]'
/usr/include/c++/4.0.0/valarray:696:   instantiated from
'std::valarray<_Tp>& std::valarray<_Tp>::operator=(const
std::_Expr<_Dom, _Tp>&) [with _Dom = std::_BinClos<std::__bitwise_or,
std::_ValArray, std::_ValArray, float, float>, _Tp = float]'
t.cc:8:   instantiated from here
/usr/include/c++/4.0.0/bits/valarray_before.h:243: error: invalid
operands of types 'const float' and 'const float' to binary
'operator|'


Thanks,
Andrew Pinski


Re: recent troubles with float vectors bitwise ops

2007-08-23 Thread Gabriel Dos Reis
Joseph S. Myers [EMAIL PROTECTED] writes:

| On Thu, 23 Aug 2007, Ian Lance Taylor wrote:
| 
|  I think we should revert the patch, and continue permitting the
|  bitwise operations on vector float.
|  
|  There seem to be solid reasons to permit this, and no very strong ones
|  to prohibit it.  We can consider it to be a GNU extension for vectors.
|  Vectors are of course themselves an extension already.
| 
| We decided long ago that the extension would be based on what's permitted 
| by C++ valarray rather than by a particular CPU's vector intrinsics. 

In C++, the broadcast operations are allowed on arrays if, and only
if, they are allowed on element types.

-- Gaby


Re: recent troubles with float vectors bitwise ops

2007-08-23 Thread Andrew Pinski
On 8/23/07, Andrew Pinski [EMAIL PROTECTED] wrote:
 On 8/23/07, Joseph S. Myers [EMAIL PROTECTED] wrote:
 
  We decided long ago that the extension would be based on what's permitted
  by C++ valarray rather than by a particular CPU's vector intrinsics.  So
  unless C++ valarray allows this operation, I think we should leave it
  prohibited

 And it is not supported by valarray.

Plus this is already documented:
The operations behave like C++ valarrays. Addition is defined as the
addition of the corresponding elements of the operands.

So if one reads the documentation, vector float | vector float would
mean take each element and ior it with the corresponding element in the
other vector, so then you have float | float, which is invalid.

-- Pinski


Re: recent troubles with float vectors bitwise ops

2007-08-23 Thread Mark Mitchell
Paolo Bonzini wrote:
 
 Why did Intel split up these instructions in the first place, is it
 because they wanted to have a separate vector unit in some cases?
 I don't know and I don't care that much. 
 
 To some extent I agree with Andrew Pinski here.  Saying that you need
 support in a generic vector extension for vector float | vector float
 in order to generate ANDPS and not PXOR, is just wrong.  That should be
 done by the back-end.

Rather than accusing Intel of bad ISA design and the GCC maintainers of
Altivec prejudice, let's just figure out what to do.

We all agree that:

(1) On Intel CPUs, it's more efficient to use the floating-point bitwise
instructions

(2) In C, you can't do a bitwise-or on two floating-point types

So, we have two questions:

(1) Should GCC's vector extensions permit floating-point bitwise operations?

(2) If not, how can a user can get efficient code?

Let's assume that the recent change is what we want, i.e., that the
answer to (1) is No, these operations should not be part of the vector
extensions because they are not valid scalar extensions.  So, that
means we need to answer (2).

One option is for the user to use intrinsics.  It's been claimed that this
results in worse code.  There doesn't seem to be any obvious reason for that,
but, if true, we should try to fix it; we don't want to penalize people
who are using the intrinsics.  So, let's assume using intrinsics is just
as efficient, either because it already is, or because we make it so.

We still have the problem that users now can't write machine-independent
code to do this operation.  Assuming the operations are useful for
something (and, if it weren't, why would Intel want to have instructions
for it, and why would tbp want to use it?), it seems unfortunate to
restrict the extension in this way.  We could always support the scalar
form too, if we want to maintain consistency between the scalar and
vector forms.

Presumably, the reason this isn't standard C or C++ is because the
standards don't specify a floating-point format.  At most, they could
have made the behavior implementation-defined.  But, if nobody thought
it was a useful operation, they probably didn't see any point.

What are these operations used for?  Can someone give an example of a
kernel that benefits from this kind of thing?

Assuming there's a plausible use, my suggestion is that we just undo the
patch that turned off this functionality.  If it doesn't work well on some
systems, and we don't have any volunteers to write a fully generic
expansion (e.g., move float operands to integer registers, do the bitwise
operation, move the result back), then we could always issue a sorry.
Users may then
have to use #ifdefs on some platforms, but that's no worse than using
intrinsics.

-- 
Mark Mitchell
CodeSourcery
[EMAIL PROTECTED]
(650) 331-3385 x713



Re: recent troubles with float vectors bitwise ops

2007-08-22 Thread Paolo Bonzini
Apparently enough for a small vendor like Intel to propose such things 
as orps, andps, andnps, and xorps.


I think you're running too far with your sarcasm. SSE's instructions do 
not go so far as to specify integer vs. floating point.  To me, ps 
means 32-bit SIMD, independent of integerness.


So, that's what i feared... it was intentional. And now i guess the only 
sanctioned access to those ops is via builtins/intrinsics.


No, you can do so with casts.  Floating-point to integer vector casts 
preserve the bit pattern.  For example, you can do


vector float f = { 5, 5, 5, 5 };
vector int g = { 0x80000000, 0, 0x80000000, 0 };
vector int f_int = (vector int) f;
f = (vector float) (f_int ^ g);

For Altivec, I get exactly

addis r2,r10,ha16(LC0-gibberish)
la r2,lo16(LC0-gibberish)(r2)
lvx v0,0,r2
vxor v2,v2,v0
...
LC0:
.long   -2147483648
.long   0
.long   -2147483648
.long   0

Paolo


RE: recent troubles with float vectors bitwise ops

2007-08-22 Thread Dave Korn
On 22 August 2007 06:10, Ian Lance Taylor wrote:

 tbp [EMAIL PROTECTED] writes:
 
 vecop.cc:4: error: invalid operands of types 'float __vector__' and
 'float __vector__' to binary 'operator|'
 vecop.cc:5: error: invalid operands of types 'float __vector__' and
 'float __vector__' to binary 'operator'
 vecop.cc:6: error: invalid operands of types 'float __vector__' and
 'float __vector__' to binary 'operator^'
 
 Apparently it's still there as of right now, on x86-64 at least. I
 think this is not supposed to happen but i'm not sure, hence the mail.
 
 What does it mean to do a bitwise-or of a floating point value?
 
 This code also gets an error:
 
 double foo(double a, double b) { return a | b; }


  There are some notable fp hacks and speedups that make use of integer ops on
floating point operands, so it's not entirely an insane notion.

  However, as Paolo points out upthread, that can be done with casts, which is
more correct anyway, so I don't think there's a problem with blocking the
unadorned usage.

cheers,
  DaveK
-- 
Can't think of a witty .sigline today



Re: recent troubles with float vectors bitwise ops

2007-08-22 Thread Andrew Pinski
On 8/21/07, tbp [EMAIL PROTECTED] wrote:
 # /usr/local/gcc-4.3-svn.old6/bin/g++ vecop.cc
 vecop.cc: In function 'T foo() [with T = float __vector__]':
 vecop.cc:13:   instantiated from here
 vecop.cc:4: error: invalid operands of types 'float __vector__' and
 'float __vector__' to binary 'operator|'
 vecop.cc:5: error: invalid operands of types 'float __vector__' and
 'float __vector__' to binary 'operator'
 vecop.cc:6: error: invalid operands of types 'float __vector__' and
 'float __vector__' to binary 'operator^'

 Apparently it's still there as of right now, on x86-64 at least. I think
 this is not supposed to happen but i'm not sure, hence the mail.

This is intentional; float|float does not make sense, so how can vector
float | vector float make sense (likewise for & and ^)?

This was PR 30428.

-- Pinski


RE: recent troubles with float vectors bitwise ops

2007-08-22 Thread Dave Korn
On 22 August 2007 11:13, Andrew Pinski wrote:

 On 8/21/07, tbp [EMAIL PROTECTED] wrote:
 # /usr/local/gcc-4.3-svn.old6/bin/g++ vecop.cc
 vecop.cc: In function 'T foo() [with T = float __vector__]':
 vecop.cc:13:   instantiated from here
 vecop.cc:4: error: invalid operands of types 'float __vector__' and
 'float __vector__' to binary 'operator|'
 vecop.cc:5: error: invalid operands of types 'float __vector__' and
 'float __vector__' to binary 'operator'
 vecop.cc:6: error: invalid operands of types 'float __vector__' and
 'float __vector__' to binary 'operator^'
 
 Apparently it's still there as of right now, on x86-64 at least. I think
 this is not supposed to happen but i'm not sure, hence the mail.
 
 This is intentional; float|float does not make sense, so how can vector
 float | vector float make sense (likewise for & and ^)?


float InvSqrt (float x){
float xhalf = 0.5f*x;
int i = *(int*)&x;
i = 0x5f3759df - (i>>1);
x = *(float*)&i;
x = x*(1.5f - xhalf*x*x);
return x;
}


  It's not exactly what you're referring to, but it's evidence in favour of the 
argument that we should never presume anything, no matter how unusual, might 
not have a reasonable use.  (However as I said, I'm not arguing against the 
error message, since it's still possible to express this intent using casts.)

cheers,
  DaveK
-- 
Can't think of a witty .sigline today



Re: recent troubles with float vectors bitwise ops

2007-08-22 Thread Andrew Pinski
On 8/22/07, Dave Korn [EMAIL PROTECTED] wrote:
 float InvSqrt (float x){
 float xhalf = 0.5f*x;
 int i = *(int*)&x;

You are violating C/C++ aliasing rules here anyways.

 i = 0x5f3759df - (i>>1);
 x = *(float*)&i;

Likewise.

So I guess you like to depend on undefined code :).

-- Pinski


RE: recent troubles with float vectors bitwise ops

2007-08-22 Thread Dave Korn
On 22 August 2007 11:40, Andrew Pinski wrote:

 On 8/22/07, Dave Korn [EMAIL PROTECTED] wrote:
 float InvSqrt (float x){
 float xhalf = 0.5f*x;
 int i = *(int*)&x;
 
 You are violating C/C++ aliasing rules here anyways.
 
 i = 0x5f3759df - (i>>1);
 x = *(float*)&i;
 
 Likewise.
 
 So I guess you like to depend on undefined code :).

  Well, I like to think that I could cast the address to unsigned char*, memcpy 
a bunch of them to the address of an int, then dereference the int and the 
compiler would realise it was a no-op and optimise it away, but I doubt that 
would actually happen...


cheers,
  DaveK
-- 
Can't think of a witty .sigline today



Re: recent troubles with float vectors bitwise ops

2007-08-22 Thread Rask Ingemann Lambertsen
On Wed, Aug 22, 2007 at 11:47:52AM +0100, Dave Korn wrote:
 
   Well, I like to think that I could cast the address to unsigned char*, 
 memcpy a bunch of them to the address of an int, then dereference the int and 
 the compiler would realise it was a no-op and optimise it away, but I doubt 
  that would actually happen...

   It did a few months ago, at least for scalar variables. Have we
regressed in this area?

-- 
Rask Ingemann Lambertsen


RE: recent troubles with float vectors bitwise ops

2007-08-22 Thread Dave Korn
On 22 August 2007 14:06, Rask Ingemann Lambertsen wrote:

 On Wed, Aug 22, 2007 at 11:47:52AM +0100, Dave Korn wrote:
 
   Well, I like to think that I could cast the address to unsigned char*,
 memcpy a bunch of them to the address of an int, then dereference the int and
 the compiler would realise it was a no-op and optimise it away, but I doubt
  that would actually happen...
 
It did a few months ago, at least for scalar variables. 

  I'm proper impressed!

 Have we regressed in this area?

  Not that I know of.  


cheers,
  DaveK
-- 
Can't think of a witty .sigline today



Re: recent troubles with float vectors bitwise ops

2007-08-22 Thread Ross Ridge
tbp writes:
Apparently enough for a small vendor like Intel to propose such things
as orps, andps, andnps, and xorps.

Paolo Bonzini writes:
I think you're running too far with your sarcasm. SSE's instructions
do not go so far as to specify integer vs. floating point.  To me, ps
means 32-bit SIMD, independent of integerness

The IA-32 instruction set does distinguish between integer and
floating point bitwise operations.  In addition to the single-precision
floating-point bitwise instructions that tbp mentioned (ORPS, ANDPS,
ANDNPS and XORPS) there are both distinct double-precision floating-point
bitwise instructions (ORPD, ANDPD, ANDNPD and XORPD) and integer bitwise
instructions (POR, PAND, PANDN and PXOR).  While these operations all do
the same thing, they can differ in performance depending on the context.

Intel's IA-32 Software Developer's Manual gives this warning:

In this example: XORPS or PXOR can be used in place of XORPD
and yield the same correct result. However, because of the type
mismatch between the operand data type and the instruction data
type, a latency penalty will be incurred due to implementations
of the instructions at the microarchitecture level.

And now i guess the only sanctioned access to those ops is via
builtins/intrinsics.

No, you can do so with casts.

tbp is correct.  Using casts gets you the integer bitwise instructions,
not the single-precision bitwise instructions that are more optimal for
flipping bits in single-precision vectors.  If you want GCC to generate
better code using single-precision bitwise instructions you're now forced
to use the intrinsics.

Ross Ridge



Re: recent troubles with float vectors bitwise ops

2007-08-22 Thread tbp
On 8/22/07, Paolo Bonzini [EMAIL PROTECTED] wrote:
 I think you're running too far with your sarcasm. SSE's instructions do
 not go so far as to specify integer vs. floating point.  To me, ps
 means 32-bit SIMD, independent of integerness.
Excuse me if i'm amazed at being told "bitwise ops on floating values
make no sense" as the justification for breaking something that used to
work and match hardware features. I naively thought that was the
purpose of that convenient extension.

  So, that's what i feared... it was intentional. And now i guess the only
  sanctioned access to those ops is via builtins/intrinsics.
 No, you can do so with casts.  Floating-point to integer vector casts
 preserve the bit pattern.  For example, you can do
Again SIMD ops (among them bitwise stuff) come in 3 mostly symmetric
flavors on x86, namely for ints, floats and doubles; casting isn't
innocuous because there's a penalty for type mismatch (1 cycle of
re-categorization, if i remember correctly, for both k8 and core2), so
it's either that or some moving around.

Let me cite Intel(r) 64 and IA-32 Architectures Optimization
Reference Manual,  5-1,
When writing SIMD code that works for both integer and floating-point
data, use
the subset of SIMD convert instructions or load/store instructions to
ensure that
the input operands in XMM registers contain data types that are
properly defined
to match the instruction.
Code sequences containing cross-typed usage produce the same result across
different implementations but incur a significant performance penalty. Using
SSE/SSE2/SSE3/SSSE3 instructions to operate on type-mismatched SIMD data
in the XMM register is strongly discouraged.

You could find a similar note in AMD's doc for the k8.


Re: recent troubles with float vectors bitwise ops

2007-08-22 Thread Andrew Pinski
On 8/22/07, tbp [EMAIL PROTECTED] wrote:
 On 8/22/07, Paolo Bonzini [EMAIL PROTECTED] wrote:
  I think you're running too far with your sarcasm. SSE's instructions do
  not go so far as to specify integer vs. floating point.  To me, ps
  means 32-bit SIMD, independent of integerness.
 Excuse me if i'm amazed being replied  bitwise ops on floating values
 make no sense as the justification for breaking something that used to
 work and match hardware features. I naively thought that was the
 purpose of that convenient extension.

Which hardware (remember GCC is a generic compiler)?  VMX/Altivec and
SPU actually does not have different instructions for bitwise
and/ior/xor for different vector types (it is all the same
instruction).  I have run into ICEs with even bitwise ops on vector
float/double on x86 in the past, which is the other reason why I
disabled them.  Since this is an extension, it would be nice if it were
a nicely defined extension, which means disabling them for vector
float/double.

Thanks,
Andrew Pinski


Re: recent troubles with float vectors bitwise ops

2007-08-22 Thread Andrew Pinski
On 8/22/07, Andrew Pinski [EMAIL PROTECTED] wrote:
 Which hardware (remember GCC is a generic compiler)?  VMX/Altivec and
 SPU actually does not have different instructions for bitwise
 and/ior/xor for different vector types (it is all the same
 instruction).  I have run into ICEs with even bitwise ops on vector
 float/double on x86 in the past, which is the other reason why I
 disabled them.  Since this is an extension, it would be nice if it were
 a nicely defined extension, which means disabling them for vector
 float/double.

One more note, the C/C++ Language Extensions for the CBEA
specification says that the bitwise operators don't work on vector
float/double but do work on the integer vector types.  So the other
reason why this change I did was to make us more conforming with that
standard (yes I worked on that spec but I did not write that part).

-- Pinski


Re: recent troubles with float vectors bitwise ops

2007-08-22 Thread Ross Ridge
Ross Ridge writes:
tbp is correct.  Using casts gets you the integer bitwise instructions,
not the single-precision bitwise instructions that are more optimal for
flipping bits in single-precision vectors.  If you want GCC to generate
better code using single-precision bitwise instructions you're now forced
to use the intrinsics.

GCC makes the problem even worse if only SSE and not SSE 2 instructions
are enabled.  Since the integer bitwise instructions are only available
with SSE 2, using casts instead of intrinsics causes GCC to expand the
operation into a long series of instructions.

If I were tbp, I'd just code all his vector operations using intrinsics.
The other responses in this thread have made it clear that GCC's vector
arithmetic operations are really only designed to be used with the Cell
Broadband Engine and other Power PC processors.

Ross Ridge



Re: recent troubles with float vectors bitwise ops

2007-08-22 Thread Andrew Pinski
On 8/22/07, Ross Ridge [EMAIL PROTECTED] wrote:
 GCC makes the problem even worse if only SSE and not SSE 2 instructions
 are enabled.  Since the integer bitwise instructions are only available
 with SSE 2, using casts instead of intrinsics causes GCC to expand the
 operation into a long series of instructions.

And why make a bad decision based on another bad decision?  Why did
Intel split up these instructions in the first place, is it because
they wanted to have a separate vector unit in some cases?  I don't
know and I don't care that much.  This extension is supposed to be
generic and doing weird stuff by allowing bitwise operators on vector
float just confuses people more.  Yes Intel/AMD's specific instruction
set includes that but not everyone elses.

 If I were tbp, I'd just code all his vector operations using intrinsics.
 The other responses in this thread have made it clear that GCC's vector
 arithmetic operations are really only designed to be used with the Cell
 Broadband Engine and other Power PC processors.

No, they were designed to be generic.  The issue comes down to what is
generic.  I am saying that don't allow it for scalar fp types, why
allow it for vector fp types?  The genericism here is that vector is
just an expansion on top of the scalar types.  Not many new features
are supposed to be added.

-- Pinski


Re: recent troubles with float vectors bitwise ops

2007-08-22 Thread Ross Ridge
Ross Ridge [EMAIL PROTECTED] wrote:
 GCC makes the problem is even worse if only SSE and not SSE 2 instructions
 are enabled.  Since the integer bitwise instructions are only available
 with SSE 2, using casts instead of intrinsics causes GCC to expand the
 operation into a long series of instructions.

Andrew Pinski writes:
...
Why did Intel split up these instructions in the first place, is it
because they wanted to have a separate vector unit in some cases?
I don't know and I don't care that much. 

Well, if you would rather remain ignorant, I suppose there's little
point in discussing this with you.  However, please don't try to pretend
that the vector extensions are supposed to be generic when you use
justifications like "it's how Altivec works" and "it's compatible with
a proprietary standard called C/C++ Language Extensions for Cell
Broadband Engine Architecture".  If you're going to continue to use
justifications like this and ignore the performance implications of
your changes on IA-32, then you should accept the fact that the vector
extensions are not meant for platforms that you don't know and don't care
that much about.

Ross Ridge



Re: recent troubles with float vectors bitwise ops

2007-08-21 Thread Ian Lance Taylor
tbp [EMAIL PROTECTED] writes:

 vecop.cc:4: error: invalid operands of types 'float __vector__' and
 'float __vector__' to binary 'operator|'
 vecop.cc:5: error: invalid operands of types 'float __vector__' and
 'float __vector__' to binary 'operator'
 vecop.cc:6: error: invalid operands of types 'float __vector__' and
 'float __vector__' to binary 'operator^'
 
 Apparently it's still there as of right now, on x86-64 at least. I
 think this is not supposed to happen but i'm not sure, hence the mail.

What does it mean to do a bitwise-or of a floating point value?

This code also gets an error:

double foo(double a, double b) { return a | b; }

Ian


Re: recent troubles with float vectors bitwise ops

2007-08-21 Thread tbp

Ian Lance Taylor wrote:

What does it mean to do a bitwise-or of a floating point value?
Apparently enough for a small vendor like Intel to propose such things 
as orps, andps, andnps, and xorps.
So, that's what i feared... it was intentional. And now i guess the only 
sanctioned access to those ops is via builtins/intrinsics. Great.
If only i could get the same quality of code when using intrinsics to 
begin with...