DavidEGrayson added a comment.

Hi John.  I thought about your comment on line 1601 somewhat carefully and 
tried to imagine what it would look like.  See my inline comment.

Also, by the way, do you know if the results of the LLVM intrinsics like 
`llvm.smul.with.overflow.*` are guaranteed to be well-defined when there is an 
overflow?  This whole patch hinges on that, because the results of the builtins 
are supposed to be well-defined.  The documentation here 
<http://llvm.org/docs/LangRef.html#llvm-smul-with-overflow-intrinsics> is not 
so great.


================
Comment at: lib/CodeGen/CGBuiltin.cpp:1601
@@ +1600,3 @@
+    auto RITy = IntegerWidthAndSignedness(CGM.getContext(), RQTy);
+    auto EITy = EncompassingIntegerType({XITy, YITy, RITy});
+
----------------
rjmccall wrote:
> DavidEGrayson wrote:
> > rjmccall wrote:
> > > These are not the most evocative names you could have chosen. :)
> > > 
> > > Also, this strategy is correct, but it produces horrible code when the 
> > > operands have the same signedness and the result is different, where the 
> > > result would otherwise be an ordinary operation and a compare.  You 
> > > really want to just merge the two operands and then max with the width of 
> > > the result type.
> > I don't think that would work.  It sounds like you want me to remove RITy 
> > from the call to EncompassingIntegerType.  That means the ecompassing type 
> > could be smaller than the result type, and the arithmetic intrinsic would 
> > report an overflow even though the mathematical result is representable in 
> > the result type.
> > 
> > For example, if I am multiplying ((uint8_t)100) by ((uint8_t)100) and 
> > storing the result in a (uint16_t), the result needs to be 10000 and there 
> > is no overflow.  To arrive at that result, we need to convert the operands 
> > to 16-bit unsigned integers before multiplying them.  If we multiply them 
> > as 8-bit unsigned integers, the multiplication intrinsic will report an 
> > overflow and give a result of 16.  That seems like a messy situation and I 
> > don't see how a max operation would fix it.
> I think I was unclear.  I'm not saying that you should do a max as part of 
> computing the dynamic result.  I'm saying that, when deciding the 
> encompassing type, it should still be at least as wide as the result, but you 
> should ignore the result type's signedness and then patch it up with a 
> comparison at the end if necessary.  (I believe that in both cases, it's a 
> signed comparison against zero.)
> 
> That is, when multiplying two uint8_t's and storing the result in an int16_t, 
> we should just do an unsigned 16-bit multiply-with-overflow.  The computation 
> overflows if the multiply overflows (which it provably doesn't, but we can 
> let LLVM figure that out for now) or the result is signed-< 0.
OK, I understand now.  As an optimization, you would to like the operation type 
(the width/signedness that we use to perform the main math operation) to 
sometimes be the same width as the result type (the type that the third 
argument points to) but have a different sign.

I'm thinking about how that would work in all the different cases.  I have 
concluded that it would work in 5 out of 6 cases, but we would have to think 
carefully about each case and it is possible I have made some mistakes.  But 
here are the 6 cases:

1) `__builtin_add_overflow`, operation type signed, result type unsigned: It 
would work but it would be complicated.  If the signed addition intrinsic 
reports an overflow, we need to know whether the result was too big (which is 
OK) or too small (which is bad).  The most extreme underflow would be 
(-2^(N-1)) + (-2^(N-1)) = 0, so every underflow should give a non-negative 
result.  Similarly, every overflow-proper (case where the result was too big) 
would result in a negative number.  So the overflow bit returned from 
`__builtin_add_overflow` should be the overflow bit from the intrinsic XORed 
with a "less than zero" comparison.

2) `__builtin_add_overflow`, operation type unsigned, result type signed:  This 
is easier, because the only possible overflow reported by the unsigned addition 
intrinsic happens when the result is too big, which means it's definitely too 
big for the signed type.  The overflow bit returned from 
`__builtin_add_overflow` in this case would be the overflow bit from the 
intrinsic ORed with a "less than zero" comparison.

3) `__builtin_sub_overflow`, operation type signed, result type unsigned:  If 
the signed subtraction intrinsic reports an overflow, we need to know whether 
the result was too big (which is OK) or too small (which is bad).  The most 
extreme underflow would be (-2^(N-1)) - (2^(N-1)-1) = 1, so every underflow 
gives a positive value.  The most extreme overflow-proper would be (2^(N-1)-1) 
- (-2^(N-1)) = -1, so every overflow-proper gives a negative number.  Just like 
case #1, the overflow bit we return shoud be the overflow bit from the 
intrinsic XORed with a "less than zero" comparison.

4) `__builtin_sub_overflow`, operation type unsigned, result type signed:  If 
the unsigned subtraction intrinsic reports an overflow, it simply means Y > X.  
That is OK as long as Y isn't too much greater than X.  The most extreme case 
is 0 - (2^N-1) = 1.  So if the result from the intrinsic is less than zero, 
then we are OK.  So the overflow bit returned from __builtin_sub_overflow 
should be the overflow bit from the intrinsic XORed with a "less than zero" 
comparison. 

5) `__builtin_mul_overflow`, operation type signed, result type unsigned: I 
don't think this case would work, so we would have to fall back to using an 
operation type that actually encompasses the result type.  For example, if we 
are multiplying two int16_t and the result is uint16_t, and the intrinsic gives 
us an overflow, it's really hard to know whether that overflow was OK (200 * 
200) or not OK (200 * 527).

6) `__builtin_mul_overflow`, operation type unsigned, result type signed: The 
logic from case 2 applies here.

Do you want the patch to go in this direction?  Since we have to consider these 
6 cases, I think it would be more complicated than what you were describing.


Repository:
  rL LLVM

http://reviews.llvm.org/D12793



_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to