Hi,

Here is a simple test-case to reproduce 176.gcc failure (I run it on
Haswell machine).
Using 20160819 compiler build we get:
gcc -O3 -m32 -mavx2 test.c -o test.ref.exe
/users/ysrumyan/isse_6866$ ./test.ref.exe
Aborted (core dumped)

If I apply patch proposed by Patrick test runs properly
Instead of running we can check number of .jump thread.

2016-08-19 12:25 GMT+03:00 Richard Biener <richard.guent...@gmail.com>:
> On Fri, Aug 19, 2016 at 1:06 AM, Patrick Palka <patr...@parcs.ath.cx> wrote:
>> On Thu, 18 Aug 2016, Richard Biener wrote:
>>
>>> On August 18, 2016 8:25:18 PM GMT+02:00, Patrick Palka 
>>> <patr...@parcs.ath.cx> wrote:
>>> >In comment #5 Yuri reports that r235653 introduces a runtime failure
>>> >for
>>> >176.gcc which I guess is caused by the combining step in
>>> >simplify_control_stmt_condition_1() not behaving properly on operands
>>> >of
>>> >type VECTOR_TYPE.  I'm a bit stumped as to why it mishandles
>>> >VECTOR_TYPEs because the logic should be generic enough to support them
>>> >as well.  But it was confirmed that restricting the combining step to
>>> >operands of scalar type fixes the runtime failure so here is a patch
>>> >that does this.  Does this look OK to commit after bootstrap +
>>> >regtesting on x86_64-pc-linux-gnu?
>>>
>>> Hum, I'd rather understand what is going wrong.  Can you at least isolate a 
>>> testcase?
>>>
>>> Richard.
>>
>> I don't have access to the SPEC benchmarks unfortunately.  Maybe Yuri
>> can isolate a test case?
>>
>> But I think I found a theoretical bug which may or may not coincide with
>> the bug that Yuri is observing.  The part of the combining step that may
>> provide wrong results for VECTOR_TYPEs is the one that simplifies the
>> conditional (A & B) != 0 to true when given that A != 0 and B != 0 and
>> given that their TYPE_PRECISION is 1.
>>
>> The TYPE_PRECISION test was intended to succeed only on scalars, but
>> IIUC it accidentally succeeds on one-dimensional vectors too.  So we may
>> be wrongly simplifying X & Y != <0> to true given that e.g.  X == <8>
>> and Y == <2>.  So this simplification should probably be restricted to
>> integral types like so:
>>
>> diff --git a/gcc/tree-ssa-threadedge.c b/gcc/tree-ssa-threadedge.c
>> index 170e456..b8c8b70 100644
>> --- a/gcc/tree-ssa-threadedge.c
>> +++ b/gcc/tree-ssa-threadedge.c
>> @@ -648,14 +648,17 @@ simplify_control_stmt_condition_1 (edge e,
>>           if (res1 != NULL_TREE && res2 != NULL_TREE)
>>             {
>>               if (rhs_code == BIT_AND_EXPR
>> +                 && INTEGRAL_TYPE_P (TREE_TYPE (op0))
>>                   && TYPE_PRECISION (TREE_TYPE (op0)) == 1
>
> you can use element_precision (op0) == 1 instead.
>
> Richard.
>
>>                   && integer_nonzerop (res1)
>>                   && integer_nonzerop (res2))
>> --
>> 2.9.3.650.g20ba99f
>>
>> Hope this makes sense.
>>
>>>
>>> >gcc/ChangeLog:
>>> >
>>> >     PR tree-optimization/71077
>>> >     * tree-ssa-threadedge.c (simplify_control_stmt_condition_1):
>>> >     Perform the combining step only if the operands have an integral
>>> >     or a pointer type.
>>> >---
>>> > gcc/tree-ssa-threadedge.c | 3 +++
>>> > 1 file changed, 3 insertions(+)
>>> >
>>> >diff --git a/gcc/tree-ssa-threadedge.c b/gcc/tree-ssa-threadedge.c
>>> >index 170e456..a97c00c 100644
>>> >--- a/gcc/tree-ssa-threadedge.c
>>> >+++ b/gcc/tree-ssa-threadedge.c
>>> >@@ -577,6 +577,9 @@ simplify_control_stmt_condition_1 (edge e,
>>> >   if (handle_dominating_asserts
>>> >       && (cond_code == EQ_EXPR || cond_code == NE_EXPR)
>>> >       && TREE_CODE (op0) == SSA_NAME
>>> >+      /* ??? Vector types are mishandled here.  */
>>> >+      && (INTEGRAL_TYPE_P (TREE_TYPE (op0))
>>> >+      || POINTER_TYPE_P (TREE_TYPE (op0)))
>>> >       && integer_zerop (op1))
>>> >     {
>>> >       gimple *def_stmt = SSA_NAME_DEF_STMT (op0);
>>>
>>>
>>>
typedef unsigned int ui;
ui x[32*32];
ui y[32];
ui z[32];
void __attribute__ ((noinline, noclone)) foo (ui n, ui z)
{
  ui i, b;
  ui v;
 for (i = 0; i< n; i++)
  {
    v = y[i];
    if (v) {
      for (b = 0; b < 32; b++)
	if ((v >> b) & 1)
	  x[i*32 +b] = z;
      y[i] = 0;
    }
  } 
}

int main()
{
  int i;
  unsigned int val;
  for (i = 0; i<32; i++)
    {
      val = 1 << i;
      y[i] = (i & 1)? 0 : val;
      z[i] = i;
    }
    foo (32, 10);
  for (i=0; i<1024; i+=66)
    if (x[i] != 10)
      __builtin_abort ();
  return 0;
}

Reply via email to