On Mon, Dec 9, 2024 at 8:55 PM Alexandre Oliva <ol...@adacore.com> wrote:
>
> This patch introduces various improvements to the logic that merges
> field compares, while moving it into ifcombine.
>
> Before the patch, we could merge:
>
>   (a.x1 EQNE b.x1)  ANDOR  (a.y1 EQNE b.y1)
>
> into something like:
>
>   (((type *)&a)[Na] & MASK) EQNE (((type *)&b)[Nb] & MASK)
>
> if both of A's fields live within the same alignment boundaries, and
> so do B's, at the same relative positions.  Constants may be used
> instead of the object B.
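>
> For illustration, a minimal example (hypothetical, not from the
> testsuite) of the source-level pattern this covers:
>
>   struct S { unsigned char x1; unsigned char y1; };
>
>   /* x1 and y1 of each object live within the same alignment
>      boundary, so the two compares can become a single masked
>      word compare.  */
>   int f (struct S *a, struct S *b)
>   {
>     return a->x1 == b->x1 && a->y1 == b->y1;
>   }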
>
> The initial goal of this patch was to enable such combinations when a
> field crossed alignment boundaries, e.g. for packed types.  We can't
> generally access such fields with a single memory access, so when we
> come across such a compare, we will attempt to combine each access
> separately.
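>
> A sketch of the packed case (hypothetical layout, assuming 4-byte
> alignment words for illustration):
>
>   struct __attribute__ ((packed)) P {
>     unsigned char c;
>     unsigned int i;   /* Straddles an alignment boundary.  */
>   };
>
>   /* i can't generally be loaded in a single aligned access, so its
>      compare is handled as separate accesses, each staying within
>      one alignment word.  */
>   int g (struct P *p, struct P *q)
>   {
>     return p->c == q->c && p->i == q->i;
>   }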
>
> Some merging opportunities were missed because of right-shifts,
> compares expressed as e.g. ((a.x1 ^ b.x1) & MASK) EQNE 0, and
> narrowing conversions, especially after earlier merges.  This patch
> introduces handlers for several cases involving these.
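>
> For instance (hypothetical example), a compare already written in
> the xor-and-mask form next to a plain field compare:
>
>   struct T { unsigned int x1, y1; };
>
>   int h (struct T *a, struct T *b)
>   {
>     /* The first test is a masked equality test of x1; this shape,
>        like right-shifted or narrowed operands, is now recognized
>        for merging.  */
>     return ((a->x1 ^ b->x1) & 0xff) == 0 && a->y1 == b->y1;
>   }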
>
> Merging multiple field accesses into wider bitfield-like accesses is
> undesirable too early in compilation, so we move it from folding to
> ifcombine, and guard its warnings with -Wtautological-compare, now
> turned into a common flag.
>
> When the second compare of a noncontiguous pair is the first one to
> access a given word, we may merge the first compare with the part of
> the second compare that refers to that word, keeping the compare of
> the remaining bits at the spot where the second compare used to be.
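>
> Roughly (hypothetical example, assuming 4-byte alignment words; the
> two field compares are separated by an unrelated condition, so they
> are noncontiguous):
>
>   struct __attribute__ ((packed)) U {
>     unsigned char a;
>     unsigned char c[2];
>     unsigned short b;   /* Crosses an alignment boundary.  */
>   };
>
>   int k (struct U *p, struct U *q, int other)
>   {
>     /* The compare of a may merge with the part of b that shares its
>        word; the compare of b's remaining bits stays where the second
>        compare was.  */
>     return p->a == q->a && other && p->b == q->b;
>   }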
>
> Handling compares with non-constant fields was somewhat generalized
> from what fold used to do, now handling non-adjacent fields, even if a
> field of one object crosses an alignment boundary but the other
> doesn't.
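>
> E.g. (hypothetical layout), with neither side constant and the
> fields not adjacent:
>
>   struct V { unsigned char f, skip, g, pad; };
>
>   int m (struct V *p, struct V *q)
>   {
>     /* f and g are not adjacent; both loads can still be widened and
>        masked so that a single compare per object suffices.  */
>     return p->f == q->f && p->g == q->g;
>   }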
>
>
> Regstrapped on x86_64-linux-gnu.  The aarch64 CI had reported a
> regression with v3, so I'm also running a regstrap on aarch64-linux-gnu,
> but it will still be a while.  Ok to install if it passes?

OK.

Thanks again for doing this,
Richard.

> Changed from v3:
>
> - renamed new predicates within gimple-fold.cc, split out the check for
>   integral constants, and dropped unneeded canonicalization
>
> - disabled ifcombine for gcc.target/aarch64/long_branch_1.c.  I posted
>   the incremental patch too soon and it had a thinko, fixed herein,
>   sorry.
>
> Changes from v2:
>
> - fixed ICE when xor appeared only in the right-hand operand of a
>   compare
>
> - simplified the interface of decode_field_reference.  pand_mask can now
>   be NULL, meaning mask operations are to be rejected, but this ended up
>   unused.  It no longer rejects plain SSA names as the expr to be
>   combined.
>
> - drop unused variables from fold_truth_andor_for_ifcombine, and ensure
>   right-hand masks are applied even on constants
>
> - normalize width of masks and constants represented as wide_int, fixing
>   regressions introduced by the transition to wide_int.  Don't bother to
>   sign-extend them; treat them as unsigned, since it's all bitwise
>   equality tests.
>
> - more tests; fixed thinko in v2-added test
>
> - guard warnings with -Wtautological-compare; made the option available
>   for all languages
>
> - backpedal from match.pd matchers to open-coded matchers
>
> - improve vuse setting to avoid unwarranted assumptions about
>   gimplification
>
> - simplify selection of types for combined operands
>
> - drop side effects tests not sensible in gimple
>
> - explain apparent presence of TRUTH_*IF_EXPR in gimple
>
> Changes from v1:
>
> - Noncontiguous ifcombine split out and already installed.
>
> - Do not attempt to place new loads next to the latest of the original
>   loads.  Only check that loads have the same vuses, and place loads
>   next to either one, even if that's suboptimal.
>
> - Use gimple pattern matching.  It turned out not to be very useful, but
>   it enabled the elimination of redundant auxiliary functions.
>
> - Rewrote constants and masks using wide_int.
>
> - Work harder to gather and reuse location_t.
>
> - Rework BIT_XOR handling to avoid having to match patterns again.
>
> - Distinguish the cases of contiguous and noncontiguous conditions.
>
> - Comments, lots of comments.
>
> - More tests.
>
> - Dropped the warnings that were hitting i386 and rs6000 bootstraps.
>   The new possibilities with ifcombine, noncontiguous at that, on top of
>   more flexible field merging, made room for too many false positives.
>
> Requested but unchanged from v1:
>
> - fold_truth_andor_for_ifcombine (renamed from ...andor_maybe_separate)
> still builds and returns generic expressions.  Since other parts of
> ifcombine_ifandif build generic exprs and have to go through
> regimplification anyway, I failed to see the point.  Implementing that
> change would require some more work in tree-ssa-ifcombine.cc to support
> it.  I'd rather do that as a follow-up if desired; please let me know.
>
> - it also remains a single bulky function.  There's so much state that
> breaking it up doesn't seem to be helpful.  Hopefully the added comments
> will help despite making it even bigger.
>
> - The TBAA situation is unchanged, the same as what's carried over from
> fold.  I'm not sure what concerns you have in mind, but if there are
> actual problems, they have long been there, and we'd better address them
> both in fold and in this bit now moved to ifcombine, ideally in a
> separate backportable patch.
>
>
> for  gcc/ChangeLog
>
>         * fold-const.cc (make_bit_field_ref): Export.
>         (unextend, all_ones_mask_p): Drop.
>         (decode_field_reference, fold_truth_andor_1): Move
>         field compare merging logic...
>         * gimple-fold.cc (fold_truth_andor_for_ifcombine): ... here,
>         with -Wtautological-compare warning guards, and...
>         (decode_field_reference): ... here.  Rework for gimple.
>         (gimple_convert_def_p, gimple_binop_def_p): New.
>         (compute_split_boundary_from_align): New.
>         (make_bit_field_load, build_split_load): New.
>         (reuse_split_load): New.
>         * fold-const.h (make_bit_field_ref): Declare.
>         (fold_truth_andor_for_ifcombine): Declare.
>         * tree-ssa-ifcombine.cc (ifcombine_ifandif): Try
>         fold_truth_andor_for_ifcombine.
>         * common.opt (Wtautological-compare): Move here.
>
> for  gcc/c-family/ChangeLog
>
>         * c.opt (Wtautological-compare): Move to ../common.opt.
>
> for  gcc/testsuite/ChangeLog
>
>         * gcc.dg/field-merge-1.c: New.
>         * gcc.dg/field-merge-2.c: New.
>         * gcc.dg/field-merge-3.c: New.
>         * gcc.dg/field-merge-4.c: New.
>         * gcc.dg/field-merge-5.c: New.
>         * gcc.dg/field-merge-6.c: New.
>         * gcc.dg/field-merge-7.c: New.
>         * gcc.dg/field-merge-8.c: New.
>         * gcc.dg/field-merge-9.c: New.
>         * gcc.dg/field-merge-10.c: New.
>         * gcc.dg/field-merge-11.c: New.
>         * gcc.dg/field-merge-12.c: New.
>         * gcc.target/aarch64/long_branch_1.c: Disable ifcombine.
> ---
>  gcc/c-family/c.opt                               |    4
>  gcc/common.opt                                   |    4
>  gcc/fold-const.cc                                |  512 ----------
>  gcc/fold-const.h                                 |   10
>  gcc/gimple-fold.cc                               | 1148 ++++++++++++++++++++++
>  gcc/testsuite/gcc.dg/field-merge-1.c             |   64 +
>  gcc/testsuite/gcc.dg/field-merge-10.c            |   36 +
>  gcc/testsuite/gcc.dg/field-merge-11.c            |   32 +
>  gcc/testsuite/gcc.dg/field-merge-12.c            |   33 +
>  gcc/testsuite/gcc.dg/field-merge-2.c             |   31 +
>  gcc/testsuite/gcc.dg/field-merge-3.c             |   36 +
>  gcc/testsuite/gcc.dg/field-merge-4.c             |   40 +
>  gcc/testsuite/gcc.dg/field-merge-5.c             |   40 +
>  gcc/testsuite/gcc.dg/field-merge-6.c             |   26
>  gcc/testsuite/gcc.dg/field-merge-7.c             |   23
>  gcc/testsuite/gcc.dg/field-merge-8.c             |   25
>  gcc/testsuite/gcc.dg/field-merge-9.c             |   38 +
>  gcc/testsuite/gcc.target/aarch64/long_branch_1.c |    2
>  gcc/tree-ssa-ifcombine.cc                        |   14
>  19 files changed, 1604 insertions(+), 514 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/field-merge-1.c
>  create mode 100644 gcc/testsuite/gcc.dg/field-merge-10.c
>  create mode 100644 gcc/testsuite/gcc.dg/field-merge-11.c
>  create mode 100644 gcc/testsuite/gcc.dg/field-merge-12.c
>  create mode 100644 gcc/testsuite/gcc.dg/field-merge-2.c
>  create mode 100644 gcc/testsuite/gcc.dg/field-merge-3.c
>  create mode 100644 gcc/testsuite/gcc.dg/field-merge-4.c
>  create mode 100644 gcc/testsuite/gcc.dg/field-merge-5.c
>  create mode 100644 gcc/testsuite/gcc.dg/field-merge-6.c
>  create mode 100644 gcc/testsuite/gcc.dg/field-merge-7.c
>  create mode 100644 gcc/testsuite/gcc.dg/field-merge-8.c
>  create mode 100644 gcc/testsuite/gcc.dg/field-merge-9.c
>
> diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
> index 0d255561b1bdc..be7916e957d66 100644
> --- a/gcc/c-family/c.opt
> +++ b/gcc/c-family/c.opt
> @@ -1477,10 +1477,6 @@ Wtemplates
>  C++ ObjC++ Var(warn_templates) Warning
>  Warn on primary template declaration.
>
> -Wtautological-compare
> -C ObjC C++ ObjC++ Var(warn_tautological_compare) Warning LangEnabledBy(C ObjC C++ ObjC++,Wall)
> -Warn if a comparison always evaluates to true or false.
> -
>  Wtemplate-body
>  C++ ObjC++ Var(warn_template_body) Warning Init(1)
>  Diagnose errors when parsing a template.
> diff --git a/gcc/common.opt b/gcc/common.opt
> index bb226ac61e6a1..915ce5bffb4e0 100644
> --- a/gcc/common.opt
> +++ b/gcc/common.opt
> @@ -812,6 +812,10 @@ Wsystem-headers
>  Common Var(warn_system_headers) Warning
>  Do not suppress warnings from system headers.
>
> +Wtautological-compare
> +Common Var(warn_tautological_compare) Warning LangEnabledBy(C ObjC C++ ObjC++,Wall)
> +Warn if a comparison always evaluates to true or false.
> +
>  Wtrampolines
>  Common Var(warn_trampolines) Warning
>  Warn whenever a trampoline is generated.
> diff --git a/gcc/fold-const.cc b/gcc/fold-const.cc
> index 1e8ae1ab493ba..6449664598641 100644
> --- a/gcc/fold-const.cc
> +++ b/gcc/fold-const.cc
> @@ -137,7 +137,6 @@ static tree range_successor (tree);
>  static tree fold_range_test (location_t, enum tree_code, tree, tree, tree);
>  static tree fold_cond_expr_with_comparison (location_t, tree, enum tree_code,
>                                             tree, tree, tree, tree);
> -static tree unextend (tree, int, int, tree);
>  static tree extract_muldiv (tree, tree, enum tree_code, tree, bool *);
>  static tree extract_muldiv_1 (tree, tree, enum tree_code, tree, bool *);
>  static tree fold_binary_op_with_conditional_arg (location_t,
> @@ -4711,7 +4710,7 @@ invert_truthvalue_loc (location_t loc, tree arg)
>     is the original memory reference used to preserve the alias set of
>     the access.  */
>
> -static tree
> +tree
>  make_bit_field_ref (location_t loc, tree inner, tree orig_inner, tree type,
>                     HOST_WIDE_INT bitsize, poly_int64 bitpos,
>                     int unsignedp, int reversep)
> @@ -4961,136 +4960,6 @@ optimize_bit_field_compare (location_t loc, enum tree_code code,
>    return lhs;
>  }
>
> -/* Subroutine for fold_truth_andor_1: decode a field reference.
> -
> -   If EXP is a comparison reference, we return the innermost reference.
> -
> -   *PBITSIZE is set to the number of bits in the reference, *PBITPOS is
> -   set to the starting bit number.
> -
> -   If the innermost field can be completely contained in a mode-sized
> -   unit, *PMODE is set to that mode.  Otherwise, it is set to VOIDmode.
> -
> -   *PVOLATILEP is set to 1 if the any expression encountered is volatile;
> -   otherwise it is not changed.
> -
> -   *PUNSIGNEDP is set to the signedness of the field.
> -
> -   *PREVERSEP is set to the storage order of the field.
> -
> -   *PMASK is set to the mask used.  This is either contained in a
> -   BIT_AND_EXPR or derived from the width of the field.
> -
> -   *PAND_MASK is set to the mask found in a BIT_AND_EXPR, if any.
> -
> -   Return 0 if this is not a component reference or is one that we can't
> -   do anything with.  */
> -
> -static tree
> -decode_field_reference (location_t loc, tree *exp_, HOST_WIDE_INT *pbitsize,
> -                       HOST_WIDE_INT *pbitpos, machine_mode *pmode,
> -                       int *punsignedp, int *preversep, int *pvolatilep,
> -                       tree *pmask, tree *pand_mask)
> -{
> -  tree exp = *exp_;
> -  tree outer_type = 0;
> -  tree and_mask = 0;
> -  tree mask, inner, offset;
> -  tree unsigned_type;
> -  unsigned int precision;
> -
> -  /* All the optimizations using this function assume integer fields.
> -     There are problems with FP fields since the type_for_size call
> -     below can fail for, e.g., XFmode.  */
> -  if (! INTEGRAL_TYPE_P (TREE_TYPE (exp)))
> -    return NULL_TREE;
> -
> -  /* We are interested in the bare arrangement of bits, so strip everything
> -     that doesn't affect the machine mode.  However, record the type of the
> -     outermost expression if it may matter below.  */
> -  if (CONVERT_EXPR_P (exp)
> -      || TREE_CODE (exp) == NON_LVALUE_EXPR)
> -    outer_type = TREE_TYPE (exp);
> -  STRIP_NOPS (exp);
> -
> -  if (TREE_CODE (exp) == BIT_AND_EXPR)
> -    {
> -      and_mask = TREE_OPERAND (exp, 1);
> -      exp = TREE_OPERAND (exp, 0);
> -      STRIP_NOPS (exp); STRIP_NOPS (and_mask);
> -      if (TREE_CODE (and_mask) != INTEGER_CST)
> -       return NULL_TREE;
> -    }
> -
> -  poly_int64 poly_bitsize, poly_bitpos;
> -  inner = get_inner_reference (exp, &poly_bitsize, &poly_bitpos, &offset,
> -                              pmode, punsignedp, preversep, pvolatilep);
> -  if ((inner == exp && and_mask == 0)
> -      || !poly_bitsize.is_constant (pbitsize)
> -      || !poly_bitpos.is_constant (pbitpos)
> -      || *pbitsize < 0
> -      || offset != 0
> -      || TREE_CODE (inner) == PLACEHOLDER_EXPR
> -      /* We eventually want to build a larger reference and need to take
> -        the address of this.  */
> -      || (!REFERENCE_CLASS_P (inner) && !DECL_P (inner))
> -      /* Reject out-of-bound accesses (PR79731).  */
> -      || (! AGGREGATE_TYPE_P (TREE_TYPE (inner))
> -         && compare_tree_int (TYPE_SIZE (TREE_TYPE (inner)),
> -                              *pbitpos + *pbitsize) < 0))
> -    return NULL_TREE;
> -
> -  unsigned_type = lang_hooks.types.type_for_size (*pbitsize, 1);
> -  if (unsigned_type == NULL_TREE)
> -    return NULL_TREE;
> -
> -  *exp_ = exp;
> -
> -  /* If the number of bits in the reference is the same as the bitsize of
> -     the outer type, then the outer type gives the signedness. Otherwise
> -     (in case of a small bitfield) the signedness is unchanged.  */
> -  if (outer_type && *pbitsize == TYPE_PRECISION (outer_type))
> -    *punsignedp = TYPE_UNSIGNED (outer_type);
> -
> -  /* Compute the mask to access the bitfield.  */
> -  precision = TYPE_PRECISION (unsigned_type);
> -
> -  mask = build_int_cst_type (unsigned_type, -1);
> -
> -  mask = const_binop (LSHIFT_EXPR, mask, size_int (precision - *pbitsize));
> -  mask = const_binop (RSHIFT_EXPR, mask, size_int (precision - *pbitsize));
> -
> -  /* Merge it with the mask we found in the BIT_AND_EXPR, if any.  */
> -  if (and_mask != 0)
> -    mask = fold_build2_loc (loc, BIT_AND_EXPR, unsigned_type,
> -                       fold_convert_loc (loc, unsigned_type, and_mask), mask);
> -
> -  *pmask = mask;
> -  *pand_mask = and_mask;
> -  return inner;
> -}
> -
> -/* Return nonzero if MASK represents a mask of SIZE ones in the low-order
> -   bit positions and MASK is SIGNED.  */
> -
> -static bool
> -all_ones_mask_p (const_tree mask, unsigned int size)
> -{
> -  tree type = TREE_TYPE (mask);
> -  unsigned int precision = TYPE_PRECISION (type);
> -
> -  /* If this function returns true when the type of the mask is
> -     UNSIGNED, then there will be errors.  In particular see
> -     gcc.c-torture/execute/990326-1.c.  There does not appear to be
> -     any documentation paper trail as to why this is so.  But the pre
> -     wide-int worked with that restriction and it has been preserved
> -     here.  */
> -  if (size > precision || TYPE_SIGN (type) == UNSIGNED)
> -    return false;
> -
> -  return wi::mask (size, false, precision) == wi::to_wide (mask);
> -}
> -
>  /* Subroutine for fold: determine if VAL is the INTEGER_CONST that
>     represents the sign bit of EXP's type.  If EXP represents a sign
>     or zero extension, also test VAL against the unextended type.
> @@ -6400,48 +6269,6 @@ fold_range_test (location_t loc, enum tree_code code, tree type,
>    return 0;
>  }
>
> -/* Subroutine for fold_truth_andor_1: C is an INTEGER_CST interpreted as a P
> -   bit value.  Arrange things so the extra bits will be set to zero if and
> -   only if C is signed-extended to its full width.  If MASK is nonzero,
> -   it is an INTEGER_CST that should be AND'ed with the extra bits.  */
> -
> -static tree
> -unextend (tree c, int p, int unsignedp, tree mask)
> -{
> -  tree type = TREE_TYPE (c);
> -  int modesize = GET_MODE_BITSIZE (SCALAR_INT_TYPE_MODE (type));
> -  tree temp;
> -
> -  if (p == modesize || unsignedp)
> -    return c;
> -
> -  /* We work by getting just the sign bit into the low-order bit, then
> -     into the high-order bit, then sign-extend.  We then XOR that value
> -     with C.  */
> -  temp = build_int_cst (TREE_TYPE (c),
> -                       wi::extract_uhwi (wi::to_wide (c), p - 1, 1));
> -
> -  /* We must use a signed type in order to get an arithmetic right shift.
> -     However, we must also avoid introducing accidental overflows, so that
> -     a subsequent call to integer_zerop will work.  Hence we must
> -     do the type conversion here.  At this point, the constant is either
> -     zero or one, and the conversion to a signed type can never overflow.
> -     We could get an overflow if this conversion is done anywhere else.  */
> -  if (TYPE_UNSIGNED (type))
> -    temp = fold_convert (signed_type_for (type), temp);
> -
> -  temp = const_binop (LSHIFT_EXPR, temp, size_int (modesize - 1));
> -  temp = const_binop (RSHIFT_EXPR, temp, size_int (modesize - p - 1));
> -  if (mask != 0)
> -    temp = const_binop (BIT_AND_EXPR, temp,
> -                       fold_convert (TREE_TYPE (c), mask));
> -  /* If necessary, convert the type back to match the type of C.  */
> -  if (TYPE_UNSIGNED (type))
> -    temp = fold_convert (type, temp);
> -
> -  return fold_convert (type, const_binop (BIT_XOR_EXPR, c, temp));
> -}
> -
>  /* For an expression that has the form
>       (A && B) || ~B
>     or
> @@ -6512,20 +6339,13 @@ merge_truthop_with_opposite_arm (location_t loc, tree op, tree cmpop,
>                             lhs, rhs);
>    return NULL_TREE;
>  }
> -
> +
>  /* Find ways of folding logical expressions of LHS and RHS:
>     Try to merge two comparisons to the same innermost item.
>     Look for range tests like "ch >= '0' && ch <= '9'".
>     Look for combinations of simple terms on machines with expensive branches
>     and evaluate the RHS unconditionally.
>
> -   For example, if we have p->a == 2 && p->b == 4 and we can make an
> -   object large enough to span both A and B, we can do this with a comparison
> -   against the object ANDed with the a mask.
> -
> -   If we have p->a == q->a && p->b == q->b, we may be able to use bit masking
> -   operations to do this with one comparison.
> -
>     We check for both normal comparisons and the BIT_AND_EXPRs made this by
>     function and the one above.
>
> @@ -6550,24 +6370,9 @@ fold_truth_andor_1 (location_t loc, enum tree_code code, tree truth_type,
>       convert EQ_EXPR to NE_EXPR so we need not reject the "wrong"
>       comparison for one-bit fields.  */
>
> -  enum tree_code wanted_code;
>    enum tree_code lcode, rcode;
>    tree ll_arg, lr_arg, rl_arg, rr_arg;
> -  tree ll_inner, lr_inner, rl_inner, rr_inner;
> -  HOST_WIDE_INT ll_bitsize, ll_bitpos, lr_bitsize, lr_bitpos;
> -  HOST_WIDE_INT rl_bitsize, rl_bitpos, rr_bitsize, rr_bitpos;
> -  HOST_WIDE_INT xll_bitpos, xlr_bitpos, xrl_bitpos, xrr_bitpos;
> -  HOST_WIDE_INT lnbitsize, lnbitpos, rnbitsize, rnbitpos;
> -  int ll_unsignedp, lr_unsignedp, rl_unsignedp, rr_unsignedp;
> -  int ll_reversep, lr_reversep, rl_reversep, rr_reversep;
> -  machine_mode ll_mode, lr_mode, rl_mode, rr_mode;
> -  scalar_int_mode lnmode, rnmode;
> -  tree ll_mask, lr_mask, rl_mask, rr_mask;
> -  tree ll_and_mask, lr_and_mask, rl_and_mask, rr_and_mask;
> -  tree l_const, r_const;
> -  tree lntype, rntype, result;
> -  HOST_WIDE_INT first_bit, end_bit;
> -  int volatilep;
> +  tree result;
>
>    /* Start by getting the comparison codes.  Fail if anything is volatile.
>       If one operand is a BIT_AND_EXPR with the constant one, treat it as if
> @@ -6662,316 +6467,7 @@ fold_truth_andor_1 (location_t loc, enum tree_code code, tree truth_type,
>                            build_int_cst (TREE_TYPE (ll_arg), 0));
>      }
>
> -  /* See if the comparisons can be merged.  Then get all the parameters for
> -     each side.  */
> -
> -  if ((lcode != EQ_EXPR && lcode != NE_EXPR)
> -      || (rcode != EQ_EXPR && rcode != NE_EXPR))
> -    return 0;
> -
> -  ll_reversep = lr_reversep = rl_reversep = rr_reversep = 0;
> -  volatilep = 0;
> -  ll_inner = decode_field_reference (loc, &ll_arg,
> -                                    &ll_bitsize, &ll_bitpos, &ll_mode,
> -                                    &ll_unsignedp, &ll_reversep, &volatilep,
> -                                    &ll_mask, &ll_and_mask);
> -  lr_inner = decode_field_reference (loc, &lr_arg,
> -                                    &lr_bitsize, &lr_bitpos, &lr_mode,
> -                                    &lr_unsignedp, &lr_reversep, &volatilep,
> -                                    &lr_mask, &lr_and_mask);
> -  rl_inner = decode_field_reference (loc, &rl_arg,
> -                                    &rl_bitsize, &rl_bitpos, &rl_mode,
> -                                    &rl_unsignedp, &rl_reversep, &volatilep,
> -                                    &rl_mask, &rl_and_mask);
> -  rr_inner = decode_field_reference (loc, &rr_arg,
> -                                    &rr_bitsize, &rr_bitpos, &rr_mode,
> -                                    &rr_unsignedp, &rr_reversep, &volatilep,
> -                                    &rr_mask, &rr_and_mask);
> -
> -  /* It must be true that the inner operation on the lhs of each
> -     comparison must be the same if we are to be able to do anything.
> -     Then see if we have constants.  If not, the same must be true for
> -     the rhs's.  */
> -  if (volatilep
> -      || ll_reversep != rl_reversep
> -      || ll_inner == 0 || rl_inner == 0
> -      || ! operand_equal_p (ll_inner, rl_inner, 0))
> -    return 0;
> -
> -  if (TREE_CODE (lr_arg) == INTEGER_CST
> -      && TREE_CODE (rr_arg) == INTEGER_CST)
> -    {
> -      l_const = lr_arg, r_const = rr_arg;
> -      lr_reversep = ll_reversep;
> -    }
> -  else if (lr_reversep != rr_reversep
> -          || lr_inner == 0 || rr_inner == 0
> -          || ! operand_equal_p (lr_inner, rr_inner, 0))
> -    return 0;
> -  else
> -    l_const = r_const = 0;
> -
> -  /* If either comparison code is not correct for our logical operation,
> -     fail.  However, we can convert a one-bit comparison against zero into
> -     the opposite comparison against that bit being set in the field.  */
> -
> -  wanted_code = (code == TRUTH_AND_EXPR ? EQ_EXPR : NE_EXPR);
> -  if (lcode != wanted_code)
> -    {
> -      if (l_const && integer_zerop (l_const) && integer_pow2p (ll_mask))
> -       {
> -         /* Make the left operand unsigned, since we are only interested
> -            in the value of one bit.  Otherwise we are doing the wrong
> -            thing below.  */
> -         ll_unsignedp = 1;
> -         l_const = ll_mask;
> -       }
> -      else
> -       return 0;
> -    }
> -
> -  /* This is analogous to the code for l_const above.  */
> -  if (rcode != wanted_code)
> -    {
> -      if (r_const && integer_zerop (r_const) && integer_pow2p (rl_mask))
> -       {
> -         rl_unsignedp = 1;
> -         r_const = rl_mask;
> -       }
> -      else
> -       return 0;
> -    }
> -
> -  /* See if we can find a mode that contains both fields being compared on
> -     the left.  If we can't, fail.  Otherwise, update all constants and masks
> -     to be relative to a field of that size.  */
> -  first_bit = MIN (ll_bitpos, rl_bitpos);
> -  end_bit = MAX (ll_bitpos + ll_bitsize, rl_bitpos + rl_bitsize);
> -  if (!get_best_mode (end_bit - first_bit, first_bit, 0, 0,
> -                     TYPE_ALIGN (TREE_TYPE (ll_inner)), BITS_PER_WORD,
> -                     volatilep, &lnmode))
> -    return 0;
> -
> -  lnbitsize = GET_MODE_BITSIZE (lnmode);
> -  lnbitpos = first_bit & ~ (lnbitsize - 1);
> -  lntype = lang_hooks.types.type_for_size (lnbitsize, 1);
> -  xll_bitpos = ll_bitpos - lnbitpos, xrl_bitpos = rl_bitpos - lnbitpos;
> -
> -  if (ll_reversep ? !BYTES_BIG_ENDIAN : BYTES_BIG_ENDIAN)
> -    {
> -      xll_bitpos = lnbitsize - xll_bitpos - ll_bitsize;
> -      xrl_bitpos = lnbitsize - xrl_bitpos - rl_bitsize;
> -    }
> -
> -  ll_mask = const_binop (LSHIFT_EXPR, fold_convert_loc (loc, lntype, ll_mask),
> -                        size_int (xll_bitpos));
> -  rl_mask = const_binop (LSHIFT_EXPR, fold_convert_loc (loc, lntype, rl_mask),
> -                        size_int (xrl_bitpos));
> -  if (ll_mask == NULL_TREE || rl_mask == NULL_TREE)
> -    return 0;
> -
> -  if (l_const)
> -    {
> -      l_const = fold_convert_loc (loc, lntype, l_const);
> -      l_const = unextend (l_const, ll_bitsize, ll_unsignedp, ll_and_mask);
> -      l_const = const_binop (LSHIFT_EXPR, l_const, size_int (xll_bitpos));
> -      if (l_const == NULL_TREE)
> -       return 0;
> -      if (! integer_zerop (const_binop (BIT_AND_EXPR, l_const,
> -                                       fold_build1_loc (loc, BIT_NOT_EXPR,
> -                                                        lntype, ll_mask))))
> -       {
> -         warning (0, "comparison is always %d", wanted_code == NE_EXPR);
> -
> -         return constant_boolean_node (wanted_code == NE_EXPR, truth_type);
> -       }
> -    }
> -  if (r_const)
> -    {
> -      r_const = fold_convert_loc (loc, lntype, r_const);
> -      r_const = unextend (r_const, rl_bitsize, rl_unsignedp, rl_and_mask);
> -      r_const = const_binop (LSHIFT_EXPR, r_const, size_int (xrl_bitpos));
> -      if (r_const == NULL_TREE)
> -       return 0;
> -      if (! integer_zerop (const_binop (BIT_AND_EXPR, r_const,
> -                                       fold_build1_loc (loc, BIT_NOT_EXPR,
> -                                                        lntype, rl_mask))))
> -       {
> -         warning (0, "comparison is always %d", wanted_code == NE_EXPR);
> -
> -         return constant_boolean_node (wanted_code == NE_EXPR, truth_type);
> -       }
> -    }
> -
> -  /* If the right sides are not constant, do the same for it.  Also,
> -     disallow this optimization if a size, signedness or storage order
> -     mismatch occurs between the left and right sides.  */
> -  if (l_const == 0)
> -    {
> -      if (ll_bitsize != lr_bitsize || rl_bitsize != rr_bitsize
> -         || ll_unsignedp != lr_unsignedp || rl_unsignedp != rr_unsignedp
> -         || ll_reversep != lr_reversep
> -         /* Make sure the two fields on the right
> -            correspond to the left without being swapped.  */
> -         || ll_bitpos - rl_bitpos != lr_bitpos - rr_bitpos)
> -       return 0;
> -
> -      first_bit = MIN (lr_bitpos, rr_bitpos);
> -      end_bit = MAX (lr_bitpos + lr_bitsize, rr_bitpos + rr_bitsize);
> -      if (!get_best_mode (end_bit - first_bit, first_bit, 0, 0,
> -                         TYPE_ALIGN (TREE_TYPE (lr_inner)), BITS_PER_WORD,
> -                         volatilep, &rnmode))
> -       return 0;
> -
> -      rnbitsize = GET_MODE_BITSIZE (rnmode);
> -      rnbitpos = first_bit & ~ (rnbitsize - 1);
> -      rntype = lang_hooks.types.type_for_size (rnbitsize, 1);
> -      xlr_bitpos = lr_bitpos - rnbitpos, xrr_bitpos = rr_bitpos - rnbitpos;
> -
> -      if (lr_reversep ? !BYTES_BIG_ENDIAN : BYTES_BIG_ENDIAN)
> -       {
> -         xlr_bitpos = rnbitsize - xlr_bitpos - lr_bitsize;
> -         xrr_bitpos = rnbitsize - xrr_bitpos - rr_bitsize;
> -       }
> -
> -      lr_mask = const_binop (LSHIFT_EXPR, fold_convert_loc (loc,
> -                                                           rntype, lr_mask),
> -                            size_int (xlr_bitpos));
> -      rr_mask = const_binop (LSHIFT_EXPR, fold_convert_loc (loc,
> -                                                           rntype, rr_mask),
> -                            size_int (xrr_bitpos));
> -      if (lr_mask == NULL_TREE || rr_mask == NULL_TREE)
> -       return 0;
> -
> -      /* Make a mask that corresponds to both fields being compared.
> -        Do this for both items being compared.  If the operands are the
> -        same size and the bits being compared are in the same position
> -        then we can do this by masking both and comparing the masked
> -        results.  */
> -      ll_mask = const_binop (BIT_IOR_EXPR, ll_mask, rl_mask);
> -      lr_mask = const_binop (BIT_IOR_EXPR, lr_mask, rr_mask);
> -      if (lnbitsize == rnbitsize
> -         && xll_bitpos == xlr_bitpos
> -         && lnbitpos >= 0
> -         && rnbitpos >= 0)
> -       {
> -         lhs = make_bit_field_ref (loc, ll_inner, ll_arg,
> -                                   lntype, lnbitsize, lnbitpos,
> -                                   ll_unsignedp || rl_unsignedp, ll_reversep);
> -         if (! all_ones_mask_p (ll_mask, lnbitsize))
> -           lhs = build2 (BIT_AND_EXPR, lntype, lhs, ll_mask);
> -
> -         rhs = make_bit_field_ref (loc, lr_inner, lr_arg,
> -                                   rntype, rnbitsize, rnbitpos,
> -                                   lr_unsignedp || rr_unsignedp, lr_reversep);
> -         if (! all_ones_mask_p (lr_mask, rnbitsize))
> -           rhs = build2 (BIT_AND_EXPR, rntype, rhs, lr_mask);
> -
> -         return build2_loc (loc, wanted_code, truth_type, lhs, rhs);
> -       }
> -
> -      /* There is still another way we can do something:  If both pairs of
> -        fields being compared are adjacent, we may be able to make a wider
> -        field containing them both.
> -
> -        Note that we still must mask the lhs/rhs expressions.  Furthermore,
> -        the mask must be shifted to account for the shift done by
> -        make_bit_field_ref.  */
> -      if (((ll_bitsize + ll_bitpos == rl_bitpos
> -           && lr_bitsize + lr_bitpos == rr_bitpos)
> -          || (ll_bitpos == rl_bitpos + rl_bitsize
> -              && lr_bitpos == rr_bitpos + rr_bitsize))
> -         && ll_bitpos >= 0
> -         && rl_bitpos >= 0
> -         && lr_bitpos >= 0
> -         && rr_bitpos >= 0)
> -       {
> -         tree type;
> -
> -         lhs = make_bit_field_ref (loc, ll_inner, ll_arg, lntype,
> -                                   ll_bitsize + rl_bitsize,
> -                                   MIN (ll_bitpos, rl_bitpos),
> -                                   ll_unsignedp, ll_reversep);
> -         rhs = make_bit_field_ref (loc, lr_inner, lr_arg, rntype,
> -                                   lr_bitsize + rr_bitsize,
> -                                   MIN (lr_bitpos, rr_bitpos),
> -                                   lr_unsignedp, lr_reversep);
> -
> -         ll_mask = const_binop (RSHIFT_EXPR, ll_mask,
> -                                size_int (MIN (xll_bitpos, xrl_bitpos)));
> -         lr_mask = const_binop (RSHIFT_EXPR, lr_mask,
> -                                size_int (MIN (xlr_bitpos, xrr_bitpos)));
> -         if (ll_mask == NULL_TREE || lr_mask == NULL_TREE)
> -           return 0;
> -
> -         /* Convert to the smaller type before masking out unwanted bits.  */
> -         type = lntype;
> -         if (lntype != rntype)
> -           {
> -             if (lnbitsize > rnbitsize)
> -               {
> -                 lhs = fold_convert_loc (loc, rntype, lhs);
> -                 ll_mask = fold_convert_loc (loc, rntype, ll_mask);
> -                 type = rntype;
> -               }
> -             else if (lnbitsize < rnbitsize)
> -               {
> -                 rhs = fold_convert_loc (loc, lntype, rhs);
> -                 lr_mask = fold_convert_loc (loc, lntype, lr_mask);
> -                 type = lntype;
> -               }
> -           }
> -
> -         if (! all_ones_mask_p (ll_mask, ll_bitsize + rl_bitsize))
> -           lhs = build2 (BIT_AND_EXPR, type, lhs, ll_mask);
> -
> -         if (! all_ones_mask_p (lr_mask, lr_bitsize + rr_bitsize))
> -           rhs = build2 (BIT_AND_EXPR, type, rhs, lr_mask);
> -
> -         return build2_loc (loc, wanted_code, truth_type, lhs, rhs);
> -       }
> -
> -      return 0;
> -    }
> -
> -  /* Handle the case of comparisons with constants.  If there is something in
> -     common between the masks, those bits of the constants must be the same.
> -     If not, the condition is always false.  Test for this to avoid generating
> -     incorrect code below.  */
> -  result = const_binop (BIT_AND_EXPR, ll_mask, rl_mask);
> -  if (! integer_zerop (result)
> -      && simple_cst_equal (const_binop (BIT_AND_EXPR, result, l_const),
> -                          const_binop (BIT_AND_EXPR, result, r_const)) != 1)
> -    {
> -      if (wanted_code == NE_EXPR)
> -       {
> -         warning (0, "%<or%> of unmatched not-equal tests is always 1");
> -         return constant_boolean_node (true, truth_type);
> -       }
> -      else
> -       {
> -         warning (0, "%<and%> of mutually exclusive equal-tests is always 0");
> -         return constant_boolean_node (false, truth_type);
> -       }
> -    }
> -
> -  if (lnbitpos < 0)
> -    return 0;
> -
> -  /* Construct the expression we will return.  First get the component
> -     reference we will make.  Unless the mask is all ones the width of
> -     that field, perform the mask operation.  Then compare with the
> -     merged constant.  */
> -  result = make_bit_field_ref (loc, ll_inner, ll_arg,
> -                              lntype, lnbitsize, lnbitpos,
> -                              ll_unsignedp || rl_unsignedp, ll_reversep);
> -
> -  ll_mask = const_binop (BIT_IOR_EXPR, ll_mask, rl_mask);
> -  if (! all_ones_mask_p (ll_mask, lnbitsize))
> -    result = build2_loc (loc, BIT_AND_EXPR, lntype, result, ll_mask);
> -
> -  return build2_loc (loc, wanted_code, truth_type, result,
> -                    const_binop (BIT_IOR_EXPR, l_const, r_const));
> +  return 0;
>  }
>
>  /* T is an integer expression that is being multiplied, divided, or taken a
> diff --git a/gcc/fold-const.h b/gcc/fold-const.h
> index 3e3998b57b042..77a5c916cbd88 100644
> --- a/gcc/fold-const.h
> +++ b/gcc/fold-const.h
> @@ -253,11 +253,21 @@ extern tree fold_build_pointer_plus_hwi_loc (location_t loc, tree ptr, HOST_WIDE
>  extern tree_code minmax_from_comparison (tree_code, tree, tree,
>                                          tree, tree);
>
> +extern tree make_bit_field_ref (location_t, tree, tree, tree,
> +                               HOST_WIDE_INT, poly_int64, int, int);
> +
>  /* In gimple-fold.cc.  */
>  extern void clear_type_padding_in_mask (tree, unsigned char *);
>  extern bool clear_padding_type_may_have_padding_p (tree);
>  extern bool arith_overflowed_p (enum tree_code, const_tree, const_tree,
>                                 const_tree);
> +extern tree fold_truth_andor_for_ifcombine (enum tree_code, tree,
> +                                           location_t, enum tree_code,
> +                                           tree, tree,
> +                                           location_t, enum tree_code,
> +                                           tree, tree,
> +                                           tree *);
> +
>
>  /* Class used to compare gimple operands.  */
>
> diff --git a/gcc/gimple-fold.cc b/gcc/gimple-fold.cc
> index 3c72cd6c79ae6..a31fc283d51b0 100644
> --- a/gcc/gimple-fold.cc
> +++ b/gcc/gimple-fold.cc
> @@ -7438,7 +7438,1155 @@ maybe_fold_comparisons_from_match_pd (tree type, enum tree_code code,
>
>    return NULL_TREE;
>  }
> +
> +/* Return TRUE and set op[0] if T, following all SSA edges, is a type
> +   conversion.  */
>
> +static bool
> +gimple_convert_def_p (tree t, tree op[1])
> +{
> +  if (TREE_CODE (t) == SSA_NAME
> +      && !SSA_NAME_IS_DEFAULT_DEF (t))
> +    if (gassign *def = dyn_cast <gassign *> (SSA_NAME_DEF_STMT (t)))
> +      switch (gimple_assign_rhs_code (def))
> +       {
> +       CASE_CONVERT:
> +         op[0] = gimple_assign_rhs1 (def);
> +         return true;
> +       case VIEW_CONVERT_EXPR:
> +         op[0] = TREE_OPERAND (gimple_assign_rhs1 (def), 0);
> +         return true;
> +       default:
> +         break;
> +       }
> +  return false;
> +}
> +
> +/* Return TRUE and set op[*] if T, following all SSA edges, resolves to a
> +   binary expression with code CODE.  */
> +
> +static bool
> +gimple_binop_def_p (enum tree_code code, tree t, tree op[2])
> +{
> +  if (TREE_CODE (t) == SSA_NAME
> +      && !SSA_NAME_IS_DEFAULT_DEF (t))
> +    if (gimple *def = dyn_cast <gassign *> (SSA_NAME_DEF_STMT (t)))
> +      if (gimple_assign_rhs_code (def) == code)
> +       {
> +         op[0] = gimple_assign_rhs1 (def);
> +         op[1] = gimple_assign_rhs2 (def);
> +         return true;
> +       }
> +  return false;
> +}
> +/* Subroutine for fold_truth_andor_for_ifcombine: decode a field reference.
> +
> +   If *PEXP is a comparison reference, we return the innermost reference.
> +
> +   *PBITSIZE is set to the number of bits in the reference, *PBITPOS is
> +   set to the starting bit number.
> +
> +   *PVOLATILEP is set to 1 if any expression encountered is volatile;
> +   otherwise it is not changed.
> +
> +   *PUNSIGNEDP is set to the signedness of the field.
> +
> +   *PREVERSEP is set to the storage order of the field.
> +
> +   *PAND_MASK is set to the mask found in a BIT_AND_EXPR, if any.
> +   If PAND_MASK is NULL, BIT_AND_EXPR is not recognized.
> +
> +   *XOR_P is to be FALSE if EXP might be a XOR used in a compare, in which
> +   case, if XOR_CMP_OP is a zero constant, it will be overridden with *PEXP,
> +   *XOR_P will be set to TRUE, and the left-hand operand of the XOR will be
> +   decoded.  If *XOR_P is TRUE, XOR_CMP_OP is supposed to be NULL, and then the
> +   right-hand operand of the XOR will be decoded.
> +
> +   *LOAD is set to the load stmt of the innermost reference, if any,
> +   and NULL otherwise.
> +
> +   LOC[0..3] are filled in as conversion, masking, shifting and loading
> +   operations are located.
> +
> +   Return 0 if this is not a component reference or is one that we can't
> +   do anything with.  */
> +
> +static tree
> +decode_field_reference (tree *pexp, HOST_WIDE_INT *pbitsize,
> +                       HOST_WIDE_INT *pbitpos,
> +                       bool *punsignedp, bool *preversep, bool *pvolatilep,
> +                       wide_int *pand_mask, bool *xor_p, tree *xor_cmp_op,
> +                       gimple **load, location_t loc[4])
> +{
> +  tree exp = *pexp;
> +  tree outer_type = 0;
> +  wide_int and_mask;
> +  tree inner, offset;
> +  int shiftrt = 0;
> +  tree res_ops[2];
> +  machine_mode mode;
> +
> +  *load = NULL;
> +
> +  /* All the optimizations using this function assume integer fields.
> +     There are problems with FP fields since the type_for_size call
> +     below can fail for, e.g., XFmode.  */
> +  if (! INTEGRAL_TYPE_P (TREE_TYPE (exp)))
> +    return NULL_TREE;
> +
> +  /* Drop casts, only save the outermost type.  We need not worry about
> +     narrowing then widening casts, or vice-versa, for those that are not
> +     essential for the compare have already been optimized out at this
> +     point.  */
> +  if (gimple_convert_def_p (exp, res_ops))
> +    {
> +      if (!outer_type)
> +       {
> +         outer_type = TREE_TYPE (exp);
> +         loc[0] = gimple_location (SSA_NAME_DEF_STMT (exp));
> +       }
> +      exp = res_ops[0];
> +    }
> +
> +  /* Recognize and save a masking operation.  */
> +  if (pand_mask && gimple_binop_def_p (BIT_AND_EXPR, exp, res_ops)
> +      && uniform_integer_cst_p (res_ops[1]))
> +    {
> +      loc[1] = gimple_location (SSA_NAME_DEF_STMT (exp));
> +      exp = res_ops[0];
> +      and_mask = wi::to_wide (res_ops[1]);
> +    }
> +
> +  /* Turn (a ^ b) [!]= 0 into a [!]= b.  */
> +  if (xor_p && gimple_binop_def_p (BIT_XOR_EXPR, exp, res_ops)
> +      && uniform_integer_cst_p (res_ops[1]))
> +    {
> +      /* No location recorded for this one, it's entirely subsumed by the
> +        compare.  */
> +      if (*xor_p)
> +       {
> +         exp = res_ops[1];
> +         gcc_checking_assert (!xor_cmp_op);
> +       }
> +      else if (!xor_cmp_op)
> +       /* Not much we can do when xor appears in the right-hand compare
> +          operand.  */
> +       return NULL_TREE;
> +      else
> +       {
> +         *xor_p = true;
> +         exp = res_ops[0];
> +         *xor_cmp_op = *pexp;
> +       }
> +    }
> +
> +  /* Another chance to drop conversions.  */
> +  if (gimple_convert_def_p (exp, res_ops))
> +    {
> +      if (!outer_type)
> +       {
> +         outer_type = TREE_TYPE (exp);
> +         loc[0] = gimple_location (SSA_NAME_DEF_STMT (exp));
> +       }
> +      exp = res_ops[0];
> +    }
> +
> +  /* Take note of shifts.  */
> +  if (gimple_binop_def_p (RSHIFT_EXPR, exp, res_ops)
> +      && uniform_integer_cst_p (res_ops[1]))
> +    {
> +      loc[2] = gimple_location (SSA_NAME_DEF_STMT (exp));
> +      exp = res_ops[0];
> +      if (!tree_fits_shwi_p (res_ops[1]))
> +       return NULL_TREE;
> +      shiftrt = tree_to_shwi (res_ops[1]);
> +      if (shiftrt <= 0)
> +       return NULL_TREE;
> +    }
> +
> +  /* Yet another chance to drop conversions.  */
> +  if (gimple_convert_def_p (exp, res_ops))
> +    {
> +      if (!outer_type)
> +       {
> +         outer_type = TREE_TYPE (exp);
> +         loc[0] = gimple_location (SSA_NAME_DEF_STMT (exp));
> +       }
> +      exp = res_ops[0];
> +    }
> +
> +  /* Identify the load, if there is one.  */
> +  if (TREE_CODE (exp) == SSA_NAME
> +      && !SSA_NAME_IS_DEFAULT_DEF (exp))
> +    {
> +      gimple *def = SSA_NAME_DEF_STMT (exp);
> +      if (gimple_assign_load_p (def))
> +       {
> +         loc[3] = gimple_location (def);
> +         *load = def;
> +         exp = gimple_assign_rhs1 (def);
> +       }
> +    }
> +
> +  /* Identify the relevant bits.  */
> +  poly_int64 poly_bitsize, poly_bitpos;
> +  int unsignedp, reversep = *preversep, volatilep = *pvolatilep;
> +  inner = get_inner_reference (exp, &poly_bitsize, &poly_bitpos, &offset,
> +                              &mode, &unsignedp, &reversep, &volatilep);
> +
> +  HOST_WIDE_INT bs, bp;
> +  if (!poly_bitsize.is_constant (&bs)
> +      || !poly_bitpos.is_constant (&bp)
> +      || bs <= shiftrt
> +      || offset != 0
> +      || TREE_CODE (inner) == PLACEHOLDER_EXPR
> +      /* Reject out-of-bound accesses (PR79731).  */
> +      || (! AGGREGATE_TYPE_P (TREE_TYPE (inner))
> +         && compare_tree_int (TYPE_SIZE (TREE_TYPE (inner)),
> +                              bp + bs) < 0))
> +    return NULL_TREE;
> +
> +  *pbitsize = bs;
> +  *pbitpos = bp;
> +  *punsignedp = unsignedp;
> +  *preversep = reversep;
> +  *pvolatilep = volatilep;
> +
> +  /* Adjust shifts...  */
> +  if (shiftrt)
> +    {
> +      if (!*preversep ? !BYTES_BIG_ENDIAN : BYTES_BIG_ENDIAN)
> +       *pbitpos += shiftrt;
> +      *pbitsize -= shiftrt;
> +    }
> +
> +  /* ... and bit position.  */
> +  if (outer_type && *pbitsize > TYPE_PRECISION (outer_type))
> +    {
> +      HOST_WIDE_INT excess = *pbitsize - TYPE_PRECISION (outer_type);
> +      if (*preversep ? !BYTES_BIG_ENDIAN : BYTES_BIG_ENDIAN)
> +       *pbitpos += excess;
> +      *pbitsize -= excess;
> +    }
> +
> +  *pexp = exp;
> +
> +  /* If the number of bits in the reference is the same as the bitsize of
> +     the outer type, then the outer type gives the signedness. Otherwise
> +     (in case of a small bitfield) the signedness is unchanged.  */
> +  if (outer_type && *pbitsize == TYPE_PRECISION (outer_type))
> +    *punsignedp = TYPE_UNSIGNED (outer_type);
> +
> +  /* Make the mask the expected width.  */
> +  if (and_mask.get_precision () != 0)
> +    and_mask = wide_int::from (and_mask, *pbitsize, UNSIGNED);
> +
> +  if (pand_mask)
> +    *pand_mask = and_mask;
> +
> +  return inner;
> +}
> +
> +/* Return the one bitpos within bit extents L or R that is at an
> +   ALIGN-bit alignment boundary, or -1 if there is more than one such
> +   boundary, if there isn't any, or if there is any such boundary
> +   between the extents.  L and R are given by bitpos and bitsize.  If
> +   it doesn't return -1, there are two consecutive ALIGN-bit words
> +   that contain both extents, and at least one of the extents
> +   straddles across the returned alignment boundary.  */
> +
> +static inline HOST_WIDE_INT
> +compute_split_boundary_from_align (HOST_WIDE_INT align,
> +                                  HOST_WIDE_INT l_bitpos,
> +                                  HOST_WIDE_INT l_bitsize,
> +                                  HOST_WIDE_INT r_bitpos,
> +                                  HOST_WIDE_INT r_bitsize)
> +{
> +  HOST_WIDE_INT amask = ~(align - 1);
> +
> +  HOST_WIDE_INT first_bit = MIN (l_bitpos, r_bitpos);
> +  HOST_WIDE_INT end_bit = MAX (l_bitpos + l_bitsize, r_bitpos + r_bitsize);
> +
> +  HOST_WIDE_INT boundary = (end_bit - 1) & amask;
> +
> +  /* Make sure we're crossing no more than one alignment boundary.
> +
> +     ??? We don't have logic to recombine loads of two adjacent
> +     fields that each crosses a different alignment boundary, so
> +     as to load the middle word only once, if other words can't be
> +     otherwise recombined.  */
> +  if (boundary - first_bit > align)
> +    return -1;
> +
> +  HOST_WIDE_INT l_start_word = l_bitpos & amask;
> +  HOST_WIDE_INT l_end_word = (l_bitpos + l_bitsize - 1) & amask;
> +
> +  HOST_WIDE_INT r_start_word = r_bitpos & amask;
> +  HOST_WIDE_INT r_end_word = (r_bitpos + r_bitsize - 1) & amask;
> +
> +  /* If neither field straddles across an alignment boundary, it's no
> +     use to even try to merge them.  */
> +  if (l_start_word == l_end_word && r_start_word == r_end_word)
> +    return -1;
> +
> +  return boundary;
> +}
> +
> +/* Make a bit_field_ref.  If POINT is NULL, return the BIT_FIELD_REF.
> +   Otherwise, build and insert a load stmt before POINT, and return
> +   the SSA_NAME.  ???  Rewrite LOAD in terms of the bitfield?  */
> +
> +static tree
> +make_bit_field_load (location_t loc, tree inner, tree orig_inner, tree type,
> +                    HOST_WIDE_INT bitsize, poly_int64 bitpos,
> +                    bool unsignedp, bool reversep, gimple *point)
> +{
> +  if (point && loc == UNKNOWN_LOCATION)
> +    loc = gimple_location (point);
> +
> +  tree ref = make_bit_field_ref (loc, unshare_expr (inner),
> +                                unshare_expr (orig_inner),
> +                                type, bitsize, bitpos,
> +                                unsignedp, reversep);
> +  if (!point)
> +    return ref;
> +
> +  gimple_seq stmts = NULL;
> +  tree ret = force_gimple_operand (ref, &stmts, true, NULL_TREE);
> +
> +  /* We know the vuse is supposed to end up being the same as that at the
> +     original load at the insertion point, but if we don't set it, it will be a
> +     generic placeholder that only the global SSA update at the end of the pass
> +     would make equal, too late for us to use in further combinations.  So go
> +     ahead and copy the vuse.  */
> +
> +  tree reaching_vuse = gimple_vuse (point);
> +  for (gimple_stmt_iterator i = gsi_start (stmts);
> +       !gsi_end_p (i); gsi_next (&i))
> +    {
> +      gimple *new_stmt = gsi_stmt (i);
> +      if (gimple_has_mem_ops (new_stmt))
> +       gimple_set_vuse (new_stmt, reaching_vuse);
> +    }
> +
> +  gimple_stmt_iterator gsi = gsi_for_stmt (point);
> +  gsi_insert_seq_before (&gsi, stmts, GSI_SAME_STMT);
> +  return ret;
> +}
> +
> +/* Initialize ln_arg[0] and ln_arg[1] to a pair of newly-created (at
> +   LOC) loads from INNER (from ORIG_INNER), of modes MODE and MODE2,
> +   respectively, starting at BIT_POS, using reversed endianness if
> +   REVERSEP.  Also initialize BITPOS (the starting position of each
> +   part into INNER), BITSIZ (the bit count starting at BITPOS),
> +   TOSHIFT[1] (the amount by which the part and its mask are to be
> +   shifted right to bring its least-significant bit to bit zero) and
> +   SHIFTED (the amount by which the part, by separate loading, has
> +   already been shifted right, but that the mask needs shifting to
> +   match).  */
> +
> +static inline void
> +build_split_load (tree /* out */ ln_arg[2],
> +                 HOST_WIDE_INT /* out */ bitpos[2],
> +                 HOST_WIDE_INT /* out */ bitsiz[2],
> +                 HOST_WIDE_INT /* in[0] out[0..1] */ toshift[2],
> +                 HOST_WIDE_INT /* out */ shifted[2],
> +                 location_t loc, tree inner, tree orig_inner,
> +                 scalar_int_mode mode, scalar_int_mode mode2,
> +                 HOST_WIDE_INT bit_pos, bool reversep,
> +                 gimple *point[2])
> +{
> +  scalar_int_mode modes[2] = { mode, mode2 };
> +  bitsiz[0] = GET_MODE_BITSIZE (mode);
> +  bitsiz[1] = GET_MODE_BITSIZE (mode2);
> +
> +  for (int i = 0; i < 2; i++)
> +    {
> +      tree type = lang_hooks.types.type_for_mode (modes[i], 1);
> +      if (!type)
> +       {
> +         type = build_nonstandard_integer_type (bitsiz[0], 1);
> +         gcc_assert (type);
> +       }
> +      bitpos[i] = bit_pos;
> +      ln_arg[i] = make_bit_field_load (loc, inner, orig_inner,
> +                                      type, bitsiz[i],
> +                                      bit_pos, 1, reversep, point[i]);
> +      bit_pos += bitsiz[i];
> +    }
> +
> +  toshift[1] = toshift[0];
> +  if (reversep ? !BYTES_BIG_ENDIAN : BYTES_BIG_ENDIAN)
> +    {
> +      shifted[0] = bitsiz[1];
> +      shifted[1] = 0;
> +      toshift[0] = 0;
> +    }
> +  else
> +    {
> +      shifted[1] = bitsiz[0];
> +      shifted[0] = 0;
> +      toshift[1] = 0;
> +    }
> +}
> +
> +/* Make arrangements to split at bit BOUNDARY a single loaded word
> +   (with REVERSEP bit order) LN_ARG[0], to be shifted right by
> +   TOSHIFT[0] to bring the field of interest to the least-significant
> +   bit.  The expectation is that the same loaded word will be
> +   propagated from part 0 to part 1, with just different shifting and
> +   masking to extract both parts.  MASK is not expected to do more
> +   than masking out the bits that belong to the other part.  See
> +   build_split_load for more information on the other fields.  */
> +
> +static inline void
> +reuse_split_load (tree /* in[0] out[1] */ ln_arg[2],
> +                 HOST_WIDE_INT /* in[0] out[1] */ bitpos[2],
> +                 HOST_WIDE_INT /* in[0] out[1] */ bitsiz[2],
> +                 HOST_WIDE_INT /* in[0] out[0..1] */ toshift[2],
> +                 HOST_WIDE_INT /* out */ shifted[2],
> +                 wide_int /* out */ mask[2],
> +                 HOST_WIDE_INT boundary, bool reversep)
> +{
> +  unsigned prec = TYPE_PRECISION (TREE_TYPE (ln_arg[0]));
> +
> +  ln_arg[1] = ln_arg[0];
> +  bitpos[1] = bitpos[0];
> +  bitsiz[1] = bitsiz[0];
> +  shifted[1] = shifted[0] = 0;
> +
> +  if (reversep ? !BYTES_BIG_ENDIAN : BYTES_BIG_ENDIAN)
> +    {
> +      toshift[1] = toshift[0];
> +      toshift[0] = bitpos[0] + bitsiz[0] - boundary;
> +      mask[0] = wi::mask (toshift[0], true, prec);
> +      mask[1] = wi::mask (toshift[0], false, prec);
> +    }
> +  else
> +    {
> +      toshift[1] = boundary - bitpos[1];
> +      mask[1] = wi::mask (toshift[1], true, prec);
> +      mask[0] = wi::mask (toshift[1], false, prec);
> +    }
> +}
> +
> +/* Find ways of folding logical expressions of LHS and RHS:
> +
> +   Try to merge two comparisons to nearby fields.
> +
> +   For example, if we have p->a == 2 && p->b == 4 and we can load both A and B
> +   at once, we can do this with a comparison against the object ANDed with the
> +   a mask.
> +
> +   If we have p->a == q->a && p->b == q->b, we may be able to use bit masking
> +   operations to do this with one comparison, loading both fields from P at
> +   once, and likewise from Q.
> +
> +   Herein, loading at once means loading from within the same alignment
> +   boundary for the enclosing object.  If (packed) fields cross such alignment
> +   boundaries, we may still recombine the compares, so that loads do not cross
> +   the boundaries.
> +
> +   CODE is the logical operation being done.  It can be TRUTH_ANDIF_EXPR,
> +   TRUTH_AND_EXPR, TRUTH_ORIF_EXPR, or TRUTH_OR_EXPR.
> +
> +   TRUTH_TYPE is the type of the logical operand.
> +
> +   LHS is denoted as LL_ARG LCODE LR_ARG.
> +
> +   RHS is denoted as RL_ARG RCODE RR_ARG.
> +
> +   LHS is assumed to dominate RHS.
> +
> +   Combined loads are inserted next to preexisting loads, once we determine
> +   that the combination is viable, and the combined condition references new
> +   SSA_NAMEs that hold the loaded values.  Since the original loads are
> +   verified to have the same gimple_vuse, the insertion point doesn't matter
> +   for correctness.  ??? The loads may be a lot earlier than the compares, 
> and
> +   it's conceivable that one or two loads for RHS appear before those for 
> LHS.
> +   It could be advantageous to try to place the loads optimally, taking
> +   advantage of knowing whether RHS is accessed before LHS, or that both are
> +   accessed before both compares, but we don't do that (yet?).
> +
> +   SEPARATEP should be NULL if the combined condition must be returned as a
> +   single expression, even if it is a compound condition.  This must only be
> +   done if LHS and RHS are adjacent, without intervening conditions, and the
> +   combined condition is to replace RHS, while LHS is dropped altogether.
> +
> +   Otherwise, SEPARATEP must be a non-NULL pointer to a NULL_TREE, that may be
> +   replaced by a part of the compound condition that could replace RHS, while
> +   the returned expression replaces LHS.  This works whether or not LHS and RHS
> +   are adjacent, as long as there aren't VDEFs or other side effects between
> +   them.
> +
> +   If the "words" accessed by RHS are already accessed by LHS, this won't
> +   matter, but if RHS accesses "words" that LHS doesn't, then *SEPARATEP will
> +   be set to the compares that should take RHS's place.  By "words" we mean
> +   contiguous bits that do not cross a TYPE_ALIGN boundary of the accessed
> +   object's type.
> +
> +   We return the simplified tree or 0 if no optimization is possible.  */
> +
> +tree
> +fold_truth_andor_for_ifcombine (enum tree_code code, tree truth_type,
> +                               location_t lloc, enum tree_code lcode,
> +                               tree ll_arg, tree lr_arg,
> +                               location_t rloc, enum tree_code rcode,
> +                               tree rl_arg, tree rr_arg,
> +                               tree *separatep)
> +{
> +  /* If this is the "or" of two comparisons, we can do something if
> +     the comparisons are NE_EXPR.  If this is the "and", we can do something
> +     if the comparisons are EQ_EXPR.  I.e.,
> +       (a->b == 2 && a->c == 4) can become (a->new == NEW).
> +
> +     WANTED_CODE is this operation code.  For single bit fields, we can
> +     convert EQ_EXPR to NE_EXPR so we need not reject the "wrong"
> +     comparison for one-bit fields.  */
> +
> +  enum tree_code orig_code = code;
> +  enum tree_code wanted_code;
> +  tree ll_inner, lr_inner, rl_inner, rr_inner;
> +  gimple *ll_load, *lr_load, *rl_load, *rr_load;
> +  HOST_WIDE_INT ll_bitsize, ll_bitpos, lr_bitsize, lr_bitpos;
> +  HOST_WIDE_INT rl_bitsize, rl_bitpos, rr_bitsize, rr_bitpos;
> +  HOST_WIDE_INT xll_bitpos, xlr_bitpos, xrl_bitpos, xrr_bitpos;
> +  HOST_WIDE_INT lnbitsize, lnbitpos, lnprec;
> +  HOST_WIDE_INT rnbitsize, rnbitpos, rnprec;
> +  bool ll_unsignedp, lr_unsignedp, rl_unsignedp, rr_unsignedp;
> +  bool ll_reversep, lr_reversep, rl_reversep, rr_reversep;
> +  scalar_int_mode lnmode, lnmode2, rnmode;
> +  wide_int ll_and_mask, lr_and_mask, rl_and_mask, rr_and_mask;
> +  wide_int l_const, r_const;
> +  tree lntype, rntype, result;
> +  HOST_WIDE_INT first_bit, end_bit;
> +  bool volatilep;
> +  bool l_split_load;
> +
> +  /* These are indexed by: conv, mask, shft, load.  */
> +  location_t ll_loc[4] = { lloc, lloc, lloc, UNKNOWN_LOCATION };
> +  location_t lr_loc[4] = { lloc, lloc, lloc, UNKNOWN_LOCATION };
> +  location_t rl_loc[4] = { rloc, rloc, rloc, UNKNOWN_LOCATION };
> +  location_t rr_loc[4] = { rloc, rloc, rloc, UNKNOWN_LOCATION };
> +
> +  gcc_checking_assert (!separatep || !*separatep);
> +
> +  /* Start by getting the comparison codes.  Fail if anything is volatile.
> +     If one operand is a BIT_AND_EXPR with the constant one, treat it as if
> +     it were surrounded with a NE_EXPR.  */
> +
> +  if (TREE_CODE_CLASS (lcode) != tcc_comparison
> +      || TREE_CODE_CLASS (rcode) != tcc_comparison)
> +    return 0;
> +
> +  /* We don't normally find TRUTH_*IF_EXPR in gimple, but these codes may be
> +     given by our caller to denote conditions from different blocks.  */
> +  switch (code)
> +    {
> +    case TRUTH_AND_EXPR:
> +    case TRUTH_ANDIF_EXPR:
> +      code = TRUTH_AND_EXPR;
> +      break;
> +
> +    case TRUTH_OR_EXPR:
> +    case TRUTH_ORIF_EXPR:
> +      code = TRUTH_OR_EXPR;
> +      break;
> +
> +    default:
> +      return 0;
> +    }
> +
> +  bool lsignbit = false, rsignbit = false;
> +  if ((lcode == LT_EXPR || lcode == GE_EXPR)
> +      && integer_zerop (lr_arg)
> +      && INTEGRAL_TYPE_P (TREE_TYPE (ll_arg))
> +      && !TYPE_UNSIGNED (TREE_TYPE (ll_arg)))
> +    {
> +      lsignbit = true;
> +      lcode = (lcode == LT_EXPR ? NE_EXPR : EQ_EXPR);
> +    }
> +  if ((rcode == LT_EXPR || rcode == GE_EXPR)
> +      && integer_zerop (rr_arg)
> +      && INTEGRAL_TYPE_P (TREE_TYPE (rl_arg))
> +      && !TYPE_UNSIGNED (TREE_TYPE (rl_arg)))
> +    {
> +      rsignbit = true;
> +      rcode = (rcode == LT_EXPR ? NE_EXPR : EQ_EXPR);
> +    }
> +
> +  /* See if the comparisons can be merged.  Then get all the parameters for
> +     each side.  */
> +
> +  if ((lcode != EQ_EXPR && lcode != NE_EXPR)
> +      || (rcode != EQ_EXPR && rcode != NE_EXPR))
> +    return 0;
> +
> +  ll_reversep = lr_reversep = rl_reversep = rr_reversep = 0;
> +  volatilep = 0;
> +  bool l_xor = false, r_xor = false;
> +  ll_inner = decode_field_reference (&ll_arg, &ll_bitsize, &ll_bitpos,
> +                                    &ll_unsignedp, &ll_reversep, &volatilep,
> +                                    &ll_and_mask, &l_xor, &lr_arg,
> +                                    &ll_load, ll_loc);
> +  lr_inner = decode_field_reference (&lr_arg, &lr_bitsize, &lr_bitpos,
> +                                    &lr_unsignedp, &lr_reversep, &volatilep,
> +                                    &lr_and_mask, &l_xor, 0,
> +                                    &lr_load, lr_loc);
> +  rl_inner = decode_field_reference (&rl_arg, &rl_bitsize, &rl_bitpos,
> +                                    &rl_unsignedp, &rl_reversep, &volatilep,
> +                                    &rl_and_mask, &r_xor, &rr_arg,
> +                                    &rl_load, rl_loc);
> +  rr_inner = decode_field_reference (&rr_arg, &rr_bitsize, &rr_bitpos,
> +                                    &rr_unsignedp, &rr_reversep, &volatilep,
> +                                    &rr_and_mask, &r_xor, 0,
> +                                    &rr_load, rr_loc);
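
If I follow the decode_field_reference interface, each compare operand is
boiled down to an inner object plus a bit range, together with any mask,
shift, narrowing conversion and the load feeding it (hence the
conv/mask/shft/load location slots above).  A hedged sketch of operand
shapes of that kind, with invented types and fields:

  #include <stdint.h>

  struct T { uint32_t f; uint32_t g; };

  int shapes (struct T *t)
  {
    /* A masked load compared against a constant, and a shifted, narrowed
       load: both ultimately select "some bits of a field of *t".  */
    return (t->f & 0xff00) == 0x1200
           && (uint8_t) (t->g >> 16) == 0x34;
  }
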
> +
> +  /* The inner operation on the lhs of each comparison must be the same
> +     if we are to be able to do anything.
> +     Then see if we have constants.  If not, the same must be true for
> +     the rhs's.  */
> +  if (volatilep
> +      || ll_reversep != rl_reversep
> +      || ll_inner == 0 || rl_inner == 0
> +      || ! operand_equal_p (ll_inner, rl_inner, 0)
> +      || (ll_load && rl_load
> +         && gimple_vuse (ll_load) != gimple_vuse (rl_load)))
> +    return 0;
> +
> +  if (TREE_CODE (lr_arg) == INTEGER_CST
> +      && TREE_CODE (rr_arg) == INTEGER_CST)
> +    {
> +      l_const = wi::to_wide (lr_arg);
> +      /* We don't expect masks on constants, but if there are any, apply
> +        them now.  */
> +      if (lr_and_mask.get_precision ())
> +       l_const &= wide_int::from (lr_and_mask,
> +                                  l_const.get_precision (), UNSIGNED);
> +      r_const = wi::to_wide (rr_arg);
> +      if (rr_and_mask.get_precision ())
> +       r_const &= wide_int::from (rr_and_mask,
> +                                  r_const.get_precision (), UNSIGNED);
> +      lr_reversep = ll_reversep;
> +    }
> +  else if (lr_reversep != rr_reversep
> +          || lr_inner == 0 || rr_inner == 0
> +          || ! operand_equal_p (lr_inner, rr_inner, 0)
> +          || (lr_load && rr_load
> +              && gimple_vuse (lr_load) != gimple_vuse (rr_load)))
> +    return 0;
> +
> +  /* If we found sign tests, finish turning them into bit tests.  */
> +
> +  if (lsignbit)
> +    {
> +      wide_int sign = wi::mask (ll_bitsize - 1, true, ll_bitsize);
> +      if (!ll_and_mask.get_precision ())
> +       ll_and_mask = sign;
> +      else
> +       ll_and_mask &= sign;
> +    }
> +
> +  if (rsignbit)
> +    {
> +      wide_int sign = wi::mask (rl_bitsize - 1, true, rl_bitsize);
> +      if (!rl_and_mask.get_precision ())
> +       rl_and_mask = sign;
> +      else
> +       rl_and_mask &= sign;
> +    }
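
The lsignbit/rsignbit handling, as I understand it, turns a signed "< 0" or
">= 0" test into an equality test on the sign bit, so it can be merged like
any other masked single-bit test.  Roughly, at the source level:

  #include <stdint.h>

  int lt_zero (int8_t x)  { return x < 0; }
  int sign_bit (int8_t x) { return ((uint8_t) x & 0x80) != 0; }
  /* Both functions compute the same predicate for every int8_t value.  */
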
> +
> +  /* If either comparison code is not correct for our logical operation,
> +     fail.  However, we can convert a one-bit comparison against zero into
> +     the opposite comparison against that bit being set in the field.  */
> +
> +  wanted_code = (code == TRUTH_AND_EXPR ? EQ_EXPR : NE_EXPR);
> +  if (lcode != wanted_code)
> +    {
> +      if (l_const.get_precision ()
> +         && l_const == 0
> +         && ll_and_mask.get_precision ()
> +         && wi::popcount (ll_and_mask) == 1)
> +       {
> +         /* Make the left operand unsigned, since we are only interested
> +            in the value of one bit.  Otherwise we are doing the wrong
> +            thing below.  */
> +         ll_unsignedp = 1;
> +         l_const = ll_and_mask;
> +       }
> +      else
> +       return 0;
> +    }
> +
> +  /* This is analogous to the code for l_const above.  */
> +  if (rcode != wanted_code)
> +    {
> +      if (r_const.get_precision ()
> +         && r_const == 0
> +         && rl_and_mask.get_precision ()
> +         && wi::popcount (rl_and_mask) == 1)
> +       {
> +         rl_unsignedp = 1;
> +         r_const = rl_and_mask;
> +       }
> +      else
> +       return 0;
> +    }
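
The single-bit fix-up above relies on the fact that, for a one-bit mask, a
mismatched comparison code can be flipped by comparing against the bit
itself.  A small C illustration:

  #include <stdint.h>

  int ne_zero (uint8_t x) { return (x & 4) != 0; }
  int eq_bit (uint8_t x)  { return (x & 4) == 4; }
  /* Identical predicates, so an NE-against-zero test with mask 4 can be
     rewritten with the wanted EQ code, and vice versa.  */
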
> +
> +  /* This will be bumped to 2 if any of the field pairs crosses an
> +     alignment boundary, so the merged compare has to be done in two
> +     parts.  */
> +  int parts = 1;
> +  /* Set to true if the second combined compare should come first,
> +     e.g., because the second original compare accesses a word that
> +     the first one doesn't, and the combined compares access those in
> +     cmp[0].  */
> +  bool first1 = false;
> +  /* Set to true if the first original compare is not the one being
> +     split.  */
> +  bool maybe_separate = false;
> +
> +  /* The following 2-dimensional arrays use the first index to
> +     identify left(0)- vs right(1)-hand compare operands, and the
> +     second one to identify merged compare parts.  */
> +  /* The memory loads or constants to be compared.  */
> +  tree ld_arg[2][2];
> +  /* The first bit of the corresponding inner object that the
> +     corresponding LD_ARG covers.  */
> +  HOST_WIDE_INT bitpos[2][2];
> +  /* The bit count starting at BITPOS that the corresponding LD_ARG
> +     covers.  */
> +  HOST_WIDE_INT bitsiz[2][2];
> +  /* The number of bits by which LD_ARG has already been shifted
> +     right, WRT mask.  */
> +  HOST_WIDE_INT shifted[2][2];
> +  /* The number of bits by which both LD_ARG and MASK need shifting to
> +     bring its least-significant bit to bit zero.  */
> +  HOST_WIDE_INT toshift[2][2];
> +  /* An additional mask to be applied to LD_ARG, to remove any bits
> +     that may have been loaded for use in another compare, but that
> +     don't belong in the corresponding compare.  */
> +  wide_int xmask[2][2] = {};
> +
> +  /* The combined compare or compares.  */
> +  tree cmp[2];
> +
> +  /* Consider we're comparing two non-contiguous fields of packed
> +     structs, both aligned at 32-bit boundaries:
> +
> +     ll_arg: an 8-bit field at offset 0
> +     lr_arg: a 16-bit field at offset 2
> +
> +     rl_arg: an 8-bit field at offset 1
> +     rr_arg: a 16-bit field at offset 3
> +
> +     We'll have r_split_load, because rr_arg straddles across an
> +     alignment boundary.
> +
> +     We'll want to have:
> +
> +     bitpos  = { {  0,  0 }, {  0, 32 } }
> +     bitsiz  = { { 32, 32 }, { 32,  8 } }
> +
> +     And, for little-endian:
> +
> +     shifted = { {  0,  0 }, {  0, 32 } }
> +     toshift = { {  0, 24 }, {  0,  0 } }
> +
> +     Or, for big-endian:
> +
> +     shifted = { {  0,  0 }, {  8,  0 } }
> +     toshift = { {  8,  0 }, {  0,  0 } }
> +  */
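
To make that concrete, here is one packed layout (my own, hypothetical) with
a field that straddles the 32-bit boundary the way rr_arg does above, and so
needs the split access described by bitpos[1] / bitsiz[1]:

  struct B
  {
    unsigned char  c0;  /* byte 0 */
    unsigned char  c1;  /* byte 1 */
    unsigned char  c2;  /* byte 2 */
    unsigned short s3;  /* bytes 3-4: straddles the 32-bit boundary */
    unsigned char  c5;  /* byte 5 */
  } __attribute__ ((packed, aligned (4)));

  /* s3 cannot be fetched with a single aligned 32-bit load: on a
     little-endian target its low byte sits in bits 24-31 of word 0 and
     its high byte in bits 32-39, i.e. the first 8 bits of word 1, which
     is what bitpos[1] = { 0, 32 } and bitsiz[1] = { 32, 8 } describe.  */
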
> +
> +  /* See if we can find a mode that contains both fields being compared on
> +     the left.  If we can't, fail.  Otherwise, update all constants and masks
> +     to be relative to a field of that size.  */
> +  first_bit = MIN (ll_bitpos, rl_bitpos);
> +  end_bit = MAX (ll_bitpos + ll_bitsize, rl_bitpos + rl_bitsize);
> +  if (get_best_mode (end_bit - first_bit, first_bit, 0, 0,
> +                    TYPE_ALIGN (TREE_TYPE (ll_inner)), BITS_PER_WORD,
> +                    volatilep, &lnmode))
> +    l_split_load = false;
> +  else
> +    {
> +      /* Consider the possibility of recombining loads if any of the
> +        fields straddles across an alignment boundary, so that either
> +        part can be loaded along with the other field.  */
> +      HOST_WIDE_INT align = TYPE_ALIGN (TREE_TYPE (ll_inner));
> +      HOST_WIDE_INT boundary = compute_split_boundary_from_align
> +       (align, ll_bitpos, ll_bitsize, rl_bitpos, rl_bitsize);
> +
> +      if (boundary < 0
> +         || !get_best_mode (boundary - first_bit, first_bit, 0, 0,
> +                            align, BITS_PER_WORD, volatilep, &lnmode)
> +         || !get_best_mode (end_bit - boundary, boundary, 0, 0,
> +                            align, BITS_PER_WORD, volatilep, &lnmode2))
> +       return 0;
> +
> +      /* If we can't have a single load, but can with two, figure out whether
> +        the two compares can be separated, i.e., whether the entirety of the
> +        first original compare is encompassed by the entirety of the first
> +        combined compare.  If the first original compare is past the alignment
> +        boundary, arrange to compare that range first, by setting first1
> +        (meaning make cmp[1] first, instead of cmp[0]).  */
> +      l_split_load = true;
> +      parts = 2;
> +      if (ll_bitpos >= boundary)
> +       maybe_separate = first1 = true;
> +      else if (ll_bitpos + ll_bitsize <= boundary)
> +       maybe_separate = true;
> +    }
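
A hypothetical layout that, if I read the boundary logic right, takes this
path with first1 set, because the dominating compare only touches bits past
the split point while the dominated one straddles it:

  struct Q
  {
    unsigned char  a;   /* byte 0 */
    unsigned char  b;   /* byte 1 */
    unsigned char  c;   /* byte 2 */
    unsigned short d;   /* bytes 3-4: straddles the boundary at bit 32 */
    unsigned char  e;   /* byte 5: entirely past the boundary */
  } __attribute__ ((packed, aligned (4)));

  int f (const struct Q *q, const struct Q *r)
  {
    /* q->e lies wholly after the 32-bit boundary and q->d straddles it,
       so the combined compare covering the upper word should come first
       (first1), preserving the original evaluation order.  */
    return q->e == r->e && q->d == r->d;
  }
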
> +
> +  lnbitsize = GET_MODE_BITSIZE (lnmode);
> +  lnbitpos = first_bit & ~ (lnbitsize - 1);
> +  /* Avoid situations that the code below can't handle.  */
> +  if (lnbitpos < 0)
> +    return 0;
> +
> +  /* Choose the type for the combined compare.  Even if we're splitting loads,
> +     make it wide enough to hold both.  */
> +  if (l_split_load)
> +    lnbitsize += GET_MODE_BITSIZE (lnmode2);
> +  lntype = build_nonstandard_integer_type (lnbitsize, 1);
> +  if (!lntype)
> +    return NULL_TREE;
> +  lnprec = TYPE_PRECISION (lntype);
> +  xll_bitpos = ll_bitpos - lnbitpos, xrl_bitpos = rl_bitpos - lnbitpos;
> +
> +  /* Adjust bit ranges for reverse endianness.  */
> +  if (ll_reversep ? !BYTES_BIG_ENDIAN : BYTES_BIG_ENDIAN)
> +    {
> +      xll_bitpos = lnbitsize - xll_bitpos - ll_bitsize;
> +      xrl_bitpos = lnbitsize - xrl_bitpos - rl_bitsize;
> +    }
> +
> +  /* Adjust masks to match the positions in the combined lntype.  */
> +  wide_int ll_mask, rl_mask, r_mask;
> +  if (ll_and_mask.get_precision ())
> +    ll_mask = wi::lshift (wide_int::from (ll_and_mask, lnprec, UNSIGNED),
> +                         xll_bitpos);
> +  else
> +    ll_mask = wi::shifted_mask (xll_bitpos, ll_bitsize, false, lnprec);
> +  if (rl_and_mask.get_precision ())
> +    rl_mask = wi::lshift (wide_int::from (rl_and_mask, lnprec, UNSIGNED),
> +                         xrl_bitpos);
> +  else
> +    rl_mask = wi::shifted_mask (xrl_bitpos, rl_bitsize, false, lnprec);
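
As a sanity check on the mask bookkeeping, here is roughly the same
computation modeled on a plain 64-bit word; this is only my reading of what
wi::lshift / wi::shifted_mask produce here, not the actual wide_int code:

  #include <stdint.h>

  /* Either reposition an existing AND mask at the field's offset within
     the combined word, or synthesize a mask covering the whole field.  */
  static uint64_t
  field_mask (int have_mask, uint64_t and_mask, int bitpos, int bitsize)
  {
    if (have_mask)
      return and_mask << bitpos;
    uint64_t full = bitsize < 64 ? (UINT64_C (1) << bitsize) - 1 : ~UINT64_C (0);
    return full << bitpos;
  }
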
> +
> +  /* Adjust right-hand constants in both original comparisons to match width
> +     and bit position.  */
> +  if (l_const.get_precision ())
> +    {
> +      /* We're doing bitwise equality tests, so don't bother with sign
> +        extensions.  */
> +      l_const = wide_int::from (l_const, lnprec, UNSIGNED);
> +      if (ll_and_mask.get_precision ())
> +       l_const &= wide_int::from (ll_and_mask, lnprec, UNSIGNED);
> +      l_const <<= xll_bitpos;
> +      if ((l_const & ~ll_mask) != 0)
> +       {
> +         warning_at (lloc, OPT_Wtautological_compare,
> +                     "comparison is always %d", wanted_code == NE_EXPR);
> +
> +         return constant_boolean_node (wanted_code == NE_EXPR, truth_type);
> +       }
> +
> +      /* When we set l_const, we also set r_const, so we need not test it
> +        again.  */
> +      gcc_checking_assert (r_const.get_precision ());
> +
> +      r_const = wide_int::from (r_const, lnprec, UNSIGNED);
> +      if (rl_and_mask.get_precision ())
> +       r_const &= wide_int::from (rl_and_mask, lnprec, UNSIGNED);
> +      r_const <<= xrl_bitpos;
> +      if ((r_const & ~rl_mask) != 0)
> +       {
> +         warning_at (rloc, OPT_Wtautological_compare,
> +                     "comparison is always %d", wanted_code == NE_EXPR);
> +
> +         return constant_boolean_node (wanted_code == NE_EXPR, truth_type);
> +       }
> +
> +      /* If there is something in common between the masks, those bits of the
> +        constants must be the same.  If not, the combined condition cannot be
> +        met, and the result is known.  Test for this to avoid generating
> +        incorrect code below.  */
> +      wide_int mask = ll_mask & rl_mask;
> +      if (mask != 0
> +         && (l_const & mask) != (r_const & mask))
> +       {
> +         if (wanted_code == NE_EXPR)
> +           return constant_boolean_node (true, truth_type);
> +         else
> +           return constant_boolean_node (false, truth_type);
> +       }
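
Source-level shapes that I believe hit these two early-outs (the first one
also triggering the -Wtautological-compare warning):

  /* 0x10 has no bits in common with the mask 0x0f, so the first test can
     never hold and the whole conjunction is statically false.  */
  int always_false (unsigned x)
  { return (x & 0x0f) == 0x10 && (x & 0xf0) == 0x20; }

  /* The masks overlap on bit 0 and the constants disagree there, so the
     combined condition is likewise known to be false.  */
  int contradictory (unsigned x)
  { return (x & 3) == 1 && (x & 1) == 0; }
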
> +
> +      /* The constants are combined so as to line up with the loaded field, so
> +        tentatively use the same parameters for the second combined
> +        compare.  */
> +      ld_arg[1][0] = wide_int_to_tree (lntype, l_const | r_const);
> +      toshift[1][0] = MIN (xll_bitpos, xrl_bitpos);
> +      shifted[1][0] = 0;
> +      bitpos[1][0] = lnbitpos;
> +      bitsiz[1][0] = lnbitsize;
> +
> +      if (parts > 1)
> +       reuse_split_load (ld_arg[1], bitpos[1], bitsiz[1], toshift[1],
> +                         shifted[1], xmask[1],
> +                         lnbitpos + GET_MODE_BITSIZE (lnmode),
> +                         lr_reversep);
> +
> +      /* No masking needed, we know the full constants.  */
> +      r_mask = wi::mask (0, true, lnprec);
> +
> +      /* If the compiler thinks this is used uninitialized below, it's
> +        because it can't realize that parts can only be 2 when
> +        comparing with constants if l_split_load is also true.  This
> +        just silences the warning.  */
> +      rnbitpos = 0;
> +    }
> +
> +  /* Likewise, if the right sides are not constant, align them for the combined
> +     compare.  Also, disallow this optimization if a size, signedness or
> +     storage order mismatch occurs between the left and right sides.  */
> +  else
> +    {
> +      if (ll_bitsize != lr_bitsize || rl_bitsize != rr_bitsize
> +         || ll_unsignedp != lr_unsignedp || rl_unsignedp != rr_unsignedp
> +         || ll_reversep != lr_reversep
> +         /* Make sure the two fields on the right
> +            correspond to the left without being swapped.  */
> +         || ll_bitpos - rl_bitpos != lr_bitpos - rr_bitpos)
> +       return 0;
> +
> +      bool r_split_load;
> +      scalar_int_mode rnmode2;
> +
> +      /* Figure out how to load the bits for the right-hand side of the
> +        combined compare.  As on the left-hand side, we may have to split it,
> +        and then we use two separate compares.  */
> +      first_bit = MIN (lr_bitpos, rr_bitpos);
> +      end_bit = MAX (lr_bitpos + lr_bitsize, rr_bitpos + rr_bitsize);
> +      if (!get_best_mode (end_bit - first_bit, first_bit, 0, 0,
> +                         TYPE_ALIGN (TREE_TYPE (lr_inner)), BITS_PER_WORD,
> +                         volatilep, &rnmode))
> +       {
> +         /* Consider the possibility of recombining loads if any of the
> +            fields straddles across an alignment boundary, so that either
> +            part can be loaded along with the other field.  */
> +         HOST_WIDE_INT align = TYPE_ALIGN (TREE_TYPE (lr_inner));
> +         HOST_WIDE_INT boundary = compute_split_boundary_from_align
> +           (align, lr_bitpos, lr_bitsize, rr_bitpos, rr_bitsize);
> +
> +         if (boundary < 0
> +             /* If we're to split both, make sure the split point is
> +                the same.  */
> +             || (l_split_load
> +                 && (boundary - lr_bitpos
> +                     != (lnbitpos + GET_MODE_BITSIZE (lnmode)) - ll_bitpos))
> +             || !get_best_mode (boundary - first_bit, first_bit, 0, 0,
> +                                align, BITS_PER_WORD, volatilep, &rnmode)
> +             || !get_best_mode (end_bit - boundary, boundary, 0, 0,
> +                                align, BITS_PER_WORD, volatilep, &rnmode2))
> +           return 0;
> +
> +         r_split_load = true;
> +         parts = 2;
> +         if (lr_bitpos >= boundary)
> +           maybe_separate = first1 = true;
> +         else if (lr_bitpos + lr_bitsize <= boundary)
> +           maybe_separate = true;
> +       }
> +      else
> +       r_split_load = false;
> +
> +      /* Find a type that can hold the entire right-hand operand.  */
> +      rnbitsize = GET_MODE_BITSIZE (rnmode);
> +      rnbitpos = first_bit & ~ (rnbitsize - 1);
> +      if (r_split_load)
> +       rnbitsize += GET_MODE_BITSIZE (rnmode2);
> +      rntype = build_nonstandard_integer_type (rnbitsize, 1);
> +      if (!rntype)
> +       return  0;
> +      rnprec = TYPE_PRECISION (rntype);
> +      xlr_bitpos = lr_bitpos - rnbitpos, xrr_bitpos = rr_bitpos - rnbitpos;
> +
> +      /* Adjust for reversed endianness.  */
> +      if (lr_reversep ? !BYTES_BIG_ENDIAN : BYTES_BIG_ENDIAN)
> +       {
> +         xlr_bitpos = rnbitsize - xlr_bitpos - lr_bitsize;
> +         xrr_bitpos = rnbitsize - xrr_bitpos - rr_bitsize;
> +       }
> +
> +      /* Adjust the masks to match the combined type, and combine them.  */
> +      wide_int lr_mask, rr_mask;
> +      if (lr_and_mask.get_precision ())
> +       lr_mask = wi::lshift (wide_int::from (lr_and_mask, rnprec, UNSIGNED),
> +                         xlr_bitpos);
> +      else
> +       lr_mask = wi::shifted_mask (xlr_bitpos, lr_bitsize, false, rnprec);
> +      if (rr_and_mask.get_precision ())
> +       rr_mask = wi::lshift (wide_int::from (rr_and_mask, rnprec, UNSIGNED),
> +                             xrr_bitpos);
> +      else
> +       rr_mask = wi::shifted_mask (xrr_bitpos, rr_bitsize, false, rnprec);
> +      r_mask = lr_mask | rr_mask;
> +
> +      /* Load the right-hand operand of the combined compare.  */
> +      toshift[1][0] = MIN (xlr_bitpos, xrr_bitpos);
> +      shifted[1][0] = 0;
> +
> +      if (!r_split_load)
> +       {
> +         bitpos[1][0] = rnbitpos;
> +         bitsiz[1][0] = rnbitsize;
> +         ld_arg[1][0] = make_bit_field_load (ll_loc[3], lr_inner, lr_arg,
> +                                             rntype, rnbitsize, rnbitpos,
> +                                             lr_unsignedp || rr_unsignedp,
> +                                             lr_reversep, lr_load);
> +       }
> +
> +      /* ... and the second part of the right-hand operand if needed.  */
> +      if (parts > 1)
> +       {
> +         if (r_split_load)
> +           {
> +             gimple *point[2];
> +             point[0] = lr_load;
> +             point[1] = rr_load;
> +             build_split_load (ld_arg[1], bitpos[1], bitsiz[1], toshift[1],
> +                               shifted[1], rl_loc[3], lr_inner, lr_arg,
> +                               rnmode, rnmode2, rnbitpos, lr_reversep, point);
> +           }
> +         else
> +           reuse_split_load (ld_arg[1], bitpos[1], bitsiz[1], toshift[1],
> +                             shifted[1], xmask[1],
> +                             lnbitpos + GET_MODE_BITSIZE (lnmode)
> +                             - ll_bitpos + lr_bitpos, lr_reversep);
> +       }
> +    }
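
The relative-position check above allows the two objects to lay out their
fields at different absolute offsets, as long as the distance between the
two accessed fields is the same on both sides.  A hypothetical pair that
should still qualify, as far as I can tell:

  #include <stdint.h>

  struct X { uint8_t a; uint8_t b; uint8_t c; uint8_t d; }
    __attribute__ ((aligned (4)));
  struct Y { uint8_t pad; uint8_t a; uint8_t b; uint8_t c; uint8_t d; }
    __attribute__ ((packed, aligned (4)));

  int same (const struct X *x, const struct Y *y)
  {
    /* a and c are two bytes apart in both X and Y, even though their
       absolute byte offsets differ between the two structs.  */
    return x->a == y->a && x->c == y->c;
  }
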
> +
> +  /* Now issue the loads for the left-hand combined operand/s.  */
> +  wide_int l_mask = ll_mask | rl_mask;
> +  toshift[0][0] = MIN (xll_bitpos, xrl_bitpos);
> +  shifted[0][0] = 0;
> +
> +  if (!l_split_load)
> +    {
> +      bitpos[0][0] = lnbitpos;
> +      bitsiz[0][0] = lnbitsize;
> +      ld_arg[0][0] = make_bit_field_load (ll_loc[3], ll_inner, ll_arg,
> +                                         lntype, lnbitsize, lnbitpos,
> +                                         ll_unsignedp || rl_unsignedp,
> +                                         ll_reversep, ll_load);
> +    }
> +
> +  if (parts > 1)
> +    {
> +      if (l_split_load)
> +        {
> +          gimple *point[2];
> +          point[0] = ll_load;
> +          point[1] = rl_load;
> +          build_split_load (ld_arg[0], bitpos[0], bitsiz[0], toshift[0],
> +                            shifted[0], rl_loc[3], ll_inner, ll_arg,
> +                            lnmode, lnmode2, lnbitpos, ll_reversep, point);
> +        }
> +      else
> +       reuse_split_load (ld_arg[0], bitpos[0], bitsiz[0], toshift[0],
> +                         shifted[0], xmask[0],
> +                         rnbitpos + GET_MODE_BITSIZE (rnmode)
> +                         - lr_bitpos + ll_bitpos, ll_reversep);
> +    }
> +
> +  /* Compute the compares.  */
> +  for (int i = 0; i < parts; i++)
> +    {
> +      tree op[2] = { ld_arg[0][i], ld_arg[1][i] };
> +      wide_int mask[2] = { l_mask, r_mask };
> +      location_t *locs[2] = { i ? rl_loc : ll_loc, i ? rr_loc : lr_loc };
> +
> +      /* Figure out the masks, and unshare the original operands.  */
> +      for (int j = 0; j < 2; j++)
> +       {
> +         unsigned prec = TYPE_PRECISION (TREE_TYPE (op[j]));
> +         op[j] = unshare_expr (op[j]);
> +
> +         /* Mask out the bits belonging to the other part.  */
> +         if (xmask[j][i].get_precision ())
> +           mask[j] &= xmask[j][i];
> +
> +         if (shifted[j][i])
> +           {
> +             wide_int shift = wide_int::from (shifted[j][i], prec, UNSIGNED);
> +             mask[j] = wi::lrshift (mask[j], shift);
> +           }
> +         mask[j] = wide_int::from (mask[j], prec, UNSIGNED);
> +       }
> +
> +      /* Line up the operands for a compare.  */
> +      HOST_WIDE_INT shift = (toshift[0][i] - toshift[1][i]);
> +
> +      if (shift)
> +       {
> +         int j;
> +         if (shift > 0)
> +           j = 0;
> +         else
> +           {
> +             j = 1;
> +             shift = -shift;
> +           }
> +
> +         tree shiftsz = bitsize_int (shift);
> +         op[j] = fold_build2_loc (locs[j][1], RSHIFT_EXPR, TREE_TYPE (op[j]),
> +                                  op[j], shiftsz);
> +         mask[j] = wi::lrshift (mask[j], shift);
> +       }
> +
> +      /* Convert to the smaller type before masking out unwanted
> +        bits.  */
> +      tree type = TREE_TYPE (op[0]);
> +      if (type != TREE_TYPE (op[1]))
> +       {
> +         int j = (TYPE_PRECISION (type)
> +                  < TYPE_PRECISION (TREE_TYPE (op[1])));
> +         if (!j)
> +           type = TREE_TYPE (op[1]);
> +         op[j] = fold_convert_loc (locs[j][0], type, op[j]);
> +         mask[j] = wide_int::from (mask[j], TYPE_PRECISION (type), UNSIGNED);
> +       }
> +
> +      /* Apply masks.  */
> +      for (int j = 0; j < 2; j++)
> +       if (mask[j] != wi::mask (0, true, mask[j].get_precision ()))
> +         op[j] = build2_loc (locs[j][2], BIT_AND_EXPR, type,
> +                             op[j], wide_int_to_tree (type, mask[j]));
> +
> +      cmp[i] = build2_loc (i ? rloc : lloc, wanted_code, truth_type,
> +                          op[0], op[1]);
> +    }
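
A rough scalar model of the per-part work in this loop, under the simplifying
assumptions that only the left operand needs shifting and the operand types
already agree (a sketch, not the general case handled above):

  #include <stdint.h>

  static int
  compare_part (uint32_t l_word, uint32_t r_word, int l_shift,
                uint32_t l_mask, uint32_t r_mask)
  {
    /* Line the interesting bits up, mask out bits that belong to the
       other part or to unrelated fields, then compare for equality.  */
    l_word >>= l_shift;
    l_mask >>= l_shift;
    return (l_word & l_mask) == (r_word & r_mask);
  }
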
> +
> +  /* Reorder the compares if needed.  */
> +  if (first1)
> +    std::swap (cmp[0], cmp[1]);
> +
> +  /* Prepare to return the resulting compares.  Combine two parts if
> +     needed.  */
> +  if (parts == 1)
> +    result = cmp[0];
> +  else if (!separatep || !maybe_separate)
> +    result = build2_loc (rloc, orig_code, truth_type, cmp[0], cmp[1]);
> +  else
> +    {
> +      result = cmp[0];
> +      *separatep = cmp[1];
> +    }
> +
> +  return result;
> +}
> +
>  /* Try to simplify the AND of two comparisons, specified by
>     (OP1A CODE1 OP1B) and (OP2A CODE2 OP2B), respectively.
>     If this can be simplified to a single expression (without requiring
> diff --git a/gcc/testsuite/gcc.dg/field-merge-1.c b/gcc/testsuite/gcc.dg/field-merge-1.c
> new file mode 100644
> index 0000000000000..1818e104437e1
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/field-merge-1.c
> @@ -0,0 +1,64 @@
> +/* { dg-do run } */
> +/* { dg-options "-O -save-temps -fdump-tree-optimized" } */
> +
> +/* Check that field loads compared with constants are merged, even if
> +   tested out of order, and when fields straddle across alignment
> +   boundaries.  */
> +
> +struct TL {
> +  unsigned char p;
> +  unsigned int a;
> +  unsigned char q;
> +  unsigned int b;
> +  unsigned char r;
> +  unsigned int c;
> +  unsigned char s;
> +} __attribute__ ((packed, aligned (4), scalar_storage_order ("little-endian")));
> +
> +struct TB {
> +  unsigned char p;
> +  unsigned int a;
> +  unsigned char q;
> +  unsigned int b;
> +  unsigned char r;
> +  unsigned int c;
> +  unsigned char s;
> +} __attribute__ ((packed, aligned (4), scalar_storage_order ("big-endian")));
> +
> +#define vc 0xaa
> +#define vi 0x12345678
> +
> +struct TL vL = { vc, vi, vc, vi, vc, vi, vc };
> +struct TB vB = { vc, vi, vc, vi, vc, vi, vc };
> +
> +void f (void) {
> +  /* Which words of    | vL | vB | */
> +  /* are accessed by   |0123|0123| */
> +  if (0 /* the tests?  |    |    | */
> +      || vL.p != vc /* |*   |    | */
> +      || vB.p != vc /* |    |*   | */
> +      || vL.s != vc /* |   *|    | */
> +      || vB.q != vc /* |    | *  | */
> +      || vL.a != vi /* |^*  |    | */
> +      || vB.b != vi /* |    | ^* | */
> +      || vL.c != vi /* |  *^|    | */
> +      || vB.c != vi /* |    |  ^*| */
> +      || vL.b != vi /* | ^^ |    | */
> +      || vL.q != vc /* | ^  |    | */
> +      || vB.a != vi /* |    |^^  | */
> +      || vB.r != vc /* |    |  ^ | */
> +      || vB.s != vc /* |    |   ^| */
> +      || vL.r != vc /* |  ^ |    | */
> +      )
> +    __builtin_abort ();
> +}
> +
> +int main () {
> +  f ();
> +  return 0;
> +}
> +
> +/* { dg-final { scan-tree-dump-times "BIT_FIELD_REF" 8 "optimized" } } */
> +/* { dg-final { scan-assembler-not "cmpb" { target { i*86-*-* || x86_64-*-* } } } } */
> +/* { dg-final { scan-assembler-times "cmpl" 8 { target { i*86-*-* || x86_64-*-* } } } } */
> +/* { dg-final { scan-assembler-times "cmpw" 8 { target { powerpc*-*-* || rs6000-*-* } } } } */
> diff --git a/gcc/testsuite/gcc.dg/field-merge-10.c b/gcc/testsuite/gcc.dg/field-merge-10.c
> new file mode 100644
> index 0000000000000..dca6febb75859
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/field-merge-10.c
> @@ -0,0 +1,36 @@
> +/* { dg-do run } */
> +/* { dg-options "-O" } */
> +
> +/* Check that we don't move loads across stores.  */
> +
> +struct s {
> +  short a;
> +  short b;
> +} __attribute__ ((aligned (4)));
> +
> +struct s p[2] = {
> +  { 42, 0xfe01 },
> +  { 42, 0xfe10 }
> +};
> +
> +void f (void) {
> +  short a0 = p[0].a;
> +  short b0 = p[0].b;
> +  short a1 = p[1].a;
> +  short b1 = p[1].b;
> +  __builtin_memset (p, 0, sizeof (p));
> +  asm ("" : "+m" (p));
> +  if (0
> +      || p[0].a != p[1].a
> +      || p[0].b != p[1].b
> +      )
> +    __builtin_abort ();
> +  if (a0 == a1)
> +    if (b0 == b1)
> +      __builtin_abort ();
> +}
> +
> +int main () {
> +  f ();
> +  return 0;
> +}
> diff --git a/gcc/testsuite/gcc.dg/field-merge-11.c b/gcc/testsuite/gcc.dg/field-merge-11.c
> new file mode 100644
> index 0000000000000..fe627cddd7fdf
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/field-merge-11.c
> @@ -0,0 +1,32 @@
> +/* { dg-do run } */
> +/* { dg-options "-O" } */
> +
> +/* Check that narrowing casts aren't ignored, and that same-field tests at
> +   different widths aren't misoptimized.  */
> +
> +struct s {
> +  short a;
> +  unsigned short b;
> +  int c;
> +} __attribute__ ((aligned (4)));
> +
> +struct s p = { 42, (short)(0xef1 - 0x1000), 0x12345678 };
> +
> +void f (void) {
> +  if (0
> +      || (p.a & 0xcc) != 8
> +      || p.a != 42
> +      || (int)(signed char)p.b != (int)(signed char)(0xef1 - 0x1000)
> +      || (unsigned)(unsigned char)p.b != (unsigned)(unsigned char)(0xef1 - 0x1000)
> +      || (unsigned)p.b != (unsigned short)(0xef1 - 0x1000)
> +      || (int)(short)p.b != (int)(0xef1 - 0x1000)
> +      || (long)(unsigned char)(p.c >> 8) != (long)(unsigned char)0x123456
> +      || p.c != 0x12345678
> +      )
> +    __builtin_abort ();
> +}
> +
> +int main () {
> +  f ();
> +  return 0;
> +}
> diff --git a/gcc/testsuite/gcc.dg/field-merge-12.c b/gcc/testsuite/gcc.dg/field-merge-12.c
> new file mode 100644
> index 0000000000000..7056eb607e904
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/field-merge-12.c
> @@ -0,0 +1,33 @@
> +/* { dg-do run } */
> +/* { dg-options "-O2" } */
> +
> +/* Check that we don't crash when trying to handle masks that don't match the
> +   width of the original type.  */
> +
> +struct s {
> +  long long q;
> +};
> +
> +struct s x1 = { 1 };
> +struct s xm1 = { -1 };
> +struct s x8 = { 8 };
> +struct s x0 = { 0 };
> +
> +bool f(struct s *p)
> +{
> +  int q = (int)p->q;
> +  return (q < 0) || (q & 7);
> +}
> +
> +int main ()
> +{
> +  if (!f (&x1))
> +    __builtin_abort ();
> +  if (!f (&xm1))
> +    __builtin_abort ();
> +  if (f (&x8))
> +    __builtin_abort ();
> +  if (f (&x0))
> +    __builtin_abort ();
> +  return 0;
> +}
> diff --git a/gcc/testsuite/gcc.dg/field-merge-2.c b/gcc/testsuite/gcc.dg/field-merge-2.c
> new file mode 100644
> index 0000000000000..80c573b31ddc7
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/field-merge-2.c
> @@ -0,0 +1,31 @@
> +/* { dg-do run } */
> +/* { dg-options "-O" } */
> +
> +struct TL {
> +  unsigned short a;
> +  unsigned short b;
> +} __attribute__ ((packed, aligned (8)));
> +
> +struct TB {
> +  unsigned char p;
> +  unsigned short a;
> +  unsigned short b;
> +} __attribute__ ((packed, aligned (8)));
> +
> +#define vc 0xaa
> +
> +struct TL vL = { vc, vc };
> +struct TB vB = { vc, vc, vc };
> +
> +void f (void) {
> +  if (0
> +      || vL.b != vB.b
> +      || vL.a != vB.a
> +      )
> +    __builtin_abort ();
> +}
> +
> +int main () {
> +  f ();
> +  return 0;
> +}
> diff --git a/gcc/testsuite/gcc.dg/field-merge-3.c b/gcc/testsuite/gcc.dg/field-merge-3.c
> new file mode 100644
> index 0000000000000..f26e8a96ea04f
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/field-merge-3.c
> @@ -0,0 +1,36 @@
> +/* { dg-do run } */
> +/* { dg-options "-O" } */
> +
> +const int BIG_ENDIAN_P = (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__);
> +
> +struct T1 {
> +  unsigned char p[2];
> +  unsigned short a;
> +  unsigned int z;
> +} __attribute__((__aligned__(8)));
> +
> +struct T2 {
> +  unsigned short p;
> +  unsigned short a;
> +  unsigned int z;
> +} __attribute__((__aligned__(8)));
> +
> +#define vc 0xaa
> +#define vi 0x12345678
> +
> +struct T1 v1 = { { vc + !BIG_ENDIAN_P, vc + BIG_ENDIAN_P }, vc, vi };
> +struct T2 v2 = { (vc << 8) | (vc - 1), vc, vi };
> +
> +void f (void) {
> +  if (0
> +      || v1.p[!BIG_ENDIAN_P] != v2.p >> 8
> +      || v1.a != v2.a
> +      || ((v1.z ^ v2.z) & 0xff00ff00) != 0
> +      || ((v1.z ^ v2.z) & 0x00ff00ff) != 0)
> +    __builtin_abort ();
> +}
> +
> +int main () {
> +  f ();
> +  return 0;
> +}
> diff --git a/gcc/testsuite/gcc.dg/field-merge-4.c b/gcc/testsuite/gcc.dg/field-merge-4.c
> new file mode 100644
> index 0000000000000..c629069e52b2c
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/field-merge-4.c
> @@ -0,0 +1,40 @@
> +/* { dg-do run } */
> +/* { dg-options "-O" } */
> +
> +struct T1 {
> +  unsigned int zn;
> +  unsigned char p;
> +  unsigned char qn;
> +  unsigned short a;
> +  unsigned int z;
> +} __attribute__((__packed__, __aligned__(4)));
> +
> +struct T2 {
> +  unsigned int zn;
> +  unsigned char rn;
> +  unsigned char p;
> +  unsigned char qn;
> +  unsigned short a;
> +  unsigned int z;
> +} __attribute__((__packed__, __aligned__(4)));
> +
> +#define vc 0xaa
> +#define vs 0xccdd
> +#define vi 0x12345678
> +
> +struct T1 v1 = { -1, vc, 1, vs, vi };
> +struct T2 v2 = { -1, 0, vc, 1, vs, vi };
> +
> +void f (void) {
> +  if (0
> +      || v1.p != v2.p
> +      || v1.a != v2.a
> +      || v1.z != v2.z
> +      )
> +    __builtin_abort ();
> +}
> +
> +int main () {
> +  f ();
> +  return 0;
> +}
> diff --git a/gcc/testsuite/gcc.dg/field-merge-5.c b/gcc/testsuite/gcc.dg/field-merge-5.c
> new file mode 100644
> index 0000000000000..1580b14bcc935
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/field-merge-5.c
> @@ -0,0 +1,40 @@
> +/* { dg-do run } */
> +/* { dg-options "-O" } */
> +
> +struct T1 {
> +  unsigned int zn;
> +  unsigned char p;
> +  unsigned char qn;
> +  unsigned short a;
> +  unsigned int z;
> +} __attribute__((__packed__, __aligned__(8)));
> +
> +struct T2 {
> +  unsigned int zn;
> +  unsigned char rn;
> +  unsigned char p;
> +  unsigned char qn;
> +  unsigned short a;
> +  unsigned int z;
> +} __attribute__((__packed__, __aligned__(8)));
> +
> +#define vc 0xaa
> +#define vs 0xccdd
> +#define vi 0x12345678
> +
> +struct T1 v1 = { -1, vc, 1, vs, vi };
> +struct T2 v2 = { -1, 0, vc, 1, vs, vi };
> +
> +void f (void) {
> +  if (0
> +      || v1.p != v2.p
> +      || v1.a != v2.a
> +      || v1.z != v2.z
> +      )
> +    __builtin_abort ();
> +}
> +
> +int main () {
> +  f ();
> +  return 0;
> +}
> diff --git a/gcc/testsuite/gcc.dg/field-merge-6.c b/gcc/testsuite/gcc.dg/field-merge-6.c
> new file mode 100644
> index 0000000000000..7fd48a138d14a
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/field-merge-6.c
> @@ -0,0 +1,26 @@
> +/* { dg-do run } */
> +/* { dg-options "-O" } */
> +/* { dg-shouldfail } */
> +
> +/* Check that the third compare won't be pulled ahead of the second one,
> +   which would prevent the NULL pointer dereference that should cause the
> +   execution to fail.  */
> +
> +struct s {
> +  char a, b;
> +  int *p;
> +};
> +
> +struct s a = { 0, 1, 0 };
> +struct s b = { 0, 0, 0 };
> +
> +int f () {
> +  return (a.a != b.a
> +         || *b.p != *a.p
> +         || a.b != b.b);
> +}
> +
> +int main() {
> +  f ();
> +  return 0;
> +}
> diff --git a/gcc/testsuite/gcc.dg/field-merge-7.c b/gcc/testsuite/gcc.dg/field-merge-7.c
> new file mode 100644
> index 0000000000000..728a29b6fafa9
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/field-merge-7.c
> @@ -0,0 +1,23 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O -fdump-tree-ifcombine-details" } */
> +
> +/* Check that the third compare won't be combined with the first one.  */
> +
> +struct s {
> +  char a, b;
> +  int p;
> +};
> +
> +struct s a = { 0, 0, 0 };
> +struct s b = { 0, 0, 0 };
> +
> +int f () {
> +  return (a.a != b.a || (a.p != b.p && a.b != b.b));
> +}
> +
> +int g () {
> +  return (a.a == b.a && (a.p == b.p || a.b == b.b));
> +}
> +
> +/* { dg-final { scan-tree-dump-not "optimizing" "ifcombine" } } */
> +/* { dg-final { scan-tree-dump-not "BIT_FIELD_REF" "ifcombine" } } */
> diff --git a/gcc/testsuite/gcc.dg/field-merge-8.c b/gcc/testsuite/gcc.dg/field-merge-8.c
> new file mode 100644
> index 0000000000000..ae270e10070e4
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/field-merge-8.c
> @@ -0,0 +1,25 @@
> +/* { dg-do run } */
> +/* { dg-options "-O" } */
> +
> +/* Check that conversions are not thrown away.  */
> +
> +struct s {
> +  unsigned char a;
> +  unsigned short b;
> +} __attribute__ ((aligned (4)));
> +
> +struct s p = { 42, 0xfe };
> +struct s q = { 42, 0xfe | (2 << (__CHAR_BIT__ - 1)) };
> +
> +void f (void) {
> +  if (0
> +      || p.a != q.a
> +      || (unsigned char)p.b != (unsigned char)q.b
> +      )
> +    __builtin_abort ();
> +}
> +
> +int main () {
> +  f ();
> +  return 0;
> +}
> diff --git a/gcc/testsuite/gcc.dg/field-merge-9.c b/gcc/testsuite/gcc.dg/field-merge-9.c
> new file mode 100644
> index 0000000000000..b9e08d8fa37d2
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/field-merge-9.c
> @@ -0,0 +1,38 @@
> +/* { dg-do run } */
> +/* { dg-options "-O -fdump-tree-ifcombine-details" } */
> +
> +/* Check that conversions and selections of similar bit ranges across different
> +   types don't prevent combination.  */
> +
> +struct s1 {
> +  unsigned char b[2];
> +  unsigned char a;
> +} __attribute__ ((aligned (4)));
> +
> +struct s2 {
> +  unsigned short b;
> +  unsigned char a;
> +} __attribute__ ((aligned (4)));
> +
> +static const char le = __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__ ? 1 : 0;
> +
> +struct s1 p = { { -!le , -le }, 42 };
> +struct s2 q = { (le
> +                ? -2 << (__CHAR_BIT__ - 1)
> +                : -1 & ((1 << (__CHAR_BIT__ - 1) << 1) - 1)), 42 };
> +
> +void f (void) {
> +  if (0
> +      || p.a != q.a
> +      || p.b[!le] != (unsigned char)q.b
> +      || p.b[le] != (unsigned char)((q.b >> (__CHAR_BIT__ - 1)) >> 1)
> +      )
> +    __builtin_abort ();
> +}
> +
> +int main () {
> +  f ();
> +  return 0;
> +}
> +
> +/* { dg-final { scan-tree-dump-times "optimizing two comparisons" 2 "ifcombine" } } */
> diff --git a/gcc/testsuite/gcc.target/aarch64/long_branch_1.c b/gcc/testsuite/gcc.target/aarch64/long_branch_1.c
> index 49d8b6a2278ad..9fd9ec0d1d3f6 100644
> --- a/gcc/testsuite/gcc.target/aarch64/long_branch_1.c
> +++ b/gcc/testsuite/gcc.target/aarch64/long_branch_1.c
> @@ -1,6 +1,6 @@
>  /* { dg-do assemble } */
>  /* { dg-timeout-factor 2.0 } */
> -/* { dg-options "-O1 -fno-reorder-blocks -fno-tree-cselim --save-temps" } */
> +/* { dg-options "-O1 -fno-reorder-blocks -fno-tree-cselim -fdisable-tree-ifcombine --save-temps" } */
>
>
>  __attribute__((noinline, noclone)) int
> diff --git a/gcc/tree-ssa-ifcombine.cc b/gcc/tree-ssa-ifcombine.cc
> index a87bf1210776f..de8db2be5572a 100644
> --- a/gcc/tree-ssa-ifcombine.cc
> +++ b/gcc/tree-ssa-ifcombine.cc
> @@ -971,7 +971,19 @@ ifcombine_ifandif (basic_block inner_cond_bb, bool inner_inv,
>                                             outer_cond_code,
>                                             gimple_cond_lhs (outer_cond),
>                                             gimple_cond_rhs (outer_cond),
> -                                           gimple_bb (outer_cond))))
> +                                           gimple_bb (outer_cond)))
> +         && !(t = (fold_truth_andor_for_ifcombine
> +                   (TRUTH_ANDIF_EXPR, boolean_type_node,
> +                    gimple_location (outer_cond),
> +                    outer_cond_code,
> +                    gimple_cond_lhs (outer_cond),
> +                    gimple_cond_rhs (outer_cond),
> +                    gimple_location (inner_cond),
> +                    inner_cond_code,
> +                    gimple_cond_lhs (inner_cond),
> +                    gimple_cond_rhs (inner_cond),
> +                    single_pred (inner_cond_bb) != outer_cond_bb
> +                    ? &ts : 0))))
>         {
>           /* Only combine conditions in this fallback case if the blocks are
>              neighbors.  */
>
> --
> Alexandre Oliva, happy hacker            https://FSFLA.org/blogs/lxo/
>    Free Software Activist                   GNU Toolchain Engineer
> More tolerance and less prejudice are key for inclusion and diversity
> Excluding neuro-others for not behaving ""normal"" is *not* inclusive
