https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111294
--- Comment #10 from Richard Biener <rguenth at gcc dot gnu.org> ---
But this stmt isn't the issue, BB7 is

  <bb 7> [local count: 118111600]:
  # _31 = PHI <b.1_2(6), 0(5)>
  _4 = (unsigned char) _31;
  _6 = (int) a.8_28;
  j_22 = (short int) _4;
  _33 = _31 & 255;
  if (_33 > 11)

and that does have one more stmt.  It's

  if (a.8_28 != 0)
    goto <bb 6>; [34.00%]
  else
    goto <bb 7>; [66.00%]

  <bb 6> [local count: 40157944]:

  <bb 7> [local count: 118111600]:
  # _31 = PHI <b.1_2(6), 0(5)>
  _4 = (unsigned char) _31;
  _6 = (int) a.8_28;
  j_22 = (short int) _4;
  _33 = _31 & 255;
  if (_33 > 11)
    goto <bb 8>; [50.00%]
  else
    goto <bb 9>; [50.00%]

  <bb 8> [local count: 59055800]:

  <bb 9> [local count: 118111600]:
  # iftmp.11_27 = PHI <j_22(7), 1(8)>

so what the cost model fails to see is that j_22 and _4 are only live
on one path to BB9.  It's that odd code again I attempted to remove at
some point:

          /* PHIs in the path will create degenerate PHIS in the
             copied path which will then get propagated away, so
             looking at just the duplicate path the PHIs would
             seem unimportant.

             But those PHIs, because they're assignments to objects
             typically with lives that exist outside the thread path,
             will tend to generate PHIs (or at least new PHI arguments)
             at points where we leave the thread path and rejoin
             the original blocks.  So we do want to account for them.

             We ignore virtual PHIs.  We also ignore cases where BB
             has a single incoming edge.  That's the most common
             degenerate PHI we'll see here.  Finally we ignore PHIs
             that are associated with the value we're tracking as
             that object likely dies.  */
          if (EDGE_COUNT (bb->succs) > 1 && EDGE_COUNT (bb->preds) > 1)
            {
              for (gphi_iterator gsip = gsi_start_phis (bb);
                   !gsi_end_p (gsip);
                   gsi_next (&gsip))
                {
                  gphi *phi = gsip.phi ();
                  tree dst = gimple_phi_result (phi);

                  /* Note that if both NAME and DST are anonymous
                     SSA_NAMEs, then we do not have enough information
                     to consider them associated.  */
                  if (dst != name
                      && name
                      && TREE_CODE (name) == SSA_NAME
                      && (SSA_NAME_VAR (dst) != SSA_NAME_VAR (name)
                          || !SSA_NAME_VAR (dst))
                      && !virtual_operand_p (dst))
                    ++m_n_insns;
                }
            }

there's also a missed canonicalization I think:

  _4 = (unsigned char) _31;
  _6 = (int) a.8_28;
  j_22 = (short int) _4;
  _33 = _31 & 255;

we canonicalize (int)(unsigned char) _31 to _31 & 255 but we fail to do
the same for (short)(unsigned char) _31 or rather we fail to anticipate
that (short)_33 could be used for j_22, eliding _4.  Maybe costing
"lowparts" as zero would be useful here.