https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98535
rsandifo at gcc dot gnu.org <rsandifo at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|rguenth at gcc dot gnu.org  |rsandifo at gcc dot gnu.org

--- Comment #9 from rsandifo at gcc dot gnu.org <rsandifo at gcc dot gnu.org> ---
I think the problem is in the way that noutputs_bound is used.  If we
force limit = nvectors to suppress the (attempted) DCE, we get the
correct output.

In other words, the optimisation is trying to make sure we only generate
the bare minimum of vectors needed on each iteration, but in this case it
is generating the wrong ones.  Perhaps it would be easier to get rid of
that and do something similar to the i&1 handling inside the loop.