https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77308
--- Comment #56 from Bernd Edlinger <bernd.edlinger at hotmail dot de> ---
(In reply to wilco from comment #55)
> (In reply to Bernd Edlinger from comment #39)
> > Created attachment 39940 [details]
> > proposed patch, v2
> >
> > last upload was accidentally truncated.
> > uploaded the right patch.
>
> Right so looking at your patch, I think we should make the LDRD peephole
> change in a separate patch. I tried your foo example on all combinations of
> ARM, Thumb-2, VFP, NEON on various CPUs with both settings of
> prefer_ldrd_strd.
>
> In all cases the current GCC generates LDRD/STRD, even for zero offsets.
> CPUs where prefer_ldrd_strd=false emit LDR/STR for the shifts with
> -msoft-float or -mfpu=vfp (but not -mfpu=neon). This is clearly incorrect
> given that LDRD/STRD is used in all other cases, and prefer_ldrd_strd seems
> to imply whether to prefer using LDRD/STRD in prolog/epilog and inlined
> memcpy.
>
> So that means we should remove the odd checks for codesize and
> current_tune->prefer_ldrd_strd from all the peepholes.

Agreed, I can split the patch.

From what I understand, we should never emit ldrd/strd out of the memmovdi2
pattern when optimizing for speed and disable the peephole in the way I
proposed it in the patch.