https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108695
--- Comment #14 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
(In reply to Martin Liška from comment #10)
> > where the XOR16 is implemented as:
> >
> > #define XORN(in1,in2,out,len) \
> >   do { \
> >     uint _i; \
> >     for (_i = 0; _i < len/sizeof(ulong); ++_i) \
> >       *((ulong*)(out)+_i) = *((ulong*)(in1)+_i) ^ *((ulong*)(in2)+_i); \
> >   } while(0)
>
> I can confirm that changing that to:
>
> #define XORN(in1, in2, out, len) \
>   do \
>     { \
>       uint _i; \
>       for (_i = 0; _i < len; ++_i) \
>         *(out + _i) = *(in1 + _i) ^ *(in2 + _i); \
>     } \
>   while (0)
>
> fixes the problem. It seems very close to what I saw here:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83201#c13

It depends on whether those arrays were stored as ulong, or will later be read
as ulong or as something else. One could also use
typedef ulong ulong_alias __attribute__((may_alias));
and use ulong_alias* above, or memcpy into/out of ulong temporaries.