http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50749

bin.cheng <amker.cheng at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |amker.cheng at gmail dot com

--- Comment #15 from bin.cheng <amker.cheng at gmail dot com> ---
There must be another scenario for the example. In this case, the example:

int test_0 (char* p, int c)
{
  int r = 0;
  r += *p++;
  r += *p++;
  r += *p++;
  return r;
}

should be translated into something like:
  //...
  ldrb [rx]
  ldrb [rx+1]
  ldrb [rx+2]
  add rx, rx, #3
  //...
This way all loads are independent and can be issued in parallel on a
superscalar machine.
Actually, for targets like ARM, which support post-increment by a constant
(other than the size of the memory access), it can be further changed into:
  //...
  ldrb [rx], #3
  ldrb [rx-2]
  ldrb [rx-1]
  //...
For now the auto-increment pass can't do this optimization. I once had a patch
for this, but benchmarks showed the case is not common.

This case is common, especially after loop unrolling and RTL passes
deliberately break down the long dependence on RX, which I think is right.
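
As a source-level illustration of the first transformation above, here is a
minimal sketch in C. It is not what the auto-increment pass produces; the
function name test_0_offsets is hypothetical and only mirrors the
"ldrb [rx], ldrb [rx+1], ldrb [rx+2], add rx, rx, #3" sequence by hand.

```c
/* Original form from the comment: each load's address depends on the
   previous increment of p, so the loads form a serial dependence chain. */
int test_0 (char *p, int c)
{
  int r = 0;
  r += *p++;
  r += *p++;
  r += *p++;
  return r;
}

/* Hypothetical hand-rewritten form (name is illustrative only): all three
   loads use constant offsets from the same base, so none of them depends
   on an earlier pointer update.  */
int test_0_offsets (char *p, int c)
{
  int r = 0;
  r += p[0];   /* ldrb [rx]   */
  r += p[1];   /* ldrb [rx+1] */
  r += p[2];   /* ldrb [rx+2] */
  p += 3;      /* add rx, rx, #3; dead here because p is not used again */
  return r;
}
```

Both functions return the same sum for any readable 3-byte buffer. The only
difference is how the load addresses are formed.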