Hello, I am working on porting GCC 4.2.1 to our VLIW processor. Our No. 1 priority is code size. I noticed the following code generation:

Source code:

    if (a == 0x1ff)
      c = a + b;
    return c;

After tree copy propagation:

    foo (a, b, c)
    {
    <bb 2>:
      if (a_2 == 511) goto <L0>; else goto <L1>;

    <L0>:;
      c_5 = b_4 + 511;

      # c_1 = PHI <c_3(2), c_5(3)>;
    <L1>:;
      return c_1;
    }

It generates the following assembly code for our processor:

    tstieqw p0, r0, #0x1ff    // Compare r0 with 0x1ff and write result to a predicate
    p0. addwi r2, r1, #0x1ff  // Predicated add
    sbl [link] : movw r8, r2

On our processor, "p0. addwi r2, r1, #0x1ff" is a long (64-bit) instruction. Ideally, I don't want this copy propagation when the immediate is outside a certain range. In that case it would generate the following code instead:

    tstieqw p0, r0, #0x1ff  // Compare r0 with 0x1ff and write result to a predicate
    p0. addw r2, r1, r0     // Predicated add (32-bit instruction)
    sbl [link] : movw r8, r2

This would save us four bytes.

Of course, for processors without long/short instruction encodings, this copy propagation is beneficial for performance because it removes an unnecessary dependency. Therefore, whether to apply this copy propagation is machine dependent to some degree.

What I do now is add some checks in tree-ssa-copy.c and tree-ssa-dom.c for our target, but this is not very clean. My question is whether there is a better way to implement such machine-dependent tree-level optimization (like the hooks at the RTL level). I believe other processors have a similar problem. What is the common solution?

Thanks,
Bingfeng Mei
Broadcom UK