Re: ICE with MEM_REF when Pmode is different from word_mode
On 29 May 2012 17:31, Richard Guenther richard.guent...@gmail.com wrote:
> On Tue, May 29, 2012 at 1:57 PM, Mohamed Shafi shafi...@gmail.com wrote:
>> [ ... ]
>> This is due to the addition of the new code in tree-data-ref.c
>> (this was not there in the 4.5 series):
>>
>>     if (TREE_CODE (base) == MEM_REF)
>>       {
>>         if (!integer_zerop (TREE_OPERAND (base, 1)))
>>           {
>>             if (!poffset)
>>               {
>>                 double_int moff = mem_ref_offset (base);
>>                 poffset = double_int_to_tree (sizetype, moff);
>>               }
>>             else
>>               poffset = size_binop (PLUS_EXPR, poffset, TREE_OPERAND (base, 1));
>
> This should use mem_ref_offset, too.

This is present in the trunk also. Will you be submitting a patch for this?

Shafi
ICE with MEM_REF when Pmode is different from word_mode
Hi,

I am porting a private target to GCC 4.6.3. For my target the pointer size is 24 bits and the word size is 32 bits. Moreover, a byte is 32 bits. For the testcase gcc.c-torture/compile/92-1.c I get the following ICE:

    92-1.c: In function 'f':
    92-1.c:18:5: internal compiler error: in size_binop_loc, at fold-const.c:1436
    Please submit a full bug report, with preprocessed source if appropriate.
    See http://gcc.gnu.org/bugs.html for instructions

This is the reduced testcase:

    struct vp { int wa; };
    typedef struct vp *vpt;
    typedef struct vc { int o; vpt py[8]; } *vct;
    typedef struct np *npt;
    struct np { vct d; int di; };

    int f(npt dp)
    {
      vpt *py;
      py = &dp->d->py[dp->di];
      return (int)(py[1])->wa;
    }

The ICE happens in the tree_slp_vectorizer pass. The following is the tree dump just before that:

    ;; Function f (f)

    f (struct np * dp)
    {
      struct vp * D.1232;
      int D.1230;
      unsigned int D.1228;
      int D.1227;
      struct vc * D.1225;

    <bb 2>:
      D.1225_2 = dp_1(D)->d;
      D.1227_4 = dp_1(D)->di;
      D.1228_5 = (unsigned int) D.1227_4;
      D.1232_9 = MEM[(struct vp * *)D.1225_2 + 4B].py[D.1228_5]{lb: 0 sz: 4};
      D.1230_10 = D.1232_9->wa;
      return D.1230_10;
    }

The ICE happens for

    D.1232_9 = MEM[(struct vp * *)D.1225_2 + 4B].py[D.1228_5]{lb: 0 sz: 4};

This is due to the addition of the new code in tree-data-ref.c (this was not there in the 4.5 series):

    if (TREE_CODE (base) == MEM_REF)
      {
        if (!integer_zerop (TREE_OPERAND (base, 1)))
          {
            if (!poffset)
              {
                double_int moff = mem_ref_offset (base);
                poffset = double_int_to_tree (sizetype, moff);
              }
            else
              poffset = size_binop (PLUS_EXPR, poffset, TREE_OPERAND (base, 1));
          }
        base = TREE_OPERAND (base, 0);
      }
    else
      base = build_fold_addr_expr (base);

The assert check in size_binop fails:

    gcc_assert (int_binop_types_match_p (code, TREE_TYPE (arg0),
                                         TREE_TYPE (arg1)));

This is because the modes of arg0 and arg1 are different: one is Pmode and the other is word_mode. This is present in the m32c target, which also has different Pmode and word_mode. Is this a known failure? I cannot find a bug entry for this issue. Should I report this?

Regards,
Shafi
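The mismatch described in this message can be illustrated outside GCC. The following is a minimal sketch in C and not GCC code: `toy_type`, `toy_const`, `toy_types_match`, `toy_convert` and `toy_size_binop_plus` are hypothetical stand-ins, and the sketch assumes that the MEM_REF offset carries a 24-bit pointer-width type while `poffset` has a 32-bit sizetype-like type. It shows why adding the two directly is rejected, and why converting the offset to the common type first (as the `!poffset` branch already does via `mem_ref_offset` and `double_int_to_tree`) avoids the failing check.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for an integer type: only the properties this sketch
   compares.  The real check in GCC looks at more than this.  */
struct toy_type {
    unsigned precision;   /* in bits, e.g. 24 for a pointer-width type */
    bool is_unsigned;
};

/* Toy stand-in for an integer constant with a type attached.  */
struct toy_const {
    struct toy_type type;
    int64_t value;
};

/* Hypothetical analogue of the type check that size_binop asserts on.  */
bool toy_types_match(struct toy_type a, struct toy_type b)
{
    return a.precision == b.precision && a.is_unsigned == b.is_unsigned;
}

/* Hypothetical analogue of rebuilding a constant in another type,
   which is the effect of going through a sizetype conversion.  */
struct toy_const toy_convert(struct toy_type to, struct toy_const c)
{
    struct toy_const r;
    r.type = to;
    r.value = c.value;
    return r;
}

/* Adds two constants.  Returns false when the operand types differ,
   which is the situation where the compiler hits its assert.  */
bool toy_size_binop_plus(struct toy_const a, struct toy_const b,
                         struct toy_const *out)
{
    if (!toy_types_match(a.type, b.type))
        return false;
    out->type = a.type;
    out->value = a.value + b.value;
    return true;
}
```

Calling `toy_size_binop_plus` with a 32-bit `poffset` and a 24-bit offset returns false; converting the offset with `toy_convert` first makes the addition succeed.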
Re: Reloading going wrong. Bug in GCC?
ping !!!

Any help on http://gcc.gnu.org/ml/gcc/2011-09/msg00150.html

shafi

On 14 September 2011 15:07, Mohamed Shafi shafi...@gmail.com wrote:
> Hi, I am working on a 32bit private target which has the following restriction
> [ ... ]
Restricting with Multilib
Hi,

The target that I am porting needs a CPU command-line option, i.e. it doesn't have a default option. Currently it takes 3 variants, say cpu1, cpu2, cpu3. So when I enable the multilib option

    MULTILIB_OPTIONS = mcpu=1/mcpu=2/mcpu=3

I get the following libgcc variants:

    cpu1/libgcc
    cpu2/libgcc
    cpu3/libgcc
    libgcc

That includes one variant for each CPU and a default version. Is there any way to restrict GCC from building the default version?

Regards,
Shafi
Reloading going wrong. Bug in GCC?
Hi,

I am working on a 32-bit private target which has the following restrictions:

1. store/load can happen only through a general purpose register (GP_REGS)
2. the base register should be an address register (AD_REGS)
3. moves between GP_REGS and AD_REGS can happen only through PT_REGS

In a PRE_MODIFY instruction, when both the base register and the output register get spilled, the reloading goes wrong.

Before IRA pass
~~~

    (insn 259 336 317 2 ../rld_bug.c:94 (set (reg:QI 234 [+1 ])
            (mem/s/j/c:QI (pre_modify:PQI (reg/f:PQI 233)
                    (plus:PQI (reg/f:PQI 233)
                        (const_int 1 [0x1]))) [0+1 S1 A32])) 7 {movqi_op} (expr_list:REG_INC (reg/f:PQI 233)
            (nil)))

After IRA pass
~~~

    Reloads for insn # 259
    Reload 0: GP_REGS, RELOAD_FOR_OPADDR_ADDR (opnum = 1), can't combine, secondary_reload_p
            reload_reg_rtx: (reg:PQI 11 g11)
    Reload 1: PT_REGS, RELOAD_FOR_OPERAND_ADDRESS (opnum = 1), can't combine, secondary_reload_p
            reload_reg_rtx: (reg:PQI 12 as0)
            secondary_in_reload = 0
    Reload 2: GP_REGS, RELOAD_FOR_OPERAND_ADDRESS (opnum = 1), can't combine, secondary_reload_p
            reload_reg_rtx: (reg:PQI 11 g11)
    Reload 3: PT_REGS, RELOAD_FOR_OPERAND_ADDRESS (opnum = 1), can't combine, secondary_reload_p
            reload_reg_rtx: (reg:PQI 13 as1)
            secondary_out_reload = 2
    Reload 4: reload_in (PQI) = (reg/f:PQI 233)
            reload_out (PQI) = (reg/f:PQI 233)
            AD_REGS, RELOAD_OTHER (opnum = 1)
            reload_in_reg: (reg/f:PQI 233)
            reload_out_reg: (reg/f:PQI 233)
            reload_reg_rtx: (reg:PQI 31 a3)
            secondary_in_reload = 1, secondary_out_reload = 3
    Reload 5: reload_out (QI) = (reg:QI 234 [+1 ])
            GP_REGS, RELOAD_FOR_OUTPUT (opnum = 0)
            reload_out_reg: (reg:QI 234 [+1 ])
            reload_reg_rtx: (reg:QI 11 g11)

    (insn 744 336 745 2 ../rld_bug.c:94 (set (reg:PQI 11 g11)
            (mem/c:PQI (plus:PQI (reg/f:PQI 32 sp)
                    (const_int -24 [0xffe8])) [99 %sfp+8 S1 A32])) 9 {movpqi_op} (nil))

    (insn 745 744 746 2 ../rld_bug.c:94 (set (reg:PQI 12 as0)
            (reg:PQI 11 g11)) 9 {movpqi_op} (nil))

    (insn 746 745 259 2 ../rld_bug.c:94 (set (reg:PQI 31 a3)
            (reg:PQI 12 as0)) 9 {movpqi_op} (nil))

    (insn 259 746 747 2 ../rld_bug.c:94 (set (reg:QI 11 g11)
            (mem/s/j/c:QI (pre_modify:PQI (reg:PQI 31 a3)
                    (plus:PQI (reg:PQI 31 a3)
                        (const_int 1 [0x1]))) [0+1 S1 A32])) 7 {movqi_op} (expr_list:REG_INC (reg:PQI 31 a3)
            (nil)))

    (insn 747 259 748 2 ../rld_bug.c:94 (set (reg:PQI 13 as1)
            (reg:PQI 31 a3)) 9 {movpqi_op} (nil))

    (insn 748 747 749 2 ../rld_bug.c:94 (set (reg:PQI 11 g11)
            (reg:PQI 13 as1)) 9 {movpqi_op} (nil))

    (insn 749 748 750 2 ../rld_bug.c:94 (set (mem/c:PQI (plus:PQI (reg/f:PQI 32 sp)
                    (const_int -24 [0xffe8])) [99 %sfp+8 S1 A32])
            (reg:PQI 11 g11)) 9 {movpqi_op} (nil))

    (insn 750 749 751 2 ../rld_bug.c:94 (set (mem/c:QI (plus:PQI (reg/f:PQI 32 sp)
                    (const_int -29 [0xffe3])) [99 %sfp+3 S1 A32])
            (reg:QI 11 g11)) 7 {movqi_op} (nil))

After the IRA pass, for insn 259 the modified address is first stored into its spill location and then the modified value is stored. As you can see from the instructions, the same register (g11) is used for Reload 5 and Reload 2, so the modified value gets corrupted and the modified address is stored instead of the modified value (insn 749 and insn 750).

I am not able to figure out where this is going wrong in the reload phase. I suspect that this is a GCC issue. Can someone give me some pointers to resolve this issue?

Regards,
Shafi
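The effect of the register clash can be traced by hand. The following C sketch is a toy model of insns 744 to 750, not anything taken from GCC: `struct machine`, its field names and `run_reloaded_sequence` are hypothetical, the two stack slots are modelled as plain fields, and memory is a small array indexed by the address in a3. Under those assumptions it shows that the slot for the loaded value ends up holding the incremented address.

```c
#include <stdint.h>

/* Register file of the toy model; names follow the RTL dump.  */
enum { G11, AS0, AS1, A3, NREGS };

struct machine {
    int32_t reg[NREGS];
    int32_t mem[16];     /* data memory, addressed through a3 */
    int32_t slot_addr;   /* stack slot %sfp+8: spilled pseudo 233 (address) */
    int32_t slot_val;    /* stack slot %sfp+3: spilled pseudo 234 (value)   */
};

/* Executes the reloaded sequence in the order shown in the dump.  */
void run_reloaded_sequence(struct machine *m)
{
    m->reg[G11] = m->slot_addr;         /* insn 744: load spilled address */
    m->reg[AS0] = m->reg[G11];          /* insn 745: GP_REGS -> PT_REGS   */
    m->reg[A3]  = m->reg[AS0];          /* insn 746: PT_REGS -> AD_REGS   */
    m->reg[A3] += 1;                    /* insn 259: pre_modify of a3     */
    m->reg[G11] = m->mem[m->reg[A3]];   /* insn 259: loaded value in g11  */
    m->reg[AS1] = m->reg[A3];           /* insn 747: AD_REGS -> PT_REGS   */
    m->reg[G11] = m->reg[AS1];          /* insn 748: overwrites the value */
    m->slot_addr = m->reg[G11];         /* insn 749: store new address    */
    m->slot_val  = m->reg[G11];         /* insn 750: stores address again */
}
```

With the spilled address set to 4 and `mem[5]` set to 99, the sequence leaves 5 in both slots; the value 99 loaded by insn 259 is lost at insn 748.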
Issue with delay slot scheduling?
Hi,

I am doing a private port in GCC 4.5.1. For my target I see some strange behavior in delay slot scheduling. On my target the instructions in the delay slots get executed irrespective of whether the branch is taken or not. I have generated the following code after commenting out the call to 'relax_delay_slots' in the function 'dbr_schedule'.

RTL:

    (insn 97 42 51 del1.c:19 (sequence [
                (jump_insn 61 42 38 del1.c:19 (set (pc)
                        (if_then_else (ne (reg:CCF 34 CC)
                                (const_int 0 [0x0]))
                            (label_ref:PQI 86)
                            (pc))) 56 {conditional_branch} (expr_list:REG_BR_PRED (const_int 5 [0x5])
                        (expr_list:REG_DEAD (reg:CCF 34 CC)
                            (expr_list:REG_BR_PROB (const_int 5000 [0x1388])
                                (nil))))
                 -> 86)
                (insn 38 61 43 (set (mem/s/j:QI (reg/f:PQI 28 a0 [orig:62 D.1955 ] [62]) [0 bytes S1 A32])
                        (reg:QI 1 g1 [orig:65 D.1938 ] [65])) 7 {movqi_op} (nil))
                (insn 43 38 51 (set (reg:QI 1 g1 [75])
                        (ior:QI (reg:QI 1 g1 [orig:65 D.1938 ] [65])
                            (reg:QI 3 g3 [77]))) 31 {iorqi3} (expr_list:REG_EQUAL (ior:QI (reg:QI 1 g1 [orig:65 D.1938 ] [65])
                            (const_int 128 [0x80]))
                        (nil)))
            ]) -1 (nil))

    (code_label 51 97 52 1 [2 uses])

    (note 52 51 73 [bb 4] NOTE_INSN_BASIC_BLOCK)

    (jump_insn 73 52 72 (return) 72 {return_rts} (expr_list:REG_BR_PRED (const_int 12 [0xc])
            (nil)))

    (barrier 72 73 86)

    (code_label 86 72 41 5 [1 uses])

    (note 41 86 45 [bb 5] NOTE_INSN_BASIC_BLOCK)

    (insn 45 41 44 del1.c:20 (set (reg:QI 2 g2 [orig:68 ivtmp.7 ] [68])
            (plus:QI (reg:QI 2 g2 [orig:68 ivtmp.7 ] [68])
                (const_int 1 [0x1]))) 13 {addqi3} (nil))

    (insn 44 45 101 del1.c:20 (set (mem/s/j:QI (reg/f:PQI 28 a0 [orig:62 D.1955 ] [62]) [0 bytes S1 A32])
            (reg:QI 1 g1 [75])) 7 {movqi_op} (expr_list:REG_DEAD (reg/f:PQI 28 a0 [orig:62 D.1955 ] [62])
            (expr_list:REG_DEAD (reg:QI 1 g1 [75])
                (nil))))

    (code_label 101 44 79 7 [1 uses])

Corresponding code:

            jmp.ne  .L5;
            st      [a0], g1;       (INSN 38)
            or      g1, g1, g3;     (INSN 43)
    .L1:
            rts;
            nop;
            nop;
    .L5:
            add     g2, g2, 1;      (INSN 45)
            st      [a0], g1;       (INSN 44) - deleted
    .L7:

You can see that INSN 44 and INSN 38 are identical. In 'relax_delay_slots', while processing INSN 97, the second call to 'try_merge_delay_insns' deletes INSN 44, and because of that an unexpected result is generated.

          /* If we own the thread opposite the way this insn branches, see if we
             can merge its delay slots with following insns.  */
          if (INSN_FROM_TARGET_P (XVECEXP (pat, 0, 1))
              && own_thread_p (NEXT_INSN (insn), 0, 1))
            try_merge_delay_insns (insn, next);
          else if (! INSN_FROM_TARGET_P (XVECEXP (pat, 0, 1))
                   && own_thread_p (target_label, target_label, 0))
            try_merge_delay_insns (insn, next_active_insn (target_label));

Deleting INSN 44 would have been proper if the 2nd delay slot insn had not modified g1. But looking at the comments on the function 'try_merge_delay_insns':

    /* Try merging insns starting at THREAD which match exactly the insns in
       INSN's delay list.

       If all insns were matched and the insn was previously annulling, the
       annul bit will be cleared.

       For each insn that is merged, if the branch is or will be non-annulling,
       we delete the merged insn.  */

I think the REGOUT dependency of g1 between instructions 38 and 43 in the delay slots is not being considered by 'try_merge_delay_insns'.

Is this a bug?

Regards,
Shafi
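To see why removing INSN 44 changes the result, the taken-branch path can be stepped through in a toy model. This C sketch is only an illustration under stated assumptions: `struct state` and `run_taken_path` are hypothetical names, both delay-slot instructions are assumed to execute on the taken path (as the message states for this target), and `mem_a0` stands for the word addressed by a0.

```c
#include <stdbool.h>
#include <stdint.h>

/* Registers named in the listing plus the memory word at [a0].  */
struct state {
    int32_t g1, g2, g3;
    int32_t mem_a0;
};

/* Executes the taken-branch path: the two delay-slot insns, then the
   code at .L5.  keep_insn44 selects whether the store at .L5 is still
   present or has been merged away.  */
void run_taken_path(struct state *s, bool keep_insn44)
{
    s->mem_a0 = s->g1;          /* INSN 38, delay slot 1: st [a0], g1   */
    s->g1 = s->g1 | s->g3;      /* INSN 43, delay slot 2: or g1, g1, g3 */
    s->g2 = s->g2 + 1;          /* INSN 45 at .L5: add g2, g2, 1        */
    if (keep_insn44)
        s->mem_a0 = s->g1;      /* INSN 44 at .L5: st [a0], g1          */
}
```

Starting from g1 = 0x01 and g3 = 0x80, the path with INSN 44 leaves 0x81 at [a0], while the path without it leaves 0x01, because INSN 38 stored g1 before INSN 43 modified it.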
Re: Issue with delay slot scheduling?
On 6 September 2011 20:50, Jeff Law l...@redhat.com wrote:
> On 09/06/11 08:46, Mohamed Shafi wrote:
>> Hi, I am doing a private port in GCC 4.5.1. For the my target i see some strange behavior in delay slot scheduling. For my target the instruction in the delay slots gets executed irrespective of whether the branch is taken or not. I have generated the following code after commenting out the call to 'relax_delay_slots' in the function 'dbr_schedule'.
>
> [ ... ]
>
> It looks like you have found a bug. While reorg.c is supposed to work with targets that have multiple delay slots, it's not something that has been extensively tested.
>
>> I think REGOUT dependency of g1 between instructions 38 and 43 in the delay slot is not being considered by 'try_merge_delay_insns'.
>
> You're probably correct.
>
> Jeff

How do I raise a bug report, mine being a private target?

Regards,
Shafi
How to generate loop counter with a different mode ?
Hi all,

I am trying to add support for hardware loops for a 32-bit target. On this target QImode is 32 bits. The loop counter used in the hardware loop construct is one of the 17-bit address registers, which is represented using PQImode. Since the mode for the doloop pattern is determined after loop discovery, it need not always be PQImode. So what I did was to convert the mode of the counter variable to PQImode and then emit a new pattern with PQImode, along with the other bells and whistles required by the target loop construct. I am able to generate the assembly files with the proper loop initialization instructions and all. But the issue is that the loop counter is set to 0 in the body of the loop.

In define_expand (in doloop_end and doloop_begin) I am converting to PQImode using the following construct:

    operands[0] = convert_to_mode (PQImode, operands[0], 0);

The above construct results in an RTL pattern like:

    (insn 33 17 34 4 loop.c:52 (set (reg:PQI 50)
            (truncate:PQI (reg:QI 49))) -1 (nil))

But GCC will extract the loop counter from the doloop pattern generated by the define_expand, which is in PQImode:

    (insn 33 17 34 4 loop.c:52 (set (reg:PQI 50)
            (truncate:PQI (reg:QI 49))) -1 (nil))

    (jump_insn 34 33 20 4 loop.c:52 (parallel [
                (set (pc)
                    (if_then_else (ne (reg:PQI 50)
                            (const_int 1 [0x1]))
                        (label_ref:PQI 30)
                        (pc)))
                (set (reg:PQI 50)
                    (plus:PQI (reg:PQI 50)
                        (const_int -1 [0x])))
                (unspec [
                        (const_int 0 [0x0])
                    ] 3)
                (clobber (scratch:PQI))
            ]) 62 {doloop_end_pqi} (expr_list:REG_BR_PROB (const_int 9100 [0x238c])
            (nil))
     -> 30)

This is the counter value that gets used for doloop_begin. Hence the original loop counter (reg:QI 49) never gets initialized. Due to this, the 'if-conversion' pass modifies the statement to:

    (insn 33 38 34 4 loop.c:52 (set (reg:PQI 50)
            (const_int 0 [0x0])) 9 {movpqi_op} (nil))

This results in the loop counter being set to 0 in the body of the loop. Can someone suggest a solution to get out of this?

Regards,
Shafi
Reloading an auto-increment addresses
Hello all,

I am porting GCC 4.5.1 to a private target. For one particular test the reload pass is being asked to reload the following instruction:

    (insn 45 175 46 11 pr20601-1.c:90 (set (reg/f:PQI 3 g3 [70])
            (mem/f:PQI (pre_inc:PQI (reg/f:PQI 1 g1 [orig:55 prephitmp.16 ] [55])) [2 S1 A32])) 9 {movpqi_op} (expr_list:REG_INC (reg/f:PQI 1 g1 [orig:55 prephitmp.16 ] [55])
            (nil)))

The address in this insn is invalid: the base address should always be held in an address register. The instruction gets reloaded in the following manner:

    (insn 175 43 202 11 pr20601-1.c:90 (set (reg/f:PQI 1 g1 [orig:55 prephitmp.16 ] [55])
            (reg/f:PQI 12 as0 [orig:49 e.4 ] [49])) 9 {movpqi_op} (nil))

    (insn 202 175 203 11 pr20601-1.c:90 (set (reg/f:PQI 1 g1 [orig:55 prephitmp.16 ] [55])
            (plus:PQI (reg/f:PQI 1 g1 [orig:55 prephitmp.16 ] [55])
                (const_int 1 [0x1]))) 14 {addpqi3} (nil))

    (insn 203 202 45 11 pr20601-1.c:90 (set (reg:PQI 28 a0)
            (reg/f:PQI 1 g1 [orig:55 prephitmp.16 ] [55])) 9 {movpqi_op} (nil))

    (insn 45 203 46 11 pr20601-1.c:90 (set (reg/f:PQI 3 g3 [70])
            (mem/f:PQI (reg:PQI 28 a0) [2 S1 A32])) 9 {movpqi_op} (nil))

The issue with this reload is that there is no move operation between GP registers and address registers, so insn 203 is invalid. I am catching these cases in secondary reloads, but auto-increment addressing modes are not handled there. If I try to handle them in TARGET_SECONDARY_RELOAD, I get an assert failure from reload1.c:emit_input_reload_insns() due to the following code:

      /* Auto-increment addresses must be reloaded in a special way.  */
      if (rl->out && ! rl->out_reg)
        {
          /* We are not going to bother supporting the case where a
             incremented register can't be copied directly from
             OLDEQUIV since this seems highly unlikely.  */
          gcc_assert (rl->secondary_in_reload < 0);

How can I overcome this failure? Can someone suggest a solution?

Thanks for the help.

Regards,
Shafi
Re: Reloading an auto-increment addresses
On 11 February 2011 15:28, Paulo J. Matos pocma...@gmail.com wrote:
> On 11/02/11 09:46, Mohamed Shafi wrote:
>> How can i overcome this failure? Can some one suggest a solution?
>
> Have you defined TARGET_LEGITIMATE_ADDRESS_P and also BASE_REG_CLASS correctly for your target?

Yes, I have. The register allocator is allocating the wrong registers for the base registers. This is probably due to the fact that address registers cannot be saved and restored directly; a secondary reload is required. There is also the restriction that there is no move operation between the address registers. For that also a secondary reload is required. (I know it's weird.)

I am trying to figure out why the register allocator is not assigning a base register. But even then, reload could be asked to reload an auto-increment address.

Shafi
Re: ICE in get_constraint_for_component_ref
On 10 February 2011 15:57, Richard Guenther richard.guent...@gmail.com wrote:
> On Thu, Feb 10, 2011 at 6:23 AM, Mohamed Shafi shafi...@gmail.com wrote:
>> Hi all,
>>
>> I am trying to port a private target in GCC 4.5.1.
>> [ ... ]
>> I was wondering if this ICE is due to the fact that this is a 32bit char target ? Can somebody help me with pointers to debug this issue?
>
> Try fixing the * 8 in bitpos_of_field to use BITS_PER_UNIT.

That did the trick. Looking at the code, I assume that this is proper and hence should be committed to the trunk and the 4.5 branch. Will that be done?

Shafi
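The arithmetic behind this fix can be checked with the numbers from the debug_tree dump in this thread. The sketch below is a hedged illustration in C, not the GCC function: `toy_bitpos_of_field` is a hypothetical helper that takes the field offset in units, the extra bit offset, and the number of bits per unit as plain integers. For the field `green` the dump shows an offset of 2 units and a bit offset of 0, on a target where BITS_PER_UNIT is 32.

```c
#include <stdint.h>

/* Bit position of a field from its offset in addressable units plus an
   additional bit offset.  bits_per_unit plays the role of BITS_PER_UNIT;
   hard-coding 8 here is only right on 8-bit-byte targets.  */
int64_t toy_bitpos_of_field(int64_t unit_offset, int64_t bit_offset,
                            int bits_per_unit)
{
    return unit_offset * bits_per_unit + bit_offset;
}
```

With a hard-coded 8 the field `green` would be placed at bit 16, inside the first 32-bit member `start`; with 32 bits per unit it lands at bit 64, which matches the third member of a 96-bit struct.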
Re: ICE in get_constraint_for_component_ref
On 10 February 2011 17:16, Richard Guenther richard.guent...@gmail.com wrote:
> On Thu, Feb 10, 2011 at 12:42 PM, Mohamed Shafi shafi...@gmail.com wrote:
>> On 10 February 2011 15:57, Richard Guenther richard.guent...@gmail.com wrote:
>>> [ ... ]
>>> Try fixing the * 8 in bitpos_of_field to use BITS_PER_UNIT.
>>
>> That did the trick. Looking at the code i assume that this is proper and hence should be committed in the trunk and 4.5 branch. Will that be done?
>
> I'll include it in one of my next bootstraps/tests and commit it.

Thanks Richard :)

Shafi
ICE in get_constraint_for_component_ref
Hi all,

I am trying to port a private target in GCC 4.5.1. Following are the properties of the target:

    #define BITS_PER_UNIT           32
    #define BITS_PER_WORD           32
    #define UNITS_PER_WORD          1

    #define CHAR_TYPE_SIZE          32
    #define SHORT_TYPE_SIZE         32
    #define INT_TYPE_SIZE           32
    #define LONG_TYPE_SIZE          32
    #define LONG_LONG_TYPE_SIZE     32

I am getting an ICE:

    internal compiler error: in get_constraint_for_component_ref, at tree-ssa-structalias.c:3031

for the following testcase:

    struct fb_cmap {
      int start;
      int len;
      int *green;
    };

    extern struct fb_cmap fb_cmap;

    void directcolor_update_cmap(void)
    {
      fb_cmap.green[0] = 34;
    }

The following is the output of debug_tree for the argument that's given to the function get_constraint_for_component_ref:

    component_ref 0x2b6a45618080
        type pointer_type 0x2b6a45559930
            type integer_type 0x2b6a4554a498 int public QI
                size integer_cst 0x2b6a4553c460 constant 32
                unit size integer_cst 0x2b6a4553c488 constant 1
                align 32 symtab 0 alias set -1 canonical type 0x2b6a4554a498 precision 32 min integer_cst 0x2b6a4553c5c8 -2147483648 max integer_cst 0x2b6a4553c5f0 2147483647
                pointer_to_this pointer_type 0x2b6a45559930
            unsigned PQI
            size integer_cst 0x2b6a4553c460 32
            unit size integer_cst 0x2b6a4553c488 1
            align 32 symtab 0 alias set -1 canonical type 0x2b6a45559930

        arg 0 var_decl 0x2b6a45614000 fb_cmap
            type record_type 0x2b6a45602888 fb_cmap type_0 BLK
                size integer_cst 0x2b6a455fc4d8 constant 96
                unit size integer_cst 0x2b6a455fc488 constant 3
                align 32 symtab 0 alias set -1 canonical type 0x2b6a45602888 fields field_decl 0x2b6a45613000 start context translation_unit_decl 0x2b6a4555f7e8 D.1201
                chain type_decl 0x2b6a4555f730 D.1193
            used public external common BLK file pr28675.c line 7 col 23 size integer_cst 0x2b6a455fc4d8 96 unit size integer_cst 0x2b6a455fc488 3
            align 32
            chain function_decl 0x2b6a45616000 directcolor_update_cmap type function_type 0x2b6a45560888
                public static QI file pr28675.c line 9 col 6 align 32 initial block 0x2b6a45619000 result result_decl 0x2b6a45617000 D.1200
                (mem:QI (symbol_ref:PQI (directcolor_update_cmap) [flags 0x3] function_decl 0x2b6a45616000 directcolor_update_cmap) [0 S1 A32]) struct-function 0x2b6a455453f0
        arg 1 field_decl 0x2b6a45613130 green type pointer_type 0x2b6a45559930
            unsigned PQI file pr28675.c line 4 col 7 size integer_cst 0x2b6a4553c460 32 unit size integer_cst 0x2b6a4553c488 1
            align 32 offset_align 32
            offset integer_cst 0x2b6a4553c8c0 constant 2
            bit offset integer_cst 0x2b6a4553cc80 constant 0 context record_type 0x2b6a45602888 fb_cmap
        pr28675.c:11:10

I was wondering if this ICE is due to the fact that this is a 32-bit char target? Can somebody help me with pointers to debug this issue?

Regards,
Shafi
Re: Help with reloading
On 20 December 2010 10:56, Jeff Law l...@redhat.com wrote:
> On 12/15/10 07:14, Mohamed Shafi wrote:
>> Hi, I am doing a port in GCC 4.5.1. The target supports storing immediate values into memory location represented by a symbolic address. So in the move pattern i have given constraints to represent this.
>
> Presumably the target does not support storing an immediate value into other MEMs? ie, the only store-immediate is to a symbolic memory operand, right?

Yes, you are right.

> I think this is a case where you're going to need a secondary reload to force the immediate into a register if the destination is a non-symbolic MEM or a pseudo without a hard reg and its equivalent address is non-symbolic.

I am not sure how I should be implementing this. Currently, in the define_expand for move I have code to force the immediate value into a register if the destination is not a symbolic address. If I understand correctly, this is the only place where I can decide what to do with the source depending on the destination, right?

Moreover, for the pattern

    (insn 27 25 33 4 pr23848-3.c:12 (set (mem/s/j:QI (reg/f:PQI 12 as0 [69]) [0 S1 A32])
            (reg:QI 93)) 7 {movqi_op} (expr_list:REG_DEAD (reg/f:PQI 12 as0 [69])
            (expr_list:REG_EQUAL (const_int 0 [0x0])
                (nil))))

the src operand gets converted by

          /* This is equivalent to calling find_reloads_toplev.
             The code is duplicated for speed.
             When we find a pseudo always equivalent to a constant,
             we replace it by the constant.  We must be sure, however,
             that we don't try to replace it in the insn in which it
             is being set.  */
          int regno = REGNO (recog_data.operand[i]);
          if (reg_equiv_constant[regno] != 0
              && (set == 0 || &SET_DEST (set) != recog_data.operand_loc[i]))
            {
              /* Record the existing mode so that the check if constants are
                 allowed will work when operand_mode isn't specified.  */
              if (operand_mode[i] == VOIDmode)
                operand_mode[i] = GET_MODE (recog_data.operand[i]);

              substed_operand[i] = recog_data.operand[i]
                = reg_equiv_constant[regno];
            }

and since the destination is already selected for reload

          /* If the address was already reloaded,
             we win as well.  */
          else if (MEM_P (operand)
                   && address_reloaded[i] == 1)
            win = 1;

the reload phase never reaches secondary reload. So I do not understand your answer. Could you explain it briefly?

Regards,
Shafi
Re: Help with reloading
On 20 December 2010 19:30, Jeff Law l...@redhat.com wrote:
> On 12/20/10 01:47, Mohamed Shafi wrote:
>>> I think this is a case where you're going to need a secondary reload to force the immediate into a register if the destination is a non-symbolic MEM or a pseudo without a hard reg and its equivalent address is non-symbolic.
>>
>> I am not sure how i should be implementing this. Currently in define_expand for move i have code to force the immediate value into a register if the destination is not a symbolic address. If i understand correctly this is the only place where i can decide what to do with the source depending on the destination. right?
>
> Just changing the movxx expander is not sufficient since for this case you do not know until reload time whether or not a particular insn needs an extra register to implement the move. That's the whole point of the secondary reload mechanism -- to allow you to allocate a scratch register during reloading to handle oddball cases like this.
>
> In your secondary reload code you'll need to check for the case where the destination is a MEM and the source is an unallocated pseudo with a constant equivalent and return a suitable register class for that case.

Jeff, thanks for the reply.

I didn't know that you could do that in the TARGET_SECONDARY_RELOAD hook. Can you point me to some target that does this, i.e. figuring out what the destination is based on the source or vice versa? In my case only the address operand comes into the TARGET_SECONDARY_RELOAD hook during the reload pass. I am not sure how to find out the source for the pattern which has this particular address as the destination.

Sorry for the trouble.

Shafi
Help with reloading
Hi,

I am doing a port in GCC 4.5.1. The target supports storing immediate values into a memory location represented by a symbolic address, so in the move pattern I have given constraints to represent this.

(define_insn "movqi_op"
  [(set (match_operand:QI 0 "nonimmediate_operand" "=!Q,!Q,d,d,d,d,d,d,d,Q,R,S")
        (match_operand:QI 1 "general_operand" "I,J,i,W,J,d,Q,R,S,d,d,d"))]
  ""
  "@
   st.s32\t%0, %1;
   st.u32\t%0, %1;
   set\t%0, %1;
   set.u32\t%0, %1;
   set.u32\t%0, %1;
   move\t%0, %1;
   ld%u0\t%0, %1;
   ld%u0\t%0, %1;
   ld%u0\t%0, %1;
   st%u0\t%0, %1;
   st%u0\t%0, %1;
   st%u0\t%0, %1;"
)

where
  Q represents a symbolic address,
  R represents all addresses formed using SP,
  S represents all addresses formed using address registers,
  I, J, W, i represent various const_ints,
  d represents general registers.

Whenever reload gets a pattern that stores a const_int to a memory operand that is scheduled for reloading, the reload pass matches it with the Q constraints. To avoid this I added the constraint modifier '!' to 'Q'. Even then there is one particular case that causes trouble. It happens when the reload pass gets a pattern where the destination is an illegal address and the source is a pseudo register (no hard register allocated) for which reg_equiv_constant[regno] != 0.

Before IRA pass:

(insn 27 25 33 4 pr23848-3.c:12 (set (mem/s/j:QI (reg/f:PQI 69) [0 S1 A32]) (reg:QI 93)) 7 {movqi_op} (expr_list:REG_DEAD (reg/f:PQI 69) (expr_list:REG_EQUAL (const_int 49 [0x31]) (nil))))

Just before reloading phase:

(insn 27 25 33 4 pr23848-3.c:12 (set (mem/s/j:QI (reg/f:PQI 12 as0 [69]) [0 S1 A32]) (reg:QI 93)) 7 {movqi_op} (expr_list:REG_DEAD (reg/f:PQI 12 as0 [69]) (expr_list:REG_EQUAL (const_int 0 [0x0]) (nil))))

Since reg 93 is not allocated a hard register, it is replaced with reg_equiv_constant[regno], and this combination wins the (Q, I) constraint pair. In that process 'losers' (a variable in the loop over alternatives) becomes 0, so the loop breaks out and returns. Because of this the compiler crashes with the error "insn does not satisfy its constraints".

Any pointers on fixing this?

Regards, Shafi

P.S. When can we merge constraints? What are the criteria for deciding which constraints to merge?
Re: Help with reloading FP + offset addressing mode
On 30 October 2010 05:45, Joern Rennecke joern.renne...@embecosm.com wrote: Quoting Mohamed Shafi shafi...@gmail.com: On 29 October 2010 00:06, Joern Rennecke joern.renne...@embecosm.com wrote: Quoting Mohamed Shafi shafi...@gmail.com: Hi, I am doing a port in GCC 4.5.1. For the port 1. there is only (reg + offset) addressing mode only when reg is SP. Other base registers are not allowed 2. FP cannot be used as a base register. (FP based addressing is done by copying it into a base register) In order to take advantage of FP elimination (this will create SP + offset addressing), what i did the following 1. Created a new register class (address registers + FP) and used this new class as the BASE_REG_CLASS Stop right there. You need to distinguish between FRAME_POINTER_REGNUM and HARD_FRAME_POINTER_REGNUM. From the description given in the internals, i am not able to understand why you suggested this. Could you please explain this? In order to trigger reloading of the address, you have to have a register elimination, even if the stack pointer is not a suitable destinatination for the elimination. Also, if you want to reload do the work for you, you must not lie to it about the addressing capabilities of an actual hard register. Hence, you need separate hard and soft frame pointers. If you have them, but conflate them when you describe what you are doing in your port, you are not only likely to confuse the listener/reader, but also your documentation, your code, and ultimately yourself. Having a FRAME_POINTER_REGNUM and HARD_FRAME_POINTER_REGNUM will trigger reloading of address. 
But for the following pattern (insn 3 2 4 2 test.c:120 (set (mem/c/i:QI (plus:QI (reg/f:QI 35 SFP) (const_int 1 [0x1])) [0 c+0 S1 A32]) (reg:QI 0 g0 [ c ])) 7 {movqi_op} (nil)) where SFP is FRAME_POINTER_REGNUM, an elimination will result in (insn 3 2 4 2 test.c:120 (set (mem/c/i:QI (plus:QI (reg/f:QI 27 as15) (const_int 1 [0x1])) [0 c+0 S1 A32]) (reg:QI 0 g0 [ c ])) 7 {movqi_op} (nil)) where as15 is the HARD_FRAME_POINTER_REGNUM. But remember this new address is not valid (as only SP is allowed in this addressing mode). When the above pattern is reloaded i get: (insn 28 27 4 2 test.c:120 (set (mem/c/i:QI (plus:QI (reg:QI 28 a0) (const_int 1 [0x1])) [0 c+0 S1 A32]) (reg:QI 3 g3)) -1 (nil)) I get unrecognizable insn ICE, because this addressing mode is not valid. I believe this happens because when the reload_pass get the address of the form (reg + off), it assumes that the address is invalid due to one of the following: 1. 'reg' is not a suitable base register 2. the offset is out of range 3. the address has an eliminatable register as a base register. Is there any way to over come this one? Any help is appreciated. Shafi
A question about combining constraints
Hi,

For a private target that I am porting in GCC 4.5 I have the following pattern in my md file for call value:

(define_insn "call_value_op"
  [(set (match_operand 0 "register_operand" "=da")
        (call (mem:QI (match_operand:QI 1 "call_operand" "Wd"))
              (match_operand:QI 2 "" "")))]
  ""
  "jsr\\t%1"
  [(set_attr "slottable" "has_slot")]
)

All the constraints are one-letter constraints for my target. Here 'W' is for symbol_ref and all others are register constraints. For the particular combination where operand 0 is 'a' and operand 1 is 'W' I got the following ICE:

error: unable to generate reloads for:
(call_insn 11 4 12 2 test.c:7 (set (reg:QI 12 as0) (call (mem:QI (symbol_ref:QI (malloc) [flags 0x41] function_decl 0x2b5733ff3600 __builtin_malloc) [0 S1 A32]) (const_int 0 [0x0]))) 50 {call_value_op} (expr_list:REG_DEAD (reg:QI 0 g0) (expr_list:REG_EH_REGION (const_int 0 [0x0]) (nil))) (expr_list:REG_DEP_TRUE (use (reg:QI 0 g0)) (nil)))

I get this ICE because the constraints are not matched properly. The ICE goes away when I write the constraints as: =ad, Wd or a,a,d,d, , W,W,d,d

So I have the following questions:
1. Why are the constraints not matched here?
2. When can I combine the constraints?

Regards, Shafi
Re: A question about combining constraints
On 12 November 2010 18:39, Joern Rennecke amyl...@spamcop.net wrote: Quoting Mohamed Shafi shafi...@gmail.com: So i have the following questions: 1. Why is that constraints are not matched here? Please read the node Register Classes in doc/tm.texi. I am sorry, could you please highlight the relevant portion for me? In the pattern that I have given, the combination (a,W) satisfies the pattern. But it is not matched because I have given the constraints as (da,Wd). I know that we can combine the constraints together. Shafi
Opinion on a hardware feature for conditional instructions
Hi all, I need an opinion on a design question. I am doing a port for a private target in GCC 4.5.1. We are also designing the hardware alongside the development of the build tools. Currently we don't have enough bits in the encoding to support conditional instructions the way ARM does, i.e. where you have the option to decide whether to update the status flags or not. So what is the next best thing to have? 1. Allow both conditional and non-conditional instructions to update the status flags 2. Allow only non-conditional instructions to update the status flags Could you please let me know your thoughts on this and the reason for your choice? Regards, Shafi
Re: Help with reloading FP + offset addressing mode
On 30 October 2010 05:45, Joern Rennecke joern.renne...@embecosm.com wrote: Quoting Mohamed Shafi shafi...@gmail.com: On 29 October 2010 00:06, Joern Rennecke joern.renne...@embecosm.com wrote: Quoting Mohamed Shafi shafi...@gmail.com: Hi, I am doing a port in GCC 4.5.1. For the port 1. there is only (reg + offset) addressing mode only when reg is SP. Other base registers are not allowed 2. FP cannot be used as a base register. (FP based addressing is done by copying it into a base register) In order to take advantage of FP elimination (this will create SP + offset addressing), what i did the following 1. Created a new register class (address registers + FP) and used this new class as the BASE_REG_CLASS Stop right there. You need to distinguish between FRAME_POINTER_REGNUM and HARD_FRAME_POINTER_REGNUM. From the description given in the internals, i am not able to understand why you suggested this. Could you please explain this? In order to trigger reloading of the address, you have to have a register elimination, even if the stack pointer is not a suitable destinatination for the elimination. Also, if you want to reload do the work for you, you must not lie to it about the addressing capabilities of an actual hard register. Hence, you need separate hard and soft frame pointers. Debugging sessions of the reload pass tells me that if the reload_pass get the address of the form (reg + off), it assumes one of the following: 1. the address is invalid because 'reg' is not a suitable base register 2. the offset is out of range 3. the address has an eliminatable register as a base register. Depending on what it finds, reload_pass reloads the address accordingly. So for my target when the pass encounters the address of the form: (plus:QI (reg/f:QI 33 ArgP) (const_int -2 [0xfffe])) it eliminates the arg pointer to either stack or frame pointer and reloads it. 
If the base register is FP, during reloading it just reloads the FP with a valid base register, but then the address becomes invalid. The reload pass cannot figure out that the addressing mode itself is invalid due to the wrong base register. Since SP is the only valid register among the base registers that can form the (reg + off) addressing mode, for reload to work properly I would have to allow this addressing mode only when SP is the base register, even in non-strict mode. But then I would lose a lot of opportunities when elimination happens in favour of SP. Hence I allow the above form of address for all frame-related pseudos. So to respond to your comments: I agree that as far as possible the port has to be truthful to the reload pass about its addressing mode capabilities, but I am not sure that distinguishing between FRAME_POINTER_REGNUM and HARD_FRAME_POINTER_REGNUM will help my cause. Do you agree? Or am I not understanding what your suggestion implies? Shafi
Help with reloading FP + offset addressing mode
Hi,

I am doing a port in GCC 4.5.1. For the port:
1. The (reg + offset) addressing mode exists only when reg is SP. Other base registers are not allowed.
2. FP cannot be used as a base register. (FP-based addressing is done by copying it into a base register.)

In order to take advantage of FP elimination (this will create SP + offset addressing), I did the following:

1. Created a new register class (address registers + FP) and used this new class as the BASE_REG_CLASS.

2. Defined HARD_REGNO_OK_FOR_BASE_P as follows:

#define HARD_REGNO_OK_FOR_BASE_P(NUM) \
  ((NUM) < FIRST_PSEUDO_REGISTER \
   && (((reload_completed || reload_in_progress) ? 0 : (NUM) == FP_REG) \
       || REGNO_REG_CLASS (NUM) == ADD_REGS))

3. In legitimate_address_p I have the following:

  if (REGNO (x) == FP_REG)
    {
      if (strict)
        return false;
      else
        return true;
    }
  else if (strict)
    return STRICT_REG_OK_FOR_BASE_P (REGNO (x));
  else
    return NONSTRICT_REG_OK_FOR_BASE_P (REGNO (x));

But when FP doesn't get eliminated I get an address of the form

(plus:QI (reg/f:QI 27 as15) (const_int 2))

which gets reloaded by replacing FP with an address register other than SP. I am guessing this happens because of the modified BASE_REG_CLASS. I haven't confirmed this. To overcome this, in legitimize_reload_address I have the following:

  if (GET_CODE (*x) == PLUS
      && REG_P (XEXP (*x, 0))
      && REGNO (XEXP (*x, 0)) < FIRST_PSEUDO_REGISTER
      && GET_CODE (XEXP (*x, 1)) == CONST_INT
      && XEXP (*x, 0) == frame_pointer_rtx)
    {
      /* GCC will by default reload the FP into a BASE_CLASS_REG, which
         results in an invalid address. For us, the best thing to do is
         move the whole expression to a REG. */
      push_reload (*x, NULL_RTX, x, NULL, SPAA_REGS, mode, VOIDmode, 0, 0,
                   opnum, (enum reload_type) type);
      return 1;
    }

Does my logic make sense? Is there any better way to implement this?
With this implementation for the following sequence : (insn 9 6 10 2 fun_calls.c:12 (set (reg/f:QI 42) (mem/f/c/i:QI (plus:QI (reg/f:QI 33 AP) (const_int -2 [0xfffe])) [0 f+0 S1 A32])) 9 {movqi_op} (nil)) (insn 10 9 11 2 fun_calls.c:12 (set (reg:QI 43) (const_int 60 [0x3c])) 7 {movqi_op} (nil)) I am getting the following output: (insn 45 6 47 2 fun_calls.c:12 (set (reg:QI 28 a0) (const_int 2 [0x2])) 9 {movqi_op} (nil)) (insn 47 45 48 2 fun_calls.c:12 (set (reg:QI 28 a0) (reg/f:QI 27 as15)) 9 {movqi_op} (nil)) (insn 48 47 49 2 fun_calls.c:12 (set (reg:QI 28 a0) (plus:QI (reg:QI 28 a0) (const_int 2 [0x2]))) 14 {addqi3} (expr_list:REG_EQUIV (plus:QI (reg/f:QI 27 as15) (const_int 2 [0x2])) (nil))) (insn 49 48 10 2 fun_calls.c:12 (set (reg/f:QI 0 g0 [42]) (mem/f/c/i:QI (reg:QI 28 a0) [0 f+0 S1 A32])) 9 {movqi_op} (nil)) insn 45 is redundant. Is this generated because the legitimize_reload_address is wrong? Any hints as to why the redundant instruction gets generated? Regards, Shafi
Re: Help with reloading FP + offset addressing mode
On 29 October 2010 00:06, Joern Rennecke joern.renne...@embecosm.com wrote: Quoting Mohamed Shafi shafi...@gmail.com: Hi, I am doing a port in GCC 4.5.1. For the port 1. there is only (reg + offset) addressing mode only when reg is SP. Other base registers are not allowed 2. FP cannot be used as a base register. (FP based addressing is done by copying it into a base register) In order to take advantage of FP elimination (this will create SP + offset addressing), what i did the following 1. Created a new register class (address registers + FP) and used this new class as the BASE_REG_CLASS Stop right there. You need to distinguish between FRAME_POINTER_REGNUM and HARD_FRAME_POINTER_REGNUM. From the description given in the internals, i am not able to understand why you suggested this. Could you please explain this? Shafi
Need help in deciding the instruction set for a new target.
Hello all,

I am trying to do a port on GCC 4.5. The target has a memory resolution of 32 bits, i.e. char is 32 bits on the target (addr 0 selects the 1st 32 bits and addr 1 selects the 2nd 32 bits). It has only word (32-bit) access. In terms of address resolution this target is similar to c4x, which became obsolete in GCC 4.2.

There are two ways to implement this port. One is to have BITS_PER_UNIT == 32, like c4x, and the other is to have a normal C-like char == 8, short == 16, and int == 32. We are thinking about having BITS_PER_UNIT == 32. Yes, I know the support for such a target has bit-rotted in GCC. I am currently trying to remove it.

In the meantime, we are in the process of finalizing the instructions. The current instruction set has support for 32-bit immediate data only in move operations, i.e.

  move src1GP, #imm32

For all other operations like div, sub, add, compare, modulus, load and store, the support is only for 16-bit immediates. For all these instructions there is a separate flavor for sign and zero extension, i.e.

  mod.s32 srcdstGP, #imm16  // 32%imm16 signed modulus
  mod.u32 srcdstGP, #imm16  // 32%imm16 unsigned modulus
  cmp.s32 src1GP, #imm16    // signed register to 16-bit immediate compare
  cmp.u32 src1GP, #imm16    // unsigned register to 16-bit immediate compare
  sub.s32 srcdstGP, #imm16  // signed 16-bit register to immediate subtract
  sub.u32 srcdstGP, #imm16  // unsigned 16-bit register to immediate subtract

I want to know whether it is good to have both sign and zero extension for 16-bit immediates. Will it be of any use with a configuration where char == short == int == 32 bits? Will I be able to support these kinds of instructions in a GCC port? Or would it be better to have separate sign and zero extension instructions, which the current instruction set doesn't have? Do I need separate sign and zero extension instructions along with the above instructions? It would be of great help if you could guide me in deciding these instructions.

Regards, Shafi
Help for target with BITS_PER_UNIT = 16
Hello all,

I am trying to port GCC 4.5.1 for a processor that has the following addressing capability:

The data memory address space of 64K bytes is represented by a total of 15 bits, with each address selecting a 16-bit element. When using the address register, the LSB of the address register (AD) points to a 16-bit field in data memory. If a data memory line is 128 bits there are eight 16-bit elements per data memory line. We use little endian addressing, so if AD=0, bits [15:0] of data memory line address 0 would be selected. If AD=1, bits [31:16] of data memory line address 0 would be selected. If AD=9, bits [31:16] of data memory line address 1 would be selected.

So if I have the following program

short arr[5] = {11,12,13,14,15};
int foo ()
{
  short a = arr[1] + arr[3];
  return a;
}

and assume that short is 16 bits and a short address is 2-byte aligned, then I expect the following code to be generated:

mov a0, #arr       // Load the address
mov a1, a0         // Copy the address
add a1, 1          // Increment the location by 1 so that the address points to arr[1]
ld.16 g0, (a1)     // Load the value 12 into g0
mov a1, a0         // Copy the address
add a1, 3          // Increment the location by 3 so that the address points to arr[3]
ld.16 g1, (a1)     // Load the value 14 into g1
add g1, g1, g0     // Add 12 and 14

For the following code:

short arr[5] = {11,12,13,14,15};
int foo ()
{
  short a, b;
  a = (short) (&arr[3] - &arr[1]);                 // a is 2 after this operation
  b = (short) ((char*)&arr[3] - (char*)&arr[1]);   // b is 4 after this operation
  return a;
}

My question is: should I set the macro BITS_PER_UNIT = 16 to get code generated like this? From IRC chat I realize that BITS_PER_UNIT != 8 is seriously rotten. If that is the case, how can I proceed to port this target?

Regards, Shafi
how to identify a part of a multi-word register
Hi, I am doing a port for a 32bit target in GCC 4.4.0. I need a way to identify that a register is part of a multiword register. I need to emit an instruction that works on LSW of the double word register on move instructions. Currently the target splits the DImode and DFmode moves after reloading. So i am able to generate the required instruction while doing the split. But it seems that sometimes the subreg pass splits the multiword register into SImode or SFmode register references before reg-alloc. Since it is not required to split these moves, I am not able to insert the required instruction for LSW. So I was wondering if it is possible to recognize a register as a part of a multiword register? In the rtl-dumps there are expressions like : (insn 255 254 256 2 pr28634.c:13 (set (mem/v/c/i:SI (plus:SI (reg/f:SI 49 sp) (const_int -16 [0xfff0])) [2 y+0 S4 A64]) (reg:SI 2 d2)) 2 {*movsi_internal} (nil)) (insn 256 255 257 2 pr28634.c:13 (set (mem/v/c/i:SI (plus:SI (reg/f:SI 49 sp) (const_int -12 [0xfff4])) [2 y+4 S4 A32]) (reg:SI 3 d3 [+4 ])) 2 {*movsi_internal} (nil)) which points out that d3 is part of a multiword register. Looking into the gcc sources I find that this is done with the help of REG_OFFSET macro. So can I use this macro to identify a register as a part of multiword register? Is there any other way to do this? Regards, Shafi
Question about peephole2 and addressing mode
Hello all,

I am doing a port for a 32-bit target in GCC 4.4.0. The target supports the (base + offset) addressing mode for QImode store instructions but not for QImode load instructions. GCC doesn't take the middle path: it either supports an addressing mode completely or doesn't support it at all. I tried a lot of hacks to support the (base + offset) addressing mode only for QImode store instructions. After a long fight I finally gave up and removed the QImode support for this addressing mode completely in the GO_IF_LEGITIMATE_ADDRESS macro.

Now I am pursuing an alternate solution: have peephole2 patterns implement the QImode (base + offset) addressing mode for store instructions. How does that sound? I have written a peephole2 pattern like:

(define_peephole2
  [(parallel
     [(set (match_operand:SI 0 "register_operand" "")
           (plus:SI (match_operand:SI 1 "register_operand" "")
                    (match_operand:SI 2 "const_int_operand" "")))
      (clobber (reg:CCC CC_REGNUM))
      (clobber (reg:CCO EMR_REGNUM))])
   (set (mem:QI (match_dup 0))
        (match_operand:QI 3 "register_operand" ""))]
  "REGNO_OK_FOR_BASE_P (REGNO (operands[1]))
   && constraint_satisfied_p (operands[2], CONSTRAINT_N)"
  [(set (mem:QI (plus:SI (match_dup 1) (match_dup 2)))
        (match_dup 3))]
  ""
)

In the rtl dumps just before the peephole2 pass I get

(insn 213 211 215 39 20010408-1.c:71 (parallel [ (set (reg/f:SI 16 r0 [121]) (plus:SI (reg/v/f:SI 18 r2 [orig:93 p ] [93]) (const_int -1 [0x]))) (clobber (reg:CCC 50 sr)) (clobber (reg:CCO 54 emr)) ]) 18 {addsi3} (expr_list:REG_UNUSED (reg:CCO 54 emr) (expr_list:REG_UNUSED (reg:CCC 50 sr) (nil))))

(insn 215 213 214 39 20010408-1.c:71 (set (mem/f/c/i:SI (plus:SI (reg/f:SI 23 r7) (const_int -32 [0xffe0])) [5 s+0 S4 A32]) (reg/v/f:SI 18 r2 [orig:93 p ] [93])) 2 {*movsi_internal} (expr_list:REG_DEAD (reg/v/f:SI 18 r2 [orig:93 p ] [93]) (nil)))

(insn 214 215 284 39 20010408-1.c:71 (set (mem:QI (reg/f:SI 16 r0 [121]) [0 S1 A8]) (reg/v:QI 6 d6 [orig:92 ch ] [92])) 0 {*movqi_internal} (expr_list:REG_DEAD (reg/f:SI 16 r0 [121]) (expr_list:REG_DEAD (reg/v:QI 6 d6 [orig:92 ch ] [92]) (nil))))

This is not matched by the peephole2 pattern: insn 215 sits between the add (insn 213) and the QImode store (insn 214). After debugging I see that the function 'peephole2_insns' matches only consecutive insns. Is that true? Is there a way to overcome this?

Another issue. In another instance peephole2 matched, but the generated pattern was not recognized because GO_IF_LEGITIMATE_ADDRESS was rejecting the addressing mode. Since the peephole2 pass runs after reload, I changed the GO_IF_LEGITIMATE_ADDRESS macro to allow the addressing mode once reload is completed. So now the check is something like this:

    case PLUS:
      {
        rtx offset = XEXP (x, 1);
        rtx base = XEXP (x, 0);

        if (!(BASE_REG_RTX_P (base, strict) || STACK_REG_RTX_P (base)))
          return 0;

        /* For QImode the target does not support (base + offset)
           addresses in the load instructions. So we disable this
           addressing mode till reload is completed. */
        if (!reload_completed && mode == QImode
            && BASE_REG_RTX_P (base, strict))
          return 0;

I haven't run the testsuite, but is it OK to have it like this? Please let me know your thoughts on this. Thanks for your time.

Regards, Shafi
Re: How to implement pattens with more that 30 alternatives
2009/12/22 Richard Earnshaw rearn...@arm.com: On Mon, 2009-12-21 at 18:44 +, Paul Brook wrote: I am doing a port in GCC 4.4.0 for a 32 bit target. As a part of scheduling framework i have to write the move patterns with more clarity, so that i could control the scheduling with the help of attributes. Re-writting the pattern resulted in movsi pattern with 41 alternatives :( Use rtl expressions instead of alternatives. e.g. arm.md:arith_shiftsi Or use the more modern iterators approach. Aren't iterators for generating multiple insns (e.g. movsi and movdi) from the same pattern, whereas in this case we have a single insn that needs to accept many different operand combinartions? Yes, but that is often better, I suspect, than having too fancy a pattern that breaks the optimization simplifications that genrecog does. Note that the attributes that were requested could be made part of the iterator as well, using a mode_attribute. I can't find a back-end that does this. Can you show me a example? Regards, Shafi
How to implement pattens with more that 30 alternatives
Hi all,

I am doing a port in GCC 4.4.0 for a 32-bit target. As part of the scheduling framework I have to write the move patterns with more clarity, so that I can control the scheduling with the help of attributes. Rewriting the pattern resulted in a movsi pattern with 41 alternatives :(

When I specify the attributes it seems that all the alternatives above 31 are given the default value of the attribute. This is done in the generated file insn-attrtab.c. The following is one such piece of code:

    case 2:  /* *movsi_internal */
      extract_constrain_insn_cached (insn);
      if (((1 << which_alternative) & 0xf))
        {
          return DELAY_SLOT_TYPE_CLOB_SR;
        }
      else if (((1 << which_alternative) & 0x30))
        {
          return DELAY_SLOT_TYPE_RW_SP;
        }
      else if (which_alternative == 6)
        {
          return DELAY_SLOT_TYPE_CLOB_SR;
        }
      else if (((1 << which_alternative) & 0x1fff80))
        {
          return DELAY_SLOT_TYPE_COMMON;
        }
      else if (((1 << which_alternative) & 0x1e0))
        {
          return DELAY_SLOT_TYPE_RW_SP;
        }
      else if (which_alternative == 25)
        {
          return DELAY_SLOT_TYPE_READ_SR;
        }
      else if (which_alternative == 26)
        {
          return DELAY_SLOT_TYPE_READ_EMR;
        }
      else if (which_alternative == 27)
        {
          return DELAY_SLOT_TYPE_COMMON;
        }
      else if (which_alternative == 28)
        {
          return DELAY_SLOT_TYPE_WRITE_SR;
        }
      else
        {
          return DELAY_SLOT_TYPE_COMMON;
        }

As you can see from the above code, all the alternatives numbered higher than 31 will always get the default value of the attribute. This is because GCC assumes that the target has only 31 alternatives. Even after changing the macro

#define MAX_RECOG_ALTERNATIVES 30

in the file recog.h there is no change in this assumption (which I think should have affected the attribute calculation). I guess that if I set need_64bit_hwint=yes this problem should go away. I haven't checked this. But I don't want to do that, since it means that I will have to change all the dependencies that are affected by this change. Is there any other solution for my problem? Any help is appreciated.

Regards, Shafi
Re: How to support 40bit GP register - Take two
2009/12/18 Hans-Peter Nilsson h...@bitrange.com: On Fri, 20 Nov 2009, Mohamed Shafi wrote: I tried implementing the suggestion given by Richard, but got into issues. The GCC frame work is written assuming that there are no modes with HOST_BITS_PER_WIDE_INT GET_MODE_BITSIZE (mode) 2 * HOST_BITS_PER_WIDE_INT. (Not seeing a reply regarding this issue, so here's mine, belated:) Perhaps a wart, but with a 64-bit HOST_BITS_PER_WIDE_INT, would that affect your port? It's not? Just set need_64bit_hwint=yes in config.gcc. And send a patch for the introductory comment in that file, unless your port already matches the BITS_PER_WORD 32 bits condition. Thanks Hans for yourr reply I have already tried that. What you are suggesting is the first solution that i got from Richard Henderson. I have mentioned the issues if faced with this in my mail. The GCC frame work is written assuming that there are no modes with HOST_BITS_PER_WIDE_INT GET_MODE_BITSIZE (mode) 2 * HOST_BITS_PER_WIDE_INT. So i had to hack at places to get things working. For my target the BITS_PER_WORD == 32. The mode that i am using is RImode (5bytes) Regards, Shafi
How to support 40bit GP register - Take two
Hello all,

I am porting GCC 4.4.0 for a 32-bit target. The target has 40-bit data registers and 32-bit address registers. Both can be used as general purpose registers. All load and store operations are 32-bit. If a 40-bit data register is involved in a load/store, the register gets sign extended. Whenever there is a move from an address register to a data register, sign extension is performed automatically.

Currently GCC generates code as for a 32-bit register target. Since the data registers are 40 bits wide, sign/zero extension has to be performed after/before some operations for the result to be correct. So at present the results produced by the port are not correct, and I need a solution to fix this. I had mailed about this previously. You can read about it here: http://www.mail-archive.com/gcc@gcc.gnu.org/msg47224.html

I tried implementing the suggestion given by Richard, but ran into issues. The GCC framework is written assuming that there are no modes with HOST_BITS_PER_WIDE_INT < GET_MODE_BITSIZE (mode) < 2 * HOST_BITS_PER_WIDE_INT. Moreover I am getting ICEs whenever there is an optimization/operation related to subreg (GCC tries to split RImode values). RImode is 5 bytes and uses SImode load/store instructions, so GCC generates offsets/addresses that are not 32-bit aligned. Currently I am hacking the compiler all the way to get an executable (though I have not tested the output of the resulting executables).

Even if I somehow manage to get correct output, there is the issue of using 32-bit registers in RImode instructions. RImode values are meant for 40-bit registers, i.e. data registers. That means I will not be able to use address registers (32-bit registers) in RImode patterns even though the instructions accept them. This will definitely hamper efficiency.

So I was wondering if anybody has an alternative solution that I can try. All I can think of is to flag an insn as an unsigned operation so that I will be able to insert sign/zero extension during, say, the reorg pass. Can this be implemented? How feasible is this?
Regards, Shafi
Re: How to split mulsi3 pattern
2009/11/10 Richard Henderson r...@redhat.com: On 11/10/2009 05:48 AM, Mohamed Shafi wrote: (define_insn mulsi3 [(set (match_operand:SI 0 register_operand =d) (mult:SI (match_operand:SI 1 register_operand %d) (match_operand:SI 2 register_operand d)))] Note that % is only useful if the constraints for the two operands are different (e.g. only one operand accepts an immediate input). When they're identical, you simply waste cpu cycles asking reload to try the operands in the other order. [(set (match_dup 0) (ashift:SI (plus:SI (mult:HI (unspec:HI [(match_dup 2)] UNSPEC_REG_LOW) (unspec:HI [(match_dup 1)] UNSPEC_REG_HIGH)) (mult:HI (unspec:HI [(match_dup 2)] UNSPEC_REG_HIGH) (unspec:HI [(match_dup 1)] UNSPEC_REG_LOW))) (const_int 16))) (set (match_dup 0) (plus:SI (match_dup 0) (mult:HI (unspec:HI [(match_dup 2)] UNSPEC_REG_LOW) (unspec:HI [(match_dup 1)] UNSPEC_REG_LOW] Well for one, your modes don't match. You actually want your unspecs and MULTs to be SImode. You could probably usefully model the second insn as (define_insn mulsi3_part2 [(set (match_operand:SI 0 register_operand =d) (plus:SI (mult:SI (zero_extend:SI (match_operand:HI 1 register_operand d)) (zero_extend:SI (match_operand:HI 2 register_operand d))) (match_operand:SI 3 register_operand 0)))] ...) So i need to change the mode of the register from SI to HI after reloading. Is that allowed? Regards, Shafi
How to split mulsi3 pattern
Hello all,

I am doing a port for a 32-bit target in GCC 4.4.0. On my target a 32-bit multiply is carried out in two instructions. Dn = Da x Db is executed as

  Dn = (Da.L * Db.H + Da.H * Db.L) << 16
  Dn = Dn + (Da.L * Db.L)

Currently the pattern that I have for this is as follows:

(define_insn "mulsi3"
  [(set (match_operand:SI 0 "register_operand" "=d")
        (mult:SI (match_operand:SI 1 "register_operand" "%d")
                 (match_operand:SI 2 "register_operand" "d")))]

I would like to split this pattern into two (either after or before reload). Currently I am doing something like this:

(define_insn_and_split "mulsi3"
  [(set (match_operand:SI 0 "register_operand" "=d")
        (mult:SI (match_operand:SI 1 "register_operand" "%d")
                 (match_operand:SI 2 "register_operand" "d")))]
  ""
  "#"
  "reload_completed"
  [(set (match_dup 0)
        (ashift:SI
          (plus:SI (mult:HI (unspec:HI [(match_dup 2)] UNSPEC_REG_LOW)
                            (unspec:HI [(match_dup 1)] UNSPEC_REG_HIGH))
                   (mult:HI (unspec:HI [(match_dup 2)] UNSPEC_REG_HIGH)
                            (unspec:HI [(match_dup 1)] UNSPEC_REG_LOW)))
          (const_int 16)))
   (set (match_dup 0)
        (plus:SI (match_dup 0)
                 (mult:HI (unspec:HI [(match_dup 2)] UNSPEC_REG_LOW)
                          (unspec:HI [(match_dup 1)] UNSPEC_REG_LOW))))]
  ""
)

But in a few testcases this is creating problems. So I would like to know of better patterns for splitting mulsi3. Can someone help me out?

Regards, Shafi
Re: How to write shift and add pattern?
2009/11/6 Richard Henderson r...@redhat.com: On 11/06/2009 05:29 AM, Mohamed Shafi wrote: The target that i am working on has 1 2 bit shift-add patterns. GCC is not generating shift-add patterns when the shift count is 1. It is currently generating add operations. What should be done to generate shift-add pattern instead of add-add pattern? I'm not sure. You may have to resort to matching (set (match_operand 0 register_operand ) (plus (plus (match_operand 1 register_operand ) (match_dup 1)) (match_operand 2 register_operand But you should debug make_compound_operation first to figure out what's going on for your port, because it's working for x86_64: long foo(long a, long b) { return a*2 + b; } leaq (%rsi,%rdi,2), %rax # 8 *lea_2_rex64 ret # 26 return_internal r~ I have fixed this. The culprit was the cost factor. I added the case in targetm.rtx_costs and now it works properly. But i am having issues with the reload. Regards, Shafi
Re: How to write shift and add pattern?
2009/11/6 Ian Lance Taylor i...@google.com: Mohamed Shafi shafi...@gmail.com writes: It is generating with data registers. Here is the pattern that i have written: (define_insn *saddl [(set (match_operand:SI 0 register_operand =r,d) (plus:SI (mult:SI (match_operand:SI 1 register_operand r,d) (match_operand:SI 2 const24_operand J,J)) (match_operand:SI 3 register_operand 0,0)))] How can i do this. Will the constraint modifiers '?' or '!' help? How can make GCC generate shift and add sequence when the shift count is 1? Does 'd' represent a data register? I assume that 'r' is a general register, as it always is. What is the constraint character for an address register? You don't seem to have an alternative here for address registers, so I'm not surprised that the compiler isn't picking it. No doubt I misunderstand something. Ok the constrain for address register is 'a'. Thats typo in the pattern that i given here. The proper pattern is (define_insn *saddl [(set (match_operand:SI 0 register_operand =a,d) (plus:SI (mult:SI (match_operand:SI 1 register_operand a,d) (match_operand:SI 2 const24_operand J,J)) (match_operand:SI 3 register_operand 0,0)))] So how can i choose the address registers over data registers if that is more profitable? Regards, Shafi
Re: How to support 40bit GP register
2009/10/22 Richard Henderson r...@redhat.com: On 10/21/2009 07:25 AM, Mohamed Shafi wrote: For accessing a-b GCC generates the following code: move.l (sp-16), d3 lsrr.l #16, d3 move.l (sp-12),d2 asll #16,d2 or d3,d2 cmpeq.w #2,d2 jf _L2 Because data registers are 40 bit for 'asll' operation the shift count should be 16+8 or there should be sign extension from 32bit to 40 bits after the 'or' operation. The target has instruction to sign extend from 32bit to 40 bit. Similarly there are other operation that requires sign/zero extension. So is there any way to tell GCC that the data registers are 40bit and there by expect it to generate sign/zero extension accordingly ? Define a machine mode for your 40-bit type in cpu-modes.def. Depending on how your 40-bit type is stored in memory, you'll use either INT_MODE (RI, 5) // load-store uses exactly 5 bytes FRACTIONAL_INT_MODE (RI, 40, 8) // load-store uses 8 bytes Where I've arbitrarily chosen RImode as a mnemonic for Register Integral Mode. Now you define arithmetic operations, as needed, on RImode. You define the extendsiri pattern to be that sign-extend from 32-to-40-bit instruction. You define your comparison patterns on RImode, and not on SImode, since your comparison instruction works on the entire 40 bits. You'll wind up with a selection of patterns in your machine description that have a sign-extension pattern built in, depending on the exact behaviour of your ISA. There are plenty of examples on x86_64, mips64, and Alpha (to name a few) that have similar properties with SI and DImodes. Examine the -fdump-rtl-combine-details dump for exemplars of the canonical forms that the combiner creates when it tries to merge sign-extension instructions into preceeding patterns. Ok i have comparison patterns written in RImode. 
When you say that i will wind up with a selection of patterns do you mean to say that i should have patterns for operations that operate on full 40bits in RImode and disable the corresponding SImode patterns? Or is it that i have to write nameless patterns in RImode for arithmetic operations and look at the dumps to see how the combiner will merge the patterns so that it can match the comparison operations? Regards, Shafi
Re: How to support 40bit GP register
2009/10/22 Richard Henderson r...@redhat.com: On 10/21/2009 07:25 AM, Mohamed Shafi wrote: For accessing a-b GCC generates the following code: move.l (sp-16), d3 lsrr.l #16, d3 move.l (sp-12),d2 asll #16,d2 or d3,d2 cmpeq.w #2,d2 jf _L2 Because data registers are 40 bit for 'asll' operation the shift count should be 16+8 or there should be sign extension from 32bit to 40 bits after the 'or' operation. The target has instruction to sign extend from 32bit to 40 bit. Similarly there are other operation that requires sign/zero extension. So is there any way to tell GCC that the data registers are 40bit and there by expect it to generate sign/zero extension accordingly ? Define a machine mode for your 40-bit type in cpu-modes.def. Depending on how your 40-bit type is stored in memory, you'll use either INT_MODE (RI, 5) // load-store uses exactly 5 bytes FRACTIONAL_INT_MODE (RI, 40, 8) // load-store uses 8 bytes Richard thanks for the reply. Load-store uses 32bits. Sign extension happens automatically. So i have choosen INT_MODE (RI, 5) and copied movsi and renamed it to movri. I have also specified that RImode need only one register. Where I've arbitrarily chosen RImode as a mnemonic for Register Integral Mode. Now you define arithmetic operations, as needed, on RImode. You define the extendsiri pattern to be that sign-extend from 32-to-40-bit instruction. You define your comparison patterns on RImode, and not on SImode, since your comparison instruction works on the entire 40 bits. I have defined extendsiri and cbranchri4 patterns. When i compile a program like unsigned long xh = 1; int main () { unsigned long yh = 0xull; unsigned long z = xh * yh; if (z != yh) abort (); return 0; } I get the following ICE internal compiler error: in immed_double_const, at emit-rtl.c:553 This happens from cse_insn () calls insert() - gen_lowpart - gen_lowpart_common - simplify_gen_subreg - simplfy_immed_subreg. 
simplify_immed_subreg is called with the parameters (outermode=RImode, (const_int 65535), innermode=DImode, byte=0) cse_insn is called for the following insn (insn 10 9 11 3 bug7.c:14 (set (reg:RI 67) (const_int 65535 [0x])) 4 {movri} (nil)) How can i overcome this? Regards, Shafi You'll wind up with a selection of patterns in your machine description that have a sign-extension pattern built in, depending on the exact behaviour of your ISA. There are plenty of examples on x86_64, mips64, and Alpha (to name a few) that have similar properties with SI and DImodes. Examine the -fdump-rtl-combine-details dump for exemplars of the canonical forms that the combiner creates when it tries to merge sign-extension instructions into preceeding patterns.
Re: IRA is not looking into the predicates ?
2009/10/30 Jeff Law l...@redhat.com: On 10/30/09 07:13, Mohamed Shafi wrote: Hi, I am doing a port for a 32bit target in GCC 4.4.0. The target does not have support for symbolic address in QImode for load operations. You'll need to make sure to reject such addresses for QImode in GO_IF_LEGITIMATE_ADDRESS. In order to do this what i have done is in define_expand for moveqi reject symbolic address it they come in source operands and i have also written a predicate for *moveqi_internal to reject such cases. OK. Nothing wrong with these steps. Though you really need to make sure GO_IF_LEGITIMATE_ADDRESS is defined correctly. IRA doesn't look at operand predicates or insn conditions. It assumes that any insns are valid assuming any pseudo registers appearing in the insn get suitable hard registers. Based on the dumps you provided it appears that reg61 does not get a hard register and reload is generating the problematical insn #24. This is a good indication that your GO_IF_LEGITIMATE_ADDRESS is incorrectly implemented. I the GO_IF_LEGITIMATE_ADDRESS address macro i am allowing this address because the target supports symbolic address in QImode for store operations. And in the macro GO_IF_LEGITIMATE_ADDRESS there is no option to check if the address is used in load or store. Thats why in define_expand for moveqi i reject symbolic address it they come in source operands and a predicate for *moveqi_internal to reject such cases. But still i am getting the ICE. IIRC the control does not come to TARGET_SECONDARY_RELOAD also. How can i overcome this? Regards, Shafi
Re: IRA is not looking into the predicates ?
2009/10/30 Ian Lance Taylor i...@google.com: Mohamed Shafi shafi...@gmail.com writes: From ice4.c.168r.asmcons (insn 5 2 6 2 ice4.c:4 (set (reg:SI 61 [ s ]) (mem/c/i:SI (symbol_ref:SI (s) [flags 0x2] var_decl 0xb7bfd000 s) [0 s+0 S4 A32])) 2 {*movsi_internal} (nil)) (insn 6 5 7 2 ice4.c:4 (set (reg:QI 62) (plus:QI (subreg:QI (reg:SI 61 [ s ]) 0) (const_int -100 [0xff9c]))) 16 {addqi3} (expr_list:REG_DEAD (reg:SI 61 [ s ]) (nil))) How can i prevent this ICE ? If asmcons is the first place that this appears, then I think it must be coming from some asm statement. So the first step would be to look at the asm statement and see if it can be rewritten using a different constraint. No this appears from the rtl expand onwards. Shafi
IRA is not looking into the predicates ?
Hi, I am doing a port for a 32bit target in GCC 4.4.0. The target does not have support for symbolic address in QImode for load operations. In order to do this what i have done is in define_expand for moveqi reject symbolic address it they come in source operands and i have also written a predicate for *moveqi_internal to reject such cases. But i get the following ICE: insn does not satisfy its constraints: (insn 24 5 6 2 ice4.c:4 (set (reg:QI 17 r1) (mem/c/i:QI (symbol_ref:SI (s) [flags 0x2] var_decl 0xb7bfd000 s) [0 s+0 S1 A32])) 0 {*movqi_internal} (nil)) From ice4.c.172r.ira (insn 24 5 6 2 ice4.c:4 (set (reg:QI 17 r1) (mem/c/i:QI (symbol_ref:SI (s) [flags 0x2] var_decl 0xb7bfd000 s) [0 s+0 S1 A32])) 0 {*movqi_internal} (nil)) (insn 6 24 7 2 ice4.c:4 (set (reg:QI 16 r0 [62]) (plus:QI (reg:QI 17 r1) (const_int -100 [0xff9c]))) 16 {addqi3} (nil)) From ice4.c.168r.asmcons (insn 5 2 6 2 ice4.c:4 (set (reg:SI 61 [ s ]) (mem/c/i:SI (symbol_ref:SI (s) [flags 0x2] var_decl 0xb7bfd000 s) [0 s+0 S4 A32])) 2 {*movsi_internal} (nil)) (insn 6 5 7 2 ice4.c:4 (set (reg:QI 62) (plus:QI (subreg:QI (reg:SI 61 [ s ]) 0) (const_int -100 [0xff9c]))) 16 {addqi3} (expr_list:REG_DEAD (reg:SI 61 [ s ]) (nil))) How can i prevent this ICE ? Regards, Shafi
Typo in internals
Hi,

The internals documentation says:

    -- Target Hook: bool TARGET_CAN_INLINE_P (tree caller, tree callee)
    This target hook returns false if the caller function cannot inline callee, based on target specific information. By default, inlining is not allowed if the callee function has function specific target options and the caller does not use the same options.

But looking at the sources, I think this should really have been TARGET_OPTION_CAN_INLINE_P.

Shafi.
Re: Supporting FP cmp lib routines
2009/9/14 Richard Henderson r...@redhat.com:
> Another thing to look at, since you have hand-written routines and may be able to specify that e.g. only a subset of the normal call clobbered registers are actually modified, is to leave the call as a compare insn. Something like
>
>     (define_insn "*cmpsf"
>       [(set (reg:CC status-reg)
>             (compare:CC (match_operand:SF 0 "register_operand" "R0")
>                         (match_operand:SF 1 "register_operand" "R1")))
>        (clobber (reg:SI r2))
>        (clobber (reg:SI r3))]
>       ""
>       "call __compareSF"
>       [(set_attr "type" "call")])
>
> Where the R0 and R1 constraints resolve to the input registers for the routine. Depending on your ISA and ABI, you may not even need to split this pattern post-reload.

I have implemented the above solution and it works. I have to support the same for DF as well, but with DF I have a problem with the constraints. My target generates code for both big and little endian. The ABI specifies that when a 64-bit value is passed as an argument, it is passed in R6 and R7, with R6 containing the most significant long word and R7 the least significant long word, regardless of the endianness mode. How can I do this in the DF compare pattern?

Regards,
Shafi
How to support 40bit GP register
Hi all,

I am porting GCC 4.4.0 to a 32-bit target. The target has 40-bit data registers and 32-bit address registers, all of which can be used as general-purpose registers. When the 40-bit registers are used for arithmetic or comparison operations, GCC generates code assuming that they are 32-bit registers. Whenever there is a move from an address register to a data register, sign extension is performed automatically by the target. Since the data registers are 40 bits wide, after some operations a sign or zero extension has to be performed for the result to be correct. Take the following test case, for example:

    typedef struct {
      char b0;
      char b1;
      char b2;
      char b3;
      char b4;
      char b5;
    } __attribute__ ((packed)) b_struct;

    typedef struct {
      short a;
      long b;
      short c;
      short d;
      b_struct e;
    } __attribute__ ((packed)) a_struct;

    int main (void)
    {
      volatile a_struct *a;
      volatile a_struct b;

      a = &b;
      *a = (a_struct){1,2,3,4};
      a->e.b4 = 'c';

      if (a->b != 2)
        abort ();

      exit (0);
    }

For accessing a->b, GCC generates the following code:

    move.l  (sp-16), d3
    lsrr.l  #16, d3
    move.l  (sp-12), d2
    asll    #16, d2
    or      d3, d2
    cmpeq.w #2, d2
    jf      _L2

Because the data registers are 40 bits wide, either the shift count for the 'asll' operation should be 16+8, or there should be a sign extension from 32 bits to 40 bits after the 'or' operation. The target has an instruction to sign-extend from 32 bits to 40 bits. Similarly, there are other operations that require sign or zero extension. So is there any way to tell GCC that the data registers are 40 bits wide, and thereby have it generate the sign/zero extensions accordingly?

Regards,
Shafi
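The failure mode described in this message can be reproduced with a small C model of a 40-bit register held in a 64-bit integer. This is a sketch under assumptions: the function names are invented for illustration, and the memory words used in the note below assume a little-endian layout of the packed struct.

```c
#include <stdint.h>

#define REG40_MASK 0xFFFFFFFFFFULL   /* the low 40 bits of the model register */

/* Left shift inside a 40-bit register: bits shifted past bit 39 are lost,
   but bits 32..39 are kept. Assumes n < 64. */
uint64_t asl40(uint64_t reg, unsigned n)
{
    return (reg << n) & REG40_MASK;
}

/* Sign-extend the low 32 bits of the register into bits 32..39. */
uint64_t sext32_to_40(uint64_t reg)
{
    uint64_t low = reg & 0xFFFFFFFFULL;
    uint64_t ext = (low & 0x80000000ULL) ? 0xFF00000000ULL : 0;
    return low | ext;
}
```

With the second memory word equal to 0x00030000 (upper half of b is 0, c is 3), asl40(0x00030000, 16) leaves 0x03 in bits 32..39. OR-ing in the low half of b (2) then gives 0x0300000002, which does not compare equal to 2 over 40 bits. Applying sext32_to_40 after the OR restores the value 2.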
Re: How to split 40bit data types load/store?
2009/9/14 Richard Henderson r...@redhat.com: On 09/14/2009 07:24 AM, Mohamed Shafi wrote: Hello all, I am doing a port for a 32bit target in GCC 4.4.0. I have to support a 40bit data (_Accum) in the port. The target has 40bit registers which is a GPR and works as 32bit reg in other modes. The load and store for _Accum happens in two step. The lower 32bit in one instruction and the upper 8bit in the next instruction. I want to split the instruction after reload. I tired to have a pattern (for load) like this: (define_insn fn_load_ext_sa [(set (unspec:SA [(match_operand:DA 0 register_operand )] UNSPEC_FN_EXT) (match_operand:SA 1 memory_operand ))] (define_insn fn_load_sa [(set (unspec:SA [(match_operand:DA 0 register_operand )] UNSPEC_FN) (match_operand:SA 1 memory_operand ))] Unspec on the left-hand-side isn't something that's supposed to happen, and is more than likely the cause of your problems. Try moving the unspec to the right-hand-side like: (set (reg:SI reg) (mem:SI addr)) (set (reg:SA reg) (unspec:SA [(reg:SI reg) (mem:QI addr)] UNSPEC_ACCUM_INSERT)) and (set (mem:SI addr) (reg:SI reg)) (set (mem:QI addr) (unspec:QI [(reg:SA reg)] UNSPEC_ACCUM_EXTRACT)) Note that after reload it's perfectly acceptable for a hard register to appear with the different SI and SAmodes. It's probably not too hard to define this with zero_extract sequences instead of unspecs, but given that these only appear after reload, it may not be worth the effort. I was able to implement this with unspecs. But now it seems that i need to split the pattern before reload also. So i am thinking of removing this and doing a split before reload. The issue is that there is no support to for register indirect addressing mode for accessing the upper eight bits of the 40bit register. The only addressing mode supported for accessing this section is (SP+offset). 
So what i thought was to allow this addressing mode and at the time of reloading, at the time of secondary reload with the help of a scratch register and a scratch memory. But it seems that in GCC it is not possible to have both scratch memory and a scratch register for the same operation. Am i right? So what i did was to implement this at the define_expand stage itself. The idea is to generate the following sequence: for load (R0), D0 generate load (R0), D0// 32bit mode , SAmode move load (R0+4), scratch_reg // 32bit mode, SAmode store scratch_reg, (SP+off) //32bit mode, SAmode load.ext (SP+off), D0.u8 and similarly for store. Here are the patterns that i used for this purpose: (define_expand movda [(set (match_operand:DA 0 nonimmediate_operand ) (match_operand:DA 1 nonimmediate_operand ))] { if (MEM_P (operands[1]) REG_P (XEXP (operands[1], 0)) XEXP (operands[1], 0) != virtual_stack_vars_rtx)) { rtx lo_half, hi_half; rtx scratch_mem, scratch_reg, subreg; gcc_assert (can_create_pseudo_p ()); scratch_reg = gen_reg_rtx (SAmode); scratch_mem = assign_stack_temp (SAmode, GET_MODE_SIZE (SAmode), 0);\ subreg = gen_rtx_SUBREG (SAmode, operands[0], 0); lo_half = adjust_address (operands[1], SAmode, 0); hi_half = adjust_address (operands[1], SAmode, 4); emit_insn (gen_rtx_SET (SAmode, subreg, lo_half)); emit_insn (gen_rtx_SET (SAmode, scratch_reg, hi_half)); emit_insn (gen_rtx_SET (SAmode, scratch_mem, scratch_reg)); emit_insn (gen_load_reg_ext (operands[0], scratch_mem)); DONE; } /* and similarly for store operation */ } ) (define_insn load_reg_ext [(set (subreg:SA (zero_extract:DA (match_operand:DA 0 register_operand =d) (const_int 8) (const_int 24)) 4) (match_operand:SA 1 memory_operand Sd3))] (define_insn store_reg_ext [(set (match_operand:SA 0 memory_operand =Sd3) (zero_extract:SA (match_operand:DA 1 register_operand d) (const_int 8) (const_int 24)))] (define_insn *movsa_internal [(set (match_operand:SA 0 nonimmediate_operand =m,d,d) (match_operand:SA 1 
nonimmediate_operand d,m,d))] By default -fomit-frame-pointer will passed to the complier. Without optimization compiler generates the expected output. But with optimization that is not the case. It seems that the pattern that i have written above are not proper. For the simple function like the following _Accum foo (_Accum *a) { _Accum b = *a; return b; } with optimization enabled the complier generates only load (R0), D0// 32bit mode , SAmode move the 1st instruction in the expected 4 instruction sequence. How can i write the patterns properly? Regards Shafi
Re: define_memory_constraint and REG_OK_STRICT
2009/9/30 Richard Henderson r...@redhat.com:
> On 09/29/2009 09:46 PM, Mohamed Shafi wrote:
>> bool strict = reload_completed ? true : false;
>
> What happens if you set strict = false here? That's what ARM does.

That particular case works, and yes, ARM does it that way, but there are other targets, like s390, that use (reload_completed || reload_in_progress). That is why I had to ask whether my definition of strict is proper or not. I am not sure which one to use.

Shafi
Re: Reload going wrong for addition.
2009/9/28 Richard Henderson r...@redhat.com:
> On 09/28/2009 07:25 AM, Mohamed Shafi wrote:
>> Hope someone suggests me a solution.
>
> The solution is almost certainly something involving the TARGET_SECONDARY_RELOAD hook. You need to inform reload that it's going to need some scratch registers in order to perform the operation. It's been a long time since I had to fiddle with this sort of thing, so I forget all the details involved. Perhaps someone else has some additional advice.

OK, what I did was to remove the code from the preferred_reload_class function, so that it now simply returns class, i.e.

    #define PREFERRED_RELOAD_CLASS(class, x) class

In TARGET_SECONDARY_RELOAD I added code to use a scratch register for the move operation. Now things are working. So I guess I should ask: why do we have PREFERRED_RELOAD_CLASS when we can do the same with TARGET_SECONDARY_RELOAD?

Shafi
define_memory_constraint and REG_OK_STRICT
Hello all,

I am doing a port for a 32-bit target in GCC 4.4.0. I have defined memory constraints in predicates.c like this:

    (define_memory_constraint "Sr0"
      "Memory reference through base registers"
      (match_test "target_mem_constraint (\"r0\", op)"))

In the function target_mem_constraint I have:

    int
    target_mem_constraint (const char *str, rtx op)
    {
      char c0 = str[0];
      char c1 = str[1];
      rtx op0 = XEXP (op, 0);
      bool strict = reload_completed;

      if (!MEM_P (op))
        return 0;

      switch (c0)
        {
        case 'r':
          return (!STACK_REG_RTX_P (op0) && BASE_REG_RTX_P (op0, strict));
        ...
        ...

My question is: is my definition of strict correct, or should it be reload_in_progress || reload_completed?

Regards,
Shafi
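As background for the question, the usual difference between a strict and a non-strict base-register check is whether a pseudo register (one that has not yet been assigned a hard register) is accepted. The sketch below models only that distinction; the register numbering is invented for illustration and is not taken from the port.

```c
#include <stdbool.h>

/* Hypothetical numbering for illustration only: hard registers are 0..63,
   of which 16..31 are address (base) registers; 64 and above are pseudos. */
enum { FIRST_PSEUDO = 64, FIRST_BASE = 16, LAST_BASE = 31 };

/* Model of a base-register check with a strictness flag. */
bool base_reg_ok(unsigned regno, bool strict)
{
    if (regno >= FIRST_PSEUDO)
        return !strict;          /* pseudos pass only the non-strict check */
    return regno >= FIRST_BASE && regno <= LAST_BASE;
}
```

Under this model, the value chosen for strict decides whether a memory operand whose base is still a pseudo satisfies the constraint, which is why the choice between reload_completed and reload_in_progress || reload_completed matters while reload is running.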
Reload going wrong for addition.
Hello all, I doing a port for a 32bit target for GCC 4.4.0. I am getting the following error: rd_er.c:19: error: insn does not satisfy its constraints: (insn 5 35 34 2 rd_er.c:8 (set (reg:SI 16 r0) (plus:SI (reg:SI 16 r0) (reg:SI 2 d2))) 57 {addsi3} (expr_list:REG_EQUAL (plus:SI (reg/f:SI 49 sp) (const_int -65544 [0xfffefff8])) (nil))) My target has 16 data registers and 16 address registers. All are 32bit registers. The target also has a dedicated stack pointer. There is no move operation possible between SP and data regs. There is no provision for addition between data and address registers. R7 is used as Frame Pointer. Pattern for addition --- (define_insn addmode3 [(set (match_operand:INT 0 register_operand =d, t, k, a, a, t, k, t, d) (plus:INT(match_operand:INT 1 register_operand 0, 0, 0, t, k, 0, 0, 0, 0) (match_operand:INT 2 nonmemory_operand J, J, J, L, L, t, t, k, d)))] The constraints used are - ;;d - Data registers [D0 - D15] ;;a - Address registers [R0 - R15] ;;t - Address and Index registers ;;k - Stack Pointer ;;J - Unsigned 5bit immediate ;;L - Signed 16bit immediate Since there is no move operation between SP and data regs i have specified 12 as the register_move_cost between them. I also return the reload class as address register class in preferred_reload_class when the rtx is SP. 
b4 ira pass --- (insn 5 2 12 2 rd_er.c:8 (set (reg/v/f:SI 60 [ bufptr ]) (reg/f:SI 23 r7)) 43 {*movsi_internal} (nil)) Input for reload pass - (insn 5 2 12 2 rd_er.c:8 (set (reg/v/f:SI 7 d7 [orig:60 bufptr ] [60]) (plus:SI (reg/f:SI 49 sp) (const_int -65536 [0x]))) 57 {addsi3} (expr_list:REG_EQUAL (plus:SI (reg/f:SI 49 sp) (const_int -65536 [0x])) (nil))) After IRA --- Reloads for insn # 5 Reload 0: reload_in (SI) = (reg/f:SI 49 sp) reload_out (SI) = (reg/v/f:SI 7 d7 [orig:60 bufptr ] [60]) HIGH_OR_LOW, RELOAD_OTHER (opnum = 0) reload_in_reg: (reg/f:SI 49 sp) reload_out_reg: (reg/v/f:SI 7 d7 [orig:60 bufptr ] [60]) reload_reg_rtx: (reg:SI 16 r0) Reload 1: reload_in (SI) = (const_int -65544 [0xfffefff8]) DALU_REGS, RELOAD_FOR_INPUT (opnum = 2) reload_in_reg: (const_int -65544 [0xfffefff8]) reload_reg_rtx: (reg:SI 2 d2) (insn 5 35 34 2 rd_er.c:8 (set (reg:SI 16 r0) (plus:SI (reg:SI 16 r0) (reg:SI 2 d2))) 57 {addsi3} (expr_list:REG_EQUAL (plus:SI (reg/f:SI 49 sp) (const_int -65544 [0xfffefff8])) (nil))) The reload pass chooses the final alternative as the goal for reloading. Since the input instruction already has data register as the destination the constraint combination (t, 0, t) looses to (d, 0, d), since the last combination requires least amount copying for constraint matching (or so the reload pass believes). There are cases when reload fixes the add pattern and those are when either the destination is address register or there is no stack pointer involved. But otherwise i am getting this ICE. I am not sure how to over come this,. Hope someone suggests me a solution. Regards, Shafi P.S Can i have commutative operation for the constraint combination (t, 0, t) i.e (t, %0, t). If so what will be the output template?
Segmentation fault when calling a library fun - GCC bug?
I am doing a port for a 32-bit target in GCC 4.4.0. I am getting a segmentation fault in the function assign_temp at the following line:

    if (DECL_P (type_or_decl))

After analyzing the issue I think this might be a bug, and I just want to confirm whether that is the case. In order to reproduce it, I think the target should have the following properties:

a. Only two 32-bit registers are available as argument registers.
b. The second 64-bit value is pushed on the stack.
c. ACCUMULATE_OUTGOING_ARGS is set.
d. STRICT_ALIGNMENT is set.
e. PARM_BOUNDARY is 32.

When there is a library call for an operation that takes two 64-bit arguments, say division of two long long values (__divdi3), the following sequence happens:

    emit_library_call_value -> emit_library_call_value_1 -> emit_push_insn -> assign_temp

emit_push_insn is called because the second argument is pushed on the stack and ACCUMULATE_OUTGOING_ARGS is set. assign_temp is called because

    STRICT_ALIGNMENT && PARM_BOUNDARY < GET_MODE_ALIGNMENT (DImode)

is true. Can somebody please confirm whether this is due to some mistake in my port or a GCC bug?

Thanks,
Shafi
How to implement compare and branch instruction
Hello all, I am porting a 32bit target in GCC 4.4.0 The target has have distinct signed and unsigned compare instructions, and only one set of conditional branch instructions. Moreover the operands cannot be immediate values if the comparison is unsigned. I have implemented this using compare-and-branch instruction. This gets split after reload. The pattern that i have written are as follows: (define_expand cmpmode [(set (reg:CC CC_REGNUM) (compare (match_operand:INT 0 register_operand ) (match_operand:INT 1 nonmemory_operand )))] { compare_op0 = operands[0]; compare_op1 = operands[1]; DONE; } ) (define_expand bcode [(set (reg:CC CC_REGNUM) (compare:CC (match_dup 1) (match_dup 2))) (set (pc) (if_then_else (comp_op:CC (reg:CC CC_REGNUM)(const_int 0)) (label_ref (match_operand 0 )) (pc)))] { operands[1] = compare_op0; operands[2] = compare_op1; if (CONSTANT_P (operands[2]) (CODE == LTU || CODE == GTU || CODE == LEU || CODE == GEU)) operands[2] = force_reg (GET_MODE (operands[1]), operands[2]); operands[3] = gen_rtx_fmt_ee (CODE, CCmode, gen_rtx_REG (CCmode,CC_REGNUM), const0_rtx); emit_jump_insn (gen_compare_and_branch_insn (operands[0], operands[1], operands[2], operands[3])); DONE; } ) (define_insn_and_split compare_and_branch_insn [(set (pc) (if_then_else (match_operator:CC 3 comparison_operator [(match_operand 1 register_operand d,d,a,a,d,t,k,t) (match_operand 2 nonmemory_operand J,L,J,L,d,t,t,k)]) (label_ref (match_operand 0 )) (pc)))] !unsigned_immediate_compare_p (GET_CODE (operands[3]), operands[2]) # reload_completed [(set (reg:CC CC_REGNUM) (match_op_dup:CC 3 [(match_dup 1) (match_dup 2)])) (set (pc) (if_then_else (eq (reg:CC CC_REGNUM) (const_int 0)) (label_ref (match_dup 0)) (pc)))] { if (expand_compare_insn (operands, 0)) DONE; } ) In the function expand_compare_insn i am asserting that operand[2] is not a immediate value if the comparison is unsigned. I am getting a assertion failure in this function. 
The problem is that reload pass will replace operand[2] with its equiv_constant. This breaks the pattern after reload pass. Before reload pass (jump_insn 58 56 59 10 20070129.c:73 (set (pc) (if_then_else (leu:CC (reg:QI 84) (reg:QI 91)) (label_ref 87) (pc))) 77 {compare_and_branch_insn} (expr_list:REG_DEAD (reg:QI 84) (expr_list:REG_BR_PROB (const_int 200 [0xc8]) (nil After reload pass: (jump_insn 58 56 59 10 20070129.c:73 (set (pc) (if_then_else (leu:CC (reg:QI 17 r1 [84]) (const_int 1 [0x1])) (label_ref 87) (pc))) 77 {compare_and_branch_insn} (expr_list:REG_BR_PROB (const_int 200 [0xc8]) (nil))) How can i overcome this error? Thanks for your help. Regards, Shafi
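The restriction described in this message (no immediate operand when the comparison is unsigned) can be written down as a small standalone check. This is a model only: the enum and function names are hypothetical stand-ins and are not the port's unsigned_immediate_compare_p or any GCC API.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the RTL comparison codes named in the message. */
typedef enum {
    CMP_EQ, CMP_NE, CMP_LT, CMP_GT, CMP_LE, CMP_GE,
    CMP_LTU, CMP_GTU, CMP_LEU, CMP_GEU
} cmp_code;

bool is_unsigned_compare(cmp_code code)
{
    return code == CMP_LTU || code == CMP_GTU
        || code == CMP_LEU || code == CMP_GEU;
}

/* True when the second operand has to be in a register, because the target
   as described has no unsigned compare that takes an immediate. */
bool needs_register_operand(cmp_code code, bool operand_is_constant)
{
    return operand_is_constant && is_unsigned_compare(code);
}
```

The post-reload insn quoted above, an leu comparison against (const_int 1), is exactly the case for which needs_register_operand returns true, so it is the combination the assertion in expand_compare_insn trips over.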
Supporting FP cmp lib routines
Hi all,

I am doing a GCC port for a 32-bit target in GCC 4.4.0. The target uses hand-coded floating-point compare routines. Normally a function returns its value in the general-purpose registers, but these FP compare routines return the result in the Status Register itself, so there is no need for a compare instruction after the call. Is there a way to let GCC know that the result of an FP compare is stored in the Status Register, so that GCC directly generates a jump? How can I implement this?

Regards,
Shafi
How to split 40bit data types load/store?
Hello all,

I am doing a port for a 32-bit target in GCC 4.4.0. I have to support a 40-bit data type (_Accum) in the port. The target has 40-bit registers, which are GPRs and work as 32-bit registers in other modes. The load and store for _Accum happen in two steps: the lower 32 bits in one instruction and the upper 8 bits in the next. I want to split the instruction after reload. I tried to have patterns (for load) like this:

    (define_insn "fn_load_ext_sa"
      [(set (unspec:SA [(match_operand:DA 0 "register_operand" "")] UNSPEC_FN_EXT)
            (match_operand:SA 1 "memory_operand" ""))]

    (define_insn "fn_load_sa"
      [(set (unspec:SA [(match_operand:DA 0 "register_operand" "")] UNSPEC_FN)
            (match_operand:SA 1 "memory_operand" ""))]

The above patterns work at -O0, but with optimization I am getting an ICE. It seems that GCC won't accept an unspec as the destination operand. So how can I split the patterns for the load and store of these data types?

Regards,
Shafi
Reloading is going wrong?
Hello all, I am doing a port for a 32bit target in GCC 4.4.0. Of the addressing modes that are allowed by my target the one with (base register + offset) is restrictive in QImode. The restriction is that if the base register is not Stack Pointer then this kind of address cannot come in a load instruction but only in store instruction. To implement this i added constrains for all supported memory operations in QImode. So the pattern is as follows (define_insn movqi [(set (match_operand:QI 0 nonimmediate_operand =b,b,d,t,d, b,Ss0, Ss1, a,Se1, Sb2, b,Sd3, d,Se0) (match_operand:QI 1 general_operand I, L,d,d,t, Ss0,b, b,Se1,a, b, Sd3,b, Se0,d))] where d is data registers a is address registers b is data and address registers Sb2 is Rn + offset addressing mode Sd3 is SP + offset addressing mode Se0 - (Rn), (Rn)+, (Rn)-, (Rn + Ri) and Post modify register addressing mode Se1 - Se0 excluding Post modify register addressing mode I believe that there are enough combinations available for the reload to try for alternate addressing mode if it encounters the restrictive addressing mode. But I am still getting the following error main1.c:11: error: insn does not satisfy its constraints: (insn 30 29 7 2 main1.c:9 (set (reg:QI 2 d2 [orig:61 variable.a+1 ] [61]) (mem/s/j:QI (plus:SI (reg:SI 16 r0) (const_int 1 [0x1])) [0 variable.a+1 S1 A8])) 41 {movqi} (nil)) main1.c:11: internal compiler error: in reload_cse_simplify_operands, at postreload.c:396 So what am i doing wrong? Cant this scenario be solved by the reload pass? How can generate instructions with the QImode restriction? Regards, Shafi
How to write shift and add pattern?
Hello all,

I am trying to port a 32-bit architecture in GCC 4.4.0. My target supports shift-and-add operations with a shift count of 1 or 2. I tried to write patterns for this, but GCC is not generating them. The following are the patterns that I have written in the md file:

    (define_insn "shift_add_mode"
      [(set (match_operand:SI 0 "register_operand" "")
            (plus:SI (match_operand:SI 3 "register_operand" "")
                     (ashift:SI (match_operand:SI 1 "register_operand" "")
                                (match_operand:SI 2 "immediate_operand" ""))))]
      ""
      "shadd1\\t%1, %0"
    )

    (define_insn "shift_add1_mode"
      [(set (match_operand:SI 0 "register_operand" "")
            (plus:SI (ashift:SI (match_operand:SI 1 "register_operand" "")
                                (match_operand:SI 2 "immediate_operand" ""))
                     (match_operand:SI 3 "register_operand" "")))]
      ""
      "shadd1\\t%1, %0"
    )

    (define_insn "shift_n_add_mode"
      [(set (match_operand:SI 1 "register_operand" "")
            (ashift:SI (match_dup 1)
                       (match_operand:SI 2 "immediate_operand" "")))
       (set (match_operand:SI 0 "register_operand" "")
            (plus:SI (match_dup 0)
                     (match_dup 1)))]
      ""
      "shadd2\\t%1, %0"
    )

As you can see, I have tried several combinations. Since I was only looking for a pattern that matches, I didn't bother to write them according to the target; I thought I would do that once I had a matching pattern. When I debugged, GCC was generating patterns with multiply, but those get discarded since the md file doesn't have such patterns. How can I make GCC generate the shift-and-add pattern? Is GCC generating patterns with multiply because of cost issues? I haven't specified any cost details.

Regards,
Shafi
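The message reports that GCC offered patterns with a multiply where a shift was expected. At the source level the two forms compute the same value, as the small C sketch below shows; the function names are invented for illustration.

```c
#include <stdint.h>

/* Shift-and-add written with a shift. Assumes n < 32. */
uint32_t shadd_shift(uint32_t a, uint32_t b, unsigned n)
{
    return (a << n) + b;
}

/* The same operation written with a multiply by a power of two. */
uint32_t shadd_mult(uint32_t a, uint32_t b, unsigned n)
{
    return a * (1u << n) + b;
}
```

For shift counts of 1 and 2 these correspond to a*2 + b and a*4 + b. A machine description pattern that is meant to catch this operation may therefore need to match the multiply form, which is the form the revised pattern later in this thread uses.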
Re: Function argument passing
2009/7/16 Richard Henderson r...@redhat.com: On 07/13/2009 07:35 AM, Mohamed Shafi wrote: So i made both TARGET_STRICT_ARGUMENT_NAMING and PRETEND_OUTGOING_VARARGS_NAMED to return false. Is this correct? Yes. How to make the varargs argument to be promoted to 32bits when the normal argument don't require promotion as mentioned in point (1) ? There is no way at present. You'll have to extend the promote_function_args hook to accept a bool named parameter. 4. A long long return value is returned in R6 and R7, R6 containing the most significant long word and R7 containing the least significant long word, regardless of the endianess mode. Solution: Used TARGET_RETURN_IN_MSB to return true when the mode is little endian I don't believe this is correct. RETURN_IN_MSB is supposed to be handling the case where the data to be returned is smaller than the register in which it is returned -- e.g. a 3 byte structure returned in a 32-bit register. I believe you should be using... 5. If the first argument is a long long , it is passed in R6 and R7, R6 containing the most significant long word and R7 containing the least significant long word, regardless of the endianess mode. For return value, i have done as mentioned in (4) but I am not sure how to control the argument passing so that R6 contains the msw and R7 contains lsw, regardless of the endianess mode. For both return values and arguments, we support a PARALLEL which allows the target to indicate where each piece of the value is located. It's also true that the generated rtl is more complicated, so you'd want to avoid this solution in big-endian mode, when it isn't needed. 
So here you would do

  if (WORDS_BIG_ENDIAN)
    return gen_rtx_REG (DImode, 6);
  else
    {
      rtx r6, r7, par;

      r7 = gen_rtx_REG (SImode, 7);
      r7 = gen_rtx_EXPR_LIST (SImode, r7, GEN_INT (0));
      r6 = gen_rtx_REG (SImode, 6);
      r6 = gen_rtx_EXPR_LIST (SImode, r6, GEN_INT (4));
      par = gen_rtx_PARALLEL (DImode, gen_rtvec (2, r7, r6));
      return par;
    }

See the docs for FUNCTION_ARG for details.

I am getting the following error when I make a function call.

(call_insn 18 17 19 3 1.c:29 (set (parallel:DI [
                (expr_list:REG_UNUSED (reg:SI 7 d7)
                    (const_int 0 [0x0]))
                (expr_list:REG_UNUSED (reg:SI 6 d6)
                    (const_int 4 [0x4]))
            ])
        (call:SI (mem:SI (symbol_ref:SI ("dd1") [flags 0x41] <function_decl 0xb7bfa980 dd1>) [0 S4 A8])
            (const_int 8 [0x8]))) -1 (nil)
    (expr_list:REG_DEP_TRUE (use (reg:SI 7 d7))
        (expr_list:REG_DEP_TRUE (use (reg:SI 6 d6))
            (nil

How do I write a pattern for this? Another question: in little-endian mode, will the compiler know that the words of the return value are actually stored the other way round, i.e. in big-endian order, and generate the code accordingly for the rest of the program?

Regards,
Shafi
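The register assignment discussed above can be modelled in plain C. This is only a sketch of the ABI rule as stated in the thread (R6 holds the most significant word, R7 the least significant, whatever the endianness); the struct and function names are hypothetical, and the offsets mirror the GEN_INT (0) and GEN_INT (4) values in the PARALLEL.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the register pair used for a long long.  */
struct reg_pair {
    uint32_t r6;   /* most significant 32-bit word */
    uint32_t r7;   /* least significant 32-bit word */
};

/* Split a 64-bit value the way the ABI wants it, independent of
   the endianness mode.  */
static struct reg_pair split_long_long(uint64_t value)
{
    struct reg_pair p;
    p.r6 = (uint32_t)(value >> 32);
    p.r7 = (uint32_t)(value & 0xFFFFFFFFu);
    return p;
}

/* Byte offset, within the 8-byte value, of the word held in REGNO.
   With big-endian word order the most significant word comes first;
   with little-endian word order the least significant word does.  */
static int piece_offset(int regno, int words_big_endian)
{
    assert(regno == 6 || regno == 7);
    if (words_big_endian)
        return regno == 6 ? 0 : 4;
    return regno == 7 ? 0 : 4;
}
```

In the little-endian case piece_offset gives 0 for R7 and 4 for R6, which is the pairing the PARALLEL expresses; in the big-endian case the natural DImode register pair already has that layout, which is why a plain REG suffices there.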
DI mode and endianess
Hi,

I am trying to port a 32-bit target to GCC 4.4.0. My target supports both big and little endian, selected using a target switch. So I have defined the macro

#define WORDS_BIG_ENDIAN (TARGET_BIG_ENDIAN)

Currently I have written patterns only for SImode moves, so GCC synthesizes the DImode operations for me. The problem is that GCC generates the same code for both big and little endian. That is, for the following code

extern long long h;
extern long long j;
extern long long k;

int temp()
{
  k = j+h;
  return 0;
}

the compiler generates the following code

        section .text local
        ALIGN 16
        GLOBAL _temp
_temp:
        mov _h,d4
        mov _h+4,d5
        mov _j,d2
        mov _j+4,d3
        add d4,d2
        adc d5,d3
        mov d2,_k
        mov d3,_k+4
        ret
        SIZE _temp,*-_temp

irrespective of which endianness is selected. What could I be missing here? Should I add anything specific for this in the back end?

Regards,
Shafi
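To make the expected difference concrete: in the listing above the plain add is applied to the words at offset 0 and the add-with-carry to the words at offset 4, which is the order one would expect when the least significant word sits at the lower address. The following C sketch models a 64-bit add done as two 32-bit adds with a carry, with the word order as a parameter. The names are hypothetical and it is only a model of the arithmetic, not of what GCC does internally.

```c
#include <assert.h>
#include <stdint.h>

/* A 64-bit value as it sits in memory: word[0] is at byte offset 0,
   word[1] at byte offset 4.  */
struct di_mem {
    uint32_t word[2];
};

/* Index of the least significant word for the given word order.  */
static int low_word_index(int words_big_endian)
{
    return words_big_endian ? 1 : 0;
}

static struct di_mem di_store(uint64_t value, int words_big_endian)
{
    struct di_mem m;
    int lo = low_word_index(words_big_endian);
    m.word[lo] = (uint32_t)(value & 0xFFFFFFFFu);
    m.word[1 - lo] = (uint32_t)(value >> 32);
    return m;
}

static uint64_t di_load(struct di_mem m, int words_big_endian)
{
    int lo = low_word_index(words_big_endian);
    return ((uint64_t)m.word[1 - lo] << 32) | m.word[lo];
}

/* Add the low words first ("add"), then the high words plus the
   carry out of the low addition ("adc").  */
static struct di_mem di_add(struct di_mem a, struct di_mem b,
                            int words_big_endian)
{
    struct di_mem sum;
    int lo = low_word_index(words_big_endian);
    int hi = 1 - lo;
    uint32_t carry;

    assert(words_big_endian == 0 || words_big_endian == 1);
    sum.word[lo] = a.word[lo] + b.word[lo];
    carry = sum.word[lo] < a.word[lo];
    sum.word[hi] = a.word[hi] + b.word[hi] + carry;
    return sum;
}
```

With words_big_endian set to 1 the carry chain starts at offset 4 instead of offset 0, so the generated assembly for the two modes would be expected to differ in which address gets the add and which gets the adc.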
Re: About feasibility of implementing an instruction
2009/7/3 Ian Lance Taylor i...@google.com:
Mohamed Shafi shafi...@gmail.com writes:

I just want to know about the feasibility of implementing an instruction for a port in gcc 4.4. The target has 40-bit registers where the normal load/store/move instructions will be able to access the 32 bits of the register. In order to move data into the rest of the register [b32 to b39] the data has to be stored into a 32-bit memory location. The data should be stored in such a way that if it is stored in bits 0-7 of the memory word it can be moved to b32-b39 of an even register, and if it is stored in bits 16-23 of the memory word it can be moved to b32-b39 of an odd register. Hope I make myself clear. Will it be possible to implement this in the gcc back end so that the particular instruction is supported?

In general, the gcc backend can do anything, so, yes, this can be supported. It sounds like this is not a general purpose register, so I would probably do it using a builtin function. If you need to treat it as a general purpose register (i.e., the register is managed by the register allocator) then you will need a secondary reload to handle this.

This is a general purpose register. All 40 bits are used only for fixed-point data types. When the register is used for a fixed-point data type, all the operations except initialization are done through built-in functions. For initialization the immediate value has to go through memory, i.e. there is no immediate load when the data is 40 bits wide. So I am planning to control this using the LEGITIMATE_CONSTANT_P macro. But then I have a question. If all the operations are done through intrinsics, will there be a need for spilling the variables used in the built-in functions? If so, then depending on whether the register that gets spilled is even or odd, [b32 to b39] of the register gets stored in memory at [b0 to b7] or [b16 to b23] respectively. Will I be able to keep track of the spilling so that I can reload into the proper register?

Hope I am clear.

Regards,
Shafi
Restrictive addressing mode
Hello all,

I am trying to port a 32-bit target to GCC 4.4.0. Of the addressing modes allowed by my target, the (base register + offset) mode is restricted in QImode. The restriction is that if the base register is not the stack pointer, this kind of address cannot appear in a load instruction but only in a store instruction. How can I implement this? Should I write a define_expand for movqi and force the address into a register when I get this addressing mode? Please let me know your thoughts on this.

Regards,
Shafi
Re: How to set the alignment
2009/8/5 Jim Wilson wil...@codesourcery.com:
On Tue, 2009-08-04 at 11:09 +0530, Mohamed Shafi wrote:

i am not able to implement the alignment for short. The following are the macros that i used for this

#define PARM_BOUNDARY 8
#define STACK_BOUNDARY 64

The target is 32bit. The first two parameters are passed in registers and the rest on the stack. For the parameters that are passed on the stack the alignment is that of the data type. The stack pointer is 8 byte aligned. char is 1 byte, int is 4 byte and short is 2 byte. The code that is getting generated is given below (-O0 -fomit-frame-pointer)

Er, wait. You set PARM_BOUNDARY to 8. This means all arguments will be padded to at most an 8-bit boundary, which means that yes, a short after a char will have only 1 byte alignment. If you want all arguments to have 2-byte alignment, then you need to set PARM_BOUNDARY to 16. But you probably want a value of 32 here so that 4-byte ints get 4-byte alignment. This will allocate a minimum 4-byte stack slot for every argument.

I don't know the calling convention, so I don't know exactly how you want arguments arranged on the stack. If you are pushing arguments, then you can lie in the PUSH_ROUNDING macro. You could say for instance that one byte pushes always push 2 bytes. This ensures that the stack always has 2-byte alignment while pushing arguments. If your push instruction doesn't actually do this, then you need to modify the pushqi pattern to emit two pushes or use a HImode push to get the right behaviour. Try looking at the code in store_one_arg in calls.c, and emit_push_insn in expr.c.

What I did was to define the FUNCTION_ARG_BOUNDARY macro to return the alignment as per the requirement, i.e. 8 bits for char, 16 bits for short and 32 bits for int, and I kept PARM_BOUNDARY at 8. Now the compiler is emitting the alignment properly. Is this OK?

Regards,
Shafi
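The effect of a per-type argument boundary can be sketched in C. This is only a model of the offset arithmetic, under the assumption that stack arguments are laid out at increasing offsets from the start of the argument area; the helper names are hypothetical and nothing here is GCC code.

```c
#include <assert.h>
#include <stddef.h>

/* Round OFFSET up to the next multiple of ALIGN (a power of two).  */
static size_t round_up(size_t offset, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (offset + align - 1) & ~(align - 1);
}

/* Lay out N stack arguments.  size[i] and align[i] are in bytes;
   the computed offset of each argument is written to offset_out[i].  */
static void layout_args(const size_t *size, const size_t *align,
                        size_t n, size_t *offset_out)
{
    size_t offset = 0;
    for (size_t i = 0; i < n; i++) {
        offset = round_up(offset, align[i]);
        offset_out[i] = offset;
        offset += size[i];
    }
}
```

For the stack arguments of the test case (char w, short e, short r), a uniform 1-byte boundary puts them at offsets 0, 1 and 3, so both shorts are misaligned, while a boundary equal to each type's own alignment puts them at 0, 2 and 4.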
Re: How to set the alignment
2009/8/3 Jim Wilson wil...@codesourcery.com:
On 08/03/2009 02:14 AM, Mohamed Shafi wrote:

short - 2 bytes
i am not able to implement the alignment for short. The following are the macros that i used for this

#define PARM_BOUNDARY 8
#define STACK_BOUNDARY 64

You haven't explained what the actual problem is. Is there a problem with global variables? Is the variable initialized or uninitialized? If it is uninitialized, is it common? Is this a local variable? Is this a function argument or parameter? Is this a named or unnamed (stdarg) argument or parameter? Etc.

It always helps to include a testcase. You should also mention what gcc is currently emitting, why it is wrong, and what the output should be instead.

All this talk about stack and parm boundary suggests that it might be an issue with function arguments, in which case you will probably have to describe the calling conventions a bit so we can understand what you want.

This is the test case that I tried

short funs (int a, int b, char w, short e, short r)
{
  return e+r;
}

The target is 32-bit. The first two parameters are passed in registers and the rest on the stack. For the parameters that are passed on the stack the alignment is that of the data type. The stack pointer is 8-byte aligned. char is 1 byte, int is 4 bytes and short is 2 bytes. The code that is getting generated is given below (-O0 -fomit-frame-pointer)

funs:
        add 16,sp
        mov d0,(sp-16)
        mov d1,(sp-12)
        movh (sp-19),d0
        movh d0,(sp-8)
        movh (sp-21),d0
        movh d0,(sp-6)
        movh (sp-8),d1
        movh (sp-6),d0
        add d1,d0,d0
        sub 16,sp
        ret

From the above code you can see that some of the half-word accesses are not aligned on a 2-byte boundary. So where am I going wrong? Hope this info is enough.

Regards,
Shafi
Re: Output sections
2009/8/1 Dave Korn dave.korn.cyg...@googlemail.com: Mohamed Shafi wrote: I am looking for adding something to the end of each section in the generated .s file. Using TARGET_ASM_NAMED_SECTION i will be able to keep track of the sections that are being emitted. But from TARGET_ASM_FILE_END hook how can i re-enter into each section. Are the sections stored in some global variable? I'm not sure I understand the question. You enter a section simply by emitting the correct .section directive into the asm output. You re-enter it by the same method. cheers, DaveK Ok, Then i don't understand your solution. you could use the TARGET_ASM_FILE_END hook to output directives that re-enter each used section and then output your new directive. if i want to do the following in the assembly output section .code . . .. section_end you are saying that if i emit a section directive the compiler will switch to the previously emitted section and then i have to somehow seek to the end of that section and emit my 'section_end' directive? Shafi
Re: Output sections
2009/8/1 Dave Korn dave.korn.cyg...@googlemail.com: Mohamed Shafi wrote: 2009/8/1 Dave Korn dave.korn.cyg...@googlemail.com: Mohamed Shafi wrote: I am looking for adding something to the end of each section in the generated .s file. Using TARGET_ASM_NAMED_SECTION i will be able to keep track of the sections that are being emitted. But from TARGET_ASM_FILE_END hook how can i re-enter into each section. Are the sections stored in some global variable? I'm not sure I understand the question. You enter a section simply by emitting the correct .section directive into the asm output. You re-enter it by the same method. Ok, Then i don't understand your solution. Ah, it looks like I didn't quite understand your problem. you could use the TARGET_ASM_FILE_END hook to output directives that re-enter each used section and then output your new directive. if i want to do the following in the assembly output section .code . . .. section_end I thought you just wanted to have .section .code section_end .section .data section_end ... etc. for all used sections, at the very end of the file; after all, all the contributions to a section get concatenated in the assembler. Now you seem to be saying that you want to have multiple section_end directives throughout the file, every time the current section changes. you are saying that if i emit a section directive the compiler will switch to the previously emitted section and then i have to somehow seek to the end of that section and emit my 'section_end' directive? I think you may need to re-read the assembler manual about sections, you are a little confused about the concepts. The compiler doesn't really switch anything; the compiler emits .section directives, in response to which the *assembler* switches to emit code in the chosen section. 
The compiler doesn't keep track of sections; it just randomly emits directives for whichever one it wants the assembly output to go into at any given time, according to whether it's generating the assembly for a function or a variable or other data object.

Ok. Will TARGET_ASM_NAMED_SECTION get invoked for the normal sections like text, data and bss? I tried to include this hook in my code, but the execution never reaches this hook for these sections.

Shafi
Re: Output sections
2009/7/18 Dave Korn dave.korn.cyg...@googlemail.com: Mohamed Shafi wrote: Hello all, Is it possible to emit a assembler directive at the end of each sections? Say like section_end Is there any support for doing something like this in the back-end files? Or should i need to the make changes in the gcc sources? Is so do does anyone know in which function it should happen? There isn't really such a concept as 'end of a section' until you get to final-link time and get all the contributions from different .o files to a given section. During assembler output GCC treats sections as random access, switching freely from one to another and back; it doesn't have any concept of starting/stopping/opening/closing a section but just jumps into any one it likes completely ad-hoc. Assuming you're happy with adding something to the end of each section in each generated .s file, you could use the TARGET_ASM_FILE_END hook to output directives that re-enter each used section and then output your new directive. You may find it hard to know which sections have been used or not in a given file - you can define TARGET_ASM_NAMED_SECTION and make a note of which sections get invoked there, but I'm not sure if that gets called for all sections e.g. init/fini, you may have to try it and see. I am looking for adding something to the end of each section in the generated .s file. Using TARGET_ASM_NAMED_SECTION i will be able to keep track of the sections that are being emitted. But from TARGET_ASM_FILE_END hook how can i re-enter into each section. Are the sections stored in some global variable? Shafi
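The bookkeeping suggested in this message can be sketched in standalone C. The two functions below are stand-ins for what target implementations of TARGET_ASM_NAMED_SECTION and TARGET_ASM_FILE_END might do; their names, the fixed-size table and the exact directive spelling ("section NAME" followed by "section_end") are assumptions made for illustration, not GCC code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_SECTIONS 32

/* Names of the sections seen so far.  The strings are not copied, so
   the caller must keep them alive until the end of the file.  */
static const char *used_sections[MAX_SECTIONS];
static size_t n_used_sections;

/* To be called whenever a section directive is emitted: remember each
   section name once.  */
static void note_section(const char *name)
{
    for (size_t i = 0; i < n_used_sections; i++)
        if (strcmp(used_sections[i], name) == 0)
            return;
    assert(n_used_sections < MAX_SECTIONS);
    used_sections[n_used_sections++] = name;
}

/* To be called once at the end of the assembly file: re-enter every
   recorded section and emit the closing directive there.  */
static void emit_section_ends(FILE *out)
{
    for (size_t i = 0; i < n_used_sections; i++)
        fprintf(out, "\tsection %s\n\tsection_end\n", used_sections[i]);
}
```

Because the assembler concatenates all contributions to a section, re-entering a section at the end of the file and emitting the directive there places it after everything else in that section for this .s file.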
Re: current_function_outgoing_args_size
2009/7/18 Ian Lance Taylor i...@google.com: Mohamed Shafi shafi...@gmail.com writes: The change logs says that current_function_outgoing_args_size is no more available. But it doesnt say with what it is replaced. Looking at the other targets i find that its replaced with some field in a structure crtl. Where is this defined/declared. crtl is declared in function.h. I am working in GCC 4.4.0. I checked with the mainline internals. Even there the references of these deleted variables are not replaced. Could somebody please take care of this. And also references to regs_ever_live. Regards, Shafi
Output sections
Hello all,

Is it possible to emit an assembler directive at the end of each section? Say, something like

section_end

Is there any support for doing something like this in the back-end files? Or do I need to make changes in the gcc sources? If so, does anyone know in which function it should happen?

Regards,
Shafi
current_function_outgoing_args_size
Hello all,

The change log says that current_function_outgoing_args_size is no longer available, but it doesn't say what it has been replaced with. Looking at the other targets, I find that it has been replaced with a field in a structure called crtl. Where is this defined/declared? I am working with GCC 4.4.0. I checked the mainline internals manual. Even there the references to these deleted variables have not been replaced. Could somebody please take care of this?

Regards,
Shafi
Function argument passing
Hello all,

I am doing a port for a private target in GCC 4.4.0. It generates code for both little and big endian. The ABI for the target is as follows:

1. All arguments passed on the stack are passed using their alignment constraints.

   Solution: For this to happen no argument promotion should be done.

2. Functions with a variable number of arguments pass the last fixed argument and all subsequent variable arguments on the stack. Such arguments of fewer than 4 bytes are located on the stack as if the argument had been promoted to 32 bits.

   Solution: For TARGET_STRICT_ARGUMENT_NAMING the internals manual says the following:

   This hook controls how the named argument to FUNCTION_ARG is set for varargs and stdarg functions. If this hook returns true, the named argument is always true for named arguments, and false for unnamed arguments. If it returns false, but TARGET_PRETEND_OUTGOING_VARARGS_NAMED returns true, then all arguments are treated as named. Otherwise, all named arguments except the last are treated as named.

   So I made both TARGET_STRICT_ARGUMENT_NAMING and TARGET_PRETEND_OUTGOING_VARARGS_NAMED return false. Is this correct? How do I make the varargs arguments get promoted to 32 bits when the normal arguments don't require promotion, as mentioned in point (1)?

3. A function returning a structure or union receives in D0 the address of the returned structure or union. The caller allocates space for the returned object.

   Solution: Used TARGET_FUNCTION_VALUE and returned the D0 reg_rtx for structures and unions.

4. A long long return value is returned in R6 and R7, R6 containing the most significant long word and R7 containing the least significant long word, regardless of the endianness mode.

   Solution: Used TARGET_RETURN_IN_MSB to return true when the mode is little endian.

5. If the first argument is a long long, it is passed in R6 and R7, R6 containing the most significant long word and R7 containing the least significant long word, regardless of the endianness mode.

   For the return value I have done as mentioned in (4), but I am not sure how to control the argument passing so that R6 contains the msw and R7 contains the lsw, regardless of the endianness mode.

Regards,
Shafi
CALL_USED_REGISTERS vs CALL_REALLY_USED_REGISTERS
Hello all,

The GCC 4.4.0 internals manual says:

[Macro] CALL_REALLY_USED_REGISTERS
Like CALL_USED_REGISTERS except this macro doesn't require that the entire set of FIXED_REGISTERS be included. (CALL_USED_REGISTERS must be a superset of FIXED_REGISTERS.) This macro is optional. If not specified, it defaults to the value of CALL_USED_REGISTERS.

But it doesn't say why one would need to use this. What is the need for the macro CALL_REALLY_USED_REGISTERS when compared to CALL_USED_REGISTERS?

Regards,
Shafi
About feasibility of implementing an instruction
Hello all, I just want to know about the feasibility of implementing an instruction for a port in gcc 4.4 The target has 40 bit register where the normal load/store/move instructions will be able to access the 32 bits of the register. In order to move data into the rest of the register [b32 to b39] the data has to be stored into a 32bit memory location. The data should be stored in such a way that if it is stored for 0-7 in memory the data can be moved to b32-b39 of a even register and if the data in the memory is stored in 16-23 of the memory word then it can be moved to b32-b39 of a odd register. Hope i make myself clear. Will it be possible to implement this in the gcc back-end so that the particular instruction is supported? Regards, Shafi
Variable Length Execution Set?
Hi all, Does GCC support architectures that has Variable Length Execution Set (VLES)? Are there any developments happening in this direction? Regards, Shafi
Re: Variable Length Execution Set?
2009/5/27 Ian Lance Taylor i...@google.com: Mohamed Shafi shafi...@gmail.com writes: Does GCC support architectures that has Variable Length Execution Set (VLES)? Are there any developments happening in this direction? gcc supports many instruction sets whose instructions are not all the same size, including x86. In particular, gcc supports ia64, which uses bundling. If you mean something else, I think you need to give more details. I know that GCC supports VLIW. VLES is similar to VLIW, except that in a packet i can have variable number of instruction. ie. each packet should contain at least one instruction with a max of 6 instructions in a packet. Shafi
Re: insn does not satisfy its constraints
- Original Message
From: Omar Torres [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Cc: gcc@gcc.gnu.org
Sent: Saturday, August 30, 2008 12:11:36 AM
Subject: Re: insn does not satisfy its constraints

shafi wrote:

Operand 0 is a register
Operand 1 is a memory
Operand 2 is a register

The md description for this instruction is:

;; addhi3
(define_expand "addhi3"
  [(set (match_operand:HI 0 "register_operand" "")
        (plus:HI (match_operand:HI 1 "cool_addhi_operand" "")
                 (match_operand:HI 2 "cool_addhi_operand" "")))]
  ""
  ""
)

(define_insn "*addhi3"
  [(set (match_operand:HI 0 "register_operand" "=r ,r ,r")
        (plus:HI (match_operand:HI 1 "cool_addhi_operand" "%0 ,rim,r")
                 (match_operand:HI 2 "cool_addhi_operand" "rim,0 ,r")))]

Do you have an option where operand 0 is reg and operand 1 is mem and operand 2 is reg?

My purpose is to describe the three possible scenarios:

1) Operand 0 is a register
   Operand 1 is the same register as operand 0
   Operand 2 is a register, immediate or memory

2) Operand 0 is a register
   Operand 1 is a register, immediate or memory
   Operand 2 is the same register as operand 0

3) Operand 0 is a register
   Operand 1 is a register
   Operand 2 is also a register

I am not sure what rim is for?

rim is a shortcut for r, m, i. I think it is allowed to mix several constraints like this, right?

So rim is a user-defined constraint. Then I think you may want to look properly into EXTRA_CONSTRAINT_STR. Probably this is where you might be going wrong.

HTH
Shafi
Need a pointer for debugging
Hello all,

I am involved in the porting of GCC 4.1.2 for a 16-bit target. The target doesn't have any SImode comparisons. Most of the time SImode comparisons are synthesized using HImode comparisons. But in some instances SImode patterns are generated as such, where the code is expanded in the pattern template. During the regression run I got an ICE related to SImode comparisons, more specifically in unroll_and_peel_loops(). In unroll_loop_runtime_iterations(), called from unroll_and_peel_loops(), the following piece of code

  for (i = 0; i < n_peel; i++)
    {
      /* Peel the copy. */
      sbitmap_zero (wont_exit);
      if (i != n_peel - 1 || !last_may_exit)
        SET_BIT (wont_exit, 1);
      ok = duplicate_loop_to_header_edge (loop, loop_preheader_edge (loop),
                                          loops, 1,
                                          wont_exit, desc->out_edge,
                                          remove_edges, &n_remove_edges,
                                          DLTHE_FLAG_UPDATE_FREQ);
      gcc_assert (ok);

      /* Create item for switch. */
      j = n_peel - i - (extra_zero_check ? 0 : 1);
      p = REG_BR_PROB_BASE / (i + 2);

      preheader = loop_split_edge_with (loop_preheader_edge (loop), NULL_RTX);
      branch_code = compare_and_jump_seq (copy_rtx (niter), GEN_INT (j), EQ,
                                          block_label (preheader), p,
                                          NULL_RTX);

      swtch = loop_split_edge_with (single_pred_edge (swtch), branch_code);
      set_immediate_dominator (CDI_DOMINATORS, preheader, swtch);
      single_pred_edge (swtch)->probability = REG_BR_PROB_BASE - p;
      e = make_edge (swtch, preheader,
                     single_succ_edge (swtch)->flags & EDGE_IRREDUCIBLE_LOOP);
      e->probability = p;
    }

generated SImode comparisons from 'compare_and_jump_seq()'. These SImode comparisons are synthesized using HImode comparisons. This results in the generation of two jump instructions, and the ICE occurs because both jump instructions end up in the same basic block.

From bug.c.21.loop2_unroll

;; Start of basic block 26, registers live: (nil)
(note 248 174 245 26 [bb 26] NOTE_INSN_BASIC_BLOCK)
(jump_insn 245 248 246 26 (set (pc) (if_then_else (ne:CC (subreg:HI (reg:SI 113) 0) (const_int 0 [0x0])) (label_ref 247) (pc))) -1 (nil) (nil))
(jump_insn 246 245 247 26 (set (pc) (if_then_else (eq:CC (subreg:HI (reg:SI 113) 2) (const_int 0 [0x0])) (label_ref 244) (pc))) -1 (nil) (nil))
(code_label 247 246 224 26 18 [0 uses])
;; End of basic block 26, registers live: (nil)

Note that if I compile code which has SImode EQ comparisons, the basic blocks and the code are generated properly. Right now I am stuck in debugging. Could anybody please provide me with any pointers?

Regards,
Shafi
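The shape of the synthesized comparison can be written out in C to show why two jumps appear. This is a hypothetical model of a 32-bit equality test built from two 16-bit tests, mirroring the two jump_insns in the dump; which subreg offset corresponds to which half depends on the target's word order and is not assumed here.

```c
#include <assert.h>
#include <stdint.h>

/* 32-bit equality using only 16-bit comparisons.  Each "if" stands
   for one conditional jump, so the sequence contains two separate
   control-flow decisions.  */
static int eq32_via_halves(uint32_t a, uint32_t b)
{
    uint16_t a_hi = (uint16_t)(a >> 16), a_lo = (uint16_t)(a & 0xFFFFu);
    uint16_t b_hi = (uint16_t)(b >> 16), b_lo = (uint16_t)(b & 0xFFFFu);

    if (a_hi != b_hi)   /* first jump: skip ahead, the values differ */
        return 0;
    if (a_lo == b_lo)   /* second jump: taken when both halves match */
        return 1;
    return 0;
}
```

Since a conditional jump has to be the last instruction of its basic block, a sequence like this needs a block boundary after the first jump; a helper that inserts the whole sequence as a single piece of branch code would not create that boundary by itself, which matches the symptom described above.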
ICE in flow.c - Gcc 4.1.2 private port
Hello all,

For the target that I am porting, if support for partial argument passing is enabled I get the following error:

error: Attempt to delete prologue/epilogue insn:
internal compiler error: in propagate_one_insn, at flow.c:1699

This is a 16-bit target with 4 argument registers. FRAME_POINTER_REQUIRED is defined to 0. The code that is being compiled is:

f(float a[],int b[],int c,float d)
{
}

From *.c.00.expand

;; Start of basic block 0, registers live: (nil)
(note 15 2 6 0 [bb 0] NOTE_INSN_BASIC_BLOCK)
(insn 6 15 7 0 (set (reg/v/f:HI 24 [ a ]) (reg:HI 0 R0 [ a ])) -1 (nil) (nil))
(insn 7 6 8 0 (set (reg/v/f:HI 25 [ b ]) (reg:HI 1 R1 [ b ])) -1 (nil) (nil))
(insn 8 7 9 0 (set (reg/v:HI 26 [ c ]) (reg:HI 2 R2 [ c ])) -1 (nil) (nil))
(insn 9 8 12 0 (set (mem/c/i:HI (reg/f:HI 18 virtual-incoming-args) [0 d+0 S2 A16]) (reg:HI 3 R3)) -1 (nil) (nil))
(insn 12 9 10 0 (clobber (reg/v:SF 27 [ d ])) -1 (nil) (nil))
(insn 10 12 11 0 (set (subreg:HI (reg/v:SF 27 [ d ]) 0) (mem/c/i:HI (reg/f:HI 18 virtual-incoming-args) [0 d+0 S2 A16])) -1 (nil) (nil))
(insn 11 10 13 0 (set (subreg:HI (reg/v:SF 27 [ d ]) 2) (mem/c/i:HI (plus:HI (reg/f:HI 18 virtual-incoming-args) (const_int 2 [0x2])) [0 d+2 S2 A16])) -1 (nil) (nil))
(note 13 11 14 0 NOTE_INSN_FUNCTION_BEG)

From *.c.37.lreg

(note 2 0 15 NOTE_INSN_DELETED)
;; Start of basic block 0, registers live: 3 [R3] 12 [R12]
(note 15 2 9 0 [bb 0] NOTE_INSN_BASIC_BLOCK)
(insn 9 15 13 0 (set (mem/c/i:HI (reg/f:HI 12 R12) [0 d+0 S2 A16]) (reg:HI 3 R3)) 1 {movhi_internal} (nil) (nil))
(note 13 9 17 0 NOTE_INSN_FUNCTION_BEG)

From *.c.40.flow2

(note 15 2 31 0 [bb 0] NOTE_INSN_BASIC_BLOCK)
(insn/f 31 15 32 0 (set (reg/f:HI 12 R12) (minus:HI (reg/f:HI 12 R12) (const_int 2 [0x2]))) -1 (nil) (nil))
(insn/f 32 31 33 0 (set (mem:HI (reg/f:HI 12 R12) [0 S2 A16]) (reg:HI 3 R3)) -1 (nil) (nil))
(note 33 32 9 0 NOTE_INSN_PROLOGUE_END)
(insn 9 33 13 0 (set (mem/c/i:HI (reg/f:HI 12 R12) [0 d+0 S2 A16]) (reg:HI 3 R3)) 1 {movhi_internal} (nil) (nil))
(note 13 9 17 0 NOTE_INSN_FUNCTION_BEG)

When an argument is passed partially, 'current_function_pretend_args_size' is set and the prologue sets up stack space accordingly. Based on the liveness information, 'propagate_one_insn()' is trying to delete an insn from the prologue. My question is: is gcc supposed to delete insn 9, even before prologue generation? If that is not the case, where am I going wrong?

Regards,
Shafi
What are the functions that i can use?
Hello all,

I am involved in porting gcc 4.1.2. For some processing I need to know whether a register is defined and/or used in a particular instruction. Until now I have been using 'refers_to_regno_p()' to find out whether a register is used in an instruction and 'modified_in_p()' to find out whether a register is defined in the instruction. But 'refers_to_regno_p()' also looks into the expr_list and/or notes of an instruction. So refers_to_regno_p() sometimes returns 1 when the register is referred to only in the expr_list of the instruction, even though it is not in the use list of the instruction. Could anyone tell me which functions I can use to find out whether a register is used and/or defined in a particular instruction?

Regards,
Shafi
A question about varargs
Hello all,

I am involved in the porting of gcc 4.1.2 for a 16-bit target. For this target the size of long long is 32 bits. For the following code

#define VALUE 0x1B4E81B4E81B4DLL
#define AFTER 0x55

//void test (int n, long long q, int y);
void test (int n, ...);

int main ()
{
  test (1, VALUE, AFTER);
  exit(0);
}

I find that the machine modes of the arguments of test are HImode, DImode and HImode. When I replace the function 'test' with a normal one instead of a varargs one, I find that the machine modes are HImode, SImode and HImode respectively. My question is: even if the function is a varargs function, shouldn't the mode of the argument be SImode instead of DImode, since long long is only 32 bits for the target?

Regards,
Shafi
Re: A question about varargs
2008/7/16 Ian Lance Taylor [EMAIL PROTECTED]:
Mohamed Shafi [EMAIL PROTECTED] writes:

I am involved in the porting of gcc 4.1.2 for 16 bit target. For this target size of long long is 32bits. For the following code

#define VALUE 0x1B4E81B4E81B4DLL

That is not a 32-bit value.

#define AFTER 0x55

//void test (int n, long long q, int y);
void test (int n, ...);

int main ()
{
  test (1, VALUE, AFTER);
  exit(0);
}

i find that the machine mode of the arguments of test are HImode, DImode and HImode. When replace function 'test' with normal one instead of varargs i find that the machine modes are HImode, SImode and HImode respectively. My question is even if the function is a vararg function shouldn't the mode of the argument be SImode instead of DImode since long long is only 32bit for the target?

The value is too big for a long long. When you specify the type, gcc is forced to convert (I hope you can get a warning for that). When you don't specify the type, gcc does not convert. The resulting value has a type which can only be expressed using a gcc extension.

So the behavior that I am getting is the proper one.

If you change the TARGET_SCALAR_MODE_SUPPORTED_P hook to reject all modes larger than SImode, you may get a different result--probably some sort of error.

Yes, this is one option that I didn't think about. But let me ask you something. On my target, returning structures will use registers if they are available. So a structure that has a size of 16x4 will be given 4 registers (i.e. DImode). If I use this hook, will structure returning still work properly? I mean, will they be broken down into two 32-bit data types?

Shafi
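The remark that VALUE is not a 32-bit value is easy to check on a host compiler. The sketch below counts the bits needed to hold the magnitude of a non-negative constant; the helper names are made up for this example, and it says nothing about how the 16-bit target's compiler classifies the literal.

```c
#include <assert.h>
#include <stdint.h>

/* Number of bits needed to represent V as an unsigned quantity
   (0 needs no bits).  */
static int bits_needed(uint64_t v)
{
    int bits = 0;
    while (v != 0) {
        bits++;
        v >>= 1;
    }
    return bits;
}

/* True if V fits in an unsigned 32-bit word.  */
static int fits_in_32_bits(uint64_t v)
{
    assert(bits_needed(v) <= 64);
    return bits_needed(v) <= 32;
}
```

0x1B4E81B4E81B4D has 14 hexadecimal digits with a leading digit of 1, so it needs 53 bits and cannot be held in a 32-bit long long without conversion, whereas AFTER (0x55) needs only 7 bits.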
Is this the expected behavior?
Hello all, I am not sure if this the right mailing list. I am involved in the porting of gcc 4.1.2 for a 16 bit target. In some cases i noticed that callee save registers were getting allocated in the body even though there isn't any function call. I believe that callee save registers will be allocated only if some variable values are required across a function call. So if there is no function call there shouldn't be any callee save registers used in a function body. So my question is will GCC allocate callee save registers for function even if the function doesn't call any other function? Or is this a gcc bug? Hope my question is clear. Regards, Shafi
Re: Is this the expected behavior?
2008/7/15 Ramana Radhakrishnan [EMAIL PROTECTED]:

Hi Mohamed,

Why not ? Callee save registers are after all registers and the split is in the ABI's head (so to speak). So GCC is well within its right to use callee save registers. In fact if you were in a leaf function that did not make any function calls the first preference would be to allocate caller save registers and then to allocate callee save registers - Instead of spilling a caller save register, GCC could very well use a callee save register and the only extra cost would be saving and restoring context of the callee save register in the prologue and the epilogue respectively.

I agree with you, but what about the case where caller-save registers are still available and there are no register restrictions for any instructions? In my case I find that GCC has used only the argument registers, the stack pointer and callee-saved registers. So out of the 16 available registers only 5+1+4 registers were used, even though 6 caller-save registers were available.

HTH.
cheers
Ramana

On Tue, Jul 15, 2008 at 7:50 AM, Mohamed Shafi [EMAIL PROTECTED] wrote:

Hello all, I am not sure if this the right mailing list. I am involved in the porting of gcc 4.1.2 for a 16 bit target. In some cases i noticed that callee save registers were getting allocated in the body even though there isn't any function call. I believe that callee save registers will be allocated only if some variable values are required across a function call. So if there is no function call there shouldn't be any callee save registers used in a function body. So my question is will GCC allocate callee save registers for function even if the function doesn't call any other function? Or is this a gcc bug? Hope my question is clear.

Regards,
Shafi

--
Ramana Radhakrishnan
Re: Is this the expected behavior?
2008/7/15 Ramana Radhakrishnan [EMAIL PROTECTED]: snipped parts of the last mail I agree with you, but what about when there are still caller save register are available and there are no register restrictions for any instructions? In my case i find that GCC has used only the argument registers, stack pointer and callee saved registers. So out of the 16 available registers ony 5+1+4 registers were used, even though there was 6 caller save registers were available Check your REG_ALLOC_ORDER macro ? The order is argument registers, caller save registers and finally the callee save registers. cheers Ramana HTH. cheers Ramana On Tue, Jul 15, 2008 at 7:50 AM, Mohamed Shafi [EMAIL PROTECTED] wrote: Hello all, I am not sure if this the right mailing list. I am involved in the porting of gcc 4.1.2 for a 16 bit target. In some cases i noticed that callee save registers were getting allocated in the body even though there isn't any function call. I believe that callee save registers will be allocated only if some variable values are required across a function call. So if there is no function call there shouldn't be any callee save registers used in a function body. So my question is will GCC allocate callee save registers for function even if the function doesn't call any other function? Or is this a gcc bug? Hope my question is clear. Regards, Shafi -- Ramana Radhakrishnan -- Ramana Radhakrishnan
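For reference, the ordering described in this message (argument registers, then caller-save registers, then callee-save registers) can be written down as a table. The register numbering below is entirely hypothetical, chosen only to match the counts mentioned in the thread (5 argument registers, 6 other caller-save registers, 4 callee-save registers and the stack pointer); in a real port the same list would be the value of the REG_ALLOC_ORDER macro.

```c
#include <assert.h>

#define NUM_REGS 16

/* Hypothetical numbering: R0-R4 argument registers, R5-R10 other
   caller-save registers, R11-R14 callee-save registers, R15 stack
   pointer.  Earlier entries are preferred by the allocator.  */
static const int reg_alloc_order[NUM_REGS] = {
    0, 1, 2, 3, 4,          /* argument registers */
    5, 6, 7, 8, 9, 10,      /* caller-save registers */
    11, 12, 13, 14,         /* callee-save registers */
    15                      /* stack pointer (fixed) */
};

/* Returns 1 if ORDER lists every register number below N exactly once.  */
static int is_permutation(const int *order, int n)
{
    int seen[NUM_REGS] = {0};
    assert(n <= NUM_REGS);
    for (int i = 0; i < n; i++) {
        if (order[i] < 0 || order[i] >= n || seen[order[i]])
            return 0;
        seen[order[i]] = 1;
    }
    return 1;
}

/* Position of REGNO in the allocation order, or -1 if it is absent.  */
static int alloc_rank(int regno)
{
    for (int i = 0; i < NUM_REGS; i++)
        if (reg_alloc_order[i] == regno)
            return i;
    return -1;
}
```

A quick sanity check for such a table is that it is a permutation of all hard register numbers and that every caller-save register ranks ahead of every callee-save one.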
How to get signedness from rtx?
Hello all, Is there a way to know whether an operand is signed or unsigned from its rtx? Regards, Shafi
How to identify comparison of 8bit operands
Hello all, I am involved in porting gcc 4.1.2 to a 16-bit target. The target has a minor flaw: comparison of signed variables goes wrong. So I have to use a different approach to compare signed operands, which obviously takes more cycles and instructions. Comparison of sign-extended 8-bit values, however, works correctly. So I can use the normal comparison for char and the modified one for 16-bit values. My question is: in the back end, will I be able to distinguish between comparisons of sign-extended 8-bit operands and 16-bit operands? Regards, Shafi
How to implement conditional execution
Hello all, The 16-bit target that I am porting to gcc 4.1.2 doesn't have any branch instructions. It only has jump instructions. For comparison it has this instruction:

    if cond Rx Ry
    execute this insn

So compare and branch is implemented as

    if cond Rx Ry
    jmp Label

If the condition in the 'if' instruction is satisfied, the processor executes the next instruction; otherwise it replaces it with a nop. This means that I can have instructions similar to:

    if eq Rx, Ry
    add Rx, Ry
    add Rx, 2

This is similar to conditional execution. This way any instruction can be executed conditionally. But this is different from the normal scheme. Normally the comparison operations set the status flags, and an instruction gets conditionally executed based on these flags. This means that GCC can schedule instructions between the comparison instruction and the conditional instruction, provided none of the scheduled instructions alter the status flags. This is not possible in my case, as there shouldn't be any instruction between 'if eq Rx, Ry' and 'add Rx, Ry'; the 'if' instruction is not really a comparison operation and doesn't set any status flags. Will it be possible to implement this in the GCC back end? Do any other targets have similar instructions? Regards, Shafi
Can register rename pass rename a callee-saved register?
Hello everyone, I am involved in a gcc port in which I found the following problem. Before the register renaming pass, a callee-saved register was being used in the body of the code. Hence the function prologue saved the register and the epilogue restored it. But the register renaming pass removed this particular callee-saved register. The output and code generation are proper, but there is an unnecessary save and restore of a callee-saved register in the prologue and epilogue, even though the reference to the callee-saved register has been removed by the renaming pass. I am using the prologue/epilogue patterns instead of the target macros. So is the rename pass allowed to rename a callee-saved register? Where might this be going wrong? Thanks for your help. Regards, Shafi
Re: Can register rename pass rename a callee-saved register?
2008/6/19 Ian Lance Taylor [EMAIL PROTECTED]: Mohamed Shafi [EMAIL PROTECTED] writes: Before register renaming pass, callee registers was being used in the body of the code. Hence function prologue saved the register and epilogue restored the register. But register renaming pass removed this particular callee saved register.The output and code generation is proper, but there is an unnecessary save and restore of a callee saved register in the prologue and epilogue even though the reference of the callee saved register has been removed by the renaming pass. I am using the prologue/epilogue patterns instead of the target macros. Which version of gcc? I was under the impression that this longstanding buglet was cleaned up by the dataflow work. I am doing a port in gcc 4.1.2. The register is actually replaced by register copy-propagation optimization pass. Here is the rtl dumps before .rnreg (the relevant portions) (insn/f 42 41 43 0 (set (mem:HI (reg/f:HI 12 R12) [0 S2 A16]) (reg:HI 4 R4)) -1 (nil) (expr_list:REG_DEAD (reg:HI 4 R4) (nil))) (note 43 42 9 0 NOTE_INSN_PROLOGUE_END) (note 9 43 14 0 NOTE_INSN_FUNCTION_BEG) (insn 14 9 37 0 (set (reg:HI 4 R4 [orig:26+2 ] [26]) (reg:HI 0 R0 [ pExtern ])) 1 {*movhi_internal} (insn_list:REG_DEP_ANTI 16 (nil)) (expr_list:REG_DEAD (reg:HI 0 R0 [ pExtern ]) (expr_list:REG_NO_CONFLICT (reg/v:SI 0 R0 [orig:23 pExtern ] [23]) (nil (insn 37 14 18 0 (set (reg:HI 8 R8) (const_int 42 [0x2a])) 1 {*movhi_internal} (nil) (nil)) (insn 18 37 38 0 (set (unspec:HI [ (reg:HI 8 R8) ] 2) (unspec_volatile:HI [ (reg:HI 4 R4 [orig:26+2 ] [26]) ] 6)) 6 {out} (insn_list:REG_DEP_TRUE 17 (insn_list:REG_DEP_ANTI 14 (insn_list:REG_DEP_TRUE 7 (insn_list:REG_DEP_TRUE 6 (nil) (expr_list:REG_DEAD (reg:HI 8 R8) (expr_list:REG_DEAD (reg:HI 4 R4 [orig:26+2 ] [26]) (nil And this is the after the optimization pass insn 18: replaced reg 4 with 0 . 
(insn/f 42 41 43 0 (set (mem:HI (reg/f:HI 12 R12) [0 S2 A16]) (reg:HI 4 R4)) 1 {*movhi_internal} (nil) (expr_list:REG_DEAD (reg:HI 4 R4) (nil))) (note 43 42 9 0 NOTE_INSN_PROLOGUE_END) (note 9 43 37 0 NOTE_INSN_FUNCTION_BEG) (insn 37 9 18 0 (set (reg:HI 8 R8) (const_int 42 [0x2a])) 1 {*movhi_internal} (nil) (nil)) (insn 18 37 38 0 (set (unspec:HI [ (reg:HI 8 R8) ] 2) (unspec_volatile:HI [ (reg:HI 0 R0 [orig:26+2 ] [26]) ] 6)) 6 {out} (insn_list:REG_DEP_TRUE 17 (insn_list:REG_DEP_ANTI 14 (insn_list:REG_DEP_TRUE 7 (insn_list:REG_DEP_TRUE 6 (nil) (expr_list:REG_DEAD (reg:HI 8 R8) (expr_list:REG_DEAD (reg:HI 0 R0 [orig:26+2 ] [26]) (nil So is the rename pass allowed to rename a callee saved register? Where might this going wrong? If this is the buglet I'm thinking of, the resulting code does work, despite being suboptimal. It just does an unnecessary save and restore. The resulting code is proper except for the unnecessary save and restore. Ian
Re: Can register rename pass rename a callee-saved register?
2008/6/19 Ian Lance Taylor [EMAIL PROTECTED]: Mohamed Shafi [EMAIL PROTECTED] writes: Which version of gcc? I was under the impression that this longstanding buglet was cleaned up by the dataflow work. I am doing a port in gcc 4.1.2. The register is actually replaced by the register copy-propagation optimization pass. I believe that in gcc 4.3 this unnecessary store and load should no longer happen. Can you tell me what was done in gcc 4.3, so that I can backport the changes to gcc 4.1.2? Regards, Shafi
Re: Can register rename pass rename a callee-saved register?
2008/6/20 Andrew Pinski [EMAIL PROTECTED]: On Thu, Jun 19, 2008 at 11:56 PM, Mohamed Shafi [EMAIL PROTECTED] wrote: Can you tell me what was done in gcc 4.3 so that i can back port the changes to gcc 4.1.2 It was a rewrite of life information of flow.c really. It is very hard to backport (trust me I have tried already). So i should do something in the machine reorg pass to catch cases like these. I guess that is the only hack that is possible. Is there any other way? Was there a bug report filed for this case? Regards, Shafi
How to write pattern for addition with carry operation
Hello all, The 16-bit target that I am porting to gcc 4.1.2 doesn't have any instructions for 32-bit operations. But for addition and subtraction there are addc and subc instructions that also take the carry bit into account. Presently I have patterns for SImode addition and subtraction whose output templates are

    add %0, %1\naddc %N0, %N1
    sub %0, %1\nsubc %N0, %N1

Will it be possible for me to write separate patterns for the add and addc instructions? Regards, Shafi
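As a reading aid for the add/addc pairing in the templates above, here is a small standalone C model of 32-bit arithmetic built from 16-bit halves. It is only a sketch: the struct and function names are made up, and it assumes the first instruction of each pair works on the low halves and the second (addc/subc) on the high halves.

```c
#include <stdint.h>

/* A 32-bit value held as two 16-bit halves, as on a 16-bit target. */
struct pair16 {
    uint16_t hi;
    uint16_t lo;
};

/* Models "add" on the low halves followed by "addc" on the high halves. */
static struct pair16 add32_via_addc(struct pair16 a, struct pair16 b)
{
    struct pair16 r;
    uint32_t low = (uint32_t)a.lo + (uint32_t)b.lo; /* add: may carry out of bit 15 */
    unsigned carry = (low >> 16) & 1u;              /* carry flag left behind by add */
    r.lo = (uint16_t)low;
    r.hi = (uint16_t)(a.hi + b.hi + carry);         /* addc: consumes the carry */
    return r;
}

/* Models "sub" on the low halves followed by "subc" on the high halves. */
static struct pair16 sub32_via_subc(struct pair16 a, struct pair16 b)
{
    struct pair16 r;
    unsigned borrow = a.lo < b.lo;                  /* borrow flag left behind by sub */
    r.lo = (uint16_t)(a.lo - b.lo);
    r.hi = (uint16_t)(a.hi - b.hi - borrow);        /* subc: consumes the borrow */
    return r;
}
```

The model makes the ordering constraint visible: the high-half operation is only correct if nothing disturbs the carry between the two instructions, which is the property any split into separate add and addc patterns would have to preserve.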
How to insert nops
Hello all, For the big-endian 16-bit target that I am porting to gcc 4.1.2, a nop is needed after a load instruction if the destination register of the load is used as a source in the next instruction. So

    load R0, R3[2]
    add R2, R0

needs a nop inserted between the two instructions. I have issues when the operation is on 32-bit data types. The target doesn't have any 32-bit instructions. All the 32-bit move instructions are split after reload. The following is an example where I am having issues:

    (set (reg:HI 2 R2) (mem/s:HI (reg/f:HI 8 R8)))
    (set (reg:HI 3 R3) (mem/s:HI (plus:HI (reg/f:HI 8 R8) (const_int 2 [0x2]))))
    (set (reg:SI 0 R0) (minus:SI (reg:SI 0 R0) (reg:SI 2 R2)))

    load R2, R8
    load R3, R8[2]
    sub R1, R3
    subc R0, R2

For the above case no nop is inserted. But because of the endianness the source register gets used in the next instruction. How do I solve this? I do nop insertion in the reorg pass, where I first do delay slot scheduling. The following is what I have in reorg() for nop insertion:

    attr = get_attr_type (insn);
    if (next_insn && attr == TYPE_LOAD)
      {
        if (insn_true_dependent_p (insn, next_insn))
          emit_insn_after (gen_nop (), insn);
      }

    static bool
    insn_true_dependent_p (rtx x, rtx y)
    {
      rtx tmp;
      if (! INSN_P (x) || ! INSN_P (y))
        return 0;
      tmp = PATTERN (y);
      note_stores (PATTERN (x), insn_dependent_p_1, &tmp);
      return (tmp == NULL_RTX);
    }

    static void
    insn_dependent_p_1 (rtx x, rtx pat ATTRIBUTE_UNUSED, void *data)
    {
      rtx * pinsn = (rtx *) data;
      if (*pinsn && reg_mentioned_p (x, *pinsn))
        *pinsn = NULL_RTX;
    }

I think that, apart from the above cases, I will also have cases where a nop gets inserted when it's not really required. How will it be possible to solve this issue? Regards, Shafi
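One way to picture why the check misses the 32-bit case: the load writes a single 16-bit register (R3), while the following instruction names a 32-bit value that starts at R2, so a test for the exact same register reference finds nothing. The sketch below models a range-overlap test in plain C. The types and function names are hypothetical and this is not GCC code; inside GCC, reg_overlap_mentioned_p in rtlanal.c is the usual helper for an overlap test of this kind.

```c
#include <stdbool.h>

/* A hard register reference: first register number and how many
   consecutive 16-bit registers the value occupies (1 for a 16-bit
   value, 2 for a 32-bit value on this kind of target). */
struct reg_ref {
    unsigned regno;
    unsigned nregs;
};

/* True when the two references share at least one hard register. */
static bool regs_overlap(struct reg_ref a, struct reg_ref b)
{
    return a.regno < b.regno + b.nregs && b.regno < a.regno + a.nregs;
}

/* A nop is needed when the load destination overlaps any source
   operand of the instruction that follows the load. */
static bool needs_nop_after_load(struct reg_ref load_dest,
                                 const struct reg_ref *next_srcs,
                                 unsigned n_srcs)
{
    for (unsigned i = 0; i < n_srcs; i++)
        if (regs_overlap(load_dest, next_srcs[i]))
            return true;
    return false;
}
```

For the example above, a load into R3 followed by an instruction whose sources are the pairs starting at R0 and R2 is reported as a hazard. The test is conservative: it also fires when the overlapping half is only read by the second instruction of the pair, which is the "nop when not really required" case mentioned in the question.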
Implementing a restrictive addressing mode for a gcc port - Take 2
Hello all, The target that I am working on is 16-bit, big endian, with 16 registers. It has this particular addressing mode: load Rd, Ra[offset] store Rs, Ra[offset] where the offset should be positive, the base register Ra should be an even register, and the source or destination register Rs/Rd should be one more than the base register. So the following instructions are valid: load R5, R4[4] store R11, R10[2] while the following ones are wrong: load R8, R6[4] store R3, R8[2] What I did to implement this was to have eight register classes, each with two registers (an even register and an odd register), and then in the define_expand look for the register-indirect-with-offset addressing mode and emit the gen_store_offset or gen_load_offset pattern if that addressing mode is found. In the pattern I have the 8 similar constraints for the base register and the source/destination register. But this didn't work out properly, probably because I had many patterns for movhi operations. So I tried what Jim Wilson suggested to me when I posted this question earlier. What he suggested was: One thing you could try is generating a double-word pseudo-reg at RTL expand time, and then using subreg 0 for the source and subreg 1 for the dest (or vice versa depending on endianness/word order). This will get you a register pair you can use from the register allocator. This doesn't help at reload time though. You probably have to define a constraint for every register, and then write an alternative for every register pair matching the correct even register with the correct odd register. That gets you past reload. So I did the following to implement his suggestion. I have a single define_expand and define_insn for the movhi patterns.
In define_expand for movhi i have the folllowing offset = INTVAL(XEXP(XEXP(mem_op, 0), 1)); dword = gen_reg_rtx (SImode); base = simplify_gen_subreg (HImode, dword, SImode, 0); if (mode == Pmode) { reg_op = simplify_gen_subreg (HImode, dword, SImode, 2); mem_op1 = gen_rtx_MEM (Pmode, plus_constant (base, offset)); } else { reg_op = simplify_gen_subreg (QImode, dword, SImode, 3); mem_op1 = gen_rtx_MEM (QImode, plus_constant (base, offset)); } if (GET_CODE (operands[0]) == MEM) { operands[0] = mem_op1; operands[1] = reg_op; } else if (GET_CODE (operands[1]) == MEM) { operands[1] = mem_op1; operands[0] = reg_op; } and in define_insn i have the following pattern: (define_insn *movhi_internal [(set (match_operand:HI 0 nonimmediate_operand =r,R01,R03,R05,R07,R09,R13,R15,r,U00,U02,U04,U06,U08,U12,U14,m,r) (match_operand:HI 1 general_operand r,U00,U02,U04,U06,U08,U12,U14,m,R01,R03,R05,R07,R09,R13,R15,r,i))] where Uxx is memory constraints and Rxx is register constraints. After implementing this i came across this problem: (insn 12 11 13 1 (set (reg/f:HI 24) (mem/c/i:HI (reg/f:HI 25) [0 m+0 S2 A16])) -1 (nil) (nil)) (insn 13 12 14 1 (set (subreg:QI (reg:SI 28) 3) (mem:QI (plus:HI (subreg:HI (reg:SI 28) 0) (const_int 1 [0x1])) [0 S1 A8])) -1 (nil) (nil)) (insn 14 13 15 1 (set (reg:HI 26) (zero_extend:HI (reg:QI 27))) -1 (nil) (nil)) For zero-extend both operands should be in registers. So one operand which was previously in memory is moved to reg27 through load operations(insn 13). Since for this offset addressing mode is used define_expand for movqi will generate SImode register, reg28 and does the operations. But this is not reflected in the subsequent instructions (insn 14). And hence insn 13 is getting deleted as its operands are never used. What i am i doing wrong? Am i implementing the addressing mode properly? Any help is appreciated. Regards, Shafi
How to specify registers constraints for memory operands?
Hello everyone, I need to specify constraints for the registers used in the memory operands of a load pattern. For this I have done the following:

    #define CONSTRAINT_LEN(CHAR,STR) \
      ((CHAR) == 'R' ? 3 \
       : DEFAULT_CONSTRAINT_LEN(CHAR,STR))

    #define EXTRA_MEMORY_CONSTRAINT(C, STR) \
      ((C) == 'R')

    #define REG_CLASS_FROM_CONSTRAINT(CHAR,STR) \
      reg_class_from_constraint (CHAR, STR)

    #define EXTRA_CONSTRAINT_STR(VALUE,C,STR) \
      extra_constraint (VALUE, C, STR)

In extra_constraint I have the following code:

    {
      if (GET_CODE(value) != MEM)
        return 0;
      if (c == 'R')
        {
          r = XEXP(value,0);
          if ((GET_CODE(r) == REG) && (REGNO(r) < FIRST_PSEUDO_REGISTER))
            {
              rclass = REG_CLASS_FROM_CONSTRAINT(c, str);
              if (rclass == REGNO_REG_CLASS (REGNO(r)))
                return 1;
            }
        }
      return 0;
    }

And I have the following pattern in the md file:

    (define_insn "movhi_load"
      [(set (match_operand:HI 0 "register_operand" "=R01,R03,R05,R07,R09,R13,R15,r")
            (match_operand:HI 1 "memory_operand" "R00,R02,R04,R06,R08,R12,R14,m"))]

Is this the proper way to do this? Thank you for taking the time to read this. Regards, Shafi
Re: Few question regarding the implementation of splitting HImode patterns
On Sat, May 24, 2008 at 12:26 AM, Omar Torres [EMAIL PROTECTED] wrote: Mohamed Shafi wrote: Hello Omar, I saw your mail to gcc mailing list regarding splitting of HImode patterns into QImode patterns. I am also involved in porting. My problem is similar to yours. But i have to split SImode patterns into HImode patterns. I am sure that you have modified your define_split patterns after receiving the suggestions from the mailing list. Could you just mail me the finalized define_split pattern of HImode. One thing that i noticed in your split pattern is that you are not handling cases where operand[0] is memory, i.e store patterns. How are you handling this? Do you have a define_insn for this case? I hope you don't mind me asking these questions. Thank you for your time. Regards, Shafi Hi Mohamed, I added the gcc mailing list to the threat. My current implementation looks like this: ;; movhi (define_expand movhi [(set (match_operand:HI 0 nonimmediate_operand ) (match_operand:HI 1 general_operand ))] { if (c816_expand_move (HImode, operands)) { DONE; } }) ;; =r creates an early clobber. ;; It prevent insn where the target register ;; is the same as the base register used for memory addressing... ;; This is needed so that the split produce correct code. 
(define_insn *movhi [(set (match_operand:HI 0 nonimmediate_operand =r,m) (match_operand:HI 1 general_operand g,r))] #) (define_split [(set (match_operand:HI 0 nonimmediate_operand ) (match_operand:HI 1 general_operand ))] reload_completed [(set (match_dup 2) (match_dup 4)) (set (match_dup 3) (match_dup 5))] { gcc_assert (REG_P (operands[0]) || MEM_P (operands[0])); #ifdef DEBUG_OVERLAP if (reg_overlap_mentioned_p(operands[0], operands[1])){ fprintf (stderr, \\nOperands Overlap:\n\); debug_rtx (curr_insn); } #endif if (REG_P (operands[0])) { operands[2] = gen_highpart(QImode, operands[0]); operands[3] = gen_lowpart (QImode, operands[0]); } else if (MEM_P (operands[0])) { operands[2] = adjust_address (operands[0], QImode, 0); operands[3] = adjust_address (operands[0], QImode, 1); } if (MEM_P (operands[1])) {// || CONST == GET_CODE (operands[1])) { operands[4] = adjust_address (operands[1], QImode, 0); operands[5] = adjust_address (operands[1], QImode, 1); } else if (LABEL_REF == GET_CODE (operands[1]) || SYMBOL_REF == GET_CODE (operands[1])) {// operands[4] = simplify_gen_subreg(QImode, operands[1], HImode, 0); operands[5] = simplify_gen_subreg(QImode, operands[1], HImode, 1); } else if (CONST_INT == GET_CODE (operands[1]) || REG_P (operands[1])) { operands[4] = simplify_gen_subreg(QImode, operands[1], HImode, 0); operands[5] = simplify_gen_subreg(QImode, operands[1], HImode, 1); } else { error(\Unrecognized code in operands[1]\); fputs(\\nrtx code is: \, stderr); debug_rtx(curr_insn); abort(); } }) The purpose of the expand is to load Label or Symbol references into base registers for index addressing. I decided to use the expand since the force_reg() was failing when I called from the split. Thank you for your reply. I think you can do this in GO_IF_LEGITIMATE_ADDRESS macro. There just return false if you find the above addressing modes or rather return tru only for the addressing modes you want to use. 
That way gcc will automatically load the symbol ref to registers. Regards, Shafi
Re: gmon.out creation procedure
On Tue, May 20, 2008 at 1:54 PM, [EMAIL PROTECTED] wrote: Dear Shafi Thanks you very much for the clear details. Definitely your inputs are helpful. 1) I am sure that in gcc-4.0 I found there is file gmon.c in the path gcc-4.0.0/gcc/gmon.c. Anyhow let me concentrate on gmon.c of glibc. I am not sure why this is found in gcc. It is not available in other versions. 2) Next thing I would like to know is to better understand the gmon.c of glibc I would like to degug glibc. since glibc is linked with gcc, I built gcc and glibc separately. while debugging gcc is referring shared glib library, but not the one I built freshly for debugging purpose. To make this happen, where I need to change the path to like both gcc and glibc ? IIRC by passing -static to linker you can link with the static glibc. To make sure that your glibc is picked up maybe you can hide the other glibc from the PATH variable. 3) Please correct me If I am wrong a. for every function mcount() function is called to collect the caller and callee address. where this collected info is placed ? b. the flow of monstartup() function monstartup()--moncontrol() -- profil() who will call the monstartup() ? is it gcrt0 ? before calling the main() function of our routine ? Thats right. Thats the other thing that happen when -pg option is provided. A different startup files is used. This will have a call to the monstartup. monstartup will initialize all the data structures required for collecting profile data and invokes profil system call. c. write_profiling() --- write_gmon() functions calls write_hist(), write_call_graph() and write_bb_counts(). here who calls the write_profiling() ? d. mcleanup() calls write_gmon(). who calls the mcleanup() ? is it gcrt0 ? after control return from main() function ? IIRC it is mcleanup that calls the output function write_gmon, which in turn calls the other functions. mcleanup will be called from the startup file after main returns. 
mcleanup dumps all the information in the output file. Hope this helps. Regards, Shafi Thanks and Regards Raja 2008/5/19 [EMAIL PROTECTED]: Hi, I am Raja, I need a favor in understanding how the gmon.out file is created. Please help me. 1. gmon.c is available in both gcc and glibc. Which is the one used to create gmon.out? I don't think gcc has gmon.c. Only glibc has it. You can also find gmon in newlib for some targets, but this will be customized for the target. 2. Can you brief how the profile information required to create gmon.out is captured? Which functions are responsible for this? These days gmon.c is used only to get histogram records (time-related information). All the other information is now produced by gcc itself, and can be analyzed using gcov. (You will get gcov when you build gcc.) For histogram records, the gmon.c code primarily uses the 'profil' system call. You can get more information about this in the man pages. And of course you will get to know how this is used if you go through the code in gmon.c. When the -pg switch is enabled, all the compiler does is insert a call to the function mcount, usually after the function prologue. This is the function that collects all the needed information. For profiling, information about the caller address and the callee address is necessary. If this information cannot be obtained using __builtin_return_address, then it is calculated by mcount in a target-specific manner and passed on to another function that takes these addresses as arguments and gathers the profiling information. 3. Suppose that the executable is built without the -pg option, but we want to create gmon.out at run time. Is there any way or guideline to implement this? A call to the profiling function (mcount) should be there to generate profiling information. Without that you won't be able to generate gmon.out. Hope this helps, Regards, Shafi Thanks and Regards Raja Saleru
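The caller/callee bookkeeping described for mcount above can be pictured with a toy arc table. This is only an illustration under simplifying assumptions: the names (record_arc, struct arc, MAX_ARCS) are invented here, the table is a fixed-size linear array, and it is not how glibc's gmon.c actually stores its call graph.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_ARCS 64   /* arbitrary limit for the sketch */

/* One call-graph arc: a call site address, the callee's address,
   and how many times that call was seen. */
struct arc {
    uintptr_t from_pc;
    uintptr_t self_pc;
    unsigned long count;
};

static struct arc arcs[MAX_ARCS];
static size_t n_arcs;

/* Record one call from from_pc into self_pc.
   Returns 0 on success, -1 if the table is full. */
static int record_arc(uintptr_t from_pc, uintptr_t self_pc)
{
    for (size_t i = 0; i < n_arcs; i++) {
        if (arcs[i].from_pc == from_pc && arcs[i].self_pc == self_pc) {
            arcs[i].count++;
            return 0;
        }
    }
    if (n_arcs == MAX_ARCS)
        return -1;
    arcs[n_arcs].from_pc = from_pc;
    arcs[n_arcs].self_pc = self_pc;
    arcs[n_arcs].count = 1;
    n_arcs++;
    return 0;
}
```

In this picture the inserted mcount call would hand the two addresses to something like record_arc on every function entry, and the exit-time dump would walk the table and write one record per arc.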
Re: How to legitimize the reload address?
On Wed, May 21, 2008 at 1:42 AM, Jeff Law [EMAIL PROTECTED] wrote: Ian Lance Taylor wrote: Mohamed Shafi [EMAIL PROTECTED] writes: For the 16 bit target that i am currently porting can have only positive offsets less than 0x100. (unsigned 8 bit) for offset addressing mode. I would expect reload to be able to handle this kind of thing anyhow, assuming you define GO_IF_LEGITIMATE_ADDRESS correctly. reload should automatically try loading an out of range offset into a register. Agreed. Typically if there are problems in this area it is because the port hasn't properly defined secondary reloads, or the valid offsets are not consistent within a machine mode. Mohamed, without more details, there's not much we can do to help you. I am sure that i have written GO_IF_LEGITIMATE_ADDRESS correctly. What i have in my port is something similar to mcore back-end. These are the relevant parts: else if (GET_CODE (X) == PLUS) { rtx xop0 = XEXP (X,0); rtx xop1 = XEXP (X,1); if (BASE_REGISTER_RTX_P (xop0)) return legitimate_index_p (mode, xop1); } static int legitimate_index_p (enum machine_mode mode, rtx OP) { if (GET_CODE (OP) == CONST_INT) { if (GET_MODE_SIZE (mode) = 4 (((unsigned)INTVAL (OP)) % 4) == 0 ((unsigned)INTVAL (OP)) = 0x0100) return 1; if (GET_MODE_SIZE (mode) == 2 (((unsigned)INTVAL (OP)) % 2) == 0 ((unsigned)INTVAL (OP)) = 0x0100) return 1; if (GET_MODE_SIZE (mode) == 1 ((unsigned)INTVAL (OP)) = 0x0100) return 1; } return 0; } The compiler is crashing in change_address_1, at emit-rtl.c ... if (validate) { if (reload_in_progress || reload_completed) gcc_assert (memory_address_p (mode, addr)); else addr = memory_address (mode, addr); } Everything starts when cleanup_subreg_operands() is called from reload() for the following pattern. (set (subreg:HI (mem:SI (plus:HI (reg:HI 12 [SP]) (const_int 256)) 2) (reg:HI 3)) and then this becomes (set (mem:HI (plus:HI (reg:HI 12 [SP] ) (const_int 258))) (reg:HI 3)) This pattern is not legitimate due to out of range offset. 
Will I be able to overcome this if I write LEGITIMIZE_RELOAD_ADDRESS or LEGITIMIZE_ADDRESS? Thank you for your time. Regards, Shafi
How to legitimize the reload address?
Hello all, The 16-bit target that I am currently porting can have only positive offsets less than 0x100 (unsigned 8-bit) in the offset addressing mode. During reload I am getting an ICE because the address created is not legitimate. So I guess I have to define the macro LEGITIMIZE_RELOAD_ADDRESS, but I am not sure how to do this. With this, will I be able to convert

    load Rd, Rb[offset]

into

    li Rs, offset
    add Rs, Rb
    load Rd, Rs

where Rs is a reserved register? Or is the only way to do it like the other targets, say rs6000? From rs6000_legitimize_reload_address():

    /* Reload the high part into a base reg; leave the low part
       in the mem directly. */
    x = gen_rtx_PLUS (GET_MODE (x),
                      gen_rtx_PLUS (GET_MODE (x), XEXP (x, 0),
                                    GEN_INT (high)),
                      GEN_INT (low));
    push_reload (XEXP (x, 0), NULL_RTX, &XEXP (x, 0), NULL,
                 BASE_REG_CLASS, GET_MODE (x), VOIDmode, 0, 0,
                 opnum, (enum reload_type)type);
    *win = 1;
    return x;

I guess this will generate something like

    add Rs, Rb, excess_offset
    load Rd, Rs[legitimate_offset]

Regards, Shafi
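The high/low split that the rs6000 snippet performs can be sketched for an unsigned 8-bit displacement as follows. This is a standalone model, not reload code: MAX_OFFSET, struct split_offset and the helper names are assumptions made for the example, and the limit of 0xFF simply follows the "less than 0x100" description in the question.

```c
#include <stdbool.h>

#define MAX_OFFSET 0xFFu   /* assumption: unsigned 8-bit displacement */

/* An out-of-range offset broken into a part that goes into the base
   register and a part that stays in the memory reference. */
struct split_offset {
    unsigned high;   /* the "excess_offset", added to the base */
    unsigned low;    /* the "legitimate_offset", left in the address */
};

/* True when the offset can be encoded directly in the instruction. */
static bool offset_in_range(unsigned offset)
{
    return offset <= MAX_OFFSET;
}

/* Split so that low is always encodable and high + low == offset. */
static struct split_offset split_offset(unsigned offset)
{
    struct split_offset s;
    s.low = offset & MAX_OFFSET;
    s.high = offset - s.low;
    return s;
}
```

An offset such as 258 would come out as high = 256 and low = 2, so only the 256 has to be added into the reserved register while the 2 stays in the load itself. Because the high part is a multiple of 0x100, any alignment the original offset had is preserved in the low part.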
Re: gmon.out creation procedure
2008/5/19 [EMAIL PROTECTED]: Hi, I am Raja, I need a favor in understanding how the gmon.out file is created. Please help me. 1. gmon.c is available in both gcc and glibc. Which is the one used to create gmon.out? I don't think gcc has gmon.c. Only glibc has it. You can also find gmon in newlib for some targets, but this will be customized for the target. 2. Can you brief how the profile information required to create gmon.out is captured? Which functions are responsible for this? These days gmon.c is used only to get histogram records (time-related information). All the other information is now produced by gcc itself, and can be analyzed using gcov. (You will get gcov when you build gcc.) For histogram records, the gmon.c code primarily uses the 'profil' system call. You can get more information about this in the man pages. And of course you will get to know how this is used if you go through the code in gmon.c. When the -pg switch is enabled, all the compiler does is insert a call to the function mcount, usually after the function prologue. This is the function that collects all the needed information. For profiling, information about the caller address and the callee address is necessary. If this information cannot be obtained using __builtin_return_address, then it is calculated by mcount in a target-specific manner and passed on to another function that takes these addresses as arguments and gathers the profiling information. 3. Suppose that the executable is built without the -pg option, but we want to create gmon.out at run time. Is there any way or guideline to implement this? A call to the profiling function (mcount) should be there to generate profiling information. Without that you won't be able to generate gmon.out. Hope this helps, Regards, Shafi Thanks and Regards Raja Saleru
A question about UNSPEC expression and register allocation
Hello all, Recently I noticed that register allocation for the operands in an unspec pattern was going wrong. This was because there was no conflict between the registers used in the unspec pattern and the other registers, where there should have been one. During debugging I found out that the code is written in such a way that it doesn't consider registers used inside an unspec expression. So I rewrote the pattern so that the unspec is in the source rather than in the destination of the pattern. That solved the issue. But is this expected? Will the allocation also go wrong for the source operands if they contain registers inside an unspec expression? I still haven't encountered this. What about live analysis? How are the registers inside an unspec expression handled there? Regards, Shafi
Re: Implementing a restrictive addressing mode for a gcc port
On Tue, Apr 1, 2008 at 2:10 AM, Jim Wilson [EMAIL PROTECTED] wrote: Mohamed Shafi wrote: For the source or the destination register Rd/Ra, the restriction is that it should be one more than the base register . So the following instructions are valid: GCC doesn't provide any easy way for the source address to depend on the destination address, or vice versa. One thing you could try is generating a double-word pseudo-reg at RTL expand time, and then using subreg 0 for the source and subreg 1 for the dest (or vice versa depending on endianness/word order). This will get you a register pair you can use from the register allocator. This doesn't help at reload time though. Ok, whatever i tried to do didn't work properly. So i am trying to implement the way you have suggested. In define_expand for movhi i have the following code to generate double word pseudo-reg. . rtx dword,base,reg; HOST_WIDE_INT offset; offset = INTVAL(XEXP(XEXP(mem_op, 0), 1)); siwrd = gen_reg_rtx (SImode); base = simplify_gen_subreg (HImode, dword, SImode, 0); reg = simplify_gen_subreg (HImode, dword, SImode, 2); if (GET_CODE (operands[0]) == MEM) { operands[0] = gen_rtx_MEM (Pmode, plus_constant (base, offset)); operands[1] = reg; } else if (GET_CODE (operands[1]) == MEM) { operands[1] = gen_rtx_MEM (Pmode, plus_constant (base, offset)); operands[0] = reg; } I hope i am doing correctly. You probably have to define a constraint for every register, and then write an alternative for every register pair matching the correct even register with the correct odd register. That gets you past reload. I have defined a constraint for all the registers. But i am not sure as how to use them in the pattern. [(set (match_operand:HI 0 register_operand =r) (match_operand:HI 1 memory_operand m))] I have to add the constraints along with 'm' and 'r'. But the new constraints are suppose to indicate the register that has to be used. 
So i have defined REG_CLASS_FROM_CONSTRAINT macro to return the reg class of a particular constraint. But i am not sure how this can be used with a memory operand. Should i be defining EXTRA_MEMORY_CONSTRAINT? Can i directly use the register constraints for a memory operand? Thanks for your time. Regards, Shafi
Re: GCC 4.1.2 Port - Is live analysis going wrong?
On Fri, May 16, 2008 at 11:39 PM, Eric Botcazou [EMAIL PROTECTED] wrote: (insn 211 210 215 1 (set (reg:HI 1 R1 [+2 ]) (subreg:HI (reg/v:SF 207 [ d.104 ]) 2)) 4 {movhi_regmove} (insn_list:REG_DEP_TRUE 208 (nil)) (nil)) (call_insn/u 215 211 217 1 (set (reg:HI 0 R0) (call:HI (mem:HI (reg/f:HI 234) [0 S2 A16]) (const_int 0 [0x0]))) 25 {*call_value_internal_long} (insn_list:REG_DEP_ANTI 207 (insn_list:REG_DEP_ANTI 209 (insn_list:REG_DEP_TRUE 213 (insn_list:REG_DEP_TRUE 212 (insn_list:REG_DEP_TRUE 211 (insn_list:REG_DEP_TRUE 210 (insn_list:REG_DEP_ANTI 208 (nil (expr_list:REG_DEAD (reg:SF 2 R2) (insn_list:REG_RETVAL 210 (expr_list:REG_EH_REGION (const_int -1 [0x]) (nil (expr_list:REG_DEP_TRUE (use (reg:SF 2 R2)) (expr_list:REG_DEP_TRUE (use (reg:SF 0 R0)) (nil [...] Things go wrong in call_insn/u 215. Target has R0 and R1 are the parameter registers. There should probably be a USE for R1 on the call insn then, like for R0. Why is it there for the latter and not for the former? -- This is a 16bit target. SF uses two registers.So There its proper. But i am still tracing the bug. The problem is for some reason a definition of R1 is not getting emitted for a library call. This definition actually defines one of the parameters of the library call. This call also returns 2 register value, i.e in R0 and R1. So as far as live analysis is concerned there is a use for R1 but no definition. And hence it stays live through out the program. I now just need to find out why the instruction is not getting emitted. But thanks for taking your time to read this. Regards, Shafi
GCC 4.1.2 Port - Is live analysis going wrong?
Hello all, In the gcc 4.1.2 port i am working on, i get an ICE in insert_save, at caller-save.c:725 And following is the assert that assert failure. /* A common failure mode if register status is not correct in the RTL is for this routine to be called with a REGNO we didn't expect to save. That will cause us to write an insn with a (nil) SET_DEST or SET_SRC. Instead of doing so and causing a crash later, check for this common case here. This will remove one step in debugging such problems. */ gcc_assert (regno_save_mem[regno][1]); insert_save function is called by save_call_clobbered_regs() in the same file.The below is the relevant portion of the dump after local register allocation. (insn 210 213 211 1 (set (reg:HI 0 R0 [ d.104 ]) (subreg:HI (reg/v:SF 207 [ d.104 ]) 0)) 4 {movhi_regmove} (insn_list:REG_DEP_TRUE 208 (nil)) (insn_list:REG_LIBCALL 215 (nil))) (insn 211 210 215 1 (set (reg:HI 1 R1 [+2 ]) (subreg:HI (reg/v:SF 207 [ d.104 ]) 2)) 4 {movhi_regmove} (insn_list:REG_DEP_TRUE 208 (nil)) (nil)) (call_insn/u 215 211 217 1 (set (reg:HI 0 R0) (call:HI (mem:HI (reg/f:HI 234) [0 S2 A16]) (const_int 0 [0x0]))) 25 {*call_value_internal_long} (insn_list:REG_DEP_ANTI 207 (insn_list:REG_DEP_ANTI 209 (insn_list:REG_DEP_TRUE 213 (insn_list:REG_DEP_TRUE 212 (insn_list:REG_DEP_TRUE 211 (insn_list:REG_DEP_TRUE 210 (insn_list:REG_DEP_ANTI 208 (nil (expr_list:REG_DEAD (reg:SF 2 R2) (insn_list:REG_RETVAL 210 (expr_list:REG_EH_REGION (const_int -1 [0x]) (nil (expr_list:REG_DEP_TRUE (use (reg:SF 2 R2)) (expr_list:REG_DEP_TRUE (use (reg:SF 0 R0)) (nil (jump_insn 217 215 222 1 (set (pc) (if_then_else (le:CC (reg:HI 0 R0) (const_int 0 [0x0])) (label_ref:HI 226) (pc))) 48 {cmpbrhi_le} (insn_list:REG_DEP_TRUE 215 (nil)) (expr_list:REG_DEAD (reg:HI 0 R0) (expr_list:REG_BR_PROB (const_int 5000 [0x1388]) (nil ;; End of basic block 1, registers live: 1 [R1] 12 [R12] 14 [R14] 16 [AP] 206 207 208 209 215 218 234 Things go wrong in call_insn/u 215. 
The target has R0 and R1 as the parameter registers. So while building the reload chain for the call instruction, the registers that are live during the call are R0 and R1, among other registers. This information is stored in the live_throughout member of the reload chain. In the function save_call_clobbered_regs(), the register life information in CHAIN is used to compute which regs are live during the call, and this is stored in hard_regs_to_save. After doing the following operations

    /* Compute which hard regs must be saved before this call. */
    AND_COMPL_HARD_REG_SET (hard_regs_to_save, call_fixed_reg_set);
    AND_COMPL_HARD_REG_SET (hard_regs_to_save, this_insn_sets);
    AND_COMPL_HARD_REG_SET (hard_regs_to_save, hard_regs_saved);
    AND_HARD_REG_SET (hard_regs_to_save, call_used_reg_set);

hard_regs_to_save will still contain R1 (parameter registers are part of the call-used register set), and hence insert_save is called to save R1. From the time of reload chain generation up to the save_call_clobbered_regs() call things are proper, even though, as the comment says, the register status is not proper when save_call_clobbered_regs() is called. Looking at the dumps, I think the only thing that is going wrong is the live information of the registers. For a call instruction, all the parameter registers used by the call will be live at the time of the call. But if these registers are used in the successor blocks only to pass parameters, i.e. their value is not used again, shouldn't these registers be marked as dead in the call instruction? After some lengthy debugging this is the only conclusion that I can come to. But I am not sure if this is the way live information is handled. Can someone give any thoughts on this? Regards, Shafi
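The four set operations quoted above can be replayed with plain bitmasks to see why R1 survives. This is a standalone model, not caller-save.c: the typedef, the REG_BIT macro and the function name are invented for the sketch, and which registers are fixed or call-used is an assumption for the example rather than something stated in the thread.

```c
#include <stdint.h>

/* Bit N set means hard register N is in the set. */
typedef uint32_t hard_reg_set;
#define REG_BIT(n) ((hard_reg_set)1u << (n))

/* Mirrors the sequence of AND_COMPL_HARD_REG_SET / AND_HARD_REG_SET
   steps that narrow the live set down to the registers to save. */
static hard_reg_set regs_to_save(hard_reg_set live_throughout,
                                 hard_reg_set call_fixed,
                                 hard_reg_set this_insn_sets,
                                 hard_reg_set already_saved,
                                 hard_reg_set call_used)
{
    hard_reg_set to_save = live_throughout;
    to_save &= ~call_fixed;      /* drop fixed registers */
    to_save &= ~this_insn_sets;  /* drop registers the call itself sets */
    to_save &= ~already_saved;   /* drop registers saved earlier */
    to_save &= call_used;        /* keep only call-clobbered registers */
    return to_save;
}
```

With R0, R1, R12 and R14 live across the call, R12 fixed, only R0 set by the call, nothing saved yet, and R0 to R7 assumed call-used, the result is exactly the bit for R1. That matches the observation in the post: once R1 is recorded as live throughout the call, nothing in this sequence can remove it, so the liveness information is the place to look.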