Hi there. The architecture I'm working on is a 32 bit, word based machine with a 16x16 -> 32 unsigned multiply. For some reason the combine stage is converting the umulhisi3 into a mulsi3 and I'm not sure how to track this down.

The test code is part of an alpha blend:

    void blend(uint8_t* sb, uint8_t* db)
    {
        uint16_t ia = 256 - *sb;
        uint16_t d = *db;
        *db = ((d * ia) >> 8) + *sb;
    }

I've defined the different multiplies in the .md file:

    (define_insn "umulhisi3"
      [(set (match_operand:SI 0 "register_operand" "=r")
            (mult:SI (zero_extend:SI (match_operand:HI 1 "register_operand" "%r"))
                     (zero_extend:SI (match_operand:HI 2 "register_operand" "r"))))]
      ""
      ...

    (define_insn "mulsi3"
      [(set (match_operand:SI 0 "register_operand" "=r")
            (mult:SI (match_operand:SI 1 "register_operand" "%r")
                     (match_operand:SI 2 "register_operand" "r")))]
      ""
      ...

Compiling with -O gives the following in umul.157r.outof_cfglayout, just before the combine stage:

---
(insn 3 6 4 2 umul.c:16 (set (reg/v/f:SI 28 [ sb ])
        (reg:SI 0 R10 [ sb ])) 8 {movsi} (expr_list:REG_DEAD (reg:SI 0 R10 [ sb ])
        (nil)))

(insn 4 3 5 2 umul.c:16 (set (reg/v/f:SI 29 [ db ])
        (reg:SI 1 R11 [ db ])) 8 {movsi} (expr_list:REG_DEAD (reg:SI 1 R11 [ db ])
        (nil)))

(note 5 4 8 2 NOTE_INSN_FUNCTION_BEG)

(insn 8 5 9 2 umul.c:17 (set (reg:SI 26 [ D.1217 ])
        (zero_extend:SI (mem:QI (reg/v/f:SI 28 [ sb ]) [0 S1 A8]))) 27 {zero_extendqisi2} (expr_list:REG_DEAD (reg/v/f:SI 28 [ sb ])
        (nil)))

(insn 9 8 10 2 umul.c:20 (set (reg:HI 30)
        (const_int 256 [0x100])) 1 {movhi_insn} (nil))

(insn 10 9 11 2 umul.c:20 (set (reg:SI 31)
        (minus:SI (subreg:SI (reg:HI 30) 0)
            (reg:SI 26 [ D.1217 ]))) 12 {subsi3} (expr_list:REG_DEAD (reg:HI 30)
        (nil)))

(insn 11 10 12 2 umul.c:20 (set (reg:SI 33)
        (zero_extend:SI (mem:QI (reg/v/f:SI 29 [ db ]) [0 S1 A8]))) 27 {zero_extendqisi2} (nil))

(insn 12 11 13 2 umul.c:20 (set (reg:HI 32)
        (subreg:HI (reg:SI 33) 0)) 1 {movhi_insn} (expr_list:REG_DEAD (reg:SI 33)
        (nil)))

(insn 13 12 14 2 umul.c:20 (set (reg:SI 34)
        (mult:SI (zero_extend:SI (reg:HI 32))
            (zero_extend:SI (subreg:HI (reg:SI 31) 0)))) 14 {umulhisi3} (expr_list:REG_DEAD (reg:HI 32)
        (expr_list:REG_DEAD (reg:SI 31)
            (nil))))

(insn 14 13 15 2 umul.c:20 (set (reg:SI 35)
        (ashiftrt:SI (reg:SI 34)
            (const_int 8 [0x8]))) 21 {ashrsi3_const} (expr_list:REG_DEAD (reg:SI 34)
        (nil)))

(insn 15 14 16 2 umul.c:20 (set (reg:QI 36)
        (subreg:QI (reg:SI 35) 0)) 0 {movqi_insn} (expr_list:REG_DEAD (reg:SI 35)
        (nil)))

(insn 16 15 17 2 umul.c:20 (set (reg:SI 37)
        (plus:SI (reg:SI 26 [ D.1217 ])
            (subreg:SI (reg:QI 36) 0))) 11 {addsi3} (expr_list:REG_DEAD (reg:QI 36)
        (expr_list:REG_DEAD (reg:SI 26 [ D.1217 ])
            (nil))))

(insn 17 16 0 2 umul.c:20 (set (mem:QI (reg/v/f:SI 29 [ db ]) [0 S1 A8])
        (subreg:QI (reg:SI 37) 0)) 0 {movqi_insn} (expr_list:REG_DEAD (reg:SI 37)
        (expr_list:REG_DEAD (reg/v/f:SI 29 [ db ])
            (nil))))
---

The umulhisi3 has been correctly found and used at this stage (insn 13). In the following combine stage, however, it gets converted into a mulsi3. The .combine dump is attached.

The xtensa port is the closest match I can find, as it is 32 bit, word based, and has a umulhisi3. It correctly keeps the 16 bit multiply.

Some other test cases come through fine, for example:

    uint32_t mul(uint16_t a, uint16_t b)
    {
        return a*b;
    }

so it might be something to do with the memory access.

How does the combine stage work? It looks like it could get multiple potential matches for a set of RTLs. Does it use some type of costing function to pick between them? Can I tell combine that a umulhisi3 is cheaper than a mulsi3?

Thanks for the earlier help on the post-reload split to use the accumulator - it's working well.

-- Michael
umul.i.159r.combine
Description: Binary data