https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82418

Alexander Monakov <amonakov at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |uros at gcc dot gnu.org

--- Comment #6 from Alexander Monakov <amonakov at gcc dot gnu.org> ---
(the 'divx' function in comment 5 does not implement division by 100)

I'd like to see GCC improve here, so I looked at how this could be fixed. I'm
afraid that adjusting expand_divmod to select the cheaper alternative on x86
would be too complicated. I think it may be reasonable to conceal the 32x32
mul-highpart pattern on 64-bit x86 from expand_divmod, so that it falls back to
the 32x32->64 widening multiply, which leads to optimal code.

I also think the 32x32 mul-highpart pattern is not very useful outside of magic
division by constants, so concealing it altogether may be acceptable if no
better solution is available.

(to recap, we want a 64-bit imul here rather than a 32-bit widening mul with the
result in edx:eax, because imul has better latency and throughput, imposes fewer
register allocation constraints, and doesn't need a register to hold the
immediate)

Patch I'm testing to disallow 32x32 mul-highpart on 64-bit x86:

--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -1042,6 +1042,10 @@ (define_mode_iterator SWIM248 [(HI "TARGET_HIMODE_MATH")
 (define_mode_iterator DWI [(DI "!TARGET_64BIT")
                           (TI "TARGET_64BIT")])

+;; Widest single word integer modes.
+(define_mode_iterator SWI48W [(SI "!TARGET_64BIT")
+                             (DI "TARGET_64BIT")])
+
 ;; GET_MODE_SIZE for selected modes.  As GET_MODE_SIZE is not
 ;; compile time constant, it is faster to use <MODE_SIZE> than
 ;; GET_MODE_SIZE (<MODE>mode).  For XFmode which depends on
@@ -7792,16 +7796,16 @@ (define_insn "*<u>mulqihi3_1"
    (set_attr "mode" "QI")])

 (define_expand "<s>mul<mode>3_highpart"
-  [(parallel [(set (match_operand:SWI48 0 "register_operand")
-                  (truncate:SWI48
+  [(parallel [(set (match_operand:SWI48W 0 "register_operand")
+                  (truncate:SWI48W
                     (lshiftrt:<DWI>
                       (mult:<DWI>
                         (any_extend:<DWI>
-                          (match_operand:SWI48 1 "nonimmediate_operand"))
+                          (match_operand:SWI48W 1 "nonimmediate_operand"))
                         (any_extend:<DWI>
-                          (match_operand:SWI48 2 "register_operand")))
+                          (match_operand:SWI48W 2 "register_operand")))
                       (match_dup 3))))
-             (clobber (match_scratch:SWI48 4))
+             (clobber (match_scratch:SWI48W 4))
              (clobber (reg:CC FLAGS_REG))])]
   ""
   "operands[3] = GEN_INT (GET_MODE_BITSIZE (<MODE>mode));")
