https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122488

            Bug ID: 122488
           Summary: Recognize integer 128-bit multiplication emulated with
                    64-bit operations
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: fxue at os dot amperecomputing.com
  Target Milestone: ---

The below is a plain generic implementation for (64 x 64 -> 128) integer
multiplication, which is found in some real applications. Its logic is pretty
plain and straightforward in mathematics. That is, given two 64 operands, A and
B, the computation is based on the formula as:

   A[63:0] * B[63:0]
      = (A[63:32] * 2^32 + A[31:0]) * (B[63:32] * 2^32 + B[31:0])
      = (A[63:32] * B[63:32] * 2^64 + (A[63:32] * B[31:0] + A[31:0] * B[63:32])
* 2^32 + A[31:0] * B[31:0]

void multiply_uint64_to_uint128(uint64_t op0, uint64_t op1, uint64_t *res)
{
  uint64_t op0_lo = op0 & 0x00000000FFFFFFFF;
  uint64_t op1_lo = op1 & 0x00000000FFFFFFFF;
  uint64_t op0_hi = op0 >> 32;
  uint64_t op1_hi = op1 >> 32;
  uint64_t mul_lo = op0_lo * op1_lo;
  uint64_t mul_mi_01 = op0_hi * op1_lo;
  uint64_t mul_mi_10 = op1_hi * op0_lo;
  uint64_t mul_hi = op0_hi * op1_hi;
  uint64_t sum_mi = mul_mi_01 + mul_mi_10;
  uint64_t sum_mi_carry = sum_mi < mul_mi_01;
  uint64_t sum_32 = (mul_lo >> 32) + (sum_mi & 0x00000000FFFFFFFF);

  res[1] = mul_hi + (sum_mi_carry << 32) + (sum_mi >> 32) + (sum_32 >> 32);
  res[0] = (sum_32 << 32) | (mul_lo & 0x00000000FFFFFFFF);
}

The codegen by gcc is far from optimal, is nearly 1:1 correspondence to the
above operation sequence. While on most architectures, the code could be
simplfied to a normal 64-bit mult instruction and another kind of extended one
which gets us the high 64 bits of product for two 64-bit integers. To be
expected, it is actually equivalent to compact source form written with int128
type.

void multiply_uint64_to_uint128(uint64_t op0, uint64_t op1, uint64_t *res)
{
  __uint128_t product = (__uint128_t)op0 * (__uint128_t)op1;

  res[0] = (uint64_t)product;
  res[1] = (uint64_t)(product >> 64);
}

Refer to https://godbolt.org/z/4n37Ta4Yb

Reply via email to