Re: [PATCH] middle-end: Optimize reversed CRC table-based implementation

Andrew Pinski Fri, 28 Nov 2025 11:45:19 -0800

On Fri, Nov 28, 2025 at 2:00 AM Kito Cheng <[email protected]> wrote:
>
> I know we're already in stage 3, so I'm basically asking for a review and am
> fine with deferring this to GCC 17. But if it's acceptable for GCC 16, that
> would be great too. :P
>
> ---
>
> The previous reversed CRC implementation used explicit bit reflection
> before and after the CRC computation:
>
>   reflect(crc_init);
>   reflect(data);
>   for (int i = 0; i < data_bit_size / 8; i++)
>     crc = (crc << 8) ^ table[(crc >> (crc_bit_size - 8))
>                              ^ (data >> (data_bit_size - (i+1) * 8) & 0xFF)];
>   reflect(crc);
>
> This patch generates a reversed polynomial lookup table directly,
> eliminating the need for bit reflection operations.  The new algorithm:
>
>   for (int i = 0; i < data_bit_size / 8; i++)
>     crc = (crc >> 8) ^ table[(crc ^ (data >> (i * 8))) & 0xFF];
>
> This improves code generation for all targets using table-based reversed
> CRC, as it removes the overhead of reflecting input data and CRC values.


The one question I have is this local decision or done globally? For
-Os (maybe it is already done I didn't check), if the program does
both the reversed and non-reversed poly in the same program, won't it
be smaller to always do the non-reversed poly as the table to share
the table between the 2?

Thanks,
Andrew Pinski

>
> Ref:
>
> [1] "Reversing CRC - Theory and Practice"
>   
> https://sar.informatik.hu-berlin.de/research/publications/SAR-PR-2006-05/SAR-PR-2006-05_.pdf
>
> gcc/ChangeLog:
>
>         * expr.cc (calculate_reversed_crc): New function.
>         (assemble_reversed_crc_table): New function.
>         (generate_reversed_crc_table): New function.
>         (calculate_table_based_reversed_CRC): New function.
>         (expand_reversed_crc_table_based): Remove gen_reflecting_code
>         parameter.  Use calculate_table_based_reversed_CRC.
>         * expr.h (expand_reversed_crc_table_based): Update prototype.
>         * builtins.cc (expand_builtin_crc_table_based): Update call.
>         * internal-fn.cc (expand_crc_optab_fn): Update call.
>         * config/aarch64/aarch64.md (crc_rev<ALLI:mode><ALLX:mode>4):
>         Update call.
>         * config/i386/i386.md (crc_rev<SWI124:mode>si4): Update call.
>         * config/loongarch/loongarch.md (crc_rev<mode>si4): Update call.
>         Remove local rbit lambda.
>         * config/riscv/bitmanip.md (crc_rev<ANYI1:mode><ANYI:mode>4):
>         Update call.  Remove TARGET_ZBKB case.
>         * config/riscv/riscv.cc (generate_reflecting_code_using_brev):
>         Remove.
>         * config/riscv/riscv-protos.h (generate_reflecting_code_using_brev):
>         Remove declaration.
> ---
>  gcc/builtins.cc                   |   3 +-
>  gcc/config/aarch64/aarch64.md     |   3 +-
>  gcc/config/i386/i386.md           |   3 +-
>  gcc/config/loongarch/loongarch.md |  17 +---
>  gcc/config/riscv/bitmanip.md      |  13 +--
>  gcc/config/riscv/riscv-protos.h   |   1 -
>  gcc/config/riscv/riscv.cc         |  33 ------
>  gcc/expr.cc                       | 160 ++++++++++++++++++++++++++----
>  gcc/expr.h                        |   3 +-
>  gcc/internal-fn.cc                |   3 +-
>  10 files changed, 148 insertions(+), 91 deletions(-)
>
> diff --git a/gcc/builtins.cc b/gcc/builtins.cc
> index bbe181a5128..4f4694f2e2d 100644
> --- a/gcc/builtins.cc
> +++ b/gcc/builtins.cc
> @@ -7828,8 +7828,7 @@ expand_builtin_crc_table_based (internal_fn fn, 
> scalar_mode crc_mode,
>    else
>      /* If it's IFN_CRC_REV generate bit-reversed CRC.  */
>      expand_reversed_crc_table_based (target, op1, op2, op3,
> -                                    data_mode,
> -                                    generate_reflecting_code_standard);
> +                                    data_mode);
>    return target;
>  }
>
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index de6b1d0ed06..1171ed8258b 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -4878,8 +4878,7 @@ (define_expand "crc_rev<ALLI:mode><ALLX:mode>4"
>      else
>        /* Otherwise, generate table-based CRC.  */
>        expand_reversed_crc_table_based (operands[0], operands[1], operands[2],
> -                                      operands[3], <ALLI:MODE>mode,
> -                                      generate_reflecting_code_standard);
> +                                      operands[3], <ALLI:MODE>mode);
>      DONE;
>    }
>  )
> diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
> index 6af7dcfcdd3..efb7c768ca3 100644
> --- a/gcc/config/i386/i386.md
> +++ b/gcc/config/i386/i386.md
> @@ -29719,8 +29719,7 @@ (define_expand "crc_rev<SWI124:mode>si4"
>      emit_insn (gen_sse4_2_crc32<mode> (operands[0], operands[1], 
> operands[2]));
>    else
>      expand_reversed_crc_table_based (operands[0], operands[1], operands[2],
> -                                    operands[3], <SWI124:MODE>mode,
> -                                    generate_reflecting_code_standard);
> +                                    operands[3], <SWI124:MODE>mode);
>    DONE;
>  })
>
> diff --git a/gcc/config/loongarch/loongarch.md 
> b/gcc/config/loongarch/loongarch.md
> index 763d514cac7..bfe99877aa1 100644
> --- a/gcc/config/loongarch/loongarch.md
> +++ b/gcc/config/loongarch/loongarch.md
> @@ -4687,23 +4687,8 @@ (define_expand "crc_rev<mode>si4"
>        }
>      else
>        {
> -       /* No CRC instruction is suitable, use the generic table-based
> -          implementation but optimize bit reversion.  */
> -       auto rbit = [](rtx *r)
> -         {
> -           /* Well, this is ugly.  The problem is
> -              expand_reversed_crc_table_based only accepts one helper
> -              for reversing data elements and CRC states.  */
> -           auto mode = GET_MODE (*r);
> -           auto rbit = (mode == <MODE>mode ? gen_rbit<mode> : gen_rbitsi);
> -           rtx out = gen_reg_rtx (mode);
> -
> -           emit_insn (rbit (out, *r));
> -           *r = out;
> -         };
>         expand_reversed_crc_table_based (operands[0], operands[1],
> -                                        msg, operands[3], <MODE>mode,
> -                                        rbit);
> +                                        msg, operands[3], <MODE>mode);
>        }
>      DONE;
>    })
> diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md
> index 166ddd9db9e..9e63e48beed 100644
> --- a/gcc/config/riscv/bitmanip.md
> +++ b/gcc/config/riscv/bitmanip.md
> @@ -1234,25 +1234,16 @@ (define_expand "crc_rev<ANYI1:mode><ANYI:mode>4"
>       it is possible to store the quotient within a single variable
>       (E.g.  CRC64's quotient may need 65 bits,
>       we can't keep it in 64 bit variable.)
> -     then use clmul instruction to implement the CRC,
> -     otherwise (TARGET_ZBKB) generate table based using brev.  */
> +     then use clmul instruction to implement the CRC.  */
>    if ((TARGET_ZBKC || TARGET_ZBC || TARGET_ZVBC) && <ANYI:MODE>mode < 
> word_mode)
>      expand_reversed_crc_using_clmul (<ANYI:MODE>mode, <ANYI1:MODE>mode,
>                                      operands);
> -  else if (TARGET_ZBKB)
> -    /* Generate table-based CRC.
> -       To reflect values use brev and bswap instructions.  */
> -    expand_reversed_crc_table_based (operands[0], operands[1],
> -                                    operands[2], operands[3],
> -                                    GET_MODE (operands[2]),
> -                                    generate_reflecting_code_using_brev);
>    else
>      /* Generate table-based CRC.
>         To reflect values use standard reflecting algorithm.  */
>      expand_reversed_crc_table_based (operands[0], operands[1],
>                                      operands[2], operands[3],
> -                                    GET_MODE (operands[2]),
> -                                    generate_reflecting_code_standard);
> +                                    GET_MODE (operands[2]));
>    DONE;
>  })
>
> diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
> index a372779cf9f..199006c4cb5 100644
> --- a/gcc/config/riscv/riscv-protos.h
> +++ b/gcc/config/riscv/riscv-protos.h
> @@ -181,7 +181,6 @@ extern bool riscv_reg_frame_related (rtx);
>  extern void riscv_split_sum_of_two_s12 (HOST_WIDE_INT, HOST_WIDE_INT *,
>                                         HOST_WIDE_INT *);
>  extern bool riscv_vector_float_type_p (const_tree type);
> -extern void generate_reflecting_code_using_brev (rtx *);
>  extern void expand_crc_using_clmul (scalar_mode, scalar_mode, rtx *);
>  extern void expand_reversed_crc_using_clmul (scalar_mode, scalar_mode, rtx 
> *);
>
> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> index 2d14b3c92f5..c6883b53f4f 100644
> --- a/gcc/config/riscv/riscv.cc
> +++ b/gcc/config/riscv/riscv.cc
> @@ -15339,39 +15339,6 @@ riscv_use_by_pieces_infrastructure_p (unsigned 
> HOST_WIDE_INT size,
>    return default_use_by_pieces_infrastructure_p (size, alignment, op, 
> speed_p);
>  }
>
> -/* Generate instruction sequence
> -   which reflects the value of the OP using bswap and brev8 instructions.
> -   OP's mode may be less than word_mode, to get the correct number,
> -   after reflecting we shift right the value by SHIFT_VAL.
> -   E.g. we have 1111 0001, after reflection (target 32-bit) we will get
> -   1000 1111 0000 0000, if we shift-out 16 bits,
> -   we will get the desired one: 1000 1111.  */
> -
> -void
> -generate_reflecting_code_using_brev (rtx *op_p)
> -{
> -  rtx op = *op_p;
> -  machine_mode op_mode = GET_MODE (op);
> -
> -  /* OP may be smaller than a word.  We can use a paradoxical subreg
> -     to compensate for that.  It should never be larger than a word
> -     for RISC-V.  */
> -  gcc_assert (op_mode <= word_mode);
> -  if (op_mode != word_mode)
> -    op = gen_lowpart (word_mode, op);
> -
> -  HOST_WIDE_INT shift_val = (BITS_PER_WORD
> -                            - GET_MODE_BITSIZE (op_mode).to_constant ());
> -  riscv_expand_op (BSWAP, word_mode, op, op, op);
> -  riscv_expand_op (LSHIFTRT, word_mode, op, op,
> -                  gen_int_mode (shift_val, word_mode));
> -  if (TARGET_64BIT)
> -    emit_insn (gen_riscv_brev8_di (op, op));
> -  else
> -    emit_insn (gen_riscv_brev8_si (op, op));
> -}
> -
> -
>  /* Generate assembly to calculate CRC using clmul instruction.
>     The following code will be generated when the CRC and data sizes are 
> equal:
>       li      a4,quotient
> diff --git a/gcc/expr.cc b/gcc/expr.cc
> index 7d84ad9e6fc..bfbcf0bfb08 100644
> --- a/gcc/expr.cc
> +++ b/gcc/expr.cc
> @@ -14485,6 +14485,75 @@ generate_crc_table (unsigned HOST_WIDE_INT polynom, 
> unsigned short crc_bits)
>    return assemble_crc_table (polynom, crc_bits);
>  }
>
> +/* Calculate CRC for the initial CRC and given POLYNOMIAL.
> +   CRC_BITS is CRC size.  */
> +
> +static unsigned HOST_WIDE_INT
> +calculate_reversed_crc (unsigned HOST_WIDE_INT crc,
> +                       unsigned HOST_WIDE_INT polynomial,
> +                       unsigned short crc_bits)
> +{
> +  unsigned HOST_WIDE_INT rev_polynom = reflect_hwi (polynomial, crc_bits);
> +  for (int j = 0; j < 8; j++)
> +    {
> +      if (crc & 1)
> +       crc = (crc >> 1) ^ rev_polynom;
> +      else
> +       crc >>= 1;
> +    }
> +  /* Zero out bits in crc beyond the specified number of crc_bits.  */
> +  if (crc_bits < sizeof (crc) * CHAR_BIT)
> +    crc &= (HOST_WIDE_INT_1U << crc_bits) - 1;
> +  return crc;
> +}
> +
> +/* Assemble CRC table with 256 elements for the given POLYNOM and CRC_BITS.
> +   POLYNOM is the polynomial used to calculate the CRC table's elements.
> +   CRC_BITS is the size of CRC, may be 8, 16, ... . */
> +
> +static rtx
> +assemble_reversed_crc_table (unsigned HOST_WIDE_INT polynom, unsigned short 
> crc_bits)
> +{
> +  unsigned table_el_n = 0x100;
> +  tree ar = build_array_type (make_unsigned_type (crc_bits),
> +                             build_index_type (size_int (table_el_n - 1)));
> +
> +  /* Initialize the table.  */
> +  vec<tree, va_gc> *initial_values;
> +  vec_alloc (initial_values, table_el_n);
> +  for (size_t i = 0; i < table_el_n; ++i)
> +    {
> +      unsigned HOST_WIDE_INT crc = calculate_reversed_crc (i, polynom, 
> crc_bits);
> +      tree element = build_int_cstu (make_unsigned_type (crc_bits), crc);
> +      vec_safe_push (initial_values, element);
> +    }
> +  tree ctor = build_constructor_from_vec (ar, initial_values);
> +  rtx mem = output_constant_def (ctor, 1);
> +  gcc_assert (MEM_P (mem));
> +  if (dump_file && (dump_flags & TDF_DETAILS))
> +    {
> +      fprintf (dump_file,
> +              ";; emitting reversed crc table crc_%u_polynomial_"
> +              HOST_WIDE_INT_PRINT_HEX " ",
> +              crc_bits, polynom);
> +      print_rtl_single (dump_file, XEXP (mem, 0));
> +      fprintf (dump_file, "\n");
> +    }
> +
> +  return XEXP (mem, 0);
> +}
> +
> +/* Generate reversed CRC table for the given POLYNOM and CRC_BITS.  */
> +
> +static rtx
> +generate_reversed_crc_table (unsigned HOST_WIDE_INT polynom,
> +                            unsigned short crc_bits)
> +{
> +  gcc_assert (crc_bits <= 64);
> +
> +  return assemble_reversed_crc_table (polynom, crc_bits);
> +}
> +
>  /* Generate table-based CRC code for the given CRC, INPUT_DATA and the
>     POLYNOMIAL (without leading 1).
>
> @@ -14564,6 +14633,68 @@ calculate_table_based_CRC (rtx *crc, const rtx 
> &input_data,
>      }
>  }
>
> +/* Generate table-based reversed CRC code for the given CRC, INPUT_DATA
> +   and the POLYNOMIAL (without leading 1).
> +
> +   This function generates code for reversed (bit-reflected) CRC calculation
> +   using a pre-computed lookup table.  Unlike the standard CRC calculation,
> +   this processes data from LSB to MSB, eliminating the need for explicit
> +   bit reflection before and after the CRC computation.  */
> +
> +static void
> +calculate_table_based_reversed_CRC (rtx *crc, const rtx &input_data,
> +                                   const rtx &polynomial,
> +                                   machine_mode data_mode)
> +{
> +  machine_mode mode = GET_MODE (*crc);
> +  unsigned short crc_bit_size = GET_MODE_BITSIZE (mode).to_constant ();
> +  unsigned short data_size = GET_MODE_SIZE (data_mode).to_constant ();
> +  rtx tab = generate_reversed_crc_table (UINTVAL (polynomial), crc_bit_size);
> +
> +  for (unsigned short i = 0; i < data_size; i++)
> +    {
> +      *crc = force_reg (mode, *crc);
> +
> +      /* data >> (8 * i).  */
> +      unsigned range_8 = 8 * i;
> +      /* CRC's mode is always at least as wide as INPUT_DATA.  Convert
> +        INPUT_DATA into CRC's mode.  */
> +      rtx data = gen_reg_rtx (mode);
> +      convert_move (data, input_data, 1);
> +      data = expand_shift (RSHIFT_EXPR, mode, data, range_8, NULL_RTX, 1);
> +
> +      /* crc ^ (data >> (8 * i)).  */
> +      rtx in = expand_binop (mode, xor_optab, *crc, data,
> +                            NULL_RTX, 1, OPTAB_WIDEN);
> +
> +      /* (crc ^ data) & 0xFF.  */
> +      rtx index = expand_and (mode, in, gen_int_mode (255, mode),
> +                             NULL_RTX);
> +      int log_crc_size = exact_log2 (GET_MODE_SIZE (mode).to_constant ());
> +      index = expand_shift (LSHIFT_EXPR, mode, index,
> +                           log_crc_size, NULL_RTX, 0);
> +
> +      rtx addr = gen_reg_rtx (Pmode);
> +      convert_move (addr, index, 1);
> +      addr = expand_binop (Pmode, add_optab, addr, tab, NULL_RTX,
> +                           0, OPTAB_DIRECT);
> +
> +      /* crc_table[(crc ^ data) & 0xFF].  */
> +      rtx tab_el = validize_mem (gen_rtx_MEM (mode, addr));
> +
> +      /* (crc >> 8) if CRC is larger than 8, otherwise 0.  */
> +      rtx high = NULL_RTX;
> +      if (crc_bit_size != 8)
> +       high = expand_shift (RSHIFT_EXPR, mode, *crc, 8, NULL_RTX, 1);
> +      else
> +       high = gen_int_mode (0, mode);
> +
> +      /* crc = (crc >> 8) ^ crc_table[(crc ^ data) & 0xFF].  */
> +      *crc = expand_binop (mode, xor_optab, tab_el, high, NULL_RTX, 1,
> +                          OPTAB_WIDEN);
> +    }
> +}
> +
>  /* Generate table-based CRC code for the given CRC, INPUT_DATA and the
>     POLYNOMIAL (without leading 1).
>
> @@ -14701,41 +14832,30 @@ generate_reflecting_code_standard (rtx *op)
>     the POLYNOMIAL (without leading 1).
>
>     CRC is OP1, data is OP2 and the polynomial is OP3.
> -   This must generate CRC table and assembly for the following code,
> +   This generates a reversed CRC table and assembly for the following code,
>     where crc_bit_size and data_bit_size may be 8, 16, 32, 64:
>     uint_crc_bit_size_t
>     crc_crc_bit_size (uint_crc_bit_size_t crc_init,
> -                          uint_data_bit_size_t data, size_t size)
> +                    uint_data_bit_size_t data)
>     {
> -     reflect (crc_init)
>       uint_crc_bit_size_t crc = crc_init;
> -     reflect (data);
>       for (int i = 0; i < data_bit_size / 8; i++)
> -       crc = (crc << 8) ^ crc_table[(crc >> (crc_bit_size - 8))
> -                         ^ (data >> (data_bit_size - (i + 1) * 8) & 0xFF))];
> -     reflect (crc);
> +       crc = (crc >> 8) ^ crc_table[(crc ^ (data >> (i * 8))) & 0xFF];
>       return crc;
> -   }  */
> +   }
> +
> +   This approach uses a pre-computed reversed polynomial table, eliminating
> +   the need for explicit bit reflection before and after the CRC 
> computation.  */
>
>  void
>  expand_reversed_crc_table_based (rtx op0, rtx op1, rtx op2, rtx op3,
> -                                machine_mode data_mode,
> -                                void (*gen_reflecting_code) (rtx *op))
> +                                machine_mode data_mode)
>  {
>    gcc_assert (!CONST_INT_P (op0));
>    gcc_assert (CONST_INT_P (op3));
>    machine_mode crc_mode = GET_MODE (op0);
> -
>    rtx crc = gen_reg_rtx (crc_mode);
>    convert_move (crc, op1, 0);
> -  gen_reflecting_code (&crc);
> -
> -  rtx data = gen_reg_rtx (data_mode);
> -  convert_move (data, op2, 0);
> -  gen_reflecting_code (&data);
> -
> -  calculate_table_based_CRC (&crc, data, op3, data_mode);
> -
> -  gen_reflecting_code (&crc);
> +  calculate_table_based_reversed_CRC (&crc, op2, op3, data_mode);
>    convert_move (op0, crc, 0);
>  }
> diff --git a/gcc/expr.h b/gcc/expr.h
> index 060151df010..40627c4f113 100644
> --- a/gcc/expr.h
> +++ b/gcc/expr.h
> @@ -385,8 +385,7 @@ gf2n_poly_long_div_quotient (unsigned HOST_WIDE_INT, 
> unsigned short);
>  /* Generate table-based CRC.  */
>  extern void generate_reflecting_code_standard (rtx *);
>  extern void expand_crc_table_based (rtx, rtx, rtx, rtx, machine_mode);
> -extern void expand_reversed_crc_table_based (rtx, rtx, rtx, rtx, 
> machine_mode,
> -                                            void (*) (rtx *));
> +extern void expand_reversed_crc_table_based (rtx, rtx, rtx, rtx, 
> machine_mode);
>
>  /* Cache of the "extended" flag in the target's _BitInt description
>     for use during expand.  */
> diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
> index 13fbd2ce788..acad06db8d7 100644
> --- a/gcc/internal-fn.cc
> +++ b/gcc/internal-fn.cc
> @@ -4087,8 +4087,7 @@ expand_crc_optab_fn (internal_fn fn, gcall *stmt, 
> convert_optab optab)
>        else
>         /* If it's IFN_CRC_REV generate bit-reversed CRC.  */
>         expand_reversed_crc_table_based (dest, crc, data, polynomial,
> -                                        TYPE_MODE (data_type),
> -                                        generate_reflecting_code_standard);
> +                                        TYPE_MODE (data_type));
>
>        /* Now get the return value where it needs to be, taking care to
>          ensure it's promoted appropriately if the ABI demands it.
> --
> 2.34.1
>

Re: [PATCH] middle-end: Optimize reversed CRC table-based implementation

Reply via email to