12] Implement internal functions for efficient CRC computation

Jeff Law Sat, 25 May 2024 10:41:12 -0700



On 5/24/24 2:41 AM, Mariam Arutunian wrote:

Add two new internal functions (IFN_CRC, IFN_CRC_REV), to provide fasterCRC generation.

One performs bit-forward and the other bit-reversed CRC computation.
If CRC optabs are supported, they are used for the CRC computation.
Otherwise, table-based CRC is generated.
The supported data and CRC sizes are 8, 16, 32, and 64 bits.
The polynomial is without the leading 1.
A table with 256 elements is used to store precomputed CRCs.
For the reflection of inputs and the output, a simple algorithm involving
SHIFT, AND, and OR operations is used.

Co-authored-by: Joern Rennecke <joern.renne...@embecosm.com<mailto:joern.renne...@embecosm.com>>


gcc/

    * doc/md.texi (crc@var{m}@var{n}4,
    crc_rev@var{m}@var{n}4): Document.
    * expr.cc (generate_crc_table): New function.
    (calculate_table_based_CRC): Likewise.
    (expand_crc_table_based): Likewise.
    (gen_common_operation_to_reflect): Likewise.
    (reflect_64_bit_value): Likewise.
    (reflect_32_bit_value): Likewise.
    (reflect_16_bit_value): Likewise.
    (reflect_8_bit_value): Likewise.
    (generate_reflecting_code_standard): Likewise.
    (expand_reversed_crc_table_based): Likewise.
    * expr.h (generate_reflecting_code_standard): New function declaration.
    (expand_crc_table_based): Likewise.
    (expand_reversed_crc_table_based): Likewise.
    * internal-fn.cc: (crc_direct): Define.
    (direct_crc_optab_supported_p): Likewise.
    (expand_crc_optab_fn): New function
    * internal-fn.def (CRC, CRC_REV): New internal functions.
    * optabs.def (crc_optab, crc_rev_optab): New optabs.

Signed-off-by: Mariam Arutunian <mariamarutun...@gmail.com<mailto:mariamarutun...@gmail.com>>

diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 5730bda80dc..be68ef860f9 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -8557,6 +8557,20 @@ operand 2, greater than operand 2 or is unordered with 
operand 2.

This pattern is not allowed to @code{FAIL}.+@cindex @code{crc@var{m}@var{n}4} instruction pattern

+@item @samp{crc@var{m}@var{n}4}
+Calculate a bit-forward CRC using operands 1, 2 and 3,
+then store the result in operand 0.
+Operands 1 is the initial CRC, operands 2 is the data and operands 3 is the
+polynomial without leading 1.
+Operands 0, 1 and 3 have mode @var{n} and operand 2 has mode @var{m}, where
+both modes are integers.  The size of CRC to be calculated is determined by the
+mode; for example, if @var{n} is 'hi', a CRC16 is calculated.
+
+@cindex @code{crc_rev@var{m}@var{n}4} instruction pattern
+@item @samp{crc_rev@var{m}@var{n}4}
+Similar to @samp{crc@var{m}@var{n}4}, but calculates a bit-reversed CRC.
+

So just to be clear, this is a case where the input (operand 2) may havea different mode than the output (operand 0). That scenario isgenerally discouraged, with a few exceptions (the most common beingshift counts which are often QImode objects while thevalue-to-be-shifted and the output value are potentially any scalarinteger mode.

So I don't think this is a problem, just wanted to point it out toanyone else that may be looking at this code.

 @end table

@end ifset

diff --git a/gcc/expr.cc b/gcc/expr.cc
index 1baa39b98eb..18368ae6b6c 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -14091,3 +14091,359 @@ int_expr_size (const_tree exp)

return tree_to_shwi (size);

 }
+
+/* Calculate CRC for the initial CRC and given POLYNOMIAL.
+   CRC_BITS is CRC size.  */
+
+static unsigned HOST_WIDE_INT
+calculate_crc (unsigned HOST_WIDE_INT crc,
+             unsigned HOST_WIDE_INT polynomial,
+             unsigned crc_bits)

Just a nit. Line up the polynomial & crc_bits declarations with the crcdeclaration.

+{
+  crc = crc << (crc_bits - 8);
+  for (int i = 8; i > 0; --i)
+    {
+      if ((crc >> (crc_bits - 1)) & 1)
+       crc = (crc << 1) ^ polynomial;
+      else
+       crc <<= 1;
+    }
+
+  crc <<=  (sizeof (crc) * BITS_PER_UNIT - crc_bits);
+  crc >>=  (sizeof (crc) * BITS_PER_UNIT - crc_bits);

Another nit.  Just once space after the <<= or >>= operators.

+
+  return crc;
+}
+
+/* Assemble CRC table with 256 elements for the given POLYNOM and CRC_BITS with
+   given ID.
+   ID is the identifier of the table, the name of the table is unique,
+   contains CRC size and the polynomial.
+   POLYNOM is the polynomial used to calculate the CRC table's elements.
+   CRC_BITS is the size of CRC, may be 8, 16, ... . */
+
+rtx
+assemble_crc_table (tree id, unsigned HOST_WIDE_INT polynom, unsigned crc_bits)
+{
+  unsigned table_el_n = 0x100;
+  tree ar = build_array_type (make_unsigned_type (crc_bits),
+                             build_index_type (size_int (table_el_n - 1)));

Nit.  Line up build_index_type at the same indention as make_unsigned_type.

Note that with TREE_READONLY set, there is at least some chance that thelinker will find identical tables and merge them. I haven't testedthis, but I know it happens for other objects in the constant pools.

+  sprintf (buf, "crc_table_for_crc_%u_polynomial_" HOST_WIDE_INT_PRINT_HEX,
+          crc_bits, polynom);

Another formatting nit. Line up the arguments when you need to wrap afunction call. ie


foo (arg1, arg2....
     arg3, arg4....)

+
+/* Generate table-based CRC code for the given CRC, INPUT_DATA and the
+   POLYNOMIAL (without leading 1).
+
+   First, using POLYNOMIAL's value generates CRC table of 256 elements,
+   then generates the assembly for the following code,
+   where crc_size and data_size may be 8, 16, 32, 64, depending on CRC:
+
+     for (int i = 0; i < data_size / 8; i++)
+       crc = (crc << 8) ^ crc_table[(crc >> (crc_size - 8))
+                               ^ (data >> (data_size - (i + 1) * 8) & 0xFF))];
+
+   So to take values from the table, we need 8-bit data.
+   If input data size is not 8, then first we extract upper 8 bits,
+   then the other 8 bits, and so on.  */
+
+void
+calculate_table_based_CRC (rtx *crc, const rtx &input_data,
+                          const rtx &polynomial,
+                          machine_mode crc_mode, machine_mode data_mode)

Same nit as before. Line up the parameters. It looks like we need tocheck for that throughout this function when you wrapped functionargument lists.

+
+/* Generate table-based CRC code for the given CRC, INPUT_DATA and the
+   POLYNOMIAL (without leading 1).
+
+   CRC is OP1, data is OP2 and the polynomial is OP3.
+   This must generate a CRC table and an assembly for the following code,
+   where crc_size and data_size may be 8, 16, 32, 64:
+   uint_crc_size_t
+   crc_crc_size (uint_crc_size_t crc_init, uint_data_size_t data, size_t size)
+   {
+     uint_crc_size_t crc = crc_init;
+     for (int i = 0; i < data_size / 8; i++)
+       crc = (crc << 8)
+       ^ crc_table[(crc >> (crc_size - 8))
+                       ^ (data >> (data_size - (i + 1) * 8) & 0xFF))];
+     return crc;
+   }  */
+
+void
+expand_crc_table_based (rtx op0, rtx op1, rtx op2, rtx op3,
+                       machine_mode data_mode)
+{
+  gcc_assert (!CONST_INT_P (op0));
+  gcc_assert (CONST_INT_P (op3));
+  machine_mode crc_mode = GET_MODE (op0);
+  rtx crc = gen_reg_rtx (word_mode);
+  convert_move (crc, op1, 0);
+  calculate_table_based_CRC (&crc, op2, op3, crc_mode, data_mode);
+  if (crc_mode == SImode && word_mode == DImode)
+    {
+      rtx a_low = gen_lowpart_SUBREG (crc_mode, crc);
+      crc = gen_rtx_SIGN_EXTEND (word_mode, a_low);
+    }
+  rtx tgt = simplify_gen_subreg (word_mode, op0, crc_mode, 0);

The explicit checks for SI/DI mode don't look right here. What I thinkyou're looking for is crc_mode < word_mode. If you wanted to be evenomre precise you could check the sizes of the two modes.

Along the same lines, I don't think you want/need the call tosimplify_gen_subreg when word_mode == crc_mode. ISTM that you only wantthe subreg when crc_mode < word_mode.


Presumably we rejected the optimization earlier if crc_mode > word_mode?

+  emit_move_insn (tgt, crc);
+}
+
+/* Generate the common operation for reflecting values:
+   *OP = (*OP & AND1_VALUE) << SHIFT_VAL | (*OP & AND2_VALUE) >> SHIFT_VAL;  */
+
+void
+gen_common_operation_to_reflect (rtx *op,
+                                unsigned HOST_WIDE_INT and1_value,
+                                unsigned HOST_WIDE_INT and2_value,
+                                unsigned shift_val)
+{
+  rtx op1 = expand_and (word_mode, *op, gen_int_mode (and1_value, word_mode),
+                       NULL_RTX);
+  op1 = expand_shift (LSHIFT_EXPR, word_mode, op1, shift_val, op1, 0);
+  rtx op2 = expand_and (word_mode, *op, gen_int_mode (and2_value, word_mode),
+                       NULL_RTX);
+  op2 = expand_shift (RSHIFT_EXPR, word_mode, op2, shift_val, op2, 1);
+  *op = expand_binop (word_mode, ior_optab, op1, op2, *op, 0, OPTAB_DIRECT);
+}

Same nits as elsewhere. Line up the parameters & arguments when youneed to wrap lines.

+
+/* Reflect 64-bit value for the 64-bit target.  */
+
+void
+reflect_64_bit_value (rtx *op)
+{
+  gen_common_operation_to_reflect (op, 0x00000000FFFFFFFF, 0xFFFFFFFF00000000,
+                                  32);
+  gen_common_operation_to_reflect (op, 0x0000FFFF0000FFFF, 0xFFFF0000FFFF0000,
+                                  16);
+  gen_common_operation_to_reflect (op, 0x00FF00FF00FF00FF, 0xFF00FF00FF00FF00,
+                                  8);
+  gen_common_operation_to_reflect (op, 0x0F0F0F0F0F0F0F0F, 0xF0F0F0F0F0F0F0F0,
+                                  4);
+  gen_common_operation_to_reflect (op, 0x3333333333333333, 0xCCCCCCCCCCCCCCCC,
+                                  2);
+  gen_common_operation_to_reflect (op, 0x5555555555555555, 0xAAAAAAAAAAAAAAAA,
+                                  1);

Another class of nits. In general, if you're wrapping a long line likethese above, consider "balancing" the length of the lines a bit better.so instead of bringing down just the final constant, bring down thefinal two constants to a new line. I think that's what we generally doelsehwere.

+
+  gen_reflecting_code (&crc, GET_MODE_BITSIZE (word_mode) - crc_size);
+  if (crc_mode == SImode && word_mode == DImode)
+    {
+      rtx a_low = gen_lowpart_SUBREG (crc_mode, crc);
+      crc = gen_rtx_SIGN_EXTEND (word_mode, a_low);
+    }
+  rtx tgt = simplify_gen_subreg (word_mode, op0, crc_mode, 0);
+  emit_move_insn (tgt, crc);

Same concerns as in the bit-forward implementation WRT testing SI/DIexplicitly. Testing crc_mode < word_mode or testing their bitsizesseems safer.

+/* Expand CRC call STMT.  */
+
+static void
+expand_crc_optab_fn (internal_fn fn, gcall *stmt, convert_optab optab)
+{
+    tree lhs = gimple_call_lhs (stmt);
+    tree rhs1 = gimple_call_arg (stmt, 0); // crc
+    tree rhs2 = gimple_call_arg (stmt, 1); // data
+    tree rhs3 = gimple_call_arg (stmt, 2); // polynomial
+
+    tree result_type = TREE_TYPE (lhs);
+    tree data_type = TREE_TYPE (rhs2);
+
+    gcc_assert (TYPE_MODE (result_type) >= TYPE_MODE (data_type));
+    gcc_assert (word_mode >= TYPE_MODE (result_type));
+
+    rtx dest = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
+    rtx crc = expand_normal (rhs1);
+    rtx data = expand_normal (rhs2);
+    gcc_assert (TREE_CODE (rhs3) == INTEGER_CST);
+    rtx polynomial = gen_rtx_CONST_INT (TYPE_MODE (result_type),
+                                TREE_INT_CST_LOW (rhs3));

Nit. Looks like this code got over-indented. Just two spaces ofindention after the open-curly.

In general, most of the concerns with this patch are formatting nits.The functional concerns are with those mode checks as mentioned in acouple places above. Repost once you've got an update fixing these issues.


Thanks,
Jeff

Re: [RFC/RFA] [PATCH 01/12] Implement internal functions for efficient CRC computation

Reply via email to