Hi Aaron,

On Thu, Sep 22, 2016 at 03:10:24PM -0500, Aaron Sawdey wrote:
> The powerpc target had a movmemsi pattern which supports memcpy() but
> did not have anything for memcmp(). This adds support for builtin
> expansion of memcmp() into inline code for modest constant lengths.
> Performance on power8 is in the range of 3-7x faster than calling
> memcmp() for lengths under 40 bytes.
> Bootstrap on powerpc64le, regtest in progress, OK for trunk if no new
> regressions?

[ You are testing on BE and 32-bit as well, thanks. ]

Just a bit more formatting, and one nit:

> 2016-09-22  Aaron Sawdey  <acsaw...@linux.vnet.ibm.com>
        PR target/77685   <<<--- please add this here
>       * config/rs6000/rs6000.md (cmpmemsi): New define_expand.
>       * config/rs6000/rs6000.c (expand_block_compare): New function used by
>       cmpmemsi pattern to do builtin expansion of memcmp().

Space before (, here and below.

>       (compute_current_alignment): Add helper function for
>       expand_block_compare used to compute alignment as the compare proceeds.
>       (select_block_compare_mode): Used by expand_block_compare to select
>       the mode used for reading the next chunk of bytes in the compare.
>       (do_load_for_compare): Used by expand_block_compare to emit the load
>       insns for the compare.
>       (rs6000_emit_dot_insn): Moved this function to avoid a forward
>       reference from expand_block_compare().
>       * config/rs6000/rs6000-protos.h (expand_block_compare): Add a
>       prototype for this function.
>       * config/rs6000/rs6000.opt (mblock-compare-inline-limit): Add a new
>       target option for controlling how much code inline expansion of
>       memcmp() will be allowed to generate.

You can just say "New function." and the like for most things here, fwiw.

> +/* Figure out the correct instructions to generate to load data for
> +   block compare.  MODE is used for the read from memory, and
> +   data is zero extended if REG is wider than MODE.  If LE code
> +   is being generated, bswap loads are used.

"Emit a load for a block compare", maybe?  I.e. say it emits code, and
it does only one load.

> +  /* final fallback is do one byte */

Capital, dot space space.

> +  /* If this is not a fixed size compare, just call memcmp */

Dot space space.

> +  /* This must be a fixed size alignment */

Dot space space.

> +  /* SLOW_UNALIGNED_ACCESS -- don't do unaligned stuff */

Full sentence.

> +  /* Anything to move? */

Question mark space space.

> +  int offset = 0;
> +  machine_mode load_mode =
> +    select_block_compare_mode (offset, bytes, base_align, word_mode_ok);
> +  int load_mode_size = GET_MODE_SIZE (load_mode);

HOST_WIDE_INT instead of the ints?  It isn't slower and I find it hard to
convince myself int always works.

> +       int extra_bytes = load_mode_size - bytes;

Here too.

> +      int remain = bytes - cmp_bytes;

One more.

Very nice improvement, thank you!  This is okay for trunk.


Reply via email to