On Mon, Oct 23, 2023 at 4:47 PM Roger Sayle <ro...@nextmovesoftware.com> wrote: > > > The eagle-eyed may have spotted that my recent testcases for DImode shifts > on x86_64 included -mno-stv in the dg-options. This is because the > Scalar-To-Vector (STV) pass currently transforms these shifts to use > SSE vector operations, producing larger code even with -Os. The issue > is that the compute_convert_gain currently underestimates the size of > instructions required for interunit moves, which is corrected with the > patch below. > > For the simple test case: > > unsigned long long shl1(unsigned long long x) { return x << 1; } > > without this patch, GCC -m32 -Os -mavx2 currently generates: > > shl1: push %ebp // 1 byte > mov %esp,%ebp // 2 bytes > vmovq 0x8(%ebp),%xmm0 // 5 bytes > pop %ebp // 1 byte > vpaddq %xmm0,%xmm0,%xmm0 // 4 bytes > vmovd %xmm0,%eax // 4 bytes > vpextrd $0x1,%xmm0,%edx // 6 bytes > ret // 1 byte = 24 bytes total > > with this patch, we now generate the shorter > > shl1: push %ebp // 1 byte > mov %esp,%ebp // 2 bytes > mov 0x8(%ebp),%eax // 3 bytes > mov 0xc(%ebp),%edx // 3 bytes > pop %ebp // 1 byte > add %eax,%eax // 2 bytes > adc %edx,%edx // 2 bytes > ret // 1 byte = 15 bytes total > > Benchmarking using CSiBE, shows that this patch saves 1361 bytes > when compiling with -m32 -Os, and saves 172 bytes when compiling > with -Os. > > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap > and make -k check, both with and without --target_board=unix{-m32} > with no new failures. Ok for mainline? > > > 2023-10-23 Roger Sayle <ro...@nextmovesoftware.com> > > gcc/ChangeLog > * config/i386/i386-features.cc (compute_convert_gain): Provide > more accurate values (sizes) for inter-unit moves with -Os.
LGTM. Thanks, Uros. > > > Thanks in advance, > Roger > -- >