http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36043

--- Comment #23 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Michael Matz from comment #8)
> FWIW, I think the error is in the caller of move_block_to_reg. 
> move_block_to_reg can make use of a load_multiple instruction, which really
> loads full regs.  I.e. it would be unreasonable to require changes in
> move_block_to_reg to handle non-power-of-2 sizes.  Hence the caller
> (load_register_parameters) needs to handle this.  I'm not sure if the
> n_aligned_regs thingy could be misused for this, or if one simply should
> opencode the special case of the last register being partial.

That would be something like

Index: gcc/calls.c
===================================================================
--- gcc/calls.c (revision 208124)
+++ gcc/calls.c (working copy)
@@ -1984,7 +1984,26 @@ load_register_parameters (struct arg_dat
                    emit_move_insn (ri, x);
                }
              else
-               move_block_to_reg (REGNO (reg), mem, nregs, args[i].mode);
+               {
+                 if (size % UNITS_PER_WORD == 0
+                     || MEM_ALIGN (mem) % BITS_PER_WORD == 0)
+                   move_block_to_reg (REGNO (reg), mem, nregs, args[i].mode);
+                 else
+                   {
+                     if (nregs > 1)
+                       move_block_to_reg (REGNO (reg), mem,
+                                          nregs - 1, args[i].mode);
+                     rtx dest = gen_rtx_REG (word_mode,
+                                             REGNO (reg) + nregs - 1);
+                     rtx src = operand_subword_force (mem,
+                                                      nregs - 1, args[i].mode);
+                     rtx tem = extract_bit_field (src, size * BITS_PER_UNIT,
+                                                  0, 1, dest, word_mode,
+                                                  word_mode);
+                     if (tem != dest)
+                       convert_move (dest, tem, 1);
+                   }
+               }
            }

          /* When a parameter is a block, and perhaps in other cases, it is

it's similar to what store_unaligned_arguments_into_pseudos would end up
doing but only for the last register (so it's probably easier to dispatch
to that and handle !STRICT_ALIGNMENT targets there).
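
To make the intent of the patch concrete outside of RTL, here is a minimal
sketch in plain C.  It is not GCC code: load_block_to_regs, WORD_BYTES and the
uint64_t "registers" are hypothetical stand-ins, and it assumes 8-byte words
with the partial word's valid bytes stored first and the rest zeroed.  It also
leaves out the MEM_ALIGN escape hatch from the patch, where a word-aligned
block may still be loaded with full-word reads.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the word size (8 bytes, as on x86-64).  */
#define WORD_BYTES sizeof (uint64_t)

/* Copy SIZE bytes at MEM into REGS, one word per "register", without
   ever reading past MEM + SIZE.  Returns the number of registers used.  */
static size_t
load_block_to_regs (const unsigned char *mem, size_t size, uint64_t *regs)
{
  size_t nregs = (size + WORD_BYTES - 1) / WORD_BYTES;
  size_t tail = size % WORD_BYTES;
  size_t full = tail == 0 ? nregs : nregs - 1;

  /* Full words: the part a load-multiple style move can handle.  */
  for (size_t i = 0; i < full; i++)
    memcpy (&regs[i], mem + i * WORD_BYTES, WORD_BYTES);

  /* Partial last word: read only the TAIL valid bytes, zero the rest.  */
  if (tail != 0)
    {
      uint64_t last = 0;
      memcpy (&last, mem + full * WORD_BYTES, tail);
      regs[full] = last;
    }
  return nregs;
}
```

For an 11-byte block this fills one full register and then reads just the
remaining 3 bytes for the second one, which is the split the patch tries to
express with move_block_to_reg for nregs - 1 plus extract_bit_field.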

Anyway, the generated code is of course "horrible".

foo:
.LFB0:
        .cfi_startproc
        movq    %rdi, %rcx
        movzwl  (%rdi), %edx
        movzwl  2(%rdi), %edi
        salq    $16, %rdi
        movq    %rdi, %rax
        movzwl  4(%rcx), %edi
        orq     %rdx, %rax
        salq    $32, %rdi
        orq     %rax, %rdi
        jmp     print_colour

For some reason extract_bit_field doesn't consider using a 4-byte load
for the first part.  With AVX one could also use a masked load (and thus
implement the extv/insv pattern family?  Not sure if it is valid to
reject non-byte-boundary variants).  But if we end up using
extract_bit_field more and more it's worth optimizing it further to
avoid the above mess... (we end up using extract_split_bit_field).
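
As a rough illustration of the difference, the two functions below read the
same 6-byte object into the low 48 bits of a 64-bit value.  This is only a
sketch: the function names are made up, it assumes a little-endian target,
and the 4-byte read is only attractive where an unaligned 4-byte load is
cheap (the object in the listing above is accessed in 2-byte pieces).

```c
#include <stdint.h>
#include <string.h>

/* Three 2-byte loads, shifted and ORed: the shape of the listing above.  */
static uint64_t
load6_halfwords (const unsigned char *p)
{
  uint16_t h0, h1, h2;
  memcpy (&h0, p, 2);
  memcpy (&h1, p + 2, 2);
  memcpy (&h2, p + 4, 2);
  return (uint64_t) h0 | ((uint64_t) h1 << 16) | ((uint64_t) h2 << 32);
}

/* One 4-byte load plus one 2-byte load; still reads exactly 6 bytes.  */
static uint64_t
load6_word_plus_half (const unsigned char *p)
{
  uint32_t lo;
  uint16_t hi;
  memcpy (&lo, p, 4);
  memcpy (&hi, p + 4, 2);
  return (uint64_t) lo | ((uint64_t) hi << 32);
}
```

On a little-endian machine both produce the same value; the second needs one
load, one shift and one OR fewer.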
