On Fri, 6 Jan 2012, Martin Jambor wrote:

> Hi,
>
> I'm trying to teach our expander how to deal with misaligned MEM_REFs
> on strict alignment targets.  We currently generate code which leads
> to bus error signals due to misaligned accesses.
>
> I admit my motivation is not any target in particular but simply being
> able to produce misaligned MEM_REFs in SRA; currently we work around
> that by producing COMPONENT_REFs, which causes quite a few headaches.
> Nevertheless, I started by following Richi's advice and set out to fix
> the following two simple testcases on a strict-alignment platform, a
> sparc64 in the compile farm.  If I understood him correctly, Richi
> claimed they have been failing "since forever:"
>
> ----- test case 1: -----
>
> extern void abort ();
>
> typedef unsigned int myint __attribute__((aligned(1)));
>
> /* even without the attributes we get bus error */
> unsigned int __attribute__((noinline, noclone))
> foo (myint *p)
> {
>   return *p;
> }
>
> struct blah
> {
>   char c;
>   myint i;
> };
>
> struct blah g;
>
> #define cst 0xdeadbeef
>
> int
> main (int argc, char **argv)
> {
>   int i;
>   g.i = cst;
>   i = foo (&g.i);
>   if (i != cst)
>     abort ();
>   return 0;
> }
>
> ----- test case 2: -----
>
> extern void abort ();
>
> typedef unsigned int myint __attribute__((aligned(1)));
>
> void __attribute__((noinline, noclone))
> foo (myint *p, unsigned int i)
> {
>   *p = i;
> }
>
> struct blah
> {
>   char c;
>   myint i;
> };
>
> struct blah g;
>
> #define cst 0xdeadbeef
>
> int
> main (int argc, char **argv)
> {
>   foo (&g.i, cst);
>   if (g.i != cst)
>     abort ();
>   return 0;
> }
>
> ------------------------
>
> I dug in expr.c and found two places which handle misaligned MEM_REF
> loads and stores respectively, but only if there is a special
> movmisalign_optab operation available for the given mode.
> My approach therefore was to append calls to extract_bit_field and
> store_bit_field, which seem to be the part of the expander capable of
> dealing with misaligned memory accesses.  The patch is below; it fixes
> both testcases on sparc64-linux-gnu.
>
> Is this approach generally the right thing to do?  And of course,
> since my knowledge of RTL and the expander is very limited, I expect
> that there will be quite many suggestions about its various particular
> aspects.  I have run the C and C++ testsuites with the second hunk in
> place without any problems; the same test of the whole patch is under
> way right now, but it takes quite a lot of time, so most probably I
> won't have the results today.  Of course I plan to do a bootstrap and
> at least Fortran checking on this platform too, but that is really
> going to take some time and I'd like to hear any comments before that.
The idea is good (well, I suppose I suggested it ... ;))

> One more question: I'd like to be able to handle misaligned loads or
> stores of SSE vectors this way too, but then of course I cannot use
> STRICT_ALIGNMENT as the guard but need a more elaborate predicate.  I
> assume it must already exist; which one is it?

There is none :/  STRICT_ALIGNMENT would need to get a mode argument,
but reality is that non-STRICT_ALIGNMENT targets at the moment need to
have their movmisalign optab trigger for all cases that will not work
when misaligned.

> --- 4572,4653 ----
>           || TREE_CODE (to) == TARGET_MEM_REF)
>          && mode != BLKmode
>          && ((align = get_object_or_type_alignment (to))
> !            < GET_MODE_ALIGNMENT (mode)))
>       {
>         enum machine_mode address_mode;
> !       rtx reg;
>
>         reg = expand_expr (from, NULL_RTX, VOIDmode, EXPAND_NORMAL);
>         reg = force_not_mem (reg);
>
> !       if ((icode = optab_handler (movmisalign_optab, mode))
> !           != CODE_FOR_nothing)
>           {
> +           struct expand_operand ops[2];
> +           rtx op0, mem;
> +
> +           if (TREE_CODE (to) == MEM_REF)
> +             {
> +               addr_space_t as
> +                 = TYPE_ADDR_SPACE (TREE_TYPE (TREE_TYPE (TREE_OPERAND (to, 1))));
> +               tree base = TREE_OPERAND (to, 0);
> +               address_mode = targetm.addr_space.address_mode (as);
> +               op0 = expand_expr (base, NULL_RTX, VOIDmode, EXPAND_NORMAL);
> +               op0 = convert_memory_address_addr_space (address_mode, op0, as);
> +               if (!integer_zerop (TREE_OPERAND (to, 1)))
> +                 {
> +                   rtx off
> +                     = immed_double_int_const (mem_ref_offset (to), address_mode);
> +                   op0 = simplify_gen_binary (PLUS, address_mode, op0, off);
> +                 }
> +               op0 = memory_address_addr_space (mode, op0, as);
> +               mem = gen_rtx_MEM (mode, op0);
> +               set_mem_attributes (mem, to, 0);
> +               set_mem_addr_space (mem, as);
> +             }
> +           else if (TREE_CODE (to) == TARGET_MEM_REF)
> +             {
> +               addr_space_t as = TYPE_ADDR_SPACE (TREE_TYPE (to));
> +               struct mem_address addr;
> +
> +               get_address_description (to, &addr);
> +               op0 = addr_for_mem_ref (&addr, as, true);
> +               op0 = memory_address_addr_space (mode, op0, as);
> +               mem = gen_rtx_MEM (mode, op0);
> +               set_mem_attributes (mem, to, 0);
> +               set_mem_addr_space (mem, as);
> +             }
> +           else
> +             gcc_unreachable ();
> +           if (TREE_THIS_VOLATILE (to))
> +             MEM_VOLATILE_P (mem) = 1;
> +
> +           create_fixed_operand (&ops[0], mem);
> +           create_input_operand (&ops[1], reg, mode);
> +           /* The movmisalign<mode> pattern cannot fail, else the
> +              assignment would silently be omitted.  */
> +           expand_insn (icode, 2, ops);
> +           return;
> +         }
> +       else if (STRICT_ALIGNMENT || SLOW_UNALIGNED_ACCESS (mode, align))

so - this can be STRICT_ALIGNMENT only.  I guess then this does not
fix the SSE testcase you had, right?  (Alternatively just using
SLOW_UNALIGNED_ACCESS is ok as well, we'd just change behavior here
compared to what we have done in the past)

> +         {
> +           rtx op0, mem;
>             addr_space_t as
> !             = TYPE_ADDR_SPACE (TREE_TYPE (TREE_TYPE (TREE_OPERAND (to, 1))));
>             tree base = TREE_OPERAND (to, 0);
>             address_mode = targetm.addr_space.address_mode (as);
>             op0 = expand_expr (base, NULL_RTX, VOIDmode, EXPAND_NORMAL);
>             op0 = convert_memory_address_addr_space (address_mode, op0, as);
>             op0 = memory_address_addr_space (mode, op0, as);
>             mem = gen_rtx_MEM (mode, op0);
>             set_mem_attributes (mem, to, 0);
>             set_mem_addr_space (mem, as);
>
> !           store_bit_field (mem, tree_low_cst (TYPE_SIZE (TREE_TYPE (to)), 1),
> !                            tree_low_cst (TREE_OPERAND (to, 1), 0),
> !                            0, 0, mode, reg);

I think you want GET_MODE_BITSIZE (mode) for the bitsize argument,
zero for the bitnum argument, and to handle the offset of the MEM_REF
(you don't handle TARGET_MEM_REFs here, so you'd better assert it is a
MEM_REF only) similar to how it is handled in the movmisalign case.

In fact the MEM_REF/TARGET_MEM_REF conversion to the mem can be shared
between the movmisalign and store_bit_field cases I think.

> !           return;
>           }
>       }
>
>     /* Assignment of a structure component needs special treatment
> *************** expand_expr_real_1 (tree exp, rtx target
> *** 9356,9376 ****
>         if (TREE_THIS_VOLATILE (exp))
>           MEM_VOLATILE_P (temp) = 1;
>         if (mode != BLKmode
> !           && align < GET_MODE_ALIGNMENT (mode)
>             /* If the target does not have special handling for unaligned
>                loads of mode then it can use regular moves for them.  */
> !           && ((icode = optab_handler (movmisalign_optab, mode))
> !               != CODE_FOR_nothing))
> !         {
> !           struct expand_operand ops[2];
>
> !           /* We've already validated the memory, and we're creating a
> !              new pseudo destination.  The predicates really can't fail,
> !              nor can the generator.  */
> !           create_output_operand (&ops[0], NULL_RTX, mode);
> !           create_fixed_operand (&ops[1], temp);
> !           expand_insn (icode, 2, ops);
> !           return ops[0].value;
>           }
>         return temp;
>       }
> --- 9379,9409 ----
>         if (TREE_THIS_VOLATILE (exp))
>           MEM_VOLATILE_P (temp) = 1;
>         if (mode != BLKmode
> !           && align < GET_MODE_ALIGNMENT (mode))
> !         {
>             /* If the target does not have special handling for unaligned
>                loads of mode then it can use regular moves for them.  */
> !           if ((icode = optab_handler (movmisalign_optab, mode))
> !               != CODE_FOR_nothing)
> !             {
> !               struct expand_operand ops[2];
>
> !               /* We've already validated the memory, and we're creating a
> !                  new pseudo destination.  The predicates really can't fail,
> !                  nor can the generator.  */
> !               create_output_operand (&ops[0], NULL_RTX, mode);
> !               create_fixed_operand (&ops[1], temp);
> !               expand_insn (icode, 2, ops);
> !               return ops[0].value;
> !             }
> !           else if (STRICT_ALIGNMENT || SLOW_UNALIGNED_ACCESS (mode, align))
> !             temp = extract_bit_field (temp,
> !                                       tree_low_cst (TYPE_SIZE (TREE_TYPE (exp)), 1),
> !                                       tree_low_cst (TREE_OPERAND (exp, 1), 0),
> !                                       TYPE_UNSIGNED (TREE_TYPE (exp)),
> !                                       true /* ??? */,
> !                                       (modifier == EXPAND_STACK_PARM
> !                                        ? NULL_RTX : target),
> !                                       mode, mode);

Similar for the TREE_OPERAND (exp, 1) handling.
Eric, do you have any objections in principle to handling
get_object_or_type_alignment () < GET_MODE_ALIGNMENT (mode)
stores/loads this way?

Thanks,
Richard.