On Thu, Apr 17, 2014 at 7:19 AM, Thomas Preud'homme <thomas.preudho...@arm.com> wrote: >> From: Richard Biener [mailto:richard.guent...@gmail.com] >> >> With handling only the outermost handled-component and then only a >> selected subset you'll catch many but not all cases. Why not simply >> use get_inner_reference () here (plus stripping the constant offset >> from an innermost MEM_REF) and get the best of both worlds (not >> duplicate parts of its logic and handle more cases)? Eventually >> using tree-affine.c and get_inner_reference_aff is even more appropriate >> so you can compute the address differences without decomposing >> them yourselves. > > Why does the constant offset from an innermost MEM_REF need to be > stripped? Shouldn't that be part of the offset in the symbolic number?
Yes, but get_inner_reference returns MEM[ptr, constant-offset] as base, thus it doesn't move the constant offset therein to bitpos and doesn't return MEM[ptr, 0]. You have to do that yourselves. (as you are really interested in the _address_ of the memory reference instead of the reference itself it would be appropriate to introduce a variant of get_inner_reference that returns 'ptr' in this case and &x for x.field1 for example) >> >> + /* Compute address to load from and cast according to the size >> + of the load. */ >> + load_ptr_type = build_pointer_type (load_type); >> + addr_expr = build1 (ADDR_EXPR, load_ptr_type, bswap_src); >> + addr_tmp = make_temp_ssa_name (load_ptr_type, NULL, >> "load_src"); >> + addr_stmt = gimple_build_assign_with_ops >> + (NOP_EXPR, addr_tmp, addr_expr, NULL); >> + gsi_insert_before (&gsi, addr_stmt, GSI_SAME_STMT); >> + >> + /* Perform the load. */ >> + load_offset_ptr = build_int_cst (load_ptr_type, 0); >> + val_tmp = make_temp_ssa_name (load_type, NULL, "load_dst"); >> + val_expr = build2 (MEM_REF, load_type, addr_tmp, >> load_offset_ptr); >> + load_stmt = gimple_build_assign_with_ops >> + (MEM_REF, val_tmp, val_expr, NULL); >> >> this is unnecessarily complex and has TBAA issues. You don't need to >> create a "correct" pointer type, so doing >> >> addr_expr = fold_build_addr_expr (bswap_src); >> >> is enough. Now, to fix the TBAA issues you either need to remember >> and "combine" the reference_alias_ptr_type of each original load and >> use that for the load_offset_ptr value or decide that isn't worth it and >> use alias-set zero (use ptr_type_node). > > Sorry this is only my second patch [1] to gcc so it's not all clear to me. > The TBAA > issue you mention comes from val_expr referring to a memory area that > overlap with the smaller memory area used in the bitwise OR operation, am I > right? Now, I have no idea about how to do the combination of the values > returned by reference_alias_ptr_type () for each individual small memory > area. Can you advise me on this? And what are the effect of not doing it and > using ptr_type_node for the alias-set? You can "combine" two reference_alias_ptr_type()s with if (alias_ptr_types_compatible_p (type1, type2)) return type1; else return ptr_type_node; using ptr_type_node for the alias-set will make it alias with all memory references (that is, type-based disambiguation will be disabled). That's required for example if you combine four loads with type 'short' using a single load with type 'long'. > [1] First one was a fix on the existing implementation of the bswap pass. > >> >> Can you also expand the comment about "size vs. range"? Is it >> that range can be bigger than size if you have (short)a[0] | >> ((short)a[3] << 1) sofar >> where size == 2 but range == 3? Thus range can also be smaller than size >> for example for (short)a[0] | ((short)a[0] << 1) where range would be 1 and >> size == 2? I suppose adding two examples like this to the comment, together >> with the expected value of 'n' would help here. > > You understood correctly. I will add the suggested example. > >> Otherwise the patch looks good. Now we're only missing the addition >> of trying to match to a VEC_PERM_EXPR with a constant mask >> using can_vec_perm_p ;) > > Is that the vector shuffle engine you were mentioning in PR54733? If I > understand correctly it is a generalization of the check again CMPNOP and > CMPXCHG in find_bswap in this new patchset. I will look if ARM could > Benefit from this and if yes I might take a look (yep, two conditions). Yep. For example it might match on things like int foo (char *x) { return ((((x[0] << 1 | x[0]) << 1) | x[1]) << 1) | x[0]; } not sure if target support for shuffles on small vectors (or vector parts) is working well. Thus on v1si as in the example. Richard. > Thanks a lot for such quick and detailed comments after my ping. > > Best regards, > > Thomas > > >