On 09/07/18 22:44, Martin Sebor wrote: > On 07/09/2018 06:40 AM, Richard Biener wrote: >> On Sun, Jul 8, 2018 at 4:56 AM Martin Sebor <mse...@gmail.com> wrote: >>> >>> On 07/06/2018 09:52 AM, Richard Biener wrote: >>>> On Fri, Jul 6, 2018 at 1:54 AM Martin Sebor <mse...@gmail.com> wrote: >>>>> >>>>> GCC folds accesses to members of constant aggregates except >>>>> for character arrays/strings. For example, the strlen() call >>>>> below is not folded: >>>>> >>>>> const char a[][4] = { "1", "12" }; >>>>> >>>>> int f (void) { retturn strlen (a[1]); } >>>>> >>>>> The attached change set enhances the string_constant() function >>>>> to make it possible to extract string constants from aggregate >>>>> initializers (CONSTRUCTORS). >>>>> >>>>> The initial solution was much simpler but as is often the case, >>>>> MEM_REF made it fail to fold things like: >>>>> >>>>> int f (void) { retturn strlen (a[1] + 1); } >>>>> >>>>> Handling those made the project a bit more interesting and >>>>> the final solution somewhat more involved. >>>>> >>>>> To handle offsets into aggregate string members the patch also >>>>> extends the fold_ctor_reference() function to extract entire >>>>> string array initializers even if the offset points past >>>>> the beginning of the string and even though the size and >>>>> exact type of the reference are not known (there isn't enough >>>>> information in a MEM_REF to determine that). >>>>> >>>>> Tested along with the patch for PR 86415 on x86_64-linux. >>>> >>>> + if (TREE_CODE (init) == CONSTRUCTOR) >>>> + { >>>> + tree type; >>>> + if (TREE_CODE (arg) == ARRAY_REF >>>> + || TREE_CODE (arg) == MEM_REF) >>>> + type = TREE_TYPE (arg); >>>> + else if (TREE_CODE (arg) == COMPONENT_REF) >>>> + { >>>> + tree field = TREE_OPERAND (arg, 1); >>>> + type = TREE_TYPE (field); >>>> + } >>>> + else >>>> + return NULL_TREE; >>>> >>>> what's wrong with just >>>> >>>> type = TREE_TYPE (field); >>> >>> In response to your comment below abut size I simplified things >>> further so determining the type a priori is no longer necessary. >>> >>>> ? >>>> >>>> + base_off *= BITS_PER_UNIT; >>>> >>>> poly_uint64 isn't enough for "bits", with wide-int you'd use >>>> offset_int, >>>> for poly you'd then use poly_offset? >>> >>> Okay, I tried to avoid the overflow. (Converting between all >>> these flavors of wide int types is a monumental PITA.) >>> >>>> >>>> You extend fold_ctor_reference to treat size == 0 specially but then >>>> bother to compute a size here - that looks unneeded? >>> >>> Yes, well spotted, thanks! I simplified the code so this isn't >>> necessary, and neither is the type. >>> >>>> >>>> While the offset of the reference determines the first field in the >>>> CONSTRUCTOR, how do you know the access doesn't touch >>>> adjacent ones? STRING_CSTs do not have to be '\0' terminated, >>>> so consider >>>> >>>> char x[2][4] = { "abcd", "abcd" }; >>>> >>>> and MEM[&x] with a char[8] type? memcpy "inlining" will create >>>> such MEMs for example. >>> >>> The code is only used to find string constants in initializer >>> expressions where I don't think the size of the access comes >>> into play. If a memcpy() call results in a MEM_REF[char[8], >>> &x, 8] that's fine. It's a valid reference and we can still >>> get the underlying character sequence (which is represented >>> as two STRING_CSTs with the two string literals). I might >>> be missing the point of your question. >> >> Maybe irrelevant for strlen folding depending on what you do >> for missing '\0' termination. >> >>>> >>>> @@ -6554,8 +6577,16 @@ fold_nonarray_ctor_reference (tree type, tree >>>> ctor, >>>> tree byte_offset = DECL_FIELD_OFFSET (cfield); >>>> tree field_offset = DECL_FIELD_BIT_OFFSET (cfield); >>>> tree field_size = DECL_SIZE (cfield); >>>> - offset_int bitoffset; >>>> - offset_int bitoffset_end, access_end; >>>> + >>>> + if (!field_size && TREE_CODE (cval) == STRING_CST) >>>> + { >>>> + /* Determine the size of the flexible array member from >>>> + the size of the string initializer provided for it. */ >>>> + unsigned HOST_WIDE_INT len = TREE_STRING_LENGTH (cval); >>>> + tree eltype = TREE_TYPE (TREE_TYPE (cval)); >>>> + len *= tree_to_uhwi (TYPE_SIZE (eltype)); >>>> + field_size = build_int_cst (size_type_node, len); >>>> + } >>>> >>>> Why does this only apply to STRING_CST initializers and not >>>> CONSTRUCTORS, >>>> say, for >>>> >>>> struct S { int i; int a[]; } s = { 1, { 2, 3, 4, 5, 6 } }; >>> >>> I can't think of a use for it. Do you have something in mind? >> >> Well, you basically implemented a get-CONSTRUCTOR-elt-at-offset >> which is useful in other parts of the compiler. So I don't see why >> it shouldn't work for general flex-arrays. >> >>>> >>>> ? And why not use simply >>>> >>>> field_size = TYPE_SIZE (TREE_TYPE (cval)); >>>> >>>> like you do in c_strlen? >>> >>> Yes, that's simpler, thanks. >>> >>>> >>>> Otherwise looks reasonable. >>> >>> Attached is an updated patch. I also enhanced the handling >>> of non-constant indices. They were already handled before >>> to a smaller extent. (There may be other opportunities >>> here.) >> >> Please don't do functional changes to a patch in review, without >> exactly pointing out the change. It makes review inefficent for me. >> >> Looks like it might be the NULL type argument handling? > > Sorry. The change I was referring to is the addition and handling > of the varidx variable to string_constant. It was necessary for > parity with the existing optimization for non-constant array > indices. It makes it possible to fold strlen calls like: > > const char a[][3] = { "", "1", "12" }; > > void f (int i) > { > if (__builtin_strlen (&a[2][i]) > 2) > __builtin_abort (); > } > > Prior to the patch GCC could that for one-dimensional arrays > so to my mind it would have been an oversight for a patch to > handle multi-dimensional arrays not to do the same thing for > those. > >> + >> + if (!field_size && TREE_CODE (cval) == STRING_CST) >> + { >> + /* Determine the size of the flexible array member from >> + the size of the string initializer provided for it. */ >> + /* unsigned HOST_WIDE_INT len = TREE_STRING_LENGTH (cval); */ >> + /* tree eltype = TREE_TYPE (TREE_TYPE (cval)); */ >> + /* len *= tree_to_uhwi (TYPE_SIZE (eltype)); */ >> + /* field_size = build_int_cst (size_type_node, len); */ >> + field_size = TYPE_SIZE (TREE_TYPE (cval)); >> >> why all the commented code? >> >> OK with the comments removed and TREE_CODE (cval) == CONSTRUCTOR >> handled at that point. > > Done in r262522. > > Thanks. > Martin Hi Martin,
I believe this caused the following regresion on aarch64 and arm: FAIL: gcc.c-torture/execute/builtins/strlen-3.c execution, -Og -g I haven't quite looked into why though, Ill have a closer look. Cheers, Andre