On Wed, 1 Aug 2018, Bernd Edlinger wrote:

> >> > The change to have all STRING_CSTs NUL terminated (but that NUL
> >> > termination not necessarily inclided in STRING_LENGTH) is a good
> >> > one.
> >> >
> >> > I'm not sure how we can reliably verify NUL termination after the
> >> > fact though and build_string already makes sure to NUL terminate
> >> > STRING_CSTs.  So if GO strings are not NUL terminated then
> >> > the STRING_CSTs still are.
> >>
> >> The advantage is that there are less variations how string literals look
> >> like in the middle end.  We will have a simple way to determine if
> >> a string literal is NUL terminated or not.  And checking that property
> >> in varasm.c is exactly the right thing to do.
> >>
> >> String literals always have an array_type which may be shorter
> >> than TREE_STRING_LENGTH, but that chops off only exactly
> >> one wide character nul. Otherwise if the array_type is equal or larger,
> >> we know for sure the value is nul terminated. In the middle-end
> >> we can easily determine if a string is not NUL terminated by:
> >>
> >> compare_tree_int (TYPE_SIZE_UNIT (TREE_TYPE (init)),
> >>                        TREE_STRING_LENGTH (init)) < 0
> >>
> >> I did use that already in my patch for pr86711.
> >
> > Hmm.  How does that tell you whether the string is NUL terminated?
> > TREE_STRING_LENGTH includes the NUL termination and if it is
> > a significant char then TYPE_SIZE_UNIT should as well.
> 
> I debugged that code a lot recently.
> static const char x[2] = "ab"
> gives a TREE_STRING_LENGTH of 3, the TREE_TYPE of that
> beast is an array type for char[2]. and TYPE_SIZE_UNIT = 2.

Hmm.  I think it would be nice if TREE_STRING_LENGTH would
match char[2] and TYPE_SIZE_UNIT even if that is inconvenient
for your check above.  Because the '\0' doesn't belong to the
string.  Then build_string internally appends a '\0' outside
of TREE_STRING_LENGTH.

> An ordinary C string "ab" has TYPE_SIZE_UNIT(TREE_TYPE(x)) = 3.
> 
> Of course with wide caracher strings the TREE_STING_LENGTH
> and TYPE_SIZE_UNIT of the ARRAY_TYPE are multiple of
> the used wide character type.

Sure.

> So I would like to be able to assume that the STRING_CST objects
> are internally always generated properly by the front end.

Yeah, I guess we need to define what "properly" is ;)

> And that the ARRAY_TYPE of the string literal either has the
> same length than the TREE_STRING_LENGTH or if it is shorter,
> this is always exactly one (wide) character size less than TREE_STRING_LENGTH

I think it should be always the same...

> The idea is to use this property of string literals where needed,
> and check rigorously in varasm.c.
> 
> Does that make sense?

So if it is not the same then the excess character needs to be
a (wide) NUL in your model?  ISTR your varasm.c patch didn't verify
that.

> >
> > Isn't a proper test to look at TREE_STRING_POINTER[TREE_STRING_LENGTH - 1]
> > (for HOST_CHAR_BITS strings)?
> >
> 
> There are also wide character strings, and all those test are broken 
> everywhere
> for wide characters right now.
> 
> Therefore checking the string constants at varasm.c revealed a lot of 
> intersting
> things.
> I will post several patches in the afternoon.
> 
> > Relying on the type here looks somewhat fragile to me.
> >
> > Abstracting a string_cst_nul_terminated_p () helper would be a good
> > idea I guess.
> 
> Yes. indeed.
> 
> > I realize using strlen(TREE_STRING_POINTER) doesn't work because
> > of embedded NULs.
> >
> >> Additionally not having oversize string constants produced
> >> by the front ends, where the extra characters are good for nothing,
> >> also helps to improve correctness.
> >>
> 
> 

-- 
Richard Biener <rguent...@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)

Reply via email to