[PATCH] D124221: Reimplement `__builtin_dump_struct` in Sema.

Aaron Ballman via Phabricator via cfe-commits Thu, 05 May 2022 07:59:30 -0700

aaron.ballman added a comment.

In D124221#3493792 <https://reviews.llvm.org/D124221#3493792>, @erichkeane 
wrote:


> FWIW, I'm in favor of the patch as it sits.
>
> As a followup: So I was thinking about the "%s" specifier for string types.  
> Assuming char-ptr types are all strings is a LITTLE dangerous, but more so 
> the way we're doing it.  It's a shame we don't have some way of setting a 
> 'max' limit to the number of characters we have for 2 reasons:
>
> 1- For safety: If the char-ptr points to non-null-terminated memory, it'll 
> stop us from just arbitrarily printing into space by limiting at least the 
> NUMBER of characters we print into nonsense.
> 2- For readability: printing a 'long' string likely makes this output look 
> like nonsense and breaks everything up.  Limiting us to only a few characters 
> is likely a good idea.
> 3- <Bonus #3 from @aaron.ballman >: It might discourage SOME level of 
> attempts at using this for reflection, or at least make it a little harder.
>
> What I would love would be for something like a 10 char max:
>
>   struct S {
>     char *C;
>   };
>   S s { "The Rest of this string is cut off" };
>   print as:
>   struct S s = {
>     .C = 0x1234 "The Rest o"
>   };
>
> Sadly, I don't see something like that in printf specifiers?  Unless someone 
> smarter than me can come up with some trickery.  PERHAPS have the max-limit 
> compile-time configurable, but I don't feel strongly.

The C Standard has this in the specification of the %s format specifier:

  If no l length modifier is present, the argument shall be a pointer to
  storage of character type. Characters from the storage are written up to
  (but not including) the terminating null character. If the precision is
  specified, no more than that many bytes are written. If the precision is
  not specified or is greater than the size of the storage, the storage
  shall contain a null character.
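
A minimal sketch of the precision rule quoted above, in plain C. The helper
name `format_limited` and the quoting of the output are assumptions made for
illustration; this is not what the patch itself emits.

```c
#include <stdio.h>

/* Hypothetical helper: writes at most max_bytes bytes of s, wrapped in
 * double quotes, into out. "%.*s" takes the precision as an int argument,
 * so printf stops at the terminating null or after max_bytes bytes,
 * whichever comes first. */
int format_limited(char *out, size_t outsz, const char *s, int max_bytes) {
    return snprintf(out, outsz, "\"%.*s\"", max_bytes, s);
}
```

With a limit of 10, the string from the quoted example comes out as
`"The Rest o"`. A 4-byte array with no terminator is also fine as long as the
precision does not exceed 4, since no byte past the precision is read.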

So you can use the precision modifier on %s to limit the output to a particular 
number of bytes. The only downside I can think of to picking a limit is: what 
happens when the user stores valid UTF-8 data in their string and we print it 
via `%.10s`? Could we end up splitting a code point in half, and would that do 
something bad?
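
One possible way around the split-code-point concern, sketched in C under the
assumption that the data is well-formed UTF-8: shrink the byte limit so that
it never ends in the middle of a multi-byte sequence, then pass the result as
the `%.*s` precision. The names `utf8_prefix_len` and `format_limited_utf8`
are hypothetical, and nothing here is part of the patch.

```c
#include <stddef.h>
#include <stdio.h>

/* Returns the largest length <= max_bytes that does not end inside a UTF-8
 * sequence. Only the first max_bytes bytes of s are ever read. Malformed
 * input is left untouched (the plain byte limit is returned). */
size_t utf8_prefix_len(const char *s, size_t max_bytes) {
    size_t n = 0;
    while (n < max_bytes && s[n] != '\0')
        n++;
    if (n == 0)
        return 0;

    /* Step back over up to three trailing continuation bytes (10xxxxxx). */
    size_t i = n;
    while (i > 0 && n - i < 3 && ((unsigned char)s[i - 1] & 0xC0) == 0x80)
        i--;
    if (i == 0)
        return n; /* no lead byte found: malformed */

    /* Work out how many bytes the last sequence needs from its lead byte. */
    unsigned char lead = (unsigned char)s[i - 1];
    size_t need;
    if (lead < 0x80)
        need = 1;
    else if ((lead & 0xE0) == 0xC0)
        need = 2;
    else if ((lead & 0xF0) == 0xE0)
        need = 3;
    else if ((lead & 0xF8) == 0xF0)
        need = 4;
    else
        return n; /* malformed */

    size_t have = n - (i - 1);
    /* Drop the last sequence entirely if the limit cut it short. */
    return have < need ? i - 1 : n;
}

/* Prints s in double quotes, limited to max_bytes without splitting a
 * code point. */
int format_limited_utf8(char *out, size_t outsz, const char *s,
                        size_t max_bytes) {
    size_t len = utf8_prefix_len(s, max_bytes);
    return snprintf(out, outsz, "\"%.*s\"", (int)len, s);
}
```

The trade-off is that the output may be a few bytes shorter than the nominal
limit, and the helper has to inspect the string before printing it, which
plain `%.10s` does not require.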


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D124221/new/

https://reviews.llvm.org/D124221

