rsmith added inline comments.

================
Comment at: clang/lib/Sema/SemaChecking.cpp:433
+
+  llvm::StringRef getFormatSpecifier(QualType T) {
+    if (auto *BT = T->getAs<BuiltinType>()) {
----------------
rsmith wrote:
> aaron.ballman wrote:
> > yihanaa wrote:
> > > I think this is better maintained in "clang/AST/FormatString.h". For 
> > > example analyze_printf::PrintfSpecifier can get format specifier, what do 
> > > you all think about?
> > +1 to this suggestion -- my hope is that we could generalize it more then 
> > as I notice there are missing specifiers for things like intmax_t, size_t, 
> > ptrdiff_t, _Decimal types, etc. Plus, that will hopefully make it easier 
> > for __builtin_dump_struct to benefit when new format specifiers are added, 
> > such as ones for printing a _BitInt.
> I am somewhat uncertain: every one of these is making arbitrary choices about 
> how to format the value, so it's not clear to me that this is general logic 
> rather than something specific to `__builtin_dump_struct`. For example, using 
> `%f` rather than one of the myriad other ways of formatting `double` is 
> somewhat arbitrary. Using `%s` for any `const char*` is *extremely* arbitrary 
> and will be wrong and will cause crashes in some cases, but it may be the 
> pragmatically correct choice for a dumping utility. A general-purpose 
> mechanism would use `%p` for all kinds of pointer.
> 
> We could perhaps factor out the formatting for cases where there is a clear 
> canonical default formatting, such as for integer types and probably `%p` for 
> pointers, then call that from here with some special-casing, but without a 
> second consumer for that functionality it's really not clear to me what form 
> it should take.
I went ahead and did this, mostly to match concurrent changes to the old 
implementation. There are a few cases where our existing "guess a format 
specifier" logic does the wrong thing for dumping purposes, which I've 
explicitly handled -- things like wanting to dump a `char` / `signed char` / 
`unsigned char` member as a number rather than as a (potentially non-printable 
or whitespace) character.


================
Comment at: clang/lib/Sema/SemaChecking.cpp:573
+          // We don't know how to print this field. Print out its address
+          // with a format specifier that a smart tool will be able to
+          // recognize and treat specially.
----------------
aaron.ballman wrote:
> rsmith wrote:
> > yihanaa wrote:
> > > Can we generate a warning diagnostic message when we see an unsupported 
> > > type (perhaps temporarily not supported), what do you all think about?
> > We could, but I'm leaning towards thinking that we shouldn't warn by 
> > default at least. That would get in the way of the C++ use case where the 
> > type actually is supported by the printing function; I'm not sure that we 
> > can easily tell those cases apart. And it's probably not all that useful to 
> > someone who just wants whatever information we can give them while 
> > debugging. I'd be OK with adding a warning that's off by default, though. 
> > What would be the intended use case here? Would an off-by-default warning 
> > satisfy it?
> So long as we don't trigger UB, I think it's fine to not diagnose for 
> unsupported types. But the UB cases do strike me as something we might want 
> to warn about if we think they can be hit (specifically, these kinds of UB: 
> https://eel.is/c++draft/cstdarg.syn#1).
> 
> As an example, do we properly handle dumping this case:
> ```
> struct S {
>   int &i;
>   S(int &ref) : i(ref) {}
> };
> ```
> where the vararg argument is of reference type? I suppose another interesting 
> question is:
> ```
> struct S {
>   int i;
> };
> 
> void func() {
>   S s;
>   [=]() {
>     __builtin_dump_struct(&s, ::printf);
>   };
> }
> ```
> (is S.i "an entity resulting from a lambda capture"?)
The only things we pass to the formatting function are:
* The small number of builtin types for which we have known formatting 
specifiers.
* Pointers to `void`, using `%p`.
* Pointers to `[const] char`, using `%s`.
* Pointers to user-defined types, using `*%p`.
All of these can be passed through a `...` without problems.

We can potentially introduce the following sources of UB:
* Indirection through a struct pointer that doesn't point to a struct object.
* Read of an inactive union member.
* Read of an uninitialized member.
* Printing something other than a pointer to a nul-terminated byte sequence 
with `%s`.
I'm not sure how much we should be worrying about those. It'd be nice to avoid 
them, though I don't think we can fix the fourth issue without giving up the 
`%s` formatting entirely.

Reference types aren't on our list of types with known formatting specifiers, 
so in your first example we'd call `printf("%s%s = *%p", "int &", "i", &p->i)`, 
producing output like `int & i = *0x1234abcd`. We could special-case that to 
also dereference the reference and include the value (`int & i = *0x1234abcd 
(5)`, but for now I'm treating it like any other type for which we don't have a 
special formatting rule. I've added test coverage for this case.

In your second example, we rewrite into `::printf("%s%s = %s", "int", "i", 
(&s)->i)`, which is fine. The `parmN` restriction is about calls to `va_start`:
```
void f(int a, ...) {
  [&] {
    va_list va;
    va_start(va, a); // error, `a` is captured here
    // ...
  };
}
```


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D124221/new/

https://reviews.llvm.org/D124221

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to