rsmith added inline comments.
================ Comment at: clang/lib/Sema/SemaChecking.cpp:433 + + llvm::StringRef getFormatSpecifier(QualType T) { + if (auto *BT = T->getAs<BuiltinType>()) { ---------------- rsmith wrote: > aaron.ballman wrote: > > yihanaa wrote: > > > I think this is better maintained in "clang/AST/FormatString.h". For > > > example analyze_printf::PrintfSpecifier can get format specifier, what do > > > you all think about? > > +1 to this suggestion -- my hope is that we could generalize it more then > > as I notice there are missing specifiers for things like intmax_t, size_t, > > ptrdiff_t, _Decimal types, etc. Plus, that will hopefully make it easier > > for __builtin_dump_struct to benefit when new format specifiers are added, > > such as ones for printing a _BitInt. > I am somewhat uncertain: every one of these is making arbitrary choices about > how to format the value, so it's not clear to me that this is general logic > rather than something specific to `__builtin_dump_struct`. For example, using > `%f` rather than one of the myriad other ways of formatting `double` is > somewhat arbitrary. Using `%s` for any `const char*` is *extremely* arbitrary > and will be wrong and will cause crashes in some cases, but it may be the > pragmatically correct choice for a dumping utility. A general-purpose > mechanism would use `%p` for all kinds of pointer. > > We could perhaps factor out the formatting for cases where there is a clear > canonical default formatting, such as for integer types and probably `%p` for > pointers, then call that from here with some special-casing, but without a > second consumer for that functionality it's really not clear to me what form > it should take. I went ahead and did this, mostly to match concurrent changes to the old implementation. There are a few cases where our existing "guess a format specifier" logic does the wrong thing for dumping purposes, which I've explicitly handled -- things like wanting to dump a `char` / `signed char` / `unsigned char` member as a number rather than as a (potentially non-printable or whitespace) character. ================ Comment at: clang/lib/Sema/SemaChecking.cpp:573 + // We don't know how to print this field. Print out its address + // with a format specifier that a smart tool will be able to + // recognize and treat specially. ---------------- aaron.ballman wrote: > rsmith wrote: > > yihanaa wrote: > > > Can we generate a warning diagnostic message when we see an unsupported > > > type (perhaps temporarily not supported), what do you all think about? > > We could, but I'm leaning towards thinking that we shouldn't warn by > > default at least. That would get in the way of the C++ use case where the > > type actually is supported by the printing function; I'm not sure that we > > can easily tell those cases apart. And it's probably not all that useful to > > someone who just wants whatever information we can give them while > > debugging. I'd be OK with adding a warning that's off by default, though. > > What would be the intended use case here? Would an off-by-default warning > > satisfy it? > So long as we don't trigger UB, I think it's fine to not diagnose for > unsupported types. But the UB cases do strike me as something we might want > to warn about if we think they can be hit (specifically, these kinds of UB: > https://eel.is/c++draft/cstdarg.syn#1). > > As an example, do we properly handle dumping this case: > ``` > struct S { > int &i; > S(int &ref) : i(ref) {} > }; > ``` > where the vararg argument is of reference type? I suppose another interesting > question is: > ``` > struct S { > int i; > }; > > void func() { > S s; > [=]() { > __builtin_dump_struct(&s, ::printf); > }; > } > ``` > (is S.i "an entity resulting from a lambda capture"?) The only things we pass to the formatting function are: * The small number of builtin types for which we have known formatting specifiers. * Pointers to `void`, using `%p`. * Pointers to `[const] char`, using `%s`. * Pointers to user-defined types, using `*%p`. All of these can be passed through a `...` without problems. We can potentially introduce the following sources of UB: * Indirection through a struct pointer that doesn't point to a struct object. * Read of an inactive union member. * Read of an uninitialized member. * Printing something other than a pointer to a nul-terminated byte sequence with `%s`. I'm not sure how much we should be worrying about those. It'd be nice to avoid them, though I don't think we can fix the fourth issue without giving up the `%s` formatting entirely. Reference types aren't on our list of types with known formatting specifiers, so in your first example we'd call `printf("%s%s = *%p", "int &", "i", &p->i)`, producing output like `int & i = *0x1234abcd`. We could special-case that to also dereference the reference and include the value (`int & i = *0x1234abcd (5)`, but for now I'm treating it like any other type for which we don't have a special formatting rule. I've added test coverage for this case. In your second example, we rewrite into `::printf("%s%s = %s", "int", "i", (&s)->i)`, which is fine. The `parmN` restriction is about calls to `va_start`: ``` void f(int a, ...) { [&] { va_list va; va_start(va, a); // error, `a` is captured here // ... }; } ``` Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D124221/new/ https://reviews.llvm.org/D124221 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits