Re: [Dwarf-Discuss] lambda (& other anonymous type) identification/naming

Adrian Prantl via Dwarf-Discuss Tue, 25 Jan 2022 08:14:12 -0800


> On Jan 23, 2022, at 2:53 PM, David Blaikie <dblai...@gmail.com> wrote:
> 
> A rather common "quality of implementation" issue seems to be lambda naming.
> 
> I came across this due to non-canonicalization of lambda names in template 
> parameters depending on how a source file is named in Clang, and GCC's names 
> seem to be very ambiguous:
> 
> $ cat tmp/lambda.h
> template<typename T>
> void f1(T) { }
> static int i = (f1([]{}), 1);
> static int j = (f1([]{}), 2);
> void f1() {
>   f1([]{});
>   f1([]{});
> }
> $ cat tmp/lambda.cpp
> #ifdef I_PATH
> #include <tmp/lambda.h>
> #else
> #include "lambda.h"
> #endif
> $ clang++-tot tmp/lambda.cpp -g -c -I. -DI_PATH && llvm-dwarfdump-tot lambda.o | grep "f1<"
>                 DW_AT_name      ("f1<(lambda at ./tmp/lambda.h:3:20)>")
>                 DW_AT_name      ("f1<(lambda at ./tmp/lambda.h:4:20)>")
>                 DW_AT_name      ("f1<(lambda at ./tmp/lambda.h:6:6)>")
>                 DW_AT_name      ("f1<(lambda at ./tmp/lambda.h:7:6)>")
> $ clang++-tot tmp/lambda.cpp -g -c && llvm-dwarfdump-tot lambda.o | grep "f1<"
>                 DW_AT_name      ("f1<(lambda at tmp/lambda.h:3:20)>")
>                 DW_AT_name      ("f1<(lambda at tmp/lambda.h:4:20)>")
>                 DW_AT_name      ("f1<(lambda at tmp/lambda.h:6:6)>")
>                 DW_AT_name      ("f1<(lambda at tmp/lambda.h:7:6)>")
> $ g++-tot tmp/lambda.cpp -g -c -I. && llvm-dwarfdump-tot lambda.o | grep "f1<"
>                 DW_AT_name      ("f1<f1()::<lambda()> >")
>                 DW_AT_name      ("f1<f1()::<lambda()> >")
>                 DW_AT_name      ("f1<<lambda()> >")
>                 DW_AT_name      ("f1<<lambda()> >")
> 
> (I came across this in the context of my simplified template names work - 
> rebuilding names from the DW_TAG description of the template parameters. I'm 
> not rebuilding names that have lambda parameters (I keep encoding the full 
> string instead). The issue arises when some other type depends on a type 
> with a lambda parameter and multiple uses of that inner type exist in 
> different translation units (using type units) that name the same file 
> differently - so the expected name has one spelling, but the actual spelling 
> is different due to the "./")
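As a rough illustration of how little a consumer can do about the "./" on its own, here is a minimal C++17 sketch that lexically normalizes the file path before rebuilding a Clang-style "(lambda at FILE:LINE:COL)" string. The helper `normalized_lambda_name` is hypothetical, not part of any LLVM tool. It only papers over the "./" case; it does nothing for absolute vs. relative paths, symlinks, or distinct lambdas that share a location.

```cpp
#include <cassert>
#include <filesystem>
#include <string>

// Hypothetical helper: rebuild a "(lambda at FILE:LINE:COL)" string after
// lexically normalizing FILE, so "./tmp/lambda.h" and "tmp/lambda.h" produce
// the same spelling. No filesystem access is performed.
std::string normalized_lambda_name(const std::string& file, unsigned line,
                                   unsigned col) {
  std::string path =
      std::filesystem::path(file).lexically_normal().generic_string();
  return "(lambda at " + path + ":" + std::to_string(line) + ":" +
         std::to_string(col) + ")";
}
```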
> 
> But all this said - it'd be good to figure out a reliable naming scheme. The 
> naming we have here, while usable for humans (pointing to source files, etc), 
> doesn't reliably give unique names for each lambda/template instantiation, 
> which makes it difficult for a consumer to know if two entities are the same 
> (important for types - is some function parameter the same type as another 
> type?)
> 
> While it's expected that cross-producer (eg: trying to be compatible with GCC 
> and Clang debug info) you have to do some fuzzy matching (eg: "f1<int*>" vs 
> "f1<int *>" at the most basic - there are more complicated cases), this 
> one's not possible with the data available.
> 
> The source file/line/column is insufficient to uniquely identify a lambda 
> (multiple lambdas stamped out by a macro would get all the same 
> file/line/col) and valid code (albeit unlikely) that writes the same 
> definition in multiple places could make the same lambda have different names.
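A small C++17 sketch of the macro case: every expansion of `MAKE_LAMBDA` (a hypothetical macro) creates a new, distinct closure type, yet a producer that names lambdas by source location may assign each expansion the same file/line/column, depending on whether it reports the spelling or the expansion location.

```cpp
#include <cassert>
#include <type_traits>

// Each expansion of this macro is a separate lambda-expression, so each one
// gets its own closure type, even though the tokens (and possibly the
// reported source location) are identical.
#define MAKE_LAMBDA() [] { return 0; }

auto lambda_a = MAKE_LAMBDA();
auto lambda_b = MAKE_LAMBDA();

// Distinct types despite identical spelling.
static_assert(!std::is_same_v<decltype(lambda_a), decltype(lambda_b)>,
              "each lambda-expression has its own closure type");
```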
> 
> We should probably use something more like the way various ABI manglings do 
> to identify these entities.
> 
> But we should probably also do this for other unnamed types that have linkage 
> (need to/would benefit from being matched up between two CUs), even ones that 
> aren't lambdas.
> 
> FWIW, at least the llvm-cxxfilt demanglings of clang's manglings for these 
> symbols are:
> 
>  void f1<$_0>($_0)
>  void f1<$_1>($_1)
>  void f1<f1()::$_2>(f1()::$_2)
>  void f1<f1()::$_3>(f1()::$_3)
> 
> Should we use that instead?
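The four demangled names above appear to come from a single counter shared across the whole translation unit ($_2 and $_3 continue the numbering inside f1()), qualified by the enclosing function. A minimal model of that, assuming this reading of the output; the class `InternalLambdaNamer` is hypothetical, not Clang's implementation:

```cpp
#include <cassert>
#include <string>

// Hypothetical model: one counter per translation unit, with the enclosing
// function (if any) used as a qualifier on the generated "$_N" name.
class InternalLambdaNamer {
 public:
  std::string next(const std::string& enclosing_scope) {
    std::string name = "$_" + std::to_string(counter_++);
    return enclosing_scope.empty() ? name : enclosing_scope + "::" + name;
  }

 private:
  unsigned counter_ = 0;
};
```

Under this model the numbers depend on everything that precedes them in the translation unit, which is harmless for internal-linkage types but would not line up across translation units for types with linkage.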


The only other information that the current human-readable DWARF name carries 
is the file+line, and that is fully redundant with DW_AT_decl_file/line, so the 
above scheme seems reasonable to me. Poorly symbolicated backtraces would be 
worse in this scheme, so I'm expecting most pushback from users who rely on a 
tool that just prints the human-readable name with no source info.
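Since the file and line are already in DW_AT_decl_file/DW_AT_decl_line, a consumer that wants the old readable form can rebuild it for display only. A minimal sketch, where `TypeDie` and `display_label` are hypothetical stand-ins for whatever DIE abstraction a real consumer uses:

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical, simplified view of a closure type's DIE.
struct TypeDie {
  std::string name;                      // e.g. "$_0" under the proposed scheme
  std::optional<std::string> decl_file;  // from DW_AT_decl_file
  std::optional<unsigned> decl_line;     // from DW_AT_decl_line
};

// Build a human-readable label for display; the DW_AT_name itself stays
// canonical so it can still be used for matching types across CUs.
std::string display_label(const TypeDie& die) {
  if (!die.decl_file || !die.decl_line) return die.name;
  return die.name + " (lambda at " + *die.decl_file + ":" +
         std::to_string(*die.decl_line) + ")";
}
```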

> 
> GCC's mangling is different (in these examples that's OK, since they're all 
> internal linkage):
> 
>  void f1<f1()::'lambda0'()>(f1()::'lambda0'())
>  void f1<f1()::'lambda'()>(f1()::'lambda'())
> 
> If I add an example like this:
> 
> inline auto f2() { return []{}; }
> 
> and instantiate the template with the result of f2:
> 
>  void f1<f2()::'lambda'()>(f2()::'lambda'())
> 
> GCC:
> 
>  void f1<f2()::'lambda'()>(f2()::'lambda'()) 
> 
> So they consistently use the same mangling - we could use the same naming for 
> template parameters?
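For reference, the Itanium C++ ABI encodes a closure type as `Ul <lambda-sig> E [<number>] _`, where the number is omitted for the first closure with a given signature in a given context and is n-2 for the nth; other unnamed types use a separate `Ut [<number>] _` sequence. A minimal sketch restricted to parameterless lambdas (`<lambda-sig>` = `v`); the function names are hypothetical:

```cpp
#include <cassert>
#include <string>

// Itanium discriminator for the nth (1-based) entity of a kind within a
// context: omitted for the first, then "0", "1", "2", ...
std::string itanium_discriminator(unsigned nth) {
  return nth <= 1 ? "" : std::to_string(nth - 2);
}

// <closure-type-name> for a lambda with no parameters.
std::string closure_type_name_no_params(unsigned nth) {
  return "UlvE" + itanium_discriminator(nth) + "_";
}

// The llvm-cxxfilt-style rendering seen above: 'lambda'(), 'lambda0'(), ...
std::string demangled_closure_no_params(unsigned nth) {
  return "'lambda" + itanium_discriminator(nth) + "'()";
}
```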
> 
> How should we communicate this sort of identity for unnamed types in the DIEs 
> describing the types themselves (not just the string of a template name of a 
> type instantiated with the unnamed type), so that the unnamed type can be 
> matched up between translation units?
> 
> eg, if I have these two translation units:
> // header
> inline auto f1() { struct { } local; return local; }
> // unit 1:
> #include "header"
> auto f2(decltype(f1())) { }
> // unit 2:
> #include "header"
> decltype(f1()) v1;
> 
> Currently the DWARF produced for this unnamed type is:
> 0x0000003f:   DW_TAG_structure_type
>                 DW_AT_calling_convention        (DW_CC_pass_by_value)
>                 DW_AT_byte_size (0x01)
>                 DW_AT_decl_file ("/usr/local/google/home/blaikie/dev/scratch/test.cpp")
>                 DW_AT_decl_line (1)
> 

Is this the type of struct {}?

> 
> So there's no way to know if you see that structure type definition in two 
> different translation units whether they refer to the same type because there 
> may be multiple types that have the same DWARF description. (so no way to 
> know if the DWARF consumer should allow the user to evaluate an expression 
> `f2(v1)` or not, I think?)
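To make the ambiguity concrete: the only identity a consumer can derive from a DIE like the one above is structural. A minimal sketch, with `UnnamedStructDie` and `looks_same` as hypothetical names, of the comparison a consumer is left with:

```cpp
#include <cassert>
#include <string>
#include <tuple>

// Hypothetical summary of what the DIE gives a consumer for an unnamed
// structure type: no name, only layout and declaration coordinates.
struct UnnamedStructDie {
  unsigned byte_size;
  std::string decl_file;
  unsigned decl_line;
};

// Structural comparison: equal descriptions do not imply the same type.
bool looks_same(const UnnamedStructDie& a, const UnnamedStructDie& b) {
  return std::tie(a.byte_size, a.decl_file, a.decl_line) ==
         std::tie(b.byte_size, b.decl_file, b.decl_line);
}
```

For example, `inline auto f3() { struct { } a; struct { } b; return a; }` declares two distinct types on one line whose DIEs would compare equal here.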

Does a C++ compiler usually treat structurally equivalent but differently named 
types as interchangeable?
Does a C++ compiler usually treat structurally equivalent anonymous types as 
interchangeable?

-- adrian

> 
> I guess the only way to have an unnamed type with linkage is to use it inside 
> an inline function - so within that scope you'd have to produce DWARF for any 
> types consistently in all definitions of the function and then a consumer 
> could match them up by counting (assuming the unnamed types were always 
> emitted in the same order in the child DIE list)... 
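A minimal sketch of that counting heuristic, assuming a hypothetical child-DIE representation: each unnamed type is keyed by the enclosing inline function's linkage name plus its position among that function's unnamed children. It is only sound if every producer emits those children in the same order in every CU.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified child DIE of an inline function's DW_TAG_subprogram.
struct ChildDie {
  bool is_unnamed_type;  // e.g. a DW_TAG_structure_type with no DW_AT_name
};

// Hypothetical identity: enclosing linkage name + ordinal among unnamed types.
struct UnnamedTypeKey {
  std::string enclosing_linkage_name;
  unsigned ordinal;
};

bool operator==(const UnnamedTypeKey& a, const UnnamedTypeKey& b) {
  return a.enclosing_linkage_name == b.enclosing_linkage_name &&
         a.ordinal == b.ordinal;
}

// Assign keys by counting unnamed types in child order.
std::vector<UnnamedTypeKey> keys_by_position(
    const std::string& linkage_name, const std::vector<ChildDie>& children) {
  std::vector<UnnamedTypeKey> keys;
  unsigned ordinal = 0;
  for (const ChildDie& child : children) {
    if (child.is_unnamed_type) keys.push_back({linkage_name, ordinal++});
  }
  return keys;
}
```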
> 
> But this all seems a bit subtle & maybe would benefit from a more 
> robust/explicit description? 
> 
> Perhaps we could add an integer attribute to number anonymous types? It would 
> need to differentiate between lambdas and other anonymous types, since they 
> have separate numberings.
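A minimal producer-side sketch of what such numbering might look like, assuming a hypothetical attribute that stores a 0-based index per enclosing scope, with independent counters for lambdas and other unnamed types (mirroring the separate `Ul`/`Ut` sequences in the Itanium ABI, which additionally splits lambdas by signature; that detail is omitted here). No such DWARF attribute exists today; `UnnamedTypeNumbering` is hypothetical.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

enum class UnnamedKind { Lambda, Other };

// Hypothetical numbering for a proposed integer attribute on unnamed types.
class UnnamedTypeNumbering {
 public:
  // Returns the 0-based index for the next unnamed type of `kind` declared in
  // `scope` (e.g. the enclosing function's linkage name).
  unsigned next(const std::string& scope, UnnamedKind kind) {
    return counters_[{scope, kind}]++;
  }

 private:
  std::map<std::pair<std::string, UnnamedKind>, unsigned> counters_;
};
```

A consumer could then treat (scope, kind, index) as the identity of an unnamed type, independent of emission order.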

_______________________________________________
Dwarf-Discuss mailing list
Dwarf-Discuss@lists.dwarfstd.org
http://lists.dwarfstd.org/listinfo.cgi/dwarf-discuss-dwarfstd.org
