erichkeane added inline comments.

================
Comment at: clang/include/clang/Basic/AttrDocs.td:1416-1417
+not significant. This allows global constants with the same contents to be
+merged. This can break global pointer identity, i.e. two different globals have
+the same address.
+
----------------
aeubanks wrote:
> erichkeane wrote:
> > aeubanks wrote:
> > > aaron.ballman wrote:
> > > > aeubanks wrote:
> > > > > aaron.ballman wrote:
> > > > > > aeubanks wrote:
> > > > > > > aaron.ballman wrote:
> > > > > > > > What happens for tentative definitions where the value isn't 
> > > > > > > > known? e.g.,
> > > > > > > > ```
> > > > > > > > [[clang::unnamed_addr]] int i1, i2;
> > > > > > > > ```
> > > > > > > > 
> > > > > > > > What happens if the types are similar but not the same?
> > > > > > > > ```
> > > > > > > > [[clang::unnamed_addr]] signed int i1 = 32;
> > > > > > > > [[clang::unnamed_addr]] unsigned int i2 = 32;
> > > > > > > > ```
> > > > > > > > 
> > > > > > > > Should we diagnose taking the address of such an attributed 
> > > > > > > > variable so users have some hope of spotting the non-conforming 
> > > > > > > > situations?
> > > > > > > > 
> > > > > > > > Does this attribute have impacts across translation unit 
> > > > > > > > boundaries (perhaps only when doing LTO) or only within a 
> > > > > > > > single TU?
> > > > > > > > 
> > > > > > > > What does this attribute do in C++ in the presence of 
> > > > > > > > constructors and destructors? e.g.,
> > > > > > > > ```
> > > > > > > > struct S {
> > > > > > > >   S();
> > > > > > > >   ~S();
> > > > > > > > };
> > > > > > > > 
> > > > > > > > [[clang::unnamed_addr]] S s1, s2; // Are these merged and 
> > > > > > > > there's only one ctor/dtor call?
> > > > > > > > ```
> > > > > > > globals are only mergeable if they're known to be constant and 
> > > > > > > have the same value/size. this can be done at compile time only 
> > > > > > > if the optimizer can see the constant values, or at link time
> > > > > > > 
> > > > > > > so nothing would happen in any of the cases you've given.
> > > > > > > 
> > > > > > > but yeah that does imply that we should warn when the attribute 
> > > > > > > is used on non const, non-POD globals. I'll update this patch to 
> > > > > > > do that
> > > > > > > 
> > > > > > > as mentioned in the description, we actually do want to take the 
> > > > > > > address of these globals for table-driven parsing, but we don't 
> > > > > > > care about identity equality
> > > > > > > globals are only mergeable if they're known to be constant and 
> > > > > > > have the same value/size. this can be done at compile time only 
> > > > > > > if the optimizer can see the constant values, or at link time
> > > > > > >
> > > > > > > so nothing would happen in any of the cases you've given.
> > > > > > 
> > > > > > Ahhhh that's good to know. So I assume we *will* merge these?
> > > > > > 
> > > > > > ```
> > > > > > struct S {
> > > > > >   int i, j;
> > > > > >   float f;
> > > > > > };
> > > > > > 
> > > > > > [[clang::unnamed_addr]] const S s1 = { 1, 2, 3.0f };
> > > > > > [[clang::unnamed_addr]] const S s2 = { 1, 2, 3.0f };
> > > > > > [[clang::unnamed_addr]] const S s3 = s2;
> > > > > > ```
> > > > > > 
> > > > > > > but yeah that does imply that we should warn when the attribute 
> > > > > > > is used on non const, non-POD globals. I'll update this patch to 
> > > > > > > do that
> > > > > > 
> > > > > > Thank you, I think that will be more user-friendly
> > > > > > 
> > > > > > > as mentioned in the description, we actually do want to take the 
> > > > > > > address of these globals for table-driven parsing, but we don't 
> > > > > > > care about identity equality
> > > > > > 
> > > > > > Yeah, I still wonder if we want to diagnose just the same -- if the 
> > > > > > address is never taken, there's not really a way to notice the 
> > > > > > optimization, but if the address is taken, you basically get UB 
> > > > > > (and I think we should explicitly document it as such). Given how 
> > > > > > easy it is to accidentally take the address of something (like via 
> > > > > > a reference in C++), I think we should warn by default, but still 
> > > > > > have a warning group for folks who want to live life dangerously.
> > > > > > > globals are only mergeable if they're known to be constant and 
> > > > > > > have the same value/size. this can be done at compile time only 
> > > > > > > if the optimizer can see the constant values, or at link time
> > > > > > >
> > > > > > > so nothing would happen in any of the cases you've given.
> > > > > > 
> > > > > > Ahhhh that's good to know. So I assume we *will* merge these?
> > > > > > 
> > > > > > ```
> > > > > > struct S {
> > > > > >   int i, j;
> > > > > >   float f;
> > > > > > };
> > > > > > 
> > > > > > [[clang::unnamed_addr]] const S s1 = { 1, 2, 3.0f };
> > > > > > [[clang::unnamed_addr]] const S s2 = { 1, 2, 3.0f };
> > > > > > [[clang::unnamed_addr]] const S s3 = s2;
> > > > > > ```
> > > > > yeah those are merged even just by clang
> > > > > 
> > > > > ```
> > > > > struct S {
> > > > >   int i, j;
> > > > >   float f;
> > > > > };
> > > > > 
> > > > > [[clang::unnamed_addr]] const S s1 = { 1, 2, 3.0f };
> > > > > [[clang::unnamed_addr]] const S s2 = { 1, 2, 3.0f };
> > > > > [[clang::unnamed_addr]] const S s3 = s2;
> > > > > 
> > > > > const void * f1() {
> > > > >   return &s1;
> > > > > }
> > > > > 
> > > > > const void * f2() {
> > > > >   return &s2;
> > > > > }
> > > > > 
> > > > > const void * f3() {
> > > > >   return &s3;
> > > > > }
> > > > > 
> > > > > $ ./build/rel/bin/clang++ -S -emit-llvm -o - -O2 /tmp/a.cc
> > > > > ```
> > > > > > 
> > > > > > > but yeah that does imply that we should warn when the attribute 
> > > > > > > is used on non const, non-POD globals. I'll update this patch to 
> > > > > > > do that
> > > > > > 
> > > > > > Thank you, I think that will be more user-friendly
> > > > > > 
> > > > > > > as mentioned in the description, we actually do want to take the 
> > > > > > > address of these globals for table-driven parsing, but we don't 
> > > > > > > care about identity equality
> > > > > > 
> > > > > > Yeah, I still wonder if we want to diagnose just the same -- if the 
> > > > > > address is never taken, there's not really a way to notice the 
> > > > > > optimization, but if the address is taken, you basically get UB 
> > > > > > (and I think we should explicitly document it as such). Given how 
> > > > > > easy it is to accidentally take the address of something (like via 
> > > > > > a reference in C++), I think we should warn by default, but still 
> > > > > > have a warning group for folks who want to live life dangerously.
> > > > > 
> > > > > I don't understand how there would be any UB. Especially if the user 
> > > > > only dereferences them, and isn't comparing pointers, which is the 
> > > > > requested use case.
> > > > > yeah those are merged even just by clang
> > > > 
> > > > Nice, thank you for the confirmation!
> > > > 
> > > > > I don't understand how there would be any UB. Especially if the user 
> > > > > only dereferences them, and isn't comparing pointers, which is the 
> > > > > requested use case.
> > > > 
> > > > That's just it -- nothing prevents the user from taking the address and 
> > > > comparing the pointers, which is no longer defined behavior with this 
> > > > attribute. It would require a static analysis check to catch this 
> > > > problem unless the compiler statically warns on taking the address in 
> > > > the first place (IMO, we shouldn't assume users will use the attribute 
> > > > properly and thus need no help to do so). I was also thinking about 
> > > > things like accidental sharing across thread boundaries -- but perhaps 
> > > > that's fine because the data is constant. I was also thinking that this 
> > > > potentially breaks `restrict` semantics but on reflection... that seems 
> > > > almost intentional given the goal of the attribute. But things along 
> > > > these lines are what have me worried -- the language assumes unique 
> > > > locations for objects, so I expect there's going to be fallout when 
> > > > object locations are no longer unique. If we can remove sharp edges for 
> > > > users without compromising the utility of the attribute, I think that's 
> > > > beneficial. Or are you saying that warning like this would basically 
> > > > compromise the utility?
> > > > > I don't understand how there would be any UB. Especially if the user 
> > > > > only dereferences them, and isn't comparing pointers, which is the 
> > > > > requested use case.
> > > > 
> > > > That's just it -- nothing prevents the user from taking the address and 
> > > > comparing the pointers, which is no longer defined behavior with this 
> > > > attribute. It would require a static analysis check to catch this 
> > > > problem unless the compiler statically warns on taking the address in 
> > > > the first place (IMO, we shouldn't assume users will use the attribute 
> > > > properly and thus need no help to do so). I was also thinking about 
> > > > things like accidental sharing across thread boundaries -- but perhaps 
> > > > that's fine because the data is constant. I was also thinking that this 
> > > > potentially breaks `restrict` semantics but on reflection... that seems 
> > > > almost intentional given the goal of the attribute. But things along 
> > > > these lines are what have me worried -- the language assumes unique 
> > > > locations for objects, so I expect there's going to be fallout when 
> > > > object locations are no longer unique. If we can remove sharp edges for 
> > > > users without compromising the utility of the attribute, I think that's 
> > > > beneficial. Or are you saying that warning like this would basically 
> > > > compromise the utility?
> > > 
> > > when you say "undefined behavior" do you mean "it's unspecified what 
> > > happens" or literally the C/C++ "undefined behavior" where the compiler 
> > > can assume it doesn't happen?
> > > 
> > > I don't think there's any UB in the C/C++ "undefined behavior" sense, 
> > > we're just dropping a C/C++ guarantee of unique pointer identity for 
> > > certain globals.
> > > 
> > > Yes I believe the warning would compromise the utility since the 
> > > underlying request behind this is a case where the user explicitly wants 
> > > to take the address of these globals for table driven parsing but does 
> > > not care about unique global identity. i.e. it's fine if we have 
> > > duplicate addresses in the table as long as each entry points to the 
> > > proper data.
> > I think this IS causing undefined behavior, any program that assumes the 
> > addresses aren't the same (such as inserting addresses into a map, 
> > explicitly destructing/initializing/etc), or are comparing addresses are 
> > now exhibiting UB (in the purest of C++ senses).  It isn't the 'taking the 
> > address' that is UB, it is comparing them, but unfortunately we don't have 
> > a good way to diagnose THAT.  I believe what Aaron is getting at is that 
> > the taking of the addresses should be diagnosed, particularly if we end up 
> > taking said address less-obviously.
> > 
> > It DOES have to be a Static Analysis type diagnostic however, since I don't 
> > think it is accurate enough.
> > 
> perhaps I'm arguing too literally (or am just wrong), but even if the program 
> behaves incorrectly due to comparisons of addresses now giving different 
> results, that's not necessarily UB. even if a program were using pointers as 
> a key into a map and two globals that previously were separate entries are 
> now one, that's not necessarily UB, that's just a program behaving 
> incorrectly.
> 
> unless there's a specific C/C++ rule that you have in mind that I'm missing?
> 
> I'm happy to add a warning that we can turn off internally if people agree 
> that taking the address of an `unnamed_addr` global is dangerous in general, 
> not sure if that should be default on or off
The simplest one I can think of is something like a :

```swap_containers(global1, global2); //swap that takes by ref, but doesn't 
check addresses```

Warnings are obviously trivially disable-able, and I'd be fine with making it a 
really fine-grained warning group, but I think the dangers of this attribute 
are significant.  MANY operations can have a precondition of "my parameters are 
different objects" that this can now cause you to trivially violate in a way 
that was never a problem before (since we give them different names!).


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D158223/new/

https://reviews.llvm.org/D158223

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to