JonChesterfield added inline comments.

================
Comment at: clang/test/OpenMP/barrier_codegen.cpp:22
+// CLANGCG-NOT: readonly
+// IRBUILDER:      ; Function Attrs: nofree nosync nounwind readonly
+// IRBUILDER-NEXT: declare i32 @__kmpc_global_thread_num(%struct.ident_t*)
----------------
ABataev wrote:
> jdoerfert wrote:
> > ABataev wrote:
> > > jdoerfert wrote:
> > > > ABataev wrote:
> > > > > jdoerfert wrote:
> > > > > > ABataev wrote:
> > > > > > > jdoerfert wrote:
> > > > > > > > ABataev wrote:
> > > > > > > > > Not sure about correct use of `nosync` and `readonly` 
> > > > > > > > > attributes. OpenMP runtime uses lazy initialization of the 
> > > > > > > > > runtime library and when any runtime function is called, the 
> > > > > > > > > inner parts of the OpenMP runtime are initialized 
> > > > > > > > > automatically. It may use some sync primitives and may modify 
> > > > > > > > > memory, I assume. Same about `nofree`.
> > > > > > > > There are two versions of these functions, host and device. I 
> > > > > > > > assume host functions are not inlined but device functions 
> > > > > > > > might be. This is basically all the modes we support right now.
> > > > > > > > 
> > > > > > > > If we do not inline the function (host) we don't necessarily 
> > > > > > > > care what they do but what effect the user can expect.
> > > > > > > > The user can not expect to synchronize through 
> > > > > > > > `__kmpc_global_thread_num` calls in a defined way, thus 
> > > > > > > > `nosync`.
> > > > > > > > Similarly, from the users perspective there is no way to 
> > > > > > > > determine if something was written or freed, no matter how many 
> > > > > > > > of these calls I issue and under which control conditions, all 
> > > > > > > > I see is the number as a result. Thus, `readonly` and `nofree`. 
> > > > > > > > I believe `readnone` is even fine here but it might not work 
> > > > > > > > for the device version (see below) so I removed it.
> > > > > > > > 
> > > > > > > > If we do inline the function (device) we need to make sure the 
> > > > > > > > attributes are compatible with the inlined code to not cause 
> > > > > > > > UB. The code of `__kmpc_global_thread_num` at least does not 
> > > > > > > > write anything and only reads stuff (as far as I can tell).
> > > > > > > > 
> > > > > > > > Please correct me if I overlooked something. 
> > > > > > > But someone may try to inline the host-based runtime or try to 
> > > > > > > use LTO with it.
> > > > > > > The question is not about the user expectations but about the 
> > > > > > > optimizations which can be triggered with these attributes.
> > > > > > This is our runtime and we have supported and unsupported usage 
> > > > > > models.
> > > > > Hmm, I don't think this the right approach. Plus, you still did not 
> > > > > answer about optimizations. Maybe, currently, these attributes won't 
> > > > > trigger dangerous optimizations but they can do this in the future 
> > > > > and it may lead to unpredictable results. I would use the pessimistic 
> > > > > model here rather than over-optimistic.
> > > > I did (try to) describe why there cannot be any problems wrt. 
> > > > optimizations:
> > > > The specified behavior of the runtime call is _as if_ it is `readonly`, 
> > > > `nofree`, and `nosync`.
> > > > That is, from the perspective of the compiler this is true and 
> > > > optimizations are allowed to use that fact.
> > > >  
> > > > The fact that the first ever runtime call initializes the runtime is 
> > > > neither part of the specification nor of the observable behavior. If we 
> > > > change the order between two call to `__kmpc_global_thread_num`, or 
> > > > similar calls, we cannot observe if/which one initialized the runtime 
> > > > and which read only stuff.
> > > Here is the code of this function from the libomp:
> > > ```
> > >   int gtid;
> > > 
> > >   if (!__kmp_init_serial) {
> > >     gtid = KMP_GTID_DNE;
> > >   } else
> > > #ifdef KMP_TDATA_GTID
> > >       if (TCR_4(__kmp_gtid_mode) >= 3) {
> > >     KA_TRACE(1000, ("*** __kmp_get_global_thread_id_reg: using TDATA\n"));
> > >     gtid = __kmp_gtid;
> > >   } else
> > > #endif
> > >       if (TCR_4(__kmp_gtid_mode) >= 2) {
> > >     KA_TRACE(1000, ("*** __kmp_get_global_thread_id_reg: using keyed 
> > > TLS\n"));
> > >     gtid = __kmp_gtid_get_specific();
> > >   } else {
> > >     KA_TRACE(1000,
> > >              ("*** __kmp_get_global_thread_id_reg: using internal 
> > > alg.\n"));
> > >     gtid = __kmp_get_global_thread_id();
> > >   }
> > > 
> > >   /* we must be a new uber master sibling thread */
> > >   if (gtid == KMP_GTID_DNE) {
> > >     KA_TRACE(10,
> > >              ("__kmp_get_global_thread_id_reg: Encountered new root 
> > > thread. "
> > >               "Registering a new gtid.\n"));
> > >     __kmp_acquire_bootstrap_lock(&__kmp_initz_lock);
> > >     if (!__kmp_init_serial) {
> > >       __kmp_do_serial_initialize();
> > >       gtid = __kmp_gtid_get_specific();
> > >     } else {
> > >       gtid = __kmp_register_root(FALSE);
> > >     }
> > >     __kmp_release_bootstrap_lock(&__kmp_initz_lock);
> > >     /*__kmp_printf( "+++ %d\n", gtid ); */ /* GROO */
> > >   }
> > > 
> > >   KMP_DEBUG_ASSERT(gtid >= 0);
> > > 
> > >   return gtid;
> > > 
> > > ```
> > > 
> > > As you can see, it is very complex. It has locks, in the debug mode it 
> > > writes something, etc. It is not only about the initialization, there are 
> > > some other calls (like `__kmp_get_global_thread_id()`, which really 
> > > writes something). Do you really think it is safe to mark this as 
> > > readonly and nosync?
> > >  Do you really think it is safe to mark this as readonly and nosync?
> > 
> > Yes. I am arguing why in the last few comments.
> > 
> > The central question is not what the function does but how it behaves (as 
> > long as we do not inline it, then what it does leaks).
> I still don't think the right way to do it. Instead I would suggest to think 
> about platform dependent attributes. Say, in general we use some pessimistic 
> set of attributes but for some platforms (like NVPTX or AMDGCN) we can define 
> more optimistic set of attributes.
Properties that are true globally, in the as-if sense, but violated locally, if 
inlined, are a seriously complicated proposal. I don't want to block this patch 
on a satisfactory resolution to that so suggest we go with conservative.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D69922/new/

https://reviews.llvm.org/D69922



_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to