[PATCH] D79279: Add overloaded versions of builtin mem* functions

JF Bastien via Phabricator via cfe-commits Wed, 26 Aug 2020 16:26:29 -0700

jfb added a comment.

In D79279#2235085 <https://reviews.llvm.org/D79279#2235085>, @rsmith wrote:


> Thanks, I'm happy with this approach.
>
> If I understand correctly, the primary (perhaps sole) purpose of 
> `__builtin_memcpy_sized` is to provide a primitive from which an atomic 
> operation can be produced. That being the case, I wonder if the name is 
> emphasizing the wrong thing, and a name that contains `atomic` would be more 
> obvious. As a concrete alternative, `__atomic_unordered_memcpy` is not much 
> longer and seems to have the right kinds of implications. WDYT?

Kinda, that's the motivating from Hans' paper which I'm following. One other 
use case (and the reason I assume Azul folks want it too) is when there's a GC 
that looks at objects. With this it knows it won't see torn objects when the GC 
is concurrent. It's similar, but generally `atomic` also implies an ordering, 
and here it's explicitly unordered (i.e. Bring Your Own Fences).

So I don't have a strong opinion on the name, but `atomic` seem slightly wrong.

Follow-ups I suggest in comments:

- Make "must be trivially copyable" a warning throughout (not just here, but 
for atomics too).
- Implement that diagnostic for `mem*` functions.



================
Comment at: clang/docs/LanguageExtensions.rst:2403-2406
+Constant evaluation support is only provided when the source and destination 
are
+pointers to arrays with the same trivially copyable element type, and the given
+size is an exact multiple of the element size that is no greater than the 
number
+of elements accessible through the source and destination operands.
----------------
rsmith wrote:
> Is this true in general, or only for `[w]mem{cpy,move}`? I thought for the 
> other cases, we required an array of `char` / `wchar_t`?
This is just moving documentation that was below. Phab is confused with the 
diff.


================
Comment at: clang/docs/LanguageExtensions.rst:2409
+Support for constant expression evaluation for the above builtins can be 
detected
+with ``__has_feature(cxx_constexpr_string_builtins)``.
+
----------------
rsmith wrote:
> I think this only applies to the above list minus the five functions you 
> added to it. Given this and the previous comment, I'm not sure that merging 
> the documentation on string builtins and memory builtins is working out well 
> -- they seem to have more differences than similarities.
> 
> (`memset` is an outlier here that should be called out -- we don't seem to 
> provide any constant evaluation support for it whatsoever.)
[w]memset are indeed the odd ones, update says so.


================
Comment at: clang/lib/AST/ExprConstant.cpp:8870
+        return false;
+      if (N.urem(ElSz.getLimitedValue()) != 0) {
+        Info.FFDiag(E, diag::note_constexpr_mem_sized_bad_size)
----------------
rsmith wrote:
> `getLimitedValue()` here seems unnecessary; `urem` can take an `APInt`.
Their bitwidth doesn't always match, and that asserts out.


================
Comment at: clang/lib/AST/ExprConstant.cpp:8872
+        Info.FFDiag(E, diag::note_constexpr_mem_sized_bad_size)
+            << (int)N.getLimitedValue() << (int)ElSz.getLimitedValue();
+        return false;
----------------
rsmith wrote:
> Consider using `toString` instead of truncating each `APSInt` to `uint64_t` 
> then to `int`. The size might reliably fit into `uint64_t`, but I don't think 
> we can assume that `int` is large enough.
OK I updated 2 other places as well.


================
Comment at: clang/lib/CodeGen/CGBuiltin.cpp:635
+    return ArgTy->castAs<clang::ObjCObjectPointerType>()->getPointeeType();
+  return ArgTy->castAs<clang::PointerType>()->getPointeeType();
+}
----------------
rsmith wrote:
> I'm not sure this `castAs` is necessarily correct. If the operand is C++11 
> `nullptr`, we could perform a null-to-pointer implicit conversion, and 
> `ArgTy` could be `NullPtrTy` after stripping that back off here.
> 
> It seems like maybe what we want to do is strip off implicit conversions 
> until we hit a non-pointer type, and take the pointee type we found 
> immediately before that?
Ah good catch! The new functions I'm adding just disallow nullptr, but the 
older ones allow it. I've modified the code accordingly and added a test in 
CodeGen for nullptr.


================
Comment at: clang/lib/CodeGen/CGBuiltin.cpp:2639-2640
   case Builtin::BI__builtin_mempcpy: {
+    QualType DestTy = getPtrArgType(CGM, E, 0);
+    QualType SrcTy = getPtrArgType(CGM, E, 1);
     Address Dest = EmitPointerWithAlignment(E->getArg(0));
----------------
rsmith wrote:
> jfb wrote:
> > rsmith wrote:
> > > jfb wrote:
> > > > rsmith wrote:
> > > > > Looking through implicit conversions in `getPtrArgType` here will 
> > > > > change the code we generate for cases like:
> > > > > 
> > > > > ```
> > > > > void f(volatile void *p, volatile void *q) {
> > > > >   memcpy(p, q, 4);
> > > > > }
> > > > > ```
> > > > > 
> > > > > ... (in C, where we permit such implicit conversions) to use a 
> > > > > volatile memcpy intrinsic. Is that an intentional change?
> > > > I'm confused... what's the difference that this makes for the 
> > > > pre-existing builtins? My intent was to get the `QualType` 
> > > > unconditionally, but I can conditionalize it if needed... However this 
> > > > ought to make no difference:
> > > > ```
> > > > static QualType getPtrArgType(CodeGenModule &CGM, const CallExpr *E,
> > > >                               unsigned ArgNo) {
> > > >   QualType ArgTy = E->getArg(ArgNo)->IgnoreImpCasts()->getType();
> > > >   if (ArgTy->isArrayType())
> > > >     return CGM.getContext().getAsArrayType(ArgTy)->getElementType();
> > > >   if (ArgTy->isObjCObjectPointerType())
> > > >     return 
> > > > ArgTy->castAs<clang::ObjCObjectPointerType>()->getPointeeType();
> > > >   return ArgTy->castAs<clang::PointerType>()->getPointeeType();
> > > > }
> > > > ```
> > > > and indeed I can't see the example you provided change in IR from one 
> > > > to the other. The issue I'm working around is that getting it 
> > > > unconditionally would make ObjC code sad when `id` is passed in as I 
> > > > outlined above.
> > > The example I gave should produce a non-volatile memcpy, and used to do 
> > > so (we passed `false` as the fourth parameter to `CreateMemCpy`). With 
> > > this patch, `getPtrArgType`will strip off the implicit conversion from 
> > > `volatile void*` to `void*` in the argument type, so `isVolatile` below 
> > > will be `true`, so I think it will now create a volatile memcpy for the 
> > > same testcase. If that's not what's happening, then I'd like to 
> > > understand why not :)
> > > 
> > > I'm not saying that's necessarily a bad change, but it is a change, and 
> > > it's one we should make intentionally if we make it at all.
> > Oh yes, sorry I thought you were talking about something that 
> > `getPtrArgType` did implicitly! Indeed the C code behaves differently in 
> > that it doesn't just strip `volatile` anymore.
> > 
> > I'm not super thrilled by the default C behavior, and I think this new 
> > behavior removes a gotcha, and is in fact what I was going for in the first 
> > iteration of the patch. Now that I've separated the builtin I agree that 
> > it's a bit odd... but it seems like the right thing to do anyways? But it 
> > no longer matches the C library function to do so.
> > 
> > FWIW, this currently "works as you'd expect":
> > ```
> > void f(__attribute__((address_space(32))) void *dst, const void *src, int 
> > sz) {
> >     __builtin_memcpy(dst, src, sz);
> > }
> > ```
> > https://godbolt.org/z/dcWxcK
> > 
> > and I think that's completely accidental because the C library function 
> > doesn't (and, as John pointed out earlier, the builtin is meant to act like 
> > the C function).
> > FWIW, this currently "works as you'd expect":
> >
> > ```
> > void f(__attribute__((address_space(32))) void *dst, const void *src, int 
> > sz) {
> >     __builtin_memcpy(dst, src, sz);
> > }
> > ```
> 
> The same is true even if you remove the `__builtin_` (and add a suitable 
> include), and that seems like a bug to me. It looks like we have special 
> logic that treats *all* builtins taking pointers as being overloaded on 
> address space, which seems wrong at least for lib builtins. C TR 18037:2008 
> is quite explicit about this. Section 5.1.4 says:
> 
> """
> The standard C library (ISO/IEC 9899:1999 clause 7 - Libraries) is unchanged; 
> the library's
> functions and objects continue to be declared only with regard to the generic 
> address space. One
> consequence is that pointers into named address spaces cannot be passed as 
> arguments to library
> functions except in the special case that the named address spaces are 
> subsets of the generic
> address space.
> """
> 
> We could retain that special rule for `__builtin_`-spelled variants of lib 
> builtins. If we do, then maybe we shouldn't be adding 
> `__builtin_memcpy_overloaded` at all and should only extend the behavior of 
> `__builtin_memcpy` to also propagate volatility (and add a new builtin for 
> the atomic case).
> 
> Regarding volatile, consider:
> 
> ```
> void maybe_volatile_memcpy(volatile void *dst, const volatile void *src, int 
> sz, _Bool is_volatile) {
>   if (is_volatile) {
> #ifdef __clang__
>     __builtin_memcpy_overloaded(dst, src, sz);
> #elif __GNUC__
>     // ...
> #else
>     // volatile char copy loop
> #endif
>   }
>   memcpy(dst, src, sz);
> }
> ```
> 
> With this patch, the above code will always perform a volatile `memcpy`. I 
> think that's a surprise. A call to `memcpy` should follow the C semantics, 
> even if we choose to change the semantics of `__builtin_memcpy`.
Yes, I believe that this is a pre-existing inconsistency with the 
non-`__builtin_` variants.


================
Comment at: clang/lib/Sema/SemaChecking.cpp:5609
+    return ExprError(Diag(TheCall->getBeginLoc(),
+                          PDiag(diag::err_atomic_op_needs_trivial_copy))
+                     << DstValTy << DstOp->getSourceRange());
----------------
rsmith wrote:
> Generally, I'm a little uncomfortable about producing an error if a type is 
> complete but allowing the construct if the type is incomplete -- that seems 
> like a situation where a warning would be more appropriate to me. It's 
> surprising and largely unprecedented that providing more information about a 
> type would change the program from valid to invalid.
> 
> Do we really need the protection of an error here rather than an 
> enabled-by-default warning? Moreover, don't we already have a warning for 
> `memcpy` of a non-trivially-copyable object somewhere? If not, then I think 
> we should add such a thing that also applies to the real `memcpy`, rather 
> than only warning on the builtin.
That rationale makes sense, but it's pre-existing behavior for atomic. I can 
change all of them in a follow-up if that's OK?

We don't have such a check for other builtins. I can do a second follow-up to 
then adopt these warnings for them too?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D79279/new/

https://reviews.llvm.org/D79279

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D79279: Add overloaded versions of builtin mem* functions

Reply via email to