Re: debug container mutex association

Jonathan Wakely Thu, 15 Sep 2016 01:52:18 -0700

On 14/09/16 22:17 +0200, François Dumont wrote:

On 14/09/2016 11:00, Jonathan Wakely wrote:
On 13/09/16 22:37 +0200, François Dumont wrote:
Hi
When I proposed to change std::hash for pointers my plan was touse this approach for the debug containers. So here is the patchleveraging on this technique to avoid going through _Hash_impl tohash address and get mutex index from it. I think it is obiousthat new access is faster so I didn't wrote a performance test todemonstrate it.
Is this actually a bottleneck that needs to be made faster?

I know it's nice if Debug Mode isn't very slow, but you've already
made it much better, and performance is not the primary goal of Debug
Mode.
Sure but my approach is that if I can make something faster then Ijust try. I considered that making this mode faster will allow itsusage in an even more extended way.
diff --git a/libstdc++-v3/config/abi/pre/gnu.verb/libstdc++-v3/config/abi/pre/gnu.ver
index 9b5bb23..c9a253e 100644
--- a/libstdc++-v3/config/abi/pre/gnu.ver
+++ b/libstdc++-v3/config/abi/pre/gnu.ver
@@ -1947,6 +1947,21 @@ GLIBCXX_3.4.23 {
   _ZNSsC[12]ERKSs[jmy]RKSaIcE;
   _ZNSbIwSt11char_traitsIwESaIwEEC[12]ERKS2_mRKS1_;

+    # __gnu_debug::_Safe_sequence_base
+ _ZN11__gnu_debug19_Safe_sequence_base18_M_detach_singularEm;
The 'm' here should be [jmy] because std::size_t mangles differently
on different targets.
+ _ZN11__gnu_debug19_Safe_sequence_base22_M_revalidate_singularEm;
+    _ZN11__gnu_debug19_Safe_sequence_base12_M_get_mutexEm;
+    _ZN11__gnu_debug19_Safe_sequence_base7_M_swapERS0_m;
+
+    # __gnu_debug::_Safe_iterator_base
+ _ZN11__gnu_debug19_Safe_iterator_base9_M_attachEPNS_19_Safe_sequence_baseEbm;
+    _ZN11__gnu_debug19_Safe_iterator_base9_M_detachEm;
+
+ # __gnu_debug::_Safe_unordered_container_base and_Safe_local_iterator_base
+ _ZN11__gnu_debug30_Safe_unordered_container_base7_M_swapERS0_m;
+ 
_ZN11__gnu_debug25_Safe_local_iterator_base9_M_attachEPNS_19_Safe_sequence_baseEbm;
+    _ZN11__gnu_debug25_Safe_local_iterator_base9_M_detachEm;
+
} GLIBCXX_3.4.22;
It's unfortunate to have to export and maintain new symbols when we
don't offer the same level of ABI guarantee for Debug Mode  :-(
Yes, I have been told that Debug ABI compatibility was not guaranteedbut at the same time if we don't do so abi-check is failing. Couldn'twe have a special section in gnu.ver for unversioned symbols likethose ?


I we could, but making them unversioned wouldn't really be useful - we
still need to keep them in the library forever.

@@ -174,6 +191,18 @@ namespace __gnu_debug
           const _Safe_iterator_base* __last)
 { return __first->_M_can_compare(*__last); }

+  // Compute power of 2 not greater than __n.
+  template<size_t __n>
+    struct _Lowest_power_of_two
+    {
+      static const size_t __val
+        = _Lowest_power_of_two< (__n >> 1) >::__val + 1;
+    };
+
+  template<>
+    struct _Lowest_power_of_two<1>
+    { static const size_t __val = 1; };


The lowest power of two not greater than 1 is 2^0, but this gives the
result 1.

Similarly, the lowest power of two not greater than 2 is 2^1, i.e. 1.

Indeed, when writing the test case I decided to return 1 rather than 0for the _Lowest_power_of_two<1> specialization.It gave a bettercontainer instance - mutex mapping. But I forgot to change its name.


And the name is misleading, it doesn't compute a power of two, it
computes the base-2 logarithm (then adds one), so it's a compile-time
version of floor(log2(n))+1.

@@ -123,6 +125,36 @@ namespace __gnu_debug
     template<typename _Predicate>
   void
   _M_transfer_from_if(_Safe_sequence& __from, _Predicate __pred);
+
+      static _GLIBCXX_CONSTEXPR std::size_t
+      _S_alignment() _GLIBCXX_NOEXCEPT
+      { return _Lowest_power_of_two<__alignof(_Sequence)>::__val; }


This function is called "alignment" but it returns log2(alignof(T))+1


Yes, I wasn't proud about the naming.


Does it need to be constexpr? Is that for std::array?

No, it is not used in std::array context. I just considered that itcould be constexpr.


OK. Sometimes that's useful, I agree. But it's never used in a
constexpr context, and it's only called by the library directly, so
there's no advantage here (it just makes it harder to write because it
can only use code that's valid in a constant expression).

Anyway, you can write it without a template, and still be constexpr:


You can get the same result without _Lowest_power_of_two:

static _GLIBCXX_CONSTEXPR size_t
_S_alignbits() _GLIBCXX_NOEXCEPT
{ return __builtin_ctz(__alignof(_Sequence)) + 1; }

But I'm not convinced the +1 is correct, see below.

@@ -46,15 +47,30 @@ namespace

/** Returns different instances of __mutex depending on thepassed address* in order to limit contention without breaking currentlibrary binary

  *  compatibility. */
+  const size_t mask = 0xf;
+
 __gnu_cxx::__mutex&
-  get_safe_base_mutex(void* __address)
+  get_mutex(size_t index)
 {
-    const size_t mask = 0xf;
   static __gnu_cxx::__mutex safe_base_mutex[mask + 1];
-    const size_t index = _Hash_impl::hash(__address) & mask;
   return safe_base_mutex[index];
 }


N.B. https://gcc.gnu.org/PR71312

Ok, debug mode is also impacted. Shouldn't the alignment be set on the__mutex type directly ?


No, definitely not. Apart from the fact that it would be an ABI
change, it would mean that struct X { __mutex mx; int i; } would not
place the int right next to the mutex, it would force it onto a
different cacheline. We want our arrays of mutexes to be on separate
cachelines, but most uses of __mutex are not in an array.

We probably can't fix this properly yet, because we don't have the
hardware_destructive_interference_size value. We could just make a
conservative estimate of 64 bytes though.


+  __gnu_cxx::__mutex&
+  get_safe_base_mutex(void* address)
+  {
+    const size_t index = _Hash_impl::hash(address) & mask;
+    return get_mutex(index);
+  }
+
+  __gnu_cxx::__mutex&
+  get_safe_base_mutex(void* address, std::size_t alignment)
+  {
+    const size_t index
+      = (reinterpret_cast<size_t>(address) >> alignment) & mask;
+    return get_mutex(index);
+  }


Since the "alignment" value (which is really log2(align)+1) is always
a positive value that means we always shift the address. But that
means we drop the least significant bit of all pointers, even when the
type has alignof(T)==1 and so all bits in the pointer are significant.
Is that intentional? Is the idea to only drop bits that are always
zero?


Yes, to make sure we get use of all available mutexes.


OK, so the +1 is not needed?


By changing the algorithm we use to map an address to a mutex do we
allow programs to get two different mutexes for the same object? That
can only happen when some objects are compiled with an older GCC and
so still use the old _M_get_mutex() function that doesn't take an
argument. We require all objects to be rebuilt with the same compiler
for Debug Mode (but maybe not if using __gnu_debug::vector etc.
explicitly in normal mode?) so that's not a big deal. But it's a pity
we have to keep the old symbols around if using them is unsafe.

Yes, it could happen even if very unlikely. I would also prefer to getrid of former symbols but can we ?


No. Objects built against an older libstdc++ would fail to run when
the dynamic linker finds a new libstdc++.so.

We can't remove exported symbols from the DSO.

 void
 swap_its(__gnu_debug::_Safe_sequence_base& __lhs,
      __gnu_debug::_Safe_iterator_base*& __lhs_its,
@@ -70,8 +86,8 @@ namespace
 }

 void
-  swap_seq(__gnu_debug::_Safe_sequence_base& __lhs,
-       __gnu_debug::_Safe_sequence_base& __rhs)
+  swap_seq_single(__gnu_debug::_Safe_sequence_base& __lhs,
+          __gnu_debug::_Safe_sequence_base& __rhs)
 {
   swap(__lhs._M_version, __rhs._M_version);
   swap_its(__lhs, __lhs._M_iterators,
@@ -81,10 +97,41 @@ namespace
 }

 void
-  swap_ucont(__gnu_debug::_Safe_unordered_container_base& __lhs,
-        __gnu_debug::_Safe_unordered_container_base& __rhs)
+  lock_and_run(__gnu_cxx::__mutex& lhs_mutex,
+           __gnu_cxx::__mutex& rhs_mutex,
+           std::function<void()> action)
Do all calls to this fit into the std::function small-object buffer,
so we don't need to allocate memory for it?
I didn't consider this. I thought it was a nice way to avoid makinglock_and_run a template function to take the lambda.
Reading this I realize that we could perhaps acheive the same withouttouching as much code. __alignof of container types will always givethe same value on a give target no ? Does it depends on the type usedto instantiate the container ? Isn't there some macro giving thetarget alignment that I could use ?


Alignment isn't a single value for a target, e.g. for some targets
alignof(void*) != alignof(int), so which "target alignment" do you
want?

It seems likely that all the debug containers will be aligned to at
least alignof(void*), because of the _Safe_base::_M_sequence pointer
member. It should be OK to assume alignof(*this) == alignof(void*).

Re: debug container mutex association

Reply via email to