I recall discussing this when you opened the defect but I'm not
sure what the outcome of the discussion was.

I looked at the code some more just now and I agree with you that
(at least the numpunct) facet isn't thread safe. The premise was
that we didn't need any locking because the facet never changes
after it's initialized, and the same data member can safely be
assigned multiple times.

The problem is that the string data members must be initialized
before they can be assigned, and assignment to the same member
must be guarded against concurrent accesses, there seems to be
no mechanism that guarantees that this will be true when the
same facet is used from within multiple threads.

For example, the function below tries to avoid re-initialization
of _C_grouping but the |= expression can be reordered either by
the compiler or by the hardware to complete before the assignment
to the member. The function also fails to protect the assignment
to the member from taking place simultaneously in multiple threads.

  template <class _CharT>
  inline string numpunct<_CharT>::grouping () const
  {
      if (!(_C_flags & _RW::__rw_gr)) {

          numpunct* const __self = _RWSTD_CONST_CAST (numpunct*, this);

          // [try to] get the grouping first (may throw)
          // then set a flag to avoid future initializations
          __self->_C_grouping  = do_grouping ();
          __self->_C_flags    |= _RW::__rw_gr;
      }

      return _C_grouping;
  }

We need a fix but I don't think the one in the patch attached
to the issue is a good solution. Locking all objects of the
template specialization is far too coarse. Even having a lock
per object would kill iostream performance. Creating a deep
copy on return from each of the functions would also slow
things down noticeably.

The efficient solution must avoid locking in the common case
(i.e., after the facet has been initialized). It can avoid
locking around the POD data members altogether since those can
safely be assigned multiple times (on common hardware), but it
needs to ensure that the string data members are initialized
before they are assigned to and are assigned at most once
simultaneously. If we introduce a lock, it must be per object
and not per type.

With that, I would expect the following function to fix the
problem in the one above. Let me know what you think.

  template <class _CharT>
  inline string numpunct<_CharT>::grouping () const
  {
      if (!(_C_flags & _RW::__rw_gr)) {

          numpunct* const __self = _RWSTD_CONST_CAST (numpunct*, this);

         _RWSTD_MT_GUARD (_C_mutex);

         // [try to] get the grouping first (may throw)
          // then set a flag to avoid future initializations
          __self->_C_grouping  = do_grouping ();
          __self->_C_flags    |= _RW::__rw_gr;
      }

      return _C_grouping;
  }

Martin

On 09/04/2012 07:40 PM, Stefan Teleman wrote:
On Tue, Sep 4, 2012 at 9:15 PM, Liviu Nicoara<nikko...@hates.ms>  wrote:

That is good, right?

Yes, it is good. :-)


Could not get an Intel build off the ground on the account of the LIMITS.cpp 
test not running to completion because of a possible compiler bug. I posted 
earlier an account of it. Do you have a support account that allows posting bug 
reports for Intel's C++ compiler?

Oh, yes I remember that now, it happened to me too. I worked around it
by copying the LIMITS executable from a gcc build.

I don't have a support contract with Intel - I'm using the free
compilers at home. I'm guessing Intel must have an open forum for
Intel developers using their compilers, so that might be a way to
report the bug. But I haven't looked for it.



I did not have access to a Solaris box since I left Rogue Wave. Does Sun/Oracle 
have any program that would allow one to test drive their compiler on a shared 
box somewhere? I vaguely remember that HP had something like that a while ago.

I know Sun had such a program. Oracle I don't really know but my
educated guess is "not a chance". ;-)

One way of testing with SunPro is to download the free version of
SunPro from Oracle, and use it on Linux (yes, SunPro exists for Linux
as well). It works very well on OpenSUSE and Ubuntu (tested and used
on both myself, debugged stdcxx on OpenSUSE with SunPro 12.2. Doesn't
work at all on Fedora 17 because of the TLS glibc bug (however it used
to on earlier Fedora releases).

And now for the latest about 1056:

I managed (after many, MANY tries) to get a run of
22.locale.numpunct.mt on Solaris SPARC 32-bit build (at work), and
without the thread-safety patches applied, and I hit the jackpot:

# INFO (S1) (10 lines):
# TEXT:
# COMPILER: SunPro, __SUNPRO_CC = 0x5100
# ENVIRONMENT: sparc-v8 running sunos-5.11
# FILE: 22.locale.numpunct.mt.cpp
# COMPILED: Sep  4 2012, 16:10:03
# COMMENT: thread safety
############################################################

# CLAUSE: lib.locale.numpunct

# NOTE (S2) (5 lines):
# TEXT: executing "/usr/bin/locale -a>  /var/tmp/tmpfile-MWa4bQ"
# CLAUSE: lib.locale.numpunct
# FILE: process.cpp
# LINE: 276

# INFO (S1) (3 lines):
# TEXT: testing std::numpunct<charT>  with 4 threads, 1000 iterations
each, in 32 locales { "C" "POSIX" "af_ZA.UTF-8" "ar_AE.UTF-8"
"ar_BH.UTF-8" "ar_DZ.UTF-8" "ar_EG.ISO8859-6" "ar_EG.UTF-8"
"ar_IQ.UTF-8" "ar_JO.UTF-8" "ar_KW.UTF-8" "ar_LY.UTF-8" "ar_MA.UTF-8"
"ar_OM.UTF-8" "ar_QA.UTF-8" "ar_SA.UTF-8" "ar_TN.UTF-8" "ar_YE.UTF-8"
"as_IN.UTF-8" "az_AZ.UTF-8" "be_BY.UTF-8" "bg_BG.ISO8859-5"
"bg_BG.UTF-8" "bn_IN.UTF-8" "bs_BA.ISO8859-2" "bs_BA.UTF-8"
"ca_ES.ISO8859-1" "ca_ES.ISO8859-15" "ca_ES.UTF-8" "cs_CZ.ISO8859-2"
"cs_CZ.UTF-8" "cs_CZ.UTF-8@euro" }
# CLAUSE: lib.locale.numpunct

[ ... snip ... ]

# INFO (S1) (4 lines):
# TEXT: creating a thread pool with 4 threads
# CLAUSE: lib.locale.numpunct
# LINE: 548

Abort (core dump)

/builds/steleman/userland-stdcxx-upstream-thread-safety/components/stdcxx/stdcxx-4.2.1/tests/localization/22.locale.numpunct.mt.cpp:140:
Assertion '0 == rw_strncmp (grp.c_str (), data.grouping_.c_str ())'
failed.
/builds/steleman/userland-stdcxx-upstream-thread-safety/components/stdcxx/stdcxx-4.2.1/build/lib/libstdcxx4.so.4.2.1'__1cE__rwQ__rw_assert_fail6Fpkc2i2_v_+0x78
[0xff249af8]
/builds/steleman/userland-stdcxx-upstream-thread-safety/components/stdcxx/stdcxx-4.2.1/build/tests/22.locale.numpunct.mt'thread_func+0x500
[0x1b050]
/lib/libc.so.1'_lwp_start+0x0 [0xf7b553f0]

Bingo!

At line 140 in 22.locale.numpunct.mt.cpp, we see:

  RW_ASSERT (0 == rw_strncmp (grp.c_str (),
                                         data.grouping_.c_str ()));

What this means:

Between the time std::string grp was initialized at line 133:

const std::string grp = np.grouping ();

and the time we hit line 140 (the RW_ASSERT), another thread modified
the std::string _C_grouping member of class

template<class _CharT>  class numpunct : public _RW::__rw_facet.

And because this is the non-patched version of _numpunct.h, the
variable std::string grp did not really return a deep copy of
_C_grouping, but a shared handle to its underlying _CharT buffer. Nor
was the initialization of _C_grouping mutex-protected inside the
member function

template<class _CharT>
inline string numpunct<_CharT>::grouping () const.

This shared buffer was subsequently modified by another thread, in
another locale, while this thread was convinced, all along, that it
was holding a deep copy of the variable std::string grp, in the locale
it wanted. Hence, the assertion on the strcmp(3C) mismatch.

--Stefan


Reply via email to