[ 
https://issues.apache.org/jira/browse/STDCXX-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12646021#action_12646021
 ] 

sebor edited comment on STDCXX-914 at 11/8/08 4:21 PM:
--------------------------------------------------------------

Here's a superficially tested patch to optimize 
{{\_\_rw_locale::_C_is_managed()}} and {{\_\_rw_locale::_C_manage()}} in 
{{[src/locale_body.cpp|http://svn.eu.apache.org/viewvc/stdcxx/trunk/src/locale_body.cpp?revision=651334&view=markup]}}.
 It cuts the run time of the test case by about 36% (down from 18.905s 
to 12.147s on an Intel Core 2 6600 running at 2.40GHz) by having 
{{\_\_rw_locale::_C_is_managed()}} avoid expensive tests for named facets in the 
"C" locale and by using a more efficient way to detect the classic locale in 
{{\_\_rw_locale::_C_manage()}} when invoked from {{locale::~locale()}}.
\\
\\
{noformat}
Index: src/locale_body.cpp
===================================================================
--- src/locale_body.cpp (revision 712407)
+++ src/locale_body.cpp (working copy)
@@ -859,7 +859,22 @@
         return tmp;
     }
 
+    if (plocale && plocale == classic) {
+        // optimize the "destruction" of the classic C locale
+        // the object is never destroyed and its reference count
+        // never drops to 0
+        _RWSTD_ASSERT (__rw_is_C (locname));
+        _RWSTD_ASSERT (__rw_is_C (plocale->_C_name));
 
+        const size_t ref =
+            _RWSTD_ATOMIC_PREDECREMENT (plocale->_C_ref, false);
+
+        _RWSTD_ASSERT (ref + 1U != 0);
+        _RWSTD_UNUSED (ref);
+
+        return 0;
+    }
+
     // re-entrant to protect static local data structures
     // (not the locales themselves)
     _RWSTD_MT_STATIC_GUARD (_RW::__rw_locale);
@@ -1066,6 +1081,15 @@
             return false;
         }
 
+        _RWSTD_ASSERT (0 == _C_usr_facets);
+
+        if (_C_all == _C_std_facet_bits && 0 == _C_byname_facet_bits) {
+            // optimized for the C locale
+            _RWSTD_ASSERT (__rw_is_C (_C_name));
+
+            return true;
+        }
+
         // unless all facets in the same category come either from
         // the C locale or from some named locale the locale object
         // containing the facets is not managed (this test doesn't
{noformat}
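
To illustrate the idea behind the first hunk outside of stdcxx, here is a minimal, self-contained sketch. It is not the library's code: {{Body}}, {{classic_body()}}, {{acquire()}}, {{release()}}, and the {{registry}} map are hypothetical stand-ins, and C++11 {{std::atomic}} and {{std::mutex}} are assumed in place of the {{_RWSTD_ATOMIC_PREDECREMENT}} and {{_RWSTD_MT_STATIC_GUARD}} macros. The point it shows is that a release of the never-destroyed classic object can be detected by pointer identity and handled with a single atomic decrement, before the lock guarding the shared bookkeeping is taken.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <map>
#include <mutex>
#include <string>

// Hypothetical stand-in for a reference-counted locale body.
struct Body {
    std::string              name;
    std::atomic<std::size_t> ref;

    // every body starts out with one reference held by its owner
    explicit Body (const std::string &n): name (n), ref (1) { }
};

// The classic body is created once and intentionally never destroyed;
// the reference it holds on itself keeps its count from reaching 0.
Body* classic_body ()
{
    static Body* const classic = new Body ("C");
    return classic;
}

std::mutex                   registry_mutex;
std::map<std::string, Body*> registry;   // guarded by registry_mutex

Body* acquire (Body *body)
{
    ++body->ref;
    return body;
}

// Drops one reference. Returns true when the lock-free fast path
// for the classic body was taken, false for the general slow path.
bool release (Body *body)
{
    if (body && body == classic_body ()) {
        // fast path: no lock, no name lookup, no destruction
        const std::size_t ref = --body->ref;

        assert (ref + 1U != 0);   // the count must not have wrapped around
        (void)ref;

        return true;
    }

    // slow path: serialize access to the shared registry
    std::lock_guard<std::mutex> guard (registry_mutex);

    if (body && 0 == --body->ref) {
        registry.erase (body->name);
        delete body;
    }

    return false;
}
```

In this sketch the saving comes from ordering alone: the identity check runs before the guard is constructed, so copies of the classic object never contend for {{registry_mutex}} when they are destroyed.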

With the patch applied, the top 12 list looks like so:
\\
\\
{noformat}
Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
 16.70      0.97     0.97 50000000     0.00     0.00  __rw::__rw_locale::_C_manage(__rw::__rw_locale*, char const*)
 12.57      1.70     0.73 10000000     0.00     0.00  std::istream& std::operator>>(std::istream&, std::string&)
  8.43      2.19     0.49 10000000     0.00     0.00  std::num_put::_C_put(std::ostreambuf_iterator, std::ios_base&, char, int, void const*) const
  7.06      2.60     0.41 10000001     0.00     0.00  std::string::operator=(std::string const&)
  6.45      2.98     0.38 10000000     0.00     0.00  std::string lex_cast<std::string, long>(long const&)
  5.34      3.29     0.31 10000000     0.00     0.00  __rw::__rw_dtoa(char*, unsigned long, unsigned int)
  4.65      3.56     0.27                             main
  4.30      3.81     0.25 10000000     0.00     0.00  std::ostream& __rw::__rw_insert(std::ostream&, long)
  3.27      4.00     0.19 10000000     0.00     0.00  std::locale::locale(std::locale const&)
  3.01      4.17     0.18 10000000     0.00     0.00  std::stringbuf::str(char const*, unsigned long)
  2.75      4.33     0.16 30000000     0.00     0.00  __rw::__rw_locale::_C_is_managed(int) const
  2.75      4.49     0.16 30000000     0.00     0.00  std::locale::~locale()
{noformat}



  
> sstream ctors inefficient in reentrant modes
> --------------------------------------------
>
>                 Key: STDCXX-914
>                 URL: https://issues.apache.org/jira/browse/STDCXX-914
>             Project: C++ Standard Library
>          Issue Type: Improvement
>          Components: 27. Input/Output
>    Affects Versions: 4.1.2, 4.1.3, 4.1.4, 4.2.0, 4.2.1
>            Reporter: Martin Sebor
>            Priority: Critical
>             Fix For: 4.2.2
>
>         Attachments: stdcxx-914-gprof-gcc-4.1.2-12D.txt, 
> stdcxx-914-gprof-gcc-4.3.0-12S.txt
>
>   Original Estimate: 12h
>          Time Spent: 2.5h
>  Remaining Estimate: 9.5h
>
> As discussed in this [thread|http://markmail.org/message/hqlsw5dq23gx7d4o] 
> stream ctors in thread-safe builds are inefficient due to the initialization 
> of the mutex data member in every stream, even in those that never use it. As 
> soon as binary compatibility rules permit it we should remove the mutex 
> and/or defer its initialization until it's needed. It might be possible to 
> implement the deferred initialization as early as 4.2.2, or maybe 4.3. 
> Complete removal will need to wait until 5.0.
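
The deferred initialization mentioned in the issue description could be sketched roughly as follows. This is only an illustration under stated assumptions, not the stdcxx implementation: the {{LazyMutex}} class and its members are hypothetical, and C++11 {{std::atomic}} and {{std::mutex}} are assumed in place of the library's own synchronization primitives. The object costs one null pointer store to construct, and the mutex itself is created only when some thread first asks for it.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>

// Hypothetical wrapper that creates its mutex on first use.
class LazyMutex {
    std::atomic<std::mutex*> impl_;

public:
    LazyMutex (): impl_ (nullptr) { }   // cheap: no mutex is created here

    ~LazyMutex () { delete impl_.load (); }

    LazyMutex (const LazyMutex&) = delete;
    LazyMutex& operator= (const LazyMutex&) = delete;

    // true once some thread has requested the mutex
    bool initialized () const { return impl_.load () != nullptr; }

    std::mutex& get ()
    {
        std::mutex* m = impl_.load (std::memory_order_acquire);

        if (!m) {
            std::mutex* const fresh = new std::mutex;

            // publish the new mutex unless another thread got there first
            if (impl_.compare_exchange_strong (m, fresh,
                                               std::memory_order_acq_rel,
                                               std::memory_order_acquire)) {
                m = fresh;
            }
            else {
                // lost the race: m now points at the winner's mutex
                delete fresh;
            }
        }

        assert (m != nullptr);
        return *m;
    }

    void lock ()   { get ().lock (); }
    void unlock () { get ().unlock (); }
};
```

A stream that is never shared between threads would, under this scheme, never pay for creating the mutex. The trade-off is a heap allocation and an atomic load on the paths that do lock.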

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
