STDCXX-1056 : numpunct fix

2012-09-19 Thread Stefan Teleman
This is a proposed fix for the numpunct facet for stdcxx-1056:

0.  Number of reported race conditions is now 0 (zero).

1.  No memory leaks in stdcxx (there are memory leaks reported in either
libc or glibc, but there's nothing we can do about these anyway).

2.  This fix preserves perfect forwarding in the _numpunct.h header file.

3.  This fix eliminates code from facet.cpp and locale_body.cpp which
was creating unnecessary overhead, with the potential of causing
memory corruption, while providing no discernible benefit.

More specifically:

It is not true that there was no eviction policy for cached locales or
facets in stdcxx. Not only did cache eviction code exist (and it still
exists today), but cache cleanups and resizing were performed
periodically, either when an object's reference count dropped to 0
(zero), or whenever the number of cached objects fell below
sizeof(cache) / 2.

In the latter case, both the facet cache and the locale cache performed
a new allocation of the cache array, followed by a memcpy and a delete[]
of the old cache array.

First, the default size of the facet and locale caches was too small:
it was set to 8. I raised it to 32. A direct consequence of this
insufficient default size of 8 was that the cache had to resize itself
very soon after program startup. This resize operation consists of
allocating memory for a new cache, copying the existing cached objects
from the old cache to the new one, and then delete[]-ing the old cache.

This is a first unnecessary overhead.
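The grow-by-copy sequence described above can be sketched roughly as
follows (the names and layout here are hypothetical, not the actual
stdcxx internals):

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical sketch of a grow-by-copy facet cache; every insertion
// past `capacity` pays for a new allocation, a copy of every existing
// pointer, and a delete[] of the old buffer -- the overhead at issue.
struct facet_cache {
    const void** slots;      // array of cached facet pointers
    std::size_t  size;       // number of live entries
    std::size_t  capacity;   // allocated slots

    explicit facet_cache (std::size_t cap = 8)
        : slots (new const void*[cap]), size (0), capacity (cap) { }

    ~facet_cache () { delete[] slots; }

    void insert (const void* f) {
        if (size == capacity) {
            const std::size_t new_cap = capacity * 2;
            const void** tmp = new const void*[new_cap];
            std::memcpy (tmp, slots, size * sizeof *slots);
            delete[] slots;          // old buffer thrown away
            slots    = tmp;
            capacity = new_cap;
        }
        slots [size++] = f;
    }
};
```

With a default capacity of 8, a program touching a dozen facets pays
this price almost immediately after startup.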

Second, and as I mentioned above, whenever the number of cached objects
fell below sizeof(cache) / 2, the cache resized itself by performing
the same sequence of operations as described above.

This is a second unnecessary overhead.
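The shrink-on-half-empty policy can be sketched the same way (again
with illustrative names, not the stdcxx sources): every removal that
drops the occupancy below half the capacity pays the same
allocate/copy/delete[] price:

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical sketch: a cache that halves its buffer whenever
// occupancy drops below capacity / 2, churning memory on the way down
// just as the growth path churns it on the way up.
struct shrinking_cache {
    const void** slots;
    std::size_t  size;
    std::size_t  capacity;

    explicit shrinking_cache (std::size_t cap = 8)
        : slots (new const void*[cap]), size (0), capacity (cap) { }

    ~shrinking_cache () { delete[] slots; }

    void insert (const void* f) { slots [size++] = f; }  // capacity assumed

    void remove (std::size_t i) {
        slots [i] = slots [--size];          // drop entry i
        if (capacity > 8 && size < capacity / 2) {
            const std::size_t new_cap = capacity / 2;
            const void** tmp = new const void*[new_cap];
            std::memcpy (tmp, slots, size * sizeof *slots);
            delete[] slots;                  // churn on every shrink
            slots    = tmp;
            capacity = new_cap;
        }
    }
};
```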

Third, cached objects were automatically evicted whenever their reference
count dropped to 0 (zero). This eviction policy has two consequences:
if the program needs to re-use an object (facet or locale) which has
been evicted and subsequently destroyed, that object then needs to be
constructed again later on, and re-inserted into the cache. This, in
turn, can trigger a cache resize, followed by copying and delete[] of
the old cache buffer.

Object eviction followed by destruction followed by reconstruction is
a third unnecessary overhead. Re-inserting a re-constructed object into
the cache, followed by a potential cache resize involving allocation of
a new buffer, copying pointers from the old cache to the new cache,
followed by delete[] of the old cache, is a fourth unnecessary overhead.

Real-life programs tend to reuse the locales and/or facets they have
created. There is no point in destroying and evicting these objects
simply because there may be periods when an object isn't referenced.
The object is likely to be needed again later on.

The fix proposed here eliminates the cache eviction and object destruction
policy completely. Once created, objects remain in the cache, even though
they may reside in the cache with no references. This eliminates the
cache resize / copy / delete[] overhead. It also eliminates the overhead
of re-constructing an evicted / destroyed object, if it is needed again
later.
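The proposed keep-forever policy might look roughly like this (a
minimal sketch with made-up names, not the actual patch): a slot whose
reference count drops to zero simply stays in place, so re-acquiring
the object is a cache hit instead of a reconstruction:

```cpp
#include <cstddef>

// Illustrative facet stand-in: an id plus a reference count.
struct cached_facet {
    int      id;
    unsigned refs;
};

// Hypothetical no-eviction cache: entries are constructed once and
// kept for the lifetime of the cache, even at refs == 0.
struct keep_cache {
    static const std::size_t cap = 32;   // raised default size
    cached_facet slots [cap];
    std::size_t  size;

    keep_cache () : size (0) { }

    cached_facet* acquire (int id) {
        for (std::size_t i = 0; i != size; ++i)
            if (slots [i].id == id) {    // hit, even with refs == 0
                ++slots [i].refs;
                return &slots [i];
            }
        slots [size].id   = id;          // construct once, keep forever
        slots [size].refs = 1;
        return &slots [size++];
    }

    void release (cached_facet* f) { --f->refs; }   // no eviction
};
```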

4.  Tests and Analysis Results:

4.1. SunPro 12.3 / Solaris / SPARC / Race Conditions Test:


http://s247136804.onlinehome.us/stdcxx-1056-20120919/22.locale.numpunct.mt.sunpro.solaris-sparc.datarace.er.html/index.html

4.2. SunPro 12.3 / Solaris / SPARC / Heap and Memory Leaks Test:


http://s247136804.onlinehome.us/stdcxx-1056-20120919/22.locale.numpunct.mt.sunpro.solaris-sparc.heapcheck.er.html/index.html

4.3. SunPro 12.3 / Linux / Intel / Race Conditions Test:


http://s247136804.onlinehome.us/stdcxx-1056-20120919/22.locale.numpunct.mt.sunpro.linux-intel.datarace.er.html/index.html

4.4. SunPro 12.3 / Linux / Intel / Heap and Memory Leaks Test:


http://s247136804.onlinehome.us/stdcxx-1056-20120919/22.locale.numpunct.mt.sunpro.linux-intel.heapcheck.er.html/index.html

4.5. Intel 2013 / Linux / Intel / Race Conditions Test:


http://s247136804.onlinehome.us/stdcxx-1056-20120919/22.locale.numpunct.mt.intel.linux.datarace.r007ti3.inspxez

4.6. Intel 2013 / Linux / Intel / Heap and Memory Leaks Test:


http://s247136804.onlinehome.us/stdcxx-1056-20120919/22.locale.numpunct.mt.intel.linux.heapcheck.r008mi1.inspxez

5.  Source code for this fix:

http://s247136804.onlinehome.us/stdcxx-1056-20120919/_numpunct.h
http://s247136804.onlinehome.us/stdcxx-1056-20120919/facet.cpp
http://s247136804.onlinehome.us/stdcxx-1056-20120919/locale_body.cpp
http://s247136804.onlinehome.us/stdcxx-1056-20120919/punct.cpp

These files are based on stdcxx 4.2.1.

-- 
Stefan Teleman
KDE e.V.
stefan.tele...@gmail.com


Re: STDCXX-1056 : numpunct fix

2012-09-19 Thread Stefan Teleman
On Wed, Sep 19, 2012 at 8:51 PM, Liviu Nicoara nikko...@hates.ms wrote:

 I think you are referring to `live' cache objects and the code which
 specifically adjusts the size of the buffer according to the number of
 `live' locales and/or facets in it. In that respect I would not call that
 eviction because locales and facets with non-zero reference counters are
 never evicted.

 But anyhoo, this is semantics. Bottom line is the locale/facet buffer
 management code follows a principle of economy.

Yes it does. But we have to choose between economy and efficiency. To
clarify: the overhead of keeping unused pointers in the cache is
sizeof(void*) times the number of unused slots. This is 2012. Even
an entry-level Android cell phone comes with 1GB of system memory. If we
want to talk about embedded systems, where memory constraints are more
stringent than on cell phones, then we're not talking about Apache stdcxx
anymore, or about any other open source implementation of the C++
Standard Library. Those systems use Embedded C++, which is a different
animal altogether: no exception support, no RTTI. See, for example,
Green Hills: http://www.ghs.com/ec++.html. And even they have become
more relaxed about memory constraints. They use Boost.

Bottom line: so what if 16 of the 32 pointer slots in this cache
never get used? The maximum amount of wasted memory for those 16
pointers is 128 bytes, on a 64-bit machine with 8-byte pointers.
Can we live with that in 2012, a year when a $500 laptop comes with
4GB of RAM out of the box? I would pick 128 bytes of allocated but unused
memory over random and entirely avoidable memory churn any day.
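The arithmetic behind the 128-byte figure, made explicit (a trivial
sketch, not part of the patch):

```cpp
#include <cstddef>

// Wasted bytes for a given number of unused pointer slots:
// on an LP64 machine sizeof(void*) is 8, so 16 slots waste 128 bytes.
std::size_t wasted_bytes (std::size_t unused_slots) {
    return unused_slots * sizeof (void*);
}
```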

 The optimal number is subject to debate. Probably Martin can give an insight
 into the reasons for that number. Why did you pick 32 (or is it 64 in your
 patch) and not any other? Is it something based on your experience as a user
 or programmer?

Based on two things:

1. There are, apparently, 30 top languages spoken on this planet:

http://www.vistawide.com/languages/top_30_languages.htm

2. I've written locale-aware software back in my days on Wall Street.
The maximum number of locales I had to support was 14.

max(14, 30) is 30, so I made it 32 because it's a power of 2.
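Rounding a count up to the next power of two is a one-liner; a sketch
(not something from the patch itself):

```cpp
// Round n up to the nearest power of two by doubling:
// 30 -> 32, 14 -> 16, 32 -> 32.
unsigned next_pow2 (unsigned n) {
    unsigned p = 1;
    while (p < n)
        p *= 2;
    return p;
}
```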

 A negligible overhead, IMO. The benefits of maintaining a small memory
 footprint may be important for some environments. As useful as principles
 may be, see above.

Small and negligible in theory. In practice, when the cache starts
resizing itself by allocating new memory, copying, delete[]-ing and -
I forgot to mention this in my initial post - finishing it all up with
a call to qsort(3C), it's not that negligible anymore. It doesn't just
happen once. It happens every time the cache gets "anxious" (for
reasons mentioned in my previous email) and wants to resize itself.
Which triggers the following question in my mind: why are we even
causing all this memory churn in the first place? Because we saved 128
bytes (or 64 bytes on a 32-bit machine, which is what most cell
phones/tablets are these days)?

My goal: I would be very happy if any application using Apache stdcxx
would reach its peak instantiation level of localization (read: max
number of locales and facets instantiated and cached, for the
application's particular use case), and would then stabilize at that
level *without* having to resize and re-sort the cache, *ever*. That
is a locale cache I can love. I love binary searches on sorted
containers. Wrecking the container with insertions or deletions, and
then having to re-sort it again, not so much. Especially when I can't
figure out why we're doing it in the first place.
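The steady-state lookup described here is just a binary search over a
sorted array that never needs re-sorting; a minimal sketch using
std::lower_bound (illustrative, not the stdcxx code, with int keys
standing in for facet ids):

```cpp
#include <algorithm>

// Binary search over a sorted cache of n keys; returns the slot index
// of `key`, or -1 if it is not cached. As long as nothing is inserted
// or erased, the array stays sorted and no qsort() is ever needed.
int find_slot (const int* cache, int n, int key) {
    const int* end = cache + n;
    const int* it  = std::lower_bound (cache, end, key);
    return (it != end && *it == key) ? int (it - cache) : -1;
}
```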

 In this respect you could call every memory allocation and de-allocation
 an overhead. Please keep in mind that this resembles the operations
 performed for any sequence container; how likely is it for a program to
 have more locale/facet creations and destructions than string or vector
 mutations?

There's one fundamental difference: the non-sorted STL containers give
the developer the opportunity to size them up front, beyond the
implementation-specific default. Any application developer worth their
salt would perform some initial size optimization for these types of
containers. If I know that my std::vector will end up containing 5000
things, I would never leave it at its default capacity and let it
reallocate its way up. Do that, and you'll get flamed at code review.
As for the sorted associative containers, that's one of the major
gripes against them: whenever they have to grow, or rebalance, they get
expensive. But we're not using a sorted associative container here.
It's just a plain ol' C array.
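The sizing-up-front point, sketched with std::vector, whose reserve()
makes the capacity explicit (a hypothetical example, not code from the
library):

```cpp
#include <cstddef>
#include <vector>

// One reserve() call up front means a single allocation; none of the
// intermediate grow/copy/free cycles a default-constructed vector
// would go through on its way to 5000 elements.
std::vector<int> build (std::size_t n) {
    std::vector<int> v;
    v.reserve (n);               // one allocation, sized to fit
    for (std::size_t i = 0; i != n; ++i)
        v.push_back (int (i));   // no reallocation from here on
    return v;
}
```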

 Could you please elaborate a bit on this? Is this your opinion based on your
 user and/or programmer experience?

See above about the top 30 languages spoken in the world.

 Hey Stefan, are the above also timing the changes?

Nah, I didn't bother with the timings - yet - for a very simple
reason: in order to use