On 9/24/12 11:06 PM, Stefan Teleman wrote:
> On Mon, Sep 24, 2012 at 7:48 PM, Liviu Nicoara <nikko...@hates.ms> wrote:
>> Stefan, was it your intention to completely eliminate all the race
>> conditions with this last patch? Is this what the tools showed in your
>> environment?
>>
>> https://issues.apache.org/jira/browse/STDCXX-1056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13461452#comment-13461452
>
> Yes, all the race conditions in std::numpunct<> and std::moneypunct<>.
> Not all the race conditions in stdcxx.
FWIW, I have analyzed the performance of Stefan's latest patch. The test
program is the same one used in the timings attached to the incident. The
library version was 4.2.1, the compiler gcc 4.7.1, tested with:
1. The simple, perfect-forwarding patch, with no caching, which does eliminate
the race conditions.
2. Stefan's latest patch.
3. A patch I created which truly, if crudely, eliminates all facet race
conditions by using a top-level lock for all facet operations. (*)
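For reference, the top-level locking in (3) amounts to something like the
sketch below. The class and member names are mine for illustration, and I am
using C++11 primitives for brevity; the actual patch works against the stdcxx
internals and its own synchronization wrappers.

```cpp
#include <mutex>
#include <string>

// Hypothetical facet-like class; names are illustrative, not stdcxx's.
class punct_facet {
public:
    char decimal_point () const {
        // Top-level lock: every accessor, read or write, takes the same
        // mutex, so no thread can observe partially initialized data.
        std::lock_guard<std::mutex> guard (mutex_);
        if (!initialized_) {
            // Lazily build the cached data while holding the lock.
            decimal_point_ = '.';
            grouping_      = "\3";
            initialized_   = true;
        }
        return decimal_point_;
    }

private:
    mutable std::mutex  mutex_;
    mutable bool        initialized_ = false;
    mutable char        decimal_point_ = ',';
    mutable std::string grouping_;
};
```

The price, of course, is that every read serializes behind the mutex, which is
exactly what the third set of numbers below shows.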
The numbers from the first test run are already available in the incident.
The numbers for the second test run were noticeably worse, which is expected
because of the additional synchronization. E.g., running the test against the
STDCXX locale database gives:
$ time ./t en_US.UTF-8 4 5000000
4, 5000000
real 0m19.210s
user 0m32.659s
sys 0m36.507s
where the first patch gives:
$ time ./t en_US.UTF-8 4 50000000
4, 50000000
real 0m7.961s
user 0m31.211s
sys 0m0.003s
Notice the number of loops: 10x larger in the second set of numbers, yet the
wall-clock time is well under half.
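Dividing wall-clock time by loop count gives a rough per-call cost; this is a
back-of-the-envelope calculation that ignores program startup, but the order
of magnitude is what matters:

```cpp
// Rough per-iteration cost from the two runs quoted above
// (wall-clock "real" time divided by the loop count).
inline double per_iter (double seconds, double loops) {
    return seconds / loops;
}

// Stefan's patch:   5,000,000 loops in 19.210 s -> ~3.8 us per call.
// No-cache patch:  50,000,000 loops in  7.961 s -> ~0.16 us per call.
const double patch2_cost = per_iter (19.210, 5000000.0);
const double patch1_cost = per_iter (7.961, 50000000.0);
```

That works out to roughly a 24x difference per call between the two.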
Now, for the third patch, the numbers are embarrassingly large. It practically
slows the program to a crawl because it never reads facet data without holding
the lock. This, I wager, eliminates all races in the numpunct/moneypunct
facets. See:
$ time ./t en_US.UTF-8 4 100000
4, 100000
real 0m24.316s
user 0m28.213s
sys 0m6.633s
Now that is a whopping performance hit, for a run of only 100,000 iterations.
I went back to look for an explanation of the gap between the numbers from
Stefan's better-performing patch and those from my lock-all patch. I believe
there are still places in Stefan's patch where the facet makes unguarded reads
of facet data, e.g.:
loc/_numpunct.h:
190 // returns a pointer to the facet's implementation data
191 // if it exists, 0 otherwise
192 const void* _C_data () const {
193 return _C_impsize ? _C_impdata
194 : _RWSTD_CONST_CAST (__rw_facet*, this)->_C_get_data ();
195 }
where _C_impsize and _C_impdata are read outside any lock scope. I.e., one
thread may read _C_impsize while another is still writing the _C_impdata
member variable.
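A minimal model of the hazard: the member names below mimic the quoted code,
but this is an illustration of the access pattern, not the actual __rw_facet
class, and slow_init() stands in for _C_get_data().

```cpp
#include <cstddef>
#include <mutex>

// Simplified model of the pattern quoted above: the size member is
// checked and the data pointer returned without any lock.
struct facet_model {
    const void *impdata = nullptr;  // written by the initializing thread
    std::size_t impsize = 0;        // written last, used as a "ready" flag

    // Racy reader, as in the quoted code: one thread can see a nonzero
    // impsize while another has not yet finished (or published) its
    // write of impdata -- the two reads are not ordered with the writes.
    const void* data_unguarded () const {
        return impsize ? impdata : slow_init ();
    }

    // One safe variant: do both the check and the reads of the two
    // members inside the same critical section the writer uses.
    const void* data_guarded () const {
        std::lock_guard<std::mutex> guard (mutex);
        return impsize ? impdata : slow_init ();
    }

    mutable std::mutex mutex;

    // Placeholder for the slow path (_C_get_data in the real code).
    const void* slow_init () const { return nullptr; }
};
```

The guarded variant is race-free but pays for the lock on every call, which is
the trade-off the timings above illustrate.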
I conclude, even from such simple testing, that any solution which attempts to
completely eliminate the race conditions in these facets will incur a runtime
penalty, with the safest (top-level locking) the worst by orders of magnitude
and any other solution somewhere in between.
Please let me know if you have any questions.
Liviu
(*) Will attach it as soon as we figure out how to undo the closing of the
issue.