On 9/24/12 11:06 PM, Stefan Teleman wrote:
On Mon, Sep 24, 2012 at 7:48 PM, Liviu Nicoara <nikko...@hates.ms> wrote:

Stefan, was it your intention to completely eliminate all the race
conditions with this last patch? Is this what the tools showed in your
environment?

https://issues.apache.org/jira/browse/STDCXX-1056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13461452#comment-13461452

Yes, all the race conditions in std::numpunct<> and std::moneypunct<>.
Not all the race conditions in stdcxx.

FWIW, I have analyzed the performance of the latest patch from Stefan. The test program is the same test program used in the timings attachments to the incident. The version of the library was 4.2.1, compiler gcc 4.7.1, with:

1. The simple, perfect forwarding patch, with no caching, which does eliminate race conditions.
2. Stefan's latest patch
3. A patch I created which truly, if crudely, eliminates all facet race conditions by using a top-level lock for all facet operations(*)

The numbers from the first test run are already available in the incident.

The numbers for the second test run were a bit bigger, which is expected because of the additional synchronization. E.g., I have found that the test, run against the STDCXX locale database, gives:

$ time ./t en_US.UTF-8 4 5000000
4, 5000000

real    0m19.210s
user    0m32.659s
sys     0m36.507s

where the first patch gives:

$ time ./t en_US.UTF-8 4 50000000; done
4, 50000000

real    0m7.961s
user    0m31.211s
sys     0m0.003s

Notice the number of loops, 10x larger in the second set of number.

Now, for the third patch, the numbers are embarrassingly large. It practically slows the program to a crawl because it does not do any reads of facet data without holding the lock. This, I wager, eliminates all races in the numpunct/moneypunct facet. See:

$ time ./t en_US.UTF-8 4 100000
4, 100000

real    0m24.316s
user    0m28.213s
sys     0m6.633s

Now that is a whooping performance hit.

I went back to see an explanation between the numbers I see with Stefan's more performing patch, and my lock-all patch. I believe that there are still places in Stefan's patch where the facet makes unguarded reads of facet data, e.g.:

loc/_numpunct.h:
    190     // returns a pointer to the facet's implementation data
    191     // if it exists, 0 otherwise
    192     const void* _C_data () const {
    193         return _C_impsize ? _C_impdata
    194             : _RWSTD_CONST_CAST (__rw_facet*, this)->_C_get_data ();
    195     }

where _C_impsize/_C_impdata are read outside of a lock scope, etc. I.e., a thread may read the _C_impsize while, say, another is still writing the _C_impdata member variable.

I conclude, even with such simple testing, that any solution that attempts to completely eliminate the race conditions in these facets will incur a penalty, with the safest (top-level locking) the worst by orders of magnitude and any other solution somewhere in between.

Please let me know if you have any questions.

Liviu

(*) Will attach it as soon as we figure out how to undo the closing of the 
issue.

Reply via email to