On 10/03/12 11:10, Martin Sebor wrote:
[...]
I was just thinking of a few simple loops along the lines of:

   void* thread_func (void*) {
        for (int i = 0; i < N; ++i)
           test 1: do some simple stuff inline
           test 2: call a virtual function to do the same stuff
           test 3: lock and unlock a mutex and do the same stuff
   }

Test 1 should be the fastest and test 3 the slowest. This should
hold regardless of what "simple stuff" is (eventually, even when
it's getting numpunct::grouping() data).

tl;dr: removing the facet data cache is a priority. All else can be put on the back-burner.

Conflicting test results aside, there is still the issue of the incorrect handling of the cached data in the facet. I don't think there is any disagreement on that point. Given that std::string is moving toward dropping the handle-body implementation, simply getting rid of the cache is a step in the same direction.

I think we should preserve the lock-free read of the facet data as a benign race, but making the race benign is perhaps more complicated than previously suggested.

As a reminder, the core of the facet access and initialization code essentially looks like this (pseudocode-ish):


// facet data accessor
...
    if (0 == _C_impsize) {              // 1
        mutex_lock ();
        if (_C_impsize) {
            mutex_unlock ();
            return _C_data;
        }
        _C_data    = get_facet_data (); // 2
        ??                              // 3
        _C_impsize = 1;                 // 4
        mutex_unlock ();
    }
    ??                                  // 5
    return _C_data;                     // 6
...


with question marks standing in for the missing, necessary fixes. The compiler (and the hardware) must be prevented from reordering both the writes in 2-4 and the reads in 1 and 6. Just for the sake of argument, I can imagine an optimization that reorders the reads in 1 and 6 into:

    register x = _C_data;
    if (_C_impsize)
        return x;

and if the loads execute in that order while another thread completes the initialization in between them, the caller will see _C_impsize set but a stale _C_data.

First, the writes in 2-4 need to execute in program order. This takes both a compiler barrier and a store-store memory barrier to keep the stores ordered; otherwise another thread can observe _C_impsize set before _C_data is.

Then, the reads in 1 and 6 need to be ordered so that _C_data is read after _C_impsize, via a compiler barrier and a load-load memory barrier that preserves the program order of the loads.

Various compilers provide these primitives in various forms, but at the moment we don't have a unified STDCXX API for them.
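For what it's worth, if we could rely on C++11, both barriers fall out of a single atomic flag. A sketch of the accessor along those lines (the names and get_facet_data() are invented for the example, mirroring the pseudocode above):

```cpp
#include <atomic>
#include <mutex>

struct FacetData { int value; };

// hypothetical provider, standing in for get_facet_data() above
static FacetData* get_facet_data () { static FacetData d = { 42 }; return &d; }

static std::atomic<int> impsize (0);   // plays the role of _C_impsize
static FacetData*       facet = 0;     // plays the role of _C_data
static std::mutex       mtx;

FacetData* facet_data ()
{
    // acquire load: orders the read of facet (6) after impsize (1)
    if (0 == impsize.load (std::memory_order_acquire)) {    // 1
        std::lock_guard<std::mutex> lock (mtx);
        if (0 == impsize.load (std::memory_order_relaxed)) {
            facet = get_facet_data ();                      // 2
            // release store: orders 4 after the write in 2
            impsize.store (1, std::memory_order_release);   // 3-4
        }
    }
    return facet;                                           // 5-6
}
```

On x86 the acquire load on the fast path compiles to a plain load plus a compiler barrier, so the lock-free read stays essentially free.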

Of course, I might be wrong. Input is appreciated.

Thanks,
Liviu
