Board report time

2012-09-11 Thread Jim Jagielski
It's time for our report to the board...

what would we like to share?

I see:

 o renewed discussion on health/viability of pmc
 o increased development being done
 o PMC expressing interest in moving to git


Re: Board report time

2012-09-11 Thread Liviu Nicoara

On 09/11/12 08:15, Jim Jagielski wrote:

It's time for our report to the board...

what would we like to share?

I see:

  o renewed discussion on health/viability of pmc
  o increased development being done
  o PMC expressing interest in moving to git


This sounds about right. It should also mention that some members expressed 
interest in alternative licensing.

Thanks.

Liviu



Re: STDCXX-1056 [was: Re: STDCXX forks]

2012-09-11 Thread Liviu Nicoara

On 9/11/12 9:40 PM, Martin Sebor wrote:

On 09/11/2012 04:15 PM, Stefan Teleman wrote:

On Mon, Sep 10, 2012 at 4:24 PM, Stefan Teleman

I think I have something which doesn't break BC - stay tuned because
I'm testing it now.


OK.

So, here's a possible implementation of __rw_get_numpunct() with
minimal locking, which passes the MT tests and does not break ABI:

http://s247136804.onlinehome.us/stdcxx-1056-20120911/punct.cpp

And the same for include/loc/_numpunct.h:

http://s247136804.onlinehome.us/stdcxx-1056-20120911/_numpunct.h

In _numpunct.h, all the functions perform no checks and no lazy
initialization. They function simply as a pass-through to
__rw_get_numpunct(). std::numpunct<T>'s data members are now dead
variables.

The bad: performance is no better than with locking the mutex inside
each of the std::numpunct<T>::*() functions, and with lazy
instantiation.


I wouldn't expect this to be faster than the original. In fact,
I would expect it to be slower because each call to one of the
public, non-virtual members results in a call to the out-of-line
virtual functions (and another to __rw_get_numpunct). Avoiding
the overhead of such calls is the main reason why the caching
exists.



AFAICT, there are two cases to consider:

1. With the STDCXX locale database, the __rw_punct_t data is initialized in 
the first, properly synchronized pass through __rw_get_numpunct. All 
subsequent calls use the __rw_punct_t data to construct the returned objects.
2. With the C library locales, the same happens in the first pass, via 
setlocale and localeconv, except that setlocale synchronization is via a 
per-process lock. The facet data, once initialized, is used just as above.


I probably missed this in the previous conversation, but did you detect a race 
condition in the tests when the facets simply forward to the private virtual 
interface? I.e., did you detect that the facet initialization code itself is 
unsafe? I think the facet __rw_punct_t data is safely initialized in both 
cases; it's the caching that is done incorrectly.



I'm afraid unoptimized timings don't tell us much. Neither does
a comparison between two compilers, even on the same OS.

I looked at Liviu's timings today. I was puzzled by the difference
between (1) which, IIUC, is the current implementation (presumably
an optimized, thread-safe build with the same compiler and OS) and
(4), which, again IIUC, is the equivalent of your latest patch here
(again, presumably optimized, thread safe, same compiler/OS). I'm
having trouble envisioning how calling a virtual function to
retrieve the value of grouping can possibly be faster than not
calling it (and simply returning the value cached in the data
member of the facet).



The new results I attached to the issue come from somewhat clearer tests, and 
they focus on just two cases: the current implementation vs. a non-caching 
one; the latter simply forwards the grouping calls to the protected 
do_grouping, with _no_ other changes to the implementation.


The timing numbers seem to show that MT builds fare far worse with the caching 
than without it. Stefan, if you have the time, could you please try to 
disprove :) my conclusions by timing it on one of your machines?


Thanks,

Liviu



Re: STDCXX-1056 [was: Re: STDCXX forks]

2012-09-11 Thread Stefan Teleman
On Tue, Sep 11, 2012 at 10:18 PM, Liviu Nicoara nikko...@hates.ms wrote:

 AFAICT, there are two cases to consider:

 1. Using STDCXX locale database initializes the __rw_punct_t data in the
 first, properly synchronized pass through __rw_get_numpunct. All subsequent
 calls use the __rw_punct_t data to construct returned objects.
 2. Using the C library locales does the same in the first pass, via
 setlocale and localeconv, but setlocale synchronization is via a per-process
 lock. The facet data, once initialized is used just like above.

 I probably missed this in the previous conversation, but did you detect a
 race condition in the tests if the facets are simply forwarding to the
 private virtual interface? I.e., did you detect that the facet
 initialization code is unsafe? I think the facet __rw_punct_t data is safely
 initialized in both cases, it's the caching that is done incorrectly.

I originally thought so too, but now I'm having doubts. :-) And I
haven't tracked it down with 100% accuracy yet. I saw today this
comment in src/facet.cpp, line 358:

// a per-process array of facet pointers sufficiently large
// to hold (pointers to) all standard facets for 8 locales
static __rw_facet*  std_facet_buf [__rw_facet::_C_last_type * 8];

This leads me to suspect that there is an upper limit of 8 locales plus
their standard facets. If the locales (and their facets) are being
recycled in and out of this 8-slot cache, that would explain the
other thing I also noticed (which also answers your question): yes, I
have gotten the dreaded strcmp(3C) 'Assertion failed' in
22.locale.numpunct.mt when I implemented the 22.locale.numpunct.mt test
in a way similar to your tests. In theory that shouldn't happen, but it
did. That means there is something going on with behind-the-scenes
facet re-initialization that I haven't found yet, which would partially
explain your observation that the MT tests perform much worse with
caching than without.

this is all investigative stuff for tomorrow. :-)

and I agree with Martin that breaking ABI in a minor release is really
not an option. I'm trying to find the best way of making these facets
thread-safe while inflicting the least horrible performance hit.

I will run your tests tomorrow and let you know. :-)

--Stefan

-- 
Stefan Teleman
KDE e.V.
stefan.tele...@gmail.com