Re: STDCXX-1056 : numpunct fix

2012-09-25 Thread Liviu Nicoara

On 09/24/12 22:28, Martin Sebor wrote:

On 09/20/2012 06:46 PM, Stefan Teleman wrote:

On Thu, Sep 20, 2012 at 8:39 PM, Liviu Nicoara nikko...@hates.ms wrote:


I have not created this requirement out of thin air. STDCXX development has 
functioned in this manner for as long as I can remember. If it does not suit you, 
that's fine.


That would explain why these bugs are present in the first place.

If the official method of determining thread-safety here is
fprintf(3C), then we have a much bigger problem than
22.locale.numpunct.mt.


Tests that end in .mt.cpp use the RW_ASSERT() macro to verify
correctness in multithreaded programs. There is no way at the
moment to run a thread analyzer as part of the test suite,
although it would be a nice addition. As I said in my other
reply, though, not all data races necessarily indicate bugs,
so we'd need a way to distinguish between known/benign races
and the rest. (Unless we decide to eliminate all races and
accept the performance penalty.)


Preserving the lazy initialization is possible: we perfectly forward to the 
implementation with either:

1. no additional synchronization (preserves fast reads of the data), while 
fully knowing that there are data races in __rw_get_numpunct/_C_get_data there, 
or
2. we put a big lock on top of each and every facet operation (the version that 
would be race free).

Caching seems off the table for now, more so in the presence of unguarded 
reads of the facet's data.

The only argument for (1) is that at the moment we don't have a failing test 
yet, and not for lack of arduous trying.

However, if atomic/membar APIs were recognized by the compiler, lazy 
initialization of the facet and caching of data would be attainable in quite a 
simple fashion.

Liviu


Re: STDCXX-1056 : numpunct fix

2012-09-25 Thread Liviu Nicoara

On 09/24/12 23:50, Stefan Teleman wrote:

On Mon, Sep 24, 2012 at 10:03 PM, Martin Sebor mse...@gmail.com wrote:

FWIW, there are race conditions in stdcxx. Some of them are by
design and benign on the systems the library runs on (although
I acknowledge that some others may be bugs). One such benign
data race is:

   time    T1        T2
    0      x = N
    1      x = N     read x

where x is a scalar that can be accessed atomically by the CPU
and the compiler.

I think some of the lazy facet initialization falls under this
class. It would be nice to fix them but only if it doesn't slow
things down. The others need to be fixed in any case.


The race conditions I am talking about are not benign.


Caching failures aside, we have no failing tests. Before saying they are malign 
you need to objectively show a failing program.



It shows the types of race conditions it's reporting: these are
read/write race conditions, not read/read. The thread analyzer's html
filter doesn't show the types of races reported as clearly as the
command-line analyzer which has a windowing GUI.


Martin's example above is a read-write race. In the absence of a failing test 
case it is quite unreasonable to modify the current implementation.

Liviu



Re: STDCXX-1056 : numpunct fix

2012-09-25 Thread Liviu Nicoara

On 9/24/12 11:06 PM, Stefan Teleman wrote:

On Mon, Sep 24, 2012 at 7:48 PM, Liviu Nicoara nikko...@hates.ms wrote:


Stefan, was it your intention to completely eliminate all the race
conditions with this last patch? Is this what the tools showed in your
environment?



https://issues.apache.org/jira/browse/STDCXX-1056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13461452#comment-13461452


Yes, all the race conditions in std::numpunct and std::moneypunct.
Not all the race conditions in stdcxx.


FWIW, I have analyzed the performance of the latest patch from Stefan. The test 
program is the same test program used in the timings attachments to the 
incident. The version of the library was 4.2.1, compiler gcc 4.7.1, with:


1. The simple, perfect forwarding patch, with no caching, which does eliminate 
race conditions.

2. Stefan's latest patch
3. A patch I created which truly, if crudely, eliminates all facet race 
conditions by using a top-level lock for all facet operations (*)


The numbers from the first test run are already available in the incident.

The numbers for the second test run were a bit bigger, which is expected because 
of the additional synchronization. E.g., I have found that the test, run against 
the STDCXX locale database, gives:


$ time ./t en_US.UTF-8 4 500
4, 500

real    0m19.210s
user    0m32.659s
sys     0m36.507s

where the first patch gives:

$ time ./t en_US.UTF-8 4 5000
4, 5000

real    0m7.961s
user    0m31.211s
sys     0m0.003s

Notice the number of loops, 10x larger in the second set of numbers.

Now, for the third patch, the numbers are embarrassingly large. It practically 
slows the program to a crawl because it does not do any reads of facet data 
without holding the lock. This, I wager, eliminates all races in the 
numpunct/moneypunct facet. See:


$ time ./t en_US.UTF-8 4 10
4, 10

real    0m24.316s
user    0m28.213s
sys     0m6.633s

Now that is a whopping performance hit.

I went back to look for an explanation of the difference between the numbers I 
see with Stefan's better-performing patch and my lock-all patch. I believe that there are still places 
in Stefan's patch where the facet makes unguarded reads of facet data, e.g.:


loc/_numpunct.h:

    // returns a pointer to the facet's implementation data
    // if it exists, 0 otherwise
    const void* _C_data () const {
        return _C_impsize ? _C_impdata
            : _RWSTD_CONST_CAST (__rw_facet*, this)->_C_get_data ();
    }

where _C_impsize/_C_impdata are read outside of a lock scope, etc. I.e., a 
thread may read the _C_impsize while, say, another is still writing the 
_C_impdata member variable.


I conclude, even with such simple testing, that any solution that attempts to 
completely eliminate the race conditions in these facets will incur a penalty, 
with the safest (top-level locking) the worst by orders of magnitude and any 
other solution somewhere in between.


Please let me know if you have any questions.

Liviu

(*) Will attach it as soon as we figure out how to undo the closing of the 
issue.


Re: STDCXX-1056 : numpunct fix

2012-09-24 Thread Liviu Nicoara

On 9/24/12 12:06 AM, Stefan Teleman wrote:

On Fri, Sep 21, 2012 at 9:10 AM, Liviu Nicoara nikko...@hates.ms wrote:

On 09/21/12 05:13, Stefan Teleman wrote:


On Fri, Sep 21, 2012 at 2:28 AM, Travis Vitek
travis.vi...@roguewave.com wrote:

I have provided this list with test results showing that my patch
*does* fix the race condition problems identified by all the tools at
my disposal. I'm willing to bet you never looked at it. You dismissed
it outright, just because you didn't like the *idea* that increasing
the size of the cache, and eliminating that useless cache resizing
would play an important role in eliminating the race conditions.




I looked at it in great detail and I am sure Travis read it too. The
facet/locale buffer management is a red herring. Apparently, we cannot
convince you, regardless of the argument. That's fine by me.


This bug [STDCXX-1056] was updated over the weekend with new comments.
Here's the link to the comments, for the record:


Stefan, was it your intention to completely eliminate all the race conditions 
with this last patch? Is this what the tools showed in your environment?


Thanks,
Liviu



https://issues.apache.org/jira/browse/STDCXX-1056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13461452#comment-13461452





Re: STDCXX-1056 : numpunct fix

2012-09-24 Thread Stefan Teleman
On Mon, Sep 24, 2012 at 7:48 PM, Liviu Nicoara nikko...@hates.ms wrote:

 Stefan, was it your intention to completely eliminate all the race
 conditions with this last patch? Is this what the tools showed in your
 environment?

 https://issues.apache.org/jira/browse/STDCXX-1056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13461452#comment-13461452

Yes, all the race conditions in std::numpunct and std::moneypunct.
Not all the race conditions in stdcxx.

Yes, that's what the test results show with my latest patchset - 0
race conditions in numpunct and moneypunct on Linux/Intel 32/64,
Solaris SPARC 32/64 and Solaris Intel 32/64. The test results are in
the onlinehome.us directory URL I put in the comment.

-- 
Stefan Teleman
KDE e.V.
stefan.tele...@gmail.com


Re: STDCXX-1056 : numpunct fix

2012-09-24 Thread Stefan Teleman
On Mon, Sep 24, 2012 at 10:03 PM, Martin Sebor mse...@gmail.com wrote:
 FWIW, there are race conditions in stdcxx. Some of them are by
 design and benign on the systems the library runs on (although
 I acknowledge that some others may be bugs). One such benign
 data race is:

   time    T1        T2
    0      x = N
    1      x = N     read x

 where x is a scalar that can be accessed atomically by the CPU
 and the compiler.

 I think some of the lazy facet initialization falls under this
 class. It would be nice to fix them but only if it doesn't slow
 things down. The others need to be fixed in any case.

The race conditions I am talking about are not benign.

I've uploaded a full thread analyzer output for 22.locale.numpunct.mt
showing dataraces here:

http://s247136804.onlinehome.us/stdcxx-1056-malign/22.locale.numpunct.mt.2.er.tar.bz2

The name of the analyzer results directory is 22.locale.numpunct.mt.2.er

You will need the SunPro Linux 12.3 Thread Analyzer installed, which
comes with SunPro anyway. The analyzer itself is
${PATH_TO_SUN_PRO_INSTALL}/bin/analyzer

There's a screenshot from the same Analyzer output here:

 
http://s247136804.onlinehome.us/stdcxx-1056-malign/sunpro_thread_analyzer_screenshot.jpg

taken just now on my laptop. The report itself is from a couple of
days ago, and it's from a run with only the _numpunct.h patch applied.
No patches to facet.cpp, punct.cpp, or locale_body.cpp.

It shows the types of race conditions it's reporting: these are
read/write race conditions, not read/read. The thread analyzer's html
filter doesn't show the types of races reported as clearly as the
command-line analyzer which has a windowing GUI.

At any rate you can see the same exact type of race conditions being
reported by the Intel Inspector 2013 Thread Analyzer.

-- 
Stefan Teleman
KDE e.V.
stefan.tele...@gmail.com


Re: STDCXX-1056 : numpunct fix

2012-09-23 Thread Stefan Teleman
On Fri, Sep 21, 2012 at 9:10 AM, Liviu Nicoara nikko...@hates.ms wrote:
 On 09/21/12 05:13, Stefan Teleman wrote:

 On Fri, Sep 21, 2012 at 2:28 AM, Travis Vitek
 travis.vi...@roguewave.com wrote:

 I have provided this list with test results showing that my patch
 *does* fix the race condition problems identified by all the tools at
 my disposal. I'm willing to bet you never looked at it. You dismissed
 it outright, just because you didn't like the *idea* that increasing
 the size of the cache, and eliminating that useless cache resizing
 would play an important role in eliminating the race conditions.



 I looked at it in great detail and I am sure Travis read it too. The
 facet/locale buffer management is a red herring. Apparently, we cannot
 convince you, regardless of the argument. That's fine by me.

This bug [STDCXX-1056] was updated over the weekend with new comments.
Here's the link to the comments, for the record:

https://issues.apache.org/jira/browse/STDCXX-1056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13461452#comment-13461452

-- 
Stefan Teleman
KDE e.V.
stefan.tele...@gmail.com


RE: STDCXX-1056 : numpunct fix

2012-09-21 Thread Travis Vitek

 -Original Message-
 From: Stefan Teleman
 Sent: Thursday, September 20, 2012 6:00 PM
 To: dev@stdcxx.apache.org
 Cc: Liviu Nicoara
 Subject: Re: STDCXX-1056 : numpunct fix
 
 On Thu, Sep 20, 2012 at 8:44 PM, C. Bergström
 cbergst...@pathscale.com wrote:
 
 
 I do have a program which triggers the race conditions:
 22.locale.numpunct.mt. Part of the official test harness.


I don't think anyone is claiming that there isn't a thread safety problem in 
the locales. I'm open to the possibility, and I'm willing to review changes 
that fix the problem, but I want some understanding of what the problem 
actually is and how the changes actually address the problem before signing off.

I also find it hard to believe the changes to the facet cache (and many of the 
other changes) are necessary to eliminate any data races that were reported by 
these tools. Changing the cache size from 8 to 64 is _clearly_ not going to fix 
a data race. It might reduce the likelihood of one occurring, but it isn't 
going to fix it. I also fail to understand how not reducing the size of the 
cache fixes a data race either. These changes all appear to be optimizations, 
and as such, should _not_ be included in changes to fix thread-safety issues.

Perhaps we could go about this a different way. Test your code without the 
facet cache changes in _C_manage and report back on what you 
find. Or, perhaps more simply, provide minimal diffs (as required by the 
patching process) to fix the data race issues that you've outlined.
 
 The real reason why they don't want to accept what the instrumentation
 tools are saying is very simple: they don't LIKE reading what the
 tools are saying. So, blame the tools, pretend that as long as it
 doesn't crash again there's no bug and hope for the best.

My previous e-mail attempted to indicate that I don't trust the output that 
these tools have provided. I have no understanding of how the tool managed to 
find a data race in __rw_allocate (or _C_manage). Do you? Doesn't this output 
give you some doubt about the accuracy of the tool?

I have no problem using a tool to make my job easier. Sometimes those tools are 
wrong. The two cases mentioned above call into question the accuracy of the 
tool being used, and it is important to understand why those failures are being 
seen (or why I am wrong about the given code being safe).

 But I am very glad this is on a public mailing list, so everyone can
 read what's going on here.

As am I.

I've raised concerns about the tools you're using to detect problems. I've 
pointed out what I believe to be two false positives, and you've accused me of 
being incompetent and afraid of what the tools are saying. You've offered no 
explanation about these apparent false positives and you've not provided any 
explanation for why you don't see similar false positives when you run the code 
on the patched library.

I've raised concerns about some of the changes you've provided being 
unnecessary to resolve the issue at hand (a thread safety issue) and labeled 
them as optimizations (your original e-mail seemed to indicate you were under 
the same impression). As Liviu pointed out, the bug fix changes should be 
handled separately from the optimization. I wholeheartedly agree.

You called out premature optimization as evil, in a discussion about patches 
you provided that include optimizations and no testcase showing that your 
changes are not premature and provide measurable benefit. Then you rail on...

Then, to top it off, you go on to call me a know-it-all who isn't capable of 
figuring out my own bugs.

I'm sorry, but that isn't acceptable.

Travis



Re: STDCXX-1056 : numpunct fix

2012-09-21 Thread Stefan Teleman
On Fri, Sep 21, 2012 at 2:28 AM, Travis Vitek
travis.vi...@roguewave.com wrote:

 You called out premature optimization as evil, in a discussion about patches 
 you provided that include optimizations and no testcase showing that your 
 changes are not premature and provide measureable benefit. Then you rail on...

I didn't call premature optimization evil. Donald Knuth did. If you
disagree with him, please feel free to let him know. He's on faculty
at Stanford.

 Then, to top it off, you go on to call me a know-it-all who isn't capable of 
 figuring out my own bugs.

 I'm sorry, but that isn't acceptable

Too bad if you feel that way. Next time you get the idea of making
snide remarks about my working knowledge of the Standard C Library, or
offer out-of-context one-line code fragments completely unrelated to
what this 1056 bug is about, maybe you'll think twice. And this isn't
the first time you offer gratuitous snide remarks directed at me.

You are one of the deniers of the existence of this thread safety
problem in the facets code, going back to early February of this year.

Between the release of stdcxx 4.2.1 in 2008 and the beginning of this
month, when the possibility of this thread safety problem was finally
acknowledged, did you really not know that 22.locale.numpunct.mt and
22.locale.moneypunct.mt have been crashing or getting stuck? Did you
really not know that these crashes were typical symptoms of race
conditions? I find that very hard to believe, given that the problem
has been reported several times before February of this year.

I have provided this list with test results showing that my patch
*does* fix the race condition problems identified by all the tools at
my disposal. I'm willing to bet you never looked at it. You dismissed
it outright, just because you didn't like the *idea* that increasing
the size of the cache, and eliminating that useless cache resizing
would play an important role in eliminating the race conditions.

I have yet to see an alternative patch being proposed here, which
would eliminate the race conditions, and which I am willing to test
just as thoroughly as I've tested mine. The only things I've seen are
continued attempts at dismissing the existence of these race
conditions, as reported by the instrumentation tools, based on some
general axioms about their accuracy. No-one on this list has a clue as
to how the SunPro Thread Analyzer actually works, because it's not
open source, and none of you work at Oracle, therefore you can't see
the code. But everyone just *knows* that it's inaccurate, or that it
reports false positives.

As long as you, or anyone else, continues to blame the instrumentation
tools for the bug, and as long as anyone here continues to suggest
that the only acceptable proof of the existence of this bug is some
other program which needs to be written using fprintf(3C), and as long
as no-one here is willing to provide an alternative patch which
demonstrably eliminates 100% of the reported race conditions, this
entire back-and-forth about the existence of these race conditions,
the accuracy of the tools and what they are reporting is nothing but a
giant waste of time.

-- 
Stefan Teleman
KDE e.V.
stefan.tele...@gmail.com


Re: STDCXX-1056 : numpunct fix

2012-09-21 Thread Liviu Nicoara

On 09/21/12 05:13, Stefan Teleman wrote:

On Fri, Sep 21, 2012 at 2:28 AM, Travis Vitek
travis.vi...@roguewave.com wrote:

I have provided this list with test results showing that my patch
*does* fix the race condition problems identified by all the tools at
my disposal. I'm willing to bet you never looked at it. You dismissed
it outright, just because you didn't like the *idea* that increasing
the size of the cache, and eliminating that useless cache resizing
would play an important role in eliminating the race conditions.



I looked at it in great detail and I am sure Travis read it too. The 
facet/locale buffer management is a red herring. Apparently, we cannot convince 
you, regardless of the argument. That's fine by me.




I have yet to see an alternative patch being proposed here, which
would eliminate the race conditions, and which I am willing to test
just as thoroughly as I've tested mine. The only things I've seen are
continued attempts at dismissing the existence of these race
conditions, as reported by the instrumentation tools, based on some
general axioms about their accuracy. No-one on this list has a clue as
to how the SunPro Thread Analyzer actually works, because it's not
open source, and none of you work at Oracle, therefore you can't see
the code. But everyone just *knows* that it's inaccurate, or that it
reports false positives.


We are not dismissing the analysis. We are dismissing your interpretation of 
it. The tool is not indicating a defect, but a potential defect. I am calling 
it potential because, as my test case (http://tinyurl.com/97ks9gc) indicates, 
one can have a race condition which is not a defect in a concurrent environment.



As long as you, or anyone else, continues to blame the instrumentation
tools for the bug, and as long as anyone here continues to suggest
that the only acceptable proof of the existence of this bug is some
other program which needs to be written using fprintf(3C), and as long
as no-one here is willing to provide an alternative patch which
demonstrably eliminates 100% of the reported race conditions, this
entire back-and-forth about the existence of these race conditions,
the accuracy of the tools and what they are reporting is nothing but a
giant waste of time.



The correction of 1056 does not necessarily need to eliminate the analyzer 
warnings. Aiming for that is simply unreasonable, not because it is not 
attainable (after all, you demonstrated it), but because it is inappropriately 
and insufficiently argued for the situation at hand.

Liviu


RE: STDCXX-1056 : numpunct fix

2012-09-21 Thread Travis Vitek


 -Original Message-
 From: Stefan Teleman [mailto:stefan.tele...@gmail.com]
 Sent: Friday, September 21, 2012 2:14 AM
 To: dev@stdcxx.apache.org
 Subject: Re: STDCXX-1056 : numpunct fix
 
 On Fri, Sep 21, 2012 at 2:28 AM, Travis Vitek
 travis.vi...@roguewave.com wrote:
 
  You called out premature optimization as evil, in a discussion about
 patches you provided that include optimizations and no testcase showing
 that your changes are not premature and provide measurable benefit.
 Then you rail on...
 
 I didn't call premature optimization evil. Donald Knuth did. If you
 disagree with him, please feel free to let him know. He's on faculty
 at Stanford.

I never said I disagree with the Knuth quote. I just said you have to apply it 
consistently. Is your change an optimization or not? If it is, then is it a 
premature optimization? Do we have a test case (or test results) to indicate 
that there is a discernible performance improvement?

 
 You are one of the deniers of the existence of this thread safety
 problem in the facets code, going back to early February of this year.

Again from my previous e-mail.

<quote>
I don't think anyone is claiming that there isn't a thread safety problem in 
the locales. I'm open to the possibility, and I'm willing to review changes 
that fix the problem, but I want some understanding of what the problem 
actually is and how the changes actually address the problem before signing off.
</quote>

 Between the release of stdcxx 4.2.1 in 2008 and the beginning of this
 month, when the possibility of this thread safety problem was finally
 acknowledged, did you really not know that 22.locale.numpunct.mt and
 22.locale.moneypunct.mt have been crashing or getting stuck? Did you
 really not know that these crashes were typical symptoms of race
 conditions? I find that very hard to believe, given that the problem
 has been reported several times before February of this year.
 
 I have provided this list with test results showing that my patch
 *does* fix the race condition problems identified by all the tools at
 my disposal. I'm willing to bet you never looked at it. You dismissed
 it outright, just because you didn't like the *idea* that increasing
 the size of the cache, and eliminating that useless cache resizing
 would play an important role in eliminating the race conditions.

I looked at every line of your proposed patch _before_ my last e-mail, despite 
the fact that you didn't provide it in the expected format (diff) and it 
included many changes that are unrelated and make little sense.

 I have yet to see an alternative patch being proposed here, which
 would eliminate the race conditions, and which I am willing to test
 just as thoroughly as I've tested mine. The only thing i've seen are
 continued attempts at dismissing the existence of these race
 conditions

I'm _not_ denying the existence of the problem. Please re-read all of my 
correspondence on this issue. If you don't believe me after that, re-read it 
again.




Re: STDCXX-1056 : numpunct fix

2012-09-20 Thread Liviu Nicoara
Thanks for the feedback. Please see below.


On Sep 19, 2012, at 10:02 PM, Stefan Teleman wrote:

 On Wed, Sep 19, 2012 at 8:51 PM, Liviu Nicoara nikko...@hates.ms wrote:
 
 I think you are referring to `live' cache objects and the code which
 specifically adjusts the size of the buffer according to the number of
 `live' locales and/or facets in it. In that respect I would not call that
 eviction because locales and facets with non-zero reference counters are
 never evicted.
 
 But anyhoo, this is semantics. Bottom line is the locale/facet buffer
 management code follows a principle of economy.
 
 Yes it does. But we have to choose between economy and efficiency. To
 clarify: The overhead of having unused pointers in the cache is
 sizeof(void*) times the number of unused slots.  This is 2012. Even
 an entry-level Android cell phone comes with 1GB system memory. If we
 want to talk about embedded systems, where memory constraints are more
 stringent than cell phones, then we're not talking about Apache stdcxx
 anymore, or any other open source implementation of the C++ Standard
 Library. These types of systems use Embedded C++, which is a different
 animal altogether: no exception support, no RTTI. For example, see
 Green Hills: http://www.ghs.com/ec++.html. And even they have become
 more relaxed about memory constraints. They use BOOST.
 
 Bottom line: so what if 16 pointers in this 32 pointer slots cache
 never get used. The maximum amount of wasted memory for these 16
 pointers is 128 bytes, on a 64-bit machine with 8-byte sized pointers.
 Can we live with that in 2012, a year when a $500 laptop comes with
 4GB RAM out of the box? I would pick 128 bytes of allocated but unused
 memory over random and entirely avoidable memory churn any day.


The argument is plausible and fine as far as brainstorming goes. 

But have you measured the amount of memory consumed by all STDCXX locale data 
loaded in one process? How much absolute time is spent in resizing the locale 
and facet buffers? What is the gain in space and time performance with such a 
change versus without? Just how fragmented the heap becomes and is there a 
performance impact because of it, etc.? IOW, before changing the status quo one 
must show an objective defect, produce a body of evidence, including a failing 
test case for the argument.


 
 My goal: I would be very happy if any application using Apache stdcxx
 would reach its peak instantiation level of localization (read: max
 number of locales and facets instantiated and cached, for the
 application's particular use case), and would then stabilize at that
 level *without* having to resize and re-sort the cache, *ever*. That
 is a locale cache I can love. I love binary searches on sorted
 containers. Wrecking the container with insertions or deletions, and
 then having to re-sort it again, not so much. Especially when I can't
 figure out why we're doing it in the first place.


And I love minimalistic code, and hate waste at the same time, especially in a 
general purpose library. To each its own.


 
 Hey Stefan, are the above also timing the changes?
 
 Nah, I didn't bother with the timings - yet - for a very simple
 reason: in order to use instrumentation, both with SunPro and with
 Intel compilers, optimization of any kind must be disabled. On SunPro
 you have to pass -xkeepframe=%all (which disables tail-call
 optimization as well), in addition to passing -xO0 and -g. So the
 timings for these unoptimized experiments would have been completely
 irrelevant.

Well, I think you are the only one around here with access to SPARC hardware, 
so your input is very precious in this sense. Also, this is the reason I 
kept asking that question earlier: do we currently have any failing locale MT 
test when numpunct does just perfect forwarding, with no caching? I.e., 
changing just _numpunct.h and no other source file (so as to silence the thread 
analyzer warnings), do any locale (or other) MT tests fail? I would greatly 
appreciate it if you could give it a run on your hardware if you don't already 
know the answer.

The discussion has been productive. But I object to the patch as is because it 
goes out of the scope of the original incident. I think this patch should only 
touch the MT defect detected by the failing test cases. If you think the other 
parts you changed are defects, you should open corresponding issues in JIRA and 
have them discussed separately.

Thanks,
Liviu

RE: STDCXX-1056 : numpunct fix

2012-09-20 Thread Travis Vitek


 -Original Message-
 From: Stefan Teleman [mailto:stefan.tele...@gmail.com]
 Sent: Thursday, September 20, 2012 10:11 AM
 To: dev@stdcxx.apache.org
 Subject: Re: STDCXX-1056 : numpunct fix
 
 On Thu, Sep 20, 2012 at 8:07 AM, Liviu Nicoara nikko...@hates.ms
 wrote:
  But have you measured the amount of memory consumed by all STDCXX
 locale data loaded in one process? How much absolute time is spent in
 resizing the locale and facet buffers? What is the gain in space and
 time performance with such a change versus without? Just how fragmented
 the heap becomes and is there a performance impact because of it, etc.?
 IOW, before changing the status quo one must show an objective defect,
 produce a body of evidence, including a failing test case for the
 argument.
 
 sizeof(std::locale) * number_of_locales.
 
 I'll let you in on a little secret: once you call setlocale(3C) and
 localeconv(3C), the Standard C Library doesn't release its own locale
 handles until process termination. So you might think you save a lot
 of memory by destroying and constructing the same locales. You're
 really not. It's the Standard C Library locale data which takes up a
 lot of space.

You have a working knowledge of all Standard C Library implementations?

 
 What I do know for a fact that this optimization did, was to cause
 the races conditions reported by 4 different thread analyzers. Race
 conditions are a show-stopper for me, and they are not negotiable.

The following is found near the top of the _C_manage method of __rw_facet.

// acquire lock
_RWSTD_MT_STATIC_GUARD (_RW::__rw_facet);

None of the shared data related to the cache is read/written outside of the 
critical section protected by that lock, and, given the declaration of that 
shared data, it cannot be accessed by any code outside that function. Put 
bluntly, there is no way that there is a race condition relating to the caching 
code itself.

Your Performance Analyzer output indicates a race (7 race accesses) for 
_C_manage...

  http://s247136804.onlinehome.us/22.locale.numpunct.mt.1.er.ts/

Specifically, it is calling out the following block of code.

  ##  70   488.    *__rw_access::_C_get_pid (*pfacet) =
           489.        _RWSTD_STATIC_CAST (_RWSTD_SIZE_T, (type + 1) / 2);

The function _C_get_pid simply exposes a reference to a data member of the 
given facet. That function is thread safe. Provided that pfacet (the parameter 
passed to _C_manage) isn't being accessed by another thread, there is no way 
that this code is not safe. It is possible that calling code is not safe, but 
this code is clean.

Regardless, the proposed patch to _C_manage does nothing to change this block 
of code. I do not understand how you can claim that this change eliminated the 
race conditions you are so offended by. It is possible that other changes you 
have made eliminated the data races, but I do not see how this change has any 
effect.

 
  And I love minimalistic code, and hate waste at the same time,
 especially in a general purpose library. To each its own.
 
 Here's a helpful quote:
 
 We should forget about small efficiencies, say about 97% of the time:
 premature optimization is the root of all evil. It's from Donald
 Knuth.

By that measure, your entire patch could be considered evil. I've seen no 
evidence that the subsequent two allocation/copy/deallocate/sort cycles 
required to get from 8 to 64 entries is measurably more expensive, and I've 
seen nothing to indicate that a normal application using the C++ Standard 
Library would be creating and destroying locale instances in large numbers, or 
that doing so has a measurable impact on performance.

 And I love correct code which doesn't cause thread analyzers to report
 more than 12000 race conditions for just one test case. I've said it
 before and I will say it again: race conditions are a showstopper and
 are not negotiable. Period.

When the code in question has 12 threads that invoke a function 1000 times, 
you've found 1 race condition. I do agree data races are bad and should be 
fixed. But making changes to 'optimize' the code instead of fixing it is 
actually much worse.

 
 The patch is in scope for the issue at hand. The issue is that
 std::numpunct and std::moneypunct are not thread safe. This has been
 confirmed by 4 different thread analyzers, even after applying your
 _numpunct.h patch.

I looked at the output from the thread analyzer. It points out a data race in 
__rw::__rw_allocate(), indicating that a memset() is responsible for a data 
race...

  
http://s247136804.onlinehome.us/22.locale.numpunct.mt.1.er.ts/file.14.src.txt.html#line_43

Assuming that `operator new' is indeed thread safe (I didn't bother to look), 
I'm curious to hear how this is an actual data race. I'm also curious to hear 
how you managed to avoid having the same race appear in the output that you 
submitted with the proposed patch.

 You are more than welcome

Re: STDCXX-1056 : numpunct fix

2012-09-20 Thread Liviu Nicoara

On 09/20/12 13:11, Stefan Teleman wrote:

On Thu, Sep 20, 2012 at 8:07 AM, Liviu Nicoara nikko...@hates.ms wrote:

But have you measured the amount of memory consumed by all STDCXX locale data 
loaded in one process? How much absolute time is spent in resizing the locale 
and facet buffers? What is the gain in space and time performance with such a 
change versus without? Just how fragmented the heap becomes and is there a 
performance impact because of it, etc.? IOW, before changing the status quo one 
must show an objective defect, produce a body of evidence, including a failing 
test case for the argument.


sizeof(std::locale) * number_of_locales.


I was more interested in the underlying locale data, not the C++ objects.



I'll let you in on a little secret: once you call setlocale(3C) and
localeconv(3C), the Standard C Library doesn't release its own locale
handles until process termination. So you might think you save a lot
of memory by destroying and constructing the same locales. You're
really not. It's the Standard C Library locale data which takes up a
lot of space.


Thanks for the secret, I appreciate it. Did you mean to say that the C Standard 
mandates that?!



What I do know for a fact that this optimization did, was to cause
the race conditions reported by 4 different thread analyzers. Race
conditions are a show-stopper for me, and they are not negotiable.


No, that optimization was not causing the MT defect you originally noted. 
Saying so only shows a lack of familiarity with the implementation.




And I love minimalistic code, and hate waste at the same time, especially in a 
general purpose library. To each its own.


Here's a helpful quote:

We should forget about small efficiencies, say about 97% of the time:
premature optimization is the root of all evil. It's from Donald
Knuth.


Please, no.



And I love correct code which doesn't cause thread analyzers to report
more than 12000 race conditions for just one test case. I've said it
before and I will say it again: race conditions are a showstopper and
are not negotiable. Period.



Stefan, you do not become correct by a consensus of thread analyzer outputs, by 
being loud, or by pounding your fist on the table. This being said I will 
continue to exercise due diligence and put in the necessary time and effort to 
provide you with the most useful feed-back I can.

I see that you missed my question in the post before: did you see a failure in 
the locale MT tests in your SPARC runs? If you do not want to run that test, 
that is fine, just let me know.

Thanks,
Liviu




Re: STDCXX-1056 : numpunct fix

2012-09-20 Thread Stefan Teleman
On Thu, Sep 20, 2012 at 4:45 PM, Travis Vitek
travis.vi...@roguewave.com wrote:


 I'll let you in on a little secret: once you call setlocale(3C) and
 localeconv(3C), the Standard C Library doesn't release its own locale
 handles until process termination. So you might think you save a lot
 of memory by destroying and constructing the same locales. You're
 really not. It's the Standard C Library locale data which takes up a
 lot of space.

 You have a working knowledge of all Standard C Library implementations?

I happen to do, yes, for the operating systems that I've been testing
on. I also happen to know that you don't. This fact alone pretty much
closes up *this* particular discussion.

Do yourself, and this mailing list a favor: either write a patch which
addresses all of your concerns *AND* eliminates all the race
conditions reported, or stop this pseudo software engineering bullshit
via email.

There is apparently, a high concentration of know-it-alls on this
mailing list, who are much better at detecting race conditions and
thread unsafety than the tools themselves. Too bad they aren't as good
at figuring out their own bugs.

It took eight months for anyone here to even *acknowledge* that
numpunct and moneypunct do have, in fact, a thread safety problem.
Never mind that the test case for these facets had been crashing for 4
years. To be quite blunt and to the point, after 8 months of denying
obvious facts, your credibility is quite a bit in question at this
point.

This entire discussion has become a perfect illustration with what's
wrong with the ASF, as reported here:

http://www.mikealrogers.com/posts/apache-considered-harmful.html

-- 
Stefan Teleman
KDE e.V.
stefan.tele...@gmail.com


Re: STDCXX-1056 : numpunct fix

2012-09-20 Thread Liviu Nicoara

On Sep 20, 2012, at 5:31 PM, Stefan Teleman wrote:

 On Thu, Sep 20, 2012 at 5:07 PM, Liviu Nicoara nikko...@hates.ms wrote:

 
 To answer your question [...]:
 yes, the MT failures occur on SPARC as well, on both SPARCV8 and
 SPARCV9, and the race conditions are reported on *ALL* plaforms
 tested, even after having applied your _numpunct.h patch. This patch
 alone does *NOT* solve the problem.

Stefan, I want to be clear. You are talking about a patch identical in nature 
to the one I have attached now. Just want to be 100% sure we are talking about 
the same thing. This one still produces failures (crashes, assertions, etc.) in 
the locale MT tests on SPARC and elsewhere in your builds?

Thanks,
Liviu




Re: STDCXX-1056 : numpunct fix

2012-09-20 Thread Wojciech Meyer
Hi,

My perception, from reading through the whole thread, is that we should
not trust external tools 100% to assess the safety of the code. I don't
think there exists an algorithm that produces no false positives.

That said, I admire Stefan's approach, but we should ask the question:
are we MT safe enough? From what I read here, I would say yes.

Liviu Nicoara nikko...@hates.ms writes:

 On Sep 20, 2012, at 5:31 PM, Stefan Teleman wrote:

 On Thu, Sep 20, 2012 at 5:07 PM, Liviu Nicoara nikko...@hates.ms wrote:


 To answer your question [...]:
 yes, the MT failures occur on SPARC as well, on both SPARCV8 and
 SPARCV9, and the race conditions are reported on *ALL* plaforms
 tested, even after having applied your _numpunct.h patch. This patch
 alone does *NOT* solve the problem.

 Stefan, I want to be clear. You are talking about a patch identical in
 nature to the one I have attached now. Just want to be 100% sure we
 are talking about the same thing. This one still produces failures
 (crashes, assertions, etc.) in the locale MT tests on SPARC and
 elsewhere in your builds?

 Thanks,
 Liviu



--
Wojciech Meyer
http://danmey.org


Re: STDCXX-1056 : numpunct fix

2012-09-20 Thread Stefan Teleman
On Thu, Sep 20, 2012 at 7:34 PM, Wojciech Meyer
wojciech.me...@googlemail.com wrote:
 Hi,

 My perception, from reading through the whole thread, is that we should
 not trust external tools 100% to assess the safety of the code. I don't
 think there exists an algorithm that produces no false positives.

 That said, I admire Stefan's approach, but we should ask the question:
 are we MT safe enough? From what I read here, I would say yes.

Based on what objective metric?

--Stefan

-- 
Stefan Teleman
KDE e.V.
stefan.tele...@gmail.com


Re: STDCXX-1056 : numpunct fix

2012-09-20 Thread Liviu Nicoara

On Sep 20, 2012, at 7:37 PM, Stefan Teleman wrote:

 On Thu, Sep 20, 2012 at 7:34 PM, Wojciech Meyer
 wojciech.me...@googlemail.com wrote:
 Hi,
 
 My perception, from reading through the whole thread, is that we should
 not trust external tools 100% to assess the safety of the code. I don't
 think there exists an algorithm that produces no false positives.
 
 That said, I admire Stefan's approach, but we should ask the question:
 are we MT safe enough? From what I read here, I would say yes.
 
 Based on what objective metric?


The only gold currency that anyone in here accepts without reservations are 
failing test cases. I believe I have seen some exceptions to the golden rule in 
my RW time, but I can't recall any specific instance.

Liviu 



Re: STDCXX-1056 : numpunct fix

2012-09-20 Thread Stefan Teleman
On Thu, Sep 20, 2012 at 7:22 PM, Liviu Nicoara nikko...@hates.ms wrote:

 Stefan, I want to be clear. You are talking about a patch identical in nature 
 to the one I have attached now. Just want to be 100% sure we are talking 
 about the same thing. This one still produces failures (crashes, assertions, 
 etc.) in the locale MT tests on SPARC and elsewhere in your builds?

On September 17, 2012 I have posted the following message to this list:

http://www.mail-archive.com/dev@stdcxx.apache.org/msg01929.html

In that message, there is a link to my SPARC thread-safety test results:

http://s247136804.onlinehome.us/stdcxx-1056-SPARC-20120917/22.locale.numpunct.mt.nts.1.er.html/index.html

This test was run with the following _numpunct.h file:

http://s247136804.onlinehome.us/stdcxx-1056-SPARC-20120917/22.locale.numpunct.mt.nts.1.er.html/file.14.src.txt.html

The test above shows 12440 race conditions detected for a test run of
22.locale.numpunct.mt, with  --nthreads=8 --nloops=1.

Did you ever look at these test results? From reading your email, I
realize that you never looked at it. That is the only possible
explanation as to why you're asking now for SPARC test results, today
being September 20, 2012.

-- 
Stefan Teleman
KDE e.V.
stefan.tele...@gmail.com


Re: STDCXX-1056 : numpunct fix

2012-09-20 Thread Liviu Nicoara

On Sep 20, 2012, at 5:23 PM, Stefan Teleman wrote:

 On Thu, Sep 20, 2012 at 4:45 PM, Travis Vitek
 travis.vi...@roguewave.com wrote:
 
 
 I'll let you in on a little secret: once you call setlocale(3C) and
 localeconv(3C), the Standard C Library doesn't release its own locale
 handles until process termination. So you might think you save a lot
 of memory by destroying and constructing the same locales. You're
 really not. It's the Standard C Library locale data which takes up a
 lot of space.
 
 You have a working knowledge of all Standard C Library implementations?
 
 I happen to do, yes, for the operating systems that I've been testing
 on. I also happen to know that you don't. This fact alone pretty much
 closes up *this* particular discussion.
 
 Do yourself, and this mailing list a favor: either write a patch which
 addresses all of your concerns *AND* eliminates all the race
 conditions reported, or stop this pseudo software engineering bullshit
 via email.
 
 There is apparently, a high concentration of know-it-alls on this
 mailing list, who are much better at detecting race conditions and
 thread unsafety than the tools themselves. Too bad they aren't as good
 at figuring out their own bugs.


The sniping is uncalled for. There are no enemies here, no one is after you. 
There is criticism though and you are expected to take it and argue your point 
of view. If you can't stand the heat, get out of the kitchen.


 
 It took eight months for anyone here to even *acknowledge* that
 numpunct and moneypunct do have, in fact, a thread safety problem.
 Never mind that the test case for these facets had been crashing for 4
 years. To be quite blunt and to the point, after 8 months of denying
 obvious facts, your credibility is quite a bit under question at this
 point.


Yes, we are busy with other stuff. I wish I got paid to work on this instead.


 
 This entire discussion has become a perfect illustration with what's
 wrong with the ASF, as reported here:
 
 http://www.mikealrogers.com/posts/apache-considered-harmful.html


I actually read it. I see a guy complaining he can't have it his way. No 
problem. One can fork this project at any time and start it anew, by 
themselves, or in the company of like-minded programmers elsewhere. 

For better or worse Apache got STDCXX from RogueWave. Complaining about it is 
like complaining that Apple doesn't give us iPhones for free; after all we are 
the power users and we know what to do with them.

L

Re: STDCXX-1056 : numpunct fix

2012-09-20 Thread Liviu Nicoara

On Sep 20, 2012, at 7:45 PM, Stefan Teleman wrote:

 On Thu, Sep 20, 2012 at 7:22 PM, Liviu Nicoara nikko...@hates.ms wrote:
 
 Stefan, I want to be clear. You are talking about a patch identical in 
 nature to the one I have attached now. Just want to be 100% sure we are 
 talking about the same thing. This one still produces failures (crashes, 
 assertions, etc.) in the locale MT tests on SPARC and elsewhere in your 
 builds?
 
 On September 17, 2012 I have posted the following message to this list:
 
 http://www.mail-archive.com/dev@stdcxx.apache.org/msg01929.html
 
 In that message, there is a link to my SPARC thread-safety test results:
 
 http://s247136804.onlinehome.us/stdcxx-1056-SPARC-20120917/22.locale.numpunct.mt.nts.1.er.html/index.html
 
 This test was run with the following _numpunct.h file:
 
 http://s247136804.onlinehome.us/stdcxx-1056-SPARC-20120917/22.locale.numpunct.mt.nts.1.er.html/file.14.src.txt.html
 
 The test above shows 12440 race conditions detected for a test run of
 22.locale.numpunct.mt, with  --nthreads=8 --nloops=1.
 
 Did you ever look at these test results? From reading your email, I
 realize that you never looked at it. That is the only possible
 explanation as to why you're asking now for SPARC test results, today
 being September 20, 2012.


I see there is some confusion about this. Probably nobody explained it before. A 
failing test case means a test case that causes the abnormal termination of the 
program, or that produces evidence of abnormal data during the program's 
execution.

In this respect please see the atomic add and exchange tests as classical 
examples of what I mean.

I have read all your emails in detail and I have inspected all your 
attachments, modulo the ones I could not open.

Thanks,
Liviu




Re: STDCXX-1056 : numpunct fix

2012-09-20 Thread Stefan Teleman
On Thu, Sep 20, 2012 at 7:52 PM, Liviu Nicoara nikko...@hates.ms wrote:

 On Sep 20, 2012, at 7:49 PM, Stefan Teleman wrote:

 On Thu, Sep 20, 2012 at 7:40 PM, Liviu Nicoara nikko...@hates.ms wrote:

 The only gold currency that anyone in here accepts without reservations are 
 failing test cases. I believe I have seen some exceptions to the golden 
 rule in my RW time, but I can't recall any specific instance.

 That may be a valid metric here.

 The only one. Any programmer worth his salt -- I am borrowing your words here 
 -- would be able to demonstrate the validity of his point of view with a test 
 case.

I did. There are 12440 race conditions detected for an incomplete run
of 22.locale.numpunct.mt. By incomplete I mean: it did not run with
its default nthreads and nloops which I believe are 8 threads and
20 loop iterations.

I presented a *proposal* fix which:

1. keeps your _numpunct.h forwarding patch
2. eliminates 100% of the race conditions

I have yet to see a counter-proposal.

The only things I've seen are assertions ("race condition
instrumentation and detection tools are wrong"), mischaracterizations
("your patch is evil") and overall just email bullshit.

Not a single line of code which would resolve the 12440 race conditions problem.

-- 
Stefan Teleman
KDE e.V.
stefan.tele...@gmail.com


Re: STDCXX-1056 : numpunct fix

2012-09-20 Thread Wojciech Meyer
Liviu Nicoara nikko...@hates.ms writes:

 On Sep 20, 2012, at 5:23 PM, Stefan Teleman wrote:

 On Thu, Sep 20, 2012 at 4:45 PM, Travis Vitek
 travis.vi...@roguewave.com wrote:


 I'll let you in on a little secret: once you call setlocale(3C) and
 localeconv(3C), the Standard C Library doesn't release its own locale
 handles until process termination. So you might think you save a lot
 of memory by destroying and constructing the same locales. You're
 really not. It's the Standard C Library locale data which takes up a
 lot of space.

 You have a working knowledge of all Standard C Library implementations?

 I happen to do, yes, for the operating systems that I've been testing
 on. I also happen to know that you don't. This fact alone pretty much
 closes up *this* particular discussion.

 Do yourself, and this mailing list a favor: either write a patch which
 addresses all of your concerns *AND* eliminates all the race
 conditions reported, or stop this pseudo software engineering bullshit
 via email.

 There is apparently, a high concentration of know-it-alls on this
 mailing list, who are much better at detecting race conditions and
 thread unsafety than the tools themselves. Too bad they aren't as good
 at figuring out their own bugs.

I fully agree - tools are great. However, I know a little about
compilers, and I can tell you that there are limits to the static
guarantees you can get from any analyser, because of what is known as
the halting problem, which constrains even the top-notch tools based on
abstract interpretation to a certain extent. The halting problem says:
for a program in a Turing-complete formal language, you can't say with
100% assurance that it will halt for every input. You can try to
analyse it statically, but there is a balance between analysing and
interpreting parts of it; even in the extreme case where you run it,
you will not know whether it is supposed to halt. Therefore, please use
the tools, but be a bit reserved about the results.

All these MT analysers are based on simple heuristics and logical
assertions that can't give you 100% correct results. I don't think people
here are picky about your patches, it's just better sometimes to take a
breath and see the big picture.

--
Wojciech Meyer
http://danmey.org


Re: STDCXX-1056 : numpunct fix

2012-09-20 Thread Stefan Teleman
On Thu, Sep 20, 2012 at 8:04 PM, Wojciech Meyer
wojciech.me...@googlemail.com wrote:


 Therefore please use tools but be a bit reserved for the results.

I *am* being cautiously skeptical about the results. That's why I am
using 4 [ FOUR ] different thread analyzers, on three different
operating systems, each one of them in 32- and 64- bit, and not just
one.

With this testing setup described above, when all FOUR instrumentation
tools report the same exact problem in the same exact spot, for all
flavors of the operating system, what would be a rational conclusion?

1. There is indeed a race condition and thread safety problem, it
needs to be investigated and fixed.

2. Bah, the tools are crap, nothing to see here, move along, declare victory.

I chose [1] because I am willing to accept my *own* limitations.

-- 
Stefan Teleman
KDE e.V.
stefan.tele...@gmail.com


Re: STDCXX-1056 : numpunct fix

2012-09-20 Thread Liviu Nicoara

On Sep 20, 2012, at 8:02 PM, Stefan Teleman wrote:

 On Thu, Sep 20, 2012 at 7:52 PM, Liviu Nicoara nikko...@hates.ms wrote:
 
 On Sep 20, 2012, at 7:49 PM, Stefan Teleman wrote:
 
 On Thu, Sep 20, 2012 at 7:40 PM, Liviu Nicoara nikko...@hates.ms wrote:
 
 The only gold currency that anyone in here accepts without reservations 
 are failing test cases. I believe I have seen some exceptions to the 
 golden rule in my RW time, but I can't recall any specific instance.
 
 That may be a valid metric here.
 
 The only one. Any programmer worth his salt -- I am borrowing your words 
 here -- would be able to demonstrate the validity of his point of view with 
 a test case.
 
 I did. There are 12440 race conditions detected for an incomplete run
 of 22.locale.numpunct.mt. By incomplete I mean: it did not run with
 its default nthreads and nloops which I believe are 8 threads and
 20 loop iterations.


That is not it, and you did not. Please pay attention: given your assertion 
that a race condition is a defect that causes an abnormal execution of the 
program during which the program sees abnormal, incorrect states (read: 
variable values) it should be easy for you to craft a program that shows 
evidence of that by either printing those values, or aborting upon detecting 
them, etc.

 
 [...] and overall just email bullshit.

Stop using that word. 

L

Re: STDCXX-1056 : numpunct fix

2012-09-20 Thread Stefan Teleman
On Thu, Sep 20, 2012 at 8:18 PM, Liviu Nicoara nikko...@hates.ms wrote:

 That is not it, and you did not. Please pay attention: given your assertion 
 that a race condition is a defect that causes an abnormal execution of the 
 program during which the program sees abnormal, incorrect states (read: 
 variable values) it should be easy for you to craft a program that shows 
 evidence of that by either printing those values, or aborting upon detecting 
 them, etc.

Oh, I see.

So now I'm supposed to write a program which may, or may not, prove to
you that the 12440 race conditions detected by SunPro and Intel are,
in fact, "real" race conditions (as opposed to "fake" race
conditions)?

And the means of proving the existence of these real race conditions
is ... [ drum roll ] ... fprintf(3C)?

This is very funny. You made my day,

Have a nice evening.

-- 
Stefan Teleman
KDE e.V.
stefan.tele...@gmail.com


Re: STDCXX-1056 : numpunct fix

2012-09-20 Thread C. Bergström

On 09/21/12 07:39 AM, Liviu Nicoara wrote:

Now, in all honesty, it is not too hard to do that. Once you can satisfactorily 
explain to yourself what is wrong in the program, the creation of a test case 
is trivial. Some multi-threading bugs are insidious and hard to reproduce, but 
even then it's doable by isolating as little a portion of the codebase as 
possible, as a standalone program, and trimming it until the failure becomes 
easily reproducible.
fencepost comment - The results are based on tools and I don't think he 
has a large program which actually triggers the conditions.  (Creating 
one may take quite some time)


Re: STDCXX-1056 : numpunct fix

2012-09-20 Thread Stefan Teleman
On Thu, Sep 20, 2012 at 8:39 PM, Liviu Nicoara nikko...@hates.ms wrote:

 I have not created this requirement out of thin air. STDCXX development has 
 functioned in this manner for as long as I remember. If it does not suit you, 
 that's fine.

That would explain why these bugs are present in the first place.

If the official method of determining thread-safety here is
fprintf(3C), then we have a much bigger problem than
22.locale.numpunct.mt.


-- 
Stefan Teleman
KDE e.V.
stefan.tele...@gmail.com


Re: STDCXX-1056 : numpunct fix

2012-09-20 Thread Liviu Nicoara

On Sep 20, 2012, at 8:59 PM, Stefan Teleman wrote:

 On Thu, Sep 20, 2012 at 8:44 PM, C. Bergström
 cbergst...@pathscale.com wrote:
 
 
 fencepost comment - The results are based on tools and I don't think he has
 a large program which actually triggers the conditions.  (Creating one may
 take quite some time)
 
 I do have a program which triggers the race conditions:
 22.locale.numpunct.mt. Part of the official test harness.
 
 The real reason why they don't want to accept what the instrumentation
 tools are saying is very simple: they don't LIKE reading what the
 tools are saying. So, blame the tools, pretend that as long as it
 doesn't crash again there's no bug and hope for the best.

I cannot include an analyzer output as a regression test in the test suite. 

 
 But I am very glad this is on a public mailing list, so everyone can
 read what's going on here.
 

?

STDCXX-1056 : numpunct fix

2012-09-19 Thread Stefan Teleman
This is a proposed fix for the numpunct facet for stdcxx-1056:

0.  Number of reported race conditions is now 0 (zero).

1.  No memory leaks in stdcxx (there are memory leaks reported in either
libc or glibc, but there's nothing we can do about these anyway).

2.  This fix preserves perfect forwarding in the _numpunct.h header file.

3.  This fix eliminates code from facet.cpp and locale_body.cpp which
was creating unnecessary overhead, with the potential of causing
memory corruption, while providing no discernible benefit.

More specifically:

It is not true that there was no eviction policy for cached locales or
facets in stdcxx. Not only did cache eviction code exist (and it still
exists today), but cache cleanups and resizing were performed
periodically, either when an object's reference count dropped to 0
(zero), or whenever the number of cached objects fell below
sizeof(cache) / 2.

In the latter case, both the facet cache and the locale cache performed
a new allocation of the cache array, followed by a memcopy and a delete[]
of the old cache array.

First, the default size of the facets and locales caches was too small:
it was set to 8. I raised this to 32. A direct consequence of this
insufficient default size of 8 was that the cache had to resize itself
very soon after program startup. This cache resize operation consists of:
allocate memory for a new cache, copy the existing cached objects
from the old cache to the new cache, and then delete[] the old cache.

This is a first unnecessary overhead.

Second, and as I mentioned above, whenever the number of cached objects
fell below sizeof(cache) / 2, the cache resized itself, by performing
the same sequence of operations as described above.

This is a second unnecessary overhead.

Third, cached objects were automatically evicted whenever their reference
count dropped to 0 (zero). There are two consequences to this eviction
policy: if the program needs to re-use an object (facet or locale) which
has been evicted and subsequently destroyed, this object needs then to be
constructed again later on, and subsequently re-inserted into the cache.
This, in turn, would trigger a cache resize, followed by copying and
delete[] of the old cache buffer.

Object eviction followed by destruction followed by reconstruction is
a third unnecessary overhead. Re-inserting a re-constructed object into
the cache, followed by a potential cache resize involving allocation of
a new buffer, copying pointers from the old cache to the new cache,
followed by delete[] of the old cache is a fourth unnecessary overhead.

Real-life programs tend to reuse locales and/or facets they have created.
There is no point in destroying and evicting these objects simply because
there may be periods when the object isn't referenced. The object
is likely to be needed again, later on.

The fix proposed here eliminates the cache eviction and object destruction
policy completely. Once created, objects remain in the cache, even though
they may reside in the cache with no references. This eliminates the
cache resize / copy / delete[] overhead. It also eliminates the overhead
of re-constructing an evicted / destroyed object, if it is needed again
later.

4.  Tests and Analysis Results:

4.1. SunPro 12.3 / Solaris / SPARC / Race Conditions Test:


http://s247136804.onlinehome.us/stdcxx-1056-20120919/22.locale.numpunct.mt.sunpro.solaris-sparc.datarace.er.html/index.html

4.2. SunPro 12.3 / Solaris / SPARC / Heap and Memory Leaks Test:


http://s247136804.onlinehome.us/stdcxx-1056-20120919/22.locale.numpunct.mt.sunpro.solaris-sparc.heapcheck.er.html/index.html

4.3. SunPro 12.3 / Linux / Intel / Race Conditions Test:


http://s247136804.onlinehome.us/stdcxx-1056-20120919/22.locale.numpunct.mt.sunpro.linux-intel.datarace.er.html/index.html

4.4. SunPro 12.3 / Linux / Intel / Heap and Memory Leaks Test:


http://s247136804.onlinehome.us/stdcxx-1056-20120919/22.locale.numpunct.mt.sunpro.linux-intel.heapcheck.er.html/index.html

4.5. Intel 2013 / Linux / Intel / Race Conditions Test:


http://s247136804.onlinehome.us/stdcxx-1056-20120919/22.locale.numpunct.mt.intel.linux.datarace.r007ti3.inspxez

4.6. Intel 2013 / Linux / Intel / Heap and Memory Leaks Test:


http://s247136804.onlinehome.us/stdcxx-1056-20120919/22.locale.numpunct.mt.intel.linux.heapcheck.r008mi1.inspxez

5.  Source code for this fix:

http://s247136804.onlinehome.us/stdcxx-1056-20120919/_numpunct.h
http://s247136804.onlinehome.us/stdcxx-1056-20120919/facet.cpp
http://s247136804.onlinehome.us/stdcxx-1056-20120919/locale_body.cpp
http://s247136804.onlinehome.us/stdcxx-1056-20120919/punct.cpp

These files are based on stdcxx 4.2.1.

-- 
Stefan Teleman
KDE e.V.
stefan.tele...@gmail.com


Re: STDCXX-1056 : numpunct fix

2012-09-19 Thread Stefan Teleman
On Wed, Sep 19, 2012 at 8:51 PM, Liviu Nicoara nikko...@hates.ms wrote:

 I think you are referring to `live' cache objects and the code which
 specifically adjusts the size of the buffer according to the number of
 `live' locales and/or facets in it. In that respect I would not call that
 eviction because locales and facets with non-zero reference counters are
 never evicted.

 But anyhoo, this is semantics. Bottom line is the locale/facet buffer
 management code follows a principle of economy.

Yes it does. But we have to choose between economy and efficiency. To
clarify: The overhead of having unused pointers in the cache is
sizeof(void*) times the number of unused slots.  This is 2012. Even
an entry-level Android cell phone comes with 1GB system memory. If we
want to talk about embedded systems, where memory constraints are more
stringent than cell phones, then we're not talking about Apache stdcxx
anymore, or any other open source implementation of the C++ Standard
Library. These types of systems use Embedded C++, which is a different
animal altogether: no exception support, no RTTI. For example, see
Green Hills: http://www.ghs.com/ec++.html. And even they have become
more relaxed about memory constraints. They use BOOST.

Bottom line: so what if 16 of the pointers in this 32-slot cache
never get used? The maximum amount of wasted memory for those 16
pointers is 128 bytes, on a 64-bit machine with 8-byte pointers.
Can we live with that in 2012, a year when a $500 laptop comes with
4GB RAM out of the box? I would pick 128 bytes of allocated but unused
memory over random and entirely avoidable memory churn any day.

 The optimal number is subject to debate. Probably Martin can give an insight
 into the reasons for that number. Why did you pick 32 (or is it 64 in your
 patch) and not any other? Is it something based on your experience as a user
 or programmer?

Based on two things:

1. There are, apparently, 30 top languages spoken on this planet:

http://www.vistawide.com/languages/top_30_languages.htm

2. I've written locale-aware software back in my days on Wall Street.
The maximum number of locales I had to support was 14.

So max(14, 30) would be 30. So I made it 32 because it's a power of 2.

 A negligible overhead, IMO. The benefits of maintaining a small memory
 footprint may be important for some environments. As useful as principles
 may be, see above.

Small and negligible in theory. In practice, when the cache starts
resizing itself by allocating new memory, copying, delete[]-ing and (I
forgot to mention this in my initial post) finishing it all up with a
call to qsort(3C), it's not that negligible anymore. It doesn't just
happen once. It happens every time the cache gets anxious (for the
reasons mentioned in my previous email) and wants to resize itself.
Which triggers the following question in my mind: why are we causing
all this memory churn in the first place? Because we saved 128 bytes
(or 64 bytes on a 32-bit machine, which is what most cell
phones/tablets are these days)?
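To make the contrast concrete, here is a minimal sketch of my own (illustrative names, not the stdcxx implementation) of a fixed-capacity cache whose insertion keeps the slots sorted in place, so there is never an allocate/copy/delete[]/qsort cycle:

```cpp
#include <cstddef>

// Illustrative sketch, not the stdcxx code: a cache with a fixed number
// of slots. Insertion keeps the slots sorted with an O(n) shift, so the
// cache never reallocates, copies, delete[]s, or calls qsort(3C) after
// growing -- because it never grows.
struct FixedCache {
    enum { capacity = 32 };      // max(14, 30) rounded up to a power of 2
    std::size_t slots[capacity]; // e.g. facet ids; the field names are made up
    std::size_t count;

    FixedCache() : count(0) {}

    // Returns false when full; the caller would then take a slow path.
    bool insert(std::size_t key) {
        if (count == capacity)
            return false;
        std::size_t i = count;
        while (i != 0 && slots[i - 1] > key) {  // shift larger keys right
            slots[i] = slots[i - 1];
            --i;
        }
        slots[i] = key;
        ++count;
        return true;
    }
};
```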

My goal: I would be very happy if any application using Apache stdcxx
would reach its peak instantiation level of localization (read: max
number of locales and facets instantiated and cached, for the
application's particular use case), and would then stabilize at that
level *without* having to resize and re-sort the cache, *ever*. That
is a locale cache I can love. I love binary searches on sorted
containers. Wrecking the container with insertions or deletions, and
then having to re-sort it again, not so much. Especially when I can't
figure out why we're doing it in the first place.
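The kind of lookup I have in mind, sketched with std::lower_bound over a sorted array (a hypothetical illustration, not the library's actual code):

```cpp
#include <algorithm>
#include <cstddef>

// Binary search in a sorted array of ids: O(log n), and it stays O(log n)
// precisely because the array is kept sorted at all times instead of
// being wrecked and re-sorted after every growth spurt. Returns a null
// pointer when the id is absent.
const std::size_t* find_id(const std::size_t* first, const std::size_t* last,
                           std::size_t id) {
    const std::size_t* it = std::lower_bound(first, last, id);
    return (it != last && *it == id) ? it : 0;
}
```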

 In this respect you could call every memory allocation and de-allocation
 an overhead. Please keep in mind that these resemble the operations
 performed by any sequence container; how likely is it for a program to
 have more locale/facet creations and destructions than string or vector
 mutations?

There's one fundamental difference: the non-sorted STL containers let
the developer size them up front, beyond whatever the implementation
would allocate by default. Any application developer worth their salt
performs that initial sizing for these types of containers. If I know
my container will end up holding 5000 things, I would never let it
start at the default size and grow its way there. Do that, and
you'll get flamed at code review. As for the sorted associative
containers, that's one of the major gripes against them: whenever they
have to grow, or rebalance, they get expensive. But we're not using a
sorted associative container here. It's just a plain ol' C array.
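The sizing point above, as it would look for std::vector (a sketch; the figure 5000 is just the example from the paragraph):

```cpp
#include <cstddef>
#include <vector>

// Reserving the expected capacity up front buys one allocation instead
// of a sequence of grow/copy/free cycles as elements are appended.
std::vector<int> make_sized(std::size_t expected) {
    std::vector<int> v;
    v.reserve(expected);  // capacity() >= expected from here on
    return v;
}
```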

 Could you please elaborate a bit on this? Is this your opinion based on your
 user and/or programmer experience?

See above about the top 30 languages spoken in the world.

 Hey Stefan, are the above also timing the changes?

Nah, I didn't bother with the timings - yet - for a very simple
reason: in order to use