Re: STDCXX-1056 : numpunct fix

2012-09-20 Thread Liviu Nicoara
Thanks for the feed-back. Please see below.


On Sep 19, 2012, at 10:02 PM, Stefan Teleman wrote:

 On Wed, Sep 19, 2012 at 8:51 PM, Liviu Nicoara nikko...@hates.ms wrote:
 
 I think you are referring to `live' cache objects and the code which
 specifically adjusts the size of the buffer according to the number of
 `live' locales and/or facets in it. In that respect I would not call that
 eviction because locales and facets with non-zero reference counters are
 never evicted.
 
 But anyhoo, this is semantics. Bottom line is the locale/facet buffer
 management code follows a principle of economy.
 
 Yes it does. But we have to choose between economy and efficiency. To
 clarify: The overhead of having unused pointers in the cache is
 sizeof(void*) times the number of unused slots.  This is 2012. Even
 an entry-level Android cell phone comes with 1GB system memory. If we
 want to talk about embedded systems, where memory constraints are more
 stringent than on cell phones, then we're not talking about Apache stdcxx
 anymore, or about any other open source implementation of the C++
 Standard Library. These systems use a dialect of C++ for embedded
 systems, which is a different animal altogether: no exception support,
 no RTTI. For example, see Green Hills: http://www.ghs.com/ec++.html.
 And even they have become more relaxed about memory constraints. They
 use Boost.
 
 Bottom line: so what if 16 of the 32 pointer slots in this cache
 never get used? The maximum amount of wasted memory for these 16
 pointers is 128 bytes, on a 64-bit machine with 8-byte sized pointers.
 Can we live with that in 2012, a year when a $500 laptop comes with
 4GB RAM out of the box? I would pick 128 bytes of allocated but unused
 memory over random and entirely avoidable memory churn any day.
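The 128-byte figure above is simply the number of unused slots times the pointer size. A minimal sketch of that arithmetic, assuming (as the argument does) one `void*`-sized pointer per cache slot; `unused_slot_bytes` is a hypothetical helper, not stdcxx code:

```cpp
#include <cassert>
#include <cstddef>

// Bytes held by cache slots that are allocated but never filled.
// Assumption: each slot stores exactly one pointer of pointer_size bytes.
constexpr std::size_t unused_slot_bytes(std::size_t capacity,
                                        std::size_t used,
                                        std::size_t pointer_size = sizeof(void*))
{
    return (capacity - used) * pointer_size;
}

// 32-slot cache, 16 slots in use, 8-byte pointers: 16 * 8 = 128 bytes.
static_assert(unused_slot_bytes(32, 16, 8) == 128, "16 unused 8-byte slots");
```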


The argument is plausible and fine as far as brainstorming goes. 

But have you measured the amount of memory consumed by all STDCXX locale data 
loaded in one process? How much absolute time is spent in resizing the locale 
and facet buffers? What is the gain in space and time performance with such a 
change versus without? Just how fragmented the heap becomes and is there a 
performance impact because of it, etc.? IOW, before changing the status quo one 
must show an objective defect, produce a body of evidence, including a failing 
test case for the argument.


 
 My goal: I would be very happy if any application using Apache stdcxx
 would reach its peak instantiation level of localization (read: max
 number of locales and facets instantiated and cached, for the
 application's particular use case), and would then stabilize at that
 level *without* having to resize and re-sort the cache, *ever*. That
 is a locale cache I can love. I love binary searches on sorted
 containers. Wrecking the container with insertions or deletions, and
 then having to re-sort it again, not so much. Especially when I can't
 figure out why we're doing it in the first place.
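The cache described above (reserve the peak size once, then do binary searches on a sorted container) can be sketched with a sorted `std::vector`. This is a hypothetical illustration of the technique, not the stdcxx locale/facet cache; `sorted_cache`, `cache_entry` and the expected-peak parameter are invented for the example. Insertions use `std::lower_bound` to land at the sorted position, so no full re-sort is needed, and `reserve()` avoids reallocation until the reserved capacity is exceeded.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical cache entry: a locale name plus a pointer to cached data.
struct cache_entry {
    std::string name;
    const void* data;
};

// Comparator for std::lower_bound: element < key.
inline bool by_name(const cache_entry& e, const std::string& key)
{
    return e.name < key;
}

class sorted_cache {
public:
    // Reserve the expected peak up front so that insertions up to that
    // size never reallocate the underlying buffer.
    explicit sorted_cache(std::size_t expected_peak) { entries_.reserve(expected_peak); }

    // O(log n) lookup on the sorted vector; returns nullptr if absent.
    const void* find(const std::string& name) const
    {
        auto it = std::lower_bound(entries_.begin(), entries_.end(), name, by_name);
        return (it != entries_.end() && it->name == name) ? it->data : nullptr;
    }

    // Insert at the sorted position (or update an existing entry).
    void insert(const std::string& name, const void* data)
    {
        auto it = std::lower_bound(entries_.begin(), entries_.end(), name, by_name);
        if (it != entries_.end() && it->name == name) {
            it->data = data;
            return;
        }
        entries_.insert(it, cache_entry{name, data});
    }

    std::size_t size() const { return entries_.size(); }
    std::size_t capacity() const { return entries_.capacity(); }

private:
    std::vector<cache_entry> entries_;
};
```

Inserting into the middle of the vector still shifts the later elements; that shift is the remaining cost of keeping the container sorted.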


And I love minimalistic code, and hate waste at the same time, especially in a 
general purpose library. To each its own.


 
 Hey Stefan, are the above also timing the changes?
 
 Nah, I didn't bother with the timings - yet - for a very simple
 reason: in order to use instrumentation, both with SunPro and with
 Intel compilers, optimization of any kind must be disabled. On SunPro
 you have to pass -xkeepframe=%all (which disables tail-call
 optimization as well), in addition to passing -xO0 and -g. So the
 timings for these unoptimized experiments would have been completely
 irrelevant.

Well, I think you are the only one around here with access to SPARC hardware; 
your input is very precious in this sense. Also, this is the reason I kept 
asking that question earlier: do we currently have any failing locale MT tests 
when numpunct does just perfect forwarding, with no caching? I.e., if only 
_numpunct.h is changed and no other source file is touched (none of the changes 
made to silence thread analyzer warnings), does any locale (or other) MT test 
fail? I would greatly appreciate it if you could give it a run on your hardware 
if you don't already know the answer.

The discussion has been productive. But I object to the patch as is because it 
goes beyond the scope of the original incident. I think this patch should only 
touch the MT defect detected by the failing test cases. If you think the other 
parts you changed are defects you should open corresponding issues in JIRA and 
have them discussed in their separate rooms.

Thanks,
Liviu

RE: STDCXX-1056 : numpunct fix

2012-09-20 Thread Travis Vitek


 -----Original Message-----
 From: Stefan Teleman [mailto:stefan.tele...@gmail.com]
 Sent: Thursday, September 20, 2012 10:11 AM
 To: dev@stdcxx.apache.org
 Subject: Re: STDCXX-1056 : numpunct fix
 
 On Thu, Sep 20, 2012 at 8:07 AM, Liviu Nicoara nikko...@hates.ms
 wrote:
  But have you measured the amount of memory consumed by all STDCXX
 locale data loaded in one process? How much absolute time is spent in
 resizing the locale and facet buffers? What is the gain in space and
 time performance with such a change versus without? Just how fragmented
 the heap becomes and is there a performance impact because of it, etc.?
 IOW, before changing the status quo one must show an objective defect,
 produce a body of evidence, including a failing test case for the
 argument.
 
 sizeof(std::locale) * number_of_locales.
 
 I'll let you in on a little secret: once you call setlocale(3C) and
 localeconv(3C), the Standard C Library doesn't release its own locale
 handles until process termination. So you might think you save a lot
 of memory by destroying and constructing the same locales. You're
 really not. It's the Standard C Library locale data which takes up a
 lot of space.

You have a working knowledge of all Standard C Library implementations?

 
 What I do know for a fact is that this optimization caused the race
 conditions reported by 4 different thread analyzers. Race
 conditions are a show-stopper for me, and they are not negotiable.

The following is found near the top of the _C_manage method of __rw_facet.

// acquire lock
_RWSTD_MT_STATIC_GUARD (_RW::__rw_facet);

None of the shared data related to the cache is read or written outside of the 
critical section protected by that lock, and given the declaration of that 
shared data, it cannot be accessed by any code outside that function. Put 
bluntly, there is no way that there is a race condition relating to the caching 
code itself.
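The pattern described above, where all cache state is private to one function and touched only while a static lock is held, can be sketched in portable C++11 as follows. `std::lock_guard` over a function-local static `std::mutex` stands in for `_RWSTD_MT_STATIC_GUARD`; `manage_cache`, `cache_op` and the reference counter are hypothetical and do not reproduce `__rw_facet::_C_manage`.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

enum class cache_op { add_ref, release, query };

// All shared state lives inside this function and is only touched while
// cache_mutex is held (std::lock_guard plays the role of the static guard).
std::size_t manage_cache(cache_op op)
{
    static std::mutex cache_mutex;        // initialized thread-safely (C++11)
    static std::size_t live_entries = 0;  // not reachable from outside

    std::lock_guard<std::mutex> guard(cache_mutex);  // acquire lock
    switch (op) {
    case cache_op::add_ref:
        ++live_entries;
        break;
    case cache_op::release:
        if (live_entries > 0)
            --live_entries;
        break;
    case cache_op::query:
        break;
    }
    return live_entries;  // copied before the guard releases the lock
}

// Runs nthreads threads, each adding nloops references, then reports the count.
std::size_t add_refs_concurrently(int nthreads, int nloops)
{
    std::vector<std::thread> threads;
    for (int i = 0; i < nthreads; ++i)
        threads.emplace_back([nloops] {
            for (int j = 0; j < nloops; ++j)
                manage_cache(cache_op::add_ref);
        });
    for (auto& t : threads)
        t.join();
    return manage_cache(cache_op::query);
}
```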

Your Performance Analyzer output indicates a race (7 race accesses) for 
_C_manage...

  http://s247136804.onlinehome.us/22.locale.numpunct.mt.1.er.ts/

Specifically, it is calling out the following block of code.

*__rw_access::_C_get_pid (*pfacet) =
    _RWSTD_STATIC_CAST (_RWSTD_SIZE_T, (type + 1) / 2);

The function _C_get_pid simply exposes a reference to a data member of the 
given facet. That function is thread safe. Provided that pfacet (the parameter 
passed to _C_manage) isn't being accessed by another thread, this code is 
safe. It is possible that the calling code is not safe, but this code is 
clean.

Regardless, the proposed patch to _C_manage does nothing to change this block 
of code. I do not understand how you can claim that this change eliminated the 
race conditions you are so offended by. It is possible that other changes you 
have made eliminated the data races, but I do not see how this change has any 
effect.

 
  And I love minimalistic code, and hate waste at the same time,
 especially in a general purpose library. To each its own.
 
 Here's a helpful quote:
 
 We should forget about small efficiencies, say about 97% of the time:
 premature optimization is the root of all evil. It's from Donald
 Knuth.

By that measure, your entire patch could be considered evil. I've seen no 
evidence that the subsequent two allocation/copy/deallocate/sort cycles 
required to get from 8 to 64 entries are measurably more expensive, and I've 
seen nothing to indicate that a normal application using the C++ Standard 
Library would be creating and destroying locale instances in large numbers, or 
that doing so has a measurable impact on performance.

 And I love correct code which doesn't cause thread analyzers to report
 more than 12000 race conditions for just one test case. I've said it
 before and I will say it again: race conditions are a showstopper and
 are not negotiable. Period.

When the code in question has 12 threads that each invoke a function 1000 times 
and the analyzer reports some 12000 races, you've found 1 race condition. I do 
agree data races are bad and should be 
fixed. But making changes to 'optimize' the code instead of fixing it is 
actually much worse.
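The arithmetic behind that remark: one code location executed by every thread on every iteration produces one dynamic event per call. The sketch below, with hypothetical names, counts those executions with a `std::atomic` (so the sketch itself is race-free); it assumes, as the remark implies, that a dynamic analyzer may report each conflicting access rather than each distinct source location.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Counts executions of a single code location across all threads.
std::atomic<long> site_executions{0};

void function_under_test()
{
    // The one source location an analyzer might flag; here it only counts.
    site_executions.fetch_add(1, std::memory_order_relaxed);
}

// Each of nthreads threads calls function_under_test() ncalls times.
long count_site_executions(int nthreads, int ncalls)
{
    site_executions = 0;
    std::vector<std::thread> threads;
    for (int i = 0; i < nthreads; ++i)
        threads.emplace_back([ncalls] {
            for (int j = 0; j < ncalls; ++j)
                function_under_test();
        });
    for (auto& t : threads)
        t.join();  // join makes all increments visible to this thread
    return site_executions.load();
}
```

With 12 threads and 1000 calls each, the single site runs 12 * 1000 = 12000 times.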

 
 The patch is in scope for the issue at hand. The issue is that
 std::numpunct and std::moneypunct are not thread safe. This has been
 confirmed by 4 different thread analyzers, even after applying your
 _numpunct.h patch.

I looked at the output from the thread analyzer. It points out a data race in 
__rw::__rw_allocate(), indicating that a memset() is responsible for a data 
race...

  
http://s247136804.onlinehome.us/22.locale.numpunct.mt.1.er.ts/file.14.src.txt.html#line_43

Assuming that `operator new' is indeed thread safe (I didn't bother to look), 
I'm curious to hear how this is an actual data race. I'm also curious to hear 
how you managed to avoid having the same race appear in the output that you 
submitted with the proposed patch.
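The reasoning behind that question can be sketched as follows: memory freshly returned by `operator new` is not yet reachable from any other thread, so zeroing it with `memset()` cannot conflict with another thread until the pointer is published. `allocate_zeroed`, `publish` and `shared_slot` are hypothetical names; this is not the `__rw::__rw_allocate()` implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <mutex>
#include <new>

// Allocate and zero a block. Until this function returns, no other thread
// holds the pointer, so the memset touches thread-private memory only.
void* allocate_zeroed(std::size_t nbytes)
{
    void* p = ::operator new(nbytes);  // throws std::bad_alloc on failure
    std::memset(p, 0, nbytes);
    return p;
}

// Publishing the pointer to other threads is the step that needs
// synchronization; a mutex-protected slot is used here.
std::mutex slot_mutex;
void* shared_slot = nullptr;

void publish(void* p)
{
    std::lock_guard<std::mutex> guard(slot_mutex);
    shared_slot = p;
}
```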

 You are more than welcome to 

Re: STDCXX-1056 : numpunct fix

2012-09-20 Thread Liviu Nicoara

On 09/20/12 13:11, Stefan Teleman wrote:

 On Thu, Sep 20, 2012 at 8:07 AM, Liviu Nicoara nikko...@hates.ms wrote:
 
  But have you measured the amount of memory consumed by all STDCXX locale data 
  loaded in one process? How much absolute time is spent in resizing the locale 
  and facet buffers? What is the gain in space and time performance with such a 
  change versus without? Just how fragmented the heap becomes and is there a 
  performance impact because of it, etc.? IOW, before changing the status quo one 
  must show an objective defect, produce a body of evidence, including a failing 
  test case for the argument.
 
 
 sizeof(std::locale) * number_of_locales.


I was more interested in the underlying locale data, not the C++ objects.



 I'll let you in on a little secret: once you call setlocale(3C) and
 localeconv(3C), the Standard C Library doesn't release its own locale
 handles until process termination. So you might think you save a lot
 of memory by destroying and constructing the same locales. You're
 really not. It's the Standard C Library locale data which takes up a
 lot of space.


Thanks for the secret, I appreciate it. Did you mean to say that the C Standard 
mandates that?!



 What I do know for a fact is that this optimization caused the race
 conditions reported by 4 different thread analyzers. Race
 conditions are a show-stopper for me, and they are not negotiable.


No, that optimization was not causing the MT defect you originally noted. 
Saying so only shows a lack of familiarity with the implementation.




  And I love minimalistic code, and hate waste at the same time, especially in a 
  general purpose library. To each its own.
 
 
 Here's a helpful quote:
 
 We should forget about small efficiencies, say about 97% of the time:
 premature optimization is the root of all evil. It's from Donald
 Knuth.


Please, no.



 And I love correct code which doesn't cause thread analyzers to report
 more than 12000 race conditions for just one test case. I've said it
 before and I will say it again: race conditions are a showstopper and
 are not negotiable. Period.



Stefan, being correct is not a matter of a consensus of thread analyzer output, 
of being loud, or of pounding your fist on the table. That being said, I will 
continue to exercise due diligence and put in the necessary time and effort to 
provide you with the most useful feed-back I can.

I see that you missed my question in the post before: did you see a failure in 
the locale MT tests in your SPARC runs? If you do not want to run that test, 
that is fine, just let me know.

Thanks,
Liviu




Re: STDCXX-1056 : numpunct fix

2012-09-20 Thread Stefan Teleman
On Thu, Sep 20, 2012 at 4:45 PM, Travis Vitek
travis.vi...@roguewave.com wrote:


 I'll let you in on a little secret: once you call setlocale(3C) and
 localeconv(3C), the Standard C Library doesn't release its own locale
 handles until process termination. So you might think you save a lot
 of memory by destroying and constructing the same locales. You're
 really not. It's the Standard C Library locale data which takes up a
 lot of space.

 You have a working knowledge of all Standard C Library implementations?

I happen to, yes, for the operating systems that I've been testing
on. I also happen to know that you don't. This fact alone pretty much
closes up *this* particular discussion.

Do yourself, and this mailing list a favor: either write a patch which
addresses all of your concerns *AND* eliminates all the race
conditions reported, or stop this pseudo software engineering bullshit
via email.

There is, apparently, a high concentration of know-it-alls on this
mailing list, who are much better at detecting race conditions and
thread unsafety than the tools themselves. Too bad they aren't as good
at figuring out their own bugs.

It took eight months for anyone here to even *acknowledge* that
numpunct and moneypunct do have, in fact, a thread safety problem.
Never mind that the test case for these facets had been crashing for 4
years. To be quite blunt and to the point, after 8 months of denying
obvious facts, your credibility is quite a bit under question at this
point.

This entire discussion has become a perfect illustration of what's
wrong with the ASF, as reported here:

http://www.mikealrogers.com/posts/apache-considered-harmful.html

-- 
Stefan Teleman
KDE e.V.
stefan.tele...@gmail.com


Re: STDCXX-1056 : numpunct fix

2012-09-20 Thread Liviu Nicoara

On Sep 20, 2012, at 5:31 PM, Stefan Teleman wrote:

 On Thu, Sep 20, 2012 at 5:07 PM, Liviu Nicoara nikko...@hates.ms wrote:

 
 To answer your question [...]:
 yes, the MT failures occur on SPARC as well, on both SPARCV8 and
 SPARCV9, and the race conditions are reported on *ALL* platforms
 tested, even after having applied your _numpunct.h patch. This patch
 alone does *NOT* solve the problem.

Stefan, I want to be clear. You are talking about a patch identical in nature 
to the one I have attached now. Just want to be 100% sure we are talking about 
the same thing. This one still produces failures (crashes, assertions, etc.) in 
the locale MT tests on SPARC and elsewhere in your builds?

Thanks,
Liviu




Re: STDCXX-1056 : numpunct fix

2012-09-20 Thread Wojciech Meyer
Hi,

My perception, from reading through the whole thread, is that we should
not trust external tools 100% to assess the safety of the code. I don't
think there exists an algorithm that produces no false positives.

That said, I admire Stefan's approach, but we should ask the question:
are we MT safe enough? I would say from what I read here: yes.

Liviu Nicoara nikko...@hates.ms writes:

 On Sep 20, 2012, at 5:31 PM, Stefan Teleman wrote:

 On Thu, Sep 20, 2012 at 5:07 PM, Liviu Nicoara nikko...@hates.ms wrote:


 To answer your question [...]:
 yes, the MT failures occur on SPARC as well, on both SPARCV8 and
 SPARCV9, and the race conditions are reported on *ALL* platforms
 tested, even after having applied your _numpunct.h patch. This patch
 alone does *NOT* solve the problem.

 Stefan, I want to be clear. You are talking about a patch identical in
 nature to the one I have attached now. Just want to be 100% sure we
 are talking about the same thing. This one still produces failures
 (crashes, assertions, etc.) in the locale MT tests on SPARC and
 elsewhere in your builds?

 Thanks,
 Liviu



--
Wojciech Meyer
http://danmey.org


Re: STDCXX-1056 : numpunct fix

2012-09-20 Thread Stefan Teleman
On Thu, Sep 20, 2012 at 7:34 PM, Wojciech Meyer
wojciech.me...@googlemail.com wrote:
 Hi,

 My perception, from reading through the whole thread, is that we should
 not trust external tools 100% to assess the safety of the code. I don't
 think there exists an algorithm that produces no false positives.

 That said, I admire Stefan's approach, but we should ask the question:
 are we MT safe enough? I would say from what I read here: yes.

Based on what objective metric?

--Stefan

-- 
Stefan Teleman
KDE e.V.
stefan.tele...@gmail.com


Re: STDCXX-1056 : numpunct fix

2012-09-20 Thread Liviu Nicoara

On Sep 20, 2012, at 7:37 PM, Stefan Teleman wrote:

 On Thu, Sep 20, 2012 at 7:34 PM, Wojciech Meyer
 wojciech.me...@googlemail.com wrote:
 Hi,
 
 My perception, from reading through the whole thread, is that we should
 not trust external tools 100% to assess the safety of the code. I don't
 think there exists an algorithm that produces no false positives.
 
 That said, I admire Stefan's approach, but we should ask the question:
 are we MT safe enough? I would say from what I read here: yes.
 
 Based on what objective metric?


The only gold currency that anyone in here accepts without reservations is 
failing test cases. I believe I have seen some exceptions to the golden rule in 
my RW time, but I can't recall any specific instance.

Liviu 



Re: STDCXX-1056 : numpunct fix

2012-09-20 Thread Stefan Teleman
On Thu, Sep 20, 2012 at 7:22 PM, Liviu Nicoara nikko...@hates.ms wrote:

 Stefan, I want to be clear. You are talking about a patch identical in nature 
 to the one I have attached now. Just want to be 100% sure we are talking 
 about the same thing. This one still produces failures (crashes, assertions, 
 etc.) in the locale MT tests on SPARC and elsewhere in your builds?

On September 17, 2012 I have posted the following message to this list:

http://www.mail-archive.com/dev@stdcxx.apache.org/msg01929.html

In that message, there is a link to my SPARC thread-safety test results:

http://s247136804.onlinehome.us/stdcxx-1056-SPARC-20120917/22.locale.numpunct.mt.nts.1.er.html/index.html

This test was run with the following _numpunct.h file:

http://s247136804.onlinehome.us/stdcxx-1056-SPARC-20120917/22.locale.numpunct.mt.nts.1.er.html/file.14.src.txt.html

The test above shows 12440 race conditions detected for a test run of
22.locale.numpunct.mt, with  --nthreads=8 --nloops=1.

Did you ever look at these test results? From reading your email, I
realize that you never looked at it. That is the only possible
explanation as to why you're asking now for SPARC test results, today
being September 20, 2012.

-- 
Stefan Teleman
KDE e.V.
stefan.tele...@gmail.com


Re: STDCXX-1056 : numpunct fix

2012-09-20 Thread Liviu Nicoara

On Sep 20, 2012, at 5:23 PM, Stefan Teleman wrote:

 On Thu, Sep 20, 2012 at 4:45 PM, Travis Vitek
 travis.vi...@roguewave.com wrote:
 
 
 I'll let you in on a little secret: once you call setlocale(3C) and
 localeconv(3C), the Standard C Library doesn't release its own locale
 handles until process termination. So you might think you save a lot
 of memory by destroying and constructing the same locales. You're
 really not. It's the Standard C Library locale data which takes up a
 lot of space.
 
 You have a working knowledge of all Standard C Library implementations?
 
 I happen to, yes, for the operating systems that I've been testing
 on. I also happen to know that you don't. This fact alone pretty much
 closes up *this* particular discussion.
 
 Do yourself, and this mailing list a favor: either write a patch which
 addresses all of your concerns *AND* eliminates all the race
 conditions reported, or stop this pseudo software engineering bullshit
 via email.
 
 There is, apparently, a high concentration of know-it-alls on this
 mailing list, who are much better at detecting race conditions and
 thread unsafety than the tools themselves. Too bad they aren't as good
 at figuring out their own bugs.


The sniping is uncalled for. There are no enemies here, no one is after you. 
There is criticism though and you are expected to take it and argue your point 
of view. If you can't stand the heat, get out of the kitchen.


 
 It took eight months for anyone here to even *acknowledge* that
 numpunct and moneypunct do have, in fact, a thread safety problem.
 Never mind that the test case for these facets had been crashing for 4
 years. To be quite blunt and to the point, after 8 months of denying
 obvious facts, your credibility is quite a bit under question at this
 point.


Yes, we are busy with other stuff. I wish I got paid to work on this instead.


 
 This entire discussion has become a perfect illustration of what's
 wrong with the ASF, as reported here:
 
 http://www.mikealrogers.com/posts/apache-considered-harmful.html


I actually read it. I see a guy complaining he can't have it his way. No 
problem. One can fork this project at any time and start it anew, by 
themselves, or in the company of like programmers elsewhere. 

For better or worse Apache got STDCXX from RogueWave. Complaining about it is 
like complaining that Apple doesn't give us iPhones for free; after all we are 
the power users and we know what to do with them.

L

Re: STDCXX-1056 : numpunct fix

2012-09-20 Thread Liviu Nicoara

On Sep 20, 2012, at 7:45 PM, Stefan Teleman wrote:

 On Thu, Sep 20, 2012 at 7:22 PM, Liviu Nicoara nikko...@hates.ms wrote:
 
 Stefan, I want to be clear. You are talking about a patch identical in 
 nature to the one I have attached now. Just want to be 100% sure we are 
 talking about the same thing. This one still produces failures (crashes, 
 assertions, etc.) in the locale MT tests on SPARC and elsewhere in your 
 builds?
 
 On September 17, 2012 I have posted the following message to this list:
 
 http://www.mail-archive.com/dev@stdcxx.apache.org/msg01929.html
 
 In that message, there is a link to my SPARC thread-safety test results:
 
 http://s247136804.onlinehome.us/stdcxx-1056-SPARC-20120917/22.locale.numpunct.mt.nts.1.er.html/index.html
 
 This test was run with the following _numpunct.h file:
 
 http://s247136804.onlinehome.us/stdcxx-1056-SPARC-20120917/22.locale.numpunct.mt.nts.1.er.html/file.14.src.txt.html
 
 The test above shows 12440 race conditions detected for a test run of
 22.locale.numpunct.mt, with  --nthreads=8 --nloops=1.
 
 Did you ever look at these test results? From reading your email, I
 realize that you never looked at it. That is the only possible
 explanation as to why you're asking now for SPARC test results, today
 being September 20, 2012.


I see, there is some confusion about this. Probably nobody explained it before. A 
failing test case means a test case that causes the abnormal termination of the 
execution of the program or creates evidence of abnormal data in the program 
execution.

In this respect please see the atomic add and exchange tests as classical 
examples of what I mean.
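A minimal sketch of a failing test case in this sense, loosely modeled on the idea of the atomic add tests mentioned above but not copied from them: several threads increment a shared counter, and the test aborts if the final value differs from `nthreads * nloops`. With `std::atomic<long>` no updates should be lost; replacing it with a plain `long` would introduce a data race (undefined behavior) that such a check may expose as lost updates. The function names are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <thread>
#include <vector>

// Each of nthreads threads increments one shared counter nloops times.
long concurrent_increments(int nthreads, int nloops)
{
    std::atomic<long> counter{0};
    std::vector<std::thread> threads;
    for (int i = 0; i < nthreads; ++i)
        threads.emplace_back([&counter, nloops] {
            for (int j = 0; j < nloops; ++j)
                counter.fetch_add(1);
        });
    for (auto& t : threads)
        t.join();
    return counter.load();
}

// Checks the program's data after concurrent execution and terminates
// abnormally if it is wrong; abort() is what marks the test as failed.
bool check_increments(int nthreads, int nloops)
{
    const long expected = static_cast<long>(nthreads) * nloops;
    const long actual = concurrent_increments(nthreads, nloops);
    if (actual != expected) {
        std::fprintf(stderr, "expected %ld, got %ld\n", expected, actual);
        std::abort();
    }
    return true;
}
```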

I have read all your emails in detail and I have inspected all your 
attachments, modulo the ones I could not open.

Thanks,
Liviu




Re: STDCXX-1056 : numpunct fix

2012-09-20 Thread Stefan Teleman
On Thu, Sep 20, 2012 at 7:52 PM, Liviu Nicoara nikko...@hates.ms wrote:

 On Sep 20, 2012, at 7:49 PM, Stefan Teleman wrote:

 On Thu, Sep 20, 2012 at 7:40 PM, Liviu Nicoara nikko...@hates.ms wrote:

 The only gold currency that anyone in here accepts without reservations is 
 failing test cases. I believe I have seen some exceptions to the golden 
 rule in my RW time, but I can't recall any specific instance.

 That may be a valid metric here.

 The only one. Any programmer worth his salt -- I am borrowing your words here 
 -- would be able to demonstrate the validity of his point of view with a test 
 case.

I did. There are 12440 race conditions detected for an incomplete run
of 22.locale.numpunct.mt. By incomplete I mean: it did not run with
its default nthreads and nloops which I believe are 8 threads and
20 loop iterations.

I presented a *proposal* fix which:

1. keeps your _numpunct.h forwarding patch
2. eliminates 100% of the race conditions

I have yet to see a counter-proposal.

The only thing I've seen are assertions (race condition
instrumentation and detection tools are wrong), mischaracterizations
(your patch is evil) and overall just email bullshit.

Not a single line of code which would resolve the 12440 race conditions problem.

-- 
Stefan Teleman
KDE e.V.
stefan.tele...@gmail.com


Re: STDCXX-1056 : numpunct fix

2012-09-20 Thread Wojciech Meyer
Liviu Nicoara nikko...@hates.ms writes:

 On Sep 20, 2012, at 5:23 PM, Stefan Teleman wrote:

 On Thu, Sep 20, 2012 at 4:45 PM, Travis Vitek
 travis.vi...@roguewave.com wrote:


 I'll let you in on a little secret: once you call setlocale(3C) and
 localeconv(3C), the Standard C Library doesn't release its own locale
 handles until process termination. So you might think you save a lot
 of memory by destroying and constructing the same locales. You're
 really not. It's the Standard C Library locale data which takes up a
 lot of space.

 You have a working knowledge of all Standard C Library implementations?

 I happen to, yes, for the operating systems that I've been testing
 on. I also happen to know that you don't. This fact alone pretty much
 closes up *this* particular discussion.

 Do yourself, and this mailing list a favor: either write a patch which
 addresses all of your concerns *AND* eliminates all the race
 conditions reported, or stop this pseudo software engineering bullshit
 via email.

 There is, apparently, a high concentration of know-it-alls on this
 mailing list, who are much better at detecting race conditions and
 thread unsafety than the tools themselves. Too bad they aren't as good
 at figuring out their own bugs.

I fully agree - tools are great, however I know a little about
compilers, and I can tell you that there are limits to the static
guarantees you can get from any analyser, because of something known as
the halting problem, which limits even the top-notch tools based on
abstract interpretation to a certain extent. The halting problem says:
for a Turing-complete language, no algorithm can decide with 100%
assurance, for every program and every input, whether that program will
halt. You can try to analyse a program statically, but then there is a
balance between analysing and interpreting parts of it; even in the
extreme case where you run it, you will not know whether it is supposed
to halt. Therefore please use tools, but be a bit reserved about the
results.

All these MT analysers are based on simple heuristics and logical
assertions that can't give you 100% correct results. I don't think
people here are being picky about your patches; sometimes it's just
better to take a breath and see the big picture.

--
Wojciech Meyer
http://danmey.org


Re: STDCXX-1056 : numpunct fix

2012-09-20 Thread Stefan Teleman
On Thu, Sep 20, 2012 at 8:04 PM, Wojciech Meyer
wojciech.me...@googlemail.com wrote:


 Therefore please use tools but be a bit reserved for the results.

I *am* being cautiously skeptical about the results. That's why I am
using 4 [ FOUR ] different thread analyzers, on three different
operating systems, each one of them in 32- and 64- bit, and not just
one.

With this testing setup described above, when all FOUR instrumentation
tools report the same exact problem in the same exact spot, for all
flavors of the operating system, what would be a rational conclusion?

1. There is indeed a race condition and thread safety problem, it
needs to be investigated and fixed.

2. Bah, the tools are crap, nothing to see here, move along, declare victory.

I chose [1] because I am willing to accept my *own* limitations.

-- 
Stefan Teleman
KDE e.V.
stefan.tele...@gmail.com


Re: STDCXX-1056 : numpunct fix

2012-09-20 Thread Liviu Nicoara

On Sep 20, 2012, at 8:02 PM, Stefan Teleman wrote:

 On Thu, Sep 20, 2012 at 7:52 PM, Liviu Nicoara nikko...@hates.ms wrote:
 
 On Sep 20, 2012, at 7:49 PM, Stefan Teleman wrote:
 
 On Thu, Sep 20, 2012 at 7:40 PM, Liviu Nicoara nikko...@hates.ms wrote:
 
 The only gold currency that anyone in here accepts without reservations 
 is failing test cases. I believe I have seen some exceptions to the 
 golden rule in my RW time, but I can't recall any specific instance.
 
 That may be a valid metric here.
 
 The only one. Any programmer worth his salt -- I am borrowing your words 
 here -- would be able to demonstrate the validity of his point of view with 
 a test case.
 
 I did. There are 12440 race conditions detected for an incomplete run
 of 22.locale.numpunct.mt. By incomplete I mean: it did not run with
 its default nthreads and nloops which I believe are 8 threads and
 20 loop iterations.


That is not it, and you did not. Please pay attention: given your assertion 
that a race condition is a defect that causes an abnormal execution of the 
program during which the program sees abnormal, incorrect states (read: 
variable values) it should be easy for you to craft a program that shows 
evidence of that by either printing those values, or aborting upon detecting 
them, etc.
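Applied to numpunct, such a program could record the facet's values once in a single thread and then have several threads read them repeatedly, counting any read that differs. The sketch below is hypothetical: it is not 22.locale.numpunct.mt, and `nthreads`/`nloops` merely mirror that test's options. It uses only standard `std::numpunct<char>` members; passing a named locale such as `std::locale("de_DE")` would throw `std::runtime_error` if that locale is not installed.

```cpp
#include <atomic>
#include <cassert>
#include <locale>
#include <string>
#include <thread>
#include <vector>

// Values observed from one std::numpunct<char> facet.
struct punct_values {
    char decimal_point;
    char thousands_sep;
    std::string grouping;
    std::string truename;
    std::string falsename;

    bool operator==(const punct_values& o) const
    {
        return decimal_point == o.decimal_point && thousands_sep == o.thousands_sep
            && grouping == o.grouping && truename == o.truename
            && falsename == o.falsename;
    }
};

punct_values read_punct(const std::locale& loc)
{
    const std::numpunct<char>& np = std::use_facet<std::numpunct<char> >(loc);
    return punct_values{np.decimal_point(), np.thousands_sep(),
                        np.grouping(), np.truename(), np.falsename()};
}

// Returns the number of reads that disagree with the single-threaded
// reference values; 0 means no abnormal data was observed.
long count_mismatches(const std::locale& loc, int nthreads, int nloops)
{
    const punct_values expected = read_punct(loc);
    std::atomic<long> mismatches{0};
    std::vector<std::thread> threads;
    for (int i = 0; i < nthreads; ++i)
        threads.emplace_back([&] {
            for (int j = 0; j < nloops; ++j)
                if (!(read_punct(loc) == expected))
                    mismatches.fetch_add(1);
        });
    for (auto& t : threads)
        t.join();
    return mismatches.load();
}
```

A non-zero count, or an abort on the first mismatch, is the kind of evidence that can be checked into the test suite.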

 
 [...] and overall just email bullshit.

Stop using that word. 

L

Re: STDCXX-1056 : numpunct fix

2012-09-20 Thread Stefan Teleman
On Thu, Sep 20, 2012 at 8:18 PM, Liviu Nicoara nikko...@hates.ms wrote:

 That is not it, and you did not. Please pay attention: given your assertion 
 that a race condition is a defect that causes an abnormal execution of the 
 program during which the program sees abnormal, incorrect states (read: 
 variable values) it should be easy for you to craft a program that shows 
 evidence of that by either printing those values, or aborting upon detecting 
 them, etc.

Oh, I see.

So now I'm supposed to write a program which may, or may not, prove to
you that the 12440 race conditions detected by SunPro and Intel are,
in fact, real race conditions (as opposed to fake race
conditions)?

And the means of proving the existence of these real race conditions
is ... [ drum roll ] ... fprintf(3C)?

This is very funny. You made my day.

Have a nice evening.

-- 
Stefan Teleman
KDE e.V.
stefan.tele...@gmail.com


Re: STDCXX-1056 : numpunct fix

2012-09-20 Thread C. Bergström

On 09/21/12 07:39 AM, Liviu Nicoara wrote:

 Now, in all honesty, it is not too hard to do that. Once you can satisfactorily 
 explain to yourself what is wrong in the program, the creation of a test case 
 is trivial. Some multi-threading bugs are insidious and hard to reproduce, but 
 even then it's doable by isolating as small a portion of the codebase as 
 possible, as a standalone program, and trimming it until the failure becomes 
 easily reproducible.

fencepost comment - The results are based on tools and I don't think he 
has a large program which actually triggers the conditions.  (Creating 
one may take quite some time)


Re: STDCXX-1056 : numpunct fix

2012-09-20 Thread Stefan Teleman
On Thu, Sep 20, 2012 at 8:39 PM, Liviu Nicoara nikko...@hates.ms wrote:

 I have not created this requirement out of thin air. STDCXX development has 
 functioned in this manner for as long as I remember. If it does not suit you, 
 that's fine.

That would explain why these bugs are present in the first place.

If the official method of determining thread-safety here is
fprintf(3C), then we have a much bigger problem than
22.locale.numpunct.mt.


-- 
Stefan Teleman
KDE e.V.
stefan.tele...@gmail.com


Re: STDCXX-1056 : numpunct fix

2012-09-20 Thread Liviu Nicoara

On Sep 20, 2012, at 8:59 PM, Stefan Teleman wrote:

 On Thu, Sep 20, 2012 at 8:44 PM, C. Bergström
 cbergst...@pathscale.com wrote:
 
 
 fencepost comment - The results are based on tools and I don't think he has
 a large program which actually triggers the conditions.  (Creating one may
 take quite some time)
 
 I do have a program which triggers the race conditions:
 22.locale.numpunct.mt. Part of the official test harness.
 
 The real reason why they don't want to accept what the instrumentation
 tools are saying is very simple: they don't LIKE reading what the
 tools are saying. So, blame the tools, pretend that as long as it
 doesn't crash again there's no bug and hope for the best.

I cannot include an analyzer output as a regression test in the test suite. 

 
 But I am very glad this is on a public mailing list, so everyone can
 read what's going on here.
 

?