RE: STDCXX-1056 : numpunct fix

2012-09-21 Thread Travis Vitek

 -Original Message-
 From: Stefan Teleman
 Sent: Thursday, September 20, 2012 6:00 PM
 To: dev@stdcxx.apache.org
 Cc: Liviu Nicoara
 Subject: Re: STDCXX-1056 : numpunct fix
 
 On Thu, Sep 20, 2012 at 8:44 PM, C. Bergström
 cbergst...@pathscale.com wrote:
 
 
 I do have a program which triggers the race conditions:
 22.locale.numpunct.mt. Part of the official test harness.


I don't think anyone is claiming that there isn't a thread safety problem in 
the locales. I'm open to the possibility, and I'm willing to review changes 
that fix the problem, but I want some understanding of what the problem 
actually is and how the changes actually address the problem before signing off.

I also find it hard to believe the changes to the facet cache (and many of the 
other changes) are necessary to eliminate any data races that were reported by 
these tools. Changing the cache size from 8 to 64 is _clearly_ not going to fix 
a data race. It might reduce the likelihood of one occurring, but it isn't 
going to fix it. I also fail to understand how not reducing the size of the 
cache fixes a data race either. These changes all appear to be optimizations, 
and as such, should _not_ be included in changes to fix thread-safety issues.

Perhaps we could go about this a different way. Test your code without the 
changes to the facet cache changes in _C_manage and report back on what you 
find. Or, perhaps more simply, provide minimal diffs (as required by the 
patching process) to fix the data race issues that you've outlined.
 
 The real reason why they don't want to accept what the instrumentation
 tools are saying is very simple: they don't LIKE reading what the
 tools are saying. So, blame the tools, pretend that as long as it
 doesn't crash again there's no bug and hope for the best.

My previous e-mail attempted to indicate that I don't trust the output that 
these tools have provided. I have no understanding of how the tool managed to 
find a data race in __rw_allocate (or _C_manage). Do you? Doesn't this output 
give you some doubt about the accuracy of the tool?

I have no problem using a tool to make my job easier. Sometimes those tools are 
wrong. The two cases mentioned above call into question the accuracy of the 
tool being used, and it is important to understand why those failures are being 
seen (or why I am wrong about the given code being safe).

 But I am very glad this is on a public mailing list, so everyone can
 read what's going on here.

As am I.

I've raised concerns about the tools you're using to detect problems. I've 
pointed out what I believe to be two false positives, and you've accused me of 
being incompetent and afraid of what the tools are saying. You've offered no 
explanation about these apparent false positives and you've not provided any 
explanation for why you don't see similar false positives when you run the code 
on the patched library.

I've raised concerns about some of the changes you've provided being 
unnecessary to resolve the issue at hand (a thread safety issue) and labeled 
them as optimizations (your original e-mail seemed to indicate you were under 
the same impression). As Liviu pointed out, the bug fix changes should be 
handled separately from the optimization. I wholeheartedly agree.

You called out premature optimization as evil, in a discussion about patches 
you provided that include optimizations and no testcase showing that your 
changes are not premature and provide measureable benefit. Then you rail on...

Then, to top it off, you go on to call me a know-it-all who isn't capable of 
figuring out my own bugs.

I'm sorry, but that isn't acceptable.

Travis



Re: STDCXX-1056 : numpunct fix

2012-09-21 Thread Stefan Teleman
On Fri, Sep 21, 2012 at 2:28 AM, Travis Vitek
travis.vi...@roguewave.com wrote:

 You called out premature optimization as evil, in a discussion about patches 
 you provided that include optimizations and no testcase showing that your 
 changes are not premature and provide measureable benefit. Then you rail on...

I didn't call premature optimization evil. Donald Knuth did. If you
disagree with him, please feel free to let him know. He's on faculty
at Stanford.

 Then, to top it off, you go on to call me a know-it-all who isn't capable of 
 figuring out my own bugs.

 I'm sorry, but that isn't acceptable

Too bad if you feel that way. Next time you get the idea of making
snide remarks about my working knowledge of the Standard C Library, or
offer out-of-context one-line code fragments completely unrelated to
what this 1056 bug is about, maybe you'll think twice. And this isn't
the first time you offer gratuitous snide remarks directed at me.

You are one of the deniers of the existence of this thread safety
problem in the facets code, going back to early February of this year.

Between the release of stdcxx 4.2.1 in 2008 and the beginning of this
month, when the possibility of this thread safety problem was finally
acknowledged, did you really not know that 22.locale.numpunct.mt and
22.locale.moneypunct.mt have been crashing or getting stuck? Did you
really not know that these crashes were typical symptoms of race
conditions? I find that very hard to believe, given that the problem
has been reported several times before February of this year.

I have provided this list with test results showing that my patch
*does* fix the race condition problems identified by all the tools at
my disposal. I'm willing to bet you never looked at it. You dismissed
it outright, just because you didn't like the *idea* that increasing
the size of the cache, and eliminating that useless cache resizing
would play an important role in eliminating the race conditions.

I have yet to see an alternative patch being proposed here, which
would eliminate the race conditions, and which I am willing to test
just as thoroughly as I've tested mine. The only thing i've seen are
continued attempts at dismissing the existence of these race
conditions, as  reported by the instrumentation tools, based on some
general axioms about their accuracy. No-one on this list has a clue as
to how the SunPro Thread Analyzer actually works, because it's not
open source, and none of you work at Oracle, therefore you can't see
the code. But everyone just *knows* that it's inaccurate, or that it
reports false positives.

As long as you, or anyone else, continues to blame the instrumentation
tools for the bug, and as long as anyone here continues to suggest
that the only acceptable proof of the existence of this bug is some
other program which needs to be written using fprintf(3C), and as long
as no-one here is willing to provide an alternative patch which
demonstrably eliminates 100% of the reported race conditions, this
entire back-and-forth about the existence of these race conditions,
the accuracy of the tools and what they are reporting is nothing but a
giant waste of time.

-- 
Stefan Teleman
KDE e.V.
stefan.tele...@gmail.com


Re: STDCXX-1056 : numpunct fix

2012-09-21 Thread Liviu Nicoara

On 09/21/12 05:13, Stefan Teleman wrote:

On Fri, Sep 21, 2012 at 2:28 AM, Travis Vitek
travis.vi...@roguewave.com wrote:

I have provided this list with test results showing that my patch
*does* fix the race condition problems identified by all the tools at
my disposal. I'm willing to bet you never looked at it. You dismissed
it outright, just because you didn't like the *idea* that increasing
the size of the cache, and eliminating that useless cache resizing
would play an important role in eliminating the race conditions.



I looked at it in great detail and I am sure Travis read it too. The 
facet/locale buffer management is a red herring. Apparently, we cannot convince 
you, regardless of the argument. That's fine by me.




I have yet to see an alternative patch being proposed here, which
would eliminate the race conditions, and which I am willing to test
just as thoroughly as I've tested mine. The only thing i've seen are
continued attempts at dismissing the existence of these race
conditions, as  reported by the instrumentation tools, based on some
general axioms about their accuracy. No-one on this list has a clue as
to how the SunPro Thread Analyzer actually works, because it's not
open source, and none of you work at Oracle, therefore you can't see
the code. But everyone just *knows* that it's inaccurate, or that it
reports false positives.


We are not dismissing the analysis. We are dismissing your interpretation of 
it. The tool is not indicating a defect, but a potential defect. I am calling 
it potential because, as my test case (http://tinyurl.com/97ks9gc) indicates, 
one can have a race condition which is not a defect in a concurrent environment.



As long as you, or anyone else, continues to blame the instrumentation
tools for the bug, and as long as anyone here continues to suggest
that the only acceptable proof of the existence of this bug is some
other program which needs to be written using fprintf(3C), and as long
as no-one here is willing to provide an alternative patch which
demonstrably eliminates 100% of the reported race conditions, this
entire back-and-forth about the existence of these race conditions,
the accuracy of the tools and what they are reporting is nothing but a
giant waste of time.



The correction of 1056 does not necessarily need to eliminate the analyzer 
warnings. Aiming for that is simply unreasonable, not because it is not 
attainable (after all, you demonstrated it), but because it is inappropriately 
and insufficiently argued for the situation at hand.

Liviu


RE: STDCXX-1056 : numpunct fix

2012-09-21 Thread Travis Vitek


 -Original Message-
 From: Stefan Teleman [mailto:stefan.tele...@gmail.com]
 Sent: Friday, September 21, 2012 2:14 AM
 To: dev@stdcxx.apache.org
 Subject: Re: STDCXX-1056 : numpunct fix
 
 On Fri, Sep 21, 2012 at 2:28 AM, Travis Vitek
 travis.vi...@roguewave.com wrote:
 
  You called out premature optimization as evil, in a discussion about
 patches you provided that include optimizations and no testcase showing
 that your changes are not premature and provide measureable benefit.
 Then you rail on...
 
 I didn't call premature optimization evil. Donald Knuth did. If you
 disagree with him, please feel free to let him know. He's on faculty
 at Stanford.

I never said I disagree with the Knuth quote. I just said you have to apply it 
consistently. Is your change an optimization or not? If it is, then is it a 
premature optimization? Do we have a test case (or test results) to indicate 
that there is a discernible performance improvement?

 
 You are one of the deniers of the existence of this thread safety
 problem in the facets code, going back to early February of this year.

Again from my previous e-mail.

quote
I don't think anyone is claiming that there isn't a thread safety problem in 
the locales. I'm open to the possibility, and I'm willing to review changes 
that fix the problem, but I want some understanding of what the problem 
actually is and how the changes actually address the problem before signing off.
/quote

 Between the release of stdcxx 4.2.1 in 2008 and the beginning of this
 month, when the possibility of this thread safety problem was finally
 acknowledged, did you really not know that 22.locale.numpunct.mt and
 22.locale.moneypunct.mt have been crashing or getting stuck? Did you
 really not know that these crashes were typical symptoms of race
 conditions? I find that very hard to believe, given that the problem
 has been reported several times before February of this year.
 
 I have provided this list with test results showing that my patch
 *does* fix the race condition problems identified by all the tools at
 my disposal. I'm willing to bet you never looked at it. You dismissed
 it outright, just because you didn't like the *idea* that increasing
 the size of the cache, and eliminating that useless cache resizing
 would play an important role in eliminating the race conditions.

I looked at every line of your proposed patch _before_ my last e-mail. Despite 
the fact that you didn't provide it in the expected format (diff) and it 
included many changes that are unrelated and make little sense.

 I have yet to see an alternative patch being proposed here, which
 would eliminate the race conditions, and which I am willing to test
 just as thoroughly as I've tested mine. The only thing i've seen are
 continued attempts at dismissing the existence of these race
 conditions

I'm _not_ denying the existence of the problem. Please re-read all of my 
correspondence on this issue. If you don't believe me after that, re-read it 
again.




SUNPro 5.12 compilation failure in optimized Linux builds

2012-09-21 Thread Liviu Nicoara

Optimized (but not debug) builds fail to compile setlocale.cpp with the error:

$ cat t.cpp; CC -c t.cpp
#define _XOPEN_SOURCE
#include cwchar
/opt/sunpro/12_3/prod/include/cc/wchar.h, line 90: Error: tm is not defined.
/opt/sunpro/12_3/prod/include/cc/wchar.h, line 92: Error: fgetwc is not 
defined.
...

and so on for every wide-char function in wchar.h.

The definition of _XOPEN_SOURCE at the top of setlocale.cpp is not entirely 
correct in that it interacts with the various guards in the C library headers. 
AFAICT, Oracle/Sun wchar.h includes C library wchar.h, which includes C library 
wctype.h, which includes again wchar.h, coming full circle to the compiler 
wchar.h which uses the names w/o them being defined, yet. The fact that 
Oracle/Sun headers include_next corresponding C library headers outside of a 
inclusion guard does not help.


(Debug builds are not affected because of interaction from the difference in the 
structure and includes of rw/_traits.h in debug vs. optimized builds).


The poor man's fix is to guard that _XOPEN_SOURCE define for SUNPro builds (see 
below). However, I am not sure whose side the error is.


Thanks,
Liviu

PS. Is it still called SUNPro? Oracles seems it has sanitized that name out on 
their website.


Index: src/setlocale.cpp
===
--- src/setlocale.cpp   (revision 1388733)
+++ src/setlocale.cpp   (working copy)
@@ -34,8 +34,10 @@
 #include rw/_defs.h

 #if defined (__linux__)  !defined (_XOPEN_SOURCE)
+#  if !defined (__SUNPRO_CC)
// need S_IFDIR on Linux
 #  define _XOPEN_SOURCE
+#  endif // !__SUNPRO_CC
 #endif   // __linux__  !_XOPEN_SOURCE

 #include locale.h   // for setlocale()