RE: STDCXX-1056 : numpunct fix
-Original Message- From: Stefan Teleman Sent: Thursday, September 20, 2012 6:00 PM To: dev@stdcxx.apache.org Cc: Liviu Nicoara Subject: Re: STDCXX-1056 : numpunct fix On Thu, Sep 20, 2012 at 8:44 PM, C. Bergström cbergst...@pathscale.com wrote: I do have a program which triggers the race conditions: 22.locale.numpunct.mt. Part of the official test harness. I don't think anyone is claiming that there isn't a thread safety problem in the locales. I'm open to the possibility, and I'm willing to review changes that fix the problem, but I want some understanding of what the problem actually is and how the changes actually address the problem before signing off. I also find it hard to believe the changes to the facet cache (and many of the other changes) are necessary to eliminate any data races that were reported by these tools. Changing the cache size from 8 to 64 is _clearly_ not going to fix a data race. It might reduce the likelihood of one occurring, but it isn't going to fix it. I also fail to understand how not reducing the size of the cache fixes a data race either. These changes all appear to be optimizations, and as such, should _not_ be included in changes to fix thread-safety issues. Perhaps we could go about this a different way. Test your code without the changes to the facet cache changes in _C_manage and report back on what you find. Or, perhaps more simply, provide minimal diffs (as required by the patching process) to fix the data race issues that you've outlined. The real reason why they don't want to accept what the instrumentation tools are saying is very simple: they don't LIKE reading what the tools are saying. So, blame the tools, pretend that as long as it doesn't crash again there's no bug and hope for the best. My previous e-mail attempted to indicate that I don't trust the output that these tools have provided. I have no understanding of how the tool managed to find a data race in __rw_allocate (or _C_manage). Do you? Doesn't this output give you some doubt about the accuracy of the tool? I have no problem using a tool to make my job easier. Sometimes those tools are wrong. The two cases mentioned above call into question the accuracy of the tool being used, and it is important to understand why those failures are being seen (or why I am wrong about the given code being safe). But I am very glad this is on a public mailing list, so everyone can read what's going on here. As am I. I've raised concerns about the tools you're using to detect problems. I've pointed out what I believe to be two false positives, and you've accused me of being incompetent and afraid of what the tools are saying. You've offered no explanation about these apparent false positives and you've not provided any explanation for why you don't see similar false positives when you run the code on the patched library. I've raised concerns about some of the changes you've provided being unnecessary to resolve the issue at hand (a thread safety issue) and labeled them as optimizations (your original e-mail seemed to indicate you were under the same impression). As Liviu pointed out, the bug fix changes should be handled separately from the optimization. I wholeheartedly agree. You called out premature optimization as evil, in a discussion about patches you provided that include optimizations and no testcase showing that your changes are not premature and provide measureable benefit. Then you rail on... Then, to top it off, you go on to call me a know-it-all who isn't capable of figuring out my own bugs. I'm sorry, but that isn't acceptable. Travis
Re: STDCXX-1056 : numpunct fix
On Fri, Sep 21, 2012 at 2:28 AM, Travis Vitek travis.vi...@roguewave.com wrote: You called out premature optimization as evil, in a discussion about patches you provided that include optimizations and no testcase showing that your changes are not premature and provide measureable benefit. Then you rail on... I didn't call premature optimization evil. Donald Knuth did. If you disagree with him, please feel free to let him know. He's on faculty at Stanford. Then, to top it off, you go on to call me a know-it-all who isn't capable of figuring out my own bugs. I'm sorry, but that isn't acceptable Too bad if you feel that way. Next time you get the idea of making snide remarks about my working knowledge of the Standard C Library, or offer out-of-context one-line code fragments completely unrelated to what this 1056 bug is about, maybe you'll think twice. And this isn't the first time you offer gratuitous snide remarks directed at me. You are one of the deniers of the existence of this thread safety problem in the facets code, going back to early February of this year. Between the release of stdcxx 4.2.1 in 2008 and the beginning of this month, when the possibility of this thread safety problem was finally acknowledged, did you really not know that 22.locale.numpunct.mt and 22.locale.moneypunct.mt have been crashing or getting stuck? Did you really not know that these crashes were typical symptoms of race conditions? I find that very hard to believe, given that the problem has been reported several times before February of this year. I have provided this list with test results showing that my patch *does* fix the race condition problems identified by all the tools at my disposal. I'm willing to bet you never looked at it. You dismissed it outright, just because you didn't like the *idea* that increasing the size of the cache, and eliminating that useless cache resizing would play an important role in eliminating the race conditions. I have yet to see an alternative patch being proposed here, which would eliminate the race conditions, and which I am willing to test just as thoroughly as I've tested mine. The only thing i've seen are continued attempts at dismissing the existence of these race conditions, as reported by the instrumentation tools, based on some general axioms about their accuracy. No-one on this list has a clue as to how the SunPro Thread Analyzer actually works, because it's not open source, and none of you work at Oracle, therefore you can't see the code. But everyone just *knows* that it's inaccurate, or that it reports false positives. As long as you, or anyone else, continues to blame the instrumentation tools for the bug, and as long as anyone here continues to suggest that the only acceptable proof of the existence of this bug is some other program which needs to be written using fprintf(3C), and as long as no-one here is willing to provide an alternative patch which demonstrably eliminates 100% of the reported race conditions, this entire back-and-forth about the existence of these race conditions, the accuracy of the tools and what they are reporting is nothing but a giant waste of time. -- Stefan Teleman KDE e.V. stefan.tele...@gmail.com
Re: STDCXX-1056 : numpunct fix
On 09/21/12 05:13, Stefan Teleman wrote: On Fri, Sep 21, 2012 at 2:28 AM, Travis Vitek travis.vi...@roguewave.com wrote: I have provided this list with test results showing that my patch *does* fix the race condition problems identified by all the tools at my disposal. I'm willing to bet you never looked at it. You dismissed it outright, just because you didn't like the *idea* that increasing the size of the cache, and eliminating that useless cache resizing would play an important role in eliminating the race conditions. I looked at it in great detail and I am sure Travis read it too. The facet/locale buffer management is a red herring. Apparently, we cannot convince you, regardless of the argument. That's fine by me. I have yet to see an alternative patch being proposed here, which would eliminate the race conditions, and which I am willing to test just as thoroughly as I've tested mine. The only thing i've seen are continued attempts at dismissing the existence of these race conditions, as reported by the instrumentation tools, based on some general axioms about their accuracy. No-one on this list has a clue as to how the SunPro Thread Analyzer actually works, because it's not open source, and none of you work at Oracle, therefore you can't see the code. But everyone just *knows* that it's inaccurate, or that it reports false positives. We are not dismissing the analysis. We are dismissing your interpretation of it. The tool is not indicating a defect, but a potential defect. I am calling it potential because, as my test case (http://tinyurl.com/97ks9gc) indicates, one can have a race condition which is not a defect in a concurrent environment. As long as you, or anyone else, continues to blame the instrumentation tools for the bug, and as long as anyone here continues to suggest that the only acceptable proof of the existence of this bug is some other program which needs to be written using fprintf(3C), and as long as no-one here is willing to provide an alternative patch which demonstrably eliminates 100% of the reported race conditions, this entire back-and-forth about the existence of these race conditions, the accuracy of the tools and what they are reporting is nothing but a giant waste of time. The correction of 1056 does not necessarily need to eliminate the analyzer warnings. Aiming for that is simply unreasonable, not because it is not attainable (after all, you demonstrated it), but because it is inappropriately and insufficiently argued for the situation at hand. Liviu
RE: STDCXX-1056 : numpunct fix
-Original Message- From: Stefan Teleman [mailto:stefan.tele...@gmail.com] Sent: Friday, September 21, 2012 2:14 AM To: dev@stdcxx.apache.org Subject: Re: STDCXX-1056 : numpunct fix On Fri, Sep 21, 2012 at 2:28 AM, Travis Vitek travis.vi...@roguewave.com wrote: You called out premature optimization as evil, in a discussion about patches you provided that include optimizations and no testcase showing that your changes are not premature and provide measureable benefit. Then you rail on... I didn't call premature optimization evil. Donald Knuth did. If you disagree with him, please feel free to let him know. He's on faculty at Stanford. I never said I disagree with the Knuth quote. I just said you have to apply it consistently. Is your change an optimization or not? If it is, then is it a premature optimization? Do we have a test case (or test results) to indicate that there is a discernible performance improvement? You are one of the deniers of the existence of this thread safety problem in the facets code, going back to early February of this year. Again from my previous e-mail. quote I don't think anyone is claiming that there isn't a thread safety problem in the locales. I'm open to the possibility, and I'm willing to review changes that fix the problem, but I want some understanding of what the problem actually is and how the changes actually address the problem before signing off. /quote Between the release of stdcxx 4.2.1 in 2008 and the beginning of this month, when the possibility of this thread safety problem was finally acknowledged, did you really not know that 22.locale.numpunct.mt and 22.locale.moneypunct.mt have been crashing or getting stuck? Did you really not know that these crashes were typical symptoms of race conditions? I find that very hard to believe, given that the problem has been reported several times before February of this year. I have provided this list with test results showing that my patch *does* fix the race condition problems identified by all the tools at my disposal. I'm willing to bet you never looked at it. You dismissed it outright, just because you didn't like the *idea* that increasing the size of the cache, and eliminating that useless cache resizing would play an important role in eliminating the race conditions. I looked at every line of your proposed patch _before_ my last e-mail. Despite the fact that you didn't provide it in the expected format (diff) and it included many changes that are unrelated and make little sense. I have yet to see an alternative patch being proposed here, which would eliminate the race conditions, and which I am willing to test just as thoroughly as I've tested mine. The only thing i've seen are continued attempts at dismissing the existence of these race conditions I'm _not_ denying the existence of the problem. Please re-read all of my correspondence on this issue. If you don't believe me after that, re-read it again.
SUNPro 5.12 compilation failure in optimized Linux builds
Optimized (but not debug) builds fail to compile setlocale.cpp with the error: $ cat t.cpp; CC -c t.cpp #define _XOPEN_SOURCE #include cwchar /opt/sunpro/12_3/prod/include/cc/wchar.h, line 90: Error: tm is not defined. /opt/sunpro/12_3/prod/include/cc/wchar.h, line 92: Error: fgetwc is not defined. ... and so on for every wide-char function in wchar.h. The definition of _XOPEN_SOURCE at the top of setlocale.cpp is not entirely correct in that it interacts with the various guards in the C library headers. AFAICT, Oracle/Sun wchar.h includes C library wchar.h, which includes C library wctype.h, which includes again wchar.h, coming full circle to the compiler wchar.h which uses the names w/o them being defined, yet. The fact that Oracle/Sun headers include_next corresponding C library headers outside of a inclusion guard does not help. (Debug builds are not affected because of interaction from the difference in the structure and includes of rw/_traits.h in debug vs. optimized builds). The poor man's fix is to guard that _XOPEN_SOURCE define for SUNPro builds (see below). However, I am not sure whose side the error is. Thanks, Liviu PS. Is it still called SUNPro? Oracles seems it has sanitized that name out on their website. Index: src/setlocale.cpp === --- src/setlocale.cpp (revision 1388733) +++ src/setlocale.cpp (working copy) @@ -34,8 +34,10 @@ #include rw/_defs.h #if defined (__linux__) !defined (_XOPEN_SOURCE) +# if !defined (__SUNPRO_CC) // need S_IFDIR on Linux # define _XOPEN_SOURCE +# endif // !__SUNPRO_CC #endif // __linux__ !_XOPEN_SOURCE #include locale.h // for setlocale()