Re: STDCXX-1056 : numpunct fix
On 09/24/12 22:28, Martin Sebor wrote: On 09/20/2012 06:46 PM, Stefan Teleman wrote: On Thu, Sep 20, 2012 at 8:39 PM, Liviu Nicoaranikko...@hates.ms wrote: I have not created this requirement out of thin air. STDCXX development has functioned in this manner for as long as I remember. If it does not suit you, that's fine. That would explain why these bugs are present in the first place. If the official method of determining thread-safety here is fprintf(3C), then we have a much bigger problem than 22.locale.numpunct.mt. Tests that end in .mt.cpp use the RW_ASSERT() macro to verify correctness in multithreaded programs. There is no way at the moment to run a thread analyzer as part of the test suite, although it would be a nice addition. As I said in my other reply, though, not all data races necessarily indicate bugs, so we'd need a way to distinguish between know/benign races and the rest. (Unless we decide to eliminate all races and accept the performance penalty.) Preserving the lazy initialization is possible: we perfectly forward to the implementation with either: 1. no additional synchronization (preserves fast reads of the data), while fully knowing that there are data races in __rw_get_numpunct/_C_get_data there, or 2.we put a big lock on top of all and every facet operations (the version that would be race free). Caching seems out of the window for now, more so in the presence of unguarded reads of facet's data. The only argument for (1) is that at the moment we don't have a failing test yet, and not for lack of arduous trying. However, if atomic/membar APIs would be recognized by the compiler, lazy initialization of the facet and caching of data would be attainable in a quite simple fashion. Liviu
Re: STDCXX-1056 : numpunct fix
On 09/24/12 23:50, Stefan Teleman wrote: On Mon, Sep 24, 2012 at 10:03 PM, Martin Sebor mse...@gmail.com wrote: FWIW, there are race conditions in stdcxx. Some of them are by design and benign on the systems the library runs on (although I acknowledge that some others may be bugs). One such benign date race is: timeT1 T2 0x = N 1x = N read x where x is a scalar that can be accessed atomically by the CPU and the compiler. I think some of the lazy facet initialization falls under this class. It would be nice to fix them but only if it doesn't slow things down. The others need to be fixed in any case. The race conditions I am talking about are not benign. Caching failures aside, we have no failing tests. Before saying they are malign you need to objectively show a failing program. It shows the types of race conditions it's reporting: these are read/write race conditions, not read/read. The thread analyzer's html filter doesn't show the types of races reported as clearly as the command-line analyzer which has a windowing GUI. Martin's example above is a read-write race. In the absence of a failing test case it is quite unreasonable to modify the current implementation. Liviu
STDCXX-1070
I filed 1070, failure to build 22.locale.collate.cpp on Linux with gcc 4.7.1. Gcc, Comeau and Clang fail to compile it, Intel and Sun are fine. It looks to me like Intel and Sun compilers are not doing the right thing. A small test case and a patch have been attached. The failing code has been reduced to: $ cat test.cpp; g++ -c test.cpp template class charT void f () { g (charT ('a')); } template class charT void g (charT) { } int h () { return f char (), 0; } test.cpp: In instantiation of 'void f() \[with charT = char\]': test.cpp:14:23: required from here test.cpp:4:5: error: 'g' was not declared in this scope, and no declarations were found by argument-dependent lookup at the point of instantiation \[-fpermissive\] test.cpp:8:6: note: 'templateclass charT void g(charT)' declared here, later in the translation unit Liviu
Re: STDCXX-1070
On 09/25/2012 02:46 PM, Liviu Nicoara wrote: I filed 1070, failure to build 22.locale.collate.cpp on Linux with gcc 4.7.1. Gcc, Comeau and Clang fail to compile it, Intel and Sun are fine. It looks to me like Intel and Sun compilers are not doing the right thing. A small test case and a patch have been attached. The failing code has been reduced to: I agree that the test case is ill-formed and requires a diagnostic. I haven't looked at the test so just to make sure the test case does reflect the problem there: the well-formedness depends on whether charT is a class type. If so, then the code would be okay because g() would be found via associated namespace lookup. I.e., this would be fine: template class charT void f () { g (charT ()); } template class charT void g (charT) { } struct S { }; void h () { fS(); } Martin $ cat test.cpp; g++ -c test.cpp template class charT void f () { g (charT ('a')); } template class charT void g (charT) { } int h () { return f char (), 0; } test.cpp: In instantiation of 'void f() \[with charT = char\]': test.cpp:14:23: required from here test.cpp:4:5: error: 'g' was not declared in this scope, and no declarations were found by argument-dependent lookup at the point of instantiation \[-fpermissive\] test.cpp:8:6: note: 'templateclass charT void g(charT)' declared here, later in the translation unit Liviu
Re: [jira] [Closed] (STDCXX-1056) std::moneypunct and std::numpunct implementations are not thread-safe
On 9/25/12 7:56 PM, Stefan Teleman (JIRA) wrote: [ https://issues.apache.org/jira/browse/STDCXX-1056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan, I don't think it's ok to close this bug. The race conditions are there and we have not come to a completely satisfactory conclusion on how to deal with them, or even if we should deal with them. Whichever it is we gotta keep this discussion open. I sure hope you want to be a part of it. FWIW, I have spent quite some time looking at your proposed patch and I am going to reopen the incident. If I can. Liviu Stefan Teleman closed STDCXX-1056. -- Resolution: Won't Fix Bug will not be fixed. Upstream refuses to acknowledge the existence of the bug in spite of objective evidence to the contrary. std::moneypunct and std::numpunct implementations are not thread-safe - Key: STDCXX-1056 URL: https://issues.apache.org/jira/browse/STDCXX-1056 Project: C++ Standard Library Issue Type: Bug Components: 22. Localization Affects Versions: 4.2.1, 4.2.x, 4.3.x, 5.0.0 Environment: Solaris 10 and 11, RedHat and OpenSuSE Linux, Sun C++ Compilers 12.1, 12.2, 12.3 Issue is independent of platform and/or compiler. Reporter: Stefan Teleman Labels: thread-safety Fix For: 4.2.x, 4.3.x, 5.0.0 Attachments: 22.locale.numpunct.mt.out, facet.cpp.diff, locale_body.cpp.diff, punct.cpp.diff, runtests-linux32-all.out, runtests-linux64-all.out, runtests.out, STDCXX-1056-additional-timings.tgz, stdcxx-1056.patch, stdcxx-1056-timings.tgz, stdcxx-4.2.x-numpunct-perfect-forwarding.patch, stdcxx-4.3.x-numpunct-perfect-forwarding.patch several member functions in std::moneypunct and std::numpunct return a std::string by value (as required by the Standard). The implication of return-by-value being that the caller owns the returned object. In the stdcxx implementation, the std::basic_string copy constructor uses a shared underlying buffer implementation. This shared buffer creates the first problem for these classes: although the std::string object returned by value *appears* to be owned by the caller, it is, in fact, not. In a mult-threaded environment, this underlying shared buffer can be subsequently modified by a different thread than the one who made the initial call. Furthermore, two or more different threads can access the same shared buffer at the same time, and modify it, resulting in undefined run-time behavior. The cure for this defect has two parts: 1. the member functions in question must truly return a copy by avoiding a call to the copy constructor, and using a constructor which creates a deep copy of the std::string. 2. access to these member functions must be serialized, in order to guarantee atomicity of the creation of the std::string being returned by value. Patch for 4.2.1 to follow. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
Re: [jira] [Closed] (STDCXX-1056) std::moneypunct and std::numpunct implementations are not thread-safe
Hi guys, I think the consensus about this bug is that everybody wants to fix it. Stefan has a patch but I got completely lost what is the exact reason of not applying it. It would be a shame to stop at this point. Liviu Nicoara nikko...@hates.ms writes: On 9/25/12 7:56 PM, Stefan Teleman (JIRA) wrote: [ https://issues.apache.org/jira/browse/STDCXX-1056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan, I don't think it's ok to close this bug. The race conditions are there and we have not come to a completely satisfactory conclusion on how to deal with them, or even if we should deal with them. Whichever it is we gotta keep this discussion open. I sure hope you want to be a part of it. FWIW, I have spent quite some time looking at your proposed patch and I am going to reopen the incident. If I can. Liviu Stefan Teleman closed STDCXX-1056. -- Resolution: Won't Fix Bug will not be fixed. Upstream refuses to acknowledge the existence of the bug in spite of objective evidence to the contrary. std::moneypunct and std::numpunct implementations are not thread-safe - Key: STDCXX-1056 URL: https://issues.apache.org/jira/browse/STDCXX-1056 Project: C++ Standard Library Issue Type: Bug Components: 22. Localization Affects Versions: 4.2.1, 4.2.x, 4.3.x, 5.0.0 Environment: Solaris 10 and 11, RedHat and OpenSuSE Linux, Sun C++ Compilers 12.1, 12.2, 12.3 Issue is independent of platform and/or compiler. Reporter: Stefan Teleman Labels: thread-safety Fix For: 4.2.x, 4.3.x, 5.0.0 Attachments: 22.locale.numpunct.mt.out, facet.cpp.diff, locale_body.cpp.diff, punct.cpp.diff, runtests-linux32-all.out, runtests-linux64-all.out, runtests.out, STDCXX-1056-additional-timings.tgz, stdcxx-1056.patch, stdcxx-1056-timings.tgz, stdcxx-4.2.x-numpunct-perfect-forwarding.patch, stdcxx-4.3.x-numpunct-perfect-forwarding.patch several member functions in std::moneypunct and std::numpunct return a std::string by value (as required by the Standard). The implication of return-by-value being that the caller owns the returned object. In the stdcxx implementation, the std::basic_string copy constructor uses a shared underlying buffer implementation. This shared buffer creates the first problem for these classes: although the std::string object returned by value *appears* to be owned by the caller, it is, in fact, not. In a mult-threaded environment, this underlying shared buffer can be subsequently modified by a different thread than the one who made the initial call. Furthermore, two or more different threads can access the same shared buffer at the same time, and modify it, resulting in undefined run-time behavior. The cure for this defect has two parts: 1. the member functions in question must truly return a copy by avoiding a call to the copy constructor, and using a constructor which creates a deep copy of the std::string. 2. access to these member functions must be serialized, in order to guarantee atomicity of the creation of the std::string being returned by value. Patch for 4.2.1 to follow. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira -- Wojciech Meyer http://danmey.org
Re: [jira] [Closed] (STDCXX-1056) std::moneypunct and std::numpunct implementations are not thread-safe
On Tue, Sep 25, 2012 at 8:05 PM, Liviu Nicoara nikko...@hates.ms wrote: On 9/25/12 7:56 PM, Stefan Teleman (JIRA) wrote: [ https://issues.apache.org/jira/browse/STDCXX-1056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan, I don't think it's ok to close this bug. The race conditions are there and we have not come to a completely satisfactory conclusion on how to deal with them, or even if we should deal with them. Whichever it is we gotta keep this discussion open. I sure hope you want to be a part of it. FWIW, I have spent quite some time looking at your proposed patch and I am going to reopen the incident. If I can. I am done wasting my time on trying to convince this project that it should fix its severe bugs. If and when this project decides to abandon the adolescent attitude which appears to be one of its primary characteristics, let me know. -- Stefan Teleman KDE e.V. stefan.tele...@gmail.com
Re: [jira] [Closed] (STDCXX-1066) SPARCV8 requires pthread_mutex_t and pthread_cond_t to be aligned on an 8-byte boundary
On 9/25/12 7:56 PM, Stefan Teleman (JIRA) wrote: [ https://issues.apache.org/jira/browse/STDCXX-1066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anybody around here, except Stefan, who has access to a SPARC V8 machine updated to the specified kernel update or later, and who is willing to run a simple test program? It's a 5 minute job at most. Thanks! Stefan Teleman closed STDCXX-1066. -- Resolution: Won't Fix Bug will not be fixed upstream. It is fixed in the Solaris releases. SPARCV8 requires pthread_mutex_t and pthread_cond_t to be aligned on an 8-byte boundary --- Key: STDCXX-1066 URL: https://issues.apache.org/jira/browse/STDCXX-1066 Project: C++ Standard Library Issue Type: Bug Components: Thread Safety Affects Versions: 4.2.1, 4.2.2, 4.2.x, 4.3.x Environment: Solaris 10 Update 6 or later on SPARCV8 [32-bit] Defect is compiler-independent - in reality it only affects Sun Studio and GCC. Reporter: Stefan Teleman Labels: features, runtime, threads Fix For: 4.2.2, 4.2.x, 4.3.x Attachments: _config-gcc.h.stdcxx-1066.patch, _config-sunpro.h.stdcxx-1066.patch, ctype.cpp.stdcxx-1066.patch, exception.cpp.stdcxx-1066.patch, ios.cpp.stdcxx-1066.patch, iostream.cpp.stdcxx-1066.patch, iostream.stdcxx-1066.patch, locale_body.cpp.stdcxx-1066.patch, locale_classic.cpp.stdcxx-1066.patch, messages.cpp.stdcxx-1066.patch, _mutex.h.stdcxx-1066.patch, time_put.cpp.stdcxx-1066.patch, use_facet.h.stdcxx-1066.patch Starting with Solaris 10 Update 6, on SPARCV8, pthread_mutex_t and pthread_cond_t MUST be aligned on an 8-byte boundary. Misaligned access will result in either SEGV or SIGBUS. There are numerous places in the multi-threaded version of stdcxx where pthread_mutex_t and/or pthread_cond_t types are contained within an union, but with an enforced alignment different than 8. All these instances must be corrected, and #ifdef-guarded for SPARCV8. Patches to follow shortly, this is just opening the issue. Warning: the patchset resolving this issue is very large, and it affects a large number of files. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
Re: STDCXX-1056 : numpunct fix
On 9/24/12 11:06 PM, Stefan Teleman wrote: On Mon, Sep 24, 2012 at 7:48 PM, Liviu Nicoara nikko...@hates.ms wrote: Stefan, was it your intention to completely eliminate all the race conditions with this last patch? Is this what the tools showed in your environment? https://issues.apache.org/jira/browse/STDCXX-1056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13461452#comment-13461452 Yes, all the race conditions in std::numpunct and std::moneypunct. Not all the race conditions in stdcxx. FWIW, I have analyzed the performance of the latest patch from Stefan. The test program is the same test program used in the timings attachments to the incident. The version of the library was 4.2.1, compiler gcc 4.7.1, with: 1. The simple, perfect forwarding patch, with no caching, which does eliminate race conditions. 2. Stefan's latest patch 3. A patch I created which truly, if crudely, eliminates all facet race conditions by using a top-level lock for all facet operations(*) The numbers from the first test run are already available in the incident. The numbers for the second test run were a bit bigger, which is expected because of the additional synchronization. E.g., I have found that the test, run against the STDCXX locale database, gives: $ time ./t en_US.UTF-8 4 500 4, 500 real0m19.210s user0m32.659s sys 0m36.507s where the first patch gives: $ time ./t en_US.UTF-8 4 5000; done 4, 5000 real0m7.961s user0m31.211s sys 0m0.003s Notice the number of loops, 10x larger in the second set of number. Now, for the third patch, the numbers are embarrassingly large. It practically slows the program to a crawl because it does not do any reads of facet data without holding the lock. This, I wager, eliminates all races in the numpunct/moneypunct facet. See: $ time ./t en_US.UTF-8 4 10 4, 10 real0m24.316s user0m28.213s sys 0m6.633s Now that is a whooping performance hit. I went back to see an explanation between the numbers I see with Stefan's more performing patch, and my lock-all patch. I believe that there are still places in Stefan's patch where the facet makes unguarded reads of facet data, e.g.: loc/_numpunct.h: 190 // returns a pointer to the facet's implementation data 191 // if it exists, 0 otherwise 192 const void* _C_data () const { 193 return _C_impsize ? _C_impdata 194 : _RWSTD_CONST_CAST (__rw_facet*, this)-_C_get_data (); 195 } where _C_impsize/_C_impdata are read outside of a lock scope, etc. I.e., a thread may read the _C_impsize while, say, another is still writing the _C_impdata member variable. I conclude, even with such simple testing, that any solution that attempts to completely eliminate the race conditions in these facets will incur a penalty, with the safest (top-level locking) the worst by orders of magnitude and any other solution somewhere in between. Please let me know if you have any questions. Liviu (*) Will attach it as soon as we figure out how to undo the closing of the issue.
Re: [jira] [Closed] (STDCXX-1056) std::moneypunct and std::numpunct implementations are not thread-safe
On 09/25/2012 06:08 PM, Wojciech Meyer wrote: Hi guys, I think the consensus about this bug is that everybody wants to fix it. Stefan has a patch but I got completely lost what is the exact reason of not applying it. It would be a shame to stop at this point. From my POV, there's no disagreement that there is a bug in the thread safety of the facets or its cause, or a lack of desire to fix it. I think the main reason for the lack of consensus on a patch is that we don't yet have a good sense of the performance impact of the proposed approaches or an understanding of the differences in the timings. As for Stefan's patches, the biggest problem I see there (besides the above) is that they introduce changes that appear to be unrelated to the problem without providing a satisfactory rationale. The established stdcxx bug fixing process is to create as small a patch as possible that does nothing but address the problem. This is particularly important when we don't have ready access to test the patch with all the targeted compilers and on all the targeted operating systems (to minimize the risk of incidental breakage and to make it as easy as possible to root cause possible regressions. Martin Liviu Nicoara nikko...@hates.ms writes: On 9/25/12 7:56 PM, Stefan Teleman (JIRA) wrote: [ https://issues.apache.org/jira/browse/STDCXX-1056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan, I don't think it's ok to close this bug. The race conditions are there and we have not come to a completely satisfactory conclusion on how to deal with them, or even if we should deal with them. Whichever it is we gotta keep this discussion open. I sure hope you want to be a part of it. FWIW, I have spent quite some time looking at your proposed patch and I am going to reopen the incident. If I can. Liviu Stefan Teleman closed STDCXX-1056. -- Resolution: Won't Fix Bug will not be fixed. Upstream refuses to acknowledge the existence of the bug in spite of objective evidence to the contrary. std::moneypunct and std::numpunct implementations are not thread-safe - Key: STDCXX-1056 URL: https://issues.apache.org/jira/browse/STDCXX-1056 Project: C++ Standard Library Issue Type: Bug Components: 22. Localization Affects Versions: 4.2.1, 4.2.x, 4.3.x, 5.0.0 Environment: Solaris 10 and 11, RedHat and OpenSuSE Linux, Sun C++ Compilers 12.1, 12.2, 12.3 Issue is independent of platform and/or compiler. Reporter: Stefan Teleman Labels: thread-safety Fix For: 4.2.x, 4.3.x, 5.0.0 Attachments: 22.locale.numpunct.mt.out, facet.cpp.diff, locale_body.cpp.diff, punct.cpp.diff, runtests-linux32-all.out, runtests-linux64-all.out, runtests.out, STDCXX-1056-additional-timings.tgz, stdcxx-1056.patch, stdcxx-1056-timings.tgz, stdcxx-4.2.x-numpunct-perfect-forwarding.patch, stdcxx-4.3.x-numpunct-perfect-forwarding.patch several member functions in std::moneypunct and std::numpunct return a std::string by value (as required by the Standard). The implication of return-by-value being that the caller owns the returned object. In the stdcxx implementation, the std::basic_string copy constructor uses a shared underlying buffer implementation. This shared buffer creates the first problem for these classes: although the std::string object returned by value *appears* to be owned by the caller, it is, in fact, not. In a mult-threaded environment, this underlying shared buffer can be subsequently modified by a different thread than the one who made the initial call. Furthermore, two or more different threads can access the same shared buffer at the same time, and modify it, resulting in undefined run-time behavior. The cure for this defect has two parts: 1. the member functions in question must truly return a copy by avoiding a call to the copy constructor, and using a constructor which creates a deep copy of the std::string. 2. access to these member functions must be serialized, in order to guarantee atomicity of the creation of the std::string being returned by value. Patch for 4.2.1 to follow. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira -- Wojciech Meyer http://danmey.org
Re: [jira] [Closed] (STDCXX-1066) SPARCV8 requires pthread_mutex_t and pthread_cond_t to be aligned on an 8-byte boundary
I asked around and we don't have one available at this time. Travis Liviu Nicoara wrote: On 9/25/12 7:56 PM, Stefan Teleman (JIRA) wrote: [ https://issues.apache.org/jira/browse/STDCXX-1066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anybody around here, except Stefan, who has access to a SPARC V8 machine updated to the specified kernel update or later, and who is willing to run a simple test program? It's a 5 minute job at most. Thanks! Stefan Teleman closed STDCXX-1066. -- Resolution: Won't Fix Bug will not be fixed upstream. It is fixed in the Solaris releases. SPARCV8 requires pthread_mutex_t and pthread_cond_t to be aligned on an 8-byte boundary --- Key: STDCXX-1066 URL: https://issues.apache.org/jira/browse/STDCXX-1066 Project: C++ Standard Library Issue Type: Bug Components: Thread Safety Affects Versions: 4.2.1, 4.2.2, 4.2.x, 4.3.x Environment: Solaris 10 Update 6 or later on SPARCV8 [32-bit] Defect is compiler-independent - in reality it only affects Sun Studio and GCC. Reporter: Stefan Teleman Labels: features, runtime, threads Fix For: 4.2.2, 4.2.x, 4.3.x Attachments: _config-gcc.h.stdcxx-1066.patch, _config-sunpro.h.stdcxx-1066.patch, ctype.cpp.stdcxx-1066.patch, exception.cpp.stdcxx-1066.patch, ios.cpp.stdcxx-1066.patch, iostream.cpp.stdcxx-1066.patch, iostream.stdcxx-1066.patch, locale_body.cpp.stdcxx-1066.patch, locale_classic.cpp.stdcxx-1066.patch, messages.cpp.stdcxx-1066.patch, _mutex.h.stdcxx-1066.patch, time_put.cpp.stdcxx-1066.patch, use_facet.h.stdcxx-1066.patch Starting with Solaris 10 Update 6, on SPARCV8, pthread_mutex_t and pthread_cond_t MUST be aligned on an 8-byte boundary. Misaligned access will result in either SEGV or SIGBUS. There are numerous places in the multi-threaded version of stdcxx where pthread_mutex_t and/or pthread_cond_t types are contained within an union, but with an enforced alignment different than 8. All these instances must be corrected, and #ifdef-guarded for SPARCV8. Patches to follow shortly, this is just opening the issue. Warning: the patchset resolving this issue is very large, and it affects a large number of files. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira