Re: [VOTE] discontinue supporting of the MSVC 7.0
It was about time. +1 Tim Adams wrote: Our experience with our customer base indicates that MSVC 7.0 was abandoned quickly in favor of 7.1. So my vote is also +1 in favor of dropping 7.0. -- Tim -Original Message- From: Martin Sebor [mailto:[EMAIL PROTECTED] On Behalf Of Martin Sebor Sent: Saturday, January 26, 2008 10:39 AM To: dev@stdcxx.apache.org Subject: Re: [VOTE] discontinue supporting of the MSVC 7.0 Farid Zaripov wrote: For now we are supporting four versions of the MSVC starting from 7.0. The MSVC 7.0 doesn't supports the modern C++ features and I propose to discontinue supporting of this compiler in stdcxx 4.3 version. Everyone is encouraged to vote, including non-committers. This vote will close on Tuesday, 01/29 at 6:00 AM US/Mountain time. Follow the link for the countdown: http://tinyurl.com/23clxd +1 in favor of dropping MSVC 7.0 in 4.3. Martin Farid.
Assistance with account reset
Hi, I need assistance with the resetting of my account, lnicoara at apache. I cannot reset my password via https://id.apache.org because I do not have the backup email account anymore, where the reset information is sent. Thanks in advance. Liviu Nicoara
Re: New chair and/or attic
On 08/29/12 10:54, Jim Jagielski wrote: Looking over the lack of activity within this project, it's obvious (at least to me), that maybe its day is done. Should I call a vote to move C++ to the Attic? Or is there someone who feels that the project should still exist *and* is willing to stand as chair? Hi Jim, The discussion back in February showed that, even though committers have not spent much time lately contributing new code to it, there is an active review of the activity occurring on the mailing list and people have volunteered time to at least review outside contributions. As Stefan remarked, putting it in the Attic pretty much closes the activity around it, as little as it is. I personally have a renewed interest in the implementation and am in the process of reviving my apache account with the intention of being a constant presence here, and I hope I will be able to contribute as well. I am not sure if anyone reviewed the patches volunteered by Stefan yet, or the changes in forks elsewhere, but I am currently looking at that, too. Thanks. Liviu
Re: New chair and/or attic
On 08/30/12 06:38, Jim Jagielski wrote: On Aug 29, 2012, at 1:12 PM, Liviu Nicoaranikko...@hates.ms wrote: The discussion back in February showed that, even though committers have not spent much time lately contributing new code to it, there is an active review of the activity occurring on the mailing list and people have volunteered time to at least review outside contributions. As Stefan remarked, putting it in the Attic pretty much closes the activity around it, as little as it is. The issue is that I'm not seeing any real activity on any of the mailing lists... And probably, even with the stated interest, the activity will continue to stay low a while. FWIW, I have spent my past few days catching up with the changes since '08 and refreshing on the build and test infrastructure, etc. Not much of a mailing list activity generator. Liviu
Re: New chair and/or attic
On 08/30/12 06:48, C. Bergström wrote: I'm sincerely sorry to ask this and I have my own answers, but why continue STDCXX when such negativity from Apache is apparent.. AFAICT, the Apache Foundation has been a good host for STDCXX during these years. They have provided a framework for STDCXX to function in as well as an infrastructure for its daily activities. All in accordance to their principles about what constitutes a healthy software project. or Why not move to libc++? (Yes I realize the amount of effort involved here) It can't be explained. L
Re: New chair and/or attic
On 08/30/12 08:56, C. Bergström wrote: On 08/30/12 07:29 PM, Liviu Nicoara wrote: AFAICT, the Apache Foundation has been a good host for STDCXX during these years. They have provided a framework [...] in accordance to their principles about what constitutes a healthy software project. I disagree that the recent actions have fostered positive growth in the project. 1) They fired the previous PMC - who was by far the most invested and dedicated person to the project. I don't care if he missed some reports or had a few flippant comments - I think it was pretty stupid (I mean he's part of the C++ standard committee) Again, according to their principles on what is and what is not a healthy project. I have not yet regained access to my committer account so I am not fully aware of the private discussions around the PMC switch. L
New committers?
I see in the February report (http://stdcxx.apache.org/status/2012-02.txt) that three new committers have been added to the project. Congratulations! Could one of you please update the stdcxx list of committers? Thanks. L
Re: New committers?
On 08/30/12 11:17, Stefan Teleman wrote: [...] I don't mean to punt but I think Jim Jagielski maintains a separate link with the correct list of committers: I don't see any difference between the two, either. I'll leave it at that. L
Re: New chair and/or attic
On Aug 30, 2012, at 5:45 PM, C. Bergström cbergst...@pathscale.com wrote: On 08/31/12 03:10 AM, Pavel Heimlich, a.k.a. hajma wrote: 2) Posting the project is dead on a public list certainly doesn't help grow a community well, it's half year since revival of the project was announced and has there been any progress/improvements? The state of this is a koma at best. Just because bureaucrats say jump doesn't mean anything is going to happen. Pavel has had an unfortunate choice of words. Let's leave it at that. --- The facts as I know it 1) Our fork is maintained (continuous bug fixes - which we won't submit to Apache now) 2) Stefan is putting in some work (one man army) 3) Wojciech Meyer had put in some work 4) NetBSD has a small amount of patches they could probably push upstream (If Jörg has the time) 5) Martin is/was great for feedback in all areas of STL/C++/occasional code review While I recognize the value of each one of the points you make, I am puzzled as to why you are not going forward on your way with your fork? How is the Apache Foundation keeping you from making progress on your use of the library? STDCXX isn't some stupid ass java framework or widget - It's a *critical* part of ... We all know that. This is the reason we/I come back to it over and over again, for reference, or inspiration, or sometime just to remember the good ol' days. That's what I meant by it can't be explained. L
Re: New chair and/or attic
On Aug 30, 2012, at 8:00 PM, C. Bergström wrote: On 08/31/12 06:43 AM, Liviu Nicoara wrote: While I recognize the value of each one of the points you make, I am puzzled as to why you are not going forward on your way with your fork? How is the Apache Foundation keeping you from making progress on your use of the library? For our use it's not and I welcome any patches/help. It may be missed opportunity for getting a larger userbase and a moot point anyway. Specifically FBSD - When trying to push it as part of c++ stack replacement or part of ports the only objection I got was licensing related. (At this point they could also argue missing c++11 support) [...] IIUC, you would want to see STDCXX getting more exposure; one such avenue would involve having it used in FreeBSD as a ports package, with an all permissive BSD license. While STDCXX is at Apache it will never be BSD licensed. Solution - move it away from Apache foundation and have them transfer some of the additional rights they received to allow recipient foundation to relicense. I thought this would be a win for the project and everyone, but for some reason instead of opening a discussion to transfer - it's just death grip and pushing to the attic. The fact that Rogue Wave agreed to release the STDCXX code back in 2005 is nothing short of a miracle. IMHO, we are lucky to benefit from having had this library released to the public, anyway. L
STDCXX forks
Please correct me if I am wrong. I have seen two forks on github, one belonging to C. Bergstrom/Pathscale and the other to Stefan Teleman. The first one contains a number of genuine code changes outside the Apache repository, but without canonical and meaningful commit comments, and without accompanying test cases. Some carry what seem to be bug id's from a private bug database. Some of the changes, e.g., 84d01405124b8c08bb43fdc3755ada2592449727, fa9a45fb36e45b71ba6b4ae93b50bd6679549420, seem odd. Are you guys dropping support for any platforms? Stefan's seem like a complete git-ification of the whole Apache repository but with no changes I could detect. Thanks. L
Re-focus [was: Re: New chair and/or attic]
On 08/31/12 08:18, Jim Jagielski wrote: On Aug 30, 2012, at 5:45 PM, C. Bergströmcbergst...@pathscale.com wrote: [...] STDCXX isn't some stupid ass java framework or widget - It's a *critical* part of a C++ stack and the cost of leaving it out of the attic is negligible - What's the benefit of bringing up these attic discussions? It's a critical part in which people either lack the time, motivation or desire to push or submit patches to the canonical source? Or is the desire to force Apache's hand in the matter such that someone else's fork or branch becomes the de-facto source of this critical part??? If stdcxx is as important as you say, and you are fighting to keep it active, then put your money where your mouth is and start working on bumping up the activity. Submit your bug fixes. This discussion is going nowhere and is not becoming of a professional community. By now it must be clear where everybody's interests lie; all who can read took note of that. Since this is largely a dispute between Apache and Pathscale as an alleged representative of a free software community I suggest you take the licensing related discussions in private. L
Re: STDCXX forks
On 08/31/12 08:58, C. Bergström wrote: On 08/31/12 07:40 PM, Liviu Nicoara wrote: Please correct me if I am wrong. I have seen two forks on github, one belonging to C. Bergstrom/Pathscale and the other to Stefan Teleman. The first one contains a number of genuine code changes outside the Apache repository, but without canonical and meaningful commit comments, and without accompanying test cases. We can help clean this up if it makes a real difference That is appreciated. It would make a huge difference to anybody coming anew to the project or just catching up, like me. Some carry what seem to be bug id's from a private bug database. Some of the changes, e.g., 84d01405124b8c08bb43fdc3755ada2592449727 iirc we renamed the files because of some problem with cmake - it was trying to compile those files instead of treating them like headers. , fa9a45fb36e45b71ba6b4ae93b50bd6679549420, seem odd. I forget why we did this tbh - I'll try to get an answer in case it comes up again as a question My point exactly, a good comment explains the why's. Are you guys dropping support for any platforms? Not intentionally - We are using/testing this on Solaris, FBSD and Linux - We may add OSX/Win to that list soon, but I can't promise at this point. For targets we only are able to test x86 and x86_64 The reason I am asking is that changes to the template implementation files have most likely broken the support for compilers that do implicit inclusion of template implementations (a.k.a. link-time instantiation of templates). I am not incriminating anyone mind you, I am just trying to point out there mightbe a gap in your testing of your changes. NetBSD also has a fork I believe, but not sure if it's public/status Do you know where that is? Stefan's seem like a complete git-ification of the whole Apache repository but with no changes I could detect. He has quite a number of patches and forget where those are kept. I'm guessing a lot of his fixes target KDE/Qt apps and the Perennial C++VS testsuite. http://www.peren.com/pages/cppvs_set.htm I briefly looked over some patches, there's quite a few of them. I plan to go in more detail over the week-end. Cheers. L
Re: stdcxx issue 1058
On 08/31/12 13:14, Stefan Teleman wrote: In June this year I committed r1353821 to trunk which fixes stdcxx-1058. I have the patches for 1058 ready to commit to branches (4.2.x and 4.3.x). OK to go? The patch looks ok to me. What seems to be the problem? +1 L
Branching policy, 4.3.x, 5.0.0, etc.
The branching policy [1] in effect looks very much like the Rogue Wave release process: branch at the beginning of each release cycle, work on the release branch, merge changes back into the trunk at release time (and in between as needed). Did I get that right? From what I gather 4.2.x has last released 4.2.1, there is a 4.3.x with no releases, and a prospective 5.0.0 which should come out of the trunk (I saw changes mentioning 5.0.0). What are the stated goals for the 4.3.x and 5.x? Thanks. L [1]http://wiki.apache.org/stdcxx/Branching
Re: New committers?
My input below. On 08/31/12 09:42, Wojciech Meyer wrote: The two significant ones (as far as I can understand): - as I heard from Christopher Bergström that it's hard to push the stdcxx to FreeBSD ports repository (I can understand it and that sounds pretty bad, if that's the case then the board should consider re-licensing as advised; I agree in general it's a hard decision for the board, but imagine the project would benefit, IANAL tho) Christopher's wishes and goals may be different from others'. I do not believe he has ulterior motives that would be detrimental to the rest of us but AFAICT he has not made a compelling argument. Even with one, it stretches the imagination what could possibly convince Apache to give up on STDCXX ownership. - I'm also reading through that methodology we use might not fit the distributed model which could basically improve the pace of development stream (and again github is nice at these things; but there are other considerations) The process put in place by Apache closely mirrors the rigors of the Rogue Wave environment where the project originates. The development proceeds at the best speed possible while at the same time proving the consistency and correctness of the code base through passing unit tests. The process is tightly controlled by rules which are observed by everyone (such as creating test cases before fixing bugs, thoroughly documenting changes, following coding and code structuring conventions, etc.). The process has an ultimate authority in the person of the tech lead, Martin. The end result of the _pedantic_ application of these rules is the product you and I, all of us, enjoy. As mentioned before it is of an excellent quality, not often seen in the software world. It also is a very sophisticated product both with an intricate code structure, and extreme use of the language which pushes the compilers to their limits. Any change, however small, must be carefully considered and weighed, and careless changes will almost always break it in subtle ways. As a rule of thumb, if there is something that looks wrong in the source code, chances are you're not getting it right. In case my point did not get across by now, I am strongly for the continuation of a tightly controlled development process. Thanks. Liviu
[PATCH] Trivial test fix
Would someone please apply the patch on 4.2.x branch? (I have not yet regained access to my Apache account.) 2012-09-01 Liviu Nicoara nikko...@hates.ms * tests/containers/23.bitset.cpp: swapped lines to avoid compiler bug (see http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54442) Index: tests/containers/23.bitset.cpp === --- tests/containers/23.bitset.cpp (revision 1379762) +++ tests/containers/23.bitset.cpp (working copy) @@ -278,8 +278,8 @@ test_synopsis (std::bitset0*) MEMFUN (Bitset, flip, ()); MEMFUN (Bitset, flip, (std::size_t)); -MEMFUN (Bitset::reference, operator[], (std::size_t)); MEMFUN (bool, operator[], (std::size_t) const); +MEMFUN (Bitset::reference, operator[], (std::size_t)); MEMFUN (unsigned long, to_ulong, () const);
Re: STDCXX forks
On 08/31/12 12:20, Stefan Teleman wrote: On Fri, Aug 31, 2012 at 8:40 AM, Liviu Nicoara nikko...@hates.ms wrote: Stefan's seem like a complete git-ification of the whole Apache repository but with no changes I could detect. Not quite. :-) [...] The official Oracle port for Solaris 10 and 11, which I maintain, is here: http://src.opensolaris.org/source/xref/userland/gate/components/stdcxx/ Hi Stefan, I have went through the patches. Specifically, I have spent more time looking in the mutex alignment changes and the C++ C library headers patches, and I only read the others. In order: The test extensions seem to be genuine by and large, but I would further analyze them after I find out what is it they are addressing (test cases?). The regression tests whose names contain references to internal bug numbers require a bit more analysis as to their usefulness. Of course an explanation attached to each would alleviate duplicating your work. I have not cross-checked them to JIRA. Some of the compiler characterizations changes, as well as the associated GNUmakefile's, seem to be specific to your port, e.g., GNUmakefile, GNUmakefile.cfg changes. I may have spotted other issues but I would wait for your feed-back first. The C++ C library headers seem to have been re-written to your port. I am unsure why you needed this, but it surely breaks the original intent for these headers' structure. I have also noticed that you stripped the Apache notice and added an Oracle copyright notice on them. This pretty much sums up my first impression. If you were willing to submit them one by one (*), complete with test cases and complete patches (ChangeLog entries and all) I would put in the necessary review time and provide feed-back asap. Please let me know if I missed anything. I sincerely hope this helps. Thanks! Liviu (*) As in issue by issue, not file by file.
Re: STDCXX forks
On 08/31/12 08:58, C. Bergström wrote: On 08/31/12 07:40 PM, Liviu Nicoara wrote: Please correct me if I am wrong. I have seen two forks on github, one belonging to C. Bergstrom/Pathscale and the other to Stefan Teleman. The first one contains a number of genuine code changes outside the Apache repository, but without canonical and meaningful commit comments, and without accompanying test cases. We can help clean this up if it makes a real difference Some carry what seem to be bug id's from a private bug database. Some of the changes, e.g., 84d01405124b8c08bb43fdc3755ada2592449727 iirc we renamed the files because of some problem with cmake - it was trying to compile those files instead of treating them like headers. , fa9a45fb36e45b71ba6b4ae93b50bd6679549420, seem odd. I forget why we did this tbh - I'll try to get an answer in case it comes up again as a question Are you guys dropping support for any platforms? Not intentionally - We are using/testing this on Solaris, FBSD and Linux - We may add OSX/Win to that list soon, but I can't promise at this point. For targets we only are able to test x86 and x86_64 Hi Christopher, I finished looking though the github changes. If you are interested in contributing some of those changes back, as complete patches [1], I would volunteer my time to promptly review them and work with you to get them in, as needed. In the meantime, if you will, could you please elaborate on the two changes mentioned above? The first one changes the traditional file extension on the template implementation files, the second affects the library building on platforms like I mentioned in another post. AFAIK STDCXX does not implement or follow Boost conventions and/or library file structure. HTH. Thanks! Liviu [1] http://stdcxx.apache.org/bugs.html#patch_format
Re: STDCXX forks
On Sep 1, 2012, at 1:52 PM, Stefan Teleman wrote: On Sat, Sep 1, 2012 at 12:15 PM, Liviu Nicoara nikko...@hates.ms wrote: [...] This pretty much sums up my first impression. [...] To begin with: the compiler flags/GNUmakefile changes are very specific to the SunPro compilers and to our internal build system. These changes are most likely not suitable for inclusion in the canonical stdcxx, except maybe for the sunpro.config changes, in case someone would like to be able to replicate our builds. I'd like to mention that, in Solaris, Apache stdcxx is a system library. Good point. About the Standard C Library forwarding header files: these changes are specfic to Solaris. The reason behind them is: the Solaris architectural rules, which can be best summarized as: there can be only one of each. In other words, it is Verboten, in Solaris, to duplicate the Standard C Library header files (or any other header file for that matter). The Solaris Standard C Library header files are C++-clean - they are required to be so, by the same architectural rules. Again, these changes are specific to Solaris, and are probably not portable across other implementations. I know for a fact that they are not portable for either the GCC or Intel compilers (with which I test regularly on Linux, in addition to SunPro). So these two groups of changesets can be ignored. Got it. I opened yesterday STDCXX-1066: https://issues.apache.org/jira/browse/STDCXX-1056 about the pthread_mutex_t/pthread_cond_t alignment on SPARCV8. I'll have patches done this weekend. Achtung: the patchset is very large and touches a very large number of files. It's strange that I didn't get an email about STDCXX-1066. Acknowledged. I have seen the individual patches on those two websites. I'd also like to talk about STDCXX-1056: https://issues.apache.org/jira/browse/STDCXX-1056 which has already had an initial discussion, and for which I have attached a patch. This issue also addresses (indirectly) linkage when building with GCC. On the recent versions of GCC that I have tested with, passing -supc++ on link line automatically links with the GNU libstdc++6.so (on top of linking with stdcxx), and that just bad. I briefly looked at it, will delve into it later. And then I'll have to cross reference the patches which refer to our internal bug numbers because most of them are quite old and right now, off the top of my head, I can't remember what they are. :-) Thanks! Liviu
Re: stdcxx Wikipedia update
On Sep 2, 2012, at 12:16 PM, Wojciech Meyer wrote: According to the wikipedia we are dead? http://en.wikipedia.org/wiki/Apache_C%2B%2B_Standard_Library are we? :-) It's hard to argue against, at this point. But I hope we'll soon be able to justify changing it. L
Re: STDCXX-1056 [was: Re: STDCXX forks]
On Sep 4, 2012, at 10:34 AM, Stefan Teleman wrote: On Tue, Sep 4, 2012 at 7:35 AM, Liviu Nicoara nikko...@hates.ms wrote: By no means I am dismissing it. It is at the very least an issue of efficiency. I will try an Intel C++ build on x86_64 at some point today. What build type was that? I also notice that you have 4.2.1 in your path? Are you building out of 4.2.1. tag? I built off 4.2.x branch which also has support for custom timeouts (--soft-timeout) in rw_test. Yes, this is 4.2.1 with all the Solaris patches which can be applied on Linux. The environment is: # INFO (S1) (10 lines): # TEXT: # COMPILER: Intel C++, __INTEL_COMPILER = 1210, __INTEL_COMPILER_BUILD_DATE = 20111011, __EDG_VERSION__ = 403 # ENVIRONMENT: pentiumpro running linux-elf (Fedora release 17 (Beefy Miracle) (3.5.0-2.fc17.x86_64)) with glibc 2.15 # FILE: 22.locale.numpunct.mt.cpp # COMPILED: Sep 4 2012, 09:11:36 # COMMENT: thread safety # CLAUSE: lib.locale.numpunct [ ... snip ... ] With all the thread-safety Solaris patches for 1056 applied. The test runs to completion: # NOTE (S2) (5 lines): # TEXT: executing locale -a /tmp/tmpfile-Cej8JU # CLAUSE: lib.locale.numpunct # FILE: process.cpp # LINE: 276 # INFO (S1) (3 lines): # TEXT: testing std::numpunctcharT with 8 threads, 20 iterations each, in 32 locales { C aa_DJ aa_DJ.iso88591 aa_DJ.utf8 aa_ER aa_ER@saaho aa_ER.utf8 aa_ER.utf8@saaho aa_ET aa_ET.utf8 af_ZA af_ZA.iso88591 af_ZA.utf8 am_ET am_ET.utf8 an_ES an_ES.iso885915 an_ES.utf8 ar_AE ar_AE.iso88596 ar_AE.utf8 ar_BH ar_BH.iso88596 ar_BH.utf8 ar_DZ ar_DZ.iso88596 ar_DZ.utf8 ar_EG ar_EG.iso88596 ar_EG.utf8 ar_IN ar_IN.utf8 } # CLAUSE: lib.locale.numpunct [ ... snip ... ] # +---+--+--+--+ # | DIAGNOSTIC| ACTIVE | TOTAL | INACTIVE | # +---+--+--+--+ # | (S1) INFO | 11 | 11 | 0% | # | (S2) NOTE |1 |1 | 0% | # | (S8) ERROR|0 |3 | 100% | # | (S9) FATAL|0 |1 | 100% | # +---+--+--+--+ real 2139.31 user 2406.09 sys 155.61 [steleman@darthvader][/src/steleman/programming/stdcxx-intel/stdcxx-4.2.1-thread-safe/build/tests][09/04/2012 9:50:14][1300] These tests running to completion (with the patches applied) are 100% reproducible on every single run, on Linux and Solaris (Intel and SPARC). That is good, right? The timings are the output of /usr/bin/time -p. Yes, ~40 minutes wall clock run time for one test is quite a lot. But it runs to completion, with zero errors. And when taking into consideration that there are 8 threads with 20 iterations, maybe it's not that much. FWIW, SELinux is also fully enabled in my environment. Ok, so that clears (this build at least), maybe inefficient but runs to completion with no crashes. The timeouts can also be a symptom of race conditions/deadlocks or attempting to access invalid thread stacks, or partially written data. I would look at timeouts from the standpoint of the progression of the run: if the run does not progress at all, as in threads are deadlocked, then I would consider that a defect. If the run progresses, it is a mere inefficiency, however bad. It's very platform specific. On Solaris SPARC (either 32-bit or 64-bit) I never get timeouts, I always get either SEGV or SIGBUS. On Intel it appears that timeouts are the prevailing symptom. Could not get an Intel build off the ground on the account of the LIMITS.cpp test not running to completion because of a possible compiler bug. I posted earlier an account of it. Do you have a support account that allows posting bug reports for Intel's C++ compiler? I am not sure what an explicit run timeout would add, except hiding the hang/deadlock/memory corruption problem behind a timeout. The timeout option allowed me to run the program with a larger value, such that the alarm would not trigger and allow the program to run to completion. Before enlarging the timeout value, the individual tests would time out. I did not have access to a Solaris box since I left Rogue Wave. Does Sun/Oracle have any program that would allow one to test drive their compiler on a shared box somewhere? I vaguely remember that HP had something like that a while ago. Thanks! Liviu
Re: STDCXX-1056 [was: Re: STDCXX forks]
On 09/05/12 00:51, Stefan Teleman wrote: On Tue, Sep 4, 2012 at 10:49 PM, Martin Sebor mse...@gmail.com wrote: template class _CharT inline string numpunct_CharT::grouping () const { if (!(_C_flags _RW::__rw_gr)) { numpunct* const __self = _RWSTD_CONST_CAST (numpunct*, this); _RWSTD_MT_GUARD (_C_mutex); // [try to] get the grouping first (may throw) // then set a flag to avoid future initializations __self-_C_grouping = do_grouping (); __self-_C_flags|= _RW::__rw_gr; } return _C_grouping; } That's what I wanted to do originally - use a per-object mutex. FWIW, it is pretty obvious to me _now_ that these assignments are not MT-safe, by themselves and also in the context of the test guarding them. You're right, access must be synchronized on a per-facet object basis. Since the data that needs to be protected on assignment is instance data, the mutex used in the guard must be a (yet to be added) instance mutex. That would make it binary incompatible and would go in ... 4.3.x? This works: template class _CharT inline string numpunct_CharT::grouping () const { if (!(_C_flags _RW::__rw_gr)) { numpunct* const __self = _RWSTD_CONST_CAST (numpunct*, this); _RWSTD_MT_STATIC_GUARD (_Type); It seems to me that a static guard is excessive. The only thing that needs guarding is the actual assignment to facet instance members -- a mutex instance member would suffice. And even so, this is still not thread-safe: Two different threads [ T1 and T2 ], seeking two different locales [en_US.UTF-8 and ja_JP.UTF-8], enter std::numpunct_CharT::grouping() at the same time - because they are running on two different cores. They both test for if (!(_C_flags _RW::__rw_gr)) and then -- assuming the expression above evaluates to true -- one of them wins the mutex [T1], and the other one [T2] blocks on the mutex. When T1 is done and exits the function, the grouping is set to en_US.UTF-8 and the mutex is released. Now T2 acquires the mutex, and proceeds to setting grouping to ja-JP.UTF-8. Woe on the caller running from T1 who now thinks he got en_US.UTF-8, but instead he gets ja_JP.UTF-8, which was duly set so by T2, but T1 had no clue about it (remember, the std::string grouping _charT buffer is shared by the caller from T1 and the caller from T2). I am pretty sure that in your example, they would operate on two different instances of the facet class. The static guard would synchronize their running alright but through two different facet instances, copying data from two different places of the locale database. In this respect a static guard is excessive. I would defer it to Martin to validate a course of action but to me it looks like the only problem is the concurrent assignment to facet instance member variables inside the facet member functions. Which could be easily and nicely solved with a plain guard on a mutex ordinary member variable. Thanks! Liviu
Re: STDCXX-1056 [was: Re: STDCXX forks]
On 09/05/12 14:09, Stefan Teleman wrote: On Wed, Sep 5, 2012 at 10:52 AM, Martin Sebor mse...@gmail.com wrote: [...] OK so I did a little bit of testing, after looking at the *right* __rw_guard class. :-) I changed the std::numpunct class thusly: [...] And then: template class _CharT inline string numpunct_CharT::grouping () const { if (!(_C_flags _RW::__rw_gr)) { _RWSTD_MT_GUARD (_C_object_mutex); // double-test here to avoid re-writing an already written string if (!(_C_flags _RW::__rw_gr)) { numpunct* const __self = _RWSTD_CONST_CAST (numpunct*, this); // [try to] get the grouping first (may throw) // then set a flag to avoid future initializations __self-_C_grouping = do_grouping (); __self-_C_flags|= _RW::__rw_gr; } } return _C_grouping; } I am afraid this would be unsafe, too (if I said otherwise earlier I was wrong). The compiler might re-arrange the protected assignments, such that another thread sees a partially updated object, where the flags are updated and the string not. I don't think we're going to get away with this here without either a simpler and more inefficient top-level locking, or doing away completely with the lazy initialization. Thoughts? Liviu
Re: STDCXX-1056 [was: Re: STDCXX forks]
On 09/05/12 23:51, Martin Sebor wrote: On 09/05/2012 09:03 PM, Stefan Teleman wrote: [...] Agreed. But: if the choice is between an implementation which [1] breaks ABI and [2] performs 20% worse -- even in contrived test cases -- than another implementation [2] which doesn't break ABI, and performs better than the first one, why would we even consider the first one? Breaking the ABI is not an option (unless we rev the version). But I'm not sure I understand what you think breaks the ABI. I think Stefan is referring to adding a mutex member variable to the facet in question and breaking binary compatibility. If that is the case I have confused things when I suggested exactly that, earlier. A cursory read through the __rw_facet source shows that inherits from __rw_synchronized in MT builds, therefore each facet carries its own mutex member. Liviu We don't need to add a new mutex -- we can use the __rw_facet member for the locking. Or did you mean something else? Martin --Stefan -- And now I see with eye serene The very pulse of the machine.
A question (or two) of procedure, etc.
What is the latest policy in what regards trivial fixes, e.g., the volatile qualifier for the max var in LIMITS.cpp we discussed earlier, etc.? It seems excessive to create a bug report for such issues. Also, IIUC from reading previous discussions, forward and backward binary compatible changes go in 4.2.x, followed by merges to 4.3.x and trunk. Am I getting this right? Also, besides the Linux, FreeBSD, Windows, Solaris builds hosted on Apache (Jenkins) is anybody building on HP-UX, AIX, etc.? Thanks. Liviu
Re: A question (or two) of procedure, etc.
On 09/06/12 14:37, Wojciech Meyer wrote: Liviu Nicoara nikko...@hates.ms writes: What is the latest policy in what regards trivial fixes, e.g., the volatile qualifier for the max var in LIMITS.cpp we discussed earlier, etc.? It seems excessive to create a bug report for such issues. [...] So I vote for keeping the changes as lightweight as possible, and avoid extra bureaucracy if it makes sense. This assumption is based that developers here trust their selves, and run the tests often. I'm not subscribed to the commit list here, but if I do - I usually follow people's changes and assume that developers do read commit lists. Makes sense. Thanks. So the general consensus from my experience with other project was: not sure - ask. That's my experience, also I don't have full rights to express my opinion right now about stdcxx. I sure hope we can have totally open (civilized) discussions going forward. :) Also, IIUC from reading previous discussions, forward and backward binary compatible changes go in 4.2.x, followed by merges to 4.3.x and trunk. Am I getting this right? Every project has certain branch strategy, I'm not sure about this so maybe Martin can advice. I prefer to develop on trunk and cherry pick to the other branches avoiding bulk merges (and that's in both directions). Right... I saw some discussions from a while back about active development on 4.2.x with integration to other branches as dictated by compatibility (e.g., integrating 4.2.x - 4.3.x and 4.2.x - 4.2.1), and reintegration to trunk as needed. I am not looking to change any such policy just wanna make sure I am not messing something up. Also, besides the Linux, FreeBSD, Windows, Solaris builds hosted on Apache (Jenkins) is anybody building on HP-UX, AIX, etc.? Thanks. Liviu
Re: STDCXX-1056 [was: Re: STDCXX forks]
On Sep 5, 2012, at 4:03 PM, Martin Sebor wrote: On 09/05/2012 01:33 PM, Liviu Nicoara wrote: On 09/05/12 15:17, Martin Sebor wrote: On 09/05/2012 12:40 PM, Liviu Nicoara wrote: On 09/05/12 14:09, Stefan Teleman wrote: On Wed, Sep 5, 2012 at 10:52 AM, Martin Sebor mse...@gmail.com wrote: [...] OK so I did a little bit of testing, after looking at the *right* __rw_guard class. :-) I changed the std::numpunct class thusly: [...] I am afraid this would be unsafe, too (if I said otherwise earlier I was wrong). [...] Thoughts? You're right, there's still a problem. We didn't get the double checked locking quite right. I wish for simplicity: eliminate the lazy initialization, and fully initialize the facet in the constructor. Then we'd need no locking on copying its data (std::string takes care of its own copying). I'm not sure how easily we can do that. Almost all of locale is initialized lazily. Some of the layers might depend on the facets being initialized lazily as well. This was a deliberate design choice. One of the constraints was to avoid dynamic initialization or allocation at startup. [...] There would be a performance degradation. IMHO, it would be minor and would simplify the code considerably. After finally being able to reproduce the defect with SunPro 12.3 on x86_64 I tried to remove the lazy initialization and a (smaller) test case now passes. I am attaching the program and the numpunct patch. Will continue to look into it. Although STDCXX does not have an implementation of the atomics library we could probably use the existing, unexposed, atomics API to implement a more robust lazy initialization. Liviu $ cat tests/localization/t.cpp; svn diff #include locale #include cstdio #include cstdlib #include unistd.h #define MAX_THREADS16 #define MAX_LOOPS1000 static bool hold = true; extern C { static void* f (void* pv) { std::numpunctchar const fac = *reinterpret_cast std::numpunctchar* (pv); while (hold) ; for (int i = 0; i != MAX_LOOPS; ++i) { std::string const grp = fac.grouping (); } return 0; } } int main (int argc, char** argv) { std::locale const loc = std::locale (argv [1]); std::numpunctchar const fac = std::use_facetstd::numpunctchar (loc); pthread_t tid [MAX_THREADS] = { 0 }; for (int i = 0; i MAX_THREADS; ++i) { if (pthread_create (tid + i, 0, f, (void*)fac)) exit (-1); } sleep (1); hold = false; for (int i = 0; i MAX_THREADS; ++i) { if (tid [i]) pthread_join (tid [i], 0); } return 0; } Index: include/loc/_numpunct.h === --- include/loc/_numpunct.h (revision 1381575) +++ include/loc/_numpunct.h (working copy) @@ -61,24 +61,40 @@ struct numpunct: _RW::__rw_facet string_type; _EXPLICIT numpunct (_RWSTD_SIZE_T __ref = 0) -: _RW::__rw_facet (__ref), _C_flags (0) { } +: _RW::__rw_facet (__ref) { +_C_grouping = do_grouping (); +_C_truename = do_truename (); +_C_falsename = do_falsename (); +_C_decimal_point = do_decimal_point (); +_C_thousands_sep = do_thousands_sep (); +} virtual ~numpunct () _RWSTD_ATTRIBUTE_NOTHROW; // 22.2.3.1.1, p1 -char_type decimal_point () const; +char_type decimal_point () const { +return _C_decimal_point; +} // 22.2.3.1.1, p2 -char_type thousands_sep () const; +char_type thousands_sep () const { +return _C_thousands_sep; +} // 22.2.3.1.1, p3 -string grouping () const; +string grouping () const { +return _C_grouping; +} // 22.2.3.1.1, p4 -string_type truename () const; +string_type truename () const { +return _C_truename; +} // 22.2.3.1.1, p4 -string_type falsename () const; +string_type falsename () const { +return _C_falsename; +} static _RW::__rw_facet_id id; @@ -112,15 +128,13 @@ protected: private: -int _C_flags; // bitmap of cached data valid flags -string _C_grouping;// cached results of virtual members +string _C_grouping; string_type _C_truename; string_type _C_falsename; char_type _C_decimal_point; char_type _C_thousands_sep; }; - #ifndef _RWSTD_NO_SPECIALIZED_FACET_ID _RWSTD_SPECIALIZED_CLASS @@ -134,95 +148,6 @@ _RW::__rw_facet_id numpunctwchar_t::id # endif // _RWSTD_NO_WCHAR_T #endif // _RWSTD_NO_SPECIALIZED_FACET_ID - -template class _CharT -inline _TYPENAME numpunct_CharT::char_type -numpunct_CharT::decimal_point () const -{ -if (!(_C_flags _RW::__rw_dp)) { - -numpunct* const __self = _RWSTD_CONST_CAST (numpunct*, this); - -// [try to] get the decimal point first (may throw) -// then set a flag to avoid future
Re: STDCXX-1056 [was: Re: STDCXX forks]
On 09/06/12 19:54, Martin Sebor wrote: I'm not sure how easily we can do that. Almost all of locale is initialized lazily. Some of the layers might depend on the facets being initialized lazily as well. This was a deliberate design choice. One of the constraints was to avoid dynamic initialization or allocation at startup. [...] There would be a performance degradation. IMHO, it would be minor and would simplify the code considerably. After finally being able to reproduce the defect with SunPro 12.3 on x86_64 I tried to remove the lazy initialization and a (smaller) test case now passes. I am attaching the program and the numpunct patch. Out of curiosity, does it still work if you move the locale into the thread function? Like this: I created the test case because I wanted something more specific and with more predictable results. Initially, the locale object was created in the thread and it was passing. Also, does the 27.objects test pass with this patch? Will try. I don't think that we can actually use this patch, I bundled it with the test so that people can test the approach right away. I don't know if we have tests for it but writing to cerr/cout in a bad_alloc handler should succeed. I.e., this should not abort: try { struct rlimit rlim = { }; getrlimit (RLIMIT_DATA, rlim); rlim.rlim_cur = 0; setrlimit (RLIMIT_DATA, rlim); for ( ; ; ) new int; } catch (bad_alloc) { cout caught bad alloc: 123 '\n'; } I see.. Liviu
Re: A question (or two) of procedure, etc.
On 09/06/12 23:00, Martin Sebor wrote: Every project has certain branch strategy, I'm not sure about this so maybe Martin can advice. I prefer to develop on trunk and cherry pick to the other branches avoiding bulk merges (and that's in both directions). We've done most work on 4.2.x for historical reasons. I think a better strategy is to develop, as you suggest, on trunk which has the least restrictive commit policy, and merge changes out to the more restrictive branches as appropriate. If open to voting, I am for trunk first, branches later. Liviu
Re: STDCXX-1056 [was: Re: STDCXX forks]
On 09/06/12 22:58, Stefan Teleman wrote: On Thu, Sep 6, 2012 at 7:31 PM, Liviu Nicoara nikko...@hates.ms wrote: There would be a performance degradation. IMHO, it would be minor and would simplify the code considerably. After finally being able to reproduce the defect with SunPro 12.3 on x86_64 I tried to remove the lazy initialization and a (smaller) test case now passes. I am attaching the program and the numpunct patch. With your patches, the performance is much much better: I didn't think about timing it, thanks! The result is somewhat expected without the synchronization. However, the full construction-time initialization is inappropriate if you take in consideration the delegation from the public API to the protected virtual API of the facet class, and the possibility of overrides in subclasses. Also, lazy initialization follows the principle of economy of effort, and caching increases efficiency. If possible in a robust way, I would rather have those two. :-) Liviu # INFO (S1) (10 lines): # TEXT: # COMPILER: Intel C++, __INTEL_COMPILER = 1210, __INTEL_COMPILER_BUILD_DATE = 20111011, __EDG_VERSION__ = 403 # ENVIRONMENT: pentiumpro running linux-elf (Fedora release 17 (Beefy Miracle) (3.5.0-2.fc17.x86_64)) with glibc 2.15 # FILE: 22.locale.numpunct.mt.cpp # COMPILED: Sep 6 2012, 20:50:13 # COMMENT: thread safety # +---+--+--+--+ # | DIAGNOSTIC| ACTIVE | TOTAL | INACTIVE | # +---+--+--+--+ # | (S1) INFO | 11 | 11 | 0% | # | (S2) NOTE |1 |1 | 0% | # | (S8) ERROR|0 |3 | 100% | # | (S9) FATAL|0 |1 | 100% | # +---+--+--+--+ real 1035.05 user 1191.76 sys 63.49 --Stefan -- And now I see with eye serene The very pulse of the machine.
Re: A question (or two) of procedure, etc.
On 09/07/12 10:54, Martin Sebor wrote: We should remember that there are a number of Jira issues that we fixed on 4.2.x but haven't merged out to 4.3.x or trunk. The idea behind the current process (4.2.x - 4.3.x - trunk) was to be able to simply merge the branches in bulk, as opposed to an fix at a time. Unfortunately, we ran into some Subversion issues that made it a huge pain. IIRC, Travis did it at least once so he might still remember the details. That would be very helpful to know. Going forward, to avoid this mess, I would suggest we make an effort to commit fixes wherever necessary at the same time instead of delaying it until some time in the future. Got it. Liviu
dbx [was: Re: STDCXX-1056 [was: Re: STDCXX forks]]
On 09/07/12 11:58, Stefan Teleman wrote: On Fri, Sep 7, 2012 at 8:52 AM, Liviu Nicoara nikko...@hates.ms wrote: On 09/06/12 19:54, Martin Sebor wrote: Also, does the 27.objects test pass with this patch? No, it does not. It hangs at the first insertion, line 227. Unfortunately, I cannot debug it because dbx does not work properly in my installation. What are the symptoms with dbx mishbehaving? (maybe I can help). Thanks! I get this when launching the debugger: $ dbx -xexec32 t For information about new features see `help changes' To remove this message, put `dbxenv suppress_startup_message 7.9' in your .dbxrc Reading t Reading ld-linux.so.2 dbx: fetch at 0xf400 failed -- Input/output error dbx: warning: could not put in breakpoint dbx: warning: internal handler (-80) made defunct -- could not enable event BPT dbx: warning: internal handler (-86) made defunct -- could not enable event BPT ... Then the program just runs from the get go. If I set breakpoints and rerun, it ignores them. Also, long running programs cannot be broken into with C-c: ^Cdbx: warning: Interrupt ignored but forwarded to child. signal INT in (unknown) at 0xf77cf6c4 0xf77cf6c4: movl %edx,%eax dbx: warning: 'stop' ignored -- while doing rtld handshake terminating signal 2 SIGINT dbx: warning: 'stop' ignored -- while doing rtld handshake (dbx) where dbx: program is not active (dbx) Liviu
[PATCH] STDCXX-1067 Mac builds
The default compiler on recent Apple Macs is LLVM with Clang and gcc C++ front-ends. The compiler does not come with a C++ language support library. However, gcc Mac builds are fine with GNU stock compilers, modulo the issues for which I attach the patch below, for review. I built successfully and ran the test suite on both Mac and Linux, wigh gcc 4.5.4 and 4.5.2, respectively. Thanks. Liviu Index: src/x86_64/atomic.s === --- src/x86_64/atomic.s (revision 1382343) +++ src/x86_64/atomic.s (working copy) @@ -26,8 +26,17 @@ * **/ +#if defined (__MACH__) +// Mac OS X Mach-O assembler: no .type, power of two alignment +# define ALIGN_DIR .align 4 +# define TYPE_DIR(ignore,ignore2) +#else +# define ALIGN_DIR .align 16 +# define TYPE_DIR(sym,attr) .type sym, attr +#endif // __MACH__ + .text -.align 16 +ALIGN_DIR /*** * extern C int8_t __rw_atomic_xchg8 (int8_t *x, int8_t y); @@ -37,7 +46,7 @@ **/ .globl __rw_atomic_xchg8 -.type __rw_atomic_xchg8, @function +TYPE_DIR (__rw_atomic_xchg8, STT_FUNC) __rw_atomic_xchg8: /* ; int8_t (int8_t *x, int8_t y) */ movq %rdi, %rcx /* ; %rcx = x */ movb %sil, %al /* ; %al = y */ @@ -53,7 +62,7 @@ **/ .globl __rw_atomic_xchg16 -.type __rw_atomic_xchg16, @function +TYPE_DIR (__rw_atomic_xchg16, STT_FUNC) __rw_atomic_xchg16:/* ; int16_t (int16_t *x, int16_t y) */ movq %rdi, %rcx /* ; %rcx = x*/ movw %si, %ax/* ; %ax = y */ @@ -69,7 +78,7 @@ **/ .globl __rw_atomic_xchg32 -.type __rw_atomic_xchg32, @function +TYPE_DIR (__rw_atomic_xchg32, STT_FUNC) __rw_atomic_xchg32:/* ; int32_t (int32_t *x, int32_t y) */ movq %rdi, %rcx /* ; %rcx = x*/ movl %esi, %eax /* ; %eax = y*/ @@ -85,7 +94,7 @@ **/ .globl __rw_atomic_xchg64 -.type __rw_atomic_xchg64, @function +TYPE_DIR (__rw_atomic_xchg64, STT_FUNC) __rw_atomic_xchg64:/* ; int64_t (int64_t *x, int64_t y) */ movq %rdi, %rcx /* ; %rcx = x*/ movq %rsi, %rax /* ; %rax = y*/ @@ -101,7 +110,7 @@ **/ .globl __rw_atomic_add8 -.type __rw_atomic_add8, @function +TYPE_DIR (__rw_atomic_add8, STT_FUNC) __rw_atomic_add8: /* ; int8_t (int8_t *dst, int8_t inc) */ movq %rdi, %rcx /* ; %rcx = dst */ movl %esi, %eax /* ; %eax = inc */ @@ -123,7 +132,7 @@ **/ .globl __rw_atomic_add16 -.type __rw_atomic_add16, @function +TYPE_DIR (__rw_atomic_add16, STT_FUNC) __rw_atomic_add16: /* ; int16_t (int16_t *dst, int16_t inc) */ movq %rdi, %rcx /* ; %rcx = dst */ movw %si, %ax /* ; %ax = inc */ @@ -146,7 +155,7 @@ **/ .globl __rw_atomic_add32 -.type __rw_atomic_add32, @function +TYPE_DIR (__rw_atomic_add32, STT_FUNC) __rw_atomic_add32: /* ; int32_t (int32_t *dst, int32_t inc) */ movq %rdi, %rcx /* ; %rcx = dst */ movl %esi, %edx /* ; %edx = inc */ @@ -169,7 +178,7 @@ **/ .globl __rw_atomic_add64 -.type __rw_atomic_add64, @function +TYPE_DIR (__rw_atomic_add64, STT_FUNC) __rw_atomic_add64: /* ; int64_t (int64_t *dst, int64_t inc) */ movq %rdi, %rcx /* ; %rcx = dst */ movq %rsi, %rdx /* ; %edx = inc */ Index: etc/config/gcc.config === --- etc/config/gcc.config (revision 1382343) +++ etc/config/gcc.config (working copy) @@ -40,6 +40,7 @@ else ifeq ($(OSNAME),Darwin) OS_MAJOR := $(shell
Re: [PATCH] STDCXX-1067 Mac builds
On 9/9/12 7:07 PM, Wojciech Meyer wrote: Hi Liviu, I don't use Mac OS X at all but: Liviu Nicoara nikko...@hates.ms writes: The default compiler on recent Apple Macs is LLVM with Clang and gcc C++ front-ends. The compiler does not come with a C++ language support library. However, gcc Mac builds are fine with GNU stock compilers, modulo the issues for which I attach the patch below, for review. I think it will be a big win to support Clang for the community. AFAICT, there is only one problem with that, the lack of a complete language support library. Pathscale released a replacement for libsupc++/libgcc_s in the form of the pair libcxxrt/libunwind, which provide equivalent functionality on *BSD/Linux, but not Darwin. Perhaps Christopher can shed some light here. Liviu
Re: STDCXX-1056 [was: Re: STDCXX forks]
On 09/10/12 15:01, Stefan Teleman wrote: On Mon, Sep 10, 2012 at 2:21 PM, Liviu Nicoara nikko...@hates.ms wrote: 4. Without caching of grouping values, grouping() delegates always to do_grouping(): real0m5.668s user1m11.389s sys 0m3.952s FWIW, 22.2.3.1.1 explicitly states that all of the decimal_point(), grouping(), truename(), falsename() must return their do_*() counterparts. Yes, of course. :-) I just meant to say that is _always_ delegating to do_grouping(), i.e., does no caching. Liviu --Stefan -- And now I see with eye serene The very pulse of the machine.
Re: Board report time
On 09/11/12 08:15, Jim Jagielski wrote: It's time for our report to the board... what would we like to share? I see: o renewed discussion on health/viability of pmc o increased development being done o PMC expressing interest in moving to git This sounds about right. It should also mention that some members expressed interest in alternative licensing. Thanks. Liviu
Re: STDCXX-1056 [was: Re: STDCXX forks]
On 9/11/12 9:40 PM, Martin Sebor wrote: On 09/11/2012 04:15 PM, Stefan Teleman wrote: On Mon, Sep 10, 2012 at 4:24 PM, Stefan Teleman I think I have something which doesn't break BC - stay tuned because I'm testing it now. OK. So, here's a possible implementation of __rw_get_numpunct() with minimal locking, which passes the MT tests and does not break ABI: s247136804.onlinehome.us/stdcxx-1056-20120911/punct.cpp And the same for include/loc/_numpunct.h: http://s247136804.onlinehome.us/stdcxx-1056-20120911/_numpunct.h In _numpunct.h, all the functions perform no checks and no lazy initialization. They function simply as a pass-through to __rw_get_numpunct(). std::numpunctT's data members are now dead varaiables. The bad: performance is no better than with locking the mutex inside each of the std::numpunctT::*() functions, and with lazy instantiation. I wouldn't expect this to be faster than the original. In fact, I would expect it to be slower because each call to one of the public, non-virtual members results in a call to the out-of-line virtual functions (and another to __rw_get_moneypunct). Avoiding the overhead of such calls is the main and only reason why the caching exists. AFAICT, there are two cases to consider: 1. Using STDCXX locale database initializes the __rw_punct_t data in the first, properly synchronized pass through __rw_get_numpunct. All subsequent calls use the __rw_punct_t data to construct returned objects. 2. Using the C library locales does the same in the first pass, via setlocale and localeconv, but setlocale synchronization is via a per-process lock. The facet data, once initialized is used just like above. I probably missed this in the previous conversation, but did you detect a race condition in the tests if the facets are simply forwarding to the private virtual interface? I.e., did you detect that the facet initialization code is unsafe? I think the facet __rw_punct_t data is safely initialized in both cases, it's the caching that is done incorrectly. I'm afraid unoptimized timings don't tell us much. Neither does a comparison between two compilers, even on the same OS. I looked at Liviu's timings today. I was puzzled by the difference between (1) which, IIUC, is the current implementation (presumably an optimized, thread-safe build with the same compiler and OS) and (4), which, again IIUC, is the equivalent of your latest patch here (again, presumably optimized, thread safe, same compiler/OS). I'm having trouble envisioning how calling a virtual function to retrieve the value of grouping can possibly be faster than not calling it (and simply returning the value cached in the data member of the facet. The new results I attached to the issue come from a a bit clearer tests and they focus on just two cases: the current implementation vs. a non-caching one; the latter just forwards the grouping calls to the protected do_grouping, with _no_ other changes to the implementation. The timing numbers seem to show that MT builds fare far worse with the caching than without. Stefan, if you have the time, could you please infirm :) my conclusions by timing it on one of your machines? Thanks, Liviu
Re: STDCXX-1056 [was: Re: STDCXX forks]
On 09/12/12 00:21, Stefan Teleman wrote: On Tue, Sep 11, 2012 at 10:18 PM, Liviu Nicoara nikko...@hates.ms wrote: AFAICT, there are two cases to consider: 1. Using STDCXX locale database initializes the __rw_punct_t data in the first, properly synchronized pass through __rw_get_numpunct. All subsequent calls use the __rw_punct_t data to construct returned objects. 2. Using the C library locales does the same in the first pass, via setlocale and localeconv, but setlocale synchronization is via a per-process lock. The facet data, once initialized is used just like above. I probably missed this in the previous conversation, but did you detect a race condition in the tests if the facets are simply forwarding to the private virtual interface? I.e., did you detect that the facet initialization code is unsafe? I think the facet __rw_punct_t data is safely initialized in both cases, it's the caching that is done incorrectly. I originally thought so too, but now I'm having doubts. :-) And I haven't tracked it down with 100% accuracy yet. I saw today this comment in src/facet.cpp, line 358: // a per-process array of facet pointers sufficiently large // to hold (pointers to) all standard facets for 8 locales static __rw_facet* std_facet_buf [__rw_facet::_C_last_type * 8]; The localization code is cleverly dense and low-level. There are a few patterns that are used throughout, though. One is the function-local static buffers, e.g., the locales buffer and the facets buffer, in locale_body.cpp and facet.cpp, respectively. They both start from a reasonable size and are grown and shrunk according to the needs, but neither locales nor facets are ever evicted from them. See: locale_body:936 -- reduction of locale buffer locale_body:1006 -- expansion of locale buffer facet.cpp:397-- reduction of facet buffer facet.cpp:462-- expansion of facet buffer Facets buffer is guarded by a static guard in: __rw_facet::_C_manage:366 The locales buffer is protected by a static guard in: __rw_locale::_C_manage:880 [...] this is all investigative stuff for tomorrow. :-) and I agree with Martin that breaking ABI in a minor release is really not an option. I'm trying to find the best way of making these facets thread-safe while inflicting the least horrible performance hit. We are getting close to the point where we can summarize all these findings and gauge an appropriate course of action. I will run your tests tomorrow and let you know. :-) That would be very helpful! Liviu
STDCXX-1066 [was: Re: STDCXX forks]
On 9/1/12 1:52 PM, Stefan Teleman wrote: On Sat, Sep 1, 2012 at 12:15 PM, I opened yesterday STDCXX-1066: https://issues.apache.org/jira/browse/STDCXX-1056 about the pthread_mutex_t/pthread_cond_t alignment on SPARCV8. I'll have patches done this weekend. Achtung: the patchset is very large and touches a very large number of files. It's strange that I didn't get an email about STDCXX-1066. Hi Stefan, I have read through the patches attached to the incident, then I briefly read about the SunPro pragma align and pack. Two questions: 1. AFAICT, the use of the packing pragma may interfere with a user's setting of the same value. I.e., a user sets the packing in their sources and then, directly or not, includes an STDCXX header. It seems to me that in such a situation, our setting of the packing value would interfere with the rest of the user's translation unit, since there is no way to `restore' the previous packing value. Something along the lines of: // user source file #pragma pack (X) // X != 8 #include iostream struct UserDef { // different alignment than X ? // ... }; Is my understanding correct? 2. The patches are against 4.2.1, but the change would be binary incompatible with the already released 4.2.1 branch. Do you plan to have this fix in 4.3.x? Thanks. Liviu
Re: STDCXX-1056 [was: Re: STDCXX forks]
On 9/15/12 1:51 PM, Stefan Teleman wrote: On Sat, Sep 15, 2012 at 9:01 AM, Liviu Nicoara nikko...@hates.ms wrote: That is funny. What compiler are you using? What does the following test case return for you? It's the Intel compiler with the patched stdcxx for the wrong case and GCC 4.7.1 + GNU libstdc++ for the correct case. GCC + GNU libstdc++ are correct. The patched facet does not call the protected do_*() virtual functions from their public counterparts, as it is required to do by the Standard. Instead, it returns the data mebers directly (the data members were initialized in the constructor). That is the patch you proposed, which is indeed much better performing than using a mutex lock. Unfortunately, in doing so, overriding the virtual functions in a derived facet type becomes pointless. Ahh, I see now. You are indeed right, that patch is defective. I was under the impression that we were discussing the (later) attachment stdcxx-1056-timings.tgz which contains a perfectly forwarding implementation of the facet public grouping method. The timings I attached there were the ones I thought we were discussing all along. Now, to clear the confusion I created: the timing numbers I posted in the attachment stdcxx-1056-timings.tgz to STDCXX-1066 (09/11/2012) showed that a perfectly forwarding, no caching public interface (exemplified by a changed grouping) performs better than the current implementation. It was that test case that I hoped you could time, perhaps on SPARC, in both MT and ST builds. The t.cpp program is for MT, s.cpp for ST. Please let me know if this clarifies things. I apologize for the misunderstanding. Liviu
Re: STDCXX-1066 [was: Re: STDCXX forks]
On 9/15/12 2:57 PM, Stefan Teleman wrote: On Sat, Sep 15, 2012 at 1:02 PM, Liviu Nicoara nikko...@hates.ms wrote: I have read through the patches attached to the incident, then I briefly read about the SunPro pragma align and pack. Two questions: 1. AFAICT, the use of the packing pragma may interfere with a user's setting of the same value. I.e., a user sets the packing in their sources and then, directly or not, includes an STDCXX header. It seems to me that in such a situation, our setting of the packing value would interfere with the rest of the user's translation unit, since there is no way to `restore' the previous packing value. Something along the lines of: // user source file #pragma pack (X) // X != 8 #include iostream struct UserDef { // different alignment than X ? // ... }; Is my understanding correct? Yes, you are indeed correct, Sir. :-) However, for every #pragma pack(8) there should be a corresponding subsequent #pragma pack(0). If there isn't, that's a patch bug. #pragma pack(0) restores the default alignment. So, there should be (there *must* be) no silent packing side-effects in user programs. Yes, but it restores the default packing, not an arbitrary one, potentially set by the user prior to including our headers. Say, the user sets 2, the default is 4 and we set 8. When we set it to default it goes back to 4, instead of the expected 2. Did I get this right? 2. The patches are against 4.2.1, but the change would be binary incompatible with the already released 4.2.1 branch. Do you plan to have this fix in 4.3.x? Yes, definitely for 4.2.x and 4.3.x. I sent them for 4.2.1 just because that's what I have handy. I see now. I tried to apply them to 4.2.x and some chunks failed, and I thought I was not applying them where they were intended to go. Thanks! Liviu
Re: STDCXX-1066 [was: Re: STDCXX forks]
On 9/15/12 5:19 PM, Stefan Teleman wrote: On Sat, Sep 15, 2012 at 4:57 PM, Liviu Nicoara nikko...@hates.ms wrote: Yes, but it restores the default packing, not an arbitrary one, potentially set by the user prior to including our headers. Say, the user sets 2, the default is 4 and we set 8. When we set it to default it goes back to 4, instead of the expected 2. Did I get this right? This is true, but leaving some arbirary pragma pack(N) (for N != 0) in effect for the duration of a program, and expecting it to work sounds like a very defective programming approach to me. It will certainly not work on SPARC at all. if the program needs to pack something in a certain specific way, it need to do so for that specific something only. Otherwise the side-effects of globally setting a non-default packing will destroy the program anyway. I merely wanted to point out that restoring the default packing is not the same as restoring the packing to the previous value in effect. Given this, I thought about an alternative way of forcing this alignment, e.g., via a union, aligned on an appropriate type. I see an advantage here in that most of the changes would occur where we define the 'primitive' mutex and condition wrappers, saving a few pre-processor conditionals and pragmas along the way. What do you think? Liviu
Re: STDCXX-1056 [was: Re: STDCXX forks]
On 9/16/12 3:20 AM, Stefan Teleman wrote: On Sat, Sep 15, 2012 at 4:53 PM, Liviu Nicoara nikko...@hates.ms wrote: Now, to clear the confusion I created: the timing numbers I posted in the attachment stdcxx-1056-timings.tgz to STDCXX-1066 (09/11/2012) showed that a perfectly forwarding, no caching public interface (exemplified by a changed grouping) performs better than the current implementation. It was that test case that I hoped you could time, perhaps on SPARC, in both MT and ST builds. The t.cpp program is for MT, s.cpp for ST. I got your patch, and have tested it. Thanks, Stefan. I looked over it and it seems very similar to, and somewhat more detailed than gprof profiling output. I am going to update the incident shortly with a more detailed timing measurements on my side, in the form of a new attachment. Just FYI in case you still don't get notifications. Liviu
Re: STDCXX-1056 [was: Re: STDCXX forks]
On 9/16/12 11:21 AM, Liviu Nicoara wrote: On 9/16/12 3:20 AM, Stefan Teleman wrote: On Sat, Sep 15, 2012 at 4:53 PM, Liviu Nicoara nikko...@hates.ms wrote: Now, to clear the confusion I created: the timing numbers I posted in the attachment stdcxx-1056-timings.tgz to STDCXX-1066 (09/11/2012) showed that a perfectly forwarding, no caching public interface (exemplified by a changed grouping) performs better than the current implementation. It was that test case that I hoped you could time, perhaps on SPARC, in both MT and ST builds. The t.cpp program is for MT, s.cpp for ST. I got your patch, and have tested it. Thanks, Stefan. I looked over it and it seems very similar to, and somewhat more detailed than gprof profiling output. I am going to update the incident shortly with a more detailed timing measurements on my side, in the form of a new attachment. Just FYI in case you still don't get notifications. I have attached a new set of results to the incident, in the form of the archive: http://tinyurl.com/9drzg4e Please see the content for a description of the library changes (_numpunct.h file), the MT test program (t.cpp) and the results collected through two separate builds on two different machines (results.txt file). Thanks. Liviu
Re: STDCXX-1056 [was: Re: STDCXX forks]
On 09/17/12 09:51, Stefan Teleman wrote: On Mon, Sep 17, 2012 at 8:46 AM, Liviu Nicoara nikko...@hates.ms wrote: In the meantime I would like to stress again that __rw_get_numpunct is perfectly thread-safe and does not need extra locking for perfect forwarding. So, by removing the test for if (!(_C_flags _RW::__rw_gr)) (or any other bitmask for that matter), the functions which were thread-unsafe - and were exhibiting all the symptoms of a run-time race condition -, magically became thread-safe? I have looked *extensively* at the code in __rw_get_numpunct. It is inherently thread-unsafe. I mean to say that no extra locking is necessary when the public interface forwards and no caching is done. In more detail: __rw_get_numpunct code is entered upon a call from the protected virtual interface. It calls facet _C_data member function which either returns objects constructed from previously-initialized POD data (in which case no locking is necessary), or it attempts to first initialize the facet data from the STDCXX database. If the latter, the execution goes in the facet data initialization code which is appropriately synchronized, see this stack trace: (gdb) bt #0 __rw::__rw_facet::_C_get_data (this=0x6e10a0) at srcdir/stdcxx/branches/4.2.x/src/facet.cpp:179 #1 0x0043788b in __rw::__rw_facet::_C_data (this=0x6e10a0) at srcdir/stdcxx/branches/4.2.x/include/loc/_facet.h:194 #2 0x0044874f in __rw::__rw_get_numpunct (pfacet=0x6e10a0, flags=4) at srcdir/stdcxx/branches/4.2.x/src/punct.cpp:80 #3 0x00449a14 in __rw::__rw_get_punct (pfacet=0x6e10a0, flags=4) at srcdir/stdcxx/branches/4.2.x/src/punct.cpp:578 #4 0x0041cecf in std::numpunctchar::do_grouping (this=0x6e10a0) at srcdir/stdcxx/branches/4.2.x/include/loc/_numpunct.h:99 #5 0x0041bd31 in std::numpunctchar::grouping (this=0x6e10a0) at srcdir/stdcxx/branches/4.2.x/include/loc/_numpunct.h:190 #6 0x004036b8 in main (argc=2, argv=0x7fffe018) at srcdir/stdcxx/branches/4.2.x/tests/localization/s.cpp:29 (gdb) When the above fails, the facet data has not been initialized, e.g., when there is no STDCXX locale database, or the name of the locale does not refer to a locale in STDCXX database. The __rw_get_numpunct function will attempt next to use libc and system locales, via the __rw_setlocale class, which again is properly synchronized and the scope of that lock extends to cover the facet data initialization. See this stack trace: (gdb) bt #0 __rw::__rw_setlocale::__rw_setlocale (this=0x7fffdd30, locname=0x6e112a en_US.utf8, cat=6, nothrow=0) at srcdir/stdcxx/branches/4.2.x/src/setlocale.cpp:84 #1 0x00448974 in __rw::__rw_get_numpunct (pfacet=0x6e10a0, flags=4) at srcdir/stdcxx/branches/4.2.x/src/punct.cpp:134 #2 0x00449a14 in __rw::__rw_get_punct (pfacet=0x6e10a0, flags=4) at srcdir/stdcxx/branches/4.2.x/src/punct.cpp:578 #3 0x0041cecf in std::numpunctchar::do_grouping (this=0x6e10a0) at srcdir/stdcxx/branches/4.2.x/include/loc/_numpunct.h:99 #4 0x0041bd31 in std::numpunctchar::grouping (this=0x6e10a0) at srcdir/stdcxx/branches/4.2.x/include/loc/_numpunct.h:190 #5 0x004036b8 in main (argc=2, argv=0x7fffe018) at srcdir/stdcxx/branches/4.2.x/tests/localization/s.cpp:29 I hope you agree that this synchronization is sufficient for the facet initialization and reading of facet data. Thanks. Liviu
Re: STDCXX-1056 [was: Re: STDCXX forks]
On 09/17/12 11:21, Stefan Teleman wrote: On Mon, Sep 17, 2012 at 11:17 AM, Liviu Nicoara nikko...@hates.ms wrote: I hope you agree that this synchronization is sufficient for the facet initialization and reading of facet data. Sorry, I do not agree. Two different thread analyzers from two different compilers written by two different compiler writers tell me not to. Stefan, that is indeed your prerogative. However, please keep in mind that tools may be buggy or may have limitations beyond what is advertised. I do not ask you to have faith in my analysis, which would be absurd, but to look for yourself, exercise due diligence in testing the code and draw your own conclusions. Thanks, Liviu
Sun fbe assembler manual
Hi all, I need a reference manual for the fbe assembler. I am interested in the syntax of the `.type' directive. Do you know where such a manual would be located? The Solaris assembler manual I found on Oracle's website does not mention the .type directive. Thanks! Liviu
Re: STDCXX-1056 [was: Re: STDCXX forks]
On 09/18/12 08:55, Stefan Teleman wrote: On Mon, Sep 17, 2012 at 11:17 AM, Liviu Nicoara nikko...@hates.ms wrote: I hope you agree that this synchronization is sufficient for the facet initialization and reading of facet data. I have reduced the number of reported race conditions in 22.locale.numpunct.mt from 12440: I am attaching a test program which, while 100% MT-safe, is flagged by the Solaris thread analyzer. The test program contains, conceptually, the exact same of shared variable access like the locale management code, the facet management code and the facet data management code. This proves, if there was a need, that the tool results are to be further analyzed and interpreted but not taken at literal value. The fix we are looking for is one which corrects the initial MT failure observed in caching the facet data, as well as it preserves the unguarded reads of facet data for performance reasons. Thanks. Liviu http://s247136804.onlinehome.us/stdcxx-1056-SPARC-20120917/22.locale.numpunct.mt.nts.1.er.html/index.html to 288: http://s247136804.onlinehome.us/stdcxx-1056-20120918/22.locale.numpunct.mt.5.er.html/index.html The changes are in the following files: http://s247136804.onlinehome.us/stdcxx-1056-20120918/facet.cpp http://s247136804.onlinehome.us/stdcxx-1056-20120918/punct.cpp _numpunct.h looks like this: http://s247136804.onlinehome.us/stdcxx-1056-20120918/_numpunct.h With these changes, no races conditions are repoted for any of the functions in std::numpunctT. Still, there are 288 race conditions being reported in __rw_locale::__rw_locale and in std::locale::_C_get_facet. We need to identify the source and cause of these race conditions and correct them as well. This is not a complete solution to the problem, because we still have to re-write the chunk of code I eliminated from facet.cpp. It is only step one towards finding a real solution. But, at least for now, we have pinpointed where the source of these race conditions is located, and what causing it. The test program was run as: ./22.locale.numpunct.mt --nthreads=8 --nloops=1. More to follow. --Stefan #include stdio.h #include stdlib.h #include pthread.h #include unistd.h #define MAX_THREADS 128 static long counter = 0, nloops = 1000, nthreads = 16; static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER; extern C { static void* f (void* pv) { for (size_t i = 0; i nloops; ++i) { if (counter == 0) { pthread_mutex_lock (lock); if (counter == 0) ++counter; pthread_mutex_unlock (lock); } else { // counter value is safe to use here } } printf (%ld\n, counter); return 0; } } int main (int argc, char** argv) { switch (argc) { case 3: nloops = atol (argv [2]); case 2: nthreads = atol (argv [1]); break; } pthread_t tid [MAX_THREADS] = { 0 }; if (nthreads MAX_THREADS) nthreads = MAX_THREADS; for (int i = 0; i nthreads; ++i) { if (pthread_create (tid + i, 0, f, 0)) exit (-1); } for (int i = 0; i nthreads; ++i) { if (tid [i]) pthread_join (tid [i], 0); } return 0; }
Re: STDCXX-1056 [was: Re: STDCXX forks]
On 09/18/12 13:21, Stefan Teleman wrote: On Tue, Sep 18, 2012 at 12:43 PM, Liviu Nicoara nikko...@hates.ms wrote: I am attaching a test program which, while 100% MT-safe, is flagged by the Solaris thread analyzer. The program as written is not thread safe. It is reading the value of the counter variable and performing a zero comparison outside of a mutex lock: Stefan, I urge you to consider the argument on its merits. Yes, the thread analyzer flags it, but it is nonetheless MT-safe. Specifically: 1. writes are properly synchronized wrt themselves 2. reads are inherently thread-safe wrt themselves 3. reads are properly synchronized wrt writes 4.no thread can possibly observe an intermediate or otherwise incomplete value. I will also add that the flag is either 0 or 1 during the execution of the program, with only one transition from 0 to 1, performed by one single thread. I will concede that I might be wrong and I am open to arguments. I would accept as a counter-argument this program if you can show a runtime failure. I would also accept as argument a scenario under which two threads would see inconsistent/incorrect values or write the variable more than once, etc. Thanks, Liviu for (size_t i = 0; i nloops; ++i) { if (counter == 0) { // --- pthread_mutex_lock (lock); if (counter == 0) ++counter; pthread_mutex_unlock (lock); } else { // counter value is safe to use here } }
Re: STDCXX-1056 [was: Re: STDCXX forks]
On 09/18/12 18:24, Stefan Teleman wrote: On Tue, Sep 18, 2012 at 4:35 PM, Liviu Nicoara nikko...@hates.ms wrote: I will concede that I might be wrong and I am open to arguments. I would accept as a counter-argument this program if you can show a runtime failure. The the first read of the counter variable is outside a mutex lock correct? The read is followed by a 0 comparison, correct? Correct. What guarantees that between the read and the comparison the value of the counter variable hasn't been modified by another thread? The thread currently doing the comparison cannot guarantee it: it hasn't locked the mutex. Other threads may be running - actually they probably are. Another thread may have already acquired the mutex and incremented the value of counter. Your thread has no way of knowing if that has happened, because it does not yet have exclusive access to the counter variable. It will, after it acquires the mutex. It does not matter if the value is changed concurrently in between the reading of it and the actual comparison. The value can only be 0, in which case it takes the branch that locks the mutex and the next read will absolutely see a fresh value; or it can be 1 in which case it has already been initialized and this thread does not pay the penalty of a lock anymore. Where is it reading the variable from? A register? Is it declared volatile? L2 cache? L3 cache? Irrelevant. If the value has not been changed it can be read from either the cache or directly from the memory. If the value has been changed concurrently in between the reading and the actual comparison, the processor sees either a stale value of 0, or the new value 1. It is guaranteed that any thread reading the value after the (one and only) unlock will see a fresh value. The program, as you wrote it, implicitly acknowledges that it is not thread safe. That is the point of the double check: one before the mutex lock, and one after it. That is a misstatement of the program intentions. See below. The point of the first check has nothing to do with thread-safety, and everything to do with a minor optimization: if the value stored in variable counter is already not zero, then there's no point in locking the mutex or performing the increment. The first check is indeed an optimization. It is the point of this exercise to show that the unguarded reads in the localization library are not defects and the code, simplified in my test case, is thread safe in exactly the respects I mentioned before: the program only observes consistent, correct values (program states) in a concurrent environment. HTH. Liviu
Re: STDCXX-1056 [was: Re: STDCXX forks]
On 9/18/12 7:04 PM, Stefan Teleman wrote: On Tue, Sep 18, 2012 at 6:42 PM, Liviu Nicoara nikko...@hates.ms wrote: The first check is indeed an optimization. It is the point of this exercise to show that the unguarded reads in the localization library are not defects and the code, simplified in my test case, is thread safe in exactly the respects I mentioned before: the program only observes consistent, correct values (program states) in a concurrent environment. The Intel Thread Analyzer shows a data race in the exact same spot the SunPro Thread Analyzer shows a data race, and in the exact same spot I have stated earlier that there is a data race. The results are available here: http://s247136804.onlinehome.us/liviu-mt-test/22.mt.test.r000ti3.inspxez As usual, it is an Intel Inspector XE 2003 Analysis Results Blob. It opens with the Intel Inspector. i don't know what format it is in, or whether or not it can be read by other means. So far we have SunPro 12.3/Linux/Intel, SunPro 12.3/Solaris/Intel, SunPro 12.3/Solaris/SPARC and Intel 2003/Linux/Intel saying that there is a race condition in your program. And now I would like to get back to solving the thread-unsafety and race conditions problems in Apache stdcxx. Stefan, you cannot ignore others' arguments and embrace a 'no' attitude; the tools are not saying what you think they do. They all show the same thing because they all do the exact same analysis and we could delve into that, too. I sincerely hoped you could come up with some sort of a logical argument that would prove my latest test case wrong, or at least show it failing at runtime. My time is also limited and I have already spent more time than I wanted to trying to present a logical argument, down to the simplest test cases. I say we defer this to Martin for when he will be back from vacation, next week. For the record, because people may not have the energy to read all this back and forth, I will summarize here the gist of this thread: 1. The facet data caching is not MT-safe 2. The facet data initialization (STDCXX or system locales) is safe (*) 3. There is no unit test currently showing a failure in (2) 4. Timing results show that caching may be slower than non-caching in MT builds 5. A fix should, ideally, be binary compatible 6. A fix should, ideally, preserve performance or increase it 7. There is one patch, currently attached to the issue, by Stefan 8. Other partial patches are referenced from this thread Please correct me if I missed anything. The above summary is a good starting point for a new thread dealing with this issue. HTH. Liviu (*) When perfectly forwarding, with a theoretical concern I asked in an earlier post, directed at Martin.
Re: STDCXX-1056 : numpunct fix
Thanks for the feed-back. Please see below. On Sep 19, 2012, at 10:02 PM, Stefan Teleman wrote: On Wed, Sep 19, 2012 at 8:51 PM, Liviu Nicoara nikko...@hates.ms wrote: I think you are referring to `live' cache objects and the code which specifically adjusts the size of the buffer according to the number of `live' locales and/or facets in it. In that respect I would not call that eviction because locales and facets with non-zero reference counters are never evicted. But anyhoo, this is semantics. Bottom line is the locale/facet buffer management code follows a principle of economy. Yes it does. But we have to choose between economy and efficiency. To clarify: The overhead of having unused pointers in the cache is sizeof(void*) times the number of unused slots. This is 2012. Even an entry-level Android cell phone comes with 1GB system memory. If we want to talk about embedded systems, where memory constraints are more stringent than cell phones, then we're not talking about Apache stdcxx anymore, or any other open souce of the C++ Standard Library. These types of systems use C++ for embedded systems, which is a different animal altogether: no exceptions support, no rtti. For example see, Green Hills: http://www.ghs.com/ec++.html. And even they have become more relaxed about memory constraints. They use BOOST. Bottom line: so what if 16 pointers in this 32 pointer slots cache never get used. The maximum amount of wasted memory for these 16 pointers is 128 bytes, on a 64-bit machine with 8-byte sized pointers. Can we live with that in 2012, a year when a $500 laptop comes with 4GB RAM out of the box? I would pick 128 bytes of allocated but unused memory over random and entirely avoidable memory churn any day. The argument is plausible and fine as far as brainstorming goes. But have you measured the amount of memory consumed by all STDCXX locale data loaded in one process? How much absolute time is spent in resizing the locale and facet buffers? What is the gain in space and time performance with such a change versus without? Just how fragmented the heap becomes and is there a performance impact because of it, etc.? IOW, before changing the status quo one must show an objective defect, produce a body of evidence, including a failing test case for the argument. My goal: I would be very happy if any application using Apache stdcxx would reach its peak instantiation level of localization (read: max number of locales and facets instantiated and cached, for the application's particular use case), and would then stabilize at that level *without* having to resize and re-sort the cache, *ever*. That is a locale cache I can love. I love binary searches on sorted containers. Wrecking the container with insertions or deletions, and then having to re-sort it again, not so much. Especially when I can't figure out why we're doing it in the first place. And I love minimalistic code, and hate waste at the same time, especially in a general purpose library. To each its own. Hey Stefan, are the above also timing the changes? Nah, I didn't bother with the timings - yet - for a very simple reason: in order to use instrumentation, both with SunPro and with Intel compilers, optimization of any kind must be disabled. On SunPro you have to pass -xkeepframe=%all (which disables tail-call optimization as well), in addition to passing -xO0 and -g. So the timings for these unoptimized experiments would have been completely irrelevant. Well, I think you are the only one around here with access to SPARC hardware, your input is very precious in this sense. Also, this is the reason for which I kept asking that question earlier: do we have currently any failing locale MT test when numpunct does just perfect forwarding, with no caching? I.e., changing just _numpunct.h and no other source file (as to silence thread analyzers warnings) does any locale (or other) MT tests fail? I would greatly appreciate it if you could give it a run on your hardware if you don't already know the answer. The discussion has been productive. But I object to the patch as is because it goes out of the scope of the original incident. I think this patch should only touch the MT defect detected by the failing test cases. If you think the other parts you changed are defects you should open corresponding issues in JIRA and have them discussed in their separate rooms. Thanks, Liviu
Re: STDCXX-1056 : numpunct fix
On 09/20/12 13:11, Stefan Teleman wrote: On Thu, Sep 20, 2012 at 8:07 AM, Liviu Nicoara nikko...@hates.ms wrote: But have you measured the amount of memory consumed by all STDCXX locale data loaded in one process? How much absolute time is spent in resizing the locale and facet buffers? What is the gain in space and time performance with such a change versus without? Just how fragmented the heap becomes and is there a performance impact because of it, etc.? IOW, before changing the status quo one must show an objective defect, produce a body of evidence, including a failing test case for the argument. sizeof(std::locale) * number_of_locales. I was more interested in the underlying locale data, not the C++ objects. I'll let you in on a little secret: once you call setlocale(3C) and localeconv(3C), the Standard C Library doesn't release its own locale handles until process termination. So you might think you save a lot of memory by destroying and constructing the same locales. You're really not. It's the Standard C Library locale data which takes up a lot of space. Thanks for the secret, I appreciate it. Did you mean to say that the C Standard mandates that?! What I do know for a fact that this optimization did, was to cause the races conditions reported by 4 different thread analyzers. Race conditions are a show-stopper for me, and they are not negotiable. No, that optimization was not causing the MT defect you originally noted. Saying so only shows a lack of familiarity with the implementation. And I love minimalistic code, and hate waste at the same time, especially in a general purpose library. To each its own. Here's a helpful quote: We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. It's from Donald Knuth. Please, no. And I love correct code which doesn't cause thread analyzers to report more than 12000 race conditions for just one test case. I've said it before and I will say it again: race conditions are a showstopper and are not negotiable. Period. Stefan, you are not being correct by a consensus of thread analyzers output or by being loud, or by pounding your fist on the table. This being said I will continue to exercise due diligence and put in the necessary time and effort to provide you with the most useful feed-back I can. I see that you missed my question in the post before: did you see a failure in the locale MT tests in your SPARC runs? If you do not want to run that test, that is fine, just let me know. Thanks, Liviu
Re: STDCXX-1056 : numpunct fix
On Sep 20, 2012, at 5:31 PM, Stefan Teleman wrote: On Thu, Sep 20, 2012 at 5:07 PM, Liviu Nicoara nikko...@hates.ms wrote: To answer your question [...]: yes, the MT failures occur on SPARC as well, on both SPARCV8 and SPARCV9, and the race conditions are reported on *ALL* plaforms tested, even after having applied your _numpunct.h patch. This patch alone does *NOT* solve the problem. Stefan, I want to be clear. You are talking about a patch identical in nature to the one I have attached now. Just want to be 100% sure we are talking about the same thing. This one still produces failures (crashes, assertions, etc.) in the locale MT tests on SPARC and elsewhere in your builds? Thanks, Liviu
Re: STDCXX-1056 : numpunct fix
On Sep 20, 2012, at 7:37 PM, Stefan Teleman wrote: On Thu, Sep 20, 2012 at 7:34 PM, Wojciech Meyer wojciech.me...@googlemail.com wrote: Hi, My perceptions is by reading through the whole thread - we should not trust 100% external tools to asses the safety of the code. I don't think there exist an algorithm that produces no false positives. That's said I admire Stefan's approach, but we should ask the question are we MT safe enough? I would say from what I read here: yes. Based on what objective metric? The only gold currency that anyone in here accepts without reservations are failing test cases. I believe I have seen some exceptions to the golden rule in my RW time, but I can't recall any specific instance. Liviu
Re: STDCXX-1056 : numpunct fix
On Sep 20, 2012, at 5:23 PM, Stefan Teleman wrote: On Thu, Sep 20, 2012 at 4:45 PM, Travis Vitek travis.vi...@roguewave.com wrote: I'll let you in on a little secret: once you call setlocale(3C) and localeconv(3C), the Standard C Library doesn't release its own locale handles until process termination. So you might think you save a lot of memory by destroying and constructing the same locales. You're really not. It's the Standard C Library locale data which takes up a lot of space. You have a working knowledge of all Standard C Library implementations? I happen to do, yes, for the operating systems that I've been testing on. I also happen to know that you don't. This fact alone pretty much closes up *this* particular discussion. Do yourself, and this mailing list a favor: either write a patch which addresses all of your concerns *AND* eliminates all the race conditions reported, or stop this pseudo software engineering bullshit via email. There is apparently, a high concentration of know-it-alls on this mailing list, who are much better at detecting race conditions and thread unsafety than the tools themselves. Too bad they aren't as good at figuring out their own bugs. The sniping is uncalled for. There are no enemies here, no one is after you. There is criticism though and you are expected to take it and argue your point of view. If you can't stand the heat, get out of the kitchen. It took eight months for anyone here to even *acknowledge* that numpunct and moneypunct do have, in fact, a thread safety problem. Never mind that the test case for these facets had been crashing for 4 years. To be quite blunt and to the point, after 8 months of denying obvious facts, your credibility is quite a bit under question at this point. Yes, we are busy with other stuff. I wish I got paid to work on this instead. This entire discussion has become a perfect illustration with what's wrong with the ASF, as reported here: http://www.mikealrogers.com/posts/apache-considered-harmful.html I actually read it. I see a guy complaining he can't have it his way. No problem. One can fork this project at any time and start it anew, by themselves, or in the company of like programmers elsewhere. For better or worse Apache got STDCXX from RogueWave. Complaining about it is like complaining that Apple doesn't give us iPhones for free; after all we are the power users and we know what to do with them. L
Re: STDCXX-1056 : numpunct fix
On Sep 20, 2012, at 7:45 PM, Stefan Teleman wrote: On Thu, Sep 20, 2012 at 7:22 PM, Liviu Nicoara nikko...@hates.ms wrote: Stefan, I want to be clear. You are talking about a patch identical in nature to the one I have attached now. Just want to be 100% sure we are talking about the same thing. This one still produces failures (crashes, assertions, etc.) in the locale MT tests on SPARC and elsewhere in your builds? On September 17, 2012 I have posted the following message to this list: http://www.mail-archive.com/dev@stdcxx.apache.org/msg01929.html In that message, there is a link to my SPARC thread-safety test results: http://s247136804.onlinehome.us/stdcxx-1056-SPARC-20120917/22.locale.numpunct.mt.nts.1.er.html/index.html This test was run with the following _numpunct.h file: http://s247136804.onlinehome.us/stdcxx-1056-SPARC-20120917/22.locale.numpunct.mt.nts.1.er.html/file.14.src.txt.html The test above shows 12440 race conditions detected for a test run of 22.locale.numpunct.mt, with --nthreads=8 --nloops=1. Did you ever look at these test results? From reading your email, I realize that you never looked at it. That is the only possible explanation as to why you're asking now for SPARC test results, today being September 20, 2012. I see, there is a confusion about this. Probably nobody explained it before. A failing test case means a test case that causes the abnormal termination of the execution of the program or creates evidence of abnormal data in the program execution. In this respect please see the atomic add and exchange tests as classical examples of what I mean. I have read all your emails in detail and I have inspected all your attachments, modulo the ones I could not open. Thanks, Liviu
Re: STDCXX-1056 : numpunct fix
On Sep 20, 2012, at 8:02 PM, Stefan Teleman wrote: On Thu, Sep 20, 2012 at 7:52 PM, Liviu Nicoara nikko...@hates.ms wrote: On Sep 20, 2012, at 7:49 PM, Stefan Teleman wrote: On Thu, Sep 20, 2012 at 7:40 PM, Liviu Nicoara nikko...@hates.ms wrote: The only gold currency that anyone in here accepts without reservations are failing test cases. I believe I have seen some exceptions to the golden rule in my RW time, but I can't recall any specific instance. That may be a valid metric here. The only one. Any programmer worth his salt -- I am borrowing your words here -- would be able to demonstrate the validity of his point of view with a test case. I did. There are 12440 race conditions detected for an incomplete run of 22.locale.numpunct.mt. By incomplete I mean: it did not run with its default nthreads and nloops which I believe are 8 threads and 20 loop iterations. That is not it, and you did not. Please pay attention: given your assertion that a race condition is a defect that causes an abnormal execution of the program during which the program sees abnormal, incorrect states (read: variable values) it should be easy for you to craft a program that shows evidence of that by either printing those values, or aborting upon detecting them, etc. [...] and overall just email bullshit. Stop using that word. L
Re: STDCXX-1056 : numpunct fix
On Sep 20, 2012, at 8:59 PM, Stefan Teleman wrote: On Thu, Sep 20, 2012 at 8:44 PM, C. Bergström cbergst...@pathscale.com wrote: fencepost comment - The results are based on tools and I don't think he has a large program which actually triggers the conditions. (Creating one may take quite some time) I do have a program which triggers the race conditions: 22.locale.numpunct.mt. Part of the official test harness. The real reason why they don't want to accept what the instrumentation tools are saying is very simple: they don't LIKE reading what the tools are saying. So, blame the tools, pretend that as long as it doesn't crash again there's no bug and hope for the best. I cannot include an analyzer output as a regression test in the test suite. But I am very glad this is on a public mailing list, so everyone can read what's going on here. ?
Re: STDCXX-1056 : numpunct fix
On 09/21/12 05:13, Stefan Teleman wrote: On Fri, Sep 21, 2012 at 2:28 AM, Travis Vitek travis.vi...@roguewave.com wrote: I have provided this list with test results showing that my patch *does* fix the race condition problems identified by all the tools at my disposal. I'm willing to bet you never looked at it. You dismissed it outright, just because you didn't like the *idea* that increasing the size of the cache, and eliminating that useless cache resizing would play an important role in eliminating the race conditions. I looked at it in great detail and I am sure Travis read it too. The facet/locale buffer management is a red herring. Apparently, we cannot convince you, regardless of the argument. That's fine by me. I have yet to see an alternative patch being proposed here, which would eliminate the race conditions, and which I am willing to test just as thoroughly as I've tested mine. The only thing i've seen are continued attempts at dismissing the existence of these race conditions, as reported by the instrumentation tools, based on some general axioms about their accuracy. No-one on this list has a clue as to how the SunPro Thread Analyzer actually works, because it's not open source, and none of you work at Oracle, therefore you can't see the code. But everyone just *knows* that it's inaccurate, or that it reports false positives. We are not dismissing the analysis. We are dismissing your interpretation of it. The tool is not indicating a defect, but a potential defect. I am calling it potential because, as my test case (http://tinyurl.com/97ks9gc) indicates, one can have a race condition which is not a defect in a concurrent environment. As long as you, or anyone else, continues to blame the instrumentation tools for the bug, and as long as anyone here continues to suggest that the only acceptable proof of the existence of this bug is some other program which needs to be written using fprintf(3C), and as long as no-one here is willing to provide an alternative patch which demonstrably eliminates 100% of the reported race conditions, this entire back-and-forth about the existence of these race conditions, the accuracy of the tools and what they are reporting is nothing but a giant waste of time. The correction of 1056 does not necessarily need to eliminate the analyzer warnings. Aiming for that is simply unreasonable, not because it is not attainable (after all, you demonstrated it), but because it is inappropriately and insufficiently argued for the situation at hand. Liviu
SUNPro 5.12 compilation failure in optimized Linux builds
Optimized (but not debug) builds fail to compile setlocale.cpp with the error: $ cat t.cpp; CC -c t.cpp #define _XOPEN_SOURCE #include cwchar /opt/sunpro/12_3/prod/include/cc/wchar.h, line 90: Error: tm is not defined. /opt/sunpro/12_3/prod/include/cc/wchar.h, line 92: Error: fgetwc is not defined. ... and so on for every wide-char function in wchar.h. The definition of _XOPEN_SOURCE at the top of setlocale.cpp is not entirely correct in that it interacts with the various guards in the C library headers. AFAICT, Oracle/Sun wchar.h includes C library wchar.h, which includes C library wctype.h, which includes again wchar.h, coming full circle to the compiler wchar.h which uses the names w/o them being defined, yet. The fact that Oracle/Sun headers include_next corresponding C library headers outside of a inclusion guard does not help. (Debug builds are not affected because of interaction from the difference in the structure and includes of rw/_traits.h in debug vs. optimized builds). The poor man's fix is to guard that _XOPEN_SOURCE define for SUNPro builds (see below). However, I am not sure whose side the error is. Thanks, Liviu PS. Is it still called SUNPro? Oracles seems it has sanitized that name out on their website. Index: src/setlocale.cpp === --- src/setlocale.cpp (revision 1388733) +++ src/setlocale.cpp (working copy) @@ -34,8 +34,10 @@ #include rw/_defs.h #if defined (__linux__) !defined (_XOPEN_SOURCE) +# if !defined (__SUNPRO_CC) // need S_IFDIR on Linux # define _XOPEN_SOURCE +# endif // !__SUNPRO_CC #endif // __linux__ !_XOPEN_SOURCE #include locale.h // for setlocale()
STDCXX-1056 DCII [was: Re: STDCXX-1056 [was: Re: STDCXX forks]]
On 9/17/12 5:42 PM, Liviu Nicoara wrote: [...] However, after more careful thought, I think there is a problem there even though we don't have an objective proof for it, yet. The writes are not atomic and they function just like DCII, being subject to both compiler reordering and out of order execution. E.g., it is assumed that the writes to the _C_impsize and _C_impdata members occur in the program order, which would make unguarded reads/tests of _C_impsize safe. This is what the facet initialization and access code does, conceptually: if (_C_impsize) { // already initialized, use data } else { // initialize mutex.lock (); _C_impdata = get_data_from_database (); _C_impsize = get_data_size (); mutex.unlock (); } The mutex lock and unlock operations introduce memory barriers but they are not guaranteeing the order in which the two writes occur. If the writes would occur in the exact program order, unguarded reads and tests of _C_impsize would be safe. Unfortunately, a thread may see the _C_impsize variable being updated before the _C_impdata. Elaborating on the above -- facet.cpp:~250, the code is actually closer to this: if (_C_impsize) { // already initialized, use _C_impdata } else { // initialize mutex.lock (); if (_C_impsize) return _C_impdata; void* const pv = ...; size_t sz = ...; _C_impdata = pv; // 1 _C_impsize = sz; // 2 mutex.unlock (); } (1) and (2) can be reordered. Also, when mapping the STDCXX locale database, facet.cpp:288, there is another defect writing the two variables: _C_impdata = __rw_get_facet_data (cat, _C_impsize, 0, codeset); ^^^ ||| Set early in __rw_get_facet_data -+++ The only way to order the above writes, preserve the fast checking for initialization, and generally salvaging the current algorithm is a full memory barrier in between the writes, e.g.: if (_C_impsize) { // already initialized, use data } else { // ... _C_impdata = pv; full_membar (); _C_impsize = sz; // ... } Atomics are part of C++11, I wonder if anybody here has working experience with the atomic API. A mutex locking will not do because the lock is only acquiring and the write before the lock can shift down. Of course I could be wrong. Any input is welcome. Thanks, Liviu
STDCXX-1068 and alignment
Stefan, could you please elaborate on the misaligned reads/writes that you observed on those platforms? What was failing? Thanks, Liviu
Re: STDCXX-1066 [was: Re: STDCXX forks]
On 09/16/12 12:03, Stefan Teleman wrote: On Sat, Sep 15, 2012 at 5:47 PM, Liviu Nicoara nikko...@hates.ms wrote: I merely wanted to point out that restoring the default packing is not the same as restoring the packing to the previous value in effect. Given this, I thought about an alternative way of forcing this alignment, e.g., via a union, aligned on an appropriate type. I see an advantage here in that most of the changes would occur where we define the 'primitive' mutex and condition wrappers, saving a few pre-processor conditionals and pragmas along the way. I understood what you were saying. I just don't understand under what _sane_ circumstances a program would #include a system library header file with a previously set packing to something other than default. Or would expect a non-default packing to work during a function call to a sytem library. That's an insane configuration on any operating system that I know of, not just on SPARCV8. I can't think of a valid scenario, either. I guess the point is moot. A few more questions, if you will, as I am going through the changes: 1. I see similarities with 1040, should/would you close that one? 2. The issue only exists in MT builds, should there be a guard in configs? 3. The align reference docs talk only about aligning variables, not types. Is that different on SPARC? 4. I see rw/_mutex.h has alignment pragmas for both __rw_mutex_base class and its mutex member; same for __rw_static_mutex and its static member, etc. How does that work? 5. Why is __rw_guard aligned explicitly? I see it only contains a pointer to a mutex object. 6. The docs mention that the pragma must use the mangled variables names but I don't see that in the patch. Thanks! Liviu
[PATCH] STDCXX-853
Umm, I didn't think to search for a corresponding incident and I considered the defect to be so minor as to not warrant an issue. The following patch has been applied already on 4.2.x: Index: tests/support/atomic_xchg.cpp === --- tests/support/atomic_xchg.cpp (revision 1388732) +++ tests/support/atomic_xchg.cpp (revision 1388733) @@ -297,7 +297,7 @@ void run_test (intT, thr_args_base::tag_ // compute the expected result, skipping zeros by incrementing // expect twice when it overflows and wraps around to 0 (zero is // used as the lock variable in thread_routine() above) -intT expect = intT (1); +intT volatile expect = intT (1); const unsigned long nincr = (Args::nthreads_ * Args::nincr_) / 2U;
[PATCH] STDCXX-1069 [was: Re: SUNPro 5.12 compilation failure in optimized Linux builds]
On 09/22/12 00:51, Liviu Nicoara wrote: Optimized (but not debug) builds fail to compile setlocale.cpp with the error: A patch and a comment have been attached to the issue. Thanks, Liviu
Re: STDCXX-1066 [was: Re: STDCXX forks]
On 9/23/12 3:48 PM, Stefan Teleman wrote: On Sun, Sep 23, 2012 at 3:26 PM, Liviu Nicoara nikko...@hates.ms wrote: To be honest it's quite bizarre that you cannot share that with us. Is it really a trade secret? How can that be the case if Oracle customers are also required to perform the same alignment, perhaps using the same techniques like you showed in the patch? That's the problem. I don't know what is and what is not a trade secret, or copyrighted information, or unauthorized intellectual property disclosure anymore. I think it's in the eye of the beholder. At Sun it was very clear. Believe it or not, I had to get written approval from the Legal Counsel's Office in order to be able to share these patches. And that in spite of the fact that these patches are published, and have already been published for *years*. That clarifies things. Thanks. Liviu
Re: STDCXX-1066 [was: Re: STDCXX forks]
On 9/23/12 5:50 PM, Stefan Teleman wrote: On Sun, Sep 23, 2012 at 5:23 PM, Stefan Teleman stefan.tele...@gmail.com wrote: The second URL says this: QUOTE Due to a change in the implementation of the userland mutexes introduced by CR 6296770 in KU 137111-01, objects of type mutex_t and pthread_mutex_t must start at 8-byte aligned addresses. If this requirement is not satisfied, all non-compliant applications on Solaris/SPARC may fail with the signal SEGV with a callstack similar to the following one or with similar callstacks containing the function mutex_trylock_process. \*_atomic_cas_64(0x141f2c, 0x0, 0xff00, 0x1651, 0xff00, 0x466d90) set_lock_byte64(0x0, 0x1651, 0xff00, 0x0, 0xfec82a00, 0x0) fast_process_lock(0x141f24, 0x0, 0x1, 0x1, 0x0, 0xfeae5780) /QUOTE Here's a link to an official datatype alignment table for SPARCV8: http://docs.oracle.com/cd/E19205-01/819-5267/bkbkl/index.html The interesting table is: Table B–2 Storage Sizes and Default Alignments in Bytes I see nothing really outstanding here. What is it that I should pay attention to? Thanks, Liviu
Re: [PATCH] STDCXX-853
On 09/24/12 01:18, Travis Vitek wrote: Liviu, Should the volatile be to the left of the intT typename here? I know it is equivalent, but it is weird to look at the line of code below and see that we're following two different conventions. Thanks, will do. Travis ___ From: Liviu Nicoara Sent: Sunday, September 23, 2012 8:34 AM To: dev@stdcxx.apache.org Subject: [PATCH] STDCXX-853 Umm, I didn't think to search for a corresponding incident and I considered the defect to be so minor as to not warrant an issue. The following patch has been applied already on 4.2.x: Index: tests/support/atomic_xchg.cpp === --- tests/support/atomic_xchg.cpp (revision 1388732) +++ tests/support/atomic_xchg.cpp (revision 1388733) @@ -297,7 +297,7 @@ void run_test (intT, thr_args_base::tag_ // compute the expected result, skipping zeros by incrementing // expect twice when it overflows and wraps around to 0 (zero is // used as the lock variable in thread_routine() above) -intT expect = intT (1); +intT volatile expect = intT (1); const unsigned long nincr = (Args::nthreads_ * Args::nincr_) / 2U; -- And now I see with eye serene The very pulse of the machine.
Re: STDCXX-1056 : numpunct fix
On 9/24/12 12:06 AM, Stefan Teleman wrote: On Fri, Sep 21, 2012 at 9:10 AM, Liviu Nicoara nikko...@hates.ms wrote: On 09/21/12 05:13, Stefan Teleman wrote: On Fri, Sep 21, 2012 at 2:28 AM, Travis Vitek travis.vi...@roguewave.com wrote: I have provided this list with test results showing that my patch *does* fix the race condition problems identified by all the tools at my disposal. I'm willing to bet you never looked at it. You dismissed it outright, just because you didn't like the *idea* that increasing the size of the cache, and eliminating that useless cache resizing would play an important role in eliminating the race conditions. I looked at it in great detail and I am sure Travis read it too. The facet/locale buffer management is a red herring. Apparently, we cannot convince you, regardless of the argument. That's fine by me. This bug [STDCXX-1056] was updated over the weekend with new comments. Here's the link to the comments, for the record: Stefan, was it your intention to completely eliminate all the race conditions with this last patch? Is this what the tools showed in your environment? Thanks, Liviu https://issues.apache.org/jira/browse/STDCXX-1056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13461452#comment-13461452
Re: STDCXX-1056 : numpunct fix
On 09/24/12 22:28, Martin Sebor wrote: On 09/20/2012 06:46 PM, Stefan Teleman wrote: On Thu, Sep 20, 2012 at 8:39 PM, Liviu Nicoaranikko...@hates.ms wrote: I have not created this requirement out of thin air. STDCXX development has functioned in this manner for as long as I remember. If it does not suit you, that's fine. That would explain why these bugs are present in the first place. If the official method of determining thread-safety here is fprintf(3C), then we have a much bigger problem than 22.locale.numpunct.mt. Tests that end in .mt.cpp use the RW_ASSERT() macro to verify correctness in multithreaded programs. There is no way at the moment to run a thread analyzer as part of the test suite, although it would be a nice addition. As I said in my other reply, though, not all data races necessarily indicate bugs, so we'd need a way to distinguish between know/benign races and the rest. (Unless we decide to eliminate all races and accept the performance penalty.) Preserving the lazy initialization is possible: we perfectly forward to the implementation with either: 1. no additional synchronization (preserves fast reads of the data), while fully knowing that there are data races in __rw_get_numpunct/_C_get_data there, or 2.we put a big lock on top of all and every facet operations (the version that would be race free). Caching seems out of the window for now, more so in the presence of unguarded reads of facet's data. The only argument for (1) is that at the moment we don't have a failing test yet, and not for lack of arduous trying. However, if atomic/membar APIs would be recognized by the compiler, lazy initialization of the facet and caching of data would be attainable in a quite simple fashion. Liviu
Re: STDCXX-1056 : numpunct fix
On 09/24/12 23:50, Stefan Teleman wrote: On Mon, Sep 24, 2012 at 10:03 PM, Martin Sebor mse...@gmail.com wrote: FWIW, there are race conditions in stdcxx. Some of them are by design and benign on the systems the library runs on (although I acknowledge that some others may be bugs). One such benign date race is: timeT1 T2 0x = N 1x = N read x where x is a scalar that can be accessed atomically by the CPU and the compiler. I think some of the lazy facet initialization falls under this class. It would be nice to fix them but only if it doesn't slow things down. The others need to be fixed in any case. The race conditions I am talking about are not benign. Caching failures aside, we have no failing tests. Before saying they are malign you need to objectively show a failing program. It shows the types of race conditions it's reporting: these are read/write race conditions, not read/read. The thread analyzer's html filter doesn't show the types of races reported as clearly as the command-line analyzer which has a windowing GUI. Martin's example above is a read-write race. In the absence of a failing test case it is quite unreasonable to modify the current implementation. Liviu
STDCXX-1070
I filed 1070, failure to build 22.locale.collate.cpp on Linux with gcc 4.7.1. Gcc, Comeau and Clang fail to compile it, Intel and Sun are fine. It looks to me like Intel and Sun compilers are not doing the right thing. A small test case and a patch have been attached. The failing code has been reduced to: $ cat test.cpp; g++ -c test.cpp template class charT void f () { g (charT ('a')); } template class charT void g (charT) { } int h () { return f char (), 0; } test.cpp: In instantiation of 'void f() \[with charT = char\]': test.cpp:14:23: required from here test.cpp:4:5: error: 'g' was not declared in this scope, and no declarations were found by argument-dependent lookup at the point of instantiation \[-fpermissive\] test.cpp:8:6: note: 'templateclass charT void g(charT)' declared here, later in the translation unit Liviu
Re: [jira] [Closed] (STDCXX-1056) std::moneypunct and std::numpunct implementations are not thread-safe
On 9/25/12 7:56 PM, Stefan Teleman (JIRA) wrote: [ https://issues.apache.org/jira/browse/STDCXX-1056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan, I don't think it's ok to close this bug. The race conditions are there and we have not come to a completely satisfactory conclusion on how to deal with them, or even if we should deal with them. Whichever it is we gotta keep this discussion open. I sure hope you want to be a part of it. FWIW, I have spent quite some time looking at your proposed patch and I am going to reopen the incident. If I can. Liviu Stefan Teleman closed STDCXX-1056. -- Resolution: Won't Fix Bug will not be fixed. Upstream refuses to acknowledge the existence of the bug in spite of objective evidence to the contrary. std::moneypunct and std::numpunct implementations are not thread-safe - Key: STDCXX-1056 URL: https://issues.apache.org/jira/browse/STDCXX-1056 Project: C++ Standard Library Issue Type: Bug Components: 22. Localization Affects Versions: 4.2.1, 4.2.x, 4.3.x, 5.0.0 Environment: Solaris 10 and 11, RedHat and OpenSuSE Linux, Sun C++ Compilers 12.1, 12.2, 12.3 Issue is independent of platform and/or compiler. Reporter: Stefan Teleman Labels: thread-safety Fix For: 4.2.x, 4.3.x, 5.0.0 Attachments: 22.locale.numpunct.mt.out, facet.cpp.diff, locale_body.cpp.diff, punct.cpp.diff, runtests-linux32-all.out, runtests-linux64-all.out, runtests.out, STDCXX-1056-additional-timings.tgz, stdcxx-1056.patch, stdcxx-1056-timings.tgz, stdcxx-4.2.x-numpunct-perfect-forwarding.patch, stdcxx-4.3.x-numpunct-perfect-forwarding.patch several member functions in std::moneypunct and std::numpunct return a std::string by value (as required by the Standard). The implication of return-by-value being that the caller owns the returned object. In the stdcxx implementation, the std::basic_string copy constructor uses a shared underlying buffer implementation. This shared buffer creates the first problem for these classes: although the std::string object returned by value *appears* to be owned by the caller, it is, in fact, not. In a mult-threaded environment, this underlying shared buffer can be subsequently modified by a different thread than the one who made the initial call. Furthermore, two or more different threads can access the same shared buffer at the same time, and modify it, resulting in undefined run-time behavior. The cure for this defect has two parts: 1. the member functions in question must truly return a copy by avoiding a call to the copy constructor, and using a constructor which creates a deep copy of the std::string. 2. access to these member functions must be serialized, in order to guarantee atomicity of the creation of the std::string being returned by value. Patch for 4.2.1 to follow. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
Re: [jira] [Closed] (STDCXX-1066) SPARCV8 requires pthread_mutex_t and pthread_cond_t to be aligned on an 8-byte boundary
On 9/25/12 7:56 PM, Stefan Teleman (JIRA) wrote: [ https://issues.apache.org/jira/browse/STDCXX-1066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anybody around here, except Stefan, who has access to a SPARC V8 machine updated to the specified kernel update or later, and who is willing to run a simple test program? It's a 5 minute job at most. Thanks! Stefan Teleman closed STDCXX-1066. -- Resolution: Won't Fix Bug will not be fixed upstream. It is fixed in the Solaris releases. SPARCV8 requires pthread_mutex_t and pthread_cond_t to be aligned on an 8-byte boundary --- Key: STDCXX-1066 URL: https://issues.apache.org/jira/browse/STDCXX-1066 Project: C++ Standard Library Issue Type: Bug Components: Thread Safety Affects Versions: 4.2.1, 4.2.2, 4.2.x, 4.3.x Environment: Solaris 10 Update 6 or later on SPARCV8 [32-bit] Defect is compiler-independent - in reality it only affects Sun Studio and GCC. Reporter: Stefan Teleman Labels: features, runtime, threads Fix For: 4.2.2, 4.2.x, 4.3.x Attachments: _config-gcc.h.stdcxx-1066.patch, _config-sunpro.h.stdcxx-1066.patch, ctype.cpp.stdcxx-1066.patch, exception.cpp.stdcxx-1066.patch, ios.cpp.stdcxx-1066.patch, iostream.cpp.stdcxx-1066.patch, iostream.stdcxx-1066.patch, locale_body.cpp.stdcxx-1066.patch, locale_classic.cpp.stdcxx-1066.patch, messages.cpp.stdcxx-1066.patch, _mutex.h.stdcxx-1066.patch, time_put.cpp.stdcxx-1066.patch, use_facet.h.stdcxx-1066.patch Starting with Solaris 10 Update 6, on SPARCV8, pthread_mutex_t and pthread_cond_t MUST be aligned on an 8-byte boundary. Misaligned access will result in either SEGV or SIGBUS. There are numerous places in the multi-threaded version of stdcxx where pthread_mutex_t and/or pthread_cond_t types are contained within an union, but with an enforced alignment different than 8. All these instances must be corrected, and #ifdef-guarded for SPARCV8. Patches to follow shortly, this is just opening the issue. Warning: the patchset resolving this issue is very large, and it affects a large number of files. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
Re: STDCXX-1056 : numpunct fix
On 9/24/12 11:06 PM, Stefan Teleman wrote: On Mon, Sep 24, 2012 at 7:48 PM, Liviu Nicoara nikko...@hates.ms wrote: Stefan, was it your intention to completely eliminate all the race conditions with this last patch? Is this what the tools showed in your environment? https://issues.apache.org/jira/browse/STDCXX-1056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13461452#comment-13461452 Yes, all the race conditions in std::numpunct and std::moneypunct. Not all the race conditions in stdcxx. FWIW, I have analyzed the performance of the latest patch from Stefan. The test program is the same test program used in the timings attachments to the incident. The version of the library was 4.2.1, compiler gcc 4.7.1, with: 1. The simple, perfect forwarding patch, with no caching, which does eliminate race conditions. 2. Stefan's latest patch 3. A patch I created which truly, if crudely, eliminates all facet race conditions by using a top-level lock for all facet operations(*) The numbers from the first test run are already available in the incident. The numbers for the second test run were a bit bigger, which is expected because of the additional synchronization. E.g., I have found that the test, run against the STDCXX locale database, gives: $ time ./t en_US.UTF-8 4 500 4, 500 real0m19.210s user0m32.659s sys 0m36.507s where the first patch gives: $ time ./t en_US.UTF-8 4 5000; done 4, 5000 real0m7.961s user0m31.211s sys 0m0.003s Notice the number of loops, 10x larger in the second set of number. Now, for the third patch, the numbers are embarrassingly large. It practically slows the program to a crawl because it does not do any reads of facet data without holding the lock. This, I wager, eliminates all races in the numpunct/moneypunct facet. See: $ time ./t en_US.UTF-8 4 10 4, 10 real0m24.316s user0m28.213s sys 0m6.633s Now that is a whooping performance hit. I went back to see an explanation between the numbers I see with Stefan's more performing patch, and my lock-all patch. I believe that there are still places in Stefan's patch where the facet makes unguarded reads of facet data, e.g.: loc/_numpunct.h: 190 // returns a pointer to the facet's implementation data 191 // if it exists, 0 otherwise 192 const void* _C_data () const { 193 return _C_impsize ? _C_impdata 194 : _RWSTD_CONST_CAST (__rw_facet*, this)-_C_get_data (); 195 } where _C_impsize/_C_impdata are read outside of a lock scope, etc. I.e., a thread may read the _C_impsize while, say, another is still writing the _C_impdata member variable. I conclude, even with such simple testing, that any solution that attempts to completely eliminate the race conditions in these facets will incur a penalty, with the safest (top-level locking) the worst by orders of magnitude and any other solution somewhere in between. Please let me know if you have any questions. Liviu (*) Will attach it as soon as we figure out how to undo the closing of the issue.
Re: svn commit: r1389337 - /stdcxx/branches/4.2.x/tests/support/atomic_xchg.cpp
On 09/25/12 23:20, Martin Sebor wrote: On 09/24/2012 06:38 AM, lnico...@apache.org wrote: Author: lnicoara Date: Mon Sep 24 12:38:17 2012 New Revision: 1389337 URL: http://svn.apache.org/viewvc?rev=1389337view=rev Log: 2012-09-24 Liviu Nicoara nikko...@hates.ms * tests/support/atomic_xchg.cpp: (run_test) moved counter volatile qualification to match STDCXX conventions (see r1388733) The stdcxx ChangeLog format follows the GNU Coding Standard: http://www.gnu.org/prep/standards/html_node/Change-Logs.html I.e., this entry should look like this: * tests/support/atomic_xchg.cpp (run_test): Move counter volatile qualification to match STDCXX convention (see r1388733). (The colon follows the parenthesized function name. The text should also end with a period.) The svn propedit command lets you change a comment for an already committed change. Sorry about that. I fixed it. L
Reopen closed incidents
I want to reopen the closed incidents, esp. 1056. On a second thought it might be more useful to open new ones and link the old ones so that we don't mess with ownership, etc. If nobody objects to this I will go forward with the latter. Thanks.
STDCXX-1071 numpunct facet defect
I have created STDCXX-1071 and linked to STDCXX-1056. I have reduced the scope to numpunct because moneypunct is not failing for me. If someone has a moneypunct failure listing I want to see it. I have reduced the library code to a failing test case. I have attached there the reduced program. The program shows the same failures like the library (interestingly enough it does not fail when removing the data caching). I have (re)attached the forwarding patches I created for 1056, one binary compatible, the other not. Martin, my wish is to move this to some completion. Please let me know if there is something else you think I should do to speed this up. I am open to all questions, the more the better. Most of my opinions have been expressed earlier, but please ask if you want to know more. Thanks, Liviu
Re: [jira] [Closed] (STDCXX-1066) SPARCV8 requires pthread_mutex_t and pthread_cond_t to be aligned on an 8-byte boundary
On 09/27/12 07:15, Pavel Heimlich, a.k.a. hajma wrote: 2012/9/26 Liviu Nicoara nikko...@hates.ms: On 09/26/12 05:49, Pavel Heimlich, a.k.a. hajma wrote: 2012/9/26 Liviu Nicoara nikko...@hates.ms: On 9/25/12 7:56 PM, Stefan Teleman (JIRA) wrote: [ https://issues.apache.org/jira/browse/STDCXX-1066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anybody around here, except Stefan, who has access to a SPARC V8 machine updated to the specified kernel update or later, and who is willing to run a simple test program? It's a 5 minute job at most. Please point me to the test program. Hi Pavel, I attached it. IIRC Solaris had both Solaris threads API and a POSIX threads API on top of it. Could you please give it a run with MUTEX defined to both mutex_t and pthread_mutex_t? Might need to tweak the includes. Here it is, the define seems to make no difference. BTW I'm not sure about the bug description, was Solaris 10 ever supported on a sparcv8? I ran it on the crappiest machine I could find, but all that is sparcv9. Sun Blade 1000 - UltraSPARC-III+(sparcv9), running Solaris 10u9 bash-3.00# /opt/SUNWspro/bin/CC a.cpp bash-3.00# ./a.out 24 ffbffcd8 ffbffcf0 ffbffd08 ffbffd20 ffbffd38 ffbffd50 ffbffd68 ffbffd80 ffbffbe0 ffbffc00 ffbffc20 ffbffc40 ffbffc60 ffbffc80 ffbffca0 ffbffcc0 bash-3.00# /opt/SUNWspro/bin/CC -xarch=v8 ./a.cpp bash-3.00# ./a.out 24 ffbffcd8 ffbffcf0 ffbffd08 ffbffd20 ffbffd38 ffbffd50 ffbffd68 ffbffd80 ffbffbe0 ffbffc00 ffbffc20 ffbffc40 ffbffc60 ffbffc80 ffbffca0 ffbffcc0 bash-3.00# vi a.cpp (--- pthread_mutex_t) bash-3.00# /opt/SUNWspro/bin/CC a.cpp bash-3.00# ./a.out 24 ffbffcd8 ffbffcf0 ffbffd08 ffbffd20 ffbffd38 ffbffd50 ffbffd68 ffbffd80 ffbffbe0 ffbffc00 ffbffc20 ffbffc40 ffbffc60 ffbffc80 ffbffca0 ffbffcc0 bash-3.00# /opt/SUNWspro/bin/CC -xarch=v8 ./a.cpp bash-3.00# ./a.out 24 ffbffcd8 ffbffcf0 ffbffd08 ffbffd20 ffbffd38 ffbffd50 ffbffd68 ffbffd80 ffbffbe0 ffbffc00 ffbffc20 ffbffc40 ffbffc60 ffbffc80 ffbffca0 ffbffcc0 HTH Very much appreciated. Liviu
Re: [PATCH] STDCXX-1069 [was: Re: SUNPro 5.12 compilation failure in optimized Linux builds]
On 09/23/12 12:15, Liviu Nicoara wrote: On 09/22/12 00:51, Liviu Nicoara wrote: Optimized (but not debug) builds fail to compile setlocale.cpp with the error: A patch and a comment have been attached to the issue. I am posting it here to save a trip to the JIRA issue. Any feed-back is appreciated. The issue can be tracked down to the interaction between the definition of _XOPEN_SOURCE in src/setlocale.cpp and util/path.cpp, the system headers, and Solaris Studio C . It seems that the original purpose for the definitions has been lost because both S_IFDIR and symlink are defined and declared, respectively, on modern Linux (edit: by default). Moreover, it seems that the original definition of _XOPEN_SOURCE was incorrect, without a value. I propose we remove the two blocks – see the patch. Thanks, Liviu Index: src/setlocale.cpp === --- src/setlocale.cpp (revision 1388733) +++ src/setlocale.cpp (working copy) @@ -33,11 +33,6 @@ #include rw/_defs.h -#if defined (__linux__) !defined (_XOPEN_SOURCE) - // need S_IFDIR on Linux -# define _XOPEN_SOURCE -#endif // __linux__ !_XOPEN_SOURCE - #include locale.h // for setlocale() #include stdlib.h // for getenv() #include string.h // for memcpy(), strcmp() Index: util/path.cpp === --- util/path.cpp (revision 1388733) +++ util/path.cpp (working copy) @@ -27,16 +27,6 @@ **/ #ifndef _WIN32 -# ifdef __linux__ - // for symlink() -#ifndef _XOPEN_SOURCE -# define _XOPEN_SOURCE -#endif -#ifndef _XOPEN_SOURCE_EXTENDED -# define _XOPEN_SOURCE_EXTENDED -#endif -# endif // __linux__ - # include unistd.h // for getcwd() # include sys/stat.h// for struct stat, stat() #else
Re: STDCXX-1068 and alignment
On 09/22/12 16:22, Stefan Teleman wrote: On Sat, Sep 22, 2012 at 4:07 PM, Liviu Nicoara nikko...@hates.ms wrote: Stefan, could you please elaborate on the misaligned reads/writes that you observed on those platforms? What was failing? Several tests from the test harness were failing because of it, and for no other reason. I'll have to go through my stuff at work and look at the test reports from the Compiler Guys, we were testing these back in March 2011, and right now I can't remember exactly which tests were failing. I'll have the details in a few days. Any news on this one? Thanks, Liviu
Re: STDCXX-1071 numpunct facet defect
On 09/26/12 20:12, Liviu Nicoara wrote: I have created STDCXX-1071 and linked to STDCXX-1056. [...] I am open to all questions, the more the better. Most of my opinions have been expressed earlier, but please ask if you want to know more. I am attaching here the proposed (4.3.x) patch and the timings results (after re-verifying the correctness of the timing program and the results). The 4.2.x patch, the 4.3.x patch, the test program and the results file are also attached to the incident. Thanks, Liviu Index: include/loc/_numpunct.h === --- include/loc/_numpunct.h (revision 1388733) +++ include/loc/_numpunct.h (working copy) @@ -61,7 +61,7 @@ struct numpunct: _RW::__rw_facet string_type; _EXPLICIT numpunct (_RWSTD_SIZE_T __ref = 0) -: _RW::__rw_facet (__ref), _C_flags (0) { } +: _RW::__rw_facet (__ref) { } virtual ~numpunct () _RWSTD_ATTRIBUTE_NOTHROW; @@ -109,15 +109,6 @@ protected: virtual string_type do_falsename () const { return _RW::__rw_get_punct (this, _RW::__rw_fn, char_type ()); } - -private: - -int _C_flags; // bitmap of cached data valid flags -string _C_grouping;// cached results of virtual members -string_type _C_truename; -string_type _C_falsename; -char_type _C_decimal_point; -char_type _C_thousands_sep; }; @@ -139,17 +130,7 @@ template class _CharT inline _TYPENAME numpunct_CharT::char_type numpunct_CharT::decimal_point () const { -if (!(_C_flags _RW::__rw_dp)) { - -numpunct* const __self = _RWSTD_CONST_CAST (numpunct*, this); - -// [try to] get the decimal point first (may throw) -// then set a flag to avoid future initializations -__self-_C_decimal_point = do_decimal_point (); -__self-_C_flags |= _RW::__rw_dp; -} - -return _C_decimal_point; +return do_decimal_point (); } @@ -157,34 +138,14 @@ template class _CharT inline _TYPENAME numpunct_CharT::char_type numpunct_CharT::thousands_sep () const { -if (!(_C_flags _RW::__rw_ts)) { - -numpunct* const __self = _RWSTD_CONST_CAST (numpunct*, this); - -// [try to] get the thousands_sep first (may throw) -// then set a flag to avoid future initializations -__self-_C_thousands_sep = do_thousands_sep (); -__self-_C_flags |= _RW::__rw_ts; -} - -return _C_thousands_sep; +return do_thousands_sep (); } template class _CharT inline string numpunct_CharT::grouping () const { -if (!(_C_flags _RW::__rw_gr)) { - -numpunct* const __self = _RWSTD_CONST_CAST (numpunct*, this); - -// [try to] get the grouping first (may throw) -// then set a flag to avoid future initializations -__self-_C_grouping = do_grouping (); -__self-_C_flags|= _RW::__rw_gr; -} - -return _C_grouping; +return do_grouping (); } @@ -192,17 +153,7 @@ template class _CharT inline _TYPENAME numpunct_CharT::string_type numpunct_CharT::truename () const { -if (!(_C_flags _RW::__rw_tn)) { - -numpunct* const __self = _RWSTD_CONST_CAST (numpunct*, this); - -// [try to] get the true name first (may throw) -// then set a flag to avoid future initializations -__self-_C_truename = do_truename (); -__self-_C_flags|= _RW::__rw_tn; -} - -return _C_truename; +return do_truename (); } @@ -210,17 +161,7 @@ template class _CharT inline _TYPENAME numpunct_CharT::string_type numpunct_CharT::falsename () const { -if (!(_C_flags _RW::__rw_fn)) { - -numpunct* const __self = _RWSTD_CONST_CAST (numpunct*, this); - -// [try to] get the false name first (may throw) -// then set a flag to avoid future initializations -__self-_C_falsename = do_falsename (); -__self-_C_flags |= _RW::__rw_fn; -} - -return _C_falsename; +return do_falsename (); } // #endif _RWSTD_NO_EXT_NUMPUNCT_PRIMARY -*- mode: org -*- * Machines: ** iMac, Intel, 4 cores: $ uname -a; gcc -v Darwin imax 11.4.0 Darwin Kernel Version 11.4.0: Mon Apr 9 19:32:15 PDT 2012; root:xnu-1699.26.8~1/RELEASE_X86_64 x86_64 gcc version 4.7.1 (GCC) ** Linux Slackware, AMD, 16 cores: $ uname -a; gcc -v Linux behemoth 2.6.37.6 #3 SMP Sat Apr 9 22:49:32 CDT 2011 x86_64 AMD Opteron(tm) Processor 6134 AuthenticAMD GNU/Linux gcc version 4.5.2 (GCC) * Method ** Library Apply the patch. Build an optimized library (I used 12S in all runs). Build the library, rwtest, and locale database: $ nice make -Clib $ nice make -Cbin locales $ nice make -Crwtest Properly export the necessary envar if running against STDCXX locale database or unset, otherwise: $ export RWSTD_LOCALE_ROOT=/path/to/.../nls ** Test program Place the multi-threaded program source file, t.cpp, in srcdir/tests/localization and run make
Re: STDCXX-1071 numpunct facet defect
I thought I replied but I see no trace of my post: On 09/27/12 20:27, Martin Sebor wrote: On 09/27/2012 06:41 AM, Liviu Nicoara wrote: On 09/26/12 20:12, Liviu Nicoara wrote: I have created STDCXX-1071 and linked to STDCXX-1056. [...] I am open to all questions, the more the better. Most of my opinions have been expressed earlier, but please ask if you want to know more. I am attaching here the proposed (4.3.x) patch and the timings results (after re-verifying the correctness of the timing program and the results). The 4.2.x patch, the 4.3.x patch, the test program and the results file are also attached to the incident. The patch isn't binary compatible. We can't remove data members in a minor release. We could only do it in a major release. I posted two patches, one for 4.3.x and one for 4.2.x (the latter is binary compatible). My understanding was that minor version revisions may break binary compatibility. Liviu
STDCXX-1072 SPARC V8 mutex alignment requirements
I have created the above and linked it to the closed STDCXX-1066. In short, my reading about this issue is that the kernel patch changed the alignment of the userland mutex objects from a machine word to a double-word boundary. No changes are required of the users who use such objects in their programs unless users create mutex objects in buffers which may not be aligned on a proper boundary. E.g., the following are safe: mutex_t lock; struct S { char misalign; mutex_t lock; }; whereas the following is not: union { void* align; char buf [sizeof mutex_t]; } u; new (u) mutex_t; because the alignment requirements for void pointer are less strict than for mutex_t. A few places in the library use the latter for all sorts of static objects (mostly local statics). I looked esp. for places where we build objects that contain mutex sub-objects inside a union-aligned buffer: struct S { char c; mutex_t m; }; ... union { void* align; // - incorrect char buf [sizeof (S)]; } u; new (u) S (); The alignment must be changed to a value equal or greater than the mutex alignment requirements. IMO, the patch I attached does not break binary compatibility. It uses a one size fits all long double for alignment -- like we use in rw/_mutex.h -- but in doing so dispenses with all sorts of preprocessor conditionals. I still don't have access to a SPARC machine. Any feed-back and/or SPARC build results are more than welcome! Thanks! Liviu Index: src/messages.cpp === --- src/messages.cpp(revision 1390953) +++ src/messages.cpp(working copy) @@ -65,7 +65,7 @@ struct __rw_open_cat_data { nl_catd catd; union { -void *_C_align; +long double _C_align; char _C_data [sizeof (_STD::locale)]; } loc; }; Index: src/locale_body.cpp === --- src/locale_body.cpp (revision 1390953) +++ src/locale_body.cpp (working copy) @@ -1025,7 +1025,7 @@ _C_manage (__rw_locale *plocale, const c // the classic C locale is statically allocated // and not destroyed during the lifetime of the process static union { -void* _C_align; +long double _C_align; char _C_buf [sizeof (__rw_locale)]; } classic_body; Index: src/use_facet.h === --- src/use_facet.h (revision 1390953) +++ src/use_facet.h (working copy) @@ -64,7 +64,7 @@ else { \ /* construct an ordinary facet in static memory */ \ static union { \ - void *align_; \ + long double align_; \ char data_ [sizeof (__rw_ ## fid ## _facet)]; \ } f;\ static __rw_facet* const pf = \ @@ -91,7 +91,7 @@ { \ /* construct an ordinary facet in static memory */ \ static union { \ - void *align_; \ + long double align_; \ char data_ [sizeof (__rw_ ## fid ## _facet)]; \ } f;\ static __rw_facet* const pf = \ Index: src/ctype.cpp === --- src/ctype.cpp (revision 1390953) +++ src/ctype.cpp (working copy) @@ -627,7 +627,7 @@ _RWSTD_EXPORT __rw_facet* __rw_ct_ctype } else { static union { -void *align_; +long double align_; char data_ [sizeof (_STD::ctypechar)]; } f; static __rw_facet* const pf = Index: src/locale_classic.cpp === --- src/locale_classic.cpp (revision 1390953) +++ src/locale_classic.cpp (working copy) @@ -39,7 +39,7 @@ _RWSTD_NAMESPACE (__rw) { // static buffer for the classic C locale object static union { -void* _C_align; +long double _C_align; char _C_buf [sizeof (_STD::locale)]; } __rw_classic;
Re: STDCXX-1072 SPARC V8 mutex alignment requirements
On 09/28/12 08:29, Liviu Nicoara wrote: I have created the above and linked it to the closed STDCXX-1066. [...] IMO, the patch I attached does not break binary compatibility. Scratch this, I haven't thought it through. Thanks, Liviu
Re: STDCXX-1072 SPARC V8 mutex alignment requirements
On 09/28/12 08:45, Liviu Nicoara wrote: On 09/28/12 08:29, Liviu Nicoara wrote: I have created the above and linked it to the closed STDCXX-1066. [...] IMO, the patch I attached does not break binary compatibility. Scratch this, I haven't thought it through. Actually, after more thought, I believe that the patch itself is not causing binary incompatibility. The library built on a kernel-patched system will be binary incompatible with a library built on a previous system, regardless of the patch I presented earlier, because new mutex alignment requirements will have changed the size and layout of library objects. Comments are welcome. Thanks, Liviu
Re: STDCXX-1071 numpunct facet defect
On 09/28/12 11:01, Travis Vitek wrote: Only major versions can break binary. The versioning policy for stdcxx can be found here.. http://stdcxx.apache.org/versions.html Thanks, that clarifies things. Liviu
Re: STDCXX-1072 SPARC V8 mutex alignment requirements
On 09/28/12 11:45, Travis Vitek wrote: Liviu, Sorry I didn't look until just now, but it appears that I could have re-opened STDCXX-1066. I only see the 'Reopen Issue' button for STDCXX issues, but it is most definitely there. Perhaps there is some sort of permission issue for you? It's ok, I think it's somewhat cleaner to create a new one and link it to the old one. Even if clean was not a concern, it was within Stefan's options to close the incident. I don't know. Also, STDCXX-1066 appears to have been a duplicate of STDCXX-1040. Yep, forgot about it, I am thinking about linking that one, too. Thanks.
Re: STDCXX-1072 SPARC V8 mutex alignment requirements
On 09/28/12 11:32, Travis Vitek wrote: -Original Message- From: Liviu Nicoara Sent: Friday, September 28, 2012 5:29 AM In short, my reading about this issue is that the kernel patch changed the alignment of the userland mutex objects from a machine word to a double-word boundary. No changes are required of the users who use such objects in their programs unless users create mutex objects in buffers which may not be aligned on a proper boundary. Your reading of this is consistent with my previous understanding of the problem, so that is good. I still don't have access to a SPARC machine. Any feed-back and/or SPARC build results are more than welcome! I can provide build results for SPARCV9 if we want them, but I thought that the problem only came up on 32-bit SPARCV8 builds. I guess a -xarch=sparcv8 build will do. It is quite unlikely anybody has a real 32-bit SPARC hardware. The patch assumes the type `long double' exists on every platform. While I do believe that it is available everywhere, we have lots of conditional code guarding its use in the library now. If we are going to use `long double' in this context, we should guard it with _RWSTD_NO_LONG_DOUBLE. I think an even cleaner solution is to switch to using __rw_aligned_buffer instead. It gives us a single point of failure for alignment issues like this, and it makes the code self-documenting and easier to read. I took a cue from an alignment in the rw/_mutex.h code there. I'll do better. As for your concerns about binary compatibility, I think that everything should be okay. We aren't changing the size of anything that is being passed around, we're just changing its alignment. I could be wrong, but I'm pretty sure that we're safe. The library object sizes and layouts will change with or without our patch. Before the kernel patch our objects will have one layout and size and different ones afterwards. E.g.: struct locale { int c; pthread_mutex_t lock; }; before kernel patching you'd have no padding between the members. That's what I thought when firing that second post about compatibility. Liviu
Re: STDCXX-1072 SPARC V8 mutex alignment requirements
On 09/28/12 13:51, Martin Sebor wrote: [...] One other comment: I would suggest choosing subjects for bug reports that reflect the problem rather than a fix for it or a rationale for it. For STDCXX-1066 I think something like Library mutex objects misaligned on SPARCV8 would better capture the problem than the current title. (It's also up to us to rename an issue if we find it more descriptive than the original.) I can't do that myself. I looked at that and 1056 and there is no button for me to reopen, or to edit stuff which is not mine. Needless to say, I like my issues better. But I have no problems with the re-opening of the closed ones. Thanks, Liviu
Re: STDCXX-1071 numpunct facet defect
On 9/28/12 11:01 AM, Travis Vitek wrote: Only major versions can break binary. The versioning policy for stdcxx can be found here.. http://stdcxx.apache.org/versions.html I have renamed the binary-incompatible patch as patch-5.0.x.diff. Thanks, Liviu Travis -Original Message- From: Liviu Nicoara [mailto:nikko...@hates.ms] Sent: Friday, September 28, 2012 3:52 AM To: dev@stdcxx.apache.org Subject: Re: STDCXX-1071 numpunct facet defect I thought I replied but I see no trace of my post: On 09/27/12 20:27, Martin Sebor wrote: On 09/27/2012 06:41 AM, Liviu Nicoara wrote: On 09/26/12 20:12, Liviu Nicoara wrote: I have created STDCXX-1071 and linked to STDCXX-1056. [...] I am open to all questions, the more the better. Most of my opinions have been expressed earlier, but please ask if you want to know more. I am attaching here the proposed (4.3.x) patch and the timings results (after re-verifying the correctness of the timing program and the results). The 4.2.x patch, the 4.3.x patch, the test program and the results file are also attached to the incident. The patch isn't binary compatible. We can't remove data members in a minor release. We could only do it in a major release. I posted two patches, one for 4.3.x and one for 4.2.x (the latter is binary compatible). My understanding was that minor version revisions may break binary compatibility. Liviu
[PATCH] Re: STDCXX-1072 SPARC V8 mutex alignment requirements
On 9/28/12 11:32 AM, Travis Vitek wrote: -Original Message- From: Liviu Nicoara Sent: Friday, September 28, 2012 5:29 AM [...] The patch assumes the type `long double' exists on every platform. While I do believe that it is available everywhere, we have lots of conditional code guarding its use in the library now. If we are going to use `long double' in this context, we should guard it with _RWSTD_NO_LONG_DOUBLE. I think an even cleaner solution is to switch to using __rw_aligned_buffer instead. It gives us a single point of failure for alignment issues like this, and it makes the code self-documenting and easier to read. I am attaching another patch here, which makes use of the __rw_aligned_buffer, slightly more verbose but the code is slightly cleaner. Thanks! Liviu Index: src/messages.cpp === --- src/messages.cpp(revision 1390953) +++ src/messages.cpp(working copy) @@ -38,6 +38,7 @@ #include rw/_mutex.h +#include podarray.h #if !defined (_RWSTD_NO_NL_TYPES_H) !defined (_RWSTD_NO_CATOPEN_CATGETS) # include nl_types.h @@ -64,10 +65,7 @@ _RWSTD_NAMESPACE (__rw) { struct __rw_open_cat_data { nl_catd catd; -union { -void *_C_align; -char _C_data [sizeof (_STD::locale)]; -} loc; +__rw_aligned_buffer_STD::locale loc; }; struct __rw_open_cat_page @@ -159,7 +157,8 @@ __rw_manage_cat_data (int cat, __rw_op cat = int (n_catalogs); catalogs [cat]-catd = pcat_data-catd; -memcpy (catalogs [cat]-loc, pcat_data-loc, +memcpy (catalogs [cat]-loc._C_store (), +pcat_data-loc._C_store (), sizeof (_STD::locale)); if (size_t (cat) largest_cat) @@ -175,7 +174,8 @@ __rw_manage_cat_data (int cat, __rw_op } catalogs [cat]-catd = pcat_data-catd; -memcpy (catalogs [cat]-loc, pcat_data-loc, +memcpy (catalogs [cat]-loc._C_store (), +pcat_data-loc._C_store (), sizeof (_STD::locale)); if (size_t (cat) largest_cat) @@ -258,8 +258,9 @@ int __rw_cat_open (const _STD::string c return -1; __rw_open_cat_data cat_data; + cat_data.catd = catd; -new (cat_data.loc) _STD::locale (loc); +new (cat_data.loc._C_store ()) _STD::locale (loc); int cat = -1; __rw_manage_cat_data (cat, cat_data); @@ -307,7 +308,7 @@ const _STD::locale __rw_get_locale (int _RWSTD_ASSERT (0 != pcat_data); -return *(_RWSTD_REINTERPRET_CAST (_STD::locale*, (pcat_data-loc))); +return *pcat_data-loc._C_data (); } @@ -329,8 +330,7 @@ void __rw_cat_close (int cat) catclose (pcat_data-catd); -_STD::locale* const ploc = -_RWSTD_REINTERPRET_CAST (_STD::locale*, pcat_data-loc); +_STD::locale* const ploc = pcat_data-loc._C_data (); ploc-~locale (); Index: src/locale_body.cpp === --- src/locale_body.cpp (revision 1390953) +++ src/locale_body.cpp (working copy) @@ -1024,14 +1024,12 @@ _C_manage (__rw_locale *plocale, const c // the classic C locale is statically allocated // and not destroyed during the lifetime of the process -static union { -void* _C_align; -char _C_buf [sizeof (__rw_locale)]; -} classic_body; +static __rw_aligned_buffer__rw_locale classic_body; // construct a locale body in place // with the initial reference count of 1 -classic = new (classic_body) __rw_locale (locname); +classic = new (classic_body._C_store ()) +__rw_locale (locname); _RWSTD_ASSERT (1 == classic-_C_ref); } Index: src/use_facet.h === --- src/use_facet.h (revision 1390953) +++ src/use_facet.h (working copy) @@ -37,6 +37,7 @@ #include rw/_defs.h #include access.h +#include podarray.h // helper macro _RWSTD_DEFINE_FACET_FACTORY() defines a facet factory @@ -63,12 +64,9 @@ } \ else { \ /* construct an ordinary facet in static memory */ \ - static union { \ - void *align_; \ - char data_ [sizeof (__rw_ ## fid ## _facet)]; \ - } f
STDCXX-970 and locale tests
While checking for fallout from 1072 I stumbled upon the collate test and its numerous (unrelated to 1072) failures. While I am looking at it I have a question: it seems to me that the locale tests do not test against STDCXX locale database. Is that right? So, all locale tests are supposed to be run without RWSTD_LOCALE_ROOT defined? Thanks, Liviu
Fwd: Re: STDCXX-1071 numpunct facet defect
Forwarding with the attachment. Original Message Subject: Re: STDCXX-1071 numpunct facet defect Date: Sun, 30 Sep 2012 12:09:10 -0600 From: Martin Sebor mse...@gmail.com To: Liviu Nicoara nikko...@hates.ms On 09/27/2012 06:36 PM, Liviu Nicoara wrote: On 9/27/12 8:27 PM, Martin Sebor wrote: On 09/27/2012 06:41 AM, Liviu Nicoara wrote: On 09/26/12 20:12, Liviu Nicoara wrote: I have created STDCXX-1071 and linked to STDCXX-1056. [...] I am open to all questions, the more the better. Most of my opinions have been expressed earlier, but please ask if you want to know more. I am attaching here the proposed (4.3.x) patch and the timings results (after re-verifying the correctness of the timing program and the results). The 4.2.x patch, the 4.3.x patch, the test program and the results file are also attached to the incident. The patch isn't binary compatible. We can't remove data members in a minor release. We could only do it in a major release. There are two of them, one for 4.2.x, one for 4.3.x. I'm still curious about the performance, though. It doesn't make sense to me that a call to a virtual function is faster than one to an almost trivial inline function. I scratched my head long on that. I wager that it depends on the machine, too. Here are my timings for library-reduction.cpp when compiled GCC 4.5.3 on Solaris 10 (4 SPARCV9 CPUs). I had to make a small number of trivial changes to get it to compile: With cache No cache real1m38.332s 8m58.568s user6m30.244s34m25.942s sys 0m0.060s 0m3.922s I also experimented with the program on Linux (CEL 4 with 16 CPUs). Initially, I saw no differences between the two versions. So I modified it a bit to make it closer to the library (the modified program is attached). With those changes the timings are below: With cache No cache real0m 1.107s0m 5.669s user0m17.204s0m 5.669s sys 0m 0.000s0m22.347s I also recompiled and re-ran the test on Solaris. To speed things along, I set the number threads and loops to 8 and 100. The numbers are as follows: With cache No cache real0m3.341s 0m26.333s user0m13.052s1m37.470s sys 0m0.009s 0m0.132s The numbers match my expectation. The overhead without the numpunct cache is considerable. Somewhat unexpectedly, the test with the cache didn't crash. I also tried timing the numpunct patch attached to the issue with the test program. The test program runs to completion without the patch but it crashes with a SIGSEGV after the patch is applied. That suggests there's a bug somewhere in do_thousands_sep() (in addition to the caching). Martin Let me know if there is anything I could help with. Liviu #include iostream #include locale #include cstdio #include cstdlib #include cstring #include pthread.h #include unistd.h #define MAX_THREADS 128 static long nloops = 1000, nthreads = 16; static bool volatile pwait = true; namespace X { struct guard { guard (pthread_mutex_t* ptr) : lock_ (ptr) { pthread_mutex_lock (lock_); } ~guard () { pthread_mutex_unlock (lock_); } private: guard (guard const); guard operator= (guard const); private: pthread_mutex_t* lock_; }; struct facet { facet () { pthread_mutex_init (_C_mutex, 0); } void* _C_data () { return _C_impsize ? _C_impdata : _C_get_data (); } void* _C_get_data (); size_t _C_impsize; void* _C_impdata; pthread_mutex_t _C_mutex; }; void* facet::_C_get_data () { if (_C_impsize) return _C_impdata; guard g (_C_mutex); if (_C_impsize) return _C_impdata; #if !defined (NO_USE_STDCXX_LOCALES) _C_impdata = (void*)\3\3; #endif // USE_STDCXX_LOCALES _C_impsize = (size_t)(-1); return _C_impdata; } void* get_data (facet*); struct numpunct : facet { numpunct () : _C_flags () { } virtual std::string do_grouping () const; std::string grouping () const { numpunct* const __self = (numpunct*)this; #if !defined (NO_USE_NUMPUNCT_CACHE) if (!(_C_flags 1)) { __self-_C_flags |= 1; __self-_C_grouping = do_grouping (); } return _C_grouping; #else return do_grouping (); #endif // NO_USE_NUMPUNCT_CACHE } private: int _C_flags; std::string _C_grouping; }; std::string numpunct::do_grouping () const { return (const char*)get_data (const_castnumpunct*(this)); } void* get_data (facet* pfacet) { void* pdata = pfacet-_C_data (); if (pdata) return pdata; // __rw_setlocale loc (...); - other mutex if (pfacet-_C_data ()) return get_data (pfacet); pfacet-_C_impdata = const_castchar*(\4\4); pfacet-_C_impsize = (size_t)(-1); return 0; } } // namespace X
Re: Fwd: Re: STDCXX-1071 numpunct facet defect
On 9/30/12 2:21 PM, Liviu Nicoara wrote: Forwarding with the attachment. Original Message Subject: Re: STDCXX-1071 numpunct facet defect Date: Sun, 30 Sep 2012 12:09:10 -0600 From: Martin Sebor mse...@gmail.com To: Liviu Nicoara nikko...@hates.ms On 9/27/12 8:27 PM, Martin Sebor wrote: Here are my timings for library-reduction.cpp when compiled GCC 4.5.3 on Solaris 10 (4 SPARCV9 CPUs). I had to make a small number of trivial changes to get it to compile: With cache No cache real1m38.332s 8m58.568s user6m30.244s34m25.942s sys 0m0.060s 0m3.922s I also experimented with the program on Linux (CEL 4 with 16 CPUs). Initially, I saw no differences between the two versions. So I modified it a bit to make it closer to the library (the modified program is attached). With those changes the timings I see the difference -- your program has a virtual function it calls from the inline grouping function. are below: With cache No cache real0m 1.107s0m 5.669s user0m17.204s0m 5.669s sys0m 0.000s0m22.347s I also recompiled and re-ran the test on Solaris. To speed things along, I set the number threads and loops to 8 and 100. The numbers are as follows: With cache No cache real0m3.341s 0m26.333s user0m13.052s1m37.470s sys 0m0.009s 0m0.132s The numbers match my expectation. The overhead without the numpunct cache is considerable. I have done another (smaller) round of measurements, this time using the test program you posted. Here are the results: * iMac, 4x Intel, 12S: 16, 1000: Cached Not cached real0m9.300s 0m5.224s user0m36.441s0m20.523s sys 0m0.043s 0m0.068s * iMac, 4x Intel, 12D: CachedNot cached real0m9.012s 0m5.774s user0m35.343s 0m20.997s sys 0m0.045s 0m0.183s * Linux Slackware, 16x AMD Opteron, 12S: 16, 1000: Cached Not cached real0m29.798s 0m3.278s user0m48.662s 0m47.338s sys 6m18.525s 0m3.298s Somewhat unexpectedly, the test with the cache didn't crash. On my iMac it did not crash for me either (gcc 4.5.4), this time. On the other box (gcc 4.5.2) crashed every time with caching, so I had to add a call to fac.grouping outside the thread function to initialize the facet. Liviu
Fwd: Re: Fwd: Re: STDCXX-1071 numpunct facet defect
Forwarding to the list. Duh. Original Message Subject: Re: Fwd: Re: STDCXX-1071 numpunct facet defect Date: Sun, 30 Sep 2012 19:02:27 -0400 From: Liviu Nicoara nikko...@hates.ms To: Martin Sebor mse...@gmail.com On 9/30/12 6:18 PM, Martin Sebor wrote: I see you did a 64-bit build while I did a 32-bit one. so I tried 64-bits. The cached version (i.e., the one compiled with -UNO_USE_NUMPUNCT_CACHE) is still about twice as fast as the non-cached one (compiled with -DNO_USE_NUMPUNCT_CACHE). I had made one change to the test program that I thought might account for the difference: I removed the call to abort from the thread function since it was causing the process to exit prematurely in some of my tests. But since you used the modified program for your latest measurements that couldn't be it. I can't explain the differences. They just don't make sense to me. Your results should be the other way around. Can you post the disassembly of function f() for each of the two configurations of the test? Here they are. Liviu Dump of assembler code for function f: 0x00403870 +0: push %r15 0x00403872 +2: push %r14 0x00403874 +4: push %r13 0x00403876 +6: push %r12 0x00403878 +8: push %rbp 0x00403879 +9: push %rbx 0x0040387a +10:mov%rdi,%rbx 0x0040387d +13:sub$0x38,%rsp 0x00403881 +17:nopl 0x0(%rax) 0x00403888 +24:movzbl 0x261f11(%rip),%eax# 0x6657a0 _ZL5pwait 0x0040388f +31:test %al,%al 0x00403891 +33:jne0x403888 f+24 0x00403893 +35:cmpq $0x0,0x261ef5(%rip)# 0x665790 _ZL6nloops 0x0040389b +43:jle0x403b12 f+674 0x004038a1 +49:xor%ebp,%ebp 0x004038a3 +51:xor%r12d,%r12d 0x004038a6 +54:lea0x10(%rsp),%r13 0x004038ab +59:lea0x48(%rbx),%r14 0x004038af +63:lea0x20(%rsp),%r15 0x004038b4 +68:jmpq 0x4039a7 f+311 0x004038b9 +73:nopl 0x0(%rax) 0x004038c0 +80:cmp$0x66a020,%rdx 0x004038c7 +87:mov%rdi,0x20(%rsp) 0x004038cc +92:je 0x403ab8 f+584 0x004038d2 +98:mov%rdx,%rdi 0x004038d5 +101: mov%rdx,(%rsp) 0x004038d9 +105: callq 0x403658 pthread_mutex_lock@plt 0x004038de +110: test %eax,%eax 0x004038e0 +112: mov(%rsp),%rdx 0x004038e4 +116: je 0x4038fb f+139 0x004038e6 +118: mov$0x4452a4,%esi 0x004038eb +123: mov$0xa,%edi 0x004038f0 +128: xor%eax,%eax 0x004038f2 +130: callq 0x404370 _ZN4__rw10__rw_throwEiz 0x004038f7 +135: mov(%rsp),%rdx 0x004038fb +139: addl $0x1,0x28(%rdx) 0x004038ff +143: test %rdx,%rdx 0x00403902 +146: je 0x40390c f+156 0x00403904 +148: mov%rdx,%rdi 0x00403907 +151: callq 0x4036c8 pthread_mutex_unlock@plt 0x0040390c +156: mov0x20(%rsp),%rdx 0x00403911 +161: mov%rdx,%rdi 0x00403914 +164: mov%rdx,(%rsp) 0x00403918 +168: callq 0x403258 strlen@plt 0x0040391d +173: mov(%rsp),%rdx 0x00403921 +177: add%rax,%r12 0x00403924 +180: lea-0x40(%rdx),%rcx 0x00403928 +184: cmp$0x66a020,%rcx 0x0040392f +191: je 0x40398b f+283 0x00403931 +193: mov%rcx,%rdi 0x00403934 +196: mov%rcx,0x8(%rsp) 0x00403939 +201: callq 0x403658 pthread_mutex_lock@plt 0x0040393e +206: test %eax,%eax 0x00403940 +208: mov(%rsp),%rdx 0x00403944 +212: mov0x8(%rsp),%rcx 0x00403949 +217: je 0x403965 f+245 0x0040394b +219: mov$0x4452a4,%esi 0x00403950 +224: mov$0xa,%edi 0x00403955 +229: xor%eax,%eax 0x00403957 +231: callq 0x404370 _ZN4__rw10__rw_throwEiz 0x0040395c +236: mov0x8(%rsp),%rcx 0x00403961 +241: mov(%rsp),%rdx 0x00403965 +245: mov-0x18(%rdx),%esi 0x00403968 +248: test %rcx,%rcx 0x0040396b +251: lea-0x1(%rsi),%eax 0x0040396e +254: mov%eax,-0x18(%rdx) 0x00403971 +257: je 0x403983 f+275 0x00403973 +259: mov%rcx,%rdi 0x00403976 +262: mov%esi,0x8(%rsp) 0x0040397a +266: callq 0x4036c8 pthread_mutex_unlock@plt 0x0040397f +271: mov0x8(%rsp),%esi 0x00403983 +275: test %esi,%esi 0x00403985 +277: jle0x403a80 f+528 0x0040398b +283: add$0x1,%ebp 0x0040398e +286: movq $0x0,0x20(%rsp
Re: Fwd: Re: STDCXX-1071 numpunct facet defect
On 09/30/12 18:18, Martin Sebor wrote: I see you did a 64-bit build while I did a 32-bit one. so I tried 64-bits. The cached version (i.e., the one compiled with -UNO_USE_NUMPUNCT_CACHE) is still about twice as fast as the non-cached one (compiled with -DNO_USE_NUMPUNCT_CACHE). I had made one change to the test program that I thought might account for the difference: I removed the call to abort from the thread function since it was causing the process to exit prematurely in some of my tests. But since you used the modified program for your latest measurements that couldn't be it. I can't explain the differences. They just don't make sense to me. Your results should be the other way around. Can you post the disassembly of function f() for each of the two configurations of the test? The first thing that struck me in the cached `f' was that __string_ref class uses a mutex for synchronizing access to the ref counter. It turns out, for Linux on AMD64 we explicitly use a mutex instead of the atomic ops on the ref counter, via a block in rw/_config.h: # if _RWSTD_VER_MAJOR 5 #ifdef _RWSTD_OS_LINUX // on Linux/AMD64, unless explicitly requested, disable the use // of atomic operations in string for binary compatibility with // stdcxx 4.1.x # ifndef _RWSTD_USE_STRING_ATOMIC_OPS #define _RWSTD_NO_STRING_ATOMIC_OPS # endif // _RWSTD_USE_STRING_ATOMIC_OPS #endif // _WIN32 # endif // stdcxx 5.0 That is not the cause for the performance difference, though. Even after building with __RWSTD_USE_STRING_ATOMIC_OPS I get the same better performance with the non-cached version. Liviu
Re: Fwd: Re: STDCXX-1071 numpunct facet defect
On 10/02/12 10:41, Martin Sebor wrote: I haven't had time to look at this since my last email on Sunday. I also forgot about the string mutex. I don't think I'll have time to spend on this until later in the week. Unless the disassembly reveals the smoking gun, I think we might need to simplify the test to get to the bottom of the differences in our measurements. (I.e., eliminate the library and measure the runtime of a simple thread loop, with and without locking.) We should also look at the GLIBC and kernel versions on our systems, on the off chance that there has been a change that could explain the discrepancy between my numbers and yours. I suspect my system (RHEL 4.8) is much older than yours (I don't remember now if you posted your details). I am gathering some more measurements along these lines but it's time consuming. I estimate I will have some ready for review later today or tomorrow. In the meantime could you please post your kernel, glibc and compiler versions? Liviu Martin On 10/02/2012 06:22 AM, Liviu Nicoara wrote: On 09/30/12 18:18, Martin Sebor wrote: I see you did a 64-bit build while I did a 32-bit one. so I tried 64-bits. The cached version (i.e., the one compiled with -UNO_USE_NUMPUNCT_CACHE) is still about twice as fast as the non-cached one (compiled with -DNO_USE_NUMPUNCT_CACHE). I had made one change to the test program that I thought might account for the difference: I removed the call to abort from the thread function since it was causing the process to exit prematurely in some of my tests. But since you used the modified program for your latest measurements that couldn't be it. I can't explain the differences. They just don't make sense to me. Your results should be the other way around. Can you post the disassembly of function f() for each of the two configurations of the test? The first thing that struck me in the cached `f' was that __string_ref class uses a mutex for synchronizing access to the ref counter. It turns out, for Linux on AMD64 we explicitly use a mutex instead of the atomic ops on the ref counter, via a block in rw/_config.h: # if _RWSTD_VER_MAJOR 5 # ifdef _RWSTD_OS_LINUX // on Linux/AMD64, unless explicitly requested, disable the use // of atomic operations in string for binary compatibility with // stdcxx 4.1.x # ifndef _RWSTD_USE_STRING_ATOMIC_OPS # define _RWSTD_NO_STRING_ATOMIC_OPS # endif // _RWSTD_USE_STRING_ATOMIC_OPS # endif // _WIN32 # endif // stdcxx 5.0 That is not the cause for the performance difference, though. Even after building with __RWSTD_USE_STRING_ATOMIC_OPS I get the same better performance with the non-cached version. Liviu