Re: STDCXX-1056 : numpunct fix
On 09/24/12 22:28, Martin Sebor wrote:

On 09/20/2012 06:46 PM, Stefan Teleman wrote:

On Thu, Sep 20, 2012 at 8:39 PM, Liviu Nicoara nikko...@hates.ms wrote:

I have not created this requirement out of thin air. STDCXX development has functioned in this manner for as long as I remember. If it does not suit you, that's fine.

That would explain why these bugs are present in the first place. If the official method of determining thread-safety here is fprintf(3C), then we have a much bigger problem than 22.locale.numpunct.mt.

Tests that end in .mt.cpp use the RW_ASSERT() macro to verify correctness in multithreaded programs. There is no way at the moment to run a thread analyzer as part of the test suite, although it would be a nice addition. As I said in my other reply, though, not all data races necessarily indicate bugs, so we'd need a way to distinguish between known/benign races and the rest. (Unless we decide to eliminate all races and accept the performance penalty.)

Preserving the lazy initialization is possible: we perfectly forward to the implementation with either:

1. no additional synchronization (preserves fast reads of the data), while fully knowing that there are data races in __rw_get_numpunct/_C_get_data there, or
2. a big lock on top of each and every facet operation (the version that would be race free).

Caching seems off the table for now, more so in the presence of unguarded reads of the facet's data. The only argument for (1) is that at the moment we don't have a failing test yet, and not for lack of arduous trying. However, if atomic/membar APIs were recognized by the compiler, lazy initialization of the facet and caching of data would be attainable in quite simple fashion.

Liviu
Re: STDCXX-1056 : numpunct fix
On 09/24/12 23:50, Stefan Teleman wrote:

On Mon, Sep 24, 2012 at 10:03 PM, Martin Sebor mse...@gmail.com wrote:

FWIW, there are race conditions in stdcxx. Some of them are by design and benign on the systems the library runs on (although I acknowledge that some others may be bugs). One such benign data race is:

    time    T1        T2
    0       x = N
    1                 x = N, read x

where x is a scalar that can be accessed atomically by the CPU and the compiler. I think some of the lazy facet initialization falls under this class. It would be nice to fix them but only if it doesn't slow things down. The others need to be fixed in any case.

The race conditions I am talking about are not benign.

Caching failures aside, we have no failing tests. Before calling them malign you need to objectively show a failing program.

It shows the types of race conditions it's reporting: these are read/write race conditions, not read/read. The thread analyzer's html filter doesn't show the types of races reported as clearly as the command-line analyzer, which has a windowing GUI.

Martin's example above is a read/write race. In the absence of a failing test case it is quite unreasonable to modify the current implementation.

Liviu
Re: STDCXX-1056 : numpunct fix
On 9/24/12 11:06 PM, Stefan Teleman wrote:

On Mon, Sep 24, 2012 at 7:48 PM, Liviu Nicoara nikko...@hates.ms wrote:

Stefan, was it your intention to completely eliminate all the race conditions with this last patch? Is this what the tools showed in your environment? https://issues.apache.org/jira/browse/STDCXX-1056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13461452#comment-13461452

Yes, all the race conditions in std::numpunct and std::moneypunct. Not all the race conditions in stdcxx.

FWIW, I have analyzed the performance of the latest patch from Stefan. The test program is the same test program used in the timings attachments to the incident. The version of the library was 4.2.1, compiler gcc 4.7.1, with:

1. the simple, perfect-forwarding patch, with no caching, which does eliminate race conditions
2. Stefan's latest patch
3. a patch I created which truly, if crudely, eliminates all facet race conditions by using a top-level lock for all facet operations (*)

The numbers from the first test run are already available in the incident. The numbers for the second test run were a bit bigger, which is expected because of the additional synchronization. E.g., I have found that the test, run against the STDCXX locale database, gives:

    $ time ./t en_US.UTF-8 4 500
    4, 500

    real    0m19.210s
    user    0m32.659s
    sys     0m36.507s

where the first patch gives:

    $ time ./t en_US.UTF-8 4 5000
    4, 5000

    real    0m7.961s
    user    0m31.211s
    sys     0m0.003s

Notice the number of loops, 10x larger in the second set of numbers. Now, for the third patch, the numbers are embarrassingly large. It practically slows the program to a crawl because it does not do any reads of facet data without holding the lock. This, I wager, eliminates all races in the numpunct/moneypunct facets. See:

    $ time ./t en_US.UTF-8 4 10
    4, 10

    real    0m24.316s
    user    0m28.213s
    sys     0m6.633s

Now that is a whopping performance hit.
I went back to look for an explanation of the difference between the numbers I see with Stefan's better-performing patch and my lock-all patch. I believe that there are still places in Stefan's patch where the facet makes unguarded reads of facet data, e.g., in loc/_numpunct.h:

    // returns a pointer to the facet's implementation data
    // if it exists, 0 otherwise
    const void* _C_data () const {
        return _C_impsize ? _C_impdata
            : _RWSTD_CONST_CAST (__rw_facet*, this)->_C_get_data ();
    }

where _C_impsize/_C_impdata are read outside of a lock scope, etc. I.e., a thread may read _C_impsize while, say, another is still writing the _C_impdata member variable.

I conclude, even from such simple testing, that any solution that attempts to completely eliminate the race conditions in these facets will incur a penalty, with the safest (top-level locking) the worst by orders of magnitude and any other solution somewhere in between.

Please let me know if you have any questions.

Liviu

(*) Will attach it as soon as we figure out how to undo the closing of the issue.
Re: STDCXX-1056 : numpunct fix
On 9/24/12 12:06 AM, Stefan Teleman wrote: On Fri, Sep 21, 2012 at 9:10 AM, Liviu Nicoara nikko...@hates.ms wrote: On 09/21/12 05:13, Stefan Teleman wrote: On Fri, Sep 21, 2012 at 2:28 AM, Travis Vitek travis.vi...@roguewave.com wrote: I have provided this list with test results showing that my patch *does* fix the race condition problems identified by all the tools at my disposal. I'm willing to bet you never looked at it. You dismissed it outright, just because you didn't like the *idea* that increasing the size of the cache, and eliminating that useless cache resizing would play an important role in eliminating the race conditions. I looked at it in great detail and I am sure Travis read it too. The facet/locale buffer management is a red herring. Apparently, we cannot convince you, regardless of the argument. That's fine by me. This bug [STDCXX-1056] was updated over the weekend with new comments. Here's the link to the comments, for the record: Stefan, was it your intention to completely eliminate all the race conditions with this last patch? Is this what the tools showed in your environment? Thanks, Liviu https://issues.apache.org/jira/browse/STDCXX-1056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13461452#comment-13461452
Re: STDCXX-1056 : numpunct fix
On Mon, Sep 24, 2012 at 7:48 PM, Liviu Nicoara nikko...@hates.ms wrote:

Stefan, was it your intention to completely eliminate all the race conditions with this last patch? Is this what the tools showed in your environment? https://issues.apache.org/jira/browse/STDCXX-1056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13461452#comment-13461452

Yes, all the race conditions in std::numpunct and std::moneypunct. Not all the race conditions in stdcxx. Yes, that's what the test results show with my latest patchset - 0 race conditions in numpunct and moneypunct on Linux/Intel 32/64, Solaris SPARC 32/64 and Solaris Intel 32/64. The test results are in the onlinehome.us directory URL I put in the comment.

-- 
Stefan Teleman
KDE e.V.
stefan.tele...@gmail.com
Re: STDCXX-1056 : numpunct fix
On Mon, Sep 24, 2012 at 10:03 PM, Martin Sebor mse...@gmail.com wrote:

FWIW, there are race conditions in stdcxx. Some of them are by design and benign on the systems the library runs on (although I acknowledge that some others may be bugs). One such benign data race is:

    time    T1        T2
    0       x = N
    1                 x = N, read x

where x is a scalar that can be accessed atomically by the CPU and the compiler. I think some of the lazy facet initialization falls under this class. It would be nice to fix them but only if it doesn't slow things down. The others need to be fixed in any case.

The race conditions I am talking about are not benign. I've uploaded a full thread analyzer output for 22.locale.numpunct.mt showing data races here:

http://s247136804.onlinehome.us/stdcxx-1056-malign/22.locale.numpunct.mt.2.er.tar.bz2

The name of the analyzer results directory is 22.locale.numpunct.mt.2.er. You will need the SunPro Linux 12.3 Thread Analyzer installed, which comes with SunPro anyway. The analyzer itself is ${PATH_TO_SUN_PRO_INSTALL}/bin/analyzer. There's a screenshot from the same Analyzer output here:

http://s247136804.onlinehome.us/stdcxx-1056-malign/sunpro_thread_analyzer_screenshot.jpg

taken just now on my laptop. The report itself is from a couple of days ago, and it's from a run with only the _numpunct.h patch applied. No patches to either facet.cpp, punct.cpp or locale_body.cpp. It shows the types of race conditions it's reporting: these are read/write race conditions, not read/read. The thread analyzer's html filter doesn't show the types of races reported as clearly as the command-line analyzer, which has a windowing GUI. At any rate you can see the same exact type of race conditions being reported by the Intel Inspector 2013 Thread Analyzer.

-- 
Stefan Teleman
KDE e.V.
stefan.tele...@gmail.com
Re: STDCXX-1056 : numpunct fix
On Fri, Sep 21, 2012 at 9:10 AM, Liviu Nicoara nikko...@hates.ms wrote: On 09/21/12 05:13, Stefan Teleman wrote: On Fri, Sep 21, 2012 at 2:28 AM, Travis Vitek travis.vi...@roguewave.com wrote: I have provided this list with test results showing that my patch *does* fix the race condition problems identified by all the tools at my disposal. I'm willing to bet you never looked at it. You dismissed it outright, just because you didn't like the *idea* that increasing the size of the cache, and eliminating that useless cache resizing would play an important role in eliminating the race conditions. I looked at it in great detail and I am sure Travis read it too. The facet/locale buffer management is a red herring. Apparently, we cannot convince you, regardless of the argument. That's fine by me. This bug [STDCXX-1056] was updated over the weekend with new comments. Here's the link to the comments, for the record: https://issues.apache.org/jira/browse/STDCXX-1056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13461452#comment-13461452 -- Stefan Teleman KDE e.V. stefan.tele...@gmail.com
RE: STDCXX-1056 : numpunct fix
-----Original Message-----
From: Stefan Teleman
Sent: Thursday, September 20, 2012 6:00 PM
To: dev@stdcxx.apache.org
Cc: Liviu Nicoara
Subject: Re: STDCXX-1056 : numpunct fix

On Thu, Sep 20, 2012 at 8:44 PM, C. Bergström cbergst...@pathscale.com wrote:

I do have a program which triggers the race conditions: 22.locale.numpunct.mt. Part of the official test harness.

I don't think anyone is claiming that there isn't a thread safety problem in the locales. I'm open to the possibility, and I'm willing to review changes that fix the problem, but I want some understanding of what the problem actually is and how the changes actually address the problem before signing off. I also find it hard to believe the changes to the facet cache (and many of the other changes) are necessary to eliminate any data races that were reported by these tools. Changing the cache size from 8 to 64 is _clearly_ not going to fix a data race. It might reduce the likelihood of one occurring, but it isn't going to fix it. I also fail to understand how not reducing the size of the cache fixes a data race either. These changes all appear to be optimizations, and as such, should _not_ be included in changes to fix thread-safety issues.

Perhaps we could go about this a different way. Test your code without the facet cache changes in _C_manage and report back on what you find. Or, perhaps more simply, provide minimal diffs (as required by the patching process) to fix the data race issues that you've outlined.

The real reason why they don't want to accept what the instrumentation tools are saying is very simple: they don't LIKE reading what the tools are saying. So, blame the tools, pretend that as long as it doesn't crash again there's no bug and hope for the best.

My previous e-mail attempted to indicate that I don't trust the output that these tools have provided. I have no understanding of how the tool managed to find a data race in __rw_allocate (or _C_manage). Do you?
Doesn't this output give you some doubt about the accuracy of the tool? I have no problem using a tool to make my job easier. Sometimes those tools are wrong. The two cases mentioned above call into question the accuracy of the tool being used, and it is important to understand why those failures are being seen (or why I am wrong about the given code being safe).

But I am very glad this is on a public mailing list, so everyone can read what's going on here.

As am I. I've raised concerns about the tools you're using to detect problems. I've pointed out what I believe to be two false positives, and you've accused me of being incompetent and afraid of what the tools are saying. You've offered no explanation for these apparent false positives, and you've not provided any explanation for why you don't see similar false positives when you run the code on the patched library. I've raised concerns about some of the changes you've provided being unnecessary to resolve the issue at hand (a thread safety issue) and labeled them as optimizations (your original e-mail seemed to indicate you were under the same impression). As Liviu pointed out, the bug fix changes should be handled separately from the optimizations. I wholeheartedly agree.

You called out premature optimization as evil, in a discussion about patches you provided that include optimizations and no test case showing that your changes are not premature and provide measurable benefit. Then you rail on...

Then, to top it off, you go on to call me a know-it-all who isn't capable of figuring out my own bugs. I'm sorry, but that isn't acceptable.

Travis
Re: STDCXX-1056 : numpunct fix
On Fri, Sep 21, 2012 at 2:28 AM, Travis Vitek travis.vi...@roguewave.com wrote: You called out premature optimization as evil, in a discussion about patches you provided that include optimizations and no testcase showing that your changes are not premature and provide measureable benefit. Then you rail on... I didn't call premature optimization evil. Donald Knuth did. If you disagree with him, please feel free to let him know. He's on faculty at Stanford. Then, to top it off, you go on to call me a know-it-all who isn't capable of figuring out my own bugs. I'm sorry, but that isn't acceptable Too bad if you feel that way. Next time you get the idea of making snide remarks about my working knowledge of the Standard C Library, or offer out-of-context one-line code fragments completely unrelated to what this 1056 bug is about, maybe you'll think twice. And this isn't the first time you offer gratuitous snide remarks directed at me. You are one of the deniers of the existence of this thread safety problem in the facets code, going back to early February of this year. Between the release of stdcxx 4.2.1 in 2008 and the beginning of this month, when the possibility of this thread safety problem was finally acknowledged, did you really not know that 22.locale.numpunct.mt and 22.locale.moneypunct.mt have been crashing or getting stuck? Did you really not know that these crashes were typical symptoms of race conditions? I find that very hard to believe, given that the problem has been reported several times before February of this year. I have provided this list with test results showing that my patch *does* fix the race condition problems identified by all the tools at my disposal. I'm willing to bet you never looked at it. You dismissed it outright, just because you didn't like the *idea* that increasing the size of the cache, and eliminating that useless cache resizing would play an important role in eliminating the race conditions. 
I have yet to see an alternative patch being proposed here, which would eliminate the race conditions, and which I am willing to test just as thoroughly as I've tested mine. The only thing I've seen are continued attempts at dismissing the existence of these race conditions, as reported by the instrumentation tools, based on some general axioms about their accuracy. No one on this list has a clue as to how the SunPro Thread Analyzer actually works, because it's not open source, and none of you work at Oracle, therefore you can't see the code. But everyone just *knows* that it's inaccurate, or that it reports false positives.

As long as you, or anyone else, continues to blame the instrumentation tools for the bug, and as long as anyone here continues to suggest that the only acceptable proof of the existence of this bug is some other program which needs to be written using fprintf(3C), and as long as no one here is willing to provide an alternative patch which demonstrably eliminates 100% of the reported race conditions, this entire back-and-forth about the existence of these race conditions, the accuracy of the tools and what they are reporting is nothing but a giant waste of time.

-- 
Stefan Teleman
KDE e.V.
stefan.tele...@gmail.com
Re: STDCXX-1056 : numpunct fix
On 09/21/12 05:13, Stefan Teleman wrote:

On Fri, Sep 21, 2012 at 2:28 AM, Travis Vitek travis.vi...@roguewave.com wrote:

I have provided this list with test results showing that my patch *does* fix the race condition problems identified by all the tools at my disposal. I'm willing to bet you never looked at it. You dismissed it outright, just because you didn't like the *idea* that increasing the size of the cache, and eliminating that useless cache resizing, would play an important role in eliminating the race conditions.

I looked at it in great detail and I am sure Travis read it too. The facet/locale buffer management is a red herring. Apparently, we cannot convince you, regardless of the argument. That's fine by me.

I have yet to see an alternative patch being proposed here, which would eliminate the race conditions, and which I am willing to test just as thoroughly as I've tested mine. The only thing I've seen are continued attempts at dismissing the existence of these race conditions, as reported by the instrumentation tools, based on some general axioms about their accuracy. No one on this list has a clue as to how the SunPro Thread Analyzer actually works, because it's not open source, and none of you work at Oracle, therefore you can't see the code. But everyone just *knows* that it's inaccurate, or that it reports false positives.

We are not dismissing the analysis. We are dismissing your interpretation of it. The tool is not indicating a defect, but a potential defect. I am calling it potential because, as my test case (http://tinyurl.com/97ks9gc) indicates, one can have a race condition which is not a defect in a concurrent environment.
As long as you, or anyone else, continues to blame the instrumentation tools for the bug, and as long as anyone here continues to suggest that the only acceptable proof of the existence of this bug is some other program which needs to be written using fprintf(3C), and as long as no-one here is willing to provide an alternative patch which demonstrably eliminates 100% of the reported race conditions, this entire back-and-forth about the existence of these race conditions, the accuracy of the tools and what they are reporting is nothing but a giant waste of time. The correction of 1056 does not necessarily need to eliminate the analyzer warnings. Aiming for that is simply unreasonable, not because it is not attainable (after all, you demonstrated it), but because it is inappropriately and insufficiently argued for the situation at hand. Liviu
RE: STDCXX-1056 : numpunct fix
-----Original Message-----
From: Stefan Teleman [mailto:stefan.tele...@gmail.com]
Sent: Friday, September 21, 2012 2:14 AM
To: dev@stdcxx.apache.org
Subject: Re: STDCXX-1056 : numpunct fix

On Fri, Sep 21, 2012 at 2:28 AM, Travis Vitek travis.vi...@roguewave.com wrote:

You called out premature optimization as evil, in a discussion about patches you provided that include optimizations and no test case showing that your changes are not premature and provide measurable benefit. Then you rail on...

I didn't call premature optimization evil. Donald Knuth did. If you disagree with him, please feel free to let him know. He's on faculty at Stanford.

I never said I disagree with the Knuth quote. I just said you have to apply it consistently. Is your change an optimization or not? If it is, then is it a premature optimization? Do we have a test case (or test results) to indicate that there is a discernible performance improvement?

You are one of the deniers of the existence of this thread safety problem in the facets code, going back to early February of this year.

Again from my previous e-mail:

<quote>
I don't think anyone is claiming that there isn't a thread safety problem in the locales. I'm open to the possibility, and I'm willing to review changes that fix the problem, but I want some understanding of what the problem actually is and how the changes actually address the problem before signing off.
</quote>

Between the release of stdcxx 4.2.1 in 2008 and the beginning of this month, when the possibility of this thread safety problem was finally acknowledged, did you really not know that 22.locale.numpunct.mt and 22.locale.moneypunct.mt have been crashing or getting stuck? Did you really not know that these crashes were typical symptoms of race conditions? I find that very hard to believe, given that the problem has been reported several times before February of this year.
I have provided this list with test results showing that my patch *does* fix the race condition problems identified by all the tools at my disposal. I'm willing to bet you never looked at it. You dismissed it outright, just because you didn't like the *idea* that increasing the size of the cache, and eliminating that useless cache resizing would play an important role in eliminating the race conditions. I looked at every line of your proposed patch _before_ my last e-mail. Despite the fact that you didn't provide it in the expected format (diff) and it included many changes that are unrelated and make little sense. I have yet to see an alternative patch being proposed here, which would eliminate the race conditions, and which I am willing to test just as thoroughly as I've tested mine. The only thing i've seen are continued attempts at dismissing the existence of these race conditions I'm _not_ denying the existence of the problem. Please re-read all of my correspondence on this issue. If you don't believe me after that, re-read it again.
Re: STDCXX-1056 : numpunct fix
Thanks for the feedback. Please see below.

On Sep 19, 2012, at 10:02 PM, Stefan Teleman wrote:

On Wed, Sep 19, 2012 at 8:51 PM, Liviu Nicoara nikko...@hates.ms wrote:

I think you are referring to `live' cache objects and the code which specifically adjusts the size of the buffer according to the number of `live' locales and/or facets in it. In that respect I would not call that eviction, because locales and facets with non-zero reference counters are never evicted. But anyhoo, this is semantics. Bottom line is the locale/facet buffer management code follows a principle of economy.

Yes it does. But we have to choose between economy and efficiency. To clarify: the overhead of having unused pointers in the cache is sizeof(void*) times the number of unused slots. This is 2012. Even an entry-level Android cell phone comes with 1GB system memory. If we want to talk about embedded systems, where memory constraints are more stringent than cell phones, then we're not talking about Apache stdcxx anymore, or any other open source implementation of the C++ Standard Library. These types of systems use Embedded C++, which is a different animal altogether: no exception support, no RTTI. For example, see Green Hills: http://www.ghs.com/ec++.html. And even they have become more relaxed about memory constraints. They use BOOST.

Bottom line: so what if 16 pointers in this 32-pointer-slot cache never get used? The maximum amount of wasted memory for these 16 pointers is 128 bytes, on a 64-bit machine with 8-byte pointers. Can we live with that in 2012, a year when a $500 laptop comes with 4GB RAM out of the box? I would pick 128 bytes of allocated but unused memory over random and entirely avoidable memory churn any day.

The argument is plausible and fine as far as brainstorming goes. But have you measured the amount of memory consumed by all STDCXX locale data loaded in one process? How much absolute time is spent in resizing the locale and facet buffers?
What is the gain in space and time performance with such a change versus without? Just how fragmented does the heap become, and is there a performance impact because of it, etc.? IOW, before changing the status quo one must show an objective defect and produce a body of evidence, including a failing test case for the argument.

My goal: I would be very happy if any application using Apache stdcxx would reach its peak instantiation level of localization (read: max number of locales and facets instantiated and cached, for the application's particular use case), and would then stabilize at that level *without* having to resize and re-sort the cache, *ever*. That is a locale cache I can love. I love binary searches on sorted containers. Wrecking the container with insertions or deletions, and then having to re-sort it again, not so much. Especially when I can't figure out why we're doing it in the first place. And I love minimalistic code, and hate waste at the same time, especially in a general purpose library.

To each its own. Hey Stefan, are the above runs also timing the changes?

Nah, I didn't bother with the timings - yet - for a very simple reason: in order to use instrumentation, both with SunPro and with Intel compilers, optimization of any kind must be disabled. On SunPro you have to pass -xkeepframe=%all (which disables tail-call optimization as well), in addition to passing -xO0 and -g. So the timings for these unoptimized experiments would have been completely irrelevant.

Well, I think you are the only one around here with access to SPARC hardware; your input is very precious in this sense. Also, this is the reason I kept asking that question earlier: do we currently have any failing locale MT test when numpunct does just perfect forwarding, with no caching? I.e., if we change just _numpunct.h and no other source file (so as to silence the thread analyzers' warnings), does any locale (or other) MT test fail?
I would greatly appreciate it if you could give it a run on your hardware if you don't already know the answer. The discussion has been productive. But I object to the patch as is because it goes out of the scope of the original incident. I think this patch should only touch the MT defect detected by the failing test cases. If you think the other parts you changed are defects you should open corresponding issues in JIRA and have them discussed in their separate rooms. Thanks, Liviu
RE: STDCXX-1056 : numpunct fix
-----Original Message-----
From: Stefan Teleman [mailto:stefan.tele...@gmail.com]
Sent: Thursday, September 20, 2012 10:11 AM
To: dev@stdcxx.apache.org
Subject: Re: STDCXX-1056 : numpunct fix

On Thu, Sep 20, 2012 at 8:07 AM, Liviu Nicoara nikko...@hates.ms wrote:

But have you measured the amount of memory consumed by all STDCXX locale data loaded in one process? How much absolute time is spent in resizing the locale and facet buffers? What is the gain in space and time performance with such a change versus without? Just how fragmented the heap becomes and is there a performance impact because of it, etc.? IOW, before changing the status quo one must show an objective defect, produce a body of evidence, including a failing test case for the argument.

sizeof(std::locale) * number_of_locales. I'll let you in on a little secret: once you call setlocale(3C) and localeconv(3C), the Standard C Library doesn't release its own locale handles until process termination. So you might think you save a lot of memory by destroying and constructing the same locales. You're really not. It's the Standard C Library locale data which takes up a lot of space.

You have a working knowledge of all Standard C Library implementations?

What I do know for a fact is that this optimization caused the race conditions reported by 4 different thread analyzers. Race conditions are a show-stopper for me, and they are not negotiable.

The following is found near the top of the _C_manage method of __rw_facet:

    // acquire lock
    _RWSTD_MT_STATIC_GUARD (_RW::__rw_facet);

None of the shared data involved is read/written outside of the critical section protected by that lock, and given the declaration for that shared data it cannot be accessed by any code outside that function. Put bluntly, there is no way that there is a race condition relating to the caching code itself. Your Performance Analyzer output indicates a race (7 race accesses) for _C_manage...
http://s247136804.onlinehome.us/22.locale.numpunct.mt.1.er.ts/

Specifically, it is calling out the following block of code:

    *__rw_access::_C_get_pid (*pfacet) =
        _RWSTD_STATIC_CAST (_RWSTD_SIZE_T, (type + 1) / 2);

The function _C_get_pid simply exposes a reference to a data member of the given facet. That function is thread safe. Provided that pfacet (the parameter passed to _C_manage) isn't being accessed by another thread, there is no way that this code is not safe. It is possible that calling code is not safe, but this code is clean. Regardless, the proposed patch to _C_manage does nothing to change this block of code. I do not understand how you can claim that this change eliminated the race conditions you are so offended by. It is possible that other changes you have made eliminated the data races, but I do not see how this change has any effect.

And I love minimalistic code, and hate waste at the same time, especially in a general purpose library. To each its own.

Here's a helpful quote: We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. It's from Donald Knuth.

By that measure, your entire patch could be considered evil. I've seen no evidence that the two subsequent allocation/copy/deallocation/sort cycles required to get from 8 to 64 entries are measurably more expensive, and I've seen nothing to indicate that a normal application using the C++ Standard Library would be creating and destroying locale instances in large numbers, or that doing so has a measurable impact on performance.

And I love correct code which doesn't cause thread analyzers to report more than 12000 race conditions for just one test case. I've said it before and I will say it again: race conditions are a showstopper and are not negotiable. Period.

When the code in question has 12 threads that invoke a function 1000 times, you've found 1 race condition, not 12000 of them. I do agree data races are bad and should be fixed.
But making changes to 'optimize' the code instead of fixing it is actually much worse. The patch is in scope for the issue at hand. The issue is that std::numpunct and std::moneypunct are not thread safe. This has been confirmed by 4 different thread analyzers, even after applying your _numpunct.h patch. I looked at the output from the thread analyzer. It points out a data race in __rw::__rw_allocate(), indicating that a memset() is responsible for a data race... http://s247136804.onlinehome.us/22.locale.numpunct.mt.1.er.ts/file.14.src.txt.html#line_43 Assuming that `operator new' is indeed thread safe (I didn't bother to look), I'm curious to hear how this is an actual data race. I'm also curious to hear how you managed to avoid having the same race appear in the output that you submitted with the proposed patch. You are more than welcome
Re: STDCXX-1056 : numpunct fix
On 09/20/12 13:11, Stefan Teleman wrote: On Thu, Sep 20, 2012 at 8:07 AM, Liviu Nicoara nikko...@hates.ms wrote: But have you measured the amount of memory consumed by all STDCXX locale data loaded in one process? How much absolute time is spent in resizing the locale and facet buffers? What is the gain in space and time performance with such a change versus without? Just how fragmented does the heap become, and is there a performance impact because of it, etc.? IOW, before changing the status quo one must show an objective defect and produce a body of evidence, including a failing test case for the argument. sizeof(std::locale) * number_of_locales. I was more interested in the underlying locale data, not the C++ objects. I'll let you in on a little secret: once you call setlocale(3C) and localeconv(3C), the Standard C Library doesn't release its own locale handles until process termination. So you might think you save a lot of memory by destroying and constructing the same locales. You're really not. It's the Standard C Library locale data which takes up a lot of space. Thanks for the secret, I appreciate it. Did you mean to say that the C Standard mandates that?! What I do know for a fact is that this optimization caused the race conditions reported by 4 different thread analyzers. Race conditions are a show-stopper for me, and they are not negotiable. No, that optimization was not causing the MT defect you originally noted. Saying so only shows a lack of familiarity with the implementation. And I love minimalistic code, and hate waste at the same time, especially in a general purpose library. To each its own. Here's a helpful quote: We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. It's from Donald Knuth. Please, no. And I love correct code which doesn't cause thread analyzers to report more than 12000 race conditions for just one test case. 
I've said it before and I will say it again: race conditions are a showstopper and are not negotiable. Period. Stefan, you do not become correct by a consensus of thread-analyzer outputs, by being loud, or by pounding your fist on the table. This being said, I will continue to exercise due diligence and put in the necessary time and effort to provide you with the most useful feedback I can. I see that you missed my question in the post before: did you see a failure in the locale MT tests in your SPARC runs? If you do not want to run that test, that is fine, just let me know. Thanks, Liviu
Re: STDCXX-1056 : numpunct fix
On Thu, Sep 20, 2012 at 4:45 PM, Travis Vitek travis.vi...@roguewave.com wrote: I'll let you in on a little secret: once you call setlocale(3C) and localeconv(3C), the Standard C Library doesn't release its own locale handles until process termination. So you might think you save a lot of memory by destroying and constructing the same locales. You're really not. It's the Standard C Library locale data which takes up a lot of space. You have a working knowledge of all Standard C Library implementations? I happen to do, yes, for the operating systems that I've been testing on. I also happen to know that you don't. This fact alone pretty much closes up *this* particular discussion. Do yourself, and this mailing list, a favor: either write a patch which addresses all of your concerns *AND* eliminates all the race conditions reported, or stop this pseudo software engineering bullshit via email. There is, apparently, a high concentration of know-it-alls on this mailing list who are much better at detecting race conditions and thread unsafety than the tools themselves. Too bad they aren't as good at figuring out their own bugs. It took eight months for anyone here to even *acknowledge* that numpunct and moneypunct do have, in fact, a thread safety problem. Never mind that the test case for these facets had been crashing for 4 years. To be quite blunt and to the point: after 8 months of denying obvious facts, your credibility is quite a bit under question at this point. This entire discussion has become a perfect illustration of what's wrong with the ASF, as reported here: http://www.mikealrogers.com/posts/apache-considered-harmful.html -- Stefan Teleman KDE e.V. stefan.tele...@gmail.com
Re: STDCXX-1056 : numpunct fix
On Sep 20, 2012, at 5:31 PM, Stefan Teleman wrote: On Thu, Sep 20, 2012 at 5:07 PM, Liviu Nicoara nikko...@hates.ms wrote: To answer your question [...]: yes, the MT failures occur on SPARC as well, on both SPARCV8 and SPARCV9, and the race conditions are reported on *ALL* platforms tested, even after having applied your _numpunct.h patch. This patch alone does *NOT* solve the problem. Stefan, I want to be clear. You are talking about a patch identical in nature to the one I have attached now. Just want to be 100% sure we are talking about the same thing. This one still produces failures (crashes, assertions, etc.) in the locale MT tests on SPARC and elsewhere in your builds? Thanks, Liviu
Re: STDCXX-1056 : numpunct fix
Hi, My perception from reading through the whole thread is that we should not trust external tools 100% to assess the safety of the code. I don't think there exists an algorithm that produces no false positives. That said, I admire Stefan's approach, but we should ask the question: are we MT safe enough? I would say from what I read here: yes. Liviu Nicoara nikko...@hates.ms writes: On Sep 20, 2012, at 5:31 PM, Stefan Teleman wrote: On Thu, Sep 20, 2012 at 5:07 PM, Liviu Nicoara nikko...@hates.ms wrote: To answer your question [...]: yes, the MT failures occur on SPARC as well, on both SPARCV8 and SPARCV9, and the race conditions are reported on *ALL* platforms tested, even after having applied your _numpunct.h patch. This patch alone does *NOT* solve the problem. Stefan, I want to be clear. You are talking about a patch identical in nature to the one I have attached now. Just want to be 100% sure we are talking about the same thing. This one still produces failures (crashes, assertions, etc.) in the locale MT tests on SPARC and elsewhere in your builds? Thanks, Liviu -- Wojciech Meyer http://danmey.org
Re: STDCXX-1056 : numpunct fix
On Thu, Sep 20, 2012 at 7:34 PM, Wojciech Meyer wojciech.me...@googlemail.com wrote: Hi, My perception from reading through the whole thread is that we should not trust external tools 100% to assess the safety of the code. I don't think there exists an algorithm that produces no false positives. That said, I admire Stefan's approach, but we should ask the question: are we MT safe enough? I would say from what I read here: yes. Based on what objective metric? --Stefan -- Stefan Teleman KDE e.V. stefan.tele...@gmail.com
Re: STDCXX-1056 : numpunct fix
On Sep 20, 2012, at 7:37 PM, Stefan Teleman wrote: On Thu, Sep 20, 2012 at 7:34 PM, Wojciech Meyer wojciech.me...@googlemail.com wrote: Hi, My perception from reading through the whole thread is that we should not trust external tools 100% to assess the safety of the code. I don't think there exists an algorithm that produces no false positives. That said, I admire Stefan's approach, but we should ask the question: are we MT safe enough? I would say from what I read here: yes. Based on what objective metric? The only gold currency that anyone in here accepts without reservations is failing test cases. I believe I have seen some exceptions to the golden rule in my RW time, but I can't recall any specific instance. Liviu
Re: STDCXX-1056 : numpunct fix
On Thu, Sep 20, 2012 at 7:22 PM, Liviu Nicoara nikko...@hates.ms wrote: Stefan, I want to be clear. You are talking about a patch identical in nature to the one I have attached now. Just want to be 100% sure we are talking about the same thing. This one still produces failures (crashes, assertions, etc.) in the locale MT tests on SPARC and elsewhere in your builds? On September 17, 2012, I posted the following message to this list: http://www.mail-archive.com/dev@stdcxx.apache.org/msg01929.html In that message, there is a link to my SPARC thread-safety test results: http://s247136804.onlinehome.us/stdcxx-1056-SPARC-20120917/22.locale.numpunct.mt.nts.1.er.html/index.html This test was run with the following _numpunct.h file: http://s247136804.onlinehome.us/stdcxx-1056-SPARC-20120917/22.locale.numpunct.mt.nts.1.er.html/file.14.src.txt.html The test above shows 12440 race conditions detected for a test run of 22.locale.numpunct.mt, with --nthreads=8 --nloops=1. Did you ever look at these test results? From reading your email, I realize that you never looked at them. That is the only possible explanation as to why you're asking now for SPARC test results, today being September 20, 2012. -- Stefan Teleman KDE e.V. stefan.tele...@gmail.com
Re: STDCXX-1056 : numpunct fix
On Sep 20, 2012, at 5:23 PM, Stefan Teleman wrote: On Thu, Sep 20, 2012 at 4:45 PM, Travis Vitek travis.vi...@roguewave.com wrote: I'll let you in on a little secret: once you call setlocale(3C) and localeconv(3C), the Standard C Library doesn't release its own locale handles until process termination. So you might think you save a lot of memory by destroying and constructing the same locales. You're really not. It's the Standard C Library locale data which takes up a lot of space. You have a working knowledge of all Standard C Library implementations? I happen to do, yes, for the operating systems that I've been testing on. I also happen to know that you don't. This fact alone pretty much closes up *this* particular discussion. Do yourself, and this mailing list a favor: either write a patch which addresses all of your concerns *AND* eliminates all the race conditions reported, or stop this pseudo software engineering bullshit via email. There is apparently, a high concentration of know-it-alls on this mailing list, who are much better at detecting race conditions and thread unsafety than the tools themselves. Too bad they aren't as good at figuring out their own bugs. The sniping is uncalled for. There are no enemies here, no one is after you. There is criticism though and you are expected to take it and argue your point of view. If you can't stand the heat, get out of the kitchen. It took eight months for anyone here to even *acknowledge* that numpunct and moneypunct do have, in fact, a thread safety problem. Never mind that the test case for these facets had been crashing for 4 years. To be quite blunt and to the point, after 8 months of denying obvious facts, your credibility is quite a bit under question at this point. Yes, we are busy with other stuff. I wish I got paid to work on this instead. 
This entire discussion has become a perfect illustration of what's wrong with the ASF, as reported here: http://www.mikealrogers.com/posts/apache-considered-harmful.html I actually read it. I see a guy complaining he can't have it his way. No problem. One can fork this project at any time and start it anew, by themselves or in the company of like-minded programmers elsewhere. For better or worse, Apache got STDCXX from RogueWave. Complaining about it is like complaining that Apple doesn't give us iPhones for free; after all, we are the power users and we know what to do with them. L
Re: STDCXX-1056 : numpunct fix
On Sep 20, 2012, at 7:45 PM, Stefan Teleman wrote: On Thu, Sep 20, 2012 at 7:22 PM, Liviu Nicoara nikko...@hates.ms wrote: Stefan, I want to be clear. You are talking about a patch identical in nature to the one I have attached now. Just want to be 100% sure we are talking about the same thing. This one still produces failures (crashes, assertions, etc.) in the locale MT tests on SPARC and elsewhere in your builds? On September 17, 2012 I have posted the following message to this list: http://www.mail-archive.com/dev@stdcxx.apache.org/msg01929.html In that message, there is a link to my SPARC thread-safety test results: http://s247136804.onlinehome.us/stdcxx-1056-SPARC-20120917/22.locale.numpunct.mt.nts.1.er.html/index.html This test was run with the following _numpunct.h file: http://s247136804.onlinehome.us/stdcxx-1056-SPARC-20120917/22.locale.numpunct.mt.nts.1.er.html/file.14.src.txt.html The test above shows 12440 race conditions detected for a test run of 22.locale.numpunct.mt, with --nthreads=8 --nloops=1. Did you ever look at these test results? From reading your email, I realize that you never looked at it. That is the only possible explanation as to why you're asking now for SPARC test results, today being September 20, 2012. I see, there is a confusion about this. Probably nobody explained it before. A failing test case means a test case that causes the abnormal termination of the execution of the program or creates evidence of abnormal data in the program execution. In this respect please see the atomic add and exchange tests as classical examples of what I mean. I have read all your emails in detail and I have inspected all your attachments, modulo the ones I could not open. Thanks, Liviu
Re: STDCXX-1056 : numpunct fix
On Thu, Sep 20, 2012 at 7:52 PM, Liviu Nicoara nikko...@hates.ms wrote: On Sep 20, 2012, at 7:49 PM, Stefan Teleman wrote: On Thu, Sep 20, 2012 at 7:40 PM, Liviu Nicoara nikko...@hates.ms wrote: The only gold currency that anyone in here accepts without reservations are failing test cases. I believe I have seen some exceptions to the golden rule in my RW time, but I can't recall any specific instance. That may be a valid metric here. The only one. Any programmer worth his salt -- I am borrowing your words here -- would be able to demonstrate the validity of his point of view with a test case. I did. There are 12440 race conditions detected for an incomplete run of 22.locale.numpunct.mt. By incomplete I mean: it did not run with its default nthreads and nloops, which I believe are 8 threads and 20 loop iterations. I presented a *proposed* fix which: 1. keeps your _numpunct.h forwarding patch 2. eliminates 100% of the race conditions I have yet to see a counter-proposal. The only things I've seen are assertions (race condition instrumentation and detection tools are wrong), mischaracterizations (your patch is evil) and overall just email bullshit. Not a single line of code which would resolve the 12440 race conditions problem. -- Stefan Teleman KDE e.V. stefan.tele...@gmail.com
Re: STDCXX-1056 : numpunct fix
Liviu Nicoara nikko...@hates.ms writes: On Sep 20, 2012, at 5:23 PM, Stefan Teleman wrote: On Thu, Sep 20, 2012 at 4:45 PM, Travis Vitek travis.vi...@roguewave.com wrote: I'll let you in on a little secret: once you call setlocale(3C) and localeconv(3C), the Standard C Library doesn't release its own locale handles until process termination. So you might think you save a lot of memory by destroying and constructing the same locales. You're really not. It's the Standard C Library locale data which takes up a lot of space. You have a working knowledge of all Standard C Library implementations? I happen to do, yes, for the operating systems that I've been testing on. I also happen to know that you don't. This fact alone pretty much closes up *this* particular discussion. Do yourself, and this mailing list a favor: either write a patch which addresses all of your concerns *AND* eliminates all the race conditions reported, or stop this pseudo software engineering bullshit via email. There is apparently, a high concentration of know-it-alls on this mailing list, who are much better at detecting race conditions and thread unsafety than the tools themselves. Too bad they aren't as good at figuring out their own bugs. I fully agree - tools are great; however, I know a little about compilers, and I can tell you that there are limits to the static guarantees you can get from any analyser, because there is something known as the halting problem, which limits even the top-notch tools, including those based on abstract interpretation, to a certain extent. The halting problem says that, for programs written in a Turing-complete formal language, you cannot decide with 100% assurance whether an arbitrary program will halt for every input. You can try to analyse a program statically, but then there is a balance between analysing and interpreting parts of it; even in the extreme case where you run it, you will not know whether it is supposed to halt. Therefore, please use the tools, but be a bit reserved about the results. 
All these MT analysers are based on simple heuristics and logical assertions that can't give you 100% correct results. I don't think people here are picky about your patches; it's just better sometimes to take a breath and see the big picture. -- Wojciech Meyer http://danmey.org
Re: STDCXX-1056 : numpunct fix
On Thu, Sep 20, 2012 at 8:04 PM, Wojciech Meyer wojciech.me...@googlemail.com wrote: Therefore please use tools but be a bit reserved for the results. I *am* being cautiously skeptical about the results. That's why I am using 4 [ FOUR ] different thread analyzers, on three different operating systems, each one of them in 32- and 64-bit, and not just one. With the testing setup described above, when all FOUR instrumentation tools report the same exact problem in the same exact spot, for all flavors of the operating system, what would be a rational conclusion? 1. There is indeed a race condition and thread safety problem; it needs to be investigated and fixed. 2. Bah, the tools are crap, nothing to see here, move along, declare victory. I chose [1] because I am willing to accept my *own* limitations. -- Stefan Teleman KDE e.V. stefan.tele...@gmail.com
Re: STDCXX-1056 : numpunct fix
On Sep 20, 2012, at 8:02 PM, Stefan Teleman wrote: On Thu, Sep 20, 2012 at 7:52 PM, Liviu Nicoara nikko...@hates.ms wrote: On Sep 20, 2012, at 7:49 PM, Stefan Teleman wrote: On Thu, Sep 20, 2012 at 7:40 PM, Liviu Nicoara nikko...@hates.ms wrote: The only gold currency that anyone in here accepts without reservations are failing test cases. I believe I have seen some exceptions to the golden rule in my RW time, but I can't recall any specific instance. That may be a valid metric here. The only one. Any programmer worth his salt -- I am borrowing your words here -- would be able to demonstrate the validity of his point of view with a test case. I did. There are 12440 race conditions detected for an incomplete run of 22.locale.numpunct.mt. By incomplete I mean: it did not run with its default nthreads and nloops which I believe are 8 threads and 20 loop iterations. That is not it, and you did not. Please pay attention: given your assertion that a race condition is a defect that causes an abnormal execution of the program during which the program sees abnormal, incorrect states (read: variable values) it should be easy for you to craft a program that shows evidence of that by either printing those values, or aborting upon detecting them, etc. [...] and overall just email bullshit. Stop using that word. L
Re: STDCXX-1056 : numpunct fix
On Thu, Sep 20, 2012 at 8:18 PM, Liviu Nicoara nikko...@hates.ms wrote: That is not it, and you did not. Please pay attention: given your assertion that a race condition is a defect that causes an abnormal execution of the program during which the program sees abnormal, incorrect states (read: variable values) it should be easy for you to craft a program that shows evidence of that by either printing those values, or aborting upon detecting them, etc. Oh, I see. So now I'm supposed to write a program which may, or may not, prove to you that the 12440 race conditions detected by SunPro and Intel are, in fact, real race conditions (as opposed to fake race conditions)? And the means of proving the existence of these real race conditions is ... [ drum roll ] ... fprintf(3C)? This is very funny. You made my day, Have a nice evening. -- Stefan Teleman KDE e.V. stefan.tele...@gmail.com
Re: STDCXX-1056 : numpunct fix
On 09/21/12 07:39 AM, Liviu Nicoara wrote: Now, in all honesty, it is not too hard to do that. Once you can satisfactorily explain to yourself what is wrong in the program, the creation of a test case is trivial. Some multi-threading bugs are insidious and hard to reproduce, but even then it's doable by isolating as little a portion of the codebase as possible, as a standalone program, and trimming it until the failure becomes easily reproducible. fencepost comment - The results are based on tools and I don't think he has a large program which actually triggers the conditions. (Creating one may take quite some time)
Re: STDCXX-1056 : numpunct fix
On Thu, Sep 20, 2012 at 8:39 PM, Liviu Nicoara nikko...@hates.ms wrote: I have not created this requirement out of thin air. STDCXX development has functioned in this manner for as long as I remember. If it does not suit you, that's fine. That would explain why these bugs are present in the first place. If the official method of determining thread-safety here is fprintf(3C), then we have a much bigger problem than 22.locale.numpunct.mt. -- Stefan Teleman KDE e.V. stefan.tele...@gmail.com
Re: STDCXX-1056 : numpunct fix
On Sep 20, 2012, at 8:59 PM, Stefan Teleman wrote: On Thu, Sep 20, 2012 at 8:44 PM, C. Bergström cbergst...@pathscale.com wrote: fencepost comment - The results are based on tools and I don't think he has a large program which actually triggers the conditions. (Creating one may take quite some time) I do have a program which triggers the race conditions: 22.locale.numpunct.mt. Part of the official test harness. The real reason why they don't want to accept what the instrumentation tools are saying is very simple: they don't LIKE reading what the tools are saying. So, blame the tools, pretend that as long as it doesn't crash again there's no bug and hope for the best. I cannot include an analyzer output as a regression test in the test suite. But I am very glad this is on a public mailing list, so everyone can read what's going on here. ?
STDCXX-1056 : numpunct fix
This is a proposed fix for the numpunct facet for stdcxx-1056: 0. Number of reported race conditions is now 0 (zero). 1. No memory leaks in stdcxx (there are memory leaks reported in either libc or glibc, but there's nothing we can do about these anyway). 2. This fix preserves perfect forwarding in the _numpunct.h header file. 3. This fix eliminates code from facet.cpp and locale_body.cpp which was creating unnecessary overhead, with the potential of causing memory corruption, while providing no discernible benefit. More specifically: It is not true that there was no eviction policy for cached locales or facets in stdcxx. Not only did cache eviction code exist (it still exists today), but cache cleanups and resizing were performed periodically, either when an object's reference count dropped to 0 (zero), or whenever the number of cached objects fell below sizeof(cache) / 2. In the latter case, both the facet cache and the locale cache performed a new allocation of the cache array, followed by a memcpy and a delete[] of the old cache array. First, the default size of the facet and locale caches was too small: it was set to 8. I raised this to 32. A direct consequence of this insufficient default size of 8 was that the cache had to resize itself very soon after program startup. This cache resize operation consists of: allocate memory for a new cache, copy the existing cached objects from the old cache to the new cache, and then delete[] the old cache. This is a first unnecessary overhead. Second, as I mentioned above, whenever the number of cached objects fell below sizeof(cache) / 2, the cache resized itself by performing the same sequence of operations as described above. This is a second unnecessary overhead. Third, cached objects were automatically evicted whenever their reference count dropped to 0 (zero). 
There are two consequences to this eviction policy: if the program needs to re-use an object (facet or locale) which has been evicted and subsequently destroyed, this object needs to be constructed again later on, and subsequently re-inserted into the cache. This, in turn, would trigger a cache resize, followed by copying and delete[] of the old cache buffer. Object eviction followed by destruction followed by reconstruction is a third unnecessary overhead. Re-inserting a re-constructed object into the cache, followed by a potential cache resize involving allocation of a new buffer, copying pointers from the old cache to the new cache, followed by delete[] of the old cache, is a fourth unnecessary overhead. Real-life programs tend to reuse locales and/or facets they have created. There is no point in destroying and evicting these objects simply because there may be periods when the object isn't referenced. The object is likely to be needed again later on. The fix proposed here eliminates the cache eviction and object destruction policy completely. Once created, objects remain in the cache, even though they may reside in the cache with no references. This eliminates the cache resize / copy / delete[] overhead. It also eliminates the overhead of re-constructing an evicted / destroyed object if it is needed again later. 4. Tests and Analysis Results: 4.1. SunPro 12.3 / Solaris / SPARC / Race Conditions Test: http://s247136804.onlinehome.us/stdcxx-1056-20120919/22.locale.numpunct.mt.sunpro.solaris-sparc.datarace.er.html/index.html 4.2. SunPro 12.3 / Solaris / SPARC / Heap and Memory Leaks Test: http://s247136804.onlinehome.us/stdcxx-1056-20120919/22.locale.numpunct.mt.sunpro.solaris-sparc.heapcheck.er.html/index.html 4.3. SunPro 12.3 / Linux / Intel / Race Conditions Test: http://s247136804.onlinehome.us/stdcxx-1056-20120919/22.locale.numpunct.mt.sunpro.linux-intel.datarace.er.html/index.html 4.4. 
SunPro 12.3 / Linux / Intel / Heap and Memory Leaks Test: http://s247136804.onlinehome.us/stdcxx-1056-20120919/22.locale.numpunct.mt.sunpro.linux-intel.heapcheck.er.html/index.html 4.5. Intel 2013 / Linux / Intel / Race Conditions Test: http://s247136804.onlinehome.us/stdcxx-1056-20120919/22.locale.numpunct.mt.intel.linux.datarace.r007ti3.inspxez 4.6. Intel 2013 / Linux / Intel / Heap and Memory Leaks Test: http://s247136804.onlinehome.us/stdcxx-1056-20120919/22.locale.numpunct.mt.intel.linux.heapcheck.r008mi1.inspxez 5. Source code for this fix: http://s247136804.onlinehome.us/stdcxx-1056-20120919/_numpunct.h http://s247136804.onlinehome.us/stdcxx-1056-20120919/facet.cpp http://s247136804.onlinehome.us/stdcxx-1056-20120919/locale_body.cpp http://s247136804.onlinehome.us/stdcxx-1056-20120919/punct.cpp These files are based on stdcxx 4.2.1. -- Stefan Teleman KDE e.V. stefan.tele...@gmail.com
Re: STDCXX-1056 : numpunct fix
On Wed, Sep 19, 2012 at 8:51 PM, Liviu Nicoara nikko...@hates.ms wrote: I think you are referring to `live' cache objects and the code which specifically adjusts the size of the buffer according to the number of `live' locales and/or facets in it. In that respect I would not call that eviction, because locales and facets with non-zero reference counters are never evicted. But anyhoo, this is semantics. Bottom line is the locale/facet buffer management code follows a principle of economy. Yes it does. But we have to choose between economy and efficiency. To clarify: The overhead of having unused pointers in the cache is sizeof(void*) times the number of unused slots. This is 2012. Even an entry-level Android cell phone comes with 1GB system memory. If we want to talk about embedded systems, where memory constraints are more stringent than cell phones, then we're not talking about Apache stdcxx anymore, or any other open-source implementation of the C++ Standard Library. These types of systems use Embedded C++, which is a different animal altogether: no exception support, no RTTI. For example, see Green Hills: http://www.ghs.com/ec++.html. And even they have become more relaxed about memory constraints. They use BOOST. Bottom line: so what if 16 pointers in this 32-slot cache never get used. The maximum amount of wasted memory for these 16 pointers is 128 bytes, on a 64-bit machine with 8-byte pointers. Can we live with that in 2012, a year when a $500 laptop comes with 4GB RAM out of the box? I would pick 128 bytes of allocated but unused memory over random and entirely avoidable memory churn any day. The optimal number is subject to debate. Probably Martin can give an insight into the reasons for that number. Why did you pick 32 (or is it 64 in your patch) and not any other? Is it something based on your experience as a user or programmer? Based on two things: 1. 
There are, apparently, 30 top languages spoken on this planet: http://www.vistawide.com/languages/top_30_languages.htm 2. I've written locale-aware software back in my days on Wall Street. The maximum number of locales I had to support was 14. So max(14, 30) would be 30. So I made it 32 because it's a power of 2. A negligible overhead, IMO. The benefits of maintaining a small memory footprint may be important for some environments. As useful as principles may be, see above. Small and negligible in theory. In practice, when the cache starts resizing itself by allocating new memory, copying, delete[]'ing and - I forgot to mention this in my initial post - finishing it all up with a call to qsort(3C), it's not that negligible anymore. It doesn't just happen once. It happens every time the cache gets anxious (for reasons mentioned in my previous email) and wants to resize itself. Which triggers the following question in my mind: why are we even causing all this memory churn in the first place? Because we saved 128 bytes (or 64 bytes on a 32-bit machine, which is what most cell phones/tablets are these days)? My goal: I would be very happy if any application using Apache stdcxx would reach its peak instantiation level of localization (read: max number of locales and facets instantiated and cached, for the application's particular use case), and would then stabilize at that level *without* having to resize and re-sort the cache, *ever*. That is a locale cache I can love. I love binary searches on sorted containers. Wrecking the container with insertions or deletions, and then having to re-sort it again, not so much. Especially when I can't figure out why we're doing it in the first place. In this respect you could call every memory allocation and de-allocation is an overhead. 
Please keep in mind that this resembles the operations performed for any sequence containers; how likely is it for a program to have more locale/facet creation/destruction than strings or vectors mutations? There's one fundamental difference: the non-sorted STL containers give the developer the opportunity to construct them with an initial size larger than the implementation-specific default size. Any application developer worth their salt would perform some initial size optimization for these types of containers. If I know that my std::list will end up containing 5000 things, I would never construct my list object with the default size of 16. Do that, and you'll get flamed at code review. As for the sorted associative containers, that's one of the major gripes against them: whenever they have to grow, or rebalance, they get expensive. But we're not using a sorted associative container here. It's just a plain ol' C array. Could you please elaborate a bit on this? Is this your opinion based on your user and/or programmer experience? See above about the top 30 languages spoken in the world. Hey Stefan, are the above also timing the changes? Nah, I didn't bother with the timings - yet - for a very simple reason: in order to use