Re: STDCXX-1056 : numpunct fix
Thanks for the feedback. Please see below.

On Sep 19, 2012, at 10:02 PM, Stefan Teleman wrote: On Wed, Sep 19, 2012 at 8:51 PM, Liviu Nicoara nikko...@hates.ms wrote: I think you are referring to `live' cache objects and the code which specifically adjusts the size of the buffer according to the number of `live' locales and/or facets in it. In that respect I would not call that eviction, because locales and facets with non-zero reference counters are never evicted. But anyhoo, this is semantics. Bottom line is the locale/facet buffer management code follows a principle of economy.

Yes it does. But we have to choose between economy and efficiency. To clarify: the overhead of having unused pointers in the cache is sizeof(void*) times the number of unused slots. This is 2012. Even an entry-level Android cell phone comes with 1GB of system memory. If we want to talk about embedded systems, where memory constraints are more stringent than on cell phones, then we're not talking about Apache stdcxx anymore, or any other open-source implementation of the C++ Standard Library. These types of systems use Embedded C++, which is a different animal altogether: no exception support, no RTTI. For example, see Green Hills: http://www.ghs.com/ec++.html. And even they have become more relaxed about memory constraints; they use Boost. Bottom line: so what if 16 of the 32 pointer slots in this cache never get used? The maximum amount of wasted memory for those 16 pointers is 128 bytes, on a 64-bit machine with 8-byte pointers. Can we live with that in 2012, a year when a $500 laptop comes with 4GB of RAM out of the box? I would pick 128 bytes of allocated but unused memory over random and entirely avoidable memory churn any day.

The argument is plausible and fine as far as brainstorming goes. But have you measured the amount of memory consumed by all STDCXX locale data loaded in one process? How much absolute time is spent resizing the locale and facet buffers?
What is the gain in space and time performance with such a change versus without? How fragmented does the heap become, and is there a performance impact because of it, etc.? IOW, before changing the status quo one must show an objective defect and produce a body of evidence, including a failing test case for the argument.

My goal: I would be very happy if any application using Apache stdcxx would reach its peak instantiation level of localization (read: the maximum number of locales and facets instantiated and cached for the application's particular use case), and would then stabilize at that level *without* having to resize and re-sort the cache, *ever*. That is a locale cache I can love. I love binary searches on sorted containers. Wrecking the container with insertions or deletions, and then having to re-sort it again, not so much. Especially when I can't figure out why we're doing it in the first place. And I love minimalistic code, and hate waste at the same time, especially in a general-purpose library. To each their own.

Hey Stefan, are the above also timing the changes?

Nah, I didn't bother with the timings, yet, for a very simple reason: in order to use instrumentation, both with the SunPro and the Intel compilers, optimization of any kind must be disabled. On SunPro you have to pass -xkeepframe=%all (which disables tail-call optimization as well), in addition to -xO0 and -g. So the timings for these unoptimized experiments would have been completely irrelevant.

Well, I think you are the only one around here with access to SPARC hardware; your input is very precious in this sense. Also, this is the reason I kept asking that question earlier: do we currently have any failing locale MT test when numpunct does just perfect forwarding, with no caching? I.e., changing just _numpunct.h and no other source file (so as to silence thread-analyzer warnings), does any locale (or other) MT test fail?
I would greatly appreciate it if you could give it a run on your hardware, if you don't already know the answer.

The discussion has been productive. But I object to the patch as is, because it goes beyond the scope of the original incident. I think this patch should only address the MT defect detected by the failing test cases. If you think the other parts you changed are defects, you should open corresponding issues in JIRA and have them discussed in their own threads. Thanks, Liviu
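The sorted-cache point raised above (binary search versus re-sorting after insertions) can be illustrated with a short sketch. This is not the stdcxx implementation; the container, entry type, and names are all hypothetical. The point is that inserting at the position found by std::lower_bound keeps the container sorted, so lookups can keep using binary search without a full re-sort after each insertion:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical cache entry: facets keyed by a numeric id.
struct Entry {
    std::size_t id;
    void*       facet; // payload; unused in this sketch
};

inline bool by_id (const Entry& a, const Entry& b) { return a.id < b.id; }

// Insert while preserving order: O(log n) search plus O(n) shift,
// instead of push_back() followed by a full re-sort.
void insert_sorted (std::vector<Entry>& cache, const Entry& e) {
    cache.insert (std::lower_bound (cache.begin (), cache.end (), e, by_id), e);
}

// Lookups may keep using binary search because order is never broken.
Entry* find_by_id (std::vector<Entry>& cache, std::size_t id) {
    const Entry key = { id, 0 };
    std::vector<Entry>::iterator it =
        std::lower_bound (cache.begin (), cache.end (), key, by_id);
    return it != cache.end () && it->id == id ? &*it : 0;
}
```

Whether ordered insertion is actually cheaper than the occasional re-sort depends on the insertion pattern, which is exactly the kind of measurement being asked for in this thread.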
RE: STDCXX-1056 : numpunct fix
-----Original Message----- From: Stefan Teleman [mailto:stefan.tele...@gmail.com] Sent: Thursday, September 20, 2012 10:11 AM To: dev@stdcxx.apache.org Subject: Re: STDCXX-1056 : numpunct fix

On Thu, Sep 20, 2012 at 8:07 AM, Liviu Nicoara nikko...@hates.ms wrote: But have you measured the amount of memory consumed by all STDCXX locale data loaded in one process? How much absolute time is spent resizing the locale and facet buffers? What is the gain in space and time performance with such a change versus without? How fragmented does the heap become, and is there a performance impact because of it, etc.? IOW, before changing the status quo one must show an objective defect and produce a body of evidence, including a failing test case for the argument.

sizeof(std::locale) * number_of_locales. I'll let you in on a little secret: once you call setlocale(3C) and localeconv(3C), the Standard C Library doesn't release its own locale handles until process termination. So you might think you save a lot of memory by destroying and constructing the same locales. You're really not. It's the Standard C Library locale data which takes up a lot of space.

You have a working knowledge of all Standard C Library implementations?

What I do know for a fact is that this optimization caused the race conditions reported by 4 different thread analyzers. Race conditions are a show-stopper for me, and they are not negotiable.

The following is found near the top of the _C_manage method of __rw_facet:

    // acquire lock
    _RWSTD_MT_STATIC_GUARD (_RW::__rw_facet);

None of the related shared data is read or written outside the critical section protected by that lock, and given the declaration of that shared data it cannot be accessed by any code outside that function. Put bluntly, there is no way that there is a race condition relating to the caching code itself.

Your Performance Analyzer output indicates a race (7 race accesses) for _C_manage...
http://s247136804.onlinehome.us/22.locale.numpunct.mt.1.er.ts/

Specifically, it is calling out the following block of code:

    488. *__rw_access::_C_get_pid (*pfacet) =
    489.     _RWSTD_STATIC_CAST (_RWSTD_SIZE_T, (type + 1) / 2);

The function _C_get_pid simply exposes a reference to a data member of the given facet. That function is thread safe. Provided that pfacet (the parameter passed to _C_manage) isn't being accessed by another thread, there is no way that this code is unsafe. It is possible that the calling code is not safe, but this code is clean. Regardless, the proposed patch to _C_manage does nothing to change this block of code. I do not understand how you can claim that this change eliminated the race conditions you are so offended by. It is possible that other changes you have made eliminated the data races, but I do not see how this change has any effect.

And I love minimalistic code, and hate waste at the same time, especially in a general-purpose library. To each their own.

Here's a helpful quote: "We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil." It's from Donald Knuth. By that measure, your entire patch could be considered evil. I've seen no evidence that the subsequent two allocate/copy/deallocate/sort cycles required to get from 8 to 64 entries are measurably more expensive, and I've seen nothing to indicate that a normal application using the C++ Standard Library would be creating and destroying locale instances in large numbers, or that doing so has a measurable impact on performance.

And I love correct code which doesn't cause thread analyzers to report more than 12000 race conditions for just one test case. I've said it before and I will say it again: race conditions are a showstopper and are not negotiable. Period.

When the code in question has 12 threads that each invoke a function 1000 times, you've found 1 race condition, not 12000. I do agree data races are bad and should be fixed.
But making changes to 'optimize' the code instead of fixing it is actually much worse.

The patch is in scope for the issue at hand. The issue is that std::numpunct and std::moneypunct are not thread safe. This has been confirmed by 4 different thread analyzers, even after applying your _numpunct.h patch.

I looked at the output from the thread analyzer. It points out a data race in __rw::__rw_allocate(), indicating that a memset() is responsible for a data race: http://s247136804.onlinehome.us/22.locale.numpunct.mt.1.er.ts/file.14.src.txt.html#line_43 Assuming that `operator new' is indeed thread safe (I didn't bother to look), I'm curious to hear how this is an actual data race. I'm also curious to hear how you managed to avoid having the same race appear in the output that you submitted with the proposed patch. You are more than welcome to
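For readers following the _C_manage argument above, the locking pattern being described can be sketched as follows. This is an illustration of the idiom, not stdcxx code: the _RWSTD_MT_STATIC_GUARD macro is stood in for by a function-local static std::mutex, and the class and member names are invented. The claim in the email is that when every access to the shared state goes through one such guard, the guarded state itself cannot race; only misuse of caller-owned arguments could:

```cpp
#include <cassert>
#include <mutex>
#include <thread>   // used by the concurrent exercise below
#include <vector>

// Illustrative stand-in for the pattern described above: one static
// mutex (cf. _RWSTD_MT_STATIC_GUARD) serializes every access to the
// shared state.
class facet_registry {
public:
    static void add (int id) {
        std::lock_guard<std::mutex> guard (mutex_ ()); // acquire lock
        ids_ ().push_back (id);                        // shared data, lock held
    }
    static std::size_t size () {
        std::lock_guard<std::mutex> guard (mutex_ ());
        return ids_ ().size ();
    }
private:
    // The shared state is reachable only through these accessors,
    // mirroring "cannot be accessed by any code outside that function."
    static std::mutex& mutex_ () { static std::mutex m; return m; }
    static std::vector<int>& ids_ () { static std::vector<int> v; return v; }
};
```

Hammering add() from several threads and then checking size() exercises exactly the property asserted in the email: the guarded state stays consistent no matter how the calls interleave.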
Re: STDCXX-1056 : numpunct fix
On 09/20/12 13:11, Stefan Teleman wrote: On Thu, Sep 20, 2012 at 8:07 AM, Liviu Nicoara nikko...@hates.ms wrote: But have you measured the amount of memory consumed by all STDCXX locale data loaded in one process? How much absolute time is spent resizing the locale and facet buffers? What is the gain in space and time performance with such a change versus without? How fragmented does the heap become, and is there a performance impact because of it, etc.? IOW, before changing the status quo one must show an objective defect and produce a body of evidence, including a failing test case for the argument.

sizeof(std::locale) * number_of_locales.

I was more interested in the underlying locale data, not the C++ objects.

I'll let you in on a little secret: once you call setlocale(3C) and localeconv(3C), the Standard C Library doesn't release its own locale handles until process termination. So you might think you save a lot of memory by destroying and constructing the same locales. You're really not. It's the Standard C Library locale data which takes up a lot of space.

Thanks for the secret, I appreciate it. Did you mean to say that the C Standard mandates that?!

What I do know for a fact is that this optimization caused the race conditions reported by 4 different thread analyzers. Race conditions are a show-stopper for me, and they are not negotiable.

No, that optimization was not causing the MT defect you originally noted. Saying so only shows a lack of familiarity with the implementation.

And I love minimalistic code, and hate waste at the same time, especially in a general-purpose library. To each their own.

Here's a helpful quote: "We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil." It's from Donald Knuth.

Please, no.

And I love correct code which doesn't cause thread analyzers to report more than 12000 race conditions for just one test case.
I've said it before and I will say it again: race conditions are a showstopper and are not negotiable. Period.

Stefan, you do not become correct by a consensus of thread-analyzer outputs, or by being loud, or by pounding your fist on the table. That being said, I will continue to exercise due diligence and put in the necessary time and effort to provide you with the most useful feedback I can.

I see that you missed my question in the earlier post: did you see a failure in the locale MT tests in your SPARC runs? If you do not want to run that test, that is fine; just let me know. Thanks, Liviu
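On the setlocale(3C)/localeconv(3C) point debated above, one part is easy to demonstrate portably: localeconv() returns a pointer into storage owned by the C library. The caller never frees it, and the C standard allows a later setlocale() or localeconv() call to overwrite it, which is why callers copy fields out immediately. Whether an implementation also keeps per-locale data alive until process exit is, as the thread shows, an implementation detail. A minimal sketch (the helper name is made up):

```cpp
#include <cassert>
#include <clocale>
#include <string>

// Copy the decimal point out of the lconv structure immediately:
// the structure localeconv() returns is owned by the C library and
// may be overwritten by any subsequent setlocale()/localeconv() call.
std::string current_decimal_point () {
    const std::lconv* const lc = std::localeconv ();
    return lc->decimal_point; // copied before the next locale change
}
```

In the "C" locale the decimal point is ".", which makes the behavior easy to check without depending on which named locales a system has installed.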
Re: STDCXX-1056 : numpunct fix
On Thu, Sep 20, 2012 at 4:45 PM, Travis Vitek travis.vi...@roguewave.com wrote: I'll let you in on a little secret: once you call setlocale(3C) and localeconv(3C), the Standard C Library doesn't release its own locale handles until process termination. So you might think you save a lot of memory by destroying and constructing the same locales. You're really not. It's the Standard C Library locale data which takes up a lot of space. You have a working knowledge of all Standard C Library implementations?

I happen to, yes, for the operating systems that I've been testing on. I also happen to know that you don't. This fact alone pretty much closes *this* particular discussion.

Do yourself, and this mailing list, a favor: either write a patch which addresses all of your concerns *AND* eliminates all the race conditions reported, or stop this pseudo software engineering bullshit via email. There is, apparently, a high concentration of know-it-alls on this mailing list who are much better at detecting race conditions and thread unsafety than the tools themselves. Too bad they aren't as good at figuring out their own bugs.

It took eight months for anyone here to even *acknowledge* that numpunct and moneypunct do have, in fact, a thread safety problem. Never mind that the test case for these facets had been crashing for 4 years. To be quite blunt and to the point: after 8 months of denying obvious facts, your credibility is quite a bit in question at this point.

This entire discussion has become a perfect illustration of what's wrong with the ASF, as reported here: http://www.mikealrogers.com/posts/apache-considered-harmful.html -- Stefan Teleman KDE e.V. stefan.tele...@gmail.com
Re: STDCXX-1056 : numpunct fix
On Sep 20, 2012, at 5:31 PM, Stefan Teleman wrote: On Thu, Sep 20, 2012 at 5:07 PM, Liviu Nicoara nikko...@hates.ms wrote: To answer your question [...]: yes, the MT failures occur on SPARC as well, on both SPARCV8 and SPARCV9, and the race conditions are reported on *ALL* platforms tested, even after having applied your _numpunct.h patch. This patch alone does *NOT* solve the problem.

Stefan, I want to be clear: you are talking about a patch identical in nature to the one I have attached now? Just want to be 100% sure we are talking about the same thing. This one still produces failures (crashes, assertions, etc.) in the locale MT tests on SPARC and elsewhere in your builds? Thanks, Liviu
Re: STDCXX-1056 : numpunct fix
Hi, My perception from reading through the whole thread is that we should not trust external tools 100% to assess the safety of the code. I don't think there exists an algorithm that produces no false positives. That said, I admire Stefan's approach, but we should ask the question: are we MT-safe enough? From what I read here, I would say yes.

Liviu Nicoara nikko...@hates.ms writes: On Sep 20, 2012, at 5:31 PM, Stefan Teleman wrote: On Thu, Sep 20, 2012 at 5:07 PM, Liviu Nicoara nikko...@hates.ms wrote: To answer your question [...]: yes, the MT failures occur on SPARC as well, on both SPARCV8 and SPARCV9, and the race conditions are reported on *ALL* platforms tested, even after having applied your _numpunct.h patch. This patch alone does *NOT* solve the problem. Stefan, I want to be clear: you are talking about a patch identical in nature to the one I have attached now? Just want to be 100% sure we are talking about the same thing. This one still produces failures (crashes, assertions, etc.) in the locale MT tests on SPARC and elsewhere in your builds? Thanks, Liviu -- Wojciech Meyer http://danmey.org
Re: STDCXX-1056 : numpunct fix
On Thu, Sep 20, 2012 at 7:34 PM, Wojciech Meyer wojciech.me...@googlemail.com wrote: Hi, My perception from reading through the whole thread is that we should not trust external tools 100% to assess the safety of the code. I don't think there exists an algorithm that produces no false positives. That said, I admire Stefan's approach, but we should ask the question: are we MT-safe enough? From what I read here, I would say yes.

Based on what objective metric? --Stefan -- Stefan Teleman KDE e.V. stefan.tele...@gmail.com
Re: STDCXX-1056 : numpunct fix
On Sep 20, 2012, at 7:37 PM, Stefan Teleman wrote: On Thu, Sep 20, 2012 at 7:34 PM, Wojciech Meyer wojciech.me...@googlemail.com wrote: Hi, My perception from reading through the whole thread is that we should not trust external tools 100% to assess the safety of the code. I don't think there exists an algorithm that produces no false positives. That said, I admire Stefan's approach, but we should ask the question: are we MT-safe enough? From what I read here, I would say yes. Based on what objective metric?

The only gold currency that anyone here accepts without reservation is a failing test case. I believe I have seen some exceptions to the golden rule in my RW time, but I can't recall any specific instance. Liviu
Re: STDCXX-1056 : numpunct fix
On Thu, Sep 20, 2012 at 7:22 PM, Liviu Nicoara nikko...@hates.ms wrote: Stefan, I want to be clear: you are talking about a patch identical in nature to the one I have attached now? Just want to be 100% sure we are talking about the same thing. This one still produces failures (crashes, assertions, etc.) in the locale MT tests on SPARC and elsewhere in your builds?

On September 17, 2012, I posted the following message to this list: http://www.mail-archive.com/dev@stdcxx.apache.org/msg01929.html

In that message, there is a link to my SPARC thread-safety test results: http://s247136804.onlinehome.us/stdcxx-1056-SPARC-20120917/22.locale.numpunct.mt.nts.1.er.html/index.html

This test was run with the following _numpunct.h file: http://s247136804.onlinehome.us/stdcxx-1056-SPARC-20120917/22.locale.numpunct.mt.nts.1.er.html/file.14.src.txt.html

The test above shows 12440 race conditions detected for a run of 22.locale.numpunct.mt with --nthreads=8 --nloops=1. Did you ever look at these test results? From reading your email, I realize that you never looked at them. That is the only possible explanation as to why you're asking for SPARC test results now, today being September 20, 2012. -- Stefan Teleman KDE e.V. stefan.tele...@gmail.com
Re: STDCXX-1056 : numpunct fix
On Sep 20, 2012, at 5:23 PM, Stefan Teleman wrote: On Thu, Sep 20, 2012 at 4:45 PM, Travis Vitek travis.vi...@roguewave.com wrote: I'll let you in on a little secret: once you call setlocale(3C) and localeconv(3C), the Standard C Library doesn't release its own locale handles until process termination. So you might think you save a lot of memory by destroying and constructing the same locales. You're really not. It's the Standard C Library locale data which takes up a lot of space. You have a working knowledge of all Standard C Library implementations? I happen to, yes, for the operating systems that I've been testing on. I also happen to know that you don't. This fact alone pretty much closes *this* particular discussion. Do yourself, and this mailing list, a favor: either write a patch which addresses all of your concerns *AND* eliminates all the race conditions reported, or stop this pseudo software engineering bullshit via email. There is, apparently, a high concentration of know-it-alls on this mailing list who are much better at detecting race conditions and thread unsafety than the tools themselves. Too bad they aren't as good at figuring out their own bugs.

The sniping is uncalled for. There are no enemies here; no one is after you. There is criticism, though, and you are expected to take it and argue your point of view. If you can't stand the heat, get out of the kitchen.

It took eight months for anyone here to even *acknowledge* that numpunct and moneypunct do have, in fact, a thread safety problem. Never mind that the test case for these facets had been crashing for 4 years. To be quite blunt and to the point: after 8 months of denying obvious facts, your credibility is quite a bit in question at this point.

Yes, we are busy with other stuff. I wish I got paid to work on this instead.
This entire discussion has become a perfect illustration of what's wrong with the ASF, as reported here: http://www.mikealrogers.com/posts/apache-considered-harmful.html

I actually read it. I see a guy complaining that he can't have it his way. No problem. One can fork this project at any time and start it anew, by themselves or in the company of like-minded programmers elsewhere. For better or worse, Apache got STDCXX from Rogue Wave. Complaining about it is like complaining that Apple doesn't give us iPhones for free; after all, we are the power users and we know what to do with them. L
Re: STDCXX-1056 : numpunct fix
On Sep 20, 2012, at 7:45 PM, Stefan Teleman wrote: On Thu, Sep 20, 2012 at 7:22 PM, Liviu Nicoara nikko...@hates.ms wrote: Stefan, I want to be clear: you are talking about a patch identical in nature to the one I have attached now? Just want to be 100% sure we are talking about the same thing. This one still produces failures (crashes, assertions, etc.) in the locale MT tests on SPARC and elsewhere in your builds? On September 17, 2012, I posted the following message to this list: http://www.mail-archive.com/dev@stdcxx.apache.org/msg01929.html In that message, there is a link to my SPARC thread-safety test results: http://s247136804.onlinehome.us/stdcxx-1056-SPARC-20120917/22.locale.numpunct.mt.nts.1.er.html/index.html This test was run with the following _numpunct.h file: http://s247136804.onlinehome.us/stdcxx-1056-SPARC-20120917/22.locale.numpunct.mt.nts.1.er.html/file.14.src.txt.html The test above shows 12440 race conditions detected for a run of 22.locale.numpunct.mt with --nthreads=8 --nloops=1. Did you ever look at these test results? From reading your email, I realize that you never looked at them. That is the only possible explanation as to why you're asking for SPARC test results now, today being September 20, 2012.

I see there is some confusion about this; probably nobody explained it before. A failing test case means a test case that causes abnormal termination of the program, or produces evidence of abnormal data during the program's execution. In this respect, please see the atomic add and exchange tests as classic examples of what I mean. I have read all your emails in detail and I have inspected all your attachments, modulo the ones I could not open. Thanks, Liviu
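In the spirit of the atomic add and exchange tests mentioned above, here is a hedged sketch of what such a failing test case looks like (the names are invented; this is not the stdcxx harness). Several threads bump two counters, one atomic and one deliberately unsynchronized. The atomic total is always exact; lost updates can make the plain counter drift below the expected total, which is precisely the "abnormal data" a failing test case is supposed to expose:

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

struct counters {
    std::atomic<long> safe;
    long              unsafe; // intentionally unsynchronized
    counters () : safe (0), unsafe (0) { }
};

// Each thread increments both counters the same number of times.
void hammer (counters& c, int iters) {
    for (int i = 0; i != iters; ++i) {
        ++c.safe;   // atomic read-modify-write: no lost updates
        ++c.unsafe; // data race: increments may be lost
    }
}
```

A harness spawns N threads, joins them, and asserts that safe == N * iters. Writing the same assertion against unsafe yields a test that fails intermittently, i.e., a failing test case in the sense described above, though the race is timing-dependent and may take many runs to manifest.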
Re: STDCXX-1056 : numpunct fix
On Thu, Sep 20, 2012 at 7:52 PM, Liviu Nicoara nikko...@hates.ms wrote: On Sep 20, 2012, at 7:49 PM, Stefan Teleman wrote: On Thu, Sep 20, 2012 at 7:40 PM, Liviu Nicoara nikko...@hates.ms wrote: The only gold currency that anyone here accepts without reservation is a failing test case. I believe I have seen some exceptions to the golden rule in my RW time, but I can't recall any specific instance. That may be a valid metric here. The only one. Any programmer worth his salt -- I am borrowing your words here -- would be able to demonstrate the validity of his point of view with a test case.

I did. There are 12440 race conditions detected for an incomplete run of 22.locale.numpunct.mt. By incomplete I mean it did not run with its default nthreads and nloops, which I believe are 8 threads and 20 loop iterations.

I presented a *proposal* fix which: 1. keeps your _numpunct.h forwarding patch; 2. eliminates 100% of the race conditions.

I have yet to see a counter-proposal. The only things I've seen are assertions (the race condition instrumentation and detection tools are wrong), mischaracterizations (your patch is evil) and overall just email bullshit. Not a single line of code which would resolve the 12440 race conditions problem. -- Stefan Teleman KDE e.V. stefan.tele...@gmail.com
Re: STDCXX-1056 : numpunct fix
Liviu Nicoara nikko...@hates.ms writes: On Sep 20, 2012, at 5:23 PM, Stefan Teleman wrote: On Thu, Sep 20, 2012 at 4:45 PM, Travis Vitek travis.vi...@roguewave.com wrote: I'll let you in on a little secret: once you call setlocale(3C) and localeconv(3C), the Standard C Library doesn't release its own locale handles until process termination. So you might think you save a lot of memory by destroying and constructing the same locales. You're really not. It's the Standard C Library locale data which takes up a lot of space. You have a working knowledge of all Standard C Library implementations? I happen to, yes, for the operating systems that I've been testing on. I also happen to know that you don't. This fact alone pretty much closes *this* particular discussion. Do yourself, and this mailing list, a favor: either write a patch which addresses all of your concerns *AND* eliminates all the race conditions reported, or stop this pseudo software engineering bullshit via email. There is, apparently, a high concentration of know-it-alls on this mailing list who are much better at detecting race conditions and thread unsafety than the tools themselves. Too bad they aren't as good at figuring out their own bugs.

I fully agree. Tools are great; however, I know a little about compilers, and I can tell you that there are limits to the static guarantees you can get from any analyser, because of what is known as the halting problem, which limits even the top-notch tools based on abstract interpretation to a certain extent. The halting problem says that for a program in a Turing-complete formal language you can't say with 100% assurance whether it will halt for every input. You can try to analyse the program statically, but then there is a balance between analysing and interpreting parts of it; in the extreme case, even if you run it, you will not know whether it was supposed to halt. Therefore, please use the tools, but be a bit reserved about the results.
All these MT analysers are based on simple heuristics and logical assertions that can't give you 100% correct results. I don't think people here are picky about your patches; it's just better sometimes to take a breath and see the big picture. -- Wojciech Meyer http://danmey.org
Re: STDCXX-1056 : numpunct fix
On Thu, Sep 20, 2012 at 8:04 PM, Wojciech Meyer wojciech.me...@googlemail.com wrote: Therefore, please use the tools, but be a bit reserved about the results.

I *am* being cautiously skeptical about the results. That's why I am using 4 [FOUR] different thread analyzers, on three different operating systems, each of them in 32- and 64-bit, and not just one. With the testing setup described above, when all FOUR instrumentation tools report the exact same problem in the exact same spot, on every flavor of operating system, what would be a rational conclusion?

1. There is indeed a race condition and a thread-safety problem; it needs to be investigated and fixed.
2. Bah, the tools are crap, nothing to see here, move along, declare victory.

I chose [1] because I am willing to accept my *own* limitations. -- Stefan Teleman KDE e.V. stefan.tele...@gmail.com
Re: STDCXX-1056 : numpunct fix
On Sep 20, 2012, at 8:02 PM, Stefan Teleman wrote: On Thu, Sep 20, 2012 at 7:52 PM, Liviu Nicoara nikko...@hates.ms wrote: On Sep 20, 2012, at 7:49 PM, Stefan Teleman wrote: On Thu, Sep 20, 2012 at 7:40 PM, Liviu Nicoara nikko...@hates.ms wrote: The only gold currency that anyone here accepts without reservation is a failing test case. I believe I have seen some exceptions to the golden rule in my RW time, but I can't recall any specific instance. That may be a valid metric here. The only one. Any programmer worth his salt -- I am borrowing your words here -- would be able to demonstrate the validity of his point of view with a test case. I did. There are 12440 race conditions detected for an incomplete run of 22.locale.numpunct.mt. By incomplete I mean it did not run with its default nthreads and nloops, which I believe are 8 threads and 20 loop iterations.

That is not it, and you did not. Please pay attention: given your assertion that a race condition is a defect that causes abnormal execution of the program, during which the program sees abnormal, incorrect states (read: variable values), it should be easy for you to craft a program that shows evidence of that, by either printing those values, or aborting upon detecting them, etc.

[...] and overall just email bullshit.

Stop using that word. L
Re: STDCXX-1056 : numpunct fix
On Thu, Sep 20, 2012 at 8:18 PM, Liviu Nicoara nikko...@hates.ms wrote: That is not it, and you did not. Please pay attention: given your assertion that a race condition is a defect that causes abnormal execution of the program, during which the program sees abnormal, incorrect states (read: variable values), it should be easy for you to craft a program that shows evidence of that, by either printing those values, or aborting upon detecting them, etc.

Oh, I see. So now I'm supposed to write a program which may, or may not, prove to you that the 12440 race conditions detected by SunPro and Intel are, in fact, real race conditions (as opposed to fake race conditions)? And the means of proving the existence of these real race conditions is ... [drum roll] ... fprintf(3C)? This is very funny. You made my day. Have a nice evening. -- Stefan Teleman KDE e.V. stefan.tele...@gmail.com
Re: STDCXX-1056 : numpunct fix
On 09/21/12 07:39 AM, Liviu Nicoara wrote: Now, in all honesty, it is not too hard to do that. Once you can satisfactorily explain to yourself what is wrong in the program, creating a test case is trivial. Some multi-threading bugs are insidious and hard to reproduce, but even then it's doable: isolate as small a portion of the codebase as possible in a standalone program, and trim it until the failure becomes easily reproducible.

fencepost comment - The results are based on tools, and I don't think he has a large program which actually triggers the conditions. (Creating one may take quite some time.)
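The isolate-and-trim approach described above can be sketched for the pattern at the heart of this thread: a lazily initialized, cached facet value read from many threads. Everything below is a hypothetical stand-in, not stdcxx code; the cached "\3" merely stands in for a real locale lookup. With the lock in place, every thread must observe the same value; deleting the lock recreates the lazy-initialization race the analyzers report, which is exactly the kind of minimal standalone repro being described:

```cpp
#include <cassert>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Trimmed-down stand-in for a numpunct-style cache: the first caller
// computes and caches the grouping string; later callers read the cache.
class punct_cache {
public:
    std::string grouping () {
        std::lock_guard<std::mutex> guard (mutex_); // remove to reproduce the race
        if (!cached_) {
            value_  = "\3"; // stands in for the real locale lookup
            cached_ = true;
        }
        return value_;
    }
private:
    std::mutex  mutex_;
    bool        cached_ = false;
    std::string value_;
};
```

A standalone repro then spawns a handful of threads that call grouping() in a loop and checks that every thread saw the same value; under a race detector, the unlocked variant of this few-dozen-line program reports the same kind of access pattern the full test does.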
Re: STDCXX-1056 : numpunct fix
On Thu, Sep 20, 2012 at 8:39 PM, Liviu Nicoara nikko...@hates.ms wrote: I have not created this requirement out of thin air. STDCXX development has functioned in this manner for as long as I can remember. If it does not suit you, that's fine.

That would explain why these bugs are present in the first place. If the official method of determining thread safety here is fprintf(3C), then we have a much bigger problem than 22.locale.numpunct.mt. -- Stefan Teleman KDE e.V. stefan.tele...@gmail.com
Re: STDCXX-1056 : numpunct fix
On Sep 20, 2012, at 8:59 PM, Stefan Teleman wrote: On Thu, Sep 20, 2012 at 8:44 PM, C. Bergström cbergst...@pathscale.com wrote: fencepost comment - The results are based on tools, and I don't think he has a large program which actually triggers the conditions. (Creating one may take quite some time.)

I do have a program which triggers the race conditions: 22.locale.numpunct.mt. It is part of the official test harness. The real reason they don't want to accept what the instrumentation tools are saying is very simple: they don't LIKE reading what the tools are saying. So: blame the tools, pretend that as long as it doesn't crash again there's no bug, and hope for the best. I cannot include analyzer output as a regression test in the test suite. But I am very glad this is on a public mailing list, so everyone can read what's going on here.

?