Re: [VOTE] discontinue supporting of the MSVC 7.0

2008-01-27 Thread Liviu Nicoara

It was about time.

+1

Tim Adams wrote:

Our experience with our customer base indicates that MSVC 7.0 was
abandoned quickly in favor of 7.1.  


So my vote is also +1 in favor of dropping 7.0.

-- Tim

-Original Message-
From: Martin Sebor [mailto:[EMAIL PROTECTED] On Behalf Of Martin Sebor
Sent: Saturday, January 26, 2008 10:39 AM
To: dev@stdcxx.apache.org
Subject: Re: [VOTE] discontinue supporting of the MSVC 7.0

Farid Zaripov wrote:

  For now we are supporting four versions of the MSVC starting from

7.0.

The MSVC 7.0 doesn't supports the modern C++ features and I propose to



discontinue supporting of this compiler in stdcxx 4.3 version.

  Everyone is encouraged to vote, including non-committers.

  This vote will close on Tuesday, 01/29 at 6:00 AM US/Mountain time.
Follow the link for the countdown: http://tinyurl.com/23clxd


+1 in favor of dropping MSVC 7.0 in 4.3.

Martin


Farid.







Assistance with account reset

2012-08-28 Thread Liviu Nicoara

Hi,

I need assistance with the resetting of my account, lnicoara at apache.

I cannot reset my password via https://id.apache.org because I do not have the 
backup email account anymore, where the reset information is sent.

Thanks in advance.

Liviu Nicoara


Re: New chair and/or attic

2012-08-29 Thread Liviu Nicoara

On 08/29/12 10:54, Jim Jagielski wrote:

Looking over the lack of activity within this project, it's
obvious (at least to me), that maybe its day is done.

Should I call a vote to move C++ to the Attic? Or is there someone
who feels that the project should still exist *and* is willing
to stand as chair?


Hi Jim,

The discussion back in February showed that, even though committers have not 
spent much time lately contributing new code to it, there is an active review 
of the activity occurring on the mailing list and people have volunteered time 
to at least review outside contributions. As Stefan remarked, putting it in the 
Attic pretty much closes the activity around it, as little as it is.

I personally have a renewed interest in the implementation and am in the 
process of reviving my apache account with the intention of being a constant 
presence here, and I hope I will be able to contribute as well. I am not sure 
if anyone reviewed the patches volunteered by Stefan yet, or the changes in 
forks elsewhere, but I am currently looking at that, too.

Thanks.

Liviu


Re: New chair and/or attic

2012-08-30 Thread Liviu Nicoara

On 08/30/12 06:38, Jim Jagielski wrote:


On Aug 29, 2012, at 1:12 PM, Liviu Nicoaranikko...@hates.ms  wrote:


The discussion back in February showed that, even though committers have not 
spent much time lately contributing new code to it, there is an active review 
of the activity occurring on the mailing list and people have volunteered time 
to at least review outside contributions. As Stefan remarked, putting it in the 
Attic pretty much closes the activity around it, as little as it is.



The issue is that I'm not seeing any real activity on any of the mailing 
lists...


And probably, even with the stated interest, the activity will continue to stay 
low a while. FWIW, I have spent my past few days catching up with the changes 
since '08 and refreshing on the build and test infrastructure, etc. Not much of 
a mailing list activity generator.

Liviu


Re: New chair and/or attic

2012-08-30 Thread Liviu Nicoara

On 08/30/12 06:48, C. Bergström wrote:

I'm sincerely sorry to ask this and I have my own answers, but why continue 
STDCXX when such negativity from Apache is apparent..


AFAICT, the Apache Foundation has been a good host for STDCXX during these 
years. They have provided a framework for STDCXX to function in as well as an 
infrastructure for its daily activities. All in accordance to their principles 
about what constitutes a healthy software project.



or
Why not move to libc++? (Yes I realize the amount of effort involved here)


It can't be explained.

L


Re: New chair and/or attic

2012-08-30 Thread Liviu Nicoara

On 08/30/12 08:56, C. Bergström wrote:

On 08/30/12 07:29 PM, Liviu Nicoara wrote:


AFAICT, the Apache Foundation has been a good host for STDCXX during these 
years. They have provided a framework [...] in accordance to their principles 
about what constitutes a healthy software project.

I disagree that the recent actions have fostered positive growth in the project.
1) They fired the previous PMC - who was by far the most invested and dedicated 
person to the project. I don't care if he missed some reports or had a few 
flippant comments - I think it was pretty stupid (I mean he's part of the C++ 
standard committee)


Again, according to their principles on what is and what is not a healthy 
project. I have not yet regained access to my committer account so I am not 
fully aware of the private discussions around the PMC switch.

L


New committers?

2012-08-30 Thread Liviu Nicoara

I see in the February report (http://stdcxx.apache.org/status/2012-02.txt) that 
three new committers have been added to the project. Congratulations! Could one 
of you please update the stdcxx list of committers?

Thanks.

L


Re: New committers?

2012-08-30 Thread Liviu Nicoara

On 08/30/12 11:17, Stefan Teleman wrote:

[...]
I don't mean to punt but I think Jim Jagielski maintains a separate
link with the correct list of committers:


I don't see any difference between the two, either. I'll leave it at that.

L


Re: New chair and/or attic

2012-08-30 Thread Liviu Nicoara

On Aug 30, 2012, at 5:45 PM, C. Bergström cbergst...@pathscale.com wrote:

 On 08/31/12 03:10 AM, Pavel Heimlich, a.k.a. hajma wrote:
 2) Posting the project is dead on a public list certainly doesn't help grow 
 a community
 
 
 well, it's half year since revival of the project was announced and has
 there been any progress/improvements? The state of this is a koma at best.
 Just because bureaucrats say jump doesn't mean anything is going to happen.

Pavel has had an unfortunate choice of words. Let's leave it at that.

 ---
 The facts as I know it
 1) Our fork is maintained (continuous bug fixes - which we won't submit to 
 Apache now)
 2) Stefan is putting in some work (one man army)
 3) Wojciech Meyer had put in some work
 4) NetBSD has a small amount of patches they could probably push upstream (If 
 Jörg has the time)
 5) Martin is/was great for feedback in all areas of STL/C++/occasional code 
 review

While I recognize the value of each one of the points you make, I am puzzled as 
to why you are not going forward on your way with your fork? How is the Apache 
Foundation keeping you from making progress on your use of the library?

 
 STDCXX isn't some stupid ass java framework or widget - It's a *critical* 
 part of ...

We all know that. This is the reason we/I come back to it over and over again, 
for reference, or inspiration, or sometime just to remember the good ol' days. 
That's what I meant by it can't be explained.

L



Re: New chair and/or attic

2012-08-30 Thread Liviu Nicoara

On Aug 30, 2012, at 8:00 PM, C. Bergström wrote:

 On 08/31/12 06:43 AM, Liviu Nicoara wrote:
 While I recognize the value of each one of the points you make, I am puzzled 
 as to why you are not going forward on your way with your fork? How is the 
 Apache Foundation keeping you from making progress on your use of the 
 library?
 For our use it's not and I welcome any patches/help.
 
 It may be missed opportunity for getting a larger userbase and a moot point 
 anyway.  Specifically FBSD - When trying to push it as part of c++ stack 
 replacement or part of ports the only objection I got was licensing related.  
 (At this point they could also argue missing c++11 support)  [...]


IIUC, you would want to see STDCXX getting more exposure; one such avenue would 
involve having it used in FreeBSD as a ports package, with an all permissive 
BSD license.


 While STDCXX is at Apache it will never be BSD licensed.  Solution - move it 
 away from Apache foundation and have them transfer some of the additional 
 rights they received to allow recipient foundation to relicense.  I thought 
 this would be a win for the project and everyone, but for some reason instead 
 of opening a discussion to transfer - it's just death grip and pushing to the 
 attic.


The fact that Rogue Wave agreed to release the STDCXX code back in 2005 is 
nothing short of a miracle. IMHO, we are lucky to benefit from having had this 
library released to the public, anyway.

L

STDCXX forks

2012-08-31 Thread Liviu Nicoara

Please correct me if I am wrong.

I have seen two forks on github, one belonging to C. Bergstrom/Pathscale and 
the other to Stefan Teleman.

The first one contains a number of genuine code changes outside the Apache 
repository, but without canonical and meaningful commit comments, and without 
accompanying test cases. Some carry what seem to be bug id's from a private bug 
database. Some of the changes, e.g., 84d01405124b8c08bb43fdc3755ada2592449727, 
fa9a45fb36e45b71ba6b4ae93b50bd6679549420, seem odd. Are you guys dropping 
support for any  platforms?

Stefan's seem like a complete git-ification of the whole Apache repository but 
with no changes I could detect.

Thanks.

L


Re-focus [was: Re: New chair and/or attic]

2012-08-31 Thread Liviu Nicoara

On 08/31/12 08:18, Jim Jagielski wrote:


On Aug 30, 2012, at 5:45 PM, C. Bergströmcbergst...@pathscale.com  wrote:

[...]
STDCXX isn't some stupid ass java framework or widget - It's a *critical* part 
of a C++ stack and the cost of leaving it out of the attic is negligible - 
What's the benefit of bringing up these attic discussions?


It's a critical part in which people either lack the time, motivation or
desire to push or submit patches to the canonical source?

Or is the desire to force Apache's hand in the matter such that
someone else's fork or branch becomes the de-facto source of this
critical part???

If stdcxx is as important as you say, and you are fighting to
keep it active, then put your money where your mouth is and
start working on bumping up the activity. Submit your bug fixes.


This discussion is going nowhere and is not becoming of a professional 
community. By now it must be clear where everybody's interests lie; all who can 
read took note of that. Since this is largely a dispute between Apache and 
Pathscale as an alleged representative of a free software community I suggest 
you take the licensing related discussions in private.

L


Re: STDCXX forks

2012-08-31 Thread Liviu Nicoara

On 08/31/12 08:58, C. Bergström wrote:

On 08/31/12 07:40 PM, Liviu Nicoara wrote:

Please correct me if I am wrong.

I have seen two forks on github, one belonging to C. Bergstrom/Pathscale and 
the other to Stefan Teleman.

The first one contains a number of genuine code changes outside the Apache 
repository, but without canonical and meaningful commit comments, and without 
accompanying test cases.

We can help clean this up if it makes a real difference


That is appreciated. It would make a huge difference to anybody coming anew to 
the project or just catching up, like me.


Some carry what seem to be bug id's from a private bug database. Some of the 
changes, e.g., 84d01405124b8c08bb43fdc3755ada2592449727

iirc we renamed the files because of some problem with cmake - it was trying to 
compile those files instead of treating them like headers.

, fa9a45fb36e45b71ba6b4ae93b50bd6679549420, seem odd.

I forget why we did this tbh - I'll try to get an answer in case it comes up 
again as a question


My point exactly, a good comment explains the why's.


Are you guys dropping support for any platforms?

Not intentionally - We are using/testing this on Solaris, FBSD and Linux - We 
may add OSX/Win to that list soon, but I can't promise at this point. For 
targets we only are able to test x86 and x86_64


The reason I am asking is that changes to the template implementation files have most 
likely broken the support for compilers that do implicit inclusion of 
template implementations (a.k.a. link-time instantiation of templates). I am not 
incriminating anyone mind you, I am just trying to point out there mightbe a gap in your 
testing of your changes.



NetBSD also has a fork I believe, but not sure if it's public/status


Do you know where that is?



Stefan's seem like a complete git-ification of the whole Apache repository but 
with no changes I could detect.

He has quite a number of patches and forget where those are kept. I'm guessing 
a lot of his fixes target KDE/Qt apps and the Perennial C++VS testsuite.
http://www.peren.com/pages/cppvs_set.htm



I briefly looked over some patches, there's quite a few of them. I plan to go 
in more detail over the week-end.

Cheers.

L


Re: stdcxx issue 1058

2012-08-31 Thread Liviu Nicoara

On 08/31/12 13:14, Stefan Teleman wrote:


In June this year I committed r1353821 to trunk which fixes stdcxx-1058.

I have the patches for 1058 ready to commit to branches (4.2.x and 4.3.x).

OK to go?


The patch looks ok to me. What seems to be the problem?  +1

L


Branching policy, 4.3.x, 5.0.0, etc.

2012-08-31 Thread Liviu Nicoara

The branching policy [1] in effect looks very much like the Rogue Wave release 
process: branch at the beginning of each release cycle, work on the release 
branch, merge changes back into the trunk at release time (and in between as 
needed). Did I get that right?

From what I gather 4.2.x has last released 4.2.1, there is a 4.3.x with no 
releases, and a prospective 5.0.0 which should come out of the trunk (I saw 
changes mentioning 5.0.0). What are the stated goals for the 4.3.x and 5.x?

Thanks.

L

[1]http://wiki.apache.org/stdcxx/Branching


Re: New committers?

2012-08-31 Thread Liviu Nicoara

My input below.

On 08/31/12 09:42, Wojciech Meyer wrote:

The two significant ones (as far as I can understand):

- as I heard from Christopher Bergström that it's hard to push the
   stdcxx to FreeBSD ports repository (I can understand it and that
   sounds pretty bad, if that's the case then the board should consider
   re-licensing as advised; I agree in general it's a hard decision for
   the board, but imagine the project would benefit, IANAL tho)


Christopher's wishes and goals may be different from others'. I do not believe 
he has ulterior motives that would be detrimental to the rest of us but AFAICT 
he has not made a compelling argument. Even with one, it stretches the 
imagination what could possibly convince Apache to give up on STDCXX ownership.


- I'm also reading through that methodology we use might not fit the
   distributed model which could basically improve the pace of
   development stream (and again github is nice at these things; but
   there are other considerations)


The process put in place by Apache closely mirrors the rigors of the Rogue Wave 
environment where the project originates. The development proceeds at the best 
speed possible while at the same time proving the consistency and correctness 
of the code base through passing unit tests. The process is tightly controlled 
by rules which are observed by everyone (such as creating test cases before 
fixing bugs, thoroughly documenting changes, following coding and code 
structuring conventions, etc.). The process has an ultimate authority in the 
person of the tech lead, Martin.

The end result of the _pedantic_ application of these rules is the product you 
and I, all of us, enjoy. As mentioned before it is of an excellent quality, not 
often seen in the software world. It also is a very sophisticated product both 
with an intricate code structure, and extreme use of the language which pushes 
the compilers to their limits. Any change, however small, must be carefully 
considered and weighed, and careless changes will almost always break it in 
subtle ways. As a rule of thumb, if there is something that looks wrong in the 
source code, chances are you're not getting it right.

In case my point did not get across by now, I am strongly for the continuation 
of a tightly controlled development process.

Thanks.

Liviu



[PATCH] Trivial test fix

2012-09-01 Thread Liviu Nicoara

Would someone please apply the patch on 4.2.x branch? (I have not yet regained 
access to my Apache account.)


2012-09-01  Liviu Nicoara  nikko...@hates.ms

* tests/containers/23.bitset.cpp: swapped lines to avoid compiler bug
(see http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54442)


Index: tests/containers/23.bitset.cpp
===
--- tests/containers/23.bitset.cpp  (revision 1379762)
+++ tests/containers/23.bitset.cpp  (working copy)
@@ -278,8 +278,8 @@ test_synopsis (std::bitset0*)
 MEMFUN (Bitset, flip, ());
 MEMFUN (Bitset, flip, (std::size_t));
 
-MEMFUN (Bitset::reference, operator[], (std::size_t));

 MEMFUN (bool, operator[], (std::size_t) const);
+MEMFUN (Bitset::reference, operator[], (std::size_t));
 
 MEMFUN (unsigned long, to_ulong, () const);


Re: STDCXX forks

2012-09-01 Thread Liviu Nicoara

On 08/31/12 12:20, Stefan Teleman wrote:

On Fri, Aug 31, 2012 at 8:40 AM, Liviu Nicoara nikko...@hates.ms wrote:


Stefan's seem like a complete git-ification of the whole Apache repository
but with no changes I could detect.


Not quite. :-)
[...]
The official Oracle port for Solaris 10 and 11, which I maintain, is here:

http://src.opensolaris.org/source/xref/userland/gate/components/stdcxx/


Hi Stefan, I have went through the patches. Specifically, I have spent more 
time looking in the mutex alignment changes and the C++ C library headers 
patches, and I only read the others. In order:

The test extensions seem to be genuine by and large, but I would further 
analyze them after I find out what is it they are addressing (test cases?).

The regression tests whose names contain references to internal bug numbers 
require a bit more analysis as to their usefulness. Of course an explanation 
attached to each would alleviate duplicating your work. I have not 
cross-checked them to JIRA.

Some of the compiler characterizations changes, as well as the associated 
GNUmakefile's, seem to be specific to your port, e.g., GNUmakefile, 
GNUmakefile.cfg changes. I may have spotted other issues but I would wait for 
your feed-back first.

The C++ C library headers seem to have been re-written to your port. I am 
unsure why you needed this, but it surely breaks the original intent for these 
headers' structure. I have also noticed that you stripped the Apache notice and 
added an Oracle copyright notice on them.

This pretty much sums up my first impression.

If you were willing to submit them one by one (*), complete with test cases and 
complete patches (ChangeLog entries and all) I would put in the necessary 
review time and provide feed-back asap. Please let me know if I missed anything.

I sincerely hope this helps.

Thanks!

Liviu

 
(*) As in issue by issue, not file by file.


Re: STDCXX forks

2012-09-01 Thread Liviu Nicoara

On 08/31/12 08:58, C. Bergström wrote:

On 08/31/12 07:40 PM, Liviu Nicoara wrote:

Please correct me if I am wrong.

I have seen two forks on github, one belonging to C. Bergstrom/Pathscale and 
the other to Stefan Teleman.

The first one contains a number of genuine code changes outside the Apache 
repository, but without canonical and meaningful commit comments, and without 
accompanying test cases.

We can help clean this up if it makes a real difference

Some carry what seem to be bug id's from a private bug database. Some of the 
changes, e.g., 84d01405124b8c08bb43fdc3755ada2592449727

iirc we renamed the files because of some problem with cmake - it was trying to 
compile those files instead of treating them like headers.

, fa9a45fb36e45b71ba6b4ae93b50bd6679549420, seem odd.

I forget why we did this tbh - I'll try to get an answer in case it comes up 
again as a question

Are you guys dropping support for any  platforms?

Not intentionally - We are using/testing this on Solaris, FBSD and Linux - We 
may add OSX/Win to that list soon, but I can't promise at this point.  For 
targets we only are able to test x86 and x86_64


Hi Christopher,

I finished looking though the github changes. If you are interested in 
contributing some of those changes back, as complete patches [1], I would 
volunteer my time to promptly review them and work with you to get them in, as 
needed.

In the meantime, if you will, could you please elaborate on the two changes 
mentioned above? The first one changes the traditional file extension on the 
template implementation files, the second affects the library building on 
platforms like I mentioned in another post. AFAIK STDCXX does not implement or 
follow Boost conventions and/or library file structure.

HTH.

Thanks!

Liviu

[1] http://stdcxx.apache.org/bugs.html#patch_format


Re: STDCXX forks

2012-09-01 Thread Liviu Nicoara

On Sep 1, 2012, at 1:52 PM, Stefan Teleman wrote:

 On Sat, Sep 1, 2012 at 12:15 PM, Liviu Nicoara nikko...@hates.ms wrote:
 [...]
 This pretty much sums up my first impression.
 
 [...]
 To begin with: the compiler flags/GNUmakefile changes are very
 specific to the SunPro compilers and to our internal build system.
 These changes are most likely not suitable for inclusion in the
 canonical stdcxx, except maybe for the sunpro.config changes, in case
 someone would like to be able to replicate our builds. I'd like to
 mention that, in Solaris, Apache stdcxx is a system library.

Good point. 

 
 About the Standard C Library forwarding header files: these changes
 are specfic to Solaris. The reason behind them is: the Solaris
 architectural rules, which can be best summarized as: there can be
 only one of each. In other words, it is Verboten, in Solaris, to
 duplicate the Standard C Library header files (or any other header
 file for that matter). The Solaris Standard C Library header files are
 C++-clean - they are required to be so, by the same architectural
 rules. Again, these changes are specific to Solaris, and are probably
 not portable across other implementations. I know for a fact that they
 are not portable for either the GCC or Intel compilers (with which I
 test regularly on Linux, in addition to SunPro).
 
 So these two groups of changesets can be ignored.

Got it.

 
 I opened yesterday STDCXX-1066:
 
 https://issues.apache.org/jira/browse/STDCXX-1056
 
 about the pthread_mutex_t/pthread_cond_t alignment on SPARCV8. I'll
 have patches done this weekend. Achtung: the patchset is very large
 and touches a very large number of files. It's strange that I didn't
 get an email about STDCXX-1066.

Acknowledged. I have seen the individual patches on those two websites.

 
 I'd also like to talk about STDCXX-1056:
 
 https://issues.apache.org/jira/browse/STDCXX-1056
 
 which has already had an initial discussion, and for which I have
 attached  a patch. This issue also addresses (indirectly) linkage when
 building with GCC. On the recent versions of GCC that I have tested
 with, passing -supc++ on link line automatically links with the GNU
 libstdc++6.so (on top of linking with stdcxx), and that just bad.

I briefly looked at it, will delve into it later.

 
 And then I'll have to cross reference the patches which refer to our
 internal bug numbers because most of them are quite old and right now,
 off the top of my head, I can't remember what they are. :-)

Thanks!

Liviu



Re: stdcxx Wikipedia update

2012-09-02 Thread Liviu Nicoara

On Sep 2, 2012, at 12:16 PM, Wojciech Meyer wrote:

 According to the wikipedia we are dead?
 
 http://en.wikipedia.org/wiki/Apache_C%2B%2B_Standard_Library
 
 are we? :-)
 


It's hard to argue against, at this point. But I hope we'll soon be able to 
justify changing it.

L

Re: STDCXX-1056 [was: Re: STDCXX forks]

2012-09-04 Thread Liviu Nicoara

On Sep 4, 2012, at 10:34 AM, Stefan Teleman wrote:

 On Tue, Sep 4, 2012 at 7:35 AM, Liviu Nicoara nikko...@hates.ms wrote:
 
 By no means I am dismissing it. It is at the very least an issue of
 efficiency. I will try an Intel C++ build on x86_64 at some point today.
 What build type was that? I also notice that you have 4.2.1 in your path?
 Are you building out of 4.2.1. tag? I built off 4.2.x branch which also has
 support for custom timeouts (--soft-timeout) in rw_test.
 
 Yes, this is 4.2.1 with all the Solaris  patches which can be applied
 on Linux. The environment is:
 
 # INFO (S1) (10 lines):
 # TEXT:
 # COMPILER: Intel C++, __INTEL_COMPILER = 1210,
 __INTEL_COMPILER_BUILD_DATE = 20111011, __EDG_VERSION__ = 403
 # ENVIRONMENT: pentiumpro running linux-elf (Fedora release 17 (Beefy
 Miracle) (3.5.0-2.fc17.x86_64)) with glibc 2.15
 # FILE: 22.locale.numpunct.mt.cpp
 # COMPILED: Sep  4 2012, 09:11:36
 # COMMENT: thread safety
 
 
 # CLAUSE: lib.locale.numpunct
 
 [ ... snip ... ]
 
 With all the thread-safety Solaris patches for 1056 applied. The test
 runs to completion:
 
 # NOTE (S2) (5 lines):
 # TEXT: executing locale -a  /tmp/tmpfile-Cej8JU
 # CLAUSE: lib.locale.numpunct
 # FILE: process.cpp
 # LINE: 276
 
 # INFO (S1) (3 lines):
 # TEXT: testing std::numpunctcharT with 8 threads, 20 iterations
 each, in 32 locales { C aa_DJ aa_DJ.iso88591 aa_DJ.utf8
 aa_ER aa_ER@saaho aa_ER.utf8 aa_ER.utf8@saaho aa_ET
 aa_ET.utf8 af_ZA af_ZA.iso88591 af_ZA.utf8 am_ET
 am_ET.utf8 an_ES an_ES.iso885915 an_ES.utf8 ar_AE
 ar_AE.iso88596 ar_AE.utf8 ar_BH ar_BH.iso88596 ar_BH.utf8
 ar_DZ ar_DZ.iso88596 ar_DZ.utf8 ar_EG ar_EG.iso88596
 ar_EG.utf8 ar_IN ar_IN.utf8 }
 # CLAUSE: lib.locale.numpunct
 
 [ ... snip ... ]
 
 # +---+--+--+--+
 # | DIAGNOSTIC|  ACTIVE  |   TOTAL  | INACTIVE |
 # +---+--+--+--+
 # | (S1) INFO |   11 |   11 |   0% |
 # | (S2) NOTE |1 |1 |   0% |
 # | (S8) ERROR|0 |3 | 100% |
 # | (S9) FATAL|0 |1 | 100% |
 # +---+--+--+--+
 real 2139.31
 user 2406.09
 sys 155.61
 [steleman@darthvader][/src/steleman/programming/stdcxx-intel/stdcxx-4.2.1-thread-safe/build/tests][09/04/2012
 9:50:14][1300]
 
 These tests running to completion (with the patches applied) are 100%
 reproducible on every single run, on Linux and Solaris (Intel and
 SPARC).

That is good, right?

 
 The timings are the output of /usr/bin/time -p.
 
 Yes, ~40 minutes wall clock run time for one test is quite a lot. But
 it runs to completion, with zero errors. And when taking into
 consideration that there are 8 threads with 20 iterations, maybe
 it's not that much. FWIW, SELinux is also fully enabled in my
 environment.


Ok, so that clears (this build at least), maybe inefficient but runs to 
completion with no crashes.


 
 The timeouts can also be a symptom of race conditions/deadlocks or
 attempting to access invalid thread stacks, or partially written data.


I would look at timeouts from the standpoint of the progression of the run: if 
the run does not progress at all, as in threads are deadlocked, then I would 
consider that a defect. If the run progresses, it is a mere inefficiency, 
however bad.


 It's very platform specific. On Solaris SPARC (either 32-bit or
 64-bit) I never get timeouts, I always get either SEGV or SIGBUS. On
 Intel it appears that timeouts are the prevailing symptom.

Could not get an Intel build off the ground on the account of the LIMITS.cpp 
test not running to completion because of a possible compiler bug. I posted 
earlier an account of it. Do you have a support account that allows posting bug 
reports for Intel's C++ compiler?


 
 I am not sure what an explicit run timeout would add, except hiding
 the hang/deadlock/memory corruption problem behind a timeout.


The timeout option allowed me to run the program with a larger value, such that 
the alarm would not trigger and allow the program to run to completion. Before 
enlarging the timeout value, the individual tests would time out.

I did not have access to a Solaris box since I left Rogue Wave. Does Sun/Oracle 
have any program that would allow one to test drive their compiler on a shared 
box somewhere? I vaguely remember that HP had something like that a while ago.

Thanks!

Liviu

Re: STDCXX-1056 [was: Re: STDCXX forks]

2012-09-05 Thread Liviu Nicoara

On 09/05/12 00:51, Stefan Teleman wrote:

On Tue, Sep 4, 2012 at 10:49 PM, Martin Sebor mse...@gmail.com wrote:


   template class _CharT
   inline string numpunct_CharT::grouping () const
   {
   if (!(_C_flags  _RW::__rw_gr)) {

   numpunct* const __self = _RWSTD_CONST_CAST (numpunct*, this);

  _RWSTD_MT_GUARD (_C_mutex);

  // [try to] get the grouping first (may throw)
   // then set a flag to avoid future initializations
   __self-_C_grouping  = do_grouping ();
   __self-_C_flags|= _RW::__rw_gr;
   }

   return _C_grouping;
   }


That's what I wanted to do originally - use a per-object mutex.


FWIW, it is pretty obvious to me _now_ that these assignments are not MT-safe, 
by themselves and also in the context of the test guarding them.

You're right, access must be synchronized on a per-facet object basis. Since 
the data that needs to be protected on assignment is instance data, the mutex 
used in the guard must be a (yet to be added) instance mutex. That would make 
it binary incompatible and would go in ... 4.3.x?




This works:

template class _CharT
inline string numpunct_CharT::grouping () const
{
 if (!(_C_flags  _RW::__rw_gr)) {

 numpunct* const __self = _RWSTD_CONST_CAST (numpunct*, this);

 _RWSTD_MT_STATIC_GUARD (_Type);


It seems to me that a static guard is excessive. The only thing that needs 
guarding is the actual assignment to facet instance members -- a mutex instance member 
would suffice.



And even so, this is still not thread-safe:

Two different threads [ T1 and T2 ], seeking two different locales
[en_US.UTF-8 and ja_JP.UTF-8], enter std::numpunct_CharT::grouping()
at the same time - because they are running on two different cores.
They both test for

 if (!(_C_flags  _RW::__rw_gr))

and then -- assuming the expression above evaluates to true -- one of
them wins the mutex [T1], and the other one [T2] blocks on the mutex.

When T1 is done and exits the function, the grouping is set to
en_US.UTF-8 and the mutex is released. Now T2 acquires the mutex, and
proceeds to setting grouping to ja-JP.UTF-8. Woe on the caller running
from T1 who now thinks he got en_US.UTF-8, but instead he gets
ja_JP.UTF-8, which was duly set so by T2, but T1 had no clue about it
(remember, the std::string grouping _charT buffer is shared by the
caller from T1 and the caller from T2).


I am pretty sure that in your example, they would operate on two different instances of the facet 
class. The static guard would synchronize their running alright but through two 
different facet instances, copying data from two different places of the locale database. In this 
respect a static guard is excessive.

I would defer it to Martin to validate a course of action but to me it looks 
like the only problem is the concurrent assignment to facet instance member 
variables inside the facet member functions. Which could be easily and nicely 
solved with a plain guard on a mutex ordinary member variable.

Thanks!

Liviu


Re: STDCXX-1056 [was: Re: STDCXX forks]

2012-09-05 Thread Liviu Nicoara

On 09/05/12 14:09, Stefan Teleman wrote:

On Wed, Sep 5, 2012 at 10:52 AM, Martin Sebor mse...@gmail.com wrote:
[...]
OK so I did a little bit of testing, after looking at the *right*
__rw_guard class. :-)

I changed the std::numpunct class thusly:
[...]
And then:

template class _CharT
inline string numpunct_CharT::grouping () const
{
 if (!(_C_flags  _RW::__rw_gr)) {

 _RWSTD_MT_GUARD (_C_object_mutex);

 // double-test here to avoid re-writing an already written string
 if (!(_C_flags  _RW::__rw_gr)) {

 numpunct* const __self = _RWSTD_CONST_CAST (numpunct*, this);

 // [try to] get the grouping first (may throw)
 // then set a flag to avoid future initializations
 __self-_C_grouping  = do_grouping ();
 __self-_C_flags|= _RW::__rw_gr;

 }
 }

 return _C_grouping;
}


I am afraid this would be unsafe, too (if I said otherwise earlier I was 
wrong). The compiler might re-arrange the protected assignments, such that 
another thread sees a partially updated object, where the flags are updated and 
the string not. I don't think we're going to get away with this here without 
either a simpler and more inefficient top-level locking, or doing away 
completely with the lazy initialization. Thoughts?

Liviu


Re: STDCXX-1056 [was: Re: STDCXX forks]

2012-09-06 Thread Liviu Nicoara

On 09/05/12 23:51, Martin Sebor wrote:

On 09/05/2012 09:03 PM, Stefan Teleman wrote:

[...]
Agreed.

But: if the choice is between an implementation which [1] breaks ABI
and [2] performs 20% worse -- even in contrived test cases -- than
another implementation [2] which doesn't break ABI, and performs
better than the first one,  why would we even consider the first one?


Breaking the ABI is not an option (unless we rev the version).
But I'm not sure I understand what you think breaks the ABI.


I think Stefan is referring to adding a mutex member variable to the facet in 
question and breaking binary compatibility. If that is the case I have confused 
things when I suggested exactly that, earlier. A cursory read through the 
__rw_facet source shows that inherits from __rw_synchronized in MT builds, 
therefore each facet carries its own mutex member.

Liviu


We don't need to add a new mutex -- we can use the __rw_facet
member for the locking. Or did you mean something else?

Martin



--Stefan




--
And now I see with eye serene
The very pulse of the machine.


A question (or two) of procedure, etc.

2012-09-06 Thread Liviu Nicoara

What is the latest policy in what regards trivial fixes, e.g., the volatile 
qualifier for the max var in LIMITS.cpp we discussed earlier, etc.? It seems 
excessive to create a bug report for such issues.

Also, IIUC from reading previous discussions, forward and backward binary 
compatible changes go in 4.2.x, followed by merges to 4.3.x and trunk. Am I 
getting this right?

Also, besides the Linux, FreeBSD, Windows, Solaris builds hosted on Apache 
(Jenkins) is anybody building on HP-UX, AIX, etc.?

Thanks.

Liviu



Re: A question (or two) of procedure, etc.

2012-09-06 Thread Liviu Nicoara

On 09/06/12 14:37, Wojciech Meyer wrote:

Liviu Nicoara nikko...@hates.ms writes:


What is the latest policy in what regards trivial fixes, e.g., the
volatile qualifier for the max var in LIMITS.cpp we discussed earlier,
etc.? It seems excessive to create a bug report for such issues.


[...]
So I vote for keeping the changes as lightweight as possible, and avoid
extra bureaucracy if it makes sense. This assumption is based that
developers here trust their selves, and run the tests often. I'm not
subscribed to the commit list here, but if I do - I usually follow
people's changes and assume that developers do read commit lists.


Makes sense. Thanks.



So the general consensus from my experience with other project was: not
sure - ask.

That's my experience, also I don't have full rights to express my
opinion right now about stdcxx.


I sure hope we can have totally open (civilized) discussions going forward. :)




Also, IIUC from reading previous discussions, forward and backward
binary compatible changes go in 4.2.x, followed by merges to 4.3.x and
trunk. Am I getting this right?


Every project has certain branch strategy, I'm not sure about this so
maybe Martin can advice. I prefer to develop on trunk and cherry pick
to the other branches avoiding bulk merges (and that's in both
directions).


Right... I saw some discussions from a while back about active development on 4.2.x 
with integration to other branches as dictated by compatibility (e.g., integrating 
4.2.x - 4.3.x and 4.2.x - 4.2.1), and reintegration to trunk as needed. I am 
not looking to change any such policy just wanna make sure I am not messing something 
up.





Also, besides the Linux, FreeBSD, Windows, Solaris builds hosted on Apache 
(Jenkins) is anybody building on HP-UX, AIX, etc.?


Thanks.

Liviu


Re: STDCXX-1056 [was: Re: STDCXX forks]

2012-09-06 Thread Liviu Nicoara

On Sep 5, 2012, at 4:03 PM, Martin Sebor wrote:

 On 09/05/2012 01:33 PM, Liviu Nicoara wrote:
 On 09/05/12 15:17, Martin Sebor wrote:
 On 09/05/2012 12:40 PM, Liviu Nicoara wrote:
 On 09/05/12 14:09, Stefan Teleman wrote:
 On Wed, Sep 5, 2012 at 10:52 AM, Martin Sebor mse...@gmail.com wrote:
 [...]
 OK so I did a little bit of testing, after looking at the *right*
 __rw_guard class. :-)
 
 I changed the std::numpunct class thusly:
 [...]
 
 I am afraid this would be unsafe, too (if I said otherwise earlier I was
 wrong). [...] Thoughts?
 
 You're right, there's still a problem. We didn't get the double
 checked locking quite right.
 
 I wish for simplicity: eliminate the lazy initialization, and fully
 initialize the facet in the constructor. Then we'd need no locking on
 copying its data (std::string takes care of its own copying).
 
 I'm not sure how easily we can do that. Almost all of locale
 is initialized lazily. Some of the layers might depend on the
 facets being initialized lazily as well. This was a deliberate
 design choice. One of the constraints was to avoid dynamic
 initialization or allocation at startup. [...]

There would be a performance degradation. IMHO, it would be minor and would 
simplify the code considerably.

After finally being able to reproduce the defect with SunPro 12.3 on x86_64 I 
tried to remove the lazy initialization and a (smaller) test case now passes. I 
am attaching the program and the numpunct patch. 

Will continue to look into it. Although STDCXX does not have an implementation 
of the atomics library we could probably use the existing, unexposed, atomics 
API to implement a more robust lazy initialization. 

Liviu

$ cat tests/localization/t.cpp; svn diff
#include locale

#include cstdio
#include cstdlib

#include unistd.h

#define MAX_THREADS16
#define MAX_LOOPS1000

static bool hold = true;

extern C {

static void* 
f (void* pv)
{
std::numpunctchar const fac = 
*reinterpret_cast std::numpunctchar*  (pv);

while (hold) ; 

for (int i = 0; i != MAX_LOOPS; ++i) {
std::string const grp = fac.grouping ();
}

return 0;
}

}

int
main (int argc, char** argv)
{
std::locale const loc = std::locale (argv [1]);

std::numpunctchar const fac =
std::use_facetstd::numpunctchar  (loc);

pthread_t tid [MAX_THREADS] = { 0 };

for (int i = 0; i  MAX_THREADS; ++i) {
if (pthread_create (tid + i, 0, f, (void*)fac)) 
exit (-1);
}

sleep (1);

hold = false;

for (int i = 0; i  MAX_THREADS; ++i) {
if (tid [i])
pthread_join (tid [i], 0);
}

return 0;
}

Index: include/loc/_numpunct.h
===
--- include/loc/_numpunct.h (revision 1381575)
+++ include/loc/_numpunct.h (working copy)
@@ -61,24 +61,40 @@ struct numpunct: _RW::__rw_facet
 string_type;
 
 _EXPLICIT numpunct (_RWSTD_SIZE_T __ref = 0)
-: _RW::__rw_facet (__ref), _C_flags (0) { }
+: _RW::__rw_facet (__ref) {
+_C_grouping   = do_grouping ();
+_C_truename   = do_truename ();
+_C_falsename  = do_falsename ();
+_C_decimal_point  = do_decimal_point ();
+_C_thousands_sep  = do_thousands_sep ();
+}
 
 virtual ~numpunct () _RWSTD_ATTRIBUTE_NOTHROW;
 
 // 22.2.3.1.1, p1
-char_type decimal_point () const;
+char_type decimal_point () const {
+return _C_decimal_point;
+}
 
 // 22.2.3.1.1, p2
-char_type thousands_sep () const;
+char_type thousands_sep () const {
+return _C_thousands_sep;
+}
 
 // 22.2.3.1.1, p3
-string grouping () const;
+string grouping () const {
+return _C_grouping;
+}
 
 // 22.2.3.1.1, p4
-string_type truename () const;
+string_type truename () const {
+return _C_truename;
+}
 
 // 22.2.3.1.1, p4
-string_type falsename () const;
+string_type falsename () const {
+return _C_falsename;
+}
 
 static _RW::__rw_facet_id id;
 
@@ -112,15 +128,13 @@ protected:
 
 private:
 
-int _C_flags;   // bitmap of cached data valid flags
-string  _C_grouping;// cached results of virtual members
+string  _C_grouping;
 string_type _C_truename;
 string_type _C_falsename;
 char_type   _C_decimal_point;
 char_type   _C_thousands_sep;
 };
 
-
 #ifndef _RWSTD_NO_SPECIALIZED_FACET_ID
 
 _RWSTD_SPECIALIZED_CLASS
@@ -134,95 +148,6 @@ _RW::__rw_facet_id numpunctwchar_t::id
 #  endif   // _RWSTD_NO_WCHAR_T
 #endif   // _RWSTD_NO_SPECIALIZED_FACET_ID
 
-
-template class _CharT
-inline _TYPENAME numpunct_CharT::char_type
-numpunct_CharT::decimal_point () const
-{
-if (!(_C_flags  _RW::__rw_dp)) {
-
-numpunct* const __self = _RWSTD_CONST_CAST (numpunct*, this);
-
-// [try to] get the decimal point first (may throw)
-// then set a flag to avoid future

Re: STDCXX-1056 [was: Re: STDCXX forks]

2012-09-07 Thread Liviu Nicoara

On 09/06/12 19:54, Martin Sebor wrote:

I'm not sure how easily we can do that. Almost all of locale
is initialized lazily. Some of the layers might depend on the
facets being initialized lazily as well. This was a deliberate
design choice. One of the constraints was to avoid dynamic
initialization or allocation at startup. [...]


There would be a performance degradation. IMHO, it would be minor and would 
simplify the code considerably.

After finally being able to reproduce the defect with SunPro 12.3 on x86_64 I 
tried to remove the lazy initialization and a (smaller) test case now passes. I 
am attaching the program and the numpunct patch.


Out of curiosity, does it still work if you move the locale into
the thread function? Like this:


I created the test case because I wanted something more specific and with more 
predictable results. Initially, the locale object was created in the thread and 
it was passing.




Also, does the 27.objects test pass with this patch?


Will try. I don't think that we can actually use this patch, I bundled it with 
the test so that people can test the approach right away.



I don't know if we have tests for it but writing to cerr/cout
in a bad_alloc handler should succeed. I.e., this should not
abort:

   try {
   struct rlimit rlim = { };
   getrlimit (RLIMIT_DATA, rlim);
   rlim.rlim_cur = 0;
   setrlimit (RLIMIT_DATA, rlim);

   for ( ; ; ) new int;
   }
   catch (bad_alloc) {
   cout  caught bad alloc:  123  '\n';
   }


I see..

Liviu



Re: A question (or two) of procedure, etc.

2012-09-07 Thread Liviu Nicoara

On 09/06/12 23:00, Martin Sebor wrote:

Every project has certain branch strategy, I'm not sure about this so
maybe Martin can advice. I prefer to develop on trunk and cherry pick
to the other branches avoiding bulk merges (and that's in both
directions).


We've done most work on 4.2.x for historical reasons. I think
a better strategy is to develop, as you suggest, on trunk which
has the least restrictive commit policy, and merge changes out
to the more restrictive branches as appropriate.


If open to voting, I am for trunk first, branches later.

Liviu


Re: STDCXX-1056 [was: Re: STDCXX forks]

2012-09-07 Thread Liviu Nicoara

On 09/06/12 22:58, Stefan Teleman wrote:

On Thu, Sep 6, 2012 at 7:31 PM, Liviu Nicoara nikko...@hates.ms wrote:


There would be a performance degradation. IMHO, it would be minor and would 
simplify the code considerably.

After finally being able to reproduce the defect with SunPro 12.3 on x86_64 I 
tried to remove the lazy initialization and a (smaller) test case now passes. I 
am attaching the program and the numpunct patch.


With your patches, the performance is much much better:


I didn't think about timing it, thanks! The result is somewhat expected without 
the synchronization. However, the full construction-time initialization is 
inappropriate if you take in consideration the delegation from the public API 
to the protected virtual API of the facet class, and the possibility of 
overrides in subclasses. Also, lazy initialization follows the principle of 
economy of effort, and caching increases efficiency. If possible in a robust 
way, I would rather have those two. :-)

Liviu



# INFO (S1) (10 lines):
# TEXT:
# COMPILER: Intel C++, __INTEL_COMPILER = 1210,
__INTEL_COMPILER_BUILD_DATE = 20111011, __EDG_VERSION__ = 403
# ENVIRONMENT: pentiumpro running linux-elf (Fedora release 17 (Beefy
Miracle) (3.5.0-2.fc17.x86_64)) with glibc 2.15
# FILE: 22.locale.numpunct.mt.cpp
# COMPILED: Sep  6 2012, 20:50:13
# COMMENT: thread safety


# +---+--+--+--+
# | DIAGNOSTIC|  ACTIVE  |   TOTAL  | INACTIVE |
# +---+--+--+--+
# | (S1) INFO |   11 |   11 |   0% |
# | (S2) NOTE |1 |1 |   0% |
# | (S8) ERROR|0 |3 | 100% |
# | (S9) FATAL|0 |1 | 100% |
# +---+--+--+--+
real 1035.05
user 1191.76
sys 63.49

--Stefan




--
And now I see with eye serene
The very pulse of the machine.


Re: A question (or two) of procedure, etc.

2012-09-07 Thread Liviu Nicoara

On 09/07/12 10:54, Martin Sebor wrote:

We should remember that there are a number of Jira issues that
we fixed on 4.2.x but haven't merged out to 4.3.x or trunk. The
idea behind the current process (4.2.x - 4.3.x - trunk) was
to be able to simply merge the branches in bulk, as opposed to
an fix at a time. Unfortunately, we ran into some Subversion
issues that made it a huge pain. IIRC, Travis did it at least
once so he might still remember the details.


That would be very helpful to know.



Going forward, to avoid this mess, I would suggest we make
an effort to commit fixes wherever necessary at the same
time instead of delaying it until some time in the future.


Got it.

Liviu


dbx [was: Re: STDCXX-1056 [was: Re: STDCXX forks]]

2012-09-07 Thread Liviu Nicoara

On 09/07/12 11:58, Stefan Teleman wrote:

On Fri, Sep 7, 2012 at 8:52 AM, Liviu Nicoara nikko...@hates.ms wrote:

On 09/06/12 19:54, Martin Sebor wrote:



Also, does the 27.objects test pass with this patch?



No, it does not. It hangs at the first insertion, line 227. Unfortunately, I
cannot debug it because dbx does not work properly in my installation.


What are the symptoms with dbx mishbehaving? (maybe I can help).


Thanks!

I get this when launching the debugger:

$ dbx -xexec32 t
For information about new features see `help changes'
To remove this message, put `dbxenv suppress_startup_message 7.9' in your .dbxrc
Reading t
Reading ld-linux.so.2
dbx: fetch at 0xf400 failed -- Input/output error
dbx: warning: could not put in breakpoint
dbx: warning: internal handler (-80) made defunct -- could not enable event BPT
dbx: warning: internal handler (-86) made defunct -- could not enable event BPT
...

Then the program just runs from the get go. If I set breakpoints and rerun, it 
ignores them.

Also, long running programs cannot be broken into with C-c:

^Cdbx: warning: Interrupt ignored but forwarded to child.
signal INT in (unknown) at 0xf77cf6c4
0xf77cf6c4: movl %edx,%eax
dbx: warning: 'stop' ignored -- while doing rtld handshake
terminating signal 2 SIGINT
dbx: warning: 'stop' ignored -- while doing rtld handshake
(dbx) where
dbx: program is not active
(dbx)

Liviu


[PATCH] STDCXX-1067 Mac builds

2012-09-09 Thread Liviu Nicoara
The default compiler on recent Apple Macs is LLVM with Clang and gcc C++ 
front-ends. The compiler does not come with a C++ language support library. 
However, gcc Mac builds are fine with GNU stock compilers, modulo the issues for 
which I attach the patch below, for review.


I built successfully and ran the test suite on both Mac and Linux, wigh gcc 
4.5.4 and 4.5.2, respectively.


Thanks.

Liviu


Index: src/x86_64/atomic.s
===
--- src/x86_64/atomic.s (revision 1382343)
+++ src/x86_64/atomic.s (working copy)
@@ -26,8 +26,17 @@
  * 
  **/
 
+#if defined (__MACH__)
+// Mac OS X Mach-O assembler: no .type, power of two alignment
+#  define ALIGN_DIR   .align 4
+#  define TYPE_DIR(ignore,ignore2) 
+#else
+#  define ALIGN_DIR   .align 16
+#  define TYPE_DIR(sym,attr)  .type sym, attr
+#endif // __MACH__
+
 .text
-.align 16
+ALIGN_DIR
 
 /***
  * extern C int8_t __rw_atomic_xchg8 (int8_t *x, int8_t y);
@@ -37,7 +46,7 @@
  **/
 
 .globl __rw_atomic_xchg8
-.type __rw_atomic_xchg8, @function
+TYPE_DIR (__rw_atomic_xchg8, STT_FUNC)
 __rw_atomic_xchg8: /* ; int8_t (int8_t *x, int8_t y)  */
 movq  %rdi, %rcx   /* ; %rcx = x  */
 movb  %sil,  %al   /* ; %al = y   */
@@ -53,7 +62,7 @@
  **/
 
 .globl __rw_atomic_xchg16
-.type __rw_atomic_xchg16, @function
+TYPE_DIR (__rw_atomic_xchg16, STT_FUNC)
 __rw_atomic_xchg16:/* ; int16_t (int16_t *x, int16_t y) */
 movq  %rdi, %rcx   /* ; %rcx = x*/
 movw  %si,  %ax/* ; %ax = y */
@@ -69,7 +78,7 @@
  **/
 
 .globl __rw_atomic_xchg32
-.type __rw_atomic_xchg32, @function
+TYPE_DIR (__rw_atomic_xchg32, STT_FUNC)
 __rw_atomic_xchg32:/* ; int32_t (int32_t *x, int32_t y) */
 movq  %rdi,  %rcx  /* ; %rcx = x*/
 movl  %esi,  %eax  /* ; %eax = y*/
@@ -85,7 +94,7 @@
  **/
 
 .globl __rw_atomic_xchg64
-.type __rw_atomic_xchg64, @function
+TYPE_DIR (__rw_atomic_xchg64, STT_FUNC)
 __rw_atomic_xchg64:/* ; int64_t (int64_t *x, int64_t y) */
 movq  %rdi,  %rcx  /* ; %rcx = x*/
 movq  %rsi,  %rax  /* ; %rax = y*/
@@ -101,7 +110,7 @@
  **/
 
 .globl __rw_atomic_add8
-.type __rw_atomic_add8, @function
+TYPE_DIR (__rw_atomic_add8, STT_FUNC)
 __rw_atomic_add8:  /* ; int8_t (int8_t *dst, int8_t inc) */
 movq   %rdi, %rcx  /* ; %rcx = dst   */
 movl   %esi, %eax  /* ; %eax = inc   */
@@ -123,7 +132,7 @@
  **/
 
  .globl __rw_atomic_add16
-.type __rw_atomic_add16, @function
+TYPE_DIR (__rw_atomic_add16, STT_FUNC)
 __rw_atomic_add16: /* ; int16_t (int16_t *dst, int16_t inc) */
 movq   %rdi, %rcx  /* ; %rcx = dst  */
 movw   %si,  %ax   /* ; %ax = inc   */
@@ -146,7 +155,7 @@
  **/
 
 .globl __rw_atomic_add32
-.type __rw_atomic_add32, @function
+TYPE_DIR (__rw_atomic_add32, STT_FUNC)
 __rw_atomic_add32: /* ; int32_t (int32_t *dst, int32_t inc) */
 movq   %rdi, %rcx  /* ; %rcx = dst  */
 movl   %esi, %edx  /* ; %edx = inc  */
@@ -169,7 +178,7 @@
  **/
 
 .globl __rw_atomic_add64
-.type __rw_atomic_add64, @function
+TYPE_DIR (__rw_atomic_add64, STT_FUNC)
 __rw_atomic_add64: /* ; int64_t (int64_t *dst, int64_t inc) */
 movq   %rdi, %rcx  /* ; %rcx = dst  */
 movq   %rsi, %rdx  /* ; %edx = inc  */
Index: etc/config/gcc.config
===
--- etc/config/gcc.config   (revision 1382343)
+++ etc/config/gcc.config   (working copy)
@@ -40,6 +40,7 @@
 else
 ifeq ($(OSNAME),Darwin)
 OS_MAJOR := $(shell 

Re: [PATCH] STDCXX-1067 Mac builds

2012-09-09 Thread Liviu Nicoara

On 9/9/12 7:07 PM, Wojciech Meyer wrote:

Hi Liviu,

I don't use Mac OS X at all but:

Liviu Nicoara nikko...@hates.ms writes:


The default compiler on recent Apple Macs is LLVM with Clang and gcc
C++ front-ends. The compiler does not come with a C++ language support
library. However, gcc Mac builds are fine with GNU stock compilers,
modulo the issues for which I attach the patch below, for review.


I think it will be a big win to support Clang for the community.


AFAICT, there is only one problem with that, the lack of a complete language 
support library.


Pathscale released a replacement for libsupc++/libgcc_s in the form of the pair 
libcxxrt/libunwind, which provide equivalent functionality on *BSD/Linux, but 
not Darwin.


Perhaps Christopher can shed some light here.

Liviu




Re: STDCXX-1056 [was: Re: STDCXX forks]

2012-09-10 Thread Liviu Nicoara

On 09/10/12 15:01, Stefan Teleman wrote:

On Mon, Sep 10, 2012 at 2:21 PM, Liviu Nicoara nikko...@hates.ms wrote:


4. Without caching of grouping values, grouping() delegates always to
do_grouping():

real0m5.668s
user1m11.389s
sys 0m3.952s


FWIW, 22.2.3.1.1 explicitly states that all of the decimal_point(),
grouping(), truename(), falsename() must return their do_*()
counterparts.


Yes, of course. :-) I just meant to say that is _always_ delegating to 
do_grouping(), i.e., does no caching.

Liviu



--Stefan




--
And now I see with eye serene
The very pulse of the machine.


Re: Board report time

2012-09-11 Thread Liviu Nicoara

On 09/11/12 08:15, Jim Jagielski wrote:

It's time for our report to the board...

what would we like to share?

I see:

  o renewed discussion on health/viability of pmc
  o increased development being done
  o PMC expressing interest in moving to git


This sounds about right. It should also mention that some members expressed 
interest in alternative licensing.

Thanks.

Liviu



Re: STDCXX-1056 [was: Re: STDCXX forks]

2012-09-11 Thread Liviu Nicoara

On 9/11/12 9:40 PM, Martin Sebor wrote:

On 09/11/2012 04:15 PM, Stefan Teleman wrote:

On Mon, Sep 10, 2012 at 4:24 PM, Stefan Teleman

I think I have something which doesn't break BC - stay tuned because
I'm testing it now.


OK.

So, here's a possible implementation of __rw_get_numpunct() with
minimal locking, which passes the MT tests and does not break ABI:

s247136804.onlinehome.us/stdcxx-1056-20120911/punct.cpp

And the same for include/loc/_numpunct.h:

http://s247136804.onlinehome.us/stdcxx-1056-20120911/_numpunct.h

In _numpunct.h, all the functions perform no checks and no lazy
initialization. They function simply as a pass-through to
__rw_get_numpunct(). std::numpunctT's data members are now dead
varaiables.

The bad: performance is no better than with locking the mutex inside
each of the std::numpunctT::*() functions, and with lazy
instantiation.


I wouldn't expect this to be faster than the original. In fact,
I would expect it to be slower because each call to one of the
public, non-virtual members results in a call to the out-of-line
virtual functions (and another to __rw_get_moneypunct). Avoiding
the overhead of such calls is the main and only reason why the
caching exists.



AFAICT, there are two cases to consider:

1. Using STDCXX locale database initializes the __rw_punct_t data in the first, 
properly synchronized pass through __rw_get_numpunct. All subsequent calls use 
the __rw_punct_t data to construct returned objects.
2. Using the C library locales does the same in the first pass, via setlocale 
and localeconv, but setlocale synchronization is via a per-process lock. The 
facet data, once initialized is used just like above.


I probably missed this in the previous conversation, but did you detect a race 
condition in the tests if the facets are simply forwarding to the private 
virtual interface? I.e., did you detect that the facet initialization code is 
unsafe? I think the facet __rw_punct_t data is safely initialized in both cases, 
it's the caching that is done incorrectly.



I'm afraid unoptimized timings don't tell us much. Neither does
a comparison between two compilers, even on the same OS.

I looked at Liviu's timings today. I was puzzled by the difference
between (1) which, IIUC, is the current implementation (presumably
an optimized, thread-safe build with the same compiler and OS) and
(4), which, again IIUC, is the equivalent of your latest patch here
(again, presumably optimized, thread safe, same compiler/OS). I'm
having trouble envisioning how calling a virtual function to
retrieve the value of grouping can possibly be faster than not
calling it (and simply returning the value cached in the data
member of the facet.



The new results I attached to the issue come from a a bit clearer tests and they 
focus on just two cases: the current implementation vs. a non-caching one; the 
latter just forwards the grouping calls to the protected do_grouping, with _no_ 
other changes to the implementation.


The timing numbers seem to show that MT builds fare far worse with the caching 
than without. Stefan, if you have the time, could you please infirm :) my 
conclusions by timing it on one of your machines?


Thanks,

Liviu



Re: STDCXX-1056 [was: Re: STDCXX forks]

2012-09-12 Thread Liviu Nicoara

On 09/12/12 00:21, Stefan Teleman wrote:

On Tue, Sep 11, 2012 at 10:18 PM, Liviu Nicoara nikko...@hates.ms wrote:


AFAICT, there are two cases to consider:

1. Using STDCXX locale database initializes the __rw_punct_t data in the
first, properly synchronized pass through __rw_get_numpunct. All subsequent
calls use the __rw_punct_t data to construct returned objects.
2. Using the C library locales does the same in the first pass, via
setlocale and localeconv, but setlocale synchronization is via a per-process
lock. The facet data, once initialized is used just like above.

I probably missed this in the previous conversation, but did you detect a
race condition in the tests if the facets are simply forwarding to the
private virtual interface? I.e., did you detect that the facet
initialization code is unsafe? I think the facet __rw_punct_t data is safely
initialized in both cases, it's the caching that is done incorrectly.


I originally thought so too, but now I'm having doubts. :-) And I
haven't tracked it down with 100% accuracy yet. I saw today this
comment in src/facet.cpp, line 358:

// a per-process array of facet pointers sufficiently large
// to hold (pointers to) all standard facets for 8 locales
static __rw_facet*  std_facet_buf [__rw_facet::_C_last_type * 8];


The localization code is cleverly dense and low-level. There are a few patterns 
that are used throughout, though. One is the function-local static buffers, 
e.g., the locales buffer and the facets buffer, in locale_body.cpp and 
facet.cpp, respectively. They both start from a reasonable size and are grown 
and shrunk according to the needs, but neither locales nor facets are ever 
evicted from them. See:

locale_body:936  -- reduction of locale buffer
locale_body:1006 -- expansion of locale buffer

facet.cpp:397-- reduction of facet buffer
facet.cpp:462-- expansion of facet buffer

Facets buffer is guarded by a static guard in:

__rw_facet::_C_manage:366

The locales buffer is protected by a static guard in:

__rw_locale::_C_manage:880




[...] this is all investigative stuff for tomorrow. :-)

and I agree with Martin that breaking ABI in a minor release is really
not an option. I'm trying to find the best way of making these facets
thread-safe while inflicting the least horrible performance hit.


We are getting close to the point where we can summarize all these findings and 
gauge an appropriate course of action.



I will run your tests tomorrow and let you know. :-)



That would be very helpful!

Liviu


STDCXX-1066 [was: Re: STDCXX forks]

2012-09-15 Thread Liviu Nicoara

On 9/1/12 1:52 PM, Stefan Teleman wrote: On Sat, Sep 1, 2012 at 12:15 PM, 
 I opened yesterday STDCXX-1066:

 https://issues.apache.org/jira/browse/STDCXX-1056

 about the pthread_mutex_t/pthread_cond_t alignment on SPARCV8. I'll
 have patches done this weekend. Achtung: the patchset is very large
 and touches a very large number of files. It's strange that I didn't
 get an email about STDCXX-1066.


Hi Stefan,

I have read through the patches attached to the incident, then I briefly read 
about the SunPro pragma align and pack. Two questions:


1. AFAICT, the use of the packing pragma may interfere with a user's setting of 
the same value. I.e., a user sets the packing in their sources and then, 
directly or not, includes an STDCXX header. It seems to me that in such a 
situation, our setting of the packing value would interfere with the rest of the 
user's translation unit, since there is no way to `restore' the previous packing 
value.


Something along the lines of:

// user source file

#pragma pack (X) // X != 8

#include iostream

struct UserDef
{
// different alignment than X ?
// ...
};

Is my understanding correct?

2. The patches are against 4.2.1, but the change would be binary incompatible 
with the already released 4.2.1 branch. Do you plan to have this fix in 4.3.x?


Thanks.

Liviu


Re: STDCXX-1056 [was: Re: STDCXX forks]

2012-09-15 Thread Liviu Nicoara

On 9/15/12 1:51 PM, Stefan Teleman wrote:

On Sat, Sep 15, 2012 at 9:01 AM, Liviu Nicoara nikko...@hates.ms wrote:


That is funny. What compiler are you using? What does the following test
case return for you?


It's the Intel compiler with the patched stdcxx for  the wrong case
and GCC 4.7.1 + GNU libstdc++ for  the correct case.

GCC + GNU libstdc++ are correct.

The patched facet does not call the protected do_*() virtual functions
from their public counterparts, as it is required to do by the
Standard. Instead, it returns the data mebers directly (the data
members were initialized in the constructor). That is the patch you
proposed, which is indeed much better performing than using a mutex
lock. Unfortunately, in doing so, overriding the virtual functions in
a derived facet type becomes pointless.


Ahh, I see now. You are indeed right, that patch is defective. I was under the 
impression that we were discussing the (later) attachment 
stdcxx-1056-timings.tgz which contains a perfectly forwarding implementation of 
the facet public grouping method. The timings I attached there were the ones I 
thought we were discussing all along.


Now, to clear the confusion I created: the timing numbers I posted in the 
attachment stdcxx-1056-timings.tgz to STDCXX-1066 (09/11/2012) showed that a 
perfectly forwarding, no caching public interface (exemplified by a changed 
grouping) performs better than the current implementation. It was that test case 
that I hoped you could time, perhaps on SPARC, in both MT and ST builds. The 
t.cpp program is for MT, s.cpp for ST.


Please let me know if this clarifies things. I apologize for the 
misunderstanding.

Liviu


Re: STDCXX-1066 [was: Re: STDCXX forks]

2012-09-15 Thread Liviu Nicoara

On 9/15/12 2:57 PM, Stefan Teleman wrote:

On Sat, Sep 15, 2012 at 1:02 PM, Liviu Nicoara nikko...@hates.ms wrote:


I have read through the patches attached to the incident, then I briefly
read about the SunPro pragma align and pack. Two questions:

1. AFAICT, the use of the packing pragma may interfere with a user's setting
of the same value. I.e., a user sets the packing in their sources and then,
directly or not, includes an STDCXX header. It seems to me that in such a
situation, our setting of the packing value would interfere with the rest of
the user's translation unit, since there is no way to `restore' the previous
packing value.

Something along the lines of:

// user source file

#pragma pack (X) // X != 8

#include iostream

struct UserDef
{
 // different alignment than X ?
 // ...
};

Is my understanding correct?


Yes, you are indeed correct, Sir. :-)

However, for every #pragma pack(8) there should be a corresponding
subsequent #pragma pack(0). If there isn't, that's a patch bug.
#pragma pack(0) restores the default alignment. So, there should be
(there *must* be) no silent packing side-effects in user programs.


Yes, but it restores the default packing, not an arbitrary one, potentially set 
by the user prior to including our headers. Say, the user sets 2, the default is 
4 and we set 8. When we set it to default it goes back to 4, instead of the 
expected 2. Did I get this right?





2. The patches are against 4.2.1, but the change would be binary
incompatible with the already released 4.2.1 branch. Do you plan to have
this fix in 4.3.x?


Yes, definitely for 4.2.x and 4.3.x. I sent them for 4.2.1 just
because that's what I have handy.


I see now. I tried to apply them to 4.2.x and some chunks failed, and I thought 
I was not applying them where they were intended to go.


Thanks!

Liviu


Re: STDCXX-1066 [was: Re: STDCXX forks]

2012-09-15 Thread Liviu Nicoara

On 9/15/12 5:19 PM, Stefan Teleman wrote:

On Sat, Sep 15, 2012 at 4:57 PM, Liviu Nicoara nikko...@hates.ms wrote:


Yes, but it restores the default packing, not an arbitrary one, potentially
set by the user prior to including our headers. Say, the user sets 2, the
default is 4 and we set 8. When we set it to default it goes back to 4,
instead of the expected 2. Did I get this right?


This is true, but leaving some arbirary pragma pack(N) (for N != 0) in
effect for the duration of a program, and expecting it to work sounds
like a very defective programming approach to me. It will certainly
not work on SPARC at all.  if the program needs to pack something in a
certain specific way, it need to do so for that specific something
only. Otherwise the side-effects of globally setting a non-default
packing will destroy the program anyway.


I merely wanted to point out that restoring the default packing is not the same 
as restoring the packing to the previous value in effect.


Given this, I thought about an alternative way of forcing this alignment, e.g., 
via a union, aligned on an appropriate type. I see an advantage here in that 
most of the changes would occur where we define the 'primitive' mutex and 
condition wrappers, saving a few pre-processor conditionals and pragmas along 
the way.


What do you think?

Liviu


Re: STDCXX-1056 [was: Re: STDCXX forks]

2012-09-16 Thread Liviu Nicoara

On 9/16/12 3:20 AM, Stefan Teleman wrote:

On Sat, Sep 15, 2012 at 4:53 PM, Liviu Nicoara nikko...@hates.ms wrote:


Now, to clear the confusion I created: the timing numbers I posted in the
attachment stdcxx-1056-timings.tgz to STDCXX-1066 (09/11/2012) showed that a
perfectly forwarding, no caching public interface (exemplified by a changed
grouping) performs better than the current implementation. It was that test
case that I hoped you could time, perhaps on SPARC, in both MT and ST
builds. The t.cpp program is for MT, s.cpp for ST.


I got your patch, and have tested it.


Thanks, Stefan. I looked over it and it seems very similar to, and somewhat more 
detailed than gprof profiling output.


I am going to update the incident shortly with a more detailed timing 
measurements on my side, in the form of a new attachment. Just FYI in case you 
still don't get notifications.


Liviu


Re: STDCXX-1056 [was: Re: STDCXX forks]

2012-09-16 Thread Liviu Nicoara

On 9/16/12 11:21 AM, Liviu Nicoara wrote:

On 9/16/12 3:20 AM, Stefan Teleman wrote:

On Sat, Sep 15, 2012 at 4:53 PM, Liviu Nicoara nikko...@hates.ms wrote:


Now, to clear the confusion I created: the timing numbers I posted in the
attachment stdcxx-1056-timings.tgz to STDCXX-1066 (09/11/2012) showed that a
perfectly forwarding, no caching public interface (exemplified by a changed
grouping) performs better than the current implementation. It was that test
case that I hoped you could time, perhaps on SPARC, in both MT and ST
builds. The t.cpp program is for MT, s.cpp for ST.


I got your patch, and have tested it.


Thanks, Stefan. I looked over it and it seems very similar to, and somewhat more
detailed than gprof profiling output.

I am going to update the incident shortly with a more detailed timing
measurements on my side, in the form of a new attachment. Just FYI in case you
still don't get notifications.


I have attached a new set of results to the incident, in the form of the 
archive:

http://tinyurl.com/9drzg4e

Please see the content for a description of the library changes (_numpunct.h 
file), the MT test program (t.cpp) and the results collected through two 
separate builds on two different machines (results.txt file).


Thanks.

Liviu


Re: STDCXX-1056 [was: Re: STDCXX forks]

2012-09-17 Thread Liviu Nicoara

On 09/17/12 09:51, Stefan Teleman wrote:

On Mon, Sep 17, 2012 at 8:46 AM, Liviu Nicoara nikko...@hates.ms wrote:


In the meantime I would like to stress again that __rw_get_numpunct is
perfectly thread-safe and does not need extra locking for perfect
forwarding.


So, by removing the test for

   if (!(_C_flags  _RW::__rw_gr))

(or any other bitmask for that matter), the functions which were
thread-unsafe - and were exhibiting all the symptoms of a run-time
race condition -, magically became thread-safe?

I have looked *extensively* at the code in __rw_get_numpunct. It is
inherently thread-unsafe.


I mean to say that no extra locking is necessary when the public interface 
forwards and no caching is done.

In more detail: __rw_get_numpunct code is entered upon a call from the 
protected virtual interface. It calls facet _C_data member function which 
either returns objects constructed from previously-initialized POD data (in 
which case no locking is necessary), or it attempts to first initialize the 
facet data from the STDCXX database.

If the latter, the execution goes in the facet data initialization code which 
is appropriately synchronized, see this stack trace:

(gdb) bt
#0  __rw::__rw_facet::_C_get_data (this=0x6e10a0) at 
srcdir/stdcxx/branches/4.2.x/src/facet.cpp:179
#1  0x0043788b in __rw::__rw_facet::_C_data (this=0x6e10a0) at 
srcdir/stdcxx/branches/4.2.x/include/loc/_facet.h:194
#2  0x0044874f in __rw::__rw_get_numpunct (pfacet=0x6e10a0, flags=4) at 
srcdir/stdcxx/branches/4.2.x/src/punct.cpp:80
#3  0x00449a14 in __rw::__rw_get_punct (pfacet=0x6e10a0, flags=4) at 
srcdir/stdcxx/branches/4.2.x/src/punct.cpp:578
#4  0x0041cecf in std::numpunctchar::do_grouping (this=0x6e10a0) at 
srcdir/stdcxx/branches/4.2.x/include/loc/_numpunct.h:99
#5  0x0041bd31 in std::numpunctchar::grouping (this=0x6e10a0) at 
srcdir/stdcxx/branches/4.2.x/include/loc/_numpunct.h:190
#6  0x004036b8 in main (argc=2, argv=0x7fffe018) at 
srcdir/stdcxx/branches/4.2.x/tests/localization/s.cpp:29
(gdb)

When the above fails, the facet data has not been initialized, e.g., when there 
is no STDCXX locale database, or the name of the locale does not refer to a 
locale in STDCXX database. The __rw_get_numpunct function will attempt next to 
use libc and system locales, via the __rw_setlocale class, which again is 
properly synchronized and the scope of that lock extends to cover the facet 
data initialization. See this stack trace:

(gdb) bt
#0  __rw::__rw_setlocale::__rw_setlocale (this=0x7fffdd30, locname=0x6e112a 
en_US.utf8, cat=6, nothrow=0) at 
srcdir/stdcxx/branches/4.2.x/src/setlocale.cpp:84
#1  0x00448974 in __rw::__rw_get_numpunct (pfacet=0x6e10a0, flags=4) at 
srcdir/stdcxx/branches/4.2.x/src/punct.cpp:134
#2  0x00449a14 in __rw::__rw_get_punct (pfacet=0x6e10a0, flags=4) at 
srcdir/stdcxx/branches/4.2.x/src/punct.cpp:578
#3  0x0041cecf in std::numpunctchar::do_grouping (this=0x6e10a0) at 
srcdir/stdcxx/branches/4.2.x/include/loc/_numpunct.h:99
#4  0x0041bd31 in std::numpunctchar::grouping (this=0x6e10a0) at 
srcdir/stdcxx/branches/4.2.x/include/loc/_numpunct.h:190
#5  0x004036b8 in main (argc=2, argv=0x7fffe018) at 
srcdir/stdcxx/branches/4.2.x/tests/localization/s.cpp:29

I hope you agree that this synchronization is sufficient for the facet 
initialization and reading of facet data.

Thanks.

Liviu


Re: STDCXX-1056 [was: Re: STDCXX forks]

2012-09-17 Thread Liviu Nicoara

On 09/17/12 11:21, Stefan Teleman wrote:

On Mon, Sep 17, 2012 at 11:17 AM, Liviu Nicoara nikko...@hates.ms wrote:


I hope you agree that this synchronization is sufficient for the facet
initialization and reading of facet data.


Sorry, I do not agree. Two different thread analyzers from two
different compilers written by two different compiler writers tell me
not to.


Stefan, that is indeed your prerogative. However, please keep in mind that 
tools may be buggy or may have limitations beyond what is advertised. I do not 
ask you to have faith in my analysis, which would be absurd, but to look for 
yourself, exercise due diligence in testing the code and draw your own 
conclusions.

Thanks,
Liviu


Sun fbe assembler manual

2012-09-17 Thread Liviu Nicoara

Hi all,

I need a reference manual for the fbe assembler. I am interested in the syntax 
of the `.type' directive. Do you know where such a manual would be located? The 
Solaris assembler manual I found on Oracle's website does not mention the .type 
directive.

Thanks!

Liviu



Re: STDCXX-1056 [was: Re: STDCXX forks]

2012-09-18 Thread Liviu Nicoara

On 09/18/12 08:55, Stefan Teleman wrote:

On Mon, Sep 17, 2012 at 11:17 AM, Liviu Nicoara nikko...@hates.ms wrote:

I hope you agree that this synchronization is sufficient for the facet
initialization and reading of facet data.


I have reduced the number of reported race conditions in
22.locale.numpunct.mt from 12440:



I am attaching a test program which, while 100% MT-safe, is flagged by the 
Solaris thread analyzer. The test program contains, conceptually, the exact 
same of shared variable access like the locale management code, the facet 
management code and the facet data management code. This proves, if there was a 
need, that the tool results are to be further analyzed and interpreted but not 
taken at literal value.

The fix we are looking for is one which corrects the initial MT failure 
observed in caching the facet data, as well as it preserves the unguarded reads 
of facet data for performance reasons.

Thanks.

Liviu



http://s247136804.onlinehome.us/stdcxx-1056-SPARC-20120917/22.locale.numpunct.mt.nts.1.er.html/index.html

to 288:

http://s247136804.onlinehome.us/stdcxx-1056-20120918/22.locale.numpunct.mt.5.er.html/index.html

The changes are in the following files:

http://s247136804.onlinehome.us/stdcxx-1056-20120918/facet.cpp
http://s247136804.onlinehome.us/stdcxx-1056-20120918/punct.cpp

_numpunct.h looks like this:

http://s247136804.onlinehome.us/stdcxx-1056-20120918/_numpunct.h

With these changes, no races conditions are repoted for any of the
functions in std::numpunctT.

Still, there are 288 race conditions being reported in
__rw_locale::__rw_locale and in std::locale::_C_get_facet. We need to
identify the source and cause of these race conditions and correct
them as well.

This is not a complete solution to the problem, because we still have
to re-write the chunk of code I eliminated from facet.cpp. It is only
step one towards finding a real solution. But, at least for now, we
have pinpointed where the source of these race conditions is located,
and what causing it.

The test program was run as: ./22.locale.numpunct.mt --nthreads=8
--nloops=1.

More to follow.

--Stefan

#include stdio.h
#include stdlib.h
#include pthread.h
#include unistd.h

#define MAX_THREADS 128

static long counter = 0, nloops = 1000, nthreads = 16;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

extern C {

static void* 
f (void* pv)
{
for (size_t i = 0; i  nloops; ++i) {
if (counter == 0) {
pthread_mutex_lock (lock);
if (counter == 0)
++counter;
pthread_mutex_unlock (lock);
}
else {
// counter value is safe to use here
}
}

printf (%ld\n, counter);

return 0;
}

}

int
main (int argc, char** argv)
{
switch (argc) {
case 3:
nloops = atol (argv [2]);
case 2:
nthreads = atol (argv [1]);
break;
}

pthread_t tid [MAX_THREADS] = { 0 };

if (nthreads  MAX_THREADS)
nthreads = MAX_THREADS;

for (int i = 0; i  nthreads; ++i) {
if (pthread_create (tid + i, 0, f, 0))
exit (-1);
}

for (int i = 0; i  nthreads; ++i) {
if (tid [i])
pthread_join (tid [i], 0);
}

return 0;
}


Re: STDCXX-1056 [was: Re: STDCXX forks]

2012-09-18 Thread Liviu Nicoara

On 09/18/12 13:21, Stefan Teleman wrote:

On Tue, Sep 18, 2012 at 12:43 PM, Liviu Nicoara nikko...@hates.ms wrote:


I am attaching a test program which, while 100% MT-safe, is flagged by
the Solaris thread analyzer.


The program as written is not thread safe. It is reading the value of
the counter variable and performing a zero comparison outside of a
mutex lock:


Stefan, I urge you to consider the argument on its merits. Yes, the thread 
analyzer flags it, but it is nonetheless MT-safe. Specifically:

1. writes are properly synchronized wrt themselves
2. reads are inherently thread-safe wrt themselves
3. reads are properly synchronized wrt writes
4.no thread can possibly observe an intermediate or otherwise incomplete value.

I will also add that the flag is either 0 or 1 during the execution of the 
program, with only one transition from 0 to 1, performed by one single thread.

I will concede that I might be wrong and I am open to arguments. I would accept 
as a counter-argument this program if you can show a runtime failure. I would 
also accept as argument a scenario under which two threads would see 
inconsistent/incorrect values or write the variable more than once, etc.

Thanks,
Liviu



for (size_t i = 0; i  nloops; ++i) {
 if (counter == 0) {  // --- 
 pthread_mutex_lock (lock);
 if (counter == 0)
 ++counter;
 pthread_mutex_unlock (lock);
 }
 else {
 // counter value is safe to use here
 }
 }



Re: STDCXX-1056 [was: Re: STDCXX forks]

2012-09-18 Thread Liviu Nicoara

On 09/18/12 18:24, Stefan Teleman wrote:

On Tue, Sep 18, 2012 at 4:35 PM, Liviu Nicoara nikko...@hates.ms wrote:


I will concede that I might be wrong and I am open to arguments. I would
accept as a counter-argument this program if you can show a runtime failure.


The the first read of the counter variable is outside a mutex lock
correct? The read is followed by a 0 comparison, correct?


Correct.



What guarantees that between the read and the comparison the value of
the counter variable hasn't been modified by another thread? The
thread currently doing the comparison cannot guarantee it: it hasn't
locked the mutex. Other threads may be running - actually they
probably are. Another thread may have already acquired the mutex and
incremented the value of counter. Your thread has no way of knowing if
that has happened, because it does not yet have exclusive access to
the counter variable. It will, after it acquires the mutex.


It does not matter if the value is changed concurrently in between the 
reading of it and the actual comparison. The value can only be 0, in 
which case it takes the branch that locks the mutex and the next read 
will absolutely see a fresh value; or it can be 1 in which case it has 
already been initialized and this thread does not pay the penalty of a 
lock anymore.




Where is it reading the variable  from? A register? Is it declared
volatile? L2 cache? L3 cache?


Irrelevant. If the value has not been changed it can be read from either 
the cache or directly from the memory. If the value has been changed 
concurrently in between the reading and the actual comparison, the 
processor sees either a stale value of 0, or the new value 1. It is 
guaranteed that any thread reading the value after the (one and only) 
unlock will see a fresh value.




The program, as you wrote it, implicitly acknowledges that it is not
thread safe. That is the point of the double check: one before the
mutex lock, and one after it.


That is a misstatement of the program intentions. See below.


The point of the first check has nothing
to do with thread-safety, and everything to do with a minor
optimization: if the value stored in variable counter is already not
zero, then there's no point in locking the mutex or performing the
increment.


The first check is indeed an optimization. It is the point of this 
exercise to show that the unguarded reads in the localization library 
are not defects and the code, simplified in my test case, is thread safe 
in exactly the respects I mentioned before: the program only observes 
consistent, correct values (program states) in a concurrent environment.


HTH.

Liviu



Re: STDCXX-1056 [was: Re: STDCXX forks]

2012-09-18 Thread Liviu Nicoara

On 9/18/12 7:04 PM, Stefan Teleman wrote:

On Tue, Sep 18, 2012 at 6:42 PM, Liviu Nicoara nikko...@hates.ms wrote:


The first check is indeed an optimization. It is the point of this exercise
to show that the unguarded reads in the localization library are not defects
and the code, simplified in my test case, is thread safe in exactly the
respects I mentioned before: the program only observes consistent, correct
values (program states) in a concurrent environment.


The Intel Thread Analyzer shows a data race in the exact same spot the
SunPro Thread Analyzer shows a data race, and in the exact same spot I
have stated earlier that there is a data race.

The results are available here:

http://s247136804.onlinehome.us/liviu-mt-test/22.mt.test.r000ti3.inspxez

As usual, it is an Intel Inspector XE 2003 Analysis Results Blob. It
opens with the Intel Inspector. i don't know what format it is in, or
whether or not it can be read by other means.

So far we have SunPro 12.3/Linux/Intel, SunPro 12.3/Solaris/Intel,
SunPro 12.3/Solaris/SPARC and Intel 2003/Linux/Intel saying that there
is a race condition in your program.

And now I would like to get back to solving the thread-unsafety and
race conditions problems in Apache stdcxx.


Stefan, you cannot ignore others' arguments and embrace a 'no' attitude; the 
tools are not saying what you think they do. They all show the same thing 
because they all do the exact same analysis and we could delve into that, too. I 
sincerely hoped you could come up with some sort of a logical argument that 
would prove my latest test case wrong, or at least show it failing at runtime.


My time is also limited and I have already spent more time than I wanted to 
trying to present a logical argument, down to the simplest test cases. I say we 
defer this to Martin for when he will be back from vacation, next week. For the 
record, because people may not have the energy to read all this back and forth, 
I will summarize here the gist of this thread:


1. The facet data caching is not MT-safe
2. The facet data initialization (STDCXX or system locales) is safe (*)
3. There is no unit test currently showing a failure in (2)
4. Timing results show that caching may be slower than non-caching in MT builds
5. A fix should, ideally, be binary compatible
6. A fix should, ideally, preserve performance or increase it
7. There is one patch, currently attached to the issue, by Stefan
8. Other partial patches are referenced from this thread

Please correct me if I missed anything. The above summary is a good starting 
point for a new thread dealing with this issue.


HTH.

Liviu

(*) When perfectly forwarding, with a theoretical concern I asked in an earlier 
post, directed at Martin.




Re: STDCXX-1056 : numpunct fix

2012-09-20 Thread Liviu Nicoara
Thanks for the feed-back. Please see below.


On Sep 19, 2012, at 10:02 PM, Stefan Teleman wrote:

 On Wed, Sep 19, 2012 at 8:51 PM, Liviu Nicoara nikko...@hates.ms wrote:
 
 I think you are referring to `live' cache objects and the code which
 specifically adjusts the size of the buffer according to the number of
 `live' locales and/or facets in it. In that respect I would not call that
 eviction because locales and facets with non-zero reference counters are
 never evicted.
 
 But anyhoo, this is semantics. Bottom line is the locale/facet buffer
 management code follows a principle of economy.
 
 Yes it does. But we have to choose between economy and efficiency. To
 clarify: The overhead of having unused pointers in the cache is
 sizeof(void*) times the number of unused slots.  This is 2012. Even
 an entry-level Android cell phone comes with 1GB system memory. If we
 want to talk about embedded systems, where memory constraints are more
 stringent than cell phones, then we're not talking about Apache stdcxx
 anymore, or any other open souce of the C++ Standard Library. These
 types of systems use C++ for embedded systems, which is a different
 animal altogether: no exceptions support, no rtti. For example see,
 Green Hills: http://www.ghs.com/ec++.html. And even they have become
 more relaxed about memory constraints. They use BOOST.
 
 Bottom line: so what if 16 pointers in this 32 pointer slots cache
 never get used. The maximum amount of wasted memory for these 16
 pointers is 128 bytes, on a 64-bit machine with 8-byte sized pointers.
 Can we live with that in 2012, a year when a $500 laptop comes with
 4GB RAM out of the box? I would pick 128 bytes of allocated but unused
 memory over random and entirely avoidable memory churn any day.


The argument is plausible and fine as far as brainstorming goes. 

But have you measured the amount of memory consumed by all STDCXX locale data 
loaded in one process? How much absolute time is spent in resizing the locale 
and facet buffers? What is the gain in space and time performance with such a 
change versus without? Just how fragmented the heap becomes and is there a 
performance impact because of it, etc.? IOW, before changing the status quo one 
must show an objective defect, produce a body of evidence, including a failing 
test case for the argument.


 
 My goal: I would be very happy if any application using Apache stdcxx
 would reach its peak instantiation level of localization (read: max
 number of locales and facets instantiated and cached, for the
 application's particular use case), and would then stabilize at that
 level *without* having to resize and re-sort the cache, *ever*. That
 is a locale cache I can love. I love binary searches on sorted
 containers. Wrecking the container with insertions or deletions, and
 then having to re-sort it again, not so much. Especially when I can't
 figure out why we're doing it in the first place.


And I love minimalistic code, and hate waste at the same time, especially in a 
general purpose library. To each its own.


 
 Hey Stefan, are the above also timing the changes?
 
 Nah, I didn't bother with the timings - yet - for a very simple
 reason: in order to use instrumentation, both with SunPro and with
 Intel compilers, optimization of any kind must be disabled. On SunPro
 you have to pass -xkeepframe=%all (which disables tail-call
 optimization as well), in addition to passing -xO0 and -g. So the
 timings for these unoptimized experiments would have been completely
 irrelevant.

Well, I think you are the only one around here with access to SPARC hardware, 
your input is very precious in this sense. Also, this is the reason for which I 
kept asking that question earlier: do we have currently any failing locale MT 
test when numpunct does just perfect forwarding, with no caching? I.e., 
changing just _numpunct.h and no other source file (as to silence thread 
analyzers warnings) does any locale (or other) MT tests fail? I would greatly 
appreciate it if you could give it a run on your hardware if you don't already 
know the answer.

The discussion has been productive. But I object to the patch as is because it 
goes out of the scope of the original incident. I think this patch should only 
touch the MT defect detected by the failing test cases. If you think the other 
parts you changed are defects you should open corresponding issues in JIRA and 
have them discussed in their separate rooms.

Thanks,
Liviu

Re: STDCXX-1056 : numpunct fix

2012-09-20 Thread Liviu Nicoara

On 09/20/12 13:11, Stefan Teleman wrote:

On Thu, Sep 20, 2012 at 8:07 AM, Liviu Nicoara nikko...@hates.ms wrote:

But have you measured the amount of memory consumed by all STDCXX locale data 
loaded in one process? How much absolute time is spent in resizing the locale 
and facet buffers? What is the gain in space and time performance with such a 
change versus without? Just how fragmented the heap becomes and is there a 
performance impact because of it, etc.? IOW, before changing the status quo one 
must show an objective defect, produce a body of evidence, including a failing 
test case for the argument.


sizeof(std::locale) * number_of_locales.


I was more interested in the underlying locale data, not the C++ objects.



I'll let you in on a little secret: once you call setlocale(3C) and
localeconv(3C), the Standard C Library doesn't release its own locale
handles until process termination. So you might think you save a lot
of memory by destroying and constructing the same locales. You're
really not. It's the Standard C Library locale data which takes up a
lot of space.


Thanks for the secret, I appreciate it. Did you mean to say that the C Standard 
mandates that?!



What I do know for a fact that this optimization did, was to cause
the races conditions reported by 4 different thread analyzers. Race
conditions are a show-stopper for me, and they are not negotiable.


No, that optimization was not causing the MT defect you originally noted. 
Saying so only shows a lack of familiarity with the implementation.




And I love minimalistic code, and hate waste at the same time, especially in a 
general purpose library. To each its own.


Here's a helpful quote:

We should forget about small efficiencies, say about 97% of the time:
premature optimization is the root of all evil. It's from Donald
Knuth.


Please, no.



And I love correct code which doesn't cause thread analyzers to report
more than 12000 race conditions for just one test case. I've said it
before and I will say it again: race conditions are a showstopper and
are not negotiable. Period.



Stefan, you are not being correct by a consensus of thread analyzers output or 
by being loud, or by pounding your fist on the table. This being said I will 
continue to exercise due diligence and put in the necessary time and effort to 
provide you with the most useful feed-back I can.

I see that you missed my question in the post before: did you see a failure in 
the locale MT tests in your SPARC runs? If you do not want to run that test, 
that is fine, just let me know.

Thanks,
Liviu




Re: STDCXX-1056 : numpunct fix

2012-09-20 Thread Liviu Nicoara

On Sep 20, 2012, at 5:31 PM, Stefan Teleman wrote:

 On Thu, Sep 20, 2012 at 5:07 PM, Liviu Nicoara nikko...@hates.ms wrote:

 
 To answer your question [...]:
 yes, the MT failures occur on SPARC as well, on both SPARCV8 and
 SPARCV9, and the race conditions are reported on *ALL* plaforms
 tested, even after having applied your _numpunct.h patch. This patch
 alone does *NOT* solve the problem.

Stefan, I want to be clear. You are talking about a patch identical in nature 
to the one I have attached now. Just want to be 100% sure we are talking about 
the same thing. This one still produces failures (crashes, assertions, etc.) in 
the locale MT tests on SPARC and elsewhere in your builds?

Thanks,
Liviu




Re: STDCXX-1056 : numpunct fix

2012-09-20 Thread Liviu Nicoara

On Sep 20, 2012, at 7:37 PM, Stefan Teleman wrote:

 On Thu, Sep 20, 2012 at 7:34 PM, Wojciech Meyer
 wojciech.me...@googlemail.com wrote:
 Hi,
 
 My perceptions is by reading through the whole thread - we should not
 trust 100% external tools to asses the safety of the code. I don't think
 there exist an algorithm that produces no false positives.
 
 That's said I admire Stefan's approach, but we should ask the question
 are we MT safe enough? I would say from what I read here: yes.
 
 Based on what objective metric?


The only gold currency that anyone in here accepts without reservations are 
failing test cases. I believe I have seen some exceptions to the golden rule in 
my RW time, but I can't recall any specific instance.

Liviu 



Re: STDCXX-1056 : numpunct fix

2012-09-20 Thread Liviu Nicoara

On Sep 20, 2012, at 5:23 PM, Stefan Teleman wrote:

 On Thu, Sep 20, 2012 at 4:45 PM, Travis Vitek
 travis.vi...@roguewave.com wrote:
 
 
 I'll let you in on a little secret: once you call setlocale(3C) and
 localeconv(3C), the Standard C Library doesn't release its own locale
 handles until process termination. So you might think you save a lot
 of memory by destroying and constructing the same locales. You're
 really not. It's the Standard C Library locale data which takes up a
 lot of space.
 
 You have a working knowledge of all Standard C Library implementations?
 
 I happen to do, yes, for the operating systems that I've been testing
 on. I also happen to know that you don't. This fact alone pretty much
 closes up *this* particular discussion.
 
 Do yourself, and this mailing list a favor: either write a patch which
 addresses all of your concerns *AND* eliminates all the race
 conditions reported, or stop this pseudo software engineering bullshit
 via email.
 
 There is apparently, a high concentration of know-it-alls on this
 mailing list, who are much better at detecting race conditions and
 thread unsafety than the tools themselves. Too bad they aren't as good
 at figuring out their own bugs.


The sniping is uncalled for. There are no enemies here, no one is after you. 
There is criticism though and you are expected to take it and argue your point 
of view. If you can't stand the heat, get out of the kitchen.


 
 It took eight months for anyone here to even *acknowledge* that
 numpunct and moneypunct do have, in fact, a thread safety problem.
 Never mind that the test case for these facets had been crashing for 4
 years. To be quite blunt and to the point, after 8 months of denying
 obvious facts, your credibility is quite a bit under question at this
 point.


Yes, we are busy with other stuff. I wish I got paid to work on this instead.


 
 This entire discussion has become a perfect illustration with what's
 wrong with the ASF, as reported here:
 
 http://www.mikealrogers.com/posts/apache-considered-harmful.html


I actually read it. I see a guy complaining he can't have it his way. No 
problem. One can fork this project at any time and start it anew, by 
themselves, or in the company of like programmers elsewhere. 

For better or worse Apache got STDCXX from RogueWave. Complaining about it is 
like complaining that Apple doesn't give us iPhones for free; after all we are 
the power users and we know what to do with them.

L

Re: STDCXX-1056 : numpunct fix

2012-09-20 Thread Liviu Nicoara

On Sep 20, 2012, at 7:45 PM, Stefan Teleman wrote:

 On Thu, Sep 20, 2012 at 7:22 PM, Liviu Nicoara nikko...@hates.ms wrote:
 
 Stefan, I want to be clear. You are talking about a patch identical in 
 nature to the one I have attached now. Just want to be 100% sure we are 
 talking about the same thing. This one still produces failures (crashes, 
 assertions, etc.) in the locale MT tests on SPARC and elsewhere in your 
 builds?
 
 On September 17, 2012 I have posted the following message to this list:
 
 http://www.mail-archive.com/dev@stdcxx.apache.org/msg01929.html
 
 In that message, there is a link to my SPARC thread-safety test results:
 
 http://s247136804.onlinehome.us/stdcxx-1056-SPARC-20120917/22.locale.numpunct.mt.nts.1.er.html/index.html
 
 This test was run with the following _numpunct.h file:
 
 http://s247136804.onlinehome.us/stdcxx-1056-SPARC-20120917/22.locale.numpunct.mt.nts.1.er.html/file.14.src.txt.html
 
 The test above shows 12440 race conditions detected for a test run of
 22.locale.numpunct.mt, with  --nthreads=8 --nloops=1.
 
 Did you ever look at these test results? From reading your email, I
 realize that you never looked at it. That is the only possible
 explanation as to why you're asking now for SPARC test results, today
 being September 20, 2012.


I see, there is a confusion about this. Probably nobody explained it before. A 
failing test case means a test case that causes the abnormal termination of the 
execution of the program or creates evidence of abnormal data in the program 
execution.

In this respect please see the atomic add and exchange tests as classical 
examples of what I mean.

I have read all your emails in detail and I have inspected all your 
attachments, modulo the ones I could not open.

Thanks,
Liviu




Re: STDCXX-1056 : numpunct fix

2012-09-20 Thread Liviu Nicoara

On Sep 20, 2012, at 8:02 PM, Stefan Teleman wrote:

 On Thu, Sep 20, 2012 at 7:52 PM, Liviu Nicoara nikko...@hates.ms wrote:
 
 On Sep 20, 2012, at 7:49 PM, Stefan Teleman wrote:
 
 On Thu, Sep 20, 2012 at 7:40 PM, Liviu Nicoara nikko...@hates.ms wrote:
 
 The only gold currency that anyone in here accepts without reservations 
 are failing test cases. I believe I have seen some exceptions to the 
 golden rule in my RW time, but I can't recall any specific instance.
 
 That may be a valid metric here.
 
 The only one. Any programmer worth his salt -- I am borrowing your words 
 here -- would be able to demonstrate the validity of his point of view with 
 a test case.
 
 I did. There are 12440 race conditions detected for an incomplete run
 of 22.locale.numpunct.mt. By incomplete I mean: it did not run with
 its default nthreads and nloops which I believe are 8 threads and
 20 loop iterations.


That is not it, and you did not. Please pay attention: given your assertion 
that a race condition is a defect that causes an abnormal execution of the 
program during which the program sees abnormal, incorrect states (read: 
variable values) it should be easy for you to craft a program that shows 
evidence of that by either printing those values, or aborting upon detecting 
them, etc.

 
 [...] and overall just email bullshit.

Stop using that word. 

L

Re: STDCXX-1056 : numpunct fix

2012-09-20 Thread Liviu Nicoara

On Sep 20, 2012, at 8:59 PM, Stefan Teleman wrote:

 On Thu, Sep 20, 2012 at 8:44 PM, C. Bergström
 cbergst...@pathscale.com wrote:
 
 
 fencepost comment - The results are based on tools and I don't think he has
 a large program which actually triggers the conditions.  (Creating one may
 take quite some time)
 
 I do have a program which triggers the race conditions:
 22.locale.numpunct.mt. Part of the official test harness.
 
 The real reason why they don't want to accept what the instrumentation
 tools are saying is very simple: they don't LIKE reading what the
 tools are saying. So, blame the tools, pretend that as long as it
 doesn't crash again there's no bug and hope for the best.

I cannot include an analyzer output as a regression test in the test suite. 

 
 But I am very glad this is on a public mailing list, so everyone can
 read what's going on here.
 

?

Re: STDCXX-1056 : numpunct fix

2012-09-21 Thread Liviu Nicoara

On 09/21/12 05:13, Stefan Teleman wrote:

On Fri, Sep 21, 2012 at 2:28 AM, Travis Vitek
travis.vi...@roguewave.com wrote:

I have provided this list with test results showing that my patch
*does* fix the race condition problems identified by all the tools at
my disposal. I'm willing to bet you never looked at it. You dismissed
it outright, just because you didn't like the *idea* that increasing
the size of the cache, and eliminating that useless cache resizing
would play an important role in eliminating the race conditions.



I looked at it in great detail and I am sure Travis read it too. The 
facet/locale buffer management is a red herring. Apparently, we cannot convince 
you, regardless of the argument. That's fine by me.




I have yet to see an alternative patch being proposed here, which
would eliminate the race conditions, and which I am willing to test
just as thoroughly as I've tested mine. The only thing i've seen are
continued attempts at dismissing the existence of these race
conditions, as  reported by the instrumentation tools, based on some
general axioms about their accuracy. No-one on this list has a clue as
to how the SunPro Thread Analyzer actually works, because it's not
open source, and none of you work at Oracle, therefore you can't see
the code. But everyone just *knows* that it's inaccurate, or that it
reports false positives.


We are not dismissing the analysis. We are dismissing your interpretation of 
it. The tool is not indicating a defect, but a potential defect. I am calling 
it potential because, as my test case (http://tinyurl.com/97ks9gc) indicates, 
one can have a race condition which is not a defect in a concurrent environment.



As long as you, or anyone else, continues to blame the instrumentation
tools for the bug, and as long as anyone here continues to suggest
that the only acceptable proof of the existence of this bug is some
other program which needs to be written using fprintf(3C), and as long
as no-one here is willing to provide an alternative patch which
demonstrably eliminates 100% of the reported race conditions, this
entire back-and-forth about the existence of these race conditions,
the accuracy of the tools and what they are reporting is nothing but a
giant waste of time.



The correction of 1056 does not necessarily need to eliminate the analyzer 
warnings. Aiming for that is simply unreasonable, not because it is not 
attainable (after all, you demonstrated it), but because it is inappropriately 
and insufficiently argued for the situation at hand.

Liviu


SUNPro 5.12 compilation failure in optimized Linux builds

2012-09-21 Thread Liviu Nicoara

Optimized (but not debug) builds fail to compile setlocale.cpp with the error:

$ cat t.cpp; CC -c t.cpp
#define _XOPEN_SOURCE
#include cwchar
/opt/sunpro/12_3/prod/include/cc/wchar.h, line 90: Error: tm is not defined.
/opt/sunpro/12_3/prod/include/cc/wchar.h, line 92: Error: fgetwc is not 
defined.
...

and so on for every wide-char function in wchar.h.

The definition of _XOPEN_SOURCE at the top of setlocale.cpp is not entirely 
correct in that it interacts with the various guards in the C library headers. 
AFAICT, Oracle/Sun wchar.h includes C library wchar.h, which includes C library 
wctype.h, which includes again wchar.h, coming full circle to the compiler 
wchar.h which uses the names w/o them being defined, yet. The fact that 
Oracle/Sun headers include_next corresponding C library headers outside of a 
inclusion guard does not help.


(Debug builds are not affected because of interaction from the difference in the 
structure and includes of rw/_traits.h in debug vs. optimized builds).


The poor man's fix is to guard that _XOPEN_SOURCE define for SUNPro builds (see 
below). However, I am not sure whose side the error is.


Thanks,
Liviu

PS. Is it still called SUNPro? Oracles seems it has sanitized that name out on 
their website.


Index: src/setlocale.cpp
===
--- src/setlocale.cpp   (revision 1388733)
+++ src/setlocale.cpp   (working copy)
@@ -34,8 +34,10 @@
 #include rw/_defs.h

 #if defined (__linux__)  !defined (_XOPEN_SOURCE)
+#  if !defined (__SUNPRO_CC)
// need S_IFDIR on Linux
 #  define _XOPEN_SOURCE
+#  endif // !__SUNPRO_CC
 #endif   // __linux__  !_XOPEN_SOURCE

 #include locale.h   // for setlocale()


STDCXX-1056 DCII [was: Re: STDCXX-1056 [was: Re: STDCXX forks]]

2012-09-22 Thread Liviu Nicoara

On 9/17/12 5:42 PM, Liviu Nicoara wrote:

[...]
However, after more careful thought, I think there is a problem there even
though we don't have an objective proof for it, yet.

The writes are not atomic and they function just like DCII, being subject to
both compiler reordering and out of order execution. E.g., it is assumed that
the writes to the _C_impsize and _C_impdata members occur in the program order,
which would make unguarded reads/tests of _C_impsize safe. This is what the
facet initialization and access code does, conceptually:

if (_C_impsize) {
 // already initialized, use data
}
else {
 // initialize
 mutex.lock ();
 _C_impdata = get_data_from_database ();
 _C_impsize = get_data_size ();
 mutex.unlock ();
}

The mutex lock and unlock operations introduce memory barriers but they are not
guaranteeing the order in which the two writes occur. If the writes would occur
in the exact program order, unguarded reads and tests of _C_impsize would be
safe. Unfortunately, a thread may see the _C_impsize variable being updated
before the _C_impdata.


Elaborating on the above -- facet.cpp:~250, the code is actually closer to this:

if (_C_impsize) {
 // already initialized, use _C_impdata
}
else {
 // initialize
 mutex.lock ();

 if (_C_impsize)
 return _C_impdata;

 void* const pv = ...;
 size_t sz = ...;

 _C_impdata = pv;   // 1
 _C_impsize = sz;   // 2

 mutex.unlock ();
}

(1) and (2) can be reordered. Also, when mapping the STDCXX locale database, 
facet.cpp:288, there is another defect writing the two variables:


   _C_impdata = __rw_get_facet_data (cat, _C_impsize, 0, codeset);
  ^^^
  |||
Set early in __rw_get_facet_data -+++

The only way to order the above writes, preserve the fast checking for 
initialization, and generally salvaging the current algorithm is a full memory 
barrier in between the writes, e.g.:


if (_C_impsize) {
 // already initialized, use data
}
else {
 // ...

 _C_impdata = pv;
 full_membar ();
 _C_impsize = sz;

 // ...
}

Atomics are part of C++11, I wonder if anybody here has working experience with 
the atomic API. A mutex locking will not do because the lock is only acquiring 
and the write before the lock can shift down.


Of course I could be wrong. Any input is welcome.

Thanks,
Liviu


STDCXX-1068 and alignment

2012-09-22 Thread Liviu Nicoara
Stefan, could you please elaborate on the misaligned reads/writes that you 
observed on those platforms? What was failing?


Thanks,
Liviu


Re: STDCXX-1066 [was: Re: STDCXX forks]

2012-09-23 Thread Liviu Nicoara

On 09/16/12 12:03, Stefan Teleman wrote:

On Sat, Sep 15, 2012 at 5:47 PM, Liviu Nicoara nikko...@hates.ms wrote:


I merely wanted to point out that restoring the default packing is not the
same as restoring the packing to the previous value in effect.

Given this, I thought about an alternative way of forcing this alignment,
e.g., via a union, aligned on an appropriate type. I see an advantage here
in that most of the changes would occur where we define the 'primitive'
mutex and condition wrappers, saving a few pre-processor conditionals and
pragmas along the way.


I understood what you were saying. I just don't understand under what
_sane_ circumstances a program would #include a system library header
file with a previously set packing to something other than default. Or
would expect a non-default packing to work during a function call to a
sytem library. That's an insane configuration on any operating system
that I know of, not just on SPARCV8.


I can't think of a valid scenario, either. I guess the point is moot.

A few more questions, if you will, as I am going through the changes:

1. I see similarities with 1040, should/would you close that one?
2. The issue only exists in MT builds, should there be a guard in configs?
3. The align reference docs talk only about aligning variables, not types. Is 
that different on SPARC?
4. I see rw/_mutex.h has alignment pragmas for both __rw_mutex_base class and 
its mutex member; same for __rw_static_mutex and its static member, etc. How 
does that work?
5. Why is __rw_guard aligned explicitly? I see it only contains a pointer to a 
mutex object.
6. The docs mention that the pragma must use the mangled variables names but I 
don't see that in the patch.

Thanks!

Liviu





[PATCH] STDCXX-853

2012-09-23 Thread Liviu Nicoara

Umm, I didn't think to search for a corresponding incident and I considered the 
defect to be so minor as to not warrant an issue. The following patch has been 
applied already on 4.2.x:

Index: tests/support/atomic_xchg.cpp
===
--- tests/support/atomic_xchg.cpp   (revision 1388732)
+++ tests/support/atomic_xchg.cpp   (revision 1388733)
@@ -297,7 +297,7 @@ void run_test (intT, thr_args_base::tag_
 // compute the expected result, skipping zeros by incrementing
 // expect twice when it overflows and wraps around to 0 (zero is
 // used as the lock variable in thread_routine() above)
-intT expect = intT (1);
+intT volatile expect = intT (1);
 
 const unsigned long nincr = (Args::nthreads_ * Args::nincr_) / 2U;
 


[PATCH] STDCXX-1069 [was: Re: SUNPro 5.12 compilation failure in optimized Linux builds]

2012-09-23 Thread Liviu Nicoara

On 09/22/12 00:51, Liviu Nicoara wrote:

Optimized (but not debug) builds fail to compile setlocale.cpp with the error:


A patch and a comment have been attached to the issue.

Thanks,
Liviu


Re: STDCXX-1066 [was: Re: STDCXX forks]

2012-09-23 Thread Liviu Nicoara

On 9/23/12 3:48 PM, Stefan Teleman wrote:

On Sun, Sep 23, 2012 at 3:26 PM, Liviu Nicoara nikko...@hates.ms wrote:


To be honest it's quite bizarre that you cannot share that with us. Is it
really a trade secret? How can that be the case if Oracle customers are also
required to perform the same alignment, perhaps using the same techniques
like you showed in the patch?


That's the problem. I don't know what is and what is not a trade
secret, or copyrighted information, or unauthorized intellectual
property disclosure anymore.  I think it's in the eye of the
beholder. At Sun it was very clear.

Believe it or not, I had to get written approval from the Legal
Counsel's Office in order to be able to share these patches. And that
in spite of the fact that these patches are published, and have
already been published for *years*.


That clarifies things. Thanks.

Liviu



Re: STDCXX-1066 [was: Re: STDCXX forks]

2012-09-23 Thread Liviu Nicoara

On 9/23/12 5:50 PM, Stefan Teleman wrote:

On Sun, Sep 23, 2012 at 5:23 PM, Stefan Teleman
stefan.tele...@gmail.com wrote:


The second URL says this:

QUOTE
Due to a change in the implementation of the userland mutexes
introduced by CR 6296770 in KU 137111-01, objects of type mutex_t and
pthread_mutex_t must start at 8-byte aligned addresses. If this
requirement is not satisfied, all non-compliant applications on
Solaris/SPARC may fail with the signal SEGV with a callstack similar
to the following one or with similar callstacks containing the
function mutex_trylock_process.

   \*_atomic_cas_64(0x141f2c, 0x0, 0xff00, 0x1651, 0xff00, 0x466d90)
   set_lock_byte64(0x0, 0x1651, 0xff00, 0x0, 0xfec82a00, 0x0)
   fast_process_lock(0x141f24, 0x0, 0x1, 0x1, 0x0, 0xfeae5780)

/QUOTE


Here's a link to an official datatype alignment table for SPARCV8:

http://docs.oracle.com/cd/E19205-01/819-5267/bkbkl/index.html

The interesting table is:

Table B–2 Storage Sizes and Default Alignments in Bytes


I see nothing really outstanding here. What is it that I should pay attention 
to?

Thanks,
Liviu


Re: [PATCH] STDCXX-853

2012-09-24 Thread Liviu Nicoara

On 09/24/12 01:18, Travis Vitek wrote:


Liviu,

Should the volatile be to the left of the intT typename here? I know it is 
equivalent, but it is weird to look at the line of code below and see that 
we're following two different conventions.



Thanks, will do.


Travis
___
From: Liviu Nicoara
Sent: Sunday, September 23, 2012 8:34 AM
To: dev@stdcxx.apache.org
Subject: [PATCH] STDCXX-853

Umm, I didn't think to search for a corresponding incident and I considered the 
defect to be so minor as to not warrant an issue. The following patch has been 
applied already on 4.2.x:

Index: tests/support/atomic_xchg.cpp
===
--- tests/support/atomic_xchg.cpp   (revision 1388732)
+++ tests/support/atomic_xchg.cpp   (revision 1388733)
@@ -297,7 +297,7 @@ void run_test (intT, thr_args_base::tag_
   // compute the expected result, skipping zeros by incrementing
   // expect twice when it overflows and wraps around to 0 (zero is
   // used as the lock variable in thread_routine() above)
-intT expect = intT (1);
+intT volatile expect = intT (1);

   const unsigned long nincr = (Args::nthreads_ * Args::nincr_) / 2U;




--
And now I see with eye serene
The very pulse of the machine.


Re: STDCXX-1056 : numpunct fix

2012-09-24 Thread Liviu Nicoara

On 9/24/12 12:06 AM, Stefan Teleman wrote:

On Fri, Sep 21, 2012 at 9:10 AM, Liviu Nicoara nikko...@hates.ms wrote:

On 09/21/12 05:13, Stefan Teleman wrote:


On Fri, Sep 21, 2012 at 2:28 AM, Travis Vitek
travis.vi...@roguewave.com wrote:

I have provided this list with test results showing that my patch
*does* fix the race condition problems identified by all the tools at
my disposal. I'm willing to bet you never looked at it. You dismissed
it outright, just because you didn't like the *idea* that increasing
the size of the cache, and eliminating that useless cache resizing
would play an important role in eliminating the race conditions.




I looked at it in great detail and I am sure Travis read it too. The
facet/locale buffer management is a red herring. Apparently, we cannot
convince you, regardless of the argument. That's fine by me.


This bug [STDCXX-1056] was updated over the weekend with new comments.
Here's the link to the comments, for the record:


Stefan, was it your intention to completely eliminate all the race conditions 
with this last patch? Is this what the tools showed in your environment?


Thanks,
Liviu



https://issues.apache.org/jira/browse/STDCXX-1056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13461452#comment-13461452





Re: STDCXX-1056 : numpunct fix

2012-09-25 Thread Liviu Nicoara

On 09/24/12 22:28, Martin Sebor wrote:

On 09/20/2012 06:46 PM, Stefan Teleman wrote:

On Thu, Sep 20, 2012 at 8:39 PM, Liviu Nicoaranikko...@hates.ms  wrote:


I have not created this requirement out of thin air. STDCXX development has 
functioned in this manner for as long as I remember. If it does not suit you, 
that's fine.


That would explain why these bugs are present in the first place.

If the official method of determining thread-safety here is
fprintf(3C), then we have a much bigger problem than
22.locale.numpunct.mt.


Tests that end in .mt.cpp use the RW_ASSERT() macro to verify
correctness in multithreaded programs. There is no way at the
moment to run a thread analyzer as part of the test suite,
although it would be a nice addition. As I said in my other
reply, though, not all data races necessarily indicate bugs,
so we'd need a way to distinguish between know/benign races
and the rest. (Unless we decide to eliminate all races and
accept the performance penalty.)


Preserving the lazy initialization is possible: we perfectly forward to the 
implementation with either:

1. no additional synchronization (preserves fast reads of the data), while 
fully knowing that there are data races in __rw_get_numpunct/_C_get_data there, 
or
2.we put a big lock on top of all and every facet operations (the version that 
would be race free).

Caching seems out of the window for now, more so in the presence of unguarded 
reads of facet's data.

The only argument for (1) is that at the moment we don't have a failing test 
yet, and not for lack of arduous trying.

However, if atomic/membar APIs would be recognized by the compiler, lazy 
initialization of the facet and caching of data would be attainable in a quite 
simple fashion.

Liviu


Re: STDCXX-1056 : numpunct fix

2012-09-25 Thread Liviu Nicoara

On 09/24/12 23:50, Stefan Teleman wrote:

On Mon, Sep 24, 2012 at 10:03 PM, Martin Sebor mse...@gmail.com wrote:

FWIW, there are race conditions in stdcxx. Some of them are by
design and benign on the systems the library runs on (although
I acknowledge that some others may be bugs). One such benign
date race is:

   timeT1  T2
 0x = N
 1x = N   read x

where x is a scalar that can be accessed atomically by the CPU
and the compiler.

I think some of the lazy facet initialization falls under this
class. It would be nice to fix them but only if it doesn't slow
things down. The others need to be fixed in any case.


The race conditions I am talking about are not benign.


Caching failures aside, we have no failing tests. Before saying they are malign 
you need to objectively show a failing program.



It shows the types of race conditions it's reporting: these are
read/write race conditions, not read/read. The thread analyzer's html
filter doesn't show the types of races reported as clearly as the
command-line analyzer which has a windowing GUI.


Martin's example above is a read-write race. In the absence of a failing test 
case it is quite unreasonable to modify the current implementation.

Liviu



STDCXX-1070

2012-09-25 Thread Liviu Nicoara

I filed 1070, failure to build 22.locale.collate.cpp on Linux with gcc 4.7.1. 
Gcc, Comeau and Clang fail to compile it, Intel and Sun are fine. It looks to 
me like Intel and Sun compilers are not doing the right thing. A small test 
case and a patch have been attached. The failing code has been reduced to:

$ cat test.cpp; g++ -c test.cpp
template  class charT 
void f ()
{
g (charT ('a'));
}

template  class charT 
void g (charT)
{
}

int h ()
{
return f char  (), 0;
}
test.cpp: In instantiation of 'void f() \[with charT = char\]':
test.cpp:14:23:   required from here
test.cpp:4:5: error: 'g' was not declared in this scope, and no declarations 
were found by argument-dependent lookup at the point of instantiation 
\[-fpermissive\]
test.cpp:8:6: note: 'templateclass charT void g(charT)' declared here, later 
in the translation unit

Liviu


Re: [jira] [Closed] (STDCXX-1056) std::moneypunct and std::numpunct implementations are not thread-safe

2012-09-25 Thread Liviu Nicoara

On 9/25/12 7:56 PM, Stefan Teleman (JIRA) wrote:


  [ 
https://issues.apache.org/jira/browse/STDCXX-1056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Stefan,

I don't think it's ok to close this bug. The race conditions are there and we 
have not come to a completely satisfactory conclusion on how to deal with them, 
or even if we should deal with them. Whichever it is we gotta keep this 
discussion open. I sure hope you want to be a part of it.


FWIW, I have spent quite some time looking at your proposed patch and I am going 
to reopen the incident. If I can.


Liviu



Stefan Teleman closed STDCXX-1056.
--

 Resolution: Won't Fix

Bug will not be fixed. Upstream refuses to acknowledge the existence of the bug 
in spite of objective evidence to the contrary.


std::moneypunct and std::numpunct implementations are not thread-safe
-

 Key: STDCXX-1056
 URL: https://issues.apache.org/jira/browse/STDCXX-1056
 Project: C++ Standard Library
  Issue Type: Bug
  Components: 22. Localization
Affects Versions: 4.2.1, 4.2.x, 4.3.x, 5.0.0
 Environment: Solaris 10 and 11, RedHat and OpenSuSE Linux, Sun C++ 
Compilers 12.1, 12.2, 12.3
Issue is independent of platform and/or compiler.
Reporter: Stefan Teleman
  Labels: thread-safety
 Fix For: 4.2.x, 4.3.x, 5.0.0

 Attachments: 22.locale.numpunct.mt.out, facet.cpp.diff, 
locale_body.cpp.diff, punct.cpp.diff, runtests-linux32-all.out, 
runtests-linux64-all.out, runtests.out, STDCXX-1056-additional-timings.tgz, 
stdcxx-1056.patch, stdcxx-1056-timings.tgz, 
stdcxx-4.2.x-numpunct-perfect-forwarding.patch, 
stdcxx-4.3.x-numpunct-perfect-forwarding.patch


several member functions in std::moneypunct and std::numpunct return
a std::string by value (as required by the Standard). The implication of 
return-by-value
being that the caller owns the returned object.
In the stdcxx implementation, the std::basic_string copy constructor uses a 
shared
underlying buffer implementation. This shared buffer creates the first problem 
for
these classes: although the std::string object returned by value *appears* to 
be owned
by the caller, it is, in fact, not.
In a mult-threaded environment, this underlying shared buffer can be 
subsequently modified by a different thread than the one who made the initial 
call. Furthermore, two or more different threads can access the same shared 
buffer at the same time, and modify it, resulting in undefined run-time 
behavior.
The cure for this defect has two parts:
1. the member functions in question must truly return a copy by avoiding a call 
to the copy constructor, and using a constructor which creates a deep copy of 
the std::string.
2. access to these member functions must be serialized, in order to guarantee 
atomicity
of the creation of the std::string being returned by value.
Patch for 4.2.1 to follow.


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira





Re: [jira] [Closed] (STDCXX-1066) SPARCV8 requires pthread_mutex_t and pthread_cond_t to be aligned on an 8-byte boundary

2012-09-25 Thread Liviu Nicoara

On 9/25/12 7:56 PM, Stefan Teleman (JIRA) wrote:


  [ 
https://issues.apache.org/jira/browse/STDCXX-1066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Anybody around here, except Stefan, who has access to a SPARC V8 machine updated 
to the specified kernel update or later, and who is willing to run a simple test 
program? It's a 5 minute job at most.


Thanks!



Stefan Teleman closed STDCXX-1066.
--

 Resolution: Won't Fix

Bug will not be fixed upstream. It is fixed in the Solaris releases.


SPARCV8 requires pthread_mutex_t and pthread_cond_t to be aligned on an 8-byte 
boundary
---

 Key: STDCXX-1066
 URL: https://issues.apache.org/jira/browse/STDCXX-1066
 Project: C++ Standard Library
  Issue Type: Bug
  Components: Thread Safety
Affects Versions: 4.2.1, 4.2.2, 4.2.x, 4.3.x
 Environment: Solaris 10 Update 6 or later on SPARCV8 [32-bit]
Defect is compiler-independent - in reality it only affects Sun Studio and GCC.
Reporter: Stefan Teleman
  Labels: features, runtime, threads
 Fix For: 4.2.2, 4.2.x, 4.3.x

 Attachments: _config-gcc.h.stdcxx-1066.patch, 
_config-sunpro.h.stdcxx-1066.patch, ctype.cpp.stdcxx-1066.patch, 
exception.cpp.stdcxx-1066.patch, ios.cpp.stdcxx-1066.patch, 
iostream.cpp.stdcxx-1066.patch, iostream.stdcxx-1066.patch, 
locale_body.cpp.stdcxx-1066.patch, locale_classic.cpp.stdcxx-1066.patch, 
messages.cpp.stdcxx-1066.patch, _mutex.h.stdcxx-1066.patch, 
time_put.cpp.stdcxx-1066.patch, use_facet.h.stdcxx-1066.patch


Starting with Solaris 10 Update 6, on SPARCV8, pthread_mutex_t and 
pthread_cond_t MUST be aligned on an 8-byte boundary. Misaligned access will 
result in either SEGV or SIGBUS.
There are numerous places in the multi-threaded version of stdcxx where 
pthread_mutex_t and/or pthread_cond_t types are contained within an union, but 
with an enforced alignment different than 8. All these instances must be 
corrected, and #ifdef-guarded for SPARCV8.
Patches to follow shortly, this is just opening the issue.
Warning: the patchset resolving this issue is very large, and it affects a 
large number of files.


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira





Re: STDCXX-1056 : numpunct fix

2012-09-25 Thread Liviu Nicoara

On 9/24/12 11:06 PM, Stefan Teleman wrote:

On Mon, Sep 24, 2012 at 7:48 PM, Liviu Nicoara nikko...@hates.ms wrote:


Stefan, was it your intention to completely eliminate all the race
conditions with this last patch? Is this what the tools showed in your
environment?



https://issues.apache.org/jira/browse/STDCXX-1056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13461452#comment-13461452


Yes, all the race conditions in std::numpunct and std::moneypunct.
Not all the race conditions in stdcxx.


FWIW, I have analyzed the performance of the latest patch from Stefan. The test 
program is the same test program used in the timings attachments to the 
incident. The version of the library was 4.2.1, compiler gcc 4.7.1, with:


1. The simple, perfect forwarding patch, with no caching, which does eliminate 
race conditions.

2. Stefan's latest patch
3. A patch I created which truly, if crudely, eliminates all facet race 
conditions by using a top-level lock for all facet operations(*)


The numbers from the first test run are already available in the incident.

The numbers for the second test run were a bit bigger, which is expected because 
of the additional synchronization. E.g., I have found that the test, run against 
the STDCXX locale database, gives:


$ time ./t en_US.UTF-8 4 500
4, 500

real0m19.210s
user0m32.659s
sys 0m36.507s

where the first patch gives:

$ time ./t en_US.UTF-8 4 5000; done
4, 5000

real0m7.961s
user0m31.211s
sys 0m0.003s

Notice the number of loops, 10x larger in the second set of number.

Now, for the third patch, the numbers are embarrassingly large. It practically 
slows the program to a crawl because it does not do any reads of facet data 
without holding the lock. This, I wager, eliminates all races in the 
numpunct/moneypunct facet. See:


$ time ./t en_US.UTF-8 4 10
4, 10

real0m24.316s
user0m28.213s
sys 0m6.633s

Now that is a whooping performance hit.

I went back to see an explanation between the numbers I see with Stefan's more 
performing patch, and my lock-all patch. I believe that there are still places 
in Stefan's patch where the facet makes unguarded reads of facet data, e.g.:


loc/_numpunct.h:
190 // returns a pointer to the facet's implementation data
191 // if it exists, 0 otherwise
192 const void* _C_data () const {
193 return _C_impsize ? _C_impdata
194 : _RWSTD_CONST_CAST (__rw_facet*, this)-_C_get_data ();
195 }

where _C_impsize/_C_impdata are read outside of a lock scope, etc. I.e., a 
thread may read the _C_impsize while, say, another is still writing the 
_C_impdata member variable.


I conclude, even with such simple testing, that any solution that attempts to 
completely eliminate the race conditions in these facets will incur a penalty, 
with the safest (top-level locking) the worst by orders of magnitude and any 
other solution somewhere in between.


Please let me know if you have any questions.

Liviu

(*) Will attach it as soon as we figure out how to undo the closing of the 
issue.


Re: svn commit: r1389337 - /stdcxx/branches/4.2.x/tests/support/atomic_xchg.cpp

2012-09-26 Thread Liviu Nicoara

On 09/25/12 23:20, Martin Sebor wrote:

On 09/24/2012 06:38 AM, lnico...@apache.org wrote:

Author: lnicoara
Date: Mon Sep 24 12:38:17 2012
New Revision: 1389337

URL: http://svn.apache.org/viewvc?rev=1389337view=rev
Log:
2012-09-24  Liviu Nicoara  nikko...@hates.ms

 * tests/support/atomic_xchg.cpp: (run_test) moved counter volatile
   qualification to match STDCXX conventions (see r1388733)


The stdcxx ChangeLog format follows the GNU Coding Standard:
   http://www.gnu.org/prep/standards/html_node/Change-Logs.html

I.e., this entry should look like this:

 * tests/support/atomic_xchg.cpp (run_test): Move counter
   volatile qualification to match STDCXX convention (see
   r1388733).

(The colon follows the parenthesized function name. The text
should also end with a period.)

The svn propedit command lets you change a comment for an
already committed change.


Sorry about that. I fixed it.

L


Reopen closed incidents

2012-09-26 Thread Liviu Nicoara

I want to reopen the closed incidents, esp. 1056. On a second thought it might 
be more useful to open new ones and link the old ones so that we don't mess 
with ownership, etc. If nobody objects to this I will go forward with the 
latter.

Thanks.


STDCXX-1071 numpunct facet defect

2012-09-26 Thread Liviu Nicoara
I have created STDCXX-1071 and linked to STDCXX-1056. I have reduced the scope 
to numpunct because moneypunct is not failing for me. If someone has a 
moneypunct failure listing I want to see it.


I have reduced the library code to a failing test case. I have attached there 
the reduced program. The program shows the same failures like the library 
(interestingly enough it does not fail when removing the data caching).


I have (re)attached the forwarding patches I created for 1056, one binary 
compatible, the other not.


Martin, my wish is to move this to some completion. Please let me know if there 
is something else you think I should do to speed this up.


I am open to all questions, the more the better. Most of my opinions have been 
expressed earlier, but please ask if you want to know more.


Thanks,
Liviu


Re: [jira] [Closed] (STDCXX-1066) SPARCV8 requires pthread_mutex_t and pthread_cond_t to be aligned on an 8-byte boundary

2012-09-27 Thread Liviu Nicoara

On 09/27/12 07:15, Pavel Heimlich, a.k.a. hajma wrote:

2012/9/26 Liviu Nicoara nikko...@hates.ms:

On 09/26/12 05:49, Pavel Heimlich, a.k.a. hajma wrote:


2012/9/26 Liviu Nicoara nikko...@hates.ms:


On 9/25/12 7:56 PM, Stefan Teleman (JIRA) wrote:




[

https://issues.apache.org/jira/browse/STDCXX-1066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]




Anybody around here, except Stefan, who has access to a SPARC V8 machine
updated to the specified kernel update or later, and who is willing to
run a
simple test program? It's a 5 minute job at most.



Please point me to the test program.



Hi Pavel,

I attached it. IIRC Solaris had both Solaris threads API and a POSIX threads
API on top of it. Could you please give it a run with MUTEX defined to both
mutex_t and pthread_mutex_t? Might need to tweak the includes.


Here it is, the define seems to make no difference.
BTW I'm not sure about the bug description, was Solaris 10 ever
supported on a sparcv8? I ran it on the crappiest machine I could
find, but all that is sparcv9.


Sun Blade 1000 - UltraSPARC-III+(sparcv9), running Solaris 10u9
bash-3.00# /opt/SUNWspro/bin/CC a.cpp
bash-3.00# ./a.out
24
ffbffcd8 ffbffcf0 ffbffd08 ffbffd20 ffbffd38 ffbffd50 ffbffd68 ffbffd80
ffbffbe0 ffbffc00 ffbffc20 ffbffc40 ffbffc60 ffbffc80 ffbffca0 ffbffcc0
bash-3.00# /opt/SUNWspro/bin/CC -xarch=v8 ./a.cpp
bash-3.00# ./a.out
24
ffbffcd8 ffbffcf0 ffbffd08 ffbffd20 ffbffd38 ffbffd50 ffbffd68 ffbffd80
ffbffbe0 ffbffc00 ffbffc20 ffbffc40 ffbffc60 ffbffc80 ffbffca0 ffbffcc0
bash-3.00# vi a.cpp (--- pthread_mutex_t)
bash-3.00# /opt/SUNWspro/bin/CC a.cpp
bash-3.00# ./a.out
24
ffbffcd8 ffbffcf0 ffbffd08 ffbffd20 ffbffd38 ffbffd50 ffbffd68 ffbffd80
ffbffbe0 ffbffc00 ffbffc20 ffbffc40 ffbffc60 ffbffc80 ffbffca0 ffbffcc0
bash-3.00# /opt/SUNWspro/bin/CC -xarch=v8 ./a.cpp
bash-3.00# ./a.out
24
ffbffcd8 ffbffcf0 ffbffd08 ffbffd20 ffbffd38 ffbffd50 ffbffd68 ffbffd80
ffbffbe0 ffbffc00 ffbffc20 ffbffc40 ffbffc60 ffbffc80 ffbffca0 ffbffcc0

HTH


Very much appreciated.

Liviu


Re: [PATCH] STDCXX-1069 [was: Re: SUNPro 5.12 compilation failure in optimized Linux builds]

2012-09-27 Thread Liviu Nicoara

On 09/23/12 12:15, Liviu Nicoara wrote:

On 09/22/12 00:51, Liviu Nicoara wrote:

Optimized (but not debug) builds fail to compile setlocale.cpp with the error:


A patch and a comment have been attached to the issue.


I am posting it here to save a trip to the JIRA issue. Any feed-back is 
appreciated.

The issue can be tracked down to the interaction between the definition of 
_XOPEN_SOURCE in src/setlocale.cpp and util/path.cpp, the system headers, and Solaris 
Studio C . It seems that the original purpose for the definitions has been lost because 
both S_IFDIR and symlink are defined and declared, respectively, on modern Linux (edit: 
by default). Moreover, it seems that the original definition of _XOPEN_SOURCE was 
incorrect, without a value. I propose we remove the two blocks – see the patch.

Thanks,
Liviu

Index: src/setlocale.cpp
===
--- src/setlocale.cpp   (revision 1388733)
+++ src/setlocale.cpp   (working copy)
@@ -33,11 +33,6 @@
 
 #include rw/_defs.h
 
-#if defined (__linux__)  !defined (_XOPEN_SOURCE)

-   // need S_IFDIR on Linux
-#  define _XOPEN_SOURCE
-#endif   // __linux__  !_XOPEN_SOURCE
-
 #include locale.h   // for setlocale()
 #include stdlib.h   // for getenv()
 #include string.h   // for memcpy(), strcmp()
Index: util/path.cpp
===
--- util/path.cpp   (revision 1388733)
+++ util/path.cpp   (working copy)
@@ -27,16 +27,6 @@
  **/
 
 #ifndef _WIN32

-#  ifdef __linux__
- // for symlink()
-#ifndef _XOPEN_SOURCE
-#  define _XOPEN_SOURCE
-#endif
-#ifndef _XOPEN_SOURCE_EXTENDED
-#  define _XOPEN_SOURCE_EXTENDED
-#endif
-#  endif   // __linux__
-
 #  include unistd.h  // for getcwd()
 #  include sys/stat.h// for struct stat, stat()
 #else



Re: STDCXX-1068 and alignment

2012-09-27 Thread Liviu Nicoara

On 09/22/12 16:22, Stefan Teleman wrote:

On Sat, Sep 22, 2012 at 4:07 PM, Liviu Nicoara nikko...@hates.ms wrote:

Stefan, could you please elaborate on the misaligned reads/writes that you
observed on those platforms? What was failing?


Several tests from the test harness were failing because of it, and
for no other reason.

I'll have to go through my stuff at work and look at the test reports
from the Compiler Guys, we were testing these back in March 2011, and
right now I can't remember exactly which tests were failing. I'll have
the details in a few days.



Any news on this one?

Thanks,
Liviu


Re: STDCXX-1071 numpunct facet defect

2012-09-27 Thread Liviu Nicoara

On 09/26/12 20:12, Liviu Nicoara wrote:

I have created STDCXX-1071 and linked to STDCXX-1056. [...]

I am open to all questions, the more the better. Most of my opinions have been 
expressed earlier, but please ask if you want to know more.



I am attaching here the proposed (4.3.x) patch and the timings results (after 
re-verifying the correctness of the timing program and the results). The 4.2.x 
patch, the 4.3.x patch, the test program and the results file are also attached 
to the incident.

Thanks,
Liviu



Index: include/loc/_numpunct.h
===
--- include/loc/_numpunct.h (revision 1388733)
+++ include/loc/_numpunct.h (working copy)
@@ -61,7 +61,7 @@ struct numpunct: _RW::__rw_facet
 string_type;
 
 _EXPLICIT numpunct (_RWSTD_SIZE_T __ref = 0)
-: _RW::__rw_facet (__ref), _C_flags (0) { }
+: _RW::__rw_facet (__ref) { }
 
 virtual ~numpunct () _RWSTD_ATTRIBUTE_NOTHROW;
 
@@ -109,15 +109,6 @@ protected:
 virtual string_type do_falsename () const {
 return _RW::__rw_get_punct (this, _RW::__rw_fn, char_type ());
 }
-
-private:
-
-int _C_flags;   // bitmap of cached data valid flags
-string  _C_grouping;// cached results of virtual members
-string_type _C_truename;
-string_type _C_falsename;
-char_type   _C_decimal_point;
-char_type   _C_thousands_sep;
 };
 
 
@@ -139,17 +130,7 @@ template class _CharT
 inline _TYPENAME numpunct_CharT::char_type
 numpunct_CharT::decimal_point () const
 {
-if (!(_C_flags  _RW::__rw_dp)) {
-
-numpunct* const __self = _RWSTD_CONST_CAST (numpunct*, this);
-
-// [try to] get the decimal point first (may throw)
-// then set a flag to avoid future initializations
-__self-_C_decimal_point  = do_decimal_point ();
-__self-_C_flags |= _RW::__rw_dp;
-}
-
-return _C_decimal_point;
+return do_decimal_point ();
 }
 
 
@@ -157,34 +138,14 @@ template class _CharT
 inline _TYPENAME numpunct_CharT::char_type
 numpunct_CharT::thousands_sep () const
 {
-if (!(_C_flags  _RW::__rw_ts)) {
-
-numpunct* const __self = _RWSTD_CONST_CAST (numpunct*, this);
-
-// [try to] get the thousands_sep first (may throw)
-// then set a flag to avoid future initializations
-__self-_C_thousands_sep  = do_thousands_sep ();
-__self-_C_flags |= _RW::__rw_ts;
-}
-
-return _C_thousands_sep;
+return do_thousands_sep ();
 }
 
 
 template class _CharT
 inline string numpunct_CharT::grouping () const
 {
-if (!(_C_flags  _RW::__rw_gr)) {
-
-numpunct* const __self = _RWSTD_CONST_CAST (numpunct*, this);
-
-// [try to] get the grouping first (may throw)
-// then set a flag to avoid future initializations
-__self-_C_grouping  = do_grouping ();
-__self-_C_flags|= _RW::__rw_gr;
-}
-
-return _C_grouping;
+return do_grouping ();
 }
 
 
@@ -192,17 +153,7 @@ template class _CharT
 inline _TYPENAME numpunct_CharT::string_type
 numpunct_CharT::truename () const
 {
-if (!(_C_flags  _RW::__rw_tn)) {
-
-numpunct* const __self = _RWSTD_CONST_CAST (numpunct*, this);
-
-// [try to] get the true name first (may throw)
-// then set a flag to avoid future initializations
-__self-_C_truename  = do_truename ();
-__self-_C_flags|= _RW::__rw_tn;
-}
-
-return _C_truename;
+return do_truename ();
 }
 
 
@@ -210,17 +161,7 @@ template class _CharT
 inline _TYPENAME numpunct_CharT::string_type
 numpunct_CharT::falsename () const
 {
-if (!(_C_flags  _RW::__rw_fn)) {
-
-numpunct* const __self = _RWSTD_CONST_CAST (numpunct*, this);
-
-// [try to] get the false name first (may throw)
-// then set a flag to avoid future initializations
-__self-_C_falsename  = do_falsename ();
-__self-_C_flags |= _RW::__rw_fn;
-}
-
-return _C_falsename;
+return do_falsename ();
 }
 
 // #endif _RWSTD_NO_EXT_NUMPUNCT_PRIMARY
-*- mode: org -*-

* Machines:

** iMac, Intel, 4 cores:

$ uname -a; gcc -v
Darwin imax 11.4.0 Darwin Kernel Version 11.4.0: Mon Apr  9 19:32:15 PDT 2012; 
root:xnu-1699.26.8~1/RELEASE_X86_64 x86_64
gcc version 4.7.1 (GCC) 

** Linux Slackware, AMD, 16 cores:

$ uname -a; gcc -v
Linux behemoth 2.6.37.6 #3 SMP Sat Apr 9 22:49:32 CDT 2011 x86_64 AMD 
Opteron(tm) Processor 6134 AuthenticAMD GNU/Linux
gcc version 4.5.2 (GCC) 

* Method

** Library

Apply the patch. Build an optimized library (I used 12S in all runs). Build the 
library, rwtest, and locale database:

$ nice make -Clib
$ nice make -Cbin locales
$ nice make -Crwtest 

Properly export the necessary envar if running against STDCXX locale
database or unset, otherwise:

$ export RWSTD_LOCALE_ROOT=/path/to/.../nls

** Test program

Place the multi-threaded program source file, t.cpp, in
srcdir/tests/localization and run make

Re: STDCXX-1071 numpunct facet defect

2012-09-28 Thread Liviu Nicoara

I thought I replied but I see no trace of my post:

On 09/27/12 20:27, Martin Sebor wrote:

On 09/27/2012 06:41 AM, Liviu Nicoara wrote:

On 09/26/12 20:12, Liviu Nicoara wrote:

I have created STDCXX-1071 and linked to STDCXX-1056. [...]

I am open to all questions, the more the better. Most of my opinions
have been expressed earlier, but please ask if you want to know more.



I am attaching here the proposed (4.3.x) patch and the timings results
(after re-verifying the correctness of the timing program and the
results). The 4.2.x patch, the 4.3.x patch, the test program and the
results file are also attached to the incident.


The patch isn't binary compatible. We can't remove data members
in a minor release. We could only do it in a major release.


I posted two patches, one for 4.3.x and one for 4.2.x (the latter is binary 
compatible). My understanding was that minor version revisions may break binary 
compatibility.

Liviu


STDCXX-1072 SPARC V8 mutex alignment requirements

2012-09-28 Thread Liviu Nicoara

I have created the above and linked it to the closed STDCXX-1066.

In short, my reading about this issue is that the kernel patch changed the 
alignment of the userland mutex objects from a machine word to a double-word 
boundary. No changes are required of the users who use such objects in their 
programs unless users create mutex objects in buffers which may not be aligned 
on a proper boundary. E.g., the following are safe:

mutex_t lock;

struct S {
char misalign;
mutex_t lock;
};

whereas the following is not:

union {
void* align;
char buf [sizeof mutex_t];
} u;

new (u) mutex_t;

because the alignment requirements for void pointer are less strict than for 
mutex_t. A few places in the library use the latter for all sorts of static 
objects (mostly local statics). I looked esp. for places where we build objects 
that contain mutex sub-objects inside a union-aligned buffer:

struct S {
char c;
mutex_t m;
};

...

union {
void* align; // - incorrect
char buf [sizeof (S)];
} u;

new (u) S ();

The alignment must be changed to a value equal or greater than the mutex 
alignment requirements.

IMO, the patch I attached does not break binary compatibility. It uses a one 
size fits all long double for alignment -- like we use in rw/_mutex.h -- but in 
doing so dispenses with all sorts of preprocessor conditionals.

I still don't have access to a SPARC machine. Any feed-back and/or SPARC build 
results are more than welcome!

Thanks!

Liviu

Index: src/messages.cpp
===
--- src/messages.cpp(revision 1390953)
+++ src/messages.cpp(working copy)
@@ -65,7 +65,7 @@ struct __rw_open_cat_data
 {
 nl_catd catd;
 union {
-void *_C_align;
+long double _C_align;
 char _C_data [sizeof (_STD::locale)];
 } loc;
 };
Index: src/locale_body.cpp
===
--- src/locale_body.cpp (revision 1390953)
+++ src/locale_body.cpp (working copy)
@@ -1025,7 +1025,7 @@ _C_manage (__rw_locale *plocale, const c
 // the classic C locale is statically allocated
 // and not destroyed during the lifetime of the process
 static union {
-void* _C_align;
+long double _C_align;
 char  _C_buf [sizeof (__rw_locale)];
 } classic_body;
 
Index: src/use_facet.h
===
--- src/use_facet.h (revision 1390953)
+++ src/use_facet.h (working copy)
@@ -64,7 +64,7 @@
else {  \
/* construct an ordinary facet in static memory */  \
static union {  \
-   void *align_;   \
+   long double align_; \
char  data_ [sizeof (__rw_ ## fid ## _facet)];  \
} f;\
static __rw_facet* const pf =   \
@@ -91,7 +91,7 @@
{   \
/* construct an ordinary facet in static memory */  \
static union {  \
-   void *align_;   \
+   long double align_; \
char  data_ [sizeof (__rw_ ## fid ## _facet)];  \
} f;\
static __rw_facet* const pf =   \
Index: src/ctype.cpp
===
--- src/ctype.cpp   (revision 1390953)
+++ src/ctype.cpp   (working copy)
@@ -627,7 +627,7 @@ _RWSTD_EXPORT __rw_facet* __rw_ct_ctype
 }
 else {
 static union {
-void *align_;
+long double align_;
 char  data_ [sizeof (_STD::ctypechar)];
 } f;
 static __rw_facet* const pf =
Index: src/locale_classic.cpp
===
--- src/locale_classic.cpp  (revision 1390953)
+++ src/locale_classic.cpp  (working copy)
@@ -39,7 +39,7 @@ _RWSTD_NAMESPACE (__rw) {
 
 // static buffer for the classic C locale object
 static union {
-void* _C_align;
+long double _C_align;
 char  _C_buf [sizeof (_STD::locale)];
 } __rw_classic;
 


Re: STDCXX-1072 SPARC V8 mutex alignment requirements

2012-09-28 Thread Liviu Nicoara

On 09/28/12 08:29, Liviu Nicoara wrote:

I have created the above and linked it to the closed STDCXX-1066.
[...]
IMO, the patch I attached does not break binary compatibility.


Scratch this, I haven't thought it through.

Thanks,
Liviu


Re: STDCXX-1072 SPARC V8 mutex alignment requirements

2012-09-28 Thread Liviu Nicoara

On 09/28/12 08:45, Liviu Nicoara wrote:

On 09/28/12 08:29, Liviu Nicoara wrote:

I have created the above and linked it to the closed STDCXX-1066.
[...]
IMO, the patch I attached does not break binary compatibility.


Scratch this, I haven't thought it through.


Actually, after more thought, I believe that the patch itself is not causing 
binary incompatibility. The library built on a kernel-patched system will be 
binary incompatible with a library built on a previous system, regardless of 
the patch I presented earlier, because new mutex alignment requirements will 
have changed the size and layout of library objects. Comments are welcome.

Thanks,
Liviu



Re: STDCXX-1071 numpunct facet defect

2012-09-28 Thread Liviu Nicoara

On 09/28/12 11:01, Travis Vitek wrote:

Only major versions can break binary. The versioning policy for stdcxx can be 
found here..

   http://stdcxx.apache.org/versions.html


Thanks, that clarifies things.

Liviu


Re: STDCXX-1072 SPARC V8 mutex alignment requirements

2012-09-28 Thread Liviu Nicoara

On 09/28/12 11:45, Travis Vitek wrote:

Liviu,

Sorry I didn't look until just now, but it appears that I could have re-opened 
STDCXX-1066. I only see the 'Reopen Issue' button for STDCXX issues, but it is 
most definitely there. Perhaps there is some sort of permission issue for you?


It's ok, I think it's somewhat cleaner to create a new one and link it to the 
old one. Even if clean was not a concern, it was within Stefan's options to 
close the incident. I don't know.



Also, STDCXX-1066 appears to have been a duplicate of STDCXX-1040.


Yep, forgot about it, I am thinking about linking that one, too.

Thanks.


Re: STDCXX-1072 SPARC V8 mutex alignment requirements

2012-09-28 Thread Liviu Nicoara

On 09/28/12 11:32, Travis Vitek wrote:



-Original Message-
From: Liviu Nicoara
Sent: Friday, September 28, 2012 5:29 AM





In short, my reading about this issue is that the kernel patch changed
the alignment of the userland mutex objects from a machine word to a
double-word boundary. No changes are required of the users who use such
objects in their programs unless users create mutex objects in buffers
which may not be aligned on a proper boundary.


Your reading of this is consistent with my previous understanding of the 
problem, so that is good.



I still don't have access to a SPARC machine. Any feed-back and/or
SPARC build results are more than welcome!



I can provide build results for SPARCV9 if we want them, but I thought that the 
problem only came up on 32-bit SPARCV8 builds.


I guess a -xarch=sparcv8 build will do. It is quite unlikely anybody has a real 
32-bit SPARC hardware.



The patch assumes the type `long double' exists on every platform. While I do 
believe that it is available everywhere, we have lots of conditional code 
guarding its use in the library now. If we are going to use `long double' in 
this context, we should guard it with _RWSTD_NO_LONG_DOUBLE. I think an even 
cleaner solution is to switch to using __rw_aligned_buffer instead. It gives us 
a single point of failure for alignment issues like this, and it makes the code 
self-documenting and easier to read.


I took a cue from an alignment in the rw/_mutex.h code there. I'll do better.



As for your concerns about binary compatibility, I think that everything should 
be okay. We aren't changing the size of anything that is being passed around, 
we're just changing its alignment. I could be wrong, but I'm pretty sure that 
we're safe.



The library object sizes and layouts will change with or without our patch. 
Before the kernel patch our objects will have one layout and size and different 
ones afterwards. E.g.:

struct locale {
int c;
pthread_mutex_t lock;
};

before kernel patching you'd have no padding between the members. That's what I 
thought when firing that second post about compatibility.

Liviu


Re: STDCXX-1072 SPARC V8 mutex alignment requirements

2012-09-28 Thread Liviu Nicoara

On 09/28/12 13:51, Martin Sebor wrote:

[...]
One other comment: I would suggest choosing subjects for bug
reports that reflect the problem rather than a fix for it or
a rationale for it. For  STDCXX-1066 I think something like
Library mutex objects misaligned on SPARCV8 would better
capture the problem than the current title. (It's also up
to us to rename an issue if we find it more descriptive
than the original.)


I can't do that myself. I looked at that and 1056 and there is no button for me 
to reopen, or to edit stuff which is not mine.

Needless to say, I like my issues better. But I have no problems with the 
re-opening of the closed ones.

Thanks,
Liviu



Re: STDCXX-1071 numpunct facet defect

2012-09-28 Thread Liviu Nicoara

On 9/28/12 11:01 AM, Travis Vitek wrote:

Only major versions can break binary. The versioning policy for stdcxx can be 
found here..

   http://stdcxx.apache.org/versions.html


I have renamed the binary-incompatible patch as patch-5.0.x.diff.

Thanks,
Liviu



Travis

-Original Message-
From: Liviu Nicoara [mailto:nikko...@hates.ms]
Sent: Friday, September 28, 2012 3:52 AM
To: dev@stdcxx.apache.org
Subject: Re: STDCXX-1071 numpunct facet defect

I thought I replied but I see no trace of my post:

On 09/27/12 20:27, Martin Sebor wrote:

On 09/27/2012 06:41 AM, Liviu Nicoara wrote:

On 09/26/12 20:12, Liviu Nicoara wrote:

I have created STDCXX-1071 and linked to STDCXX-1056. [...]

I am open to all questions, the more the better. Most of my opinions
have been expressed earlier, but please ask if you want to know more.



I am attaching here the proposed (4.3.x) patch and the timings results
(after re-verifying the correctness of the timing program and the
results). The 4.2.x patch, the 4.3.x patch, the test program and the
results file are also attached to the incident.


The patch isn't binary compatible. We can't remove data members
in a minor release. We could only do it in a major release.


I posted two patches, one for 4.3.x and one for 4.2.x (the latter is binary 
compatible). My understanding was that minor version revisions may break binary 
compatibility.

Liviu





[PATCH] Re: STDCXX-1072 SPARC V8 mutex alignment requirements

2012-09-29 Thread Liviu Nicoara

On 9/28/12 11:32 AM, Travis Vitek wrote:



-Original Message-
From: Liviu Nicoara
Sent: Friday, September 28, 2012 5:29 AM
[...]

The patch assumes the type `long double' exists on every platform. While I do 
believe that it is available everywhere, we have lots of conditional code 
guarding its use in the library now. If we are going to use `long double' in 
this context, we should guard it with _RWSTD_NO_LONG_DOUBLE. I think an even 
cleaner solution is to switch to using __rw_aligned_buffer instead. It gives us 
a single point of failure for alignment issues like this, and it makes the code 
self-documenting and easier to read.


I am attaching another patch here, which makes use of the __rw_aligned_buffer, 
slightly more verbose but the code is slightly cleaner.


Thanks!

Liviu


Index: src/messages.cpp
===
--- src/messages.cpp(revision 1390953)
+++ src/messages.cpp(working copy)
@@ -38,6 +38,7 @@
 
 #include rw/_mutex.h
 
+#include podarray.h
 
 #if !defined (_RWSTD_NO_NL_TYPES_H)  !defined (_RWSTD_NO_CATOPEN_CATGETS)
 #  include nl_types.h
@@ -64,10 +65,7 @@ _RWSTD_NAMESPACE (__rw) {
 struct __rw_open_cat_data
 {
 nl_catd catd;
-union {
-void *_C_align;
-char _C_data [sizeof (_STD::locale)];
-} loc;
+__rw_aligned_buffer_STD::locale loc;
 };
 
 struct __rw_open_cat_page
@@ -159,7 +157,8 @@ __rw_manage_cat_data (int cat,  __rw_op
 cat = int (n_catalogs);
 
 catalogs [cat]-catd = pcat_data-catd;
-memcpy (catalogs [cat]-loc, pcat_data-loc,
+memcpy (catalogs [cat]-loc._C_store (), 
+pcat_data-loc._C_store (),
 sizeof (_STD::locale));
 
 if (size_t (cat)  largest_cat)
@@ -175,7 +174,8 @@ __rw_manage_cat_data (int cat,  __rw_op
 }
 
 catalogs [cat]-catd = pcat_data-catd;
-memcpy (catalogs [cat]-loc, pcat_data-loc,
+memcpy (catalogs [cat]-loc._C_store (),
+pcat_data-loc._C_store (),
 sizeof (_STD::locale));
 
 if (size_t (cat)  largest_cat)
@@ -258,8 +258,9 @@ int __rw_cat_open (const _STD::string c
 return -1;
 
 __rw_open_cat_data cat_data;
+
 cat_data.catd = catd;
-new (cat_data.loc) _STD::locale (loc);
+new (cat_data.loc._C_store ()) _STD::locale (loc);
 
 int cat = -1;
 __rw_manage_cat_data (cat, cat_data);
@@ -307,7 +308,7 @@ const _STD::locale __rw_get_locale (int
 
 _RWSTD_ASSERT (0 != pcat_data);
 
-return *(_RWSTD_REINTERPRET_CAST (_STD::locale*, (pcat_data-loc)));
+return *pcat_data-loc._C_data ();
 }
 
 
@@ -329,8 +330,7 @@ void __rw_cat_close (int cat)
 
 catclose (pcat_data-catd);
 
-_STD::locale* const ploc =
-_RWSTD_REINTERPRET_CAST (_STD::locale*, pcat_data-loc);
+_STD::locale* const ploc = pcat_data-loc._C_data ();
 
 ploc-~locale ();
 
Index: src/locale_body.cpp
===
--- src/locale_body.cpp (revision 1390953)
+++ src/locale_body.cpp (working copy)
@@ -1024,14 +1024,12 @@ _C_manage (__rw_locale *plocale, const c
 
 // the classic C locale is statically allocated
 // and not destroyed during the lifetime of the process
-static union {
-void* _C_align;
-char  _C_buf [sizeof (__rw_locale)];
-} classic_body;
+static __rw_aligned_buffer__rw_locale classic_body;
 
 // construct a locale body in place
 // with the initial reference count of 1
-classic = new (classic_body) __rw_locale (locname);
+classic = new (classic_body._C_store ()) 
+__rw_locale (locname);
 
 _RWSTD_ASSERT (1 == classic-_C_ref);
 }
Index: src/use_facet.h
===
--- src/use_facet.h (revision 1390953)
+++ src/use_facet.h (working copy)
@@ -37,6 +37,7 @@
 #include rw/_defs.h
 
 #include access.h
+#include podarray.h
 
 
 // helper macro _RWSTD_DEFINE_FACET_FACTORY() defines a facet factory
@@ -63,12 +64,9 @@
}   \
else {  \
/* construct an ordinary facet in static memory */  \
-   static union {  \
-   void *align_;   \
-   char  data_ [sizeof (__rw_ ## fid ## _facet)];  \
-   } f

STDCXX-970 and locale tests

2012-09-29 Thread Liviu Nicoara
While checking for fallout from 1072 I stumbled upon the collate test and its 
numerous (unrelated to 1072) failures. While I am looking at it I have a 
question: it seems to me that the locale tests do not test against STDCXX locale 
database. Is that right? So, all locale tests are supposed to be run without 
RWSTD_LOCALE_ROOT defined?


Thanks,
Liviu


Fwd: Re: STDCXX-1071 numpunct facet defect

2012-09-30 Thread Liviu Nicoara

Forwarding with the attachment.

 Original Message 
Subject: Re: STDCXX-1071 numpunct facet defect
Date: Sun, 30 Sep 2012 12:09:10 -0600
From: Martin Sebor mse...@gmail.com
To: Liviu Nicoara nikko...@hates.ms

On 09/27/2012 06:36 PM, Liviu Nicoara wrote:

On 9/27/12 8:27 PM, Martin Sebor wrote:

On 09/27/2012 06:41 AM, Liviu Nicoara wrote:

On 09/26/12 20:12, Liviu Nicoara wrote:

I have created STDCXX-1071 and linked to STDCXX-1056. [...]

I am open to all questions, the more the better. Most of my opinions
have been expressed earlier, but please ask if you want to know more.



I am attaching here the proposed (4.3.x) patch and the timings results
(after re-verifying the correctness of the timing program and the
results). The 4.2.x patch, the 4.3.x patch, the test program and the
results file are also attached to the incident.


The patch isn't binary compatible. We can't remove data members
in a minor release. We could only do it in a major release.


There are two of them, one for 4.2.x, one for 4.3.x.



I'm still curious about the performance, though. It doesn't make
sense to me that a call to a virtual function is faster than one
to an almost trivial inline function.


I scratched my head long on that. I wager that it depends on the
machine, too.


Here are my timings for library-reduction.cpp when compiled
GCC 4.5.3 on Solaris 10 (4 SPARCV9 CPUs). I had to make a small
number of trivial changes to get it to compile:

 With cache   No cache
real1m38.332s 8m58.568s
user6m30.244s34m25.942s
sys 0m0.060s  0m3.922s

I also experimented with the program on Linux (CEL 4 with 16
CPUs). Initially, I saw no differences between the two versions.
So I modified it a bit to make it closer to the library (the
modified program is attached). With those changes the timings
are below:

 With cache   No cache
real0m 1.107s0m 5.669s
user0m17.204s0m 5.669s
sys 0m 0.000s0m22.347s

I also recompiled and re-ran the test on Solaris. To speed
things along, I set the number threads and loops to 8 and
100. The numbers are as follows:

 With cache   No cache
real0m3.341s 0m26.333s
user0m13.052s1m37.470s
sys 0m0.009s 0m0.132s

The numbers match my expectation. The overhead without the
numpunct cache is considerable.

Somewhat unexpectedly, the test with the cache didn't crash.

I also tried timing the numpunct patch attached to the issue
with the test program. The test program runs to completion
without the patch but it crashes with a SIGSEGV after the
patch is applied. That suggests there's a bug somewhere in
do_thousands_sep() (in addition to the caching).

Martin



Let me know if there is anything I could help with.

Liviu





#include iostream
#include locale

#include cstdio
#include cstdlib
#include cstring

#include pthread.h
#include unistd.h

#define MAX_THREADS 128

static long nloops = 1000, nthreads = 16;
static bool volatile pwait = true;



namespace X {

struct guard 
{
guard (pthread_mutex_t* ptr) : lock_ (ptr) { 
pthread_mutex_lock (lock_);
}

~guard () { 
pthread_mutex_unlock (lock_); 
}

private:

guard (guard const);
guard operator= (guard const);

private:

pthread_mutex_t* lock_;
};

struct facet
{
facet () {
pthread_mutex_init (_C_mutex, 0);
}

void* _C_data () {
return _C_impsize ? _C_impdata : _C_get_data ();
}

void* _C_get_data ();

size_t _C_impsize;
void*  _C_impdata;

pthread_mutex_t _C_mutex;
};

void* facet::_C_get_data ()
{
if (_C_impsize)
return _C_impdata;

guard g (_C_mutex);

if (_C_impsize)
return _C_impdata;

#if !defined (NO_USE_STDCXX_LOCALES)
_C_impdata = (void*)\3\3;
#endif // USE_STDCXX_LOCALES
_C_impsize = (size_t)(-1);

return _C_impdata;
}

void* get_data (facet*);

struct numpunct : facet
{
numpunct () : _C_flags () { }

virtual std::string do_grouping () const;

std::string grouping () const {
numpunct* const __self = (numpunct*)this;

#if !defined (NO_USE_NUMPUNCT_CACHE)
if (!(_C_flags  1)) {
__self-_C_flags  |= 1;
__self-_C_grouping = do_grouping ();
}

return _C_grouping;
#else
return do_grouping ();
#endif // NO_USE_NUMPUNCT_CACHE
}

private:

int  _C_flags;
std::string  _C_grouping;
};

std::string numpunct::do_grouping () const
{
return (const char*)get_data (const_castnumpunct*(this));
}

void* get_data (facet* pfacet)
{
void* pdata = pfacet-_C_data ();

if (pdata) 
return pdata;

// __rw_setlocale loc (...); - other mutex

if (pfacet-_C_data ())
return get_data (pfacet);

pfacet-_C_impdata = const_castchar*(\4\4);
pfacet-_C_impsize = (size_t)(-1);

return 0;
}

} // namespace X

Re: Fwd: Re: STDCXX-1071 numpunct facet defect

2012-09-30 Thread Liviu Nicoara

On 9/30/12 2:21 PM, Liviu Nicoara wrote:

Forwarding with the attachment.

 Original Message 
Subject: Re: STDCXX-1071 numpunct facet defect
Date: Sun, 30 Sep 2012 12:09:10 -0600
From: Martin Sebor mse...@gmail.com
To: Liviu Nicoara nikko...@hates.ms


On 9/27/12 8:27 PM, Martin Sebor wrote:


Here are my timings for library-reduction.cpp when compiled
GCC 4.5.3 on Solaris 10 (4 SPARCV9 CPUs). I had to make a small
number of trivial changes to get it to compile:

  With cache   No cache
real1m38.332s 8m58.568s
user6m30.244s34m25.942s
sys 0m0.060s  0m3.922s

I also experimented with the program on Linux (CEL 4 with 16
CPUs). Initially, I saw no differences between the two versions.
So I modified it a bit to make it closer to the library (the
modified program is attached). With those changes the timings


I see the difference -- your program has a virtual function it calls from the 
inline grouping function.



are below:

  With cache   No cache
real0m 1.107s0m 5.669s
user0m17.204s0m 5.669s
sys0m 0.000s0m22.347s

I also recompiled and re-ran the test on Solaris. To speed
things along, I set the number threads and loops to 8 and
100. The numbers are as follows:

  With cache   No cache
real0m3.341s 0m26.333s
user0m13.052s1m37.470s
sys 0m0.009s 0m0.132s

The numbers match my expectation. The overhead without the
numpunct cache is considerable.


I have done another (smaller) round of measurements, this time using the test 
program you posted. Here are the results:


* iMac, 4x Intel, 12S:

16, 1000:

Cached   Not cached
real0m9.300s 0m5.224s
user0m36.441s0m20.523s
sys 0m0.043s 0m0.068s

* iMac, 4x Intel, 12D:

CachedNot cached
real0m9.012s  0m5.774s
user0m35.343s 0m20.997s
sys 0m0.045s  0m0.183s

* Linux Slackware, 16x AMD Opteron, 12S:

16, 1000:

Cached Not cached
real0m29.798s  0m3.278s
user0m48.662s  0m47.338s
sys 6m18.525s  0m3.298s



Somewhat unexpectedly, the test with the cache didn't crash.


On my iMac it did not crash for me either (gcc 4.5.4), this time. On the other 
box (gcc 4.5.2) crashed every time with caching, so I had to add a call to 
fac.grouping outside the thread function to initialize the facet.


Liviu


Fwd: Re: Fwd: Re: STDCXX-1071 numpunct facet defect

2012-09-30 Thread Liviu Nicoara

Forwarding to the list. Duh.


 Original Message 
Subject: Re: Fwd: Re: STDCXX-1071 numpunct facet defect
Date: Sun, 30 Sep 2012 19:02:27 -0400
From: Liviu Nicoara nikko...@hates.ms
To: Martin Sebor mse...@gmail.com

On 9/30/12 6:18 PM, Martin Sebor wrote:

I see you did a 64-bit build while I did a 32-bit one. so
I tried 64-bits. The cached version (i.e., the one compiled
with -UNO_USE_NUMPUNCT_CACHE) is still about twice as fast
as the non-cached one (compiled with -DNO_USE_NUMPUNCT_CACHE).

I had made one change to the test program that I thought might
account for the difference: I removed the call to abort from
the thread function since it was causing the process to exit
prematurely in some of my tests. But since you used the
modified program for your latest measurements that couldn't
be it.

I can't explain the differences. They just don't make sense
to me. Your results should be the other way around. Can you
post the disassembly of function f() for each of the two
configurations of the test?



Here they are.

Liviu




Dump of assembler code for function f:
   0x00403870 +0: push   %r15
   0x00403872 +2: push   %r14
   0x00403874 +4: push   %r13
   0x00403876 +6: push   %r12
   0x00403878 +8: push   %rbp
   0x00403879 +9: push   %rbx
   0x0040387a +10:mov%rdi,%rbx
   0x0040387d +13:sub$0x38,%rsp
   0x00403881 +17:nopl   0x0(%rax)
   0x00403888 +24:movzbl 0x261f11(%rip),%eax# 0x6657a0 
_ZL5pwait
   0x0040388f +31:test   %al,%al
   0x00403891 +33:jne0x403888 f+24
   0x00403893 +35:cmpq   $0x0,0x261ef5(%rip)# 0x665790 
_ZL6nloops
   0x0040389b +43:jle0x403b12 f+674
   0x004038a1 +49:xor%ebp,%ebp
   0x004038a3 +51:xor%r12d,%r12d
   0x004038a6 +54:lea0x10(%rsp),%r13
   0x004038ab +59:lea0x48(%rbx),%r14
   0x004038af +63:lea0x20(%rsp),%r15
   0x004038b4 +68:jmpq   0x4039a7 f+311
   0x004038b9 +73:nopl   0x0(%rax)
   0x004038c0 +80:cmp$0x66a020,%rdx
   0x004038c7 +87:mov%rdi,0x20(%rsp)
   0x004038cc +92:je 0x403ab8 f+584
   0x004038d2 +98:mov%rdx,%rdi
   0x004038d5 +101:   mov%rdx,(%rsp)
   0x004038d9 +105:   callq  0x403658 pthread_mutex_lock@plt
   0x004038de +110:   test   %eax,%eax
   0x004038e0 +112:   mov(%rsp),%rdx
   0x004038e4 +116:   je 0x4038fb f+139
   0x004038e6 +118:   mov$0x4452a4,%esi
   0x004038eb +123:   mov$0xa,%edi
   0x004038f0 +128:   xor%eax,%eax
   0x004038f2 +130:   callq  0x404370 _ZN4__rw10__rw_throwEiz
   0x004038f7 +135:   mov(%rsp),%rdx
   0x004038fb +139:   addl   $0x1,0x28(%rdx)
   0x004038ff +143:   test   %rdx,%rdx
   0x00403902 +146:   je 0x40390c f+156
   0x00403904 +148:   mov%rdx,%rdi
   0x00403907 +151:   callq  0x4036c8 pthread_mutex_unlock@plt
   0x0040390c +156:   mov0x20(%rsp),%rdx
   0x00403911 +161:   mov%rdx,%rdi
   0x00403914 +164:   mov%rdx,(%rsp)
   0x00403918 +168:   callq  0x403258 strlen@plt
   0x0040391d +173:   mov(%rsp),%rdx
   0x00403921 +177:   add%rax,%r12
   0x00403924 +180:   lea-0x40(%rdx),%rcx
   0x00403928 +184:   cmp$0x66a020,%rcx
   0x0040392f +191:   je 0x40398b f+283
   0x00403931 +193:   mov%rcx,%rdi
   0x00403934 +196:   mov%rcx,0x8(%rsp)
   0x00403939 +201:   callq  0x403658 pthread_mutex_lock@plt
   0x0040393e +206:   test   %eax,%eax
   0x00403940 +208:   mov(%rsp),%rdx
   0x00403944 +212:   mov0x8(%rsp),%rcx
   0x00403949 +217:   je 0x403965 f+245
   0x0040394b +219:   mov$0x4452a4,%esi
   0x00403950 +224:   mov$0xa,%edi
   0x00403955 +229:   xor%eax,%eax
   0x00403957 +231:   callq  0x404370 _ZN4__rw10__rw_throwEiz
   0x0040395c +236:   mov0x8(%rsp),%rcx
   0x00403961 +241:   mov(%rsp),%rdx
   0x00403965 +245:   mov-0x18(%rdx),%esi
   0x00403968 +248:   test   %rcx,%rcx
   0x0040396b +251:   lea-0x1(%rsi),%eax
   0x0040396e +254:   mov%eax,-0x18(%rdx)
   0x00403971 +257:   je 0x403983 f+275
   0x00403973 +259:   mov%rcx,%rdi
   0x00403976 +262:   mov%esi,0x8(%rsp)
   0x0040397a +266:   callq  0x4036c8 pthread_mutex_unlock@plt
   0x0040397f +271:   mov0x8(%rsp),%esi
   0x00403983 +275:   test   %esi,%esi
   0x00403985 +277:   jle0x403a80 f+528
   0x0040398b +283:   add$0x1,%ebp
   0x0040398e +286:   movq   $0x0,0x20(%rsp

Re: Fwd: Re: STDCXX-1071 numpunct facet defect

2012-10-02 Thread Liviu Nicoara

On 09/30/12 18:18, Martin Sebor wrote:

I see you did a 64-bit build while I did a 32-bit one. so
I tried 64-bits. The cached version (i.e., the one compiled
with -UNO_USE_NUMPUNCT_CACHE) is still about twice as fast
as the non-cached one (compiled with -DNO_USE_NUMPUNCT_CACHE).

I had made one change to the test program that I thought might
account for the difference: I removed the call to abort from
the thread function since it was causing the process to exit
prematurely in some of my tests. But since you used the
modified program for your latest measurements that couldn't
be it.

I can't explain the differences. They just don't make sense
to me. Your results should be the other way around. Can you
post the disassembly of function f() for each of the two
configurations of the test?


The first thing that struck me in the cached `f' was that __string_ref 
class uses a mutex for synchronizing access to the ref counter. It turns 
out, for Linux on AMD64 we explicitly use a mutex instead of the atomic 
ops on the ref counter, via a block in rw/_config.h:


#  if _RWSTD_VER_MAJOR  5
#ifdef _RWSTD_OS_LINUX
   // on Linux/AMD64, unless explicitly requested, disable the use
   // of atomic operations in string for binary compatibility with
   // stdcxx 4.1.x
#  ifndef _RWSTD_USE_STRING_ATOMIC_OPS
#define _RWSTD_NO_STRING_ATOMIC_OPS
#  endif   // _RWSTD_USE_STRING_ATOMIC_OPS
#endif   // _WIN32
#  endif   // stdcxx  5.0


That is not the cause for the performance difference, though. Even after 
building with __RWSTD_USE_STRING_ATOMIC_OPS I get the same better 
performance with the non-cached version.


Liviu


Re: Fwd: Re: STDCXX-1071 numpunct facet defect

2012-10-03 Thread Liviu Nicoara

On 10/02/12 10:41, Martin Sebor wrote:

I haven't had time to look at this since my last email on
Sunday. I also forgot about the string mutex. I don't think
I'll have time to spend on this until later in the week.
Unless the disassembly reveals the smoking gun, I think we
might need to simplify the test to get to the bottom of the
differences in our measurements. (I.e., eliminate the library
and measure the runtime of a simple thread loop, with and
without locking.) We should also look at the GLIBC and
kernel versions on our systems, on the off chance that
there has been a change that could explain the discrepancy
between my numbers and yours. I suspect my system (RHEL 4.8)
is much older than yours (I don't remember now if you posted
your details).


I am gathering some more measurements along these lines but it's time 
consuming. I estimate I will have some ready for review later today or 
tomorrow. In the meantime could you please post your kernel, glibc and 
compiler versions?


Liviu



Martin

On 10/02/2012 06:22 AM, Liviu Nicoara wrote:

On 09/30/12 18:18, Martin Sebor wrote:

I see you did a 64-bit build while I did a 32-bit one. so
I tried 64-bits. The cached version (i.e., the one compiled
with -UNO_USE_NUMPUNCT_CACHE) is still about twice as fast
as the non-cached one (compiled with -DNO_USE_NUMPUNCT_CACHE).

I had made one change to the test program that I thought might
account for the difference: I removed the call to abort from
the thread function since it was causing the process to exit
prematurely in some of my tests. But since you used the
modified program for your latest measurements that couldn't
be it.

I can't explain the differences. They just don't make sense
to me. Your results should be the other way around. Can you
post the disassembly of function f() for each of the two
configurations of the test?


The first thing that struck me in the cached `f' was that __string_ref
class uses a mutex for synchronizing access to the ref counter. It turns
out, for Linux on AMD64 we explicitly use a mutex instead of the atomic
ops on the ref counter, via a block in rw/_config.h:

# if _RWSTD_VER_MAJOR  5
# ifdef _RWSTD_OS_LINUX
// on Linux/AMD64, unless explicitly requested, disable the use
// of atomic operations in string for binary compatibility with
// stdcxx 4.1.x
# ifndef _RWSTD_USE_STRING_ATOMIC_OPS
# define _RWSTD_NO_STRING_ATOMIC_OPS
# endif // _RWSTD_USE_STRING_ATOMIC_OPS
# endif // _WIN32
# endif // stdcxx  5.0


That is not the cause for the performance difference, though. Even after
building with __RWSTD_USE_STRING_ATOMIC_OPS I get the same better
performance with the non-cached version.

Liviu




  1   2   >