from:"Stefan Teleman"

Re: C++0x support?

2010-06-04 Thread Stefan Teleman

2010/6/4 C. Bergström cbergst...@pathscale.com:

 Hi Stefan,

 Does that mean the suncc team will be helping to improve it?  If neither
 please don't hijack threads.  I removed maybe too much context from the
 email, but it was in reference to C++0x + OpenSolaris.

You should ask the compiler team directly what their plans are. I
cannot speak for them.

My email was in response to your statement about supporting C++0X in
OpenSolaris with the Sun C++ compiler (which you have restated in this
message).

I wonder how you plan on supporting C++0X in OpenSolaris with the Sun
C++ compiler, when this compiler does not currently support *any*
C++0X features, and support for these features will not be available
for quite some time. And the compiler is not open source.

So no, I am not hijacking threads. I am seeking some clarification
with respect to your own statements.

--Stefan

-- 
Stefan Teleman
KDE e.V.
stefan.tele...@gmail.com

Re: C++0x support?

2010-06-04 Thread Stefan Teleman

2010/6/4 C. Bergström cbergst...@pathscale.com:

 I checked my email and I think you just assumed sun cc..

Yes I assumed Sun CC when I read OpenSolaris, and I didn't quite see
any reference to PathScale.

--Stefan

Re: stdcxx, Solaris, KDE

2011-02-03 Thread Stefan Teleman

On Thu, Feb 3, 2011 at 16:40, Pavel Heimlich, a.k.a. hajma
tropikha...@gmail.com wrote:
 Hi,
 I'm one of the guys porting KDE to Solaris and we use stdcxx extensively 
 there.
 I just spent some time rediscovering an old stdcxx bug, which led me

Which old stdcxx bug are you referring to?

I am in the process of integrating a batch of patches for stdcxx into
Solaris 11 and Solaris 10 [1], and I would like to know which bug this
is -- that we haven't identified yet.

If you could file a CR with a bug description and possibly a simple
test case exercising the bug, it would be appreciated.

Thank you.

--Stefan

---

[1] Apache stdcxx will also become available in Solaris 10 starting
with Update 10, to be released sometime this year.

Re: STDCXX fork

2011-06-25 Thread Stefan Teleman

2011/6/26 C. Bergström cbergst...@pathscale.com:

 The last time we checked the patches they caused some boost regressions so 
 please make sure to run the boost test suite.

We don't run the Boost tests to validate the 2003 C++ Standard. We run
the 2003 C++ Standard validation tests. If strict conformance to the
2003 C++ Standard causes problems with Boost, then that's a Boost
problem and not a stdcxx problem.

There were indeed numerous deviations from the 2003 C++ Standard in
the original stdcxx implementation.

--Stefan

Re: STDCXX fork

2011-06-26 Thread Stefan Teleman

2011/6/26 C. Bergström cbergst...@pathscale.com:

 PathScale has a Perennial license and feel free to privately email which
 issues the patches specifically fix.

Great, then PathScale can run the Perennial C++ validation tests on
PathScale's recently published stdcxx fork.

I looked at the github code published by PathScale and it is obvious
to me that it has not been validated against *any* C++2003 validation
test harness.

--Stefan

Re: STDCXX fork

2011-06-26 Thread Stefan Teleman

2011/6/27 C. Bergström cbergst...@pathscale.com:

 Your false statements are annoying and unnecessary.

I deeply regret that I am annoying you.

 Please don't avoid the question as I'm trying to help review your changes.
  Either publicly or privately email which patch fixes which Perennial test.
  (If in fact you've run them at all)

Quite frankly, I really don't need your help in reviewing my patches.
They've already been reviewed.

My current job description does not require me to help you run the
Perennial validation tests, or to provide you with any information
about the Perennial test results. As a matter of fact, I don't even
have to provide you with patches at all. I am doing this as a
courtesy: you stated that you wanted to look at the Solaris patches.

You work for a compiler writer, and you stated you have a Perennial
license. You should, therefore, be able to run the Perennial tests
yourself.

I stand by my previous statement: you have not validated the github
fork of stdcxx against any validation test harness. Had you done so,
several tests would/should have failed. Had you corrected the stdcxx
code causing these failures (which you have not; I have verified that
the violations are still there), several tests from the apache stdcxx
test harness would have failed, and these tests would have required
patches too. I do not see the necessary code changes, and I can tell
all this by looking at the PathScale stdcxx fork code.

--Stefan

Re: [VOTE] Retirement of stdcxx to the 'Attic'?

2012-02-02 Thread Stefan Teleman

On Thu, Feb 2, 2012 at 12:03, William A. Rowe Jr. wr...@rowe-clan.net wrote:
Fans and contributors,

it appears that the stdcxx project is entirely dormant.  The ASF has
launched a new 'Attic' project over the past two years, to neatly
retire dormant works until and unless a community comes along who
wishes to revive the effort.

As a simple formality your votes please;

 [X ] -1 - No, stdcxx should not fold, I am still contributing, and
        [would serve|am serving] on its project management committee

 The results of this vote will be taken up by the ASF Board of Directors
 at their 15 Feb meeting.

I maintain/enhance stdcxx on Solaris 10 and 11 and Linux at Oracle
(with the Sun Studio compilers). I also test with gcc on the same
platforms (although we do not publish stdcxx for gcc).

stdcxx on Solaris is a long-term commitment on Oracle's part. It will
be available and maintained in Solaris for a long time to come. It
would be very sad for such a nice implementation of C++2003 to be
retired and left to gather dust.

--Stefan

Re: [disscuss] Retirement of stdcxx to the 'Attic'?

2012-02-02 Thread Stefan Teleman

On Thu, Feb 2, 2012 at 17:57, William A. Rowe Jr. wr...@rowe-clan.net wrote:

 The much larger issue is that the ASF is designed as a collaboration
 hub where multiple consumers can be represented.  It is designed to
 avoid the need for forks except in radical divisions within communities
 where two or more groups want the code to proceed in different directions.
 In order to remain a project, the ASF requires a PMC composed of the
 contributors to the project (committers) which represent active user -
 developers of the project's code, and are willing to both incorporate
 all reasonable changes and draw in new individuals who are frequently
 offering those changes.

 As a standards body implementation, we would /hope/ there aren't huge
 fractures in the direction of the code :)

 If there are multiple forks at this point, the questions are why, and
 what can be done to bring it all back together into a single community
 where no one company or individual is shouldering the burden of
 entirely maintaining the code on their own.

 Feel free to chime in here on these questions.

Speaking for the Solaris/Sun Studio C++ fork:

To begin with, it's not a fork. Or at least it was never intended to
be a fork (and still isn't). It's simply a very large collection of
patches, based on stdcxx 4.2.1. This was the last official stdcxx
release published at the ASF, and that's the release we used as a
starting point in Solaris.

There are four categories of patches:

1. Patches specific to Solaris and Sun Studio. These affect stdcxx's
GNUmakefiles*, sunpro.config and gcc.config. The GNUmakefile* patches
can probably be ignored for a general-purpose release. The
sunpro.config and gcc.config patches are useful for building on
Solaris.

2. Patches pertaining to a specific set of Solaris SPARC
idiosyncrasies. You can find more details about these patches here:

https://issues.apache.org/jira/browse/STDCXX-1040

3. Patches for stdcxx issues which were discovered while running the
stdcxx test harness, and for which there was no canonical resolution.
For example:

https://issues.apache.org/jira/browse/STDCXX-839

It turns out that std::numpunct and std::moneypunct are thread-unsafe,
because of std::basic_string's copy constructor and its shared
(reference-counted) buffer implementation.

4. Patches for issues discovered during C++2003 validation testing:

I wrote the patches based on failures or violations discovered while
running the Perennial CPPVS 8.1 (which is what we use internally) and
some other simple, trivial tests on stdcxx with Sun Studio C++.

These are general-purpose patches; they address problems independent
of platform or architecture. This is the largest set of patches.
Caveat: a few of these patches break binary compatibility (BC) with the
existing stdcxx 4.2.1 implementation. This may be a problem at the ASF;
for Solaris we had the advantage that stdcxx was new, so we could
afford to break BC (there was nothing to maintain BC with in the first
place).
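The thread-safety problem in item 3 (STDCXX-839) only shows up under concurrent use of a facet. Below is a minimal stress sketch of that access pattern. It assumes a C++11 compiler for std::thread, although stdcxx 4.2.1 itself targets C++2003. The names grouping_facet and stress_numpunct_grouping are hypothetical.

A stress test like this can expose a race but never prove its absence. On an implementation whose strings are not reference-counted, or whose counts are updated atomically, it should simply return 0.

```cpp
#include <cassert>
#include <locale>
#include <string>
#include <thread>
#include <vector>

// Hypothetical facet returning a non-empty grouping string, so that every
// call to grouping() hands back a copy of a real character buffer.
struct grouping_facet: std::numpunct<char> {
    std::string do_grouping () const { return "\3\3"; }
};

// Calls std::numpunct<char>::grouping() concurrently from nthreads threads.
// With a reference-counted std::string whose count is not updated
// atomically, each returned copy adjusts the count of a buffer shared by
// all threads, which is the kind of race STDCXX-839 describes.
// Returns how many threads saw an unexpected value (0 = none observed).
int
stress_numpunct_grouping (int nthreads, int iterations)
{
    const std::locale loc (std::locale::classic (), new grouping_facet);
    const std::numpunct<char>& np =
        std::use_facet<std::numpunct<char> >(loc);

    std::vector<int> bad (nthreads, 0);   // one slot per thread, no sharing
    std::vector<std::thread> threads;

    for (int t = 0; t < nthreads; ++t) {
        threads.emplace_back ([&np, &bad, t, iterations] () {
            for (int i = 0; i < iterations; ++i) {
                const std::string copy = np.grouping ();
                if (copy != "\3\3")
                    bad [t] = 1;
            }
        });
    }

    for (std::thread& th: threads)
        th.join ();

    int failed = 0;
    for (int b: bad)
        failed += b;
    return failed;
}
```

Build with threading enabled (for example `-pthread` with gcc) and call it from main with something like `stress_numpunct_grouping (8, 100000)`. A non-zero result or a crash points at the shared-buffer problem. The same pattern applies to std::moneypunct.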

Why did these patches never make it into stdcxx? Because by the time I
started submitting them here, stdcxx was already on its way to
becoming dormant, or at least very sleepy. A small set of very simple
patches made it into the yet-unreleased 4.2.2, but the large set of
complex patches never did.

At any rate, you can find the complete set of Studio C++ patches here:

http://kdesolaris-svn.cvsdude.com/trunk/STDCXX/4.2.1/
http://kdesolaris-svn.cvsdude.com/trunk/STDCXX/4.2.1/Solaris/

The README.Solaris file is out-of-date and very obsolete. Please ignore it.

This set of patches generates a stdcxx identical to the one available
with Solaris 10 and 11, with two exceptions:

ios_base.failure.90.diff and
stdexcept.91.diff

are not part of the canonical Solaris stdcxx release. Strictly
speaking, std::ios_base::failure should be a class and not a typedef
(making it a typedef, as the stdcxx implementation does, causes a
failure in a very specific and otherwise obscure CPPVS test case).
However, making it a proper class (as it is declared in 27.4.2.1.1)
has a noticeable performance impact, so we decided to leave it as is.
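The typedef-versus-class distinction is easiest to see at compile time. This is a minimal sketch under one assumption: that an elaborated-type-specifier is one construct that tells the two apart. That is our guess, not the CPPVS test case itself. The function names are hypothetical.

```cpp
#include <cassert>
#include <exception>
#include <ios>
#include <string>

// An elaborated-type-specifier ("class X") is ill-formed when X resolves
// to a typedef-name, so this function compiles only if
// std::ios_base::failure is declared as a genuine nested class.
bool
failure_is_a_class_name ()
{
    class std::ios_base::failure* p = 0;
    return 0 == p;
}

// The standard also derives failure from std::exception, so a thrown
// failure must be catchable through that base class.
bool
failure_caught_as_exception ()
{
    try {
        throw std::ios_base::failure ("stream error");
    }
    catch (const std::exception&) {
        return true;
    }
    return false;
}
```

With a typedef-based ios_base::failure, the first function would be rejected by the compiler, while the second would still work.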

svn co should work anonymously. If it doesn't, please let me know.

What can be done to bring it all back together into a single community:

1. Don't retire it to the Attic. :-) The Attic pretty much guarantees
that it will never be brought back all together.

2. Someone with stdcxx commit privileges should be part of this
reunification (for obvious reasons). It is very discouraging to submit
patches knowing full well and ahead of time that they will never make
it anywhere. Perhaps the process of submitting patches could be
somewhat less of a process.

Just my 0.02.

--Stefan

Re: [disscuss] Retirement of stdcxx to the 'Attic'?

2012-02-04 Thread Stefan Teleman

OK I will start submitting patches at stdcxx. Breaking them up into
smaller chunks will increase the number of patches though. :-) Stay
tuned.

I don't intend to push changes to the build system - we use gmake to
build stdcxx at Oracle.

--Stefan



On Fri, Feb 3, 2012 at 16:48, Andrew Black andrew.bl...@roguewave.com wrote:
 Like Farid, I too am willing to help process patches for review and
 submission. Once a track record has been established, someone on the PMC
 would likely raise a motion to designate you as a committer, as defined at
 http://stdcxx.apache.org/#committers . This would allow you to make changes
 directly to subversion without assistance. Do note that in order to be
 designated as such, you will need to have a Contributor License Agreement (
 http://www.apache.org/licenses/icla.txt ) on file with the Apache
 foundation. If you are being paid to perform this work, the company you work
 for will likely need to have a Corporate Contributor License Agreement (
 http://www.apache.org/licenses/cla-corporate.txt ) on file.

 If we are trying to revitalize this project, there are a few things I
 personally would/would not like to see in the patches:
 * I would not like to see major changes to the build infrastructure at this
 time. One of the goals of this project has been portability, and this
 includes the build infrastructure. My understanding is that gmake is
 considered to be more portable than some of the alternatives (cmake, ant).
 * I would like to see tests added to verify any library changes. Ideally the
 new tests will pass on most platforms, though we don't currently have an
 automated test mechanism in place. If any existing tests are incorrect,
 commentary for the change about why they are broken would be appreciated.
 * Changes destined for the 4.2.x branch should have forwards and backwards
 binary compatibility.
 * Changes destined for the 4.3.x branch should have backwards source
 compatibility.

 --Andrew Black


 On 02/03/2012 03:04 PM, Farid Zaripov wrote:

 On 03.02.2012 1:52, Stefan Teleman wrote:

 2. Someone with stdcxx commit privileges should be part of this
 reunification (for obvious reasons). It is very discouraging to submit
 patches knowing full well and ahead of time that they will never make
 it anywhere. Perhaps the process of submitting patches could be
 somewhat less of a process. Just my 0.02. --Stefan


    Stefan, if you split all your patches into a set of small finalized
 changes and submit them through a set of corresponding issues in JIRA, I
 promise I will process them all one by one.
 At the moment I don't see any issues, reported by you. Sorry, but
 process is a process.

 Farid.





Re: Check… 1 2 3

2012-05-01 Thread Stefan Teleman

On Tue, May 1, 2012 at 09:14, Jim Jagielski j...@jagunet.com wrote:
 is this thing on?

 Just checking :)


Yes, it works! :-)

--Stefan

Re: Apache Standard C++ Project chair change

2012-05-15 Thread Stefan Teleman

On Tue, May 15, 2012 at 9:33 AM, Jim Jagielski j...@jagunet.com wrote:
 Since being tasked as chair, I've seen no activity. There was
 an email from Bill regarding 2 outstanding iCLAs, but the response
 from one of the committers was less than optimistic.

 That's it. No Emails on dev@ or private@, no code activities,
 really no evidence at all of renewed interest/health/activity
 in stdcxx land.

 Am I wrong?

I haven't received any of Bill's emails, and I noticed that the
nightly build emails which had re-started in February have stopped.

I attached a proposed patch for stdcxx-1058 but I haven't received any comments.

--Stefan

Re: Apache Standard C++ Project chair change

2012-05-16 Thread Stefan Teleman

On Tue, May 15, 2012 at 10:15 AM, Stefan Teleman
stefan.tele...@gmail.com wrote:
 On Tue, May 15, 2012 at 9:33 AM, Jim Jagielski j...@jagunet.com wrote:
 Since being tasked as chair, I've seen no activity. There was
 an email from Bill regarding 2 outstanding iCLAs, but the response
 from one of the committers was less than optimistic.

 That's it. No Emails on dev@ or private@, no code activities,
 really no evidence at all of renewed interest/health/activity
 in stdcxx land.

 Am I wrong?

 I haven't received any of Bill's emails. and I noticed that the
 nightly build emails which had re-started in February have stopped.

 I attached a proposed patch for stdcxx-1058 but I haven't received any 
 comments.

I am going to ask the following questions in the open. Perhaps this
way I will get an answer:

1. Apparently, Bill's emails were sent and replied to on the private
list. I have not received any of them, although I am on the private
list. Or at least, I was on the private list as recently as late
March. Is this a case of mailing list malfunction? If that is the
case, it should be fixed.

2. Which Bill are we talking about? I know of two Bills at stdcxx. A
clarification would be welcome.

3. There was a recent and important PMC change at stdcxx.
Congratulations to Jim Jagielski for his new role as PMC Chair. Again,
I learned of this change yesterday, and that only from Jim's email. I
haven't received any notification about Jim's appointment prior to
yesterday, although, to be fair, I knew about it at the beginning of
May, because I read the ASF Board Minutes. Again, why wasn't there any
notification of this change?

4. The nightly build emails, which had restarted towards the end of
February/beginning of March, have stopped. Why is that?

Inquiring minds want to know.

Thank you.

--Stefan

Re: [jira] [Updated] (STDCXX-1058) std::basic_ios::copyfmt() with registered callback (via std::ios_base::register_callback()) run-time SIGABRT

2012-05-31 Thread Stefan Teleman

On Thu, May 17, 2012 at 11:58 AM, Martin Sebor mse...@gmail.com wrote:
 On 05/16/2012 09:23 PM, Stefan Teleman wrote:

 On Wed, May 16, 2012 at 2:58 PM, Martin Sebormse...@gmail.com  wrote:

 On 05/16/2012 11:55 AM, Travis Vitek wrote:


 I approve the change, but with one caveat. The branching policy [1]
 indicates that you should commit your changes directly to the 4.2.x
 branch.
 They should be merged from 4.2.x to 4.3.x, and then from 4.3.x to trunk.



 The test in the issue should be committed with the patch (after
 adding the necessary asserts, being renamed to follow the naming
 convention for regression tests, i.e., something like
 regress/27.basic_ios.stdcxx-1058.cpp, and decorated with the ASF
 license header).


 I attached 27.ios_base.event_callback.stdcxx-1058.cpp to stdcxx-1058,
 which will go in ${TOPDIR}/tests/regress/.


 Great! A few suggestions:

 The naming convention used for the tests is based on a few things:

 1. the section number in the standard of the tested functionality
   (this is only used for sorting)
 2. the name of the class whose members are being tested (if any)
 3. the name of the function or type (or member function or type)
   being tested (may be omitted for tests that exercise multiple
   members)
 4. for regression tests, the issue number

 Since this issue is about std::basic_ios::copyfmt() crashing (as
 the subject says), its name should probably look something like:

  27.basic_ios.copyfmt.stdcxx-1058.cpp

 Each test should exercise just the affected class, function, or
 type. It's best to avoid relying on parts of the library that
 aren't affected or the subject of the test so that when they break
 as a result of some unrelated change in the future we don't
 start seeing failures in unrelated tests. In this case, I would
 suggest avoiding fstream (and especially cerr) and relying
 directly on basic_ios (define a derived class to access its
 protected ctor). Most regression tests don't tend to produce
 debugging output but if you find it useful please use stdio,
 not iostreams.

 Finally, regression tests should verify expected postconditions
 by using the assert() macro. The exit status can be 0 on success
 and non-zero on failure, but this is not done consistently.

OK I attached a new test case - 27.basic_ios.copyfmt.stdcxx-1058.cpp.

However, a simple class derived from std::basic_ios doesn't trigger
the bug; it's only triggered when using std::fstream or
std::stringstream.

--Stefan

Re: [jira] [Updated] (STDCXX-1058) std::basic_ios::copyfmt() with registered callback (via std::ios_base::register_callback()) run-time SIGABRT

2012-06-13 Thread Stefan Teleman

On Mon, Jun 11, 2012 at 12:35 PM, Martin Sebor mse...@gmail.com wrote:
 ...

 OK I attached a new test case - 27.basic_ios.copyfmt.stdcxx-1058.cpp.

 But, using a simple derived class from std::basic_ios doesn't trigger
 the bug. It's only triggered when using std::fstream or
 std::stringstream.


 Here's what I meant:

  struct A: std::streambuf { };
  struct B: std::ios {
    A sb;
    B () { init (&sb); }
  } f0, f1;

 Btw., in your new test, either the TEST_ASSERT() macro should abort
 or the test should abort before returning when ret is non-zero.

 I.e., every regression test should report failure by calling abort
 (via the assert() macro).

 Returning a non-zero exit status (in addition to making use of
 assert()) is fine but it shouldn't be the sole mechanism for
 reporting an error.

 Also, please avoid #including <iostream> in tests unless
 exercising the standard streams (std::cout et al.) The header
 runs complicated code and can lead to unrelated failures.

 Attached is a slightly modified test to show what I mean. (Also
 fixes formatting -- please use 4 space indents and a space before
 each open paren; curly brace goes on the same line as the statement
 except for namespace scope declarations).

 Martin

Done - new attachment 27.basic_ios.copyfmt.stdcxx-1058.cpp
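Here is a minimal sketch of a regression test that follows Martin's guidelines:

- it uses a class derived from basic_ios instead of fstream;
- it checks postconditions with assert();
- it does not include <iostream>;
- it uses 4-space indents.

It is not the attached 27.basic_ios.copyfmt.stdcxx-1058.cpp. The names on_event and test_copyfmt_with_callback are hypothetical. As noted earlier, this basic_ios-only form may not reproduce the crash itself.

```cpp
#include <cassert>
#include <ios>
#include <streambuf>

// Number of copyfmt_event notifications seen by the callback.
static int copyfmt_events = 0;

// Callback registered through std::ios_base::register_callback().
static void
on_event (std::ios_base::event ev, std::ios_base&, int index)
{
    assert (0 == index);   // index passed to register_callback() below

    if (std::ios_base::copyfmt_event == ev)
        ++copyfmt_events;
}

// Do-nothing stream buffer: basic_ios only needs a valid pointer.
struct A: std::streambuf { };

// Derives from std::ios to reach its protected default ctor and init().
struct B: std::ios {
    A sb;
    B () { init (&sb); }
};

// Exercises basic_ios::copyfmt() with a registered callback and returns
// the number of copyfmt_event notifications observed.
int
test_copyfmt_with_callback ()
{
    copyfmt_events = 0;

    B f0, f1;
    f0.register_callback (on_event, 0);

    // copyfmt() copies f0's callback list into f1 and then invokes the
    // callbacks with copyfmt_event; this is where STDCXX-1058 reports
    // a SIGABRT (reproduced so far only with fstream/stringstream).
    f1.copyfmt (f0);

    assert (1 == copyfmt_events);
    return copyfmt_events;
}
```

In the actual regression test, main() would simply call the function and let assert() abort on failure.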

--Stefan

Re: svn access help

2012-06-24 Thread Stefan Teleman

On Mon, Jun 11, 2012 at 12:22 PM, Martin Sebor se...@apache.org wrote:
 On 06/11/2012 02:13 AM, Stefan Teleman wrote:

 Hi!

 Trying to commit my fix for STDCXX-1058 to trunk, I get the following:


 [steleman@darthvader][/src/steleman/programming/stdcxx-svn/stdcxx-trunk][06/11/2012
 4:05:37][1026]  svn ci --username stele...@apache.org --password
   src/iostore.cpp
 tests/regress/27.basic_ios.copyfmt.stdcxx-1058.cpp
 svn: Commit failed (details follow):
 svn: Server sent unexpected return value (403 Forbidden) in response
 to MKACTIVITY request for
 '/repos/asf/!svn/act/99786e10-b39c-11e1-a62b-5964c85c7fa1'
 svn: Your commit message was left in a temporary file:
 svn:
  '/src/steleman/programming/stdcxx-svn/stdcxx-trunk/svn-commit.2.tmp'

 [steleman@darthvader][/src/steleman/programming/stdcxx-svn/stdcxx-trunk][06/11/2012
 4:08:16][1027]


 That looks like a permission problem. We'll have to check to make
 sure you have commit permissions. Let me look into it and get back
 to you.

 Martin

 PS You can/should CC dev@stdcxx rather than private on questions
 like this. Private is only for truly private/confidential issues.

Still doesn't work:

[steleman@darthvader][/src/steleman/programming/stdcxx-svn/stdcxx-trunk][06/25/2012
0:54:36][1015] svn ci src/iostore.cpp
tests/regress/27.basic_ios.copyfmt.stdcxx-1058.cpp
svn: Commit failed (details follow):
svn: Server sent unexpected return value (403 Forbidden) in response
to MKACTIVITY request for
'/repos/asf/!svn/act/04b47418-be82-11e1-a6c2-3b1e26a37b30'
svn: Your commit message was left in a temporary file:
svn:'/src/steleman/programming/stdcxx-svn/stdcxx-trunk/svn-commit.3.tmp'
[steleman@darthvader][/src/steleman/programming/stdcxx-svn/stdcxx-trunk][06/25/2012
0:55:43][1016]

--Stefan

Re: New chair and/or attic

2012-08-29 Thread Stefan Teleman

On Wed, Aug 29, 2012 at 1:12 PM, Liviu Nicoara nikko...@hates.ms wrote:
 On 08/29/12 10:54, Jim Jagielski wrote:

 Looking over the lack of activity within this project, it's
 obvious (at least to me), that maybe its day is done.

 Should I call a vote to move C++ to the Attic? Or is there someone
 who feels that the project should still exist *and* is willing
 to stand as chair?


 Hi Jim,

 The discussion back in February showed that, even though committers have not
 spent much time lately contributing new code to it, there is an active
 review of the activity occurring on the mailing list and people have
 volunteered time to at least review outside contributions. As Stefan
 remarked, putting it in the Attic pretty much closes the activity around it,
 as little as it is.

 I personally have a renewed interest in the implementation and am in the
 process of reviving my apache account with the intention of being a constant
 presence here, and I hope I will be able to contribute as well. I am not
 sure if anyone reviewed the patches volunteered by Stefan yet, or the
 changes in forks elsewhere, but I am currently looking at that, too.

 Thanks.

 Liviu

I've been quiet lately for reasons completely unrelated to my interest
in stdcxx. I'm still just as interested as I was before. I've also
developed a new interest in getting stdcxx to compile with clang 3.1 -
it currently doesn't.

Perhaps we could also start discussing C++2011 - at a convenient pace,
since only clang currently supports it.

0.02.

--Stefan

Re: STDCXX forks

2012-08-31 Thread Stefan Teleman

On Fri, Aug 31, 2012 at 8:40 AM, Liviu Nicoara nikko...@hates.ms wrote:

 Stefan's seem like a complete git-ification of the whole Apache repository
 but with no changes I could detect.

Not quite. :-)

You are most likely referring to the svn repo at CVSDude here:

http://kdesolaris-svn.cvsdude.com/trunk/STDCXX/4.2.1/

Yes I maintain that fork. All the patches are in this directory:

http://kdesolaris-svn.cvsdude.com/trunk/STDCXX/4.2.1/Solaris/diffs/

and they apply with the apply_patches.sh script from here:

http://kdesolaris-svn.cvsdude.com/trunk/STDCXX/4.2.1/Solaris/apply_patches.sh

The official Oracle port for Solaris 10 and 11, which I maintain, is here:

http://src.opensolaris.org/source/xref/userland/gate/components/stdcxx/

This is the source code which is used to build stdcxx on Solaris 10
and 11. The patches for stdcxx on Solaris are here:

http://src.opensolaris.org/source/xref/userland/gate/components/stdcxx/patches/

The official Solaris 11 stdcxx package can be installed on Solaris 11 from here:

http://pkg.oracle.com/solaris/release/en/search.shtml?token=stdcxx&action=Search

For Solaris 10, you need to install SUNWlibstdcxx4 - which contains
the stdcxx library and its header files - and SUNWlibstdcxx4S which
contains the source code plus all the patches, and which installs in
/usr/share/src/. It installs on Solaris 10 Update 10 and later.

I maintain the repo at cvsdude on an ad-hoc basis. That means now and then. :-)

The source code repo at Oracle is constantly maintained at Oracle, and
we publish source code drops every two weeks there (I think).

--Stefan

Re: New chair and/or attic

2012-08-31 Thread Stefan Teleman

On Fri, Aug 31, 2012 at 8:43 AM, C. Bergström
cbergst...@pathscale.com wrote:
 On 08/31/12 07:20 PM, Jim Jagielski wrote:

 On Aug 30, 2012, at 8:00 PM, C. Bergströmcbergst...@pathscale.com
 wrote:

 While STDCXX is at Apache it will never be BSD licensed.  Solution - move
 it away from Apache foundation and have them transfer some of the additional
 rights they received to allow recipient foundation to relicense.  I thought
 this would be a win for the project and everyone, but for some reason
 instead of opening a discussion to transfer - it's just death grip and
 pushing to the attic.

 What is wrong with ALv2?

 Armchair lawyer discussion on this will never end and I'll try to keep this
 brief..

 Apache lawyer views, our lawyer views, your views.. etc (not the problem
 here)

 FSF views which probably have some weight across the open source community
 is summed up with this..
 Despite our best efforts, the FSF has never considered the Apache License
 to be compatible with GPL version 2
 http://www.apache.org/licenses/GPL-compatibility.html

 That view seems to have been accepted by the FBSD community - The effect is
 that the large amount of GPLv2 code in ports/elsewhere can't take advantage
 of STDCXX due to its license.  Please note I'm not arguing if this is
 correct, but just the feedback I've gotten.  I'm not interested to fight
 that.

 Open source works like this in my experience : people use it, they love it
 and they contribute back.  To get users we need to solve problems for larger
 communities - Make sense?

 Can you help clear this roadblock, yes or no?


My 0.02 of observations about FOSS licenses in general, based on my
direct experience:

For any FOSS component M, licensed under an Open Source License N,
there will always exist a person P, or a group of persons G[P] who
will declare that the current license N is
inappropriate/invalid/incompatible/etc, and will advocate a change to
another Open Source License Q.

--Stefan

Re: Branching policy, 4.3.x, 5.0.0, etc.

2012-08-31 Thread Stefan Teleman

On Fri, Aug 31, 2012 at 1:56 PM, Liviu Nicoara nikko...@hates.ms wrote:
 The branching policy [1] in effect looks very much like the Rogue Wave
 release process: branch at the beginning of each release cycle, work on the
 release branch, merge changes back into the trunk at release time (and in
 between as needed). Did I get that right?

 From what I gather 4.2.x has last released 4.2.1, there is a 4.3.x with no
 releases, and a prospective 5.0.0 which should come out of the trunk (I saw
 changes mentioning 5.0.0). What are the stated goals for the 4.3.x and 5.x?

My understanding is that 4.2.x and 4.3.x are bugfix/rfe releases while
5.x would become C++2011.

Please correct me if I'm wrong.

--Stefan

Re: stdcxx issue 1058

2012-08-31 Thread Stefan Teleman

On Fri, Aug 31, 2012 at 1:29 PM, Liviu Nicoara nikko...@hates.ms wrote:
 On 08/31/12 13:14, Stefan Teleman wrote:


 In June this year I committed r1353821 to trunk which fixes stdcxx-1058.

 I have the patches for 1058 ready to commit to branches (4.2.x and 4.3.x).

 OK to go?

 The patch looks ok to me. What seems to be the problem?  +1

Done.

trunk revision 1353821
branches/4.2.x revision 1379523
branches/4.3.x revision 1379520

--Stefan

Re: STDCXX forks

2012-08-31 Thread Stefan Teleman

On Fri, Aug 31, 2012 at 8:58 AM, C. Bergström
cbergst...@pathscale.com wrote:

 He has quite a number of patches and forget where those are kept.  I'm
 guessing a lot of his fixes target KDE/Qt apps and the Perennial C++VS
 testsuite.
 http://www.peren.com/pages/cppvs_set.htm

Correction: my patches aren't related to Qt/KDE at all at this point -
they are related to Solaris. Apache stdcxx is part of Solaris Core.

So the authoritative patchset for Solaris/stdcxx is the one
published by Oracle, since it is kept up-to-date and it represents an
officially released and supported version of stdcxx.

--Stefan

Re: [PATCH] Trivial test fix

2012-09-01 Thread Stefan Teleman

On Sat, Sep 1, 2012 at 10:49 AM, Liviu Nicoara nikko...@hates.ms wrote:
 Would someone please apply the patch on 4.2.x branch? (I have not yet
 regained access to my Apache account.)


 2012-09-01  Liviu Nicoara  nikko...@hates.ms

 * tests/containers/23.bitset.cpp: swapped lines to avoid compiler
 bug
 (see http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54442)


 Index: tests/containers/23.bitset.cpp
 ===
 --- tests/containers/23.bitset.cpp  (revision 1379762)
 +++ tests/containers/23.bitset.cpp  (working copy)
 @@ -278,8 +278,8 @@ test_synopsis (std::bitset<0>*)
  MEMFUN (Bitset, flip, ());
  MEMFUN (Bitset, flip, (std::size_t));
  -MEMFUN (Bitset::reference, operator[], (std::size_t));
  MEMFUN (bool, operator[], (std::size_t) const);
 +MEMFUN (Bitset::reference, operator[], (std::size_t));
   MEMFUN (unsigned long, to_ulong, () const);

Done - revision 1379813.

--Stefan

-- 
Stefan Teleman
KDE e.V.
stefan.tele...@gmail.com

Re: STDCXX forks

2012-09-01 Thread Stefan Teleman

On Sat, Sep 1, 2012 at 12:15 PM, Liviu Nicoara nikko...@hates.ms wrote:

 Hi Stefan, I have went through the patches. Specifically, I have spent more
 time looking in the mutex alignment changes and the C++ C library headers
 patches, and I only read the others. In order:

 The test extensions seem to be genuine by and large, but I would further
 analyze them after I find out what is it they are addressing (test cases?).

 The regression tests whose names contain references to internal bug numbers
 require a bit more analysis as to their usefulness. Of course an explanation
 attached to each would alleviate duplicating your work. I have not
 cross-checked them to JIRA.

 Some of the compiler characterizations changes, as well as the associated
 GNUmakefile's, seem to be specific to your port, e.g., GNUmakefile,
 GNUmakefile.cfg changes. I may have spotted other issues but I would wait
 for your feed-back first.

 The C++ C library headers seem to have been re-written to your port. I am
 unsure why you needed this, but it surely breaks the original intent for
 these headers' structure. I have also noticed that you stripped the Apache
 notice and added an Oracle copyright notice on them.

 This pretty much sums up my first impression.

Hi

Yes, you are correct. :-)

To begin with: the compiler flags/GNUmakefile changes are very
specific to the SunPro compilers and to our internal build system.
These changes are most likely not suitable for inclusion in the
canonical stdcxx, except maybe for the sunpro.config changes, in case
someone would like to be able to replicate our builds. I'd like to
mention that, in Solaris, Apache stdcxx is a system library.

About the Standard C Library forwarding header files: these changes
are specific to Solaris. The reason behind them is: the Solaris
architectural rules, which can be best summarized as: there can be
only one of each. In other words, it is Verboten, in Solaris, to
duplicate the Standard C Library header files (or any other header
file for that matter). The Solaris Standard C Library header files are
C++-clean - they are required to be so, by the same architectural
rules. Again, these changes are specific to Solaris, and are probably
not portable across other implementations. I know for a fact that they
are not portable for either the GCC or Intel compilers (with which I
test regularly on Linux, in addition to SunPro).

So these two groups of changesets can be ignored.

Yesterday I opened STDCXX-1066:

https://issues.apache.org/jira/browse/STDCXX-1066

about the pthread_mutex_t/pthread_cond_t alignment on SPARCV8. I'll
have patches done this weekend. Achtung: the patchset is very large
and touches a very large number of files. It's strange that I didn't
get an email about STDCXX-1066.
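The general idea behind an alignment fix can be sketched with C++11 `alignas`. This is a hypothetical wrapper, not the STDCXX-1066 patch, and the 8-byte requirement used here is an assumption for illustration rather than a documented SPARCV8 figure. The `static_assert`s check, at compile time, both the wrapper's alignment and the mutex member's offset inside an enclosing struct.

```cpp
#include <cassert>
#include <cstddef>
#include <pthread.h>

// Hypothetical wrapper (not the stdcxx patch): forces the embedded
// pthread_mutex_t onto an 8-byte boundary. The 8-byte figure is an
// assumption for illustration only.
struct alignas(8) aligned_mutex {
    pthread_mutex_t mtx;
};

struct guarded_data {
    char          flag;  // small leading member that could otherwise misalign the mutex
    aligned_mutex lock;  // alignas makes the compiler pad so this starts on an 8-byte boundary
};

static_assert(alignof(aligned_mutex) >= 8, "wrapper must be at least 8-byte aligned");
static_assert(offsetof(guarded_data, lock) % 8 == 0,
              "mutex member must start on an 8-byte boundary");

// Initialize, lock, unlock and destroy the embedded mutex.
// Returns 0 on success, otherwise the pthread error code.
int lock_unlock_roundtrip() {
    guarded_data d;
    d.flag = 0;
    int rc = pthread_mutex_init(&d.lock.mtx, nullptr);
    if (rc != 0) return rc;
    rc = pthread_mutex_lock(&d.lock.mtx);
    if (rc == 0) rc = pthread_mutex_unlock(&d.lock.mtx);
    pthread_mutex_destroy(&d.lock.mtx);
    return rc;
}
```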

I'd also like to talk about STDCXX-1056:

https://issues.apache.org/jira/browse/STDCXX-1056

which has already had an initial discussion, and for which I have
attached  a patch. This issue also addresses (indirectly) linkage when
building with GCC. On the recent versions of GCC that I have tested
with, passing -supc++ on link line automatically links with the GNU
libstdc++6.so (on top of linking with stdcxx), and that just bad.

And then I'll have to cross reference the patches which refer to our
internal bug numbers because most of them are quite old and right now,
off the top of my head, I can't remember what they are. :-)

--Stefan

-- 
Stefan Teleman
KDE e.V.
stefan.tele...@gmail.com

Re: STDCXX-1056 [was: Re: STDCXX forks]

2012-09-03 Thread Stefan Teleman

On Mon, Sep 3, 2012 at 11:57 PM, Stefan Teleman
stefan.tele...@gmail.com wrote:
 On Mon, Sep 3, 2012 at 3:19 PM, Liviu Nicoara nikko...@hates.ms wrote:

 I tried, unsuccessfully, to reproduce the failure observed by Martin in 
 22.locale.moneypunct.mt, in both debug and optimized, wide and narrow builds 
 on a 16x machine:


 I can reproduce it consistently on Solaris without the patches. It has
 a very high failure rate. It's also been reported at least once
 before, here:

 http://old.nabble.com/22.locale.numpunct.mt-run-hangs-td20133013.html

 I remember there were more hits about 22.locale.numpunct.mt in Google
 when I searched for it a while ago, but I didn't look as closely now.

 I'll create a Solaris build without my patches tomorrow and send the output.

FWIW, here are my builds from today, without the patches, with the Intel compiler:

COMPILER: Intel C++, __INTEL_COMPILER = 1210,
__INTEL_COMPILER_BUILD_DATE = 20111011, __EDG_VERSION__ = 403

22.locale.numpunct.mt ran for 3 hours (wall clock time) without ever
completing or doing anything except flatlining the CPU at 115%, on
both 32-bit and 64-bit. This is on:

[steleman@darthvader][/src/steleman/programming/stdcxx-intel/stdcxx-4.2.1-thread-safe/build/tests][09/04/2012
0:21:13][1181] uname -a
Linux darthvader.stefanteleman.org 3.5.0-2.fc17.x86_64 #1 SMP Mon Jul
30 14:48:59 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

I can't (yet) rebuild with GCC 4.7.0 because Fedora 17's build of GCC
C++ is a mess (libsupc++.a requires TLS but glibc was built with TLS
disabled).

--Stefan

-- 
Stefan Teleman
KDE e.V.
stefan.tele...@gmail.com

Re: STDCXX-1056 [was: Re: STDCXX forks]

2012-09-04 Thread Stefan Teleman

On Tue, Sep 4, 2012 at 10:49 PM, Martin Sebor mse...@gmail.com wrote:

 template <class _CharT>
 inline string numpunct<_CharT>::grouping () const
 {
     if (!(_C_flags & _RW::__rw_gr)) {

         numpunct* const __self = _RWSTD_CONST_CAST (numpunct*, this);

         _RWSTD_MT_GUARD (_C_mutex);

         // [try to] get the grouping first (may throw)
         // then set a flag to avoid future initializations
         __self->_C_grouping  = do_grouping ();
         __self->_C_flags    |= _RW::__rw_gr;
     }

     return _C_grouping;
 }

That's what I wanted to do originally: use a per-object mutex.
Unfortunately, the _C_mutex member in __rw::__rw_synchronized is static:

struct __rw_synchronized
{
    // static so that it takes up no space
    static _RWSTD_EXPORT __rw_mutex _C_mutex;

    void _C_lock () { }

    void _C_unlock () { }

    __rw_guard _C_guard () {
        return __rw_guard (_C_mutex);
    }

    // ...
};

and __rw::__rw_guard doesn't have an appropriate constructor.

Intel C++ complains about it too:

/src/steleman/programming/stdcxx-intel/stdcxx-4.2.1-thread-safe/include/loc/_numpunct.h(181):
error: no instance of constructor __rw::__rw_guard::__rw_guard
matches the argument list
argument types are: (const __rw::__rw_mutex)
  _RWSTD_MT_GUARD (_C_mutex);
  ^

This works:

template <class _CharT>
inline string numpunct<_CharT>::grouping () const
{
    if (!(_C_flags & _RW::__rw_gr)) {

        numpunct* const __self = _RWSTD_CONST_CAST (numpunct*, this);

        _RWSTD_MT_STATIC_GUARD (_Type);

        // [try to] get the grouping first (may throw)
        // then set a flag to avoid future initializations
        __self->_C_grouping  = do_grouping ();
        __self->_C_flags    |= _RW::__rw_gr;
    }

    return _C_grouping;
}

I'm not sure what the performance difference between
_RWSTD_MT_STATIC_GUARD and _RWSTD_MT_CLASS_GUARD would be for this
particular problem, though. My guess is none, in real life. :-)

And even so, this is still not thread-safe:

Two different threads [ T1 and T2 ], seeking two different locales
[en_US.UTF-8 and ja_JP.UTF-8], enter std::numpunct<_CharT>::grouping()
at the same time, because they are running on two different cores.
They both test

if (!(_C_flags & _RW::__rw_gr))

and then, assuming the expression above evaluates to true, one of
them [T1] wins the mutex, and the other one [T2] blocks on the mutex.

When T1 is done and exits the function, the grouping is set to
en_US.UTF-8 and the mutex is released. Now T2 acquires the mutex, and
proceeds to set grouping to ja_JP.UTF-8. Woe to the caller running
on T1, who now thinks he got en_US.UTF-8 but instead gets
ja_JP.UTF-8, which was duly set by T2 without T1 having any clue about
it (remember, the std::string grouping _CharT buffer is shared by the
caller on T1 and the caller on T2).

So at a minimum, the locking must happen before evaluating the

if (!(_C_flags & _RW::__rw_gr))

expression.

This still doesn't solve what ends up being returned in grouping. If
we lock at the top of the function, then, when T2 acquires the mutex,
the test expression will evaluate to false. Therefore T2 will return
whatever is in grouping right now, which happens to be en_US.UTF-8 as
set by T1, when T2 really wanted ja_JP.UTF-8.
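One way to read the "lock before testing the flag" point, as a minimal C++11 sketch with hypothetical names (lazy_punct is not stdcxx code): the flag test and the cached write both happen under one std::mutex, and the copy being returned is constructed before the lock is released. It only removes the check-then-set race on a single object; it relies on a per-object mutex, which in stdcxx would change the object layout, and it does not address which locale's data ends up in a shared string.

```cpp
#include <cassert>
#include <mutex>
#include <string>

// Hypothetical stand-in for a facet with a lazily cached value.
class lazy_punct {
public:
    virtual ~lazy_punct() = default;

    std::string grouping() const {
        std::lock_guard<std::mutex> guard(mtx_);  // lock *before* testing the flag
        if (!initialized_) {
            grouping_    = do_grouping();  // if this throws, the flag stays unset
            initialized_ = true;
        }
        return grouping_;  // the returned copy is made while the lock is still held
    }

protected:
    virtual std::string do_grouping() const { return "\3"; }  // groups of three digits

private:
    mutable std::mutex  mtx_;          // per-object lock (changes layout, see above)
    mutable bool        initialized_ = false;
    mutable std::string grouping_;
};
```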

I really think the appropriate fix here -- which would address the
performance implications -- is more complex than this. I am thinking
about creating and using a (non-publicly accessible) internal locale
cache:

typedef std::map<std::string, std::locale> locale_cache;

where all the locales are stored fully initialized, on demand. There
is only one instantiation and initialization cost per locale. After a
locale has been instantiated and placed into the cache, the caller of
any specific locale gets a copy from the cache, fully instantiated and
initialized. But this breaks ABI, so I'm thinking it's for stdcxx 5.
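A minimal sketch of such a cache, assuming C++11 (std::mutex and thread-safe initialization of function-local statics); get_cached_locale is a hypothetical name, not an stdcxx function. Each named locale is constructed once under the lock, and callers receive copies, which share the cached locale's facets rather than rebuilding them.

```cpp
#include <cassert>
#include <locale>
#include <map>
#include <mutex>
#include <string>

// Hypothetical process-wide cache of fully constructed named locales.
typedef std::map<std::string, std::locale> locale_cache;

std::locale get_cached_locale(const std::string& name) {
    static std::mutex   cache_mutex;  // guards every access to the map
    static locale_cache cache;

    std::lock_guard<std::mutex> guard(cache_mutex);
    locale_cache::iterator it = cache.find(name);
    if (it == cache.end()) {
        // std::locale(const char*) throws std::runtime_error for an unknown
        // name, in which case nothing is inserted.
        it = cache.insert(locale_cache::value_type(name, std::locale(name.c_str()))).first;
    }
    return it->second;  // a copy that shares the cached facets
}
```

Two lookups of the same name return copies of one cached object, so they compare equal with `operator==`.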

Thoughts?

--Stefan

-- 
Stefan Teleman
KDE e.V.
stefan.tele...@gmail.com

Re: STDCXX-1056 [was: Re: STDCXX forks]

2012-09-05 Thread Stefan Teleman

 | 100% |
# +---+--+--+--+
real 2416.75
user 2694.64
sys 159.49

-- 
Stefan Teleman
KDE e.V.
stefan.tele...@gmail.com

Re: STDCXX-1056 [was: Re: STDCXX forks]

2012-09-05 Thread Stefan Teleman

On Wed, Sep 5, 2012 at 4:20 PM, Stefan Teleman stefan.tele...@gmail.com wrote:

 But then there's another aspect  -- which I probably failed to
 highlight in my previous email: the per-object mutex implementation is
 20% *slower* than the class-static mutex implementation.

 class-static implementation:
 real 2139.31
 user 2406.09
 sys 155.61

 per-object implementation:
 real 2416.75
 user 2694.64
 sys 159.49

The above results are for the Intel compiler. Now here are the results
with GCC 4.7.0:

1. with the static per-class mutex:

# INFO (S1) (10 lines):
# TEXT:
# COMPILER: gcc 4.7.0, __VERSION__ = 4.7.0 20120507 (Red Hat 4.7.0-5)
# ENVIRONMENT: pentiumpro running linux-elf (Fedora release 17 (Beefy
Miracle) (3.5.0-2.fc17.x86_64)) with glibc 2.15
# FILE: 22.locale.numpunct.mt.cpp
# COMPILED: Sep  5 2012, 06:21:18
# COMMENT: thread safety


[ ... ]

# +-----------+----------+----------+----------+
# | DIAGNOSTIC|  ACTIVE  |   TOTAL  | INACTIVE |
# +-----------+----------+----------+----------+
# | (S1) INFO |       11 |       11 |       0% |
# | (S2) NOTE |        1 |        1 |       0% |
# | (S8) ERROR|        0 |        3 |     100% |
# | (S9) FATAL|        0 |        1 |     100% |
# +-----------+----------+----------+----------+
real 2165.06
user 2428.08
sys 151.30


2. With the per-object mutex:

# INFO (S1) (10 lines):
# TEXT:
# COMPILER: gcc 4.7.0, __VERSION__ = 4.7.0 20120507 (Red Hat 4.7.0-5)
# ENVIRONMENT: pentiumpro running linux-elf (Fedora release 17 (Beefy
Miracle) (3.5.0-2.fc17.x86_64)) with glibc 2.15
# FILE: 22.locale.numpunct.mt.cpp
# COMPILED: Sep  5 2012, 21:29:56
# COMMENT: thread safety


# CLAUSE: lib.locale.numpunct

 [ ... ]

# +-----------+----------+----------+----------+
# | DIAGNOSTIC|  ACTIVE  |   TOTAL  | INACTIVE |
# +-----------+----------+----------+----------+
# | (S1) INFO |       11 |       11 |       0% |
# | (S2) NOTE |        1 |        1 |       0% |
# | (S8) ERROR|        0 |        3 |     100% |
# | (S9) FATAL|        0 |        1 |     100% |
# +-----------+----------+----------+----------+
real 2438.70
user 2726.44
sys 155.79

About the same percentage difference as the Intel compiler.
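Working out the per-object to class-static ratio of the real (wall-clock) times quoted above:

```latex
\frac{2416.75}{2139.31} \approx 1.130 \;(\text{Intel C++}), \qquad
\frac{2438.70}{2165.06} \approx 1.126 \;(\text{GCC 4.7.0})
```

By these figures, the per-object mutex costs roughly 13% more wall-clock time with both compilers.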

--Stefan

---
Stefan Teleman
KDE e.V.
stefan.tele...@gmail.com

Re: STDCXX-1056 [was: Re: STDCXX forks]

2012-09-05 Thread Stefan Teleman

On Wed, Sep 5, 2012 at 10:55 PM, Martin Sebor mse...@gmail.com wrote:

 I suspect the difference is due to the overhead of the repeated
 initialization and destruction of the per-object mutex in the
 test. The test repeatedly creates (and discards) named locale
 objects.

 The per-class mutex is initialized just once in the process, no
 matter how many facet objects (how many distinct named locales)
 the test creates. But the per-object mutex must be created (and
 destroyed) for each named locale.

Agreed.

But: if the choice is between [1] an implementation which breaks ABI
and performs 20% worse -- even in contrived test cases -- and [2]
another implementation which doesn't break ABI and performs better
than the first one, why would we even consider the first one?

--Stefan

-- 
Stefan Teleman
KDE e.V.
stefan.tele...@gmail.com

Re: STDCXX-1056 [was: Re: STDCXX forks]

2012-09-06 Thread Stefan Teleman

On Thu, Sep 6, 2012 at 9:16 AM, Liviu Nicoara nikko...@hates.ms wrote:

 I think Stefan is referring to adding a mutex member variable to the facet
 in question and breaking binary compatibility. If that is the case I have
 confused things when I suggested exactly that, earlier. A cursory read
 through the __rw_facet source shows that inherits from __rw_synchronized in
 MT builds, therefore each facet carries its own mutex member.

 On 09/05/12 23:51, Martin Sebor wrote:
 We don't need to add a new mutex -- we can use the __rw_facet
 member for the locking. Or did you mean something else?

A possible implementation using the __rw_facet mutex could look like this:

template <class _CharT>
inline string numpunct<_CharT>::grouping () const
{
    if (!(_C_flags & _RW::__rw_gr)) {

        numpunct* const __self = _RWSTD_CONST_CAST (numpunct*, this);

        _RWSTD_MT_GUARD (__self->_C_mutex);

        if (!(_C_flags & _RW::__rw_gr)) {

            // [try to] get the grouping first (may throw)
            // then set a flag to avoid future initializations
            __self->_C_grouping  = do_grouping ();
            __self->_C_flags    |= _RW::__rw_gr;

        }
    }

    return _C_grouping;
}

Except that it will not work, because the __rw_facet mutex member is
also locked in file ../src/facet.cpp, in function
__rw_facet::_C_manage, at line 366:

// acquire lock
_RWSTD_MT_STATIC_GUARD (_RW::__rw_facet);

This will deadlock, because this is the mutex already locked by
std::numpunct<T>::grouping().

I've already tested this with 3 compilers, and, it does indeed deadlock.

So yes, I did indeed mean something different. I meant adding another
mutex data member to the numpunct class.

--Stefan

-- 
Stefan Teleman
KDE e.V.
stefan.tele...@gmail.com

Re: STDCXX-1056 [was: Re: STDCXX forks]

2012-09-06 Thread Stefan Teleman

On Thu, Sep 6, 2012 at 2:46 PM, Stefan Teleman stefan.tele...@gmail.com wrote:

 [steleman@darthvader][/src/steleman/programming/stdcxx-ss122/stdcxx-4.2.1/build/tests][09/06/2012
 14:40:11][1084] ./22.locale.numpunct.mt --nthreads=2 --nloops=100
 # INFO (S1) (10 lines):
 # TEXT:
 # COMPILER: SunPro, __SUNPRO_CC = 0x5120
 # ENVIRONMENT: i386 running linux (Fedora release 17 (Beefy Miracle)
 (3.5.0-2.fc17.x86_64)) with glibc 2.15
 # FILE: 22.locale.numpunct.mt.cpp
 # COMPILED: Sep  6 2012, 14:38:42
 # COMMENT: thread safety
 

 # CLAUSE: lib.locale.numpunct

 # NOTE (S2) (5 lines):
 # TEXT: executing /usr/bin/locale -a  /tmp/tmpfile-YzXcb9
 # CLAUSE: lib.locale.numpunct
 # FILE: process.cpp
 # LINE: 276

 grouping: _RWSTD_MT_GUARD (__self->_C_mutex): 0xf774913c
 _C_get_data: _RWSTD_MT_GUARD (_C_mutex): 0xf774913c

 [ ... deadlock ... ]

And just to make absolutely sure that this isn't a case of SunPro
being insane, here's the output from the Intel compiler:

[steleman@darthvader][/src/steleman/programming/stdcxx-intel/stdcxx-4.2.1-thread-safe/build/tests][09/06/2012
15:44:23][1390] ./22.locale.numpunct.mt --nthreads=2 --nloops=100
# INFO (S1) (10 lines):
# TEXT:
# COMPILER: Intel C++, __INTEL_COMPILER = 1210,
__INTEL_COMPILER_BUILD_DATE = 20111011, __EDG_VERSION__ = 403
# ENVIRONMENT: pentiumpro running linux-elf (Fedora release 17 (Beefy
Miracle) (3.5.0-2.fc17.x86_64)) with glibc 2.15
# FILE: 22.locale.numpunct.mt.cpp
# COMPILED: Sep  6 2012, 15:43:29
# COMMENT: thread safety


# CLAUSE: lib.locale.numpunct

# NOTE (S2) (5 lines):
# TEXT: executing locale -a  /tmp/tmpfile-FoZR0J
# CLAUSE: lib.locale.numpunct
# FILE: process.cpp
# LINE: 276

grouping: _RWSTD_MT_GUARD (__self->_C_mutex): 0xf74c2424
_C_get_data: _RWSTD_MT_GUARD (_C_mutex): 0xf74c2424

[ ... deadlock ... ]

--Stefan

-- 
Stefan Teleman
KDE e.V.
stefan.tele...@gmail.com

Re: STDCXX-1056 [was: Re: STDCXX forks]

2012-09-06 Thread Stefan Teleman

On Thu, Sep 6, 2012 at 7:31 PM, Liviu Nicoara nikko...@hates.ms wrote:

 There would be a performance degradation. IMHO, it would be minor and would 
 simplify the code considerably.

 After finally being able to reproduce the defect with SunPro 12.3 on x86_64 I 
 tried to remove the lazy initialization and a (smaller) test case now passes. 
 I am attaching the program and the numpunct patch.

With your patches, the performance is much much better:

# INFO (S1) (10 lines):
# TEXT:
# COMPILER: Intel C++, __INTEL_COMPILER = 1210,
__INTEL_COMPILER_BUILD_DATE = 20111011, __EDG_VERSION__ = 403
# ENVIRONMENT: pentiumpro running linux-elf (Fedora release 17 (Beefy
Miracle) (3.5.0-2.fc17.x86_64)) with glibc 2.15
# FILE: 22.locale.numpunct.mt.cpp
# COMPILED: Sep  6 2012, 20:50:13
# COMMENT: thread safety


# +-----------+----------+----------+----------+
# | DIAGNOSTIC|  ACTIVE  |   TOTAL  | INACTIVE |
# +-----------+----------+----------+----------+
# | (S1) INFO |       11 |       11 |       0% |
# | (S2) NOTE |        1 |        1 |       0% |
# | (S8) ERROR|        0 |        3 |     100% |
# | (S9) FATAL|        0 |        1 |     100% |
# +-----------+----------+----------+----------+
real 1035.05
user 1191.76
sys 63.49
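Set against the earlier Intel C++ run with the class-static mutex (real 2139.31), and assuming the same test parameters, the wall-clock ratio works out to:

```latex
\frac{2139.31}{1035.05} \approx 2.07
```

That is, the run with the patch completes in roughly half the wall-clock time.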

--Stefan

-- 
Stefan Teleman
KDE e.V.
stefan.tele...@gmail.com

Re: dbx [was: Re: STDCXX-1056 [was: Re: STDCXX forks]]

2012-09-07 Thread Stefan Teleman

On Fri, Sep 7, 2012 at 12:23 PM, Liviu Nicoara nikko...@hates.ms wrote:

 I get this when launching the debugger:

 $ dbx -xexec32 t
 For information about new features see `help changes'
 To remove this message, put `dbxenv suppress_startup_message 7.9' in your
 .dbxrc
 Reading t
 Reading ld-linux.so.2
 dbx: fetch at 0xf400 failed -- Input/output error
 dbx: warning: could not put in breakpoint

There is a fix for this, but for Studio 12.2:

http://wesunsolve.net/bugid/id/6545393

It had priority 5 (very low) so that leads me to assume that it hasn't
been fixed yet in 12.3. But I will ask at work about 12.3/Linux.

Strangely enough, I don't get the error on Fedora 17 with 12.3.

--Stefan

-- 
Stefan Teleman
KDE e.V.
stefan.tele...@gmail.com

Re: STDCXX-1056 [was: Re: STDCXX forks]

2012-09-10 Thread Stefan Teleman

On Thu, Sep 6, 2012 at 6:43 PM, Stefan Teleman stefan.tele...@gmail.com wrote:
 On Thu, Sep 6, 2012 at 4:02 PM, Martin Sebor mse...@gmail.com wrote:

 Here's a thought: it's not pretty but how about having
 the function initialize the facet? It already has a pointer
 to the base class, so it could downcast it to std::numpunct
 (or moneypunct, respectively), and assign the initial values
 to the members. Would that work?

 I haven't looked at them in detail (yet) but a cursory look shows that
 they're both recursive for the successful case.

It's not going to work that way.

For one, __rw_get_numpunct() and __rw_get_moneypunct() are static
functions in the __rw namespace. Neither can access or modify the
std::numpunct<T> or std::moneypunct<T> data members directly, because
they are private.

Second, both __rw_get_numpunct() and __rw_get_moneypunct() are
recursive. Unless we want to start playing with
PTHREAD_MUTEX_RECURSIVE, which I'm not at all sure is supported on all
the platforms we support, we're not going to be able to solve the
thread-safety problem here (Linux supports it as
PTHREAD_MUTEX_RECURSIVE_NP, Solaris supports it as
PTHREAD_MUTEX_RECURSIVE).
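For reference, a minimal POSIX sketch of the PTHREAD_MUTEX_RECURSIVE option: the mutex type is set through a pthread_mutexattr_t, after which the owning thread may lock it again, as long as every lock is matched by an unlock. lock_twice_recursively is a hypothetical helper; whether every platform stdcxx targets provides this type is exactly the open question raised above.

```cpp
#include <cassert>
#include <pthread.h>

// Creates a recursive mutex, locks it twice from the same thread and
// unlocks it twice. Returns 0 on success, else a pthread error code.
int lock_twice_recursively() {
    pthread_mutexattr_t attr;
    pthread_mutex_t     mtx;

    int rc = pthread_mutexattr_init(&attr);
    if (rc != 0) return rc;
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
    if (rc == 0) rc = pthread_mutex_init(&mtx, &attr);
    pthread_mutexattr_destroy(&attr);  // the attribute object is no longer needed
    if (rc != 0) return rc;

    pthread_mutex_lock(&mtx);
    // With a PTHREAD_MUTEX_NORMAL mutex this second lock would deadlock
    // (for PTHREAD_MUTEX_DEFAULT the behavior is undefined).
    rc = pthread_mutex_lock(&mtx);
    if (rc == 0) pthread_mutex_unlock(&mtx);  // matches the second lock
    pthread_mutex_unlock(&mtx);               // matches the first lock
    pthread_mutex_destroy(&mtx);
    return rc;
}
```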

Third, both __rw_get_numpunct() and __rw_get_moneypunct() can return a
NULL pointer. This is bad, because it will cause a SEGV at string
assignment, during a call to any of
std::numpunct<T>::grouping(), std::numpunct<T>::truename(), etc.
We should fix this and throw an exception instead. The
Standard doesn't say that any of these functions can throw, but it
doesn't say they can't throw either. And both __rw_get_numpunct() and
__rw_get_moneypunct() already throw.
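A sketch of the proposed behavior, using hypothetical stand-ins (punct_data, lookup_punct, grouping_or_throw) rather than the actual stdcxx internals: the null return is turned into an exception before anything dereferences it. The choice of std::runtime_error is an assumption; the thread does not say which exception should be thrown.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical data block standing in for the cached punctuation data.
struct punct_data {
    std::string grouping;
};

// Simulates a lookup that can fail and reports failure with a null pointer.
const punct_data* lookup_punct(bool available) {
    static const punct_data data = { "\3" };
    return available ? &data : nullptr;
}

// Converts the null return into an exception instead of dereferencing it
// during the string assignment.
std::string grouping_or_throw(bool available) {
    const punct_data* p = lookup_punct(available);
    if (p == nullptr)
        throw std::runtime_error("numpunct data unavailable");
    return p->grouping;
}
```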

--Stefan

-- 
Stefan Teleman
KDE e.V.
stefan.tele...@gmail.com

Re: STDCXX-1056 [was: Re: STDCXX forks]

2012-09-10 Thread Stefan Teleman

On Mon, Sep 10, 2012 at 2:21 PM, Liviu Nicoara nikko...@hates.ms wrote:

 4. Without caching of grouping values, grouping() delegates always to
 do_grouping():

 real0m5.668s
 user1m11.389s
 sys 0m3.952s

FWIW, 22.2.3.1.1 explicitly states that all of the decimal_point(),
grouping(), truename(), falsename() must return their do_*()
counterparts.
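A small check of that requirement using the standard library's own numpunct<char>: thousands_punct is a hypothetical user facet that overrides only the protected virtuals, and the public grouping() picks up the override. Written in C++11; a conforming library is expected to behave this way.

```cpp
#include <cassert>
#include <locale>
#include <sstream>
#include <string>

// User-defined facet overriding only the protected virtual functions.
struct thousands_punct : std::numpunct<char> {
protected:
    std::string do_grouping() const override { return "\3"; }  // groups of three digits
    char do_thousands_sep() const override { return ','; }
};

// Reads grouping() through the public interface of whatever facet loc holds.
std::string grouping_of(const std::locale& loc) {
    return std::use_facet<std::numpunct<char> >(loc).grouping();
}

// Formats a value with num_put, which consults the locale's numpunct facet.
std::string format_with(const std::locale& loc, long value) {
    std::ostringstream os;
    os.imbue(loc);
    os << value;
    return os.str();
}
```

With `std::locale loc(std::locale::classic(), new thousands_punct)`, `grouping_of(loc)` yields `"\3"` and `format_with(loc, 1234567L)` yields `"1,234,567"`, while the classic locale's grouping stays empty.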

--Stefan

-- 
Stefan Teleman
KDE e.V.
stefan.tele...@gmail.com

Re: STDCXX-1056 [was: Re: STDCXX forks]

2012-09-10 Thread Stefan Teleman

On Mon, Sep 10, 2012 at 1:32 PM, Martin Sebor mse...@gmail.com wrote:


 That said, I'd certainly prefer to avoid hacks as much as
 possible. This problem could perhaps more cleanly be solved
 by having the facet pass a reference to the string (or to
 all of its internal data) to modify to the function (or
 something like that). Unfortunately, it would break binary
 compatibility.

I think I have something which doesn't break BC - stay tuned because
I'm testing it now.

--Stefan

-- 
Stefan Teleman
KDE e.V.
stefan.tele...@gmail.com

Re: STDCXX-1056 [was: Re: STDCXX forks]

2012-09-11 Thread Stefan Teleman

On Tue, Sep 11, 2012 at 10:18 PM, Liviu Nicoara nikko...@hates.ms wrote:

 AFAICT, there are two cases to consider:

 1. Using STDCXX locale database initializes the __rw_punct_t data in the
 first, properly synchronized pass through __rw_get_numpunct. All subsequent
 calls use the __rw_punct_t data to construct returned objects.
 2. Using the C library locales does the same in the first pass, via
 setlocale and localeconv, but setlocale synchronization is via a per-process
 lock. The facet data, once initialized is used just like above.

 I probably missed this in the previous conversation, but did you detect a
 race condition in the tests if the facets are simply forwarding to the
 private virtual interface? I.e., did you detect that the facet
 initialization code is unsafe? I think the facet __rw_punct_t data is safely
 initialized in both cases, it's the caching that is done incorrectly.

I originally thought so too, but now I'm having doubts. :-) And I
haven't tracked it down with 100% accuracy yet. Today I saw this
comment in src/facet.cpp, line 358:

// a per-process array of facet pointers sufficiently large
// to hold (pointers to) all standard facets for 8 locales
static __rw_facet*  std_facet_buf [__rw_facet::_C_last_type * 8];

This leads me to suspect that there is an upper limit of 8 locales plus
their standard facets. If the locales (and their facets) are being
recycled in and out of this 8-entry cache, that would explain the
other thing I also noticed (which also answers your question): yes, I
have gotten the dreaded strcmp(3C) 'Assertion failed' in
22.locale.numpunct.mt when I implemented 22.locale.numpunct.mt in
a way similar to your tests. In theory that shouldn't happen, but it
did, which means that there's some behind-the-scenes facet
re-initialization going on that I haven't found yet. That would
partially explain your observation that the MT tests perform much
worse with caching than without.

This is all investigative stuff for tomorrow. :-)

and I agree with Martin that breaking ABI in a minor release is really
not an option. I'm trying to find the best way of making these facets
thread-safe while inflicting the least horrible performance hit.

I will run your tests tomorrow and let you know. :-)

--Stefan

-- 
Stefan Teleman
KDE e.V.
stefan.tele...@gmail.com

Re: [REPORT] Apache C++ Standard Library (stdcxx)

2012-09-13 Thread Stefan Teleman

On Thu, Sep 13, 2012 at 1:12 PM, C. Bergström
cbergst...@pathscale.com wrote:

 [System lib exception was of course brought up during the BSD discussion,
 but it was said that system libraries are usually shipped by default with
 the system.  This may not always be the case with STDCXX.]

In order to best answer this question, could one of the BSD Internet
Attorneys please provide the legal definitions for the following
terms:

1. system
2. libraries
3. are
4. usually
5. shipped
6. by
7. default
8. with
9. the

Thank you.

--Stefan

-- 
Stefan Teleman
KDE e.V.
stefan.tele...@gmail.com

Re: STDCXX-1056 [was: Re: STDCXX forks]

2012-09-15 Thread Stefan Teleman

On Sat, Sep 15, 2012 at 9:01 AM, Liviu Nicoara nikko...@hates.ms wrote:

 That is funny. What compiler are you using? What does the following test
 case return for you?

It's the Intel compiler with the patched stdcxx for the wrong case,
and GCC 4.7.1 + GNU libstdc++ for the correct case.

GCC + GNU libstdc++ are correct.

The patched facet does not call the protected do_*() virtual functions
from their public counterparts, as the Standard requires. Instead, it
returns the data members directly (the data members were initialized
in the constructor). That is the patch you proposed, which indeed
performs much better than using a mutex lock. Unfortunately, in doing
so, overriding the virtual functions in a derived facet type becomes
pointless.
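To illustrate the consequence, here is a toy facet (hypothetical names, not the patch itself) that fills its cached string from do_grouping() in the constructor and returns that copy from grouping(). A virtual call made during base-class construction resolves to the base-class version, and the public function never calls do_grouping() again, so a derived override is simply ignored. The actual patch may fill its members differently; only the end effect is meant to match.

```cpp
#include <cassert>
#include <string>

// Toy facet that caches do_grouping() once, in its constructor.
class eager_punct {
public:
    // During this constructor the dynamic type is eager_punct, so the
    // virtual call below runs eager_punct::do_grouping().
    eager_punct() : grouping_(do_grouping()) {}
    virtual ~eager_punct() {}

    // Returns the cached copy; never consults do_grouping() again.
    std::string grouping() const { return grouping_; }

protected:
    virtual std::string do_grouping() const { return ""; }

private:
    std::string grouping_;
};

// A user facet overriding the virtual, as a program is allowed to do.
class my_punct : public eager_punct {
protected:
    std::string do_grouping() const override { return "\3"; }
};
```

Calling `grouping()` on a `my_punct` object returns the empty string, not `"\3"`.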

--Stefan

-- 
Stefan Teleman
KDE e.V.
stefan.tele...@gmail.com

Re: STDCXX-1056 [was: Re: STDCXX forks]

2012-09-16 Thread Stefan Teleman

On Sat, Sep 15, 2012 at 4:53 PM, Liviu Nicoara nikko...@hates.ms wrote:

 Now, to clear the confusion I created: the timing numbers I posted in the
 attachment stdcxx-1056-timings.tgz to STDCXX-1066 (09/11/2012) showed that a
 perfectly forwarding, no caching public interface (exemplified by a changed
 grouping) performs better than the current implementation. It was that test
 case that I hoped you could time, perhaps on SPARC, in both MT and ST
 builds. The t.cpp program is for MT, s.cpp for ST.

I got your patch, and have tested it.

I have created two Experiments (that's what they are called) with the
SunPro Performance Analyzer. Both experiments are targeting race
conditions and deadlocks in the instrumented program,  and both
experiments are running the 22.locale.numpunct.mt program from the
stdcxx test harness. One experiment is with  your patch applied. The
other experiment is with our (Solaris) patch applied.

Here are the results:

1. with your patch applied:

http://s247136804.onlinehome.us/22.locale.numpunct.mt.1.er.nts/

2. with our (Solaris) patch applied:

http://s247136804.onlinehome.us/22.locale.numpunct.mt.1.er.ts/

--Stefan

-- 
Stefan Teleman
KDE e.V.
stefan.tele...@gmail.com

Re: STDCXX-1056 [was: Re: STDCXX forks]

2012-09-17 Thread Stefan Teleman

On Mon, Sep 17, 2012 at 8:46 AM, Liviu Nicoara nikko...@hates.ms wrote:

 In the meantime I would like to stress again that __rw_get_numpunct is
 perfectly thread-safe and does not need extra locking for perfect
 forwarding.

So, by removing the test for

  if (!(_C_flags & _RW::__rw_gr))

(or any other bitmask for that matter), the functions which were
thread-unsafe, and were exhibiting all the symptoms of a run-time
race condition, magically became thread-safe?

I have looked *extensively* at the code in __rw_get_numpunct. It is
inherently thread-unsafe.

--Stefan

-- 
Stefan Teleman
KDE e.V.
stefan.tele...@gmail.com

Re: STDCXX-1056 [was: Re: STDCXX forks]

2012-09-17 Thread Stefan Teleman

On Mon, Sep 17, 2012 at 11:17 AM, Liviu Nicoara nikko...@hates.ms wrote:

 I hope you agree that this synchronization is sufficient for the facet
 initialization and reading of facet data.

Sorry, I do not agree. Two different thread analyzers from two
different compilers written by two different compiler writers tell me
not to.

--Stefan

-- 
Stefan Teleman
KDE e.V.
stefan.tele...@gmail.com

Re: STDCXX-1056 [was: Re: STDCXX forks]

2012-09-17 Thread Stefan Teleman

On Mon, Sep 17, 2012 at 11:38 AM, Wojciech Meyer wojciech.me...@arm.com wrote:

 so which compilers do fail? You know, some of them might use the same
 component.

Intel Compiler/Thread Analyzer on Linux, SunPro Compiler/Thread
Analyzer on Linux and Solaris (Intel and SPARC). All three of them
show the same exact problems.

The Intel Compilers and the SunPro Compilers have nothing in common
with each other.

-- 
Stefan Teleman
KDE e.V.
stefan.tele...@gmail.com

Re: STDCXX-1056 [was: Re: STDCXX forks]

2012-09-18 Thread Stefan Teleman

On Mon, Sep 17, 2012 at 11:17 AM, Liviu Nicoara nikko...@hates.ms wrote:
I hope you agree that this synchronization is sufficient for the facet
initialization and reading of facet data.

I have reduced the number of reported race conditions in
22.locale.numpunct.mt from 12440:

http://s247136804.onlinehome.us/stdcxx-1056-SPARC-20120917/22.locale.numpunct.mt.nts.1.er.html/index.html

to 288:

http://s247136804.onlinehome.us/stdcxx-1056-20120918/22.locale.numpunct.mt.5.er.html/index.html

The changes are in the following files:

http://s247136804.onlinehome.us/stdcxx-1056-20120918/facet.cpp
http://s247136804.onlinehome.us/stdcxx-1056-20120918/punct.cpp

_numpunct.h looks like this:

http://s247136804.onlinehome.us/stdcxx-1056-20120918/_numpunct.h

With these changes, no race conditions are reported for any of the
functions in std::numpunct<T>.

Still, there are 288 race conditions being reported in
__rw_locale::__rw_locale and in std::locale::_C_get_facet. We need to
identify the source and cause of these race conditions and correct
them as well.

This is not a complete solution to the problem, because we still have
to re-write the chunk of code I eliminated from facet.cpp. It is only
step one towards finding a real solution. But, at least for now, we
have pinpointed where the source of these race conditions is located,
and what is causing it.

The test program was run as: ./22.locale.numpunct.mt --nthreads=8
--nloops=1.

More to follow.

--Stefan

--
Stefan Teleman
KDE e.V.
stefan.tele...@gmail.com

Re: STDCXX-1056 [was: Re: STDCXX forks]

2012-09-18 Thread Stefan Teleman

On Tue, Sep 18, 2012 at 12:43 PM, Liviu Nicoara nikko...@hates.ms wrote:

 I am attaching a test program which, while 100% MT-safe, is flagged by
 the Solaris thread analyzer.

The program as written is not thread safe. It is reading the value of
the counter variable and performing a zero comparison outside of a
mutex lock:

for (size_t i = 0; i < nloops; ++i) {
    if (counter == 0) {  // <---
        pthread_mutex_lock (&lock);
        if (counter == 0)
            ++counter;
        pthread_mutex_unlock (&lock);
    }
    else {
        // counter value is safe to use here
    }
}




-- 
Stefan Teleman
KDE e.V.
stefan.tele...@gmail.com

Re: STDCXX-1056 [was: Re: STDCXX forks]

2012-09-18 Thread Stefan Teleman

On Tue, Sep 18, 2012 at 4:35 PM, Liviu Nicoara nikko...@hates.ms wrote:

 I will concede that I might be wrong and I am open to arguments. I would
 accept as a counter-argument this program if you can show a runtime failure.

The first read of the counter variable is outside a mutex lock,
correct? The read is followed by a 0 comparison, correct?

What guarantees that between the read and the comparison the value of
the counter variable hasn't been modified by another thread? The
thread currently doing the comparison cannot guarantee it: it hasn't
locked the mutex. Other threads may be running - actually they
probably are. Another thread may have already acquired the mutex and
incremented the value of counter. Your thread has no way of knowing if
that has happened, because it does not yet have exclusive access to
the counter variable. It will, after it acquires the mutex.

Where is it reading the variable from? A register? Is it declared
volatile? L2 cache? L3 cache?

The program, as you wrote it, implicitly acknowledges that it is not
thread safe. That is the point of the double check: one before the
mutex lock, and one after it. The point of the first check has nothing
to do with thread-safety, and everything to do with a minor
optimization: if the value stored in variable counter is already not
zero, then there's no point in locking the mutex or performing the
increment.
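The net effect of the quoted loop is to move the counter from 0 to 1 exactly once. C++11 packages that same check, lock, re-check idea in two ready-made forms, shown in the illustrative sketch below. It assumes C++11 and uses hypothetical names (`g_value`, `init_value`, `compute_value`, and so on). The first form is `std::call_once` with a `std::once_flag`. The second is a function-local static, whose initialization the language guarantees to be thread-safe.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical names, for illustration only.
static int g_value = 0;               // written once, then only read
static std::once_flag g_value_once;

static void init_value()
{
    ++g_value;                        // mirrors "++counter" in the quoted loop
}

int get_value_call_once()
{
    // call_once guarantees init_value runs exactly once, and its effects
    // are visible to every thread that returns from call_once.
    std::call_once(g_value_once, init_value);
    return g_value;
}

static int g_init_calls = 0;          // counts initializations, for illustration

static int compute_value()
{
    ++g_init_calls;
    return 1;
}

int get_value_static()
{
    // Since C++11, initialization of a function-local static is
    // thread-safe: concurrent callers wait until it has finished.
    static const int value = compute_value();
    return value;
}
```

In either form, the compiler and runtime supply the fast-path check that the hand-written first `if (counter == 0)` was approximating.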

--Stefan

-- 
Stefan Teleman
KDE e.V.
stefan.tele...@gmail.com

Re: STDCXX-1056 [was: Re: STDCXX forks]

2012-09-18 Thread Stefan Teleman

On Tue, Sep 18, 2012 at 7:38 PM, Liviu Nicoara nikko...@hates.ms wrote:

 1. The facet data caching is not MT-safe
 2. The facet data initialization (STDCXX or system locales) is safe (*)
 3. There is no unit test currently showing a failure in (2)
 4. Timing results show that caching may be slower than non-caching in MT
 builds
 5. A fix should, ideally, be binary compatible
 6. A fix should, ideally, preserve performance or increase it
 7. There is one patch, currently attached to the issue, by Stefan
 8. Other partial patches are referenced from this thread

 Please correct me if I missed anything. The above summary is a good starting

For the record, I fundamentally disagree with your assessment above.
It is not based on any verifiable and reproducible facts, analysis of
facts and/or measurements. It is based on assertions, which are, in
turn, based on other assertions.

As long as you continue dismissing the results from 4 [ FOUR ]
different thread analyzers, on 4 [ FOUR ] different operating systems,
and that solely on the basis of your assertions and beliefs, there is
no point in continuing this debate.

The results from all four thread analyzers contradict all of your
assertions. If you firmly and strongly believe that you are always
right, and that the four thread analyzers are always wrong, that is
your choice.

--Stefan

-- 
Stefan Teleman
KDE e.V.
stefan.tele...@gmail.com

STDCXX-1056 : numpunct fix

2012-09-19 Thread Stefan Teleman

This is a proposed fix for the numpunct facet for stdcxx-1056:

0. Number of reported race conditions is now 0 (zero).

1. No memory leaks in stdcxx (there are memory leaks reported in either
libc or glibc, but there's nothing we can do about these anyway).

2. This fix preserves perfect forwarding in the _numpunct.h header file.

3. This fix eliminates code from facet.cpp and locale_body.cpp which
was creating unnecessary overhead, with the potential of causing
memory corruption, while providing no discernible benefit.

More specifically:

It is not true that there was no eviction policy of cached locales or
facets in stdcxx. Not only did cache eviction code exist (and it still
exists today), but cache cleanups and resizing were also performed
periodically, either when an object's reference count dropped to 0
(zero), or whenever the number of cached objects fell below
sizeof(cache) / 2.

In the latter case, both the facet cache and the locale cache performed
a new allocation of the cache array, followed by a memcpy and a delete[]
of the old cache array.

First, the default size of the facets and locales caches was too small:
it was set to 8. I raised this to 32. A direct consequence of this
insufficient default size of 8 was that the cache had to resize itself
very soon after program startup. This cache resize operation consists of:
allocate memory for a new cache, copy the existing cached objects
from the old cache to the new cache, and then delete[] the old cache.

This is a first unnecessary overhead.

Second, and as I mentioned above, whenever the number of cached objects
fell below sizeof(cache) / 2, the cache resized itself, by performing
the same sequence of operations as described above.

This is a second unnecessary overhead.

Third, cached objects were automatically evicted whenever their reference
count dropped to 0 (zero). There are two consequences to this eviction
policy. First, if the program needs to re-use an object (facet or locale)
which has been evicted and subsequently destroyed, this object then needs
to be constructed again and re-inserted into the cache. Second, this
re-insertion would, in turn, trigger a cache resize, followed by copying
and delete[] of the old cache buffer.

Object eviction followed by destruction followed by reconstruction is
a third unnecessary overhead. Re-inserting a re-constructed object into
the cache, followed by a potential cache resize involving allocation of
a new buffer, copying pointers from the old cache to the new cache,
followed by delete[] of the old cache is a fourth unnecessary overhead.

Real-life programs tend to reuse locales and/or facets they have created.
There is no point in destroying and evicting these objects simply because
there may be periods when they aren't referenced. Such an object is
likely to be needed again later on.

The fix proposed here eliminates the cache eviction and object destruction
policy completely. Once created, objects remain in the cache, even though
they may reside in the cache with no references. This eliminates the
cache resize / copy / delete[] overhead. It also eliminates the overhead
of re-constructing an evicted / destroyed object, if it is needed again
later.
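A minimal sketch of the grow-only cache policy described above follows. It assumes an initial capacity of 32 and growth by doubling; the message does not state a growth factor, so doubling is an assumption. The class and its members are hypothetical, and this is not the stdcxx facet or locale cache. It deliberately omits locking, so a cache shared between threads would need every call made under a mutex.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical entry: a cached object plus its reference count.
struct Entry {
    const void* object;   // cached facet or locale body
    long        refs;     // may drop to 0 without the entry being evicted
};

// Hypothetical grow-only cache: never evicts, never shrinks.
// NOT thread-safe by itself; callers must serialize access.
class GrowOnlyCache {
public:
    explicit GrowOnlyCache(std::size_t initial_capacity = 32)
        : buf_(new Entry[initial_capacity]), size_(0),
          capacity_(initial_capacity), reallocations_(0) {}

    ~GrowOnlyCache() { delete[] buf_; }

    GrowOnlyCache(const GrowOnlyCache&) = delete;
    GrowOnlyCache& operator=(const GrowOnlyCache&) = delete;

    // Inserts an object with one reference; grows by doubling when full.
    std::size_t insert(const void* obj) {
        if (size_ == capacity_) {
            Entry* bigger = new Entry[capacity_ * 2];
            std::memcpy(bigger, buf_, size_ * sizeof(Entry));
            delete[] buf_;
            buf_ = bigger;
            capacity_ *= 2;
            ++reallocations_;
        }
        buf_[size_].object = obj;
        buf_[size_].refs = 1;
        return size_++;
    }

    // Re-uses a cached object: no reconstruction, no re-insertion.
    void acquire(std::size_t slot) {
        assert(slot < size_);
        ++buf_[slot].refs;
    }

    // Drops a reference. The entry stays cached even at 0 references.
    void release(std::size_t slot) {
        assert(slot < size_ && buf_[slot].refs > 0);
        --buf_[slot].refs;
    }

    long refs(std::size_t slot) const { return buf_[slot].refs; }
    std::size_t size() const { return size_; }
    std::size_t capacity() const { return capacity_; }
    std::size_t reallocations() const { return reallocations_; }

private:
    Entry*      buf_;
    std::size_t size_;
    std::size_t capacity_;
    std::size_t reallocations_;
};
```

With a starting capacity of 32, the first reallocation happens only on the 33rd insertion. Releasing every reference afterwards leaves both the size and the capacity unchanged.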

4. Tests and Analysis Results:

4.1. SunPro 12.3 / Solaris / SPARC / Race Conditions Test:

http://s247136804.onlinehome.us/stdcxx-1056-20120919/22.locale.numpunct.mt.sunpro.solaris-sparc.datarace.er.html/index.html

4.2. SunPro 12.3 / Solaris / SPARC / Heap and Memory Leaks Test:

http://s247136804.onlinehome.us/stdcxx-1056-20120919/22.locale.numpunct.mt.sunpro.solaris-sparc.heapcheck.er.html/index.html

4.3. SunPro 12.3 / Linux / Intel / Race Conditions Test:

http://s247136804.onlinehome.us/stdcxx-1056-20120919/22.locale.numpunct.mt.sunpro.linux-intel.datarace.er.html/index.html

4.4. SunPro 12.3 / Linux / Intel / Heap and Memory Leaks Test:

http://s247136804.onlinehome.us/stdcxx-1056-20120919/22.locale.numpunct.mt.sunpro.linux-intel.heapcheck.er.html/index.html

4.5. Intel 2013 / Linux / Intel / Race Conditions Test:

http://s247136804.onlinehome.us/stdcxx-1056-20120919/22.locale.numpunct.mt.intel.linux.datarace.r007ti3.inspxez

4.6. Intel 2013 / Linux / Intel / Heap and Memory Leaks Test:

http://s247136804.onlinehome.us/stdcxx-1056-20120919/22.locale.numpunct.mt.intel.linux.heapcheck.r008mi1.inspxez

5. Source code for this fix:

http://s247136804.onlinehome.us/stdcxx-1056-20120919/_numpunct.h
http://s247136804.onlinehome.us/stdcxx-1056-20120919/facet.cpp
http://s247136804.onlinehome.us/stdcxx-1056-20120919/locale_body.cpp
http://s247136804.onlinehome.us/stdcxx-1056-20120919/punct.cpp

These files are based on stdcxx 4.2.1.

--
Stefan Teleman
KDE e.V.
stefan.tele...@gmail.com

Re: STDCXX-1056 : numpunct fix

2012-09-19 Thread Stefan Teleman

 instrumentation, both with SunPro and with
Intel compilers, optimization of any kind must be disabled. On SunPro
you have to pass -xkeepframe=%all (which disables tail-call
optimization as well), in addition to passing -xO0 and -g. So the
timings for these unoptimized experiments would have been completely
irrelevant.

--Stefan

-- 
Stefan Teleman
KDE e.V.
stefan.tele...@gmail.com

Re: STDCXX-1056 : numpunct fix

2012-09-20 Thread Stefan Teleman

On Thu, Sep 20, 2012 at 4:45 PM, Travis Vitek
travis.vi...@roguewave.com wrote:


 I'll let you in on a little secret: once you call setlocale(3C) and
 localeconv(3C), the Standard C Library doesn't release its own locale
 handles until process termination. So you might think you save a lot
 of memory by destroying and constructing the same locales. You're
 really not. It's the Standard C Library locale data which takes up a
 lot of space.

 You have a working knowledge of all Standard C Library implementations?

I happen to, yes, for the operating systems that I've been testing
on. I also happen to know that you don't. This fact alone pretty much
closes up *this* particular discussion.

Do yourself, and this mailing list a favor: either write a patch which
addresses all of your concerns *AND* eliminates all the race
conditions reported, or stop this pseudo software engineering bullshit
via email.

There is, apparently, a high concentration of know-it-alls on this
mailing list, who are much better at detecting race conditions and
thread unsafety than the tools themselves. Too bad they aren't as good
at figuring out their own bugs.

It took eight months for anyone here to even *acknowledge* that
numpunct and moneypunct do have, in fact, a thread safety problem.
Never mind that the test case for these facets had been crashing for 4
years. To be quite blunt and to the point, after 8 months of denying
obvious facts, your credibility is quite a bit under question at this
point.

This entire discussion has become a perfect illustration of what's
wrong with the ASF, as reported here:

http://www.mikealrogers.com/posts/apache-considered-harmful.html

-- 
Stefan Teleman
KDE e.V.
stefan.tele...@gmail.com

Re: STDCXX-1056 : numpunct fix

2012-09-20 Thread Stefan Teleman

On Thu, Sep 20, 2012 at 7:34 PM, Wojciech Meyer
wojciech.me...@googlemail.com wrote:
 Hi,

 My perception, from reading through the whole thread, is that we should
 not trust external tools 100% to assess the safety of the code. I don't
 think there exists an algorithm that produces no false positives.

 That said, I admire Stefan's approach, but we should ask the question:
 are we MT safe enough? I would say from what I read here: yes.

Based on what objective metric?

--Stefan

-- 
Stefan Teleman
KDE e.V.
stefan.tele...@gmail.com

Re: STDCXX-1056 : numpunct fix

2012-09-20 Thread Stefan Teleman

On Thu, Sep 20, 2012 at 7:22 PM, Liviu Nicoara nikko...@hates.ms wrote:

 Stefan, I want to be clear. You are talking about a patch identical in nature 
 to the one I have attached now. Just want to be 100% sure we are talking 
 about the same thing. This one still produces failures (crashes, assertions, 
 etc.) in the locale MT tests on SPARC and elsewhere in your builds?

On September 17, 2012 I have posted the following message to this list:

http://www.mail-archive.com/dev@stdcxx.apache.org/msg01929.html

In that message, there is a link to my SPARC thread-safety test results:

http://s247136804.onlinehome.us/stdcxx-1056-SPARC-20120917/22.locale.numpunct.mt.nts.1.er.html/index.html

This test was run with the following _numpunct.h file:

http://s247136804.onlinehome.us/stdcxx-1056-SPARC-20120917/22.locale.numpunct.mt.nts.1.er.html/file.14.src.txt.html

The test above shows 12440 race conditions detected for a test run of
22.locale.numpunct.mt, with --nthreads=8 --nloops=1.

Did you ever look at these test results? From reading your email, I
realize that you never looked at them. That is the only possible
explanation as to why you're asking for SPARC test results now, today
being September 20, 2012.

-- 
Stefan Teleman
KDE e.V.
stefan.tele...@gmail.com

Re: STDCXX-1056 : numpunct fix

2012-09-20 Thread Stefan Teleman

On Thu, Sep 20, 2012 at 7:52 PM, Liviu Nicoara nikko...@hates.ms wrote:

 On Sep 20, 2012, at 7:49 PM, Stefan Teleman wrote:

 On Thu, Sep 20, 2012 at 7:40 PM, Liviu Nicoara nikko...@hates.ms wrote:

 The only gold currency that anyone in here accepts without reservations are 
 failing test cases. I believe I have seen some exceptions to the golden 
 rule in my RW time, but I can't recall any specific instance.

 That may be a valid metric here.

 The only one. Any programmer worth his salt -- I am borrowing your words here 
 -- would be able to demonstrate the validity of his point of view with a test 
 case.

I did. There are 12440 race conditions detected for an incomplete run
of 22.locale.numpunct.mt. By incomplete I mean: it did not run with
its default nthreads and nloops which I believe are 8 threads and
20 loop iterations.

I presented a *proposal* fix which:

1. keeps your _numpunct.h forwarding patch
2. eliminates 100% of the race conditions

I have yet to see a counter-proposal.

The only things I've seen are assertions (race condition
instrumentation and detection tools are wrong), mischaracterizations
(your patch is evil) and overall just email bullshit.

Not a single line of code which would resolve the 12440 race conditions problem.

-- 
Stefan Teleman
KDE e.V.
stefan.tele...@gmail.com

Re: STDCXX-1056 : numpunct fix

2012-09-20 Thread Stefan Teleman

On Thu, Sep 20, 2012 at 8:04 PM, Wojciech Meyer
wojciech.me...@googlemail.com wrote:


 Therefore please use tools but be a bit reserved for the results.

I *am* being cautiously skeptical about the results. That's why I am
using 4 [ FOUR ] different thread analyzers, on three different
operating systems, each one of them in 32- and 64-bit, and not just
one.

With this testing setup described above, when all FOUR instrumentation
tools report the same exact problem in the same exact spot, for all
flavors of the operating system, what would be a rational conclusion?

1. There is indeed a race condition and thread safety problem, it
needs to be investigated and fixed.

2. Bah, the tools are crap, nothing to see here, move along, declare victory.

I chose [1] because I am willing to accept my *own* limitations.

-- 
Stefan Teleman
KDE e.V.
stefan.tele...@gmail.com

Re: STDCXX-1056 : numpunct fix

2012-09-20 Thread Stefan Teleman

On Thu, Sep 20, 2012 at 8:18 PM, Liviu Nicoara nikko...@hates.ms wrote:

 That is not it, and you did not. Please pay attention: given your assertion 
 that a race condition is a defect that causes an abnormal execution of the 
 program during which the program sees abnormal, incorrect states (read: 
 variable values) it should be easy for you to craft a program that shows 
 evidence of that by either printing those values, or aborting upon detecting 
 them, etc.

Oh, I see.

So now I'm supposed to write a program which may, or may not, prove to
you that the 12440 race conditions detected by SunPro and Intel are,
in fact, real race conditions (as opposed to fake race
conditions)?

And the means of proving the existence of these real race conditions
is ... [ drum roll ] ... fprintf(3C)?

This is very funny. You made my day.

Have a nice evening.

-- 
Stefan Teleman
KDE e.V.
stefan.tele...@gmail.com

Re: STDCXX-1056 : numpunct fix

2012-09-20 Thread Stefan Teleman

On Thu, Sep 20, 2012 at 8:39 PM, Liviu Nicoara nikko...@hates.ms wrote:

 I have not created this requirement out of thin air. STDCXX development has 
 functioned in this manner for as long as I remember. If it does not suit you, 
 that's fine.

That would explain why these bugs are present in the first place.

If the official method of determining thread-safety here is
fprintf(3C), then we have a much bigger problem than
22.locale.numpunct.mt.


-- 
Stefan Teleman
KDE e.V.
stefan.tele...@gmail.com

Re: STDCXX-1056 : numpunct fix

2012-09-21 Thread Stefan Teleman

On Fri, Sep 21, 2012 at 2:28 AM, Travis Vitek
travis.vi...@roguewave.com wrote:

 You called out premature optimization as evil, in a discussion about patches 
 you provided that include optimizations and no testcase showing that your 
 changes are not premature and provide measureable benefit. Then you rail on...

I didn't call premature optimization evil. Donald Knuth did. If you
disagree with him, please feel free to let him know. He's on faculty
at Stanford.

 Then, to top it off, you go on to call me a know-it-all who isn't capable of 
 figuring out my own bugs.

 I'm sorry, but that isn't acceptable

Too bad if you feel that way. Next time you get the idea of making
snide remarks about my working knowledge of the Standard C Library, or
offer out-of-context one-line code fragments completely unrelated to
what this 1056 bug is about, maybe you'll think twice. And this isn't
the first time you offer gratuitous snide remarks directed at me.

You are one of the deniers of the existence of this thread safety
problem in the facets code, going back to early February of this year.

Between the release of stdcxx 4.2.1 in 2008 and the beginning of this
month, when the possibility of this thread safety problem was finally
acknowledged, did you really not know that 22.locale.numpunct.mt and
22.locale.moneypunct.mt have been crashing or getting stuck? Did you
really not know that these crashes were typical symptoms of race
conditions? I find that very hard to believe, given that the problem
has been reported several times before February of this year.

I have provided this list with test results showing that my patch
*does* fix the race condition problems identified by all the tools at
my disposal. I'm willing to bet you never looked at it. You dismissed
it outright, just because you didn't like the *idea* that increasing
the size of the cache, and eliminating that useless cache resizing
would play an important role in eliminating the race conditions.

I have yet to see an alternative patch being proposed here, which
would eliminate the race conditions, and which I am willing to test
just as thoroughly as I've tested mine. The only things I've seen are
continued attempts at dismissing the existence of these race
conditions, as reported by the instrumentation tools, based on some
general axioms about their accuracy. No-one on this list has a clue as
to how the SunPro Thread Analyzer actually works, because it's not
open source, and none of you work at Oracle, therefore you can't see
the code. But everyone just *knows* that it's inaccurate, or that it
reports false positives.

As long as you, or anyone else, continues to blame the instrumentation
tools for the bug, and as long as anyone here continues to suggest
that the only acceptable proof of the existence of this bug is some
other program which needs to be written using fprintf(3C), and as long
as no-one here is willing to provide an alternative patch which
demonstrably eliminates 100% of the reported race conditions, this
entire back-and-forth about the existence of these race conditions,
the accuracy of the tools and what they are reporting is nothing but a
giant waste of time.

-- 
Stefan Teleman
KDE e.V.
stefan.tele...@gmail.com

Re: STDCXX-1066 [was: Re: STDCXX forks]

2012-09-23 Thread Stefan Teleman

On Sun, Sep 23, 2012 at 3:26 PM, Liviu Nicoara nikko...@hates.ms wrote:

 To be honest it's quite bizarre that you cannot share that with us. Is it
 really a trade secret? How can that be the case if Oracle customers are also
 required to perform the same alignment, perhaps using the same techniques
 like you showed in the patch?

That's the problem. I don't know what is and what is not a trade
secret, or copyrighted information, or unauthorized intellectual
property disclosure anymore.  I think it's in the eye of the
beholder. At Sun it was very clear.

Believe it or not, I had to get written approval from the Legal
Counsel's Office in order to be able to share these patches. And that
in spite of the fact that these patches are published, and have
already been published for *years*.

IANAL and I don't want to play one on TV. ;-)

--Stefan

-- 
Stefan Teleman
KDE e.V.
stefan.tele...@gmail.com

Re: STDCXX-1066 [was: Re: STDCXX forks]

2012-09-23 Thread Stefan Teleman

On Sun, Sep 23, 2012 at 5:23 PM, Stefan Teleman
stefan.tele...@gmail.com wrote:

 The second URL says this:

 QUOTE
 Due to a change in the implementation of the userland mutexes
 introduced by CR 6296770 in KU 137111-01, objects of type mutex_t and
 pthread_mutex_t must start at 8-byte aligned addresses. If this
 requirement is not satisfied, all non-compliant applications on
 Solaris/SPARC may fail with the signal SEGV with a callstack similar
 to the following one or with similar callstacks containing the
 function mutex_trylock_process.

   *_atomic_cas_64(0x141f2c, 0x0, 0xff00, 0x1651, 0xff00, 0x466d90)
   set_lock_byte64(0x0, 0x1651, 0xff00, 0x0, 0xfec82a00, 0x0)
   fast_process_lock(0x141f24, 0x0, 0x1, 0x1, 0x0, 0xfeae5780)

 /QUOTE

Here's a link to an official datatype alignment table for SPARCV8:

http://docs.oracle.com/cd/E19205-01/819-5267/bkbkl/index.html

The interesting table is:

Table B–2 Storage Sizes and Default Alignments in Bytes
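The sketch below illustrates two ways to honor the 8-byte alignment requirement quoted above. It assumes C++11 and a POSIX `<pthread.h>`. The first uses an `alignas` specifier on an embedded mutex member; the second uses `std::align` to find mutex storage inside a raw byte buffer. `Guarded`, `is_aligned` and `place_mutex` are hypothetical names, and this is not the patch under discussion.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <pthread.h>

// True if p is aligned to `alignment` bytes (a power of two).
inline bool is_aligned(const void* p, std::size_t alignment)
{
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}

// Hypothetical object embedding a mutex after a 1-byte member.
// With two alignas specifiers the stricter one wins, so the mutex gets
// at least 8-byte alignment and never less than its natural alignment.
struct Guarded {
    char flag;
    alignas(8) alignas(pthread_mutex_t) pthread_mutex_t lock;
};

static_assert(alignof(Guarded) >= 8, "Guarded must be at least 8-byte aligned");
static_assert(offsetof(Guarded, lock) % 8 == 0, "lock must sit at an 8-byte offset");

// Hypothetical helper for code that carves objects out of a raw buffer
// (e.g. a pool): std::align moves the pointer forward to an 8-byte
// boundary, or returns nullptr if the buffer is too small afterwards.
pthread_mutex_t* place_mutex(void* buffer, std::size_t space)
{
    void* p = buffer;
    if (!std::align(8, sizeof(pthread_mutex_t), p, space))
        return nullptr;
    return static_cast<pthread_mutex_t*>(p);
}
```

`place_mutex` only locates suitably aligned storage. The caller would still initialize that storage with `pthread_mutex_init` before use.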

--Stefan

-- 
Stefan Teleman
KDE e.V.
stefan.tele...@gmail.com

Re: STDCXX-1056 : numpunct fix

2012-09-23 Thread Stefan Teleman

On Fri, Sep 21, 2012 at 9:10 AM, Liviu Nicoara nikko...@hates.ms wrote:
 On 09/21/12 05:13, Stefan Teleman wrote:

 On Fri, Sep 21, 2012 at 2:28 AM, Travis Vitek
 travis.vi...@roguewave.com wrote:

 I have provided this list with test results showing that my patch
 *does* fix the race condition problems identified by all the tools at
 my disposal. I'm willing to bet you never looked at it. You dismissed
 it outright, just because you didn't like the *idea* that increasing
 the size of the cache, and eliminating that useless cache resizing
 would play an important role in eliminating the race conditions.



 I looked at it in great detail and I am sure Travis read it too. The
 facet/locale buffer management is a red herring. Apparently, we cannot
 convince you, regardless of the argument. That's fine by me.

This bug [STDCXX-1056] was updated over the weekend with new comments.
Here's the link to the comments, for the record:

https://issues.apache.org/jira/browse/STDCXX-1056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13461452#comment-13461452

-- 
Stefan Teleman
KDE e.V.
stefan.tele...@gmail.com

Re: STDCXX-1066 [was: Re: STDCXX forks]

2012-09-23 Thread Stefan Teleman

On Sun, Sep 23, 2012 at 7:25 PM, Liviu Nicoara nikko...@hates.ms wrote:

 I am not asking for any other implementation and I am not looking to change
 anything. I wish you could explain it to us, but in the absence of trade
 secret details I will take an explanation for the questions above.

Sorry, no.

This will not be another replay of the stdcxx-1056 email discussion,
which was a three-week waste of my time.

The patch is provided AS IS. No further testing and no further
comments. I do have more important things to do.

-- 
Stefan Teleman
KDE e.V.
stefan.tele...@gmail.com

Re: STDCXX-1066 [was: Re: STDCXX forks]

2012-09-24 Thread Stefan Teleman

On Mon, Sep 24, 2012 at 8:21 AM, Liviu Nicoara nikko...@hates.ms wrote:

 In the light of your inability to answer the simplest questions about the
 correctness and usefulness of this patch, I propose we strike the patch in
 its entirety.

Let me make something very clear to you: what I am doing here is a
courtesy to the stdcxx project. There is no requirement in my job
description to waste hours arguing with you in pointless squabbles
over email. Nor is there a requirement in the APL V2.0 which would
somehow compel us to redistribute our source code changes.

  We could and should re-work the instances in the library where
 we might use mutex and condition objects that are misaligned wrt the
 mentioned kernel update.

Yeah, why don't you go ahead and do that. Just like you fixed stdcxx-1056.

--Stefan

-- 
Stefan Teleman
KDE e.V.
stefan.tele...@gmail.com

Re: STDCXX-1056 : numpunct fix

2012-09-24 Thread Stefan Teleman

On Mon, Sep 24, 2012 at 7:48 PM, Liviu Nicoara nikko...@hates.ms wrote:

 Stefan, was it your intention to completely eliminate all the race
 conditions with this last patch? Is this what the tools showed in your
 environment?

 https://issues.apache.org/jira/browse/STDCXX-1056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13461452#comment-13461452

Yes, all the race conditions in std::numpunct and std::moneypunct.
Not all the race conditions in stdcxx.

Yes, that's what the test results show with my latest patchset - 0
race conditions in numpunct and moneypunct on Linux/Intel 32/64,
Solaris SPARC 32/64 and Solaris Intel 32/64. The test results are in
the onlinehome.us directory URL I put in the comment.

-- 
Stefan Teleman
KDE e.V.
stefan.tele...@gmail.com

Re: STDCXX-1056 : numpunct fix

2012-09-24 Thread Stefan Teleman

On Mon, Sep 24, 2012 at 10:03 PM, Martin Sebor mse...@gmail.com wrote:
 FWIW, there are race conditions in stdcxx. Some of them are by
 design and benign on the systems the library runs on (although
 I acknowledge that some others may be bugs). One such benign
 data race is:

   time   T1       T2
   0      x = N
   1      x = N    read x

 where x is a scalar that can be accessed atomically by the CPU
 and the compiler.

 I think some of the lazy facet initialization falls under this
 class. It would be nice to fix them but only if it doesn't slow
 things down. The others need to be fixed in any case.
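One way to make the quoted `x = N` / `read x` pattern well-defined without adding locks is to declare `x` as a C++11 atomic and use relaxed loads and stores. The sketch below assumes C++11 and uses hypothetical names. On mainstream CPUs an aligned word-sized relaxed load or store typically compiles to an ordinary load or store, which bears on the "only if it doesn't slow things down" condition. That should still be measured rather than assumed. Relaxed ordering is only enough when the value itself is the whole payload. If `x` were a pointer to data that the writing thread had just initialized, as lazily initialized facet data would be, the store would need `memory_order_release` and the load `memory_order_acquire`.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical lazily computed value: 0 means "not computed yet".
static std::atomic<int> g_cached(0);
static std::atomic<int> g_computations(0);   // counts computations, for illustration

static int compute()
{
    g_computations.fetch_add(1, std::memory_order_relaxed);
    return 42;   // every caller computes the same value N
}

int get_cached()
{
    // Relaxed atomics make the concurrent "x = N" / "read x" accesses
    // well-defined without imposing any ordering on other memory.
    int v = g_cached.load(std::memory_order_relaxed);
    if (v == 0) {
        v = compute();
        // Several threads may store the same N here; that is harmless.
        g_cached.store(v, std::memory_order_relaxed);
    }
    return v;
}
```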

The race conditions I am talking about are not benign.

I've uploaded a full thread analyzer output for 22.locale.numpunct.mt
showing dataraces here:

http://s247136804.onlinehome.us/stdcxx-1056-malign/22.locale.numpunct.mt.2.er.tar.bz2

The name of the analyzer results directory is 22.locale.numpunct.mt.2.er

You will need the SunPro Linux 12.3 Thread Analyzer installed, which
comes with SunPro anyway. The analyzer itself is
${PATH_TO_SUN_PRO_INSTALL}/bin/analyzer

There's a screenshot from the same Analyzer output here:

 
http://s247136804.onlinehome.us/stdcxx-1056-malign/sunpro_thread_analyzer_screenshot.jpg

taken just now on my laptop. The report itself is from a couple of
days ago, and it's from a run with only the _numpunct.h patch applied.
No patches to either facet.cpp, punct.cpp or locale_body.cpp.

It shows the types of race conditions it's reporting: these are
read/write race conditions, not read/read. The thread analyzer's html
filter doesn't show the types of races reported as clearly as the
command-line analyzer which has a windowing GUI.

At any rate you can see the same exact type of race conditions being
reported by the Intel Inspector 2013 Thread Analyzer.

-- 
Stefan Teleman
KDE e.V.
stefan.tele...@gmail.com

Re: [jira] [Closed] (STDCXX-1056) std::moneypunct and std::numpunct implementations are not thread-safe

2012-09-25 Thread Stefan Teleman

On Tue, Sep 25, 2012 at 8:05 PM, Liviu Nicoara nikko...@hates.ms wrote:
 On 9/25/12 7:56 PM, Stefan Teleman (JIRA) wrote:


   [
 https://issues.apache.org/jira/browse/STDCXX-1056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


 Stefan,

 I don't think it's ok to close this bug. The race conditions are there and
 we have not come to a completely satisfactory conclusion on how to deal with
 them, or even if we should deal with them. Whichever it is we gotta keep
 this discussion open. I sure hope you want to be a part of it.

 FWIW, I have spent quite some time looking at your proposed patch and I am
 going to reopen the incident. If I can.

I am done wasting my time on trying to convince this project that it
should fix its severe bugs.

If and when this project decides to abandon the adolescent attitude
which appears to be one of its primary characteristics, let me know.

-- 
Stefan Teleman
KDE e.V.
stefan.tele...@gmail.com

Re: Search for new chair

2013-05-29 Thread Stefan Teleman

On Wed, May 29, 2013 at 5:55 PM, Martin Sebor mse...@gmail.com wrote:
 On 05/29/2013 07:27 AM, Stefan Teleman wrote:

 On Wed, May 29, 2013 at 7:33 AM, C. Bergström
 cbergst...@pathscale.com wrote:

 On 05/29/13 06:29 PM, Jim Jagielski wrote:


 I am stepping down as Chair of the C++ StdLib PMC.

 So the question is: Does this project and community
 elect a new Chair, or does it enter the Attic?


 I'd be willing to chair if others are supportive


 OK, so before I give you a +1, could you please outline what is your
 Plan(TM) regarding resurrecting this project?

 How are you going to do it, and, more specifically, what are you going to
 do?


 FYI: A chair doesn't necessarily need to have a plan to do anything
 other than fulfill the duties assigned to them by the ASF:

   http://www.apache.org/dev/pmc.html#chair

 It's mostly a bureaucratic role, and can be a big time sink (reading
 all the mailing lists, like board and members can be especially time
 consuming). Other than that, being a chair doesn't give one any more
 power or ability to assure the success of a project than the rest of us.

 But to be chair, one needs to be a member of the foundation. It's
 usually a non-trivial process for one to become a member. Some of
 the prerequisites include long time contribution to at least one
 project, the sponsorship and nomination by another member, and
 a vote to accept the new member of the rest of the membership.
 I think the vote happens just a few times a year, and the last
 one was just last week.

 Maybe there's a way around this bureaucracy if the alternative
 is shutting the project down.

My question was purely pragmatic. If we are to try reviving the
project (again) and elect another chair (again) I'd like to know that
it won't be yet another dead-end exercise.

I also think Leandro asked a very pertinent set of questions in his
earlier post.

--Stefan


-- 
Stefan Teleman
KDE e.V.
stefan.tele...@gmail.com
