Re: Fixing more OpenSSL callback crashes

2017-05-04 Thread Ruediger Pluem


On 05/04/2017 05:47 PM, Jacob Champion wrote:
> On 05/03/2017 11:25 PM, Ruediger Pluem wrote:
>> Just as a heads up as I currently don't have time to investigate further. I 
>> get the below on CentOS 6.9 64 bit, which
>> puzzles me a little bit as I would expect the errno addresses to be 
>> different in different threads on my OS.
>>
>> [Thu May 04 06:17:13.723918 2017] [ssl:notice] [pid 2629:tid 
>> 140039001335552] AH10028: using deprecated
>> CRYPTO_set_id_callback for OpenSSL
>>
>> OpenSSL used is the one delivered by CentOS: openssl-1.0.1e-57.el6.x86_64
> 
> Do you use an installed APR or build using --with-included-apr? The configure 
> test that determines whether your errno is
> safe has to be linked against an already-built APR; otherwise the fallbacks 
> come into play.

This is likely the cause for this. I have apr trunk dropped in the srclib 
directory and build this together with httpd
trunk.
You mentioned this in the threads. I forgot about this :-). Thanks for the 
reminder.

Regards

RĂ¼diger


Re: Fixing more OpenSSL callback crashes

2017-05-04 Thread Jacob Champion

On 05/04/2017 09:39 AM, Jacob Champion wrote:

On 05/04/2017 09:36 AM, William A Rowe Jr wrote:

Ugh... This suggests we've further broken crosscompile, just noticed
this based on your comment.


Why? Cross-compilation uses the same fallback mechanism.


To expand on this, there are three choices for implementation for older 
(pre-1.1.0) OpenSSLs:


- Builtin (optimal on some platforms, nonexistent or unsafe on others)
- Deprecated (but believed safe enough for most)
- Dangerous (but still apparently good enough for a bunch of people?)

Builtin is only used if we can *prove* that the builtin implementation 
is available and safe. Some environments (Windows + 1.0.x) are known to 
have safe builtins; everyone else has to run a test.


If we can't run that test for any reason, we fall back to the Deprecated 
implementation. If that API is no longer available (e.g. 
OPENSSL_NO_DEPRECATED is in use), we have no choice but to use the 
Dangerous implementation.


So since we can't run a test executable on a cross-compile target, if 
you're cross-compiling to a platform that isn't "known safe", we'll fall 
back to the Deprecated implementation if it's available. That choice can 
be overridden with a cachevar, if you know your platform guarantees 
safety -- ac_cv_openssl_use_errno_threadid in this case.


--Jacob


Re: Fixing more OpenSSL callback crashes

2017-05-04 Thread Jacob Champion

On 05/04/2017 09:36 AM, William A Rowe Jr wrote:

Ugh... This suggests we've further broken crosscompile, just noticed
this based on your comment.


Why? Cross-compilation uses the same fallback mechanism. If a user 
doesn't like the conservative choice, he/she should set the cachevars to 
override the defaults, per usual.


(If you find that my system *doesn't* work for cross-compilation, please 
let me know; that's a bug. :D But I did have it in mind when I wrote it.)


--Jacob


Re: Fixing more OpenSSL callback crashes

2017-05-04 Thread William A Rowe Jr
On May 4, 2017 10:47 AM, "Jacob Champion"  wrote:

On 05/03/2017 11:25 PM, Ruediger Pluem wrote:

> Just as a heads up as I currently don't have time to investigate further.
> I get the below on CentOS 6.9 64 bit, which
> puzzles me a little bit as I would expect the errno addresses to be
> different in different threads on my OS.
>
> [Thu May 04 06:17:13.723918 2017] [ssl:notice] [pid 2629:tid
> 140039001335552] AH10028: using deprecated
> CRYPTO_set_id_callback for OpenSSL
>
> OpenSSL used is the one delivered by CentOS: openssl-1.0.1e-57.el6.x86_64
>

Do you use an installed APR or build using --with-included-apr? The
configure test that determines whether your errno is safe has to be linked
against an already-built APR; otherwise the fallbacks come into play.


Ugh... This suggests we've further broken crosscompile, just noticed this
based on your comment.


Re: Fixing more OpenSSL callback crashes

2017-05-04 Thread Jacob Champion

On 05/03/2017 11:25 PM, Ruediger Pluem wrote:

Just as a heads up as I currently don't have time to investigate further. I get 
the below on CentOS 6.9 64 bit, which
puzzles me a little bit as I would expect the errno addresses to be different 
in different threads on my OS.

[Thu May 04 06:17:13.723918 2017] [ssl:notice] [pid 2629:tid 140039001335552] 
AH10028: using deprecated
CRYPTO_set_id_callback for OpenSSL

OpenSSL used is the one delivered by CentOS: openssl-1.0.1e-57.el6.x86_64


Do you use an installed APR or build using --with-included-apr? The 
configure test that determines whether your errno is safe has to be 
linked against an already-built APR; otherwise the fallbacks come into play.


--Jacob


Re: Fixing more OpenSSL callback crashes

2017-05-04 Thread Ruediger Pluem


On 04/19/2017 05:54 PM, Jacob Champion wrote:
> On 04/12/2017 11:34 AM, Jacob Champion wrote:
>> It's probably worth noting at this point that, even if  is unsafe:
>>
>> - Windows and BeOS users are still handled explicitly by default in 1.0.x.
>> - If OpenSSL still provides the deprecated CRYPTO_set_id_callback(), we
>> can use it instead. We're not making use of the pointer-based THREADID
>> implementation like we should be, heh, so we're not really getting a
>> benefit out of the new system.
>> - This whole problem goes away in 1.1.x.
> 
> Latest trunk now takes care of all of these cases -- if you're not on a 
> platform that has a known safe default
> implementation, we'll use CRYPTO_set_id_callback() instead, falling back to 
> the THREADID stuff only as a last resort.

Just as a heads up as I currently don't have time to investigate further. I get 
the below on CentOS 6.9 64 bit, which
puzzles me a little bit as I would expect the errno addresses to be different 
in different threads on my OS.

[Thu May 04 06:17:13.723918 2017] [ssl:notice] [pid 2629:tid 140039001335552] 
AH10028: using deprecated
CRYPTO_set_id_callback for OpenSSL


OpenSSL used is the one delivered by CentOS: openssl-1.0.1e-57.el6.x86_64

Regards

RĂ¼diger



Re: Fixing more OpenSSL callback crashes

2017-04-19 Thread Jacob Champion

On 04/12/2017 11:34 AM, Jacob Champion wrote:

It's probably worth noting at this point that, even if  is unsafe:

- Windows and BeOS users are still handled explicitly by default in 1.0.x.
- If OpenSSL still provides the deprecated CRYPTO_set_id_callback(), we
can use it instead. We're not making use of the pointer-based THREADID
implementation like we should be, heh, so we're not really getting a
benefit out of the new system.
- This whole problem goes away in 1.1.x.


Latest trunk now takes care of all of these cases -- if you're not on a 
platform that has a known safe default implementation, we'll use 
CRYPTO_set_id_callback() instead, falling back to the THREADID stuff 
only as a last resort.


If anyone would like to help test out the new logic and collect some 
data, please also let me know which of the three implementations 
(builtin, deprecated, or dangerous) your machine ends up using. mod_ssl 
will log a notice on startup that tells you. (My plan is to reduce that 
log level to something like debug, once PR60999 is fixed.)


If you're really interested in putting it through its paces, here are 
the configuration cachevars that control which implementation is in use. 
Note that Windows and BeOS-based machines will *always* use the builtin 
implementation, if you're using OpenSSL 1.0.x.


- ac_cv_openssl_use_errno_threadid=yes|no

Indicates whether your machine has a threadsafe errno address. Set
it to 'no' to force a fallback to the deprecated implementation.

- ac_cv_func_CRYPTO_set_id_callback=yes|no

Indicates whether your OpenSSL provides the deprecated callback
implementation. Set both variables to 'no' to force a fallback to
the dangerous (crashy) implementation.

Thanks,
--Jacob


Re: Fixing more OpenSSL callback crashes

2017-04-14 Thread Tsuyoshi SASAMOTO
>I saw it in the header file, but actually testing it on Solaris 10 Sparc
>showed that the address was different as well in different threads. Can
>you try on you system as well?

Actually, my system is OpenIndiana, but you're right.

The address of ___errno() function is same,
but the return value of ___errno() (that is,
the pointer to thread-local errno) is different.

Sorry for the confusion.


Tsuyoshi SASAMOTO
nazon...@miobox.jp


Re: Fixing more OpenSSL callback crashes

2017-04-14 Thread Rainer Jung

Am 13.04.2017 um 23:40 schrieb Tsuyoshi SASAMOTO:

...oh. So errno is actually threadsafe, but its "address" is the same in
every thread? Interesting.


MT-Safe errno of Solaris is implemented as a function,
so its address is same but the value is different.
cf. https://src.illumos.org/source/xref/illumos-gate/usr/src/head/errno.h

https://src.illumos.org/source/xref/illumos-gate/usr/src/lib/libc/port/gen/errno.c


I saw it in the header file, but actually testing it on Solaris 10 Sparc 
showed that the address was different as well in different threads. Can 
you try on you system as well?


Regards,

Rainer


Re: Fixing more OpenSSL callback crashes

2017-04-13 Thread Tsuyoshi SASAMOTO
>...oh. So errno is actually threadsafe, but its "address" is the same in
>every thread? Interesting.

MT-Safe errno of Solaris is implemented as a function,
so its address is same but the value is different.
cf. https://src.illumos.org/source/xref/illumos-gate/usr/src/head/errno.h

https://src.illumos.org/source/xref/illumos-gate/usr/src/lib/libc/port/gen/errno.c


Tsuyoshi SASAMOTO
nazon...@miobox.jp


Re: Fixing more OpenSSL callback crashes

2017-04-12 Thread Jacob Champion

On 04/12/2017 07:53 AM, Rainer Jung wrote:

The good news:

I checked by compiling your test program standalone (outside of
configure) to be able to run it easily very often (1000 times). I
checked on Solaris 8 (32 Bit) and 10 Sparc (32 Bit and 64 Bit) compilation.

When I compile without any flags, the test fails with return code 5,
when I compile with -D_REENTRANT it succeeds (rc 0). Since apr always
adds -D_REENTRANT for Solaris in build/apr_hints.m4 (independent of
APR_HAS_THREADS), and your test runs in httpd configure after apr
configure and thus inherits the flags, the test would correctly detect a
thread-safe errno for Solaris during configure.


Excellent. Thanks for the test!


The bad news:

If building httpd with apr/apu sources in srclib and using
--with-included-apr (and not assuming some unrelated installed system
APR), the test can't run, because the binary needs to get linked against
libapr-1 during configure runtime (where the lib has not yet been
build). I don't know how easy a non-apr using test would be that still
runs on the relevant target platforms :(


I was worried about that. I don't think it makes much sense to 
reimplement the portability layer just to get threads and locks for a 
configure test, especially if this problem is going to slowly disappear 
as people upgrade OpenSSL. Let me know if you have any good ideas, though...


Worst case, we don't run the test, and fall back to assuming  is 
unsafe. Then affected users either risk the crashes like they do today, 
or they set the cachevar manually to override the test.


It's probably worth noting at this point that, even if  is unsafe:

- Windows and BeOS users are still handled explicitly by default in 1.0.x.
- If OpenSSL still provides the deprecated CRYPTO_set_id_callback(), we 
can use it instead. We're not making use of the pointer-based THREADID 
implementation like we should be, heh, so we're not really getting a 
benefit out of the new system.

- This whole problem goes away in 1.1.x.

So the users who will still be affected are those who use OpenSSL 1.0.x, 
compiled with OPENSSL_NO_DEPRECATED, on platforms that don't pass (or 
can't run) our errno test. The only way to fix them is to pin the 
callback address somehow, and I'm not entirely sure it's worth the 
effort, if we're not currently seeing other reports of problems.



Side note:

I did increase NUM_THREADS to 10. But actually your original number of 3
gave the same results. Nevertheless I would increase the number if it is
OK for you (I know, quadratic behavior, but we shouldn't care too much
for this test):

[patch]


Looks good to me, thanks! If you'd like, go ahead and check it into the 
trunk-openssl-threadid feature branch. Otherwise I'll plan to get around 
to it tomorrow.


--Jacob


Re: Fixing more OpenSSL callback crashes

2017-04-12 Thread Rainer Jung

Hi Jacob,

Am 12.04.2017 um 02:16 schrieb Jacob Champion:

On 04/10/2017 03:59 PM, Jacob Champion wrote:

So it looks like my test program might still be a possible solution for
detecting whether we need a callback at configure time, unless anyone
knows of a platform where two thread-local errnos can have the same
address some of the time and different addresses at other times...


I've checked my first attempt into trunk. It writes the result into the
new AP_OPENSSL_USE_ERRNO_THREADID macro in ap_config_auto.h. On
platforms where every thread has its own errno address, that macro will
be #defined to 1.

Rainer, would you mind sanity-checking the result on your Solaris box
(sounds like the test should *not* pass there)? I've tested on Ubuntu
16.04, where the test passes.


The good news:

I checked by compiling your test program standalone (outside of 
configure) to be able to run it easily very often (1000 times). I 
checked on Solaris 8 (32 Bit) and 10 Sparc (32 Bit and 64 Bit) compilation.


When I compile without any flags, the test fails with return code 5, 
when I compile with -D_REENTRANT it succeeds (rc 0). Since apr always 
adds -D_REENTRANT for Solaris in build/apr_hints.m4 (independent of 
APR_HAS_THREADS), and your test runs in httpd configure after apr 
configure and thus inherits the flags, the test would correctly detect a 
thread-safe errno for Solaris during configure.


The bad news:

If building httpd with apr/apu sources in srclib and using 
--with-included-apr (and not assuming some unrelated installed system 
APR), the test can't run, because the binary needs to get linked against 
libapr-1 during configure runtime (where the lib has not yet been 
build). I don't know how easy a non-apr using test would be that still 
runs on the relevant target platforms :(


Side note:

I did increase NUM_THREADS to 10. But actually your original number of 3 
gave the same results. Nevertheless I would increase the number if it is 
OK for you (I know, quadratic behavior, but we shouldn't care too much 
for this test):


Index: trunk/acinclude.m4
===
--- trunk/acinclude.m4(revision 1791085)
+++ trunk/acinclude.m4(working copy)
@@ -653,7 +653,7 @@
   #include "apr_thread_cond.h"
   #include "apr_thread_proc.h"

-  #define NUM_THREADS 3
+  #define NUM_THREADS 10

   struct thread_data {
   apr_thread_mutex_t *mutex;
@@ -692,6 +692,7 @@
   int ret = 0;
   apr_status_t status;
   int i;
+  int j;

   apr_pool_t *pool;
   apr_thread_mutex_t *mutex;
@@ -738,10 +739,13 @@
   }

   /* Check that no addresses were duplicated. */
-  if ((tdata[0].errno_addr == tdata[1].errno_addr)
-  || (tdata[1].errno_addr == tdata[2].errno_addr)
-  || (tdata[0].errno_addr == tdata[2].errno_addr)) {
-  ret = 5;
+  for (i = 0; i < NUM_THREADS - 1; ++i) {
+  for (j = i + 1; j < NUM_THREADS; ++j) {
+  if (tdata[i].errno_addr == tdata[j].errno_addr) {
+  ret = 5;
+  goto out;
+  }
+  }
   }

   out:



If there's anyone else who wouldn't mind updating and posting their
platform and test result, that would really help me figure out how
widespread the "problem" is. Remember to rebuild configure first! :)


Regards,

Rainer


Re: Fixing more OpenSSL callback crashes

2017-04-11 Thread Jacob Champion

On 04/10/2017 03:59 PM, Jacob Champion wrote:

So it looks like my test program might still be a possible solution for
detecting whether we need a callback at configure time, unless anyone
knows of a platform where two thread-local errnos can have the same
address some of the time and different addresses at other times...


I've checked my first attempt into trunk. It writes the result into the 
new AP_OPENSSL_USE_ERRNO_THREADID macro in ap_config_auto.h. On 
platforms where every thread has its own errno address, that macro will 
be #defined to 1.


Rainer, would you mind sanity-checking the result on your Solaris box 
(sounds like the test should *not* pass there)? I've tested on Ubuntu 
16.04, where the test passes.


If there's anyone else who wouldn't mind updating and posting their 
platform and test result, that would really help me figure out how 
widespread the "problem" is. Remember to rebuild configure first! :)


--Jacob


Re: Fixing more OpenSSL callback crashes

2017-04-10 Thread Jacob Champion

On 04/10/2017 03:21 PM, Rainer Jung wrote:

Am 10.04.2017 um 22:41 schrieb Jacob Champion:

A few questions for the list while I'm brainstorming the best way to fix
https://bz.apache.org/bugzilla/show_bug.cgi?id=60947 ...

- What is the oldest version of OpenSSL we'll support for the 2.4.x
line? Will that version change in 2.next?


For 2.4.0 it was discussed originally in May/June 2010. The original
proposal was dropping support < 1.0.0, but we ended up in supporting
everything starting with 0.9.7 (Threads was: "RFC: drop support for
OpenSSL < 1.0 in trunk/2.3?"):




Excellent, thanks for the history!


2.next: that would be up for discussion like it was done for 2.4 in
2010. I guess we would look again at what current OS distributions provide.


Okay. I'll be the first to push for dropping support for anything no 
longer supported by OpenSSL upstream -- that would leave us with 1.0.2 
and 1.1.0 -- but unfortunately I suspect we will need to drop to 1.0.1 
to deal with some common long-term-release distributions that are still 
alive.


And I'm not looking forward to the inevitable Mac conversation. That's 
for another thread, I guess.



- Does anyone have a clever way to check, during configuration time,
that errno is threadsafe? My plan was to spin up two threads in a config
test program and see if the address of errno was different.


I tried it and it does not detect the threadsafe errno when using
-D_REENTRANT. See below.


...oh. So errno is actually threadsafe, but its "address" is the same in 
every thread? Interesting. In that case, my original question was not 
actually what I wanted to ask.


What I actually want to know is whether the address of errno can be used 
as a makeshift thread ID, because that's what the default implementation 
in OpenSSL uses. For platforms on which that isn't true (looks like 
Solaris is one), we either have to provide our own callback, which may 
cause a crash during server startup, or have the user upgrade to 1.1.0, 
where the threadid callback isn't needed at all.


So it looks like my test program might still be a possible solution for 
detecting whether we need a callback at configure time, unless anyone 
knows of a platform where two thread-local errnos can have the same 
address some of the time and different addresses at other times...


--Jacob


Re: Fixing more OpenSSL callback crashes

2017-04-10 Thread Rainer Jung

Am 10.04.2017 um 22:41 schrieb Jacob Champion:

A few questions for the list while I'm brainstorming the best way to fix
https://bz.apache.org/bugzilla/show_bug.cgi?id=60947 ...

- What is the oldest version of OpenSSL we'll support for the 2.4.x
line? Will that version change in 2.next?


For 2.4.0 it was discussed originally in May/June 2010. The original 
proposal was dropping support < 1.0.0, but we ended up in supporting 
everything starting with 0.9.7 (Threads was: "RFC: drop support for 
OpenSSL < 1.0 in trunk/2.3?"):


- support and build warning-free with OpenSSL >= 0.9.8
- support and build with OpenSSL >= 0.9.7a, albeit with (harmless)
  compiler warnings about argument const-ness all over the shop
- drop support for OpenSSL < 0.9.7a
- drop support for non-OpenSSL/derivatives of OpenSSL

The new minimum was mostly driven by looking at what was available on 
platforms at that time, e.g. not demanding users to build OpenSSL 
themselves. Sander Temme at that time provided this list:


* Windows: none
* Mac OS X 10.6: OpenSSL 0.9.8l 5 Nov 2009
* FreeBSD 6.4-STABLE: OpenSSL 0.9.7e-p1 25 Oct 2004
* FreeBSD 7.2-STABLE: OpenSSL 0.9.8e 23 Feb 2007
* FreeBSD 8-STABLE: OpenSSL 0.9.8k 25 Mar 2009
* OpenBSD 4.6: OpenSSL 0.9.8k 25 Mar 2009
* Solaris 10: 0.9.7 with backports... don't recall what's in the 
Coolstack but someone else may be able to tell us.
* Sunfreeware.com: 1.0.0 and 0.9.7g, with both Apache 2.0.59 and 2.2.15 
built against 1.0.0

* Red Hat 5: 0.9.8b with backports
* Red Hat 4: 0.9.7 with backports
* Ubuntu 10.04: OpenSSL 0.9.8k 25 Mar 2009

Later in 2.4.7 minimum support changed to 0.9.8a in Nov 2013 (commits 
r1542327 and r1555469). I think it hasn't changed since then.


2.next: that would be up for discussion like it was done for 2.4 in 
2010. I guess we would look again at what current OS distributions provide.



- Does anyone have a clever way to check, during configuration time,
that errno is threadsafe? My plan was to spin up two threads in a config
test program and see if the address of errno was different.


I tried it and it does not detect the threadsafe errno when using 
-D_REENTRANT. See below.


Probably not really good as well, but I tried starting two threads, both 
doing an invalid fdopen(), the first with an invalid fd (-1), the second 
with an invalid mode ("z"), then let both wait a few seconds, so that 
both had a chance to set errno, and then let both show what errno is. 
When compiling with -D_REENTRANT is get different errno values for both 
threads.



- Does anyone know of platforms we support that *definitely* don't have
a threadsafe errno? (From poking around on Google, it seemed like
Solaris might fit into that bucket? Any others?)


Solaris errno is threadsafe if _REENTRANT is defined.

More precisely, if

#if defined(_REENTRANT) || defined(_TS_ERRNO) || _POSIX_C_SOURCE - 0 >= 
199506L


Regards,

Rainer