Re: Fixing more OpenSSL callback crashes
On 05/04/2017 05:47 PM, Jacob Champion wrote: > On 05/03/2017 11:25 PM, Ruediger Pluem wrote: >> Just as a heads up as I currently don't have time to investigate further. I >> get the below on CentOS 6.9 64 bit, which >> puzzles me a little bit as I would expect the errno addresses to be >> different in different threads on my OS. >> >> [Thu May 04 06:17:13.723918 2017] [ssl:notice] [pid 2629:tid >> 140039001335552] AH10028: using deprecated >> CRYPTO_set_id_callback for OpenSSL >> >> OpenSSL used is the one delivered by CentOS: openssl-1.0.1e-57.el6.x86_64 > > Do you use an installed APR or build using --with-included-apr? The configure > test that determines whether your errno is > safe has to be linked against an already-built APR; otherwise the fallbacks > come into play. This is likely the cause for this. I have apr trunk dropped in the srclib directory and build this together with httpd trunk. You mentioned this in the threads. I forgot about this :-). Thanks for the reminder. Regards RĂ¼diger
Re: Fixing more OpenSSL callback crashes
On 05/04/2017 09:39 AM, Jacob Champion wrote: On 05/04/2017 09:36 AM, William A Rowe Jr wrote: Ugh... This suggests we've further broken crosscompile, just noticed this based on your comment. Why? Cross-compilation uses the same fallback mechanism. To expand on this, there are three choices for implementation for older (pre-1.1.0) OpenSSLs: - Builtin (optimal on some platforms, nonexistent or unsafe on others) - Deprecated (but believed safe enough for most) - Dangerous (but still apparently good enough for a bunch of people?) Builtin is only used if we can *prove* that the builtin implementation is available and safe. Some environments (Windows + 1.0.x) are known to have safe builtins; everyone else has to run a test. If we can't run that test for any reason, we fall back to the Deprecated implementation. If that API is no longer available (e.g. OPENSSL_NO_DEPRECATED is in use), we have no choice but to use the Dangerous implementation. So since we can't run a test executable on a cross-compile target, if you're cross-compiling to a platform that isn't "known safe", we'll fall back to the Deprecated implementation if it's available. That choice can be overridden with a cachevar, if you know your platform guarantees safety -- ac_cv_openssl_use_errno_threadid in this case. --Jacob
Re: Fixing more OpenSSL callback crashes
On 05/04/2017 09:36 AM, William A Rowe Jr wrote: Ugh... This suggests we've further broken crosscompile, just noticed this based on your comment. Why? Cross-compilation uses the same fallback mechanism. If a user doesn't like the conservative choice, he/she should set the cachevars to override the defaults, per usual. (If you find that my system *doesn't* work for cross-compilation, please let me know; that's a bug. :D But I did have it in mind when I wrote it.) --Jacob
Re: Fixing more OpenSSL callback crashes
On May 4, 2017 10:47 AM, "Jacob Champion"wrote: On 05/03/2017 11:25 PM, Ruediger Pluem wrote: > Just as a heads up as I currently don't have time to investigate further. > I get the below on CentOS 6.9 64 bit, which > puzzles me a little bit as I would expect the errno addresses to be > different in different threads on my OS. > > [Thu May 04 06:17:13.723918 2017] [ssl:notice] [pid 2629:tid > 140039001335552] AH10028: using deprecated > CRYPTO_set_id_callback for OpenSSL > > OpenSSL used is the one delivered by CentOS: openssl-1.0.1e-57.el6.x86_64 > Do you use an installed APR or build using --with-included-apr? The configure test that determines whether your errno is safe has to be linked against an already-built APR; otherwise the fallbacks come into play. Ugh... This suggests we've further broken crosscompile, just noticed this based on your comment.
Re: Fixing more OpenSSL callback crashes
On 05/03/2017 11:25 PM, Ruediger Pluem wrote: Just as a heads up as I currently don't have time to investigate further. I get the below on CentOS 6.9 64 bit, which puzzles me a little bit as I would expect the errno addresses to be different in different threads on my OS. [Thu May 04 06:17:13.723918 2017] [ssl:notice] [pid 2629:tid 140039001335552] AH10028: using deprecated CRYPTO_set_id_callback for OpenSSL OpenSSL used is the one delivered by CentOS: openssl-1.0.1e-57.el6.x86_64 Do you use an installed APR or build using --with-included-apr? The configure test that determines whether your errno is safe has to be linked against an already-built APR; otherwise the fallbacks come into play. --Jacob
Re: Fixing more OpenSSL callback crashes
On 04/19/2017 05:54 PM, Jacob Champion wrote: > On 04/12/2017 11:34 AM, Jacob Champion wrote: >> It's probably worth noting at this point that, even if is unsafe: >> >> - Windows and BeOS users are still handled explicitly by default in 1.0.x. >> - If OpenSSL still provides the deprecated CRYPTO_set_id_callback(), we >> can use it instead. We're not making use of the pointer-based THREADID >> implementation like we should be, heh, so we're not really getting a >> benefit out of the new system. >> - This whole problem goes away in 1.1.x. > > Latest trunk now takes care of all of these cases -- if you're not on a > platform that has a known safe default > implementation, we'll use CRYPTO_set_id_callback() instead, falling back to > the THREADID stuff only as a last resort. Just as a heads up as I currently don't have time to investigate further. I get the below on CentOS 6.9 64 bit, which puzzles me a little bit as I would expect the errno addresses to be different in different threads on my OS. [Thu May 04 06:17:13.723918 2017] [ssl:notice] [pid 2629:tid 140039001335552] AH10028: using deprecated CRYPTO_set_id_callback for OpenSSL OpenSSL used is the one delivered by CentOS: openssl-1.0.1e-57.el6.x86_64 Regards RĂ¼diger
Re: Fixing more OpenSSL callback crashes
On 04/12/2017 11:34 AM, Jacob Champion wrote: It's probably worth noting at this point that, even if is unsafe: - Windows and BeOS users are still handled explicitly by default in 1.0.x. - If OpenSSL still provides the deprecated CRYPTO_set_id_callback(), we can use it instead. We're not making use of the pointer-based THREADID implementation like we should be, heh, so we're not really getting a benefit out of the new system. - This whole problem goes away in 1.1.x. Latest trunk now takes care of all of these cases -- if you're not on a platform that has a known safe default implementation, we'll use CRYPTO_set_id_callback() instead, falling back to the THREADID stuff only as a last resort. If anyone would like to help test out the new logic and collect some data, please also let me know which of the three implementations (builtin, deprecated, or dangerous) your machine ends up using. mod_ssl will log a notice on startup that tells you. (My plan is to reduce that log level to something like debug, once PR60999 is fixed.) If you're really interested in putting it through its paces, here are the configuration cachevars that control which implementation is in use. Note that Windows and BeOS-based machines will *always* use the builtin implementation, if you're using OpenSSL 1.0.x. - ac_cv_openssl_use_errno_threadid=yes|no Indicates whether your machine has a threadsafe errno address. Set it to 'no' to force a fallback to the deprecated implementation. - ac_cv_func_CRYPTO_set_id_callback=yes|no Indicates whether your OpenSSL provides the deprecated callback implementation. Set both variables to 'no' to force a fallback to the dangerous (crashy) implementation. Thanks, --Jacob
Re: Fixing more OpenSSL callback crashes
>I saw it in the header file, but actually testing it on Solaris 10 Sparc >showed that the address was different as well in different threads. Can >you try on you system as well? Actually, my system is OpenIndiana, but you're right. The address of ___errno() function is same, but the return value of ___errno() (that is, the pointer to thread-local errno) is different. Sorry for the confusion. Tsuyoshi SASAMOTO nazon...@miobox.jp
Re: Fixing more OpenSSL callback crashes
Am 13.04.2017 um 23:40 schrieb Tsuyoshi SASAMOTO: ...oh. So errno is actually threadsafe, but its "address" is the same in every thread? Interesting. MT-Safe errno of Solaris is implemented as a function, so its address is same but the value is different. cf. https://src.illumos.org/source/xref/illumos-gate/usr/src/head/errno.h https://src.illumos.org/source/xref/illumos-gate/usr/src/lib/libc/port/gen/errno.c I saw it in the header file, but actually testing it on Solaris 10 Sparc showed that the address was different as well in different threads. Can you try on you system as well? Regards, Rainer
Re: Fixing more OpenSSL callback crashes
>...oh. So errno is actually threadsafe, but its "address" is the same in >every thread? Interesting. MT-Safe errno of Solaris is implemented as a function, so its address is same but the value is different. cf. https://src.illumos.org/source/xref/illumos-gate/usr/src/head/errno.h https://src.illumos.org/source/xref/illumos-gate/usr/src/lib/libc/port/gen/errno.c Tsuyoshi SASAMOTO nazon...@miobox.jp
Re: Fixing more OpenSSL callback crashes
On 04/12/2017 07:53 AM, Rainer Jung wrote: The good news: I checked by compiling your test program standalone (outside of configure) to be able to run it easily very often (1000 times). I checked on Solaris 8 (32 Bit) and 10 Sparc (32 Bit and 64 Bit) compilation. When I compile without any flags, the test fails with return code 5, when I compile with -D_REENTRANT it succeeds (rc 0). Since apr always adds -D_REENTRANT for Solaris in build/apr_hints.m4 (independent of APR_HAS_THREADS), and your test runs in httpd configure after apr configure and thus inherits the flags, the test would correctly detect a thread-safe errno for Solaris during configure. Excellent. Thanks for the test! The bad news: If building httpd with apr/apu sources in srclib and using --with-included-apr (and not assuming some unrelated installed system APR), the test can't run, because the binary needs to get linked against libapr-1 during configure runtime (where the lib has not yet been build). I don't know how easy a non-apr using test would be that still runs on the relevant target platforms :( I was worried about that. I don't think it makes much sense to reimplement the portability layer just to get threads and locks for a configure test, especially if this problem is going to slowly disappear as people upgrade OpenSSL. Let me know if you have any good ideas, though... Worst case, we don't run the test, and fall back to assuming is unsafe. Then affected users either risk the crashes like they do today, or they set the cachevar manually to override the test. It's probably worth noting at this point that, even if is unsafe: - Windows and BeOS users are still handled explicitly by default in 1.0.x. - If OpenSSL still provides the deprecated CRYPTO_set_id_callback(), we can use it instead. We're not making use of the pointer-based THREADID implementation like we should be, heh, so we're not really getting a benefit out of the new system. - This whole problem goes away in 1.1.x. So the users who will still be affected are those who use OpenSSL 1.0.x, compiled with OPENSSL_NO_DEPRECATED, on platforms that don't pass (or can't run) our errno test. The only way to fix them is to pin the callback address somehow, and I'm not entirely sure it's worth the effort, if we're not currently seeing other reports of problems. Side note: I did increase NUM_THREADS to 10. But actually your original number of 3 gave the same results. Nevertheless I would increase the number if it is OK for you (I know, quadratic behavior, but we shouldn't care too much for this test): [patch] Looks good to me, thanks! If you'd like, go ahead and check it into the trunk-openssl-threadid feature branch. Otherwise I'll plan to get around to it tomorrow. --Jacob
Re: Fixing more OpenSSL callback crashes
Hi Jacob, Am 12.04.2017 um 02:16 schrieb Jacob Champion: On 04/10/2017 03:59 PM, Jacob Champion wrote: So it looks like my test program might still be a possible solution for detecting whether we need a callback at configure time, unless anyone knows of a platform where two thread-local errnos can have the same address some of the time and different addresses at other times... I've checked my first attempt into trunk. It writes the result into the new AP_OPENSSL_USE_ERRNO_THREADID macro in ap_config_auto.h. On platforms where every thread has its own errno address, that macro will be #defined to 1. Rainer, would you mind sanity-checking the result on your Solaris box (sounds like the test should *not* pass there)? I've tested on Ubuntu 16.04, where the test passes. The good news: I checked by compiling your test program standalone (outside of configure) to be able to run it easily very often (1000 times). I checked on Solaris 8 (32 Bit) and 10 Sparc (32 Bit and 64 Bit) compilation. When I compile without any flags, the test fails with return code 5, when I compile with -D_REENTRANT it succeeds (rc 0). Since apr always adds -D_REENTRANT for Solaris in build/apr_hints.m4 (independent of APR_HAS_THREADS), and your test runs in httpd configure after apr configure and thus inherits the flags, the test would correctly detect a thread-safe errno for Solaris during configure. The bad news: If building httpd with apr/apu sources in srclib and using --with-included-apr (and not assuming some unrelated installed system APR), the test can't run, because the binary needs to get linked against libapr-1 during configure runtime (where the lib has not yet been build). I don't know how easy a non-apr using test would be that still runs on the relevant target platforms :( Side note: I did increase NUM_THREADS to 10. But actually your original number of 3 gave the same results. Nevertheless I would increase the number if it is OK for you (I know, quadratic behavior, but we shouldn't care too much for this test): Index: trunk/acinclude.m4 === --- trunk/acinclude.m4(revision 1791085) +++ trunk/acinclude.m4(working copy) @@ -653,7 +653,7 @@ #include "apr_thread_cond.h" #include "apr_thread_proc.h" - #define NUM_THREADS 3 + #define NUM_THREADS 10 struct thread_data { apr_thread_mutex_t *mutex; @@ -692,6 +692,7 @@ int ret = 0; apr_status_t status; int i; + int j; apr_pool_t *pool; apr_thread_mutex_t *mutex; @@ -738,10 +739,13 @@ } /* Check that no addresses were duplicated. */ - if ((tdata[0].errno_addr == tdata[1].errno_addr) - || (tdata[1].errno_addr == tdata[2].errno_addr) - || (tdata[0].errno_addr == tdata[2].errno_addr)) { - ret = 5; + for (i = 0; i < NUM_THREADS - 1; ++i) { + for (j = i + 1; j < NUM_THREADS; ++j) { + if (tdata[i].errno_addr == tdata[j].errno_addr) { + ret = 5; + goto out; + } + } } out: If there's anyone else who wouldn't mind updating and posting their platform and test result, that would really help me figure out how widespread the "problem" is. Remember to rebuild configure first! :) Regards, Rainer
Re: Fixing more OpenSSL callback crashes
On 04/10/2017 03:59 PM, Jacob Champion wrote: So it looks like my test program might still be a possible solution for detecting whether we need a callback at configure time, unless anyone knows of a platform where two thread-local errnos can have the same address some of the time and different addresses at other times... I've checked my first attempt into trunk. It writes the result into the new AP_OPENSSL_USE_ERRNO_THREADID macro in ap_config_auto.h. On platforms where every thread has its own errno address, that macro will be #defined to 1. Rainer, would you mind sanity-checking the result on your Solaris box (sounds like the test should *not* pass there)? I've tested on Ubuntu 16.04, where the test passes. If there's anyone else who wouldn't mind updating and posting their platform and test result, that would really help me figure out how widespread the "problem" is. Remember to rebuild configure first! :) --Jacob
Re: Fixing more OpenSSL callback crashes
On 04/10/2017 03:21 PM, Rainer Jung wrote: Am 10.04.2017 um 22:41 schrieb Jacob Champion: A few questions for the list while I'm brainstorming the best way to fix https://bz.apache.org/bugzilla/show_bug.cgi?id=60947 ... - What is the oldest version of OpenSSL we'll support for the 2.4.x line? Will that version change in 2.next? For 2.4.0 it was discussed originally in May/June 2010. The original proposal was dropping support < 1.0.0, but we ended up in supporting everything starting with 0.9.7 (Threads was: "RFC: drop support for OpenSSL < 1.0 in trunk/2.3?"): Excellent, thanks for the history! 2.next: that would be up for discussion like it was done for 2.4 in 2010. I guess we would look again at what current OS distributions provide. Okay. I'll be the first to push for dropping support for anything no longer supported by OpenSSL upstream -- that would leave us with 1.0.2 and 1.1.0 -- but unfortunately I suspect we will need to drop to 1.0.1 to deal with some common long-term-release distributions that are still alive. And I'm not looking forward to the inevitable Mac conversation. That's for another thread, I guess. - Does anyone have a clever way to check, during configuration time, that errno is threadsafe? My plan was to spin up two threads in a config test program and see if the address of errno was different. I tried it and it does not detect the threadsafe errno when using -D_REENTRANT. See below. ...oh. So errno is actually threadsafe, but its "address" is the same in every thread? Interesting. In that case, my original question was not actually what I wanted to ask. What I actually want to know is whether the address of errno can be used as a makeshift thread ID, because that's what the default implementation in OpenSSL uses. For platforms on which that isn't true (looks like Solaris is one), we either have to provide our own callback, which may cause a crash during server startup, or have the user upgrade to 1.1.0, where the threadid callback isn't needed at all. So it looks like my test program might still be a possible solution for detecting whether we need a callback at configure time, unless anyone knows of a platform where two thread-local errnos can have the same address some of the time and different addresses at other times... --Jacob
Re: Fixing more OpenSSL callback crashes
Am 10.04.2017 um 22:41 schrieb Jacob Champion: A few questions for the list while I'm brainstorming the best way to fix https://bz.apache.org/bugzilla/show_bug.cgi?id=60947 ... - What is the oldest version of OpenSSL we'll support for the 2.4.x line? Will that version change in 2.next? For 2.4.0 it was discussed originally in May/June 2010. The original proposal was dropping support < 1.0.0, but we ended up in supporting everything starting with 0.9.7 (Threads was: "RFC: drop support for OpenSSL < 1.0 in trunk/2.3?"): - support and build warning-free with OpenSSL >= 0.9.8 - support and build with OpenSSL >= 0.9.7a, albeit with (harmless) compiler warnings about argument const-ness all over the shop - drop support for OpenSSL < 0.9.7a - drop support for non-OpenSSL/derivatives of OpenSSL The new minimum was mostly driven by looking at what was available on platforms at that time, e.g. not demanding users to build OpenSSL themselves. Sander Temme at that time provided this list: * Windows: none * Mac OS X 10.6: OpenSSL 0.9.8l 5 Nov 2009 * FreeBSD 6.4-STABLE: OpenSSL 0.9.7e-p1 25 Oct 2004 * FreeBSD 7.2-STABLE: OpenSSL 0.9.8e 23 Feb 2007 * FreeBSD 8-STABLE: OpenSSL 0.9.8k 25 Mar 2009 * OpenBSD 4.6: OpenSSL 0.9.8k 25 Mar 2009 * Solaris 10: 0.9.7 with backports... don't recall what's in the Coolstack but someone else may be able to tell us. * Sunfreeware.com: 1.0.0 and 0.9.7g, with both Apache 2.0.59 and 2.2.15 built against 1.0.0 * Red Hat 5: 0.9.8b with backports * Red Hat 4: 0.9.7 with backports * Ubuntu 10.04: OpenSSL 0.9.8k 25 Mar 2009 Later in 2.4.7 minimum support changed to 0.9.8a in Nov 2013 (commits r1542327 and r1555469). I think it hasn't changed since then. 2.next: that would be up for discussion like it was done for 2.4 in 2010. I guess we would look again at what current OS distributions provide. - Does anyone have a clever way to check, during configuration time, that errno is threadsafe? My plan was to spin up two threads in a config test program and see if the address of errno was different. I tried it and it does not detect the threadsafe errno when using -D_REENTRANT. See below. Probably not really good as well, but I tried starting two threads, both doing an invalid fdopen(), the first with an invalid fd (-1), the second with an invalid mode ("z"), then let both wait a few seconds, so that both had a chance to set errno, and then let both show what errno is. When compiling with -D_REENTRANT is get different errno values for both threads. - Does anyone know of platforms we support that *definitely* don't have a threadsafe errno? (From poking around on Google, it seemed like Solaris might fit into that bucket? Any others?) Solaris errno is threadsafe if _REENTRANT is defined. More precisely, if #if defined(_REENTRANT) || defined(_TS_ERRNO) || _POSIX_C_SOURCE - 0 >= 199506L Regards, Rainer