Re: svn commit: r1896976 - /httpd/httpd/branches/2.4.x/STATUS

2022-01-17 Thread Yann Ylavic
On Mon, Jan 17, 2022 at 4:05 PM Joe Orton  wrote:
>
> For 2.4.x I would argue it's better to keep a preference for 8.x over
> 10.x so that users aren't switched from one to the other across an
> upgrade - with some new performance trade-off we know about - without
> changing the environment/configure line?

+1, we should try to find PCRE1 first in 2.4.x and fall back to PCRE2
only if PCRE1 is not there.

Regards;
Yann.


Re: svn commit: r1896976 - /httpd/httpd/branches/2.4.x/STATUS

2022-01-17 Thread Yann Ylavic
On Mon, Jan 17, 2022 at 10:26 PM Christophe JAILLET
 wrote:
>
> Le 17/01/2022 à 13:27, Ruediger Pluem a écrit :
> >
> > With regards to the de-optimization / stack usage probably stupid question:
> > Can't we use the TLS features that several compilers offer?
> > e.g. see at the end of the page at 
> > https://stackoverflow.com/questions/18298280/how-to-declare-a-variable-as-thread-local-portably
> >
> > If we have no thread_local we could fall back to the current unoptimized 
> > implementation?

Looks good to me, I think the simpler way would be in APR though, see below.

>
> What about something as stupid as the attached patch?
> (just a POC to give the general idea)

I don't think we can override pcre2_{malloc,free}() for every module,
they may be using pcre2 outside of the httpd api/abi.

>
> Reserve some space on stack (in the spirit of the optimization for
> PCRE1) and implement a malloc/free that is clever enough to use this
> space instead a real malloc if pcre2 does not ask for too much space.
[]
>
> The space required by pcre2 is:
> offsetof(pcre2_match_data, ovector) + 2*oveccount*sizeof(PCRE2_SIZE)

Which may be too big for the stack already?

>
>PCRE2_SIZE   ovector[131072]; /* Must be last in the structure */

That one clearly is, but AIUI it's just a variable length structure so
we wouldn't really reserve that size.

>
> So we should be able to compute a MAX_PCRE2_STACK_DATA (see patch) that
> is both large enough to store 10 or so matches, and not too big.
>
> Not sure it is a clean solution, but I guess that it could work.

I'd rather we use the efficient
_Thread_local/__thread/__declspec(thread) in APR (see attached patch,
apr-1.8 material?) to track the current apr_thread_t and return it
with e.g. apr_thread_current(). Then we could simply do things like
this for ap_regex:

#ifndef APR_HAS_THREAD_LOCAL
/* always allocate */
#else
struct match_data {
#ifdef HAVE_PCRE2
pcre2_match_data *data;
#else
int *data;
#endif
apr_size_t size;
};
struct match_data *matchdata = NULL;
apr_thread_t *thd = apr_thread_current();
apr_thread_data_get((void **), "apreg", thd);
if (!matchdata || matchdata->size < new_size) {
if (matchdata) {
#ifdef HAVE_PCRE2
pcre2_match_data_free(matchdata->data);
#else
free(matchdata->data);
#endif
   }
else {
matchdata = calloc(1, sizeof(*matchdata);
apr_thread_data_set(matchdata, "ap_regex",
free_matchdata, thd);
   }
matchdata->size *= 2;
if (matchdata->size < new_size) {
matchdata->size = new_size;
if (matchdata->size < POSIX_MALLOC_THRESHOLD) {
matchdata->size = POSIX_MALLOC_THRESHOLD;
}
}
#ifdef HAVE_PCRE2
matchdata->data = pcre2_match_data_create(matchdata->size);
#else
matchdata->data = malloc(matchdata->size * sizeof(int) * 3);
#endif
}
#ifdef HAVE_PCRE2
ovector = pcre2_get_ovector_pointer(matchdata->data);
#else
ovector = matchdata->data;
#endif

#endif /* APR_HAS_THREAD_LOCAL */

(We could still keep the stack for the PCRE1 small size case though
that might complicate the code unnecessarily since reusing the TLS
matchdata in this case should be as effective as an apr_hash lookup).

WDYT?


Regards;
Yann.
Index: include/apr_thread_proc.h
===
--- include/apr_thread_proc.h	(revision 1896968)
+++ include/apr_thread_proc.h	(working copy)
@@ -212,6 +212,24 @@ typedef enum {
 #if APR_HAS_THREADS
 
 /**
+ * APR_THREAD_LOCAL keyword mapping the compiler's.
+ */
+#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112
+#define APR_THREAD_LOCAL _Thread_local
+#elif defined(__cplusplus) && __cplusplus >= 201103L
+#define APR_THREAD_LOCAL thread_local
+#elif defined(__GNUC__) /* works for clang too */
+#define APR_THREAD_LOCAL __thread
+#elif defined(WIN32) && defined(_MSC_VER)
+#define APR_THREAD_LOCAL __declspec(thread)
+#endif
+#ifdef APR_THREAD_LOCAL
+#define APR_HAS_THREAD_LOCAL 1
+#else
+#undef APR_HAS_THREAD_LOCAL
+#endif
+
+/**
  * Create and initialize a new threadattr variable
  * @param new_attr The newly created threadattr.
  * @param cont The pool to use
@@ -257,6 +275,13 @@ APR_DECLARE(apr_status_t) apr_threadattr_guardsize
apr_size_t guardsize);
 
 /**
+ * Get the current thread
+ * @param The current apr_thread, NULL if it is not an apr_thread or if
+ *it could not be determined.
+ */
+APR_DECLARE(apr_thread_t *) apr_thread_current(void);
+
+/**
  * Create a new thread of execution
  * @param new_thread The newly created thread handle.
  * @param attr The threadattr to use to determine how to create the thread
Index: threadproc/beos/thread.c
===
--- threadproc/beos/thread.c	(revision 1896968)
+++ 

Re: svn commit: r1896976 - /httpd/httpd/branches/2.4.x/STATUS

2022-01-17 Thread Christophe JAILLET

Le 17/01/2022 à 13:27, Ruediger Pluem a écrit :



On 1/16/22 10:35 PM, William A Rowe Jr wrote:

4 day ago, you all saw this. 6 years ago, you all started using this on trunk.

Don't know what I can to do help this along and honor the library
author's wishes for
all of us to walk away from the dead fork, and use the maintained
fork. It's whatever
it is, I'm out of here and removing the backport application branch,
whoever 3rd upvotes
this be prepared to apply this for us all, thanks.


With regards to the de-optimization / stack usage probably stupid question:
Can't we use the TLS features that several compilers offer?
e.g. see at the end of the page at 
https://stackoverflow.com/questions/18298280/how-to-declare-a-variable-as-thread-local-portably

If we have no thread_local we could fall back to the current unoptimized 
implementation?

Regards

Rüdiger




What about something as stupid as the attached patch?
(just a POC to give the general idea)

Reserve some space on stack (in the spirit of the optimization for 
PCRE1) and implement a malloc/free that is clever enough to use this 
space instead a real malloc if pcre2 does not ask for too much space.


The only difference with what is done with PCRE1 is that in this case we 
use on the stack the exact space needed for for POSIX_MALLOC_THRESHOLD 
(10) matches. In the attach patch, we reserve "some space" on the stack 
and we expect that it is large enough to avoid a malloc.



The space required by pcre2 is:
offsetof(pcre2_match_data, ovector) + 2*oveccount*sizeof(PCRE2_SIZE)

With:
typedef struct pcre2_real_match_data {
  pcre2_memctl memctl;
  const pcre2_real_code *code;/* The pattern used for the match */
  PCRE2_SPTR   subject;   /* The subject that was matched */
  PCRE2_SPTR   mark;  /* Pointer to last mark */
  PCRE2_SIZE   leftchar;  /* Offset to leftmost code unit */
  PCRE2_SIZE   rightchar; /* Offset to rightmost code unit */
  PCRE2_SIZE   startchar; /* Offset to starting code unit */
  uint8_t  matchedby; /* Type of match (normal, JIT, DFA) */
  uint8_t  flags; /* Various flags */
  uint16_t oveccount; /* Number of pairs */
  int  rc;/* The return code from the match */
  PCRE2_SIZE   ovector[131072]; /* Must be last in the structure */
} pcre2_real_match_data;

(my understanding is that pcre2_real_match_data and pcre2_match_data are 
the same)




So we should be able to compute a MAX_PCRE2_STACK_DATA (see patch) that 
is both large enough to store 10 or so matches, and not too big.


Not sure it is a clean solution, but I guess that it could work.


BTW, does someone have some numbers for the cost of the malloc we try to 
avoid? (read: are we over engineering the solution?)


CJIndex: util_pcre.c
===
--- util_pcre.c	(révision 1896817)
+++ util_pcre.c	(copie de travail)
@@ -272,6 +272,31 @@
   eflags);
 }
 
+//
+// Code called somewhere at server start to "register" our malloc/free functions
+// For PCRE2 memory allocation optimization
+//
+pcre2_general_context *global_pcre2_context =
+pcre2_general_context_create(_malloc, _free, NULL)
+//
+//
+
+#define MAX_PCRE2_STACK_DATA1024
+#define PCRE2_ON_STACK  ((void *)(-1))
+
+void *pcre2_malloc(size_t size, void *data)
+{
+if (size <= MAX_PCRE2_STACK_DATA)
+return PCRE2_ON_STACK;
+
+return malloc(size);
+}
+
+void pcre2_free(void *mem, void *data)
+{
+return free(mem);
+}
+
 AP_DECLARE(int) ap_regexec_len(const ap_regex_t *preg, const char *buff,
apr_size_t len, apr_size_t nmatch,
ap_regmatch_t *pmatch, int eflags)
@@ -282,11 +307,12 @@
 #ifdef HAVE_PCRE2
 pcre2_match_data *matchdata;
 size_t *ovector;
+char small_ovector[MAX_PCRE2_STACK_DATA];
 #else
 int small_ovector[POSIX_MALLOC_THRESHOLD * 3];
-int allocated_ovector = 0;
 int *ovector = NULL;
 #endif
+int allocated_ovector = 0;
 
 if ((eflags & AP_REG_NOTBOL) != 0)
 options |= PCREn(NOTBOL);
@@ -304,9 +330,17 @@
  */
 nlim = ((apr_size_t)preg->re_nsub + 1) > nmatch
  ? ((apr_size_t)preg->re_nsub + 1) : nmatch;
-matchdata = pcre2_match_data_create(nlim, NULL);
+
+allocated_ovector = 1;
+matchdata = pcre2_match_data_create(nlim, global_pcre2_context);
 if (matchdata == NULL)
 return AP_REG_ESPACE;
+if (matchdata == PCRE2_ON_STACK)
+{
+allocated_ovector = 0;
+matchdata = small_ovector;
+}
+
 ovector = pcre2_get_ovector_pointer(matchdata);
 rc = pcre2_match((const pcre2_code *)preg->re_pcre,
  (const unsigned char *)buff, len,
@@ -343,7 +377,8 @@
 }
 
 #ifdef HAVE_PCRE2
-pcre2_match_data_free(matchdata);
+if (allocated_ovector)
+pcre2_match_data_free(matchdata);
 

Re: TLS neverbleed design

2022-01-17 Thread Graham Leggett
On 17 Jan 2022, at 19:17, Ruediger Pluem  wrote:

>> I see. Thanks for the clarification. Did not really now about that 
>> interface. Then I see no pressing point in adding an additional API, indeed.
> 
> I haven't looked deeply in this, but based on the pointers from Joe I guess
> this can be done with
> 
> - p11-kit
> - softhsm
> - Some configuration on OpenSSL side to use the p11-kit client as pkcs11 
> provider for accessing the p11-kit server over a Unix
> domain socket
> 
> p11-kit and softhsm seem to be readily available at least on later versions 
> of Ubuntu and RedHat.
> 
> As the pkcs11 engine requires some configuration it could be nice adding a 
> feature to mod_ssl that allows to load a different
> configuration file than the standard Openssl configuration file.

No need - as of recently this has been simplified and is available in v2.4.

Start by passing pkcs11 URLs into mod_ssl as follows:

SSLCertificateFile 
pkcs11:token=Local%20Tokens;id=%B0%90%5A%36%1A%E0%2F%DA%13%14%81%FC%E1%DE%9D%86%9E%30%8F%AE;type=cert
SSLCertificateKeyFile 
pkcs11:token=Local%20Tokens;id=%B0%90%5A%36%1A%E0%2F%DA%13%14%81%FC%E1%DE%9D%86%9E%30%8F%AE;object=;type=private?pin-value=

The pkcs11 URL prefix is enough to get mod_ssl to tell openssl it wants the 
pkcs11 engine, provided by the openssl-pkcs11 RPM package on Redhat derivatives 
(native package is called https://github.com/OpenSC/libp11). No need to mess 
about with manual engine configs, mod_ssl handles the details for you.

In turn, if p11-kit is installed then all pkcs11 driver packages should “just 
work” on installation. I have had the most success with the opencryptoki-swtok 
package, a software HSM made by IBM.

The softhsm package by Redhat is packaged in such a way that it cannot be used 
by anything other than opendnssec, which is a shame.

If you want to store more than one certificate in your HSM / on your token, you 
need this fix: https://github.com/OpenSC/libp11/pull/433

Regards,
Graham
—



Re: svn commit: r1896976 - /httpd/httpd/branches/2.4.x/STATUS

2022-01-17 Thread Christophe JAILLET



Le 17/01/2022 à 14:16, Ruediger Pluem a écrit :


On 1/17/22 1:38 PM, Eric Covener wrote:

On Mon, Jan 17, 2022 at 7:28 AM Ruediger Pluem  wrote:



On 1/16/22 10:35 PM, William A Rowe Jr wrote:

4 day ago, you all saw this. 6 years ago, you all started using this on trunk.

Don't know what I can to do help this along and honor the library
author's wishes for
all of us to walk away from the dead fork, and use the maintained
fork. It's whatever
it is, I'm out of here and removing the backport application branch,
whoever 3rd upvotes
this be prepared to apply this for us all, thanks.

With regards to the de-optimization / stack usage probably stupid question:
Can't we use the TLS features that several compilers offer?
e.g. see at the end of the page at 
https://stackoverflow.com/questions/18298280/how-to-declare-a-variable-as-thread-local-portably

If we have no thread_local we could fall back to the current unoptimized 
implementation?

I more identified with Joe's comment, if we had an _ex version we
could pass pools for our own usage.
If some other heavy regex hitter third-party module finds an issue
(like mod_security) it might degrade a little bit but they can always
take advantage of it too.

Also, what about alloca()?

Trivia but I should add that in an integration between two third-party
mods I recently ripped out apr_threadkey stuff due to a crash after
some OS updates that nobody can explain. Fortunately the problematic
API had been extended with userdata and it was no longer absolutely
necessary. It was our only use of these API's so I was glad to
simplify and just remove them and move on.
Net, I would be hesitant to introduce something too new.

>From a quick glance apr_threadkey seems to use pthread_*specific for which I 
found postings that state that it is slow compared to
the compiler implemented thread locals. OTOH __thread alike stuff does not seem 
to offer any cleanup at thread exit which would
not allow us to free any resource that was created for that thread :-(. Hence I 
guess providing a pool to the call remains the
only solution here.


Some time ago, I played with apr_threadkey for another use case related 
to mod_unique_id (see r1887244+r1887245)
It has been reverted later because getting the thread ID was not 
reliable on Windows.


Discussion in BZ 65159.

I thought it could be a solution for PREC10 (see r1889287) but it wasn't.


I also looked at thread locals but AFAIK, it is not used in APR and it 
was raising a potential compatibility issue with build chain.



Just my 2c.

CJ



Regards

Rüdiger



Re: svn commit: r1896976 - /httpd/httpd/branches/2.4.x/STATUS

2022-01-17 Thread William A Rowe Jr
On Mon, Jan 17, 2022, 09:37 Ruediger Pluem  wrote:

>
>
> On 1/17/22 4:05 PM, Joe Orton wrote:
> > On Sun, Jan 16, 2022 at 03:35:15PM -0600, William A Rowe Jr wrote:
> >> 4 day ago, you all saw this. 6 years ago, you all started using this on
> trunk.
> >>
> >> Don't know what I can to do help this along and honor the library
> >> author's wishes for all of us to walk away from the dead fork, and use
> >> the maintained fork. It's whatever it is, I'm out of here and removing
> >> the backport application branch, whoever 3rd upvotes this be prepared
> >> to apply this for us all, thanks.
> >
> > I'm fine with PCRE 10.x as a trunk/2.5 feature.  PCRE upstream have
> > maintained 8.x better than e.g. zlib upstream have done in recent years
> > (last zlib release in 2017).  So I don't find the fact it's considered
> > EOL upstream presents any particular urgency, it's still supported
> > downstream on platforms people deploy to.
> >
> > For 2.4.x I would argue it's better to keep a preference for 8.x over
> > 10.x so that users aren't switched from one to the other across an
> > upgrade - with some new performance trade-off we know about - without
> > changing the environment/configure line?
>
> Sounds sensible for Linux to keep the default to 8.x if found where people
> can expect their distribution to maintain stuff provided that the
> distribution is still maintained.
> I am not so sure for other platforms especially Windows where I guess that
> people get stuff
> more often directly from upstream.
>

Sensible? Did you read the memo at pcre.org? There will be no more
evaluations of security risks on the abandoned fork and we were told this
back in May 2021.

Do you still have the same posture? Some of us spent the last 5 years
arguing for httpd.next, and I resigned over it. Your call, you are PMC and
I choose not to be.


Re: svn commit: r1896976 - /httpd/httpd/branches/2.4.x/STATUS

2022-01-17 Thread William A Rowe Jr
Ixney on the alloca. That's the entire point of not moving array
allocations to the stack, because the author won't take responsibility for
potential underflows and overflows. Stack is the wrong place for these bits



On Mon, Jan 17, 2022, 06:38 Eric Covener  wrote:

> On Mon, Jan 17, 2022 at 7:28 AM Ruediger Pluem  wrote:
> >
> >
> >
> > On 1/16/22 10:35 PM, William A Rowe Jr wrote:
> > > 4 day ago, you all saw this. 6 years ago, you all started using this
> on trunk.
> > >
> > > Don't know what I can to do help this along and honor the library
> > > author's wishes for
> > > all of us to walk away from the dead fork, and use the maintained
> > > fork. It's whatever
> > > it is, I'm out of here and removing the backport application branch,
> > > whoever 3rd upvotes
> > > this be prepared to apply this for us all, thanks.
> >
> > With regards to the de-optimization / stack usage probably stupid
> question:
> > Can't we use the TLS features that several compilers offer?
> > e.g. see at the end of the page at
> https://stackoverflow.com/questions/18298280/how-to-declare-a-variable-as-thread-local-portably
> >
> > If we have no thread_local we could fall back to the current unoptimized
> implementation?
>
> I more identified with Joe's comment, if we had an _ex version we
> could pass pools for our own usage.
> If some other heavy regex hitter third-party module finds an issue
> (like mod_security) it might degrade a little bit but they can always
> take advantage of it too.
>
> Also, what about alloca()?
>
> Trivia but I should add that in an integration between two third-party
> mods I recently ripped out apr_threadkey stuff due to a crash after
> some OS updates that nobody can explain. Fortunately the problematic
> API had been extended with userdata and it was no longer absolutely
> necessary. It was our only use of these API's so I was glad to
> simplify and just remove them and move on.
> Net, I would be hesitant to introduce something too new.
>


Re: TLS neverbleed design

2022-01-17 Thread Ruediger Pluem



On 1/17/22 5:32 PM, Stefan Eissing wrote:
> 
> 
>> Am 17.01.2022 um 16:42 schrieb Joe Orton :
>>
>> On Wed, Jan 12, 2022 at 03:37:05PM +0100, Stefan Eissing wrote:
>>> My current, rough idea would be:
>>>
>>> - fork a process, like mod_cgid does, that can be communicated
>>>  with over a unix domain socket
>>> - have an ap_hook to load a key and return an opaque key handle
>>> - have an ap_hooks to sign/encrypt/decrypt data for a key handle
>>>
>>>
>>> For mod_ssl that would mean:
>>> - use the hook on loading a key file
>>>  - if no handle returned, proceed as now, tell SSL_CTX to load the file
>>>  - on handle, construct a EVP_PKEY that proxies its methods to the
>>>new hooks
>>
>> You can do this already with PKCS#11 and it's already supported in 
>> mod_ssl, reinventing that wheel to add another API 
>>
>>> Example of how this is done at 
>>> https://github.com/h2o/neverbleed/blob/master/neverbleed.c
>>> which AFAICT is used in production at Fastly.
>>>
>>> If we implement this in a new module, that would become usable with
>>> an additional On|Off directive and no changes to SSL configs.
>>>
 You should be able to deploy something like this with PKCS#11, e.g. 
 softhsm, p11-kit can proxy PKCS#11 over a Unix domain socket, there are 
 probably more solutions out there.
>>>
>>> With the proposed hooks interface, one could do an implementation using 
>>> soft or
>>> hard hsms. But that would require changing mod_ssl configs as then the
>>> configuration would need to specify keys by an id other than file names. 
>>> etc. etc.
>>
>> Reinventing PKCS#11 as an ap_* level API is hardly without cost, though, 
>> so I wouldn't wave that away against some "etc etc" costs that users 
>> would need to tweak configs for.
>>
>> When I look at this problem from 10,000ft I see two parts:
>>
>> a) daemon which loads keys and offers key operations over an AF_UNIX 
>> interface
>>
>> b) interface to (a) from SSL_* layer
>>
>> Importantly, both of these seem equally useful in e.g. exim as they are 
>> in httpd.  You get (b) from supporting PKCS#11 which is already done in 
>> mod_ssl/OpenSSL.  I don't know how much you can get (a) in a convenient 
>> way for users, but as a project it mostly orthogonal to httpd except for 
>> some startup integration stuff.
> 
> I see. Thanks for the clarification. Did not really now about that interface. 
> Then I see no pressing point in adding an additional API, indeed.

I haven't looked deeply in this, but based on the pointers from Joe I guess
this can be done with

- p11-kit
- softhsm
- Some configuration on OpenSSL side to use the p11-kit client as pkcs11 
provider for accessing the p11-kit server over a Unix
domain socket

p11-kit and softhsm seem to be readily available at least on later versions of 
Ubuntu and RedHat.

As the pkcs11 engine requires some configuration it could be nice adding a 
feature to mod_ssl that allows to load a different
configuration file than the standard Openssl configuration file.

Regards

Rüdiger


Re: TLS neverbleed design

2022-01-17 Thread Stefan Eissing



> Am 17.01.2022 um 16:42 schrieb Joe Orton :
> 
> On Wed, Jan 12, 2022 at 03:37:05PM +0100, Stefan Eissing wrote:
>> My current, rough idea would be:
>> 
>> - fork a process, like mod_cgid does, that can be communicated
>>  with over a unix domain socket
>> - have an ap_hook to load a key and return an opaque key handle
>> - have an ap_hooks to sign/encrypt/decrypt data for a key handle
>> 
>> 
>> For mod_ssl that would mean:
>> - use the hook on loading a key file
>>  - if no handle returned, proceed as now, tell SSL_CTX to load the file
>>  - on handle, construct a EVP_PKEY that proxies its methods to the
>>new hooks
> 
> You can do this already with PKCS#11 and it's already supported in 
> mod_ssl, reinventing that wheel to add another API 
> 
>> Example of how this is done at 
>> https://github.com/h2o/neverbleed/blob/master/neverbleed.c
>> which AFAICT is used in production at Fastly.
>> 
>> If we implement this in a new module, that would become usable with
>> an additional On|Off directive and no changes to SSL configs.
>> 
>>> You should be able to deploy something like this with PKCS#11, e.g. 
>>> softhsm, p11-kit can proxy PKCS#11 over a Unix domain socket, there are 
>>> probably more solutions out there.
>> 
>> With the proposed hooks interface, one could do an implementation using soft 
>> or
>> hard hsms. But that would require changing mod_ssl configs as then the
>> configuration would need to specify keys by an id other than file names. 
>> etc. etc.
> 
> Reinventing PKCS#11 as an ap_* level API is hardly without cost, though, 
> so I wouldn't wave that away against some "etc etc" costs that users 
> would need to tweak configs for.
> 
> When I look at this problem from 10,000ft I see two parts:
> 
> a) daemon which loads keys and offers key operations over an AF_UNIX 
> interface
> 
> b) interface to (a) from SSL_* layer
> 
> Importantly, both of these seem equally useful in e.g. exim as they are 
> in httpd.  You get (b) from supporting PKCS#11 which is already done in 
> mod_ssl/OpenSSL.  I don't know how much you can get (a) in a convenient 
> way for users, but as a project it mostly orthogonal to httpd except for 
> some startup integration stuff.

I see. Thanks for the clarification. Did not really now about that interface. 
Then I see no pressing point in adding an additional API, indeed.

Kind Regards,
Stefan

> 
> Regards, Joe
> 



Re: TLS neverbleed design

2022-01-17 Thread Joe Orton
On Wed, Jan 12, 2022 at 03:37:05PM +0100, Stefan Eissing wrote:
> My current, rough idea would be:
> 
> - fork a process, like mod_cgid does, that can be communicated
>   with over a unix domain socket
> - have an ap_hook to load a key and return an opaque key handle
> - have an ap_hooks to sign/encrypt/decrypt data for a key handle
>
> 
> For mod_ssl that would mean:
> - use the hook on loading a key file
>   - if no handle returned, proceed as now, tell SSL_CTX to load the file
>   - on handle, construct a EVP_PKEY that proxies its methods to the
> new hooks

You can do this already with PKCS#11 and it's already supported in 
mod_ssl, reinventing that wheel to add another API 

> Example of how this is done at 
> https://github.com/h2o/neverbleed/blob/master/neverbleed.c
> which AFAICT is used in production at Fastly.
> 
> If we implement this in a new module, that would become usable with
> an additional On|Off directive and no changes to SSL configs.
> 
> > You should be able to deploy something like this with PKCS#11, e.g. 
> > softhsm, p11-kit can proxy PKCS#11 over a Unix domain socket, there are 
> > probably more solutions out there.
> 
> With the proposed hooks interface, one could do an implementation using soft 
> or
> hard hsms. But that would require changing mod_ssl configs as then the
> configuration would need to specify keys by an id other than file names. etc. 
> etc.

Reinventing PKCS#11 as an ap_* level API is hardly without cost, though, 
so I wouldn't wave that away against some "etc etc" costs that users 
would need to tweak configs for.

When I look at this problem from 10,000ft I see two parts:

a) daemon which loads keys and offers key operations over an AF_UNIX 
interface

b) interface to (a) from SSL_* layer

Importantly, both of these seem equally useful in e.g. exim as they are 
in httpd.  You get (b) from supporting PKCS#11 which is already done in 
mod_ssl/OpenSSL.  I don't know how much you can get (a) in a convenient 
way for users, but as a project it mostly orthogonal to httpd except for 
some startup integration stuff.

Regards, Joe



Re: svn commit: r1896976 - /httpd/httpd/branches/2.4.x/STATUS

2022-01-17 Thread Ruediger Pluem



On 1/17/22 4:05 PM, Joe Orton wrote:
> On Sun, Jan 16, 2022 at 03:35:15PM -0600, William A Rowe Jr wrote:
>> 4 day ago, you all saw this. 6 years ago, you all started using this on 
>> trunk.
>>
>> Don't know what I can to do help this along and honor the library 
>> author's wishes for all of us to walk away from the dead fork, and use 
>> the maintained fork. It's whatever it is, I'm out of here and removing 
>> the backport application branch, whoever 3rd upvotes this be prepared 
>> to apply this for us all, thanks.
> 
> I'm fine with PCRE 10.x as a trunk/2.5 feature.  PCRE upstream have 
> maintained 8.x better than e.g. zlib upstream have done in recent years 
> (last zlib release in 2017).  So I don't find the fact it's considered 
> EOL upstream presents any particular urgency, it's still supported 
> downstream on platforms people deploy to.
> 
> For 2.4.x I would argue it's better to keep a preference for 8.x over 
> 10.x so that users aren't switched from one to the other across an 
> upgrade - with some new performance trade-off we know about - without 
> changing the environment/configure line?

Sounds sensible for Linux to keep the default to 8.x if found where people
can expect their distribution to maintain stuff provided that the distribution 
is still maintained.
I am not so sure for other platforms especially Windows where I guess that 
people get stuff
more often directly from upstream.

Regards

Rüdiger


Re: svn commit: r1896976 - /httpd/httpd/branches/2.4.x/STATUS

2022-01-17 Thread Joe Orton
On Sun, Jan 16, 2022 at 03:35:15PM -0600, William A Rowe Jr wrote:
> 4 day ago, you all saw this. 6 years ago, you all started using this on trunk.
> 
> Don't know what I can to do help this along and honor the library 
> author's wishes for all of us to walk away from the dead fork, and use 
> the maintained fork. It's whatever it is, I'm out of here and removing 
> the backport application branch, whoever 3rd upvotes this be prepared 
> to apply this for us all, thanks.

I'm fine with PCRE 10.x as a trunk/2.5 feature.  PCRE upstream have 
maintained 8.x better than e.g. zlib upstream have done in recent years 
(last zlib release in 2017).  So I don't find the fact it's considered 
EOL upstream presents any particular urgency, it's still supported 
downstream on platforms people deploy to.

For 2.4.x I would argue it's better to keep a preference for 8.x over 
10.x so that users aren't switched from one to the other across an 
upgrade - with some new performance trade-off we know about - without 
changing the environment/configure line?

Regards, Joe



Re: svn commit: r1896976 - /httpd/httpd/branches/2.4.x/STATUS

2022-01-17 Thread gs-apache-dev
On Mon, Jan 17, 2022 at 02:16:32PM +0100, Ruediger Pluem wrote:
> 
> 
> On 1/17/22 1:38 PM, Eric Covener wrote:
> > On Mon, Jan 17, 2022 at 7:28 AM Ruediger Pluem  wrote:
> >>
> >>
> >>
> >> On 1/16/22 10:35 PM, William A Rowe Jr wrote:
> >>> 4 day ago, you all saw this. 6 years ago, you all started using this on 
> >>> trunk.
> >>>
> >>> Don't know what I can to do help this along and honor the library
> >>> author's wishes for
> >>> all of us to walk away from the dead fork, and use the maintained
> >>> fork. It's whatever
> >>> it is, I'm out of here and removing the backport application branch,
> >>> whoever 3rd upvotes
> >>> this be prepared to apply this for us all, thanks.
> >>
> >> With regards to the de-optimization / stack usage probably stupid question:
> >> Can't we use the TLS features that several compilers offer?
> >> e.g. see at the end of the page at 
> >> https://stackoverflow.com/questions/18298280/how-to-declare-a-variable-as-thread-local-portably
> >>
> >> If we have no thread_local we could fall back to the current unoptimized 
> >> implementation?
> > 
> > I more identified with Joe's comment, if we had an _ex version we
> > could pass pools for our own usage.
> > If some other heavy regex hitter third-party module finds an issue
> > (like mod_security) it might degrade a little bit but they can always
> > take advantage of it too.
> > 
> > Also, what about alloca()?
> > 
> > Trivia but I should add that in an integration between two third-party
> > mods I recently ripped out apr_threadkey stuff due to a crash after
> > some OS updates that nobody can explain. Fortunately the problematic
> > API had been extended with userdata and it was no longer absolutely
> > necessary. It was our only use of these API's so I was glad to
> > simplify and just remove them and move on.
> > Net, I would be hesitant to introduce something too new.
> 
> From a quick glance apr_threadkey seems to use pthread_*specific for which I 
> found postings that state that it is slow compared to
> the compiler implemented thread locals. OTOH __thread alike stuff does not 
> seem to offer any cleanup at thread exit which would
> not allow us to free any resource that was created for that thread :-(. Hence 
> I guess providing a pool to the call remains the
> only solution here.

Maybe use thread-local storage to set a pointer to an apr_pool
associated with the thread, and set that only for short-lived
(or otherwise appropriately scoped) threads?

Cheers, Glenn


Re: svn commit: r1896976 - /httpd/httpd/branches/2.4.x/STATUS

2022-01-17 Thread Ruediger Pluem



On 1/17/22 1:38 PM, Eric Covener wrote:
> On Mon, Jan 17, 2022 at 7:28 AM Ruediger Pluem  wrote:
>>
>>
>>
>> On 1/16/22 10:35 PM, William A Rowe Jr wrote:
>>> 4 day ago, you all saw this. 6 years ago, you all started using this on 
>>> trunk.
>>>
>>> Don't know what I can to do help this along and honor the library
>>> author's wishes for
>>> all of us to walk away from the dead fork, and use the maintained
>>> fork. It's whatever
>>> it is, I'm out of here and removing the backport application branch,
>>> whoever 3rd upvotes
>>> this be prepared to apply this for us all, thanks.
>>
>> With regards to the de-optimization / stack usage probably stupid question:
>> Can't we use the TLS features that several compilers offer?
>> e.g. see at the end of the page at 
>> https://stackoverflow.com/questions/18298280/how-to-declare-a-variable-as-thread-local-portably
>>
>> If we have no thread_local we could fall back to the current unoptimized 
>> implementation?
> 
> I more identified with Joe's comment, if we had an _ex version we
> could pass pools for our own usage.
> If some other heavy regex hitter third-party module finds an issue
> (like mod_security) it might degrade a little bit but they can always
> take advantage of it too.
> 
> Also, what about alloca()?
> 
> Trivia but I should add that in an integration between two third-party
> mods I recently ripped out apr_threadkey stuff due to a crash after
> some OS updates that nobody can explain. Fortunately the problematic
> API had been extended with userdata and it was no longer absolutely
> necessary. It was our only use of these API's so I was glad to
> simplify and just remove them and move on.
> Net, I would be hesitant to introduce something too new.

>From a quick glance apr_threadkey seems to use pthread_*specific for which I 
>found postings that state that it is slow compared to
the compiler implemented thread locals. OTOH __thread alike stuff does not seem 
to offer any cleanup at thread exit which would
not allow us to free any resource that was created for that thread :-(. Hence I 
guess providing a pool to the call remains the
only solution here.

Regards

Rüdiger



Re: svn commit: r1896976 - /httpd/httpd/branches/2.4.x/STATUS

2022-01-17 Thread Eric Covener
On Mon, Jan 17, 2022 at 7:28 AM Ruediger Pluem  wrote:
>
>
>
> On 1/16/22 10:35 PM, William A Rowe Jr wrote:
> > 4 day ago, you all saw this. 6 years ago, you all started using this on 
> > trunk.
> >
> > Don't know what I can to do help this along and honor the library
> > author's wishes for
> > all of us to walk away from the dead fork, and use the maintained
> > fork. It's whatever
> > it is, I'm out of here and removing the backport application branch,
> > whoever 3rd upvotes
> > this be prepared to apply this for us all, thanks.
>
> With regards to the de-optimization / stack usage probably stupid question:
> Can't we use the TLS features that several compilers offer?
> e.g. see at the end of the page at 
> https://stackoverflow.com/questions/18298280/how-to-declare-a-variable-as-thread-local-portably
>
> If we have no thread_local we could fall back to the current unoptimized 
> implementation?

I more identified with Joe's comment, if we had an _ex version we
could pass pools for our own usage.
If some other heavy regex hitter third-party module finds an issue
(like mod_security) it might degrade a little bit but they can always
take advantage of it too.

Also, what about alloca()?

Trivia but I should add that in an integration between two third-party
mods I recently ripped out apr_threadkey stuff due to a crash after
some OS updates that nobody can explain. Fortunately the problematic
API had been extended with userdata and it was no longer absolutely
necessary. It was our only use of these API's so I was glad to
simplify and just remove them and move on.
Net, I would be hesitant to introduce something too new.


Re: svn commit: r1896976 - /httpd/httpd/branches/2.4.x/STATUS

2022-01-17 Thread Ruediger Pluem



On 1/16/22 10:35 PM, William A Rowe Jr wrote:
> 4 day ago, you all saw this. 6 years ago, you all started using this on trunk.
> 
> Don't know what I can to do help this along and honor the library
> author's wishes for
> all of us to walk away from the dead fork, and use the maintained
> fork. It's whatever
> it is, I'm out of here and removing the backport application branch,
> whoever 3rd upvotes
> this be prepared to apply this for us all, thanks.

With regards to the de-optimization / stack usage probably stupid question:
Can't we use the TLS features that several compilers offer?
e.g. see at the end of the page at 
https://stackoverflow.com/questions/18298280/how-to-declare-a-variable-as-thread-local-portably

If we have no thread_local we could fall back to the current unoptimized 
implementation?

Regards

Rüdiger