On Tue, Jul 12, 2022 at 12:46 PM Ivan Zhakov <i...@visualsvn.com> wrote:
>
> On Wed, 29 Jun 2022 at 01:28, Yann Ylavic <ylavic....@gmail.com> wrote:
> >
> > On Tue, Jun 28, 2022 at 6:08 PM Ivan Zhakov <i...@visualsvn.com> wrote:
> > > >
> > > I meant a memory leak for native threads. For example, with an MPM
> > > that uses native Windows threadpool for worker threads.
> >
> > That's not mpm_winnt right? (it does not use native threads anymore
> > and plays no apr_thread_current_create() game, it just uses
> > apr_thread_create() so apr_thread_current() works automatically
> > there).
>
> No, it's not mpm_winnt, but still a possible real-world example. And there
> are multiple other users of APR besides the HTTP Server.

So we have a hypothetical issue for what exactly? Someone will be
using the new ap_thread API (or future APR one) for something that it
can't be used for?

>
> >
> > Probably the thread pool lets you pass the thread function? So you
> > could "apr_thread_current_create(&mythread)" when entering the
> > function and "apr_pool_destroy(apr_thread_pool_get(mythread))" before
> > leaving?
> > That'd still leak if the thread calls TerminateThread() explicitly at
> > some point, but it seems that the native Windows thread API is not
> > very helpful for that since there is way to register a destructor for
> > when the thread terminates by any mean (like the pthread_key_create()
> > API's destructor for instance). If the native API does not help I
> > don't think that APR can do much better..
> >
> This sounds problematic to me, because the result is a memory leak unless
> the application code is changed (and the APR consumer has to somehow be aware
> of that).
>
> Also, native Windows thread pool works in a different way: it invokes a
> callback on the random thread created by OS. So there is no easy way to
> track the thread exit.

You call something that does not exist as problematic. If some native
threads can call TerminateThread() at any uncontrollable point then
don't use ap_thread_current_create(), and if this is in an MPM then
don't use that MPM at all if you are worried about leaks.
But threadpools usually let you provide the thread function, so you
have both an entry and exit point between which you are not supposed
to kill the thread (underneath the threadpool), right?

> >
> > As said above, this is how it works already when apr_thread_current() is 
> > NULL.
> > If it's not NULL, you get the nice behaviour that it reuses
> > efficiently the same vector (PCRE1) or context (PCRE2) for each call
> > on the same thread, with no allocation/free overhead or memory
> > fragmentation.
> >
>
> The code in server/util_pcre.c:342 was doing the following:
> [[[
>     current = ap_thread_current();
>     if (!current) {
>         *to_free = 1;
>         return alloc_match_data(size, small_vector);
>     }
> ]]]
>
> Here alloc_match_data() was calling pcre2_match_data_create(), resulting
> in a malloc().

Yes OK, with the previous implementation there was no cache/stack
provided for PCRE2 for non APR threads (or !AP_HAS_THREAD_LOCAL).
You implemented that, but now what's missing is a per-thread cache
that works for all the patterns and is reused across the calls to
ap_regexec(), and for that you need
AP_THREAD_LOCAL/ap_thread_current().

So how about we still use the stack buffer only if it's enough, or use
some AP_THREAD_LOCAL/ap_thread_current() cache if available, or use
malloc() finally otherwise. Best of both worlds, should there be
regexes with many captures or PCRE2's START_FRAMES_SIZE be smaller for
some libpcre2 build or pcre2_match_data_create_with_frames() be
available on a future PCRE2 version (all of those cases would put more
pressure on private_malloc() with more frequent and/or larger calls
from pcre2_match()), then systems with
AP_THREAD_LOCAL/ap_thread_current() will still work efficiently.

Patch attached, thoughts?

Regards;
Yann.

PS: the patch also includes the fix from
https://bz.apache.org/bugzilla/show_bug.cgi?id=66119
Index: server/util_pcre.c
===================================================================
--- server/util_pcre.c	(revision 1902674)
+++ server/util_pcre.c	(working copy)
@@ -56,6 +56,11 @@ POSSIBILITY OF SUCH DAMAGE.
 #include "apr_strings.h"
 #include "apr_tables.h"
 
+#if AP_HAS_THREAD_LOCAL
+#include "apr_thread_proc.h"
+#include "mpm_common.h" /* for ap_max_mem_free */
+#endif
+
 #ifdef HAVE_PCRE2
 #define PCRE2_CODE_UNIT_WIDTH 8
 #include "pcre2.h"
@@ -280,9 +285,24 @@ typedef int*              match_data_pt;
 typedef int*              match_vector_pt;
 #endif
 
+#if AP_HAS_THREAD_LOCAL && !defined(APREG_NO_THREAD_LOCAL)
+#define APREG_USE_THREAD_LOCAL 1
+#else
+#define APREG_USE_THREAD_LOCAL 0
+#endif
+
+#if APREG_USE_THREAD_LOCAL
+struct match_thread_state
+{
+    char *heap;
+    apr_size_t heap_size;
+    apr_size_t heap_used;
+    apr_pool_t *pool;
+};
+#endif
+
 struct match_data_state
 {
-    match_data_pt data;
     char *buf;
     apr_size_t buf_len;
     apr_size_t buf_used;
@@ -289,63 +309,122 @@ struct match_data_state
 #ifdef HAVE_PCRE2
     pcre2_general_context *pcre2_ctx;
 #endif
+#if APREG_USE_THREAD_LOCAL
+    apr_thread_t *thd;
+    struct match_thread_state *ts;
+#endif
+    match_data_pt match_data;
 };
 
+#if APREG_USE_THREAD_LOCAL
+static AP_THREAD_LOCAL struct match_thread_state *thread_state;
+
+static void *thread_malloc(apr_size_t size, struct match_data_state *state)
+{
+    struct match_thread_state *ts = thread_state;
+
+    if (!ts) {
+        apr_pool_t *tp = apr_thread_pool_get(state->thd);
+        ts = apr_pcalloc(tp, sizeof(*ts));
+        apr_pool_create(&ts->pool, tp);
+        thread_state = ts;
+    }
+
+    if (!state->ts) {
+        state->ts = ts;
+        ts->heap_used = 0;
+    }
+    if (size > ts->heap_size - ts->heap_used) {
+        ts->heap_size *= 2;
+        if (ts->heap_size < size) {
+            ts->heap_size = size;
+        }
+        ts->heap = apr_palloc(ts->pool, ts->heap_size);
+        ts->heap_used = 0;
+    }
+
+    ts->heap_used += size;
+    return ts->heap + ts->heap_used - size;
+}
+#endif /* APREG_USE_THREAD_LOCAL */
+
 static void * private_malloc(size_t size, void *ctx)
 {
     struct match_data_state *state = ctx;
+    apr_size_t aligned_size = APR_ALIGN_DEFAULT(size);
 
-    if(size <= (state->buf_len - state->buf_used)) {
-        void *p = state->buf + state->buf_used;
+    if (aligned_size <= state->buf_len - state->buf_used) {
+        state->buf_used += aligned_size;
+        return state->buf + state->buf_used - aligned_size;
+    }
 
-        state->buf_used += APR_ALIGN_DEFAULT(size);
+#if APREG_USE_THREAD_LOCAL
+    if (state->thd) {
+        return thread_malloc(size, state);
+    }
+#endif
 
-        return p;
-    }
-    else {
-        return malloc(size);
-    }
+    return malloc(size);
 }
 
 static void private_free(void *block, void *ctx)
 {
     struct match_data_state *state = ctx;
-    void *buf_start = state->buf;
-    void *buf_end = state->buf + state->buf_len;
 
-    if (block >= buf_start && block <= buf_end) {
+    if ((char *)block >= state->buf
+        && (char *)block < &state->buf[state->buf_len]) {
         /* This block allocated from stack buffer. Do nothing. */
+        return;
     }
-    else {
-        free(block);
+
+#if APREG_USE_THREAD_LOCAL
+    if (state->thd) {
+        /* Freed in free_match_data() eventually. */
+        return;
     }
+#endif
+
+    free(block);
 } 
 
 static APR_INLINE
-match_data_pt alloc_match_data(apr_size_t size,
-                               struct match_data_state *state,
-                               void *stack_buf,
-                               apr_size_t stack_buf_len)
+int alloc_match_data(struct match_data_state *state, apr_uint32_t ncaps,
+                     void *stack_buf, apr_size_t stack_buf_len)
 {
     state->buf = stack_buf;
     state->buf_len = stack_buf_len;
     state->buf_used = 0;
 
+#if APREG_USE_THREAD_LOCAL
+    state->thd = ap_thread_current();
+    state->ts = NULL;
+#endif
+
 #ifdef HAVE_PCRE2
     state->pcre2_ctx = pcre2_general_context_create(private_malloc, private_free, state);
     if (!state->pcre2_ctx) { 
-        return NULL;
+        return 0;
     }
-    state->data = pcre2_match_data_create((int)size, state->pcre2_ctx);
-    if (!state->data) {
+
+    state->match_data = pcre2_match_data_create(ncaps, state->pcre2_ctx);
+    if (!state->match_data) {
         pcre2_general_context_free(state->pcre2_ctx);
-        return NULL;
+        return 0;
     }
 #else
-    state->data = private_malloc(size * sizeof(int) * 3, state);
+    if (ncaps) {
+        state->match_data = private_malloc(ncaps * sizeof(int) * 3, state);
+        if (!state->match_data) {
+            return 0;
+        }
+    }
+    else {
+        /* Fine with PCRE1 */
+        state->match_data = NULL;
+    }
 #endif
 
-    return state->data;
+    return 1;
 }
 
 static APR_INLINE
@@ -352,11 +431,23 @@ static APR_INLINE
 void free_match_data(struct match_data_state *state)
 {
 #ifdef HAVE_PCRE2
-    pcre2_match_data_free(state->data);
+    pcre2_match_data_free(state->match_data);
     pcre2_general_context_free(state->pcre2_ctx);
 #else
-    private_free(state->data, state);
+    if (state->match_data) {
+        private_free(state->match_data, state);
+    }
 #endif
+
+#if APREG_USE_THREAD_LOCAL
+    if (state->ts
+        && ap_max_mem_free > 0
+        && state->ts->heap_size > ap_max_mem_free) {
+        apr_pool_clear(state->ts->pool);
+        state->ts->heap_size = 0;
+        state->ts->heap = NULL;
+    }
+#endif
 }
 
 AP_DECLARE(int) ap_regexec(const ap_regex_t *preg, const char *string,
@@ -372,16 +463,39 @@ AP_DECLARE(int) ap_regexec_len(const ap_regex_t *p
                                ap_regmatch_t *pmatch, int eflags)
 {
     int rc;
-    int options = 0, to_free = 0;
+    int options = 0;
+    struct match_data_state state;
     match_vector_pt ovector = NULL;
-    apr_size_t ncaps = (apr_size_t)preg->re_nsub + 1;
-    struct match_data_state state;
-    match_data_pt data;
     /* Use apr_uint64_t to get proper alignment. */
-    apr_uint64_t stack_buf[(AP_PCRE_STACKBUF_SIZE + sizeof(apr_uint64_t) - 1) / sizeof(apr_uint64_t)];
+    apr_uint64_t stack_buf[AP_PCRE_STACKBUF_SIZE / sizeof(apr_uint64_t)];
+    apr_uint32_t ncaps = (apr_uint32_t)preg->re_nsub + 1;
 
-    data = alloc_match_data(ncaps, &state, stack_buf, sizeof(stack_buf));
-    if (!data) {
+#ifndef HAVE_PCRE2
+    /* This is fine if pcre_exec() gets a vector size smaller than the
+     * number of capturing groups (it will treat the remaining ones as
+     * non-capturing), but if the vector is too small to keep track of
+     * the potential backrefs within the pattern, it will temporarily
+     * malloc()ate the necessary space anyway. So let's provide a vector
+     * of at least PCRE_INFO_BACKREFMAX entries (likely zero, otherwise
+     * the vector is most likely cached already anyway).
+     * Note that if no captures are to be used by the caller, passing an
+     * nmatch of zero (thus forcing all groups to be non-capturing) may
+     * allow for some optimizations and/or less recursion (stack usage)
+     * with PCRE1, unless backrefs..
+     */
+    if (ncaps > nmatch) {
+        int refmax = 0;
+        pcre_fullinfo((const pcre *)preg->re_pcre, NULL,
+                      PCRE_INFO_BACKREFMAX, &refmax);
+        if (refmax > 0 && (apr_uint32_t)refmax >= nmatch) {
+            ncaps = (apr_uint32_t)refmax + 1;
+        }
+        else {
+            ncaps = nmatch;
+        }
+    }
+#endif
+    if (!alloc_match_data(&state, ncaps, stack_buf, sizeof(stack_buf))) {
         return AP_REG_ESPACE;
     }
 
@@ -396,11 +510,11 @@ AP_DECLARE(int) ap_regexec_len(const ap_regex_t *p
 
 #ifdef HAVE_PCRE2
     rc = pcre2_match((const pcre2_code *)preg->re_pcre,
-                     (const unsigned char *)buff, len,
-                     0, options, data, NULL);
-    ovector = pcre2_get_ovector_pointer(data);
+                     (const unsigned char *)buff, len, 0, options,
+                     state.match_data, NULL);
+    ovector = pcre2_get_ovector_pointer(state.match_data);
 #else
-    ovector = data;
+    ovector = state.match_data;
     rc = pcre_exec((const pcre *)preg->re_pcre, NULL, buff, (int)len,
                    0, options, ovector, ncaps * 3);
 #endif

Reply via email to