On Fri, 8 Jul 2022 at 19:58, Yann Ylavic <ylavic....@gmail.com> wrote:
>
> On Fri, Jul 8, 2022 at 5:07 PM <i...@apache.org> wrote:
> >
> > Author: ivan
> > Date: Fri Jul  8 15:07:00 2022
> > New Revision: 1902572
> >
> > URL: http://svn.apache.org/viewvc?rev=1902572&view=rev
> > Log:
> > Rewrite ap_regexec() without a thread-local storage context for allocations.
> >
> > Provide custom malloc() and free() implementations that use a stack buffer
> > for first N bytes and then fall back to an ordinary malloc/free().
> >
> > The key properties of this approach are:
> >
> > 1) Allocations with PCRE2 happen the same way as they were happening
> > with PCRE1 in httpd 2.4.52 and earlier.
> >
> > 2) There are no malloc()/free() calls for typical cases where the
> > match data can be kept on stack.
> >
> > 3) The patch avoids a malloc() for the match_data structure itself,
> > because the match data is allocated with the provided custom malloc()
> > function.
> >
> > 4) Using custom allocation functions should ensure that PCRE is not
> > going to use malloc() for any auxiliary allocations, if they are
> > necessary.
> >
> > 5) There is no per-thread state.
> >
> > References:
> > 1) https://lists.apache.org/thread/l6m7dqjkk0yy3tooyd2so0rb20jmtpwd
> > 2) https://lists.apache.org/thread/5k9y264whn4f1ll35tvl2164dz0wphvy
>
> The main downside is:
>
> > +#ifndef AP_PCRE_STACKBUF_SIZE
> > +#define AP_PCRE_STACKBUF_SIZE (256)
> >  #endif
> []
> >  AP_DECLARE(int) ap_regexec(const ap_regex_t *preg, const char *string,
> >                             apr_size_t nmatch, ap_regmatch_t *pmatch,
> >                             int eflags)
> > -    int small_vector[POSIX_MALLOC_THRESHOLD * 3];
> > +    /* Use apr_uint64_t to get proper alignment. */
> > +    apr_uint64_t stack_buf[(AP_PCRE_STACKBUF_SIZE + sizeof(apr_uint64_t) - 
> > 1) / sizeof(apr_uint64_t)];
>
> So, if I count correctly, that's >2KB stack usage (regardless of PCRE1
> or PCRE2) compared to the 120B we had previously (for PCRE1 only)?
AP_PCRE_STACKBUF_SIZE is in bytes. And by default is 256 bytes. So
code allocates exactly 256 bytes on stack.

> This stack size allows for more than 200 captures for PCR1 (way too
> much), so how many captures/complexity/margin do these 2KB represent
> for PCRE2 and how much can it be reduced to have PCRE2 not malloc()ate
> for the usual (<10 captures) regexes?
> (Remember that ap_regexec() can be called anywhere/deep in the call
> stack, e.g. mod_rewrite, so 2KB stack is not reasonable IMHO).
>

> Finally, I don't see how it's better than one allocation (or two) for
> the whole lifetime of a thread, given that thread_local is available
> on all the platforms (where performances matter), neither do I see
> what are your grievances against thread_local..
>
1. tls storage is growing to maximum captures count. So it requires
more than one or two allocations.
2. tls storage doesn't support dynamic MPM that uses native threads
3. access to tls storage requires hash lookup.
4. Windows build is currently broken due to tls storage API changes.
5. Using tls reserves the memory, even if pcre is not being used
afterwards. This can be thought as of implicitly increasing the
per-thread stack size.
6. With new code there will be no actual allocations in real-world cases.



--
Ivan Zhakov

Reply via email to