On 8/1/22 5:05 PM, Ivan Zhakov wrote:
> On 2022/07/29 06:19:03 Ruediger Pluem wrote:
>>
>>
>> On 7/16/22 10:28 AM, Ivan Zhakov wrote:
>>> On Wed, 13 Jul 2022 at 18:06, Yann Ylavic <ylavic....@gmail.com> wrote:
>>>
>>> On Wed, Jul 13, 2022 at 1:38 PM Ivan Zhakov <i...@apache.org> wrote:
>>> >
>>> > On Sun, 10 Jul 2022 at 19:52, Yann Ylavic <ylavic....@gmail.com> wrote:
>>> > >
>>> > > Let me explain (maybe), I thought that PCRE2's match_data were not
>>> > > only containing the captures vector but also all the frames needed
>>> > > for matching not-too-complex regexes (at least the initial ones
>>> > > since they will be reallocated on need). So the (supposed) 2KB
>>> > > looked necessary to me for a few stack frames.
>>> > > But the actual 256 bytes only made me look at the PCRE2 code again
>>> > > (better this time) for how that can possibly work, and guess what:
>>> > > the initial frames are allocated on the stack ([1]) still in PCRE2,
>>> > > and that's 20KiB ([2]) of stack space for each call to pcre2_match().
>>> > >
>>> > > On Alpine Linux for instance (quite used in docker envs) there is
>>> > > only 128KiB of stack per thread, so I've always configured
>>> > > --no-recurse-on-stack with PCRE1 and used a per-thread (and its
>>> > > pool) vector cache, with custom pcre_malloc/free() to avoid
>>> > > performance overhead and be able to run more complex regexes
>>> > > without exploding, and I can't do that now with PCRE2 without
>>> > > requiring 20KB of stack :/
>>> > > So I created a pull request ([3]).
>>> > >
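For context, the equivalent hooks in PCRE2 are registered through a general
context; a minimal sketch (placeholder names, not httpd's util_pcre.c code)
could look roughly like this. Note that with current PCRE2 the initial match
frames still live on pcre2_match()'s stack regardless of these hooks, which
is exactly the 20KiB discussed above:

  #define PCRE2_CODE_UNIT_WIDTH 8
  #include <pcre2.h>
  #include <stdlib.h>

  #define OVEC_COUNT 10  /* placeholder capture-pair count */

  /* allocation hooks handed to PCRE2; the void * is user data, unused here */
  static void *my_malloc(PCRE2_SIZE size, void *data) { (void)data; return malloc(size); }
  static void  my_free(void *block, void *data)       { (void)data; free(block); }

  static int match(const pcre2_code *code, const char *subject, size_t len)
  {
      pcre2_general_context *gctx =
          pcre2_general_context_create(my_malloc, my_free, NULL);
      pcre2_match_data *md = pcre2_match_data_create(OVEC_COUNT, gctx);

      int rc = pcre2_match(code, (PCRE2_SPTR)subject, len, 0, 0, md, NULL);

      pcre2_match_data_free(md);
      pcre2_general_context_free(gctx);
      return rc;
  }
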
>>> > The 20KB stack usage in PCRE is interesting, but somewhat unrelated.
>>> > I would suggest a different solution: it would be nice if PCRE
>>> > provided a compile-time directive to specify the maximum
>>> > stack-allocated buffer. Distributions like Alpine Linux could then
>>> > specify more appropriate defaults.
>>>
>>> This is what pcre2 folks proposed in [3] as a fast(ly acceptable)
>>> solution indeed.
>>> Say START_FRAMES_SIZE is set to some minimum, e.g. a single struct
>>> heapframe with 24 captures; that's (on a 64-bit system):
>>> offsetof(heapframe, ovector) + 24 * 2 * sizeof(size_t) = 125 + 384
>>> ~= 512 bytes
>>> which would be much appreciated, but would also put much more pressure
>>> on util_regex's private_malloc().
>>> Will ap_regexec() start to use malloc() quite often with the current
>>> implementation then? Up to the old/default START_FRAMES_SIZE, if it
>>> was a good default after all?
>>>
>>> But that's not where we are, because I don't think distros will change
>>> the default PCRE START_FRAMES_SIZE for all the code they run, so most
>>> httpd users (who run on distros, I suppose) will continue to consume
>>> 20K of stack memory for every ap_regexec() call.
>>>
>>> The flexible way for pcre2 users (and httpd maintainers) would be to
>>> let them opt out (at runtime) of stack allocation from pcre2_match(),
>>> which is the patch I proposed.
>>>
>>> I see several problems with this approach:
>>> 1. Every apr_pool_t consumes at least one page, i.e., 4KB or 8KB
>>>    depending on the platform.
>>> 2. The implementation may introduce unbounded memory usage if PCRE2
>>>    performs multiple malloc/free cycles during pattern matching. It
>>>    seems that currently it doesn't, but we cannot rely on that
>>>    implementation detail. The stackbuf code doesn't have this problem,
>>>    because it's limited to a fixed-size buffer.
>>> 3. As I said before, the proposed approach effectively increases
>>>    per-thread memory usage: 4 or 8 KB for match data stored in thread
>>>    state on top of the stack size. That is something I don't really
>>>    understand, because if we are fine with that, then we could just
>>>    increase the thread stack size (see the example below).
>>>
>>> With the above in mind, I'm -1 for this approach and r1902731.
>>>
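For illustration of that last point only: raising the per-thread stack for a
threaded MPM is a one-line httpd.conf setting, e.g. 256KiB instead of
Alpine's 128KiB default (the value here is just an example):

  ThreadStackSize 262144
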
>>
>> The latest implementation gives you both: a small allocation on the stack
>> that does not drive stack usage up too much and that should be sufficient
>> for many regex cases. If that is not enough, we use TLS to tuck away a
>> pointer to a per-thread pool, and we only create that pool in case the
>> stack memory isn't sufficient.
>> Whether we really increase memory usage when we need to fall back to the
>> pool isn't clear to me, as we use a subpool of the thread pool and thus
>> its allocator. Maybe that allocator still has free nodes around that we
>> reuse.
>> With regards to unbounded memory usage, I don't think this is an actual
>> problem, but if PCRE really did malloc/free massively beyond the space we
>> use on the stack, satisfying those needs with plain malloc/free would be
>> problematic as well, and I guess that would then need fixing in PCRE.
>> Furthermore, if this happens we could have a look at using
>> apr_bucket_alloc / apr_bucket_free, or, as this might not fit exactly,
>> take ideas from there.
>> Could you please reconsider your -1?
>>
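To make the above concrete, the strategy is roughly the following; this is
only a sketch with placeholder names (thread_pool() stands in for however
the per-thread pool is fetched via TLS), not the actual util_pcre.c code:

  #include <stddef.h>
  #include "apr_pools.h"

  static apr_pool_t *thread_pool(void);  /* assumed helper: per-thread pool */

  typedef struct {
      char       *buf;    /* small buffer living on ap_regexec()'s stack */
      apr_size_t  avail;  /* bytes still free in that buffer */
      apr_pool_t *pool;   /* subpool of the thread pool, created on overflow */
  } match_mem;

  static void *private_malloc(size_t size, void *ctx)
  {
      match_mem *mem = ctx;

      if (size <= mem->avail) {      /* fast path: carve from the stack buffer */
          void *p = mem->buf;
          mem->buf   += size;        /* alignment handling omitted for brevity */
          mem->avail -= size;
          return p;
      }
      if (!mem->pool) {              /* slow path: lazily create the subpool
                                      * (error handling omitted) */
          apr_pool_create(&mem->pool, thread_pool());
      }
      return apr_palloc(mem->pool, size);
  }

  static void private_free(void *block, void *ctx)
  {
      /* no-op: stack memory unwinds with the caller, and the subpool is
       * cleared after the match, recycling its nodes via the thread pool's
       * allocator */
      (void)block;
      (void)ctx;
  }
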
>
> My overall concern is that with the current solution as of r1902858,
> there are valid and reasonable allocation patterns that will cause
> unbounded memory usage.
>
> For example, nothing prevents current or future PCRE versions from doing this:
>
> for (int i = 0; i < N; i++)
> {
>     void *p = private_malloc(size, ctx);
>     private_free(p, ctx);
> }
>
> which is going to cause unbounded memory usage, because private_free()
> is a no-op and everything that was allocated is retained.
This is true, but these kinds of allocation patterns would also cause a lot
of problems with a malloc/free backend, and thus I think they are unlikely
and would need to be addressed within PCRE. But even if they happen and
cannot be fixed for our users by adjusting PCRE, e.g. because they are stuck
with distribution versions, we could still tackle this with an
apr_bucket_alloc / apr_bucket_free approach (a rough sketch follows below),
or worst case even with a malloc / free approach for particular "bad" PCRE
versions. For now the current code seems to work well with the current PCRE
allocation pattern.
While having a view beyond the current situation is good, I think we should
not assume the worst for the future.
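
To illustrate that last idea, a rough sketch (assumed wiring, not actual
httpd code) of an apr_bucket_alloc / apr_bucket_free backend; since
apr_bucket_free() puts blocks back on the allocator's free lists, the
malloc/free-in-a-loop pattern above would reuse memory instead of growing a
pool:

  #include "apr_buckets.h"

  static void *private_malloc(size_t size, void *ctx)
  {
      /* ctx would be a per-thread apr_bucket_alloc_t, e.g. created once
       * with apr_bucket_alloc_create() from the thread pool */
      apr_bucket_alloc_t *ba = ctx;
      return apr_bucket_alloc(size, ba);
  }

  static void private_free(void *block, void *ctx)
  {
      (void)ctx;
      apr_bucket_free(block);  /* returns the block to the free list for reuse */
  }
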
Regards
Rüdiger